Data Science Altitude for This Article: Camp Two. Previously, we removed a bunch of metadata from The Federalist Papers that was introduced by its hosting at Project Gutenberg. After that, we took out much of the explanatory intra-document metadata attached to each of the 85 essays. Now, our goal is to polish off the metadata removal and transition the original unstructured data into object types that are more conducive to numerical analysis.
Data Science Altitude for This Article: Camp Two. Our last post set the stage for what it’s going to take for us to end up at our desired conclusion: a programmatic assessment of topics gleaned from The Federalist Papers. Before we can throw some fancy mathematics at the subject matter, we have to get the data to the point where it’s conducive to analysis. In this era of data coming at us from multiple sources and in structured and unstructured formats, we have to be versatile in our coding skills to deal with data in whatever manner it comes.
Data Science Altitude for This Article: Camp Two. For our next post, we tackle something a little more difficult than in the past posts: the use of Probabilistic Topic Modeling for thematic assessment in literature. Got a lot of documents that you’re trying to condense down into manageable themes? From an academic perspective, you might be interested in thematic constructs in one or more of Mark Twain’s works, perhaps. Or from a socio-political perspective - and something a little more present-day - you might want to identify recurring themes in political speeches and see how they track among candidates for a particular office.
Data Science Altitude for This Article: Base Camp. In the prior two posts, we first took the temperature of the R and RStudio user community and then we installed R. Now we’re ready to download the RStudio installer executable file and take a look at some of the functionality inside the IDE.
The RStudio Product Page
However, before we do that, please take note of the RStudio product page.
Data Science Altitude for This Article: Base Camp. Today’s post goes into the ‘meat and potatoes’ of the install process for R. If you’re looking to get more involved in a data science career or a more classic application development job in Java, .NET, or something similar, it’s important to develop a comfort level with installing software on your own machine. You’ll have to do it repeatedly over your career.
Data Science Altitude for This Article: Sea Level. Today’s post is for those of you who are looking to install R and RStudio on your home machine for the first time, and for those who don’t (yet) have a comfort level with installing software in general. I’ll provide links to others’ guides and documentation that provide a lot of detail; I could try that out myself, but I’m not here to re-invent someone else’s wheel.
Data Science Altitude for This Article: Sea Level. Today’s post is about vectors, one of the most common object types in R. It’s designed to be low-level introductory subject matter (thus our ‘Sea Level’ altitude on the mountain) for those who are at the ‘curiosity’ stage of Data Science: what is it, how does it work, how do I go about getting my feet wet… That sort of thing.
Welcome to the inaugural post for An Ascent Of Analytics. Our journey awaits! Not all pathways up the mountain of Data Science are as fiery as in Pierre-Jacques Volaire’s The Eruption of Mount Vesuvius (1777), but we’ll see what we can do to keep the sunscreen to a minimum. Maybe some pictures of snow later on will help… This blog, once it gets off the ground, is aimed primarily at helping out anyone who feels that acquiring the skills to practice Data Science is comparable to summiting Mount Everest.