R

Reticulate - Leveraging Python from R

Python and R, besties forever...

Data Science Altitude for This Article: Camp One. If you’re a Python developer who has been thrust into an R working environment, or an R developer who would like to try out Python packages and methods in the comfort of an already-familiar RStudio IDE, then Reticulate is the package for you. Using rmarkdown, you can knit R and Python code blocks into a unified whole and refer to objects across the language barrier.

Probabilistic Topic Models and Latent Dirichlet Allocation: Part 5

From Model Formation to Conclusions and a Critique of Process. A drum roll, please...

Data Science Altitude for This Article: Camp Two. So, all the pieces on the chessboard are in their strategic locations. We’ve identified a set of papers from which we want to identify thematic intent, taking The Federalist Papers directly from the Project Gutenberg site. We’ve cleaned them up, removing common words and metadata, and have formatted them into a DocumentTermMatrix. We then pulled that object into a Latent Dirichlet Allocation (LDA) model as defined in the topicmodels package and took a look at some of the high-level mathematics involved and the resulting object’s composition.

Probabilistic Topic Models and Latent Dirichlet Allocation: Part 4

From Data Formatting to Model Formation and Object Characteristics. All this effort is about to pay off...

Data Science Altitude for This Article: Camp Two. Previously, we created a DocumentTermMatrix expressly so that it would fit neatly into our upcoming LDA model formation. Here, we’ll discuss LDA in a bit of detail and dive into our findings. But first, a brief refresher on the format and content of the DT matrix. The word counts for the first ten stemmed words in the first eight documents, along with their aggregated counts for documents 9-85, are shown below:

Probabilistic Topic Models and Latent Dirichlet Allocation: Part 3

From Data Cleaning to Data Formatting: Finding a statue within a block of marble.

Data Science Altitude for This Article: Camp Two. Previously, we removed a bunch of metadata from The Federalist Papers that was introduced when the text was hosted by the team at Project Gutenberg. After that, we took out much of the explanatory intra-document metadata attached to each of the 85 essays. Now, our goal is to polish off the metadata removal and transition the original unstructured data into object types that are more conducive to numerical analysis.

R and RStudio Install: The User Experience, Part 2

Time to get your hands dirty...

Data Science Altitude for This Article: Base Camp. Today’s post goes into the ‘meat and potatoes’ of the install process for R. If you’re looking to get more involved in a data science career, or in a more classic application development job in Java, .NET, or something similar, it’s important to develop a comfort level with installing software on your own machine. You’ll have to do it repeatedly over your career.

R and RStudio Install: The User Experience, Part 1

A step-through of both installation processes, with a smile.

Data Science Altitude for This Article: Sea Level. Today’s post is for those of you who are looking to install R and RStudio on your home machine for the first time, and for those who don’t (yet) have a comfort level with installing software in general. I’ll provide links to others’ guides and documentation that provide a lot of detail; I could try that out myself, but I’m not here to re-invent someone else’s wheel.

An Introduction to Vectors

One of the workhorses of the R language.

Data Science Altitude for This Article: Sea Level. Today’s post is about vectors, one of the most common object types in R. It’s designed to be low-level introductory subject matter (thus our ‘Sea Level’ altitude on the mountain) for those who are at the ‘curiosity’ stage of Data Science: what is it, how does it work, how do I go about getting my feet wet… That sort of thing.
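
For a quick taste of what that looks like, here is a minimal base-R sketch of creating and using vectors. The variable names and values are made up for illustration and are not taken from the article itself.

```r
# A vector holds elements of a single type; c() combines values into one.
# All names and values below are hypothetical, chosen only for illustration.
scores <- c(82, 95, 71)          # numeric vector
labels <- c("a", "b", "c")       # character vector

# Most operations are vectorized: they apply to every element at once.
scaled <- scores / 100

# Indexing is 1-based; a logical condition can be used to filter.
first_score <- scores[1]
high_scores <- scores[scores > 80]

# Mixing types coerces every element to the most flexible type present.
mixed <- c(1, "two", TRUE)

print(scaled)
print(high_scores)
print(class(mixed))
```

Because a vector holds only one type, the mixed example ends up as a character vector, which is a common surprise for newcomers.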