Probabilistic Topic Models and Latent Dirichlet Allocation: Part 5

From Model Formation to Conclusions and a Critique of Process. A drum roll, please...

Data Science Altitude for This Article: Camp Two. So, all the pieces on the chessboard are in their strategic locations. We’ve chosen a set of papers whose thematic intent we want to uncover, taking The Federalist Papers directly from the Project Gutenberg site. We’ve cleaned them up, removing common words and metadata, and formatted them into a DocumentTermMatrix. We then fed that object to a Latent Dirichlet Allocation (LDA) model as defined in the topicmodels package, and took a look at some of the high-level mathematics involved and at the composition of the resulting object.