Probabilistic Topic Models and Latent Dirichlet Allocation: Part 5

From Model Formation to Conclusions and a Critique of Process. A drum roll, please...

Data Science Altitude for This Article: Camp Two.


So, all the pieces on the chessboard are in their strategic locations. We’ve chosen a set of papers from which we want to identify thematic intent, taking The Federalist Papers directly from the Project Gutenberg site. We’ve cleaned them up, removing common words and metadata, and have formatted them into a DocumentTermMatrix. We then pulled that object into a Latent Dirichlet Allocation (LDA) model as defined in the topicmodels package and took a look at some of the high-level mathematics involved and the resulting object’s composition.

Now we come to the payoff. What did the model find? How hard are the findings to visualize and interpret? Are there revisions or improvements that can be made to the process? In order to make this a bit more transparent, I offer up one of my GitHub repos for you to pull down the code, take a look at my original paper, and try this out for yourself if you so choose. By all means, find places to improve upon it.

The only difference you’ll find between here and there is an additional set of objects for consideration of a three-topic structure. The variable names between the two sets should be pretty clear. The code here is reworked a little bit to make it more conducive to a presentation using the kable() function in the knitr package, as in the last post. I’m also using some pretty cool formatting enhancers in the kableExtra package.

As we go, I’ll show you a small bit of the ‘kable table’ magic used here to pique your interest. If you use RMarkdown, I’d highly recommend it. Otherwise, I think it’ll get in the way of our assessment if I show it all the time, as it makes the code a little bulkier for discussion…


Most Likely Topic for each of the 85 Federalist Papers

The call to the topicmodels function topics() provides the most likely topic for each document. As far as I can tell, it bases this on the gamma object that we’ll detail in a later section, choosing the topic with the highest probability for each paper. The function call returns a simple vector of topic numbers, one for each document. The first ten Federalist Papers look to be distributed between topics 2 and 4 as their most likely topic. This will also provide us our counts of papers-per-topic in the ‘Paper Count’ grouping in one of the tables below.

#docs to topics
ldaOut.topics4 <- as.matrix(topics(ldaOut4))
str(ldaOut.topics4)
##  int [1:85, 1] 2 2 2 4 4 4 4 4 4 2 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:85] "1" "2" "3" "4" ...
##   ..$ : NULL
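
As a rough illustration of what topics() appears to be doing, here is a minimal sketch that picks the highest-probability column in each row of gamma. The gamma_head matrix is typed in by hand from the rounded first six rows of gamma printed further down in this post, and the apply()/which.max approach is my assumption about an equivalent calculation, not the package’s actual implementation.

```r
# First six rows of gamma, copied by hand from the rounded output in this post.
gamma_head <- matrix(
  c(0.181, 0.353, 0.218, 0.248,
    0.156, 0.350, 0.174, 0.320,
    0.209, 0.333, 0.189, 0.270,
    0.142, 0.260, 0.098, 0.501,
    0.159, 0.228, 0.101, 0.512,
    0.198, 0.195, 0.115, 0.492),
  ncol = 4, byrow = TRUE
)

# Most likely topic per document: the column index of each row's maximum.
most_likely <- apply(gamma_head, 1, which.max)
print(most_likely)

# Papers per topic, the same idea as table(ldaOut.topics4) on the full model.
print(table(most_likely))
```

For these six papers the result lines up with the first six values in the str() output above.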

Determining Thematic Intent - Top Terms by Topic:

Now we’re getting to a question you may have been asking yourselves as we’ve stepped through this process - how do we identify the themes involved with these topic numbers? We told the model we wanted to search for 4 topics, but how do we put an identity to them? That’s where the terms() function from the topicmodels package comes into play, returning for us an ordered collection of the most likely terms for each topic when supplied an LDA object.

From there, it’s up to us to use those words and their ordering to put a flashy title on what that theme is for each topic. I’ve tried to do so given the top ten words provided. You might come up with a different set, or might use 15 instead of 10 to make that decision. Once you pull down the code you can give it a try yourself…

But first, a word of warning! The topic numbers that the LDA() call comes up with won’t necessarily match the topic numbers that the serVis() call will come up with when we perform our visualizations later on in this post. But the grouping of words within a topic is the same. I’ve labeled them for you so later commentary will make sense. Now for some details…


# knitr supplies kable(); kableExtra supplies row_spec() and the %>% pipe
library(knitr)
library(kableExtra)

#top 10 terms in each topic
ldaOut.terms4 <- as.matrix(terms(ldaOut4,10))
colnames(ldaOut.terms4) <- c("LDA Topic 4.1", "LDA Topic 4.2", "LDA Topic 4.3", "LDA Topic 4.4")
LDAVisTopics <- c("LDAVis Topic 4.3", "LDAVis Topic 4.1", "LDAVis Topic 4.2", "LDAVis Topic 4.4")

# Overall topic counts for all 85 papers broken into four topics, plus
# 'kable table' magic using a pastel hue and alternating row colors.
shadeColor <- '#D5F5E0'
kable(rbind(LDAVisTopics, ldaOut.terms4, c(rep("Paper Count",4)), 
            table(ldaOut.topics4)), row.names = FALSE) %>%
      row_spec(0:1, bold = TRUE, background = shadeColor) %>%   # header + LDAVis labels
      row_spec(seq(2,10,2), background = "beige") %>%           # alternating term rows
      row_spec(12:13, bold = TRUE, background = shadeColor)     # 'Paper Count' rows
LDA Topic 4.1     LDA Topic 4.2     LDA Topic 4.3     LDA Topic 4.4
LDAVis Topic 4.3  LDAVis Topic 4.1  LDAVis Topic 4.2  LDAVis Topic 4.4
execut            will              state             nation
power             govern            power             union
legisl            may               constitut         war
bodi              peopl             law               countri
constitut         state             author            part
senat             repres            case              confederaci
might             interest          court             natur
appoint           must              nation            forc
offic             object            may               foreign
upon              number            unit              danger
Paper Count       Paper Count       Paper Count       Paper Count
17                31                17                20
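
To make the idea of an "ordered collection of terms per topic" a little more concrete, here is a small sketch of ranking a vocabulary by per-topic probability. Everything in it is hypothetical: the five-word vocabulary, the probabilities, and the helper top_terms() are invented for illustration. In a fitted topicmodels object the corresponding pieces would be the beta slot (log word probabilities, one row per topic) and the terms slot, but I’m not claiming this is how terms() is written internally.

```r
# Hypothetical log-probabilities for 2 topics over a 5-term vocabulary.
# In a fitted model these would come from ldaOut4@beta and ldaOut4@terms.
vocab <- c("execut", "power", "state", "war", "court")
log_beta <- log(rbind(
  c(0.40, 0.30, 0.15, 0.05, 0.10),
  c(0.05, 0.20, 0.25, 0.40, 0.10)
))

# For each topic (row), order the vocabulary by decreasing probability
# and keep the top n terms. Returns one column per topic.
top_terms <- function(log_beta, vocab, n) {
  apply(log_beta, 1, function(row) vocab[order(row, decreasing = TRUE)][seq_len(n)])
}

toy_terms <- top_terms(log_beta, vocab, 3)
colnames(toy_terms) <- c("Topic 1", "Topic 2")
print(toy_terms)
```

Reading down a column gives the same kind of ranked word list as in the table above, which is what we then have to put a theme to.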

Putting a Theme to our Topic Numbers

After taking a look at the words associated with the topics, and browsing through some of the highest-probability papers allocated to them below, I tried my hand at putting one potential theme to each of the topic numbers coming out of the LDA object. Again, this is but one possible interpretation.

  • LDA Topic 4.1 concerns itself with the role of the executive branch as compared to the legislative branch in a balance of powers.

  • LDA Topic 4.2 looks to be focused on the government’s role in handling differing interests.

  • LDA Topic 4.3 focuses on the laws and the power of the states to execute them.

  • LDA Topic 4.4 is plainly about the role of the government in foreign relations and during conflicts.

Themes for our Topics
4.1: Powers and Limitations of the Executive Branch
4.2: Structure Amidst Differing Agendas
4.3: A Nation of Laws
4.4: Internal/External Relations and Conflict

Gamma: the Probability Distribution of our Themes by Document

As we noted in our last post, the gamma component of the LDA object gives the probability that a paper belongs to each topic. I’ve prettied it up a bit in the table following the code segment below, where gamma corresponds to columns pTopic1 through pTopic4. As the topicmodels package evolves past the version we’ve used here (0.2-8), these values could change a little bit. And they certainly will change somewhat if you go back and attempt some more text cleaning…

#probabilities associated with each topic assignment (85 rows, 4 columns)
head(round(as.data.frame(ldaOut4@gamma),3))
##      V1    V2    V3    V4
## 1 0.181 0.353 0.218 0.248
## 2 0.156 0.350 0.174 0.320
## 3 0.209 0.333 0.189 0.270
## 4 0.142 0.260 0.098 0.501
## 5 0.159 0.228 0.101 0.512
## 6 0.198 0.195 0.115 0.492

I’ve highlighted the data in the ‘Topic#’ column here. It’s set to the highest-probability topic for that paper as seen in gamma and is the same as if you’d used topics() as we did earlier. For Federalist #1, the highest probability is that it belongs to topic number 2.
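
For anyone who wants to rebuild a table like this by hand, here is one possible sketch. It uses only the six rounded gamma rows printed above plus the four theme names chosen earlier. The object names (gamma_head, themes, paper_topics) and the column name Topic are my own placeholders, and the code in the repo may well go about it differently.

```r
# Rounded gamma values for Federalist 1 through 6, copied from the output above.
gamma_head <- matrix(
  c(0.181, 0.353, 0.218, 0.248,
    0.156, 0.350, 0.174, 0.320,
    0.209, 0.333, 0.189, 0.270,
    0.142, 0.260, 0.098, 0.501,
    0.159, 0.228, 0.101, 0.512,
    0.198, 0.195, 0.115, 0.492),
  ncol = 4, byrow = TRUE
)

# Theme names in LDA topic order (4.1 through 4.4).
themes <- c("Powers and Limitations of the Executive Branch",
            "Structure Amidst Differing Agendas",
            "A Nation of Laws",
            "Internal/External Relations and Conflict")

# Highest-probability topic per row, then look up its theme.
top <- max.col(gamma_head, ties.method = "first")

paper_topics <- as.data.frame(gamma_head)
names(paper_topics) <- paste0("pTopic", 1:4)
paper_topics <- data.frame(Paper = paste("Fed", 1:6),
                           paper_topics,
                           Topic = top,
                           TopicDesc = themes[top])
print(paper_topics)

# Sanity check: each row of gamma should sum to roughly 1 (allowing for rounding).
print(rowSums(gamma_head))
```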

Now I think we’re ready for some observations about our choice of themes and the papers that most represent them.

Paper pTopic1 pTopic2 pTopic3 pTopic4 Topic# TopicDesc
Fed 1:Hamilton 0.181 0.353 0.218 0.248 2 Structure Amidst Differing Agendas
Fed 2:Jay 0.156 0.350 0.174 0.320 2 Structure Amidst Differing Agendas
Fed 3:Jay 0.209 0.333 0.189 0.270 2 Structure Amidst Differing Agendas
Fed 4:Jay 0.142 0.260 0.098 0.501 4 Internal/External Relations and Conflict
Fed 5:Jay 0.159 0.228 0.101 0.512 4 Internal/External Relations and Conflict
Fed 6:Hamilton 0.198 0.195 0.115 0.492 4 Internal/External Relations and Conflict
Fed 7:Hamilton 0.107 0.253 0.274 0.365 4 Internal/External Relations and Conflict
Fed 8:Hamilton 0.068 0.276 0.129 0.526 4 Internal/External Relations and Conflict
Fed 9:Hamilton 0.198 0.283 0.218 0.301 4 Internal/External Relations and Conflict
Fed 10:Madison 0.131 0.693 0.079 0.096 2 Structure Amidst Differing Agendas
Fed 11:Hamilton 0.097 0.229 0.122 0.552 4 Internal/External Relations and Conflict
Fed 12:Hamilton 0.080 0.312 0.146 0.462 4 Internal/External Relations and Conflict
Fed 13:Hamilton 0.086 0.424 0.198 0.292 2 Structure Amidst Differing Agendas
Fed 14:Madison 0.086 0.461 0.135 0.318 2 Structure Amidst Differing Agendas
Fed 15:Hamilton 0.211 0.238 0.238 0.313 4 Internal/External Relations and Conflict
Fed 16:Hamilton 0.191 0.277 0.185 0.347 4 Internal/External Relations and Conflict
Fed 17:Hamilton 0.209 0.351 0.192 0.248 2 Structure Amidst Differing Agendas
Fed 18:Madison 0.171 0.132 0.133 0.564 4 Internal/External Relations and Conflict
Fed 19:Madison 0.127 0.166 0.189 0.518 4 Internal/External Relations and Conflict
Fed 20:Madison 0.230 0.210 0.174 0.385 4 Internal/External Relations and Conflict
Fed 21:Hamilton 0.077 0.272 0.379 0.271 3 A Nation of Laws
Fed 22:Hamilton 0.193 0.296 0.233 0.277 2 Structure Amidst Differing Agendas
Fed 23:Hamilton 0.124 0.282 0.349 0.244 3 A Nation of Laws
Fed 24:Hamilton 0.143 0.212 0.290 0.355 4 Internal/External Relations and Conflict
Fed 25:Hamilton 0.139 0.217 0.256 0.388 4 Internal/External Relations and Conflict
Fed 26:Hamilton 0.168 0.257 0.263 0.313 4 Internal/External Relations and Conflict
Fed 27:Hamilton 0.232 0.356 0.219 0.193 2 Structure Amidst Differing Agendas
Fed 28:Hamilton 0.100 0.348 0.206 0.346 2 Structure Amidst Differing Agendas
Fed 29:Hamilton 0.165 0.243 0.260 0.332 4 Internal/External Relations and Conflict
Fed 30:Hamilton 0.140 0.247 0.297 0.316 4 Internal/External Relations and Conflict
Fed 31:Hamilton 0.159 0.270 0.369 0.202 3 A Nation of Laws
Fed 32:Hamilton 0.110 0.099 0.733 0.059 3 A Nation of Laws
Fed 33:Hamilton 0.150 0.158 0.597 0.095 3 A Nation of Laws
Fed 34:Hamilton 0.142 0.242 0.291 0.325 4 Internal/External Relations and Conflict
Fed 35:Hamilton 0.149 0.413 0.169 0.269 2 Structure Amidst Differing Agendas
Fed 36:Hamilton 0.174 0.329 0.345 0.152 3 A Nation of Laws
Fed 37:Madison 0.183 0.481 0.200 0.136 2 Structure Amidst Differing Agendas
Fed 38:Madison 0.213 0.279 0.283 0.225 3 A Nation of Laws
Fed 39:Madison 0.253 0.400 0.295 0.053 2 Structure Amidst Differing Agendas
Fed 40:Madison 0.178 0.215 0.519 0.088 3 A Nation of Laws
Fed 41:Madison 0.074 0.273 0.325 0.328 4 Internal/External Relations and Conflict
Fed 42:Madison 0.105 0.153 0.600 0.142 3 A Nation of Laws
Fed 43:Madison 0.104 0.326 0.399 0.172 3 A Nation of Laws
Fed 44:Madison 0.164 0.139 0.608 0.089 3 A Nation of Laws
Fed 45:Madison 0.138 0.448 0.242 0.171 2 Structure Amidst Differing Agendas
Fed 46:Madison 0.123 0.545 0.164 0.168 2 Structure Amidst Differing Agendas
Fed 47:Madison 0.664 0.099 0.199 0.037 1 Powers and Limitations of the Executive Branch
Fed 48:Madison 0.443 0.306 0.182 0.070 1 Powers and Limitations of the Executive Branch
Fed 49:Disputed 0.294 0.399 0.237 0.070 2 Structure Amidst Differing Agendas
Fed 50:Disputed 0.301 0.356 0.202 0.141 2 Structure Amidst Differing Agendas
Fed 51:Disputed 0.219 0.546 0.143 0.093 2 Structure Amidst Differing Agendas
Fed 52:Disputed 0.176 0.540 0.216 0.068 2 Structure Amidst Differing Agendas
Fed 53:Disputed 0.196 0.525 0.203 0.076 2 Structure Amidst Differing Agendas
Fed 54:Disputed 0.124 0.580 0.255 0.041 2 Structure Amidst Differing Agendas
Fed 55:Disputed 0.160 0.602 0.144 0.093 2 Structure Amidst Differing Agendas
Fed 56:Disputed 0.091 0.654 0.168 0.088 2 Structure Amidst Differing Agendas
Fed 57:Disputed 0.184 0.558 0.164 0.093 2 Structure Amidst Differing Agendas
Fed 58:Disputed 0.168 0.607 0.142 0.083 2 Structure Amidst Differing Agendas
Fed 59:Hamilton 0.249 0.332 0.293 0.126 2 Structure Amidst Differing Agendas
Fed 60:Hamilton 0.219 0.475 0.147 0.159 2 Structure Amidst Differing Agendas
Fed 61:Hamilton 0.159 0.445 0.271 0.125 2 Structure Amidst Differing Agendas
Fed 62:Disputed 0.238 0.458 0.163 0.141 2 Structure Amidst Differing Agendas
Fed 63:Disputed 0.208 0.550 0.122 0.120 2 Structure Amidst Differing Agendas
Fed 64:Jay 0.454 0.274 0.163 0.109 1 Powers and Limitations of the Executive Branch
Fed 65:Hamilton 0.385 0.266 0.261 0.089 1 Powers and Limitations of the Executive Branch
Fed 66:Hamilton 0.509 0.251 0.191 0.049 1 Powers and Limitations of the Executive Branch
Fed 67:Hamilton 0.488 0.116 0.326 0.069 1 Powers and Limitations of the Executive Branch
Fed 68:Hamilton 0.460 0.349 0.099 0.092 1 Powers and Limitations of the Executive Branch
Fed 69:Hamilton 0.446 0.099 0.347 0.107 1 Powers and Limitations of the Executive Branch
Fed 70:Hamilton 0.437 0.280 0.154 0.128 1 Powers and Limitations of the Executive Branch
Fed 71:Hamilton 0.495 0.307 0.089 0.109 1 Powers and Limitations of the Executive Branch
Fed 72:Hamilton 0.439 0.234 0.121 0.207 1 Powers and Limitations of the Executive Branch
Fed 73:Hamilton 0.432 0.255 0.175 0.139 1 Powers and Limitations of the Executive Branch
Fed 74:Hamilton 0.402 0.183 0.228 0.187 1 Powers and Limitations of the Executive Branch
Fed 75:Hamilton 0.447 0.266 0.149 0.137 1 Powers and Limitations of the Executive Branch
Fed 76:Hamilton 0.543 0.248 0.160 0.048 1 Powers and Limitations of the Executive Branch
Fed 77:Hamilton 0.502 0.278 0.176 0.045 1 Powers and Limitations of the Executive Branch
Fed 78:Hamilton 0.345 0.218 0.347 0.090 3 A Nation of Laws
Fed 79:Hamilton 0.308 0.269 0.289 0.133 1 Powers and Limitations of the Executive Branch
Fed 80:Hamilton 0.096 0.132 0.674 0.099 3 A Nation of Laws
Fed 81:Hamilton 0.178 0.130 0.653 0.039 3 A Nation of Laws
Fed 82:Hamilton 0.083 0.124 0.738 0.054 3 A Nation of Laws
Fed 83:Hamilton 0.113 0.184 0.655 0.048 3 A Nation of Laws
Fed 84:Hamilton 0.167 0.275 0.471 0.087 3 A Nation of Laws
Fed 85:Hamilton 0.186 0.316 0.313 0.185 2 Structure Amidst Differing Agendas


Most Likely Papers for Topic 1 - Theme: Powers and Limitations of the Executive Branch

Let’s now order these by decreasing probability of being in topic one. I’ve listed the titles of five of the six highest-probability papers below (the table also ranks Federalist 66 third). From them, I think the theme chosen is highly representative of the subject matter.

  • Federalist 47: The Particular Structure of the New Government and the Distribution of Power Among Its Different Parts

  • Federalist 76: The Appointing Power of the Executive

  • Federalist 77: The Appointing Power Continued and Other Powers of the Executive Considered

  • Federalist 71: The Duration in Office of the Executive

  • Federalist 67: The Executive Department
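
The re-ordering itself is a one-liner with order(). The sketch below is a minimal version under a couple of assumptions: the papers data frame holds just seven rows copied from the gamma table above, and rank_by() is a hypothetical helper of my own, not something from the repo.

```r
# A handful of rows copied from the gamma table (two topic columns only).
papers <- data.frame(
  Paper   = c("Fed 10:Madison", "Fed 47:Madison", "Fed 66:Hamilton",
              "Fed 67:Hamilton", "Fed 71:Hamilton", "Fed 76:Hamilton",
              "Fed 77:Hamilton"),
  pTopic1 = c(0.131, 0.664, 0.509, 0.488, 0.495, 0.543, 0.502),
  pTopic2 = c(0.693, 0.099, 0.251, 0.116, 0.307, 0.248, 0.278)
)

# Sort a data frame by one probability column, highest first.
rank_by <- function(df, column) {
  df[order(df[[column]], decreasing = TRUE), ]
}

by_topic1 <- rank_by(papers, "pTopic1")
print(by_topic1, row.names = FALSE)

# The same helper works for any other topic column.
by_topic2 <- rank_by(papers, "pTopic2")
print(head(by_topic2, 3), row.names = FALSE)
```

Swapping the column name is all it takes to produce the orderings used for the other topics.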

Paper pTopic1 pTopic2 pTopic3 pTopic4 Topic# TopicDesc
Fed 47:Madison 0.664 0.099 0.199 0.037 1 Powers and Limitations of the Executive Branch
Fed 76:Hamilton 0.543 0.248 0.160 0.048 1 Powers and Limitations of the Executive Branch
Fed 66:Hamilton 0.509 0.251 0.191 0.049 1 Powers and Limitations of the Executive Branch
Fed 77:Hamilton 0.502 0.278 0.176 0.045 1 Powers and Limitations of the Executive Branch
Fed 71:Hamilton 0.495 0.307 0.089 0.109 1 Powers and Limitations of the Executive Branch
Fed 67:Hamilton 0.488 0.116 0.326 0.069 1 Powers and Limitations of the Executive Branch
Fed 68:Hamilton 0.460 0.349 0.099 0.092 1 Powers and Limitations of the Executive Branch
Fed 64:Jay 0.454 0.274 0.163 0.109 1 Powers and Limitations of the Executive Branch
Fed 75:Hamilton 0.447 0.266 0.149 0.137 1 Powers and Limitations of the Executive Branch
Fed 69:Hamilton 0.446 0.099 0.347 0.107 1 Powers and Limitations of the Executive Branch
Fed 48:Madison 0.443 0.306 0.182 0.070 1 Powers and Limitations of the Executive Branch
Fed 72:Hamilton 0.439 0.234 0.121 0.207 1 Powers and Limitations of the Executive Branch
Fed 70:Hamilton 0.437 0.280 0.154 0.128 1 Powers and Limitations of the Executive Branch
Fed 73:Hamilton 0.432 0.255 0.175 0.139 1 Powers and Limitations of the Executive Branch
Fed 74:Hamilton 0.402 0.183 0.228 0.187 1 Powers and Limitations of the Executive Branch
Fed 65:Hamilton 0.385 0.266 0.261 0.089 1 Powers and Limitations of the Executive Branch
Fed 78:Hamilton 0.345 0.218 0.347 0.090 3 A Nation of Laws
Fed 79:Hamilton 0.308 0.269 0.289 0.133 1 Powers and Limitations of the Executive Branch
Fed 50:Disputed 0.301 0.356 0.202 0.141 2 Structure Amidst Differing Agendas
Fed 49:Disputed 0.294 0.399 0.237 0.070 2 Structure Amidst Differing Agendas
Fed 39:Madison 0.253 0.400 0.295 0.053 2 Structure Amidst Differing Agendas
Fed 59:Hamilton 0.249 0.332 0.293 0.126 2 Structure Amidst Differing Agendas
Fed 62:Disputed 0.238 0.458 0.163 0.141 2 Structure Amidst Differing Agendas
Fed 27:Hamilton 0.232 0.356 0.219 0.193 2 Structure Amidst Differing Agendas
Fed 20:Madison 0.230 0.210 0.174 0.385 4 Internal/External Relations and Conflict
Fed 51:Disputed 0.219 0.546 0.143 0.093 2 Structure Amidst Differing Agendas
Fed 60:Hamilton 0.219 0.475 0.147 0.159 2 Structure Amidst Differing Agendas
Fed 38:Madison 0.213 0.279 0.283 0.225 3 A Nation of Laws
Fed 15:Hamilton 0.211 0.238 0.238 0.313 4 Internal/External Relations and Conflict
Fed 3:Jay 0.209 0.333 0.189 0.270 2 Structure Amidst Differing Agendas
Fed 17:Hamilton 0.209 0.351 0.192 0.248 2 Structure Amidst Differing Agendas
Fed 63:Disputed 0.208 0.550 0.122 0.120 2 Structure Amidst Differing Agendas
Fed 6:Hamilton 0.198 0.195 0.115 0.492 4 Internal/External Relations and Conflict
Fed 9:Hamilton 0.198 0.283 0.218 0.301 4 Internal/External Relations and Conflict
Fed 53:Disputed 0.196 0.525 0.203 0.076 2 Structure Amidst Differing Agendas
Fed 22:Hamilton 0.193 0.296 0.233 0.277 2 Structure Amidst Differing Agendas
Fed 16:Hamilton 0.191 0.277 0.185 0.347 4 Internal/External Relations and Conflict
Fed 85:Hamilton 0.186 0.316 0.313 0.185 2 Structure Amidst Differing Agendas
Fed 57:Disputed 0.184 0.558 0.164 0.093 2 Structure Amidst Differing Agendas
Fed 37:Madison 0.183 0.481 0.200 0.136 2 Structure Amidst Differing Agendas
Fed 1:Hamilton 0.181 0.353 0.218 0.248 2 Structure Amidst Differing Agendas
Fed 40:Madison 0.178 0.215 0.519 0.088 3 A Nation of Laws
Fed 81:Hamilton 0.178 0.130 0.653 0.039 3 A Nation of Laws
Fed 52:Disputed 0.176 0.540 0.216 0.068 2 Structure Amidst Differing Agendas
Fed 36:Hamilton 0.174 0.329 0.345 0.152 3 A Nation of Laws
Fed 18:Madison 0.171 0.132 0.133 0.564 4 Internal/External Relations and Conflict
Fed 26:Hamilton 0.168 0.257 0.263 0.313 4 Internal/External Relations and Conflict
Fed 58:Disputed 0.168 0.607 0.142 0.083 2 Structure Amidst Differing Agendas
Fed 84:Hamilton 0.167 0.275 0.471 0.087 3 A Nation of Laws
Fed 29:Hamilton 0.165 0.243 0.260 0.332 4 Internal/External Relations and Conflict
Fed 44:Madison 0.164 0.139 0.608 0.089 3 A Nation of Laws
Fed 55:Disputed 0.160 0.602 0.144 0.093 2 Structure Amidst Differing Agendas
Fed 5:Jay 0.159 0.228 0.101 0.512 4 Internal/External Relations and Conflict
Fed 31:Hamilton 0.159 0.270 0.369 0.202 3 A Nation of Laws
Fed 61:Hamilton 0.159 0.445 0.271 0.125 2 Structure Amidst Differing Agendas
Fed 2:Jay 0.156 0.350 0.174 0.320 2 Structure Amidst Differing Agendas
Fed 33:Hamilton 0.150 0.158 0.597 0.095 3 A Nation of Laws
Fed 35:Hamilton 0.149 0.413 0.169 0.269 2 Structure Amidst Differing Agendas
Fed 24:Hamilton 0.143 0.212 0.290 0.355 4 Internal/External Relations and Conflict
Fed 4:Jay 0.142 0.260 0.098 0.501 4 Internal/External Relations and Conflict
Fed 34:Hamilton 0.142 0.242 0.291 0.325 4 Internal/External Relations and Conflict
Fed 30:Hamilton 0.140 0.247 0.297 0.316 4 Internal/External Relations and Conflict
Fed 25:Hamilton 0.139 0.217 0.256 0.388 4 Internal/External Relations and Conflict
Fed 45:Madison 0.138 0.448 0.242 0.171 2 Structure Amidst Differing Agendas
Fed 10:Madison 0.131 0.693 0.079 0.096 2 Structure Amidst Differing Agendas
Fed 19:Madison 0.127 0.166 0.189 0.518 4 Internal/External Relations and Conflict
Fed 23:Hamilton 0.124 0.282 0.349 0.244 3 A Nation of Laws
Fed 54:Disputed 0.124 0.580 0.255 0.041 2 Structure Amidst Differing Agendas
Fed 46:Madison 0.123 0.545 0.164 0.168 2 Structure Amidst Differing Agendas
Fed 83:Hamilton 0.113 0.184 0.655 0.048 3 A Nation of Laws
Fed 32:Hamilton 0.110 0.099 0.733 0.059 3 A Nation of Laws
Fed 7:Hamilton 0.107 0.253 0.274 0.365 4 Internal/External Relations and Conflict
Fed 42:Madison 0.105 0.153 0.600 0.142 3 A Nation of Laws
Fed 43:Madison 0.104 0.326 0.399 0.172 3 A Nation of Laws
Fed 28:Hamilton 0.100 0.348 0.206 0.346 2 Structure Amidst Differing Agendas
Fed 11:Hamilton 0.097 0.229 0.122 0.552 4 Internal/External Relations and Conflict
Fed 80:Hamilton 0.096 0.132 0.674 0.099 3 A Nation of Laws
Fed 56:Disputed 0.091 0.654 0.168 0.088 2 Structure Amidst Differing Agendas
Fed 13:Hamilton 0.086 0.424 0.198 0.292 2 Structure Amidst Differing Agendas
Fed 14:Madison 0.086 0.461 0.135 0.318 2 Structure Amidst Differing Agendas
Fed 82:Hamilton 0.083 0.124 0.738 0.054 3 A Nation of Laws
Fed 12:Hamilton 0.080 0.312 0.146 0.462 4 Internal/External Relations and Conflict
Fed 21:Hamilton 0.077 0.272 0.379 0.271 3 A Nation of Laws
Fed 41:Madison 0.074 0.273 0.325 0.328 4 Internal/External Relations and Conflict
Fed 8:Hamilton 0.068 0.276 0.129 0.526 4 Internal/External Relations and Conflict


Most Likely Papers for Topic 2 - Theme: Structure Amidst Differing Agendas

Let’s try this again, but ordering by decreasing probability of being in topic two. It seems to me that the top five papers are very much focused on structural concerns.

  • Federalist 10: The Same Subject Continued (The Union as a Safeguard Against Domestic Faction and Insurrection)

  • Federalist 56: The Same Subject Continued (The Total Number of the House of Representatives)

  • Federalist 58: Objection That The Number of Members Will Not Be Augmented as the Progress of Population Demands Considered

  • Federalist 55: The Total Number of the House of Representatives

  • Federalist 54: The Apportionment of Members Among the States

Paper pTopic1 pTopic2 pTopic3 pTopic4 Topic# TopicDesc
Fed 10:Madison 0.131 0.693 0.079 0.096 2 Structure Amidst Differing Agendas
Fed 56:Disputed 0.091 0.654 0.168 0.088 2 Structure Amidst Differing Agendas
Fed 58:Disputed 0.168 0.607 0.142 0.083 2 Structure Amidst Differing Agendas
Fed 55:Disputed 0.160 0.602 0.144 0.093 2 Structure Amidst Differing Agendas
Fed 54:Disputed 0.124 0.580 0.255 0.041 2 Structure Amidst Differing Agendas
Fed 57:Disputed 0.184 0.558 0.164 0.093 2 Structure Amidst Differing Agendas
Fed 63:Disputed 0.208 0.550 0.122 0.120 2 Structure Amidst Differing Agendas
Fed 51:Disputed 0.219 0.546 0.143 0.093 2 Structure Amidst Differing Agendas
Fed 46:Madison 0.123 0.545 0.164 0.168 2 Structure Amidst Differing Agendas
Fed 52:Disputed 0.176 0.540 0.216 0.068 2 Structure Amidst Differing Agendas
Fed 53:Disputed 0.196 0.525 0.203 0.076 2 Structure Amidst Differing Agendas
Fed 37:Madison 0.183 0.481 0.200 0.136 2 Structure Amidst Differing Agendas
Fed 60:Hamilton 0.219 0.475 0.147 0.159 2 Structure Amidst Differing Agendas
Fed 14:Madison 0.086 0.461 0.135 0.318 2 Structure Amidst Differing Agendas
Fed 62:Disputed 0.238 0.458 0.163 0.141 2 Structure Amidst Differing Agendas
Fed 45:Madison 0.138 0.448 0.242 0.171 2 Structure Amidst Differing Agendas
Fed 61:Hamilton 0.159 0.445 0.271 0.125 2 Structure Amidst Differing Agendas
Fed 13:Hamilton 0.086 0.424 0.198 0.292 2 Structure Amidst Differing Agendas
Fed 35:Hamilton 0.149 0.413 0.169 0.269 2 Structure Amidst Differing Agendas
Fed 39:Madison 0.253 0.400 0.295 0.053 2 Structure Amidst Differing Agendas
Fed 49:Disputed 0.294 0.399 0.237 0.070 2 Structure Amidst Differing Agendas
Fed 27:Hamilton 0.232 0.356 0.219 0.193 2 Structure Amidst Differing Agendas
Fed 50:Disputed 0.301 0.356 0.202 0.141 2 Structure Amidst Differing Agendas
Fed 1:Hamilton 0.181 0.353 0.218 0.248 2 Structure Amidst Differing Agendas
Fed 17:Hamilton 0.209 0.351 0.192 0.248 2 Structure Amidst Differing Agendas
Fed 2:Jay 0.156 0.350 0.174 0.320 2 Structure Amidst Differing Agendas
Fed 68:Hamilton 0.460 0.349 0.099 0.092 1 Powers and Limitations of the Executive Branch
Fed 28:Hamilton 0.100 0.348 0.206 0.346 2 Structure Amidst Differing Agendas
Fed 3:Jay 0.209 0.333 0.189 0.270 2 Structure Amidst Differing Agendas
Fed 59:Hamilton 0.249 0.332 0.293 0.126 2 Structure Amidst Differing Agendas
Fed 36:Hamilton 0.174 0.329 0.345 0.152 3 A Nation of Laws
Fed 43:Madison 0.104 0.326 0.399 0.172 3 A Nation of Laws
Fed 85:Hamilton 0.186 0.316 0.313 0.185 2 Structure Amidst Differing Agendas
Fed 12:Hamilton 0.080 0.312 0.146 0.462 4 Internal/External Relations and Conflict
Fed 71:Hamilton 0.495 0.307 0.089 0.109 1 Powers and Limitations of the Executive Branch
Fed 48:Madison 0.443 0.306 0.182 0.070 1 Powers and Limitations of the Executive Branch
Fed 22:Hamilton 0.193 0.296 0.233 0.277 2 Structure Amidst Differing Agendas
Fed 9:Hamilton 0.198 0.283 0.218 0.301 4 Internal/External Relations and Conflict
Fed 23:Hamilton 0.124 0.282 0.349 0.244 3 A Nation of Laws
Fed 70:Hamilton 0.437 0.280 0.154 0.128 1 Powers and Limitations of the Executive Branch
Fed 38:Madison 0.213 0.279 0.283 0.225 3 A Nation of Laws
Fed 77:Hamilton 0.502 0.278 0.176 0.045 1 Powers and Limitations of the Executive Branch
Fed 16:Hamilton 0.191 0.277 0.185 0.347 4 Internal/External Relations and Conflict
Fed 8:Hamilton 0.068 0.276 0.129 0.526 4 Internal/External Relations and Conflict
Fed 84:Hamilton 0.167 0.275 0.471 0.087 3 A Nation of Laws
Fed 64:Jay 0.454 0.274 0.163 0.109 1 Powers and Limitations of the Executive Branch
Fed 41:Madison 0.074 0.273 0.325 0.328 4 Internal/External Relations and Conflict
Fed 21:Hamilton 0.077 0.272 0.379 0.271 3 A Nation of Laws
Fed 31:Hamilton 0.159 0.270 0.369 0.202 3 A Nation of Laws
Fed 79:Hamilton 0.308 0.269 0.289 0.133 1 Powers and Limitations of the Executive Branch
Fed 65:Hamilton 0.385 0.266 0.261 0.089 1 Powers and Limitations of the Executive Branch
Fed 75:Hamilton 0.447 0.266 0.149 0.137 1 Powers and Limitations of the Executive Branch
Fed 4:Jay 0.142 0.260 0.098 0.501 4 Internal/External Relations and Conflict
Fed 26:Hamilton 0.168 0.257 0.263 0.313 4 Internal/External Relations and Conflict
Fed 73:Hamilton 0.432 0.255 0.175 0.139 1 Powers and Limitations of the Executive Branch
Fed 7:Hamilton 0.107 0.253 0.274 0.365 4 Internal/External Relations and Conflict
Fed 66:Hamilton 0.509 0.251 0.191 0.049 1 Powers and Limitations of the Executive Branch
Fed 76:Hamilton 0.543 0.248 0.160 0.048 1 Powers and Limitations of the Executive Branch
Fed 30:Hamilton 0.140 0.247 0.297 0.316 4 Internal/External Relations and Conflict
Fed 29:Hamilton 0.165 0.243 0.260 0.332 4 Internal/External Relations and Conflict
Fed 34:Hamilton 0.142 0.242 0.291 0.325 4 Internal/External Relations and Conflict
Fed 15:Hamilton 0.211 0.238 0.238 0.313 4 Internal/External Relations and Conflict
Fed 72:Hamilton 0.439 0.234 0.121 0.207 1 Powers and Limitations of the Executive Branch
Fed 11:Hamilton 0.097 0.229 0.122 0.552 4 Internal/External Relations and Conflict
Fed 5:Jay 0.159 0.228 0.101 0.512 4 Internal/External Relations and Conflict
Fed 78:Hamilton 0.345 0.218 0.347 0.090 3 A Nation of Laws
Fed 25:Hamilton 0.139 0.217 0.256 0.388 4 Internal/External Relations and Conflict
Fed 40:Madison 0.178 0.215 0.519 0.088 3 A Nation of Laws
Fed 24:Hamilton 0.143 0.212 0.290 0.355 4 Internal/External Relations and Conflict
Fed 20:Madison 0.230 0.210 0.174 0.385 4 Internal/External Relations and Conflict
Fed 6:Hamilton 0.198 0.195 0.115 0.492 4 Internal/External Relations and Conflict
Fed 83:Hamilton 0.113 0.184 0.655 0.048 3 A Nation of Laws
Fed 74:Hamilton 0.402 0.183 0.228 0.187 1 Powers and Limitations of the Executive Branch
Fed 19:Madison 0.127 0.166 0.189 0.518 4 Internal/External Relations and Conflict
Fed 33:Hamilton 0.150 0.158 0.597 0.095 3 A Nation of Laws
Fed 42:Madison 0.105 0.153 0.600 0.142 3 A Nation of Laws
Fed 44:Madison 0.164 0.139 0.608 0.089 3 A Nation of Laws
Fed 18:Madison 0.171 0.132 0.133 0.564 4 Internal/External Relations and Conflict
Fed 80:Hamilton 0.096 0.132 0.674 0.099 3 A Nation of Laws
Fed 81:Hamilton 0.178 0.130 0.653 0.039 3 A Nation of Laws
Fed 82:Hamilton 0.083 0.124 0.738 0.054 3 A Nation of Laws
Fed 67:Hamilton 0.488 0.116 0.326 0.069 1 Powers and Limitations of the Executive Branch
Fed 32:Hamilton 0.110 0.099 0.733 0.059 3 A Nation of Laws
Fed 47:Madison 0.664 0.099 0.199 0.037 1 Powers and Limitations of the Executive Branch
Fed 69:Hamilton 0.446 0.099 0.347 0.107 1 Powers and Limitations of the Executive Branch


Most Likely Papers for Topic 3 - Theme: A Nation of Laws

Re-ordering the papers by their probability of being in topic three gives us this list. Four of the five titles reference the judiciary, so I think this choice is sound…

  • Federalist 82: The Judiciary Continued

  • Federalist 32: The Same Subject Continued (Concerning the General Power of Taxation)

  • Federalist 80: The Powers of the Judiciary

  • Federalist 83: The Judiciary Continued in Relation to Trial by Jury

  • Federalist 81: The Judiciary Continued, and the Distribution of the Judicial Authority

Paper pTopic1 pTopic2 pTopic3 pTopic4 Topic# TopicDesc
Fed 82:Hamilton 0.083 0.124 0.738 0.054 3 A Nation of Laws
Fed 32:Hamilton 0.110 0.099 0.733 0.059 3 A Nation of Laws
Fed 80:Hamilton 0.096 0.132 0.674 0.099 3 A Nation of Laws
Fed 83:Hamilton 0.113 0.184 0.655 0.048 3 A Nation of Laws
Fed 81:Hamilton 0.178 0.130 0.653 0.039 3 A Nation of Laws
Fed 44:Madison 0.164 0.139 0.608 0.089 3 A Nation of Laws
Fed 42:Madison 0.105 0.153 0.600 0.142 3 A Nation of Laws
Fed 33:Hamilton 0.150 0.158 0.597 0.095 3 A Nation of Laws
Fed 40:Madison 0.178 0.215 0.519 0.088 3 A Nation of Laws
Fed 84:Hamilton 0.167 0.275 0.471 0.087 3 A Nation of Laws
Fed 43:Madison 0.104 0.326 0.399 0.172 3 A Nation of Laws
Fed 21:Hamilton 0.077 0.272 0.379 0.271 3 A Nation of Laws
Fed 31:Hamilton 0.159 0.270 0.369 0.202 3 A Nation of Laws
Fed 23:Hamilton 0.124 0.282 0.349 0.244 3 A Nation of Laws
Fed 69:Hamilton 0.446 0.099 0.347 0.107 1 Powers and Limitations of the Executive Branch
Fed 78:Hamilton 0.345 0.218 0.347 0.090 3 A Nation of Laws
Fed 36:Hamilton 0.174 0.329 0.345 0.152 3 A Nation of Laws
Fed 67:Hamilton 0.488 0.116 0.326 0.069 1 Powers and Limitations of the Executive Branch
Fed 41:Madison 0.074 0.273 0.325 0.328 4 Internal/External Relations and Conflict
Fed 85:Hamilton 0.186 0.316 0.313 0.185 2 Structure Amidst Differing Agendas
Fed 30:Hamilton 0.140 0.247 0.297 0.316 4 Internal/External Relations and Conflict
Fed 39:Madison 0.253 0.400 0.295 0.053 2 Structure Amidst Differing Agendas
Fed 59:Hamilton 0.249 0.332 0.293 0.126 2 Structure Amidst Differing Agendas
Fed 34:Hamilton 0.142 0.242 0.291 0.325 4 Internal/External Relations and Conflict
Fed 24:Hamilton 0.143 0.212 0.290 0.355 4 Internal/External Relations and Conflict
Fed 79:Hamilton 0.308 0.269 0.289 0.133 1 Powers and Limitations of the Executive Branch
Fed 38:Madison 0.213 0.279 0.283 0.225 3 A Nation of Laws
Fed 7:Hamilton 0.107 0.253 0.274 0.365 4 Internal/External Relations and Conflict
Fed 61:Hamilton 0.159 0.445 0.271 0.125 2 Structure Amidst Differing Agendas
Fed 26:Hamilton 0.168 0.257 0.263 0.313 4 Internal/External Relations and Conflict
Fed 65:Hamilton 0.385 0.266 0.261 0.089 1 Powers and Limitations of the Executive Branch
Fed 29:Hamilton 0.165 0.243 0.260 0.332 4 Internal/External Relations and Conflict
Fed 25:Hamilton 0.139 0.217 0.256 0.388 4 Internal/External Relations and Conflict
Fed 54:Disputed 0.124 0.580 0.255 0.041 2 Structure Amidst Differing Agendas
Fed 45:Madison 0.138 0.448 0.242 0.171 2 Structure Amidst Differing Agendas
Fed 15:Hamilton 0.211 0.238 0.238 0.313 4 Internal/External Relations and Conflict
Fed 49:Disputed 0.294 0.399 0.237 0.070 2 Structure Amidst Differing Agendas
Fed 22:Hamilton 0.193 0.296 0.233 0.277 2 Structure Amidst Differing Agendas
Fed 74:Hamilton 0.402 0.183 0.228 0.187 1 Powers and Limitations of the Executive Branch
Fed 27:Hamilton 0.232 0.356 0.219 0.193 2 Structure Amidst Differing Agendas
Fed 1:Hamilton 0.181 0.353 0.218 0.248 2 Structure Amidst Differing Agendas
Fed 9:Hamilton 0.198 0.283 0.218 0.301 4 Internal/External Relations and Conflict
Fed 52:Disputed 0.176 0.540 0.216 0.068 2 Structure Amidst Differing Agendas
Fed 28:Hamilton 0.100 0.348 0.206 0.346 2 Structure Amidst Differing Agendas
Fed 53:Disputed 0.196 0.525 0.203 0.076 2 Structure Amidst Differing Agendas
Fed 50:Disputed 0.301 0.356 0.202 0.141 2 Structure Amidst Differing Agendas
Fed 37:Madison 0.183 0.481 0.200 0.136 2 Structure Amidst Differing Agendas
Fed 47:Madison 0.664 0.099 0.199 0.037 1 Powers and Limitations of the Executive Branch
Fed 13:Hamilton 0.086 0.424 0.198 0.292 2 Structure Amidst Differing Agendas
Fed 17:Hamilton 0.209 0.351 0.192 0.248 2 Structure Amidst Differing Agendas
Fed 66:Hamilton 0.509 0.251 0.191 0.049 1 Powers and Limitations of the Executive Branch
Fed 3:Jay 0.209 0.333 0.189 0.270 2 Structure Amidst Differing Agendas
Fed 19:Madison 0.127 0.166 0.189 0.518 4 Internal/External Relations and Conflict
Fed 16:Hamilton 0.191 0.277 0.185 0.347 4 Internal/External Relations and Conflict
Fed 48:Madison 0.443 0.306 0.182 0.070 1 Powers and Limitations of the Executive Branch
Fed 77:Hamilton 0.502 0.278 0.176 0.045 1 Powers and Limitations of the Executive Branch
Fed 73:Hamilton 0.432 0.255 0.175 0.139 1 Powers and Limitations of the Executive Branch
Fed 2:Jay 0.156 0.350 0.174 0.320 2 Structure Amidst Differing Agendas
Fed 20:Madison 0.230 0.210 0.174 0.385 4 Internal/External Relations and Conflict
Fed 35:Hamilton 0.149 0.413 0.169 0.269 2 Structure Amidst Differing Agendas
Fed 56:Disputed 0.091 0.654 0.168 0.088 2 Structure Amidst Differing Agendas
Fed 46:Madison 0.123 0.545 0.164 0.168 2 Structure Amidst Differing Agendas
Fed 57:Disputed 0.184 0.558 0.164 0.093 2 Structure Amidst Differing Agendas
Fed 62:Disputed 0.238 0.458 0.163 0.141 2 Structure Amidst Differing Agendas
Fed 64:Jay 0.454 0.274 0.163 0.109 1 Powers and Limitations of the Executive Branch
Fed 76:Hamilton 0.543 0.248 0.160 0.048 1 Powers and Limitations of the Executive Branch
Fed 70:Hamilton 0.437 0.280 0.154 0.128 1 Powers and Limitations of the Executive Branch
Fed 75:Hamilton 0.447 0.266 0.149 0.137 1 Powers and Limitations of the Executive Branch
Fed 60:Hamilton 0.219 0.475 0.147 0.159 2 Structure Amidst Differing Agendas
Fed 12:Hamilton 0.080 0.312 0.146 0.462 4 Internal/External Relations and Conflict
Fed 55:Disputed 0.160 0.602 0.144 0.093 2 Structure Amidst Differing Agendas
Fed 51:Disputed 0.219 0.546 0.143 0.093 2 Structure Amidst Differing Agendas
Fed 58:Disputed 0.168 0.607 0.142 0.083 2 Structure Amidst Differing Agendas
Fed 14:Madison 0.086 0.461 0.135 0.318 2 Structure Amidst Differing Agendas
Fed 18:Madison 0.171 0.132 0.133 0.564 4 Internal/External Relations and Conflict
Fed 8:Hamilton 0.068 0.276 0.129 0.526 4 Internal/External Relations and Conflict
Fed 11:Hamilton 0.097 0.229 0.122 0.552 4 Internal/External Relations and Conflict
Fed 63:Disputed 0.208 0.550 0.122 0.120 2 Structure Amidst Differing Agendas
Fed 72:Hamilton 0.439 0.234 0.121 0.207 1 Powers and Limitations of the Executive Branch
Fed 6:Hamilton 0.198 0.195 0.115 0.492 4 Internal/External Relations and Conflict
Fed 5:Jay 0.159 0.228 0.101 0.512 4 Internal/External Relations and Conflict
Fed 68:Hamilton 0.460 0.349 0.099 0.092 1 Powers and Limitations of the Executive Branch
Fed 4:Jay 0.142 0.260 0.098 0.501 4 Internal/External Relations and Conflict
Fed 71:Hamilton 0.495 0.307 0.089 0.109 1 Powers and Limitations of the Executive Branch
Fed 10:Madison 0.131 0.693 0.079 0.096 2 Structure Amidst Differing Agendas


Most Likely Papers for Topic 4 - Theme: Internal/External Relations and Conflict

And lastly, an ordering by topic 4. This is another set of papers that looks to agree with the chosen theme: the role of a federal government as it deals with international relations and conflict…

  • Federalist 18: The Same Subject Continued (The Insufficiency of the Present Confederation to Preserve the Union)

  • Federalist 11: The Utility of the Union in Respect to Commercial Relations and a Navy

  • Federalist 8: The Consequences of Hostilities Between the States

  • Federalist 19: The Same Subject Continued (The Insufficiency of the Present Confederation to Preserve the Union)

  • Federalist 5: The Same Subject Continued (Concerning Dangers From Foreign Force and Influence)



Paper pTopic1 pTopic2 pTopic3 pTopic4 Topic# TopicDesc
Fed 18:Madison 0.171 0.132 0.133 0.564 4 Internal/External Relations and Conflict
Fed 11:Hamilton 0.097 0.229 0.122 0.552 4 Internal/External Relations and Conflict
Fed 8:Hamilton 0.068 0.276 0.129 0.526 4 Internal/External Relations and Conflict
Fed 19:Madison 0.127 0.166 0.189 0.518 4 Internal/External Relations and Conflict
Fed 5:Jay 0.159 0.228 0.101 0.512 4 Internal/External Relations and Conflict
Fed 4:Jay 0.142 0.260 0.098 0.501 4 Internal/External Relations and Conflict
Fed 6:Hamilton 0.198 0.195 0.115 0.492 4 Internal/External Relations and Conflict
Fed 12:Hamilton 0.080 0.312 0.146 0.462 4 Internal/External Relations and Conflict
Fed 25:Hamilton 0.139 0.217 0.256 0.388 4 Internal/External Relations and Conflict
Fed 20:Madison 0.230 0.210 0.174 0.385 4 Internal/External Relations and Conflict
Fed 7:Hamilton 0.107 0.253 0.274 0.365 4 Internal/External Relations and Conflict
Fed 24:Hamilton 0.143 0.212 0.290 0.355 4 Internal/External Relations and Conflict
Fed 16:Hamilton 0.191 0.277 0.185 0.347 4 Internal/External Relations and Conflict
Fed 28:Hamilton 0.100 0.348 0.206 0.346 2 Structure Amidst Differing Agendas
Fed 29:Hamilton 0.165 0.243 0.260 0.332 4 Internal/External Relations and Conflict
Fed 41:Madison 0.074 0.273 0.325 0.328 4 Internal/External Relations and Conflict
Fed 34:Hamilton 0.142 0.242 0.291 0.325 4 Internal/External Relations and Conflict
Fed 2:Jay 0.156 0.350 0.174 0.320 2 Structure Amidst Differing Agendas
Fed 14:Madison 0.086 0.461 0.135 0.318 2 Structure Amidst Differing Agendas
Fed 30:Hamilton 0.140 0.247 0.297 0.316 4 Internal/External Relations and Conflict
Fed 15:Hamilton 0.211 0.238 0.238 0.313 4 Internal/External Relations and Conflict
Fed 26:Hamilton 0.168 0.257 0.263 0.313 4 Internal/External Relations and Conflict
Fed 9:Hamilton 0.198 0.283 0.218 0.301 4 Internal/External Relations and Conflict
Fed 13:Hamilton 0.086 0.424 0.198 0.292 2 Structure Amidst Differing Agendas
Fed 22:Hamilton 0.193 0.296 0.233 0.277 2 Structure Amidst Differing Agendas
Fed 21:Hamilton 0.077 0.272 0.379 0.271 3 A Nation of Laws
Fed 3:Jay 0.209 0.333 0.189 0.270 2 Structure Amidst Differing Agendas
Fed 35:Hamilton 0.149 0.413 0.169 0.269 2 Structure Amidst Differing Agendas
Fed 1:Hamilton 0.181 0.353 0.218 0.248 2 Structure Amidst Differing Agendas
Fed 17:Hamilton 0.209 0.351 0.192 0.248 2 Structure Amidst Differing Agendas
Fed 23:Hamilton 0.124 0.282 0.349 0.244 3 A Nation of Laws
Fed 38:Madison 0.213 0.279 0.283 0.225 3 A Nation of Laws
Fed 72:Hamilton 0.439 0.234 0.121 0.207 1 Powers and Limitations of the Executive Branch
Fed 31:Hamilton 0.159 0.270 0.369 0.202 3 A Nation of Laws
Fed 27:Hamilton 0.232 0.356 0.219 0.193 2 Structure Amidst Differing Agendas
Fed 74:Hamilton 0.402 0.183 0.228 0.187 1 Powers and Limitations of the Executive Branch
Fed 85:Hamilton 0.186 0.316 0.313 0.185 2 Structure Amidst Differing Agendas
Fed 43:Madison 0.104 0.326 0.399 0.172 3 A Nation of Laws
Fed 45:Madison 0.138 0.448 0.242 0.171 2 Structure Amidst Differing Agendas
Fed 46:Madison 0.123 0.545 0.164 0.168 2 Structure Amidst Differing Agendas
Fed 60:Hamilton 0.219 0.475 0.147 0.159 2 Structure Amidst Differing Agendas
Fed 36:Hamilton 0.174 0.329 0.345 0.152 3 A Nation of Laws
Fed 42:Madison 0.105 0.153 0.600 0.142 3 A Nation of Laws
Fed 50:Disputed 0.301 0.356 0.202 0.141 2 Structure Amidst Differing Agendas
Fed 62:Disputed 0.238 0.458 0.163 0.141 2 Structure Amidst Differing Agendas
Fed 73:Hamilton 0.432 0.255 0.175 0.139 1 Powers and Limitations of the Executive Branch
Fed 75:Hamilton 0.447 0.266 0.149 0.137 1 Powers and Limitations of the Executive Branch
Fed 37:Madison 0.183 0.481 0.200 0.136 2 Structure Amidst Differing Agendas
Fed 79:Hamilton 0.308 0.269 0.289 0.133 1 Powers and Limitations of the Executive Branch
Fed 70:Hamilton 0.437 0.280 0.154 0.128 1 Powers and Limitations of the Executive Branch
Fed 59:Hamilton 0.249 0.332 0.293 0.126 2 Structure Amidst Differing Agendas
Fed 61:Hamilton 0.159 0.445 0.271 0.125 2 Structure Amidst Differing Agendas
Fed 63:Disputed 0.208 0.550 0.122 0.120 2 Structure Amidst Differing Agendas
Fed 64:Jay 0.454 0.274 0.163 0.109 1 Powers and Limitations of the Executive Branch
Fed 71:Hamilton 0.495 0.307 0.089 0.109 1 Powers and Limitations of the Executive Branch
Fed 69:Hamilton 0.446 0.099 0.347 0.107 1 Powers and Limitations of the Executive Branch
Fed 80:Hamilton 0.096 0.132 0.674 0.099 3 A Nation of Laws
Fed 10:Madison 0.131 0.693 0.079 0.096 2 Structure Amidst Differing Agendas
Fed 33:Hamilton 0.150 0.158 0.597 0.095 3 A Nation of Laws
Fed 51:Disputed 0.219 0.546 0.143 0.093 2 Structure Amidst Differing Agendas
Fed 55:Disputed 0.160 0.602 0.144 0.093 2 Structure Amidst Differing Agendas
Fed 57:Disputed 0.184 0.558 0.164 0.093 2 Structure Amidst Differing Agendas
Fed 68:Hamilton 0.460 0.349 0.099 0.092 1 Powers and Limitations of the Executive Branch
Fed 78:Hamilton 0.345 0.218 0.347 0.090 3 A Nation of Laws
Fed 44:Madison 0.164 0.139 0.608 0.089 3 A Nation of Laws
Fed 65:Hamilton 0.385 0.266 0.261 0.089 1 Powers and Limitations of the Executive Branch
Fed 40:Madison 0.178 0.215 0.519 0.088 3 A Nation of Laws
Fed 56:Disputed 0.091 0.654 0.168 0.088 2 Structure Amidst Differing Agendas
Fed 84:Hamilton 0.167 0.275 0.471 0.087 3 A Nation of Laws
Fed 58:Disputed 0.168 0.607 0.142 0.083 2 Structure Amidst Differing Agendas
Fed 53:Disputed 0.196 0.525 0.203 0.076 2 Structure Amidst Differing Agendas
Fed 48:Madison 0.443 0.306 0.182 0.070 1 Powers and Limitations of the Executive Branch
Fed 49:Disputed 0.294 0.399 0.237 0.070 2 Structure Amidst Differing Agendas
Fed 67:Hamilton 0.488 0.116 0.326 0.069 1 Powers and Limitations of the Executive Branch
Fed 52:Disputed 0.176 0.540 0.216 0.068 2 Structure Amidst Differing Agendas
Fed 32:Hamilton 0.110 0.099 0.733 0.059 3 A Nation of Laws
Fed 82:Hamilton 0.083 0.124 0.738 0.054 3 A Nation of Laws
Fed 39:Madison 0.253 0.400 0.295 0.053 2 Structure Amidst Differing Agendas
Fed 66:Hamilton 0.509 0.251 0.191 0.049 1 Powers and Limitations of the Executive Branch
Fed 76:Hamilton 0.543 0.248 0.160 0.048 1 Powers and Limitations of the Executive Branch
Fed 83:Hamilton 0.113 0.184 0.655 0.048 3 A Nation of Laws
Fed 77:Hamilton 0.502 0.278 0.176 0.045 1 Powers and Limitations of the Executive Branch
Fed 54:Disputed 0.124 0.580 0.255 0.041 2 Structure Amidst Differing Agendas
Fed 81:Hamilton 0.178 0.130 0.653 0.039 3 A Nation of Laws
Fed 47:Madison 0.664 0.099 0.199 0.037 1 Powers and Limitations of the Executive Branch

LDAvis for Visualization and a User-Defined Function to Serve It:

This is the code you’ll run to visualize the topics using functions in the LDAvis package. Full disclosure here: the topicmodels_json_ldavis() user-defined function was provided to me by the professors in my classwork. I don’t know for sure if they derived it themselves or found it somewhere. I’m guessing they found it in this May 2015 link to r-bloggers.com, given the similarity of the code I was supplied.

To that end, I’ve made some tweaks and replaced the inspect() call with as.matrix(). The behavior of inspect() has changed between the version of the tm package in use when that code was written and the current one. It seems to me that tm occasionally breaks backward compatibility (two separate occurrences for me now), but it’s not been a big deal to fix. I also changed a line or two to call the str_count() function in the stringr library we already loaded earlier. That lets me use one less library than I originally did.

The serVis() command from LDAvis takes the JSON output of the user-defined function. If the ‘open.browser’ parameter is TRUE, it starts a local server and opens an interactive session in your browser. It also writes the visualization’s files to the vis2 directory, so the result can be run interactively later without having to serve it up from R. More on that below.


# Function for visualization
library(topicmodels)  # posterior()
library(stringr)      # str_count()
library(LDAvis)       # createJSON(), serVis()

topicmodels_json_ldavis <- function(fitted, corpus, doc_term) {
  # Posterior topic-term (phi) and document-topic (theta) distributions
  post <- posterior(fitted)
  phi <- as.matrix(post$terms)
  theta <- as.matrix(post$topics)
  vocab <- colnames(phi)

  # Number of whitespace-delimited tokens in each document
  doc_length <- vapply(seq_along(corpus), function(i) {
    temp <- paste(corpus[[i]]$content, collapse = ' ')
    str_count(temp, pattern = '\\S+')
  }, integer(1))

  # Overall frequency of each term; as.matrix() replaces the older inspect() call
  # temp_frequency <- inspect(doc_term)
  temp_frequency <- as.matrix(doc_term)
  freq_matrix <- data.frame(ST = colnames(temp_frequency),
                            Freq = colSums(temp_frequency))

  # Convert to JSON
  json_lda <- LDAvis::createJSON(phi = phi, theta = theta, vocab = vocab,
                                 doc.length = doc_length,
                                 term.frequency = freq_matrix$Freq)
  return(json_lda)
}

json1 <- topicmodels_json_ldavis(ldaOut4, documents, dtMatrix)
# You might want the files to go to a specific output directory. If so, change
# 'vis2' to a full path destination.
serVis(json1, out.dir = 'vis2', open.browser = TRUE)

Visual Interpretation of Topic Separation and Word Relevance

In the images related to our four topics, and for the output of the serVis() function in general, the display is broken up into two sections: the plot of inter-topic separation on the left and the relevance of word distributions on the right.

The separation of topics as regions on the PC1 and PC2 axes is a result of plotting inter-topic distances using Principal Component Analysis (PCA). On page 68 of Sievert and Shirley’s paper, the authors of the LDAvis package state:

The default for scaling the set of inter-topic distances defaults to Principal Components, but other algorithms are also enabled.

Principal Component Analysis is explained quite nicely here. Suffice it to say that we’re looking to take the variance that matters across a dataset’s predictors (here, the word counts) and summarize it in fewer, more significant dimensions. Here we use just two dimensions for ease of visualization, but it could as easily be three, four, or more. We’ll do a walkthrough of PCA another time…
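To make the idea a bit more concrete, here’s a minimal base-R sketch of how I understand the default scaling in LDAvis to work: it measures the Jensen-Shannon divergence between each pair of topic-term distributions and then uses classical multidimensional scaling (cmdscale(), a close cousin of PCA) to get two plotting coordinates. The phi values and the jensen_shannon() helper name below are made up for illustration; they are not taken from our fitted model or from the package source.

```r
# Toy topic-term matrix: 4 topics (rows) over 5 terms (columns); rows sum to 1.
# ASSUMPTION: these numbers are invented for illustration only.
phi <- rbind(
  c(0.50, 0.20, 0.10, 0.10, 0.10),
  c(0.45, 0.25, 0.10, 0.10, 0.10),
  c(0.10, 0.10, 0.50, 0.20, 0.10),
  c(0.05, 0.05, 0.10, 0.30, 0.50)
)
rownames(phi) <- paste0("Topic", 1:4)

# Jensen-Shannon divergence between two probability vectors
jensen_shannon <- function(p, q) {
  m <- 0.5 * (p + q)
  kl <- function(a, b) sum(ifelse(a == 0, 0, a * (log(a) - log(b))))
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}

# Pairwise divergence matrix between topics
n_topics <- nrow(phi)
dist_mat <- matrix(0, n_topics, n_topics,
                   dimnames = list(rownames(phi), rownames(phi)))
for (i in seq_len(n_topics)) {
  for (j in seq_len(n_topics)) {
    dist_mat[i, j] <- jensen_shannon(phi[i, ], phi[j, ])
  }
}

# Classical multidimensional scaling down to two plotting coordinates
coords <- cmdscale(as.dist(dist_mat), k = 2)
colnames(coords) <- c("PC1", "PC2")
print(round(coords, 3))
```

Topics 1 and 2 have nearly identical word distributions in this toy matrix, so their points land close together, which is the same overlap effect you would see as intersecting circles in the real plot.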

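The ranking of terms in the right-hand panel is controlled by the slider for lambda, which sets how a word’s relevance to the highlighted topic is weighted. When set to 1, terms are ranked purely by their probability within that topic, which favors words that are also common across the whole corpus. When set to 0, terms are ranked purely by lift (the word’s probability within the topic relative to its overall probability), which favors words that are nearly exclusive to the topic. Values in between blend the two. Also on page 68 of their paper: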

If lambda = 1, terms are ranked solely by phi, which implies the red bars would be sorted from widest (at the top) to narrowest (at the bottom). By comparing the widths of the red and gray bars for a given term, users can quickly understand whether a term is highly relevant to the selected topic because of its lift (a high ratio of red to gray), or its probability (absolute width of red).
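As a rough sketch of that weighting, Sievert and Shirley define the relevance of a term to a topic as lambda * log(phi) + (1 - lambda) * log(phi / p), where phi is the term’s probability within the topic and p is its overall probability in the corpus. The terms and probabilities below are hypothetical numbers chosen for illustration, not output from our model.

```r
# ASSUMPTION: hypothetical values for a single topic, for illustration only.
# phi_k is P(term | topic k); p_w is the marginal P(term) over the corpus.
terms <- c("government", "executive", "senate", "treaties")
phi_k <- c(0.050, 0.030, 0.020, 0.010)
p_w   <- c(0.040, 0.008, 0.006, 0.002)

# Relevance as defined in the LDAvis paper
relevance <- function(phi_k, p_w, lambda) {
  lambda * log(phi_k) + (1 - lambda) * log(phi_k / p_w)
}

# lambda = 1: ranked by probability within the topic
rank_prob <- terms[order(relevance(phi_k, p_w, lambda = 1), decreasing = TRUE)]
# lambda = 0: ranked by lift (topic probability relative to overall probability)
rank_lift <- terms[order(relevance(phi_k, p_w, lambda = 0), decreasing = TRUE)]

print(rank_prob)
print(rank_lift)
```

With these made-up numbers the common word leads the list at lambda = 1, while the rarer but more topic-specific word rises to the top at lambda = 0, which is the trade-off the slider lets you explore.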

So, here are our top terms by topic, this time with screenshots and not in tabular format. Or try it interactively here using the objects in the ‘vis2’ working directory that serVis() wrote out from the JSON produced by the topicmodels_json_ldavis() user-defined function.

Curious about how to embed an LDAvis construct on your website? If so, the link at the bottom of this post for Sievert’s LDAvis GitHub page has an example of how to format the URL using a path through that working directory.


Topic 1 - Powers and Limitations of the Executive Branch


Topic 2 - Structure Amidst Differing Agendas


Topic 3 - A Nation of Laws


Topic 4 - Internal/External Relations and Conflict


Conclusions

So, what did we learn here? Well, a couple of things:

  • Text cleaning is unavoidable and oftentimes context-dependent. What might be a good cleaning methodology with one set of documents may not necessarily be appropriate for another set of documents. A strategy of using a mix of base R code along with string-processing packages in the tidyverse (like stringr) will put you well on the way to gaining a comfort level with whatever is thrown your way. Also, the functions out of the tm package that remove stopwords, extra punctuation and perform word stemming are crucial tools of the trade.

  • Formatting data in R for word compilation uses object types that are easy to work with and understand, again provided by the tm package. A DocumentTermMatrix is exactly what it says it is - a matrix with words as columns and documents as rows, with each cell containing a word count in that document.

  • Models like LDA concisely address the needed mathematics and provide objects with components - like the set of gamma probabilities - that service simple calls such as topics() and terms().

  • Visualization of - and interaction with - topics and their content is as simple as getting a good handle on the LDAvis package.
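On the point about gamma servicing calls like topics(): the sketch below assumes, as I guessed earlier, that the most likely topic is simply the column with the largest gamma probability in each document’s row. The four rows are copied from the tables above; the which.max() logic is my assumption about what happens under the hood, not code lifted from topicmodels.

```r
# Per-document topic probabilities for four papers, taken from the tables above
gamma <- rbind(
  "Fed 47" = c(0.664, 0.099, 0.199, 0.037),
  "Fed 10" = c(0.131, 0.693, 0.079, 0.096),
  "Fed 18" = c(0.171, 0.132, 0.133, 0.564),
  "Fed 82" = c(0.083, 0.124, 0.738, 0.054)
)
colnames(gamma) <- paste0("pTopic", 1:4)

# ASSUMPTION: the most likely topic is the column holding the row maximum
most_likely <- apply(gamma, 1, which.max)
print(most_likely)
```

The result matches the Topic# column for those same four papers in the tables: 1, 2, 4 and 3 respectively.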


A Critique of Processes and Decisions Made

So, what can we do better to make the results a little more accurate? The first thing that comes to mind is a more thorough inspection and cleaning of the data. We can probably do a better job tossing out words that don’t have a thematic impact. The most notable instances are the numbers that these documents spell out in full and that we would more often write today as numerals.

An example from Federalist 12:

…Hitherto, I believe, it may safely be asserted, that these duties have not upon an average exceeded in any State three per cent. In France they are estimated to be about fifteen per cent., and in Britain they exceed this proportion. …Upon a ratio to the importation into this State, the whole quantity imported into the United States may be estimated at four millions of gallons; which, at a shilling per gallon, would produce two hundred thousand pounds.

An example from Federalist 56:

The number of inhabitants in the two kingdoms of England and Scotland cannot be stated at less than eight millions. The representatives of these eight millions in the House of Commons amount to five hundred and fifty-eight. Of this number, one ninth are elected by three hundred and sixty-four persons, and one half, by five thousand seven hundred and twenty-three persons.
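One way to approach that cleanup, sketched here in base R, is to strip spelled-out number words with a regular expression before building the DocumentTermMatrix. The number_words list is a hypothetical and deliberately incomplete starting point, and remove_number_words() is a helper name I’ve made up for this example.

```r
# ASSUMPTION: a hypothetical, incomplete list of spelled-out number words
number_words <- c("one", "two", "three", "four", "five", "six", "seven",
                  "eight", "nine", "ten", "fifteen", "fifty", "hundred",
                  "thousand", "million", "millions")

# Match any listed word as a whole word only (so "none" or "often" survive)
pattern <- paste0("\\b(", paste(number_words, collapse = "|"), ")\\b")

remove_number_words <- function(text) {
  cleaned <- gsub(pattern, " ", text, ignore.case = TRUE, perl = TRUE)
  # Collapse the leftover runs of whitespace
  trimws(gsub("\\s+", " ", cleaned))
}

sample_text <- paste("may be estimated at four millions of gallons; which,",
                     "at a shilling per gallon, would produce two hundred",
                     "thousand pounds")
cleaned_text <- remove_number_words(sample_text)
print(cleaned_text)
```

If you are already working with a tm corpus, the same vector could likely be handed to removeWords() through tm_map() alongside the standard stopword list, which would keep all of the word removal in one place.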

Also, there’s nothing precluding us from expanding the topic set. The choice of four topics was arbitrary. In theory, we could keep adding topics as long as distinct themes presented themselves. That’s the nice thing about modeling the inter-topic distance through PCA: we can tell when topics overlap. That’s one of the things I noticed earlier when using the SnowballC stemmer function; it produced overlapping circles for topics one and two, whereas the standard stemmer did not.

Those of you that have more experience on the subject than I do, please comment!! I’d greatly appreciate your thoughts and suggestions for improving this series of posts for people who’d like to explore this subject further…

Further Information on the Subject:

If you’d like to pull down the code that I used for this series of posts, feel free!

The original paper from the LDAvis authors is highly illuminating. One of them, Kenny Shirley, lets you interactively work with an LDAvis object on his website. Its other author, Carson Sievert, maintains the GitHub repo for LDAvis with the latest code and related information.

Here’s a wonderful slide deck and walkthrough of topic model visualization by Ben Mabey.