LDA

LDA stands for Latent Dirichlet Allocation. It is commonly used to analyze language patterns and semantics, but it can also be applied to non-textual data, as long as the data consists only of non-negative integers (counts).
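The integer-counts requirement comes from LDA treating each document as a bag of discrete occurrences. Here is a minimal sketch that assumes a hypothetical non-textual dataset. Each row is a customer (playing the role of a document), each column is a product (playing the role of a word), and each cell is the number of times that customer bought that product. The helper `counts_to_tokens` is also hypothetical. It rejects any cell that is not a non-negative integer (zero simply means the product was never bought) and expands each row into the list of token ids an LDA implementation would consume.

```python
from typing import List, Sequence


def counts_to_tokens(count_matrix: Sequence[Sequence[int]]) -> List[List[int]]:
    """Expand a document-by-feature count matrix into per-document token-id lists.

    Hypothetical helper: if row d, column w holds c, feature w occurs c times in document d.
    """
    documents = []
    for d, row in enumerate(count_matrix):
        tokens = []
        for w, count in enumerate(row):
            # LDA models occurrence counts, so fractional, negative or boolean values are rejected.
            if isinstance(count, bool) or not isinstance(count, int) or count < 0:
                raise ValueError(f"cell ({d}, {w}) must be a non-negative integer, got {count!r}")
            tokens.extend([w] * count)
        documents.append(tokens)
    return documents


# Hypothetical purchase counts: 3 customers x 4 products.
purchases = [
    [2, 0, 1, 0],
    [0, 3, 0, 1],
    [1, 0, 0, 2],
]
print(counts_to_tokens(purchases))
# → [[0, 0, 2], [1, 1, 1, 3], [0, 3, 3]]
```

Once the data is in this form, product ids can be fed to LDA exactly as word ids would be, and the resulting "topics" become groups of products that tend to be bought together.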

LDA is a topic modelling method: a statistical model that, given a set of documents, infers the topics that emerge from that set. A topic is defined by a group of words that frequently appear together. Within a set of documents, the same word can appear in several topics, carrying a different weight in each one.
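The idea of topics as weighted groups of co-occurring words can be sketched with a small LDA fitted by collapsed Gibbs sampling. This is one common inference method; others, such as variational Bayes, are also widely used. The code is a minimal, unoptimized illustration on a made-up toy corpus, not a production implementation. The function name `fit_lda`, the toy documents and the hyperparameter values (`alpha=0.1`, `beta=0.01`, `n_iter=500`) are all assumptions chosen for the example.

```python
import random
from typing import Dict, List


def fit_lda(docs: List[List[str]], n_topics: int, alpha: float = 0.1,
            beta: float = 0.01, n_iter: int = 500, seed: int = 0) -> List[Dict[str, float]]:
    """Fit LDA with collapsed Gibbs sampling and return each topic's word weights."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables: document-topic, topic-word, and total tokens per topic.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * V for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Randomly assign an initial topic to every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Point estimate of each topic's word distribution (the "word weights").
    topics = []
    for k in range(n_topics):
        denom = n_k[k] + V * beta
        topics.append({w: (n_kw[k][word_id[w]] + beta) / denom for w in vocab})
    return topics


# Hypothetical toy corpus with two obvious themes.
docs = [
    "apple banana apple fruit".split(),
    "banana fruit apple banana".split(),
    "goal match team goal".split(),
    "team match goal player".split(),
]
topics = fit_lda(docs, n_topics=2)
for k, weights in enumerate(topics):
    top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(k, [(w, round(p, 2)) for w, p in top])
```

Because the `beta` smoothing gives every word a small, nonzero weight in every topic, the same word can show up in several topics with different weights, as described in the paragraph above. The exact output depends on the seed, the priors and `n_topics`.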

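Being a statistical model, LDA outputs probabilities: how much each topic contributes to each document, and how much weight each word carries within each topic. These results depend not only on the data but also on parameters set in advance, most importantly the number of topics, which in turn shapes the composition of each topic and its word weights.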
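To see where these probabilities come from, consider the collapsed Gibbs sampling formulation of LDA with symmetric Dirichlet priors. It uses $K$ topics and a vocabulary of $V$ words. The prior $\alpha$ applies to each document's topic mixture, and $\beta$ applies to each topic's word distribution. Under these assumptions, the usual point estimates are shown below. Here $n_{d,k}$ is the number of tokens in document $d$ assigned to topic $k$, and $n_{k,w}$ is the number of occurrences of word $w$ assigned to topic $k$. The document length is $n_d = \sum_k n_{d,k}$, and $n_k = \sum_w n_{k,w}$.

```latex
\hat{\theta}_{d,k} = \frac{n_{d,k} + \alpha}{n_d + K\alpha},
\qquad
\hat{\phi}_{k,w} = \frac{n_{k,w} + \beta}{n_k + V\beta}
```

$\hat{\theta}_{d,k}$ is the estimated probability of topic $k$ in document $d$, and $\hat{\phi}_{k,w}$ is the weight of word $w$ in topic $k$. Each sums to 1 over $k$ (respectively $w$), which is why the output can be read as probabilities. The set parameters $K$, $\alpha$ and $\beta$ appear right next to the data counts.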

This video recorded by Carson Sievert (Iowa State University) shows the interactive tool LDAvis (ported to Python as pyLDAvis) being used to display LDA topics and words.
