Friday, July 3, 2020

Topic Modeling: Identifying Patterns in Text Data

Documents are usually grouped by topic. Sometimes we need to identify the patterns in a text that correspond to a particular topic; the technique for doing this is called topic modeling. In other words, topic modeling uncovers abstract themes or hidden structure in a given set of documents.


We can use the topic modeling technique in the following scenarios:

1. Text Classification

Topic modeling can improve classification because it groups similar words together into topics, so a classifier can work with a few topic features rather than treating each word as a separate feature.

2. Recommender Systems

With the help of topic modeling, we can build recommender systems that compare items by using similarity measures.
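As a rough illustration of the similarity idea, the sketch below ranks documents by cosine similarity between their topic mixtures. The article names, the topic proportions, and the helper names `cosine_similarity` and `recommend` are all made up for this example; cosine similarity is just one possible measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical topic mixtures per document (three topics each).
# In practice these would come from a fitted topic model.
doc_topics = {
    "article_a": [0.80, 0.10, 0.10],
    "article_b": [0.70, 0.20, 0.10],
    "article_c": [0.05, 0.15, 0.80],
}

def recommend(liked, doc_topics, top_n=1):
    """Return the top_n documents most similar to the one the user liked."""
    scores = [
        (cosine_similarity(doc_topics[liked], vec), name)
        for name, vec in doc_topics.items()
        if name != liked
    ]
    scores.sort(reverse=True)
    return [name for _, name in scores[:top_n]]

print(recommend("article_a", doc_topics))
```

Here article_b shares most of its topic mass with article_a, so it is ranked first.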

Algorithms for Topic Modeling

Several algorithms can be used to implement topic modeling. They include the following:

1. Latent Dirichlet Allocation (LDA)

This is the most popular algorithm for topic modeling. It uses a probabilistic graphical model that represents each document as a mixture of topics. In Python, one way to use the LDA algorithm is through the gensim package.

2. Latent Semantic Analysis (LSA) or Latent Semantic Indexing (LSI)

This algorithm is based on linear algebra. It applies SVD (Singular Value Decomposition) to the document-term matrix.
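The SVD step can be sketched with NumPy as below. The small count matrix and the choice of k = 2 are invented for the example, and real pipelines often weight the counts (for instance with TF-IDF) before decomposing; that step is left out here.

```python
import numpy as np

# Made-up document-term matrix: rows are documents, columns are words.
vocab = ["cat", "dog", "pet", "stock", "market", "price"]
X = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
], dtype=float)

k = 2  # number of latent topics to keep (an assumption for this toy data)

# Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the k strongest components.
doc_topic = U[:, :k] * s[:k]   # documents in the latent space, shape (4, 2)
topic_word = Vt[:k, :]         # topics expressed over words, shape (2, 6)

# Rank-k approximation of the original matrix.
X_approx = doc_topic @ topic_word

print(doc_topic.shape, topic_word.shape)
print(np.round(X_approx, 2))
```

Documents that use the same words end up close together in the reduced space, which is what makes the latent dimensions usable as topics.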

3. Non-Negative Matrix Factorization (NMF)

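This algorithm is also based on linear algebra: it factorizes the document-term matrix into two matrices that contain no negative entries.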
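One simple way to compute such a factorization is with multiplicative updates, sketched below in NumPy. This is an illustrative implementation rather than what any particular library does; the function name `nmf`, the iteration count, and the toy matrix are assumptions.

```python
import numpy as np

def nmf(X, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix X (docs x words) into W (docs x topics)
    and H (topics x words) using multiplicative updates that reduce the
    squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    # Random positive starting point; the updates keep entries non-negative.
    W = rng.random((n_docs, n_topics)) + eps
    H = rng.random((n_topics, n_words)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Made-up document-term counts: two "pet" documents, two "finance" documents.
X = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
], dtype=float)

W, H = nmf(X, n_topics=2)
print(np.round(W, 2))  # document-topic weights
print(np.round(H, 2))  # topic-word weights
```

Because every entry stays non-negative, each row of H can be read directly as a list of word weights for one topic.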

All of the above algorithms take the number of topics as a parameter and a document-word matrix as input, and produce two matrices as output: a WTM (Word Topic Matrix) and a TDM (Topic Document Matrix).
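To make the input concrete, here is a small sketch that builds a document-word matrix from raw strings using only the standard library. The sample sentences, the whitespace tokenization, and the function name `document_word_matrix` are assumptions for illustration.

```python
from collections import Counter

# Made-up sample documents.
docs = [
    "the cat chased the dog",
    "the dog chased the ball",
    "stock prices rose",
    "investors bought stock",
]

def document_word_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] is the number of times
    vocab[j] occurs in docs[i]. Tokenization is a plain whitespace split."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[w] for w in vocab])
    return vocab, matrix

vocab, matrix = document_word_matrix(docs)
print(vocab)
for row in matrix:
    print(row)
```

With D documents, V vocabulary words, and K topics, the input has shape D x V, and the two outputs relate words to topics (V x K) and topics to documents (K x D).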

