Techniques for Topic Modeling

 

Topic modeling is about logically correlating multiple words. Say a telecom operator wants to find out whether poor network quality is a reason for low customer satisfaction. Here, “bad network” is the topic. The documents are analyzed for words like “bad”, “slow speed”, “call not connecting”, etc., which are more likely to describe network problems than common words like “the” or “and”.

 

Latent Semantic Analysis (LSA)

 

Latent Semantic Analysis (LSA) aims to leverage the context around words in order to capture hidden concepts or topics.

 

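In this technique, documents are analyzed using Term Frequency-Inverse Document Frequency (TF-IDF). TF-IDF is a numerical statistic that reflects how important a word is to a document within a corpus. It increases with how often the word appears in the document and decreases with how many documents in the corpus contain it, so common words like “the” receive low scores.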
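As a minimal sketch of how such scores can be computed, the plain-Python function below assumes lowercase whitespace tokenization, term frequency as a word's count divided by the document length, and the unsmoothed IDF log(N / df). Libraries such as scikit-learn's TfidfVectorizer apply smoothed and normalized variants by default, so their numbers will differ. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return (vocab, matrix) where matrix[i][j] is the TF-IDF of vocab[j] in docs[i].

    Assumptions: whitespace tokenization, TF = count / document length,
    IDF = log(N / df) without smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word occurs at least once.
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid dividing by zero for an empty document
        matrix.append([counts[w] / total * idf[w] for w in vocab])
    return vocab, matrix


docs = [
    "the network is slow",
    "call not connecting the network is bad",
    "the staff was friendly",
]
vocab, matrix = tf_idf(docs)
```

In this toy corpus “the” appears in every document, so its IDF is log(1) = 0 and it scores 0 everywhere, while “slow”, which appears in only the first document, gets the highest score in that document.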

 

Say there is a set of ‘m’ text documents whose combined vocabulary contains ‘n’ unique words. The m × n TF-IDF matrix contains the TF-IDF score of every word in every document. This matrix is then reduced to ‘k’ dimensions, k being the desired number of topics. The reduction is done using Singular Value Decomposition (SVD).
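The reduction step can be sketched in plain Python. This is a minimal teaching illustration, not what LSA libraries do internally: it assumes a small dense matrix and approximates the top-k singular values and vectors by power iteration on AᵀA with deflation, where a real project would call a library routine such as `numpy.linalg.svd`. The function name `truncated_svd` and the toy matrix (4 documents × 5 terms with two obvious topics) are hypothetical.

```python
import math
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def truncated_svd(a, k, iters=500, seed=0):
    """Approximate the top-k singular triplets of the m x n matrix `a`.

    Power iteration with deflation on A^T A: a sketch, numerically less robust
    than LAPACK-based SVD. Returns U (m x k), singular values (length k), V (n x k).
    """
    rng = random.Random(seed)
    n = len(a[0])
    ata = matmul(transpose(a), a)  # n x n symmetric matrix
    sigmas, v_cols = [], []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(ata[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of A^T A, i.e. sigma squared.
        eig = sum(v[i] * sum(ata[i][j] * v[j] for j in range(n)) for i in range(n))
        sigma = math.sqrt(max(eig, 0.0))
        sigmas.append(sigma)
        v_cols.append(v)
        # Deflate: remove this component so the next pass finds the next one.
        ata = [[ata[i][j] - eig * v[i] * v[j] for j in range(n)] for i in range(n)]
    # Each left singular vector is u = A v / sigma.
    u_cols = []
    for sigma, v in zip(sigmas, v_cols):
        av = [sum(row[j] * v[j] for j in range(n)) for row in a]
        u_cols.append([x / sigma if sigma else 0.0 for x in av])
    return transpose(u_cols), sigmas, transpose(v_cols)


# Hypothetical document-term matrix; columns: bad, slow, network, friendly, staff.
docs_by_terms = [
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
U, s, V = truncated_svd(docs_by_terms, k=2)
```

With k = 2, the first two documents load almost entirely on the first topic (the network terms) and the last two on the second (the staff terms), which is the kind of hidden structure LSA is meant to surface.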

This decomposition provides a vector representation of every term and every document in the collection through the equation $A \approx U S V^{T}$ (an approximation, because only the k largest singular values are kept), where:

 

A is the m × n TF-IDF matrix being decomposed

U is the m × k matrix whose rows are the vector representations of the documents, each of length k

V is the n × k matrix whose rows are the vector representations of the terms, each of length k

S is the k × k diagonal matrix of singular values, which indicate how strongly each topic is represented in the corpus

T denotes the matrix transpose; the number of topics, k, is the hyperparameter
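For reference, the truncated decomposition and the matrix shapes implied by the definitions above can be written out as follows. The subscript k on the truncated factors and the singular-value symbols σ are conventional notation added here, not taken from the original:

```latex
A \approx A_k = U_k \, S_k \, V_k^{T}, \qquad
A \in \mathbb{R}^{m \times n},\quad
U_k \in \mathbb{R}^{m \times k},\quad
S_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\quad
V_k \in \mathbb{R}^{n \times k},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k \ge 0

a_{ij} \approx \sum_{t=1}^{k} u_{it} \, \sigma_t \, v_{jt}
```

Each TF-IDF score is thus approximated as a sum over the k topics of how strongly document i and term j each relate to topic t, weighted by that topic's singular value.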

The resulting matrices can then be used to find similar topics and documents using cosine similarity.
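Cosine similarity compares two vectors by the angle between them, ignoring their lengths. A minimal sketch in plain Python, assuming each document has already been mapped to a k-dimensional vector (for example, a row of U). The vectors below are made-up illustrative values, not the output of a real LSA run, and the helper names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between vectors x and y (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return dot / (nx * ny)


def most_similar(query_index, vectors):
    """Index of the vector most similar to vectors[query_index], excluding itself."""
    best, best_score = None, -2.0  # cosine similarity is always >= -1
    for i, v in enumerate(vectors):
        if i == query_index:
            continue
        score = cosine_similarity(vectors[query_index], v)
        if score > best_score:
            best, best_score = i, score
    return best


# Hypothetical k = 2 document vectors: [network topic, staff topic].
doc_vectors = [
    [0.85, 0.01],
    [0.53, 0.02],
    [0.00, 0.71],
    [0.01, 0.70],
]
```

Documents 0 and 1 point in nearly the same direction even though their lengths differ, so `most_similar(0, doc_vectors)` returns 1, and `most_similar(2, doc_vectors)` returns 3. The same function works on rows of V to compare terms.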

The main disadvantages of LSA are its inefficient representation and its non-interpretable embeddings. It also needs a large corpus to yield accurate results.