L15 - TF-IDF Approach Flashcards
1
Q
- What does TF-IDF stand for?
A
- Term Frequency - Inverse Document Frequency
2
Q
- How do we calculate the TF-IDF score?
A
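- TF-IDF = TF × IDF
- TF (term frequency) = (number of occurrences of the term in the document) / (number of terms in the document)
- IDF (inverse document frequency) = log((number of documents in the corpus) / (number of documents containing the term))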
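A minimal Python sketch of the calculation for a single term. It uses the unsmoothed definition TF × IDF with a natural log. The function name `tf_idf` and the toy lyric-style corpus are made up for illustration. Libraries differ in the details; for example, scikit-learn's `TfidfVectorizer` smooths the IDF by default.

```python
import math

def tf_idf(term, document, corpus):
    """TF-IDF score of `term` in `document`; corpus is a list of token lists."""
    # TF: occurrences of the term, normalised by the document's length
    tf = document.count(term) / len(document)
    # Number of documents in the corpus that contain the term
    docs_with_term = sum(1 for doc in corpus if term in doc)
    if docs_with_term == 0:
        return 0.0  # term never appears anywhere, so it carries no weight
    # IDF: natural log of (total documents / documents containing the term);
    # the log base is an assumption, other bases only rescale the scores
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf

# Hypothetical toy corpus, already tokenised
corpus = [
    ["the", "riff", "is", "heavy"],
    ["the", "drums", "are", "loud"],
    ["the", "riff", "never", "ends"],
]
print(tf_idf("riff", corpus[0], corpus))  # (1/4) * ln(3/2)
print(tf_idf("the", corpus[0], corpus))   # 0.0, "the" is in every document
```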
3
Q
- What is the purpose of TF-IDF?
A
- Establishes the importance of each term in a document relative to a corpus of other documents: terms that are frequent in that document but rare across the corpus score highest
4
Q
- Generally how does TF-IDF work?
A
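- Across the whole corpus (e.g. a corpus of metal lyrics), calculate how frequently every term occurs, i.e. how many documents each term appears in
- For each document, calculate the TF-IDF score of each of its terms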
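The two-stage workflow above can be sketched as follows: corpus-wide document frequencies are computed once, then reused to score every document. The helper names `corpus_document_frequencies` and `score_document` are hypothetical, and the sketch assumes the documents are already tokenised into lists of words.

```python
import math
from collections import Counter

def corpus_document_frequencies(corpus):
    """Stage 1: for every term, count how many documents contain it."""
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # set() so a term counts once per document
    return df

def score_document(doc, df, n_docs):
    """Stage 2: return a {term: TF-IDF score} dict for one document."""
    counts = Counter(doc)
    return {
        # TF = occurrences / document length; IDF = ln(n_docs / df)
        term: (count / len(doc)) * math.log(n_docs / df[term])
        for term, count in counts.items()
    }

# Hypothetical toy corpus, already tokenised
corpus = [
    ["the", "riff", "is", "heavy"],
    ["the", "drums", "are", "loud"],
    ["the", "riff", "never", "ends"],
]
df = corpus_document_frequencies(corpus)
scores = [score_document(doc, df, len(corpus)) for doc in corpus]
for doc_scores in scores:
    print(doc_scores)
```

Because every scored term comes from a document in the corpus, its document frequency is at least 1, so the IDF never divides by zero here.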
5
Q
- Explain the steps of the TF-IDF process…
A
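- Tokenise the text into words, then normalise them, e.g. by stemming or lemmatisation
- For each term in the document, calculate its term frequency (TF) -> (number of term occurrences) / (number of terms in the document)
- For each term, calculate its inverse document frequency (IDF) -> log((number of documents in the corpus) / (number of documents containing the term))
- Multiply the TF and IDF values to get the TF-IDF score for each term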
- Represent the TF-IDF scores in a document-term matrix
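The full process, ending in a document-term matrix, might look like this in plain Python. This is a sketch under simplifying assumptions. Tokenisation is just lowercasing plus a regular expression, and stemming/lemmatisation is skipped, since a library such as NLTK would normally handle that step. `tokenise` and `tf_idf_matrix` are hypothetical names.

```python
import math
import re
from collections import Counter

def tokenise(text):
    """Lowercase and split into word tokens (no stemming/lemmatisation)."""
    return re.findall(r"[a-z']+", text.lower())

def tf_idf_matrix(texts):
    """Return (vocabulary, matrix); matrix[i][j] is the TF-IDF score
    of vocabulary[j] in document i."""
    docs = [tokenise(t) for t in texts]
    vocabulary = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    # IDF with a natural log (the log base is an assumption)
    idf = {term: math.log(n_docs / df[term]) for term in vocabulary}
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid 0/0 for an empty document
        # TF = occurrences / number of terms in the document; score = TF * IDF
        row = [counts[term] / length * idf[term] for term in vocabulary]
        matrix.append(row)
    return vocabulary, matrix

# Hypothetical toy lyrics
texts = [
    "The riff is heavy",
    "The drums are loud",
    "The riff never ends",
]
vocab, matrix = tf_idf_matrix(texts)
print(vocab)
for row in matrix:
    print([round(x, 3) for x in row])
```

Each row is a document and each column a vocabulary term. A term that appears in every document, such as "the" here, gets an IDF of log(1) = 0 and therefore a score of 0 in every row.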