NLP-4 Flashcards
What is the Bag of Words (BoW) algorithm?
Bag of Words is a Natural Language Processing model that extracts features from
text so they can be used by machine learning algorithms. It counts the occurrences of
each word and constructs the vocabulary for the corpus.
Bag of Words represents each document (for example, a review) as a vector of word-occurrence counts. These vectors are easy to interpret.
What does the BoW algorithm give us?
Bag of Words gives us two things:
1. A vocabulary of unique words for the corpus
2. The frequency of these words (the number of times each word has occurred in the whole corpus).
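The two outputs above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: the toy corpus is made up for the example, and splitting on whitespace stands in for proper text normalisation.

```python
from collections import Counter

# Hypothetical toy corpus; the sentences are illustrative only.
corpus = [
    "aman and anil are stressed",
    "aman went to a therapist",
    "anil went to download a health chatbot",
]

# Split each document on whitespace and count every token across the corpus.
corpus_counts = Counter(word for doc in corpus for word in doc.split())

# 1. Vocabulary: the unique words of the corpus.
vocabulary = sorted(corpus_counts)

# 2. Frequency: how many times each word occurs in the whole corpus.
print(vocabulary)
print(corpus_counts["aman"])
```

Here `corpus_counts` holds the corpus-wide frequency of every word, and its keys form the vocabulary.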
Justify why we call it the ‘bag’ of words algorithm.
Calling this algorithm a “bag” of words signifies that the order of the sentences or tokens does
not matter: all we need are the unique words and their frequencies.
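A small sketch can make the "bag" idea concrete: two sentences with the same words in a different order produce the same bag. The helper name `bag` and the example sentences are assumptions made for illustration.

```python
from collections import Counter

def bag(text):
    # Hypothetical helper: lowercase, split on whitespace, count each token.
    return Counter(text.lower().split())

a = bag("the dog chased the cat")
b = bag("the cat chased the dog")

# Word order differs, but the word counts are identical.
same_bag = a == b
print(same_bag)
```

The two sentences mean different things, yet their bags are equal, which shows that Bag of Words discards word order.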
What are the steps to implement the BoW algorithm?
1. Text Normalisation: Collect the data and pre-process it.
2. Create Dictionary: Make a list of all the unique words occurring in the corpus. (Vocabulary)
3. Create document vectors: For each document in the corpus, find out how many times
each word from the unique list of words has occurred.
4. Create document vectors for all the documents: Repeat step 3 for every document in the corpus.
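The steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names (`normalise`, `build_vocabulary`, `document_vector`) and the toy corpus are hypothetical, and normalisation is reduced to lowercasing and keeping alphabetic tokens.

```python
import re

def normalise(text):
    # Step 1 (simplified): lowercase the text and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(tokenised_docs):
    # Step 2: collect the unique words of the corpus in first-seen order.
    vocabulary = []
    seen = set()
    for tokens in tokenised_docs:
        for token in tokens:
            if token not in seen:
                seen.add(token)
                vocabulary.append(token)
    return vocabulary

def document_vector(tokens, vocabulary):
    # Step 3: count how often each vocabulary word occurs in one document.
    return [tokens.count(word) for word in vocabulary]

# Hypothetical toy corpus for illustration.
corpus = [
    "Aman and Anil are stressed.",
    "Aman went to a therapist.",
    "Anil went to download a health chatbot.",
]

tokenised = [normalise(doc) for doc in corpus]
vocabulary = build_vocabulary(tokenised)

# Step 4: repeat step 3 for every document.
vectors = [document_vector(tokens, vocabulary) for tokens in tokenised]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Each vector has one entry per vocabulary word, so all document vectors have the same length.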
Define document vector table.
It is a table containing the frequency of each unique word of the vocabulary in each document.
In its simplest (binary) form, a word that is present in the document is represented by 1 and an absent word by 0.
In a document vector table, the header row contains the vocabulary of the corpus and the other rows correspond to the different documents.
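As an illustration, the sketch below builds a tiny document vector table in both forms: word counts and 1/0 presence values. The vocabulary, document names, and the `format_table` helper are assumptions made for the example.

```python
# Hypothetical vocabulary and documents for illustration.
vocabulary = ["aman", "and", "anil", "went"]
documents = {
    "doc1": "aman and anil and aman",
    "doc2": "anil went",
}

# Count form: frequency of each vocabulary word in each document.
count_rows = {
    name: [text.split().count(word) for word in vocabulary]
    for name, text in documents.items()
}

# Binary form: 1 if the word is present in the document, 0 if it is absent.
binary_rows = {
    name: [1 if count > 0 else 0 for count in row]
    for name, row in count_rows.items()
}

def format_table(rows):
    # Header row holds the vocabulary; every other row is one document.
    width = max(len(word) for word in vocabulary) + 2
    lines = ["".ljust(6) + "".join(word.ljust(width) for word in vocabulary)]
    for name, row in rows.items():
        lines.append(name.ljust(6) + "".join(str(value).ljust(width) for value in row))
    return "\n".join(lines)

print(format_table(count_rows))
print(format_table(binary_rows))
```

The count table keeps how often a word occurs, while the binary table only records whether it occurs at all.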
Define vocabulary.
It is the collection of all unique words in the corpus, along with their frequencies.