Natural Language Processing in Google Cloud Flashcards
What are the key features of the NLP API?
- Entity extraction
- Sentiment analysis
- Entity sentiment analysis
- Text classification
- Syntax analysis
What is stemming?
Stemming is the process of reducing a word to its root form, e.g. playing -> play.
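A minimal sketch of the idea, assuming a toy suffix-stripping rule. The suffix list and the `simple_stem` helper are hypothetical and far cruder than a real stemmer such as Porter's:

```python
# Toy suffix-stripping stemmer (illustrative only, not the Porter algorithm).
SUFFIXES = ("ing", "ed", "s")  # assumed, deliberately tiny suffix list


def simple_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print([simple_stem(w) for w in ["playing", "played", "plays", "play"]])
# -> ['play', 'play', 'play', 'play']
```

A rule this simple will mis-stem many words (e.g. "running" -> "runn"), which is why real stemmers apply longer, ordered rule sets.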
What is normalization in NLP?
Normalization is the process of converting a phrase to its standard form, e.g. lol -> laugh out loud, wtf -> what the fuck. It is useful for text from social networks.
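One way to sketch this is a dictionary lookup over tokens. The `SLANG` table and `normalize` helper below are hypothetical; "lol" comes from the card above, while "brb" is an added illustrative entry:

```python
import re

# Hypothetical slang table; a real one would be much larger.
SLANG = {"lol": "laugh out loud", "brb": "be right back"}


def normalize(text: str) -> str:
    """Replace known slang tokens with their expanded form."""

    def expand(match):
        token = match.group(0)
        # Look up case-insensitively; leave unknown tokens unchanged.
        return SLANG.get(token.lower(), token)

    return re.sub(r"[A-Za-z']+", expand, text)


print(normalize("LOL, brb in 5"))
# -> laugh out loud, be right back in 5
```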
Explain the bag-of-words approach.
Bag-of-words is a technique for converting text to a numeric representation. It is similar to one-hot encoding, but here every word that occurs in the sentence is marked in a vector over the vocabulary. You can either track the number of occurrences of each word (e.g. marking 2 for "dog" if it appears twice) or ignore repeated occurrences and only mark presence.
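Both variants (counts and presence-only) can be sketched in plain Python. The corpus, the whitespace tokenization, and the helper names are assumptions made for illustration:

```python
from collections import Counter


def build_vocabulary(sentences):
    """Sorted list of unique lowercase words across all sentences."""
    return sorted({word for s in sentences for word in s.lower().split()})


def bag_of_words(sentence, vocabulary, binary=False):
    """Vector over the vocabulary: word counts, or 0/1 flags if binary."""
    counts = Counter(sentence.lower().split())
    if binary:
        return [1 if counts[w] > 0 else 0 for w in vocabulary]
    return [counts[w] for w in vocabulary]


corpus = ["the dog chased the other dog", "the cat slept"]
vocab = build_vocabulary(corpus)
print(vocab)
# -> ['cat', 'chased', 'dog', 'other', 'slept', 'the']
print(bag_of_words(corpus[0], vocab))
# -> [0, 1, 2, 1, 0, 2]
print(bag_of_words(corpus[0], vocab, binary=True))
# -> [0, 1, 1, 1, 0, 1]
```

Note that word order is lost in both variants, which is where the "bag" in the name comes from.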
What is word2vec?
word2vec is a family of model architectures and optimizations used to learn word embeddings from a large dataset.
What are the 2 most common model architectures in word2vec?
CBOW (continuous bag-of-words) - predicts the center word given its surrounding context
Skip-gram - predicts surrounding context by providing a center word
For both architectures, you first need to create training data and then train a neural network on it.
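The training data for both architectures can be sketched with a sliding window over the tokens. The window size, helper names, and sample sentence are assumptions; real pipelines usually add subsampling and negative sampling on top:

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: the center word is the model input."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """(context words, center) examples: the context is the model input."""
    examples = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window): i] + tokens[i + 1: i + window + 1]
        if context:
            examples.append((context, center))
    return examples


tokens = "the quick brown fox".split()
print(skipgram_pairs(tokens, window=1))
# -> [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#     ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
print(cbow_examples(tokens, window=1))
# -> [(['quick'], 'the'), (['the', 'brown'], 'quick'),
#     (['quick', 'fox'], 'brown'), (['brown'], 'fox')]
```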
What is TensorFlow Hub?
TensorFlow Hub is a platform that hosts pretrained models that you can fine-tune and deploy.
How do you decide whether a pretrained embedding model should be fine-tuned?
It depends on the amount of training data. If you have only a small amount of data, you should not fine-tune the model, in order to avoid overfitting.
What is the difference between ANN (artificial neural network) and DNN (deep neural network)?
A DNN is an ANN with two or more hidden layers. In this comparison, a plain (shallow) ANN has only one hidden layer.