Final Review Flashcards
(T/F) Supervised learning and unsupervised clustering both require at least one input attribute
False
(T/F) Grouping people in a social network is an example of unsupervised machine learning
True
What is topic modelling in natural language processing (NLP)?
Topic modelling is an unsupervised machine learning approach that can scan a series of documents, find word and phrase patterns within them, and automatically cluster word groupings and related expressions that best represent the set
What is a recurrent neural network (RNN)?
Recurrent neural networks are a class of neural networks that are helpful in modelling sequence data.
Derived from feedforward networks, RNNs add feedback connections: the hidden state computed at one timestep is fed back in at the next, giving the network a form of memory. Simply put: recurrent neural networks produce predictive results on sequential data that other algorithms can’t
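A minimal sketch of a single-unit recurrent cell in plain Python (the weights here are arbitrary illustrative values, not a trained model) shows how the hidden state carries information from earlier timesteps into later ones:

```python
import math

# One recurrent step: the new hidden state depends on the current
# input AND the previous hidden state (illustrative fixed weights).
def rnn_step(x, h, w_x=0.5, w_h=0.8, b=0.0):
    return math.tanh(w_x * x + w_h * h + b)

def run_rnn(sequence):
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        h = rnn_step(x, h)
        states.append(h)
    return states

# The same input value yields different hidden states depending on
# what came before it -- this is what a feedforward network cannot do.
states = run_rnn([1.0, 1.0, 1.0])
```

Even though every input is identical, the hidden states differ because each one folds in the history of the sequence.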
Explain the bias-variance tradeoff
Bias is the degree to which a model's predictions systematically deviate from the true values. High bias implies a simple model that is not able to capture the complexity of the data and is underfit.
Variance is the degree to which a model's predictions vary across different training sets. High variance implies a complex model that overfits to the training data.
The bias-variance tradeoff is thus the search for the level of model complexity that balances bias against variance, so the model neither overfits nor underfits and makes accurate predictions on new, unseen data.
What is lexicon normalization in text preprocessing?
One type of textual noise is the multiple surface forms of a single word. For example - “play”, “player”, “played”, and “plays” are different variations of the word “play”. Though they can differ contextually, they all share the same root.
Lexicon normalization converts the variant forms of a word into their normalized form (also known as the lemma). Normalization is a pivotal step in feature engineering with text, as it maps high-dimensional features to a lower-dimensional space, which is ideal for any machine learning model. The most common lexicon normalization practices are stemming and lemmatization
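A crude suffix-stripping stemmer sketches the idea (real systems use e.g. the Porter stemmer or a dictionary-based lemmatizer; the suffix list here is a toy assumption):

```python
# Toy stemmer: strip a known suffix, leaving a stem of at least
# three characters (a rough sketch of what stemming does).
SUFFIXES = ["ing", "ers", "er", "ed", "s"]

def stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["play", "player", "played", "plays", "playing"]
stems = [stem(w) for w in words]  # every variant collapses to "play"
```

All five surface forms map to the single stem "play", collapsing five feature dimensions into one.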
Define confusion matrix, accuracy, precision, and recall
A confusion matrix is an N×N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kind of errors it is making.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
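The four counts and the three formulas above can be computed from scratch for a small binary example (the label lists are illustrative):

```python
# Actual vs predicted labels for ten examples (positive class = 1)
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

# Confusion-matrix counts
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # 3
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # 4
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # 2
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # 1

accuracy  = (tp + tn) / (tp + tn + fp + fn)  # 7/10 = 0.7
precision = tp / (tp + fp)                   # 3/5  = 0.6
recall    = tp / (tp + fn)                   # 3/4  = 0.75
```

Note how precision is penalized by the two false positives while recall is penalized by the one false negative.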
What are the regularization techniques that can be used for a convolutional neural network?
- L2 & L1 regularization
- Dropout
- Data augmentation
- Early stopping
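Of these, dropout is the easiest to sketch in a few lines. This is a plain-Python sketch of *inverted* dropout (the variant most frameworks use), not any particular library's implementation:

```python
import random

# Inverted dropout: during training, each activation is zeroed with
# probability p, and survivors are scaled by 1/(1-p) so the expected
# activation is unchanged; at test time dropout is a no-op.
def dropout(activations, p=0.5, training=True):
    if not training:
        return list(activations)
    out = []
    for a in activations:
        if random.random() < p:
            out.append(0.0)            # dropped
        else:
            out.append(a / (1 - p))    # kept, rescaled
    return out

random.seed(0)
acts = [0.2, 0.9, 0.4, 0.7]
train_out = dropout(acts, p=0.5)
test_out = dropout(acts, training=False)  # identity at test time
```

Randomly silencing units prevents the network from relying on any single activation, which regularizes it.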
Explain the steps to create a bag of words
- Tokenization: First, the input text is tokenized. A sentence is represented as a list of its constituent words, and this is done for all the input sentences
- Vocabulary creation: Of all the obtained tokenized words, only unique words are selected to create the vocabulary and then sorted in alphabetical order
- Vector creation: Finally, a sparse matrix is created for the input from the frequencies of the vocabulary words. In this sparse matrix, each row is a sentence vector whose length (the number of columns of the matrix) equals the size of the vocabulary
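The three steps above, sketched in plain Python (the example sentences are arbitrary):

```python
sentences = ["the cat sat", "the cat ran", "a dog ran"]

# 1. Tokenization: each sentence becomes a list of its words
tokenized = [s.split() for s in sentences]

# 2. Vocabulary creation: unique words, sorted alphabetically
vocabulary = sorted({word for tokens in tokenized for word in tokens})
# -> ['a', 'cat', 'dog', 'ran', 'sat', 'the']

# 3. Vector creation: one row per sentence, one column per vocabulary
# word, each entry being that word's frequency in the sentence
vectors = [[tokens.count(word) for word in vocabulary]
           for tokens in tokenized]
# "the cat sat" -> [0, 1, 0, 0, 1, 1]
```

Each sentence vector has length 6 (the vocabulary size), and most entries are 0, which is why the matrix is stored sparsely in practice.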
(T/F) You have classification data with classes Y = {+1, -1} and features Fi = {+1, -1} for i = {1, …, K}. In an attempt to turbocharge your classifier you duplicate each feature, so now each example has 2K features with F(K+i) = F(i) for i = {1, …, K}. The following questions compare the original feature set with the doubled one. You may assume in the case of ties, class +1 is always chosen. Assume there are equal numbers of training examples in each class.
For a Naive Bayes model, which of the following are true:
1. The test accuracy could be higher with the doubled feature set
2. The test accuracy will be the same with either feature set
3. The test accuracy could be higher with the original features
- False
- True
- False
(With equal class priors, doubling each feature squares every likelihood term in the Naive Bayes score, which preserves the ordering of the class scores - and ties break the same way - so the predictions, and hence test accuracy, are unchanged.)
You are training a model and find that training loss is near 0 but test loss is very high. Which of the following would be expected to reduce test loss? (multi)
1. Increase training data size
2. Decrease training data size
3. Increase model complexity
4. Decrease model complexity
5. Train on a combination of training and test data but evaluate only on the test set
6. Conclude that ML doesn’t work
- Increase training data size
- Decrease model complexity
- Train on a combination of training and test data but evaluate only on the test set (this would reduce test loss but is not good practice)
You train a linear classifier on 1000 training points and discover that accuracy is only 50%. Which of the following, if done in isolation, has a good chance of improving training accuracy? (multi)
1. Add new features
2. Train on more data
3. Train on less data
- Add new features
- Train on less data
In supervised learning, training data includes:
1. Output
2. Input
3. Both
4. None
Both
You are given reviews of a few Netflix series marked as positive, negative, or neutral. Classifying reviews of a new Netflix series is an example of:
1. Supervised Learning
2. Unsupervised Learning
3. Semisupervised Learning
4. Reinforcement Learning
Supervised learning
Which of the following is the second stage in NLP?
1. Discourse analysis
2. Syntactic analysis
3. Semantic analysis
4. Pragmatic analysis
Syntactic analysis
Text summarization finds the most informative sentences in which of the following:
1. Video
2. Sound
3. Image
4. Document
Document
Why is the XOR problem exceptionally interesting to researchers?
Because it is the simplest linearly inseparable problem that exists: a single-layer perceptron cannot solve it, which motivates multi-layer networks
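A two-layer network with hand-picked weights solves XOR (these weights are illustrative; a trained network would learn similar ones): the hidden units compute OR and AND, and the output fires when OR is on but AND is off.

```python
# Step activation: fires when its input is positive
def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h_or  = step(x1 + x2 - 0.5)      # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)      # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # OR and not AND = XOR

results = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
# -> [0, 1, 1, 0]
```

No single step unit (a single line in the plane) can produce this truth table, which is exactly why the hidden layer is needed.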
Which of the following gives non-linearity to a neural network?
1. Convolution
2. Stochastic gradient descent
3. Sigmoid activation function
4. Non-zero bias
Sigmoid activation function