Midterm 1 Flashcards
What is Machine Learning?
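- A training set consists of records with attributes, where one of the attributes is the class.
- We want to find a model for the class attribute as a function of the values of the other attributes, learned from the training set; a separate test set is then used to estimate how accurately the model predicts the class of unseen records.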
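As a minimal sketch of this definition, the Python below treats each training record as a pair of (other attribute values, class value) and uses a 1-nearest-neighbour rule as the model. The toy data, the choice of 1-nearest-neighbour, and the name `predict_1nn` are illustrative assumptions, not something the card specifies.

```python
import math

# Hypothetical toy training set: each record is ((other attribute values), class attribute).
training_set = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((5.0, 5.0), "B"),
    ((5.5, 4.5), "B"),
]

def predict_1nn(train, x):
    """Hypothetical model: predict the class of x as the class of its nearest training record."""
    nearest = min(train, key=lambda rec: math.dist(rec[0], x))
    return nearest[1]

print(predict_1nn(training_set, (1.1, 0.9)))  # A
print(predict_1nn(training_set, (4.8, 5.2)))  # B
```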
What is the dataset split into?
A training set and a test set.
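One way the split could be done, sketched under assumptions: records are shuffled with a fixed seed and 20% are held out for testing. The helper `train_test_split` here is hand-written and hypothetical; it only borrows its name from the similarly named scikit-learn function and does not reproduce that function's signature.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle a copy of the records, then split off a test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)       # fixed seed makes the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

data = list(range(10))
train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```

Every record lands in exactly one of the two sets, so the test records are never seen during training.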
Confusion Matrix
A table that lays out, for each class, how many records were predicted correctly and how many were not.
- true positives, true negatives, false positives, false negatives
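For a two-class problem the four counts can be tallied directly from the actual and predicted labels. A minimal sketch, assuming made-up spam/ham labels and a hypothetical `confusion_counts` helper:

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, TN, FP, FN for a binary problem, given which label counts as positive."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

actual    = ["spam", "spam", "ham", "ham",  "spam", "ham"]
predicted = ["spam", "ham",  "ham", "spam", "spam", "ham"]
print(confusion_counts(actual, predicted, positive="spam"))
# {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```

Laid out as a 2 x 2 matrix (rows = actual class, columns = predicted class), TP and TN sit on the diagonal and the errors FP and FN sit off it.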
Which of the following statements is/are correct?
a) In machine learning, most of the data is used for testing.
b) In machine learning, most of the data is used for training.
c) Training set is used to determine the accuracy of the model.
d) b and c.
b) In machine learning, most of the data is used for training.
Which machine learning technique should be applied to the following problem?
“In information retrieval, a search engine needs to find groups of documents that
are similar to each other based on important terms appearing in them”.
a) Clustering
b) Classification
c) Regression
d) Validation
a) Clustering
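Grouping documents this way needs a notion of similarity between them. A minimal sketch, assuming each document is reduced to a bag of word counts and compared with cosine similarity (one common choice; the card does not name a measure, and a real system would also drop stop words such as "a"). The documents and the `cosine_similarity` helper are hypothetical; a clustering algorithm would then group documents whose pairwise similarity is high.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity of the word-count vectors of two documents (0.0 = no shared terms)."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

d1 = "neural network training"
d2 = "training a neural network"
d3 = "stock market prices"
print(round(cosine_similarity(d1, d2), 3))  # 0.866
print(cosine_similarity(d1, d3))            # 0.0
```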
Which of the following tasks is an unsupervised learning technique?
a) Clustering
b) Classification
c) Regression
d) All of the above
a) Clustering
Which of the following methods requires having a training set and test set?
a) Supervised Learning
b) Unsupervised Learning
c) a and b
d) None of the above
a) Supervised Learning
Which of the following is NOT an example of a machine learning problem?
a) Optical character recognition: categorize images of handwritten characters by
the letters they represent
b) Face detection: find faces in images
c) Topic spotting: categorize news articles
d) None of the above.
d) None of the above.
In classification problems, there may be multiple ways of classifying data items,
i.e., a data item may belong to more than one classification category. T/F
T
Which of the following is an example of a flag variable?
a) Gender: female/male
b) Weather: clear/rainy/cloudy
c) Temperature: [21, 80]
d) a and b
d) a and b
K-means clustering is an unsupervised technique to partition the dataset into K
pre-defined distinct non-overlapping subgroups. T/F
T
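A bare-bones sketch of K-means (Lloyd's algorithm) on 2-D points. Assumptions: the first k points serve as the initial centroids (practical implementations usually pick them randomly or with k-means++), and the data is made up. Each point ends up in exactly one cluster, which is what makes the subgroups non-overlapping, and no class labels are used anywhere, which is why the technique is unsupervised.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain K-means; the first k points are used as initial centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will no longer change
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (1, 1.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print([len(c) for c in clusters])  # [3, 3]
print(centroids[1])                # (8.5, 8.5)
```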
Association rules are a good means of predicting sequential dependencies among
different events. T/F
T
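Association rules are usually scored with support and confidence. A small sketch on made-up market-basket transactions, with hypothetical `support` and `confidence` helpers:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: support(A and C together) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(transactions, {"bread", "milk"}))       # 0.5
print(confidence(transactions, {"bread"}, {"milk"}))  # about 0.667
```

The rule bread -> milk holds in 2 of the 3 transactions that contain bread, hence a confidence of 2/3.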
Why do we use regularization on models?
a) To measure the accuracy of a model
b) To prevent overfitting
c) To train a model
d) All of the above
b) To prevent overfitting
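As one concrete illustration, an L2 (ridge) penalty adds lam * w^2 to the training loss, so large weights cost extra and the fitted model is pushed toward simpler functions that overfit less. The card does not say which regularizer is meant; the linear model, the data, lam = 0.5, and the `ridge_loss` name are assumptions for this sketch.

```python
def ridge_loss(weights, xs, ys, lam):
    """Mean squared error of the line y = w0 + w1 * x plus an L2 penalty lam * w1**2.

    The bias w0 is left unpenalized, a common (but not universal) convention.
    """
    w0, w1 = weights
    errors = [(y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys)]
    data_loss = sum(errors) / len(errors)  # how badly the model fits the data
    penalty = lam * w1 ** 2                # grows with the size of the weight
    return data_loss + penalty

xs, ys = [0, 1, 2], [0, 1, 2]
print(ridge_loss((0.0, 1.0), xs, ys, lam=0.0))  # 0.0: perfect fit, no penalty
print(ridge_loss((0.0, 1.0), xs, ys, lam=0.5))  # 0.5: same fit, now penalized
print(ridge_loss((0.0, 0.9), xs, ys, lam=0.5) < ridge_loss((0.0, 1.0), xs, ys, lam=0.5))  # True
```

With lam = 0.5 the slightly shrunken slope 0.9 scores a lower total loss than the exact fit 1.0, showing the penalty trading a little training error for a smaller weight.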
What does the loss function measure?
a) residual error
b) prediction error
c) model parameters
d) all of the above
b) prediction error
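Two common loss functions, sketched in Python: mean squared error for numeric predictions and 0-1 loss (misclassification rate) for class labels. Both compare predictions with true values, so both measure prediction error. The function names and data are illustrative.

```python
def squared_error_loss(y_true, y_pred):
    """Mean squared error: average squared gap between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def zero_one_loss(y_true, y_pred):
    """Fraction of predictions whose class label is wrong."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

print(squared_error_loss([3.0, 5.0], [2.0, 5.0]))       # 0.5
print(zero_one_loss(["A", "B", "B"], ["A", "A", "B"]))  # about 0.333
```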
Training set is used to determine the accuracy of the model. T/F
F
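To see why accuracy comes from the test set, here is a toy sketch: a hypothetical majority-class model is learned from the training labels only, and its accuracy is then measured on the held-out test labels. The labels and helper names are made up.

```python
from collections import Counter

def fit_majority_class(train_labels):
    """Hypothetical toy model: remember the most common class in the training labels."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

train_labels = ["yes", "yes", "no", "yes"]
test_labels = ["yes", "no", "no"]

model = fit_majority_class(train_labels)        # learned from the training set only
test_predictions = [model] * len(test_labels)   # the model predicts "yes" for every test record
print(accuracy(test_labels, test_predictions))  # about 0.333, measured on the test set
```

Measured on its own training labels this model would score 0.75, which overstates how well it does on records it has not seen.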