Midterm 1 Flashcards
What is Machine Learning?
- Each record in a training set has attributes, one of which is the class.
- We want to find a model for the class attribute as a function of the values of the other attributes, using the training set; a test set is then used to check the model's accuracy.
The dataset is split into?
A training set and a test set
Confusion Matrix
A way to lay out how many predicted categories/classes were correctly predicted and how many were not.
- true positive, true negative, false positives, false negatives
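A minimal sketch of how the four confusion-matrix cells can be counted for a two-class problem. The function name `confusion_counts` and the label lists are made up for illustration; they do not come from the course material.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, really positive
        elif p == positive and a != positive:
            fp += 1  # predicted positive, really negative
        elif p != positive and a == positive:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels, invented for this example only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
# Accuracy is the share of correct predictions: (TP + TN) / total.
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts, accuracy)
```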
Which of the following statements is/are correct?
a) In machine learning, most of the data is used for testing.
b) In machine learning, most of the data is used for training.
c) The training set is used to determine the accuracy of the model.
d) b and c.
b) In machine learning, most of the data is used for training.
Which machine learning technique should be applied to the following problem?
"In information retrieval, a search engine needs to find groups of documents that
are similar to each other based on the important terms appearing in them."
a) Clustering
b) Classification
c) Regression
d) Validation
a) Clustering
Which of the following tasks is an unsupervised learning technique?
a) Clustering
b) Classification
c) Regression
d) All of the above
a) Clustering
Which of the following methods requires having a training set and test set?
a) Supervised Learning
b) Unsupervised Learning
c) a and b
d) None of the above
a) Supervised Learning
Which of the following is NOT an example of a machine learning problem?
a) Optical character recognition: categorize images of handwritten characters by
letters represented
b) Face detection: find faces in images
c) Topic spotting: categorize news articles
d) None of the above.
d) None of the above.
In classification problems, there may be multiple ways of classifying data items,
i.e., a data item may belong to more than one classification category. T/F
T
Which of the following is an example of a flag variable?
a) Gender: female/male
b) Weather: clear/rainy/cloudy
c) Temperature: [21, 80]
d) a and b
a) Gender: female/male
K-means clustering is an unsupervised technique to partition the dataset into K
pre-defined distinct non-overlapping subgroups. T/F
T
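A rough sketch of the K-means loop on 2-D points, alternating an assignment step and an update step. The choice of the first K points as starting centroids is an assumption made to keep the example deterministic; real implementations usually start from random points. The data is invented.

```python
def kmeans(points, k, iterations=10):
    """Plain K-means on 2-D points; returns (centroids, assignments)."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, (x, y) in enumerate(points):
            dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            assignments[i] = dists.index(min(dists))
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments


# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

Each point ends up in exactly one of the K groups, which is what "non-overlapping" means on the card.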
Association rules are a good means of predicting sequential dependencies among
different events. T/F
T
Why do we use regularization on models?
a) To measure the accuracy of a model
b) To prevent overfitting
c) To train a model
d) All of the above
b) To prevent overfitting
What does the loss function measure?
a) residual error
b) prediction error
c) model parameters
d) all of the above
b) prediction error
The training set is used to determine the accuracy of the model. T/F
F
In a 2-layer neural network, the perceptron takes its inputs, calculates their
weighted sum, and returns 1 if the weighted sum is above a threshold value.
T/F
T
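The perceptron rule on this card can be sketched in a few lines. The weights and threshold below are hypothetical values picked so that the unit behaves like a logical AND; they are not from the course.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0


# Hypothetical parameters: with these, only the input (1, 1) fires.
weights = [1.0, 1.0]
threshold = 1.5

outputs = [perceptron([a, b], weights, threshold) for a in (0, 1) for b in (0, 1)]
print(outputs)
```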
When training a model, the main goal is to:
a) Update model coefficients
b) Minimize the error by updating model coefficients
c) Add bias
d) None of the above
b) Minimize the error by updating model coefficients
N-fold cross-validation is a method used to prevent overfitting. T/F
T
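A small sketch of how N-fold cross-validation partitions the data: every sample is used for testing exactly once and for training in the other folds. The helper name `n_fold_splits` is made up here, and the sketch splits indices in order without shuffling, which is a simplifying assumption.

```python
def n_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) once per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(n_fold_splits(10, 3))
for train, test in splits:
    print("train:", train, "test:", test)
```

A model would be fit on each `train` part and scored on the matching `test` part, and the scores averaged.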
The OLS method is used when the relationship between input and output is very complex.
T/F
F
What is the Ordinary Least Squares method used for?
a) Minimize the loss function
b) Maximize the loss function
c) Update the parameters of a model
d) a and c
d) a and c
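A sketch of OLS for a single input variable, using the standard closed-form solution that minimizes the sum of squared errors. The function name `ols_fit` and the data points are invented for illustration.

```python
def ols_fit(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed form: slope = cov(x, y) / var(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = ols_fit(xs, ys)
print(slope, intercept)
```

The parameters are set in one step from a formula, which is why OLS suits simple (linear) relationships.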
The gradient descent method is used when the relationship between input and output is very
complex. T/F
T
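A sketch of gradient descent minimizing mean squared error by repeatedly updating the coefficients. A straight line is used only to keep the example short; the same loop applies to models with no closed-form solution. The learning rate, step count, and data are assumptions chosen for this example.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current coefficients.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))
```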
Regularization is a method that penalizes model coefficients to reduce overfitting.
T/F
T
Lasso is an example of a regularization method. T/F
T
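A sketch of how a Lasso-style penalty enters the loss: the prediction error (here mean squared error) plus a penalty proportional to the sum of the absolute coefficient values. The function names, the penalty strength `lam`, and all numbers are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared prediction error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def lasso_penalty(coefs, lam):
    """L1 penalty: lam times the sum of absolute coefficient values."""
    return lam * sum(abs(c) for c in coefs)


def lasso_loss(y_true, y_pred, coefs, lam):
    """Penalized loss: large coefficients cost extra, discouraging overfitting."""
    return mse(y_true, y_pred) + lasso_penalty(coefs, lam)


# Hypothetical values, invented for this example only.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
coefs = [3.0, -2.0, 0.5]

penalty = lasso_penalty(coefs, lam=0.1)
total = lasso_loss(y_true, y_pred, coefs, lam=0.1)
print(penalty, total)
```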