H2022 ML Flashcards

Question 1

Q

Question 2

Q

Explain the difference between classification and regression in machine learning. Include at least one example of each.

Answer

A

Classification: Predicts a discrete category (class label). Example: Predicting whether an email is “spam” or “not spam.”
Regression: Predicts a continuous numerical value. Example: Predicting the price of a house based on its size.
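A minimal Python sketch of the difference in output type, using made-up data. A least-squares line returns a price (a number on a continuous scale), while a 1-nearest-neighbour rule returns one label from a fixed set. The features (house size, number of links in an email), the function names, and all numbers are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def nearest_label(x, examples):
    """Classification: return the label of the closest (feature, label) example."""
    return min(examples, key=lambda ex: abs(ex[0] - x))[1]

# Hypothetical data: house size (m^2) -> price (thousands)
a, b = fit_line([50, 80, 100, 120], [150, 240, 300, 360])
price = a * 90 + b  # a number on a continuous scale

# Hypothetical data: number of links in an email -> spam label
emails = [(0, "not spam"), (1, "not spam"), (8, "spam"), (12, "spam")]
label = nearest_label(9, emails)  # one of a fixed set of categories
print(round(price, 1), label)
```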

Question 3

Q

In machine learning, one typically divides one’s dataset into three subsets: a training set, a validation set, and a test set. Why? What are the roles of each of these subsets?

Answer

A

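Why: A model’s performance on data it was trained or tuned on is optimistically biased, so each stage needs data the model has not already been fitted to.
Training Set: Used to fit the model’s parameters.
Validation Set: Used to compare models and tune hyperparameters, which helps detect and limit overfitting.
Test Set: Used once, at the end, to estimate the final model’s performance on unseen data.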
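A minimal sketch of a random 60/20/20 split in plain Python. The function name and the fractions are assumptions for illustration; in practice a library helper such as scikit-learn's `train_test_split` (applied twice) is common.

```python
import random

def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut into disjoint train/validation/test subsets."""
    data = list(items)
    random.Random(seed).shuffle(data)  # shuffle so each subset is a random sample
    n_test = int(len(data) * test_frac)
    n_val = int(len(data) * val_frac)
    test = data[:n_test]
    val = data[n_test:n_test + n_val]
    train = data[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

Typical use: fit candidate models on `train`, pick hyperparameters by their score on `val`, and evaluate the chosen model on `test` only once.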

Question 4

Q

Which of the following tasks are examples of multilabel classification? Note that there may be more than one correct answer.

Classify a recording of speech as belonging to an authorized or a non-authorized user

Classify whether an article is about either sports, politics, science, culture, or finance

Classify rating and profit for a new product that you’re considering launching

Classify customers into those that are interested in a product or not

Classify images as being of either dogs or cats

Answer

A

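Classify rating and profit for a new product that you’re considering launching

Why: Each product receives two labels at once (a rating and a profit class). Classifying an article into exactly one of five topics is multiclass, and the other three tasks are binary.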

Multilabel Classification: A task where each instance can belong to multiple classes simultaneously.
Example: Tagging a movie as “Action,” “Comedy,” and “Drama.”
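Multilabel targets are commonly encoded as 0/1 indicator vectors with one slot per class, where several slots can be 1 at once (in multiclass data, exactly one is). A small sketch, with hypothetical helper names, using the movie-genre example:

```python
GENRES = ["Action", "Comedy", "Drama"]

def to_indicator(tags, classes):
    """Encode a set of tags as a 0/1 vector with one slot per class."""
    return [1 if c in tags else 0 for c in classes]

def is_multilabel(indicator_rows):
    """True if any instance has more than one positive label."""
    return any(sum(row) > 1 for row in indicator_rows)

movies = [{"Action", "Comedy", "Drama"}, {"Drama"}, {"Action", "Comedy"}]
Y = [to_indicator(tags, GENRES) for tags in movies]
print(Y)
print(is_multilabel(Y))
```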

Question 5

Q

Question 6

Q

Question 7

Q

Which of the following statements are true for k-fold cross-validation? Note that there may be more than one correct answer.

Fits multiple models on different splits of the data

Provides a more robust estimate of generalization performance than hold-out validation

Is less computationally expensive than hold-out validation

It makes it unnecessary to have a test set

Answer

A

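Fits multiple models on different splits of the data

Provides a more robust estimate of generalization performance than hold-out validation

Why not the others: k-fold fits k models instead of one, so it costs more than hold-out validation; and because the folds are used to select the model, a separate test set is still needed for an unbiased final estimate.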
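A minimal k-fold sketch in plain Python. It shows both true statements: every index serves as validation data exactly once, and one model is fitted per fold (so k fits in total). The function names and the mean-predicting baseline model are hypothetical; the folds here are contiguous and unshuffled, whereas real code usually shuffles first (scikit-learn's `KFold` and `cross_val_score` are the usual tools).

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; each index lands in exactly one validation fold."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        yield list(range(0, start)) + list(range(stop, n)), list(range(start, stop))
        start = stop

def cross_validate(xs, ys, k, fit, score):
    """Fit one model per fold and return the k validation scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return scores

def fit_mean(xs, ys):
    """Hypothetical baseline model: always predict the mean training target."""
    return sum(ys) / len(ys)

def neg_mae(model, xs, ys):
    """Score = negative mean absolute error (higher is better)."""
    return -sum(abs(y - model) for y in ys) / len(ys)

xs = list(range(10))
ys = [2.0 * x for x in xs]
scores = cross_validate(xs, ys, k=5, fit=fit_mean, score=neg_mae)
print(len(scores), sum(scores) / len(scores))
```

The average of the k scores is the cross-validation estimate; its spread across folds also hints at how stable that estimate is.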

Question 8

Q

Which of the following statements are true about decision trees? Note that there may be more than one correct answer.

Their predictions are relatively simple to interpret

They and the models based on them require less preprocessing of data than most other models

They rarely overfit the training data

They are sensitive to minor changes in the training data

Answer

A

Their predictions are relatively simple to interpret

They and the models based on them require less preprocessing of data than most other models

They are sensitive to minor changes in the training data
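One reason trees need little preprocessing: a split is a threshold test `x <= t`, so rescaling a feature (for example, changing units) moves the threshold but not the partition of the data, and no standardization is needed. A sketch of a single Gini-based split search, assuming one numeric feature and made-up data:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(xs, ys):
    """Return the threshold t for the split x <= t with the lowest weighted Gini impurity."""
    values = sorted(set(xs))
    best_t, best_score = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate thresholds sit between consecutive distinct values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) * gini(left) + len(right) * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "b", "a", "b", "b", "b"]
scaled = [x * 1000 for x in xs]  # e.g. a change of units, no standardization

t_raw, t_scaled = best_threshold(xs, ys), best_threshold(scaled, ys)
same_split = [x <= t_raw for x in xs] == [x <= t_scaled for x in scaled]
print(t_raw, t_scaled, same_split)
```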

Question 9

Q

Say you have trained a decision tree and achieved an accuracy of 85% on the training set and 47% on the validation set. Which of the following ideas would you pursue to improve the model’s performance? Note that there may be more than one correct answer.

Reduce the maximum allowed depth of the tree (max_depth)

Increase the maximum allowed depth of the tree (max_depth)

Reduce the minimum number of samples required to allow a node to be split (min_samples_split)

Answer

A

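Reduce the maximum allowed depth of the tree (max_depth)

Why: The gap between 85% training accuracy and 47% validation accuracy indicates overfitting, so the tree needs to be simpler. Increasing max_depth or reducing min_samples_split would let it grow even more complex.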
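A from-scratch toy tree (one numeric feature, Gini splits) to show what max_depth controls. The parameter names mirror scikit-learn's `DecisionTreeClassifier`, but this is an illustrative sketch, not that library's implementation, and the data are hypothetical with deliberate label noise. Without a depth limit the tree keeps splitting until every leaf is pure and reaches 100% training accuracy by memorizing the noise; `max_depth=1` allows a single split.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(xs, ys, max_depth=None, min_samples_split=2, depth=0):
    """Grow a one-feature classification tree.

    A leaf is a class label; an internal node is a tuple (threshold, left, right).
    """
    majority = Counter(ys).most_common(1)[0][0]
    values = sorted(set(xs))
    if ((max_depth is not None and depth >= max_depth)
            or len(ys) < min_samples_split
            or gini(ys) == 0.0
            or len(values) < 2):
        return majority  # stop: make a leaf predicting the majority class
    best_t, best_score = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) * gini(left) + len(right) * gini(right)
        if score < best_score:
            best_t, best_score = t, score

    def child(goes_left):
        pairs = [(x, y) for x, y in zip(xs, ys) if (x <= best_t) == goes_left]
        return build_tree([x for x, _ in pairs], [y for _, y in pairs],
                          max_depth, min_samples_split, depth + 1)

    return (best_t, child(True), child(False))

def predict(tree, x):
    """Follow thresholds down to a leaf label."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree

def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    _, left, right = tree
    return count_leaves(left) + count_leaves(right)

def accuracy(tree, xs, ys):
    return sum(predict(tree, x) == y for x, y in zip(xs, ys)) / len(ys)

xs = list(range(10))
ys = ["a", "a", "a", "b", "a", "b", "b", "a", "b", "b"]  # hypothetical, with label noise
full = build_tree(xs, ys)                  # no depth limit
shallow = build_tree(xs, ys, max_depth=1)  # at most one split
print(accuracy(full, xs, ys), count_leaves(full), count_leaves(shallow))
```

Lowering `min_samples_split` has the same effect as raising `max_depth`: it lets smaller and smaller nodes be split, which makes the tree more complex.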

Question 10

Q

Explain how a random forest is constructed from decision trees.

Answer

A

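Random Forest Construction: A random forest is an ensemble of decision trees. Each tree is trained on a bootstrap sample of the training data (rows drawn with replacement, i.e., bagging), and at each split only a random subset of the features is considered, which makes the trees less correlated. Predictions are made by averaging (regression) or majority vote (classification).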
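A toy sketch of the construction in plain Python, assuming depth-1 trees (stumps) so the example stays short; a real random forest grows much deeper trees. It keeps the two random ingredients (a bootstrap sample of rows per tree and a random feature subset, which for a stump with its single split is the same as a per-split subset) and aggregates by majority vote. All function names and data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, features):
    """Fit a depth-1 tree: the single best split over the given feature indices."""
    best = None  # (weighted impurity, feature, threshold, left label, right label)
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # every chosen feature is constant in this sample
        label = majority(y)
        return lambda row: label
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label

def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]     # bootstrap sample (with replacement)
        features = rng.sample(range(d), max_features)  # random feature subset
        trees.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], features))
    return trees

def predict_forest(trees, row):
    """Classification: majority vote over the trees' predictions."""
    return majority([tree(row) for tree in trees])

X = [[0, 5], [1, 4], [2, 6], [10, 15], [11, 14], [12, 16]]
y = ["a", "a", "a", "b", "b", "b"]
forest = fit_forest(X, y)
print(predict_forest(forest, [1, 5]), predict_forest(forest, [11, 15]))
```

For regression, the vote would be replaced by the mean of the trees' numeric predictions.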

Question 11

Q

Explain the difference between machine learning and machine learning engineering

Answer

A

Machine Learning: Focuses on developing models and algorithms to analyze data and make predictions.

Machine Learning Engineering: Focuses on deploying, scaling, and maintaining machine learning models in production systems.

Question 12

Q

Describe some challenges faced when putting machine learning-based systems into production. Include at least one concrete example.

Answer

A

Data Drift: Model accuracy drops as input data changes over time.
Example: A fraud detection model trained on past patterns fails with new fraud tactics.
Scalability: Ensuring models serve a large volume of requests with acceptable latency.
Monitoring: Tracking model performance in real time to detect failures or bias.
Integration: Connecting models to existing software systems and data pipelines.
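One common way to watch for data drift is to compare a feature's distribution at training time with its live distribution. The sketch below computes a Population Stability Index (PSI) in plain Python; the equal-width binning, the bin count, the epsilon, and the data are assumptions, and any alert threshold would have to be chosen per application.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a training sample and a live sample of one feature.

    Bins are equal-width over the training range; live values outside that range
    are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant training feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            i = min(max(int((v - lo) / width), 0), bins - 1)
            counts[i] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

train_amounts = [i / 100 for i in range(100)]       # hypothetical feature at training time
live_amounts = [0.5 + i / 200 for i in range(100)]  # hypothetical live data, shifted upward
print(psi(train_amounts, train_amounts), psi(train_amounts, live_amounts))
```

Each term is non-negative, so the index is 0 for identical binned distributions and grows as the live data move away from the training data.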
