ML assignment 1: Theory part Flashcards
In the context of machine learning, what is classification?
The process of grouping data into different subsets
The process of predicting a continuous output
The process of assigning labels to data points
The process of reducing the dimensionality of data
The process of assigning labels to data points
What do we mean by the term feature?
A characteristic or a property of a machine learning model
An individual measurable property or characteristic of a phenomenon being observed
The set of predictions made by a machine learning model
The type of machine learning algorithm used for a project
An individual measurable property or characteristic of a phenomenon being observed
Match the task with the type of learning involved:
Self-driving car
Image classification
Customer or market segmentation
Types of learning:
Supervised learning
Unsupervised learning
Reinforcement learning
Self-driving car - Reinforcement learning
Image classification - Supervised learning
Customer or market segmentation - Unsupervised learning
Training a model is the process of
Finding relevant data points
Finding optimal model parameters
Estimating model performance
Deploying the model to users
Finding optimal model parameters
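For illustration (not part of the original question), a minimal sketch of what training looks like in scikit-learn; the choice of LogisticRegression on the iris data is just an example:
# Sketch: fit() is the training step where the optimal parameters are found
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
model.fit(X, y)        # training: the model parameters are estimated from the data
print(model.coef_)     # the learned parameters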
Select the properties of a dataset that can pose problems for a machine learning project:
The dataset contains very many features
The dataset has very few data points
The dataset contains only numerical values
The dataset contains private, personal information
The dataset was downloaded from the Internet
The dataset has very few data points
The dataset contains private, personal information
What are reasonable methods of handling missing or corrupted data in a dataset?
Remove data points where values are missing
Remove entire features where data are missing
Replace missing values with values from a neighboring feature
Replace missing values with the mean or the median of the feature (computed from the training set)
Replace missing values with the mean or the median of the feature (computed from the test set)
Remove data points where values are missing
Remove entire features where data are missing
Replace missing values with the mean or the median of the feature (computed from the training set)
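A sketch of the last strategy, assuming scikit-learn's SimpleImputer (the toy arrays are made up):
# Sketch: the mean is computed on the training set only and then reused on the test set
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])
X_test = np.array([[np.nan, 4.0]])

imputer = SimpleImputer(strategy='mean')
X_train_filled = imputer.fit_transform(X_train)   # means computed from the training set
X_test_filled = imputer.transform(X_test)         # the same training-set means are reused here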
A dataset contains a feature named “has_computer”, where the values can be “yes”, “no”, and “unknown”. What is the best strategy for processing this feature?
No processing needed, the ML algorithm will figure things out.
Text values can’t be input to an ML algorithm, so the feature must be removed.
The text strings “yes” and “no” should be converted to the binary values True and False, and data points with “unknown” should be removed.
The text should be converted to the categorical values 1, 2, and 3.
The text should be converted to the categorical values 1, 2, and 3.
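In practice this can be done with a simple mapping from text to integer categories (a sketch; the values below are made up):
# Sketch: encode the three text values as categorical integers
mapping = {'yes': 1, 'no': 2, 'unknown': 3}
has_computer = ['yes', 'no', 'unknown', 'yes']
has_computer_encoded = [mapping[v] for v in has_computer]
print(has_computer_encoded)    # [1, 2, 3, 1]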
We want to analyse the iris dataset, and have done the following to get the data as a numpy array named X:
>>> from sklearn.datasets import load_iris
>>> dataset = load_iris()
>>> X = dataset['data']
>>> type(X)
<class 'numpy.ndarray'>
How do we print out all the values of the second column (second feature) of this array?
print(X[1, :])
print(X[2, :])
print(X[:, 1])
print(X[:, 2])
print(X[:, 1])
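To see why this is the right slice, the shapes of the two kinds of indexing can be checked in the same session (the iris data has 150 data points and 4 features):
>>> X.shape            # 150 data points, 4 features
(150, 4)
>>> X[:, 1].shape      # the whole second column: one value per data point
(150,)
>>> X[1, :].shape      # by contrast, row indexing gives one data point with its 4 features
(4,)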
In a binary classification problem, what does the confusion matrix show?
The number of data points in the training and the test set
The correlation between the different features
The mean-squared error between predictions and true values of the classes
The number of true positives, false positives, true negatives, and false negatives
The number of true positives, false positives, true negatives, and false negatives
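A minimal sketch using scikit-learn's confusion_matrix (the labels below are made up):
# Sketch: rows are the true classes, columns are the predicted classes
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
print(confusion_matrix(y_true, y_pred))
# [[2 1]    -> 2 true negatives, 1 false positive
#  [1 2]]   -> 1 false negative, 2 true positives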
We train a model to classify pictures of road vehicles into the following types: Trucks, personal cars, and taxis. The training dataset contains 250 pictures of trucks, 1750 pictures of personal cars, and 7 pictures of taxis. Why may accuracy not be the best metric for evaluating the model’s performance?
Accuracy can be misleading when it comes to performance on the minority classes
Accuracy can only be computed for binary classifiers
The interpretation of accuracy in multiclass classification is unclear
Accuracy applies only to regression tasks
Accuracy can be misleading when it comes to performance on the minority classes
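A quick back-of-the-envelope check: a useless classifier that always predicts "personal car" already gets a high accuracy on this dataset:
# Sketch: accuracy of a classifier that always answers "personal car"
n_trucks, n_cars, n_taxis = 250, 1750, 7
accuracy = n_cars / (n_trucks + n_cars + n_taxis)
print(accuracy)    # about 0.87, even though it never detects a truck or a taxi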
Increasing the threshold of a binary classifier is likely to produce which of the following effects?
False positives increase
False positives decrease
False positives and false negatives both increase
False positives and false negatives both decrease
False positives decrease
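A sketch of how the threshold works (the probabilities and thresholds below are illustrative):
# Sketch: a higher threshold means fewer points are predicted positive,
# so false positives go down (while false negatives go up)
import numpy as np

probabilities = np.array([0.2, 0.55, 0.7, 0.9])   # made-up predicted probabilities
print(probabilities >= 0.5)    # default threshold: [False  True  True  True]
print(probabilities >= 0.8)    # higher threshold:  [False False False  True]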
We want to develop a classifier that detects if students are cheating on an exam. Since we don’t want to wrongfully accuse a student of cheating, the classifier should keep false positives to a minimum. Which metric is most important to pay attention to in this case?
Accuracy
Precision
Recall
Mean squared error
Precision
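A short sketch of the reasoning (made-up labels), using scikit-learn's precision_score: precision measures how often a "cheating" flag is actually correct:
# Sketch: precision = TP / (TP + FP), recall = TP / (TP + FN)
from sklearn.metrics import precision_score, recall_score

y_true = [0, 0, 0, 1, 1, 0]    # 1 = actually cheating
y_pred = [0, 1, 0, 1, 0, 0]    # 1 = flagged as cheating
print(precision_score(y_true, y_pred))   # 0.5 - half of the accusations were wrong
print(recall_score(y_true, y_pred))      # 0.5 - and half of the cheaters were missed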
What do we call a dataset where the majority of data points belong to one class, making classification difficult?
Noisy dataset
Asymmetric dataset
Imbalanced dataset
Sparse dataset
Imbalanced dataset
What does the term overfitting refer to?
When a dataset has high dimensionality
When a model shows equal performance on the training and the testing datasets
When a model has higher recall than precision
When a model performs well on training data but fails to generalise to new data
When a model performs well on training data but fails to generalise to new data
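Overfitting usually shows up as a gap between training and test performance; a sketch of how to check this (the model and split are just an example):
# Sketch: compare the score on the training data with the score on held-out data
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)
print(model.score(X_train, y_train))   # typically 1.0: the tree fits the training data perfectly
print(model.score(X_test, y_test))     # lower on unseen data: the generalisation gap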
What is the purpose of using cross-validation in model evaluation?
To reduce the training time of the model
To ensure that data processing is applied uniformly to all data points
To remove the need for different training and test sets
To evaluate the model’s performance more reliably using multiple dataset splits
To evaluate the model’s performance more reliably using multiple dataset splits
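A minimal sketch with scikit-learn's cross_val_score (the model and data are just an example):
# Sketch: 5-fold cross-validation gives five performance estimates instead of one
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)          # one accuracy score per fold
print(scores.mean())   # a more reliable overall estimate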