ML assignment 1: Theory part Flashcards

1
Q

In the context of machine learning, what is classification?

The process of grouping data into different subsets

The process of predicting a continuous output

The process of assigning labels to data points

The process of reducing the dimensionality of data

A

The process of assigning labels to data points

2
Q

What do we mean by the term feature?

A characteristic or a property of a machine learning model

An individual measurable property or characteristic of a phenomenon being observed

The set of predictions made by a machine learning model

The type of machine learning algorithm used for a project

A

An individual measurable property or characteristic of a phenomenon being observed

3
Q

Match each task with the type of learning involved:

Tasks:
Self-driving car
Image classification
Customer or market segmentation

Types of learning:
Supervised learning
Unsupervised learning
Reinforcement learning

A

Self-driving car - Reinforcement learning
Image classification - Supervised learning
Customer or market segmentation - Unsupervised learning

4
Q

Training a model is the process of

Finding relevant data points

Finding optimal model parameters

Estimating model performance

Deploying the model to users

A

Finding optimal model parameters

5
Q

Select the properties of a dataset that can pose problems for a machine learning project:

The dataset contains very many features

The dataset has very few data points

The dataset contains only numerical values

The dataset contains private, personal information

The dataset was downloaded from the Internet

A

The dataset has very few data points
The dataset contains private, personal information

6
Q

What are reasonable methods of handling missing or corrupted data in a dataset?

Remove data points where values are missing

Remove entire features where data are missing

Replace missing values with values from a neighboring feature

Replace missing values with the mean or the median of the feature (computed from the training set)

Replace missing values with the mean or the median of the feature (computed from the test set)

A

Remove data points where values are missing
Remove entire features where data are missing
Replace missing values with the mean or the median of the feature (computed from the training set)
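
The reason the fill value comes from the training set is that the test set should stay unseen until evaluation. A minimal sketch of that idea in plain Python follows; the helper names `fit_median` and `impute`, and the toy values, are assumptions made for illustration and are not part of the assignment.

```python
from statistics import median

def fit_median(train_column):
    """Median of the observed (non-missing) training values."""
    observed = [v for v in train_column if v is not None]
    return median(observed)

def impute(column, fill_value):
    """Replace missing entries (None) with the given fill value."""
    return [fill_value if v is None else v for v in column]

# Toy feature column; None marks a missing value (illustrative data).
train = [1.0, None, 3.0, 5.0]
test = [None, 2.0]

fill = fit_median(train)            # computed from the training set only
train_filled = impute(train, fill)
test_filled = impute(test, fill)    # the training-set median is reused here
```

The test column is filled with the value learned from the training column, so no information from the test set influences the preprocessing.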

7
Q

A dataset contains a feature named “has_computer”, where the values can be “yes”, “no”, and “unknown”. What is the best strategy for processing this feature?

No processing needed, the ML algorithm will figure things out.

Text values can’t be input to an ML algorithm, so the feature must be removed.

The text strings “yes” and “no” should be converted to the binary values True and False, and data points with “unknown” should be removed.

The text should be converted to the categorical values 1, 2, and 3.

A

The text should be converted to the categorical values 1, 2, and 3.
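
As a rough illustration of this answer, the three strings can be mapped to integer codes. The particular assignment in `CODES` below is an arbitrary assumption; the card only says that the values become 1, 2, and 3. Note that some algorithms interpret integer codes as ordered, in which case a one-hot encoding is a common alternative.

```python
# Hypothetical mapping; which string gets which code is an arbitrary choice.
CODES = {"no": 1, "yes": 2, "unknown": 3}

def encode(values):
    """Convert text categories to integer codes, keeping 'unknown' as its own category."""
    return [CODES[v] for v in values]

has_computer = ["yes", "no", "unknown", "yes"]
encoded = encode(has_computer)
```

Keeping "unknown" as a separate category means no data points have to be discarded.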

8
Q

We want to analyse the iris dataset, and have done the following to get the data as a NumPy array named X:

>>> from sklearn.datasets import load_iris
>>> dataset = load_iris()
>>> X = dataset['data']
>>> type(X)
<class 'numpy.ndarray'>

How do we print out all the values of the second column (second feature) of this array?

print(X[1, :])

print(X[2, :])

print(X[:, 1])

print(X[:, 2])

A

print(X[:, 1])
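
NumPy indexes a 2-D array as `[row, column]`, counting from zero, so the second column has index 1. The sketch below uses a small stand-in array (an assumption, not the full iris data) to contrast column and row selection.

```python
import numpy as np

# Small stand-in array with 3 rows (data points) and 4 columns (features).
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [4.7, 3.2, 1.3, 0.2]])

second_column = X[:, 1]  # all rows, column index 1 (second feature)
second_row = X[1, :]     # row index 1 (second data point), all columns

print(second_column)
print(second_row)
```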

9
Q

In a binary classification problem, what does the confusion matrix show?

The number of data points in the training and the test set

The correlation between the different features

The mean-squared error between predictions and true values of the classes

The number of true positives, false positives, true negatives, and false negatives

A

The number of true positives, false positives, true negatives, and false negatives
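
To make the four counts concrete, here is a minimal sketch that tallies them by hand for labels encoded as 0 (negative) and 1 (positive). The function name `confusion_counts` and the toy labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels encoded as 0 and 1."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1   # predicted positive, actually positive
        elif p == 1 and t == 0:
            fp += 1   # predicted positive, actually negative
        elif p == 0 and t == 0:
            tn += 1   # predicted negative, actually negative
        else:
            fn += 1   # predicted negative, actually positive
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
```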

10
Q

We train a model to classify pictures of road vehicles into the following types: trucks, personal cars, and taxis. The training dataset contains 250 pictures of trucks, 1750 pictures of personal cars, and 7 pictures of taxis. Why might accuracy not be the best metric for evaluating the model’s performance?

Accuracy can be misleading when it comes to performance on the minority classes

Accuracy can only be computed for binary classifiers

The interpretation of accuracy in multiclass classification is unclear

Accuracy applies only to regression tasks

A

Accuracy can be misleading when it comes to performance on the minority classes
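
A quick way to see this is to score a trivial classifier that always predicts the majority class. The sketch below uses the class counts from the question and assumes, for illustration only, that the model is evaluated on data with that same class distribution.

```python
# Class counts taken from the question.
counts = {"truck": 250, "car": 1750, "taxi": 7}

y_true = [label for label, n in counts.items() for _ in range(n)]
y_pred = ["car"] * len(y_true)  # hypothetical model that always predicts "car"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(label):
    """Fraction of the data points with this true label that were predicted correctly."""
    predictions = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in predictions) / len(predictions)

print(f"accuracy = {accuracy:.3f}")
print(f"taxi recall = {recall('taxi'):.3f}")
```

The accuracy is 1750/2007, roughly 0.87, even though not a single taxi or truck is recognised.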

11
Q

Increasing the threshold of a binary classifier is likely to produce which of the following effects?

False positives increase

False positives decrease

False positives and false negatives both increase

False positives and false negatives both decrease

A

False positives decrease
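
The effect can be checked on a handful of scores. In this sketch a data point is predicted positive when its score is at or above the threshold; the scores, labels, and the helper `count_errors` are made up for illustration.

```python
def count_errors(scores, y_true, threshold):
    """Return (false positives, false negatives) at the given decision threshold."""
    fp = sum(1 for s, t in zip(scores, y_true) if s >= threshold and t == 0)
    fn = sum(1 for s, t in zip(scores, y_true) if s < threshold and t == 1)
    return fp, fn

# Illustrative classifier scores and true labels (1 = positive, 0 = negative).
scores = [0.1, 0.4, 0.55, 0.6, 0.8, 0.9]
y_true = [0, 0, 1, 0, 1, 1]

low = count_errors(scores, y_true, 0.5)    # lower threshold
high = count_errors(scores, y_true, 0.7)   # higher threshold
```

With these toy values, raising the threshold from 0.5 to 0.7 removes the one false positive but introduces a false negative, which is the usual trade-off.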

12
Q

We want to develop a classifier that detects if students are cheating on an exam. Since we don’t want to wrongfully accuse a student of cheating, the classifier should keep false positives to a minimum. Which metric is most important to pay attention to in this case?

Accuracy

Precision

Recall

Mean squared error

A

Precision
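
Precision is TP / (TP + FP), so it drops whenever false positives rise, while recall is TP / (TP + FN) and ignores false positives entirely. A small sketch with made-up counts follows; returning 0.0 when the denominator is zero is a convention assumed here, not something the card specifies.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual positives that the classifier found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Illustrative counts: 8 cheaters caught, 2 students wrongly flagged, 12 cheaters missed.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=12)
```

Here the precision is 0.8 and the recall is 0.4: the classifier misses many cheaters, but most of the students it flags really did cheat.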

13
Q

What do we call a dataset where the majority of data points belong to one class, making classification difficult?

Noisy dataset

Asymmetric dataset

Imbalanced dataset

Sparse dataset

A

Imbalanced dataset

14
Q

What does the term overfitting refer to?

When a dataset has high dimensionality

When a model shows equal performance on the training and the testing datasets

When a model has higher recall than precision

When a model performs well on training data but fails to generalise to new data

A

When a model performs well on training data but fails to generalise to new data

15
Q

What is the purpose of using cross-validation in model evaluation?

To reduce the training time of the model

To ensure that data processing is applied uniformly to all data points

To remove the need for different training and test sets

To evaluate the model’s performance more reliably using multiple dataset splits

A

To evaluate the model’s performance more reliably using multiple dataset splits
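
The "multiple dataset splits" can be sketched as k-fold index generation: each data point lands in the test fold exactly once, and the model would be trained and scored once per split. The function `k_fold_indices` is a hypothetical helper written for this illustration; it uses contiguous folds without shuffling, which is a simplification.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

splits = list(k_fold_indices(6, 3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Averaging the score over all splits gives an estimate that depends less on one particular train/test division.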
