Lecture 11 Flashcards

1
Q

What are the three main datasets used in supervised machine learning?

A

Training set, validation set, and test set.
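As a rough illustration, a three-way split can be sketched as below. The function name `split_dataset`, the 60/20/20 proportions, and the fixed seed are assumptions made for this example, not something the lecture prescribes.

```python
import random

def split_dataset(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of the data and cut it into train, validation, and test parts."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed so the split is reproducible
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains is held out for final evaluation
    return train, val, test

train, val, test = split_dataset(range(10))
```

With 10 items and these fractions, the three parts hold 6, 2, and 2 items and do not overlap.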

2
Q

What is the purpose of a validation set?

A

To tune hyperparameters and improve model performance without using the test set.

3
Q

Why is it important not to ‘peek’ at the test set?

A

To ensure the model is evaluated on completely unseen data, so that the reported performance is not optimistically biased by choices tuned to the test set.

4
Q

What is a feature in machine learning?

A

An attribute-value pair representing one characteristic of a data point.

5
Q

What is the goal of feature selection?

A

To choose the most relevant attributes that contribute to a model’s predictive power.

6
Q

What is a feature vector?

A

A numerical representation of a data point using selected features.

7
Q

What is binary classification?

A

A classification problem where there are only two possible classes.

8
Q

What is a probabilistic classifier?

A

A classifier that assigns probabilities to each class and selects the one with the highest probability.

9
Q

What is k-Nearest Neighbour (k-NN)?

A

A classification algorithm that assigns a class based on the majority class of the k closest data points.

10
Q

How does k-NN measure similarity?

A

Using distance metrics such as Euclidean distance between feature vectors.
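A minimal sketch of k-NN classification with Euclidean distance follows. The names `euclidean` and `knn_predict` and the toy training data are made up for illustration. Ties in the vote are broken by whichever label `Counter` returns first.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """train is a list of (feature_vector, label) pairs."""
    # Keep the k training points closest to the query.
    nearest = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [
    ([1.0, 1.0], "a"),
    ([1.2, 0.8], "a"),
    ([0.9, 1.1], "a"),
    ([5.0, 5.0], "b"),
    ([5.5, 4.5], "b"),
]
label_near_a = knn_predict(train, [1.0, 0.9], k=3)
label_near_b = knn_predict(train, [5.2, 5.0], k=3)
```

Here the first query is labelled "a" (all three neighbours are "a") and the second "b" (two of its three neighbours are "b").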

11
Q

What is the role of text features in machine learning?

A

They convert text into numerical representations for processing by algorithms.

12
Q

What is one way to represent text features in machine learning?

A

Using a binary vector where 1 indicates a word’s presence and 0 indicates absence.
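A small sketch of this binary representation, assuming whitespace tokenisation and a fixed, hand-picked vocabulary (both are simplifications chosen for this example):

```python
def binary_vector(text, vocabulary):
    """Return 1 for each vocabulary word present in the text, else 0."""
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in vocabulary]

vocabulary = ["cat", "dog", "sat", "mat"]
vec = binary_vector("The cat sat on the mat", vocabulary)
# vec == [1, 0, 1, 1]
```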

13
Q

What is term frequency-inverse document frequency (tf-idf)?

A

A weighting method that measures how important a word is in a document relative to a corpus.
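tf-idf has several variants. The sketch below assumes one common form, with tf as the term's relative frequency in the document and idf as log(N / df). The lecture may use a different weighting, and the function name `tf_idf` is illustrative.

```python
import math

def tf_idf(term, doc, corpus):
    """doc is a list of tokens; corpus is a list of such documents."""
    tf = doc.count(term) / len(doc)           # relative frequency in this document
    df = sum(1 for d in corpus if term in d)  # number of documents containing the term
    if df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)          # rarer terms get a larger weight
    return tf * idf

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
score_the = tf_idf("the", corpus[0], corpus)  # appears in every document
score_dog = tf_idf("dog", corpus[1], corpus)  # appears in only one document
```

Because "the" occurs in every document its idf is log(1) = 0, so its score is 0, while "dog" gets a positive weight.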

14
Q

What is Hamming distance?

A

A distance measure that counts the number of positions at which two equal-length feature vectors differ (a smaller distance means more similar).
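As a short sketch (the function name `hamming_distance` is illustrative), the Hamming distance between two equal-length vectors can be computed as:

```python
def hamming_distance(a, b):
    """Count the positions where two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

d = hamming_distance([1, 0, 1, 1], [1, 1, 0, 1])
# d == 2 (the vectors differ in the second and third positions)
```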

15
Q

What is Euclidean distance?

A

A distance measure given by the straight-line distance between two points in a multi-dimensional space (a smaller distance means more similar).
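A worked example may help here. The helper name `euclidean_distance` is an assumption for this sketch, and the points are chosen so the result is the familiar 3-4-5 triangle.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared differences across dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d = euclidean_distance([0, 0], [3, 4])
# d == 5.0, since sqrt(3**2 + 4**2) = sqrt(25)
```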

16
Q

What is cosine similarity?

A

A metric that uses the cosine of the angle between two vectors to determine their similarity, ignoring their magnitudes.
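A minimal sketch, assuming plain Python lists as vectors. Returning 0.0 for a zero vector is a convention chosen for this example, since the cosine is undefined in that case.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1, 2], [2, 4])  # parallel vectors, close to 1
orthogonal = cosine_similarity([1, 0], [0, 1])      # perpendicular vectors, 0
```

Note that [1, 2] and [2, 4] differ in length but point the same way, so their similarity is (up to floating-point rounding) 1.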

17
Q

What is the purpose of stemming in text processing?

A

To reduce words to a common stem (e.g., 'running' and 'runs' to 'run'), improving consistency in feature extraction.

18
Q

Why is stop word removal useful in text classification?

A

It eliminates very common words (e.g., ‘the’, ‘and’) that carry little information for distinguishing between classes.
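A small sketch of stop word removal. The stop list here is a tiny hypothetical one made up for this example; real systems typically use a much longer list.

```python
# Hypothetical, deliberately tiny stop list for illustration only.
STOP_WORDS = {"the", "and", "a", "of", "on"}

def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop words found in the stop list."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

tokens = remove_stop_words("The cat and the dog sat on a mat")
# tokens == ["cat", "dog", "sat", "mat"]
```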

19
Q

What is cross-validation?

A

A technique where the data is repeatedly split into training and held-out sets (for example, k folds, each used once for testing), and performance is averaged across the splits.
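One common form is k-fold cross-validation. The sketch below only generates the index splits; the function name `k_fold_splits` is illustrative, and it assumes the data has already been shuffled if order matters.

```python
def k_fold_splits(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering every sample once as test."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    splits = []
    start = 0
    for fold in range(k):
        # The first `remainder` folds take one extra sample each.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        splits.append((train_idx, test_idx))
        start = stop
    return splits

splits = k_fold_splits(6, 3)
# splits[0] == ([2, 3, 4, 5], [0, 1])
```

A model would be trained and scored once per split, and the k scores averaged.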

20
Q

What is a confusion matrix?

A

A table that summarizes a classification model’s performance by showing true and false positives and negatives.

21
Q

What is the formula for accuracy in classification?

A

(TP + TN) / Total Samples, i.e. (TP + TN) / (TP + TN + FP + FN).

22
Q

What is precision in classification?

A

The fraction of correctly predicted positive instances out of all predicted positive instances.

23
Q

What is recall in classification?

A

The fraction of correctly predicted positive instances out of all actual positive instances.
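The confusion-matrix counts and the accuracy, precision, and recall definitions can be tied together in a short sketch. The function name `confusion_counts` and the toy labels are invented for illustration, with 1 taken as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)  # 3, 2, 2, 1
accuracy = (tp + tn) / (tp + tn + fp + fn)         # 5 / 8 = 0.625
precision = tp / (tp + fp)                         # 3 / 5 = 0.6
recall = tp / (tp + fn)                            # 3 / 4 = 0.75
```

The three metrics differ on this data, which shows that a model can miss few positives (recall 0.75) while still raising a fair number of false alarms (precision 0.6).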

24
Q

What is mean squared error (MSE) used for?

A

Evaluating the performance of regression models by measuring the average squared differences between predicted and actual values.
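A minimal sketch of the calculation (the function name `mean_squared_error` is chosen for this example):

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

mse = mean_squared_error([3, 5, 2], [2, 5, 4])
# Squared errors are 1, 0, and 4, so mse = 5 / 3
```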

25
Q

What is precision@k used for?

A

An evaluation metric in ranking tasks that measures the fraction of relevant items in the top k recommendations.
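A short sketch, assuming the ranking is a list ordered from best to worst and the relevant items are known as a set (the function name and the data are illustrative):

```python
def precision_at_k(ranked_items, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "e"}
p_at_3 = precision_at_k(ranked, relevant, 3)  # "a" and "c" are hits: 2 / 3
p_at_5 = precision_at_k(ranked, relevant, 5)  # 3 / 5
```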