CS35 - First Quiz Flashcards

1
Q

The mathematical discipline that studies the methods of collecting, analysing, and interpreting data.

A

Statistics

2
Q

Subset or subcollection of the population

A

Sample

3
Q

Specific collection of items of interest

A

Population

4
Q

Logic is built based on business rules

A

Traditional Rule-Based AI

5
Q

Logic is built by modelling and training data

A

Machine Learning

6
Q

Type of machine learning in which the training data includes both the inputs and the known outputs

A

Supervised

7
Q

Training data consists of only input without any known output

A

Unsupervised

8
Q

The model predicts whether a record is an instance of a specific class or category

A

Binary Classification

8
Q

The label predicted by the model is a numeric value

A

Regression

9
Q

The model predicts whether a record is an instance of one of multiple classes or categories

A

Multiclass Classification

10
Q

Model identifies similarities between observations based on their features and groups them into discrete clusters

A

Clustering

11
Q

Inputs are called

A

Feature values

12
Q

Outputs are called

A

Label values

13
Q

Proportion of predictions that the model got right

A

Accuracy

14
Q

Proportion of predicted positive cases where the true label is actually positive

A

Precision

15
Q

Proportion of actual positive cases that the model identified correctly

A

Recall

16
Q

Overall metric combining recall and precision

A

F1 Score

17
Q

Formula of accuracy

A

(TN + TP)/(TN + FN + FP + TP)

18
Q

Formula of precision

A

TP / (TP + FP)

19
Q

Formula of recall

A

TP / (TP + FN)

20
Q

Formula of F1 score

A

(2 x Precision x Recall) / (Precision + Recall)
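
Note: the four formulas above can be checked with a short sketch. The function name `classification_metrics` and the confusion-matrix counts are illustrative assumptions, not part of the deck; a zero denominator is treated as 0.0 here, which is one common convention rather than the only one.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    # Accuracy: (TN + TP) / (TN + FN + FP + TP)
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: TP / (TP + FP)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: TP / (TP + FN)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: (2 x Precision x Recall) / (Precision + Recall)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a binary classifier evaluated on 100 records.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 85/100 and recall is 40/50, while precision is 40/45, so a model can score differently on each metric for the same predictions.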

21
Q

Available data set is split into:

A

Training data and test data

22
Q

Typical share of the available data set used for training:

A

70% to 80%
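
Note: a minimal sketch of a training/test split, assuming a simple random shuffle. The helper name `split_train_test`, the 0.8 fraction, and the seed are hypothetical choices for illustration; real projects often use a library routine instead.

```python
import random


def split_train_test(records, train_fraction=0.8, seed=0):
    """Shuffle the records and split them into training and test portions."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data set of 100 record ids, split 80% / 20%.
train, test = split_train_test(range(100), train_fraction=0.8)
print(len(train), len(test))
```

Every record ends up in exactly one of the two portions, so the test data stays unseen during training.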

23
Q

A technique to balance uneven datasets by keeping all of the data in the minority class and decreasing the size of the majority class.

A

Undersampling
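
Note: a rough sketch of random undersampling, assuming records and labels arrive as two parallel sequences. The function name `undersample` and the 90/10 example data are hypothetical; every class is trimmed to the size of the smallest class, so the minority class is kept whole.

```python
import random
from collections import defaultdict


def undersample(records, labels, seed=0):
    """Randomly drop majority-class records until all classes are equally sized."""
    by_class = defaultdict(list)
    for record, label in zip(records, labels):
        by_class[label].append(record)
    minority_size = min(len(items) for items in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for label, items in by_class.items():
        # Keep the smallest class whole; sample the larger ones down to its size.
        if len(items) == minority_size:
            kept = items
        else:
            kept = rng.sample(items, minority_size)
        balanced.extend((record, label) for record in kept)
    return balanced


# Hypothetical imbalanced data: 90 records of class 0 and 10 of class 1.
records = list(range(100))
labels = [0] * 90 + [1] * 10
balanced = undersample(records, labels)
print(len(balanced))
```

The cost of this approach is that information in the discarded majority-class records is lost.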

24
Q

It works by creating synthetic examples for the minority class rather than simply duplicating existing instances. These synthetic examples are generated by interpolating between existing instances of the minority class.

A

SMOTE (Synthetic Minority Oversampling Technique)
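
Note: the interpolation idea can be sketched in a few lines. This is a simplified illustration, not a full SMOTE implementation: the function name `smote_sketch`, the neighbour count `k`, and the sample points are assumptions, and the neighbour search is a plain brute-force distance sort.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority points by interpolating toward near neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k closest other minority points (brute-force search).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # A random position on the line segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class points with two features each.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority_points, n_synthetic=5)
print(len(new_points))
```

Because each new point lies between two existing minority points, the synthetic examples are not exact duplicates of the originals.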

25
Q

Occurs when one class is significantly more frequent than the other.

A

Class Imbalance

26
Q

A technique used to address class imbalance.

A

Undersampling the majority class (or oversampling the minority class, e.g. with SMOTE)