CS35 - First Quiz Flashcards

1
Q

The mathematical discipline that studies the methods of collecting, analysing, and interpreting data.

A

Statistics

2
Q

Subset or subcollection of the population

A

Sample

3
Q

Specific collection of items of interest

A

Population

4
Q

Logic is built based on business rules

A

Traditional Rule-Based AI

5
Q

Logic is built by modelling and training data

A

Machine Learning

6
Q

Machine learning algorithm in which the training data includes both input and output

A

Supervised

7
Q

Training data consists of only input without any known output

A

Unsupervised

8
Q

The model predicts whether a record is an instance of a specific class or category

A

Binary Classification

8
Q

The label predicted by the model is a numeric value

A

Regression

9
Q

The model predicts whether a record is an instance of one of multiple classes or categories

A

Multiclass Classification

10
Q

Model identifies similarities between observations based on their features and groups them into discrete clusters

A

Clustering
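
As a rough illustration of grouping observations by similarity, here is a minimal one-dimensional k-means sketch. The function name `kmeans_1d`, the starting centroids, and the sample values are all assumptions made for this example; they do not come from the course material, and k-means is only one of several clustering methods.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Assign each value to its nearest centroid, then move centroids to the group means."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Similarity here is simply distance on the single feature.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical feature values with two obvious groups.
centroids, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.1], centroids=[0.0, 5.0])
```

No labels are supplied anywhere, which is what makes this an unsupervised task.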

11
Q

Inputs are called

A

Feature values

12
Q

Outputs are called

A

Label values

13
Q

Proportion of predictions that the model got right

A

Accuracy

14
Q

Proportion of predicted positive cases where the true label is actually positive

A

Precision

15
Q

Proportion of actual positive cases that the model identified correctly

A

Recall

16
Q

Overall metric combining recall and precision

A

F1 score

17
Q

Formula of accuracy

A

(TN + TP)/(TN + FN + FP + TP)

18
Q

Formula of precision

A

TP / (TP + FP)

19
Q

Formula of recall

A

TP / (TP + FN)

20
Q

Formula of F1 score

A

(2 x Precision x Recall) / (Precision + Recall)
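
The four formulas (accuracy, precision, recall, F1) can be checked with a small sketch that takes the confusion-matrix counts TP, FP, FN, and TN. The function name `classification_metrics` and the example counts are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention, not something the cards specify.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only for illustration.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

With these counts, accuracy is 70/100, precision is 40/50, recall is 40/60, and F1 works out to 8/11.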

21
Q

Available data set is split into:

A

Training data and test data

22
Q

Typical share of the data set used for training:

A

70% to 80%
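
A minimal sketch of splitting a data set into training and test portions, using only the standard library. The helper name `train_test_split`, the default of 80%, and the fixed seed are assumptions made for this example.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into training and test lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 100 hypothetical records, 80% used for training.
train, test = train_test_split(range(100), train_fraction=0.8)
```

Shuffling before cutting avoids a split that simply follows the original record order.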

23
Q

A technique to balance uneven datasets by keeping all of the data in the minority class and decreasing the size of the majority class.

A

Undersampling
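
A rough sketch of random undersampling: every class is cut down to the size of the smallest class, so the minority class is kept in full. The function name `undersample` and the toy labels are hypothetical, and picking the discarded records at random is an assumed strategy.

```python
import random
from collections import defaultdict


def undersample(records, labels, seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    by_class = defaultdict(list)
    for record, label in zip(records, labels):
        by_class[label].append(record)
    target = min(len(items) for items in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for label, items in by_class.items():
        # The smallest class is kept whole; larger classes are sampled down.
        kept = items if len(items) == target else rng.sample(items, target)
        balanced.extend((record, label) for record in kept)
    return balanced


# Hypothetical data: 9 majority records and 3 minority records.
records = list(range(12))
labels = ["neg"] * 9 + ["pos"] * 3
balanced = undersample(records, labels)
```

The trade-off is that the discarded majority records carry information the model never sees.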

24
Q

It works by creating synthetic examples for the minority class rather than simply duplicating existing instances. These synthetic examples are generated by interpolating between existing instances of the minority class.

A

SMOTE (Synthetic Minority Oversampling Technique)

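
The interpolation idea can be sketched as follows. This is a simplified, SMOTE-like illustration and not a reference implementation: the function name `smote_like`, the choice of `k`, and the sample points are all assumptions.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        others = minority[:idx] + minority[idx + 1:]
        # Pick one of the k nearest minority neighbours of the base point.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class points with two features each.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]
new_points = smote_like(minority, n_new=4)
```

Each synthetic point lies on the line segment between two existing minority points, so it is new without being an exact duplicate.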
25
Q

Occurs when one class is significantly more frequent than the other.

A

Class Imbalance

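
One quick way to spot class imbalance is to count the labels and compare the largest class with the smallest. The helper name `imbalance_ratio` and the example labels are hypothetical, and the cards do not define a threshold for what counts as "significantly" more frequent.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical labels: 95 ordinary records and 5 fraud records.
labels = ["ok"] * 95 + ["fraud"] * 5
ratio = imbalance_ratio(labels)
```

A ratio close to 1 suggests balanced classes; a large ratio, as in this example, suggests imbalance.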
26
Q

Used to address class imbalance.

A

Undersampling the majority class or oversampling the minority class (e.g., SMOTE)