CS35 - First Quiz Flashcards by Rusyl Anne Espiña

The mathematical discipline that studies the methods of collecting, analysing, and interpreting data.

Statistics

Subset or subcollection of the population

Sample

Specific collection of items of interest

Population

Logic is built based on business rules

Traditional Rule-Based AI

Logic is built by modelling and training data

Machine Learning

Machine learning algorithm in which the training data includes both input and output

Supervised

Training data consists of only input without any known output

Unsupervised

The model predicts whether a record is an instance of a specific class or category

Binary Classification

The label predicted by the model is a numeric value

Regression

The model predicts whether a record is an instance of one of multiple classes or categories

Multiclass Classification

Model identifies similarities between observations based on their features and groups them into discrete clusters

Clustering

Inputs are called

Feature values

Outputs are called

Label values

Proportion of predictions that the model got right

Accuracy

Proportion of predicted positive cases where the true label is actually positive

Precision

Proportion of positive cases that the model identified correctly

Recall

Overall metric combining recall and precision

F1 Score

Formula of accuracy

(TN + TP)/(TN + FN + FP + TP)

Formula of precision

TP / (TP + FP)

Formula of recall

TP / (TP + FN)

Formula of F1 score

(2 × Precision × Recall) / (Precision + Recall)
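
The four formulas above can be checked with a small sketch. The function name `classification_metrics` and the example counts are illustrative assumptions, not part of the original deck; TP, TN, FP and FN are the confusion-matrix counts used in the formulas.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    # Accuracy: (TN + TP) / (TN + FN + FP + TP)
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: TP / (TP + FP)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: TP / (TP + FN)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: (2 x Precision x Recall) / (Precision + Recall)
    denom = precision + recall
    f1 = (2 * precision * recall / denom) if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only for illustration.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

The guards against a zero denominator are a design choice of this sketch; returning 0.0 in that case is one common convention, not the only one.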

Available data set is split into:

Training data and test data

Typical proportion of the data set used for training:

70% to 80%
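
A minimal sketch of such a split, using only the standard library. The helper name `train_test_split`, the 80% fraction and the fixed seed are assumptions made for illustration.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=0):
    """Shuffle the records and split them into training and test lists."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 100 dummy records: 80 go to training, 20 to testing.
train, test = train_test_split(range(100), train_fraction=0.8)
print(len(train), len(test))
```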

A technique to balance uneven datasets by keeping all of the data in the minority class and decreasing the size of the majority class.

Undersampling
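
Random undersampling can be sketched as follows. This is an illustrative version, assuming records are dropped at random from every class larger than the smallest one; the function name `undersample` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def undersample(records, labels, seed=0):
    """Keep all minority-class records and randomly shrink larger classes to match."""
    by_class = defaultdict(list)
    for record, label in zip(records, labels):
        by_class[label].append(record)

    minority_size = min(len(items) for items in by_class.values())
    rng = random.Random(seed)

    balanced = []
    for label, items in by_class.items():
        # The minority class is kept whole; larger classes are sampled down.
        kept = items if len(items) == minority_size else rng.sample(items, minority_size)
        balanced.extend((record, label) for record in kept)
    return balanced


# Toy data: 2 records of class 1 (minority) and 8 of class 0 (majority).
records = list(range(10))
labels = [1, 1] + [0] * 8
balanced = undersample(records, labels)
print(balanced)
```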

It works by creating synthetic examples for the minority class rather than simply duplicating existing instances. These synthetic examples are generated by interpolating between existing instances of the minority class.

Synthetic Minority Oversampling Technique (SMOTE)
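
The interpolation step can be sketched in a few lines. This is a simplified, SMOTE-like illustration and not a reference implementation: the function name `smote_like`, the neighbour count `k` and the toy points are assumptions, and it expects at least two minority points.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority points, by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        # A random position on the line segment from base to the neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Three hypothetical minority-class points with two features each.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]
synthetic = smote_like(minority, n_new=5)
print(synthetic)
```

Because each new point lies on a segment between two existing minority points, it is a new example rather than a duplicate of an existing one.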

Occurs when one class is significantly more frequent than the other.

Class Imbalance

Used to address imbalance.

Undersampling the majority class (or oversampling the minority class, e.g. with SMOTE)