CS35 - First Quiz Flashcards
The mathematical discipline that studies the methods of collecting, analysing, and interpreting data.
Statistics
Subset or subcollection of the population
Sample
Entire collection of items of interest
Population
Logic is built based on business rules
Traditional Rule-Based AI
Logic is built by modelling and training data
Machine Learning
Type of machine learning in which the training data includes both the inputs and their known outputs
Supervised
Training data consists of only input without any known output
Unsupervised
The model predicts whether a record is an instance of a specific class or category
Binary Classification
The label predicted by the model is a numeric value
Regression
The model predicts whether a record is an instance of one of multiple classes or categories
Multiclass Classification
Model identifies similarities between observations based on their features and groups them into discrete clusters
Clustering
Inputs are called
Feature values
Outputs are called
Label values
Proportion of predictions that the model got right
Accuracy
Proportion of predicted positive cases where the true label is actually positive
Precision
Proportion of actual positive cases that the model identified correctly
Recall
Overall metric combining precision and recall (their harmonic mean)
F1 Score
Formula of accuracy
(TN + TP)/(TN + FN + FP + TP)
Formula of precision
TP / (TP + FP)
Formula of recall
TP / (TP + FN)
Formula of F1 score
(2 x Precision x Recall) / (Precision + Recall)
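Worked example for the four formulas above: a minimal Python sketch that computes accuracy, precision, recall, and F1 score from confusion-matrix counts. The function name classification_metrics and the choice to return 0.0 when a denominator is zero are assumptions made for illustration, not part of the course material.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""

    def safe_div(numerator, denominator):
        # Assumed convention: report 0.0 when the denominator is zero.
        return numerator / denominator if denominator else 0.0

    accuracy = safe_div(tn + tp, tn + fn + fp + tp)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: 40 TP, 10 FP, 20 FN, 30 TN
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 70/100 = 0.700, precision is 40/50 = 0.800, recall is 40/60 ≈ 0.667, and F1 is ≈ 0.727.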
Available data set is split into:
Training data and test data
Typical share of the data set used for training is around:
70% to 80%
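Worked example for the split above: a minimal Python sketch that shuffles a data set and divides it into training and test data. The helper name split_train_test, the 0.8 default, and the fixed seed are assumptions chosen for illustration; libraries such as scikit-learn provide their own split utilities.

```python
import random


def split_train_test(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records, then split into (train, test)."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data set of 10 records
train, test = split_train_test(range(10), train_fraction=0.8)
print(len(train), len(test))
```

Shuffling before cutting matters when the records are stored in some order (for example, sorted by label), since an unshuffled split could leave a class out of the training data.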
A technique to balance uneven datasets by keeping all of the data in the minority class and decreasing the size of the majority class.
Undersampling
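Worked example for undersampling: a minimal Python sketch that keeps every minority-class record and randomly samples the larger class down. The function name undersample is hypothetical, and shrinking the majority class all the way to the minority class size is an assumption; the definition above only says the majority class is decreased.

```python
import random
from collections import Counter


def undersample(records, labels, seed=0):
    """Keep all minority-class records; sample larger classes down to that size."""
    counts = Counter(labels)
    minority_label, minority_size = min(counts.items(), key=lambda kv: kv[1])
    rng = random.Random(seed)

    kept = []
    for label in counts:
        indices = [i for i, y in enumerate(labels) if y == label]
        if label != minority_label:
            # Assumption: reduce to exactly the minority class size.
            indices = rng.sample(indices, minority_size)
        kept.extend(indices)

    kept.sort()  # preserve the original record order
    return [records[i] for i in kept], [labels[i] for i in kept]


# Hypothetical uneven data set: 2 positive records, 8 negative records
records = list(range(10))
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
balanced_records, balanced_labels = undersample(records, labels)
print(Counter(balanced_labels))
```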