Lesson 1 Flashcards
What are the 4 steps of machine learning?
- Collect data
- Train a learning algorithm
- Evaluate the model
- Deploy the model
What are the 2 types of supervised learning?
Regression and classification
What 3 kinds of machine learning are there?
Supervised, unsupervised, and reinforcement learning
Can a regression target be a positive number?
Yes. The target is a real number, so it can be positive, negative, or zero.
Give 3 classification tasks, each with an example.
- Binary classification (grant a loan or not, labels 0 and 1)
- Multiclass classification (identifying bird species, more than two possible classes)
- Sequence labeling (partitioning a sound sample into words)
What are all the data split sets and what are they used for?
- Train, used to fit the model (learn its parameters from the data).
- Validation, used to monitor performance and choose the best hyperparameters.
- Test, used once at the end to evaluate generalization to unseen, real-world data.
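The three sets can be sketched as a shuffled index split. This is a minimal illustration only: the function name `train_val_test_split`, the 60/20/20 proportions, and the fixed seed are assumptions, not part of the lesson.

```python
import random

def train_val_test_split(n_samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Return three disjoint index lists: train, validation, test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle so each set is a random sample
    n_test = int(n_samples * test_fraction)
    n_val = int(n_samples * val_fraction)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]  # everything left over is training data
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = train_val_test_split(10)
print(len(train_idx), len(val_idx), len(test_idx))
```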
What is K-fold cross validation?
It splits the training set into K folds. The model is trained K times, each time using K-1 folds as training data and the remaining fold for evaluation, so every fold is used for evaluation exactly once. The K scores are then averaged.
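A minimal sketch of how the folds can be generated, assuming no shuffling (in practice the data is usually shuffled first). The helper name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]                  # held-out fold
        train_idx = indices[:start] + indices[start + size:]   # the other K-1 folds
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

Each sample lands in the held-out fold exactly once, which is why the averaged score uses all of the training data for evaluation.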
What is stratification in data splitting?
It ensures that each subset (like train/test) has the same proportion of classes or categories as the full dataset.
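One way to sketch stratification is to split each class separately and then merge the parts. The function name `stratified_split`, the 25% test fraction, and the toy labels are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices so each class keeps the same proportion in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)  # per-class test size
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = [0] * 8 + [1] * 4  # class 0 is twice as common as class 1
train_idx, test_idx = stratified_split(labels)
print([labels[i] for i in train_idx])
print([labels[i] for i in test_idx])
```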
What is time-series cross-validation?
It evaluates models on time-ordered data by respecting the chronological order. It splits data into train/test sets multiple times, always using past data to predict future data, avoiding data leakage.
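A minimal sketch of an expanding-window scheme, which is one common variant of time-series cross-validation. The function name `time_series_splits` and its parameters are hypothetical; samples are assumed to be sorted from oldest to newest.

```python
def time_series_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) where training data always precedes test data."""
    for i in range(n_splits):
        test_start = n_samples - (n_splits - i) * test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        train_idx = list(range(test_start))                         # only the past
        test_idx = list(range(test_start, test_start + test_size))  # the next block
        yield train_idx, test_idx

splits = list(time_series_splits(n_samples=10, n_splits=3, test_size=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

The training window grows with each split, and no test index is ever smaller than a training index, so no future information reaches the model.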
What 3 evaluation metrics are there for regression?
- Mean Absolute Error
- Mean Squared Error
- Coefficient of Determination (R², the proportion of the target's variance explained by the model; closer to 1 is better)
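The three metrics can be sketched in a few lines. The helper names `mae`, `mse`, and `r2` and the toy values are assumptions for illustration.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average squared error, penalizes large errors more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
print(mae(y_true, y_pred), mse(y_true, y_pred), r2(y_true, y_pred))
```

A model that always predicts the mean of the targets scores 0 on this definition, and a perfect model scores 1.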
What 2 basic evaluation metrics are there for classification?
- Error rate (proportion of wrongly classified data)
- Accuracy (1-error rate)
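Both metrics follow directly from counting mismatches. A small sketch, with hypothetical helper names and toy labels:

```python
def error_rate(y_true, y_pred):
    """Proportion of samples whose predicted label differs from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

def accuracy(y_true, y_pred):
    """Proportion of correctly classified samples."""
    return 1 - error_rate(y_true, y_pred)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(error_rate(y_true, y_pred))  # 2 of 8 wrong -> 0.25
print(accuracy(y_true, y_pred))    # 0.75
```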
What does the confusion matrix show?
The number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) in a classification problem.
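For the binary case, the four counts can be tallied as below. The function name `confusion_counts` and the choice of 1 as the positive label are assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classification problem."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            # Predicted positive: correct if the true label is also positive.
            counts["TP" if t == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the true label was positive.
            counts["FN" if t == positive else "TN"] += 1
    return counts

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```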
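What does the F-score measure?
It is the weighted harmonic mean of precision (P) and recall (R): F_β = (1 + β²) · P · R / (β² · P + R). The parameter β sets whether precision or recall gets more weight; β = 1 gives the F1 score, which weights both equally.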
What happens when β is low and when β is high?
β > 1 -> less weight to precision and more to recall.
β < 1 -> more weight to precision, less to recall.
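The effect of β can be checked numerically with the standard F-beta formula, (1 + β²) · P · R / (β² · P + R). The function name `f_beta` and the example precision and recall values are assumptions.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid dividing by zero when both are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A classifier with perfect recall but mediocre precision.
precision, recall = 0.5, 1.0
f_half = f_beta(precision, recall, beta=0.5)  # leans toward precision
f1 = f_beta(precision, recall)                # equal weight
f2 = f_beta(precision, recall, beta=2)        # leans toward recall
print(round(f_half, 3), round(f1, 3), round(f2, 3))
```

Because recall is the stronger of the two here, the score rises as β grows and falls as β shrinks.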