11_Machine Learning Flashcards
1
Q
AI vs ML vs DL
A
2
Q
Machine Learning Options
A
3
Q
What is Machine Learning
- Process of combining inputs to produce useful predictions on never-before-seen data.
- Makes a machine learn from data to make predictions on future data, instead of programming every scenario.
- How it works:
- Train a model with examples
- Example = input + label
- Training = adjust model to learn the relationship between features and label, minimizing error:
- Optimize the weights and biases (parameters) applied to the input features.
- Feature = input variable(s)
- Inference = apply trained model to unlabeled examples.
- Separate test and training data ensures the model generalizes to new data.
- Otherwise, this leads to overfitting (model fits only the training data, not new data)
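The train-then-infer workflow above can be sketched end to end. A minimal illustration, assuming a one-feature linear model fit by gradient descent; the data and hyperparameters are made up.

```python
# Labeled training examples: example = (feature, label); the underlying
# rule here is y = 2x + 1 (illustrative data only)
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
test = [(4.0, 9.0), (5.0, 11.0)]  # held-out examples to check generalization

w, b = 0.0, 0.0  # parameters: weight and bias
lr = 0.05        # learning rate

# Training: adjust w and b to minimize squared error on the training set
for _ in range(2000):
    for x, y in train:
        err = (w * x + b) - y
        w -= lr * err * x  # gradient step for w (constant factor folded into lr)
        b -= lr * err      # gradient step for b

# Inference: apply the trained model to a never-before-seen input
def predict(x):
    return w * x + b

# Evaluating on the separate test set checks that the model generalized
test_error = sum((predict(x) - y) ** 2 for x, y in test) / len(test)
```

Because the test examples were never used during training, a low `test_error` indicates the model learned the feature-label relationship rather than memorizing the training set.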
A
4
Q
Machine Learning Pipeline
A
5
Q
Features and Labels
A
6
Q
Curve Fitting
A
7
Q
Optimization using Gradient Descent
A
8
Q
Machine Learning Types
- Supervised learning
- Apply labels to data (“cat”, “spam”)
- Regression - Continuous, numeric variables:
- Predict stock price, student test scores
- Classification - categorical variables:
- yes/no, decision tree
- “is this email spam?” “is this picture a cat?”
- Same types for dataset columns:
- continuous (regression) and categorical (classification)
- income, birth year = continuous
- gender, country = categorical
- Unsupervised learning
- Clustering - finding patterns
- Not labeled or categorized
- “Which purchases naturally group together by location and amount?”
- Heavily tied to statistics
- Reinforcement learning
- Use positive/negative reinforcement to complete a task
- Complete a maze, learn chess
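The supervised vs. unsupervised contrast above can be shown on toy data. A minimal sketch, assuming 1-D points; the nearest-neighbor classifier and 2-means clustering are illustrative choices, not the only options.

```python
# Supervised: labeled examples, predict a category ("cat"/"dog") for new input
labeled = [(1.0, "cat"), (1.2, "cat"), (8.0, "dog"), (8.5, "dog")]

def classify(x):
    # 1-nearest-neighbor: copy the label of the closest training example
    return min(labeled, key=lambda ex: abs(ex[0] - x))[1]

# Unsupervised: same points but with NO labels; discover two clusters (2-means)
points = [1.0, 1.2, 8.0, 8.5]
c1, c2 = points[0], points[-1]  # initial cluster centers
for _ in range(10):
    g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
    g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
    c1 = sum(g1) / len(g1)  # move each center to its cluster's mean
    c2 = sum(g2) / len(g2)
```

The supervised model needs the labels to learn; the clustering step finds the same two groups from the raw numbers alone, which is the pattern-finding idea described above.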
A
9
Q
Supervised Learning
A
10
Q
Reinforcement Learning
A
11
Q
Model Type - Regression
A
12
Q
Model Type - Classification
A
13
Q
Model Type - Clustering
A
14
Q
Transfer Learning
A
15
Q
Overfitting
- Model is fitted so tightly to the training data that it cannot generalize to new data
- The model fails to generalize: it cannot account for data that is slightly different but still close enough
- Causes of Overfitting:
- Not enough training data
- Need more variety of samples
- Too many features
- Model becomes too complex
- Model fitted to unnecessary features unique to the training data, a.k.a. “noise”
- Solving for Overfitting:
- Use more data:
- Add more training data
- More varied data allows for better generalization
- Make the model less complex:
- Use fewer (but more relevant) features = Feature Selection
- Combine multiple co-dependent/redundant features into a single representative feature
- This also helps reduce model training time
- Remove noise
- Increase regularization parameters
- Regularization
- Early Stopping
- Cross Validation
- Dropout Methods
- If data is scarce:
- Use independent test data
- Cross Validation
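One remedy listed above, increasing regularization, can be sketched concretely. A minimal illustration assuming a one-feature linear model with an L2 weight penalty; the penalty strength and data are made up.

```python
# Slightly noisy data around y = 2x + 1 (illustrative only)
train = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.0)]

def fit(reg):
    """Gradient descent on squared error plus an L2 penalty (reg * w^2)."""
    w, b, lr = 0.0, 0.0, 0.05
    for _ in range(2000):
        for x, y in train:
            err = (w * x + b) - y
            w -= lr * (err * x + reg * w)  # the penalty term pulls w toward 0
            b -= lr * err
    return w, b

w_plain, _ = fit(reg=0.0)  # no regularization
w_reg, _ = fit(reg=0.5)    # stronger regularization shrinks the weight
```

Raising `reg` trades a little training accuracy for a simpler (smaller-weight) model, which is exactly how increasing the regularization parameter combats overfitting.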
A