11_Machine Learning Flashcards

1
Q

Overfitting

  • An overfitted model is fitted too closely to its training data: it cannot generalize to new data
  • The overfitted model fails to generalize: it cannot account for data that is slightly different from, but close to, the training data
  • Causes of Overfitting:
    • Not enough training data
      • Need more variety of samples
    • Too many features
      • Too complex
    • Model fitted to unnecessary features unique to training data, a.k.a “Noise”
  • Solving for Overfitting:
    • Use more data:
      • Add more training data
      • More varied data allows for better generalization
    • Make the model less complex:
      • Use fewer (but more relevant) features = Feature Selection
      • Combine multiple co-dependent/redundant features into a single representative feature
        • This also helps reduce model training time
    • Remove noise
      • Increase regularization parameters
    • Regularization
    • Early Stopping
    • Cross Validation
    • Dropout Methods
  • If data is scarce:
    • Use independent test data
    • Cross Validation
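
A minimal sketch of two of the remedies above (dropout and early stopping), assuming Keras; the layer sizes, dropout rate, and patience value are illustrative, not prescribed by this card:

    import numpy as np
    from tensorflow import keras

    # Toy data standing in for a real training set (illustrative only).
    X_train = np.random.rand(1000, 20)
    y_train = (X_train.sum(axis=1) > 10).astype("float32")

    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        keras.layers.Dropout(0.3),   # dropout: randomly zero 30% of units each training step
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # Early stopping: halt training once validation loss stops improving.
    early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                               restore_best_weights=True)
    model.fit(X_train, y_train, validation_split=0.2, epochs=100,
              callbacks=[early_stop], verbose=0)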
A
3
Q

Hyperparameters

  • Selection: Hyperparameter values need to be specified before training begins
  • Types of Hyperparameters:
    • Model hyperparameters relate directly to the model that is selected.
    • Algorithm hyperparameters relate to the training of the model.
  • Training and Tuning: The process of finding optimal, or near-optimal, values for the hyperparameters.
  • Not related to training data!
  • Examples:
    • Batch size
    • Training epochs
    • Number of hidden layers in neural network
    • Number of nodes in hidden layers in neural network
    • Regularization type
    • Regularization rate
    • Learning rate, a.k.a. step size
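
A minimal sketch of hyperparameter tuning as a grid search, assuming scikit-learn; the candidate values below are illustrative, not taken from this card:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Hyperparameter values are fixed before training; tuning searches candidate values.
    param_grid = {
        "alpha": [1e-4, 1e-3, 1e-2],   # regularization rate
        "penalty": ["l1", "l2"],       # regularization type
        "eta0": [0.01, 0.1],           # learning rate (step size)
    }
    search = GridSearchCV(SGDClassifier(learning_rate="constant", max_iter=1000),
                          param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_)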
A
4
Q

Feature Engineering

Transform data so it is fit for Machine Learning.

  • Imputation (for missing data)
  • Outliers and Feature Clipping
    • If your data set contains extreme outliers, you might try feature clipping, which caps all feature values above (or below) a certain threshold at a fixed value
  • One-hot Encoding (for categorical data)
    • One-hot encoding is commonly used to represent strings or identifiers that have a finite set of possible values. For example, suppose a given botany dataset chronicles 15,000 different species, each denoted with a unique string identifier. As part of feature engineering, you’ll probably encode those string identifiers as one-hot vectors in which the vector has a size of 15,000
  • Linear Scaling
  • Log Scaling
  • Bucketing/Bucketization
    • Transformation of numeric features into categorical features, using a set of thresholds (e.g. latitude in equally spaced buckets to predict house prices)
  • Feature prioritization
  • Feature Crosses
    • A feature cross is a synthetic feature formed by multiplying (crossing) two or more features. Crossing combinations of features can provide predictive abilities beyond what those features can provide individually.
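
A minimal sketch of several of these transformations (feature clipping, bucketing, a feature cross, and one-hot encoding), assuming pandas; the column names and thresholds are made up for illustration:

    import numpy as np
    import pandas as pd

    # Toy housing-style data standing in for a real dataset.
    df = pd.DataFrame({
        "latitude": np.random.uniform(32.0, 42.0, size=10),
        "price": np.random.uniform(1e5, 1e6, size=10),
        "roof_type": np.random.choice(["tile", "shingle", "metal"], size=10),
    })

    # Feature clipping: cap extreme values at a fixed ceiling.
    df["price_clipped"] = df["price"].clip(upper=8e5)

    # Bucketing: turn a numeric feature into categories using thresholds.
    df["lat_bucket"] = pd.cut(df["latitude"], bins=range(32, 44, 2)).astype(str)

    # Feature cross: a synthetic feature combining two categorical features.
    df["lat_x_roof"] = df["lat_bucket"] + "_" + df["roof_type"]

    # One-hot encoding: expand categorical features into indicator columns.
    df = pd.get_dummies(df, columns=["roof_type", "lat_bucket", "lat_x_roof"])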
A
5
Q

Regularization

  • Training: minimise(loss(data | model))
  • Regularization: complexity(model)
    • L2 Regularization term
    • L1 Regularization term
  • Training with Regularization: minimise(loss(data | model) + lambda * complexity(model))
  • Adds a penalty to a model as it becomes more complex
  • Penalizing parameters = better generalization
  • Cuts out noise and unimportant data, to avoid overfitting
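
A minimal sketch of the regularized objective above, assuming mean squared error as the data loss and an L2 penalty as the complexity term; the lambda value is illustrative:

    import numpy as np

    def regularized_loss(y_true, y_pred, weights, lam=0.1):
        # minimise: loss(data | model) + lambda * complexity(model)
        data_loss = np.mean((y_true - y_pred) ** 2)   # loss(data | model)
        l2_penalty = np.sum(weights ** 2)             # complexity(model), L2 term
        return data_loss + lam * l2_penalty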

Regularization Types

L1 and L2 regularization are different approaches to tuning out noise. Each has a different use case and purpose.

  • L1 - Lasso Regression: Sparsity. Assigns greater importance to the more influential features
    • Shrinks the weights of less important features to exactly zero
    • Good for models with many features, some more important than others
    • Example: choosing features to predict the likelihood of a home selling:
      • House price is a more influential feature than carpet color
  • L2 - Ridge Regression: No sparsity; shrinks all weights without zeroing them out. Performs better when all the input features influence the output and the weights are of roughly equal size.
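
A minimal sketch contrasting the two, assuming scikit-learn's Lasso (L1) and Ridge (L2); the data and alpha values are made up for illustration:

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    # Toy regression data: only the first 3 of 10 features actually matter.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = 3 * X[:, 0] + 2 * X[:, 1] + X[:, 2] + rng.normal(scale=0.1, size=200)

    lasso = Lasso(alpha=0.1).fit(X, y)   # L1: drives unimportant weights to exactly zero
    ridge = Ridge(alpha=0.1).fit(X, y)   # L2: shrinks all weights but keeps them non-zero

    print(np.round(lasso.coef_, 2))   # sparse: many exact zeros
    print(np.round(ridge.coef_, 2))   # small but non-zero everywhere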
A
8
Q

Techniques Glossary

  • Precision: of all the examples the model predicted as positive, the fraction that are actually positive: TP / (TP + FP). Useful when false positives are costly.
  • Recall: of all the examples that are actually positive, the fraction the model found: TP / (TP + FN). Useful when false negatives are costly.
  • Gradient Descent: optimization algorithm to find the minimal value of a function. Gradient descent is used to minimise RMSE or another cost function.
  • Dropout Regularization: regularization method that removes a random selection of a fixed fraction of units in a neural network layer during training. The more units dropped out, the stronger the regularization.
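
A minimal sketch of the precision and recall formulas, using made-up confusion-matrix counts for illustration:

    def precision(tp, fp):
        # Of everything the model predicted positive, how much was actually positive?
        return tp / (tp + fp)

    def recall(tp, fn):
        # Of everything that is actually positive, how much did the model find?
        return tp / (tp + fn)

    # Example: 80 true positives, 20 false positives, 40 false negatives.
    print(precision(80, 20))   # 0.80
    print(recall(80, 40))      # ~0.67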
A