Python ML Principles Flashcards
Learn the main steps and sub-steps of ML.
What are the four main steps in ML?
Visualisation
Cleaning and Transformation
Construction of ML model
Evaluation of ML model
What are the two main sub-steps of the Cleaning and Transformation step?
1) Data Preparation & Cleaning
2) Feature Engineering
What should you do before starting Preparation & Cleaning?
Explore the data to understand the issues that are present.
What are the six sub-steps of the Data Preparation & Cleaning step?
- Recode character strings to eliminate unrecognised characters
- Find & treat missing values
- Set the correct data type for each column
- Transform categorical features to increase the number of cases per category
- Apply transformation to numerics to improve distributions
- Duplicate management
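A minimal sketch of several of the cleaning sub-steps above, assuming pandas is used to hold the data. The column names and values are made up for illustration and are not part of the original cards.

```python
import pandas as pd

# Hypothetical toy data: trailing whitespace, a missing value,
# numbers stored as strings, and one duplicated row.
raw = pd.DataFrame({
    "colour": ["red", "red", "blue ", "green", None],
    "price": ["10", "10", "250", "1000", "40"],
})

clean = raw.copy()
clean["colour"] = clean["colour"].str.strip()        # recode strings
clean["colour"] = clean["colour"].fillna("unknown")  # treat missing values
clean["price"] = clean["price"].astype(float)        # set the correct data type
clean = clean.drop_duplicates().reset_index(drop=True)  # duplicate management

print(clean)
```

Filling missing values with a placeholder is only one possible treatment; dropping the rows or imputing a value may suit other data better.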
What’s another name for “transformation to improve distribution”?
Feature engineering
Name a common transformation.
Log
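As an illustration of the log transformation, here is a small sketch using NumPy's natural log on made-up, strongly right-skewed values. The numbers are assumptions chosen only to show the effect.

```python
import numpy as np

# Hypothetical right-skewed feature spanning several orders of magnitude.
skewed = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])

# The natural log compresses large values far more than small ones.
logged = np.log(skewed)

print(skewed.max() - skewed.min())  # range before the transform
print(logged.max() - logged.min())  # much smaller range after it
```

If the feature can contain zeros, `np.log1p` (which computes log(1 + x)) is a common alternative, since the log of zero is undefined.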
What is the main thing we’re trying to achieve with Feature Engineering?
We’re trying to achieve distinct separation of the labelled cases, indicating better prediction.
What is a metric used to evaluate linear regression accuracy?
Sum of squared errors.
Why is linear regression sometimes called Least Squares Regression?
Because it fits the line that minimises the sum of the squared errors (residuals) between the observed values and the line.
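To make the two cards above concrete, this sketch computes the sum of squared errors for a least-squares line and for slightly shifted lines. It uses `np.polyfit` as a convenient least-squares fitter; the data points and the `sse` helper are made up for illustration.

```python
import numpy as np

# Hypothetical data lying roughly on y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])


def sse(slope, intercept):
    """Sum of squared errors of the line slope * x + intercept."""
    residuals = y - (slope * x + intercept)
    return float(np.sum(residuals ** 2))


# Degree-1 polyfit returns the least-squares slope and intercept.
slope, intercept = np.polyfit(x, y, deg=1)

best = sse(slope, intercept)
shifted_slope = sse(slope + 0.1, intercept)
shifted_intercept = sse(slope, intercept + 0.1)

print(best, shifted_slope, shifted_intercept)
```

Any other line, such as the two shifted ones here, gives a larger sum of squared errors than the fitted line.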
With linear regression in scikit-learn, what Python package should you use for your arrays?
NumPy
What are the 4 main steps for linear regression with scikit-learn?
Lay out NumPy arrays
Scale
Specify model object
Fit
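The four steps above can be sketched as follows. This is one possible arrangement, assuming `StandardScaler` for the scaling step; the data is made up so that y = 2x + 1 exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# 1) Lay out NumPy arrays: X is 2-D (samples, features), y is 1-D.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# 2) Scale the features (zero mean, unit variance).
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 3) Specify the model object.
model = LinearRegression()

# 4) Fit the model to the scaled features.
model.fit(X_scaled, y)

# New data must go through the same fitted scaler before predicting.
new_X = np.array([[6.0]])
predictions = model.predict(scaler.transform(new_X))
print(predictions)
```

Scaling does not change the predictions of a plain linear regression, but it keeps the coefficients comparable across features and matters for regularised variants.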
What visualisation could you use to evaluate residuals of a regression model?
A histogram of residuals.
Which two residual histogram patterns indicate an accurate model?
1) Clustering of residuals around zero.
2) Normal distribution.
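A rough sketch of a residuals histogram, using simulated data with normally distributed noise so that both patterns appear. The data, the seed, and the text-based plot are assumptions for illustration; in practice the same `residuals` array could be passed to a plotting library's histogram function.

```python
import numpy as np

# Simulated data: a straight line plus normally distributed noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=500)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=500)

# Fit a least-squares line and compute the residuals.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Bin the residuals and print a crude text histogram.
counts, edges = np.histogram(residuals, bins=11, range=(-4, 4))
for count, left in zip(counts, edges[:-1]):
    print(f"{left:6.2f} {'#' * int(count // 5)}")
```

The bars should pile up around zero and fall away roughly symmetrically on both sides.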
If you see a multimodal residuals histogram, what should you do about the non-zero modes?
Investigate what’s creating them and consider adding the features responsible to your model.
In scikit-learn, what is one-hot encoding?
Conversion of a categorical feature into a NumPy array with one column per category, where each record’s row shows a “1” only in the column for its category and “0” elsewhere.
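A short sketch of one-hot encoding with scikit-learn's `OneHotEncoder`, using a made-up colour feature. The encoder returns a sparse matrix by default, so `.toarray()` is called to get a dense NumPy array.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical feature: one column, four records.
colours = np.array([["red"], ["green"], ["blue"], ["green"]])

encoder = OneHotEncoder()
encoded = encoder.fit_transform(colours).toarray()

print(encoder.categories_[0])  # categories, sorted alphabetically
print(encoded)                 # one column per category, one "1" per row
```

Each row contains exactly one 1, in the column matching that record's category.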