L1 - Intro to Machine Learning Flashcards
What are the basic steps involved in machine learning?
Raw data is converted into matrix data (rows = observations, columns = variables).
The matrix data is split into a training set and a test set.
The model is trained (fitted) on the training set.
The model is evaluated on the held-back test set.
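The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a standard implementation: the raw "x,y" text rows, the 25% test fraction, and the helper names (to_matrix, train_test_split, fit_line, mean_absolute_error) are all hypothetical, and a straight-line least-squares fit stands in for "the model". The train_test_split here is a hand-written helper, not scikit-learn's function of the same name.

```python
import random


def to_matrix(raw_rows):
    """Step 1: convert raw "x,y" text rows into a feature matrix X and target list y."""
    X, y = [], []
    for row in raw_rows:
        x_str, y_str = row.split(",")
        X.append([float(x_str)])  # one feature (column) per row
        y.append(float(y_str))
    return X, y


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Step 2: shuffle the row indices and hold back a fraction of rows as the test set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


def fit_line(X, y):
    """Step 3: fit y = a*x + b by ordinary least squares on the training rows."""
    xs = [row[0] for row in X]
    mean_x, mean_y = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mean_absolute_error(model, X, y):
    """Step 4: evaluate the fitted model on rows it has not seen."""
    a, b = model
    return sum(abs(a * row[0] + b - t) for row, t in zip(X, y)) / len(y)


raw = ["1,3.1", "2,4.9", "3,7.2", "4,8.8", "5,11.1", "6,13.0", "7,14.8", "8,17.2"]
X, y = to_matrix(raw)
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = fit_line(X_train, y_train)
print("test MAE:", round(mean_absolute_error(model, X_test, y_test), 3))
```

Shuffling before splitting avoids a test set made up only of the last rows, which matters if the raw data happen to be sorted.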
What is raw data?
Data that has not yet been processed or cleaned
It may contain missing or incomplete observations
It may contain improperly formatted values
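A small sketch of auditing one raw column that should contain numbers, separating missing entries from improperly formatted ones. The set of strings treated as missing markers, the helper name audit_column, and the sample values are assumptions for illustration.

```python
# Strings treated as "missing" are an assumption; real datasets use many conventions.
MISSING_MARKERS = {"", "na", "n/a", "null", "none"}


def audit_column(values):
    """Split a raw column that should hold numbers into missing, malformed and clean values."""
    missing, malformed, clean = 0, 0, []
    for v in values:
        text = "" if v is None else str(v).strip()
        if text.lower() in MISSING_MARKERS:
            missing += 1
            continue
        try:
            clean.append(float(text))
        except ValueError:  # e.g. "forty" or "3,5" cannot be parsed as a number
            malformed += 1
    return {"missing": missing, "malformed": malformed, "clean": clean}


raw_points = ["42", "", "38.5", None, "forty", " 51 ", "N/A", "3,5"]
report = audit_column(raw_points)
print(report)  # {'missing': 3, 'malformed': 2, 'clean': [42.0, 38.5, 51.0]}
```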
Why do data types matter in data preparation?
Data preparation transforms data so it can be used for ML
Most ML algorithms require inputs of specific data types (e.g. numeric)
Raw data often has many variables with different data types (e.g. text or categories) that will not work in an ML model without conversion
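One common way to turn a categorical variable (such as football team names) into numeric inputs is one-hot encoding: one 0/1 column per category. The sketch below is a hand-rolled illustration with made-up team names and a hypothetical helper one_hot; libraries such as pandas and scikit-learn offer their own encoders.

```python
def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector, one column per
    distinct category (sorted so the column order is reproducible)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # switch on the column for this row's category
        rows.append(row)
    return categories, rows


teams = ["Leeds", "Arsenal", "Leeds", "Chelsea"]
cols, encoded = one_hot(teams)
print(cols)     # ['Arsenal', 'Chelsea', 'Leeds']
print(encoded)  # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```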
How does machine learning “learn”?
Start with historical data that has both inputs and outputs (e.g. football teams as inputs and total points as the output)
The model is trained by learning the relationships between the inputs and outputs
When new data arrives (with inputs but no output yet), it can be fed into the trained model to predict the output
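To make "learning from input/output pairs" concrete, here is a minimal sketch that learns a straight line y ≈ w*x + b from historical pairs by gradient descent on mean squared error, then predicts the output for a new input. Gradient descent on a linear model is just one simple choice; the data are made up so that they follow y = 3x + 2 exactly, and the learning rate and epoch count are assumptions chosen so this toy example converges.

```python
def train(history, lr=0.05, epochs=2000):
    """Learn w and b for y ~ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(history)
    for _ in range(epochs):
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)**2) with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in history) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in history) / n
        # Step both parameters a little way downhill
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(model, x):
    """Apply the learned relationship to a new input that has no known output."""
    w, b = model
    return w * x + b


# Made-up historical (input, output) pairs that follow y = 3x + 2 exactly
history = [(1, 5), (2, 8), (3, 11), (4, 14), (5, 17)]
model = train(history)
print(round(predict(model, 6), 2))  # 20.0
```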
What is CRISP-DM?
A widely used, industry-standard process model for data mining and ML projects.
Cross-Industry Standard Process for Data Mining.
It has six phases:
Business understanding; data understanding; data preparation; modelling; evaluation; deployment.
Business understanding
Establish what the client wants to achieve through data mining (the business objectives)
Data understanding
Assess what data is available and verify their quality
Data preparation
Transform the initial raw data into a dataset suitable for modelling
Modelling
Select the appropriate modelling technique and tune the parameter settings to optimise the results
Evaluation
Evaluate the model in the context of the business goals and success criteria
Deployment
Communicate the results and suggest implementation steps or actions based on the findings
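The Modelling phase's "tune the parameter settings" step can be illustrated with a simple grid search: try several values of a parameter and keep the one with the lowest error on a validation set. This sketch assumes a k-nearest-neighbours regressor on one numeric input, with hypothetical helpers (knn_predict, validation_mae, tune_k) and made-up training data whose outputs alternate between 4 and 6 around a true value of 5. In practice a separate test set would still be held back for the Evaluation phase.

```python
def knn_predict(train, x, k):
    """Predict y for x as the mean y of the k training points closest to x (1-D input)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def validation_mae(train, valid, k):
    """Mean absolute error of the k-NN model on the validation set."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in valid) / len(valid)


def tune_k(train, valid, candidates):
    """Grid search: score every candidate k and keep the one with the lowest validation error."""
    scores = {k: validation_mae(train, valid, k) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Made-up data: outputs alternate 4, 6, 4, 6, ... around a true value of 5
train_set = [(x, 6.0 if x % 2 else 4.0) for x in range(10)]
valid_set = [(2.4, 5.0), (6.6, 5.0)]
best, scores = tune_k(train_set, valid_set, candidates=[1, 2, 3])
print(best)  # 2
```

Here k = 1 copies the noise of the single nearest point, while k = 2 averages a 4 and a 6, so the search selects k = 2.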