General Flashcards
What are the 9 key steps in the Data Science Process Workflow?
- Specify Objective
- Acquire Data
- Clean Data
- Explore Data
- Establish Baseline Model
- Model the Data
- Analyse Results
- Communicate Findings
- Iterate (return to step 2, Acquire Data)
What are the key questions to ask of recently acquired data?
What is relevant? How was it sampled? Where can we obtain the data? How can we clean the data? Are there privacy issues?
What are the key questions to ask when exploring data?
Are there anomalies? Are there any patterns? How should we deal with missing data? What does the data represent? Which features will be relevant? Can we construct new features?
What does it mean to establish a baseline?
Create a simple baseline model/score that all future models are compared against.
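A minimal sketch of such a baseline for a numeric target, assuming the simplest possible rule: always predict the mean of the training targets. The data values and the choice of mean absolute error (MAE) as the score are illustrative assumptions, not part of the original card.

```python
from statistics import mean

def fit_mean_baseline(train_targets):
    """Return the constant prediction: the mean of the training targets."""
    return mean(train_targets)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

# Hypothetical toy targets (for example, weekly sales); not from the source.
train_y = [10.0, 12.0, 14.0]
test_y = [11.0, 15.0]

constant = fit_mean_baseline(train_y)
baseline_mae = mean_absolute_error(test_y, [constant] * len(test_y))
print(f"baseline prediction={constant}, baseline MAE={baseline_mae}")
```

Any later model would then be scored with the same metric on the same test data and compared against `baseline_mae`.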
What are the key steps in modelling data?
- Build a model
- Fit the model
- Validate the model
- What assumptions are we making about the data?
- Which types of models perform better?
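The build, fit and validate steps can be sketched on a toy problem. This example assumes a straight-line model fitted by ordinary least squares and validated on held-out points; the function names and data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, xs):
    """Build step: the model is just a (slope, intercept) pair."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data generated from y = 2x + 1 (assumption: linear relationship).
train_x, train_y = [1, 2, 3, 4], [3, 5, 7, 9]
valid_x, valid_y = [5, 6], [11, 13]

model = fit_line(train_x, train_y)                              # fit
valid_mae = mean_absolute_error(valid_y, predict(model, valid_x))  # validate
print(f"slope={model[0]}, intercept={model[1]}, validation MAE={valid_mae}")
```

Validating on points the model never saw during fitting is what makes the score a fair estimate; the linearity assumption is exactly the kind of assumption the card asks about.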
What are the important steps when communicating findings?
Restate the hypothesis, process, findings and analysis.
Ensure that results are reproducible
Key steps in iterating on a model
Continue to acquire more data
Create new features
Improve the current model
Key Data Science Deliverables
Prediction, Forecasts, Anomaly Detection, Recognition, Optimisation, Segmentation, Recommendations
Prediction examples
Predicting whether a borrower will repay on time, or predicting who will win an election next year
Forecast examples
Forecasting future sales and demand, tomorrow’s weather
Anomaly detection
Detecting credit card fraud, money laundering, etc.
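As a rough illustration of the idea, the sketch below flags transaction amounts that lie far from the mean, using a z-score rule. The threshold of 3 standard deviations and the sample amounts are assumptions made for the example; real fraud detection uses far richer features and methods.

```python
from statistics import mean, pstdev

def flag_anomalies(values, threshold=3.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts; the last one is unusually large.
amounts = [20, 22, 19, 21, 20, 23, 18, 22, 21, 20, 500]
flagged = flag_anomalies(amounts)
print(flagged)
```

One caveat of this simple rule: a large outlier inflates the standard deviation itself, so on small samples a robust statistic such as the median may be a better choice.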
Optimisation examples
Minimise shipping costs
Finding optimal routes
Segmentation examples
Finding groups of similar customers to customise advertising, or to detect high-yield segments
Recognition examples
Recognise speech, text or images
Why is it important to establish a baseline?
A baseline puts a more complex model into context: it shows how much the added complexity actually improves on a simple approach.
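A small sketch of why this matters, assuming an imbalanced classification problem where 95% of the labels are negative. The labels and the "model" predictions are made up for illustration; the point is that a headline accuracy only means something next to the baseline score.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical imbalanced labels: 95 negatives followed by 5 positives.
labels = [0] * 95 + [1] * 5

# Baseline: always predict the most common class.
majority_class = Counter(labels).most_common(1)[0][0]
baseline_acc = accuracy(labels, [majority_class] * len(labels))

# Hypothetical predictions from a more complex model:
# 93 of 95 negatives correct, 3 of 5 positives correct.
model_preds = [0] * 93 + [1] * 2 + [1] * 3 + [0] * 2
model_acc = accuracy(labels, model_preds)

print(f"baseline accuracy={baseline_acc}, model accuracy={model_acc}")
print(f"improvement over baseline={model_acc - baseline_acc:.2f}")
```

Read alone, the model's accuracy looks impressive; next to the majority-class baseline it is only a small improvement, which is the context the baseline provides.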