Wk 1: Data Science Process Flashcards
What are the steps in the DS process?
Acquire, Prepare: Explore, Prepare: Pre-Process, Analyze, Communicate, Take Action
What is the acquire step?
Identify suitable data and use all available data.
What are some different data sources?
Traditional DBs, text files / spreadsheets, remote data (websites), NoSQL storage
What is the Prepare: Explore step?
Looking for correlations, trends and outliers. Use visualizations.
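Example (a minimal sketch, not a prescribed method): one way to check for a correlation and flag outliers before plotting. The function names, the sample numbers, and the 2-standard-deviation threshold are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def flag_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sample data.
hours = [1, 2, 3, 4]
score = [2, 4, 6, 8]
readings = [10, 12, 11, 13, 12, 95]

print(pearson(hours, score))    # close to 1: strong positive correlation
print(flag_outliers(readings))  # the reading of 95 stands out
```

A scatter plot or box plot would show the same things visually.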
What is the Prepare: Pre-Process step?
Clean and transform data. AKA munging, wrangling.
Why would you scale / normalize data?
To equalize the contributions of variables with different magnitudes
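Example (a sketch under assumptions): two common ways to put features on a comparable scale, min-max scaling and z-score standardization. The feature names and values are made up for illustration.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Centre values at 0 with unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical features with very different magnitudes.
income = [30000, 45000, 60000, 90000]
age = [25, 35, 45, 65]

scaled_income = min_max_scale(income)
scaled_age = min_max_scale(age)
print(scaled_income)
print(scaled_age)
```

After scaling, both features span the same range, so income no longer dominates distance-based calculations just because its raw numbers are larger.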
How might you do feature selection?
Remove, combine, or add new features.
What is dimension reduction?
Find a smaller set of dimensions that captures most of the variation in the data. Common method: Principal Component Analysis (PCA), which builds the new dimensions as combinations of the original features.
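Example (a minimal sketch, limited to two features so the math stays in closed form): finding the first principal component and the share of total variance it explains. The function name and the data are assumptions for illustration; real work would normally use a library.

```python
import math
from statistics import mean

def first_principal_component(xs, ys):
    """Return (unit direction, explained-variance share) for 2-D data.

    Assumes the data is not constant in both features.
    """
    mx, my = mean(xs), mean(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    n = len(xs)
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(a * a for a in dx) / (n - 1)
    syy = sum(b * b for b in dy) / (n - 1)
    sxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    largest = half_trace + radius
    # Matching eigenvector; fall back to an axis when the features are uncorrelated.
    if sxy != 0:
        vx, vy = largest - syy, sxy
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), largest / (sxx + syy)

# Hypothetical data lying exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
direction, share = first_principal_component(xs, ys)
print(direction, share)
```

Because these points lie on a single line, one dimension captures essentially all of the variation, so the two features can be replaced by one.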
Why is data preparation so important?
Garbage in, Garbage out
Regression model
Predict a numeric value. E.g., stock price.
Classification model
Predict the category of a thing. E.g., weather category or image classification.
Association Analysis
Find associations between items. E.g., basket analysis.
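Example (a sketch, assuming the two standard association-rule measures, support and confidence, which the card itself does not name): scoring the rule "bread -> milk" on a handful of made-up baskets.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # 2 of 4 baskets
print(confidence(baskets, {"bread"}, {"milk"}))  # 2 of the 3 bread baskets
```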
Graph Analysis
Analyze data that has a graph structure (entities and the connections between them). E.g., social networks, disease transmission.
How do you evaluate classification / regression models?
Compare predicted values against actual values.
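Example (a sketch, assuming two common metrics that the card does not name): accuracy for a classification model and root mean squared error (RMSE) for a regression model. All labels and numbers are made up.

```python
import math

def accuracy(actual, predicted):
    """Fraction of predictions that match the actual category."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical classification results (weather category).
actual_labels = ["sunny", "rainy", "sunny", "cloudy"]
predicted_labels = ["sunny", "sunny", "sunny", "cloudy"]

# Hypothetical regression results (a price).
actual_prices = [10.0, 12.0, 14.0]
predicted_prices = [11.0, 12.0, 13.0]

print(accuracy(actual_labels, predicted_labels))  # 3 of 4 correct
print(rmse(actual_prices, predicted_prices))
```

Higher accuracy is better; lower RMSE is better.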
What are some JavaScript visualization tools for the web?
D3, Leaflet for maps, Timeline for timelines