Study Flashcards
Also known as the discovery phase
Business understanding
Analyst defines the major questions of interest that need to be answered
Business understanding
The phase of collecting data
Data acquisition
Alternative names include data cleansing, data wrangling, data munging, and feature engineering
Data cleaning
When ignored, the results of the analysis may be irrelevant
No single common tool; may use SQL, Python, R, or Excel
Data quality is measured in terms of uniqueness and relevance
Data cleaning
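As a rough illustration of the cleaning ideas on this card, here is a minimal Python sketch that normalizes text, drops rows with missing values, and keeps only unique rows. The records and field names (`id`, `city`) are hypothetical and exist only for this example; real cleaning work would more likely use SQL, R, Excel, or a library such as pandas.

```python
# Hypothetical raw records; the field names are illustrative only.
raw_rows = [
    {"id": 1, "city": " Boston "},
    {"id": 2, "city": "boston"},
    {"id": 2, "city": "boston"},   # exact duplicate of the previous row
    {"id": 3, "city": None},       # missing value
]


def clean(rows):
    """Normalize text, drop rows with missing values, and keep unique rows."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["city"] is None:
            continue  # skip incomplete records
        key = (row["id"], row["city"].strip().lower())
        if key in seen:
            continue  # skip duplicates to preserve uniqueness
        seen.add(key)
        cleaned.append({"id": key[0], "city": key[1]})
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```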
Analyst begins to understand the basic nature of data and the relationships within
Often relies on visualization tools and numerical summaries such as central tendency and variability
Central tendency is a single value that attempts to describe a set of data by identifying the central position
Variability describes how far apart data points lie from each other and from the center of a distribution
Data exploration
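The numerical summaries named on this card can be sketched with Python's standard `statistics` module. The sample values below are hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency: single values that describe the center of the data.
mean_value = statistics.mean(values)
median_value = statistics.median(values)
mode_value = statistics.mode(values)

# Variability: how far points lie from each other and from the center.
value_range = max(values) - min(values)
population_std = statistics.pstdev(values)

print(mean_value, median_value, mode_value, value_range, population_std)
```

Here the mean is 5, the median is 4.5, the mode is 4, the range is 7, and the population standard deviation is 2.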
Creating models that enable predictions of outcomes of interest
Tools such as Python and R play an important role in automating the training and use of models
Predictive modeling
Sometimes machine learning is used as a synonym
Data mining
Ability of computers to look for patterns in large amounts of data
Tools such as Python and R play an important role
Data mining
An analyst tells the story of the data and uses graphs or interactive dashboards to inform others of the findings from the analyses
Reporting and visualization
The goal is to provide actionable insights for various stakeholders
Reporting and visualization
Scope Project
Identify stakeholders and research questions/KPIs
Identify timeline, budget, and participants
Business understanding
Gather/collect data from a variety of sources
Provide structure to data accessible via relational databases (SQL)
Build data pipeline (ETL)
Use an API to download data from an external source
Data acquisition
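A small extract-transform-load (ETL) pipeline along the lines of this card might look like the sketch below. It assumes a hypothetical CSV export (`RAW_CSV`) standing in for an external source, and a made-up `orders` table; an in-memory SQLite database is used only so the example is self-contained.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export standing in for data from an external source.
RAW_CSV = "order_id,amount\n1,19.99\n2,5.00\n3,12.50\n"


def run_pipeline(raw_csv):
    # Extract: parse the raw text into dictionaries.
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    # Transform: cast text fields to proper types.
    records = [(int(r["order_id"]), float(r["amount"])) for r in rows]
    # Load: give the data relational structure that SQL can query.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", records)
    conn.commit()
    return conn


conn = run_pipeline(RAW_CSV)
order_count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM orders"
).fetchone()
print(order_count, round(total, 2))
```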
Estimate/project future values or likelihood of an event.
Extend correlations found in EDA to mathematical models
Predict/determine output values based on input values
Cross-validate predictive models to check their accuracy on data not used for training.
Predictive modeling
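To make the idea of predicting outputs from inputs and cross-validating the result concrete, here is a minimal sketch using a least-squares line and a simple k-fold split. The helper names (`fit_line`, `k_fold_mse`) and the data are hypothetical; in practice a library such as scikit-learn would usually handle both steps.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_mse(xs, ys, k):
    """Average mean squared error over k held-out folds."""
    errors = []
    pairs = list(zip(xs, ys))
    for fold in range(k):
        # Every k-th point, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, len(pairs), k))
        train = [p for i, p in enumerate(pairs) if i not in test_idx]
        test = [p for i, p in enumerate(pairs) if i in test_idx]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)
        errors.append(mse)
    return sum(errors) / k


# Hypothetical data that follows y = 2x + 1 exactly.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 5, 7, 9, 11, 13]

slope, intercept = fit_line(xs, ys)
cv_error = k_fold_mse(xs, ys, k=3)
print(slope, intercept, cv_error)
```

Because the sample data is perfectly linear, the held-out error is essentially zero; with noisy data the cross-validated error gives a more honest estimate than the error on the training set.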
Create training and testing datasets for building models
Identify/detect patterns
Determine if groups (clusters) exist in data
Classify data into groups
Create models that “learn” and improve (e.g., machine/deep learning, AI, etc.)
Data mining
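Checking whether groups (clusters) exist in data can be illustrated with a tiny one-dimensional k-means sketch. The function name `k_means_1d`, the points, and the starting centers are all hypothetical, and this is only one of many clustering techniques.

```python
def k_means_1d(points, centers, iterations=10):
    """Repeatedly assign points to the nearest center, then recompute centers."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # A center with no assigned points keeps its previous position.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical measurements that form two visibly separate groups.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, groups = k_means_1d(points, centers=[0.0, 5.0])
print(centers, groups)
```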
Tell a story with data
Provide a summary of the analysis
Provide insights to stakeholders
Create insightful graphs that showcase trends and forecasts
Reporting and visualization
What happened?
Descriptive Analytics