D204 Flashcards
The analyst defines the major questions that need to be answered, determines the stakeholders' needs, assesses the project's resource constraints, and defines the project outcomes.
Business Understanding/Discovery phase
Data is collected and stored for easy retrieval, often from a database (perhaps a component of a data warehouse) that is queried with a language like SQL. Web scraping and surveys are also used to acquire data.
Data Acquisition / Collecting Data phase
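A minimal sketch of retrieving stored data with SQL, assuming a hypothetical `sales` table. Python's built-in `sqlite3` module and an in-memory database stand in for a real database or data warehouse connection; the rows are illustrative only.

```python
import sqlite3

def load_sales(conn: sqlite3.Connection, region: str) -> list[tuple[str, float]]:
    """Retrieve (product, amount) rows for one region from the hypothetical sales table."""
    cur = conn.execute(
        # A parameterized query (?) keeps the region value out of the SQL string itself.
        "SELECT product, amount FROM sales WHERE region = ? ORDER BY product",
        (region,),
    )
    return cur.fetchall()

# In-memory database standing in for a real data store (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("widget", "east", 120.0), ("gadget", "east", 75.5), ("widget", "west", 90.0)],
)
rows = load_sales(conn, "east")
print(rows)  # [('gadget', 75.5), ('widget', 120.0)]
```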
Also known as data wrangling, data munging, and feature engineering. The analyst uses SQL, Python, R, or Excel to perform data modifications and transformations.
Data Cleaning phase
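A small illustration of typical cleaning steps (trimming whitespace, standardizing case, converting types, dropping incomplete rows, removing duplicates), written in plain Python so it is self-contained. The `name` and `age` fields are hypothetical; in practice an analyst might do the same work with SQL, R, Excel, or a library such as pandas.

```python
def clean_records(raw: list[dict[str, str]]) -> list[dict]:
    """Trim text, standardize case, convert types, and drop incomplete or duplicate rows."""
    cleaned = []
    seen = set()
    for row in raw:
        name = row.get("name", "").strip().title()
        age_text = row.get("age", "").strip()
        if not name or not age_text.isdigit():
            continue  # drop rows with a missing name or a non-numeric age
        record = (name, int(age_text))
        if record in seen:
            continue  # drop exact duplicates once values are normalized
        seen.add(record)
        cleaned.append({"name": record[0], "age": record[1]})
    return cleaned

# Hypothetical raw input with inconsistent formatting.
raw = [
    {"name": "  alice ", "age": "34"},
    {"name": "ALICE", "age": "34"},   # duplicate after normalization
    {"name": "bob", "age": ""},       # missing age
    {"name": "carol", "age": "29"},
]
print(clean_records(raw))  # [{'name': 'Alice', 'age': 34}, {'name': 'Carol', 'age': 29}]
```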
The analyst begins to understand the basic nature of the data, the relationships within it (between data variables), the structure of the dataset, the presence of outliers, and the distribution of data values.
This phase uses data visualization tools and numerical summaries such as measures of central tendency and variability.
Data Exploration phase
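A sketch of the numerical summaries this phase relies on, using Python's standard `statistics` module on a hypothetical list of values. The outlier check uses the 1.5 × IQR rule, which is one common convention, not the only one.

```python
import statistics

def summarize(values: list[float]) -> dict[str, float]:
    """Measures of central tendency and variability for one numeric variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "range": max(values) - min(values),
    }

def iqr_outliers(values: list[float]) -> list[float]:
    """Flag values more than 1.5 * IQR below the first or above the third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

data = [12, 15, 14, 10, 13, 15, 11, 48]  # hypothetical observations
stats = summarize(data)
print(stats["mean"], stats["median"])  # 17.25 13.5
print(iqr_outliers(data))              # [48]
```

The single outlier (48) pulls the mean well above the median, which is why exploring both central tendency and outliers together matters.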
Allows the analyst to move beyond describing the data to creating models that enable predictions of outcomes of interest. Python and R are used to automate the training and use of models.
Predictive Modeling phase
Looks for patterns in large sets of data using tools such as Python and R. Also called machine learning.
Data Mining phase
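One classic pattern-finding task is counting items that frequently occur together, the counting step behind association-rule methods such as Apriori. The sketch below is a simplified stand-in for those methods, assuming hypothetical shopping baskets and a minimum support threshold.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions: list[set[str]], min_support: int) -> dict[tuple[str, str], int]:
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair one canonical order, e.g. ('bread', 'milk')
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "milk", "chips"},
]
print(frequent_pairs(baskets, min_support=2))  # {('bread', 'milk'): 3, ('bread', 'butter'): 2}
```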
The analyst tells the story of the data and uses graphs or interactive dashboards to share the findings of the analyses with others.
Data Reporting phase
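As a rough, dependency-free illustration of turning a finding into a graph, the sketch below renders hypothetical regional sales as a text bar chart. It only stands in for the plotting libraries or dashboard tools an analyst would normally use for reporting.

```python
def text_bar_chart(values: dict[str, float], width: int = 20) -> str:
    """Render labeled values as horizontal bars scaled to the largest value."""
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = "#" * round(width * value / largest)  # bar length proportional to value
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return "\n".join(lines)

# Hypothetical finding to report: sales by region.
print(text_bar_chart({"east": 195.5, "west": 90.0, "north": 40.0}))
```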
Data Analytics Lifecycle phases in order
- Business Understanding/Discovery
- Data Acquisition / Collecting Data
- Data Cleaning
- Data Exploration
- Predictive Modeling
- Data Mining
- Data Reporting