M3 U1 - Data Science Lifecycle - Q1 Flashcards
Describe the major parts of the data science lifecycle
Define Business understanding. Who’s involved?
This must include:
- defining business and analytical objectives
- identifying data sources.
Members involved: The client and the data science team both take part in this step to ensure that the analytic solution meets the business objectives.
Define Data Acquisition
This process involves obtaining data from various sources and may also require setting up a data collection task and infrastructure. Data preparation techniques are employed to ensure the data is useful for analysis.
Define Data Preparation.
This is the process of cleaning and transforming raw data prior to processing and analysis. This needs to be done carefully as assumptions made here may influence, or even limit, the use of the data during analysis.
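As a rough illustration of cleaning and transforming raw data, the sketch below normalises a few made-up string records. The field names, the sample rows, and the choice to keep a missing age as `None` instead of imputing it are all assumptions for the example. They are exactly the kind of preparation decision that can later limit how the data is used.

```python
# Hypothetical raw records; field names and values are made up for illustration.
raw_rows = [
    {"customer": " Alice ", "age": "34", "spend": "120.50"},
    {"customer": "bob", "age": "", "spend": "80"},
    {"customer": "Alice", "age": "34", "spend": "120.50"},  # duplicate once cleaned
]


def prepare(rows):
    """Clean and transform raw string records into typed records."""
    cleaned, seen = [], set()
    for row in rows:
        # Normalise whitespace and capitalisation in the name.
        name = row["customer"].strip().title()
        # Assumption: an empty age is kept as None, not imputed.
        age = int(row["age"]) if row["age"].strip() else None
        spend = float(row["spend"])
        key = (name, age, spend)
        if key in seen:  # drop exact duplicates after normalisation
            continue
        seen.add(key)
        cleaned.append({"customer": name, "age": age, "spend": spend})
    return cleaned


prepared = prepare(raw_rows)
```

Here `prepared` holds two records: the third raw row collapses into the first once the name is normalised.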
Define Data Exploration and Cleaning. (4)
Includes:
- Identifying variables
- Conducting univariate and multivariate analysis
- Identifying outliers, anomalies and missing values
- Feature creation and selection
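A minimal sketch of two of the steps above, a univariate summary and a check for missing values and outliers, using only the Python standard library. The sample values are invented, and the 1.5 × IQR rule is one common outlier convention assumed here, not something the flashcards prescribe.

```python
import statistics

# Hypothetical sample of one numeric variable; None marks a missing value.
values = [12.0, 14.5, 13.2, None, 15.1, 98.0, 13.8, 14.0]

observed = [v for v in values if v is not None]
missing_count = len(values) - len(observed)

# Univariate summary of the observed values.
summary = {
    "mean": statistics.mean(observed),
    "median": statistics.median(observed),
}

# Assumed outlier rule: flag points more than 1.5 * IQR beyond the quartiles.
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in observed if v < low or v > high]
```

With this sample, one value is missing and `98.0` is the only point flagged as an outlier.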
What’s the purpose of Feature Engineering?
It prepares datasets that are compatible with the chosen algorithms, and it improves model performance by using domain knowledge to capture the signal of interest in the features.
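To make the idea concrete, the sketch below derives two features from a made-up order record. The record layout and the feature names `is_weekend` and `avg_item_price` are hypothetical. They stand in for domain knowledge such as "weekend orders behave differently".

```python
from datetime import date

# Hypothetical raw order record; the fields are illustrative only.
order = {"order_date": date(2024, 3, 9), "total": 90.0, "items": 3}


def engineer_features(record):
    """Turn raw fields into features a model can use directly."""
    weekday = record["order_date"].weekday()  # Monday is 0, Sunday is 6
    return {
        # Domain assumption: weekend orders may behave differently.
        "is_weekend": weekday >= 5,
        # Assumes the order has at least one item.
        "avg_item_price": record["total"] / record["items"],
    }


features = engineer_features(order)
```

Neither derived value is stored in the raw record, yet both may carry more signal for a model than the raw date or total alone.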
Define generalize
The ability to match training performance on unseen test data is referred to as the model’s ability to generalize.
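One way to see this is to compare training and test accuracy for two toy models. The data, the memorising model, and the threshold of 5 below are all invented for illustration. The point is only the gap between the two scores.

```python
def accuracy(model, examples):
    """Fraction of (feature, label) pairs the model predicts correctly."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)


# Hypothetical toy data: (feature, label) pairs.
train = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
test = [(0, 0), (4, 0), (5, 1), (9, 1)]

# A model that memorises the training set and guesses 0 for anything unseen.
lookup = dict(train)


def memorizer(x):
    return lookup.get(x, 0)


# A simple rule with an assumed threshold of 5.
def threshold_model(x):
    return 1 if x >= 5 else 0


# Gap between training and test accuracy for each model.
gap_memorizer = accuracy(memorizer, train) - accuracy(memorizer, test)
gap_threshold = accuracy(threshold_model, train) - accuracy(threshold_model, test)
```

Both models are perfect on the training data, but only the threshold rule keeps that performance on the test set, so only it generalizes in the sense defined above.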
At what stage in the DS Lifecycle do you identify the business objectives of a data science project?
Business Understanding.
The process of transforming raw data into informative properties that represent the business problem you are trying to solve is called:
Feature Engineering.
What are the roles on a typical data science team? (7)
- Data Scientist.
- Data Engineer.
- Solutions Architect.
- Machine Learning (ML) Engineer.
- Data/Business Analyst.
- Software Engineer.
- Domain Experts.
Data scientist
This role involves solving business tasks using machine learning model development and statistical techniques. This individual identifies trends and patterns within the data and makes predictions based on trends. The data scientist will write code to support the data analysis and model building process.
Data engineer
The Data Engineer specializes in data structures and algorithms, as well as in working with data through the operation of databases and other large repositories.
Solutions architect
This is a customer-facing role that ensures end-to-end customer deployment for company-related data services. The Solutions Architect interacts with clients to design, coordinate, and execute solution prototypes.
ML Engineer
- Performs modeling and software engineering tasks.
- Spends a considerable amount of time programming and creating ML solutions, but must also have strong statistical skills.
- Differs from the data scientist in being further from the domain side of the project.
Data/business analyst
- Has data gathering, analysis, and visualization skills.
- Compared to data scientists, they are typically firmly rooted in the business domain and less technically proficient in systems programming and advanced machine learning.
- Like the data scientist, she provides insights from data to inform decision making.
- Develops key performance indicators and utilizes business intelligence and analytics tools.