Data Journey Flashcards

1
Q

Which of these is NOT a topic of interest for Discovery/Planning/Business Understanding?
A. Project Scope
B. Identify stakeholders and research questions/KPIs
C. Build a data pipeline (ETL)
D. Identify timeline, budget, and participants

A

C

2
Q

What is a potential problem to consider in the planning phase?
A. Lack of clear focus on stakeholders, timeline, limitations, and budget
B. Quality and type of data may make access more difficult
C. Some cleaning techniques could dramatically change data/outcomes
D. Outliers that are not dealt with can cause problems with statistical models due to excessive variability.

A

A

3
Q

In what phase does the analyst identify the stakeholders and research questions?

A

Business Understanding/Planning/Discovery

4
Q

In what phase does the analyst deal with the following:

Gather/collect data from a variety of sources
Provide structure to data accessible via relational databases (SQL)
Build data pipeline (ETL)
Use of an API to download data from an external source

A

Data acquisition
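
Example: the acquisition tasks listed on this card can be sketched as a tiny extract-transform-load (ETL) pipeline. This is only a minimal illustration using Python's standard library; the sample CSV text, the `sales` table, and the function names are hypothetical, and a real pipeline would more likely read from files or an API response.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; stands in for a file or an API download.
RAW_CSV = """id,region,sales
1, north ,100.5
2,south,200
3, east ,50.25
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace and cast fields to proper types."""
    return [(int(r["id"]), r["region"].strip(), float(r["sales"])) for r in rows]

def load(records, conn):
    """Load: write the records into a relational (SQL) table."""
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, sales REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
load(transform(extract(RAW_CSV)), conn)

# Once loaded, the data is structured and can be queried with SQL.
total = conn.execute("SELECT SUM(sales) FROM sales").fetchone()[0]
first_region = conn.execute("SELECT region FROM sales WHERE id = 1").fetchone()[0]
```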

5
Q

In what phase does the analyst deal with the following:

Fixing improperly formatted values
Dealing with duplicates, missing data, and outliers
Data reduction

A

Data cleaning/wrangling/scrubbing/munging
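
Example: a minimal sketch of the cleaning tasks on this card, using only the standard library. The sample records, the choice of median imputation, and the 1.5 * IQR outlier rule are assumptions made for illustration; other techniques are equally valid and, as an earlier card notes, the choice can change outcomes.

```python
from statistics import median, quantiles

# Hypothetical raw records: (id, reading) pairs with typical quality problems.
records = [(1, "12"), (2, " 15"), (2, " 15"), (3, None), (4, "14"),
           (5, "n/a"), (6, "13"), (7, "250"), (8, "12")]

def to_number(value):
    """Return a float, or None when the value is missing or malformed."""
    try:
        return float(str(value).strip())
    except ValueError:
        return None

# 1. Drop exact duplicate records while keeping the original order.
unique = list(dict.fromkeys(records))

# 2. Fix improperly formatted values (stray spaces, text placeholders).
parsed = [(rid, to_number(v)) for rid, v in unique]

# 3. Fill missing readings with the median of the known readings.
known = [v for _, v in parsed if v is not None]
fill = median(known)
filled = [(rid, fill if v is None else v) for rid, v in parsed]

# 4. Separate outliers using the 1.5 * IQR rule.
q1, _, q3 = quantiles([v for _, v in filled], n=4)
low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
cleaned = [(rid, v) for rid, v in filled if low <= v <= high]
outliers = [(rid, v) for rid, v in filled if not low <= v <= high]
```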

6
Q

In what phase does the analyst deal with the following:

Central tendency/measures of center (e.g., mean, median, mode), variability (e.g., standard deviation and quartiles), and distributions (e.g., normal, skewed, etc.)
Identify basic correlations between variables
Pattern discovery

A

Data exploration/Exploratory Data Analysis (EDA)/Descriptive Statistics
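
Example: the descriptive statistics named on this card can be computed with Python's `statistics` module. The sample values and the hand-written `pearson` helper are hypothetical and shown only as a sketch of a basic correlation check.

```python
from math import sqrt
from statistics import mean, median, mode, quantiles, stdev

values = [2, 3, 3, 5, 7, 10]  # made-up sample

# Measures of center
center = {"mean": mean(values), "median": median(values), "mode": mode(values)}

# Measures of variability
spread = {"stdev": stdev(values), "quartiles": quantiles(values, n=4)}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Basic correlation between two variables (hypothetical study data).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson(hours, scores)  # close to +1: strong positive linear relationship
```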

7
Q

In what phase does the analyst deal with the following:

Estimate/project future values or likelihood of an event.
Extend correlations found in EDA to mathematical models
Predict/determine output values based on input values
Cross-validation of predictive models to ensure accuracy.

A

Predictive Modeling/Data Modeling/Correlation based models/Regression models/Time Series
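
Example: a minimal sketch of predicting an output from an input and cross-validating the model. It fits a simple least-squares line by hand and estimates its error with k-fold cross-validation; the data, the function names, and the choice of k = 5 are assumptions for illustration, not a prescribed method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def k_fold_mse(xs, ys, k):
    """Average held-out mean squared error across k folds."""
    fold_errors = []
    for fold in range(k):
        test_idx = set(range(fold, len(xs), k))  # every k-th point is held out
        pairs = list(zip(xs, ys))
        train = [p for i, p in enumerate(pairs) if i not in test_idx]
        test = [p for i, p in enumerate(pairs) if i in test_idx]
        slope, intercept = fit_line(*zip(*train))
        mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)
        fold_errors.append(mse)
    return sum(fold_errors) / k

# Hypothetical data: roughly y = 2x + 1 with small fixed noise.
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, -0.05, 0.1, -0.15, 0.05]
xs = list(range(1, 11))
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

slope, intercept = fit_line(xs, ys)
cv_error = k_fold_mse(xs, ys, k=5)
prediction = slope * 12 + intercept  # projected value for an unseen input
```

A low cross-validated error suggests the model generalizes beyond the points it was fitted on, which is the purpose of the cross-validation step on this card.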

8
Q

In what phase does the analyst deal with the following:

Creating training and testing datasets to build models from
Identify/detect patterns
Determine if groups (clusters) exist in data
Classify data into groups
Create models that “learn” and improve (e.g., machine/deep learning, AI, etc.)

A

Data Mining/Machine Learning/AI/Supervised and Unsupervised Models
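
Example: to illustrate "determine if groups (clusters) exist in data", here is a minimal one-dimensional k-means sketch with two clusters. It covers only the unsupervised side of this card; the data points, the min/max initialization, and the fixed iteration count are assumptions chosen to keep the example short and deterministic.

```python
def k_means_1d(points, iterations=10):
    """Two-cluster k-means on 1-D data, initialized at the min and max."""
    centroids = [min(points), max(points)]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = 0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical measurements that visibly fall into two groups.
points = [1.0, 1.2, 0.8, 1.1, 8.0, 8.3, 7.9, 8.1]
centroids, clusters = k_means_1d(points)
```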
