Study Flashcards

1
Q

Also known as the discovery phase

A

Business understanding

2
Q

Analyst defines the major questions of interest that need to be answered

A

Business understanding

3
Q

The phase of collecting data

A

Data acquisition

4
Q

Alternative names include data cleansing, data wrangling, data munging, and feature engineering

A

Data cleaning

5
Q

When ignored, the results from analysis may be irrelevant
No single common tool; may use SQL, Python, R, or Excel
Data quality is measured in terms of uniqueness and relevance

A

Data cleaning

6
Q

Analyst begins to understand the basic nature of data and the relationships within
Often relies on visualization tools and numerical summaries such as central tendency and variability
Central tendency is a single value that attempts to describe a set of data by identifying the central position
Variability describes how far apart data points lie from each other and from the center of a distribution

A

Data exploration
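
Worked example for this card: a minimal sketch of the numerical summaries named above, using only Python's standard `statistics` module. The `values` list is made-up illustration data, not part of the deck.

```python
import statistics

# Hypothetical sample values, used only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency: single values that describe the center of the data.
mean_value = statistics.mean(values)
median_value = statistics.median(values)
mode_value = statistics.mode(values)

# Variability: how far the values spread from each other and from the center.
value_range = max(values) - min(values)
population_std = statistics.pstdev(values)  # population standard deviation

print("mean:", mean_value, "median:", median_value, "mode:", mode_value)
print("range:", value_range, "std dev:", population_std)
```

Mean, median, and mode can disagree on skewed data, which is one reason exploration usually reports more than one of them.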

7
Q

Creating models that enable predictions of outcomes of interest
Tools such as Python and R play an important role in automating the training and use of models

A

Predictive modeling

8
Q

Sometimes machine learning is used as a synonym

A

Data mining

9
Q

Ability of computers to look for patterns in large amounts of data
Tools such as Python and R play an important role

A

Data mining

10
Q

An analyst tells the story of the data and uses graphs or interactive dashboards to inform others of the findings from the analyses

A

Reporting and visualization

11
Q

The goal is to provide actionable insights for various stakeholders

A

Reporting and visualization

12
Q

Scope Project
Identify stakeholders and research questions/KPIs
Identify timeline, budget, and participants

A

Business Understanding

13
Q

Gather/collect data from a variety of sources
Provide structure to data accessible via relational databases (SQL)
Build data pipeline (ETL)
Use an API to download data from an external source

A

Data acquisition

14
Q

Estimate/project future values or likelihood of an event.
Extend correlations found in EDA to mathematical models
Predict/determine output values based on input values
Cross-validation of predictive models to ensure accuracy.

A

Predictive Modeling
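
Worked example for this card: one common form of cross-validation is k-fold, where each fold takes a turn as the test set while the rest is used for training. The sketch below only produces the index splits; the helper name `k_fold_indices` is hypothetical, and real projects typically use a library routine instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=6, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

A model would be fitted on each `train` split and scored on the matching `test` split; averaging the scores gives a steadier accuracy estimate than a single split.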

15
Q

Creating training and testing datasets to build models from
Identify/detect patterns
Determine if groups (clusters) exist in data
Classify data into groups
Create models that “learn” and improve (e.g., machine/deep learning, AI, etc.)

A

Data mining
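
Worked example for this card: a minimal sketch of creating training and testing datasets by shuffling and slicing. The function name `train_test_split`, the 25% test fraction, and the fixed seed are illustrative assumptions, not something the deck specifies.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = rows[:]
    # A seeded generator makes the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(8))  # stand-in for eight data records
train, test = train_test_split(rows)
print("train size:", len(train), "test size:", len(test))
```

Keeping the test rows out of model fitting is what makes the later accuracy check meaningful.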

16
Q

Tell a story with data
Provide a summary of the analysis
Provide insights to stakeholders
Create insightful graphs that showcase trends and forecasts

A

Reporting and visualization

17
Q

What happened?

A

Descriptive Analytics

18
Q

Why did it happen?

A

Diagnostic Analytics

19
Q

What will happen?

A

Predictive Analytics

20
Q

How can we make it happen?

A

Prescriptive Analytics

21
Q

Is a relationship between two variables: when one variable changes, you know the degree to which the other variable changes

A

Correlation
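
Worked example for this card: the "degree" of a linear relationship is often measured with the Pearson correlation coefficient, which ranges from -1 to 1. This is a sketch of that one measure, computed by hand; the `hours` and `scores` data are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


hours = [1, 2, 3, 4, 5]      # hypothetical study hours
scores = [2, 4, 6, 8, 10]    # hypothetical quiz scores
r = pearson_r(hours, scores)
print("r =", r)
```

A value near 1 or -1 indicates a strong linear relationship, but on its own it says nothing about cause and effect.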

22
Q

Is when there is a real-world explanation for why the relationship logically occurs; it implies cause and effect

A

Causation

23
Q

Which phase of the data analytics life cycle is also known as the discovery phase?

A

Business understanding

24
Q

Which phase of the data analytics life cycle allows an analyst to use graphs or interactive dashboards to tell the story of the data?

A

Reporting and visualization

25
Q

In which phase of the data analytics life cycle does the analyst begin to understand the nature of the data?

A

Data exploration

26
Q

Which phase of the data analytics life cycle provides structure to data accessible via relational databases?

A

Data acquisition

27
Q

Which term is defined as a relationship between two variables?

A

Correlation

28
Q

A way to graph numerical data in groups or bins that allow bars to represent frequencies

A

Histogram
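
Worked example for this card: a minimal sketch of the binning step behind a histogram, printing text bars instead of drawing a chart. The helper `histogram_counts`, the `ages` data, and the bin width of 10 are illustrative assumptions.

```python
def histogram_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = {}
    for v in values:
        bin_start = (v // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))


ages = [21, 25, 29, 33, 34, 38, 41, 47]  # hypothetical integer data
counts = histogram_counts(ages, bin_width=10)
for bin_start, count in counts.items():
    # Bar length represents the frequency of the bin.
    print(f"{bin_start}-{bin_start + 9}: {'#' * count}")
```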

29
Q

Provides a concise summary of the quartiles of numerical data (i.e., cut points that divide the data into four segments, each holding 25% of the values)

A

Boxplot
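
Worked example for this card: the numbers a boxplot draws can be computed with `statistics.quantiles` from the standard library (Python 3.8+). Quartile conventions vary between tools; this sketch uses the function's default method, and the `data` list is made up.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical values

# n=4 returns the three cut points: Q1, median (Q2), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: the height of the box

print("min:", min(data), "Q1:", q1, "median:", q2, "Q3:", q3, "max:", max(data))
print("IQR:", iqr)
```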

30
Q

Colorful graph that can visually show frequency or interaction using a range of colors

A

Heatmap

31
Q

Two-dimensional graph
Great to visualize correlation or relationships

A

Scatterplot

32
Q

Predict an outcome based on a set of predictor variables

A

Regression
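
Worked example for this card: a sketch of the simplest case, a least-squares line with one predictor variable. The helper `fit_line` and the data points are hypothetical; regression with several predictors would normally use a statistics library.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4]   # predictor values (made up)
ys = [3, 5, 7, 9]   # observed outcomes (made up)
slope, intercept = fit_line(xs, ys)

# Predict the outcome for a new predictor value.
prediction = slope * 5 + intercept
print("slope:", slope, "intercept:", intercept, "prediction at x=5:", prediction)
```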

33
Q

Technique in which the analyst wants to assign an item to a specific category based on various conditions

A

Classification

34
Q

Groupings are unknown and the analyst wishes to determine if the objects belong to any groups

A

Clustering
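
Worked example for this card: k-means is one widely used clustering technique (the card itself does not name a specific algorithm). The sketch below runs it on one-dimensional data; the function name `k_means_1d`, the starting centers, and the values are illustrative assumptions.

```python
def k_means_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then re-average."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


values = [1, 2, 3, 10, 11, 12]  # hypothetical measurements
centers, clusters = k_means_1d(values, centers=[1, 12])
print("centers:", centers)
print("clusters:", clusters)
```

No group labels are supplied; the groups emerge from the distances between values, which is what separates clustering from classification.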

35
Q

Looks for trends in data over time
Focused on breaking apart different reasons for the variation (decomposition)

A

Time series
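
Worked example for this card: a moving average is one simple way to estimate the trend component when decomposing a series. This is a sketch under that assumption; the `sales` figures and the window of 3 are invented for illustration.

```python
def moving_average(series, window):
    """Average of each consecutive run of `window` values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


sales = [10, 12, 14, 13, 15, 17, 16, 18]  # hypothetical monthly sales
trend = moving_average(sales, window=3)

# What remains after removing the trend is seasonal and irregular variation.
# trend[i] is centered on sales[i + 1] for a window of 3.
remainder = [sales[i + 1] - t for i, t in enumerate(trend)]

print("trend:", trend)
print("remainder:", remainder)
```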

36
Q

Technique that attempts to group variables into meaningful groups

A

Principal Component Analysis

37
Q

Tool - Data Science (Deep Learning/AI), Web Development, Embedded System

A

Python

38
Q

Tool - Data analysis and statistical modeling

A

R

39
Q

Tool - Can easily perform matrix computation as well as optimization

A

Python

40
Q

Tool - Consists of many easy-to-use packages

A

R

41
Q

Key Characteristic: Often numbers or labels, stored in a structured framework of columns and rows relating to pre-set parameters
Typical File Types: Databases

A

Structured Data File Type

42
Q

Key Characteristic: Loosely organized into categories using meta tags
Typical File Types: JSON, XML, Email, Web pages

A

Semi-structured Data File Type
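
Worked example for this card: JSON records carry their own keys (tags), so two records need not share the same fields. This sketch parses such data with Python's standard `json` module and flattens it into uniform rows; the sample records and field names are made up.

```python
import json

# Two hypothetical records with different sets of keys.
raw = '[{"name": "Ada", "email": "ada@example.com"}, {"name": "Lin", "tags": ["vip"]}]'

records = json.loads(raw)

# Flatten into structured rows with fixed columns; missing fields become None.
rows = [{"name": r.get("name"), "email": r.get("email")} for r in records]

for row in rows:
    print(row)
```

Forcing every record into the same columns is the step that turns semi-structured data into structured data.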

43
Q

Key Characteristic: Text-heavy information that’s not organized in a clearly defined framework or model
Typical File Types: Audio, Video, Image data, Natural Language, Documents

A

Unstructured Data File Type