Chapter 2 - Descriptive Analytics I Flashcards

1
Q

What are the Characteristics that Define the Readiness Level of Data for an Analytics Study? (1-5)

A

Data source reliability - appropriateness of the medium where the data was obtained
Data content accuracy - data are correct and a good match for the analytics problem
Data accessibility - data are readily and easily obtainable
Data security and data privacy - only those who have authority and need to access the data can access it
Data richness - all the required data elements are included in the data set

2
Q

What are the Characteristics that Define the Readiness Level of Data for an Analytics Study? (6-10)

A

Data consistency - means data are accurately collected and combined/merged
Data currency/data timeliness - means the data should be up to date for a given analytics model
Data granularity - requires that the variables and data values be defined at the lowest level of detail for the intended use of the data.
Data validity - the term used to describe a match/mismatch between the actual and expected data values of a given variable.
Data relevancy - means the variables in the data set are all relevant to the study being conducted.

3
Q

What has to match related to data and its usability?

A

The data has to match the task for which it is intended to be used

4
Q

What does it mean to have data “analytics ready”?

A

It means the data has been transformed into a flat-file format and is ready for ingestion into predictive algorithms.

5
Q

What are the three broad sources of data that business analytics draws from?

A

Unstructured data, categorical structured data, and numerical structured data

6
Q

How is datum defined?

A

Datum is the singular form of data. Data are a collection of facts usually obtained as the result of experiments, observations, transactions, or experiences.

7
Q

What is unstructured data?

A

Data composed of any combination of textual, imagery, voice, and Web content.

8
Q

What is structured data?

A

Categorical or numeric data that is used in data mining algorithms.

9
Q

What are the six elements of the data taxonomy?

A

Categorical Data
Nominal Data
Ordinal Data
Numeric Data
Interval Data
Ratio Data

10
Q

What are the two main categories of structured data and the two subcategories below them?

A

Structured data is either categorical or numerical.
Categorical data is either nominal or ordinal
Numerical data is either interval or ratio

11
Q

What are the subcategories for unstructured data?

A

Textual
Multimedia
XML/JSON

12
Q

What are the four steps of data preprocessing?

A

Data consolidation
Data cleaning
Data transformation
Data reduction

13
Q

What is dimensional reduction or variable selection as it relates to data preparation?

A

The reduction of variables that describe the phenomenon from different perspectives down to a manageable size

14
Q

What are the main tasks and methods for Data Consolidation?

A

Tasks: Access and collect the data, select and filter the data
Methods: SQL Queries, domain expertise, software agents
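As a rough illustration of consolidation with SQL queries, the sketch below collects data from two tables and then selects and filters it. The table names, columns, and values are hypothetical, and an in-memory SQLite database stands in for whatever source systems a real study would use.

```python
import sqlite3

# In-memory database standing in for real source systems (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
INSERT INTO orders VALUES (1, 50.0), (1, 25.0), (2, 40.0), (3, 10.0);
""")

# Access and collect: join the two sources.
# Select and filter: keep only the East region and total the order amounts.
rows = conn.execute("""
SELECT c.id, c.region, SUM(o.amount) AS total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
WHERE c.region = 'East'
GROUP BY c.id, c.region
ORDER BY c.id
""").fetchall()
conn.close()

print(rows)
```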

15
Q

What are the main tasks and methods for Data Cleaning?

A

Tasks: Handle missing values, identify and reduce noise, find and eliminate errors
Methods: Fill in missing values, identify outliers and either remove or smooth them, identify erroneous data
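A minimal sketch of two of these cleaning methods, assuming missing values are filled with the median and outliers are flagged with the common 1.5 * IQR rule. Both choices are illustrative, not prescribed by the card, and the readings are made-up data.

```python
from statistics import median, quantiles

# Hypothetical sensor readings; None marks a missing value.
readings = [12.0, 11.6, None, 12.4, 95.0, 11.9, None, 12.1, 12.3, 11.8]

# Handle missing values: fill with the median of the observed values,
# which is less sensitive to extreme readings than the mean.
observed = [x for x in readings if x is not None]
fill_value = median(observed)
filled = [fill_value if x is None else x for x in readings]

# Identify noise: flag values outside the 1.5 * IQR fences.
q1, _, q3 = quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in observed if x < low or x > high]

# Remove the flagged values (smoothing them is the alternative).
cleaned = [x for x in filled if low <= x <= high]

print(outliers, len(cleaned))
```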

16
Q

What are the main tasks and methods for Data Transformation?

A

Tasks: Normalize, aggregate data; construct new attributes
Methods: Reduce range of values to a standard range; if needed, convert numeric variables to discrete variables; Derive new and more informative variables from existing ones
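The three transformation methods can be sketched as follows. The customer records, field names, and age cut-offs are hypothetical, and min-max scaling is only one of several ways to reduce values to a standard range.

```python
# Hypothetical customer records; all field names are illustrative.
customers = [
    {"age": 22, "income": 28000, "spend": 1400},
    {"age": 35, "income": 52000, "spend": 5200},
    {"age": 58, "income": 91000, "spend": 4550},
]


def min_max(values):
    """Normalize: rescale values linearly to [0, 1] (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def age_band(age):
    """Discretize: convert a numeric age into a category (cut-offs assumed)."""
    if age < 30:
        return "young"
    if age < 50:
        return "middle"
    return "senior"


incomes_scaled = min_max([c["income"] for c in customers])
for c, scaled in zip(customers, incomes_scaled):
    c["income_scaled"] = scaled
    c["age_band"] = age_band(c["age"])
    # Construct a new attribute: share of income spent.
    c["spend_ratio"] = c["spend"] / c["income"]

print(customers)
```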

17
Q

What are the main tasks and methods for Data Reduction?

A

Tasks: Reduce number of attributes and records, balance skewed data
Methods: Principal component analysis; sampling; oversample the less represented or undersample the overrepresented classes.
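Balancing skewed data by oversampling can be sketched as below, assuming simple random oversampling with replacement until every class matches the largest one. The labels and class sizes are made up for illustration.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical skewed data set: 95 "legit" records and 5 "fraud" records.
records = [("legit", i) for i in range(95)] + [("fraud", i) for i in range(5)]

counts = Counter(label for label, _ in records)
majority_size = max(counts.values())

# Oversample each smaller class by drawing extra records with replacement.
balanced = list(records)
for label, count in counts.items():
    pool = [r for r in records if r[0] == label]
    balanced.extend(random.choices(pool, k=majority_size - count))

balanced_counts = Counter(label for label, _ in balanced)
print(balanced_counts)
```

Undersampling would go the other way: draw only as many majority-class records as the minority class has.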

18
Q

How is dispersion defined?

A

It is the representation of the numerical spread of a given data set.
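As a small worked example, three common dispersion measures (range, variance, and standard deviation) can be computed as follows. The scores are illustrative, and the population versions of variance and standard deviation are assumed here.

```python
from statistics import pstdev, pvariance

# Illustrative data set with mean 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(scores) - min(scores)  # distance between the extremes
variance = pvariance(scores)             # mean squared deviation from the mean
std_dev = pstdev(scores)                 # square root of the variance

print(value_range, variance, std_dev)
```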

19
Q

What is the box and whiskers plot?

A

A graphical illustration of both centrality and dispersion of a given data set
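The numbers behind such a plot can be sketched as below: the median marks centrality, while the quartiles and whiskers mark dispersion. This assumes the common convention of whiskers that extend to the most extreme points within 1.5 * IQR of the box; the data are made up.

```python
from statistics import quantiles

# Illustrative data set (already sorted for readability).
data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49, 90]

# Box: first quartile, median, third quartile.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Whiskers: most extreme points inside the 1.5 * IQR fences (assumed convention).
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
whisker_low = min(x for x in data if x >= lower_fence)
whisker_high = max(x for x in data if x <= upper_fence)

# Points beyond the whiskers are drawn individually as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, whisker_low, whisker_high, outliers)
```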

20
Q

What is correlation vs. regression?

A

Correlation is interested in the low-level relationships between two variables; regression is concerned with the relationships between all explanatory variables and the response variable.
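To make the contrast concrete, the sketch below computes both on the same made-up data: Pearson's r only measures the strength of association, while a least-squares fit describes how the response depends on the explanatory variable. The variable names are hypothetical.

```python
from math import sqrt
from statistics import mean

# Hypothetical data: advertising spend (explanatory) and sales (response).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.1, 5.9, 8.2, 9.8]

mx, my = mean(ad_spend), mean(sales)
sxy = sum((x - mx) * (y - my) for x, y in zip(ad_spend, sales))
sxx = sum((x - mx) ** 2 for x in ad_spend)
syy = sum((y - my) ** 2 for y in sales)

# Correlation: symmetric degree of linear association, between -1 and 1.
r = sxy / sqrt(sxx * syy)

# Simple linear regression: sales is modeled as intercept + slope * ad_spend.
slope = sxy / sxx
intercept = my - slope * mx

print(r, slope, intercept)
```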

21
Q

What is simple versus multiple regression?

A

A simple regression is built between one response variable and one explanatory variable. Multiple regression is built between one response variable and multiple explanatory variables.

22
Q

What are three measures for assessing the fit of a regression model? (Revisit)

A

R squared
Overall F-test
Root mean square error (RMSE)
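A minimal sketch of how these three quantities can be computed from actual and predicted values. The numbers are made up, a single explanatory variable (k = 1) is assumed, and RMSE is taken here as the square root of the mean squared residual; some texts divide by the residual degrees of freedom instead.

```python
from math import sqrt

# Hypothetical observed responses and the values a fitted model predicts.
actual = [2.0, 4.1, 5.9, 8.2, 9.8]
predicted = [2.06, 4.03, 6.0, 7.97, 9.94]
k = 1  # number of explanatory variables (assumption)

n = len(actual)
mean_actual = sum(actual) / n
sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
sst = sum((a - mean_actual) ** 2 for a in actual)           # total variation

# R squared: share of total variation explained by the model.
r_squared = 1 - sse / sst

# RMSE: typical size of a prediction error, in the units of the response.
rmse = sqrt(sse / n)

# Overall F statistic: explained variance per predictor over residual variance.
f_stat = ((sst - sse) / k) / (sse / (n - k - 1))

print(r_squared, rmse, f_stat)
```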

23
Q

What are the five assumptions of linear regression?

A
  1. Linearity
  2. Independence
  3. Normality
  4. Constant variance
  5. No multicollinearity (explanatory variables are not highly correlated with one another)
24
Q

What is time series forecasting?

A

The use of mathematical modeling to predict future values of the variable of interest based on previously observed values.
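Two simple forecasting techniques illustrate the idea of predicting from previously observed values: a moving average and simple exponential smoothing. These are chosen only as examples; the sales figures, window size, and smoothing factor are hypothetical.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    return sum(series[-window:]) / window


def exponential_smoothing_forecast(series, alpha):
    """Forecast the next value with simple exponential smoothing.

    Each new observation pulls the running level toward itself by `alpha`,
    so recent values carry more weight than older ones.
    """
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical monthly sales, oldest first.
monthly_sales = [120, 130, 125, 140, 150, 145]

ma_forecast = moving_average_forecast(monthly_sales, window=3)
ses_forecast = exponential_smoothing_forecast(monthly_sales, alpha=0.5)

print(ma_forecast, ses_forecast)
```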

25
Q

What are the three types of business reporting and what does each report?

A

Metric management reports - outcome-oriented metrics (e.g. KPIs)
Dashboard Reports - A range of different performance indicators on one page
Balanced-Scorecard Reports - Present an integrated view of success in the organization from the financial, customer, business process, and learning and growth perspectives.

26
Q

What is data visualization?

A

The use of visual representations to explore, make sense of, and communicate data.