Exam 1 Flashcards

1
Q

Four Major components of BI

A

Data warehouse
Business Analytics
Business performance management (BPM)
User interface

2
Q

Descriptive Analytics

A

Answers “What happened?”

Uses historical data

3
Q

Predictive Analytics

A

Determines what is likely to happen in the future

4
Q

Prescriptive Analytics

A

“Prescribe” a solution based on data

5
Q

What is a data warehouse?

A

A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format

6
Q

Data Warehouse (formal definition)

A

A subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process

7
Q

Subject Oriented

A

Data organized by detailed subject, such as sales or customers
8
Q

Integrated

A

Data inconsistencies are removed; data from diverse operational applications are integrated
9
Q

Time-Variant

A

Contains historical data. Time is the one important dimension that all data warehouses must support
10
Q

Nonvolatile

A

Users cannot change or update the data

11
Q

Data Mart

A

A smaller-scale, departmental data warehouse that stores only limited, relevant data focused on a particular subject or department

12
Q

Dependent Data Mart

A

Subset that is created directly from a data warehouse

13
Q

Independent Data Mart

A

Small data warehouse designed for a strategic business unit or department, but its source is not an enterprise data warehouse (EDW)

14
Q

Compare DW & DL

The nature of data

A

Data Warehouse – Structured, processed

Data Lake – Any data in raw/native format

15
Q

Compare DW & DL

Processing

A

Data Warehouse – Schema-on-write (SQL)

Data Lake – Schema-on-read (NoSQL)

16
Q

Compare DW & DL

Retrieval speed

A

Data Warehouse – Very fast

Data Lake – Slow

17
Q

Compare DW & DL

Cost

A

Data Warehouse – Expensive for large data volumes

Data Lake – Designed for low-cost storage

18
Q

Compare DW & DL

Agility

A

Data Warehouse – Less agile, fixed configuration

Data Lake – Highly agile, flexible configuration

19
Q

Compare DW & DL

Novelty

A

Data Warehouse – Not new; mature

Data Lake – Very new; still maturing

20
Q

Compare DW & DL

Security

A

Data Warehouse – Well-secured

Data Lake – Not yet well-secured

21
Q

Compare DW & DL

Users

A

Data Warehouse – Business professionals

Data Lake – Data scientists

22
Q

What is Extract?

A

Reading data from one or more sources (e.g., OLTP databases, personal databases, spreadsheets)

23
Q

What is Transform?

A

Converting the extracted data into the appropriate form, including cleaning it

24
Q

What is Load?

A

Putting the data in the data warehouse
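
Study aid: the three ETL cards above can be sketched end to end in a few lines. This is only a minimal illustration, not a production pipeline. The CSV text, the `sales` table, and the function names `extract`, `transform`, and `load` are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an OLTP export or a spreadsheet.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,Carol,7.25
"""

def extract(text):
    """Extract: read rows from a source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean the rows and convert them to the target types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # drop rows with a missing amount
        cleaned.append((int(row["order_id"]),
                        row["customer"].strip().title(),
                        float(row["amount"])))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(transform(extract(RAW_CSV)), conn)
warehouse_rows = conn.execute(
    "SELECT order_id, customer, amount FROM sales ORDER BY order_id"
).fetchall()
print(warehouse_rows)
```

The row with the missing amount is dropped during the transform step, so only two cleaned rows reach the warehouse table.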

25
Q

Inmon Model

A

EDW approach (top-down)
Highly consistent dimensional view of data
Large project scale and scope
Up-front cost
Long duration; may be inflexible and unresponsive to changing business needs during implementation
Flexible to support organizational changes as a whole

26
Q

Kimball Model

A

Data mart approach (bottom-up)
Emphasizes delivering the value of the data warehouse to business users as quickly as possible
Focuses on each individual business process, giving a quick return on investment
Lacks the big picture of enterprise data warehousing (e.g., missing or redundant dimensions)
Lower cost
Fairly simple

27
Q

Dimensional Modeling

A

A retrieval-based system that supports high-volume query access

28
Q

Star Schema

A

Contains a fact table surrounded by and connected to several dimension tables
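
Study aid: a star schema can be pictured as one fact table joined to its dimension tables. The sketch below uses Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_store`) and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: measures plus a foreign key to each dimension
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_store   VALUES (1, 'Austin'), (2, 'Boston');
INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 20.0);
""")

# A typical query joins the fact table to a dimension and aggregates a measure.
totals = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(totals)
```

Each dimension connects only to the fact table, which is what gives the schema its star shape.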
29
Q

What is OLTP?

A

Online Transaction Processing

Main focus: efficiency of routine tasks

30
Q

What is OLAP?

A

Online Analytical Processing

Converts data into information for decision support

31
Q

Slice

A

Subset of a multidimensional array
32
Q

Dice

A

Slice on more than two dimensions
33
Q

Drill Down/Up

A

Navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
34
Q

Roll-up

A

Computing all of the data relationships for one or more dimensions
35
Q

Pivot

A

Used to change the dimensional orientation of a report or an ad hoc query-page display
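
Study aid: the OLAP operations on the preceding cards (slice, dice, roll-up, pivot) can be tried on a tiny in-memory cube. This is a simplified sketch in plain Python, not how an OLAP server implements them. The cube contents and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells: (year, region, product) -> sales
cube = {
    (2023, "East", "Widget"): 10,
    (2023, "West", "Widget"): 20,
    (2023, "East", "Gadget"): 5,
    (2024, "East", "Widget"): 30,
    (2024, "West", "Gadget"): 15,
}

def slice_cube(cube, year):
    """Slice: fix one dimension (year) to a single value."""
    return {k: v for k, v in cube.items() if k[0] == year}

def dice_cube(cube, years, regions, products):
    """Dice: restrict several dimensions at once."""
    return {k: v for k, v in cube.items()
            if k[0] in years and k[1] in regions and k[2] in products}

def roll_up(cube, dim_index):
    """Roll-up: summarize by one dimension, aggregating away the others."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[dim_index]] += value
    return dict(totals)

def pivot(cube):
    """Pivot: present the data with regions as rows and years as columns."""
    table = defaultdict(lambda: defaultdict(int))
    for (year, region, _product), value in cube.items():
        table[region][year] += value
    return {region: dict(years) for region, years in table.items()}

sliced = slice_cube(cube, 2023)
diced = dice_cube(cube, {2023, 2024}, {"East"}, {"Widget"})
by_year = roll_up(cube, 0)
by_region_year = pivot(cube)
print(by_year, by_region_year)
```

Drilling down is the reverse of the roll-up shown here: moving from `by_year` back to the detailed cells in `cube`.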
36
Q

Compare OLTP & OLAP

Purpose

A

OLTP – To carry out day-to-day business functions

OLAP – To support decision making and provide answers to business and management queries

37
Q

Compare OLTP & OLAP

Data Source

A

OLTP – Transaction database (a normalized data repository primarily focused on efficiency and consistency)

OLAP – Data warehouse or data mart (a nonnormalized data repository primarily focused on accuracy and completeness)
38
Q

Compare OLTP & OLAP

Reporting

A

OLTP – Routine, periodic, narrowly focused reports

OLAP – Ad hoc, multidimensional, broadly focused reports and queries

39
Q

Compare OLTP & OLAP

Resource Requirements

A

OLTP – Ordinary relational databases

OLAP – Multiprocessor, large-capacity, specialized databases

40
Q

Compare OLTP & OLAP

Execution Speed

A

OLTP – Fast (recording of business transactions and routine reports)

OLAP – Slow (resource-intensive, complex, large-scale queries)
41
Q

Nominal

A

Categories have no logical ordering
Typically string (label) data
Lack order, scale, and distance between values

42
Q

Ordinal

A

Categories have a logical ordering

43
Q

Interval

A

Has order, scale, and distance

44
Q

Data Consolidation

A

Access and collect the data
Select and filter the data
Integrate and unify the data

45
Q

Data Cleaning

A

Handle missing values in the data
Identify and reduce noise in the data
Find and eliminate erroneous data

46
Q

Data Transformation

A

Normalize the data
Discretize or aggregate the data
Construct new attributes
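
Study aid: the three transformation steps above can be sketched on a handful of records. This is one possible illustration only. Min-max scaling is just one common way to normalize, and the customer records, bin edges, and attribute names below are hypothetical.

```python
def min_max_normalize(values):
    """Normalize: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical; nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def discretize(value, edges, labels):
    """Discretize: return the label of the first bin whose edge exceeds value."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]  # at or above the last edge

# Hypothetical customer records.
customers = [
    {"age": 20, "income": 30000, "household_size": 1},
    {"age": 35, "income": 60000, "household_size": 3},
    {"age": 50, "income": 90000, "household_size": 2},
]

scaled_ages = min_max_normalize([c["age"] for c in customers])
for customer, scaled in zip(customers, scaled_ages):
    customer["age_scaled"] = scaled
    customer["age_group"] = discretize(
        customer["age"], [30, 45], ["young", "middle", "senior"]
    )
    # Construct a new attribute from two existing ones.
    customer["income_per_person"] = customer["income"] / customer["household_size"]

print(customers)
```

After the loop, each record carries a scaled age, an age group, and a derived per-person income alongside its original attributes.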

47
Q

Data Reduction

A

Reduce number of attributes
Reduce number of records
Balance skewed data

48
Q

Line Chart

A

Show trends over time

49
Q

Bar Chart

A

Compare across multiple categories

Should always have a zero baseline

50
Q

Pie Chart

A

Proportions of a specific measure

51
Q

Scatter Plot

A

Explore trends, concentration, and outliers

Shows the relationship between two variables by encoding data on the x- and y-axes

52
Q

Bubble chart

A

Scatter plot with varying circle size and/or color, which can encode additional dimensions
Displays data in a cluster of circles

53
Q

Histogram

A

X-axis is usually a numeric variable, grouped into bins

Conveys how data are distributed across groups

54
Q

Gantt Chart

A

Horizontal bar chart that portrays project timelines, tasks/activities, etc.

55
Q

PERT Chart

A

Shows precedence relationships among project activities/tasks

56
Q

Bullet Chart

A

Compares a primary measure against a goal

57
Q

Highlight Table

A

Quickly identifying highs and lows
Enhancing a crosstab
One or more dimensions and exactly one measure

58
Q

Heat Map

A

Compare data across two categories using color

59
Q

Tree Map

A

Rectangles nested within other rectangles to show hierarchical data as proportions of the whole

60
Q

Supervised Learning

A

Values of the input and output are previously known

Predicts the output for data where ONLY the input variables are known

61
Q

Unsupervised Learning

A

Find patterns in data based on the relationship between data points themselves
No target variable
The inputs are analyzed and clustered based on the proximity of input values to one another

62
Q

Prediction

A

Tells the nature of future occurrences of certain events based on what has happened in the past

63
Q

Association

A

Market basket analysis

Commonly co-occurring groupings of things
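
Study aid: "commonly co-occurring groupings" can be made concrete by counting item pairs across baskets. The sketch below also computes support and confidence, two standard association-rule measures that are not defined on the card itself. The baskets are made-up sample data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(pair_counts.most_common())
print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
```

Here bread and milk appear together in two of the four baskets, and two of the three baskets that contain bread also contain milk.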

64
Q

Cluster

A

Segmentation
Natural grouping of things based on their known characteristics
Customer demographics

65
Q

CRISP-DM vs. SEMMA

A

CRISP-DM takes a more comprehensive approach, including understanding of the business and the relevant data
SEMMA implicitly assumes that the data mining project’s goals and objectives, along with the appropriate data sources, have been identified and understood