Exam 1 Flashcards

1
Q

Four major components of BI

A

Data warehouse
Business Analytics
Business performance management (BPM)
User interface

2
Q

Descriptive Analytics

A

Answers “What happened?”

Based on historical data

3
Q

Predictive Analytics

A

Determines what is likely to happen in the future

4
Q

Prescriptive Analytics

A

“Prescribes” a course of action (what should be done) based on data

5
Q

What is a data warehouse?

A

A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format

6
Q

Data warehouse (Inmon's definition)

A

A subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process

7
Q

Subject-Oriented

A
Data organized by detailed subject, such as sales or customers
8
Q

Integrated

A
Data inconsistencies are removed; data from diverse operational applications is integrated
9
Q

Time-Variant

A
Contains historical data. Time is the one important dimension that all data warehouses must support
10
Q

Nonvolatile

A

Once data are loaded, users cannot change or update them

11
Q

Data Mart

A

A smaller-scale, departmental “DW” that stores only limited, relevant data focused on a particular subject or department

12
Q

Dependent Data Mart

A

A subset created directly from a data warehouse

13
Q

Independent Data Mart

A

A small data warehouse designed for a strategic business unit or department, but its source is not an enterprise data warehouse (EDW)

14
Q

Compare DW & DL

The Nature of Data

A

o Data Warehouse – Structured, processed

o Data Lake – Any data in raw/native format

15
Q

Compare DW & DL

Processing

A

o Data Warehouse – Schema-on-write (SQL)

o Data Lake – Schema-on-read (NoSQL)
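To make the schema-on-write vs. schema-on-read distinction concrete, here is a minimal Python sketch. The record layout, `SCHEMA`, and all function names are hypothetical, and plain lists stand in for real storage: the warehouse path validates and types each record before storing it, while the lake path stores raw text and applies the schema only when the data is read.

```python
import json

# Hypothetical target schema: field name -> function used to coerce the raw value
SCHEMA = {"order_id": int, "amount": float}


def apply_schema(raw_record):
    """Parse one raw JSON record and coerce its fields; raises if it does not fit SCHEMA."""
    record = json.loads(raw_record)
    return {field: cast(record[field]) for field, cast in SCHEMA.items()}


def write_to_warehouse(table, raw_record):
    """Schema-on-write: validate and type the record BEFORE storing it."""
    table.append(apply_schema(raw_record))  # a bad record raises here and is never stored


def write_to_lake(lake, raw_record):
    """Data lake ingestion: store the raw text as-is, with no validation."""
    lake.append(raw_record)


def read_from_lake(lake):
    """Schema-on-read: apply the schema only at query time, skipping records that don't fit."""
    rows = []
    for raw in lake:
        try:
            rows.append(apply_schema(raw))
        except (KeyError, TypeError, ValueError):
            continue
    return rows


warehouse, lake = [], []
good = '{"order_id": "1", "amount": "9.50"}'
bad = '{"order_id": "2"}'  # missing "amount"

write_to_warehouse(warehouse, good)
try:
    write_to_warehouse(warehouse, bad)
except KeyError:
    print("rejected at write time")

write_to_lake(lake, good)
write_to_lake(lake, bad)  # accepted: the lake does not check structure on ingest
print(warehouse)             # [{'order_id': 1, 'amount': 9.5}]
print(read_from_lake(lake))  # [{'order_id': 1, 'amount': 9.5}]
```

The trade-off the card describes shows up directly: the warehouse rejects the bad record up front, while the lake keeps everything and pushes the structuring work onto each reader.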

16
Q

Compare DW & DL

Retrieval Speed

A

o Data Warehouse – Very fast

o Data Lake – Slow

17
Q

Compare DW & DL

Cost

A

o Data Warehouse – Expensive for large data volumes

o Data Lake – Designed for low-cost storage

18
Q

Compare DW & DL

Agility

A

o Data Warehouse – Less agile, fixed configuration

o Data Lake – Highly agile, flexible configuration

19
Q

Compare DW & DL

Novelty

A

o Data Warehouse – Not new/mature

o Data Lake – Very new/maturing

20
Q

Compare DW & DL

Security

A

o Data Warehouse – Well-secured

o Data Lake – Not yet well-secured

21
Q

Compare DW & DL

Users

A

o Data Warehouse – Business professionals

o Data Lake – Data scientists

22
Q

What is Extract?

A

Reading data from one or more sources (e.g., OLTP databases, personal databases, spreadsheets)

23
Q

What is Transform?

A

Converting the extracted data into the appropriate form, including cleaning it

24
Q

What is Load?

A

Putting the data in the data warehouse
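The Extract, Transform, and Load steps can be sketched end to end with Python's standard library. This is a minimal illustration under stated assumptions: a small CSV export stands in for the operational source, an in-memory SQLite database stands in for the warehouse, and the column names, cleaning rules, and `sales` table are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source: a CSV export from an operational (OLTP) system, with messy values
SOURCE_CSV = """order_id,region,amount
1, east ,100.50
2,WEST,
3,East,75.25
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows and convert values to consistent types and formats."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # cleaning: skip records with a missing amount
        clean.append((int(row["order_id"]), row["region"].strip().title(), float(row["amount"])))
    return clean


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
print(warehouse.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())
# [('East', 175.75)]
```

Real ETL tools add scheduling, logging, and incremental loads; this sketch only shows the order of the three steps and the kind of cleanup that happens in Transform.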

25
Inmon Model
EDW approach (top-down)
Highly consistent dimensional view of data
Large project scale and scope
High up-front cost
Long duration; may be inflexible and unresponsive to changing business needs during implementation
Flexible enough to support organizational changes as a whole
26
Kimball Model
Data mart approach (bottom-up)
Emphasizes delivering the value of the data warehouse to business users as quickly as possible
Focuses on individual business processes, enabling a quick return on investment
Lacks the big picture of enterprise data warehousing (e.g., missing or redundant dimensions)
Lower cost
Fairly simple
27
Dimensional Modeling
A retrieval-based system that supports high-volume query access
28
Star Schema
Contains a central fact table surrounded by and connected to several dimension tables
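As a rough sketch, a star schema can be built in SQLite from Python's standard library. The tables (`fact_sales`, `dim_date`, `dim_product`, `dim_store`) and their rows are hypothetical; what matters is the shape: one fact table holding numeric measures plus a foreign key to each dimension table, and queries that join the fact table to the dimensions they need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used to filter and group
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);

-- Fact table: numeric measures plus one foreign key per dimension
CREATE TABLE fact_sales (
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units      INTEGER,
    revenue    REAL
);

INSERT INTO dim_date    VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO dim_product VALUES (10, 'Laptop', 'Electronics'), (11, 'Desk', 'Furniture');
INSERT INTO dim_store   VALUES (100, 'Austin'), (101, 'Boston');
INSERT INTO fact_sales  VALUES
    (1, 10, 100, 2, 2000.0),
    (1, 11, 101, 1, 300.0),
    (2, 10, 101, 3, 3000.0);
""")

# Star join: aggregate a measure from the fact table by attributes from two dimensions
QUERY = """
SELECT p.category, d.quarter, SUM(f.revenue) AS revenue
FROM fact_sales AS f
JOIN dim_product AS p ON f.product_id = p.product_id
JOIN dim_date    AS d ON f.date_id = d.date_id
GROUP BY p.category, d.quarter
ORDER BY p.category, d.quarter
"""
for row in conn.execute(QUERY):
    print(row)
```

Each dimension joins to the fact table with a single key, which is what keeps high-volume retrieval queries like this one simple.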
29
What is OLTP?
Online Transaction Processing | Main focus: efficiency of routine tasks
30
What is OLAP?
Online Analytical Processing | Converting data into information for decision support
31
Slice
A subset of a multidimensional array (e.g., all data for a single value of one dimension, such as year = 2024)
32
Dice
A slice on more than two dimensions
33
Drill Down/Up
Navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
34
Roll-up
Computing all of the data relationships for one or more dimensions
35
Pivot
Used to change the dimensional orientation of a report or an ad hoc query-page display
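The slice, dice, drill down/up, roll-up, and pivot operations can be illustrated on a tiny in-memory cube. This is a minimal sketch, not how an OLAP server stores data: the cube is a hypothetical list of records with three dimensions and one measure, and roll-up/drill-down are interpreted in their common sense of aggregating to a coarser or finer set of dimensions.

```python
from collections import defaultdict

# Hypothetical cube stored as flat records: dimensions year, region, product; measure sales
CUBE = [
    {"year": 2023, "region": "East", "product": "A", "sales": 10},
    {"year": 2023, "region": "West", "product": "A", "sales": 20},
    {"year": 2023, "region": "East", "product": "B", "sales": 5},
    {"year": 2024, "region": "East", "product": "A", "sales": 15},
    {"year": 2024, "region": "West", "product": "B", "sales": 25},
]


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in cube if r[dim] == value]


def dice_cube(cube, criteria):
    """Dice: restrict several dimensions at once; criteria maps each dimension to its allowed values."""
    return [r for r in cube if all(r[d] in allowed for d, allowed in criteria.items())]


def aggregate(cube, dims):
    """Total the measure over the listed dimensions. Fewer dims = rolled up; more dims = drilled down."""
    totals = defaultdict(int)
    for r in cube:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)


def pivot(cube, row_dim, col_dim):
    """Pivot: lay the totals out with row_dim values as rows and col_dim values as columns."""
    table = defaultdict(dict)
    for (row, col), total in aggregate(cube, [row_dim, col_dim]).items():
        table[row][col] = total
    return dict(table)


print(slice_cube(CUBE, "year", 2024))
print(dice_cube(CUBE, {"year": {2023, 2024}, "region": {"East"}, "product": {"A"}}))
print(aggregate(CUBE, ["year"]))            # rolled up to year: {(2023,): 35, (2024,): 40}
print(aggregate(CUBE, ["year", "region"]))  # drilled down to year and region
print(pivot(CUBE, "region", "year"))        # {'East': {2023: 15, 2024: 15}, 'West': {2023: 20, 2024: 25}}
```

Swapping the arguments to `pivot` (year as rows, region as columns) shows the same totals with the orientation flipped, which is all a pivot changes.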
36
Compare OLTP & OLAP | Purpose
OLTP - To carry out day-to-day business functions | OLAP - To support decision making and provide answers to business and management queries
37
Compare OLTP & OLAP | Data Source
OLTP - Transaction database (a normalized data repository primarily focused on efficiency and consistency) | OLAP - Data warehouse or data mart (a nonnormalized data repository primarily focused on accuracy and completeness)
38
Compare OLTP & OLAP | Reporting
OLTP - Routine, periodic, narrowly focused reports | OLAP - Ad hoc, multidimensional, broadly focused reports and queries
39
Compare OLTP & OLAP | Resource Requirements
OLTP - Ordinary relational databases | OLAP - Multiprocessor, large-capacity, specialized databases
40
Compare OLTP & OLAP | Execution Speed
OLTP - Fast (recording of business transactions and routine reports) | OLAP - Slow (resource-intensive, complex, large-scale queries)
41
Nominal
No logical ordering (often string data, e.g., labels or categories) | Lacks order, scale, or distance between values
42
Ordinal
Has a logical ordering, but the distances between values are not meaningful
43
Interval
Has order, scale, and distance
44
Data Consolidation
Access and collect the data
Select and filter the data
Integrate and unify the data
45
Data Cleaning
Handle missing values in the data
Identify and reduce noise in the data
Find and eliminate erroneous data
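A minimal sketch of the three cleaning steps on one numeric attribute, using only Python's standard library. The data, the plausible range of 0 to 120, mean imputation, and smoothing by bin means are all illustrative choices, not the only options.

```python
from statistics import mean

# Hypothetical raw attribute: ages with a missing value (None) and two impossible values
raw_ages = [34, None, 29, 41, -5, 38, 250]


def clean_ages(ages, low=0, high=120):
    """Remove erroneous values, then impute missing ones with the mean of the valid observations."""
    # Find and eliminate erroneous data: here, anything outside an assumed plausible range
    valid = [a for a in ages if a is None or low <= a <= high]
    # Handle missing values: replace None with the mean of the observed values
    fill = mean(a for a in valid if a is not None)
    return [fill if a is None else a for a in valid]


def smooth_by_bin_means(values, bin_size):
    """Reduce noise: sort, split into equal-size bins, and replace each value with its bin's mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


cleaned = clean_ages(raw_ages)
print(cleaned)                          # [34, 35.5, 29, 41, 38]
print(smooth_by_bin_means(cleaned, 2))  # [31.5, 31.5, 36.75, 36.75, 41]
```

The order matters in this sketch: erroneous values are removed first so they do not distort the mean used for imputation.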
46
Data Transformation
Normalize the data
Discretize or aggregate the data
Construct new attributes
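The three transformation steps can be sketched in plain Python. This assumes hypothetical (income, spend) records; min-max scaling is used as one common normalization, the income cut points for discretization are arbitrary, and the new attribute (spend-to-income ratio) is only an example.

```python
# Hypothetical customer records: (annual_income, annual_spend)
records = [(20000, 5000), (50000, 10000), (80000, 40000)]


def min_max_normalize(values):
    """Normalize: linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def discretize(value, cut_points, labels):
    """Discretize: map a number to a category; needs len(labels) == len(cut_points) + 1."""
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]


incomes = [income for income, _ in records]
normalized = min_max_normalize(incomes)
bands = [discretize(v, [30000, 60000], ["low", "medium", "high"]) for v in incomes]
# Construct a new attribute from existing ones: share of income that is spent
spend_ratio = [spend / income for income, spend in records]

print(normalized)   # [0.0, 0.5, 1.0]
print(bands)        # ['low', 'medium', 'high']
print(spend_ratio)  # [0.25, 0.2, 0.5]
```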
47
Data Reduction
Reduce number of attributes
Reduce number of records
Balance skewed data
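A minimal sketch of the three reduction steps in plain Python. The records are synthetic and hypothetical, sampling uses a fixed seed so results are repeatable, and random undersampling of the majority class is just one way to balance skewed data (oversampling the minority class is another).

```python
import random

# Hypothetical records: (customer_id, age, income, label); only 10% are labeled "fraud"
records = [(i, 20 + i % 40, 30000 + 1000 * i, "fraud" if i % 10 == 0 else "ok") for i in range(100)]


def reduce_attributes(rows, keep):
    """Reduce the number of attributes: keep only the column positions listed in `keep`."""
    return [tuple(row[k] for k in keep) for row in rows]


def sample_records(rows, n, seed=0):
    """Reduce the number of records: simple random sample of n rows, without replacement."""
    return random.Random(seed).sample(rows, n)


def undersample(rows, label_index, seed=0):
    """Balance skewed data: randomly keep only as many rows per class as the smallest class has."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_index], []).append(row)
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced


slim = reduce_attributes(records, keep=[1, 2, 3])  # drop customer_id
subset = sample_records(slim, 20)
balanced = undersample(slim, label_index=2)
print(len(slim[0]), len(subset), len(balanced))  # 3 20 20
```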
48
Line Chart
Show trends over time
49
Bar Chart
Compare across multiple categories | Should always have a zero baseline
50
Pie Chart
Proportions of a specific measure
51
Scatter Plot
Explore trends, concentration, and outliers | Relationship between two variables by encoding data on x and y axis
52
Bubble chart
Scatter plots with varying size and/or color of the circles, which can add additional dimensions
Display data in a cluster of circles
53
Histogram
X-axis is usually a numeric variable, grouped into bins | Conveys how data are distributed across groups
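The binning step behind a histogram can be sketched in a few lines of Python. The scores, bin width of 10, and starting edge of 50 are hypothetical, and the text bars only stand in for a real chart.

```python
def histogram_counts(values, bin_width, start):
    """Assign each value to an equal-width bin [lower, lower + bin_width) and count values per bin."""
    counts = {}
    for v in values:
        lower = start + ((v - start) // bin_width) * bin_width  # left edge of v's bin
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))  # note: empty bins are simply absent in this sketch


exam_scores = [55, 62, 68, 71, 74, 75, 79, 83, 88, 95]  # hypothetical data
WIDTH = 10
for lower, count in histogram_counts(exam_scores, WIDTH, 50).items():
    print(f"[{lower}, {lower + WIDTH}): {'#' * count}")
```

Changing `WIDTH` changes the picture of the distribution, which is why bin width is the main design choice when building a histogram.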
54
Gantt Chart
Horizontal bar chart that portrays project timelines, tasks/activities, etc.
55
PERT Chart
Shows precedence relationships among project activities/tasks
56
Bullet Chart
Compares a primary measure against a goal
57
Highlight Table
Quickly identifying highs and lows
Enhancing a crosstab
One or more dimensions and exactly one measure
58
Heat Map
Compare data across two categories using color
59
Tree Map
Rectangles nested within other rectangles to show hierarchical data as a proportion to the whole
60
Supervised Learning
Values of both the inputs and the output are known for the training data | Predicts the output for new data when ONLY the input variables are known
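A minimal supervised learning sketch: fit a straight line by ordinary least squares to hypothetical training pairs where both the input and the output are known, then predict the output for an input that has no known output. Linear regression is chosen here only for brevity; any supervised method follows the same train-then-predict pattern.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, learned from examples where BOTH x and y are known."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b


def predict(model, x):
    """Use the learned model when ONLY the input is known."""
    a, b = model
    return a + b * x


# Hypothetical training data: advertising spend (input) -> sales (output)
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
model = fit_line(ad_spend, sales)
print(model)                # (1.0, 2.0)
print(predict(model, 5.0))  # 11.0
```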
61
Unsupervised Learning
Finds patterns in data based on the relationships between the data points themselves
No target variable
The inputs are analyzed and clustered based on the proximity of input values to one another
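As a sketch of unsupervised learning, here is a bare-bones k-means on one-dimensional data. There is no target variable; points are grouped only by their distance to the cluster centroids. The ages, k = 3, the starting centroids, and the fixed iteration count (instead of a convergence check) are hypothetical simplifications.

```python
def k_means_1d(points, centroids, iterations=10):
    """Bare-bones k-means on 1-D data: repeatedly assign points, then move centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean (an empty cluster keeps its centroid)
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


ages = [22, 25, 27, 45, 48, 52, 70, 73]  # hypothetical customer ages
centroids, clusters = k_means_1d(ages, centroids=[20.0, 50.0, 80.0])
print(clusters)  # [[22, 25, 27], [45, 48, 52], [70, 73]]
```

k-means is sensitive to its starting centroids, so practical implementations typically try several starts and keep the best result.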
62
Prediction
Tells the nature of future occurrences of certain events based on what has happened in the past
63
Association
Market basket | Commonly co-occurring groupings of things
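Market basket analysis is usually summarized with support and confidence. The sketch below computes both for hypothetical transactions by brute force over item pairs; algorithms such as Apriori exist to avoid this exhaustive search on real data, and the 0.6 support threshold is arbitrary.

```python
from itertools import combinations

# Hypothetical market-basket transactions (each is the set of items in one purchase)
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also contain the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


# Brute force over all item pairs; fine for a toy example
items = sorted(set().union(*baskets))
frequent_pairs = [(a, b) for a, b in combinations(items, 2) if support({a, b}, baskets) >= 0.6]
print(frequent_pairs)  # [('beer', 'diapers'), ('bread', 'diapers'), ('bread', 'milk'), ('diapers', 'milk')]
print(confidence({"diapers"}, {"beer"}, baskets))  # 0.75
```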
64
Cluster
Segmentation
Natural grouping of things based on their known characteristics
Customer demographics
65
CRISP-DM vs. SEMMA
CRISP-DM takes a more comprehensive approach, including understanding of the business and the relevant data
SEMMA implicitly assumes that the data mining project’s goals and objectives, along with the appropriate data sources, have been identified and understood