Exam 1 Flashcards

Question

Inmon Model

Answer 1

EDW approach (top-down) | Highly consistent dimensional view of data | Large scale and scope of project | Up-front cost | Long duration; may be inflexible and unresponsive to changing business needs during implementation | Flexible to support organizational changes as a whole

Answer 2

Data mart approach (bottom-up) | Emphasizes delivering the value of the data warehouse to business users as quickly as possible | Focuses on each individual business process, giving a quick return on investment | Lacks the big picture of enterprise data warehousing, i.e., some dimensions may be missing or redundant | Lower cost | Fairly simple

Answer 3

A retrieval-based system that supports high-volume query access

Answer 4

2. Contain a fact table surrounded by and connected to several dimension tables

Answer 5

Online Transaction Processing | Main Focus = Efficiency of Routine tasks

Answer 6

Online Analytical Processing | Converting data into information for decision support

Answer 7

1. Subset of a multidimensional array

Answer 8

1. Slice on more than two dimensions
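
A minimal sketch of the two operations described in Answers 7 and 8, assuming they refer to the OLAP "slice" and "dice" operations. The sales cube, its dimension names, and the helper functions are hypothetical and exist only for illustration.

```python
# Hypothetical sales cube: (product, region, year) -> units sold.
DIMS = ("product", "region", "year")
cube = {
    ("laptop", "east", 2023): 10,
    ("laptop", "west", 2023): 7,
    ("laptop", "east", 2024): 12,
    ("phone", "east", 2023): 20,
    ("phone", "west", 2024): 25,
}


def dice(cube, **allowed):
    """Keep cells whose coordinates fall within the allowed values of each named dimension."""
    def keep(key):
        return all(key[DIMS.index(dim)] in values for dim, values in allowed.items())
    return {key: units for key, units in cube.items() if keep(key)}


def slice_cube(cube, dim, value):
    """A slice fixes one value on a single dimension, giving a subset of the cube."""
    return dice(cube, **{dim: {value}})


# Slice: everything sold in 2023.
sales_2023 = slice_cube(cube, "year", 2023)

# Dice: restrict three dimensions at once.
east_2023 = dice(cube, product={"laptop", "phone"}, region={"east"}, year={2023})
```

Here `sales_2023` keeps three cells and `east_2023` keeps two, because each added restriction narrows the subset further.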

Answer 9

1. Navigating among levels of data ranging from the most summarized (up) to the most detailed (down)

Answer 10

1. Computing all of the data relationships for one or more dimensions
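
A minimal sketch of moving between summary levels, assuming Answers 9 and 10 refer to drill down/up and roll-up. Roll-up is treated here as summing a measure over the dimensions that are dropped; the rows and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows: (product, region, year, units).
rows = [
    ("laptop", "east", 2023, 10),
    ("laptop", "west", 2023, 7),
    ("laptop", "east", 2024, 12),
    ("phone", "east", 2023, 20),
    ("phone", "west", 2024, 25),
]


def roll_up(rows, keep):
    """Sum units over every dimension that is not listed in `keep`."""
    totals = defaultdict(int)
    for product, region, year, units in rows:
        coords = {"product": product, "region": region, "year": year}
        totals[tuple(coords[dim] for dim in keep)] += units
    return dict(totals)


grand_total = roll_up(rows, [])                      # most summarized level
by_product = roll_up(rows, ["product"])              # one level of detail
by_product_year = roll_up(rows, ["product", "year"]) # drill down one more level
```

Reading the three results from top to bottom is a drill down; reading them from bottom to top is a drill up.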

Answer 11

1. Used to change the dimensional orientation of a report or an ad hoc query-page display
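
A minimal sketch of changing the dimensional orientation of a report, assuming Answer 11 refers to the OLAP pivot operation. The records and the `crosstab` helper are hypothetical.

```python
# Hypothetical records for a small report.
records = [
    {"product": "laptop", "region": "east", "units": 22},
    {"product": "laptop", "region": "west", "units": 7},
    {"product": "phone", "region": "east", "units": 20},
    {"product": "phone", "region": "west", "units": 25},
]


def crosstab(records, row_dim, col_dim):
    """Build a nested dict: table[row value][column value] -> total units."""
    table = {}
    for rec in records:
        row = table.setdefault(rec[row_dim], {})
        row[rec[col_dim]] = row.get(rec[col_dim], 0) + rec["units"]
    return table


# Same data, two orientations: products as rows, then regions as rows.
by_product = crosstab(records, "product", "region")
by_region = crosstab(records, "region", "product")
```

The numbers do not change between `by_product` and `by_region`; only the axis they are read along does.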

Answer 12

1. OLTP - To carry out day-to-day business functions | 2. OLAP - To support decision making and provide answers to business and management queries.

Answer 13

1. OLTP - Transaction database (a normalized data repository primarily focused on efficiency and consistency) | 2. OLAP - Data warehouse or DM (a nonnormalized data repository primarily focused on accuracy and completeness)

Answer 14

1. OLTP - Routine, periodic, narrowly focused reports | 2. OLAP - Ad hoc, multidimensional, broadly focused reports and queries

Answer 15

1. OLTP - Ordinary relational databases | 2. OLAP - Multiprocessor, large-capacity, specialized databases

Answer 16

1. OLTP - Fast (recording of business transactions and routine reports) | 2. OLAP - Slow (resource-intensive, complex, large-scale queries)

Answer 17

Do not have logical ordering | String data | Lacks order, scale, or distance between them

Answer 18

Have logical ordering

Answer 19

Has order, scale, and distance

Answer 20

Access and collect the data | Select and filter the data | Integrate and unify the data

Answer 21

Handle missing values in the data | Identify and reduce noise in the data | Find and eliminate erroneous data
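
A minimal sketch of the cleaning steps on this card. It assumes one common choice, mean imputation, and treats out-of-range values as erroneous; the age list and the 0 to 120 validity range are hypothetical.

```python
from statistics import mean

# Hypothetical column: None marks a missing value, -5 is an obviously bad entry.
ages = [34, None, 29, -5, 41, None, 38]


def is_valid(age):
    """Assumed validity rule: present and within a plausible human age range."""
    return age is not None and 0 <= age <= 120


# Compute the fill value from valid entries only, so bad data does not skew it.
fill_value = mean(a for a in ages if is_valid(a))

# Replace missing and erroneous entries with the fill value.
cleaned = [a if is_valid(a) else fill_value for a in ages]
# cleaned -> [34, 35.5, 29, 35.5, 41, 35.5, 38]
```

Dropping the affected records instead of imputing them is an equally valid choice when enough data remains.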

Answer 22

Normalize the data | Discretize or aggregate the data | Construct new attributes
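
A minimal sketch of the three transformation steps on this card, assuming min-max scaling for normalization and fixed cut-offs for discretization. The income figures, household sizes, and band thresholds are hypothetical.

```python
# Hypothetical raw attributes.
incomes = [20_000, 35_000, 50_000, 80_000, 120_000]
household_sizes = [1, 2, 2, 4, 3]

# Normalize: min-max scaling maps every value into the range [0, 1].
low, high = min(incomes), max(incomes)
normalized = [(x - low) / (high - low) for x in incomes]


# Discretize: replace a numeric value with a labeled band (assumed thresholds).
def income_band(x):
    if x < 30_000:
        return "low"
    if x < 90_000:
        return "medium"
    return "high"


bands = [income_band(x) for x in incomes]

# Construct a new attribute from existing ones: income per household member.
income_per_person = [x / n for x, n in zip(incomes, household_sizes)]
```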

Answer 23

Reduce number of attributes | Reduce number of records | Balance skewed data
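
A minimal sketch of one way to balance skewed data: random undersampling of the larger classes. This is only one of several possible techniques; the `undersample` helper and the transaction records are hypothetical.

```python
import random
from collections import Counter


def undersample(records, label_of, seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    groups = {}
    for rec in records:
        groups.setdefault(label_of(rec), []).append(rec)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Hypothetical skewed data set: 8 legitimate transactions, 2 fraudulent ones.
records = [(i, "legit") for i in range(8)] + [(i, "fraud") for i in range(8, 10)]
balanced = undersample(records, label_of=lambda rec: rec[1])
class_counts = Counter(label for _, label in balanced)
```

After undersampling, `class_counts` holds two records per class, which also reduces the number of records overall.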

Answer 24

Show trends over time

Answer 25

Compare across multiple categories | Should always have a zero baseline

Answer 26

Proportions of a specific measure

Answer 27

Explore trends, concentration, and outliers | Relationship between two variables by encoding data on x and y axis

Answer 28

Scatter plots with varying size and/or color of the circles, which can add additional dimensions | Display data in a cluster of circles

Answer 29

X-axis is usually a numeric variable, grouped into bins | Conveys how data are distributed across groups
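
A minimal sketch of the binning behind this chart, assuming the card describes a histogram. The exam scores, the bin width of 10, and the `histogram` helper are hypothetical.

```python
def histogram(values, bin_width, start=0):
    """Count how many values fall in each bin, keyed by the bin's left edge."""
    counts = {}
    for v in values:
        left_edge = start + ((v - start) // bin_width) * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical exam scores grouped into bins of width 10.
scores = [52, 67, 71, 75, 78, 83, 85, 88, 91, 99]
bins = histogram(scores, bin_width=10)

# Text rendering: one row per bin, one '#' per observation.
for left_edge, count in bins.items():
    print(f"{left_edge}-{left_edge + 9}: {'#' * count}")
```

The bar lengths show where the scores concentrate, which is the distribution the card refers to.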

Answer 30

Horizontal bar chart that portrays project timelines, tasks/activities, etc.

Answer 31

Shows precedence relationships among project activities/tasks

Answer 32

Compares a primary measure against a goal

Answer 33

Quickly identifying highs and lows | Enhancing a crosstab | One or more dimensions and exactly one measure

Answer 34

Compare data across two categories using color

Answer 35

Rectangles nested within other rectangles to show hierarchical data as a proportion of the whole

Answer 36

Values of the input and output are previously known | Predict the output for a data set when only the input variables are known
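
A minimal sketch of the idea on this card, assuming it describes supervised learning. A 1-nearest-neighbor classifier is used only because it is short; it is one of many supervised methods. The training pairs and the `predict_1nn` helper are hypothetical.

```python
def predict_1nn(train, x):
    """Return the known output of the training example whose inputs are closest to x."""
    def squared_distance(pair):
        inputs, _ = pair
        return sum((a - b) ** 2 for a, b in zip(inputs, x))
    _, nearest_label = min(train, key=squared_distance)
    return nearest_label


# Training data: inputs AND outputs are known in advance.
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
]

# New records: only the inputs are known, so the output is predicted.
first = predict_1nn(train, (2.0, 1.0))   # "low"
second = predict_1nn(train, (8.5, 9.0))  # "high"
```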

Answer 37

Find patterns in data based on the relationships among the data points themselves | No target variable | The inputs are analyzed and clustered based on the proximity of input values to one another
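
A minimal sketch of clustering by proximity with no target variable, assuming this card describes unsupervised learning. It uses a one-dimensional k-means with fixed starting centroids; the points, starting centroids, and `k_means_1d` helper are hypothetical.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Inputs only: no labels are supplied anywhere.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
# centroids -> [1.5, 10.0]
```

The two groups emerge purely from how close the values are to one another.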

Answer 38

Tells the nature of future occurrences of certain events based on what has happened in the past

Answer 39

Market basket | Commonly co-occurring groupings of things
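
A minimal sketch of finding commonly co-occurring items, assuming this card describes association (market basket) analysis. It only counts item pairs and computes their support, which is the first step of the analysis rather than a full rule-mining algorithm. The baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is the set of items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

# Count every pair of items that appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support: the share of all baskets that contain the pair.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}
# support[("beer", "diapers")] -> 0.6
```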

Answer 40

Segmentation | Natural grouping of things based on their known characteristics | Customer demographics

Answer 41

CRISP-DM takes a more comprehensive approach, including understanding of the business and the relevant data | SEMMA implicitly assumes that the data mining project's goals and objectives, along with the appropriate data sources, have been identified and understood