Exam 1 Flashcards
___ is an umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies.
BI (business intelligence)
______ is data that cannot be stored or processed easily using traditional tools/means.
Big data
A(n) _____ is a major component of a BI system that holds source data.
Data Warehouse
A(n) _____ is a major component of a BI system that is often browser-based and often presents a portal or dashboard.
User Interface
OLAP
Online Analytical Processing
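OLAP tools let analysts summarize facts along several dimensions at once. The sketch below shows two classic OLAP operations, roll-up (aggregate to a coarser level) and slice (fix one dimension to a single value). It assumes a tiny in-memory list of hypothetical sales records; real OLAP servers precompute and index these aggregates, so this only illustrates the idea.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, region, product, revenue)
SALES = [
    (2023, "East", "Widget", 100.0),
    (2023, "West", "Widget", 150.0),
    (2023, "East", "Gadget", 80.0),
    (2024, "East", "Widget", 120.0),
    (2024, "West", "Gadget", 90.0),
]

# Position of each dimension inside a fact tuple
DIMENSIONS = {"year": 0, "region": 1, "product": 2}


def roll_up(facts, *dims):
    """Roll-up: total revenue grouped by the given dimensions only."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)


def slice_cube(facts, dim, value):
    """Slice: keep only the facts where one dimension equals a fixed value."""
    idx = DIMENSIONS[dim]
    return [row for row in facts if row[idx] == value]


print(roll_up(SALES, "year"))
print(roll_up(slice_cube(SALES, "region", "East"), "product"))
```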
In Chapter 1, our authors defined three types of business analytics. Identify the three types and the two questions typically answered by each.
- Descriptive: What happened? Why is it happening?
- Predictive: What will happen? Why will it happen?
- Prescriptive: What should I do? Why should I do it?
_____ statistics is about drawing conclusions about a population based on sample data.
Inferential
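A common inferential task is estimating a population mean from a sample with a confidence interval. This is a minimal sketch using the hypothetical sample values below and the normal (z) approximation from Python's `statistics.NormalDist`; for a sample this small, a t critical value would give a somewhat wider and more appropriate interval.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Uses the normal (z) approximation, which suits large samples;
    small samples would normally use a t critical value instead.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # sample std dev / sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * std_err, mean + z * std_err


# Hypothetical sample: order values (in dollars) drawn from all customers
sample = [52.0, 48.5, 61.2, 55.0, 47.8, 59.9, 50.3, 53.7, 49.1, 58.4]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the population mean: ({low:.2f}, {high:.2f})")
```

Raising `confidence` widens the interval: more certainty about capturing the true mean costs precision.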
______ is used to characterize the relationship between explanatory (input) and response (output) variables.
Regression
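The simplest case is one explanatory variable fitted with ordinary least squares. This sketch computes the slope and intercept directly from the textbook formulas; the advertising and sales figures are hypothetical.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be equal length with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance-style sum over variance-style sum
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return intercept, slope


# Hypothetical data: advertising spend (explanatory) vs. sales (response)
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_regression(ad_spend, sales)
print(f"sales ~ {b0:.2f} + {b1:.2f} * ad_spend")
```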
_____ refers to the use of visual representations to explore, make sense of, and communicate data.
Data visualization
The fundamental challenge of _____ design is to display all the required information on a single screen, clearly and without distraction, in a manner that can be assimilated quickly.
Dashboard
KPI
Key Performance Indicator
In Chapter 2, our authors defined four main tasks in their summary of data preprocessing tasks and methods. First, briefly describe why data preprocessing is necessary, and then identify the four main tasks.
Real-world data is dirty, misaligned, overly complex, and inaccurate. The four main tasks are data consolidation, data cleaning, data transformation, and data reduction.
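The four preprocessing tasks can be chained into a small pipeline. This is a sketch under stated assumptions: the customer records are hypothetical, and the specific techniques (keep-first deduplication, mean imputation, min-max scaling, dropping columns) are illustrative choices, not the only methods each task covers.

```python
# Hypothetical raw customer records from two source systems
crm_rows = [
    {"id": 1, "name": " Alice ", "age": "34", "income": "72000"},
    {"id": 2, "name": "Bob", "age": "", "income": "58000"},
]
erp_rows = [
    {"id": 3, "name": "Carol", "age": "29", "income": "n/a"},
    {"id": 1, "name": "Alice", "age": "34", "income": "72000"},  # same customer again
]


def consolidate(*sources):
    """Consolidation: merge all sources, keeping the first record seen per id."""
    merged = {}
    for source in sources:
        for row in source:
            merged.setdefault(row["id"], row)
    return list(merged.values())


def to_float(value):
    """Parse a number, returning None for blanks or junk such as 'n/a'."""
    try:
        return float(value)
    except ValueError:
        return None


def clean(rows):
    """Cleaning: trim stray whitespace and mark unparseable numbers as missing."""
    return [
        {"id": r["id"], "name": r["name"].strip(),
         "age": to_float(r["age"]), "income": to_float(r["income"])}
        for r in rows
    ]


def transform(rows):
    """Transformation (in place): fill gaps with column means, min-max scale income."""
    for field in ("age", "income"):
        known = [r[field] for r in rows if r[field] is not None]
        fill = sum(known) / len(known)  # assumes at least one known value
        for r in rows:
            if r[field] is None:
                r[field] = fill
    lo = min(r["income"] for r in rows)
    hi = max(r["income"] for r in rows)
    for r in rows:
        r["income_scaled"] = (r["income"] - lo) / (hi - lo) if hi > lo else 0.0
    return rows


def reduce_data(rows, keep=("id", "age", "income_scaled")):
    """Reduction: keep only the attributes needed for the analysis."""
    return [{k: r[k] for k in keep} for r in rows]


prepared = reduce_data(transform(clean(consolidate(crm_rows, erp_rows))))
for row in prepared:
    print(row)
```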
common features of a data warehouse
single version of truth
checked for quality
independent of transaction systems
common features of the semantic model
inventory of data elements
common business names
pre-defined joins
standardized calculations
common features of a presentation layer
multiple tools available
choose based on purpose
all point at the same data
ETL
Extract, Transform, Load
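The three ETL stages map naturally onto three functions. This minimal sketch assumes a hypothetical CSV export as the source and uses an in-memory SQLite database from Python's standard `sqlite3` module as a stand-in for the warehouse; the table and column names are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system
SOURCE_CSV = """order_id,customer,amount,currency
1001,acme corp,250.00,USD
1002,Globex,199.99,usd
1003,acme corp,,USD
"""


def extract(text):
    """Extract: read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: standardize names and currency codes, drop rows missing an amount."""
    out = []
    for r in rows:
        if not r["amount"]:
            continue  # reject incomplete records
        out.append((int(r["order_id"]), r["customer"].strip().title(),
                    float(r["amount"]), r["currency"].upper()))
    return out


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS orders (
        order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, currency TEXT)""")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(conn, transform(extract(SOURCE_CSV)))
print(conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall())
```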
A(n) _____, the most commonly used and simplest style of dimensional modeling, contains a fact table surrounded by and connected to several dimension tables.
Star Schema
Star Schema
- most commonly used and the simplest style of dimensional modeling
- Contains a fact table surrounded by and connected to several dimension tables
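A star schema is easiest to see as actual tables. This sketch builds one in an in-memory SQLite database: a central `fact_sales` table holding measures and foreign keys, surrounded by date, product, and store dimension tables. All table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per member
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);

-- Fact table at the center: numeric measures plus a foreign key to each dimension
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    units       INTEGER,
    revenue     REAL
);

INSERT INTO dim_date    VALUES (20240101, 2024, 1), (20240401, 2024, 2);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_store   VALUES (10, 'Austin'), (20, 'Denver');
INSERT INTO fact_sales  VALUES
    (20240101, 1, 10, 3, 3000.0),
    (20240101, 2, 20, 5, 1000.0),
    (20240401, 1, 20, 2, 2000.0);
""")

# Typical star join: group by dimension attributes, sum the fact measures
query = """
SELECT d.quarter, p.category, SUM(f.revenue) AS revenue
FROM fact_sales AS f
JOIN dim_date    AS d ON f.date_key = d.date_key
JOIN dim_product AS p ON f.product_key = p.product_key
GROUP BY d.quarter, p.category
ORDER BY d.quarter, p.category
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every query follows the same shape: join the fact table to whichever dimensions describe the question, then aggregate the measures.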
Briefly describe some of the defining characteristics of Kimball’s data warehouse development approach:
Dimensional model/star-schema
incremental
high development difficulty
_____ is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases.
Data mining
CRISP-DM
Cross-Industry Standard Process for Data Mining
Identify the four types of patterns used in data mining
association
prediction
segmentation
sequential
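Of the four pattern types, association is the easiest to compute by hand: support is how often an itemset appears, and confidence is how often the consequent appears given the antecedent. This sketch uses hypothetical market-basket transactions and a brute-force pair count; a real tool would use an algorithm such as Apriori to prune the search. Prediction, segmentation, and sequential patterns are not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_pairs(transactions, min_support):
    """Brute-force count of every item pair meeting the minimum support."""
    n = len(transactions)
    counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


print(frequent_pairs(TRANSACTIONS, min_support=0.6))
print(confidence({"diapers"}, {"beer"}, TRANSACTIONS))
```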
Identify the six steps in the CRISP-DM process
- business understanding
- data understanding
- data preparation
- model building
- testing and evaluation
- deployment
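The six steps can be traced through a deliberately tiny project. This is a toy sketch, not a real methodology implementation: the churn goal, the single feature, the synthetic labels (generated by a known rule), and the one-threshold "model" are all hypothetical. Business and data understanding are human activities, so they appear only as comments.

```python
import random


def prepare(data, test_fraction=0.3, seed=42):
    """Data preparation: shuffle and split into training and test sets."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def build_model(train):
    """Model building: pick the cutoff that best separates the classes in training data."""
    def accuracy_at(t):
        return sum((x > t) == bool(y) for x, y in train) / len(train)
    best_t = max(range(0, 13), key=accuracy_at)  # candidate cutoffs span the feature range
    return lambda x: int(x > best_t)


def evaluate(model, test):
    """Testing and evaluation: accuracy on held-out data."""
    return sum(model(x) == y for x, y in test) / len(test)


# Business understanding (hypothetical goal): flag customers likely to churn.
# Data understanding: synthetic records of (months_since_last_purchase, churned).
rng = random.Random(0)
data = [(m, int(m > 6)) for m in (rng.randint(0, 12) for _ in range(200))]

train, test = prepare(data)
model = build_model(train)
accuracy = evaluate(model, test)
print(f"holdout accuracy: {accuracy:.2f}")

# Deployment: the accepted model scores new, unlabeled records.
print([model(m) for m in (2, 9)])
```

In practice the process is iterative; a poor evaluation result usually sends the team back to data preparation or business understanding rather than forward to deployment.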