Warehousing and BI Flashcards

1
Q

Why do we use a data warehouse

A
  • standardise from multiple sources
  • saves time when building reports & analysis
  • allows new ways to report and analyse
  • prevent adverse impact on operational systems
  • enable comparison without underlying data changing
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

OLAP

A

Online analytical Processing (large amounts records and heavy duty queries, small number of knowledge areas)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

OLTP

A

Online transaction processing (lots of data coming from lots of different areas)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Data warehouse

A

(big warehouse where we store all kinds of information from all of the different source systems in our organization)
subject orientated, integrated, time variant and non-volatile collection of summary and detailed historical data used to support the strategic decision making processes for the corporation.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Once data is in a data warehouse…

A

it does not change (non-volatile)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How does warehoused data vary from transactional data?

A
  • subject orientated, only a specific subject area
  • integrates multiple sources of data
  • contains data over time periods, not just most recent data
  • data is non-volatile, it will not change once its in the warehouse
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Data Mart

A

Data stored for particular type of analytics, highly processed and easy to navigate.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Cubes

A

Multi dimensional storage (software e.g., PowerBI, tableau and hardware)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Data lake vs warehouse

A

lake is broader and nearer real time, often has data pipelined through, limited processing (no cleaning, integration etc).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Data lake data is used by

A

A data scientist

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Active vs classic data warehousing difference

A

active add operational BI (i.e., closer to real time, operational services etc)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Big Data points of definition

A
  • Viscosity (how difficult to use and integrate)
  • Volatility (how often changes are made, how long is data useful)
  • Veracity (how trustworthy)
  • Variability (the forms that data is stored, inconsistent across data sets)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Veracity

A

How trustworthy data is

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Data warehousing (processes)

A
  • extract processes
  • cleansing processes
  • transformation processes
  • load processes
  • associated control processes
  • the use of meta data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

3 types of datawarehouse

A

conventional, dimensional virtual

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Conventional Data warehouse (Inmon)

A

source -> transformation -> staging area -> data warehouse (normalised, atomic) -> data marts (normalised) / cubes (denormalised) -> analysis

entire datawarehouse, pull out data marts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Dimensional data warehouse (kimball)

A

sources -> transformation -> staging area -> data marts (all as one data warehouse) -> cubes

lots of datamarts ready for use, assembled into data warehouse. Denormalised.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Dimensions vs facts

A

Do analysis on facts, item/thing being descirbed
Do analysis by dimensio, background/context e.g., time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Another word for a dimensional model

A

star schema

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Dimensional modelling is (normalised/denormalised)

A

denormalised

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What makes up a dimensional model

A

dimension tables (nouns)
fact tables (events/measures)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Hierarchies for dimensions are…

A

stored in the dimensional table itself so there is no need for a separate hierachical table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is hierarchical data

A

Data where items are linked to each other in parent-child relationships e.g., animals, mammals, primates

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Additive facts

A

facts that can be added up across multiple dimensions without loosing value

e.g., number of units sold across product and customer etc

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
How to deal with changing dimensions?
- overwrite (loose history) - add a new row (for every product which was re categorised, with a new surrogate key which points to the facts) - add a new attribute (i.e. new category) (not good if lots of changes) - add history table ( )
26
example of changing dimensions
sales category changed
27
Problem with changes dimensions
- destroying history
28
How to change a dimension for lots of back to back comparisons
add an attribute
29
How to change dimension for in depth over time queries
add rows
30
Aggregated facts
associating the facts with a certain dimension and creating a data store
31
Types of BI Tools
-Query and reporting tools - OLAP (online analytical processing) - Analytical Applications (in a box, feed the exact data but then heaps of analytics available) - dashboard & scorecard - performance management tools (e.g., budgeting) - predictive analytics and data mining - advanced visualisation and discovery tools (interact with the visual)
32
Dashboard
dynamic presentation of operational information
33
Scorecard
static representation of progress towards longer term goals
34
Data Mining
Interactive exploration of data with no specific opening question, explore relationships via algorithms
35
Predictive Anlaytics
tools to predict the future and support what if analysis
36
Dimensional data modelling allows
ease of use and fast queries (but is difficult to set up).
37
Optimal data mining tool use
- Fraud detection - customer segregation and scoring - predictive analysis - identifying potential loan defaulters
38
Examples of BI
- strategic analytics for business decisions - decision support systems - supporting risk management decision reporting - identifying top quartile customers
39
What do analytic applications provide?
a pre-built solution to optimise a functional area or industry segment
40
Slice dice roll-up and pivot are terms used in what type of data processing?
OLAP
41
Active Data warehousing
(new) architectural approach where volatile data is provisioned in a data warehouse structure to provide transactional systems with a combination of historical and near real time data to meet customer needs
42
ODS acronym
Operational data store
43
EDI acronym
electronic data interchange
44
What implementation approaches do not support OLAP?
Flat Online Analytics Processing
45
Which of these is a core idea of Data Warehousing shared by both Inmon and Kimball?
Storing and organising data in a Data Warehouse increases its value
46
Data Warehouses need to be incrementally developed. Which process is critical to the success of such a project?
A strong release management process
47
Business intelligence
(1) a way of analyzing data to understand how an organization works. (2) a set of computer tools that help people do this kind of analysis.
48
Purposes of a data warehouse
support operational functions support compliance requirements enable effective business analysis and decision making,
49
Kimball DW approach
A data warehouse is copy of transaction data specifically structured for query and analysis (dimensional model)
50
Inmon DW approach
A data warehouse is a subject orientated integrated time-variant non-volatile collection of data in support of managements decision-making process (More complex to query and analyse)
51
Facts represent...
Facts represent the actual events or transactions e.g., number of products sold
52
Dimensions represent...
dimensions provide context for those events e.g., names, locations
53
What is a conformed dimension
A dimension that is shared across multiple fact tables
54
OLTP vs OLAP
OLTP captures, stores and processes data from transactions in real time OLAP uses complex queries to analyse aggregated historical data from OLTP systems.
54
Inmon vs kimball
Inmon focusses on organisation and normalised data Kimball makes data analysis easy
55
ERP
enterprise resource planning
56
Active Data Warehousing
the technical ability to capture transactions when they change and integrate them into the warehouse, along with maintaining batch or scheduled cycle refreshes. (automated updating to OLTP)
57
What implementation approaches support OLAP?
Relational Online Analytical Processing Hybrid Online Analytical Processing Multi-dimensional Online Analytical Processing
58
When supporting multi-dimensional business report requests, which would be best?
OLAP
59
ESB => (tight or loose)
Loosley coupled
60
What is true of a data vault
Data is cleansed and standardized as part of the staging process History is stored in a normalized atomic structure, dimensional surrogate, primary and alternate keys are defined Business key and surrogate key relationships should remain intact to provide the history of the data mart Reloading facts is possible when later increments introduce grain changes
61
When performing source to target mapping, one of the main challenges with any mapping effort is:
Determining valid links or equivalencies between data elements in multiple systems