Data Warehousing and Business Intelligence Management (Including Big Data) Flashcards

1
Q

The definition of _______________________: is the planning, implementation, and control processes to provide decision support data and support knowledge workers engaged in reporting, query, and analysis.

A

Data Warehousing and Business Intelligence

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the goals of Data Warehousing and Business Intelligence?

A
  1. To build and maintain the technical environment and technical and business processes needed to deliver
    integrated data in support of operational functions, compliance requirements, and business intelligence
    activities.
  2. To support and enable effective business analysis and decision making by knowledge workers.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are two models and approaches to data warehousing

A

Inmon and Kimball

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

One of the component parts of Inmon’s definition of a data warehouse includes: ___________________ where the data warehouse is organized based on major business entities, rather than focusing on a functional or application.

A

subject oriented

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

A component part of Inmon’s definition of a data warehouse includes: ___________________ where the data warehouse unified and cohesive. The same key structures, encoding and decoding of structures, data definitions, naming conventions are applied consistently throughout the warehouse

A

Integrated

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

A component part of Inmon’s definition of a data warehouse includes: ___________________ where the data warehouse stores data as it exists in a set point in time. Records in the DW are like snapshots. Each one reflects the state of the data at a moment of time. This means that querying data based on a specific time period will always produce the same result, regardless of when the query
is submitted.

A

Time variant

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

In Inmon’s DW, __________________ , describes records are not normally updated as they are in operational systems. Instead, new data is appended to existing data. A set of records may represent different states of the same transaction.

A

non volatiles (records)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

When warehouses were first established, cost and space considerations drove the need to ____________ data. This data can be persistent (stored in a table) or non-persistent (rendered in a view) in contemporary DW environments. The deciding factor in whether to persist data is usually performance.

A

Summarize

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

The focus of operational systems is current data. Warehouses contain __________ data as well. Often they house vast amounts of it.

A

Historical

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

in the context of the Corporate Information Factory (CIF):
__________ perform operational processes. Pulls from these systems bring data into the data warehouse and the operational data stores (ODS) where it can be analyzed.

A

Applications

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

in the context of the Corporate Information Factory (CIF): ________ is a database that stands between the operational source databases and the target databases. It is where the extract, transform, and load effort takes place. It is not used by end users. Most data in this area is transient, although typically there is some relatively small amount of persistent data.

A

Staging Area

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

in the context of the Corporate Information Factory (CIF):
In the _________ layer, data from disparate sources is transformed
so that it can be integrated into the standard corporate representation / model in the DW and ODS.

A

integration

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

An ______________ is integrated database of operational data. It may be sourced directly from applications or from other databases. These databases generally contain current or near term data
(30-90 days), while a DW contains historical data as well (often several years of data). Data in this database is volatile, while warehouse data is stable. Not all organizations use these. They evolved as to meet the need for low latency data. These sometimes serve as the primary source for a data warehouse; it may
also be used to audit a data warehouse.

A

ODS Operational Data Store

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

_____________ provide data prepared for analysis. This data is often a sub-set of warehouse data designed to support particular kinds of analysis or a specific group of data consumers. For example, these can aggregate data to support faster analysis. Dimensional modeling (using denormalization techniques) is often used to design user-oriented data marts.

A

Data marts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

An ________ is a data mart focused on tactical decision support. It is
sourced directly from an ODS, rather than from a DW. It shares characteristics of the ODS: it contains current or near-term data. Its contents are volatile.

A

OpDM Operational Data Mart

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

The _________ provides a single integration point for corporate data to support management decision-making, and strategic analysis and planning. The data flows into this from the application systems and ODS, and flows out to the data marts, usually in one direction only. Data that needs correction is rejected, corrected at its source, and ideally re-fed through the system.

A

DW Data Warehouse

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

_____________________ is the other primary pattern for DW development. It defines a data warehouse simply as “a copy of transaction data specifically structured for query and analysis”

A

Kimball’s Dimensional Data Warehouse

16
Q

___________ are comprised facts, which contain quantitative data
about business processes (e.g., sales numbers), and dimensions, which store descriptive attributes related to fact data and allow data consumers to answer questions about the facts (e.g., how many units of product X were sold this quarter?)

A

Dimensional Models

Also known as star schemas

17
Q

The ______________ shows the intersection of business processes that generate fact data and data subject areas that represent dimensions.

A

DW bus matrix

The enterprise DW bus matrix can be used to represent the long-term data content requirements for the DW/BI system, independent of technology. This tool enables an organization to scope manageable development efforts. Each implementation builds an increment of the overall architecture.

18
Q

Is the below part of the Kimball or Inmon Data warehouse?

—Operational source systems: operational / transactional applications within the enterprise create the data that is integrated into the Operational Data Store (ODS) and then into Data Warehouse

—Data staging area: clean, combine, standardize, conform dimensions, sort, and sequence

—Data presentation area: datamarts linked by DW bus of conformed dimensions

—Data access tools that focus on end users’ data requirements

A

Components of Kimball Data Warehouse

19
Q

Kimball’s Data Warehouse Bus represents _______ or ______ dimensions unifying multiple datamarts

A

shared or conformed

20
Q

Name Data Storage areas

A

—Staging area: intermediate data store between original data source and centralized data repository; data is staged before transformation, integration, and prep for loading into warehouse —Reference and Master Data conformed dimensions

—Central Warehouse: maintains historical atomic data as well as latest instance of batch run. Considerations include:
○ Relationship between business key and surrogate keys for performance
○ Creation of indices and foreign keys to support dimensions
○ Change data capture (CDC) techniques used to detect, maintain, and store history

—Operational Data Store (ODS): lower latency for operational use; single time window

—Data mart: presents a departmental or functional subset of data warehouse

—Cubes: support Online Analytical Processing (OLAP); can be relational, multi-dimensional, or hybrid

21
Q

Kimball or Inmon: single data warehouse layer with atomic level data

A

Inmon

22
Q

Kimball or Inmon: departmental data marts informing conformed dimensions and facts

A

Kimball

23
Q

________ cleanses and standardizes as part of the staging process. History is stored in a normalized atomic structure, dimensional surrogate, primary and alternate keys are defined.

A

the Data Vault,

24
Q

_______________ is a technique used in data management and data integration to capture and process changes made to a dataset. This approach is particularly useful when you need to track and update changes to data over a period of time, rather than in real-time.

A

Batch Change Data Capture

25
Q

Rather than run on a nightly schedule, _________feeds execute
batch loads on a more frequent schedule (e.g., hourly, every 5 minutes) or when a threshold is reached (e.g., 300 transactions, 1G of data). This allows some processing to happen during the day, but not as intensely as with a dedicated nightly batch process.

A

Trickle feeds (Source accumulation):

26
Q

__________ interaction in real-time or near-real-time is useful when
extremely small packets of data are published to a bus as they occur. Target systems subscribe to the bus, and incrementally process the packets into the warehouse as needed. Source systems and target systems are independent of each other. Data-as-a-Service (DaaS)
frequently uses this method.

A

Messaging (Bus accumulation)

27
Q

Rather than wait on a source-based schedule or threshold, a _______
system collects data as it is received into a buffer area or queue, and processes it in order. The result interaction or some aggregate may later appear as an additional feed to the warehouse

A

Streaming (Target accumulation):

28
Q

What are the 3 V’s of big data?

A

Volume,
The amount of data. Big Data often has thousands of entities or elements in billions of records.

Velocity:
The speed at which data is captured, generated, or shared. Big Data is often generated and can also be distributed and even analyzed in real-time.

Variety / Variability:
The forms in which data is captured or delivered. Big Data requires storage of multiple formats; data structure is often inconsistent within or across data sets.

Expanded Definition: 6 V’s
Viscosity:
How difficult the data is to use or integrate.
Volatility:
How often data changes occur and therefore how long the data is useful.
Veracity:
How trustworthy the data is.

29
Q

______________a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making.”

A

Business Intelligence

30
Q

What are the objectives of Business Intelligence?

A

Integrated data
Ensuring credible, accurate, timely data
Ensuring high-performance data access
Making best use of the outputs

31
Q

Dimensions whose values change infrequently as a result of UPDATE operations in the source system

A

Slowly Changing Dimensions

32
Q

What type is each of the following slowly changing dimension types?

add history table
hybrid
add new attribute
add new row
overwrite

A

Type 4
Type 6
Type 3–Keep the latest old version in an “old” field
Type 2 –Keep all old versions in separate records
Type 1

33
Q

What are the steps in designing the dimensional model?

A
  • Gather user requirements
  • Investigate source data
  • Decide the grain
  • Hierarchies in source map to dimension tables
  • Transaction figures in source map to measures in fact tables
34
Q

________facts are the simplest and most common type of facts in dimensional modeling. They are facts that can be summed up across any dimension without losing meaning or accuracy. For example, the number of units sold, the amount of revenue, or the cost of goods sold are ______ facts.

A

Additive

35
Q

These are stored tables built primarily to improve query performance Each table “rolls up” the information from one or more fact tables at a higher level of granularity

A

Aggregate, aka Summary Tables

36
Q

What are some metrics used to monitor BI Performance?

A

*Average query response time
*Number of users per day/week/month
*Regular survey of DW-BIM customers satisfaction
*Usage statistics and patterns
*Frequency and resource usage of data, queries, and reports (for tuning BI activity)

37
Q

______________databases are specifically designed and optimized for nothing but OLAP query processing.

A

Column

Their schemas are also much more flexible since it’s as easy to drop, add, or update a column in a columnar database as it is to insert, change, or delete a row in a relational database.

38
Q

________ is a comparatively new architectural approach is where volatile data is provisioned in a data warehouse structure to provide transactional systems with a combination of historical and near real time data to meet customer needs.

A

Active Data Warehousing