Ch 5.4 Data Warehouses and Data Marts Flashcards

Question 1

Q

If a manaageer wants to know the profit margin on used books, how would they find this?

Answer

A

use SQL or query by example

this allows managers to look up things in the database

Question 2

Q

What is QBE

Answer

A

making database queries that allow user to search the doc based on an example (string of text )

Question 3

Q

Why do we need data warehouses and data marts if be have QBE or SQL

Answer

A

this is because if we want to look at trends, we need a complicated system

Question 4

Q

what do data warehouses and data marts have that sql doesnt

Answer

A

1) info is organized better than sql is
2) these databases are read only!!! no need for updating or processing
3) SQL is used to access one record at a time; Data warehouss and mart is designed to access large groups of related records

Question 5

Q

Data warehouse

Answer

A

holds historiccal data organized by subject

-> very expensive

Question 6

Q

data mart

Answer

A

cheaper alternative to data warehouses,

low cost and scaled down
-> implemented in less than 90 days

Question 7

Q

how are data warehouses and marts organized?

Answer

A

1) organized by business dimension or subject (customer, vendor, product, price etc)
2) use online analytical processing (analyzes data from end users)
3)integrated: data collected from multiple systems and integrated around subjects
4) time variant: have HISTORICAL data
5) non volatile: users cant change or update data
6) multidimensional: store data in more than 2 dimensions “data cube”

Question 8

Q

what are components of the data warehouse environment:

Answer

A

1) source systems that provide data
2) data integration tech
3) diff architectures for storing data
4) diff tools and applications for users
5) metadata (Data about the data)

Question 9

Q

Source systems

of data warehouses and data marts

Answer

A

“organizational pain point” motivates businesses to develop BI, this BI needs data to work

PROVIDE DATA TO THE WAREHOUSE OR MART

EX: operational data, erp, website data, 3rd party data,

Question 10

Q

relational database vs multidinensional data warehouses and data marts

Answer

A

Relational database is like ERD tables: separate tables w PKs and FKs

Data cube: its a matrix of 3 dimensiosns

Question 11

Q

Data integration- data warehouse environment

Answer

A

USES ETL

tech that processes and prepares data for usage

Question 12

Q

steps of ETL

Answer

A

EXTRACTION: handwritted code or SQL queries
TRANSFORMATION: data integrated around a common key (student id, and all data around the student classes courses major parking)
- sometimes transformation is changing 0 and 1 to M and F
LOADED: loading data into the mart/warehouse (loading window)

Question 13

Q

STORING DATA- environment of data warehouses/marts

Answer

A

1) most common: data warehouse
-> one single read-only version of truth

2) independant data mart: store data for a single/few applications (ex: marketing or finance) FUNCTIONAL AREA CENTRIC
-> not very effective, inconsistent data

3) Hub and spoke: a central data warehouse that has the data plus many dependant data martsMe

Question 14

Q

METADATA- environment of data warehouses/marts

Answer

A

data about the data, things like data quality and the data wrehouse status

INCLUDES UPDATE SCHEDULES

Question 15

Q

DATA QUALITY- environment of data warehouses/marts

Answer

A

QUALITY of data in the warehouse must be high!

DATA cleansing needs to be used to ensure data meets users needs

Question 16

Q

BI governance

Answer

A

establishing people committees processes to mainatin the data warehouse and provide for changes in data type or processings

Question 17

Q

users

Answer

A

business value for users rises when data can be accessed quickly and easily for analysis/consolidation

Question 18

Q

data lake

Answer

A

storage of all org data regardless of its source or format

-recieve data in any format (unstrcuture/structure)

recieive inconsistent data/ overlapping data

Question 19

Q

benefit of data lake

Answer

A

1) Organizations can derive value from unlimited types of data.

2) Organizations do not need to have all of the answers in advance.

3) Organizations have no limits on how they can query the data.

4) Organizations do not create silos. Instead, data lakes provide a single, unified view of data across the organization

Question 20

Q

how is data loaded into a data lake

Answer

A

1) Define the incoming data from a business perspective.
2) Document the context, origin, and frequency of the incoming data.
3) Classify the security level (public, internal, sensitive, restricted) of the incoming data.
4) Document the creation, usage, privacy, regulatory, and encryption business rules that apply to the incoming data.
5) Identify the owner (sponsor) of the incoming data.
6) Identify the data steward(s) charged with monitoring the health of the specific datasets

Question 21

Q

Schema on read

Answer

A

A data lake works on a principle called schema-on-read. This means that there is no predefined schema into which data needs to be fitted before storage. Only when the data is read during processing is it parsed and adapted into a schema as needed.

Question 22

Q

most big orgs have an EDW… WHAT IS THIS?

Answer

A

Enterprise data warehouse

Question 23

Q