Module 2 Flashcards

1
Q

What are some advantages and disadvantages of aggregates?

A

Good for general summations, but not much detail
o Very unspecific
o Aggregation is used for huge amounts of data
o Advantages
• Improved read times if query matches aggregate definition
• Reduced system load
o Disadvantages
• Data redundancy- create another table with the same information
• Time required for updates- each transactions changes the aggregate
• Need to anticipate how users query the data- If no one ever wants to see your table of groups then it’s a waste of time
• Lose Information/details

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Why do you use a multi-layer concept in the data staging process?

A

o Disadvantage- Create redundancy
o Advantages
• To be able to fix errors faster and quicker
• Can go through different stages and look at how you changed it
o Enable hub and spoke architecture
• Re-use of information (extract once, deploy many)
• Have high granular data and aggregated data at the same time
o Easier to maintain and understand
• Step by step manipulation
• Dedicated task monopoly- hold people accountable for what the change, can pinpoint who changed what
• Clearly defined design guidelines
• Faster troubleshooting
o Faster development cycles for future data marts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is BI?

A

• Gathering, storing, accessing, and analyzing data to help business users make better decisions. Analyzing business performance through data-driven insights. DSS, query and reporting

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

o What is a DWH?

A
  • Decision support database that is maintained separately from the organizations operational database
  • A consistent database source that brings together information from multiple sources for decision support queries.
  • Support information processing by providing a solid platform of consolidated data
  • Keep history mainly and in an easy to read layout
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

o What differentiates OLAP and OLTP? When do you use each?

A

OLTP (on-line transaction processing)
• Typically traditional relational DBMS
• Optimized for day-to-day operations: purchasing, inventory, banking,
manufacturing, payroll, registration, accounting, etc.
• Aims at reliable and efficient processing of a large number of transactions and
ensuring data consistency

OLAP (on-line analytical processing)
• Major task of data warehouse system
• Optimized for data analysis and decision making
• Aims at efficient multidimensional processing of large data volumes
• Fast, interactive answers to large aggregate queries

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What are characteristics, Attributes, Facts, Metadata?

A
  • Metadata- Data that describes data
  • Characteristic- Ex- Customer, always will describe or classify a fact
  • Attribute- Further classifying the characteristic
  • Fact- Numbers in the schema, numbers of employees, sales, etc
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What does a generic star schema look like?

A
  • Two key elements
  • Fact table in the center
  • Dimension tables on the outside of that
  • Snowflake Schema- Has another layer of dimension tables to reduce the amount of redundancy in the data warehouse
  • In the fact table you never store dimension information. The K in the front of the word means “KEY” and that is linked to the dimension table. In the fact table you only store numbers or keys to other tables. Whatever number you have that is not stored in the fact table, you cannot use it for calculation. Weight is an attribute of your body, it describes you.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is drill down, slice, and dice?

A
  • Drill Down- Finer set of summarizations, add additional characteristic, add additional level of detail, EX- Camping supplies- sunscreen, bug spray, first aid kits. Product category and then all the products in there.
  • Dice- Creating a subcube by picking specific values of multiple dimensions
  • Slice- Picking a subset of a cube, usually picking a single value
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

o What are the key components to a BI architecture?

A
  • ETL process- moving the data, Acryonm is misleading, its not a 3 step process, it’s a 5 step. Selecting, extracting, transforming, integrate, loading
  • Sources- data sources
  • Data staging
  • Data analysis
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

• Why do DWHs use a star or snowflake schema instead of just single tables containing all the information?

A

o To be able to modify the information much easier
o One table would take forever to find one thing- it’s like a big ass book with no index and you’re trying to find something
o Create reusability of the data
o Reduce the size of the database and the redundancy of information
o Ability to change master data in dimensions for millions of facts faster
• Change attribute values
• Add or remove attributes
o Quick selects to show user options available for filtering
o Reusability of the same master data for multiple cubes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly