data warehousing 2/3 Flashcards
What is metadata in the context of data warehousing?
Metadata describes the structure, lineage, and definitions of data stored in the warehouse.
What are the three types of metadata in a data warehouse?
Business Metadata: Data ownership, business definitions.
Technical Metadata: Table/column names, data types, keys.
Operational Metadata: Data state (active, archived), lineage.
What is the difference between ETL and ELT?
ETL (Extract, Transform, Load): Data is transformed before loading into the warehouse.
ELT (Extract, Load, Transform): Data is loaded first, then transformed within the warehouse.
What are the strengths of ETL processes?
Shorter development time.
Targeted data extraction.
Wide availability of tools.
What is the purpose of data staging in a warehouse?
Acts as a temporary space for data transformation and integration before loading into the warehouse.
What is a data mart, and how does it differ from a data warehouse?
A data mart is a subset of a data warehouse tailored for specific departments or functions.
It is smaller, faster to build, and focused on departmental needs.
What are the types of data marts?
Independent: Stand-alone, specific to one department.
Dependent: Derived from the central data warehouse.
What are the key index structures in data warehousing?
Inverted Index: Maps terms to their data locations for quick lookup.
Bitmap Index: Efficient for columns with few distinct values.
Join Index: Pre-computed joins between fact and dimension tables.
What is the difference between inverted and bitmap indexes?
Inverted Index: Links terms to data records.
Bitmap Index: Uses binary vectors for efficient intersections of query results.
What are the three key security aspects of data warehousing?
Confidentiality, integrity, and availability.
How can users be classified in a data warehouse security framework?
By role, department, job type, or grade.
What are common reasons for ‘dirty’ data in a data warehouse?
Dummy values, missing data, cryptic data, non-unique identifiers, and violation of business rules.
What is the purpose of parsing in data cleaning?
Identifying and isolating data components, such as splitting names or addresses into parts.
How is data standardized during cleaning?
By applying consistent formats using business rules, such as replacing nicknames or adding prefixes.
What is data enrichment?
Enhancing existing data by incorporating external sources to provide additional context.