Data Warehousing with Apache Hive Flashcards
What is a star schema in data warehousing?
A star schema is a database design pattern widely used in data warehouses and business intelligence applications. It consists of one or more fact tables that reference any number of dimension tables. Drawn as a diagram, the fact table sits at the center with the dimension tables radiating outward, resembling a star.
What is the role of a fact table in a star schema?
The fact table is the central table in a star schema. It contains measurable, quantitative data (like sales amount) and foreign keys that reference the dimension tables, representing the core data being analyzed.
What are dimension tables in a star schema?
Dimension tables in a star schema store attributes or dimensions that describe the objects in the fact table. These include descriptive data like dates, products, or geographic locations.
What are the benefits of using a star schema?
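Star schemas simplify data modeling and querying. Because each dimension is a single denormalized table, most queries need only one join per dimension, which keeps the SQL simple and typically improves query performance. This makes them well suited to business intelligence and data warehousing.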
How does a star schema differ from a snowflake schema?
Unlike the more complex snowflake schema, the star schema is simpler with denormalized dimension tables. This simplicity can lead to faster query times but potentially larger storage requirements due to denormalization.
What would a fact table for a product order transaction typically include?
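A fact table for a product order transaction would include measurable, quantitative data such as Quantity Ordered and Total Price, along with foreign keys linking to dimension tables like Date, Customer, Product, and Store. The Order ID is typically kept as a transaction identifier (a degenerate dimension) rather than a measure, and the Product ID is one of the foreign keys.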
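A minimal sketch of such a fact table, using Python's built-in sqlite3 module as a stand-in for a warehouse engine (Hive DDL and constraint handling differ). All table names, column names, and rows here are hypothetical and only illustrate the shape of a star schema query.

```python
import sqlite3

# In-memory SQLite database used only for illustration; this is not Hive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_id   INTEGER PRIMARY KEY,
    full_date TEXT,
    year      INTEGER
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT,
    category   TEXT
);
CREATE TABLE fact_order (
    order_id         INTEGER,  -- transaction identifier, not a measure
    date_id          INTEGER REFERENCES dim_date(date_id),
    product_id       INTEGER REFERENCES dim_product(product_id),
    quantity_ordered INTEGER,  -- measure
    total_price      REAL      -- measure
);
""")

conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240101, "2024-01-01", 2024), (20240102, "2024-01-02", 2024)],
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Widget", "Hardware"), (2, "Gadget", "Electronics")],
)
conn.executemany(
    "INSERT INTO fact_order VALUES (?, ?, ?, ?, ?)",
    [
        (1001, 20240101, 1, 2, 20.0),
        (1002, 20240101, 2, 1, 50.0),
        (1003, 20240102, 1, 5, 50.0),
    ],
)

# A typical star schema query: one join per dimension, then aggregate a measure.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.total_price)
    FROM fact_order AS f
    JOIN dim_product AS p ON f.product_id = p.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

print(revenue_by_category)
```

The measures live only in the fact table, and the descriptive attribute used for grouping (category) comes from the dimension table.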
What attributes might a Date dimension table contain for a product order?
The Date dimension table could include Date ID (as primary key), Day, Month, Quarter, Year, and other relevant time-related attributes like Weekday or Holiday.
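Date dimensions are usually generated rather than entered by hand. The sketch below shows one possible way to build the rows in Python. The function name, the YYYYMMDD integer key, and the column names are assumptions made for illustration, and a Holiday flag is left out because it depends on a calendar the source does not specify.

```python
from datetime import date, timedelta

WEEKDAY_NAMES = (
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
)

def build_date_dimension(start, end):
    """Return one row (as a dict) per calendar day from start to end inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            # Hypothetical surrogate key encoded as a YYYYMMDD integer.
            "date_id": current.year * 10000 + current.month * 100 + current.day,
            "day": current.day,
            "month": current.month,
            "quarter": (current.month - 1) // 3 + 1,
            "year": current.year,
            # date.weekday() returns 0 for Monday through 6 for Sunday.
            "weekday": WEEKDAY_NAMES[current.weekday()],
        })
        current += timedelta(days=1)
    return rows

date_rows = build_date_dimension(date(2024, 3, 30), date(2024, 4, 2))
for row in date_rows:
    print(row)
```

The sample range crosses a quarter boundary, so the quarter attribute changes from 1 to 2 partway through the output.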
What would you typically find in a Customer dimension table?
A Customer dimension table would feature Customer ID, Name, Address, Contact Information, Demographic Data, and possibly a Customer Segment or Category.
What are common fields in a Product dimension table?
The Product dimension table would contain Product ID, Name, Description, Category, Supplier, Price, and potentially Size, Color, or other product-specific attributes.
What details might a Store dimension table include for a product order?
A Store dimension table might have Store ID, Location, Address, Region, Manager, and Store Type or Size. This table provides context about where the order was placed or fulfilled.
What is a snowflake schema in data warehousing?
A snowflake schema is a type of database schema often used in data warehousing. It’s an extension of the star schema, where dimension tables are normalized into multiple related tables, forming a structure resembling a snowflake.
How does normalization work in a snowflake schema?
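In a snowflake schema, dimension tables are split into additional related tables to eliminate redundancy and improve data integrity. For example, a Product dimension might keep only a category key, with category details moved to their own table. This leads to a more complex structure with more joins than in a star schema.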
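As a rough illustration of that normalization step, the sketch below splits a denormalized product dimension into a product table and a separate category table. It uses Python's sqlite3 module rather than Hive, and every table, column, and value is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-style dimension: category details are repeated on every product row.
CREATE TABLE dim_product_star (
    product_id       INTEGER PRIMARY KEY,
    name             TEXT,
    category_name    TEXT,
    category_manager TEXT
);
INSERT INTO dim_product_star VALUES
    (1, 'Widget', 'Hardware',    'Ana'),
    (2, 'Bolt',   'Hardware',    'Ana'),
    (3, 'Gadget', 'Electronics', 'Raj');

-- Snowflake-style: category attributes move to their own table.
CREATE TABLE dim_category (
    category_id      INTEGER PRIMARY KEY,
    category_name    TEXT UNIQUE,
    category_manager TEXT
);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);

-- Each distinct category is stored once.
INSERT INTO dim_category (category_name, category_manager)
SELECT DISTINCT category_name, category_manager
FROM dim_product_star;

-- Products keep only a key pointing at their category.
INSERT INTO dim_product
SELECT s.product_id, s.name, c.category_id
FROM dim_product_star AS s
JOIN dim_category AS c ON s.category_name = c.category_name;
""")

category_count = conn.execute("SELECT COUNT(*) FROM dim_category").fetchone()[0]

# Reading the category name back now costs one extra join.
products = conn.execute("""
    SELECT p.name, c.category_name
    FROM dim_product AS p
    JOIN dim_category AS c ON p.category_id = c.category_id
    ORDER BY p.product_id
""").fetchall()

print(category_count)
print(products)
```

The category manager is now stored once per category instead of once per product, at the cost of an additional join whenever category attributes are needed.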
What is the role of a fact table in a snowflake schema?
Similar to the star schema, the fact table in a snowflake schema is at the center. It stores transactional data (like sales) and links to dimension tables, which are further normalized.
What are the benefits of using a snowflake schema?
Snowflake schemas offer better data organization and improved data integrity due to normalization. They can also reduce data redundancy and storage costs.
How does a snowflake schema differ from a star schema?
The snowflake schema is more complex due to the normalization of dimension tables, leading to more joins and potentially slower query performance, compared to the simpler, denormalized structure of the star schema.
What is a galaxy schema in data warehousing?
A galaxy schema, also known as a fact constellation schema, is a data warehouse design that includes multiple fact tables sharing dimension tables. It’s a complex schema designed for multifaceted data analysis across various business processes.
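A small sketch of the idea, again using Python's sqlite3 module in place of Hive: two hypothetical fact tables (sales and shipments) share one product dimension. Each fact table is aggregated on its own before the results are combined, which avoids double counting when both fact tables have several rows per product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared (conformed) dimension.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT
);
-- Two fact tables for two business processes, both keyed to dim_product.
CREATE TABLE fact_sales (
    product_id    INTEGER REFERENCES dim_product(product_id),
    quantity_sold INTEGER
);
CREATE TABLE fact_shipments (
    product_id       INTEGER REFERENCES dim_product(product_id),
    quantity_shipped INTEGER
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales VALUES (1, 3), (1, 4), (2, 5);
INSERT INTO fact_shipments VALUES (1, 6), (2, 2), (2, 2);
""")

# Aggregate each fact table separately, then join the totals via the dimension.
sold_vs_shipped = conn.execute("""
    SELECT p.name, s.sold, h.shipped
    FROM dim_product AS p
    JOIN (
        SELECT product_id, SUM(quantity_sold) AS sold
        FROM fact_sales
        GROUP BY product_id
    ) AS s ON s.product_id = p.product_id
    JOIN (
        SELECT product_id, SUM(quantity_shipped) AS shipped
        FROM fact_shipments
        GROUP BY product_id
    ) AS h ON h.product_id = p.product_id
    ORDER BY p.product_id
""").fetchall()

print(sold_vs_shipped)
```

Because both fact tables reference the same product keys, results from different business processes can be compared side by side.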