Data Warehousing with Apache Hive Flashcards
What is a star schema in data warehousing?
A star schema is a widely used database design pattern for data warehouses and business intelligence applications. It consists of one or more fact tables that reference any number of dimension tables. Drawn as a diagram, with the fact table at the center and the dimension tables around it, it resembles a star.
What is the role of a fact table in a star schema?
The fact table is the central table in a star schema. It contains measurable, quantitative data (such as sales amount) together with foreign keys that reference the dimension tables, and it holds the core data being analyzed.
What are dimension tables in a star schema?
Dimension tables in a star schema store the descriptive attributes that give context to the facts in the fact table, such as dates, products, customers, or geographic locations.
What are the benefits of using a star schema?
Star schemas simplify data modeling and querying. Because each dimension table is only one join away from the fact table, queries are straightforward to write and tend to run fast, which makes star schemas well suited to business intelligence and data warehousing.
How does a star schema differ from a snowflake schema?
Compared with the more complex snowflake schema, the star schema is simpler because its dimension tables are denormalized. This simplicity can lead to faster queries, at the cost of potentially larger storage requirements due to redundant data in the dimension tables.
What would a fact table for a product order transaction typically include?
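A fact table for a product order transaction would include measurable, quantitative data such as Quantity Ordered and Total Price, along with foreign keys linking to dimension tables like Date, Customer, Product, and Store (for example, Date ID, Customer ID, Product ID, and Store ID). The Order ID is often stored directly in the fact table as well, since it identifies the transaction but has no dimension table of its own.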
What attributes might a Date dimension table contain for a product order?
The Date dimension table could include a Date ID (as the primary key), Day, Month, Quarter, Year, and other time-related attributes such as Weekday or a Holiday flag.
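As an illustration, the Python sketch below generates Date dimension rows for a date range. The column names, the integer YYYYMMDD key format, and the holiday set passed in by the caller are assumptions made for the example; in a Hive warehouse the rows would be loaded into a table rather than kept in memory.

```python
from __future__ import annotations

from datetime import date, timedelta


def build_date_dimension(start: date, end: date,
                         holidays: frozenset = frozenset()) -> list[dict]:
    """Return one Date dimension row per calendar day, start to end inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            # Hypothetical "smart key" convention: YYYYMMDD stored as an integer.
            "date_id": current.year * 10000 + current.month * 100 + current.day,
            "full_date": current.isoformat(),
            "day": current.day,
            "month": current.month,
            "quarter": (current.month - 1) // 3 + 1,  # months 1-3 -> Q1, 4-6 -> Q2, ...
            "year": current.year,
            "weekday": current.strftime("%A"),        # day name (locale-dependent)
            "is_holiday": current in holidays,        # holidays supplied by the caller
        })
        current += timedelta(days=1)
    return rows


rows = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31),
                            holidays=frozenset({date(2024, 1, 1)}))
print(len(rows))  # 366 rows, since 2024 is a leap year
print(rows[0])
```

Generating the calendar up front, rather than deriving attributes at query time, lets every fact table that shares this dimension group by Quarter or Weekday with a simple join.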
What would you typically find in a Customer dimension table?
A Customer dimension table would feature Customer ID, Name, Address, Contact Information, Demographic Data, and possibly a Customer Segment or Category.
What are common fields in a Product dimension table?
The Product dimension table would contain Product ID, Name, Description, Category, Supplier, Price, and potentially Size, Color, or other product-specific attributes.
What details might a Store dimension table include for a product order?
A Store dimension table might have Store ID, Location, Address, Region, Manager, and Store Type or Size. This table provides context about where the order was placed or fulfilled.
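To make the order example concrete, here is a minimal sketch of that star schema in Python, using the standard-library sqlite3 module as a stand-in for Hive so that it runs anywhere. All table names, column names, and sample rows are hypothetical, and only a few attributes from each dimension are included. The query at the end is a typical star join: each dimension needed for grouping is reached with a single join from the fact table.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive warehouse (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, segment TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store    (store_id INTEGER PRIMARY KEY, region TEXT);
-- Fact table: one row per order line, holding foreign keys plus numeric measures.
CREATE TABLE fact_order (
    order_id    INTEGER,  -- identifies the order; no dimension table of its own
    date_id     INTEGER REFERENCES dim_date(date_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    store_id    INTEGER REFERENCES dim_store(store_id),
    quantity    INTEGER,
    total_price REAL
);
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)",
                 [(20240115, 2024, 1, 1), (20240410, 2024, 2, 4)])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, "Ana", "Retail"), (2, "Bo", "Wholesale")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(10, "Pen", "Office"), (11, "Desk", "Furniture")])
conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                 [(100, "North"), (101, "South")])
conn.executemany("INSERT INTO fact_order VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 20240115, 1, 10, 100, 5, 10.0),
    (2, 20240115, 2, 11, 101, 1, 250.0),
    (3, 20240410, 1, 10, 101, 3, 6.0),
])

# Star join: the fact table joins directly to each dimension it needs.
query = """
SELECT d.quarter, p.category,
       SUM(f.quantity)    AS units,
       SUM(f.total_price) AS revenue
FROM fact_order f
JOIN dim_date d    ON f.date_id = d.date_id
JOIN dim_product p ON f.product_id = p.product_id
GROUP BY d.quarter, p.category
ORDER BY d.quarter, p.category
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Swapping `dim_product` for `dim_store` or `dim_customer` in the join changes the analysis (revenue by region, by segment) without touching the fact table.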
What is a snowflake schema in data warehousing?
A snowflake schema is a type of database schema often used in data warehousing. It’s an extension of the star schema, where dimension tables are normalized into multiple related tables, forming a structure resembling a snowflake.
How does normalization work in a snowflake schema?
In a snowflake schema, dimension tables are split into additional related tables to reduce redundancy and improve data integrity. The result is a more complex structure that requires more joins than a star schema.
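The sketch below, again using Python's sqlite3 as a stand-in for Hive with hypothetical tables and data, snowflakes a Product dimension by moving its category attributes into a separate Category table. It shows both sides of the trade-off: renaming a category touches a single row, but reaching the category from the fact table now takes two joins instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: category attributes live in their own table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT, department TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT,
                           category_id INTEGER REFERENCES dim_category(category_id));
CREATE TABLE fact_order   (order_id INTEGER,
                           product_id INTEGER REFERENCES dim_product(product_id),
                           quantity INTEGER, total_price REAL);
INSERT INTO dim_category VALUES (1, 'Office', 'Supplies'), (2, 'Furniture', 'Home');
INSERT INTO dim_product  VALUES (10, 'Pen', 1), (11, 'Stapler', 1), (12, 'Desk', 2);
INSERT INTO fact_order   VALUES (1, 10, 5, 10.0), (2, 11, 2, 18.0), (3, 12, 1, 250.0);
""")

# Normalization benefit: the rename updates one row, not every product in the category.
conn.execute("UPDATE dim_category SET category_name = 'Office Supplies' WHERE category_id = 1")

# Normalization cost: fact -> product -> category is now a two-join path.
result = conn.execute("""
SELECT c.category_name, SUM(f.total_price) AS revenue
FROM fact_order f
JOIN dim_product  p ON f.product_id  = p.product_id
JOIN dim_category c ON p.category_id = c.category_id
GROUP BY c.category_name
ORDER BY c.category_name
""").fetchall()
print(result)
```

In the star-schema version of this dimension, `category_name` would be repeated on every product row, so the same rename would have to update the Pen and Stapler rows individually.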
What is the role of a fact table in a snowflake schema?
Similar to the star schema, the fact table in a snowflake schema is at the center. It stores transactional data (like sales) and links to dimension tables, which are further normalized.
What are the benefits of using a snowflake schema?
Snowflake schemas improve data integrity: because each attribute is stored in only one place, an update (such as renaming a product category) happens once rather than in many rows. Normalization also reduces data redundancy and can lower storage costs.
How does a snowflake schema differ from a star schema?
The snowflake schema is more complex due to the normalization of dimension tables, leading to more joins and potentially slower query performance, compared to the simpler, denormalized structure of the star schema.
What is a galaxy schema in data warehousing?
A galaxy schema, also known as a fact constellation schema, is a data warehouse design that includes multiple fact tables sharing dimension tables. It’s a complex schema designed for multifaceted data analysis across various business processes.
How is a galaxy schema structured?
In a galaxy schema, there are several fact tables that are related to shared dimension tables. Each fact table corresponds to a different business process but can share dimensions with other fact tables, resembling a collection of star schemas (hence the name “galaxy”).
What is the role of fact tables in a galaxy schema?
Each fact table in a galaxy schema represents a specific business process or event, such as sales or inventory, containing the metrics or measurements of that process.
How do shared dimension tables work in a galaxy schema?
Shared dimension tables in a galaxy schema serve multiple fact tables. For instance, a ‘Time’ dimension could be shared by both ‘Sales’ and ‘Inventory’ fact tables.
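The following sketch, assuming hypothetical Sales and Inventory fact tables and using Python's sqlite3 in place of Hive, shows a shared Date dimension serving both fact tables. Each fact table is first summarized to the month level through the shared dimension, and only then are the summaries joined; joining the two fact tables row by row would instead duplicate rows wherever several sales rows match several inventory rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One Date dimension shared by two fact tables (hypothetical galaxy schema).
CREATE TABLE dim_date       (date_id INTEGER PRIMARY KEY, month INTEGER);
CREATE TABLE fact_sales     (date_id INTEGER REFERENCES dim_date(date_id),
                             product_id INTEGER, units_sold INTEGER);
CREATE TABLE fact_inventory (date_id INTEGER REFERENCES dim_date(date_id),
                             product_id INTEGER, units_on_hand INTEGER);
-- Month-end dates only, so there is one inventory snapshot per product per month.
INSERT INTO dim_date       VALUES (20240131, 1), (20240229, 2);
INSERT INTO fact_sales     VALUES (20240131, 10, 40), (20240131, 11, 5), (20240229, 10, 30);
INSERT INTO fact_inventory VALUES (20240131, 10, 100), (20240131, 11, 20), (20240229, 10, 70);
""")

# Summarize each fact table to the shared month grain, then join the summaries.
result = conn.execute("""
WITH s AS (SELECT d.month, SUM(f.units_sold) AS sold
           FROM fact_sales f JOIN dim_date d ON f.date_id = d.date_id
           GROUP BY d.month),
     i AS (SELECT d.month, SUM(f.units_on_hand) AS on_hand
           FROM fact_inventory f JOIN dim_date d ON f.date_id = d.date_id
           GROUP BY d.month)
SELECT s.month, s.sold, i.on_hand
FROM s JOIN i ON s.month = i.month
ORDER BY s.month
""").fetchall()
print(result)
```

Because both summaries take their month values from the same dimension table, the sales and inventory figures line up without any extra mapping between the two business processes.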
What are the advantages of using a galaxy schema?
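The galaxy schema supports extensive analysis across different business processes, providing a more comprehensive view. Because fact tables share dimensions, metrics from different processes (such as sales and inventory) can be compared consistently, and new processes can be added as new fact tables that reuse existing dimensions.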
What is a star schema?
The star schema is the simplest data warehouse schema, characterized by a central fact table connected to multiple dimension tables. It’s efficient for querying large data sets and is easy to understand.
What is a snowflake schema?
The snowflake schema is a variation of the star schema where dimension tables are normalized, breaking them down into additional tables. This reduces redundancy but increases complexity.
What is a galaxy schema?
The galaxy schema, or fact constellation schema, involves multiple fact tables that share dimension tables. It’s suitable for complex data warehousing scenarios involving multiple business processes.
How do star, snowflake, and galaxy schemas compare in terms of complexity and performance?
Star schemas are the simplest and usually the fastest to query, since each dimension is a single join away from the fact table. Snowflake schemas are more complex due to normalization, and the extra joins can slow queries down. Galaxy schemas are the most complex and suit advanced analysis that spans multiple business processes.