1. Introduction to Database Systems Flashcards
Can you describe the lifecycle of a query in a database system, from creation to execution?
- Creation: The query is crafted in a language such as SQL.
- Parsing: It’s then transformed into an Abstract Syntax Tree (AST).
- Optimization: Since there are multiple ways to write a query for the same result, optimization becomes crucial. For example, a ‘select’ on a Cartesian product is costlier than a ‘join’, as the select operation is integrated into the join, affecting the result’s cardinality (size). Determining the most efficient execution plan is challenging due to the exponentially increasing number of potential plans.
- Plan Caching: To save time analyzing execution plans, frequently used or time-sensitive operations are stored in a plan cache for reuse.
- Plan Development without Cache: When not using cached plans, a new plan is formulated based on intermediate operation cardinalities. Estimating these cardinalities requires data statistics like synopses, histograms, and summaries, leading to the creation of a physical execution plan.
- Execution: This physical execution plan (also known as the access plan) is then executed by the database engine to produce the desired result.
What is a “Prepare Statement” in databases, and what are its benefits?
Definition: A prepare statement is the pre-compilation of an SQL query, useful for executing similar queries multiple times with varying parameters.
Process: It involves parsing, compiling, and optimizing the query once, without executing it, saving the plan for future use.
Benefits:
- Performance: Speeds up execution by skipping repeat parsing and compilation.
- Consistency: Ensures uniform execution plans.
What are OLAP queries?
OLAP queries are Online Analaytical Processing queries, or Analytical queries in short.
Nature: Analytical queries are designed for data analysis and business intelligence, involving complex operations like aggregating large data sets for sums, averages, and statistical analyses.
Execution Time: These queries can be time-consuming, taking from an hour to several days, due to the need to scan large datasets and perform complex calculations.
Use Case: They are commonly used in data warehousing for identifying trends, patterns, and aiding in strategic decision-making.
What are OLTP queries?
OLTP queries are - Online Transaction Processing - or Transactional queries.
Nature: Transactional queries handle transactional data, typically being simple, involving small data volumes, and focusing on speed and reliability.
**Execution Time: **These queries are executed rapidly, often within milliseconds, emphasizing efficiency and data integrity.
**Use Case: **Widely used in systems where quick response and data consistency are vital, such as in banking, e-commerce, and other transaction-heavy applications.
What is a Logical Plan in the context of database query processing?
Operation-Focused: Involves operations like selection, projection, and join, but does not specify the execution methods (e.g., type of join algorithm).
Independence from Physical Data: Concerns itself with what operations are to be performed, rather than how data is physically stored or accessed.
Optimization Stage: At this stage, query optimizers rearrange or combine operations for efficiency, based on data’s logical relationships.
What are the key elements of a Physical Execution Plan in database systems?
Algorithm Specification: Specifies exact algorithms for operations like joins, selections, or projections at each node of the plan’s tree structure.
Access Path Definition: Details how data is physically retrieved, including methods like direct file access or index usage.
Data Flow Mechanism: Describes how data is transmitted between nodes, including strategies like pushing data down the tree or pulling it up.
Execution Order: Dictates the sequence of node execution, optimized for efficiency based on data dependencies and chosen algorithms, with either a top-down (push) or bottom-up (pull) approach.
How do Logical and Physical Plans in database query processing differ, particularly in their focus and optimization strategies?
Logical Plan:
- Focus: Concentrates on ‘what’ operations (like selection, join) are needed.
- Optimization: Rearranges or combines operations based on data’s logical relationships.
Physical Plan:
- Focus: Details ‘how’ to execute operations (specific algorithms, data access paths).
- Optimization: Tailors execution order and data flow for efficiency, considering data dependencies and chosen methods.
What is the difference between Declarative and Imperative queries?
Declarative Queries:
- Focus: On ‘what’ data is needed, not how to retrieve it.
- Example: SQL, where you specify desired data (e.g., “SELECT name FROM customers WHERE city = ‘New York’”) without detailing the retrieval process.
- Advantage: Simplifies queries; database system handles complexity.
Imperative Queries:
- Focus: On ‘how’ to fetch data, detailing exact steps.
- Example: Using loops in programming languages (like Java, Python) to process data.
- Advantage: Provides more control, useful for specific, complex data processing needs.
What is the challenge in navigating the plan space in database query optimization, and what is the solution?
Challenge: The query optimizer must choose from a vast array of potential execution plans.
Solution: Optimize within time and resource constraints to find the most efficient plan.
How are the challenges of enumerating through the plan space addressed in database query optimization?
Challenge: Efficiently searching through the large plan space.
Solution: Use dynamic programming to reduce search complexity to a manageable level.
What is the cost model development challenge in database query optimization, and how is it solved?
Challenge: Accurately estimating the cost of operations like hash joins in terms of memory, time, and/or CPU cycles.
Solution: Develop sophisticated cost models for accurate resource prediction.
How does database query optimization tackle the challenge of cardinality estimates?
Challenge: Estimating the number of rows a query operation will return.
Solution: Employ statistical methods and data sampling for educated guesses on data distribution and sizes.
What are the optimization opportunities in database query processing?
Challenge: Identifying areas to make queries more efficient.
Solution: The optimizer rearranges, simplifies, or optimizes queries for better performance.
What is the approach to enhancing evaluation and execution in database query processing?
Challenge: Speeding up specific operations.
Solution: Techniques like indexing, query rewriting, and optimizing execution algorithms.
How are storage and data encoding challenges addressed in database systems?
Challenge: Deciding on efficient data storage and encoding methods.
Solution: Choose appropriate structures (like B-trees, columnar storage) for improved data access and manipulation.