146 - 208: Cost Estimation Flashcards
What is selectivity estimation in query optimization?
The process of estimating how many tuples a query result or an intermediate result will contain. It relies on data statistics and on assumptions about predicates (such as uniformity and independence) to guide optimization.
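To make the idea concrete, here is a minimal sketch of textbook-style cardinality estimation. It assumes uniformly distributed values and independent predicates, which are common simplifications rather than anything stated on the card; the function names and the table figures in the usage note are hypothetical.

```python
def equality_selectivity(num_distinct):
    """Selectivity of `attr = constant`, assuming values are uniformly distributed."""
    return 1.0 / num_distinct


def conjunction_selectivity(selectivities):
    """Selectivity of `p1 AND p2 AND ...`, assuming the predicates are independent."""
    result = 1.0
    for s in selectivities:
        result *= s
    return result


def estimated_cardinality(num_tuples, selectivity):
    """Expected number of tuples that survive the predicate."""
    return num_tuples * selectivity
```

For a hypothetical table of 10,000 tuples with 50 distinct cities and 4 distinct statuses, `estimated_cardinality(10_000, conjunction_selectivity([equality_selectivity(50), equality_selectivity(4)]))` comes out at roughly 50 tuples.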
What are equi-width histograms?
Histograms whose buckets all cover value ranges of equal width in the attribute domain. They summarize a data distribution by storing one count per bucket.
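A small sketch of how such a histogram could be built and queried. The function names are hypothetical, and the range estimate assumes values are spread uniformly inside each bucket, which is a standard simplification and not something the card states.

```python
def build_equi_width(values, lo, hi, num_buckets):
    """Count values per bucket; every bucket spans (hi - lo) / num_buckets."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        # Clamp the index so that v == hi lands in the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return counts, width


def estimate_range(counts, width, lo, a, b):
    """Estimate how many values fall in [a, b), assuming uniformity within buckets."""
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        # Length of the part of this bucket that the query range covers.
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        total += count * overlap / width
    return total
```

With the values `[1, 2, 2, 3, 11, 12, 25, 38, 39, 40]` over the domain 0 to 40 and four buckets, the counts are `[4, 2, 1, 3]`. The range query [5, 15) is then estimated at 3 tuples while the true answer is 2, which shows the error the uniformity assumption can introduce.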
Which error metric calculates the average squared difference between actual and estimated values?
Mean Squared Error (MSE).
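As a quick illustration, MSE can be computed directly from paired lists of actual and estimated values. The function name is a placeholder chosen for this sketch.

```python
def mean_squared_error(actual, estimated):
    """Average of the squared differences between actual and estimated values."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_diffs = [(a - e) ** 2 for a, e in zip(actual, estimated)]
    return sum(squared_diffs) / len(squared_diffs)
```

For actual cardinalities `[10, 20, 30]` and estimates `[12, 18, 33]`, the squared differences are 4, 4 and 9, so the MSE is 17/3.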
What does the term “wavelet transform” represent?
A mathematical technique that decomposes data into averages and detail coefficients at multiple resolutions. The resulting coefficients form a compact summary that is useful for selectivity estimation.
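The decomposition can be sketched with the unnormalized Haar wavelet, assumed here because it is the simplest choice; the card does not name a specific wavelet. The function name is hypothetical, and the input length must be a power of two.

```python
def haar_decompose(data):
    """Return [overall average] followed by detail coefficients, coarsest first."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = list(data)
    details = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        # Pairwise averages form the next, coarser resolution level.
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        # Pairwise half-differences are the detail coefficients of this level.
        level_details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details = level_details + details
        current = averages
    return current + details
```

For the input `[2, 2, 0, 2, 3, 5, 4, 4]` this yields `[2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]`. Keeping only the largest coefficients and treating the rest as zero gives a lossy summary of the distribution.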
What are the advantages of histograms in databases?
A) Compact representation of data.
B) Allows estimation of point and range queries.
C) Fits all data perfectly without error.
D) Enables efficient storage and lookup.
A, B and D
Explain the concept of “V-Optimal histograms.”
These histograms choose bucket boundaries that minimize the sum of squared errors (SSE) between the actual frequencies and their bucket averages, for a given number of buckets.
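One way to find such boundaries is dynamic programming over prefixes of the frequency vector. The sketch below is one possible formulation under that assumption, not necessarily the construction used in the course material; the function name and the half-open `(start, end)` boundary format are choices made for this example.

```python
def v_optimal(freqs, num_buckets):
    """Return (minimum SSE, bucket boundaries) for contiguous buckets over freqs."""
    n = len(freqs)
    if not 1 <= num_buckets <= n:
        raise ValueError("need 1 <= num_buckets <= len(freqs)")
    # Prefix sums of f and f^2 give the SSE of any slice in O(1).
    ps = [0.0] * (n + 1)
    pss = [0.0] * (n + 1)
    for i, f in enumerate(freqs):
        ps[i + 1] = ps[i] + f
        pss[i + 1] = pss[i] + f * f

    def sse(i, j):
        """SSE of freqs[i:j] around the mean of that slice."""
        s = ps[j] - ps[i]
        return (pss[j] - pss[i]) - s * s / (j - i)

    inf = float("inf")
    # best[k][j]: minimum SSE when freqs[:j] is split into k buckets.
    best = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    best[0][0] = 0.0
    for k in range(1, num_buckets + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # the last bucket is freqs[i:j]
                cost = best[k - 1][i] + sse(i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    cut[k][j] = i
    # Walk the stored cut points backwards to recover the boundaries.
    bounds = []
    j = n
    for k in range(num_buckets, 0, -1):
        i = cut[k][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return best[num_buckets][n], bounds
```

For the frequencies `[1, 1, 1, 10, 10, 10]` and two buckets, the split `[(0, 3), (3, 6)]` has an SSE of zero, because each bucket average matches every frequency inside it.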
What is the primary role of the FM-sketch in probabilistic counting?
To estimate the number of distinct values in a dataset using bit vectors and hashing.
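A minimal single-bitmap sketch of the idea follows. The class name, the use of SHA-256 as the hash function, and the 32-bit bitmap size are assumptions made for this example; the constant 0.77351 is the correction factor from the Flajolet-Martin analysis. A single bitmap has high variance, so practical variants average over many bitmaps.

```python
import hashlib

PHI = 0.77351  # correction factor from the Flajolet-Martin analysis


class FMSketch:
    def __init__(self, num_bits=32):
        self.num_bits = num_bits
        self.bitmap = 0  # bit vector stored as an integer

    def _hash(self, value):
        # Assumption: SHA-256 truncated to num_bits stands in for a uniform hash.
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") & ((1 << self.num_bits) - 1)

    def add(self, value):
        h = self._hash(value)
        if h == 0:
            pos = self.num_bits - 1
        else:
            # Position of the least significant 1-bit of the hash.
            pos = (h & -h).bit_length() - 1
        self.bitmap |= 1 << pos

    def estimate(self):
        # r is the position of the lowest bit that is still 0.
        r = 0
        while r < self.num_bits and (self.bitmap >> r) & 1:
            r += 1
        return (2 ** r) / PHI
```

Because a repeated value always hashes to the same bit, duplicates leave the bitmap unchanged, which is why the sketch tracks distinct values and not total tuples.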
What influences the cost of query evaluation plans?
A) Size of intermediate results.
B) Availability of indices.
C) Execution time only.
D) Memory and system statistics.
A, B and D
How does the CM-sketch estimate frequencies?
By using a two-dimensional array of counters indexed by hash functions, with estimates derived from the minimum value across rows.
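The structure described above can be sketched as follows. The class name, the default dimensions, and the use of SHA-256 salted with the row number as the per-row hash function are assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    def __init__(self, depth=4, width=64):
        self.depth = depth  # number of rows, one hash function per row
        self.width = width  # number of counters per row
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Assumption: salting SHA-256 with the row number yields one hash per row.
        data = f"{row}:{item!r}".encode("utf-8")
        digest = hashlib.sha256(data).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum is the tightest bound.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )
```

With non-negative updates the estimate never falls below the true frequency; it can only overestimate when other items collide with the queried item in every row.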