Chapter 17 Understanding Further Optimization Aspects Flashcards
What are iterators?
SQL Server executes a query by using a set of physical operators. Because these operators iterate through rowsets, they are also called iterators.
If a table is organized as a heap, what is the only access method available?
If a table is organized as a heap, then the only access method available to SQL Server is a table scan.
What is a table scan?
In a table scan, SQL Server literally scans every data page that was allocated by the heap. Any query against the heap causes a table scan operation.
Even if you select only a few columns from this table, and even if you use a very selective WHERE clause that limits the result set to a single row, SQL Server uses a Table Scan iterator to retrieve the data.
To determine which pages have been allocated for the heap, SQL Server must refer back to the IAM. A table scan is known as an “allocation order scan” since it uses the IAM to scan the data pages in the order in which they were allocated by SQL Server.
Note that SQL Server can use the allocation order scan when a table is clustered as well.
An allocation order scan is faster if a table is less physically fragmented. Allocation order scans are not affected by logical fragmentation.
When using a clustered table, what is the difference between an “allocation order scan” and an “index order scan”?
When SQL Server performs an unordered clustered index scan (i.e. the query does not request any specific order), it refers back to the IAM to scan the data pages of the clustered index using an allocation order scan (just like a table scan).
However, SQL Server has another option for clustered indexes, the ordered clustered index scan (aka the index scan). In an index scan, SQL Server can follow the doubly linked list at the leaf node level instead of referring back to the IAM. The scan is performed in clustered index order.
In each of these cases, the Clustered Index Scan iterator is used.
What iterator is used to scan a nonclustered index?
SQL Server uses the Index Scan iterator to scan a nonclustered index.
As with the Clustered Index Scan iterator, SQL Server can perform an allocation or an index order scan when scanning a nonclustered index.
Can SQL Server Query Optimizer cover a query with multiple nonclustered indexes?
Recall that a nonclustered index can cover a query (included columns, etc). Covering a query means that SQL Server can find all data needed for the query in a nonclustered index and does not need to do any lookups in the base table.
In some cases, the Query Optimizer can even decide to cover a query with multiple nonclustered indexes. SQL Server can join nonclustered indexes. All nonclustered indexes of a table always have some data in common that can be used for a join. If a table is organized as a heap, then this data is the RID; if the table is clustered, the this is the clustering key.
Can allocation order scans be unsafe?
Yes. With an allocation order scan, SQL Server can skip some rows and read other rows multiple times.
This can happen if you use the Read Uncommitted isolation level in a read-write environment.
While one query is performing an allocation order scan, another command could update the data and cause movement of one or more rows. The scanning query might have already read these rows and could read the rows again after movement. Or the scanning query might already have passed the page to which a row was moved from a page that was not scanned yet and the scanning query never reads this row.
Why might a row be moved?
A row can move because of multiple causes. For example, a command might update a variable length column and replace a short value with a long one. A page might be full, and thus the updated row has to move to another page.
A set of rows can move if there are inserts in a clustered table and a page split occurs. Approx. half of the rows are moved to the new page.
What is a “seek and partial scan”?
When you scan an index, SQL Server is not limited to a full scan. If you limit a rowset returned by a query and the scan is ordered, then SQL Server can seek for the first value of the rowset needed and then perform a partial scan of subsequent values in the logical order of an index.
SQL Server can use a seek and partial scan for both clustered indexes and covering nonclustered indexes.
After the first record is found, SQL Server does not use seek for subsequent orders; instead it performs a partial scan.
What is the most common access method SQL Server uses in OLTP environments?
A nonclustered index seek with an ordered partial scan and then a lookup into the base table for each row found in the nonclustered index. Such plans are common for selective queries. The base tables can be organized as a heap or as a balanced tree.
When the table is organized as a heap, SQL Server uses the RID Lookup operator to retrieve the rows from the base table.
If the table is clustered, then SQL Server uses the Key Lookup operator instead of the RID lookup operator.
What is a partial scan?
Where SQL Server seeks to find the first qualifying row and then scans to the end of the qualifying range.
What are the basic JOIN algorithms?
SQL Server supports 3 basic join algorithms: nested loops, merge joins, and hash joins. A hash join can be further optimized by using a bitmap filtering (i.e. a bitmap filtered hash join).
How does the nested loops algorithm work?
The nested loops algorithm is very simple, and in many cases, efficient algorithm. SQL Server uses one table for the outer loop, typically the table with fewer rows. For each row in this outer input, SQL Server seeks for matching rows in the second table which is the inner table.
SQL Server uses a join condition to find the matching rows. The join can be a non-equijoin meaning that the Equals operator does not need to be part of the join predicate.
If the inner table has no supporting index for a seek, the SQL Server scans the inner input for each row of the outer input. This is not an efficient scenario. A nested loops join is efficient when SQL Server can perform an index seek in the inner input.
How does the merge join algorithm work?
Merge join is a very efficient algorithm. However, it has its own limitations. It needs at least one equijoin predicate and sorted inputs from both tables. This means that the merge join should be supported by indexes on both tables involved in the join.
If one input is much smaller than another, then the nested loops join could be more efficient than a merge join. In other words, it works better on larger tables.
In a one-to-one or one-to-many scenario, the merge join scans both inputs only once. It starts by finding the first rows on both sides. If the end of the input is not reached, the merge join checks the join predicate to determine whether the rows match. If the rows match, they are added to the output. Then the algorithm checks the next rows from the other side and adds them to the output if they match the predicate.
If the rows from the inputs do not match, then then algorithm reads the next row from the side with the lower value. It reads from this side and compares the row to the row from the other side until the value is bigger than the value from the other side. Then it continues reading from the other side and so on.
In a many-to-many scenario, the merge join algorithm uses a work table to put the rows from one input side aside for reusage when duplicate matching rows from the other input exist.
How does the hash join algorithm work?
If none of the inputs is supported by an index and an equijoin predicate is used, then the hash join algorithm might be the most efficient one. It uses a searching structure named a hash table which SQL Server builds internally. It uses a hash function to split the rows from the smaller input into buckets. This is called the “build” phase.
SQL Server uses the smaller input for building the hash table because it wants to keep the hash table in memory. If it needs to get spilled to disk, then the algorithm might become much slower. The hash function creates buckets of approx equal size.
After the hash table is built, SQL Server applies the hash function on each of the rows from the other input. It checks to see into which bucket the row fits. Then it scans through all rows from the bucket. This phase is called the “probe” phase.
In a single-thread mode hash join is usually slower than merge and nested loops join that are supported by indexes. However, SQL Server can split rows from the probe input in advance, and perform partial joins in multiple threads. The hash join is actually very scalable. This kind of optimization of a hash join is called a bitmap filtered hash join. It is typically used in a data warehousing scenarios where you have large inputs fro a query and a few concurrent users, so SQL Server can execute a query in parallel.
Although a regular hash join can be executed in parallel as well, the bitmap filtered hash join is more efficient because SQL Server can use bitmaps for early elimination of rows not used in the join from the bigger table.
Which join algorithm is the only one to support non-equijoins?
Nested Loops.
Summarize the characteristics of the three join types?
Hash -
Index on Joining Columns: Inner table: Not indexed Outer table: Optional Optimal condition: Small outer table, large inner table. Usual Size of Joining tables: Any Presorted: No Join Clause: Equijoin
Merge -
Index on Joining Columns: Both tables: must Optimal condition: Clustered or covering index on both. Usual Size of Joining tables: Large Presorted: Yes Join Clause: Equijoin
Nested loops -
Index on Joining Columns: Inner table: must Outer table preferable. Usual Size of Joining tables: Small Presorted: Optional Join Clause: All