SQL optimizations Flashcards
Which is better to use: joins or subqueries? Why
Prefer joins over subqueries and correlated subqueries, especially if the tables involved have indexes that can be leveraged by the join
Why should select specific columns?
reduce the amount of data that needs to be processed and transferred.
How do you exploit and improve where clause performance?
Conditions in WHERE clauses should be written to exploit indexes and to minimize the number of rows processed
How do you optimize aggregate functions and group by?
When using functions like COUNT, SUM, and AVG, or when grouping data, make sure that the fields involved are indexed if possible
Should you use wildcards?
If using LIKE, place wildcards (%) judiciously. A leading wildcard prevents index usage and can slow down searches
T or F. You can use functions on where columns that are indexed.
False. Applying functions on indexed columns within the WHERE clause can lead to full table scans because the index cannot be used directly.
What is the proper way to use order by and having clauses?
ORDER BY and HAVING clauses can be expensive, use them only when necessary, and ensure there’s an index that can support these operations.
T or F. Batch updates and inserts. Why?
Instead of single row inserts or updates, batch operations can reduce the transaction overhead and increase performance
What are some advanced indexing techniques?
Partial indexes: Create indexes on a subset of the data that meets certain conditions, reducing index size and maintenance overhead.
Composite indexes: Use multiple columns in an index to improve the performance of queries that filter or sort on these columns.
Index-organized tables: Store table data in an index-based structure to improve access time, useful for OLTP systems with frequent short queries.
Explain the differences between the joins:
INNER JOIN (a.k.a. “simple join”): Returns all rows for which there is at least one match in BOTH tables. This is the default type of join if no specific JOIN type is specified.
LEFT JOIN (or LEFT OUTER JOIN): Returns all rows from the left table, and the matched rows from the right table; i.e., the results will contain all records from the left table, even if the JOIN condition doesn’t find any matching records in the right table. This means that if the ON clause doesn’t match any records in the right table, the JOIN will still return a row in the result for that record in the left table, but with NULL in each column from the right table.
RIGHT JOIN (or RIGHT OUTER JOIN): Returns all rows from the right table, and the matched rows from the left table. This is the exact opposite of a LEFT JOIN; i.e., the results will contain all records from the right table, even if the JOIN condition doesn’t find any matching records in the left table. This means that if the ON clause doesn’t match any records in the left table, the JOIN will still return a row in the result for that record in the right table, but with NULL in each column from the left table.
FULL JOIN (or FULL OUTER JOIN): Returns all rows for which there is a match in EITHER of the tables. Conceptually, a FULL JOIN combines the effect of applying both a LEFT JOIN and a RIGHT JOIN; i.e., its result set is equivalent to performing a UNION of the results of left and right outer queries.
CROSS JOIN: Returns all records where each row from the first table is combined with each row from the second table (i.e., returns the Cartesian product of the sets of rows from the joined tables). Note that a CROSS JOIN can either be specified using the CROSS JOIN syntax (“explicit join notation”) or (b) listing the tables in the FROM clause separated by commas without using a WHERE clause to supply join criteria (“implicit join notation”).
Why is data partitioning helpful?
Data partitioning splits tables into smaller, more manageable pieces, improving query performance by limiting the number of rows to scan. Partitioning can be based on range, list, or hash of a column.
What are materialized views?
Precompute and store the result of a complex query, and then query this precomputed result. It’s highly effective for data that does not change frequently but is read often.
What is an execution plan? When would you use it? How would you view the execution plan?
An execution plan is basically a road map that graphically or textually shows the data retrieval methods chosen by the SQL server’s query optimizer for a stored procedure or ad hoc query. Execution plans are very useful for helping a developer understand and analyze the performance characteristics of a query or stored procedure, since the plan is used to execute the query or stored procedure.
In many SQL systems, a textual execution plan can be obtained using a keyword such as EXPLAIN, and visual representations can often be obtained as well
What are the ACID properties that effectively guarentee that database transactions are processed reliably?
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. They are defined as follows:
Atomicity. Atomicity requires that each transaction be “all or nothing”: if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes.
Consistency. The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof.
Isolation. The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e., one after the other. Providing isolation is the main goal of concurrency control. Depending on concurrency control method (i.e. if it uses strict - as opposed to relaxed - serializability), the effects of an incomplete transaction might not even be visible to another transaction.
Durability. Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against power loss, transactions (or their effects) must be recorded in a non-volatile memory.
What does UNION do? What is the difference between UNION and UNION ALL?
UNION merges the contents of two structurally-compatible tables into a single combined table. The difference between UNION and UNION ALL is that UNION will omit duplicate records whereas UNION ALL will include duplicate records.
It is important to note that the performance of UNION ALL will typically be better than UNION, since UNION requires the server to do the additional work of removing any duplicates. So, in cases where is is certain that there will not be any duplicates, or where having duplicates is not a problem, use of UNION ALL would be recommended for performance reasons.