SQL General Questions Flashcards
“My database is getting slow, how would you improve its performance?”
Possible solutions:
- Indexing - Indexes can be a great way to improve read performance, but they can slow down write performance because every insert or update must also maintain the indexes
- Creating materialized views - While simple views serve as an alias to queries, materialized views store the results of the query. This means that instead of running the query at user request, you can pre-compute it, store the result in the materialized view, and return the stored result very quickly when the user actually asks for it.
- Connection pooling at the application level
- Application level caching
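Of these, materialized views are the least obvious, so here is a minimal sketch using Python's sqlite3 as a stand-in engine. SQLite has no native materialized views, so the sketch approximates one with an ordinary summary table refreshed on demand; the `orders`/`order_totals` names and the `refresh_order_totals` helper are illustrative, not a standard API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)",
                 [("alice", 10.0), ("bob", 20.0), ("alice", 5.0)])

# "Materialized view": a plain table holding the pre-computed aggregate.
conn.execute("CREATE TABLE order_totals (customer TEXT PRIMARY KEY, total REAL)")

def refresh_order_totals(conn):
    # Recompute the expensive aggregate once, on our own schedule,
    # instead of running it on every user request.
    conn.execute("DELETE FROM order_totals")
    conn.execute("""
        INSERT INTO order_totals (customer, total)
        SELECT customer, SUM(amount) FROM orders GROUP BY customer
    """)
    conn.commit()

refresh_order_totals(conn)
# User-facing reads now hit the small pre-computed table.
totals = dict(conn.execute("SELECT customer, total FROM order_totals"))
```

The trade-off is staleness: the view only reflects data as of the last refresh, so you must decide how often (or on which writes) to re-run the refresh.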
Why should you avoid using offset in queries?
Performing OFFSET (or skip) on your database hurts performance because of the way the database fetches rows: the offset is applied after the database has already fetched all the matching rows. The database therefore fetches offset + limit rows instead of just limit, and discards the first offset of them.
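The usual alternative is keyset (seek) pagination: remember the last key you returned and filter on it, so the index seeks straight to the next page instead of fetching and discarding skipped rows. A minimal sketch with sqlite3; the `events` table and `page_after` helper are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)",
                 [(i, f"event-{i}") for i in range(1, 101)])

PAGE_SIZE = 10

def page_after(conn, last_seen_id):
    # Keyset pagination: seek directly to the next page via the primary-key
    # index, instead of OFFSET's fetch-and-discard.
    return conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, PAGE_SIZE),
    ).fetchall()

first = page_after(conn, 0)               # ids 1..10
second = page_after(conn, first[-1][0])   # ids 11..20
```

The cost of each page stays constant no matter how deep you paginate, whereas OFFSET gets slower the further in you go; the limitation is that you can only step page by page along the sort key, not jump to an arbitrary page number.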
What are some common SQL performance bottlenecks?
Full Table Scans: When a query scans the entire table instead of using indexes.
Improper Use of Indexes: Including lack of indexes, over-indexing, or incorrectly designed indexes.
Suboptimal Joins: Inefficient join operations can lead to slow performance, particularly when joining large tables or using non-sargable expressions (predicates the engine cannot satisfy with an index, such as a function applied to an indexed column).
Excessive Sorting: Ordering by columns that are not indexed can result in slow operations.
Nested Subqueries: These can sometimes be inefficient, particularly if they are correlated and executed for each row of the outer query.
Lack of Pagination: Retrieving large amounts of data in a single query without pagination can affect performance.
Inadequate Hardware Resources: Insufficient CPU, memory, or disk I/O can be bottlenecks.
Poorly Written SQL: This includes using non-sargable expressions, wildcard characters at the start of a LIKE pattern, and unnecessary complexity in queries.
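The non-sargable-expression bottleneck is easy to demonstrate with an execution plan. A sketch using sqlite3's EXPLAIN QUERY PLAN (the `products` table is illustrative; plan wording varies between SQLite versions, but SCAN vs. SEARCH is stable): wrapping an indexed column in arithmetic forces a full scan, while comparing the bare column allows an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
conn.execute("CREATE INDEX idx_products_price ON products (price)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the plan detail text.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Non-sargable: the expression hides the column from the index -> SCAN.
bad = plan("SELECT id FROM products WHERE price * 2 = 10")
# Sargable: the bare column can use the index -> SEARCH.
good = plan("SELECT id FROM products WHERE price = 5")
```

The same rewrite idea (move the computation to the constant side, e.g. `price = 10 / 2`) applies in every major database.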
How can you avoid full table scan of database tables?
Indexing: Create indexes on columns that are frequently used in WHERE, JOIN, ORDER BY, and GROUP BY clauses.
Selective Queries: Write queries that are as selective as possible by using WHERE clauses that filter out unnecessary rows.
Avoid * in SELECT: Instead of using SELECT *, specify only the columns you need.
Column Store Indexes: If you are working with read-intensive operations, consider using column store indexes.
Partitioning: If the table is very large, consider partitioning it, which can help query performance by scanning only relevant partitions.
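You can verify that the first technique, indexing, actually eliminates a full table scan by checking the plan before and after creating the index. A sketch with sqlite3; the `users` table and index name are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the plan detail text.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id FROM users WHERE email = 'a@example.com'"

before = plan(query)  # no index on email yet: full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)   # same query now searches the index
```

Making plan inspection (EXPLAIN in MySQL/PostgreSQL, EXPLAIN QUERY PLAN in SQLite) a habit is the most reliable way to confirm a scan has become a seek.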
How do you optimize SQL queries for large datasets? (Large Dataset Handling)
Batch Operations: Break large operations into smaller, more manageable batches.
Indexing: Properly index tables to improve the speed of data retrieval.
Parallel Processing: Use parallel query processing if supported by the database.
Query Refactoring: Simplify complex queries and eliminate unnecessary subqueries and joins.
Data Archiving: Archive old data that is not frequently accessed to reduce the size of the database.
Materialized Views: Create materialized views to store pre-computed results of complex queries.
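The first technique, batching, can be sketched for a bulk delete: rather than one huge DELETE that holds locks and grows the undo/redo log for its whole duration, delete a bounded chunk per transaction and loop until done. A sqlite3 sketch; the `logs` table and batch size are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, level TEXT)")
conn.executemany("INSERT INTO logs (level) VALUES (?)",
                 [("debug",)] * 10_000 + [("error",)] * 50)

BATCH = 1_000
deleted = 0
while True:
    # Each batch is a short transaction, keeping locks and log growth small.
    cur = conn.execute(
        "DELETE FROM logs WHERE id IN "
        "(SELECT id FROM logs WHERE level = 'debug' LIMIT ?)", (BATCH,))
    conn.commit()
    deleted += cur.rowcount
    if cur.rowcount < BATCH:
        break
```

Between batches, other transactions get a chance to run, which is why this pattern is the standard way to purge large volumes from a live table.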
How do you manage and optimize database concurrency? (Concurrency Control)
Locking Mechanisms: Use appropriate locking at the row or table level to protect data integrity while minimizing contention.
Isolation Levels: Choose suitable transaction isolation levels to balance between data consistency and performance. Higher isolation levels provide more consistency but may increase locking and reduce concurrency.
Optimistic Concurrency Control: This method assumes transactions won’t conflict and checks at commit time. It’s useful for high-read, low-write scenarios.
Indexing: Proper indexing can reduce locking contention by minimizing the amount of data locked during a query.
Database Design: Proper normalization and database design can reduce contention by minimizing the dependency between different operations.
Resource Allocation: Allocate sufficient resources (CPU, memory, I/O) to handle the expected concurrent load.
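Optimistic concurrency control is commonly implemented with a version column: read the row's version, then make the UPDATE conditional on that version being unchanged; zero affected rows means another writer got there first and the caller should retry. A sqlite3 sketch; the `accounts` table and `update_balance` helper are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0, 0)")

def update_balance(conn, account_id, new_balance, expected_version):
    # Optimistic check: the UPDATE only matches if nobody bumped the
    # version since we read the row; no lock is held between read and write.
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version))
    conn.commit()
    return cur.rowcount == 1  # False means a conflicting write won

ok = update_balance(conn, 1, 90.0, expected_version=0)     # succeeds
stale = update_balance(conn, 1, 80.0, expected_version=0)  # lost the race
```

On a False return the application re-reads the row (getting the new version) and retries, which is cheap exactly when conflicts are rare, the high-read, low-write scenario mentioned above.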
What are some best practices for writing high-performance SQL code? (Coding Best Practices)
Use Indexes Effectively: Create indexes on columns that are used in WHERE, JOIN, ORDER BY, and GROUP BY clauses.
Avoid Selecting Unnecessary Columns: Select only the columns you need rather than using SELECT *.
Use JOINs Instead of Subqueries: Subqueries can sometimes be less efficient than JOINs.
Aggregate Data: Use aggregate functions and GROUP BY wisely to minimize the amount of data processed.
Optimize WHERE Clauses: Write efficient WHERE clauses to filter out rows as early as possible.
Limit the Use of Wildcards: Avoid leading wildcards in LIKE clauses, as they prevent index utilization.
Batch Updates: Instead of single-row inserts or updates, use batch operations when possible.
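The last practice, batching writes, looks like this from application code: send all rows in one statement inside one transaction instead of a round trip and commit per row. A sqlite3 sketch using `executemany`; the `metrics` table is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (ts INTEGER, value REAL)")

rows = [(i, i / 10) for i in range(1000)]

# One transaction for the whole batch (the "with" block commits on success),
# instead of a commit per row.
with conn:
    conn.executemany("INSERT INTO metrics (ts, value) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
```

Most drivers offer an equivalent (JDBC `addBatch`/`executeBatch`, psycopg `executemany`/`COPY`), and the per-commit overhead saved usually dwarfs any query tuning for bulk loads.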
Describe the impact of database schema design on performance. (Database Schema Design)
Database schema design has a profound impact on performance, and good schema design can improve query speed, data integrity, and overall efficiency.
Normalization: Proper normalization reduces data redundancy and improves data integrity, but overly normalized schemas can lead to complex queries and joins, which might degrade performance.
Denormalization: Careful denormalization can improve read performance by reducing the number of joins but can affect write performance and data consistency.
Indexing: Effective use of indexes can drastically improve query performance. However, too many indexes can slow down write operations.
Data Types: Choosing the correct data types can save space and improve performance since operations on smaller data types are generally faster.
Partitioning: Partitioning tables can significantly improve performance for large tables by dividing them into smaller, more manageable pieces.
Archiving: Implementing data archiving for historical data can improve the performance of the operational database.
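The archiving point can be sketched as a move-then-delete inside a single transaction, so a crash mid-move can never lose or duplicate a row. A sqlite3 sketch; the `orders`/`orders_archive` tables and the year-based cutoff are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, placed_year INTEGER, total REAL);
    CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, placed_year INTEGER, total REAL);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 2019, 10.0), (2, 2020, 20.0), (3, 2024, 30.0)])

CUTOFF = 2021
# One transaction: either both statements apply or neither does.
with conn:
    conn.execute(
        "INSERT INTO orders_archive SELECT * FROM orders WHERE placed_year < ?",
        (CUTOFF,))
    conn.execute("DELETE FROM orders WHERE placed_year < ?", (CUTOFF,))

live = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]
```

With historical rows out of the operational table, its indexes stay smaller and queries on recent data scan less, which is the performance benefit the bullet above describes.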