2. SQL and Databases Flashcards
What are the differences between OLTP and OLAP systems?
OLTP (Online Transaction Processing) systems are optimized for managing transaction-oriented applications, while OLAP (Online Analytical Processing) systems are designed for complex queries and data analysis.
What is indexing in databases, and how does it improve performance?
Indexing is a data structure technique that improves the speed of data retrieval operations on a database table. It allows the database to find data faster without scanning the entire table.
Indexing in databases is a performance optimization technique that creates a data structure to quickly locate and access specific rows in a table. Here’s how it works:
Structure: An index is like a lookup table that contains a copy of selected columns and a pointer to the full row data.
Performance Improvement: Instead of scanning the entire table (a full table scan), the database can use the index to quickly find the exact rows matching a query, significantly reducing search time.
Example: In a table with millions of user records, an index on the email column allows near-instant retrieval of a specific user’s record without checking every single row.
Trade-offs:
Pros: Faster read operations
Cons: Slower write operations (as indexes must be updated), additional storage space required
Common Index Types:
B-Tree Index: Most common, good for equality and range queries
Hash Index: Excellent for exact match lookups
Bitmap Index: Efficient for columns with low cardinality
Typical performance improvement: From O(n) (linear search) to O(log n) query time for indexed columns.
Explain the concept of normalization and denormalization.
Normalization is the process of organizing data to reduce redundancy, while denormalization is the process of combining tables to improve read performance at the cost of increased redundancy.
How do you optimize complex SQL queries?
Optimizing complex SQL queries can involve indexing, rewriting queries for efficiency, reducing the number of joins, and using appropriate filtering and aggregation.
What is a clustered vs. non-clustered index?
A clustered index determines the physical order of data in a table, while a non-clustered index is a separate structure that points to the physical data.
Clustered Index:
Defines the physical order of table data.
Only one per table.
Faster for range queries & sorting.
Non-Clustered Index:
Does not change physical order.
Multiple indexes allowed.
Stores pointers to actual data.
Faster for search queries on specific columns.
What is a materialized view, and how does it differ from a regular view?
A materialized view is a database object that contains the results of a query and is stored on disk, whereas a regular view is a virtual table that does not store data.
How do you handle duplicates in SQL data?
Duplicates in SQL data can be handled using the DISTINCT keyword, GROUP BY clause, or by using the ROW_NUMBER() function to identify and remove duplicates.
Explain ACID properties in a relational database.
ACID properties ensure that database transactions are processed reliably: Atomicity, Consistency, Isolation, and Durability.
What is the difference between UNION and UNION ALL?
UNION combines the results of two queries and removes duplicates, while UNION ALL combines the results and includes all duplicates.
How does partitioning improve query performance?
Partitioning improves query performance by dividing a large table into smaller, more manageable pieces, allowing for faster data access and maintenance.