Databases (SQL & NoSQL) Flashcards
What is a database transaction?
In database management systems (DBMS), a database transaction is essentially a sequence of one or more database operations (like reading, writing, updating, or deleting data) that are treated as a single, indivisible unit of work.
Here’s a breakdown of what that means:
Logical Unit of Work:
A transaction represents a complete, logical operation. For example, transferring money from one bank account to another involves two steps: debiting one account and crediting the other. These two steps should be treated as a single transaction.
All or Nothing:
The core idea is that either all the operations within a transaction succeed, or none of them do. If any part of the transaction fails, the entire transaction is rolled back, and the database returns to its original state.
What are ACID properties in transactions?
- Atomicity – Ensures all operations in a transaction either complete fully or not at all.
- Consistency – Guarantees that a transaction brings the database from one valid state to another.
- Isolation – Ensures that transactions do not interfere with each other.
- Durability – Ensures that once a transaction is committed, it remains permanent, even in case of system failure.
What are the types of transactions?
- Implicit Transactions – Automatically managed by the database system (e.g., simple INSERT or UPDATE queries).
- Explicit Transactions – Managed by developers using BEGIN TRANSACTION, COMMIT, or ROLLBACK.
What is the difference between commit and rollback?
- Commit: Saves all changes made during the transaction permanently.
- Rollback: Cancels all changes made during the transaction, returning the database to its previous state.
What are different isolation levels in transactions?
- Read Uncommitted: Allows reading uncommitted changes from other transactions (dirty reads possible).
- Read Committed: Ensures a transaction only reads committed data (dirty reads prevented).
- Repeatable Read: Prevents dirty and non-repeatable reads (but phantom reads possible).
- Serializable: The highest level, preventing all concurrency issues.
What is the difference between optimistic and pessimistic locking?
Here’s why we need different isolation levels
- Optimistic Locking: Assumes multiple transactions can complete without affecting each other. If a conflict occurs, the transaction is retried.
- Pessimistic Locking: Locks a resource when a transaction accesses it, preventing other transactions from modifying it.
Different isolation levels in transactions are necessary to balance the trade-offs between consistency, performance, and concurrency in database systems. Isolation levels define how transactions interact with each other and control the visibility of changes made by one transaction to other concurrent transactions. By choosing the appropriate isolation level, you can optimize for specific requirements of your application, such as data integrity, speed, or scalability.
Here’s why we need different isolation levels:
________________________________________
1. To Control Visibility of Uncommitted Changes - In a multi-user database system, multiple transactions may run concurrently. Without proper isolation, one transaction might see uncommitted changes made by another transaction, leading to dirty reads.
- Isolation levels like Read Uncommitted allow dirty reads, while higher levels like Read Committed prevent them.
________________________________________
2. To Balance Performance and Consistency - Higher isolation levels (e.g., Serializable) provide strong guarantees about data consistency but can significantly reduce performance due to locking or contention.
- Lower isolation levels (e.g., Read Uncommitted or Read Committed) improve performance by reducing locking but may allow anomalies like dirty reads or non-repeatable reads.3. To Handle Concurrency Anomalies
- Different isolation levels address specific concurrency anomalies:
o Dirty Read: A transaction reads uncommitted changes from another transaction.
o Non-Repeatable Read: A transaction reads the same row twice and gets different values because another transaction modified the row in between.
o Phantom Read: A transaction re-executes a query and finds new rows that were added by another transaction.
o Lost Update: Two transactions overwrite each other’s changes. - Higher isolation levels prevent these anomalies but may require more locking or versioning, which can impact performance.
________________________________________
4. To Support Application Requirements - Different applications have different requirements for data consistency and performance:
o Financial systems may require Serializable isolation to ensure absolute consistency.
o Analytical systems may use Read Uncommitted to maximize performance, even if it means occasionally reading inconsistent data.
o Web applications often use Read Committed or Repeatable Read as a balance between consistency and performance.
________________________________________
5. To Reduce Locking and Contention - Higher isolation levels often require more locks, which can lead to contention and deadlocks.
- Lower isolation levels reduce locking, allowing more concurrency and better throughput, but at the cost of potential inconsistencies.
Common Isolation Levels and Their Use Cases:
- Read Uncommitted:
o Lowest isolation level.
o Allows dirty reads.
o Use case: Applications where performance is critical, and occasional inconsistencies are acceptable (e.g., analytics). - Read Committed:
o Prevents dirty reads.
o Allows non-repeatable reads and phantom reads.
o Use case: Most general-purpose applications (e.g., web apps). - Repeatable Read:
o Prevents dirty reads and non-repeatable reads.
o Allows phantom reads.
o Use case: Applications requiring consistent reads within a transaction (e.g., reporting). - Serializable:
o Highest isolation level.
o Prevents all anomalies by executing transactions serially.
o Use case: Applications requiring absolute consistency (e.g., banking systems).
What’s the difference between SQL and NoSQL databases?
- SQL (Relational): Structured tables with fixed schemas (MySQL, PostgreSQL).
Transaction, roll back - NoSQL: Flexible schemas, used for unstructured data (MongoDB, Redis).
How do you optimize slow SQL queries?
- Use indexes.
- Optimize JOIN operations.
- Avoid SELECT * queries.
- Use caching (Redis, Memcached).
How do you prevent SQL Injection?
- Use prepared statements.
- Validate and sanitize user input.
- Use ORM frameworks (Hibernate, JPA).