[Fundamental] Database Flashcards
[DB] What is indexing
Indexes support the efficient execution of queries in DB. Without it: full scan
Effect: slower write
[DB] ACID
Atomicity: either all occurs, or nothing occurs. if a debit is made successfully from one account, the corresponding credit is made to the other account.
Consistency: the total value of funds in both the accounts is the same at the start and end of each transaction
Isolation: a transaction should take place in a system in such a way that it is the only transaction that is accessing the resources in a database system
Durability: Durability guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure. This usually means that completed transactions (or their effects) are recorded in non-volatile memory
Traditionally, SQL are designed to follow ACID properties to ensure the integrity and reliability of data.
[DB] BASE
- Basically available- the system guarantees availability.
- Soft state- the state of the system may change over time, even without input.
- Eventual consistency- the system will become consistent over a period of time, given that the system doesn’t receive input during that period.
NoSQL databases often adopt the BASE model to provide high availability, partition tolerance, and scalability, at the cost of strong consistency.
CAP Theorem
Consistency: All requests returns same result (latest write) or error
Availability: All requests always returns result not guaranteed latest write
Partition Tolerance: Cluster must continue to work despite any number of communication breakdowns between nodes
CP and AP
CP: If a network partition occurs, the system will ensure that all nodes agree on the data (consistency), but some requests might not be processed until the partition is resolved (availability is sacrificed)
AP: If a network partition occurs, the system will continue to function and respond to requests (availability), but some nodes might return outdated data (consistency is sacrificed).
Replication and Sharding
Scaling Relational DB
Replication: creates additional copies of the data and allows for automatic failover to another node. Replication may help with horizontal scaling of reads if you are OK to read data that potentially isn’t the latest.
Sharding: allows for horizontal scaling of data writes by partitioning data across multiple servers using ashard key. It’s important tochoose a good shard key. For example, a poor choice of shard key could lead to “hot spots” of data only being written on a single shard
Master: W + R
Slave: R
SQL vs NoSQL
SQL
- Structured Data
- Applications that require strong ACID properties to ensure data integrity, such as banking and financial systems.
- Complex Queries: The need for complex queries and reporting, where relationships between data are important. SQL databases excel in handling complex queries.
- Vertical Scaling
NoSQL
- Unstructured or Semi-structured Data
- Scalability: When horizontal scaling is important
- Flexibility: Rapid development cycles where the schema might evolve over time. NoSQL databases allow for a more flexible schema.
- High Availability: Massive amount of data. Applications that require high availability and can tolerate eventual consistency, benefiting from the BASE model (Basically Available, Soft state, Eventually consistent).
SQL Database Scenarios:
Banking and financial services applications that need to process transactions atomically.
Enterprise applications that require complex joins and transactions, such as ERP and CRM systems.
Applications where data integrity and consistency are non-negotiable, such as healthcare systems.
NoSQL Database Scenarios:
Real-time analytics and big data applications where the data structure can vary and evolve over time.
Content management systems and e-commerce platforms that handle a variety of data types and structures.
Mobile apps that require flexible, schema-less data models to iterate quickly during development.
NoSQL
- do not enforce a strict schema. This flexibility allows for varied and nested data structures but makes joining data across different structures or collections more complex.
- To overcome the limitations of joins, NoSQL databases often use denormalization, where related data is stored together in the same document or record.
- Application-Level Joins
[DB] DB Performance Strategy
- Replication
- Writer instance
- Reader instance
- Partiotioning: Divide columns
- Sharding: Divide Rows
- Indexing
- Materialized View: Pre-compute long-running query, ideally low writes
- Denormalization