Databases Flashcards
CAP
Consistency
Availability
Partition Tolerance
C in ACID?
Consistency
- Consistency ensures that a transaction can only bring the database from one valid state to another.
- Maintains database invariants: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. T
- Prevents database corruption by an illegal transaction, but does not guarantee that a transaction is correct.
A in ACID
Atomicity
Each transaction (a group of statements) succeeds completely or fails completely.
If any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged.
An atomic system must guarantee atomicity in each and every situation, including power failures, errors and crashes.
I in ACID
Isolation
Transactions are often executed concurrently (e.g., reading and writing to multiple tables at the same time). Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
Isolation is the main goal of concurrency control; depending on the method used, the effects of an incomplete transaction might not even be visible to other transactions.
D in ACID
Durability
Once a transaction has been committed, it will remain committed even in the event of a power outage of crash.
This usually means that completed transactions (or their effects) are recorded in non-volatile memory.
ACID
Atomicity
Consistency
Isolation
Durability
What are the sources of most database performance problems?
- database design
- query design
- index design
Query Design
- Limit the number of columns in the select list
- Use highly selective where clauses
- Operate on a small results set
- Use indexes effectively
- Avoid optimizer hints
Index Design
- You want indexes on columns frequently used in WHERE clauses and JOIN criteria.
- Index should be as narrow as possible (fewest columns and smallest data types). Strings, binary fields, and globally unique identifiers can be wide.
- Uniqueness. Putting an index on a column with very little variance wouldn’t be very helpful.
- Consider column order in composite index
Non-Clustered Index
Defines a logical order that doesn’t necessarily match the physical order (the order records are written on disk). Can have many non-clustered indicies per table. Quicker for insert and update operations.
SQL Full Outer Join
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2 ON table1.column_name = table2.column_name;
Union of two sets

SQL Right Join / Right Outer Join
SELECT column_name(s)
FROM table1
LEFT JOIN table2 ON table1.column_name = table2.column_name;
The RIGHT JOIN keyword returns all records from the right table (Employees), even if there are no matches in the left table (Orders).

SQL Left Join / Left Outer Join
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
The LEFT JOIN keyword returns all records from the left table (Customers), even if there are no matches in the right table (Orders).
The left table is the one of the left (the first one) in the query.

SQL Inner Join
SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
Intersection of two sets
The INNER JOIN keyword selects all rows from both tables as long as there is a match between the columns. If there are records in the “Orders” table that do not have matches in “Customers”, these orders will not be shown!

Clustered Index
Reorders the way records are physically stored. A table can only have one clustered index. Faster to read.
Physical Logging
Logging the contents of each row that is changed
Logical Logging
Logging the SQL commands that lead to the row changes
Fact Table
Record measurements or a specific event (e.g. a sale)
Dimension Tables
Attributes of a noun (e.g. details about a product, customer, or store). Fewer rows than fact tables but many columns (attributes).
Star Schema
Fact table at the center surrounded by dimension tables organized as the points of a star. Traditionally DEnormalized.
Snowflake Schema
Similiar to star schema but dimension tables are normalized to avoid redundnacy
Normalizing attributes results in storage savings, the tradeoff being additional complexity in source query joins.
RDMS: Horizontal and Vertical Scaling
Horizontal scaling means that you scale by adding more machines into your pool of resources whereas Vertical scaling means that you scale by adding more power (CPU, RAM) to an existing machine.
An easy way to remember this is to think of a machine on a server rack, we add more machines across the horizontal direction and add more resources to a machine in the vertical direction.
In a database world horizontal-scaling is often based on the partitioning of the data i.e. each node contains only part of the data, in vertical-scaling the data resides on a single node and scaling is done through multi-core i.e. spreading the load between the CPU and RAM resources of that machine.
With horizontal-scaling it is often easier to scale dynamically by adding more machines into the existing pool - Vertical-scaling is often limited to the capacity of a single machine, scaling beyond that capacity often involves downtime and comes with an upper limit.
Good examples of horizontal scaling are Cassandra, MongoDB, Google Cloud Spanner .. and a good example of vertical scaling is MySQL - Amazon RDS (The cloud version of MySQL). It provides an easy way to scale vertically by switching from small to bigger machines. This process often involves downtime.
In-Memory Data Grids such as GigaSpaces XAP, Coherence etc.. are often optimized for both horizontal and vertical scaling simply because they’re not bound to disk. Horizontal-scaling through partitioning and vertical-scaling through multi-core support.
You can read more on this subject in my earlier posts: Scale-out vs Scale-up and The Common Principles Behind the NOSQL Alternatives
SQL vs NoSQL Tradeoffs
- Traditional RDMS databases ran on expensive specialized hardware. The idea was to avoid failure and the risk of failure is mitigated through regular backups and secondary machines.
- NoSQL systems are built on the assumption that disks, servers, and networks fail.
Advantages of Data Partitioning
- Minimize the impact of failure
- Distribute the load for read and write operations