Consistency Consistency ensures that a transaction can only bring the database from one valid state to another. Maintains database invariants: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. T Prevents database corruption by an illegal transaction, but does not guarantee that a transaction is correct.

Limit the number of columns in the select list Use highly selective where clauses Operate on a small results set Use indexes effectively Avoid optimizer hints

You want indexes on columns frequently used in WHERE clauses and JOIN criteria. Index should be as narrow as possible (fewest columns and smallest data types). Strings, binary fields, and globally unique identifiers can be wide. Uniqueness. Putting an index on a column with very little variance wouldn't be very helpful. Consider column order in composite index

Databases Flashcards by Katie D

CAP

Consistency

Availability

Partition Tolerance

How well did you know this?

Not at all

Perfectly

C in ACID?

Consistency

Consistency ensures that a transaction can only bring the database from one valid state to another.
Maintains database invariants: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. T
Prevents database corruption by an illegal transaction, but does not guarantee that a transaction is correct.

How well did you know this?

Not at all

Perfectly

A in ACID

Atomicity

Each transaction (a group of statements) succeeds completely or fails completely.

If any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged.

An atomic system must guarantee atomicity in each and every situation, including power failures, errors and crashes.

How well did you know this?

Not at all

Perfectly

I in ACID

Isolation

Transactions are often executed concurrently (e.g., reading and writing to multiple tables at the same time). Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.

Isolation is the main goal of concurrency control; depending on the method used, the effects of an incomplete transaction might not even be visible to other transactions.

How well did you know this?

Not at all

Perfectly

D in ACID

Durability

Once a transaction has been committed, it will remain committed even in the event of a power outage of crash.

This usually means that completed transactions (or their effects) are recorded in non-volatile memory.

How well did you know this?

Not at all

Perfectly

ACID

Atomicity
Consistency
Isolation
Durability

How well did you know this?

Not at all

Perfectly

What are the sources of most database performance problems?

database design
query design
index design

How well did you know this?

Not at all

Perfectly

Query Design

Limit the number of columns in the select list
Use highly selective where clauses
Operate on a small results set
Use indexes effectively
Avoid optimizer hints

How well did you know this?

Not at all

Perfectly

Index Design

You want indexes on columns frequently used in WHERE clauses and JOIN criteria.
Index should be as narrow as possible (fewest columns and smallest data types). Strings, binary fields, and globally unique identifiers can be wide.
Uniqueness. Putting an index on a column with very little variance wouldn’t be very helpful.
Consider column order in composite index

How well did you know this?

Not at all

Perfectly

Non-Clustered Index

Defines a logical order that doesn’t necessarily match the physical order (the order records are written on disk). Can have many non-clustered indicies per table. Quicker for insert and update operations.

How well did you know this?

Not at all

Perfectly

SQL Full Outer Join

SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2 ON table1.column_name = table2.column_name;

Union of two sets

How well did you know this?

Not at all

Perfectly

SQL Right Join / Right Outer Join

SELECT column_name(s)
FROM table1
LEFT JOIN table2 ON table1.column_name = table2.column_name;

The RIGHT JOIN keyword returns all records from the right table (Employees), even if there are no matches in the left table (Orders).

How well did you know this?

Not at all

Perfectly

SQL Left Join / Left Outer Join

SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;

The LEFT JOIN keyword returns all records from the left table (Customers), even if there are no matches in the right table (Orders).

The left table is the one of the left (the first one) in the query.

How well did you know this?

Not at all

Perfectly

SQL Inner Join

SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

Intersection of two sets

The INNER JOIN keyword selects all rows from both tables as long as there is a match between the columns. If there are records in the “Orders” table that do not have matches in “Customers”, these orders will not be shown!

How well did you know this?

Not at all

Perfectly

Clustered Index

Reorders the way records are physically stored. A table can only have one clustered index. Faster to read.

How well did you know this?

Not at all

Perfectly

Physical Logging

Study These Flashcards

Logging the contents of each row that is changed

Logical Logging

Study These Flashcards

Logging the SQL commands that lead to the row changes

Fact Table

Study These Flashcards

Record measurements or a specific event (e.g. a sale)

Dimension Tables

Study These Flashcards

Attributes of a noun (e.g. details about a product, customer, or store). Fewer rows than fact tables but many columns (attributes).

Star Schema

Study These Flashcards

Fact table at the center surrounded by dimension tables organized as the points of a star. Traditionally DEnormalized.

Snowflake Schema

Study These Flashcards

Similiar to star schema but dimension tables are normalized to avoid redundnacy

Normalizing attributes results in storage savings, the tradeoff being additional complexity in source query joins.

RDMS: Horizontal and Vertical Scaling

Study These Flashcards

Horizontal scaling means that you scale by adding more machines into your pool of resources whereas Vertical scaling means that you scale by adding more power (CPU, RAM) to an existing machine.

An easy way to remember this is to think of a machine on a server rack, we add more machines across the horizontal direction and add more resources to a machine in the vertical direction.

In a database world horizontal-scaling is often based on the partitioning of the data i.e. each node contains only part of the data, in vertical-scaling the data resides on a single node and scaling is done through multi-core i.e. spreading the load between the CPU and RAM resources of that machine.

With horizontal-scaling it is often easier to scale dynamically by adding more machines into the existing pool - Vertical-scaling is often limited to the capacity of a single machine, scaling beyond that capacity often involves downtime and comes with an upper limit.

Good examples of horizontal scaling are Cassandra, MongoDB, Google Cloud Spanner .. and a good example of vertical scaling is MySQL - Amazon RDS (The cloud version of MySQL). It provides an easy way to scale vertically by switching from small to bigger machines. This process often involves downtime.

In-Memory Data Grids such as GigaSpaces XAP, Coherence etc.. are often optimized for both horizontal and vertical scaling simply because they’re not bound to disk. Horizontal-scaling through partitioning and vertical-scaling through multi-core support.

You can read more on this subject in my earlier posts: Scale-out vs Scale-up and The Common Principles Behind the NOSQL Alternatives

SQL vs NoSQL Tradeoffs

Study These Flashcards

Traditional RDMS databases ran on expensive specialized hardware. The idea was to avoid failure and the risk of failure is mitigated through regular backups and secondary machines.
NoSQL systems are built on the assumption that disks, servers, and networks fail.

Advantages of Data Partitioning

Study These Flashcards

Minimize the impact of failure
Distribute the load for read and write operations

Relational Database vs Columnar Database

- Relational databases are optimized for storing rows typically for transactional applications. Transactional applications have frequent writes. - Columnar database are optimized for \*retrieving\* columns typically for analytical applications. (Analytic data is typically written infrequently by batch jobs.) - Column-oriented storage is important for analytical queries because it dramatically reduces the amount of disk I/O (only need to retrieve the columns you care about not the entire row) and reduces the amount of data you need to load from disk (repetitive data can be compressed)

Databases Flashcards

(25 cards)