Databases Flashcards

1
Q

CAP

A

Consistency

Availability

Partition Tolerance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

C in ACID?

A

Consistency

  • Consistency ensures that a transaction can only bring the database from one valid state to another.
  • Maintains database invariants: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. T
  • Prevents database corruption by an illegal transaction, but does not guarantee that a transaction is correct.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

A in ACID

A

Atomicity

Each transaction (a group of statements) succeeds completely or fails completely.

If any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged.

An atomic system must guarantee atomicity in each and every situation, including power failures, errors and crashes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

I in ACID

A

Isolation

Transactions are often executed concurrently (e.g., reading and writing to multiple tables at the same time). Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.

Isolation is the main goal of concurrency control; depending on the method used, the effects of an incomplete transaction might not even be visible to other transactions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

D in ACID

A

Durability

Once a transaction has been committed, it will remain committed even in the event of a power outage of crash.

This usually means that completed transactions (or their effects) are recorded in non-volatile memory.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

ACID

A

Atomicity
Consistency
Isolation
Durability

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are the sources of most database performance problems?

A
  • database design
  • query design
  • index design
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Query Design

A
  • Limit the number of columns in the select list
  • Use highly selective where clauses
  • Operate on a small results set
  • Use indexes effectively
  • Avoid optimizer hints
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Index Design

A
  • You want indexes on columns frequently used in WHERE clauses and JOIN criteria.
  • Index should be as narrow as possible (fewest columns and smallest data types). Strings, binary fields, and globally unique identifiers can be wide.
  • Uniqueness. Putting an index on a column with very little variance wouldn’t be very helpful.
  • Consider column order in composite index
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Non-Clustered Index

A

Defines a logical order that doesn’t necessarily match the physical order (the order records are written on disk). Can have many non-clustered indicies per table. Quicker for insert and update operations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

SQL Full Outer Join

A

SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2 ON table1.column_name = table2.column_name;

Union of two sets

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

SQL Right Join / Right Outer Join

A

SELECT column_name(s)
FROM table1
LEFT JOIN table2 ON table1.column_name = table2.column_name;

The RIGHT JOIN keyword returns all records from the right table (Employees), even if there are no matches in the left table (Orders).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

SQL Left Join / Left Outer Join

A

SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;

The LEFT JOIN keyword returns all records from the left table (Customers), even if there are no matches in the right table (Orders).

The left table is the one of the left (the first one) in the query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

SQL Inner Join

A

SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

Intersection of two sets

The INNER JOIN keyword selects all rows from both tables as long as there is a match between the columns. If there are records in the “Orders” table that do not have matches in “Customers”, these orders will not be shown!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Clustered Index

A

Reorders the way records are physically stored. A table can only have one clustered index. Faster to read.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Physical Logging

A

Logging the contents of each row that is changed

17
Q

Logical Logging

A

Logging the SQL commands that lead to the row changes

18
Q

Fact Table

A

Record measurements or a specific event (e.g. a sale)

19
Q

Dimension Tables

A

Attributes of a noun (e.g. details about a product, customer, or store). Fewer rows than fact tables but many columns (attributes).

20
Q

Star Schema

A

Fact table at the center surrounded by dimension tables organized as the points of a star. Traditionally DEnormalized.

21
Q

Snowflake Schema

A

Similiar to star schema but dimension tables are normalized to avoid redundnacy

Normalizing attributes results in storage savings, the tradeoff being additional complexity in source query joins.

22
Q

RDMS: Horizontal and Vertical Scaling

A

Horizontal scaling means that you scale by adding more machines into your pool of resources whereas Vertical scaling means that you scale by adding more power (CPU, RAM) to an existing machine.

An easy way to remember this is to think of a machine on a server rack, we add more machines across the horizontal direction and add more resources to a machine in the vertical direction.

In a database world horizontal-scaling is often based on the partitioning of the data i.e. each node contains only part of the data, in vertical-scaling the data resides on a single node and scaling is done through multi-core i.e. spreading the load between the CPU and RAM resources of that machine.

With horizontal-scaling it is often easier to scale dynamically by adding more machines into the existing pool - Vertical-scaling is often limited to the capacity of a single machine, scaling beyond that capacity often involves downtime and comes with an upper limit.

Good examples of horizontal scaling are Cassandra, MongoDB, Google Cloud Spanner .. and a good example of vertical scaling is MySQL - Amazon RDS (The cloud version of MySQL). It provides an easy way to scale vertically by switching from small to bigger machines. This process often involves downtime.

In-Memory Data Grids such as GigaSpaces XAP, Coherence etc.. are often optimized for both horizontal and vertical scaling simply because they’re not bound to disk. Horizontal-scaling through partitioning and vertical-scaling through multi-core support.

You can read more on this subject in my earlier posts: Scale-out vs Scale-up and The Common Principles Behind the NOSQL Alternatives

23
Q

SQL vs NoSQL Tradeoffs

A
  • Traditional RDMS databases ran on expensive specialized hardware. The idea was to avoid failure and the risk of failure is mitigated through regular backups and secondary machines.
  • NoSQL systems are built on the assumption that disks, servers, and networks fail.
24
Q

Advantages of Data Partitioning

A
  • Minimize the impact of failure
  • Distribute the load for read and write operations
25
Q

Relational Database vs Columnar Database

A
  • Relational databases are optimized for storing rows typically for transactional applications. Transactional applications have frequent writes.
  • Columnar database are optimized for *retrieving* columns typically for analytical applications. (Analytic data is typically written infrequently by batch jobs.)
  • Column-oriented storage is important for analytical queries because it dramatically reduces the amount of disk I/O (only need to retrieve the columns you care about not the entire row) and reduces the amount of data you need to load from disk (repetitive data can be compressed)