Databases Flashcards
What are the advantages of SQL over NoSQL
ACID vs CAP. Supports many to many relationships very well through normalized relational data.
If ACID is a much, sql is the way to go (like financial transactions).
Good for transactions though NoSQL supports that too, financial data may want sql for acid.
Supports dynamic querying though ORMs should resolve this issue anyways.
When should you use NoSQL overs equal. What are all he advantages
Essentially Key-value store:
Ability to store unstructured data, have flexible data models
Having a one to many relationship for data, embedding references instead of having relational ones, more human readable and faster machine readable than ‘normalization’.
Can capture highly complex parent child hierarchical structures
Good for horizontal scalability (have automatic solutions too).
Many have built in scalability, sharing and availability requirements for modern web apps.
Good for development because they are unstructured
What are advantages and disadvantages of indexes
Advantage, reads are fast
Disadvantage, writes slow down
They are normally stored as a B-tree or hash depending on what you’re querying in SQL.
Why do database indexes use a B-tree sometimes instead of a hash table?
A B-tree is a generalized binary tree with ‘m’ children.
This allows for Log(n) range selection for sequential data. and log(n) for column comparisons for =,>,>=, between, etc
A hash table would have to do O(n) operations to get a range or comparisons
Hashtables are more useful for primary keys, distinct/unique/ or foreign keys and for ‘=’ or ‘not = ‘ .
Hashtables are also harder to re-index when scaling, since you have pre-defined buckets. This can be inefficient storage.
more reading https://dev.mysql.com/doc/refman/8.0/en/index-btree-hash.html
How do multi column indexes work in SQL
not sure
What does ACID Stand for
Atomicity - the ability to transactionally commit changes to the database and to rollback changes if there are errors. ‘ All or nothing’.
Consistency - Transactions find the database and leave database in valid state. Transactions are violating any constraints other they will be rolled back. Keep same datatypes, or foreign key/primary key relationships.
Isolation - no other transactions can affect the current transaction in question. Prevents “mid-air Collins”
Durability - System failures or restarts do not affect committed transactions.
What is InnoDB?
Its is a storage engine. MySQL in particular uses this.
Supports ACID transactions, caches, B-tree indexes, encryption foreign key support and has row granularity for locking.
Describe the need for index locking, and index concurrency.
Indexes dramatically increase read speed but slow down write speed, due to adding keys to the index when writing and updating. Only add indexes if necessary.
Index locking is needed for ACID. If a write or update is done with data in an index, the index needs to maintain its integrity so as to not provide bad data to other transactions.
https: //en.wikipedia.org/wiki/Index_locking
https: //en.wikipedia.org/wiki/Database_index
What is Denormalization
Denormalization is a concept which allows for moving data from requiring a bunch of sql ‘joins’ to just reads and doing joins in code or storing differently in a nosql database.
What is Database ‘sharding’?
When do we need to shard?
Splitting database into smaller pieces. Also called horizontal partitioning.
You should shard when
1) The write workload on a single server exceeds capacity.
2) Working set no longer fits in ram
3) Single server no longer handle the size of a dataset
Sharding
advantages and disadvantages
Advantages
1) It makes database searches smaller and faster and easier to maintain.
2) Often reduces transaction costs.
3) Easier to fit on smaller machines rather than one big machine
Disadvantages:
1)sharding itself can be a complex or expensive operation.
2) Data integrity/staleness - Have to make sure data integrity exists during reads and writes, and referential integrity (FK/PK) maintain integrity. Often need more cleanup to delete stale references
3) Joins across shards are not efficient. Should restrict “galactic” reporting. Do merging in applications wherever possible. Denormalization can solves this but then data inconsistency becomes an issue
4) Shard Key must be chosen well. Otherwise hot spots can occur.
5) schema changes can be expensive .
6) Rebalancing can be expensive or require downtime if we change our partitioning scheme or find hotspots. May be solved with directory based partitioning.
What is the difference between an exclusive lock and shard lock
An exclusive lock (also called a write lock) prevents any other transaction from reading or writing a record. Good for atomicity.
A Shard lock ensures no writes are made to a row that is locked while other rows are reading it.
By their nature any number of transactions can hold a shared lock, while one transaction at a time can hold an exclusive lock.
What should you consider when picking a shard key
Shard keys should have high cardinality (measure of elements with in a set) and avoid hotspots with low frequency (lots of keys tending to fall on a particular shard)
What is the CAP Theorem
CAP Theorem - Impossible for a distributed system to have more than two out of the three following guarantees:
Consistency - all nodes see same data t the same time. Need to update all nodes before further reads
Availability - every request to the system gets a response. Can do this by replicating servers (scaling horizontally).
Partition Tolerance - a system is resilient to failure in any of its parts. Data is sufficiently replicated across nodes and networks.
What are different partitioning techniques
Horizontal Paritioning: (range based partitioning - data sharding) - placing different rows into different tables.
Vertical Partitioning - moving different tables to different DBs by function.
Directory Based Partitioning - “loose coupling” placing a lookup (separate) service in front of databases, which knows the partition scheme. Solves elastic scaling problem without using consistent hashing.
Criteria
Key-Hash Based partitioning: Keys are ran through. hash function which then determine which database to put the key in. Disadvantage is rebalancing is difficult when servers are added and removed. This problem can be solved with consistent hashing.
List Partitioning: Each list value is assigned a server (like location based partitioning)
Round-robin partitioning: simple strategy that ensures uniform data distribution.
composite partitioning: a combination of any above. like List + hash partitioning. Consistent hashing could be considered a composite of hash and list partitioning