System Design Concepts Flashcards

Question

What are the different types ("flavors") of NoSQL databases?

Answer 1

Key-value store, document store, wide-column database, graph database

Answer 2

Key-value stores are used for simple data models or rapidly changing data, such as an in-memory cache layer. Because they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.

Answer 3

Documents are organized by collections, tags, metadata, or directories. Individual documents may have fields that differ from one another. Document stores are good for working with occasionally changing data.

Answer 4

The basic unit of data is a column (a name/value pair). Columns can be grouped into column families, and super column families can further group column families. You access data with a row key. Bigtable, HBase, and Cassandra maintain keys in lexicographic order. Used for very large data sets.

Answer 5

Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins.

Disadvantages:
- Data is duplicated, which uses more disk space.
- Constraints can help redundant copies of information stay in sync, which increases the complexity of the database design.
- A denormalized database under heavy write load might perform worse than its normalized counterpart.
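The trade-off can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the `users`, `orders`, and `orders_denorm` tables and the `rename_user` helper are hypothetical and do not come from the card.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    -- Normalized: orders reference the user by id only.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    -- Denormalized: the user's name is copied into every order row.
    CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY, user_id INTEGER,
                                user_name TEXT, total REAL);
""")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")
conn.execute("INSERT INTO orders_denorm VALUES (10, 1, 'Ada', 25.0)")

# The normalized read needs a join.
joined = conn.execute(
    "SELECT u.name, o.total FROM orders o JOIN users u ON u.id = o.user_id"
).fetchall()

# The denormalized read touches a single table.
flat = conn.execute("SELECT user_name, total FROM orders_denorm").fetchall()


def rename_user(user_id, new_name):
    """The write-side cost: every redundant copy has to be updated as well."""
    conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    conn.execute(
        "UPDATE orders_denorm SET user_name = ? WHERE user_id = ?",
        (new_name, user_id),
    )
```

Both reads return the same rows, but a rename now costs two updates instead of one, which is the extra write work the card describes.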

Answer 6

Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins.

Disadvantages:
- Data is duplicated.
- Constraints can help redundant copies of information stay in sync, which increases the complexity of the database design.
- A denormalized database under heavy write load might perform worse than its normalized counterpart.

Answer 7

Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins.

Disadvantages:
- Data is duplicated.
- Constraints can help redundant copies of information stay in sync, which increases the complexity of the database design.
- A denormalized database under heavy write load might perform worse than its normalized counterpart.

Answer 8

Break up the table by putting hot spots in a separate table to help keep it in memory.

Answer 9

Break up the table by putting hot spots in a separate table to help keep it in memory.

Answer 10

Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships. They are used to represent relationships, such as a social network.

Answer 11

Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships. They are used to represent relationships, such as a social network.

Answer 12

Client and CDN caching: e.g. browser caches and CDNs
Web server caching: reverse proxies and caches such as Varnish can serve static and dynamic content directly without contacting application servers
Database caching: caching that avoids having to query and retrieve data from disk
Application caching: in-memory caches such as Memcached or Redis are key-value stores that sit between the application and data storage. Data is held in RAM, so it is much faster than typical databases that store data on disk, but entries need cache invalidation over time.

Answer 13

With a write-through cache, the application uses the cache as its main data store, reading and writing data to it, while the cache is responsible for reading from and writing to the database. Write-through is slow overall because every write goes synchronously to the database, but subsequent reads of data just written are fast.
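A minimal sketch of the write-through flow, assuming a plain dict stands in for the database. The `WriteThroughCache` class and its method names are hypothetical, chosen only for illustration.

```python
class WriteThroughCache:
    """Toy write-through cache; `db` is any dict-like backing store (assumed)."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def write(self, key, value):
        # The write returns only after the backing store has been updated,
        # which is why write-through writes are comparatively slow.
        self.db[key] = value
        self.cache[key] = value

    def read(self, key):
        # Data that was just written is served from memory.
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

The application talks only to the cache object; the synchronous store update inside `write` is what keeps cache and database consistent.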

Answer 14

With write-behind (write-back), the application adds or updates an entry in the cache, then the entry is written to the data store asynchronously, resulting in improved write performance. However, data can be lost if the cache goes down before its contents reach the data store, and write-behind is more complex to implement than cache-aside or write-through.
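A rough sketch of write-behind, assuming a dict as the data store and an explicit `flush` call standing in for the asynchronous background writer. All names here are hypothetical.

```python
from collections import deque


class WriteBehindCache:
    """Toy write-behind cache; flush() stands in for an async worker."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.pending = deque()  # writes not yet persisted

    def write(self, key, value):
        # Fast path: update memory and queue the write for later.
        self.cache[key] = value
        self.pending.append((key, value))

    def flush(self):
        # In a real system this would run in the background. Anything still
        # in `pending` when the cache process dies is lost.
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value
```

Between `write` and `flush` the data exists only in the cache, which is the data-loss window the card warns about.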

Answer 15

With refresh-ahead, the cache automatically refreshes any recently accessed cache entry prior to its expiration. This can reduce latency compared with read-through if the cache can accurately predict which items are likely to be needed in the future. However, if those predictions are inaccurate, performance can end up worse than without refresh-ahead.
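One way to picture refresh-ahead is the sketch below. It assumes an injectable `clock` function and a `loader` callback, both hypothetical, and it refreshes synchronously for simplicity where a real cache would typically refresh in the background.

```python
class RefreshAheadCache:
    """Toy refresh-ahead cache with a fixed TTL and a refresh window."""

    def __init__(self, loader, ttl, refresh_window, clock):
        self.loader = loader                  # fetches a value from the store
        self.ttl = ttl                        # lifetime of an entry
        self.refresh_window = refresh_window  # how close to expiry to refresh
        self.clock = clock                    # returns the current time
        self.entries = {}                     # key -> (value, expires_at)

    def _load(self, key):
        value = self.loader(key)
        self.entries[key] = (value, self.clock() + self.ttl)
        return value

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is None or now >= entry[1]:
            return self._load(key)  # miss or expired: caller waits for the load
        value, expires_at = entry
        if expires_at - now <= self.refresh_window:
            # Accessed shortly before expiry: reload now so that a later
            # read does not hit an expired entry.
            self._load(key)
        return value
```

An entry that keeps being read near its expiry never actually expires, while an entry nobody reads is simply left to lapse.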

Answer 16

With cache-aside, the application is responsible for reading and writing data from storage. The application looks for an entry in the cache; on a cache miss, it loads the entry from the database, adds the entry to the cache, then returns the entry. Drawbacks: each cache miss results in three trips, data can become stale if it is updated in the database, and when a cache node fails it is replaced by a new, empty node, so previously cached entries are lost.
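The lookup sequence can be sketched as follows, assuming plain dicts stand in for both the cache and the database. The `get_user` function is a hypothetical example, not an API from the card.

```python
def get_user(user_id, cache, db):
    """Cache-aside read: the application, not the cache, talks to the store."""
    user = cache.get(user_id)        # trip 1: look in the cache
    if user is None:
        user = db.get(user_id)       # trip 2: cache miss, load from the database
        if user is not None:
            cache[user_id] = user    # trip 3: populate the cache
    return user
```

Nothing in this flow updates the cache when the database changes, so a cached entry stays stale until it is evicted or expires.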

Answer 17

If you have a performance problem, your system is slow for a single user. If you have a scalability problem, your system is fast for a single user but slow under heavy load.

Answer 18

Consistent hashing suffers from potentially uneven distribution as nodes are added and removed (assuming the hash space is constant). We can mitigate this by mapping each physical node to many virtual nodes in the hash space.
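A small sketch of a hash ring with virtual nodes, under stated assumptions: MD5 is used only as a convenient stable hash, 100 virtual nodes per physical node is an arbitrary choice, and the `HashRing` class is hypothetical.

```python
import bisect
import hashlib


def _hash(key):
    # Any stable hash works here; MD5 is used only for convenience.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (position, physical node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node is placed at many positions on the ring.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(pos, n) for pos, n in self.ring if n != node]

    def lookup(self, key):
        if not self.ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's position.
        idx = bisect.bisect_left(self.ring, (_hash(key), ""))
        if idx == len(self.ring):
            idx = 0  # wrap around
        return self.ring[idx][1]
```

Removing a node only reassigns the keys that were mapped to it, and because its virtual nodes are scattered around the ring, those keys tend to spread across the remaining nodes instead of landing on a single neighbor.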

Answer 19

Scalable (thousands of requests per second), performant (low latency / fast response time), available (no single point of failure, survives hardware and network failures)

Answer 20

Consistency and ACID guarantees (SQL) vs. availability (NoSQL); read-heavy workloads (SQL) vs. read- and write-heavy workloads (NoSQL)

Answer 21

Sharding (based on the shard index) and replication (based on leader/follower or leader/leader arrangements). Also make sure that database clusters live in different datacenters.

Answer 22

Nouns (entities) and their fields are bundled into individual tables, which are joined by foreign keys.

Answer 23

TCP load balancing forwards network packets without inspecting their contents (which makes it faster). HTTP load balancing looks inside the message to decide what to do based on its contents (headers, cookies, etc.).
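The difference in what each kind of balancer looks at can be sketched as below. The backend names, the `ROUTES` table, and both functions are hypothetical, and real load balancers use a range of selection strategies beyond the ones shown.

```python
import zlib

BACKENDS = ["app-1", "app-2"]                            # hypothetical pool
ROUTES = {"/api": ["api-1"], "/static": ["static-1"]}    # hypothetical rules


def route_l4(src_ip, src_port):
    """TCP-style choice: uses only connection metadata, never the payload."""
    key = f"{src_ip}:{src_port}".encode()
    return BACKENDS[zlib.crc32(key) % len(BACKENDS)]


def route_l7(path):
    """HTTP-style choice: inspects the request (here, just the URL path)."""
    for prefix, pool in ROUTES.items():
        if path.startswith(prefix):
            return pool[0]
    return BACKENDS[0]
```

`route_l4` can decide as soon as the connection arrives, while `route_l7` has to parse the request first, which is the extra work that buys content-aware routing.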

Answer 24

latency, traffic, errors, and saturation

Answer 25

A strong audit system calculates views and does a full "runthrough", comparing results at the end. Weak audit systems may include end-to-end integration tests, which make sure that the baseline functionality is correct.

Answer 26

Semaphore - limits how many read threads can run at once; the limit can be adjusted dynamically
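A minimal sketch using Python's `threading.Semaphore`, assuming a limit of two concurrent readers. `MAX_READERS`, `read_task`, and the `stats` bookkeeping are hypothetical and exist only to make the limit observable.

```python
import threading
import time

MAX_READERS = 2  # assumed limit, for illustration only
read_slots = threading.Semaphore(MAX_READERS)
stats_lock = threading.Lock()
stats = {"active": 0, "peak": 0}


def read_task():
    with read_slots:  # blocks while MAX_READERS readers are already inside
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.01)  # stand-in for the actual read
        with stats_lock:
            stats["active"] -= 1


threads = [threading.Thread(target=read_task) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With a plain (unbounded) `Semaphore`, an extra `release()` call raises the number of available slots at runtime, which is one way the limit could be adjusted dynamically.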

Answer 27

We need to decide whether we want a push fan-out (fan-out on write) or a pull fan-out (fan-out on load). Push fan-out immediately pushes a post to all followers. Its advantage is that it reduces read operations, so you don't need to go through your friends to get their posts. Its disadvantage is that a post from a hot user (one with many followers) must be pushed to a very large number of users. Pull fan-out pulls posts from the accounts a user follows when that user requests their feed. The main problem is that new data might not be shown to users until they issue a new request.
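The two models can be contrasted with an in-memory sketch. Every name here (`followers`, `following`, `posts`, `feeds`, and the functions) is hypothetical, and a real feed service would use persistent stores in place of dicts.

```python
from collections import defaultdict

followers = defaultdict(set)  # author -> users who follow the author
following = defaultdict(set)  # user -> authors the user follows
posts = defaultdict(list)     # author -> [(timestamp, text)]
feeds = defaultdict(list)     # user -> precomputed feed (push model only)


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def publish_push(author, ts, text):
    """Fan-out on write: one write per follower, cheap reads later."""
    posts[author].append((ts, text))
    for user in followers[author]:  # expensive for a hot user
        feeds[user].append((ts, author, text))


def read_push(user):
    return sorted(feeds[user], reverse=True)


def publish_pull(author, ts, text):
    """Fan-out on load: a single write, work deferred to read time."""
    posts[author].append((ts, text))


def read_pull(user):
    merged = [
        (ts, author, text)
        for author in following[user]
        for ts, text in posts[author]
    ]
    return sorted(merged, reverse=True)
```

The cost of `publish_push` grows with the author's follower count, while the cost of `read_pull` grows with the number of accounts the reader follows.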

(51 cards)