System Design Flashcards

1
Q

what are the two types of requirements you want to collect at the start of a system design interview?

A

functional and non-functional requirements

2
Q

what are functional requirements?

A

The specific things the system must do for its users: the features that bring the user to the service and directly deliver its core function

3
Q

what are non-functional requirements?

A

Qualities that describe how the system operates as a whole, rather than any specific feature of the service. Examples include scalability, reliability, usability, security, and performance

4
Q

why is it important that application servers are stateless?

A

so that any server can handle any request, which makes load balancing, failover, and resilience straightforward

5
Q

is mongodb a sql or nosql db?

A

nosql

6
Q

in mongodb, what happens when a primary shard node goes down?

A

the secondary nodes in that shard's replica set will automatically elect a new primary

7
Q

in mongodb, how does the mongos service know which shard a data request belongs to?

A

it looks up the shard mapping in the config servers

8
Q

how does cassandra trade off consistency to get more availability?

A

because any node can accept reads and writes, the system is more available, but it might take time for the data to propagate to the other nodes, so you only get eventual consistency

9
Q

what is the hotspot problem in nosql databases?

A

aka the celebrity problem: one shard gets a lot more traffic than the others due to usage patterns. many modern systems can reshard based on traffic patterns to avoid this

10
Q

how can you design your data schema to make it easy to scale horizontally using nosql?

A

think in terms of simple key-value lookups. maybe break complex joins into a couple of simple key-value lookups. that's much easier to shard

11
Q

what does it mean to say that data is normalized?

A

data is stored in logical tables, one per entity, that refer to each other via foreign keys. data is not duplicated all over the place. for example, a dinner reservation has a customer_id, which points to the row with that id in the customer table

12
Q

what are the advantages of having normalised data?

A

there is less (or no) data duplication, saving space, and you can update data in a single place and have that reflected everywhere

13
Q

why might you choose to denormalise your data?

A

it’s more efficient and performant. you get all the data you need with a single lookup, instead of having to do multiple lookups to join all the data. the cost you pay is data duplication

14
Q

what are the downsides of denormalised data?

A

it costs extra space due to data duplication, and updating data requires you to update it in multiple locations, which costs time and can leave the copies temporarily inconsistent (eventual consistency)

15
Q

can you have normalised data in a nosql database?

A

yes, it just means that you would need to do multiple simple lookups. for example, looking up a reservation, and then looking up the customer_id in that reservation to get the customer data
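
A minimal sketch of the two-lookup pattern described above. Plain dicts stand in for a key-value store, and the table names, ids, and fields are all hypothetical.

```python
# Two hypothetical key-value "tables"; plain dicts stand in for a NoSQL store.
customers = {"c42": {"name": "Ada", "phone": "555-0100"}}
reservations = {"r1": {"customer_id": "c42", "party_size": 4}}


def reservation_with_customer(reservation_id):
    """Normalised read: two simple key lookups instead of a join."""
    reservation = reservations[reservation_id]           # lookup 1
    customer = customers[reservation["customer_id"]]     # lookup 2
    return {**reservation, "customer": customer}


# Denormalised alternative: the customer data is copied into the reservation,
# so one lookup is enough, at the cost of duplicating (and re-syncing) it.
reservations_denormalised = {
    "r1": {"party_size": 4, "customer": {"name": "Ada", "phone": "555-0100"}}
}
```

Both layouts return the same customer data; they differ only in how many lookups a read needs and how many places an update has to touch.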

16
Q

would you generally start with normalised or denormalised data? when might you switch?

A

normalised, because it’s simpler, saves space, and is more consistent. when performance becomes a bottleneck, you might denormalise so that you need fewer lookups and reads get faster.

17
Q

if you design an LRU cache, what two data structures might you use under the hood?

A

a hashmap for finding a kv-pair in constant time, as well as a doubly-linked list: each item that is read gets moved to the front of the list, so the last item in the list is always the least recently used one and can be evicted.
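
A minimal sketch of that design, assuming a fixed capacity of at least 1. The names `LRUCache` and `_Node` are hypothetical. In real Python code `collections.OrderedDict` would do the same job in fewer lines, but the explicit hashmap plus doubly-linked list mirrors the answer above.

```python
class _Node:
    """Doubly-linked-list node holding one cache entry (hypothetical helper)."""

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    """Illustrative LRU cache: dict for O(1) lookup, linked list for recency order.

    Assumes capacity >= 1.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}                      # key -> node
        self.head = _Node()                # sentinel on the most recently used side
        self.tail = _Node()                # sentinel on the least recently used side
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None
        self._unlink(node)                 # a read makes the item most recent
        self._push_front(node)
        return node.value

    def put(self, key, value):
        if key in self.map:
            node = self.map[key]
            node.value = value
            self._unlink(node)
        else:
            if len(self.map) >= self.capacity:
                lru = self.tail.prev       # last item in the list is the oldest
                self._unlink(lru)
                del self.map[lru.key]
            node = _Node(key, value)
            self.map[key] = node
        self._push_front(node)


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" becomes the most recently used key
cache.put("c", 3)     # cache is full, so "b" (least recently used) is evicted
```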

18
Q

what are three common cache eviction strategies?

A

LRU, LFU, FIFO

19
Q

what is LRU?

A

least recently used - a cache eviction strategy

20
Q

what is LFU?

A

least frequently used - a cache eviction strategy

21
Q

what are some popular caching solutions?

A

memcached, redis, elasticache, ncache, ehcache

22
Q

what is memcached?

A

a simple in-memory key-value store, without the fancier features that redis has.

23
Q

what does CDN stand for?

A

Content Delivery Network

24
Q

dns based geo-routing of traffic to the correct region

A
25
Q

how can you prevent cascading failures?

A

overprovisioning - ensure that all the other systems can handle the load of one system when it fails and traffic gets redistributed

26
Q

how many minutes of downtime per year is five nines of availability?

A

about five (5.256)
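
The arithmetic behind that number, as a small sketch. It assumes a 365-day year, and the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Allowed downtime per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    print(f"{label}: {downtime_minutes_per_year(pct):.3f} minutes/year")
```

Each extra nine divides the downtime budget by ten: roughly 526 minutes, 53 minutes, and 5 minutes per year respectively.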

27
Q

what is an SLA?

A

a Service Level Agreement. What your users can expect from you. For example, five nines of uptime, or p95 sub-second latency

28
Q

what is HDFS?

A

hadoop distributed file system - an open source self managed distributed file system for big data storage

29
Q

Distributed “NoSQL” databases with a master node that distributes transactions fall on which side of the CAP triangle?

A

Single-master designs favor consistency and partition tolerance. Although in principle availability is what’s given up, in practice modern NoSQL databases have highly redundant master nodes that can be replaced quickly in the event of failure.

30
Q

In HDFS, the server responsible for coordinating requests is called the:

A

In the Hadoop Distributed File System, the name node coordinates how files are broken into blocks, and where those blocks are stored. In high availability settings, multiple name nodes may be present for failover.

31
Q

advantage of a linked list over an array?

A

it can grow dynamically. an array needs to be resized because it is stored sequentially in memory.

32
Q

what’s the order of complexity of accessing an item in a linked list?

A

O(n)

33
Q

what kinds of data structures can you implement using a linked list?

A

stacks and queues

34
Q

what’s the difference between a singly and doubly linked list?

A

in a doubly linked list you have pointers going in both directions, and you keep track of both head and tail, so you can move in either direction

35
Q

what data structure is commonly used to implement MRU / LRU?

A

doubly linked list, since you can move things to the back/front of the list, as well as delete from the back/front in constant time

36
Q

under what condition do you reach the worst case for a binary search tree?

A

if you insert every element in sorted order, then you’re essentially creating a linked list, since you’re only adding items to one side the whole time. the tree is not balanced, so lookups degrade from O(log n) to O(n).
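
A small sketch of that worst case, using a plain unbalanced BST with hypothetical helper names. The same 15 keys give a tree of height 15 when inserted in sorted order, and height 4 when inserted in an order that happens to keep the tree balanced.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None


def insert(root, key):
    """Iterative insert into an unbalanced BST; returns the (possibly new) root."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
                break
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key)
                break
            cur = cur.right
    return root


def height(root):
    """Number of nodes on the longest root-to-leaf path, counted level by level."""
    level, h = ([root] if root else []), 0
    while level:
        h += 1
        level = [c for n in level for c in (n.left, n.right) if c]
    return h


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


sorted_height = height(build(range(1, 16)))   # keys arrive in order: a right-leaning chain
mixed_height = height(build([8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]))
```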

37
Q

what are some examples of self-balancing binary trees?

A

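AVL Tree, Red-Black Tree, Splay Tree. B-Trees and 2-3 Trees are also self-balancing search trees, though they are multiway rather than strictly binary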

38
Q

what is the time complexity of finding an element in a graph using BFS and DFS?

A

O(V + E), where V is the number of vertices and E is the number of edges
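
A minimal BFS sketch over an adjacency-list graph, to show where the two terms come from. The graph and function name are illustrative only.

```python
from collections import deque


def bfs_find(graph, start, target):
    """Return True if target is reachable from start in an adjacency-list graph."""
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()                  # each vertex is dequeued at most once
        if node == target:
            return True
        for neighbour in graph.get(node, []):   # each adjacency list is scanned at most once
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return False


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
found = bfs_find(graph, "a", "d")
not_found = bfs_find(graph, "a", "e")   # "e" points at "a", but not the other way round
```

Swapping the queue for a stack turns this into DFS with the same bound, since every vertex and every edge is still handled at most once.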

39
Q

merge sort

A
40
Q

quicksort

A
41
Q

between merge sort and quick sort, which is easier to distribute?

A

merge sort
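
One way to picture why: the input can be split into chunks without looking at the data, each chunk can be sorted on a separate machine, and the sorted runs can then be merged in one streaming pass. The sketch below simulates this in a single process; the function name and `workers` parameter are hypothetical.

```python
import heapq


def distributed_merge_sort(values, workers=4):
    """Sort chunks independently (as separate machines could), then merge them."""
    size = max(1, -(-len(values) // workers))            # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    sorted_chunks = [sorted(chunk) for chunk in chunks]  # each could run on its own node
    return list(heapq.merge(*sorted_chunks))             # k-way merge of the sorted runs


result = distributed_merge_sort([5, 3, 8, 1, 9, 2, 7])
```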

42
Q

what does TF-IDF stand for?

A

term frequency, inverse document frequency

43
Q

in TF-IDF how do you calculate relevance of a term?

A

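term frequency / document frequency. in practice the inverse document frequency is usually log-scaled: tf × log(N / df), where N is the total number of documents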
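
A small sketch of that ratio on a made-up three-document corpus, next to the common log-scaled variant tf × log(N / df). All names and documents are illustrative, and the term is assumed to occur in at least one document.

```python
import math

docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "the dog chased the cat".split(),
    "d3": "the bird sang".split(),
}


def term_frequency(term, doc_id):
    """How often the term occurs in one document."""
    return docs[doc_id].count(term)


def document_frequency(term):
    """How many documents contain the term at least once."""
    return sum(1 for words in docs.values() if term in words)


def simple_relevance(term, doc_id):
    """The ratio from the card: term frequency / document frequency."""
    return term_frequency(term, doc_id) / document_frequency(term)


def log_scaled_relevance(term, doc_id):
    """Common variant: tf * log(N / df)."""
    return term_frequency(term, doc_id) * math.log(len(docs) / document_frequency(term))
```

A word like "the" appears often in `d1` but also in every document, so it scores lower than "mat", which appears once but only in `d1`.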

44
Q

what’s the advantage of using a message queue?

A

it decouples producers and consumers
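
A minimal in-process sketch of that decoupling. Python's `queue.Queue` stands in for a real broker, and the producer and consumer only share the queue, never a direct reference to each other. The `None` sentinel is an assumption of this sketch, not a feature of any particular broker.

```python
import queue
import threading

tasks = queue.Queue()      # stands in for a real message broker
results = []


def producer(n):
    """Publishes work without knowing who will consume it, or when."""
    for i in range(n):
        tasks.put(i)
    tasks.put(None)        # sentinel: no more messages


def consumer():
    """Processes messages at its own pace, independent of the producer."""
    while True:
        item = tasks.get()
        if item is None:
            break
        results.append(item * item)


threads = [
    threading.Thread(target=producer, args=(5,)),
    threading.Thread(target=consumer),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```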

45
Q

what tech did spark replace and improve on?

A

mapreduce

46
Q

what is spark used for?

A

distributed processing of large amounts of data

47
Q

can you use spark for realtime analytics?

A

yes, with spark streaming connecting to something like kinesis or kafka

48
Q

can you use spark for machine learning?

A

yes, it has libraries for doing that

49
Q

what is OLTP?

A

online transaction processing: normal day-to-day database use, where the application reads and writes individual records in real time to serve the outside world

50
Q

OLTP vs OLAP

A

OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are two different types of data processing systems used in organizations for distinct purposes. OLTP systems are designed for transactional processing and real-time data operations. OLAP systems are designed for analytical processing and complex data analysis

51
Q

can spark be used to analyse graph data?

A

yes

52
Q

what is object storage?

A

blob storage: just storing blobs of data, like files. for example, on s3.

53
Q

what are the default key-value nosql dbs of the big cloud companies?

A

amazon dynamodb, google bigtable, microsoft cosmosdb / table storage

54
Q

what is the default service for managing containers on the cloud providers?

A

kubernetes

55
Q

what is etcd?

A

it is a distributed key value store used by kubernetes to store the cluster state

56
Q

what is amazon’s solution for data streaming?

A

kinesis

57
Q

what is the amazon service that you can deploy spark or similar on?

A

EMR - elastic map reduce

58
Q

what is a hybrid cloud?

A

combining your own data center or servers or private cloud with a public cloud

59
Q

explain http PUT, POST, and PATCH

A

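HTTP PUT: Create or fully replace the resource at a known URL with the supplied representation. Idempotent.
HTTP POST: Submit data for processing or create a new resource whose URL the server chooses. Not necessarily idempotent.
HTTP PATCH: Partially update an existing resource.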
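
A toy in-memory model of the three verbs, with no real HTTP involved. The `resources` dict and the function names are hypothetical, and the comments show the request each function stands for.

```python
import itertools

resources = {}                       # hypothetical in-memory resource collection
_next_id = itertools.count(1)


def post(body):
    """POST /items: the server picks the new resource's id."""
    item_id = next(_next_id)
    resources[item_id] = dict(body)
    return item_id


def put(item_id, body):
    """PUT /items/{id}: replace the whole representation (safe to repeat)."""
    resources[item_id] = dict(body)


def patch(item_id, changes):
    """PATCH /items/{id}: change only the supplied fields."""
    resources[item_id].update(changes)


item_id = post({"name": "room", "price": 100})
patch(item_id, {"price": 120})                 # name is kept
after_patch = dict(resources[item_id])
put(item_id, {"name": "suite"})                # price is gone: full replacement
after_put = dict(resources[item_id])
```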

60
Q

What is the CAP theorem and its implications for distributed systems?

A

The CAP theorem states that a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance. Because network partitions cannot be ruled out in practice, the real trade-off is between consistency and availability while a partition lasts.

61
Q

What are the key factors to consider during system design?

A

Key factors to consider during system design include scalability, availability, reliability, performance, security, maintainability, and cost.

62
Q

What is availability in the context of system design?

A

Availability refers to the proportion of time that a system remains operational and accessible to users. It is often measured as a percentage of uptime.

63
Q

What is reliability in the context of system design?

A

Reliability refers to the ability of a system to consistently perform its intended function without failure or downtime over a specified period.

64
Q

What is performance in the context of system design?

A

Performance refers to the speed, throughput, and responsiveness of a system, usually measured in terms of latency, throughput, and concurrency.

65
Q

How would you approach designing a highly available system?

A

Designing a highly available system involves techniques like redundancy, load balancing, failover mechanisms, and distributed architecture to minimize single points of failure and maximize uptime

66
Q

What is a relational database?

A

A relational database is a type of database that organizes data into tables with rows and columns, and establishes relationships between these tables using keys.

67
Q

What is normalization?

A

Normalization is the process of organizing data in a database to minimize redundancy and dependency. It involves dividing larger tables into smaller, well-structured tables to improve data integrity and efficiency

68
Q

What is ACID in the context of databases?

A

ACID stands for Atomicity, Consistency, Isolation, and Durability. It is a set of properties that ensure reliable processing and integrity of database transactions

69
Q

What is a NoSQL database?

A

A NoSQL (Not only SQL) database is a type of database that provides a flexible schema and allows for storage and retrieval of unstructured or semi-structured data. It is often used for big data and real-time applications

70
Q

What is atomicity?

A

Atomicity guarantees that a transaction is treated as a single, indivisible unit of work. It ensures that all changes within a transaction are committed or none of them are. If any part of the transaction fails, the entire transaction is rolled back
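
A minimal sketch of all-or-nothing behaviour using Python's built-in `sqlite3` module. The `accounts` table and the `transfer` function are made up for illustration. The second update violates a CHECK constraint, so the first update is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(amount):
    """Move money from alice to bob; both updates commit or neither does."""
    try:
        with conn:   # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'bob'", (amount,)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'alice'", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(150)   # would overdraw alice, so the CHECK constraint fails
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer both balances are unchanged, even though the credit to bob had already been applied inside the transaction.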

71
Q

What is consistency in the context of ACID?

A

Consistency ensures that a transaction brings the database from one valid state to another. It enforces any predefined rules or constraints on the data, maintaining data integrity throughout the transaction

72
Q

What is isolation in the context of ACID?

A

Isolation ensures that concurrent transactions do not interfere with each other. It allows transactions to execute as if they were the only ones running, preventing interference such as dirty reads, non-repeatable reads, and phantom reads

73
Q

What is durability in the context of ACID?

A

Durability guarantees that once a transaction is committed, its changes are permanently saved and will survive any subsequent failures, such as system crashes or power outages. The changes become a permanent part of the database

74
Q

What is a document database?

A

A document database stores and retrieves semi-structured data in flexible, self-describing formats such as JSON or XML documents.
Example: MongoDB, Couchbase, Elasticsearch.

75
Q

What is a key-value database?

A

A key-value database stores and retrieves data as a collection of key-value pairs, providing fast and simple storage and retrieval operations.
Example: Redis, Amazon DynamoDB, Apache Cassandra.

76
Q

What is a columnar database?

A

A columnar database stores data in columns rather than rows, optimizing for efficient read operations and analytics.
Example: Apache HBase, Vertica. Apache Parquet is often listed alongside these, but it is a columnar file format rather than a database.

77
Q

What is a graph database?

A

A graph database models data as nodes, edges, and properties, making it suitable for representing and traversing complex relationships.
Example: Neo4j, Amazon Neptune, JanusGraph.

78
Q

What is a time-series database?

A

A time-series database specializes in storing and analyzing time-stamped data points, making it ideal for data with a temporal component.
Example: InfluxDB, Prometheus, TimescaleDB.

79
Q

What is the difference between a 301 redirect and a 302 redirect?

A

301 Redirect: Permanent redirect indicating a permanent move.
302 Redirect: Temporary redirect indicating a temporary move.

80
Q

how many room nights does booking have per day?

A

1.5M

81
Q

how many physical servers does booking have?

A

over 50K

82
Q

how many requests per second does the booking customer review system need to handle?

A

tens of thousands per second

83
Q

what is the p99 response time sla for the booking review system?

A

p99 < 50ms

84
Q

how many reviews does the booking system handle?

A

over 250 million reviews

85
Q

explain monotonicity

A

One of the properties of consistent hashing is monotonicity, which says that when the number of shards is increased, keys move only from old shards to new shards (no unnecessary rearrangement)
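
A small sketch of that property with a hash ring. The shard names, the virtual-node count, and the choice of md5 are assumptions made for illustration. After a fourth shard is added, every key that changes owner lands on the new shard, and no key moves between the old shards.

```python
import bisect
import hashlib


def _hash(value):
    """Stable 64-bit position on the ring (md5 used only for illustration)."""
    return int(hashlib.md5(value.encode()).hexdigest()[:16], 16)


class HashRing:
    def __init__(self, shards, vnodes=50):
        # Each shard is placed on the ring at several points (virtual nodes).
        self.ring = sorted((_hash(f"{s}#{i}"), s) for s in shards for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def shard_for(self, key):
        # First ring point clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]


keys = [f"user:{i}" for i in range(1000)]
before = HashRing(["shard-a", "shard-b", "shard-c"])
after = HashRing(["shard-a", "shard-b", "shard-c", "shard-d"])

moved = [k for k in keys if before.shard_for(k) != after.shard_for(k)]
# Monotonicity: every key that moved landed on the new shard.
moved_to_new_only = all(after.shard_for(k) == "shard-d" for k in moved)
```

With plain modulo hashing (`hash(key) % number_of_shards`), going from three to four shards would instead reshuffle most keys among all shards.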

86
Q

what is the volume of events handled by booking per day?

A

billions of events per day, streaming at more than 100 MB per second and adding up to more than 6 TB per day.