System Design Interview: An Insider’s Guide Flashcards

1
Q

What is DNS?

A

The Domain Name System: a dictionary of IP addresses, usually run by third parties. During a browser call it returns the IP address for a given domain name.

2
Q

What problem does DNS solve?

A

IP addresses are hard for humans to remember and use.

3
Q

What is IP?

A

It’s the numeric address of a device connected to the internet. Public IPs are unique across the internet; private IPs are not. Example: 192.168.0.3, where (with a typical /24 mask) the first three parts are the network portion and the last part is the number of the device within the network.
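
A minimal sketch of the split described above, using Python’s standard `ipaddress` module. The /24 mask is an assumption; the card only gives the address.

```python
import ipaddress

# Assumption: a /24 mask, so the first three octets identify the network.
iface = ipaddress.ip_interface("192.168.0.3/24")

network = iface.network                                       # network portion
host_number = int(iface.ip) - int(network.network_address)   # device number
is_private = iface.ip.is_private                              # private range?

print(network, host_number, is_private)
```

With a different mask (for example /16) the boundary between the network part and the device part moves, so the "first three parts" rule holds only for /24.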

4
Q

What is HTTP?

A

It’s an internet protocol: it defines the shape of the requests and responses that you handle.

5
Q

What does an HTTP request look like?

A

http_method path http_version
headers
blank line
optional body
*path is the URL without the parts known from context, such as ‘http://’ and the domain (the domain is usually sent in the ‘Host’ header of the request).
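
A small sketch of that layout in Python. The helper name `build_request` and the example URL are made up for illustration; the code only assembles the text of a request and does not send anything.

```python
from urllib.parse import urlsplit

def build_request(method: str, url: str, body: str = "") -> str:
    """Build the raw text of an HTTP/1.1 request for the given URL."""
    parts = urlsplit(url)
    # The path keeps only what is not known from context (no scheme, no domain).
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    headers = {"Host": parts.netloc}  # the domain travels in the Host header
    if body:
        headers["Content-Length"] = str(len(body.encode("utf-8")))
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line separates the headers from the optional body.
    return "\r\n".join(lines) + "\r\n\r\n" + body

raw = build_request("GET", "http://example.com/users?id=7")
print(raw)
```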

6
Q

What does an HTTP response look like?

A

http_version status_code code_description
headers
blank line
returned body
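
To make the layout concrete, here is a rough Python sketch that splits a raw response into those parts. The function name `parse_response` and the sample response are illustrative only, and the parsing is deliberately simplistic (it assumes a well-formed status line with a reason phrase).

```python
def parse_response(raw: str):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    # The first blank line separates the head from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body

sample = "HTTP/1.1 404 Not Found\r\nContent-Type: text/plain\r\n\r\nno such page"
version, status, reason, headers, body = parse_response(sample)
print(version, status, reason, headers, body)
```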

7
Q

What is the OPTIONS HTTP method?

A

It is usually sent before the main request (GET, POST, ...) to check which HTTP methods can be used on a resource. The answer comes in the response’s Allow header (or Access-Control-Allow-Methods for a CORS preflight), not in Accept.

8
Q

Is DNS used during each call made by the browser?

A

No, there is a DNS cache in the OPERATING SYSTEM, so repeated lookups of the same domain can be answered locally.

9
Q

What problem does NoSQL solve?

A

Low query performance and data whose shape is unstructured.

10
Q

What are examples of NoSQL databases?

A

MongoDB, Redis, Elasticsearch, Apache Cassandra

11
Q

What is scaling?

A

Scaling is the process of adding more power to your infrastructure.

12
Q

What problem does vertical scaling solve?

A

The application does not have enough memory/CPU.

13
Q

How does vertical scaling solve the problem?

A

By increasing the CPU/memory of a specific machine.

14
Q

What problems does horizontal scaling solve?

A
  • It’s impossible to add unlimited CPU/memory to a single machine
  • A single machine is a single point of failure: what happens when it fails?
15
Q

How does horizontal scaling solve the problem?

A

By adding more servers.

16
Q

What is failover?

A

It’s the process of switching from a failed (dead) server to a live one.

17
Q

What is system redundancy?

A

It’s the capacity of a system to survive broken elements, for example by keeping a spare server running as a backup or by using Kubernetes to quickly set up a new server.

18
Q

What is a load balancer?

A

It’s a unit in the system which receives the requests addressed to the server and distributes them across the machines.
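
One possible way to distribute requests is round-robin, sketched below in Python. The card does not name a strategy, so round-robin, the class name `RoundRobinBalancer` and the server addresses are all assumptions made for illustration.

```python
import itertools

class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self, request):
        # Pick the next server in the rotation and pair it with the request.
        server = next(self._cycle)
        return server, request

# Hypothetical private addresses of two machines behind the balancer.
lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2"])
targets = [lb.route(f"req-{i}")[0] for i in range(4)]
print(targets)
```

A real load balancer would also track server health, so that failed machines are skipped.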

19
Q

What problems does a load balancer solve?

A
  • a machine clogged with handling requests
  • failed machines
20
Q

How do servers behind a load balancer communicate with each other?

A

They’re all on the same network.

21
Q

What is data replication?

A

It’s keeping copies of the data on several DB servers: a master DB server with slave DB servers that copy its data.

22
Q

What problem does data replication solve?

A

A failed DB server.

23
Q

Which DB operations does the master server support in DB replication?

A

Only writes, unless there are no slaves; in that case it handles both writes and reads.

24
Q

Which DB operations does a slave server support in DB replication?

A

Only reads.

25
Q

Why are there more slaves than masters?

A

In a typical system there are more read operations than writes.

26
Q

What are the pros of a data replication system?

A
  • better performance (more DB servers can handle more queries)
  • better reliability (any failed DB server can be replaced by another)
27
Q

What happens when the master DB server is down?

A

One of the slaves becomes the new master server.

28
Q

What is caching?

A

It’s an in-memory store which serves responses faster than the SQL server.

29
Q

What problem does caching solve?

A

Long waiting times for expensive responses.

30
Q

What is the read-through cache strategy?

A

It’s a strategy used only for systems which fetch (read) something. On each request, the cache is checked for the response first. If it is missing, the real request is made, and the result is saved in the cache and returned to the caller. Next time the response is returned from the cache.
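
The flow above can be sketched in a few lines of Python. The class name `ReadThroughCache` and the `slow_lookup` function, which stands in for the "real request", are hypothetical.

```python
class ReadThroughCache:
    """Checks the cache first; on a miss, loads from the source and stores it."""

    def __init__(self, load_from_source):
        self._load = load_from_source  # the expensive "real request"
        self._store = {}

    def get(self, key):
        if key in self._store:         # cache hit: answer immediately
            return self._store[key]
        value = self._load(key)        # cache miss: make the real request
        self._store[key] = value       # save the result for next time
        return value

calls = []

def slow_lookup(key):
    # Stand-in for a slow database or API call; records every invocation.
    calls.append(key)
    return key.upper()

cache = ReadThroughCache(slow_lookup)
first = cache.get("user:1")    # miss: goes to slow_lookup
second = cache.get("user:1")   # hit: served from the cache
print(first, second, calls)
```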

31
Q

Why use an expiration policy in caching?

A

To balance the two extremes. If cached data is refreshed too often (say every 10 seconds), the cache is barely used for what it is good at. If it is refreshed too rarely (say every 10 months), the risk that the data is stale is high.

32
Q

What happens when the caching server is down?

A

Either another cache server should be available, or we accept that losing the cache is not a big problem.

33
Q

What is an eviction policy?

A

A cache has a limited size. When it is close to full, some data must be removed to make room for new entries. There are several strategies, such as LRU (least recently used), in which the least recently used value is deleted.
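
A minimal LRU sketch in Python, built on the standard `collections.OrderedDict`. The class name `LRUCache` and the tiny capacity are chosen only for illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" becomes the most recently used entry
cache.put("c", 3)   # the cache is full, so "b" is evicted
```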

34
Q

What is a CDN?

A

It’s a network of geographically distributed cache servers holding the static content of pages.

35
Q

What problem does a CDN solve?

A

Long waiting times for responses with static content.

36
Q

How does a CDN solve the problem of long requests?

A

The execution time of a request depends on the DISTANCE between the client computer and the server. Using closer servers can be a significant improvement.

37
Q

What static content is kept on CDN servers?

A

Images, CSS and JavaScript files

38
Q

What is a cookie?

A

Data stored in the web browser. It is sent in the “Set-Cookie” header from server to client and in the “Cookie” header from client to server.
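
As a sketch, both headers can be produced and parsed with Python’s standard `http.cookies` module. The cookie names and values (`sessionId`, `theme`) are made up for the example.

```python
from http.cookies import SimpleCookie

# Server -> client: the server emits a Set-Cookie header.
response_cookies = SimpleCookie()
response_cookies["sessionId"] = "abc123"
response_cookies["sessionId"]["path"] = "/"
response_cookies["sessionId"]["httponly"] = True
set_cookie_header = response_cookies.output()

# Client -> server: the browser sends its cookies back in one Cookie header.
request_cookies = SimpleCookie()
request_cookies.load("sessionId=abc123; theme=dark")
parsed = {name: morsel.value for name, morsel in request_cookies.items()}

print(set_cookie_header)
print(parsed)
```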

39
Q

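How many cookie key-value pairs can cookie headers carry?

A

Multiple. The “Cookie” header can list many pairs separated by “; ”, and a response can include several “Set-Cookie” headers.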

40
Q

When is a cookie added to a request to a specific domain?

A

The browser adds the matching cookies to EACH request it sends to that domain.

41
Q

Can a cookie live forever?

A

In principle yes, but usually an expiration date is set for the cookie.

42
Q

What is CSRF?

A

A situation in which a “bad guy page” quietly sends a request to a domain for which we probably hold a cookie.

Example: the domain is our bank, and the “bad guy page” sends a request that transfers money to the hacker. The attacker is not authenticated. Unfortunately, cookies are AUTOMATICALLY sent with every request to the cookie’s domain, so the cookie from our banking session is added to the request and we lose money.

43
Q

What is a session?

A

It’s a way to remember that a user has been authenticated: the backend creates a record in the DB and returns its sessionId to the client as a cookie. From then on, each request from the client must carry that cookie.
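
A toy Python sketch of that flow. The in-memory `sessions` dict stands in for the DB table, and the function names `log_in` and `current_user` are hypothetical; a real backend would also handle expiry and secure cookie flags.

```python
import secrets

sessions = {}  # stands in for the sessions table in the DB

def log_in(user_id: str) -> str:
    """Create a session record and return the Set-Cookie header for it."""
    session_id = secrets.token_hex(16)            # random, hard to guess
    sessions[session_id] = {"user_id": user_id}
    return f"Set-Cookie: sessionId={session_id}; HttpOnly"

def current_user(cookie_header: str):
    """Look up the user from the value of a request's Cookie header."""
    pairs = dict(
        pair.strip().split("=", 1)
        for pair in cookie_header.split(";")
        if "=" in pair
    )
    record = sessions.get(pairs.get("sessionId", ""))
    return record["user_id"] if record else None

header = log_in("alice")
session_id = header.split("sessionId=")[1].split(";")[0]
print(current_user(f"sessionId={session_id}"))
```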

44
Q

What is back-of-the-envelope estimation?

A

It’s a quick, rough estimate of a “system solution” made with minimal detail and a few assumptions, often using simple math and logical reasoning. It should be small enough to fit on the back of an envelope, hence the name.

45
Q

What is “power of two” in the context of system calculations?

A

Data volume is measured in units whose sizes correspond to 2^x. This makes the crucial units easy to remember.

46
Q

What is the basic unit in computers?

A

One byte; it contains 8 bits.

47
Q

What is a kilobyte in ‘power of two’?

A

2^10 = 1,024 bytes (about a thousand)

48
Q

What is a megabyte in ‘power of two’?

A

2^20 bytes (about a million)

49
Q

What is a gigabyte in ‘power of two’?

A

2^30 bytes (about a billion)

50
Q

What is a terabyte in ‘power of two’?

A

2^40 bytes (about a trillion)
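
The four units from the cards above can be checked with a few lines of Python. The `UNITS` table name is arbitrary.

```python
# Exponent of two for each unit, as listed on the cards.
UNITS = {"KB": 10, "MB": 20, "GB": 30, "TB": 40}

sizes = {name: 2 ** exponent for name, exponent in UNITS.items()}

for name, size in sizes.items():
    print(f"1 {name} = 2^{UNITS[name]} = {size:,} bytes")
```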

51
Q

In which units do we measure the ‘availability’ of a system?

A

As the percentage of time the system is available. 99% means the system can be down about 15 minutes per day (14.4 to be exact); 99.99% means about 9 seconds per day.
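
The arithmetic behind those figures, as a short Python sketch. The function name `downtime_per_day` is made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def downtime_per_day(availability_percent: float) -> float:
    """Allowed downtime in seconds per day for a given availability."""
    return SECONDS_PER_DAY * (1 - availability_percent / 100)

two_nines = downtime_per_day(99)       # roughly 864 s, i.e. 14.4 minutes
four_nines = downtime_per_day(99.99)   # roughly 8.64 s

print(round(two_nines / 60, 1), "minutes;", round(four_nines, 2), "seconds")
```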

52
Q

Divide the system design interview into parts

A
  1. Q&A
  2. high-level design
  3. detailed design
  4. wrap up
53
Q

How long should the Q&A take in a system design interview?

A

3-10 minutes

54
Q

How long should the ‘high-level design’ take in a system design interview?

A

10-15 minutes

55
Q

How long should the ‘detailed design’ take in a system design interview?

A

10-25 minutes

56
Q

What should you do during the ‘high-level design’ part of a system design interview?

A
  • draw an initial blueprint of the design
  • do back-of-the-envelope calculations
  • go through a FEW concrete use cases
57
Q

What does a rate limiter do?

A

It controls the rate of traffic that a client or service is allowed to send.

58
Q

What are the pros of using a rate limiter?

A
  • prevents DoS attacks
  • reduces costs (fewer requests mean less money spent)
59
Q

Why is it worth asking about a “distributed environment” during a system design interview?

A

If something works in a “distributed environment”, the system must be prepared for adding new nodes and services and for scaling. It usually also hints at some caching.

60
Q

Why is implementing a rate limiter on the client side a bad idea?

A
  • client requests can easily be forged by malicious actors
  • in some cases we have no control over the client (UI) implementation
61
Q

Where can you put a rate limiter in the system?

A
  • client side
  • middleware (gateway)
  • server side
62
Q

When is using a commercial API gateway as a rate limiter better than building your own?

A

Building your own rate limiter takes time. If engineering resources are lacking, use a commercial one.

63
Q

Which algorithm can be used to implement a rate limiter?

A

Token bucket, for example.

64
Q

How does the “token bucket” algorithm work?

A

There is a container of fixed size which holds tokens. Each request entering the system takes one token. If the container is empty, the request is dropped. Tokens are added back at a fixed interval, but never beyond the container’s original size.

65
Q

With the “token bucket” algorithm, how many buckets does the system need?

A

It depends on the system. If the requirements are detailed, like “5 new posts per day” or “6 queries per IP”, a separate bucket has to be created for each user (or IP) and each API.

66
Q

How many parameters does the “token bucket” algorithm have?

A

Two:
- bucket size
- refill rate
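
A rough Python sketch of a token bucket with exactly these two parameters. It is one possible variant, not a reference implementation: it refills continuously from the elapsed time rather than on a timer tick, and the injectable `clock` argument is an addition made only so the example runs deterministically.

```python
import time

class TokenBucket:
    def __init__(self, bucket_size: int, refill_rate: float, clock=time.monotonic):
        self.bucket_size = bucket_size     # maximum number of tokens
        self.refill_rate = refill_rate     # tokens added per second
        self._clock = clock
        self._tokens = float(bucket_size)  # the bucket starts full
        self._last_refill = clock()

    def _refill(self):
        now = self._clock()
        elapsed = now - self._last_refill
        # Add tokens for the elapsed time, never exceeding the bucket size.
        self._tokens = min(self.bucket_size, self._tokens + elapsed * self.refill_rate)
        self._last_refill = now

    def allow(self) -> bool:
        """Take one token if available; otherwise the request is dropped."""
        self._refill()
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

fake_now = [0.0]  # fake clock so the example does not depend on real time
bucket = TokenBucket(bucket_size=2, refill_rate=1, clock=lambda: fake_now[0])
results = [bucket.allow(), bucket.allow(), bucket.allow()]  # third is dropped
fake_now[0] = 1.0              # one second later, one token is back
after_refill = bucket.allow()
print(results, after_refill)
```

For per-user or per-IP limits, one such bucket would be kept per key, for example in a dictionary keyed by user ID.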

67
Q

What status code should be returned for a request blocked by the rate limiter?

A

429 (Too Many Requests)

68
Q

What does a response from a system using a rate limiter contain?

A

Rate limit headers:
- X-Ratelimit-Remaining
- X-Ratelimit-Limit
- X-Ratelimit-Retry-After
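
For illustration, a blocked request might receive a response like the one built below. The helper name `rate_limited_response` and the numbers are assumptions; the header names are the ones listed above.

```python
def rate_limited_response(limit: int, retry_after_seconds: int) -> str:
    """Build the raw text of a 429 response carrying the rate limit headers."""
    lines = [
        "HTTP/1.1 429 Too Many Requests",
        f"X-Ratelimit-Limit: {limit}",                       # allowed calls per window
        "X-Ratelimit-Remaining: 0",                          # nothing left in the window
        f"X-Ratelimit-Retry-After: {retry_after_seconds}",   # seconds to wait
    ]
    return "\r\n".join(lines) + "\r\n\r\n"

raw = rate_limited_response(limit=100, retry_after_seconds=30)
print(raw)
```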

69
Q

How do you solve scaling for a rate limiter (in the context of caching rules in one place: the synchronization problem)?

A

Use a centralized data store such as Redis.
