FUNDAMENTALS Flashcards

1
Q

what is a system

A

An architecture or collection of technologies that serves a set of users to fulfill a set of requirements. It consists of the users, the requirements, and the components that serve the users.

2
Q

What is design

A

The process of understanding the user requirements, selecting the appropriate components, modules, and software technologies, and deciding how they will be connected and communicate with each other to meet the needs of the system.

3
Q

What is system design

A

System design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specific requirements. It involves creating a blueprint that guides the construction and implementation of a software or hardware system to ensure it meets business and technical needs.

4
Q

2 broad components of a system

A

Logical and tangible entities

4
Q

Examples of Tangible entities

A
  • Text, images, videos,..
  • mongoDB, Java
4
Q

Examples of logical entities

A

Data, database, applications, cache, message queues, infra

5
Q

what is a client-server architecture

A

client requests data from the server
Server responds to the client’s request

6
Q

thin client

A

When the core of the business logic sits on the server-side

7
Q

thick client

A

When the core of the business logic sits on the client-side, e.g. gaming

8
Q

2 - tier

A

client - 1 server

9
Q

3 - tier

A

client - server - database

10
Q

N - tier

A

client - CDN - load balancer - server - database

11
Q

what is a proxy server

A

A middleman, that communicates on behalf of a component/node

12
Q

Forward proxy

A

Sits between the client and the server, on the client side, and communicates with the server on behalf of the client, which gives the client anonymity. Controls and monitors traffic from clients.

13
Q

reverse proxy

A

Sits between the client and the server, on the server side, and responds to client requests on behalf of the server, which gives the server anonymity. Used for traffic control and monitoring, load balancing, caching, and DDoS attack mitigation.

14
Q

Proxy caveats

A

Reverse proxy: can become a single point of failure and a bottleneck
Forward proxy: can be used to bypass access restrictions

15
Q

Data representation in different layers
o Business
o Application
o Data stores(databases)
o Network
o Hardware

A

o Business: text, video, image
o Application: JSON, XML
o Data stores(databases): tables, lists, indexes
o Network: packets
o Hardware: bits

16
Q

Data stores types

A

o Databases
o Queues
o Caches
o Indexes

17
Q

Data-flow methods

A

o APIs
o Message queue
o Events

18
Q

Data generation sources

A

o Users: created by user directly
o Internal: created by the system
o Insights: created based on user information, actions

19
Q

Factors determining the data storage type

A

o Type of data
o Volume
o Consumption/retrieval
o Security

20
Q

database type factors

A

Vary by data type and structure

21
Q

database types

A

Relational
Non-relational
File
Network
Large datasets
time series
storage bucket

22
Q

Properties of relational databases

A

Schema
* Suited to data that is structured and relational
ACID

23
Q

meaning of ACID

A
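  • Atomicity: a transaction either completes fully or is rolled back entirely
  • Consistency: every transaction moves the database from one valid state to another, with all constraints satisfied
  • Isolation: concurrent transactions read and write independently, without interfering with each other
  • Durability: completed (committed) transactions survive permanently, even after a failure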
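A minimal sketch of atomicity, using Python's built-in sqlite3 module. The accounts table, the transfer helper, and the CHECK constraint are assumptions made up for this illustration. The credit runs first on purpose, so that a failing debit shows the earlier credit being rolled back with it.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single transaction."""
    try:
        # `with conn` commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK, the credit above is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer that would overdraw the source account returns False and leaves both balances exactly as they were.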
24
Q

Pros of relational databases

A
  • Easily design complex related data
  • Can enforce that required values are not null
  • Ensures schema constraints are followed
25
Q

Cons of relational databases

A
  • Horizontal scaling is complex
  • Schema changes, such as adding new columns, can be complex
26
Q

Examples of non-relational databases

A

Key-value store: works like a hash map
Column-based: Cassandra, HBase, ScyllaDB
Document-based
Search database: Elasticsearch

27
Q

properties of column based

A
  • Has tables and columns, but typically no full ACID guarantees
  • Midway of relational and document
  • Heavy writes, special reads
28
Q

Properties of document based

A
  • Supports heavy read/writes
  • No fixed schema; the downside is that fields can be missing or null
  • Traditionally no full ACID guarantees, so completed transactions cannot always be ensured
  • Benefits
    o Highly scalable
    o Sharding
    o Dynamic data flexibility
    o Special query operations/aggregation
29
Q

Elements of Application design

A

o Requirement
o Layer
o Tech stack
o Code structure/design pattern
o Data store instructions
o Performance/cost
o Deployment
o Monitoring
o Operational excellence/reliability

30
Q

what is an API

A

An interface through which different applications interact with each other

31
Q

Advantages of API

A

o Provides communication between different systems
o Abstraction
o Platform agnostic

32
Q

Examples of APIs

A

o Private
o Public
o Web APIs
o SDK/library

33
Q

API contracts

A

o RPC
o SOAP
o REST

34
Q

API factors

A

o API contracts
o Documentation
o Data format
o Security

35
Q

what is caching

A

A hardware or software component that stores data that is frequently accessed or expensive to compute, so it can be served faster

36
Q

what is cache invalidation

A

Removing or updating a cached value once it is out of date

37
Q

cache invalidation methods

A

o Replacing the old cached value with the updated data
o Keeping a TTL (time to live), the cache expiry time
o Removing the cached value so that the next read reloads it
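A small sketch of TTL-based invalidation, assuming an in-memory dictionary as the cache. The TTLCache class and its injectable clock parameter are hypothetical names chosen for this illustration; the clock is injectable only so that expiry can be demonstrated without sleeping.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        """Store or replace a value and restart its expiry timer."""
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry so the caller reloads fresh data.
            del self._store[key]
            return None
        return value

    def invalidate(self, key):
        """Explicitly remove a cached value, e.g. after the source changed."""
        self._store.pop(key, None)
```

Calling set again covers the "replace with updated data" method, invalidate covers explicit removal, and get covers expiry by TTL.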

38
Q

cache eviction strategies

A

o Evict existing keys when the cache limit is reached
o Least recently used (LRU)
o FIFO (first in, first out)
o Least frequently used (LFU)
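As an illustration of one of these strategies, here is a minimal least-recently-used sketch built on collections.OrderedDict. The LRUCache class name and its capacity parameter are assumptions for this example, not part of any particular cache product.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used key once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest use first, newest use last

    def get(self, key):
        """Return the value and mark the key as most recently used."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        """Insert or update a key, evicting the oldest one if needed."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # popitem(last=False) removes the least recently used entry.
            self._items.popitem(last=False)
```

A FIFO variant would be the same structure without the move_to_end calls, since insertion order alone would then decide what is evicted.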

39
Q

popular cache patterns

A

o Cache-aside strategy/ pattern
o Read through strategy/pattern
o Write around strategy/pattern
o Write back pattern

40
Q

describe Cache-aside strategy/ pattern

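A

o The cache never talks to the DB directly; the application code reads from the cache and, on a miss, loads from the DB and fills the cache
o Gives redundancy: the application can still read from the DB when the cache goes down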
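A rough sketch of the cache-aside read path, assuming plain dictionaries stand in for the cache and the DB. The CacheAsideReader class and the db_reads counter are hypothetical and exist only to make the behavior visible.

```python
class CacheAsideReader:
    """Application-side logic: the app talks to both the cache and the DB."""

    def __init__(self, db, cache):
        self.db = db          # stand-in for the database
        self.cache = cache    # stand-in for the cache
        self.db_reads = 0     # counts how often the DB was actually hit

    def get(self, key):
        """Read from the cache first; on a miss, load from the DB and cache it."""
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def update(self, key, value):
        """Write to the DB and invalidate the cached copy."""
        self.db[key] = value
        self.cache.pop(key, None)
```

Only the first read of a key reaches the DB; later reads are served from the cache until the key is updated.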

41
Q

Read through strategy/pattern

A

o Client – app/server – cache – DB
o Imposes extra latency on a cache miss, because the cache has to load the value from the DB

42
Q

Write around strategy/pattern

A

o Client – app/server – [reads] – cache
o Client – app/server – [writes] – DB
o The application writes to the DB and reads from the cache
o Used when the load is write heavy

43
Q

Write back pattern

A

o Client – app – cache – DB
o Writes go to the cache and are batched to the DB
o Used when the load is write heavy
o Tolerates short DB failures, since writes are buffered in the cache
o The cache becomes a single point of failure
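A minimal write-back sketch, assuming dictionaries stand in for the cache and the DB. The WriteBackCache class and its batch_size parameter are hypothetical names for this illustration; a real system would usually also flush on a timer.

```python
class WriteBackCache:
    """Buffers writes in the cache and flushes them to the DB in batches."""

    def __init__(self, db, batch_size):
        self.db = db                  # stand-in for the database
        self.batch_size = batch_size  # flush once this many keys are dirty
        self.cache = {}
        self.dirty = set()            # keys written but not yet persisted

    def write(self, key, value):
        """Write to the cache only; persist later as part of a batch."""
        self.cache[key] = value
        self.dirty.add(key)
        if len(self.dirty) >= self.batch_size:
            self.flush()

    def read(self, key):
        """Serve from the cache when possible, otherwise from the DB."""
        if key in self.cache:
            return self.cache[key]
        return self.db.get(key)

    def flush(self):
        """Persist every dirty key to the DB in one batch."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```

Until flush runs, the newest values exist only in the cache, which is why losing the cache can lose data.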

44
Q

Where is my cache placed?

A

Browser, forward proxy, reverse proxy, application, database

45
Q

What is REST?

A

o Representational State Transfer
  ▪ Transfers the state of data
o Guidelines to be followed for data exchange in a client-server architecture

46
Q

Guidelines of REST

A

o Client-server
o Cacheable
o Layered
o Stateless
  ▪ The server keeps no client session state between requests; each request carries everything needed to process it
o Uniform interface
o Code on demand

47
Q

Message queue

A
  • A process with a data structure in memory to store the messages, on the same or different machine
  • E.g. SQS, Kafka, RabbitMQ
  • Contains
    o Messages: short-sized data with instructions
48
Q

Components of a Message queue

A

o Producers
o Consumers
o Can handle multiple requests
o Can scale
  ▪ Increase consumers
o Requests remain queued in case of consumer outage or failure
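A small sketch of producers and consumers, using Python's standard queue and threading modules as an in-process stand-in for a real message queue. The run_workers function and the upper-casing "work" are assumptions for this illustration.

```python
import queue
import threading


def run_workers(messages, num_consumers):
    """Put messages on a queue and process them with several consumers."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def consumer():
        while True:
            msg = q.get()
            if msg is None:  # sentinel: tells this consumer to stop
                q.task_done()
                break
            with lock:
                results.append(msg.upper())  # stand-in for real processing
            q.task_done()

    workers = [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for worker in workers:
        worker.start()

    # The producer side: messages wait in the queue until a consumer is free.
    for message in messages:
        q.put(message)
    for _ in workers:
        q.put(None)

    q.join()
    for worker in workers:
        worker.join()
    return results
```

Scaling here means raising num_consumers; each message is still handled by exactly one consumer, and the order of results is not guaranteed.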

49
Q

Ordering & Consumption

A

o One-to-one consumption
o Ordering is possible with FIFO queues, but a failed message blocks processing of the messages behind it
o Unordered queue
  ▪ Retry/dead-letter queue
o One to many(publish-subscribe)

50
Q

Publish-subscribe pattern

A
  • Publishers send messages to an input channel; the messages are delivered to subscribers through output channels
  • A message broker sits between input and output and splits messages by topic
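A toy in-process broker can make the topic split concrete. The Broker class, its subscribe and publish methods, and the topic names below are hypothetical; real brokers add persistence, retries, and network transport.

```python
from collections import defaultdict


class Broker:
    """Routes each published message to every subscriber of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callback to receive messages published to a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to all subscribers; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)
```

Unlike a one-to-one queue, every subscriber of a topic receives its own copy of each message.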
51
Q

Factors of pub-sub pattern

A

o No message ordering by default; ordering can be achieved with a try/retry pattern
o Multiple message consumers
o Poison (wrongly formatted) messages should be handled
o Duplicate messages should be avoided

52
Q

Use cases of message queues and pub-sub

A

o Async workflow
  ▪ Send order confirmation
  ▪ Process item for packaging
  ▪ Generate invoice
  ▪ Initiate seller workflow
o Decoupling
  ▪ Capturing user actions for analytics
o Load balancing
  ▪ Increase subscribers to handle the load accordingly
o Deferred processing
  ▪ Queue tasks to be performed at off-peak hours
o Data streaming

53
Q

performance metrics

A
  • Throughput
  • Bandwidth
  • Latency
  • Response time
54
Q

Throughput

A

o The amount of work done per unit of time
o Can be increased by increasing system capacity or the number of systems
  ▪ E.g. number of API calls per unit of time

55
Q

Bandwidth

A

The capacity of the transport path

56
Q

Response time

A

Total time from sending a request to receiving its response, including processing time

57
Q

Latency

A

The delay before a response starts to arrive, e.g. time spent in transit or waiting

58
Q

Performance metrics of applications

A

o API response time
o Throughput of APIs
o Latency
o Error occurrences
o Bugs/defects in the code

59
Q

Performance metrics of databases

A

o Query times, number of queries executed per unit time
  ▪ Depends on the database and schema structure
o Memory management

60
Q

Performance metrics of CACHE

A

o Write latency
o Number of cache evictions and invalidations (a high rate is a warning sign)
o Memory of cache instance

61
Q

Performance metrics of MESSAGE QUEUES

A

o Rate of production and consumption
o Fraction of stale or unprocessed messages
o The number of consumers affects bandwidth and throughput

62
Q

Performance metrics of WORKERS

A

o Job completion time
o Resources used in processing

63
Q

Performance metrics of SERVER INSTANCES

A

o Memory, CPU

64
Q

PERFORMANCE METRICS TOOLS

A

o New Relic, Datadog

65
Q

Understanding faults

A

o Slow or failed request/response due to a bug
o Server overload

66
Q

Fault tolerance

A

o Replicate server instances
o Proper code testing
o Handling errors with proper responses

67
Q

Types of faults

A

o Transient fault
  ▪ Occurs briefly and is hard to locate
o Permanent fault
  ▪ Continues until fixed
  ▪ Easily identifiable

68
Q

What is scaling

A

o Handling increased load
o Should not add complexity
o Performance should not take a hit as load increases

69
Q

Forms of scaling

A

o Vertical
  ▪ For smaller systems, by increasing server capacity
o Horizontal
  ▪ For larger systems, by adding more servers

70
Q

what is database replication

A

o To have a copy of some data
o Primary/secondary database

71
Q

why replication

A

o Redundancy
o Latency
o Throughput

72
Q

Understanding replication lag

A

The delay between a write being applied on the primary and that write becoming visible on a replica.

73
Q

Replicating synchronously

A

o A write request is acknowledged only after all replicas have written it, so replication lag is zero
o Downside is a performance hit
o The acknowledgement is blocked if one replica goes down

74
Q

Asynchronous replication

A

o The primary acknowledges the write right away and sends it to all replicas, which ack in the background
o Replicas can be inconsistent if one fails or falls behind
o High performance, since replication happens in the background

75
Q

Semi-synchronous

A

The primary sends the write request to the replicas and acknowledges it once at least one replica has acked

76
Q

CAP THEOREM acronym

A

CONSISTENCY, AVAILABILITY, PARTITION TOLERANCE

77
Q

what is a distributed system ?

A

A system consisting of a group of machines working in coordination so as to appear as a single coherent system to the end user.

78
Q

Meaning of consistency in CAP

A

For any read that happens after the latest write, all nodes should return the value of that latest write.

79
Q

Meaning of availability in CAP

A

Any available node in the system should give a non-error response to any read request, without the guarantee of returning the latest write.

79
Q

Partition tolerance

A

The system keeps responding to all reads and writes even if the communication channel between nodes is broken or partitioned.

81
Q

Vertical partitioning

A

Partitioning database by storing columns in separate partitions.

81
Q

Horizontal partitioning

A

storing rows into separate partitions (sharding)

81
Q

What is database sharding

A

Storing the data into multiple database servers.

82
Q

Physical shards

A

partitions on separate physical servers

82
Q

Algorithmic sharding

A

The app knows which db to contact

82
Q

Sharding strategies/considerations

A

Algorithmic
Dynamic

82
Q

Logical shards

A

Partitioning users by user IDs into partitions

82
Q

Advantages of physical sharding

A
  • Query directed to particular shard, hence faster
  • Can be placed into different geo locations
  • Avoids single point of failure
83
Q

Dynamic sharding

A

App talks to a module, that looks up where to send the query

84
Q

Drawbacks of sharding

A
  • Performance suffers when data is poorly partitioned and one partition is overloaded
  • Reverting to a non-sharded architecture is challenging
  • Join queries across shards are very expensive
  • Not supported by all DBMSs by default
85
Q

Key-based sharding

A
  • The shard is determined by applying a hash function to a key taken from a column value.
  • Use keys whose values do not change frequently; a combination of columns can be used as well if static.
  • Data is evenly distributed
  • The challenge is adding shards, which requires redistributing the data
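A minimal sketch of key-based sharding, assuming SHA-256 as the hash function and a simple modulo over the shard count. The shard_for and moved_keys helpers are hypothetical names for this illustration.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash and modulo."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_keys(keys, old_shards, new_shards):
    """Count keys whose shard changes when the shard count changes."""
    return sum(
        1 for key in keys
        if shard_for(key, old_shards) != shard_for(key, new_shards)
    )
```

Calling moved_keys(range(1000), 4, 5) shows the redistribution problem: with plain modulo hashing, most keys land on a different shard once a fifth shard is added.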
86
Q

Range-based sharding

A
  • Dividing data into shards by a given range, such as a date range
  • Queries are then routed to shards by range
  • Can also be split by price range for e-commerce
  • The same database schema is used for all shards
  • No hash function, so more machines can be added easily
  • Some ranges can be overloaded during high-activity periods (hotspots)
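A short sketch of range-based routing, using the standard bisect module. The RangeRouter class and the price boundaries in the usage note are assumptions for this illustration.

```python
import bisect


class RangeRouter:
    """Routes keys to shards by sorted split points.

    With boundaries [100, 500]: shard 0 holds keys below 100,
    shard 1 holds 100 to 499, and shard 2 holds 500 and above.
    """

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)

    def shard_for(self, key):
        """Return the index of the shard whose range contains the key."""
        return bisect.bisect_right(self.boundaries, key)

    def shards_for_range(self, low, high):
        """Return every shard a range query from low to high must visit."""
        return list(range(self.shard_for(low), self.shard_for(high) + 1))
```

For example, RangeRouter([100, 500]).shards_for_range(50, 150) touches only the first two shards, which is why range queries suit this scheme.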
87
Q

use-cases of Range-based sharding

A

o Dividing by usernames, alphabetically
o Useful when range queries are required.

88
Q

Directory-based sharding

A
  • In dynamic sharding, keys are stored in a lookup table.
  • In directory-based sharding, shards are created by fixed zones; adding shards does not alter existing shards.
  • The lookup table adds latency
  • The lookup table is a point of failure; a solution could be a replica of the lookup table
89
Q

what is hashing

A
  • Fast access
  • Uses a hash function to generate a key used to store and access a given value in the database
  • With simple modulo hashing, adding or removing servers causes most keys to be reassigned
90
Q

consistent hashing

A
  • Addresses the problem of remapping existing keys when servers are added or removed
  • Circular representation of servers and keys
  • Replicas of servers (virtual nodes) are placed on the circle
    more: https://www.toptal.com/big-data/consistent-hashing
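A compact sketch of a hash ring, assuming SHA-256 for positions and 100 virtual nodes per server. The HashRing class, the vnodes parameter, and the server names are hypothetical; this is an illustration of the idea, not the implementation from the linked article.

```python
import bisect
import hashlib


def _position(value):
    """Place a string on the ring using a stable hash."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hashing with virtual nodes (replicas of each server)."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        """Insert the server's virtual nodes into the sorted ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{server}#{i}"), server))

    def remove(self, server):
        """Drop all virtual nodes that belong to the server."""
        self._ring = [(p, s) for p, s in self._ring if s != server]

    def lookup(self, key):
        """Walk clockwise from the key's position to the next virtual node."""
        if not self._ring:
            raise LookupError("ring is empty")
        index = bisect.bisect_right(self._ring, (_position(key), ""))
        return self._ring[index % len(self._ring)][1]
```

When a server is added, the only keys that move are the ones the new server takes over; every other key keeps its previous owner.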
91
Q

Foundations of system design interviews

A

functional, non-functional requirements

92
Q

elements of functional requirements

A

based on user journey or stories

93
Q

elements of non-functional requirements

A

o Traffic
o Users
o Availability
o Throughput
o Latency
o Choice/No. of resources – capacity estimation

94
Q

CAPACITY ESTIMATION TIPS

A
  • Availability
  • Number of transactions
95
Q

capacity relationships,
Bits
Bytes
Kb
MB
GB

A
96
Q

timing relationships

A

1 million transactions/day ≈ 12/sec
1 million transactions/day ≈ 700/min
1 million transactions/day ≈ 42,000/hour
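The conversions can be checked with a few lines of arithmetic. The rates helper below is a hypothetical name for this illustration; it divides a daily volume by the number of seconds, minutes, and hours in a day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MINUTES_PER_DAY = 24 * 60       # 1,440
HOURS_PER_DAY = 24


def rates(per_day):
    """Convert a daily transaction count into per-second/minute/hour rates."""
    return {
        "per_second": per_day / SECONDS_PER_DAY,
        "per_minute": per_day / MINUTES_PER_DAY,
        "per_hour": per_day / HOURS_PER_DAY,
    }
```

For 1 million transactions a day this gives roughly 11.6 per second, 694 per minute, and 41,667 per hour, which round to the usual interview shortcuts of about 12, 700, and 42,000.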

97
Q

5 MISTAKES TO AVOID

A
  • Not understanding the requirements properly; you can ask directly (functional or non-functional requirements)
  • Not estimating the load
  • Not discussing the tradeoffs
  • Not finding faults with your own solution
  • Bluffing when you do not know something; it is better to say "I don't know"