System Design Fundamentals Flashcards
What is a client?
A machine or process that requests data or a service from a server.
Note that a single machine can act as both a client and a server, depending on who initiates the request.
What is a server?
A machine or process that provides data or a service to a client, usually by listening for incoming network calls.
What is an IP address?
An address given to each machine connected to a network that uses the Internet Protocol, such as the public internet. IPv4 addresses consist of four numbers separated by dots (a.b.c.d), where each number is between 0 and 255. Some special values:
- 127.0.0.1 - localhost
- 192.168.x.y - your private network
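As a small illustration, Python's standard `ipaddress` module can tell loopback (localhost) and private-network addresses apart from other addresses. The `describe` helper and the sample addresses are made up for this sketch.

```python
import ipaddress

def describe(addr: str) -> str:
    """Classify an IPv4 address (hypothetical helper for illustration)."""
    ip = ipaddress.IPv4Address(addr)
    # Check loopback first: loopback addresses also count as private.
    if ip.is_loopback:
        return "loopback"
    if ip.is_private:
        return "private"
    return "public"

for sample in ("127.0.0.1", "192.168.1.10", "8.8.8.8"):
    print(sample, "->", describe(sample))
```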
What is IP?
Internet Protocol. This network protocol outlines how almost all machine-to-machine communication in the world happens. Other protocols like TCP, UDP, and HTTP are built on top of IP.
What is TCP?
A network protocol built on top of IP. It allows for ordered, reliable data delivery between machines over the public internet by creating a connection.
TCP is usually implemented in the operating system kernel, which exposes sockets that applications use to stream data through an open connection.
What is HTTP?
The HyperText Transfer Protocol is a very common network protocol implemented on top of TCP. Clients make HTTP requests and servers answer with responses.
Requests usually follow this schema: host, port, method (GET, POST, ...), headers, body.
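To make the request fields concrete, here is a minimal sketch that models a request and renders it roughly the way HTTP/1.1 sends it over the wire. The `HttpRequest` class, its `path` field, and `to_wire` are assumptions for illustration, not part of any library.

```python
from dataclasses import dataclass, field

@dataclass
class HttpRequest:
    host: str
    port: int
    method: str
    path: str = "/"  # assumed field: the resource being requested
    headers: dict = field(default_factory=dict)
    body: str = ""

    def to_wire(self) -> str:
        """Render the request approximately as HTTP/1.1 text."""
        headers = {"Host": f"{self.host}:{self.port}", **self.headers}
        if self.body:
            headers["Content-Length"] = str(len(self.body.encode("utf-8")))
        lines = [f"{self.method} {self.path} HTTP/1.1"]
        lines += [f"{name}: {value}" for name, value in headers.items()]
        # A blank line separates the headers from the body.
        return "\r\n".join(lines) + "\r\n\r\n" + self.body

request = HttpRequest("example.com", 80, "GET", headers={"Accept": "text/html"})
print(request.to_wire())
```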
What is latency?
The time it takes for a certain operation to complete in a system. It is most often measured as a time duration, such as milliseconds or seconds.
What is throughput?
The number of operations that a system can handle properly per unit of time. For example, the throughput of a server is often measured in requests or queries per second (RPS or QPS).
What is availability?
The odds of a particular service being up and running at any point in time, usually measured as a percentage. A server with 99% availability is operational 99% of the time (this is called two nines of availability).
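A quick way to get a feel for the "nines" is to convert an availability percentage into allowed downtime. The sketch below assumes a 365-day year; the function name is made up for illustration. With 99% availability it works out to roughly 87.6 hours of downtime per year.

```python
def downtime_per_year_hours(availability_percent: float) -> float:
    """Hours per year a service may be down at the given availability."""
    hours_per_year = 365 * 24  # assumption: non-leap year
    return hours_per_year * (1 - availability_percent / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_per_year_hours(pct)
    print(f"{pct}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) of downtime per year")
```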
What is high availability?
Describes systems with particularly high levels of availability, typically five nines (99.999%) or more.
What is redundancy?
The process of replicating parts of a system in an effort to make it more reliable.
What is SLA/SLO?
SLA is short for service level agreement: a collection of guarantees given to a customer by a service provider. SLAs typically make guarantees about a system's availability and are made up of one or more SLOs.
SLO is short for service level objective: a single guarantee within an SLA, for example a target availability percentage.
What are some types of cache eviction policies?
FIFO (first in, first out), LRU (least recently used), LFU (least frequently used)
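As an example of one of these policies, here is a minimal LRU cache sketch built on `collections.OrderedDict`. The class name and its interface are assumptions for illustration; Python also ships `functools.lru_cache` for memoizing function calls.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # evicts "b"
print(cache.get("b"), cache.get("a"), cache.get("c"))
```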
What is a Content Delivery Network?
A CDN is a third-party service that acts like a cache for your servers. Web apps can be slow for users in a particular region; a CDN has servers all around the world, so the latency to a CDN server is usually lower than the latency to your own servers. Two popular CDNs are Cloudflare and Google Cloud CDN.
What is a forward proxy?
A server that sits between clients and servers and acts on behalf of the CLIENT, typically to mask the client's identity (IP address).
What is a reverse proxy?
A server that sits between clients and servers and acts on behalf of the SERVER, typically for logging, load balancing, or caching.
What is a load balancer?
A type of reverse proxy that distributes traffic across servers.
What is SHA?
“Secure Hash Algorithms”: SHA is a family of cryptographic hash functions used across the industry. SHA-2 (for example SHA-256) is the most widely deployed today, and SHA-3 is the newest member of the family.
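Python's standard `hashlib` module exposes several SHA variants. The sketch below hashes an arbitrary sample message with SHA-256 and SHA3-256; both return a 256-bit digest, shown here as 64 hex characters.

```python
import hashlib

message = b"hello world"  # arbitrary sample input

sha256_digest = hashlib.sha256(message).hexdigest()
sha3_digest = hashlib.sha3_256(message).hexdigest()

print("SHA-256: ", sha256_digest)
print("SHA3-256:", sha3_digest)
```

The same input always produces the same digest, while a small change to the input produces a completely different one.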
What is an ACID transaction?
Atomicity: the operations in a transaction either all succeed or all fail.
Consistency: a transaction cannot bring the DB to an invalid state.
Isolation: executing multiple transactions concurrently has the same effect as executing them sequentially.
Durability: any committed transaction is written to non-volatile storage.
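Atomicity and consistency can be observed with Python's built-in `sqlite3` module. In this sketch, the `accounts` table, the names, and the `transfer` helper are hypothetical. A transfer that would make a balance negative violates a CHECK constraint, so the whole transaction is rolled back, including the update that had already succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates are committed, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
print(ok, failed, balances(conn))
```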
What is Prometheus?
A popular open-source time series database, typically used for monitoring.
What is sharding?
Sometimes called data partitioning, sharding is the act of splitting a database into two or more pieces called shards. It is typically done to increase the throughput of your database.
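One common way to decide which shard holds a record is to hash a sharding key and take the result modulo the number of shards. This is a minimal sketch under that assumption; `NUM_SHARDS`, the key format, and `shard_for` are made up for illustration. It uses `hashlib` rather than the built-in `hash()`, because string hashes from `hash()` vary between Python processes.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for this sketch

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding key to a shard index in a stable, repeatable way."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

for user_id in ("user-1", "user-2", "user-3"):
    print(user_id, "-> shard", shard_for(user_id))
```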
What is a hot spot?
When distributing a workload across servers, some servers might receive more traffic than others, creating a hot spot. This can happen if your sharding key or hashing function is suboptimal, or if the workload is naturally skewed.
What is leader election?
The process by which nodes in a cluster (for instance, servers in a set of servers) elect a so-called “leader” amongst them, responsible for the primary operations that these nodes provide. Well-known consensus algorithms used for this include Paxos and Raft.
What is polling?
The act of fetching a resource or piece of data regularly at an interval to make sure the data is not too stale.
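A polling loop can be sketched in a few lines. The `poll` helper and the `fetch` callable are hypothetical; a real client would usually poll indefinitely and handle errors, whereas this version stops after a fixed number of polls so that it terminates.

```python
import time

def poll(fetch, interval_seconds: float, max_polls: int):
    """Call fetch() every interval_seconds, max_polls times, and collect the results."""
    results = []
    for attempt in range(max_polls):
        results.append(fetch())
        if attempt < max_polls - 1:
            time.sleep(interval_seconds)  # wait before asking again
    return results

# Usage: poll a stand-in data source three times, 0.1 seconds apart.
readings = poll(lambda: time.time(), interval_seconds=0.1, max_polls=3)
print(readings)
```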
What is streaming?
In networking, it usually refers to the act of continuously getting a feed of information from a server by keeping an open connection between two machines or processes.
What does the Pub/Sub pattern typically guarantee?
- At-least-once delivery
- Persistent storage
- Ordering of messages
What is an idempotent operation?
An operation that has the same ultimate outcome regardless of how many times it’s performed.
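The difference is easy to see side by side. In this sketch (the `order` dictionary and both function names are made up), setting a status is idempotent, while appending an item is not.

```python
def set_status(order: dict, status: str) -> dict:
    """Idempotent: repeating the call leaves the order in the same state."""
    order["status"] = status
    return order

def add_item(order: dict, item: str) -> dict:
    """Not idempotent: every call appends another copy of the item."""
    order.setdefault("items", []).append(item)
    return order

order = {}
set_status(order, "paid")
set_status(order, "paid")  # no further effect
add_item(order, "book")
add_item(order, "book")    # the order now contains two books
print(order)
```

This matters when a message may be delivered more than once: retrying an idempotent operation is safe, while retrying a non-idempotent one changes the outcome.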
What is HTTPS?
HyperText Transfer Protocol Secure is an extension of HTTP that is used for secure communication online.
It requires servers to have trusted certificates (usually SSL certificates) and uses Transport Layer Security (TLS), a security protocol built on top of TCP, to encrypt data communicated between a client and a server.
What is TLS?
Transport Layer Security is a security protocol over which HTTP runs in order to achieve secure communication online. HTTP over TLS is HTTPS.
What is an SSL certificate?
A digital certificate granted to a server by a certificate authority. It contains the server's public key, to be used as part of the TLS handshake.
What is a certificate authority?
A trusted entity that signs digital certificates, namely the SSL certificates that HTTPS connections rely on.
How does a TLS handshake work?
- The client sends a client hello (a string of random bytes) to the server.
- The server responds with a server hello (another string of random bytes) as well as its SSL certificate, which contains its public key.
- The client verifies that the certificate was issued by a certificate authority and sends a premaster secret (yet another string of random bytes), this time encrypted with the server's public key.
- The client and server use the client hello, server hello, and premaster secret to generate the same symmetric-encryption session keys, which are used to encrypt and decrypt all data communicated during the remainder of the connection.
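The last step, where both sides arrive at the same session key, can be illustrated with a toy derivation. This is not the real TLS key schedule: the HMAC-SHA256 construction and the `derive_session_key` function are simplifications assumed for this sketch, and the public-key encryption of the premaster secret is omitted.

```python
import hashlib
import hmac
import os

def derive_session_key(client_random: bytes, server_random: bytes, premaster: bytes) -> bytes:
    """Toy derivation: HMAC-SHA256 keyed by the premaster secret over both randoms.
    Real TLS uses a different, standardized key schedule."""
    return hmac.new(premaster, client_random + server_random, hashlib.sha256).digest()

client_random = os.urandom(32)  # sent in the client hello
server_random = os.urandom(32)  # sent in the server hello
premaster = os.urandom(48)      # chosen by the client, sent encrypted in a real handshake

# Each side runs the same derivation over the same three values.
client_key = derive_session_key(client_random, server_random, premaster)
server_key = derive_session_key(client_random, server_random, premaster)
print("keys match:", client_key == server_key)
```

An eavesdropper sees both random values in the clear but not the premaster secret, so it cannot run the same derivation.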
What is MapReduce?
A popular framework for processing very large datasets in a distributed setting efficiently, quickly, and in a fault-tolerant manner. A MapReduce job is composed of 3 main steps:
- the Map step, which runs a map function on the various chunks of the dataset and transforms these chunks into intermediate key-value pairs
- the Shuffle step, which reorganizes the intermediate key-value pairs such that pairs with the same key are routed to the same machine in the final step
- the Reduce step, which runs a reduce function on the newly shuffled key-value pairs and transforms them into more meaningful data
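The three steps can be sketched on a single machine with the classic word-count example. The function names and sample chunks are made up for illustration; a real framework would run the map and reduce functions on many machines in parallel.

```python
from collections import defaultdict

def map_step(chunk: str):
    """Map: turn one chunk of text into intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle_step(pairs):
    """Shuffle: group values by key so each key is reduced in one place."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_step(grouped):
    """Reduce: collapse each key's values into a final count."""
    return {key: sum(values) for key, values in grouped.items()}

chunks = ["the quick brown fox", "the lazy dog", "the fox"]
intermediate = [pair for chunk in chunks for pair in map_step(chunk)]
word_counts = reduce_step(shuffle_step(intermediate))
print(word_counts)
```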
What is a distributed file system?
An abstraction over a (usually large) cluster of machines that allows them to act like one large file system. The two most popular implementations of a DFS are the Google File System (GFS) and the Hadoop Distributed File System (HDFS).
What can consistent hashing be used for?
It can be used in a load balancer to distribute load. For example, we pass the request ID (requestId) through a hashing function to determine which server the request should go to, so the same ID always lands on the same server. That is useful for keeping user sessions in a server's local cache, but it can also concentrate a lot of load on a few servers. With naive hashing (for example, hash modulo the number of servers), adding or removing a server changes the mapping for most keys, so the data cached on those servers is effectively lost. Consistent hashing reduces this: each server owns ranges of the hash space, so only the keys in the ranges handed over to the new server have to move.
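A minimal hash-ring sketch shows the effect. The `HashRing` class, the server names, and the use of 100 virtual nodes per server are assumptions for illustration. Each key is served by the first ring position at or after its own hash, wrapping around at the end of the ring.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, servers, vnodes: int = 100):
        self.vnodes = vnodes  # virtual nodes per server, to even out the ranges
        self._ring = []       # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def lookup(self, key: str) -> str:
        # First ring position at or after the key's hash, wrapping around.
        index = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]

keys = [f"request-{i}" for i in range(1000)]
ring = HashRing(["server-a", "server-b", "server-c"])
before = {key: ring.lookup(key) for key in keys}
ring.add("server-d")
after = {key: ring.lookup(key) for key in keys}
moved = sum(1 for key in keys if before[key] != after[key])
print(f"{moved} of {len(keys)} keys moved after adding a server")
```

Every key that moves ends up on the newly added server; keys never shuffle between the existing servers, which is what keeps most cached data valid.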
What is a CDN used for?
A content distribution network (also known as a content delivery network) is a large, geographically distributed network of specialized servers that accelerates the delivery of web content and rich media to internet-connected devices.
The primary technique that a CDN uses to speed up the delivery of web content to end users is edge caching, which entails storing replicas of static text, image, audio, and video content on multiple servers around the “edges” of the internet, so that user requests can be served by a nearby edge server rather than by a far-off origin server.
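What are the different load-balancing algorithms?
- Round robin: requests are distributed evenly across the servers in rotation
- IP hash: the client's IP address is hashed to determine which server receives the request
- Least connections: the request goes to the server with the fewest active connections
- Least response time: the request goes to the server that is currently responding fastest
- Least bandwidth: the request goes to the server that is currently serving the least traffic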
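Three of these strategies can be sketched in a few lines each. The server names and function names are hypothetical, and a real load balancer would track connection counts itself rather than receive them as an argument.

```python
import hashlib
import itertools

servers = ["server-a", "server-b", "server-c"]  # hypothetical backend names

# Round robin: cycle through the servers in order.
_rotation = itertools.cycle(servers)

def round_robin() -> str:
    return next(_rotation)

# IP hash: the same client IP always maps to the same server.
def ip_hash(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

# Least connections: pick the server with the fewest active connections.
def least_connections(active: dict) -> str:
    return min(active, key=active.get)

print(ip_hash("203.0.113.7"))
print(least_connections({"server-a": 5, "server-b": 2, "server-c": 9}))
```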
Can an API gateway act as a load balancer? What are the differences?
Yes.
An API gateway can provide what a load balancer usually provides, with a simpler interface, but it doesn't come cheap.
An API gateway also offers many features that a plain load balancer such as ALB (AWS Application Load Balancer) lacks. For example, it handles authentication and authorization, API token issuance and management, and can even generate SDKs based on the API structure. On AWS, API Gateway integrates with the IAM (Identity and Access Management) service, which simplifies access control for the underlying resources.
What are the benefits of NoSQL?
- flexible data models
- horizontal scaling
- fast queries
- easy for developers
When to use SQL instead of NoSQL?
- you are working with complex queries and reports
- you have a high-transaction application
- you need ACID compliance
- you don't anticipate a lot of changes or growth
When to use NoSQL instead of SQL?
- you are not concerned about strict consistency, and 100% data integrity is not your top goal
- you have a lot of data, many different data types, and your data will only grow over time
- your data needs to scale up and down; NoSQL provides much greater flexibility