YT - System Design Flashcards

1
Q

How can API pagination help?

A
  • Helps manage server load
  • Reduces network traffic
  • Keeps the application responsive
2
Q

Two main approaches of Pagination?

A

Offset-based (limit + offset)
Cursor-based

3
Q

What is Offset Based Pagination?

A
  • The client requests a specific page by providing a limit (number of records per page) and an offset (how many records to skip).
  • The server queries the database using SQL’s LIMIT and OFFSET.
  • The server returns the requested subset of data, along with pagination metadata (e.g., total pages, current page).
4
Q
  1. Offset pagination SQL query?
  2. GET REST API syntax
  3. JSON Pattern
A
  • SELECT * FROM users ORDER BY id LIMIT 5 OFFSET 10;
  • GET /users?limit=5&offset=10
  • {
    "data": [{}, {}, …],
    "total": 100,
    "limit": 5,
    "offset": 10
    }
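
A minimal sketch of how a server might turn limit and offset into the query and response shape above, using Python's built-in sqlite3 module. The in-memory table, the user names, and the fetch_page helper are assumptions made for illustration. An ORDER BY is included because row order is otherwise not guaranteed, which would make pages unstable.

```python
import sqlite3

def fetch_page(conn, limit, offset):
    """Return one page of users plus pagination metadata."""
    rows = conn.execute(
        "SELECT id, name FROM users ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    total = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    return {
        "data": [{"id": r[0], "name": r[1]} for r in rows],
        "total": total,
        "limit": limit,
        "offset": offset,
    }

# Hypothetical in-memory table with 100 users, ids 1..100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 101)],
)

page = fetch_page(conn, limit=5, offset=10)
print([u["id"] for u in page["data"]])  # ids 11 through 15
```

Skipping 10 rows and taking 5 returns the third page when the page size is 5.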
5
Q

Advantages and Disadvantages of offset-based pagination

A

Advantages of Offset-Based Pagination
✅ Simple to Implement – Works well with SQL’s LIMIT and OFFSET.
✅ Allows Jumping to Any Page – You can directly request page 5 (e.g., offset=20 for limit=5).
✅ Good for Small Datasets – Works well when data size is moderate.
Disadvantages of Offset-Based Pagination
❌ Slow for Large Datasets – High offsets cause performance issues because the database still scans all preceding rows.
❌ Data Inconsistency Issues – If new records are added or deleted, users might see duplicate or missing records when paginating.
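
The inconsistency problem can be reproduced with a plain list standing in for a table sorted newest-first. This is an illustrative sketch; the feed contents and the get_page helper are made up.

```python
def get_page(items, limit, offset):
    """Offset pagination over a list that is already sorted newest-first."""
    return items[offset:offset + limit]

feed = ["post5", "post4", "post3", "post2", "post1"]  # newest first

page1 = get_page(feed, limit=2, offset=0)   # ['post5', 'post4']

# A new post arrives at the top before the client asks for page 2.
feed.insert(0, "post6")

page2 = get_page(feed, limit=2, offset=2)   # ['post4', 'post3']

duplicates = set(page1) & set(page2)
print(duplicates)  # {'post4'}
```

The insert shifts every row down by one position, so the last item of page 1 shows up again on page 2. A delete shifts rows the other way and causes an item to be skipped.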

6
Q

What is cursor based pagination?

A
  • Client Requests Data: The API returns a limited set of records (limit=10) and includes a cursor (e.g., nextCursor=abc123) that marks where the page ended.
  • Client Fetches Next Page: The client sends that cursor value in the next API request to get the next set of results.
  • Efficient Querying: Since the cursor is based on an indexed column (like id or created_at), it’s faster and more efficient than offset-based pagination.
7
Q

Request, response of cursor based pagination

A

GET /users?limit=5
{
  "data": [
    { "id": 101, "name": "Alice" },
    { "id": 102, "name": "Bob" },
    { "id": 103, "name": "Carol" },
    { "id": 104, "name": "Dave" },
    { "id": 105, "name": "Eva" }
  ],
  "nextCursor": 105
}

The nextCursor (e.g., 105) is the id of the last user returned.

To get the next page -> GET /users?limit=5&cursor=105

NOTE: The cursor is based on an indexed column (like id or created_at), so the database can seek straight to the next row instead of scanning the skipped ones.
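
A minimal sketch of the server side of this exchange, again with sqlite3. The fetch_after helper and the two extra users (Frank, Grace) are assumptions added so that a second page exists. It also assumes ids are positive, so a missing cursor can be treated as 0.

```python
import sqlite3

def fetch_after(conn, limit, cursor=None):
    """Return up to `limit` users whose id is greater than `cursor`."""
    rows = conn.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (cursor or 0, limit),
    ).fetchall()
    data = [{"id": r[0], "name": r[1]} for r in rows]
    # The cursor for the next page is the id of the last row returned.
    next_cursor = data[-1]["id"] if data else None
    return {"data": data, "nextCursor": next_cursor}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
names = ["Alice", "Bob", "Carol", "Dave", "Eva", "Frank", "Grace"]
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(101 + i, name) for i, name in enumerate(names)],
)

first = fetch_after(conn, limit=5)                                # GET /users?limit=5
second = fetch_after(conn, limit=5, cursor=first["nextCursor"])   # GET /users?limit=5&cursor=105
print(first["nextCursor"], [u["name"] for u in second["data"]])
```

Because the query filters on id rather than counting rows, a user inserted or deleted before the cursor does not shift the next page.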

8
Q

Advantages and disadvantages of cursor based pagination

A

Advantages of Cursor-Based Pagination
✅ More Efficient – Faster than offset-based pagination, especially for large datasets.
✅ No Skipping Issues – Avoids missing or duplicate records if new data is inserted.
✅ Works Well for Infinite Scrolling – Many social media feeds use this approach.
Disadvantages
❌ More Complex Implementation – Requires careful handling of cursors.
❌ No Jumping to Specific Pages – Unlike page-based pagination, you can’t skip directly to an arbitrary page.

9
Q

What is REST API?

A

REST (Representational State Transfer) is an architectural style for web APIs built on HTTP, and one of the most common ways for clients and servers to communicate on the internet.

Resources should be named with nouns, not verbs: for example /products or /users, not /getProducts.

HTTP methods (CRUD):
GET – Retrieve data
POST – Create new resources
PUT – Update existing resources
DELETE – Remove resources

Example of a RESTful API Endpoint:
Assume you have a RESTful API for a user management system.
GET /users → Retrieves a list of users
GET /users/1 → Retrieves details of a user with ID 1
POST /users → Creates a new user
PUT /users/1 → Updates user with ID 1
DELETE /users/1 → Deletes user with ID 1
GET with params: http://localhost:3000/api/users?name=alice&age=25
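
The mapping from method plus resource path to a CRUD operation can be sketched without any web framework. Everything here is illustrative: the handle function, the in-memory users dict, and the choice of returned status codes are assumptions, and paths are taken relative to the API root (/users, /users/1).

```python
from urllib.parse import urlparse, parse_qs

users = {1: {"id": 1, "name": "alice", "age": 25}}   # in-memory stand-in for a database

def handle(method, url, body=None):
    """Dispatch on HTTP method and resource path; returns (status, payload)."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")        # ["users"] or ["users", "1"]
    if parts[0] != "users" or len(parts) > 2:
        return 404, None

    if len(parts) == 1:                               # collection: /users
        if method == "GET":
            # Query parameters act as simple equality filters.
            filters = {k: v[0] for k, v in parse_qs(parsed.query).items()}
            return 200, [u for u in users.values()
                         if all(str(u.get(k)) == v for k, v in filters.items())]
        if method == "POST":
            new_id = max(users, default=0) + 1
            users[new_id] = {**body, "id": new_id}
            return 201, users[new_id]
        return 405, None

    user_id = int(parts[1])                           # item: /users/<id>
    if user_id not in users:
        return 404, None
    if method == "GET":
        return 200, users[user_id]
    if method == "PUT":
        users[user_id] = {**body, "id": user_id}
        return 200, users[user_id]
    if method == "DELETE":
        del users[user_id]
        return 204, None
    return 405, None

print(handle("GET", "/users?name=alice&age=25"))
```

The URL identifies the resource and the method says what to do with it, which is why no verb is needed in the path.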

10
Q

Session-Based Authentication

A

How It Works:
User Logs In → The user submits credentials (e.g., username & password).
Server Creates a Session → If the credentials are valid, the server generates a session ID and stores it, together with data such as userId and expirationTime, in a database or in-memory store (for example, Redis).
Session ID Sent to Client → The session ID is stored in a browser cookie.
Client Sends Session ID on Requests → For each subsequent request, the client sends the session ID in the cookie.
Server Validates Session ID → The server checks if the session ID is valid and associated with a user.
Pros:
✅ Secure (session stored on the server).
✅ Easier to revoke access (just delete the session from the database).
✅ Supports server-side session management (track active users, expire sessions, etc.).
Cons:
❌ Requires server-side storage (not scalable for distributed systems without extra setup).
❌ More load on the server (storing & managing sessions).
❌ Doesn’t work well with stateless architectures like microservices.
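
The flow above can be sketched in a few functions. This is a simplified illustration: the dict stands in for Redis or a database, the 30-minute lifetime and the hard-coded credential check are assumptions, and a real system would verify a salted password hash.

```python
import secrets
import time

SESSION_TTL_SECONDS = 30 * 60   # assumed lifetime: 30 minutes
sessions = {}                   # stands in for Redis or a database table

def login(username, password, now=None):
    """Create a session and return its ID, or None for bad credentials."""
    # Placeholder check; a real system compares against a salted password hash.
    if (username, password) != ("alice", "correct horse"):
        return None
    now = time.time() if now is None else now
    session_id = secrets.token_urlsafe(32)   # unguessable random ID for the cookie
    sessions[session_id] = {
        "userId": username,
        "expirationTime": now + SESSION_TTL_SECONDS,
    }
    return session_id

def authenticate(session_id, now=None):
    """Return the user ID for a valid session, or None."""
    now = time.time() if now is None else now
    session = sessions.get(session_id)
    if session is None:
        return None
    if session["expirationTime"] <= now:
        del sessions[session_id]   # expired: clean up
        return None
    return session["userId"]

def logout(session_id):
    """Revoking access is just deleting the server-side record."""
    sessions.pop(session_id, None)
```

Every request costs a lookup in the session store, which is the server-side state that the cons above refer to.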

11
Q

JWT-Based Authentication

A

How It Works:
User Logs In → The user submits credentials.
Server Generates JWT → If the credentials are valid, the server creates a JWT containing user info and an expiration time, and signs it with a secret key. The signature protects the token’s integrity: any tampering invalidates it.
Token Sent to Client → The JWT is sent to the client (usually in a cookie or Authorization header).
Client Sends JWT on Requests → For each request, the client includes the JWT in the Authorization header.
Server Verifies JWT → The server checks if the JWT is valid using a secret key or public key.
Pros:
✅ Stateless (no need for server-side storage, works well with microservices).
✅ Scalable (no session tracking on the server, reducing database load).
✅ Portable (can be used across different domains and services).
Cons:
❌ Harder to revoke (you can’t just delete a session; you need a revocation strategy).
❌ If a token is stolen, it can be used until it expires.
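❌ The payload is readable – A signed JWT is only Base64URL-encoded, not encrypted, so anyone holding the token can read its contents. Sensitive data should not be placed in it.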
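
To make the sign and verify steps concrete, here is a sketch of HS256 signing using only the Python standard library. It is for study rather than production use, where a maintained JWT library is the usual choice. The function names and the claims in the usage note are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time

def b64url_encode(raw: bytes) -> str:
    # JWTs use URL-safe Base64 without the trailing "=" padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)

def sign_jwt(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [
        b64url_encode(json.dumps(header, separators=(",", ":")).encode()),
        b64url_encode(json.dumps(payload, separators=(",", ":")).encode()),
    ]
    signing_input = ".".join(segments).encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return ".".join(segments + [b64url_encode(signature)])

def verify_jwt(token: str, secret: bytes, now=None):
    """Return the payload if the signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signature = b64url_decode(signature_b64)
        payload = json.loads(b64url_decode(payload_b64))
    except ValueError:
        return None   # wrong number of parts, bad Base64, or bad JSON
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None   # tampered with, or signed with a different key
    now = time.time() if now is None else now
    if payload.get("exp", 0) <= now:
        return None   # expired (or no expiry claim at all)
    return payload
```

For example, verify_jwt(sign_jwt({"sub": "42", "exp": time.time() + 900}, b"secret"), b"secret") returns the claims, while changing any character of the payload segment makes verification fail. No server-side lookup is involved, which is what makes the scheme stateless.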

12
Q

“If a JWT is stolen, it can be used until it expires.” How can this risk be reduced?

A

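Solution: Use HTTPS (so tokens cannot be read in transit) and short-lived access tokens paired with refresh tokens (so a stolen access token is only useful briefly).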

Refresh tokens in JWT (JSON Web Token) are long-lived tokens used to obtain new access tokens after the previous one expires. They help maintain user authentication without requiring them to log in repeatedly.
How JWT Refresh Tokens Work
User Authentication: When a user logs in, the server issues two tokens:
Access Token (short-lived, typically 15 minutes to a few hours)
Refresh Token (long-lived, usually days or weeks)
Using the Access Token: The client includes the access token in API requests to authenticate.
Token Expiry: When the access token expires, the client sends the refresh token to the server to request a new access token.
Issuing a New Access Token: If the refresh token is valid, the server issues a new access token without requiring the user to log in again.
Refresh Token Expiry or Revocation: If the refresh token expires or is revoked (e.g., user logs out), the user must log in again.
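
The refresh flow can be sketched as follows. To keep it short, the access token is represented as a plain dict of claims rather than a signed JWT, and the refresh tokens are opaque random strings kept in a server-side dict. The function names and the 7-day refresh lifetime are assumptions.

```python
import secrets
import time

ACCESS_TTL = 15 * 60             # 15 minutes
REFRESH_TTL = 7 * 24 * 60 * 60   # assumed: 7 days

refresh_store = {}   # refresh token -> {"user", "exp"}; server-side, so revocable

def new_access_token(user, now):
    # Stand-in for a signed JWT: just the claims the JWT would carry.
    return {"sub": user, "exp": now + ACCESS_TTL}

def issue_tokens(user, now=None):
    """Called after a successful login: returns (access_token, refresh_token)."""
    now = time.time() if now is None else now
    refresh_token = secrets.token_urlsafe(32)
    refresh_store[refresh_token] = {"user": user, "exp": now + REFRESH_TTL}
    return new_access_token(user, now), refresh_token

def refresh(refresh_token, now=None):
    """Exchange a valid refresh token for a new access token, else None."""
    now = time.time() if now is None else now
    record = refresh_store.get(refresh_token)
    if record is None or record["exp"] <= now:
        refresh_store.pop(refresh_token, None)
        return None   # expired or revoked: the user must log in again
    return new_access_token(record["user"], now)

def revoke(refresh_token):
    """Logout: forget the refresh token so it can no longer be exchanged."""
    refresh_store.pop(refresh_token, None)
```

Only the refresh path touches server-side state, so ordinary API requests stay stateless while logout still has something to revoke.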

13
Q

Why Use JWT for Distributed Applications?

A

JWT (JSON Web Token) is widely used for authentication and authorization in distributed applications because it is stateless, scalable, and secure. Here’s why it is beneficial:

  1. Stateless Authentication (No Need for a Central Session Store)
    In a distributed system, multiple microservices handle different parts of an application.
    Traditional session-based authentication requires a central session store (e.g., in-memory sessions or a database), which can become a bottleneck.
    JWT solves this by being self-contained – it carries all necessary authentication data within the token itself, eliminating the need for centralized session storage.
    Structure of a JWT
    A JWT is made up of three parts, separated by dots (.):
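    Header: metadata about the token, such as the signing algorithm (e.g., HS256)
    Payload: the claims, such as the user ID, expiration time, and role info (e.g., admin)
    Signature: computed over the header and payload with the signing key, so any tampering can be detected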
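
The three-part structure is easy to inspect, since the header and payload are just Base64URL-encoded JSON. In this sketch the token is assembled by hand with made-up claims, and its third segment is a placeholder rather than a real signature.

```python
import base64
import json

def b64url(data: dict) -> str:
    raw = json.dumps(data, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def decode_segment(segment: str) -> dict:
    # Restore the "=" padding that JWTs strip before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

# Hypothetical token; the signature is a placeholder, not a real HMAC.
token = ".".join([
    b64url({"alg": "HS256", "typ": "JWT"}),
    b64url({"sub": "42", "role": "admin"}),
    "signature-goes-here",
])

header_b64, payload_b64, signature_b64 = token.split(".")
print(decode_segment(header_b64))   # {'alg': 'HS256', 'typ': 'JWT'}
print(decode_segment(payload_b64))  # {'sub': '42', 'role': 'admin'}
```

Decoding needs no key at all, which is why a JWT payload should be treated as readable by anyone who holds the token.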
14
Q

HTTP Status code 1xx

A

1xx – Informational Responses
100 Continue – The server has received the request headers and the client should proceed with sending the body.
101 Switching Protocols – The server is switching protocols as requested by the client (e.g., WebSockets).

15
Q

HTTP Status code 2xx

A

2xx – Success
200 OK – The request was successful.
201 Created – A new resource was successfully created (e.g., after a POST request).
204 No Content – The request was successful, but there’s no content to return (common in DELETE requests).

16
Q

HTTP Status code 3xx

A

3xx – Redirection
301 Moved Permanently – The resource has been permanently moved to a new URL.
302 Found – The resource is temporarily located at a different URL (a temporary redirect).
304 Not Modified – The requested resource hasn’t changed since the last request (used for caching).

17
Q

HTTP Status code 4xx

A

4xx – Client Errors
400 Bad Request – The request is malformed or has invalid parameters.
401 Unauthorized – Authentication is required but missing or incorrect.
403 Forbidden – The client is not allowed to access the resource (e.g., trying to access admin features with a guest account).
404 Not Found – The requested resource does not exist.
405 Method Not Allowed – The HTTP method used is not allowed for the requested resource.
408 Request Timeout – The client took too long to send the request.
429 Too Many Requests – The client has sent too many requests in a given period, triggering rate limiting.

18
Q

HTTP Status code 5xx

A

5xx – Server Errors
500 Internal Server Error – A generic error indicating something went wrong on the server.
502 Bad Gateway – The server received an invalid response from an upstream server.
503 Service Unavailable – The server is temporarily overloaded or down for maintenance.
504 Gateway Timeout – The server did not receive a timely response from an upstream server.
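
The first digit of a status code gives its class, and Python's standard library already knows the standard reason phrases. A small sketch for self-testing on the 1xx to 5xx cards; the describe helper and the CLASSES table are illustrative names.

```python
from http import HTTPStatus

CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe(code: int) -> str:
    """Return e.g. '404 Not Found (Client Error)' for a known status code."""
    status = HTTPStatus(code)   # raises ValueError for unknown codes
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"

for code in (101, 204, 304, 429, 503):
    print(describe(code))
```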

19
Q

What is concurrency?

A

Definition: Concurrency refers to the ability of a system to make progress on multiple tasks (threads, processes) over the same period by interleaving (switching between) their execution.
Key Concept: Tasks make progress independently, but not necessarily at the same instant.
Execution Model: Achieved through time-slicing (context switching) on a single-core processor or through multi-threading.
Example: A web server handling multiple client requests by switching between them efficiently.
Example: A chef cooks soup, fries veggies, and bakes a cake, but works on only one at a time, constantly switching between them to keep all tasks moving.
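
The single-chef picture maps directly onto cooperative multitasking with asyncio, where one thread switches between tasks at each await. A minimal sketch; the cook coroutine and the dish names are illustrative.

```python
import asyncio

async def cook(dish, steps, log):
    for step in range(1, steps + 1):
        log.append(f"{dish} step {step}")
        await asyncio.sleep(0)   # yield control so another task can run

async def main():
    log = []
    # One thread, three tasks: the event loop interleaves them.
    await asyncio.gather(
        cook("soup", 2, log),
        cook("veggies", 2, log),
        cook("cake", 2, log),
    )
    return log

log = asyncio.run(main())
print(log)
```

The log shows the steps of the three dishes interleaved rather than one dish finishing before the next starts, even though only one step ever runs at a given instant.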

20
Q

What is Parallelism?

A

Definition: Parallelism refers to the simultaneous execution of multiple tasks or subtasks on multiple processing units.
Key Concept: Tasks run truly at the same time, often on multiple cores or processors.
Execution Model: Requires multiple physical or logical processing units (e.g., multi-core CPUs, GPUs).
Example: A data processing pipeline where different parts of the data are processed on different CPU cores in parallel, for example the frames of a video stream.

Example: Three chefs each take one task: one handles soup, one fries veggies, and one bakes the cake, all at the same time.
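
The three-chefs picture corresponds to splitting work across processes, each of which can run on its own core. A sketch using concurrent.futures; the function names and the sum-of-squares workload are placeholders for any CPU-bound task.

```python
from concurrent.futures import ProcessPoolExecutor

def sum_of_squares(chunk):
    """CPU-bound work done independently on one chunk of the data."""
    return sum(n * n for n in chunk)

def split_into_chunks(numbers, parts):
    """Split a list into at most `parts` roughly equal contiguous slices."""
    size = max(1, -(-len(numbers) // parts))   # ceiling division
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]

def parallel_sum_of_squares(numbers, workers=3):
    """Process each chunk in its own worker process, then combine the results."""
    chunks = split_into_chunks(list(numbers), workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

When run as a script, the call to parallel_sum_of_squares should sit under an if __name__ == "__main__": guard, because worker processes may re-import the module on some platforms. Separate processes are used here rather than threads because CPython threads generally do not run Python bytecode on several cores at once.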

21
Q

Memory & Storage Data System - Deep dive on memory

A

Memory systems (volatile storage) are typically fast and temporary: data is lost when power is off. ROM, covered below, is the non-volatile exception.

Random Access Memory (RAM)
- Location: Main system memory
- Speed: Slower than CPU cache, but faster than SSD/HDD
- Size: GBs (typically 8–64GB)
- Types:
  DRAM (Dynamic RAM): Common system RAM, needs refresh cycles
  SRAM (Static RAM): Faster but expensive, used in CPU cache
- Use Case: Running active programs and temporary data storage

Read-Only Memory (ROM)

ROM (Read-Only Memory) is a type of non-volatile memory that stores permanent data and cannot be modified (or can only be modified with difficulty). It retains its data even when power is turned off.

Characteristics of ROM
Non-Volatile – Data is retained even without power.
Pre-Programmed – Typically contains firmware or essential system instructions.
Read-Only – Cannot be easily altered or rewritten (except for certain types like EEPROM).
Reliable & Fast – Provides fast access to critical system instructions.

22
Q

Memory & Storage Data System - Deep dive on Storage Data System

A

Storage systems (non-volatile storage) are slower but persistent, meaning data remains after power is off.
a. Solid State Drives (SSD)
Technology: Flash memory (NAND)
Speed: Much faster than HDDs (~500MB/s to several GB/s)
Size: 256GB–4TB (common)
Types:
SATA SSD: Slower (~550MB/s)
NVMe SSD (PCIe): Faster (~3–7GB/s)
Use Case: Fast boot times, quick application loading
b. Hard Disk Drives (HDD)
Technology: Spinning magnetic disks
Speed: Slower (~80–200MB/s)
Size: 500GB–16TB
Use Case: Mass storage for backups, less speed-sensitive tasks

23
Q

What are CPU registers?

A

CPU registers are small, high-speed storage locations within the central processing unit (CPU) that temporarily hold data and instructions during processing. They are much faster than RAM and play a crucial role in executing instructions efficiently.

What is a Clock Cycle?

A clock cycle is the basic unit of time in a computer processor, driven by the system clock.

What Happens in One Clock Cycle?
In a simplified model, each clock cycle advances one step of instruction processing, such as:
- Fetching an instruction from memory.
- Decoding the instruction.
- Executing the instruction (e.g., addition, subtraction).
- Storing the result back into a register or memory.

The speed of the processor is largely determined by how fast the clock ticks. A higher clock speed means more cycles per second, allowing the CPU to perform more tasks in less time.

Example:
If a CPU is running at 3 GHz:
It completes 3 billion clock cycles per second.
Each cycle lasts about 0.33 nanoseconds (1 second / 3 billion).
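
The arithmetic behind the example, as a short sketch (the function names are illustrative): cycle time is the reciprocal of the clock frequency.

```python
def cycle_time_ns(frequency_hz: float) -> float:
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / frequency_hz

def cycles_in(duration_s: float, frequency_hz: float) -> float:
    """Number of clock cycles that elapse in `duration_s` seconds."""
    return duration_s * frequency_hz

print(round(cycle_time_ns(3e9), 2))   # 0.33
print(cycles_in(1, 3e9))              # 3000000000.0
```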

24
Q

What is CPU Cache?

A

The CPU cache is a small, super-fast type of memory that stores frequently used data so the CPU doesn’t have to fetch it from slower system memory (RAM) every time.

Imagine you’re working at a desk:

L1 Cache = Stuff in your hand (super quick access)

L2 Cache = Stuff on your desk (still close, a bit slower)

L3 Cache = Stuff on your office shelf (slower, but faster than going to the library)

RAM = Library down the hall (slower still)

Disk Storage = Library across town 😅