SDI Terminology Flashcards

1
Q

Bandwidth

A

Bandwidth is how much information you can receive every second. You can compare it to a bathtub faucet: if the faucet has a wide opening, more water can flow per second than if the opening were narrower. The width of the opening is like bandwidth, and the water is like the data.

2
Q

Memory

A

An electronic holding place for the instructions and data a computer needs to reach quickly. It’s where information is stored for immediate use. Without memory, a computer wouldn’t function.

3
Q

Capacity

A

When referring to a disk or drive, capacity is the maximum amount of data the device can hold.

4
Q

Storage

A

A mechanism that enables a computer to retain data, either temporarily or permanently.

5
Q

Horizontal Scaling

A

Also called scaling out. Refers to adding nodes or machines to your infrastructure to cope with new demands, for instance adding servers.

6
Q

Vertical Scaling

A

Also called scaling up. Describes adding resources (such as CPU, memory, or storage) to an existing machine so that it meets demand.

7
Q

Load Balancers

A

A load balancer efficiently distributes incoming network traffic across a group of backend servers (called a server farm or server pool).

8
Q

Layer 4 vs. Layer 7

A

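Layer 4 load balancing operates at the transport layer and routes on network information such as IP addresses and ports, without inspecting the content of the traffic. Layer 7 load balancing operates at the application layer: it typically terminates the client’s TCP connection and opens a second TCP connection to the chosen server, so a request involves two connections rather than one.
Layer 7 has application awareness and makes load-balancing decisions based on the content of the request (for example the URL, headers, or cookies), whereas Layer 4 applies its configured algorithm without looking at that content. Layer 7 is a good fit for microservices.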

9
Q

Sharding

A

A “shard” means a small part of a whole. Hence, sharding means dividing a larger whole into smaller parts. In a database, each shard holds a subset of the data, usually on its own server, so each shard is smaller, faster to query, and easier to manage than the full dataset.
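
One common way to decide which shard holds a record is to hash a key and take the remainder by the number of shards. The sketch below assumes a fixed shard count and uses a user ID as the shard key; NUM_SHARDS and shard_for are hypothetical names chosen for illustration, not part of any particular database.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each shard is modelled as its own dictionary holding a slice of the data.
shards = [dict() for _ in range(NUM_SHARDS)]
for user_id in ["alice", "bob", "carol", "dave"]:
    shards[shard_for(user_id)][user_id] = {"name": user_id}
```

A lookup for a given user only has to touch the one shard that shard_for returns.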

10
Q

Active-Active vs. Active-Passive

A

Both are used for high-availability configurations.
Active-Active - made up of at least two nodes, both actively running the same kind of service simultaneously. Used to achieve load balancing.
Active-Passive - made up of at least two nodes, but not all nodes are active. The passive (failover) server serves as a backup that’s ready to take over as soon as the active (primary) server gets disconnected or is unable to serve.

11
Q

DNS

A

Domain Name System - a hierarchical naming system built on a distributed database for computers, services, or any resource connected to the Internet or a private network. Most importantly, it translates human-readable domain names into the numerical identifiers associated with networking equipment, enabling devices to be located and connected worldwide. Analogous to a network “phone book”, DNS is how a browser can translate a domain name (e.g. facebook.com) to the actual IP address of the server that stores the information requested by the browser.

12
Q

CDN

A

A network of servers that distributes content from an “origin” server throughout the world by caching content close to where each end user is accessing the internet via a web-enabled device. The content they request is first stored on the origin server and is then replicated and stored elsewhere as needed.

13
Q

Caching

A

A high-speed data storage layer which stores a subset of data, typically transient in nature, so that future requests for that data are served up faster than is possible by accessing the data’s primary storage location. Caching allows you to efficiently reuse previously retrieved or computed data.
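
A minimal sketch of this idea, assuming an in-process dictionary as the cache and a deliberately slow function standing in for the primary storage location. slow_lookup, get, and calls_to_primary are hypothetical names used only for this example.

```python
import time


def slow_lookup(key: str) -> str:
    """Stand-in for a slow primary data store (hypothetical)."""
    time.sleep(0.01)
    return key.upper()


cache = {}
calls_to_primary = 0


def get(key: str) -> str:
    """Return the cached value if present, otherwise fetch and remember it."""
    global calls_to_primary
    if key in cache:
        return cache[key]
    calls_to_primary += 1
    value = slow_lookup(key)
    cache[key] = value
    return value


get("user:1")  # first request goes to the primary store
get("user:1")  # second request is served from the cache
```

After both calls, the primary store has been hit only once.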

14
Q

Database Replication

A

Database replication refers to the process of copying data from a primary database to one or more replica databases in order to improve data accessibility and system fault-tolerance and reliability. Replication is typically an ongoing process that occurs in real time as data is created, updated, or deleted in the primary database, but it can also run as a one-time or scheduled batch job.

15
Q

Redundancy

A

Redundancy is the duplication or mirroring of a device or data, which helps prevent data from being lost or a device from becoming unavailable.

16
Q

MapReduce

A

A programming technique built on two tasks: Map and Reduce. Map takes a data set and converts it into another set of data, where individual elements are broken down into tuples (key-value pairs). Reduce takes the output from Map as its input and combines those tuples into a smaller set of tuples. The major advantage of MapReduce is that it is easy to scale data processing over multiple computing nodes.
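
The two tasks can be sketched with a word count, the usual introductory example. This runs in a single process and only illustrates the shape of the Map and Reduce steps; map_phase and reduce_phase are hypothetical names, and a real framework would run many map and reduce tasks on separate nodes.

```python
from collections import defaultdict


def map_phase(document: str):
    """Emit a (word, 1) tuple for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_phase(pairs):
    """Combine tuples that share a key into one total per word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


documents = ["the cat sat", "the dog sat"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(mapped)
print(counts)
```

Six input tuples are reduced to four, one per distinct word.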

17
Q

Cache Eviction

A

The process by which old, relatively unused, or excessively voluminous data can be dropped from the cache, allowing the cache to remain within a memory budget.
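
As one example of an eviction policy, here is a small least-recently-used (LRU) cache built on collections.OrderedDict. It assumes the memory budget is expressed as a maximum number of entries; the LRUCache class is a sketch for illustration, not a production cache.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now more recently used than "b"
cache.put("c", 3)   # over budget, so "b" is evicted
```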

18
Q

CAP Theorem

A

States that a distributed system can deliver only two of three desired characteristics at the same time: consistency, availability, and partition tolerance.

19
Q

ACID

A

The presence of four properties - atomicity, consistency, isolation, and durability - ensures that database transactions are processed reliably, even when errors or failures occur. When a database possesses these properties, it is said to be ACID compliant.

20
Q

BASE

A

Stands for:
Basically Available - rather than enforcing immediate consistency, BASE-modelled NoSQL databases will ensure availability of data by spreading and replicating it across the nodes of the database cluster
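Soft State - due to the lack of immediate consistency, data values may change over time.
Eventual Consistency - consistency is not enforced immediately, but if no new updates arrive, all replicas converge to the same value over time.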

21
Q

Strong vs. Eventual Consistency

A

Strong consistency means the latest data is always returned, but, due to the internal coordination this requires, it may result in higher latency. With eventual consistency, reads may return stale data early on, but results are provided much faster, with low latency.

22
Q

CPU

A

The Central Processing Unit is the electronic circuitry that executes the instructions comprising a computer program. The CPU performs calculations using its billions of transistors. These calculations run the software that allows a device to perform its task.

23
Q

HTTP vs. HTTP/2

A
24
Q

TCP/IP Model

A
25
Q

IPv4 vs. IPv6

A

The fourth version of IP was introduced in 1983. IPv4 addresses are 32 bits long, and the supply of available IPv4 addresses has become depleted. IPv6 uses 128-bit addresses, so it has vastly more possible addresses, and it is thus becoming the standard.
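
The difference in address space can be worked out directly: IPv4 addresses are 32 bits and IPv6 addresses are 128 bits. The sketch below uses the standard ipaddress module; the two sample addresses come from ranges reserved for documentation.

```python
import ipaddress

ipv4_total = 2 ** 32    # 4,294,967,296 possible IPv4 addresses
ipv6_total = 2 ** 128   # roughly 3.4 x 10^38 possible IPv6 addresses

v4 = ipaddress.ip_address("192.0.2.1")     # documentation-only IPv4 address
v6 = ipaddress.ip_address("2001:db8::1")   # documentation-only IPv6 address

print(v4.version, ipv4_total)
print(v6.version, ipv6_total)
```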

26
Q

TCP vs. UDP

A

Both are transport layer protocols. The main difference is reliability. UDP is faster and more efficient, but delivery is not guaranteed. TCP establishes a connection and uses acknowledgements and retransmission throughout the transfer; UDP does not.

27
Q

CDNs and Edge

A

Edge Computing helps organizations distribute their computing resources closer to the network edge in order to provide users with better performance and more reliable access to data and applications. A CDN strategically places servers that provide cached data to users, which results in faster experiences and more reliable access to data.

28
Q

Bloom Filters

A

A Bloom filter is a probabilistic data structure that is based on hashing. It is extremely space efficient and is typically used to add elements to a set and test whether an element is in a set. The elements themselves are not stored, though. Instead, hashes of each element set bits in a bit array, so a lookup can return a false positive (“possibly in the set”) but never a false negative.
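
A minimal Bloom filter sketch, assuming a fixed-size bit array and several hash positions derived from SHA-256 with different prefixes. The size and number of hashes are arbitrary illustrative values, and BloomFilter is a hypothetical class rather than a library API.

```python
import hashlib


class BloomFilter:
    def __init__(self, size: int = 1024, num_hashes: int = 3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size  # only bits are stored, never the items

    def _positions(self, item: str):
        """Yield one bit position per hash function."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("apple")
bf.add("banana")
```

Anything that was added always reports True. An item that was never added usually reports False, but it can report True if its positions happen to overlap with bits set by other items.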

29
Q

Bottleneck

A

A bottleneck occurs when the capacity of an application or a computer system is limited by a single component, like the neck of a bottle slowing down the overall water flow. The bottleneck has the lowest throughput of all parts of the transaction path.
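
Put as arithmetic, the throughput of the whole path is capped by its slowest stage. The stage names and requests-per-second figures below are made-up numbers for illustration only.

```python
# Hypothetical maximum throughput of each stage, in requests per second.
stage_throughput_rps = {
    "load_balancer": 50_000,
    "app_server": 8_000,
    "database": 1_200,
}

# The bottleneck is the stage with the lowest throughput.
bottleneck = min(stage_throughput_rps, key=stage_throughput_rps.get)
system_throughput = stage_throughput_rps[bottleneck]
print(bottleneck, system_throughput)
```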

30
Q

Service Level Agreements/Assurances

A

An agreement that sets the expectations between the service provider and the customer and describes the products or services to be delivered, the single point of contact for end-user problems, and the metrics by which the effectiveness of the process is monitored and approved.

31
Q

Forward vs. Reverse Proxy

A

The key difference between a reverse proxy and a forward proxy is that a forward proxy enables computers isolated on a private network to connect to the public internet, while a reverse proxy enables computers on the internet to access a private subnet.

32
Q

Leader Election

A

Leader election is the simple idea of giving one thing (a process, host, thread, object, or human) in a distributed system some special powers. Those special powers could include the ability to assign work, the ability to modify a piece of data, or even the responsibility of handling all requests in the system.

Leader election is a powerful tool for improving efficiency, reducing coordination, simplifying architectures, and reducing operations. On the other hand, leader election can introduce new failure modes and scaling bottlenecks.

33
Q

Consensus Algorithm

A

A process in computer science used to achieve agreement on a single data value among distributed processes or systems.

34
Q

Polling

A
35
Q

Streaming

A
36
Q

Rate-Limiting

A
37
Q

Denial of Service (DoS) Attack vs. Distributed DoS

A
38
Q

Publisher vs. Subscriber Messaging

A
39
Q

Idempotency

A
40
Q

Concurrency

A
41
Q

Performance vs. Scalability

A
42
Q

Latency vs. Throughput

A
43
Q

Throughput

A

The maximum capacity of a machine or system. It’s often used in factories to calculate how much work an assembly line can do in an hour or a day, or some other unit of measurement. In computing, it would be the amount of data that can be passed around in a unit of time. So a 512 Mbps internet connection is a measure of throughput - 512 Mb (megabits) per second.
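
As a worked example, the 512 Mbps figure can be converted to megabytes per second and used to estimate a transfer time. This assumes 8 bits per byte, decimal units (1 MB = 1,000,000 bytes), and an ideal link with no protocol overhead; the 1000 MB file size is an arbitrary example.

```python
link_mbps = 512                           # megabits per second
megabytes_per_second = link_mbps / 8      # 8 bits per byte
file_size_mb = 1000                       # hypothetical 1000 MB file
transfer_seconds = file_size_mb / megabytes_per_second

print(megabytes_per_second)  # 64.0
print(transfer_seconds)      # 15.625
```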

44
Q

Latency

A

Latency is simply a measure of duration: the time it takes for an action to complete or produce a result.

45
Q

Availability vs. Consistency

A
46
Q

Availability

A
47
Q

Consistency

A
48
Q

Consistency Patterns

A
49
Q

Availability Patterns

A
50
Q

Steps of SDI

A
  1. Define Requirements
  2. Rough Estimate of Scale
  3. Mock Basic UI
  4. Define Data Model
  5. Define APIs
  6. High Level Design
  7. Detailed Design
  8. Identify + Resolve Bottlenecks
51
Q

Load Balancer Routing Methods

A

Round Robin - requests are distributed across the group of servers sequentially
Least Connections - a new request is sent to the server with the fewest current connections to clients. The relative computing capacity of each server is factored into determining which one has the least connections
Least Time - sends requests to the server selected by a formula that combines the fastest response time and fewest active connections
Hash - distributes requests based on a key you define, such as the client IP address or the request URL
IP Hash - the IP address of the client is used to determine which server receives the request
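
Three of the methods above can be sketched in a few lines. The server names and connection counts are hypothetical, and the Least Connections version here ignores server capacity weighting to keep the example short.

```python
import hashlib
import itertools

servers = ["app-1", "app-2", "app-3"]  # hypothetical backend pool

# Round Robin: hand out servers in order, wrapping around at the end.
_rotation = itertools.cycle(servers)


def round_robin() -> str:
    return next(_rotation)


# Least Connections: pick the server with the fewest open connections.
active_connections = {"app-1": 12, "app-2": 3, "app-3": 7}


def least_connections() -> str:
    return min(active_connections, key=active_connections.get)


# IP Hash: the same client IP always maps to the same server.
def ip_hash(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


first_four = [round_robin() for _ in range(4)]
print(first_four)  # ['app-1', 'app-2', 'app-3', 'app-1']
```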

52
Q

Internet Protocol

A

Set of rules for routing and addressing packets of data so that they can travel across networks and arrive at the correct destination. Data traversing the Internet is divided into smaller pieces, called packets. IP information is attached to each packet, and this information helps routers to send packets to the right place. Every device or domain that connects to the Internet is assigned an IP address, and as packets are directed to the IP address attached to them, data arrives where it is needed.

53
Q

Transmission Control Protocol

A
54
Q

Disk Storage vs. Memory

A

Both terms refer to internal storage space in a computer. Each is used for a different purpose. “Memory” usually means RAM (Random Access Memory). RAM is hardware that holds the data and programs currently in use, which allows the computer to efficiently perform more than one task at a time (i.e. multitask). The terms “disk space” and “storage” usually refer to hard drive storage. This is typically used for long-term storage of various types of files.

55
Q

Data Eviction Options

A

LIFO (last in, first out), FIFO (first in, first out), LRU (least recently used), LFU (least frequently used)

56
Q

Proxy

A

A system or router that provides a gateway between users and the internet. Therefore, it helps prevent cyber attackers from entering a private network. It is a server, referred to as an “intermediary” because it goes between end-users and the web pages they visit online.

57
Q

Consistent Hashing

A

Used in Distributed Systems to keep the hash table independent of the number of servers available to minimize relocation when changes of scale occur.
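
One common way to implement this is a hash ring: servers and keys are hashed onto the same circle, and each key belongs to the first server point found clockwise from it. The sketch below assumes SHA-256 as the hash and 100 virtual points per server; HashRing and the cache server names are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, replicas: int = 100):
        self.replicas = replicas  # virtual points per node, smooths the spread
        self._ring = []           # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        """Return the first node point clockwise from the key's hash."""
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove("cache-c")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

When cache-c is removed, the only keys that move are the ones that lived on cache-c. Keys on the other two servers stay where they were.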

58
Q

Deterministic

A

This is a key principle for a good hashing algorithm/function. It’s a fancy way of saying that identical inputs will generate identical outputs when passed into the function. Deterministic means that if I pass the string “Code” (case sensitive) and the function generates a hash of 11002, then every time I pass in “Code” it must generate 11002. And if I pass in “code” it will generate a different number (consistently).
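
This property can be checked with hashlib from the standard library. The sketch uses SHA-256 rather than Python’s built-in hash(), because hash() on strings is randomized per process by default and so is not stable between runs. stable_hash is a hypothetical helper, and the actual numbers it produces are much larger than the 11002 used in the example above.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Return the SHA-256 digest of the text as an integer."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


first = stable_hash("Code")
second = stable_hash("Code")
lowercase = stable_hash("code")

print(first == second)     # True: identical input, identical output
print(first == lowercase)  # False: the input differs by case
```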

59
Q

Database Indexing

A

Indexes are a powerful tool used in the background of a database to speed up querying. So, rather than having to search the entire database, you can quickly access the index, which speeds up the process. Simply put, an index is a pointer to data in a table. An index in a database is very similar to an index in the back of a book.
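
The idea can be illustrated with plain Python structures: a full scan checks every row, while an index maps a column value straight to the row’s position. The rows, email_index, and the two lookup functions are hypothetical; real databases typically use structures such as B-trees rather than a dictionary.

```python
rows = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": "alan@example.com"},
    {"id": 3, "email": "grace@example.com"},
]


def find_by_scan(email):
    """Without an index: look at every row until one matches."""
    for row in rows:
        if row["email"] == email:
            return row
    return None


# The index: each email points at the position of its row in the table.
email_index = {row["email"]: position for position, row in enumerate(rows)}


def find_by_index(email):
    """With an index: jump straight to the row."""
    position = email_index.get(email)
    return rows[position] if position is not None else None
```

Both functions return the same row, but the indexed lookup does not grow slower as the table grows.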

60
Q

Collision

A

When hashing, a collision occurs when two or more different inputs produce the same output.
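
A deliberately weak hash function makes collisions easy to see. The toy function below, which is an illustration and not a real hashing algorithm, adds up character codes, so any two anagrams collide.

```python
def toy_hash(text: str, buckets: int = 16) -> int:
    """Sum the character codes and wrap into a fixed number of buckets."""
    return sum(ord(ch) for ch in text) % buckets


a = toy_hash("listen")
b = toy_hash("silent")
print(a == b)  # True: different inputs, same output
```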

61
Q

Distributed Systems

A

A computing environment in which various components are spread across multiple computers (or other computing devices) on a network. These devices split up the work, coordinating their efforts to complete the job more efficiently than if a single device had been responsible for the task. Distributed systems reduce the risks involved with having a single point of failure, bolstering reliability and fault tolerance. DS are designed to be scalable in near real-time.

62
Q

Hashing

A

The classic hashing approach uses a hash function to generate a pseudo-random number, which is then reduced modulo the size of the memory space (the remainder after dividing by that size) to transform the random identifier into a position within the available space.
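
A short sketch of the classic approach, assuming SHA-256 as the hash function and a small number of slots. It also counts how many keys land in a different position when the number of slots changes from 4 to 5. The position helper and the key names are hypothetical.

```python
import hashlib


def position(key: str, table_size: int) -> int:
    """Hash the key to a large number, then wrap it into the table."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    number = int.from_bytes(digest[:8], "big")
    return number % table_size


keys = [f"user:{i}" for i in range(1000)]
with_4_slots = {k: position(k, 4) for k in keys}
with_5_slots = {k: position(k, 5) for k in keys}

# Count how many keys changed position after the table grew by one slot.
moved = sum(1 for k in keys if with_4_slots[k] != with_5_slots[k])
print(moved)
```

Because the position depends on the table size, most keys typically end up somewhere new after a resize.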

63
Q

Websocket vs HTTP

A

HTTP and WebSocket are both communication protocols used in client-server communication.
HTTP is unidirectional: the client sends a request and the server sends a response. In the simplest case (without keep-alive), each HTTP or HTTPS request establishes a new connection to the server, and the connection is closed once the response has been received. WebSocket is a bidirectional, full-duplex protocol used in the same client-server scenario. It is a stateful protocol, which means the connection between client and server stays alive until it is terminated by either party.