SDI Terminology Flashcards

Question 1

Q

Bandwidth

Answer

A

Bandwidth is the maximum amount of data that can be transferred over a connection every second. You can compare it to a bathtub faucet: if the faucet has a wide opening, more water can flow per second than if the pipe were narrower. The width of the opening is like bandwidth, and the water is like the data.

Question 2

Q

Memory

Answer

A

An electronic holding place for the instructions and data a computer needs to reach quickly. It’s where information is stored for immediate use. Without memory, a computer wouldn’t function.

Question 3

Q

Capacity

Answer

A

When referring to a disk/drive, capacity is the maximum amount of data a device such as a hard drive can hold.

Question 4

Q

Storage

Answer

A

A mechanism that enables a computer to retain data, either temporarily or permanently.

Question 5

Q

Horizontal Scaling

Answer

A

Also called scaling out. Refers to adding more nodes or machines to your infrastructure to cope with new demands, for instance, adding more servers.

Question 6

Q

Vertical Scaling

Answer

A

Also called scaling up. Describes adding more resources (such as CPU, RAM, or storage) to an existing machine so that it meets demands.

Question 7

Q

Load Balancers

Answer

A

A load balancer efficiently distributes incoming network traffic across a group of backend servers (called a server farm or server pool).

Question 8

Q

Layer 4 vs. Layer 7

Answer

A

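Layer 4 load balancers route traffic using transport-level information (IP addresses and ports) and typically keep a single TCP connection from the client to the server, while layer 7 load balancers terminate the client's TCP connection and open a second TCP connection to the server.
Layer 7 has application awareness and makes smart, informed load-balancing decisions based on the content of the request (such as the URL path, headers, or cookies), whereas layer 4 balances load with a built-in algorithm without inspecting the content. Layer 7 is well suited to microservices, since requests can be routed to different services by path or header.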

Question 9

Q

Sharding

Answer

A

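A “shard” means a small part of a whole. Hence, sharding means dividing a larger whole into smaller parts. In databases, sharding splits a large dataset into smaller partitions (shards), typically by rows, that are stored on separate servers. Because each shard is smaller, it is faster to query and easier to manage.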
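A minimal sketch of hash-based sharding, assuming every row is routed by a key such as a user ID. The shard count, the key names, and the in-memory dicts standing in for separate database servers are all hypothetical.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of database servers

# Each dict stands in for one database server holding one shard.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key: str) -> int:
    """Map a key to a shard index with a stable hash (same key -> same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put(key: str, value: dict) -> None:
    shards[shard_for(key)][key] = value


def get(key: str):
    # Only one shard is consulted, so each lookup touches a smaller dataset.
    return shards[shard_for(key)].get(key)


for user_id in ["alice", "bob", "carol", "dave", "erin"]:
    put(user_id, {"name": user_id.title()})

print(get("carol"))  # {'name': 'Carol'}
```

With plain modulo routing, changing NUM_SHARDS remaps most keys to different shards; consistent hashing is one way to reduce that movement.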

Question 10

Q

Active-Active vs. Active-Passive

Answer

A

Both are used for high-availability configurations.
Active-Active - made up of at least two nodes, both actively running the same kind of service simultaneously. Used to achieve load balancing.
Active-Passive - made up of at least two nodes, but not all nodes are active. The passive (failover) server serves as a backup that’s ready to take over as soon as the active (primary) server gets disconnected or is unable to serve.

Question 11

Q

DNS

Answer

A

Domain Name System - a hierarchical naming system built on a distributed database for computers, services, or any resource connected to the Internet or a private network. Most importantly, it translates human-readable domain names into the numerical identifiers associated with networking equipment, enabling devices to be located and connected worldwide. Analogous to a network “phone book”, DNS is how a browser can translate a domain name (e.g. facebook.com) to the actual IP address of the server, which stores the information requested by the browser.

Question 12

Q

CDN

Answer

A

A network of servers that distributes content from an “origin” server throughout the world by caching content close to where each end user is accessing the internet via a web-enabled device. The content users request is first stored on the origin server, then replicated and stored elsewhere as needed.

Question 13

Q

Caching

Answer

A

A high-speed data storage layer which stores a subset of data, typically transient in nature, so that future requests for that data are served up faster than is possible by accessing the data’s primary storage location. Caching allows you to efficiently reuse previously retrieved or computed data.
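A minimal sketch of one common caching pattern, cache-aside (lazy loading): check the fast layer first and fall back to primary storage only on a miss. The store contents, the 50 ms delay, and the function names are hypothetical; a plain dict plays the role of the high-speed layer.

```python
import time
from typing import Optional

# Hypothetical primary storage location (for example, a database table).
PRIMARY_STORE = {"user:1": "Alice", "user:2": "Bob"}
cache = {}  # the high-speed layer: an in-memory dict


def slow_lookup(key: str) -> Optional[str]:
    time.sleep(0.05)  # simulate a 50 ms round trip to primary storage
    return PRIMARY_STORE.get(key)


def get(key: str) -> Optional[str]:
    """Cache-aside read: try the cache first, fall back to primary storage."""
    if key in cache:
        return cache[key]        # cache hit: no trip to primary storage
    value = slow_lookup(key)     # cache miss: fetch from the slow source
    if value is not None:
        cache[key] = value       # keep a copy so the next request is faster
    return value


for attempt in ("miss", "hit"):
    start = time.perf_counter()
    get("user:1")
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{attempt}: {elapsed_ms:.2f} ms")
```

The first call pays the simulated 50 ms; the second is served from the dict. A real cache would also need an eviction policy so it stays within its memory budget.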

Question 14

Q

Database Replication

Answer

A

Database replication is the process of copying data from a primary database to one or more replica databases to improve data accessibility, fault tolerance, and reliability. It is typically an ongoing process that occurs in real time as data is created, updated, or deleted in the primary database, but it can also run as one-time or scheduled batch jobs.

Question 15

Q

Redundancy

Answer

A

Redundancy is the duplication or mirroring of a device or data that helps prevent data from becoming lost or a device from becoming unavailable.

Question 16

Q

MapReduce

Answer

A

An algorithm/technique with two main tasks: Map and Reduce. Map takes a data set and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). Reduce takes the output from Map as its input and combines those tuples into a smaller set of tuples. The major advantage of MapReduce is that it is easy to scale data processing over multiple computing nodes.
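The classic word-count example, simulated in a single process as a sketch. The shuffle step that groups values by key between Map and Reduce is an addition here (frameworks perform it for you); the documents are made up.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: turn one input record into (key, value) tuples."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all values by key (handled by the framework between the phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a smaller result."""
    return key, sum(values)


documents = ["the cat sat", "the dog sat", "The end"]

# On a cluster, each map_phase call (and each reduce_phase call) could run on a
# different node, which is what makes the approach easy to scale out.
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'the': 3, 'cat': 1, 'sat': 2, 'dog': 1, 'end': 1}
```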

Question 17

Q

Cache Eviction

Answer

A

The process by which old, relatively unused, or excessively voluminous data can be dropped from the cache, allowing the cache to remain within a memory budget.

Question 18

Q

CAP Theorem

Answer

A

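States that a distributed data store can deliver at most two of three desired characteristics at the same time: consistency, availability, and partition tolerance. Since network partitions cannot be ruled out in practice, the real trade-off during a partition is between consistency and availability.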

Question 19

Q

ACID

Answer

A

The presence of four properties - atomicity, consistency, isolation, and durability - ensures that database transactions are processed reliably, keeping data valid even in the event of errors, power failures, or concurrent access. When a database possesses these properties, it is said to be ACID compliant.
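A minimal sketch of atomicity (the "A" in ACID) using Python's built-in sqlite3 module. The accounts table and the transfer function are hypothetical. Using the connection as a context manager commits the transaction if the block succeeds and rolls it back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> None:
    """Move money atomically: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failed debit shows the rollback undoing it.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
    except sqlite3.IntegrityError:
        print(f"transfer of {amount} failed and was rolled back")


transfer("alice", "bob", 30)   # succeeds
transfer("alice", "bob", 500)  # violates CHECK (balance >= 0), so bob's credit is undone
print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 70, 'bob': 80}
```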

Question 20

Q

BASE

Answer

A

Stands for:
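Basically Available - rather than enforcing immediate consistency, BASE-modelled NoSQL databases ensure availability of data by spreading and replicating it across the nodes of the database cluster.
Soft State - due to the lack of immediate consistency, data values may change over time.
Eventual Consistency - BASE does not enforce immediate consistency, but the data becomes consistent over time; until it does, reads are still possible, though they may return stale values.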

Question 21

Q

Strong vs. Eventual Consistency

Answer

A

Strong consistency means the latest data is always returned, but because the system must first coordinate between replicas, it may result in higher latency. With eventual consistency, reads may return stale data shortly after a write, but responses are provided much faster, with lower latency.
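A toy simulation of the trade-off, assuming one primary and one replica with asynchronous replication. The class and method names are hypothetical. Here a strong read first waits for the replica to catch up (the source of the extra latency); an eventual read answers immediately and may be stale.

```python
from collections import deque


class ReplicatedStore:
    """Toy primary/replica pair with asynchronous replication."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes accepted by the primary, not yet replicated

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))  # shipped to the replica later

    def replicate(self):
        """Apply queued writes in order (a background job in a real system)."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value

    def read_eventual(self, key):
        # Answer immediately from the replica: fast, but possibly stale.
        return self.replica.get(key)

    def read_strong(self, key):
        # Wait for the replica to catch up first: always current, but the
        # waiting is where the extra latency comes from.
        self.replicate()
        return self.replica.get(key)


store = ReplicatedStore()
store.write("x", 1)
print(store.read_eventual("x"))  # None (replica has not caught up yet)
print(store.read_strong("x"))    # 1
store.write("x", 2)
print(store.read_eventual("x"))  # 1 (stale)
store.replicate()
print(store.read_eventual("x"))  # 2 (replicas have converged)
```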

Question 22

Q

CPU

Answer

A

The Central Processing Unit is the electronic circuitry that executes the instructions comprising a computer program. The CPU performs calculations using its transistors (billions of them in modern chips). These calculations run the software that allows a device to perform its tasks.

Question 23

Q

HTTP vs. HTTP/2

Question 24

Q

TCP/IP Model

Question 25

Q

IPv4 vs. IPv6

Answer

A

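IPv4, the fourth version of IP, was introduced in 1983. It uses 32-bit addresses (about 4.3 billion in total), and the supply of available IPv4 addresses has been depleted. IPv6 uses 128-bit addresses, which allows vastly more combinations, and it is thus becoming the standard.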
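The difference in address space can be checked with Python's standard ipaddress module. The two sample addresses come from the ranges reserved for documentation (192.0.2.0/24 and 2001:db8::/32).

```python
import ipaddress

ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(f"IPv4: 2**32  = {ipv4_total:,}")    # IPv4: 2**32  = 4,294,967,296
print(f"IPv6: 2**128 = {ipv6_total:.3e}")  # IPv6: 2**128 = 3.403e+38

# The same idea written in each notation.
print(ipaddress.ip_address("192.0.2.1").version)    # 4
print(ipaddress.ip_address("2001:db8::1").version)  # 6
```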

Question 26

Q

TCP vs. UDP

Answer

A

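Both are transport layer protocols. The main difference is reliability. TCP is connection-oriented: it sets up a connection with a handshake, acknowledges received data, retransmits lost packets, and delivers data in order. UDP is connectionless and sends packets without acknowledgments, retransmission, or ordering guarantees, so it is faster and more efficient, but transmissions aren’t always reliable.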

Question 27

Q

CDNs and Edge

Answer

A

Edge Computing helps organizations distribute their computing resources closer to the network edge in order to provide users with better performance and more reliable access to data and applications. A CDN strategically places servers that provide cached data to users, which results in faster experiences and more reliable access to data.

Question 28

Q

Bloom Filters

Answer

A

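A Bloom filter is a probabilistic data structure based on hashing. It is extremely space efficient and is typically used to add elements to a set and test whether an element is in the set. The elements themselves are not stored. Instead, each added element is run through several hash functions, and the bits at the resulting positions in a bit array are set. A lookup can return a false positive (reporting an element that was never added) but never a false negative.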
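A minimal Bloom filter sketch. The sizes (1,024 bits, 3 hash functions) are arbitrary, and deriving several hash functions by salting SHA-256 with an index is one common trick, assumed here rather than taken from the source. One byte per bit is used for readability; a real implementation would pack the bits.

```python
import hashlib


class BloomFilter:
    """A bit array plus k hash functions; stores no elements themselves."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive k hash functions by prefixing the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["apple", "banana", "cherry"]:
    bf.add(word)

print(bf.might_contain("banana"))  # True
print(bf.might_contain("durian"))  # almost certainly False
```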

Question 29

Q

Bottleneck

Answer

A

A bottleneck occurs when the capacity of an application or a computer system is limited by a single component, like the neck of a bottle slowing down the overall water flow. The bottleneck has the lowest throughput of all parts of the transaction path.

Question 30

Q

Service Level Agreements/Assurances

Answer

A

An agreement that sets the expectations between the service provider and the customer and describes the products or services to be delivered, the single point of contact for end-user problems, and the metrics by which the effectiveness of the process is monitored and approved.
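Availability is a common SLA metric, usually stated as a percentage ("nines"). As an illustrative worked example, with the targets chosen for illustration only, the downtime each target allows per year is:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_pct: float) -> float:
    """Hours per year a service may be down while still meeting the target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_hours(target):.2f} hours/year")
# 99.0% -> 87.60 hours/year
# 99.9% -> 8.76 hours/year
# 99.99% -> 0.88 hours/year
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why higher targets usually require redundancy and automated failover.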

Question 31

Q

Forward vs. Reverse Proxy

Answer

A

The key difference between a reverse proxy and a forward proxy is that a forward proxy enables computers isolated on a private network to connect to the public internet, while a reverse proxy enables computers on the internet to access a private subnet.

Question 32

Q

Leader Election

Answer

A

Leader election is the simple idea of giving one thing (a process, host, thread, object, or human) in a distributed system some special powers. Those special powers could include the ability to assign work, the ability to modify a piece of data, or even the responsibility of handling all requests in the system.

Leader election is a powerful tool for improving efficiency, reducing coordination, simplifying architectures, and reducing operations. On the other hand, leader election can introduce new failure modes and scaling bottlenecks.

Question 33

Q

Consensus Algorithm

Answer

A

A process in computer science used to achieve agreement on a single data value among distributed processes or systems.

Question 34

Q

Polling

Question 35

Q

Streaming

Question 36

Q

Rate-Limiting

Question 37

Q

Denial of Service (DoS) Attack vs. Distributed DoS

Question 38

Q

Publisher vs. Subscriber Messaging

Question 39

Q

Idempotency

Question 40

Q

Concurrency

Question 41

Q

Performance vs. Scalability

Question 42

Q

Latency vs. Throughput

Question 43

Q

Throughput

Answer

A

The maximum capacity of a machine or system. It’s often used in factories to calculate how much work an assembly line can do in an hour, a day, or some other unit of time. In computing, it is the amount of data that can be passed around in a unit of time. So a 512 Mbps internet connection is a measure of throughput - 512 Mb (megabits) per second.
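A worked example of the units involved, assuming the connection sustains its full 512 Mbps with no protocol overhead and using decimal units (1 GB = 1,000 MB). Network rates are quoted in bits, file sizes in bytes, so the size has to be converted first.

```python
def transfer_seconds(size_megabytes: float, rate_mbps: float) -> float:
    """Time to move a payload at a given throughput (1 byte = 8 bits)."""
    size_megabits = size_megabytes * 8
    return size_megabits / rate_mbps


print(transfer_seconds(1000, 512))  # 15.625 (about 15.6 s for a 1 GB file)
```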

Question 44

Q

Latency

Answer

A

Latency is the measure of a duration: the time it takes for an action to complete or produce a result, for example, the delay between sending a request and receiving the response.

Question 45

Q

Availability vs. Consistency

Question 46

Q

Availability

Question 47

Q

Consistency

Question 48

Q

Consistency Patterns

Question 49

Q

Availability Patterns

Question 50

Q

Steps of SDI

Answer

A

Define Requirements
Rough Estimate of Scale
Mock Basic UI
Define Data Model
Define APIs
High Level Design
Detailed Design
Identify + Resolve Bottlenecks
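The "Rough Estimate of Scale" step is usually back-of-the-envelope arithmetic. A sketch with entirely hypothetical inputs (user count, request rate, peak factor, record size, write ratio):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Hypothetical inputs; in an interview these come from the requirements step.
daily_active_users = 10_000_000
requests_per_user_per_day = 20
peak_to_average = 2      # assume peak traffic is twice the average
writes_fraction = 0.1    # assume 10% of requests are writes
bytes_per_record = 500   # assume each write stores about 500 bytes

avg_qps = daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY
peak_qps = avg_qps * peak_to_average
daily_storage_gb = (daily_active_users * requests_per_user_per_day
                    * writes_fraction * bytes_per_record) / 1e9

print(f"average QPS ~ {avg_qps:,.0f}")                 # average QPS ~ 2,315
print(f"peak QPS    ~ {peak_qps:,.0f}")                # peak QPS    ~ 4,630
print(f"new storage ~ {daily_storage_gb:.0f} GB/day")  # new storage ~ 10 GB/day
```

Numbers like these feed the later steps: they indicate whether a single database suffices, how much to cache, and where bottlenecks are likely.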

Question 51

Q

Load Balancer Routing Methods

Answer

A

Round Robin - requests are distributed across the group of servers sequentially
Least Connections - a new request is sent to the server with the fewest current connections to clients. The relative computing capacity of each server is factored into determining which one has the least connections
Least Time - sends requests to the server selected by a formula that combines the fastest response time and fewest active connections
Hash - distributes requests based on a key you define, such as the client IP address or the request URL
IP Hash - the IP address of the client is used to determine which server receives the request
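A sketch of three of the routing methods above over a hypothetical three-server pool. Weighting least connections by dividing active connections by a per-server weight is one way to factor in capacity, assumed here. Least Time needs live response-time measurements, and generic Hash works like IP Hash with a different key, so both are left out.

```python
import hashlib
import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backend pool

# Round Robin: hand out servers in a fixed rotating order.
_rotation = itertools.cycle(servers)


def round_robin() -> str:
    return next(_rotation)


# Least Connections, weighted by capacity: weight 4 means 4x the capacity.
active = {"10.0.0.1": 4, "10.0.0.2": 1, "10.0.0.3": 3}
weight = {"10.0.0.1": 1, "10.0.0.2": 1, "10.0.0.3": 4}


def least_connections() -> str:
    return min(servers, key=lambda s: active[s] / weight[s])


# IP Hash: the same client IP always lands on the same server.
def ip_hash(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


print([round_robin() for _ in range(4)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
print(least_connections())  # 10.0.0.3 (3 / 4 = 0.75 is the lowest ratio)
print(ip_hash("203.0.113.7") == ip_hash("203.0.113.7"))  # True
```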

Question 52

Q

Internet Protocol

Answer

A

Set of rules for routing and addressing packets of data so that they can travel across networks and arrive at the correct destination. Data traversing the Internet is divided into smaller pieces, called packets. IP information is attached to each packet, and this information helps routers send packets to the right place. Every device or domain that connects to the Internet is assigned an IP address, and as packets are directed to the IP address attached to them, data arrives where it is needed.

Question 53

Q

Transmission Control Protocol

Question 54

Q

Disk Storage vs. Memory

Answer

A

Both terms refer to internal storage space in a computer, but each is used for a different purpose. “Memory” usually means RAM (Random Access Memory): fast, volatile hardware that holds the data and instructions of running programs, which lets the computer efficiently work on more than one task at a time (i.e. multitask). The terms “disk space” and “storage” usually refer to hard drive storage, which is non-volatile and typically used for long-term storage of various types of files.

Question 55

Q

Data Eviction Options

Answer

A

LIFO (Last In, First Out), FIFO (First In, First Out), LRU (Least Recently Used), LFU (Least Frequently Used)
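A minimal LRU cache sketch built on collections.OrderedDict; the class name and capacity are illustrative. (For memoizing function calls, Python's standard library also offers functools.lru_cache.)

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry once `capacity` is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # least recently used entry first

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value) -> None:
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            evicted, _ = self.data.popitem(last=False)  # drop the LRU entry
            print(f"evicted {evicted}")


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # prints "evicted b"
print(list(cache.data))  # ['a', 'c']
```

Removing the move_to_end call in get turns this into FIFO eviction, since entries are then dropped purely in insertion order.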

Question 56

Q

Proxy

Answer

A

A system or router that provides a gateway between users and the internet. Because traffic passes through it, it can help prevent cyber attackers from entering a private network. It is a server, referred to as an “intermediary” because it sits between end-users and the web pages they visit online.

Question 57

Q

Consistent Hashing

Answer

A

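Used in distributed systems to keep the hash table largely independent of the number of available servers, minimizing how many keys must be relocated when servers are added or removed. With K keys and N servers, adding or removing one server moves only about K/N keys on average, instead of most of them as with plain modulo hashing.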
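A sketch of a consistent hash ring. Giving each server many "virtual" points on the ring is a common refinement for spreading load evenly, assumed here; the server names, the 100 points per server, and SHA-256 as the ring hash are arbitrary choices.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a point on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring; each server owns many virtual points on it."""

    def __init__(self, servers, vnodes: int = 100):
        self.vnodes = vnodes
        self.points = []  # sorted (hash, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.points, (ring_hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self.points = [(h, s) for h, s in self.points if s != server]

    def lookup(self, key: str) -> str:
        # The key belongs to the first server point clockwise from its hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect(self.points, (ring_hash(key),))
        return self.points[idx % len(self.points)][1]


keys = [f"user:{n}" for n in range(10_000)]
ring = HashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.lookup(k) for k in keys}

ring.add("cache-d")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # roughly 1/4, not most of them
```

Every key that moves goes to the new server, and removing it again restores the original mapping; with plain modulo hashing, going from 3 to 4 servers would remap about three quarters of the keys.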

Question 58

Q

Deterministic

Answer

A

This is a key principle of a good hashing algorithm/function. It’s a fancy way of saying that identical inputs will generate identical outputs when passed into the function. For example, if I pass the string “Code” (case-sensitive) and the function generates a hash of 11002, then every time I pass in “Code” it must generate 11002 as an integer. If I pass in “code”, it will generate a different number (consistently).
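A small illustration using SHA-256 from Python's hashlib; the stable_hash name is hypothetical. Note that Python's built-in hash() for strings is salted per process (unless PYTHONHASHSEED is set), so it is deterministic only within one run. That is why a stable hash like SHA-256 is used when hashes are stored or shared across machines.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Deterministic: the same input always yields the same integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


print(stable_hash("Code") == stable_hash("Code"))  # True, on every run and machine
print(stable_hash("Code") == stable_hash("code"))  # False: the inputs differ by case
```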

Question 59

Q

Database Indexing

Answer

A

Indexes are a powerful tool used in the background of a database to speed up querying. So, rather than having to search the entire database, you can quickly access the index, which speeds up the process. Simply put, an index is a pointer to data in a table. An index in a database is very similar to an index in the back of a book.
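A sketch using Python's built-in sqlite3 module to show the effect of an index; the users table, the generated rows, and the index name are hypothetical. EXPLAIN QUERY PLAN reports whether SQLite scans the whole table or searches via an index (the exact wording varies between SQLite versions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(10_000)],
)


def plan(sql: str) -> str:
    """Return SQLite's description of how it will run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("user42@example.com",)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description


query = "SELECT name FROM users WHERE email = ?"
print(plan(query))  # a full-table SCAN of users

conn.execute("CREATE INDEX idx_users_email ON users (email)")
print(plan(query))  # a SEARCH of users USING INDEX idx_users_email
```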

Question 60

Q

Collision

Answer

A

In hashing, a collision occurs when two or more different inputs produce the same output.

Question 61

Q

Distributed Systems

Answer

A

A computing environment in which various components are spread across multiple computers (or other computing devices) on a network. These devices split up the work, coordinating their efforts to complete the job more efficiently than if a single device had been responsible for the task. Distributed systems reduce the risks involved with having a single point of failure, bolstering reliability and fault tolerance. They are also designed to scale in near real time, by adding or removing machines as demand changes.

Question 62

Q

Hashing

Answer

A

The classic hashing approach uses a hash function to generate a pseudo-random number, which is then reduced modulo the size of the memory space (the remainder after dividing by the number of available slots) to transform the random identifier into a position within the available space.
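A sketch of hash-then-modulo slot selection, which also shows why collisions are unavoidable: with more keys than slots, at least two keys must share a slot. The table size and key names are hypothetical, and storing colliding keys in a per-slot list (separate chaining) is one common way to resolve collisions, assumed here.

```python
import hashlib

NUM_SLOTS = 8  # hypothetical table size


def slot_for(key: str) -> int:
    """Hash the key, then take the remainder to get a position in the table."""
    h = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")
    return h % NUM_SLOTS


table = [[] for _ in range(NUM_SLOTS)]  # each slot is a bucket (separate chaining)
keys = [f"key{i}" for i in range(NUM_SLOTS + 1)]
for key in keys:
    table[slot_for(key)].append(key)

# 9 keys into 8 slots: by the pigeonhole principle, some slot holds 2+ keys.
collisions = [bucket for bucket in table if len(bucket) > 1]
print(collisions)
```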

Question 63

Q

WebSocket vs. HTTP

Answer

A

HTTP and WebSocket are both communication protocols used in client-server communication.
HTTP is unidirectional: the client sends a request and the server sends a response. In the classic model, the connection is closed after the response, so each HTTP or HTTPS request establishes a new connection to the server (HTTP/1.1 keep-alive and HTTP/2 let a connection be reused for multiple requests). WebSocket is bidirectional: a full-duplex protocol used in the same client-server scenario. It is a stateful protocol, which means the connection between client and server stays open until either party terminates it.
