Coding university Flashcards

1
Q

What is Hamming Code?

A

In telecommunication, Hamming codes are a family of linear error-correcting codes that generalize the Hamming(7,4)-code, and were invented by Richard Hamming in 1950. Hamming codes can detect up to two-bit errors or correct one-bit errors without detection of uncorrected errors.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Using bitwise operations, how would you test that a number is a power of 2

A

bool isPowerOfTwo = (x & (x - 1);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What does ELF stand for?

A

Executable and Linkable Format. It’s a common standard file format for executables, object code, shared libraries, and core dumps.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

example of a latency device

A

CPU core

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

example of a throughput device

A

GPU core

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is the Hamming Distance?

A

A number used to denote the number of differences between two binary strings of the same length.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are the 5 steps of the compiling process?

A

Lexical Analysis

Parsing

Semantic Analysis

Optimization

Code Generation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is parsing?

A

Combining tokens and groups of tokens into a tree structure (a parse tree).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is lexical analysis?

A

The process of dividing program text into words or tokens.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is code generation?

A

Producing a translation from a high-level program to assembly code. (Linker and Archiver taker over from here to produce machine code)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How would you turn OFF the 3rd bit from the end in a bitstring?

A

x &= ~(1 &laquo_space;2)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Write a function that calculates the Hamming distance.

A

def hamming_distance(x, y):

difference = x ^ y

count = 0

while difference != 0:

    count += 1

    difference &= difference - 1

return count
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Write a function to calculate the Hamming weight of an integer. (Kernighan method)

A

int countSetBits(int n) {

int count = 0;

while (n) {

n = n & (n - 1);

\++count;

}

return count;

}

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Write a function that calculates the Hamming weight in constant time. Divide and Conquer strategy

A

int countSetBits(unsigned int n) {

n = n - ((n&raquo_space; 1) & 0x55555555);

n = (n & 0x33333333) + ((n&raquo_space; 2) & 0x33333333);

n = (n + (n&raquo_space; 4)) & 0x0F0F0F0F;

n = n + (n&raquo_space; 8);

n = n + (n&raquo_space; 16);

return n & 0x0000003F;

}

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Write a function that tells you if a number is even, using bitwise operation

A

def is_even(x):

return x & 1 == 0
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Write a function to add 2 integers using bitwise operations.

A

def add(a, b):

while a:

    c = b & a

    b ^= a

    c <<= 1

    a = c

return b
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Write a function to get the sign of an integer

A

def get_sign(x):

return -(x < 0)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Write a function to calculate the absolute value of a 32-bit integer

A

def myabs(x):

high_bit_mask = x >> 31

return (x ^ high_bit_mask) - high_bit_mask
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What is a red-black tree?

A

BSTs having red and black links satisfying:

  • Red links lean left
  • No node has two links connected to it
  • The tree has perfect black balance: every path from the root to a null link has the same number of blacks
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is a splay tree?

A

A self-adjusting binary search tree where recently accessed elements are moved to the root so they are quick to access again.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is a treap?

A

A random priority is assigned to every key and must maintain two properties:

  • They are in order with respect to their keys, as in a typical binary search tree
  • They are in heap order with respect to their priorities, that is, no key has a key of lower priority as an ancestor

O(log N) expected time for all operations, O(N) worst case

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What is typical cache line size?

A

64 bytes. To know the sizes, you need to look it up using the documentation for the processor, afaik there is no programatic way to do it. On the plus side however, most cache lines are of a standard size, based on intels standards. On x86 cache lines are 64 bytes, however, to prevent false sharing, you need to follow the guidelines of the processor you are targeting (intel has some special notes on its netburst based processors), generally you need to align to 64 bytes for this (intel states that you should also avoid crossing 16 byte boundries).

To do this in C or C++ requires that you use aligned_malloc or one of the compiler specific specifiers such as __attribute__((align(64))) or __declspec(align(64)). To pad between members in a struct to split them onto different cache lines, you need on insert a member big enough to align it to the next 64 byte boundery

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is latency?

A

Latency is the delay from input into a system to desired outcome. The time interval between between a stimulus and response.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What is a y-fast trie?

A

A y-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982 to decrease the O(n log M) space used by an x-fast trie.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

What is an x-fast trie?

A

An x-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n log M) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982, along with the more complicated y-fast trie, as a way to improve the space usage of van Emde Boas trees, while retaining the O(log log M) query time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

What is a van Emde Boas tree?

A

The van Emde Boas tree supports insertions, deletions, lookups, successor queries, and predecessor queries in time O(log log U), where U is the universe of items to store. Items are stored in clusters of size sqrt(U).

The van Emde Boas data structure divides the range {0,…,n−1} into blocks of size sqrt(n), which we call clusters. Each cluster is itself a vEB structure of size sqrt(n). In addition, there is a “summary” structure that keeps track of which clusters are nonempty.

—More details —-
A van Emde Boas tree (or van Emde Boas priority queue), also known as a vEB tree, is a tree data structure which implements an associative array with m-bit integer keys. It performs all operations in O(log m) time, or equivalently in O(log log M) time, where M = 2m is the maximum number of elements that can be stored in the tree. The M is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured. The vEB tree has good space efficiency when it contains a large number of elements, as discussed below. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

What is a compressed trie?

A

It’s a trie where the non-branching paths are compacted into a single edge.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

What relationship of the keys do you lose with a hash table?

A

The ordering of the keys.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

Is quicksort stable?

A

No

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

Can quicksort be done in-place?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

Can merge sort be done in-place?

A

No. It requires O(n) space

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

Is merge sort stable?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

Is insertion sort stable?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

Can insertion sort be done in-place?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

Can selection sort be done in-place?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

Is selection sort stable?

A

No

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

Is heap sort stable?

A

No

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q

Can heap sort be done in-place?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

Write merge sort in C (check answer carefully)

A

void merge(int numbers[], int low, int mid, int high) {

// temp array for holding sorted items

int b[high - low - 1];

int i = low;

int j = mid + 1;

int k = 0;

// merge items from list in order

while (i <= mid && j <= high) {

if (numbers[i] <= numbers[j]) {

  b[k++] = numbers[i++];

} else {

  b[k++] = numbers[j++];

}

}

// copy the remaining items to tmp array

while (i <= mid) b[k++] = numbers[i++];

while (j <= high) b[k++] = numbers[j++];

–k;

while (k >= 0) {

numbers[low + k] = b[k];

--k;

}

}

void merge_sort(int numbers[], int low, int high) {

if (low < high) {

int mid = (low + high) / 2;

merge_sort(numbers, low, mid);

merge_sort(numbers, mid + 1, high);

merge(numbers, low, mid, high);

}

}

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

Write a MergeSort class in Python (check answer carefully)

A

class MergeSort(object):

def \_\_init\_\_(self, numbers):

    self. values = numbers
    self. count = len(numbers)

def sort(self):

    self.merge_sort(0, self.count - 1)

    return self.values

def merge_sort(self, low, high):

    if low < high:

        mid = (low + high) // 2

        self. merge_sort(low, mid)
        self. merge_sort(mid + 1, high)
        self. merge(low, mid, high)

def merge(self, low, mid, high):

    b = []

    i = low

    j = mid + 1

    while i <= mid and j <= high:

        if self.values[i] <= self.values[j]:

            b.append(self.values[i])

            i += 1

        else:

            b.append(self.values[j])

            j += 1

    while i <= mid:

        b.append(self.values[i])

        i += 1

    while j <= high:

        b.append(self.values[j])

        j += 1

    for index, val in enumerate(b):

        self.values[low + index] = val
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q

Write quick sort in C with only one method and random pivot (check answer carefully)

A

void quick_sort(int numbers[], int left, int right) {

if (left == right)

return;

int i = left;

int j = right;

int temp = 0;

int count = right - left;

int pivot_mod = rand() % count;

int pivot = numbers[left + pivot_mod];

while (i <= j) {

while (numbers[i] < pivot) ++i;

while (numbers[j] > pivot) --j;

if (i <= j) {

  temp = numbers[i];

  numbers[i] = numbers[j];

  numbers[j] = temp;

  \++i;

  --j;

}

}

if (left < j) {

quick_sort(numbers, left, j);

}

if (right > i) {

quick_sort(numbers, i, right);

}

}

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
42
Q

Write a QuickSort class in Python (check answer carefully)

A

import random

class QuickSort(object):

def \_\_init\_\_(self, numbers):

    self. values = numbers
    self. count = len(self.values)

def sort(self):

    self.quick_sort(0, self.count - 1)

    return self.values

def quick_sort(self, left, right):

    if left == right:

        return

    i = left

    j = right

    pivot_index = random.randint(left, right)

    pivot = self.values[pivot_index]

    while i <= j:

        while self.values[i] < pivot:

            i += 1

        while self.values[j] > pivot:

            j -= 1

        if i <= j:

            if i < j:

                temp = self.values[i]

                self. values[i] = self.values[j]
                self. values[j] = temp

            i += 1

            j -= 1

    if left < j:

        self.quick_sort(left, j)

    if right > i:

        self.quick_sort(i, right)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
43
Q

In what case would perfect hashing be practical?

A

When you don’t need to support inserts or deletes. The data is static.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
44
Q

How does perfect hashing handle collisions?

A

It creates a second hash table in the buckets where there are multiple items (k), using a second hash function, and k^2 space. The hash table has two hashing levels. k^2 is chosen because the Markov inequality (birthday paradox) ensures we will not have collisions in bucket.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
45
Q

What is the optimal load factor for a hash table?

A

O(sqrt(n))

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
46
Q

What is the expected load factor for a hash table?

A

n/m, where n = items, m = buckets) n/m is also called alpha.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
47
Q

What is the technical running time for operations on a hash table?

A

O(1 + alpha), where alpha is the load factor (n/m). Table doubling operations are amortized.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
48
Q

What is the worst-case search time of perfect hashing?

A

O(1)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
49
Q

What is the worst-case space required for perfect hashing?

A

O(n)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
50
Q

What’s the best-case running time of binary search?

A

O(1) - we get lucky and find the element right at the midpoint.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
51
Q

What’s the worst-case running time of binary search?

A

O(log n)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
52
Q

What are the downsides of using an adjacency matrix to represent a graph?

A

Finding all the outgoing edges from a vertex takes O(n) time even if there aren’t very many, and the O(n^2) space cost is high for ““sparse graphs,”” those with much fewer than n^2 edges.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
53
Q

When is using an adjacency list expensive?

A

Finding predecessors of a node u is extremely expensive, requiring looking through every list of every node in time O(n + e), where e is the total number of edges, although if this is something we actually need to do often we can store a second copy of the graph with the edges reversed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
54
Q

When are adjacency lists most useful?

A

Adjacency lists are most useful when we mostly want to enumerate outgoing edges of each node. This is common in search tasks, where we want to find a path from one node to another or compute the distances between pairs of nodes. If other operations are important, we can optimize them by augmenting the adjacency list representation; for example, using sorted arrays for the adjacency lists reduces the cost of edge existence testing to O(log(d+ (u))), and adding a second copy of the graph with reversed edges lets us find all predecessors of u in O(d− (u)) time, where d− (u) is u’s in-degree.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
55
Q

What is the space required for a graph using an adjacency list?

A

O(n + e)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
56
Q

Given a fully balanced binary tree with x nodes, what is the height of the tree in nodes?

A

log(base2) x + 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
57
Q

Given a fully balanced k-ary tree with x nodes, what is the height of the tree in nodes?

A

log(basek) x + 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
58
Q

A binary tree with height h can contain at most how many nodes?

A

2^(h+1) − 1 nodes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
59
Q

For a k-ary tree with height h, the upper bound for the maximum number of leaves is:

A

k^h

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
60
Q

What is the complexity of Dijkstra’s shortest-path algorithm?

A

O(e log v), where e is the number of edges.

It must scan each edge, and gets and updates values on the heap.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
61
Q

What is a drawback of using an adjacency matrix for an undirected graph?

A

Half of the entries in the matrix are duplicates.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
62
Q

What is the memory needed to store an adjacency list?

A

Theta( |V| + |E| )

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
63
Q

What is the memory needed to store an adjacency matrix?

A

Theta(|V|^2)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
64
Q

How would you implement a queue with a linked list?

A

Use a tail pointer. Push new items at the tail, pop items at the head. Both operations are constant-time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
65
Q

How would you implement a stack with a linked list?

A

Push and pop items at the head. Both operations are constant-time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
66
Q

What preference of nodes vs leaves does preorder traversal give on a tree?

A

Nodes first, leaves later.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
67
Q

What preference of nodes vs leaves does postorder traversal give on a tree?

A

Leaves first, internal nodes later.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
68
Q

What could you use in DFS to turn a recursive algorithm into an interative one?

A

A stack

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
69
Q

What do you use to keep track of nodes to visit in BFS?

A

queue

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
70
Q

Using a stack to keep track of unvisited nodes gives what kind of traversal?

A

DFS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
71
Q

Using a queue to keep track of unvisited nodes gives what kind of traversal?

A

BFS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
72
Q

In a highly connected graph of n vertices, how many cycles can there be?

A

(n - 1)! - enumerating is possible (using backtracking), but there will be a lot.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
73
Q

What can use to find if a graph is bipartite?

A

BFS. Using only 2 colors. When you encounter a new vertex, if it has no color, give it the opposite color of its parent vertex. If it is already colored the same, the graph is not bipartite.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
74
Q

How can you find a cycle in a graph?

A

DFS. If you discover an edge that connects to an ancestor (previously discovered vertex), you have a cycle.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
75
Q

What is an articulation vertex?

A

A vertex of a graph whose deletion disconnects the graph.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
76
Q

How can you find an articulation vertex?

A

DFS multiple times. Remove each edge one at a time, doing a DFS after each, so see if you end up with > 1 connected components. If you remove a node and then DFS and find you have fewer than m - 1 edges, you’ve deleted an articulation vertex. O(n(n+m)).

A faster way, with a little more bookkeeping, can be done in O(n+m) time, if you do DFS and keep track of parents and make a note when you reach a back edge, which connects to an ancestor.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
77
Q

How could you identify errors in a DNA fragment assembly given many pairs of sequences, where item A must appear before B in the larger sequence?

A

Build a DAG representing all the left-right constraints. Any topological sort of the DAG is a consistent ordering. If there are cycles, there must be errors.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
78
Q

What path does BFS find in a graph?

A

The shortest path tree from start to all nodes (unweighted)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
79
Q

What’s the upper bound on the number of edges in a graph G(V, E)?

A

|V|^2

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
80
Q

In Python, initialize a list of lists called x with 100 elements.

A

x = [[] for i in range(100)]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
81
Q

In Python, declare a named tuple called Edge, with attributes vertex and weight.

A

from collections import namedtuple

Edge = namedtuple(‘Edge’, [‘vertex’, ‘weight’])

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
82
Q

What is the optimal substructure property tell us about shortest paths?

A

That a subpath of a shortest path is also a shortest path.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
83
Q

C++: How would you initialize a vector of 25 integers to all zeroes?

A

std::vector mynums(25);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
84
Q

What is a Dunder method?

A

A magic method in Python, such as __getitem__ and __len__.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
85
Q

What is the sum of numbers from 1 to 2^n?

A

2^(n+1) - 1

The sum of a sequence of powers is roughly equal to the next value in the sequence.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
86
Q

How many ways can you rearrange a string of n unique characters

A

n!

Permutations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
87
Q

How many ways can you arrange k characters from n unique characters?

A

n! / (n - k)!

Permutation of n elements of size k.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
88
Q

How many subsets (ordering doesn’t matter) of size k are there in n unique characters?

A

n! / k!(n - k)!

This is n choose k.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
89
Q

Python: Write a class function to tell if the graph is bipartite. Start with vertex 0. You can access the adjacency list for a vertex v with: self.adjacency_list[v]

A

def is_bipartite(self):

    """"""

    Returns true if graph is bipartite

    :rtype: bool

    """"""

    colorings = {}

    to_visit = queue.Queue()

    to_visit.put(0)

    colorings[0] = 0

    while not to_visit.empty():

        v = to_visit.get()

        for u in self.adjacency_list[v]:

            if u not in colorings:

                colorings[u] = 1 - colorings[v]

                to_visit.put(u)

            elif colorings[u] == colorings[v]:

                return False

    return True
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
90
Q

What should you avoid in your base case in recursion

A

Too many base case scenarios. Just have one base case so you can return as quickly as possible. Avoid ““arm’s length”” recursion.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
91
Q

What is the bandwidth of a graph?

A

The longest edge in the permutation that gives you the shortest edges.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
92
Q

When talking dynamic programming, what is feasibility?

A

The rules the algorithm must adhere to in reaching its solution.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
93
Q

When talking dynamic programming, what is optimality?

A

An algorithm has optimality if the subsolutions of an optimal solution of the problem are themsleves optimal solutions for their subproblems.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
94
Q

What is dynamic programming?

A

Dynamic programming is a general-purpose algorithm design technique that is most often used to solve combinatorial optimization problems, where we are looking for the best possible input to some function chosen from an exponentially large search space.

There are two parts to dynamic programming. The first part is a programming technique: dynamic programming is essentially divide and conquer run in reverse: we solve a big instance of a problem by breaking it up recursively into smaller instances; but instead of carrying out the computation recursively from the top down, we start from the bottom with the smallest instances of the problem, solving each increasingly large instance in turn and storing the result in a table. The second part is a design principle: in building up our table, we are careful always to preserve alternative solutions we may need later, by delaying commitment to particular choices to the extent that we can.

The bottom-up aspect of dynamic programming is most useful when a straightforward recursion would produce many duplicate subproblems. It is most efficient when we can enumerate a class of subproblems that doesn’t include too many extraneous cases that we don’t need for our original problem.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
95
Q

What is the complexity for a naive recursive Fibonacci function

A

Θ(φ^n), where phi(φ) is the golden ratio (1 + sqrt(5)) / 2.

approx: 1.618

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
96
Q

Write a dynamic programming version of computing Fibonacci for n.

A

def fib(n):

fibValues = [0,1]

for i in range(2, n+1):

  fibValues.append(fibValues[i-1] + fibValues[i-2])

return fibValues[n]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
97
Q

Write a dynamic programming implementation of longest common subsequence of 2 strings.

A

def longest_common_subsequence(sequence1, sequence2):

cols = len(sequence1) + 1   # Add 1 to represent 0 valued column for DP

rows = len(sequence2) + 1   # Add 1 to represent 0 valued row for DP

T = [[0 for _ in range(cols)] for _ in range(rows)]

max_length = 0

for i in range(1, rows):

    for j in range(1, cols):

        if sequence2[i - 1] == sequence1[j - 1]:

            T[i][j] = 1 + T[i - 1][j - 1]

        else:

            T[i][j] = max(T[i - 1][j], T[i][j - 1])

        max_length = max(max_length, T[i][j])

return max_length
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
98
Q

What is the difference between __str__ and __repr__?

A

def __str__(self): - meant to be human readable

def __repr__(self): - mean to represent the object and be unambiguous, usually as the constructor: for a Vector class, would be:

def __repr__(self):

return ""Vector({!r}, {!r})"".format(self.x, self.y)

OR

return ""Vector(%r, %r)"" % (self.x, self.y)

Containers use __repr__ of elements when __str__ is called on container.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
99
Q

How many subsets are there in n items?

A

2^n

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
100
Q

What is a contiguously-allocated structures, and give examples

A

Contiguously-allocated structures are composed of single slabs of memory, and include arrays, matrices, heaps, and hash tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
101
Q

What are linked data structures and give examples.

A

Linked data structures are composed of distinct chunks of memory bound together by pointers, and include lists, trees, and graph adjacency lists.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
102
Q

What are some benefits of arrays?

A

Constant-time access given the index

  • Space efficiency
  • Memory locality
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
103
Q

Why is memory locality important?

A

Physical continuity between successive data accesses helps exploit the high-speed cache memory on modern computer architectures.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
104
Q

What are some advantages to linked lists over arrays?

A
  • Overflow on linked structures can never occur unless the memory is actually full.
  • Insertions and deletions are simpler than for contiguous (array) lists.
  • With large records, moving pointers is easier and faster than moving the items themselves.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
105
Q

What are some advantages to arrays over linked lists?

A
  • Linked structures require extra space for storing pointer fields.
  • Linked lists do not allow efficient random access to items.
  • Arrays allow better memory locality and cache performance than random pointer jumping.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
106
Q

Given two strings str1 and str2, find the minimum number of edits (edit one character to another, delete char from str1 or delete char from str2) to change str1 to str2.

A

”””””””

  • DP Runtime : O(len(str1) * len(str2))

””””””

def min_edit_distance(str1, str2):

rows = len(str2) + 1

cols = len(str1) + 1

T = [[0 for _ in range(cols)] for _ in range(rows)]

for j in range(cols):

    T[0][j] = j

for i in range(rows):

    T[i][0] = i

for i in range(1, rows):

    for j in range(1, cols):

        if str2[i - 1] == str1[j - 1]:

            T[i][j] = T[i - 1][j - 1]

        else:

            T[i][j] = 1 + min(T[i - 1][j - 1], T[i - 1][j], T[i][j - 1])

print_edits(T, str1, str2)

return T[rows - 1][cols - 1]

if __name__ == ‘__main__’:

str1 = ""azced""

str2 = ""abcdef""

expected = 3

assert expected == min_edit_distance(str1, str2)

assert expected == min_edit_distance(str2, str1)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
107
Q

What is a finalizer in Python?

A

A finalizer is a destructor, named __del__. __del__() is run when the runtime is about to destroy the object.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
108
Q

What are 2 advantages of reference counting?

A
  • easy to implement

- collects garbage incidentally without large pauses in execution.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
109
Q

What are 2 disadvantages of reference counting?

A
  • it cannot collect circular references

- manipulating reference counts at each assignment is very slow.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
110
Q

What is cyclic garbage collection?

A

Detects and removes cycles unreachable by the program.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
111
Q

How is garbage collection done in PyPy?

A

The GC implementation can be chosen at runtime. It’s pluggable.

PyPy uses mark and sweep, and generational gc optimization. Marked objects are promoted from the nursery to an older generation.

PyPy uses incremental garbage collection, where major collection is split into multiple passes, each lasting only a few milliseconds.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
112
Q

How does mark and sweep work?

A

(In Java) Perform a DFS on the graph of references to objects. This graph can have multiple roots. Each root is a reference that the program can access directly, such as a variable. Traverse the graph, setting a mark bit in each object. The sweep phase causes unmarked memory to be linked together in a list, so that memory can be reallocated. Sometimes this also triggers compaction, which moves used objects adjacent to each other in memory. The side effect of this is that free memory is also adjacent to free memory so large blocks can be allocated.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
113
Q

What is copying garbage collection (stop and copy)?

A

Heap memory is split into 2 partitions: an old space and a new space. Find live objects by DFS of their reference graph, and move live objects into the new space. The new space is now called the old space. Unreachable objects are simply left in the old space to be overwritten the next time collection occurs. The movement of objects implicitly compacts the objects. Disadvantage: you can only use half of the heap space.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
114
Q

How could you implement an LRU cache?

A

A fast lookup table, like a hash table or binary tree, and a linked list of items by use. When you access or add an item, you delete it from the linked list and add it to the head of the list. Then to prune, traverse the linked list and remove trailing elements, and delete them from the storage (tree or hash table).

You can also use a splay tree, since it moves accesses to the root. To prune items, somehow find and remove the leaves, since the number of leaves will be about n/2.”
What is a direct mapped cache? It’s a type of cache used in the CPU, where the lower order bits of a given memory address are used modulo the number of cache lines to place or lookup in the cache. Collisions are treated as overwrites.
What is a fully-associative cache? “It’s a type of cache used in the CPU, where lookups are done on all cache lines in parallel to determine a hit or miss.

This requires a very large number of comparators that increase the complexity and cost of implementing large caches. Therefore, this type of cache is usually only used for small caches, typically less than 4K.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
115
Q

Disadvantage of a fully-associative cache?

A

expensive due to parallel checks

complexity of implementing this scheme

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
116
Q

How do some processors handle caching for data and instructions?

A

There will be a slightly slower (3-4 clocks latency) separate cache for data.
What is an N-way set associative cache? A Set-Associative cache scheme is a combination of Fully-Associative and Direct Mapped caching schemes. A set-associate scheme works by dividing the cache SRAM into equal sections (2 or 4 sections typically) called cache ways. The cache page size is equal to the size of the cache way. Each cache way is treated like a small direct mapped cache.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
117
Q

Is GET idempotent?

A

YEs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
118
Q

Is PUT idempotent?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
119
Q

Is POST idempotent?

A

No

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
120
Q

Is DELETE idempotent?

A

According to the REST spec, yes, but it’s up to the developer to conform to that. It can be achieved by using a deleted flag for a resource instead of completely removing the resource.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
121
Q

What is idempotent?

A

The property that a method has side-effects of making more than one identical requests is the same as for a single request.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
122
Q

What is HMAC?

A

HMAC is a keyed-hash message authentication code used to provide a checksum for a message, sent along with the message to provide confidence that the message has not been tampered.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
123
Q

What is a MAC?

A

MAC is a message authentication code used to provide a checksum for a message, sent along with the message to provide confidence that the message has not been tampered.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
124
Q

How does RSA work?

A

It’s a public/private key cryptography method. The public key can be used to encrypt a message into ciphertext that only the owner of the key can decrypt. The owner of the key uses their secret key to encrypt messages, and their secret key to decrypt messages encrypted with their public key.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
125
Q

What is the phi function?

A

It answers the number of integers <= n that do not share a common factor with n.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
126
Q

What is phi(n) if n is prime?

A

n-1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
127
Q

What is the gcd of a number a and prime p when p is prime?

A

1, unless a is a pseudoprime (Carmichael number)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
128
Q

What is the largest output size of SHA-3?

A

512 bits

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
129
Q

What are desirable properties of one-way functions?

A
  • collision resistant
  • target collision resistant
  • non-malleable
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
130
Q

If a one-way function is collision-resistant, does that mean it’s also target collision-resistant?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
131
Q

If a one-way function is target collision-resistant, does that mean it’s also collision-resistant?

A

no

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
132
Q

What is symmetric key encryption?

A

There is a known encryption function, and one key is used to encrypt and decrypt. The key has to be shared between 2 parties.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
133
Q

How does Diffie-Hellman key exchange work?

A

2 parties agree on a G and a modulus p, and each party comes up with a number. One party does G^a and the other G^b. They pass this information. One party A computes the key from B as B^a mod p. B computes A^b mod p to get the key.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
134
Q

Is Diffie-Hellman key exchange perfect?

A

No. A man in the middle can intercept one side, and communicate with parties A and B independently.
How is RSA (using product of large primes) better than using NP-Complete algorithms for encryption? NP-Complete algorithms are hard in the worst case, but can be sometimes solved in linear time in the average case. Compositing the product of large primes is hard in the average case.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
135
Q

What is Vigenere cipher?

A

Improvement on Caesar cipher. Letters are shifted based on a shifted dictionary. ““Polyalphabetic cipher””

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
136
Q

What is a one-time pad encryption?

A

“The ““perfect”” simple encryption scheme. Pad/key is the same size as the message being encrypted. The key is randomly generated and xored against the plain text. Or key used to determine the amount each letter should be shifted.”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
137
Q

What is block size in cryptography?

A

Symmetric key ciphers are generally divided into stream ciphers and block ciphers. Block ciphers operate on a fixed length string of bits. The length of this bit string is the block size. Both the input (plaintext) and output (ciphertext) are the same length; the output cannot be shorter than the input – this is logically required by the Pigeonhole principle and the fact that the cipher must be invertible – and it is simply undesirable for the output to be longer than the input.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
138
Q

What is the limiting factor of compression?

A

For lossless compression, it’s entropy. For lossy compression, it’s our acceptance with the amount of loss.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
139
Q

What is LZ* compression based on?

A

Cataloging the positions and lengths of redundant patterns and combining the values with a dictionary.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
140
Q

What is LZMA?

A

It’s a variant of LZ77 that uses Markov chains. It’s used in the 7z compression algorithms used in 7-zip.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
141
Q

What is DEFLATE?

A

It’s an lossless compression algorithm based on LZ77 used in Gzip, WinZip, and mod_deflate, which is bundled with Apache web server for automated gzip compression of HTTP served content. It uses LZ77 and Huffman coding.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
142
Q

How does LZ77-based compression work?

A

LZ77 is a dictionary encoding algorithm, which is a statistical encoding algorithm. Compression in the LZ77 algorithm is based on the notion that strings of characters (words, phrases, etc.) occur repeatedly in the message being compressed.

The input is partitioned into 2 segments: a search buffer and a look-ahead buffer. The search buffer maxes out at 32KB. Starting with one character in the LA buffer, it looks back in the search buffer to find a copy of the symbol. If one is found, it looks at the second symbol of the LA buffer to see if it also matches the predecessor. Using this method, it can detect long phrases of symbols and encode them as one unit.

This process implicitly creates a rolling statistical probability for each symbol/phrase.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
143
Q

What is Huffman encoding?

A

Huffman encoding algorithm analyzes the occurrence of individual symbols and creates a binary tree where the common symbols are closest to the root, using fewer bits to encode, and less common/rare symbols have longer paths on the tree, with longer encodings to accommodate. By traversing the tree, from root to leaf, and keeping track of 1 or 0 at each node, we can determine the encoding of the symbol.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
144
Q

What is the primary factor of compression?

A

Probability of redundant portions of input.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
145
Q

How can you maximize compression?

A

By deeply analyzing the given input to reduce redundancy as much as possible.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
146
Q

What compression scheme uses Burrows-Wheeler transform?

A

BZip2

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
147
Q

What is the Burrows-Wheeler transform?

A

It’s a compression method involving the sorting of all possible rotations of the input text into lexicographic order. Take as output the last column and the index of the row that the original text appears in.

To decode, take the single column and repeatedly add the final columns characters to each of the rows, sorting each time. Once you’ve reached the length of the column’s height, use the index to find the output string.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
148
Q

For Gzip in web servers, what is the usual setting?

A

6

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
149
Q

What is the min and max compression settings in command line gzip?

A

0-9

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
150
Q

How can you make JSON better compressable with Gzip?

A

Transpose from multiple mini-dicts into one dict with arrays as the values. This allows the items in an array to fit within the 32KB search buffer common to LZ-based compression.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
151
Q

What are Dense Codes?

A

A way to put symbols or words into a dictionary or array, and use the indices as the values in the text to save space so that words are not repeated.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
152
Q

What is the LZ in LZ compression?

A

Lempel-Ziv

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
153
Q

What is OS hardware virtualization?

A

The abstraction of heterogeneous hardware provided by the operating system, to hide the details of interfacing with various hardware so that they share a common interface for that type.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
154
Q

What is a process?

A

An instance of an executing program consisting of an address space and one or more threads of control. It has restricted rights. It owns a region of memory. It owns file descriptors, file system context. It encapsulates one or more threads sharing the process’ resources. It is isolated from other processes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
155
Q

What is a context switch?

A

The copying out and in of register state to switch from running one process to running another.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
156
Q

What is base and bound?

A

It’s a memory addressing restriction where a processes are only allowed access to the memory between a base address and the bound + base addresses. Each process has its own base and bound. A drawback is you don’t get address 0. Address translation fixes this.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
157
Q

How does the OS know how to handle an interrupt?

A

It keeps an interrupt vector in the memory of the OS. Each interrupt type is mapped to an address to execute. They are just pointers to code in the OS.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
158
Q

How are base and bound enforced?

A

They are stored in registers. Access is restricted by the hardware.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
159
Q

When a process forks, what happens?

A

The process is paused, and a complete copy is made: code, stack, heap, data, program counter and registers.

The child and parent resume with returning from fork syscall.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
160
Q

What does fork() return?

A

It returns the child process id to the parent, and 0 to the child. < 0 if error.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
161
Q

What does wait() do?

A

Causes the parent process to pause until the child terminates.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
162
Q

What does exec() do?

A

It’s a system call to change the currently running program to something else.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
163
Q

What comes back from wait()?

A

on success, returns the process ID of the terminated child; on error, -1 is returned.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
164
Q

What is a signal?

A

A system call to send a notification to another process.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
165
Q

In a child process, what can you do with fork and then exec?

A

Create a completely new process and then exit.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
166
Q

What is a shell?

A

A job control program. It allows a programmer to create and manage a set of programs to do some task.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
167
Q

How does the kernel handle reads and writes?

A

It buffers reads so they can be handled as a stream in your program. Writes are buffered and are not written until the kernel flushes the buffer.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
168
Q

What’s the difference between the fread, read, fwrite, write I/O calls?

A

The ones with f are high-level I/O and streamed and buffered by the kernel. The non-f are low-level I/O.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
169
Q

When a system call is made, where are parameters stored?

A

In registers.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
170
Q

What is a socket?

A

It’s an abstraction of a network I/O queue. It’s a method of communication where a producer writes to one side, and a consumer reads from the other side. It’s similar to writing and reading a file, but no file is involved.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
171
Q

What sockets are in modern use?

A

Local sockets to local machine, called UNIX sockets, and TCP/IP and UDP/IP.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
172
Q

What is the GIL?

A

It’s the Global Interpreter Lock. It’s is a part of CPython. It ensures only one thread runs in the interpreter at once. Having the GIL simplifies many low-level details (memory management, callouts to C extensions, etc.)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
173
Q

When is the GIL released?

A

During I/O (disk IO, network IO, output to display) including when a thread uses sleep.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
174
Q

“What is a ““tick”” in CPython?”

A

Approximately 1 machine instruction.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
175
Q

What happens every 100 ““ticks”” in the CPython interpreter?

A

A thread check occurs during which the thread releases the GIL then attempts to reacquire it. Other Python threads will contend for the the GIL. This is no longer the case in 3.4.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
176
Q

What is a lock in CPython?

A

It’s a binary semaphore. It’s not a mutex lock.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
177
Q

What happens when the heap gets too large?

A

It does a page fault, and the kernel will allocate more memory.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
178
Q

What happens when the heap and stack meet in memory?

A

A guard page is hit and the process is killed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
179
Q

Where is information about a process stored?

A

In a PCB (process control block).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
180
Q

Where is information about a thread stored?

A

In a TCB (thread control block).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
181
Q

What do multiple threads in the same process share?

A

Heap, file descriptors, code, static data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
182
Q

What do threads in a process NOT share?

A

Registers and stack.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
183
Q

What can happen with thread stacks if one goes into a deep recursion?

A

One thread’s stack can grow into another thread’s stack and write over it. A guard page can help to protect from that.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
184
Q

What can cause a thread to give control back to the dispatcher?

A

Thread returns control voluntarily (yield, requesting I/O (which blocks), wait for signal from another thread) or gets preempted by an interrupt.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
185
Q

How long does it take to do a process context switch?

A

3-4 microseconds.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
186
Q

How long does it take to perform a thread context switch?

A

100ns

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
187
Q

How often do context switches happen?

A

Every 10-100 ms.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
188
Q

Context switch time increases sharply with the size of what?

A

The working set - the subset of memory used by the process in a time window. Cache etc.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
189
Q

What happens in a system call to get the OS to switch to kernel mode?

A

A trap.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
190
Q

How many threads should you run per process?

A

One per core

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
191
Q

How is concurrency accomplished?

A

By multiplexing CPU time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
192
Q

What’s the difference between parallelism and concurrency?

A

Concurrency means running multiple blocks of instructions independently. Parallelism means running instructions at the same time, as on multiple cores at once.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
193
Q

What is oversubscription?

A

Spawning more threads than available cores.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
194
Q

What is a race condition?

A

When the outcome of a deterministic procedure becomes non-deterministic based on differences in subprocess timing.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
195
Q

What can you put in place to exclusively use a resource without another process interfering?

A

A mutex, or even better, a lock guard.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
196
Q

How do you use a mutex in Python?

A

import threading

lock = threading.Lock()

[first process]

global lock

lock.release()

[other process]

global lock

lock.acquire() // attempts to get access, waits if it can’t

197
Q

What does a future do?

A

Allows us to receive a return value from a function in a child thread.

198
Q

What is a promise?

A

A promise to send a parameter to a child thread’s function later.

199
Q

What is livelock?

A

It occurs when multiple processes are attempting to deal with the current state, but neither makes progress. This can happen when a system is attempting to resolve a deadlock situation but another or the same process continue to trigger it.

Starvation is another example.

200
Q

How long does a terminated process stay in the terminated state?

A

Until the parent process does a wait to receive its exit code.

201
Q

In Python, what can you use to fork a process?

A

The multiprocessing module. It supports process Pool and Process for making a pool of worker processes or forking temporary subprocesses.

202
Q

What does the concurrent.futures module offer?

A

ThreadPoolExecutor

ProcessPoolExecutor

Executor objects

Future objects

203
Q

What is an interrupt?

A

A hardware-invoked context switch. The interrupt handler always runs immediately.

204
Q

What happens during an interrupt?

A

The currently running process’ state is saved. We switch to kernel mode, the interrupt handler runs, and once its complete, the system goes back to user mode and the process continues.

205
Q

What really happens when you fork a process?

A

A fork doesn’t copy everything, it just duplicates the page table pointers, which are all set at read-only. Called copy-on-write. Once you write to memory, then it copies the state.

206
Q

What is multiprocessing?

A

Parallel execution on multiple cores.

207
Q

What does a PCB contain?

A

Everything about a process:

  • status
  • register state (when not in ready state)
  • PID, User, Executable, Priority
  • Execution time
  • Memory space, translation
208
Q

What is special about an interrupt handler?

A

It disables interrupts and runs to completion.

209
Q

What are the five states a process can be in?

A
  • new (when being created)
  • ready
  • running
  • waiting (for I/O or event coordination)
  • terminated (waits for parent process to receive its exit code)
210
Q

What is the difference between filter() and map()?

A

Filter uses a function that returns true or false (predicate).

Map uses a function that returns a value.

211
Q

What is synchronization?

A

Using atomic operations to ensure cooperation between threads.

212
Q

What is a critical section?

A

A block of code that you lock before entering, then unlock when leaving. This creates a mutual exclusion on shared data.

213
Q

What is the priority inversion problem?

A

A thread that is busy-waiting for a lock to be released ends up stealing CPU and getting a higher priority than the thread with the lock. SO since the waiting thread gets higher priority, the thread holding the lock can’t complete and release the lock.

214
Q

What is busy-waiting?

A

One or more threads is using a lot of CPU by continuously checking a value, or test&set() checking and writing a value in wiating for a lock to release, thus stealing CPU from the thread holding the lock.

215
Q

What is a semaphore?

A

A semaphore (defined by Dijkstra) is kind of signaling solution for handling concurrency data integrity problems that arise in multi-threaded applications. It has a non-negative integer that supports 2 operations:

  • P() [proberen, to test/probe] - atomic operation that waits for semaphore to become > 1, then decrements it by 1 (wait)
  • V() [verhogen, to increment] - an atomic operation that increments the semaphore by 1, waking up any P (signal)

The initial semaphore value will determine how many threads can run in the critical section at once.

216
Q

What’s another name for a mutual exclusion?

A

Binary semaphore.

217
Q

What is a monitor?

A

A lock and zero or more condition variables for managing concurrent access to shared data.

218
Q

What should locks and condition variables each be used for?

A

locks - mutual exclusion

condition variables - scheduling constraints

219
Q

What is a condition variable?

A

A queue of threads waiting for access to something in a critical section.

220
Q

What is a special feature of condition variables?

A

It allows sleeping inside a critical section by atomically releasing lock at the time we sleep.

221
Q

What are the 3 methods on a condition variable?

A

wait(&lock)

signal() - signals the next waiting member

broadcast() - signals all waiting members

222
Q

What type of scheduling do most modern processors use?

A

Mesa-scheduling.

223
Q

What are some things the scheduler tries to accomplish?

A
  • minimize response time
  • maximize throughput
  • fairness
224
Q

What is a drawback of context switching?

A

CPU cache misses as thread comes back from switching and finds the CPU cache doesn’t have the values it had before.

225
Q

What’s the convoy effect?

A

Short processes get stuck behind long processes in a FIFO style ready queue.

226
Q

What is the round robin scheduling scheme?

A

Each process gets a time quantum q milliseconds to run. 10-100ms, the q is tunable. Each process runs for that time slice (or until completion if close to done) and then goes back on the ready queue.

227
Q

What are pros of the round-robin scheduling scheme?

A
  • better for short jobs (they fit in the time slice)

- fair

228
Q

What is a con of the round-robin scheduling scheme?

A

Long jobs take longer because context-switching time adds up.

229
Q

What is a weak reference in Python?

A

A weak reference to an object does not affect its reference count.

When the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else. A primary use for weak references is to implement caches or mappings holding large objects, where it’s desired that a large object not be kept alive solely because it appears in a cache or mapping."
How does garbage collection work in CPython?	"CPython uses reference counting and generational garbage collection. There are 3 age stages where objects live in memory. They all start in the ""nursery"", stage0, then if they survive a garbage collection, they are moved to stage1, the oldest objects that continue to survive in stage1 are promoted to stage2. The gc module has thresholds 700, 10, 10 for each stage. In order to decide when to run, the collector keeps track of the number object allocations and deallocations since the last collection. When the number of allocations minus the number of deallocations exceeds threshold0, collection starts. If generation 0 has been examined more than threshold1 times since generation 1 has been examined, then generation 1 is examined as well. Similarly, threshold2 controls the number of collections of generation 1 before collecting generation 2."
What is reference counting?	RC is a method of garbage collection. The runtime keeps track of references to an object by manipulating the reference count on each assignment and delete (del), and when the reference count reaches 0 it means the object is practically unreachable. When the next collection runs, the object's memory will be reserved to allocate for new objects.
230
Q

What is starvation?

A

When low-priority jobs never get run because there are always higher priority jobs running.

231
Q

How does a process’ priority get changed?

A

The scheduler utilizes heuristics on interactivity, locking, burst behavior, etc.

232
Q

What are some methods of avoiding deadlock?

A
  • don’t allow waiting for a resource (means a lot of retries)
  • make all threads request everything they’ll need at the beginning
  • force all threads to request resources in a particular order preventing any cyclic uses of resources (so no cycle exists)
  • temporarily expand resources when a deadlock is detected
233
Q

What is the banker’s algorithm for preventing deadlock?

A
  • allocate resources dynamically
  • evaluate each request and grant if some ordering of threads is still deadlock-free afterward
  • do so by pretending the request was granted, then running a simulation to see if a deadlock would occur”
    How does the banker’s algorithm solve the dining lawyers problem? “When you try to grab a chopstick, it’s either:
  • not the last chopstick
  • is last chopstick but someone else will have two afterwards
234
Q

What translates virtual to physical addresses?

A

The MMU - the memory management unit

235
Q

What are the four conditions needed for a deadlock?

A
  • mutual exclusion
  • hold and wait
  • no preemption
  • circular wait
236
Q

How many bits represent an IPv6 address?

A

128

237
Q

Name some of the protocols used within the TCP/IP application layer.

A
  • http
  • https
  • ftp
  • tftp
  • ntp
  • irc
  • telnet
  • smtp
  • ssh
  • dns
  • snmp
  • pop3
238
Q

What is NTP?

A

Network time protocol

239
Q

What are some protocols in the TCP/IP transport layer?

A
  • TCP

- UDP

240
Q

ICMP

A

internet control message protocol

241
Q

What are some TCP/IP network access layer protocols?

A
  • RJ45
  • ISDN
  • Microwave
  • Ethernet
  • Wifi
  • Fiber optics
  • ATM
  • RJ48
  • Copper cables
242
Q

What is a PDU?

A

Protocol data unit

  • generic term used to describe the information at a given layer in the TCP/IP stack
243
Q

What is the PDU for OSI layer 7?

A

data, determined by what information is being exchanged: text, encrypted text, compressed data

244
Q

What are the PDUs for the the OSI transport layer?

A

for TCP, it’s called a segment

for UDP, it’s called a datagram

245
Q

What is the PDU for the TCP/IP internet layer?

A

packet

246
Q

What are the 2 PDUs of the OSI Network Access layer?

A

data link layer: frames

physical layer: bits

247
Q

What is the port for DNS?

A

53

248
Q

What is the port for telnet?

A

23

249
Q

What is the port for ssh?

A

22

250
Q

What is the port range for clients?

A

8000-65535

251
Q

How many bits are in an ethernet frame?

A

48 bits, represented as a hexadecimal number.

252
Q

What does MAC stand for?

A

medium access control, a sublayer in the data link layer.

253
Q

What is the PDU and the addressing at the data link layer?

A

PDU: frame

Addressing: physical (MAC) address

254
Q

What devices are at the data link layer?

A

Bridges, switches (multi-port bridge). They inspect frames and forward or not.

255
Q

What devices are at the Internet/Network layer?

A

Routers

Layer 3 switches: can be a switch or a router

256
Q

What is the PDU and the addressing at the Internet/Network layer?

A

PDU: packet

Addressing IP address

257
Q

What is the PDU and the addressing at the Transport layer?

A

PDU: segment

addressing: ports

258
Q

What devices are at the Transport layer?

A

Firewalls

259
Q

What is a socket in HTTP?

A

The combination of an IP address and a port.

260
Q

What is involved in the 3 way handshake (TCP)?

A

SYN=1 - synchronize, gives a Seq number and expects that number + 1 in response

ACK=1 - sent by acknowledging server with incremented number, who also sends a SYN=1 and a Seq

SYN=0 ACK=1 and the Seq (incremented number) back to the server

Now you’re talking!

261
Q

Does Kerberos use symmetric or asymmetric encryption?

A

Symmetric. It tracks all principals and their keys in its KDC table.

262
Q

What are the 7 layers of the OSI model?

A
  • application
  • presentation
  • session
  • transport
  • network
  • data link
  • physical
263
Q

What are the 4 layers of TCP/IP?

A
  • application (application, presentation, session in OSI)
  • transport
  • internet (network in OSI)
  • network access (data link & physical in OSI)
264
Q

How is an SSL certificate generated by the certificate authority (CA)?

A

The common name and public key for a given domain name, signed by the certificate authority’s secret key.

The browser can verify the cert with CA’s public key.

265
Q

What is the secure flag on a cookie?

A

When set on a cookie, it will only be sent on https requests.

When not set, cookie will be sent on both http and https requests.

266
Q

When does a Python multi-threaded program terminate?

A

The entire Python program exits when no alive non-daemon threads are left.

267
Q

In Python, if a thread is set to daemon, what happens when the thread sleeps?

A

If the Python program reaches its end, the thread will be killed even if it’s sleeping.

268
Q

If a thread is a daemon, what happens when you do a join()?

A

The main thread will wait for it.

269
Q

What does WebRTC stand for?

A

Web Real-Time Communication

270
Q

Give an example of the thread-per-connection pattern.

A

A web server might spawn a thread per connection, then reuse that thread once the connection ends, or terminate the thread.

271
Q

Give an example of the thread pool model.

A

A pool of threads can be maintained in order to quickly provide one as a resource for a database connection.

272
Q

Fibonacci in Python

A

def fib(n):

a, b = 1, 1

for i in range(1, n):

    a, b = b, a + b

return a
273
Q

What is contained in a packet?

A
  • source IP
  • destination IP
  • data - some portion of the final payload
274
Q

What is TLS?

A

The successor to SSL. All of SSL’s versions have been deprecated due to security issues.

275
Q

What is the purpose of the transport layer?

A

To allow multiple applications to use one network connection simultaneously.

276
Q

What is DNS spoofing?

A

A DNS server is compromised and returns incorrect IP addresses for a some domains.

277
Q

What does TCP stand for?

A

Transmission Control Protocol

278
Q

What is special about TCP?

A

It manages the sending and receiving of packet data.

It acknowledges receipt of packets.

If packets are missing, the source will resend the missing packets.

279
Q

What is HTTP?

A

The protocol for client-server communication.

280
Q

What does UDP stand for?

A

User Datagram Protocol.

281
Q

What is the size of a UDP header?

A

8 bytes

282
Q

What is the size of a TCP header?

A

20 bytes

283
Q

What does ICMP allow you to do?

A

Allows devices to communicate and send errors. Can echo to see if a device is on the network.

284
Q

What does SNMP stand for?

A

Simple Network Management Protocol.

285
Q

What does SNMP do?

A

Gathers info from machines on the network when each box has an SNMP agent installed. Can send a large amount of info about machines, software installed, and machine configuration.

286
Q

Do you need to establish a connection before sending data via UDP?

A

No, it’s connectionless.

287
Q

Tell me about the checksum in a UDP packet.

A

It’s a 16-bit checksum. It’s only mandatory on IPv6

288
Q

How many times are packets sent in UDP?

A

Once.

289
Q

What is special about UDP?

A

It’s connectionless, packets are only sent once and not re-sent if dropped. Packets may not arrive in the right order, and there is no ordering mechanism to fix on the receiving end. No congestion control.

290
Q

What’s special about TCP?

A

It does a 3-way handshake before data is sent.

Delivery is acknowledged by receiver.

Packets missing within a certain time window are re-requested.

Packets are put in order on receipt.

Congestion control: can delay delivery until network is uncongested.

IPv4 and IPv6: error detection, checksum mandatory.

291
Q

What does OSI stand for?

A

Open Systems Interconnect

292
Q

Why was OSI created?

A

To solve the interoperability problem of having multiple heterogeneous networks.

293
Q

Is OSI just a model?

A

Yes

294
Q

What network protocol won the networking wars?

A

TCP/IP, based on the OSI model.

295
Q

What happens at the Application level of the OSI model?

A

This is where applications live and they handle data in many forms.

296
Q

What happens in the Session layer of the OSI model?

A

This layer handles configuration of the data:

  • encryption
  • compression
  • translation to and from different character encodings
297
Q

What happens at the Transport layer of the OSI model?

A

This layer guarantees end-to-end delivery of data:

  • packet ordering
  • error detection
  • acknowledgements
298
Q

What happens at the Network layer of the OSI model?

A

This layer’s function is to find the shortest path through the network to the destination network.

Deals with congestion, bandwidth, etc.

299
Q

What happens at the Data Link layer of the OSI model?

A

It decides whose turn it is to talk on the network using bus arbitration methods.

It finds the physical device on the network.

300
Q

What happens at the Physical layer of the OSI model?

A

It’s the physical network that deals with the physical transmission of electricity through wire:

  • cables
  • voltages
  • frequencies
  • connectors
  • bits
  • transfer rates
  • and much more
301
Q

How does HTTP/2 save bandwidth?

A

Headers are compressed and do not need to send the same headers in a session if they haven’t changed.

Servers can send assets referenced in a document without waiting for discrete requests for them.

302
Q

How does HTTP/2 improve cache breaking?

A

A server can send updated assets using server push when it recognizes a file has changed.

303
Q

What is the stream parallelism in HTTP/2?

A

It’s fully multiplexed, so it can use 100-1000 streams in a connection.

304
Q

Is HTTP/2 binary or textual?

A

HTTP/2 is a binary protocol.

305
Q

How are headers and body treated differently in HTTP/2?

A

They are split into a header frame and a data frame. Multiple requests can be interleaved in a connection, so a request doesn’t block.

306
Q

What is priority in HTTP/2?

A

Different assets can have different priority so that below the fold content can arrive later.

307
Q

What is the range of the first octet on a Class A network?

A

1-126. We don’t use 0 or 127.

308
Q

How many network IDs are there on a Class A network?

A

2^7 = 128

First bit is 0, bits 1-7 are network IDs

309
Q

How many host IDs are supported on a Class A network?

A

2^24 = 16 million

There are 8 bits for the network ID, and the remaining 24 bits are for host IDs.

So there are 16 million per network.

310
Q

What is the range of the first octet on a Class B network?

A

128 - 191

311
Q

How many network IDs are supported on a Class B network?

A

2^14 = 16,384

First 2 bits are 10, bits 3-16 are network IDs

312
Q

How many host IDs are supported on a Class B network?

A

2^16 = 65,536

So there are 65,536 per network

313
Q

What is the range of the first octet on a Class C network?

A

192-223

314
Q

How many network IDs are supported on a Class C network?

A

2^21 = 2 million

First 3 bits are 110, bits 4-24 are network IDs

315
Q

How many host IDs are supported on a Class C network?

A

2^8 = 256

There are 256 hosts per network

316
Q

What is a class D network reserved for?

A

Multicasting

317
Q

What is unicasting?

A

Sending a packet from one host to another.

318
Q

What does a network ID end in?

A

0

319
Q

What does a broadcast ID end in?

A

255

320
Q

Who does a broadcast address of 255.255.255.255 send to?

A

All hosts within the network.

321
Q

What is a directed broadcast?

A

It’s a broadcast to all hosts within another network.

322
Q

What is a limited broadcast address?

A

The limited broadcast address is the address formed by setting all 32 bits of the IP address to 1 (255.255.255.255). The limited broadcast address is used when an IP node must perform a one-to-everyone delivery on the local network but the network ID is unknown.

323
Q

What should you make networks as small as possible?

A

For:

  • security
  • maintenance
  • management
324
Q

How you divide a network?

A

By subnetting.

325
Q

What does a /27 CIDR mean?

A

The first 27 bits are masked with 1s. The remaining 5 bits are reachable in the subnet.

326
Q

What does a /24 CIDR mean?

A

The first 24 bits of the IP address are masked. Only hosts with addresses in the unmasked portion are reachable.

327
Q

What is a block cipher?

A

A block cipher is a method of encrypting text (to produce ciphertext) in which a cryptographic key and algorithm are applied to a block of data (for example, 64 contiguous bits) at once as a group rather than to one bit at a time.

328
Q

What is QUIC?

A

QUIC is a new transport which reduces latency compared to that of TCP. On the surface, QUIC is very similar to TCP+TLS+HTTP/2 implemented on UDP.

329
Q

What is Capsicum?

A

A sandboxing framework that adds capability-based security to unix-like kernels and denies access to global namespaces.

330
Q

What is a global namespace in unixy terms?

A

aspects of a system that can be accessed from anywhere:

  • file paths
  • networks
  • PIDs
331
Q

What is Google Native Client?

A

Also called NaCl, Native Client is a sandbox for running compiled C and C++ code in the browser efficiently and securely, independent of the user’s operating system.

332
Q

What are web sockets?

A

Full-duplex communication between client and server.

333
Q

What is the same-origin policy?

A

Goal: Two websites should not be able to tamper with each other.

Strategy: each resource is assigned an origin. JS can only access resources from its own origin.

Origin: scheme + hostname + port

334
Q

How can 2 origins (let’s say 2 frames) communicate?

A

window.postMessage (HTML5) allows for sending data messages between two windows/frames across domains.

335
Q

What is PyPy?

A

PyPy is a replacement for CPython. It is built using the RPython language that was co-developed with it. RPython is a subset of Python and can be translated to C. The main reason to use it instead of CPython is speed: it runs generally faster due to JIT compilation.

PyPy implements Python 2.7.10. It supports all of the core language, passing the Python test suite (with minor modifications that were already accepted in the main python in newer versions). It supports most of the commonly used Python standard library modules.

336
Q

Does PyPY have a GIL?

A

Yes. The GIL is very difficult to remove. You can use pypy-stm instead, which uses software transactional memory, but will suffer a performance penalty.
How can a server deal with a SYN flood attack? When it detects a large number of SYN packets at once, or the size of its SN (sequence number) data structure reaches a certain threshold of entries, it can switch to a stateless version, where it send SN responses as signed values with a timestamp, and if it receives one back it lets them through without needing a lookup table.

337
Q

What is a stack canary?

A

It’s a buffer overflow defense where a random value is pushed onto the stack after the saved EBP, and before tearing down the stack frame, the value is checked. Any buffer flow targeting the return instruction pointer would have to have overwritten this value.

338
Q

What type of buffer overflow protection does gcc and Visual Studio employ?

A

They use a stack check guard of bytes before and after the buffer’s allocated memory. Once values are written to the buffer, the bytes are checked to ensure they are still the same.

339
Q

What is scalability?

A

Scalability is the measure to which a system can adapt to a change in demand for resources, without negatively impacting performance.

340
Q

What is Akka?

A

An open source project that provides a simpler, single programming model - one way of coding for concurrent and distributed applications - the actor programming model.

Akka’s primary goal is to make it simpler to build applications that are deployed in the cloud or run on devices with many cores and that efficiently leverage the full capacity of the computing power available. It’s a toolkit that provides an actor model, runtime, and required supporting tools for building scalable applications.

341
Q

What is an actor?

A

Briefly, actors are a lot like message queues without the configuration and message broker installation overhead. They’re like programmable message queues shrunk to microsize—you can easily create thousands, even millions of them. They don’t “do”

anything unless they’re sent a message.

Messages are simple data structures that can’t be changed after they’ve been created, or in a single word, they’re immutable.

Actors can receive messages one at a time and execute some behavior whenever a message is received. Unlike queues, they can also send messages (to other actors).

Everything an actor does is executed asynchronously. Simply put, you can send a message to an actor without waiting for a response. Actors aren’t like threads, but messages sent to them are pushed through on a thread at some point in time. How actors are connected to threads is configurable - this is not a hardwired relationship.

For now the most important aspect of actors is that you build applications by sending and receiving messages. A message could be processed locally on some available thread, or remotely on another server. Exactly where the message is processed and where the actor lives are things you can decide later, which is very different compared to hardcoding threads and RPC style networking. Actors make it easy to build your application out of small parts that resemble networked services, only shrunk to microsize in footprint and administrative overhead.

342
Q

What is an IDL-based encoding?

A

An interface description language or interface definition language (IDL) encoding.

It requires a schema definitions. They offer peace of mind with respect to data format and validation for consumers while sacrificing flexibility in the schema’s evolution.

343
Q

What is Tarantool?

A

An in-memory noSQL database that uses write-ahead logging for crash resistance and persistence.

344
Q

What is a coroutine?

A

An object representing activity that eventually completes. Also refers the the function we call that returns a coroutine.

In Python, coroutines are generators.

345
Q

What is a future?

A

An object representing a result that may not be available yet.

346
Q

What is AQP?

A

Approximate query processing. It means pulling a sample of data instead of taking time to process an exact result. It is often used when a data storage involves terabytes or more.

347
Q

How would you visualize billions of items in a graph?

A

In many cases, you don’t need to graph every point, just use visualization-aware sampling. Sometime 1% or less will do.
What is F1/Spanner? Fault-Tolerant Distributed RDBMS (Spanner) Supporting Google’s Ad Business (F1)

348
Q

What is Photon?

A

Fault-tolerant and Scalable Joining of Continuous Data Streams

349
Q

What is Mesa?

A

Geo-Replicated, Near Real-Time, Scalable Data Warehousing

350
Q

What is ElasticSearch?

A

Open Source, Distributed, RESTful Search Engine

351
Q

What is an example of a circuit breaker?

A

Start sending 503s if your service is choked to avoid numerous simultaneous retries that just make the system worse.

352
Q

What is the name of Google’s search ranking algorithm?

A

Hummingbird. PageRank is just one factor used by the algorithm.

353
Q

What is celery?

A

Distributed Task Queue

354
Q

LRU is the most popular type of what kind of policy?

A

Eviction

355
Q

What does an eviction policy try to predict?

A

An eviction policy tries to predict which entries are most likely to be used again in the near future, thereby maximizing the hit ratio

356
Q

What is Caffiene?

A

Caffeine is a high performance, near optimal caching library based on Java 8.

357
Q

What is request coalescing?

A

When many requests arrive for some content that’s missing in the cache (cache miss), only one instance request will proceed to the backend to fetch the content on behalf of all to avoid a flood.
When might you need to use a NoSQL database “You don’t have any relational data.

If you need to store > 5 TB of data or you have an incredibly data intensive workload.

Your application has super low-latency requirements.

You need really high throughput.

358
Q

What is BASE?

A

basically available

soft state

eventually consistent

A BASE based system is more tolerant to latency because it is an inherently partitioned and loosely coupled architecture and it uses eventual consistency.

359
Q

What is ACID?

A

atomicity

consistency

isolation

durability

360
Q

What is the CAP theorem?

A

The CAP theorem, also named Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

  • Consistency (all nodes see the same data at the same time)
  • Availability (every request receives a response about whether it succeeded or failed)
  • Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)
361
Q

What is zero copy?

A

Applications that use zero copy request that the kernel copy the data directly from the disk file to the socket, without going through the application. Zero copy greatly improves application performance and reduces the number of context switches between kernel and user mode.

362
Q

What is a metaclass?

A

Any callable (function or class) that implements type()’s function signature.

363
Q

What is privilege separation?

A

Separating an application into different areas so a vulnerability in one area doesn’t affect the entire application.

364
Q

In security, what is a principal?

A

An entity with privileges or rights.

365
Q

In Unix, who is the owner of a file?

A

The user with the user ID that matches the UID of the inode.

366
Q

What privilege do you need to lookup files or directories in a path?

A

The executable permission.

367
Q

When is security enforced on a file?

A

Security is checked when the file descriptor is created. Then it’s up to the user to be careful and secure the file descriptor.
What is ptrace? ptrace is a system call found in several Unix and Unix-like operating systems. By using ptrace, one process can control another, enabling the controller to inspect and manipulate the internal state of its target.

368
Q

What can you use to debug a process?

A

ptrace

369
Q

What user privilege is required to bind to ports < 1024?

A

root

370
Q

During system bootstrapping, what call is performed by the system to give a non-root user the ownership of a process?

A

setuid()

371
Q

What are the setuid binaries?

A

su

sudo

372
Q

What does chroot do?

A

Changes the root directory (/) for a user to be a directory on the filesystem where they can’t escape.

373
Q

What is a confused deputy?

A

A confused deputy is a computer program that is innocently fooled by some other party into misusing its authority. It is a specific type of privilege escalation. In information security, the confused deputy problem is often cited as an example of why capability-based security is important, as capability systems protect against this whereas access control list-based systems do not.

The classic example is a Fortran compiler that creates a billing record for each use. A user was able to tell the compiler to output a binary with the same name as the billing file, overwriting it.

374
Q

What is an example of a confused deputy in the web frontend world?

A

A CSRF attack.

375
Q

What is ambient authority, or ambient privilege?

A

The decision about whether a process or agent can perform an action is based on information not explicitly stated, but inherited instead.

376
Q

What is a capability?

A

The privilege to act upon something given your ownership of it, and the inability to act on something using an intermediate process’ privileges. An example would be a function where you pass a file descriptor as an argument and the function uses your capability, not its own.

377
Q

What is a requirement of enabling sandboxing?

A

The kernel must be able to support it by disallowing system calls that reference global namespaces:

  • file paths starting at root - must be relative
  • network
  • PIDs - use process descriptors instead”
    How is RSA decryption optimized for speed? “- c^d mod p and c^d mod q are processed in parallel and merged at the end using the Chinese remainder theorem
  • put into Montgomery format
  • sliding windows to exponentiate on bits of exponent
  • perhaps a single extra reduction
  • convert back from Montgomery format
  • merge using CRT
378
Q

How do you change a positive integer to negative?

A

Subtract 1, flip all bits

379
Q

How do you change a negative integer to positive?

A

Flip all bits, then add 1

380
Q

What Endianness is Intel?

A

Little Endian, but only in memory. In registers, all are Big Endian.

381
Q

What is Little Endianness?

A

The least significant bytes of a word or larger are stored in the lowest address. All bytes are the same. There is no Endianness within a byte.

382
Q

What is avalancing?

A

The effect of a hashing method where a small change in the input has a large effect on the output.

383
Q

What is Chef?

A

A configuration tool. You write or reuse recipes that declare the state you wish your server to be in. It calculates the delta and builds out for you.

384
Q

What is an example of a non-cryptographic hash function?

A

MurmurHash is an efficient, non-cryptographic hash function suitable for general hash-based lookup. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop. It has an avalanche effect. The current version is MurmurHash3 which yields a 32-bit or 128-bit hash value.

385
Q

How could you process calculations on elements of an array in parallel?

A

Using recursion, divide and conquer, breaking down the array into smaller segments, then merging the values as the recursion unwinds. Non-mutation of the array means locking is not required.

386
Q

What factor should you keep in mind when doing parallel computation on different parts of a large data structure?

A

The memory bandwidth of your RAM. It can become a bottleneck.

387
Q

What will be the computation time when processing multiple tasks?

A

The length of time the longest subcomputation takes.

388
Q

Why should you avoid starting and joining a task or thread on the same line or proximity?

A

Execution on the current thread will block until it completes, thereby obviating the concurrent call.

389
Q

What factors affect performance?

A
  • processor speed
  • number of processors
  • memory access latency and throughput
  • cache behavior
  • runtime behavior (garbage collection, JIT compilation, thread scheduling)
390
Q

What is a multiset?

A

A set in which elements do not have to be unique.

391
Q

Does Python have an opcode cache?

A

In a way. It outputs a .pyc file, containing the bytecode. When a module is imported for the first time, or when the source is more recent than the current compiled file, a .pyc file containing the compiled code will usually be created in the same directory as the .py file. When you run the program next time, Python uses this file to skip the compilation step.

392
Q

What are some key things to remember when scaling a large system?

A

1) Asynchronous is good (use queues, topics/pub-sub)
2) Parallel is good (multi-threading, load balancing etc.)
3) Avoid points of contention e.g. synchronization
4) Avoid writing to disk until you must - cache like crazy
5) Scale out not up
6) At web scale the speed of light is a problem
7) At web scale everything fails - networks, load balancers etc.

393
Q

What is Thrift?

A

Apache Thrift is a framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between different languages. It handles serialization and has its own communication protocol.

IDL-based.

394
Q

What is Memcache?

A

An in-memory distributed hash table. It supports only a few commands but it is extremely efficient.
How does/did Facebook use memcache and mySQL in 2009? No joins in production. They have many logical databases for all of their types: people, events, place info, etc. They treat the web tier as a CPU, memcache as system memory, and the database as disk. Everything has an ID and you use the ID to query memcache using a multiget. Any misses are fetched from the database and cached in memcache.

395
Q

What is an out-of-band cache?

A

A cache layer that does not synch with persistent storage. When changes are made to the database, there are no notifications to synchronize with the cache. The cache entry would need to be updated or evicted by other means.

396
Q

What is a monad?

A

In functional programming, monads are a way to build computer programs by joining simple components in predictable and robust ways. A monad is a structure that represents computations defined as sequences of steps: a type with a monad structure defines what it means to chain operations together, or nest functions of that type. This allows the programmer to build pipelines that process data in a series of steps (i.e. a series of actions applied to the data), in which each action is decorated with additional processing rules provided by the monad. A monad is defined by a return operator that creates values, and a bind operator used to link the actions in the pipeline.

397
Q

What is pyramidpypi?

A

A self-hosted mirror of pypi.

398
Q

Big Omega represents what?

A

A lower bound on the growth of a function. f grows at least as fast as g.

399
Q

Theta represents what?

A

A tight asymptotic bound on a function, in other words if both f and g have approximately the same rate of growth.

400
Q

For graph problems, the complexity Theta(N + M) is known as what?

A

linear in the graph size

401
Q

What is data normalization?

A

Normalization is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics like insertion, update and deletion anomalies.

Normalization is used for mainly two purpose,

  • eliminating redundant (useless) data
  • ensuring data dependencies make sense
402
Q

What are the requirements for first normal form?

A
  • each cell has a single value
  • all items in a column must be of the same type
  • rows are uniquely identified by a unique ID or a composite key
403
Q

What are the requirements for second normal form?

A

All attributes (non-key columns) are dependent on the key

404
Q

What are the requirements for third normal form?

A

All fields can only be determined by the key in the table and no other column.

405
Q

What are the requirements for fourth normal form?

A

No multi-valued dependencies, meaning records should not be duplicated in a table just because more than one item is associated. This creates records that are duplicates except for one field.

406
Q

What are some use cases of Hadoop?

A
  • reporting on user behavior over many events

- log processing of 100s of billions of rows

407
Q

What are some solid principles to keep in mind for scaling?

A
  • Keep it very simple
  • Don’t re-invent the wheel
  • Go with boring, proven and well-supported technologies when you can
  • Build for what you will need over the next 12-18 months
  • Make different things look the same
  • Cache to protect the database
  • Good enough is good enough
408
Q

What is gunicorn?

A

A Python WSGI HTTP Server

409
Q

What can Redis be used for?

A
  • a noSQL key-value store
  • caches
  • queues
410
Q

What is a benefit of sharding a database as it grows, and what determines the size?

A

Keeping the index in cache ensures a user lookup doesn’t have to hit the disk, lookups can be served from RAM. How much RAM you have will determine the index size which will hint at the underlying data size.

411
Q

Why do most businesses end up sharding as they scale?

A

To support massive concurrent writes.

412
Q

What is a message broker?

A

Message broker is an intermediary program module that translates a message from the formal messaging protocol of the sender to the formal messaging protocol of the receiver. Message brokers are elements in telecommunication networks where software applications communicate by exchanging formally-defined messages. Message brokers are a building block of Message oriented middleware.

413
Q

What are some examples of message brokers?

A

Apache ActiveMQ

Apache Kafka

Apache Qpid

Celery

Gearman

HornetQ (Red Hat)

IBM Integration Bus

JBoss Messaging (JBoss)

JORAM

Microsoft BizTalk Server (Microsoft)

Microsoft Azure Service Bus (Microsoft)

NATS (MIT Open Source License, written in Go)

Open Message Queue

Oracle Message Broker (Oracle Corporation)

QDB (Apache License 2.0, supports message replay by timestamp)

RabbitMQ (Mozilla Public License, written in Erlang)

SAP PI (SAP AG)

Spread Toolkit

Tarantool, a NoSQL database, with a set of stored procedures for message queues

WSO2 Message Broker

Enduro/X Transactional Message Queue (TMQ)

414
Q

What is Ehcache

A

Ehcache is an open source, standards-based cache that boosts performance, offloads your database, and simplifies scalability. It’s the most widely-used Java-based cache.

415
Q

What is Kafka

A

Apache Kafka is pub-sub messaging rethought as a distributed commit log.

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.

A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.

416
Q

What is GAE?

A

Google App Engine is a platform for building scalable web applications and mobile backends. App Engine provides you with built-in services and APIs such as NoSQL datastores, memcache, and a user authentication API, common to most applications.

417
Q

What is GDS?

A

Google Cloud Datastore is a NoSQL document database built for automatic scaling, high performance, and ease of application development. Cloud Datastore features include:

Atomic transactions.

Massive scalability with high performance.

Flexible storage and querying of data.

Balance of strong and eventual consistency.

Encryption at rest.

Fully managed with no planned downtime.

418
Q

What is the problem that serialization introduces?

A

The overhead of serializing and deserializing. It’s all expensive, and for Python, it can be terribly slow.

419
Q

What does the Python bisect module do?

A

The bisect module, part of the standard library, provides support for maintaining a list in sorted order without having to sort the list after each insertion. For long lists of items with expensive comparison operations, this can be an improvement over the more common approach.

420
Q

What is PycURL?

A

PycURL is a Python interface to libcurl. PycURL can be used to fetch objects identified by a URL from a Python program, similar to the urllib Python module. PycURL is mature, very fast, and supports a lot of features.

PycURL is targeted at an advanced developer - if you need dozens of concurrent, fast and reliable connections or any of the sophisticated features listed above then PycURL is for you.

The main drawback of PycURL is that it is a relatively thin layer over libcurl without any of those nice Pythonic class hierarchies. This means it has a somewhat steep learning curve unless you are already familiar with libcurl's C API."
How does PycURL compare to requests?	"PycURL can handle a large number of multiple concurrent requests. When reusing connections, it can perform more than 2,000 requests per second.

pycurl takes about 73 CPU-microseconds to issue a request when reusing a connection

requests takes about 526 CPU-microseconds to issue a request when reusing a connection

pycurl takes about 165 CPU-microseconds to open a new connection and issue a request (no connection reuse), or ~92 microseconds to open

requests takes about 1078 CPU-microseconds to open a new connection and issue a request (no connection reuse), or ~552 microseconds to open

421
Q

What is ZooKeeper?

A

Apache ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.

Has Java and C interfaces.

422
Q

What is orthogonality?

A

In mathematical terms, it means being perpendicular.

Orthogonality in programming language design is the ability to use various language features in arbitrary combinations with consistent results.

Orthogonality is a system design property which guarantees that modifying the technical effect produced by a component of a system neither creates nor propagates side effects to other components of the system. Typically this is achieved through the separation of concerns and encapsulation, and it is essential for feasible and compact designs of complex systems. The emergent behavior of a system consisting of components should be controlled strictly by formal definitions of its logic and not by side effects resulting from poor integration, i.e., non-orthogonal design of modules and interfaces. Orthogonality reduces testing and development time because it is easier to verify designs that neither cause side effects nor depend on them.

423
Q

When dealing with scaling, how can you deal with rapidly increasing counters, like YouTube video views?

A

You can add randomness to a monotonic counter, because as long as people can see it is increasing somewhat monotonically, it doesn’t need to be 100% accurate. And avoids need to lock it in a transaction.

424
Q

What is exponential backoff and when is it used?

A

Binary exponential backoff or truncated binary exponential backoff refers to an algorithm used to space out repeated retransmissions of network or other service requests, often as part of congestion avoidance.

425
Q

What data structure could be used to efficiently manage a leaderboard?

A

A b-tree where each node manages a subset of the range of the worst to best scores.

426
Q

What does adding jitter in system design help you avoid?

A

If your system doesn’t jitter then you get thundering herds. Distributed applications are really weather systems. Debugging them is as deterministic as predicting the weather. Jitter introduces more randomness because surprisingly, things tend to stack up.

427
Q

What is an example of adding jitter to a caching system?

A

Cache expirations. For a popular video they cache things as best they can. The most popular video they might cache for 24 hours. If everything expires at one time then every machine will calculate the expiration at the same time. This creates a thundering herd.

By jittering you are saying randomly expire between 18-30 hours. That prevents things from stacking up. They use this all over the place. Systems have a tendency to self synchronize as operations line up and try to destroy themselves. Fascinating to watch. You get slow disk system on one machine and everybody is waiting on a request so all of a sudden all these other requests on all these other machines are completely synchronized. This happens when you have many machines and you have many events. Each one actually removes entropy from the system so you have to add some back in.

428
Q

What value can you make cron or other system intervals so they don’t coincide?

A

Use distinct prime numbers for periodicities.

429
Q

What Python package provides tools for adapting or extending functions and other callable objects, without completely rewriting them?

A

functools

offers:

Decorators

Comparison

Rich Comparison

Collation Order

Caching

Reducing a Data Set

Generic Functions

430
Q

What is a proxy server?

A

A proxy server is an intermediate piece of hardware/software that receives requests from clients and relays them to the backend origin servers. Typically, proxies are used to filter requests, log requests, or sometimes transform requests (by adding/removing headers, encrypting/decrypting, or compression).

431
Q

What is collapsed forwarding?

A

A proxy server can collapse the same (or similar) requests together into one request, and then return the single result to the requesting clients.

Another great way to use the proxy is to not just collapse requests for the same data, but also to collapse requests for data that is spatially close together in the origin store (consecutively on disk).

432
Q

What should be handling requests first, a proxy server or a cache?

A

Generally it is best to put the cache in front of the proxy. This is because the cache is serving data from memory, it is very fast, and it doesn’t mind multiple requests for the same result. But if the cache was located on the other side of the proxy server, then there would be additional latency with every request before the cache, and this could hinder performance.

433
Q

What are some popular proxies?

A

HAProxy

Squid

Varnish

434
Q

Why do indexes tend to slow down writes?

A

Since you must both write the data and update the index.

435
Q

What is the role of a load balancer?

A

The role is to distribute load across a set of nodes responsible for servicing requests. This allows multiple nodes to transparently service the same function in a system. Their main purpose is to handle a lot of simultaneous connections and route those connections to one of the request nodes, allowing the system to scale to service more requests by just adding nodes.

436
Q

What are some examples of well-known queue (or can act as a queue) software?

A

BeanstalkD

RabbitMQ

ActiveMQ

Redis

437
Q

Does asynchronous code tend to be CPU-bound or I/O bound?

A

Asynchronous code tends to be CPU bound, because anything that would block is simply deferred to later, until the blocking operation completes. This means that threads in asynchronous / non-blocking applications are much more likely to use their full time quantum before the kernel scheduler preempts them.

438
Q

What is the optimal number of threads?

A

And if there’s the same number of runnable threads as there are hardware threads, the kernel is very likely to reschedule threads on the same core, which significantly helps performance.

439
Q

What is the typical time slice for a process on a Linux box?

A

Linux kernels are often compiled with HZ=100, which entails that processes are given time slices of 10ms.
How does Linux handle CPU affinity? Default Linux kernels don’t do a good job at keeping CPU affinity, even on idle machines. You must explore alternative schedulers or use taskset or cpuset to control affinity yourself.

440
Q

What is futex?

A

A futex (short for ““fast userspace mutex””) is a Linux kernel system call that programmers can use to implement basic locking, or as a building block for higher-level locking abstractions such as semaphores and POSIX mutexes or condition variables.

441
Q

How do context switches perform under virtualization?

A

On average, it’s 2.5x to 3x more expensive to do a context switch when using virtualization. My guess is that this is due to the fact that the guest OS can’t update the page table itself, so when it attempts to change it, the hypervisor intervenes, which causes an extra 2 context switches (one to get inside the hypervisor, one to get out, back to the guest OS)

442
Q

What are 2 examples of in-memory caches?

A

Memcached and Redis are both examples of in-memory caches

443
Q

What is write-through cache?

A

Write-through cache directs write I/O onto cache and through to underlying permanent storage before confirming I/O completion to the host. This ensures data updates are safely stored on, for example, a shared storage array, but has the disadvantage that I/O still experiences latency based on writing to that storage. Write-through cache is good for applications that write and then re-read data frequently as data is stored in cache and results in low read latency.

444
Q

What is write-back cache?

A

Write-back cache is where write I/O is directed to cache and completion is immediately confirmed to the host. This results in low latency and high throughput for write-intensive applications, but there is data availability exposure risk because the only copy of the written data is in cache. As we will discuss later, suppliers have added resiliency with products that duplicate writes. Users need to consider whether write-back cache solutions offer enough protection as data is exposed until it is staged to external storage. Write-back cache is the best performing solution for mixed workloads as both read and write I/O have similar response time levels.

445
Q

What is read-through cache?

A

An item is accessed from cache, and if it’s a cache miss, the data will be read from persistent storage (perhaps with a callback) and then placed into cache. The response is then sent back to the host.

446
Q

What is HDFS?

A

Hadoop File System (HDFS) is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers.

447
Q

What is Hortonworks?

A

Hortonworks is a software company focused on the development and support of Apache Hadoop, a framework that allows for the distributed processing of large data sets across clusters of computers.

448
Q

What is multi-homing?

A

Running a service across multiple datacenters.

449
Q

Where is weak consistency OK?

A
  • caching
  • VOIP
  • real-time mutiplayer games
450
Q

What is the Paxos algorithm?

A

Paxos is a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failures.

451
Q

What problem does consistent hashing help solve?

A

If you’re using a caching scheme like server = hash(i) mod m, and one server in the cluster drops out, consistent hashing is needed to avoid swamping your servers when all the caches need to rehash their entities.

452
Q

What is the relationship between consistent hashing and memcache?

A

Consistent hashing can be used with memcache not even knowing about it. It is interesting to note that it is only the client that needs to implement the consistent hashing algorithm - the memcached server is unchanged.

453
Q

What are some examples of NoSQL solutions?

A

Google BigTable

HBase (based on Hadoop)

Hypertable

Amazon DynamoDB

Voldemort

Cassandra

Riak

Redis

CouchDB

MongoDB

454
Q

What is Paxos an example of?

A

quorum-based 2PC (2 phase commit) protocol

455
Q

What is MVCC?

A

Multiversion concurrency control (MCC or MVCC), is a concurrency control method commonly used by database management systems to provide concurrent access to the database and in programming languages to implement transactional memory.

456
Q

What is the S in SOLID?

A

The single responsibility principle. There should never be more than one reason for a class to change. We can relate the “reason to change” to “the responsibility of the class”. So each responsibility would be an axis for change.

457
Q

What does concurrent.futures do?

A

The concurrent.futures modules provides interfaces for running tasks using pools of thread or process workers. The APIs are the same, so applications can switch between threads and processes with minimal changes.

458
Q

What is the O in SOLID?

A

The Open/Closed Principle (OCP) states that the design and writing of the code should be done in a way that new functionality should be added with minimum changes in the existing code. The design should be done in a way to allow the adding of new functionality as new classes, keeping as much as possible existing code unchanged.

“open for extension / closed for modifications”

During the 1990s, the open/closed principle became popularly redefined to refer to the use of abstracted interfaces, where the implementations can be changed and multiple implementations could be created and polymorphically substituted for each other.

In contrast to Meyer’s usage, this definition advocates inheritance from abstract base classes. Interface specifications can be reused through inheritance but implementation need not be.

459
Q

Which SOLID principle is ““Make all Member Variables Private.”” helping to enforce?

A

The open/closed principle (OCP)

460
Q

What is the L in SOLID?

A

The Liskov substitution principle (LSP)

We must make sure that the new derived classes just extend without replacing the functionality of old classes. Otherwise the new classes can produce undesired effects when they are used in existing program modules.

Liskov’s Substitution Principle states that if a program module is using a Base class, then the reference to the Base class can be replaced with a Derived class without affecting the functionality of the program module.

This principle is just an extension of the Open Close Principle and it means that we must make sure that new derived classes are extending the base classes without changing their behavior.

461
Q

What is the I in SOLID?

A

The Interface Segregation Principle (ISP) states that clients should not be forced to implement interfaces they don’t use. Instead of one fat interface many small interfaces are preferred based on groups of methods, each one serving one submodule.

462
Q

What is the D in SOLID?

A

Dependency inversion principle (DIP)

High-level modules should not depend on low-level modules. Both should depend on abstractions.

Abstractions should not depend on details. Details should depend on abstractions.

463
Q

What are 3 things CDNs use to ensure availability?

A

Local clustering can improve fault-tolerance and scalability. Mirroring (deploying clusters in a few locations) and multihoming (using multiple ISPs to connect to the Internet).

Clustering, mirroring, and multihoming are common approaches for sites with stringent reliability and scalability needs.

464
Q

What is hyper-threading?

A

Hyper-threading enables a single processor core to be used for two or more concurrent executions with just a little extra hardware.

465
Q

What is DMA?

A

DMA (Direct Memory Access) allows devices, with the help of the Northbridge, to store and receive data in RAM directly without the intervention of the CPU.

466
Q

What does NUMA stand for?

A

Non-Uniform Memory Architecture

467
Q

Where are SRAM and DRAM used?

A

SRAMs are used in Caches because of higher speed and DRAMs are used for main memory in a PC because of higher densities.
What is the difference between SRAM and DRAM? DRAM stands for Dynamic Random Access Memory. It is a type of semiconductor memory in which the memory is stored in the form of a charge. Each memory cell in a DRAM is made of a transistor and a capacitor. The data is stored in the capacitor. Capacitors loose charge due to leakage and hence DRAM’s are volatile devices. To keep the data in the memory, the device must be regularly refreshed whereas SRAM is static, so it will retain a value as long as power is supplied. SRAM is typically faster than DRAM since it doesn’t have refresh cycles. Since each SRAM memory cell is comprised of 6 Transistors unlike a DRAM memory cell, which is comprised of 1 Transistor and 1 Capacitor, the cost per memory cell is far greater in an SRAM compared to a DRAM.

468
Q

What is the difference between a CPU core and a CPU thread?

A

The difference between a core and a thread is that separate cores have separate copies of (almost) all the hardware resources. The cores can run completely independently unless they are using the same resources–e.g., the connections to the outside - at the same time. Threads, on the other hand, share almost all of the processor’s resources.

Intel’s implementation of threads has only separate registers for the threads and even that is limited, some registers

are shared.

469
Q

What is SMP?

A

symmetric multi-processor

In symmetric multi-processor (SMP) systems the caches of the CPUs cannot work independently from each other. All processors are supposed to see the same memory content at all times. The maintenance of this uniform view of memory is called “cache coherency”.

470
Q

What is Spanner?

A

Spanner is a scalable, globally-distributed database designed, built, and deployed at Google. At the highest level of abstraction, it is a database that shards data across many sets of Paxos state machines in datacenters spread all over the world. Replication is used for global availability and geographic locality; clients automatically failover between replicas. Spanner automatically reshards data across machines as the amount of data or the number of servers changes, and it automatically migrates data across machines (even across datacenters) to balance load and in response to failures. Spanner is designed to scale up to millions of machines across hundreds of datacenters and trillions of database rows.

471
Q

What is Marzullo’s algorithm?

A

Marzullo’s algorithm, is an agreement algorithm used to select sources for estimating accurate time from a number of noisy time sources. A refined version of it, renamed the ““intersection algorithm””, forms part of the modern Network Time Protocol.

472
Q

What is BNF?

A

BNF (Backus Normal Form or Backus–Naur Form) is one of the two main notation techniques for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols; the other main technique for writing context-free grammars is the van Wijngaarden form.

473
Q

What is MapReduce?

A

MapReduce, developed by Google in 2004, is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.

474
Q

What is a Zipf distribution?

A

The Zipf distribution, sometimes referred to as the zeta distribution, is a discrete distribution commonly used in linguistics, insurance, and the modeling of rare events.

475
Q

The memory addresses returned by the malloc function are typically aligned to at least ___ bytes.

A

8

476
Q

What is AddressSanitizer?

A

AddressSanitizer is a fast memory error detector. AddressSanitizer finds out-of-bounds (for heap, stack, and globals) accesses and use-after-free bugs at the cost of 73% slowdown on average and a 3.4x memory size; the tool has no false positives.

AddressSanitizer uses shadow memory to provide accurate and immediate bug detection. The conventional wisdom is that shadow memory either incurs high overhead through multi-level mapping schemes or imposes prohibitive address space requirements by occupying a large contiguous region. Our novel shadow state encoding reduces our shadow space footprint enough that we can use a simple mapping, which can be implemented with low overhead.

It has been included as a compilation option in LLVM since 3.1.

477
Q

What is transitive closure?

A

transitive closure can be thought of as constructing a data structure that makes it possible to answer reachability questions. That is, can one get from node a to node d in one or more hops?

478
Q

What is CUDA?

A

CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) model created by NVIDIA.[1] It allows software developers and software engineers to use a CUDA-enabled graphics processing unit (GPU) for general purpose processing – an approach known as GPGPU. The CUDA platform is a software layer that gives direct access to the GPU’s virtual instruction set and parallel computational elements, for the execution of compute kernels.[2]

479
Q

What is Borg?

A

The first unified container-management system developed at Google. It was built to manage both long-running services and batch jobs.

480
Q

What is MPM?

A

Within Google, MPM (Midas Package Manager) is used to build and deploy container images. It corresponds to the Docker image registry for Docker containers.

481
Q

What are 3 benefits of containers?

A
  1. Containers encapsulate the application environment, abstracting away many details of machines and operating systems from the application developer and the deployment infrastructure.
  2. Because well-designed containers and container images are scoped to a single application, managing containers means managing applications rather than machines. This shift of management APIs from machine-oriented to application oriented dramatically improves application deployment and introspection.
  3. Decoupling of image and OS makes it possible to provide the same deployment environment in both development and production, which, in turn, improves deployment reliability and speeds up development by reducing inconsistencies and friction.
482
Q

What is Chubby?

A

A distributed lock service (master election) built on Borg.

483
Q

What is Protocol buffers?

A

Protocol buffers (aka protobuf) are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data. IDL-based.

You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

484
Q

Why is using Protocol buffers better than pickling?

A

Python pickling doesn’t deal well with schema evolution, and also doesn’t work very well if you need to share data with applications written in C++ or Java.

485
Q

What is gRPC?

A

It’s an open source framework for RPC by Google. gRPC uses HTTP/2 and Google’s own Protobuf to provide a scalable and low latency communication. With gRPC comes a new version of Protobuf (proto3) for high performance binary serialization which includes new features and is easier to use than its predecessors.

486
Q

What is Redis?

A

Redis is an in-memory data structure store, used as database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs and geospatial indexes with radius queries. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.

Redis is more susceptible to data loss in the event of abrupt termination or power failures.

487
Q

What is RabbitMQ?

A

RabbitMQ is a messaging broker - an intermediary for messaging.

Messages are routed through exchanges before arriving at queues. RabbitMQ features several built-in exchange types for typical routing logic. For more complex routing you can bind exchanges together or even write your own exchange type as a plugin.

It can be used as a durable queue, work queues, pub/sub, topic handler, and even for rpc.

488
Q

What is Celery?

A

Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well. You use it with a message broker, and it manages the task execution.

The execution units, called tasks, are executed concurrently on a single or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).

The recommended message broker is RabbitMQ, but support for Redis, Beanstalk, MongoDB, CouchDB, and databases (using SQLAlchemy or the Django ORM) is also available.”
What does amqp stand for? Advanced Message Queuing Protocol