2. Cloud Flashcards

1
Q

What is cloud computing?

A

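Cloud computing is computing over the Internet: computing resources (servers, storage, applications) are provided on demand as a service over the network.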

2
Q

What is the fat tree design?

A

Network design for datacenters:
- Three-tier design: Edge, Aggregation, Core
- Defined by a single parameter k = number of ports on a switch
- All layers use the same k-port switch
- Supports k³/4 hosts
- High redundancy: k²/4 shortest paths between two hosts in different pods
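
As a quick check of the numbers above, here is a minimal sketch that derives the sizes of a fat tree from k. The function name and the per-layer switch counts (k pods, k/2 edge and k/2 aggregation switches per pod, (k/2)² core switches) are not on the card; they follow the standard fat-tree construction and are included as an assumption.

```python
def fat_tree_sizes(k: int) -> dict:
    """Return basic counts for a fat tree built from k-port switches."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even number >= 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 per pod
        "aggregation_switches": k * half,  # k/2 per pod
        "core_switches": half * half,      # (k/2)^2
        "hosts": k ** 3 // 4,              # k^3/4
        "inter_pod_paths": half * half,    # k^2/4, one per core switch
    }


if __name__ == "__main__":
    print(fat_tree_sizes(4))
```

For k = 4 this gives 16 hosts and 4 paths between hosts in different pods.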

3
Q

What is the jellyfish network design?

A

Forget network structure and use random connections:
- Each switch with 4L ports connects to
– L hosts
– 3L other randomly chosen switches
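
The random wiring can be sketched as follows. This is a simplified illustration, not the full Jellyfish procedure: it only adds random links between switches that still have free ports and does not rewire leftover ports. The function name `build_jellyfish` and the host naming are hypothetical.

```python
import random


def build_jellyfish(num_switches: int, L: int, seed: int = 0):
    """Randomly link switches; each has L host ports and 3L switch ports."""
    rng = random.Random(seed)
    free_ports = {s: 3 * L for s in range(num_switches)}
    links = set()  # each link is stored once as (a, b) with a < b
    while True:
        open_switches = [s for s, free in free_ports.items() if free > 0]
        candidates = [
            (a, b)
            for i, a in enumerate(open_switches)
            for b in open_switches[i + 1:]
            if (a, b) not in links
        ]
        if not candidates:
            break  # no two unlinked switches with free ports remain
        a, b = rng.choice(candidates)
        links.add((a, b))
        free_ports[a] -= 1
        free_ports[b] -= 1
    hosts = {s: [f"h{s}-{i}" for i in range(L)] for s in range(num_switches)}
    return links, hosts


if __name__ == "__main__":
    links, hosts = build_jellyfish(num_switches=10, L=1)
    print(len(links), "switch-to-switch links")
```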

4
Q

What is the CAP theorem?

A

In a distributed system you can satisfy at most two of the following three properties:
1. Consistency: all nodes see the same data at any time
2. Availability: the system accepts operations at all times
3. Partition tolerance: the system continues to work in spite of network partitions

5
Q

How does Cassandra handle the CAP theorem?

A

Cassandra favors availability and partition tolerance and accepts weak (eventual) consistency.

6
Q

What are the characteristics of Cassandra?

A
  • Key-Value Pair Storage
  • “No-SQL”
  • Supports get(key) and put(key,value) operations
7
Q

How is data stored in Cassandra?

A
  • Key-value pair
  • Nodes form a ring and key is hashed to determine the location (DHT)
  • Similar to Chord
  • Replicated on n nodes
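
The ring placement can be sketched as below. This is a toy illustration, assuming MD5 as the hash function and simple placement on successive nodes; the class and function names are hypothetical and this is not Cassandra's actual partitioner.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Hash a string to a position on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, n_replicas):
        self.n_replicas = n_replicas
        # Nodes sorted by their position on the ring.
        self.tokens = sorted((token(name), name) for name in nodes)

    def replicas(self, key: str):
        """First node at or after the key's position, plus its successors."""
        start = bisect.bisect_left(self.tokens, (token(key), ""))
        count = min(self.n_replicas, len(self.tokens))
        return [
            self.tokens[(start + i) % len(self.tokens)][1]
            for i in range(count)
        ]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"], n_replicas=3)
    print(ring.replicas("user:42"))
```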
8
Q

What are the replica policies in Cassandra?

A
  • Rack Unaware: replicate data at n-1 successive nodes
  • Rack Aware: coordinator tells nodes the range they are replicas for
  • Datacenter Aware: same as rack aware, but on datacenter level
9
Q

How does a write operation in Cassandra work?

A
  • The partitioner (a hash function) determines the node responsible for the key
  • The write is appended to the on-disk commit log
  • The memtable is updated in memory
  • When a memtable is old or full, it is flushed to disk
    – Data file, index file
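
The write path on a single node can be sketched like this. It is an in-memory toy model: the commit log is a plain list standing in for the on-disk log, and the flush threshold `memtable_limit` is a hypothetical parameter; only the "full" trigger is modeled, not the "old" one.

```python
class Node:
    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []  # stands in for the on-disk commit log
        self.memtable = {}    # in-memory table of recent writes
        self.sstables = []    # flushed (data file, index file) pairs
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write first
        self.memtable[key] = value            # 2. then update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when full

    def flush(self):
        data_file = sorted(self.memtable.items())
        index_file = {key: pos for pos, (key, _) in enumerate(data_file)}
        self.sstables.append((data_file, index_file))
        self.memtable = {}


if __name__ == "__main__":
    node = Node()
    for k, v in [("b", 1), ("a", 2), ("c", 3)]:
        node.put(k, v)
    print(node.sstables, node.memtable)
```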
10
Q

How do Bloom filters work and what are they used for in Cassandra?

A

Bloom filter: a bitmap and a set of hash functions.
- Use the set of hash functions to create a fingerprint for a given key:
– For each hash function h: h(x) = y -> BIT[y] = 1
- Used to check whether data may be present on a node
- May produce false positives, but never false negatives
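
A minimal Bloom filter sketch, assuming the hash functions are derived from SHA-256 with different prefixes. The sizes and names are illustrative and not taken from Cassandra.

```python
import hashlib


class BloomFilter:
    def __init__(self, size: int = 64, num_hashes: int = 3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [0] * size

    def _positions(self, key: str):
        # One bit position per hash function: h_i(x) = y.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key: str):
        for y in self._positions(key):
            self.bits[y] = 1  # BIT[y] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[y] for y in self._positions(key))


if __name__ == "__main__":
    bf = BloomFilter()
    bf.add("row-17")
    print(bf.might_contain("row-17"))
```

A key that was added always tests positive; a key that was never added usually tests negative but can collide with bits set by other keys.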

11
Q

How is a delete operation done in Cassandra?

A
  • Don’t delete item right away
  • Add tombstone to item
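
The tombstone idea can be sketched as follows. The `compact` step, which physically drops tombstoned items later, is an assumption added for illustration and is not on the card; all names are hypothetical.

```python
TOMBSTONE = object()  # marker meaning "this item was deleted"


class Table:
    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def delete(self, key):
        # Do not remove the item; overwrite it with a tombstone.
        self.rows[key] = TOMBSTONE

    def get(self, key):
        value = self.rows.get(key)
        return None if value is TOMBSTONE else value

    def compact(self):
        # Later cleanup: physically drop tombstoned items.
        self.rows = {k: v for k, v in self.rows.items() if v is not TOMBSTONE}


if __name__ == "__main__":
    table = Table()
    table.put("k", "v")
    table.delete("k")
    print(table.get("k"), "k" in table.rows)
```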
12
Q

How is a read operation done in Cassandra?

A
  • Fetch data from closest replica
  • Also fetch from multiple other replicas
    – If the data differs, initiate read repair
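
Read repair can be sketched as below, assuming each replica stores a (timestamp, value) pair per key and that the newest timestamp wins. The conflict rule and the function name are assumptions for illustration; the card does not say how differences are resolved.

```python
def read_with_repair(replicas, key):
    """replicas: list of dicts mapping key -> (timestamp, value)."""
    answers = [replica.get(key) for replica in replicas]
    present = [answer for answer in answers if answer is not None]
    if not present:
        return None
    newest = max(present, key=lambda pair: pair[0])
    # Read repair: bring stale or missing replicas up to date.
    for replica, answer in zip(replicas, answers):
        if answer != newest:
            replica[key] = newest
    return newest[1]


if __name__ == "__main__":
    r1 = {"k": (2, "new")}
    r2 = {"k": (1, "old")}
    r3 = {}
    print(read_with_repair([r1, r2, r3], "k"))
```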
13
Q

How is the potential speed-up of parallelization computed?

A
  • Amdahl's law (upper bound on the speed-up S):
    n = number of processors
    p = portion of the program that is parallelizable

S = 1 / ((1-p) + p/n)
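
A small sketch that evaluates the formula; the function name is hypothetical.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Upper bound on speed-up with parallel fraction p on n processors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

With p = 0.9 the speed-up can never exceed 1 / (1 - p) = 10, no matter how many processors are added, because the serial 10% always remains.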

14
Q

Describe the two methods of parallelization in cloud computing

A

Request Level Parallelism (RLP):
- Concurrent processing of multiple requests: e.g. Google
– Distribute indexing, images, documents, ads, … to multiple nodes
Data Level Parallelism (DLP):
- Concurrent processing of multiple data: e.g. MapReduce
– Distribute data with map and reduce nodes

15
Q

Explain the main principle of MapReduce

A
  • Data in key-value format
  • Chunk of data is processed by Mapper (mapping function) to Intermediate Output
  • Intermediate Output is assigned by Partitioner to Reducer (reduce function)
    – Same Intermediate key -> same reducer
  • Reducer produces final output
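
The steps above can be sketched as a single-process word count. This only illustrates the data flow; a real MapReduce framework runs mappers and reducers on different nodes. All function names and the byte-sum partitioner are hypothetical.

```python
from collections import defaultdict


def mapper(chunk: str):
    """Map a chunk of text to intermediate (word, 1) pairs."""
    for word in chunk.split():
        yield word.lower(), 1


def partition(key: str, num_reducers: int) -> int:
    """Deterministically pick a reducer; same key -> same reducer."""
    return sum(key.encode()) % num_reducers


def reducer(key: str, values):
    """Combine all values of one intermediate key."""
    return key, sum(values)


def map_reduce(chunks, num_reducers: int = 2) -> dict:
    # One bucket of intermediate output per reducer.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for chunk in chunks:
        for key, value in mapper(chunk):
            buckets[partition(key, num_reducers)][key].append(value)
    output = {}
    for bucket in buckets:
        for key, values in bucket.items():
            out_key, out_value = reducer(key, values)
            output[out_key] = out_value
    return output


if __name__ == "__main__":
    print(map_reduce(["a b a", "b c"]))
```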
16
Q

What is the architecture of MapReduce?

A

Master-Worker architecture:
Master = Job Tracker (JT), Worker = Task Tracker (TT)
- TT pulls map or reduce tasks from JT
- TT periodically sends heartbeat to JT

17
Q

How is fault tolerance implemented in MapReduce?

A
  • JT restarts task if it doesn’t receive a heartbeat from the TT
  • JT assigns all map or reduce tasks from the failed node to another node
  • JT identifies slow tasks (stragglers) by tracking the progress and runs them redundantly on a second node
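
Heartbeat-based failure detection and reassignment can be sketched as follows. This is a toy model with explicit timestamps passed in by the caller; the class, the timeout value, and the round-robin reassignment are assumptions for illustration, and straggler handling is not modeled.

```python
class JobTracker:
    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_heartbeat = {}  # task tracker -> time of last heartbeat
        self.assignments = {}     # task tracker -> list of tasks

    def heartbeat(self, tt: str, now: float):
        self.last_heartbeat[tt] = now
        self.assignments.setdefault(tt, [])

    def assign(self, tt: str, task: str):
        self.assignments[tt].append(task)

    def check(self, now: float):
        """Move tasks from trackers with a missed heartbeat to live ones."""
        dead = [tt for tt, seen in self.last_heartbeat.items()
                if now - seen > self.timeout]
        live = [tt for tt in self.last_heartbeat if tt not in dead]
        if dead and not live:
            raise RuntimeError("no live task tracker left")
        moved = []
        for tt in dead:
            tasks = self.assignments.pop(tt)
            del self.last_heartbeat[tt]
            for i, task in enumerate(tasks):
                target = live[i % len(live)]  # simple round-robin choice
                self.assignments[target].append(task)
                moved.append((task, tt, target))
        return moved


if __name__ == "__main__":
    jt = JobTracker(timeout=10)
    jt.heartbeat("tt1", now=0)
    jt.heartbeat("tt2", now=0)
    jt.assign("tt1", "map-0")
    jt.heartbeat("tt2", now=12)  # tt1 stays silent
    print(jt.check(now=15))
```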