Test_3 Flashcards
- a. [SEE QUESTION]
(CS6210 Spring 2018 Final, question 4a) [my wording]
- Replication of data, so that if a failure occurs we can still get full harvest from a successfully answered query (this may reduce yield because we decrease capacity)
- Results in 20 servers serving 1 query
- b. [SEE QUESTION]
Parallelism for individual query processing: data will be partitioned (keeping multiple replicas of each partition) so that the response time for a query can be reduced.
(CS6210 Spring 2018 Final, question 4a)
20 servers per query, with 20 concurrent queries, equals 400 servers (arithmetic sketched below)
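A quick check of the numbers in parts a and b. The replica/partition counts are taken from the answers above; everything else is just the arithmetic.

```python
# Numbers taken from the two answers above; this is just the arithmetic.
replicas = 20                # part a: 20 replicated servers serve 1 query
partitions = 20              # part b: each query fans out to all 20 partitions
concurrent_queries = 20

print(replicas)                            # 20 servers behind 1 query (part a)
print(partitions * concurrent_queries)     # 400 servers for 20 queries (part b)
```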
- a. [SEE QUESTION]
90% yield if all 10 failed servers come from the same group; that way the other 9 server groups will still have full harvest
- b. [SEE QUESTION]
0% yield if all 10 failed servers come from different groups, because then no group will have full harvest, so no request can be fulfilled (see the sketch below)
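A minimal sketch of the yield arithmetic in parts a and b. The configuration is assumed rather than restated here: 100 servers in 10 groups of 10, where each group holds a complete copy of the data partitioned across its 10 servers, and a request can only be served by a group that still has every partition alive.

```python
GROUPS, SERVERS_PER_GROUP = 10, 10

def yield_after_failures(failed):
    """failed: set of (group, server) pairs that are down."""
    # A group can deliver full harvest only if none of its servers failed.
    healthy = [g for g in range(GROUPS)
               if not any((g, s) in failed for s in range(SERVERS_PER_GROUP))]
    # Yield = fraction of requests answerable = fraction of healthy groups.
    return len(healthy) / GROUPS

same_group = {(0, s) for s in range(10)}   # part a: all failures in group 0
spread_out = {(g, 0) for g in range(10)}   # part b: one failure per group
print(yield_after_failures(same_group))    # 0.9
print(yield_after_failures(spread_out))    # 0.0
```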
- [SEE QUESTION]
- 10 threads for mapping; each map task requires Tm+Ti to finish
- 50 shards of data, so mapping takes 5 sequential waves of 10 tasks; total time for mapping: 5*(Tm+Ti)
- Each reducer has to perform 50 blocking synchronous RPC operations to get intermediate results: 50*Trpc
- 5 threads for reducing run in parallel to produce the final result, each taking: 50*Trpc+Tr+Tf
- Since the 5 reduce instances run in parallel and produce 5 distinct outputs, total time to complete the reduce work: 50*Trpc+Tr+Tf
- Total time = 5*(Tm+Ti) + 50*Trpc + Tr + Tf (computed in the sketch below)
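A sketch that turns the derivation above into code. The symbolic costs Tm, Ti, Trpc, Tr, Tf are placeholders, and the example values plugged in at the end are made up.

```python
import math

def total_time(Tm, Ti, Trpc, Tr, Tf,
               shards=50, map_threads=10, reduce_threads=5, outputs=5):
    map_waves = math.ceil(shards / map_threads)         # 50/10 = 5 waves
    map_time = map_waves * (Tm + Ti)                    # 5 * (Tm + Ti)
    reduce_waves = math.ceil(outputs / reduce_threads)  # 5/5 = 1 wave
    # Each reducer makes one blocking RPC per shard before reducing.
    reduce_time = reduce_waves * (shards * Trpc + Tr + Tf)
    return map_time + reduce_time

# Made-up costs, just to exercise the formula:
print(total_time(Tm=4, Ti=1, Trpc=0.1, Tr=3, Tf=2))  # 25 + 5 + 3 + 2 = 35.0
```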
- a. i. [SEE QUESTION]
- Metadata server overload (many users generating similar key hashes, so the node responsible for that key is hammered with requests)
- Tree saturation (content at a node becoming hot for get/put operations, which slows down servicing of requests along the path to that node)
- Origin server overload happens when there are many get/put requests for content hosted on a single node
- a. ii. [SEE QUESTION]
- Coral avoids both of these problems by treating "key == node ID" as a hint rather than an absolute destination in get/put operations.
- Metadata server overload is avoided by returning a value from the first node on the path that has a value for that key.
- Tree saturation is avoided by storing keys on the way to the destination if the nodes on the path to it fill up
- b. i. [SEE QUESTION]
Node 150, because the desired destination (node 200) already has 2 entries, so it is full.
- b. ii. [SEE QUESTION]
Node 100, because node 150 already has 2 entries, so it is full
- b. iii. [SEE QUESTION]
The value stored at node 100, which points to node 2, because node 100 is the first node with a result that the get reaches on its path (the walk is modeled in the sketch below)
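A toy model of this put/get walk, under the question's assumptions: the path toward the key is [100, 150, 200], a node is full for a key once it holds 2 entries, and the stored values are pointers to the node caching the content (node 2).

```python
FULL = 2                       # a node is "full" for a key at 2 entries (given)
path = [100, 150, 200]         # nodes visited on the way to the key's home
store = {n: {} for n in path}  # per-node: key -> list of values

def put(key, value):
    # Walk toward the destination, remembering the last non-full node;
    # the entry lands there when nodes closer to the key are full.
    target = None
    for node in path:
        if len(store[node].get(key, [])) < FULL:
            target = node
    store[target].setdefault(key, []).append(value)
    return target

def get(key):
    # Return from the FIRST node on the path that has a value for the key.
    for node in path:
        if store[node].get(key):
            return node, store[node][key][0]
    return None

store[200]["k"] = ["->node2", "->node2"]  # destination already has 2 entries
print(put("k", "->node2"))                # b.i:  lands on 150
store[150]["k"].append("->node2")         # now 150 is full too
print(put("k", "->node2"))                # b.ii: lands on 100
print(get("k"))                           # b.iii: (100, '->node2')
```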
- i. [SEE QUESTION]
Using the APIC (Advanced Programmable Interrupt Controller) timer hardware, we can reprogram one-shot timers in only a few cycles, giving roughly 10 ns of accuracy
- ii. [SEE QUESTION]
- We can take advantage of periodic timer events by dispatching a one-shot event at the periodic timer interrupt that immediately precedes its deadline
- Periodic events in the kernel are much more efficient (O(1)), so folding one-shot dispatch into one of these periodic events saves the overhead of a dedicated one-shot interrupt (sketched below)
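A minimal sketch of that folding. The 10 ms periodic tick and the "overshoot" window within which a one-shot deadline may be handled early are illustrative values, not numbers from the question.

```python
import heapq, itertools

PERIOD_US = 10_000       # periodic tick interval, microseconds (assumed)
OVERSHOOT_US = 1_000     # how early a one-shot may fire at a tick (assumed)
one_shots = []           # min-heap of (deadline, seq, callback)
_seq = itertools.count() # tie-breaker so callbacks are never compared

def periodic_tick(now):
    # Dispatch every one-shot whose deadline falls within this tick's
    # overshoot window, saving a dedicated one-shot interrupt for each.
    while one_shots and one_shots[0][0] <= now + OVERSHOOT_US:
        _, _, callback = heapq.heappop(one_shots)
        callback()

heapq.heappush(one_shots, (10_500, next(_seq),
                           lambda: print("one-shot folded into the tick")))
periodic_tick(now=PERIOD_US)   # 10_500 <= 10_000 + 1_000, so it fires here
```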
- iii. [SEE QUESTION]
- The interrupt handler walks a timer data structure that contains tasks sorted by expiration time
- In the data structure, each task carries a callback handler that the interrupt handler calls when the task expires
- Expired timers are removed from the data structure
- The interrupt handler then reprograms the one-shot timer for the next task in the data structure, to handle the next event (sketched below)
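A sketch of that handler. The sorted structure is a heap here, and "reprogramming the hardware" is just a variable; none of these names are kernel APIs.

```python
import heapq, itertools

timers = []                      # min-heap of (expiry, seq, callback)
_seq = itertools.count()         # tie-breaker so callbacks never get compared
next_programmed_deadline = None  # stands in for the APIC one-shot register

def add_timer(expiry, callback):
    heapq.heappush(timers, (expiry, next(_seq), callback))

def one_shot_interrupt(now):
    global next_programmed_deadline
    # Walk the structure: run and remove every timer that has expired.
    while timers and timers[0][0] <= now:
        _, _, callback = heapq.heappop(timers)
        callback()
    # Reprogram the one-shot timer for the next pending task, if any.
    next_programmed_deadline = timers[0][0] if timers else None

add_timer(100, lambda: print("task A expired"))
add_timer(250, lambda: print("task B expired"))
one_shot_interrupt(now=120)      # fires A, then reprograms for B
print(next_programmed_deadline)  # 250
```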
- a. [SEE QUESTION]
- In the PTS programming model, each thread can read the upstream item's timestamp (with get) and store the resulting item (with put) in the output channel with the same timestamp; this propagates temporal causality along each data stream (a minimal model is sketched below).
- When N1 receives items from different channels, it can perform high-level inference based on the timestamps of these items to see whether they are temporally correlated, under the assumption that the data producers have synchronized clocks.
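A minimal model of timestamp propagation through PTS-style channels. Channel, camera, detections, and detector are illustrative names, not the PTS API; the point is only that the output item carries the input item's timestamp.

```python
from collections import defaultdict

class Channel:
    """Append-only timestamped channel: put never overwrites."""
    def __init__(self):
        self.items = defaultdict(list)   # timestamp -> items at that time

    def put(self, ts, item):
        self.items[ts].append(item)

    def get(self, ts):
        return list(self.items[ts])

camera, detections = Channel(), Channel()

def detector(ts):
    # Downstream thread: get the upstream item, process it, and put the
    # result with the SAME timestamp, preserving temporal causality.
    for frame in camera.get(ts):
        detections.put(ts, f"faces found in {frame}")

camera.put(ts=42, item="frame-42")
detector(ts=42)
print(detections.get(42))   # ['faces found in frame-42'], still timestamp 42
```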
- b. i. [SEE QUESTION]
False: a PTS channel is append-only, so T1's put will not be overwritten by T2's; there will be multiple items with the same timestamp (demonstrated below)
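Demonstrating the append-only behavior in isolation, with a plain dict standing in for a channel:

```python
channel = {}                                   # timestamp -> list of items
channel.setdefault(7, []).append("T1's item")  # T1 puts at timestamp 7
channel.setdefault(7, []).append("T2's item")  # T2 puts at the same timestamp
print(channel[7])   # ["T1's item", "T2's item"] -- both retained, no overwrite
```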