Test_3 Flashcards
- a. [SEE QUESTION]
(CS6210 Spring 2018 Final, question 4a) [my wording]
- Replication of data, so that if a failure occurs we can still get full harvest from a successfully answered query (this may reduce yield because we decrease capacity)
- Results in 20 servers serving 1 query
- b. [SEE QUESTION]
Parallelism for individual query processing: data will be partitioned (keeping multiple replicas of each partition) so that the response time for a query can be reduced.
(CS6210 Spring 2018 Final, question 4a)
20 servers per query, with 20 concurrent queries, equals 400 servers (arithmetic sketched below)
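A quick check of the numbers in parts a and b. The replica/partition counts are taken from the answers above; everything else is just the arithmetic.

```python
# Numbers taken from the two answers above; this is just the arithmetic.
replicas = 20                # part a: 20 replicated servers serve 1 query
partitions = 20              # part b: each query fans out to all 20 partitions
concurrent_queries = 20

print(replicas)                            # 20 servers behind 1 query (part a)
print(partitions * concurrent_queries)     # 400 servers for 20 queries (part b)
```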
- a. [SEE QUESTION]
90% yield if all 10 failed servers come from the same group; that way the other 9 server groups will still have full harvest
- b. [SEE QUESTION]
0% yield if all 10 failed servers come from different groups, because then no group will have full harvest, so no request can be fulfilled (see the sketch below)
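A minimal sketch of the yield arithmetic in parts a and b. The configuration is assumed rather than restated here: 100 servers in 10 groups of 10, where each group holds a complete copy of the data partitioned across its 10 servers, and a request can only be served by a group that still has every partition alive.

```python
GROUPS, SERVERS_PER_GROUP = 10, 10

def yield_after_failures(failed):
    """failed: set of (group, server) pairs that are down."""
    # A group can deliver full harvest only if none of its servers failed.
    healthy = [g for g in range(GROUPS)
               if not any((g, s) in failed for s in range(SERVERS_PER_GROUP))]
    # Yield = fraction of requests answerable = fraction of healthy groups.
    return len(healthy) / GROUPS

same_group = {(0, s) for s in range(10)}   # part a: all failures in group 0
spread_out = {(g, 0) for g in range(10)}   # part b: one failure per group
print(yield_after_failures(same_group))    # 0.9
print(yield_after_failures(spread_out))    # 0.0
```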
- [SEE QUESTION]
- 10 threads for mapping; each map task requires Tm+Ti to finish
- 50 shards of data, so mapping takes 5 sequential waves of 10 tasks; total time for mapping: 5*(Tm+Ti)
- Each reducer has to perform 50 blocking synchronous RPC operations to get intermediate results: 50*Trpc
- 5 threads for reducing run in parallel to produce the final result, each taking: 50*Trpc+Tr+Tf
- Since the 5 reduce instances run in parallel and produce 5 distinct outputs, total time to complete the reduce work: 50*Trpc+Tr+Tf
- Total time = 5*(Tm+Ti) + 50*Trpc + Tr + Tf (computed in the sketch below)
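A sketch that turns the derivation above into code. The symbolic costs Tm, Ti, Trpc, Tr, Tf are placeholders, and the example values plugged in at the end are made up.

```python
import math

def total_time(Tm, Ti, Trpc, Tr, Tf,
               shards=50, map_threads=10, reduce_threads=5, outputs=5):
    map_waves = math.ceil(shards / map_threads)         # 50/10 = 5 waves
    map_time = map_waves * (Tm + Ti)                    # 5 * (Tm + Ti)
    reduce_waves = math.ceil(outputs / reduce_threads)  # 5/5 = 1 wave
    # Each reducer makes one blocking RPC per shard before reducing.
    reduce_time = reduce_waves * (shards * Trpc + Tr + Tf)
    return map_time + reduce_time

# Made-up costs, just to exercise the formula:
print(total_time(Tm=4, Ti=1, Trpc=0.1, Tr=3, Tf=2))  # 25 + 5 + 3 + 2 = 35.0
```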
- a. i. [SEE QUESTION]
- Metadata server overload (many users generating similar key hashes, so the node responsible for that key is hammered with requests)
- Tree saturation (content at a node becoming hot for get/put operations, which slows down servicing of requests along the path to that node)
- Origin server overload happens when there are many get/put requests for content hosted on a single node
- a. ii. [SEE QUESTION]
- Coral avoids both of these problems by treating "key == node ID" as a hint rather than an absolute destination in get/put operations.
- Metadata server overload is avoided by returning a value from the first node on the path that has a value for that key.
- Tree saturation is avoided by storing keys on the way to the destination if the nodes on the path to it fill up
- b. i. [SEE QUESTION]
Node 150, because the desired destination (node 200) already has 2 entries, so it is full.
- b. ii. [SEE QUESTION]
Node 100, because node 150 already has 2 entries, so it is full
- b. iii. [SEE QUESTION]
The value stored at node 100, which points to node 2, because node 100 is the first node with a result that the get reaches on its path (the walk is modeled in the sketch below)
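A toy model of this put/get walk, under the question's assumptions: the path toward the key is [100, 150, 200], a node is full for a key once it holds 2 entries, and the stored values are pointers to the node caching the content (node 2).

```python
FULL = 2                       # a node is "full" for a key at 2 entries (given)
path = [100, 150, 200]         # nodes visited on the way to the key's home
store = {n: {} for n in path}  # per-node: key -> list of values

def put(key, value):
    # Walk toward the destination, remembering the last non-full node;
    # the entry lands there when nodes closer to the key are full.
    target = None
    for node in path:
        if len(store[node].get(key, [])) < FULL:
            target = node
    store[target].setdefault(key, []).append(value)
    return target

def get(key):
    # Return from the FIRST node on the path that has a value for the key.
    for node in path:
        if store[node].get(key):
            return node, store[node][key][0]
    return None

store[200]["k"] = ["->node2", "->node2"]  # destination already has 2 entries
print(put("k", "->node2"))                # b.i:  lands on 150
store[150]["k"].append("->node2")         # now 150 is full too
print(put("k", "->node2"))                # b.ii: lands on 100
print(get("k"))                           # b.iii: (100, '->node2')
```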
- i. [SEE QUESTION]
Using the APIC (Advanced Programmable Interrupt Controller) timer hardware, we can reprogram one-shot timers in only a few cycles, giving roughly 10 ns of accuracy
- ii. [SEE QUESTION]
- We can take advantage of periodic timer events by dispatching a one-shot event at the periodic timer interrupt that immediately precedes its deadline
- Periodic events in the kernel are much more efficient (O(1)), so folding one-shot dispatch into one of these periodic events saves the overhead of a dedicated one-shot interrupt (sketched below)
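A minimal sketch of that folding. The 10 ms periodic tick and the "overshoot" window within which a one-shot deadline may be handled early are illustrative values, not numbers from the question.

```python
import heapq, itertools

PERIOD_US = 10_000       # periodic tick interval, microseconds (assumed)
OVERSHOOT_US = 1_000     # how early a one-shot may fire at a tick (assumed)
one_shots = []           # min-heap of (deadline, seq, callback)
_seq = itertools.count() # tie-breaker so callbacks are never compared

def periodic_tick(now):
    # Dispatch every one-shot whose deadline falls within this tick's
    # overshoot window, saving a dedicated one-shot interrupt for each.
    while one_shots and one_shots[0][0] <= now + OVERSHOOT_US:
        _, _, callback = heapq.heappop(one_shots)
        callback()

heapq.heappush(one_shots, (10_500, next(_seq),
                           lambda: print("one-shot folded into the tick")))
periodic_tick(now=PERIOD_US)   # 10_500 <= 10_000 + 1_000, so it fires here
```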
- iii. [SEE QUESTION]
- The interrupt handler walks a timer data structure that contains tasks sorted by expiration time
- In the data structure, each task carries a callback handler that the interrupt handler calls when the task expires
- Expired timers are removed from the data structure
- The interrupt handler then reprograms the one-shot timer for the next task in the data structure, to handle the next event (sketched below)
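A sketch of that handler. The sorted structure is a heap here, and "reprogramming the hardware" is just a variable; none of these names are kernel APIs.

```python
import heapq, itertools

timers = []                      # min-heap of (expiry, seq, callback)
_seq = itertools.count()         # tie-breaker so callbacks never get compared
next_programmed_deadline = None  # stands in for the APIC one-shot register

def add_timer(expiry, callback):
    heapq.heappush(timers, (expiry, next(_seq), callback))

def one_shot_interrupt(now):
    global next_programmed_deadline
    # Walk the structure: run and remove every timer that has expired.
    while timers and timers[0][0] <= now:
        _, _, callback = heapq.heappop(timers)
        callback()
    # Reprogram the one-shot timer for the next pending task, if any.
    next_programmed_deadline = timers[0][0] if timers else None

add_timer(100, lambda: print("task A expired"))
add_timer(250, lambda: print("task B expired"))
one_shot_interrupt(now=120)      # fires A, then reprograms for B
print(next_programmed_deadline)  # 250
```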
- a. [SEE QUESTION]
- In the PTS programming model, each thread can read the upstream item's timestamp (with get) and store the resulting item (with put) in the output channel with the same timestamp; this propagates temporal causality along each data stream (a minimal model is sketched below).
- When N1 receives items from different channels, it can perform high-level inference based on the timestamps of these items to see whether they are temporally correlated, under the assumption that the data producers have synchronized clocks.
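A minimal model of timestamp propagation through PTS-style channels. Channel, camera, detections, and detector are illustrative names, not the PTS API; the point is only that the output item carries the input item's timestamp.

```python
from collections import defaultdict

class Channel:
    """Append-only timestamped channel: put never overwrites."""
    def __init__(self):
        self.items = defaultdict(list)   # timestamp -> items at that time

    def put(self, ts, item):
        self.items[ts].append(item)

    def get(self, ts):
        return list(self.items[ts])

camera, detections = Channel(), Channel()

def detector(ts):
    # Downstream thread: get the upstream item, process it, and put the
    # result with the SAME timestamp, preserving temporal causality.
    for frame in camera.get(ts):
        detections.put(ts, f"faces found in {frame}")

camera.put(ts=42, item="frame-42")
detector(ts=42)
print(detections.get(42))   # ['faces found in frame-42'], still timestamp 42
```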
- b. i. [SEE QUESTION]
False: a PTS channel is append-only, so T1's put will not be overwritten by T2's; there will be multiple items with the same timestamp (demonstrated below)
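Demonstrating the append-only behavior in isolation, with a plain dict standing in for a channel:

```python
channel = {}                                   # timestamp -> list of items
channel.setdefault(7, []).append("T1's item")  # T1 puts at timestamp 7
channel.setdefault(7, []).append("T2's item")  # T2 puts at the same timestamp
print(channel[7])   # ["T1's item", "T2's item"] -- both retained, no overwrite
```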