Distributed Algorithms Flashcards
What does M stand for in distributed analysis?
is the number of message transmissions (usually includes unsuccessful transmissions in addition to successful ones). aka ‘message cost’
Lnode
the entity workload, that is, the
number of messages per entity,
Lnode = M/|V|
Llink
the transmission load, that is,
the number of messages per link.
Llink = M/|E|,
B
bit complexity, the number of transmitted bits
why may it be necessary to consider the bit complexity of a distributed protocol?`
Messages are sequences of bits; some protocols might employ messages that are very short (e.g., O(1) bit signals), others very long (e.g., .gif files). Thus, for a more
accurate assessment of a protocol, or to compare different solutions to the same problem that use different sizes of messages, it might be necessary to use as a cost
measure the number of transmitted bits B also called bit complexity.
total execution delay
the delay between the time the first entity starts the execution of a computation and
the time the last entity terminates its execution.
in the general model of distributed computing, what assumptions are there about execution time?
None, except for the fact that communication delays for a single message are finite (but widely varying) in absence of failure. in other words, communication delays are in general unpredictable.
T
the ideal execution delay or ideal time complexity, the execution delay experienced when the system is synchronous and (in the absence of failure) takes one unit of time for a message to arrive and to be processed.
When to do parallel processing?
The simplest performance model for parallelism is the “NQ” model, where N is the number of elements, and Q is the computation per element. In general, you need the product NQ to exceed some threshold before you start getting a performance benefit. For a low-Q problem like “add up numbers from 1 to N”, you will generally see a breakeven between N=1000 and N=10000. With higher-Q problems, you’ll see breakevens at lower thresholds.
You should look into doing parallel processing.The total time to execute the sequential version exceeds a minimum threshold. These days [ 2014], the threshold is roughly (within a factor of ten of) 100 microseconds across most platforms. You don’t need to measure this precisely though. You can estimate this well enough in practice by multiplying N (the number of elements) by Q (cost per element of F), in turn guestimating Q as the number of operations or lines of code, and then checking that N * Q is at least 10000. (If you are feeling cowardly, add another zero or two.) So when F is a tiny function like x -> x + 1, then it would require N >= 10000 elements for parallel execution to be worthwhile. And conversely, when F is a massive computation like finding the best next move in a chess game, the Q factor is so high that N doesn’t matter so long as the collection is completely splittable.
Is using Java data type implementations that use ‘synchronized’ keyword enough?
No
Consider this:
if (vector.isEmpty()){
vector.add(data);
}
Even though the methods involved are synchronized, because they are being locked and unlocked individually, two unfortunately timed threads can create a vector with two elements.
So in effect, you have to synchronize in your application code as well.