Exam cards Flashcards

Study for PMPH course fall 2018

1
Q

State Moore’s Law

A

The number of transistors in a dense integrated circuit doubles about every two years. OR: Computer power doubles every 18-24 months while cost-effectiveness keeps pace.

2
Q

Explain cost-effectiveness of hardware

A

Cost-effectiveness is the ratio between the performance and the cost of hardware.

3
Q

Explain the killer-micro effect

A

Additional hardware resources generated by the rapid increase in transistor density were used to increase the speed/frequency of single-CPU systems, resulting in complex designs of "muscled" single processors relying on the out-of-order (dataflow) execution model.

4
Q

What are the two main capabilities of the out-of-order (data-flow) execution model?

A
  1. Storing thousands of instructions in their pipelines. 2. Executing hundreds of instructions per cycle.
5
Q

Explain the divergence between academic interest in parallel architectures and industry's focus on muscled single-processor systems

A

Muscling single-CPU systems was seen as the path of least resistance: it required no rewriting of code, while converting to parallel architectures requires significant rewriting, which is tedious and non-trivial.

6
Q

What was the tectonic shift towards multiprocessor design?

A

In 2004, Intel cancelled the design of the 4 GHz Pentium 4 and switched to multicore designs.

7
Q

What is the power wall?

A

Dynamic power is roughly proportional to the cube of the frequency, and single-chip processors have reached the limit of the power they can dissipate, so the clock rate can no longer be raised.
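A one-line sketch of why (using the standard CMOS dynamic-power model, which the card leaves implicit): dynamic power is

    P_dyn = C · V² · f

and since the supply voltage V must scale roughly linearly with the frequency f, we get P_dyn ∝ f³.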

8
Q

What is the memory wall?

A

The exponentially increasing performance gap between processor and memory speeds.

9
Q

Commodity parallel architectures such as GPUs provide thousands of cores and tens of thousands of threads. What is the main barrier to utilizing this power?

A

The lack of high-level programming models and compiler optimizations that would allow commodity software to harness the power.

10
Q

What are a program and a process/thread?

A

A program is a set of statements for performing computational tasks, while a process/thread embeds the execution of the computation. A program is to a process/thread what a recipe is to cooking.

11
Q

What is a processor/core and what is a multithreaded core?

A

The hardware entity capable of sequencing and executing a process's/thread's instructions. Multithreaded cores support multiple hardware threads, each running in its own hardware context.

12
Q

What is a multiprocessor?

A

A multiprocessor is a set of processors connected to execute a workload. Each multiprocessor consists of several cores, potentially with hardware multithreading support, and several levels of cache.

13
Q

Name the three main improvement types in processor design

A
  1. Deeper pipelines (10-20 stages): more stages means less complex individual stages, i.e., fewer gates per stage, so each stage is quicker, hence a higher clock rate.
  2. Aggressively exploiting instruction-level parallelism (ILP), for example by out-of-order and speculative execution, which combine techniques such as register renaming, reorder buffers, branch prediction, lockup-free caches, and speculative memory disambiguation.
  3. Improvements in circuit design.
14
Q

Why is it unsustainable to carry on improving the clock rate?

A

• First, it is infeasible to build deeper pipelines, because it is difficult to imagine useful stages that can be built from fewer than 10 gates (we have already reached that point).
• Second, the impact of technology scaling will be blunted in the future by wire delays, which do not scale: the speed of wire transmission grows much more slowly than the switching speed.
• Finally, and perhaps most importantly, circuits clocked at higher rates consume more power, and we have already reached the limits of power consumption in single-chip microprocessors.

15
Q

Which hardware optimizations can we do other than increasing clock rate?

A

Reduction of feature size and increasing the number of transistors.

16
Q

How can we utilize an increased number of transistors?

A

We can:
1. Enhance the parallelism of the memory system.
2. Fetch and decode multiple instructions per clock.
3. Run multiple hardware threads concurrently per core, for example to hide the high latency of the memory system.
4. Support many cores that run threads in parallel.

17
Q

How much does DRAM density increase every third year (historically)?

A

4×

18
Q

How much does DRAM speed increase each year (historically)?

A

7%

19
Q

The memory wall has been replaced by what?

A

A bandwidth wall. Nowadays the memory subsystem needs to efficiently feed many cores that execute threads in parallel, which means the memory system must be capable of delivering multiple data items at the same time, a capability measured by bandwidth.

20
Q

Name the four technological limits in architectural design

A

Power, wire delays, reliability, and complexity of design.

21
Q

Explain what the power constraint is and why it favours parallel processing over an increase in the clock rate.

A

The power constraint is that the power consumed by a processor, given by the sum of dynamic and static power, is not limitless. Dynamic power is consumed by gate switching and is roughly proportional to the cube of the frequency. Hence increasing the frequency requires much more power than replicating a uniprocessor does.

22
Q

What is leakage power and how is it related to dynamic power?

A

Leakage power is the same as static power and is dissipated in all circuits at all times, independent of frequency and switching. It is dominated by cache leakage. Leakage power increases exponentially as the voltage at which a transistor switches off is reduced. Smaller feature sizes and lower threshold voltages mean that leakage power has overtaken dynamic power as the major source of dissipation.

23
Q

What are the three classifications of hardware failures?

A
Transient failures (soft errors): due to voltage reduction in every process generation, bits are more and more prone to flip. The device remains operational, but data may be partly corrupted.
Intermittent/temporary failures: the result of environmental variations on the chip, such as hot spots. For the device to return to correct behavior, the issue must be fixed (e.g., turn it off and let it cool down).
Permanent failures: cannot be fixed.
24
Q

Why do multiprocessors promote more reliability than uniprocessor systems?

A

Threads can be used to redundantly perform the same computation and compare results. Faulty cores can be detected and disabled while the system remains functional.

25
Q

The propagation delay on a wire is proportional to what?

A

The product of its resistance and capacitance.

26
Q

The resistance of a wire is proportional to what?

A

The ratio between the length of the wire and its cross-sectional area (R = ρ · L/A).

27
Q

What happens to the length of wires at each process generation? Is this good or bad for resistance?

A

It shrinks. Good!

28
Q

What happens to the cross-section area of wires at each process generation? Is this good or bad for resistance?

A

It shrinks. Bad!

29
Q

Why is the impact of wire delay lower for multiprocessors?

A

Communication traffic is hierarchical: most communication is local, while inter-core communication happens only occasionally.

30
Q

Why does the cost of design verification favour multiprocessors?

A

It is much easier to replicate the same structure multiple times than it is to design a large and complex system.

31
Q

How is feature size defined?

A

Half the distance between two metal wires.

32
Q

What is the definition of a monoid?

A

Definition 1 (Monoid).
Assume a set S and a binary operator ⊙ : S × S → S.
(S, ⊙) is called a monoid if it satisfies the following two axioms:
(1) Associativity: ∀x, y, z ∈ S we have (x ⊙ y) ⊙ z ≡ x ⊙ (y ⊙ z), and
(2) Identity element: ∃e ∈ S such that ∀a ∈ S, e ⊙ a ≡ a ⊙ e ≡ a.

33
Q

What is the definition of a group?

A

Definition 2 (Group).
(S, ⊙) is called a group if it is a monoid satisfying the additional property that every element is invertible:
∀a ∈ S, ∃a⁻¹ ∈ S such that a ⊙ a⁻¹ ≡ a⁻¹ ⊙ a ≡ e.

34
Q

State whether (Z, +), (N, +), (Z, ×) are monoids and groups

A

(Z, +): monoid and group; (N, +): monoid but not a group; (Z, ×): monoid but not a group.

35
Q

Give the definition of a monoid homomorphism

A

A monoid homomorphism from monoid (S, ⊕) to monoid (T, ⊙) is a function h : S → T such that
∀u, v ∈ S, h(u ⊕ v) ≡ h(u) ⊙ h(v).

36
Q

Give the definition of a list homomorphic function

A

A list homomorphic function (LHF) h is a (mathematical) function that can be implemented as:
h( [ ] ) = e
h( [x] ) = f(x)
h( x ++ y ) = h(x) ⊙ h(y)

37
Q

A list homomorphic function is a mathematical homomorphism from ____ to ____? What does this mean?

A

From (L_T, ++) to (Img(h), ⊙).
In particular, (Img(h), ⊙) must be a monoid with neutral element e, which also means that ⊙ must be associative.
The reverse also (trivially) holds: if h is a homomorphism between (L_T, ++) and some monoid (M, ⊙), then it admits a list homomorphic implementation (LHI), as above.

38
Q

Give examples of list homomorphic programs

A
  1. Computing the length of a list, i.e., len : [α] → Int.
    len [ ] = 0
    len [x] = one x -- where one x = 1
    len (x ++ y) = (len x) + (len y)
  2. allp, which checks whether some predicate p : α → Bool holds on all elements of an input list:
    allp [ ] = true
    allp [x] = p x
    allp (x ++ y) = (allp x) && (allp y)
  3. Summing up the (numerical) elements of a list. This can be accomplished with the following LHI:
    sum [ ] = 0
    sum [x] = id x -- where id x = x
    sum (x ++ y) = (sum x) + (sum y)
  4. The function that adds two to every (numeric) element of a list and then multiplies the results:
    fld [ ] = 1
    fld [x] = plus2 x -- where plus2 x = x + 2
    fld (x ++ y) = (fld x) * (fld y)
39
Q

What is the first basic block of data parallel programming?

A

The map second-order operator, which takes as arguments a unary function and a list of elements, and produces a list of the same length as the original one by applying the function argument to each element of the input list. The type and semantics of map are:
map : (α → β) → [α] → [β]
map f [x1, ..., xn] = [f x1, ..., f xn]

40
Q

Explain why map has implicitly parallel semantics.

A

The computation corresponding to some element x_i of the array is completely independent of all other elements of the array.

41
Q

What is the second basic block of data parallel programming?

A

The reduce second-order operator, which takes as arguments an associative binary operator, the neutral element of the monoid defined by that operator, and a list of elements. The result of reduce is obtained by successively applying the operator to all the elements of the array. The type and semantics of reduce are:
reduce : (α → α → α) → α → [α] → α
reduce (⊙) e [x1, ..., xn] = e ⊙ x1 ⊙ ... ⊙ xn

42
Q

What is important to remember about the operator in reduce so that it has parallel semantics?

A

It must be associative.

43
Q

What is the first list homomorphism theorem?

A

A list-homomorphic implementation defined as:
h( [ ] ) = e
h( [x] ) = f(x)
h( x ++ y ) = h(x) ⊙ h(y)
is semantically equivalent to (reduce (⊙) e) ◦ (map f), or in complete code:
h z ≡ reduce (⊙) e (map f z)

44
Q

Write the map-reduce implementations of the list homomorphic programs for finding the length of a list (len), whether all items of a list satisfy some property (allp), and the sum of the elements in a list (sum).

A
  • len z ≡ reduce (+) 0 (map one z)
  • allp z ≡ reduce (&&) true (map p z)
  • sum z ≡ reduce (+) 0 (map id z) ≡ reduce (+) 0 z
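As runnable Haskell, these become (a minimal sketch: reduce is modelled with foldl, which is safe here because each operator is associative with the given neutral element; p stands for an arbitrary predicate):

    -- reduce (⊙) e, modelled sequentially; associativity licenses a parallel version
    reduce :: (a -> a -> a) -> a -> [a] -> a
    reduce op e = foldl op e

    len :: [a] -> Int
    len = reduce (+) 0 . map (const 1)

    allp :: (a -> Bool) -> [a] -> Bool
    allp p = reduce (&&) True . map p

    sum' :: Num a => [a] -> a
    sum' = reduce (+) 0        -- map id is the identity, so it is dropped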
45
Q

State the list homomorphism lemmas

A
  1. (map f) ◦ (map g) ≡ map (f ◦ g)
  2. (map f) ◦ (reduce (++) []) ≡ (reduce (++) []) ◦ (map (map f))
  3. (reduce (⊙) e⊙) ◦ (reduce (++) []) ≡ (reduce (⊙) e⊙) ◦ (map (reduce (⊙) e⊙))
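The first lemma in runnable form (a small sketch; the concrete functions (*2) and (+1) are just for illustration):

    -- map fusion: one traversal instead of two
    unfused, fused :: [Int] -> [Int]
    unfused = map (+1) . map (*2)
    fused   = map ((+1) . (*2))

    -- unfused [1,2,3] == fused [1,2,3] == [3,5,7]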
46
Q

The first identity in the list homomorphism lemma is known as ___ and is useful for what?

A

It is known as the map fusion/fission rule. Fusion corresponds to applying the transformation in the forward direction and is useful for reducing the number of accesses to global memory, which is slow compared to registers. Fission corresponds to applying the transformation in the backward direction and is useful for enhancing the degree of parallelism that is statically mapped to the hardware in the context of a nested parallel program.

47
Q

How can the second and third identities of the list homomorphism lemmas be used?

A

In the forward direction they can be used to efficiently sequentialize the parallelism in excess of what the hardware can support, and in the backward direction to enhance load balancing and the program's degree of parallelism that can be statically mapped to the hardware.

48
Q

What is an often used optimization for the scheduling of map-reduce computations?

A

A map-reduce composition can be rewritten into a semantically equivalent program that:
• splits the input list into p sublists of roughly equal length,
• applies the original computation to each sublist, such that the computation of a sublist is performed sequentially on a core, while different sublists are processed in parallel on different cores,
• applies the original reduction to the per-core results.
The benefit of such an execution is not only that it spawns a number of threads equal to the number of cores, reducing scheduling and context-switching overheads, but also that it optimizes the reduction depth. Originally, the reduction was applied to a list of n elements and would require log(n) sequential steps (see the reduction tree in fig. 4). In the transformed program, the final (parallel) reduction is performed on a list of p elements, requiring only log(p) sequential steps.

49
Q

What is the optimized map-reduce lemma?

A
Theorem 4 (Optimized Map-Reduce Lemma). Assume split_p : [α] → [[α]] distributes a list into p sublists, each containing about the same number of elements. Also assume ⊙ is a binary associative operator with neutral element e⊙ and f is a unary function. The following identity always holds:
redomap (⊙) f e⊙ ≡ (reduce (⊙) e⊙) ◦ (map (redomap (⊙) f e⊙)) ◦ split_p
where redomap is defined as redomap (⊙) f e⊙ ≡ (reduce (⊙) e⊙) ◦ (map f).
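A Haskell sketch of the lemma (splitP and the sequential fold per chunk are modelling choices of mine; a real implementation would run one chunk per core):

    -- split a list into p sublists of roughly equal length
    splitP :: Int -> [a] -> [[a]]
    splitP p xs = go xs
      where
        n     = max 1 ((length xs + p - 1) `div` p)
        go [] = []
        go ys = let (c, rest) = splitAt n ys in c : go rest

    -- redomap (⊙) f e ≡ (reduce (⊙) e) ◦ (map f), run sequentially per chunk
    redomap :: (b -> b -> b) -> (a -> b) -> b -> [a] -> b
    redomap op f e = foldl (\acc x -> acc `op` f x) e

    -- the optimized schedule: sequential redomap per chunk, then a small reduction
    optRedomap :: Int -> (b -> b -> b) -> (a -> b) -> b -> [a] -> b
    optRedomap p op f e = foldl op e . map (redomap op f e) . splitP p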
50
Q

Give an example of a near-homomorphic program and the transformation into a homomorphic program.

A

The maximum segment sum (MSS) problem is an example. It can be transformed by computing extra "baggage" of information, the quadruple (maximum segment sum, maximum initial sum, maximum concluding sum, total sum), in three main steps: each input element is lifted to the aforementioned quadruple; the result is reduced using an associative operator that combines the quadruples; then we project, i.e., select the element of the quadruple that is the result (the mss).
Longest satisfying segment is another, similar example.
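A Haskell sketch of the MSS construction (the quadruple is (mss, mis, mcs, ts) as above; the neutral element (0,0,0,0) assumes the empty segment is allowed):

    -- lift each element to (max segment sum, max initial sum, max concluding sum, total)
    lift :: Int -> (Int, Int, Int, Int)
    lift x = (max x 0, max x 0, max x 0, x)

    -- associative combination of two quadruples
    combine :: (Int, Int, Int, Int) -> (Int, Int, Int, Int) -> (Int, Int, Int, Int)
    combine (m1, i1, c1, t1) (m2, i2, c2, t2) =
      ( maximum [m1, m2, c1 + i2]   -- best segment: left, right, or spanning the seam
      , max i1 (t1 + i2)            -- best prefix
      , max c2 (c1 + t2)            -- best suffix
      , t1 + t2 )                   -- total sum

    mss :: [Int] -> Int
    mss = (\(m, _, _, _) -> m) . foldl combine (0, 0, 0, 0) . map lift

    -- mss [1,-2,3,4,-1,5,-6,1] == 11  (the segment [3,4,-1,5])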

51
Q

Write the three formulas that provide the basis for Amdahl's law.

A

Let E be an enhancement that speeds up by a factor S the fraction F of the program that benefits from it; 1−F is the fraction that does not. Then:
Speedup(E) = T(without E) / T(with E) = 1 / ((1−F) + F/S)
Speedup(E) ≤ 1 / (1−F)
lim (S → ∞) Speedup(E) = 1 / (1−F)

52
Q

Provide an interpretation of Amdahl's law in terms of diminishing returns

A

In essence, Amdahl's law shows that no matter how big the improvement is, the overall application speedup is limited by the 1−F fraction that does not benefit from the improvement.

53
Q

Write the formula for Amdahl's law in the particular case of parallelism

A

Let T1 be the execution time on one processor and TP the time on P processors, with F the fraction run in parallel. Then Speedup(P) = T1/TP = P/(F + P(1−F)) < 1/(1−F). Ideally, P cores would yield a P× speedup.
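A worked instance (numbers chosen for illustration): with F = 0.9 and P = 16, Speedup(16) = 16/(0.9 + 16 · 0.1) = 16/2.5 = 6.4, and even as P → ∞ the speedup is capped at 1/(1 − 0.9) = 10×.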

54
Q

What is the moral of Amdahls law for parallel parallelizing code?

A

Never leave sequential any part of the program that can possibly be parallelized. In other words, when developing parallel code, we must reason as if the hardware had an unlimited/infinite number of cores.

55
Q

What does PRAM stand for?

A

Parallel random access machine.

56
Q

What are the 5 assumptions on a PRAM?

A
  1. There are P processors connected to a shared memory.
  2. Each processor has a unique identifier/index 0 ≤ i < P.
  3. Execution happens in single-instruction multiple-data (SIMD) fashion, i.e., all cores execute in lock step.
  4. Each parallel instruction takes unit time.
  5. Each processor has a flag that controls whether it is active in the execution of an instruction.
57
Q

What does SIMD stand for and what does it mean?

A

Single instruction, multiple data, which means all cores execute in lock step, i.e., a core cannot start the next instruction until all cores have finished the current instruction.

58
Q

What effect does executing in lock step have when branching?

A

In the case of an if-then-else, the processors that did not take the then branch must wait until all the other processors have finished executing the then branch before starting to execute the else branch, and similarly for the else branch.

59
Q

What is the work complexity?

A

The total number of operations performed to execute a program.

60
Q

What is the depth complexity?

A

The number of sequential steps needed to execute a program.

61
Q

When is a parallel implementation work efficient?

A

If the work complexity is asymptotically equal to that of the best sequential implementation of the same program.

62
Q

State Brent's theorem

A

A parallel implementation that has depth D(n) and work W(n) can be simulated on a P-processor PRAM in time complexity T such that:
W(n)/P ≤ T ≤ W(n)/P + D(n)
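A worked instance (numbers chosen for illustration): a reduction has W(n) = n and D(n) = lg n, so on a P-processor PRAM with n = 2²⁰ and P = 2¹⁰ it takes between n/P = 1024 and n/P + lg n = 1044 steps.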

63
Q

What does SOAC stand for?

A

Second-order array combinator.

64
Q

Explain the implementation of scan

A

The implementation is organized in two parallel steps:
(1) The first step is called the “Up-Sweep” and is similar to a reduction, except that the accumulation of all elements is computed in the last element of the array rather than the first.
• After the up-sweep pass, the neutral element (0 for +) is placed in the position of the last element.
(2) The second step is called the “Down-Sweep”, and it propagates updates to the array's elements in the reverse order of the up-sweep pass (i.e., reverse the arrows and the traversal of the up-sweep). Each propagation requires two substeps:
2.1. The left child sends its value to its parent and updates its value to that of the parent.
2.2. The right-child value is obtained by applying the binary operator of the scan to the left-child value and to the (old) value of the parent. Notice that the right child is in fact the parent, making this an in-place algorithm.
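A minimal Haskell sketch of a work-efficient exclusive scan (a recursive formulation of the same up-sweep/down-sweep idea, not the in-place version described above; it assumes the input length is a power of two):

    -- exclusive scan: pair up, recurse on the pair sums, then expand
    exscan :: (a -> a -> a) -> a -> [a] -> [a]
    exscan _  e [_] = [e]
    exscan op e xs  = concat [ [s, s `op` x] | (s, x) <- zip sums evens ]
      where
        evens = everyOther xs                                   -- even-position elements
        sums  = exscan op e [ a `op` b | (a, b) <- pairUp xs ]  -- scan of pair sums

    pairUp :: [a] -> [(a, a)]
    pairUp (a : b : rest) = (a, b) : pairUp rest
    pairUp _              = []

    everyOther :: [a] -> [a]
    everyOther (a : _ : rest) = a : everyOther rest
    everyOther ys             = ys

    -- exscan (+) 0 [1,2,3,4] == [0,1,3,6]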

65
Q

What's the depth and work of scan and reduce?

A

Depth O(lg n) and work O(n).

66
Q

What is a Warp?

A

The unit of parallel execution in CUDA, which consists of 32 parallel threads.

67
Q

How do the threads in a warp execute?

A

In SIMD fashion

68
Q

Make a data-parallel representation of the array [[1, 3, 5], [7, 8], [9, 11, 14, 15]]

A
shapeArray = [ 3, 2, 4 ]
flagArray = [ 1, 0, 0, 1, 0, 1, 0, 0, 0 ]
valueArray = [ 1, 3, 5, 7, 8, 9, 11, 14, 15 ]
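A Haskell sketch of how the flag array is derived from the shape array (the helper name mkFlags is mine):

    -- each segment contributes a leading 1 followed by (length - 1) zeros
    mkFlags :: [Int] -> [Int]
    mkFlags shape = concat [ 1 : replicate (n - 1) 0 | n <- shape ]

    -- mkFlags [3,2,4] == [1,0,0,1,0,1,0,0,0]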
69
Q

What is the work and depth of segmented scan?

A

Work O(n) and depth O(lg n), since a segmented scan can essentially be implemented in terms of an ordinary scan, whose work and depth are O(n) and O(lg n).
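A minimal Haskell sketch of the operator-lifting construction behind this (an inclusive segmented scan over (flag, value) pairs; the lifted operator is associative whenever op is):

    -- inclusive segmented scan via an ordinary scan of (flag, value) pairs
    segscan :: (a -> a -> a) -> [Bool] -> [a] -> [a]
    segscan op flags xs = map snd (scanl1 lifted (zip flags xs))
      where
        lifted (f1, v1) (f2, v2) =
          (f1 || f2, if f2 then v2 else v1 `op` v2)

    -- segscan (+) [True,False,False,True,False] [1,2,3,4,5] == [1,3,6,4,9]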