Parallel Architecture Flashcards

1
Q

Hypercube bisection width

A

p/2

2
Q

Square Toroidal Mesh Bisection Width

A

2*sqrt(p)

3
Q

Mesh Bisection Width

A

min(n, m) for an n × m mesh (sqrt(p) when the mesh is square)

4
Q

Fully Connected Bisection Width

A

p^2/4

5
Q

Crossbar Bisection Width

A

p

6
Q

Omega Bisection Width

A

p/2
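For quick checking, the six bisection widths above can be computed from the processor count. A minimal Python sketch (the mesh entries assume a square sqrt(p) × sqrt(p) layout, so p should be a perfect square):

```python
import math

def bisection_width(topology: str, p: int) -> float:
    """Bisection width for p processors in each interconnect topology.
    Mesh variants assume a square sqrt(p) x sqrt(p) layout."""
    s = math.isqrt(p)  # side length of a square mesh; assumes p is a perfect square
    return {
        "hypercube": p / 2,
        "toroidal mesh": 2 * s,        # 2*sqrt(p)
        "mesh": s,                     # min(n, m) = sqrt(p) for a square mesh
        "fully connected": p ** 2 / 4,
        "crossbar": p,
        "omega": p / 2,
    }[topology]

# e.g. for 64 processors:
for t in ("hypercube", "toroidal mesh", "mesh",
          "fully connected", "crossbar", "omega"):
    print(t, bisection_width(t, 64))
```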

7
Q

Message transmission time

A

l + n/b, where l is the latency (seconds), n is the message size (bytes), and b is the bandwidth (bytes/second)
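As a worked example, the formula can be evaluated directly; the latency and bandwidth figures below are illustrative, not from the card:

```python
def transmission_time(l: float, n: int, b: float) -> float:
    """Time to send a message: latency l (seconds) plus
    n bytes transferred at bandwidth b (bytes/second)."""
    return l + n / b

# 1 MB message, 5 microseconds latency, 1 GB/s bandwidth:
t = transmission_time(5e-6, 10**6, 10**9)  # 0.001005 seconds
```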

8
Q

Snooping Cache Coherence

A

The idea behind snooping comes from bus-based systems: When the cores share a bus, any signal transmitted on the bus can be “seen” by all the cores connected to the bus. Thus when core 0 updates the copy of x stored in its cache, if it also broadcasts this information across the bus, and if core 1 is “snooping” the bus, it will see that x has been updated, and it can mark its copy of x as invalid. This is more or less how snooping cache coherence works.
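A toy sketch of the invalidation step described above (a deliberately simplified model, not a real protocol; the class and method names are made up for illustration):

```python
class SnoopingCache:
    """Toy model: each cache snoops writes broadcast on a shared bus
    and marks its own copy of the written variable invalid."""
    def __init__(self, bus):
        self.data = {}            # variable -> (value, valid?)
        bus.caches.append(self)

    def load(self, var, value):
        self.data[var] = (value, True)

    def snoop(self, var):
        if var in self.data:
            value, _ = self.data[var]
            self.data[var] = (value, False)   # mark the copy invalid

class Bus:
    def __init__(self):
        self.caches = []

    def broadcast_write(self, writer, var, value):
        writer.data[var] = (value, True)
        for cache in self.caches:             # every other cache "sees" the write
            if cache is not writer:
                cache.snoop(var)

bus = Bus()
core0, core1 = SnoopingCache(bus), SnoopingCache(bus)
core0.load("x", 5)
core1.load("x", 5)
bus.broadcast_write(core0, "x", 7)   # core1's copy of x is now invalid
```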

9
Q

Directory Based Cache Coherence

A
  • Directory-based cache coherence protocols attempt to solve the scalability problem of broadcast-based snooping through the use of a data structure called a directory.
  • The directory stores the status of each cache line.
  • Typically, this data structure is distributed; in our example, each core/memory pair might be responsible for storing the part of the structure that specifies the status of the cache lines in its local memory.
  • Thus when a line is read into, say, core 0’s cache, the directory entry corresponding to that line would be updated, indicating that core 0 has a copy of the line.
  • When a variable is updated, the directory is consulted, and the cache controllers of the cores that have that variable’s cache line in their caches will invalidate those lines.
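The directory bookkeeping in these bullets can be sketched as a toy Python model (the names are illustrative, not from any real implementation):

```python
class Directory:
    """Toy directory: maps each cache line to the set of cores holding a copy."""
    def __init__(self):
        self.sharers = {}   # line -> set of core ids

    def record_read(self, line, core):
        # a core reads the line into its cache; note it as a sharer
        self.sharers.setdefault(line, set()).add(core)

    def record_write(self, line, writer, caches):
        # on a write, invalidate every other core's copy of the line,
        # leaving the writer as the sole holder
        for core in self.sharers.get(line, set()) - {writer}:
            caches[core].discard(line)
        self.sharers[line] = {writer}

caches = {0: {"x"}, 1: {"x"}, 2: set()}   # lines currently held per core
d = Directory()
d.record_read("x", 0)
d.record_read("x", 1)
d.record_write("x", 0, caches)   # core 1's copy of line x is invalidated
```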
10
Q

Shared Memory vs Distributed Memory

A

Shared Memory:

Pros:
1. Implicit coordination of processors through shared data structures.
2. Appealing programming model for many programmers.
3. Generally suitable for systems with a small number of processors.

Cons:
1. Scaling interconnect can be costly.
2. Conflicts over access to the bus increase dramatically with more processors.
3. Large crossbars, while efficient, are expensive.

Distributed Memory:

Pros:
1. Relatively inexpensive interconnects like hypercube and toroidal mesh.
2. Well-suited for systems with thousands of processors.
3. Better for problems requiring vast amounts of data or computation.

Cons:
1. Requires explicit message passing for coordination.
2. More complex programming model for many programmers.
3. Not as suitable for small-scale systems with few processors.

11
Q

MIMD

A
  • MIMD (Multiple Instruction, Multiple Data) systems support multiple simultaneous instruction streams operating on multiple data streams.
  • MIMD systems consist of fully independent processing units or cores, each with its own control unit and datapath.
  • MIMD systems are asynchronous, meaning processors can operate at their own pace.
  • Many MIMD systems lack a global clock and may have no relation between system times on different processors.
  • Without synchronization imposed by the programmer, even if processors execute the same sequence of instructions, they may execute different statements at any given instant.
12
Q

SIMD

A
  • SIMD (Single Instruction, Multiple Data) systems operate on multiple data streams by applying the same instruction to multiple data items simultaneously.
  • Abstract SIMD systems have a single control unit and multiple datapaths.
  • Instructions are broadcast from the control unit to the datapaths, where each datapath applies the instruction to a data item or remains idle.
  • An example application is “vector addition,” where two arrays with n elements each are added element-wise.
13
Q

Modified State

A
  • cache block has been updated and contains the current value
  • all other copies are invalid
14
Q

Shared State

A
  • cache block has not been updated
  • cache block contains the current value
  • all other copies also contain the current value
15
Q

Invalid State

A
  • cache block does not contain the most recent value of the memory block
16
Q

Bus Read

A

Generated by a read operation on a memory block not
in the local cache

17
Q

Bus Read Exclusive

A
  • Generated by a write operation to a memory block that is not
    in the local cache, or is in the local cache but not in state M
  • Memory provides the most recent value
  • All other copies are marked invalid
18
Q

Write Back

A

Cache controller writes a block marked M back to
main memory

19
Q

Local Write from M, Cache Miss

A
  • PrWr (processor write) issued
  • Cache miss
  • Flush the block currently held in state M back to main memory
  • BusRdX (Bus Read Exclusive) fetches the new block from main memory
  • Perform the write and set the block's state to M
20
Q

Local Write from I, Cache Miss

A
  • PrWr (processor write) issued
  • Cache miss
  • The cache holding the block in state M flushes it to main memory
  • That holder's copy is marked invalid
  • BusRdX (Bus Read Exclusive) fetches the block from main memory
  • Perform the local write
  • Set the current processor's copy to state M
21
Q

Read from I, cache miss

A
  • PrRd (processor read) issued
  • Cache miss
  • The processor holding the block in state M flushes it to main memory
  • Both that processor's copy and this one become state S
  • BusRd fetches the block
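Cards 13–21 describe MSI states and transitions; one way to summarize them is as a next-state table. This is a sketch of the common textbook MSI transitions, not tied to any particular implementation:

```python
# MSI next-state table: (current state, event) -> next state.
# "PrRd"/"PrWr" are local processor accesses; "BusRd"/"BusRdX" are
# transactions snooped on the bus from another core.
MSI = {
    ("M", "PrRd"):   "M",
    ("M", "PrWr"):   "M",
    ("M", "BusRd"):  "S",   # flush the block, then share it
    ("M", "BusRdX"): "I",   # flush the block, then invalidate
    ("S", "PrRd"):   "S",
    ("S", "PrWr"):   "M",   # issue BusRdX to invalidate other copies
    ("S", "BusRd"):  "S",
    ("S", "BusRdX"): "I",
    ("I", "PrRd"):   "S",   # miss: issue BusRd
    ("I", "PrWr"):   "M",   # miss: issue BusRdX
}

def next_state(state: str, event: str) -> str:
    return MSI[(state, event)]

# card 21: a read miss in state I ends in state S
assert next_state("I", "PrRd") == "S"
```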