Exam Preparation Questions Flashcards
NB: The actual exam questions will be more elaborate.
Flynn’s Taxonomy classifies computer architectures as:
A) Single/multiple CPUs and distributed/shared memory
B) Single/multiple instruction and single/multiple data
C) CPUs/GPUs and distributed/shared memory
B) Single/multiple instruction and single/multiple data
Which system requires parallel computing with message passing to write computer programs?
A) A multicore laptop
B) A NUMA computer node
C) A multi-node computer cluster
C) A multi-node computer cluster
What is the bisection width of this network? (10 connections)
A) 2
B) 4
C) 6
B) 4
What is the advantage of a binary tree topology over a mesh topology?
A) Binary tree has a better diameter
B) Binary tree has more edges per node
C) Binary tree has a better bisection width
A) Binary tree has a better diameter
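The trade-off behind this answer can be checked numerically. Below is a small Python sketch (illustrative; the function names, the choice of a square 2D mesh without wraparound, and a complete binary tree with p leaves are my assumptions, not part of the deck) comparing diameter and bisection width for p nodes:

```python
import math

def mesh_diameter(p):
    # sqrt(p) x sqrt(p) 2D mesh without wraparound: the worst path crosses
    # both dimensions, so diameter = 2 * (sqrt(p) - 1)
    side = math.isqrt(p)
    return 2 * (side - 1)

def mesh_bisection(p):
    # cutting the mesh down the middle severs sqrt(p) links
    return math.isqrt(p)

def tree_diameter(p):
    # complete binary tree with p leaves: worst path is leaf -> root -> leaf,
    # i.e. 2 * log2(p) hops
    return 2 * int(math.log2(p))

def tree_bisection(p):
    # removing a single link next to the root splits the tree in half
    return 1
```

For p = 64 the tree has diameter 12 versus 14 for the mesh, but its bisection width is only 1 versus 8: the tree wins on diameter and loses on bisection width, which is why A is correct here while C is not.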
The communication complexity is …
using Distributed
addprocs(4)
N = 1600
A = ones(N, N)  # assumption: A is an N-by-N matrix (its definition was missing from the snippet)
ftr = @spawnat 2 sum(A)
s = fetch(ftr)
A) O(N)
B) O(N^2)
C) O(N^3)
B) O(N^2)
Which MPI directive would you use to send a copy of a 2D matrix to all processes from one process?
A) MPI_Bcast
B) MPI_Scatter
C) MPI_Allreduce
A) MPI_Bcast
Which one would you use to send a large message to avoid duplicating data?
A) MPI_Bsend
B) MPI_Ssend
C) MPI_Send
B) MPI_Ssend
Each process computes a set of rows of C = A * B. What is the communication complexity in each worker?
A) O(1)
B) O(N^3/P)
C) O(N^2)
C) O(N^2)
When parallelising matrix multiplication: C = A * B, which partition leads to the best computation/communication ratio?
A) Each process computes one row of C
B) Each process computes one column of C
C) Each process computes a set of rows of C
C) Each process computes a set of rows of C
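Why the row-block partition wins can be sketched with a simple cost model (an assumption of mine: one multiply-add costs one time unit, one received matrix entry one word; each worker is assumed to need all of B plus its share of A):

```python
def comp_comm_ratio_one_row(N):
    # one row of C: N^2 multiply-adds; the worker receives one row of A
    # (N words) plus all of B (N^2 words)
    return N**2 / (N**2 + N)

def comp_comm_ratio_row_block(N, P):
    # N/P rows of C: N^3/P multiply-adds; the worker receives N^2/P words
    # of A plus all of B (N^2 words)
    return (N**3 / P) / (N**2 + N**2 / P)
```

The one-row ratio stays below 1 no matter how large N gets, while the row-block ratio grows like N/P, so larger problems amortize the communication cost.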
Can we use latency hiding in the Jacobi method?
A) No, because all computations depend on the previous iteration
B) Yes, because computation of interior values can overlap communication
C) No, because all values depend on neighbouring values
B) Yes, because computation of interior values can overlap communication
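The overlap in answer B can be sketched serially. In a real MPI worker one would post non-blocking ghost exchanges (MPI_Isend/MPI_Irecv), update the interior points while the messages are in flight, then update the boundary points after MPI_Wait. This Python sketch (the 1D stencil and function names are illustrative assumptions) only shows the data split that makes the overlap legal:

```python
def jacobi_step_with_overlap(u, left_ghost, right_ghost):
    # One 1D Jacobi sweep on a worker's slice of the grid.
    n = len(u)
    new = [0.0] * n
    # Interior points read only locally owned data, so in an MPI code this
    # loop can run while the ghost exchange is still in progress.
    for i in range(1, n - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    # Boundary points need the neighbours' ghost values, so they are
    # computed only after the exchange completes.
    new[0] = 0.5 * (left_ghost + u[1])
    new[n - 1] = 0.5 * (u[n - 2] + right_ghost)
    return new
```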
Which of the following suggestions does NOT solve the incorrect behaviour of the parallel Floyd algorithm?
A) Communicate row k using standard sends
B) Communicate row k using synchronous sends
C) Use MPI.Barrier in each iteration
A) Communicate row k using standard sends
Which are the data dependencies for each worker in Gaussian elimination at iteration k>1?
A) Workers with rows > k need the entire row k
B) Workers with rows > k need all rows >= k
C) Workers with rows > k need only part of row k
C) Workers with rows > k need only part of row k
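Answer C follows from what the update actually reads. A minimal Python sketch (function name and list representation are my assumptions), where updating row i at iteration k touches only row_k[k:]:

```python
def eliminate_row(row_i, row_k, k):
    # Gaussian elimination update of row i at iteration k.
    # Only row_k[k:] is read: entries of row k left of column k are already
    # zero by this iteration, so a worker holding row i > k needs just the
    # trailing part of row k, not the whole row.
    m = row_i[k] / row_k[k]
    return row_i[:k] + [row_i[j] - m * row_k[j] for j in range(k, len(row_i))]
```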
What can be done to solve the load imbalance of block partitioning in Gaussian elimination?
A) Use a cyclic distribution of rows
B) Use a replicated workers model
C) Use a 2D block partition
A) Use a cyclic distribution of rows
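The imbalance and its fix can be counted directly: at iteration k only rows below k are updated, so with a block distribution the worker owning the top rows goes idle early. A Python sketch (the counting helper and the small N = 8, P = 2 example are my assumptions) tallies how many row-updates each worker performs over a full elimination:

```python
def updates_per_worker(N, P, owner):
    # Count the row-updates each worker performs across all N iterations
    # of Gaussian elimination; owner(i, N, P) maps row i to its worker.
    counts = [0] * P
    for k in range(N):
        for i in range(k + 1, N):  # rows below the pivot row are updated
            counts[owner(i, N, P)] += 1
    return counts

# Block distribution: worker 0 owns the first N/P rows, worker 1 the rest.
block = updates_per_worker(8, 2, lambda i, N, P: i // (N // P))
# Cyclic distribution: row i belongs to worker i mod P.
cyclic = updates_per_worker(8, 2, lambda i, N, P: i % P)
```

With N = 8 and P = 2 the block split gives 6 vs 22 updates, while the cyclic split gives 12 vs 16: far better balanced, which is why answer A is correct.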
In parallel TSP, there may be load imbalance because
A) It is impossible to divide the search tree evenly
B) We don't know beforehand how long the paths in the search tree are
C) Some workers might be able to do more pruning than others
C) Some workers might be able to do more pruning than others
How to cheat with speedups?
A) Use a fast compiler and a slow network
B) Use a slow compiler and a fast network
C) Use very small problem sizes
B) Use a slow compiler and a fast network