finals Flashcards

1
Q

the process contains more than
one thread, and the process is accomplishing a number of
things at the same time

A

multithreaded processes

2
Q

is the unit of execution within a process.

A

Thread

3
Q

each ____ has a separate memory address space, which means that a process runs independently and is isolated from other processes.

A

Process

4
Q

There can be multiple instances of a single program, and each instance of
that running program is a _______.

A

Process

5
Q

Speedup Formula

A

Speedup = 1 / [(1 - P) + P/N]
P (parallel fraction) = Parallel Time / Total Sequential Time
Total Sequential Time = Seq + Par + Seq
N = number of processors
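A quick worked example (with made-up timings): if a program spends 2 s in serial setup, 6 s in parallelizable work, and 2 s in serial cleanup, then Total Sequential Time = 10 s and P = 6/10 = 0.6. On N = 4 processors, Speedup = 1 / [(1 - 0.6) + 0.6/4] = 1 / 0.55 ≈ 1.8x, and even with unlimited processors it can never exceed 1 / 0.4 = 2.5x.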

6
Q

It seems that we could infinitely increase the number of processors and thus make the system run as fast as possible, but this law shows that the serial portion of a program limits the achievable speedup.

An example algorithm might look like:
1. Divide the pile of cards into stacks and hand out one stack to each person (Serial)
2. Everyone looks for the “concurrency” card (Parallel)
3. Place the found card in a separate pile (Serial)

A

Amdahl’s law

7
Q

What are the challenges in concurrent/parallel computing?

A

-Complexity of designing parallel algorithms:
* Removing task dependencies
* Can add large overheads
-Limited by memory access speeds
-Execution speed is sensitive to data
-Real world problems are most naturally described with mathematical recurrences

8
Q

What are the pros of parallel computing?

A

Improved Performance
Better Resource Utilization
Handles Large-Scale Problems
Scalability
Energy Efficiency
Fault Tolerance

9
Q

is when tasks actually run in parallel on multiple CPUs

A

Parallelism

10
Q

is when multiple tasks can run in overlapping
periods.

A

Concurrency

11
Q

Cons of Sequential Computing

A

Limited Performance
Poor Scalability
Inefficient for Complex Problems
Resource Underutilization
Longer Execution Time

12
Q

In sequential computing, no communication or synchronization is required between different steps of the program's execution, but there is an indirect ________ from the underutilization of available processing resources.

A

Overhead

13
Q

considered scalable if its performance improves after adding more processing
resources. In the case of sequential computing, the only way to scale the system is to increase the performance of system resources used – CPU, memory, etc.

A

Scalability

14
Q

It is a straightforward approach, with a clear set of step-by-step instructions about what to do and when to do it.

A

Simplicity

15
Q

The serial execution of tasks is a sort of chain, where the first task is followed by the second one, the second is followed by the third, and so on. The important point here is that tasks are physically executed without overlapping time periods

A

Sequential Computing

16
Q

machines (loosely coupled multiprocessor systems) in which all PEs have a local memory.

A

Distributed memory MIMD

17
Q

(tightly coupled multiprocessor
systems), all the PEs are connected to a single global memory and they all
have access to it.

A

shared memory MIMD

18
Q

Multicomputers
Multiprocessors

A

MIMD

19
Q

Vector processors
Fine-grained data parallel

A

SIMD

20
Q

May be pipelined computers

A

MISD

21
Q

Traditional von Neumann
single-CPU computers

A

SISD

22
Q

is a multiprocessor machine which is
capable of executing multiple instructions on multiple data sets.

A

MIMD

23
Q

computing system is a multiprocessor machine capable of executing different instructions on different PEs but all of them operating on the same dataset.

A

MISD

24
Q

is a multiprocessor machine capable of
executing the same instruction on all the CPUs but operating on different data streams.

A

SIMD

25
is a uniprocessor machine which is capable of executing a single instruction, operating on a single data stream. All the instructions and data to be processed have to be stored in primary memory
SISD
26
is a classification of computer architectures, proposed by Michael J. Flynn in 1966 and extended in 1972.
Flynn's Taxonomy
27
Sequential performance improvement slowed down, creating a need for parallel computing to handle larger datasets and modern application requirements. These improvements are slowing down due to the end of ______.
Moore's Law
28
Time Frame: 2005 to 20??
Third Crisis Problem: Sequential performance is left behind by Moore’s law
29
does not require the programmers to know anything about the processors to get good speedups
Moore’s law
30
’80s and ’90s. Problem: Inability to build and maintain complex and robust applications requiring multi-million lines of code developed by hundreds of programmers
Object-Oriented Programming (C++, C# and Java); better tools (component libraries); better software engineering methodology (design patterns, specification, testing, code reviews)
31
High-Level Languages for von Neumann Machines: Fortran and C provide a “common machine language” for uniprocessors
’60s and ’70s. Problem: Assembly Language Programming
32
“To put it quite bluntly: as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem”-- E. Dijkstra, 1972 Turing Award Lecture
The Software Crisis
33
is an elegant way to manage mutexes and ensure that they are properly locked and unlocked, preventing potential deadlocks and race conditions. It locks the mutex when the guard object is created and unlocks it automatically when the guard is destroyed.
std::lock_guard
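A minimal sketch of the idea (the mutex and counter names here are illustrative, not from the course):

#include <mutex>

std::mutex mtx;
int counter = 0;

void increment() {
    std::lock_guard<std::mutex> guard(mtx);  // locks mtx on construction
    ++counter;                               // critical section
}                                            // guard destroyed here, mtx unlocked automatically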
34
Many atomic operations are lock-free, meaning they don't use mutexes and are faster
Efficiency
35
std::atomic makes sure that multiple threads can safely read and write shared data
Thread Safety
36
These are operations that complete without interruption, preventing race conditions.
Atomic Operations
37
is a C++ feature that ensures safe access to shared data in multi-threaded programs without using locks.
std::atomic
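A minimal sketch (names are illustrative) showing two threads safely incrementing a shared counter without a mutex:

#include <atomic>
#include <thread>

std::atomic<int> hits{0};

void worker() {
    for (int i = 0; i < 1000; ++i)
        ++hits;                 // each increment is an atomic operation
}

int main() {
    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    // hits is now reliably 2000; with a plain int this would be a data race
}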
38
Attempts to lock the mutex without blocking. Returns true if successful, false otherwise
try_lock()
39
Releases the mutex, allowing other threads to acquire it.
unlock()
40
Blocks the calling thread until it successfully locks the mutex
lock()
41
stands for mutual exclusion. It is a synchronization primitive used to protect shared data from being accessed simultaneously by multiple threads, ensuring that only one thread can access a critical section at any one time.
mutex
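A minimal sketch (names are illustrative) tying together the lock(), try_lock(), and unlock() members from the cards above:

#include <mutex>

std::mutex m;
int shared_value = 0;

void writer() {
    m.lock();            // blocks until the mutex is acquired
    shared_value += 1;   // critical section
    m.unlock();          // releases the mutex so other threads can acquire it
}

void opportunistic_writer() {
    if (m.try_lock()) {  // non-blocking attempt; returns false if already locked
        shared_value += 1;
        m.unlock();
    }                    // else: do other useful work instead of waiting
}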
42
is an undesirable situation that occurs when a device or system attempts to perform two or more operations at the same time, but because of the nature of the device or system, the operations must be done in the proper sequence to be done correctly
race condition
43
Swaps the thread objects t1 and t2.
t1.swap(t2) / std::swap(t1, t2)
44
Suggests the OS reschedule threads (allowing others to run)
std::this_thread::yield()
45
Blocks the execution of the current thread for at least the specified sleep_duration
std::this_thread::sleep_for(relTime)
46
Pauses the current thread until the absolute time point absTime.
std::this_thread::sleep_until(absTime)
47
Returns the number of concurrent threads supported by the system.
std::thread::hardware_concurrency()
48
Returns the unique identifier for the current thread.
std::this_thread::get_id()
49
Returns the unique identifier for thread t.
t.get_id()
50
Returns true if thread t is still joinable (not detached or already joined).
t.joinable()
51
permits the thread to execute independently from the thread handle
detach()
52
waits for the thread to finish its execution
join()
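A minimal sketch (the task function is illustrative) combining several of the thread-management calls from the cards above:

#include <chrono>
#include <iostream>
#include <thread>

void task() {
    std::this_thread::sleep_for(std::chrono::milliseconds(100)); // block briefly
    std::cout << "worker id: " << std::this_thread::get_id() << '\n';
}

int main() {
    std::cout << "hardware threads: " << std::thread::hardware_concurrency() << '\n';
    std::thread t(task);   // t manages a separate thread
    if (t.joinable())      // true: not yet joined or detached
        t.join();          // wait for the worker to finish
}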
53
manages a separate thread
thread (C++11)
54
std::thread with support for auto-joining and cancellation
jthread (C++20)
55
suggests that the implementation reschedule execution of threads
yield
56
returns the thread id of the current thread
get_id()
57
stops the execution of the current thread for a specified time duration
sleep_for
58
stops the execution of the current thread until a specified time point
sleep_until
59
header for threads
<thread>
60
Executors, std::jthread, atomic smart pointers, std::future extensions, latches and barriers, coroutines, transactional memory, task blocks
C++20/23 (2020/2023)
61
C++17 (2017)
Parallel STL
62
C++14 (2014)
Reader-writer locks
63
C++11 (2011)
Memory model, threads, mutexes and locks, thread-local data, condition variables, tasks
64
looks similar to a deadlock in the sense that two threads are blocking each other from making progress, but the difference is that the threads are actively trying to resolve the problem: two or more threads are designed to respond to the actions of each other.
Livelock
65
generally, higher priority threads will be scheduled to execute more often and that can leave low priority threads feeling “hungry”.
Cause of Starvation
66
occurs when a thread is unable to gain access to a necessary resource, and is therefore, unable to make progress.
Starvation
67
If one thread or process acquires a lock, and then terminates because of some unexpected reason, it may not automatically release the lock before it disappears.
Abandoned lock
68
They're both stuck waiting on the other thread to release the other lock to make progress. Each member of a group is waiting for some other member to take action, and as a result, neither member is able to make progress.
Deadlock
69
is a synchronization primitive that allows multiple threads to read from a shared resource simultaneously while ensuring exclusive access for writing.
std::shared_mutex
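A minimal sketch (names are illustrative) of a reader-writer lock: many readers may hold the shared lock at once, while a writer takes it exclusively:

#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

std::shared_mutex rw;
std::map<std::string, int> table;

int read_entry(const std::string& key) {
    std::shared_lock<std::shared_mutex> lock(rw);  // shared (read) mode
    auto it = table.find(key);
    return it == table.end() ? 0 : it->second;
}

void write_entry(const std::string& key, int value) {
    std::unique_lock<std::shared_mutex> lock(rw);  // exclusive (write) mode
    table[key] = value;
}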
70
What should you use when you have a lot more threads that will be reading from the shared data than threads that will be writing to it, such as in certain types of database applications?
shared reader-writer lock
71
can improve a program's performance versus using a standard mutex. But they are more complicated to implement and they typically use more resources to keep track of the number of readers
Read-write Lock
72
Since only one thread can have the write lock at a time, all other threads wanting to read or write will have to ...
wait until the lock becomes available again
73
that limits access to only one thread at a time, allowing that thread to safely write to the shared resource.
exclusive write mode
74
that allows multiple threads that only need to read simultaneously to lock it.
shared read mode
75
to protect a critical section of code to defend against data races, which can occur when multiple threads are concurrently accessing the same location in memory and at least one of those threads is writing to that location.
locker mutex
76
allows you to attempt to acquire a lock without blocking. If the lock is already acquired by another thread, try_lock will immediately return false.
try lock
77
is a non-blocking version of the lock or acquire method
Try lock or try enter
78
is a type of mutex that allows the same thread to acquire the lock multiple times without causing a deadlock. This can be useful in scenarios where the same thread needs to re-enter a critical section
std::recursive_mutex
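A minimal sketch (names are illustrative) where the same thread re-enters the critical section through recursion without deadlocking:

#include <mutex>

std::recursive_mutex rm;
int total = 0;

void add(int n) {
    std::lock_guard<std::recursive_mutex> guard(rm);  // same thread may lock again
    total += n;
    if (n > 1)
        add(n - 1);   // re-enters the locked section safely
}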
79
Many programmers like using reentrant locks because
it can make things easier
80
is a particular type of mutex that can be locked multiple times by the same process or thread.
Reentrant mutex
81
are used for thread synchronization, allowing threads to wait until a particular condition is met. They are part of the <condition_variable> library.
Condition Variables
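A minimal sketch (names are illustrative) of a producer/consumer pair using a condition variable:

#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex m;
std::condition_variable cv;
std::queue<int> q;

void producer() {
    {
        std::lock_guard<std::mutex> lock(m);
        q.push(42);
    }
    cv.notify_one();                          // wake one waiting consumer
}

int consumer() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return !q.empty(); }); // sleeps until the condition holds
    int value = q.front();
    q.pop();
    return value;
}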
82
There is no possibility to ______ or ______ the counter, which makes the latch a single-use barrier.
increase, reset
83
A class that defines an exception object thrown by functions in the thread library dealing with asynchronous execution and shared states (e.g., std::future, std::promise) upon failure.
std::future_error
84
An enumeration that represents the status of a std::future object, as returned by the wait_for and wait_until member functions. Possible values are std::future_status::ready, std::future_status::timeout, and std::future_status::deferred.
std::future_status
85
A class template that wraps a callable object (like a function or lambda) and allows its result to be retrieved asynchronously via a std::future.
std::packaged_task
86
A class template similar to std::future, but allows multiple threads to wait for and retrieve the result of a shared asynchronous operation.
std::shared_future
87
A class template that allows a thread to provide a value or an exception to a std::future object. It is used to set the result of an asynchronous operation.
std::promise
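A minimal sketch (names are illustrative) of a worker thread delivering a result through a std::promise / std::future pair:

#include <future>
#include <thread>

int main() {
    std::promise<int> p;
    std::future<int> f = p.get_future();  // future linked to this promise

    std::thread worker([&p] {
        p.set_value(123);                 // provide the result of the async work
    });

    int result = f.get();                 // blocks until set_value is called
    worker.join();
    return result == 123 ? 0 : 1;
}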
88
is an enumeration that specifies the launch policy for a task executed by std::async.
std::launch
89
is a class template that provides a mechanism to access the result of asynchronous operations.
std::future
90
runs a specified function asynchronously and returns a std::future object that can be used to retrieve the result.
std::async
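A minimal sketch (the function is illustrative) of launching a task with std::async and collecting its result through the returned std::future:

#include <future>
#include <iostream>

int slow_square(int x) { return x * x; }  // stand-in for a long-running task

int main() {
    // std::launch::async forces the task to run on a new thread
    std::future<int> result = std::async(std::launch::async, slow_square, 7);
    // ... the main thread can do other work here ...
    std::cout << result.get() << '\n';    // blocks until the result (49) is ready
}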
91
is a technique that enables your program to start a potentially long-running task and still be responsive to other events while that task runs.
asynchronous programming
92
decrements the counter and blocks until it reaches zero.
arrive_and_wait
93
blocks until the counter reaches zero.
wait
94
tests if the internal counter equals zero.
try_wait
95
decrements the counter in a non-blocking manner.
count_down
96
header is used for latches.
<latch>
97
When may threads block on a latch?
Threads may block on the latch until the counter is decremented to zero.
98
When is the value of the counter initialized in a latch?
The value of the counter is initialized on creation.
99
are a downward counter type that can be used to synchronize threads.
Latches
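A minimal sketch (names are illustrative) of a single-use latch used to wait for several workers to finish their setup:

#include <latch>
#include <thread>
#include <vector>

int main() {
    constexpr int workers = 3;
    std::latch done(workers);          // counter is initialized on creation

    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i)
        pool.emplace_back([&] {
            // ... one-time setup work ...
            done.count_down();         // non-blocking decrement
        });

    done.wait();                       // blocks until the counter reaches zero
    for (auto& t : pool) t.join();
}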
100
Decrements the initial expected count for all subsequent phases by one, and then decrements the expected count for the current phase by one.
arrive_and_drop
101
arrives at barrier and decrements the expected count by one, then blocks until current phase completes
arrive_and_wait
102
blocks at the phase synchronization point until its phase completion step is run
wait
103
arrives at barrier and decrements the expected count
arrive
104
is a function object that is called on every phase completion step.
completion
105
an unspecified object type meeting requirements of MoveConstructible, MoveAssignable, and Destructible.
arrival_token
106
this header is used for barriers in parallel computing.
<barrier>
107
is a group of threads or processes in the source code that means any thread/process must stop at this point and cannot proceed until all other threads/processes reach this barrier.
Barrier
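A minimal sketch (names are illustrative) of a reusable barrier that synchronizes threads between phases:

#include <barrier>
#include <thread>
#include <vector>

int main() {
    constexpr int workers = 4;
    std::barrier sync(workers, []() noexcept {
        // completion step: runs once per phase, after all threads have arrived
    });

    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i)
        pool.emplace_back([&] {
            // ... phase 1 work ...
            sync.arrive_and_wait();    // no thread proceeds until all arrive
            // ... phase 2 work ...
        });
    for (auto& t : pool) t.join();
}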
108
returns the maximum possible value of the internal counter.
max
109
tries to decrement the internal counter, blocking until a point in time.
try_acquire_until
110
tries to decrement the internal counter, blocking for up to a duration time.
try_acquire_for
111
tries to decrement the internal counter without blocking.
try_acquire
112
increments the internal counter and unblocks acquirers.
release
113
destructs the counting_semaphore.
destructor
114
C++20.
semaphores
115
has a value that ranges over an unrestricted domain and is used to control access to a resource that has multiple instances.
Counting Semaphore
116
also known as a mutex lock, can have only two values - 0 and 1. Its value is initialized to 1 and is used to implement the solution of critical section problems with multiple processes.
Binary Semaphore
117
What are the two types of semaphores?
Binary Semaphore and Counting Semaphore.
118
used to solve the critical section problem and to achieve process synchronization in a multiprocessing environment.
Semaphore
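A minimal sketch (names are illustrative) of a counting semaphore limiting concurrent access to a pool of three resources:

#include <semaphore>
#include <thread>
#include <vector>

std::counting_semaphore<3> slots(3);   // at most 3 threads inside at once

void use_resource() {
    slots.acquire();                   // decrements; blocks if the counter is zero
    // ... use one of the 3 pooled resources ...
    slots.release();                   // increments and unblocks a waiting thread
}

int main() {
    std::vector<std::thread> pool;
    for (int i = 0; i < 8; ++i) pool.emplace_back(use_resource);
    for (auto& t : pool) t.join();
}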
119
The fourth and final stage of our parallel design process, where we specify where each of the tasks we established will actually execute. It really becomes a factor if you're using a distributed system or specialized hardware with lots of parallel processors for large-scale problems, like in scientific computing applications.
Mapping
120
Splits the program into a small number of large tasks
Coarse-grained Parallelism
121
a benefit where lots of small tasks can be more evenly distributed among processors to maximize their usage
Load Balancing
122
A program is broken down into a large number of small tasks
Fine-grained Parallelism
123
In the first two stages of our parallel design process we partitioned the problem into a set of separate tasks, and then established communication to provide those tasks with the data they needed. In this third stage, we combine those tasks into larger tasks to improve performance and reduce communication overhead.
Agglomeration
124
is the amount of data that can be communicated per unit of time, expressed in some unit of bytes per second
Bandwidth
125
the time it takes for a message to travel from point A to B expressed in units of time like microseconds
Latency
126
Often referred to as nonblocking communications because, after a task sends an asynchronous message, it can begin doing other work immediately, regardless of when the receiving task actually gets that message.
Asynchronous Communications
127
Sometimes called blocking communications because all tasks involved have to wait until the entire communication process is completed to continue doing other work.
Synchronous Communications
128
The second stage of our parallel design process is to establish communication, which involves figuring out how to coordinate execution and share data between the tasks.
Communication
129
Rather than focusing on the data being manipulated, functional decomposition begins by considering all of the computational work that a program needs to do and then divides that into separate tasks that perform different portions of the overall work.
Functional Decomposition
130
Focuses on dividing the data associated with the problem into lots of small and if possible, equally sized partitions. Programmers typically start with this because it forms the foundation for a lot of parallel algorithms
Domain Decomposition
131
is about breaking down the problem into discrete chunks of work that can be distributed to multiple tasks.
partitioning
132
OpenACC is meant to be easy to use and easy to learn. Programmer remains in familiar C, C++, or Fortran. No reason to learn low-level details of the hardware.
Low Learning Curve
133
Rebuild the same code on multiple architectures. Compiler determines how to parallelize for the desired machine. Sequential code is maintained.
Single Source
134
Maintain existing sequential code. Add annotations to expose parallelism. After verifying correctness, annotate more of the code.
Incremental
135
directives tell the compiler or runtime to...
- Generate parallel code for the GPU
- Allocate GPU memory and copy input data
- Execute parallel code on the GPU
- Copy output data back to the CPU and deallocate GPU memory

A simpler way to remember it: Copy → Run → Copy Back → Free (copy data to the GPU, run the parallel code on the GPU, copy the results back, release the GPU memory).
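A minimal sketch (function and variable names are illustrative) of an OpenACC-annotated loop; the pragma asks the compiler to copy the arrays to the GPU, run the loop in parallel there, and copy the result back when the region ends (assumes an OpenACC-capable compiler such as nvc++ or gcc with -fopenacc):

#include <vector>

void saxpy(float a, std::vector<float>& x, const std::vector<float>& y) {
    const int n = static_cast<int>(x.size());
    float* xp = x.data();
    const float* yp = y.data();
    #pragma acc parallel loop copy(xp[0:n]) copyin(yp[0:n])
    for (int i = 0; i < n; ++i)
        xp[i] = a * xp[i] + yp[i];     // each iteration can run on a GPU thread
}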
136
parallel programming standard designed for performance and portability
OpenACC
137
designed to simplify GPU/Multicore programming
OpenACC
138
is a programming standard for parallel computing on accelerators (mostly on NVIDIA GPUs).
OpenACC
139
The CPU and GPU have _______ memories.
Separate
140
between CPU and GPU is required for programming.
Data Transfer
141
A computer program can be _____ and ______ on the GPU.
parallelized and accelerated
142
Nowadays GPUs are used for tasks that were formerly the domain of CPUs, such as scientific computation. This kind of GPU is called general-purpose GPU.
GPGPU
143
Dedicated for manipulating computer graphics and image processing. Traditionally known as a 'video card'.
GPU
144
What are the requirements for pervasive systems?
• Embrace contextual changes. • Encourage ad hoc composition. • Recognize sharing as the default.
145
Once a transaction commits, the changes are permanent.
Durable
146
Concurrent transactions do not interfere with each other.
Isolated
147
The transaction does not violate system invariants.
Consistent
148
To the outside world, the transaction happens indivisibly.
Atomic
149
What are some false assumptions made by first-time developers?
• The network is reliable. • The network is secure. • The network is homogeneous. • The topology does not change. • Latency is zero. • Bandwidth is infinite. • Transport cost is zero. • There is one administrator.
150
Scalability Problems
No machine has complete information about the system state. Machines make decisions based only on local information. Failure of one machine does not ruin the algorithm. There is no implicit assumption that a global clock exists.
151
Doing routing based on complete information.
Centralized Algorithms
152
A single online telephone book.
Centralized Data
153
A single server for all users.
Centralized Services
154
Hides the failure and recovery of a resource.
Failure
155
Hides that a resource may be shared by several competitive users.
Concurrency
156
Hides that a resource is replicated.
Replication
157
Hides that a resource may be moved to another location while in use.
Relocation
158
Hides that a resource may move to another location.
Migration
159
Hides where a resource is located.
Location
160
Hides differences in data representation and how a resource is accessed.
Access
161
A layer that extends over multiple machines and offers each application the same interface.
Middleware
162
A collection of independent computers that appears to its users as a single coherent system.
Distributed System