11-16 Flashcards

(153 cards)

1
Q

Thread: A new abstraction for __.

A

a single running process

2
Q

Multi-threaded program:

(3)…

A

• A multi-threaded program has more than one point of execution
- Multiple program counters, one for each thread
• They share the same address space
• Each thread has its own stack
• Each thread has its own private set of registers

3
Q

Context switch between threads:
Each thread has its own __.
- …

A

program counter and set of registers;

One or more thread control blocks (TCBs) are needed to store the state of each thread

4
Q

Context switch between threads:
When switching from running one thread (T1) to running the another (T2):
(3)…

A
  • The register state of T1 must be saved
  • The register state of T2 must be restored
  • The address space remains the same
5
Q

Why Use Threads?:

(2)…

A
  1. Parallelism
    • Divide a task among several threads
    • On a system with multiple processors threads can work in parallel with each other
  2. Overlap of I/O with other activities within a single program
    • When one thread requests I/O from the system, switch to another thread ready to work
    • Similar to multiprogramming across different processes
6
Q

Parallelism:

(2)…

A
  • Divide a task among several threads

* On a system with multiple processors threads can work in parallel with each other

7
Q

Overlap of I/O with other activities within a single program:
(2)…

A
  • When one thread requests I/O from the system, switch to another thread ready to work
  • Similar to multiprogramming across different processes
8
Q

The stack of the relevant thread:
There will be one stack per thread:
A Single-Threaded Address Space:
(3)…

A
  1. The code segment:
    where instructions live
  2. The heap segment:
    contains malloc'd data and dynamic data structures (it grows downward)
  3. The stack segment:
    contains local variables, arguments to routines, return values, etc. (it grows upward)
9
Q

Example: Creating a Thread

A
#include <pthread.h>
void *mythread(void *arg) {
printf("%s\n", (char *) arg);
return NULL;
}
int main(int argc, char *argv[]) {
if (argc != 1) {
fprintf(stderr, "usage: main\n");
exit(1);
}
pthread_t p1, p2;
printf("main: begin\n");
Pthread_create(&p1, NULL, mythread, "A");
Pthread_create(&p2, NULL, mythread, "B");
// join waits for the threads to finish
Pthread_join(p1, NULL);
Pthread_join(p2, NULL);
printf("main: end\n");
return 0;
}
10
Q
How to create and control threads?: 
#include <pthread.h>
int
pthread_create( pthread_t* thread,
const pthread_attr_t* attr,
void* (*start_routine)(void*),
void* arg);
• thread: __
• attr: __
- __
• start_routine: __
• arg: __
- a void pointer allows us to __
• return value: __
A

• thread: Used to interact with this thread
• attr: Used to specify any attributes this thread might have
- Stack size, Scheduling priority, …
• start_routine: the function this thread starts running in
• arg: the argument to be passed to the function (start routine)
- a void pointer allows us to pass in any type of argument
• return value: on success, returns 0; on error, it returns an error number, and the contents of *thread are undefined

11
Q
Wait for a thread to complete: 
int pthread_join(pthread_t thread, void **value_ptr);

• thread: __
• value_ptr: __
- Because the pthread_join() routine changes the value, __

A

• thread: Specify which thread to wait for
• value_ptr: A pointer to the return value
- Because the pthread_join() routine changes the value, you need to pass in a pointer to that value

12
Q

Example: Dangerous code
• Be careful with __ from a thread
(2)…

A

how values are returned

  1. Variable r is allocated on the stack
  2. When the thread returns, r is automatically de-allocated

void *mythread(void *arg) {
    myarg_t *m = (myarg_t *) arg;
    printf("%d %d\n", m->a, m->b);
    myret_t r; // Danger!!!!! Why is this bad?
    r.x = 1;
    r.y = 2;
    return (void *) &r;
}
13
Q

Example: Simpler Argument Passing to a Thread

A
Just passing in a single value:
void *mythread(void *arg) {
long m = (long) arg;
printf("%ld\n", m);
return (void *) (m + 1);
}
int main(int argc, char *argv[]) {
pthread_t p;
long m;
pthread_create(&p, NULL, mythread, (void *) 100);
pthread_join(p, (void **) &m);
printf("returned %ld\n", m);
return 0;
}
14
Q

Threading: Shared Variables

A
static volatile int counter = 0; // shared global variable
// mythread()
// Simply adds 1 to counter repeatedly, in a loop. No, this is not how you would add 10,000,000 to a counter,
// but it shows the problem nicely.
void *mythread(void *arg) {
printf("%s: begin\n", (char *) arg);
int i;
for (i = 0; i < 1e7; i++) {
counter = counter + 1;
}
printf("%s: done\n", (char *) arg);
return NULL;
}
// main()
// Just launches two threads (pthread_create)
// and then waits for them (pthread_join)
int main(int argc, char *argv[]) {
pthread_t p1, p2;
printf("main: begin (counter = %d)\n", counter);
Pthread_create(&p1, NULL, mythread, "A");
Pthread_create(&p2, NULL, mythread, "B");
Pthread_join(p1, NULL); // join waits for the threads to finish
Pthread_join(p2, NULL);
printf("main: done with both (counter = %d)\n", counter);
return 0;
}
15
Q

Race condition:
Example with two threads and counter = 50
• counter = __
• We expect the result to be __

A

counter + 1 (runs twice, once for each thread);

52

16
Q

Critical section:

(3)…

A
  • A piece of code that accesses a shared variable and must not be concurrently executed by more than one thread
  • Multiple threads executing critical section can result in a race condition
  • Need to support atomicity for critical sections (mutual exclusion)
17
Q

Locks ensure that any such critical section executes as if…

A

• An example: the canonical update of a shared variable:
balance = balance + 1;

• Add some code around the critical section:
lock_t mutex; // some globally-allocated lock ‘mutex’

lock(&mutex);
balance = balance + 1;
unlock(&mutex);

18
Q

Locks provide __ to a critical section.

A

  1. Interface
    int pthread_mutex_lock(pthread_mutex_t *mutex);
    int pthread_mutex_unlock(pthread_mutex_t *mutex);
  2. Usage (w/o lock initialization and error check)
    pthread_mutex_t lock;
    pthread_mutex_lock(&lock);
    x = x + 1; // or whatever your critical section is
    pthread_mutex_unlock(&lock);

  • No other thread holds the lock → the thread will acquire the lock and enter the critical section
  • If another thread holds the lock → the thread will not return from the call until it has acquired the lock
19
Q
  • No other thread holds the lock → __

  • If another thread holds the lock → __

A

the thread will acquire the lock and enter the critical section;
the thread will not return from the call until it has acquired the lock

20
Q

Lock Initialization:
All locks must be __.

2 ways:

A

  1. One way: using PTHREAD_MUTEX_INITIALIZER
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  2. The dynamic way: using pthread_mutex_init()
    pthread_mutex_t lock;
    int rc = pthread_mutex_init(&lock, NULL);
    assert(rc == 0); // always check success!
21
Q

Lock Error Checking:

• Always check the __.

A
return value for errors when calling lock and unlock
---
• An example wrapper
// Use this to keep your code clean but check for failures
// Only use if exiting program is OK upon failure
void Pthread_mutex_lock(pthread_mutex_t *mutex) {
int rc = pthread_mutex_lock(mutex);
assert(rc == 0);
}
22
Q

Condition variables are useful when __.

A

some kind of signaling must take place between threads

int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_signal(pthread_cond_t *cond);

23
Q
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_signal(pthread_cond_t *cond);
  1. pthread_cond_wait
    (2) …
  2. pthread_cond_signal
    (1) …
A
  • Put the calling thread to sleep
  • Wait for some other thread to signal it;

• Unblock at least one of the threads that are blocked on the condition variable

24
Q
Condition Variable Example: 
A thread calling wait routine: 
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t init = PTHREAD_COND_INITIALIZER;
pthread_mutex_lock(&lock);
while (initialized == 0)
pthread_cond_wait(&init, &lock);
pthread_mutex_unlock(&lock);

(2)…

A

• The wait call releases the lock when putting the caller to sleep
• Before returning after being woken, the wait call re-acquires the lock

• A thread calling signal routine

pthread_mutex_lock(&lock);
initialized = 1;
pthread_cond_signal(&init);
pthread_mutex_unlock(&lock);

25
Threads: Compiling and Running: To compile, you must __.
include the header pthread.h • Explicitly link with the pthreads library, by adding the -pthread flag: prompt> gcc -o main main.c -Wall -pthread • For more information: man -k pthread
26
Thread API Guidelines: | (8)...
1. Keep it simple • Tricky thread interactions lead to bugs 2. Minimize thread interactions • Each interaction should be carefully thought out and constructed with known design patterns (which we will learn about) 3. Initialize locks and condition variables • Otherwise code can sometimes fail in strange and unpredictable ways 4. Check your return codes • Failure to do so will lead to bizarre and hard to understand behavior 5. Be careful how you pass arguments and return values • If you return a reference to a variable on the stack, you are going to have a bad time 6. Each thread has its own stack • To share data between threads, the values must be in the heap or a global variable 7. Always use condition variables to signal between threads • It's tempting to use a simple flag, but don't do it 8. Use the man pages • Highly informative with good examples
27
Locks: The Basic Idea Lock variable holds the state of the lock: (2)...
1. available (or unlocked or free) • No thread holds the lock 2. acquired (or locked or held) • Exactly one thread holds the lock and presumably is in a critical section
28
lock(): | (3)...
• Try to acquire the lock • If no other thread holds the lock, the thread will acquire the lock • Enter the critical section - This thread is said to be the owner of the lock
29
Other threads are prevented from entering the critical section while __. (2)...
the first thread holds the lock; 1. Other threads will block on the call to lock, until the lock is released 2. If several threads are waiting on the lock, only one will get it when it is released
30
Pthread Locks - mutex: | (3)...
• The name that the POSIX library uses for a lock • Used to provide mutual exclusion between threads pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; Pthread_mutex_lock(&lock); // wrapper for pthread_mutex_lock() balance = balance + 1; Pthread_mutex_unlock(&lock); • We may be using different locks to protect different variables → Increase concurrency (a more fine-grained approach)
31
Building A Lock: | (2)...
* Efficient locks provide mutual exclusion at low cost | * Building a lock needs some help from the hardware and the OS
32
Evaluating locks – Basic criteria: | (3)...
1. Mutual exclusion • Does the lock work, preventing multiple threads from entering a critical section? 2. Fairness • Does each thread contending for the lock get a fair shot at acquiring it once it is free? (Starvation) 3. Performance • The time overheads added by using the lock
33
Controlling Interrupts: | Disable Interrupts for __.
critical sections
34
Controlling Interrupts: Disable Interrupts for critical sections: (2)...
• One of the earliest solutions used to provide mutual exclusion • Invented for single-processor systems

```
void lock()   { DisableInterrupts(); }
void unlock() { EnableInterrupts(); }
```
35
Disable Interrupts for critical sections: • One of the earliest solutions used to provide mutual exclusion • Invented for single-processor systems Problems: (3)...
• Require too much trust in applications - Greedy (or malicious) program could monopolize the processor • Do not work on multiprocessors • Code that masks or unmasks interrupts is executed slowly by modern CPUs
36
Locks: | Why is hardware support needed?
First attempt: Using a flag denoting whether the lock is held or not • The code below has problems

```
typedef struct __lock_t { int flag; } lock_t;

void init(lock_t *mutex) {
    // 0 → lock is available, 1 → held
    mutex->flag = 0;
}

void lock(lock_t *mutex) {
    while (mutex->flag == 1) // TEST the flag
        ;                    // spin-wait (do nothing)
    mutex->flag = 1;         // now SET it!
}

void unlock(lock_t *mutex) {
    mutex->flag = 0;
}
```

• Problem 1: No mutual exclusion (assume flag=0 to begin) • Problem 2: Spin-waiting wastes time waiting for another thread • So, we need an atomic instruction supported by hardware - test-and-set instruction, also known as atomic exchange
37
Test And Set (Atomic Exchange): An instruction to support the creation of simple locks

```
int TestAndSet(int *ptr, int new) {
    int old = *ptr; // fetch old value at ptr
    *ptr = new;     // store 'new' into ptr
    return old;     // return the old value
}
```

(4)...
• return (test) old value pointed to by the ptr • Simultaneously update (set) said value to new • This sequence of operations is performed atomically • x86_64: - xchg rax, (mem)
38
Evaluating Spin Locks: 1. Correctness: ... 2. Fairness: ... 3. Performance: ...
1. Correctness: yes • The spin lock only allows a single thread to enter the critical section 2. Fairness: no • Spin locks don't provide any fairness guarantees • Indeed, a thread spinning may spin forever 3. Performance: • For a single CPU, performance overheads can be quite painful • If the number of threads roughly equals the number of CPUs, spin locks work reasonably well
39
Compare-And-Swap: Test whether the value at the address(ptr) is equal to expected (3)...
• If so, update the memory location pointed to by ptr with the new value • In either case, return the actual value at that memory location • x86_64 - cmpxchg
40
Fetch-And-Add
Atomically increment a value while returning the old value at a particular address
41
__ can be built with fetch-and-add | • __ → fairness
Ticket lock; | Ensure progress for all threads
42
Hardware-based spin locks are __ and they __.
simple; | work
43
Spinning: • Hardware-based spin locks are simple and they work • In some cases, these solutions can be quite inefficient - Any time a thread gets caught spinning, it __
wastes an entire time slice doing nothing but checking a value
44
A Simple Approach: Just Yield: | When you are going to spin, __:
give up the CPU to another thread; 1. OS system call moves the caller from the running state to the ready state 2. The cost of a context switch can be substantial and the starvation problem still exists
45
Using Queues: Sleeping Instead of Spinning: Queue to __. • park() - __ • unpark(threadID) - __
keep track of which threads are waiting to enter the lock; • park() - Put a calling thread to sleep • unpark(threadID) - Wake a particular thread as designated by threadID
46
Wakeup/waiting race: Thread A releases the lock just right before Thread B calls park() • Thread B could potentially __. • Solaris solves this problem by adding a third system call: (2)...
sleep forever; setpark() • By calling this routine, a thread can indicate it is about to park • If the thread happens to be interrupted and the lock is freed before park is actually called, the subsequent park returns immediately instead of sleeping
47
Two-Phase Locks: | A two-phase lock realizes that spinning can be useful if __.
the lock is about to be released
48
Two-Phase Locks: A two-phase lock realizes that spinning can be useful if the lock is about to be released. 1. First phase: (2)... 2. Second phase: (2)...
1. First phase • The lock spins for a while, hoping that it can acquire the lock • If the lock is not acquired during the first spin phase, a second phase is entered 2. Second phase • The caller is put to sleep • The caller is only woken up when the lock becomes free later
49
Condition Variables: | (2)...
• There are many cases where we wish to have coordination between threads • A thread wishes to check whether a condition is true before continuing its execution --- • Example: • A parent thread might wish to check whether a child thread has completed • This is often called a join()
50
How to wait for a condition: | __.
Condition variable – an object used to wait for some condition to be true
51
Condition variable – an object used to wait for some condition to be true: 1. Waiting on the condition variable (2) ... 2. Signaling on the condition variable (1) ...
1. Waiting on the condition variable: • An explicit queue that threads can put themselves on when some state of execution is not as desired • The thread is no longer running, freeing up the CPU to run another thread 2. Signaling on the condition variable: • Some other thread, when it changes said state, can wake one of those waiting threads and allow them to continue
52
The wait() call takes a __ as a parameter
mutex
53
The wait() call takes a mutex as a parameter (3)...
* The wait() call releases the lock and puts the calling thread to sleep * When the thread wakes up, it must re-acquire the lock * It is assumed the thread is holding the lock when signal() is called
54
Parent waiting for Child: Use a condition variable: Parent: (2)...
1. Create the child thread and continues running itself 2. Call into thr_join() to wait for the child thread to complete • Acquire the lock • Check if the child is done • Put itself to sleep by calling wait() • Release the lock
55
Parent waiting for Child: Use a condition variable: Child: (2)...
1. Print the message “child” 2. Call thr_exit() to wake the parent thread • Grab the lock • Set the state variable done • Signal the parent thus waking it
56
The Importance of the State Variable done: Can you think of a scenario where we could run into problems? (3)...
Imagine the case where the child runs immediately • The child will signal, but there is no thread sleeping on the condition • When the parent runs, it will call wait and be stuck • No thread will ever wake it, sad!
57
```
volatile int done = 0;

void thr_exit() {
    done = 1;
    Pthread_cond_signal(&c);
}

void thr_join() {
    if (done == 0)
        Pthread_cond_wait(&c);
}
```

```
int main(int argc, char *argv[]) {
    printf("parent: begin\n");
    pthread_t p;
    Pthread_create(&p, NULL, child, NULL);
    thr_join();
    printf("parent: end\n");
    return 0;
}
```

Can you find the bug? ...
The issue here is a race condition 1. The parent calls thr_join() • The parent checks the value of done • It will see that it is 0 and try to go to sleep • Just before it calls pthread_cond_wait to go to sleep, the parent is interrupted and the child runs 2. The child changes the state variable done to 1 and signals • But no thread is waiting and thus no thread is woken • When the parent runs again, it sleeps forever, sad!
58
The Producer / Consumer (Bounded Buffer) Problem: 1. Producer (2) ... 2. Consumer (1) ...
1. Producer • Produces data items • Wishes to place data items in a buffer 2. Consumer • Grabs data items out of the buffer to consume them in some way --- Example: Multi-threaded web server • A producer puts HTTP requests into a work queue • Consumer threads take requests out of this queue and process them
59
``` Producer/Consumer (non-working): • Put: ... • Get: ... • Producer: ... • Consumer: ... • Need... ```
* Put -- Only put data into the buffer when count is zero (i.e., when the buffer is empty) * Get -- Only get data from the buffer when count is one (i.e., when the buffer is full) * Producer -- puts an integer into the shared buffer loops number of times * Consumer -- gets the data out of that shared buffer * Need synchronization between the producer and consumer
60
Producer/Consumer: Single CV and If Statement: 1. A single condition variable cond and associated lock mutex • Works if __. 2. What happens if that is not the case (e.g., 2 consumers, 1 producer)? (3)... 3. Recheck state (in a while loop) upon returning from wait!
there is one producer and one consumer; * C1 runs and waits, P1 puts an item in and signals C1 * Before C1 gets to run, C2 sneaks in and consumes the item, setting count to 0 * When C1 runs, no more items left, sad!
61
Thread Trace: Broken Solution The problem arises for a simple reason: ...
After the producer woke Tc1, but before Tc1 ever ran, the state of the bounded buffer was changed by Tc2.
62
There is no guarantee that when the woken thread runs, the state will still be as desired -> __.
Mesa semantics • Virtually every system ever built employs Mesa semantics
63
Hoare semantics provides a stronger guarantee that...
the woken thread will run immediately upon being woken
64
Producer/Consumer: Single CV and While

```
cond_t cond;
mutex_t mutex;

void *producer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        Pthread_mutex_lock(&mutex);
        while (count == 1)
            Pthread_cond_wait(&cond, &mutex);
        put(i);
        Pthread_cond_signal(&cond);
        Pthread_mutex_unlock(&mutex);
    }
}
```

```
void *consumer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        Pthread_mutex_lock(&mutex);
        while (count == 0)
            Pthread_cond_wait(&cond, &mutex);
        int tmp = get();
        Pthread_cond_signal(&cond);
        Pthread_mutex_unlock(&mutex);
        printf("%d\n", tmp);
    }
}
```

This fixes our previous problem, however, this code still has a bug (5)...
• Assume two consumers and one producer • C1 runs, finds the buffer empty and waits, C2 runs, finds the buffer empty and waits • P1 runs, produces an item, signals, and waits because buffer is full • C1 wakes (from P1 signal) and consumes the buffer, signals, and then waits - Who gets the signal, P1 or C2? • C2 wakes, finds the buffer empty and waits – everyone is sleeping, sad!
65
The single Buffer Producer/Consumer Solution
Use two condition variables and while loops • Producer threads wait on the condition empty, and signal fill • Consumer threads wait on fill and signal empty
66
The Final Producer/Consumer Solution | ...
More concurrency and efficiency • Add more buffer slots • Allow concurrent production or consumption to take place • Reduce context switches
67
Covering Conditions: • Assume we have implemented a multi-threaded memory allocator • Also, assume there are zero bytes currently free: (4)...
• Thread Ta calls allocate(100) • Thread Tb calls allocate(10) • Both Ta and Tb wait on the condition and go to sleep • Thread Tc calls free(50) - Which waiting thread should be woken up?
68
Covering Conditions Solution: | (2)...
1. Replace pthread_cond_signal() with pthread_cond_broadcast() 2. pthread_cond_broadcast() • Wake up all waiting threads • Cost: too many threads might be woken up • Threads that shouldn’t be awake will simply wake up, re-check the condition, and then go back to sleep
69
We have a new synchronization primitive beyond locks: | ...
Condition variables
70
Allow for a thread to sleep when some program state is not as desired - Once sleeping, another thread must __.
wake up the thread by signal/broadcast
71
Condition variables are used in conjunction with a lock | (2)...
* When waiting on the CV, the lock is (temporarily) given up | * While returning from the wait, the thread re-acquires the lock
72
When a thread is signaled, it may not wake up right away | (2)...
* The state of the world may have changed | * Recheck your state (in a while loop) upon returning from wait if there is any chance the state may have changed
73
Adding threads is a good way to __.
parallelize your program
74
Adding threads is a good way to parallelize your program. Must be done correctly however: (3)...
* Adding locks to a data structure makes the structure thread safe * How locks are added determine both the correctness and performance of the data structure * Adding threads may actually slow down your code
75
Add a single lock: | The lock is acquired when __.
calling a routine that manipulates the data structure
76
Perfect Scaling: Ideally threads complete just as quickly on multiple processors as a single thread does on one (2)...
* Even though more work is done, it is done in parallel * The time taken to complete the task on each core is not increased For our example: • Single thread on one core: about 0.03 seconds • Two threads running concurrently: about 5 seconds
77
The performance costs of the simple approach: Each thread updates a single shared counter: (2)...
* Each thread updates the counter one million times | * iMac with four Intel 2.7GHz i5 CPUs
78
Synchronized counter scales __
poorly
79
The approximate counter works by representing: | (3)...
• A single logical counter, via numerous local physical counters, one per CPU core • A single global counter • There are multiple locks - One for each local counter and one for the global counter Example: on a machine with four CPUs • Four local counters • One global counter
80
The basic idea of approximate counting: When a thread running on a core wishes to increment the counter: (3)...
• It increments its local counter • Each CPU has its own local counter - Threads across CPUs can update local counters without contention - Therefore counter updates are scalable • The local values are periodically transferred to the global counter - Acquire the global lock - Increment it by the local counter’s value - The local counter is then reset to zero
81
Approximation Threshold: How often the local-to-global transfer occurs is determined by threshold S: 1. The smaller S: (1)... 2. The bigger S: (2)...
1. The smaller S: • The more the counter behaves like the non-scalable counter 2. The bigger S: • The more scalable the counter • The further off the global value might be from the actual count - Worst case: S * NUMCPUS
82
Approximate counter example: Tracing the Approximate Counters (3)...
* The threshold S is set to 5 * There is a thread on each of four CPUs * Each thread updates its local counter L1 … L4
83
Importance of the threshold value S: Each four threads increments a counter 1 million times on four CPUs: • Low S → __ • High S → __
* Low S → Performance is poor; the global count is always quite accurate * High S → Performance is excellent; the global count lags
84
Concurrent Linked Lists: 1. The code __ a lock in the insert routine upon entry 2. The code __ the lock upon exit (3) ...
acquires; releases --- • If malloc() happens to fail, the code must also release the lock before failing the insert • This kind of exceptional control flow has been shown to be quite error prone • Solution: The lock acquire and release only surround the actual critical section in the insert code
85
Hand-over-hand locking (lock coupling) | (3)...
• Add a lock per node of the list instead of having a single lock for the entire list • When traversing the list: - First grabs the next node's lock - And then releases the current node's lock • Enables a high degree of concurrency in list operations - However, in practice, the overhead of acquiring and releasing locks for each node of a list traversal is prohibitive
86
Michael and Scott Concurrent Queues: There are two locks: (3)...
* One for the head of the queue * One for the tail * The goal of these two locks is to enable concurrency of enqueue and dequeue operations
87
Michael and Scott Concurrent Queues: Add a dummy node: (2)...
* Allocated in the queue initialization code | * Enable the separation of head and tail operations
88
Concurrent Hash Table: Simple hash table: (3)...
* Does not resize * Built using the concurrent lists * It uses a lock per hash bucket each of which is represented by a list
89
We looked at a few of the concurrent data structures out there: (4)... Tips: (3)...
• Counters • Lists • Queues • Hash Tables --- • Be careful with acquiring and releasing locks around control flow changes • Enabling more concurrency does not necessarily increase performance • Premature optimization is the root of all evil!
90
Semaphore: • Created by Dijkstra to be a single primitive for synchronization - Can use as __. • An object with __. • We can manipulate with two routines: (2)...
both locks and condition variables; an integer value associated with it; • sem_wait() • sem_post()
91
#include <semaphore.h> sem_t s; sem_init(&s, 0, 1); // initialize s to the value 1 Must initialize before use: (2)...
* Declare a semaphore s and initialize it to the value 1 | * The second argument, 0, indicates that the semaphore is shared between threads in the same process
92
sem_wait(sem_t *s) | (3)...
• Decrements the integer value of the semaphore by 1 • If the value is negative, the caller will wait - It will cause the caller to suspend execution waiting for a subsequent post - Similar to a cond_wait() • If the value of the semaphore (after the decrement) is positive or zero, return right away
93
sem_post(sem_t *s) | (2)...
* Increments the value of the semaphore by 1 | * If there is any threads waiting on the semaphore, wake one of them up
94
When negative, the value of the semaphore is __.
the number of threads waiting on the semaphore
95
Semaphores can be used to provide mutual exclusion:
sem_t m;
sem_init(&m, 0, X); // initialize semaphore to X; what should X be?

sem_wait(&m);
// critical section here
sem_post(&m);

What should the semaphore above be initialized to? This is known as a __.
• The semaphore should be initialized to 1; binary semaphore • Works the same as a lock
96
The Producer/Consumer (Bounded-Buffer) Problem: | (2)...
• This works for a single producer and consumer • What if we have multiple producers or consumers? - We have a race condition - Need to add mutual exclusion for the calls to put() and get()
97
Reader-Writer Locks: Imagine a number of concurrent list operations, including inserts and simple lookups: 1. insert: (2)... 2. lookup: (2) ...
1. insert • Change the state of the list • A traditional critical section makes sense 2. lookup • Simply read the data structure • As long as we can guarantee that no insert is on-going, we can allow many lookups to proceed concurrently This special type of lock is known as a reader-writer lock
98
A Reader-Writer Locks: | Only a __ can acquire the lock
single writer
99
Once a reader has acquired a read lock: | (2)...
* More readers will be allowed to acquire the read lock too | * A writer will have to wait until all readers are finished
100
A Reader-Writer Locks: What about fairness?: (2)...
* It would be relatively easy for readers to starve writers | * A more sophisticated scheme could prevent this
101
The Dining Philosophers: Assume there are five “philosophers” sitting around a table (3)... Key challenges: (3)...
1. • Between each pair of philosophers is a single fork (five total) • The philosophers each have times where they think, and don’t need any forks, and times where they eat • In order to eat, a philosopher needs two forks, both the one on their left and the one on their right 2. Key challenges: • There is no deadlock • No philosopher starves and never gets to eat • Concurrency is high
102
* Philosopher p wishes to refer to the fork on their left → call __ * Philosopher p wishes to refer to the fork on their right → call __
left(p); right(p)
103
We need some semaphores, one for each fork: sem_t forks[5];
```
void getforks() {
    sem_wait(&forks[left(p)]);
    sem_wait(&forks[right(p)]);
}

void putforks() {
    sem_post(&forks[left(p)]);
    sem_post(&forks[right(p)]);
}
```
Deadlock occurs: (2)...
1. If each philosopher happens to grab the fork on their left before any philosopher can grab the fork on their right 2. Each will be stuck holding one fork and waiting for another, forever
104
A Solution: Breaking The Dependency: Change how forks are acquired: • Let’s assume that philosopher 4 acquires the forks in a different order
```
void getforks() {
    if (p == 4) {
        sem_wait(&forks[right(p)]);
        sem_wait(&forks[left(p)]);
    } else {
        sem_wait(&forks[left(p)]);
        sem_wait(&forks[right(p)]);
    }
}
```
(2)...
* There is no situation where each philosopher grabs one fork and is stuck waiting for another * The cycle of waiting is broken
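A runnable sketch of the full solution (POSIX semaphores assumed; the meal count and the name `run_philosophers` are illustrative). It completes only because philosopher 4 breaks the cycle of waiting:

```c
#include <pthread.h>
#include <semaphore.h>

#define N 5
#define MEALS 100

static sem_t forks[N];
static int eaten[N];

static int left(int p)  { return p; }
static int right(int p) { return (p + 1) % N; }

static void getforks(int p) {
    if (p == N - 1) {                 // philosopher 4: right fork first
        sem_wait(&forks[right(p)]);
        sem_wait(&forks[left(p)]);
    } else {                          // everyone else: left fork first
        sem_wait(&forks[left(p)]);
        sem_wait(&forks[right(p)]);
    }
}

static void putforks(int p) {
    sem_post(&forks[left(p)]);
    sem_post(&forks[right(p)]);
}

static void *philosopher(void *arg) {
    int p = (int)(long)arg;
    for (int i = 0; i < MEALS; i++) {
        getforks(p);
        eaten[p]++;                   // "eat" while holding both forks
        putforks(p);
    }
    return NULL;
}

// Returns total meals eaten; only returns if no deadlock occurs.
int run_philosophers(void) {
    pthread_t t[N];
    for (int i = 0; i < N; i++) sem_init(&forks[i], 0, 1);
    for (int i = 0; i < N; i++)
        pthread_create(&t[i], NULL, philosopher, (void *)(long)i);
    for (int i = 0; i < N; i++) pthread_join(t[i], NULL);
    int total = 0;
    for (int i = 0; i < N; i++) total += eaten[i];
    return total;
}
```

With the naive all-left-first acquisition this program could hang forever; with the asymmetric order it always finishes.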
105
Thread throttling: | Used to prevent __.
“too many” threads from doing something all at once
106
Thread throttling: | Limit __.
the number of concurrent threads with a threshold semaphore • Throttling, a form of admission control --- Example: • Hundreds of threads solving a parallel problem • One area of the code is memory-intensive • If all threads are allowed into this area, machine will start swapping and thrashing Solution: • Add a semaphore initialized to the maximum number of threads allowed in the memory-intensive area • Put a sem_wait() and sem_post() around the memory-intensive area
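The throttling pattern can be sketched as follows (POSIX semaphores assumed; the threshold, thread count, and the instrumentation that records peak concurrency are illustrative additions, not part of the card):

```c
#include <pthread.h>
#include <semaphore.h>

#define NTHREADS 10
#define THRESHOLD 3   // at most 3 threads in the memory-intensive region

static sem_t throttle;
static pthread_mutex_t statlock = PTHREAD_MUTEX_INITIALIZER;
static int inside = 0, max_inside = 0;

static void *throttled_worker(void *arg) {
    (void)arg;
    sem_wait(&throttle);               // admission control: wait for a slot
    pthread_mutex_lock(&statlock);
    if (++inside > max_inside) max_inside = inside;   // record peak
    pthread_mutex_unlock(&statlock);

    /* ... memory-intensive work would go here ... */

    pthread_mutex_lock(&statlock);
    inside--;
    pthread_mutex_unlock(&statlock);
    sem_post(&throttle);               // free the slot for another thread
    return NULL;
}

// Returns the observed maximum concurrency inside the throttled region.
int run_throttle_demo(void) {
    pthread_t t[NTHREADS];
    sem_init(&throttle, 0, THRESHOLD); // init to max threads allowed inside
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, throttled_worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return max_inside;
}
```

However the threads interleave, the semaphore guarantees that no more than THRESHOLD of them are ever in the region at once.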
107
Build our own version of semaphores called Zemaphores: | ...
Doesn't maintain the invariant that a negative value is a count of threads waiting on the semaphore • The value is never lower than zero • This behavior is easier to implement and matches the current Linux implementation
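One way to build the Zemaphore from a lock and a condition variable (a sketch in the usual pthreads style; note the `while` loop and the fact that `value` is only decremented once it is positive, so it never goes below zero):

```c
#include <pthread.h>

typedef struct {
    int value;
    pthread_cond_t cond;
    pthread_mutex_t lock;
} Zem_t;

void Zem_init(Zem_t *s, int value) {
    s->value = value;
    pthread_cond_init(&s->cond, NULL);
    pthread_mutex_init(&s->lock, NULL);
}

void Zem_wait(Zem_t *s) {
    pthread_mutex_lock(&s->lock);
    while (s->value <= 0)                  // sleep until value is positive;
        pthread_cond_wait(&s->cond, &s->lock);  // value never goes negative
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

void Zem_post(Zem_t *s) {
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->cond);         // wake one waiter, if any
    pthread_mutex_unlock(&s->lock);
}
```

Because waiters block in the condition variable rather than driving `value` negative, the value no longer counts waiting threads, matching the card's description.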
108
We need to synchronize for correctness: | (2)...
* Unsynchronized code can cause incorrect behavior | * But too much synchronization means threads spend a lot of time waiting, not performing useful work
109
Getting synchronization right is hard: | (2)...
* Testing isn’t enough | * Need to assume worst case: all interleavings are possible
110
How to choose between locks, semaphores and condition variables?:
1. Locks are very simple and suitable for many cases • Issues: Maybe not the most efficient solution • E.g., can’t allow multiple readers but one writer inside a standard lock 2. Condition variables allow threads to sleep until an event occurs • Just remember the state of the world might have changed since the signal was called 3. Semaphores provide pretty general functionality • But can be tricky to get correct
111
Common Concurrency Problems: 1. We’ve briefly talked about deadlock (2) ...
* Lots of early research focused on this | * We’ll dive in a bit more deeply today
112
What Types Of Bugs Exist?: Focus on four major open-source applications (4)...
MySQL, Apache, Mozilla, OpenOffice
113
Non-deadlock bugs make up __.
a majority of concurrency bugs
114
Two major types of non-deadlock bugs: | (2)...
* Atomicity violation | * Order violation
115
Atomicity-Violation Bugs: The desired serializability among multiple memory accesses is __. Solution: __.
violated; Simply add locks around the shared-variable references --- • Simple Example found in MySQL: • Two different threads access the field proc_info in the struct thd
116
Order-Violation Bugs: The desired order between two memory accesses is __. Solution: __.
flipped; enforce ordering using condition variables --- • I.e., A should always be executed before B, but the order is not enforced during execution • Example: - The code in Thread2 seems to assume that the variable mThread has already been initialized (and is not NULL)
```
Thread1::
void init() {
    mThread = PR_CreateThread(mMain, …);
}

Thread2::
void mMain(…) {
    mState = mThread->State;
}
```
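A sketch of the condition-variable fix in plain pthreads (the original example uses NSPR's PR_CreateThread; the pthreads adaptation, the `mtInit` flag, and the name `run_order_demo` are assumptions for illustration). Thread 2 waits until Thread 1 has performed the initialization:

```c
#include <pthread.h>
#include <stddef.h>

// Shared state mirroring the card's example: the worker must not
// read mThread before init() has set it.
static pthread_t mThread;
static int mtInit = 0;   // has mThread been initialized?
static pthread_mutex_t mtLock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  mtCond = PTHREAD_COND_INITIALIZER;
static int mState = 0;

static void *mMain(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mtLock);
    while (mtInit == 0)                // enforce the order: wait for init()
        pthread_cond_wait(&mtCond, &mtLock);
    pthread_mutex_unlock(&mtLock);
    mState = 1;                        // safe: mThread is initialized now
    return NULL;
}

// Starts the worker, then performs the initialization it depends on.
// Returns the state the worker observed.
int run_order_demo(void) {
    pthread_t t;
    pthread_create(&t, NULL, mMain, NULL);
    pthread_mutex_lock(&mtLock);
    mThread = pthread_self();          // the write that must happen first
    mtInit = 1;
    pthread_cond_signal(&mtCond);
    pthread_mutex_unlock(&mtLock);
    pthread_join(t, NULL);
    return mState;
}
```

Even if the worker thread runs first, it sleeps on the condition variable until the initialization has happened, so the A-before-B order always holds.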
117
Deadlock Bugs: The presence of a cycle in a resource-allocation graph: (2)...
* Thread1 is holding a lock L1 and waiting for another one, L2. * Thread2 that holds lock L2 is waiting for L1 to be released.
118
Why Do Deadlocks Occur? | (2)...
1. Reason 1: • In large code bases, complex dependencies arise between components 2. Reason 2: • Due to the nature of encapsulation - Hide details of implementations and make software easier to build in a modular way - Such modularity does not mesh well with locking
119
Why Do Deadlocks Occur? Example: Java Vector class and the method AddAll():
```
Vector v1, v2;
v1.AddAll(v2);
```
Locks for both the vector being added to (v1) and the parameter (v2) need to be acquired: (2)...
* The routine acquires said locks in some arbitrary order (v1 then v2) * If some other thread calls v2.AddAll(v1) at nearly the same time → We have the potential for deadlock
120
Conditional for Deadlock: Four conditions need to hold for a deadlock to occur: (4)...
1. Mutual Exclusion: Threads claim exclusive control of resources that they require. 2. Hold-and-wait: Threads hold resources allocated to them while waiting for additional resources. 3. No preemption: Resources cannot be forcibly removed from threads that are holding them. 4. Circular wait: There exists a circular chain of threads such that each thread holds one or more resources that are being requested by the next thread in the chain. If any of these four conditions are not met, deadlock cannot occur.
121
Deadlock vs. Starvation: | (3)...
• Deadlock: A circular waiting for resources • Starvation: A thread never makes progress because other threads are using resources it needs • Starvation != Deadlock - Deadlock can be seen as a special case of starvation
122
Methods for Handling Deadlocks: Ensure that the system will never enter a deadlock state: (2)...
* Deadlock prevention – deadlock is not possible in the system * Deadlock avoidance – prevent a particular instance of deadlock from happening
123
Allow the system to enter a deadlock state and then __.
recover
124
Methods for Handling Deadlocks: • Ensure that the system will never enter a deadlock state: - Deadlock prevention – deadlock is not possible in the system - Deadlock avoidance – prevent a particular instance of deadlock from happening • Allow the system to enter a deadlock state and then recover • Ignore the problem and __.
pretend that deadlocks never occur in the system | - Used by most operating systems, including UNIX
125
Deadlock Prevention: | Restrain the ways requests can be made to make __.
at least one of the four deadlock conditions does not hold
126
Mutual Exclusion – not required for __.
sharable resources (e.g., read-only files); must hold for non-sharable resources
127
Hold and Wait – must guarantee that __.
whenever a process requests a resource, it does not hold any other resources
128
Hold and Wait – must guarantee that whenever a process requests a resource, it does not hold any other resources (2)...
* Require process to request and be allocated all its resources before it begins execution, or allow process to request resources only when the process has none allocated to it. * Low resource utilization; starvation possible
129
No Preemption: | (3)...
* If a process that is holding some resources requests another resource that cannot be immediately allocated to it, then all resources currently being held are released * Preempted resources are added to the list of resources for which the process is waiting * Process will be restarted only when it can regain its old resources, as well as the new ones that it is requesting
130
Circular Wait
impose a total ordering of all resource types, and require that each process requests resources in an increasing order of enumeration
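A sketch of the total-ordering idea for locks (pthreads assumed; ordering by address is a common convention, and the helper names `lock_pair`/`unlock_pair`/`same_first_lock` are illustrative). Every thread acquires any pair of locks in the same global order, so no cycle of waiting can form:

```c
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;

// Acquire two mutexes in a fixed total order (here: by address).
// Returns the mutex that was locked first, for illustration.
pthread_mutex_t *lock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    pthread_mutex_t *first  = ((uintptr_t)a < (uintptr_t)b) ? a : b;
    pthread_mutex_t *second = ((uintptr_t)a < (uintptr_t)b) ? b : a;
    pthread_mutex_lock(first);
    pthread_mutex_lock(second);
    return first;
}

void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

// Returns 1 if both argument orders acquire the same mutex first,
// i.e., the ordering is independent of how callers pass the locks.
int same_first_lock(void) {
    pthread_mutex_t *f1 = lock_pair(&A, &B);
    unlock_pair(&A, &B);
    pthread_mutex_t *f2 = lock_pair(&B, &A);
    unlock_pair(&A, &B);
    return f1 == f2;
}
```

This is exactly what the Vector AddAll() example above lacks: `v1.AddAll(v2)` and `v2.AddAll(v1)` acquire the same two locks in opposite orders.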
131
Deadlock Avoidance: | Requires that the system has some additional __.
a priori information available
132
Deadlock Avoidance: | Simplest and most useful model requires that each process declares __.
the maximum number of resources of each type that it may need
133
The deadlock-avoidance algorithm dynamically examines the __.
resource-allocation state to ensure that there can never be a circular-wait condition
134
Resource-allocation state is defined by __.
the number of available and allocated resources, and the maximum demands of the processes
135
Deadlock Avoidance: | (4)...
1. Requires that the system has some additional a priori information available 2. Simplest and most useful model requires that each process declare the maximum number of resources of each type that it may need 3. The deadlock-avoidance algorithm dynamically examines the resource-allocation state to ensure that there can never be a circular-wait condition 4. Resource-allocation state is defined by the number of available and allocated resources, and the maximum demands of the processes
136
When a process requests an available resource, system must decide if __.
immediate allocation leaves the system in a safe state
137
System is in safe state if there exists a __.
safe sequence <P1, P2, …, Pn> of ALL the processes in the system such that for each Pi, the resources that Pi can still request can be satisfied by currently available resources + resources held by all the Pj, with j < i --- That is: • If Pi's resource needs are not immediately available, then Pi can wait until all Pj have finished • When Pj is finished, Pi can obtain needed resources, execute, return allocated resources, and terminate • When Pi terminates, Pi+1 can obtain its needed resources, and so on
138
Basic Facts: • If a system is in safe state ⇒ __ • If a system is in unsafe state ⇒ __ • Avoidance ⇒ __
no deadlocks; possibility of deadlock; ensure that a system will never enter an unsafe state.
139
Avoidance Algorithms: Single instance of a resource type (2)...
* Use a resource-allocation graph | * Check for cycles
140
Avoidance Algorithms: Multiple instances of a resource type: Use the __.
banker's algorithm
141
Resource-Allocation Graph: • We have a set of vertices V and a set of edges E • V is partitioned into two types: (2)... • request edge – __ • assignment edge – __
- P = {P1 , P2 , …, Pn }, the set consisting of all the processes in the system - R = {R1 , R2 , …, Rm}, the set consisting of all resource types in the system; directed edge Pi → Rj; directed edge Rj → Pi
142
Basic Facts: • If graph contains no cycles ⇒ __ • If graph contains a cycle ⇒ (2)...
no deadlock; - if only one instance per resource type, then deadlock - if several instances per resource type, possibility of deadlock
143
Avoidance Algorithms: • Single instance of a resource type: (2)... • Multiple instances of a resource type: (1)...
- Use a resource-allocation graph - Check for cycles; - Use the banker's algorithm
144
Banker's Algorithm: | (4)...
* Have multiple instances of resources * Each process must a priori claim maximum resource use (not to exceed total resources in the system) * When a process requests a resource it may have to wait * When a process gets all its resources it must return them in a finite amount of time
145
Data Structures for the Banker’s Algorithm: Let n = number of processes, and m = number of resources 1. Available 2. Max 3. Allocation 4. Need
n = number of processes, and m = number of resources • Available – Resources currently available in the system - Vector of length m - If Available [j] = k, there are k instances of resource type Rj available • Max – Maximum resources processes may request in the system - n x m matrix - If Max [i,j] = k, then process Pi may request at most k instances of resource type Rj • Allocation – Resources allocated to processes - n x m matrix - If Allocation[i,j] = k then Pi is currently allocated k instances of Rj • Need – Resources currently needed by processes - n x m matrix - If Need[i,j] = k, then Pi may need k more instances of Rj to complete its task - Need [i,j] = Max[i,j] – Allocation [i,j]
146
Safety Algorithm: | (4)...
```
1. Let Work and Finish be vectors of length m and n, respectively.
   Initialize:
   Work = Available
   Finish[i] = false for i = 0, 1, …, n-1
2. Find an i such that both:
   (a) Finish[i] = false
   (b) Needi ≤ Work
   If no such i exists, go to step 4
3. Work = Work + Allocationi
   Finish[i] = true
   Go to step 2
4. If Finish[i] == true for all i, then the system is in a safe state
```
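The safety algorithm translates directly into C. This sketch fixes the process and resource counts as compile-time constants and bakes in the classic 5-process/3-resource example state for the demo wrappers (the dataset and function names are illustrative, not from the card):

```c
#define NPROC 5
#define NRES  3

// Safety algorithm: returns 1 if the state is safe, 0 otherwise.
int is_safe(int available[NRES],
            int allocation[NPROC][NRES],
            int need[NPROC][NRES]) {
    int work[NRES], finish[NPROC] = {0};
    for (int j = 0; j < NRES; j++)
        work[j] = available[j];                  // Work = Available
    for (;;) {
        int progressed = 0;
        for (int i = 0; i < NPROC; i++) {        // find i: !Finish[i], Need_i <= Work
            if (finish[i]) continue;
            int ok = 1;
            for (int j = 0; j < NRES; j++)
                if (need[i][j] > work[j]) { ok = 0; break; }
            if (ok) {
                for (int j = 0; j < NRES; j++)
                    work[j] += allocation[i][j]; // P_i finishes, releases resources
                finish[i] = 1;
                progressed = 1;
            }
        }
        if (!progressed) break;                  // no such i exists
    }
    for (int i = 0; i < NPROC; i++)
        if (!finish[i]) return 0;                // some process can never finish
    return 1;                                    // all finished: safe state
}

// Classic textbook state: safe (a safe sequence such as P1,P3,P0,P2,P4 exists).
int demo_safe_state(void) {
    int avail[NRES] = {3, 3, 2};
    int alloc[NPROC][NRES] = {{0,1,0},{2,0,0},{3,0,2},{2,1,1},{0,0,2}};
    int need[NPROC][NRES]  = {{7,4,3},{1,2,2},{6,0,0},{0,1,1},{4,3,1}};
    return is_safe(avail, alloc, need);
}

// Same allocations but nothing available: no process can proceed.
int demo_unsafe_state(void) {
    int avail[NRES] = {0, 0, 0};
    int alloc[NPROC][NRES] = {{0,1,0},{2,0,0},{3,0,2},{2,1,1},{0,0,2}};
    int need[NPROC][NRES]  = {{7,4,3},{1,2,2},{6,0,0},{0,1,1},{4,3,1}};
    return is_safe(avail, alloc, need);
}
```

The outer loop runs at most n times and each pass scans n processes over m resources, giving the usual O(m × n²) bound for this algorithm.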
147
Resource-Request Algorithm for Process Pi: Requesti = request vector for process Pi If Requesti[j] = k then process Pi wants k instances of resource type Rj: (3)...
1. If Requesti ≤ Needi go to step 2. Otherwise, raise error condition, since process has exceeded its maximum claim 2. If Requesti ≤ Available, go to step 3. Otherwise, Pi must wait since resources are not available 3. Pretend to allocate requested resources to Pi by modifying the state as follows: Available = Available – Requesti; Allocationi = Allocationi + Requesti; Needi = Needi – Requesti; • If safe ⇒ the resources are allocated to Pi • If unsafe ⇒ Pi must wait, and the old resource-allocation state is restored
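The three steps can be sketched for a single process's row of the state (a simplification: the safety check is passed in as a callback, standing in for the full safety algorithm over the tentative state; passing NULL treats the tentative state as safe. Names and the NRES constant are illustrative):

```c
#include <stddef.h>

#define NRES 3

// Resource-request algorithm for one process i.
// Returns -1 on error (request exceeds maximum claim),
//          0 if the process must wait,
//          1 if the request is granted.
int request_resources(int request[NRES], int need_i[NRES],
                      int available[NRES], int alloc_i[NRES],
                      int (*safe)(void)) {
    for (int j = 0; j < NRES; j++)                // step 1: Request_i <= Need_i?
        if (request[j] > need_i[j]) return -1;    // exceeded maximum claim
    for (int j = 0; j < NRES; j++)                // step 2: Request_i <= Available?
        if (request[j] > available[j]) return 0;  // must wait
    for (int j = 0; j < NRES; j++) {              // step 3: pretend to allocate
        available[j] -= request[j];
        alloc_i[j]   += request[j];
        need_i[j]    -= request[j];
    }
    if (safe == NULL || safe()) return 1;         // safe: grant the request
    for (int j = 0; j < NRES; j++) {              // unsafe: restore old state
        available[j] += request[j];
        alloc_i[j]   -= request[j];
        need_i[j]    += request[j];
    }
    return 0;                                     // must wait
}
```

The tentative-allocate-then-check-then-maybe-rollback shape is the essence of avoidance: the request is granted only if the resulting state would still be safe.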
148
Deadlock Detection: | (3)...
• Allow system to enter deadlock state • Detect that deadlock has occurred - Detection algorithm • Recover from deadlock - Recovery scheme
149
Single Instance of Each Resource Type: Maintain wait-for graph: (2)...
* Nodes are processes | * Pi → Pj if Pi is waiting for Pj
150
Single Instance of Each Resource Type: | Periodically invoke an algorithm that searches for...
a cycle in the graph | • If there is a cycle, deadlock exists
151
Single Instance of Each Resource Type: | • An algorithm to detect a cycle in a graph requires an order of __ operations, where n is __.
n^2; | the number of vertices in the graph
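Cycle detection in a wait-for graph can be sketched with a depth-first search over an adjacency matrix (the node count and function names are illustrative; DFS is O(V + E), which is within the O(n²) bound stated above since E ≤ n²):

```c
#define NP 5   // number of processes (graph nodes)

// waitfor[i][j] == 1 means process i is waiting for process j.
// color: 0 = unvisited, 1 = on the current DFS path, 2 = fully explored.
static int dfs(int waitfor[NP][NP], int u, int color[NP]) {
    color[u] = 1;
    for (int v = 0; v < NP; v++) {
        if (!waitfor[u][v]) continue;
        if (color[v] == 1) return 1;   // back edge: cycle found -> deadlock
        if (color[v] == 0 && dfs(waitfor, v, color)) return 1;
    }
    color[u] = 2;
    return 0;
}

// Returns 1 if the wait-for graph contains a cycle (deadlock), else 0.
int has_deadlock(int waitfor[NP][NP]) {
    int color[NP] = {0};
    for (int i = 0; i < NP; i++)
        if (color[i] == 0 && dfs(waitfor, i, color))
            return 1;
    return 0;
}
```

An edge back to a node still on the current search path is exactly a cycle of processes each waiting on the next.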
152
Recovery from Deadlock: Process Termination: | (2)...
* Abort all deadlocked processes | * Abort one process at a time until the deadlock cycle is eliminated
153
Recovery from Deadlock: Resource Preemption: | (3)...
* Selecting a victim – minimize cost * Rollback – return to some safe state, restart process for that state * Starvation – same process may always be picked as victim, include number of rollbacks in cost factor