Lecture 1 Flashcards

1
Q

concurrency

A
  1. The computer does many things at once, in software or hardware.
  2. This may be due to actual parallelism in hardware, or work distributed across multiple machines,
  3. or an illusion of parallelism produced by rapid switching.
2
Q

Concerns common to concurrent and distributed systems

A
  1. Scalability (more cores => more performance?)
  2. Masking latency (do something while waiting for something else)
  3. Ordering (execution is nondeterministic; how can we restore a notion of order?)
  4. Correctness (Heisenbugs)
3
Q

Differences between concurrent and distributed systems

A
  • Concurrency assumes shared memory accessed by multiple threads.
  • Distributed systems use message-passing concurrency.
  • Distributed systems must cope with the possibility of failure.
  • Distributed systems have unbounded communication latency.
  • Distributed systems face the problem of distributed time, not just ordering.
4
Q

Example of illusory parallelism/time slicing

A
The illusion of parallelism: a CPU switching rapidly back and forth between different tasks; time-sharing in processors.
5
Q

Example of true concurrency

A

Multiple cores executing simultaneously; DMA (direct memory access) running alongside the CPU; higher throughput.

6
Q

Process

A

An instance of a program in execution, running on top of the OS, which isolates processes and schedules them.
- Unit of resource allocation (memory, CPU time via scheduling) and of protection (isolated address spaces, e.g. distinct global variables even when running the same binary).

7
Q

Thread

A
  1. Runs code within a process.
  2. Multiple threads per process.
  3. The scheduler decides which thread to run, on which core, and for how long.
  4. An individual execution context; encapsulates one thread of execution.
  5. Independent stack and instruction stream, but access to shared state, since it runs within the shared process address space.
  6. The OS sees threads as TCBs (thread control blocks: register-file context, stack pointer, scheduler info).
  7. A thread can trap to the OS and start executing kernel code; it is then viewed as a kernel thread (the kernel running code on behalf of a user-space process).
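A minimal sketch of these points in practice, assuming POSIX threads (pthread_create/pthread_join; compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    /* Each thread gets its own stack, but runs within the
       shared process address space. */
    static void *worker(void *arg) {
        long id = (long)arg;
        printf("thread %ld running\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        /* Two threads in one process; the scheduler decides which
           runs, on which core, and for how long. */
        pthread_create(&t1, NULL, worker, (void *)1L);
        pthread_create(&t2, NULL, worker, (void *)2L);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }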
8
Q

Context switch

A
  • The CPU changes from executing one thread to another.
  • Swap register files, update TCBs.
  • An inter-process context switch additionally requires swapping process-specific state, e.g. the PTBR (page-table base register).
9
Q

Single CPU concurrency

A
  • Interleaving / time slicing / round robin
  • Asynchronous interrupts
  • Synchronous interrupts
  • Process/OS interleaving via the syscall interface
  • Inter-process: round robin between processes
  • Intra-process: time slicing of threads within a process
  • On a system-on-chip: further asynchronous interrupt sources (e.g. an accelerometer)
10
Q

Asynchronous interrupt

A

An interrupt due to an asynchronous external event: a timer, IO, a network timeout triggering retransmission. The running process may be unrelated to the interrupt that has just arrived, so the OS must asynchronously context switch to unrelated code.
-> introduces nondeterminism

11
Q

Synchronous interrupt

A

A synchronous interrupt happens immediately, as a result of an action taken by the process (syscall, page fault).

12
Q

Multiple processors

A
  • Multiple cores executing in parallel
  • Cores switching between processes
  • The OS running on multiple CPUs
  • Higher throughput, if the program is written to exploit parallelism.
13
Q

Sources of concurrency in a C program

A
  • A thread performs a nondeterministic action (e.g. a random sleep)
  • Thread start times
  • Joining times
  • Waking order under unprioritised scheduling
  • printf is not atomic
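A sketch showing several of these sources at once, assuming pthreads: start order, waking order and output interleaving all vary from run to run:

    #include <pthread.h>
    #include <stdio.h>

    static void *hello(void *arg) {
        /* Start times and waking order are up to the scheduler,
           so the print order differs between runs. */
        printf("hello from thread %ld\n", (long)arg);
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (long i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, hello, (void *)i);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        return 0;
    }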
14
Q

Thread process context

A

The address space contains code, heap, and stacks (one per thread). It is possible to index into another thread's stack, but unsafe (its contents change nondeterministically).
Implicit sharing of global variables.
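A sketch of the implicit sharing, assuming pthreads: the global has one copy visible to both threads, while each local lives on its own thread's stack:

    #include <pthread.h>
    #include <stdio.h>

    int shared = 0;                /* one copy, implicitly shared by all threads */

    static void *bump(void *arg) {
        (void)arg;
        int local = 0;             /* on this thread's own stack */
        local++;
        shared++;                  /* unsynchronised access: a race, see later cards */
        printf("local=%d shared=%d\n", local, shared);
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }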

15
Q

1:N threading in 1990s

A

Many workstations accessing services on one server.
Want a thread per workstation, acting on behalf of that workstation on the server.
OSes did not provide threads, only processes.
The concept of threads is built on top of the process scheduling provided by the OS.
From the OS perspective, every process has a single thread and a single register file.
In userspace, a library provides the abstraction of multiple threads: it creates additional threads and switches between them on demand. The JVM used this where the OS didn't support threads: an illusion of threads that actually share the same execution context.
(The model of choice for major OSes at the time.)

16
Q

Advantages of 1:N threading

A

Threads on any OS, no matter what the kernel provides.
Cheap context switches between threads: no need to enter the kernel, just move data around in user space.
Tunable: e.g. use your own scheduling policy.

17
Q

Disadvantages of 1:N threading

A

If one thread in a user process performs a blocking system call (e.g. load data from disk and wait until it's done), then, since the process has only one thread in the kernel, the whole process is blocked waiting for it to return. One thread blocking in the kernel causes all threads of that process to block (could be worked around with non-blocking IO, but fundamentally awkward behaviour).
Because each process has exactly one kernel thread, its threads cannot be scheduled on multiple cores at the same time, so it cannot benefit from parallelism. This was acceptable given the few processors per machine in the 1990s.

18
Q

Contemporary OS – 1:1 kernel level threading

A

Every thread in a process corresponds to a kernel thread.
A process has multiple threads; these are scheduled by the kernel, which services system calls on behalf of threads, sleeps and wakes threads, etc.
The kernel provides primitives that can be used by the user-space process. By default a process has one thread, but it can create more via system calls.
The kernel implements the threads: context switching between threads, and thread scheduling over multiple cores.
Userspace no longer needs a complex scheduler: every time the process wants another thread, it just asks the kernel to create one.

19
Q

Advantages of 1:1 threading

A

Threads can preempt each other; one thread can perform blocking IO while the other threads continue running; threads can be scheduled on multiple cores.

20
Q

Disadvantages of 1:1 threading

A

Higher overhead in switching between threads: entering the kernel (a trap) is expensive in hardware terms (the need to support traps perturbs the microarchitecture) and looks like a very large number of cycles from a software perspective. It would be faster just to swap out the registers in user space (roughly 32 instructions) than to take the equivalent of 4000 instructions for the kernel to perform the context switch; for programs that context switch a lot, this would give faster-running programs.
Less portable: not the API and primitives, but the scheduling policies, which differ between operating systems. Threads get ordered in different ways by different OSes, which may expose bugs on one OS and not another.
Less flexible, as a user-space library has less ability to affect the thread scheduling policy.

21
Q

M:N threading

A

The kernel schedules M threads from its perspective, while user space does its own scheduling of a different number, N, of threads: N user threads scheduled on top of M kernel threads.
Sometimes the kernel supports more threads than you need. Instead, the kernel exposes a smaller number M of activations, usually one per CPU/core (or perhaps: number of activations exposed = number of cores + number of threads blocked in the kernel).
The kernel keeps control over the number of cores available to execute.
The user library has its own scheduler and schedules the larger number N of threads onto the M available activations. It can do a lightweight context switch between threads without having to enter the kernel.
When a thread blocks, another activation becomes available (#activations = #cores + #blocked), and the kernel upcalls to ask the user-space process whether it wants to schedule another thread on the newly released activation.
When a thread wakes up, the kernel upcalls to notify user space, which then schedules it on an activation.
So there is message passing back and forth between user space and the kernel.
The kernel can mediate core allocation between different processes by limiting the number of activations it exposes to each user-space process.
Removed from most major OSes: too complex, with two different schedulers in different parts of the system and often poor communication between them.
But M:N scheduling is used in hypervisors and virtual machines. Consider the virtual CPUs running in a virtual machine as threads of the hypervisor, on which the guest OS layers its own abstraction of threading: the virtual CPUs offered to the guest OS are the M threads exposed by the "kernel" (the hypervisor), and the guest OS itself exposes N threads to user-space processes.
Question: could we reduce the complexity? Is this a good design decision?
User-space libraries for concurrent programming (e.g. Apple GCD, Intel's concurrency libraries) provide threads, but can also provide other abstractions, e.g. running functions concurrently. If the number of concurrently executing units exceeds the number of threads being scheduled, that is effectively M:N.
Takeaway: someone has to implement the concurrency; is it the threading library, the OS, or the language runtime?

22
Q

Advantages of concurrency

A

Overlap computation with slow IO (the historical motivation): allow computation (e.g. an animation on a website) to continue while more data arrives over the network.
Structured coding: if the application is doing concurrent things, provide language-level features to express the concurrency in the way the program is structured, rather than leaving it implicit.
You could write a concurrent program (file IO / network / redrawing a GUI) as one large while loop, performing each of the tasks in sequence repeatedly.
Alternatively, express the program as a set of threads which make the concurrency explicit; the threading library switches between the threads for you (both styles are sketched below).
Once the program is represented using threads, as the number of cores increases your declaration of concurrency turns into parallelism. In principle, this allows the program to scale out to a parallel system.
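A structural sketch of the two styles; poll_network and redraw_gui are hypothetical stand-ins for real work, not an actual API:

    #include <pthread.h>
    #include <stdio.h>

    /* Hypothetical tasks -- stand-ins for real work. */
    static void poll_network(void) { puts("network"); }
    static void redraw_gui(void)   { puts("gui"); }

    /* Style 1: one big loop doing each task in sequence;
       the concurrency is implicit in the interleaving. */
    static void event_loop(void) {
        for (int i = 0; i < 3; i++) { poll_network(); redraw_gui(); }
    }

    /* Style 2: one thread per activity; the concurrency is explicit,
       and on multiple cores it becomes real parallelism. */
    static void *net_thread(void *a) { (void)a; for (int i = 0; i < 3; i++) poll_network(); return NULL; }
    static void *gui_thread(void *a) { (void)a; for (int i = 0; i < 3; i++) redraw_gui(); return NULL; }

    int main(void) {
        event_loop();                    /* style 1 */
        pthread_t n, g;                  /* style 2 */
        pthread_create(&n, NULL, net_thread, NULL);
        pthread_create(&g, NULL, gui_thread, NULL);
        pthread_join(n, NULL);
        pthread_join(g, NULL);
        return 0;
    }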

23
Q

Debugging

A

Debugging concurrent code is hard:
Different machines, different timings, different race conditions.
Mostly correct code is worse than mostly wrong code.
Heisenbugs: inserting a print statement, or using a debugger, changes the timing and may cause the bug to disappear (example).

24
Q

Critical sections

A

Critical sections and mutual exclusion:
Most code can safely run concurrently/in parallel, but some code needs to be expressed as non-concurrent.
We label critical sections: pieces of code that must run as if on a non-concurrent system.
If one thread is in its critical section, any other thread has to wait for it to exit its critical section.
The more general idea is mutual exclusion: certain resources (code OR data) cannot be accessed concurrently.
A critical section is an instance of mutual exclusion for code: if one thread is executing in a critical section, all other threads are prohibited from entering it (see the mutex sketch below).
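A sketch of a critical section enforced with a pthreads mutex (a standard API, used here as one possible realisation): only one thread at a time executes between lock and unlock:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static long counter = 0;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&m);     /* enter critical section */
            counter++;                  /* runs as if non-concurrent */
            pthread_mutex_unlock(&m);   /* leave critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter);  /* always 200000 */
        return 0;
    }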

25
Q

Mutual exclusion

A

Achieving mutual exclusion:
only ever let one thread at a time execute a particular critical section.
But this comes at the cost of reduced concurrency, and hence reduced benefit from parallelism.

26
Q

Race condition

A

Race condition: two concurrent actors access the same shared resource and behave in an incorrect way as a result.
Multiple threads "race" with one another during conflicting accesses to a shared resource.
Example: a non-atomic read and set for marking a critical section as occupied (sketched below).
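A sketch of that example: the read of the flag and the set are separate steps, so two threads can both see the flag clear and both enter the "critical section":

    #include <pthread.h>
    #include <stdio.h>

    static volatile int flag = 0;   /* 0 = free, 1 = taken; volatile keeps the
                                       reads in the loop, but does NOT make
                                       the read-then-set atomic */
    static long counter = 0;

    static void *broken_worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            while (flag != 0)  /* READ: flag looks free...             */
                ;
            flag = 1;          /* SET: another thread may have slipped
                                  in between the read and this write   */
            counter++;         /* so exclusion is not guaranteed       */
            flag = 0;
        }
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, broken_worker, NULL);
        pthread_create(&b, NULL, broken_worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld (may be less than 200000)\n", counter);
        return 0;
    }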

27
Q

Atomicity (concurrency meaning)

A

Atomicity (the concurrency meaning, not the transactions meaning):
we want the checking and the conditional setting to be atomic, without any other thread being involved.
If two threads attempt a read and set concurrently, the instructions within the read-and-set operation must not be interleaved: they are strictly ordered.
We want the read and set to occur as one atomic operation, indivisible from the point of view of the program. This is sufficient as a primitive for building all of the other common synchronisation primitives.
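C11 provides such a primitive directly (assuming <stdatomic.h> is available): atomic_flag_test_and_set reads the old value and sets the flag as one indivisible operation:

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_flag flag = ATOMIC_FLAG_INIT;

    /* Returns true if we won: the flag was clear and we set it, with no
       other thread's instructions interleaved in between. */
    static bool try_acquire(void) {
        return !atomic_flag_test_and_set(&flag);
    }

    int main(void) {
        return try_acquire() ? 0 : 1;  /* first acquire always succeeds */
    }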

28
Q

Problems with spin lock

A

This is correct, but not perfect:
The lock could be held for a long time, preventing other accesses to the shared resource (the fridge) which would not have interfered with the work done in the critical section (e.g. getting the milk).
Waiting threads waste CPU time.
Contention occurs when consumers have to wait for locks: a lock is contended when one thread holds it while another waits.
The waiting thread burns CPU cycles (in this implementation).

29
Q

Mutual exclusion spin lock implementation

A

Implementing mutual exclusion:
On a single CPU, you could implement a critical section by disabling context switches on entry and re-enabling them on exit.
Not an efficient solution: it removes all concurrency, stopping all other threads (brute force), not just those wanting to enter the critical section.
Potentially unsafe: if you disable interrupts and then sleep waiting for a timer interrupt, you never wake.
Doesn't work across multiple CPUs (and removes too much concurrency).
OR
Associate a mutual exclusion lock with each critical section, e.g. a variable L.
Entering the CS = acquire the lock L associated with the critical section; leaving the CS = release L.
One very basic implementation (read_and_set atomically sets L and reports whether it succeeded):

    lock(L):
        while (not read_and_set(L))
            ; // do nothing: spin until the lock is acquired

    unlock(L):
        L = 0

This burns CPU cycles in a busy-waiting / polling loop while the lock is unavailable, but is sufficient to implement critical sections.

Mutual exclusion locks are more general: they can be associated with shared resources as well as with critical sections.
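Pulling this together, a minimal runnable sketch of the spin lock using C11 atomics, guarding a shared counter:

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    static atomic_flag L = ATOMIC_FLAG_INIT;
    static long counter = 0;

    static void lock(void) {
        while (atomic_flag_test_and_set(&L))  /* atomic read and set */
            ;                                 /* busy wait: burns CPU cycles */
    }

    static void unlock(void) {
        atomic_flag_clear(&L);                /* L = 0 */
    }

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            lock();
            counter++;       /* critical section */
            unlock();
        }
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter);   /* 200000: exclusion holds */
        return 0;
    }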