Chapter 4 Flashcards
A thread is a basic unit of CPU utilization
it comprises a thread ID, a program counter (PC), a register set, and a stack. It shares with other threads belonging to the same process its code section, data section, and other operating-system resources, such as open files and signals. A traditional process has a single thread of control. If a process has multiple threads of control, it can perform more than one task at a time. Figure 4.1 illustrates the difference between a traditional single-threaded process and a multithreaded process.
Most operating system kernels are also typically multithreaded
As an example, during system boot time on Linux systems, several kernel threads are created. Each thread performs a specific task, such as managing devices, managing memory, or handling interrupts. The command ps -ef can be used to display the kernel threads on a running Linux system. Examining the output of this command will show the kernel thread kthreadd (with pid = 2), which serves as the parent of all other kernel threads.
Many applications can also take advantage of multiple threads, including basic sorting, tree, and graph algorithms. In addition, programmers who must solve contemporary CPU-intensive problems in data mining, graphics, and artificial intelligence can leverage the power of modern multicore systems by designing solutions that run in parallel.
Benefits of multithreading
- Responsiveness. Multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performing a lengthy operation, thereby increasing responsiveness to the user.
This quality is especially useful in designing user interfaces. For instance, consider what happens when a user clicks a button that results in the performance of a time-consuming operation. A single-threaded application would be unresponsive to the user until the operation had been completed. In contrast, if the time-consuming operation is performed in a separate, asynchronous thread, the application remains responsive to the user.
- Resource sharing. Processes can share resources only through techniques such as shared memory and message passing. Such techniques must be explicitly arranged by the programmer. However, threads share the memory and the resources of the process to which they belong by default. The benefit of sharing code and data is that it allows an application to have several different threads of activity within the same address space.
- Economy. Allocating memory and resources for process creation is costly. Because threads share the resources of the process to which they belong, it is more economical to create and context-switch threads. Empirically gauging the difference in overhead can be difficult, but in general thread creation consumes less time and memory than process creation. Additionally, context switching is typically faster between threads than between processes.
- Scalability. The benefits of multithreading can be even greater in a multiprocessor architecture, where threads may be running in parallel on different processing cores. A single-threaded process can run on only one processor, regardless of how many are available. We explore this issue further in the following section.
Multicore Programming
A later, yet similar, trend in system design is to place multiple computing cores on a single processing chip where each core appears as a separate CPU to the operating system (Section 1.3.2). We refer to such systems as multicore, and multithreaded programming provides a mechanism for more efficient use of these multiple computing cores and improved concurrency. Consider an application with four threads. On a system with a single computing core, concurrency merely means that the execution of the threads will be interleaved over time (Figure 4.3), because the processing core is capable of executing only one thread at a time. On a system with multiple cores, however, concurrency means that some threads can run in parallel, because the system can assign a separate thread to each core (Figure 4.4).
Notice the distinction between concurrency and parallelism in this discussion. A concurrent system supports more than one task by allowing all the tasks to make progress. In contrast, a parallel system can perform more than one task simultaneously. Thus, it is possible to have concurrency without parallelism.
Before the advent of multiprocessor and multicore architectures, most computer systems had only a single processor, and CPU schedulers were designed to provide the illusion of parallelism by rapidly switching between processes, thereby allowing each process to make progress. Such processes were running concurrently, but not in parallel.
Concurrency vs Parallelism
Concurrency is about multiple tasks that start, run, and complete in overlapping time periods, in no specific order. Parallelism is about multiple tasks, or subtasks of the same task, that literally run at the same time on hardware with multiple computing resources, such as a multicore processor. As you can see, concurrency and parallelism are similar but not identical.
In general, five areas present challenges in programming for multicore systems:
- Identifying tasks. This involves examining applications to find areas that can be divided into separate, concurrent tasks. Ideally, tasks are independent of one another and thus can run in parallel on individual cores.
- Balance. While identifying tasks that can run in parallel, programmers must also ensure that the tasks perform equal work of equal value. In some instances, a certain task may not contribute as much value to the overall process as other tasks. Using a separate execution core to run that task may not be worth the cost.
- Data splitting. Just as applications are divided into separate tasks, the data accessed and manipulated by the tasks must be divided to run on separate cores.
- Data dependency. The data accessed by the tasks must be examined for dependencies between two or more tasks. When one task depends on data from another, programmers must ensure that the execution of the tasks is synchronized to accommodate the data dependency. We examine such strategies in Chapter 6.
- Testing and debugging. When a program is running in parallel on multiple cores, many different execution paths are possible. Testing and debugging such concurrent programs is inherently more difficult than testing and debugging single-threaded applications.
AMDAHL’S LAW
Amdahl’s Law is a formula that identifies potential performance gains from adding additional computing cores to an application that has both serial (nonparallel) and parallel components. If S is the portion of the application that must be performed serially on a system with N processing cores, the formula appears as follows:

speedup ≤ 1 / (S + (1 − S) / N)

As an example, assume we have an application that is 75 percent parallel and 25 percent serial. If we run this application on a system with two processing cores, we can get a speedup of 1.6 times. If we add two additional cores (for a total of four), the speedup is 2.28 times.
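To make the arithmetic concrete, here is a minimal C sketch of this calculation; the helper name amdahl_speedup is illustrative, not from the text.

#include <stdio.h>

/* Speedup predicted by Amdahl's Law: 1 / (S + (1 - S) / N),
   where S is the serial fraction and N is the number of cores. */
double amdahl_speedup(double s, int n) {
    return 1.0 / (s + (1.0 - s) / n);
}

int main(void) {
    /* The example from the text: 25 percent serial. */
    printf("2 cores: %.2f\n", amdahl_speedup(0.25, 2)); /* 1.60 */
    printf("4 cores: %.2f\n", amdahl_speedup(0.25, 4)); /* 2.29; the text rounds to 2.28 */
    return 0;
}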
Types of Parallelism
In general, there are two types of parallelism: data parallelism and task parallelism. Data parallelism focuses on distributing subsets of the same data across multiple computing cores and performing the same operation on each core. Consider, for example, summing the contents of an array of size N. On a single-core system, one thread would simply sum the elements [0] … [N − 1].
On a dual-core system, however, thread A, running on core 0, could sum the elements [0] …[N∕2 − 1] while thread B, running on core 1, could sum the elements [N∕2] …[N − 1]. The two threads would be running in parallel on separate computing cores.
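A rough sketch of this idea using the Pthreads API covered later in this chapter (the name sum_half, the partial array, and the sample data are illustrative assumptions; compile with gcc -pthread):

#include <pthread.h>
#include <stdio.h>

#define N 8

int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};
long partial[2]; /* one slot per thread, so no locking is needed */

/* Each thread applies the same operation (summing) to its half:
   thread 0 handles [0 .. N/2 - 1], thread 1 handles [N/2 .. N - 1]. */
void *sum_half(void *arg) {
    long id = (long)arg;
    int start = id * (N / 2);
    for (int i = start; i < start + N / 2; i++)
        partial[id] += data[i];
    pthread_exit(0);
}

int main(void) {
    pthread_t tid[2];
    for (long i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, sum_half, (void *)i);
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    printf("sum = %ld\n", partial[0] + partial[1]);
    return 0;
}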
Task parallelism involves distributing not data but tasks (threads) across multiple computing cores. Each thread is performing a unique operation. Different threads may be operating on the same data, or they may be operating on different data. Consider again our example above. In contrast to that situation, an example of task parallelism might involve two threads, each performing a unique statistical operation on the array of elements. The threads again are operating in parallel on separate computing cores, but each is performing a unique operation.
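A comparable sketch of task parallelism, again with illustrative names (do_sum and do_max are assumptions): the two threads share the same array but each performs a different operation on it.

#include <pthread.h>
#include <stdio.h>

#define N 8

int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};
long total;  /* written only by the summing thread */
int maximum; /* written only by the max-finding thread */

void *do_sum(void *arg) {
    for (int i = 0; i < N; i++)
        total += data[i];
    pthread_exit(0);
}

void *do_max(void *arg) {
    maximum = data[0];
    for (int i = 1; i < N; i++)
        if (data[i] > maximum)
            maximum = data[i];
    pthread_exit(0);
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, do_sum, NULL);
    pthread_create(&t2, NULL, do_max, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("sum = %ld, max = %d\n", total, maximum);
    return 0;
}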
Fundamentally, data parallelism involves the distribution of data across multiple cores, and task parallelism involves the distribution of tasks across multiple cores, as shown in Figure 4.5. However, data and task parallelism are not mutually exclusive, and an application may in fact use a hybrid of these two strategies.
Kernel vs user threads
Support for threads may be provided either at the user level, for user threads, or by the kernel, for kernel threads. User threads are supported above the kernel and are managed without kernel support, whereas kernel threads are supported and managed directly by the operating system. Virtually all contemporary operating systems—including Windows, Linux, and macOS—support kernel threads.
Many-to-One Model
The many-to-one model (Figure 4.7) maps many user-level threads to one kernel thread. Thread management is done by the thread library in user space, so it is efficient (we discuss thread libraries in Section 4.4). However, the entire process will block if a thread makes a blocking system call. Also, because only one thread can access the kernel at a time, multiple threads are unable to run in parallel on multicore systems. Green threads—a thread library available for Solaris systems and adopted in early versions of Java—used the many-to-one model. However, very few systems continue to use the model because of its inability to take advantage of multiple processing cores, which have now become standard on most computer systems.
One-to-One Model
The one-to-one model (Figure 4.8) maps each user thread to a kernel thread. It provides more concurrency than the many-to-one model by allowing another thread to run when a thread makes a blocking system call. It also allows multiple threads to run in parallel on multiprocessors. The only drawback to this model is that creating a user thread requires creating the corresponding kernel thread, and a large number of kernel threads may burden the performance of a system. Linux, along with the family of Windows operating systems, implements the one-to-one model.
Many-to-Many Model
The many-to-many model (Figure 4.9) multiplexes many user-level threads to a smaller or equal number of kernel threads. The number of kernel threads may be specific to either a particular application or a particular machine (an application may be allocated more kernel threads on a system with eight processing cores than a system with four cores). Let’s consider the effect of this design on concurrency. Whereas the many-to-one model allows the developer to create as many user threads as she wishes, it does not result in parallelism, because the kernel can schedule only one kernel thread at a time. The one-to-one model allows greater concurrency, but the developer has to be careful not to create too many threads within an application. (In fact, on some systems, she may be limited in the number of threads she can create.) The many-to-many model suffers from neither of these shortcomings: developers can create as many user threads as necessary, and the corresponding kernel threads can run in parallel on a multiprocessor. Also, when a thread performs a blocking system call, the kernel can schedule another thread for execution.
One variation on the many-to-many model still multiplexes many user-level threads to a smaller or equal number of kernel threads but also allows a user-level thread to be bound to a kernel thread. This variation is sometimes referred to as the two-level model (Figure 4.10).
Although the many-to-many model appears to be the most flexible of the models discussed, in practice it is difficult to implement. In addition, with an increasing number of processing cores appearing on most systems, limiting the number of kernel threads has become less important. As a result, most operating systems now use the one-to-one model. However, as we shall see in Section 4.5, some contemporary concurrency libraries have developers identify tasks that are then mapped to threads using the many-to-many model.
Thread Libraries
A thread library provides the programmer with an API for creating and managing threads. There are two primary ways of implementing a thread library.
The first approach is to provide a library entirely in user space with no kernel support. All code and data structures for the library exist in user space. This means that invoking a function in the library results in a local function call in user space and not a system call.
The second approach is to implement a kernel-level library supported directly by the operating system. In this case, code and data structures for the library exist in kernel space. Invoking a function in the API for the library typically results in a system call to the kernel.
Three main thread libraries are in use today
POSIX Pthreads, Windows, and Java. Pthreads, the threads extension of the POSIX standard, may be provided as either a user-level or a kernel-level library. The Windows thread library is a kernel-level library available on Windows systems. The Java thread API allows threads to be created and managed directly in Java programs. However, because in most instances the JVM is running on top of a host operating system, the Java thread API is generally implemented using a thread library available on the host system. This means that on Windows systems, Java threads are typically implemented using the Windows API; UNIX, Linux, and macOS systems typically use Pthreads.
Two general strategies for creating multiple threads: asynchronous threading and synchronous threading.
With asynchronous threading, once the parent creates a child thread, the parent resumes its execution, so that the parent and child execute concurrently and independently of one another. Because the threads are independent, there is typically little data sharing between them. Asynchronous threading is the strategy used in the multithreaded server illustrated in Figure 4.2 and is also commonly used for designing responsive user interfaces.
Synchronous threading occurs when the parent thread creates one or more children and then must wait for all of its children to terminate before it resumes.
Here, the threads created by the parent perform work concurrently, but the parent cannot continue until this work has been completed. Once each thread has finished its work, it terminates and joins with its parent. Only after all of the children have joined can the parent resume execution. Typically, synchronous threading involves significant data sharing among threads. For example, the parent thread may combine the results calculated by its various children. All of the following examples use synchronous threading.
Pthreads
Pthreads refers to the POSIX standard (IEEE 1003.1c) defining an API for thread creation and synchronization. This is a specification for thread behavior, not an implementation. Operating-system designers may implement the specification in any way they wish. Numerous systems implement the Pthreads specification; most are UNIX-type systems, including Linux and macOS. Although Windows doesn’t support Pthreads natively, some third-party implementations for Windows are available.
#define NUM_THREADS 10

/* an array of threads to be joined upon */
pthread_t workers[NUM_THREADS];

for (int i = 0; i < NUM_THREADS; i++)
    pthread_join(workers[i], NULL);
The C program shown in Figure 4.11 demonstrates the basic Pthreads API for constructing a multithreaded program that calculates the summation of a non-negative integer in a separate thread. In a Pthreads program, separate threads begin execution in a specified function. In Figure 4.11, this is the runner() function. When this program begins, a single thread of control begins in main(). After some initialization, main() creates a second thread that begins control in the runner() function. Both threads share the global data sum.
Let’s look more closely at this program. All Pthreads programs must include the pthread.h header file. The statement pthread_t tid declares the identifier for the thread we will create. Each thread has a set of attributes, including stack size and scheduling information. The pthread_attr_t attr declaration represents the attributes for the thread. We set the attributes in the function call pthread_attr_init(&attr). Because we did not explicitly set any attributes, we use the default attributes provided. (In Chapter 5, we discuss some of the scheduling attributes provided by the Pthreads API.) A separate thread is created with the pthread_create() function call. In addition to passing the thread identifier and the attributes for the thread, we also pass the name of the function where the new thread will begin execution—in this case, the runner() function. Last, we pass the integer parameter that was provided on the command line, argv[1].
At this point, the program has two threads: the initial (or parent) thread in main() and the summation (or child) thread performing the summation operation in the runner() function. This program follows the thread create/join strategy, whereby after creating the summation thread, the parent thread will wait for it to terminate by calling the pthread_join() function. The summation thread will terminate when it calls the function pthread_exit(). Once the summation thread has returned, the parent thread will output the value of the shared data sum.
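Figure 4.11 itself is not reproduced in these notes; the following is a sketch consistent with the description above (the argument-count check and usage message are filled in as assumptions):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int sum; /* shared by the parent and child threads */

/* The child thread begins control in this function. */
void *runner(void *param) {
    int upper = atoi(param);
    sum = 0;
    for (int i = 1; i <= upper; i++)
        sum += i;
    pthread_exit(0);
}

int main(int argc, char *argv[]) {
    pthread_t tid;       /* the thread identifier */
    pthread_attr_t attr; /* the thread attributes */

    if (argc != 2) {
        fprintf(stderr, "usage: a.out <integer value>\n");
        return -1;
    }

    pthread_attr_init(&attr);                     /* use the default attributes */
    pthread_create(&tid, &attr, runner, argv[1]); /* create the summation thread */
    pthread_join(tid, NULL);                      /* wait for it to terminate */
    printf("sum = %d\n", sum);
    return 0;
}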
This example program creates only a single thread. With the growing dominance of multicore systems, writing programs containing several threads has become increasingly common. A simple method for waiting on several threads using the pthread_join() function is to enclose the operation within a simple for loop. For example, you can join on ten threads using the Pthreads code shown in Figure 4.12.