Chapter 4 Flashcards
A thread is a basic unit of CPU utilization
It comprises a thread ID, a program counter (PC), a register set, and a stack. It shares with other threads belonging to the same process its code section, data section, and other operating-system resources, such as open files and signals. A traditional process has a single thread of control. If a process has multiple threads of control, it can perform more than one task at a time. Figure 4.1 illustrates the difference between a traditional single-threaded process and a multithreaded process.
Most operating system kernels are also multithreaded
As an example, during system boot time on Linux systems, several kernel threads are created. Each thread performs a specific task, such as managing devices, managing memory, or handling interrupts. The command ps -ef can be used to display the kernel threads on a running Linux system. Examining the output of this command will show the kernel thread kthreadd (with pid = 2), which serves as the parent of all other kernel threads.
Many applications can also take advantage of multiple threads, including basic sorting, trees, and graph algorithms. In addition, programmers who must solve contemporary CPU-intensive problems in data mining, graphics, and artificial intelligence can leverage the power of modern multicore systems by designing solutions that run in parallel.
Benefits of multithreading
- Responsiveness. Multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performing a lengthy operation, thereby increasing responsiveness to the user. This quality is especially useful in designing user interfaces. For instance, consider what happens when a user clicks a button that results in the performance of a time-consuming operation. A single-threaded application would be unresponsive to the user until the operation had been completed. In contrast, if the time-consuming operation is performed in a separate, asynchronous thread, the application remains responsive to the user (see the sketch after this list).
- Resource sharing. Processes can share resources only through techniques such as shared memory and message passing. Such techniques must be explicitly arranged by the programmer. However, threads share the memory and the resources of the process to which they belong by default. The benefit of sharing code and data is that it allows an application to have several different threads of activity within the same address space.
- Economy. Allocating memory and resources for process creation is costly. Because threads share the resources of the process to which they belong, it is more economical to create and context-switch threads. Empirically gauging the difference in overhead can be difficult, but in general thread creation consumes less time and memory than process creation. Additionally, context switching is typically faster between threads than between processes.
- Scalability. The benefits of multithreading can be even greater in a multiprocessor architecture, where threads may be running in parallel on different processing cores. A single-threaded process can run on only one processor, regardless of how many are available. We explore this issue further in the following section.
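As a sketch of the responsiveness benefit (assumed code, not from the text), a lengthy operation can run in a background Java thread while the main thread keeps doing work:

public class Responsive {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(2000); /* stand-in for a time-consuming operation */
            } catch (InterruptedException ie) { }
            System.out.println("lengthy operation finished");
        });
        worker.start();
        /* the main thread remains responsive while the worker runs */
        for (int i = 0; i < 4; i++) {
            System.out.println("still responsive...");
            Thread.sleep(500);
        }
        worker.join();
    }
}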
Multicore Programming
A later, yet similar, trend in system design is to place multiple computing cores on a single processing chip, where each core appears as a separate CPU to the operating system (Section 1.3.2). We refer to such systems as multicore, and multithreaded programming provides a mechanism for more efficient use of these multiple computing cores and improved concurrency. Consider an application with four threads. On a system with a single computing core, concurrency merely means that the execution of the threads will be interleaved over time (Figure 4.3), because the processing core is capable of executing only one thread at a time. On a system with multiple cores, however, concurrency means that some threads can run in parallel, because the system can assign a separate thread to each core (Figure 4.4).
Notice the distinction between concurrency and parallelism in this discussion. A concurrent system supports more than one task by allowing all the tasks to make progress. In contrast, a parallel system can perform more than one task simultaneously. Thus, it is possible to have concurrency without parallelism.
Before the advent of multiprocessor and multicore architectures, most computer systems had only a single processor, and CPU schedulers were designed to provide the illusion of parallelism by rapidly switching between processes, thereby allowing each process to make progress. Such processes were running concurrently, but not in parallel.
Concurrency vs Parallelism
Concurrency is about multiple tasks that start, run, and complete in overlapping time periods, in no specific order. Parallelism is about multiple tasks, or subtasks of the same task, that literally run at the same time on hardware with multiple computing resources, such as a multicore processor. As you can see, concurrency and parallelism are similar but not identical.
In general, five areas present challenges in programming for multicore systems:
- Identifying tasks. This involves examining applications to find areas that can be divided into separate, concurrent tasks. Ideally, tasks are independent of one another and thus can run in parallel on individual cores.
- Balance. While identifying tasks that can run in parallel, programmers must also ensure that the tasks perform equal work of equal value. In some instances, a certain task may not contribute as much value to the overall process as other tasks. Using a separate execution core to run that task may not be worth the cost.
- Data splitting. Just as applications are divided into separate tasks, the data accessed and manipulated by the tasks must be divided to run on separate cores.
- Data dependency. The data accessed by the tasks must be examined for dependencies between two or more tasks. When one task depends on data from another, programmers must ensure that the execution of the tasks is synchronized to accommodate the data dependency. We examine such strategies in Chapter 6.
- Testing and debugging. When a program is running in parallel on multiple cores, many different execution paths are possible. Testing and debugging such concurrent programs is inherently more difficult than testing and debugging single-threaded applications.
AMDAHL’S LAW
Amdahl’s Law is a formula that identifies potential performance gains from adding additional computing cores to an application that has both serial (nonparallel) and parallel components. If S is the portion of the application that must be performed serially and N is the number of processing cores, the formula is: speedup ≤ 1 / (S + (1 − S) / N). As an example, assume we have an application that is 75 percent parallel and 25 percent serial. If we run this application on a system with two processing cores, we can get a speedup of 1.6 times. If we add two additional cores (for a total of four), the speedup is 2.28 times.
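To check these numbers, here is a minimal sketch (assumed code, not from the text) that evaluates the formula:

public class Amdahl {
    /* speedup <= 1 / (S + (1 - S) / N) */
    static double speedup(double s, int n) {
        return 1.0 / (s + (1.0 - s) / n);
    }

    public static void main(String[] args) {
        System.out.printf("N = 2: %.2f%n", speedup(0.25, 2)); /* prints 1.60 */
        System.out.printf("N = 4: %.2f%n", speedup(0.25, 4)); /* prints 2.29, ~2.28 in the text */
    }
}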
Types of Parallelism
In general, there are two types of parallelism: data parallelism and task parallelism. Data parallelism focuses on distributing subsets of the same data across multiple computing cores and performing the same operation on each core. Consider, for example, summing the contents of an array of size N. On a single-core system, one thread would simply sum the elements [0] … [N − 1]. On a dual-core system, however, thread A, running on core 0, could sum the elements [0] … [N/2 − 1] while thread B, running on core 1, could sum the elements [N/2] … [N − 1]. The two threads would be running in parallel on separate computing cores.
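A minimal sketch of this data-parallel sum (assumed code, not from the text):

public class DataParallelSum {
    public static void main(String[] args) throws InterruptedException {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++)
            data[i] = i + 1;

        long[] partial = new long[2];
        int mid = data.length / 2;

        /* thread A sums elements [0] ... [N/2 - 1] */
        Thread a = new Thread(() -> {
            for (int i = 0; i < mid; i++) partial[0] += data[i];
        });
        /* thread B sums elements [N/2] ... [N - 1] */
        Thread b = new Thread(() -> {
            for (int i = mid; i < data.length; i++) partial[1] += data[i];
        });

        a.start(); b.start();
        a.join(); b.join(); /* wait for both halves before combining */
        System.out.println("sum = " + (partial[0] + partial[1])); /* 5050 */
    }
}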
Task parallelism involves distributing not data but tasks (threads) across multiple computing cores. Each thread is performing a unique operation. Different threads may be operating on the same data, or they may be operating on different data. Consider again our example above. In contrast to that situation, an example of task parallelism might involve two threads, each performing a unique statistical operation on the array of elements. The threads again are operating in parallel on separate computing cores, but each is performing a unique operation.
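A corresponding task-parallel sketch (assumed code, not from the text), in which two threads perform different statistical operations on the same array:

public class TaskParallelStats {
    public static void main(String[] args) throws InterruptedException {
        int[] data = {7, 2, 9, 4, 6};

        /* each thread performs a unique operation on the same data */
        Thread avgThread = new Thread(() ->
            System.out.println("average = "
                + java.util.Arrays.stream(data).average().orElse(0)));
        Thread maxThread = new Thread(() ->
            System.out.println("maximum = "
                + java.util.Arrays.stream(data).max().orElse(0)));

        avgThread.start(); maxThread.start();
        avgThread.join(); maxThread.join();
    }
}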
Fundamentally, data parallelism involves the distribution of data across multiple cores, and task parallelism involves the distribution of tasks across multiple cores, as shown in Figure 4.5. However, data and task parallelism are not mutually exclusive, and an application may in fact use a hybrid of these two strategies.
Kernel vs user threads
Support for threads may be provided either at the user level, for user threads, or by the kernel, for kernel threads. User threads are supported above the kernel and are managed without kernel support, whereas kernel threads are supported and managed directly by the operating system. Virtually all contemporary operating systems—including Windows, Linux, and macOS— support kernel threads.
Many-to-One Model
The many-to-one model (Figure 4.7) maps many user-level threads to one kernel thread. Thread management is done by the thread library in user space, so it is efficient (we discuss thread libraries in Section 4.4). However, the entire process will block if a thread makes a blocking system call. Also, because only one thread can access the kernel at a time, multiple threads are unable to run in parallel on multicore systems. Green threads—a thread library available for Solaris systems and adopted in early versions of Java—used the many-to-one model. However, very few systems continue to use the model because of its inability to take advantage of multiple processing cores, which have now become standard on most computer systems.
One-to-One Model
The one-to-one model (Figure 4.8) maps each user thread to a kernel thread. It provides more concurrency than the many-to-one model by allowing another thread to run when a thread makes a blocking system call. It also allows multiple threads to run in parallel on multiprocessors. The only drawback to this model is that creating a user thread requires creating the corresponding kernel thread, and a large number of kernel threads may burden the performance of a system. Linux, along with the family of Windows operating systems, implements the one-to-one model.
Many-to-Many Model
The many-to-many model (Figure 4.9) multiplexes many user-level threads to a smaller or equal number of kernel threads. The number of kernel threads may be specific to either a particular application or a particular machine (an application may be allocated more kernel threads on a system with eight processing cores than on a system with four cores). Let’s consider the effect of this design on concurrency. Whereas the many-to-one model allows the developer to create as many user threads as she wishes, it does not result in parallelism, because the kernel can schedule only one kernel thread at a time. The one-to-one model allows greater concurrency, but the developer has to be careful not to create too many threads within an application. (In fact, on some systems, she may be limited in the number of threads she can create.) The many-to-many model suffers from neither of these shortcomings: developers can create as many user threads as necessary, and the corresponding kernel threads can run in parallel on a multiprocessor. Also, when a thread performs a blocking system call, the kernel can schedule another thread for execution.
One variation on the many-to-many model still multiplexes many user-level threads to a smaller or equal number of kernel threads but also allows a user-level thread to be bound to a kernel thread. This variation is sometimes referred to as the two-level model (Figure 4.10).
Although the many-to-many model appears to be the most flexible of the models discussed, in practice it is difficult to implement. In addition, with an increasing number of processing cores appearing on most systems, limiting the number of kernel threads has become less important. As a result, most operating systems now use the one-to-one model. However, as we shall see in Section 4.5, some contemporary concurrency libraries have developers identify tasks that are then mapped to threads using the many-to-many model.
Thread Libraries
A thread library provides the programmer with an API for creating and managing threads. There are two primary ways of implementing a thread library.
The first approach is to provide a library entirely in user space with no kernel support. All code and data structures for the library exist in user space. This means that invoking a function in the library results in a local function call in user space and not a system call.
The second approach is to implement a kernel-level library supported directly by the operating system. In this case, code and data structures for the library exist in kernel space. Invoking a function in the API for the library typically results in a system call to the kernel.
Three main thread libraries are in use today
POSIX Pthreads, Windows, and Java. Pthreads, the threads extension of the POSIX standard, may be provided as either a user-level or a kernel-level library. The Windows thread library is a kernel-level library available on Windows systems. The Java thread API allows threads to be created and managed directly in Java programs. However, because in most instances the JVM is running on top of a host operating system, the Java thread API is generally implemented using a thread library available on the host system. This means that on Windows systems, Java threads are typically implemented using the Windows API; UNIX, Linux, and macOS systems typically use Pthreads.
Two general strategies for creating multiple threads: asynchronous threading and synchronous threading.
With asynchronous threading, once the parent creates a child thread, the parent resumes its execution, so that the parent and child execute concurrently and independently of one another. Because the threads are independent, there is typically little data sharing between them. Asynchronous threading is the strategy used in the multithreaded server illustrated in Figure 4.2 and is also commonly used for designing responsive user interfaces.
Synchronous threading occurs when the parent thread creates one or more children and then must wait for all of its children to terminate before it resumes.
Here, the threads created by the parent perform work concurrently, but the parent cannot continue until this work has been completed. Once each thread has finished its work, it terminates and joins with its parent. Only after all of the children have joined can the parent resume execution. Typically, synchronous threading involves significant data sharing among threads. For example, the parent thread may combine the results calculated by its various children. All of the following examples use synchronous threading.
Pthreads
Pthreads refers to the POSIX standard (IEEE 1003.1c) defining an API for thread creation and synchronization. This is a specification for thread behavior, not an implementation. Operating-system designers may implement the specification in any way they wish. Numerous systems implement the Pthreads specification; most are UNIX-type systems, including Linux and macOS. Although Windows doesn't support Pthreads natively, some third-party implementations for Windows are available.
#define NUM_THREADS 10

/* an array of threads to be joined upon */
pthread_t workers[NUM_THREADS];

for (int i = 0; i < NUM_THREADS; i++)
    pthread_join(workers[i], NULL);
The C program shown in Figure 4.11 demonstrates the basic Pthreads API for constructing a multithreaded program that calculates the summation of a non-negative integer in a separate thread. In a Pthreads program, separate threads begin execution in a specified function. In Figure 4.11, this is the runner() function. When this program begins, a single thread of control begins in main(). After some initialization, main() creates a second thread that begins control in the runner() function. Both threads share the global data sum.
Let's look more closely at this program. All Pthreads programs must include the pthread.h header file. The statement pthread_t tid declares the identifier for the thread we will create. Each thread has a set of attributes, including stack size and scheduling information. The pthread_attr_t attr declaration represents the attributes for the thread. We set the attributes in the function call pthread_attr_init(&attr). Because we did not explicitly set any attributes, we use the default attributes provided. (In Chapter 5, we discuss some of the scheduling attributes provided by the Pthreads API.) A separate thread is created with the pthread_create() function call. In addition to passing the thread identifier and the attributes for the thread, we also pass the name of the function where the new thread will begin execution—in this case, the runner() function. Last, we pass the integer parameter that was provided on the command line, argv[1].
At this point, the program has two threads: the initial (or parent) thread in main() and the summation (or child) thread performing the summation operation in the runner() function. This program follows the thread create/join strategy, whereby after creating the summation thread, the parent thread will wait for it to terminate by calling the pthread_join() function. The summation thread will terminate when it calls the function pthread_exit(). Once the summation thread has returned, the parent thread will output the value of the shared data sum.
This example program creates only a single thread. With the growing dominance of multicore systems, writing programs containing several threads has become increasingly common. A simple method for waiting on several threads using the pthread_join() function is to enclose the operation within a simple for loop. For example, you can join on ten threads using the Pthreads code shown in Figure 4.12 (reproduced above).
Java Threads
Threads are the fundamental model of program execution in a Java program, and the Java language and its API provide a rich set of features for the creation and management of threads. All Java programs comprise at least a single thread of control—even a simple Java program consisting of only a main() method runs as a single thread in the JVM. Java threads are available on any system that provides a JVM, including Windows, Linux, and macOS. The Java thread API is available for Android applications as well.
There are two techniques for explicitly creating threads in a Java program.
One approach is to create a new class that is derived from the Thread class and to override its run() method. An alternative—and more commonly used—technique is to define a class that implements the Runnable interface. This interface defines a single abstract method with the signature public void run(). The code in the run() method of a class that implements Runnable is what executes in a separate thread. An example is shown below:
class Task implements Runnable {
    public void run() {
        System.out.println("I am a thread.");
    }
}
LAMBDA EXPRESSIONS IN JAVA
Beginning with Version 1.8 of the language, Java introduced Lambda expressions, which allow a much cleaner syntax for creating threads. Rather than defining a separate class that implements Runnable, a Lambda expression can be used instead:

Runnable task = () -> {
    System.out.println("I am a thread.");
};
Thread worker = new Thread(task);
worker.start();
Lambda expressions—as well as similar functions known as closures—are a prominent feature of functional programming languages and have been available in several nonfunctional languages as well, including Python, C++, and C#. As we shall see in later examples in this chapter, Lambda expressions often provide a simple syntax for developing parallel applications.
Thread creation in Java involves
creating a Thread object and passing it an instance of a class that implements Runnable, followed by invoking the start() method on the Thread object. This appears in the following example:
Thread worker = new Thread(new Task());
worker.start();
Invoking the start() method for the new Thread object does two things:
1. It allocates memory and initializes a new thread in the JVM.
2. It calls the run() method, making the thread eligible to be run by the JVM.
(Note again that we never call the run() method directly. Rather, we call the start() method, and it calls the run() method on our behalf.) Recall that the parent threads in the Pthreads and Windows libraries use pthread_join() and WaitForSingleObject() (respectively) to wait for the summation threads to finish before proceeding. The join() method in Java provides similar functionality. (Notice that join() can throw an InterruptedException, which we choose to ignore.)

try {
    worker.join();
}
catch (InterruptedException ie) { }

If the parent must wait for several threads to finish, the join() method can be enclosed in a for loop similar to that shown for Pthreads.
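Although the text does not show it, such a Java join loop might look like this (a sketch reusing the Task class from above):

Thread[] workers = new Thread[10];

/* create and start ten threads */
for (int i = 0; i < workers.length; i++) {
    workers[i] = new Thread(new Task());
    workers[i].start();
}

/* wait for all ten threads to terminate */
for (Thread t : workers) {
    try {
        t.join();
    }
    catch (InterruptedException ie) { }
}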
Java Executor Framework
Java has supported thread creation using the approach we have described thus far since its origins. However, beginning with Version 1.5 and its API, Java introduced several new concurrency features that provide developers with much greater control over thread creation and communication. These tools are available in the java.util.concurrent package.
Rather than explicitly creating Thread objects, thread creation is instead organized around the Executor interface:

public interface Executor {
    void execute(Runnable command);
}

Classes implementing this interface must define the execute() method, which is passed a Runnable object. For Java developers, this means using the Executor rather than creating a separate Thread object and invoking its start() method. The Executor is used as follows:
/* Executor is an interface, so we obtain an implementation, e.g., from the Executors factory class */
Executor service = Executors.newSingleThreadExecutor();
service.execute(new Task());
The Executor framework is based on the producer-consumer model; tasks implementing the Runnable interface are produced, and the threads that execute these tasks consume them. The advantage of this approach is that it not only divides thread creation from execution but also provides a mechanism for communication between concurrent tasks.
Data sharing between threads belonging to the same process occurs easily in Windows and Pthreads, since shared data are simply declared globally. As a pure object-oriented language, Java has no such notion of global data. We can pass parameters to a class that implements Runnable, but Java threads cannot return results. To address this need, the java.util.concurrent package additionally defines the Callable interface, which behaves similarly to Runnable except that a result can be returned. Results returned from Callable tasks are known as Future objects. A result can be retrieved from the get() method defined in the Future interface. The program shown in Figure 4.14 illustrates the summation program using these Java features.
The Summation class implements the Callable interface, which specifies the method V call()—it is the code in this call() method that is executed in a separate thread. To execute this code, we create a newSingleThreadExecutor object (provided as a static method in the Executors class), which is of type ExecutorService, and pass it a Callable task using its submit() method. (The primary difference between the execute() and submit() methods is that the former returns no result, whereas the latter returns a result as a Future.) Once we submit the Callable task to the thread, we wait for its result by calling the get() method of the Future object it returns.
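Since Figure 4.14 itself is not reproduced in these notes, here is a minimal sketch in its spirit (assumed code, not the book's exact figure):

import java.util.concurrent.*;

class Summation implements Callable<Integer> {
    private final int upper;
    public Summation(int upper) { this.upper = upper; }

    /* the code executed in the separate thread; returns a result */
    public Integer call() {
        int sum = 0;
        for (int i = 1; i <= upper; i++)
            sum += i;
        return sum;
    }
}

public class Driver {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Integer> result = pool.submit(new Summation(10));
        /* get() blocks until the Callable task has produced its result */
        System.out.println("sum = " + result.get()); /* prints sum = 55 */
        pool.shutdown();
    }
}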
At first glance, this model of thread creation may appear more complicated than simply creating a thread and joining on its termination. However, incurring this modest degree of complication confers benefits. As we have seen, using Callable and Future allows threads to return results.
THE JVM AND THE HOST OPERATING SYSTEM
The JVM is typically implemented on top of a host operating system (see Figure 18.10). This setup allows the JVM to hide the implementation details of the underlying operating system and to provide a consistent, abstract environment that allows Java programs to operate on any platform that supports a JVM. The specification for the JVM does not indicate how Java threads are to be mapped to the underlying operating system, instead leaving that decision to the particular implementation of the JVM. For example, the Windows operating system uses the one-to-one model; therefore, each Java thread for a JVM running on Windows maps to a kernel thread. In addition, there may be a relationship between the Java thread library and the thread library on the host operating system. For example, implementations of a JVM for the Windows family of operating systems might use the Windows API when creating Java threads; Linux and macOS systems might use the Pthreads API.
Implicit Threading
With the continued growth of multicore processing, applications containing hundreds—or even thousands—of threads are looming on the horizon.
Designing such applications is not a trivial undertaking: programmers must address not only the challenges outlined in Section 4.2 but additional difficulties as well. These difficulties, which relate to program correctness, are covered in Chapter 6 and Chapter 8.
One way to address these difficulties and better support the design of concurrent and parallel applications is to transfer the creation and management of threading from application developers to compilers and run-time libraries.
This strategy, termed implicit threading, is an increasingly popular trend. In this section, we explore four alternative approaches to designing applications that can take advantage of multicore processors through implicit threading.
As we shall see, these strategies generally require application developers to identify tasks—not threads—that can run in parallel. A task is usually written as a function, which the run-time library then maps to a separate thread, typically using the many-to-many model (Section 4.3.3). The advantage of this approach is that developers only need to identify parallel tasks, and the libraries determine the specific details of thread creation and management.
ANDROID THREAD POOLS
In Section 3.8.2.1, we covered RPCs in the Android operating system. You may recall from that section that Android uses the Android Interface Definition Language (AIDL), a tool that specifies the remote interface that clients interact with on the server. AIDL also provides a thread pool. A remote service using the thread pool can handle multiple concurrent requests, servicing each request using a separate thread from the pool.
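Android's AIDL pool itself is managed by the system, but the general thread-pool idea can be sketched with the standard java.util.concurrent API (an assumed illustration, not Android code):

import java.util.concurrent.*;

public class PoolDemo {
    public static void main(String[] args) {
        /* a fixed pool of four threads services eight concurrent requests */
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 8; i++) {
            final int request = i;
            pool.execute(() -> System.out.println("request " + request
                + " handled by " + Thread.currentThread().getName()));
        }
        pool.shutdown(); /* stop accepting new tasks; queued ones still run */
    }
}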