Chapter 18 Flashcards
Virtualization
The fundamental idea behind a virtual machine is to abstract the hardware
of a single computer (the CPU, memory, disk drives, network interface cards,
and so forth) into several different execution environments, thereby creating
the illusion that each separate environment is running on its own private
computer. This concept may seem similar to the layered approach of operating
system implementation (see Section 2.8.2), and in some ways it is. In the case of
virtualization, there is a layer that creates a virtual system on which operating
systems or applications can run.
Virtual machine implementations involve several components. At the base
is the host, the underlying hardware system that runs the virtual machines.
The virtual machine manager (VMM) (also known as a hypervisor) creates
and runs virtual machines by providing an interface that is identical to the host
(except in the case of paravirtualization, discussed later). Each guest process is
provided with a virtual copy of the host (Figure 18.1). Usually, the guest process
is in fact an operating system. A single physical machine can thus run multiple
operating systems concurrently, each in its own virtual machine.
Take a moment to note that with virtualization, the definition of “operat-
ing system” once again blurs. For example, consider VMM software such as
VMware ESX. This virtualization software is installed on the hardware, runs
when the hardware boots, and provides services to applications. The services
include traditional ones, such as scheduling and memory management, along
with new types, such as migration of applications between systems. Further-
more, the applications are, in fact, guest operating systems. Is the VMware ESX
VMM an operating system that, in turn, runs other operating systems? Certainly
it acts like an operating system. For clarity, however, we call the component that
provides virtual environments a VMM.
The implementation of VMMs varies greatly. Options include the following:
* Hardware-based solutions that provide support for virtual machine cre-
ation and management via firmware. These VMMs, which are commonly
found in mainframe and large to midsized servers, are generally known as
type 0 hypervisors. IBM LPARs and Oracle LDOMs are examples.
* Operating-system-like software built to provide virtualization, including
VMware ESX (mentioned above), Joyent SmartOS, and Citrix XenServer.
These VMMs are known as type 1 hypervisors.
* General-purpose operating systems that provide standard functions as
well as VMM functions, including Microsoft Windows Server with Hyper-V
and Red Hat Linux with the KVM feature. Because such systems have a
feature set similar to type 1 hypervisors, they are also known as type 1.
* Applications that run on standard operating systems but provide VMM
features to guest operating systems. These applications, which include
VMware Workstation and Fusion, Parallels Desktop, and Oracle Virtual-
Box, are type 2 hypervisors.
* Paravirtualization, a technique in which the guest operating system is
modified to work in cooperation with the VMM to optimize performance.
* Programming-environment virtualization, in which VMMs do not virtu-
alize real hardware but instead create an optimized virtual system. This
technique is used by Oracle Java and Microsoft .NET.
* Emulators that allow applications written for one hardware environment
to run on a very different hardware environment, such as a different type
of CPU.
* Application containment, which is not virtualization at all but rather
provides virtualization-like features by segregating applications from the
operating system. Oracle Solaris Zones, BSD Jails, and IBM AIX WPARs
“contain” applications, making them more secure and manageable.
The variety of virtualization techniques in use today is a testament to
the breadth, depth, and importance of virtualization in modern computing.
Virtualization is invaluable for data-center operations, efficient application
development, and software testing, among many other uses.
Virtual Machines History
Virtual machines first appeared commercially on IBM mainframes in 1972.
Virtualization was provided by the IBM VM operating system. This system has
evolved and is still available. In addition, many of its original concepts are
found in other systems, making it worth exploring.
IBM VM/370 divided a mainframe into multiple virtual machines, each
running its own operating system. A major difficulty with the VM approach
involved disk systems. Suppose that the physical machine had three disk
drives but wanted to support seven virtual machines. Clearly, it could not
allocate a disk drive to each virtual machine. The solution was to provide
virtual disks—termed minidisks in IBM’s VM operating system. The minidisks
were identical to the system’s hard disks in all respects except size. The system
implemented each minidisk by allocating as many tracks on the physical disks
as the minidisk needed.
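The minidisk idea can be illustrated with a small sketch. It is not IBM's implementation: the class names, the first-fit policy, and the assumption that each minidisk occupies one contiguous run of tracks on a single physical disk are all chosen here for illustration, as are the sizes (1,000 tracks per disk, 300 per minidisk).

```python
from dataclasses import dataclass


@dataclass
class PhysicalDisk:
    name: str
    total_tracks: int
    next_free: int = 0  # first unallocated track (simple bump allocator)

    def free_tracks(self) -> int:
        return self.total_tracks - self.next_free


@dataclass
class Minidisk:
    owner: str        # virtual machine that sees this as a whole disk
    disk: str         # physical disk holding the tracks
    first_track: int  # where the minidisk starts on that disk
    tracks: int       # size of the minidisk in tracks


def allocate_minidisk(owner, tracks_needed, disks):
    """First-fit: use the first physical disk with enough free tracks."""
    for disk in disks:
        if disk.free_tracks() >= tracks_needed:
            mini = Minidisk(owner, disk.name, disk.next_free, tracks_needed)
            disk.next_free += tracks_needed
            return mini
    raise ValueError(f"no physical disk has {tracks_needed} free tracks")


# Three physical disks support seven virtual machines.
disks = [PhysicalDisk(f"disk{i}", 1000) for i in range(3)]
minidisks = [allocate_minidisk(f"vm{i}", 300, disks) for i in range(7)]
```

Each virtual machine receives something that behaves like a private disk, although several minidisks share one physical drive.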
Once the virtual machines were created, users could run any of the oper-
ating systems or software packages that were available on the underlying
machine. For the IBM VM system, a user normally ran CMS, a single-user
interactive operating system.
For many years after IBM introduced this technology, virtualization
remained in its domain. Most systems could not support virtualization.
However, a formal definition of virtualization helped to establish system
requirements and a target for functionality. The virtualization requirements
called for:
* Fidelity. A VMM provides an environment for programs that is essentially
identical to the original machine.
* Performance. Programs running within that environment show only
minor performance decreases.
* Safety. The VMM is in complete control of system resources.
These requirements still guide virtualization efforts today.
By the late 1990s, Intel 80x86 CPUs had become common, fast, and rich
in features. Accordingly, developers launched multiple efforts to implement
virtualization on that platform. Both Xen and VMware created technologies,
still used today, to allow guest operating systems to run on the 80x86. Since
that time, virtualization has expanded to include all common CPUs, many
commercial and open-source tools, and many operating systems. For exam-
ple, the open-source VirtualBox project (http://www.virtualbox.org) provides
a program that runs on Intel x86 and AMD64 CPUs and on Windows, Linux,
macOS, and Solaris host operating systems. Possible guest operating systems
include many versions of Windows, Linux, Solaris, and BSD, including even
MS-DOS and IBM OS/2.
Benefits and Features
Several advantages make virtualization attractive. Most of them are fundamen-
tally related to the ability to share the same hardware yet run several different
execution environments (that is, different operating systems) concurrently.
One important advantage of virtualization is that the host system is pro-
tected from the virtual machines, just as the virtual machines are protected
from each other. A virus inside a guest operating system might damage that
operating system but is unlikely to affect the host or the other guests. Because
each virtual machine is almost completely isolated from all other virtual
machines, there are almost no protection problems.
A potential disadvantage of isolation is that it can prevent sharing of
resources. Two approaches to providing sharing have been implemented. First,
it is possible to share a file-system volume and thus to share files. Second, it
is possible to define a network of virtual machines, each of which can send
information over the virtual communications network. The network is mod-
eled after physical communication networks but is implemented in software.
Of course, the VMM is free to allow any number of its guests to use physical
resources, such as a physical network connection (with sharing provided by the
VMM), in which case the allowed guests could communicate with each other
via the physical network.
One feature common to most virtualization implementations is the ability
to freeze, or suspend, a running virtual machine. Many operating systems
provide that basic feature for processes, but VMMs go one step further and
allow copies and snapshots to be made of the guest. The copy can be used
to create a new VM or to move a VM from one machine to another with its
current state intact. The guest can then resume where it was, as if on its original
machine, creating a clone. The snapshot records a point in time, and the guest
can be reset to that point if necessary (for example, if a change was made but
is no longer wanted). Often, VMMs allow many snapshots to be taken. For
example, snapshots might record a guest’s state every day for a month, making
restoration to any of those snapshot states possible. These abilities are used to
good advantage in virtual environments.
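The difference between a snapshot and a clone can be sketched in a few lines. This is a toy model only: a real VMM captures memory, disk, and device state, whereas here a hypothetical Guest class keeps its whole state in one dictionary and copies it with copy.deepcopy.

```python
import copy


class Guest:
    def __init__(self, name, state):
        self.name = name
        self.state = state   # stand-in for memory, disk, and VCPU contents
        self.snapshots = {}  # label -> saved copy of the state

    def snapshot(self, label):
        """Record the guest's state at this point in time."""
        self.snapshots[label] = copy.deepcopy(self.state)

    def revert(self, label):
        """Reset the guest to a previously recorded point."""
        self.state = copy.deepcopy(self.snapshots[label])

    def clone(self, new_name):
        """Create a new, independent guest from the current state."""
        return Guest(new_name, copy.deepcopy(self.state))


guest = Guest("web01", {"packages": ["httpd"], "config_version": 1})
guest.snapshot("before-upgrade")
guest.state["config_version"] = 2     # a change that turns out to be unwanted
twin = guest.clone("web02")           # the clone carries the changed state
guest.revert("before-upgrade")        # the original goes back in time
```

After these steps, guest is back at configuration version 1 while twin, being an independent copy, still holds version 2.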
A virtual machine system is a perfect vehicle for operating-system research
and development. Normally, changing an operating system is a difficult task.
Operating systems are large and complex programs, and a change in one
part may cause obscure bugs to appear in some other part. The power of
the operating system makes changing it particularly dangerous. Because the
operating system executes in kernel mode, a wrong change in a pointer could
cause an error that would destroy the entire file system. Thus, it is necessary
to test all changes to the operating system carefully.
Of course, the operating system runs on and controls the entire machine,
so the system must be stopped and taken out of use while changes are made
and tested. This period is commonly called system-development time. Since
it makes the system unavailable to users, system-development time on shared
systems is often scheduled late at night or on weekends, when system load is
low.
A virtual-machine system can eliminate much of this latter problem. Sys-
tem programmers are given their own virtual machine, and system develop-
ment is done on the virtual machine instead of on a physical machine. Normal
system operation is disrupted only when a completed and tested change is
ready to be put into production.
Another advantage of virtual machines for developers is that multiple
operating systems can run concurrently on the developer’s workstation. This
virtualized workstation allows for rapid porting and testing of programs in
varying environments. In addition, multiple versions of a program can run,
each in its own isolated operating system, within one system. Similarly, quality-
assurance engineers can test their applications in multiple environments with-
out buying, powering, and maintaining a computer for each environment.
A major advantage of virtual machines in production data-center use is
system consolidation, which involves taking two or more separate systems
and running them in virtual machines on one system. Such physical-to-virtual
conversions result in resource optimization, since many lightly used systems
can be combined to create one more heavily used system.
Consider, too, that management tools that are part of the VMM allow system
administrators to manage many more systems than they otherwise could.
A virtual environment might include 100 physical servers, each running 20
virtual servers. Without virtualization, 2,000 servers would require several
system administrators. With virtualization and its tools, the same work can be
managed by one or two administrators. One of the tools that make this possible
is templating, in which one standard virtual machine image, including an
installed and configured guest operating system and applications, is saved and
used as a source for multiple running VMs. Other features include managing
the patching of all guests, backing up and restoring the guests, and monitoring
their resource use.
Virtualization can improve not only resource utilization but also resource
management. Some VMMs include a live migration feature that moves a run-
ning guest from one physical server to another without interrupting its opera-
tion or active network connections. If a server is overloaded, live migration can
thus free resources on the source host while not disrupting the guest. Similarly,
when host hardware must be repaired or upgraded, guests can be migrated
to other servers, the evacuated host can be maintained, and then the guests
can be migrated back. This operation occurs without downtime and without
interruption to users.
Think about the possible effects of virtualization on how applications are
deployed. If a system can easily add, remove, and move a virtual machine,
then why install applications on that system directly? Instead, the application
could be preinstalled on a tuned and customized operating system in a vir-
tual machine. This method would offer several benefits for application devel-
opers. Application management would become easier, less tuning would be
required, and technical support of the application would be more straightfor-
ward. System administrators would find the environment easier to manage as
well. Installation would be simple, and redeploying the application to another
system would be much easier than the usual steps of uninstalling and rein-
stalling. For widespread adoption of this methodology to occur, though, the
format of virtual machines must be standardized so that any virtual machine
will run on any virtualization platform. The “Open Virtual Machine Format” is
an attempt to provide such standardization, and it could succeed in unifying
virtual machine formats.
Virtualization has laid the foundation for many other advances in com-
puter facility implementation, management, and monitoring. Cloud comput-
ing, for example, is made possible by virtualization in which resources such
as CPU, memory, and I/O are provided as services to customers using Internet
technologies. By using APIs, a program can tell a cloud computing facility to
create thousands of VMs, all running a specific guest operating system and
application, that others can access via the Internet. Many multiuser games,
photo-sharing sites, and other web services use this functionality.
In the area of desktop computing, virtualization is enabling desktop and
laptop computer users to connect remotely to virtual machines located in remote data centers and access their applications as if they were local. This
practice can increase security, because no data are stored on local disks at the
user’s site. The cost of the user’s computing resource may also decrease. The
user must have networking, CPU, and some memory, but all that these system
components need to do is display an image of the guest as it runs remotely (via
a protocol such as RDP). Thus, they need not be expensive, high-performance
components. Other uses of virtualization are sure to follow as it becomes more
prevalent and hardware support continues to improve.
Building Blocks
Although the virtual machine concept is useful, it is difficult to implement.
Much work is required to provide an exact duplicate of the underlying
machine. This is especially a challenge on dual-mode systems, where the
underlying machine has only user mode and kernel mode. In this section, we
examine the building blocks that are needed for efficient virtualization. Note
that these building blocks are not required by type 0 hypervisors, as discussed
in Section 18.5.2.
The ability to virtualize depends on the features provided by the CPU. If
the features are sufficient, then it is possible to write a VMM that provides
a guest environment. Otherwise, virtualization is impossible. VMMs use sev-
eral techniques to implement virtualization, including trap-and-emulate and
binary translation. We discuss each of these techniques in this section, along
with the hardware support needed to support virtualization.
As you read the section, keep in mind that an important concept found
in most virtualization options is the implementation of a virtual CPU (VCPU).
The VCPU does not execute code. Rather, it represents the state of the CPU as
the guest machine believes it to be. For each guest, the VMM maintains a VCPU
representing that guest’s current CPU state. When the guest is context-switched
onto a CPU by the VMM, information from the VCPU is used to load the right
context, much as a general-purpose operating system would use the PCB.
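A minimal sketch of the VCPU idea follows. The field names and the PhysicalCPU class are hypothetical; a real VCPU holds the complete architectural state of the processor. The point is only that the VCPU is data, saved and loaded by the VMM when it switches guests.

```python
from dataclasses import dataclass, field


@dataclass
class VCPU:
    pc: int = 0                                    # guest program counter
    registers: dict = field(default_factory=dict)  # guest register contents
    mode: str = "kernel"                           # mode the guest believes it is in


class PhysicalCPU:
    def __init__(self):
        self.pc = 0
        self.registers = {}
        self.current = None  # VCPU whose state is loaded right now

    def switch_to(self, vcpu):
        # Save the outgoing guest's state back into its VCPU.
        if self.current is not None:
            self.current.pc = self.pc
            self.current.registers = dict(self.registers)
        # Load the incoming guest's state, much as an OS would use a PCB.
        self.pc = vcpu.pc
        self.registers = dict(vcpu.registers)
        self.current = vcpu


cpu = PhysicalCPU()
guest_a, guest_b = VCPU(pc=0x1000), VCPU(pc=0x2000)

cpu.switch_to(guest_a)
cpu.pc += 8                 # guest A runs for a while
cpu.registers["r1"] = 42
cpu.switch_to(guest_b)      # A's progress is saved into its VCPU
cpu.switch_to(guest_a)      # and restored when A is scheduled again
```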
Trap-and-Emulate
On a typical dual-mode system, the virtual machine guest can execute only in
user mode (unless extra hardware support is provided). The kernel, of course,
runs in kernel mode, and it is not safe to allow user-level code to run in
kernel mode. Just as the physical machine has two modes, so must the virtual
machine. Consequently, we must have a virtual user mode and a virtual kernel
mode, both of which run in physical user mode. Those actions that cause a
transfer from user mode to kernel mode on a real machine (such as a system
call, an interrupt, or an attempt to execute a privileged instruction) must also
cause a transfer from virtual user mode to virtual kernel mode in the virtual
machine.
How can such a transfer be accomplished? The procedure is as follows:
When the kernel in the guest attempts to execute a privileged instruction,
that is an error (because the system is in user mode) and causes a trap to the
VMM in the real machine. The VMM gains control and executes (or “emulates”)
the action that was attempted by the guest kernel on behalf of the guest. It
then returns control to the virtual machine. This is called the trap-and-emulate
method and is shown in Figure 18.2.
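The control flow of trap-and-emulate can be sketched as follows. This is a toy model under stated assumptions: the instruction names, the PRIVILEGED set, and the use of a Python exception to stand in for a hardware trap are all invented for illustration.

```python
class PrivilegedInstructionTrap(Exception):
    """Stands in for the hardware trap raised in physical user mode."""

    def __init__(self, instr):
        super().__init__(instr)
        self.instr = instr


PRIVILEGED = {"disable_interrupts", "enable_interrupts"}


def execute_in_user_mode(instr, regs):
    """Run one guest instruction natively; privileged ones trap."""
    op, *args = instr
    if op in PRIVILEGED:
        raise PrivilegedInstructionTrap(instr)
    if op == "add":
        reg, value = args
        regs[reg] = regs.get(reg, 0) + value
    else:
        raise ValueError(f"unknown instruction {op!r}")


class VMM:
    def __init__(self):
        self.regs = {}
        self.vcpu = {"interrupts_enabled": True}  # guest's view of the CPU
        self.traps = 0

    def emulate(self, instr):
        """Apply the effect of a privileged instruction to the VCPU only."""
        self.vcpu["interrupts_enabled"] = instr[0] == "enable_interrupts"

    def run(self, program):
        for instr in program:
            try:
                execute_in_user_mode(instr, self.regs)
            except PrivilegedInstructionTrap as trap:
                self.traps += 1           # the VMM gains control
                self.emulate(trap.instr)  # emulates the action
                # then the loop continues, returning control to the guest


vmm = VMM()
vmm.run([("add", "r1", 2), ("disable_interrupts",), ("add", "r1", 3)])
```

Only the privileged instruction takes the slow path through the VMM; the two add instructions run without intervention, and the real machine's interrupt state is never touched.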
With privileged instructions, time becomes an issue. All nonprivileged
instructions run natively on the hardware, providing the same performance
for guests as native applications. Privileged instructions create extra overhead,
however, causing the guest to run more slowly than it would natively. In
addition, the CPU is being multiprogrammed among many virtual machines,
which can further slow down the virtual machines in unpredictable ways.
This problem has been approached in various ways. IBM VM, for exam-
ple, allows normal instructions for the virtual machines to execute directly
on the hardware. Only the privileged instructions (needed mainly for I/O)
must be emulated and hence execute more slowly. In general, with the evolu-
tion of hardware, the performance of trap-and-emulate functionality has been
improved, and cases in which it is needed have been reduced. For example,
many CPUs now have extra modes added to their standard dual-mode opera-
tion. The VCPU need not keep track of what mode the guest operating system is
in, because the physical CPU performs that function. In fact, some CPUs provide
guest CPU state management in hardware, so the VMM need not supply that
functionality, removing the extra overhead.
Binary Translation
Some CPUs do not have a clean separation of privileged and nonprivileged
instructions. Unfortunately for virtualization implementers, the Intel x86 CPU
line is one of them. No thought was given to running virtualization on the
x86 when it was designed. (In fact, the first CPU in the family—the Intel
4004, released in 1971—was designed to be the core of a calculator.) The chip
has maintained backward compatibility throughout its lifetime, preventing
changes that would have made virtualization easier through many genera-
tions.
Let’s consider an example of the problem. The instruction popf loads the
flag register from the contents of the stack. If the CPU is in privileged mode, all
of the flags are replaced from the stack. If the CPU is in user mode, then only
some flags are replaced, and others are ignored. Because no trap is generated
if popf is executed in user mode, the trap-and-emulate procedure is rendered
useless. Other x86 instructions cause similar problems. For the purposes of this
discussion, we will call this set of instructions special instructions. As recently
as 1998, using the trap-and-emulate method to implement virtualization on the
x86 was considered impossible because of these special instructions.
This previously insurmountable problem was solved with the implemen-
tation of the binary translation technique. Binary translation is fairly simple in
concept but complex in implementation. The basic steps are as follows:
1. If the guest VCPU is in user mode, the guest can run its instructions
natively on a physical CPU.
2. If the guest VCPU is in kernel mode, then the guest believes that it is run-
ning in kernel mode. The VMM examines every instruction the guest exe-
cutes in virtual kernel mode by reading the next few instructions that the
guest is going to execute, based on the guest’s program counter. Instruc-
tions other than special instructions are run natively. Special instructions
are translated into a new set of instructions that perform the equivalent
task—for example, changing the flags in the VCPU.
Binary translation is shown in Figure 18.3. It is implemented by translation
code within the VMM. The code reads native binary instructions dynamically
from the guest, on demand, and generates native binary code that executes in
place of the original code.
The basic method of binary translation just described would execute
correctly but perform poorly. Fortunately, the vast majority of instructions
would execute natively. But how could performance be improved for the other
instructions? We can turn to a specific implementation of binary translation,
the VMware method, to see one way of improving performance. Here, caching
provides the solution. The replacement code for each instruction that needs
to be translated is cached. All later executions of that instruction run from the
translation cache and need not be translated again. If the cache is large enough,
this method can greatly improve performance.
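The combination of translation and caching described above can be sketched as follows. This is not VMware's code: the instruction encoding, the SPECIAL set, and keying the cache by guest address are assumptions made for illustration, and appending to a list stands in for native execution.

```python
SPECIAL = {"popf"}  # instructions that misbehave silently in user mode


class BinaryTranslator:
    def __init__(self):
        self.cache = {}        # guest address -> translated routine
        self.translations = 0  # how many times translation was needed
        self.vcpu_flags = 0    # flag register as the guest believes it to be

    def _translate(self, instr):
        """Build replacement code that performs the equivalent task."""
        self.translations += 1
        op, operand = instr
        if op == "popf":
            def replacement():
                self.vcpu_flags = operand  # change the flags in the VCPU
            return replacement
        raise ValueError(f"no translation for {op!r}")

    def run_kernel_code(self, code, native_log):
        """Execute guest code that is running in virtual kernel mode."""
        for addr, instr in code:
            if instr[0] not in SPECIAL:
                native_log.append(instr)  # ordinary instruction: run natively
                continue
            routine = self.cache.get(addr)
            if routine is None:           # first time at this address
                routine = self._translate(instr)
                self.cache[addr] = routine
            routine()                     # later executions hit the cache


loop_body = [(0x100, ("mov", 1)), (0x104, ("popf", 0x202)), (0x108, ("add", 2))]
translator = BinaryTranslator()
native_log = []
for _ in range(3):  # the guest executes the same code three times
    translator.run_kernel_code(loop_body, native_log)
```

The popf at address 0x104 executes three times but is translated only once; the other instructions never involve the translator at all.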
Let’s consider another issue in virtualization: memory management,
specifically the page tables. How can the VMM keep page-table state both for
guests that believe they are managing the page tables and for the VMM itself?
A common method, used with both trap-and-emulate and binary translation,
is to use nested page tables (NPTs). Each guest operating system maintains
one or more page tables to translate from virtual to physical memory. The
VMM maintains NPTs to represent the guest’s page-table state, just as it creates
a VCPU to represent the guest’s CPU state. The VMM knows when the guest
tries to change its page table, and it makes the equivalent change in the NPT.
When the guest is on the CPU, the VMM puts the pointer to the appropriate
NPT into the appropriate CPU register to make that table the active page table.
If the guest needs to modify the page table (for example, fulfilling a page
fault), then that operation must be intercepted by the VMM and appropriate
changes made to the nested and system page tables. Unfortunately, the use of
NPTs can cause TLB misses to increase, and many other complexities need to
be addressed to achieve reasonable performance.
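The bookkeeping involved can be sketched in a simplified form. The assumptions here are significant: page tables are flat dictionaries rather than multilevel structures, the class and method names are hypothetical, and the interception itself is modeled as an ordinary method call.

```python
PAGE_SIZE = 4096


class GuestMemory:
    """VMM-side page-table bookkeeping for one guest."""

    def __init__(self, guest_to_host_frames):
        # Which host frame backs each frame the guest thinks it owns.
        self.guest_to_host = dict(guest_to_host_frames)
        self.guest_page_table = {}  # guest virtual page -> guest frame
        self.npt = {}               # guest virtual page -> host frame

    def guest_maps_page(self, vpage, guest_frame):
        """Invoked when the VMM intercepts a guest page-table update."""
        if guest_frame not in self.guest_to_host:
            raise ValueError(f"guest frame {guest_frame} is not backed")
        self.guest_page_table[vpage] = guest_frame        # the guest's view
        self.npt[vpage] = self.guest_to_host[guest_frame]  # equivalent change

    def translate(self, vaddr):
        """What the hardware does while the NPT is the active page table."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        return self.npt[vpage] * PAGE_SIZE + offset


mem = GuestMemory({0: 40, 1: 41})   # guest frames 0 and 1 live in host frames 40 and 41
mem.guest_maps_page(vpage=5, guest_frame=1)
host_addr = mem.translate(5 * PAGE_SIZE + 12)
```

The guest believes virtual page 5 maps to its frame 1, while the table actually used by the hardware sends the access to host frame 41.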
Although it might seem that the binary translation method creates large
amounts of overhead, it performed well enough to launch a new industry
aimed at virtualizing Intel x86-based systems. VMware tested the performance
impact of binary translation by booting one such system, Windows XP, and
immediately shutting it down while monitoring the elapsed time and the
number of translations produced by the binary translation method. The result
was 950,000 translations, taking 3 microseconds each, for a total increase of
3 seconds (about 5 percent) over native execution of Windows XP. To achieve
that result, developers used many performance improvements that we do not
discuss here. For more information, consult the bibliographical notes at the end
of this chapter.
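The figures quoted above can be checked with a few lines of arithmetic. The implied native time in the last step is an inference from the rounded 3-second and 5-percent figures, not a number reported in the text.

```python
translations = 950_000
cost_per_translation_us = 3

# Total translation overhead, computed in whole microseconds first.
overhead_us = translations * cost_per_translation_us  # 2,850,000 microseconds
overhead_s = overhead_us / 1_000_000                  # 2.85 s, about 3 seconds

# If roughly 3 seconds is about 5 percent of the native run, then the
# native boot-and-shutdown took on the order of a minute (an inference).
implied_native_s = 3 * 100 / 5                        # 60.0 seconds
```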
Hardware Assistance
Without some level of hardware support, virtualization would be impossible.
The more hardware support available within a system, the more feature-rich
and stable the virtual machines can be and the better they can perform. In the
Intel x86 CPU family, Intel added new virtualization support (the VT-x instruc-
tions) in successive generations beginning in 2005. Now, binary translation is
no longer needed.
In fact, all major general-purpose CPUs now provide extended hardware
support for virtualization. For example, AMD virtualization technology (AMD-
V) has appeared in several AMD processors starting in 2006. It defines two new
modes of operation—host and guest—thus moving from a dual-mode to a
multimode processor. The VMM can enable host mode, define the characteris-
tics of each guest virtual machine, and then switch the system to guest mode,
passing control of the system to a guest operating system that is running in
the virtual machine. In guest mode, the virtualized operating system thinks it
is running on native hardware and sees whatever devices are included in the
host’s definition of the guest. If the guest tries to access a virtualized resource,
then control is passed to the VMM to manage that interaction. The functionality
in Intel VT-x is similar, providing root and nonroot modes, equivalent to host
and guest modes. Both provide guest VCPU state data structures to load and
save guest CPU state automatically during guest context switches. In addition,
virtual machine control structures (VMCSs) are provided to manage guest
and host state, as well as various guest execution controls, exit controls, and
information about why guests exit back to the host. In the latter case, for exam-
ple, a nested page-table violation caused by an attempt to access unavailable
memory can result in the guest’s exit.
AMD and Intel have also addressed memory management in the virtual
environment. With AMD’s RVI and Intel’s EPT memory-management enhance-
ments, VMMs no longer need to implement software NPTs. In essence, these
CPUs implement nested page tables in hardware to allow the VMM to fully
control paging while the CPUs accelerate the translation from virtual to phys-
ical addresses. The NPTs add a new layer, one representing the guest’s view
of logical-to-physical address translation. The CPU page-table walking func-
tion (traversing the data structure to find the desired data) includes this new
layer as necessary, walking through the guest table to the VMM table to find the
physical address desired. A TLB miss results in a performance penalty, because
more tables (the guest and host page tables) must be traversed to complete
the lookup. Figure 18.4 shows the extra translation work performed by the
hardware to translate from a guest virtual address to a final physical address.
I/O is another area improved by hardware assistance. Consider that the
standard direct-memory-access (DMA) controller accepts a target memory
address and a source I/O device and transfers data between the two without
operating-system action. Without hardware assistance, a guest might try to
set up a DMA transfer that affects the memory of the VMM or other guests. In
CPUs that provide hardware-assisted DMA (such as Intel CPUs with VT-d), even
DMA has a level of indirection. First, the VMM sets up protection domains to
tell the CPU which physical memory belongs to each guest. Next, it assigns the
I/O devices to the protection domains, allowing them direct access to those
memory regions and only those regions. The hardware then transforms the
address in a DMA request issued by an I/O device to the host physical memory
address associated with the I/O. In this manner, DMA transfers are passed
through between a guest and a device without VMM interference.
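The two setup steps and the per-request translation can be sketched as follows. This is a conceptual model, not the VT-d programming interface: the IOMMU class, its method names, and the flat per-domain page mapping are hypothetical.

```python
PAGE_SIZE = 4096


class IOMMU:
    def __init__(self):
        self.domains = {}        # domain name -> {device-visible page: host frame}
        self.device_domain = {}  # device name -> domain name

    def create_domain(self, name, page_map):
        """Step 1: the VMM declares which host memory belongs to a guest."""
        self.domains[name] = dict(page_map)

    def assign_device(self, device, domain):
        """Step 2: the VMM places an I/O device in a protection domain."""
        self.device_domain[device] = domain

    def translate(self, device, dma_addr):
        """Done in hardware on every DMA request the device issues."""
        table = self.domains[self.device_domain[device]]
        page, offset = divmod(dma_addr, PAGE_SIZE)
        if page not in table:
            raise PermissionError(f"{device} may not access page {page}")
        return table[page] * PAGE_SIZE + offset


iommu = IOMMU()
iommu.create_domain("guest1", {0: 40, 1: 41})
iommu.assign_device("nic0", "guest1")
host_addr = iommu.translate("nic0", 1 * PAGE_SIZE + 100)
```

A request from nic0 for any page outside the guest1 domain raises PermissionError in this model, which corresponds to the hardware blocking a transfer that would touch the memory of the VMM or another guest.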
Similarly, interrupts must be delivered to the appropriate guest and must
not be visible to other guests. By providing an interrupt remapping feature,
CPUs with virtualization hardware assistance automatically deliver an inter-
rupt destined for a guest to a core that is currently running a thread of that
guest. That way, the guest receives interrupts without any need for the VMM
to intercede in their delivery. Without interrupt remapping, malicious guests
could generate interrupts that could be used to gain control of the host system.
(See the bibliographical notes at the end of this chapter for more details.)
ARM architectures, specifically ARMv8 (64-bit), take a slightly different
approach to hardware support of virtualization. They provide an entire
exception level, EL2, which is even more privileged than that of the kernel (EL1).
This allows the running of a secluded hypervisor, with its own MMU access
and interrupt trapping. To allow for paravirtualization, a special instruction
(HVC) is added. It allows the hypervisor to be called from guest kernels. This
instruction can only be called from within kernel mode (EL1).
An interesting side effect of hardware-assisted virtualization is that it
allows for the creation of thin hypervisors. A good example is macOS’s
hypervisor framework (“Hypervisor.framework”), which is an
operating-system-supplied library that allows the creation of virtual machines in a few lines of
code. The actual work is done via system calls, which have the kernel call the
privileged virtualization CPU instructions on behalf of the hypervisor process,
allowing management of virtual machines without the hypervisor needing to
load a kernel module of its own to execute those calls.
The Virtual Machine Life Cycle
Let’s begin with the virtual machine life cycle. Whatever the hypervisor type,
at the time a virtual machine is created, its creator gives the VMM certain
parameters. These parameters usually include the number of CPUs, amount
of memory, networking details, and storage details that the VMM will take into
account when creating the guest. For example, a user might want to create a
new guest with two virtual CPUs, 4 GB of memory, 10 GB of disk space, one
network interface that gets its IP address via DHCP, and access to the DVD drive.
The VMM then creates the virtual machine with those parameters. In the
case of a type 0 hypervisor, the resources are usually dedicated. In this situa-
tion, if there are not two virtual CPUs available and unallocated, the creation
request in our example will fail. For other hypervisor types, the resources are
dedicated or virtualized, depending on the type. Certainly, an IP address can-
not be shared, but the virtual CPUs are usually multiplexed on the physical
CPUs as discussed in Section 18.6.1. Similarly, memory management usually
involves allocating more memory to guests than actually exists in physical
memory. This is more complicated and is described in Section 18.6.2.
Finally, when the virtual machine is no longer needed, it can be deleted.
When this happens, the VMM first frees up any used disk space and then
removes the configuration associated with the virtual machine, essentially
forgetting the virtual machine.
These steps are quite simple compared with building, configuring, run-
ning, and removing physical machines. Creating a virtual machine from an
existing one can be as easy as clicking the “clone” button and providing a new
name and IP address. This ease of creation can lead to virtual machine sprawl,
which occurs when there are so many virtual machines on a system that their
use, history, and state become confusing and difficult to track.
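The creation and deletion steps described above can be sketched for the dedicated-resource case of a type 0 hypervisor. The class names, fields, and capacity figures are hypothetical; other hypervisor types would multiplex CPUs and overcommit memory rather than refuse the request.

```python
from dataclasses import dataclass


@dataclass
class GuestSpec:
    name: str
    cpus: int
    memory_gb: int
    disk_gb: int


class Type0Hypervisor:
    def __init__(self, cpus, memory_gb):
        self.free_cpus = cpus
        self.free_memory_gb = memory_gb
        self.guests = {}  # name -> GuestSpec (the stored configuration)

    def create(self, spec):
        """Dedicate resources to a new guest, or fail if they are not free."""
        if spec.cpus > self.free_cpus or spec.memory_gb > self.free_memory_gb:
            raise RuntimeError(f"cannot create {spec.name}: resources unavailable")
        self.free_cpus -= spec.cpus
        self.free_memory_gb -= spec.memory_gb
        self.guests[spec.name] = spec

    def delete(self, name):
        """Release the guest's resources and forget its configuration."""
        spec = self.guests.pop(name)
        self.free_cpus += spec.cpus
        self.free_memory_gb += spec.memory_gb


host = Type0Hypervisor(cpus=3, memory_gb=16)
host.create(GuestSpec("guest1", cpus=2, memory_gb=4, disk_gb=10))
try:
    host.create(GuestSpec("guest2", cpus=2, memory_gb=4, disk_gb=10))
    second_created = True
except RuntimeError:
    second_created = False  # only one CPU remains unallocated
host.delete("guest1")
```

The second request fails because only one CPU is left unallocated; once guest1 is deleted, all three CPUs are available again.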
Type 0 Hypervisor
Type 0 hypervisors have existed for many years under many names, including
“partitions” and “domains.” They are a hardware feature, and that brings its
own positives and negatives. Operating systems need do nothing special to
take advantage of their features. The VMM itself is encoded in the firmware and
loaded at boot time. In turn, it loads the guest images to run in each partition.
The feature set of a type 0 hypervisor tends to be smaller than those of the other
types because it is implemented in hardware. For example, a system might be
split into four virtual systems, each with dedicated CPUs, memory, and I/O
devices. Each guest believes that it has dedicated hardware because it does,
simplifying many implementation details.
I/O presents some difficulty, because it is not easy to dedicate I/O devices
to guests if there are not enough. What if a system has two Ethernet ports and
more than two guests, for example? Either all guests must get their own I/O
devices, or the system must provide I/O device sharing. In these cases, the
hypervisor manages shared access or grants all devices to a control partition.
In the control partition, a guest operating system provides services (such as net-
working) via daemons to other guests, and the hypervisor routes I/O requests
appropriately. Some type 0 hypervisors are even more sophisticated and can
move physical CPUs and memory between running guests. In these cases, the
guests are paravirtualized, aware of the virtualization and assisting in its exe-
cution. For example, a guest must watch for signals from the hardware or VMM
that a hardware change has occurred, probe its hardware devices to detect the
change, and add or subtract CPUs or memory from its available resources.
Because type 0 virtualization is very close to raw hardware execution, it
should be considered separately from the other methods discussed here. A type
0 hypervisor can run multiple guest operating systems (one in each hardware
partition). All of those guests, because they are running on raw hardware, can
in turn be VMMs. Essentially, each guest operating system in a type 0 hypervisor
is a native operating system with a subset of hardware made available to it.
Because of that, each can have its own guest operating systems (Figure 18.5).
Other types of hypervisors usually cannot provide this virtualization-within-
virtualization functionality.
Type 1 Hypervisor
Type 1 hypervisors are commonly found in company data centers and are, in a
sense, becoming “the data-center operating system.” They are special-purpose
operating systems that run natively on the hardware, but rather than providing
system calls and other interfaces for running programs, they create, run, and
manage guest operating systems. In addition to running on standard hard-
ware, they can run on type 0 hypervisors, but not on other type 1 hypervisors.
Whatever the platform, guests generally do not know they are running on
anything but the native hardware.
Type 1 hypervisors run in kernel mode, taking advantage of hardware pro-
tection. Where the host CPU allows, they use multiple modes to give guest oper-
ating systems their own control and improved performance. They implement
device drivers for the hardware they run on, since no other component could
do so. Because they are operating systems, they must also provide CPU schedul-
ing, memory management, I/O management, protection, and even security.
Frequently, they provide APIs, but those APIs support applications in guests or
external applications that supply features like backups, monitoring, and secu-
rity. Many type 1 hypervisors are closed-source commercial offerings, such as
VMware ESX, while some are open source or hybrids of open and closed source,
such as Citrix XenServer and its open Xen counterpart.
By using type 1 hypervisors, data-center managers can control and man-
age the operating systems and applications in new and sophisticated ways.
An important benefit is the ability to consolidate more operating systems and
applications onto fewer systems. For example, rather than having ten systems
running at 10 percent utilization each, a data center might have one server man-
age the entire load. If utilization increases, guests and their applications can be
moved to less-loaded systems live, without interruption of service. Using snap-
shots and cloning, the system can save the states of guests and duplicate those
states—a much easier task than restoring from backups or installing manually
or via scripts and tools. The price of this increased manageability is the cost
of the VMM (if it is a commercial product), the need to learn new management
tools and methods, and the increased complexity.
Another kind of type 1 hypervisor consists of general-purpose operating
systems with VMM functionality. Here, an operating system such as Red Hat
Enterprise Linux, Windows, or Oracle Solaris performs its normal duties
as well as providing a VMM allowing other operating systems to run as guests.
Because of their extra duties, these hypervisors typically provide fewer virtual-
ization features than other type 1 hypervisors. In many ways, they treat a guest
operating system as just another process, but they provide special handling
when the guest tries to execute special instructions.
Type 2 Hypervisor
Type 2 hypervisors are less interesting to us as operating-system explorers,
because there is very little operating-system involvement in these application-
level virtual machine managers. This type of VMM is simply another process
run and managed by the host, and even the host does not know that virtual-
ization is happening within the VMM.
Type 2 hypervisors have limits not associated with some of the other types.
For example, a user needs administrative privileges to access many of the
hardware assistance features of modern CPUs. If the VMM is being run by a
standard user without additional privileges, the VMM cannot take advantage
of these features. Because of this limitation, and because of the extra
overhead of running a general-purpose operating system in addition to the
guest operating systems, type 2 hypervisors tend to have poorer overall
performance than type 0 or type 1 hypervisors.
As is often the case, the limitations of type 2 hypervisors also provide
some benefits. They run on a variety of general-purpose operating systems,
and running them requires no changes to the host operating system. A student
can use a type 2 hypervisor, for example, to test a non-native operating system
without replacing the native operating system. In fact, on an Apple laptop,
a student could have versions of Windows, Linux, Unix, and less common
operating systems all available for learning and experimentation.
Paravirtualization
As we’ve seen, paravirtualization works differently than the other types of
virtualization. Rather than try to trick a guest operating system into believing
it has a system to itself, paravirtualization presents the guest with a system
that is similar but not identical to the guest’s preferred system. The guest must
be modified to run on the paravirtualized virtual hardware. The gain for this
extra work is more efficient use of resources and a smaller virtualization layer.
The Xen VMM became the leader in paravirtualization by implementing
several techniques to optimize the performance of guests as well as of the host
system. For example, as mentioned earlier, some VMMs present virtual devices
to guests that appear to be real devices. Instead of taking that approach, the Xen
VMM presented clean and simple device abstractions that allow efficient I/O as
well as good I/O-related communication between the guest and the VMM. For
each device used by each guest, there was a circular buffer shared by the guest
and the VMM via shared memory. Read and write data were placed in this buffer,
as shown in Figure 18.6.
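A circular buffer of this kind can be sketched in a few lines. The version below is a simplified model, not Xen's actual ring layout: the class name RingBuffer, its methods, and the use of ever-increasing head and tail counters are assumptions made for illustration, and an ordinary Python list stands in for the shared memory region.

```python
class RingBuffer:
    """Fixed-size circular buffer with one producer and one consumer."""

    def __init__(self, capacity):
        self.slots = [None] * capacity   # stands in for shared memory
        self.capacity = capacity
        self.head = 0   # count of items consumed so far
        self.tail = 0   # count of items produced so far

    def put(self, item):
        # The producer (for example, the guest) adds a request.
        if self.tail - self.head == self.capacity:
            return False                 # buffer full; caller must retry
        self.slots[self.tail % self.capacity] = item
        self.tail += 1
        return True

    def get(self):
        # The consumer (for example, the VMM) removes the oldest request.
        if self.head == self.tail:
            return None                  # buffer empty
        item = self.slots[self.head % self.capacity]
        self.head += 1
        return item
```

Because the producer advances only the tail and the consumer advances only the head, each side updates its own index, which is one reason such buffers suit communication between two parties that share memory.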
For memory management, Xen did not implement nested page tables.
Rather, each guest had its own set of page tables, set to read-only. Xen required
the guest to use a specific mechanism, a hypercall from the guest to the VMM,
when a page-table change was needed. This meant that the guest operating
system’s kernel code had to be changed from the default code to these
Xen-specific methods. To optimize performance, Xen allowed the guest to queue
up multiple page-table changes asynchronously via hypercalls; the guest then
checked that the changes were complete before continuing operation.
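The batching of page-table changes can be modeled as follows. This is a sketch under stated assumptions rather than Xen's real interface: ToyHypervisor, ToyGuest, update_page_table, map_page, and flush are hypothetical names, and the ownership check performed by the hypervisor is an assumption added to show why the guest cannot be allowed to write its page tables directly.

```python
class ToyHypervisor:
    """Hypothetical VMM that owns the guest's (read-only) page table."""

    def __init__(self, guest_frames):
        self.guest_frames = set(guest_frames)  # frames the guest may map
        self.page_table = {}                   # virtual page -> frame
        self.hypercalls = 0                    # number of guest-to-VMM calls

    def update_page_table(self, batch):
        # Hypothetical hypercall: validate and apply a list of
        # (virtual_page, frame) updates; return how many were applied.
        self.hypercalls += 1
        applied = 0
        for vpage, frame in batch:
            if frame not in self.guest_frames:
                break                          # refuse a frame the guest does not own
            self.page_table[vpage] = frame
            applied += 1
        return applied


class ToyGuest:
    """Hypothetical paravirtualized guest kernel."""

    def __init__(self, hypervisor):
        self.hv = hypervisor
        self.pending = []                      # queued page-table changes

    def map_page(self, vpage, frame):
        # Queue the change instead of writing the page table directly.
        self.pending.append((vpage, frame))

    def flush(self):
        # Submit all queued changes in one hypercall, then confirm that
        # every change was applied before continuing.
        batch, self.pending = self.pending, []
        applied = self.hv.update_page_table(batch)
        if applied != len(batch):
            raise RuntimeError("page-table update rejected")
```

Queueing two mappings and then calling flush() results in a single hypercall rather than two, which is the performance benefit the batching is meant to provide.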
Xen allowed virtualization of x86 CPUs without the use of binary trans-
lation, instead requiring modifications in the guest operating systems like the
one described above. Over time, Xen has taken advantage of hardware features
supporting virtualization. As a result, it no longer requires modified guests and
essentially does not need the paravirtualization method. Paravirtualization is
still used in other solutions, however, such as type 0 hypervisors.
Programming-Environment Virtualization
Another kind of virtualization, based on a different execution model, is the
virtualization of programming environments. Here, a programming language
is designed to run within a custom-built virtualized environment. For exam-
ple, Oracle’s Java has many features that depend on its running in the Java
virtual machine (JVM), including specific methods for security and memory
management.
If we define virtualization as including only duplication of hardware, this is
not really virtualization at all. But we need not limit ourselves to that definition.
Instead, we can define a virtual environment, based on APIs, that provides a set
of features we want to have available for a particular language and programs
written in that language. Java programs run within the JVM environment, and
the JVM is compiled to be a native program on systems on which it runs. This
arrangement means that Java programs are written once and then can run on
any system (including all of the major operating systems) on which a JVM is
available. The same can be said of interpreted languages, which run inside
programs that read each instruction and interpret it into native operations.
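The idea of a program that reads each instruction and carries it out with native operations can be shown with a toy stack machine. The instruction names push, add, and mul and the function run are invented for this sketch; they are not JVM bytecodes, and a real JVM is far more elaborate.

```python
def run(program):
    """Interpret a list of instructions for a toy stack machine."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])      # place a constant on the stack
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)        # carried out by the host's native add
        elif op == "mul":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError("unknown instruction: " + op)
    return stack.pop()                 # result is the top of the stack


# (2 + 3) * 4 expressed in the toy instruction set
example = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
```

The same instruction list runs unchanged on any system for which the interpreter itself is available, which mirrors the portability argument made above.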
Emulation
Virtualization is probably the most common method for running applications
designed for one operating system on a different operating system, but on the
same CPU. This method works relatively efficiently because the applications
were compiled for the instruction set that the target system uses.
But what if an application or operating system needs to run on a different
CPU? Here, it is necessary to translate all of the source CPU’s instructions so
that they are turned into the equivalent instructions of the target CPU. Such an
environment is no longer virtualized but rather is fully emulated.
Emulation is useful when the host system has one system architecture
and the guest system was compiled for a different architecture. For example,
suppose a company has replaced its outdated computer system with a new
system but would like to continue to run certain important programs that were
compiled for the old system. The programs could be run in an emulator that
translates each of the outdated system’s instructions into the native instruction
set of the new system. Emulation can increase the life of programs and allow
us to explore old architectures without having an actual old machine.
As may be expected, the major challenge of emulation is performance.
Instruction-set emulation may run an order of magnitude slower than native
instructions, because it may take ten instructions on the new system to read,
parse, and simulate an instruction from the old system. Thus, unless the new
machine is ten times faster than the old, the program running on the new
machine will run more slowly than it did on its native hardware. Another
challenge for emulator writers is that it is difficult to create a correct emulator
because, in essence, this task involves writing an entire CPU in software.
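Both points can be made concrete with a small sketch. The instruction set below (LOADI, ADD, DEC, JNZ, HALT) is invented for illustration and corresponds to no real CPU, and the function names emulate and relative_runtime are hypothetical. The first function is a minimal fetch, decode, and execute loop; the second restates the performance argument as arithmetic.

```python
def emulate(memory, registers):
    """Run toy-ISA code until HALT; return the guest instructions executed."""
    pc = 0                                 # emulated program counter
    executed = 0
    while True:
        opcode, a, b = memory[pc]          # fetch
        pc += 1
        executed += 1
        if opcode == "LOADI":              # decode and execute
            registers[a] = b               # registers[a] = constant b
        elif opcode == "ADD":
            registers[a] += registers[b]   # registers[a] += registers[b]
        elif opcode == "DEC":
            registers[a] -= 1              # operand b is unused
        elif opcode == "JNZ":
            if registers[a] != 0:
                pc = b                     # branch to address b
        elif opcode == "HALT":
            return executed
        else:
            raise ValueError("illegal instruction: " + opcode)


def relative_runtime(host_instructions_per_guest_instruction, host_speedup):
    """Runtime on the new machine relative to the old one (1.0 = equal)."""
    return host_instructions_per_guest_instruction / host_speedup


# Sum 5 + 4 + 3 + 2 + 1 into register 0.
program = [
    ("LOADI", 0, 0),
    ("LOADI", 1, 5),
    ("ADD", 0, 1),
    ("DEC", 1, 0),
    ("JNZ", 1, 2),
    ("HALT", 0, 0),
]
```

Every guest instruction costs the host a fetch, several comparisons, and the operation itself. Under the assumption of ten host instructions per guest instruction, relative_runtime(10, 10) gives 1.0, so a host that is exactly ten times faster only breaks even, and relative_runtime(10, 4) gives 2.5, meaning the program takes two and a half times as long as it did on its native hardware.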
In spite of these challenges, emulation is very popular, particularly in
gaming circles. Many popular video games were written for platforms that are
no longer in production. Users who want to run those games can frequently
find an emulator of such a platform and then run the game unmodified within
the emulator. Modern systems are so much faster than old game consoles that
even the Apple iPhone has game emulators and games available to run within
them.
Application Containment
The goal of virtualization in some instances is to provide a method to segregate
applications, manage their performance and resource use, and create an easy
way to start, stop, move, and manage them. In such cases, perhaps full-fledged
virtualization is not needed. If the applications are all compiled for the same
operating system, then we do not need complete virtualization to provide these
features. We can instead use application containment.
Consider one example of application containment. Starting with version
10, Oracle Solaris has included containers, or zones, that create a virtual layer
between the operating system and the applications. In this system, only one
kernel is installed, and the hardware is not virtualized. Rather, the operating
system and its devices are virtualized, providing processes within a zone with
the impression that they are the only processes on the system. One or more
containers can be created, and each can have its own applications, network
stacks, network address and ports, user accounts, and so on. CPU and memory
resources can be divided among the zones and the system-wide processes.
Each zone, in fact, can run its own scheduler to optimize the performance of
its applications on the allotted resources. Figure 18.7 shows a Solaris 10 system
with two containers and the standard “global” user space.
Containers are much lighter weight than other virtualization methods.
That is, they use fewer system resources and are faster to instantiate and
destroy, more similar to processes than to virtual machines. For this reason,
they are becoming more commonly used, especially in cloud computing. FreeBSD
was perhaps the first operating system to include a container-like feature
(called “jails”), and AIX has a similar feature. The LXC container feature
for Linux reached its 1.0 release in 2014. It is now included in the common
Linux distributions via
a flag in the clone() system call. (The source code for LXC is available at
https://linuxcontainers.org/lxc/downloads.)
Containers are also easy to automate and manage, leading to orchestration
tools like Docker and Kubernetes. Orchestration tools are means of automating
and coordinating systems and services. Their aim is to make it simple to
run entire suites of distributed applications, just as operating systems make
it simple to run a single program. These tools offer rapid deployment of
full applications, consisting of many processes within containers, and also
offer monitoring and other administration features. For more on Docker, see
https://www.docker.com/what-docker. Information about Kubernetes can be
found at https://kubernetes.io/docs/concepts/overview/what-is-kubernetes.