Networking - CL1 Flashcards

1
Q

Understanding networks: layers and protocols

A

Network Protocols Part One
A network protocol is a set of rules for communicating between
computers. Protocols govern format, timing, sequencing, and error
control. Without these rules, the computer cannot make sense of the
stream of incoming bits.
But there is more than just basic communication. Suppose you plan to
send a file from one computer to another. You could simply send it all in
one single string of data. Unfortunately, that would stop others from using
the network for the entire time it takes to send the message. This would
not be appreciated by the other users. Additionally, if an error occurred
during the transmission, the entire file would have to be sent again. To
resolve both of these problems, the file is broken into small pieces called
packets and the packets are grouped in a certain fashion. This means
that information must be added to tell the receiver where each group
belongs in relation to others, but this is a minor issue. To further improve
transmission reliability, timing information and error correcting information
are added.
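The packetizing idea above can be sketched in a few lines of Python. This is a minimal illustration, not any real protocol's format: we assume a tiny hypothetical payload size and a header that carries only a sequence number, so the receiver can put the pieces back in order.

```python
# A minimal sketch of packetizing data for transmission. The payload size
# and the (sequence_number, payload) "header" are illustrative only; real
# networks use richer headers and payloads of ~1500 bytes on Ethernet.

PAYLOAD_SIZE = 4  # bytes per packet (deliberately tiny for illustration)

def packetize(data: bytes) -> list[tuple[int, bytes]]:
    """Split data into (sequence_number, payload) packets."""
    return [
        (seq, data[i:i + PAYLOAD_SIZE])
        for seq, i in enumerate(range(0, len(data), PAYLOAD_SIZE))
    ]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original data, even if packets arrive out of order."""
    return b"".join(payload for _, payload in sorted(packets))

packets = packetize(b"hello, network!")
packets.reverse()  # simulate out-of-order arrival
assert reassemble(packets) == b"hello, network!"
```

Because each packet names its own place in the sequence, an error in transit means resending one small piece rather than the whole file.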
Because of this complexity, computer communication is broken down into
steps. Each step has its own rules of operation and, consequently, its
own protocol. These steps must be executed in a certain order, from the
top down on transmission and from the bottom up on reception. Because
of this hierarchical arrangement, the term protocol stack is often used to
describe these steps. A protocol stack, therefore, is a set of rules for
communication, and each step in the sequence has its own subset of
rules.
What is a protocol, really? It is software that resides either in a
computer’s memory or in the memory of a transmission device, like a
network interface card. When data is ready for transmission, this software
is executed. The software prepares data for transmission and sets the
transmission in motion. At the receiving end, the software takes the data
off the wire and prepares it for the computer by taking off all the
information added by the transmitting end.

There are a lot of protocols, and this often leads to confusion. A Novell
network can communicate through its own set of rules (its own protocol
called IPX/SPX), Microsoft does it another way (NetBEUI), DEC once did
it a third way (DECnet), and IBM does it yet a fourth (NetBIOS). Since the
transmitter and the receiver have to “speak” the same protocol, these
four systems cannot talk directly to each other. And even if they could
directly communicate, there is no guarantee the data would be usable
once it was communicated.
Anyone who’s ever wanted to transfer data from an IBM-compatible
personal computer to an Apple Macintosh computer realizes that what
should be a simple procedure is sometimes anything but. These two
popular computers use widely differing—and incompatible—file systems.
That makes exchanging information between them impossible, unless
you have translation software or a LAN. Even with a network, file transfer
between these two types of computers isn’t always transparent. [Editor’s
note: Even in the Internet age, Mac/Windows/Unix file exchange is often
less than perfectly transparent.]
If two types of personal computers can’t communicate easily, imagine the
problems occurring between PCs and mainframe computers, which
operate in vastly different environments and usually under their own
proprietary operating software and protocols. For example, the original
IBM PC’s peripheral interface—known as a bus—transmitted data eight
bits at a time. The newer 386, 486, and Pentium PCs have 32-bit buses,
and mainframes have even wider buses. This means that peripherals
designed to operate with one bus are incompatible with another bus, and
this includes network interface cards (NICs). Similar incompatibilities also
exist with software. For instance, Unix-based applications (and often the
data generated with them) cannot be used directly on PCs operating
under Windows or MS-DOS. Resolving some of these incompatibilities is
where protocol standards fit in.
A protocol standard is a set of rules for computer communication that has
been widely agreed upon and implemented by many vendors, users, and
standards bodies. Ideally, a protocol standard should allow computers to
talk to each other, even if they are from different vendors. Computers
don’t have to use an industry-standard protocol to communicate, but if
they use a proprietary protocol then they can only communicate with
equipment of their own kind.
There are many standard protocols, none of which could be called
universal, but the successful ones can be characterized with something
called the OSI model. The standards and protocols associated with the
OSI reference model are a cornerstone of the open systems concept for
linking the literally dozens of dissimilar computers found in offices
throughout the world.

The OSI Model
The Open System Interconnection (OSI) model includes a set of
protocols that attempt to define and standardize the data communications
process. The OSI protocols were defined by the International
Organization for Standardization (ISO). The OSI protocols have received
the support of most major computer and network vendors, many large
customers, and most governments, including the United States.
The OSI model is a concept that describes how data communications
should take place. It divides the process into seven groups, called layers.
Into these layers are fitted the protocol standards developed by the ISO
and other standards bodies, including the Institute of Electrical and
Electronics Engineers (IEEE), the American National Standards Institute
(ANSI), and the International Telecommunication Union (ITU), formerly
known as the CCITT (Comité Consultatif International Télégraphique et
Téléphonique).
The OSI model is not a single definition of how data communications
actually takes place in the real world. Numerous protocols may exist at
each layer. The OSI model states how the process should be divided and
what protocols should be used at each layer. If a network vendor
implements one of the protocols at each layer, its network components
should work with other vendors’ offerings.
The OSI model is modular. Each successive layer of the OSI model
works with the one above and below it. At least in theory, you may
substitute one protocol for another at the same layer without affecting the
operation of layers above or below. For example, Token Ring or Ethernet
hardware should operate with multiple upper-layer services, including the
transport protocols, network operating system, internetwork protocols,
and applications interfaces. However, for this interoperability to work,
vendors must create products that meet the OSI model’s specifications.
Although each layer of the OSI model provides its own set of functions, it
is possible to group the layers into two distinct categories. The first four
layers— physical, data link, network, and transport—provide the end-to-
end services necessary for the transfer of data between two systems.
These layers provide the protocols associated with the communications
network used to link two computers together.
The top three layers—the application, presentation, and session layers—
provide the application services required for the exchange of information.
That is, they allow two applications, each running on a different node of
the network, to interact with each other through the services provided by
their respective operating systems.
A graphical illustration of the OSI model is shown above. The following is
a description of just what each layer does.
1. The Physical layer provides the electrical and mechanical
interface to the network medium (the cable). This layer gives
the data-link layer (layer 2) its ability to transport a stream of
serial data bits between two communicating systems; it conveys
the bits that move along the cable. It is responsible for making
sure that the raw bits get from one place to another, no matter
what shape they are in, and deals with the mechanical and
electrical characteristics of the cable.

The OSI model is not a single definition of how data communications
takes place. It states how the processes should be divided and offers
several options. In addition to the OSI protocols, as defined by ISO,
networks can use the TCP/IP protocol suite, the IBM Systems Network
Architecture (SNA) suite, and others. TCP/IP and SNA roughly follow
the OSI structure.
2. The Data-Link layer handles the physical transfer, framing (the
assembly of data into a single unit or block), flow control and
error-control functions over a single transmission link; it is
responsible for getting the data packaged for the Physical layer.
The data link layer provides the network layer (layer 3) reliable
information-transfer capabilities. The data-link layer is often
subdivided into two parts—Logical Link Control (LLC) and
Medium Access Control (MAC)—depending on the
implementation.
3. The Network layer establishes, maintains, and terminates
logical and physical connections among multiple interconnected
networks. The network layer is responsible for translating logical
addresses, or names, into physical (or data-link) addresses. It
provides network routing and flow-control functions across the
computer-network interface.
4. The Transport layer ensures data is successfully sent and
received between two end nodes. If data is sent incorrectly, this
layer has the responsibility to ask for retransmission of the data.
Specifically, it provides a reliable, network-independent
message-interchange service to the top three application-
oriented layers. This layer acts as an interface between the
bottom and top three layers. By providing the session layer
(layer 5) with a reliable message transfer service, it hides the
detailed operation of the underlying network from the session
layer.
5. The Session layer decides when to turn communication on and
off between two computers—it provides the mechanisms that
control the data-exchange process and coordinates the
interaction between them. It sets up and clears communication
channels between two communicating components. Unlike the
network layer (layer 3), it deals with the programs running in
each machine to establish conversations between them. Some
of the most commonly encountered protocol stacks, including
TCP/IP, don’t implement a session layer.
6. The Presentation layer performs code conversion and data
reformatting (syntax translation). It is the translator of the
network, making sure the data is in the correct form for the
receiving application. Of course, both the sending and receiving
applications must be able to use data subscribing to one of the
available abstract data syntax forms. Most commonly,
applications handle these sorts of data translations themselves
rather than handing them off to a Presentation layer.
7. The Application layer provides the interface between the
software running in a computer and the network. It provides
functions to the user’s software, including file transfer access
and management (FTAM) and electronic mail service.
Unfortunately, protocols in the real world do not conform precisely to
these neat definitions. Some network products and architectures combine
layers. Others leave layers out. Still others break the layers apart. But no
matter how they do it, all working network products achieve the same
result—getting data from here to there. The question is, do they do it in a
way that is compatible with networks in the rest of the world?

What OSI Is And Is Not
While discussing the OSI reference model it is important to understand
what the model does not specify as well as what it actually spells out.
The ISO created the OSI reference model solely to describe the external
behavior of electronic systems, not their internal functions.
The reference model does not determine programming or operating
system functions, nor does it specify an application programming
interface (API). Neither does it dictate the end-user interface—that is, the
command-line and/or icon-based prompts a user uses to interact with a
computer system.
The OSI standards merely describe what is placed on a network cable
and when and how it will be placed there. They do not state how vendors
must build their computers, only the kinds of behavior these systems may
exhibit while performing certain communications operations.
The OSI standards are distinct from the OSI suite of protocols. This
concept permits a vendor to develop network elements that are more or
less ignorant of the other components on the network. They are said to
be ignorant in that they may need to know that other network
components exist, but not the specific details about their operating
systems or interface buses. One of the primary benefits of this concept is
that vendors can change the internal design of their network components
without affecting their network functionality, as long as they maintain the
OSI-prescribed external attributes. The figure below shows the protocols
in the OSI suite.

Connection Types
The OSI protocol suite is inherently connection-oriented, but the services
each OSI layer provides can be either connection-oriented or
connectionless. In the three-step connection-oriented mode of operation
(the steps are connection establishment, data transfer, and connection
release), an explicit binding between two systems takes place.
In connectionless operation, no such explicit link occurs; data transfer
takes place with no specified connection and disconnection function
occurring between the two communicating systems. Connectionless
communication is also known as datagram communication.
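The two service types surface directly in the Berkeley sockets API that most operating systems expose: SOCK_STREAM gives connection-oriented (TCP-style) service, while SOCK_DGRAM gives connectionless datagram service. The sketch below shows the connectionless case on the loopback interface; the addresses and message are illustrative.

```python
import socket

# Connectionless (datagram) communication: no connection establishment
# or release, each datagram stands alone. A connection-oriented socket
# (socket.SOCK_STREAM) would instead require the three steps: connect,
# transfer, close.

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))        # let the OS pick a free port
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"datagram", addr)       # no connect() needed
data, _ = receiver.recvfrom(1024)
assert data == b"datagram"

sender.close()
receiver.close()
```

Note that nothing tells the sender whether the datagram arrived; that lack of an explicit binding is exactly what "connectionless" means.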

At The Physical Layer
Let’s compare some real protocols to the OSI model. The best known
physical layer standards of the OSI model are those from the IEEE. That
is, the ISO adopted some of the IEEE’s physical network standards as
part of its OSI model, including IEEE 802.3 or Ethernet, IEEE 802.4 or
token-passing bus, and IEEE 802.5 or Token Ring. ISO has changed the
numbering scheme, however, so 802.3 networks are referred to as ISO
8802-3, 802.4 networks are ISO 8802-4, and 802.5 networks are ISO
8802-5.
Each physical layer standard defines the network’s physical
characteristics and how to get raw data from one place to another. They
also define how multiple computers can simultaneously use the network
without interfering with each other. (Technically, this last part is a job for
the data-link layer, but we’ll deal with that later.)
IEEE 802.3 defines a network that can transmit data at 10Mbps and uses
a logical bus (or a straight line) layout. (Physically, the network can be
configured as a bus or a star.) Data is simultaneously visible to all
machines on the network and is nondirectional on the cable. All machines
receive every frame, but only those meant to receive the data will
process the frame and pass it to the next layer of the stack. Network
access is determined by a protocol called Carrier Sense Multiple
Access/Collision Detection (CSMA/CD). CSMA/CD lets any computer
send data whenever the cable is free of traffic. If the data collides with
another data packet, both computers “back off,” or wait a random time,
then try again to send the data until access is permitted. Thus, once there
is a high level of traffic, the more users there are, the more crowded and
slower the network will become. Ethernet has found wide acceptance in
office automation networks.
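The "back off and wait a random time" behavior can be simulated in a few lines. This is a simplified sketch of 802.3's truncated binary exponential backoff, in which the window of possible wait times doubles after each collision (capped at an exponent of 10); slot timing and carrier sensing are omitted.

```python
import random

# Toy simulation of CSMA/CD backoff: after each collision, every station
# picks a random slot from a window that doubles per attempt. Two
# stations stop colliding as soon as they pick different slots.

def backoff_slots(attempt: int) -> int:
    """Random wait in slot times; window doubles each attempt, capped."""
    window = 2 ** min(attempt, 10)     # 802.3 caps the exponent at 10
    return random.randrange(window)

random.seed(1)
attempt = 1
while True:
    a, b = backoff_slots(attempt), backoff_slots(attempt)
    if a != b:                         # different slots: transmission succeeds
        break
    attempt += 1                       # same slot: collide again, widen window

assert a != b
```

The doubling window is why a lightly loaded Ethernet feels instantaneous but a heavily loaded one degrades: more stations mean more collisions, and more collisions mean longer random waits.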
IEEE 802.4 defines a physical network that has a bus layout. Like 802.3,
Token Bus is a shared medium network. All machines receive all data but
do not respond unless data is addressed to them. But unlike 802.3,
network access is determined by a token that moves around the network.
The token is visible to every device but only the device that is next in line
for the token gets it. Once a device has the token it may transmit data.
The Manufacturing Automation Protocol (MAP) and Technical Office
Protocol (TOP) standards use an 802.4 physical layer. Token Bus has
had little success outside of factory automation networks.
IEEE 802.5 defines a network that transmits data at 4Mbps or 16Mbps
and uses a logical ring layout, but is physically configured as a star. Data
moves around the ring from station to station, and each station
regenerates the signal. It does not support simultaneous multiple access
as Ethernet does. The network access protocol is token-passing. The
token and data move about in a ring, rather than over a bus as they do in
Token Bus. Token Ring has found moderate acceptance in office
automation networks and a greater degree of support in IBM-centric
environments.
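Token-passing access control, common to both 802.4 and 802.5, can be sketched as a queue rotating around a ring. The station names and round count are illustrative; real token protocols also handle token loss, priorities, and station insertion, none of which is modeled here.

```python
from collections import deque

# Minimal sketch of token-passing: only the station holding the token may
# transmit, and the token then moves to the next station on the ring.

stations = deque(["A", "B", "C", "D"])   # logical ring order
transmissions = []

def pass_token(rounds: int) -> None:
    for _ in range(rounds):
        holder = stations[0]             # station currently holding the token
        transmissions.append(holder)     # only the holder may send
        stations.rotate(-1)              # token moves to the next station

pass_token(6)
assert transmissions == ["A", "B", "C", "D", "A", "B"]
```

Unlike CSMA/CD, access here is deterministic: every station is guaranteed its turn, which is why token methods appealed to factory automation, where worst-case delay matters more than average throughput.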
There are other physical and data-link layer standards, some that
conform to the OSI model and others that don’t. ARCnet is a well-known
one that only became standardized in 1998, long after the time when it
had any commercial significance. It uses a token-passing bus access
method, but not the same as does IEEE 802.4. LocalTalk is Apple’s
proprietary network that transmits data at 230.4Kbps and uses CSMA/CA
(Collision Avoidance). Fiber Distributed Data Interface (FDDI) is an ANSI
and OSI standard for a fiber-optic LAN that uses a token-passing protocol
to transmit data at 100Mbps on a ring.

When It Began
The International Organization for Standardization, based in Geneva, Switzerland,
is a multinational body of representatives from the standards-setting
agencies of about 90 countries. These agencies include the American
National Standards Institute (ANSI) and the British Standards Institution (BSI).
Because of the multinational nature of Europe, and its critical need for
intersystem communication, the market for OSI-based products is
particularly strong there. As a result, the European Computer
Manufacturers’ Association (ECMA) has played a major role in
developing the OSI standards. In fact, before the Internet’s Transmission
Control Protocol/Internet Protocol (TCP/IP) began to dominate
international networks, European networking vendors and users were
generally further advanced in network standards, based on OSI
implementations, than were their American counterparts, who relied
principally on proprietary solutions such as IBM’s Systems Network
Architecture (SNA) or TCP/IP.
Creating the OSI standards was a long, drawn-out process: The ISO
began work on OSI protocols in the late 1970s, finally releasing its seven-
layer architecture in 1984. It wasn’t until 1988 that the five-step
standards-setting process finally resulted in stabilized protocols for the
upper layers of the OSI reference model. [Editor’s note: From the
perspective of 2000, the primary worldwide significance of the OSI
protocols was in the use of the seven layer stack model as a way of
learning about networks. While there remain many implementations of
OSI protocols, particularly in Europe where they were in some cases
legally imposed, it’s clear that worldwide, the lion’s share of new
development and investment is devoted to TCP/IP and will be for the
foreseeable future.]

Network Protocols Part Two
The Data-Link layer (the second OSI layer) is often divided into two
sublayers: the Logical Link Control (LLC) and the Medium Access Control
(MAC). The IEEE also defines standards at the data-link layer. The ISO
standards for the MAC layer, or lower half of the data-link layer, were
taken directly from the IEEE 802.x standards.
Medium Access Control, as its name suggests, is the protocol that
determines which computer gets to use the cable (the transmission
medium) when several computers are trying. For example, 802.3 allows
packets to collide with each other, forcing the computers to retry a
transmission until it is sent successfully. 802.4 and 802.5 limit
conversation to the computer with the token. Remember, this is done in
fractions of a second, so even when the network is busy, users don’t wait
very long for access on any of these three network types.
The upper half of the data-link layer, the LLC, provides reliable data
transfer over the physical link. In essence, it manages the physical link.
The IEEE splits the data-link layer in half because the layer has two jobs
to do. The first is to coordinate the physical transfer of data. The second
is to manage access to the physical medium. Dividing the layer allows for
more modularity and therefore more flexibility. The type of medium
access control has more to do with the physical requirements of the
network than the actual management of data transfer. In other words, the
MAC layer is closer to the physical layer than the LLC layer. By dividing
the layer, a number of MAC layers can be created, each corresponding to
a different physical layer, but just one LLC layer can handle them all. This
increases flexibility and gives the LLC an important role in providing an
interface between the various MAC layers and the higher-layer protocols.
The role of the data link’s upper sublayer is so crucial that the IEEE gave
it a standard of its own: 802.2 LLC.
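The payoff of the LLC/MAC split can be sketched in code: one LLC implementation works over any MAC that exposes the same interface. The class and method names here are our own illustration, not the actual 802.2 service primitives.

```python
# Sketch of one LLC layered over multiple MACs. Each MAC corresponds to
# a different physical layer, but a single LLC handles them all because
# they share one interface (here, a send_frame method).

class EthernetMAC:
    def send_frame(self, payload: bytes) -> str:
        return f"802.3 frame carrying {len(payload)} bytes"

class TokenRingMAC:
    def send_frame(self, payload: bytes) -> str:
        return f"802.5 frame carrying {len(payload)} bytes"

class LLC:
    """An 802.2-style LLC: manages the link, delegates medium access."""
    def __init__(self, mac):
        self.mac = mac
    def transmit(self, data: bytes) -> str:
        return self.mac.send_frame(data)

for mac in (EthernetMAC(), TokenRingMAC()):
    print(LLC(mac).transmit(b"hello"))
```

Swapping the MAC changes how the cable is shared; nothing above the LLC has to know.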
Besides 802.2, other protocols can perform the LLC functions. High-level
Data-Link Control (HDLC) is a protocol from ISO, which also conforms to
the OSI model. IBM’s Synchronous Data-Link Control (SDLC) does not
conform to the OSI model but performs functions similar to the data-link
layer. Digital Equipment’s DDCMP or Digital Data Communications
Protocol provides similar functions.

Three Transport Protocols
The ISO has established protocol standards for the middle layers of the
OSI model. The transport layer, at layer four, ensures that data is reliably
transferred among transport services and users. Layer five, the session
layer, is responsible for process-to-process communication. The line
between the session and transport layers is often blurred.
As yet, no ISO transport or session layer has been implemented on a
widespread basis, nor has the complete OSI protocol stack been
established. To make matters more confusing, most middle-layer
protocols on the market today do not fit neatly into the OSI model’s
transport and session layers, since many were created before the ISO
began work on the OSI model.
The good news is many existing protocols are being incorporated into the
OSI model. Where existing protocols are not incorporated, interfaces to
the OSI model are being implemented. This is the case for TCP/IP and
IPX, which are the major middle-layer protocols available today.
In the PC LAN environment, NetBIOS has been an important protocol.
IBM developed NetBIOS (or Network Basic Input/Output System) as an
input/output system for networks. NetBIOS can be considered a session-
layer protocol that acts as an application interface to the network. It
provides the tools for a program to establish a session with another
program over the network. Many programs have been written to this
interface.
NetBIOS does not obey the rules of the OSI model in that it does not talk
only to the layers above and below it. Programs can talk directly to
NetBIOS, skipping the application and presentation layers. This doesn’t
keep NetBIOS from doing its job; it just makes it incompatible with the
OSI model. The main drawback of NetBIOS is that it is limited to working
on a single network.

TCP/IP or Transmission Control Protocol/Internet Protocol is actually
several protocols. TCP is a transport protocol. IP operates at the network
layer. TCP/IP traditionally enjoyed enormous support in government,
scientific, and academic internetworks and in recent years has dominated
the commercial networking environment, too. Part of the explanation is
that corporate networks began to approach the size of networks found in
the government and in universities, which drove corporations to look for
internetworking protocol standards. They found TCP/IP to be
progressively more useful as it became more widespread. Many people
once viewed TCP/IP as an interim solution until OSI could be deployed,
but no one seriously believes that the OSI protocols will ever have more
than a niche role in the future.
Often when TCP/IP is discussed, the subjects of SMTP, FTP, Telnet, and
SNMP are also raised. These are application protocols developed
specifically for TCP/IP. SMTP or the Simple Mail Transfer Protocol is the
electronic mail relay standard. FTP stands for File Transfer Protocol and
is used to exchange files among computers running TCP/IP. Telnet is
remote log-in and terminal emulation software. SNMP or the Simple
Network Management Protocol is the most widely implemented network
management protocol. The figure shows the protocols of TCP/IP.
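What all of those application protocols share is the reliable byte-stream service TCP provides underneath them. The self-contained sketch below runs a tiny echo server on the loopback interface and talks to it over TCP; the greeting is SMTP-flavored but the "protocol" here is purely illustrative.

```python
import socket
import threading

# A minimal loopback demonstration of TCP's transport service: connection
# establishment, reliable byte transfer, and connection release -- the
# substrate on which SMTP, FTP, and Telnet are built.

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))            # OS assigns a free port
server.listen(1)
port = server.getsockname()[1]

def serve_once() -> None:
    conn, _ = server.accept()            # connection establishment
    conn.sendall(conn.recv(1024))        # echo the request back
    conn.close()                         # connection release

t = threading.Thread(target=serve_once)
t.start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"HELO example")          # an SMTP-flavored greeting
reply = client.recv(1024)
client.close()
t.join()
server.close()
assert reply == b"HELO example"
```

Each real application protocol is essentially an agreed-upon conversation carried over exactly this kind of connection.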
Novell traditionally used IPX/SPX as its native transport protocols, though
the company introduced a “native” implementation of TCP/IP in place of
IPX/SPX. Internetwork Packet Exchange (IPX) and Sequenced Packet
Exchange (SPX) are both variants of Xerox’s XNS protocol. IPX provides
network-layer services, while SPX provides transport-layer services but
is only rarely employed by applications. Because IPX
implementations prior to the introduction of NetWare Link Services
Protocol (NLSP) in NetWare 4 caused a great deal of broadcast traffic
and required frequent transmission acknowledgements, which can cause
problems in a WAN, Novell also supported TCP/IP with gateways prior to
its native TCP/IP implementation.

Protocol Babel
If the number of available protocols seems like senseless confusion, it is
and it isn’t. Certain protocols have different advantages in specific
environments. No single protocol stack will work better than every other
in every setting. NetBIOS works well in small PC networks but is
practically useless for communicating with WANs; APPC works well in
peer-to-peer mainframe environments; TCP/IP excels in internetworks
and heterogeneous environments.
On the other hand, much more is made of the differences in protocols
than is warranted. Proprietary protocols can be perfect solutions in many
cases. Besides, if proprietary protocols are sufficiently widespread, they
become de facto standards, and gateways to other protocols are built.
These include DEC’s protocol suite, Sun Microsystems’ Network Filing
System and other protocols, and Apple’s AppleTalk protocols. While
these enjoy widespread use, that use is based on the computers these
companies sell and not the proliferation of the protocols throughout the
networking industry.
Whether it’s a proprietary or standard protocol, users are faced with
difficult choices. These choices are made slightly easier by the shakeout
and standardization that has occurred at the physical and data-link
layers. There are three choices: Token Ring, Ethernet, or FDDI. At the
transport layers, IPX/SPX and TCP/IP emerged as the dominant
protocols. [Editor’s note: As of 2000, practically every new local network
installation uses some form of Ethernet and TCP/IP, while the installed
base remnants of Token Ring, FDDI, and IPX are diminishing
irretrievably.]

2
Q

Basic understanding of TCP/IP model and protocols

A

Although a protocol architecture may suggest a certain approach to implementation, it usually does not include a mandate. Consequently, we make a distinction between the protocol architecture and the implementation architecture, which
defines how the concepts in a protocol architecture may be rendered into existence, usually in the form of software.
Many of the individuals responsible for implementing the protocols for the
ARPANET were familiar with the software structuring of operating systems, and
an influential paper describing the “THE” multiprogramming system [D68] advocated the use of a hierarchical structure as a way to deal with verification of the
logical soundness and correctness of a large software implementation. Ultimately,
this contributed to a design philosophy for networking protocols involving multiple layers of implementation (and design). This approach is now called layering
and is the usual approach to implementing protocol suites.

Layering
With layering, each layer is responsible for a different facet of the communications. Layers are beneficial because a layered design allows developers to evolve
different portions of the system separately, often by different people with somewhat different areas of expertise. The most frequently mentioned concept of protocol layering is based on a standard called the Open Systems Interconnection (OSI)
model [Z80] as defined by the International Organization for Standardization
(ISO). Figure 1-2 shows the standard OSI layers, including their names, numbers,
and a few examples. The Internet’s layering model is somewhat simpler, as we
shall see in Section 1.3.
Although the OSI model suggests that seven logical layers may be desirable
for modularity of a protocol architecture implementation, the TCP/IP architecture is normally considered to consist of five. There was much debate about the
relative benefits and deficiencies of the OSI model, and the ARPANET model that
preceded it, during the early 1970s. Although it may be fair to say that TCP/IP
ultimately “won,” a number of ideas and even entire protocols from the ISO protocol suite (protocols standardized by ISO that follow the OSI model) have been
adopted for use with TCP/IP (e.g., IS-IS [RFC3787]).
As described briefly in Figure 1-2, each layer has a different responsibility.
From the bottom up, the physical layer defines methods for moving digital information across a communication medium such as a phone line or fiber-optic cable.
Portions of the Ethernet and Wireless LAN (Wi-Fi) standards are here, although
we do not delve into this layer very much in this text. The link or data-link layer
includes those protocols and methods for establishing connectivity to a neighbor
sharing the same medium. Some link-layer networks (e.g., DSL) connect only two
neighbors. When more than one neighbor can access the same shared network, the
network is said to be a multi-access network. Wi-Fi and Ethernet are examples of
such multi-access link-layer networks, and specific protocols are used to mediate
which stations have access to the shared medium at any given time. We discuss
these in Chapter 3.
Moving up the layer stack, the network or internetwork layer is of great interest
to us. For packet networks such as TCP/IP, it provides an interoperable packet format that can use different types of link-layer networks for connectivity. The layer
also includes an addressing scheme for hosts and routing algorithms that choose
where packets go when sent from one machine to another. Above layer 3 we find
protocols that are (at least in theory) implemented only by end hosts, including
the transport layer. Also of great interest to us, it provides a flow of data between
sessions and can be quite complex, depending on the types of services it provides (e.g., reliable delivery on a packet network that might drop data). Sessions represent ongoing interactions between applications (e.g., when “cookies” are used
with a Web browser during a Web login session), and session-layer protocols may
provide capabilities such as connection initiation and restart, plus checkpointing
(saving work that has been accomplished so far). Above the session layer we find
the presentation layer, which is responsible for format conversions and standard
encodings for information. As we shall see, the Internet protocols do not include a
formal session or presentation protocol layer, so these functions are implemented
by applications if needed.
The top layer is the application layer. Applications usually implement their
own application-layer protocols, and these are the ones most visible to users.
There is a wide variety of application-layer protocols, and programmers are constantly inventing new ones. Consequently, the application layer is where there is
the greatest amount of innovation and where new capabilities are developed and
deployed.

Multiplexing, Demultiplexing, and Encapsulation in Layered Implementations
One of the major benefits of a layered architecture is its natural ability to perform
protocol multiplexing. This form of multiplexing allows multiple different protocols
to coexist on the same infrastructure. It also allows multiple instantiations of the
same protocol object (e.g., connections) to be used simultaneously without being
confused.
Multiplexing can occur at different layers, and at each layer a different sort of
identifier is used for determining which protocol or stream of information belongs
together. For example, at the link layer, most link technologies (such as Ethernet
and Wi-Fi) include a protocol identifier field value in each packet to indicate which
protocol is being carried in the link-layer frame (IP is one such protocol). When
an object (packet, message, etc.), called a protocol data unit (PDU), at one layer is
carried by a lower layer, it is said to be encapsulated (as opaque data) by the next
layer down. Thus, multiple objects at layer N can be multiplexed together using
encapsulation in layer N - 1. Figure 1-3 shows how this works. The identifier at
layer N - 1 is used to determine the correct receiving protocol or program at layer
N during demultiplexing.
In Figure 1-3, each layer has its own concept of a message object (a PDU) corresponding to the particular layer responsible for creating it. For example, if a layer
4 (transport) protocol produces a packet, it would properly be called a layer 4 PDU
or transport PDU (TPDU). When a layer is provided a PDU from the layer above it,
it usually “promises” to not look into the contents of the PDU. This is the essence
of encapsulation—each layer treats the data from above as opaque, uninterpretable information. Most commonly a layer prepends the PDU with its own header,
although trailers are used by some protocols (not TCP/IP). The header is used for
multiplexing data when sending, and for the receiver to perform demultiplexing, based on a demultiplexing (demux) identifier. In TCP/IP networks such identifiers
are commonly hardware addresses, IP addresses, and port numbers. The header
may also include important state information, such as whether a virtual circuit is
being set up or has already completed setup. The resulting object is another PDU.
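The prepend-a-header pattern described above can be sketched in a few lines. This is a toy illustration with a hypothetical 2-byte demux identifier, not any real wire format:

```python
# Toy illustration of encapsulation: each layer prepends its own header
# (carrying a demux identifier) and treats the upper-layer PDU as opaque bytes.
import struct

def encapsulate(payload: bytes, demux_id: int) -> bytes:
    # 2-byte identifier header, then the opaque upper-layer PDU
    return struct.pack("!H", demux_id) + payload

def demultiplex(pdu: bytes) -> tuple[int, bytes]:
    # The receiver reads the identifier to pick the layer-N protocol,
    # then hands the payload up unchanged.
    (demux_id,) = struct.unpack("!H", pdu[:2])
    return demux_id, pdu[2:]

# Application data wrapped by "transport" (port 80), then "network" (protocol 6)
app = b"GET /"
segment = encapsulate(app, 80)      # transport PDU
datagram = encapsulate(segment, 6)  # network PDU carrying the segment

proto, inner = demultiplex(datagram)
port, data = demultiplex(inner)
assert (proto, port, data) == (6, 80, b"GET /")
```

Note how demultiplexing simply inverts encapsulation: each layer strips exactly the header its peer added, never looking inside the payload.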
One other important feature of layering suggested by Figure 1-2 is that in pure
layering not all networked devices need to implement all the layers. Figure 1-4
shows that in some cases a device needs to implement only a few layers if it is
expected to perform only certain types of processing.
In Figure 1-4, a somewhat idealized small internet includes two end systems, a
switch, and a router. In this figure, each number corresponds to a type of protocol
at a particular layer. As we can see, each device implements a different subset of
the layer stack. The host on the left implements three different link-layer protocols
(D, E, and F) with corresponding physical layers and three different transport-layer protocols (A, B, and C) that run on a single type of network-layer protocol.
End hosts implement all the layers, switches implement up to layer 2 (this switch
implements D and G), and routers implement up to layer 3. Routers are capable
of interconnecting different types of link-layer networks and must implement the
link-layer protocols for each of the network types they interconnect.
The internet of Figure 1-4 is somewhat idealized because today’s switches and
routers often implement more than the protocols they are absolutely required to
implement for forwarding data. This is for a number of reasons, including management. In such circumstances, devices such as routers and switches must sometimes act as hosts and support services such as remote login. To do this, they
usually must implement transport and application protocols.
Although we show only two hosts communicating, the link- and physical-layer networks (labeled as D and G) might have multiple hosts attached. If so,
then communication is possible between any pair of systems that implement the
appropriate higher-layer protocols. In Figure 1-4 we can differentiate between an
end system (the two hosts on either side) and an intermediate system (the router in
the middle) for a particular protocol suite. Layers above the network layer use end-to-end protocols. In our picture these layers are needed only on the end systems.
The network layer, however, provides a hop-by-hop protocol and is used on the two
end systems and every intermediate system. The switch or bridge is not ordinarily
considered an intermediate system because it is not addressed using the internetworking protocol’s addressing format, and it operates in a fashion that is largely
transparent to the network-layer protocol. From the point of view of the routers
and end systems, the switch or bridge is essentially invisible.
A router, by definition, has two or more network interfaces (because it connects two or more networks). Any system with multiple interfaces is called multihomed. A host can also be multihomed, but unless it specifically forwards packets
from one interface to another, it is not called a router. Also, routers need not be special hardware boxes that only move packets around an internet. Most TCP/IP
implementations, for example, allow a multihomed host to act as a router also,
if properly configured to do so. In this case we can call the system either a host
(when an application such as File Transfer Protocol (FTP) [RFC0959] or the Web is
used) or a router (when it is forwarding packets from one network to another). We
will use whichever term makes sense given the context.
One of the goals of an internet is to hide all of the details of the physical layout (the topology) and lower-layer protocol heterogeneity from the applications.
Although this is not obvious from our two-network internet in Figure 1-4, the
application layers should not care (and do not care) that even though each host
is attached to a network using link-layer protocol D (e.g., Ethernet), the hosts are
separated by a router and switch that use link-layer G. There could be 20 routers between the hosts, with additional types of physical interconnections, and the
applications would run without modification (although the performance might be
somewhat different). Abstracting the details in this way is what makes the concept of an internet so powerful and useful.

The Architecture and Protocols of the TCP/IP Suite
So far we have discussed architecture, protocols, protocol suites, and implementation techniques in the abstract. In this section, we discuss the architecture and
particular protocols that constitute the TCP/IP suite. Although this has become the
established term for the protocols used on the Internet, there are many protocols
beyond TCP and IP in the collection or family of protocols used with the Internet. We begin by noting how the ARPANET reference model of layering, which
ultimately formed the basis for the Internet’s protocol layering, differs somewhat
from the OSI layering discussed earlier.

The ARPANET Reference Model
Figure 1-5 depicts the layering inspired by the ARPANET reference model, which
was ultimately adopted by the TCP/IP suite. The structure is simpler than the OSI
model, but real implementations include a few specialized protocols that do not fit
cleanly into the conventional layers.
Starting from the bottom of Figure 1-5 and working our way up the stack,
the first layer we see is 2.5, an “unofficial” layer. There are several protocols that
operate here, but one of the oldest and most important is called the Address Resolution Protocol (ARP). It is a specialized protocol used with IPv4 and only with
multi-access link-layer protocols (such as Ethernet and Wi-Fi) to convert between
the addresses used by the IP layer and the addresses used by the link layer. We
examine this protocol in Chapter 4. In IPv6 the address-mapping function is part
of ICMPv6, which we discuss in Chapter 8.
At layer number 3 in Figure 1-5 we find IP, the main network-layer protocol
for the TCP/IP suite. We discuss it in detail in Chapter 5. The PDU that IP sends to
link-layer protocols is called an IP datagram and may be as large as 64KB (and up
to 4GB for IPv6). In many cases we shall use the simpler term packet to mean an
IP datagram when the usage context is clear. Fitting large packets into link-layer
PDUs (called frames) that may be smaller is handled by a function called fragmentation that may be performed by IP hosts and some routers when necessary. In fragmentation, portions of a larger datagram are sent in multiple smaller datagrams
called fragments and put back together (called reassembly) when reaching the destination. We discuss fragmentation in Chapter 10.
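The fragment/reassemble idea can be sketched with a toy model. The (offset, more-fragments) bookkeeping here mirrors what the IP header carries, but the field layout is purely illustrative:

```python
# Toy fragmentation/reassembly sketch: split a large datagram into
# link-sized fragments carrying (offset, more-fragments) info, then
# rejoin them at the destination.
def fragment(datagram: bytes, mtu: int):
    frags = []
    for off in range(0, len(datagram), mtu):
        chunk = datagram[off:off + mtu]
        more = off + mtu < len(datagram)   # more-fragments flag
        frags.append((off, more, chunk))
    return frags

def reassemble(frags):
    # Fragments may arrive out of order; sort by offset before joining.
    return b"".join(chunk for _, _, chunk in sorted(frags))

data = bytes(range(10)) * 30              # a 300-byte "datagram"
frags = fragment(data, 100)
assert len(frags) == 3
assert frags[-1][1] is False              # last fragment: no more to come
assert reassemble(reversed(frags)) == data
```

Real IP fragmentation additionally carries an identification value so fragments of different datagrams are not mixed, and offsets are expressed in 8-byte units.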
Throughout the text we shall use the term IP to refer to both IP versions 4 and
6. We use the term IPv6 to refer to IP version 6, and IPv4 to refer to IP version 4,
currently the most popular version. When discussing architecture, the details of
IPv4 versus IPv6 matter little. When we delve into the way particular addressing
and configuration functions work (Chapter 2 and Chapter 6), for example, these
details will become more important.
Because IP packets are datagrams, each one contains the address of the layer
3 sender and recipient. These addresses are called IP addresses and are 32 bits long
for IPv4 and 128 bits long for IPv6; we discuss them in detail in Chapter 2. This
difference in IP address size is the characteristic that most differentiates IPv4 from
IPv6. The destination address of each datagram is used to determine where each
datagram should be sent, and the process of making this determination and sending the datagram to its next hop is called forwarding. Both routers and hosts perform forwarding, although routers tend to do it much more often. There are three types of IP addresses, and the type affects how forwarding is performed: unicast
(destined for a single host), broadcast (destined for all hosts on a given network),
and multicast (destined for a set of hosts that belong to a multicast group). Chapter
2 looks at the types of addresses used with IP in more detail.
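The size difference and the address types can be explored with Python's standard ipaddress module, a minimal sketch using documentation-range addresses:

```python
# Classifying addresses with the standard-library ipaddress module,
# illustrating the 32-bit vs. 128-bit difference and the multicast type.
import ipaddress

a4 = ipaddress.ip_address("192.0.2.1")    # IPv4 unicast (documentation range)
m4 = ipaddress.ip_address("224.0.0.1")    # IPv4 multicast
a6 = ipaddress.ip_address("2001:db8::1")  # IPv6 unicast

assert a4.version == 4 and a4.max_prefixlen == 32    # 32-bit address
assert a6.version == 6 and a6.max_prefixlen == 128   # 128-bit address
assert m4.is_multicast and not a4.is_multicast

# A broadcast address is defined per network, not globally:
net = ipaddress.ip_network("192.0.2.0/24")
assert str(net.broadcast_address) == "192.0.2.255"
```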
The Internet Control Message Protocol (ICMP) is an adjunct to IP, and we label
it as a layer 3.5 protocol. It is used by the IP layer to exchange error messages and
other vital information with the IP layer in another host or router. There are two
versions of ICMP: ICMPv4, used with IPv4, and ICMPv6, used with IPv6. ICMPv6
is considerably more complex and includes functions such as address autoconfiguration and Neighbor Discovery that are handled by other protocols (e.g., ARP)
on IPv4 networks. Although ICMP is used primarily by IP, it is also possible for
applications to use it. Indeed, we will see that two popular diagnostic tools, ping
and traceroute, use ICMP. ICMP messages are encapsulated within IP datagrams in the same way transport layer PDUs are.
The Internet Group Management Protocol (IGMP) is another protocol adjunct to
IPv4. It is used with multicast addressing and delivery to manage which hosts are
members of a multicast group (a group of receivers interested in receiving traffic for
a particular multicast destination address). We describe the general properties of
broadcasting and multicasting, along with IGMP and the Multicast Listener Discovery protocol (MLD, used with IPv6), in Chapter 9.
At layer 4, the two most common Internet transport protocols are vastly different. The most widely used, the Transmission Control Protocol (TCP), deals with
problems such as packet loss, duplication, and reordering that are not repaired
by the IP layer. It operates in a connection-oriented (VC) fashion and does not
preserve message boundaries. Conversely, the User Datagram Protocol (UDP) provides little more than the features provided by IP. UDP allows applications to send
datagrams that preserve message boundaries but imposes no rate control or error
control.
TCP provides a reliable flow of data between two hosts. It is concerned with things such as dividing the data passed to it from the application into appropriately sized chunks for the network layer below, acknowledging received packets, and setting timeouts to make certain the other end acknowledges packets that are sent. Because this reliable flow of data is provided by the transport layer, the application layer can ignore all these details. The PDU that TCP sends to IP is
called a TCP segment.
UDP, on the other hand, provides a much simpler service to the application
layer. It allows datagrams to be sent from one host to another, but there is no
guarantee that the datagrams reach the other end. Any desired reliability must
be added by the application layer. Indeed, about all that UDP provides is a set
of port numbers for multiplexing and demultiplexing data, plus a data integrity
checksum. As we can see, UDP and TCP differ radically even though they are at
the same layer. There is a use for each type of transport protocol, which we will
see when we look at the different applications that use TCP and UDP.
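The message-boundary difference can be demonstrated on the loopback interface with Python's standard socket module. Note that delivery over loopback is reliable in practice, even though UDP itself makes no such guarantee:

```python
# Loopback demonstration that UDP preserves message boundaries:
# two sendto() calls arrive as two distinct datagrams, not one byte stream.
import socket

recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
addr = recv.getsockname()

send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send.sendto(b"first", addr)
send.sendto(b"second", addr)

# Each recvfrom() returns exactly one datagram, boundaries intact.
msg1, _ = recv.recvfrom(4096)
msg2, _ = recv.recvfrom(4096)
assert (msg1, msg2) == (b"first", b"second")
send.close(); recv.close()
```

A TCP receiver, by contrast, might see both messages returned by a single recv() call, because TCP presents a byte stream with no record markers.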
There are two additional transport-layer protocols that are relatively new
and available on some systems. As they are not yet very widespread, we do not
devote much discussion to them, but they are worth being aware of. The first is the
Datagram Congestion Control Protocol (DCCP), specified in [RFC4340]. It provides a
type of service midway between TCP and UDP: connection-oriented exchange of
unreliable datagrams but with congestion control. Congestion control comprises
a number of techniques whereby a sender is limited to a sending rate in order to
avoid overwhelming the network. We discuss it in detail with respect to TCP in
Chapter 16.
The other transport protocol available on some systems is called the Stream
Control Transmission Protocol (SCTP), specified in [RFC4960]. SCTP provides reliable delivery like TCP but does not require the sequencing of data to be strictly
maintained. It also allows for multiple streams to logically be carried on the same
connection and provides a message abstraction, which differs from TCP. SCTP
was designed for carrying signaling messages on IP networks that resemble those
used in the telephone network.
Above the transport layer, the application layer handles the details of the particular application. There are many common applications that almost every implementation of TCP/IP provides. The application layer is concerned with the details
of the application and not with the movement of data across the network. The
lower three layers are the opposite: they know nothing about the application but
handle all the communication details.

Multiplexing, Demultiplexing, and Encapsulation in TCP/IP
We have already discussed the basics of protocol multiplexing, demultiplexing,
and encapsulation. At each layer there is an identifier that allows a receiving system to determine which protocol or data stream belongs together. Usually there is
also addressing information at each layer. This information is used to ensure that
a PDU has been delivered to the right place. Figure 1-6 shows how demultiplexing
works in a hypothetical Internet host.
Although it is not really part of the TCP/IP suite, we shall begin bottom-up
and mention how demultiplexing from the link layer is performed, using Ethernet
as an example. We discuss several link-layer protocols in Chapter 3. An arriving
Ethernet frame contains a 48-bit destination address (also called a link-layer or
MAC—Media Access Control—address) and a 16-bit field called the Ethernet type.
A value of 0x0800 (hexadecimal) indicates that the frame contains an IPv4 datagram. Values of 0x0806 and 0x86DD indicate ARP and IPv6, respectively. Assuming that the destination address matches one of the receiving system’s addresses,
the frame is received and checked for errors, and the Ethernet Type field value is
used to select which network-layer protocol should process it.
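The EtherType demultiplexing step can be sketched by parsing the fixed 14-byte Ethernet header. The frame bytes below are hand-built for illustration:

```python
# Parsing an Ethernet header to demultiplex by the 16-bit EtherType field.
import struct

ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}

def parse_ethernet(frame: bytes):
    # 6-byte destination MAC, 6-byte source MAC, 2-byte EtherType
    dst, src, etype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ETHERTYPES.get(etype, "unknown"), frame[14:]

frame = (bytes.fromhex("ffffffffffff")     # destination MAC (broadcast)
         + bytes.fromhex("001122334455")   # source MAC
         + struct.pack("!H", 0x0800)       # EtherType: IPv4
         + b"...payload...")
_, _, proto, payload = parse_ethernet(frame)
assert proto == "IPv4" and payload == b"...payload..."
```

A real receiver would also verify the frame check sequence and match the destination address before handing the payload to the network layer.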
Assuming that the received frame contains an IP datagram, the Ethernet
header and trailer information is removed, and the remaining bytes (which constitute the frame’s payload) are given to IP for processing. IP checks a number of
items, including the destination IP address in the datagram. If the destination address matches one of its own and the datagram contains no errors in its header
(IP does not check its payload), the 8-bit IPv4 Protocol field (called Next Header
in IPv6) is checked to determine which protocol to invoke next. Common values
include 1 (ICMP), 2 (IGMP), 4 (IPv4), 6 (TCP), and 17 (UDP). The value of 4 (and
41, which indicates IPv6) is interesting because it indicates the possibility that an
IP datagram may appear inside the payload area of an IP datagram. This violates
the original concepts of layering and encapsulation but is the basis for a powerful
technique known as tunneling, which we discuss more in Chapter 3.
Once the network layer (IPv4 or IPv6) determines that the incoming datagram
is valid and the correct transport protocol has been determined, the resulting datagram (reassembled from fragments if necessary) is passed to the transport layer
for processing. At the transport layer, most protocols (including TCP and UDP)
use port numbers for demultiplexing to the appropriate receiving application.

Port Numbers
Port numbers are 16-bit nonnegative integers (i.e., range 0–65535). These numbers
are abstract and do not refer to anything physical. Instead, each IP address has
65,536 associated port numbers for each transport protocol that uses port numbers (most do), and they are used for determining the correct receiving application. For
client/server applications (see Section 1.5.1), a server first “binds” to a port number, and subsequently one or more clients establish connections to the port number using a particular transport protocol on a particular machine. In this sense,
port numbers act more like telephone number extensions, except they are usually
assigned by standards.
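The bind-then-connect pattern just described maps directly onto the sockets API. A minimal loopback sketch using an OS-assigned port:

```python
# A server binds to a port and listens; a client connects to that port.
import socket, threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))    # port 0: the OS assigns a free port
server.listen(1)
port = server.getsockname()[1]   # the port number clients must know

def serve():
    conn, _ = server.accept()
    conn.sendall(b"hello")
    conn.close()

threading.Thread(target=serve).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))      # client side uses an ephemeral port
reply = client.recv(16)
assert reply == b"hello"
client.close(); server.close()
```

A real server would bind to a fixed, published port number instead of port 0; letting the OS choose is what clients do for their ephemeral ports.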
Standard port numbers are assigned by the Internet Assigned Numbers
Authority (IANA). The set of numbers is divided into special ranges, including the
well-known port numbers (0–1023), the registered port numbers (1024–49151), and
the dynamic/private port numbers (49152–65535). Traditionally, servers wishing to
bind to (i.e., offer service on) a well-known port require special privileges such as
administrator or “root” access.
The range of well-known ports is used for identifying many well-known services such as the Secure Shell Protocol (SSH, port 22), FTP (ports 20 and 21), Telnet
remote terminal protocol (port 23), e-mail/Simple Mail Transfer Protocol (SMTP,
port 25), Domain Name System (DNS, port 53), the Hypertext Transfer Protocol or Web
(HTTP and HTTPS, ports 80 and 443), Interactive Mail Access Protocol (IMAP and
IMAPS, ports 143 and 993), Simple Network Management Protocol (SNMP, ports 161
and 162), Lightweight Directory Access Protocol (LDAP, port 389), and several others.
Protocols with multiple ports (e.g., HTTP and HTTPS) often have different port
numbers depending on whether Transport Layer Security (TLS) is being used with
the base application-layer protocol (see Chapter 18).
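These name-to-port assignments are recorded in the local services database, which Python exposes via socket.getservbyname. Availability of individual entries depends on the local /etc/services file or its equivalent:

```python
# Looking up well-known port numbers from the local services database.
import socket

http_port = socket.getservbyname("http", "tcp")
ssh_port = socket.getservbyname("ssh", "tcp")
assert http_port == 80
assert ssh_port == 22
```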
Note
If we examine the port numbers for these standard services and other standard
TCP/IP services (Telnet, FTP, SMTP, etc.), we see that most are odd numbers.
This is historical, as these port numbers are derived from the NCP port numbers.
(NCP, the Network Control Protocol, preceded TCP as a transport-layer protocol
for the ARPANET.) NCP was simplex, not full duplex, so each application required
two connections, and an even-odd pair of port numbers was reserved for each
application. When TCP and UDP became the standard transport layers, only a
single port number was needed per application, yet the odd port numbers from
NCP were used.
The registered port numbers are available to clients or servers with special
privileges, but IANA keeps a reserved registry for particular uses, so these port
numbers should generally be avoided when developing new applications unless
an IANA allocation has been procured. The dynamic/private port numbers are
essentially unregulated. As we will see, in some circumstances (e.g., on clients)
the value of the port number matters little because the port number being used
is transient. Such port numbers are also called ephemeral port numbers. They are
considered to be temporary because a client typically needs one only as long as the
user running the client needs service, and the client does not need to be found by the server in order to establish a connection. Servers, conversely, generally require
names and port numbers that do not change often in order to be found by clients.

Names, Addresses, and the DNS
With TCP/IP, each link-layer interface on each computer (including routers) has
at least one IP address. IP addresses are enough to identify a host, but they are
not very convenient for humans to remember or manipulate (especially the long
addresses used with IPv6). In the TCP/IP world, the DNS is a distributed database
that provides the mapping between host names and IP addresses (and vice versa).
Names are set up in a hierarchy, ending in domains such as .com, .org, .gov, .in,
.uk, and .edu. Perhaps surprisingly, DNS is an application-layer protocol and
thus depends on the other protocols in order to operate. Although most of the
TCP/IP suite does not use or care about names, typical users (e.g., those using Web
browsers) use names frequently, so if the DNS fails to function properly, normal
Internet access is effectively disabled. Chapter 11 looks into the DNS in detail.
Applications that manipulate names can call a standard API function (see
Section 1.5.3) to look up the IP address (or addresses) corresponding to a given
host’s name. Similarly, a function is provided to do the reverse lookup—given an
IP address, look up the corresponding host name. Most applications that take a host
name as input also take an IP address. Web browsers support this capability. For
example, the Uniform Resource Locators (URLs) http://131.243.2.201/index.
html and http://[2001:400:610:102::c9]/index.html can be typed into a Web
browser and are both effectively equivalent to http://ee.lbl.gov/index.html (at
the time of writing; the second example requires IPv6 connectivity to be successful)
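The lookup functions mentioned above correspond to getaddrinfo (name to address) and getnameinfo (address to name) in the sockets API. The sketch below uses "localhost" so it works without Internet connectivity; the reverse-lookup result depends on local configuration:

```python
# Forward and reverse lookups via the standard resolver API.
import socket

# Forward: name -> list of (family, type, proto, canonname, sockaddr)
infos = socket.getaddrinfo("localhost", 80, proto=socket.IPPROTO_TCP)
addrs = {info[4][0] for info in infos}
assert "127.0.0.1" in addrs or "::1" in addrs

# Reverse: address -> name (typically "localhost" here)
name, _ = socket.getnameinfo(("127.0.0.1", 80), 0)
print(name)
```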

3
Q

Defining internet, intranet and VPN

A

Defining the Internet
The Internet is the largest WAN in the world. It is a public domain available to everyone in the United States, and it is available to most other countries as well. This section
defines the Internet and the way it functions.
The Internet is a worldwide system of connected computer networks. Computers that
connect to the Internet use the TCP/IP protocol suite. It is estimated that there currently are
2 billion Internet users and an estimated 650 million computers connected to the Internet,
although it is difficult to estimate this due to NAT and other similar services. The origins of
the Internet can be traced back to ARPANET, which was developed by the U.S. government
for security purposes; however, ARPANET was a disjointed group of networks using outmoded
or non-uniform protocols. By using TCP/IP to join different types of networks together, the
Internet was created.
The Internet is not controlled by any one governing body—except for two technical aspects.
First, the IP classification system is defined by the IANA (Internet Assigned Numbers
Authority). Second, DNS is defined by the Internet Engineering Task Force (IETF).
Otherwise, the Internet is “controlled” by various ISPs and network providers depending on
the location. These companies define how the Internet is accessed.
Companies use the Internet for many reasons, including:
• To communicate messages such as email.
• To gather information, often through the usage of web pages.
• To share information, often through the use of a web server.
• For e-commerce.
• To collaborate with other companies, organizations, and users.
Individuals use the Internet for these reasons as well as for social networking, shopping, file
sharing, gaming, and other multimedia use.
Though the World Wide Web is a big part of the Internet, it is not the entire Internet.
However, users quite often use the terms interchangeably. Technically, the Internet is the
entire data communications system that connects the world, including hardware and software.
Meanwhile, the World Wide Web (WWW) is an enormous system of interlinked hypertext
documents that can be accessed with a web browser. The World Wide Web Consortium
defines standards for how these documents are created and interlinked. Currently, the World
Wide Web is in a stage known as Web 2.0 (with Web 3.0 just under way). Web 2.0 is an
interactive type of web experience compared to the previous version 1.0. Web 2.0 allows users
to interact with each other and act as contributors to Web sites as well. Currently, when most
people access the Internet, they do it through a web browser, but there are many other tools
that can also be used to access the Internet, including instant messaging programs, FTP clients, third-party media programs, and more.

Defining Intranets and Extranets
Intranets and extranets are used by organizations to share data with select individuals.
Whereas an intranet is used by an organization to share data with its employees, an
extranet is used to share data with sister companies or other partnered organizations.
An intranet is a private computer network or single Web site that an organization implements
in order to share data with employees around the world. User authentication is necessary
before a person can access the information in an intranet; ideally, this keeps the general public
out, as long as the intranet is properly secured.
Generally, a company refers to its intranet as its private Web site, or perhaps the portion of
the company Web site that is private. However, intranets use all of the inherent technologies
characteristic of the Internet. For instance, within an intranet, TCP/IP protocols such as
HTTP and FTP and email protocols like POP3 and SMTP are all employed just the same
way as they are on the Internet. Again, the only difference is an intranet is a privatized version
of the Internet, and any company can have one.
An extranet is similar to an intranet except that it is extended to users outside a company,
and possibly to entire organizations that are separate from or lateral to the company. For
instance, if a company often needs to do business with a specific organization, it might
choose to set up an extranet in order to facilitate information sharing. User authentication is
still necessary, and an extranet is not open to the general public.
Figure 8-1 illustrates both an intranet and extranet. Users can connect to intranets and
extranets by simply logging in to a Web site or by using a virtual private network.

Understanding VPNs
A VPN is a virtual private network that allows connectivity between two remote networks. It can also be used locally, but that implementation is much less common.
In order to better understand virtual private networks, let’s discuss them a bit further and
show how to set up a basic VPN.
A virtual private network (VPN) is a connection between two or more computers or devices
that are not on the same private network. In fact, there could be LANs or WANs in between
each of the VPN devices. In order to ensure that only the proper users and data sessions cross
to a VPN device, data encapsulation and encryption are used. A “tunnel” is created, so to
speak, through the LANs and WANs that might intervene; this tunnel connects the two VPN
devices together. Every time a new session is initiated, a new tunnel is created. Some technicians refer to this as tunneling through the Internet, although some VPN tunnels might go through private networks as well.
VPNs normally utilize one of two tunneling protocols:
• Point-to-Point Tunneling Protocol (PPTP) is the more commonly used protocol, but it is also the less secure option. PPTP generally includes security mechanisms, and no additional software or protocols need to be loaded. A VPN device or server that allows incoming PPTP connections must have inbound port 1723 open. PPTP works within the Point-to-Point Protocol (PPP), which is also used for dial-up connections.
• Layer 2 Tunneling Protocol (L2TP) is quickly gaining popularity due to the inclusion
of IPsec as its security protocol. Although this is a separate protocol and L2TP doesn’t
have any inherent security, L2TP is considered the more secure solution because IPsec is
required in most L2TP implementations. A VPN device or server that allows incoming
L2TP connections must have inbound port 1701 open.
An illustration of a basic VPN is shown in Figure 8-2. Note that the VPN server is on one
side of the cloud and the VPN client is on the other. The VPN client will have a standard
IP address to connect to its own LAN. The IP address shown in the figure is the IP address it
gets from the VPN server. The computer has two IP addresses; in essence, the VPN address is
encapsulated within the logical IP address.

4
Q

Basics of Firewalls and DMZ

A

Understanding Security Devices and Zones
Security devices such as firewalls are the main defense for a company’s networks,
whether they are LANs, WANs, intranets, or extranets. Perimeter security zones such as
demilitarized zones (DMZs) help keep certain information open to specific users or to the
public while keeping the rest of an organization’s data secret.

Defining Firewalls and Other Perimeter Security Devices
Firewalls are used to protect a network from malicious attack and unwanted intrusion.
They are the most commonly used type of security device in an organization’s perimeter.
Firewalls are primarily used to protect one network from another. They are often the first
line of defense in network security. There are several types of firewalls; some run as software
on server computers, some run as stand-alone dedicated appliances, and some work as just
one function of many on a single device. They are commonly implemented between the LAN
and the Internet, as shown in Figure 8-7.
Generally, there is one firewall, with the network and all associated devices and computers
residing “behind” it. By the way, if a device is “behind” the firewall, it is also considered to be “after” the firewall, and if the device is “in front of” the firewall, it is also considered to be “before” the firewall.
In Figure 8-7, you can see that the firewall has a local address of 10.254.254.249, which
connects it to the LAN. It also has an Internet address of 87.69.11.124, which allows connectivity for the entire LAN to the Internet. The firewall also hides the LAN IP addresses.
By default, the IP address 87.69.11.124 should be completely shielded. This means that all
inbound ports are effectively closed and will not allow incoming traffic, unless a LAN computer initiates a session with another system on the Internet. Regardless, you should check
this with third-party applications such as Nmap or with a web-based port scanning utility like
ShieldsUP!. We will show these in upcoming exercises. If any ports are open, or unshielded,
they should be addressed immediately. Then, the firewall should be rescanned for vulnerabilities.
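The kind of check that tools like Nmap automate is, at its core, a TCP connect() probe. A minimal sketch, probing only 127.0.0.1 here (scan only hosts you are authorized to test); the listening socket exists just so one port is guaranteed to report open:

```python
# Minimal TCP connect() port probe: connect_ex returns 0 if the port accepted.
import socket

def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

# Open a listening socket so at least one local port is known to be open.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

result = port_open("127.0.0.1", port)
assert result is True
listener.close()
```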
Many of today’s firewalls have two types of firewall technologies built into them: SPI and
NAT. However, there are a couple other types of firewall methodologies of which you should
be aware:
• Packet filtering inspects each packet that passes through the firewall and accepts or
rejects it based on a set of rules. There are two types of filtering: stateless packet inspection and stateful packet inspection (SPI). A stateless packet filter, also known as pure
packet filtering, does not retain memory of packets that have passed through the firewall.
Because of this, a stateless packet filter can be vulnerable to IP spoofing attacks. However,
a firewall running stateful packet inspection is normally not vulnerable to this because it
keeps track of the state of network connections by examining the header in each packet.
It should be able to distinguish between legitimate and illegitimate packets. This function
operates at the network layer of the OSI model.
• NAT filtering, also known as NAT endpoint filtering, filters traffic according to ports
(TCP or UDP). This can be done in three ways: using basic endpoint connections, by
matching incoming traffic to the corresponding outbound IP address connection, or by
matching incoming traffic to the corresponding IP address and port.
• Application-level gateway (ALG) supports address and port translation and checks
whether the type of application traffic is allowed. For example, your company might
allow FTP traffic through the firewall, but it may decide to disable Telnet traffic. The
ALG checks each type of packet coming in and discards those that are Telnet packets.
This adds a layer of security; however, it is resource intensive.
• Circuit-level gateway works at the session layer of the OSI model when a TCP or UDP
connection is established. Once the connection has been made, packets can flow between
the hosts without further checking. Circuit-level gateways hide information about the
private network, but they do not filter individual packets.
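The stateless/stateful distinction above can be sketched in a few lines of Python. This is an illustrative model only, not a real firewall: packets are plain dictionaries, and the rule format and field names are invented for the example.

```python
# Illustrative sketch of stateless vs. stateful packet filtering.
# Packets are modeled as dicts; field names are invented for the example.

class StatelessFilter:
    """Accepts or rejects each packet against a static rule list,
    with no memory of previous packets."""
    def __init__(self, rules):
        self.rules = rules  # list of (src, dst_port, action); '*' = any

    def check(self, pkt):
        for src, dst_port, action in self.rules:
            if src in ('*', pkt['src']) and dst_port in ('*', pkt['dst_port']):
                return action
        return 'reject'  # default deny

class StatefulFilter(StatelessFilter):
    """Also tracks sessions initiated from inside, so reply traffic is
    accepted even when no static rule matches it."""
    def __init__(self, rules):
        super().__init__(rules)
        self.connections = set()

    def check(self, pkt):
        flow = (pkt['src'], pkt['dst'], pkt['dst_port'])
        reverse = (pkt['dst'], pkt['src'], pkt.get('src_port'))
        if pkt['direction'] == 'out':
            self.connections.add(flow)   # remember the outbound session
            return 'accept'
        if reverse in self.connections:  # reply to a tracked session
            return 'accept'
        return super().check(pkt)
```

A stateless filter with an empty rule list would reject the reply to an outbound session; the stateful filter accepts it because it remembers the session, which is the behavior described above for a LAN computer initiating a connection through the firewall.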

Redefining the DMZ
A perimeter network or demilitarized zone (DMZ) is a small network that is set up separately
from a company’s private local area network and the Internet. It is called a perimeter network
because it is usually on the edge of a LAN, but DMZ has become a much more popular
term. A DMZ allows users outside a company LAN to access specific services located on the
DMZ. However, when the DMZ is set up properly, those users are blocked from gaining access
to the company LAN. Users on the LAN quite often connect to the DMZ as well, without
having to worry about outside attackers gaining access to their private LAN. The DMZ
might house a switch with servers connected to it that offer web, email, and other services.
Two common DMZ configurations are as follows:
• Back-to-back configuration: This configuration has a DMZ situated between two
firewall devices, which could be black box appliances or Microsoft Internet Security and
Acceleration (ISA) Servers.
• 3-leg perimeter configuration: In this scenario, the DMZ is usually attached to a separate
connection of the company firewall. Therefore, the firewall has three connections—one
to the company LAN, one to the DMZ, and one to the Internet.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Application layer protocols basics (HTTP, FTP, Telnet)

A

Getting connected isn’t as difficult as you might think.
The growth of the Internet has been an interesting reflection of the growth
of networking in general. The first networks to be deployed in most
companies were workgroup networks—islands of connectivity. They were
of various types, and they weren’t connected to each other.
As networking technologies matured, and networking took on greater
importance in many organizations, the workgroup networks grew and
often became interconnected. The next step was enterprise-wide
networking, and it wasn’t long before companies began to deploy e-mail
across those enterprise networks.
Today, many companies have full internal networks in place, and the
growth of the Internet signals a continuance of the networking trend.
Companies are now connecting via the Internet to their trading partners
and prospective customers, much as the early workgroup networks
interconnected to form a larger corporate network where different
departments could collaborate on projects.
The Internet is also mirroring another trend: Just as we’ve seen
microcomputers and workstations move from text-oriented, command-
driven operating systems to graphical user interfaces, so too have
services on the Internet shifted from the terse command-line types to the
graphical World Wide Web.
A consequence of this shift toward the Internet is that network managers
are often being asked to set up Internet connections and World Wide
Web sites. This Tutorial is the first in a series designed to introduce
network managers to the Internet and Web technologies.

Internet Services
The World Wide Web steals the lion’s share of attention lately, but there is actually a wide variety of services available on the Internet. “Internet
Services,” gives a brief description of some of the key services available
on the Internet.
In the early years, the Internet was mostly used for electronic mail and for
exchanging files between computer systems. These applications tended
to be textual and command-line oriented, which means that the Internet
was, at that time, mostly used by the “initiates.” The Internet didn’t really
open up to the masses until just a few years ago, when the World Wide
Web—an application with a graphical interface—was deployed.

The World Wide Web
What is the Web? It has many aspects, which makes it difficult to
describe in just a sentence or two. I’ll give you a sentence, but then I’ll
need several paragraphs to elaborate: The World Wide Web is a client-
server system for delivering information in hypermedia form.
The medium of the Web is the Hypertext Markup Language (HTML).
HTML is essentially a page description language, similar to Adobe
Systems’ Postscript or Hewlett-Packard’s PCL (Printer Control
Language). HTML tells the Web browser on the user’s PC how to display
the text and graphics that represent the content of a particular Web site.
A Web browser is an HTML interpreter that requests and receives HTML-
coded documents from a Web server and displays the information
according to HTML commands embedded in the code.
The server component of this client-server system is a computer running
software that operates according to the hypertext transport protocol
(http). The Web server responds to users’ Web browsers by sending the
files the browsers request.
In most cases, a Web server delivers a document one page at a time. (Of
course, that page can be much longer than the height of your display
screen—you may have to scroll through several screens to see the entire
page). These documents are hypertext, much like the Windows Help
system. Certain key words are hyperlinks. Usually, the browser will
indicate hyperlink text by underlining it and displaying it in a different color than the rest of the text. (Images can also be hyperlinks.) When you
click on a hyperlink, it causes the browser to issue a request for the
HTML document associated with that link. The Web server will then
service that request. In Web lingo, each request for a file (text document
or graphic image) is called a hit.
One difference between hyperlinks in HTML and those in other hypertext
systems, such as Windows Help, is that HTML hyperlinks can take you to
an entirely different server. These hyperlinks, in effect, make the Web
one giant document management system, which explains how the World
Wide Web got its name. Published on a Web server, other Web sites or
Web documents referenced within this article could be made into
hyperlinks; a reader could jump to each reference with just a click of the
mouse.
Web servers are attractive electronic publishing systems. In the past, the
only way to publish something on the Internet—and ensure that everyone
could read it—was to present it in plain ASCII text. Richer formats, such
as text displayed in a particular font, size, or style (italics, for example)
were word processor-specific. Graphics, too, required specific viewer
programs, which the reader may or may not have had. These factors
hindered the presentation and effectiveness of electronic publishing.
Enter the World Wide Web.
The Web has given us a level of platform independence. It’s somewhat
similar to having a videocassette that can be played on a wide variety of
videocassette recorders, regardless of the vendor. Standardization in http
and HTML means that any Web browser can read any Web document (at
least, in theory). As HTML develops, vendors tend to add extensions that
add new features to HTML or make life easier for its coders. Not every
Web browser can read every proprietary extension, so certain features
might not work with all browsers. It’s still true, though, that if you stick
with base-level HTML and avoid proprietary extensions, almost any
browser will be able to read and display your documents. Of course, you
need a Web browser running on your computer, and it’s safe to say that
there are now browsers for almost every type of computer.
Web servers were developed to reside on the Internet, but there’s no reason you can’t use one on any other TCP/IP network, large or small.
This has given rise to the idea of corporate intranets—networks that are
completely contained within the organizations they serve. Figure 1 shows
an example of an intranet, as well as a connection to the Internet.
Everything behind the firewall (that is, everything within the dashed lines)
is the corporate intranet.
The concept of the Web as a platform-independent, client-server system
is tantalizing to developers. Not only is platform independence a nice
feature to have on the Internet, it’s effective for the intranet as well.
Companies such as IBM’s Lotus Development are bringing out Internet
interfaces for their client-server systems (Lotus Notes, in this case).
Typically, whenever you revise (rev) a client-server system, you have to
develop both a new client piece and a new server piece. The amount of
work needed on the client side is multiplied several times if you’re trying
to provide clients for several different operating systems. However, if you
use a Web browser as the client, there’s no work to be done on the client
side whatsoever—you can simply let companies such as Netscape
Communications (Mountain View, CA) or Spyglass (Naperville, IL) provide the browsers.
WAN links—at least those that most companies can afford—are typically
very restrictive in terms of data throughput when compared with LAN
links. Most people consider a T1 line (1.544Mbps) a high-speed link, but
it crawls in comparison to 10Mbps Ethernet. For this reason, you must
carefully plan the graphic design of your Web pages. Keep graphics
small, and never put more than a few on each page, or else your readers
will be staring at the Windows hourglass icon for minutes at a time.
As bad as this problem can be in the wide area, it disappears for
intranets due to the tremendous throughput of local area networks. If
you’re going to strictly dedicate a Web site as an intranet server, you can
afford to go hog wild with graphics. Ironically, Web servers, which were
born on the Internet, seem to be realizing their full potential on the
intranet.
If there’s a fly in this soup, it’s HTML, which is essentially a document
publishing system. HTML is read-only and as such, it is not interactive,
although you can request new pages by clicking on hyperlinks. There are
ways around this, as we’ll explore later in the series, but Web designers
must really bend over backward to compensate for the one-way nature of
HTML.

Internet Services
Many ways to surf the Net.
The World Wide Web is only one of the many services available on the
Internet. Here’s a brief synopsis of ten Internet services:
archie Archie servers catalog the names of files residing on many
Internet ftp sites and index keywords about those files. Using archie, you
can obtain a list of files that match your keyword, as well as the ftp server
where each file is located. Once you know which file you want, you use
ftp to fetch it. An archie search can save you a tremendous amount of
work because you don’t have to log in to hundreds of hosts and search
each one individually.
Electronic mail (e-mail) Internet mail uses the Simple Mail Transport
Protocol (SMTP) to transport e-mail messages across the Internet.
file transfer protocol (ftp) Ftp lets you copy files from one computer to
another or across a network (the Internet, for example). In most cases,
you’re required to log in to the remote computer before you can obtain
access to any of the files. Some systems, however, are meant to offer
files to the public. For this purpose, anonymous ftp exists, wherein you
log in with the user name “anonymous,” and your IP address serves as
your password.
gopher Gopher is an easy-to-use, menu-oriented search tool. Gopher
servers catalog information by subject area, and the menu structure lets
you “drill down” to successively more specific topics. Gopher includes a
plain-text viewer, which enables you to view individual files (if they’re text-
only) so you can determine whether those files are what you’re looking
for. Gopher will fetch the file for you, saving you from the need to use ftp
to retrieve the file. Gopher sites are interconnected, such that selecting a
particular menu item may leapfrog you to a different gopher server.
Gopher was developed at the University of Minnesota, where the “mother
gopher” still resides.
Network news You can post a message on a particular topic, and it will be widely disseminated to a distribution list of subscribers. These
topic-oriented BBSs are known as newsgroups. The underlying messaging
protocol used is the Network News Transport Protocol (NNTP).
telnet This is a terminal-emulation program that runs on your PC and
emulates a terminal for some host computer. A key difference between
telnet and earlier terminals is that while terminals originally used RS-232
serial connections or some other type of terminal cable to connect to the
host computer, telnet uses the network to make the link.
veronica The veronica system indexes the menus of all of the gopher
servers. The collection of all the menus in all the gopher servers is known
as “gopherspace,” and veronica gives you a powerful way to search all of
gopherspace for the subject in which you’re interested.
Wide Area Information Service (WAIS) Instead of indexing file names
(as archie does, for example), WAIS indexes the text within the files,
allowing you to find information that might not be stored in file names.
World Wide Web The World Wide Web is a networked, graphically
oriented, hypermedia system. It uses the hypertext transport protocol
(http) and the Hypertext Markup Language (HTML).

6
Q

Understanding HTTP and WWW

A

What Is the World Wide Web?
The view of the web page you see through the window of your web browser is the result of a
conversation between the browser and a web server computer. The language used for that
conversation is called Hypertext Transfer Protocol (HTTP). The data delivered from the server to
the client is a finely crafted jumble of text, images, addresses, and formatting codes rendered to a
unified document through an amazing, versatile formatting language called Hypertext Markup
Language (HTML). The basic elements of what we know today as the World Wide Web were
created by Tim Berners-Lee in 1989 at the CERN research institute in Geneva, Switzerland. Berners-
Lee created a subtle and powerful information system by bringing together three technologies that
were already in development at the time:
Markup language: A system of instructions and formatting codes embedded in text
Hypertext: A means for embedding links to documents, images, and other elements in text
The Internet: (As you know by now) A global computer network of clients requesting services
and servers providing services through TCP/IP
Markup languages began in the 1960s as a means for adding formatting and typesetting codes to the
simple text used by early computers. At the time, text files were used throughout the computing world
for configuration files, online help documents, and electronic mail messages. When people started using computers for letters, memos, and other finished documents, they needed a way to specify
elements such as headlines, italics, bold font, and margins. Some of the early markup languages (such
as TeX, which is still in use today) were developed as a means for scientists to format and typeset
mathematical equations.
By the time modern-day word processing programs began to emerge, vendors had developed
numerous systems (many of them proprietary) for coding formatting information into a text document.
Some of these systems used ASCII-based codes. Others used different digital markers to denote
formatting information.
By the Way: Compatibility
Of course, these formatting code systems work only if the application that writes the document
and the application that reads the document agree on what each code means.
Berners-Lee and other HTML pioneers wanted a universal, vendor-neutral system for encoding
format information. They wanted this markup system to include not just typesetting codes but also
references to image files and links to other documents.
The concept of hypertext (a live link within text that switches the view to the document referenced in
the link) also evolved in the 1960s. Berners-Lee brought the hypertext concept to the Internet through
the development of the uniform resource locator (URL) or uniform resource identifier (URI); see Hour
16, “The Internet: A Closer Look”. Links let the reader view the online information in small doses.
The reader can choose whether to link to another page for additional information. HTML documents
can be assembled into unified systems of pages and links (see Figure 17.1). A visitor can find a
different path through the data depending on how the visitor traverses the links. And the web
developer has almost unlimited ability to define where a link will lead. The link can lead to another
HTML document in the same directory, a document in a different directory, or even a document on a
different computer. The link might lead to a totally different website on another computer across the
world.

Understanding HTTP
As you learned earlier, web servers and browsers communicate using the Hypertext Transfer Protocol
(HTTP). HTTP 1.1, which arrived in 1997 with RFC 2068 and is currently defined in RFCs 7230-
7235, was the dominant version of HTTP for many years. A new version, known as HTTP/2,
appeared in 2015. HTTP/2, which is based on Google’s SPDY protocol, primarily provides
performance enhancements and is not intended as a replacement for HTTP’s semantics or status
codes. Many websites and browsers now offer support for HTTP/2, but HTTP 1.1 is still very much
present on the Internet. HTTP/2 is described in RFC 7540.
The purpose of HTTP is to support the transfer of HTML documents. HTTP is an application-level
protocol. The HTTP client and server applications use the reliable TCP transport protocol to
establish a connection.
HTTP does the following:
Establishes a connection between the browser (the client) and the server
Negotiates settings and establishes parameters for the session
Provides for the orderly transfer of HTML content
Closes the connection with the server
Although the nature of web communication has become extremely complex, most of that complexity
relates to how the server builds the HTML content and what the browser does with the content it
receives. The actual process of transferring the content through HTTP is relatively uncluttered,
although, as you will learn later in this section, HTTP/2 adds some additional complications to the
transfer process in order to improve performance.
When you enter a URL into the browser window, the browser first checks the scheme of the URL to
determine the protocol. (Most web browsers support other protocols in addition to HTTP.) If the
browser determines that the URL refers to a resource on an HTTP site, it extracts the DNS name from
the URL and initiates the name resolution process. The client computer sends the DNS lookup request to a name server and receives the server’s IP address. The browser then uses the server’s IP address
to initiate a TCP connection with the server. (See Hour 6, “The Transport Layer,” for more on TCP.)
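These two steps, name resolution followed by a TCP connection, look like this with Python’s standard socket module; localhost stands in for the server name taken from the URL, and the connection itself is left commented out since no server is assumed to be listening:

```python
import socket

# Step 1: resolve the DNS name from the URL to an IP address.
host = "localhost"  # stand-in for the name extracted from the URL
infos = socket.getaddrinfo(host, 80, type=socket.SOCK_STREAM)
family, _, _, _, sockaddr = infos[0]
ip = sockaddr[0]    # the resolved address, e.g. 127.0.0.1

# Step 2: use the resolved address to initiate a TCP connection.
# (Not executed here, since no web server is assumed to be running.)
# conn = socket.create_connection((ip, 80))
```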
After the TCP connection is established, the browser uses the HTTP GET command to request the
web page from the server. The GET command contains the URL of the resource the browser is
requesting and the version of HTTP the browser wants to use for the transaction. In most cases, the
browser can send the relative URL with the GET request, rather than the full URL, because the
connection with the server has already been established:
GET /watergate/tapes/transcript HTTP/1.1
Several other optional field:value pairs might follow the GET command, specifying settings
such as the language, browser type, and acceptable file types.
The server response consists of a header followed by the requested document. The format of the
response header is as follows:
HTTP/1.1 status_code reason-phrase
field:value
field:value…
The status code is a three-digit number describing the status of the request. The reason-phrase is a
brief description of the status. Some common status codes are shown in Table 17.3. As you can see,
the leftmost digit of the code identifies a general category. The 100s are informational, the 200s
denote success, the 300s specify redirection, the 400s show a client error, and the 500s specify a
server error. You might be familiar with the famous 404 code, which often appears in response to a
missing page or a mistyped URL. Like the client request, the server response can also include a
number of optional field:value pairs. Some of the header fields are shown in Table 17.4. Any
field that is not understood by the browser is ignored.

As you can see from Table 17.4, some of the header fields are purely informational. Other header
fields might contain information used to parse and process the incoming HTML document.
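The request and response formats just described can be exercised without any network traffic. The sketch below builds a GET request and parses a sample response header into its status code, reason phrase, and field:value pairs; the sample response bytes are invented for illustration.

```python
# Build an HTTP/1.1 GET request and parse a sample response header.
# The sample response text below is invented for illustration.

def build_get(path, host):
    # Each header line ends with CRLF; a blank line ends the header block.
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Connection: close\r\n"
            "\r\n")

def parse_response(raw):
    header, _, body = raw.partition("\r\n\r\n")
    lines = header.split("\r\n")
    # Status line: HTTP-version status_code reason-phrase
    version, status_code, reason = lines[0].split(" ", 2)
    fields = {}
    for line in lines[1:]:           # optional field:value pairs
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return version, int(status_code), reason, fields, body

sample = ("HTTP/1.1 404 Not Found\r\n"
          "Content-Type: text/html\r\n"
          "Content-Length: 13\r\n"
          "\r\n"
          "Page missing!")

version, code, reason, fields, body = parse_response(sample)
```

The leftmost digit of the parsed code identifies the category: here the 4 marks a client error, the famous “missing page” case.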
The Content-Length field is particularly important. In the earlier HTTP version 1.0, each
request/response cycle required a new TCP connection. The client opened a connection and initiated
a request. The server fulfilled the request and then closed the connection. In that situation, the client
knew when the server had stopped sending data because the server closed the TCP connection.
Unfortunately, this process required the increased overhead necessary for continually opening and
closing connections. HTTP 1.1 allows the client and server to maintain the connection for longer than
a single transmission. In that case, the client needs some way of knowing when a single response is
finished. The Content-Length field specifies the length of the HTML object associated with the
response. If the server doesn’t know the length of the object it is sending—a situation increasingly
common with the appearance of dynamic HTML—the server sends the header field
Connection: close to notify the browser that the server will specify the end of the data by
closing the connection. HTTP also supports a negotiation phase in which the server and browser
agree to common settings for certain format and preference options.
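The framing role of Content-Length on a persistent connection can be sketched as follows. The “wire” here is an in-memory buffer standing in for a TCP socket, and the two back-to-back responses on it are invented, so the example is self-contained.

```python
import io

# On a persistent (keep-alive) connection the server does not close the
# socket after each response, so the client must read exactly
# Content-Length body bytes to know where one response ends.

def read_one_response(stream):
    # Read header bytes up to the blank line that ends the header block.
    header = b""
    while not header.endswith(b"\r\n\r\n"):
        header += stream.read(1)
    fields = dict(
        line.split(b": ", 1)
        for line in header.rstrip(b"\r\n").split(b"\r\n")[1:]
    )
    length = int(fields[b"Content-Length"])
    return stream.read(length)  # stop here; the next response follows

# Two responses sharing one "connection":
wire = io.BytesIO(
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nfirst"
    b"HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nsecond"
)
body1 = read_one_response(wire)
body2 = read_one_response(wire)
```

Without the Content-Length field the client would have no way to tell where the first body ends and the second response begins, which is why a server that cannot compute the length falls back to Connection: close.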
As mentioned previously, HTTP/2 was developed for performance reasons and does not change
HTTP 1.1 syntax or methods. Some of the benefits of HTTP/2 are:
Parallel processing within a connection: HTTP 1.1 let the browser open multiple simultaneous
connections, but each connection could only respond to one request at a time. HTTP/2 allows the
server to service multiple requests at once through the same connection.
Header compression: Header information transferred at the start of a new connection can slow
down the response time significantly. Compression reduces the total size of the header data.
Server “push” responses: The server can provide content for the client before waiting for a
request. As websites grow more complex, one page might contain several different objects, such
as graphics files. Rather than waiting on a request for each item, the server can anticipate the
request for these additional objects and push the content directly.

7
Q

Basic troubleshooting tools (ICMP, ping, traceroute)

A

How to Troubleshoot Your Connections with Ping and Traceroute
The PING utility is one of the most famous and most helpful networking commands. It’s the first command that comes to mind when facing network reachability problems, and the first to issue when you need to find out whether a certain host is “alive” or not.
The ping command uses the services of the Internet Control Message Protocol (ICMP), which is encapsulated directly in IP packets. Therefore, the ping utility operates basically on layer 3 (the Network layer) of the OSI model. It does not use the services of the Transport layer, because traffic reliability is not at issue here. Ping performs a simple reachability check.
TRACEROUTE is another very helpful utility that operates similarly to ping and also uses the services of ICMP. Traceroute, as the name implies, is used to trace the path between the sender and the destination host. It is a one-way trace, meaning that it traces the route from the source to the destination and not the other way around, which, by the way, may follow a different path. In specific implementations, traceroute also uses the User Datagram Protocol (UDP) at the transport layer, for a specific reason that we’ll go into further on.
So first, let’s start with an overview of the ICMP protocol, and then we can get into the details of how ping and traceroute use this protocol to perform their tasks.

Internet Control Message Protocol (ICMP)
ICMP is a Network layer protocol that belongs to the group of control protocols, similar to ARP and RARP. The ICMP protocol was designed with the unreliable characteristics of the IP protocol in mind. Because of IP’s unreliable, connectionless behavior, there was no way of informing the originating host that something went wrong during data transmission. ICMP was designed to provide this function.
ICMP messages report back to the sender when something unexpected occurs, giving the sender a clue about what might have gone wrong. Keep in mind that ICMP does not solve the reliability issues of IP; that is up to the upper layer (the Transport layer). ICMP messages are encapsulated in IP packets, as seen below.

Troubleshooting with PING
Let’s take a look at the behavior of the ping command with the help of the Ethereal application. The simplest way to launch the ping command is to open a command prompt window and type PING [IP address of the host to reach], or, if DNS service is running, PING [URL of the destination host]. Sometimes the extended ping command, which issues continuous echo request messages, is very helpful. The format of this command is PING -t [IP address of the host]. The ping command operates the same way in Windows, Unix, Cisco machines, and every other networking device. The principle is the same, even though variations on the extended functions of the command may exist.
In the next image you can see a ping command toward the URL of Trainsignal.com. A DNS query is performed first to translate the URL to an IP address, and then four echo request messages are transmitted. Transmitted packets are timestamped. When the remote host receives these echo requests, it includes the timestamp from each echo request inside the corresponding echo reply. Upon receiving the reply message and performing a simple calculation, the round-trip delay time is revealed and noted.
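As a sketch of what ping puts on the wire, the following builds an ICMP echo request (type 8, code 0) and computes the standard Internet checksum (RFC 1071). Actually sending it would require a raw socket and administrator privileges, so the example only constructs and verifies the packet; the identifier, sequence number, and payload are arbitrary.

```python
import struct

def internet_checksum(data):
    # RFC 1071: one's-complement sum of 16-bit words, then complemented.
    if len(data) % 2:
        data += b"\x00"                            # pad to an even length
    total = sum(struct.unpack(f"!{len(data)//2}H", data))
    total = (total & 0xFFFF) + (total >> 16)       # fold carry bits back in
    total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident, seq, payload=b"abcdefgh"):
    # ICMP header: type (8 = echo request), code, checksum, identifier, sequence.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum zeroed first
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

pkt = build_echo_request(ident=0x1234, seq=1)
```

A correctly checksummed ICMP packet has the property that summing it again, checksum field included, yields zero, which is exactly how the receiver validates it.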

Troubleshooting with TRACEROUTE
The traceroute command operates similarly to ping. On Cisco routers and Unix platforms the layout of the command is TRACEROUTE [destination IP address] or TRACEROUTE [URL of the destination host]. On Windows machines the function of traceroute comes with the command TRACERT (short for trace route), which operates in a slightly different manner than on Cisco and Unix platforms (details on this are below). Traceroute uses a clever way to capture the footprint of a packet’s journey. We will use an imaginary example (see the next image) to help us investigate how traceroute achieves its purpose.
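The clever trick is the IP time-to-live (TTL) field: traceroute sends probes with TTL 1, then 2, then 3, and so on. Each router that decrements the TTL to zero discards the probe and returns an ICMP “time exceeded” message, revealing its own address, until a probe finally reaches the destination. The following self-contained simulation of that loop uses an invented list of hop addresses in place of a real network; no packets are sent.

```python
# Simulated traceroute: the "network" is just a list of imaginary hop
# addresses; real traceroute sends UDP or ICMP probes with increasing TTL.

PATH = ["192.168.1.1", "10.10.0.1", "172.16.5.2", "203.0.113.77"]  # invented

def send_probe(ttl, path):
    """Model of what the network does with a probe of a given TTL: the
    router at hop `ttl` answers 'time exceeded' unless the probe reaches
    the final destination first."""
    if ttl < len(path):
        return path[ttl - 1], "time-exceeded"
    return path[-1], "destination-reached"

def traceroute(path):
    hops = []
    ttl = 1
    while True:
        addr, reply = send_probe(ttl, path)
        hops.append(addr)                 # each reply reveals one router
        if reply == "destination-reached":
            return hops
        ttl += 1                          # probe one router deeper next round
```

Running traceroute(PATH) reports the routers in order, exactly the hop-by-hop footprint the text describes.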

8
Q

Client/Server model

A

Client–server model is a distributed application structure that partitions tasks or workloads between the providers of a resource or service, called servers, and service requesters, called clients.[1] Often clients and servers communicate over a computer network on separate hardware, but both client and server may reside in the same system. A server host runs one or more server programs, which share their resources with clients. A client does not share any of its resources, but it requests content or service from a server. Clients therefore initiate communication sessions with servers, which await incoming requests. Examples of computer applications that use the client–server model are Email, network printing, and the World Wide Web.

Client and server role
The client-server characteristic describes the relationship of cooperating programs in an application. The server component provides a function or service to one or many clients, which initiate requests for such services. Servers are classified by the services they provide. For example, a web server serves web pages and a file server serves computer files. A shared resource may be any of the server computer’s software and electronic components, from programs and data to processors and storage devices. The sharing of resources of a server constitutes a service.
Whether a computer is a client, a server, or both, is determined by the nature of the application that requires the service functions. For example, a single computer can run web server and file server software at the same time to serve different data to clients making different kinds of requests. Client software can also communicate with server software within the same computer.[2] Communication between servers, such as to synchronize data, is sometimes called inter-server or server-to-server communication.

Client and server communication
In general, a service is an abstraction of computer resources and a client does not have to be concerned with how the server performs while fulfilling the request and delivering the response. The client only has to understand the response based on the well-known application protocol, i.e. the content and the formatting of the data for the requested service.
Clients and servers exchange messages in a request–response messaging pattern. The client sends a request, and the server returns a response. This exchange of messages is an example of inter-process communication. To communicate, the computers must have a common language, and they must follow rules so that both the client and the server know what to expect. The language and rules of communication are defined in a communications protocol. All client-server protocols operate in the application layer. The application layer protocol defines the basic patterns of the dialogue. To formalize the data exchange even further, the server may implement an application programming interface (API).[3] The API is an abstraction layer for accessing a service. By restricting communication to a specific content format, it facilitates parsing. By abstracting access, it facilitates cross-platform data exchange.[4]
A server may receive requests from many distinct clients in a short period of time. A computer can only perform a limited number of tasks at any moment, and relies on a scheduling system to prioritize incoming requests from clients to accommodate them. To prevent abuse and maximize availability, server software may limit the availability to clients. Denial of service attacks are designed to exploit a server’s obligation to process requests by overloading it with excessive request rates.
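The request–response exchange described above can be demonstrated with a minimal TCP server and client on the loopback interface. The “protocol” here, in which the server simply uppercases whatever the client sends, is invented for the example.

```python
import socket
import threading

# Minimal request-response exchange: the server awaits a request and
# returns a response (here it just uppercases the bytes it received).

def serve_once(server_sock):
    conn, _ = server_sock.accept()       # server awaits an incoming request
    with conn:
        request = conn.recv(1024)
        conn.sendall(request.upper())    # the "service"

def request(host, port, payload):
    with socket.create_connection((host, port)) as c:  # client initiates
        c.sendall(payload)
        c.shutdown(socket.SHUT_WR)       # signal end of the request
        return c.recv(1024)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=serve_once, args=(server,))
t.start()
reply = request("127.0.0.1", port, b"hello, server")
t.join()
server.close()
```

Note the asymmetry the text describes: the server passively waits in accept(), and the session only exists because the client initiated it.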

Link:
https://en.wikipedia.org/wiki/Client%E2%80%93server_model

9
Q

Sockets, IP and port addressing

A

A network socket is an internal endpoint for sending or receiving data within a node on a computer network. Concretely, it is a representation of this endpoint in networking software (protocol stack), such as an entry in a table (listing communication protocol, destination, status, etc.), and is a form of system resource.
The term socket is analogous to physical female connectors: communication between two nodes through a channel is visualized as a cable with two male connectors plugging into sockets at each node. Similarly, the term port (another term for a female connector) is used for external endpoints at a node, and the term socket is also used for an internal endpoint of local inter-process communication (IPC) (not over a network). However, the analogy is strained, as network communication need not be one-to-one or have a dedicated communication channel.

Use
A process can refer to a socket using a socket descriptor, a type of handle. A process first requests that the protocol stack create a socket, and the stack returns a descriptor to the process so it can identify the socket. The process then passes the descriptor back to the protocol stack when it wishes to send or receive data using this socket.
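As a concrete illustration (assuming a Berkeley-style socket API, as exposed by Python's standard socket module), the descriptor the stack hands back is an integer handle that the process passes to later calls:

```python
import socket

# Ask the protocol stack to create a socket; the object wraps the
# descriptor, which is exposed as an integer via fileno().
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
fd = s.fileno()          # the handle identifying this socket locally
print(isinstance(fd, int), fd >= 0)
s.close()                # return the descriptor/resource to the stack
```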
Unlike ports, sockets are specific to one node; they are local resources and cannot be referred to directly by other nodes. Further, sockets are not necessarily associated with a persistent connection (channel) for communication between two nodes, nor is there necessarily some single other endpoint. For example, a datagram socket can be used for connectionless communication, and a multicast socket can be used to send to multiple nodes. However, in practice for internet communication, sockets are generally used to connect to a specific endpoint and often with a persistent connection.
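The connectionless case can be sketched with a datagram (UDP) socket: the receiver binds a local address, and the sender addresses each datagram individually with no connection established. Addresses and payload below are illustrative:

```python
import socket

# Receiver: bind a local source address so datagrams can reach us.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))            # OS assigns a free port
addr = receiver.getsockname()

# Sender: no connect() needed; each datagram carries its destination.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", addr)

data, from_addr = receiver.recvfrom(1024)  # one datagram, plus its source
print(data)  # b'hello'

sender.close()
receiver.close()
```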

Socket addresses
In practice, socket usually refers to a socket in an Internet Protocol (IP) network (where a socket may be called an Internet socket), in particular for the Transmission Control Protocol (TCP), which is a protocol for one-to-one connections. In this context, sockets are assumed to be associated with a specific socket address, namely the IP address and a port number for the local node, and there is a corresponding socket address at the foreign node (other node), which itself has an associated socket, used by the foreign process. Associating a socket with a socket address is called binding.
Note that while a local process can communicate with a foreign process by sending or receiving data to or from a foreign socket address, it does not have access to the foreign socket itself, nor can it use the foreign socket descriptor, as these are both internal to the foreign node. For example, in a connection between 10.20.30.40:4444[A] and 50.60.70.80:8888[B] (local IP address:local port, foreign IP address:foreign port), there will also be an associated socket at each end, corresponding to the internal representation of the connection by the protocol stack on that node. These are referred to locally by numerical socket descriptors, say 317 at one side[A] and 922 at the other[B]. A process on node 10.20.30.40[A] can request to communicate with node 50.60.70.80[B] on port 8888 (request that the protocol stack create a socket to communicate with that destination), and once it[A] has created a socket and received a socket descriptor (317), it[A] can communicate via this socket by using the descriptor (317). The protocol stack will then forward data to and from node 50.60.70.80[B] on port 8888. However, a process on node 10.20.30.40[A] cannot request to communicate based on the foreign socket descriptor (e.g. “socket 922” or “socket 922 on node 50.60.70.80[B]”) as these are internal to the foreign node and are not usable by the protocol stack on node 10.20.30.40[A].

Implementation
A protocol stack, today usually provided by the operating system (rather than as a separate library, for instance), is a set of services that allow processes to communicate over a network using the protocols that the stack implements. The operating system forwards the payload of incoming IP packets to the corresponding application by extracting the socket address information from the IP and transport protocol headers and stripping the headers from the application data.
The application programming interface (API) that programs use to communicate with the protocol stack, using network sockets, is called a socket API. Development of application programs that utilize this API is called socket programming or network programming. Internet socket APIs are usually based on the Berkeley sockets standard. In the Berkeley sockets standard, sockets are a form of file descriptor, due to the Unix philosophy that “everything is a file”, and the analogies between sockets and files. Both have functions to read, write, open, and close. In practice the differences strain the analogy, and different interfaces (send and receive) are used on a socket. In inter-process communication, each end generally has its own socket.
In the standard Internet protocols TCP and UDP, a socket address is the combination of an IP address and a port number, much like one end of a telephone connection is the combination of a phone number and a particular extension. Sockets need not have a source address, for example, for only sending data, but if a program binds a socket to a source address, the socket can be used to receive data sent to that address. Based on this address, Internet sockets deliver incoming data packets to the appropriate application process.
Socket often refers specifically to an internet socket or TCP socket. An internet socket is minimally characterized by the following:
- local socket address, consisting of the local IP address and (for TCP and UDP, but not IP) a port number
- protocol: A transport protocol, e.g., TCP, UDP, raw IP. This means that (local or remote) endpoints with TCP port 53 and UDP port 53 are distinct sockets, while IP does not have ports.
- A socket that has been connected to another socket, e.g., during the establishment of a TCP connection, also has a remote socket address.

Definition
The distinctions between a socket (internal representation), socket descriptor (abstract identifier), and socket address (public address) are subtle, and these are not always distinguished in everyday usage. Further, specific definitions of a socket differ between authors. In IETF Request for Comments, Internet Standards, in many textbooks, as well as in this article, the term socket refers to an entity that is uniquely identified by the socket number. In other textbooks,[1] the term socket refers to a local socket address, i.e. a “combination of an IP address and a port number”. In the original definition of socket given in RFC 147, as it was related to the ARPA network in 1971, “the socket is specified as a 32 bit number with even sockets identifying receiving sockets and odd sockets identifying sending sockets.” Today, however, socket communications are bidirectional.
Within the operating system and the application that created a socket, a socket is referred to by a unique integer value called a socket descriptor.

Link:
https://en.wikipedia.org/wiki/Network_socket

This chapter deals with the structure of network-layer addresses used in the Internet, also known as IP addresses. We discuss how addresses are allocated and
assigned to devices on the Internet, the way hierarchy in address assignment aids
routing scalability, and the use of special-purpose addresses, including broadcast,
multicast, and anycast addresses. We also discuss how the structure and use of
IPv4 and IPv6 addresses differ.
Every device connected to the Internet has at least one IP address. Devices
used in private networks based on the TCP/IP protocols also require IP addresses.
In either case, the forwarding procedures implemented by IP routers (see Chapter
5) use IP addresses to identify where traffic is going. IP addresses also indicate
where traffic has come from. IP addresses are similar in some ways to telephone
numbers, but whereas telephone numbers are often known and used directly by
end users, IP addresses are often shielded from a user’s view by the Internet’s DNS
(see Chapter 11), which allows most users to use names instead of numbers. Users
are confronted with manipulating IP addresses when they are required to set up
networks themselves or when the DNS has failed for some reason. To understand
how the Internet identifies hosts and routers and delivers traffic between them,
we must understand the role of IP addresses. We are therefore interested in their
administration, structure, and uses.
When devices are attached to the global Internet, they are assigned addresses
that must be coordinated so as to not duplicate other addresses in use on the network. For private networks, the IP addresses being used must be coordinated to
avoid similar overlaps within the private networks. Groups of IP addresses are
allocated to users and organizations. The recipients of the allocated addresses then assign addresses to devices, usually according to some network “numbering plan.”
For global Internet addresses, a hierarchical system of administrative entities helps
in allocating addresses to users and service providers. Individual users typically
receive address allocations from Internet service providers (ISPs) that provide both
the addresses and the promise of routing traffic in exchange for a fee.

Expressing IP Addresses
The vast majority of Internet users who are familiar with IP addresses understand
the most popular type: IPv4 addresses. Such addresses are often represented in
so-called dotted-quad or dotted-decimal notation, for example, 165.195.130.107.
The dotted-quad notation consists of four decimal numbers separated by periods.
Each such number is a nonnegative integer in the range [0, 255] and represents
one-quarter of the entire IP address. The dotted-quad notation is simply a way of
writing the whole IPv4 address—a 32-bit nonnegative integer used throughout
the Internet system—using convenient decimal numbers. In many circumstances
we will be concerned with the binary structure of the address. A number of Internet sites, such as http://www.subnetmask.info and http://www.subnetcalculator.com, now contain calculators for converting between formats of
IP addresses and related information. Table 2-1 gives a few examples of IPv4
addresses and their corresponding binary representations, to get started.
In IPv6, addresses are 128 bits in length, four times larger than IPv4 addresses,
and generally speaking are less familiar to most users. The conventional notation
adopted for IPv6 addresses is a series of four-digit hexadecimal (“hex,” or base-16) numbers called blocks or fields, separated by colons. An example IPv6 address containing
eight blocks would be written as 5f05:2000:80ad:5800:0058:0800:2023:1d71. Although
not as familiar to users as decimal numbers, hexadecimal numbers make the task
of converting to binary somewhat simpler. In addition, a number of agreed-upon
simplifications have been standardized for expressing IPv6 addresses [RFC4291]:
1. Leading zeros of a block need not be written. In the preceding example, the
address could have been written as 5f05:2000:80ad:5800:58:800:2023:1d71.
2. Blocks of all zeros can be omitted and replaced by the notation ::. For example, the IPv6 address 0:0:0:0:0:0:0:1 can be written more compactly as ::1.
Similarly, the address 2001:0db8:0:0:0:0:0:2 can be written more compactly
as 2001:db8::2. To avoid ambiguities, the :: notation may be used only once
in an IPv6 address.
3. Embedded IPv4 addresses represented in the IPv6 format can use a form
of hybrid notation in which the block immediately preceding the IPv4 portion of the address has the value ffff and the remaining part of the address
is formatted using dotted-quad. For example, the IPv6 address ::ffff:10.0.0.1
represents the IPv4 address 10.0.0.1. This is called an IPv4-mapped IPv6
address.
4. A conventional notation is adopted in which the low-order 32 bits of the
IPv6 address can be written using dotted-quad notation. The IPv6 address
::0102:f001 is therefore equivalent to the address ::1.2.240.1. This is called
an IPv4-compatible IPv6 address. Note that IPv4-compatible addresses are
not the same as IPv4-mapped addresses; they are compatible only in the
sense that they can be written down or manipulated by software in a way
similar to IPv4 addresses. This type of addressing was originally required
for transition plans between IPv4 and IPv6 but is now no longer required
[RFC4291].
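The notation rules above can be checked with Python's standard ipaddress module, which applies the [RFC4291] simplifications (leading zeros dropped, one run of zero blocks collapsed to `::`). The addresses used here are the ones from the text:

```python
import ipaddress

# Rule 1: leading zeros of a block need not be written.
a = ipaddress.IPv6Address("5f05:2000:80ad:5800:0058:0800:2023:1d71")
print(a.compressed)   # '5f05:2000:80ad:5800:58:800:2023:1d71'

# Rule 2: a run of all-zero blocks becomes "::" (only once per address).
b = ipaddress.IPv6Address("2001:0db8:0:0:0:0:0:2")
print(b.compressed)   # '2001:db8::2'

# Rule 3: an IPv4-mapped IPv6 address exposes its embedded IPv4 address.
m = ipaddress.IPv6Address("::ffff:10.0.0.1")
print(m.ipv4_mapped)  # 10.0.0.1
```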

Basic IP Address Structure
IPv4 has 4,294,967,296 possible addresses in its address space, and IPv6 has
340,282,366,920,938,463,463,374,607,431,768,211,456. Because of the large number of addresses
(especially for IPv6), it is convenient to divide the address space into chunks. IP
addresses are grouped by type and size. Most of the IPv4 address chunks are eventually subdivided down to a single address and used to identify a single network
interface of a computer attached to the Internet or to some private intranet. These
addresses are called unicast addresses. Most of the IPv4 address space is unicast
address space. Most of the IPv6 address space is not currently being used. Beyond
unicast addresses, other types of addresses include broadcast, multicast, and
anycast, which may refer to more than one interface, plus some special-purpose
addresses we will discuss later. Before we begin with the details of the current
address structure, it is useful to understand the historical evolution of IP addresses.

Classful Addressing
When the Internet’s address structure was originally defined, every unicast IP
address had a network portion, to identify the network on which the interface using the IP address was to be found, and a host portion, used to identify the particular host
on the network given in the network portion. Thus, some number of contiguous bits
in the address became known as the net number, and remaining bits were known as
the host number. At the time, most hosts had only a single network interface, so the
terms interface address and host address were used somewhat interchangeably.
With the realization that different networks might have different numbers of
hosts, and that each host requires a unique IP address, a partitioning was devised
wherein different-size allocation units of IP address space could be given out to
different sites, based on their current and projected number of hosts. The partitioning of the address space involved five classes. Each class represented a different trade-off in the number of bits of a 32-bit IPv4 address devoted to the network
number versus the number of bits devoted to the host number. Figure 2-1 shows
the basic idea.

Subnet Addressing
One of the earliest difficulties encountered when the Internet began to grow was
the inconvenience of having to allocate a new network number for any new network segment that was to be attached to the Internet. This became especially cumbersome with the development and increasing use of local area networks
(LANs) in the early 1980s. To address the problem, it was natural to consider a
way that a site attached to the Internet could be allocated a network number centrally that could then be subdivided locally by site administrators. If this could be
accomplished without altering the rest of the Internet’s core routing infrastructure, so much the better.
Implementing this idea would require the ability to alter the line between the
network portion of an IP address and the host portion, but only for local purposes
at a site; the rest of the Internet would “see” only the traditional class A, B, and C
partitions. The approach adopted to support this capability is called subnet addressing [RFC0950]. Using subnet addressing, a site is allocated a class A, B, or C network number, leaving some number of remaining host bits to be further allocated
and assigned within a site. The site may further divide the host portion of its base
address allocation into a subnetwork (subnet) number and a host number. Essentially, subnet addressing adds one additional field to the IP address structure, but
without adding any bits to its length. As a result, a site administrator is able to
trade off the number of subnetworks versus the number of hosts expected to be on
each subnetwork without having to coordinate with other sites.

Subnet Masks
The subnet mask is an assignment of bits used by a host or router to determine how
the network and subnetwork information is partitioned from the host information
in a corresponding IP address. Subnet masks for IP are the same length as the corresponding IP addresses (32 bits for IPv4 and 128 bits for IPv6). They are typically
configured into a host or router in the same way as IP addresses—either statically
(typical for routers) or using some dynamic system such as the Dynamic Host Configuration Protocol (DHCP; see Chapter 6). For IPv4, subnet masks may be written
in the same way an IPv4 address is written (i.e., dotted-decimal). Although not
originally required to be arranged in this manner, today subnet masks are structured as some number of 1 bits followed by some number of 0 bits. Because of this
arrangement, it is possible to use a shorthand format for expressing masks that
simply gives the number of contiguous 1 bits in the mask (starting from the left).
This format is now the most common format and is sometimes called the prefix
length. Table 2-4 presents some examples for IPv4.
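The prefix-length shorthand and the dotted-decimal mask are two spellings of the same value, and Python's standard ipaddress module converts between them; a quick sketch in the spirit of Table 2-4:

```python
import ipaddress

# A /24 prefix length corresponds to the mask 255.255.255.0.
net = ipaddress.ip_network("128.32.1.0/24")
print(net.prefixlen)   # 24
print(net.netmask)     # 255.255.255.0

# The conversion also works from a dotted mask back to a prefix length.
net2 = ipaddress.ip_network("128.32.0.0/255.255.0.0")
print(net2.prefixlen)  # 16
```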
Masks are used by routers and hosts to determine where the network/subnetwork portion of an IP address ends and the host part begins. A bit set to 1 in
the subnet mask means the corresponding bit position in an IP address should be
considered part of a combined network/subnetwork portion of an address, which
is used as the basis for forwarding datagrams (see Chapter 5). Conversely, a bit
set to 0 in the subnet mask means the corresponding bit position in an IP address
should be considered part of the host portion. For example, in Figure 2-4 we can
see how the IPv4 address 128.32.1.14 is treated when a subnet mask of 255.255.255.0
is applied to it.
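The computation depicted in Figure 2-4 can be sketched directly with bitwise operations: ANDing the address with the mask yields the network/subnetwork portion, and ANDing with the inverted mask yields the host portion. The helper functions here are simple illustrations, not from the text:

```python
def to_int(dotted):
    # Convert dotted-quad notation to the underlying 32-bit integer.
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def to_dotted(n):
    # Convert a 32-bit integer back to dotted-quad notation.
    return ".".join(str((n >> s) & 0xFF) for s in (24, 16, 8, 0))

addr = to_int("128.32.1.14")
mask = to_int("255.255.255.0")

network_part = addr & mask                   # bits where the mask is 1
host_part = addr & ~mask & 0xFFFFFFFF        # bits where the mask is 0
print(to_dotted(network_part))  # '128.32.1.0'
print(to_dotted(host_part))     # '0.0.0.14'
```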

Variable-Length Subnet Masks (VLSM)
So far we have discussed how a network number allocated to a site can be subdivided into ranges assigned to multiple subnetworks, each of the same size and
therefore able to support the same number of hosts, based on the operational expectations of the network administrator. We now observe that it is possible to use a
different-length subnet mask applied to the same network number in different portions of the same site. Although doing this complicates address configuration management, it adds flexibility to the subnet structure because different subnetworks
may be set up with different numbers of hosts. Variable-length subnet masks (VLSM)
are now supported by most hosts, routers, and routing protocols. To understand
how VLSM works, consider the network topology illustrated in Figure 2-5, which
extends Figure 2-3 with two additional subnetworks using VLSM.

Broadcast Addresses
In each IPv4 subnetwork, a special address is reserved to be the subnet broadcast
address. The subnet broadcast address is formed by setting the network/subnetwork portion of an IPv4 address to the appropriate value and all the bits in the Host
field to 1. Consider the left-most subnet from Figure 2-5. Its prefix is 128.32.1.0/24.
The subnet broadcast address is constructed by inverting the subnet mask (i.e.,
changing all the 0 bits to 1 and vice versa) and performing a bitwise OR operation with the address of any of the computers on the subnet (or, equivalently, the
network/subnetwork prefix). Recall that the result of a bitwise OR operation is 1
if either input bit is 1. Using the IPv4 address 128.32.1.14, this computation can be
written as shown in Figure 2-6.
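The computation in Figure 2-6 can be sketched the same way: invert the subnet mask and OR it with any address on the subnet. The dotted-quad helpers are again illustrative only:

```python
def to_int(dotted):
    # Dotted-quad string -> 32-bit integer.
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def to_dotted(n):
    # 32-bit integer -> dotted-quad string.
    return ".".join(str((n >> s) & 0xFF) for s in (24, 16, 8, 0))

addr = to_int("128.32.1.14")                 # any host on 128.32.1.0/24
mask = to_int("255.255.255.0")
inverted = ~mask & 0xFFFFFFFF                # 0.0.0.255

broadcast = addr | inverted                  # bitwise OR, as in the text
print(to_dotted(broadcast))  # '128.32.1.255'
```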

IPv6 Addresses and Interface Identifiers
In addition to being longer than IPv4 addresses by a factor of 4, IPv6 addresses
also have some additional structure. Special prefixes used with IPv6 addresses
indicate the scope of an address. The scope of an IPv6 address refers to the portion
of the network where it can be used. Important examples of scopes include node-local (the address can be used only for communication on the same computer),
link-local (used only among nodes on the same network link or IPv6 prefix), or
global (Internet-wide). In IPv6, most nodes have more than one address in use,
often on the same network interface. Although this is supported in IPv4 as well, it is not nearly as common. The set of addresses required in an IPv6 node, including
multicast addresses (see Section 2.5.2), is given in [RFC4291].

The Internet Protocol (IP)
IP is the workhorse protocol of the TCP/IP protocol suite. All TCP, UDP, ICMP, and
IGMP data gets transmitted as IP datagrams. IP provides a best-effort, connectionless datagram delivery service. By “best-effort” we mean there are no guarantees
that an IP datagram gets to its destination successfully. Although IP does not simply drop all traffic unnecessarily, it provides no guarantees as to the fate of the
packets it attempts to deliver. When something goes wrong, such as a router temporarily running out of buffers, IP has a simple error-handling algorithm: throw
away some data (usually the last datagram that arrived). Any required reliability
must be provided by the upper layers (e.g., TCP). IPv4 and IPv6 both use this basic
best-effort delivery model.
The term connectionless means that IP does not maintain any connection state
information about related datagrams within the network elements (i.e., within the
routers); each datagram is handled independently of all others. This also
means that IP datagrams can be delivered out of order. If a source sends two consecutive datagrams (first A, then B) to the same destination, each is routed independently and can take different paths, and B may arrive before A. Other things
can happen to IP datagrams as well: they may be duplicated in transit, and they
may have their data altered as the result of errors. Again, some protocol above IP
(usually TCP) has to handle all of these potential problems in order to provide an
error-free delivery abstraction for applications.
In this chapter we take a look at the fields in the IPv4 (see Figure 5-1) and
IPv6 (see Figure 5-2) headers and describe how IP forwarding works. The official
specification for IPv4 is given in [RFC0791]. A series of RFCs describe IPv6, starting with [RFC2460].

IPv4 and IPv6 Headers
Figure 5-1 shows the format of an IPv4 datagram. The normal size of the IPv4
header is 20 bytes, unless options are present (which is rare). The IPv6 header is
twice as large but never has any options. It may have extension headers, which provide similar capabilities, as we shall see later. In our pictures of headers and datagrams, the most significant bit is numbered 0 at the left, and the least significant
bit of a 32-bit value is numbered 31 on the right.
The 4 bytes in a 32-bit value are transmitted in the following order: bits 0–7
first, then bits 8–15, then 16–23, and bits 24–31 last. This is called big endian byte
ordering, which is the byte ordering required for all binary integers in the TCP/IP
headers as they traverse a network. It is also called network byte order. Computer
CPUs that store binary integers in other formats, such as the little endian format
used by most PCs, must convert the header values into network byte order for
transmission and back again for reception.
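A small sketch of network byte order using Python's struct module, whose `!` format prefix selects network (big endian) order regardless of the host CPU's native ordering:

```python
import struct

value = 0x0A141E28                 # e.g. 10.20.30.40 as one 32-bit integer
wire = struct.pack("!I", value)    # most significant byte transmitted first
print(wire.hex())                  # '0a141e28'

# Unpacking with the same format restores the host-order integer.
(back,) = struct.unpack("!I", wire)
print(back == value)               # True
```

A little-endian CPU would store the same integer in memory as `28 1e 14 0a`, which is why the conversion step the text describes is needed.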

IP Header Fields
The first field (only 4 bits or one nibble wide) is the Version field. It contains the
version number of the IP datagram: 4 for IPv4 and 6 for IPv6. The headers for both
IPv4 and IPv6 share the location of the Version field but no others. Thus, the two
protocols are not directly interoperable—a host or router must handle either IPv4
or IPv6 (or both, called dual stack) separately. Although other versions of IP have
been proposed and developed, only versions 4 and 6 have any significant amount
of use. The IANA keeps an official registry of these version numbers [IV].
The Internet Header Length (IHL) field is the number of 32-bit words in the IPv4
header, including any options. Because this is also a 4-bit field, the IPv4 header is
limited to a maximum of fifteen 32-bit words or 60 bytes. Later we shall see how
this limitation makes some of the options, such as the Record Route option, nearly
useless today. The normal value of this field (when no options are present) is 5.
There is no such field in IPv6 because the header length is fixed at 40 bytes.
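As an illustration, the Version and IHL fields share the first header byte, so both can be extracted with shifts and masks; the byte value 0x45 used here is the typical options-free IPv4 case:

```python
first_byte = 0x45              # Version = 4 (high nibble), IHL = 5 (low nibble)
version = first_byte >> 4      # version number of the datagram
ihl_words = first_byte & 0x0F  # header length in 32-bit words
print(version, ihl_words * 4)  # 4 20  (i.e., a 20-byte header)
```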
Following the header length, the original specification of IPv4 [RFC0791]
specified a Type of Service (ToS) byte, and IPv6 [RFC2460] specified the equivalent
Traffic Class byte. Use of these never became widespread, so eventually this 8-bit
field was split into two smaller parts and redefined by a set of RFCs ([RFC3260]
[RFC3168][RFC2474] and others). The first 6 bits are now called the Differentiated
Services Field (DS Field), and the last 2 bits are the Explicit Congestion Notification
(ECN) field or indicator bits. These RFCs now apply to both IPv4 and IPv6. These
fields are used for special processing of the datagram when it is forwarded. We
discuss them in more detail in Section 5.2.3.
The Total Length field is the total length of the IPv4 datagram in bytes. Using
this field and the IHL field, we know where the data portion of the datagram
starts, and its length. Because this is a 16-bit field, the maximum size of an IPv4
datagram (including header) is 65,535 bytes. The Total Length field is required in the header because some lower-layer protocols that carry IPv4 datagrams do not
(accurately) convey the size of encapsulated datagrams on their own. Ethernet,
for example, pads small frames to be a minimum length (64 bytes). Even though
the minimum Ethernet payload size is 46 bytes (see Chapter 3), an IPv4 datagram
can be smaller (as few as 20 bytes). If the Total Length field were not provided, the
IPv4 implementation would not know how much of a 46-byte Ethernet frame was
really an IP datagram, as opposed to padding, leading to possible confusion.
Although it is possible to send a 65,535-byte IP datagram, most link layers
(such as Ethernet) are not able to carry one this large without fragmenting it
(chopping it up) into smaller pieces. Furthermore, a host is not required to be able
to receive an IPv4 datagram larger than 576 bytes. (In IPv6 a host must be able to
process a datagram at least as large as the MTU of the link to which it is attached,
and the minimum link MTU is 1280 bytes.) Many applications that use the UDP
protocol (see Chapter 10) for data transport (e.g., DNS, DHCP, etc.) use a limited
data size of 512 bytes to avoid the 576-byte IPv4 limit. TCP chooses its own datagram size based on additional information (see Chapter 15).
When an IPv4 datagram is fragmented into multiple smaller fragments, each of
which itself is an independent IP datagram, the Total Length field reflects the length
of the particular fragment. Fragmentation is described in detail along with UDP in
Chapter 10. In IPv6, fragmentation is not supported by the header, and the length
is instead given by the Payload Length field. This field measures the length of the
IPv6 datagram not including the length of the header; extension headers, however,
are included in the Payload Length field. As with IPv4, the 16-bit size of the field
limits its maximum value to 65,535. With IPv6, however, it is the payload length that
is limited to 64KB, not the entire datagram. In addition, IPv6 supports a jumbogram
option (see Section 5.3.1.2) that provides for the possibility, at least theoretically, of
single packets with payloads as large as 4GB (4,294,967,295 bytes)!
The Identification field helps identify each datagram sent by an IPv4 host. To
ensure that the fragments of one datagram are not confused with those of another,
the sending host normally increments an internal counter by 1 each time a datagram
is sent (from one of its IP addresses) and copies the value of the counter into the IPv4
Identification field. This field is most important for implementing fragmentation, so
we explore it further in Chapter 10, where we also discuss the Flags and Fragment
Offset fields. In IPv6, this field shows up in the Fragmentation extension header, as
we discuss in Section 5.3.3.
The Time-to-Live field, or TTL, sets an upper limit on the number of routers
through which a datagram can pass. It is initialized by the sender to some value
(64 is recommended [RFC1122], although 128 or 255 is not uncommon) and decremented by 1 by every router that forwards the datagram. When this field reaches
0, the datagram is thrown away, and the sender is notified with an ICMP message
(see Chapter 8). This prevents packets from getting caught in the network forever
should an unwanted routing loop occur.
The Protocol field in the IPv4 header contains a number indicating the type of
data found in the payload portion of the datagram. The most common values are
17 (for UDP) and 6 (for TCP). This provides a demultiplexing feature so that the IP
protocol can be used to carry payloads of more than one protocol type. Although
this field originally specified the transport-layer protocol the datagram is encapsulating, it is now understood to identify the encapsulated protocol, which may or
may not be a transport protocol. For example, other encapsulations are possible, such
as IPv4-in-IPv4 (value 4). The official list of the possible values of the Protocol field
is given in the assigned numbers page [AN]. The Next Header field in the IPv6
header generalizes the Protocol field from IPv4. It is used to indicate the type of
header following the IPv6 header. This field may contain any values defined for
the IPv4 Protocol field, or any of the values associated with the IPv6 extension
headers described in Section 5.3.
The Header Checksum field is calculated over the IPv4 header only. This is important to understand because it means that the payload of the IPv4 datagram (e.g.,
TCP or UDP data) is not checked for correctness by the IP protocol. To help ensure
that the payload portion of an IP datagram has been correctly delivered, other
protocols must cover any important data that follows the header with their own
data-integrity-checking mechanisms. We shall see that almost all protocols encapsulated in IP (ICMP, IGMP, UDP, and TCP) have a checksum in their own headers
to cover their header and data and also to cover certain parts of the IP header they
deem important (a form of “layering violation”). Perhaps surprisingly, the IPv6
header does not have any checksum field.
The algorithm used in computing a checksum is also used by most of the
other Internet-related protocols that use checksums and is sometimes known as
the Internet checksum. Note that when an IPv4 datagram passes through a router,
its header checksum must change as a result of decrementing the TTL field. We
discuss the methods for computing the checksum in more detail in Section 5.2.2.
Every IP datagram contains the Source IP Address of the sender of the datagram
and the Destination IP Address of where the datagram is destined. These are 32-bit
values for IPv4 and 128-bit values for IPv6, and they usually identify a single interface on a computer, although multicast and broadcast addresses (see Chapter 2)
violate this rule. While a 32-bit address can accommodate a seemingly large number of Internet entities (about 4.3 billion), there is widespread agreement that this number is inadequate, a primary motivation for moving to IPv6. The 128-bit address
of IPv6 can accommodate a huge number of Internet entities. As was restated in
[H05], IPv6 has 3.4 × 10^38 (340 undecillion) addresses. Quoting from [H05] and others: “The optimistic estimate would allow for 3,911,873,538,269,506,102 addresses
per square meter of the surface of the planet Earth.” It certainly seems as if this
should last a very, very long time indeed.

The Internet Checksum
The Internet checksum is a 16-bit mathematical sum used to determine, with
reasonably high probability, whether a received message or portion of a message
matches the one sent. Note that the Internet checksum algorithm is not the same as
the common cyclic redundancy check (CRC) [PB61], which offers stronger protection.
To compute the IPv4 header checksum for an outgoing datagram, the value
of the datagram’s Checksum field is first set to 0. Then, the 16-bit one’s complement sum of the header is calculated (the entire header is considered a sequence
of 16-bit words). The 16-bit one’s complement of this sum is then stored in the
Checksum field to make the datagram ready for transmission. One’s complement
addition can be implemented by “end-around-carry addition”: when a carry bit
is produced using conventional (two’s complement) addition, the carry is added
back in as a 1 value. Figure 5-3 presents an example, where the message contents
are represented in hexadecimal.
When an IPv4 datagram is received, a checksum is computed across the whole
header, including the value of the Checksum field itself. Assuming there are no
errors, the computed checksum value is always 0 (a one’s complement of the value
FFFF). Note that for any nontrivial packet or header, the value of the Checksum
field in the packet can never be FFFF. If it were, the sum (prior to the final one’s
complement operation at the sender) would have to have been 0. No sum can ever
be 0 using one’s complement addition unless all the bytes are 0—something that
never happens with any legitimate IPv4 header. When the header is found to be
bad (the computed checksum is nonzero), the IPv4 implementation discards the
received datagram. No error message is generated. It is up to the higher layers to
somehow detect the missing datagram and retransmit if necessary.
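The computation described above can be sketched in Python. This is a minimal illustration, not a production implementation; the input is assumed to be the header bytes with the Checksum field already set to 0.

```python
# A minimal sketch of the Internet checksum, assuming the input is the
# IPv4 header bytes with the Checksum field zeroed before the call.
def internet_checksum(data):
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF                        # final one's complement
```

A receiver can verify a header by running the same routine over the entire header, checksum included; the result is 0 when no errors are present.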

IP Options
IP supports a number of options that may be selected on a per-datagram basis.
Most of these options were introduced in [RFC0791] at the time IPv4 was being
designed, when the Internet was considerably smaller and when threats from
malicious users were less of a concern. As a consequence, many of the options are
no longer practical or desirable because of the limited size of the IPv4 header or
concerns regarding security. With IPv6, most of the options have been removed
or altered and are not an integral part of the basic IPv6 header. Instead, they are
placed after the IPv6 header in one or more extension headers. An IP router that
receives a datagram containing options is usually supposed to perform special
processing on the datagram. In some cases IPv6 routers process extension headers,
but many headers are designed to be processed only by end hosts. In some routers,
datagrams with options or extensions are not forwarded as fast as ordinary datagrams. We briefly discuss the IPv4 options as background and then look at how
IPv6 implements extension headers and options. Table 5-4 shows most of the IPv4
options that have been standardized over the years.
Table 5-4 gives the reserved IPv4 options for which descriptive RFCs can be
found. The complete list is periodically updated and is available online [IPPARAM]. The options area always ends on a 32-bit boundary. Pad bytes with a value
of 0 are added if necessary. This ensures that the IPv4 header is always a multiple
of 32 bits (as required by the IHL field). The “Number” column in Table 5-4 is the
number of the option. The “Value” column indicates the number placed inside the
option Type field to indicate the presence of the option. These values from the two
columns are not necessarily the same because the Type field has additional structure. In particular, the first (high-order) bit indicates whether the option should
be copied into fragments if the associated datagram is fragmented. The next 2 bits
indicate the option’s class. Currently, all options in Table 5-4 use option class 0
(control) except Timestamp and Traceroute, which are both class 2 (debugging and
measurement). Classes 1 and 3 are reserved.
Most of the standardized options are rarely or never used in the Internet today.
Options such as Source and Record Route, for example, require IPv4 addresses to
be placed inside the IPv4 header. Because there is only limited space in the header (60 bytes total, of which 20 are devoted to the basic IPv4 header), these options are
not very useful in today’s IPv4 Internet where the number of router hops in an
average Internet path is about 15 [LFS07]. In addition, the options are primarily
for diagnostic purposes and make the construction of firewalls more cumbersome
and risky. Thus, IPv4 options are typically disallowed or stripped at the perimeter
of enterprise networks by firewalls (see Chapter 7).
Within enterprise networks, where the average path length is smaller and protection from malicious users may be less of a concern, options can still be useful.
In addition, the Router Alert option represents somewhat of an exception to the
problems with the other options for use on the Internet. Because it is designed
primarily as a performance optimization and does not change fundamental router
behavior, it is permitted more often than the other options. As suggested previously, some router implementations have a highly optimized internal pathway for
forwarding IP traffic containing no options. The Router Alert option informs routers that a packet requires processing beyond the conventional forwarding algorithms. The experimental Quick-Start option at the end of the table is applicable to
both IPv4 and IPv6, and we describe it in the next section when discussing IPv6
extension headers and options.

IPv6 Extension Headers
In IPv6, special functions such as those provided by options in IPv4 can be enabled
by adding extension headers that follow the IPv6 header. The routing and timestamp functions from IPv4 are supported this way, as well as some other functions
such as fragmentation and extra-large packets that were deemed to be rarely used
for most IPv6 traffic (but still desired) and thereby did not justify allocating bits
in the IPv6 header to support them. With this arrangement, the IPv6 header is
fixed at 40 bytes, and extension headers are added only when needed. In choosing
the IPv6 header to be of a fixed size, and requiring that extension headers be processed only by end hosts (with one exception), the designers of IPv6 have made the
design and construction of high-performance routers easier because the demands
on packet processing at routers can be simpler than with IPv4. In practice, packet-processing performance is governed by many factors, including the complexity
of the protocol, the capabilities of the hardware and software in the router, and
traffic load.
Extension headers, along with headers of higher-layer protocols such as TCP
or UDP, are chained together with the IPv6 header to form a cascade of headers
(see Figure 5-6). The Next Header field in each header indicates the type of the
subsequent header, which could be an IPv6 extension header or some other type.
The value of 59 indicates the end of the header chain. The possible values for the
Next Header field are available at [IP6PARAM], and most are provided in Table 5-5.
As we can see from Table 5-5, the IPv6 extension header mechanism distinguishes some functions (e.g., routing and fragmentation) from options. The order of the extension headers is given as a recommendation, except for the location of
the Hop-by-Hop Options, which is mandatory, so an IPv6 implementation must
be prepared to process extension headers in the order in which they are received.
Only the Destination Options header can be used twice—the first time for options
pertaining to the destination IPv6 address contained in the IPv6 header and the
second time (position 8) for options pertaining to the final destination of the datagram. In some cases (e.g., when the Routing header is used), the Destination IP
Address field in the IPv6 header changes as the datagram is forwarded to its ultimate destination.
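The header cascade can be illustrated by following the Next Header values. This is a simplified sketch under stated assumptions: the handled types and the sample bytes are illustrative, and it is not a complete parser (the Fragment header's fixed 8-byte length happens to work here only because its length byte is zero).

```python
# A simplified sketch of walking an IPv6 extension header chain via the
# Next Header field. Types and layout follow the text; byte 0 of each
# extension header is its Next Header, byte 1 its length in 8-byte units
# beyond the first 8 bytes.
EXT_HEADERS = {0: "Hop-by-Hop Options", 43: "Routing", 44: "Fragment",
               60: "Destination Options"}
UPPER = {6: "TCP", 17: "UDP", 59: "No Next Header"}

def walk_chain(first_next_header, payload):
    """Yield header names until a non-extension Next Header value."""
    nh, offset = first_next_header, 0
    while nh in EXT_HEADERS:
        yield EXT_HEADERS[nh]
        length_units = payload[offset + 1]  # 8-byte units beyond the first 8
        nh = payload[offset]                # Next Header of this header
        offset += (length_units + 1) * 8    # advance to the following header
    yield UPPER.get(nh, str(nh))
```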

10
Q

Using proxy server

A

In computer networks, a proxy server is a server (a computer system or an application) that acts as an intermediary for requests from clients seeking resources from other servers.[1] A client connects to the proxy server, requesting some service, such as a file, connection, web page, or other resource available from a different server and the proxy server evaluates the request as a way to simplify and control its complexity.[2] Proxies were invented to add structure and encapsulation to distributed systems.

Link:
https://en.wikipedia.org/wiki/Proxy_server
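The client-to-proxy arrangement described above can be sketched with Python's standard library; the proxy address below is a placeholder assumption, not a real server.

```python
# A minimal sketch of routing HTTP requests through a proxy server.
# The proxy URL is a placeholder; substitute a real proxy to use it.
import urllib.request

proxy = urllib.request.ProxyHandler({"http": "http://proxy.example.com:3128"})
opener = urllib.request.build_opener(proxy)
# Requests made with opener.open(...) now go to the proxy, which evaluates
# and forwards them to the origin server on the client's behalf:
# opener.open("http://example.com/")
```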

11
Q

File transfer services: FTP, TFTP

A

FTP
The File Transfer Protocol (FTP) is a widely used protocol that enables a user to transfer files
between two computers on a TCP/IP network. A file transfer application (typically also called ftp)
uses FTP to transfer files. The user runs an FTP client application on one computer, and the other
computer runs an FTP server program such as ftpd (FTP daemon) on a UNIX/Linux computer, or an
FTP service on other platforms. Many FTP client programs are command-line based, but graphical
versions are available as well. FTP is used primarily to transfer files, although it can perform other
functions such as creating directories, removing directories, and listing files.
By the Way: FTP and the Web
FTP is also widely used on the World Wide Web, and the FTP protocol has been integrated
into most web browsers. Sometime when you’re downloading a file through a web browser,
you might notice the URL in the address box begins with ftp://.
FTP uses the TCP protocol and, therefore, operates through a reliable, connection-oriented session
between the client and server computers. The standard FTP daemon on the server listens on TCP port
21 for a request from a client. When a client sends a request, a Transmission Control Protocol (TCP)
connection is initiated (see Hour 6, “The Transport Layer”). The remote user is then authenticated by
the FTP server, and a session begins. A classic text-based FTP session requires the remote user to
interact with the server through a command-line interface. Typical commands start and stop the FTP
session, navigate through the remote directory structure, and upload or download files. Newer GUI-
based FTP clients offer a graphic interface (rather than a command interface) for navigating
directories and moving files.
By the Way: Daemon and Service
In the UNIX world, a daemon is a process that runs in the background and performs a service
when that service is requested. A daemon is called a service in the Windows world.
On most computers, you start a text-based FTP session by entering ftp followed by the hostname or
IP address of the FTP server. FTP then prompts you for a user ID and a password, which are used by
the FTP server to validate you as an authorized user and determine your rights. For example, the user
account you log on with might be assigned read-only access, or it might be configured for both read
and write operations. Many FTP servers are available for public use and allow you to log on with a
user ID called anonymous (usually for read-only access). When the anonymous account is used as
the user ID, you can enter virtually any password. However, it is customary to enter your email
account name as the password. When FTP servers are not intended for general public use, the servers
are configured to not allow anonymous access. In that case, you must enter a user ID and password to
gain access. The user ID and password are typically set up and provided by the FTP server
administrator.
Many FTP client implementations allow you to enter either UNIX-based commands or DOS-based
commands. The actual commands available depend on the client software being used. When you
transfer files using FTP, you must specify to FTP the type of file that you are about to transfer; the
most common choices are binary and ASCII. Choose ASCII when the type of file you want to transfer
is a simple text file. Choose binary when the type of file you want to transfer is a program file, a
word processing document, or a graphics file. The default file transfer mode is ASCII.
Be aware that many FTP servers reside on UNIX and Linux computers. UNIX and Linux are case
sensitive—that is, they distinguish between uppercase and lowercase letters. So, you must match the
case exactly when entering filenames. The current directory on the local computer from which you
start an FTP session is the default location where files are transferred to or from.
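The session described above can also be driven programmatically. This is a hedged sketch using Python's standard ftplib (the host, path, and anonymous credentials are placeholder assumptions, not a transcript of any particular server):

```python
# A sketch of an anonymous FTP download over the standard control
# connection on TCP port 21, using Python's ftplib. Host and path
# are placeholders.
from ftplib import FTP

def fetch_file(host, path):
    """Download one file over anonymous FTP and return its contents."""
    chunks = []
    ftp = FTP(host)                                 # connect to TCP port 21
    ftp.login("anonymous", "user@example.com")      # customary email-as-password
    ftp.retrbinary(f"RETR {path}", chunks.append)   # binary transfer mode
    ftp.quit()
    return b"".join(chunks)
```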

Trivial File Transfer Protocol
The Trivial File Transfer Protocol (TFTP) is used to transfer files between the TFTP client and a
TFTP server, a computer running the TFTP daemon. This protocol uses User Datagram Protocol
(UDP) as a transport and, unlike FTP, does not require a user to log on to transfer files. Because
TFTP does not require a user logon, it is often considered a security hole, especially if the TFTP
server permits writing.
TFTP was designed to be small so that both it and UDP could be implemented on a PROM
(programmable read-only memory) chip. TFTP is limited (hence the name trivial) when compared to
FTP. TFTP can only read and write files; it cannot list the contents of directories, create or remove
directories, or allow a user to log on as FTP allows. TFTP is primarily used in conjunction with the
RARP and BOOTP protocols to boot diskless workstations and, in some cases, to upload new system
code or patches to routers or other network devices. TFTP can transfer files using either an ASCII
format known as netascii or a binary format known as octet; a third format known as mail is no longer
used.
When a user enters a tftp statement on a command line, the computer initiates a connection to the
server and performs the file transfer. At the completion of the file transfer, the session is closed and terminated. The syntax of the TFTP statement is as follows:
TFTP [-i] host [get | put] source [destination]

12
Q

Name resolution services: DNS, whois

A

What Is Name Resolution?
When the early TCP/IP networks went online, users quickly realized that it was not healthy or
efficient to attempt to remember the IP address of every computer on the network. The people at the
research center were much too busy to have to remember whether Computer A in Building 6 had the
address 100.12.8.14 or 100.12.8.18. Programmers began to wonder whether it would be
possible to assign each computer a descriptive, human-friendly name and then let the computers on
the network take care of associating the name with an address.
The hostname system is a simple name resolution technique developed early in the history of TCP/IP.
In this system, each computer is assigned an alphanumeric name called a hostname. If the operating
system encounters an alphanumeric name where it is expecting an IP address, the operating system
consults a hosts file (see Figure 10.1). The hosts file contains a list of hostname-to-IP-address
associations. If the alphanumeric name is on the list of hostnames, the computer reads the IP address
associated with the name. The computer then replaces the hostname in the command with the
corresponding IP address and executes the command.
The hosts file system worked well (and still does) on small local networks. However, this system
becomes inefficient on larger networks. The host-to-address associations have to reside in a single
file, and the search efficiency of that file diminishes as the file expands. In the ARPAnet days, a single
master file called hosts.txt maintained a list of name-to-address associations, and local administrators
had to continually update hosts.txt to stay current. Furthermore, the hosts name space was essentially
flat. All nodes were equal, and the name resolution system could not make use of the efficient,
hierarchical structure of the IP address space.
Even if the ARPAnet engineers could have solved these problems, the hosts file system could never
work with a huge network with millions of nodes like the Internet. The engineers knew they needed a
hierarchical name resolution system that would
- Distribute the responsibility for name resolution among a group of special name resolution
servers. The name resolution servers maintain the tables that define name-to-address associations.
- Grant authority for local name resolution to a local administrator. In other words, instead of
maintaining a centralized, master copy of all name-to-address pairs, let an administrator on
Network A be responsible for name resolution on Network A, and let an admin of Network B
manage name resolution for Network B. That way, the individuals responsible for any changes on
a network are also responsible for making sure those changes are reflected in the name resolution
infrastructure.
These priorities led to the development of the Domain Name System (DNS). DNS is the name
resolution method used on the Internet and is the source of common Internet names such as
www.unixreview.com and www.slashdot.org. As you will learn later in this hour, DNS divides the
namespace into hierarchical entities called domains. The domain name can be included with thehostname in what is called a fully qualified domain name (FQDN). For instance, a computer with the
hostname maybe in the domain whitehouse.gov would have the FQDN maybe.whitehouse.gov.
Through the years, the DNS system continued to evolve, and DNS now offers options for better
security, dynamic address mapping, and autodiscovery. This hour describes hostname resolution and
DNS name resolution. You also learn about NetBIOS, a name resolution system used on some legacy
Microsoft networks.

Name Resolution Using Hosts Files
As you learned earlier in this hour, a hosts file is a file containing a table that associates hostnames to
IP addresses. Hostname resolution was developed before the more sophisticated DNS name
resolution, and newer, more sophisticated name resolution methods make the hosts file a bit
anachronistic in contemporary environments. However, this legacy hostname resolution technique is
still a good starting point for a discussion of name resolution.
Configuring hostname resolution on a small network is usually simple. Operating systems that support
TCP/IP recognize the hosts file and use it for name resolution with little or no intervention from the
user. The details for configuring hostname resolution vary, depending on the implementation. The
steps are roughly as follows:
1. Assign an IP address and hostname to each computer.
2. Create a hosts file that maps the IP address to the hostname of each computer. The hosts file is
often named hosts, although some implementations use the filename hosts.txt.
3. Place the hosts file in the designated location on each computer. The location varies, depending
on the operating system.
The hosts file contains entries for hosts that a computer needs to communicate with, allowing you to
enter an IP address with a corresponding hostname, an FQDN, or other aliases statically. Also, the
file usually contains an entry for the loopback address, 127.0.0.1. The loopback address is used
for TCP/IP diagnostics and represents “this computer.”
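Following these steps, a minimal hosts file might look like this (the names and addresses are illustrative):

```
127.0.0.1      localhost               # loopback: "this computer"
192.168.1.10   filesrv1  filesrv1.example.com
192.168.1.11   printsrv  printsrv.example.com
```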

DNS Name Resolution
The designers of DNS wanted to avoid having to keep an up-to-date name resolution file on each
computer. DNS instead places name resolution data on one or more special servers. The DNS servers
provide name resolution services for the network (see Figure 10.2). If a computer on the network
encounters a hostname where it is expecting an IP address, it sends a query to the server asking for the
IP address associated with the hostname. If the DNS server has the address, it sends the address back
to the requesting computer. The computer then invisibly substitutes the IP address for the hostname
and executes the command. When a change occurs on the network (such as a new computer or a
change to a hostname), the network administrator has to change only the DNS configuration once (on
the DNS server). The new information is then available to any computer that initiates a DNS query to
the server. Also, the DNS server can be optimized for search speed and can support a larger database
than would be possible with each computer searching separately through the cumbersome hosts file.
The DNS server shown in Figure 10.2 provides several advantages over hosts file name resolution. It
offers a single DNS configuration point for a local network and provides more efficient use of
network resources. However, the configuration shown in Figure 10.2 still does not solve the problem
of providing decentralized management of a vast network infrastructure. Like the hosts file, the
configuration in Figure 10.2 would not scale well to a huge network like the Internet. The name server
in Figure 10.2 could not operate efficiently with a database that included a record for every host on
the Internet. Even if it could, the logistics of maintaining an all-Internet database would be
prohibitive. Whoever configured the server would have to know about every change to any Internet
host anywhere in the world.
A better solution, reasoned the designers, was to let every office or institution configure a local name
server to operate, as shown in Figure 10.2, and then to provide a means for all the name servers to
talk to each other (see Figure 10.3). In this scenario, when a DNS client sends a name resolution
request to a name server, the name server does one of the following:
If the name server can find the requested address in its own address database, it immediately
sends the address to the client.
If the name server cannot find the address in its own records, it queries other name servers to find
the address and then sends the address to the client.
You might be wondering how the first name server knows which name server to contact when it
begins the query process that will lead to the address. Actually, this query process is closely
associated with the design of the DNS namespace. Keep in mind that DNS is not working strictly with
a hostname. As described earlier in this hour, DNS works with FQDNs. An FQDN consists of both a
hostname and a name specifying the domain.
The DNS namespace is a multitiered arrangement of domains (see Figure 10.4). A domain is a
collection of computers under a single authority sharing a common portion of the namespace (that is,
bearing the same domain name). At the top of the DNS tree is a single node known as root. Root is
sometimes shown as a period (.), although the actual symbol for root is a null character. Beneath root
is a group of domains known as top-level domains (TLDs). Figure 10.4 shows some of the TLDs for
the world’s most famous DNS namespace: the Internet. TLDs include the familiar .com, .org, and .edu
domains, as well as domains for national governments, such as .us (United States), .uk (United
Kingdom), .fr (France), and .jp (Japan).
The domain name shows the chain of domains from the top of the tree. The name server in the domain
sams.com holds name resolution information for hosts located in sams.com. The authoritative name
server for a domain can delegate name resolution for a subdomain to another server. For instance, the
authoritative name server in sams.com can delegate authority for the subdomain edit.sams.com to
another name server. The name resolution records for the subdomain edit.sams.com are then located
on the name server that has been delegated authority for the subdomain. Authority for name resolution
is thus delegated throughout the tree, and the administrators for a given domain can have control of
name-to-address mappings for the hosts in that domain.
When a host on the network needs an IP address, it usually sends a recursive query to a nearby name
server. This query tells the name server, “either give me the IP address associated with this name or
else tell me that you can’t find it.” If the name server cannot find the requested address among its own
records, it initiates a process of querying other name servers to obtain the address. This process is
shown in Figure 10.6. Name server A is using what is called an iterative query to find the address. An
iterative query tells the next name server “either send me the IP address or give me a clue to where I
might find it.” To summarize this process, the client sends a single recursive query to the name server.
The name server then issues a series of iterative queries to other name servers to resolve the name.
When the name server gets the address associated with the name, it replies to the client’s query with
the address.
The process for DNS name resolution is as follows (refer to Figure 10.6):
1. Host1 sends a query to name server A asking for the IP address associated with the domain
name trog.tenth.marines.mil.
2. Name server A checks its own records to see if it has the requested address. If server A has the
address, it returns the address to Host1.
3. If name server A does not have the address, it initiates the process of finding the address. Name
server A sends an iterative request for the address to name server B, a top-level name server for
the .mil domain, asking for the address associated with the name
trog.tenth.marines.mil.
4. Name server B is not able to supply the address, but it is able to send name server A the address
of name server C, the name server for marines.mil.
5. Name server A sends a request for the address to name server C. Name server C is not able to
supply the address, but it is able to send the address of name server D, the name server for
tenth.marines.mil.
6. Name server A sends a request for the IP address to name server D. Name server D looks up the
address for the host trog.tenth.marines.mil and sends the address to name server A.
Name server A then sends the address to Host1.
7. Host1 initiates a connection to the host trog.tenth.marines.mil.
This process occurs millions of times a day on the Internet. This tidy scenario is complicated
somewhat by some additional features of the modern network, including address caching, Dynamic
Host Configuration Protocol (DHCP), and dynamic DNS. However, the functionality of most TCP/IP
networks depends on this form of DNS name resolution.
It is also important to note that the network is not required to have a separate name server for each
node on the domain tree. A single name server can handle multiple domains. It is also common for
multiple name servers to serve a single domain.
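From the client's side, step 1 of the process above can be sketched with the standard library resolver: the application hands the name to the operating system's stub resolver, which sends the (typically recursive) query to a nearby name server on the host's behalf.

```python
# A sketch of a client-side name lookup. The OS stub resolver issues the
# recursive query described in the text to its configured name server.
import socket

def resolve(hostname):
    """Return the unique IPv4 addresses found for a hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return sorted({sockaddr[0] for *_, sockaddr in infos})

# resolve("trog.tenth.marines.mil") would trigger exactly the query
# sequence described above (the name is the chapter's example host).
```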

Registering a Domain
The Internet is only one example of a DNS namespace. You do not have to be connected to the Internet
to use DNS. If you are not connected to the Internet, you do not have to worry about registering your
domain names. However, organizations that want to use their own domain names on the Internet (such
as BuddysCars.com) must register that name with the proper registration authority.
Internet Corporation for Assigned Names and Numbers (ICANN) has overall authority for the task of
domain name registration but delegates registration for particular TLDs to other groups. The U.S.
company VeriSign currently acts as a maintainer for the DNS root zone under contract from ICANN.
Other groups maintain the system of top-level domains (TLDs).

Name Server Types
When implementing DNS on your network, you need to choose at least one server to be responsible
for maintaining your domain. This is referred to as your primary name server, and it gets all the
information about the zones it is responsible for from local files. Any changes you make to your
domain are made on this server.
Many networks also have at least one more server as a backup, or secondary name server. If
something happens to your primary server, this machine can continue to service requests. The
secondary server gets its information from the primary server’s zone file. When this exchange of
information takes place, it is referred to as a zone transfer.
A third type of server is called a caching-only server. A cache is part of a computer’s memory that
keeps frequently requested data ready to be accessed. As a caching-only server, it responds to queries
from clients on the local network for name resolution requests. It queries other DNS servers for
information about domains and computers that offer services such as Web and FTP. When it receives
information from other DNS servers, it stores that information in its cache in case a request for that
information is made again.
Caching-only servers are used by client computers on the local network to resolve names. Other DNS servers on the Internet will not know about them and, therefore, will not query them. This is desirable
if you want to distribute the load your servers are put under. A caching-only server is also simple to
maintain.
By the Way: DNS Implementations
DNS must be implemented as a service or daemon running on the DNS server machine.
Windows servers have a native DNS service, though some Microsoft admins prefer to use
third-party DNS implementations. The UNIX/Linux world has a number of DNS
implementation options, but the most popular choice is Berkeley Internet Name Domain
(BIND).

Dynamic DNS
DNS, as it has been described so far, is designed for situations in which there is a permanent (or at
least semipermanent) association of a hostname with an IP address. In today’s networks (as you learn
in the next hour), IP addresses are often assigned dynamically. In other words, a new IP address is assigned to a computer through Dynamic Host Configuration Protocol (DHCP) each time the computer
starts. This means that if the computer is to be registered with DNS and accessible by its hostname,
the DNS server must have some way to learn the IP address the computer is using.
The recent popularity of dynamic IP addressing has forced DNS vendors to adapt. Some IP
implementations (including BIND) now offer dynamic update of DNS records. In a typical scenario
(see Figure 10.9), the host obtains an IP address from the DHCP server and then updates the DNS
server with the new address. You learn more about DHCP in Hour 12, “Configuration.”
Enterprise directory systems such as Microsoft’s Active Directory use dynamic DNS to manage
DHCP client systems within the directory structure. Dynamic DNS services are also popular on the
Internet. Several online services offer a means for registering a permanent DNS name for computers
with dynamic addresses. Users can access these services to remotely connect to a home network
using the DNS name or to operate a personal website without a static address.
