Lesson 12: Applications: CDNs and Overlay Networks Flashcards

Question 1

Q

What is HTTP Redirection?

Answer

A

A mechanism within HTTP for sending a client to a different server. In response to a client request, the server may send a redirect response with a 3xx status code and a new address for the client to use.
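
A minimal sketch of how a client might act on such a response, assuming the new address arrives in the standard Location header. The function name redirect_target and the example URLs are illustrative only, not part of the lesson.

```python
from urllib.parse import urljoin

# Common 3xx status codes that tell the client to retry at a new address.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(status, headers, current_url):
    """Return the URL the client should request next, or None if not redirected."""
    if status not in REDIRECT_CODES:
        return None
    location = headers.get("Location")
    if location is None:
        return None
    # Location may be relative, so resolve it against the URL just requested.
    return urljoin(current_url, location)
```

For example, a 302 response carrying Location: http://cdn2.example.com/video.mp4 would make the client repeat the request against cdn2.example.com.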

Question 2

Q

What is IP Anycast?

Answer

A

A routing technique that can be used to select the server with the shortest BGP path. The same IP address is assigned to several servers, and the path to each of them is advertised using BGP. Routers treat the advertisements as different paths to the same destination and pick the shortest one.
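
A toy sketch of the selection step, assuming a router compares only AS-path length (real BGP route selection applies further rules, such as local preference, before path length). The site names and AS numbers are made up for illustration.

```python
def pick_anycast_route(advertisements):
    """Pick the advertisement with the shortest AS path.

    advertisements: list of (server_site, as_path) tuples, all announcing
    the same anycast IP prefix.
    """
    return min(advertisements, key=lambda ad: len(ad[1]))


# Two sites announce the same address; the router sees two candidate paths.
ads = [
    ("tokyo", [64500, 64501, 64502]),
    ("frankfurt", [64510, 64511]),
]
best_site, best_path = pick_anycast_route(ads)
```

The router never learns that the two paths end at different machines; it only keeps the shorter route to what looks like one destination.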

Question 3

Q

Describe the DNS message format.

Answer

A

ID: the first field; used to match queries with responses
flags: carry several features, such as the message type (query/response) and whether the query is recursive or iterative
questions: information about the query, e.g. the hostname being queried and the query type (A, MX, …)
answers: resource records, if the message is a response
authority: resource records for other authoritative servers
additional: other helpful records
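
The layout above can be sketched with Python's struct module, assuming the standard 12-byte header (ID, flags, and one 16-bit count for each of the four sections). The helper names build_query and parse_header are illustrative, and only two flag bits are decoded here.

```python
import struct

QR_BIT = 0x8000  # set in responses, clear in queries
RD_BIT = 0x0100  # "recursion desired"


def build_query(hostname, query_id=0x1234, qtype=1, recursion_desired=True):
    """Build a DNS query with one question (qtype 1 = A record)."""
    flags = RD_BIT if recursion_desired else 0
    # Header: ID, flags, then question/answer/authority/additional counts.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending with a zero byte.
    labels = hostname.split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # query type, class IN
    return header + question


def parse_header(message):
    """Decode the fixed 12-byte header of a DNS message."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": ident,
        "is_response": bool(flags & QR_BIT),
        "recursion_desired": bool(flags & RD_BIT),
        "questions": qd,
        "answers": an,
        "authority": ns,
        "additional": ar,
    }
```

Calling parse_header(build_query("abc.com")) shows one question and empty answer, authority, and additional sections, as expected for a query.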

Question 4

Q

What are the most common types of resource records?

Answer

A

Type=A: name is a hostname and value is its IP address, e.g. (abc.com, 190.191.192.193, A)
Type=NS: name is a domain name and value is the hostname of an authoritative DNS server that knows how to obtain the IP addresses for hosts in that domain, e.g. (abc.com, dns.abc.com, NS)
Type=CNAME: name is an alias hostname and value is the canonical name, e.g. (abc.com, relay1.dnsserver.abc.com, CNAME)
Type=MX: name is the alias hostname of a mail server and value is the canonical name of that mail server, e.g. (abc.com, mail.dnsserver.abc.com, MX)

Question 5

Q

What is a DNS resource record?

Answer

A

The DNS servers store the mappings between hostnames and IP addresses as resource records (RRs). These resource records are contained inside the DNS reply messages. A DNS resource record has four fields: (name, value, type, TTL). The TTL specifies the time (in seconds) a record should remain in the cache. The name and the value depend on the type of the resource record.

Question 6

Q

What is DNS caching?

Answer

A

The idea of DNS caching is that, in both iterative and recursive queries, after a server receives a DNS reply containing a hostname-to-IP-address mapping, it stores this information in its cache before passing it on to the client. Later queries for the same hostname can then be answered from the cache.
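
A minimal sketch of such a cache, assuming entries expire according to the TTL carried in each resource record. The class name DnsCache and the injectable clock parameter are illustrative choices, not something the lesson prescribes.

```python
import time


class DnsCache:
    """Cache of resource records keyed by (name, type), expiring by TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries = {}   # (name, rtype) -> (value, expires_at)

    def put(self, name, rtype, value, ttl):
        """Store a record for ttl seconds."""
        self._entries[(name, rtype)] = (value, self._clock() + ttl)

    def get(self, name, rtype):
        """Return the cached value, or None if it is absent or expired."""
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(name, rtype)]  # drop the stale record
            return None
        return value
```

A server would call put() when a reply arrives and get() before forwarding a new query; a miss means the query has to go out to the hierarchy again.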

Question 7

Q

What is the difference between iterative and recursive DNS queries?

Answer

A

In the iterative query process, the querying host is referred to a different DNS server in the chain until it can fully resolve the request.

In the recursive query process, the querying host and each DNS server in the chain query the next server and delegate the query to it. The usual pattern is for the first query, from the requesting host to the local DNS server, to be recursive and the remaining queries to be iterative.
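
The difference can be sketched with a toy chain of servers resolving a single hostname. The SERVERS table, the server names, and the 192.0.2.10 documentation address are all invented for illustration; no real DNS traffic is involved.

```python
# Each toy server either knows the answer or refers to the next server.
SERVERS = {
    "root": {"referral": "tld-edu"},
    "tld-edu": {"referral": "auth-someschool"},
    "auth-someschool": {"answer": "192.0.2.10"},
}


def iterative_resolve(start="root"):
    """The querier itself contacts each server in turn, following referrals."""
    contacted = []
    server = start
    while True:
        contacted.append(server)
        reply = SERVERS[server]
        if "answer" in reply:
            return reply["answer"], contacted
        server = reply["referral"]  # querier moves on to the referred server


def recursive_resolve(server="root"):
    """Each server queries the next one itself and relays the answer back."""
    reply = SERVERS[server]
    if "answer" in reply:
        return reply["answer"]
    return recursive_resolve(reply["referral"])  # delegation to the next server
```

Both calls return the same address; what changes is who does the work. In the iterative version the querier talks to all three servers, while in the recursive version it talks only to the first.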

Question 8

Q

What is the structure of the DNS hierarchy? Why does DNS use a hierarchical scheme?

Answer

A

The DNS hierarchy consists of the following types of servers:

Root DNS servers: There are 13 servers, each of which is a network of replicated servers mostly located in North America. As of May 2019, the total number of server instances is 984.

Top level domain (TLD) servers: These servers are responsible for the top level domains such as .com, .org, .edu, etc., and also all of the country top level domains such as .uk, .fr, .jp.

Authoritative servers: An organization’s authoritative DNS server keeps the DNS records that need to be publicly accessible, such as the domain name to IP address mappings for the web servers and mail servers of that organization.

Local DNS servers: Even though this type of server is not considered to strictly belong to the DNS hierarchy, it is central to the overall DNS architecture. Each Internet Service Provider (ISP), such as a university, a company, or a small residential ISP, has one or more local DNS servers. Hosts that connect to an ISP are provided with the IP addresses of one or more local DNS servers. So, when a host makes a DNS query, the query is sent to the provided local DNS server, which in turn acts as a proxy and forwards the query into the DNS hierarchy.

Why we need it:

Because the centralized model would have the following problems:

Single point of failure
Concurrent traffic handling is difficult
Geographic distance would add to latency

Question 9

Q

What are the services offered by DNS, apart from hostname resolution?

Answer

A

Mail server/Host aliasing: Email servers need simple, mnemonic names, e.g. @hotmail.com. However, the canonical hostname can be difficult to remember, e.g. relay2.west-coast.hotmail.com. DNS is used to get the canonical hostname (and IP address) for an alias hostname. Also, a host can have one or more names. If there are two hostnames, this is usually a combination of a canonical and a mnemonic hostname. DNS can be used to find the canonical hostname for a given host and also obtain an IP address for that host.
Load distribution: Busy websites may be replicated over multiple servers. When a client makes a DNS query, the DNS server responds with the entire set of addresses but rotates the address ordering with each reply. This helps in distributing the traffic across servers.
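
The rotation used for load distribution can be sketched as follows, assuming the server simply shifts the address list by one position per reply. The class name RoundRobinRecordSet and the 192.0.2.x documentation addresses are illustrative.

```python
from collections import deque


class RoundRobinRecordSet:
    """Returns the full address set, rotating the order with each reply."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        reply = list(self._addresses)
        # Shift left so the next reply starts with the following address.
        self._addresses.rotate(-1)
        return reply


records = RoundRobinRecordSet(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
first = records.answer()
second = records.answer()
```

Since clients typically try the first address in the reply, successive clients end up on different replicas.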

Question 10

Q

What are the main steps that a host takes to use DNS?

Answer

A

The user host runs the client side of the DNS application.
The browser extracts the hostname www.someschool.edu and passes it to the client side of the DNS application.
The DNS client sends a query containing the hostname to a DNS server.
The DNS client eventually receives a reply, which includes the IP address for the hostname.
As soon as the host receives the IP address, it can initiate a TCP connection to the HTTP server located at the HTTP port (port 80 by default) at that IP address.

Question 11

Q

Why would a centralized design with a single DNS server not work?

Answer

A

Single point of failure
Concurrent traffic handling is difficult
Geographic distance would add to latency

Question 12

Q

What is consistent hashing? How does it work?

Answer

A

The main idea behind consistent hashing is that servers and content objects are mapped to the same ID space. The successor server of an object (the first server whose ID follows the object's ID) is responsible for serving it, and whenever the immediate successor is down, the next available one is used. The aim is to reduce the amount of remapping required when the set of servers changes, i.e. when servers are added or removed.
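
A minimal sketch of a consistent-hash ring, assuming a 32-bit ID space derived from SHA-1 and one ring position per server (production systems often place several virtual nodes per server). The class name ConsistentHashRing and the server and object names are illustrative.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 32-bit ID shared by servers and objects."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:4], "big")


class ConsistentHashRing:
    def __init__(self, servers=()):
        self._ring = []  # sorted list of (id, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        bisect.insort(self._ring, (_hash(server), server))

    def remove(self, server):
        self._ring.remove((_hash(server), server))

    def lookup(self, obj):
        """Return the successor server: the first server ID at or after the object's ID."""
        if not self._ring:
            raise LookupError("no servers on the ring")
        idx = bisect.bisect_left(self._ring, (_hash(obj), ""))
        # Past the largest server ID, wrap around to the start of the ring.
        return self._ring[idx % len(self._ring)][1]
```

Removing a server only moves the objects that server was responsible for; every other object keeps its successor, which is the property that limits remapping.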

Question 13

Q

What are the strategies for server selection? What are the limitations of these strategies?

Answer

A

DNS
HTTP redirection
IP Anycast

Question 14

Q

What is the drawback to using the traditional approach of having a single, publicly accessible web server?

Answer

A

Single point of failure
Unable to handle high traffic concurrently
Expensive due to geographical location

Question 15

Q

What is a CDN?

Answer

A

A content distribution network. Networks of multiple, geographically distributed servers and/or data centers, with copies of content that direct users to a server or server cluster that can best serve the user’s request.

Question 16

Q

What are the six major challenges that Internet applications face?

Answer

A

peering point congestion
inefficient routing protocols
unreliable networks
inefficient communication protocols
scalability
application limitations and slow rate of change adoption

Question 17

Q

What are the major shifts that have impacted the evolution of the Internet ecosystem?

Answer

A

the Internet has evolved into a large-scale content delivery network, driven by the increased demand for online content
topological flattening: IXPs have become popular and provide a large number of services, shifting the topology from a tiered ISP hierarchy to a flatter, IXP-heavy one

Question 18

Q

Compare the “enter deep” and “bring home” approach of CDN server placement.

Answer

A

enter deep: CDNs place many smaller server clusters “deep” into the access networks around the world. This provides lower latency and higher throughput to the user. However, it is more difficult to manage and maintain due to the large number of clusters.
bring home: CDNs place fewer, larger server clusters at key points (typically at IXPs and not in access networks). This is easier to manage, at the cost of higher latency and lower throughput for the end user compared to enter deep.

Question 19

Q

What is the role of DNS in the way CDN operates?

Answer

A

The DNS query for the content's hostname is resolved by the CDN, which uses it to select the server cluster, and eventually the server, that will serve the requesting host.

Question 20

Q

What are the two main steps in CDN server selection?

Answer

A

mapping the client to a cluster
selecting a server from the cluster

Question 21

Q

What is the simplest approach to select a cluster? What are the limitations of this approach?

Answer

A

geographically closest:

it can be difficult to determine the closest cluster, since the CDN usually interacts with the client's local DNS (LDNS) server, which may be located elsewhere
the closest cluster may not give the best end-to-end network performance, e.g. due to load or routing inefficiencies

Question 22

Q

What metrics could be considered when using measurements to select a cluster?

Answer

A

network layer metrics: delay, available bandwidth
application layer metrics: re-buffering ratio, average bitrate, page load times

Question 23

Q

How are the metrics for cluster selection obtained?

Answer

A

active: done through probing
passive: grouping subnets of clients together and collecting the performance metrics based on actual requests served

Question 24

Q

Explain the distributed system that uses a 2-layered system. What are the challenges of this system?

Answer

A

a coarse-grained global layer that operates at larger time scales (a few tens of seconds or minutes). This layer has a global view of client quality measurements. It builds a data-driven prediction model of video quality.
a fine-grained per-client decision layer that operates at the millisecond timescale. It makes the actual decisions upon a client request, based on the latest (but possibly stale) pre-computed global model and up-to-date per-client state.

challenges:

requires a centralized controller
needs data for different subnet-cluster pairs. Thus, some of the clients deliberately need to be routed to sub-optimal clusters
