Lesson 12: Applications: CDNs and Overlay Networks Flashcards

1
Q

What is HTTP Redirection?

A

A mechanism within HTTP for sending a client to a different URL. In response to a client request, the server sends a redirect response with a 3xx status code and a new address (in the Location header) for the client to use.
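
A minimal sketch of how a client might follow a redirect, using an in-memory table instead of a real network. The URLs, the `FAKE_WEB` table, and the `fetch` helper are hypothetical and exist only for illustration.

```python
# Hypothetical in-memory "web": URL -> (status code, headers, body).
FAKE_WEB = {
    "http://example.com/video": (302, {"Location": "http://cdn1.example.net/video"}, ""),
    "http://cdn1.example.net/video": (200, {}, "video bytes"),
}

def fetch(url):
    """Stand-in for sending an HTTP GET; returns (status, headers, body)."""
    return FAKE_WEB[url]

def follow_redirects(url, max_hops=5):
    """Follow 3xx responses via the Location header, up to max_hops times."""
    for _ in range(max_hops + 1):
        status, headers, body = fetch(url)
        if 300 <= status < 400 and "Location" in headers:
            url = headers["Location"]  # the server told us where to go next
            continue
        return url, status, body
    raise RuntimeError("too many redirects")

final_url, status, body = follow_redirects("http://example.com/video")
print(final_url, status)
```

Each redirect costs the client an extra round trip, which is why the hop limit matters in practice.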

2
Q

What is IP Anycast?

A

A routing technique that can be used to direct a client to the server with the shortest BGP path. The same IP address is assigned to multiple servers, and each server location advertises that address using BGP. Routers treat the advertisements as alternative paths to a single destination and pick the shortest one.
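
A rough sketch of the selection step, assuming the router compares only AS-path length. Real BGP route selection also weighs other attributes such as local preference. The site names and AS numbers are made up.

```python
# Hypothetical BGP advertisements for one anycast prefix as seen by a router:
# each entry is (server site, AS path to that site).
ADVERTISEMENTS = [
    ("site-atlanta", ["AS100", "AS200", "AS300"]),
    ("site-frankfurt", ["AS100", "AS400"]),
    ("site-tokyo", ["AS100", "AS500", "AS600", "AS700"]),
]

def pick_route(advertisements):
    """Pick the advertisement with the fewest AS hops (ties: first seen)."""
    return min(advertisements, key=lambda adv: len(adv[1]))

site, path = pick_route(ADVERTISEMENTS)
print(site, len(path))
```

The router never learns that the three sites are different machines; it only sees three routes to the same address.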

3
Q

Describe the DNS message format.

A
  • ID: the first field, a 16-bit identifier used to match responses to queries
  • flags: carry several features, such as the message type (query/response) and whether recursion is requested (recursive/iterative)
  • questions: information about the query, e.g. the hostname being queried and the query type (A, MX, …)
  • answer: resource records for the queried name, present if the message is a response
  • authority: resource records for other authoritative servers
  • additional: other helpful records
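
The layout above can be sketched with Python's `struct` module. This is a simplified illustration, not a full DNS implementation: it builds a query with a single question and parses only the 12-byte header. The helper names are hypothetical.

```python
import struct

QR_RESPONSE = 0x8000   # flag bit: 0 = query, 1 = response
RD_RECURSION = 0x0100  # flag bit: recursion desired

def encode_name(hostname):
    """Encode a hostname as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in hostname.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def build_query(query_id, hostname, qtype=1, recursive=True):
    """Build a DNS query: 12-byte header plus one question (qtype 1 = A)."""
    flags = RD_RECURSION if recursive else 0
    # Header: ID, flags, then question/answer/authority/additional counts.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(hostname) + struct.pack("!HH", qtype, 1)  # class 1 = IN
    return header + question

def parse_header(message):
    """Unpack the six 16-bit header fields into a dict."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return {"id": ident, "is_response": bool(flags & QR_RESPONSE),
            "recursion_desired": bool(flags & RD_RECURSION),
            "questions": qd, "answers": an, "authority": ns, "additional": ar}

msg = build_query(0x1234, "abc.com")
hdr = parse_header(msg)
print(hdr)
```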
4
Q

What are the most common types of resource records?

A
  • Type=A: the name is a hostname and the value is its IP address, e.g. (abc.com, 190.191.192.193, A)
  • Type=NS: the name is a domain name and the value is the hostname of an authoritative DNS server that knows how to obtain the IP addresses for hosts in that domain, e.g. (abc.com, dns.abc.com, NS)
  • Type=CNAME: the name is an alias hostname and the value is the canonical name, e.g. (abc.com, relay1.dnsserver.abc.com, CNAME)
  • Type=MX: the name is the alias hostname of a mail server and the value is the canonical name of that mail server, e.g. (abc.com, mail.dnsserver.abc.com, MX)
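
A small sketch of how these record types could be stored and queried, including following a CNAME alias to an A record. The zone data, the `www.abc.com` alias, and the TTL values are hypothetical.

```python
from collections import namedtuple

# Field order follows the (name, value, type) tuples above, plus a TTL in seconds.
ResourceRecord = namedtuple("ResourceRecord", ["name", "value", "type", "ttl"])

# Hypothetical zone data for illustration only.
RECORDS = [
    ResourceRecord("www.abc.com", "relay1.dnsserver.abc.com", "CNAME", 300),
    ResourceRecord("relay1.dnsserver.abc.com", "190.191.192.193", "A", 300),
    ResourceRecord("abc.com", "mail.dnsserver.abc.com", "MX", 3600),
    ResourceRecord("abc.com", "dns.abc.com", "NS", 86400),
]

def lookup(name, rtype):
    """Return the values of all records matching name and type."""
    return [r.value for r in RECORDS if r.name == name and r.type == rtype]

def resolve_address(name, max_depth=5):
    """Resolve a name to IP addresses, following CNAME aliases if needed."""
    for _ in range(max_depth):
        addresses = lookup(name, "A")
        if addresses:
            return addresses
        aliases = lookup(name, "CNAME")
        if not aliases:
            return []
        name = aliases[0]  # continue with the canonical name
    return []

print(resolve_address("www.abc.com"))
print(lookup("abc.com", "MX"))
```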
5
Q

What is a DNS resource record?

A

The DNS servers store the mappings between hostnames and IP addresses as resource records (RRs). These resource records are carried inside DNS reply messages. A DNS resource record has four fields: (name, value, type, TTL). The TTL specifies the time (in seconds) a record should remain in the cache. The meaning of the name and the value depends on the type of the resource record.

6
Q

What is DNS caching?

A

The idea of DNS caching is that, in both iterative and recursive queries, after a server receives a DNS reply containing a hostname-to-IP-address mapping, it stores this mapping in its cache before passing it on to the client. Later queries for the same hostname can then be answered from the cache.
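
A minimal sketch of such a cache, assuming entries expire after a per-record TTL in seconds. The `DnsCache` class and its injectable clock are illustrative choices, not part of any DNS library.

```python
import time

class DnsCache:
    """Cache hostname -> IP mappings until their TTL expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # hostname -> (ip, expiry time)

    def put(self, hostname, ip, ttl):
        self._entries[hostname] = (ip, self._clock() + ttl)

    def get(self, hostname):
        entry = self._entries.get(hostname)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[hostname]  # stale: drop and report a miss
            return None
        return ip

# Usage with a fake clock so the example does not need to sleep.
now = [0.0]
cache = DnsCache(clock=lambda: now[0])
cache.put("abc.com", "190.191.192.193", ttl=60)
hit = cache.get("abc.com")
now[0] = 61.0
miss = cache.get("abc.com")
print(hit, miss)
```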

7
Q

What is the difference between iterative and recursive DNS queries?

A

In the iterative query process, the querying host is referred to a different DNS server in the chain, and queries each one in turn, until the request is fully resolved.

In the recursive query process, each DNS server in the chain queries the next server on behalf of the querier, delegating the query to it. The usual pattern is for the first query from the requesting host to the local DNS server to be recursive, and the remaining queries to be iterative.
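
The difference is mainly in who sends each query. The toy simulation below records (sender, receiver) pairs for both styles. The server names and the answer address are hypothetical.

```python
# Hypothetical hierarchy: each server either answers or refers to another server.
SERVERS = {
    "root": {"referral": "tld-edu"},
    "tld-edu": {"referral": "auth-someschool"},
    "auth-someschool": {"answer": "192.0.2.10"},
}

def iterative_resolve(querier="local-dns", start="root"):
    """The querier itself contacts each server in turn, following referrals."""
    queries, server = [], start
    while True:
        queries.append((querier, server))
        reply = SERVERS[server]
        if "answer" in reply:
            return reply["answer"], queries
        server = reply["referral"]

def recursive_resolve(querier="local-dns", server="root"):
    """Each contacted server queries the next one on the querier's behalf."""
    reply = SERVERS[server]
    if "answer" in reply:
        return reply["answer"], [(querier, server)]
    # The contacted server takes over and queries the next server itself.
    answer, rest = recursive_resolve(querier=server, server=reply["referral"])
    return answer, [(querier, server)] + rest

iter_answer, iter_queries = iterative_resolve()
rec_answer, rec_queries = recursive_resolve()
print(iter_queries)
print(rec_queries)
```

In the iterative run every query originates from `local-dns`; in the recursive run the sender changes at each hop.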

8
Q

What is the structure of the DNS hierarchy? Why does DNS use a hierarchical scheme?

A

The DNS hierarchy consists of the following types of servers:

Root DNS servers: There are 13 servers, each of which is a network of replicated servers mostly located in North America. As of May 2019, the total number of server instances is 984.

Top level domain (TLD) Servers: These servers are responsible for the top level domains such as .com, .org, .edu, etc and also all of the country top level domains such as .uk, .fr, .jp.

Authoritative servers: An organization’s authoritative DNS server keeps the DNS records that need to be publicly accessible, such as the domain name to IP address mappings for the web servers and mail servers of that organization.

Local DNS servers: Even though this type of server is not considered to belong strictly to the DNS hierarchy, it is central to the overall DNS architecture. Each Internet Service Provider (ISP), such as a university, a company or a small residential ISP, has one or more local DNS servers. Hosts that connect to an ISP are provided with the IP addresses of one or more local DNS servers. So, when a host makes a DNS query, the query is sent to the provided local DNS server, which in turn acts as a proxy and forwards the query into the DNS hierarchy.

Why we need it:

Because the centralized model would have the following problems:

  1. Single point of failure
  2. Concurrent traffic handling is difficult
  3. Geographic distance would add to latency
9
Q

What are the services offered by DNS, apart from hostname resolution?

A
  1. Mail server/Host aliasing: Email servers should have simple, mnemonic names, e.g. @hotmail.com. However, the canonical hostname can be difficult to remember, e.g. relay2.west-coast.hotmail.com. DNS is used to get the canonical hostname (and IP address) for an alias hostname. Also, a host can have one or more names. If there are two hostnames, they are usually a canonical hostname and a mnemonic alias. DNS can be used to find the canonical hostname for a given alias and also obtain an IP address for that host.
  2. Load distribution: Busy websites may be replicated over multiple servers. When a client makes a DNS query, the DNS server responds with the entire set of addresses but rotates the address ordering with each reply. This helps in distributing the traffic across servers.
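
The rotation described under load distribution can be sketched as follows. The `RotatingAnswer` class and the addresses are hypothetical; the point is only that each reply contains the full set in a shifted order.

```python
from collections import deque

class RotatingAnswer:
    """Return the full address set, rotated by one position on each reply."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def reply(self):
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # next reply starts with the next address
        return answer

dns = RotatingAnswer(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
first, second = dns.reply(), dns.reply()
print(first)
print(second)
```

Clients typically contact the first address in the reply, so rotating the order spreads requests across the replicas.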
10
Q

What are the main steps that a host takes to use DNS?

A
  1. The user host runs the client side of the DNS application
  2. The browser extracts the hostname www.someschool.edu and passes it to the client side of the DNS application.
  3. The DNS client sends a query containing the hostname to a DNS server
  4. The DNS client eventually receives a reply which includes the IP address for the hostname
  5. As soon as the host receives the IP address, it can initiate a TCP connection to the HTTP server process located at port 80 at that IP address
11
Q

Why would a centralized design with a single DNS server not work?

A
  1. Single point of failure
  2. Concurrent traffic handling is difficult
  3. Geographic distance would add to latency
12
Q

What is consistent hashing? How does it work?

A

The main idea behind consistent hashing is that servers and content objects are mapped to the same ID space. The successor server of an object (the first server at or after the object's ID) is responsible for serving it, and whenever the immediate successor is down, the next available one is used. This reduces the amount of remapping required when the set of servers changes, i.e. when servers are added or removed.
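
A minimal sketch of a consistent hash ring, assuming SHA-1 as the hash function and a 32-bit ID space. Both are illustrative choices, and the server and object names are made up.

```python
import bisect
import hashlib

def ring_id(key, bits=32):
    """Map a string to a position on a ring of size 2**bits."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

class ConsistentHashRing:
    def __init__(self, servers=()):
        self._ring = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        bisect.insort(self._ring, (ring_id(server), server))

    def remove(self, server):
        self._ring.remove((ring_id(server), server))

    def lookup(self, obj):
        """Return the successor: first server at or after the object's ID."""
        if not self._ring:
            raise LookupError("no servers available")
        idx = bisect.bisect_left(self._ring, (ring_id(obj), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring

ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
objects = [f"object-{i}" for i in range(100)]
before = {o: ring.lookup(o) for o in objects}
ring.remove("server-b")  # simulate a server going down
after = {o: ring.lookup(o) for o in objects}
moved = [o for o in objects if before[o] != after[o]]
print(len(moved), "objects remapped")
```

Only objects that were served by the removed server are remapped; every other object keeps its server.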

13
Q

What are the strategies for server selection? What are the limitations of these strategies?

A
  • DNS
  • HTTP redirection
  • IP Anycast
14
Q

What is the drawback to using the traditional approach of having a single, publicly accessible web server?

A
  • Single point of failure
  • Unable to handle high traffic concurrently
  • Expensive and slow to serve users who are geographically far from the server
15
Q

What is a CDN?

A

A content distribution network (CDN) is a network of multiple, geographically distributed servers and/or data centers that hold copies of content and direct users to the server or server cluster that can best serve each request.

16
Q

What are the six major challenges that Internet applications face?

A
  • peering point congestion
  • inefficient routing protocols
  • unreliable networks
  • inefficient communication protocols
  • scalability
  • application limitations and slow rate of change of adoption
17
Q

What are the major shifts that have impacted the evolution of the Internet ecosystem?

A
  • the internet has evolved into a large scale content delivery network with the increased demand for online content
  • topological flattening: IXPs have become popular and offer a large number of services, shifting the topology from a tiered hierarchy of ISPs toward a flatter, IXP-heavy one
18
Q

Compare the “enter deep” and “bring home” approach of CDN server placement.

A
  • enter deep: CDNs place many smaller server clusters “deep” into the access networks around the world. This has the benefit of providing lower latency and higher throughput to the user. However, it is more difficult to manage and maintain due to the large number of clusters
  • bring home: CDNs place fewer, larger server clusters at key points (typically at IXPs and not in access networks). This is easier to manage, at the cost of higher latency and lower throughput for the end user compared to enter deep
19
Q

What is the role of DNS in the way CDN operates?

A

DNS is used to direct a host to a CDN server: resolving the hostname of the requested content selects the server cluster, and eventually the server, that will serve the host's request

20
Q

What are the two main steps in CDN server selection?

A
  • mapping the client to a cluster
  • selecting a server from the cluster

21
Q

What is the simplest approach to select a cluster? What are the limitations of this approach?

A

geographically closest:

  • it can be difficult to determine the closest cluster, since the CDN usually interacts with the client's local DNS (LDNS) server rather than the client itself, and the LDNS may be located elsewhere
  • the closest cluster may not give the best end-to-end network performance, e.g. due to load or routing inefficiencies
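
A rough sketch of the first limitation, assuming the CDN picks the cluster with the smallest great-circle distance to whatever location it observes. The cluster names and coordinates are approximate and purely illustrative.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

# Hypothetical cluster locations as (latitude, longitude).
CLUSTERS = {
    "new-york": (40.71, -74.01),
    "london": (51.51, -0.13),
    "tokyo": (35.68, 139.69),
}

def closest_cluster(observed_location):
    """Pick the cluster nearest to the location the CDN can observe."""
    return min(CLUSTERS, key=lambda name: haversine_km(observed_location, CLUSTERS[name]))

client_location = (40.73, -73.99)  # client actually near New York
ldns_location = (48.86, 2.35)      # but its LDNS server is near Paris

print(closest_cluster(ldns_location))    # what the CDN picks from the LDNS
print(closest_cluster(client_location))  # what it would pick from the client
```

Because the CDN sees the LDNS rather than the client, the two calls can disagree, which is the mismatch described in the first bullet.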
22
Q

What metrics could be considered when using measurements to select a cluster?

A
  • network layer metrics: delay, available bandwidth
  • application layer metrics: re-buffering ratio, average bitrate, page load times

23
Q

How are the metrics for cluster selection obtained?

A
  • active: done through probing
  • passive: grouping subnets of clients together and collecting the performance metrics based on actual requests served
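
The passive approach can be sketched by grouping observed requests by client subnet and cluster, then averaging a metric. The /24 grouping, the page-load-time metric, and the sample data are assumptions for illustration.

```python
import ipaddress
from collections import defaultdict
from statistics import mean

# Hypothetical passive measurements: (client IP, cluster, page load time in ms).
MEASUREMENTS = [
    ("198.51.100.7", "cluster-east", 120),
    ("198.51.100.42", "cluster-east", 140),
    ("198.51.100.9", "cluster-west", 300),
    ("203.0.113.5", "cluster-west", 90),
]

def subnet_of(ip, prefix_len=24):
    """Return the enclosing subnet of an IP address as a string."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))

def aggregate(measurements):
    """Average the metric for each (subnet, cluster) pair."""
    samples = defaultdict(list)
    for ip, cluster, load_ms in measurements:
        samples[(subnet_of(ip), cluster)].append(load_ms)
    return {key: mean(values) for key, values in samples.items()}

def best_cluster(stats, subnet):
    """Pick the cluster with the lowest average load time for a subnet."""
    candidates = {c: v for (s, c), v in stats.items() if s == subnet}
    return min(candidates, key=candidates.get)

stats = aggregate(MEASUREMENTS)
print(best_cluster(stats, "198.51.100.0/24"))
```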

24
Q

Explain the distributed system that uses a 2-layered system. What are the challenges of this system?

A
  • a coarse grained global layer operates at larger time scales (few tens of seconds or minutes). This layer has a global view of client quality measurements. It builds a data-driven prediction model of video quality
  • a fine grained per-client decision layer operates at the millisecond timescale. It makes the actual decisions upon a client request, based on the latest (but possibly stale) pre-computed global model and up-to-date per-client state.

challenges:

  • requires a centralized controller
  • needs data for different subnet-cluster pairs. Thus, some of the clients deliberately need to be routed to sub-optimal clusters
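
A simplified sketch of the two layers, not the actual system. It assumes the global model is an average quality score per (subnet, cluster) pair, uses current cluster load as a stand-in for up-to-date state, and uses an epsilon-greedy rule to route a fraction of requests to arbitrary clusters so that data exists for every pair. All names and numbers are hypothetical.

```python
import random

def build_global_model(measurements):
    """Coarse-grained layer: average quality score per (subnet, cluster)."""
    totals = {}
    for subnet, cluster, score in measurements:
        s, n = totals.get((subnet, cluster), (0.0, 0))
        totals[(subnet, cluster)] = (s + score, n + 1)
    return {key: s / n for key, (s, n) in totals.items()}

def choose_cluster(model, subnet, clusters, load, epsilon=0.1, rng=random):
    """Fine-grained layer: per-request decision from the (possibly stale)
    model plus fresh state (here, cluster load as a fraction of capacity)."""
    if rng.random() < epsilon:
        return rng.choice(clusters)  # explore: deliberately gather data
    available = [c for c in clusters if load.get(c, 0.0) < 1.0]  # skip full clusters
    return max(available or clusters, key=lambda c: model.get((subnet, c), 0.0))

model = build_global_model([
    ("subnet-1", "cluster-a", 0.9),
    ("subnet-1", "cluster-a", 0.7),
    ("subnet-1", "cluster-b", 0.5),
])
clusters = ["cluster-a", "cluster-b"]
normal = choose_cluster(model, "subnet-1", clusters, load={}, epsilon=0.0)
overloaded = choose_cluster(model, "subnet-1", clusters, load={"cluster-a": 1.0}, epsilon=0.0)
print(normal, overloaded)
```

Setting `epsilon` above zero trades some per-request quality for coverage, which mirrors the second challenge listed above.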