Lesson 12: Applications: CDNs and Overlay Networks Flashcards
What is HTTP Redirection?
A mechanism built into HTTP for sending a client to a different address. In response to a client request, the server may send a redirect response with a 3xx status code and a new address for the client to use.
What is IP Anycast?
A routing technique that can be used to select the server with the shortest BGP path. The same IP address is assigned to multiple servers, and each server's path is advertised using BGP. Routers treat these advertisements as alternative paths to a single destination and pick the shortest one.
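The route choice described above can be sketched as a toy in Python. This is a simplification that compares AS-path length only; real BGP route selection applies several other criteria as well. The prefix, router names, and AS numbers are made-up illustration values.

```python
# Two advertisements for the same anycast prefix, learned from different neighbors.
# All values are hypothetical and use documentation address/AS ranges.
ANYCAST_PREFIX = "198.51.100.0/24"

ADVERTISEMENTS = [
    {"prefix": ANYCAST_PREFIX, "next_hop": "router-east", "as_path": [64500, 64505, 64510]},
    {"prefix": ANYCAST_PREFIX, "next_hop": "router-west", "as_path": [64501, 64510]},
]


def pick_route(advertisements):
    """Return the advertisement with the fewest AS hops (first one wins a tie)."""
    return min(advertisements, key=lambda ad: len(ad["as_path"]))


best = pick_route(ADVERTISEMENTS)
print(best["next_hop"])
```

The router has no idea that the two paths lead to different physical servers; it only sees two routes to one prefix.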
Describe the DNS message format.
- first field is an ID: can be used for tracking queries/responses
- flags: can be used for multiple features such as specifying message type (query/response), query format (recursive/iterative)
- questions: information about the query, e.g. the hostname being queried and the query type (A, MX, …)
- answer: resource records, if the message is a response
- authority: resource records that point to authoritative servers
- additional: other helpful records
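The ID and flags fields listed above can be illustrated with a minimal Python sketch that packs and unpacks the fixed 12-byte header. The bit positions follow RFC 1035 (QR is the top bit of the flags word, RD is 0x0100), and the four trailing 16-bit counts say how many entries each of the four sections holds. The function names are hypothetical, and the sketch does not encode the question section itself.

```python
import struct

QR_RESPONSE = 0x8000        # set in responses, clear in queries
RD_RECURSION_DESIRED = 0x0100  # set when the client asks for recursion

HEADER_FORMAT = "!HHHHHH"   # six unsigned 16-bit fields, network byte order


def build_query_header(query_id, recursion_desired=True):
    """Pack a header for a query that carries exactly one question."""
    flags = RD_RECURSION_DESIRED if recursion_desired else 0
    # Fields: ID, flags, question count, answer count, authority count, additional count
    return struct.pack(HEADER_FORMAT, query_id, flags, 1, 0, 0, 0)


def parse_header(data):
    """Unpack the first 12 bytes of a DNS message into a dict."""
    query_id, flags, qd, an, ns, ar = struct.unpack(HEADER_FORMAT, data[:12])
    return {
        "id": query_id,
        "is_response": bool(flags & QR_RESPONSE),
        "recursion_desired": bool(flags & RD_RECURSION_DESIRED),
        "questions": qd,
        "answers": an,
        "authority": ns,
        "additional": ar,
    }


header = build_query_header(0x1234)
print(parse_header(header))
```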
What are the most common types of resource records?
- Type=A: the name is a hostname and the value is the IP address of that hostname, e.g. (abc.com, 190.191.192.193, A)
- Type=NS: the name is a domain name and the value is the authoritative DNS server that can obtain the IP addresses for hosts in that domain, e.g. (abc.com, dns.abc.com, NS)
- Type=CNAME: the name is an alias hostname and the value is the canonical name, e.g. (abc.com, relay1.dnsserver.abc.com, CNAME)
- Type=MX: the name is the alias hostname of a mail server and the value is the canonical name of that mail server, e.g. (abc.com, mail.dnsserver.abc.com, MX)
What is a DNS resource record?
DNS servers store the mappings between hostnames and IP addresses as resource records (RRs). These resource records are contained inside DNS reply messages. A DNS resource record has four fields: (name, value, type, TTL). The TTL specifies the time (in seconds) a record should remain in the cache. The name and the value depend on the type of the resource record.
What is DNS caching?
The idea of DNS caching is that, in both iterative and recursive queries, when a server receives a DNS reply containing a hostname-to-IP-address mapping, it stores that mapping in its cache before passing the reply on toward the client. Later queries for the same hostname can then be answered from the cache until the record's TTL expires.
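A minimal sketch of such a cache in Python, assuming entries are keyed by (name, type) and expire after their TTL. The class name `DnsCache` and the injectable `clock` parameter are hypothetical choices made so the expiry behavior is easy to demonstrate.

```python
import time


class DnsCache:
    """Toy resource-record cache that honors TTLs."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rtype) -> (value, expiry time)

    def put(self, name, rtype, value, ttl_seconds):
        """Store a record and remember when it stops being valid."""
        self._entries[(name, rtype)] = (value, self._clock() + ttl_seconds)

    def get(self, name, rtype):
        """Return the cached value, or None if it is absent or expired."""
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(name, rtype)]  # drop the stale record
            return None
        return value


cache = DnsCache()
cache.put("abc.com", "A", "190.191.192.193", ttl_seconds=60)
print(cache.get("abc.com", "A"))
```

A real resolver would fall back to querying the hierarchy whenever `get` returns None.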
What is the difference between iterative and recursive DNS queries?
In the iterative query process, the querying host is referred to a different DNS server in the chain at each step, until it can fully resolve the request.
In the recursive query process, by contrast, the querying host and each DNS server in the chain query the next server and delegate the query to it. The usual pattern is for the first query, from the requesting host to the local DNS server, to be recursive and for the remaining queries to be iterative.
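The difference can be sketched with a toy model in Python, where each "server" is a dictionary that either knows the answer or refers the query to another server. The server names, the zone data, and the address 192.0.2.10 are invented for illustration; no real DNS traffic is involved.

```python
# Hypothetical toy zone data: each server holds answers and/or referrals.
SERVERS = {
    "root": {"referrals": {"edu": "tld-edu"}},
    "tld-edu": {"referrals": {"someschool.edu": "dns.someschool.edu"}},
    "dns.someschool.edu": {"answers": {"www.someschool.edu": "192.0.2.10"}},
}


def query(server, hostname):
    """Ask one server. Returns ("answer", ip) or ("referral", next_server)."""
    data = SERVERS[server]
    if hostname in data.get("answers", {}):
        return ("answer", data["answers"][hostname])
    for suffix, next_server in data.get("referrals", {}).items():
        if hostname == suffix or hostname.endswith("." + suffix):
            return ("referral", next_server)
    raise LookupError(f"{server} cannot help with {hostname}")


def resolve_iterative(hostname, start="root"):
    """The caller follows every referral itself, contacting each server in turn."""
    server, contacted = start, [start]
    while True:
        kind, value = query(server, hostname)
        if kind == "answer":
            return value, contacted
        server = value
        contacted.append(server)


def resolve_recursive(hostname, server="root"):
    """Each server takes over the query and asks the next server on the caller's behalf."""
    kind, value = query(server, hostname)
    if kind == "answer":
        return value
    return resolve_recursive(hostname, value)


print(resolve_iterative("www.someschool.edu"))
print(resolve_recursive("www.someschool.edu"))
```

Both functions reach the same answer; what changes is who does the work of contacting the next server.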
What is the structure of the DNS hierarchy? Why does DNS use a hierarchical scheme?
The DNS hierarchy consists of the following types of servers:
Root DNS servers: There are 13 servers, each of which is a network of replicated servers mostly located in North America. As of May 2019, the total number of server instances is 984.
Top-level domain (TLD) servers: These servers are responsible for the top-level domains such as .com, .org, and .edu, and also for all of the country top-level domains such as .uk, .fr, and .jp.
Authoritative servers: An organization's authoritative DNS server keeps the DNS records that need to be publicly accessible, such as the domain name to IP address mappings for the web servers and mail servers of that organization.
Local DNS servers: Although this type of server is not considered to belong strictly to the DNS hierarchy, it is central to the overall DNS architecture. Each Internet Service Provider (ISP), such as a university, a company, or a small residential ISP, has one or more local DNS servers. Hosts that connect to an ISP are provided with the IP addresses of one or more local DNS servers. When a host makes a DNS query, the query is sent to the provided local DNS server, which acts as a proxy and forwards the query into the DNS hierarchy.
Why we need it:
Because the centralized model would have the following problems:
- Single point of failure
- Concurrent traffic handling is difficult
- Geographic distance would add to latency
What are the services offered by DNS, apart from hostname resolution?
- Mail server/host aliasing: Email servers should have simple, mnemonic names, e.g. @hotmail.com. However, the canonical hostname can be hard to remember, e.g. relay2.west-coast.hotmail.com. DNS is used to get the canonical hostname (and IP address) for an alias hostname. Likewise, a host can have more than one name, usually a canonical hostname plus a mnemonic alias; DNS can be used to find the canonical hostname for a given alias and to obtain the IP address of that host.
- Load distribution: Busy websites may be replicated over multiple servers. When a client makes a DNS query, the DNS server responds with the entire set of addresses but rotates the address ordering with each reply. This helps in distributing the traffic across servers.
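The rotation used for load distribution can be sketched in a few lines of Python. This is an illustrative toy, not how any particular DNS server implements it; the class name `RotatingRecordSet` and the addresses are hypothetical.

```python
from collections import deque


class RotatingRecordSet:
    """Returns the full address set on every reply, rotated by one each time."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        reply = list(self._addresses)
        self._addresses.rotate(-1)  # move the first address to the end
        return reply


records = RotatingRecordSet(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
for _ in range(3):
    print(records.answer())
```

Because clients commonly try the first address in the reply, rotating the order spreads new connections across the replicas.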
What are the main steps that a host takes to use DNS?
- The user host runs the client side of the DNS application
- The browser extracts the hostname, e.g. www.someschool.edu, and passes it to the client side of the DNS application
- The DNS client sends a query containing the hostname to a DNS server
- The DNS client eventually receives a reply that includes the IP address for the hostname
- As soon as the host receives the IP address, it can initiate a TCP connection to the HTTP server at the HTTP port (typically 80) at that IP address
Why would a centralized design with a single DNS server not work?
- Single point of failure
- Concurrent traffic handling is difficult
- Geographic distance would add to latency
What is consistent hashing? How does it work?
The main idea behind consistent hashing is that servers and content objects are mapped to the same ID space. The successor server to an object's ID is responsible for serving that object; if the immediate successor is down, the next available server is used. The goal is to reduce the amount of remapping required when the set of servers changes, i.e. when servers are added or removed.
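A minimal sketch of the successor rule in Python, assuming SHA-1 truncated to a 32-bit ID space and one ring position per server (no virtual nodes). The names `ring_id` and `HashRing`, and the server and object names, are hypothetical.

```python
import bisect
import hashlib


def ring_id(key, bits=32):
    """Map a server name or object name into the shared ID space."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class HashRing:
    def __init__(self, servers=()):
        self._ids = []    # sorted server IDs
        self._by_id = {}  # server ID -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        sid = ring_id(server)
        if sid not in self._by_id:  # ignore the (unlikely) ID collision
            bisect.insort(self._ids, sid)
            self._by_id[sid] = server

    def remove(self, server):
        sid = ring_id(server)
        if self._by_id.get(sid) == server:
            self._ids.remove(sid)
            del self._by_id[sid]

    def lookup(self, obj):
        """Return the successor: first server ID >= the object ID, wrapping around."""
        if not self._ids:
            raise LookupError("no servers on the ring")
        index = bisect.bisect_left(self._ids, ring_id(obj))
        return self._by_id[self._ids[index % len(self._ids)]]


ring = HashRing(["server-a", "server-b", "server-c"])
before = {f"video-{i}": ring.lookup(f"video-{i}") for i in range(10)}
ring.remove("server-b")
moved = [name for name, owner in before.items() if ring.lookup(name) != owner]
print(moved)  # only objects that were on server-b
```

Removing a server only remaps the objects it owned; every other object keeps its server, which is the property that distinguishes this from hashing modulo the server count.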
What are the strategies for server selection? What are the limitations of these strategies?
- DNS
- HTTP redirection
- IP Anycast
What is the drawback to using the traditional approach of having a single, publicly accessible web server?
- Single point of failure
- Unable to handle high traffic concurrently
- Expensive due to geographic distance: users far from the server see higher latency
What is a CDN?
A content distribution network (CDN) is a network of multiple, geographically distributed servers and/or data centers that hold copies of content and direct each user to the server or server cluster that can best serve that user's request.