Content Distribution Networks Flashcards
Why do we want to deploy caching?
- High load on websites (video streaming)
- Distance means latency
- DDoS attacks
What do we want to achieve with caching?
- Distribute load
- Minimize the distance to the client
- Reduce inter-AS traffic
What are edge-caches?
Content servers at the edge of the CDN, close to the end users
- Placed in ISP networks
Why do we want to send packets from the same flow on the same path?
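To prevent re-ordering, which happens if paths with different lengths are used. Re-ordering interferes with TCP congestion control, because TCP expects in-order delivery and treats out-of-order segments as a sign of loss.
Flows can be identified by hashing the 5-tuple (source IP, destination IP, source port, destination port, protocol).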
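The idea of pinning a flow to one path can be sketched as follows. This is only an illustration: the function name `pick_path`, the use of SHA-256, and the example addresses are assumptions, and real routers use their own hardware hash functions.

```python
import hashlib

def pick_path(src_ip, dst_ip, src_port, dst_port, protocol, num_paths):
    """Return a path index derived from a hash of the flow's 5-tuple."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    # The same 5-tuple always yields the same digest, hence the same path.
    return int.from_bytes(digest[:8], "big") % num_paths

# Hypothetical flow: every packet of it carries the same 5-tuple.
flow = ("10.0.0.1", "93.184.216.34", 51000, 443, "tcp")
first = pick_path(*flow, num_paths=4)
second = pick_path(*flow, num_paths=4)
```

Both calls return the same index, so all packets of the flow follow one path and cannot overtake each other.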
How does consistent hashing work and why do we need it?
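Map servers and clients (keys) to points on a circle via a hash function. Each client walks around the circle to the next server point and uses that server. If the number of servers N changes, only about K/N of the K keys need to be remapped, instead of almost all keys as with hash mod N.
Servers can be mapped to multiple positions on the circle -> more even distribution.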
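A minimal sketch of such a ring, assuming SHA-256 as the hash function and 100 positions per server. The class name `HashRing`, the server names, and the keys are hypothetical.

```python
import bisect
import hashlib

def _point(label):
    """Map a string to a position on the circle (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # positions per server on the circle
        self._points = []         # sorted positions
        self._owner = {}          # position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            p = _point(f"{server}#{i}")
            self._owner[p] = server
            bisect.insort(self._points, p)

    def lookup(self, key):
        # Walk clockwise to the first server position after the key,
        # wrapping around at the end of the circle.
        idx = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[idx]]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"/video/{i}.mp4" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("cache-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

After adding `cache-d`, every key in `moved` now belongs to `cache-d`; keys owned by the other servers stay where they were.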
How does HTTP load balancing work?
The load balancer redirects the user to a worker instead of forwarding the traffic itself. The main server answers the client with a 302 (temporary redirect) whose Location header contains the IP address of the “worker”. The browser automatically connects to this address.
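The redirect answer could look roughly like the sketch below. The worker addresses, the round-robin choice, and the function name `redirect_response` are assumptions made for illustration; a real load balancer would pick the worker based on load or health.

```python
import itertools

# Hypothetical worker addresses (documentation address range).
WORKERS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
_next_worker = itertools.cycle(WORKERS)

def redirect_response(path):
    """Build a raw HTTP 302 response that sends the client to the next worker."""
    worker = next(_next_worker)
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: http://{worker}{path}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )
```

The main server only ever sends this small response; the actual content is transferred between the client and the worker.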
What two options exist for dealing with HTTPS traffic in load balancing?
- SSL/TLS frontend: the entry server decrypts traffic for the internal network. Cost-efficient, central configuration and certificates, but snooping inside the network yields clear text
- Perform encryption/decryption on the content server: may burn resources, but decryption happens as late as possible and deployment is easy out of the box
How does DNS load balancing work?
Resolve the same domain name to multiple IP addresses.
1: The client chooses one randomly
2: The DNS load balancer chooses an endpoint near the client (GeoIP)
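Both variants can be sketched in a few lines. The region table `RECORDS` stands in for a real GeoIP lookup, and all names and addresses here are hypothetical.

```python
import random

# Hypothetical address records per region (documentation address ranges).
RECORDS = {
    "eu": ["192.0.2.10", "192.0.2.11"],
    "us": ["198.51.100.10"],
}

def resolve_all():
    """Variant 1: return every address and let the client pick one."""
    return [ip for ips in RECORDS.values() for ip in ips]

def client_pick(addresses):
    """The client chooses randomly among the returned addresses."""
    return random.choice(addresses)

def resolve_geo(client_region, default_region="eu"):
    """Variant 2: return only the addresses near the client's region."""
    return RECORDS.get(client_region, RECORDS[default_region])
```

In variant 1 the load spreads statistically over all endpoints; in variant 2 the resolver needs to know (or guess) where the client is located.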
How does DDoS protection work?
- Point the DNS records to the protection provider (e.g. Cloudflare)
- CDN serves static content
- Dynamic content is served via the provider’s server
What is the problem with HTTPS on DDoS protection?
- The CDN needs to terminate the connection and has complete access to the traffic
- We don’t know how the follow-up connection is encrypted (maybe not at all)
How does anycast-based load balancing work?
- DNS was not intended for load balancing
- Packet routing (and routing manipulation) via BGP
- Assign the same IP address to many content servers
- Announce the IP prefix from many different sites
- -> BGP takes care of finding the best available content server
- -> In practice, everybody uses 5-tuple hashing, so packets of the same flow do not take multiple paths
Which objects do we cache in CDN caches?
- Many objects are requested only once
- Caching them costs disk space without any benefit
- -> Use cache filtering
- -> A Bloom filter decides which objects to cache
What are bloom filters and how do they work?
Bloom filters are bit arrays combined with multiple hash functions. A Bloom filter can say in O(1) whether an entry is (most likely) part of its base set, and it can say definitely if it is not.
Each inserted object is hashed with all of the functions, and the bits at the resulting positions are set. “Lookup elements” are hashed the same way and their positions are checked. If one of the bits is not set, the element is definitely not in the set of the Bloom filter; if all are set, it is probably in the set (false positives are possible).
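A small sketch of a Bloom filter and how it might be used for cache filtering. The sizes (1024 bits, 3 hash functions), the salted SHA-256 hashing, and the rule "cache an object only once it has probably been requested before" are assumptions chosen for illustration.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; real filters pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive several hash functions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # A single unset bit proves the item was never added.
        return all(self.bits[pos] for pos in self._positions(item))

seen = BloomFilter()

def should_cache(url):
    """Admit an object only if it has (probably) been requested before."""
    if url in seen:
        return True
    seen.add(url)
    return False
```

With this rule, the first request for a URL only records it in the filter, and the object is written to disk from the second request on. A false positive merely caches an object one request too early.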