Caching Flashcards

1
Q

What are some considerations for questions about a caching system?

A
o Size of the cache
o Queries per second (QPS)
o Latency for Get/Set operations
o Eviction policy
o Availability (is 100% uptime required, or can misses fall back to the origin?)
o Scalability (can capacity grow by adding nodes, i.e. is the cache distributed?)
2
Q

What is an application-layer cache?

A

Placing a cache directly on a request-layer node enables response data to be stored locally on that node.
 Each time a request is made to the service, the node quickly returns locally cached data if it exists.
 If it is not in the cache, the requesting node queries the data from disk.
 The cache on a request-layer node can live both in memory (very fast) and on the node's local disk (still faster than going to network storage).
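This lookup order can be sketched in a few lines of Python; `origin_read` here is a hypothetical stand-in for the slower disk or network-storage read:

```python
class NodeCache:
    """Per-node cache: check local memory first, then fall back to the origin."""

    def __init__(self, origin_read):
        self.memory = {}                # in-memory cache (fastest tier)
        self.origin_read = origin_read  # hypothetical disk/network read

    def get(self, key):
        if key in self.memory:          # cache hit: return local data
            return self.memory[key]
        value = self.origin_read(key)   # cache miss: query the origin
        self.memory[key] = value        # store locally for future requests
        return value
```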

3
Q

What is the problem with an application-layer cache, and what are the two ways to handle it?

A

When the request layer is expanded to multiple nodes, each node hosts its own cache, so identical requests routed to different nodes produce repeated cache misses and duplicated cached data. The two ways to overcome this are global caches and distributed caches.

4
Q

What are the pros/cons of a distributed cache?

A

o Advantage: cache space is easy to increase, simply by adding nodes to the request pool.

o Disadvantage: resolving a missing node. Some distributed caches get around this by storing multiple copies of the data on different nodes; however, this logic gets complicated quickly, especially when nodes are added to or removed from the request layer.
 Even if a node disappears and part of the cache is lost, requests will simply pull from the origin, so it isn't necessarily catastrophic.

5
Q

What is a distributed cache?

A

o In a distributed cache, each node owns part of the cached data.
o Typically, the cache is divided up using a consistent hashing function, so that when a request node looks for a certain piece of data, it quickly knows where to look within the distributed cache to determine whether that data is available.
 In this case, each node holds a small piece of the cache and sends a request to the owning node for the data before going to the origin.
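A minimal sketch of consistent hashing in Python (node names and the replica count are illustrative; production memcached/Redis clients apply the same idea with more tuning):

```python
import hashlib
from bisect import bisect


class ConsistentHashRing:
    """Maps each key to the cache node that owns it via consistent hashing."""

    def __init__(self, nodes, replicas=100):
        # Each node is hashed to many points ("virtual nodes") on a ring,
        # which smooths out the key distribution across nodes.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]
```

Because only the ring points belonging to an added or removed node change owners, most keys keep mapping to the same node when the pool changes.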

6
Q

What is a global cache?

A

o All the nodes use the same single cache space. This involves adding a server, or a file store of some sort, that is faster than your original store and accessible by all the request-layer nodes.
o Each of the request nodes queries the cache in the same way it would a local one.
o This kind of caching scheme can get a bit complicated, because it is very easy to overwhelm a single cache as the number of clients and requests increases, but it is very effective in some architectures (particularly ones with specialized hardware that makes the global cache very fast, or with a fixed data set that needs to be cached).

7
Q

What are the 2 types of global cache?

A

Two types of global cache:
 1. The cache itself is responsible for fetching the data if it does not have it.
 2. The query node is responsible for fetching the data if the cache does not have it.
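The two responsibilities can be contrasted in a small sketch (these patterns are commonly called read-through and cache-aside, respectively; all class and function names here are illustrative):

```python
class ReadThroughCache:
    """Type 1: the cache itself fetches from the origin on a miss."""

    def __init__(self, fetch_from_origin):
        self.store = {}
        self.fetch = fetch_from_origin

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.fetch(key)  # the cache does the fetching
        return self.store[key]


class PlainCache:
    """Type 2: the cache only stores data; callers handle misses."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)  # None signals a miss to the caller

    def set(self, key, value):
        self.store[key] = value


def query_node_get(cache, key, fetch_from_origin):
    """Type 2 read path: the query node fetches from the origin on a miss."""
    value = cache.get(key)
    if value is None:
        value = fetch_from_origin(key)
        cache.set(key, value)  # the node decides what to place in the cache
    return value
```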

8
Q

When would we use a global cache in which the query node is responsible for fetching the data from the DB?

A

o Most applications leveraging global caches tend to use the first type, where the cache itself manages eviction and fetching, to prevent a flood of requests for the same data from the clients.
o However, there are some cases where the second implementation makes more sense:

 For example, if the cache is being used for very large files, a low cache-hit percentage would cause the cache buffer to become overwhelmed with cache misses; in this situation, it helps to have a large percentage of the total data set (or the hot data set) in the cache.
 Another example is an architecture where the files stored in the cache are static and shouldn't be evicted. This could be because of application requirements around data latency (certain pieces of data might need to be very fast for large data sets), where the application logic understands the eviction strategy or hot spots better than the cache does.

9
Q

What is a CDN? When would we use it? What do the initials stand for?

A

Content Distribution Network (CDN)
o A cache that comes into play for sites serving large amounts of static media.
o In a typical CDN setup, a request will first ask the CDN for a piece of static media; the CDN will serve that content if it has it locally available. If it isn’t available, the CDN will query the back-end servers for the file and then cache it locally and serve it to the requesting user.
o If the system we are building isn't yet large enough to have its own CDN, we can ease a future transition by serving the static media off a separate subdomain (e.g. static.yourservice.com) using a lightweight HTTP server like Nginx, and cut the DNS over from our servers to a CDN later.

10
Q

What are 3 types of cache invalidation?

A
  1. Write-through
  2. Write-around
  3. Write-back
11
Q

What is the write-through cache invalidation method? Pros/cons?

A

Write-through cache:
• Under this scheme, data is written into the cache and the corresponding database at the same time.
• The cached data allows for fast retrieval, and since the same data gets written to permanent storage, we have complete data consistency between cache and storage.
• Advantage: nothing is lost in case of a crash, power failure, or other system disruption.
• Disadvantage: higher latency for write operations, since every write must complete in both the cache and the database.

12
Q

What is the write-around cache invalidation method? Pros/cons?

A

Write-around cache:
• This technique is similar to write-through, but data is written directly to permanent storage, bypassing the cache.
• Advantage: the cache is not flooded with write operations for data that will never be re-read.
• Disadvantage: a read request for recently written data creates a cache miss and must be served from slower back-end storage, with higher latency.

13
Q

What is the write-back cache invalidation method? Pros/cons?

A

Write-back cache:
• Data is written to the cache alone, and completion is immediately confirmed to the client.
• The write to permanent storage happens after specified intervals or under certain conditions.
• Advantage: low latency and high throughput for write-intensive applications.
• Disadvantage: risk of data loss in case of a crash or other adverse event, because the only copy of the written data is in the cache.
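The three write policies above can be contrasted in a toy sketch (a plain dict stands in for permanent storage, and `flush` models the deferred write-back; all names are illustrative):

```python
class Cache:
    """Toy cache demonstrating the three write policies."""

    def __init__(self, storage):
        self.store = {}       # the cache itself
        self.storage = storage  # stand-in for the database
        self.dirty = set()    # keys written to cache but not yet to storage

    def write_through(self, key, value):
        self.store[key] = value    # write cache and storage together:
        self.storage[key] = value  # higher write latency, no data loss

    def write_around(self, key, value):
        self.storage[key] = value  # bypass the cache entirely;
        # the next read of `key` is a cache miss served from storage

    def write_back(self, key, value):
        self.store[key] = value    # confirm after the cache write only
        self.dirty.add(key)        # storage is updated later, in flush()

    def flush(self):
        for key in self.dirty:     # deferred write to permanent storage;
            self.storage[key] = self.store[key]  # data is at risk until here
        self.dirty.clear()
```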

14
Q

What are the 6 types of cache eviction policies?

A

o First In First Out (FIFO): The cache evicts the first block accessed first without any regard to how often or how many times it was accessed before.
o Last In First Out (LIFO): The cache evicts the block accessed most recently first without any regard to how often or how many times it was accessed before.
o Least Recently Used (LRU): Discards the least recently used items first.
o Most Recently Used (MRU): Discards, in contrast to LRU, the most recently used items first.
o Least Frequently Used (LFU): Counts how often an item is needed. Those that are used least often are discarded first.
o Random Replacement (RR): Randomly selects a candidate item and discards it to make space when necessary.
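As an example, the LRU policy maps naturally onto Python's `OrderedDict` (a common textbook sketch, not a production cache):

```python
from collections import OrderedDict


class LRUCache:
    """LRU eviction: discard the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # order tracks recency of use

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
```

Swapping `popitem(last=False)` for `popitem(last=True)` would turn the same structure into an MRU cache.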
