exam_2021-retake Flashcards
What is the AKF Scale Cube?
It is a model for segmenting services, defining microservices and scaling products.
What are the axes of the AKF Scale Cube?
X-Axis: Horizontal duplication and cloning of services and data
Y-Axis: Functional Decomposition and segmentation
Z-Axis: Service and data partitioning along customer boundaries
What is horizontal duplication?
One monolithic system/service > many systems, each a clone and load balanced
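A minimal sketch of X-axis scaling: identical clones behind a round-robin load balancer. The server names and the round-robin policy are illustrative assumptions, not from the source.

```python
import itertools

# X-axis scaling: every clone is identical, so any clone can serve any request.
clones = ["srv1", "srv2", "srv3"]  # hypothetical clone names
rr = itertools.cycle(clones)

def route(request):
    # Round-robin load balancing: hand each request to the next clone in turn.
    return next(rr)

routed = [route(i) for i in range(4)]
assert routed == ["srv1", "srv2", "srv3", "srv1"]
```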
What is the Y-Axis?
Split by function, method or service; a split of dissimilar things
What is Z-Axis?
Split by similar things. For example, customers
Q1.A Explain the difference between service partitioning and data partitioning in the AKF scale cube, and give an example of each
Service partitioning happens on the Y-Axis of the AKF Scale Cube; data partitioning happens on the Z-Axis. You can partition services by separating the web server from the database. You can partition data by splitting a database (sharding), for example based on customer IDs.
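The Z-axis example above (sharding on customer IDs) can be sketched like this; the shard count and the modulo routing rule are assumptions chosen for illustration.

```python
# Z-axis data partitioning: route each customer's data to one shard.
NUM_SHARDS = 4  # hypothetical shard count
shards = {i: {} for i in range(NUM_SHARDS)}

def shard_for(customer_id):
    # Hypothetical routing rule: hash/modulo on the customer ID.
    return customer_id % NUM_SHARDS

def put(customer_id, record):
    shards[shard_for(customer_id)][customer_id] = record

put(7, {"name": "Alice"})
put(11, {"name": "Bob"})
assert 7 in shards[shard_for(7)] and 11 in shards[shard_for(11)]
```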
Q1.B At one point, Google search introduced extra layers between the web frontend and the servers holding the partitions of the index. Why did they have to introduce these layers?
They introduced caching servers. These have a hit rate of 30-60% and can absorb a large share of the traffic before it reaches the index partitions.
What is a race condition?
When the code is trying to do two or more things at once, and the result changes depending on the order they occur in
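The classic lost-update race can be shown deterministically by spelling out one bad interleaving of two concurrent read-modify-write "requests" (a simulation, not real threads):

```python
# Two concurrent requests each try to increment a shared counter.
counter = 0

# Both requests read the counter BEFORE either writes back:
a_read = counter  # request A reads 0
b_read = counter  # request B also reads 0

counter = a_read + 1  # A writes 1
counter = b_read + 1  # B also writes 1, overwriting A's update

# Two increments happened, but one was lost.
assert counter == 1
```

Had B run entirely after A, the result would be 2; the outcome depends on the order of operations, which is exactly the race condition.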
What is one simple solution against race conditions?
To place all requests in a queue and refuse to answer a new request until the previous one has completed
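A minimal sketch of that queueing solution: a single worker drains the queue, so only one request ever touches the counter at a time and no updates are lost.

```python
import queue
import threading

counter = 0
requests = queue.Queue()

def worker():
    # The only thread that mutates `counter`, so updates are serialized.
    global counter
    while True:
        job = requests.get()
        if job is None:  # sentinel: shut down
            break
        counter += 1

t = threading.Thread(target=worker)
t.start()
for _ in range(1000):
    requests.put("increment")
requests.put(None)
t.join()

assert counter == 1000  # every increment counted, none lost
```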
What is the problem with placing all requests in a queue and refusing any further requests to prevent race conditions?
It doesn’t scale: it effectively makes the whole system single-threaded, the way old computers worked.
When is using a queue and refusing further requests a good solution
If you absolutely must count everything accurately, in real time. For example, a large festival handling lots of requests where collisions cannot be allowed.
What is an alternative to this queueing system that does scale well?
Eventual consistency
What is eventual consistency?
Each server holds its own count and updates the central system when there is time to do so (updates can be seconds apart; it doesn’t have to be hours).
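A toy sketch of that idea, assuming two hypothetical servers that count locally and flush their deltas to a central total later:

```python
# Eventual consistency: fast local writes, deferred central reconciliation.
central = 0
server_counts = {"a": 0, "b": 0}  # hypothetical per-server local counters

def record_view(server):
    server_counts[server] += 1  # fast local write, no coordination

def flush(server):
    # Periodically push the local delta to the central count.
    global central
    central += server_counts[server]
    server_counts[server] = 0

record_view("a"); record_view("a"); record_view("b")
stale_central = central  # central is stale until servers flush
flush("a"); flush("b")

assert stale_central == 0
assert central == 3  # counts converge once the deltas arrive
```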
When is eventual consistency a bad design choice?
When a change needs to take effect immediately. For example, the privacy settings of a YouTube video (private or public)
Why are YouTube views not accurate?
This is because of caching. Caching holds the data and serves it to the customer quickly. A site like YouTube has many, many caching servers, and each time you can be routed to a different caching server.
Eventually, consistency takes over and the counts converge to the correct value.
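The stale-cache effect can be sketched with two hypothetical edge caches holding snapshots of the view count taken at different times; the numbers are invented for illustration.

```python
# Two edge caches hold snapshots of the origin's view count,
# taken at different moments.
origin_views = 1000
cache_a = {"views": 990}   # older snapshot, not yet refreshed
cache_b = {"views": 1000}  # fresher snapshot

def serve(cache):
    # The viewer is answered from the cache; the value may be stale.
    return cache["views"]

# Two users routed to different caches see different counts.
assert serve(cache_a) != serve(cache_b)
assert serve(cache_b) == origin_views
```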
Q1.C Google Search replaced batch processing to create the index to a more incremental method of keeping the index up-to-date. Why did they want to make this replacement?
They wanted to make this replacement because the batch process via MapReduce resulted in documents not showing up in search results for 2-3 days. They needed a lower “time from crawl-to-search-hit”. The solution was:
- New data storage system: Colossus / BigTable
- Event-driven, incremental processing: Caffeine / Percolator
What is batch processing?
It gives you the ability to execute multiple operations in one request, rather than having to submit each operation individually
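A minimal sketch of that idea: one request carries many operations instead of one request per operation. The operation format and handler are assumptions for illustration.

```python
# Batch processing: submit many operations in a single request.
def process_batch(ops):
    # Hypothetical handler: applies each (operation, argument) pair in turn.
    results = []
    for op, arg in ops:
        if op == "index":
            results.append(f"indexed {arg}")
    return results

# One call processes three documents instead of three separate calls.
batch = [("index", "doc1"), ("index", "doc2"), ("index", "doc3")]
out = process_batch(batch)
assert out == ["indexed doc1", "indexed doc2", "indexed doc3"]
```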
Q1.D What problem is avoided by Google by time-stamping the contents of a BigTable cell?
Versioning by timestamps means there are no write-write conflicts on a cell: each write creates a new version instead of overwriting an existing one. As we will see, when replicated, eventual consistency is used.
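A toy sketch of timestamped cell versions, assuming a single cell stored as a list of (timestamp, value) pairs and a newest-timestamp-wins read; this is an illustration, not BigTable's actual implementation.

```python
# One cell holds multiple timestamped versions; writes never overwrite.
cell = []  # list of (timestamp, value) pairs

def write(value, ts):
    cell.append((ts, value))  # append a new version: no write-write conflict

def read_latest():
    return max(cell)[1]  # the version with the newest timestamp wins

write("v1", ts=1)
write("v2", ts=2)  # a concurrent writer just adds another version

assert read_latest() == "v2"
assert len(cell) == 2  # both versions are retained
```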