exam_2017 Flashcards
Using examples from NFS, explain why distribution transparency and failure transparency are impossible or hard to achieve.
Distribution transparency is hard because a user can’t tell whether the NFS server is down or the network is badly congested: a dead process is indistinguishable from a slowly responding one. These communication latencies can’t be hidden.
Failure transparency is hard because, if the NFS server crashes, a user can’t tell whether the server performed the operation before it went down.
What are scaling techniques?
- Bigger Machines
- Virtualisation
- Asynchronous communication
- Replication & Caching
- Partitioning
- Software Optimisation
A mathematical proof states that reliable failure detection is impossible. Why then is it possible to do reliable failure detection in practice?
- By using a quorum-based consensus protocol, in which agreement is reached once a selected leader gets accept messages from a quorum of servers. Two examples (fail-noisy methods) are Paxos, which is used by Google, and Raft, a more understandable protocol that has been formally proven correct.
- By agreeing on a timeout after which an unresponsive server is considered down.
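The timeout-based approach can be sketched as a heartbeat failure detector. This is a minimal illustration, not a production design: the class name, the `clock` parameter and the heartbeat scheme are assumptions made for the example. It only ever suspects a server, which is why a slow server can be wrongly declared down.

```python
import time


class TimeoutFailureDetector:
    """Suspects a server once no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock      # injectable clock (assumption, eases testing)
        self.last_seen = {}     # server name -> time of its last heartbeat

    def heartbeat(self, server):
        """Record that `server` has just been heard from."""
        self.last_seen[server] = self.clock()

    def is_suspected(self, server):
        """True if the server is considered down under the agreed timeout."""
        last = self.last_seen.get(server)
        if last is None:
            return True         # never heard from this server
        return self.clock() - last > self.timeout
```

A longer timeout gives fewer false suspicions but slower detection; a shorter one gives the opposite.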
What is consensus?
The process by which we reach agreement over system state between unreliable machines connected by asynchronous networks
What is Paxos trying to solve?
How to reach agreement on a single value in a scenario where failures might occur.
What are Paxos stages?
It is essential to have a multi-stage process.
- Promise and commit
- Majority agreement
- Monotonically increasing numbers
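The three ingredients above can be seen together in a small single-value Paxos sketch. It runs in one process with no network and no failures, so it only illustrates the rules; the names `Acceptor`, `prepare`, `accept` and `propose` are choices made for this example.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1          # highest proposal number promised so far
        self.accepted_n = -1        # number of the accepted proposal, if any
        self.accepted_value = None

    def prepare(self, n):
        """Promise stage: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, -1, None

    def accept(self, n, value):
        """Commit stage: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors, n, value):
    """Run one proposal round; return the chosen value, or None if it failed."""
    majority = len(acceptors) // 2 + 1

    replies = [a.prepare(n) for a in acceptors]
    promises = [(an, av) for ok, an, av in replies if ok]
    if len(promises) < majority:
        return None

    # If some acceptor already accepted a value, adopt the highest-numbered one.
    prev_n, prev_value = max(promises, key=lambda p: p[0])
    if prev_n >= 0:
        value = prev_value

    accepted = sum(a.accept(n, value) for a in acceptors)
    return value if accepted >= majority else None
```

Once a value has been chosen by a majority, a later proposal with a higher number ends up re-proposing that same value, and a proposal with a stale number is rejected.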
Why is reliable failure detection important for consensus in a process group?
To achieve overall system reliability in the presence of a number of faulty processes. Without it, a process may wait forever for a response from a process that has crashed.
How does asynchronous communication help to build large systems? Give an example.
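Asynchronous communication helps because the sender doesn’t have to wait for the receiver before continuing: it hands off the message and carries on with other work, so components don’t block on each other. (At the level of serial transmission, a start and a stop bit let the receiver know where the data begins and ends.) Sending emails or downloading files in the background are examples of asynchronous communication.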
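A minimal sketch of the idea using Python's `asyncio`: several sends are started together and none waits for the previous one to finish. The function names are hypothetical, and `asyncio.sleep` stands in for real network latency.

```python
import asyncio


async def send_email(recipient, delay):
    # asyncio.sleep simulates network latency (an assumption of this sketch).
    await asyncio.sleep(delay)
    return f"sent to {recipient}"


async def send_all(recipients):
    # All sends are in flight at the same time; gather keeps the input order.
    tasks = [send_email(r, 0.05) for r in recipients]
    return await asyncio.gather(*tasks)


def run_demo():
    return asyncio.run(send_all(["alice", "bob", "carol"]))
```

Because the waits overlap, the total time is close to the slowest single send rather than the sum of all sends.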
What are the 4 types of servers for Google Search?
- Root
- Cache
- Parent
- Leaf
Which scaling technique does each type of Google Search server use?
Root: software optimisation
Cache: replication/caching
Parent: partitioning
Leaf: partitioning
What are the functions of the root and cache servers in Google Search?
Root: handles browser requests, acts as front-end web server
Cache: temporarily stores the results of requests
What are the functions of the parent and leaf servers in Google Search?
Parent: distributes queries to its children, as in a multi-level tree
Leaf: handles index/doc requests from in-memory data structures
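The parent/leaf split can be sketched as a scatter-gather tree. This is a toy model under stated assumptions: the `Leaf` and `Parent` classes, the `(doc_id, score)` hits and the `top_k` merge are invented for the example and are not Google's actual design.

```python
from concurrent.futures import ThreadPoolExecutor


class Leaf:
    """Holds one shard of the index in memory: term -> list of (doc_id, score)."""

    def __init__(self, index):
        self.index = index

    def search(self, term):
        return self.index.get(term, [])


class Parent:
    """Fans a query out to its children and merges their partial results."""

    def __init__(self, children):
        self.children = children    # leaves or other parents (multi-level tree)

    def search(self, term, top_k=3):
        # Query all children in parallel, then merge by descending score.
        with ThreadPoolExecutor(max_workers=len(self.children)) as pool:
            partials = list(pool.map(lambda c: c.search(term), self.children))
        merged = [hit for part in partials for hit in part]
        merged.sort(key=lambda hit: hit[1], reverse=True)
        return merged[:top_k]
```

Each leaf only answers for its own partition, so adding leaves adds capacity, while the parents keep the fan-out per machine small.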
What are the pros of in-memory indexing systems?
Big increase in throughput.
Big decrease in query latency
Issues of in-memory indexing systems?
Variance: a query touches 1000s of machines, not dozens
Availability: only 1 or a few replicas of each doc’s index data
Queries of death
What are canary requests?
A request that is sent to a single machine first to check that it can be handled safely, before it is sent to all machines. If the canary fails unexpectedly, try another machine (the failure could be a coincidence). If it fails K times, reject the request.
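A minimal sketch of that procedure, assuming a hypothetical `handle(server, request)` callable that raises when a server fails on the request. The parameter `max_canary_failures` plays the role of K.

```python
def send_with_canary(request, servers, handle, max_canary_failures=2):
    """Try `request` on one server at a time before sending it to all of them.

    Returns the list of responses, or None if the request is rejected.
    """
    failures = 0
    for i, canary in enumerate(servers):
        try:
            first = handle(canary, request)
        except Exception:
            failures += 1
            if failures >= max_canary_failures:
                return None     # K canaries failed: reject the request
            continue            # could be a coincidence: try another machine
        # The canary succeeded, so fan the request out to the other servers.
        rest = [handle(s, request) for j, s in enumerate(servers) if j != i]
        return [first] + rest
    return None
```

With this guard a query of death takes down at most K machines instead of every machine the query would otherwise touch.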
What does the repository manager do?
Coordinates index switching as new shards become available
What were the problems with the traditional Google Search system?
There were more collections to search besides the web (for example, Google Maps), and results needed to be closer to real time.
How was the index originally created?
It was a batch process via MapReduce.
- Store all documents in GFS
- Run several MapReduce jobs to create index
- Upload index to Leaf servers
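The batch nature of this pipeline can be illustrated with a toy inverted-index job in the MapReduce style. Everything runs in one process here; the function names and the whitespace tokenizer are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct word in the document."""
    return [(word, doc_id) for word in set(text.lower().split())]


def reduce_phase(pairs):
    """Group document ids by term to form an inverted index."""
    index = defaultdict(list)
    for term, doc_id in pairs:
        index[term].append(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def build_index(documents):
    """Batch job: the whole corpus is reprocessed on every run."""
    pairs = []
    for doc_id, text in documents.items():
        pairs.extend(map_phase(doc_id, text))
    return reduce_phase(pairs)
```

Adding a single document means rerunning `build_index` over the full corpus, which is why a batch design makes new documents slow to appear.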
What was the problem with the MapReduce index method?
New documents would not show up in search results for 2-3 days.
What solutions replaced MapReduce?
Data storage system: Colossus / BigTable
Event-driven, incremental processing: Caffeine / Percolator
What is BigTable?
A distributed storage system. A table is a three-dimensional structure whose cells are indexed by a row key, a column key and a timestamp. Each table may consist of many tablets. Data is typically replicated to multiple BigTable clusters in different datacenters.
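The three-dimensional cell addressing can be sketched with a toy in-memory table. This is only a model of the indexing scheme, not the BigTable API; the `Table` class and its `put`/`get` methods are hypothetical.

```python
class Table:
    """Toy model of cells indexed by (row key, column key, timestamp)."""

    def __init__(self):
        self.cells = {}     # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self.cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest if None)."""
        versions = self.cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None
```

Writing the same row and column at a later timestamp does not overwrite the old cell; it adds a newer version next to it.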
What makes BigTable scalable?
- There is no separate versioning mechanism (the timestamp is the version)
- Automatic resource management (less manual labor and instant resource availability)
- A tablet is split when it grows too big
- Different machines can handle different tablets, which spreads the workload evenly over the available resources
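Tablet splitting can be sketched as cutting a sorted row range in half once it exceeds a threshold. This is a simplified illustration: it uses a row count as the limit, which is an assumption of the example, and the `Tablet` class and `split_if_too_big` function are hypothetical names.

```python
class Tablet:
    """A contiguous range of rows, kept sorted by row key."""

    def __init__(self, rows):
        self.rows = dict(sorted(rows.items()))


def split_if_too_big(tablet, max_rows):
    """Return [tablet] unchanged, or two tablets covering its lower and upper half."""
    if len(tablet.rows) <= max_rows:
        return [tablet]
    keys = list(tablet.rows)
    mid = len(keys) // 2
    left = Tablet({k: tablet.rows[k] for k in keys[:mid]})
    right = Tablet({k: tablet.rows[k] for k in keys[mid:]})
    return [left, right]
```

After a split, the two halves can be assigned to different machines, which is what lets the workload spread as a table grows.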
What makes caching one of the most efficient scaling techniques?
A few cache machines can do the work of a large number of machines. Caching reduces network traffic, access latency and server workload, and it makes the service more robust.
What is the disadvantage of caching?
- Big latency spike/capacity drop when the complete index is updated or the cache is flushed.
- In some cases the data might be outdated, though there are methods to prevent this.
- Cache misses increase lookup time, because time has already been spent looking in the cache.
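These trade-offs can be seen in a minimal read-through cache. The `Cache` class and the `backend` lookup function are hypothetical; the hit and miss counters are there only to make the behavior visible.

```python
class Cache:
    def __init__(self, backend):
        self.backend = backend  # slow lookup function (assumption of this sketch)
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        # A miss pays for the cache check plus the backend call.
        self.misses += 1
        value = self.backend(key)
        self.store[key] = value
        return value

    def flush(self):
        """Empty the cache; every following request goes to the backend again."""
        self.store.clear()
```

Right after `flush()`, all requests miss at once, which is the capacity drop described above.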