2.2 Back of the Envelope Flashcards
What is the exact value of 2^10?
1,024
1,024 bytes is commonly referred to as 1 KB (kilobyte).
What is the exact value of 2^20?
1,048,576
1,048,576 bytes is commonly referred to as 1 MB (megabyte).
Approximately how much memory is 2^30 bytes?
1 GB
2^30 is exactly 1,073,741,824, so 1 GB (gigabyte) is roughly 1 billion bytes.
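The three power-of-two cards above can be checked directly. This is a minimal sketch; the constant names are illustrative and not taken from the cards.

```python
# Powers of two that anchor most capacity estimates.
KIB = 2 ** 10   # bytes in the "1 KB" card
MIB = 2 ** 20   # bytes in the "1 MB" card
GIB = 2 ** 30   # bytes in the "1 GB" card

# How far 2^30 sits above the "1 billion bytes" rule of thumb, in percent.
gib_error_pct = (GIB - 10 ** 9) / 10 ** 9 * 100

print(f"2^10 = {KIB:,}")
print(f"2^20 = {MIB:,}")
print(f"2^30 = {GIB:,} ({gib_error_pct:.1f}% above 1 billion)")
```

The gap between 2^30 and 10^9 is small enough to ignore in an estimate, which is why the two are used interchangeably here.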
What is the latency of an L1 cache reference?
0.5 ns
True or False: The latency of a mutex lock/unlock is 25 ns.
False
The latency of a mutex lock/unlock is actually 100 ns.
What is the latency for reading 1 MB sequentially from SSD?
1,000,000 ns (1 ms)
This corresponds to a sequential read throughput of roughly 1 GB/s.
What happens during a typical first year for a new cluster?
Various hardware failures and maintenance events
These include overheating, PDU (power distribution unit) failures, rack moves, network rewiring, and individual machine failures.
What is a key characteristic of Google’s computing environment?
Large clusters of commodity PCs
What are the challenges faced in the Google engineering environment?
Need for distributed systems, high capacity systems, and careful design
This is due to data or request volumes that exceed the capability of a single machine.
What is the significance of designing software interfaces carefully?
Ensures flexibility and usability for other hypothetical clients
What is Protocol Buffers?
A good protocol description language used by Google
It is self-describing, supports multiple languages, and allows for efficient encoding/decoding.
What does ‘self-describing’ refer to in the context of protocols?
The ability of a protocol to provide its own structure and meaning
What is the estimated time to generate an image results page with 30 thumbnails using a serial read design?
560 ms
What is the estimated time to generate an image results page with 30 thumbnails using a parallel read design?
18 ms
This is a theoretical estimate that ignores variance.
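The two thumbnail estimates above can be reproduced with a few lines of arithmetic. The cards do not state the inputs, so this sketch assumes a 10 ms disk seek, a 30 MB/s sequential read rate, and 256 KB per thumbnail; these values are assumptions chosen because they are consistent with the 560 ms and 18 ms answers.

```python
# All three inputs below are assumptions, not values stated on the cards.
SEEK_MS = 10.0             # assumed time for one disk seek
READ_BYTES_PER_S = 30e6    # assumed sequential disk read rate (30 MB/s)
THUMB_BYTES = 256e3        # assumed size of one thumbnail (256 KB)
NUM_THUMBS = 30            # from the cards


def read_ms(num_bytes):
    """Time in milliseconds to read num_bytes sequentially from disk."""
    return num_bytes / READ_BYTES_PER_S * 1000.0


# Serial design: every thumbnail pays one seek plus one read, back to back.
serial_ms = NUM_THUMBS * (SEEK_MS + read_ms(THUMB_BYTES))

# Parallel design: all reads are issued at once (one per disk), so the page
# waits for a single seek plus a single read. Variance is ignored.
parallel_ms = SEEK_MS + read_ms(THUMB_BYTES)

print(f"serial:   {serial_ms:.0f} ms")
print(f"parallel: {parallel_ms:.1f} ms")
```

Both results land within rounding distance of the card answers. The parallel figure is optimistic in practice because the page is only as fast as the slowest of the 30 reads.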
What is the expected time to quicksort 1 GB of 4 byte numbers?
Approximately 30 seconds
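One way to arrive at a figure near 30 seconds is to add the cost of mispredicted branches to the cost of moving the data through memory. The sketch below assumes a 5 ns branch misprediction penalty, a 50% misprediction rate on comparisons, and 4 GB/s of memory bandwidth; none of these inputs appear on the cards, so treat them as assumptions.

```python
import math

# Inputs from the card.
TOTAL_BYTES = 2 ** 30
BYTES_PER_NUMBER = 4

# Assumed inputs (not stated on the cards).
MISPREDICT_NS = 5.0            # assumed cost of one branch misprediction
MISPREDICT_RATE = 0.5          # assumed fraction of comparisons mispredicted
MEM_BYTES_PER_S = 4e9          # assumed memory bandwidth

n = TOTAL_BYTES // BYTES_PER_NUMBER   # 2^28 numbers
passes = math.log2(n)                 # about 28 partitioning passes

# Comparison cost: roughly n * log2(n) comparisons, half of them mispredicted.
comparisons = n * passes
branch_s = comparisons * MISPREDICT_RATE * MISPREDICT_NS / 1e9

# Memory cost: each pass streams the whole array through memory once.
memory_s = TOTAL_BYTES * passes / MEM_BYTES_PER_S

total_s = branch_s + memory_s
print(f"branch mispredictions: {branch_s:.1f} s")
print(f"memory traffic:        {memory_s:.1f} s")
print(f"total:                 {total_s:.1f} s")
```

Under these assumptions the branch term dominates, which suggests that the comparison pattern matters more than raw memory bandwidth for this workload.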
What are microbenchmarks used for?
To understand performance and reduce cycle time for performance testing
What is the latency for reading 1 MB sequentially from disk?
30,000,000 ns (30 ms)
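The SSD and disk cards both give a latency for reading 1 MB, which converts directly into throughput. A minimal sketch, treating 1 MB as 10^6 bytes for simplicity (an assumption; the function name is illustrative):

```python
MB = 1_000_000  # bytes; decimal megabyte assumed for simplicity


def throughput_mb_per_s(num_bytes, latency_ns):
    """Convert 'num_bytes read in latency_ns' into megabytes per second."""
    seconds = latency_ns / 1e9
    return (num_bytes / MB) / seconds


ssd_mb_per_s = throughput_mb_per_s(MB, 1_000_000)     # 1 MB in 1 ms
disk_mb_per_s = throughput_mb_per_s(MB, 30_000_000)   # 1 MB in 30 ms

print(f"SSD:  {ssd_mb_per_s:.0f} MB/s")
print(f"disk: {disk_mb_per_s:.1f} MB/s")
```

The disk figure of roughly 33 MB/s is the sequential rate only; any seek adds latency before the first byte arrives.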
What does the term ‘back-of-the-envelope calculations’ refer to?
Quick estimates to assess feasibility or performance
Fill in the blank: The round-trip latency for sending a packet from CA to the Netherlands and back to CA is _______.
150,000,000 ns (150 ms)
Which basic building blocks should an engineer know?
Core language libraries, basic data structures, SSTables, protocol buffers, GFS, BigTable, indexing systems, MySQL, MapReduce
These are fundamental components for building software systems.
Why is it important to understand the implementations of core libraries?
Understanding implementations enables effective back-of-the-envelope calculations
Without this knowledge, one cannot make informed decisions or estimates.
What should you do when building infrastructure?
Identify common problems and build software systems to address them in a general way
Avoid trying to satisfy every client demand to maintain system simplicity.
True or False: Building infrastructure should be done for its own sake.
False
Infrastructure should be built to address real needs, not hypothetical ones.
What does ‘Design for Growth’ entail?
Anticipate how requirements will evolve and ensure design works if scale changes by 10X or 20X
The right design for scale X is often not the best design for 100X, so it is not worth planning much beyond 10X or 20X.
What is a key goal when designing for low latency?
Aim for low average latency, and also track 90th-percentile and 99th-percentile latencies
User experience can significantly degrade with high latency.
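An average can hide a slow tail, which is why the card above asks for percentiles as well. The sketch below uses the nearest-rank definition of a percentile and made-up latency samples; both the helper name and the data are illustrative assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies in ms: mostly fast, with a few slow outliers.
latencies_ms = [12] * 90 + [40] * 8 + [900] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f} ms  p50={p50} ms  p90={p90} ms  p99={p99} ms")
```

In this made-up data set the mean and the 90th percentile both look healthy, while the 99th percentile exposes requests that take close to a second.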
Which consistency model are most products with mutable state gravitating towards?
Eventual consistency
Eventual consistency offers better availability than strong consistency.
What is the significance of using threads in applications?
Using threads can help improve both throughput and latency
Modern machines have multiple cores, making parallelization crucial.
What should you consider regarding data access?
Disks: seeks, sequential reads; Memory: caches, branch predictors
Understanding these aspects is crucial for optimizing performance.
Fill in the blank: Compression is a very important aspect of many systems, especially in _______.
inverted index posting list formats and storage systems for persistent data
Compression helps reduce storage requirements and improve performance.
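To make the posting-list case concrete, here is a sketch of one widely used scheme: store the gaps between sorted document IDs and encode each gap as a variable-length integer. The card does not say which format is meant, so this scheme, the function names, and the sample IDs are all assumptions for illustration.

```python
def encode_varint(n):
    """Encode a non-negative int in 7-bit groups, lowest group first.
    The high bit of each byte is set when more bytes follow."""
    if n < 0:
        raise ValueError("varint encoding needs a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_postings(doc_ids):
    """Delta-encode an ascending list of doc IDs, then varint each gap."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        out += encode_varint(doc_id - prev)
        prev = doc_id
    return bytes(out)


def decode_postings(data):
    """Inverse of encode_postings: rebuild doc IDs from varint gaps."""
    doc_ids, prev, value, shift = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # more bytes belong to this gap
        else:
            prev += value       # gap complete: add it to the running ID
            doc_ids.append(prev)
            value, shift = 0, 0
    return doc_ids


postings = [1000, 1003, 1010, 1500, 1000000]  # hypothetical doc IDs
encoded = encode_postings(postings)
decoded = decode_postings(encoded)
print(len(encoded), "bytes encoded vs", 4 * len(postings), "bytes as 32-bit ints")
```

Small gaps fit in a single byte, so dense posting lists shrink the most. Fewer bytes also means less data to read from disk, which is where the performance benefit comes from.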
What are ‘canary requests’ used for?
To detect issues in applications and improve robustness against failures
They help identify problems before affecting a large number of users.
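The card does not describe a mechanism, so the following is one common interpretation, offered as an assumption: send a request to a single backend first, and fan it out to the remaining backends only if that canary succeeds. The function names and the fake transport are hypothetical.

```python
def send_with_canary(request, backends, send):
    """Try one backend first; contact the rest only if the canary succeeds.

    `send(backend, request)` is a caller-supplied function (hypothetical here)
    that returns a response or raises an exception on failure.
    """
    if not backends:
        return []
    canary, rest = backends[0], backends[1:]
    first = send(canary, request)  # an exception here aborts the fan-out
    return [first] + [send(backend, request) for backend in rest]


# Demo with a fake transport that crashes on one malformed request.
contacted = []


def fake_send(backend, request):
    contacted.append(backend)
    if request == "poison":
        raise RuntimeError(f"{backend} crashed")
    return f"{backend}: ok"


backends = ["b1", "b2", "b3"]
good = send_with_canary("query", backends, fake_send)

contacted.clear()
try:
    send_with_canary("poison", backends, fake_send)
except RuntimeError:
    pass
poison_contacted = list(contacted)  # only the canary saw the bad request

print(good)
print("backends hit by bad request:", poison_contacted)
```

The cost is one extra sequential hop of latency on every request, traded for limiting the damage of a crash-inducing request to a single machine.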
What is essential for monitoring and debugging applications?
Exporting HTML-based status pages and key-value pairs via a standard interface
This allows for easy diagnosis and monitoring of system health.
What is the source code philosophy at Google?
Google has one large shared source base with core libraries and application-specific code
This promotes code reuse and improvements across applications.
What are some benefits of code reviews and design reviews?
Improve code quality, catch issues early, and facilitate knowledge sharing
They are part of good software engineering hygiene.
What challenges arise from multi-site software engineering?
Greater coordination overhead, harder communication, and the need to establish trust between teams
These challenges necessitate effective collaboration tools and practices.
What motivates Google’s expansion to multiple engineering sites?
To hire the best candidates regardless of geographic location
This strategy enhances talent acquisition and diversity.
What is a service-based model for software development?
A model that allows fluid changes, easy testing, and enables small teams to accomplish a lot
It promotes agility and responsiveness in software development.
What impact does work at Google have?
Work has a very large impact, affecting hundreds of millions of users every month
This scale of influence can be highly motivating for engineers.