Intro Flashcards
Approaches for coping with load?
- Scaling up or vertical scaling
- Scaling out or horizontal scaling
- Using elastic systems when load is unpredictable
Scaling up or vertical scaling?
Moving to a more powerful machine
Scaling out or horizontal scaling?
Distributing the load across multiple smaller machines
What are Elastic systems? And when are they useful?
Systems that automatically add computing resources when they detect a load increase.
They are quite useful when load is highly unpredictable.
What are the three design principles for software systems in terms of maintainability?
- Operability
- Simplicity
- Evolvability
What is Operability?
Make it easy for operations teams to keep the system running smoothly.
Data systems can make routine tasks easy by, for example:
- Providing visibility into the runtime behavior and internals of the system, with good monitoring.
- Providing good support for automation and integration with standard tools.
- Providing good documentation and an easy-to-understand operational model (“If I do X, Y will happen”).
- Self-healing where appropriate, but also giving administrators manual control over the system state when needed.
What is Simplicity?
Make it easy for new engineers to understand the system, by removing as much complexity as possible.
What is Evolvability?
Make it easy for engineers to make changes to the system in the future.
What are functional and nonfunctional requirements?
- Functional requirements: what the application should do
- Nonfunctional requirements: general properties like security, reliability, compliance, scalability, compatibility and maintainability.
What is the difference between Latency and response time?
Response time is what the client sees; it is always measured on the client side and includes network and queueing delays as well as the actual time to process the request.
Latency is the duration that a request is waiting to be handled.
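The breakdown can be sketched with simple arithmetic (the component names and numbers below are illustrative assumptions, not from the source):

```python
# Illustrative breakdown of client-observed response time.
# All names and values here are assumptions for the example.
network_out_ms = 5     # request travels to the server
queueing_delay_ms = 40 # latency: time the request waits to be handled
service_time_ms = 20   # actual time to process the request
network_back_ms = 5    # response travels back to the client

# The client sees the sum of all of these, not just the service time.
response_time_ms = (network_out_ms + queueing_delay_ms
                    + service_time_ms + network_back_ms)
```

This is why response time can be much larger than service time even when the server itself is fast: the queueing delay dominates under load.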
When we measure response time, what is a better metric than average response time? And why?
Percentiles are a better metric than the average response time, because percentiles tell you how many users actually experienced a given delay.
- Median (50th percentile, or p50): half of user requests are served in less than the median response time, and the other half take longer.
- Higher percentiles such as the 95th, 99th, and 99.9th (p95, p99, and p999) show how bad your outliers are.
What are the common percentile measures used for response time?
- Median (50th percentile, or p50): half of user requests are served in less than the median response time.
- Higher percentiles such as the 95th, 99th, and 99.9th (p95, p99, and p999) show how bad your outliers are.
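A minimal sketch of computing these percentiles with the nearest-rank method; the `percentile` helper and the sample data are assumptions for illustration, not from the source:

```python
import math

def percentile(samples, p):
    """Value below which p percent of samples fall (nearest-rank method)."""
    ranked = sorted(samples)
    # Nearest rank: index of the smallest value covering p percent of samples.
    k = math.ceil(p / 100 * len(ranked)) - 1
    return ranked[max(0, k)]

# Response times in milliseconds for ten requests (illustrative data).
times_ms = [12, 15, 18, 20, 22, 25, 30, 45, 90, 450]

p50 = percentile(times_ms, 50)  # median: half of requests are faster
p95 = percentile(times_ms, 95)  # high percentiles expose the outliers
p99 = percentile(times_ms, 99)
```

Note how a single slow request (450 ms) barely moves the median but dominates p95 and p99, which is exactly why percentiles beat averages here.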
Give an example of using the 99.9th percentile as a measure of response time.
Amazon uses the 99.9th percentile for response time requirements of internal services, because the customers with the slowest requests often have the most data on their accounts.
What accounts for large part of response times at high percentiles? And why are high percentiles not common practice?
Queueing delays often account for a large part of the response time at high percentiles.
Optimizing response times at very high percentiles is expensive and often yields too little benefit.
What are SLOs and SLAs?
Service level objectives (SLOs) and service level agreements (SLAs) are contracts that define the expected performance and availability of a service. These metrics set expectations for clients of the service and allow customers to demand a refund if the SLA is not met.
For example, an SLA may state that the median response time must be less than 200 ms and the 99th percentile under 1 s.
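Checking such an SLA against measured response times could look like this sketch; the function names and thresholds are illustrative assumptions based on the example above:

```python
import math

def percentile(samples, p):
    """Value below which p percent of samples fall (nearest-rank method)."""
    ranked = sorted(samples)
    return ranked[max(0, math.ceil(p / 100 * len(ranked)) - 1)]

def meets_sla(times_ms, median_limit_ms=200, p99_limit_ms=1000):
    """True if measured response times satisfy both SLA thresholds."""
    return (percentile(times_ms, 50) < median_limit_ms
            and percentile(times_ms, 99) < p99_limit_ms)

# A fast median with one slow outlier still passes: p99 looks at the
# 99th percentile, and the single 900 ms request sits above it here.
ok = meets_sla([100] * 99 + [900])

# A uniformly slow service fails on the median threshold alone.
bad = meets_sla([300] * 100)
```

Monitoring systems typically evaluate checks like this over a rolling window of recent requests rather than over all traffic ever seen.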