L11 Chaos Engineering Flashcards
Classical Testing Does Not Scale
Classical Test Pyramid:
* Unit Tests → fast, trivial, fine-grained
* Integration Tests → long-running, complex, cover multiple components
* System Tests → ongoing, very complex, cover the entire system
Problems:
* The higher a test sits in the pyramid, the more effort it requires
* Entire systems are too complex to be tested completely, including all of their interdependencies
* Classical testing does not scale and therefore does not work
Definition of Chaos Engineering
Chaos Engineering is the discipline of experimenting on a distributed system in order to build
confidence in the system’s capability to withstand turbulent conditions in production
Tests vs. Experiments
Workflow of Chaos Experiments
Define Steady State
Defining a “healthy” system:
- Similar to a “baseline” for metrics
- Focus more on application metrics than on technical metrics
- Similar to SLOs / alerting
- Focus on “symptoms”, not “causes”
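A steady-state check can be sketched in Python. This is a minimal sketch, assuming the steady state is expressed as two hypothetical SLO-style thresholds on application metrics (request success rate and p99 latency), i.e. symptoms users notice rather than causes such as CPU load. The names `SteadyState`, `p99` and `is_steady` and the threshold values are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SteadyState:
    """Hypothetical SLO-style thresholds describing a "healthy" system."""
    min_success_rate: float    # e.g. 0.99 means 99 % of requests succeed
    max_p99_latency_ms: float  # upper bound for the 99th-percentile latency


def p99(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty sample."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(0.99 * len(ordered)))
    return ordered[rank - 1]


def is_steady(successes: int, total: int,
              latencies_ms: Sequence[float], slo: SteadyState) -> bool:
    """True if the observed application metrics meet the steady-state thresholds."""
    if total == 0 or not latencies_ms:
        raise ValueError("need at least one request and one latency sample")
    success_rate = successes / total
    return (success_rate >= slo.min_success_rate
            and p99(latencies_ms) <= slo.max_p99_latency_ms)


# Usage: the same check serves as the baseline before and after failure injection.
slo = SteadyState(min_success_rate=0.99, max_p99_latency_ms=300.0)
print(is_steady(995, 1000, [120.0] * 100, slo))  # True
print(is_steady(950, 1000, [120.0] * 100, slo))  # False: success rate 95 %
```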
Introducing Failures
Errors are introduced as described in the hypothesis, based on the following criteria:
* Errors can be of different kinds (see the failure categories), as well as combinations of them
* Define a blast radius within which the chaos should happen
* If effects occur outside the blast radius, the experiment must stop immediately
* Define a time frame in which the chaos occurs
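The criteria for introducing failures (blast radius, abort condition, time frame) can be combined into a small experiment runner. This is a hedged sketch: `run_chaos` and its callbacks `inject`, `revert` and `outside_healthy` are hypothetical and would have to be wired to your own fault-injection tooling and monitoring; no specific chaos tool's API is implied.

```python
import time
from typing import Callable, Iterable


def run_chaos(targets: Iterable[str],
              inject: Callable[[str], None],
              revert: Callable[[str], None],
              outside_healthy: Callable[[], bool],
              duration_s: float,
              check_interval_s: float = 1.0) -> bool:
    """Inject a failure into every target in the blast radius for at most
    `duration_s` seconds.

    Returns True if the time frame ran to completion, False if the run was
    aborted because the system outside the blast radius became unhealthy.
    """
    injected = []
    completed = True
    try:
        for target in targets:
            inject(target)
            injected.append(target)
        deadline = time.monotonic() + duration_s
        while time.monotonic() < deadline:
            if not outside_healthy():
                completed = False  # effects outside the blast radius: stop now
                break
            remaining = deadline - time.monotonic()
            time.sleep(max(0.0, min(check_interval_s, remaining)))
    finally:
        # Always roll back, whether the run completed, was aborted or raised.
        for target in injected:
            revert(target)
    return completed


# Usage with dummy callbacks that only record which hosts were touched.
injected_log, reverted_log = [], []
ok = run_chaos(["host-1", "host-2"], injected_log.append, reverted_log.append,
               outside_healthy=lambda: True, duration_s=0.05, check_interval_s=0.01)
print(ok, reverted_log)  # True ['host-1', 'host-2']
```

The `finally` block makes sure injected failures are rolled back both when the time frame ends and when the run is aborted early.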
Form a Hypothesis
Reflect on what might happen when you introduce failures:
* How will the steady state change upon failure injection?
* If resilience is the goal, the hypothesis is often something like:
“The system will operate normally”
* Think about the effect on a control group and on an affected (experimental) group
Be open about the result: there is no such thing as a failed experiment
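One way to make the hypothesis “The system will operate normally” testable is to compare the affected (experimental) group against the control group. A minimal sketch, assuming error rate is the steady-state metric and that a hypothetical tolerance of 0.5 percentage points still counts as “normal”; `hypothesis_holds` is an illustrative name.

```python
from typing import Tuple

# (errors, requests) observed for one group during the experiment
GroupStats = Tuple[int, int]


def error_rate(stats: GroupStats) -> float:
    errors, requests = stats
    return errors / requests if requests else 0.0


def hypothesis_holds(control: GroupStats, experiment: GroupStats,
                     tolerance: float = 0.005) -> bool:
    """Hypothesis: the experimental group's error rate exceeds the
    control group's by at most `tolerance` (an assumed threshold)."""
    return error_rate(experiment) - error_rate(control) <= tolerance


print(hypothesis_holds((10, 10_000), (20, 10_000)))   # True: 0.1 % vs 0.2 %
print(hypothesis_holds((10, 10_000), (200, 10_000)))  # False: 0.1 % vs 2.0 %
```

Comparing against a control group rather than against a fixed threshold filters out changes that affect the whole system anyway, such as a traffic peak during the experiment.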
Verifying Results and Fixing Issues
Each experiment should be clearly documented:
* Templates help to structure the experiment and its outcome
* If the result confirms the hypothesis → fine
* If the result disproves the hypothesis, analyze why this happened
* Action items should be filed and prioritized based on the analysis
* The control group helps to spot differences in the system's behavior
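Templates help to structure experiments and their outcomes. The record below is a hypothetical template (the class name, field names and example values are assumptions, not a standard format) that enforces two of the rules above: a disproved hypothesis must come with an analysis, and action items are kept ordered by priority.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ExperimentRecord:
    """Hypothetical documentation template for one chaos experiment."""
    name: str
    steady_state: str
    hypothesis: str
    injected_failure: str
    blast_radius: str
    hypothesis_confirmed: Optional[bool] = None  # None until the result is known
    analysis: str = ""
    action_items: List[Tuple[int, str]] = field(default_factory=list)

    def record_result(self, confirmed: bool, analysis: str = "") -> None:
        # A disproved hypothesis always requires an analysis of why it happened.
        if not confirmed and not analysis:
            raise ValueError("a disproved hypothesis needs an analysis")
        self.hypothesis_confirmed = confirmed
        self.analysis = analysis

    def add_action_item(self, priority: int, description: str) -> None:
        # Lower number = higher priority; keep the list ordered by priority.
        self.action_items.append((priority, description))
        self.action_items.sort(key=lambda item: item[0])


# Usage with made-up example values.
record = ExperimentRecord(
    name="checkout-latency",
    steady_state="99 % of checkout requests succeed",
    hypothesis="The system will operate normally",
    injected_failure="300 ms latency on the payment service",
    blast_radius="5 % of checkout instances",
)
record.record_result(False, "Retries without backoff overloaded the payment service")
record.add_action_item(2, "Add a dashboard for retry rates")
record.add_action_item(1, "Add exponential backoff to payment retries")
print(record.action_items[0][1])  # Add exponential backoff to payment retries
```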
Name 5 differences between tests and experiments.
Failure categories
Advanced Principles of Chaos Engineering
- Hypothesize about steady state.
- Vary real-world events.
- Run experiments in production.
- Automate experiments to run continuously.
- Minimize the blast radius.
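Minimizing the blast radius and using a control group can be combined when selecting targets. A hedged sketch, assuming instances are identified by hypothetical host names and that a small random fraction of them is an acceptable blast radius; `pick_groups` is an illustrative name. An automated, continuously running experiment could call it on every run, so that a different small set of instances is affected each time.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def pick_groups(instances: Sequence[str], fraction: float,
                seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Randomly pick a small experimental group (the blast radius) and an
    equally sized, disjoint control group from the available instances."""
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    if len(instances) < 2:
        raise ValueError("need at least two instances")
    size = max(1, math.floor(len(instances) * fraction))
    chosen = random.Random(seed).sample(list(instances), 2 * size)
    return chosen[:size], chosen[size:]


hosts = [f"host-{i}" for i in range(100)]
experimental, control = pick_groups(hosts, fraction=0.1, seed=42)
print(len(experimental), len(control))  # 10 10
```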