l10 observability Flashcards

1
Q

Observability

A

Telemetry Collection:
* Collecting metrics from various sources across all
layers (L3 – L7)
* Metrics must be gathered from the infrastructure as well as
from the elements of the application (deployments & services)
Analytics and Visibility
* Visualizing and Analyzing Metrics
* Mechanism for reporting anomalies
* Pod-to-pod packet capture is necessary
Security and Troubleshooting
* Tracing or a service mesh
* Mechanisms for Prediction and Analytics

2
Q

eBPF

A

Extended Berkeley Packet Filter:
* Sandboxed programs that are loaded from
user space and run inside the kernel
* Cilium (as one example) uses eBPF for better
networking instead of iptables

3
Q

Target of Observability, Critical Path

A

A span A is part of the critical path at time t
if and only if:
– A’s parent is blocked on A’s completion
at time t
– A is not blocked on any child span’s
completion at time t
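
The two conditions can be sketched in Python. This is a minimal illustration, not a tracing library's API: the `Span` class, the span names and the timings are hypothetical, and it assumes purely synchronous calls, so a span counts as blocked on a child for exactly as long as that child is running.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical span record: a name, start/end times, and the parent's name."""
    name: str
    start: float
    end: float
    parent: Optional[str] = None


def is_active(span: Span, t: float) -> bool:
    """A span is running at time t if t lies in [start, end)."""
    return span.start <= t < span.end


def on_critical_path(span: Span, spans: list, t: float) -> bool:
    """Check both conditions under the synchronous-call assumption."""
    if not is_active(span, t):
        return False
    by_name = {s.name: s for s in spans}
    # Condition 1: the parent is blocked on this span. Assumption: an active
    # parent waits for every running child; the root span has no parent and
    # is treated as satisfying the condition.
    parent = by_name.get(span.parent) if span.parent else None
    parent_blocked = parent is None or is_active(parent, t)
    # Condition 2: this span is not itself waiting for a running child.
    children = [s for s in spans if s.parent == span.name]
    blocked_on_child = any(is_active(c, t) for c in children)
    return parent_blocked and not blocked_on_child


# Hypothetical trace: frontend calls search, search calls db.
trace = [
    Span("frontend", 0, 100),
    Span("search", 10, 80, parent="frontend"),
    Span("db", 20, 60, parent="search"),
]


def critical_at(t: float) -> list:
    """Names of all spans on the critical path at time t."""
    return [s.name for s in trace if on_critical_path(s, trace, t)]
```

With this trace, `critical_at(30)` yields `["db"]` and `critical_at(70)` yields `["search"]`: the critical path sits at the deepest span that is currently doing the work.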

4
Q

SLI

A

Service Level Indicator
What are we measuring?
E.g. How long do the search results take?

  • Base for defining availability
  • For one specific action/attribute
  • Of one specific service
  • Examples to be defined
  • Golden Signals of one specific service for
    one operation (the more concise the better) for
    a network service
  • Durability for storage
  • Correctness for computation
  • Just the metrics, no thresholds / rules to meet
  • Need to be derived automatically
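
A minimal sketch of deriving an SLI automatically from raw measurements, assuming the SLI is the 95th-percentile latency of one operation. The sample values and the `percentile` helper (nearest-rank method) are illustrative assumptions, not part of the lecture material.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical search latencies in milliseconds for one service and one operation.
search_latencies_ms = [120, 95, 310, 180, 220, 140, 90, 480, 160, 200]

# The SLI is only the measured value; no threshold is attached to it.
sli_p95 = percentile(search_latencies_ms, 95)
print(f"p95 search latency: {sli_p95} ms")
```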
5
Q

SLO

A

Service Level Objective
How well do we perform on the SLI?
E.g. Queries should return results within 500 ms

  • Threshold to be held for defined SLIs:
    SLI <= target threshold
  • Technically hard to define, needs to be refined
  • Wrong SLI → no use
  • Threshold too low → Customer/Services affected
  • Threshold too high → Too many incidents, false alarms
  • Must be simple yet holistic
  • Avoid absolutes (always available, for all data accesses, etc.)
  • Organizationally hard to develop
  • Must be defined with product management
  • Have as few SLOs as possible (but as many as necessary)
    E.g. p95(http_latency{path=webappl/impressum}) < 50
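
The example objective can be sketched as a threshold check on an already measured SLI. The `Slo` class, the window labels and the measured values below are hypothetical; only the comparison against the threshold of 50 comes from the card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO: an SLI name plus the threshold it must stay below."""
    sli_name: str
    threshold: float

    def is_met(self, sli_value: float) -> bool:
        # Mirrors the card's example, which uses a strict "<" comparison.
        return sli_value < self.threshold


slo = Slo("p95(http_latency{path=webappl/impressum})", 50)

# Made-up p95 latency measurements, one per evaluation window.
measured_p95 = {"09:00": 31.0, "09:05": 47.5, "09:10": 62.0}

# Windows in which the objective was missed.
violations = [window for window, value in measured_p95.items() if not slo.is_met(value)]
print(violations)
```

Keeping the threshold in the `Slo` object and the measurement in the SLI reflects the split between the two cards: the SLI is just the metric, the SLO adds the rule.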
6
Q

SLA

A

Service Level Agreement
Consequences for missing objectives
E.g. Apologize, payback, …

  • Result if SLO is not met
  • Legal yet simple language with fixed, defined consequences
  • Promise to the customer, defined by Product Management (no longer by DevOps)
  • Not of interest for the rest of the lecture, since it is defined in contracts rather than in source code
7
Q

4 causes for failure

A

Internal system changes,
Changes in user behaviour,
Changes in dependencies,
Changes in platform

These are the system boundaries.

8
Q

Availability, Parallel vs Serial / Sequential

A

Parallel:
* HA-Setup of same services
* E.g. Horizontal Scaling
Parallel Component = 1 - (1 - C1) * (1 - C2)
Serial:
Series Component = C1 * C2 * C3 * C4…
* Different Services of different kinds
* E.g. Database and Messaging and Load Balancer
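
Both formulas can be checked with a short Python sketch. The function names and the 99 % figures are illustrative assumptions, and the components are assumed to fail independently of each other.

```python
from math import prod


def serial_availability(components):
    """Series: every component must be up, so the availabilities multiply."""
    return prod(components)


def parallel_availability(components):
    """Parallel: the setup is down only if all redundant components are down at once."""
    return 1 - prod(1 - c for c in components)


# Hypothetical availabilities of 99 % for each component.
ha_pair = parallel_availability([0.99, 0.99])     # two replicas of the same service
chain = serial_availability([0.99, 0.99, 0.99])   # database, messaging, load balancer

print(f"parallel: {ha_pair:.6f}")  # roughly 0.9999
print(f"serial:   {chain:.6f}")    # roughly 0.9703
```

Redundancy raises availability above that of a single component, while every additional service in a chain lowers it.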
