L7 Monitoring Flashcards
White / Black Box Monitoring
- Blackbox Monitoring: Look from outside into a “black box” (classical user perspective)
  - Source for alerts
  - External view
  - Defined outputs:
    - Latency
    - Return values / return codes
  - Prober is OSI-layer aware:
    - Layer 7: HTTP/S, DNS, gRPC
    - Layer 4: TCP (also used for TLS checks and plain-text protocols such as IMAP)
    - Layer 3: ICMP
  - Example: HTTP response time
- Whitebox Monitoring: Transparent look into a system
  - Internal metrics like CPU usage
  - Traces
  - Logs
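To make the blackbox view concrete, here is a minimal Python sketch of a layer-7 HTTP prober that, like a user, only sees the return code and the latency. The function name `probe_http` and the rule that any 2xx/3xx response counts as success are assumptions for illustration; a real setup would typically use a dedicated prober instead.

```python
import time
import urllib.error
import urllib.request


def probe_http(url: str, timeout: float = 5.0) -> dict:
    """Blackbox-style probe: report only what a user sees (return code, latency)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx: the server answered, so there is a return code to report
        status = err.code
    except (urllib.error.URLError, OSError):
        # connection refused, DNS failure, timeout, ...: no return code at all
        status = None
    latency = time.monotonic() - start
    return {
        # assumption: 2xx/3xx is treated as a successful probe
        "success": status is not None and 200 <= status < 400,
        "status_code": status,
        "latency_seconds": latency,
    }
```

The prober deliberately knows nothing about the system's internals; that is exactly what separates it from whitebox metrics such as CPU usage.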
Monitoring should address two questions:
- What is broken? -> Symptom -> Immediate alerts
- Why is it broken? -> Cause -> Proactive prevention
4 Golden Signals
Latency, Traffic, Errors, Saturation
Two monitoring methods related to these signals:
* RED: Rate (traffic), Errors, Duration (latency)
  * Application focus, no infrastructure or external systems
* USE: Utilization, Saturation, Errors
  * Focus on all resources, even external resources
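A minimal Python sketch of computing the RED signals for one service over one time window. The `Request` record, counting only 5xx status codes as errors, and using a nearest-rank 99th percentile for the duration are assumptions for illustration, not a prescribed method.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """One served request (hypothetical record format)."""
    timestamp: float  # seconds since epoch
    duration: float   # seconds
    status: int       # HTTP status code


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile of an ascending list (0.0 for an empty list)."""
    if not sorted_values:
        return 0.0
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def red_metrics(requests: List[Request], window_seconds: float) -> dict:
    """Rate, Errors and Duration for all requests observed in one time window."""
    total = len(requests)
    # assumption: only server-side failures (5xx) count as errors
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration for r in requests)
    return {
        "rate_per_second": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "duration_p99_seconds": percentile(durations, 99),
    }
```

All three values are derived from the requests the application itself serves, which is why RED fits the application focus, while USE would instead look at each resource (CPU, memory, disks, network) separately.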
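Saturation
How “full” is the system? Is it over-utilized, i.e. is work queuing up because a resource (CPU, memory, I/O) has reached its capacity?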
Metric Types
Counter
Monotonic Increase: A counter is a cumulative metric that only increases over time. It is used to track the number of occurrences of specific events or actions.
Reset on Restart: Counters typically reset to zero when the application (or the whole system) restarts; query functions such as PromQL's rate() treat a drop in value as such a reset.
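Because counters reset on restart, anything computing an increase has to detect resets first. A minimal sketch, assuming the samples are consecutive scrapes of one counter: any drop in value is treated as a reset to zero. This mirrors the idea behind PromQL's `increase()`/`rate()`, but it omits their extrapolation to the edges of the time window.

```python
from typing import List


def counter_increase(samples: List[float]) -> float:
    """Total increase of a counter across consecutive scraped samples.

    A drop in value is treated as a counter reset (e.g. an application
    restart): the counter started again from zero, so the new value itself
    is the increase since the reset.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            total += curr  # reset detected: counted up from 0 to curr
    return total
```

For the samples `10, 15, 20, 3, 8` the increase is 5 + 5 + 3 + 5 = 18; a naive "last minus first" would wrongly report -2.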
Gauge
Arbitrary Values: A gauge is a metric that can go up and down, representing a current value at a specific point in time.
Instantaneous Measurement: It measures the current value of some variable that can fluctuate, such as temperature, memory usage, or active connections.
Histogram:
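Buckets: Counts observations into predefined buckets (in Prometheus: cumulative buckets plus the total sum and count), so the distribution and quantiles can be estimated at query time.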
Examples: Response time distribution, event duration distribution.
Use Cases: Analyzing frequency and distribution of values.
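The sketch below shows cumulative buckets (each bucket counts all observations less than or equal to its upper bound `le`, ending with a `+Inf` bucket) and a quantile estimated by linear interpolation inside the bucket that contains the requested rank, similar in spirit to PromQL's `histogram_quantile()`. The class and method names are hypothetical, and the lower bound of the first bucket is assumed to be 0.

```python
import bisect
import math
from typing import List, Tuple


class Histogram:
    """Histogram with cumulative `le` buckets (hypothetical, Prometheus-like)."""

    def __init__(self, upper_bounds: List[float]):
        self.bounds = sorted(upper_bounds) + [math.inf]  # +Inf bucket catches everything
        self.counts = [0] * len(self.bounds)  # per-bucket (non-cumulative) counts
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # first bucket whose upper bound is >= value (bounds are inclusive: "le")
        idx = bisect.bisect_left(self.bounds, value)
        self.counts[idx] += 1
        self.sum += value
        self.count += 1

    def cumulative(self) -> List[Tuple[float, int]]:
        """(le, number of observations <= le) pairs, as they would be exposed."""
        result, running = [], 0
        for bound, c in zip(self.bounds, self.counts):
            running += c
            result.append((bound, running))
        return result

    def estimate_quantile(self, q: float) -> float:
        """Estimate quantile q (0..1) by linear interpolation within one bucket."""
        rank = q * self.count
        lower, prev_cum = 0.0, 0
        for bound, cum in self.cumulative():
            if cum >= rank:
                if math.isinf(bound):
                    return lower  # cannot interpolate into +Inf: use last finite bound
                fraction = (rank - prev_cum) / (cum - prev_cum) if cum > prev_cum else 0.0
                return lower + (bound - lower) * fraction
            lower, prev_cum = bound, cum
        return lower
```

With buckets `0.1, 0.5, 1.0` and observations `0.05, 0.2, 0.3, 0.7, 2.0`, the estimated median is 0.4: the true median 0.3 is only known to lie in the `(0.1, 0.5]` bucket, so the estimate's accuracy depends on how the buckets are chosen.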
Summary:
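Quantiles: Computes quantile estimates on the client side and exposes them together with total counts/sums; these precomputed quantiles cannot be meaningfully aggregated across instances.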
Examples: Request latency quantiles, performance analysis.
Use Cases: Tracking latency, analyzing performance characteristics.
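For contrast, a minimal sketch of a summary: the quantiles are computed inside the application and exported as finished numbers, together with the total count and sum. The sliding window of recent samples and the nearest-rank quantile are simplifying assumptions; real client libraries use streaming quantile algorithms instead.

```python
import math
from collections import deque
from typing import Sequence


class Summary:
    """Client-side summary (hypothetical): quantiles computed in the application."""

    def __init__(self, quantiles: Sequence[float] = (0.5, 0.9, 0.99), max_samples: int = 1000):
        self.quantiles = tuple(quantiles)
        self.window = deque(maxlen=max_samples)  # only the most recent observations
        self.count = 0   # total count since start (not windowed)
        self.sum = 0.0   # total sum since start (not windowed)

    def observe(self, value: float) -> None:
        self.window.append(value)
        self.count += 1
        self.sum += value

    def snapshot(self) -> dict:
        """Nearest-rank quantiles over the window, plus total count and sum."""
        ordered = sorted(self.window)
        values = {}
        for q in self.quantiles:
            if ordered:
                rank = max(math.ceil(q * len(ordered)), 1)
                values[q] = ordered[rank - 1]
            else:
                values[q] = math.nan
        return {"quantiles": values, "count": self.count, "sum": self.sum}
```

Since each instance exports already computed quantiles, averaging the medians of two instances does not yield the overall median; histogram buckets, in contrast, can be summed across instances before the quantile is estimated.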
Saving Metrics
Time Series Databases
* Cleaning up old data:
- Roll up: Summarizing values with coarser granularity after a given number of days
- Clean Up: Removing old data from the storage
- Archive: Moving old data to slower storage
* Continuously capturing data over time
* Data Stream: Data is inserted all the time; updates are rare (or do not occur at all)
* Data is time-centric
* All data is timestamped
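The three clean-up strategies can be sketched as one retention pass over a single series of `(timestamp, value)` samples. The thresholds, averaging as the roll-up function, and the name `apply_retention` are assumptions for illustration; expired samples are returned rather than thrown away so that they could be archived to slower storage.

```python
from collections import defaultdict
from statistics import fmean
from typing import List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)


def apply_retention(
    samples: List[Sample],
    now: int,
    raw_retention_s: int,
    rollup_step_s: int,
    max_retention_s: int,
) -> Tuple[List[Sample], List[Sample]]:
    """Return (kept, expired) for one time series.

    - age <= raw_retention_s: kept at full resolution
    - raw_retention_s < age <= max_retention_s: rolled up to one averaged
      sample per rollup_step_s bucket (coarser granularity)
    - age > max_retention_s: removed (returned so it could be archived)
    """
    raw: List[Sample] = []
    buckets = defaultdict(list)
    expired: List[Sample] = []
    for ts, value in samples:
        age = now - ts
        if age > max_retention_s:
            expired.append((ts, value))
        elif age > raw_retention_s:
            buckets[ts - ts % rollup_step_s].append(value)  # align to bucket start
        else:
            raw.append((ts, value))
    rolled = [(start, fmean(values)) for start, values in sorted(buckets.items())]
    return rolled + sorted(raw), expired
```

Because data is time-centric and practically never updated, such a pass only has to look at timestamps; no individual sample ever needs to be modified in place.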
PromQL
The stored metrics can then be queried
with Prometheus' own query language: PromQL
* Non-SQL style
* Only read, never update/write
* Vector-based
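* Built-in operators:
  * Arithmetic, comparison and set operators: +, -, >, <, !=, and, or, unless
  * Label matchers in selectors: =~ (regex match), != (not equal)
* Aggregation operators: avg, sum, stddev; functions over time ranges such as rate()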
Based on Time Range
* Returns data for a time range (evaluated at fixed steps) or for one fixed, defined timestamp
* Is the basis for dashboards as well as for alerting
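Queries are typically sent to Prometheus over its HTTP API: `/api/v1/query` evaluates an expression at a single timestamp, `/api/v1/query_range` evaluates it at fixed steps between `start` and `end` (which is what a dashboard graph needs). A minimal sketch that only builds the request URLs, assuming a server at `http://localhost:9090` and a hypothetical metric `http_requests_total` with a `job="api"` label:

```python
from typing import Optional, Union
from urllib.parse import urlencode

PROMETHEUS = "http://localhost:9090"  # assumption: local Prometheus on its default port


def instant_query_url(promql: str, time: Optional[float] = None, base: str = PROMETHEUS) -> str:
    """URL evaluating `promql` at one timestamp (server uses 'now' if time is omitted)."""
    params = {"query": promql}
    if time is not None:
        params["time"] = time
    return f"{base}/api/v1/query?{urlencode(params)}"


def range_query_url(
    promql: str,
    start: float,
    end: float,
    step: Union[str, float],
    base: str = PROMETHEUS,
) -> str:
    """URL evaluating `promql` at every `step` between `start` and `end`."""
    params = {"query": promql, "start": start, "end": end, "step": step}
    return f"{base}/api/v1/query_range?{urlencode(params)}"


# Per-second request rate over the last 5 minutes, summed across all instances.
# The metric name and label are hypothetical examples.
EXAMPLE_QUERY = 'sum(rate(http_requests_total{job="api"}[5m]))'
```

Both endpoints are read-only, in line with PromQL never updating or writing data.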
Explain scrape config
kubernetes_sd_configs: * Configures Kubernetes service discovery, including the role of the
elements to be scraped (e.g. node, service, pod, endpoints, ingress)
relabel_configs: * Modify labels using source_labels (which
existing labels are read), action (what should happen
with them, e.g. replace, keep, drop) and target_label (label to be
written)
* Very powerful to enrich monitoring data with further information, such as:
  * namespace, pod, all labels
  * adapting the URL suffix (metrics path)
  * filtering targets
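A simplified Python sketch of how `relabel_configs` rules are applied to one discovered target, covering only the `replace`, `keep` and `drop` actions. It follows the documented defaults as far as they are sketched here (source label values joined with `;`, fully anchored regex defaulting to `(.*)`, replacement defaulting to `$1`), but it is not Prometheus' implementation: for example, it does not expand references in `target_label` and does not remove `__`-prefixed labels after relabeling. The `prometheus_io_*` annotation labels in the example assume the common (but not built-in) `prometheus.io/scrape` and `prometheus.io/path` pod-annotation convention.

```python
import re
from typing import Dict, List, Optional


def relabel(labels: Dict[str, str], rules: List[dict]) -> Optional[Dict[str, str]]:
    """Apply rules in order; return the new label set, or None if the target is dropped."""
    labels = dict(labels)  # work on a copy
    for rule in rules:
        action = rule.get("action", "replace")
        # values of all source labels are joined with the separator (";" by default)
        value = rule.get("separator", ";").join(
            labels.get(name, "") for name in rule.get("source_labels", [])
        )
        # relabel regexes are fully anchored, hence fullmatch
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        if action == "keep":
            if match is None:
                return None  # filtering: target is not scraped
        elif action == "drop":
            if match is not None:
                return None
        elif action == "replace":
            if match is None:
                continue
            # translate $1-style group references into Python's \1 syntax
            template = re.sub(r"\$(\d+)", r"\\\1", rule.get("replacement", "$1"))
            new_value = match.expand(template)
            if new_value:
                labels[rule["target_label"]] = new_value
            else:
                labels.pop(rule["target_label"], None)  # empty value removes the label
        else:
            raise ValueError(f"action not covered by this sketch: {action}")
    return labels


# Hypothetical discovered pod target and rules: filter, enrich, adapt metrics path.
EXAMPLE_TARGET = {
    "__address__": "10.0.0.5:8080",
    "__meta_kubernetes_namespace": "shop",
    "__meta_kubernetes_pod_name": "api-7d9f",
    "__meta_kubernetes_pod_annotation_prometheus_io_scrape": "true",
    "__meta_kubernetes_pod_annotation_prometheus_io_path": "/custom/metrics",
}
EXAMPLE_RULES = [
    {"action": "keep", "source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"], "regex": "true"},
    {"source_labels": ["__meta_kubernetes_namespace"], "target_label": "namespace"},
    {"source_labels": ["__meta_kubernetes_pod_name"], "target_label": "pod"},
    {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_path"], "regex": "(.+)", "target_label": "__metrics_path__"},
]
```

Applied to `EXAMPLE_TARGET`, the rules keep the pod, add `namespace="shop"` and `pod="api-7d9f"`, and set the metrics path to `/custom/metrics`; a pod annotated with `prometheus.io/scrape: "false"` is filtered out entirely.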