Prometheus Flashcards

1
Q

Main components of the Prometheus ecosystem

A
  1. the main Prometheus server which scrapes and stores time series data
  2. client libraries for instrumenting application code
  3. a push gateway for supporting short-lived jobs
  4. special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
  5. an alertmanager to handle alerts
  6. various support tools
2
Q

PromQL datatypes

A
  1. Instant vector
  2. Range vector
  3. Scalar
  4. String
3
Q

Instant vector

A

a set of time series containing a single sample for each time series, all sharing the same timestamp

4
Q

Range vector

A

a set of time series containing a range of data points over time for each time series

5
Q

Range vector time selection:

A

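In Prometheus 2.x the range is a closed interval, i.e. samples with timestamps coinciding with either boundary of the range are still included in the selection. Since Prometheus 3.0 the range is left-open and right-closed: a sample exactly on the left boundary is excluded, while a sample exactly on the right boundary is included.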

6
Q

Offset modifier

A
  1. allows changing the time offset for individual instant and range vectors in a query.
  2. always needs to follow the selector immediately
7
Q

@ modifier

A
  1. allows changing the evaluation time for individual instant and range vectors in a query; the time is given as a Unix timestamp
  2. always needs to follow the selector immediately
8
Q

vector1 unless vector2

A
  1. a vector consisting of the elements of vector1 for which there are no elements in vector2 with exactly matching label sets.
  2. all matching elements in both vectors are dropped
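
The filtering behaviour of unless can be sketched in a few lines of Python. This is only an illustration, not how Prometheus implements it: the names labels and unless are hypothetical helpers, a vector is modelled as a dict from label set to value, and the metric name is assumed to be left out of the label sets.

```python
from typing import Dict, FrozenSet, Tuple

LabelSet = FrozenSet[Tuple[str, str]]
Vector = Dict[LabelSet, float]


def labels(**kv: str) -> LabelSet:
    """Build a hashable label set from keyword arguments."""
    return frozenset(kv.items())


def unless(vector1: Vector, vector2: Vector) -> Vector:
    """Keep elements of vector1 whose label set has no exact match in vector2."""
    return {ls: value for ls, value in vector1.items() if ls not in vector2}


v1 = {
    labels(job="api", instance="a"): 1.0,
    labels(job="api", instance="b"): 2.0,
}
v2 = {labels(job="api", instance="b"): 9.0}

# Only the element without a counterpart in v2 survives.
remaining = unless(v1, v2)
```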
9
Q

Binary operator precedence (highest to lowest)

A
  1. ^
  2. *, /, %, atan2
  3. +, -
  4. ==, !=, <=, <, >=, >
  5. and, unless
  6. or
10
Q

One-to-one vector matches

A
  1. finds a unique pair of entries from each side of the operation.
11
Q

One-to-one vector match: on / ignoring

A
  1. The ignoring keyword allows ignoring certain labels when matching
  2. the on keyword allows reducing the set of considered labels to a provided list
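
One way to picture on and ignoring is as two ways of building the key that two series are compared by. The sketch below is a simplified model, not Prometheus code: match_key is a hypothetical helper, and it assumes the metric name (stored here under __name__) is never part of the key unless it is listed in on.

```python
from typing import Dict, Iterable, Optional, Tuple

Labels = Dict[str, str]


def match_key(
    labels: Labels,
    on: Optional[Iterable[str]] = None,
    ignoring: Optional[Iterable[str]] = None,
) -> Tuple[Tuple[str, str], ...]:
    """Return the label pairs that two series must share to be matched."""
    if on is not None:
        # on(...): only the listed labels are considered.
        wanted = set(on)
        kept = {k: v for k, v in labels.items() if k in wanted}
    else:
        # ignoring(...): every label except the listed ones is considered.
        dropped = set(ignoring or ()) | {"__name__"}
        kept = {k: v for k, v in labels.items() if k not in dropped}
    return tuple(sorted(kept.items()))


left = {"__name__": "errors", "method": "get", "code": "500"}
right = {"__name__": "requests", "method": "get"}

# With ignoring(code) or on(method) the two series share a key and would pair up.
same_with_ignoring = match_key(left, ignoring=["code"]) == match_key(right, ignoring=["code"])
same_with_on = match_key(left, on=["method"]) == match_key(right, on=["method"])
```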
12
Q

Aggregation operators: without clause

A

removes the listed labels from the result vector, while all other labels are preserved in the output.

13
Q

Aggregation operators: by clause

A

drops labels that are not listed
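
The difference between by and without can be sketched with a toy sum aggregation. This is an illustrative model only: sum_agg is a hypothetical helper, samples are (labels, value) pairs, and the metric name is assumed to be dropped from the result.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Labels = Dict[str, str]
Key = Tuple[Tuple[str, str], ...]


def sum_agg(
    samples: List[Tuple[Labels, float]],
    by: Optional[Iterable[str]] = None,
    without: Optional[Iterable[str]] = None,
) -> Dict[Key, float]:
    """Sum sample values per output label set."""
    out: Dict[Key, float] = defaultdict(float)
    for labels, value in samples:
        if by is not None:
            # by(...): keep only the listed labels.
            wanted = set(by)
            kept = {k: v for k, v in labels.items() if k in wanted}
        else:
            # without(...): keep everything except the listed labels.
            dropped = set(without or ()) | {"__name__"}
            kept = {k: v for k, v in labels.items() if k not in dropped}
        out[tuple(sorted(kept.items()))] += value
    return dict(out)


samples = [
    ({"job": "api", "instance": "a", "code": "200"}, 3.0),
    ({"job": "api", "instance": "b", "code": "200"}, 4.0),
    ({"job": "api", "instance": "a", "code": "500"}, 1.0),
]

by_code = sum_agg(samples, by=["code"])
without_instance = sum_agg(samples, without=["instance"])
```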

14
Q

Types of rules in Prometheus

A
  1. recording rules
  2. alerting rules
15
Q

To quickly check whether a rule file is syntactically correct

A

promtool check rules /path/to/example.rules.yml

16
Q

Recording rules

A
  1. allow you to precompute frequently needed or computationally expensive expressions and save their result as a new set of time series
  2. especially useful for dashboards
17
Q

Recording / alerting groups

A

Rules within a group are run sequentially at a regular interval, with the same evaluation time.

18
Q

Elements of recording rules

A
  1. record
  2. expr
  3. labels: [ <labelname>: <labelvalue> ]
19
Q

Elements of alerts

A
  1. alert: name of the alert
  2. expr: the PromQL expression to evaluate
  3. for: <duration> (alerts are considered firing once they have been returned for this long; until then they are pending)
  4. keep_firing_for: <duration>
  5. labels
  6. annotations
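
The effect of the for field can be sketched as a small state function. This is a simplified model under stated assumptions: alert_state is a hypothetical helper, times are plain seconds, and keep_firing_for is not modelled.

```python
from typing import Optional


def alert_state(active_since: Optional[float], now: float, for_duration: float) -> str:
    """Classify an alert by how long its expression has been returning a result.

    active_since is the time the expression first returned this alert,
    or None if the expression currently returns nothing for it.
    """
    if active_since is None:
        return "inactive"
    if now - active_since >= for_duration:
        return "firing"
    return "pending"


# With for: 5m (300 seconds), an alert active for 150 s is still pending.
state_early = alert_state(active_since=100.0, now=250.0, for_duration=300.0)
state_late = alert_state(active_since=100.0, now=400.0, for_duration=300.0)
```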
20
Q

Limiting alerts and series

A

A limit for alerts produced by alerting rules and series produced by recording rules can be configured per group. When the limit is exceeded, all series produced by the rule are discarded, and if it’s an alerting rule, all alerts for the rule (active, pending, or inactive) are cleared as well.

21
Q

Alertmanager

A
  1. handles alerts sent by client applications such as the Prometheus server.
  2. takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie
  3. takes care of silencing and inhibition of alerts.
22
Q

Core concepts of Alertmanager

A
  1. Grouping
  2. Inhibition
  3. Silences
  4. Client behavior
  5. High Availability
23
Q

Alert management: grouping

A

Grouping categorizes alerts of similar nature into a single notification
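
As a rough illustration of the idea, alerts can be bucketed by the values of a chosen set of labels, with one notification per bucket. The sketch below is not Alertmanager code: group_alerts is a hypothetical helper, alerts are plain label dicts, and the group_by list stands in for the labels a route would group on.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Alert = Dict[str, str]
GroupKey = Tuple[Tuple[str, str], ...]


def group_alerts(alerts: List[Alert], group_by: Sequence[str]) -> Dict[GroupKey, List[Alert]]:
    """Bucket alerts by the values of the group_by labels."""
    groups: Dict[GroupKey, List[Alert]] = defaultdict(list)
    for alert in alerts:
        # A missing label is treated like an empty value.
        key = tuple((name, alert.get(name, "")) for name in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"alertname": "InstanceDown", "cluster": "eu", "instance": "a"},
    {"alertname": "InstanceDown", "cluster": "eu", "instance": "b"},
    {"alertname": "InstanceDown", "cluster": "us", "instance": "c"},
]

# Two groups result, so two notifications would be sent instead of three.
grouped = group_alerts(alerts, group_by=["alertname", "cluster"])
```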

24
Q

Alert management: inhibition

A

suppressing notifications for certain alerts if certain other alerts are already firing.

25
Q

Alert management: silencing

A

a straightforward way to simply mute alerts for a given time.

26
Q

Prometheus metric types

A
  1. Counter
  2. Gauge
  3. Histogram
  4. Summary
27
Q

Counter metrics

A

a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart.

28
Q

Gauge metrics

A

a metric that represents a single numerical value that can arbitrarily go up and down.

29
Q

Histogram metrics

A

samples observations (usually things like request durations or response sizes) and counts them in configurable buckets.

Buckets are cumulative: each bucket counts all observations less than or equal to its upper bound.

30
Q

Histogram exposes the following metrics

A
  1. <basename>_bucket: cumulative counters for the observation buckets
  2. <basename>_sum: the total sum of all observed values
  3. <basename>_count: the count of events that have been observed
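
What the three series contain can be sketched with a toy histogram. This is an illustration only, not a client library: observe_all is a hypothetical helper, and the bucket bounds in the example are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def observe_all(
    observations: Sequence[float], upper_bounds: Sequence[float]
) -> Tuple[Dict[float, int], float, int]:
    """Return (cumulative bucket counts, sum, count) for a list of observations."""
    bounds: List[float] = sorted(upper_bounds) + [math.inf]
    buckets = {bound: 0 for bound in bounds}
    for value in observations:
        # Cumulative: a value is counted in every bucket whose bound it does not exceed.
        for bound in bounds:
            if value <= bound:
                buckets[bound] += 1
    return buckets, sum(observations), len(observations)


buckets, total, count = observe_all([0.25, 0.5, 1.5, 4.0], upper_bounds=[0.5, 1.0, 2.0])
```

The implicit +Inf bucket always equals the total count, because every observation falls under it.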
31
Q

Summary metrics

A
  1. Similar to a histogram, a summary samples observations
  2. it also calculates configurable quantiles over a sliding time window
32
Q

A summary with a base metric name of <basename> exposes multiple time series during a scrape:

A
  1. streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
  2. the total sum of all observed values, exposed as <basename>_sum
  3. the count of events that have been observed, exposed as <basename>_count
33
Q

Instance

A
  1. an endpoint you can scrape
  2. usually corresponds to a single process
34
Q

Job

A

A collection of instances with the same purpose, a process replicated for scalability or reliability for example

35
Q

Automatically generated labels and time series

A
  1. job
  2. instance (<host>:<port>)
36
Q

The three types of services

A
  1. online-serving
  2. offline-processing
  3. batch jobs
37
Q

Online-serving systems

A
  1. one where a human or another system is expecting an immediate response
  2. should be monitored on both the client and server side
38
Q

Metrics for online-serving systems

A
  1. number of performed queries
  2. errors
  3. latency
  4. number of in-progress requests
39
Q

Offline processing

A
  1. no one is actively waiting for a response, and batching of work is common
  2. there may also be multiple stages of processing
40
Q

Metrics for Offline processing

A

For each stage

  1. track the items coming in
  2. how many are in progress
  3. the last time you processed something
  4. how many items were sent out
  5. track batches going in and out
41
Q

Batch jobs

A

do not run continuously, which makes scraping them difficult.

42
Q

Metrics for batch jobs

A
  1. last time it succeeded
  2. how long each major stage of the job took
  3. overall runtime and the last time the job completed (successful or failed)
  4. overall job-specific statistics (e.g. total number of records processed)
43
Q

Types of subsystems

A
  1. Libraries
  2. Logging
  3. Failures
  4. Threadpools
  5. Caches
  6. Collectors
44
Q

Metric labels

A

Every time series is uniquely identified by its metric name and optional key-value pairs called labels.

  1. Changing any label value, including adding or removing labels, creates a new time series.
  2. Labels with an empty label value are considered equivalent to labels that do not exist.
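
Both rules can be pictured as properties of a series identity key. The sketch below is a simplified model: series_identity is a hypothetical helper that treats a series as a metric name plus its non-empty labels.

```python
from typing import Dict, Tuple

Identity = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_identity(name: str, labels: Dict[str, str]) -> Identity:
    """Identify a series by metric name and its labels, ignoring empty values."""
    non_empty = {k: v for k, v in labels.items() if v != ""}
    return name, tuple(sorted(non_empty.items()))


base = series_identity("http_requests_total", {"method": "get"})
with_empty = series_identity("http_requests_total", {"method": "get", "env": ""})
changed = series_identity("http_requests_total", {"method": "post"})
extra = series_identity("http_requests_total", {"method": "get", "env": "prod"})
```

Here base and with_empty identify the same series, while changed and extra each identify a different one.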
45
Q

Samples

A
  1. a float64 value
  2. a millisecond-precision timestamp
46
Q

Exposition formats

A
  1. Text-based format
  2. OpenMetrics Text Format
  3. Protobuf format
47
Q

Text-based formats

A
  1. All lines for a given metric must be provided as one single group, with the optional HELP and TYPE lines first
  2. Each line must have a unique combination of a metric name and labels.

http_requests_total{method="post",code="200"} 1027 1395066363000
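
The parts of such a sample line (metric name, labels, value, optional millisecond timestamp) can be pulled apart with a small parser. This is a deliberately simplified sketch, written with straight ASCII quotes: parse_sample is a hypothetical helper, and it assumes label values contain no escaped quotes, commas inside quotes are not a problem, and there are no closing braces inside values.

```python
import re
from typing import Dict, Optional, Tuple

SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"  # metric name
    r"(?:\{(?P<labels>[^}]*)\})?"            # optional {label="value",...}
    r"\s+(?P<value>\S+)"                     # sample value
    r"(?:\s+(?P<ts>-?\d+))?\s*$"             # optional timestamp in milliseconds
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line: str) -> Tuple[str, Dict[str, str], float, Optional[int]]:
    """Split one sample line into (name, labels, value, timestamp_ms)."""
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    ts = int(match.group("ts")) if match.group("ts") else None
    return match.group("name"), labels, float(match.group("value")), ts


parsed = parse_sample('http_requests_total{method="post",code="200"} 1027 1395066363000')
```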

48
Q

delta(v range-vector) => instant vector

A
  1. calculates the difference between the first and last value of each time series element in a range vector
  2. should only be used with gauges and native histograms
49
Q

deriv(v range-vector)

A
  1. calculates the per-second derivative of the time series in a range vector v, using simple linear regression
  2. should only be used with gauges
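
Simple linear regression here means fitting a straight line through the samples and taking its slope. A minimal sketch of that calculation, assuming samples are (timestamp in seconds, value) pairs; simple_deriv is a hypothetical helper, not the Prometheus implementation.

```python
from typing import List, Optional, Tuple


def simple_deriv(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Least-squares slope of value over time, in units per second."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        # All samples share one timestamp, so no slope is defined.
        return None
    return covariance / variance


# A gauge rising by 2 every 10 seconds.
slope = simple_deriv([(0.0, 1.0), (10.0, 3.0), (20.0, 5.0)])
```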
50
Q

irate(v range-vector)

A
  1. calculates the per-second instant rate of increase of the time series in the range vector.
  2. This is based on the last two data points
  3. should only be used when graphing volatile, fast-moving counters
51
Q

rate(v range-vector)

A
  1. calculates the per-second average rate of increase of the time series in the range vector.
  2. should only be used with counters and native histograms where the components behave like counters. It is best suited for alerting, and for graphing of slow-moving counters.
  3. should not be used with gauge metrics
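
The contrast between rate and irate can be sketched on raw counter samples. This is a simplified model under stated assumptions: simple_rate and simple_irate are hypothetical helpers, samples are (timestamp in seconds, value) pairs sorted by time, a drop in value is treated as a counter reset, and the extrapolation to the range boundaries that the real rate function performs is left out.

```python
from typing import List, Optional, Tuple

Samples = List[Tuple[float, float]]


def simple_rate(samples: Samples) -> Optional[float]:
    """Average per-second increase over all samples, tolerating counter resets."""
    if len(samples) < 2:
        return None
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        # After a reset the counter restarted from zero, so the new value is the increase.
        increase += value if value < previous else value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


def simple_irate(samples: Samples) -> Optional[float]:
    """Per-second increase between the last two samples only."""
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    increase = v1 if v1 < v0 else v1 - v0
    return increase / (t1 - t0)


# The counter resets between the second and third sample.
counter = [(0.0, 10.0), (15.0, 25.0), (30.0, 5.0), (45.0, 20.0)]
avg = simple_rate(counter)
instant = simple_irate(counter)
```

Because simple_irate looks only at the last two samples, it reacts quickly to changes but is noisy, which matches the advice to use irate for graphing volatile counters and rate for alerting.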
52
Q

Metrics registry

A

A set of metrics in an instrumented application that will be returned when scraped

53
Q

Alert labels vs alert annotations

A
  1. Alert labels should be used for metadata that uniquely identifies an alert
  2. Annotations should be used for longer form descriptive content of an alert
54
Q

ALERTS

A

a synthetic time series that can be queried to see the pending and firing alerts being evaluated by Prometheus

55
Q

Metric Names - characters

A
  1. may contain ASCII letters, digits, underscores, and colons
  2. must match the regex [a-zA-Z_:][a-zA-Z0-9_:]*
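
The regex can be checked directly in Python. A minimal sketch, where is_valid_metric_name is a hypothetical helper and fullmatch is used so the whole name has to match:

```python
import re

METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name: str) -> bool:
    """True if the whole name matches the metric name regex."""
    return METRIC_NAME_RE.fullmatch(name) is not None


checks = {
    name: is_valid_metric_name(name)
    for name in ["http_requests_total", "job:request_rate:sum", "2xx_total", "http-requests", ""]
}
```

A leading digit, a hyphen, or an empty name all fail the check.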
56
Q

Best practice: metric naming

A
  1. should have a (single-word) application prefix naming the domain the metric belongs to (prometheus_, process_, http_)
  2. should have a suffix describing the unit, in plural form, using a single base unit (e.g. seconds, bytes)
  3. should represent the same logical thing-being-measured across all label dimensions.