L7 - Cloud Monitoring 2/2 Flashcards by Paolo Oppelt

What is Logstash?

Logstash is a lightweight, open-source, server-side data processing pipeline that collects data from a variety of sources, transforms it on the fly, and sends it to a destination of your choice.

X-Pack

X-Pack brought a number of deeply integrated enterprise capabilities to the Elastic Stack, including security, alerting, monitoring, and graph analytics.

What is Beats?

lightweight, single-purpose data shippers

agents that collect data
Filebeat for logs
Metricbeat for metrics
Heartbeat for health checks
Elasticsearch provides integrations for data ingestion from major systems

What is Prometheus for Metrics?

Prometheus is an open-source monitoring system
https://prometheus.io
Initially built at SoundCloud, now a Cloud Native Computing Foundation (CNCF) project

Features
Metric collection in form of time series
Storage by a time series database
Query language for accessing the time series
Alerting
Visualization

Cloud Native Computing Foundation

pushes for a sustainable ecosystem for Cloud Native Computing
hosts several fast-growing open source projects including Kubernetes, Prometheus and Envoy
runs CloudNativeCon

Prometheus Architecture

Prometheus Scraping

metrics are retrieved through the /metrics endpoint
metric types:
counter: cumulative metric, monotonically increasing
gauge: numerical value that can arbitrarily go up and down
histogram: counts per bucket, total sum of observed values, number of events
metric name and labels together define a time series
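
The three metric types and the identity of a time series can be illustrated with a minimal Python sketch. It does not use the Prometheus client library; the class names, the bucket bounds, and the `series_key` helper are illustrative assumptions, not Prometheus APIs.

```python
from bisect import bisect_left


class Counter:
    """Cumulative metric: the value may only go up."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter is monotonically increasing")
        self.value += amount


class Gauge:
    """Numerical value that can go up and down arbitrarily."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Tracks counts per bucket, the total sum, and the number of events."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One extra slot at the end acts as the catch-all (+Inf) bucket.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.total = 0.0
        self.count = 0

    def observe(self, value):
        # First bucket whose upper bound is >= value (bounds are inclusive).
        index = bisect_left(self.upper_bounds, value)
        self.bucket_counts[index] += 1
        self.total += value
        self.count += 1

    def cumulative(self):
        """Return cumulative bucket counts, one per bound plus +Inf."""
        running, result = 0, []
        for bucket_count in self.bucket_counts:
            running += bucket_count
            result.append(running)
        return result


def series_key(name, labels):
    """Metric name plus the sorted label pairs identify one time series."""
    return (name, tuple(sorted(labels.items())))
```

Two samples with the same metric name but different label values, for example `code="200"` and `code="500"`, produce different keys and therefore belong to different time series.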

Prometheus Exporters

exporters provide metrics for services that cannot be instrumented directly

Prometheus Alerting

split between the Prometheus server and Alertmanager
alerting rules determine when an alert is raised

What is Prometheus Alertmanager?

manages alerts, including silencing, inhibition, grouping and sending out notifications
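
Grouping and silencing can be pictured with a small Python sketch. This is not Alertmanager's API or configuration format; the alert dictionary layout, the label names, and both function names are assumptions made for illustration.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bundle alerts that share the same values for the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


def is_silenced(alert, matchers):
    """An alert is silenced when every matcher equals the alert's label."""
    return all(alert["labels"].get(k) == v for k, v in matchers.items())
```

With `group_by=["alertname", "cluster"]`, many instance-level alerts of the same kind in one cluster collapse into a single group, which could then be sent as one notification.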

How does Prometheus scale?

through hierarchical federation of servers

How does visualization with Grafana work?

open-source
Prometheus can be used as a data source (connect via IP address and port, then create your own dashboard)

What do Borgmon instances do?

receive a list of target services, e.g. from a discovery service
periodically collect data from each service's monitoring interface
(the collection is distributed over the period; results are decoded and stored in memory as time series)
metrics are counters and gauges
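
Distributing the collection over the period can be sketched as assigning each target its own offset within the interval. This is only an illustration in Python, not Borgmon's actual scheduling; the function name and the even spacing are assumptions.

```python
def scrape_offsets(targets, interval_seconds):
    """Spread targets evenly across one collection interval."""
    if not targets:
        return {}
    step = interval_seconds / len(targets)
    # Target i is collected i * step seconds after the interval starts.
    return {target: index * step for index, target in enumerate(targets)}
```

With four targets and a 60-second interval, the collections start at 0, 15, 30, and 45 seconds instead of all at once.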

What are gauges?

instantaneous measurements e.g. CPU utilization

How does time series storage work?

each metric is stored in a time series
entries are (timestamp, value) pairs

What is in-memory buffering for Google Borgmon?

local buffer is designed to hold the time series for a certain time horizon
oldest entries are deleted once the horizon is reached
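
A time series held as (timestamp, value) pairs with a fixed time horizon can be sketched in Python as follows. The class name and the eviction-on-append strategy are assumptions for illustration, not Borgmon's implementation.

```python
from collections import deque


class BufferedSeries:
    """In-memory time series that keeps only points within a horizon."""

    def __init__(self, horizon_seconds):
        self.horizon = horizon_seconds
        self.points = deque()  # (timestamp, value) pairs, oldest first

    def append(self, timestamp, value):
        self.points.append((timestamp, value))
        cutoff = timestamp - self.horizon
        # Delete the oldest entries once they fall outside the horizon.
        while self.points and self.points[0][0] < cutoff:
            self.points.popleft()
```

With a horizon of 60 seconds, appending points at 0, 30, 60, and 90 seconds leaves only the points at 30, 60, and 90 in the buffer.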

How to compute the ratio of error to requests?

Aggregate rates of response codes across instances
Compute error rate for entire cluster
Compute ratio of error to requests
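
The three steps can be sketched in Python. This is not Borgmon rule syntax; the data layout, the function names, and the choice to treat 5xx response codes as errors are assumptions, and counter resets are ignored for simplicity.

```python
def rate(samples):
    """Per-second rate from the first and last (timestamp, value) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def cluster_rates_by_code(per_instance):
    """Sum the per-instance rates of each response code across instances."""
    totals = {}
    for by_code in per_instance.values():
        for code, samples in by_code.items():
            totals[code] = totals.get(code, 0.0) + rate(samples)
    return totals


def error_ratio(rates_by_code):
    """Ratio of the error rate to the total request rate in the cluster."""
    total = sum(rates_by_code.values())
    # Assumption: response codes starting with "5" count as errors.
    errors = sum(r for code, r in rates_by_code.items() if code.startswith("5"))
    return errors / total if total else 0.0
```

The input maps each instance to its response-code counters, each given as a list of (timestamp, value) samples, so the ratio is computed from rates rather than from raw counter values.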

Alerting rules in Borgmon

specify a condition for alerting
specify a minimum duration for which the condition must hold (alerts have a pending and a firing state)
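
The pending and firing states can be sketched as a small state machine in Python. The class name, the state strings, and the threshold in the usage note are assumptions, not Borgmon or Prometheus syntax.

```python
class AlertRule:
    """Condition plus minimum duration before the alert fires."""

    def __init__(self, condition, for_seconds):
        self.condition = condition      # callable: value -> bool
        self.for_seconds = for_seconds  # how long the condition must hold
        self.active_since = None        # time the condition first held

    def evaluate(self, now, value):
        if not self.condition(value):
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"
```

For example, `AlertRule(lambda ratio: ratio > 0.05, 120)` stays pending while the error ratio has been above 5% for less than two minutes, fires after that, and resets as soon as the condition no longer holds.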

Borgmon summary concerning monitoring and usage

monitoring system:
- provides measurements of metrics
- stores measurements as time series
- rules for aggregation
- hierarchical design for scalability

usage:
- alerting
- dashboard

What is Amazon CloudWatch?

monitoring and management system
collects metrics and logs

What are some preselected metrics in Amazon CloudWatch?

CPU utilization
read/write latency
request counts and latency for load balancers (LB)

How can you access the management console for Amazon CloudWatch?

CLI
Web service API
Libraries for Java etc.

What is Google Dapper?

Google's system for distributed tracing.

What is tracing?

capture the interaction of different services
capture the individual events e.g. submit a request, receive the request, start processing, submit answer, receive answer
associate events with a given request to be able to analyze the execution of this request

What are the pros of Google Dapper?

- continuous and ubiquitous tracing
- low overhead
- application transparency
- scalability
- can collect payload data if added by the application developer
- can be used to enforce security policies (authentication and encryption)
- allows for runtime verification that provides greater assurance than source code audits

What is a Dapper trace tree?

- nodes are called spans: lifetime of a request
- edges indicate the temporal relationship

What is there to know about spans?

- represented as a remote procedure call (RPC)
- attributes:
  - span id: identifies a span
  - parent id: span id of the triggering span
  - trace id: identifies the triggering request
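
How the three ids link spans into a trace tree can be sketched in Python. The `Span` fields mirror the attributes above, while the span names, the use of `None` as the parent id of the root span, and the helper functions are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str              # identifies the triggering request
    span_id: str               # identifies this span
    parent_id: Optional[str]   # span id of the triggering span, None for the root
    name: str


def build_children(spans):
    """Index the spans of one trace by their parent id."""
    roots, children = [], defaultdict(list)
    for span in spans:
        if span.parent_id is None:
            roots.append(span)
        else:
            children[span.parent_id].append(span)
    return roots, children


def render(span, children, depth=0):
    """Return the tree below a span as indented lines."""
    lines = ["  " * depth + span.name]
    for child in children[span.span_id]:
        lines.extend(render(child, children, depth + 1))
    return lines
```

Calling `build_children` on all spans that share one trace id and then `render` on the root yields an indented outline of the calls triggered by that request.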

What are the advantages of Dapper?

- Dapper does not collect any payload data
- Dapper can be used to enforce security policies
- Such runtime verification provides greater assurance than source code audits
- Continuous and ubiquitous tracing
- Low overhead
- Application transparency
- Scalability

Why is there overhead in Dapper?

- trace generation and collection
- amount of resources needed to store and analyze trace data

How to reduce overhead in Dapper?

- coalesce events: multiple trace events are coalesced into a single log file write operation
- asynchronous writes: writes are asynchronous to the traced application
- adaptive sampling at the application (only a certain rate of requests per second is captured)
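
Two of these techniques, rate-based sampling and coalescing of events, can be sketched in Python. This is a simplified illustration, not Dapper's implementation: the fixed per-second budget stands in for adaptive sampling, asynchronous writing is left out, and all class and parameter names are assumptions.

```python
class RateLimitedSampler:
    """Capture at most max_per_second requests in each one-second window."""

    def __init__(self, max_per_second):
        self.max_per_second = max_per_second
        self.window = None
        self.taken = 0

    def should_sample(self, now):
        window = int(now)  # the whole second this request falls into
        if window != self.window:
            self.window = window
            self.taken = 0
        if self.taken < self.max_per_second:
            self.taken += 1
            return True
        return False


class CoalescingWriter:
    """Buffer trace events and hand them to the sink in batches."""

    def __init__(self, batch_size, sink):
        self.batch_size = batch_size
        self.sink = sink  # callable that receives a list of events
        self.buffer = []

    def record(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # One write operation for all buffered events.
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer.clear()
```

With a batch size of 3, seven recorded events result in two writes of three events each, and a final `flush()` writes the remaining one.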

What can Dapper Depot API be used for?

- access to traces via trace id

What is an open-source alternative to Dapper?

OpenTelemetry

(32 cards)