L7 - Cloud Monitoring 2/2 Flashcards
What is Logstash
Logstash = light-weight, open-source, server-side data processing pipeline that allows you to collect data from a variety of sources, transform it on the fly, and send it to your desired destination.
X-pack
X-Pack brought a number of deeply integrated enterprise capabilities to the Elastic Stack, including security, alerting, monitoring, and graph analytics.
What is Beats?
(lightweight, single-purpose data shippers)
agents to collect data
filebeat for logs
metricbeat for metrics
heartbeat for health
Elasticsearch provides integrations for data ingestion from major systems
What is Prometheus for Metrics?
Prometheus is an open source monitoring system
https://prometheus.io
Initially built at SoundCloud, now a Cloud Native Computing Foundation project
Features
Metric collection in form of time series
Storage by a time series database
Query language for accessing the time series
Alerting
Visualization
Cloud Native Computing Foundation
- pushes for a sustainable ecosystem for Cloud Native Computing
- hosts several fast-growing open source projects including Kubernetes, Prometheus and Envoy
- runs CloudNativeCon
Prometheus Architecture
Prometheus Scraping
- metrics retrieved through /metrics endpoint
- metric types:
- counter: cumulative metric, monotonically increasing
- gauge: numerical value that can arbitrarily go up and down
- histogram: counts per bucket, total sum, number of events
- metric name and labels define a time series
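A minimal sketch of the ideas above, assuming a hand-rolled Counter and Gauge rather than the official Prometheus client library. The metric names, label values, and the `render` helper are hypothetical; the output only approximates the plain-text lines a scraper reads from a /metrics endpoint.

```python
class Counter:
    """Cumulative metric: the value only ever increases."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Numerical value that may go up or down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)


def series_key(name, labels):
    """A time series is identified by its metric name plus its label set."""
    rendered = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{rendered}}}" if rendered else name


def render(metrics):
    """Render one 'series value' line per metric (illustrative format)."""
    return "\n".join(
        f"{series_key(name, labels)} {metric.value}"
        for name, labels, metric in metrics
    )


# Hypothetical metrics for illustration only.
requests = Counter()
requests.inc()
requests.inc(2)

temperature = Gauge()
temperature.set(21.5)
temperature.set(19.0)  # a gauge may decrease

exposition = render([
    ("http_requests_total", {"method": "get", "code": "200"}, requests),
    ("room_temperature_celsius", {}, temperature),
])
print(exposition)
```

The same metric name with a different label set (for example code="500") would be a separate time series.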
Prometheus Exporters
Exporters provide metrics for services that cannot be instrumented directly
Prometheus Alerting
- split between the Prometheus server and Alertmanager
- alerting rules on the Prometheus server determine when an alert is raised
What is Prometheus Alertmanager?
- manages alerts, including silencing, inhibition, grouping and sending out notifications
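To make silencing and grouping concrete, here is a rough sketch in plain Python. It is not Alertmanager's implementation or configuration format: the `route` function, the alert dictionaries, and the label names are all assumptions chosen for illustration, and inhibition is left out.

```python
from collections import defaultdict


def route(alerts, group_by, silences):
    """Drop silenced alerts, then group the rest by the given labels.

    alerts:   list of dicts mapping label name to label value
    group_by: label names used to form one notification per group
    silences: list of label matchers; an alert matching all labels of
              any matcher is suppressed
    """
    groups = defaultdict(list)
    for alert in alerts:
        silenced = any(
            all(alert.get(k) == v for k, v in matcher.items())
            for matcher in silences
        )
        if silenced:
            continue
        key = tuple(alert.get(label) for label in group_by)
        groups[key].append(alert["alertname"])
    return dict(groups)


# Hypothetical alerts for illustration only.
alerts = [
    {"alertname": "HighErrorRate", "cluster": "eu", "instance": "a"},
    {"alertname": "HighErrorRate", "cluster": "eu", "instance": "b"},
    {"alertname": "DiskFull", "cluster": "us", "instance": "c"},
    {"alertname": "HighLatency", "cluster": "test", "instance": "d"},
]
notifications = route(alerts, group_by=["cluster"], silences=[{"cluster": "test"}])
print(notifications)
```

Each key in the result stands for one grouped notification, so two failing instances in the same cluster produce a single message instead of two.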
How does Prometheus scale?
- hierarchical federation of servers
How does visualization with Grafana work?
- open-source
- Prometheus can be used as a data source (connect through IP address and port; create your own dashboard)
What do Borgmon instances do?
- receive a list of target services, e.g. from a discovery service
- periodically collect data from each service's monitoring interface (the collection is distributed over the period; the results are decoded and stored in memory as time series)
- metrics are counters and gauges
What are gauges?
instantaneous measurements e.g. CPU utilization
How does time series storage work?
- each metric is stored in a time series
- entries are (timestamp, value) pairs
What is in-memory buffering for Google Borgmon?
- local buffer is designed to hold the time series for a certain time horizon
- oldest entries are deleted once the horizon is reached
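The buffering behaviour can be sketched as follows. This is an illustrative toy, not Borgmon's actual data structure: the `TimeSeriesBuffer` class and the 60-second horizon are assumptions made for the example.

```python
from collections import deque


class TimeSeriesBuffer:
    """Holds (timestamp, value) pairs for a fixed time horizon."""

    def __init__(self, horizon_seconds):
        self.horizon = horizon_seconds
        self.points = deque()  # oldest entry on the left

    def append(self, timestamp, value):
        self.points.append((timestamp, value))
        # Delete the oldest entries once they fall outside the horizon.
        cutoff = timestamp - self.horizon
        while self.points and self.points[0][0] < cutoff:
            self.points.popleft()


# Hypothetical samples: one measurement every 30 seconds, 60-second horizon.
buffer = TimeSeriesBuffer(horizon_seconds=60)
for t in (0, 30, 60, 90):
    buffer.append(t, t * 2)

print(list(buffer.points))
```

A deque keeps both appending the newest sample and dropping the oldest one cheap, which fits a buffer that is written on every collection cycle.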
How to compute the ratio of errors to requests?
- Aggregate rates of response codes across instances
- Compute error rate for entire cluster
- Compute ratio of errors to requests
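The three steps above can be sketched in Python. This is not Borgmon rule syntax; the function names, the sample counter values, and the choice to treat HTTP 5xx codes as errors are assumptions for illustration.

```python
def rate(first, last):
    """Per-second increase of a counter between two (timestamp, value) samples."""
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


def aggregate_rates(samples):
    """Step 1: sum per-instance rates for each response code.

    samples maps (instance, response_code) to a list of (timestamp, count).
    """
    by_code = {}
    for (_instance, code), points in samples.items():
        by_code[code] = by_code.get(code, 0.0) + rate(points[0], points[-1])
    return by_code


def error_ratio(by_code):
    """Steps 2 and 3: cluster-wide error rate divided by total request rate."""
    total = sum(by_code.values())
    errors = sum(r for code, r in by_code.items() if code >= 500)  # assumption: 5xx = error
    return errors / total if total else 0.0


# Hypothetical counter samples taken 10 seconds apart on two instances.
samples = {
    ("a", 200): [(0, 100), (10, 190)],
    ("a", 500): [(0, 5), (10, 10)],
    ("b", 200): [(0, 50), (10, 140)],
    ("b", 500): [(0, 0), (10, 5)],
}
cluster_rates = aggregate_rates(samples)
ratio = error_ratio(cluster_rates)
print(cluster_rates, ratio)
```

Working with rates instead of raw counter values matters because counters are cumulative; only their increase over the interval says something about current behaviour.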
Alerting rules in Borgmon
- specify a condition for alerting
- minimum duration of the alerting situation (alerts have pending and firing states)
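A small sketch of how a condition plus a minimum duration yields the pending and firing states. The `AlertRule` class, the extra "inactive" state name, and the 5% threshold are assumptions for illustration, not Borgmon's rule language.

```python
class AlertRule:
    """Alert that fires only after its condition has held for min_duration."""

    def __init__(self, condition, min_duration):
        self.condition = condition
        self.min_duration = min_duration
        self.active_since = None  # time at which the condition first held

    def evaluate(self, now, value):
        if not self.condition(value):
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.min_duration:
            return "firing"
        return "pending"


# Hypothetical rule: error ratio above 5% for at least 120 seconds.
rule = AlertRule(condition=lambda ratio: ratio > 0.05, min_duration=120)
observations = [(0, 0.01), (60, 0.10), (120, 0.12), (180, 0.20), (240, 0.01)]
states = [rule.evaluate(t, v) for t, v in observations]
print(states)
```

The pending state prevents a single short spike from sending a notification; the alert only fires if the condition persists.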
Borgmon summary concerning monitoring and usage
monitoring system:
- provides measurements of metrics
- stores measurements as time series
- rules for aggregation
- hierarchical design for scalability
usage:
- alerting
- dashboard
What is Amazon CloudWatch?
- monitoring and management system
- collects:
- metrics and logs
What are some preselected metrics in Amazon CloudWatch?
- CPU utilization
- read/write latency
- request counts and latency for LB
How can you access Amazon CloudWatch besides the management console?
- CLI
- Web service API
- Libraries for Java etc.
What is Google Dapper?
Google's system for distributed tracing.
What is tracing?
- capture the interaction of different services
- capture the individual events e.g. submit a request, receive the request, start processing, submit answer, receive answer
- associate events with a given request to be able to analyze the execution of this request
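The last point, associating events with a request, can be sketched like this. It is a simplified illustration rather than Dapper's data model: the event tuples, the `trace_id` values, and the service names are assumptions made for the example.

```python
from collections import defaultdict


def group_by_trace(events):
    """Associate events with their request and order them by time.

    events: iterable of (trace_id, timestamp, service, event_name)
    """
    traces = defaultdict(list)
    for trace_id, timestamp, service, name in events:
        traces[trace_id].append((timestamp, service, name))
    for timeline in traces.values():
        timeline.sort()  # events may arrive out of order from different services
    return dict(traces)


# Hypothetical events reported by two services, deliberately out of order.
events = [
    ("req-1", 0.004, "backend", "receive request"),
    ("req-1", 0.000, "frontend", "submit request"),
    ("req-2", 0.001, "frontend", "submit request"),
    ("req-1", 0.020, "backend", "submit answer"),
    ("req-1", 0.005, "backend", "start processing"),
    ("req-1", 0.023, "frontend", "receive answer"),
]
traces = group_by_trace(events)
timeline = traces["req-1"]
duration = timeline[-1][0] - timeline[0][0]
print([name for _, _, name in timeline], duration)
```

Once the events of one request are on a single timeline, the gaps between them show where the time was spent, for example in the network or in backend processing.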