L7 - Cloud Monitoring Flashcards
Why monitor?
To make the best use of your rented resources, which reduces your costs and increases the satisfaction of your service's users
observable system
one that exposes enough data about itself that generating information (finding answers to questions not yet formulated) and accessing that information become simple
monitoring
process of collecting status information of applications and resources; the data can be used to observe the application and the infrastructure
monitoring system
consists of all components for gathering monitoring data at runtime
2 ways to create information
- proactively: through continuous analysis for triggering alarms or to give an overview of the status of the system
- reactively: triggered by events such as incidents (e.g. root cause analysis and autoscaling)
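A minimal sketch of the two ways, assuming a hypothetical CPU utilization series and a made-up alarm threshold of 0.8 (neither comes from the lecture). The proactive path checks every sample and reports alarms; the reactive path only looks at the data once an incident event has occurred.

```python
from statistics import mean

CPU_ALARM_THRESHOLD = 0.8  # assumed value for illustration only


def proactive_check(samples, threshold=CPU_ALARM_THRESHOLD):
    """Continuous analysis: return the indices of samples that trigger an alarm."""
    return [i for i, value in enumerate(samples) if value > threshold]


def reactive_analysis(samples, incident_index, window=3):
    """Event-triggered analysis: summarize the samples just before an incident."""
    start = max(0, incident_index - window)
    recent = samples[start:incident_index]
    return {
        "window_start": start,
        "mean_before_incident": mean(recent) if recent else None,
    }


cpu = [0.35, 0.40, 0.55, 0.85, 0.90, 0.60]  # hypothetical utilization samples
alarms = proactive_check(cpu)                    # runs on every new sample
report = reactive_analysis(cpu, incident_index=4)  # runs only after an incident
```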
What is the purpose of monitoring at the infrastructure level?
- resource management
- incident detection
- root cause analysis
- metering for payment
- auditing
What is the purpose of monitoring at the application level?
- performance analysis
- resource management
- failure detection and resolution
- SLA verification
- auditing
How does monitoring take place in a parallel system?
- batch system
- data are collected during an application run
- analysis happens post mortem
- execution is reproducible
How does monitoring take place in the cloud?
- interactive system
- data are continuously produced - realtime data
- realtime analysis
- data used for immediate action or to study past system behavior
3 pillars of monitoring
- metrics
- logs
- traces
4 important metrics in monitoring
- latency
- throughput or traffic
- error rate
- utilization or saturation
What is latency?
- time it takes to service a request
- measured separately for successful and failed requests
What is throughput or traffic?
- web services: requests/second
- streaming system: network I/O rate or concurrent sessions
- database: transaction/second or retrievals per second
What is the error rate?
- rate of requests that fail
What is utilization or saturation?
- percentage of capacity
- CPU, memory, I/O
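The four metrics can be sketched for a single observation window as follows. The `Request` record, the `summarize` helper, and the choice to express utilization as busy time divided by window length are all assumptions made for illustration, not definitions from the lecture.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float  # time it took to service the request
    ok: bool            # True if the request succeeded


def summarize(requests, window_seconds, busy_seconds):
    """Compute latency, throughput, error rate and utilization for one window."""
    ok_durations = [r.duration_ms for r in requests if r.ok]
    failed = [r for r in requests if not r.ok]
    return {
        # latency is reported here for successful requests only
        "latency_ms": sum(ok_durations) / len(ok_durations) if ok_durations else None,
        "throughput_rps": len(requests) / window_seconds,
        "error_rate": len(failed) / len(requests) if requests else 0.0,
        # assumed definition: fraction of the window the resource was busy
        "utilization": busy_seconds / window_seconds,
    }


sample = [Request(100, True), Request(300, True), Request(50, False), Request(200, True)]
summary = summarize(sample, window_seconds=2, busy_seconds=0.5)
```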
What are metrics collected for in microservices?
Autoscaling, performance tuning
What are metrics collected for on a platform like K8s or Docker?
container distribution, autoscaling VM cluster
What are metrics collected for in the infrastructure?
- root cause analysis
What are metrics collected for on the hardware?
management of VMs
Monitoring system requirements
- comprehensive
- low intrusion
- extensibility
- scalability
- elasticity
- accuracy
What is Blackbox Monitoring?
- the monitored system is treated as a black box
- no data are gained from inside of the system
- e.g. only the request interface of a service is visible. Nothing about the internal structure
From the internet:
Black box monitoring refers to the monitoring of servers with a focus on areas such as disk space, CPU usage, memory usage, load averages, etc.
What is whitebox monitoring?
- data also come from inside the system
- this gives more context and more detailed insights
- e.g. internal organization of a service is visible
e.g. Performing advanced detection of behavior we don’t expect to see, such as a user not going through the normal steps you’d expect when signing into your application or resetting a password.
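The difference between the two views can be sketched with a toy service. Everything here (the `Service` class, its cache counters, and the two helper functions) is hypothetical: the blackbox probe only uses the request interface, while the whitebox snapshot reads counters exposed from inside the service.

```python
import time


class Service:
    """Toy service with a public request interface and internal state."""

    def __init__(self):
        self._cache = {}
        self.cache_hits = 0    # internal counters, visible only to whitebox monitoring
        self.cache_misses = 0

    def handle(self, key):
        if key in self._cache:
            self.cache_hits += 1
        else:
            self.cache_misses += 1
            self._cache[key] = key.upper()
        return self._cache[key]


def blackbox_probe(service, key):
    """Uses only the request interface: observes the response and its latency."""
    start = time.perf_counter()
    response = service.handle(key)
    return {"response": response, "latency_s": time.perf_counter() - start}


def whitebox_snapshot(service):
    """Reads data exposed from inside the service."""
    return {"cache_hits": service.cache_hits, "cache_misses": service.cache_misses}


svc = Service()
probe_results = [blackbox_probe(svc, key) for key in ("a", "a", "b")]
snapshot = whitebox_snapshot(svc)
```

The probe can tell that requests are answered and how fast, but only the snapshot reveals why (here, how often the cache was hit).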
Why is there overhead in monitoring?
- instrumentation
- computation for aggregations
- memory overhead for buffering
- time to push to disk or transfer to collector
- storage overhead for long-term storage
What is instrumentation?
Instrumentation is the process of adding code to your application so you can understand its inner state
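A minimal sketch of instrumentation, assuming a hypothetical in-memory `call_stats` store and a made-up `handle_request` function: a decorator adds code around the function so that its call count and total duration become observable.

```python
import functools
import time

call_stats = {}  # hypothetical in-memory store: function name -> counters


def instrumented(func):
    """Wrap func so that every call records its count and total duration."""
    stats = call_stats.setdefault(func.__name__, {"calls": 0, "total_seconds": 0.0})

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # recorded even if func raises, so failures are counted too
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start

    return wrapper


@instrumented
def handle_request(n):
    return sum(range(n))


handle_request(10)
handle_request(100)
```

The extra timing and bookkeeping on every call is exactly the kind of cost that shows up as monitoring overhead.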
What does overhead lead to?
intrusion
How can overhead in monitoring be reduced?
- fewer metrics
- lower measurement frequency
- more compact representation
- batching
- sampling
- long-term coarsening
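Three of these techniques can be sketched on a plain list of measurements. The helper names and the choice of averaging for coarsening are assumptions for illustration; real monitoring systems may aggregate differently (for example with min/max or percentiles).

```python
def sample_every(values, n):
    """Sampling: keep only every n-th measurement."""
    return values[::n]


def batches(values, size):
    """Batching: group measurements so they can be sent in fewer transfers."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def coarsen(values, factor):
    """Long-term coarsening: replace each group of `factor` values by its average."""
    return [sum(group) / len(group) for group in batches(values, factor)]


raw = [10, 20, 30, 40, 50, 60, 70, 80]  # hypothetical measurements
sampled = sample_every(raw, 4)   # [10, 50]
batched = batches(raw, 3)        # [[10, 20, 30], [40, 50, 60], [70, 80]]
coarse = coarsen(raw, 4)         # [25.0, 65.0]
```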
What is a log?
sequence of immutable records of discrete events
What can an event log be composed of?
- plaintext = most common format of logs
- structured = much evangelized, typically JSON
- binary = think logs in the Protobuf format
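As an illustration of the plaintext and structured formats, the same event can be written both ways. The event and its field names (`ts`, `level`, `service`, `msg`) are invented for this sketch; the point is that the structured line can be parsed back into fields without custom text parsing.

```python
import json

# hypothetical event record; field names are illustrative only
event = {
    "ts": "2024-01-01T12:00:00Z",
    "level": "ERROR",
    "service": "checkout",
    "msg": "payment failed",
}

# plaintext: human-readable, but needs a pattern to extract fields again
plaintext_line = f'{event["ts"]} {event["level"]} {event["service"]}: {event["msg"]}'

# structured: one JSON object per line
structured_line = json.dumps(event, sort_keys=True)

# a structured record round-trips back into its fields
parsed = json.loads(structured_line)
```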
What is ELK Stack?
ELK is the acronym for three open-source projects:
- Elasticsearch: search and analytics engine
- Logstash: server-side data processing pipeline
- Kibana: lets users visualize data with charts and graphs
What is Elastic Stack?
next evolution of the ELK Stack
= ELK + Beats and X-Pack
At its core is Elasticsearch: an open source, distributed, RESTful, JSON-based search engine that is easy to use, scalable and flexible.