Monitoring Flashcards
What is CloudWatch?
A platform for monitoring and observability that helps identify potential issues
What are the 2 categories of CloudWatch metrics?
Default metrics and custom (customer-provided) metrics
What is the default period for CloudWatch statistics?
60 seconds
Are there default Alarms?
No
What is the default reporting interval? What is it with detailed monitoring?
5 minutes by default; 1 minute with detailed monitoring, at added cost
What metrics are provided by default?
CPU Utilization, Network throughput
What metrics are custom?
EBS volume storage usage, Memory utilization
What does it mean for a metric to be custom?
The metric is not collected by default; a CloudWatch agent must be installed on the host to publish it
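A minimal sketch of what a custom metric looks like when it is published. Custom metrics reach CloudWatch as PutMetricData requests, and the block below only builds such a request without sending it. The function name `build_metric_datum`, the namespace `Custom/Flashcards`, and the instance ID are assumptions made up for illustration.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit, instance_id):
    """Return one entry for the MetricData list of a PutMetricData request."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


# Hypothetical instance ID and namespace, for illustration only.
datum = build_metric_datum("MemoryUtilization", 71.5, "Percent", "i-0123456789abcdef0")
request = {"Namespace": "Custom/Flashcards", "MetricData": [datum]}

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch").put_metric_data(**request)
print(request["MetricData"][0]["MetricName"], request["MetricData"][0]["Value"])
```

In practice the CloudWatch agent collects values such as memory and disk usage on the host and publishes them for you.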
What are system metrics?
These are the default, standard metrics; managed services provide more of them
What are application metrics?
Custom metrics that need a CloudWatch agent installed
What is CloudWatch Logs?
A place to centrally store, access, and query log files
What is a Log Event?
A data point of what happened with a timestamp and data
What is a Log Stream?
A collection of log events from the same source
What is a Log Group?
A collection of log streams from different sources, e.g. different EC2 instances
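The event, stream, and group hierarchy can be pictured with a small model. This is only a sketch of the relationships, not the CloudWatch Logs API; the class names, the `put` method, and the group and stream names are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LogEvent:
    timestamp_ms: int  # when it happened (epoch milliseconds)
    message: str       # the raw log line


@dataclass
class LogStream:
    name: str                                   # one source, e.g. one instance
    events: list = field(default_factory=list)  # LogEvent objects in arrival order


@dataclass
class LogGroup:
    name: str
    streams: dict = field(default_factory=dict)  # stream name -> LogStream

    def put(self, stream_name, event):
        """Append an event to its source's stream, creating the stream if needed."""
        stream = self.streams.setdefault(stream_name, LogStream(stream_name))
        stream.events.append(event)


# Hypothetical group with two sources (two instances).
group = LogGroup("/app/web")
group.put("i-aaa", LogEvent(1700000000000, "GET /health 200"))
group.put("i-aaa", LogEvent(1700000001000, "GET /login 500"))
group.put("i-bbb", LogEvent(1700000002000, "GET /health 200"))

print(len(group.streams), len(group.streams["i-aaa"].events))
```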
What do you use if you want to run SQL-like queries on CloudWatch Logs?
Use CloudWatch Logs Insights
What do you create when looking for specific terms in CloudWatch?
You create a filter pattern (used in a metric filter)
Where should logs go if the focus is on storage? If the focus is on processing?
For storage, send them to S3
For processing, send them to CloudWatch Logs
What are common sources for CloudWatch Logs?
EC2, Lambda, RDS, on-premises servers, CloudTrail
What should you use if you need realtime log processing?
Use Kinesis instead
How do you use Alarms with CloudWatch Logs?
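If a filter pattern is matched, the matches are counted as a metric, and an alarm can fire on that metric
What should you use for system/application-level monitoring? For monitoring best practices/standards?
For system/application-level monitoring, use CloudWatch
For best practices/standards, use AWS Config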
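A rough sketch of the flow from matched log events to an alarm: matching events are counted, and the count is compared to a threshold. The matching rule here is a deliberate simplification (every space-separated term must appear in the message); the real filter pattern syntax is richer. The function names and the sample log lines are assumptions for illustration.

```python
def matches(pattern, message):
    """Simplified matching: every space-separated term must appear in the message."""
    return all(term in message for term in pattern.split())


def alarm_state(messages, pattern, threshold):
    """Count matching events (the metric) and compare the count to a threshold."""
    match_count = sum(1 for m in messages if matches(pattern, m))
    return "ALARM" if match_count >= threshold else "OK"


# Made-up log lines for the example.
logs = [
    "INFO request served",
    "ERROR database timeout",
    "ERROR database refused connection",
    "WARN slow response",
]

print(alarm_state(logs, "ERROR database", threshold=2))  # two matches -> ALARM
print(alarm_state(logs, "ERROR timeout", threshold=2))   # one match -> OK
```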
What do you need installed on EC2 for CloudWatch Logs to work?
A CloudWatch Agent needs to be installed
Is CloudWatch realtime?
No, it is near realtime
What can you do to fix underlying hardware failure in EC2 using CloudWatch?
You can stop, terminate, reboot, or recover an EC2 instance as a CloudWatch alarm action; recover moves the instance to healthy hardware
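As a sketch, an alarm that recovers an instance when its system status check fails could be described with parameters like these. The block only builds the parameter dictionary for a PutMetricAlarm request. The function name, alarm name, instance ID, region, and the period and evaluation values are assumptions chosen for illustration.

```python
def recover_alarm_params(instance_id, region):
    """Build PutMetricAlarm parameters that recover an instance on a failed system status check."""
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,             # assumed: evaluate one-minute periods
        "EvaluationPeriods": 2,   # assumed: two failing periods in a row
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


# Hypothetical instance ID and region.
params = recover_alarm_params("i-0123456789abcdef0", "us-east-1")

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
print(params["AlarmActions"][0])
```

Swapping `recover` for `stop`, `terminate`, or `reboot` in the action ARN selects the other EC2 alarm actions.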
What should you watch out for between Alarms polling period and metric collection?
If metrics are collected every 5 minutes but the alarm evaluates data every 1 minute, most evaluation periods contain no data point, so the alarm keeps reporting insufficient data
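The mismatch can be checked with simple arithmetic. This sketch counts how many alarm periods in one hour contain no data point; it is a toy model of the timing only, and the function name and the one-hour window are assumptions for illustration.

```python
def count_empty_periods(collect_every_s, alarm_period_s, window_s):
    """Count alarm periods in the window that contain no data point."""
    data_times = list(range(0, window_s, collect_every_s))
    empty = 0
    for start in range(0, window_s, alarm_period_s):
        has_data = any(start <= t < start + alarm_period_s for t in data_times)
        if not has_data:
            empty += 1
    return empty


window = 3600  # one hour, in seconds

# Metrics every 5 minutes, alarm period of 1 minute.
mismatched = count_empty_periods(300, 60, window)
# Metrics every 5 minutes, alarm period of 5 minutes.
aligned = count_empty_periods(300, 300, window)

print(mismatched, "of", window // 60, "one-minute periods have no data")    # 48 of 60
print(aligned, "of", window // 300, "five-minute periods have no data")     # 0 of 12
```

Setting the alarm period to at least the collection interval avoids the empty periods.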