Monitoring. Flashcards
CloudWatch vs CloudTrail vs X-Ray
CloudWatch:
- Metrics: collect and track key metrics
- Logs: collect, monitor, analyze, store
- Events: send notifications
- Alarms: react in real time to an event/metric
XRAY
Troubleshoot app performance and error
distributed tracing of services
Cloudtrail
Internal monitoring of API calls
Audit changes to AWS resources.
Cloudwatch Metrics
Namespace
dimension
Metrics for EVERY service in AWS
Metric: a variable to monitor (e.g. CPU, network)
Metrics belong to NAMESPACES
- Namespace: container for metrics
- Dimension: metric metadata in the form of a name/value pair
- clarifies what the metric is and what data it stores
- up to 30 dimensions per metric (the limit used to be 10)
example: Dimensions: Server=Prod, Domain=Frankfurt
Metrics have timestamps
CAN CREATE DASHBOARDS OF METRICS WITH THIS INFORMATION
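A minimal sketch of how a namespace, a metric name, dimensions and a timestamp fit together. The dict follows the general shape of a PutMetricData request; the namespace `MyApp/Servers` and the helper name `build_metric_request` are made up for illustration, and nothing is sent to AWS.

```python
from datetime import datetime, timezone


def build_metric_request(namespace, name, value, unit, dimensions):
    """Return a dict shaped like a PutMetricData request (nothing is sent)."""
    return {
        "Namespace": namespace,  # container the metric belongs to
        "MetricData": [
            {
                "MetricName": name,
                # Each dimension is a name/value pair describing the metric.
                "Dimensions": [
                    {"Name": key, "Value": val} for key, val in dimensions.items()
                ],
                "Timestamp": datetime.now(timezone.utc),  # every data point is timestamped
                "Value": value,
                "Unit": unit,
            }
        ],
    }


# Hypothetical example values, reusing the dimensions from the card above.
request = build_metric_request(
    "MyApp/Servers",
    "CPUUtilization",
    42.5,
    "Percent",
    {"Server": "Prod", "Domain": "Frankfurt"},
)
print(request["MetricData"][0]["Dimensions"])
```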
Cloudwatch Metrics
DETAILED MONITORING
default ec2 metrics
Default: EC2 metrics every 5 minutes
Detailed monitoring: metrics every 1 minute
Allows faster reactions, e.g. scaling out an Auto Scaling group (ASG) sooner
- Free tier allows 10 detailed monitoring metrics
NORMAL METRICS TO BE SEEN
CPU
NETWORK
Disk I/O
Custom metrics are required for things like memory (RAM).
Cloudwatch Metrics
CUSTOM METRICS
Dimensions
Metric resolution
sending metric to cloudwatch api call
Possible to define and send your own metrics to cloudwatch
Customizable items:
- Dimensions (attributes) can be used to segment metrics:
clarifies what the metric is and what data it stores
Instance.id
Environment.name
- Metric resolution: standard resolution for custom metrics is 1 minute
- High resolution: the StorageResolution API parameter allows a resolution of up to 1 second
- API call PutMetricData: used to send metrics to CloudWatch; if calls are throttled, retry with exponential backoff
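A hedged sketch of retrying a throttled PutMetricData call with exponential backoff. To stay runnable without AWS credentials it uses a fake sender: `ThrottledError`, `send_with_backoff` and `make_flaky_sender` are hypothetical names, not part of any AWS SDK. With a real client, `send` would wrap the SDK's PutMetricData call and catch its throttling error instead.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for the throttling error a real client would raise."""


def send_with_backoff(send, payload, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send(payload), retrying throttled calls with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send(payload)
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of retries, surface the error
            # The upper bound doubles on every retry: base, 2*base, 4*base, ...
            # Picking a random delay below it (jitter) spreads retries out.
            sleep(random.uniform(0, base_delay * 2 ** attempt))


def make_flaky_sender(failures):
    """Fake sender that is throttled `failures` times, then succeeds."""
    state = {"calls": 0}

    def send(payload):
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ThrottledError()
        return "ok"

    return send, state


# High-resolution data point: StorageResolution=1 means 1-second resolution.
payload = {
    "Namespace": "MyApp/Servers",  # hypothetical namespace
    "MetricData": [{"MetricName": "QueueDepth", "Value": 17, "StorageResolution": 1}],
}
send, state = make_flaky_sender(failures=2)
result = send_with_backoff(send, payload, sleep=lambda seconds: None)  # skip real sleeping
print(result, state["calls"])
```

Passing `sleep` as a parameter keeps the retry logic testable without waiting for real delays.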
Cloudwatch ALARMS
targets
states
period
Alarms trigger notifications for any metric
Alarm targets:
Auto Scaling, EC2 actions, SNS notifications
Various statistic options: sampling, %, max, min, etc.
States
OK
INSUFFICIENT_DATA (when not enough data has been sent)
ALARM (alarm threshold breached)
Period:
Length of time (in seconds) over which the metric is evaluated
High resolution custom metrics: the period can be 10 seconds, 30 seconds, or a multiple of 60 seconds
Cloudwatch ALARMS
cloudwatch period vs evaluation period
datapoint: A datapoint is the value of a metric for a given metric aggregation period
Period: length of time over which data points are aggregated
(multiple data points collected over that time make up one period)
Evaluation period: the number of most recent alarm periods (or alarm data points) taken into account when determining whether the alarm is triggered.
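To make period vs evaluation period concrete, here is a simplified simulation, not CloudWatch's actual algorithm. It assumes the Average statistic, a "greater than threshold" comparison, and that every one of the last N periods must breach. The function names `aggregate` and `alarm_state` are made up for this sketch.

```python
def aggregate(samples, period):
    """Average raw (timestamp_seconds, value) samples into one data point per period."""
    buckets = {}
    for timestamp, value in samples:
        buckets.setdefault(timestamp // period, []).append(value)
    return [sum(values) / len(values) for _, values in sorted(buckets.items())]


def alarm_state(datapoints, threshold, evaluation_periods):
    """Return the alarm state based on the most recent `evaluation_periods` data points."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(point > threshold for point in recent) else "OK"


# Six raw CPU samples, aggregated with a 300-second (5 minute) period.
samples = [(0, 50), (60, 70), (300, 90), (360, 95), (600, 85), (660, 99)]
datapoints = aggregate(samples, period=300)  # three data points, one per period

print(alarm_state(datapoints, threshold=80, evaluation_periods=2))
print(alarm_state(datapoints, threshold=80, evaluation_periods=3))
print(alarm_state(datapoints, threshold=80, evaluation_periods=4))
```

With 2 evaluation periods only the last two data points count, so the low first period no longer keeps the alarm in OK.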
Cloudwatch LOGS overview
how it's sent
where it comes from
where does it go to and why?
Application can send logs to Cloudwatch using the SDK
Collection from:
- Elastic Beanstalk
- ECS
- Lambda
- VPC Flow Logs
- API Gateway
- CloudTrail, via filter
- CloudWatch Logs agent, on EC2 machines
- Route 53: logs DNS queries
CloudWatch Logs can be sent on to:
Batch export to S3 for archive
Elasticsearch cluster (now Amazon OpenSearch Service) for analysis
Cloudwatch LOGS basics and operation
Log Storage Architecture
Log expiration policy
CLI tools
security and permission
Cloudwatch logs: Filter Expressions: allows searching through logs
Log Storage Architecture:
Naming convention
1. Log Group: Name representing the origin app
2. Log Stream: Instances within apps/ log files /containers (the sources of data from one item)
Log expiration policy:
Logs can never expire, or expire after a set retention period (e.g. 30 days)
CLI: we can tail CloudWatch logs, so log events show up in the command line as they arrive in CloudWatch
(tailing from the command line interface of an instance gives near real-time information)
To send logs to CloudWatch we need IAM permissions!
Security: logs can be encrypted at rest with KMS at the LOG GROUP level
Cloudwatch logs for Ec2: AGENT
default what happens
what agent is used for
DEFAULT: No logs will go into cloudwatch from EC2
Need to run cloudwatch Agent!
Cloudwatch agent: Push log files you want from ec2 to cloudwatch logs
Need proper IAM permissions
The CloudWatch agent can also be set up on-premises!
DEFINITION:
Collect more system-level metrics from Amazon EC2 instances / ON-PREMISES SERVERS across operating systems. The metrics can include in-guest metrics, in addition to the metrics for EC2 instances. The additional metrics that can be collected are listed in Metrics Collected by the CloudWatch Agent.
Cloudwatch LOGS AGENT / UNIFIED AGENT
uses
differences
managed services considerations
For Virtual servers
- CloudWatch Logs agent:
Old version of the agent
Can only send to CloudWatch Logs
- CloudWatch unified agent:
- Collects additional system-level metrics: RAM, processes, etc.
- Collects logs to send to CloudWatch Logs
- Centralized configuration using SSM Parameter Store
ALLOWS CENTRALIZED CONFIGURATION OF ALL AGENTS, UNLIKE THE OLDER LOGS AGENT.
CLOUDWATCH UNIFIED AGENT
METRICS
DEFAULT WITHOUT
Default (without the agent):
CPU, network, disk I/O
Collected via UNIFIED AGENT, directly on the Linux server:
- CPU (active, guest, idle)
- Disk (free, used, total), disk IO
- RAM (free, inactive)
- Netstat (TCP, UDP connections)
- Processes (total, idle, sleep)
- Swap space (free/used)
Cloudwatch LOGS METRIC FILTER
Filtering expressions
retroactive filtering?
Filter expressions: via cloudwatch logs
Specific IP inside log
Or count occurrences of ERROR type in logs
metric filter can count and TRIGGER ALARMS
Filters are not retroactive: they only publish metric data points for log events that arrive after the filter was created!
DIAGRAM
EC2 streams logs to CW; CW Logs receives the stream
Logs are processed through the metric filter
Matching entries are counted as a metric, which can trigger an alarm
The alarm notifies SNS, which publishes a message.
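The flow above can be sketched as a small simulation. This is not how CloudWatch is implemented, only an illustration of counting matches and of filters not being retroactive. `count_matches` and the sample log events are made up, and timestamps are plain integers for readability.

```python
def count_matches(log_events, term, filter_created_at):
    """Count events containing `term`, ignoring anything ingested before the filter existed."""
    return sum(
        1
        for timestamp, message in log_events
        if timestamp >= filter_created_at and term in message
    )


# (timestamp, message) pairs; the filter is created at t=3.
log_events = [
    (1, "ERROR disk full"),      # before the filter: never counted
    (2, "INFO request served"),
    (5, "ERROR timeout"),
    (6, "ERROR timeout"),
    (7, "INFO request served"),
]

error_count = count_matches(log_events, "ERROR", filter_created_at=3)
alarm_triggered = error_count >= 2  # hypothetical alarm threshold on the filtered metric
print(error_count, alarm_triggered)
```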
CLOUDWATCH EVENTS
Scheduled
event pattern
Documentation of changes.
- Schedule: cron jobs
- Event pattern: rules that are triggered when a service acts in a specific way
EX: a CodePipeline state change can trigger a Lambda function, an SNS notification, etc.
- CloudWatch event: generates a JSON document that describes the change
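A rough sketch of how an event pattern is matched against the JSON document of an event. It only covers exact-value matching (each pattern field lists its allowed values, nested objects are matched recursively), which is a small subset of what real event patterns support. The `matches` helper and the pipeline name are assumptions for illustration.

```python
def matches(pattern, event):
    """Return True if every field in the pattern matches the event (exact values only)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested part of the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of allowed values.
            return False
    return True


# Simplified event for a CodePipeline state change (pipeline name is hypothetical).
event = {
    "source": "aws.codepipeline",
    "detail-type": "CodePipeline Pipeline Execution State Change",
    "detail": {"pipeline": "my-pipeline", "state": "FAILED"},
}

# Rule: react only to failed pipeline executions.
pattern = {"source": ["aws.codepipeline"], "detail": {"state": ["FAILED"]}}

print(matches(pattern, event))
```

Fields that the pattern does not mention (here `detail-type` and `pipeline`) are ignored, so a pattern can be as broad or as narrow as the rule needs.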
EVENT BRIDGE
Default event bus
Partner event bus
Custom event bus
cross account
Default event bus: receives events generated by AWS services (the CloudWatch Events bus)
Partner event bus: receives events from SaaS services or apps: Zendesk, Datadog, Segment, Auth0
Custom Event Bus: For your own applications, they can publish their own events and have other applications react
Can be accessed by other accounts: Cross account event bus
Rules: HOW TO PROCESS EVENTS
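A small sketch of what an application might publish to a custom event bus. The dict follows the general shape of a PutEvents entry (the detail is carried as a JSON string). The bus name, source and detail values are invented, and the code only builds the entry without calling AWS.

```python
import json


def build_event_entry(source, detail_type, detail, bus_name):
    """Return a dict shaped like one PutEvents entry (nothing is sent)."""
    return {
        "Source": source,              # who published the event
        "DetailType": detail_type,     # what kind of event it is
        "Detail": json.dumps(detail),  # payload, serialized as a JSON string
        "EventBusName": bus_name,      # which bus receives it
    }


# Hypothetical application event on a hypothetical custom bus.
entry = build_event_entry(
    source="myapp.orders",
    detail_type="OrderPlaced",
    detail={"orderId": "123", "total": 49.9},
    bus_name="my-custom-bus",
)
print(entry)
```

Rules on the same bus can then match on `Source` or `DetailType` so that other applications react to these events.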
EVENT BRIDGE
SCHEMA REGISTRY
EventBridge can analyze the events on a bus and INFER their schema, i.e. work out how the data is structured.
Schema Registry: lets you generate code for your application that knows in advance how events on the event bus are structured, so your code can work with those events safely.
Schema can be versioned
The Amazon EventBridge schema registry stores event structure - or schema - in a shared central location and maps those schemas to code for Java, Python, and TypeScript so it’s easy to use events as objects in your code. Schemas from your event bus are automatically added to the registry when you turn on the schema discovery feature. You can connect to and interact with the schema registry from the AWS console, APIs, or through the SDK Toolkits for JetBrains (IntelliJ, PyCharm, WebStorm, Rider) and VS Code.
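To illustrate what "inferring a schema" means, here is a toy sketch that derives a JSON-Schema-like structure from a sample event. It is not EventBridge's actual discovery algorithm, and `infer_schema` and the sample event are made up for this example.

```python
def infer_schema(value):
    """Derive a simple JSON-Schema-like description from a sample value."""
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(val) for key, val in value.items()},
        }
    if isinstance(value, list):
        # Use the first element as representative; empty lists give no item info.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, bool):  # check bool before numbers: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, (int, float)):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    return {"type": "null"}


# Hypothetical event detail published by an application.
sample_event = {"orderId": "123", "total": 49.9, "paid": True, "items": ["book"]}
schema = infer_schema(sample_event)
print(schema)
```

Once a structure like this is known, typed classes can be generated from it, which is the idea behind the code bindings mentioned above.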