Datadog Flashcards
Infrastructure Monitoring
Visibility into hosts, containers, Kubernetes clusters, and cloud services
Application Performance Monitoring
Tracing requests and identifying performance bottlenecks
Log Management
Collecting, filtering, and analyzing logs at scale
Security Monitoring
Detecting threats and misconfigurations
Real User Monitoring (RUM)
Monitoring the frontend experience of real users
Synthetic Monitoring
Simulating user interactions to test uptime and performance
Network Performance Monitoring (NPM)
Monitoring network traffic flows and latency
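Datadog Agent
-collector: runs checks every 15 seconds by default, gathering metrics from your hosts and applications
-forwarder: sends the collected data over HTTPS to the Datadog SaaS platform
-configurable through YAML files (datadog.yaml for the Agent, a conf.yaml per integration)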
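Besides running its own checks, the Agent accepts custom metrics that applications push to it over DogStatsD, a StatsD-compatible UDP protocol. A minimal sketch of the datagram format, assuming the Agent's default DogStatsD listener on localhost UDP port 8125; the metric names and tags are hypothetical, and real applications would normally use an official DogStatsD client library instead.

```python
from __future__ import annotations

import socket

# Assumption: the local Agent's DogStatsD server listens on its default UDP port, 8125.
DOGSTATSD_ADDR = ("127.0.0.1", 8125)


def format_metric(name: str, value: float, metric_type: str = "g",
                  tags: dict[str, str] | None = None,
                  sample_rate: float | None = None) -> str:
    """Build a DogStatsD datagram: <name>:<value>|<type>[|@<rate>][|#<k>:<v>,...]."""
    parts = [f"{name}:{value}", metric_type]  # type: g=gauge, c=count, h=histogram
    if sample_rate is not None:
        parts.append(f"@{sample_rate}")
    if tags:
        parts.append("#" + ",".join(f"{k}:{v}" for k, v in tags.items()))
    return "|".join(parts)


def send_metric(datagram: str, addr: tuple[str, int] = DOGSTATSD_ADDR) -> None:
    """Fire-and-forget UDP send; nothing confirms that an Agent received it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), addr)
```

For example, `send_metric(format_metric("checkout.queue_depth", 12, tags={"env": "staging"}))` would emit `checkout.queue_depth:12|g|#env:staging`. UDP keeps the application from blocking on the Agent, at the cost of silently dropped packets.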
Distributed Tracing
-tracing requests as they travel across microservices
-flame graphs for visualizing request paths
-service graphs for capturing dependencies between services
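Distributed tracing records each unit of work as a span that points to its parent span. The sketch below uses a hypothetical, simplified `Span` record (not Datadog's actual trace format) to show how both a flame-graph-style nesting and service-graph edges fall out of those parent links.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Span:
    span_id: int
    parent_id: int | None  # None marks the root span of the trace
    service: str
    name: str
    start_ms: float
    duration_ms: float


def render_tree(spans: list[Span]) -> list[str]:
    """Flame-graph-style text view: each child span is indented under its parent."""
    children: dict[int | None, list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: list[str] = []

    def walk(parent_id: int | None, depth: int) -> None:
        for span in sorted(children.get(parent_id, []), key=lambda c: c.start_ms):
            lines.append(f"{'  ' * depth}{span.service}:{span.name} {span.duration_ms:.0f}ms")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


def service_edges(spans: list[Span]) -> set[tuple[str, str]]:
    """Service-graph edges: (caller, callee) wherever a child span runs in another service."""
    by_id = {span.span_id: span for span in spans}
    edges: set[tuple[str, str]] = set()
    for span in spans:
        parent = by_id.get(span.parent_id) if span.parent_id is not None else None
        if parent is not None and parent.service != span.service:
            edges.add((parent.service, span.service))
    return edges
```

A span whose parent belongs to the same service (such as a database call inside `payments`) deepens the flame graph but adds no edge to the service graph.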
Alerts and Automation
-notify users when metrics exceed defined thresholds
-anomaly detection: using ML to detect deviations from a metric's expected behavior
-auto-healing: automating incident response actions
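Datadog's anomaly monitors use their own algorithms; as a much simpler stand-in, this sketch flags points that sit more than a chosen number of standard deviations from the mean of a trailing window (a rolling z-score). The window size and threshold are arbitrary assumptions.

```python
from statistics import mean, stdev


def zscore_anomalies(values: list[float], window: int = 5, threshold: float = 3.0) -> list[int]:
    """Return indices whose value deviates from the trailing-window mean by > threshold std devs."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

Note that once a spike enters the trailing window it inflates the standard deviation, which makes the following points harder to flag; production detectors handle this (and seasonality) more carefully.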
Compute Monitoring
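-CPU usage: % of CPU time spent doing work rather than sitting idle
-CPU load average: average number of processes running or waiting in the CPU run queue, reported over 1, 5, and 15 minutes (on Linux this also counts processes blocked on disk I/O)
-CPU steal: % of time a virtual CPU waits for a physical CPU because the hypervisor is serving other VMs
-CPU I/O wait time: % of time the CPU sits idle while waiting for outstanding disk I/O operations to complete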
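On Linux, the aggregate `cpu` line of `/proc/stat` holds cumulative clock-tick counters, so usage, iowait, and steal percentages come from the difference between two samples. A minimal sketch, assuming the standard column order; the trailing guest columns are ignored because the kernel already folds them into user and nice. The sample lines are made up.

```python
from __future__ import annotations

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat (cumulative clock ticks since boot)."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(x) for x in parts[1:1 + len(FIELDS)])))


def cpu_percentages(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Counters only grow, so percentages are computed from the delta between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    busy = delta["user"] + delta["nice"] + delta["system"] + delta["irq"] + delta["softirq"]
    return {
        "usage": 100.0 * busy / total,
        "idle": 100.0 * delta["idle"] / total,
        "iowait": 100.0 * delta["iowait"] / total,
        "steal": 100.0 * delta["steal"] / total,
    }
```

On a real host, read `/proc/stat` twice a few seconds apart. Load average needs no delta: Python's `os.getloadavg()` returns the 1-, 5-, and 15-minute values directly on Unix systems.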
Memory Monitoring
-RAM consumption
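-Page faults: a process accesses a page that is not currently mapped in RAM; major faults must load the data from disk, while minor faults are resolved without disk I/O
-Swap usage: disk space used as overflow “virtual memory” when RAM fills up. Because disk is far slower than RAM, frequent swapping can severely degrade performance
-Memory fragmentation: allocated blocks are larger than needed, or free memory is broken into pieces too small to use, resulting in inefficiencies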
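RAM and swap usage can be derived from Linux's `/proc/meminfo`, which lists fields in kB. A minimal sketch, assuming "used RAM" means `MemTotal - MemAvailable`; the sample values in the usage check are made up.

```python
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}, values in kB."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key.strip()] = int(rest.split()[0])
    return info


def memory_summary(info: dict[str, int]) -> dict[str, float]:
    total, available = info["MemTotal"], info["MemAvailable"]
    swap_total, swap_free = info["SwapTotal"], info["SwapFree"]
    return {
        "ram_used_pct": 100.0 * (total - available) / total,
        # Hosts without swap report SwapTotal: 0, so avoid dividing by zero.
        "swap_used_pct": 100.0 * (swap_total - swap_free) / swap_total if swap_total else 0.0,
    }
```

Page fault counters live separately in `/proc/vmstat` (`pgfault`, `pgmajfault`); like the CPU counters they are cumulative, so a fault rate needs two samples.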
Disk and Storage Monitoring
-disk usage %
-disk read/write IOPS (input/output operations per second)
-disk latency: average time to complete a read or write operation
-disk queue length: number of I/O requests waiting to be serviced
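IOPS, average latency, and average queue length can all be computed from two snapshots of the cumulative per-device counters in Linux's `/proc/diskstats`, much as `iostat` does. The sketch models only the counters it needs in a hypothetical `DiskSnapshot` record rather than parsing the raw file.

```python
from dataclasses import dataclass


@dataclass
class DiskSnapshot:
    reads: int           # reads completed
    writes: int          # writes completed
    read_ms: int         # total ms spent on reads
    write_ms: int        # total ms spent on writes
    weighted_io_ms: int  # weighted ms doing I/O (time multiplied by requests in flight)


def disk_metrics(before: DiskSnapshot, after: DiskSnapshot, interval_s: float) -> dict:
    ops = (after.reads - before.reads) + (after.writes - before.writes)
    io_ms = (after.read_ms - before.read_ms) + (after.write_ms - before.write_ms)
    return {
        "iops": ops / interval_s,
        # Average time per completed operation over the interval.
        "avg_latency_ms": io_ms / ops if ops else 0.0,
        # Weighted I/O time per ms of wall-clock time = average requests queued or in flight.
        "avg_queue_length": (after.weighted_io_ms - before.weighted_io_ms) / (interval_s * 1000),
    }
```

A rising queue length alongside flat IOPS usually means the device is saturated rather than busier.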
Network Monitoring
-latency (RTT): round-trip time for a packet to reach its destination and for the reply to come back
-bandwidth usage
-packet loss
-connections and active sessions
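Bandwidth, packet loss, and RTT statistics are simple arithmetic over counters and probe results. A minimal sketch; the byte counts, packet counts, and RTT samples in the usage check are made-up inputs, and bandwidth is reported in decimal megabits per second.

```python
from statistics import mean


def packet_loss_pct(sent: int, received: int) -> float:
    """Share of probe packets that never got a reply."""
    return 100.0 * (sent - received) / sent if sent else 0.0


def bandwidth_mbps(bytes_before: int, bytes_after: int, interval_s: float) -> float:
    """Throughput from a cumulative interface byte counter, in megabits per second."""
    return (bytes_after - bytes_before) * 8 / interval_s / 1_000_000


def rtt_summary(rtts_ms: list[float]) -> dict[str, float]:
    """Min/avg/max round-trip time, as ping-style probes usually report it."""
    return {"min": min(rtts_ms), "avg": mean(rtts_ms), "max": max(rtts_ms)}
```

Tracking max as well as average RTT matters, because a single slow hop can hide inside an acceptable mean.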
Process and Service Monitoring
-identifying which processes consume the most CPU, memory, disk, or network resources
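Finding resource-hungry processes comes down to ranking per-process samples by the resource in question. The sketch uses a hypothetical `ProcSample` record with made-up values; on a real host the samples could come from a tool such as the third-party psutil package.

```python
from dataclasses import dataclass


@dataclass
class ProcSample:
    pid: int
    name: str
    cpu_pct: float
    rss_mb: float  # resident memory


def top_consumers(samples: list, key: str, n: int = 3) -> list:
    """Return the n processes with the highest value for the given attribute."""
    return sorted(samples, key=lambda p: getattr(p, key), reverse=True)[:n]
```

Ranking by CPU and by memory separately matters, since the heaviest CPU user is often not the largest memory holder.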
Collection Methods
-Agent
-Cloud Integrations
-SNMP monitoring (Simple Network Management Protocol)
-APIs
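With the API route, metrics are submitted directly over HTTPS instead of going through the Agent. A minimal sketch that builds, but does not send, a request for the v2 series intake endpoint as we understand it. The URL assumes the US1 site, the field names and the intake type code (3 = gauge) should be checked against the current API reference, and the API key and metric are placeholders.

```python
import json
import time
import urllib.request

# Assumption: US1 site; other Datadog sites use a different host.
SERIES_URL = "https://api.datadoghq.com/api/v2/series"
GAUGE = 3  # Assumed v2 intake type codes: 0 unspecified, 1 count, 2 rate, 3 gauge


def build_series_request(api_key, metric, value, tags, timestamp=None):
    """Build a POST request carrying one gauge point; send it with urllib.request.urlopen."""
    ts = int(timestamp if timestamp is not None else time.time())
    payload = {
        "series": [{
            "metric": metric,
            "type": GAUGE,
            "points": [{"timestamp": ts, "value": value}],
            "tags": tags,
        }]
    }
    return urllib.request.Request(
        SERIES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"DD-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Keeping request construction separate from sending makes the payload easy to inspect before any API key is used.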
Visualization and Analysis
-Dashboards: CPU, disk, memory, and network usage
-Heat Maps: distribution of a metric across servers, highlighting performance outliers
-Host Maps: high-level overview of infrastructure health
-Service Maps: dependencies and bottlenecks between services
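Spotting a performance outlier across servers means comparing each host to the rest of the fleet. The sketch uses a median-absolute-deviation (MAD) test, one common and robust way to do this; the host names, values, and threshold of 3 are assumptions for illustration, not how Datadog's visualizations compute outliers.

```python
from statistics import median


def mad_outliers(by_host: dict, threshold: float = 3.0) -> list:
    """Flag hosts whose value is more than `threshold` MADs away from the fleet median."""
    values = list(by_host.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most hosts are identical: anything that differs stands out.
        return [h for h, v in by_host.items() if v != med]
    return [h for h, v in by_host.items() if abs(v - med) / mad > threshold]
```

The median and MAD are used instead of mean and standard deviation so that the outlier itself does not drag the baseline toward it.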
Alerting
-Threshold alerts
-Anomaly detection
-Event correlation
-Auto-scaling triggers
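Threshold alerts and auto-scaling triggers both compare a metric to a limit; the sketch adds a warning level below the critical one and requires several consecutive breaches before scaling, so a brief spike does not trigger it. The state names, thresholds, and breach count are hypothetical choices.

```python
from enum import Enum


class State(Enum):
    OK = "OK"
    WARN = "WARN"
    ALERT = "ALERT"


def evaluate_threshold(value: float, warning: float, critical: float) -> State:
    """Map a metric value to an alert state; critical takes precedence over warning."""
    if value >= critical:
        return State.ALERT
    if value >= warning:
        return State.WARN
    return State.OK


def should_scale_out(cpu_history: list, threshold: float = 80.0, consecutive: int = 3) -> bool:
    """Trigger only if the last `consecutive` readings all exceed the threshold."""
    recent = cpu_history[-consecutive:]
    return len(recent) == consecutive and all(v > threshold for v in recent)
```

Requiring consecutive breaches trades a little reaction time for protection against flapping, where capacity is added and removed repeatedly.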