Datadog Features Flashcards
Containers + Kubernetes
Real-time view of the health of containerized environments
IoT
agent that monitors physical devices connected to the internet
Distributed tracing
Execution path of a request
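A minimal sketch of what distributed tracing looks like in code, assuming the ddtrace Python library; the service, resource, and span names are illustrative:

```python
# Minimal sketch, assuming the ddtrace Python library; service, resource,
# and span names are illustrative.
from ddtrace import tracer

@tracer.wrap(service="checkout-service", resource="process_order")
def process_order(order_id):
    # Each call creates a span; nested calls become child spans, so the
    # full execution path of the request is recorded as one trace.
    validate(order_id)
    charge(order_id)

def validate(order_id):
    with tracer.trace("orders.validate"):
        pass  # validation logic would go here

def charge(order_id):
    with tracer.trace("orders.charge"):
        pass  # payment logic would go here

process_order(42)
```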
Continuous profiler
observe application processes and measure code performance
Deployment tracking
break down request performance by deployment/version, tied to the Continuous Profiler
Serverless monitoring
metrics, traces, and logs monitoring for a cloud model in which the cloud provider handles all server-related infrastructure and maintenance
Agent
Brings data from the server into DD
Installed at the OS level
Collects system-level metrics (RAM, storage, CPU)
APM focuses on
Monitor performance metrics of applications
Metrics:
- Requests (load)
- Errors
- Duration/Latency
ETE Tracing
End to end
connect traces to infrastructure metrics
Tracing w/o limits
ingest, search, and analyze 100% of traces
Complete traces prioritize:
- errors
- high latency
- low-volume endpoints
Traces
the WHY
record of a request through an application
shows how long each step took to execute
Logs
the WHAT
record/trail of something that happened in the stack
SIEM
Security Information and Event Management; detects security threats by analyzing logs and events
Synthetic monitoring
Proactive
simulate different types of traffic
focuses on availability and application load/response times
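A toy sketch of the synthetic-check idea (not the Datadog Synthetics product itself): proactively probe an endpoint and record availability and response time; the URL is a placeholder:

```python
# Toy illustration of a synthetic check: proactively probe an endpoint
# and record availability and response time. The URL is a placeholder.
import time
import requests

def synthetic_check(url="https://example.com/health", timeout=5):
    start = time.monotonic()
    try:
        resp = requests.get(url, timeout=timeout)
        ok = resp.status_code < 400
    except requests.RequestException:
        ok = False
    latency_ms = (time.monotonic() - start) * 1000
    return {"available": ok, "response_time_ms": round(latency_ms, 1)}

print(synthetic_check())
```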
CI/CD pipeline
catch issues before code makes it to staging and production
NPM
live network traffic between cloud & containerized environments
identify whether the network is the root cause of an application issue
NDM
live hardware network traffic between devices like routers, firewalls, switches
Auto-detection of cloud endpoints
Assesses which client-side applications are affected by degradation of the cloud storage they depend on
App-layer insight
health of the traffic between two endpoints
network metrics: TCP, latency, connection churn
DNS
Domain Name System
overview of DNS server performance
RUM
REACTIVE
focuses on end-user performance including:
- appearance
- load times
- errors
Mobile RUM
measure end-user experience on mobile applications
* Runs on EDGE
Events
text record for bigger, less frequent moments
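A minimal sketch of submitting an event through DogStatsD with the `datadog` Python package; the title, text, and tags are made up:

```python
# Minimal sketch: submit an event via DogStatsD (datadog Python package).
# Title, text, and tags are illustrative.
from datadog import initialize, statsd

initialize(statsd_host="127.0.0.1", statsd_port=8125)

# Events capture bigger, less frequent moments (e.g., a deploy) as text records.
statsd.event(
    "Deployment finished",
    "checkout-service v2.3 rolled out to prod",
    alert_type="info",
    tags=["env:prod", "service:checkout"],
)
```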
Log parsing
refining and cleaning raw data into a standardized format
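A rough sketch of the parsing idea: turning a raw access-log line into a standardized structure (the log format and field names are assumptions, not Datadog's pipeline syntax):

```python
# Sketch of log parsing: a raw access-log line refined into a standardized
# structure. The log format and field names are assumptions.
import re

RAW = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /cart HTTP/1.1" 500 1024'

PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def parse(line):
    match = PATTERN.match(line)
    return match.groupdict() if match else None

print(parse(RAW))
# {'ip': '203.0.113.7', 'timestamp': '10/Oct/2024:13:55:36 +0000',
#  'method': 'GET', 'path': '/cart', 'status': '500', 'bytes': '1024'}
```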
Log indexing
storing clean logs
Tags
key:value metadata; hosts and containers that belong together share the same tag so they can be grouped and filtered
Attributes & facets
provide additional log-related details, e.g., timestamp
Rollup
downgrading older data to a lower granularity
Monitor alert
condition-based alerts
Metric alert
when a threshold is met OR when a metric value changes over a specified period of time
Anomaly alert
when there is an irregularity compared to historical patterns
Outlier alert
when a member of a group behaves differently from the rest of the group
Forecast alert
when a metric is projected to cross a threshold in the future
Composite alert
groups multiple individual monitors into 1 main alert
Watchdog
automated detection monitoring
Postmortem notebooks
summary of MTTD & MTTR data
SLO
service level objective
focused on end-user
reliability and performance goals of a service over a specific period of time
SLI
service level indicator
focused on end-user
metrics about the service
Error budgets
measure how much unreliability is allowed before an SLO or SLA is violated
Burn rate
measure how fast an error budget is being consumed
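A worked sketch of the error-budget and burn-rate math, assuming a hypothetical 99.9% availability SLO over a 30-day window; all numbers are illustrative:

```python
# Worked sketch of error budget and burn rate, assuming a hypothetical
# 99.9% availability SLO over a 30-day window. Numbers are illustrative.
slo_target = 0.999
window_minutes = 30 * 24 * 60          # 43,200 minutes in the window

# Error budget: how much unreliability is allowed before the SLO is violated.
error_budget = (1 - slo_target) * window_minutes
print(f"Error budget: {error_budget:.1f} minutes")   # ~43.2 minutes of downtime

# Burn rate: observed error rate divided by the error rate the SLO allows.
observed_error_rate = 0.004            # 0.4% of requests failing right now
allowed_error_rate = 1 - slo_target    # 0.1% allowed by the SLO
burn_rate = observed_error_rate / allowed_error_rate
print(f"Burn rate: {burn_rate:.1f}x")  # 4x: budget is being spent 4x too fast
```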
Infrastructure list
overview of all hosts monitored by DD
Host map
visualize hosts
Infrastructure monitoring focuses
health and performance of the back-end
Metrics:
- CPU
- Memory
- Storage
Incident monitoring focuses
Get ahead of the situation before it happens
Quickly identify the issue and what it's impacting
Quickly understand how to resolve it and prevent it in the future
(all can be automated)
Metrics
numerical values
time-series data
Distribution metrics
percentile aggregations across all of a customer's hosts
DogStatsD
DD tool to easily collect custom metrics
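A minimal sketch of sending custom metrics through DogStatsD with the `datadog` Python package, assuming an Agent listening on the default 127.0.0.1:8125; metric names and tags are illustrative (the tags are the same key:value grouping idea as the Tags card):

```python
# Minimal sketch: custom metrics via DogStatsD (datadog Python package).
# Assumes an Agent on the default 127.0.0.1:8125; names/tags are illustrative.
from datadog import initialize, statsd

initialize(statsd_host="127.0.0.1", statsd_port=8125)

# Counter: how many checkouts happened, tagged so they can be grouped/filtered.
statsd.increment("shop.checkout.count", tags=["env:prod", "service:checkout"])

# Gauge: a point-in-time value.
statsd.gauge("shop.cart.size", 7, tags=["env:prod"])

# Distribution: percentile aggregations computed across all hosts.
statsd.distribution("shop.checkout.latency", 212.5, tags=["env:prod"])
```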
Security monitoring
security of infrastructure
Line graph
time-series data
Stacked areas graph
stacks multiple series to show how each contributes to a combined total over time
Bar graph
count of a certain metric
Heat map
evolution of a metric over time
Infrastructure Monitoring includes
- hosts/clouds/vm
- containers
- processes
- serverless
- IoT
What are the market expectations of an APM product?
- distributed tracing
- exception tracking
- auto-instrumentation
- code profiling
- RUM
- synthetics
Which pillar?
If something goes wrong, WHAT exactly happened?
Logs
Which pillar?
WHY is the code doing that? Is that what the code is supposed to do?
Traces
Which pillar?
Are the infrastructure and applications doing what they’re supposed to?
Metrics