Commonly Asked Demo Questions Flashcards
How big is the Datadog agent? What is the agent overhead?
In terms of resource consumption, the Datadog agent uses roughly:
Resident memory (actual RAM used): 50 MB
CPU: less than 1% of average runtime
Disk: 120 MB on Linux, 60 MB on Windows
Network: 10-50 KB of bandwidth per minute
*The stats listed above are based on an EC2 m1.large instance running for 10+ days.
For APM, resource consumption varies considerably with actual usage. At minimal usage, it is on the same order of magnitude as the numbers listed above.
Max total trace size: 10 MB
Objection: But you don’t support on-prem!
“Actually, it’s a common misconception that we don’t support on-premise use cases. Many of our customers, including one of the largest healthcare platforms, used Datadog to monitor their migration from a mostly on-prem environment to the cloud.”
Do we have to use the agent?
The agent is not necessary, but it is highly recommended
For starters, the OS metrics that you get are a very helpful baseline for your servers.
You also get more metrics (versus CloudWatch, for example)
Also, some of our integrations require the agent to collect metrics.
To enable APM, both the agent and a client tracing library for your language (e.g. Python, Ruby, Go) are required (Linux); see the sketch below
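For illustration, a minimal sketch of manual instrumentation with the Python client (ddtrace); the service and resource names here are hypothetical, and a local agent is assumed to be running:

```python
# pip install ddtrace -- minimal manual instrumentation sketch.
# Assumes a local agent is running; names here are hypothetical.
from ddtrace import tracer

# Each trace() call opens a span; nested calls become child spans.
with tracer.trace("web.request", service="demo-app", resource="GET /home"):
    with tracer.trace("db.query", service="demo-db"):
        pass  # the actual database call would go here
```

For many frameworks, no code changes are needed at all: running the app under `ddtrace-run` auto-instruments supported libraries.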
How do you install the agent with Chef/Puppet?
We have a cookbook for Chef and a module for Puppet
How does Datadog monitor containers?
We monitor containers no matter how frequently they are created or destroyed. We also monitor orchestration tools like Kubernetes, Mesos, or Amazon ECS
We have what we call Autodiscovery, which auto-discovers containers and the apps running on them. This means we can automatically detect and monitor any changes to your containerized environment and always have visibility into the apps running on them
Can you run a version of Datadog on-premise?
The agent is what you would install on-premise to collect metrics and events from your local environment and send them to your Datadog account
I’m concerned about security issues related to sending data to an external 3rd party
The data collected by default is very benign, and customers can audit our agent's code via our GitHub account to verify everything it does.
I’m concerned about security issues related to having more places of connections from my infrastructure to the internet.
You can set up the agent to report metrics via a proxy to reduce the number of hosts that actually have a connection to the internet. Our backend hosts all the data, and the Datadog UI queries it.
How do you transmit information over the internet?
All data transmitted between Datadog and Datadog users is protected using Transport Layer Security (TLS) and HTTP Strict Transport Security (HSTS).
How is data encrypted in-flight/at rest?
Data is encrypted at rest using a secure symmetric cipher (AES).
How is the data stored?
Our data is stored entirely on AWS, and we rely on AWS security for storage. If they ask whether we're multi-region, the answer is no; we're in US East.
What security certifications do you have?
CSA STAR and AICPA SOC
What metrics are collected for each integration? How are they decided?
Our development team analyzes which metrics are noteworthy to collect, and then builds them into each integration. We list which metrics are collected for each integration on our Docs page.
How can we get custom metrics in?
We have a StatsD handler called DogStatsD as part of the agent if you are using StatsD. We also have a RESTful API; see the sketch below
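As a minimal sketch of the DogStatsD path using the official `datadog` Python library (the metric names and tags are hypothetical, and the agent's DogStatsD listener is assumed to be on its default port):

```python
# pip install datadog -- sends custom metrics to the agent's
# DogStatsD listener (default 127.0.0.1:8125) over UDP.
# Metric names and tags here are hypothetical.
from datadog import initialize, statsd

initialize(statsd_host="127.0.0.1", statsd_port=8125)

statsd.increment("demo.page.views")                      # a counter
statsd.gauge("demo.queue.depth", 42, tags=["env:demo"])  # a gauge
```

Because this is fire-and-forget UDP in the StatsD wire format (extended with tags), sending does not block the application.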
How many custom metrics can you take?
We can take an unlimited number of custom metrics. Included in our Pro pricing is 100 custom metrics per host; if you will have more, we work out pricing based on the additional volume. However, large volumes of custom metrics can typically be cut down by consolidating metric names with tags (see the sketch below). Pricing for Lambda falls under custom metrics pricing
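To illustrate the tag consolidation point (hypothetical names throughout): rather than encoding a variable into the metric name, which creates one custom metric per value, keep a single metric name and move the variable into a tag:

```python
# Hypothetical example of consolidating custom metrics with tags.
from datadog import statsd

# Anti-pattern: one custom metric per user.
statsd.increment("login.count.user_alice")
statsd.increment("login.count.user_bob")

# Better: one metric name, with the variable part as a tag.
statsd.increment("login.count", tags=["user:alice"])
statsd.increment("login.count", tags=["user:bob"])
```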
Can you collect and show logs in Datadog?
Yes, Datadog’s log management tool can also show logs from other integrated tools in the event stream, or overlay them onto graphs with metrics.
Does Datadog integrate with Splunk?
Yes. However, logs from other sources are shown in the ‘Event Stream,’ and are not available in the ‘Log Explorer.’ The Log Explorer has many features for searching, visualization, and correlation with infrastructure or APM that aren’t available with logs that come from other sources.
How is the data parsed from logs determined to be a metric or event?
If you're parsing integer/float values, it's a metric; if it's text, it's an event (see the sketch below). Metrics are considered custom when you parse them from your own logs or other log management tools.
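A minimal sketch of that rule, independent of any agent-specific parsing API (the field names are hypothetical):

```python
# Numeric values parsed from a log line become metric points;
# free text becomes an event. Field names are hypothetical.
import time

def parse_line(line: str):
    key, _, value = line.partition("=")
    try:
        # e.g. "response_time=0.42" -> a (custom) metric point
        return {"type": "metric", "name": key,
                "points": [(time.time(), float(value))]}
    except ValueError:
        # e.g. "deploy finished on web-1" -> an event
        return {"type": "event", "text": line}

print(parse_line("response_time=0.42"))
print(parse_line("deploy finished on web-1"))
```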
How are you different from AWS CloudWatch?
Datadog gets much more granular (down to 1 second) and holds that granularity for the full retention period. CloudWatch starts at 1-minute granularity, rolls up to 5 minutes after 15 days, and rolls up to 1 hour for 15 months.
We are also much more robust for alerting and graphing based on our ability to aggregate data from multiple servers, and the ability to “slice and dice” this data based on our tagging.
Datadog provides 75-120 system-level metrics, while CloudWatch provides around 13
Lastly, AWS CloudWatch is only one source for us. We will take multiple sources of data and allow for correlation analysis between metrics and events.
What is the flamegraph?
The flamegraph in Datadog represents a trace: one request, made up of spans. Each span represents a unit of work within that request, such as a call to one of the services involved
What is a trace?
A trace is a visualization of the communication across different services to process a request and the time spent by each service
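A sketch of how nested spans produce the flamegraph, using the Python tracing client (service and span names are hypothetical):

```python
# Each with-block opens a span; nesting produces parent/child bars
# in the flamegraph, and bar width corresponds to time spent.
from ddtrace import tracer

with tracer.trace("web.request", service="frontend"):   # top-level bar
    with tracer.trace("auth.check", service="auth"):    # child bar
        pass
    with tracer.trace("db.query", service="postgres"):  # sibling child bar
        pass
```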
What is the data retention time frame? Can it be extended?
We retain data at full granularity for 15 months (infrastructure and APM metrics), and can easily extend that as needed.
How do you sample traces?
We sample both fast (performing well) and slow (performing poorly) traces using signature-based sampling, which also drops traces that are too similar to each other; see the conceptual sketch below
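A conceptual sketch of the idea only (this is not Datadog's actual implementation): traces sharing a "signature" are near-duplicates, so only a few representatives per signature need to be kept:

```python
# Conceptual sketch of signature-based sampling -- illustrative only,
# not Datadog's implementation. The trace/span shapes are hypothetical.
seen = {}
KEEP_PER_SIGNATURE = 3

def should_keep(trace) -> bool:
    # A signature summarizes what the trace did: which services and
    # resources were hit, and whether anything errored.
    sig = tuple(sorted((s.service, s.resource, s.error)
                       for s in trace.spans))
    seen[sig] = seen.get(sig, 0) + 1
    return seen[sig] <= KEEP_PER_SIGNATURE
```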
What do custom metrics look like inside of Datadog?
They look exactly the same as all of the other metrics. If you can "increment" it, you can send it to Datadog. A metric is a value plus a timestamp, so it can be plotted (see the sketch below)
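A minimal sketch of that value-plus-timestamp shape, submitted through the REST API's v1 series endpoint (the metric name is hypothetical, and DD_API_KEY is assumed to be set in the environment):

```python
# Submit one point -- a (timestamp, value) pair -- via the REST API.
# Metric name is hypothetical; assumes DD_API_KEY is set.
import json, os, time, urllib.request

payload = {"series": [{
    "metric": "demo.temperature",
    "points": [[int(time.time()), 21.5]],  # (timestamp, value)
    "tags": ["env:demo"],
}]}

req = urllib.request.Request(
    "https://api.datadoghq.com/api/v1/series",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json",
             "DD-API-KEY": os.environ["DD_API_KEY"]},
)
urllib.request.urlopen(req)
```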
Can you send alerts via email, SMS, ticketing systems?
Yes. Alerts can be sent via email or to alerting systems like PagerDuty, OpsGenie, and VictorOps. Alerts can also be sent via SMS and to ticketing systems like JIRA or Zendesk using our integrations
How often does Datadog collect data?
Datadog collects data from the agent at 10-to-20-second intervals, so about every 15 seconds.
The agent's collector runs every check sequentially and then sleeps for 15 seconds, so depending on the duration of the checks, each cycle takes 15 seconds or more.
DogStatsD works differently: it receives metrics (rather than collecting them like the collector does) and flushes them every 10 seconds. A sketch of both loops follows.
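An illustrative sketch of the two models (not agent source code; `inbox.drain()` and `flush()` are hypothetical stand-ins):

```python
# Illustrative contrast between the collector and DogStatsD loops.
import time

def collector_loop(checks):
    while True:
        for check in checks:  # checks run sequentially...
            check()
        time.sleep(15)        # ...then sleep, so a slow check stretches the cycle

def dogstatsd_loop(inbox, flush):
    buffer = []
    while True:
        buffer.extend(inbox.drain())  # passively receives metrics
        time.sleep(10)
        flush(buffer)                 # flushes aggregated values every 10s
        buffer.clear()
```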
What’s the level of granularity that data is stored?
We're capable of storing data at 1-second granularity, although the default is 15 seconds.
What languages does Datadog support?
Ruby, Java, Python, Go, C#/.NET, plus others through DogStatsD & API
Is the product API enabled?
Nearly everything in the UI is also available through the API; see the sketch below
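For example, a monitor you could build in the UI can just as well be created programmatically; a sketch with the `datadog` Python library (the query, name, and message are hypothetical, and API/app keys are assumed to be in the environment):

```python
# Create a monitor via the API instead of the UI.
# Query, name, and message are hypothetical; assumes DD keys are set.
import os
from datadog import initialize, api

initialize(api_key=os.environ["DD_API_KEY"],
           app_key=os.environ["DD_APP_KEY"])

api.Monitor.create(
    type="metric alert",
    query="avg(last_5m):avg:system.cpu.user{env:demo} > 90",
    name="High CPU on demo hosts",
    message="CPU above 90% on {{host.name}} @pagerduty",
)
```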
What languages are supported for APM?
Java, Python, Ruby, Go, Node.js and popular frameworks/libraries for those languages
What Database/other integrations are supported for APM?
It depends on the language and we’re always releasing new integrations. Which integrations are we looking for?
Do you have “fill-in-the-blank” integration?
We're always releasing new integrations, and we also offer community-contributed integrations for tools we haven't built yet. Let's pull up the most up-to-date list. Which integrations are we looking for?
What is “Event metadata”?
Event metadata is data that provides information about the event's main content, such as any tags associated with the event, the integration it came from, the date and time of creation, and the user who created it. The metadata we collect for each event depends on what's available from the integration.
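These metadata fields map directly onto the event API; a sketch using the `datadog` Python library (values are hypothetical, and keys are assumed to be initialized as in the monitor example above):

```python
# Post an event whose metadata (tags, originating integration) is
# populated explicitly. Values are hypothetical; assumes initialize()
# was called with API/app keys as in the earlier example.
from datadog import api

api.Event.create(
    title="Deploy finished",
    text="checkout-service v2.3 deployed to prod",
    tags=["service:checkout", "env:prod"],  # metadata: tags
    source_type_name="jenkins",             # metadata: originating integration
)
```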
What are tags?
Tags are labels applied to metrics and hosts and can be used for filtering and grouping in graphs and alerts. They are key:value pairs; see the sketch below
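A sketch of tags in action: the same key:value tags attached when metrics are sent drive both filtering and grouping in queries (assumes keys are initialized as in the earlier examples; the query is hypothetical):

```python
# Query a metric, filtering by one tag and grouping by another.
# Assumes initialize() was called with API/app keys.
import time
from datadog import api

now = int(time.time())
api.Metric.query(
    start=now - 3600,
    end=now,
    query="avg:system.cpu.user{env:prod} by {host}",  # filter by {env:prod}, group by {host}
)
```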