Infra questions Flashcards by Jason Cusolito

How long do we store metrics for?

We retain all of our metrics for 15 months

Can I automate the build of dashboards and monitors?

Yes - you can use our API alongside tools like Terraform to automate builds.
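
A minimal sketch of what creating a monitor through the API might look like in Python. The endpoint and the DD-API-KEY / DD-APPLICATION-KEY headers follow Datadog's public v1 monitors API as I understand it; the monitor name, query, tags, and helper function names are illustrative assumptions, not values from this deck. The request is only prepared, not sent.

```python
import json
import urllib.request

# Assumed US site endpoint; other Datadog sites use a different hostname.
MONITOR_URL = "https://api.datadoghq.com/api/v1/monitor"


def build_monitor_payload(name, query, message, tags=None):
    """Return the JSON-serializable body for a metric alert monitor."""
    return {
        "name": name,
        "type": "metric alert",
        "query": query,
        "message": message,
        "tags": list(tags or []),
    }


def build_request(payload, api_key, app_key):
    """Prepare (but do not send) the POST request that would create the monitor."""
    return urllib.request.Request(
        MONITOR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )


# Hypothetical monitor: alert when average user CPU exceeds 90% over 5 minutes.
payload = build_monitor_payload(
    name="High CPU (example)",
    query="avg(last_5m):avg:system.cpu.user{*} > 90",
    message="CPU is high on {{host.name}}",
    tags=["team:infra"],
)
request = build_request(payload, api_key="<API_KEY>", app_key="<APP_KEY>")
# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects.
```

Keeping the payload builder separate from the HTTP call makes it easy to generate many monitors from a list of services and review them before anything is created.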

Do you deploy the agent as a sidecar for my Kubernetes Cluster?

No - it’s deployed as a DaemonSet, so there is one agent per host/node.

How often do we receive metrics from API crawlers?

For cloud - around 10 minutes, which is the same interval you’d experience in cloud native tools.

Do we need the Agent? Or can we just configure through the API?

Since the APIs collect metrics on a 10-15 minute interval, you should install the Agent to get more granular and quicker metrics, because the Agent collects every 15 seconds. Why would you NOT want the Agent?
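
To make the granularity difference concrete, here is a small arithmetic sketch in Python using only the intervals quoted above (15 seconds for the Agent, 10-15 minutes for the API crawlers). The function name is hypothetical.

```python
def points_per_hour(interval_seconds):
    """Number of data points collected per hour at a fixed interval."""
    return 3600 // interval_seconds


agent = points_per_hour(15)               # Agent: every 15 seconds
crawler_fast = points_per_hour(10 * 60)   # API crawler: every 10 minutes
crawler_slow = points_per_hour(15 * 60)   # API crawler: every 15 minutes

print(agent, crawler_fast, crawler_slow)  # 240 6 4
```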

Where is my data stored?

We are fully hosted in the cloud, across multiple cloud providers (AWS, Azure, GCP), including in the EU, though mostly in the USA.

Do you offer on-prem hosting?

No.

How could I get more advanced metrics from RDS?

Deploy the agent to an EC2 instance in the same security group as the database - you can then hook into RDS like a regular DB.

Can I automate setting up multiple AWS accounts rather than manually in the UI?

Yes, using Terraform, the API, or CloudFormation.

Any community dashboards?

No, but we have all types of dashboards, and it is very easy to customize your own.

Can I convert Grafana dashboards into DD so I don't have to recreate them?

Can I trigger changes on my infrastructure when an alert goes off?

Not directly, but we can trigger a webhook, which lets a client run a script in response. We also have AWS Automation and Kubernetes to autoscale.
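
A rough sketch of the client side of that webhook flow, using only the Python standard library. The payload field names ("transition", "tags") and the action names are hypothetical; the real payload depends on how the webhook is configured, so treat this as an illustration rather than a reference receiver.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def choose_action(alert):
    """Map an alert payload (hypothetical fields) to a remediation action name."""
    if alert.get("transition") != "Triggered":
        return "noop"  # ignore recoveries and other transitions
    if "high-cpu" in alert.get("tags", []):
        return "scale_out"
    return "notify_oncall"


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body posted by the webhook.
        length = int(self.headers.get("Content-Length", 0))
        alert = json.loads(self.rfile.read(length) or b"{}")
        action = choose_action(alert)
        # A real handler would invoke a script or automation here.
        body = json.dumps({"action": action}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the receiver; call this from a script to listen for webhooks."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

Keeping the decision logic in a plain function makes it testable without starting a server.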

Will I see Watchdog alerts right away?

No, it can take anywhere from 2 to 6 weeks. Watchdog needs to collect data before it can start making assumptions. The more data it has, the better.

Can I silence alerts periodically?

Yes, we can easily manage downtime in monitors.

How long does it take to set up Communication integrations?

With any supported tech stack, very quickly, but it can depend.

What if I have to completely separate my data based on teams?

We could spin up multi-org environments, or separate DD instances.

Is RBAC supported?

Yes, for logs and dashboards

Do you monitor NetFlow (e.g. monitoring the flow between devices such as routers/firewalls)?

Yes, in public beta

Can you support hybrid environments?

Yes

Support serverless?

Yes - there's a Serverless page under Infrastructure that shows AWS Lambda functions and AWS serverless.

How often do live containers get updated?

2 second buffer period

What's the agent overhead?

It depends: roughly 0.08%-3% CPU.

If a client is running AWS ECS on Fargate, do they deploy the agent as a sidecar using Docker within the task definition?

Yes

Do I have to use Helm to deploy the agent in a Kubernetes environment?

Nope. We also provide YAML manifests for required resources

Can I see container health information beyond the 36 hours available in Live view?

Yes. Docker, Kubernetes, checks/integrations retain container info for the standard 15 months.

My K8s is ephemeral

Datadog can provide insights into the entire lifecycle of pods, including observing when they terminate. We can also capture real-time data on the CPU and memory utilization of pods.

My servers are ephemeral

You can use a metric monitor on the system uptime metric to address this. This metric is an ever-increasing timer that resets to 0 when the host starts. You can use the diff() function with the metric to distinguish between a new server and a rebooted server.
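
The idea behind using diff() on an ever-increasing uptime timer can be sketched in plain Python. This is a local simulation, not Datadog's implementation: the diff helper mimics a point-to-point difference, and treating a host with fewer than two samples as "new" is an assumption made for illustration.

```python
def diff(series):
    """Difference between each point and the previous one."""
    return [b - a for a, b in zip(series, series[1:])]


def classify_host(uptime_samples):
    """Classify a host from its uptime samples (in seconds).

    - "new": not enough history to compute a diff (assumption for this sketch)
    - "rebooted": the timer dropped, so at least one diff is negative
    - "running": the timer only ever increased
    """
    if len(uptime_samples) < 2:
        return "new"
    if any(d < 0 for d in diff(uptime_samples)):
        return "rebooted"
    return "running"


print(diff([100, 160, 20]))                # [60, -140]
print(classify_host([100, 160, 20, 80]))   # rebooted
print(classify_host([30]))                 # new
print(classify_host([100, 160, 220]))      # running
```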

potential infrastructure questions