AZ-400 Cloud Guru Notes Flashcards

1
Q

What is reliability?

A

Access app when & how expected

  • Availability
  • Performance
  • Latency
  • Security
2
Q

How to measure reliability?

A
  • Determine acceptable uptimes
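A quick way to make "acceptable uptime" concrete is to turn the percentage into a downtime allowance. The sketch below is only arithmetic; the function name and the 30-day period are assumptions for illustration, not anything defined by Azure.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted over the period for a given uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# Compare a few common targets over a 30-day period
for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min per 30 days")
```

Each extra "nine" cuts the allowance by a factor of ten, which is why the target should be agreed before anything is measured.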
3
Q

Define SLA, SLO and SLI

A

Service Level Agreement - agreement with the customer around availability

Service Level Objective - target the service aims to hit in order to meet the agreement

Service Level Indicator - metric measured to show whether the objective is being met
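
How the three fit together can be sketched with request counts: the SLI is what was measured, the SLO is the target it is compared against. The names `SloReport` and `evaluate_slo`, and the idea of reporting remaining error budget, are illustrative assumptions rather than part of the course notes.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # measured availability, as a fraction
    slo: float               # target availability, as a fraction
    budget_remaining: float  # share of allowed failures still unspent (negative if overspent)
    met: bool


def evaluate_slo(good_requests: int, total_requests: int, slo: float) -> SloReport:
    """Compare a measured success ratio (the SLI) against a target (the SLO)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    sli = good_requests / total_requests
    allowed_failures = (1 - slo) * total_requests
    actual_failures = total_requests - good_requests
    budget_remaining = 1 - actual_failures / allowed_failures
    return SloReport(sli, slo, budget_remaining, sli >= slo)


# Made-up numbers: 5 failures out of 10,000 requests against a 99.9% objective
print(evaluate_slo(9_995, 10_000, 0.999))
```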

4
Q

What is Site Reliability Engineering?

A

Engineers work across the full stack, so the same developers who wrote the code fix its issues and automate performance checks

5
Q

Key concepts of SRE

A

Feedback - loops for continuous improvement
Measure everything - collect data on how everything is working
Alerts - use data to create alerts
Automation - automate as much as possible to reduce alerts
Small changes - fixes little and often
Risk - embrace risk to learn from problems

6
Q

Azure Monitor

A

Central location to gather info from app, infrastructure and Azure resources

7
Q

Azure Monitor collects?

A

Metrics - Data that measures how system is performing at a certain time
Logs - Messages about certain events in a system

8
Q

Metrics usage

A

Visualise data in Metric explorer, save to dashboard/workbook
Analyse performance
Alert if resource reaches threshold
Automate action based on metrics e.g. autoscaling
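
The "automate action" idea can be pictured as a rule that maps a metric reading to an instance count. This is a plain-Python sketch of the logic only; the function name, the CPU thresholds and the instance limits are all assumed values, and real autoscale rules are configured in Azure rather than written like this.

```python
def desired_instances(current: int, avg_cpu_percent: float,
                      scale_out_above: float = 70.0, scale_in_below: float = 30.0,
                      minimum: int = 1, maximum: int = 10) -> int:
    """Add or remove one instance based on average CPU, within fixed limits."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1   # metric above the upper threshold: scale out
    elif avg_cpu_percent < scale_in_below:
        target = current - 1   # metric below the lower threshold: scale in
    else:
        target = current       # inside the comfortable range: no change
    return max(minimum, min(maximum, target))


print(desired_instances(current=2, avg_cpu_percent=85.0))
```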

9
Q

What can app insights provide?

A
  • App exceptions
  • App dependencies
  • Performance information
10
Q

URL ping test

A

Send HTTP req to specific URL to test availability
Lets you know duration and success rate
Enabling retries is recommended
Can have multiple test locations
Can have alerts
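
What a ping test does can be approximated locally. The sketch below sends a request, retries on failure, and reports success and duration; it is not how Application Insights implements the test, and `ping`, `http_status` and the injectable `fetch` parameter are names made up for this example.

```python
import time
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Default fetcher: GET the URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def ping(url: str, retries: int = 2, fetch: Callable[[str], int] = http_status) -> dict:
    """Try the URL up to 1 + retries times; report success, attempts and duration."""
    start = time.perf_counter()
    attempts = 0
    ok = False
    for _ in range(1 + retries):
        attempts += 1
        try:
            ok = fetch(url) == 200
        except OSError:  # urllib's URLError/HTTPError and timeouts derive from OSError
            ok = False
        if ok:
            break
    return {"url": url, "ok": ok, "attempts": attempts,
            "seconds": time.perf_counter() - start}


# Usage (needs network access): ping("https://example.com/")
```

Passing `fetch` in makes the retry logic easy to exercise without a network, for example with a function that fails twice and then returns 200.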

11
Q

What is an action group?

A

Tells you who to notify and how

12
Q

What is failure mode analysis?

A

Find single points of failure
Rate risk and severity
Determine response - respond and recover
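
The three steps above can be sketched as a small ranking exercise. The 1 to 5 scales, the likelihood-times-severity score and the example entries are assumptions made up for illustration; the notes do not prescribe a scoring formula.

```python
from dataclasses import dataclass


@dataclass
class FailurePoint:
    component: str
    likelihood: int  # 1 (rare) to 5 (frequent); hypothetical scale
    severity: int    # 1 (minor) to 5 (full outage); hypothetical scale
    response: str    # planned respond-and-recover action

    @property
    def score(self) -> int:
        # Simple risk score: likelihood multiplied by severity
        return self.likelihood * self.severity


def prioritise(points: list[FailurePoint]) -> list[FailurePoint]:
    """Return failure points ordered from highest to lowest risk score."""
    return sorted(points, key=lambda p: p.score, reverse=True)


# Made-up example entries
points = [
    FailurePoint("single database", 2, 5, "fail over to replica"),
    FailurePoint("image cache", 4, 1, "serve uncached images"),
    FailurePoint("auth service", 3, 4, "retry, then show maintenance page"),
]
for p in prioritise(points):
    print(p.score, p.component, "->", p.response)
```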

13
Q

What are fault points and fault modes?

A

Fault points - any place where architecture can fail
Fault modes - all the ways a fault point can fail

14
Q

What is a fault domain?

A

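A group of hardware sharing a common power source and network switch (roughly, a rack in a datacentre)
Spread resources across separate fault domains so one rack failure does not take everything down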

15
Q

What is load testing vs stress testing?

A

Load - normal to heavy load tests
Stress - upper limit tests

16
Q

What is a baseline?

A

Defines a healthy state - gives a starting point for application health checks
Makes it easier to determine effects of adding or changing things
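
As a rough illustration of using a baseline to judge a change, the sketch below summarises response-time samples and flags a regression. The choice of mean and 95th percentile, the 10% tolerance and the function names are assumptions, not something the course specifies.

```python
from statistics import mean, quantiles


def summarise(samples_ms: list[float]) -> dict:
    """Mean and 95th percentile of response-time samples (needs at least 2 samples)."""
    p95 = quantiles(samples_ms, n=20)[-1]  # last of the 19 cut points is the 95th percentile
    return {"mean": mean(samples_ms), "p95": p95}


def regression(baseline: dict, current: dict, tolerance: float = 0.10) -> bool:
    """True if the current p95 is more than `tolerance` worse than the baseline p95."""
    return current["p95"] > baseline["p95"] * (1 + tolerance)


# Made-up samples: before and after a deployment
before = summarise([100.0] * 20)
after = summarise([130.0] * 20)
print(regression(before, after))
```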

17
Q

How are baselines created?

A
  • Log analytics and metrics explorer - analyze data
  • Azure Monitor Insights - provides recommended metrics & dashboards for Azure services
  • App Insights - provides the same as Azure Monitor Insights, but for the app itself
18
Q

Dynamic threshold alert advantages

A
  • Machine learning picks up expected behavior against metrics
  • Don’t have to manually figure out thresholds
  • Can set it and forget it for rules
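To picture what "learning expected behavior" means, here is a deliberately simple stand-in: a band of mean plus or minus a few standard deviations computed from history. This is not the algorithm Azure Monitor uses; `learn_band`, `is_anomalous` and the `sensitivity` parameter are hypothetical names for this sketch.

```python
from statistics import mean, stdev


def learn_band(history: list[float], sensitivity: float = 3.0) -> tuple[float, float]:
    """Return (lower, upper) bounds as mean +/- sensitivity * sample standard deviation."""
    centre = mean(history)
    spread = stdev(history)
    return centre - sensitivity * spread, centre + sensitivity * spread


def is_anomalous(value: float, band: tuple[float, float]) -> bool:
    """True if the value falls outside the learned band."""
    lower, upper = band
    return value < lower or value > upper


# Made-up CPU readings used as history
band = learn_band([50, 52, 48, 51, 49, 50, 53, 47])
print(band, is_anomalous(90, band), is_anomalous(55, band))
```

The point of the comparison with a static threshold is that the band moves with the data, so nobody has to pick the number by hand.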
19
Q

App insights smart detection

A

Uses machine learning to detect anomalies with telemetry data
Built in - configured automatically with enough time and data
Creates alert independently using smart detection findings

20
Q

What can smart detection help with?

A

Failures/failure rates - works out the expected failure rate, monitors continuously and provides context. Needs a minimum amount of data and 24 hours of telemetry to build its baseline

Performance - analyses page response times once a day to avoid false positives. Needs a minimum amount of data and 8 days of telemetry to build its baseline

21
Q

What are dependencies?

A

Calls the app makes to other components - HTTP, database or file system calls
Internal vs external - use of libraries (internal), use of APIs (external)
Strong vs weak - strong can cause breaking failure, weak will still run but with limited features
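
The strong/weak distinction shows up in how a failure is handled. In this sketch the order lookup is treated as a strong dependency and recommendations as a weak one; the function and parameter names are invented for the example.

```python
from typing import Callable


def load_page(fetch_order: Callable[[], dict],
              fetch_recommendations: Callable[[], list]) -> dict:
    """Order data is a strong dependency; recommendations are a weak one."""
    order = fetch_order()  # strong: any exception propagates and the page fails
    try:
        recommendations = fetch_recommendations()
    except Exception:
        recommendations = []  # weak: degrade to an empty list and carry on
    return {"order": order, "recommendations": recommendations}


def broken_recommendations() -> list:
    raise ConnectionError("recommendation service unavailable")


# The page still loads, just with a limited feature set
print(load_page(lambda: {"id": 42}, broken_recommendations))
```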

22
Q

App insights dependency tracking on .NET

A

Tracking configured by default on .NET and .NET Core
Some need manual configuration

23
Q

What are app insights dependency data?

A

App map - of dependencies
Transaction diagnostics - track as they pass through different systems
Browsers - see calls
Log analytics - create custom queries against dep data

24
Q

App dependencies on VMs

A

Enable dependency agent on VM

Processes - shows dependencies between servers and processes, inbound/outbound connection latency, TCP ports
Views - different scopes: from a VM, a scale set, or Azure Monitor

25
Q

How often are performance alerts analysed?

A

Once a day in case of false positives (i.e. expected spikes)

26
Q

What is TFVC?

A

Team Foundation Version Control
Centralised - developers typically have only one version of each file on their machine, with history kept on the server, unlike Git where every clone holds the full history

27
Q

What level does Azure Repos work at?

A

Project level
Supports Git and TFVC
Optional component for Azure Pipelines - other repos/version control systems can be used

28
Q

What is a submodule?

A

Add third-party libraries as an external resource that can still be integrated

29
Q

What is scalar?

A

Git projects struggle to scale at larger sizes
Scalar helps reduce data transfer and command runtime, and helps index and organise the relevant files

30
Q
A