Making decisions: Log Diagnostics Flashcards

Question 1

Q

Situation

Answer

A

Teams weren’t tapping into the diagnostic potential of our log aggregation tool.

Question 2

Q

Task

Answer

A

Investigate the barriers to entry.

Question 3

Q

Action - Phase 1 (research)

Answer

A

Conducted user research to discover how users used our log aggregation tool and the querying tools (Athena and QuickSight).
Discovered developers knew how to use the tools but felt the querying tools were too slow.
Based on this research, pursued a spike of debugging UI tools.

Question 4

Q

Action - Phase 2 (spike)

Answer

A

• Optimised for fast feedback loop and set-up speed.
• Had four options:
1. CloudWatchLogs Agent → CloudWatch → Lambda → AWS ES → Kibana
2. Flume Agent (Log Extractor) → Elasticsearch → Kibana
3. Flume Agent (Log Extractor) → S3 → Lambda → AWS Elasticsearch → Kibana
4. Build an ELK stack from scratch on a subset of data.
• Spiked option 3 because options 1 and 2 had version conflicts, and option 4 did not optimise for fast feedback and set-up speed.
• Implemented the spike and conducted follow-up user research on how developers interacted with Kibana.
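As an illustration of the Lambda step in option 3 (S3 → Lambda → AWS Elasticsearch), here is a minimal sketch in Python. It is not the code from the spike: the function names, the index name, and the document fields (`message`, `source`) are assumptions made for the example. It only shows how an S3 event notification could be read and how raw log lines could be turned into a payload for Elasticsearch's `_bulk` API. Fetching the object from S3 and sending the request are left out, and older Elasticsearch versions may also need a `_type` in the action line.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def build_bulk_body(lines, index_name, source):
    """Turn raw log lines into an Elasticsearch _bulk NDJSON payload.

    index_name and the document fields are hypothetical choices for this sketch.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than indexing empty documents
        # Each document is preceded by an action line naming the target index.
        parts.append(json.dumps({"index": {"_index": index_name}}))
        parts.append(json.dumps({"message": line, "source": source}))
    if not parts:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(parts) + "\n"
```

Keeping the parsing and payload-building as pure functions means they can be checked locally without AWS access, which fits the spike's goal of a fast feedback loop.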

Question 5

Q

Result

Answer

A

Discovered that the lack of standardised logging across teams hindered log fidelity.
Developers confirmed they would keep using ssh/grep until the log diagnostic tool was easier than ssh/grep.
Proposed work to standardise logging on shared infrastructure.
Stopped work on delivering an ELK-stack-based set of features.
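To make "standardised logging" concrete, here is one possible shape for it: a shared formatter that makes every service emit the same JSON fields. This is a hedged sketch using Python's standard `logging` module. The field names (`timestamp`, `level`, `service`, `logger`, `message`) and the helper `get_logger` are assumptions for illustration, not the format that was actually proposed.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render every log record as one JSON object with a fixed set of fields."""

    def __init__(self, service):
        super().__init__()
        self.service = service  # hypothetical field identifying the emitting team/service

    def format(self, record):
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(service, stream=None):
    """Return a logger that writes standardised JSON lines (stderr by default)."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    if not logger.handlers:
        handler = logging.StreamHandler(stream)
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
    return logger
```

With a shared shape like this, a log aggregation tool can index the same fields for every team, and the output stays one line per event, so it remains usable with ssh/grep.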

Question 6

Q

Reflection

Answer

A

Strengths: user research, focus on fast feedback, pros/cons analysis.
Differently: would have explored other log diagnostic tooling besides AWS Elasticsearch.
Would have pushed harder to continue work on log standardisation.
Would have created a milestone system for log diagnostics.

(6 cards)