Troubleshooting Flashcards

Question 1

Q

Users are complaining about slow response times for a critical application. Walk us through your approach to diagnosing the source of latency.

Answer

A

Gather Initial Information (patterns, changes, consistency)
Define Performance Baseline
Network Analysis (packet loss, bandwidth utilisation; use ping or traceroute)
Server Health Check (CPU, memory, disk and I/O utilisation)
Database Analysis (utilisation, queries, index usage)
Application Profiling (inefficient code, memory leaks)
Application Dependencies (changes)
Application Logs (errors, tracing)
Web Server Analysis (logs, response times, load)
Load Balancer Examination (configuration, performance)
Client-Side Investigation (browser compatibility)
Performance Monitoring (utilisation, latency, tracing, load testing)
Security (firewalls)
Comparative Analysis (normal vs. slow, patterns)
Collaboration (dev, DB, sys admins)
Testing and Validation (test hypothesis)
Communication and Resolution (stakeholders)
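
The baseline and comparative-analysis steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function names, the percentiles chosen, and the 1.5x tolerance are all assumptions, and the latency samples would come from whatever monitoring is already in place.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def compare_to_baseline(baseline_ms, current_ms, pcts=(50, 95, 99), tolerance=1.5):
    """Return {pct: (baseline, current, regressed)} for each percentile.

    A percentile counts as regressed when the current value exceeds the
    baseline by more than the (assumed) tolerance factor.
    """
    report = {}
    for pct in pcts:
        base = percentile(baseline_ms, pct)
        cur = percentile(current_ms, pct)
        report[pct] = (base, cur, cur > base * tolerance)
    return report
```

A median that holds steady while p95 and p99 regress suggests the slowdown affects only some requests, which narrows the search to a specific endpoint, query, or backend.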

Question 2

Q

A database serving an essential application goes down unexpectedly. How would you handle this incident? Describe the steps you’d take to bring the database back online while minimizing data loss and service disruption.

Answer

A

Initiate Incident Response Process
Communicate with Stakeholders
Assess Impact and Scope
Isolate Cause (logs, metrics)
Implement Immediate Fixes (apply patch, clear bottleneck, restart service)
Restore from Backups (use backup/restore plan)
Data Recovery (perform point-in-time recovery)
Testing and Verification
Monitor and Stabilise
Identify Preventive Measures (post-incident retrospective, communications, documentation)
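
The restore and point-in-time recovery steps above come down to picking the right base backup and the logs to replay on top of it. The sketch below shows only that selection logic in Python. The function name and the data shapes (a list of backup timestamps and a list of `(start, end)` log segments) are hypothetical, and a real recovery would be carried out with the database's own backup and restore tooling.

```python
from datetime import datetime


def plan_point_in_time_recovery(backup_times, log_segments, target):
    """Pick the base backup and the log segments needed to reach `target`.

    backup_times: datetimes at which full backups completed (hypothetical input).
    log_segments: (start, end) datetime pairs covered by each archived log.
    target: the moment to recover to, usually just before the failure.
    """
    candidates = [t for t in backup_times if t <= target]
    if not candidates:
        raise ValueError("no backup predates the recovery target")
    base = max(candidates)  # latest backup keeps the replay window short
    # Keep every log segment that overlaps the window between base and target.
    replay = sorted(seg for seg in log_segments if seg[1] > base and seg[0] < target)
    return base, replay
```

Choosing the most recent backup before the target minimises how much log has to be replayed, which shortens the outage.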

Question 3

Q

The application is experiencing an increase in HTTP 500 internal server errors. Outline your process for investigating and resolving these errors, including the possible factors you’d consider and the strategies you’d employ to mitigate the issue.

Answer

A

Initial Assessment
Monitoring and Alerting
Error Logs Analysis
Identify Patterns
Code Review
Database Inspection
Infrastructure Assessment
Third-Party Services
Server Configuration
Testing and Reproduction
Rollback Recent Changes
Code Debugging
Error Handling and Logging
Load and Performance Testing
Bug Fixing and Code Deployment
Communication
Post-Incident Review
Documentation
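
The error-log analysis and pattern-identification steps above can be illustrated with a small Python sketch. It assumes a simplified, hypothetical log format of four whitespace-separated fields (`timestamp method path status`); real access logs differ, so the parsing would need adapting.

```python
from collections import Counter


def server_error_hotspots(log_lines):
    """Return (path, error_count, error_rate) tuples, most errors first.

    Assumes each line looks like: "<timestamp> <method> <path> <status>".
    Lines that do not match this assumed format are skipped.
    """
    errors = Counter()
    totals = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # malformed or unrelated line
        _, _, path, status = parts
        totals[path] += 1
        if status == "500":
            errors[path] += 1
    return [(path, n, n / totals[path]) for path, n in errors.most_common()]
```

If a single path accounts for most of the errors, that points toward a code or dependency problem in that handler; errors spread evenly across paths point toward infrastructure or configuration.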

Question 4

Q

Users are reporting intermittent connectivity issues, and you suspect a misconfiguration in the load balancer settings. Describe how you would verify the load balancer configuration, identify any misconfigurations, and rectify the issue to restore proper traffic distribution.

Answer

A

Gather Information
Logging and Monitoring
Access Load Balancer Configuration
Review Load Balancer Configuration
Check Health Checks
Session Persistence (misconfiguration can lead to uneven traffic distribution)
Connection Limits and Timeouts
Protocol and Port Settings
Compare with Best Practices
Network Topology and Routing
Backup Configuration
Rectify Misconfigurations
Testing
Verification and Validation
User Feedback
Post-Incident Review
Documentation
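
One way to verify traffic distribution, as the steps above call for, is to compare per-backend request counts against an even split. The Python sketch below is a minimal example under stated assumptions: backends are equally weighted, the counts come from load balancer logs or metrics, and the function name and 25% tolerance are illustrative only.

```python
def find_skewed_backends(request_counts, tolerance=0.25):
    """Return {backend: ratio_to_expected} for backends outside the tolerance.

    request_counts: mapping of backend name to requests served in a window.
    Assumes every backend should receive an equal share of traffic.
    """
    if not request_counts:
        return {}
    total = sum(request_counts.values())
    if total == 0:
        return {}
    expected = total / len(request_counts)
    return {
        name: count / expected
        for name, count in request_counts.items()
        if abs(count - expected) > tolerance * expected
    }
```

A backend receiving far more than its share may indicate sticky sessions pinning clients to it; one receiving far less may be failing health checks intermittently.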

Question 5

Q

One of the microservices in a distributed application is exhibiting a memory leak, causing it to gradually consume more memory over time. How would you troubleshoot this issue, identify the service with the leak, and implement a solution to prevent further memory consumption?

Answer

A

Gather Information
Examine Monitoring and Logs (memory usage, garbage collection, heap utilisation)
Analyse Memory Dumps (capture dumps at intervals while the leak is suspected, then compare them)
Identify the Leaking Code (inefficient memory management practices, unclosed resources, excessive object creation)
Analyse Dependencies
Memory Profiling (identify memory hotspots, anything consuming excessive memory)
Heap Analysis (visualise the memory usage patterns)
Testing and Isolation
Fix the Code
Retest and Validate
Post-Incident Review
Documentation
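
Capturing dumps at intervals and comparing them, as described above, can be sketched with Python's standard `tracemalloc` module. This assumes the leaking service is written in Python, which the question does not state; other runtimes have their own heap-dump tools. `handle_request` and `_cache` are hypothetical stand-ins for the leaking code.

```python
import tracemalloc

_cache = []  # hypothetical leak: grows on every call and is never cleared


def handle_request(payload):
    """Stand-in request handler that leaks by retaining data in _cache."""
    _cache.append(payload * 100)
    return len(payload)


def top_growth(before, after, limit=5):
    """Return the allocation sites that grew the most between two snapshots."""
    stats = after.compare_to(before, "lineno")  # sorted by largest absolute change
    return [s for s in stats if s.size_diff > 0][:limit]


def demo():
    """Take a snapshot, generate load, take another, and report the growth."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(10_000):
        handle_request("x" * 10)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return top_growth(before, after)


if __name__ == "__main__":
    for stat in demo():
        print(stat)
```

The first entry should point at the line that appends to `_cache`, which is the kind of evidence the "Identify the Leaking Code" step relies on.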