Resilience_Engineer_Deep_Interview_Flashcards

1
Q

🔧 TECHNICAL QUESTIONS

A
2
Q

How do you decide whether to build a custom solution or use an existing SaaS tool?

A

S: At Eficens, we had to automate invoice generation for a loan processing system.
T: I had to determine whether to build a custom solution or integrate a SaaS tool.
A: I conducted a cost and feature comparison between building it in-house with Spring Boot and integrating Sage Intacct. I considered factors like supportability, integration time, team bandwidth, and maintenance overhead.
R: We chose Sage Intacct via API integration, reducing dev time by 3 weeks and ensuring compliance. I also automated data exchange using Python scripts for daily sync with PostgreSQL.

3
Q

Explain a scenario where you integrated legacy systems with new cloud-native solutions.

A

S: At TCS, our client wanted to modernize a legacy HRIS by integrating it with Microsoft Azure AD for SSO.
T: I was responsible for ensuring smooth integration without breaking existing workflows.
A: I used Terraform to provision Azure components and built middleware in Python to sync identity data. We rolled out changes incrementally using feature flags and extensive testing in staging.
R: We completed integration with zero downtime and enhanced user experience by reducing login issues by 90%.
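One way to picture the identity-sync middleware is as a diff between the legacy records and what the directory already holds. The sketch below is illustrative only: the `Identity` fields and the `plan_sync` helper are hypothetical names, not the actual middleware, and it only plans the changes rather than calling any Azure AD API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    # Hypothetical minimal record shape; a real HRIS export carries more fields.
    employee_id: str
    email: str
    department: str


def plan_sync(legacy, directory):
    """Compare two dicts of Identity keyed by employee_id.

    Returns (to_create, to_update, to_disable) as sorted lists of IDs.
    """
    to_create = sorted(k for k in legacy if k not in directory)
    to_update = sorted(
        k for k in legacy if k in directory and legacy[k] != directory[k]
    )
    to_disable = sorted(k for k in directory if k not in legacy)
    return to_create, to_update, to_disable
```

Planning the changes separately from applying them makes it easier to review a sync in staging before anything is written to the directory.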

4
Q

Describe your experience automating data flows using APIs or scripting.

A

S: During my SOC project on AWS, I needed to automate log collection across services.
T: The goal was to centralize data from AWS CloudTrail, GuardDuty, and Security Hub.
A: I used Python and AWS Lambda to pull data via APIs, format it, and forward it to Elasticsearch and Kibana for visualization. I also configured S3 as a backup store.
R: Eliminated manual log collection and improved detection of suspicious activity.
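The "format it and forward it" step can be sketched as a normalizer plus a serializer for Elasticsearch's newline-delimited bulk format. This is a simplified illustration, not the project's code: the function names and the `soc-logs` index are hypothetical, and the source field names used here (`eventTime`, `eventName`, `CreatedAt`, `Type`, `Severity`) are a small assumed subset of what CloudTrail and GuardDuty emit.

```python
import json


def normalize(source, record):
    """Map a raw record from one source onto a common document shape."""
    if source == "cloudtrail":
        return {
            "source": source,
            "timestamp": record.get("eventTime"),
            "action": record.get("eventName"),
            "severity": None,  # CloudTrail events carry no severity score
        }
    if source == "guardduty":
        return {
            "source": source,
            "timestamp": record.get("CreatedAt"),
            "action": record.get("Type"),
            "severity": record.get("Severity"),
        }
    raise ValueError(f"unsupported source: {source}")


def to_bulk_lines(index, docs):
    """Render docs as a bulk request body: an action line, then the document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""
```

Inside a Lambda handler, the returned string would be sent as the request body to the cluster's bulk endpoint, and the same raw records could be written to S3 as the backup copy.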

5
Q

How have you contributed to system resilience and fault tolerance?

A

S: At Eficens, our backend microservices sometimes failed under load.
T: My task was to improve system resilience.
A: I containerized the services with Docker, deployed them as container images on AWS Lambda behind API Gateway, and implemented retries with exponential backoff in Java. I also added Splunk alerts for anomalies.
R: System uptime improved from 95% to 99.9%, and recovery time after faults dropped by 60%.
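The retry pattern mentioned in the answer was written in Java; the sketch below shows the same idea in Python for quick review. The function name, the default limits, and the use of full jitter are assumptions made for illustration.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is exhausted.

    The delay cap doubles after each failure (base_delay, 2x, 4x, ...),
    is bounded by max_delay, and the actual wait is drawn uniformly
    from [0, cap] so that concurrent callers do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

In practice the `except` clause should be narrowed to errors that are safe to retry, such as timeouts or throttling responses, so that genuine bugs fail fast.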

6
Q

💬 BEHAVIORAL QUESTIONS

A
7
Q

Tell me about a time you had to work with non-technical stakeholders.

A

S: During a security project, I worked with the HR team to improve onboarding access flows.
T: They needed a simplified GUI to manage access roles.
A: I conducted a workshop to gather requirements, then built a web-based GUI in Python Flask to manage IAM roles via AWS SDK.
R: Reduced their dependency on IT by 70% and improved provisioning time from 2 days to 1 hour.

8
Q

Describe a time when you had to manage competing priorities.

A

S: At TCS, two major features were due at the same time: one for backend optimization, the other for compliance.
T: I was leading both efforts and had to balance development.
A: I prioritized based on risk and regulatory deadline, communicated timelines with stakeholders, and split the team accordingly.
R: Delivered the compliance module on time and delayed the optimization task by only 3 days with no business impact.

9
Q

🔍 SITUATIONAL QUESTIONS

A
10
Q

If your MVP solution starts failing in production, how would you handle it?

A

S: Imagine a web tool I built for invoice generation starts failing intermittently.
T: My task is to restore service quickly while diagnosing the root cause.
A: First, I’d review logs in Splunk and health-check results, and roll back if a recent deployment caused the issue. I’d then isolate the failing service and add a retry mechanism for transient errors. In parallel, I’d keep stakeholders updated and, once service is restored, write up a postmortem.
R: This reduces panic, helps maintain trust, and prevents recurrence via RCA.

11
Q

What’s your approach to identifying and clearing technical roadblocks?

A

S: At Eficens, a Lambda function often hit its configured memory limit.
T: I had to fix scalability issues without increasing costs.
A: I refactored logic to batch-process data, reduced cold start times, and used AWS X-Ray to trace bottlenecks.
R: Reduced function runtime by 40% and cost by 25%, unblocking scale-up efforts.
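Batch processing here means handling the input in fixed-size chunks so that only one chunk is held in memory at a time. A minimal sketch, assuming the records arrive as any iterable (the `batched` helper name and the chunk size are illustrative):

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Usage: process 7 records in chunks of 3.
for chunk in batched(range(7), 3):
    print(chunk)
```

Because the helper is a generator, the full input never has to be loaded at once, which is what keeps peak memory bounded by the chunk size.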

12
Q

🛠️ TECH STACK QUESTIONS

A
13
Q

How have you used PostgreSQL, Elasticsearch, or Snowflake in your past roles?

A

PostgreSQL: Used it in microservices to manage loan data; wrote optimized queries, added indexes for performance, and used role-based access for security.
Elasticsearch: Integrated with Suricata and Zeek logs in my AWS SOC project for threat hunting dashboards.
Snowflake: While I haven’t used Snowflake directly, I’ve worked with similar warehousing platforms and am confident in learning it quickly due to my SQL and ETL experience.

14
Q

How do you ensure changes are backward compatible?

A

Use feature flags to enable gradual rollout.
Version API contracts and database schemas so older clients keep working.
Run regression tests and sandbox testing before deployment.
Use blue-green deployment or canary rollout strategies for critical services.
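A gradual rollout behind a feature flag can be sketched as a deterministic percentage check. This is one common approach, not a specific flag service's API; the `is_enabled` name and the hashing scheme are assumptions made for illustration.

```python
import hashlib


def is_enabled(flag, user_id, rollout_percent):
    """Return True if `user_id` falls inside the rollout for `flag`.

    Each (flag, user) pair hashes to a stable bucket in 0..99, so a user
    keeps the same answer across requests, and raising the percentage
    only ever adds users to the enabled group.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent
```

Old and new code paths run side by side while the percentage is raised, and setting it back to 0 acts as an immediate rollback without a redeploy.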
