Interview Flashcards

Question 1

Q

How do you differentiate between incidents, problems, and changes?

Answer

A

Incident: An unplanned interruption or degradation of service.
Problem: The underlying cause of one or more incidents.
Change: A modification to an IT service or system aimed at resolving problems or improving functionality.

Question 2

Q

How do you handle multiple simultaneous incidents?

Answer

A

Assess and prioritize based on impact and urgency.
Assign dedicated resources to each incident.
Use playbooks to ensure a structured response for high-priority incidents.
Communicate status updates effectively to all stakeholders.
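
A minimal sketch of impact/urgency prioritization. The 1-to-3 scale, the matrix values, and the incident IDs are all assumptions made up for illustration; real organizations define their own matrix.

```python
# Hypothetical scale: 1 = high, 2 = medium, 3 = low (for both impact and urgency).
# The mapping to P1..P5 is an illustrative convention, not a standard.
PRIORITY_MATRIX = {
    (1, 1): "P1", (1, 2): "P2", (1, 3): "P3",
    (2, 1): "P2", (2, 2): "P3", (2, 3): "P4",
    (3, 1): "P3", (3, 2): "P4", (3, 3): "P5",
}


def priority_of(incident):
    """Look up the priority label for an incident's impact and urgency."""
    return PRIORITY_MATRIX[(incident["impact"], incident["urgency"])]


def prioritize(incidents):
    """Return incidents sorted so the most severe priority comes first."""
    return sorted(incidents, key=priority_of)


# Made-up simultaneous incidents.
open_incidents = [
    {"id": "INC-3", "impact": 3, "urgency": 2},
    {"id": "INC-1", "impact": 1, "urgency": 1},
    {"id": "INC-2", "impact": 2, "urgency": 1},
]
ordered = [incident["id"] for incident in prioritize(open_incidents)]
print(ordered)
```

Sorting by the matrix label keeps the triage rule in one table, so changing the policy means editing data rather than logic.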

Question 3

Q

What is your experience with post-incident reviews (PIRs)?

Answer

A

Conducted PIRs within 48 hours of incident resolution.
Structured the review to include a timeline, root cause analysis, corrective actions, and lessons learned.
Facilitated open discussions to identify process gaps and ensure accountability without assigning blame.

Question 4

Q

How do you ensure compliance with SLAs during incident management?

Answer

A

Establish a clear escalation and communication path for all teams involved.
Use global incident management tools like ServiceNow or PagerDuty.
Ensure team members are aware of time zones and organizational dependencies.

Question 5

Q

How do you diagnose performance issues on a Linux server?

Answer

A

Use tools like top, htop, or vmstat for resource usage.
Analyze logs under /var/log.
Check disk I/O using iostat or iotop.
Inspect network performance using netstat, tcpdump, or iftop.

Question 6

Q

What is your understanding of TCP/UDP and when would you use each?

Answer

A

TCP: Reliable, connection-oriented protocol for use cases like file transfers and web browsing.
UDP: Lightweight, connectionless protocol for latency-sensitive or simple request/response traffic such as DNS queries, VoIP, and live video streaming.
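
To make the contrast concrete, here is a small sketch using Python's standard socket module over the loopback interface. The function names are invented for this example; the point is that the UDP sender just fires a datagram, while the TCP client must connect before any data flows.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram over loopback: no handshake, no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        receiver.settimeout(2.0)
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(4096)
        return data


def tcp_roundtrip(payload: bytes) -> bytes:
    """Connect first (three-way handshake), then send an ordered byte stream."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            server_side, _addr = listener.accept()
            with server_side:
                server_side.settimeout(2.0)
                client.sendall(payload)
                return server_side.recv(4096)


print(udp_roundtrip(b"ping"))
print(tcp_roundtrip(b"hello"))
```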

Question 7

Q

How do you ensure containerized services are running optimally?

Answer

A

Use monitoring tools like Prometheus and Grafana for resource tracking.
Analyze container logs using docker logs.
Ensure health checks and resource limits (CPU, memory) are defined in Docker or Kubernetes configurations.
Investigate inter-container network latency or misconfigurations.

Question 8

Q

How would you troubleshoot DNS resolution failures?

Answer

A

Check the DNS server’s availability using nslookup or dig.
Verify DNS configurations in /etc/resolv.conf.
Investigate firewall or network settings blocking DNS traffic.
Ensure TTL values are appropriate and DNS caches are updated.
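
A rough sketch of the first two checks in code, assuming a resolv.conf with plain "nameserver" lines. The sample file content and the function names are hypothetical; resolve() uses the system resolver through socket.getaddrinfo.

```python
import socket


def parse_nameservers(resolv_conf_text: str) -> list[str]:
    """Extract the configured nameserver addresses from resolv.conf text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def resolve(hostname: str) -> list[str]:
    """Return the resolved addresses, or an empty list if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


# Hypothetical file content; on a real host, read /etc/resolv.conf instead.
sample = """# example resolv.conf
nameserver 10.0.0.2
nameserver 10.0.0.3
search corp.example
"""
print(parse_nameservers(sample))
```

An empty list from resolve() for a name that should exist points at the resolver path (configuration, firewall, or the DNS server itself) rather than the application.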

Question 9

Q

What are some common bottlenecks in CI/CD pipelines, and how do you address them?

Answer

A

Slow Builds: Optimize builds by caching dependencies or using parallel tasks.
Failed Tests: Ensure tests are modular and focus on critical areas.
Deployment Issues: Use automated rollback mechanisms or staged deployments.
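
One common way to cache dependencies is to key the cache on a hash of the lockfile, so the cache is reused until the dependencies change. The sketch below illustrates that idea only; the function name, the "deps" prefix, and the lockfile contents are assumptions, not any particular CI system's API.

```python
import hashlib


def dependency_cache_key(lockfile_bytes: bytes, prefix: str = "deps") -> str:
    """Derive a cache key that changes only when the lockfile content changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"{prefix}-{digest[:16]}"


# Made-up lockfile contents for illustration.
key_a = dependency_cache_key(b"requests==2.31.0\n")
key_b = dependency_cache_key(b"requests==2.31.0\n")
key_c = dependency_cache_key(b"requests==2.32.0\n")

print(key_a == key_b)  # same dependencies: cache hit
print(key_a == key_c)  # changed dependencies: cache miss
```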

Question 10

Q

How do you configure and use Prometheus or Grafana for monitoring?

Answer

A

Install Prometheus and configure scrape targets in the prometheus.yml file.
Use exporters (e.g., node_exporter for Linux systems) to gather metrics.
Query metrics using PromQL and visualize them with Grafana.
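
As an illustration of the PromQL step, the sketch below builds a request URL for the Prometheus HTTP API instant-query endpoint (/api/v1/query) and parses a response body. The server address, the job label, and the sample response are assumptions; nothing here contacts a real server.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant PromQL query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> dict[str, float]:
    """Map each series' 'instance' label to its current sample value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("query failed")
    return {
        sample["metric"].get("instance", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


# Assumes a local Prometheus on its default port and a scrape job named "node".
url = instant_query_url("http://localhost:9090", 'up{job="node"}')

# Hand-written response in the shape the API returns for a vector result.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "host-a:9100", "job": "node"},
             "value": [1700000000, "1"]},
        ],
    },
})
targets_up = parse_instant_vector(sample_body)
print(url)
print(targets_up)
```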

Question 11

Q

What is the difference between active and passive monitoring?

Answer

A

Active Monitoring: Simulates user transactions to test system performance proactively (e.g., synthetic monitoring).
Passive Monitoring: Observes live user activity to detect issues in real-time (e.g., packet sniffing).

Question 12

Q

What do you look for in log files during incident resolution?

Answer

A

Errors or exceptions with timestamps matching the incident.
Patterns indicating system or user activity leading to failure.
Logs of dependent services to identify cascading issues.
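
A small sketch of the first check: pulling error lines whose timestamps fall near the incident time. The log layout (timestamp, level, message), the sample lines, and the five-minute window are assumptions for illustration.

```python
from datetime import datetime, timedelta

LOG_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout at the start of each line


def errors_near(lines, incident_time, window_minutes=5):
    """Return ERROR/FATAL lines logged within the window around the incident."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], LOG_FORMAT)
        except ValueError:
            continue  # skip lines without a leading timestamp (e.g. stack traces)
        level = line[20:].split(" ", 1)[0]
        if level in {"ERROR", "FATAL"} and abs(stamp - incident_time) <= window:
            hits.append(line)
    return hits


# Made-up log lines.
log_lines = [
    "2024-05-01 11:40:00 ERROR cache-svc eviction failed",
    "2024-05-01 12:01:30 INFO payment-api request ok",
    "2024-05-01 12:03:10 ERROR payment-api db timeout",
    "    at com.example.Handler.process",
]
matches = errors_near(log_lines, datetime(2024, 5, 1, 12, 5, 0))
print(matches)
```

Running the same filter over the logs of dependent services, with the same time window, is a quick way to spot cascading failures.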

Question 13

Q

How do you design an incident escalation matrix?

Answer

A

Define escalation tiers based on severity and impact.
Assign escalation paths to specific roles or teams.
Establish time thresholds for each tier.
Regularly review and update the matrix.
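
The matrix can be sketched as a small data structure. The severity names, roles, and time thresholds below are hypothetical placeholders, not a recommended policy.

```python
# Hypothetical tiers: (minutes since the incident opened, role to escalate to).
ESCALATION_MATRIX = {
    "SEV1": [(0, "on-call engineer"), (15, "team lead"), (30, "engineering manager")],
    "SEV2": [(0, "on-call engineer"), (60, "team lead")],
}


def current_escalation(severity, minutes_open):
    """Return the highest tier whose time threshold has been reached."""
    owner = None
    for threshold, role in ESCALATION_MATRIX[severity]:
        if minutes_open >= threshold:
            owner = role
    return owner


print(current_escalation("SEV1", 20))
print(current_escalation("SEV2", 20))
```

Keeping the matrix as data makes the regular review step a matter of editing one table.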

Question 14

Q

What are some key metrics you track to measure the success of incident management processes?

Answer

A

Mean Time to Detect (MTTD).
Mean Time to Acknowledge (MTTA).
Mean Time to Resolve (MTTR).
SLA compliance rates.
Post-incident review completion rates.
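
A worked example of the first three metrics. Definitions vary between organizations; this sketch assumes MTTD is measured from incident start to detection, MTTA from detection to acknowledgement, and MTTR from incident start to resolution. The incident records are made up.

```python
from datetime import datetime
from statistics import mean


def minutes_between(start, end):
    """Elapsed time between two datetimes, in minutes."""
    return (end - start).total_seconds() / 60


# Made-up incident timestamps.
incidents = [
    {"started": datetime(2024, 5, 1, 10, 0), "detected": datetime(2024, 5, 1, 10, 4),
     "acknowledged": datetime(2024, 5, 1, 10, 6), "resolved": datetime(2024, 5, 1, 10, 40)},
    {"started": datetime(2024, 5, 2, 14, 0), "detected": datetime(2024, 5, 2, 14, 10),
     "acknowledged": datetime(2024, 5, 2, 14, 14), "resolved": datetime(2024, 5, 2, 15, 0)},
]

mttd = mean(minutes_between(i["started"], i["detected"]) for i in incidents)
mtta = mean(minutes_between(i["detected"], i["acknowledged"]) for i in incidents)
mttr = mean(minutes_between(i["started"], i["resolved"]) for i in incidents)

print(f"MTTD={mttd} min, MTTA={mtta} min, MTTR={mttr} min")
```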

Question 15

Q

How do you handle emergency changes during an incident?

Answer

A

Assess the impact and risk of the change with key stakeholders.
Gain approval through an expedited emergency change management process.
Test the change in a controlled environment if time permits.
Monitor the results and document the change thoroughly.

Question 16

Q

What are some strategies for mitigating risks in high-availability systems?

Answer

A

Implement redundancy at all levels (e.g., servers, storage, networks).
Use automated failover mechanisms.
Regularly test disaster recovery and failover scenarios.
Ensure proper monitoring and alerting to catch early signs of degradation.

Question 17

Q

A critical database server is down during peak hours. How would you handle the situation?

Answer

A

Notify stakeholders immediately and assemble the incident response team.
Investigate logs for errors or performance degradation.
Check for hardware issues or resource exhaustion.
Apply a temporary fix, such as restoring from a backup or scaling resources.
Document the incident thoroughly and schedule a follow-up for root cause analysis.

Question 18

Q

A monitoring tool has flagged intermittent latency in a microservices-based application. What’s your approach?

Answer

A

Examine logs for specific services with high response times.
Use distributed tracing tools to identify bottlenecks.
Investigate resource usage on affected nodes.
Test inter-service communication and network latency.

Question 19

Q

Explain the significance of ICMP in network troubleshooting.

Answer

A

ICMP is used for diagnostic and error-reporting purposes.
Common tools like ping and traceroute rely on ICMP to measure connectivity and path latency.

Question 20

Q

How do you ensure I/O performance optimization in a high-load application?

Answer

A

Use RAID for disk performance and redundancy.
Optimize database queries and indexes.
Implement caching layers (e.g., Redis).
Monitor and adjust kernel I/O schedulers.

Question 21

Q

Define the Bot’s Purpose

Answer

A

Identify the problem the bot will solve (e.g., automate repetitive tasks, assist employees, or manage workflows).
Example use cases: answering FAQs, scheduling, or retrieving on-call staff information.

Question 22

Q

How do you choose the right bot platform?

Answer

A

Decide where the bot will operate (e.g., Slack, Microsoft Teams, email, or a custom app).
Ensure it integrates well with corporate tools (e.g., Jira, ServiceNow, or internal APIs).

Question 23

Q

Select Development Tools for bot

Answer

A

Bot Frameworks: Use frameworks like Microsoft Bot Framework, Dialogflow, or Botpress to streamline development.
Programming Language: Python, JavaScript, or Node.js are commonly used due to their simplicity and libraries.
APIs: Utilize corporate APIs (like HR systems or databases) to fetch required data.

Question 24

Q

Build Core Functionality for a bot

Answer

A

Write code for the bot’s tasks. For example:
Use APIs for fetching schedules, automating queries, or retrieving documents.
Build logic for processing commands like “Show today’s on-call staff.”
Implement natural language understanding (NLU) using tools like Rasa or Dialogflow for conversational bots.
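
A minimal sketch of command-processing logic for the on-call example. The schedule, the names in it, and the accepted phrasings are hypothetical; a real bot would fetch the schedule from a scheduling API and receive text from the chat platform.

```python
from datetime import date

# Hypothetical schedule; a real bot would fetch this from a scheduling API.
ON_CALL_SCHEDULE = {
    date(2025, 1, 6): "alice",
    date(2025, 1, 7): "bob",
}

KNOWN_PHRASES = {"show today's on-call staff", "who is on call"}


def handle_command(text: str, today: date) -> str:
    """Match a chat message against known commands and build the reply."""
    normalized = text.strip().lower().replace("\u2019", "'")  # normalize curly apostrophes
    if normalized in KNOWN_PHRASES:
        person = ON_CALL_SCHEDULE.get(today)
        return f"On call today: {person}" if person else "No on-call entry for today."
    return "Sorry, I don't recognize that command."


print(handle_command("Show today's on-call staff", date(2025, 1, 7)))
```

Exact phrase matching like this is the simplest option; an NLU tool replaces the KNOWN_PHRASES lookup with intent detection.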

Question 25

Q

Ensure Security and Compliance for bot

Answer

A

Use encryption to secure sensitive data.
Follow corporate policies on data storage and processing.
Authenticate users (e.g., via Single Sign-On (SSO) or OAuth).

Question 26

Q

Test and Deploy for a bot

Answer

A

Test the bot in a staging environment for bugs or user flow issues.
Deploy it to production on your chosen platform (e.g., Slack workspace or Teams environment).

Question 27

Q

Monitor and Improve

Answer

A

Monitor usage logs and performance metrics.
Gather feedback from users to improve functionality.
Update the bot regularly to handle new scenarios or integrate additional features.

Question 28

Q

Can you walk us through the lifecycle of an incident from detection to resolution?

Answer

A

Detection: Incident identified via monitoring tools, alerts, or user reports.
Classification: Assign severity level based on impact and urgency.
Response: Notify stakeholders, assemble the incident response team, and assign roles.
Diagnosis: Use logs, monitoring data, and tools to identify the root cause.
Resolution: Apply a temporary fix (if needed) and implement a permanent solution.
Communication: Regular updates to stakeholders.
Post-Incident Review: Document the incident, analyze for lessons learned, and refine processes to prevent recurrence.

Question 29

Q

How would you prioritize incidents during a major outage?

Answer

A

Assess the impact (e.g., number of users affected, financial loss).
Analyze the urgency (e.g., SLA breaches or cascading system failures).
Focus on critical systems supporting customer-facing services.
Ensure effective delegation, while providing continuous communication to stakeholders.

Question 30

Q

What monitoring tools are you familiar with, and how have you used them?

Answer

A

Tools: Prometheus, Grafana, Splunk, Datadog, and ELK Stack.
Used for tracking system performance, detecting anomalies, creating alerts, and diagnosing root causes.
Example: Used Grafana dashboards to monitor application health during peak traffic and proactively mitigated risks by analyzing metrics like CPU and memory usage.

Question 31

Q

How would you explain a technical incident to non-technical stakeholders?

Answer

A

Start with the impact (e.g., “Service X is currently unavailable for 20% of users”).
Avoid technical jargon; use analogies if helpful (e.g., “It’s like a traffic jam blocking access”).
Highlight the steps being taken and the estimated time for resolution.
Keep updates concise and frequent to maintain trust.

Question 32

Q

What’s your approach to delivering bad news about an incident?

Answer

A

Be transparent but solution-focused.
Clearly explain the situation, impact, and mitigation steps in progress.
Reassure stakeholders by emphasizing the team’s expertise and a structured resolution plan.

Question 33

Q

Describe a challenging incident you managed and how you resolved it.

Answer

A

Context: A critical payment processing service outage.
Action: Coordinated with SREs, reviewed logs, and pinpointed a database deadlock issue.
Solution: Rolled back the faulty code deployment and implemented additional monitoring to catch similar issues proactively.
Outcome: Restored services within the SLA and conducted a root cause analysis to prevent recurrence.

Question 34

Q

How would you troubleshoot an intermittent network issue?

Answer

A

Start with logs: Check network device logs and application-level errors.
Run diagnostics: Use tools like traceroute, ping, and packet capture.
Correlate data: Identify patterns like time of occurrence or specific affected regions.
Isolate components: Test individual elements of the network to narrow down the root cause.

Question 35

Q

What tools do you use for incident tracking and reporting?

Answer

A

Jira: For tracking and documenting incidents.
ServiceNow: For managing incident lifecycles.
Confluence: For post-incident reporting and knowledge sharing.

Question 36

Q

How do you ensure high-quality incident documentation?

Answer

A

Include essential details: incident timeline, root cause, impact, resolution steps, and lessons learned.
Ensure reports are concise, structured, and accessible to both technical and non-technical audiences.
Use templates to maintain consistency across reports.

Question 37

Q

How do you ensure effective collaboration during an incident?

Answer

A

Use a centralized communication channel (e.g., Slack or Microsoft Teams).
Clearly assign roles and responsibilities to team members.
Encourage open communication and quick escalation of issues.
Regularly update stakeholders and keep the team focused on resolution goals.

Question 38

Q

How do you manage a situation where two teams disagree on the root cause of an incident?

Answer

A

Facilitate a discussion to focus on facts rather than opinions.
Use data (e.g., logs, metrics) to guide decisions.
If unresolved, escalate to a neutral decision-maker or a higher-level incident commander.

Question 39

Q

What’s your experience with CI/CD pipelines?

Answer

A

Familiar with Jenkins, GitLab CI/CD, and GitHub Actions.
Implemented pipelines for automated testing, deployment, and monitoring.
Example: Reduced deployment time by 30% using a well-defined CI/CD process integrated with Kubernetes.

Question 40

Q

How do you mitigate risks when rolling out changes during an incident?

Answer

A

Implement a change freeze during critical periods.
Conduct thorough pre-deployment testing.
Use canary deployments or blue-green deployments to minimize impact.
Roll back immediately if adverse effects are detected.

Question 41

Q

How do you handle stress during a critical incident?

Answer

A

Stay calm and focused by breaking the problem into smaller tasks.
Use checklists and incident response playbooks to stay organized.
Maintain clear communication and lean on the team for support.

Question 42

Q

Describe a time you improved an incident management process.

Answer

A

Implemented a post-mortem framework to identify recurring incident trends.
Developed an automated incident notification system using Slack and ServiceNow integration.
Resulted in a 20% improvement in incident response times and better documentation quality.

Question 43

Q

How do you prioritize incidents and allocate resources?

Answer

A

Depending on the severity of the issue, whether there is an outage, and the number of users affected.

Question 44

Q

Can you describe a time you prevented a recurring issue?

Answer

A

ERD, PRD, production. On-call bot schedules misaligned.

Question 45

Q

How do you manage communication during high-pressure incidents?

Answer

A

Through a Lark channel with the correct POCs from the affected departments, asking for updates and providing them when I can.

Question 46

Q

Provide an example of a critical incident you handled and its outcome.

Answer

A

Rollbacks for bad deployments.

Traffic failover to TTP2 after a TTP1 outage.

A downstream service caused error spikes on a critical PSM for my team (3C), which led to an investigation and a discussion with the team responsible.

Question 47

Q

If you get 3 escalations, which would you prioritize? (The 3 escalations being: suicide content, LGBT, and/or government pressure.)

Answer

A

In such scenarios, prioritization would depend on the urgency and impact. Suicide content would take the highest priority as it involves potential loss of life and requires immediate action to protect individuals. Next, I would address government pressure, ensuring compliance with regulations to maintain operational stability. Lastly, I would manage the LGBT-related escalation, ensuring that it is handled sensitively and in alignment with TikTok’s values of inclusivity and community support. Throughout, I would ensure clear communication and resource allocation to address all issues effectively.

Question 48

Q

What are your aspirations?

Answer

A

Throughout my career I’ve aspired to be more of a leader, one who contributes to the team by driving project ideas and resolutions while also being in the trenches getting the work done.

Question 49

Q

Tell me about a time you’ve experienced conflict and how you dealt with it.

Answer

A

During my time at TikTok, I designed a bot that updates the on-call schedules based on a master schedule, regardless of region, and outputs the current on-caller. During a demo for the discovery team, a conflict between two of the SREs over the tool’s impact on their team escalated to shouting. I took the reins of the conversation so we could break down the issues, since one of the discovery members didn’t understand the issue the other was raising.

Question 50

Q

Why are you interested in TikTok?

Answer

A

I’m interested in working at TikTok because it’s an ever-evolving industry with numerous talented engineers and projects to work on, which would help me grow my career while supporting a platform that connects millions of people worldwide.

Question 51

Q

Besides your professional commitment, can you share a story about a time when you helped someone?

Answer

A

One of my friends was desperate for a career change, so I brought up IT and got him started on foundational certifications such as A+, Linux+, and Network+. We then went over interview prep and found some help desk jobs, and after a 3-year journey he is now working for Microsoft as a pen test engineer.

Question 52

Q

AGILE

Answer

A

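Agile is an iterative approach that delivers work in short cycles (sprints). Each cycle repeats the phases that a linear (Waterfall) model runs through once:
1. Requirements
2. Design
3. Implementation
4. Testing
5. Deployment
6. Maintenance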

Question 53

Q

Q: What command in Linux shows running processes and their resource usage?

Answer

A

top or htop.

Question 54

Q

Q: How do you check disk usage in Linux?

Answer

A

Use df -h for disk usage and du -sh for directory size.

Question 55

Q

Q: What is the purpose of the /etc/hosts file in Linux?

Answer

A

A: It maps hostnames to IP addresses locally, bypassing DNS.

Question 56

Q

Q: What does ping do in networking?

Answer

A

A: Tests connectivity between two devices by sending ICMP echo requests.

Question 57

Q

Q: Name three common types of virtualization technologies.

Answer

A

A: VMware, Hyper-V, and KVM (Oracle VM is another example).

Question 58

Q

Q: What is the difference between NAT and Bridged network modes in virtualization?

Answer

A

A: NAT shares the host’s IP, while Bridged gives VMs direct access to the network.

Question 59

Q

Q: What does the iptables command do in Linux?

Answer

A

A: Manages firewall rules for packet filtering.

Question 60

Q

Q: What is the purpose of a VLAN?

Answer

A

A: To segment a network into isolated virtual networks for security and efficiency.

Question 61

Q

Q: What is the primary purpose of Kubernetes?

Answer

A

A: Orchestrates containerized applications for scaling, deployment, and management.

Question 62

Q

Q: How do you create a Docker container from an image?

Answer

A

A: Use the command docker run <image>.

Question 63

Q

Q: What is a Kubernetes Pod?

Answer

A

A: The smallest deployable unit in Kubernetes, containing one or more containers.

Question 64

Q

Q: How does Kubernetes ensure high availability?

Answer

A

A: By automatically replicating and rescheduling pods on healthy nodes.

Answer 65

A

A: Defines the steps to build a custom Docker image.

Answer 66

A

A: Use kubectl get nodes.

Answer 67

A

A: Manages external HTTP/S access to services within the cluster.

Answer 68

A

A: Prometheus with Grafana or Kubernetes Dashboard.

Answer 69

A

A: Identification, logging, categorization, prioritization, investigation, resolution, closure.

Answer 70

A

A: Service Level Agreement – a commitment to resolve issues within a specified timeframe.

Answer 71

A

A: To restore normal service operation as quickly as possible with minimal business impact.

Answer 72

A

A: A Priority 1 incident with the highest severity, often causing significant business disruption.

Answer 73

A

A: To analyze root causes, evaluate the response, and identify process improvements.

Answer 74

A

A: Information Technology Infrastructure Library.

Answer 75

A

A: To coordinate efforts, ensure clear communication, and oversee resolution activities.

Answer 76

A

A: Identifying goals and the problem the process aims to solve.

Answer 77

A

A: ITIL (Information Technology Infrastructure Library).

Answer 78

A

A: Provide regular updates with clear and concise information to stakeholders.

Answer 79

A

A: Use post-incident reviews, gather feedback, and implement iterative changes.

Answer 80

A

A: Responsible, Accountable, Consulted, Informed.

Answer 81

A

A: It ensures consistency, provides clarity, and supports training and compliance.

Answer 82

A

A: A predefined set of steps and procedures to handle specific incident types.

Answer 83

A

A: Ansible, Rundeck, or Zapier.

Answer 84

A

A: It changes file permissions (e.g., chmod 755 file).
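
To show what an octal mode such as 755 means, this short sketch uses Python's standard stat module to render a mode as the rwx string that ls -l would display. The helper name describe_mode is made up for the example.

```python
import stat


def describe_mode(mode: int) -> str:
    """Render an octal permission mode as an ls-style string."""
    # S_IFREG marks a regular file, so filemode() renders a leading "-".
    return stat.filemode(stat.S_IFREG | mode)


print(describe_mode(0o755))  # owner rwx, group r-x, others r-x
print(describe_mode(0o644))  # owner rw-, group r--, others r--
```

Each octal digit is the sum of read (4), write (2), and execute (1) for owner, group, and others in turn.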

Answer 85

A

A: find /path/to/dir -name "*.log" -mtime -7

Answer 86

A

A: Piping (|) passes the output of one command as input to another. Example: ls -l | grep "filename".

Answer 87

A

A: > overwrites a file, while >> appends to a file.

Answer 88

A

A: Processes and extracts data from text. Example: awk '{print $1}' file.txt prints the first column of a file.

Answer 89

A

#!/bin/bash

ps aux

Answer 90

A

A: Exits the script immediately if a command returns a non-zero status.

Answer 91

A

0 2 * * * /path/to/script.sh

Answer 92

A

A: Excludes lines matching a pattern. Example: grep -v "error" file.txt.

Answer 93

A

A: sed -i 's/foo/bar/g' file.txt

Answer 94

A

A: Master node (API server, scheduler, etcd, controller manager) and worker nodes (kubelet, kube-proxy, container runtime).

Answer 95

A

A: Run kubeadm init, then configure kubectl and join worker nodes using the provided token.

Answer 96

A

A: Use a Service of type NodePort or LoadBalancer, or configure an Ingress.

Answer 97

A

docker build -t <username>/<imagename>:<tag> .
docker push <username>/<imagename>:<tag>

Answer 98

A

A: docker ps.

Answer 99

A

A: Use kubectl scale deployment <name> --replicas=<count>.

Answer 100

A

A: Stores non-sensitive configuration data as key-value pairs, which can be used by applications.

Answer 101

A

kubectl rollout restart deployment <name>

Answer 102

A

docker-compose is for local container orchestration, while Kubernetes manages containers across distributed systems.

Answer 103

A

Define a YAML file specifying the storage class, access modes, and size, then apply it using kubectl apply -f.

Answer 104

A

A: An incident is an immediate disruption of service, while a problem is the underlying cause of one or more incidents.

Answer 105

A

A: To review and approve proposed changes to minimize risks.

Answer 106

A

A: By assessing impact (business effect) and urgency (time sensitivity).

Answer 107

A

A: Faster resolution, improved communication, reduced downtime, and better documentation for future prevention.

Answer 108

A

A: A repository of solutions, troubleshooting guides, and documentation to help resolve incidents efficiently.

Answer 109

A

A: Acts as the single point of contact for users to report incidents and request services.

Answer 110

A

A: Reactive addresses incidents after they occur, while proactive identifies and prevents potential issues.

Answer 111

A

A: Key metrics include Mean Time to Resolution (MTTR), First Call Resolution (FCR) rate, and SLA compliance.

Answer 112

A

A: To involve higher-level support or management when the current team cannot resolve the issue within SLA timelines.

Answer 113

A

A: ServiceNow, PagerDuty, or Jira Service Management.

Answer 114

A

#!/bin/bash

# Create file1.txt through file10.txt
for i in {1..10}; do
  touch "file$i.txt"
done

Answer 115

A

A: netstat -tuln or ss -tuln.

Answer 116

A

A: Network Address Translation translates private IP addresses to a public IP address for internet communication, conserving IPv4 addresses.

Answer 117

A

A: Use ip link add <name> type bridge or configure with ifconfig or ip addr.

Answer 118

A

A: Soft links (symbolic links) point to the original file’s path, while hard links are direct references to the inode, unaffected by file relocation.

Answer 119

A

A: ping tests connectivity to a host, while traceroute shows the route packets take to reach the host.

Answer 120

A

Generate a key pair: ssh-keygen.
Copy the public key to the remote server: ssh-copy-id user@remote_host.
Ensure proper permissions: chmod 700 ~/.ssh and chmod 600 ~/.ssh/authorized_keys.

Answer 121

A

It converts input into arguments for a command. Example: ls | xargs rm removes files listed by ls.

Answer 122

A

A: command > file 2>&1.

Answer 123

A

Functional Escalation: Involves higher-level technical expertise.
Hierarchical Escalation: Involves senior management for visibility or decision-making.

Answer 124

A

SLA (Service Level Agreement): Defines service delivery expectations between a provider and a customer.
OLA (Operational Level Agreement): Defines responsibilities between internal teams.
UC (Underpinning Contract): Defines obligations between a provider and third-party vendors.

Answer 125

A

Assess Impact: Identify affected systems and services.
Activate Major Incident Process: Notify stakeholders and assemble an incident response team.
Communicate Updates: Provide regular updates to users and management.
Implement Fix: Work on resolution or mitigation.
Document: Log details for post-incident review.

Answer 126

A

A process to determine the underlying reason for an incident or problem and identify corrective measures to prevent recurrence.

Answer 127

A

MTTR (Mean Time to Resolve)
MTTD (Mean Time to Detect)
First Call Resolution Rate
Incident Escalation Rate

Answer 128

A

Coordinate response teams.
Ensure SLA compliance.
Communicate with stakeholders.
Drive root cause analysis and post-incident reviews.
Identify areas for process improvement.

Answer 129

A

Identify: Detect and verify the breach.
Contain: Isolate affected systems.
Eradicate: Remove the threat.
Recover: Restore systems and data.
Learn: Conduct a post-incident review.

Answer 130

A

DNS (Domain Name System) resolves human-readable domain names (e.g., google.com) into IP addresses that computers use to identify resources.

Answer 131

A

A (Address): Maps a domain to an IPv4 address.
AAAA: Maps a domain to an IPv6 address.
CNAME: Maps an alias to another domain name.
MX (Mail Exchange): Specifies mail servers for a domain.
NS (Name Server): Specifies authoritative DNS servers for a domain.
PTR (Pointer): Provides reverse DNS, mapping an IP address to a hostname.
TXT: Stores arbitrary text, often used for verification and policies (e.g., SPF, DKIM).

Answer 132

A

A: Queries DNS servers for information about domains and their records.

Answer 133

A

nslookup google.com

Answer 134

A

- kubectl version: Get Kubernetes client and server versions.
- kubectl get pods: List pods in the current namespace.
- kubectl describe pod <pod_name>: Get detailed info about a pod.
- kubectl apply -f <file>.yaml: Apply configuration from a YAML file.
- kubectl delete pod <pod_name>: Delete a pod.

Answer 135

A

Physical Layer
The physical layer is responsible for the actual physical connection between devices. It carries information in the form of bits.

Answer 136

A

Data Link Layer (DLL)
The data link layer is responsible for the node-to-node delivery of the message.

Answer 137

A

Network Layer
The network layer works for the transmission of data from one host to the other located in different networks. It also takes care of packet routing i.e. selection of the shortest path to transmit the packet, from the number of routes available.

Answer 138

A

Transport Layer
The transport layer provides services to the application layer and takes services from the network layer. Data at this layer is referred to as segments. It is responsible for end-to-end delivery of the complete message. With TCP, the transport layer also acknowledges successful transmission and retransmits data if an error is found. Protocols used in the transport layer are TCP and UDP.

Answer 139

A

Session Layer
The session layer in the OSI model is responsible for establishing, managing, and terminating sessions between two devices. It also provides authentication and security. Protocols used in the session layer are NetBIOS and PPTP.

Answer 140

A

Presentation Layer
The presentation layer is also called the Translation layer. The data from the application layer is extracted here and manipulated as per the required format to transmit over the network. Protocols used in the Presentation Layer are JPEG, MPEG, GIF, TLS/SSL, etc.

Answer 141

A

Application Layer
At the very top of the OSI Reference Model stack of layers, we find the Application layer which is implemented by the network applications. These applications produce the data to be transferred over the network. This layer also serves as a window for the application services to access the network and for displaying the received information to the user. Protocols used in the Application layer are SMTP, FTP, DNS, etc.

Answer 142

A

TCP
Establishes a connection before sending data so that transmission is reliable. TCP confirms that data is received, retransmits lost segments, and checks for errors.
UDP
Does not establish a connection and does not confirm receipt or retransmit lost data (it carries only a basic checksum). This means some data may be lost during transmission.

Answer 143

A

An issue that needs to be addressed immediately and with as many resources as is required. Such an issue causes a full outage or makes a critical function of the product to be unavailable for everyone, without any known workaround.

Answer 144

A

Severe impact on TikTok end users: app functions are broken and severe experience issues are being encountered.
- 3 or more teams impacted, such as TCE, RDS, HDFS
- Quantifiable revenue or advertiser impact
- Security impact: risk to customer data, security breach, data loss, vulnerabilities, hack/attack, etc.

Answer 145

A

Weigh system-wide impact against single-user impact, and factor in urgency.

Answer 146

A

Responsibilities:

IMT is added to a P0 incident and begins tracking the incident timeline.
Ensures escalation to the correct technical teams based on the systems impacted.
Ensures that the incident is being addressed in a timely manner and drives escalations to team leads and managers.
Opens a fatal record.
Starts the incident analysis template to begin the incident report.
Tracks the incident details and drives the incident group until the impact is mitigated.
Adds all relevant data to the incident report and begins the post-incident review process.
Escalates security incidents to the appropriate security channel.

Answer 147

A

IT Service Management

Answer 148

A

Initial triage: join or create the on-call, review chat logs, request an issue summary, request POC updates, and ensure all escalation contacts needed to investigate the issue are engaged.
Manage incident: update the TTP incident thread every 15 minutes for critical issues, request regular updates from technical teams, and populate the IAT with as much data as possible.
Post incident: once mitigated, lower to P1, send a final message to the appropriate groups, create a Jira epic, and create the post-mortem doc.

Answer 149

A

identify - detect/log incident
analyze - categorize and prioritize incidents
respond - investigate, diagnose and resolve incidents
review - post mortem and improvements

Answer 150

A

identify incidents
document incidents
categorize incidents
assign ownership

Answer 151

A

incident coordinator
technical specialists
communication manager
process owner

Answer 152

A

service outage
security breach
human error
natural disasters

Answer 153

A

Responsible: Person(s) doing the task.
Accountable: Person with final decision-making authority.
Supporting: Person(s) providing support or resources.
Consulted: Person(s) providing input or feedback.
Informed: Person(s) kept updated on progress or decisions.

Answer 154

A

Network Traffic Drop in TTP1 OCI

Answer 155

A

CI/CD stands for Continuous Integration and Continuous Deployment/Delivery, automating code integration, testing, and deployment processes.

Answer 156

A

A: CI involves merging developer code changes into a shared repository multiple times a day, with automated builds and testing to detect integration issues early.

Answer 157

A

A: Continuous Delivery (CD) automates the release process, ensuring the application is always in a deployable state, but requires manual approval to deploy to production.

Answer 158

A

A: Continuous Deployment automates the release process entirely, deploying every change that passes automated tests to production without manual intervention.

Answer 159

A

Code Commit: Developers push code to a repository.
Build: The application is compiled, and dependencies are installed.
Test: Automated tests validate the code.
Deploy: The tested application is deployed to staging or production.

Answer 160

A

A: To ensure that changes do not break existing functionality or introduce bugs.

Answer 161

A

A: Jenkins, GitHub Actions, GitLab CI/CD, CircleCI, Travis CI, Azure DevOps, etc.

Answer 162

A

A: Version control (e.g., Git) tracks changes, enables collaboration, and integrates with CI/CD pipelines for automated builds and testing.

Answer 163

A

A: A strategy where a new version is rolled out to a small subset of users before full deployment to minimize risk.
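
One simple way to pick the "small subset of users" is to hash each user ID into a bucket from 0 to 99 and route buckets below the rollout percentage to the new version. This is a sketch of that idea only; the function name and user IDs are hypothetical, and real rollouts are often done at the load balancer or service mesh instead.

```python
import hashlib


def in_canary(user_id: str, percent: int) -> bool:
    """Deterministically place a user in the canary group for a given rollout %."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < percent


# Made-up user IDs.
users = [f"user-{n}" for n in range(1000)]
canary_users = [u for u in users if in_canary(u, 10)]
print(f"{len(canary_users)} of {len(users)} users routed to the canary")
```

Because the bucket depends only on the user ID, a user stays in the same group across requests, and raising the percentage only adds users to the canary without removing any.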

Answer 164

A

A: Reverting to a previous stable version in case of deployment failures.

Answer 165

A

A: To define and clarify roles and responsibilities within a project or process to avoid confusion.

Answer 166

A

Assign clear responsibilities for incident detection, escalation, resolution, and review.
Ensure accountability is established for critical decision-making.
Identify key stakeholders to keep informed during incidents.

Answer 167

A

Responsible: Incident responder or SRE.
Accountable: Incident manager or lead.
Supporting: System administrator or SME.
Consulted: Security team or product owner.
Informed: Leadership or affected customers.

Answer 168

A

Overlapping roles leading to confusion.
Lack of agreement on responsibilities.
Not keeping the matrix up to date with organizational changes.

Answer 169

A

A:

Incident Management focuses on restoring service as quickly as possible.
Problem Management identifies and resolves the underlying cause of incidents.

Answer 170

A

A: A set of best practices for IT service management that aligns IT services with business needs.

Answer 171

A

A: To act as a single point of contact (SPOC) for users to report incidents and request services.

Answer 172

A

A: A group responsible for evaluating and approving proposed changes to IT systems.

Answer 173

A

People: Stakeholders and roles.
Processes: Workflows and activities.
Products: Technology and tools.
Partners: Vendors and suppliers.

Answer 174

A

Mean Time to Resolve (MTTR).
Mean Time to Detect (MTTD).
Number of recurring incidents.
SLA compliance rate.

Answer 175

A

Reactive: Focuses on resolving incidents after they occur.
Proactive: Prevents incidents through monitoring, analysis, and improvement.