Operational Excellence Flashcards
Design principles
- Perform operations as code: define infrastructure as code and automate procedures with scripts.
- Make frequent, small, reversible changes.
- Refine operations procedures frequently.
- Anticipate failure: test failure scenarios, fail fast.
- Learn from all operational failures: document and share.
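A minimal sketch of two of the principles above ("small, reversible changes" and "fail fast"), assuming a configuration held as a plain dictionary. The names `apply_reversible_change` and `healthy` are hypothetical and only illustrate the idea; they are not part of any AWS API.

```python
from typing import Callable, Dict

Config = Dict[str, str]


def apply_reversible_change(
    config: Config,
    change: Config,
    health_check: Callable[[Config], bool],
) -> Config:
    """Apply one small change; return the previous config if the check fails."""
    previous = dict(config)           # snapshot kept so the change is reversible
    candidate = {**config, **change}  # the small change, applied to a copy
    if not health_check(candidate):   # fail fast instead of running degraded
        return previous
    return candidate


def healthy(cfg: Config) -> bool:
    """Hypothetical check: the workload needs at least one replica."""
    replicas = cfg.get("replicas", "0")
    return replicas.isdigit() and int(replicas) > 0


if __name__ == "__main__":
    current = {"replicas": "2", "version": "1.0"}
    current = apply_reversible_change(current, {"version": "1.1"}, healthy)
    current = apply_reversible_change(current, {"replicas": "0"}, healthy)
    print(current)  # {'replicas': '2', 'version': '1.1'}
```

The second change fails the check, so the configuration stays at the last known good state.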
Practice areas
- Organization
- Prepare
- Operate
- Evolve
Organization practice area
Teams need to understand the organization's priorities, its structure, and how the organization supports team members, so that they can support business outcomes.
Organization Priorities
- Evaluate external customer needs.
- Evaluate internal customer needs.
- Evaluate governance requirements.
- Evaluate compliance requirements.
- Evaluate threat landscape.
- Evaluate tradeoffs.
- Manage benefits and risks.
Organization Operating model
Understand roles, responsibilities, and how decisions are made: the models that govern how the company operates.
Operating model 2 by 2 representations
Understand the relationships between teams in your environment: who does what.
Operating model - Fully separated model
Application and platform are managed by fully separate teams. Work is passed between teams through mechanisms such as work requests, work queues, tickets, or an IT service management (ITSM) system.
Operating model - Separated AEO and IEO with centralized governance
Here we follow the “you build it, you run it” methodology. The engineers are responsible for the engineering and operation of their workload. To organize the teams, you should use AWS Organizations and AWS Control Tower. The platform engineering team provides a standardized set of services (e.g. development or monitoring tools) and access to cloud services to the application team. The AWS Service Catalog can be used to govern the tooling.
PRO
Standards are distributed, provided, or shared
Strong feedback loop
Platform team supports Application team
Adopting standards may reduce reviews to enter production
CON
For changes or additions, the application team always needs to consult the platform team
AEO
Application Engineering and Operations
IEO
Infrastructure Engineering and Operations
Operating model - Separated AEO and IEO with centralized governance and a Service Provider
Similar to the centralized governance model, but you offload some operations tasks, such as patching and updating, to Managed Services. This service is run by AWS, which takes care of these tasks for you.
PRO
Offload “boring” operational tasks
Take advantage of your providers’ standards, best practices, processes, and expertise
Latest service offerings
CON
Does not address the bottlenecks and delays created by transition of tasks between teams
Operating model - Separated AEO and IEO with centralized governance and an internal service provider consulting partner
This model also establishes the “you build it, you run it” methodology. Unlike the previous model, it adds a Cloud Operations and Platform Enablement (COPE) team that supports the other teams on cloud-related topics. It provides a forum to ask questions, discuss needs, and identify solutions. The platform engineering team builds the core shared platform capabilities, governed via the AWS Service Catalog.
PRO
Adopting more DevOps culture
Enabling cloud transformation for teams, establishes centralized cloud governance, and defines account and organization management standards
Application team gets a CI/CD pipeline from COPE
Removes barriers that slow application team adoption of beneficial cloud capabilities
CON
Involves substantial effort to facilitate cloud adoption and establish organization standards
CCoE
Cloud Center of Enablement
COPE
Cloud Operations and Platform Enablement
Operating model - Separated AEO and IEO with decentralized governance
In this model, application engineers and developers perform both the engineering and the operation of their workloads, for platform and application alike. Standards are still distributed by the platform team, but the application teams have more freedom to engineer and operate their own capabilities in support of their workload.
PRO
Fewer constraints
More freedom to choose their own tooling
CON
More responsibility falls on application engineers
Risk of rework is higher
Policies must be enforced (governance via AWS Organizations and AWS Control Tower)
Operating model - relationship and ownership - Resources have identified owners
Understand who has ownership of each application, workload, platform, and infrastructure component, what business value is provided by that component, and why that ownership exists.
1. Define forms of ownership and how they are assigned
2. Define who owns an organization, account, collection of resources, or individual components
3. Capture ownership in the metadata for the resources
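Step 3 can be sketched as a small audit over resource metadata. This is an illustrative sketch only: it assumes ownership is recorded as key/value tags, and the tag keys (`Owner`, `Team`, `CostCenter`), the `Resource` class, and the sample inventory are all hypothetical. In a real account the inventory would come from your own tooling rather than a hard-coded list.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Sequence

# Hypothetical tag keys; use whatever your organization standardizes on.
REQUIRED_OWNERSHIP_TAGS = ("Owner", "Team", "CostCenter")


@dataclass
class Resource:
    """A resource reduced to its ID and its metadata tags."""
    resource_id: str
    tags: Dict[str, str] = field(default_factory=dict)


def missing_ownership_tags(
    resource: Resource,
    required: Sequence[str] = REQUIRED_OWNERSHIP_TAGS,
) -> List[str]:
    """Return the required tag keys that are absent or blank on the resource."""
    return [key for key in required if not resource.tags.get(key, "").strip()]


def audit_ownership(resources: Iterable[Resource]) -> Dict[str, List[str]]:
    """Map each non-compliant resource ID to its missing ownership tags."""
    report = {}
    for resource in resources:
        missing = missing_ownership_tags(resource)
        if missing:
            report[resource.resource_id] = missing
    return report


if __name__ == "__main__":
    inventory = [
        Resource("app-db", {"Owner": "alice", "Team": "payments", "CostCenter": "42"}),
        Resource("log-bucket", {"Owner": "bob"}),
    ]
    for resource_id, missing in audit_ownership(inventory).items():
        print(f"{resource_id}: missing {', '.join(missing)}")
    # prints: log-bucket: missing Team, CostCenter
```

Treating a blank value the same as a missing key keeps placeholder tags from counting as identified ownership.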