Pluralsight - Reliable Google Cloud Infrastructure: Design and Process Flashcards
SLI / SLO / SLA abbreviations
Service Level Indicators
- measurable and time-bound
- latency / requests per second
- availability etc
- it’s basically a KPI
Service Level Objectives
- the target value or range you set for an SLI over a period of time (e.g. 99.9% availability per month)
- must be achievable
Service Level Agreement
- contract to deliver a service that specifies consequences if the service isn’t delivered
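To make the three terms concrete, here is a minimal sketch (with made-up numbers) showing an availability SLI computed from request counts and checked against a 99.9% SLO; the SLA would then be the contract describing what happens if that target is missed:

```python
# Hypothetical numbers for illustration only.
total_requests = 1_000_000
successful_requests = 999_412

sli_availability = successful_requests / total_requests  # SLI: the measured indicator
slo_target = 0.999                                       # SLO: the target for the SLI
error_budget = 1 - slo_target                            # 0.1% of requests may fail

print(f"SLI: {sli_availability:.4%}  SLO met: {sli_availability >= slo_target}")
print(f"Error budget used: {(1 - sli_availability) / error_budget:.0%}")
```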
Key questions to ask
Who are the users?
Who are the developers?
Who are the stakeholders?
What does the system do?
What are the main features?
Why is the system needed?
When do the users need and/or want the solution?
When can the developers be done?
How will the system work?
How many users will there be?
How much data will there be?
Stateless service
A stateless service stores no data of its own; it gets its data from stateful backing services.
Stateless services work in combination with stateful ones, i.e. those attached to persistent disks for example.
A stateless backend can, for example, scale up or down depending on the demand directed to it from the front end by the load balancer.
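A minimal sketch of the idea, assuming a Memorystore (Redis) instance at a hypothetical address: the service keeps no state of its own, so any replica behind the load balancer can handle any request.

```python
import os
import redis  # state lives in Memorystore (Redis), not in the service itself

# Hypothetical host; in practice it comes from configuration.
cache = redis.Redis(host=os.environ.get("REDIS_HOST", "10.0.0.3"), port=6379)

def handle_request(user_id: str) -> str:
    """Stateless handler: every replica reads/writes the same external store."""
    visits = cache.incr(f"visits:{user_id}")  # shared counter in the backing service
    return f"user {user_id} has visited {visits} times"
```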
12 factor app
- Codebase - should be tracked in a version control environment
- Dependency declaration / isolation (dependency tracking is done by tools such as Maven for Java or pip for Python). Dependencies can be isolated by packaging the app into a container; Container Registry can store the images.
- Configuration - store config in the environment; every app has different configuration environments: dev, test, production (see the sketch after this list)
- Backing services - e.g. DBs and caches should be accessed by URL
- Build, release, run - software development process should be split into these parts.
- Processes - execute app as one or more stateless process
- Port-binding - services should be exposed to a port
- Concurrency - apps should be able to scale up/down
- Disposability - apps should start fast, shut down gracefully, and be able to handle failures (instances can be disposed of and replaced)
- Dev/prod parity - keep them as similar as possible
- Logs - awareness of health of your apps
- Admin processes - usually one-off or automated
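A minimal sketch of the configuration factor (the variable names are hypothetical): config is read from the environment, so the same build runs unchanged in dev, test, and production, and backing services are reached by URL.

```python
import os

# Each environment (dev/test/prod) sets its own values; the code stays identical.
DATABASE_URL = os.environ["DATABASE_URL"]        # backing service accessed by URL
CACHE_URL = os.environ.get("CACHE_URL", "")      # optional backing service
PORT = int(os.environ.get("PORT", "8080"))       # port-binding factor

print(f"Starting on port {PORT}, database at {DATABASE_URL}")
```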
REST - representational state transfer
Design of microservices based on REST
Done to achieve loosely coupled, independent services (i.e. services that depend on each other as little as possible and interact only through well-defined interfaces).
Versioned contracts concept - each time you change your app you create a new version, but you keep the contract of the older version in case other apps rely on it –> so you ensure backward compatibility
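A minimal sketch of versioned contracts using Flask (the routes and fields are hypothetical): /v1/ keeps serving existing clients while /v2/ changes the response shape.

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/v1/users/<user_id>")
def get_user_v1(user_id):
    # Old contract kept alive for clients that still depend on it.
    return jsonify({"id": user_id, "name": "Ada Lovelace"})

@app.route("/v2/users/<user_id>")
def get_user_v2(user_id):
    # New contract: the name is split into two fields; v1 callers are unaffected.
    return jsonify({"id": user_id, "first_name": "Ada", "last_name": "Lovelace"})

if __name__ == "__main__":
    app.run(port=8080)
```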
At low levels apps communicate via HTTPS using text-based payloads:
- Client makes GET, POST, PUT or DELETE requests
- Body of the request is formatted as JSON or XML
- Results returned as JSON, XML or HTML
REST supports loosely coupled logic, but it needs a lot of engineering: if clients have customised requirements, custom REST APIs must be built to correctly call on the requested elements.
- A uniform interface is key
- Paging should be consistent
- URIs should be consistent
Batch APIs
- a collection of resource representations is returned to the requester in the form of JSON, so many resources can be handled in one call
HTTP - it is a protocol: Design of microservices based on HTTP
HTTP requests consist of:
- VERB: GET, PUT, POST, DELETE
- Uniform Resource Identifier (URI)
- Header (metadata about the msg)
- Request body
GET - retrieve a resource
POST - request a creation of a new resource
PUT - create or fully replace a resource at a known URI (idempotent, unlike POST)
DELETE - remove a resource
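A minimal client-side sketch of these verbs using Python's requests library against a hypothetical https://api.example.com/pets collection (the endpoint and fields are made up):

```python
import requests

BASE = "https://api.example.com"  # hypothetical API endpoint

# GET: retrieve a collection (or a single resource)
pets = requests.get(f"{BASE}/pets", params={"pageSize": 10}).json()

# POST: ask the server to create a new resource; the server assigns the ID/URI
created = requests.post(f"{BASE}/pets", json={"name": "Rex", "kind": "dog"}).json()

# PUT: create or fully replace the resource at a known URI (idempotent)
requests.put(f"{BASE}/pets/{created['id']}", json={"name": "Rex", "kind": "dog", "age": 3})

# DELETE: remove the resource
requests.delete(f"{BASE}/pets/{created['id']}")
```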
API / services - each API makes available some collection of data
OpenAPI
- Each Google Cloud service exposes a REST API
- Service endpoint example: https://compute.googleapis.com
- collections include instances, instanceGroups, instanceTemplates
- verbs (GET…)
Use OpenAPI to expose apps to clients (i.e. make the apps available to the clients).
gRPC
- binary protocol developed at Google
- useful for internal microservice communications
- based on HTTP/2
- supported by many gcloud services such as the global load balancers, Cloud Endpoints for microservices, and GKE (using the Envoy proxy)
————————————————-
Tools for managing APIs
Cloud Endpoints
- helps develop, deploy, and manage APIs
Apigee
- built for enterprises (for on/off premises, any public cloud usage)
- Both of the above provide user authentication, monitoring, and security, and support both OpenAPI and gRPC
Gcloud services are also APIs in some way:
Compute Engine API:
- has collections (instances, instance groups, networks, subnetworks etc)
– for each collection various methods are used to manage the data
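A minimal sketch of calling the Compute Engine API's instances collection over REST with Application Default Credentials (the zone is just an example; the google-auth package is assumed):

```python
from google.auth import default
from google.auth.transport.requests import AuthorizedSession

# Application Default Credentials, e.g. from `gcloud auth application-default login`.
credentials, project_id = default(
    scopes=["https://www.googleapis.com/auth/compute.readonly"]
)
session = AuthorizedSession(credentials)

# GET on the instances collection of the Compute Engine API.
url = (
    "https://compute.googleapis.com/compute/v1/"
    f"projects/{project_id}/zones/us-central1-a/instances"
)
response = session.get(url)
for instance in response.json().get("items", []):
    print(instance["name"], instance["status"])
```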
Continuous Integration
- Code is written, pushed to GitHub
- Unit tests are run and must pass (see the test sketch after this list)
- Build the deployment package - create a Docker image from the Dockerfile
- The image is saved in the Container Registry where it can be deployed
- Additional step: Quality analysis by tools such as SonarQube
Note: each microservice should have its own repo
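A minimal sketch of the kind of unit test the CI step above might run (the function under test is hypothetical; a pytest step in Cloud Build would execute it):

```python
# test_pricing.py - executed by the CI pipeline, e.g. a pytest step in Cloud Build.

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    assert apply_discount(100.0, 10) == 90.0

def test_no_discount():
    assert apply_discount(50.0, 0) == 50.0
```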
CI provided by Google:
Cloud Source Repository
- like GitHub on gcloud
- these are managed Git repos
Cloud Build
- building software quickly
- a managed Docker build service, an alternative to running the docker build command
- gcloud builds submit --tag gcr.io/project_id/img_name
- it can fetch dependencies, run unit tests…
- executes build steps you define, like running commands in a script
Build triggers
- watch your repo and start a container build when changes are made
- support Maven, custom builds, and Docker
- the triggers start builds automatically when changes are made to the source code
Container/Artifact Registry
- provides a secure private Docker image repo on Gcloud. The images are stored in Cloud Storage. Can use IAM rules to set who can access.
- can use docker push or docker pull
Binary Authorization
- allows you to enforce deploying only trusted containers into GKE
- for example an ‘attestor’ verifies that the image comes from a trusted repo
- can use Kritis Signer for vulnerability assessment
Note: You can create a VM to test the container image. In the VM options select the container option and, as the name, use the container name from the ‘History’ in Cloud Build.
Note: Build configs can be specified in the Dockerfile or Cloud Build file.
Note: To connect to Artifact Registry, you will need to run git config --global to set your email and name.
Storage options
Relational
1. Cloud SQL - vertical scaling (increase the machine size); regional scalability
2. Cloud Spanner - horizontal scaling (eg adding nodes); global scalability
3. AlloyDB - analytical processing, gen AI
File
1. Filestore - latency sensitive - unstructured data
NoSQL
1. Firestore - scaling with no limits, good for user profiles, game state; non-relational data analytics without caching
2. Cloud Bigtable - horizontal scaling - eventual consistency (updates are not immediately seen by all readers); good for heavy read/write workloads, e.g. financial services; low latency, unlike BigQuery
Object
1. Cloud Storage - scaling with no limits, binary/object data; unstructured data
Block
1. Persistent Disk (snapshots are backups of persistent disk)
Warehouse
1. BigQuery
In memory
1. Memorystore - vertical scaling - eventual consistency; caching, gaming; non-relational data analytics with caching
HTTPS + Cloud CDN
When global load balancers are used, it’s best to use an SSL certificate to protect data.
With an HTTPS LB you should use Cloud CDN, which caches content closest to the user.
Network Connectivity (VPNs)
VPC Peering
- good for connecting different VPCs together, regardless of whether they are in the same org or not, but the subnet ranges CANNOT overlap
Cloud VPN
- connects your on-premises network to Gcloud VPC through an IPsec VPN Tunnel
- one VPN gateway encrypts the traffic, another decrypts
- good for low volumes of data
- Classic VPN
– has 99.9% availability
– single interface
– single external IP
– static routes are available
- HA VPN
– has 99.99% SLA availability
– 2 interfaces
– 2 external IPs
– must use BGP routing = dynamic routing –> can create active/active or active/passive routing configs
- static routes (Classic VPN) or dynamic routes using Cloud Router - dynamic routing lets you change which tunnel the traffic goes through without actually changing the configuration
Note:
Besides VPN, Dedicated and Partner Interconnect can be used. Dedicated - allows a direct connection at a colocation facility. Partner - for lower bandwidth requirements. These options allow the use of internal IPs. Must use Cloud Router = BGP (dynamic routing)! Can also use Private Google Access.
Gcloud Deployment methods (GKE vs VMs vs App Engine etc)
If you have specific machine requirements
- use VMs
No specific OS requirements, do you need containers?
- YES: use GKE (Kubernetes) if you want to customise; Cloud Run if not - then Google manages LBs, autoscaling, clusters, health checks
No specific OS and no containers, is your service event-driven?
- YES: Cloud Functions
- NO: App Engine (LBs, autoscaling, all the infrastructure - are all managed by Google, you just focus on the code)
Dockerfiles in:
App Engine
Kubernetes
Cloud Run
App Engine
- when we have a Dockerfile - can build it:
docker build -t test-python . –> dot at the end indicates the build should be done in the cwd
- Create an App Engine application:
gcloud app create --region=us-west1
- Deploy the image:
gcloud app deploy --version=one --quiet
- Now if you go to App Engine, you will see a link in the top right corner –> if clicked, the app will run on App Engine in a new window
- If I create a new version, App Engine will replicate this too; a new version can be created with the same deploy command as above after I change whatever it is that I want to change. Can add the --no-promote flag, which ensures that the first version is still the one running, not v2 - maybe we just want to test v2 for now.
- In the ‘Versions’ section can click ‘Split Traffic’ and divert traffic to whichever version we want.
Kubernetes
- Once you create a cluster from the console, connect to it either through console or cloud shell:
gcloud container clusters get-credentials cluster-1 --zone us-central1-c --project project_name
- Show machines in the cluster:
kubectl get nodes
- We will need a new file called kubernetes-config.yaml with certain settings
- To push the image:
gcloud builds submit --tag us-west1-docker.pkg.dev/$DEVSHELL_PROJECT_ID/devops-demo/devops-image:v0.2
- Once the image is built, copy the image name that Cloud Shell gives and paste it into the images part of the kubernetes-config.yaml file
- Then we apply the changes:
kubectl apply -f kubernetes-config.yaml
- Can check:
kubectl get pods / kubectl get services
—————————————
Cloud Run
- Create a new image:
gcloud builds submit --tag us-west1-docker.pkg.dev/$DEVSHELL_PROJECT_ID/devops-demo/cloud-run-image:v0.1 .
- In the console, go to Cloud Run –> Create Service
- Create a service based on the most recent image in the ‘Select image’ part
- make the service publicly available by ticking ‘Allow unauthenticated invocations’
- the service should start and we can click the newly available link
Cascading Failures
Occur when one system’s failure causes other systems to overload and subsequently fail
Circuit Breaker
Stops sending requests to an instance if it fails several times
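A minimal sketch of the pattern (thresholds and timings are made up): after a few consecutive failures the breaker 'opens' and further calls fail fast instead of adding load to the struggling instance; after a cool-down it lets one trial request through.

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: opens after max_failures, retries after reset_seconds."""

    def __init__(self, max_failures=3, reset_seconds=30):
        self.max_failures = max_failures
        self.reset_seconds = reset_seconds
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_seconds:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial request
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            raise
        self.failures = 0  # success closes the circuit again
        return result
```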
Cloud Identity-Aware Proxy (Cloud IAP)
Provides managed access to applications running on App Engine, Compute Engine, and GKE. Access to web applications on gcloud without a VPN.
gcloud auth login vs gcloud auth activate-service-account --key-file=[path to key]
The first command is used to authenticate with user credentials; the second is to log in as a service account.
VPC security best practices
- Don’t create a public IP unless you absolutely need one. Use other available options instead (direct interconnect etc)
- use Cloud NAT for egress to the internet from internal machines
- use Private Google Access:
gcloud compute networks subnets update subnet-b --enable-private-ip-google-access
Cloud Endpoints
API management gateway that lets you deploy and manage APIs on any Google Cloud backend.
It uses Identity Platform for authentication if needed.
Uses HTTPS, restrict to TLS
Google Cloud Armor
- provides additional DDoS protection
- Cloud DNS also helps, as do the load balancers
- supports layer 7 Web Application Firewall (WAF) rules:
– rules that help prevent common attacks such as SQL injection
– identify threats using request headers, IPs etc
Encryption
- Google uses data encryption keys (DEKs) = symmetric AES-256 keys
- the keys are encrypted using key encryption keys (KEKs)
- root keys are controlled in Cloud KMS - it can be used for custom key management
- Customer-supplied encryption keys (CSEK) can be used too - the key is stored on-premises and not in the cloud –> can only be used with Cloud Storage and Compute Engine
- the Data Loss Prevention API finds and masks/redacts sensitive data
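A minimal sketch of the DEK/KEK (envelope encryption) idea, wrapping a locally generated data key with a key held in Cloud KMS; the project, key ring, and key names are placeholders, and the google-cloud-kms and cryptography packages are assumed:

```python
import os
from google.cloud import kms
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# 1. Generate a local AES-256 data encryption key (DEK) and encrypt the data with it.
dek = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)
ciphertext = AESGCM(dek).encrypt(nonce, b"sensitive data", None)

# 2. Wrap (encrypt) the DEK with a key encryption key (KEK) held in Cloud KMS.
client = kms.KeyManagementServiceClient()
kek_name = client.crypto_key_path("my-project", "global", "my-keyring", "my-kek")  # placeholders
wrapped_dek = client.encrypt(request={"name": kek_name, "plaintext": dek}).ciphertext

# Store ciphertext, nonce, and wrapped_dek; discard the plaintext DEK.
```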
Rolling updates without downtime
- instance groups allow this feature by default
- Kubernetes also does by default –> just specify the replacement Docker image
- App Engine: rolling updates are completely automated
- use blue/green deployment when you don’t want two versions to be running simultaneously: blue is the current software version, green the updated version.
– for Compute Engine you can use DNS to migrate requests
– for Kubernetes you can configure your service to route to the new pods using labels
– for App Engine you can simply split traffic
- Canary deployment - you send a small percentage of your traffic to the updated environment and closely monitor it (see the sketch below)
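A minimal sketch of the traffic-splitting idea behind canary deployments (the weights are arbitrary); managed services such as App Engine do this for you, this is only to illustrate the concept:

```python
import random

# 5% of requests go to the canary version, 95% stay on the stable version.
BACKENDS = {"stable-v1": 0.95, "canary-v2": 0.05}

def pick_backend() -> str:
    """Weighted random choice between the stable and canary versions."""
    names = list(BACKENDS)
    return random.choices(names, weights=[BACKENDS[n] for n in names], k=1)[0]

# Route a batch of requests and see how much traffic the canary receives.
routed = [pick_backend() for _ in range(10_000)]
print({name: routed.count(name) for name in BACKENDS})
```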
Costs
- consider Spot VMs
- consider sustained use discounts on VMs
- don’t overestimate the NEEDED disk space
- don’t overuse GKE clusters, creating more than you need
- don’t use external IPs if not needed
Lab:
Starting to work with Terraform
- You will need a configuration file - a file that describes the infrastructure in Terraform
touch instance.tf –> this will be the config file where we will create a VM instance
- Add the following into the file (HCL code):
resource "google_compute_instance" "terraform" {
  project      = "qwiklabs-gcp-01-27d195f2ecf2"
  name         = "terraform"
  machine_type = "e2-medium"
  zone         = "us-central1-c"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = "default"
    access_config {
    }
  }
}
- Initialise local Terraform settings:
terraform init
- Create an execution plan - tells you step by step what will be done if I ‘apply’ the configs:
terraform plan
- Apply the configs:
terraform apply
- Once the config is applied, Terraform creates a new file (terraform.tfstate) where it tracks what Terraform is responsible for
- To see the current state of the config:
terraform show
Lab:
1. Build, change, and destroy infrastructure with Terraform.
2. Create Resource Dependencies.
3. Provision infrastructure.
- Create config
touch main.tf
- Add info to the config file from the lab
- Initialise:
terraform init
- Create resources:
terraform apply
- See the latest state:
terraform show
- Update main.tf and apply the changes to see the new version:
terraform apply
- Can also destroy everything in Terraform:
terraform destroy –> destroys step by step, e.g. a VPC network cannot be deleted while instances still exist within it, so the instances will be deleted first.
Trying out dependencies (add an external static IP to the instance): Terraform knows that the IP must be created before it can be attached, so it knows what depends on what (an implicit dependency)
- Change the main.tf again to include static IP and attach it to the VM instance
- Check out what will happen and save the plan to a file called static_ip:
terraform plan -out static_ip
- Use the new file to execute the created plan:
terraform apply "static_ip"
———————–
When dependencies are not explicitly seen by Terraform (because they may be in different files):
- the depends_on argument is used
- eg if an instance expects to use a Cloud Storage bucket, but we configure the bucket inside OUR application code, which isn’t visible to Terraform, use the explicit depends_on dependency rather than the implicit dependency
- run terraform apply once the changes about the bucket and the new instance that relies on it are made to the main.tf file
————————————
Provision infrastructure
- Terraform uses provisioners to upload files, run shell scripts, or install and trigger other software like configuration management tools.
- Example of adding a ‘local-exec’ provisioner within a VM instance resource:
resource "google_compute_instance" "vm_instance" {
  name         = "terraform-instance"
  machine_type = "e2-micro"
  tags         = ["web", "dev"]

  provisioner "local-exec" {
    command = "echo ${google_compute_instance.vm_instance.name}: ${google_compute_instance.vm_instance.network_interface[0].access_config[0].nat_ip} >> ip_address.txt"
  }
  …
- Terraform treats provisioners differently from other arguments. Provisioners only run when a resource is created, but adding a provisioner does not force that resource to be destroyed and recreated.
- to tell Terraform to recreate the instance:
terraform taint google_compute_instance.vm_instance
then
terraform apply
- Can also create destroy provisioners that run only during a destroy operation. These are useful for performing system cleanup, extracting data, etc.
Lab:
Modules in Terraform
Definitions:
Modules are used to help manage configs when one block in one config file relates to another block in a different file, and so on. Modules make this process easier.
For example, you might create a module to describe how all of your organization’s public website buckets will be configured, and another module for private buckets used for logging applications.
A Terraform module is a set of Terraform configuration files in a single directory. Even a simple configuration consisting of a single directory with one or more .tf files is a module. When you run Terraform commands directly from such a directory, it is considered the root module.
I can call on a different module (in a different repo) from my current module - that other module will be a ‘child module’.
In many ways, Terraform modules are similar to the concepts of libraries, packages, or modules found in most programming languages.
Using a Terraform Registry module
- Locate the module on the Registry
- Note the key arguments of the module include source and version:
module "network" {
  source  = "terraform-google-modules/network/google"
  version = "3.3.0"

  # insert the 3 required variables here
}
- Modules should have 3 variables as well, eg:
- network_name: The name of the network being created
- project_id: The ID of the project where this VPC will be created
- subnets: The list of subnets being created
- I can define the variables in the variables.tf file like:
variable "project_id" {
  description = "The project ID to host the network in"
  default     = "qwiklabs-gcp-03-82d68f9a40fb"
}
- I can call on the variable from main.tf:
module "test-vpc-module" {
  source  = "terraform-google-modules/network/google"
  version = "~> 6.0"

  project_id   = var.project_id # Replace this with your project ID in quotes
  network_name = var.network_name
  mtu          = 1460
}
- Can also define output variables in outputs.tf
- Run terraform init
- Run terraform apply
Build a module
- create a module to manage Cloud Storage buckets used to host static websites
- A typical file structure of a module is:
├── LICENSE
├── README.md
├── main.tf
├── variables.tf
├── outputs.tf
Lab:
Managing Terraform State (unfinished)
Info about infrastructure state is stored in terraform.tfstate
Terraform expects that each remote object is bound to only one resource instance.
Terraform also stores metadata (eg info about dependencies).
Terraform provides state locking, so at a point in time only one person can run Terraform. This is called remote locking (in the remote state).
Each Terraform configuration has an associated backend that defines how operations are executed and where persistent data such as the Terraform state is stored.
Certain backends support multiple named workspaces, which allows multiple states to be associated with a single configuration. The configuration still has only one backend, but multiple distinct instances of that configuration can be deployed without configuring a new backend or changing authentication credentials.
Working with backends
1. When a backend is created, Terraform asks if you want to move your current state to the new backend. It is recommended that you also manually back up your state; you can do this by simply copying your terraform.tfstate file to another location.
Bringing existing infrastructure to Terraform