2. Google Cloud Fundamentals Flashcards
Google Cloud Global Infrastructure (regions, zones, locations)
- 40 regions
- 121 zones
- 187 network edge locations (PoP - point of presence locations)
- 200+ countries and territories
Google Infrastructure: multi-regions, regions & zones
Multi-regions:
- contain 2+ regions
- improve availability by keeping data in more than one region
- E.g. Cloud Spanner has multi-region configurations that replicate your data across several zones and several regions, which is good for availability and low-latency reads
Regions:
- comprise several zones
Zones:
- smallest entity
- they are called deployment areas for Google Cloud resources within a region
VPC (Virtual private cloud)
Resources can be in different zones but on the same subnet.
The size of the subnet can be increased by expanding the range of IP addresses
VPCs provide a global distributed firewall.
Recommendation: don’t use the default network. It is too big, too broad (it spans too many regions) and insecure (only the default firewall rules exist, with no restrictions on the internal network).
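As a rough illustration of what "expanding the range of IP addresses" means, the sketch below uses Python's standard ipaddress module. The 10.128.0.0/24 range and the /20 target are made-up example values, not taken from any real project.

```python
import ipaddress

# Hypothetical starting range for a subnet (example value only).
original = ipaddress.ip_network("10.128.0.0/24")

# Expanding a subnet means shortening the prefix, e.g. /24 -> /20,
# so the new range still contains every address of the old one.
expanded = original.supernet(new_prefix=20)

print(original, "holds", original.num_addresses, "addresses")
print(expanded, "holds", expanded.num_addresses, "addresses")
print("old range is inside new range:", original.subnet_of(expanded))
```

Resources already in the subnet keep their addresses, because the old range sits entirely inside the new one.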
Compute Service Options: Compute Engine
- Characteristics
- IaaS (offers the greatest flexibility out of all the compute service options)
- offers VMs as instances
- can choose zone and region for VM
- can choose operating system for VM
- can use private & public images
Compute Service Options: Compute Engine
- How to manage multiple instances?
Use instance groups; with a managed instance group, autoscaling adds or removes capacity as load changes
Compute Service Options: Google Kubernetes Engine (GKE)
- Characteristics
- Container-orchestration system for automating the deployment, scaling and management of containers
- CaaS (Containers as a Service)
- Set of APIs to deploy containers on a set of nodes called a cluster. In Kubernetes a node is a computing instance such as a machine; in Google Cloud a node is specifically a VM running in Compute Engine
- The smallest unit that can be deployed by Kubernetes is a Pod - generally one pod will include one container (sometimes 2 if they are closely linked)
- Each Pod gets its own unique IP address
- The kubectl command lets you manage Pods and the containers inside them
- To see the list of pods: kubectl get pods
- Kubernetes creates a Service with a fixed IP for a group of Pods
- In GKE the load balancer is created as a network load balancer with an external IP, attached to the Service in front of the Pods (kubectl expose deployments nginx --port=80 --type=LoadBalancer)
- A Service has its own IP too, which is used to refer to the ‘cluster’ of Pods. This is because individual Pod IPs change all the time, but the Service IP is fixed. Another Service/group of Pods can use it to communicate even when the Pods behind it change
- kubectl scale - changes the no. of Pods depending on the requirements
- kubectl get deployments - shows how many Pods each Deployment has and what they are
- kubectl apply -f nginx-deployment.yaml - applies the changes made in the config YAML file, e.g. a changed no. of Pods
- kubectl get services - shows the public IP of the Service
- kubectl rollout - manages the rollout of a new version of the app
- GKE consists of multiple machines/Compute Engine instances, grouped together under a ‘cluster’.
- GKE adds advantages such as: load balancing, auto scaling, auto upgrades, node auto-repair, logging and monitoring.
- gcloud container clusters create k1 - creates a GKE cluster named k1, i.e. starts Kubernetes on a cluster in GKE
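To make the "change the no. of Pods in the config file and apply it" idea concrete, here is a minimal sketch that builds a Deployment manifest as a Python dict and prints it as JSON (kubectl apply -f accepts JSON as well as YAML). The name nginx-deployment, the app=nginx label and the nginx image are illustrative assumptions; the contents of the original nginx-deployment.yaml are not shown in these notes.

```python
import json


def nginx_deployment(replicas):
    """Build a minimal Deployment manifest as a plain dict."""
    labels = {"app": "nginx"}  # assumed label, used by the selector below
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "nginx-deployment"},
        "spec": {
            "replicas": replicas,  # the no. of Pods Kubernetes should keep running
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "nginx",
                            "image": "nginx",  # example image, no pinned version
                            "ports": [{"containerPort": 80}],
                        }
                    ]
                },
            },
        },
    }


before = nginx_deployment(replicas=3)
after = nginx_deployment(replicas=5)
print(json.dumps(after, indent=2))
```

Only spec.replicas differs between the two manifests; applying the second one asks Kubernetes to bring the Pod count from 3 to 5.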
GKE added features (unlike basic Kubernetes)
- Load balancing for Compute Engine instances
- Node pools to designate subsets of nodes within a cluster for additional flexibility
- Automatic scaling of your cluster’s node instance count
- Automatic upgrades for your cluster’s node software
- Node auto-repair to maintain node health and availability
- Logging and Monitoring with Cloud Monitoring for visibility into your cluster
Cluster definition
Group of nodes = group of Compute Engine instances.
A cluster consists of at least one cluster master machine and multiple worker machines called nodes (compute engine instances).
Lab:
Create new GKE cluster - command & comments
- To create a new cluster:
gcloud container clusters create --machine-type=e2-medium --zone=us-central1-a lab-cluster
- After the cluster is created, you need authentication credentials to interact with it. Obtain them by running:
gcloud container clusters get-credentials lab-cluster
Once you have the credentials, you can deploy a containerised application to the cluster.
GKE uses Kubernetes objects (for example Pods, Services, Deployments and ConfigMaps) to create and manage your cluster’s resources. Kubernetes provides the Deployment object for deploying stateless applications like web servers. Service objects define rules and load balancing for accessing your application from the internet.
There are different types of services, such as ClusterIP (internal service), NodePort (exposes the service on each node’s IP at a static port), and LoadBalancer (provisions an external IP address and distributes traffic to the service).
- To create a new Deployment hello-server from the hello-app container image, run the following kubectl create command:
kubectl create deployment hello-server --image=gcr.io/google-samples/hello-app:1.0
- To create a Kubernetes Service, which is a Kubernetes resource that lets you expose your application to external traffic, run the following kubectl expose command:
kubectl expose deployment hello-server --type=LoadBalancer --port 8080
- To inspect the hello-server Service, run kubectl get:
kubectl get service
- To view the application from your web browser, open a new tab and enter the following address, replacing [EXTERNAL-IP] with the EXTERNAL-IP for hello-server:
http://[EXTERNAL-IP]:8080
- To delete the cluster, run the following command:
gcloud container clusters delete lab-cluster
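For comparison with the kubectl expose step in the lab, the sketch below builds a roughly equivalent Service manifest by hand. It assumes the Pods carry the label app=hello-server and that the container listens on port 8080; the helper name load_balancer_service is invented for this example.

```python
import json


def load_balancer_service(name, port):
    """Build a Service manifest of type LoadBalancer as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",       # provisions an external IP
            "selector": {"app": name},    # assumed Pod label
            "ports": [{"port": port, "targetPort": port}],
        },
    }


service = load_balancer_service("hello-server", 8080)
print(json.dumps(service, indent=2))
```

Changing type to ClusterIP or NodePort gives the other two Service types described in the lab notes.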
Compute Service Options: App Engine
- Characteristics
- PaaS
- fully managed, serverless platform for developing and hosting web applications at scale
- Google manages the resources your application needs to run, so on-demand scaling is built in
- integrates with Web Security Scanner to identify threats
Compute Service Options: Cloud Functions
- Characteristics
- FaaS (Function as a Service)
- A function is triggered when an event being watched (monitored) is fired
- Serverless execution environment for building and connecting cloud services
- Cloud Functions can be written in JavaScript (Node.js), Python 3, Go or Java, among other supported runtimes
Compute Service Options: Cloud Functions
- Use cases
- Data processing or ETL operations
- Webhooks to respond to HTTP triggers
- APIs that compose loosely coupled logic
- Mobile backend functions
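The webhook use case can be sketched as a small HTTP-triggered function. For Python, such a function receives a Flask-style request object; the function name hello_http, the name query parameter and the fake request used for the local check are all assumptions made for this example.

```python
from types import SimpleNamespace


def hello_http(request):
    """HTTP-triggered function: `request` is expected to behave like a
    Flask request, i.e. expose query parameters through `request.args`."""
    name = request.args.get("name", "World")
    return f"Hello, {name}!"


# Local stand-in for the request object, to try the function without deploying it.
fake_request = SimpleNamespace(args={"name": "Cloud"})
print(hello_http(fake_request))  # Hello, Cloud!
```

The function holds no state and does one thing per event, which is the shape that suits a FaaS platform.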
Compute Service Options: Cloud Run
- Characteristics
- serverless (no need to focus on infrastructure, just on the app development)
- used to deploy and scale containerised applications
- no need to do infrastructure management, as it scales up and down automatically
1. write code in any language, with any library or binary
2. build and package your app into a container image: write a Dockerfile, then build the image from the directory where the Dockerfile is saved by running (gcloud builds submit --tag gcr.io/$GOOGLE_CLOUD_PROJECT/helloworld)
3. the image is pushed to Artifact Registry, from where Cloud Run deploys it
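- serverless containers rather than FaaS: the unit of deployment is a container image (Cloud Functions is the FaaS option)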
Once deployed, you get a unique HTTPS URL: a container starts on demand to handle requests, and further containers are added/removed as needed
- Cloud Run handles HTTPS serving (all the encryption) for you, so your code only has to worry about handling web requests.
- Pay only for the resources used while a container is starting, handling requests or shutting down
- Deploying a container to Cloud Run makes the project publicly available at the URL assigned to it by Cloud Run. Until it is deployed to Cloud Run, the website is only available for viewing on my PC.
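Step 1 above ("write code") can be as small as the sketch below, which uses only the Python standard library. Cloud Run tells the container which port to listen on through the PORT environment variable (8080 by default); the handler name, the greeting text and the helper names are made up for this example.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_port(environ=os.environ):
    """Return the port to listen on: $PORT if set, otherwise 8080."""
    return int(environ.get("PORT", "8080"))


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with a short plain-text greeting."""

    def do_GET(self):
        body = b"Hello from a container\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    """Start the server; call this from the container's entry point."""
    HTTPServer(("0.0.0.0", resolve_port()), HelloHandler).serve_forever()
```

The Dockerfile from step 2 would then run a script that calls serve(); the code speaks plain HTTP, since HTTPS is handled in front of the container.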
Command to move data to storage (if drag and drop isn’t used)
gcloud storage
The command is used in “Online Transfer” process
Storage Transfer Service vs Transfer appliance
Storage Transfer Service lets you transfer large amounts of online data into Cloud Storage.
It can move data:
- from a different cloud provider
- from a different Cloud Storage region
- from an HTTPS endpoint
Transfer appliance
A physical storage appliance that you lease from Google: you load your data onto it and ship it back to an upload facility. A single appliance can transfer up to 1 petabyte of data (1 PB = 1024 TB = 1024 * 1024 GB)
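The unit arithmetic above, and the reason an appliance can beat an online transfer, can be checked with a few lines. The 1 Gbit/s link speed and the assumption that the link is fully used the whole time are illustrative, not figures from these notes.

```python
# Binary units, matching the 1 PB = 1024 TB convention used above.
GB_PER_TB = 1024
TB_PER_PB = 1024
BYTES_PER_GB = 1024 ** 3


def petabytes_to_gigabytes(pb):
    return pb * TB_PER_PB * GB_PER_TB


def online_transfer_days(pb, link_gbps):
    """Days needed to push `pb` petabytes over a link of `link_gbps`
    gigabits per second, assuming the link is saturated the whole time."""
    total_bits = petabytes_to_gigabytes(pb) * BYTES_PER_GB * 8
    seconds = total_bits / (link_gbps * 1_000_000_000)
    return seconds / 86_400


print(petabytes_to_gigabytes(1))           # 1048576
print(round(online_transfer_days(1, 1)))   # roughly 104 days
```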
Offline media Import
A third-party provider uploads the data from physical media (e.g. USB drives) that you ship to them
- Storage Options: Cloud Storage
- Overview
MAIN storage options
- object storage (docs, pics)
- unstructured data with no file sharing
- 11 9s durability (99.999999999%) - unlikely to lose a file
- Unlimited storage with no minimum object size
- scalable
- single API across storage classes
- by default an object takes the storage class of the bucket it is placed in, but this can be changed per object
- a bucket’s default storage class can be changed, but its location type cannot: you cannot switch between Region/Dual-region/Multi-region
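To get a feel for the 11 9s durability figure above, the sketch below treats it as a per-object chance of surviving one year. That reading is a simplification made for this example, and the object count is arbitrary.

```python
from fractions import Fraction

# 11 nines of annual durability, written exactly to avoid float rounding.
ANNUAL_DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_objects_lost_per_year(object_count):
    """Expected number of objects lost in one year."""
    return object_count * (1 - ANNUAL_DURABILITY)


def years_per_lost_object(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_objects_lost_per_year(object_count)


print(float(expected_objects_lost_per_year(10_000_000)))  # 0.0001
print(years_per_lost_object(10_000_000))                  # 10000
```

Under that reading, storing ten million objects means losing about one object every ten thousand years.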
Control on buckets
- IAM
- ACLs - access control lists - who has access and the level of access (owner/writer/reader); each object can have up to 100 ACL entries
- Signed URL (cryptographic key) - time-limited access to a bucket/object; can be used by someone without a Google account. A URL is created with certain permissions and an expiry time (gsutil signurl -d 10m path/to/key.p12 gs://bucket/object)
- signed policy document - what kind of file can be uploaded by someone with a signed URL
- Object Lifecycle Management - rules that automatically delete an object or move it to a colder storage class after a certain period of time (changes to the rules may take up to 24 hrs to take effect)
- strong global consistency: a newly created bucket shows up immediately, and any file added or deleted is reflected immediately across the globe
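An Object Lifecycle Management policy is just a small JSON document of rules. The sketch below builds one in Python; the policy itself (move to Nearline after 30 days, delete after 365) is a hypothetical example, not one taken from these notes.

```python
import json

# Hypothetical policy: move objects to Nearline after 30 days, delete after 365.
lifecycle_config = {
    "rule": [
        {
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 30},   # age is counted in days
        },
        {
            "action": {"type": "Delete"},
            "condition": {"age": 365},
        },
    ]
}

print(json.dumps(lifecycle_config, indent=2))
```

Saved to a file, a document of this shape can be applied to a bucket with gsutil lifecycle set.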
Storage Options: Cloud Storage
- Storage Classes
- Standard (no limitations, the one you access all the time)
- Nearline (low-cost, to be accessed less than 1/month, retrieval costs start with this class)
- Coldline (lower-cost, to be accessed once every quarter)
- Archive (lowest cost, to be accessed once a year)
- Autoclass (the storage class is assigned automatically by the system based on the storage usage pattern)
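The access-frequency guidance above can be turned into a small helper. The cut-offs of 30, 90 and 365 days are illustrative assumptions based on the "monthly / quarterly / yearly" wording, and the function name is invented for this sketch.

```python
def suggest_storage_class(days_between_accesses):
    """Map an expected access interval (in days) to a storage class name.

    The cut-offs are assumptions for illustration, not official limits.
    """
    if days_between_accesses < 30:
        return "STANDARD"
    if days_between_accesses < 90:
        return "NEARLINE"
    if days_between_accesses < 365:
        return "COLDLINE"
    return "ARCHIVE"


for days in (1, 45, 120, 400):
    print(days, "->", suggest_storage_class(days))
```

Autoclass makes this kind of decision automatically, based on the observed access pattern rather than a guess made up front.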
Storage Options: Cloud Storage
- Availability
- Region (storing data in one region)
- Dual-region (storing data in a specific pair of regions)
- Multi-region (storing data across several regions within one large geographic area)
Cloud Storage: use cases and URL keys
Each object is addressed by a unique key in the form of a URL and holds the following info:
- Globally unique identifier
- binary form of the actual data itself
- relevant associated meta-data
Eg. video storage, pictures, audios
Cloud Storage use cases:
- Online content
- Backup and archiving
- Storage of intermediate results
Cloud Storage organises objects into buckets. Each bucket has a globally unique name and must have a location specified, ideally close to its users for lower latency.
Objects are immutable: when one is changed, a new version replaces it. With object versioning, a deleted or overwritten file can be recovered from a previous version, but versioning must be enabled beforehand.
- Storage Options: Filestore
- fully managed NFS (Network File System) file server
- unstructured data
- use with VM instances and Kubernetes Clusters accessing the data at the same time
- several VM instances use the storage at the same time
- for GKE or Compute Engines
- good for high-performance workload
- migration of on-premises applications
- complex financial models
- web developers
- Storage Options (SQL/Relational):
Cloud SQL VS Cloud Spanner
MAIN storage options
For STRUCTURED data
Cloud SQL:
- available in many zones
- managed service of PostgreSQL, MySQL, and SQL Server
- when 2 zones are used, failover can be managed: if the primary instance (first zone) fails, users are directed to the standby instance (second zone)
- scaling up is available (but instance restart is required)
- Choosing a connection type
  - if the application is in the same project and in the same region -> private IP connection
  - if it’s in a different region/project -> Cloud SQL Auth Proxy (automatic key rotation) OR a manual SSL connection OR an unencrypted connection
- to download Cloud SQL proxy:
wget ___
./cloud_sql_proxy_new -instances=database_name
- connect Cloud SQL via private IP (note the private IP of the SQL instance; copy external IP of a VM and use it as a website name)
Cloud Spanner:
- available across zones + regions + globally
- scalable relational DB
- designed to support transactions, consistency, synchronous replication, high performance
- good for financial apps due to having transactional consistency
AlloyDB
- good for machine learning and generative AI
- PostgreSQL-compatible
- good for backups
- fast transactional processing
- real-time fast business insights
- Storage Options (NoSQL): Bigtable
MAIN storage options
- used for data analytics and operations
- fully managed NoSQL database that scales up very well
- low latency
- offers cluster resizing without downtime
- handles massive workloads (good for quickly changing data eg finance)
- data can be uploaded using APIs, streaming services etc
- good for ML
- it learns to adjust to specific access patterns
- the smallest number of Nodes you could have is 3 and you have to pay for them regardless
Tables structure
- Column family (a named group of related columns)
- Column qualifier (the name of an individual column within a family)
- Tables are stored on Colossus - google’s file system
Note: it is not serverless, used in Search, Google Maps, Analytics, FinTech
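The column family / column qualifier structure can be pictured as nested dictionaries. The row key, family names, qualifiers and values below are all made up for illustration, and real Bigtable cells also carry timestamps, which this sketch leaves out.

```python
# One Bigtable-style row, modelled as nested dicts:
# row key -> column family -> column qualifier -> value
table = {
    "user#1001": {
        "profile": {"name": "Ada", "country": "UK"},
        "activity": {"last_login": "2024-01-15"},
    },
}


def read_cell(table, row_key, family, qualifier):
    """Look up a single value by row key, column family and qualifier."""
    return table[row_key][family][qualifier]


print(read_cell(table, "user#1001", "profile", "name"))  # Ada
```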
- Storage Options (NoSQL): Firestore
MAIN storage options
- NoSQL, realtime database - data sync is used to update any connected device
- optimised for offline use
- cluster resizing without downtime
- Data is stored in documents and then organised into collections
- You are charged for every Read Write Delete of documents + for the amount of database storage used (there is also free quota per day)
- supports ACID transactions: if any operation in a transaction fails, the whole transaction fails and is rolled back
- Firestore is the next generation of Datastore
- scales down very well but can scale up too
- includes transactional consistency (if it is not required, use Bigtable)
2 modes:
Datastore mode (for new server projects)
Native mode (for new mobile and web apps)
Note: it scales horizontally and is serverless (no instances or nodes to manage).
- Storage Options: Persistent Disks
- Durable block storage for instances
Options (available as zonal or regional disks):
- Standard
- Solid State (SSD): lower latency, higher iops
- Local SSD: attached to your VM hardware directly but will only persist till the VM is stopped/deleted. Has higher storage capacity
Note: block storage stores operating system info
Storage/Database Options (NoSQL): Memorystore
- highly available in-memory service for Redis and Memcached
- in memory caching
- fully-managed
- allows for high-availability, failover, patching
Note: it is not scalable, it is not serverless, not realtime.
Database Options (NoSQL): Datastore
- fast, fully-managed, serverless, NoSQL document database
- for mobile, web and IoT apps
- multi-region replication
- ACID transactions (maintains data integrity even in the presence of system failures, crashes, or errors)
Note: it scales automatically as load grows
Cloud Storage vs Cloud SQL vs Spanner vs Firestore vs Cloud Bigtable
Cloud Storage
- good for storing immutable objects larger than 10 MB, e.g. videos (stores PBs of data)
Cloud SQL
- if you need full SQL support for an online transaction processing (OLTP) system
- good for storing user credentials/customer orders, i.e. good for web frameworks (up to 64 TB of data)
Spanner
- same as Cloud SQL but adds horizontal scalability (stores PBs of data)
Firestore
- massive scaling (TBs of data)
- Storing and syncing data from mobile and web apps
Cloud Bigtable
- No support for SQL queries; good for large amounts of structured objects
- Analytical data, heavy with read and write events (PBs of data)
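The comparison above can be condensed into a rough decision helper. The category labels ('object', 'relational', 'nosql'), the parameter names and the function name are invented for this sketch, and real choices depend on more factors than these notes cover.

```python
def suggest_storage_option(kind, needs_horizontal_scale=False,
                           mobile_or_web_sync=False):
    """Pick a storage product from the comparison in these notes.

    kind: 'object', 'relational' or 'nosql' (labels invented for this sketch).
    """
    if kind == "object":
        return "Cloud Storage"      # immutable objects such as videos
    if kind == "relational":
        # Full SQL support; Spanner when horizontal scaling is needed.
        return "Spanner" if needs_horizontal_scale else "Cloud SQL"
    if mobile_or_web_sync:
        return "Firestore"          # syncing data from mobile and web apps
    return "Cloud Bigtable"         # heavy read/write analytical data


print(suggest_storage_option("relational"))
print(suggest_storage_option("relational", needs_horizontal_scale=True))
print(suggest_storage_option("nosql", mobile_or_web_sync=True))
```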
Networks, Firewalls and Routes: VPC
- virtualised network (can think of it as a virtualised datacentre)
- global resource
- a default network pre-exists in each project; new networks can be added, but a network cannot be shared between different projects (unless Shared VPC is set up)
- default isn’t secure enough