GCP & ML Fundamentals Flashcards
Big data challenges are…
- Migrating existing data solutions
- Analysing large datasets at scale
- Building streaming pipelines
- Applying machine learning
Google Cloud Platform is made up of three layers…
Top: Big Data & ML Products
Middle: Compute Power, Storage, Networking
Bottom: Security
Google Cloud Storage bucket names have to be…
Globally unique (across all projects, not just your own).
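Global uniqueness is enforced by the Cloud Storage service itself, so it cannot be checked locally. The sketch below is a toy model under stated assumptions: a hypothetical in-memory `BucketNamespace` stands in for the single global namespace, and `is_valid_format` applies only a simplified subset of the naming rules (3-63 characters of lowercase letters, digits, dashes, underscores and dots, starting and ending with a letter or digit, no `goog` prefix).

```python
import re

# Simplified subset of the bucket-naming rules (assumption: longer dotted
# names, IP-address-like names and other edge cases are not modelled).
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def is_valid_format(name: str) -> bool:
    """Return True if `name` passes the simplified format check."""
    return bool(_NAME_RE.match(name)) and not name.startswith("goog")


class BucketNamespace:
    """Hypothetical stand-in for the single, global bucket namespace."""

    def __init__(self) -> None:
        self._owners = {}  # bucket name -> owning project

    def create(self, project: str, name: str) -> None:
        if not is_valid_format(name):
            raise ValueError(f"invalid bucket name: {name!r}")
        # Uniqueness is global: the name is taken no matter which project owns it.
        if name in self._owners:
            raise ValueError(f"bucket name already in use: {name!r}")
        self._owners[name] = project


ns = BucketNamespace()
ns.create("project-a", "sales-data-2024")
try:
    ns.create("project-b", "sales-data-2024")  # different project, same name
except ValueError as err:
    print(err)  # prints: bucket name already in use: 'sales-data-2024'
```

The second `create` fails even though it comes from a different project, which is exactly what "globally unique" means in practice.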
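What is the command-line tool for Google Cloud?
gcloud. Cloud Storage also has its own command-line tool, gsutil (newer releases recommend gcloud storage instead).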
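As a rough illustration of everyday `gsutil` usage, the sketch below only assembles argument lists for three common subcommands (`mb` to make a bucket, `cp` to copy, `ls` to list); it does not execute them. The bucket, file and location names are hypothetical, and actually running the commands (e.g. with `subprocess.run`) assumes the Cloud SDK is installed and authenticated.

```python
from __future__ import annotations


def make_bucket_cmd(bucket: str, storage_class: str, location: str) -> list[str]:
    # `gsutil mb` creates a bucket; -c sets the storage class, -l the location.
    return ["gsutil", "mb", "-c", storage_class, "-l", location, f"gs://{bucket}"]


def copy_cmd(local_path: str, bucket: str) -> list[str]:
    # `gsutil cp` copies a local file into the bucket.
    return ["gsutil", "cp", local_path, f"gs://{bucket}/"]


def list_cmd(bucket: str) -> list[str]:
    # `gsutil ls` lists the objects in the bucket.
    return ["gsutil", "ls", f"gs://{bucket}"]


for cmd in (
    make_bucket_cmd("my-example-bucket", "nearline", "europe-west2"),
    copy_cmd("report.csv", "my-example-bucket"),
    list_cmd("my-example-bucket"),
):
    print(" ".join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems if the commands are later passed to `subprocess.run`.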
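Are we charged when a virtual machine is ‘stopped’?
Yes, but only for storage: the persistent disks (which preserve the VM's operating system and installed software) and any reserved static IP addresses. vCPUs and memory are not charged while the VM is stopped.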
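A back-of-the-envelope sketch of why a stopped VM is cheaper but not free. All prices here are hypothetical placeholders, not real Compute Engine rates; the point is only that the vCPU and memory terms drop out when the VM is stopped, while the disk term remains.

```python
# Hypothetical monthly prices (placeholders, NOT real GCP rates).
PRICE_PER_VCPU_MONTH = 20.0
PRICE_PER_GB_RAM_MONTH = 3.0
PRICE_PER_GB_DISK_MONTH = 0.05


def monthly_cost(vcpus: int, ram_gb: int, disk_gb: int, running: bool) -> float:
    """Estimate the monthly cost of one VM under the placeholder prices."""
    disk = disk_gb * PRICE_PER_GB_DISK_MONTH  # charged whether running or stopped
    if not running:
        return disk
    compute = vcpus * PRICE_PER_VCPU_MONTH + ram_gb * PRICE_PER_GB_RAM_MONTH
    return compute + disk


print(monthly_cost(4, 16, 100, running=True))   # compute + disk
print(monthly_cost(4, 16, 100, running=False))  # disk only
```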
How does cloud computing differ from desktop computing?
In cloud computing, compute power and storage are independent (decoupled), so each can be scaled on its own; on a desktop, both are tied to the same machine.
What are the 4 Google Cloud Storage classes?
Standard, Nearline, Coldline and Archive
What is Standard storage?
Best for frequently accessed ("hot") data and for data that is only stored for brief periods.
What is Nearline storage?
Low-cost, highly durable storage for data read or modified about once a month or less.
What is Coldline storage?
Lower cost than Nearline and highly durable; for data read or modified about once a quarter or less.
What is Archive storage?
Lowest-cost, highly durable storage for data accessed less than once a year, e.g. archives and online backups.
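The access-frequency guidance above can be turned into a rough rule of thumb. This sketch is an assumption-laden heuristic, not an official selection algorithm: the hypothetical `choose_storage_class` treats roughly monthly access as Nearline, roughly quarterly as Coldline and less than yearly as Archive, then falls back to a warmer class if the data will not be kept for that class's minimum storage duration (30 days for Nearline, 90 for Coldline, 365 for Archive).

```python
# Storage classes ordered warmest to coldest, with minimum storage duration in days.
CLASSES = [
    ("STANDARD", 0),
    ("NEARLINE", 30),
    ("COLDLINE", 90),
    ("ARCHIVE", 365),
]


def choose_storage_class(reads_per_year: float, storage_days: int) -> str:
    """Pick a class from access frequency, respecting minimum storage durations."""
    if reads_per_year > 12:        # more often than monthly
        idx = 0
    elif reads_per_year > 4:       # roughly monthly
        idx = 1
    elif reads_per_year >= 1:      # roughly quarterly
        idx = 2
    else:                          # less than once a year
        idx = 3
    # Short-lived data would pay early-deletion charges in colder classes,
    # so step back towards Standard until the minimum duration is met.
    while idx > 0 and storage_days < CLASSES[idx][1]:
        idx -= 1
    return CLASSES[idx][0]


print(choose_storage_class(52, 365))    # STANDARD
print(choose_storage_class(6, 365))     # NEARLINE
print(choose_storage_class(0.5, 3650))  # ARCHIVE
print(choose_storage_class(0.5, 60))    # NEARLINE (too short-lived for Coldline/Archive)
```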
What is a project?
The base-level organising entity for creating and using resources and for managing billing, APIs and permissions.
What is a folder?
A folder groups multiple projects (and other folders) within an organisation.
What is the root node of the GCP hierarchy?
Organisation
Organisation -> Folder -> Project -> Resources (e.g. a BigQuery dataset)
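The hierarchy is a tree, so it can be modelled with a simple parent/children node. This is a minimal sketch assuming a hypothetical `Node` class and made-up names (`example.com`, `finance`, `finance-reporting`, `monthly_sales`); it only shows how each resource sits under exactly one parent up to the organisation root.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One level of the hierarchy: organisation, folder, project or resource."""
    kind: str
    name: str
    # Excluded from repr/eq to avoid infinite recursion between parent and children.
    parent: Optional[Node] = field(default=None, repr=False, compare=False)
    children: list[Node] = field(default_factory=list)

    def add(self, kind: str, name: str) -> Node:
        child = Node(kind, name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        # Walk up to the root, then join the names from the top down.
        parts = []
        node: Optional[Node] = self
        while node is not None:
            parts.append(f"{node.kind}:{node.name}")
            node = node.parent
        return " -> ".join(reversed(parts))


org = Node("organisation", "example.com")
finance = org.add("folder", "finance")
project = finance.add("project", "finance-reporting")
dataset = project.add("bigquery_dataset", "monthly_sales")
print(dataset.path())
# organisation:example.com -> folder:finance -> project:finance-reporting -> bigquery_dataset:monthly_sales
```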
What Google service controls access to GCP resources?
Identity and Access Management (IAM)
What is IAM and what does it do?
Identity and Access Management; it lets you fine-tune access control for all the GCP resources in use.
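IAM grants are expressed as bindings of a role to a list of members (for example `user:alice@example.com` bound to `roles/bigquery.dataViewer`), and a grant made on a folder or project is inherited by everything beneath it. The sketch below models only that inheritance idea with plain dictionaries; the node names, members and the helper `has_role` are hypothetical, and real IAM evaluation (deny policies, conditions, custom roles) is more involved.

```python
# Hypothetical hierarchy: child -> parent (the organisation has no parent).
PARENT = {
    "folder/finance": "organisation/example.com",
    "project/finance-reporting": "folder/finance",
    "dataset/monthly_sales": "project/finance-reporting",
}

# Hypothetical allow policies in IAM's bindings shape:
# each binding maps one role to a list of members.
POLICIES = {
    "folder/finance": {
        "bindings": [
            {"role": "roles/bigquery.dataViewer",
             "members": ["user:alice@example.com"]},
        ]
    },
}


def has_role(member: str, role: str, resource: str) -> bool:
    """Return True if `member` holds `role` on `resource` or on any ancestor."""
    node = resource
    while node is not None:
        for binding in POLICIES.get(node, {}).get("bindings", []):
            if binding["role"] == role and member in binding["members"]:
                return True
        node = PARENT.get(node)  # move one level up; None past the root
    return False


print(has_role("user:alice@example.com", "roles/bigquery.dataViewer",
               "dataset/monthly_sales"))  # True: inherited from the folder
print(has_role("user:bob@example.com", "roles/bigquery.dataViewer",
               "dataset/monthly_sales"))  # False: no binding anywhere above
```

Note that inheritance only flows downwards: Alice's folder-level grant does not give her the role on the organisation itself.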
What is the main benefit of a fast network?
We don’t have to do everything on one machine. Google's data center network enables the separation of compute and storage, so data can be processed without first copying it.
What are edge points of presence (in networking)?
Edge points of presence are where Google's private network connects to the public internet.
What are the different levels of security responsibility you can have?
- On-premise (full responsibility)
- Infrastructure as a Service (IaaS)
- Platform as a Service (PaaS)
- Managed services (least responsibility)
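The list above is often drawn as a stack of layers, with the provider taking over more of them as you move from on-premise towards managed services. The sketch below encodes a simplified, hypothetical version of that stack; the layer names and split points are illustrative assumptions rather than Google's official model, but the top layer reflects a point that holds in every model: the customer stays responsible for their data and who can access it.

```python
from __future__ import annotations

# Simplified layer stack, bottom to top (illustrative, not an official list).
LAYERS = [
    "physical security",
    "hardware",
    "network",
    "operating system",
    "runtime and middleware",
    "application code",
    "data and access policies",
]

# How many layers (counted from the bottom) the provider manages in each model.
PROVIDER_MANAGED = {
    "on-premise": 0,
    "iaas": 3,
    "paas": 5,
    "managed": 6,
}


def customer_layers(model: str) -> list[str]:
    """Return the layers the customer still manages under `model`."""
    return LAYERS[PROVIDER_MANAGED[model]:]


for model in PROVIDER_MANAGED:
    print(f"{model}: {customer_layers(model)}")
```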
What Google tool helps to implement security policies?
IAM (Identity and Access Management)
How can BigQuery be accessed?
Command line (the bq tool), REST API, web UI (Google Cloud console) and third-party tools (e.g. Matillion).
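To illustrate the REST API route, the sketch below only prepares (and does not send) a request for the BigQuery `jobs.query` method, which is a `POST` to `https://bigquery.googleapis.com/bigquery/v2/projects/{projectId}/queries` with the SQL in a JSON body. The project ID and table name are hypothetical, and actually sending the request would also require an OAuth 2.0 access token in an `Authorization: Bearer` header.

```python
import json
import urllib.request

PROJECT_ID = "my-example-project"  # hypothetical project ID


def build_query_request(project_id: str, sql: str) -> urllib.request.Request:
    """Prepare (but do not send) a BigQuery jobs.query REST request."""
    url = f"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/queries"
    body = {"query": sql, "useLegacySql": False}  # use standard SQL
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request(
    PROJECT_ID, "SELECT COUNT(*) FROM `my-example-project.sales.orders`"
)
print(req.get_method(), req.full_url)
```

The bq tool, the web UI and the client libraries hide this request-building step, which is why they are usually more convenient than calling the REST API directly.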
What is Compute Engine?
It lets you run virtual machines on demand in the cloud; it is an IaaS solution.
What are the 3 types of Compute Engine machine available on GCP?
- Custom machines - choose the optimal CPU and memory for the problem.
- Spot machines - reduce computing cost by up to 91%.
- Predefined machines - ready-made machine types with a fixed amount of CPU and memory.
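The three machine options map onto different flags of `gcloud compute instances create`. The sketch below just builds the argument lists (it does not run them): a predefined type via `--machine-type`, a custom shape via `--custom-cpu`/`--custom-memory`, and a Spot VM via `--provisioning-model=SPOT`. The instance names, zone and sizes are hypothetical choices for illustration.

```python
from __future__ import annotations


def create_instance_cmd(name: str, zone: str, kind: str) -> list[str]:
    """Build (not run) a `gcloud compute instances create` command for one machine option."""
    cmd = ["gcloud", "compute", "instances", "create", name, f"--zone={zone}"]
    if kind == "predefined":
        # A ready-made shape with fixed CPU and memory.
        cmd.append("--machine-type=e2-standard-4")
    elif kind == "custom":
        # Choose the vCPU count and memory to fit the workload.
        cmd += ["--custom-cpu=6", "--custom-memory=24GB"]
    elif kind == "spot":
        # Cheaper, but Compute Engine can reclaim (preempt) the VM at any time.
        cmd += ["--machine-type=e2-standard-4", "--provisioning-model=SPOT"]
    else:
        raise ValueError(f"unknown machine option: {kind!r}")
    return cmd


for kind in ("predefined", "custom", "spot"):
    print(" ".join(create_instance_cmd(f"demo-{kind}", "europe-west2-a", kind)))
```

Spot VMs suit fault-tolerant batch work that can be restarted, since the large discount comes in exchange for possible preemption.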
What is Google Kubernetes Engine (GKE)?
GKE runs clusters of machines that host containers; it is a managed way to orchestrate code running in containers.
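Orchestration in GKE is declarative: you describe the desired state (which container image, how many replicas) and Kubernetes keeps the cluster matching it. The sketch below builds a minimal `apps/v1` Deployment manifest as a Python dict; the app name and image are hypothetical, and the result would normally be saved as YAML or JSON and applied to the cluster with `kubectl apply -f`.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Build a minimal Kubernetes Deployment asking for `replicas` copies of one container."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,                 # desired number of running Pods
            "selector": {"matchLabels": labels},  # which Pods this Deployment manages
            "template": {
                "metadata": {"labels": labels},   # must match the selector above
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


manifest = deployment_manifest("hello-web", "example.com/hello-web:1.0", 3)
print(json.dumps(manifest, indent=2))
```

If a node fails and a Pod disappears, Kubernetes notices that fewer than 3 replicas are running and starts a replacement, which is the orchestration the card refers to.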