GCP & ML Fundamentals Flashcards

1
Q

Big data challenges are…

A
  • Migrating existing data solutions
  • Analysing large datasets at scale
  • Building streaming data pipelines
  • Applying machine learning
2
Q

Google Cloud Platform is made up of three layers…

A

Top: Big Data & ML Products

Middle: Compute Power, Storage, Networking

Bottom: Security

3
Q

Google Cloud Storage bucket names have to be…

A

Globally unique.
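Because every project shares one bucket namespace, a name taken in any project is unavailable to all others. The sketch below models that rule with a hypothetical in-memory registry; GlobalBucketNamespace and BucketNameTakenError are invented names for illustration, not part of any Google Cloud library.

```python
class BucketNameTakenError(Exception):
    """Raised when a bucket name is already in use by any project."""


class GlobalBucketNamespace:
    """Hypothetical stand-in for Cloud Storage's single, global bucket namespace."""

    def __init__(self) -> None:
        # One mapping shared by every project: bucket name -> owning project.
        self._owners: dict = {}

    def create_bucket(self, project_id: str, name: str) -> None:
        """Register a bucket, refusing names already taken in ANY project."""
        if name in self._owners:
            owner = self._owners[name]
            raise BucketNameTakenError(f"{name!r} is already owned by project {owner!r}")
        self._owners[name] = project_id


namespace = GlobalBucketNamespace()
namespace.create_bucket("project-a", "team-raw-data")
try:
    # Same name, different project: still rejected, because names are global.
    namespace.create_bucket("project-b", "team-raw-data")
except BucketNameTakenError as err:
    print("Rejected:", err)
```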

4
Q

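What is the command-line tool for Google Cloud?

A

gcloud (the Google Cloud CLI). gsutil is the older command-line tool for Cloud Storage specifically; "gcloud storage" now largely covers the same tasks.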

5
Q

Are we charged when a virtual machine is ‘stopped’?

A

Yes, but only for storage such as the attached persistent disks, not for the vCPUs or memory. Stopping keeps the VM and the software installed on it.
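The billing rule above is simple arithmetic: compute is charged per running hour, disk is charged regardless. A minimal sketch, assuming entirely hypothetical rates (the constants below are made up and are NOT real GCP prices):

```python
# Hypothetical rates for illustration only; these are NOT real GCP prices.
VCPU_PER_HOUR = 0.03        # cost per vCPU per running hour
MEMORY_GB_PER_HOUR = 0.004  # cost per GB of RAM per running hour
DISK_GB_PER_MONTH = 0.04    # cost per GB of persistent disk per month


def monthly_vm_cost(vcpus: int, memory_gb: float, disk_gb: float, running_hours: float) -> float:
    """Estimate one month's bill for a single VM.

    vCPUs and memory are only billed while the VM is running; the persistent
    disk is billed for the whole month whether the VM is running or stopped.
    """
    compute = running_hours * (vcpus * VCPU_PER_HOUR + memory_gb * MEMORY_GB_PER_HOUR)
    storage = disk_gb * DISK_GB_PER_MONTH
    return round(compute + storage, 2)


print("running all month:", monthly_vm_cost(2, 8, 100, running_hours=730))
print("stopped all month:", monthly_vm_cost(2, 8, 100, running_hours=0))
```

With running_hours=0 the compute term vanishes and only the disk charge remains, which is the point of the card.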

6
Q

How does cloud computing differ from desktop computing?

A

In cloud computing, compute power and storage are independent (decoupled), so each can be scaled separately.

7
Q

What are the four Google Cloud Storage classes?

A

Standard, Nearline, Coldline and Archive

8
Q

What is Standard Storage?

A

Best for frequently accessed ("hot") data and for data that will only be stored for a short time.

9
Q

What is Nearline Storage?

A

Low-cost, highly durable storage for data that is read or modified about once a month.

10
Q

What is Coldline Storage?

A

Lower cost than Nearline; highly durable storage for data that is read or modified about once a quarter.

11
Q

What is Archive Storage?

A

The lowest-cost storage, for data that is rarely accessed, e.g. archives and online backups.
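The four classes above amount to a decision on access frequency. A minimal sketch, assuming cut-offs of 30, 90 and 365 days between accesses (a rule of thumb derived from these cards, not an official Google rule); choose_storage_class is a hypothetical helper, though the returned strings match the storage class names used by Cloud Storage.

```python
from typing import Optional


def choose_storage_class(days_between_accesses: float,
                         retention_days: Optional[float] = None) -> str:
    """Suggest a Cloud Storage class from how often the data is read or modified.

    The thresholds are a rule of thumb based on these cards, not an official rule.
    """
    # Data kept only briefly: the colder classes have minimum storage durations
    # (30, 90 and 365 days), so Standard is usually the better fit.
    if retention_days is not None and retention_days < 30:
        return "STANDARD"
    if days_between_accesses < 30:
        return "STANDARD"   # frequently accessed ("hot") data
    if days_between_accesses < 90:
        return "NEARLINE"   # read or modified about once a month
    if days_between_accesses < 365:
        return "COLDLINE"   # read or modified about once a quarter
    return "ARCHIVE"        # read less than once a year, e.g. backups


for days in (1, 30, 90, 400):
    print(days, "->", choose_storage_class(days))
```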

12
Q

What is a project?

A

The base-level organising entity for creating and using resources and for managing billing, APIs and permissions.

13
Q

What is a folder?

A

A folder groups multiple projects (and can contain other folders) within an organisation.

14
Q

What is the root node of the GCP hierarchy?

A

Organisation

Organisation -> Folder -> Project -> Resources (e.g. a BigQuery dataset)
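The hierarchy is a tree in which every resource has exactly one parent. A minimal sketch of that structure; the Node class and the organisation, folder, project and dataset names are hypothetical and only illustrate the Organisation -> Folder -> Project -> Resource chain.

```python
class Node:
    """One entry in the hierarchy: organisation, folder, project or resource."""

    def __init__(self, kind: str, name: str, parent: "Node" = None) -> None:
        self.kind = kind
        self.name = name
        self.parent = parent
        self.children = []

    def add(self, kind: str, name: str) -> "Node":
        """Create a child node underneath this one and return it."""
        child = Node(kind, name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Walk up to the root node and return the full chain, root first."""
        parts = []
        node = self
        while node is not None:
            parts.append(f"{node.kind}:{node.name}")
            node = node.parent
        return " -> ".join(reversed(parts))


org = Node("organisation", "example.com")
folder = org.add("folder", "analytics")
project = folder.add("project", "sales-reporting")
dataset = project.add("resource", "bq_dataset_sales")
print(dataset.path())
```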

15
Q

Which Google service controls access to GCP resources?

A

Identity and Access Management

16
Q

What is IAM and what does it do?

A

Identity and Access Management; it lets you fine-tune access control for all the GCP resources in use.
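IAM grants roles to members on a node of the resource hierarchy, and roles granted higher up are inherited by everything below. A simplified sketch of that lookup, assuming a hypothetical hierarchy, hypothetical members and a plain dict in place of real allow policies (roles/viewer and roles/bigquery.dataViewer are real predefined role names; everything else is made up):

```python
# Hypothetical hierarchy: each node maps to its parent (None for the root).
PARENT = {
    "example-org": None,
    "analytics-folder": "example-org",
    "sales-project": "analytics-folder",
}

# Hypothetical allow policies: node -> {role: set of members granted that role}.
POLICIES = {
    "example-org": {"roles/viewer": {"user:auditor@example.com"}},
    "sales-project": {"roles/bigquery.dataViewer": {"user:analyst@example.com"}},
}


def effective_roles(member: str, node: str) -> set:
    """Return every role a member holds on a node.

    Roles granted on an ancestor (folder, organisation) are inherited, so we
    walk from the node up to the root and collect matching bindings.
    """
    roles = set()
    current = node
    while current is not None:
        for role, members in POLICIES.get(current, {}).items():
            if member in members:
                roles.add(role)
        current = PARENT[current]
    return roles


print(effective_roles("user:auditor@example.com", "sales-project"))
print(effective_roles("user:analyst@example.com", "analytics-folder"))
```

The auditor's organisation-level role flows down to the project, while the analyst's project-level role does not flow up to the folder.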

17
Q

What is the main benefit of networking?

A

We don’t have to do everything on one machine if the network is fast enough. Google’s data centre network enables the separation of compute and storage (data can be processed without copying it).

18
Q

What are edge points of presence (in networking)?

A

Edge points of presence are where Google’s private network connects to the public internet.

19
Q

What are the different levels of security responsibility you can have?

A
  • On-premises (full responsibility)
  • Infrastructure as a Service (IaaS)
  • Platform as a Service (PaaS)
  • Managed services (least responsibility)
20
Q

Which Google tool helps to implement security policies?

A

IAM (Identity and Access Management)

21
Q

How can BigQuery be accessed?

A

The Google Cloud console (web UI), the command line (the bq tool), the REST API and third-party tools (e.g. Matillion).

22
Q

What is Compute Engine?

A

It lets you run virtual machines on demand in the cloud; it is an IaaS solution.

23
Q

What are the three types of Compute Engine machines available on GCP?

A
  1. Custom machines: the optimal CPU and memory for the problem.
  2. Spot machines: reduce computing costs by up to 91%.
  3. Predefined machines
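To make the "up to 91%" figure concrete, here is the arithmetic for one job. A minimal sketch, assuming a hypothetical on-demand price of 1.00 per hour and the best-case discount; job_cost is an invented helper, and real Spot discounts vary by machine type and region.

```python
def job_cost(hours: float, on_demand_hourly: float, spot_discount: float = 0.0) -> float:
    """Cost of running a job; spot_discount is a fraction (0.91 means 91% off)."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    return hours * on_demand_hourly * (1.0 - spot_discount)


# Hypothetical price of 1.00 per hour for a predefined machine type.
on_demand = job_cost(hours=100, on_demand_hourly=1.00)
spot = job_cost(hours=100, on_demand_hourly=1.00, spot_discount=0.91)
print(f"on-demand: {on_demand:.2f}, spot (best case): {spot:.2f}")
```

The saving comes with a trade-off: Spot VMs can be reclaimed by Google at any time, so they suit fault-tolerant or restartable workloads.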
24
Q

What is Google Kubernetes Engine (GKE)?

A

GKE runs clusters of machines that host containers; it is a way to orchestrate code that runs in containers.

25
Q

What is a container?

A

A container is a package of code and its dependencies; it is highly portable and resource-efficient.

26
Q

What is App Engine?

A

Google’s PaaS: a way to run code without worrying about infrastructure.

27
Q

What are Cloud Functions?

A

Serverless execution environments (FaaS, function as a service): they execute code in response to an event, and Google scales resources as required.

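The event-driven model can be pictured as "register a handler for an event type; the platform calls it when that event happens". The sketch below simulates that locally; the HANDLERS registry, on_event, dispatch and the "file_uploaded" event name are all hypothetical and are not the Cloud Functions API, where deployment and triggering are handled by Google Cloud itself.

```python
from typing import Callable, Dict, List

# Hypothetical local registry: event type -> handler functions.
# Real Cloud Functions are deployed to Google Cloud and triggered by the platform.
HANDLERS: Dict[str, List[Callable[[dict], str]]] = {}


def on_event(event_type: str):
    """Decorator that registers a function to run whenever event_type occurs."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        HANDLERS.setdefault(event_type, []).append(func)
        return func
    return register


def dispatch(event_type: str, payload: dict) -> List[str]:
    """Stand-in for the platform: run every handler registered for the event."""
    return [handler(payload) for handler in HANDLERS.get(event_type, [])]


@on_event("file_uploaded")  # hypothetical event name
def make_thumbnail(payload: dict) -> str:
    return f"thumbnail created for {payload['name']}"


print(dispatch("file_uploaded", {"name": "cat.jpg"}))
```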
28
Q

What are Google’s managed database and storage services?

A

Cloud Bigtable, Cloud Storage, Cloud SQL, Cloud Spanner and Cloud Datastore.

29
Q

What roles exist in an analytics team?

A
  • Data Engineers
  • Decision Makers
  • Analysts
  • Statisticians
  • Applied ML Engineers
  • Data Scientists

30
Q

What is a recommendation system?

A

A model that recommends products based on user preferences, e.g. Netflix or YouTube.
  • Recommendation systems must scale to meet demand.
  • Predictions can be made in batch or on a stream.

31
Q

Where should you store unstructured data?

A

Cloud Storage

32
Q

Where should you store structured data when latency in seconds is acceptable?

A

BigQuery

33
Q

Where should you store structured data that needs latency in milliseconds?

A

Cloud Bigtable

34
Q

Where should you store structured data for a NoSQL workload?

A

Cloud Datastore

35
Q

Where should you store structured, SQL-based data when one database is enough?

A

Cloud SQL

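Cards 31 to 35 together form a small decision tree. A minimal sketch of it; choose_storage is a hypothetical helper, and two parts are assumptions not stated on these cards: the split between analytical and transactional workloads, and Cloud Spanner (listed on card 28) as the choice when one Cloud SQL database is not enough.

```python
def choose_storage(structured: bool, workload: str = "analytical",
                   latency: str = "seconds", sql: bool = True,
                   single_db_enough: bool = True) -> str:
    """Pick a Google Cloud storage product following cards 31 to 35.

    ASSUMPTION: the analytical/transactional split and the Spanner branch
    come from the usual course decision chart, not from these cards.
    """
    if not structured:
        return "Cloud Storage"
    if workload == "analytical":
        # Analytical access: choose by the latency you need.
        return "Cloud Bigtable" if latency == "milliseconds" else "BigQuery"
    # Transactional access.
    if not sql:
        return "Cloud Datastore"
    return "Cloud SQL" if single_db_enough else "Cloud Spanner"


print(choose_storage(structured=True, latency="milliseconds"))
```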
36
Q

Where did big data tools evolve from?

A
  • Hadoop: MapReduce (map performs filtering and sorting; reduce performs summary operations).
  • Cloud services: separate, specialise and connect.

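The MapReduce idea can be shown with the classic word count, run on a single machine. A minimal sketch: map filters and emits (word, 1) pairs, a shuffle step groups them by key, and reduce sums each group. The STOP_WORDS filter list and the sample documents are hypothetical; a real Hadoop job distributes these phases across a cluster.

```python
from collections import defaultdict
from itertools import chain

STOP_WORDS = {"and", "on"}  # hypothetical filter list


def map_phase(document: str):
    """Map: emit (word, 1) pairs, filtering out stop words."""
    return [(w, 1) for w in document.lower().split() if w not in STOP_WORDS]


def shuffle(pairs):
    """Group values by key (done by the framework between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: summarise each key's values, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}


documents = ["Big data on GCP", "big data and ML"]
pairs = chain.from_iterable(map_phase(doc) for doc in documents)
print(reduce_phase(shuffle(pairs)))
```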
37
Q

What are clusters?

A

Clusters are a fungible resource: they are used when required and managed automatically through Dataproc. Clusters can be up and running in 90 seconds.

38
Q

What can we do to clusters to avoid under- or over-provisioning?

A
  • Autoscaling: add or remove workers according to the job size.
  • Incorporate preemptible VMs: affordable (up to 80% cheaper) but short-lived (they run for at most 24 hours) and can be reclaimed at any time.
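Both ideas on this card can be sketched together: size the cluster to the queued work, then fill everything beyond a small fixed core with preemptible workers. This is a deliberately simplified, hypothetical rule (target_workers, split_workers, the task counts and the min/max bounds are all invented); Dataproc's actual autoscaling policies use YARN memory metrics and configurable scale factors rather than this formula.

```python
import math


def target_workers(pending_tasks: int, tasks_per_worker: int,
                   min_workers: int = 2, max_workers: int = 20) -> int:
    """Simplified autoscaling rule: size the cluster to the queued work,
    clamped so it never drops below min_workers or grows past max_workers."""
    needed = math.ceil(pending_tasks / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))


def split_workers(total: int, primary: int = 2) -> dict:
    """Keep a fixed number of regular (primary) workers and use preemptible
    VMs for the rest, since losing those only slows the job down."""
    return {"primary": min(total, primary), "preemptible": max(0, total - primary)}


for pending in (0, 95, 1000):
    n = target_workers(pending, tasks_per_worker=10)
    print(pending, "pending tasks ->", n, "workers", split_workers(n))
```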