Big Data Platform Flashcards
1
Q
What is Cloud Dataproc?
A
- Google’s managed Apache Hadoop and Spark service
- Billed by the second
- Runs on clusters owned by the customer
2
Q
What is Cloud Dataflow?
A
- Managed ETL pipelines
- Automated scaling and provisioning
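
To make "ETL pipeline" concrete, here is a minimal plain-Python sketch of the extract, transform, and load stages. It does not use the Cloud Dataflow SDK; the sample data and the function names (`extract`, `transform`, `load`) are hypothetical and only illustrate the shape of such a pipeline.

```python
import csv
import io

# Hypothetical raw input; the second data row is deliberately malformed.
RAW = "sensor,reading\na,1.5\nb,not_a_number\na,2.5\n"


def extract(text):
    """Extract: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transform: parse readings as floats and drop rows that fail to parse."""
    for row in rows:
        try:
            yield row["sensor"], float(row["reading"])
        except ValueError:
            continue


def load(pairs):
    """Load: sum readings per sensor into a dict (a stand-in for a real sink)."""
    totals = {}
    for sensor, value in pairs:
        totals[sensor] = totals.get(sensor, 0.0) + value
    return totals


totals = load(transform(extract(RAW)))
print(totals)
```

In a managed service, the stages would be distributed across workers and scaled automatically; here they are chained generators in a single process.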
3
Q
What is BigQuery?
A
- Analytics data warehouse
- Allows SQL queries
- Free monthly quota
- Datasets are bound to a location (region or multi-region)
- Billed for storage and for the query processing used
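
As an illustration of the kind of SQL an analytics warehouse answers, the sketch below runs an aggregate query against an in-memory SQLite database. SQLite is only a local stand-in here, not BigQuery; the table `page_views` and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("DE", 10), ("US", 25), ("DE", 5)],
)

# Typical analytic query: aggregate per group, then rank the groups.
rows = conn.execute(
    "SELECT country, SUM(views) AS total_views "
    "FROM page_views GROUP BY country ORDER BY total_views DESC"
).fetchall()
conn.close()

print(rows)
```

The query itself is plain `GROUP BY` aggregation; only the connection code is specific to SQLite.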
4
Q
What is Cloud Pub/Sub?
A
- Messaging bus
- Good for cases where data arrives at high and unpredictable rates (IoT, for example)
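
The toy below sketches why a messaging bus helps with bursty data: publishers append to a buffer as fast as data arrives, and each subscriber pulls at its own pace. This is an in-process illustration only, not the Cloud Pub/Sub client API; the class `ToyBus` and the topic and subscription names are hypothetical.

```python
from collections import defaultdict, deque


class ToyBus:
    """In-memory publish/subscribe buffer (illustrative, single process)."""

    def __init__(self):
        # topic -> {subscription name: queue of undelivered messages}
        self._queues = defaultdict(dict)

    def subscribe(self, topic, subscription):
        self._queues[topic][subscription] = deque()

    def publish(self, topic, message):
        # Every subscription on the topic receives its own copy.
        for queue in self._queues[topic].values():
            queue.append(message)

    def pull(self, topic, subscription, max_messages):
        # The subscriber decides how much to take, independent of publish rate.
        queue = self._queues[topic][subscription]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


bus = ToyBus()
bus.subscribe("telemetry", "dashboard")

# A burst of five messages arrives at once.
for i in range(5):
    bus.publish("telemetry", f"reading-{i}")

first = bus.pull("telemetry", "dashboard", 2)   # slow consumer takes two
rest = bus.pull("telemetry", "dashboard", 10)   # later drains the backlog
print(first, rest)
```

The burst is absorbed by the queue, so the consumer never has to keep up with the peak arrival rate.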
5
Q
What is Cloud Datalab?
A
- Managed “Jupyter playground” service
- Pay for the resources used; the number of notebooks does not matter
6
Q
What is Cloud ML Platform?
A
- ML platform with use-case-specific managed APIs, such as:
- Cloud Vision (image analysis)
- Cloud Speech (audio to text)
- Cloud Natural Language (reveal structure and meaning of text, extract information)
- Cloud Translation
- Cloud Video Intelligence (annotate videos)