GCP BigData General Flashcards
What are the GCP BigData services?
- Dataproc
- Dataflow
- BigQuery
- Cloud Pub/Sub
- Datalab
What is Dataproc?
It is a managed service for running:
- Hadoop
- MapReduce
- Pig
- Hive
What is Dataflow?
Stream and batch processing pipelines.
What is BigQuery?
It is a data warehouse with analytical capabilities.
Stream data at 100K rows per second.
For GCP Big Data services, do you need to provision and manage resources?
No, Google takes care of this for you.
How long will it take to deploy a Hadoop cluster using Dataproc?
About 90 seconds.
When deploying Hadoop with Dataproc, can I decide on the instance size and memory?
Yes
When running Hadoop with Dataproc, am I fixed to the size of the cluster I am currently using?
No, you can scale up or down as needed.
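Example (a rough sketch, not a definitive recipe): the two cards above, choosing the instance size at deploy time and resizing later, can be pictured as two gcloud invocations. The Python below only assembles and prints the command lines; it does not run them. The cluster name, region, machine type, and worker counts are made-up values for illustration, and the helper function names are hypothetical.

```python
def create_cluster_cmd(name, region, machine_type, num_workers):
    """Build the gcloud argv that creates a Dataproc cluster of a chosen size."""
    return [
        "gcloud", "dataproc", "clusters", "create", name,
        f"--region={region}",
        f"--master-machine-type={machine_type}",  # instance size for the master
        f"--worker-machine-type={machine_type}",  # instance size for the workers
        f"--num-workers={num_workers}",
    ]


def resize_cluster_cmd(name, region, num_workers):
    """Build the gcloud argv that changes the worker count of an existing cluster."""
    return [
        "gcloud", "dataproc", "clusters", "update", name,
        f"--region={region}",
        f"--num-workers={num_workers}",
    ]


# Hypothetical values, chosen only for illustration.
create = create_cluster_cmd("demo-cluster", "us-central1", "n1-standard-4", 2)
resize = resize_cluster_cmd("demo-cluster", "us-central1", 5)

# Print the commands instead of executing them.
print(" ".join(create))
print(" ".join(resize))
```

The machine type sets both CPU and memory for each node, so picking a different type is how you choose the instance size; the update command is how you scale the same cluster up or down afterwards.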
I need to monitor my Dataproc Hadoop cluster. What options do I have?
Use Stackdriver.