Big Data Solutions Flashcards
What is Cloud Pub/Sub?
A fully managed, asynchronous messaging service used to ingest and deliver events in data pipelines
Which open-source framework does Cloud Dataflow build on?
Apache Beam: Cloud Dataflow is a managed runner for pipelines written with the Apache Beam SDKs
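A Beam pipeline is a chain of transforms, each taking one collection and producing another. The sketch below imitates that shape in plain Python so it runs without the Beam SDK; it is not Beam itself, and the helper names (`flat_map`, `count_per_element`, `word_count`) are hypothetical. The comments name the real Beam transforms each step corresponds to.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# Plain-Python stand-in for a Beam-style pipeline: each step takes a
# collection and returns a new one, like a transform applied to a PCollection.
def flat_map(fn: Callable[[str], Iterable[str]], items: Iterable[str]) -> List[str]:
    return [out for item in items for out in fn(item)]

def count_per_element(items: Iterable[str]) -> List[Tuple[str, int]]:
    # Sorted so the output is deterministic (Beam makes no ordering promise).
    return sorted(Counter(items).items())

def word_count(lines: Iterable[str]) -> List[Tuple[str, int]]:
    words = flat_map(str.split, lines)  # roughly beam.FlatMap(str.split)
    return count_per_element(words)     # roughly beam.combiners.Count.PerElement()

print(word_count(["to be or", "not to be"]))
```

On Dataflow, the same steps would be written with the Beam SDK and the service would provision and scale the workers that execute them.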
Which open-source frameworks does Cloud Dataproc run?
Apache Spark and Apache Hadoop: Cloud Dataproc provides managed Spark and Hadoop clusters
What is BigQuery?
BigQuery is a fully managed, serverless data warehouse used to analyze large amounts of data
In what language are BigQuery queries written?
SQL (BigQuery's GoogleSQL dialect; the older legacy SQL dialect is also supported)
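To show the kind of SQL involved, the sketch below runs an aggregate query against a tiny in-memory table. It uses SQLite from the Python standard library as a local stand-in, not BigQuery; the table borrows a few column names from the public BigQuery table `bigquery-public-data.usa_names.usa_1910_2013`, and the rows are made-up sample values.

```python
import sqlite3

# Local stand-in: SQLite, not BigQuery. Sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usa_names (state TEXT, year INTEGER, name TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO usa_names VALUES (?, ?, ?, ?)",
    [("CA", 2000, "Ana", 10), ("TX", 2000, "Ana", 5), ("CA", 2001, "Ben", 7)],
)

# The same query shape could be submitted to BigQuery, where the table would be
# referenced as `bigquery-public-data.usa_names.usa_1910_2013`.
QUERY = """
SELECT name, SUM(number) AS total
FROM usa_names
GROUP BY name
ORDER BY total DESC
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```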
What services can Cloud Pub/Sub integrate with?
Cloud Logging, Cloud APIs, Cloud Dataflow, Cloud Storage, and Compute Engine
What is the primary difference between Cloud Dataflow and Cloud Dataproc?
With Cloud Dataproc you provision and size your own cluster of VMs; Cloud Dataflow is serverless and manages its workers for you
What types of instances are available for Cloud Dataproc jobs?
Standard Compute Engine VM instances, plus lower-cost preemptible instances (which can be interrupted) as secondary workers
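The two instance types show up as separate instance groups in a cluster definition. The sketch below builds a request body as a Python dict, following the field names of the Dataproc v1 REST `ClusterConfig` and `InstanceGroupConfig` resources as I understand them; the cluster name, machine types, and instance counts are hypothetical, and the fragment is not sent to any API here.

```python
import json

cluster_request = {
    "clusterName": "example-cluster",  # hypothetical name
    "config": {
        "masterConfig": {"numInstances": 1, "machineTypeUri": "n1-standard-4"},
        # Primary workers: standard Compute Engine VMs.
        "workerConfig": {"numInstances": 2, "machineTypeUri": "n1-standard-4"},
        # Secondary workers: cheaper capacity that Compute Engine may reclaim.
        "secondaryWorkerConfig": {"numInstances": 4, "preemptibility": "PREEMPTIBLE"},
    },
}
print(json.dumps(cluster_request, indent=2))
```

Keeping preemptible capacity in the secondary group means losing those VMs slows a job down rather than taking the cluster's core workers with it.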
What is Cloud IoT Core?
Cloud IoT Core is a fully managed Google Cloud service for securely connecting and managing IoT devices and ingesting their data
Which protocols does Cloud IoT Core use?
It supports both MQTT and HTTP; MQTT is generally preferred because it keeps a persistent, low-overhead connection, while HTTP is connectionless
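Over MQTT, a device connects to the Cloud IoT Core MQTT bridge (`mqtt.googleapis.com`, port 8883) and uses fixed naming conventions for its client ID and topics. The helpers below only build those strings; the project, region, registry, and device IDs in the usage are hypothetical, and no connection is made.

```python
def mqtt_client_id(project: str, region: str, registry: str, device: str) -> str:
    # Full device path used as the MQTT client ID.
    return f"projects/{project}/locations/{region}/registries/{registry}/devices/{device}"

def telemetry_topic(device: str) -> str:
    # Devices publish telemetry events here; IoT Core forwards them to Pub/Sub.
    return f"/devices/{device}/events"

def config_topic(device: str) -> str:
    # Devices subscribe here to receive configuration updates sent from the cloud.
    return f"/devices/{device}/config"

print(mqtt_client_id("my-project", "us-central1", "my-registry", "sensor-1"))
print(telemetry_topic("sensor-1"))
```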
What can you do with Cloud IoT Core?
You can register, configure, update, and control IoT devices
How much data can be loaded into BigQuery?
BigQuery scales to petabytes of data; loaded data lives in tables, and every table belongs to a dataset, so you need at least one dataset before you can load anything
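The project, dataset, and table hierarchy is visible in a fully qualified table ID, written `project.dataset.table`. The sketch below parses such an ID; `TableRef` and `parse_table_id` are hypothetical helpers, and the parser deliberately ignores domain-scoped project IDs, which can contain extra characters.

```python
from typing import NamedTuple

class TableRef(NamedTuple):
    project: str
    dataset: str  # every table must live inside a dataset
    table: str

def parse_table_id(table_id: str) -> TableRef:
    # Simplified: expects exactly three non-empty, dot-separated parts.
    parts = table_id.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected project.dataset.table, got {table_id!r}")
    return TableRef(*parts)

print(parse_table_id("my-project.sales.orders"))
```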
What is a publisher?
An application that creates messages and sends (publishes) them to a topic
What is a topic?
A topic is a named resource to which publishers send messages
What is a message?
The data (plus optional attributes) that a publisher sends to a topic; it is data in transit through Pub/Sub
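The publisher, topic, and message cards fit together as in the in-memory model below. It is a teaching sketch, not the Pub/Sub client library: every class here is hypothetical, and message IDs are simple counters. With the real `google-cloud-pubsub` library, a publisher would instead call `pubsub_v1.PublisherClient().publish(topic_path, data, **attributes)` with a topic path of the form `projects/PROJECT/topics/TOPIC`.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Message:
    data: bytes                 # payload is raw bytes
    attributes: Dict[str, str]  # optional key/value metadata
    message_id: str

@dataclass
class Topic:
    name: str
    # Each subscription gets its own copy of every message published after it exists.
    subscriptions: Dict[str, List[Message]] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count)

    def subscribe(self, subscription: str) -> None:
        self.subscriptions.setdefault(subscription, [])

    def publish(self, data: bytes, **attributes: str) -> str:
        msg = Message(data, dict(attributes), str(next(self._ids)))
        for queue in self.subscriptions.values():
            queue.append(msg)
        return msg.message_id

class Publisher:
    """An application that creates messages and sends them to a topic."""
    def __init__(self, topic: Topic) -> None:
        self.topic = topic

    def send(self, payload: str, **attributes: str) -> str:
        return self.topic.publish(payload.encode("utf-8"), **attributes)

topic = Topic("projects/my-project/topics/orders")  # hypothetical topic path
topic.subscribe("billing")
message_id = Publisher(topic).send("order 42", source="web")
print(message_id, topic.subscriptions["billing"][0])
```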