1_Data Processing Fundamentals Flashcards
1
Q
Data Lifecycle
- Ingestion is the process of bringing application data, streaming data, and batch data into the cloud.
- Storage stage focuses on persisting data to an appropriate storage system.
- Processing and analyzing is about transforming data into a form suitable for analysis.
- Exploring and visualizing focuses on testing hypotheses and drawing insights from data.
A
2
Q
Batch Data
- Batch data is ingested in bulk, typically in files.
- Examples of batch data ingestion include uploading files of data exported from one application to be processed by another.
- Batch data consists of large sets of data that ‘pool’ up over time.
- Low latency is less critical than it is for streaming data.
- Both batch and streaming data can be transformed and processed using Cloud Dataflow.
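To make the batch pattern concrete, here is a plain-Python sketch (not a Cloud Dataflow pipeline; the file contents and field names are invented for illustration): the whole exported file is read and transformed in one pass, rather than record by record as data arrives.

```python
import csv
import io

# Illustrative only: a file of records exported from one application,
# to be processed in bulk by another. The data is made up.
EXPORTED_FILE = io.StringIO(
    "user_id,amount\n"
    "u1,10.50\n"
    "u2,3.25\n"
    "u1,7.00\n"
)

def process_batch(fileobj):
    """Aggregate the total amount per user across the entire batch."""
    totals = {}
    for row in csv.DictReader(fileobj):
        totals[row["user_id"]] = totals.get(row["user_id"], 0.0) + float(row["amount"])
    return totals

print(process_batch(EXPORTED_FILE))  # {'u1': 17.5, 'u2': 3.25}
```

Because the batch has "pooled up" before processing starts, the job sees every record at once; latency is traded for throughput and simplicity.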
A
3
Q
Streaming Data
- Streaming data is sent in small messages transmitted continuously from the data source.
- Streaming data may be telemetry data (generated at regular intervals) or event data (generated in response to a particular event).
- Stream ingestion services need to deal with potentially late and missing data.
- Requires low latency.
- Streaming data is often ingested using Cloud Pub/Sub.
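The late-data problem mentioned above can be sketched in plain Python (this is an illustrative simulation, not Cloud Pub/Sub or Dataflow code; the function name and the watermark rule are invented for the example): a consumer tracks a watermark that trails the latest event time seen, and any message whose event time falls behind the watermark is treated as late.

```python
# Illustrative sketch only: classify stream messages as on-time or late
# using a simple watermark that trails the maximum event time seen so far
# by `allowed_lateness`.

def split_on_time_and_late(messages, allowed_lateness):
    """Each message is (event_time, payload).

    Returns (on_time_payloads, late_payloads).
    """
    watermark = float("-inf")
    on_time, late = [], []
    for event_time, payload in messages:
        if event_time < watermark:
            late.append(payload)   # arrived after the watermark passed its slot
        else:
            on_time.append(payload)
        watermark = max(watermark, event_time - allowed_lateness)
    return on_time, late

# Message "d" (event time 3) arrives after a much newer message
# (event time 10) has already advanced the watermark, so it is late.
stream = [(1, "a"), (2, "b"), (10, "c"), (3, "d"), (9, "e")]
print(split_on_time_and_late(stream, allowed_lateness=2))
# (['a', 'b', 'c', 'e'], ['d'])
```

Real stream-processing systems (e.g. Dataflow's windowing model) use far more sophisticated watermarks, but the core trade-off is the same: a larger allowed lateness catches more stragglers at the cost of delayed results.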
A
4
Q
Data Processing Solutions
A
5
Q
Levels of structure of data
- These levels are structured, semi-structured, and unstructured.
- Structured data has a fixed schema, such as a relational database table.
- Semi-structured data has a schema that can vary from record to record; the schema is stored with the data.
- Unstructured data has no schema that determines how it is stored; examples include free text, images, and audio.
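The three levels can be shown side by side with the same information (a Python sketch; the customer records and field names are invented for illustration):

```python
import json

# Structured: fixed schema — every record has exactly the same columns,
# as in a relational database table.
structured_rows = [
    ("c1", "Ada", "US"),
    ("c2", "Lin", "DE"),
]

# Semi-structured: the schema travels with the data (the JSON keys)
# and can vary between records.
semi_structured = [
    {"id": "c1", "name": "Ada", "country": "US"},
    {"id": "c2", "name": "Lin", "loyalty": {"tier": "gold"}},  # different fields
]

# Unstructured: no schema at all; interpretation is left to the consumer.
unstructured = "Ada (US) and Lin signed up last week; Lin is a gold member."

# Semi-structured records serialize together with their own field names:
print(json.dumps(semi_structured[1]))
```

Note that the two semi-structured records carry different keys, which a fixed relational schema would not allow without NULL-able columns or a schema change.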
A
6
Q
Choosing a datastore
A