1_Data Processing fundamentals Flashcards
1
Q
Data Lifecycle
- Ingestion is the process of bringing application data, streaming data, and batch data into the cloud.
- The storage stage focuses on persisting data to an appropriate storage system.
- The processing and analyzing stage transforms data into a form suitable for analysis.
- The exploring and visualizing stage focuses on testing hypotheses and drawing insights from data.
![](https://s3.amazonaws.com/brainscape-prod/system/cm/326/837/406/q_image_thumb.png?1605049952)
A
2
Q
Batch Data
- Batch data is ingested in bulk, typically in files.
- Examples of batch data ingestion include uploading files of data exported from one application to be processed by another.
- Batch data consists of large data sets that ‘pool’ up over time.
- Low latency is less important than it is for streaming data.
- Both batch and streaming data can be transformed and processed using Cloud Dataflow.
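
As a rough illustration of bulk ingestion, the sketch below reads an entire exported file in one pass and aggregates it. The CSV contents, the column names, and the `ingest_batch` helper are all hypothetical and use only the Python standard library; this is not Cloud Dataflow code.

```python
import csv
import io

# Hypothetical export from an upstream application. In practice this would be
# a file on disk or in object storage rather than an inline string.
EXPORTED_CSV = """order_id,customer,amount
1,alice,30.00
2,bob,12.50
3,alice,7.50
"""


def ingest_batch(file_obj):
    """Read every row of an exported file in one pass and total amounts per customer."""
    totals = {}
    for row in csv.DictReader(file_obj):
        customer = row["customer"]
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return totals


batch_totals = ingest_batch(io.StringIO(EXPORTED_CSV))
print(batch_totals)
```

The whole file is available before processing starts, which is why latency matters less here than throughput.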
A
3
Q
Streaming Data
- Streaming data is a set of data that is sent in small messages that are transmitted continuously from the data source.
- Streaming data may be telemetry data, which is generated at regular intervals, or event data, which is generated in response to a particular event.
- Stream ingestion services need to deal with potentially late and missing data.
- Requires low latency.
- Streaming data is often ingested using Cloud Pub/Sub.
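
One common way to handle late data is to group messages into event-time windows and accept arrivals only up to a grace period. The sketch below is a simplified, in-memory illustration of that idea. The window size, the grace period, the `(event_time, arrival_time)` message shape, and the `process_stream` helper are assumptions made for the example, not the Cloud Pub/Sub or Cloud Dataflow API.

```python
from collections import defaultdict

WINDOW_SECONDS = 60    # assumed tumbling-window size
ALLOWED_LATENESS = 30  # assumed grace period after a window closes, in seconds


def process_stream(messages):
    """Count messages per event-time window and set aside those that arrive too late.

    Each message is a hypothetical (event_time, arrival_time) pair in seconds.
    """
    counts = defaultdict(int)
    dropped = []
    for event_time, arrival_time in messages:
        # Assign the message to the window that contains its event time.
        window_start = (event_time // WINDOW_SECONDS) * WINDOW_SECONDS
        window_end = window_start + WINDOW_SECONDS
        if arrival_time > window_end + ALLOWED_LATENESS:
            dropped.append((event_time, arrival_time))
        else:
            counts[window_start] += 1
    return dict(counts), dropped


messages = [
    (5, 6),     # on time
    (50, 70),   # late, but within the grace period
    (20, 200),  # too late for its window
    (65, 66),   # on time, next window
]
window_counts, too_late = process_stream(messages)
print(window_counts, too_late)
```

A production stream processor would also decide when to emit each window's result; this sketch only shows the accept-or-drop decision.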
A
4
Q
Data Processing Solutions
![](https://s3.amazonaws.com/brainscape-prod/system/cm/388/618/551/q_image_thumb.png?1656026759)
A
5
Q
Levels of structure of data
- These levels are structured, semi-structured, and unstructured.
- Structured data has a fixed schema, such as a relational database table.
- Semi-structured data has a schema that can vary; the schema is stored with data.
- Unstructured data has no predefined structure that determines how the data is stored.
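
The three levels can be illustrated with small in-memory samples. The field names and values below are made up for the example.

```python
import json

# Structured: every row follows the same fixed schema, as in a relational table.
SCHEMA = ("id", "name", "age")
structured_rows = [(1, "Ada", 36), (2, "Lin", 41)]

# Semi-structured: each record carries its own field names, so fields can vary.
semi_structured = [
    json.loads('{"id": 1, "name": "Ada", "email": "ada@example.com"}'),
    json.loads('{"id": 2, "name": "Lin", "tags": ["admin"]}'),
]

# Unstructured: free text (or images, audio) with no schema to guide storage.
unstructured = "Customer called to say the delivery arrived two days late."

# The schema travels with each semi-structured record, so it can be read back.
fields_per_record = [sorted(record.keys()) for record in semi_structured]
print(fields_per_record)
```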
A
6
Q
Choosing a datastore
![](https://s3.amazonaws.com/brainscape-prod/system/cm/326/855/076/q_image_thumb.png?1656021009)
A