Big Data Flashcards
1
Q
What defines Big Data (3V)
A
Volume, Velocity, Variety
2
Q
What is Volume
A
The scale of information being handled by a data processing system
3
Q
What is Velocity
A
The speed at which data is being processed: ingested, analyzed, and visualized
4
Q
What is Variety
A
The diversity of data sources, formats, and quality
5
Q
Data Warehouses
A
- Structured or Processed: Data is organized, may have been transformed, and is stored in a structured way
- Ready to use: Data exists in the warehouse for a defined purpose, and in a format where it is ready to be consumed
- Rigid: Data may be easier to understand, but less up-to-date. Structures are hard to change
6
Q
Data Lakes
A
- Raw or Unstructured: The data lake contains all raw unprocessed data, before any kind of transformation or organization
- Ready to analyze: Data is more up to date, but may require more advanced tools for analysis
- Flexible: No structure is enforced, so new types of data can be added at any time
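The contrast between cards 5 and 6 can be sketched in a few lines. This is a minimal illustration, not how any particular warehouse or lake product works: an in-memory SQLite table stands in for the warehouse, a list of raw JSON strings stands in for the lake, and the `sales` table and event fields are hypothetical.

```python
import json
import sqlite3

# Warehouse-style: the structure is fixed up front and enforced on write.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (sku TEXT NOT NULL, qty INTEGER NOT NULL)")
warehouse.execute("INSERT INTO sales VALUES (?, ?)", ("A-1", 2))

try:
    # A row that does not fit the structure is rejected.
    warehouse.execute("INSERT INTO sales VALUES (?, ?)", ("B-7", None))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Lake-style: raw events are kept as-is, whatever their shape.
lake = [
    '{"sku": "A-1", "qty": 2}',
    '{"sku": "B-7", "qty": 1, "coupon": "SPRING"}',
    '{"event": "page_view", "url": "/home"}',
]

# Structure is only applied when the data is read for analysis.
sales_from_lake = [e for e in map(json.loads, lake) if "sku" in e]
total_qty = sum(e["qty"] for e in sales_from_lake)
```

The warehouse refuses the malformed row, while the lake accepts a new event type without any change and leaves the interpretation to the reader of the data.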
7
Q
OLTP
A
- High volume of short transactions
- Fast queries
- High integrity
MODIFY DATA
8
Q
OLAP
A
- Low volume of long-running queries
- Aggregated historical data
QUERY DATA
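The difference between the OLTP and OLAP workloads on cards 7 and 8 can be sketched with Python's built-in `sqlite3` module. The `orders` table and the `place_order` helper are hypothetical, and a real system would usually run the two workloads on separate databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
)


def place_order(customer, amount):
    """OLTP-style: one short transaction that modifies data."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )


for customer, amount in [("alice", 20.0), ("bob", 35.5), ("alice", 12.5)]:
    place_order(customer, amount)

# OLAP-style: one read-only query that aggregates the accumulated history.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('alice', 32.5), ('bob', 35.5)]
```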
9
Q
Stages of a Data Pipeline
A
- Ingestion
- Storage
- Processing
- Visualization
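The four stages above can be sketched as one small function each. This is a toy example under stated assumptions: the CSV text, the in-memory list used as storage, and all function names are made up for illustration.

```python
import csv
import io
from collections import Counter

RAW_CSV = "page,visits\nhome,3\ndocs,5\nhome,2\n"


def ingest(text):
    """Ingestion: read raw records from a source."""
    return list(csv.DictReader(io.StringIO(text)))


def store(rows, storage):
    """Storage: persist the records (here, just an in-memory list)."""
    storage.extend(rows)
    return storage


def process(storage):
    """Processing: aggregate visits per page."""
    totals = Counter()
    for row in storage:
        totals[row["page"]] += int(row["visits"])
    return dict(totals)


def visualize(totals):
    """Visualization: render a plain-text bar chart."""
    return "\n".join(
        f"{page:<5} {'#' * count}" for page, count in sorted(totals.items())
    )


storage = []
store(ingest(RAW_CSV), storage)
totals = process(storage)
chart = visualize(totals)
print(chart)
```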
10
Q
Data ingestion Technical Challenges
A
- choose the correct compute and storage options. Otherwise, a solution can be too expensive or too slow
- data should have value
- security of data
11
Q
Common data transformations
A
- formatting
- labeling
- filtering
- validating
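The four transformations above can be sketched on a handful of records. The records, field names, accepted date formats, and the adult/minor rule are all assumptions chosen for illustration.

```python
from datetime import datetime

RAW_RECORDS = [
    {"name": " Alice ", "signup": "2023-01-15", "age": "34"},
    {"name": "bob", "signup": "15/02/2023", "age": "17"},
    {"name": "", "signup": "2023-03-01", "age": "abc"},
]


def is_valid(record):
    """Validating: require a non-empty name and a numeric age."""
    return bool(record["name"].strip()) and record["age"].isdigit()


def parse_date(text):
    """Accept either of two date layouts and return a date object."""
    for pattern in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, pattern).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text}")


def format_record(record):
    """Formatting: normalize names, dates, and types."""
    return {
        "name": record["name"].strip().title(),
        "signup": parse_date(record["signup"]).isoformat(),
        "age": int(record["age"]),
    }


def label_record(record):
    """Labeling: attach a derived category to each record."""
    return {**record, "segment": "adult" if record["age"] >= 18 else "minor"}


valid = [r for r in RAW_RECORDS if is_valid(r)]
labeled = [label_record(format_record(r)) for r in valid]
# Filtering: keep only the records needed downstream.
adults = [r for r in labeled if r["segment"] == "adult"]
print(adults)
```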
12
Q
Stages of Data Modeling
A
- Conceptual. What are the entities in my data? What are their attributes and relationships?
- Logical
- Physical
13
Q
Google Cloud Storage (GCS)
A
- Fully managed object storage: For unstructured data such as images and videos. Access via API or programmatic SDKs
- Multiple storage classes: Instant access in all classes. Lifecycle management for objects and buckets
- Secure and durable: Secure access control. High availability and maximum durability
14
Q
Google Cloud Storage concepts (buckets)
A
- a bucket is a logical container for objects
- buckets exist within projects
- bucket names exist within a global namespace
- a bucket can be:
  - regional
  - dual-regional
  - multi-regional
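The bucket concepts above can be modeled with a toy in-memory class. This is not the Google Cloud client library: `ToyStorageService`, `BucketNameTaken`, and the project and bucket names are all hypothetical. It only illustrates that a bucket belongs to one project while its name must be unique across every project.

```python
class BucketNameTaken(Exception):
    """Raised when a bucket name is already used anywhere in the service."""


class ToyStorageService:
    """In-memory stand-in for an object store; not the real GCS client."""

    LOCATION_TYPES = {"regional", "dual-regional", "multi-regional"}

    def __init__(self):
        # A single dict shared by all projects: this is the global namespace.
        self._buckets = {}

    def create_bucket(self, project, name, location_type):
        if location_type not in self.LOCATION_TYPES:
            raise ValueError(f"unknown location type: {location_type}")
        if name in self._buckets:
            raise BucketNameTaken(name)
        self._buckets[name] = {
            "project": project,
            "location_type": location_type,
            "objects": {},  # the bucket is a logical container for objects
        }

    def upload(self, bucket, object_name, data):
        self._buckets[bucket]["objects"][object_name] = data

    def list_buckets(self, project):
        return sorted(
            name for name, b in self._buckets.items() if b["project"] == project
        )


service = ToyStorageService()
service.create_bucket("project-a", "team-photos", "regional")
service.upload("team-photos", "2024/offsite.jpg", b"...")

try:
    # Same name, different project: still a conflict.
    service.create_bucket("project-b", "team-photos", "multi-regional")
    name_conflict = False
except BucketNameTaken:
    name_conflict = True
```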
15
Q
Storage classes in GCS
A
- Standard
- Nearline
- Coldline
- Archive
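A rough way to remember the four classes is by how often the data is read. The sketch below is only a rule of thumb: `suggest_storage_class` is a hypothetical helper, and the 30, 90, and 365 day cutoffs are assumptions based on the minimum storage durations usually quoted for Nearline, Coldline, and Archive, so check them against current Google Cloud documentation.

```python
def suggest_storage_class(days_between_accesses: int) -> str:
    """Map a typical gap between reads to a storage class name."""
    if days_between_accesses < 0:
        raise ValueError("days_between_accesses must be non-negative")
    if days_between_accesses < 30:
        return "Standard"   # frequently accessed data
    if days_between_accesses < 90:
        return "Nearline"   # read roughly once a month or less
    if days_between_accesses < 365:
        return "Coldline"   # read roughly once a quarter or less
    return "Archive"        # read less than once a year


for days in (7, 45, 120, 400):
    print(days, suggest_storage_class(days))
```

Colder classes trade cheaper storage for higher retrieval costs, which is why access frequency is the usual deciding factor.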