Big Data Flashcards
What are the 3Vs of big data and what do they mean?
Volume: Large amounts of data
Variety: In many different forms, from diverse sources
Velocity: Data is generated, updated, and must be processed at high speed
What is the ETL cycle?
Extract: Read raw/semi-structured data from its sources and convert it into structured data
Transform: Convert units, join data structures, cleanup, etc.
Load: Load the data into another system for further processing
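The three ETL steps can be sketched in a few lines of Python. This is only an illustration, assuming a hypothetical input of "city,temp_f" text lines and a plain list standing in for the target system; the function names and the data are made up for the example.

```python
def extract(raw_lines):
    """Extract: parse raw 'city,temp_f' text lines into structured dicts."""
    records = []
    for line in raw_lines:
        city, temp_f = line.strip().split(",")
        records.append({"city": city, "temp_f": float(temp_f)})
    return records


def transform(records):
    """Transform: clean up city names and convert Fahrenheit to Celsius."""
    return [
        {
            "city": r["city"].strip().title(),
            "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1),
        }
        for r in records
    ]


def load(records, target):
    """Load: append records to the target store (a list stands in for a database)."""
    target.extend(records)
    return len(records)


warehouse = []  # hypothetical target system
raw = ["berlin,68.0\n", " oslo ,50.0\n"]  # hypothetical raw input
loaded = load(transform(extract(raw)), warehouse)
```

Keeping each step in its own function mirrors the cycle: each stage only depends on the output of the previous one.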
What is the difference between stream and batch processing?
Batch processing assumes all data exists in some store, and processes all data at once
Stream processing does not assume all data exists in some store, and processes data as it arrives in the system
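A small Python sketch of the contrast, assuming the task is computing a mean over hypothetical sensor readings: the batch version needs the whole dataset up front, while the stream version updates its result as each value arrives and never holds all the data. The function names are illustrative only.

```python
def batch_mean(store):
    """Batch: the full dataset already exists, so process it in one pass."""
    return sum(store) / len(store)


def stream_means(source):
    """Stream: emit an updated running mean after each arriving value."""
    count, total = 0, 0.0
    for value in source:
        count += 1
        total += value
        yield total / count


readings = [2.0, 4.0, 6.0]  # hypothetical data
batch_result = batch_mean(readings)
# iter() stands in for a live feed that delivers one value at a time.
stream_results = list(stream_means(iter(readings)))
```

The stream version produces an answer after every element, so a result is available before the input ends (or even if it never ends).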
What are the three basic data types?
Unstructured: Data with no known or predefined format (e.g. free text, images)
Semi-structured: Data with a known format but no fixed schema (e.g. JSON, XML)
Structured: Data with a known format and a fixed schema, organized in tables/graphs
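To make the three types concrete, the sketch below represents the same made-up order in each form, using only the Python standard library. The example record and the table name `orders` are assumptions for illustration.

```python
import json
import sqlite3

# Unstructured: free text; a program cannot rely on any particular layout.
unstructured = "Alice from Berlin ordered 3 widgets."

# Semi-structured: a known format (here JSON) with named fields.
semi_structured = json.loads('{"customer": "Alice", "city": "Berlin", "qty": 3}')

# Structured: rows in a table with declared columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, city TEXT, qty INTEGER)")
conn.execute(
    "INSERT INTO orders VALUES (:customer, :city, :qty)", semi_structured
)
rows = conn.execute("SELECT customer, qty FROM orders").fetchall()
conn.close()
```

Only the structured form can be queried by column directly; the free-text form would first need parsing to recover the same fields.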