Big Data Flashcards
Big data
Data that is too large/complex to be processed or analysed with traditional data processing techniques
What makes data “big data”?
- Many data points
- Many variables
- High frequency
- Complex data
Volume
Large amounts of data generated from various sources
Velocity
The speed at which data is created and processed
Variety
Data that has many forms (structured, unstructured, text, multimedia)
Veracity
The accuracy and trustworthiness of data, which may be incomplete or inconsistent
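A minimal sketch of how veracity problems might be flagged in practice. The records, field names, and validity rules (age between 0 and 120, upper-case country code) are all hypothetical and chosen only for illustration.

```python
# Hypothetical records: one is incomplete, one is inconsistent.
records = [
    {"id": 1, "age": 34, "country": "NL"},
    {"id": 2, "age": None, "country": "NL"},  # incomplete: age is missing
    {"id": 3, "age": -5, "country": "nl"},    # inconsistent: bad age, bad casing
]

def is_complete(record):
    """A record is complete when no field is missing (None)."""
    return all(value is not None for value in record.values())

def is_consistent(record):
    """Assumed rules: age within 0..120, country code in upper case."""
    age_ok = record["age"] is None or 0 <= record["age"] <= 120
    country_ok = record["country"].isupper()
    return age_ok and country_ok

incomplete_ids = [r["id"] for r in records if not is_complete(r)]
inconsistent_ids = [r["id"] for r in records if not is_consistent(r)]

print(incomplete_ids)    # [2]
print(inconsistent_ids)  # [3]
```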
Structured data
All data follows the same format, making it easier to store and search
Unstructured data
Data with no predefined format or schema, often a mix of different file types (e.g. free text, images, video)
Semi-structured data
Structured in form, but with fewer constraints than structured data (e.g. JSON, XML)
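The difference between structured and semi-structured data can be sketched with a small example. The sample CSV and JSON content and the helper `same_fields` are hypothetical: the CSV rows all share the same columns, while the JSON records share a form but are free to carry different fields.

```python
import csv
import io
import json

# Structured: every row follows the same format (same columns).
csv_text = "id,name,age\n1,Ana,34\n2,Ben,29\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
structured_fields = [set(row.keys()) for row in rows]

# Semi-structured: records share a JSON form, but fields may vary per record.
json_text = '[{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben", "tags": ["vip"]}]'
records = json.loads(json_text)
semi_structured_fields = [set(record.keys()) for record in records]

def same_fields(field_sets):
    """Return True if every record exposes exactly the same set of fields."""
    return all(fields == field_sets[0] for fields in field_sets)

print(same_fields(structured_fields))       # True
print(same_fields(semi_structured_fields))  # False
```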
DIKW Model
- Data: raw data (red)
- Information: meaning of the data (Traffic light has turned red)
- Knowledge: context of the data (The traffic light I’m driving towards has turned red)
- Wisdom: the data is applied (Stop the car)
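The traffic-light example can be walked through as a small pipeline from data to information, knowledge, and wisdom. This is only an illustrative sketch: the function names and the "Keep driving" fallback are assumptions, not part of the model itself.

```python
def to_information(raw_colour):
    # Information: give the raw value a meaning.
    return f"Traffic light has turned {raw_colour}"

def to_knowledge(raw_colour, heading_towards_light):
    # Knowledge: put the information in context (does it affect me?).
    return raw_colour == "red" and heading_towards_light

def to_wisdom(must_react):
    # Wisdom: apply what is known by choosing an action.
    # "Keep driving" is a hypothetical fallback added for this sketch.
    return "Stop the car" if must_react else "Keep driving"

data = "red"  # Data: the raw observation
information = to_information(data)
knowledge = to_knowledge(data, heading_towards_light=True)
action = to_wisdom(knowledge)

print(information)  # Traffic light has turned red
print(action)       # Stop the car
```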
DRIP
Data Rich, Information Poor
Statistics
Data is usually collected to (dis)prove a hypothesis
Data science
Analysing data that has already been collected for other reasons
Analytics
Describe - identify trends
Diagnose - analyse past data to identify why something happened
Predict - predict future trends/events
Prescribe - recommend actions based on the prediction
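The four kinds of analytics can be illustrated on a tiny data set. Everything here is hypothetical: the monthly sales and ad-spend figures, the naive linear forecast, and the stock capacity of 135 are assumptions made only to show how each step builds on the previous one.

```python
from statistics import mean

# Hypothetical monthly figures.
sales = [100, 110, 120, 130]
ad_spend = [10, 11, 12, 13]

# Describe: identify the trend (average month-over-month change in sales).
sales_changes = [later - earlier for earlier, later in zip(sales, sales[1:])]
avg_change = mean(sales_changes)

# Diagnose: look at past data for a possible reason
# (did sales rise in the same months that ad spend rose?).
ad_changes = [later - earlier for earlier, later in zip(ad_spend, ad_spend[1:])]
moved_together = all((s > 0) == (a > 0) for s, a in zip(sales_changes, ad_changes))

# Predict: naive forecast that simply continues the average trend.
forecast = sales[-1] + avg_change

# Prescribe: turn the prediction into an action (assumed capacity threshold).
capacity = 135
action = "increase stock" if forecast > capacity else "hold stock"

print(avg_change)      # 10
print(moved_together)  # True
print(forecast)        # 140
print(action)          # increase stock
```

Moving together in the Diagnose step only suggests a possible cause; it does not prove that ad spend drove the sales.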