Section 12 Chapter 72 - Big Data Flashcards
Big Data
Data that won’t fit into the usual containers due to its volume, velocity or variety
Volume of Big Data
Too big to fit on a single server
Velocity of Big Data
Data arrives at high speed and must be processed within a very short time (milliseconds to seconds)
Variety of Big Data
Data comes in many forms: structured or unstructured, including text and multimedia
Why the lack of structure common in Big Data is a problem (2)
- Analysing it is difficult
- Relational databases are unsuitable because they require the data to fit a row-and-column format
How Big Data can be analysed
Machine learning techniques that find patterns in the data
Why the volume of Big Data is important
Relational databases don’t scale well across multiple machines
Which programming paradigm is most suited to Big Data?
Functional
How Big Data can be processed (with regard to volume)
Processing is distributed across multiple machines
Why functional programming is suited to Big Data (3)
- No side effects (statelessness)
- Higher-order functions (e.g. map, filter, reduce)
- Immutable data (values are not reassigned), which makes parallel programming much easier
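The three points above can be illustrated with a minimal Python sketch. The word lists and the function names `count_words` and `merge_counts` are made up for illustration, and the chunks are processed one after another here; on a real system each chunk could be handled by a different machine, which is safe because no function changes any shared data.

```python
from collections import Counter
from functools import reduce

# Hypothetical data set, already split into chunks as if each chunk
# were held on a different machine.
chunks = [
    ["big", "data", "big"],
    ["data", "velocity"],
    ["big", "variety"],
]

def count_words(chunk):
    # No side effects: the result depends only on the argument,
    # and nothing outside the function is modified.
    return Counter(chunk)

def merge_counts(left, right):
    # Returns a new Counter instead of updating either argument.
    return left + right

# map is a higher-order function: it takes count_words as an argument.
partial_counts = list(map(count_words, chunks))

# reduce combines the partial results into a single total.
totals = reduce(merge_counts, partial_counts, Counter())

print(dict(totals))
```

Because each call to `count_words` is independent of the others, the order in which the chunks are processed does not affect the final totals.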
Fact based model
An alternative to the relational data model. Immutable facts are recorded with timestamps rather than overwritten.
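As a rough sketch of the idea, assuming a fact is held as an (entity, attribute, value, timestamp) tuple: the names `add_fact` and `current_value` and the sample data are hypothetical, not part of any particular system.

```python
from datetime import datetime

def add_fact(facts, entity, attribute, value, timestamp):
    # Append only: returns a new tuple of facts.
    # Existing facts are never changed or removed.
    return facts + ((entity, attribute, value, timestamp),)

def current_value(facts, entity, attribute):
    # The current value comes from the most recent matching fact.
    matching = [f for f in facts if f[0] == entity and f[1] == attribute]
    if not matching:
        return None
    return max(matching, key=lambda f: f[3])[2]

facts = ()
facts = add_fact(facts, "user42", "town", "Leeds", datetime(2020, 1, 5))
facts = add_fact(facts, "user42", "town", "York", datetime(2023, 6, 1))

print(current_value(facts, "user42", "town"))  # York
print(len(facts))                              # 2, the Leeds fact is still stored
```

A relational table would typically overwrite "Leeds" with "York"; here both facts are kept, so the full history remains available.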
Graph Schema
Data is stored as nodes, properties and edges
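A small sketch of this structure, using plain Python dictionaries and tuples rather than a real graph database. The people, course, and relationship names are invented for illustration, as is the helper `neighbours`.

```python
# Nodes: each node has a name and a dictionary of properties.
nodes = {
    "alice": {"type": "person", "age": 17},
    "bob": {"type": "person", "age": 18},
    "computing": {"type": "course", "level": "A-level"},
}

# Edges: (source node, relationship, target node).
edges = [
    ("alice", "friend_of", "bob"),
    ("alice", "studies", "computing"),
    ("bob", "studies", "computing"),
]

def neighbours(node, relationship):
    # Follow every edge of the given relationship that starts at node.
    return [target for source, rel, target in edges
            if source == node and rel == relationship]

print(neighbours("alice", "studies"))    # ['computing']
print(neighbours("alice", "friend_of"))  # ['bob']
print(nodes["bob"]["age"])               # 18
```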
Fact (In a fact based model)
A single piece of information