4.11.1 Big Data Flashcards
Define big data.
A catch all term for data that won’t fit in the usual containers.
What do the three v’s do?
Describe big data.
What are the 3 v’s?
Volume.
Velocity.
Variety.
Define volume.
Too much data for it all to fit on a conventional hard drive or server. Data has to be stored over multiple serves, each of which is composed over many hard drives.
In terms of volume, why must data be stored over multiple servers?
As relational databases don’t scale well over multiple machines.
Define velocity.
Data in the servers created and modified rapidly. Servers must respond to frequently changing data within a matter of milliseconds.
Define variety.
Data held on the servers consists of many different data types - from binary files to multimedia files.
In terms of big data, what is the biggest problem?
Unstructured nature gives cause for difficulty when analysing the data. Conventional databases are not suited to store big data as it is required that it confirms to a column and row structure. Do not scale well over multiple servers.
What needs to happen when storing big data over multiple servers?
The processing associated with using the data must be amongst multiple machines.
Why is storing big data over multiple machines incredibly difficult with conventional programming paradigms?
As all machines would have to be synchronised so no data is overwritten or damaged.
Why is functional programming used with big data?
Solves the problem of programming over multiple machines.
Stateless - no side effects.
Uses immutable data structures.
Supports higher order functions.
Attributes make it easier to write and correct efficient, distributed code than with any procedural programming.
How can we represent data that doesn’t conform to the typical column and row format?
With the fact based model
In terms of the fact based model (FBM) how is data stored?
As a fact.
(FBM) What are the benefits of facts?
Immutable and cannot be overwritten, reducing the risk of loosing data due to human error.
(FBM) what is stored with each fact?
A time stamp - indicating the data and time each piece of information was recorded.