4.11 Big Data Flashcards
What is big data?
a catch-all term for data
that won’t fit the usual containers
can be described in terms of volume, velocity or variety
volume - too big to fit into a single server
velocity - data arrives as a continuous stream and must be processed within milliseconds to seconds
variety - data in many forms such as
structured, unstructured, text, multimedia.
What is fact-based modelling in big data?
Deconstructing big data into fundamental units known as “facts”
and placing these facts into a data set
What can be said about the data set produced from fact-based modelling?
The dataset will be an ever growing list of immutable facts
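A minimal Python sketch of such a dataset may help. The Fact class, its fields (entity, attribute, value, timestamp) and the helpers record and latest are hypothetical names chosen for illustration, not part of any standard library or required model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen=True makes each fact immutable once created
class Fact:
    entity: str      # hypothetical field: what the fact is about
    attribute: str   # hypothetical field: which property is recorded
    value: object
    timestamp: int   # when the fact was recorded


facts = []  # the ever-growing dataset: facts are only ever appended


def record(entity, attribute, value, timestamp):
    """Append a new fact; existing facts are never edited or deleted."""
    facts.append(Fact(entity, attribute, value, timestamp))


def latest(entity, attribute):
    """Return the most recently recorded value for an entity's attribute."""
    matching = [f for f in facts
                if f.entity == entity and f.attribute == attribute]
    return max(matching, key=lambda f: f.timestamp).value if matching else None


record("user:1", "city", "Leeds", 1)
record("user:1", "city", "York", 2)  # a change is a new fact, not an update
current_city = latest("user:1", "city")
```

Both facts stay in the list: the older one is still a true statement about what was recorded at timestamp 1.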
What are the advantages of fact-based modelling? (4)
No indexing needed
New items can simply be appended to the growing dataset
Facts are immutable, making the dataset easy to query
Data is true forever
How can functional programming help when dealing with big data? (3)
Statelessness
Higher-order functions
Immutable data structures
all of which make it easier to write correct
and efficient distributed code
What is meant by statelessness in functional programming?
A function's result depends only on its arguments, not on how often it is called or the order in which other functions are called
This makes code easier to understand and easier to write correctly
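As a rough illustration, the Python sketch below contrasts a stateless function with a stateful one. The names area, add_to_total and running_total are made up for this example.

```python
def area(width, height):
    """Stateless: the result depends only on the arguments."""
    return width * height


running_total = 0  # shared state that the next function depends on


def add_to_total(amount):
    """Stateful: the result depends on every earlier call."""
    global running_total
    running_total += amount
    return running_total


# The stateless function gives the same answer however often it is called.
first_area = area(3, 4)
second_area = area(3, 4)

# The stateful function gives different answers for the same argument.
first_total = add_to_total(5)
second_total = add_to_total(5)
```

Calls to area can run in any order, or on different machines, without affecting one another; calls to add_to_total cannot.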
What is meant by higher-order functions in functional programming?
Higher-order functions are functions that take another function as an argument, return a function as a result, or both
Higher-order functions such as map, filter and fold (reduce) are easily parallelised, meaning more than one processor can work on different parts of a large data set at the same time.
This helps when dealing with the large volumes of data found in big data
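The idea can be sketched in Python as follows. The functions square and sum_of_squares and the two-way split of the data are assumptions made for illustration; a thread pool stands in for the separate processors or servers a real big data system would use.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def square(x):
    return x * x


def sum_of_squares(chunk):
    # map and reduce are higher-order functions: each takes a function
    # (square, or the adding lambda) as an argument.
    return reduce(lambda a, b: a + b, map(square, chunk), 0)


data = list(range(1, 9))         # 1, 2, ..., 8
chunks = [data[:4], data[4:]]    # split the data set into two parts

# pool.map is also higher-order: it applies sum_of_squares to each chunk,
# and the chunks can be handled by different workers at the same time.
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(sum_of_squares, chunks))

total = sum(partials)            # combine the partial results
```

Because square has no side effects, each chunk can be processed independently and the partial results combined at the end.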
What is meant by immutable data structures in functional programming?
In functional programming, data structures are immutable: an object cannot be modified once created.
Because no processor can change data that another is using, parallel processing is easy
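A small Python sketch using a tuple, which is an immutable built-in type, shows the behaviour. The variable names readings, updated and was_modified are illustrative only.

```python
readings = (3, 5, 8)            # a tuple cannot be changed after creation

# "Adding" a value builds a new tuple; the original is left untouched.
updated = readings + (13,)

# Attempting to modify the original in place raises a TypeError.
try:
    readings[0] = 99
    was_modified = True
except TypeError:
    was_modified = False
```

Any worker holding readings can rely on it never changing, so no locking is needed when several workers read it in parallel.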