Big Data Flashcards
What are the three main features of Big data?
- Volume
- Velocity
- Variety
What does Volume mean in terms of Big data?
The amount of data is too large to fit on a single server, so it must be stored across multiple servers, each containing many hard drives.
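The card does not say how the data is divided between servers. One common approach is hash partitioning: the record's key is hashed to choose which server stores it. The minimal Python sketch below assumes this approach. The server names and the `server_for` and `partition` helpers are hypothetical.

```python
import zlib

# Hypothetical server names; a real cluster would have many more machines.
SERVERS = ["server-0", "server-1", "server-2"]


def server_for(key: str, servers: list[str] = SERVERS) -> str:
    """Pick the server that stores a record by hashing its key.

    zlib.crc32 is used rather than Python's built-in hash(), because hash()
    of a str is randomised per process, whereas every machine must agree
    on where a given key lives.
    """
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def partition(keys: list[str], servers: list[str] = SERVERS) -> dict[str, list[str]]:
    """Group keys by the server that would store them."""
    placement: dict[str, list[str]] = {s: [] for s in servers}
    for key in keys:
        placement[server_for(key, servers)].append(key)
    return placement


if __name__ == "__main__":
    user_ids = [f"user{i}" for i in range(10)]
    for server, keys in partition(user_ids).items():
        print(server, keys)
```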
What does Velocity mean in terms of Big data?
Data on the servers is created and modified rapidly, and the servers must respond to this frequently changing data within milliseconds.
What is Big Data?
A term used to refer to typically unstructured datasets that are large in terms of storage size, data streaming rate and/or variety.
What does Variety mean in terms of Big data?
The data held on the servers comes in many different types, from binary files to multimedia files (photos and videos).
What is the problem with Big data being unstructured?
Being unstructured makes it difficult to analyse the data. Conventional databases aren’t suited for storing big data because they require the data to conform to rows and columns.
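The Python sketch below shows why fixed rows and columns are a poor fit. It uses the standard `sqlite3` module as a stand-in for a conventional database and tries to store differently shaped records in a table whose columns are fixed in advance. The records, the `reviews` table and `load_into_fixed_table` are hypothetical.

```python
import sqlite3

# Hypothetical records with different shapes, as unstructured data often has.
SAMPLE_RECORDS = [
    {"id": 1, "text": "great product"},
    {"id": 2, "photo": "img_002.jpg", "width": 640, "height": 480},
    {"id": 3, "text": "late delivery", "rating": 2, "tags": ["shipping", "complaint"]},
]


def load_into_fixed_table(records):
    """Try to insert each record into a table with fixed columns (id, text).

    Returns (number of rows stored, ids of records that did not fit).
    """
    conn = sqlite3.connect(":memory:")
    # A conventional table must declare its columns up front.
    conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, text TEXT)")
    rejected = []
    for rec in records:
        # Column names come from trusted example data here; never build SQL
        # from untrusted keys like this in real code.
        cols = ", ".join(rec)
        marks = ", ".join("?" for _ in rec)
        try:
            conn.execute(f"INSERT INTO reviews ({cols}) VALUES ({marks})", list(rec.values()))
        except sqlite3.Error:
            # e.g. the table has no "photo" column for the image record
            rejected.append(rec["id"])
    stored = conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]
    conn.close()
    return stored, rejected


if __name__ == "__main__":
    print(load_into_fixed_table(SAMPLE_RECORDS))
```

Only the first record fits. The image record and the tagged review are rejected because the table has no columns for their extra fields.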
How can useful information be extracted from Big data?
Using machine learning techniques to discern patterns in the data.
What is the downside of storing data across multiple servers?
Any processing of the data must also be split across multiple machines, which is harder to coordinate than processing on a single machine.
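The card does not name a technique for splitting the processing. One widely used pattern is MapReduce: each machine processes only its own share of the data (map), and the partial results are then combined (reduce). The Python sketch below simulates this on a single computer. The "machines" are just list chunks, and the function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(lines, n_machines):
    """Deal lines round-robin into one chunk per (simulated) machine."""
    return [lines[i::n_machines] for i in range(n_machines)]


def map_chunk(chunk):
    """Map step: count the words in this machine's chunk only."""
    return Counter(word.lower() for line in chunk for word in line.split())


def merge_counts(a, b):
    """Reduce step: combine two partial results into a new Counter."""
    return a + b


def word_count(lines, n_machines=3):
    chunks = split_into_chunks(lines, n_machines)
    # On a real cluster each map_chunk call would run on a different machine.
    partials = map(map_chunk, chunks)
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    lines = ["big data is big", "data moves fast", "big servers"]
    print(word_count(lines)["big"])  # 3
```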
Why is functional programming used to process data over multiple machines?
Functional programs are stateless (they have no side effects), use immutable data structures and support higher-order functions.
This makes it easier to write correct, efficient, distributed code.
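The short Python sketch below shows all three properties together. Python is not a purely functional language, so this sketch gets immutability from a frozen dataclass. `Reading`, `calibrate`, `calibrate_all` and the 0.5 offset are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Reading:
    """Immutable record: assigning to a field raises FrozenInstanceError."""
    sensor: str
    celsius: float


def calibrate(reading: Reading) -> Reading:
    """Pure function: it has no side effects and returns a new Reading
    instead of modifying the one it was given. The +0.5 offset is arbitrary."""
    return replace(reading, celsius=reading.celsius + 0.5)


def calibrate_all(readings: tuple[Reading, ...]) -> tuple[Reading, ...]:
    # map() is a higher-order function: it takes calibrate as an argument.
    # Because calibrate depends only on its input, each reading could be
    # processed on a different machine, in any order, with the same result.
    return tuple(map(calibrate, readings))


if __name__ == "__main__":
    raw = (Reading("a", 20.0), Reading("b", 21.5))
    fixed = calibrate_all(raw)
    print(raw)    # the original data is unchanged
    print(fixed)
```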
Why do conventional programming paradigms struggle with processing data over multiple machines?
Conventional (imperative) programs rely on shared, mutable state, so machines working on the same data would all have to be synchronised to stop data being overwritten or corrupted.
What is a Higher-order function?
A function which takes functions as its inputs and/or outputs a function.
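The minimal Python sketch below shows both kinds of higher-order function. The names `apply_twice`, `make_adder` and `compose` are hypothetical and chosen only for illustration.

```python
from typing import Callable

IntFn = Callable[[int], int]


def apply_twice(f: IntFn, x: int) -> int:
    """Takes a function as an input and applies it twice."""
    return f(f(x))


def make_adder(n: int) -> IntFn:
    """Returns a function as its output."""
    def add(x: int) -> int:
        return x + n
    return add


def compose(f: IntFn, g: IntFn) -> IntFn:
    """Both takes and returns functions: compose(f, g)(x) == f(g(x))."""
    return lambda x: f(g(x))


if __name__ == "__main__":
    print(apply_twice(make_adder(3), 10))              # 16
    print(compose(make_adder(1), make_adder(2))(0))    # 3
```

Python's built-in `map` and `filter`, and `functools.reduce`, are also higher-order functions, because each takes a function as an argument.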
What is a Graph Schema?
A method of defining a database in terms of nodes, edges and properties.