Big Data Flashcards
State the three Vs of big data.
Volume
Velocity
Variety
Define variety.
The data held on the servers consists of many different types of data. It ranges from binary files to photos and videos.
Define volume.
There is too much data to fit onto a single server. Data must be stored across multiple servers, each composed of many hard drives.
Define velocity.
Data on the servers is created and modified rapidly. The servers must respond to frequently changing data within a matter of milliseconds.
What is big data?
The large volume of data - both structured and unstructured - that inundates a business on a day-to-day basis.
What is the most challenging attribute of big data?
A lack of structure. Unstructured data is difficult to analyse, and conventional databases are not suited to storing it as it doesn’t conform to a row and column structure.
How is useful information extracted from big data?
Machine-learning techniques are used to discern patterns in the data.
Which programming paradigm is well suited to processing big data?
Functional programming
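One reason this paradigm suits large datasets is that pure functions have no side effects, so they can be applied to separate chunks of data independently. The sketch below is only an illustration of the map, filter and reduce style; the sample values and the names square, readings, squares, evens and total are assumptions made for the example.

```python
from functools import reduce

def square(x):
    # Pure function: the result depends only on the argument.
    return x * x

# Hypothetical sample data, held in an immutable tuple.
readings = (1, 2, 3, 4, 5)

# map applies square to every item without changing readings.
squares = tuple(map(square, readings))

# filter keeps only the items that pass the test.
evens = tuple(filter(lambda x: x % 2 == 0, squares))

# reduce combines the remaining items into a single value.
total = reduce(lambda acc, x: acc + x, evens, 0)
```

Each step builds a new tuple rather than modifying the previous one, so the original readings are still available afterwards.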
Which property of data structures in functional programs means that their value doesn’t change after instantiation?
Immutability
An immutable data structure can never change once it has been created.
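A minimal sketch of this property in Python, assuming a frozen dataclass as the immutable structure. The Reading class, its fields and the try_mutate helper are hypothetical names chosen for the example.

```python
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class Reading:
    # frozen=True makes instances immutable after creation.
    sensor: str
    value: float

original = Reading("s1", 20.5)

def try_mutate(reading):
    # Attempt to overwrite a field; a frozen dataclass refuses.
    try:
        reading.value = 99.0
        return True
    except FrozenInstanceError:
        return False

mutated = try_mutate(original)

# A "change" is expressed by building a new instance instead.
updated = replace(original, value=21.0)
```

The original object keeps its value, and the updated reading exists alongside it as a separate object.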
Describe the fact-based model for representing data.
Each individual piece of information is stored as a fact.
Facts are immutable and can’t be overwritten.
Because a fact cannot be overwritten, a changed value is stored as a new fact with its own timestamp. This allows multiple values to be held for the same attribute.
In the event of a query, the timestamps are compared and the most recent is returned.
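The behaviour described above can be sketched as an append-only list of timestamped facts. This is an illustration under stated assumptions, not a real storage engine: the Fact fields, the record and latest helpers, and the integer timestamps are all hypothetical.

```python
from typing import List, NamedTuple, Optional

class Fact(NamedTuple):
    # A NamedTuple is immutable, so a stored fact cannot be edited.
    entity: str
    attribute: str
    value: str
    timestamp: int

# Append-only store: facts are added, never updated or deleted.
facts: List[Fact] = []

def record(entity: str, attribute: str, value: str, timestamp: int) -> None:
    # New information is appended as a new fact.
    facts.append(Fact(entity, attribute, value, timestamp))

def latest(entity: str, attribute: str) -> Optional[str]:
    # Compare timestamps and return the most recent value, if any.
    matching = [f for f in facts
                if f.entity == entity and f.attribute == attribute]
    if not matching:
        return None
    return max(matching, key=lambda f: f.timestamp).value

record("alice", "town", "Leeds", 1)
record("alice", "town", "York", 2)
current_town = latest("alice", "town")
```

Both facts remain in the store after the second call, so the earlier value is still recoverable even though the query returns the newer one.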
State two advantages of using the fact-based model for storing data.
It reduces the risk of accidentally losing data due to human error.
New data is simply appended to the dataset as it is created, so an index is not required.
What is represented by nodes in a graph schema?
Entities
How are relationships between entities represented in a graph schema?
Arrows
What is an assumption made about a graph schema?
Each node contains the most recent information available.
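As a rough sketch, a graph schema can be modelled with a dictionary of nodes (entities) and a list of directed edges (the arrows between them). The entity names, the properties and the related helper below are hypothetical and only illustrate the idea.

```python
# Nodes: each entity with its most recent properties.
nodes = {
    "alice": {"type": "Person", "town": "York"},
    "acme": {"type": "Company"},
}

# Edges: (source, relationship, target), read as an arrow
# pointing from the source entity to the target entity.
edges = [
    ("alice", "works_for", "acme"),
]

def related(source, relationship):
    # Follow arrows of one relationship type out of a node.
    return [target for (start, label, target) in edges
            if start == source and label == relationship]

employers = related("alice", "works_for")
```

Because the arrows are directed, following works_for from the company node returns nothing.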