Big Data Flashcards
What is Volume (Big Data)?
There is too much data to fit on a single conventional hard drive or server, so it has to be stored across multiple servers, each of which has many hard drives
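A minimal sketch of how records might be spread across several servers. The card does not say how the split is done; hash partitioning is one common approach and is used here purely as an assumption. The names `pick_server` and `partition` are hypothetical.

```python
import hashlib


def pick_server(key: str, num_servers: int) -> int:
    """Map a record key to a server index (hash partitioning, an assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap to the server count.
    return int.from_bytes(digest[:8], "big") % num_servers


def partition(records: dict, num_servers: int) -> dict:
    """Group records by the server each key is assigned to."""
    servers = {i: [] for i in range(num_servers)}
    for key, value in records.items():
        servers[pick_server(key, num_servers)].append((key, value))
    return servers


# Hypothetical data: 10 records spread over 3 servers.
example = partition({f"file{i}": i for i in range(10)}, 3)
for server_id, held in example.items():
    print(server_id, len(held))
```

The same key always maps to the same server, so a lookup only needs to contact one machine.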
What is Velocity (Big Data)?
Data on the server is created and modified rapidly - servers must respond quickly to frequently changing data
What is Variety (Big Data)?
The servers hold many different types of data (e.g. binary files, multimedia)
What is the most challenging thing about Big Data?
The most challenging thing is not the volume, but the lack of structure in big data
Big data is unstructured, so it is difficult to analyse the data.
1) Conventional databases aren't suited to storing big data, as big data does not conform to a row and column structure
2) Conventional databases also don't scale well across multiple servers: the data must be split across multiple machines, which would all have to be kept synchronised, and that is incredibly difficult
How is data extracted (Big Data)?
1) Machine learning techniques are used to discern patterns in the data
2) Example source: data from surveillance systems
What is fact-based modelling?
A way of representing big data
1) Each individual piece of information is stored as a fact. Facts are immutable (they can't change once created), and can't be overwritten
2) Stored with each fact is a timestamp, which shows when the info was recorded. Multiple different values can be held for the same attribute, so the timestamps are used to find the most recent one
3) Reduces the risk of accidentally losing data due to human error, as facts cannot be overwritten
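The three points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Fact` and `FactStore` names, the entity/attribute/value layout, and the in-memory list are all hypothetical and not taken from any particular system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, List, Optional


@dataclass(frozen=True)  # frozen=True: a fact cannot be changed once created
class Fact:
    entity: str
    attribute: str
    value: Any
    timestamp: datetime


class FactStore:
    """Append-only store (hypothetical): facts are added, never overwritten."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def record(self, entity: str, attribute: str, value: Any,
               timestamp: Optional[datetime] = None) -> Fact:
        # Default to the current UTC time when no timestamp is supplied.
        fact = Fact(entity, attribute, value,
                    timestamp or datetime.now(timezone.utc))
        self._facts.append(fact)
        return fact

    def history(self, entity: str, attribute: str) -> List[Fact]:
        """All values ever recorded for an attribute, oldest first."""
        matches = [f for f in self._facts
                   if f.entity == entity and f.attribute == attribute]
        return sorted(matches, key=lambda f: f.timestamp)

    def latest(self, entity: str, attribute: str) -> Optional[Any]:
        """The most recent value, chosen by timestamp."""
        facts = self.history(entity, attribute)
        return facts[-1].value if facts else None


# Hypothetical usage: a later fact is added alongside the earlier one.
store = FactStore()
store.record("user:1", "city", "Leeds", datetime(2020, 1, 1, tzinfo=timezone.utc))
store.record("user:1", "city", "York", datetime(2021, 1, 1, tzinfo=timezone.utc))
print(store.latest("user:1", "city"))
print(len(store.history("user:1", "city")))
```

Because a new value is appended rather than written over the old one, the earlier fact is still available in the history if the newer one turns out to be a mistake.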