4.11 - Big Data Flashcards
What is Big Data?
A catch-all term for data that doesn’t fit in the usual containers, such as a single server or a conventional database.
What are the three defining features of big data?
Volume, velocity and variety
What is volume?
There is too much data to fit on a conventional hard drive or server, so the data must be stored across multiple servers.
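The card doesn't say how the data is divided between servers. One common scheme, assumed here purely for illustration, is hash partitioning: each record's key is hashed, and the hash decides which server stores it. In this minimal Python sketch the server names and the `server_for` and `partition` helpers are hypothetical.

```python
import hashlib

# Hypothetical server names for illustration only.
SERVERS = ["server-0", "server-1", "server-2"]


def server_for(key: str, servers: list[str] = SERVERS) -> str:
    """Choose the server that stores a record, based on a hash of its key."""
    # SHA-256 gives the same result in every run (unlike Python's salted hash()).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def partition(records: dict[str, object], servers: list[str] = SERVERS) -> dict[str, dict[str, object]]:
    """Split a dataset into one shard per server."""
    shards: dict[str, dict[str, object]] = {server: {} for server in servers}
    for key, value in records.items():
        # Each record is placed on exactly one server.
        shards[server_for(key, servers)][key] = value
    return shards


records = {f"user{i}": {"id": i} for i in range(10)}
shards = partition(records)
print({server: len(shard) for server, shard in shards.items()})
```

A stable hash is used so that a given key always maps to the same server. Simple modulo placement like this moves most keys when a server is added, which is why real systems often use more elaborate schemes.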
What is velocity?
Data is created and modified rapidly, so servers must respond to frequently changing data within a matter of milliseconds.
What is variety?
Data held on servers consists of many different types, from binary files to multimedia like photos and videos.
What is the biggest problem with big data and why?
Its unstructured nature makes it difficult to analyse the data. Conventional databases are not suited to it because they require data to conform to a row and column structure. Furthermore, conventional databases do not scale well over multiple servers.
How is useful information extracted from big data?
Machine learning techniques are used to discern patterns in the data.
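The card doesn't name a particular technique. As one illustrative example, assumed here rather than taken from the source, k-means clustering groups values around a small number of centres without being told in advance what the groups are. This is a minimal one-dimensional Python sketch; `kmeans_1d`, `assign` and the deterministic choice of starting centroids are hypothetical simplifications.

```python
def assign(points: list[float], centroids: list[float]) -> list[list[float]]:
    """Put each point in the cluster of its nearest centroid."""
    clusters: list[list[float]] = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans_1d(points: list[float], k: int, max_iterations: int = 100):
    """Find k cluster centres in 1-D data by repeatedly averaging each cluster."""
    # Simplification: start from the k smallest distinct values.
    centroids = sorted(set(points))[:k]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centres
        centroids = new_centroids
    return centroids, assign(points, centroids)


centroids, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], k=2)
print(centroids)  # → [2.0, 11.0]
print(clusters)   # → [[1, 2, 3], [10, 11, 12]]
```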
What is the solution to processing data spread over multiple machines, and why?
Functional programming, since functional programs are stateless and make use of immutable data structures. Functional languages also support higher-order functions.
These attributes make it easier to write correct, efficient, distributed code.
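A minimal Python sketch of these ideas, assuming a word-count task as the example (a common illustration, not taken from the card): `map_chunk` and `combine` are pure functions that neither read nor modify shared state, the input chunks are immutable tuples, and the higher-order functions `map` and `functools.reduce` take those functions as arguments. Because each chunk is processed independently, the chunks could in principle be sent to different machines and the partial results combined afterwards; here everything runs in a single process.

```python
from collections import Counter
from functools import reduce


def map_chunk(chunk: tuple[str, ...]) -> Counter:
    """Pure function: count the words in one chunk without touching shared state."""
    return Counter(word for line in chunk for word in line.lower().split())


def combine(a: Counter, b: Counter) -> Counter:
    """Pure function: return a new Counter merging two partial counts (inputs untouched)."""
    return a + b


def word_count(chunks: tuple[tuple[str, ...], ...]) -> Counter:
    # map is a higher-order function: it applies map_chunk to every chunk.
    partial_counts = map(map_chunk, chunks)
    # reduce is also higher-order: it folds the partial counts together with combine.
    return reduce(combine, partial_counts, Counter())


# Each inner tuple stands in for a chunk of data held on a different machine.
chunks = (("the cat", "the dog"), ("a cat",))
counts = word_count(chunks)
print(counts["the"], counts["cat"])  # → 2 2
```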
What is the fact-based model for representing data?
Each individual piece of information is stored as a fact. Facts are immutable, so they are never overwritten or deleted. Each fact is stored with a timestamp indicating the date and time at which the information was recorded. Because old facts are never removed, multiple values can be held for the same attribute.
This reduces the risk of accidentally losing data through human error. It also does away with the need for an index on the data: new data is simply appended to the dataset.