big data Flashcards
define big data
data that is too big or too complex to be managed using conventional data-processing techniques
no universally agreed definition
what are the four V’s of big data
Volume; the size of the data, terabytes or even exabytes to process.
Velocity; the speed at which data flows, e.g. a stream arriving every few milliseconds that must be recorded in real time, such as a sensor in a car.
Variety; the nature of the data (structured and unstructured formats), structured - databases, semi-structured - XML, unstructured - web search
Veracity; the validity of the data, data inconsistency, latency
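The three data formats named above can be illustrated with a minimal Python sketch. The sample values (a car sensor reading and a search query) and all variable names are made up for illustration only.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in a database table or CSV export.
structured = "sensor_id,speed_kmh\n7,88\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: tagged fields, but no rigid table schema (XML).
semi_structured = "<reading><sensor>7</sensor><speed unit='kmh'>88</speed></reading>"
root = ET.fromstring(semi_structured)
speed = int(root.find("speed").text)

# Unstructured: free text such as a web search query; there are no
# named fields, so the best we can do here is split it into words.
unstructured = "why is my car sensor showing 88"
words = unstructured.split()

print(rows[0]["speed_kmh"], speed, words)
```

The structured and semi-structured readings can be looked up by name, while the unstructured text needs further processing before any value can be pulled out of it.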
what is the problem with big data
storage, especially when the data is in different formats
what is big data used for
data mining, data storage, data analysis, data sharing, data visualisation
examples of big data
- The New York Stock Exchange generates about one terabyte of new trade data every day
- Facebook generates 4 new petabytes of data per day, mainly in terms of photo and video uploads, message exchanges, comments etc.
- A single jet engine can generate 10+ terabytes of data per flight
- Walmart processes 40 petabytes of data per day
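To put the daily figures above on one scale, here is a rough sketch that converts them to terabytes. It assumes decimal units (1 petabyte = 1,000 terabytes), and it leaves out the jet engine because that figure is per flight, not per day.

```python
TB_PER_PB = 1000  # assumption: decimal (SI) units; binary units would use 1024

# Daily volumes in terabytes, taken from the examples above.
daily_volume_tb = {
    "New York Stock Exchange": 1,
    "Facebook": 4 * TB_PER_PB,
    "Walmart": 40 * TB_PER_PB,
}

baseline = daily_volume_tb["New York Stock Exchange"]
for source, tb in daily_volume_tb.items():
    ratio = tb / baseline
    print(f"{source}: {tb:,} TB/day ({ratio:,.0f}x NYSE)")
```

On these numbers, Walmart handles 40,000 times the daily volume of the New York Stock Exchange.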
issues with big data
- Data is not knowledge
- It is very possible to be data rich but information poor (DRIP)
- Large data sets are often collected opportunistically, and not for the purpose of answering the question that you are now interested in
- Long-term longitudinal data sets are often difficult to exploit
- Privacy, confidentiality and ethical considerations