Introduction Flashcards
how is “Big Data” both a noun and an adjective?
we have big data (noun: the data itself), and we use big data tools (adjective: describing the technology).
big data is so big that (definition):
traditional data processing application software is inadequate to deal with it
what are the four v’s?
volume, velocity, variety, veracity (how trustworthy the data is, e.g. noise, bias, or disinformation)
data was always powerful, what changed?
new data sources appeared, and data once considered insignificant turned out to be valuable (e.g. niche products in the long tail model)
5 use cases of big data
remote patient monitoring: preventative care
product sensors: manufacturing support
real time location data: geo-advertising
public surveys: tailored public services
social media: marketing, retail
big data acquisition challenges (besides the main storage problem)
selection, filtering & compression, collecting metadata
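A minimal sketch of the three acquisition steps, assuming hypothetical sensor readings where only values outside a "normal" range (the made-up bounds `low` and `high`) are worth keeping. The `acquire` function and the record layout are illustrative, not a standard API; it filters, gzip-compresses what is left, and records metadata describing the batch.

```python
import gzip
import json

def acquire(readings, low=10.0, high=30.0):
    """Keep only out-of-range readings, compress them, and describe the batch."""
    # Selection/filtering: drop readings inside the (assumed) normal range.
    kept = [r for r in readings if not (low <= r["value"] <= high)]
    # Compression: serialize the kept readings to JSON and gzip the bytes.
    raw = json.dumps(kept).encode("utf-8")
    payload = gzip.compress(raw)
    # Metadata: record what was received and how it was reduced.
    metadata = {
        "received": len(readings),
        "kept": len(kept),
        "raw_bytes": len(raw),
        "compressed_bytes": len(payload),
        "filter": {"low": low, "high": high},
    }
    return payload, metadata

readings = [{"sensor": "s1", "value": v} for v in (12.0, 35.5, 20.1, 5.2, 29.9)]
payload, meta = acquire(readings)
print(meta["received"], meta["kept"])  # 5 2
restored = json.loads(gzip.decompress(payload))  # the two out-of-range readings
```

For a batch this small, gzip can produce more bytes than it saves; compression pays off on large volumes, which is why the metadata records both sizes.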
Big data doesn’t have to be too big
we process a lot of data to find out which parts are actually valuable
it is fundamental to understand, measure, and control the data
metadata (data about the data) supports this
big data processing challenges
single-machine limitations, parallelization, fault tolerance
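The three challenges can be illustrated with a toy MapReduce-style word count. This is a sketch under loud assumptions: threads in one process stand in for machines, the data is split into partitions ("chunks"), and fault tolerance is reduced to re-running a failed task. `flaky_count` is a hypothetical worker that crashes on its first attempt for each chunk to simulate a node failure.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    # "Map" step: count words in one partition of the data.
    return Counter(chunk.split())

failed_once = set()

def flaky_count(chunk):
    # Hypothetical unreliable worker: fails the first time it sees a chunk.
    if chunk not in failed_once:
        failed_once.add(chunk)
        raise RuntimeError("simulated worker crash")
    return count_words(chunk)

def run_with_retry(task, arg, retries=2):
    # Fault tolerance (simplified): re-run a failed task instead of losing its partition.
    for attempt in range(retries + 1):
        try:
            return task(arg)
        except RuntimeError:
            if attempt == retries:
                raise

def word_count(chunks, task=count_words, workers=4):
    # Parallelization: each partition is processed by a separate worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: run_with_retry(task, c), chunks)
        # "Reduce" step: merge the per-partition counts.
        total = Counter()
        for partial in partials:
            total.update(partial)
    return total

chunks = ["big data big tools", "data lake data", "big lake"]
total = word_count(chunks, task=flaky_count)
```

Real frameworks distribute partitions across machines and reschedule failed tasks on other nodes; the retry loop here only mimics that idea locally.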
three main scenarios for programming big data solutions
batch, interactive (low-latency answers to queries), and streaming (continuous data processed as it arrives)
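A minimal sketch contrasting batch and streaming, using an average as the (assumed) computation. In batch mode the whole bounded dataset is available up front; in streaming mode events arrive one at a time, so the code keeps constant-size state and emits an updated result after each event. Interactive processing is not shown.

```python
from typing import Iterable, Iterator

def batch_mean(values: list[float]) -> float:
    # Batch: the whole (bounded) dataset is available before processing starts.
    return sum(values) / len(values)

def streaming_mean(events: Iterable[float]) -> Iterator[float]:
    # Streaming: events arrive one by one (possibly without end), so only
    # a count and the running mean are stored, never the full history.
    count = 0
    mean = 0.0
    for x in events:
        count += 1
        mean += (x - mean) / count  # incremental mean update
        yield mean

print(batch_mean([4.0, 8.0, 6.0]))            # 6.0
print(list(streaming_mean([4.0, 8.0, 6.0])))  # [4.0, 6.0, 6.0]
```

Both end at the same answer; the difference is that the streaming version has a usable result after every event and never needs the full dataset in memory.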
what is a data lake?
central repository where data is kept in its original format and processed or queried only when needed
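A toy illustration of keeping data in its original format and interpreting it only at query time ("schema-on-read"). The in-memory `lake` dict, its paths, and the query functions are hypothetical stand-ins for files in object storage and a query engine.

```python
import csv
import io
import json

# A toy "lake": raw files kept in their original formats, keyed by (hypothetical) path.
lake = {
    "raw/clicks.jsonl": '{"user": "a", "page": "/home"}\n{"user": "b", "page": "/cart"}\n',
    "raw/orders.csv": "user,amount\na,19.99\nb,5.00\n",
}

def read_jsonl(text):
    # Parse JSON Lines only when a query needs it.
    return [json.loads(line) for line in text.splitlines() if line]

def read_csv(text):
    # Parse CSV into dicts (all values stay strings at this point).
    return list(csv.DictReader(io.StringIO(text)))

def pages_visited(lake, user):
    return [r["page"] for r in read_jsonl(lake["raw/clicks.jsonl"]) if r["user"] == user]

def query_total_spent(lake, user):
    # Schema-on-read: "amount" becomes a number only now, at query time,
    # not when the data was stored.
    rows = read_csv(lake["raw/orders.csv"])
    return sum(float(r["amount"]) for r in rows if r["user"] == user)

print(pages_visited(lake, "b"))          # ['/cart']
print(query_total_spent(lake, "a"))      # 19.99
```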
NoSQL
“not only SQL”: horizontally scalable, usually without full ACID guarantees, no standard for modeling or querying
NewSQL
combines the benefits of relational databases (SQL, ACID transactions) with the scalability of NoSQL
two main approaches listed under analytics techniques within the big data framework
data mining and machine learning