Big Data Flashcards
What are the 5 V’s?
- Volume - the sheer size of the data
- Velocity - the speed at which data arrives and is processed
- Variety - the number of different formats that data comes in
- Variability - the variations in meaning, the context dependence
- Veracity - the uncertainty of data quality
What is empirical science?
Science paradigm used thousands of years ago, which involves describing natural phenomena
What is the theoretical branch of science?
Science paradigm used in the last few hundred years, which involves using models and generalizations
What is the computational branch of science?
Science paradigm used in the last few decades, which involves simulating complex phenomena
What is data exploration science (eScience)?
Science paradigm used today, which involves unifying theory, experiment, and simulation
What do big data frameworks deal with? (3)
- Distribution of data
- Mapping computation to distributed data
- Handling resource failures
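A toy, single-process sketch of those three concerns, assuming nothing about any real framework: `partition`, `run_with_retry`, and `flaky_sum` are hypothetical names invented for illustration, and the "failure" is simulated.

```python
def partition(items, n_parts):
    # Distribution of data: split the input round-robin into n_parts chunks.
    return [items[i::n_parts] for i in range(n_parts)]

def run_with_retry(task, chunk, max_attempts=3):
    # Handling resource failures: re-run the task on the same chunk
    # until it succeeds or the attempts are used up.
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except RuntimeError:
            if attempt == max_attempts:
                raise

# Simulated unreliable worker: fails exactly once, then works.
failures = {"remaining": 1}

def flaky_sum(chunk):
    if failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise RuntimeError("simulated worker failure")
    return sum(chunk)

# Mapping computation to distributed data: the same function runs on every chunk.
chunks = partition(list(range(10)), 3)
partial_sums = [run_with_retry(flaky_sum, chunk) for chunk in chunks]
total = sum(partial_sums)
print(chunks, total)
```

In a real framework the chunks would live on different machines and the retry would typically be rescheduled on another node; here everything runs in one process to keep the idea visible.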
What is Hadoop?
An open-source ecosystem of big data technologies, created by Doug Cutting and Mike Cafarella and developed largely at Yahoo in its early years.
What is the MapReduce algorithm?
A programming model that lets users specify a map function for distributed processing and a reduce function for aggregating the results
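A minimal in-memory sketch of the map and reduce roles, using word counting as the example. This is not Hadoop's API: `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical names, and the grouping step that a real framework performs across machines is done here with a plain dictionary.

```python
from collections import defaultdict

def map_fn(line):
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    # Reduce: aggregate all values that share the same key.
    return word, sum(counts)

def map_reduce(records, map_fn, reduce_fn):
    # Group the mapped values by key (the "shuffle" step), then reduce each group.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())

lines = ["big data big ideas", "data moves fast"]
word_counts = map_reduce(lines, map_fn, reduce_fn)
print(word_counts)
```

The user only writes `map_fn` and `reduce_fn`; distributing the records, grouping by key, and collecting results is the framework's job.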