Big Data Flashcards
The 3 Vs of big data
volume
velocity
variety
The 2 additional dimensions of big data
variability
complexity
“Volume” in big data
the sheer amount of data: organizations collect very large quantities of data from a variety of sources
“Velocity” in big data
data streams in at an unprecedented speed and must be dealt with in a timely manner
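A minimal sketch of handling data as it streams in, assuming a simple numeric stream. Each arriving value updates a fixed-size sliding-window average in constant time, so results keep pace with the input instead of waiting for a batch. The function name and sample readings are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_window_average(events: Iterable[float], window: int) -> Iterator[float]:
    """Hypothetical helper: yield the mean of the latest `window` values per event."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent: deque = deque(maxlen=window)  # oldest value drops out automatically
    total = 0.0
    for value in events:
        if len(recent) == window:
            total -= recent[0]  # subtract the value that append() will evict
        recent.append(value)
        total += value
        yield total / len(recent)  # emitted immediately, one result per event


# Illustrative readings; a real source could be an unbounded stream.
readings = [10.0, 20.0, 30.0, 40.0]
print(list(sliding_window_average(readings, window=2)))  # -> [10.0, 15.0, 25.0, 35.0]
```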
“Variety” in big data
data comes in all types of formats, both structured and unstructured
“Variability” in big data
data flows can be highly inconsistent, with periodic peaks
“Complexity” in big data
the variety of sources of data makes it difficult to link, match, cleanse, and transform data across systems
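A minimal sketch of the cleanse, match, and link steps across two hypothetical systems (a CRM and a billing system). The record fields and the choice of email address as the matching key are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def normalize_email(raw: str) -> str:
    """Cleanse: trim whitespace and lowercase so equivalent keys compare equal."""
    return raw.strip().lower()


def link_records(
    crm: List[Dict[str, str]], billing: List[Dict[str, str]]
) -> List[Tuple[str, str, str]]:
    """Match customers across two hypothetical systems on a normalized email key."""
    # Transform: index the billing records by the cleansed key.
    billing_by_email = {normalize_email(r["email"]): r for r in billing}
    linked = []
    for customer in crm:
        key = normalize_email(customer["email"])
        match = billing_by_email.get(key)
        if match is not None:  # Link: keep only customers present in both systems
            linked.append((key, customer["name"], match["plan"]))
    return linked


# Hypothetical records: the same customer is formatted differently in each system.
crm = [{"name": "Ada", "email": " Ada@Example.com "}, {"name": "Bob", "email": "bob@example.com"}]
billing = [{"email": "ada@example.com", "plan": "pro"}]
print(link_records(crm, billing))  # -> [('ada@example.com', 'Ada', 'pro')]
```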
What does ACID stand for?
Atomicity
Consistency
Isolation
Durability
ACID: Atomicity
requires each transaction to be all or nothing
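A sketch using Python's built-in `sqlite3` module, assuming a hypothetical `accounts` table. Used as a context manager, a connection commits when the block succeeds and rolls back when it raises, so a transfer to a missing account leaves every balance untouched.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Hypothetical helper: move `amount` between accounts as one transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        cur = conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        if cur.rowcount != 1:
            # Raising here undoes the debit above as well: all or nothing.
            raise ValueError(f"unknown account {dst!r}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    transfer(conn, "alice", "carol", 30)  # 'carol' does not exist
except ValueError:
    pass
print(conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone())  # -> (100,)
```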
ACID: Consistency
ensures that any transaction will bring the database from one valid state to another
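A sketch with `sqlite3`, assuming a hypothetical rule that account balances may never go negative. A `CHECK` constraint makes the database refuse any transaction that would leave it in an invalid state. Database constraints cover only part of consistency; rules the database does not know about still have to be enforced by the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint encodes what a valid state looks like.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.execute("INSERT INTO accounts VALUES ('alice', 50)")

try:
    with conn:  # this transaction would overdraw the account, an invalid state
        conn.execute("UPDATE accounts SET balance = balance - 80 WHERE name = 'alice'")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # the exact message text depends on the SQLite version

print(conn.execute("SELECT balance FROM accounts").fetchone())  # -> (50,)
```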
ACID: Isolation
ensures that concurrent execution of transactions results in the same system state that would be obtained if the transactions were executed serially (one after another)
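A sketch with `sqlite3` and a temporary database file. It shows one observable consequence of isolation rather than full serializability: a second connection does not see a new row until the writing transaction commits. The table name is hypothetical, and the locking details are specific to SQLite.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")  # hypothetical file name
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)
    writer.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
    writer.commit()

    # sqlite3 implicitly opens a transaction before the INSERT; it is not committed yet.
    writer.execute("INSERT INTO events DEFAULT VALUES")
    # fetchall() finishes the statement so the reader releases its read lock.
    before = reader.execute("SELECT COUNT(*) FROM events").fetchall()[0][0]
    writer.commit()
    after = reader.execute("SELECT COUNT(*) FROM events").fetchall()[0][0]

    writer.close()
    reader.close()

print(before, after)  # -> 0 1
```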
ACID: Durability
ensures that once a transaction has been committed, it will remain so even in the event of power loss, crashes or errors
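A sketch with `sqlite3` that commits a row, closes the connection, and reopens the file to read the row back. Reopening only simulates a restart; it does not actually cut power, and the on-disk guarantee assumes SQLite's default synchronous settings. The file and table names are hypothetical.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "durable.db")  # hypothetical file name

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE log (msg TEXT)")
    conn.execute("INSERT INTO log VALUES ('committed')")
    conn.commit()  # with default settings, SQLite writes the change to the file here
    conn.close()   # the original connection goes away

    reopened = sqlite3.connect(path)  # a fresh connection, as after a restart
    rows = reopened.execute("SELECT msg FROM log").fetchall()
    reopened.close()

print(rows)  # -> [('committed',)]
```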
What is Hadoop?
an open-source framework combining a distributed file system and a data processing engine, designed to handle extremely high volumes of data in any structure
The 2 components of Hadoop
the Hadoop Distributed File System (HDFS)
the MapReduce programming paradigm for processing data in parallel on multiple distributed servers
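A single-process sketch of the MapReduce data flow, using the classic word-count example. Real Hadoop jobs run map and reduce tasks on many nodes and shuffle intermediate data over the network; the function names here are hypothetical and only mirror the three phases.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values sharing a key (Hadoop does this across nodes)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the grouped values for one key."""
    return key, sum(values)


def word_count(lines: List[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["big data is big", "data is everywhere"]))
# -> {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```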
What is data mining?
- the discovery of useful patterns in the data
- the nontrivial extraction of implicit, previously unknown information from data
- the exploration and analysis of large quantities of data to discover meaningful patterns
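A minimal sketch of one pattern-discovery technique: counting how often pairs of items occur together (their support), the first step of frequent-itemset mining as in the Apriori algorithm. The shopping baskets and the 0.5 support threshold are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def frequent_pairs(transactions: List[Set[str]], min_support: float) -> Dict[Tuple[str, str], float]:
    """Hypothetical helper: item pairs found in at least `min_support` of transactions."""
    counts: Counter = Counter()
    for basket in transactions:
        # Sorting gives each pair one canonical order, e.g. ('bread', 'milk').
        counts.update(combinations(sorted(basket), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Invented example data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]
print(frequent_pairs(baskets, min_support=0.5))
# -> {('bread', 'milk'): 0.5, ('butter', 'milk'): 0.5}
```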