Introduction Flashcards
What launched the Big Data era?
The combination of growing data volumes and on-demand cloud computing
What steps are required for processing unstructured data?
Data Acquisition, Storage, Retrieval, Cleaning, Processing
What 4 technologies help handle unstructured data?
Hadoop, Storm, Spark, NoSQL
What is Hadoop?
An open-source framework designed to handle large amounts of unstructured data
What are Storm and Spark?
Frameworks for real-time processing of large amounts of data
What is Neo4j?
A graph database
What is Cassandra?
A wide-column NoSQL database, often described as a key-value store
Where does the value of Big Data come from?
Value comes from integrating different types of data sources and analysing them at scale
What are 3 advantages of integrating data sources, which leads to increased data collaboration?
It reduces complexity, increases data availability, and unifies the data systems
What are the 5 Vs (characteristics) of Big Data?
The 5 Vs are Volume, Variety, Velocity, Veracity, and Valence
What could be a 6th V, added to the 5 Vs that extend the original 3 coined by Doug Laney of Gartner?
It could be Value.
What are the original 3 Vs (characteristics) of Big Data?
The first 3 Vs are Volume, Variety, and Velocity
What does Valence refer to?
This refers to how big data items can bond with each other, forming connections between otherwise disparate datasets. It also refers to the connectedness of big data in the form of graphs.
What does Volume refer to?
This refers to the vast amounts of data generated every second, minute, hour, and day in our digitized world. It is the dimension of Big Data related to its size and its exponential growth.
What does Variety refer to?
This refers to the ever-increasing number of forms that data can take, e.g., text, images, voice, geospatial. Variety also refers to the additional complexity of storing, combining, and processing these different kinds of data.