What is big data & big data framework Flashcards
1
Q
What is big data?
A
Big data is characterized by four dimensions:
- Volume: the quantity of available data (high volume)
- Velocity: the rate at which data is recorded or collected (high velocity)
- Veracity: the quality and applicability of the data
- Variety: the different types of available data (high variety)
Data with these characteristics requires specific technology and analytical methods to transform it into value.
To become part of a value-adding process, a big data effort must:
- Demonstrate tangible value
- Be operationalizable (able to be put into routine use)
2
Q
What components does the big data ecosystem need in order to function?
A
- Scalable storage
- Scalable analytics
- Computing platform
- Application development framework
- Data management environment
- Project management processes and tools
3
Q
What is Hadoop?
A
Hadoop is essentially a collection of open-source projects that are combined to enable a software-based big data appliance.
Three important layers:
- Hadoop Distributed File System (HDFS)
  - Decreases the cost of specialty large-scale storage systems
  - Provides the ability to rely on commodity components
  - Enables deployment using cloud-based services
  - Reduces system management costs
- MapReduce
  - Is a software framework
  - Used to write applications that process vast amounts of data in parallel on large clusters
- YARN: a new-generation framework for job scheduling and cluster management
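To make the MapReduce idea concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce phases, using word counting as the example. It does not use Hadoop's API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample documents are hypothetical and chosen only for illustration. On a real cluster, each phase would run in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence (a cluster would run them in parallel)."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input: each string stands in for one input split.
documents = ["big data needs big storage", "data needs processing"]
counts = word_count(documents)
print(counts)
```

Because each map call only looks at its own record and each reduce call only looks at one key, the work can be split across machines without the pieces needing to coordinate, which is what lets the framework scale to large clusters.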
4
Q
What is data mining? Why is it useful? What does the typical data mining process look like?
A
What
Data mining is the art and science of discovering knowledge, insights, and patterns in data.
Why
- Recognition of hidden value in data
- Ability to effectively gather quality data and efficiently process it