Big Data Overview Flashcards
Big Data
refers to non-conventional strategies and
innovative technologies used by businesses and
organizations to capture, manage, process, and make
sense of a large volume of data
challenges of big data
* Capturing, transporting, and moving the data
* Managing - the data, the hardware involved, and the software
* Processing - to provide insight
* Storing - safeguarding and securing
conventional BI & DWH architecture
App Servers
Network Switches
Database Servers
SAN Switch
Storage Array
Properties: SQL based
High availability
Enterprise database
Right design for structured data
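Example (illustrative sketch): the "SQL based, right design for structured data" properties can be seen in a fixed-schema table queried with SQL. This uses Python's built-in sqlite3 as a stand-in for an enterprise database; the table name, columns, and rows are made up for illustration.

```python
import sqlite3


def regional_totals(rows):
    """Load rows into a fixed-schema table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for an enterprise DB
    try:
        # Structured data: every row must match this schema up front.
        conn.execute(
            "CREATE TABLE sales (region TEXT NOT NULL, amount REAL NOT NULL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


# Hypothetical sample data.
SAMPLE_SALES = [("north", 120.0), ("south", 80.0), ("north", 30.0)]
totals = regional_totals(SAMPLE_SALES)
```

Because the schema is declared before any data arrives, this design suits structured data but does not bend easily to semi-structured or unstructured inputs.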
Analytics Architecture
Edge node
Network switches
Data nodes
Properties: Not only SQL (NoSQL) based
High scalability, availability, and flexibility
Compute and storage in the same box for reducing network latency
Right design for semi-structured and unstructured data
Data and Application are in the same machine (Data nodes)
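Example (illustrative sketch): "compute and storage in the same box" means work is sent to a data node that already holds the data, rather than pulling data across the network. The block names, node names, and the simple least-loaded rule below are assumptions for illustration, not how any particular framework schedules tasks.

```python
from collections import defaultdict

# Hypothetical placement map: block id -> data nodes holding a replica.
BLOCK_LOCATIONS = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-b", "node-c"],
    "block-3": ["node-a", "node-c"],
}


def schedule_local(block_locations):
    """Assign each block's task to the least-loaded node that stores the block."""
    load = defaultdict(int)  # number of tasks assigned to each node so far
    assignment = {}
    for block in sorted(block_locations):
        replicas = block_locations[block]
        # Only replica holders are candidates, so no block crosses the network.
        node = min(replicas, key=lambda n: (load[n], n))
        assignment[block] = node
        load[node] += 1
    return assignment


assignment = schedule_local(BLOCK_LOCATIONS)
```

Every task lands on a node that already stores its block, which is the latency saving the card describes.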
The Vs of Big Data
Volume, Variety, Velocity (the speed at which vast amounts of data are
being generated, collected, and analyzed), Veracity (the quality or trustworthiness of the data), Value
Volume
- how much data is there?
Variety
- how many different types of sources are there?
Velocity
- how quickly is the data being created, moved, or
accessed?
Veracity
- can we trust the data?
Validity
- is the data accurate and correct?
Viability
- is the data relevant to the use case at hand?
Volatility
- how often does the data change?
Vulnerability
- can we keep the data secure?
Visualization
- how can the data be presented to the user?
Value
- can this data produce a meaningful return on
investment?