Big Data Intro - Week 1 Flashcards
What are the 5 V’s of Big Data?
The 5 V’s: volume, variety, velocity, veracity, and value.
What does veracity mean?
Veracity (data trustworthiness): the uncertainty caused by data inconsistency and incompleteness, ambiguities, latency, deception, and model approximations.
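A minimal Python sketch of how two veracity problems, incompleteness and inconsistency, might be detected in practice. The records, their field layout, and the rule that one sensor should report a single value per timestamp are hypothetical assumptions for illustration only.

```python
from typing import Optional

# Hypothetical record layout: (sensor_id, timestamp, temperature in °C, or None if missing)
Record = tuple[str, int, Optional[float]]


def veracity_report(records: list[Record]) -> dict[str, int]:
    """Count incomplete records and keys with conflicting values."""
    incomplete = 0
    values_by_key: dict[tuple[str, int], set[float]] = {}
    for sensor_id, ts, value in records:
        if value is None:
            incomplete += 1  # incompleteness: the measurement is missing
            continue
        values_by_key.setdefault((sensor_id, ts), set()).add(value)
    # inconsistency (assumed rule): one sensor reports different values for the same moment
    inconsistent = sum(1 for vals in values_by_key.values() if len(vals) > 1)
    return {"total": len(records), "incomplete": incomplete, "inconsistent_keys": inconsistent}


records = [
    ("s1", 100, 21.5),
    ("s1", 100, 23.0),  # conflicts with the reading above
    ("s2", 100, None),  # missing value
    ("s2", 101, 19.8),
]
print(veracity_report(records))
```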
What are the other 5 V’s that characterize Big Data?
● Validity: correctness of data
● Variability: dynamic behaviour
● Volatility: tendency to change in time
● Vulnerability: vulnerable to breach or attacks
● Visualization: presenting data visually so that meaningful patterns and uses become clear
What are some challenges of processing big data?
○ Analyzing Big Data whose structures do not conform to traditional notions of data structure
○ Storing data that exceeds the capacity of conventional computer systems
○ Processing Big Data that may require computational resources beyond the capacity of conventional computing systems
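One common way to cope with data larger than memory is to stream it and keep only a running aggregate instead of loading everything at once. This sketch assumes a CSV input with a hypothetical numeric "bytes" column; it runs in a single process, whereas Big Data frameworks spread the same idea across many machines.

```python
import csv
import io
from typing import TextIO


def streaming_mean(lines: TextIO, column: str) -> float:
    """Compute the mean of one numeric column without holding the whole file in memory."""
    count = 0
    total = 0.0
    for row in csv.DictReader(lines):  # DictReader yields one row at a time
        total += float(row[column])
        count += 1
    if count == 0:
        raise ValueError("input contains no data rows")
    return total / count


# In practice `lines` would be an open file handle; io.StringIO stands in for one here.
sample = io.StringIO("user,bytes\na,100\nb,300\nc,200\n")
print(streaming_mean(sample, "bytes"))  # 200.0
```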
What does prescriptive analytics refer to?
Prescriptive
○ Answers what should be done based on data.
○ Provides actionable intelligence.
What are the 4 types of Big Data analytics?
Descriptive, diagnostic, predictive, and prescriptive
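A toy Python sketch contrasting the four types on one hypothetical series of monthly sales. The least-squares trend line (for prediction) and the 10% safety-margin ordering rule (for prescription) are illustrative choices, not methods prescribed by the course.

```python
from statistics import mean

# Hypothetical monthly sales figures (units sold) for months 1..6
sales = [120, 135, 150, 160, 178, 190]
months = list(range(1, len(sales) + 1))

# Descriptive: what happened?
average = mean(sales)

# Diagnostic: why did it happen? Find the month with the largest jump to investigate.
changes = [b - a for a, b in zip(sales, sales[1:])]
biggest_jump_month = months[1:][changes.index(max(changes))]

# Predictive: what is likely to happen? Fit a least-squares line and extrapolate.
mx, my = mean(months), mean(sales)
slope = sum((x - mx) * (y - my) for x, y in zip(months, sales)) / sum((x - mx) ** 2 for x in months)
intercept = my - slope * mx
forecast_month_7 = intercept + slope * 7

# Prescriptive: what should be done? An assumed rule turns the forecast into an action.
stock_to_order = round(forecast_month_7 * 1.1)  # 10% safety margin (assumption)

print(average, biggest_jump_month, round(forecast_month_7, 1), stock_to_order)
```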
What are the stages of Big Data analytics?
● Problem identification & data requirements
● Data pre-processing
● Data Analytics
● Data Visualization
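The four stages can be sketched as a tiny Python pipeline. The business question, the raw category values, and the function names are hypothetical; real pipelines would use dedicated tools at each stage.

```python
from collections import Counter

# Stage 1, problem identification & data requirements (assumed for this sketch):
# "Which product categories are purchased most often?" -> need one category per purchase.
raw = [" Books", "electronics", None, "books", "Toys ", "", "Electronics", "books"]


def preprocess(records):
    """Stage 2: drop missing or empty values and normalize case and whitespace."""
    return [r.strip().lower() for r in records if r and r.strip()]


def analyze(records):
    """Stage 3: count purchases per category."""
    return Counter(records)


def visualize(counts):
    """Stage 4: a plain-text bar chart, most frequent category first."""
    return "\n".join(f"{cat:<12}{'#' * n}" for cat, n in counts.most_common())


print(visualize(analyze(preprocess(raw))))
```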
What is quasi-structured data?
Textual data with erratic formats that can be given structure with algorithms, tools, effort, and time (for example, web clickstream data with inconsistent values and formats).
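Web clickstream URLs illustrate data that is not tabular but can be structured with tools. This sketch, assuming hypothetical URLs and field names, uses Python's standard urllib.parse module to turn each URL into a record with fixed fields, filling gaps and normalizing inconsistent values along the way.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical clickstream entries with inconsistent parameters and casing
clicks = [
    "https://shop.example.com/search?q=laptop&page=2",
    "https://shop.example.com/search?q=Laptop",  # no page parameter
    "https://shop.example.com/product?id=42&ref=search",
]


def to_record(url: str) -> dict:
    """Turn one raw URL into a structured record with a fixed set of fields."""
    parts = urlparse(url)
    params = parse_qs(parts.query)  # maps each key to a list of values
    query = params.get("q", [None])[0]
    return {
        "path": parts.path,
        "query": query.lower() if query else None,  # normalize inconsistent casing
        "page": int(params.get("page", ["1"])[0]),  # assume page 1 when absent
        "product_id": params.get("id", [None])[0],
    }


records = [to_record(u) for u in clicks]
for r in records:
    print(r)
```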
In one sentence, what is Spark used for?
Spark is used for fast, in-memory processing of large datasets, including near-real-time (streaming) analytics.
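Spark's RDD API expresses many jobs as chains of transformations such as flatMap, map, and reduceByKey, which Spark then distributes across a cluster. The sketch below imitates that word-count pattern locally in plain Python, without Spark, only to show what each step does; the input lines are made up.

```python
from collections import defaultdict

lines = ["big data is big", "spark processes big data"]

# flatMap: split every line into individual words
words = [w for line in lines for w in line.split()]

# map: pair each word with a count of 1
pairs = [(w, 1) for w in words]

# reduceByKey: sum the counts for each distinct word
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))
```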
In one word, what is Pig used for?
Analytics
In a few words, describe what Hive is used for.
Data warehousing