Big Data Flashcards
Definition of Big Data
Big Data refers to datasets that are so large and complex that traditional data processing software is inadequate to deal with them.
Characteristics of Big Data
Volume, Velocity, Variety, Veracity
Volume
The sheer quantity of data, often reaching terabytes or petabytes.
Velocity
The speed at which data is produced and analyzed.
Variety
The range of data types: data can be structured (e.g., relational tables), unstructured (e.g., free text), or multimedia (e.g., images, audio, video).
Veracity
The trustworthiness and quality of data, which may be inconsistent or ambiguous.
Use of Big Data as a Noun
Refers to the data itself, e.g., ‘We have big data’.
Use of Big Data as an Adjective
Refers to tools and processes used to handle data, e.g., ‘big data tools’.
Common Definitions of Big Data
Data whose size or complexity exceeds the capabilities of typical database software to store, manage, and analyze it.
Hype Around Big Data
Data has always been powerful; what is new is the number of data sources and the exponential growth in data volume, which together create new opportunities.
Importance of Long-Tail Data
Significant value comes from many niche items that are each rare on their own but large in combination, as seen in Amazon’s sales of low-volume products.
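The long-tail effect can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that sales follow a Zipf-like curve in which the item of popularity rank r sells in proportion to 1/r; the distribution, item counts, and function names (`zipf_sales`, `tail_share`) are hypothetical and do not describe Amazon's actual data.

```python
# Hypothetical illustration: item sales follow a Zipf-like distribution,
# where the item with popularity rank r sells in proportion to 1 / r.


def zipf_sales(num_items: int, scale: float = 1_000_000.0) -> list[float]:
    """Return hypothetical sales for items ranked 1..num_items (1/r decay)."""
    return [scale / rank for rank in range(1, num_items + 1)]


def tail_share(sales: list[float], head_size: int) -> float:
    """Fraction of total sales contributed by items outside the top `head_size`."""
    total = sum(sales)
    tail = sum(sales[head_size:])  # sales is already sorted by rank
    return tail / total


if __name__ == "__main__":
    sales = zipf_sales(num_items=1_000_000)
    share = tail_share(sales, head_size=1_000)
    print(f"Items ranked below the top 1,000 contribute {share:.1%} of sales")
```

Under this assumed 1/r curve, the items outside the top 1,000 of a million account for roughly 48% of total sales, so the many individually small sellers together rival the hits.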
Success Stories
Big Data has been used in crime prevention, genetic disease diagnosis, finance, personalized advertising, and sports injury prevention.
Challenges of Big Data Processing
Data volume, parallelization, fault tolerance, and system-level management.
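Parallelization and fault tolerance can be sketched on a single machine. The sketch below assumes a hypothetical workload (summing squares), a hypothetical `FlakyWorker` that simulates a transient failure, and uses Python threads in place of cluster nodes; it only illustrates splitting work into independent chunks and re-running the chunks that fail, not how any particular framework schedules tasks.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data: list[int], chunk_size: int) -> list[list[int]]:
    """Partition the input so that each chunk can be processed independently."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class FlakyWorker:
    """Hypothetical worker that fails the first time it sees selected chunks,
    standing in for a transient machine or network failure."""

    def __init__(self, failing_chunk_ids: set[int]) -> None:
        self._pending_failures = set(failing_chunk_ids)
        self._lock = threading.Lock()

    def process(self, chunk_id: int, chunk: list[int]) -> int:
        with self._lock:
            should_fail = chunk_id in self._pending_failures
            self._pending_failures.discard(chunk_id)  # each chunk fails at most once
        if should_fail:
            raise RuntimeError(f"worker lost while processing chunk {chunk_id}")
        return sum(x * x for x in chunk)  # the real per-chunk computation


def run_with_retries(chunks: list[list[int]], worker: FlakyWorker,
                     max_attempts: int = 3, max_workers: int = 4) -> int:
    """Process chunks in parallel, re-running only the chunks that failed."""
    results: dict[int, int] = {}
    remaining = list(range(len(chunks)))
    for _ in range(max_attempts):
        if not remaining:
            break
        failed: list[int] = []
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = {cid: pool.submit(worker.process, cid, chunks[cid])
                       for cid in remaining}
            for cid, future in futures.items():
                try:
                    results[cid] = future.result()
                except RuntimeError:
                    failed.append(cid)
        remaining = failed
    if remaining:
        raise RuntimeError(f"chunks {remaining} still failing after {max_attempts} attempts")
    # Combine the partial results once every chunk has succeeded.
    return sum(results.values())


if __name__ == "__main__":
    data = list(range(1, 101))
    chunks = split_into_chunks(data, chunk_size=10)
    total = run_with_retries(chunks, FlakyWorker(failing_chunk_ids={2, 5}))
    print(total)  # → 338350
```

Because each chunk is independent, a failed chunk can be re-run on its own instead of restarting the whole job.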
Apache Hadoop
An open-source framework that allows for the distributed processing of large data sets across clusters of computers.
Key Components of Hadoop
HDFS (storage), YARN (resource management), and MapReduce (parallel processing).
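The MapReduce model can be illustrated with the classic word-count example. The sketch below is plain, single-process Python that mimics the map, shuffle, and reduce phases; it does not use Hadoop's Java API, and the function names are hypothetical. On a real cluster, map tasks would run in parallel on input splits stored in HDFS, with YARN allocating the resources.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a final result."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    """Run map over every split, shuffle by word, then reduce each group."""
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["big data needs big tools", "Data tools process data"]
    print(word_count(splits))
```

Keeping map and reduce free of shared state is what lets a framework run them on many machines at once and re-run a failed task without affecting the others.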