Big Data Flashcards
Definition of Big Data
Big Data refers to datasets that are so large and complex that traditional data processing software is inadequate to deal with them.
Characteristics of Big Data
Volume, Velocity, Variety, Veracity
Volume
The sheer quantity of data that must be stored and processed.
Velocity
The speed at which data is produced and analyzed.
Variety
The range of data types and formats: structured, semi-structured, and unstructured (including multimedia).
Veracity
The trustworthiness and quality of data, which may be inconsistent or ambiguous.
Use of Big Data as a Noun
Refers to the data itself, e.g., ‘We have big data’.
Use of Big Data as an Adjective
Refers to tools and processes used to handle data, e.g., ‘big data tools’.
Common Definitions of Big Data
Data whose size or complexity exceeds the capabilities of typical database software to store, manage, and analyze.
Hype Around Big Data
Data has always been powerful, but new sources and exponential growth drive new opportunities.
Importance of Long-Tail Data
Niche items that are individually rare can add up to significant value in aggregate, as seen in cases like Amazon’s sales.
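Worked example (a minimal sketch, not real Amazon data): the arithmetic behind the long tail can be checked with made-up numbers. The sales figures and the helper name tail_share below are assumptions chosen purely for illustration.

```python
# Hypothetical unit sales: a few best-sellers and many niche items.
head = [1000, 800, 600]   # three best-selling products
tail = [5] * 2000         # 2,000 niche products selling 5 units each


def tail_share(head_sales, tail_sales):
    """Return the fraction of total sales contributed by the tail items."""
    total = sum(head_sales) + sum(tail_sales)
    return sum(tail_sales) / total


share = tail_share(head, tail)
print(f"Tail share of sales: {share:.1%}")
```

With these assumed figures, each niche item sells far less than any best-seller, yet the tail as a whole outweighs the head.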
Success Stories
Big data has been applied to crime prevention, genetic disease diagnosis, finance, personalized advertising, and sports injury prevention.
Challenges of Big Data Processing
Data volume, parallelization, fault tolerance, and system-level management.
Apache Hadoop
An open-source framework that allows for the distributed processing of large data sets across clusters of computers.
Key Components of Hadoop
HDFS (storage), YARN (resource management), and MapReduce (parallel processing).
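Worked example (a minimal sketch of the MapReduce programming model, not the Hadoop API): a word count split into map, shuffle, and reduce steps, run in a single Python process. The function names are illustrative assumptions; in Hadoop the framework performs the shuffle and distributes the map and reduce tasks across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data tools", "big data", "data lake"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can in principle run in parallel on different machines.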
Big Data Scenarios
Analytics (batch), Interactive (near-real-time), and Streaming (real-time).
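Worked example (a minimal sketch, assuming a simple average is the quantity of interest): the difference between batch and streaming processing can be seen by computing the same mean two ways. The class name RunningMean and the readings are assumptions for illustration.

```python
def batch_mean(values):
    """Batch: the full dataset is available before the computation starts."""
    return sum(values) / len(values)


class RunningMean:
    """Streaming: update the result one record at a time with constant state."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        # Incremental update: shift the mean toward x by 1/count of the gap.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


readings = [4.0, 8.0, 6.0, 2.0]  # hypothetical sensor readings
stream = RunningMean()
for r in readings:
    stream.update(r)

print(batch_mean(readings), stream.mean)
```

The streaming version never holds the whole dataset in memory and has an up-to-date answer after every record, which is the property real-time scenarios rely on.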
Data Lake
A central repository for storage, processing, and analysis of raw data, kept in its original format.
NoSQL and NewSQL
NoSQL DBMSs favor scalability and flexible schemas, typically relaxing ACID transaction guarantees, while NewSQL systems aim to combine relational features such as ACID transactions with NoSQL-style scalability.
Techniques for Big Data Analysis
ETL (extract, transform, load), data integration, data mining, and machine learning (supervised and unsupervised).
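Worked example (a minimal ETL sketch using only the Python standard library): raw CSV text is extracted, cleaned, and loaded into an in-memory SQLite table. The sample data, the table name sales, and the function names are assumptions for illustration; production pipelines typically use dedicated tooling.

```python
import csv
import io
import sqlite3

# Hypothetical raw input containing one malformed row.
RAW_CSV = """name,amount
alice, 10.5
bob,not_a_number
carol,7
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, convert amounts to float, drop invalid rows."""
    clean = []
    for row in rows:
        try:
            clean.append((row["name"].strip(), float(row["amount"])))
        except ValueError:
            continue  # skip rows whose amount is not numeric
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```

Keeping the three stages as separate functions makes it straightforward to test or replace each one independently.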
Goals of Analytics
Descriptive (what happened), Diagnostic (why it happened), Predictive (what is likely to happen), and Prescriptive (actions to take).
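Worked example (a minimal sketch with made-up monthly sales): a descriptive statistic summarizes what happened, while a simple least-squares trend line gives a predictive estimate of what is likely to happen next. The figures and the helper name fit_line are assumptions; a straight-line fit stands in here for whatever predictive model would actually be used.

```python
from statistics import mean

monthly_sales = [100.0, 110.0, 120.0, 130.0]  # hypothetical figures

# Descriptive: what happened on average?
average = mean(monthly_sales)


def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept for x = 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar


# Predictive: extrapolate the fitted trend one month ahead.
slope, intercept = fit_line(monthly_sales)
forecast = slope * len(monthly_sales) + intercept

print(average, forecast)
```

Diagnostic and prescriptive analytics would build on the same data by asking why the trend exists and what action to take in response.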
Big Data Job Roles
Data Analyst, Data Architect, Data Engineer, Data Scientist.
Skills of a Data Scientist
Statistical analysis, programming, data visualization, machine learning, domain knowledge.
Conclusion on Big Data
Big data offers great opportunities but comes with challenges in management and technological adaptation.