big_data_flashcards

1
Q

Definition of Big Data

A

Big Data refers to datasets that are so large and complex that traditional data processing software is inadequate to deal with them.

2
Q

Characteristics of Big Data

A

Volume, Velocity, Variety, Veracity

3
Q

Volume

A

The large quantity of data.

4
Q

Velocity

A

The speed at which data is produced and analyzed.

5
Q

Variety

A

Data comes in many forms: structured, semi-structured, and unstructured (including text and multimedia).

6
Q

Veracity

A

The trustworthiness and quality of data, which may be inconsistent or ambiguous.

7
Q

Use of Big Data as a Noun

A

Refers to the data itself, e.g., ‘We have big data’.

8
Q

Use of Big Data as an Adjective

A

Refers to tools and processes used to handle data, e.g., ‘big data tools’.

9
Q

Common Definitions of Big Data

A

Datasets whose size exceeds the capabilities of typical database software to store, manage, and analyze.

10
Q

Hype Around Big Data

A

Data has always been powerful, but new sources and exponential growth drive new opportunities.

11
Q

Importance of Long-Tail Data

A

Many niche items, each with low individual demand, add up to significant value when combined, as seen in cases like Amazon’s sales.
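
The long-tail arithmetic can be illustrated with a small sketch. The catalog below is invented purely for illustration (it is not Amazon data); it only shows how many low-volume items can outweigh a few bestsellers in total.

```python
# Hypothetical catalog: unit sales per item, invented for illustration only.
head_sales = [1000] * 10      # 10 bestsellers, 1,000 units each
tail_sales = [2] * 20_000     # 20,000 niche items, 2 units each

head_total = sum(head_sales)  # units sold by the bestsellers
tail_total = sum(tail_sales)  # units sold by all niche items combined

# Fraction of all units that comes from the niche items.
tail_share = tail_total / (head_total + tail_total)
print(f"head={head_total}, tail={tail_total}, tail share={tail_share:.0%}")
```

With these assumed numbers, the niche items account for most of the units sold even though each one sells almost nothing on its own.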

12
Q

Success Stories

A

Big Data has been used in crime prevention, genetic disease diagnosis, finance, personalized advertising, and sports injury prevention.

13
Q

Challenges of Big Data Processing

A

Data volume, parallelization, fault tolerance, and system-level management.

14
Q

Apache Hadoop

A

An open-source framework that allows for the distributed processing of large data sets across clusters of computers.

15
Q

Key Components of Hadoop

A

HDFS (storage), YARN (resource management), and MapReduce (parallel processing).
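
To make the MapReduce idea concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and only mimic the map, shuffle, and reduce steps that a real cluster would run in parallel across machines.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key between the two phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)

def word_count(lines):
    # Run the three steps in sequence on an in-memory list of lines.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big tools", "data lake"])
print(counts)
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase only on its own key, both steps could in principle be spread over many workers.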

16
Q

Big Data Scenarios

A

Analytics (batch), Interactive (near-real-time), and Streaming (real-time).
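
The difference between batch and streaming processing can be sketched with a simple average. This is an illustrative example only: the function names and sample readings are hypothetical, and real systems would read from files or message queues rather than a list.

```python
def batch_mean(values):
    # Batch: the full dataset is available before processing starts.
    return sum(values) / len(values)

def streaming_mean(stream):
    # Streaming: update the result as each record arrives, keeping
    # only a count and a running mean in memory.
    count, mean = 0, 0.0
    for x in stream:
        count += 1
        mean += (x - mean) / count
        yield mean

readings = [4.0, 8.0, 6.0]  # hypothetical sample values

print(batch_mean(readings))            # one answer after all data is read
print(list(streaming_mean(readings)))  # an updated answer after every record
```

The batch version needs all values up front, while the streaming version produces an up-to-date result after each record without storing the history.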

17
Q

Data Lake

A

A central repository for storage, processing, and analysis of raw data, kept in its original format.

18
Q

NoSQL and NewSQL

A

NoSQL DBMSs favor scalability and schema flexibility, typically relaxing ACID transaction guarantees, while NewSQL DBMSs aim to combine the relational model and ACID transactions with NoSQL-style scalability.

19
Q

Techniques for Big Data Analysis

A

ETL (extract, transform, load), data integration, data mining, and machine learning (supervised and unsupervised).
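
As one illustration of these techniques, an ETL (extract, transform, load) step can be sketched with the Python standard library. The sample data, the table name sales, and the function names are assumptions made for this example; a production pipeline would read from real sources and load into a real warehouse.

```python
import csv
import io
import sqlite3

RAW = "id,amount\n1, 10.5 \n2,\n3,7\n"  # hypothetical source data

def extract(text):
    # Extract: parse the raw CSV text into one dict per row.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: drop rows with a missing amount and cast field types.
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if amount:
            cleaned.append((int(row["id"]), float(amount)))
    return cleaned

def load(rows, conn):
    # Load: write the cleaned rows into a target table.
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
load(transform(extract(RAW)), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```

Keeping the three steps as separate functions makes it easy to test the cleaning rules on their own before any data is loaded.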

20
Q

Goals of Analytics

A

Descriptive (what happened), Diagnostic (why it happened), Predictive (what is likely to happen), and Prescriptive (actions to take).

21
Q

Big Data Job Roles

A

Data Analyst, Data Architect, Data Engineer, Data Scientist.

22
Q

Skills of a Data Scientist

A

Statistical analysis, programming, data visualization, machine learning, domain knowledge.

23
Q

Conclusion on Big Data

A

Big data offers great opportunities but comes with challenges in management and technological adaptation.