Big Data Engineering Flashcards

1
Q

What is Data Engineering?

A

The practice of designing and building systems for data aggregation, storage, and analysis at scale.

2
Q

Define Data Analytics.

A

The science of fusing heterogeneous data, identifying relationships, making predictions, and supporting decision-making.

3
Q

What is Big Data?

A

Large volumes of structured or unstructured data with high variety, velocity, and complexity.

4
Q

What are the 5 Vs of Big Data?

A

Volume: Large amounts of data
Variety: Different types and sources
Velocity: Speed of data generation
Veracity: Trustworthiness of data
Value: Business and strategic importance

5
Q

Why is Big Data significant?

A

Increasing data generation
Improved data storage and analysis capabilities
High business and research value

6
Q

What are the major classifications of Data Analytics?

A

Descriptive Analytics: Summarizing past data
Predictive Analytics: Forecasting future trends

7
Q

What are the key processes in the Big Data lifecycle?

A
  1. Acquisition
  2. Extraction
  3. Integration
  4. Analysis
  5. Interpretation
  6. Decision-making
8
Q

Describe the characteristics of a Data Lake.

A

Stores raw data
Stores any type of structured or unstructured data
Agile and low cost
Used for ML/IoT

9
Q

Describe the characteristics of a Data Warehouse.

A

Stores processed data
Stores structured data
Less agile and more expensive
Used for business intelligence, healthcare analytics
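A minimal sketch of the contrast between a Data Lake and a Data Warehouse, assuming JSON records and a hypothetical two-field warehouse schema. The lake keeps every raw record and applies structure only when a question is asked (schema-on-read), while the warehouse validates and types records before storing them (schema-on-write). The record contents, WAREHOUSE_SCHEMA, and the helper names are illustrative assumptions, not part of any specific product.

```python
import json

# Raw records as they might arrive from different sources (hypothetical examples)
raw_records = [
    '{"user": "alice", "amount": "12.50", "ts": "2024-01-05"}',
    '{"user": "bob", "amount": 7, "device": "sensor-3"}',
    'not valid json',
]

# Data lake style: store everything as-is, including malformed records
data_lake = list(raw_records)

# Data warehouse style: enforce a fixed schema at write time (hypothetical schema)
WAREHOUSE_SCHEMA = ("user", "amount")

def to_warehouse_row(raw: str):
    """Schema-on-write: parse and validate; return None if the record does not fit."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or not all(f in record for f in WAREHOUSE_SCHEMA):
        return None
    return {"user": str(record["user"]), "amount": float(record["amount"])}

warehouse = [row for raw in raw_records if (row := to_warehouse_row(raw)) is not None]

def read_devices(lake):
    """Schema-on-read: interpret raw records only when a specific question is asked."""
    devices = []
    for raw in lake:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and "device" in record:
            devices.append(record["device"])
    return devices

print(len(data_lake))            # all 3 raw records are retained
print(warehouse)                 # only the 2 records that fit the schema, typed
print(read_devices(data_lake))   # a new question answered from the raw lake
```

The device question can only be answered from the lake, because the warehouse discarded that field at write time. This reflects the "agile" versus "less agile" trade-off listed on the two cards.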

10
Q

What does step 1 (Acquisition) require?

A

Selection
Filtering
Metadata generation
Managing provenance
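The four acquisition requirements can be sketched as a small function over hypothetical sensor readings. Selection keeps only chosen sources, filtering drops unusable readings, metadata generation attaches a timestamp and checksum, and provenance records where each item came from. The field names, SELECTED_SOURCES, and the provenance format are assumptions for illustration.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical incoming readings from several sources (illustrative data)
incoming = [
    {"source": "sensor-a", "temp_c": 21.4},
    {"source": "sensor-b", "temp_c": None},
    {"source": "sensor-a", "temp_c": 22.0},
    {"source": "web-log", "temp_c": 19.8},
]

SELECTED_SOURCES = {"sensor-a", "sensor-b"}  # hypothetical choice of sources to acquire

def acquire(records):
    acquired = []
    for rec in records:
        if rec["source"] not in SELECTED_SOURCES:  # selection: ignore other sources
            continue
        if rec["temp_c"] is None:                  # filtering: drop unusable readings
            continue
        payload = repr(sorted(rec.items())).encode()
        acquired.append({
            "data": rec,
            "metadata": {                          # metadata generation
                "acquired_at": datetime.now(timezone.utc).isoformat(),
                "checksum": hashlib.sha256(payload).hexdigest(),
            },
            # managing provenance: remember the origin and the step that produced it
            "provenance": {"origin": rec["source"], "step": "acquisition"},
        })
    return acquired

result = acquire(incoming)
print(len(result))
```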

11
Q

What does step 2 (Extraction) require?

A

Transformation
Normalization
Cleaning
Aggregation
Error handling
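A minimal sketch of the extraction steps applied to hypothetical (city, Fahrenheit temperature) rows. Normalization is shown here as conversion to a common unit. In other contexts the term can also mean rescaling values to a common range. The data and the extract helper are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical raw rows: city name and a temperature string in Fahrenheit
raw_rows = [
    ("Milan ", "68.0"),
    ("milan", "77"),
    ("Rome", "n/a"),
    ("ROME", "86"),
]

def extract(rows):
    errors = []
    cleaned = []
    for city, temp_f in rows:
        try:
            value_f = float(temp_f)          # transformation: text -> number
        except ValueError:
            errors.append((city, temp_f))    # error handling: keep bad rows for review
            continue
        city = city.strip().title()          # cleaning: trim whitespace, fix casing
        value_c = (value_f - 32) * 5 / 9     # normalization: convert to a common unit
        cleaned.append((city, value_c))

    grouped = defaultdict(list)
    for city, value_c in cleaned:            # aggregation: average per city
        grouped[city].append(value_c)
    averages = {city: sum(vals) / len(vals) for city, vals in grouped.items()}
    return averages, errors

averages, errors = extract(raw_rows)
print(averages)
print(errors)
```

Bad rows are collected rather than silently dropped, so they can be inspected later instead of disappearing from the pipeline.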

12
Q

What does step 3 (Integration) require?

A

Standardization
Conflict management
Reconciliation
Mapping definition
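The integration requirements can be sketched by merging two hypothetical customer sources that use different field names. A mapping definition renames fields to a unified schema, standardization normalizes formats (here, lowercase emails), reconciliation matches records on a shared id, and conflict management resolves disagreements by source priority while logging them. The sources, MAPPINGS, and the "CRM wins" priority rule are assumptions for illustration.

```python
# Two hypothetical sources describing the same customer with different field names
crm = [{"cust_id": "C1", "full_name": "Ada Lovelace", "email": "ADA@EXAMPLE.COM"}]
billing = [{"customer": "C1", "name": "A. Lovelace", "mail": "ada@example.com", "balance": 42.0}]

# Mapping definition: source field -> unified field
MAPPINGS = {
    "crm": {"cust_id": "id", "full_name": "name", "email": "email"},
    "billing": {"customer": "id", "name": "name", "mail": "email", "balance": "balance"},
}

# Conflict management policy (assumption): earlier sources in this list take precedence
PRIORITY = ["crm", "billing"]

def standardize(record, source):
    mapping = MAPPINGS[source]
    unified = {mapping[k]: v for k, v in record.items() if k in mapping}
    if "email" in unified:
        unified["email"] = unified["email"].lower()  # standardization of formats
    return unified

def integrate(sources):
    merged = {}
    conflicts = []
    for source in PRIORITY:
        for record in sources[source]:
            rec = standardize(record, source)
            target = merged.setdefault(rec["id"], {})  # reconciliation: match on shared id
            for field, value in rec.items():
                if field not in target:
                    target[field] = value
                elif target[field] != value:
                    # keep the higher-priority value, but log the disagreement
                    conflicts.append((rec["id"], field, target[field], value))
    return merged, conflicts

merged, conflicts = integrate({"crm": crm, "billing": billing})
print(merged)
print(conflicts)
```

Because the emails are standardized before comparison, they no longer conflict; only the genuinely different names are reported.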

13
Q

What does step 4 (Analysis) require?

A

Exploration
Data mining
Machine Learning
Visualization
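A small sketch touching three of the four analysis activities on hypothetical shopping baskets: exploration (summary statistics), data mining (counting item pairs that co-occur, a first step of frequent-itemset mining), and visualization (a plain-text bar chart). Machine learning is left out because it would usually rely on a dedicated library. The data and the MIN_SUPPORT threshold are assumptions.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, median

# Hypothetical shopping baskets (illustrative data)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

# Exploration: basic statistics about basket sizes
sizes = [len(b) for b in baskets]
print("mean size:", mean(sizes), "median size:", median(sizes))

# Data mining: count item pairs bought together
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)
MIN_SUPPORT = 3  # assumed threshold: pair must appear in at least 3 baskets
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= MIN_SUPPORT}

# Visualization: a plain-text bar chart of item frequencies
item_counts = Counter(item for basket in baskets for item in basket)
for item, count in item_counts.most_common():
    print(f"{item:<7} {'#' * count}")

print(frequent_pairs)
```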

14
Q

What does step 5 (Interpretation) require?

A

Knowledge of the domain
Knowledge of the provenance
Identification of patterns of interest
Flexibility of the process

15
Q

What does step 6 (Decision-making) require?

A

Managerial skills
Continuous improvement of the project

16
Q

What is the software stack for data analytics?

A
  1. Ingestion
  2. Storage (HDFS)
  3. Data preprocessing (Spark)
  4. Knowledge extraction