Question 5 Flashcards

1
Q

What is Big Data?

A

Big Data refers to datasets whose volume, velocity, and variety (the 3 V’s) make them unsuitable for traditional relational database management systems. Handling such data lets businesses capture and track continuous data streams, enabling real-time processing and insights.

2
Q

What are the 3 V’s of Big Data?

A
  • Volume
  • Velocity
  • Variety
3
Q

What is Volume in Big Data?

A

Volume: Refers to the vast amount of data generated. It can be handled through:

  • Scaling Up: Upgrading existing systems to handle larger loads.
  • Scaling Out: Distributing the load across multiple servers when a single server’s capacity is exceeded.
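
As an illustration of scaling out, the sketch below spreads keys across several servers by hashing each key. The function names `shard_for` and `distribute`, the key format, and the server count are assumptions made for this example; real systems often use more elaborate schemes such as consistent hashing.

```python
import hashlib

def shard_for(key: str, num_servers: int) -> int:
    """Map a key to a server index using a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers

def distribute(keys, num_servers):
    """Group keys by the server index that would store them."""
    shards = {i: [] for i in range(num_servers)}
    for key in keys:
        shards[shard_for(key, num_servers)].append(key)
    return shards

# Spread 1000 made-up user keys over 4 servers.
shards = distribute([f"user:{i}" for i in range(1000)], num_servers=4)
for server, keys in shards.items():
    print(server, len(keys))
```

Adding capacity here means raising `num_servers`, whereas scaling up would mean giving a single server more memory, CPU, or disk.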
4
Q

What is Velocity in Big Data?

A

Velocity: The speed at which data is generated and must be processed. Two common approaches are:

  • Stream Processing: Analysing data in real-time as it flows into the system.
  • Feedback Loop Processing: Analysing data to produce actionable insights immediately.
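
A minimal sketch of stream processing, assuming the stream is simply a Python iterable of numeric readings: each value is analysed as it arrives instead of waiting for the full dataset. The name `rolling_average` and the sample readings are made up for illustration.

```python
from collections import deque

def rolling_average(stream, window_size):
    """Yield the average of the most recent readings as each one arrives."""
    window = deque(maxlen=window_size)  # old readings drop out automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)

readings = [10, 20, 30, 40]  # stand-in for a live data feed
averages = list(rolling_average(readings, window_size=2))
print(averages)  # [10.0, 15.0, 25.0, 35.0]
```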
5
Q

What is Variety in Big Data?

A

Variety: The different types of data (structured and unstructured) that need to be stored.

  • Structured Data: Fits into a predefined model (e.g., relational databases).
  • Unstructured Data: Does not fit into a predefined model (e.g., text, images).
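
The contrast can be sketched as follows, using Python's built-in `sqlite3` module for the structured case. The table name, columns, and sample values are assumptions chosen for the example.

```python
import sqlite3

# Structured: every row must fit the predefined columns of the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO customers (name, age) VALUES (?, ?)", ("Ada", 36))
rows = conn.execute("SELECT name, age FROM customers").fetchall()

# Unstructured: free text has no predefined fields, so any structure
# (here, a simple word count) has to be derived by the program.
review = "Great product, arrived late though."
word_count = len(review.split())

print(rows, word_count)
```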
6
Q

What is NoSQL?

A

Non-relational database technologies developed to address Big Data challenges.

7
Q

How does NoSQL differ from a Relational Model in Key-value Stores?

A

NoSQL Structure:
- Data stored as key-value pairs, with each key unique
Differences from Relational:
- Schema-less, simple queries, highly scalable
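
A toy in-memory key-value store, as a sketch of the idea only: the class `KeyValueStore` and its method names are hypothetical and do not correspond to any particular product's API.

```python
class KeyValueStore:
    """Minimal key-value store: values are opaque and looked up only by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # No schema: any value can be stored under any key.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("session:42", {"user": "ada", "cart": ["book"]})
store.put("counter", 7)
print(store.get("session:42"), store.get("missing"))
```

Queries are limited to lookups by key, which is what keeps the model simple and easy to spread across servers.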

8
Q

How does NoSQL differ from a Relational Model in Document Databases?

A

NoSQL Structure:
- Documents (e.g. JSON)
Differences from Relational:
- Flexible schemas, content-based querying, data stored together.
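
To illustrate flexible schemas and content-based querying, the sketch below models documents as plain Python dictionaries. The `find` helper and the sample fields are assumptions for this example, not the query API of a real document database.

```python
# Two documents in the same collection with different fields.
documents = [
    {"_id": 1, "name": "Ada", "tags": ["admin"], "address": {"city": "London"}},
    {"_id": 2, "name": "Linus", "email": "linus@example.com"},
]

def find(docs, **criteria):
    """Return documents whose top-level fields match all given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

matches = find(documents, name="Linus")
print(matches)
```

Note that the address is nested inside the first document, so related data is stored together instead of being split across joined tables.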

9
Q

How does NoSQL differ from a Relational Model in Column-family Stores?

A

NoSQL Structure:
- Data stored in columns, grouped into column families
Differences from Relational:
- Optimized for specific queries, varied column schemas, highly available and scalable.
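
A rough sketch of the column-family layout, modelled as nested dictionaries (row key, then column family, then column). The row keys, family names, and the `read_columns` helper are made up for illustration; real column-family stores add persistence, timestamps, and distribution on top of this idea.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))

# Rows may carry different columns, even within the same family.
table["user:1"]["profile"]["name"] = "Ada"
table["user:1"]["profile"]["city"] = "London"
table["user:2"]["profile"]["name"] = "Linus"
table["user:2"]["activity"]["last_login"] = "2024-01-01"

def read_columns(table, row_key, family):
    """Fetch only one column family for a row, leaving the others untouched."""
    return dict(table[row_key][family])

print(read_columns(table, "user:1", "profile"))
```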

10
Q

How does NoSQL differ from a Relational Model in Graph Databases?

A

NoSQL Structure:
- Nodes and edges
Differences from Relational:
- Direct relationship modelling, efficient traversal, adaptable schemas
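
As a sketch of nodes, edges, and traversal, the example below stores a small graph as an adjacency list and walks it breadth-first. The node names and the `reachable` function are assumptions for illustration, not a graph database API.

```python
from collections import deque

# Each node maps to the nodes it points to (its outgoing edges).
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}

def reachable(graph, start):
    """Breadth-first traversal: nodes reachable from start, in visit order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order

print(reachable(edges, "alice"))  # ['alice', 'bob', 'carol', 'dave']
```

Following relationships is a direct lookup on each node, whereas a relational model would typically need a join per hop.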

11
Q

What is the Hadoop framework?

A

A Java-based framework designed for the distributed storage and processing of large data sets across clusters of computers.

12
Q

Explain the 2 core components of the Hadoop Framework.

A
  • Hadoop Distributed File System (HDFS): A distributed file system that stores data across many machines and provides high-throughput access.
  • MapReduce: A programming model that processes large data sets in parallel across a distributed cluster.
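
The MapReduce model can be sketched in a single process with a word count. This is not Hadoop's Java API; the function names `map_phase`, `shuffle`, and `reduce_phase` and the input lines are assumptions for illustration. On a real cluster the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reduce: combine the values for one key into a single result."""
    return word, sum(counts)

lines = ["big data big", "data streams"]
pairs = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
print(word_counts)  # {'big': 2, 'data': 2, 'streams': 1}
```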
13
Q

What are the major components of the Hadoop ecosystem?

A
  • Hive: A data warehousing solution that uses SQL-like queries.
  • Pig: A high-level scripting platform (its language is Pig Latin) for creating MapReduce jobs.
  • HBase: A NoSQL database that runs on top of HDFS.
14
Q

What is data storage, and how does it differ from data processing?

A

Data storage focuses on how data is organized and saved, involving structures, formats, and systems. Data processing, by contrast, emphasizes how data is accessed, manipulated, and transformed through queries, transactions, and analytics.
