Big Data and NoSQL Flashcards
What is data storage?
Data storage focuses on how data is organized and saved, involving structures, formats, and systems.
What is data processing?
Data processing emphasizes how data is accessed, manipulated, and transformed through queries, transactions, and analytics.
Explain the role of Big Data in modern business
Big data: Refers to datasets characterized by high volume, velocity, and variety (the "3 Vs"), which make them unsuitable for traditional relational database management systems. Big data allows businesses to capture and track continuous data streams, enabling real-time processing and insights.
Characteristics of Big Data (the "3 Vs")
- Volume
- Velocity
- Variety
What is volume?
- Volume: Refers to the vast amount of data generated. It can be handled through:
* Scaling Up: Upgrading existing systems to handle larger loads.
* Scaling Out: Distributing the load across multiple servers when a single server’s capacity is exceeded.
What is velocity?
- Velocity: The speed at which data is generated and must be processed.
* Stream Processing: Analysing data in real-time as it flows into the system (see the sketch below).
* Feedback Loop Processing: Analysing data to produce actionable insights immediately.
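As a rough illustration of stream processing, here is a minimal Python sketch that updates aggregates as each event arrives instead of waiting for a batch. The event source, names, and data are invented for the example.

```python
# Minimal stream-processing sketch: maintain running aggregates per key
# as events arrive, rather than processing a stored batch later.
from collections import defaultdict

def event_stream():
    """Stand-in for a real-time feed (e.g., a message queue consumer)."""
    yield ("sensor-1", 21.5)
    yield ("sensor-2", 19.0)
    yield ("sensor-1", 22.1)

running_totals = defaultdict(float)
counts = defaultdict(int)

for sensor_id, reading in event_stream():
    # Update state incrementally as each event flows in.
    running_totals[sensor_id] += reading
    counts[sensor_id] += 1
    print(sensor_id, "running average:", running_totals[sensor_id] / counts[sensor_id])
```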
What is variety?
- Variety: The different types of data (structured and unstructured) that need to be stored.
* Structured Data: Fits into a predefined model (e.g., relational databases).
* Unstructured Data: Does not fit into a predefined model (e.g., text, images).
What are the core components of Hadoop?
- Hadoop Distributed File System (HDFS): A distributed file system that stores data across many machines and provides high-throughput access.
- MapReduce: A programming model that processes large data sets in parallel across a distributed cluster.
What is the Hadoop framework?
Hadoop: A Java-based framework designed for the distributed storage and processing of large data sets across clusters of computers.
HDFS Key Features
- High Volume: Uses large default block sizes (64 MB in early Hadoop versions, 128 MB in later ones), suited to storing very large files.
- Write-Once, Read-Many: Simplifies concurrency and improves throughput.
- Fault Tolerance: Data is replicated across different devices to ensure availability in case of failures.
MapReduce Framework:
- Map Function: Processes input data into key-value pairs.
- Reduce Function: Aggregates and summarizes the results of the map function (see the sketch below).
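The word-count example below is a minimal sketch of the map and reduce steps written in plain Python rather than the Hadoop Java API; function names such as map_fn and reduce_fn are illustrative, not part of any framework.

```python
# Word-count sketch of the MapReduce idea in plain Python.
from itertools import groupby
from operator import itemgetter

def map_fn(line):
    # Map: emit (key, value) pairs, here (word, 1) for each word in a line.
    for word in line.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # Reduce: aggregate all values that share the same key.
    return (word, sum(counts))

lines = ["the quick brown fox", "the lazy dog"]

# Shuffle/sort step: collect and group intermediate pairs by key.
pairs = sorted(kv for line in lines for kv in map_fn(line))
results = [reduce_fn(word, [count for _, count in group])
           for word, group in groupby(pairs, key=itemgetter(0))]
print(results)  # e.g. [('brown', 1), ('dog', 1), ..., ('the', 2)]
```

In Hadoop itself, the shuffle/sort step and the distribution of map and reduce tasks across the cluster are handled by the framework; only the map and reduce logic is written by the developer.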
Identify the major components of the Hadoop ecosystem
Hadoop Ecosystem: A set of related tools and applications that complement Hadoop.
Examples of the Hadoop ecosystem
- Hive: A data warehousing solution that uses SQL-like queries.
- Pig: A scripting language for creating MapReduce jobs.
- HBase: A NoSQL database that runs on top of HDFS.
Four major approaches to the NoSQL data model and how they differ from the relational model
- Key-Value Stores: Schema-less, simple queries, highly scalable.
- Document Databases: Flexible schemas, content-based querying, data stored together.
- Column-Family Stores: Optimized for specific queries, varied column schemas, highly available and scalable.
- Graph Databases: Direct relationship modeling, efficient traversal, adaptable schemas (see the sketch below).
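As a rough comparison, the sketch below shapes the same user record under each of the four models using plain Python structures; no specific NoSQL product or API is assumed, and all keys and values are made up for illustration.

```python
# Sketch contrasting how the same record might be shaped in the four
# NoSQL data models, using plain Python structures only.

# Key-value store: an opaque value looked up by a single key.
kv_store = {"user:42": '{"name": "Ada", "city": "London"}'}

# Document database: a self-contained document with a flexible schema;
# fields can vary from one document to the next.
documents = [
    {"_id": 42, "name": "Ada", "city": "London", "orders": [101, 102]},
    {"_id": 43, "name": "Grace"},  # no 'city' or 'orders' field needed
]

# Column-family store (conceptually): rows keyed by a row key, each row
# holding its own set of column -> value pairs.
column_family = {
    "row-42": {"profile:name": "Ada", "profile:city": "London"},
    "row-43": {"profile:name": "Grace"},
}

# Graph database: nodes plus explicit relationships for fast traversal.
nodes = {42: {"name": "Ada"}, 43: {"name": "Grace"}}
edges = [(42, "FOLLOWS", 43)]

print(kv_store["user:42"])
print(documents[1])
print(column_family["row-42"]["profile:name"])
print(edges)
```

Unlike the relational model, none of these structures require a fixed, shared schema or joins across normalized tables; each model trades that generality for scalability or for efficiency on a particular access pattern.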
Describe the characteristics of NewSQL databases
NewSQL databases: Aim to bridge the gap between traditional relational databases (RDBMS) and NoSQL databases. RDBMS are essential for supporting ACID-compliant transactions in everyday business operations, while NoSQL databases focus on managing large volumes of user-generated and machine-generated data across distributed systems; NewSQL systems aim to offer NoSQL-style scalability while retaining SQL and ACID transaction guarantees.