Big Data and NoSQL Flashcards

1
Q

What is data storage?

A

Data storage focuses on how data is organized and saved, involving structures, formats, and systems.

2
Q

What is data processing?

A

Data processing emphasizes how data is accessed, manipulated, and transformed through queries,
transactions, and analytics.

3
Q

Explain the role of Big Data in modern business

A

Big Data refers to datasets whose volume, velocity, and variety (the 3 Vs) make them unsuitable for
traditional relational database management systems. Big Data allows businesses to capture and track
continuous data streams, enabling real-time processing and insights.

4
Q

Characteristics of Big Data and how these go beyond the “3 Vs”

A
  1. Volume
  2. Velocity
  3. Variety
5
Q

What is volume?

A
  1. Volume: Refers to the vast amount of data generated. It can be handled through:
    * Scaling Up: Upgrading existing systems to handle larger loads.
    * Scaling Out: Distributing the load across multiple servers when a single server’s capacity is exceeded.
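
One common way to scale out is to assign each record to a server by hashing its key. The sketch below is only an illustration: the server names, the `order-N` keys, and the helper names `pick_server` and `distribute` are hypothetical, and real systems often use more elaborate schemes such as consistent hashing.

```python
import hashlib


def pick_server(key: str, servers: list[str]) -> str:
    """Map a record key to one server using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def distribute(keys: list[str], servers: list[str]) -> dict[str, list[str]]:
    """Group keys by the server each one is assigned to."""
    placement: dict[str, list[str]] = {server: [] for server in servers}
    for key in keys:
        placement[pick_server(key, servers)].append(key)
    return placement


# Hypothetical cluster and record keys, for illustration only.
servers = ["node-a", "node-b", "node-c"]
keys = [f"order-{i}" for i in range(12)]
placement = distribute(keys, servers)

for server, assigned in placement.items():
    print(server, assigned)
```

Adding a server to the list spreads the same keys over more machines, which is the essence of scaling out; scaling up would instead mean giving a single server more capacity.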
6
Q

What is velocity?

A
  1. Velocity: The speed at which data is generated and must be processed.
    * Stream Processing: Analysing data in real-time as it flows into the system.
    * Feedback Loop Processing: Analysing data to produce actionable insights immediately.
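
Stream processing can be pictured as updating a result each time a new value arrives, rather than waiting for the full dataset. The following is a minimal sketch using a rolling average; the function name `rolling_average`, the window size, and the sample readings are assumptions made for illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the average of the most recent `window` values after each arrival."""
    recent: deque = deque(maxlen=window)  # old values drop out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


# Hypothetical sensor readings arriving one at a time.
readings = [10.0, 20.0, 30.0, 40.0]
averages = list(rolling_average(readings, window=2))
print(averages)
```

Because the function is a generator, each average is available as soon as its reading arrives, which is what allows a feedback loop to act on the result immediately.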
7
Q

What is variety?

A
  1. Variety: The different types of data (structured and unstructured) that need to be stored.
    * Structured Data: Fits into a predefined model (e.g., relational databases).
    * Unstructured Data: Does not fit into a predefined model (e.g., text, images).
8
Q

What are the core components of Hadoop?

A
  • Hadoop Distributed File System (HDFS): A distributed file system that stores data across many machines
    and provides high throughput access.
  • MapReduce: A programming model that processes large data sets in parallel across a distributed cluster.
8
Q

What is the Hadoop framework?

A

Hadoop is a Java-based framework designed for the distributed storage and processing of large data sets
across clusters of computers.

9
Q

HDFS Key Features

A
  • High Volume: Default block sizes are large (64 MB in Hadoop 1.x, 128 MB in Hadoop 2.x and later).
  • Write-Once, Read-Many: Simplifies concurrency and improves throughput.
  • Fault Tolerance: Data blocks are replicated across different nodes to ensure availability in case of failures.
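
The block and replication ideas above can be sketched in a few lines. This is a simplified illustration, not how HDFS actually places data: the round-robin placement, the data node names, the replication factor of 3, and the helper names `split_into_blocks` and `place_replicas` are all assumptions (real HDFS placement is rack-aware).

```python
MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int) -> list[int]:
    """Return the sizes of the blocks a file of `file_size` bytes would occupy."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + offset) % len(nodes)] for offset in range(replication)]
        for block in range(num_blocks)
    }


# A 200 MB file with 64 MB blocks: three full blocks plus one 8 MB block.
blocks = split_into_blocks(200 * MB, 64 * MB)
# Hypothetical data nodes; a replication factor of 3 is assumed here.
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)

for block_id, holders in placement.items():
    print(block_id, blocks[block_id] // MB, "MB", holders)
```

If any single node in this sketch fails, every block still has two other copies, which is the point of the fault tolerance feature.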
10
Q

MapReduce Framework:

A
  • Map Function: Processes input data into key-value pairs.
  • Reduce Function: Aggregates and summarizes the key-value pairs produced by the map function.
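
The two functions can be illustrated with the classic word count, run here in a single Python process rather than on a cluster. This is a sketch of the programming model only, not the Hadoop API; the function names `map_words`, `shuffle`, and `reduce_counts` and the sample lines are hypothetical.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return (word, sum(counts))


lines = ["big data big insights", "data streams"]
mapped = [pair for line in lines for pair in map_words(line)]
result = dict(reduce_counts(word, counts) for word, counts in shuffle(mapped).items())
print(result)
```

On a real cluster the map calls run in parallel on different nodes, and the grouping step moves each key's values to the node that runs its reduce call.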
11
Q

Identify the major components of the Hadoop ecosystem

A

Hadoop Ecosystem: A set of related tools and applications that complement Hadoop.

12
Q

Examples of tools in the Hadoop ecosystem

A
  • Hive: A data warehousing solution that uses SQL-like queries.
  • Pig: A high-level scripting platform (using the Pig Latin language) for creating MapReduce jobs.
  • HBase: A NoSQL database that runs on top of HDFS.
13
Q

Four major approaches of the NoSQL data model and how they differ from the relational model

A
  1. Key-Value Stores: Schema-less, simple queries, highly scalable.
  2. Document Databases: Flexible schemas, content-based querying, related data stored together.
  3. Column-Family Databases: Optimized for specific queries, varied column schemas, highly available and scalable.
  4. Graph Databases: Direct relationship modeling, efficient traversal, adaptable schemas.
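
To make the four models concrete, the sketch below represents one customer in each style using plain Python structures. It is an illustration of data shape only and does not use any real database API; the customer, field names, and the helper `neighbours` are hypothetical.

```python
import json

# Key-value: an opaque value looked up by key; the store does not inspect it.
kv_store = {"customer:42": json.dumps({"name": "Ada", "city": "Leeds"})}

# Document: a self-describing record that can be queried by its content,
# with related data (the orders) stored together with the customer.
documents = [{"_id": 42, "name": "Ada", "orders": [{"sku": "A1", "qty": 2}]}]

# Column family: row key -> column family -> columns; rows may hold different columns.
column_family = {
    "42": {"profile": {"name": "Ada"}, "activity": {"last_login": "2024-01-01"}},
    "43": {"profile": {"name": "Bob", "city": "York"}},
}

# Graph: nodes plus explicit relationships stored as (source, relation, target).
nodes = {42: {"label": "Customer", "name": "Ada"}, 7: {"label": "Product", "sku": "A1"}}
edges = [(42, "BOUGHT", 7)]


def neighbours(node_id, relation):
    """Follow one relationship type outward from a node."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relation]


print(json.loads(kv_store["customer:42"])["name"])           # lookup by key
print([d["name"] for d in documents if d["orders"]])          # query by content
print(sorted(column_family["43"]["profile"]))                 # per-row columns
print(neighbours(42, "BOUGHT"))                               # relationship traversal
```

In a relational design the same data would be split across fixed-schema tables and reassembled with joins, which is the contrast each of the four models draws in its own way.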
14
Q

Describe the characteristics of NewSQL databases

A

NewSQL databases aim to bridge the gap between traditional relational databases (RDBMS) and NoSQL
databases. RDBMS are essential for supporting ACID-compliant transactions in everyday business
operations, while NoSQL databases focus on managing large volumes of user-generated and machine-generated
data across distributed systems.

15
Q

NewSQL databases, like ClustrixDB and NuoDB, incorporate features from both RDBMS and NoSQL, offering:

A
  • Similarities to RDBMS:
    * SQL as the primary interface
    * ACID-compliant transactions
  • Similarities to NoSQL:
    * Support for highly distributed clusters
    * Key-value or column-oriented data storage