First Midterm Flashcards

1
Q

What does NoSQL stand for?

A

Not Only SQL

It refers to a set of modern databases that don’t use traditional relational models.

2
Q

What is a key trait of NoSQL databases?

A

Schema-less

Data can be stored as key-value pairs, documents, columns, or graphs.

3
Q

What is the main purpose of NoSQL databases?

A

Designed to handle Big Data and unstructured/semi-structured data.

4
Q

Why are NoSQL databases needed?

A

Traditional relational (SQL) databases struggle to scale with the volume, variety, and velocity of modern data.

5
Q

What are the 7 V’s of Big Data?

A
  • Volume: Enormous amounts of data (ZB, YB)
  • Velocity: Speed of incoming data (real-time)
  • Variety: Different formats (text, video, images, JSON, etc.)
  • Variability: Data meaning can change with time
  • Veracity: Trustworthiness and quality of data
  • Visualization: Displaying complex data clearly (charts, graphs)
  • Value: Extracting useful insights from data
6
Q

Who defined Relational Databases (RDBMS) and when?

A

Edgar Codd in 1970.

7
Q

What are the ACID properties in RDBMS?

A
  • Atomicity: Transactions are all-or-nothing.
  • Consistency: Data remains valid after a transaction.
  • Isolation: Transactions don’t interfere.
  • Durability: Once saved, data won’t be lost.
8
Q

What are the limitations of RDBMS?

A
  • Fixed schemas
  • Hard to scale horizontally
  • Not suitable for massive, real-time, or semi-structured data.
9
Q

What are the main characteristics of NoSQL databases?

A
  • Horizontal Scalability
  • Schema-less
  • High Performance
  • Open-source and cost-effective
  • Eventual Consistency
10
Q

What is eventual consistency in NoSQL?

A

Data updates are not always immediately visible across all nodes but will eventually become consistent.

11
Q

What are the types of NoSQL databases?

A
  • Key-Value Stores
  • Document Databases
  • Column-Family Stores
  • Graph Databases
12
Q

What is a Key-Value Store?

A

The simplest form of a NoSQL database, storing data as (key, value) pairs.

13
Q

What are the basic operations of Key-Value Stores?

A
  • put(key, value)
  • get(key)
  • delete(key)
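
The three operations above can be sketched with a plain Python dictionary. This is a minimal in-memory illustration, not how any particular product is built; the class name `KeyValueStore` and the sample key `user:1` are made up for the example.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store backed by a Python dict."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Insert the pair, overwriting any existing value for the key.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Look up by key only; returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Remove the key and report whether it was present.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user:1", {"name": "Ada"})
print(store.get("user:1"))
store.delete("user:1")
print(store.get("user:1"))
```

Note that every operation goes through the key; there is no way to ask "which keys hold this value" without scanning everything.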
14
Q

What is the BASE model in NoSQL?

A
  • Basically Available: System is always available (even if partially)
  • Soft state: The system's state may change over time, even without new writes, as replicas converge
  • Eventual consistency: Data will become consistent… eventually
15
Q

What is the CAP theorem?

A
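A distributed system can guarantee at most two of the following three properties at once; when a network partition occurs, it must trade consistency against availability.
  • Consistency (C): Every read sees the most recent write, so all users see the same data.
  • Availability (A): Every request to a working node gets a response, though it may not reflect the latest write.
  • Partition Tolerance (P): The system continues to work even if network failures split the nodes.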
16
Q

What are the two types of scalability?

A
  • Vertical Scaling (Scaling up)
  • Horizontal Scaling (Scaling out)
17
Q

What is sharding in NoSQL?

A

Splits data across multiple machines based on a key.

18
Q

What is the difference between replication and sharding?

A
  • Replication: Duplicates the same data across multiple nodes.
  • Sharding: Splits data across multiple machines.
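
The difference can be made concrete with a toy model in which each node is just a dictionary. The node names and the choice of SHA-256 for the shard function are assumptions made for this sketch; real systems pick their own hashing scheme.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def shard_for(key: str) -> str:
    # Stable hash, so the same key always maps to the same node.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def write_replicated(storage, key, value):
    # Replication: every node stores a full copy of the data.
    for node in NODES:
        storage[node][key] = value


def write_sharded(storage, key, value):
    # Sharding: exactly one node stores each key.
    storage[shard_for(key)][key] = value


replicated = {node: {} for node in NODES}
sharded = {node: {} for node in NODES}
for k in ["user:1", "user:2", "user:3", "user:4"]:
    write_replicated(replicated, k, "x")
    write_sharded(sharded, k, "x")

# Each replicated node holds all 4 keys.
print({n: len(d) for n, d in replicated.items()})
# The sharded nodes hold 4 keys in total, split between them.
print({n: len(d) for n, d in sharded.items()})
```

Replication adds redundancy and read capacity; sharding adds storage and write capacity. The two are often combined.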
19
Q

What does durability mean in the context of ACID?

A

Once a transaction is committed, it won’t be lost.

20
Q

What is the MapReduce programming model?

A

A three-phase model consisting of Map, Shuffle, and Reduce phases.

21
Q

What does the Map function do in MapReduce?

A

Processes each record and outputs zero or more (key, value) pairs.

22
Q

What is the Shuffle phase in MapReduce?

A

Groups intermediate pairs with the same key, managed by the framework.

23
Q

What is the Reduce phase in MapReduce?

A

Applies the reduce function to each unique key to generate final output.

24
Q

What is Hadoop?

A

An open-source framework for storing and processing large datasets; it includes an implementation of MapReduce and the HDFS distributed file system.

25
What are the components of the Hadoop framework?
  • Input Reader
  • Map Function
  • Partition Function
  • Shuffle/Sort
  • Reduce Function
  • Output Writer
26
What is the role of the NameNode in HDFS?
Master server that manages file system metadata: the namespace and the mapping of blocks to DataNodes.
27
What is the role of DataNodes in HDFS?
Worker nodes that store actual blocks and send heartbeats to NameNode.
28
What is the purpose of a combiner in MapReduce?
Acts like a mini-reducer for partial aggregation to reduce network traffic.
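
A minimal sketch of the idea, assuming a word-count style job where the reduce function is a sum (a combiner is only safe when the reduce operation is associative and commutative). The function name `combine` and the sample data are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, int]


def combine(map_output: Iterable[Pair]) -> List[Pair]:
    # Sum the counts per key locally, before anything crosses the network.
    totals = Counter()
    for key, count in map_output:
        totals[key] += count
    return sorted(totals.items())


# Output of one mapper for one input split.
split_output = [("the", 1), ("cat", 1), ("the", 1), ("the", 1)]
combined = combine(split_output)
print(len(split_output), "pairs before,", len(combined), "pairs after")
print(combined)
```

Here four intermediate pairs shrink to two before the shuffle, and the reducer still sees the same totals.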
29
When should you avoid using Key-Value Stores?
  • Need relationships between data
  • Require multi-key transactions
  • Need queries based on values
  • Need batch operations
30
What is the structure of a Key-Value Store similar to?
A relational database table with two columns: key and value.
31
What is the goal of the MapReduce word count example?
Count how many times each word appears in a set of documents.
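
The word count job can be sketched in a single process to show how the Map, Shuffle, and Reduce phases fit together. This is an illustration of the programming model only; in Hadoop the shuffle is performed by the framework across machines, and the function names here are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(document: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input record.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group the intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: collapse all values for one key into a final count.
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    intermediate = [pair for doc in documents for pair in map_fn(doc)]
    grouped = shuffle(intermediate)
    return dict(reduce_fn(k, v) for k, v in grouped.items())


docs = ["the cat sat", "the dog sat"]
print(word_count(docs))
```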
32
What is the significance of 'shared nothing' architecture in MapReduce?
Nodes don’t share memory or disk, enhancing parallel processing and fault tolerance.
33
What conditions make key-value stores a poor fit?
  • Relationships between data (e.g., foreign keys)
  • Multi-key transactions (e.g., all-or-nothing updates)
  • Queries based on values (not keys)
  • Batch operations (e.g., filtering or grouping multiple records)
## Footnote
Key-value stores are a poor choice when these needs arise; they work best for simple lookups by key.
34
What is Redis?
Redis (REmote DIctionary Server) is an open-source, in-memory key-value store.
## Footnote
Unlike traditional key-value stores, Redis supports complex data types.
35
What are the key features of Redis?
  • Data types: strings, lists, sets, sorted sets, hashes
  • Atomic operations on data structures
  • Persistence options: snapshots (RDB), append-only file (AOF)
  • Pub/Sub messaging
  • Transactions
  • Master-slave replication
  • Automatic failover (Redis Sentinel)
  • Redis Cluster (for partitioning)
## Footnote
These features make Redis a versatile choice for various applications.
36
How does Redis store data?
Data is stored in RAM, making reads and writes extremely fast.
## Footnote
The dataset must fit into memory, and Redis can persist data to disk periodically.
37
What happens if Redis runs out of memory?
  • Killed by the OS
  • Crashes
  • Slows down
## Footnote
Monitoring with the INFO command is recommended to avoid these issues.
38
What is the maximum length of strings in Redis?
512 MB
## Footnote
Redis strings can hold large amounts of data.
39
What commands are used with Redis Lists?
  • LPUSH
  • RPUSH
  • LPOP
  • RPOP
  • LRANGE
  • LLEN
## Footnote
These commands allow for manipulation of ordered collections of strings.
40
What are the characteristics of Sets in Redis?
  • Unordered collection of unique strings
  • Set commands: SADD, SREM, SINTER, SUNION, SISMEMBER
## Footnote
Sets ensure that each element is unique.
41
What is the purpose of Redis Transactions?
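To run a group of commands as a single uninterrupted unit: commands are queued using MULTI and executed together using EXEC.
## Footnote
If one command fails at run time, the others still run; Redis does not roll back.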
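
The queue-then-execute behavior, including the lack of rollback, can be illustrated with a toy model. This is pure Python and does not talk to Redis or reproduce its implementation; the class `ToyTransaction` and the three command functions are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, List

Command = Callable[[Dict[str, Any]], Any]


class ToyTransaction:
    """Toy model of MULTI/EXEC queuing; not Redis's implementation."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = data
        self.queue: List[Command] = []

    def enqueue(self, command: Command) -> None:
        # After MULTI, commands are only queued, not run.
        self.queue.append(command)

    def execute(self) -> List[Any]:
        # EXEC runs every queued command in order. A failure is recorded
        # as that command's result and does not undo the other commands.
        results: List[Any] = []
        for command in self.queue:
            try:
                results.append(command(self.data))
            except Exception as error:
                results.append(error)
        self.queue.clear()
        return results


def set_a(data):
    data["a"] = 1
    return "OK"


def incr_name(data):
    # Fails at run time: the stored value is text, not a number.
    data["name"] = data["name"] + 1
    return data["name"]


def set_b(data):
    data["b"] = 2
    return "OK"


data = {"name": "redis"}
tx = ToyTransaction(data)
for cmd in (set_a, incr_name, set_b):
    tx.enqueue(cmd)
results = tx.execute()
print(results)
print(data)
```

The second command fails, yet both "a" and "b" end up written, which is the behavior the footnote describes.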
42
What does Redis Master-Slave Replication entail?
One master handles all writes, and one or more slaves copy the master's data and handle reads.
## Footnote
Replication is asynchronous; the master continues working while syncing.
43
What is the process of Redis Replication when a slave connects to a master?
1. Slave sends a SYNC request.
2. Master creates a snapshot.
3. Master buffers any new changes.
4. Snapshot is sent to the slave.
5. Slave loads the snapshot and applies the buffered changes.
Partial resynchronization is possible in Redis 2.8+.
## Footnote
This process brings slaves up to date without blocking the master.
44
What are the two methods of Redis Persistence?
1. RDB (Snapshot)
2. AOF (Append-Only File)
## Footnote
RDB takes snapshots periodically, while AOF logs every write command.
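
The trade-off between the two methods can be sketched with a toy model: a snapshot is a point-in-time copy, while an append-only log can be replayed to rebuild the data. This is an illustration only; the class `ToyPersistence` is hypothetical and keeps everything in memory instead of writing real RDB or AOF files.

```python
import copy
from typing import Any, Dict, List, Tuple


class ToyPersistence:
    """Toy model contrasting snapshot and append-only persistence."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.snapshot: Dict[str, Any] = {}     # stands in for an RDB file
        self.log: List[Tuple[str, Any]] = []   # stands in for an AOF file

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value
        self.log.append((key, value))  # AOF style: record every write

    def save_snapshot(self) -> None:
        # RDB style: copy the whole dataset at one point in time.
        self.snapshot = copy.deepcopy(self.data)

    def recover_from_snapshot(self) -> Dict[str, Any]:
        return copy.deepcopy(self.snapshot)

    def recover_from_log(self) -> Dict[str, Any]:
        # Replay the logged writes in order to rebuild the dataset.
        restored: Dict[str, Any] = {}
        for key, value in self.log:
            restored[key] = value
        return restored


db = ToyPersistence()
db.set("a", 1)
db.save_snapshot()
db.set("b", 2)  # written after the last snapshot

print(db.recover_from_snapshot())  # the write to "b" is missing
print(db.recover_from_log())       # the write to "b" is recovered
```

Writes made after the last snapshot are lost on recovery from the snapshot alone, which is why the two methods are often enabled together.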
45
What security feature does Redis provide?
You can set a password in the Redis config file (the requirepass directive).
## Footnote
Clients must authenticate with the AUTH command before interacting with Redis, which protects data from unauthorized access.
46
True or False: Redis can provide consistency, availability, and partition tolerance simultaneously.
False
## Footnote
No distributed system, Redis included, can guarantee all three properties of the CAP theorem at the same time.
47
What are the types of Partitioning in Redis?
1. Range Partitioning
2. Hash Partitioning
## Footnote
Range partitioning distributes data based on specified ranges, while hash partitioning uses a hash function.
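
Both schemes can be sketched as small functions that map a key to an instance. The instance names, the ID ranges, and the use of SHA-256 are assumptions made for this example; they are not what Redis Cluster itself uses.

```python
import hashlib
from typing import List, Tuple

INSTANCES = ["R0", "R1", "R2"]  # hypothetical Redis instances

# Hypothetical layout: numeric user IDs split into fixed ranges.
RANGES: List[Tuple[int, int, str]] = [
    (0, 9_999, "R0"),
    (10_000, 19_999, "R1"),
    (20_000, 29_999, "R2"),
]


def range_partition(user_id: int) -> str:
    # Range partitioning: look up which configured range contains the ID.
    for low, high, instance in RANGES:
        if low <= user_id <= high:
            return instance
    raise KeyError(f"no range covers id {user_id}")


def hash_partition(key: str) -> str:
    # Hash partitioning: hash the key, then take it modulo the instance count.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return INSTANCES[int.from_bytes(digest[:4], "big") % len(INSTANCES)]


print(range_partition(12_345))  # falls in the 10,000 to 19,999 range, so R1
print(hash_partition("user:12345"))
```

Range partitioning needs a maintained table of ranges and only works for keys with a natural order; hash partitioning works for any key but scatters neighboring keys across instances.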
48
What is the default behavior of Redis when it comes to consistency and availability?
The master handles writes and slaves serve reads, so a read from a slave may return slightly outdated data.
## Footnote
This behavior prioritizes availability over strict consistency.
49
What are common use cases for Redis?
  • Caching
  • Real-time analytics
  • Queues
  • Session store
## Footnote
Redis is widely used in applications requiring fast data access.