First Midterm Flashcards

Question

What are the components of the Hadoop framework?

Answer 1

* Input Reader
* Map Function
* Partition Function
* Shuffle/Sort
* Reduce Function
* Output Writer

Answer 2

Master server that manages metadata.

Answer 3

Worker nodes that store the actual data blocks and send heartbeats to the NameNode.

Answer 4

Acts like a mini-reducer for partial aggregation to reduce network traffic.
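
A minimal Python sketch of the idea, not Hadoop's actual API: the hypothetical `map_words` and `combine` functions stand in for a mapper and a combiner running on the same node. The combiner sums counts locally, so fewer pairs have to cross the network during the shuffle.

```python
from collections import Counter


def map_words(line):
    """Mapper (hypothetical): emit one (word, 1) pair per word."""
    return [(word, 1) for word in line.lower().split()]


def combine(pairs):
    """Combiner (hypothetical): sum counts per key on the mapper's node."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


mapped = map_words("to be or not to be")
combined = combine(mapped)

print(len(mapped), "pairs would be shuffled without a combiner")
print(len(combined), "pairs are shuffled with a combiner")
```

The reducer still produces the same totals, because addition gives the same result whether it is applied in one pass or in partial sums.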

Answer 5

* Need relationships between data
* Require multi-key transactions
* Need queries based on values
* Need batch operations

Answer 6

A relational database table with two columns: key and value.

Answer 7

Count how many times each word appears in a set of documents.
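
As a rough illustration, word count can be written as three small Python functions that mirror the map, shuffle/sort, and reduce stages. This is a single-process sketch, not Hadoop code, and the function names are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_sort(pairs):
    """Shuffle/sort: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups):
    """Reduce: sum the list of ones collected for each word."""
    return {key: sum(values) for key, values in groups}


def word_count(documents):
    return reduce_phase(shuffle_sort(map_phase(documents)))


counts = word_count(["the cat sat", "the dog sat down"])
print(counts)
```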

Answer 8

Nodes don’t share memory or disk, enhancing parallel processing and fault tolerance.

Answer 9

• Relationships between data (e.g., foreign keys)
• Multi-key transactions (e.g., all-or-nothing updates)
• Queries based on values (not keys)
• Batch operations (e.g., filtering or grouping multiple records)

## Footnote

Key-value stores are a poor fit when these needs arise.

Answer 10

Redis (REmote DIctionary Server) is an open-source, in-memory key-value store.

## Footnote

Unlike traditional key-value stores, Redis supports complex data types.

Answer 11

• Data types: strings, lists, sets, sorted sets, hashes
• Atomic operations on data structures
• Persistence options: snapshots (RDB), append-only file (AOF)
• Pub/Sub messaging
• Transactions
• Master-slave replication
• Automatic failover (Redis Sentinel)
• Redis Cluster (for partitioning)

## Footnote

These features make Redis a versatile choice for various applications.

Answer 12

Data is stored in RAM, making reads and writes extremely fast.

## Footnote

The dataset must fit into memory, and Redis can persist data to disk periodically.

Answer 13

• Killed by OS
• Crashes
• Slows down

## Footnote

Monitoring with the INFO command is recommended to avoid these issues.

Answer 14

512 MB

## Footnote

Redis strings can hold large amounts of data.

Answer 15

• LPUSH
• RPUSH
• LPOP
• RPOP
• LRANGE
• LLEN

## Footnote

These commands allow for manipulation of ordered collections of strings.
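
To make the semantics of these commands concrete, here is a toy in-process model in Python. It is not a Redis client: `ListModel` is a hypothetical class backed by a `deque`, and it only imitates how the six commands behave on one list.

```python
from collections import deque


class ListModel:
    """Toy model of a single Redis list (hypothetical, no server involved)."""

    def __init__(self):
        self.items = deque()

    def lpush(self, *values):
        # Each value is pushed onto the head in turn, like LPUSH.
        for value in values:
            self.items.appendleft(value)
        return len(self.items)

    def rpush(self, *values):
        # Values are appended to the tail, like RPUSH.
        self.items.extend(values)
        return len(self.items)

    def lpop(self):
        return self.items.popleft() if self.items else None

    def rpop(self):
        return self.items.pop() if self.items else None

    def llen(self):
        return len(self.items)

    def lrange(self, start, stop):
        # Like LRANGE: stop is inclusive, negative indexes count from the tail.
        n = len(self.items)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        if stop < start:
            return []
        return list(self.items)[start:stop + 1]


queue = ListModel()
queue.rpush("a", "b")
queue.lpush("z")
print(queue.lrange(0, -1))
```

Pairing RPUSH with LPOP gives queue behaviour, while pairing LPUSH with LPOP gives stack behaviour.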

Answer 16

• Unordered collection of unique strings
• Set operations: SADD, SREM, SINTER, SUNION, SISMEMBER

## Footnote

Sets ensure that each element is unique.
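
Python's built-in `set` behaves much like a Redis set, so it can serve as a quick mental model. The sketch below runs in plain Python with no Redis server; the key names `admins` and `online` are made up for the example, and the comments show the Redis command each line imitates.

```python
admins = set()
online = set()

admins.update({"alice", "bob"})   # SADD admins alice bob
online.update({"bob", "carol"})   # SADD online bob carol
online.add("bob")                 # SADD online bob (already present, no change)
admins.discard("alice")           # SREM admins alice

both = admins & online            # SINTER admins online
either = admins | online          # SUNION admins online
is_online = "carol" in online     # SISMEMBER online carol

print(sorted(both), sorted(either), is_online)
```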

Answer 17

Commands are queued using MULTI and executed together using EXEC.

## Footnote

If one command fails during EXEC, the others still run; Redis does not roll back.
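
The queue-then-execute behaviour can be imitated in a few lines of Python. This is a toy model under stated assumptions, not the Redis implementation: `ToyTransaction`, `set_key`, and `incr` are hypothetical names, and a plain dict stands in for the keyspace.

```python
def set_key(store, key, value):
    store[key] = value
    return "OK"


def incr(store, key):
    # Raises ValueError when the stored value is not an integer.
    store[key] = int(store.get(key, 0)) + 1
    return store[key]


class ToyTransaction:
    """Toy model of MULTI/EXEC: queue commands, then run them all."""

    def __init__(self, store):
        self.store = store
        self.queue = None

    def multi(self):
        self.queue = []

    def call(self, func, *args):
        if self.queue is None:
            return func(self.store, *args)
        self.queue.append((func, args))
        return "QUEUED"

    def execute(self):
        results = []
        for func, args in self.queue:
            try:
                results.append(func(self.store, *args))
            except ValueError as err:
                results.append(err)  # record the failure and keep going
        self.queue = None
        return results


store = {}
tx = ToyTransaction(store)
tx.multi()
tx.call(set_key, "name", "redis")
tx.call(incr, "name")   # fails at execution time: "redis" is not an integer
tx.call(incr, "hits")   # still runs
results = tx.execute()
print(store)
```

After `execute`, the store holds both successful writes, which shows that there is no rollback.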

Answer 18

One master handles all writes, and one or more slaves copy the master's data and handle reads.

## Footnote

Replication is asynchronous; the master continues working while syncing.

Answer 19

1. Slave sends a SYNC request.
2. Master creates a snapshot.
3. Master buffers any new changes.
4. Snapshot is sent to the slave.
5. Slave loads the snapshot and applies buffered changes.
6. Partial sync is possible in Redis 2.8+.

## Footnote

This process ensures that slaves receive the most recent data without blocking the master.
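
The full-sync steps can be walked through with a small in-memory simulation. This is only a sketch: the `Master` and `Replica` classes are hypothetical (the replica plays the role of the slave), dicts stand in for datasets, and nothing is sent over a network.

```python
import copy


class Master:
    def __init__(self):
        self.data = {}
        self.buffer = None  # collects writes made while a sync is running

    def write(self, key, value):
        self.data[key] = value
        if self.buffer is not None:
            self.buffer.append((key, value))

    def start_sync(self):
        # Steps 2-3: take a snapshot and start buffering new changes.
        self.buffer = []
        return copy.deepcopy(self.data)

    def finish_sync(self):
        buffered, self.buffer = self.buffer, None
        return buffered


class Replica:
    def __init__(self):
        self.data = {}

    def full_sync(self, master, writes_during_transfer=()):
        snapshot = master.start_sync()       # step 1 triggers steps 2-3
        for key, value in writes_during_transfer:
            master.write(key, value)         # the master keeps accepting writes
        self.data = snapshot                 # steps 4-5: load the snapshot
        for key, value in master.finish_sync():
            self.data[key] = value           # step 5: apply buffered changes


master = Master()
master.write("a", 1)
replica = Replica()
replica.full_sync(master, writes_during_transfer=[("b", 2)])
print(replica.data == master.data)
```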

Answer 20

1. RDB (Snapshot)
2. AOF (Append-Only File)

## Footnote

RDB takes snapshots periodically, while AOF logs every write command.
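
The difference between the two approaches shows up when restoring. The following Python sketch is an illustration only: `ToyStore` is a hypothetical class, JSON stands in for the RDB file format, and a Python list stands in for the AOF file.

```python
import json


class ToyStore:
    def __init__(self):
        self.data = {}
        self.aof = []  # append-only log of write commands

    def set(self, key, value):
        self.data[key] = value
        self.aof.append(["SET", key, value])

    def delete(self, key):
        self.data.pop(key, None)
        self.aof.append(["DEL", key])

    def snapshot(self):
        # RDB style: dump the whole dataset at one point in time.
        return json.dumps(self.data)


def restore_from_snapshot(blob):
    store = ToyStore()
    store.data = json.loads(blob)
    return store


def restore_from_aof(log):
    # AOF style: rebuild the dataset by replaying every logged write.
    store = ToyStore()
    for command, *args in log:
        if command == "SET":
            store.set(*args)
        elif command == "DEL":
            store.delete(*args)
    return store


store = ToyStore()
store.set("a", "1")
blob = store.snapshot()   # snapshot taken here
store.set("b", "2")       # later writes are missing from the snapshot
store.delete("a")

from_rdb = restore_from_snapshot(blob)
from_aof = restore_from_aof(store.aof)
print(from_rdb.data, from_aof.data)
```

Restoring from the snapshot loses the writes made after it was taken, while replaying the log reproduces the latest state at the cost of a longer file.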

Answer 21

You can set a password in the Redis config file using the requirepass directive.

## Footnote

Clients must authenticate with the AUTH command before interacting with Redis, which protects data from unauthorized access.

Answer 22

False

## Footnote

Redis cannot achieve all three characteristics of the CAP theorem at the same time.

Answer 23

1. Range Partitioning
2. Hash Partitioning

## Footnote

Range partitioning distributes data based on specified ranges, while hash partitioning uses a hash function.
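
Both schemes reduce to a function from key to node. In the Python sketch below, the node names, the letter ranges, and the use of `zlib.crc32` as the hash function are all assumptions made for illustration; they are not what Redis itself uses.

```python
import zlib

NODES = ["node0", "node1", "node2"]  # hypothetical node names


def range_partition(key):
    """Range partitioning: route by the key's first character."""
    first = key[0].lower()
    if first <= "h":
        return NODES[0]
    if first <= "p":
        return NODES[1]
    return NODES[2]


def hash_partition(key):
    """Hash partitioning: hash the key, then take it modulo the node count."""
    return NODES[zlib.crc32(key.encode("utf-8")) % len(NODES)]


for key in ["alice", "mike", "zoe"]:
    print(key, range_partition(key), hash_partition(key))
```

Range partitioning keeps neighbouring keys together but can create hot spots when keys cluster. Hash partitioning spreads keys more evenly but scatters neighbouring keys across nodes.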

Answer 24

The master handles writes and the slaves serve reads, so a read from a slave may return slightly outdated data.

## Footnote

This behavior prioritizes availability over strict consistency.

Answer 25

• Caching
• Real-time analytics
• Queues
• Session store

## Footnote

Redis is widely used in applications requiring fast data access.
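
The caching use case usually follows a cache-aside pattern: check the cache, fall back to the slower source on a miss, then store the result with an expiry. The Python sketch below models that pattern in-process. `TTLCache`, `get_profile`, the 60-second expiry, and the `profile:` key prefix are hypothetical choices for the example, with a dict standing in for Redis.

```python
import time


class TTLCache:
    """Toy in-memory cache with per-key expiry (stand-in for Redis)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value


def get_profile(user_id, cache, load_from_db):
    """Cache-aside read: try the cache first, then the slower source."""
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    cache.set(key, value, ttl_seconds=60)
    return value


fake_db = {7: {"id": 7, "name": "Ada"}}  # made-up record for the demo
cache = TTLCache()
print(get_profile(7, cache, fake_db.get))  # miss: loaded from fake_db
print(get_profile(7, cache, fake_db.get))  # hit: served from the cache
```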