Concurrency, Parallelism, Distributed Computing Flashcards

1
Q

Threads vs. Processes

A

Definition:
Threads share memory within the same process, while processes run independently with separate memory spaces.

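  • Threads: lightweight and share data easily, but access to shared data must be synchronized (e.g., with a mutex or semaphore) to avoid conflicts.
  • Processes: more overhead (slower to create, and data must be exchanged through inter-process communication such as pipes or sockets), but safer isolation: one process cannot corrupt another's memory.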
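A minimal Python sketch of the difference, assuming CPython and only the standard `threading` and `subprocess` modules (the function names are hypothetical). The thread writes straight into the parent's list; the child process only receives a copy of the data and has to send its result back explicitly, here over a pipe.

```python
import subprocess
import sys
import threading


def thread_demo() -> list[str]:
    shared: list[str] = []
    # The thread runs inside this process, so it appends to the very same list object.
    worker = threading.Thread(target=shared.append, args=("from thread",))
    worker.start()
    worker.join()
    return shared


def process_demo() -> tuple[list[str], str]:
    data = ["original"]
    # The child is a separate interpreter with its own memory. It receives a
    # serialized copy of the data on stdin and can only report back via stdout.
    child_code = (
        "import sys; items = sys.stdin.read().split(','); "
        "items.append('from child'); print(','.join(items))"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        input=",".join(data),
        capture_output=True,
        text=True,
        check=True,
    )
    return data, result.stdout.strip()  # the parent's list is unchanged


if __name__ == "__main__":
    print(thread_demo())   # ['from thread']
    print(process_demo())  # (['original'], 'original,from child')
```

The extra work in `process_demo` (serializing, piping, parsing) is the inter-process communication overhead that threads avoid, paid in exchange for isolation.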
2
Q

Mutex vs. Semaphore

A

Definition:
A mutex (mutual exclusion) allows only one thread at a time to access a resource, while a semaphore can allow multiple concurrent accesses (permits).

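  • Mutex: binary ("locked" or "unlocked") and owned: the thread that locks it is expected to be the one that unlocks it.
  • Semaphore: can be counting (up to N permits) or binary (one permit, similar to a mutex but typically without ownership, so any thread may release it).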
  • Both are used to synchronize threads and avoid race conditions.
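A small Python sketch, assuming the standard `threading` module: the same worker runs under a `threading.Lock` (acting as the mutex) and under a `threading.Semaphore(3)`, and the code records how many workers were inside the protected section at once. `peak_concurrency` is a hypothetical helper written for this card.

```python
import threading
import time


def peak_concurrency(guard, n_workers: int = 8) -> int:
    """Return the largest number of workers seen inside `guard` at the same time."""
    active = 0
    peak = 0
    bookkeeping = threading.Lock()  # protects the two counters themselves

    def worker() -> None:
        nonlocal active, peak
        with guard:  # mutex: one holder at a time; Semaphore(3): up to three
            with bookkeeping:
                active += 1
                peak = max(peak, active)
            time.sleep(0.05)  # simulate using the shared resource
            with bookkeeping:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    print(peak_concurrency(threading.Lock()))        # always 1
    print(peak_concurrency(threading.Semaphore(3)))  # never more than 3
```

With the semaphore the peak is usually exactly 3 here, but the guarantee is only the upper bound; the actual value depends on thread scheduling.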
3
Q

Deadlock

A

Definition:
Occurs when two or more threads or processes block each other, each waiting for a resource that the others hold.

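  • Four conditions (the Coffman conditions), all of which must hold at once: mutual exclusion, hold and wait, no preemption, circular wait.
  • Prevent by breaking at least one condition, e.g., acquiring locks in one consistent global order so a circular wait cannot form, or using lock timeouts that release held locks and retry. Alternatively, detect deadlocks at runtime and recover (e.g., by aborting one participant).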
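A minimal Python sketch of prevention by lock ordering, assuming a toy bank-transfer setup (the `Account` class and helper functions are hypothetical). Two threads move money in opposite directions, which deadlocks easily if each thread simply locks its own source account first; sorting the two locks by account id removes the circular wait.

```python
import threading


class Account:
    def __init__(self, account_id: int, balance: int) -> None:
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    # Lock ordering: always acquire the lock with the lower account id first.
    # Since every thread uses the same order, no thread can hold B while waiting
    # for A while another holds A and waits for B, so circular wait cannot occur.
    first, second = sorted((src, dst), key=lambda acct: acct.account_id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def repeat_transfers(src: Account, dst: Account, times: int) -> None:
    for _ in range(times):
        transfer(src, dst, 1)


def run(times: int = 10_000) -> tuple[int, int]:
    a, b = Account(1, 100), Account(2, 100)
    # Opposite-direction transfers: the classic setup for a lock-order deadlock.
    t1 = threading.Thread(target=repeat_transfers, args=(a, b, times))
    t2 = threading.Thread(target=repeat_transfers, args=(b, a, times))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance


if __name__ == "__main__":
    print(run())  # (100, 100)
```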
4
Q

Race Condition

A

Definition:
Occurs when multiple threads or processes access shared data concurrently, at least one of them writes to it, and there is no proper synchronization, so the result depends on the unpredictable timing of their operations.

  • Commonly fixed via locks, atomic operations, or other synchronization mechanisms.
  • Hard to debug because failures depend on timing and may not reproduce; best prevented by design (e.g., by minimizing shared mutable state).
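A Python sketch of a lost-update race and its fix with a lock, assuming the standard `threading` module; `run_counter` is a hypothetical helper. The short `time.sleep` is deliberately placed between the read and the write to widen the race window so the problem shows up reliably in a demo.

```python
import threading
import time


def run_counter(n_threads: int = 10, use_lock: bool = False) -> int:
    counter = 0
    lock = threading.Lock()

    def read_modify_write() -> None:
        nonlocal counter
        value = counter       # read the shared value
        time.sleep(0.001)     # widen the gap so other threads can read the same stale value
        counter = value + 1   # write back, possibly overwriting another thread's update

    def increment() -> None:
        if use_lock:
            with lock:  # the whole read-modify-write becomes one critical section
                read_modify_write()
        else:
            read_modify_write()

    threads = [threading.Thread(target=increment) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_counter(use_lock=False))  # usually well below 10; varies run to run
    print(run_counter(use_lock=True))   # always 10
```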
5
Q

Parallelism vs. Concurrency

A

Definition:
Concurrency is about dealing with multiple tasks over the same time period (not necessarily simultaneously); Parallelism is about executing tasks simultaneously using multiple cores/CPUs.

  • Concurrency = managing lots of tasks at once (structure).
  • Parallelism = running tasks at the exact same time (execution).
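Concurrency without parallelism can be illustrated with Python's standard `asyncio`. In this sketch (function names are hypothetical), two tasks interleave on a single thread: both start before either finishes, yet every step records the same thread id, so nothing actually runs simultaneously. Parallel execution would instead need multiple cores, for example through separate processes.

```python
import asyncio
import threading


async def task(name: str, log: list, delay: float = 0.01) -> None:
    for step in range(2):
        log.append((name, step, threading.get_ident()))
        await asyncio.sleep(delay)  # yield to the event loop so the other task can run


async def main() -> list:
    log: list = []
    await asyncio.gather(task("A", log), task("B", log))
    return log


if __name__ == "__main__":
    for name, step, thread_id in asyncio.run(main()):
        print(name, step, thread_id)  # A 0, B 0, A 1, B 1, all with one thread id
```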
6
Q

Threading

A

Definition:
Creating multiple threads within a process to perform tasks concurrently.

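  • Can improve throughput for I/O-bound tasks, since other threads keep working while one waits on disk or network I/O.
  • For CPU-bound tasks in CPython, the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so threads usually give no parallel speedup; separate processes (e.g., the multiprocessing module) are the common workaround.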
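A minimal sketch of threading for I/O-bound work using the standard `concurrent.futures.ThreadPoolExecutor`. It assumes `time.sleep` as a stand-in for real blocking I/O (`fake_io` and `run_threaded` are hypothetical names); like blocking I/O calls in CPython, sleeping releases the GIL, so the waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(i: int) -> int:
    # Stand-in for waiting on disk or network; the GIL is released while sleeping.
    time.sleep(0.2)
    return i * i


def run_threaded(n: int = 5) -> tuple[list[int], float]:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(fake_io, range(n)))  # map preserves input order
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_threaded()
    print(results)  # [0, 1, 4, 9, 16]
    print(f"{elapsed:.2f}s")  # typically close to 0.2 s, versus about 1.0 s sequentially
```

Swapping `fake_io` for a pure-Python CPU-bound loop would typically remove the speedup under the GIL; `ProcessPoolExecutor` offers the same `map` interface for that case.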
7
Q

Apache Spark

A

Definition:
An open-source distributed computing framework for big data processing, with APIs in Scala, Java, Python, and R.

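  • Speeds up processing by keeping intermediate data in memory across the cluster, using resilient distributed datasets (RDDs) or the higher-level DataFrame API.
  • Includes high-level libraries for SQL (Spark SQL), streaming (Structured Streaming), machine learning (MLlib), and graph processing (GraphX).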
8
Q

Dask

A

Definition:
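A Python library for parallel computing that provides chunked, parallel counterparts of NumPy arrays and pandas DataFrames (and integrates with scikit-learn), so larger-than-memory or distributed datasets can be processed with familiar APIs.

  • Uses lazy evaluation: operations build a task graph, and a scheduler executes it across multiple cores or machines only when a result is requested (e.g., with .compute()).
  • Well suited to scaling existing Python code without switching to a completely different ecosystem.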
9
Q

In-Memory vs. Out-of-Core (Out-of-Memory) Computations

A

Definition:
In-memory computations hold the working data in RAM for fast processing, while out-of-core (sometimes called out-of-memory) computations handle data larger than RAM by streaming or chunking it from disk.

  • In-memory processing (e.g., Spark caching data across a cluster's RAM) greatly reduces disk I/O but needs sufficient RAM.
  • Out-of-core processing (e.g., Dask working through a large dataset chunk by chunk) trades speed for the ability to handle very large datasets.
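A pure-Python sketch of the two strategies on a file of integers, one per line (the helper names are hypothetical). The in-memory version loads everything into a list; the out-of-core version keeps at most `chunk_size` values in RAM plus running totals. Tools such as pandas (`read_csv(..., chunksize=...)`) and Dask apply the same chunking idea at larger scale.

```python
import os
import tempfile
from itertools import islice


def write_numbers(path: str, n: int) -> None:
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"{i}\n")


def mean_in_memory(path: str) -> float:
    # Loads every value into RAM at once: simple and fast, but memory grows with the file.
    with open(path) as f:
        values = [int(line) for line in f]
    return sum(values) / len(values)


def mean_out_of_core(path: str, chunk_size: int = 1_000) -> float:
    # Reads the file chunk by chunk, keeping only running totals between chunks.
    total = 0
    count = 0
    with open(path) as f:
        while chunk := [int(line) for line in islice(f, chunk_size)]:
            total += sum(chunk)
            count += len(chunk)
    return total / count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "numbers.txt")
        write_numbers(path, 10_000)
        print(mean_in_memory(path), mean_out_of_core(path))  # 4999.5 4999.5
```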
10
Q

MapReduce (Concept)

A

Definition:
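A programming model for distributed processing of large data sets across clusters (introduced by Google, popularized by Hadoop).

  • Map step: transforms or filters each input record into intermediate key-value pairs.
  • Shuffle step: groups all intermediate values by key and routes each group to a reducer.
  • Reduce step: aggregates or summarizes the values for each key.
  • Foundation for many big-data processing frameworks.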
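A single-process Python sketch of the map, shuffle, and reduce phases for the classic word-count job. The function names are hypothetical, and everything runs locally; a framework such as Hadoop would run many map and reduce tasks on different machines and perform the shuffle over the network.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Emit an intermediate (word, 1) pair for every word in the input record.
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all values by key; in a cluster this step moves data between machines.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Aggregate the grouped values for one key.
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat"]))
    # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```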