Papers for Scheduling, Memory Management, IPC, and Sync Flashcards
What is the premise of Fedorova's paper on scheduling for chip multiprocessors?
That scheduling can have a large impact on CPU performance and a bad scheduling decision can sometimes even cancel out the benefits of smarter chip designs
Where does Fedorova consider the bottleneck to be for CPU performance?
contention for the processor pipeline, because that is the resource the hardware threads sharing a multithreaded processor compete for
What types of threads should be scheduled on the same CPU according to Fedorova?
Threads that won’t compete for the same resources as each other.
What is Instruction delay latency?
the delay between when an instruction starts and when it completes (i.e. how long the thread is held up by an instruction). Can be used to approximate processor pipeline performance
How do we approximate the instruction delay latency?
Use the mean cycles-per-instruction (CPI), which the hardware already has counters for. Low CPI == high pipeline usage (the thread issues instructions constantly); high CPI == low pipeline usage, because a high CPI means the thread is blocked a lot
What is the outcome of Fedorova's paper?
That we should schedule threads with high and low CPIs together on the same processor, so their pipeline demands complement each other
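A minimal sketch of what that pairing could look like (a hypothetical illustration, not code from the paper): given already-measured per-thread CPI values, sort by CPI and co-schedule the lowest-CPI thread with the highest-CPI thread on each core so pipeline demand stays balanced.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical illustration: pair the highest-CPI (mostly blocked) thread
 * with the lowest-CPI (pipeline-hungry) thread on each core, so no single
 * core ends up with only pipeline-heavy threads. */
typedef struct {
    int tid;
    double cpi;   /* measured cycles-per-instruction for this thread */
} thread_info;

static int by_cpi(const void *a, const void *b) {
    double d = ((const thread_info *)a)->cpi - ((const thread_info *)b)->cpi;
    return (d > 0) - (d < 0);
}

int main(void) {
    thread_info threads[] = {
        {0, 1.2}, {1, 6.5}, {2, 2.0}, {3, 5.1}   /* made-up CPI values */
    };
    int n = sizeof(threads) / sizeof(threads[0]);

    qsort(threads, n, sizeof(threads[0]), by_cpi);

    /* Core k gets the k-th lowest-CPI thread and the k-th highest-CPI thread. */
    for (int core = 0; core < n / 2; core++) {
        printf("core %d: tid %d (CPI %.1f) + tid %d (CPI %.1f)\n",
               core,
               threads[core].tid, threads[core].cpi,
               threads[n - 1 - core].tid, threads[n - 1 - core].cpi);
    }
    return 0;
}
```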
What is the test_and_set implementation?
while(test_and_set(lock) == BUSY); (+) good latency and low delay, but (-) lots of contention
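A minimal sketch of that lock using C11 atomics (the flashcard's line is pseudocode; this just fills in the shape, with "BUSY" corresponding to the flag being set):

```c
#include <stdatomic.h>

/* A minimal test_and_set spin lock using C11 atomics.
 * atomic_flag_test_and_set() atomically sets the flag and returns its old
 * value, so the loop keeps spinning while someone else already holds it. */
typedef struct {
    atomic_flag busy;
} tas_lock_t;

static void tas_lock_init(tas_lock_t *l) { atomic_flag_clear(&l->busy); }

static void tas_lock(tas_lock_t *l) {
    /* Every failed attempt is still an atomic read-modify-write, which is
     * exactly what generates the heavy bus/cache-coherence contention. */
    while (atomic_flag_test_and_set(&l->busy))
        ; /* spin */
}

static void tas_unlock(tas_lock_t *l) { atomic_flag_clear(&l->busy); }
```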
What is the test_and_test_and_set implementation?
while((lock == BUSY) || (test_and_set(lock) == BUSY));
This is better because we spin on a cached copy of the lock instead of going to main memory on every check == lower contention
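A sketch of the same idea with C11 atomics: spin on a plain read of the lock word (served from the local cache) and only fall back to the atomic exchange once the lock looks free.

```c
#include <stdatomic.h>

#define FREE 0
#define BUSY 1

/* test_and_test_and_set ("spin on read"). */
typedef struct {
    atomic_int state;
} ttas_lock_t;

static void ttas_lock(ttas_lock_t *l) {
    for (;;) {
        /* Cheap spinning: plain reads hit the cached copy, no bus traffic
         * while the lock stays BUSY. */
        while (atomic_load_explicit(&l->state, memory_order_relaxed) == BUSY)
            ; /* spin */
        /* Lock looked free -- now pay for one atomic test_and_set. */
        if (atomic_exchange(&l->state, BUSY) == FREE)
            return;
    }
}

static void ttas_unlock(ttas_lock_t *l) {
    atomic_store(&l->state, FREE);
}
```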
What is the spin lock delay implementation?
An improvement on test_and_test_and_set: add a delay inside the spinning loop. This decreases contention, but adds delay. The delay can be static (a fixed value, e.g. based on processor ID) or dynamic (e.g. backoff that grows with contention)
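A sketch of the dynamic-delay (backoff) variant, again with C11 atomics; the backoff cap and the crude busy-wait loop are arbitrary placeholders.

```c
#include <stdatomic.h>

#define FREE 0
#define BUSY 1

/* Spin lock with delay: same spin-on-read loop as test_and_test_and_set,
 * but back off before re-trying the atomic operation, so fewer processors
 * pile onto the bus at the same moment. */
typedef struct {
    atomic_int state;
} delay_lock_t;

static void delay_lock(delay_lock_t *l) {
    unsigned delay = 1; /* dynamic delay: grows after each failed attempt;
                           a static variant would use a fixed constant */
    while (atomic_exchange(&l->state, BUSY) == BUSY) {
        /* Spin on the cached copy until the lock looks free... */
        while (atomic_load_explicit(&l->state, memory_order_relaxed) == BUSY)
            ; /* spin */
        /* ...then wait a bit before contending again: less contention,
         * at the cost of extra delay. */
        for (volatile unsigned i = 0; i < delay; i++)
            ;
        if (delay < 1024)
            delay *= 2;
    }
}

static void delay_unlock(delay_lock_t *l) {
    atomic_store(&l->state, FREE);
}
```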
What is a queuing lock?
waiting threads are placed in a queue, and each waiter spins on its own flag. Requires more memory and has higher latency (a more expensive read-and-increment operation to get a queue slot), but has low delay and the best contention
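A sketch in the spirit of Anderson's array-based queueing lock. NPROC (the maximum number of concurrent contenders) is an assumption, and a real implementation would also pad each flag to its own cache line.

```c
#include <stdatomic.h>

#define NPROC 64   /* assumed upper bound on contending processors */

typedef struct {
    atomic_int has_lock[NPROC]; /* 1 = this slot may enter, 0 = must wait */
    atomic_uint next_slot;      /* next queue position to hand out */
} queue_lock_t;

static void queue_lock_init(queue_lock_t *l) {
    for (int i = 0; i < NPROC; i++)
        atomic_store(&l->has_lock[i], 0);
    atomic_store(&l->has_lock[0], 1);  /* first arrival gets the lock */
    atomic_store(&l->next_slot, 0);
}

/* Returns the slot the caller must remember and pass to queue_unlock(). */
static unsigned queue_lock(queue_lock_t *l) {
    /* fetch-and-increment: this extra atomic is why the queueing lock's
     * latency is worse than a plain test_and_set. */
    unsigned my_slot = atomic_fetch_add(&l->next_slot, 1) % NPROC;
    /* Each waiter spins on its own private flag, so a release only
     * disturbs one processor's cache (best contention, low delay). */
    while (atomic_load(&l->has_lock[my_slot]) == 0)
        ; /* spin */
    return my_slot;
}

static void queue_unlock(queue_lock_t *l, unsigned my_slot) {
    atomic_store(&l->has_lock[my_slot], 0);                /* give up slot */
    atomic_store(&l->has_lock[(my_slot + 1) % NPROC], 1);  /* signal next */
}
```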
What are the considerations for spin lock performance according to Anderson?
- Want to minimize the communication from spinning processors on the memory buses
- Want to minimize delay between releasing a lock and a new thread acquiring it
- Want to minimize the time it takes to acquire a lock when it is free (the mechanisms that accomplish the first two goals shouldn't add overhead to the uncontended case)
What is quiescence?
time it takes for all of the waiting processors to each try their test_and_set when a lock is released
What is the main problem with test_and_set according to Anderson?
- Lock is released -> invalidates caches
- Each processor tries a test_and_set at the same time -> creates a queue of test_and_set operations that are waiting to happen
- Each test_and_set invalidates every cache
- Processors cannot unqueue their test_and_set operation (the queued operations still execute, and still invalidate caches, even after the lock has already been grabbed)
- This is worse when the critical section is short
What are the hardware improvements that Anderson mentions for test_and_set?
- Provide parallel access to memory locations -> reduces test_and_set overhead
- Only invalidate cache if test_and_set changes the value