Multicore Flashcards

1
Q

Diagram:

Single-Core CPU Chip

A
2
Q

Overview of

Multicore Architectures

A
  • Replicate multiple processor cores on a single die
  • The cores fit into a single processor socket
  • Also called a Chip Multi-Processor (CMP)
  • Cores run in parallel
  • Within each core, threads are time-sliced (just like on a uniprocessor)
  • OS perceives each core as a separate processor
  • Scheduler maps threads/processes to different cores
3
Q

Multicore:

Interaction with the

Operating System

A
  • OS perceives each core as a separate processor
  • OS Scheduler maps threads/processes to different cores
  • OS is likely multi-threaded itself, scheduling its own use of the cores
  • Most major OSs support multi-cores today:
    • Windows, Linux, Mac OS X, …
4
Q

Motivation for using

Multicore Processors

A
  • It is difficult to make single-core clock frequencies even higher
  • Deeply pipelined circuits:
    • heat problems
    • interconnect delays dominate
    • difficult design and verification
    • large design teams necessary
  • Many new applications are multithreaded
  • General trend in computer architecture
    • Shift towards more parallelism
5
Q

Instruction Level

Parallelism

A
  • Parallelism at the machine-instruction level
  • The processor can
    • re-order instructions
    • pipeline instructions
    • split instructions into microinstructions
    • do aggressive branch prediction
    • etc.
  • Instruction-Level parallelism enabled rapid increases in processor speeds over the last 15 years
6
Q

Instruction Level

Improvements

A
  • Architectural improvements have become small and incremental:
    • Additional circuitry contributes little to application performance
  • More likely, additional interconnect delays will slow the processor’s cycle time, reducing performance for all applications
7
Q

Thread-Level Parallelism (TLP)

A
  • Parallelism on a coarser scale than instruction-level parallelism
  • A server can serve each client in a separate thread
  • A computer game can do AI, graphics, and physics on three separate threads
  • A single-core superscalar processor cannot fully exploit TLP
  • Multi-core architectures are the next step in processor evolution: explicitly exploiting TLP
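The thread-per-task idea from this card can be sketched in Python (the task names mirror the game example above; the dummy workload is a stand-in for real AI/graphics/physics work). On a multicore machine, the OS scheduler is free to place these threads on different cores:

```python
import threading

results = {}

def worker(name):
    # Each task runs in its own thread; on a multicore CPU the OS
    # scheduler may place these threads on different cores.
    results[name] = sum(range(1000))  # stand-in for real work

# Illustrative workloads from the card: AI, graphics, physics
threads = [threading.Thread(target=worker, args=(n,))
           for n in ("ai", "graphics", "physics")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results["physics"])  # 499500
```

Each thread writes a distinct key, so no lock is needed in this particular sketch.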
8
Q

Multiprocessors:

Definition

A

Any computer with several processors

9
Q

Multiprocessors:

Types

A

Single Instruction Multiple Data (SIMD)

  • ex: Modern Graphics Cards

Multiple Instructions, Multiple Data (MIMD)

10
Q

Multiprocessors:

Memory Types

A

Shared Memory

In this model, there is one (large) common shared memory for all processors.

Distributed Memory

In this model, each processor has its own (small) local memory.

Its contents are not replicated anywhere else.

Processors communicate through some other mechanism (e.g. message passing).
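The two models can be contrasted with a small Python sketch (threads stand in for processors here, purely as an analogy): the shared-memory style has all workers touch one common structure, while the distributed style keeps state local and communicates over an explicit channel:

```python
import queue
import threading

# Shared-memory style: all workers read/write one common structure.
shared = {"total": 0}
lock = threading.Lock()

def shared_worker(n):
    with lock:
        shared["total"] += n

# Distributed style: each worker keeps private local state and
# communicates results over an explicit channel instead.
channel = queue.Queue()

def distributed_worker(n):
    local = n * n        # private "local memory"
    channel.put(local)   # explicit communication mechanism

for n in (1, 2, 3):
    for target in (shared_worker, distributed_worker):
        t = threading.Thread(target=target, args=(n,))
        t.start()
        t.join()

received = [channel.get() for _ in range(3)]
print(shared["total"], sum(received))  # 6 14
```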

11
Q

What is a “Multi-Core” Processor?

A
  • A special kind of multiprocessor
    • All processors are on the same chip
  • Multicore processors are MIMD:
    • Different cores execute different threads (Multiple Instructions)
    • operating in different parts of memory (Multiple Data)
  • Multi-core is a shared memory multiprocessor:
    • All cores share the same memory
12
Q

Types of Applications

that benefit from

Multi-Core Architecture

A
  • Database Servers
  • Web Servers
  • Compilers
  • Multimedia applications
  • Scientific applications, CAD/CAM
  • In general, applications with Thread-Level parallelism
13
Q

Simultaneous Multithreading (SMT)

A
  • A technique complementary to multi-core
  • Addresses the problem of the processor pipeline getting stalled
  • Permits multiple independent threads to execute simultaneously on the same core
  • Weaving together multiple threads on the same core
  • Without SMT, only a single thread can run on the core at any given time
  • Even with SMT, two threads cannot simultaneously use the same functional unit
14
Q

Processor Pipeline Stall:

Two Causes

A
  • Waiting for the result of a long floating point or integer operation
  • Waiting for data to arrive from memory
    • Other execution units wait unused if no SMT
15
Q

Why SMT is not a “true” Parallel Processor

A
  • Enables better threading (e.g. up to a 30% performance gain)
  • OS and applications perceive each simultaneous thread as a separate “virtual processor”
  • The chip has only a single copy of each resource
  • Compare to multicore:
    • Each core has its own copy of resources
16
Q

Combining

Multi-Core and

SMT

A
  • Cores can be SMT-enabled (or not)
  • Number of SMT threads:
    • 2, 4, or sometimes 8 simultaneous threads
  • Intel calls them “hyperthreads”

Different Combinations:

  • Single-Core, non-SMT (standard uniprocessor)
  • Single-Core, SMT
  • Multi-Core, non-SMT
  • Multi-Core, SMT
17
Q

Comparison:

Multi-Core vs SMT

A

Multicore:

  • Several cores, each is smaller and not as powerful
  • Easier to design and manufacture
  • Great with thread-level parallelism

SMT:

  • Can have one large and fast superscalar core
  • Great performance on a single thread
  • Mostly still only exploits instruction-level parallelism
18
Q

Memory Hierarchy:

  • SMT
  • Multi-Core Chips
A

Simultaneous Multithreading Only:

All caches are shared

Multicore Chips:

  • L1 caches are private
  • L2 caches private in some architectures, shared in others

*Memory is always shared

19
Q

What are “Fish” Machines?

A
  • Dual-core Intel Xeon processors
  • Each core is hyper-threaded
  • Private L1 caches
  • Shared L2 caches
20
Q

Advantages of

Private Caches

A
  • Closer to core, so faster access
  • Reduces contention
21
Q

Advantages of

Shared Caches

A
  • Threads on different cores can share the same cache data
  • More cache space available if a single (or a few) high-performance thread runs on the system
22
Q

Cache Coherence Problem

A

Since multicore has private caches,

how to keep data consistent across caches?

  • Each core should perceive memory as a shared, monolithic array
  • One core copies something into its cache, makes changes, and writes back to memory
  • But a second core reads the stale copy before core 1 writes back into memory
  • This is a general problem with multiprocessors, not just multicore
  • There are many solution algorithms and coherence protocols designed to deal with this
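The failure sequence in the bullets can be traced with plain Python dictionaries standing in for one shared memory and two private caches (purely illustrative; no real coherence protocol is modeled here):

```python
# Dictionaries stand in for one shared memory and two private caches.
memory = {"x": 0}
cache0, cache1 = {}, {}

cache0["x"] = memory["x"]   # core 0 loads x into its private cache
cache0["x"] = 42            # core 0 modifies its cached copy only

cache1["x"] = memory["x"]   # core 1 reads x before the write-back...
print(cache1["x"])          # 0 -> core 1 sees a stale value

memory["x"] = cache0["x"]   # ...then core 0 finally writes back
```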
23
Q

Cache Coherence:

Simple Solution

A

Invalidation-based protocol

with snooping

Alternatively: Update protocol

24
Q

Cache Coherence:

What is “snooping”?

A

All cores continuously “snoop”, or monitor,

the bus connecting the cores

25
Q

Cache Coherence:

Invalidation Protocol

Basic Idea

A

If a core writes to a data item,

all other copies of this data item in other caches

become invalidated.

This is accomplished by sending an invalidation request on the bus.
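A toy model of the idea (class and method names invented for illustration): a Python list stands in for the snooped bus, and a write removes the line from every other attached cache:

```python
class Cache:
    """Toy write-through cache attached to a snooped 'bus' (a list)."""

    def __init__(self, bus):
        self.lines = {}
        bus.append(self)  # attach so other caches' writes reach us

    def read(self, memory, addr):
        if addr not in self.lines:            # miss: fetch from memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def write(self, memory, bus, addr, value):
        for cache in bus:                     # invalidation request:
            if cache is not self:             # every snooping cache
                cache.lines.pop(addr, None)   # drops its copy
        self.lines[addr] = value
        memory[addr] = value                  # write-through, for simplicity

bus = []
memory = {"x": 0}
c0, c1 = Cache(bus), Cache(bus)

c1.read(memory, "x")            # core 1 caches x == 0
c0.write(memory, bus, "x", 42)  # the write invalidates core 1's copy
print(c1.read(memory, "x"))     # 42 -- re-fetched, not stale
```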

26
Q

Cache Coherence:

Update Protocol

A

Upon changing a data item,

a core broadcasts the updated value on the bus.

*Alternative to the Invalidate Protocol.

27
Q

Cache Coherence:

Invalidation Protocol

vs

Update Protocol

A
  • When performing multiple writes to the same location:
    • Invalidation:
      • only used on the first write
    • Update:
      • must broadcast each write, including the new variable value
  • Invalidation protocol generally performs better:
    • generates less bus traffic
    • typically requires less logic
28
Q

Cache Coherence;

Advanced Invalidation Protocols

A
  • More sophisticated protocols use extra cache state bits
  • State Bits:
    • M - Modified
    • E - Exclusive
    • S - Shared
    • I - Invalid
  • Protocols can be MSI or MESI
  • Note: Memory used as semaphores has special requirements
29
Q

Programming for

Multi-Core

A
  • Programmers have a choice of using
    • multiple threads, or
    • multiple processes
  • Spread the workload across multiple cores
  • Write parallel algorithms
  • OS will map threads/processes to cores
  • Thread safety is very important
30
Q

Programming for Multicore:

Thread Safety:

Things to keep in mind

A

Thread Safety is VERY IMPORTANT

  • Pre-emptive Context Switching:
    • Context switch can happen AT ANY TIME
  • Dealing with true concurrency,
    • not just uniprocessor time-slicing
  • Concurrency bugs are exposed much faster when dealing with multi-core
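The classic concurrency bug behind these bullets is a lost update on a shared counter. A minimal Python sketch of the thread-safe version (remove the lock and, under true multicore concurrency, the final count may come up short):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:        # protects the read-modify-write; a context
            counter += 1  # switch can happen AT ANY TIME without it

threads = [threading.Thread(target=increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 on every run
```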
31
Q

Multicore Programming:

Assigning Threads

to the Cores

A
  • Each thread/process has an Affinity Mask
  • The affinity mask specifies which cores the thread is allowed to run on
  • Different threads can have different masks
  • Affinities are inherited across fork()
32
Q

Affinity Masks

Overview

A
  • Affinity Masks are bit vectors that specify which cores a thread can run on
    • Without SMT:
      • One bit for each core, 1 if allowed, 0 if not
  • When Multicore and SMT are combined:
    • Affinity Mask stores separate bits for each Simultaneous Thread:
      • 2 bits for each core
  • By default, an affinity mask is all 1s, allowing a thread to run on any core
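The bit-vector idea can be made concrete with a couple of helper functions (hypothetical names, for illustration only; real affinity APIs are shown later in the deck):

```python
def mask_from_cores(cores):
    """Build an affinity mask: bit i is 1 if core i is allowed."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask

def allowed(mask, core):
    """May a thread with this mask run on the given core?"""
    return bool(mask & (1 << core))

default_mask = mask_from_cores(range(4))  # all 1s on a 4-core chip
print(bin(default_mask))                  # 0b1111
print(allowed(default_mask, 2))           # True

pinned = mask_from_cores([0])             # allowed on core 0 only
print(allowed(pinned, 3))                 # False
```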
33
Q

Affinity Masks:

  • Default
  • Assignment
A
  • By default, an affinity mask is all 1s:
    • All threads can run on all processors/cores
  • Then, the OS Scheduler decides which threads run on which cores
  • OS Scheduler detects skewed workloads,
    • migrates threads to less busy processors
  • Programmers can also set their own affinities
    • These are called Hard Affinities
34
Q

Context Switching:

Cost

A

Context Switching is Costly

  • Need to restart the execution pipeline
  • Cached data is invalidated
  • OS Scheduler tries to avoid migration as much as possible
    • Tends to keep a thread on the same core
    • This is called Soft Affinity
35
Q

What is

Soft Affinity

A

The tendency of the OS Scheduler to keep a thread on the same core.

36
Q

What are

Hard Affinities

A

Affinities that are explicitly defined by programmers.

Rule of Thumb:

Use the default scheduler unless there is a good reason not to.

37
Q

When to set your own Affinities

A
  • Two (or more) threads share data-structures in memory:
    • map to the same core so they can share a cache
  • Real-Time threads:
    • Example:
      • A thread running a robot controller
      • Must not be context-switched, or else the robot can become unstable
      • Dedicate an entire core just to this thread
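On Linux, a hard affinity can be set from Python with `os.sched_setaffinity` (not available on all platforms; pid 0 means the calling process). A small sketch that pins the process to a single core and then restores the original mask:

```python
import os

original = os.sched_getaffinity(0)   # cores the scheduler may use now
one_core = {min(original)}           # pick a single allowed core

os.sched_setaffinity(0, one_core)    # hard affinity: this core only
print(os.sched_getaffinity(0) == one_core)  # True

os.sched_setaffinity(0, original)    # hand control back to the scheduler
```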