Exam 2 Flashcards
ISA
Specifies the software interface to the CPU
(Instruction Set Architecture)
Microarchitecture
- hardware design of a physical CPU
- an implementation of an ISA
Control
Tells everything else what to do and when
- decode the instruction and set all the control signals to make that instruction happen
- automatically sets the control signals to the right values to make each instruction happen
Registers
- Hold the values being computed
- temporary stopping point for your program’s data
ALU
Computes new values from old values
Arithmetic and Logic Unit
Values move between what
Registers and ALU
Architectural Registers
- registers that the ISA specifies
- GPR < Architectural < Microarchitectural
GPR
registers you can use for any purpose in programs
- General Purpose Registers
Microarchitectural registers
- exist outside the ISA and are part of the implementation of the CPU
- used for temp storage, implement multistep operations, control specific features, etc.
- Often inaccessible from software
Register File
- Holds General-purpose registers
- like an array of registers or small word addressed memory
PC register
- part of the control
- says what step to do next
PC
memory address (send it to memory)
- memory sends back data (instruction) and the control decodes the instructions and tells things what to do
Program Counter
CPU’s Job
to read and execute instructions
(Fetch, Decode, Execute (Once per instruction))
Fetch
gets the next instruction to execute from memory by using PC
- Instruction memory
Decode
Control reads the instruction
- look at the fetched instruction and set control signals
Control/Register File
Executes
The control does the instruction by telling other parts what to do
- wait for data to flow through the ALU
ALU
how many times is memory accessed for a load or store?
- 1st access: to fetch the instruction
- 2nd access: to actually load or store the value
Schematic
Graphical way to represent systems (flowcharts)
HDL
Hardware Description Languages (text)
EDA
Electronic design automation (schematics)
Schematics and texts are…
equal in power
Circuit schematics show…
how data flows from one component to another
components can
produce or consume values (or both)
Is there order in circuit schematics?
NO
everything in the circuit happens simultaneously
Boolean Function
takes one or more Boolean (t/f) inputs and produces a Boolean output
truth table
summarizes a Boolean function by listing the output for every possible combination of inputs
gate
implements one of the basic Boolean logic functions
What can all computation be built from
All computation can be built from just the basic Boolean logic operations (AND, OR, and NOT)
Add two three-bit numbers
- need three one bit adders
- chain the carries from each place to the next higher place
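The chaining above can be sketched in Python (a software model of the hardware, using the standard full-adder equations; the 3-bit width is just the example from the card):

```python
def ripple_add(a, b, n=3):
    """Add two n-bit numbers one bit at a time, chaining the carry."""
    carry = 0
    result = 0
    for i in range(n):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        s = bit_a ^ bit_b ^ carry                             # sum bit of a full adder
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))   # carry out to next place
        result |= s << i
    return result, carry   # carry out of the MSB signals overflow

print(ripple_add(0b011, 0b011))  # (6, 0): 3 + 3 = 6, no carry out
print(ripple_add(0b101, 0b110))  # (3, 1): 5 + 6 = 11 wraps to 3, carry out 1
```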
Propagation Delay
every gate and wire has it
- the amount of time needed for a signal to “propagate” through it
ripple carry
linear in the number of bits
- if you double number of bits, double the amount of time needed
overflow
you get a number too big to be represented, so it wraps around
If you add two n-digit numbers in any base…
the result will have at most n + 1 digits
3 options to deal with overflow
- store
- ignore
- fall on the floor (crash)
Ignore option
- worst and most common way to respond to overflow
Crash option
- better than ignoring
- generates CPU Exception (can be detected and handled)
Arbitrary Precision Arithmetic
two 8-bit adds, starting with LSB
- the carry bit from the first add is carried into the second add
- can keep chaining them together
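A small Python sketch of the idea: a 16-bit add built from two chained 8-bit adds, LSB half first (the widths and constants are illustrative):

```python
def add16_via_8bit(a, b):
    """Add two 16-bit numbers using two 8-bit adds, low byte first."""
    lo = (a & 0xFF) + (b & 0xFF)
    carry = lo >> 8                      # carry out of the low-byte add...
    hi = (a >> 8) + (b >> 8) + carry     # ...carried in to the high-byte add
    return ((hi & 0xFF) << 8) | (lo & 0xFF)

print(hex(add16_via_8bit(0x12FF, 0x0001)))  # 0x1300
```

Chaining more adds the same way extends this to any width, which is how arbitrary precision arithmetic works.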
overflow in unsigned addition
in unsigned addition, if the MSB has a carry out of 1, an overflow happened
overflow in signed addition
in signed addition, overflow occurs if we add two numbers of the same sign and get the opposite sign
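Both overflow rules can be checked in a few lines of Python, modeling values as 8-bit patterns (the bit width is an assumption for the example):

```python
def unsigned_overflow(a, b, bits=8):
    # overflow if there is a carry out of the MSB
    return (a + b) >= (1 << bits)

def signed_overflow(a, b, bits=8):
    # a, b are two's-complement bit patterns; overflow if two inputs of the
    # same sign produce a result of the opposite sign
    sign = 1 << (bits - 1)
    s = (a + b) & ((1 << bits) - 1)
    return (a & sign) == (b & sign) and (s & sign) != (a & sign)

print(unsigned_overflow(200, 100))  # True: 300 doesn't fit in 8 bits
print(signed_overflow(100, 100))    # True: 100 + 100 = 200, outside [-128, 127]
print(signed_overflow(100, 200))    # False: opposite signs never overflow
```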
signed subtraction
same as signed addition, apply the rule after doing the negation
overflow for unsigned subtraction
overflow occurs when the MSB’s carry out is 0 not 1
overflow for signed subtraction
same as addition, but check after negating second input
two paths can be done…
at the same time
when making a decision in hardware you…
do all possible things, then pick only the thing you need, and ignore the rest
Multiplexer
(mux)
- outputs one of its inputs based on a select input
Demultiplexer
(Demux)
- opposite of multiplexer
(with very few exceptions, you won’t need these)
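A software model of a mux, just to show the "compute everything, then select one" idea from the cards above:

```python
def mux(inputs, select):
    """Output one of the inputs based on the select signal."""
    return inputs[select]

# Hardware computes both paths simultaneously; the mux picks one:
a, b = 7, 9
take_b = 1
print(mux([a, b], take_b))  # 9
```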
negate means
change the sign to the opposite
shifting left by n is the same as…
multiplying by 2^n
shifting right by n is the same as…
dividing by 2^n (rounding down)
Arithmetic Right Shift
used for signed numbers
- “smears” the sign bit into the top bits
(copies of the sign bit fill the vacated left places as the bits scoot right)
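A Python sketch of arithmetic right shift on an 8-bit pattern (Python's `>>` on a negative int already smears the sign bit, which the model exploits; the 8-bit width is an assumption):

```python
def asr(x, n, bits=8):
    """Arithmetic right shift of an n-bit two's-complement bit pattern."""
    x &= (1 << bits) - 1
    if x & (1 << (bits - 1)):   # sign bit set: reinterpret as negative
        x -= 1 << bits
    return (x >> n) & ((1 << bits) - 1)

print(bin(asr(0b11110000, 2)))  # 0b11111100: -16 >> 2 == -4, sign smeared in
print(bin(asr(0b00110000, 2)))  # 0b1100: 48 >> 2 == 12, zeros shifted in
```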
And
intersection
set of things that are in both sets
Or
Union
the set of things that are in either set
XOR
Symmetric set difference
set of items in one set but not both
Mask
specifically constructed value that has 1s in the bits it wants to keep and 0s in the bits that we want to discard
(bitwise & with 00001111, the 1s being the bits we want to keep; same as doing the number mod 16)
To isolate the lowest n bits,
AND with 2^n - 1, i.e. bits & ((1 << n) - 1)
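For example, isolating the low 4 bits in Python:

```python
x = 0b10110101        # 181
mask = (1 << 4) - 1   # 0b00001111: keep the low 4 bits
print(x & mask)       # 5, same as 181 % 16
```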
no finite number of fractional places can
represent infinite fractions, because writing infinite fractional places would require infinite storage
how are floats represented
as a fixed-length binary number with fractional places
every operation on floats round
the result off to the nearest representable float
- (a + b) + c may not be equal to a + (b + c)
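The classic demonstration in Python (any language with IEEE 754 doubles behaves the same way):

```python
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0: a + b is exactly 0, then add 1
print(a + (b + c))  # 0.0: b + c rounds back to -1e16, which cancels a
```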
floats don’t use what?
2’s complement
(they use sign magnitude)
Combinational Logic
you give it some combo of inputs and you get an output
one kind of memory
a circuit that remembers information; such circuits are possible thanks to propagation delay (feedback), and memory is the basis of sequential circuits
sequential circuits
can perform sequences of operations
clock signal
says when we move to the next step
clock edges
when it changes between 1 and 0
rising and falling edges
clock cycle
time from one rising edge to another rising edge
write enable
like a door on the input of the register
when the write enable is 0
it ignores the clock and the value never changes
chooses if we change a register’s value on each cycle
state
all the things remembered by the system
blinky states
on or off
state transition table
sequential version of a truth table
encodes the same information as a state transition diagram
FSM
Finite State Machine
way of thinking about a sequential process where:
- there is state (memory)
- the state changes based on the current state, transition logic, and (optionally) inputs
- there are outputs based on the state
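Blinky as a tiny FSM in Python (state names are illustrative; the transition table is the dict):

```python
# State transition table for "blinky": two states, no inputs.
# The output is just the current state.
transition = {"off": "on", "on": "off"}

state = "off"
outputs = []
for _ in range(4):              # four clock cycles
    state = transition[state]   # state change happens on each clock edge
    outputs.append(state)
print(outputs)  # ['on', 'off', 'on', 'off']
```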
is multiplication slower than addition
yes
when you multiply two n-digit/bit numbers
the product will be at most 2n digits/bits
division
- fundamentally slower than multiplication
- analogous to factoring polynomials (harder than multiplying them)
how many addresses can you access in memory
You can only access one address in memory per clock cycle
if you have one kind of resource and you want two users to use it, you have two options
- make two instead of one so they can use them simultaneously
- make the users take turns using one, over time
Harvard Architecture
2 memories
- PC -> instruction memory -> instructions
- stores -> data memory -> loads
- easier for hardware designers
Von Neumann
One memory (takes turns)
- PC, Stores -> Memory -> instructions, loads
- uses multiple clock cycles
- fetch instruction on first cycle
- perform variable load on second cycle (most common but more complex)
- easier for programmers
Assemblers job
turns assembly language code (written by humans) into machine code (for the CPU to read) based on the ISA
What makes instructions smaller
encoding instructions as bitfields
jump
make execution go to one specific place
- absolute
- jumps (j, jal, jr) put a value (the jump target) into the PC
branches
make execution go to one of two places
- relative
absolute
sets the PC to a new value
relative
it adds an offset to the current PC
offset
destination - here (calc by assembler not CPU)
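A worked example with made-up addresses:

```python
here = 0x1008         # address of the branch instruction (hypothetical)
destination = 0x1000  # branch target (hypothetical)
offset = destination - here
print(offset)         # -8: the assembler stores this; the CPU just adds it to the PC
```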
FDXMW
Fetch, Decode, Execute, Memory Access, Write-back
Memory Access
If it's a load or store, do that
Data memory
Write-back
If there's a destination register, write the result to it
Register File
3 phases of decoding
- split the encoded bitfield into its constituent values
- from the opcode, determine which instruction we’re looking at
- map that instruction to a unique combination of control signals
happens in control unit
overall shape of control
- the only input is current instruction, which gets split up
- the opcode gets decoded to produce the instruction signals
- the instruction signals are used to come up with the control signals
- the outputs are some instruction fields and the control signals
small numbers have
negative exponents
large numbers have
positive exponents
when you move the decimal to the left
you add to the exponent
when you move the decimal to the right
you take away from the exponent
(T)era
10^12
(G)iga
10^9
(M)ega
10^6
(K)ilo
10^3
(m)illi
10^-3
micro (µ, the Greek letter mu)
10^-6
(n)ano
10^-9
(p)ico
10^-12
Latency
how long a task takes to complete
- seconds per task
Throughput
how many tasks you can complete in a span of time
- tasks per second
improving latency
reducing amount of time that one task takes
improving throughput
increasing the amount of work being done at once
what is the best latency
shortest latency
(time/task)
what is the best throughput
the one with the highest number of completed tasks
(tasks/time)
making latency lower
can make throughput higher
critical path
path through a circuit that requires the longest series of sequential operations
fastest we can clock a sequential circuit
is the reciprocal of the critical path’s propagation delay
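A worked example (the 10 ns delay is made up):

```python
critical_path_s = 10e-9             # 10 ns critical-path propagation delay
max_clock_hz = 1 / critical_path_s  # fastest safe clock is the reciprocal
print(max_clock_hz / 1e6)           # 100.0 (MHz)
```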
problem with single cycle architecture
- it ties the performance of the CPU to the performance of the RAM
- clock cannot tick any faster than the instruction with the longest critical path
- can’t run the clock any faster than memory latency
what is the fastest a single cycle machine can run
66-83 MHz (66 to 83 million cycles per second)
1 CPI
pipeline registers
hold onto data for the next phase in multi cycle
CPI
(Cycles Per Instruction)
- latency
- how many cycles it takes to complete one instruction
- lower is better (slower instructions take more cycles)
- Best is 1
Total time
n * CPI * t seconds (n instructions x CPI cycles per instruction x t seconds per cycle)
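Plugging in made-up numbers:

```python
n = 1_000_000   # instructions in the program (hypothetical)
cpi = 1.5       # average cycles per instruction (hypothetical)
t = 1 / 100e6   # seconds per cycle at a 100 MHz clock
print(n * cpi * t)  # about 0.015 seconds total
```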
Microprogramming
Control is a FSM whose transition table is in a special memory
microcode
(ucode)
small “programs” that encode the sequence of steps for each instruction
sequencer
decides what steps to do next (control flow through uProgram)
uInstruction
set of controls and FSM control flow information
uCode ROM
(read only memory)
indexed by the uPC
hardware control is faster
because it is smaller and simpler
the longer the sequence of instructions gets
the closer the CPI gets to 1
the more stages instruction have
the higher the throughput gets
cache
temporary holding area for recently used data
(memory references cache)
temporal
these variables are used over and over, very quickly. It makes sense to keep them somewhere fast to access
spatial
the items in this array are all next to each other in memory. the cache will pull in several items at a time
scalar CPU
finish at most 1 instruction per cycle
superscalar CPU
finishes more than one instruction per cycle
pipelining
partially overlapping instruction execution to improve throughput (each instruction's latency doesn't improve, but more instructions complete per unit of time)
Caching
keeping copies of recently used data in a smaller but faster memory so it can be accessed more quickly in the near future
Out-of-Order
CPUs analyze several (a dozen or more) instructions in advance, then dynamically reorder them so they can be executed more quickly than they would as written