CSCI 223 Midterm 2 Flashcards
3 mechanisms in procedures
- passing control (call, return)
- passing data (using either registers or memory)
- memory management (stack - allocate/deallocate)
x86-64 stack grows toward
lower memory addresses (down)
register ? contains address of top of stack
%rsp
stack operation: pushq Src
fetch (read) operand at Src, decrement %rsp by 8 (or 4 on a 32-bit machine), write operand at address given by %rsp
register ? contains address of the frame pointer
%rbp
stack operation: popq Dest
read value at address given by %rsp, increment %rsp by 8 (or 4 on a 32-bit machine), store value at Dest (must be a register)
procedure control flow
use stack
procedure call: push return address on stack, jump to label
procedure ret: pop address from stack, jump to address
return address
address of the next instruction right after call
return value
%rax (64-bit) or %eax (32-bit)
stack allocated in ?
frames (state for single procedure instantiation)
contents of stack frames
return information, local storage (if needed), temporary storage (if needed)
management of stack frames
space allocated when enter procedure (“set-up” code; includes push by call instruction) and deallocated when return (“finish” code; includes pop by ret instruction)
we create a new stack frame when
we enter a new function
calling conventions
caller saved and callee saved
caller saved
caller saves temporary values in its frame before the call
callee saved
callee saves temporary values in its frame before using, then restores them before returning to caller
%rax is the ?, ?-saved, and can be modified by ?
return value, caller-saved, procedure
%rdi…%r9 are the ?, ?-saved, and can be modified by ?
arguments, caller-saved, procedure
%r10, %r11 are ?-saved, and can be modified by ?
caller-saved, procedure
%rbx, %r12, %r13, %r14 are ?-saved, and can be modified by ?
callee-saved, callee must save and restore
%rbp is the ?, ?-saved, and can be modified by ?
frame pointer, callee-saved, callee must save and restore
%rsp is the ?, ?-saved, and can be modified by ?
stack pointer, callee-saved (special form), restored to original value upon exit from procedure
recursion is handled by ? calling conventions
normal (in the use of the stack)
char/unsigned char
1 byte
2^8 (256) numbers
short/unsigned short
2 bytes
2^16 numbers
int/unsigned int
4 bytes
2^32 numbers
long/unsigned long
8 bytes
2^64 numbers
integer encoding standards: unsigned integers
like converting binary to decimal: the sum of the number (0 or 1) times 2^i
integer encoding standards: two’s complement for signed integers
the most significant bit is a sign bit (0 = non-negative, 1 = negative) that also carries weight -2^(w-1); the remaining bits are summed like an unsigned binary number
to find the negative version of an integer
- find bit pattern for positive version
- flip all bits
- add 1
for a 16-bit integer, UMax (unsigned)
0xFFFF = 2^16-1 (need to know context to know difference between -1 and 2^16-1)
for a 16-bit integer, TMax (signed)
0x7FFF = 2^15-1
for a 16-bit integer, TMin (signed)
0x8000 = -2^15
for a 16-bit integer, -1
0xFFFF = -1 (need to know context to know difference between -1 and 2^16-1)
for a 16-bit integer, 0
0x0000 = 0 (always all zeroes for signed and unsigned)
the magnitude of TMin is equal to
TMax + 1
the value of UMax is equal to
2*TMax + 1
equivalence of unsigned/signed numeric values
same encodings for nonnegative values
uniqueness of unsigned/signed numeric values
every bit pattern represents a unique integer value; each representable integer has a unique bit encoding
constants by default are considered to be ? integers unless
signed; unless marked with a 'U' suffix
sign extension
given a w-bit signed integer x, convert it to a (w+k)-bit integer with the same value
rule for sign extension
make k copies of sign bit
floating point numbers
encode rational numbers of the form v = x*2^y
standard for representing floating point numbers (but not all)
IEEE 754
single-precision for IEEE 754
32-bit
double precision for IEEE 754
64-bit
IEEE 754 sign bit is
MSB
IEEE 754 next set of bits
exp
IEEE 754 last set of bits
frac
IEEE 754 formula
(-1)^s * 1.frac * 2^(exp - bias)
divide by two by shifting
right
multiply by two by shifting
left
IEEE 754 single precision bit allocation
sign bit = 1 bit
exp = 8 bits
frac = 23 bits
IEEE 754 double precision bit allocation
sign bit = 1 bit
exp = 11 bits
frac = 52 bits
normalized values for IEEE 754
when exp =/= 000…0 and exp =/= 111…1
bias for exp
2^(k-1) - 1
bias for exp for 32 bit, 64 bit
127, 1023
denormalized values for IEEE 754
exp = 000…0
cases:
frac = 000…0 represents zero value (+0, -0 depending on sign bit)
frac =/= 000…0 represents numbers very close to 0.0
special values for IEEE 754
exp = 111…1
cases:
frac = 000…0 represents infinity (both positive and negative depending on sign bit)
frac =/= 000…0 represents Not-a-Number (NaN; when no numeric value can be determined)
IEEE 754 zero
exp = 00...00 frac = 00...00
IEEE 754 smallest positive denormalized
exp = 00...00 frac = 00...01
IEEE 754 largest denormalized
exp = 00...00 frac = 11...11
IEEE 754 smallest positive normalized
exp = 00...01 frac = 00...00
IEEE 754 one
exp = 01...11 frac = 00...00
IEEE 754 largest normalized
exp = 11...10 frac = 11...11
floating point zero is (the same/different from) the integer zero
the same
rounding modes
towards zero, round down (towards -inf), round up (towards +inf), nearest even (default)
int to double conversion
exact conversion, since double's 53-bit significand can hold any int of 53 or fewer bits exactly
int to float conversion
will round according to rounding mode
double/float to int conversion
truncates the fractional part, like rounding toward zero; undefined when out of range or NaN
key features of RAM
traditionally packaged as a chip, basic storage unit is normally a cell (one bit per cell)
RAM stands for
Random-Access Memory
static RAM (SRAM)
used for cache; each cell stores a bit with a four or six-transistor circuit; retains value indefinitely, as long as it is kept powered; relatively insensitive to electrical noise (EMI), radiation, etc.; faster and more expensive than DRAM
dynamic RAM (DRAM)
used for main memory; each cell stores bit with capacitor; one transistor per bit; value must be refreshed every 10-100ms; more sensitive to disturbances than SRAM; slower and cheaper than SRAM
DRAM and SRAM are ? memories
volatile (lose info if powered off)
nonvolatile memories + examples
retain value even if powered off (ROM, PROM, EPROM, EEPROM, flash memory)
ROM
read-only memory; programmed during production
PROM
programmable ROM; can be programmed once after manufacturing
EPROM
erasable PROM; can be bulk erased (UV, XRay)
EEPROM
electronically erasable PROM; on our system; electronic erase capability
flash memory
EEPROMs with partial (sector) erase capability; wears out after about 100,000 erase cycles
uses for nonvolatile memories
firmware programs stored in ROM, solid state disks, disk caches
bus
a collection of parallel wires that carry address, data, and control signals; typically shared by multiple devices
memory read transaction
movl A, %eax
CPU places address A on the memory bus
main memory reads A from the memory bus, retrieves word x, and places it on the bus
CPU reads word x from the bus and copies it into register %eax
memory write transaction (more complicated)
movl %eax, A
CPU places address A on bus; memory reads it and waits for the corresponding word to arrive
CPU places data word y on the bus
main memory reads data word y from the bus and stores it at address A
explain the power wall
CPU clock rates increased until about 2003, then leveled off and even dropped: power dissipation limits forced designers to add more cores instead of raising clock speed
the CPU-memory performance gap
widens between DRAM, disk, and CPU speeds; the key to bridging this gap is locality
locality
programs tend to use data and instructions with addresses near or equal to those they have used recently
temporal locality
recently referenced items are likely to be referenced again in the near future
spatial locality
items with nearby addresses tend to be referenced close together in time
referencing array elements in succession is an example of ? locality
spatial
referencing the variable “sum” in each iteration is an example of ? locality
temporal
referencing instructions in sequence is an example of ? locality
spatial
cycling through a loop repeatedly is an example of ? locality
temporal
stride
the distance between two consecutively accessed addresses (e.g. stride-N, stride-1)
CPU registers hold ? retrieved from ?
words; L1 cache
L1/L2 cache holds ? retrieved from ?
cache lines; main memory
main memory holds ? retrieved from ?
disk blocks (“pages”); local disks
local disks hold ? retrieved from ?
files; disks on remote network servers
memory hierarchy
registers, L1 cache, L2 cache, main memory, local secondary storage, remote secondary storage
cache
a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device
fundamental idea of memory hierarchy
for each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1
why do memory hierarchies work?
because of locality, programs tend to access the data at level k more often than they access the data at level k+1. thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit
big idea of memory hierarchy
creates an illusion of a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top
cache hit
data block b is needed and found in the cache
cache miss
data block b is needed, not found in the cache, fetched from memory, then stored in cache
types of cache misses
cold (compulsory), conflict, capacity
cold/compulsory cache miss
occurs because the cache is empty
conflict cache miss
occurs when the level k cache is large enough, but multiple data objects all map to the same level k block
capacity cache miss
occurs when the set of active cache blocks (working set) is larger than the cache
cache memories
small, fast SRAM-based memories managed automatically in hardware that hold frequently accessed blocks of main memory
cache configurations
direct-mapped, E-way associative, fully associative
general cache organization
S, E, B
S = 2^s sets
E = 2^e lines per set
B = 2^b bytes per cache block
cache size C = SxExB data bytes
direct-mapped cache
1 line per set
number of lines = number of sets
fastest
E-way associative cache
E = # lines per set
fully associative cache
one set with all cache lines
cache read
locate the set; check whether any line in the set has a matching tag; if yes and the line is valid, it is a hit; locate the data starting at the block offset
cache block mapping: direct-mapped
block will always go to the same location (set index = block address % number of sets)
cache block mapping: fully associative
block can go anywhere
cache block mapping: E-way associative
block has E options in one set
what to do on a write hit
write-through (write immediately to memory) or write-back (defer write to memory until replacement of line; need a dirty bit to determine if the line is different from memory or not)
what to do on a write miss
write-allocate (load into cache, update line in cache) or no-write-allocate (writes immediately to memory)
typical selection for write hit/write miss
write-back + write-allocate
intel core i7 hierarchy has ? cores, each with ?
4; Registers, L1 d-cache, L1 i-cache, and L2 unified cache
the intel core i7 processor package holds
the 4 cores and the L3 unified cache (shared by all cores)
intel core i7 L1 i-cache and d-cache
32 KB, 8-way, access: 4 cycles
intel core i7 L2 unified cache
256 KB, 8-way, access: 11 cycles
intel core i7 L3 unified cache
8MB, 16-way, access: 30-40 cycles
intel core i7 block size
64 bytes for all caches
cache miss rate
fraction of memory references not found in cache; misses/accesses = 1 - hit rate
hit time
time to deliver a line in the cache to the processor (includes time to determine whether the line is in the cache)
miss penalty
additional time required because of a miss
average memory access time
AMAT = hit time + (miss rate * miss penalty)
99% hits is ? times as good as 97% hits
two
three primary strategies for cache block replacement policy
- random (or pseudo-random)
- least recently used (LRU)
- first in, first out (FIFO - oldest block is replaced)
cache optimization
- reduce miss rate
- reduce miss penalty
- reduce hit time
six basic cache optimizations
- larger block size to reduce miss rate
- larger cache to reduce miss rate
- higher associativity to reduce miss rate
- multilevel caches to reduce miss penalty
- give priority to read misses over writes to reduce miss penalty
- avoid address translation to reduce hit time
2:1 cache rule of thumb
a direct-mapped cache of size N has about the same miss rate as a 2-way set-associative cache of size N/2. This held in the three C’s figures for cache sizes less than 128 KB