Semester 1 Flashcards
What is the ALU?
The arithmetic logic unit is the component of the processor that performs arithmetic and logical operations.
Here is an ALU, what are the following parts?
INVA, ENA, ENB, F0 and F1.
- ENA and ENB (ENABLERS) have to be high voltage for A or B to be considered.
- INVA gives the inverse of A
- F0 and F1 (the decoder) decide the operation to be enabled in the logical unit.
How many numbers can you add up to in bits in an 8bit adder?
2^8 = 256
What are each of the labelled components within clock cycle 1?
What do these field descriptions mean for an example microarchitecture?
B
Mem
C
ALU
J
Addr
- Selects B bus source
- Mem = Memory functions
- C = Selects which registers written from C bus
- ALU = ALU and Shifter Functions
- J = Determines how the next microinstruction is selected
- Addr - Contains address of potential next microinstruction
What does adding << 8 to a microinstruction do?
Shift the result left by 1 byte
What are the logic circuits whose outputs at any instant of time depend not only on the present input but also on the past outputs called?
Sequential circuits
What is the Control Unit responsible for?
- Fetching instructions from memory
- Configuring ALU
- Moving data
512kB
Consider the 1-bit Full Adder and assume all the inputs are applied at time t=10. When is the output ready? Assume all the basic gates have two units of time delay.
t = 22
1 XOR gates, containing 3 basic gate rows 3 x 2 = 6
2 AND gates = 4
1 OR gate = 2
= 12
10 + 12 = 22
What is the purpose of the adder?
An adder is a digital circuit which performs addition of numbers. It is used within the processor and is essential for the ALU.
What is the purpose of a comparator?
Comparators compare two binary values and output the result of the comparison (e.g. whether they are equal, or which is larger).
What is a PLD?
- A programmable logic device (PLD) is an electronic component used to build reconfigurable digital circuits.
- PLD has an undefined function at the time of manufacture.
- Before the PLD can be used in a circuit it must be programmed to implement the desired function.
- Programming a PLD changes the connections made between the gates in the device.
What are 4 Combinational Logic Circuits which perform Arithmetic and Logic Functions?
- Adders
- Subtractors
- Comparators
- PLDs
What are 4 Combinational Logic Circuits which perform Data Transmission?
- Multiplexers (e.g. majority vote)
- Demultiplexer
- Encoder
- Decoder
What are 3 Combinational Logic Circuits which perform Code Conversion?
- Binary
- BCD
- 7-segment
What method causes branching as an error?
Pipelining
Consider the 8-bit Adder with Look Ahead Carry and assume all the inputs are applied at time t=4. When is the result of addition ready? Assume all the basic gates have one unit of time delay.
12, look ahead carry simply adds 1t per bit, so 8 + 4 = 12
What are programs?
- Programs are what a computer does, using the CPU
What is the purpose of the logic level in computer architecture?
To make decisions and perform basic functionality
What truth table represents the XNOR gate?
What is the processor?
- Sometimes referred to as the CPU (central processing unit) processes data by executing program instructions.
- At processor level, these will be low-level instructions in the form of machine code that the processor has been designed to handle based on a specific processor instruction set
What is main memory?
- Main memory is memory that can be accessed directly by the processor.
- Each memory location (instructions/data stored as binary sequences), has a physical address, used to locate it and its content.
- It is external memory not located in the CPU
What are the two main types of memory?
- RAM (Random Access Memory) : The working memory that is used by the processor during the Fetch-Decode-Execute cycle. It is volatile
- ROM (Read Only Memory) : Memory which is used in the boot process for the computer system. It is non-volatile
What are registers?
- Fast local memory on the CPU
- Very small storage locations used to hold data temporarily
- They have very high read/write speeds
What are I/O Devices? 3 main types.
External (peripheral) devices that can be categorised into 3 groups:
- Secondary storage devices e.g. hard disk
- Input devices e.g. a keyboard/sensor
- Output devices e.g. a speaker/actuator
Each peripheral also has a device driver that provides a software interface for the device
Bus
- A series of parallel wires that connect internal components of a computer system, allowing signals to be passed through them
Bus Width
- The number of parallel wires in a bus has a direct relationship to the number of bits that can be transferred.
Address Bus
Unidirectional
———–>
(away from CPU)
- Transports memory addresses
- Bigger width = larger range of addresses, thus increasing the computer's amount of addressable memory
- 1 wire = 2^1 addresses
What is the Data Bus?
Bidirectional
main memory ←———–> processor
- Sends data and instructions
- Bigger width = larger volume of data transfer
- 1 wire = 1 bit
What is the Control Bus?
Bidirectional
Main memory ←———–>CPU(Processor)
- Carries control signals to regulate operations
- Higher clock speed (a control signal) = More instructions per second + higher temp/power consumption
- can control Clock, memory read/write
How is the Von Neumann Architecture structured?
- All data/instructions are stored in the main memory
- Instructions are sent to the processor along the system bus to be executed
- Data sent to/from the processor is sent along the system bus
- Any input/output is performed by i/o devices with the data travelling from them to the cpu/main memory
Harvard Architecture?
- The main difference to the Von Neumann Architecture is it has separate buses for data and instructions, making it more efficient and faster
What is integrated graphics?
- It uses other parts of the machine to do its job
What is the Stored Program Concept?
- machine code instructions stored in main memory are fetched and executed serially by a processor that performs arithmetic and logical operations.
Describe the FDE cycle
FETCH
- The content of the PC is copied into the MAR (PC -> MAR)
- The content of the MAR is transferred to main memory by the address bus (MAR -address bus-> MM)
- The instruction from MM is sent to the MBR/MDR by the data bus simultaneously (MM -data bus-> MBR)
- The program counter is incremented by 1, or points to the next instruction (PC + 1)
- The content of the MBR is copied to the Current Instruction Register (MBR -> CIR)
DECODE
- The content of the CIR is decoded by the control unit
- The decoded instruction is split into opcode + operand
EXECUTE
- Any data required by the instruction that isn’t present in registers is fetched
- The instruction is carried out
- Results of any calculations are stored in general purpose registers, main memory or an accumulator (e.g. ALU for arithmetic calculations)
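The fetch, decode and execute steps above can be sketched as a toy simulator. This is only an illustration: the four opcodes (LOAD, ADD, STORE, HALT), the single accumulator and the memory layout are assumptions made for the example, not part of any real instruction set.

```python
def run(memory):
    """Run a toy program held in `memory` and return the final accumulator value."""
    pc = 0    # program counter
    acc = 0   # accumulator (hypothetical single result register)
    while True:
        # FETCH
        mar = pc               # PC -> MAR
        mbr = memory[mar]      # MM -data bus-> MBR
        pc += 1                # PC + 1
        cir = mbr              # MBR -> CIR
        # DECODE: split the instruction into opcode + operand
        opcode, operand = cir
        # EXECUTE
        if opcode == "HALT":
            return acc
        elif opcode == "LOAD":
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Instructions occupy addresses 0-3, data occupies addresses 4-6.
program = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", None), 2, 3, 0]
result = run(program)
print(result, program[6])
```

Instructions and data share one memory list here, which mirrors the stored program concept.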
What is pipelining?
- Doing multiple parts of the FDE cycle in parallel so that computations can be achieved at a faster rate. (e.g. superscalar architectures to make processes parallel)
- Parallel processing
Complete the 5 level memory hierarchy
using: tape, optical disk, main memory, magnetic/SSD and cache
What is cache?
- Relatively small capacity set of locations that sit close to the processor, used to store instructions and data most frequently used.
- More cache = More instructions can be queued and carried out
L1 Cache is the smallest and fastest
L2 Cache is shared by cores, but larger and slower
L3 and new L4 Slow but large and sit on or near the processor
- L1 is closest to the CPU
- Cache can be cleared to increase the speed
Give the name, algebraic function and truth table of this logic gate
Give the name, algebraic function and truth table of this logic gate
Give the name, algebraic function and truth table of this logic gate
Give the name, algebraic function and truth table of these logic gates
What is a Source?
Source: Where the voltage enters the transistors.
What is a Drain?
Drain: Where the voltage leaves the transistors.
What is a Gate?
The terminal that controls the flow
What is the difference between NMOS and PMOS?
n-MOS: Gate driven positive allows (works) current to flow between Source and Drain. Gate driven negative isolates Source and Drain (stops)
p-MOS: Gate driven negative allows (works) current to flow between Source and Drain. Gate driven positive isolates Source and Drain (stops), PMOS little circle on GATE
Note: Image of a CMOS inverter
+V = 1 = High voltage
-V = 0 = Low voltage
What gate does the CMOS inverter have the same functionality as?
CMOS Inverter has the same functionality as a NOT gate.
Draw/Describe a CMOS NOR Gate
2 p-MOS and 2 n-MOS
Draw a CMOS NAND Gate
Note Reminder: CMOS NAND Gate Silicon Example
How do you calculate the total number of possible input combinations for a gate?
2^N where N is the number of inputs. e.g. 3 inputs would be 2^3 = 8 possible combinations in a truth table
How do you evaluate an XOR gate with more than 2 inputs?
If you have more than 2 inputs, if the number of 1’s is odd, the output is 1, otherwise it is 0.
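A minimal sketch of the odd-number-of-1s rule, assuming the multi-input XOR is built by chaining 2-input XORs (the function name `xor_n` is made up for this example):

```python
from functools import reduce
from operator import xor


def xor_n(*bits):
    """Chain 2-input XORs over all inputs; the result is 1 when the count of 1s is odd."""
    return reduce(xor, bits, 0)


print(xor_n(1, 0, 1))  # two 1s (even count)
print(xor_n(1, 1, 1))  # three 1s (odd count)
```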
Make a XOR gate with NAND gates only
You can make an XOR gate with just 4 NAND gates (1, then 2, then 1 again)
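The 1, 2, 1 arrangement can be checked with a short sketch. The wiring below is the commonly used 4-NAND layout and is an assumption, since the original diagram is not included here.

```python
def nand(a, b):
    """2-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)


def xor_from_nand(a, b):
    n1 = nand(a, b)       # first level: 1 gate
    n2 = nand(a, n1)      # second level: 2 gates
    n3 = nand(b, n1)
    return nand(n2, n3)   # third level: 1 gate


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_from_nand(a, b))
```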
Boolean Function Circuit Example:
- Using AND gates, we can draw the outputs of M by focusing on the rows of the truth table in which M = 1, drawing 1 AND gate for each 1 output
- Not (line over character) symbolises 0 as the input, NOT 1.
- Truth table consists of only 1’s or 0’s
- Output of 1= 0, 2= 0, 3 = 0 and 4 = 1, and gate 8 is an OR gate, meaning output of M in this case is 1.
Boolean Function Circuit Expressions Example:
How can you construct a NOT gate using a NAND gate?
We can use a NAND gate as a NOT gate if both values put into the NAND gate are the same.
How can you construct a NOT gate using a NOR gate?
We can use a NOR gate as a NOT gate if both values put into the NOR gate are the same.
How can you construct an AND gate using NANDS?
Your second NAND can essentially function as a NOT gate, taking in two of the same value to change the NAND into an AND.
How can you construct AND using NOR?
- First two NOR become NOT via both taking in the same input, NOT + NOR creates an AND Gate as the end result.
- You can do something similar but replace all NOR with NAND to create an OR gate.
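The NOT, AND and OR constructions from the last few cards can be verified over every input combination. A minimal sketch, with function names chosen only for this example:

```python
def nand(a, b):
    return 1 - (a & b)


def nor(a, b):
    return 1 - (a | b)


def not_from_nand(a):
    return nand(a, a)                    # both inputs tied together


def not_from_nor(a):
    return nor(a, a)                     # both inputs tied together


def and_from_nand(a, b):
    n = nand(a, b)
    return nand(n, n)                    # second NAND acts as a NOT


def and_from_nor(a, b):
    return nor(nor(a, a), nor(b, b))     # NOT A NOR NOT B = A AND B


def or_from_nand(a, b):
    return nand(nand(a, a), nand(b, b))  # NOT A NAND NOT B = A OR B


for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_from_nand(a, b), and_from_nor(a, b), or_from_nand(a, b))
```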
Complete the Boolean Identity Table:
Match up the pairs of gates that have circuit equivalence
What is a Multiplexer? What does it always consist of?
Multiple input signals to combine for a singular common output.
Always consists of at least one NOT gate, AND gate and OR gate.
How does the select signal impact a multiplexer?
If the select signal is 0, the output will always be the value of D0
What does a Multiplexer majority do?
- It is an 8 input multiplexer that can compute the majority vote function (outputs 1 when most inputs are 1)
How would you calculate F for each of these outputs?
- Read ABC as a 3-bit binary number to decide which D to select, then output the value wired to that D. e.g. if ABC = 111 = 7, it chooses D7 and, as shown in the diagram, 1 is output
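A rough sketch of the 8-input multiplexer computing the majority function. The data-input wiring in `MAJORITY_INPUTS` is an assumption (the diagram is not reproduced here): D3, D5, D6 and D7 are tied to 1 because those are the select values with at least two 1s.

```python
def mux8(d, a, b, c):
    """Select one of eight data inputs using ABC read as a 3-bit binary number."""
    select = (a << 2) | (b << 1) | c
    return d[select]


# Assumed wiring: D0..D7, set to 1 where ABC contains two or more 1s.
MAJORITY_INPUTS = [0, 0, 0, 1, 0, 1, 1, 1]

print(mux8(MAJORITY_INPUTS, 1, 1, 1))  # ABC = 7 selects D7
print(mux8(MAJORITY_INPUTS, 0, 1, 0))  # ABC = 2 selects D2
```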
What does a decoder do?
- A decoder converts coded inputs into coded outputs where the input and output codes are different
- Large number of outputs, 1 output should be 1, the rest should be 0.
What does an encoder do?
An encoder takes all data inputs one at a time and converts them to a single output
What does a demultiplexer do?
explanation + example given
- input signal D which is connected to all outputs
1. takes the inputs in binary and adds them (ABC)
2. Outputs the number they total
3. e.g., if A = 1, B = 1 and C = 1, this is 111 = 7, so the system would output a 1 where F7 is seen
4. aka 10000000
What is Base 10?
Decimal system, digits 0-9
Each position has a place value/weight, 1 = 10^0, 10 = 10^1, 100 = 10^2
What is notation?
Notation: to denote the base of a number with the base as a subscript (often omitted for base 10)
Why is it beneficial to use binaries during storage
How do you convert binary to decimal?
e.g. 1101
base 2
starting at the value in the lowest place value, multiply each digit by 2^place value
How do you convert octal to decimal?
e.g. 704 (8)
base 8
starting at the value in the lowest place value, multiply each digit by 8^place value
How do you convert hexadecimal to decimal?
base 16
starting at the value in the lowest place value, convert the HEX digits, then multiply each digit by 16^place value
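The same place-value method works for binary, octal and hexadecimal. A minimal sketch (the function name and the hex value 2F are made up for illustration; 1101 and 704 are the examples from the cards):

```python
def to_decimal(digits, base):
    """Multiply each digit by base**place, starting from the lowest place value."""
    total = 0
    for place, ch in enumerate(reversed(digits)):
        value = int(ch, 16)  # converts 0-9 and A-F to a digit value
        if value >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        total += value * base ** place
    return total


print(to_decimal("1101", 2))
print(to_decimal("704", 8))
print(to_decimal("2F", 16))
```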
How do you convert binary into hex or octal?
- For hex: Split from right to left, in groups of 4 for hexadecimal, and then convert each nibble to its hex value
- For octal: Split from right to left, in groups of 3, and then convert these groups into octal (using denary conversion), which you can then write out
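The grouping method can be sketched as below. The helper name `regroup` and the sample value 1101011 are assumptions for this example; leading zeros are added on the left so that the groups line up from the right.

```python
DIGITS = "0123456789ABCDEF"


def regroup(bits, group):
    """Convert a binary string to hex (group=4) or octal (group=3) by grouping bits."""
    groups_needed = -(-len(bits) // group)        # ceiling division
    padded = bits.zfill(groups_needed * group)    # pad on the left with zeros
    chunks = [padded[i:i + group] for i in range(0, len(padded), group)]
    return "".join(DIGITS[int(chunk, 2)] for chunk in chunks)


print(regroup("1101011", 4))  # hex: 0110 1011
print(regroup("1101011", 3))  # octal: 001 101 011
```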
What are the binary arithmetic rules?
1 + 1 = 0, carry 1
1 + 0 or 0 + 1 = 1
1 + 1 + 1 = 1, carry 1
What are two ways you can do decimal to binary conversion?
- Decimal to binary conversion can also be done via working down the binary e.g. if you’re trying to put 280 in binary, you can’t take away 512, so that is a 0, whereas 256 would be a 1, and you minus this and continue down.
- see image
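The working-down method from the first bullet can be sketched as follows, assuming a fixed width of 10 bits so that the 512 column from the example is included:

```python
def to_binary(value, width=10):
    """Work down the place values: write 1 and subtract if the weight fits, else write 0."""
    bits = []
    for place in range(width - 1, -1, -1):
        weight = 2 ** place
        if value >= weight:
            bits.append("1")
            value -= weight
        else:
            bits.append("0")
    return "".join(bits)


print(to_binary(280))  # 512 does not fit, 256 does, then 16 and 8
```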
At minimum, what does a half adder consist of ?
DRAW IT!
- an AND gate and an XOR gate
- takes two inputs to ADD
- Sum is XOR gate, this is the lowest place value when adding
- Carry is AND gate, this is carried to the next adder for next digit addition
Draw a truth table for a half adder
What is a full adder? How does it differ from a half adder?
It takes 3 inputs instead of two, it intakes a carry from previous arithmetic and outputs a carry additionally.
What is propagation delay?
Delay between gates in activating, you can separate the time for the carry and the inputs because they are put in separately (when referring to full adder).
Note reminder: Full adder propagation delay
ALSO SEE IF YOU CAN DRAW A FULL ADDER CORRECTLY
Note reminder: Full adder logic equation explained for delay
ALSO SEE IF YOU CAN DRAW A FULL ADDER CORRECTLY
What is an 8-bit left or right shifter?
Shift a binary value up or down a place
How much time delay does an XOR gate have?
if every logic gate round creates 1t, 3 time delay as it is made with 4 NAND GATES, and 3 lines.
At minimum, what is a full adder comprised of?
AND, AND, XOR, XOR (second one for the sum), + OR for the carry
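A minimal sketch of that gate list, built as two half adders plus an OR gate (function names are chosen for this example only):

```python
def half_adder(a, b):
    """Return (sum, carry): sum is the XOR gate, carry is the AND gate."""
    return a ^ b, a & b


def full_adder(a, b, carry_in):
    s1, c1 = half_adder(a, b)             # first XOR and first AND
    total, c2 = half_adder(s1, carry_in)  # second XOR (the sum) and second AND
    return total, c1 | c2                 # OR gate produces the carry out


for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, full_adder(a, b, cin))
```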
What is hamming distance?
Between two strings of equal length, it is the number of positions at which corresponding symbols are different. In other terms, it measures the minimum number of substitutions required to change one string in to the other.
What is the hamming distance formula for detecting (d) amount of errors?
hamming distance >= d+1
What is the hamming distance formula for correcting (d) amount of errors?
hamming distance >= 2d + 1
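A short sketch of the definition and the two formulas, rearranged to give the largest d for a given Hamming distance (helper names and the sample strings are assumptions for this example):

```python
def hamming_distance(x, y):
    """Count the positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must be the same length")
    return sum(a != b for a, b in zip(x, y))


def max_detectable(distance):
    """Largest d with distance >= d + 1."""
    return distance - 1


def max_correctable(distance):
    """Largest d with distance >= 2d + 1."""
    return (distance - 1) // 2


print(hamming_distance("10110", "10011"))
print(max_detectable(6), max_correctable(6))
```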
What is even parity?
Even parity = Even number of 1’s (after adding the parity bit)
What is odd parity?
Odd parity = Odd number of 1’s (after adding the parity bit)
What is percent overhead?
The percentage of check bits within the word size
e.g. if word size was 4, and check bits were 3, the percent overhead would be 75%
Error correcting 4 bit data word venn diagram example:
- Check each circle for its individual parity bits and then look for errors when crossing over
The even parity bit can be generated using XOR gates with two inputs. How many such XOR gates is needed to compute the parity bit of a 7-bit data word?
6 XOR gates
1 LESS
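The "1 less" rule can be seen by chaining 2-input XORs and counting them. A minimal sketch, where the 7-bit data word is a made-up example:

```python
def even_parity_bit(data_bits):
    """Return (parity_bit, xor_gates_used) for a chain of 2-input XOR gates."""
    parity = data_bits[0]
    gates = 0
    for bit in data_bits[1:]:
        parity ^= bit   # one 2-input XOR gate per additional data bit
        gates += 1
    return parity, gates


word = [1, 0, 1, 1, 0, 0, 1]   # hypothetical 7-bit data word
print(even_parity_bit(word))
```

With the parity bit appended, the total number of 1s is even, which is what even parity requires.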
Using even parity hamming code where (if any) is the error in the following code?
The error is in Data bit D1
Treat each circle as checking its own parity, often the error is the odd crossover out
Using hamming code where (if any) is the error in the following code?
Data bits: 1111
Parity bits: 111
There is no error
Using even parity hamming code where (if any) is the error in the following code?
Data bit D2
Treat each circle as checking its own parity, often the error is the odd crossover out
What gate is used to calculate even parity?
What gate is used to calculate odd parity?
Even parity: XOR gate
Odd parity: XNOR gate
Note reminder: Check bit generator for 4 bit data word
What is S?
What is R?
Why are the X’s laid out like this
S = Sender
R = Receiver
For each row, you start at the x where the number is, e.g. the first x on row 4 starts at 4, then prints 4 x's, then skips 4 spaces, and repeats
Rows continue until the length of the Receiver has been mapped
What does a clock do in a sequential circuit?
The clock synchronises transfers, and feedback is received by pulling previously calculated data from memory.
What is A?
What is B?
What is C?
A is the data send out
B is the data from A but with time delay
C is the data gathered when both A AND B are on their rising edge.
reminder: Clock signals have a rising edge and a falling edge
What is a?
What is b?
(a) is NOR latch in state 0
(b) is NOR latch in state 1
in SR latch, what does it mean if Q and Qbar are the same?
The system is vulnerable. They need to be opposite each other to be stable (e.g. outputting 1 and 0)
What is the benefit of using a clock in a D latch?
What is D?
- Control and ensure more synced system.
- D is connected to both AND gates, but its inverse is connected to the bottom and, meaning both Q and Qbar cannot be high.
What is the benefit of AND gates in a D latch?
Inclusion of AND gate means that in no scenario both values can be 1, preventing vulnerability.
However, there is a small time delay for taking the negation of D which may allow a small time frame in which both q and not q are positive.
What is the most essential type of memory parts to ensure minimal time delay?
D-type Flip flop. It is adjusted to make the time delay essentially negligible as it is very small.
Explain how each of these latches/flip flops work.
What is the rule for a Master Slave Flip flop using D latches?
- Master and slave (two D latches) are never on at the same time.
- It requires a symmetric clock at high speed.
How many bits does a flip flop cover?
Each flip flop covers 1-bit from a register.
What is the purpose of clear
(CLR) and amplify here?
Clear empties the signal
Large triangle – Amplified signal to ensure it is distinguishable between high and low voltage as it travels and potentially weakens
What do read/write operations always act on?
A complete word
Typical Memory Internal organisation note reminder:
What type of architecture do high level languages have to be defined in terms of?
Any high-level language has to be defined in terms of microinstructions, which in turn have to be supported by a microarchitecture.
How can you make a microarchitecture more efficient?
- Having fewer microinstructions
- Adding more hardware
What do latches do within a circuit?
latches separate parts of the circuit for us, (allowing for pipelining sections)
What is the principle of locality?
If a particular storage location is referenced, it is likely that nearby memory locations will be referenced in the near future
If the main memory size for bit entries is 2^32, what is the size of the TAG,line, word and byte?
Tag = 2^16 (half)
Line + word + byte = 2^16
What is a cache miss?
Cache miss: Indicated by a comparator, the memory value has never been fetched before
What is a cache hit?
Value has been fetched into the cache previously, quicker execution
What is direct mapped cache?
If a byte is present in cache, it can only be in one place
e.g. 64kbyte cache:
- Cache contains 2k lines/entries
- Each line contains 32 bytes of data
- Each line also contains a 16 bit TAG and a 1 validation bit
- The validation bit is 1 if there is real data in that line
- The TAG contains the 16 most significant bits of the actual address of the data contained in that line
Memory Read Operation:
cache
- CPU transmits 32 bit address of X
- X 5-15 selects cache line
- Cache is initially empty so cache line valid bit is 0
- Comparator indicates cache miss (no pre-fetched information)
- (Due to cache miss) Whole address is propagated to main memory which outputs requested data word.
- Requested data word is read by CPU.
- Requested data word is also written to cache line selected by X 5-15 in data word, selected by X 2-4
- Cache reads 7 more words from main memory to complete cache line data part.
- X 16-31 is written to the tag part of the cache line selected by X 5-15
- The Valid bit of the cache line selected by X 5-15 is set to 1 to indicate that a tag and 8 words of data have been written.
- Cycle somewhat repeats, main memory plays no part in this read as valid bit is now 1 due to fetched data, so a cache HIT occurs instead.
We have a direct-mapped cache with size 32 MB. Data is moved to the cache with the size of 64 bytes. What is the number of lines in the cache?
Number of lines (n) = Cache Size(in bytes)/Line size
n = 32 x 2^20 bytes/ 64 bytes
= 524288 or 2^19
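The same calculation as a quick check, using the 32 MB example above and the 64 kbyte cache with 32-byte lines from the earlier card (constant and function names are chosen for this sketch):

```python
KB = 2 ** 10
MB = 2 ** 20


def cache_lines(cache_bytes, line_bytes):
    """Number of lines = cache size in bytes / line size in bytes."""
    return cache_bytes // line_bytes


print(cache_lines(32 * MB, 64))   # 2^25 / 2^6
print(cache_lines(64 * KB, 32))   # 2^16 / 2^5
```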
How many bytes is 1 megabyte?
How many bytes is 1 kilobyte?
How many bits is 1 byte?
- 2^20 bytes
- 2^10 bytes
- 8 bits
How do you calculate the number of lines in a direct mapped cache?
Cache size (in bytes)/ Line size(in bytes)
A 64k byte Direct-Mapped Cache is organised as shown in the diagram. If the cache receives a read request at the 32-bit address: 00000101 11100000 01101001 11001001, which cache line will be examined for reading the data?
Bits 5-15 of the address (the 11 bits directly below the 16-bit tag) - 01101001 110
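The field split can be sketched with shifts and masks. The tag (X 16-31), line (X 5-15) and word (X 2-4) ranges come from the memory read card; treating X 0-1 as the byte within a word is an assumption made for this example.

```python
def split_address(addr):
    """Split a 32-bit address into (tag, line, word, byte) fields."""
    byte = addr & 0b11              # X 0-1 (assumed: byte within a 4-byte word)
    word = (addr >> 2) & 0b111      # X 2-4: word within the line
    line = (addr >> 5) & 0x7FF      # X 5-15: cache line (11 bits)
    tag = (addr >> 16) & 0xFFFF     # X 16-31: tag (16 bits)
    return tag, line, word, byte


address = 0b00000101_11100000_01101001_11001001
tag, line, word, byte = split_address(address)
print(format(line, "011b"), line)
```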
What is the formula to calculate input size of row decoder?
Log2(No of rows)
You are given an SRAM Memory IC of size 128K bytes arranged as 512x256x8 bits. What is the input size of the row decoder for storing a byte in a specific location?
Log2(512)
= 9
Data Flow
How data is moving
What does an instruction fetch unit do for a microarchitecture
Allows more than one instruction to be stored to be fetched and therefore in a cycle more than one action can be performed.
What is an ISA?
Instruction set for a microarchitecture.
What is a microarchitecture?
A small operative architecture that performs logical instructions i.e. FDE cycle
What data type is contained within the FDE decode?
A queue
CORTEX-A53 registers reminder:
16 registers, 13 general purpose (R0-R12), all 32 bits
R13 = Stack pointer
R14 = Link register
R15 = Program Counter
CPSR = Current Program Status register, allows for specific tasks, it is read only
What does the program counter do?
Points to the next instruction.
Describe the CPSR?
Current Program Status register, allows for specific tasks, it is read only
What do the following CPSR flags mean?
- N
- Z
- C
- V
Arm Operations
- N – Negative result from the ALU
- Z – Zero result from the ALU
- C – ALU operation Carry out
- V – ALU Operation oVerflowed
How does direct mapped cache operate?
– A line of data from a given location in main memory always maps onto the same cache line
What is thrashing, and why is it an issue with direct mapped cache?
- A line of data from a given location in main memory always maps onto the same cache line
- This can result in thrashing where data moves to and from memory frequently, limiting CPU performance
How does set associative cache operate?
- A set-associative cache replicates every line a fixed number of times
- Data fetched from main memory could be in more than one place in cache so a fixed number of places must be searched
- Greatly reduces thrashing and improves performance
4-way set-associative cache, LABEL AND EXPLAIN all of the components
- Line: a block of contiguous words from main memory
- Offset: lower bits identify a word within a line
- Way: a subdivision of cache where Line is stored
- Tag: top bits of the 64-bit address tell the cache where the Line came from in main memory
- Index: middle bits determine in which line of the cache the address can be found
- Set: Cache lines from all Ways sharing a particular Index
- 7/8. Tag and RAM
- CPU Read1: fetch Line from main memory and store in Way
- CPU Read2: search Indexed Set for Tag
Principles of Operation: Multi-Level cache read (include when it is not read in L1)
Consider an instruction fetch:
1. There is a cache lookup in the L1 data cache
2. If it is found in the L1 Cache, the data is then read from the L1 cache and returned to the core
(OPTIONAL IF IT ISN'T IN THE L1 CACHE)
3. If it isn't found in the L1 cache, but IS found in the L2 cache, the cache line is loaded into the L1 cache from the L2 cache and data is returned to the core
4. If it is not in either L1 or L2, then data is LOADED into both of these caches from MM or L3 cache and supplied to the core
How does a cache Write-Back occur? What is it for?
- Exists to synchronise the MM with the Cache (in case cache is more recent)
1. A write updates the L1 Data Cache only and marks the cache line as dirty
2. Write to L2 system delayed until the line is evicted (way needed for different data)
How does a cache Write-Through occur?
- A write updates both the L1 Data Cache and the L2 system immediately
- This does not mark the cache line as dirty
What is the snoop control unit and what does it do?
- Maintains duplicate copies of L1 data cache tags from all cores
- The SCU monitors the line fetch memory requests and transfers between cores if dirty.
What are Virtual Memory addresses?
- Those used by the user or the compiler/assembler
What are Physical Memory addresses?
- Those used by the actual memory system
Virtual Memory and Physical Memory explanation diagram:
What is page table?
- Translation table with page table entries to convert virtual to physical addresses.
What is the MMU?
- the memory management unit (MMU) uses the most significant bits of virtual addresses of code/data to index them into a translation table containing the physical addresses
- translation is carried out automatically in hardware - transparent to the application
- MMU also controls memory access permissions, ordering and cache policies.
How does microarchitecture implement an instruction?
It executes a microprogram
What do M bits control?
Memory accesses
What is control store?
- Memory that you can't see, read or write
- it is the instruction set of the computer that specifies what is written there: high to low level instruction
- It is used for holding microprograms
- Contains the MIR (micro instruction register)
What is an MPC?
Microprogram counter, it is similar to program counter that points out to the next instruction. This does the same job but internally for the CPU to point out to the next microprogram
What are the ALU operations based on the following f1 and f0 decoder inputs?
0 0
0 1
1 0
1 1
- 0 0 : A AND B
- 0 1 : NOT B
- 1 0 : A OR B
- 1 1 : A PLUS B
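A rough sketch of a 1-bit ALU slice that follows the table above together with the ENA, ENB and INVA signals from the earlier ALU card. The `select` pair is written in the same order as the table rows. The carry handling and the order in which ENA and INVA are applied (enable first, then invert) are assumptions for this example.

```python
def alu_slice(a, b, select, ena=1, enb=1, inva=0, carry_in=0):
    """Return (output, carry_out) for one bit position."""
    a = (a & ena) ^ inva   # A only counts when ENA is high; INVA inverts it
    b = b & enb            # B only counts when ENB is high
    if select == (0, 0):
        return a & b, 0    # A AND B
    if select == (0, 1):
        return 1 - b, 0    # NOT B
    if select == (1, 0):
        return a | b, 0    # A OR B
    if select == (1, 1):
        total = a + b + carry_in
        return total & 1, total >> 1   # A PLUS B
    raise ValueError("select must be a pair of bits")


print(alu_slice(1, 1, (1, 1)))               # 1 + 1
print(alu_slice(1, 1, (0, 0), ena=0))        # A disabled
```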
What are the labelled parts of this pipeline illustration within a microarchitecture:
IF
ID
EX
MEM
WB
IF = Instruction Fetch Unit
ID = Instruction Decode Unit
EX = Execute Unit
MEM = Memory Unit
WB = Writeback Unit
Which of the following is hardware and which is software? Which two potentially are some of both?
Program/Application
Microarchitecture
Digital Logic
Compiler/Interpreter
Operating System
Devices e.g. Transistors
Programming Language
Solid State Physics
Instruction set architecture
True or false, 64 bytes is equal to 2^6 bytes?
True
- Log2(64) = 6
How does the CPU specify the location of main memory it wants for the cache?
- It needs an address whose width in bits is log2 of the memory size in bytes
- e.g. if memory is 64 bytes, it is 2^6 bytes and therefore requires an address of 6 bits
- Top two bits signify the part of the memory e.g. if it is 10, you want access to part 2 of the memory
- Next the line is specified
- and last bit specifies which byte
What is the size of main memory with a 32 bit address?
2^32 = 4 gigabytes
a) i
b) ii
c) iv
d) iii
If a code set has a Hamming distance of 6 what is the maximum number of bit errors can be detected?
d-1 errors
= 5
Label the missing components: a, b, c, d , e, f, g, h
D) C latch
E) A latch
F) B latch
When will output s3 (sum) change in the following ripple carry ?
Each full adder changes its output after 5 units (the question says each WILL CHANGE at t = 5, not that inputs are APPLIED at t = 5), plus 1 for each OR gate in between
(5 + 1) + (5 + 1) + 5 = 17
- 17, not 18, because the final OR isn't used.
Ripple Carry vs Look ahead carry
- Look ahead carry adds time delay per bit (8 bits, 8 delay)
- Ripple carry adds the full gate delay for the first adder; each adder after the first adds only 1 unit, plus 1 unit for each OR gate connecting the ripple. E.g. 8 bits: delay of 20 for the sum (6 + 14, where 14 = 7 x 2 for 7 OR gates and 7 ripples)
Consider the 8-bit Adder with Ripple Carry and assume all the inputs are applied at time t=20. When is Carryout ready? Assume all the basic gates have one unit of time delay.
1st adder 5 units, 20 + 5 = 25 (applied)
then each following adder adds 2 units of time delay by default assumption (1 for each of s1, s2 etc., 1 for the OR gates connecting the ripple)
2 x 7 = 14
14 + 25 = 39
Consider a 16-bit Adder with Look Ahead Carry and assume all the inputs are applied at time t=100. When is the result of addition ready? Assume all the basic gates have two units of time delay.
100 + 16 bits
= 116
- The two units of time delay per gate don't matter as LAC only considers bits.
If each basic gate creates 1 unit of time delay, when is carryout and when is sum ready based on an adder?
carryout = 5t
sum = 6t
How does carry cycle impact an adder?
- Ripple Carry: First cycle is normal amount (e.g. 6 for sum, 5 for carry) but then +1time delay for each cycle (+ time delay of any gates e.g. OR gates connecting the ripple)
- Look ahead: just adds on the number of bits to the time delay
What is CISC? Who was it used by? Why was it used?
5 points
CISC, Complex Instruction Set Computing (e.g. AMD/Intel)
- Memory in the 70s was expensive.
- Trying to keep programs short (limit the number of microinstructions).
- CPU designers built more functionality into individual machine instructions, a trend that later became known as CISC.
- CISC instruction set was implemented in firmware by large microprogram stores.
- However, a lot of compilers ignored most complex instructions for reasons of portability.
What is RISC? Who was it used by? Why was it used?
3 points
RISC: Reduced Instruction Set Computing (e.g. ARM, RISC-V)
- Memory price is less expensive.
- Performance became more important than short programs.
- Designers noticed that only 20% of instructions were run 80% of the time so they focused their effort on making the 20% run very quickly – techniques such as pipelining.
Reminder: History of ARM
ARM:
- Initially stood for Acorn RISC Machine, April 1985, ARM1
- ARM2 came later, added multiplication hardware.
- 1990s, new company ARM, Advanced RISC Machines
ARM business model:
- They licensed their intellectual property.
- Sold rights to their designs to semiconductor companies.
- ARM company now licenses IP blocks such as ALU, CPU and memory.
- Have specification documents to define how compliant products must behave
What are the 3 ARM profiles?
- Application profile aimed at high performance processes capable of running fully featured operating systems.
- Real-time profile defines an architecture aimed at systems that require deterministic timing and low interrupt latency.
- Microcontroller profile defines an architecture aimed at low-cost systems, where low latency interrupt processing is vital.
What is SoC?
3 points
System on Chip (Soc):
- Most basic computer entity, billions of transistors, libraries of blocks
- Semiconductor companies license ARM blocks, and add other parts to create a SoC, usually import an operating system.
- SoCs using IP blocks reduce time to market significantly.
What is an embedded system?
The piece of hardware is hidden, running software to perform specific tasks such as TV set top box or MP3 player.
What profile/cortex is used for raspberry pi?
Cortex-A53 is used to build hardware for the raspberry pi.
Difference between Armv8 and Armv7 in terms of instruction set size?
- Armv8 provides a 64 bit instruction set
- Armv7 provides a 32 bit instruction set; Armv8 can also support this, however
Explain the components of the raspberry pi
- Quad(4 processors) cortex architecture
- USB connector
- Memory/RAM
- Card Reader
- DC to UD converter
Explain the components of the Cortex-A53
- L1 Data Cache
- L1 Instruction Cache
- CPU, processor
- 4 cores
- shared unified L2 cache
Explain the numbered parts of this architecture
- 32 KB instruction cache, A-B associative, 2^5 x 2^10 bytes = 2^15 bytes
- Instruction of 1 byte size fetched and put here, 4 instructions (8 bit each)
- Instructions are decoded here
- Instructions are broken up into atomic instructions – Micro Ops (4 uOps).These are scheduled in another queue.
- For numerical operations up to 2 micro uOps can execute as there are two places
- There are two for F-/Neon
- There are 3 for loading/storing
- This is for Boolean operations/Other
- Registers for results
- 32kb data cache.
- Translation lookaside buffer - translation for virtual -> physical addresses
- Both cache types come from the larger l2 cache
- Program counters, point to next instruction
- Scheduling thread for instructions
- Branch predictor guesses whether a conditional branch will be taken or not.
- Retirement order buffer, makes more performant via stitching out of order instructions
Explain the processes of the translation lookaside buffer
The translation table used by MMU is stored in main memory
1. The MMU maintains an L1 cache of recently accessed page translations in a Translation Lookaside Buffer (TLB)
2. Each TLB entry contains not just physical and Virtual Addresses, but also attributes such as memory type, cache policies, access permissions etc.
3. If TLB does not contain a valid translation for the Virtual Address issued by a core (a TLB miss) a translation table lookup is performed using the Table Walk Unit
Explain the processes of the memory management unit for translation
- The MMU uses the most significant bits of the Virtual Addresses of code and data to index entries in a translation table which contains the Physical Addresses
- The translation is carried out automatically in hardware and is transparent to the application
- In addition to address translation, the MMU controls memory access permissions, memory ordering, and cache policies for each region of physical memory
How does the Cortex-A53 protect against errors?
- The Cortex-A53 processor protects against soft errors that result in a cache RAM bitcell temporarily holding the incorrect value
- The Cortex-A53 CPU cache protection support has a minimal performance impact when no errors are present
- When an error is detected, the access that caused the error is stalled while the correction takes place
- If the error cannot be corrected (failed memory) that way is never used again and the data is fetched from the next level cache or from main memory
What is RAM SED capability?
- Some RAMs have Single Error Detect (SED) capability
(Hamming distance of 2),
What is RAM SECDED capability?
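- Single Error Correct, Double Error Detect (SECDED) capability (Hamming distance of 4: distance 3 is enough to correct a single error, and one more is needed to also detect a double error)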
How many pins does this have?
For propagation delay, when are all p ready? When are all g ready?
All g = t+1
all p = t+3
S1 instruction fetch
S2 instruction decode
S3 operation fetch
S4 instruction execute
S5 Write back
2^0 = ?
2^1 = ?
2^2 = ?
2^ 3 = ?
2^4 = ?
2^5 = ?
2^6 = ?
2^7 = ?
2^0 = 1
2^1 = 2
2^2 = 4
2^ 3 = 8
2^4 = 16
2^5 = 32
2^6 = 64
2^7 = 128