OA Flashcards
superscalar
Technique primarily associated with hardware.
- Functional units (ALU, floating-point unit, load/store unit) are duplicated in the pipeline of a superscalar processor,
- which allows the hardware to issue multiple instructions per clock cycle, sending them to different units simultaneously.
LEGv8 doubleword
Another natural unit of access in a computer, usually a group of 64 bits (8 bytes).
64 bits corresponds to the size of a register in the LEGv8 architecture. In computers, a byte is a group of 8 bits.
virtual memory
A technique that uses main memory as a “cache” for secondary storage.
The virtual address is broken into a virtual page number and a page offset.
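A minimal Python sketch of the split and of one translation step, assuming a hypothetical 4 KiB (4096-byte) page size and a made-up one-entry page table; neither value comes from the card. With a power-of-two page size, the page offset is the low-order bits of the address and the virtual page number is the remaining high-order bits.

```python
PAGE_SIZE = 4096                           # assumed page size (4 KiB); must be a power of two
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 offset bits for 4 KiB pages

def split_virtual_address(vaddr: int) -> tuple[int, int]:
    """Return (virtual page number, page offset) for a virtual address."""
    vpn = vaddr >> OFFSET_BITS         # high-order bits select the page
    offset = vaddr & (PAGE_SIZE - 1)   # low-order bits select the byte within the page
    return vpn, offset

# Hypothetical page table: virtual page number -> physical page number
page_table = {0x12: 0x7}

def translate(vaddr: int) -> int:
    vpn, offset = split_virtual_address(vaddr)
    ppn = page_table[vpn]   # a KeyError here stands in for a page fault
    return (ppn << OFFSET_BITS) | offset   # the offset is copied unchanged

print([hex(x) for x in split_virtual_address(0x12345)])  # ['0x12', '0x345']
print(hex(translate(0x12345)))                          # 0x7345
```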
Program counter (PC)
The register that contains the address of the next instruction to be executed
LDUR
Load register: a LEGv8 data transfer instruction that copies a value from memory into a register.
Register File
How do you access the register?
A register file is a state element that consists of a set of registers that can be read and written by supplying a register number to be accessed. The register numbers come from fields of the instruction being executed; the control unit supplies the control signal (RegWrite) that determines whether a register is written.
The register file provides 1024 scalar, 32-bit registers for up to 64 threads.
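A minimal sketch of a register file, modeled on the LEGv8 register description elsewhere in this set (32 registers, 64 bits wide, one always zero). The class name, the `reg_write` flag standing in for the write-enable control signal, and the choice of register 31 as the zero register (XZR) are assumptions for illustration. Registers are selected purely by number, the way the hardware selects them.

```python
class RegisterFile:
    """32 x 64-bit registers; register 31 always reads as zero (like LEGv8 XZR)."""
    NUM_REGS = 32
    ZERO_REG = 31
    MASK = (1 << 64) - 1   # keep stored values within 64 bits

    def __init__(self):
        self.regs = [0] * self.NUM_REGS

    def read(self, reg_num: int) -> int:
        if reg_num == self.ZERO_REG:
            return 0
        return self.regs[reg_num]

    def write(self, reg_num: int, value: int, reg_write: bool = True) -> None:
        # reg_write plays the role of the write-enable control signal (assumed name)
        if reg_write and reg_num != self.ZERO_REG:
            self.regs[reg_num] = value & self.MASK

rf = RegisterFile()
rf.write(9, 42)
rf.write(31, 7)                  # ignored: the zero register cannot be changed
print(rf.read(9), rf.read(31))   # 42 0
```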
machine language
The language made up of binary-coded instructions that is used directly by the computer
system software
The set of programs that enables a computer’s hardware devices and application software to work together; it includes the operating system and utility programs.
operating system
Software that controls the execution of computer programs and may provide various services.
Assembly Language
Programming language that has the same structure and set of commands as machine languages but allows programmers to use symbolic representations of numeric machine code.
IBM 360/91
- What’s Tomasulo’s algorithm?
- How is it related to Pentium?
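The IBM 360/91 was a powerful computer built by IBM in the 1960s. It introduced many new concepts, including dynamic detection of memory hazards, generalized forwarding, and reservation stations. This approach to dynamic scheduling is named Tomasulo's algorithm, after Robert Tomasulo, an engineer who worked on the project. Its ideas (reservation stations and register renaming) returned decades later in out-of-order x86 processors, starting with Intel's Pentium Pro.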
Dynamic Random Access Memory (DRAM)
DRAM is a type of memory that provides quick access to any data location.
- Multiple DRAMs are used together to contain the instructions and data of a program.
Modern DRAMs are organized in banks, and each bank consists of a series of rows.
frame buffering
A frame buffer is a portion of RAM containing a bitmap that drives a video display. It acts as a memory buffer containing a complete frame of data.
Memory buffers are temporary storage areas in RAM that hold data while it’s being transferred between devices or processes. They act like midway points, ensuring smooth data flow and preventing delays caused by speed differences between hardware components. Examples include video streaming, keyboard input, and printing documents.
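A sketch of a frame buffer as a bitmap in memory, assuming a hypothetical 640 x 480 display with 4 bytes per pixel stored row by row; the resolution and pixel format are assumptions, not from the card. Because the frame is one contiguous block, the pixel at (x, y) sits at a fixed, computable offset.

```python
WIDTH, HEIGHT = 640, 480   # assumed display resolution
BYTES_PER_PIXEL = 4        # assumed 32-bit color (e.g. RGBA)

# The whole frame: one contiguous block of bytes, like a region of RAM
frame_buffer = bytearray(WIDTH * HEIGHT * BYTES_PER_PIXEL)

def pixel_offset(x: int, y: int) -> int:
    """Byte offset of pixel (x, y) in a row-major frame buffer."""
    return (y * WIDTH + x) * BYTES_PER_PIXEL

def set_pixel(x: int, y: int, rgba: tuple[int, int, int, int]) -> None:
    start = pixel_offset(x, y)
    frame_buffer[start:start + BYTES_PER_PIXEL] = bytes(rgba)

set_pixel(10, 2, (255, 0, 0, 255))   # a red pixel
print(pixel_offset(10, 2))           # 5160
```

The display hardware would scan this buffer repeatedly; software changes what is shown simply by writing bytes into it.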
Datapath
The datapath is a component of the processor that performs arithmetic operations
Control Unit
The control unit is a component of the processor that commands
- the datapath,
- memory, and
- I/O devices
according to the instructions of the program.
Integrated circuit
Also called a chip. A device combining dozens to millions of transistors.
Central processor unit (CPU)
Also called processor. The active part of the computer, which contains the datapath and control and which adds numbers, tests numbers, signals I/O devices to activate, and so on.
Static random access memory (SRAM)
Memory also built as an integrated circuit, but faster and less dense than DRAM.
Instruction set architecture
Instruction set architecture is an abstract interface between
- the hardware and
- the lowest-level software
that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on.
Instruction set architecture is also called architecture.
Application binary interface (ABI)
The application binary interface is the
- user portion of the instruction set plus
- the operating system interfaces used by application programmers.
It defines a standard for binary portability across computers.
Volatile memory
Storage, such as DRAM, that retains data only while receiving power.
Nonvolatile Memory
A form of memory that retains data even in the absence of a power source and that is used to store programs between runs. A DVD disk is nonvolatile.
Magnetic disk
Also called hard disk. A form of nonvolatile secondary memory composed of rotating platters coated with a magnetic recording material.
Because they are rotating mechanical devices, access times are about 5 to 20 milliseconds and cost per gigabyte in 2012 was $0.05 to $0.10
Main memory
Also called primary memory. Memory used to hold programs while they are running; typically consists of DRAM in today’s computers.
Secondary memory
- Nonvolatile or Volatile?
Nonvolatile memory used to store programs and data between runs; typically consists of flash memory in PMDs and magnetic disks in servers.
Nonvolatile memories retain stored information even when power is removed.
Flash memory
A nonvolatile semiconductor memory.
- It is cheaper and slower than DRAM
- But more expensive per bit and faster than magnetic disks.
Access times are about 5 to 50 microseconds and cost per gigabyte in 2012 was $0.75 to $1.00.
Single Instruction Single Data (SISD)
A uniprocessor
Multiple Instruction Multiple Data (MIMD)
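A multiprocessor: multiple instruction streams operate on multiple data streams. Its conventional programming model is SPMD (Single Program, Multiple Data), where a single program runs across all processors.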
Single Instruction Stream, Multiple Data Streams (SIMD)
The same instruction is applied to many data streams, as in a vector processor.
Data-level parallelism
Parallelism achieved by performing the same operation on independent data. In simpler terms, it means you can process multiple pieces of data simultaneously if they don’t rely on each other.
For example, if you’re calculating the average of two separate lists of numbers, you can parallelize the task by calculating the average of each list independently.
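The list-average example can be sketched with Python's standard `concurrent.futures` module: the same operation (`statistics.mean`) is applied to independent lists, so the calls are free to run at the same time. This only illustrates the idea; in CPython, threads give little real speedup for pure-Python arithmetic, and in practice a process pool or vector (SIMD) instructions would do the work.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

list_a = [2, 4, 6, 8]
list_b = [1, 3, 5, 7, 9]

# Same operation (mean) on independent data: neither result depends on the other
with ThreadPoolExecutor(max_workers=2) as pool:
    averages = list(pool.map(mean, [list_a, list_b]))

print(averages)   # one average per list, in input order
```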
LEGv8
A simple subset of the ARMv8 instruction set, used to teach assembly language instructions.
data hazard (pipeline data hazard)
When a planned instruction cannot execute in the proper clock cycle because data that is needed to execute the instruction are not yet available.
forwarding (bypassing)
A method of reducing a data hazard
Forwarding or bypassing is a method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory
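A rough sketch of how forwarding changes stall counts in the classic five-stage pipeline, assuming full forwarding: an ALU result can be forwarded to the very next instruction, but a loaded value is not available until the end of the memory stage, so an instruction that uses it immediately after the load (a load-use hazard) still needs a one-cycle stall. The `Instr` tuple format is hypothetical.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    op: str               # e.g. "ADD", "SUB", "LDUR"
    dest: Optional[str]   # register written, or None
    srcs: tuple           # registers read

def count_stalls(program: list) -> int:
    """Stalls needed in a 5-stage pipeline with full forwarding (sketch)."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        # Only a load followed immediately by a user of its result must wait
        if prev.op == "LDUR" and prev.dest is not None and prev.dest in curr.srcs:
            stalls += 1
    return stalls

program = [
    Instr("LDUR", "X1", ("X2",)),        # LDUR X1, [X2, #0]
    Instr("ADD",  "X3", ("X1", "X4")),   # needs X1 right away: load-use stall
    Instr("SUB",  "X5", ("X3", "X6")),   # needs X3 from ADD: forwarded, no stall
]
print(count_stalls(program))   # 1
```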
Structural hazard
When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.
Pipelining
Technique that allows the CPU to work on more than one instruction at a time
Formula
total process time = [longest stage time * (number of loads - 1)] + time for one load to pass through all stages
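A worked check of the formula using the laundry analogy, with assumed numbers: four loads and four stages of 30 minutes each, so one load takes 120 minutes on its own. Without pipelining, every load runs start to finish before the next one begins.

```python
def pipelined_time(stage_times: list, num_loads: int) -> int:
    """Longest stage * (loads - 1) + time for one load through all stages."""
    return max(stage_times) * (num_loads - 1) + sum(stage_times)

def sequential_time(stage_times: list, num_loads: int) -> int:
    """Without pipelining, each load finishes every stage before the next starts."""
    return sum(stage_times) * num_loads

stages = [30, 30, 30, 30]   # minutes: wash, dry, fold, put away (assumed)
print(pipelined_time(stages, 4))    # 210 (3.5 hours)
print(sequential_time(stages, 4))   # 480 (8 hours)
```

After the first load fills the pipeline, one more load finishes every "longest stage" interval, which is where the (number of loads - 1) term comes from.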
R-format ALU operations
Requires register file and the ALU.
output
The results of the operation of any system.
spatial locality
The principle stating that if a data location is referenced, data locations with nearby addresses will tend to be referenced soon.
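A small sketch of why spatial locality pays off, assuming a hypothetical 32-byte cache block, a cache large enough that nothing is evicted, and a loop reading consecutive 8-byte doublewords. Only the first access to each block misses; the next three accesses find their data already brought in with that block.

```python
BLOCK_SIZE = 32   # bytes per cache block (assumed)
WORD_SIZE = 8     # bytes per doubleword

def count_misses(addresses: list) -> int:
    """Count misses, assuming loaded blocks are never evicted."""
    loaded_blocks = set()
    misses = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE   # which block holds this byte address
        if block not in loaded_blocks:
            misses += 1
            loaded_blocks.add(block)
    return misses

# Walk an array of 16 doublewords stored at consecutive addresses
addresses = [i * WORD_SIZE for i in range(16)]
print(count_misses(addresses), "misses out of", len(addresses))   # 4 misses out of 16
```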
RAID 2
Bit-level striping with dedicated Hamming-code parity. OBSOLETE.
Smaller is faster
A very large number of registers may increase the clock cycle time simply because it takes electronic signals longer when they must travel farther.
- Guidelines such as “smaller is faster” are not absolutes; 31 registers may not be faster than 32.
- Even so, the truth behind such observations causes computer designers to take them seriously. In this case, the designer must balance the craving of programs for more registers with the designer’s desire to keep the clock cycle fast.
- Another reason for not using more than 32 is the number of bits it would take in the instruction format.
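The instruction-format argument can be made concrete: each register field needs enough bits to name every register. The sketch assumes an instruction with three register fields (as in an LEGv8 R-format instruction such as ADD X1, X2, X3) inside a 32-bit instruction word.

```python
def bits_per_register_field(num_registers: int) -> int:
    """Bits needed to encode any register number from 0 to num_registers - 1."""
    return (num_registers - 1).bit_length()

INSTRUCTION_BITS = 32
for n in (32, 64, 128):
    field = bits_per_register_field(n)
    used = 3 * field   # three register operands, e.g. ADD Xd, Xn, Xm
    print(f"{n} registers: {field}-bit fields, {used} of {INSTRUCTION_BITS} bits")
```

Going from 32 to 64 registers costs 3 more bits in every such instruction, bits that would have to come out of the opcode or other fields.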
commit unit
The commit unit is a unit in a dynamic or out-of-order execution pipeline that decides when it is safe to release the result of an operation to programmer-visible registers and memory.
in-order commit
A commit in which the results of pipelined execution are written to the programmer-visible state in the same order that instructions are fetched.
Exception Enable (Interrupt Enable)
A signal or action that controls whether the processor responds to an exception or not; necessary for preventing the occurrence of exceptions during intervals before the processor has safely saved the state needed for restart.
Weak scaling
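Weak scaling is the speed-up achieved on a multiprocessor while increasing the size of the problem proportionally to the increase in the number of processors.
Not bound by Amdahl's Law the way a fixed-size problem is, because the parallel work grows along with the number of processors.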
Write serialization
Write serialization is a method that ensures writes to a location are seen in the same order by all processors.
- Maintaining the order of writes to a given location ensures that all processors sharing memory read the correct data.
- Write serialization is not the same as object serialization (saving an object's data as a byte stream for a file or network); it is one of the requirements for cache coherence.
- For example, if the values 1 and then 2 are written to a location, no processor can read the location as 2 and then later read it as 1.
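Write serialization in the cache-coherence sense can be phrased as a check, sketched here with hypothetical names: given the single order in which writes to one location took effect, each processor's sequence of observed values for that location must never move backward in that order. For simplicity it assumes every write stores a distinct value and that processors only read values that were written.

```python
def respects_write_order(write_order: list, observed: list) -> bool:
    """True if a processor's reads of one location never go back to an older write.

    write_order: values written to the location, in the order they took effect
                 (assumed distinct, so each value identifies one write).
    observed:    values one processor read from that location, in program order.
    """
    position = {value: i for i, value in enumerate(write_order)}
    last = -1
    for value in observed:
        if position[value] < last:   # saw a newer write, then an older one
            return False
        last = position[value]
    return True

writes = [1, 2]                                  # 1 is written, then 2
print(respects_write_order(writes, [1, 1, 2]))   # True
print(respects_write_order(writes, [2, 1]))      # False: 2 then 1 is not allowed
```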
multimedia extensions (MMX)
A multimedia extension is an expanded set of instructions supported by a processor that provides multimedia-specific functions.
temporal locality
Temporal locality is the principle stating that if a data location is referenced, then it will tend to be referenced again soon.
Memory hierarchy
A memory hierarchy is a structure that uses multiple levels of memories.
- As the distance from the processor increases, so do both the size of the memories and the access time.
Block (or line)
The minimum unit of information that can be either present or not present in a cache.
Hit rate
The hit rate is the fraction of memory accesses found in a level of the memory hierarchy.
Miss rate
The fraction of memory accesses not found in a level of the memory hierarchy
miss penalty
The miss penalty is the time required to fetch a block into a level of the memory hierarchy from the lower level,
including the time it takes to:
- access the block
- transmit it from one level to the other
- insert that block in the level that experienced the miss
- then pass the block to the requestor.
Hit time
The time required to access a level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss.
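Hit time, miss rate, and miss penalty combine into the average memory access time (AMAT): AMAT = hit time + miss rate * miss penalty. The numbers below are assumed for illustration: a 1-cycle hit time, a 5% miss rate, and a 20-cycle miss penalty.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Every access pays the hit time; the 5% that miss also pay the penalty
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=20))   # roughly 2 cycles
```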
Parallelization
Consists of dividing a program into separate components that run in parallel on individual computers in the cluster.
ARM architecture
ARM architecture is a contract between hardware and software dictating how they interact with each other. It specifies rules for hardware operations when executing instructions.
Can support 16-bit instructions (the Thumb instruction set) in addition to 32-bit instructions.
Amdahl’s Law
A formula used to find the maximum improvement possible by improving a particular part of a system. In parallel computing, Amdahl’s law is mainly used to predict the theoretical maximum speedup for program processing using multiple processors
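A sketch of the usual form of Amdahl's Law: if a fraction f of the execution time can be sped up by a factor s (for example, by running that part on s processors), the overall speedup is 1 / ((1 - f) + f / s). The 90% parallel fraction below is an assumed example.

```python
def amdahl_speedup(fraction_improved: float, factor: float) -> float:
    """Overall speedup when only `fraction_improved` of the time gets `factor` faster."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / factor)

# 90% of the program parallelizes perfectly (assumed)
for processors in (10, 100, 1000):
    print(processors, round(amdahl_speedup(0.9, processors), 2))

# Even with unlimited processors, the serial 10% caps the speedup at 1 / 0.1 = 10
```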
multiprocessor
A term used to refer to a computer with more than one CPU.
Uniform Memory Access (UMA)
A multiprocessor in which latency to any word in main memory is about the same no matter which processor requests the access.
Non-Uniform Memory Access (NUMA)
A type of single address space multiprocessor in which some memory accesses are much faster than others, depending on which processor asks for which word.
loop unrolling
A technique to get more performance from loops that access arrays, in which multiple copies of the loop body are made and instructions from different iterations are scheduled together.
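A sketch of the transformation, summing an array with the loop body copied four times. The unroll factor of 4 is an arbitrary choice, and a cleanup loop handles lengths that are not a multiple of 4. In Python this only shows the shape of the code; the performance benefit comes when a compiler or processor schedules the independent operations from different iterations together, which is why the four copies use separate partial sums.

```python
def sum_rolled(a: list) -> int:
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total

def sum_unrolled4(a: list) -> int:
    # Four independent partial sums let work from different iterations overlap
    s0 = s1 = s2 = s3 = 0
    n = len(a) - len(a) % 4
    for i in range(0, n, 4):     # one trip = four original iterations
        s0 += a[i]
        s1 += a[i + 1]
        s2 += a[i + 2]
        s3 += a[i + 3]
    for i in range(n, len(a)):   # leftover elements
        s0 += a[i]
    return s0 + s1 + s2 + s3

data = list(range(10))
print(sum_rolled(data), sum_unrolled4(data))   # 45 45
```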
Blocking
Blocking (also called tiling) is an optimization that operates on submatrices, or blocks, of data rather than on entire rows or columns, so that data loaded into the cache is reused before it is replaced.
Can help reduce the cache miss rate.
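A sketch of blocking applied to matrix multiplication, with an assumed block size of 2 for readability. The loops work on one small tile of each matrix at a time, so the same few rows and columns are reused while they would still be in the cache. Python lists will not show the cache effect; the point is the loop structure, which computes the same result as the unblocked version.

```python
def matmul_naive(A, B, n):
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_blocked(A, B, n, block=2):
    """Same product, computed one block x block tile at a time."""
    C = [[0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for kk in range(0, n, block):
                # Multiply one tile of A by one tile of B into one tile of C
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, n)):
                        total = C[i][j]
                        for k in range(kk, min(kk + block, n)):
                            total += A[i][k] * B[k][j]
                        C[i][j] = total
    return C

n = 5
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i * j for j in range(n)] for i in range(n)]
print(matmul_blocked(A, B, n) == matmul_naive(A, B, n))   # True
```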
Set Associative Cache
A set associative cache is a cache that has a fixed number of locations (at least two) where each block can be placed.
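A sketch of how a set-associative cache decides where a block may go, assuming hypothetical parameters (64-byte blocks, 4 sets, 2 ways). The block number modulo the number of sets picks the set, and the block may occupy any of the ways in that set. LRU replacement within a set is also an assumption; the card does not name a replacement policy.

```python
from collections import OrderedDict

BLOCK_SIZE = 64   # bytes per block (assumed)
NUM_SETS = 4      # number of sets (assumed)
WAYS = 2          # locations per set: a 2-way set-associative cache

# Each set remembers which blocks it holds, least recently used first
sets = [OrderedDict() for _ in range(NUM_SETS)]

def access(addr: int) -> bool:
    """Return True on a hit, False on a miss."""
    block = addr // BLOCK_SIZE
    s = sets[block % NUM_SETS]   # the only set this block may live in
    if block in s:
        s.move_to_end(block)     # mark as most recently used
        return True
    if len(s) == WAYS:
        s.popitem(last=False)    # evict the least recently used block
    s[block] = True
    return False

# Blocks 0 and 4 both map to set 0; with 2 ways they can stay resident together
results = [access(a) for a in (0, 256, 0, 256)]
print(results)   # [False, False, True, True]
```

In a direct-mapped cache (one way per set), the same four accesses would all miss, because blocks 0 and 4 would keep evicting each other.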
RAID 0 (Disk Striping)
Disk Striping. Disk striping requires at least two drives. It does not provide redundancy to data. If any one drive fails, all data is lost.
- Data striping without parity (found in RAID 0) offers zero redundancy or fault tolerance.
- Data striping with parity (found in RAID 4,5,& 6) does indeed offer redundancy or fault tolerance.
RAID 1 (mirroring)
Two drives are used in unison, and all data is written to both drives, giving you a mirror or extra copy of the data, in the case that one drive fails
RAID 3
Byte-level striping with dedicated parity. OBSOLETE, replaced with RAID 5.
RAID 4
Same striping as RAID 5
Block-level striping with dedicated parity. Not often used, replaced with RAID 5.
RAID 5
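Same striping as RAID 4
Block-level striping with distributed parity: parity blocks are spread across all disks instead of being kept on one dedicated parity disk. RAID 5 uses three or more disks and provides fault tolerance.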
RAID 6
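Disk striping with dual parity. RAID 6 uses four or more disks and provides fault tolerance. It can survive the failure of two drives.
Dual parity data striping: two independent parity blocks are computed for each stripe, which is what lets RAID 6 survive two drive failures.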
silicon crystal ingot
A rod composed of a silicon crystal with these dimensions:
- Diameter: 8 to 12 inches
- Length: 12 to 24 inches
wafer
A slice from a silicon ingot no more than 0.1 inches thick, used to create chips.
Instruction Set Architecture (ISA)
Also called architecture. An abstract interface between the hardware and the lowest-level software that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on
Transistor
An on/off switch controlled by an electric signal
very large-scale integrated (VLSI) circuit
A device containing hundreds of thousands to millions of transistors.
silicon
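A natural element that is a semiconductor. Adding materials to silicon allows tiny areas of it to be transformed into one of three types of devices: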
- Excellent conductors of electricity (using either microscopic copper or aluminum wire)
- Excellent insulators from electricity (like plastic sheathing or glass)
- Areas that can conduct or insulate under special conditions (as a switch)
Semiconductor
A substance that conducts electricity only under some conditions.
Die
The individual rectangular sections that are cut from a wafer, more informally known as chips.
complementary metal-oxide semiconductor (CMOS)
How is it dominant?
The dominant technology for integrated circuits, because of its:
- low power
- high noise immunity
- scalability
- cost effectiveness
LEGv8 word
A natural unit of access in a computer, usually a group of 32 bits (4 bytes)
LEGv8 register
- LEGv8 is a simple subset of the ARMv8 AArch64 architecture;
- it is a 64-bit architecture that uses 32-bit instructions.
- It has 32 registers, each 64 bits wide (one of them, XZR, always reads as zero).
More registers may lead to a slower clock frequency.
Data transfer instruction
A command that moves data between memory and registers.
Address
A value used to delineate the location of a specific data element within a memory array.
load
data transfer instruction that copies data from memory to a register
LEGv8 LDUR
The LEGv8 LDUR instruction loads a (32-bit or 64-bit) value from memory. The memory address used to fetch the value is calculated by adding a base address, held in a register, and an offset. The result is stored in a register.
The U in LDUR stands for unscaled immediate.
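A sketch of what LDUR does on a simulated byte-addressed memory: the effective address is the base register's value plus the offset, and 8 bytes starting there are loaded into the destination register. It assumes a 64-bit load and little-endian byte order; the register contents, addresses, and array values are made up. Because the offset is unscaled (counted in bytes), element i of a doubleword array sits i * 8 bytes past the base, so A[8] is at offset 64, as in LDUR X9, [X22, #64].

```python
import struct

memory = bytearray(1024)         # simulated byte-addressable memory (size assumed)
regs = {"X22": 100, "X9": 0}     # X22 holds the base address of array A (assumed)

# Store a hypothetical array A of ten doublewords starting at address 100
A = [i * 11 for i in range(10)]
for i, value in enumerate(A):
    struct.pack_into("<q", memory, regs["X22"] + i * 8, value)

def ldur(dest: str, base: str, offset: int) -> None:
    """LDUR dest, [base, #offset]: load 64 bits from base + offset (unscaled)."""
    address = regs[base] + offset   # offset is in bytes, not elements
    (regs[dest],) = struct.unpack_from("<q", memory, address)

ldur("X9", "X22", 64)   # A[8] is 8 doublewords * 8 bytes = 64 bytes in
print(regs["X9"])       # 88
```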
base address
A base address is a reference point for other addresses in computing.
- It's the starting memory address of a block of data or memory in a system's memory.
- Base addresses can refer to a specific location of a program component or data structure (5000 below)
base register
A base register is a register that holds a base address; for example, a pointer to a byte in memory such as the smallest legal physical memory address.
- It can also represent the beginning location of a memory array (X22 below)
offset
a constant value added to a base address to locate a particular array element (8 below)