OA Flashcards

1
Q

superscalar

A

Technique primarily associated with hardware.

  1. Functional units (ALU, floating-point unit, load/store unit) are replicated in the pipeline of a superscalar processor,
  2. which allows the hardware to issue multiple instructions per clock cycle, each to a different unit, so that they execute simultaneously.
2
Q

LEGv8 doubleword

A

Another natural unit of access in a computer, usually a group of 64 bits (8 bytes)

64 bits corresponds to the size of a register in the LEGv8 architecture. A byte is a group of 8 bits.

3
Q

virtual memory

A

A technique that uses main memory as a “cache” for secondary storage.

The address is broken into a virtual page number and a page offset
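A minimal sketch of that split, assuming 4 KiB pages (so a 12-bit page offset) and a 32-bit virtual address; the function name is hypothetical, not taken from any real system.

```python
PAGE_SIZE = 4096                           # assumed 4 KiB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 bits of page offset

def split_virtual_address(vaddr: int) -> tuple[int, int]:
    """Split a virtual address into (virtual page number, page offset)."""
    vpn = vaddr >> OFFSET_BITS             # high-order bits select the page
    offset = vaddr & (PAGE_SIZE - 1)       # low-order bits select the byte within the page
    return vpn, offset

vpn, offset = split_virtual_address(0x00403A7C)
print(hex(vpn), hex(offset))  # 0x403 0xa7c
```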

4
Q

Program counter (PC)

A

The register that contains the address of the next instruction to be executed

5
Q

LDUR

A

load register

6
Q

Register File

How is a register accessed?

A

A register file is a state element that consists of a set of registers that can be read and written by supplying a register number to be accessed. The register numbers come from fields of the instruction being executed; the control unit supplies control signals such as RegWrite, which enables a write.

In a GPU context (not LEGv8), a register file may provide 1024 scalar, 32-bit registers for up to 64 threads.
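A toy Python model of a LEGv8-style register file, as a sketch only: 32 registers of 64 bits, two read ports, and one write port gated by a RegWrite-style flag. The class and method names are hypothetical, and the model assumes register 31 behaves as XZR (always reads zero, writes ignored).

```python
class RegisterFile:
    """Toy register file: 32 x 64-bit registers, two read ports, one write port."""
    NUM_REGS = 32
    MASK64 = (1 << 64) - 1
    XZR = 31  # assumed zero register

    def __init__(self) -> None:
        self.regs = [0] * self.NUM_REGS

    def read(self, rn: int, rm: int) -> tuple[int, int]:
        # Two read ports: both register numbers come from instruction fields.
        return self._get(rn), self._get(rm)

    def write(self, rd: int, value: int, reg_write: bool) -> None:
        # The write happens only when the control signal is asserted.
        if reg_write and rd != self.XZR:
            self.regs[rd] = value & self.MASK64

    def _get(self, r: int) -> int:
        return 0 if r == self.XZR else self.regs[r]

rf = RegisterFile()
rf.write(9, 42, reg_write=True)
rf.write(10, 7, reg_write=False)   # ignored: RegWrite not asserted
print(rf.read(9, 10))  # (42, 0)
```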

7
Q

machine language

A

The language made up of binary-coded instructions that is used directly by the computer

8
Q

system software

A

The set of programs that enables a computer’s hardware devices and application software to work together; it includes the operating system and utility programs.

9
Q

operating system

A

Software that controls the execution of computer programs and may provide various services.

10
Q

Assembly Language

A

A programming language that has the same structure and set of commands as machine language but allows programmers to use symbolic representations of numeric machine code.

11
Q

IBM 360/91

  1. What’s Tomasulo’s algorithm?
  2. How is it related to Pentium?
A

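The IBM 360/91 was a powerful computer built by IBM in the 1960s. It introduced many new concepts, including dynamic detection of memory hazards, generalized forwarding, and reservation stations.

What’s Tomasulo’s algorithm? A dynamic scheduling scheme, devised by Robert Tomasulo for the 360/91’s floating-point unit, that lets instructions execute out of order as soon as their operands are ready. It uses reservation stations to hold waiting instructions and their operands, tags to rename registers, and a common data bus that broadcasts each result to every unit waiting for it.

How is the IBM 360/91 related to Pentium? Its dynamic scheduling ideas (reservation stations, register renaming, out-of-order execution) reappeared in Intel processors starting with the Pentium Pro, which combined them with a reorder buffer for in-order commit.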

12
Q

Dynamic Random Access Memory (DRAM)

A

DRAM is a type of memory that provides quick access to any data location.

  • Multiple DRAMs are used together to contain the instructions and data of a program.

Modern DRAMs are organized in banks, and each bank consists of a series of rows.

13
Q

frame buffering

A

A frame buffer is a portion of RAM containing a bitmap that drives a video display. It acts as a memory buffer holding a complete frame of data.

Memory buffers are temporary storage areas in RAM that hold data while it’s being transferred between devices or processes. They act as intermediate holding areas, ensuring smooth data flow and preventing delays caused by speed differences between hardware components. Examples include video streaming, keyboard input, and printing documents.

14
Q

Datapath

A

The datapath is a component of the processor that performs arithmetic operations

15
Q

Control Unit

A

The control unit is a component of the processor that commands the

  1. datapath,
  2. memory, and
  3. I/O devices

according to the instructions of the program.

16
Q

Integrated circuit

A

Also called a chip. A device combining dozens to millions of transistors.

17
Q

Central processor unit (CPU)

A

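Also called processor. The active part of the computer, which contains the datapath and control and which adds numbers, tests numbers, signals I/O devices to activate, and so on.

How does it do this? It fetches each instruction from memory at the address held in the program counter; the control decodes the instruction and signals the datapath, memory, and I/O devices to carry it out.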

18
Q

Static random access memory (SRAM)

A

Also memory built as an integrated circuit, but faster and less dense than DRAM.

19
Q

Instruction set architecture

A

Instruction set architecture is an abstract interface between the

  1. hardware and
  2. the lowest-level software,

which encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on.

Instruction set architecture is also called architecture.

20
Q

Application binary interface (ABI)

A

The application binary interface is the

  1. user portion of the instruction set plus
  2. the operating system interfaces used by application programmers.

It defines a standard for binary portability across computers.

21
Q

Volatile memory

A

Storage, such as DRAM, that retains data only while receiving power.

22
Q

Nonvolatile Memory

A

A form of memory that retains data even in the absence of a power source and that is used to store programs between runs. A DVD disk is nonvolatile.

23
Q

Magnetic disk

A

Also called hard disk. A form of nonvolatile secondary memory composed of rotating platters coated with a magnetic recording material.

Because they are rotating mechanical devices, access times are about 5 to 20 milliseconds and cost per gigabyte in 2012 was $0.05 to $0.10

24
Q

Main memory

A

Also called primary memory. Memory used to hold programs while they are running; typically consists of DRAM in today’s computers.

25
Q

Secondary memory

  1. Nonvolatile or volatile?
A

Nonvolatile memory used to store programs and data between runs; typically consists of flash memory in PMDs and magnetic disks in servers.

Nonvolatile memories retain stored information even when power is removed.

26
Q

Flash memory

A

A nonvolatile semiconductor memory.

  • It is cheaper and slower than DRAM
  • But more expensive per bit and faster than magnetic disks.

Access times are about 5 to 50 microseconds and cost per gigabyte in 2012 was $0.75 to $1.00.

27
Q

Single Instruction Single Data (SISD)

A

A uniprocessor

28
Q

Multiple Instruction Multiple Data (MIMD)

A

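Multiple instruction streams operate on multiple data streams, as in a multiprocessor. The conventional MIMD programming model is SPMD (single program, multiple data), where a single program runs across all processors.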

29
Q

Single Instruction Stream, Multiple Data Streams (SIMD)

A

The same instruction is applied to many data streams, as in a vector processor.

30
Q

Data-level parallelism

A

Parallelism achieved by performing the same operation on independent data. In simpler terms, it means you can process multiple pieces of data simultaneously if they don’t rely on each other.

For example, if you’re calculating the average of two separate lists of numbers, you can parallelize the task by calculating the average of each list independently.
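The averaging example can be sketched with Python's `concurrent.futures`: because no list's average depends on another's, each list can be handed to a separate worker. This only illustrates the independence that data-level parallelism relies on; CPython threads do not actually speed up CPU-bound work like this, and real data-level parallelism usually comes from SIMD or vector hardware. The function name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def parallel_averages(lists: list[list[float]]) -> list[float]:
    """Average each list independently; results come back in input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(mean, lists))

print(parallel_averages([[1.0, 2.0, 3.0], [10.0, 20.0]]))  # [2.0, 15.0]
```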

31
Q

LEGv8

A

A simple subset of the ARMv8 (AArch64) instruction set, used to study assembly language instructions.

32
Q

data hazard (pipeline data hazard)

A

When a planned instruction cannot execute in the proper clock cycle because data that is needed to execute the instruction are not yet available.

33
Q

forwarding (bypassing)

A method of reducing a data hazard

A

Forwarding or bypassing is a method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory
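A sketch of how a forwarding unit might choose the source of the first ALU operand in a five-stage pipeline, modeled on the usual EX-hazard and MEM-hazard conditions; the class, field, and function names are hypothetical. "10" means forward from the EX/MEM pipeline register, "01" from MEM/WB, and "00" means use the register file value.

```python
from dataclasses import dataclass

XZR = 31  # LEGv8 register 31 always reads as zero, so it is never forwarded

@dataclass
class PipelineRegister:
    """The two fields of a pipeline register that forwarding needs."""
    reg_write: bool  # will this instruction write a register?
    rd: int          # its destination register number

def forward_a(ex_mem: PipelineRegister, mem_wb: PipelineRegister, id_ex_rn1: int) -> str:
    """Choose the source of the ALU's first operand for the instruction in EX."""
    # EX hazard: the value was produced by the immediately preceding instruction.
    if ex_mem.reg_write and ex_mem.rd != XZR and ex_mem.rd == id_ex_rn1:
        return "10"
    # MEM hazard: produced two instructions earlier (checked second so the newer value wins).
    if mem_wb.reg_write and mem_wb.rd != XZR and mem_wb.rd == id_ex_rn1:
        return "01"
    # No hazard: the register file already holds the correct value.
    return "00"

# SUB X2, X1, X3 sits in EX/MEM while AND X12, X2, X5 is in EX.
print(forward_a(PipelineRegister(True, 2), PipelineRegister(False, 0), id_ex_rn1=2))  # 10
```

Checking EX/MEM first matters: when both pipeline registers hold a pending result for the same register, the more recent one must be forwarded.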

34
Q

Structural hazard

A

When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.

35
Q

Pipelining

A

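Technique that allows the CPU to work on more than one instruction at a time by overlapping their execution.

Formula (for n identical tasks, such as loads of laundry)
total time = (time for one task to pass through all stages) + (n - 1) × (longest stage time)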
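The formula as a small Python sketch; the stage times (wash 30, dry 40, fold 20 minutes) are assumed example values, and the function names are hypothetical.

```python
def pipelined_time(stage_times: list[float], n: int) -> float:
    """First task passes through every stage; after that, one task
    finishes every 'longest stage' time."""
    if n <= 0:
        return 0
    return sum(stage_times) + (n - 1) * max(stage_times)

def sequential_time(stage_times: list[float], n: int) -> float:
    """Each task must finish completely before the next one starts."""
    return n * sum(stage_times)

stages = [30, 40, 20]  # assumed: wash, dry, fold (minutes)
print(pipelined_time(stages, 4))   # 210
print(sequential_time(stages, 4))  # 360
```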

36
Q

R-format ALU operations

A

Requires register file and the ALU.

37
Q

Program Counter (PC)

A

The register that contains the address of the next instruction to be executed

38
Q

output

A

The results of the operation of any system.

39
Q

spatial locality

A

The principle stating that if a data location is referenced, data locations with nearby addresses will tend to be referenced soon.

40
Q

RAID 2

A

Bit-level striping with dedicated Hamming-code parity. OBSOLETE.

41
Q

Application Binary Interface (ABI)

A

The user portion of the instruction set plus the operating system interfaces used by application programmers.

It defines a standard for binary portability across computers.

42
Q

Smaller is faster

A

A very large number of registers may increase the clock cycle time simply because it takes electronic signals longer when they must travel farther.

  1. Guidelines such as “smaller is faster” are not absolutes; 31 registers may not be faster than 32.
  2. Even so, the truth behind such observations causes computer designers to take them seriously. In this case, the designer must balance the craving of programs for more registers with the designer’s desire to keep the clock cycle fast.
  3. Another reason for not using more than 32 is the number of bits it would take in the instruction format.
43
Q

commit unit

A

The commit unit is the unit in a dynamic or out-of-order execution pipeline that decides when it is safe to release the result of an operation to programmer-visible registers and memory.

44
Q

in-order commit

A

A commit in which the results of pipelined execution are written to the programmer-visible state in the same order that instructions are fetched.

45
Q

Exception Enable (Interrupt Enable)

A

A signal or action that controls whether the processor responds to an exception or not; necessary for preventing the occurrence of exceptions during intervals before the processor has safely saved the state needed for restart.

46
Q

Weak scaling

A

Weak scaling is the speed-up achieved on a multiprocessor while increasing the size of the problem proportionally to the increase in the number of processors.

Less limited by Amdahl’s Law than strong scaling, because the parallel part of the work grows as processors are added.

47
Q

Write serialization

A

Write serialization is a method that ensures writes to a location are seen in the same order by all processors.

  1. Maintaining the order of writes to a given location ensures that all processors sharing memory read the correct data.
  2. This is a cache-coherence property; it is unrelated to object serialization (saving an object as a sequence of bytes for storage or transmission over a network).

Imagine this: one processor writes 1 and then 2 to location X

  1. Write serialization guarantees that every processor sees these two writes in the same order.
  2. No processor can read X as 2 and then later read it as 1.
  3. Without this guarantee, different processors could disagree about the final value of X.
48
Q

multimedia extensions (MMX)

A

A multimedia extension is an expanded set of instructions supported by a processor that provides multimedia-specific functions.

49
Q

temporal locality

A

Temporal locality is the principle stating that if a data location is referenced, then it will tend to be referenced again soon.

50
Q

Memory hierarchy

A

A memory hierarchy is a structure that uses multiple levels of memories.

  1. As the distance from the processor increases,
  2. both the size of the memories and the access time increase.
51
Q

Block (or line)

A

The minimum unit of information that can be either present or not present in a cache.

52
Q

Hit rate

A

The hit rate is the fraction of memory accesses found in a level of the memory hierarchy.

53
Q

Miss rate

A

The fraction of memory accesses not found in a level of the memory hierarchy
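A small direct-mapped cache simulation ties hit rate, miss rate, and spatial locality together. The sizes (64 blocks of 64 bytes) and the function name are hypothetical. Reading 8-byte words sequentially misses only on the first word of each block, while a 64-byte stride touches a new block on every access.

```python
def simulate_direct_mapped(addresses: list[int], num_blocks: int = 64,
                           block_size: int = 64) -> tuple[float, float]:
    """Run byte addresses through an initially empty direct-mapped cache.

    Returns (hit_rate, miss_rate).
    """
    tags = [None] * num_blocks            # one tag per cache block; None means invalid
    hits = 0
    for addr in addresses:
        block_addr = addr // block_size   # memory block containing this byte
        index = block_addr % num_blocks   # the only cache slot this block may occupy
        tag = block_addr // num_blocks    # distinguishes blocks that share that slot
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag             # miss: bring the whole block into the cache
    hit_rate = hits / len(addresses)
    return hit_rate, 1 - hit_rate

sequential = list(range(0, 512, 8))   # 64 consecutive 8-byte words
strided = list(range(0, 4096, 64))    # 64 words, each in a different block
print(simulate_direct_mapped(sequential))  # (0.875, 0.125)
print(simulate_direct_mapped(strided))     # (0.0, 1.0)
```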

54
Q

miss penalty

A

The miss penalty is the time required to fetch a block into a level of the memory hierarchy from the lower level,

including the time it takes to:

  1. access the block,
  2. transmit it from one level to the other,
  3. insert it in the level that experienced the miss, and
  4. then pass the block to the requestor.
55
Q

Hit time

A

The time required to access a level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss.
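Hit time, miss rate, and miss penalty combine in the standard average memory access time formula, AMAT = hit time + miss rate × miss penalty. A sketch with assumed numbers (1-cycle hit time, 5% miss rate, 20-cycle miss penalty):

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty

print(amat(hit_time=1, miss_rate=0.05, miss_penalty=20))  # 2.0
```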

56
Q

Parallelization

A

Dividing a program into separate components that run in parallel on individual computers in the cluster.

57
Q

ARM architecture

A

ARM architecture is a contract between hardware and software dictating how they interact with each other. It specifies rules for hardware operations when executing instructions.

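Earlier (32-bit) ARM architectures can also support 16-bit instructions (the Thumb instruction set) alongside 32-bit instructions.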

58
Q

Amdahl’s Law

A

A formula used to find the maximum improvement possible by improving a particular part of a system. In parallel computing, Amdahl’s law is mainly used to predict the theoretical maximum speedup for program processing using multiple processors
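A sketch of Amdahl's Law for parallel processors, Speedup = 1 / ((1 - F) + F / N), where F is the fraction of the work that can run in parallel and N is the number of processors. The 90% parallel fraction is an assumed example, and the function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Upper bound on speedup when only parallel_fraction of the work
    is spread over n_processors and the rest stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

for n in (10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
# The speedup creeps toward 1 / 0.1 = 10 but never reaches it,
# because the 10% serial part does not shrink as processors are added.
```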

59
Q

multiprocessor

A

A term used to refer to a computer with more than one CPU.

60
Q

Uniform Memory Access (UMA)

A

A multiprocessor in which latency to any word in main memory is about the same no matter which processor requests the access.

61
Q

Non-Uniform Memory Access (NUMA)

A

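A single address space multiprocessor in which memory access latency varies: some accesses are much faster than others, depending on which processor asks for which word (for example, local versus remote memory).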

62
Q

loop unrolling

A

A technique to get more performance from loops that access arrays, in which multiple copies of the loop body are made and instructions from different iterations are scheduled together.
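A sketch of the transformation on a simple array sum, unrolled by an assumed factor of 4 with four independent partial sums (function names are hypothetical). In interpreted Python this gains nothing; it only shows the code shape a compiler produces so that operations from different iterations can be scheduled together. Integers are used because reordering floating-point additions can change the rounded result.

```python
def sum_rolled(a: list[int]) -> int:
    """Original loop: one element per iteration."""
    total = 0
    for x in a:
        total += x
    return total

def sum_unrolled4(a: list[int]) -> int:
    """Loop body copied four times, with four independent partial sums."""
    s0 = s1 = s2 = s3 = 0
    n = len(a) - len(a) % 4          # largest multiple of 4 that fits
    for i in range(0, n, 4):
        s0 += a[i]                   # these four additions do not depend on each other
        s1 += a[i + 1]
        s2 += a[i + 2]
        s3 += a[i + 3]
    for i in range(n, len(a)):       # clean-up loop for the 0-3 leftover elements
        s0 += a[i]
    return s0 + s1 + s2 + s3

print(sum_rolled(list(range(10))), sum_unrolled4(list(range(10))))  # 45 45
```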

63
Q

Blocking

A

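Blocking (also called tiling) is a technique that restructures loops to operate on submatrices, or blocks, that fit in the cache, rather than on entire rows or columns of an array. Each block’s data is reused as much as possible before it is replaced.

It can help reduce the cache miss rate.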
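A sketch of blocked matrix multiplication: the outer loops walk over `block` × `block` tiles, so each tile of A, B, and C is reused while it is likely still cached. The tile size of 2 and the function names are assumptions for the example; Python lists will not show the cache benefit, which appears in compiled code on large matrices.

```python
def naive_matmul(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    """Reference triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def blocked_matmul(A: list[list[int]], B: list[list[int]], block: int = 2) -> list[list[int]]:
    """Same product for n x n matrices, computed one block x block tile at a time."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for kk in range(0, n, block):
                # Accumulate the (ii, kk) tile of A times the (kk, jj) tile of B into C.
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, n)):
                        acc = C[i][j]
                        for k in range(kk, min(kk + block, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
print(blocked_matmul(A, B) == naive_matmul(A, B))  # True
```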

64
Q

Set Associative Cache

A

A set associative cache is cache that has a fixed number of locations (at least two) where each block can be placed.
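In a set associative cache, a block's set is its block address modulo the number of sets, and the block may go in any location within that set. A sketch (hypothetical function name) placing memory block 12 in an 8-block cache at three associativities:

```python
def set_index(block_address: int, num_blocks: int, associativity: int) -> int:
    """Set that a memory block maps to: block address modulo the number of sets."""
    num_sets = num_blocks // associativity
    return block_address % num_sets

print(set_index(12, 8, 1))  # 4  (direct mapped: 8 sets of 1 block)
print(set_index(12, 8, 2))  # 0  (2-way: 4 sets of 2 blocks)
print(set_index(12, 8, 8))  # 0  (fully associative: 1 set holding every block)
```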

65
Q

RAID 0 (Disk Striping)

A

Disk Striping. Disk striping requires at least two drives. It does not provide redundancy to data. If any one drive fails, all data is lost.

  1. Data striping without parity (found in RAID 0) offers zero redundancy or fault tolerance.
  2. Data striping with parity (found in RAID 4,5,& 6) does indeed offer redundancy or fault tolerance.
66
Q

RAID 1 (mirroring)

A

Two drives are used in unison, and all data is written to both drives, giving you a mirror or extra copy of the data, in the case that one drive fails

67
Q

RAID 3

A

Byte-level striping with dedicated parity. OBSOLETE, replaced with RAID 5.

  1. Data striping without parity (found in RAID 0) offers zero redundancy or fault tolerance.
  2. Data striping with parity (found in RAID 4,5,& 6) does indeed offer redundancy or fault tolerance.
68
Q

RAID 4

Same striping as RAID 5

A

Block-level striping with dedicated parity. Not often used, replaced with RAID 5.

  1. Data striping without parity (found in RAID 0) offers zero redundancy or fault tolerance.
  2. Data striping with parity (found in RAID 4,5,& 6) does indeed offer redundancy or fault tolerance.
69
Q

RAID 5

Same striping as RAID 4

A

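Block-level striping with distributed parity. RAID 5 uses three or more disks and provides fault tolerance; unlike RAID 4, it spreads the parity blocks across all disks instead of using a dedicated parity disk.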

  1. Data striping without parity (found in RAID 0) offers zero redundancy or fault tolerance.
  2. Data striping with parity (found in RAID 4,5,& 6) does indeed offer redundancy or fault tolerance.
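The parity used by RAID 4 and RAID 5 is a bitwise XOR across the data blocks of a stripe, which is what lets one lost block be rebuilt. A sketch with hypothetical helper names; it does not model where the parity block is stored (a dedicated disk in RAID 4, rotated across disks in RAID 5) or RAID 6's second parity.

```python
from functools import reduce

def parity(blocks: list[bytes]) -> bytes:
    """XOR corresponding bytes of equal-length blocks into one parity block."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def rebuild(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Recover the single missing data block: XOR of the survivors and the parity block."""
    return parity(surviving + [parity_block])

stripe = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]  # one stripe across three data disks
p = parity(stripe)
lost = stripe.pop(1)                              # simulate the failure of disk 1
print(rebuild(stripe, p) == lost)                 # True
```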
70
Q

RAID 6

A

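Block-level striping with dual distributed parity. RAID 6 uses four or more disks and provides fault tolerance. It can survive the failure of two drives.

Dual parity data striping: each stripe stores two independent parity blocks (often called P and Q), so the data can be rebuilt even after two drives fail.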

  1. Data striping without parity (found in RAID 0) offers zero redundancy or fault tolerance.
  2. Data striping with parity (found in RAID 4,5,& 6) does indeed offer redundancy or fault tolerance.
71
Q

silicon crystal ingot

A

A rod composed of a silicon crystal that has these dimensions

  1. 8 to 12 inches in diameter
  2. 12 to 24 inches long
72
Q

wafer

A

A slice from a silicon ingot no more than 0.1 inches thick, used to create chips.

73
Q

Instruction Set Architecture (ISA)

A

Also called architecture. An abstract interface between the hardware and the lowest-level software that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on

74
Q

Transistor

A

An on/off switch controlled by an electric signal

75
Q

very large-scale integrated (VLSI) circuit

A

A device containing hundreds of thousands to millions of transistors.

76
Q

silicon

A

A natural element that is a semiconductor. By adding special chemicals to silicon, tiny areas can be transformed into one of three devices:

  1. Excellent conductors of electricity (using either microscopic copper or aluminum wire)
  2. Excellent insulators from electricity (like plastic sheathing or glass)
  3. Areas that can conduct or insulate under special conditions (as a switch)
77
Q

Semiconductor

A

A substance that can conduct electricity (providing a flow of electricity) under some conditions

78
Q

Die

A

The individual rectangular sections that are cut from a wafer, more informally known as chips.

79
Q

complementary metal-oxide semiconductor (CMOS)

Why is it dominant?

A

Dominant technology for integrated circuits.

It dominates because of its:

  1. Low power consumption
  2. High noise immunity
  3. Scalability
  4. Cost-effectiveness
80
Q

LEGv8 word

A

A natural unit of access in a computer, usually a group of 32 bits

81
Q

LEGv8 register

A
  • LEGv8 is a simple subset of the ARMv8 AArch64 architecture
  • it is a 64-bit architecture that uses 32-bit instructions.
    • It has 32 registers, each 64-bits wide, (one of them always zero).

More registers may lead to a slower clock frequency, because signals must travel farther.

82
Q

Data transfer instruction

A

A command that moves data between memory and registers.

83
Q

Address

A

A value used to delineate the location of a specific data element within a memory array.

84
Q

load

A

data transfer instruction that copies data from memory to a register

85
Q

LEGv8 LDUR

A

The LEGv8 LDUR instruction loads a 64-bit doubleword from memory into a register (in full ARMv8, LDUR can also load a 32-bit word). The memory address used to fetch the value is calculated by adding an offset to a base address held in a register.

The U in LDUR stands for unscaled immediate: the offset is a byte offset that is not multiplied by the size of the data being loaded.
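A sketch of what LDUR and STUR compute, using a hypothetical byte-addressed memory and assuming little-endian byte order. Because the offset is unscaled, element A[8] of an array of doublewords sits 8 × 8 = 64 bytes past the base address; the base value 5000 and the register name X22 are illustrative only.

```python
MEMORY = bytearray(8192)   # hypothetical byte-addressed memory

def stur(value: int, base: int, offset: int) -> None:
    """Store a 64-bit doubleword at base + offset (assumed little-endian)."""
    addr = base + offset
    MEMORY[addr:addr + 8] = (value & (2**64 - 1)).to_bytes(8, "little")

def ldur(base: int, offset: int) -> int:
    """Load the 64-bit doubleword at base + offset, like LDUR Xt, [Xn, #offset]."""
    addr = base + offset   # effective address = base register value + byte offset
    return int.from_bytes(MEMORY[addr:addr + 8], "little")

X22 = 5000                # base register holding the start of array A (assumed)
stur(123, X22, 8 * 8)     # A[8] lives 8 elements x 8 bytes = 64 bytes past the base
print(ldur(X22, 64))      # 123
```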

86
Q

base address

A

A base address is a reference point for other addresses in computing.

- It's the starting memory address of a block of data or memory in a system's memory. 

- Base addresses can refer to a specific location of a program component or data structure (5000 below)
87
Q

base register

A

A base register holds a pointer to a byte in memory, such as the smallest legal physical memory address.

- It can also represent the beginning location of a memory array (X22 below)
88
Q

offset

A

a constant value added to a base address to locate a particular array element (8 below)

89
Q

Big Endian

A

A CPU or memory architecture in which the most significant byte is stored at the lowest memory address.
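A quick Python check of the two byte orders for one 64-bit value, where index 0 of each byte string plays the role of the lowest memory address:

```python
value = 0x0102030405060708
big = value.to_bytes(8, "big")        # most significant byte at the lowest address
little = value.to_bytes(8, "little")  # least significant byte at the lowest address
print(big.hex())     # 0102030405060708
print(little.hex())  # 0807060504030201
```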

90
Q

store register

A

Instruction complementary to load: it copies data from a register to memory.

The format is similar to load: the name of the operation, followed by the register to be stored, then the base register, and finally the offset to select the array element.

91
Q

LEGv8 STUR

A

store register

92
Q

spilling register

A

The process of putting less frequently used variables into memory when there are not enough registers:

  1. A variable is moved
  2. from a register to main memory (RAM),
  3. where it is stored temporarily
  4. until the program currently under execution needs it again and loads it back into a register.
93
Q

reservation station

A

A buffer within a functional unit that holds the operands and the operation.

94
Q

Reorder Buffer

A

The buffer that holds results in a dynamically scheduled processor until it is safe to store the results to memory or a register.

95
Q

out-of-order execution

A

A situation in pipelined execution when an instruction blocked from executing does not cause the following instructions to wait.

96
Q

VLIW

A

A style of instruction set architecture that launches many operations that are defined to be independent in a single wide instruction, typically with many separate opcode fields

97
Q

ARMv8 virtual memory

A
  1. Uses a 64-bit address space, but only the lower 48 bits are used for addressing.
  2. This allows for a very large virtual address space, even for systems with less physical memory.
  3. Virtual memory is a key concept in computer architecture that separates the logical memory space seen by programs from the physical memory space available in the system.

This provides several benefits, including:

Process isolation: Programs can be given their own virtual address space, which helps to prevent them from interfering with each other’s memory.

Efficient memory usage: Virtual memory allows programs to use more memory than is physically available by swapping data between physical memory and secondary storage (such as a hard disk).

Simplified memory management: Virtual memory makes it easier for the operating system to manage memory, as it can allocate and deallocate memory dynamically.

98
Q

address translation (address mapping)

Address mapping is used to access memory

A

The process by which a virtual address is mapped to an address used to access memory.

99
Q

Exception Syndrome Register (ESR)

A

Register that records the cause of the exception.

100
Q

FADDS, FSUBS

In computer architecture, FADDS and FSUBS are instructions for floating-point arithmetic:

A
  • FADDS: Floating-point addition (single precision). Adds two single-precision floating-point numbers.
  • FSUBS: Floating-point subtraction (single precision). Subtracts two single-precision floating-point numbers.

There are also double-precision variants (FADDD, FSUBD) for higher precision calculations.

101
Q

FADDD, FSUBD, FMULD, FDIVD

Floating-point addition, subtraction, multiplication, and division (double precision)

A

Double-precision arithmetic

  • Double-precision uses 64 bits (about 15 decimal digits) for higher accuracy and wider range, while
    • single-precision uses 32 bits (about 7 decimal digits).

Choose double-precision when high accuracy is crucial.
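A sketch of the precision difference: Python floats are IEEE 754 double precision, and round-tripping a value through the `struct` module's `"<f"` format rounds it to single precision (the helper name is hypothetical).

```python
import struct

def to_single(x: float) -> float:
    """Round a double-precision Python float to single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(0.1)             # 0.1
print(to_single(0.1))  # 0.10000000149011612  (error shows up after about 7 digits)
```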

102
Q

FCMPS, FCMPD

Floating-point Compare Single-precision

Floating-point Compare Double-precision

A

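Single- and double-precision floating-point comparison; the result sets the condition flags, which a following conditional branch can test.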

  • Double-precision uses 64 bits (about 15 decimal digits) for higher accuracy and wider range, while
  • single-precision uses 32 bits (about 7 decimal digits).

Choose double-precision when high accuracy is crucial.

103
Q

motivations for virtual memory

A
  1. To allow efficient and safe sharing of memory among several programs
  2. To remove the programming burdens of a small, limited amount of main memory [still being used today]
  3. To allow a single user program to exceed the size of primary memory.
104
Q

physical address

A

An address in main memory.

105
Q

Protection

A

It refers to the mechanisms implemented to prevent multiple processes running simultaneously from interfering with each other.

  1. This interference could be intentional or unintentional and may involve reading or modifying another process’s data.
  2. Protection mechanisms also safeguard the operating system itself from user processes.
    1. Essentially, these mechanisms create isolated environments for each process, ensuring they operate independently and securely.
106
Q

page

A

A virtual memory block.

All virtual memory systems relocate the program as a set of fixed-size blocks (pages).

107
Q

page fault

A

a virtual memory miss

108
Q

virtual address

A

An address that corresponds to a location in virtual space and is translated by address mapping to a physical address when memory is accessed.

109
Q

page table

Contains the virtual to physical address translation

A

The table containing the virtual to physical address translations in a virtual memory system.

  1. The table, which is stored in memory, is typically indexed by the virtual page number taken from the virtual address;
  2. Each entry in the table contains the physical page number for that virtual page if the page is currently in memory.
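A sketch of a single-level translation, assuming 4 KiB pages and using a Python dict as a hypothetical stand-in for the page table (only valid entries are present). A missing entry raises a page fault; the names are illustrative.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

class PageFault(Exception):
    """Raised when the page is not currently in main memory."""

def translate(vaddr: int, page_table: dict[int, int]) -> int:
    """Map a virtual address to a physical address.

    page_table maps virtual page number -> physical page number
    for the pages that are currently in memory.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # the virtual page number indexes the table
    if vpn not in page_table:
        raise PageFault(f"virtual page {vpn:#x} is not in memory")
    ppn = page_table[vpn]
    return ppn * PAGE_SIZE + offset          # the page offset is not translated

table = {0x403: 0x1F}
print(hex(translate(0x00403A7C, table)))  # 0x1fa7c
```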
110
Q

swap space

A

The space on the disk reserved for the full virtual memory space of a process

111
Q

Reference Bit (Use Bit or Access Bit)

A

A field that is set whenever a page is accessed and that is used to implement LRU or other replacement schemes.

ARMv8 calls it an access bit

112
Q

Techniques for reducing the maximum storage required for page tables

A
  1. Keep a limit register that restricts the size of the page table for a given process, and add more entries as needed.
  2. A limit register for each segment specifies the current size of the segment, which grows in units of pages. This type of segmentation is used by many architectures, including ARMv8 and MIPS. Unlike the type of segmentation discussed in a previous elaboration, this form of segmentation is invisible to the application program, although not to the operating system. This does not work well when the address space is used sparsely rather than contiguously.
  3. Apply a hashing function to the virtual address so that the table needs to be only the size of the number of physical pages in main memory. This is known as an inverted page table. The lookup process can be more complex because the table can no longer be indexed directly by the virtual page number.
  4. Allow the page tables to be paged. It works by allowing the page tables to reside in the virtual address space.
  5. Multiple levels of page tables and is the solution that ARMv8 uses to reduce the memory footprint of address translation. This scheme allows the address space to be used in a sparse fashion (multiple noncontiguous segments can be active) without having to allocate the entire page table. Useful with very large address spaces and in software systems that require noncontiguous allocation. The primary disadvantage of this multi-level mapping is the more complex process for address translation.
113
Q

Least Recently Used (LRU)

A

A replacement scheme in which the block replaced is the one that has been unused for the longest time
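A sketch of LRU replacement for a small fully associative set of blocks, using `collections.OrderedDict` to keep blocks ordered from least to most recently used (the class name is hypothetical). Real hardware usually only approximates LRU, for example with reference bits.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity set of blocks that replaces the least recently used one."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks: OrderedDict = OrderedDict()   # least -> most recently used

    def access(self, block) -> bool:
        """Reference a block; return True on a hit, False on a miss."""
        if block in self.blocks:
            self.blocks.move_to_end(block)         # now the most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)        # evict the least recently used
        self.blocks[block] = None
        return False

cache = LRUCache(2)
print([cache.access(b) for b in "ABACB"])  # [False, False, True, False, False]
print(list(cache.blocks))                  # ['C', 'B']
```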

114
Q

dirty bit

A

Indicates whether a page has been written since it was read into memory.

115
Q

Translation Lookaside Buffer (TLB)

A

A cache that keeps track of recently used address mappings to try to avoid an access to the page table. Can be used to improve access performance by relying on locality of reference

TLB size: 16-512 entries
Block size: 1-2 page table entries (typically 4-8 bytes each)
Hit time: 0.5-1 clock cycle
Miss penalty: 10-100 clock cycles
Miss rate: 0.01%-1%

116
Q

The Intrinsity FastMATH TLB

A

The memory system uses 4 KiB pages and just a 32-bit address space; thus, the virtual page number is 20 bits long. The physical address is the same size as the virtual address. The TLB contains 16 entries, it is fully associative, and it is shared between the instruction and data references. Each entry is 64 bits wide and contains a 20-bit tag (which is the virtual page number for that TLB entry), the corresponding physical page number (also 20 bits), a valid bit, a dirty bit, and other bookkeeping bits. The FastMATH is a MIPS-based processor, and like most MIPS systems, it uses software to handle TLB misses.

117
Q

TLB miss

A

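Indicates that the translation for a page is not in the TLB. The missing translation is then loaded from the page table into the TLB, either by hardware or by a software exception handler; if the page itself is not in memory, a page fault occurs.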

118
Q

TLB events combination

A

+------+------------+-------+------------+
| TLB  | Page table | Cache | Result     |
+------+------------+-------+------------+
| Hit  | Hit        | Miss  | Possible   |
| Miss | Hit        | Hit   | Possible   |
| Miss | Hit        | Miss  | Possible   |
| Miss | Miss       | Miss  | Possible   |
| Hit  | Miss       | Miss  | Impossible |
| Hit  | Miss       | Hit   | Impossible |
| Miss | Miss       | Hit   | Impossible |
+------+------------+-------+------------+
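
The pattern in the table reduces to one rule, sketched below under the assumption of a physically addressed cache: if the page is not in memory (a page-table miss), neither the TLB nor the cache can hit. The function name `possible` is illustrative. The enumeration also prints the Hit/Hit/Hit combination, which is possible, although the page table is never actually consulted on a TLB hit.

```python
from itertools import product


def possible(tlb_hit, page_table_hit, cache_hit):
    """Can this event combination occur? Assumes a physically addressed cache."""
    if page_table_hit:
        # The page is in memory, so the TLB and the cache may each hit or miss.
        return True
    # The page is not in memory: the TLB cannot hold a valid translation for it,
    # and the cache cannot hold data from it.
    return not tlb_hit and not cache_hit


def label(hit):
    return "Hit" if hit else "Miss"


for tlb, pt, cache in product([True, False], repeat=3):
    verdict = "Possible" if possible(tlb, pt, cache) else "Impossible"
    print(f"{label(tlb):<5}{label(pt):<11}{label(cache):<6}{verdict}")
```
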

119
Q

virtually addressed cache

A

A cache that is accessed with a virtual address rather than a physical address

The TLB is not used during a normal cache access, which takes it out of the critical path and reduces cache latency.

120
Q

Aliasing

A

A situation in which two addresses access the same object; it can occur in virtual memory when there are two virtual addresses for the same physical page.

121
Q

Physically addressed cache

A

A cache that is addressed by a physical address.

122
Q

context switch

A

A change of the internal state of the processor that allows a different process to use the processor; it includes saving the state needed to return to the currently executing process.

123
Q

syscall

A

Generates a system call exception that transfers control to a dedicated location in supervisor code space, giving the operating system control. The process returns to user mode via the ERET (exception return) instruction.

124
Q

L1 cache (primary cache)

A

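The first-level cache, closest to the processor. It is small and fast; its misses are serviced by the L2 (secondary) cache or by main memory.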

125
Q

L2 cache

A

A second-level cache between the L1 cache and main memory; it is accessed whenever a miss occurs in the L1 cache.

It is faster than memory, but tends to be larger and slower than the L1 cache

126
Q

Virtual machines

A

Developed in the mid-1960s.

Benefits:
Managing software
Managing hardware

127
Q

virtual machine manager (VMM)

A

Hypervisor

The VMM usually runs in system (supervisor) mode, while guest VMs run in user mode.

hardware = host
VMs = guest

It determines how to map virtual resources to physical resources.

It is also much smaller than a traditional OS; the isolation portion of a VMM is only 10,000 lines of code

128
Q

Instruction Set Architecture (ISA)

A

The part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O.

129
Q

Memory Hierarchy

A

A structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase.

130
Q

Strong scaling

A

Speed-up achieved on a multiprocessor without increasing the size of the problem.

131
Q

multiprocessor architecture

A

A unified graphics and computing multiprocessor executes vertex, geometry, and pixel fragment shader programs, and parallel computing programs

advantages:
Increased throughput
Cost saving
Increased reliability

132
Q

Streaming processor (SP)

A

the primary thread instruction processor in the multiprocessor

133
Q

Special function unit (SFU)

A

Computes 32-bit floating-point approximations to reciprocal, reciprocal square root, and key transcendental functions. It also implements 32-bit floating-point planar attribute interpolation for pixel shaders, providing accurate interpolation of attributes such as color, depth, and texture coordinates.

134
Q

Uniform Memory Access (UMA)

A

A multiprocessor in which latency to any word in main memory is about the same no matter which processor requests the access.

135
Q

Non-Uniform Memory Access (NUMA)

A

A single address space multiprocessor in which some memory accesses are much faster than others, depending on which processor asks for which word.

136
Q

throughput

A

the amount of work performed by a system during a given period of time

137
Q

CPU Time Formula

A

CPU time = Instruction count × CPI × Clock cycle time
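
A worked instance of the formula, using hypothetical numbers: 2 billion instructions at a CPI of 1.5 on a 2 GHz clock (0.5 ns cycle time).

```python
def cpu_time(instruction_count, cpi, clock_cycle_time_s):
    """CPU time = instruction count x CPI x clock cycle time, in seconds."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical program: 2e9 instructions, CPI 1.5, 2 GHz clock (cycle time = 1 / 2e9 s)
t = cpu_time(2e9, 1.5, 1 / 2e9)
print(f"{t:.3f} s")  # 1.500 s
```

Since clock cycle time is the reciprocal of clock rate, the same result is Instruction count × CPI / Clock rate = 3e9 cycles / 2e9 cycles per second.
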

138
Q

Set Associative Cache

A

A cache that has a fixed number of locations (at least two) where each block can be placed

Set index = (Block number) modulo (Number of sets in the cache)

Degrees of associativity:
one-way (equivalent to a direct-mapped cache)
two-way
four-way
eight-way

Increasing associativity usually decreases the miss rate but can increase the hit time.
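
A short sketch of the placement rule, assuming a hypothetical cache of eight blocks organized one-way, two-way, and eight-way; the function names are illustrative.

```python
def set_index(block_number, num_sets):
    """Set a memory block maps to: block number modulo number of sets."""
    return block_number % num_sets


def set_index_from_address(byte_address, block_size_bytes, num_sets):
    """Same mapping starting from a byte address."""
    return set_index(byte_address // block_size_bytes, num_sets)


# An 8-block cache: more ways means fewer sets.
for ways in (1, 2, 8):
    num_sets = 8 // ways
    print(f"{ways}-way: {num_sets} sets, block 12 -> set {set_index(12, num_sets)}")
# 1-way: 8 sets, block 12 -> set 4
# 2-way: 4 sets, block 12 -> set 0
# 8-way: 1 sets, block 12 -> set 0
```

With eight ways there is a single set, so in this eight-block example the cache behaves as fully associative.
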

139
Q

Fully associative cache

A

A cache structure in which a block can be placed in any location in the cache.

140
Q

CDC 6600

A

This system is widely considered to have been the first supercomputer. It was also the first load-store architecture.

141
Q

Tomasulo’s Algorithm

A

An algorithm for dynamic scheduling and out-of-order execution

uses dynamic hazard detection, generalized forwarding, and reservation stations.

142
Q

IBM 7030

A

AKA Stretch

Produced with the goal of being 100 times faster than the previous IBM 704

143
Q

Imprecise interrupt

A

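Also called an imprecise exception: an interrupt or exception in a pipelined computer that is not associated with the exact instruction that caused it. The unpopularity of imprecise interrupts led to the standard use of commit units in dynamically scheduled pipelined processors.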

144
Q

Digital Equipment Corporation (DEC)

A

A major American company in the computer industry from the 1950s to the 1990s

145
Q

PDP-8

A

First commercial minicomputer introduced by Digital Equipment Corporation

cost under $20,000

146
Q

Intel 4004

A

First microprocessor

147
Q

supercomputer

A

a particularly powerful mainframe computer.

148
Q

Seymour Cray

A

American inventor of the Cray supercomputer.

149
Q

Least Significant Bits

A

The rightmost (low-order) bits of a word; the least significant bit is the rightmost bit, bit 0.

150
Q

most significant bit

A

The leftmost bit (bit 63 of a LEGv8 doubleword)

151
Q

Sign and magnitude representation

A

a signed number representation where a single bit is used to represent the sign and the remaining bits represent the magnitude
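
A minimal sketch of sign-and-magnitude encoding, assuming an arbitrary bit width; the helper names are illustrative. It also shows one known drawback of the representation: a separate "negative zero" pattern.

```python
def to_sign_magnitude(value, bits):
    """Encode an integer as a `bits`-wide sign-and-magnitude pattern."""
    magnitude = abs(value)
    if magnitude >= 1 << (bits - 1):
        raise ValueError("magnitude does not fit")
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | magnitude


def from_sign_magnitude(pattern, bits):
    """Decode a `bits`-wide sign-and-magnitude pattern."""
    magnitude = pattern & ((1 << (bits - 1)) - 1)
    return -magnitude if pattern >> (bits - 1) else magnitude


print(format(to_sign_magnitude(-5, 8), "08b"))  # 10000101
print(from_sign_magnitude(0b10000000, 8))       # 0 (the "negative zero" pattern)
```
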

152
Q

sign extension

A

The function of a signed load is to copy the sign bit repeatedly to fill the rest of the register.

153
Q

LEGv8 fields

A
  1. opcode - 11 bits - basic operation of the instruction
  2. rm - 5 bits - the second register source operand
  3. shamt - 6 bits - shift amount
  4. rn - 5 bits - the first register source operand
  5. rd - 5 bits - the register destination operand
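
A sketch of packing the R-format fields above into a 32-bit instruction word, in the listed order from most significant to least significant bits. The example is ADD X9, X20, X21, whose ADD opcode is 1112 (binary 10001011000); the function name is illustrative.

```python
# Field widths, most significant first: opcode | Rm | shamt | Rn | Rd
R_FIELDS = (("opcode", 11), ("rm", 5), ("shamt", 6), ("rn", 5), ("rd", 5))


def encode_r(opcode, rm, shamt, rn, rd):
    """Pack LEGv8 R-format fields into a 32-bit instruction word."""
    values = {"opcode": opcode, "rm": rm, "shamt": shamt, "rn": rn, "rd": rd}
    word = 0
    for name, width in R_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


# ADD X9, X20, X21: opcode 1112, Rm = 21, shamt = 0, Rn = 20, Rd = 9
print(hex(encode_r(1112, 21, 0, 20, 9)))  # 0x8b150289
```
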
154
Q

LEGv8 conditional branch instructions (B.cond): B.EQ, B.NE, B.LT, B.LE, B.GT, B.GE

A

B.EQ
equal

B.NE
not equal

B.LT
less than

B.LE
less than or equal to

B.GT
greater than

B.GE
greater than or equal to

155
Q

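condition codes (flags)

A

4 bits are used

In MIPS, two registers are compared and the result of the comparison is stored in a third register; a conditional branch instruction then tests the value of that third register to see whether the condition is true or false. LEGv8 instead records the outcome of flag-setting instructions such as SUBS in four condition-code bits, which B.cond instructions test: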

Negative (N)
Zero (Z)
Overflow (V)
Carry (C)
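
A minimal sketch of how the four flags could be computed for a 64-bit subtraction such as SUBS; the helper name `subs_flags` is hypothetical. Operands are 64-bit register bit patterns, and the carry follows the ARM convention for subtraction (C = 1 when no borrow occurs).

```python
MASK64 = (1 << 64) - 1


def subs_flags(a, b):
    """Return (result, N, Z, C, V) for a 64-bit SUBS computing a - b.

    a and b are 64-bit register patterns in the range 0 .. 2**64 - 1.
    """
    result = (a - b) & MASK64
    n = result >> 63              # sign bit of the result
    z = int(result == 0)          # result is zero
    c = int(a >= b)               # ARM: carry = NOT borrow for subtraction
    # Signed overflow: the operands have different signs and the result's
    # sign differs from the sign of a.
    v = ((a ^ b) & (a ^ result)) >> 63
    return result, n, z, c, v


print(subs_flags(3, 5)[1:])  # (1, 0, 0, 0)
```
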

156
Q

LEGv8 branches on single flags: B.MI, B.PL, B.VS, B.VC

A

B.MI
branch on minus
N=1

B.PL
branch on plus
N=0

B.VS
branch on overflow set
V=1

B.VC
branch on overflow clear
V=0
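
A sketch of how each B.cond mnemonic from these cards could be evaluated from the flags, assuming the flags were set by a SUBS (compare) of a - b. The signed conditions (LT, LE, GT, GE) use N and V together because overflow flips the apparent sign of the result. The table and function names are illustrative.

```python
# Condition tests for LEGv8 B.cond, given flags N, Z, C, V (each 0 or 1).
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "LT": lambda n, z, c, v: n != v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "GE": lambda n, z, c, v: n == v,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
}


def branch_taken(cond, n, z, c, v):
    """Return True if B.<cond> would branch for the given flags."""
    return CONDITIONS[cond](n, z, c, v)


# Flags after comparing 3 with 5 (3 - 5): N=1, Z=0, C=0, V=0
taken = [name for name in CONDITIONS if branch_taken(name, 1, 0, 0, 0)]
print(taken)  # ['NE', 'LT', 'LE', 'MI', 'VC']
```
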

157
Q

branch-and-link instruction

A

An instruction that branches to an address and simultaneously saves the address of the following instruction in a register (LR, or X30, in LEGv8).

158
Q

Return address

A

A link to the calling site that allows a procedure to return to the proper address; in LEGv8 it is stored in register LR (X30).

159
Q

Caller

A

The program that instigates a procedure and provides the necessary parameter values.

160
Q

Callee

A

A procedure that executes a series of stored instructions based on parameters provided by the caller and then returns control to the caller.

161
Q

overflow (floating point)

A

A situation in which a positive exponent becomes too large to fit in the exponent field.

162
Q

underflow (floating point)

A

A situation in which a negative exponent becomes too large to fit in the exponent field.

163
Q

double precision

A

A floating-point value represented in a 64-bit doubleword.

164
Q

single precision

A

A floating-point value represented in a single 32-bit word.
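
A short sketch that unpacks the three IEEE 754 single-precision fields (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits) using Python's standard `struct` module; the function name is illustrative. Double precision has the same layout with 11 exponent bits (bias 1023) and 52 fraction bits.

```python
import struct


def float32_fields(x):
    """Return (sign, biased exponent, fraction) of x rounded to single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8-bit exponent, bias 127
    fraction = bits & ((1 << 23) - 1)   # 23-bit fraction
    return sign, exponent, fraction


# -0.75 = -1.1 (binary) x 2^-1, so the biased exponent is -1 + 127 = 126
print(float32_fields(-0.75))  # (1, 126, 4194304)
```
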

165
Q

subword parallelism (data level parallelism)

A

Parallelism that occurs within a wide word: a single instruction operates on several narrower data items packed into one register.

Such instructions are also known as vector or SIMD (single instruction, multiple data) instructions (see COD Section 6.6 (Introduction to graphics processing units)). The rising popularity of multimedia applications led to arithmetic instructions that support narrower operations that can easily compute in parallel.
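
Hardware does this with dedicated instructions, but the idea can be imitated in software (a technique often called SWAR, "SIMD within a register"). The sketch below, with illustrative names, adds eight 8-bit lanes packed into 64-bit words in one pass while keeping carries from crossing lane boundaries.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F   # low 7 bits of each byte lane
HIGH = 0x8080808080808080   # top bit of each byte lane


def add_bytes_packed(a, b):
    """Add eight 8-bit lanes packed in 64-bit words, wrapping within each lane."""
    # Add the low 7 bits of each lane; the sum fits in 8 bits, so no carry
    # can cross into the neighboring lane.
    partial = (a & LOW7) + (b & LOW7)
    # Fold in each lane's top bit with XOR: that gives the top sum bit,
    # and the lane's carry out is discarded.
    return partial ^ ((a ^ b) & HIGH)


def pack(lanes):
    return int.from_bytes(bytes(lanes), "little")


def unpack(word):
    return list(word.to_bytes(8, "little"))


x = pack([1, 2, 3, 250, 128, 0, 255, 10])
y = pack([1, 1, 1, 10, 128, 0, 1, 20])
print(unpack(add_bytes_packed(x, y)))  # [2, 3, 4, 4, 0, 0, 0, 30]
```
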

166
Q

accumulator

A

Archaic term for register. On-line use of it as a synonym for “register” is a fairly reliable indication that the user has been around quite a while

167
Q

Load-Store Architecture (register-register architecture)

A

An instruction set architecture in which all operations are between registers and data memory may only be accessed via loads or stores

168
Q

RISC Architecture

A

Reduced instruction set computer architecture

Relies on small and simple instructions instead of more complex and specialized instructions. Most current instruction sets employ this architecture model.

169
Q

High-level-language computer architecture

A

Proposed in the 1960s, high-level-language architecture failed to make much of a commercial impact. Better compilers and programming languages, and growing memory sizes led to this architecture’s demise.

170
Q

Accumulator Architecture

A

One operand of a binary operation is implicitly in the accumulator

The earliest computers had only one register to perform arithmetic operations. All operations would accumulate in the single register, called the accumulator.

171
Q

Stack Architecture

A

No registers

In the 1960s, believing that compilers were not good at register allocation, some companies eliminated registers and instead transferred operands onto and off of the stack, similar to what was done in Hewlett-Packard calculators.

172
Q

ARMv7

A

ARM started as the processor for the Acorn computer, hence its original name of Acorn RISC Machine. The Berkeley RISC papers influenced its architecture.

173
Q

ARMv8

A

An extension of ARMv7 with 64-bit addresses. ARM took the opportunity to redesign the instruction set to make it look much more like MIPS than like earlier ARM versions.

174
Q

LEGv8 instructions: SDIV, SUBI, MOVZ, STUR, LSL, LSR

A

SDIV
Signed divide

SUBI
subtract immediate

MOVZ
move wide with zero

STUR
store register

LSL
logical shift left

LSR
Logical Shift Right

175
Q

Coprocessor

A

An additional chip that accelerates a portion of the work of a processor; for example, a floating-point coprocessor accelerates floating-point computation.

176
Q

Combinational element

A

An operational element, such as an AND gate or an ALU.

177
Q

State element

A

A memory element, such as a register or a memory.

178
Q

clocking methodology

A

defines when signals can be read and when they can be written

179
Q

Edge-triggered clocking

A

any values stored in a sequential logic element are updated only on a clock edge

Edge-triggered state elements make simultaneous reading and writing both possible and unambiguous.

180
Q

Control signal

A

A signal used for multiplexor selection or for directing the operation of a functional unit; contrasts with a data signal, which contains information that is operated on by a functional unit.

181
Q

Asserted

A

The signal is logically high or true

182
Q

Deasserted

A

The signal is logically low or false.

183
Q

Rising vs. falling clock edge

A

rising clock edge
0 to 1

falling clock edge
1 to 0

184
Q

Sign-extend

A

To increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger, destination data item.
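
A minimal sketch of sign extension on raw bit patterns, assuming a 64-bit destination by default (the LEGv8 register width); the function name is illustrative.

```python
def sign_extend(value, from_bits, to_bits=64):
    """Replicate bit (from_bits - 1) into the upper bits of a to_bits-wide pattern."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):  # is the sign bit set?
        upper = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
        value |= upper            # fill the high-order bits with 1s
    return value


print(hex(sign_extend(0xFFFE, 16)))  # 0xfffffffffffffffe (-2 in 64 bits)
print(hex(sign_extend(0x7FFE, 16)))  # 0x7ffe (positive: upper bits stay 0)
```
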

185
Q

Branch target address

A

The address specified in a branch, which becomes the new program counter if the branch is taken. In the LEGv8 architecture, the branch target is given by the sum of the offset field of the instruction and the address of the branch; the offset is sign-extended and shifted left 2 bits, since it counts words rather than bytes.
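
A sketch of the target computation for a conditional branch such as CBZ, assuming its 19-bit offset field; the helper names are illustrative.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def cbz_target(branch_pc, imm19):
    """Branch target: address of the branch + sign-extended word offset x 4."""
    return branch_pc + (to_signed(imm19, 19) << 2)


print(hex(cbz_target(0x1000, 5)))              # 0x1014 (5 words forward)
print(hex(cbz_target(0x1000, (1 << 19) - 3)))  # 0xff4 (3 words back)
```
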

186
Q

Branch taken

A

A branch where the branch condition is satisfied and the program counter becomes the branch target. All unconditional branches are taken branches

187
Q

branch not taken or (untaken branch)

A

A branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch.

188
Q

ALUOp

A

The 2-bit control field that, together with the opcode field of the instruction, is input to a small control unit that generates the 4-bit ALU control input.

indicates whether the operation to be performed should be add (00) for loads and stores, pass input b (01) for CBZ, or be determined by the operation encoded in the opcode field (10)

10 is ORR, AND, SUB, ADD (R-type)
00 is LDUR, STUR
01 is CBZ

189
Q

ALU control lines

A

0000 is AND
0001 is OR
0010 is ADD
0110 is SUBTRACT
0111 is pass input b
1100 is NOR
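
A sketch of the small ALU control unit that combines the two cards above: ALUOp selects add or pass-b directly, and for R-type instructions the 11-bit opcode picks the operation. The dictionary and function names are illustrative; only the four R-type instructions named on the ALUOp card are covered.

```python
# 11-bit LEGv8 R-type opcodes and the 4-bit ALU control lines they select.
R_TYPE_ALU_CONTROL = {
    0b10001011000: 0b0010,  # ADD -> add
    0b11001011000: 0b0110,  # SUB -> subtract
    0b10001010000: 0b0000,  # AND -> AND
    0b10101010000: 0b0001,  # ORR -> OR
}


def alu_control(alu_op, opcode=None):
    """Map the 2-bit ALUOp (plus the opcode for R-type) to the 4-bit ALU control lines."""
    if alu_op == 0b00:   # LDUR/STUR: add to form the memory address
        return 0b0010
    if alu_op == 0b01:   # CBZ: pass input b through so it can be tested for zero
        return 0b0111
    if alu_op == 0b10:   # R-type: decode the opcode field
        return R_TYPE_ALU_CONTROL[opcode]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")


print(format(alu_control(0b10, 0b10101010000), "04b"))  # 0001 (ORR)
```
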

190
Q

Pipeline stall (bubble)

A

A stall initiated in order to resolve a hazard. A common cause is the load-use data hazard, a specific form of data hazard in which the data being loaded by a load instruction have not yet become available when they are needed by another instruction.

191
Q

control hazard (branch hazard)

A

When the proper instruction cannot execute in the proper pipeline clock cycle because the instruction that was fetched is not the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected.

192
Q

Branch prediction

A

A method of resolving a branch hazard that assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.

193
Q

five-stage pipeline

A

In the steady state, five instructions are in execution during any single clock cycle, one in each stage:

  1. IF - instruction fetch
  2. ID - instruction decode and register file read
  3. EX - execution or address calculation
  4. MEM - data memory access
  5. WB - write back
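
A sketch of the ideal pipeline diagram, assuming no hazards or stalls: instruction i occupies stage s in cycle i + s, so n instructions finish in 5 + (n - 1) cycles. The function name is illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(n_instructions):
    """Stage occupied by each instruction in each cycle of an ideal 5-stage pipeline."""
    total_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i is in stage s during cycle i + s
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_chart(5)):
    print(f"I{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

In cycle 4 (counting from 0) every stage is busy with a different instruction, which is the steady state the card describes.
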
194
Q

Instruction Fetch (IF)

A

Move instruction from memory to the control unit

195
Q

Instruction decode (ID)

A

Pull apart the instruction, set up the operation in the ALU, and compute the source and destination operand addresses

196
Q

Execute (EX)

A

ALU is used to perform the instruction’s operation or to compute an address, or an adder is used for branches; both are depicted using an ALU symbol.

197
Q

Data memory access (MEM)

A

the data memory (DM) may be read (for a load instruction) or written (for a store instruction). For load, the right half is shaded, indicating read. (For store, the left half would be shaded).

198
Q

Write back (WB)

A

the register file (Reg) may be written by certain instructions (like R-type instructions). The left half is shaded to indicate write (vs. read). Although two Reg icons appear in the stylized depictions, only one register file exists.

199
Q

Branch prediction buffer

A

Also called branch history table. A small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.
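
A minimal sketch of a branch prediction buffer that uses 2 bits per entry (a saturating counter), one common choice for the "one or more bits". The table size, initial counter value, and class name are assumptions for illustration. For a loop branch that is taken nine times and then falls through, the 2-bit scheme mispredicts only once per loop exit after warming up.

```python
class TwoBitPredictor:
    """Branch prediction buffer of 2-bit saturating counters indexed by low PC bits."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start "weakly not taken" (assumed)

    def _index(self, pc):
        return (pc >> 2) & self.mask  # instructions are 4 bytes; drop the zero bits

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2  # 2 or 3 means predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bp = TwoBitPredictor()
outcomes = ([True] * 9 + [False]) * 2  # a loop branch, run through the loop twice
mispredicts = 0
for taken in outcomes:
    if bp.predict(0x400) != taken:
        mispredicts += 1
    bp.update(0x400, taken)
print(mispredicts)  # 3
```
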

200
Q

Dynamic branch prediction

A

prediction of branches at runtime using runtime information

201
Q

Branch target buffer

A

A structure that caches the destination PC or destination instruction for a branch. It is usually organized as a cache with tags, making it more costly than a simple prediction buffer.

202
Q

Correlating predictor

A

A branch predictor that combines local behavior of a particular branch and global information about the behavior of some recent number of executed branches.

203
Q

Tournament branch predictor

A

A branch predictor with multiple predictions for each branch and a selection mechanism that chooses which predictor to enable for a given branch.

204
Q

Microarchitecture

A

The organization of the processor, including the major functional units, their interconnection, and control.

205
Q

Architectural registers

A

The instruction-set-visible registers of a processor; for example, in MIPS, these are the 32 integer and 16 floating point registers

206
Q

Three Cs model

A

A cache model in which all cache misses are classified into one of three categories: compulsory misses, capacity misses, and conflict misses

207
Q

compulsory miss (cold-start miss)

A

a cache miss caused by the first access to a block that has never been in the cache

Both large block sizes and prefetching may reduce compulsory misses

209
Q

capacity miss

A

A cache miss that occurs because the cache, even with full associativity, cannot contain all the blocks needed to satisfy the request.

Increasing the cache size may decrease capacity misses but can increase access time.

210
Q

conflict miss (collision miss)

A

A cache miss that occurs in a set-associative or direct-mapped cache when multiple blocks compete for the same set; such misses are eliminated in a fully associative cache of the same size.

Increasing associativity may decrease the miss rate but can increase access time.
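
A sketch of the three Cs classification applied to a direct-mapped cache of whole blocks. It assumes the usual operational definitions from the cards above: a miss is compulsory on the first reference to a block, capacity if a fully associative LRU cache of the same size would also miss, and conflict otherwise. The function name and trace format are illustrative.

```python
from collections import OrderedDict


def classify_misses(block_trace, num_blocks):
    """Label each access to a direct-mapped cache as hit, compulsory, capacity, or conflict."""
    direct = [None] * num_blocks  # direct-mapped cache under study: slot -> block
    fully = OrderedDict()         # reference model: fully associative, LRU, same size
    seen = set()
    labels = []
    for block in block_trace:
        # Update the fully associative reference cache.
        fa_hit = block in fully
        if fa_hit:
            fully.move_to_end(block)
        else:
            if len(fully) >= num_blocks:
                fully.popitem(last=False)
            fully[block] = None
        # Classify the access in the direct-mapped cache.
        slot = block % num_blocks
        if direct[slot] == block:
            labels.append("hit")
        elif block not in seen:
            labels.append("compulsory")
        elif not fa_hit:
            labels.append("capacity")
        else:
            labels.append("conflict")
        direct[slot] = block
        seen.add(block)
    return labels


# Blocks 0 and 8 both map to slot 0 of a 4-block cache and keep evicting each other.
print(classify_misses([0, 8, 0, 6, 8], 4))
# ['compulsory', 'compulsory', 'conflict', 'compulsory', 'conflict']
```
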

211
Q

Finite-state machine

A

A sequential logic function consisting of a set of inputs and outputs, a next-state function that maps the current state and the inputs to a new state, and an output function that maps the current state and possibly the inputs to a set of asserted outputs.

212
Q

Next-state function

A

A combinational function that, given the inputs and the current state, determines the next state of a finite-state machine.

213
Q

cache ready signal

A

set in the Compare Tag state if requested read or write is a hit

214
Q

memory ready signal

A

set in the Write Back state when a block is written to memory and in the Allocate state when a memory read is completed.
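
The finite-state machine, next-state function, and ready-signal cards fit together in a simple write-back cache controller with four states (Idle, Compare Tag, Write-Back, Allocate). The sketch below is a minimal, hypothetical model of that next-state function; input names are illustrative, and real controllers also drive outputs such as the cache ready signal.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    COMPARE_TAG = auto()
    WRITE_BACK = auto()
    ALLOCATE = auto()


def next_state(state, *, valid_request=False, hit=False, old_dirty=False, memory_ready=False):
    """Next-state function of a simple write-back cache controller."""
    if state is State.IDLE:
        return State.COMPARE_TAG if valid_request else State.IDLE
    if state is State.COMPARE_TAG:
        if hit:
            return State.IDLE  # cache ready is set here on a hit
        return State.WRITE_BACK if old_dirty else State.ALLOCATE
    if state is State.WRITE_BACK:
        return State.ALLOCATE if memory_ready else State.WRITE_BACK
    if state is State.ALLOCATE:
        return State.COMPARE_TAG if memory_ready else State.ALLOCATE
    raise ValueError(state)


# A miss that must first write back a dirty block, then refill and retry.
s = State.IDLE
steps = [
    {"valid_request": True},
    {"hit": False, "old_dirty": True},
    {"memory_ready": True},  # write-back finished
    {"memory_ready": True},  # allocate (memory read) finished
    {"hit": True},           # retried tag compare now hits
]
for inputs in steps:
    s = next_state(s, **inputs)
    print(s.name)
```
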

215
Q

Consistency

A

Defines when a written value will be returned by a read.

216
Q

Coherence

A

Defines what values can be returned by a read: a read of a data item returns the most recently written value of that data item, and writes to the same location by different processors are seen in the same order by all processors.

217
Q

snooping

A

a popular cache coherence protocol.

Each cache contains a copy of data from a block of physical memory along with a copy of the sharing status of that block. Each cache contains a controller that monitors, or snoops, activity on a shared communication medium to determine if any action is needed to ensure cache coherence.

218
Q

write invalidate protocol

A

A method of enforcing coherence that ensures a processor has exclusive access to a data item before it writes that item; it invalidates copies in other caches on a write.

219
Q

R-Format Instructions

A

They all read two registers, perform an ALU operation on the contents of the registers, and write the result to a register. We call these instructions either R-type instructions or arithmetic-logical instructions (since they perform arithmetic or logical operations). This instruction class includes ADD, SUB, AND, and ORR.

They need an ALU to operate on the values read from the registers.

220
Q

Multiprocessing

A

The simultaneous execution of two or more programs or threads on multiple processors

221
Q

Multithreading

A

Allows multiple threads to run concurrently by sharing the functional units of a single processor.