Exam cards Flashcards
Study for PMPH course fall 2018
State Moore’s Law
The number of transistors in a dense integrated circuit doubles about every two years. OR: Computer power doubles every 19-24 months while the cost effectiveness keeps pace.
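A quick way to get a feel for the doubling rule is to compound it over a few years. This is only a sketch: the helper name `projected_transistors` is made up for illustration, and the fixed two-year doubling period is an assumption taken from the first formulation above.

```python
def projected_transistors(initial_count, years, doubling_period_years=2.0):
    """Project a transistor count, assuming it doubles every `doubling_period_years`."""
    return initial_count * 2 ** (years / doubling_period_years)

# Growth factor relative to the starting count (initial_count = 1).
for years in (2, 10, 20):
    factor = projected_transistors(1, years)
    print(f"after {years} years: x{factor:g}")
```

Under that assumption, ten years gives a factor of 32 and twenty years a factor of 1024.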
Explain cost-effectiveness of hardware
Cost-effectiveness is the ratio between the performance and cost of hardware
Explain the killer-micro effect
The additional hardware resources generated by the rapid increase in transistor density were used to increase the speed/frequency of single-CPU systems, resulting in complex designs of muscled single processors that rely on an out-of-order dataflow execution model.
What are the two main capabilities of the out-of-order (data-flow) execution model?
1. Storing thousands of instructions in their pipeline. 2. Executing hundreds of instructions per cycle.
Explain the divergence between academic interest in parallel architectures and industry's focus on muscled single-processor systems
Muscling single-CPU systems was seen as the path of least resistance because it required no rewriting of code, whereas converting to parallel architectures requires significant rewriting, which is tedious and non-trivial.
What was the tectonic shift towards multiprocessor design?
In 2004 Intel cancelled the design of the 4 GHz Pentium 4 and switched to multicore designs.
What is the power wall?
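Dynamic power is roughly proportional to the cube of the frequency, so raising the clock rate quickly runs into the limits of power consumption of a single chip.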
What is the memory wall?
The seemingly exponentially-increasing performance gap between processor and memory
Commodity parallel architectures such as GPUs provide thousands of cores and tens of thousands of threads. What is the main barrier to utilizing this power?
The lack of high-level programming models and compiler optimizations that would allow commodity software to harness the power.
What are a program and a process/thread?
A program is a set of statements for performing computational tasks, while a process/thread embeds the execution of the computation. A program is to a process/thread what a recipe is to cooking.
What is a processor/core and what is a multithreaded core?
The hardware entity capable of sequencing and executing the instructions of a process/thread. Multithreaded cores support multiple hardware threads, each running in its own hardware context.
What is a multiprocessor?
A multiprocessor is a set of processors connected to execute a workload. Each multiprocessor consists of several cores, potentially with hardware multithreading support, and several levels of cache.
Name the three main improvement types in processor design
- Deeper pipelines (10-20 stages): more stages mean simpler individual stages with fewer gates each, i.e. each stage is quicker, which allows a higher clock rate.
- Aggressively exploiting instruction-level parallelism (ILP), for example by out-of-order and speculative execution, which combine techniques such as register renaming, reorder buffers, branch prediction, lockup-free caches, and speculative memory disambiguation.
- Improvements in circuit design.
Why is it unsustainable to carry on improving the clockrate?
• First, it is unfeasible to build deeper pipelines because it is difficult to imagine useful stages that can be built from less than 10 gates (we have already reached that point).
• Second, the impact of technology scaling will be blunted in the future due to wire delays, which do not scale, because the speed of wire transmission grows much slower than the switching speed.
• Finally, and perhaps most importantly, circuits clocked at higher rates consume more power, and we have already reached the limits of power consumption in single-chip microprocessors.
Which hardware optimizations can we do other than increasing clock rate?
Reduction of feature size and increasing the number of transistors.
How can we utilize an increased number of transistors?
We can
1. Enhance the parallelism of the memory system.
2. Fetch and decode multiple instructions per clock.
3. Run multiple hardware threads concurrently per core, for example in order to hide the high latency of the memory system.
4. Support thousands of cores that run threads in parallel on different cores.
How much does DRAM density increase every third year (historically)?
4 x
How much does DRAM speed increase each year (historically)?
7%
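To compare the two DRAM trends on the same time scale, the figures above can be compounded over a decade. A minimal sketch; `compound_growth` is a hypothetical helper, and the ten-year horizon is chosen arbitrarily for illustration.

```python
def compound_growth(rate_per_period, periods):
    """Total growth factor after `periods` periods of compounding at `rate_per_period`."""
    return (1 + rate_per_period) ** periods

# Speed: 7% per year.
speed_after_10_years = compound_growth(0.07, 10)

# Density: 4x every three years, so a decade holds 10/3 such periods.
density_after_10_years = 4 ** (10 / 3)

print(f"DRAM speed after 10 years:   x{speed_after_10_years:.2f}")
print(f"DRAM density after 10 years: x{density_after_10_years:.0f}")
```

Speed roughly doubles in ten years, while density grows by about two orders of magnitude.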
The memory wall has been replaced by what?
A bandwidth wall. Nowadays the memory subsystem needs to efficiently feed cores that execute threads in parallel, which means that the memory system has to be capable of delivering multiple data items at the same time. This capability is measured by bandwidth.
Name the four technological limits in architectural design
Power, wire delays, reliability and complexity of design
Explain what the power constraint is and why it favours parallel processing over an increase in the clock rate.
The power constraint is that the power consumed by a processor, given by the sum of dynamic and static power, is not limitless. Dynamic power is consumed by gate switching and is roughly proportional to the cube of the frequency. Hence increasing the frequency requires much more power than replicating a uniprocessor.
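The cubic relationship can be made concrete with a small comparison. This is an idealized sketch: it assumes dynamic power scales exactly with the cube of the frequency, ignores static power, and treats both options as at best doubling throughput. The function name is hypothetical.

```python
def relative_dynamic_power(frequency_scale):
    """Dynamic power relative to the baseline, assuming power ~ frequency cubed."""
    return frequency_scale ** 3

# Option A: one core clocked twice as fast.
power_faster_clock = relative_dynamic_power(2.0)

# Option B: two baseline cores running at the original frequency.
power_two_cores = 2 * relative_dynamic_power(1.0)

print(power_faster_clock, power_two_cores)
```

Under these assumptions, doubling the clock costs 8x the dynamic power, while replicating the core costs only 2x.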
What is leakage power and how is it related to dynamic power?
Leakage power is the same as static power and is dissipated in all circuits at all times, independent of frequency and switching. It is dominated by cache leakage. Leakage power increases exponentially as the threshold voltage (the voltage at which a transistor switches off) is reduced. Smaller feature sizes and lower threshold voltages mean that leakage power has overtaken dynamic power as the major source of dissipation.
What are the three classifications of hardware failures?
- Transient failures (soft errors): because the voltage is reduced in every process generation, bits are more and more prone to flip. The device remains operational, but data may be partly corrupted.
- Intermittent/temporary failures: the result of environmental variations on the chip, such as hot spots. For the device to return to correct behavior, the cause must be removed (e.g. turn it off and let it cool down).
- Permanent failures: cannot be fixed.
Why do multiprocessors offer better reliability than uniprocessor systems?
Threads can be used to redundantly perform the same computation and compare results. Faulty cores can be detected and disabled while the system remains functional.
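The idea of redundant computation can be sketched in software. The example below assumes a majority-voting scheme over three replicas, which is one possible way to "compare results" and not necessarily the one meant in the course. The replicas are plain functions called one after another rather than real hardware threads, and all names are hypothetical.

```python
from collections import Counter

def vote(results):
    """Return the majority result and the indices of replicas that disagree with it."""
    majority, _ = Counter(results).most_common(1)[0]
    faulty = [i for i, r in enumerate(results) if r != majority]
    return majority, faulty

def redundant_run(replicas, x):
    """Run every replica on the same input and vote on the results."""
    return vote([replica(x) for replica in replicas])

# Hypothetical replicas: two healthy "cores" and one that returns a corrupted value.
healthy = lambda x: x * x
faulty_core = lambda x: x * x + 1

value, suspects = redundant_run([healthy, faulty_core, healthy], 7)
print(value, suspects)
```

Here the majority value is 49, and replica 1 is flagged as a candidate to be disabled while the other two keep the system running.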
The propagation delay on a wire is proportional to what?
The product of its resistance and its capacitance.
The resistance of a wire is proportional to what?
The ratio between the length of the wire and its cross-section area.
What happens to the length of wires at each process generation? Is this good or bad for resistance?
It shrinks. Good! Resistance grows with wire length, so a shorter wire has lower resistance.
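The three wire cards above can be tied together with the standard formula for a uniform wire, R = resistivity * length / cross-section area, and a delay proportional to R * C. A minimal sketch in arbitrary units; the function names are hypothetical, and the third case (a cross-section that also shrinks) is an assumption added for illustration, not something stated in the cards.

```python
def wire_resistance(resistivity, length, cross_section_area):
    """R = resistivity * length / cross_section_area for a uniform wire."""
    return resistivity * length / cross_section_area

def propagation_delay(resistance, capacitance):
    """Delay is proportional to R * C; the constant factor is omitted here."""
    return resistance * capacitance

base = wire_resistance(1.0, 1.0, 1.0)
shorter = wire_resistance(1.0, 0.5, 1.0)    # half the length, same cross-section
thinner = wire_resistance(1.0, 0.5, 0.25)   # half the length, both cross-section dimensions halved

print(base, shorter, thinner)
```

Halving only the length halves the resistance, whereas in the assumed third case the smaller cross-section more than cancels that gain.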