General Knowledge Flashcards
Explain what a tightly coupled workload is
If a job spans multiple instances, it is tightly coupled when those instances have to talk to one another in order to get the job done. The failure of one node usually leads to the failure of the entire calculation.
Explain what a loosely coupled workload is
Each instance can process its share of the work independently. Once in a while the instances may write checkpoint data to disk, but they don’t have to actively communicate with one another.
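To make the contrast concrete, here is a minimal sketch in plain Python. Both function names and the toy calculations are hypothetical, chosen only for illustration; nothing here runs on multiple instances, it only shows which data each unit of work needs.

```python
def loosely_coupled(inputs):
    """Each result depends only on its own input, so every task could
    run on a separate instance, in any order, with no communication."""
    return [x * x for x in inputs]


def tightly_coupled(cells, steps):
    """Each cell's next value needs its neighbours' current values.
    If the cells were spread across instances, those instances would
    have to exchange boundary data at every single step."""
    n = len(cells)
    for _ in range(steps):
        # Average each cell with its left and right neighbour (wrapping around).
        cells = [(cells[i - 1] + cells[i] + cells[(i + 1) % n]) / 3
                 for i in range(n)]
    return cells


print(loosely_coupled([1, 2, 3]))            # prints [1, 4, 9]
print(tightly_coupled([3.0, 0.0, 0.0], 1))   # prints [1.0, 1.0, 1.0]
```

In the second function, losing one cell's value stalls every neighbour on the next step, which mirrors why a single node failure tends to take down a tightly coupled job.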
Explain what FLOPS is
FLoating-point Operations Per Second. A single floating-point operation is a FLOP; FLOPS is used as a measure of how fast a processor is, i.e. how many operations it can do on floating-point numbers every second. Floating-point numbers are basically numbers that have a decimal point in them somewhere, as opposed to an integer, which is a whole number. A more powerful processor can do more of these operations per second.
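The arithmetic is just a count of operations divided by elapsed time. A minimal sketch, where the function name and the sample numbers are hypothetical and not measurements of any real processor:

```python
def flops(operation_count: int, seconds: float) -> float:
    """Return floating-point operations per second for a measured run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return operation_count / seconds


# Hypothetical run: 2 billion floating-point operations finished in 0.5 s.
rate = flops(2_000_000_000, 0.5)
print(f"{rate / 1e9:.1f} GFLOPS")  # prints "4.0 GFLOPS"
```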
What is an FPGA
Field Programmable Gate Array.
A regular processor is hard-wired; it’s versatile and there’s a lot you can do with it, but you can’t change anything at the hardware level. When you program for a standard CPU, you’re limited by the way the hardware was laid out, and different layouts are better at different things (think of how a GPU is better than a CPU at certain tasks).
A “pure” FPGA is a chip that can be reconfigured at the hardware level. You use software to change the way the chip operates, so, in essence, you can build your own chip for whatever application you’re running. This matters because some types of code need optimizations that you simply can’t make on a regular CPU, since the manufacturer designed it to be versatile.
What is a scheduler
Manages which jobs run where and when. Used to share the HPC system among many users.
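A toy sketch of the "which jobs run where and when" decision, assuming strict first-come, first-served order on a fixed pool of nodes. The `Job` class and `fifo_schedule` function are hypothetical names for illustration; production schedulers layer priorities, fair-share and backfilling on top of this basic idea.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # number of nodes requested
    duration: int   # run time in abstract ticks


def fifo_schedule(jobs, total_nodes):
    """Return {job name: start tick} for strict first-come, first-served order."""
    queue = deque(jobs)
    running = []        # (end_tick, nodes) for jobs currently holding nodes
    free = total_nodes
    tick = 0
    starts = {}
    while queue:
        job = queue[0]
        if job.nodes > total_nodes:
            raise ValueError(f"{job.name} can never fit on this cluster")
        # Release the nodes of every job that has finished by now.
        still_running = []
        for end, nodes in running:
            if end <= tick:
                free += nodes
            else:
                still_running.append((end, nodes))
        running = still_running
        if job.nodes <= free:
            # The job at the head of the queue fits: start it now.
            queue.popleft()
            free -= job.nodes
            running.append((tick + job.duration, job.nodes))
            starts[job.name] = tick
        else:
            # Not enough free nodes: wait until the next job completes.
            tick = min(end for end, _ in running)
    return starts


jobs = [Job("A", nodes=2, duration=3), Job("B", nodes=2, duration=2),
        Job("C", nodes=3, duration=1)]
print(fifo_schedule(jobs, total_nodes=4))  # prints {'A': 0, 'B': 0, 'C': 3}
```

Job C has to wait until tick 3, when both A and B have released their nodes, even though two nodes are already free at tick 2.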
What is MPI
Message Passing Interface. Used to exchange data between workers. Typically run on homogeneous systems, although the standard does not require it. MPI handles process-to-process communication, while OpenMP manages threads within a given process. It is a programming model for distributed environments in which each process has its own local memory, and communication between processes happens only by sending and receiving messages. Processes do not share a memory space, and one process cannot directly access another process’s memory.
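The model can be imitated in a few lines. This is a single-process toy simulation, not real MPI: the `Rank` class and the `send`, `recv` and `run_sum` helpers are hypothetical stand-ins, and a real MPI program would use library calls such as MPI_Send and MPI_Recv across separate processes. The point is only that each rank touches its own memory and learns about others through messages.

```python
import copy
from collections import deque


class Rank:
    """One simulated process with private memory and an inbox."""

    def __init__(self, rank_id):
        self.rank_id = rank_id
        self.memory = {}      # private: no other rank reads or writes this
        self.inbox = deque()  # messages delivered to this rank


def send(dest, source_id, payload):
    # Deep-copy so the receiver gets its own data, never a shared reference.
    dest.inbox.append((source_id, copy.deepcopy(payload)))


def recv(rank):
    # Take the oldest message; returns (source_id, payload).
    return rank.inbox.popleft()


def run_sum(values, n_ranks=3):
    ranks = [Rank(i) for i in range(n_ranks)]
    root = ranks[0]
    # Root hands one slice of the data to every rank, as messages.
    for r in ranks:
        send(r, root.rank_id, values[r.rank_id::n_ranks])
    # Each rank computes on what it received, storing the result locally.
    for r in ranks:
        _, chunk = recv(r)
        r.memory["partial"] = sum(chunk)
    # Each rank sends its partial result back to root, which combines them.
    for r in ranks:
        send(root, r.rank_id, r.memory["partial"])
    total = 0
    for _ in ranks:
        _, partial = recv(root)
        total += partial
    return total


print(run_sum(list(range(1, 10))))  # prints 45
```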
What is the purpose of checkpointing
- HPC has limited hardware redundancy;
- If nodes supporting an HPC application fail
○ the application eventually fails
○ Therefore, applications need to periodically save their application state (i.e. checkpointing) to guard against losing a lot of work
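The points above can be sketched as a loop that saves its state every few steps and resumes from the last saved state after a crash. This is a minimal sketch: the function names, the JSON file format, and the `fail_at` parameter (used only to simulate a node failure) are all assumptions made for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    # Write to a temporary file first, then rename, so a crash in the
    # middle of writing never leaves a half-written checkpoint behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_with_checkpoints(total_steps, checkpoint_path, every=10, fail_at=None):
    """Sum 1..total_steps, saving state to disk every `every` steps."""
    state = {"step": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        # Resume from the last saved state instead of starting over.
        with open(checkpoint_path) as f:
            state = json.load(f)
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["step"] += 1
        state["total"] += state["step"]
        if state["step"] % every == 0:
            save_checkpoint(state, checkpoint_path)
    return state["total"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state.json")
    try:
        run_with_checkpoints(25, path, every=10, fail_at=23)
    except RuntimeError:
        pass  # steps 21-23 are lost; steps 1-20 are safe on disk
    print(run_with_checkpoints(25, path, every=10))  # prints 325
```

The checkpoint interval is a trade-off: saving more often loses less work on failure but spends more time writing to disk.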
Explain the difference between a Management Node and a Master Node
The master (or head) node is usually where your users log in and run jobs, so it’s sort of like a bastion to the compute cluster. This is generally where you would run the scheduler as well. Sometimes, though, you don’t want users on the head node saturating the CPU or memory and breaking your services, so you put the scheduler on a separate management node instead. The management node is also where administrators can operate on the storage without root squashing, and it can host directory services as well.
What is a workflow
It’s rare that an application is run by itself. Usually, a set of applications is run as a series of steps that together form a complete workflow.
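As a rough picture of "a series of steps", the sketch below chains three stand-in functions so that each step consumes the previous step's output. The step names (`preprocess`, `simulate`, `summarize`) and the data are hypothetical; in a real workflow each step would be a separate application, often submitted as its own job.

```python
def preprocess(raw):
    # Step 1: drop missing readings.
    return [x for x in raw if x is not None]


def simulate(values):
    # Step 2: stand-in for the main computation.
    return [x * x for x in values]


def summarize(results):
    # Step 3: reduce the results to a single figure.
    return sum(results) / len(results)


WORKFLOW = [preprocess, simulate, summarize]


def run_workflow(steps, data):
    """Run the steps in order, feeding each one the previous output."""
    for step in steps:
        data = step(data)
    return data


print(run_workflow(WORKFLOW, [1, None, 3]))  # prints 5.0
```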