Exam questions Flashcards
Describe the goals, structure, and uses of a Mellor-Crummey and Scott (MCS) lock
The goal of the MCS lock is to let many CPUs contend for the same shared memory resource efficiently: instead of all CPUs spinning on one shared lock word, each waiter spins on its own local queue node, avoiding cache-line bouncing.
The MCS lock is built upon a spin lock mechanism.
This mechanism has these steps:
1- A global pointer (the queue spin lock) is checked atomically to see whether anyone is holding the lock on that resource.
a- If free (the queue spin lock points to NULL), the lock is granted and the pointer is set to the requesting CPU's queue node
b- If not free, the pointer designates the last CPU that requested the lock, so the new CPU atomically makes the pointer refer to itself, creates a link from the previous CPU's node to its own, and spins on a local flag
2- When the resource is freed, the releasing CPU clears the flag of the next CPU linked to it, which then takes control of the resource
Describe the goals and the basic operating principles of the Buddy allocator algorithm.
The buddy allocator algorithm aims to allocate and deallocate physical memory efficiently.
We start from chunks of memory whose sizes are powers of 2.
On allocation, the algorithm receives a request for an amount of memory and looks for the smallest chunk that can fit it. If the available chunk is bigger than the required space, it is recursively divided in half until a chunk of just the right size is obtained, and one half of the final split is assigned to the caller. The free chunks produced by this process are referenced by one linked list per size (order). Because the split points fall at fixed, aligned addresses, it is very fast to look up the adjacent block (the "buddy") when checking whether two blocks can be re-merged.
Deallocation works the opposite way: once a chunk of memory is freed, the algorithm recursively checks whether its buddy is also free and can be re-merged with the deallocated one.
In Linux, this state is tracked through bitmaps, which let the kernel quickly tell whether a block's buddy is free and a merge is possible.
Describe Linux’s CFS scheduling policy by highlighting the goals, the most important parameters affecting it, and its difference with previous versions such as the O(1) scheduler.
The CFS (Completely Fair Scheduler) aims at a fair share of the CPU among all processes: it ensures each runnable process gets a chance to run within a given time window, avoiding starvation.
These main parameters characterize the algorithm:
- Target latency (θ): as mentioned before, the window of time within which the scheduler guarantees every runnable task a chance to run
- Nice value: a per-task property in [-20, +19] that determines the proportion of CPU time given to that task. It is mapped through an exponential formula to the task's weight (λ)
- Minimum granularity (π): the minimum time that each task is allowed to run
The time slice of each process is computed as
t(i) = max(λ(i) × θ / Σλ, π)
so, given the priority (nice value) of a task, the algorithm assigns it "proportionally" more or less CPU time.
When a process becomes runnable, if it has consumed less CPU time than the currently running task, it runs immediately.
Unlike the previous O(1) scheduler, which kept fixed per-priority run queues and used heuristics to detect interactive tasks, CFS orders tasks by consumed CPU time (virtual runtime) in a red-black tree and always picks the one that has run least.
This algorithm is extended by cgroups, which in Linux ensure, for example, that all users get the same share of the processor regardless of the number of tasks each of them is running.
Describe the goals and the operating principles of the page cache in Linux, by highlighting the main structures involved
Once a file's contents are loaded from the filesystem, Linux keeps the pages in memory so that, if requested again, it does not have to go to the filesystem. The kernel maintains a per-file page cache that gives a process direct access to the pages already loaded in memory (if present) when the process reads the file. Cached pages are looked up by file (inode) plus offset within the file.
Describe the total store ordering (TSO) memory model and how it affects concurrent programming within the Linux kernel.
Total store ordering (TSO) is a memory model one step weaker than sequential consistency. This is the memory model implemented by x86.
TSO guarantees that the writes performed by one CPU become visible to all other CPUs in program order, and that all CPUs agree on a single global order of the stores (in effect, a consensus among the CPUs decides which store happens before the others). The relaxation is the store buffer: a CPU can read its own not-yet-propagated write early, so a load may effectively be reordered before an earlier store to a different location.
This mechanism in Linux can break synchronization mechanisms not based on locking, if writes are observed at the wrong time. To force the code execution back into sequentially consistent behavior, memory fence (barrier) instructions must be used.
What is the purpose of the slab allocator in Linux? What is the overall structure?
The Linux kernel has to allocate many objects with the same structure (task_struct, inode, ...). The classic page-based allocator would be too slow for this and, since it hands out whole pages, would cause really high internal fragmentation for small objects.
A slab is a set of one or more pages that gets associated with a particular data structure type t and is pre-divided into object-sized slots.
During allocation, the allocator looks up the cache for the specific type t and takes a free slot from one of its slabs, which makes allocating the object much faster.
What is the task's internal structure?
Each task is represented by a structure that contains the information the operating system needs to manage it. In particular, this structure is called task_struct and has, among its fields:
- state: the task's state, e.g. running (R), interruptible sleep (S), uninterruptible sleep (D), and zombie/dead (Z)
- PID: the id of the task, assigned by the OS on fork()
- PPID: parent PID
- mm_struct: structure of the virtual memory pages assigned to the task
- fs_struct: current and root directory
- files_struct: files currently open by the process
- signal_struct : signal info
- thread_struct: the CPU registers of the task, saved at context switch
Wait queues
A wait queue is a list (queue) of tasks that are currently waiting for an event (such as an interrupt) in order to be run. Tasks in a wait queue can be interruptible or non-interruptible: a non-interruptible task will not be woken by signals, only by the event it waits for. A task can also be exclusive or non-exclusive. An exclusive task is put at the end of the queue, and only one exclusive task is woken per event. When the event occurs, all the non-exclusive tasks at the head of the queue are woken, up to and including the first exclusive task.
How the Linux boot works
At boot, the bootloader loads the kernel image into memory and jumps into it. The kernel initializes itself (this initial thread is PID 0, the idle/swapper task) and then forks init (PID 1). Init is responsible for starting all the other services that are not essential for starting the kernel, and it is implemented either by SystemV init or by systemd. The main difference between the two is that systemd tries to parallelize the initialization of the services and exposes APIs (systemctl) to see their status and manage their execution.
Describe the scheduling class APIs in Unix
In Unix, there are different scheduling classes, each with its own API, that the programmer can target; each class is backed by a scheduling algorithm. The classes are, in order of priority:
- sched_deadline: scheduling class used for tasks that must complete before a deadline. It is implemented by the Earliest Deadline First (EDF) scheduling algorithm. Audio is an example of this kind of task. Linux implements this class
- sched_fifo: scheduling class meant for short real-time tasks (priority in [0, 99]) that are not time-sliced: a task runs until it blocks, yields, or is preempted by a higher-priority real-time task. Linux implements this class
- sched_rr: scheduling class for real-time processes, with priority in [0, 99]. Linux implements it with a round-robin algorithm: like sched_fifo, but with time slices
- sched_other: scheduling class for non-real-time processes, with internal priority in [100, 139] (derived from the nice value). Linux implements this scheduler with CFS. In the past, the algorithm used was called O(1), since it guaranteed scheduling decisions in O(1) time
- sched_batch: scheduling class meant for long-running, non-interactive tasks. It is like sched_other but with a much longer time slice (1.5 s)
- sched_idle: scheduling class implemented in Linux by the idle task alone. It is meant for work that should run only when nothing else is runnable (conceptually, a task executing only NOPs)
Describe the inter-process communication methods
In Unix, there are different methods for processes to communicate. The simplest one is a parent process waiting for the completion of a child (wait()). More advanced communication methods are often needed, though.
Signals are
- Unidirectional
- No data transferred
- Asynchronous
communication method used by the OS or by a process to communicate with other processes.
A process can send a signal by calling kill(pid, sig), passing as arguments both the PID of the target process and the signal number.
sigaction() is the call used in Linux to register a handler that runs when the process receives a signal. With sigprocmask() signals can also be masked: they are kept pending and handled later, once unmasked.
Named and Unnamed pipes are another communication method. Those are:
- Unidirectional
- Producer/consumer
- Message-based
The only difference between named and unnamed pipes is that named pipes have a name in the filesystem (they are virtual files, created with mkfifo), while unnamed pipes exist only as a pair of file descriptors shared between related processes. In either case, no actual I/O is performed to the filesystem.
Message queues are similar to named pipes but support multiple producers and multiple consumers. These queues are also represented through a virtual file in the filesystem.
The last IPC method is shared memory. This method allows multiple processes to allocate and share a chunk of memory with other processes. The pattern to do so is:
1- Create a special virtual file in the filesystem with shm_open(), which returns a file descriptor
2- Specify the dimension of the memory object with ftruncate()
3- Map the memory of the file descriptor into the process with mmap()
Cleanup calls (munmap(), shm_unlink()) are also available
Classification of the scheduling algorithms
Scheduling algorithms can have different properties:
Preemptive or non-preemptive: a preemptive scheduling algorithm can interrupt a running process at any time; a non-preemptive one cannot. Preemption is necessary for responsive systems
Static or dynamic: scheduling decisions are based on fixed (static) or changing (dynamic) parameters
Online or offline: an offline scheduler knows in advance the full set of processes to be scheduled; an online scheduler makes its decisions as tasks arrive
Optimal or heuristic: optimal schedulers optimize a given cost function. This can be really complex and slow to achieve, so heuristic algorithms are used to reduce the scheduling overhead
Describe PSO
Partial store order (PSO) is a memory model weaker than sequential consistency and total store order (it originates from SPARC; ARM processors implement a comparably weak or even weaker model). In PSO, writes issued by the same processor to different locations may propagate to the other processors out of program order, with no guarantee on the order in which they become visible, as if each write traveled to the shared memory independently.
Buffer overflow and meltdown
Buffer overflow and Meltdown are two vulnerabilities that can affect Linux. The first is a very old and well-known vulnerability in which the size of a buffer is not checked when a program writes to it, so more data than the buffer can contain can be written. This can be used to take control of the saved return instruction pointer (RIP) in order to perform attacks such as shellcode injection or ROP (return-oriented programming). This attack is partially mitigated by a kernel-enforced mechanism called Address Space Layout Randomization (ASLR), which randomizes the base virtual addresses of a process so that it is not possible to know where to jump after an overflow. A dual approach, called KASLR, exists for kernel space.
Meltdown is instead a recent attack that leverages processor speculation to leak kernel memory to user space. It relies on a race, inside the CPU, between speculative instruction execution and privilege checking: speculatively executed loads leave measurable side effects (in the cache) that leak the protected data to user space. One countermeasure is KPTI, which uses two PGDs per process: one is visible only in kernel mode and maps both kernel and user addresses; the other is used in user mode and maps all user addresses but only the small amount of kernel memory necessary for syscalls, interrupts, and exceptions.
Interrupts and deferring work
Interrupts are generated by I/O devices and timers (asynchronous) or raised by the CPU itself while executing an instruction, i.e. exceptions (synchronous). Asynchronous interrupts can further be classified as maskable or non-maskable. When an interrupt reaches the CPU, the CPU has to stop the currently running task and handle it; once the interrupt has been handled, the CPU returns to the execution of the task it was previously running.
In order to batch interrupt work and improve overall responsiveness, Linux implements deferred work methods that split interrupt management into two levels:
- Top half: the minimal amount of work executed when the interrupt arrives, just enough to acknowledge and register it. This part is non-interruptible and schedules the deferred work
- Bottom half: finalizes the deferred work. The points in time at which bottom halves are run are called reconciliation points. 3 methods of deferred work are available:
- SoftIRQ
- Tasklet
- Work queues