Exam questions Flashcards
Describe the goals, structure, and uses of a Mellor-Crummey and Scott (MCS) lock
The goal of the MCS lock is to let many CPUs contend for the same shared memory resource efficiently: instead of all CPUs spinning on one shared lock word, each waiter spins on its own local queue node, avoiding cache-line bouncing.
The MCS lock is built upon a spin lock mechanism.
This mechanism has these steps:
1- A global pointer (the queue spin lock) is checked atomically to see whether anyone is holding the lock on that resource.
a- If free (the queue spin lock points to NULL), the lock is granted and the pointer is set to the requesting CPU's queue node
b- If not free, the pointer designates the last CPU that requested the lock, so the new CPU atomically makes the pointer refer to itself, creates a link from the previous CPU's node to its own, and spins on a local flag
2- When the resource is freed, the releasing CPU clears the flag of the next CPU linked to it, which then takes control of the resource
Describe the goals and the basic operating principles of the Buddy allocator algorithm.
The buddy allocator algorithm aims to allocate and deallocate physical memory efficiently.
We start from chunks of memory whose sizes are powers of 2.
On allocation, the algorithm receives a request for an amount of memory and looks for the smallest chunk that can fit it. If the available chunk is bigger than the required space, it is recursively divided in half until a chunk of just the right size is obtained, and one half of the final split is assigned to the caller. The free chunks produced by this process are referenced by one linked list per size (order). Because the split points fall at fixed, aligned addresses, it is very fast to look up the adjacent block (the "buddy") when checking whether two blocks can be re-merged.
Deallocation works the opposite way: once a chunk of memory is freed, the algorithm recursively checks whether its buddy is also free and can be re-merged with the deallocated one.
In Linux, this state is tracked through bitmaps, which let the kernel quickly tell whether a block's buddy is free and a merge is possible.
Describe Linux’s CFS scheduling policy by highlighting the goals, the most important parameters affecting it, and its difference with previous versions such as the O(1) scheduler.
The CFS (Completely Fair Scheduler) aims at a fair share of the CPU among all processes: it ensures each runnable process gets a chance to run within a given time window, avoiding starvation.
These main parameters characterize the algorithm:
- Target latency (θ): as mentioned before, the window of time within which the scheduler guarantees every runnable task a chance to run
- Nice value: a per-task property in [-20, +19] that determines the proportion of CPU time given to that task. It is mapped through an exponential formula to the task's weight (λ)
- Minimum granularity (π): the minimum time that each task is allowed to run
The time slice of each process is computed as
t(i) = max(λ(i) × θ / Σλ, π)
so, given the priority (nice value) of a task, the algorithm assigns it "proportionally" more or less CPU time.
When a process becomes runnable, if it has consumed less CPU time than the currently running task, it runs immediately.
Unlike the previous O(1) scheduler, which kept fixed per-priority run queues and used heuristics to detect interactive tasks, CFS orders tasks by consumed CPU time (virtual runtime) in a red-black tree and always picks the one that has run least.
This algorithm is extended by cgroups, which in Linux ensure, for example, that all users get the same share of the processor regardless of the number of tasks each of them is running.
Describe the goals and the operating principles of the page cache in Linux, by highlighting the main structures involved
Once a file's contents are loaded from the filesystem, Linux keeps the pages in memory so that, if requested again, it does not have to go to the filesystem. The kernel maintains a per-file page cache that gives a process direct access to the pages already loaded in memory (if present) when the process reads the file. Cached pages are looked up by file (inode) plus offset within the file.
Describe the total store ordering (TSO) memory model and how it affects concurrent programming within the Linux kernel.
Total store ordering (TSO) is a memory model one step weaker than sequential consistency. This is the memory model implemented by x86.
TSO guarantees that the writes performed by one CPU become visible to all other CPUs in program order, and that all CPUs agree on a single global order of the stores (in effect, a consensus among the CPUs decides which store happens before the others). The relaxation is the store buffer: a CPU can read its own not-yet-propagated write early, so a load may effectively be reordered before an earlier store to a different location.
This mechanism in Linux can break synchronization mechanisms not based on locking, if writes are observed at the wrong time. To force the code execution back into sequentially consistent behavior, memory fence (barrier) instructions must be used.
What is the purpose of the slab allocator in Linux? What is the overall structure?
The Linux kernel has to allocate many objects with the same structure (task_struct, inode, ...). The classic page-based allocator would be too slow for this and, since it hands out whole pages, would cause really high internal fragmentation for small objects.
A slab is a set of one or more pages that gets associated with a particular data structure type t and is pre-divided into object-sized slots.
During allocation, the allocator looks up the cache for the specific type t and takes a free slot from one of its slabs, which makes allocating the object much faster.
What is the task's internal structure?
Each task is represented by a structure that contains the information the operating system needs to manage it. In particular, this structure is called task_struct and has, among its fields:
- state: the task's state, e.g. running (R), interruptible sleep (S), uninterruptible sleep (D), and zombie/dead (Z)
- PID: the id of the task, assigned by the OS on fork()
- PPID: parent PID
- mm_struct: structure of the virtual memory pages assigned to the task
- fs_struct: current and root directory
- files_struct: files currently open by the process
- signal_struct : signal info
- thread_struct: the CPU registers of the task, saved at context switch
Wait queues
A wait queue is a list (queue) of tasks that are currently waiting for an event (such as an interrupt) in order to be run. Tasks in a wait queue can be interruptible or non-interruptible: a non-interruptible task will not be woken by signals, only by the event it waits for. A task can also be exclusive or non-exclusive. An exclusive task is put at the end of the queue, and only one exclusive task is woken per event. When the event occurs, all the non-exclusive tasks at the head of the queue are woken, up to and including the first exclusive task.
How the Linux boot works
At boot, the bootloader loads the kernel image into memory and jumps into it. The kernel initializes itself (this initial thread is PID 0, the idle/swapper task) and then forks init (PID 1). Init is responsible for starting all the other services that are not essential for starting the kernel, and it is implemented either by SystemV init or by systemd. The main difference between the two is that systemd tries to parallelize the initialization of the services and exposes APIs (systemctl) to see their status and manage their execution.
Describe the scheduling class APIs in Unix
In Unix, there are different scheduling classes, each with its own API, that the programmer can target; each class is backed by a scheduling algorithm. The classes are, in order of priority:
- sched_deadline: scheduling class used for tasks that must complete before a deadline. It is implemented by the Earliest Deadline First (EDF) scheduling algorithm. Audio is an example of this kind of task. Linux implements this class
- sched_fifo: scheduling class meant for short real-time tasks (priority in [0, 99]) that are not time-sliced: a task runs until it blocks, yields, or is preempted by a higher-priority real-time task. Linux implements this class
- sched_rr: scheduling class for real-time processes, with priority in [0, 99]. Linux implements it with a round-robin algorithm: like sched_fifo, but with time slices
- sched_other: scheduling class for non-real-time processes, with internal priority in [100, 139] (derived from the nice value). Linux implements this scheduler with CFS. In the past, the algorithm used was called O(1), since it guaranteed scheduling decisions in O(1) time
- sched_batch: scheduling class meant for long-running, non-interactive tasks. It is like sched_other but with a much longer time slice (1.5 s)
- sched_idle: scheduling class implemented in Linux by the idle task alone. It is meant for work that should run only when nothing else is runnable (conceptually, a task executing only NOPs)
Describe the inter-process communication methods
In Unix, there are different methods for processes to communicate. The simplest one is a parent process waiting for the completion of a child (wait()). More advanced communication methods are often needed, though.
Signals are
- Unidirectional
- No data transferred
- Asynchronous
communication method used by the OS or by a process to communicate with other processes.
A process can send a signal by calling kill(pid, sig), passing as arguments both the PID of the target process and the signal number.
sigaction() is the call used in Linux to register a handler that runs when the process receives a signal. With sigprocmask() signals can also be masked: they are kept pending and handled later, once unmasked.
Named and Unnamed pipes are another communication method. Those are:
- Unidirectional
- Producer/consumer
- Message-based
The only difference between named and unnamed pipes is that named pipes have a name in the filesystem (they are virtual files, created with mkfifo), while unnamed pipes exist only as a pair of file descriptors shared between related processes. In either case, no actual I/O is performed to the filesystem.
Message queues are similar to named pipes but support multiple producers and multiple consumers. These queues are also represented through a virtual file in the filesystem.
The last IPC method is shared memory. This method allows multiple processes to allocate and share a chunk of memory with other processes. The pattern to do so is:
1- Create a special virtual file in the filesystem with shm_open(), which returns a file descriptor
2- Specify the dimension of the memory object with ftruncate()
3- Map the memory of the file descriptor into the process with mmap()
Cleanup calls (munmap(), shm_unlink()) are also available
Classification of the scheduling algorithms
Scheduling algorithms can have different properties:
Preemptive or non-preemptive: a preemptive scheduling algorithm can interrupt a running process at any time; a non-preemptive one cannot. Preemption is necessary for responsive systems
Static or dynamic: scheduling decisions are based on fixed (static) or changing (dynamic) parameters
Online or offline: an offline scheduler knows in advance the full set of processes to be scheduled; an online scheduler makes its decisions as tasks arrive
Optimal or heuristic: optimal schedulers optimize a given cost function. This can be really complex and slow to achieve, so heuristic algorithms are used to reduce the scheduling overhead
Describe PSO
Partial store order (PSO) is a memory model weaker than sequential consistency and total store order (it originates from SPARC; ARM processors implement a comparably weak or even weaker model). In PSO, writes issued by the same processor to different locations may propagate to the other processors out of program order, with no guarantee on the order in which they become visible, as if each write traveled to the shared memory independently.
Buffer overflow and meltdown
Buffer overflow and Meltdown are two vulnerabilities that can affect Linux. The first is a very old and well-known vulnerability in which the size of a buffer is not checked when a program writes to it, so more data than the buffer can contain can be written. This can be used to take control of the saved return instruction pointer (RIP) in order to perform attacks such as shellcode injection or ROP (return-oriented programming). This attack is partially mitigated by a kernel-enforced mechanism called Address Space Layout Randomization (ASLR), which randomizes the base virtual addresses of a process so that it is not possible to know where to jump after an overflow. A dual approach, called KASLR, exists for kernel space.
Meltdown is instead a recent attack that leverages processor speculation to leak kernel memory to user space. It relies on a race, inside the CPU, between speculative instruction execution and privilege checking: speculatively executed loads leave measurable side effects (in the cache) that leak the protected data to user space. One countermeasure is KPTI, which uses two PGDs per process: one is visible only in kernel mode and maps both kernel and user addresses; the other is used in user mode and maps all user addresses but only the small amount of kernel memory necessary for syscalls, interrupts, and exceptions.
Interrupts and deferring work
Interrupts are generated by I/O devices and timers (asynchronous) or raised by the CPU itself while executing an instruction, i.e. exceptions (synchronous). Asynchronous interrupts can further be classified as maskable or non-maskable. When an interrupt reaches the CPU, the CPU has to stop the currently running task and handle it; once the interrupt has been handled, the CPU returns to the execution of the task it was previously running.
In order to batch interrupt work and improve overall responsiveness, Linux implements deferred work methods that split interrupt management into two levels:
- Top half: the minimal amount of work executed when the interrupt arrives, just enough to acknowledge and register it. This part is non-interruptible and schedules the deferred work
- Bottom half: finalizes the deferred work. The points in time at which bottom halves are run are called reconciliation points. 3 methods of deferred work are available:
- SoftIRQ
- Tasklet
- Work queues