MS Past paper qs Flashcards

Question 1

Q

Describe in detail the linear sweep disassembly algorithm, its strengths and limitations (if any). [6 marks]

2017/18/19/23

Answer

A

Linear sweep is a static disassembly technique: starting at the beginning of each code segment in the binary, it decodes all the bytes consecutively and parses them into a list of instructions. 1st 4 - data, rest code.

binary blob -> iterate through code -> decode bytes one by one linearly -> parse into a list of instructions

Strengths:
- provides complete coverage of a program’s code sections, so it won’t miss code
- on x86, the disassembly typically resynchronises itself automatically after just a few instructions
- safe for disassembling ELF binaries, which don’t typically contain inline data (good for benign libraries)

Limitations:
- doesn’t understand control flow
- compilers sometimes mix data with code (inline data); linear sweep doesn’t know which bytes are code and which are data, it just assumes everything is code
- jump tables are a typical example of interspersed data: if data is mistaken for code, the disassembler may encounter invalid opcodes
- some data corresponds to valid opcodes, so the output contains bogus instructions
- as a result the disassembler can become desynchronised from the real instruction stream and miss real instructions
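
The desynchronisation problem can be sketched with a toy example. This is a minimal illustration, assuming a made-up instruction set (the opcode values and the names `OPCODES`, `decode` and `linear_sweep` are hypothetical, not x86): two bytes of inline data sit between a jump and its target, and the sweep decodes them as if they were code.

```python
# Hypothetical toy ISA: opcode -> (mnemonic, instruction length in bytes).
OPCODES = {0x01: ("NOP", 1), 0x02: ("PUSH", 2), 0x03: ("JMP", 2),
           0x04: ("JZ", 2), 0x05: ("RET", 1)}

def decode(code, pc):
    """Decode one instruction at offset pc; unknown bytes become '(bad)'."""
    name, size = OPCODES.get(code[pc], ("(bad)", 1))
    if pc + size > len(code):          # operand would run past the end
        return "(bad)", 1, None
    operand = code[pc + 1] if size == 2 else None
    return name, size, operand

def linear_sweep(code):
    """Decode every byte consecutively, ignoring control flow."""
    listing, pc = [], 0
    while pc < len(code):
        name, size, operand = decode(code, pc)
        listing.append((pc, name, operand))
        pc += size
    return listing

# JMP 4 | two bytes of inline data | PUSH 5 | RET
program = bytes([0x03, 0x04, 0xFF, 0x02, 0x02, 0x05, 0x05])
for entry in linear_sweep(program):
    print(entry)
```

The data byte 0x02 at offset 3 is decoded as a PUSH that swallows the first byte of the real instruction at offset 4, so the real PUSH is never listed. The sweep only lines up with the real instruction stream again at offset 6.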

Question 2

Q

Describe in detail the recursive traversal disassembly algorithm, its strengths and limitations (if any). [6 marks]

2017/18/19/23

Answer

A

Begin disassembling from known entry points in the binary (e.g. the main entry point and function symbols).
Decode instructions sequentially, following the control flow, until encountering a branch.
If the branch is an unconditional jump with a known target, continue disassembling at the target address.
If the branch is conditional, both outcomes are possible, so disassemble both the branch target and the fall-through instruction.
For a call, disassemble the target function recursively, then continue at the instruction after the call (the return address). A return instruction ends the current path.

Strengths
- can easily distinguish code from data
- good for inline/malicious code as it’s not easily fooled into producing bogus output

Limitations
- Indirect code invocation: jump tables (and other indirect jumps/calls) create indirect control flow, making it harder for the disassembler to track instruction flow because there’s no direct target address to follow.
- Hard to follow: disassemblers struggle with jump tables since they can’t statically resolve where the target address is loaded from; some targets depend on runtime conditions, so blocks of code may be missed unless special heuristics are used.
- Some instructions, like “ret”, end the path being followed; if the code after them is only reachable indirectly, it is never analysed, leading to missed instructions and incomplete disassembly.
- Recursive traversal disassemblers might fail to discover instructions at specific addresses, like 0x443903-0x443925, possibly due to complexities in the code or limitations in the disassembler’s approach.
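
Recursive traversal can be sketched on a toy example. This is a minimal illustration, assuming a made-up instruction set with only direct branches (the opcode values and the names `OPCODES`, `decode` and `recursive_traversal` are hypothetical); a real disassembler also has to deal with calls and indirect branches.

```python
# Hypothetical toy ISA: opcode -> (mnemonic, instruction length in bytes).
OPCODES = {0x01: ("NOP", 1), 0x02: ("PUSH", 2), 0x03: ("JMP", 2),
           0x04: ("JZ", 2), 0x05: ("RET", 1)}

def decode(code, pc):
    """Decode one instruction at offset pc; unknown bytes become '(bad)'."""
    name, size = OPCODES.get(code[pc], ("(bad)", 1))
    if pc + size > len(code):          # operand would run past the end
        return "(bad)", 1, None
    operand = code[pc + 1] if size == 2 else None
    return name, size, operand

def recursive_traversal(code, entry=0):
    """Follow control flow from the entry point; only reachable bytes are decoded."""
    seen = {}
    worklist = [entry]
    while worklist:
        pc = worklist.pop()
        while 0 <= pc < len(code) and pc not in seen:
            name, size, operand = decode(code, pc)
            seen[pc] = (name, operand)
            if name in ("RET", "(bad)"):   # this path ends here
                break
            if name == "JMP":              # unconditional: follow the target
                pc = operand
                continue
            if name == "JZ":               # conditional: queue target, fall through
                worklist.append(operand)
            pc += size
    return [(pc, *seen[pc]) for pc in sorted(seen)]

# JMP 4 | two bytes of inline data | PUSH 5 | RET
with_inline_data = bytes([0x03, 0x04, 0xFF, 0x02, 0x02, 0x05, 0x05])
# JZ 3 | RET | NOP | RET
with_conditional = bytes([0x04, 0x03, 0x05, 0x01, 0x05])
```

In the first program the inline data at offsets 2-3 is never decoded because no control flow reaches it. In the second, both the fall-through path and the branch target are disassembled.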

Question 3

Q

What is a botnet and what is a botnet C&C server? [3 marks]

2017/18/19/22

Answer

A

A botnet is a network of bots, which are malware that carry out some malicious action in co-ordination with other bots.

A botnet C&C server is a command and control server through which the bots communicate. It is the server the attacker/botmaster uses to send commands to, and so control, the bots.

Question 4

Q

How are botnets usually created (cite at least 2 different ways)?
Describe two different ways in which botnets are usually created.
[2 marks]

2017/18/19

Answer

A

As a computer worm
A Trojan horse resident in a program

Question 5

Q

Describe three methods botnets use to determine rendez-vous points for C&C servers. [9 marks]

2022

Answer

A

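Botnets may be centralised, in which case each bot communicates directly with the botmaster. However, the botmaster would then become a bottleneck for large botnets.
Many botnets use a hierarchical structure in which the botmaster communicates with a set of bots that are in turn botmasters for other bots. This allows control over a large botnet.
Peer-to-peer botnets use a C&C structure in which there is no single C&C server. Instead, a peer-to-peer network is constructed, with the bots acting as peers. Thus, if some portion of the botnet is deleted, the remainder of the botnet can continue to function.

Rendez-vous techniques:
- fast flux: the C&C domain name stays the same, but the IP addresses it resolves to change rapidly (short-lived DNS records pointing to many compromised hosts)
- domain flux: bots and botmaster compute a changing list of domain names with a domain generation algorithm (DGA); the botmaster only needs to register one of them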
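
Domain flux can be sketched as follows. This is a minimal illustration, not any real botnet's algorithm: the seed string, the SHA-256 construction, the function name `candidate_domains` and the reserved `.example` TLD are all assumptions chosen for the example.

```python
import hashlib
from datetime import date

def candidate_domains(seed, day, count=5):
    """Derive a deterministic list of candidate rendez-vous domains for one day."""
    domains = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        domains.append(digest[:12] + ".example")   # hypothetical naming scheme
    return domains

# Bot and botmaster share the seed, so both derive the same list independently.
bot_view = candidate_domains("shared-seed", date(2024, 1, 1))
master_view = candidate_domains("shared-seed", date(2024, 1, 1))
next_day = candidate_domains("shared-seed", date(2024, 1, 2))
```

Because the list changes every day, blocking or seizing one domain only helps until the next list is generated. A defender who recovers the algorithm and seed can precompute the same candidates.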

Question 6

Q

Describe in detail how a drive-by download attack works and what it is used for. [3 marks]

2017/18/22/23

Answer

A

It is a type of infection vector, i.e. the way malware gets onto a computer.

It is the unintentional download of malicious code, typically by taking advantage of an app, OS or web browser that contains security flaws because updates failed or were never applied.

Uses:
- Exploiting APIs for browser plugins to enable the downloading and execution of arbitrary files
- Exploiting vulnerabilities in the web browser or plugins

Question 7

Q

State what a rootkit is and describe at least one of its hooking techniques; specify pros and cons. [4 marks]

2017/18/19/22

Answer

A

A rootkit is a pernicious Trojan horse: it hides itself on a system so it can carry out its actions without being detected.

Hooking techniques: altering part of the kernel. For example, the Knark rootkit modifies entries in the system call table so that they invoke new versions held in a loadable kernel module. These new versions hijack system calls related to examining the file system, network connections, spawning processes, etc., concealing the presence of the rootkit.

Pros:
- Stealth: Kernel modification techniques allow rootkits to operate stealthily by altering core system functions, evading detection by traditional security measures.
- Persistence: Once the kernel is modified, the rootkit can maintain control over the system across reboots, ensuring persistent access for the attacker.

Cons:
- Complexity: Modifying the kernel requires a deep understanding of system internals, making the development and deployment of such rootkits complex.
- Detection: Detection methods such as comparing the system call table stored in the kernel with a copy stored on disk at boot time can potentially uncover the presence of kernel-level modifications.
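
The hooking idea and the table-comparison detection can be sketched in user-space Python. This is a simulation only, assuming a dictionary stands in for the system call table; the names `syscall_table`, `real_listdir`, `install_hook` and `modified_entries` are hypothetical, and nothing here touches a real kernel.

```python
def real_listdir(path):
    """Stand-in for the genuine kernel service."""
    return ["notes.txt", "rootkit.ko", "report.pdf"]

syscall_table = {"listdir": real_listdir}
known_good = dict(syscall_table)        # trusted copy taken before infection

def install_hook(table, hidden_prefix="rootkit"):
    """Replace the table entry with a wrapper that filters the rootkit's files."""
    original = table["listdir"]
    def hooked(path):
        return [n for n in original(path) if not n.startswith(hidden_prefix)]
    table["listdir"] = hooked

def modified_entries(table, baseline):
    """Detection: report entries that no longer match the trusted copy."""
    return [name for name, fn in table.items() if baseline.get(name) is not fn]

listing_before = syscall_table["listdir"]("/tmp")
install_hook(syscall_table)
listing_after = syscall_table["listdir"]("/tmp")
tampered = modified_entries(syscall_table, known_good)
```

After the hook is installed the caller still gets a plausible directory listing, just without the hidden file. The comparison against the trusted copy exposes the changed entry.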

Question 8

Q

Describe in a few sentences what clustering (i.e., unsupervised learning) and classification (i.e., supervised learning) are. For each of these two categories, provide the name of one well-known algorithm we have discussed in class. [4 marks]

2018/19/22/23

Answer

A

In clustering, the number of classes is unknown and the model has no prior knowledge (no labels). It groups similar data based only on the input data.
(Hidden Markov model)

In classification, there is a known number of classes and we learn from a set of labelled samples. Given a labeled dataset, we find a model that separates them into their respective classes.
(K-Nearest Neighbours)

Question 9

Q

Report the formulas of Accuracy, F1-Score, Precision and Recall (1 mark for each formula).

2018/19/22/23

Answer

A

Accuracy = (True Positive + True Negative) / ALL, where ALL = True Positive + True Negative + False Positive + False Negative

F1 = (2 * Precision * Recall) / (Precision + Recall)

Precision = True Positive / (True Positive + False Positive)

Recall = True Positive / (True Positive + False Negative)
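
The four formulas can be checked with a short sketch. The function name `metrics` and the example counts are made up for illustration; returning 0 when a denominator is zero is a common convention, assumed here rather than taken from the notes.

```python
def metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1

# Made-up counts: 8 TP, 85 TN, 5 FP, 2 FN (100 samples in total).
accuracy, precision, recall, f1 = metrics(tp=8, tn=85, fp=5, fn=2)
```

With these counts accuracy is 93/100, precision is 8/13, recall is 8/10 and F1 is 16/23.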

Question 10

Q

Discuss the different behaviour of F1-Score with respect to “Accuracy” in the case of a dummy binary supervised classifier that, given any application as input, predicts always ”0” (benign), and never ”1” (malicious); consider a testing dataset of 1000 apps, with 990 app having ground truth label “benign”, and 10 app having ground truth label “malicious”. In particular, for the comparison, refer to the following definition of Accuracy:

Accuracy = (T P + T N) / (T P + F P + T N + F N)
where TP are True Positives, FP are False Positives, TN are True Negatives, and FN are False Negatives.

[5 marks]

2018/19/22/23

Answer

A

Accuracy:
Total = 1000
TP = 0, predicted malicious actually malicious: the classifier never predicts malicious
TN = 990, predicted benign actually benign
FP = 0, predicted malicious actually benign: the classifier never predicts malicious
FN = 10, predicted benign actually malicious

Accuracy = (0 + 990) / (0 + 0 + 990 + 10) = 990/1000 = 0.99

F1-Score:
The F1-score is calculated using precision and recall. Precision is the ratio of true positives to the sum of true positives and false positives. Recall is the ratio of true positives to the sum of true positives and false negatives.

Precision = TP / (TP + FP) = 0 / (0 + 0), undefined (0/0), taken as 0 by convention
Recall = TP / (TP + FN) = 0 / (0 + 10) = 0

F1 = 2 * Precision * Recall / (Precision + Recall) = 0, since there are no true positives (equivalently, F1 = 2TP / (2TP + FP + FN) = 0 / 10 = 0)

Comparison:
- Accuracy gives a high value (0.99), suggesting that the classifier performs well in terms of overall correctness. However, this metric is misleading here because it does not account for the class imbalance (the majority class “benign” dominates the accuracy).
- F1-score, on the other hand, is 0 due to the absence of true positives. This exposes that the classifier never detects the minority class “malicious”, which makes F1-score the more informative metric on imbalanced datasets.
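
The scenario can be reproduced by counting the confusion matrix directly. This is a minimal sketch, assuming label 1 = malicious is the positive class; the variable names are illustrative, and F1 is computed in the equivalent form 2TP / (2TP + FP + FN).

```python
# Ground truth: 990 benign (0) and 10 malicious (1) apps.
y_true = [0] * 990 + [1] * 10
# Dummy classifier: always predicts benign.
y_pred = [0] * len(y_true)

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = (tp + tn) / len(pairs)
denominator = 2 * tp + fp + fn
f1 = 2 * tp / denominator if denominator else 0.0

print(tp, tn, fp, fn, accuracy, f1)
```

All ten malicious apps end up as false negatives, so accuracy stays at 0.99 while F1 drops to 0.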

Question 11

Q

Discuss what are True Positive (TP), True Negatives (TN), False Positive (FP), False Negatives (FN) in the context of a malware classifier that classifies an input sample as malicious or benign. [8 marks]

2023

Answer

A

TP = binary malware, system malware - CORRECT MALICIOUS - Malware labeled as Malware
TN = binary not malware, system not malware - CORRECT BENIGN - A normal program labeled as safe

FP = binary not malware, system malware (Type 1 Error) - INCORRECT MALICIOUS - A normal program labeled as Malware
FN = binary malware, system not malware (Type 2 Error) - INCORRECT BENIGN - Malware labeled as safe

Question 12

Q

Describe the high-level view of the algorithm with an example picture in
2D space, showing the decision boundary, some feature points, and the
support vectors (4 marks).

2018/19

Answer

A

|  .   .      / / /    *    *
|    .      ./ / /      *
|  .    .   / / /    *     *
|     .    / / /*        *
| .    .  / / /     *     *
|____________________________

decision boundary / hyperplane = the middle line
margins = the two outer lines, parallel to the decision boundary
support vectors = the feature points that lie on the margins
feature points = . (one class) and * (the other class)

Question 13

Q

Describe the Radial Basis Function (RBF) non-linear kernel through a
visual example in a 2D graph (3 marks).

2018/19

Answer

A

Same as a normal (linear) SVM, but the decision boundary is a circle: the inner circle is one class, the outer circle another class.

You can conceptualize an SVM with the RBF kernel as generating a smoothed linear combination of spheres around each data point x:

Inside each sphere, the classification is the same as x, while outside is assigned the opposite class.
The parameter gamma determines the radius of these spheres, i.e., the proximity required to classify a point similarly (the larger gamma, the smaller the radius).
The parameter C determines the degree of “smoothing.”
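
The kernel itself is K(x, y) = exp(-gamma * ||x - y||^2). The sketch below only evaluates this kernel on two concentric rings of points; it is not an SVM. The point coordinates, the 0.05 threshold and the function name `rbf_kernel` are assumptions picked for the example.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF similarity: 1 for identical points, decaying towards 0 with distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

centre = (0.0, 0.0)
inner = [(1, 0), (0, 1), (-1, 0), (0, -1)]     # one class, radius 1
outer = [(3, 0), (0, 3), (-3, 0), (0, -3)]     # other class, radius 3

# No straight line separates the two rings, but similarity to the centre does.
inner_scores = [rbf_kernel(p, centre) for p in inner]   # each exp(-1)
outer_scores = [rbf_kernel(p, centre) for p in outer]   # each exp(-9)

# A larger gamma shrinks the "sphere": the same inner point now looks far away.
narrow = rbf_kernel((1, 0), centre, gamma=10.0)         # exp(-10)
```

Thresholding the scores at 0.05 puts every inner point on one side and every outer point on the other, which is the circular boundary described above.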

Question 14

Q

With the aid of diagrams, show and explain the main differences between overwriting viruses, companion viruses, and parasitic viruses (in all their forms). [9 marks]

2019/22/23

Answer

A

Overwriting viruses:
- overwrite a section of the program file with the virus code
- infect host file with bigger/smaller file size
- may break the infected program

Companion viruses:
- don’t overwrite or infect files, but create new files with same name as the legitimate one but with a different file extension
- so when user runs legit file, companion file also executes
- in DOS if no extension is given order of execution is COM, EXE then BAT
- e.g. the virus creates a COM file with the same name as an existing EXE file

Parasitic viruses:
- prepending: insert virus code at beginning of executable then shift original code to follow virus
- appending: append virus code to executable, insert JMP at beginning of executable
- fragmenting: virus code is intermixed with the original code

Question 15

Q

Briefly explain what a phishing attack is. [2 marks]

2019/23

Answer

A

Phishing is the act of impersonating a legitimate entity, typically a web site associated with a business, to obtain information such as passwords, credit card numbers, and other private information without authorisation.

The attackers create a website that displays a page that looks like it belongs to a legitimate entity, e.g. a bank.
The user is fooled into thinking this is the legit page for the bank.
The attacker sends an email instructing victims to click on an enclosed link that will take them to the fake page.
The link text shows the real bank’s URL, but the underlying link points to the fake bank’s site.
The user enters personal credentials and the attacker saves them for later use.

Question 16

Q

Explain how an homograph attack on websites works and describe possible ways to detect and prevent it (on the client side). [6 marks]

2019/22

Answer

A

A homograph attack is a phishing technique that uses look-alike characters from different scripts to create fake URLs that appear legitimate. For example, the Latin ‘a’ (U+0061) can be replaced with the Cyrillic ‘а’ (U+0430), making “раypal.com” look like “paypal.com”.

Detection:
- Check URLs Closely: Look for unusual characters. Hover over links to see the actual URL.
- Use Browser Features: Enable Punycode display in browsers. Install security extensions like Netcraft Anti-Phishing.

Prevention:
- Anti-Phishing Tools: Use browser security settings like Google Safe Browsing. Install antivirus software with anti-phishing features.
- Stay Updated: Keep browsers and security tools up to date. Use the latest browser versions.
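
A client-side check can be sketched with the standard library. This is a minimal heuristic, assuming that a label mixing scripts is suspicious; the function names are illustrative. It does not catch names written entirely in one look-alike script, and real browsers apply more elaborate rules.

```python
import unicodedata

def scripts_in(label):
    """Scripts used by the letters of a label (first word of each Unicode name)."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

def looks_like_homograph(hostname):
    """Flag a hostname if any dot-separated label mixes scripts."""
    return any(len(scripts_in(label)) > 1 for label in hostname.split("."))

def to_punycode(hostname):
    """ASCII (Punycode) form, which a browser can display instead of the Unicode form."""
    return hostname.encode("idna").decode("ascii")

spoofed = "p\u0430ypal.com"    # second letter is CYRILLIC SMALL LETTER A
genuine = "paypal.com"

print(looks_like_homograph(spoofed), to_punycode(spoofed))
print(looks_like_homograph(genuine), to_punycode(genuine))
```

The spoofed name is flagged and its Punycode form starts with `xn--`, which makes the substitution visible. The genuine name passes through unchanged.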

Question 17

Q

Explain what a remote DLL injection attack is and describe in detail all the steps performed during this attack on a Windows host. Note: the exact name of the API calls are not needed. [10 marks]

2019/23

Answer

A

The target process is forced to load a malicious DLL into its process memory space.

Steps:
* identify the target process, get its process ID and open a handle to it (OpenProcess())
* allocate memory in the target process (VirtualAllocEx()) with read/write permissions
* copy the path name of the malicious DLL into the allocated memory (WriteProcessMemory())
* determine the address of LoadLibrary() in kernel32.dll (GetProcAddress())
* create a thread in the target process (CreateRemoteThread()) whose start address is LoadLibrary() and whose argument is the allocated memory holding the path string, so the target process loads the DLL
* once the thread has finished, deallocate the memory and close the handle to the target process

Question 18

Q

Briefly describe code obfuscation and three anti-static analysis techniques. [4 marks]

2019/22/23

Answer

A

Code obfuscation is one of the limits of static analysis. Obfuscation refers to techniques that preserve the program’s semantics and functionality while, at the same time, making it more difficult for the analyst to extract and comprehend the program’s structure.

Three anti-static analysis techniques:
- Junk insertion: introduces disassembly errors by inserting junk bytes at selected locations in the code where the disassembler expects code (locations not reachable at run-time)
- Branch function: modifies the normal behaviour of a function call; direct jumps are replaced by calls to a branch function that computes the real target and transfers control there instead of returning to the call site
- Overlapping instructions: the same bytes are shared by more than one instruction, e.g. by jumping into the middle of another instruction

Question 19

Q

Describe with examples four ways in which a malware can detect a sandbox. [16 marks]

2023

Answer

A

Find processes: VBoxService.exe, vmtoolsd.exe
Find files or devices: \Device\VBoxMouse
Detect available libraries: LoadLibrary(‘VBoxOGL.dll’)
Detect BIOS version
Detect disk description: IOCTL_STORAGE_QUERY_PROPERTY, IOCTL_SCSI_MINIPORT
Detect disk size: IOCTL_DISK_GET_DRIVE_GEOMETRY
Detect guest tools
Find windows: FindWindow(‘VBoxTrayToolWnd’)
Check registry keys: HKLM\HARDWARE\Description\System\VideoBiosVersion
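
The process-name check is plain string matching. A minimal sketch, assuming the list of running process names has already been obtained (the function name `vm_processes` and the sample lists are made up); an analyst can run the same check to see which artifacts a sandbox image exposes.

```python
# Guest-tool process names taken from the list above, lower-cased for matching.
VM_PROCESS_NAMES = {"vboxservice.exe", "vmtoolsd.exe"}

def vm_processes(running):
    """Return the running process names that reveal a virtualised guest."""
    return sorted(name for name in running if name.lower() in VM_PROCESS_NAMES)

sandbox_like = ["explorer.exe", "VBoxService.exe", "svchost.exe"]
bare_metal_like = ["explorer.exe", "svchost.exe"]

print(vm_processes(sandbox_like))
print(vm_processes(bare_metal_like))
```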

Question 20

Q

[SKIP]

Briefly describe the main phases of a typical crypto-ransomware infection. [6 marks]

2019

Answer

A

Delivery:
- How it happens: Delivered via phishing emails, malicious attachments, or exploit kits.
- Example: Email with a fake invoice attachment.
Installation:
- What happens: Malware exploits vulnerabilities and installs itself.
- Example: Uses an exploit to gain admin privileges.
C2 Communication:
- Purpose: Connects to a server to get encryption keys.
- Example: Sends system info and retrieves a key.
File Encryption:
- Action: Encrypts files with strong algorithms.
- Example: Locks documents and appends a new extension.
Ransom Demand:
- Notification: Displays a ransom note demanding payment.
- Example: Asks for Bitcoin to decrypt files.
Cleanup:
- Final steps: Deletes itself and system backups.
- Example: Removes traces and backups to prevent recovery.

Question 21

Q

[SKIP]

Describe at high-level the generic encryption process performed by a hybrid encryption crypto-ransomware, and provide the pseudocode of the algorithm. [9 marks]

2019

Answer

A

Cryptographic hash functions, unlike checksum algorithms, are resilient against collision attacks and do not cause a lot of false positives:
* they take a long time to compute
* easily evaded: a simple change in the input file can generate a totally different hash value

File Selection: Identify and select files to encrypt based on specific criteria (e.g., file types, locations).

Key Generation:
Symmetric Key: Generate a unique symmetric key (e.g., AES key) for encrypting the files.
Asymmetric Key: Use the attacker’s public key for encrypting the symmetric key.

File Encryption:
Encrypt each selected file using the symmetric key and a symmetric encryption algorithm (e.g., AES).

Symmetric Key Encryption:
Encrypt the generated symmetric key using the attacker’s public key with an asymmetric encryption algorithm (e.g., RSA).

Key Storage:
Save the encrypted symmetric key in a location accessible to the attacker (e.g., appended to the encrypted file or saved in a specific file).

Ransom Note:
Display or create a ransom note informing the user of the encryption and the demand for payment to retrieve the decryption key.

Question 22

Q

Explain in details the goals of a dynamic analysis system (malware analysis sandbox)? [18 marks]

2017/22

Answer

A

Sandbox is a security mechanism for separating running programs:
* often used to execute untested or untrusted programs or code
* no (or limited) risk to harm the host machine or operating system
* provides a tightly controlled set of resources for guest programs to run in
* it can be based on virtualization (which can be used to emulate something)

Visibility: a sandbox must see as much as possible of the execution of a program; otherwise, it risks missing interesting, potentially malicious, behaviours.
Resistance to detection: monitoring should be hard to detect and the environment hard to fingerprint.
Scalability: with 500,000+ malware samples per day, analysis must scale up. The execution of one sample must not interfere with the execution of subsequent malware programs, and analyses should be automated.

Question 23

Q

What are the difference between emulation and virtualisation? [8 marks]

2017

Answer

A

Virtualization:
* a level of abstraction
* resource virtualization (e.g., virtual memory, RAID, storage virtualization, overlay networks)
* platform virtualization (e.g., emulation/simulation)
* application-level (Java VM)
* OS-level (FreeBSD Jails)
* HW-level (VMM Type I)

An emulator duplicates the functions of a system A using a system B that behaves like A (e.g., MAME)

A simulator is a system designed to provide a realistic imitation of an abstract model of a system (e.g., mathematical or physical model)

Question 24

Q

What are the implications of using emulation or virtualisation in terms of
advantages and disadvantages in a (dynamic) malware analysis sandbox
context? [8 marks]

2017

Answer

A

Pros:
* automate the whole analysis process
* process high volumes of malware
* get the actual executed code
* can be very effective if used smartly

Cons:
* can be expensive
* some portions of the code might not be triggered
* environment could be detected

Question 25

Q

bool IsPasswordOK(void) {
    char Password[12];
    char GoodPass[12] = readPasswdFromPasswdFile();

    puts("Enter password: ");
    gets(Password);
    return 0 == strcmp(Password, GoodPass);
}
int main(void) {
    bool PwStatus = false;

    do {
        PwStatus = IsPasswordOK();
    } while (PwStatus == false);
    puts("Access granted!");
}

Briefly explain why this program has a security vulnerability. Note that
readPasswdFromPasswdFile() does not perform any checks. It just reads the file
and returns the contents of the file. [4 marks]

2023

Answer

A

Use of gets(): This function leads to a buffer overflow risk because it does not limit the number of characters read.

Potential for Buffer Overflow: User input exceeding the buffer size can overwrite memory, leading to potential security exploits.

By replacing gets() with a safer alternative like fgets() and ensuring the input length does not exceed the buffer size, the vulnerability can be mitigated.

A buffer overflow occurs when data is written outside the boundaries allocated to a variable.
This can have serious consequences for the security and reliability of a program: an attacker can modify the values of variables or execute arbitrary code.
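
The effect of an unbounded copy can be simulated safely in Python. This is a sketch under a loud assumption: the two 12-byte buffers are modelled as adjacent in a 24-byte array with `Password` first, whereas the real stack layout depends on the compiler. The names `unsafe_copy` and `c_string` and the stored password are made up for the example.

```python
PASSWORD_OFFSET, GOODPASS_OFFSET, BUFFER_SIZE = 0, 12, 12

def unsafe_copy(memory, offset, data):
    """Model of gets(): copies all input plus a NUL terminator, no bounds check."""
    memory[offset:offset + len(data) + 1] = data + b"\x00"

def c_string(memory, offset):
    """Read a NUL-terminated string the way strcmp() would see it."""
    end = memory.index(0, offset)
    return bytes(memory[offset:end])

memory = bytearray(2 * BUFFER_SIZE)          # Password[12] then GoodPass[12]
memory[GOODPASS_OFFSET:GOODPASS_OFFSET + 7] = b"s3cret\x00"
good_before = c_string(memory, GOODPASS_OFFSET)

unsafe_copy(memory, PASSWORD_OFFSET, b"A" * 16)   # 16 bytes into a 12-byte buffer
good_after = c_string(memory, GOODPASS_OFFSET)

print(good_before, good_after)
```

The four extra bytes and the terminator land in the neighbouring buffer, so the stored password is silently replaced by attacker-chosen data.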

Question 26

Q

Demonstrate how the vulnerability can be exploited by providing an example of an input exploiting it, and by showing (with a diagram) how the process stack would look like during exploitation. [10 marks]

2023

Answer

A

see code injection via buffer overflow

Question 27

Q

Describe how static and dynamic analysis can help identify potentially harmful behaviours in an application (in general, not necessarily linked to the code example provided below in question 1). In your explanation include at least two shortcomings for each type of analysis. [10 marks]

2022

Question 28

Q

Explain in detail how system calls work on either a Linux or Windows system (i.e., choose only one operating system). Use diagrams to show the invocation and data flows. [7 marks]

2022

Answer

A

System Calls in Linux

A system call is a way for programs to interact with the operating system kernel. System calls provide a controlled interface for programs to request services such as file operations, process management, and network communication.

Here’s a detailed explanation of how system calls work on a Linux system:

Steps of a System Call

User Space and Kernel Space:
- User space: Where user applications run.
- Kernel space: Where the core of the operating system executes. Direct access to kernel space from user space is not allowed for security and stability.
Invocation:
- A system call is typically invoked by a library function in the C standard library (glibc on Linux). For example, calling open() in a C program.
- The library function sets up the system call by placing the system call number (which identifies the specific system call) and its arguments in the appropriate CPU registers.
Transition to Kernel Mode:
- The library function executes a special CPU instruction (e.g., int 0x80, syscall, or sysenter depending on the architecture) that triggers a transition from user mode to kernel mode.
- This transition involves switching the CPU mode, changing the stack, and jumping to a predefined location in the kernel code.
Kernel Execution:
- The CPU now executes the system call handler in the kernel. The kernel examines the system call number and arguments.
- The appropriate kernel function is called to perform the requested service (e.g., reading a file, writing to a network socket).
- After completing the requested operation, the kernel prepares the return value.
Transition Back to User Mode:
- The kernel sets up the CPU to return to user mode, often by placing the return value in a specific register.
- A special return-from-system-call instruction (e.g., iret, sysret, or sysexit) is executed to switch back to user mode and return control to the user program.
Continuation in User Space:
- The user program resumes execution with the return value of the system call.

Diagram of System Call Flow

Here’s a visual representation of the system call process:

```plaintext
+------------------+  (1)   +------------------+
|   User Program   | -----> |   glibc (libc)   |
+------------------+        +------------------+
                                     |
                                     | (2)
                                     v
                            +------------------+
                            | User Space       |
                            | - Registers set  |
                            | - syscall        |
                            |   instruction    |
                            +------------------+
                                     |
                                     | (3)
                                     v
                            +------------------+
                            | Kernel Space     |
                            | - System call    |
                            |   handler        |
                            +------------------+
                                     |
                                     | (4)
                                     v
                            +------------------+
                            | Kernel Service   |
                            | - File System    |
                            | - Network        |
                            | - Memory         |
                            +------------------+
                                     |
                                     | (5)
                                     v
+------------------+  (6)   +------------------+
| User Space       | <----- | Kernel Space     |
| - Return value   |        | - Return value   |
+------------------+        |   setup          |
                            +------------------+
```

Detailed Explanation of Diagram Steps

User Program:
- The user program initiates a system call by calling a function like open(), read(), or write().
glibc (libc):
- The C standard library provides wrapper functions for system calls. It prepares the system call by placing the system call number and arguments into CPU registers and then executes the system call instruction.
User Space:
- Execution continues in user space where the system call instruction is executed, triggering a transition to kernel space.
Kernel Space:
- The CPU switches to kernel mode, and the kernel’s system call handler takes over. The handler identifies the system call number, retrieves arguments, and calls the corresponding kernel service function.
Kernel Service:
- The actual work of the system call is performed by the kernel service (e.g., reading a file, sending data over a network).
Return to User Space:
- Once the kernel service completes, the kernel prepares the return value, executes the return-from-system-call instruction, and control is passed back to the user program with the result of the system call.

This detailed process ensures that user programs can safely and efficiently request services from the operating system, maintaining a clear boundary between user and kernel space to protect the system’s stability and security.
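
The user-space side of this flow can be observed from Python, whose `os` module functions are thin wrappers around the corresponding system calls. This is a minimal sketch assuming a POSIX-like system; it shows the return values handed back to user space, not the kernel internals.

```python
import os

# Each call below crosses into the kernel and comes back with a return value.
read_fd, write_fd = os.pipe()              # pipe system call: two file descriptors
written = os.write(write_fd, b"hello")     # write system call: bytes written
data = os.read(read_fd, 5)                 # read system call: the bytes read
os.close(read_fd)                          # close system call
os.close(write_fd)

print(written, data)
```

On Linux, running a script like this under a tracer such as `strace` lists the individual system calls as they are made.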

Question 29

Q

Compare Cross Site Scripting (XSS) and Cross Site Forgery Request (CSRF) attacks in terms of trust exploitation and their dependency on each other. [5 marks]

2022

Answer

A

XSS: Malicious scripts are injected into trusted websites.
An attacker uses a web application to send malicious code, typically in the form of a browser-side script, to a different end user.
The end user’s browser has no way to recognize that the script should not be trusted and will execute it.
Since the browser believes the script came from a trusted source, the malicious script can access any cookies, session tokens, or other sensitive information retained by the browser and used with that site.

CSRF: A malicious website, email, blog, instant message, or program causes a user’s web browser to perform an unwanted action on a trusted site where the user is currently authenticated.
CSRF attacks are used by an attacker to make a target system perform a function via the target’s browser without the target user’s knowledge.

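!!! Unlike XSS, which exploits the trust a user has for a particular site, CSRF exploits the trust that a site has in a user’s browser.

Dependency: CSRF does not need XSS to work, but an XSS flaw on the target site can be used to defeat anti-CSRF defences (e.g. by reading the anti-CSRF token). XSS does not depend on CSRF.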

Question 30

Q