PDF File Format / MS Office Flashcards

1
Q

Which two main tools are used to analyse PDF files?

A
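  • PDF-ID (pdfid.py) counts the occurrences of certain keywords (e.g. /JS, /JavaScript, /OpenAction)
  • PDF-Parser (pdf-parser.py) actually parses the file format (objects, dictionaries, streams).

PDF-Parser can decode stream content with the “-f” option (e.g. /FlateDecode = zlib compression) and search for strings in the PDF file with “-s”.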

Both of these are command-line Python scripts.
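
As a rough illustration of what PDF-ID does, the Python sketch below counts a few suspicious PDF names in the raw bytes of a file. It is not pdfid.py's actual code: the keyword list is an assumed subset, and unlike pdfid.py it does not normalise hex-obfuscated names such as /J#61vaScript, nor does it look inside compressed streams.

```python
import re

# Assumed subset of the names pdfid.py reports (illustrative only).
KEYWORDS = ["JS", "JavaScript", "OpenAction", "AA", "Launch",
            "EmbeddedFile", "AcroForm", "ObjStm"]


def count_keywords(data: bytes) -> dict:
    """Count PDF name objects such as /JS in the raw bytes of a file."""
    counts = {}
    for kw in KEYWORDS:
        # The lookahead stops /JS from matching inside a longer name like /JSx.
        pattern = rb"/" + kw.encode() + rb"(?![A-Za-z0-9#])"
        counts[kw] = len(re.findall(pattern, data))
    return counts
```

A non-zero /JS or /JavaScript count together with /OpenAction or /AA is the classic reason to look closer with pdf-parser.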

2
Q

What does JavaScript in a PDF need in order to be executed?

A

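JavaScript needs an action that triggers it, typically /OpenAction (runs when the document is opened) or /AA (additional actions, e.g. when a page is opened).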

3
Q

What are PDF Filters?

A

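Filters are encodings or compressions applied to stream data (e.g. /FlateDecode = zlib, /ASCIIHexDecode = hex). When several are listed in a /Filter array, they were applied from right to left when encoding. I.e. for /Filter [/ASCIIHexDecode /FlateDecode] the data was first compressed with /FlateDecode and then hex-encoded, so a reader decodes them left to right (ASCIIHex first, then Flate).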
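
A minimal Python sketch of filter chaining, assuming a stream whose dictionary says /Filter [/ASCIIHexDecode /FlateDecode]. Only these two filters are implemented (zlib stands in for /FlateDecode); other filters and /DecodeParms predictors are deliberately left out, and the function names are hypothetical.

```python
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """/ASCIIHexDecode: hex digits, whitespace ignored, '>' marks the end."""
    data = data.split(b">", 1)[0]
    digits = bytes(b for b in data if b not in b" \t\r\n\f\x00")
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return binascii.unhexlify(digits)


# Only two filters are supported in this sketch.
DECODERS = {
    "ASCIIHexDecode": ascii_hex_decode,
    "FlateDecode": zlib.decompress,
}


def decode_stream(raw: bytes, filters: list) -> bytes:
    """Apply decoders in the order listed in the /Filter array (left to right)."""
    for name in filters:
        raw = DECODERS[name](raw)
    return raw
```

Encoding ran right to left (Flate first, then ASCIIHex), which is why decoding simply walks the array from left to right.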

4
Q

How can you get around JavaScript Obfuscation in PDF files?

A
  • Wrap the extracted JavaScript in an HTML file between <script> and </script> tags and open it in a browser
  • Tools like Malzilla or Revelo
5
Q

What are ObjectStreams?

A

An ObjectStream (/ObjStm) is a special type of object: an object containing a stream that itself holds other objects. The idea is that multiple objects can be placed in one stream, and the whole stream can be compressed. In practice a document will generally have several object streams that keep related items together (e.g. all objects for page 1, page 2, etc.), so the PDF can still be accessed randomly. Because the objects inside are compressed, keywords such as /JavaScript do not show up in a simple scan of the raw file.

They can be analysed by decoding the object streams and counting keywords in the result: pdf-parser.py -s ObjStm -f -w 10.pdf | pdfid.py -f
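
The layout of an object stream's decoded data can be sketched in Python. Per the PDF specification, the stream dictionary holds /N (number of objects) and /First (byte offset of the first object), and the decoded data starts with N pairs of integers: object number and offset relative to /First. The sketch assumes the stream has already been decoded (e.g. with pdf-parser's -f) and that /N and /First were read from its dictionary; the function name is hypothetical.

```python
def split_object_stream(decoded: bytes, n: int, first: int) -> dict:
    """Map object numbers to the raw objects stored in a decoded /ObjStm."""
    # Header: n pairs of integers "objnum offset", offsets relative to /First.
    tokens = decoded[:first].split()
    if len(tokens) < 2 * n:
        raise ValueError("object stream header is shorter than /N pairs")
    pairs = [(int(tokens[2 * i]), int(tokens[2 * i + 1])) for i in range(n)]
    objects = {}
    for i, (num, offset) in enumerate(pairs):
        start = first + offset
        # Assumes offsets are in increasing order, so each object ends where
        # the next one begins (the last one runs to the end of the data).
        end = first + pairs[i + 1][1] if i + 1 < n else len(decoded)
        # Objects inside an ObjStm have no "obj"/"endobj" keywords.
        objects[num] = decoded[start:end].strip()
    return objects
```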

6
Q

Which tool would be more useful in analysing a PDF file: a hex editor or a text editor?

A

A text editor: PDF is a largely text-based file format, although compressed streams are binary.

7
Q

What are the main parts of a PDF file?

A
  • Header (e.g. %PDF-1.7)
  • Objects (the body)
  • Cross-reference table (xref: byte offset of each object)
  • Trailer (trailer dictionary, startxref offset to the xref table, %%EOF)
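
To see these parts in raw bytes, this Python sketch reads the header version, the startxref offset and any data after the last %%EOF. It assumes a classic cross-reference table and a header at offset 0; files with incremental updates or PDF 1.5+ cross-reference streams need more work. The function name and result keys are hypothetical.

```python
import re


def pdf_skeleton(data: bytes) -> dict:
    """Locate header version, last startxref offset and trailing data (sketch)."""
    # Assumes the header sits at offset 0 (real readers may tolerate bytes before it).
    header = re.match(rb"%PDF-(\d\.\d)", data)
    # The offset after the last startxref points at the most recent xref section.
    sx = data.rfind(b"startxref")
    xref_offset = int(data[sx + len(b"startxref"):].split()[0]) if sx != -1 else None
    eof = data.rfind(b"%%EOF")
    return {
        "version": header.group(1).decode() if header else None,
        "startxref": xref_offset,
        # A classic xref table starts with the keyword "xref"; PDF 1.5+
        # files may use a cross-reference stream object instead.
        "xref_found": xref_offset is not None
        and data[xref_offset:xref_offset + 4] == b"xref",
        # Non-whitespace bytes after the last %%EOF hint at appended data.
        "bytes_after_eof": len(data[eof + 5:].strip()) if eof != -1 else None,
    }
```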
8
Q

What are you looking for in MS Office documents?

A
  • Dropped Files (via Macro or exploit)
  • Shellcode (via a vulnerability)
  • URL callouts (acting as downloader)
  • Malicious Scripts (download / dropping a file)
9
Q

What is the general approach to analysing MS Office documents?

A
  1. Locate potentially malicious embedded code
  2. Extract suspicious code from the file
    1. If shellcode - disassemble / debug
    2. If other script - deobfuscate
  3. Figure out the end goal and next stage of infection chain
10
Q

Which tools can be used to analyse MS Office documents?

A
  • OfficeMalScanner
  • OleDump
  • OffVis
  • python-oletools
11
Q

How to run pdf-id?

A

python pdfid.py <filename>

12
Q

How to run pdf-parser?

A

python pdf-parser.py <filename>

13
Q

How to parse out (see content) individual objects from a PDF?

A

python pdf-parser.py -o 5 -c <pdffile>

  • -o : Object number
  • -c : Prints out content
14
Q

How to parse out an individual object whose stream is encoded with a filter (e.g. /ASCIIHexDecode)?

A

python pdf-parser.py -o 5 -f <pdffile>

  • -o : Object number
  • -f : Decodes the stream by passing it through its filters
15
Q

How can attackers hide malicious code in a PDF?

A

By obfuscating it with stream filters (several filters can be stacked)

16
Q

How can you search for JavaScript within a PDF file?

A

python pdf-parser.py -s JavaScript <filename>

  • -s : searches the objects for a string (here JavaScript)
17
Q

How to search for any OpenAction event in a PDF file?

A

python pdf-parser.py -s OpenAction <filename>

18
Q

How can you analyse an ObjStm in a PDF?

A

python pdf-parser.py -s ObjStm -f -w 10.pdf | python pdfid.py -f

  • -s : search for string (pdf-parser)
  • -f : decode filters (pdf-parser)
  • -w : output in raw data format (pdf-parser)
  • -f : force the scan even without a proper %PDF header (pdfid)
19
Q

What is a HeapSpray attack?

A

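A heap spray fills most of the area of memory known as the heap with many copies of a NOP sled followed by shellcode. When the vulnerability is triggered and execution jumps to a heap address, it very likely lands in a NOP sled and slides into the shellcode. In PDF exploits the spray is typically performed with JavaScript.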

20
Q

How can you dump an object of a PDF?

A

python pdf-parser.py -o 5 -f -d exportname.exe <pdffile>

  • -d : file to dump the (decoded) stream content to

21
Q

How can you attach a file to a PDF?

A
  1. /EmbeddedFile (embedded file stream)
  2. Append the file after the %%EOF marker
22
Q

How to look for extra details in a PDF with pdfid?

A

python pdfid.py -e filename.pdf

  • -e : display extra data, such as dates

23
Q

How to extract a file which is appended to a pdf?

A

python pdf-parser.py -x appended.bin filename.pdf

  • -x : file to extract malformed content (such as data after %%EOF) to
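
Appended data can also be carved out with a few lines of Python. This is a sketch under the assumption that the attached file starts right after the last %%EOF; the last marker is used because incremental updates append further sections that each end in %%EOF. The function name is hypothetical.

```python
def extract_after_last_eof(data: bytes) -> bytes:
    """Return whatever follows the last %%EOF marker (empty if nothing)."""
    eof = data.rfind(b"%%EOF")
    if eof == -1:
        raise ValueError("no %%EOF marker found")
    tail = data[eof + len(b"%%EOF"):]
    # A single end-of-line right after %%EOF is normal; drop it.
    if tail.startswith(b"\r\n"):
        tail = tail[2:]
    elif tail[:1] in (b"\r", b"\n"):
        tail = tail[1:]
    return tail
```

Writing the result to disk (e.g. with open(..., "wb")) and checking its magic bytes is the natural next step.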

24
Q

What are AcroForms in PDFs?

A

AcroForms allow a PDF to include interactive forms, just like a web page would have. However, they can also trigger JavaScript.

25
Q

What is an XML bomb?

A

An XML bomb is an XML document crafted to overload an XML parser, typically through nested entity definitions that expand exponentially. In the PDF case, the JavaScript below loads the document's XML metadata. Metadata is normally where you will find details such as the author, title, etc., but it can also include JS.

/JS (this.metadata;)
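
The arithmetic behind an XML bomb can be shown without building one. This Python sketch uses a hypothetical entity table in the style of the classic "billion laughs" payload (each entity refers ten times to the previous one) and computes the expanded size recursively instead of expanding it.

```python
from functools import lru_cache

# Hypothetical entity table: lol0 is the literal "lol",
# every lolN consists of ten references to lol(N-1).
ENTITIES = {"lol0": [("text", "lol")]}
for n in range(1, 10):
    ENTITIES[f"lol{n}"] = [("ref", f"lol{n - 1}")] * 10


@lru_cache(maxsize=None)
def expanded_length(name: str) -> int:
    """Characters produced by expanding an entity, computed without expanding it."""
    total = 0
    for kind, value in ENTITIES[name]:
        # Literal text counts its own length; a reference counts its expansion.
        total += len(value) if kind == "text" else expanded_length(value)
    return total
```

Nine levels of tenfold references turn a 3-character string into 3 × 10^9 characters (about 3 GB of text) from a definition well under a kilobyte, which is what exhausts the parser.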

26
Q

How can you run a phishing attack using a PDF?

A

By setting the JavaScript property app.fs.isFullScreen = true, which switches the viewer to full-screen mode.

Combined with AcroForms, this is well suited to a fake bank page.

27
Q

How can you embed a VBS script in a PDF and make it run when the PDF is opened?

A

By using the /Launch action (pdfid.py counts /Launch)

28
Q

How can you decrypt a PDF file?

A

qpdf --decrypt input.pdf output.pdf

(works without a password for files protected only by an owner password; if a user password is set, supply it with --password=<password>)

29
Q

Which online sandbox can analyse PDF files?

A

Joe Sandbox

30
Q

What are some advanced PDF attacks?

A
  • HeapSpray
  • Embedded EXE files
  • AcroForm
  • XML bomb
  • Full-screen phishing (app.fs.isFullScreen = true)
  • Encrypted PDF
31
Q

What are the two file formats used for Office documents?

A
  • OLE file format, also called Compound File Binary (.doc/.xls/.ppt)
  • Office Open XML, a ZIP archive of XML files (.docx/.xlsx/.pptx)
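
The two formats can be told apart by their first bytes. In this minimal Python sketch, OLE files begin with the Compound File Binary signature D0 CF 11 E0 A1 B1 1A E1, and Office Open XML files are ZIP archives (PK\x03\x04) that contain [Content_Types].xml. The function name and its return labels are hypothetical.

```python
import io
import zipfile

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary signature
ZIP_MAGIC = b"PK\x03\x04"                     # ZIP local file header


def office_container(data: bytes) -> str:
    """Guess the Office container type from a file's bytes (sketch)."""
    if data.startswith(OLE_MAGIC):
        return "OLE"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                # Every OOXML package carries a [Content_Types].xml part.
                if "[Content_Types].xml" in zf.namelist():
                    return "OOXML"
        except zipfile.BadZipFile:
            pass
        return "ZIP (not OOXML)"
    return "unknown"
```

The result tells you which tool family to reach for next: OLE-oriented tools such as oledump for the first, ZIP/XML inspection for the second.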