PDF File Format / MS Office Flashcards by Yves Sturzenegger

Name two main tools used to analyse PDF Files?

PDF-ID (pdfid.py) counts the occurrences of certain keywords.
PDF-Parser actually understands the file format.

It can decode content with “-f” option (/FlateDecode = Zlib compression). Can search for strings in PDF file with “-s”.

Both of these are command-line Python scripts.
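
To illustrate what "counting keywords" means, here is a minimal sketch in Python. It is not pdfid.py's implementation; the keyword list and the function name `count_keywords` are assumptions chosen for illustration, and the sketch ignores name obfuscation such as hex escapes (for example /J#53).

```python
import re

# Keywords often checked during triage. This short list is an assumption
# for illustration, not pdfid.py's actual list.
KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch",
            b"/EmbeddedFile", b"/ObjStm", b"/AcroForm"]


def count_keywords(data: bytes) -> dict:
    """Count how often each keyword appears as a complete PDF name."""
    counts = {}
    for kw in KEYWORDS:
        # Require that the next byte is not alphanumeric, so that /JS does
        # not also match a longer name such as /JSX.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts


sample = (b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj\n"
          b"2 0 obj << /S /JavaScript /JS (app.alert(1);) >> endobj\n")
print(count_keywords(sample))
```

A non-zero count for /JS, /OpenAction or /Launch is only a hint about where to look next with pdf-parser.py.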

What does JavaScript need in order to be executed in a PDF?

JavaScript needs a trigger to run automatically, typically an /OpenAction event (an /AA additional action can also trigger it).

What are PDF Filters?

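PDF filters encode a stream's data and can be chained. The /Filter array lists them in the order a reader applies them to decode, left to right, so the writer applied the encodings from right to left. E.g. for /Filter [/ASCIIHexDecode /FlateDecode] the data was first compressed (Flate) and then hex-encoded; decoding undoes /ASCIIHexDecode first and then /FlateDecode.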
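
The filter order can be sketched in Python. This is a simplified illustration that handles only two filters; the function name `decode_stream` is hypothetical. For /Filter [/ASCIIHexDecode /FlateDecode], the writer compressed first and hex-encoded second, so the reader undoes the filters in array order.

```python
import binascii
import zlib


def decode_stream(data: bytes, filters: list) -> bytes:
    """Apply the filters in /Filter array order (left to right)."""
    for name in filters:
        if name == "/ASCIIHexDecode":
            # Drop whitespace and the optional '>' end-of-data marker.
            hexdata = b"".join(data.split()).rstrip(b">")
            data = binascii.unhexlify(hexdata)
        elif name == "/FlateDecode":
            data = zlib.decompress(data)
        else:
            raise ValueError(f"filter not handled in this sketch: {name}")
    return data


# Writer side: compress first, then hex-encode (the reverse of decoding).
original = b"app.alert('hello');"
encoded = binascii.hexlify(zlib.compress(original)) + b">"

decoded = decode_stream(encoded, ["/ASCIIHexDecode", "/FlateDecode"])
print(decoded)
```

Swapping the two filter names in the list makes the decode fail, which is a quick way to see that the order matters.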

How can you get around JavaScript Obfuscation in PDF files?

Wrap the extracted JavaScript in <script> tags inside an HTML file so it can be run and inspected
Use tools like Malzilla or Revelo

What are ObjectStreams?

An ObjectStream (/ObjStm) is a special type of object: it contains a stream that itself holds other objects. The idea is that multiple objects can be placed in one stream, and the whole stream can be compressed. In practice a document will generally have several object streams that keep related items together (e.g. all objects for page 1, page 2, etc.), so the PDF can still be accessed randomly with ease.

Can be analyzed like this: pdf-parser.py -s ObjStm -f -w 10.pdf | pdfid.py -f
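
To show how objects sit inside a decoded object stream, here is a minimal Python sketch. It assumes the stream has already been decompressed and that `n` and `first` come from the stream dictionary's /N and /First entries; the function name `split_object_stream` and the sample objects are hypothetical. The decoded data starts with N pairs of integers (object number, offset relative to /First), followed by the objects themselves.

```python
def split_object_stream(decoded: bytes, n: int, first: int) -> dict:
    """Split a decoded /ObjStm body into {object number: raw object bytes}."""
    header = decoded[:first].split()
    # The header holds n pairs: object number, offset relative to `first`.
    pairs = [(int(header[i]), int(header[i + 1])) for i in range(0, 2 * n, 2)]
    objects = {}
    for idx, (objnum, offset) in enumerate(pairs):
        start = first + offset
        # An object ends where the next one starts, or at the end of data.
        end = first + pairs[idx + 1][1] if idx + 1 < n else len(decoded)
        objects[objnum] = decoded[start:end].strip()
    return objects


# Build a tiny decoded object stream holding objects 11 and 12.
obj_a = b"<< /Type /Page >>"
obj_b = b"<< /S /JavaScript /JS (app.alert(1);) >>"
header = b"11 0 12 %d " % (len(obj_a) + 1)
decoded = header + obj_a + b" " + obj_b

result = split_object_stream(decoded, n=2, first=len(header))
for number, raw in result.items():
    print(number, raw)
```

This is why keyword counts on the raw file can miss /JS or /OpenAction: they may only become visible after the object stream is decoded.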

Which tool would be more useful in analyzing a PDF file – a Hex Editor or a Text Editor?

Text: the structure is a text-based file format (although stream contents are often compressed or binary)

What are the main parts of a PDF file?

Header
Objects
Cross Reference (Points to each object)
Trailer (EOF, Offset to XREF)
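
The four parts can be located in the raw bytes with a few searches. The following Python sketch is a simplified illustration, not a real parser; the function name `locate_parts` and the tiny sample file are assumptions, and it ignores incremental updates and cross-reference streams.

```python
import re


def locate_parts(data: bytes) -> dict:
    """Find the header version, object count, and startxref offset."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    startxref = re.findall(rb"startxref\s+(\d+)", data)
    return {
        "version": header.group(1).decode() if header else None,
        # Objects begin with "<number> <generation> obj".
        "objects": len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data)),
        # The trailer's startxref value is the byte offset of the xref table.
        "xref_offset": int(startxref[-1]) if startxref else None,
        "has_eof": data.rstrip().endswith(b"%%EOF"),
    }


# A tiny hand-built file: header, one object, xref table, trailer.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_pos = len(body)
xref = b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
trailer = b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
tail = b"startxref\n" + str(xref_pos).encode() + b"\n%%EOF\n"
pdf = body + xref + trailer + tail

info = locate_parts(pdf)
print(info)
```

Because a reader starts from the end of the file (the startxref offset) and follows the cross-reference table to each object, the objects can be read in any order.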

What are you looking for in MS Office documents?

Dropped Files (via Macro or exploit)
Shellcode (via a vulnerability)
URL callouts (acting as downloader)
Malicious Scripts (download / dropping a file)

What is the approach to analysing MS Office documents?

Locate potentially malicious embedded code
Extract suspicious code from the file
1. If shellcode - disassemble / debug
2. If other script - deobfuscate
Figure out the end goal and next stage of infection chain

Name some tools to analyse MS Office documents?

OfficeMalScanner
OleDump
OffVis
python-oletools

How to run pdf-id?

python pdfid.py <filename>

How to run pdf-parser?

python pdf-parser.py <filename>

How to parse out (see content) individual objects from a PDF?

python pdf-parser.py -o 5 -c <pdffile>

-o : Object number
-c : Prints out content

How to parse out an individual object from a PDF with its stream filters decoded?

python pdf-parser.py -o 5 -f <pdffile>

-o : Object number
-f : Passes the stream through its filters (e.g. /FlateDecode, /ASCIIHexDecode)

How can hackers use PDF to put in malicious code?

Obfuscation by using filters

How can you search for JavaScript within a PDF file?

python pdf-parser.py -s Javascript <filename>

-s : searches for a string (here “Javascript”) in the indirect objects

How to search for any OpenAction event in a PDF file?

python pdf-parser.py -s OpenAction <filename>

How can you analyse an ObjStm in a PDF?

pdf-parser.py -s ObjStm -f -w 10.pdf | python pdfid.py -f

-s : search for string
-f : decode filters
-w : Output in raw data format
-f (pdfid.py) : force the scan, even without a proper PDF header

What is a HeapSpray attack?

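The idea of a HeapSpray is to fill the area of memory known as the heap with many copies of the shellcode, so that once the vulnerability triggers, execution is likely to land in attacker-controlled data and the exploit code runs. Almost all of the sprayed heap is filled with NOP instructions (a NOP sled) that lead into the shellcode.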

How can you dump an object of a PDF?

python pdf-parser.py -o 5 -f -d exportname.exe <pdffile>

How can you attach a file to a PDF?

/EmbeddedFile
Append file after EOF
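
Data appended after the end-of-file marker can be spotted with a simple byte search. The Python sketch below is a rough heuristic under stated assumptions: the function name `data_after_eof` and the sample bytes are hypothetical, and the search gives wrong results if the appended data itself contains the marker.

```python
def data_after_eof(data: bytes) -> bytes:
    """Return whatever follows the last %%EOF marker (empty if nothing)."""
    marker = b"%%EOF"
    pos = data.rfind(marker)
    if pos == -1:
        return b""
    # Skip the line ending that normally follows the marker.
    return data[pos + len(marker):].lstrip(b"\r\n")


clean = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\ntrailer\n<< >>\n%%EOF\n"
suspicious = clean + b"MZ\x90\x00fake-executable-bytes"

print(len(data_after_eof(clean)))       # nothing after the marker
print(data_after_eof(suspicious)[:2])   # first bytes of the appended data
```

A legitimate PDF with incremental updates contains several %%EOF markers, so only bytes after the last one are treated as extra data here.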

How to look for extra details in a PDF?

python pdfid.py -e filename.pdf

How to extract a file which is appended to a pdf?

python pdf-parser.py -x <outputfile> filename.pdf

-x : filename to extract malformed content to

What are AcroForms in PDFs?

AcroForms are designed to let a PDF include forms, just as a web page would. However, they can also trigger JavaScript.

What is an XML bomb?

An XML bomb is a message composed and sent with the intent of overloading an XML parser. The JavaScript loads the metadata of the PDF. Metadata is normally where you will find details such as the author, title, etc., but it can also include JS.

Footnote: **/JS (this.metadata;)**

How can you run a phishing attack using a PDF?

By setting the JS property **app.fs.isFullScreen = true**. Combined with **AcroForms**, it is well suited to a fake bank page.

How can you embed a VBS script to a PDF and make it run when you open the PDF?

By using the **/Launch** action (check with pdfid.py)

How can you decrypt a pdf file?

**qpdf --decrypt input.pdf output.pdf** *(works with owner-password-protected files)*

Name an online PDF sandbox?

**Joe Sandbox**

Name a few advanced PDF attacks?

* HeapSpray
* Embedded EXE files
* AcroForm
* XML bomb
* app.fs.isFullScreen = true
* Encrypted PDF

What are the two file formats used for Office documents?

* OLE file format (doc/xls/ppt)
* Office Open XML
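
The two container formats can be told apart by their first bytes. The Python sketch below is a minimal illustration; the function name `office_container` is hypothetical. OLE files start with the compound-file signature D0 CF 11 E0 A1 B1 1A E1, while Office Open XML files are ZIP archives and start with "PK". A ZIP signature alone does not prove the file is an Office document.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file signature
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header


def office_container(data: bytes) -> str:
    """Guess the container format from the leading bytes."""
    if data.startswith(OLE_MAGIC):
        return "OLE (doc/xls/ppt)"
    if data.startswith(ZIP_MAGIC):
        return "ZIP (possibly Office Open XML)"
    return "unknown"


print(office_container(OLE_MAGIC + b"\x00" * 8))
print(office_container(b"PK\x03\x04rest-of-archive"))
print(office_container(b"%PDF-1.4"))
```

The result decides which tools apply: OLE-oriented tools for the first case, and unzipping plus inspection of the XML parts for the second.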

(31 cards)