PDF File Format / MS Office Flashcards
Name two main tools used to analyse PDF Files?
- pdfid (pdfid.py) counts the occurrences of certain keywords (e.g. /JavaScript, /OpenAction)
- pdf-parser (pdf-parser.py) actually understands the file format.
It can decode stream content with the “-f” option (e.g. /FlateDecode = zlib compression) and search for strings in the PDF file with “-s”.
Both of these are command-line Python scripts.
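The keyword counting that pdfid performs can be approximated in a few lines of Python. This is a simplified sketch, not pdfid's actual code: the keyword list is an illustrative subset, PDF names are found with a naive regex over the raw bytes (so compressed streams and object streams are not searched), and #xx escapes are resolved so that an obfuscated /J#61vaScript still counts as /JavaScript.

```python
import re
from collections import Counter

# Illustrative subset of the names pdfid reports
KEYWORDS = ["/JS", "/JavaScript", "/AA", "/OpenAction", "/AcroForm",
            "/Launch", "/EmbeddedFile", "/ObjStm", "/Encrypt", "/XFA"]

# A PDF name: '/' followed by bytes that are neither whitespace nor delimiters
NAME_RE = re.compile(rb"/[^\x00\t\n\x0c\r ()<>\[\]{}/%]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(raw: bytes) -> str:
    """Resolve #xx escapes, so /J#61vaScript becomes /JavaScript."""
    decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_keywords(data: bytes) -> dict:
    """Count how often each keyword appears as a PDF name in the raw bytes."""
    names = Counter(normalize_name(m.group()) for m in NAME_RE.finditer(data))
    return {keyword: names[keyword] for keyword in KEYWORDS}


if __name__ == "__main__":
    sample = (b"<< /Type /Catalog /OpenAction 5 0 R >>\n"
              b"5 0 obj << /S /J#61vaScript /JS (app.alert(1)) >> endobj")
    for keyword, count in count_keywords(sample).items():
        print(f"{keyword:15} {count}")
```

Matching whole names (rather than substrings) keeps /JS and /JavaScript as separate counts, which is also how the two keywords appear in pdfid's report.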
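What does JavaScript need in order to be executed in a PDF?
JavaScript has to be triggered by an action. For automatic execution this is usually an /OpenAction (run when the document is opened); /AA (additional actions) and document-level scripts in the /Names dictionary can also trigger it.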
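To check whether an /OpenAction actually runs JavaScript, the referenced object has to be followed. The sketch below assumes an uncompressed PDF with an indirect reference such as /OpenAction 5 0 R; it does not handle inline action dictionaries, object streams, or the /AA and /Names triggers, and the helper name open_action_is_javascript is hypothetical.

```python
import re

# "/OpenAction 5 0 R": an indirect reference to the action object
OPENACTION_RE = re.compile(rb"/OpenAction\s+(\d+)\s+(\d+)\s+R\b")
# "N G obj ... endobj" (naive: stops at the first "endobj")
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def open_action_is_javascript(data: bytes) -> bool:
    """Return True if the /OpenAction target is a /S /JavaScript action."""
    ref = OPENACTION_RE.search(data)
    if ref is None:
        return False
    wanted = (int(ref.group(1)), int(ref.group(2)))
    for match in OBJ_RE.finditer(data):
        if (int(match.group(1)), int(match.group(2))) == wanted:
            # A JavaScript action dictionary has /S /JavaScript and a /JS entry
            return re.search(rb"/S\s*/JavaScript\b", match.group(3)) is not None
    return False


if __name__ == "__main__":
    sample = (b"1 0 obj\n<< /Type /Catalog /OpenAction 5 0 R >>\nendobj\n"
              b"5 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n")
    print(open_action_is_javascript(sample))
```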
What are PDF Filters?
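Filters are encodings (compression or ASCII encodings) applied to stream data and named in the stream's /Filter entry. Several filters can be chained in an array: a reader decodes them from left to right, so they were applied during encoding from right to left. E.g. /Filter [/ASCIIHexDecode /FlateDecode] means the data was first compressed with /FlateDecode and then hex-encoded, so a reader first undoes /ASCIIHexDecode and then /FlateDecode.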
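The decoding order can be illustrated with Python's standard library. This sketch implements only /ASCIIHexDecode and /FlateDecode (without predictor parameters) and assumes the /Filter array has already been read from the stream dictionary; it is not how pdf-parser is implemented.

```python
import binascii
import zlib

PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def ascii_hex_decode(data: bytes) -> bytes:
    """/ASCIIHexDecode: hex digit pairs, whitespace ignored, '>' ends the data."""
    digits = data.split(b">", 1)[0]
    digits = bytes(b for b in digits if b not in PDF_WHITESPACE)
    if len(digits) % 2:  # an odd final digit is treated as if followed by 0
        digits += b"0"
    return binascii.unhexlify(digits)


DECODERS = {
    "/ASCIIHexDecode": ascii_hex_decode,
    "/FlateDecode": zlib.decompress,  # plain zlib data, no /DecodeParms predictor
}


def decode_stream(data: bytes, filters: list) -> bytes:
    """Apply the /Filter array from left to right, as a PDF reader does."""
    for name in filters:
        data = DECODERS[name](data)
    return data


if __name__ == "__main__":
    script = b"app.alert('decoded');"
    # Encoding runs right to left: Flate-compress first, then hex-encode
    encoded = binascii.hexlify(zlib.compress(script)) + b">"
    print(decode_stream(encoded, ["/ASCIIHexDecode", "/FlateDecode"]))
```

Swapping the two names in the list makes zlib.decompress fail on hex text, which is a quick way to see why the order matters.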
How can you get around JavaScript Obfuscation in PDF files?
- Wrap the extracted JavaScript in <script></script> tags in an HTML page and run it in a browser
- Tools like Malzilla or Revelo
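The first approach can be sketched as a small Python helper that writes the extracted script into an HTML page. The harness is an assumption for illustration: it redefines eval so that each decoded layer is written to the page instead of executed. Only eval is intercepted (the rest of the script still runs), and Acrobat-specific objects such as app or this.getField do not exist in a browser, so scripts that rely on them need stubs.

```python
import re

HARNESS_HEAD = """<html><body>
<pre id="out"></pre>
<script>
// Show each string passed to eval instead of executing it
eval = function (code) {
  document.getElementById("out").textContent += code + "\\n\\n";
};
</script>
<script>
"""

HARNESS_TAIL = """
</script>
</body></html>
"""


def wrap_in_html(js_code: str) -> str:
    """Embed extracted PDF JavaScript in an HTML page for analysis in a browser."""
    # A literal "</script" in the payload would end the script element early
    safe = re.sub(r"</(script)", r"<\\/\1", js_code, flags=re.IGNORECASE)
    return HARNESS_HEAD + safe + HARNESS_TAIL


if __name__ == "__main__":
    extracted = 'var s = "app.al" + "ert(1)"; eval(s);'
    print(wrap_in_html(extracted))
```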
What are ObjectStreams?
An object stream (/ObjStm) is a special type of object: a stream that itself contains other objects. The idea is that multiple objects can be placed in one stream, and the whole stream can be compressed. In practice a document will generally have several object streams that keep related items together (e.g. all objects for page 1, page 2, etc.), which still allows the PDF to be accessed randomly.
Because the contained objects are compressed, a keyword scanner only sees them after decoding. They can be analyzed like this: python pdf-parser.py -s ObjStm -f -w 10.pdf | python pdfid.py -f
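Once pdf-parser -f has decoded an /ObjStm, its layout is simple: the stream starts with /N pairs of integers (object number, byte offset), and the offsets are relative to /First, the position where the first object begins. The sketch below splits a decoded stream along those offsets; the function name is hypothetical, and it assumes /N and /First were already read from the stream dictionary.

```python
import re


def parse_object_stream(decoded: bytes, n: int, first: int) -> dict:
    """Split a decoded /ObjStm stream into {object number: object bytes}.

    The stream begins with n pairs "object-number offset"; each offset is
    relative to `first`, the byte position where the first object starts.
    """
    numbers = [int(x) for x in re.findall(rb"\d+", decoded[:first])][: 2 * n]
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    objects = {}
    for i, (number, offset) in enumerate(pairs):
        start = first + offset
        # An object ends where the next one begins (or at the end of the stream)
        end = first + pairs[i + 1][1] if i + 1 < len(pairs) else len(decoded)
        objects[number] = decoded[start:end].strip()
    return objects


if __name__ == "__main__":
    body = b"<< /S /JavaScript /JS (app.alert(1)) >> (hello)"
    header = b"11 0 12 %d " % body.index(b"(hello)")
    print(parse_object_stream(header + body, n=2, first=len(header)))
```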
Which tool would be more useful in analyzing a PDF file – a Hex Editor or a Text Editor?
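A text editor: the PDF structure (header, objects, cross-reference table, trailer) is text based, although stream contents are often compressed binary data.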
What are the main parts of a PDF file?
- Header (%PDF-x.y)
- Objects
- Cross-reference table (xref: byte offset of each object)
- Trailer (/Root reference, offset of the xref table after startxref, %%EOF)
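The four parts can be seen by building a tiny PDF by hand. This sketch (helper names hypothetical) writes a header, two objects, a cross-reference table with the byte offset of each object, and a trailer; find_xref_offset then reads the number after the last startxref, which is how a reader locates the xref table from the end of the file.

```python
import re


def build_minimal_pdf() -> bytes:
    """Assemble header, objects, cross-reference table and trailer."""
    out = b"%PDF-1.4\n"                                        # header
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [] /Count 0 >>",
    ]
    offsets = []
    for number, body in enumerate(objects, start=1):            # objects
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)
    xref_offset = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for offset in offsets:                                       # 20-byte xref entries
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset             # trailer
    return out


def find_xref_offset(data: bytes) -> int:
    """Read the byte offset stored after the last 'startxref' keyword."""
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    if not matches:
        raise ValueError("no startxref/%%EOF found")
    return int(matches[-1])


if __name__ == "__main__":
    pdf = build_minimal_pdf()
    print(pdf.decode("latin-1"))
    print("xref table at byte", find_xref_offset(pdf))
```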
What are you looking for in MS Office documents?
- Dropped Files (via Macro or exploit)
- Shellcode (via a vulnerability)
- URL callouts (acting as downloader)
- Malicious Scripts (download / dropping a file)
What is the approach of analysing MS Office documents?
- Locate potentially malicious embedded code
- Extract suspicious code from the file
- If shellcode - disassemble / debug
- If other script - deobfuscate
- Figure out the end goal and the next stage of the infection chain
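For Office Open XML files, the first step (locating potentially malicious embedded code) can be partly scripted with the standard library, since the document is a ZIP package. This is a minimal sketch with an illustrative list of part names (macros in vbaProject.bin, embedded OLE objects, ActiveX controls) plus relationships marked TargetMode="External", which is where remote templates and other URL callouts are referenced. The function name is hypothetical, and dedicated tools such as python-oletools cover much more.

```python
import zipfile
import xml.etree.ElementTree as ET

# Illustrative markers for parts that can carry code or embedded objects
SUSPICIOUS_MARKERS = ("vbaProject.bin", "oleObject", "activeX", "/embeddings/")
REL_TAG = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def triage_ooxml(path: str) -> dict:
    """List suspicious parts and external relationship targets in an OOXML file."""
    findings = {"suspicious_parts": [], "external_targets": []}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if any(marker in name for marker in SUSPICIOUS_MARKERS):
                findings["suspicious_parts"].append(name)
            if name.endswith(".rels"):
                root = ET.fromstring(zf.read(name))
                for rel in root.iter(REL_TAG):
                    # External targets point outside the package (e.g. URLs)
                    if rel.get("TargetMode") == "External":
                        findings["external_targets"].append(rel.get("Target"))
    return findings
```

Anything reported here is only a lead: vbaProject.bin still has to be extracted and its VBA deobfuscated, as in the steps above.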
Name some tools to analyse MS Office documents?
- OfficeMalScanner
- OleDump
- OffVis
- python-oletools
How to run pdf-id?
python pdfid.py <filename>
How to run pdf-parser?
python pdf-parser.py <filename>
How to parse out (see content) individual objects from a PDF?
python pdf-parser.py -o 5 -c <pdffile>
- -o : Object number
- -c : Prints out content
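Selecting an object by number, as -o does, can be approximated with a regular expression over the raw file. This naive sketch (not pdf-parser's parser; the function name is hypothetical) finds every "N G obj ... endobj" block and returns the body of the last definition, since incremental updates append newer versions of an object later in the file. It does not look inside object streams and is confused by a literal endobj inside a string or stream.

```python
import re

# "N G obj ... endobj"; DOTALL lets the object body span several lines
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def get_object(data: bytes, number: int):
    """Return the body of the last definition of object `number`, or None."""
    found = None
    for match in OBJ_RE.finditer(data):
        if int(match.group(1)) == number:
            found = match.group(3).strip()  # a later (incremental) update wins
    return found


if __name__ == "__main__":
    pdf = (b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
           b"5 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n")
    print(get_object(pdf, 5))
```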
How to parse out individual objects from a PDF whose streams are encoded with filters (e.g. /ASCIIHexDecode, /FlateDecode)?
python pdf-parser.py -o 5 -f <pdffile>
- -o : Object number
- -f : Passes the stream through its filters (decodes it)
How can attackers use a PDF to hide malicious code?
By obfuscating it with (often chained) filters
How can you search for JavaScript within a PDF file?
python pdf-parser.py -s JavaScript <filename>
- -s : searches the objects for a string (here JavaScript)
How to search for any OpenAction event in a PDF file?
python pdf-parser.py -s OpenAction <filename>
How can you analyse an ObjStm in a PDF?
python pdf-parser.py -s ObjStm -f -w 10.pdf | python pdfid.py -f
- -s : search for string
- -f : decode filters
- -w : output in raw data format
- -f (pdfid.py) : force the scan, even without a proper %PDF header
What is a HeapSpray attack?
A heap spray fills large parts of the heap (in a PDF, typically from JavaScript) with many copies of a NOP sled followed by shellcode. When the vulnerability triggers and execution jumps to a roughly predictable heap address, it will most likely land in a NOP sled and slide into the shellcode.
How can you dump an object of a PDF?
python pdf-parser.py -o 5 -f -d exportname.exe <pdffile>
- -d : dump the (decoded) stream content to a file
How can you attach a file to a PDF?
- /EmbeddedFile
- Append the file after %%EOF
How to look for extra details in a PDF?
python pdfid.py -e filename.pdf
- -e : display extra data, like dates
How to extract a file which is appended to a pdf?
python pdf-parser.py -x extracted.bin filename.pdf
- -x : file to extract malformed content (e.g. data after %%EOF) to
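The idea behind spotting an appended file can be shown with a few lines of Python: anything after the last %%EOF marker is not part of the PDF structure. This is a hypothetical helper, not what pdf-parser does internally; if the appended payload itself contains %%EOF (for example a second PDF), it returns only the bytes after that marker.

```python
def data_after_last_eof(data: bytes) -> bytes:
    """Return the bytes that follow the last %%EOF marker (empty if none)."""
    position = data.rfind(b"%%EOF")
    if position == -1:
        return b""
    tail = data[position + len(b"%%EOF"):]
    # The end-of-line that normally follows %%EOF is not appended data
    for eol in (b"\r\n", b"\n", b"\r"):
        if tail.startswith(eol):
            return tail[len(eol):]
    return tail


if __name__ == "__main__":
    sample = b"%PDF-1.4\n...\nstartxref\n9\n%%EOF\nMZ\x90\x00rest-of-exe"
    extra = data_after_last_eof(sample)
    print(len(extra), extra[:2])  # a leading b"MZ" suggests a Windows executable
```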
What are AcroForms in PDFs?
AcroForms are designed to let a PDF include forms, just like a web page. However, they can also trigger JavaScript.
What is an XML bomb?
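An XML bomb is a small XML message (e.g. nested entity definitions that expand into a huge amount of data) composed and sent with the intent of overloading an XML parser. In a PDF, JavaScript such as the line below loads the document's XML metadata, so the viewer's XML parser has to process it. Metadata normally holds details such as the author and title, but it can also carry such a payload or JS.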
/JS (this.metadata;)
How can you run a phishing attack using a PDF?
By setting the JavaScript property app.fs.isFullScreen = true, which shows the document in full-screen mode without the viewer's interface.
Combined with AcroForms, this is perfect for a fake bank page.
How can you embed a VBS script to a PDF and make it run when you open the PDF?
By using the /Launch action (check for it with pdfid.py)
How can you decrypt a pdf file?
qpdf --decrypt input.pdf output.pdf
(works for files that are protected only by an owner password)
Name an online PDF sandbox?
Joe Sandbox
Name a few advanced PDF attacks?
- HeapSpray
- Embedded EXE files
- AcroForm
- XML bomb
- app.fs.isFullScreen = true
- Encrypted PDF
What are the two file formats used for Office documents?
- OLE / Compound File Binary format (.doc/.xls/.ppt)
- Office Open XML (ZIP-based: .docx/.xlsx/.pptx)
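The two formats can be told apart by their first bytes: OLE files start with the Compound File Binary signature D0 CF 11 E0 A1 B1 1A E1, while Office Open XML files are ZIP archives (starting with "PK", 50 4B 03 04) that contain a [Content_Types].xml part. A minimal sketch with a hypothetical function name; it does not distinguish Word, Excel and PowerPoint.

```python
import zipfile
from io import BytesIO

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary signature
ZIP_MAGIC = b"PK\x03\x04"


def office_format(data: bytes) -> str:
    """Return 'ole', 'ooxml' or 'unknown' based on the file content."""
    if data.startswith(OLE_MAGIC):
        return "ole"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(BytesIO(data)) as zf:
                # Every OOXML package has this part at its root
                if "[Content_Types].xml" in zf.namelist():
                    return "ooxml"
        except zipfile.BadZipFile:
            pass
    return "unknown"
```

Checking the content rather than the extension matters here, because a renamed .doc that is really OOXML (or the reverse) is a common trick.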