PDF File Format / MS Office Flashcards
Name two main tools used to analyse PDF Files?
- PDF-ID counts the occurrences of certain keywords
- PDF-Parser actually understands the file format.
It can decode stream content with the “-f” option (e.g. /FlateDecode = zlib compression) and search for strings in the PDF file with “-s”.
Both of these are command-line Python scripts.
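To make the keyword-counting idea concrete, here is a minimal sketch of that kind of triage. It is not the real tool's implementation: the keyword list and the function name count_keywords are assumptions for illustration, and this version does not handle obfuscated names.

```python
import re

# Illustrative keyword list (an assumption, not the tool's actual list).
KEYWORDS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA",
            b"/ObjStm", b"/EmbeddedFile", b"/AcroForm"]

def count_keywords(data: bytes) -> dict:
    """Count how often each keyword occurs in the raw PDF bytes."""
    counts = {}
    for kw in KEYWORDS:
        # The lookahead stops /JS from matching a longer name such as /JSfoo.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts

sample = b"<< /OpenAction 5 0 R /JS (app.alert(1)) /JavaScript 7 0 R >> /JSfoo"
result = count_keywords(sample)
```

A non-zero count for /JavaScript, /JS, /OpenAction or /AA is only a hint about where to look next, not proof of malicious content.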
What does JavaScript need to be executed in a PDF?
JavaScript needs a trigger to execute automatically, typically an /OpenAction event (or an additional action, /AA).
What are PDF Filters?
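PDF filters are encodings or compressions applied to stream data. When several are chained, the producer applies them from right to left as listed in /Filter, and a reader decodes from left to right. E.g. for /Filter [/ASCIIHexDecode /FlateDecode] the data was first Flate-compressed and then hex-encoded, so it is decoded with /ASCIIHexDecode first and then /FlateDecode.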
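A small sketch of a filter chain may help. It assumes a stream that was Flate-compressed first and hex-encoded second, so its filter list reads /ASCIIHexDecode then /FlateDecode and a reader decodes in that listed order. The helper names (ascii_hex_decode, flate_decode, decode_stream) are hypothetical and only these two filters are covered.

```python
import binascii
import zlib

def ascii_hex_decode(data: bytes) -> bytes:
    """Decode /ASCIIHexDecode data: hex digits, whitespace ignored, '>' ends the data."""
    hex_digits = b"".join(data.split(b">", 1)[0].split())
    if len(hex_digits) % 2:
        hex_digits += b"0"  # an odd final digit is treated as if followed by 0
    return binascii.unhexlify(hex_digits)

def flate_decode(data: bytes) -> bytes:
    """Decode /FlateDecode data (zlib)."""
    return zlib.decompress(data)

DECODERS = {"/ASCIIHexDecode": ascii_hex_decode, "/FlateDecode": flate_decode}

def decode_stream(raw: bytes, filters: list) -> bytes:
    """Apply the decoders in the order the filters are listed."""
    for name in filters:
        raw = DECODERS[name](raw)
    return raw

# Build a sample stream: compress first, hex-encode second.
plain = b"app.alert('hello');"
encoded = binascii.hexlify(zlib.compress(plain)) + b">"
decoded = decode_stream(encoded, ["/ASCIIHexDecode", "/FlateDecode"])
```

Stacking several filters like this is a common way to hide script content from a plain string search.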
How can you get around JavaScript Obfuscation in PDF files?
- Wrap the JavaScript in a <script> tag inside an HTML page so it can be run and debugged in a browser
- Tools like Malzilla or Revelo
What are ObjectStreams?
An object stream (/ObjStm) is a special type of object: a stream that itself contains other objects. The idea is that multiple objects can be placed in one stream and the whole stream can be compressed. In practice a document will generally have several object streams that keep related items together (e.g. all objects for page 1, for page 2, etc.), so the PDF can still be accessed randomly with ease.
Can be analyzed like this: pdf-parser.py -s ObjStm -f -w 10.pdf | pdfid.py -f
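For intuition on what sits inside a decoded object stream, here is a minimal sketch. It assumes the stream has already been decompressed and that the /N (object count) and /First (offset of the first object) values were read from the stream dictionary. The decoded data starts with N pairs of "object number, offset" and the objects follow from /First. The function name split_object_stream is hypothetical.

```python
def split_object_stream(decoded: bytes, n: int, first: int) -> dict:
    """Split decoded /ObjStm data into {object number: object bytes}."""
    tokens = decoded[:first].split()
    numbers = [int(tok) for tok in tokens[: 2 * n]]
    pairs = list(zip(numbers[0::2], numbers[1::2]))  # (object number, offset)
    objects = {}
    for i, (obj_num, offset) in enumerate(pairs):
        start = first + offset  # offsets are relative to /First
        if i + 1 < len(pairs):
            end = first + pairs[i + 1][1]
        else:
            end = len(decoded)
        objects[obj_num] = decoded[start:end].strip()
    return objects

# Hand-built sample holding objects 11 and 12.
obj_a = b"<< /Type /Action >>\n"
obj_b = b"<< /S /JavaScript /JS (app.alert(1)) >>"
header = b"11 0 12 " + str(len(obj_a)).encode() + b"\n"
sample = header + obj_a + obj_b
parsed = split_object_stream(sample, n=2, first=len(header))
```

This is why keywords inside a compressed object stream stay invisible to a keyword count until the stream is decoded.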
Which tool would be more useful in analyzing a PDF file – a Hex Editor or a Text Editor?
A text editor: PDF is a largely text-based file format (although streams may contain binary data)
What are the main parts of a PDF file?
- Header
- Objects
- Cross Reference (Points to each object)
- Trailer (EOF, Offset to XREF)
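The four parts above can be located in raw bytes with a few searches. This is a rough sketch for a simple, uncompressed PDF with a classic cross-reference table; the function name locate_pdf_parts and the hand-built sample are assumptions for illustration.

```python
import re

def locate_pdf_parts(data: bytes) -> dict:
    """Find the header version, object count, xref offset and trailer/EOF positions."""
    header = re.search(rb"%PDF-(\d+\.\d+)", data[:1024])
    objects = re.findall(rb"(\d+)\s+(\d+)\s+obj\b", data)
    xref_offsets = re.findall(rb"startxref\s+(\d+)", data)
    return {
        "version": header.group(1).decode() if header else None,
        "object_count": len(objects),
        # The last startxref wins when a file has incremental updates.
        "xref_offset": int(xref_offsets[-1]) if xref_offsets else None,
        "trailer_offset": data.rfind(b"trailer"),
        "eof_offset": data.rfind(b"%%EOF"),
    }

# Minimal hand-built sample with one object.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_pos = len(body)
sample = (body
          + b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
          + b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
          + b"startxref\n" + str(xref_pos).encode() + b"\n%%EOF\n")
parts = locate_pdf_parts(sample)
```

The offset after startxref should point at the xref keyword, which is a quick sanity check on a suspicious file.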
What are you looking for in MS Office documents?
- Dropped Files (via Macro or exploit)
- Shellcode (via a vulnerability)
- URL callouts (acting as downloader)
- Malicious Scripts (download / dropping a file)
What is the approach to analysing MS Office documents?
- Locate potentially malicious embedded code
- Extract suspicious code from the file
- If shellcode - disassemble / debug
- If other script - deobfuscate
- Figure out the end goal and next stage of infection chain
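As a sketch of the first step (locating potentially malicious embedded code), the snippet below uses only the standard library to tell an OLE2 file from a ZIP-based Office file and to check whether the ZIP holds a vbaProject.bin part, which is where macros are stored. It does not replace the dedicated tools. The function name triage_office_file and the result keys are assumptions.

```python
import io
import zipfile

# Magic bytes at the start of an OLE2 compound file (legacy .doc/.xls/.ppt).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

def triage_office_file(data: bytes) -> dict:
    """Classify the container type and flag an embedded VBA project."""
    result = {"container": "unknown", "has_vba_project": False}
    if data.startswith(OLE_MAGIC):
        result["container"] = "ole2"
    elif zipfile.is_zipfile(io.BytesIO(data)):
        result["container"] = "ooxml-zip"
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            result["has_vba_project"] = any(
                name.lower().endswith("vbaproject.bin") for name in zf.namelist()
            )
    return result

# Build a tiny in-memory sample that mimics a macro-enabled document.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<w:document/>")
    zf.writestr("word/vbaProject.bin", b"\x00")
report = triage_office_file(buf.getvalue())
```

A positive hit only says where to extract from; the macro source still has to be pulled out and read.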
Name some tools to analyse MS Office documents?
- OfficeMalScanner
- OleDump
- OffVis
- python-oletools
How to run pdf-id?
python pdfid.py <filename>
How to run pdf-parser?
python pdf-parser.py <filename>
How to parse out (see content) individual objects from a PDF?
python pdf-parser.py -o 5 -c <pdffile>
- -o : Object number
- -c : Prints out content
How to parse out individual objects from a PDF with their filters (e.g. ASCII hex) decoded?
python pdf-parser.py -o 5 -f <pdffile>
- -o : Object number
- -f : Decodes filter
How can attackers hide malicious code in a PDF?
By obfuscating it with filters
How can you search for JavaScript within a PDF file?
python pdf-parser.py -s Javascript <filename>
- -s : searches for a string (here, JavaScript)
How to search for any OpenAction event in a PDF file?
python pdf-parser.py -s OpenAction <filename>
How can you analyse an ObjStm in a PDF?
python pdf-parser.py -s ObjStm -f -w 10.pdf | python pdfid.py -f
- -s : search for string
- -f : decode filters
- -w : Output in raw data format
- -f (pdfid) : force the scan, even without a proper PDF header
What is a HeapSpray attack?
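The idea of a heap spray is to fill the area of memory known as the heap with many copies of a NOP sled followed by shellcode, so that once the vulnerability triggers and execution jumps into the heap, it very likely lands in a NOP sled and slides into the shellcode. Almost all of the sprayed heap consists of NOP instructions.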
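From the analyst's side, a spray often shows up in extracted JavaScript as long runs of %u9090 inside an unescape() string. The sketch below converts such %uXXXX escapes to bytes (each escape is one 16-bit value stored little-endian) and measures the longest run of 0x90 (the x86 NOP). The function names are hypothetical and the sample string is made up.

```python
import re
import struct

def unescape_to_bytes(js_string: str) -> bytes:
    """Convert %uXXXX escapes to bytes, little-endian per 16-bit value."""
    out = bytearray()
    for hex_word in re.findall(r"%u([0-9A-Fa-f]{4})", js_string):
        out += struct.pack("<H", int(hex_word, 16))
    return bytes(out)

def longest_nop_run(data: bytes) -> int:
    """Return the length of the longest run of 0x90 bytes."""
    runs = re.findall(rb"\x90+", data)
    return max((len(run) for run in runs), default=0)

sample_js = "unescape('%u9090%u9090%u41cc')"
raw = unescape_to_bytes(sample_js)
nop_len = longest_nop_run(raw)
```

A long NOP run is a hint that the bytes after it are shellcode worth disassembling.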
How can you dump an object of a PDF?
python pdf-parser.py -o 5 -f -d exportname.exe <pdffile>
How can you attach a file to a PDF?
- /EmbeddedFile
- Append file after EOF
How to look for extra details in a PDF?
python pdfid.py -e filename.pdf
How to extract a file which is appended to a pdf?
python pdf-parser.py -x extracted.bin filename.pdf
- -x : file name to write the malformed (non-PDF) content to
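What "appended after EOF" means at the byte level can be shown with a short sketch: everything after the last %%EOF marker (and its line ending) is treated as appended data. The function name split_appended_data and the sample bytes are assumptions. If the appended data itself contains %%EOF, this simple version splits at the wrong place.

```python
def split_appended_data(data: bytes):
    """Return (pdf_part, appended_part), split after the last %%EOF marker."""
    marker = b"%%EOF"
    pos = data.rfind(marker)
    if pos == -1:
        return data, b""
    end = pos + len(marker)
    # Keep the end-of-line bytes that follow the marker with the PDF part.
    while end < len(data) and data[end:end + 1] in (b"\r", b"\n"):
        end += 1
    return data[:end], data[end:]

sample = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n%%EOF\n" + b"MZ\x90\x00payload"
pdf_part, extra = split_appended_data(sample)
```

A non-empty second part that begins with MZ suggests an appended Windows executable.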
What are AcroForms in PDFs?
AcroForms are designed to let a PDF include forms, just like a webpage would have. However, they can also trigger JavaScript.