Lect19 - Bulk Extractor Flashcards
What are the physical search limitations?
- won’t look in compound documents
- ignores special encoding
- cannot see inside archives
A physical search is best used when ASCII text hits are expected.
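The limitations above can be demonstrated with plain grep. This is a minimal sketch, assuming grep, gzip, and mktemp are available; the helper name physical_hits, the file names, and the sample text are made up for illustration.

```shell
# Count lines with raw-byte (physical) hits for a term in a file.
physical_hits() {
    # -a: treat binary data as text; -c: print the number of matching lines.
    # grep exits non-zero on zero hits, so "|| true" keeps strict shells happy.
    grep -a -c -- "$1" "$2" || true
}

workdir=$(mktemp -d)

# Made-up evidence: 20 lines of plain ASCII text.
i=0
while [ "$i" -lt 20 ]; do
    echo 'shipment of Uranium-235 arrives Friday'
    i=$((i + 1))
done > "$workdir/note.txt"

# The same content wrapped in a gzip archive.
gzip -c "$workdir/note.txt" > "$workdir/note.txt.gz"

plain_hits=$(physical_hits 'Uranium-235' "$workdir/note.txt")
packed_hits=$(physical_hits 'Uranium-235' "$workdir/note.txt.gz")
echo "plain: $plain_hits, compressed: $packed_hits"
```

The plain file reports a hit on every line, while the compressed copy should report none, because the search only sees the encoded bytes.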
Name a few features that bulk_extractor can identify.
- Account numbers
- Domains
- Email accounts
- IP addresses
What data can be processed with bulk_extractor?
- Disk image
- Directories
BE is data format agnostic.
Like our grep and egrep commands for physical searches, bulk_extractor also searches the entire disk or image:
- no awareness of allocated files vs. unallocated file system data units (clusters, etc.)
- works on files, disk images, blocks of data
- does not parse file system data or attempt to align data recovery by sector
- but, unlike a physical search, works on embedded data, compound documents, and archives
What is a scanner?
Scanners are the modules BE uses to locate and report features. For example, the accts scanner looks for credit card numbers, track 2 information, and phone numbers. Its results are stored in the following feature files: ccn, ccn_track2, telephone.
Explain the following scanners:
- accts
- email
- exif
- find
- gps
- facebook
- wordlist
- accts: searches for credit card numbers, track data, phone numbers, and other numbers
- email: searches for headers, cookies, hostnames, IPs, emails, and URLs
- exif: finds images and their metadata
- find: used for finding specific regular expressions
- gps: finds Garmin-formatted XML containing GPS coordinates
- facebook: finds Facebook HTML
- wordlist: creates a word list
What does a standard BE command look like?
# bulk_extractor -o bulk_out myimage.E01
- -o : Directory to write the results (bulk_extractor will create this)
- -e : Enable a scanner
- -E : Disable ALL scanners except the one named
- -x : Disable a scanner
- -b : Add banner text to report
- -F [filename] : File containing the search term list
Create a BE command that disables all scanners except ZIP and FIND and searches for “Uranium-235”.
bulk_extractor -E zip -e find -f "Uranium-235" -o blk_out <imagefile>
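To make the relationship between -E and -e explicit, here is a small dry-run sketch. The helper be_only is hypothetical (it is not part of bulk_extractor) and only prints a command line; it uses just the -E, -e, -f, and -o options described above.

```shell
# Hypothetical helper: print (do not run) a bulk_extractor command that
# leaves only the listed scanners enabled and searches for one term.
# Usage: be_only OUTDIR IMAGE TERM SCANNER [SCANNER...]
be_only() {
    outdir=$1
    image=$2
    term=$3
    shift 3
    cmd="bulk_extractor -E $1"   # first scanner: disable everything else
    shift
    for scanner in "$@"; do
        cmd="$cmd -e $scanner"   # each further scanner is re-enabled
    done
    echo "$cmd -f '$term' -o $outdir $image"
}

be_only blk_out myimage.E01 'Uranium-235' zip find
```

Printing the command first makes it easy to review the scanner selection before pointing the tool at a real image.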
What are the three output files?
- Feature files: Files that contain the output of each scanner.
- Histogram files: Files that show how often each item in a feature file is encountered.
- The report file: A DFXML formatted report of the output and environment.
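The link between a feature file and its histogram can be sketched with awk and sort. This assumes the feature file is tab-separated (offset, feature, context) with comment lines starting with '#', and that histogram lines take the form n=COUNT followed by the feature. The sample data and the function name feature_histogram are made up.

```shell
# Build a frequency histogram from a tab-separated feature file.
feature_histogram() {
    awk -F '\t' '
        /^#/ { next }                 # skip comment and header lines
        { count[$2]++ }               # column 2 holds the feature itself
        END { for (f in count) printf "n=%d\t%s\n", count[f], f }
    ' "$1" | sort -t '=' -k 2,2nr     # most frequent feature first
}

# Made-up sample feature file: offset, feature, context.
sample=$(mktemp)
{
    printf '# made-up sample feature file\n'
    printf '1024\talice@example.com\tmail from alice@example.com to bob\n'
    printf '2048\tbob@example.com\tcc: bob@example.com\n'
    printf '4096\talice@example.com\tsigned alice@example.com\n'
} > "$sample"

feature_histogram "$sample"
```

bulk_extractor writes its own histogram files, so a sketch like this is mainly useful for understanding the output or for re-counting a filtered feature file.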
What are stop lists?
Every operating system and the external software we use ships with help files, manuals, and other documentation that contain email addresses, telephone numbers, and web addresses. These are uninteresting, but they still end up in your bulk_extractor feature files and histograms. Such false positives can be limited by using stop lists. A stop list can be a simple list of terms (or terms with context) that are blocked from the regular scanner feature files but still reported in special stopped.txt files for each scanner. Example: -w stoplist.txt
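The idea behind a stop list can be illustrated outside the tool. The sketch below is not how bulk_extractor implements -w; it only mimics the effect on a made-up, tab-separated feature file (offset, feature, context). The function name apply_stoplist and the output names kept.txt and stopped.txt are assumptions, and the stop list is expected to be non-empty.

```shell
# Route each feature line to kept.txt or stopped.txt by exact term match.
# Usage: apply_stoplist FEATURE_FILE STOPLIST OUTDIR
apply_stoplist() {
    awk -F '\t' -v kept="$3/kept.txt" -v stopped="$3/stopped.txt" '
        NR == FNR { stop[$0] = 1; next }   # first file: load stop terms
        /^#/      { next }                 # skip comment lines
        { print > (($2 in stop) ? stopped : kept) }
    ' "$2" "$1"
}

work=$(mktemp -d)
printf 'help@vendor.example\n' > "$work/stoplist.txt"
{
    printf '10\thelp@vendor.example\tcontact help@vendor.example\n'
    printf '20\tsuspect@example.com\tfrom suspect@example.com\n'
} > "$work/email.txt"

apply_stoplist "$work/email.txt" "$work/stoplist.txt" "$work"
cat "$work/kept.txt"
```

Stopped entries are moved rather than deleted, which matches the point above that blocked terms are still reported separately.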
How can you create a word list for password cracking?
# bulk_extractor -E wordlist -o outputdir image.e01