DanielGekeler/pdfdump666

pdfdump666

A tool to extract PDF files from binary files such as memory dumps or firmware images.

This tool searches a given file for the PDF magic number (%PDF-) and extracts each matching section as a PDF file. The approach is similar to the signature scanning done by binwalk.

It can be used for data recovery, forensics or reverse engineering.
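The signature search described above can be sketched in a few lines of Go. This is a minimal illustration, not the code used by pdfdump666: the helper name findPDFOffsets and the sample buffer are assumptions made for the example. It only locates candidate start offsets and does not decide where each PDF ends.

```go
package main

import (
	"bytes"
	"fmt"
)

// pdfMagic is the signature that begins a PDF header, e.g. "%PDF-1.7".
var pdfMagic = []byte("%PDF-")

// findPDFOffsets returns the byte offset of every occurrence of pdfMagic
// in data. Hypothetical helper for illustration only; it is not taken
// from the pdfdump666 source.
func findPDFOffsets(data []byte) []int {
	var offsets []int
	pos := 0
	for {
		i := bytes.Index(data[pos:], pdfMagic)
		if i < 0 {
			return offsets
		}
		offsets = append(offsets, pos+i)
		// Continue searching after the signature that was just found.
		pos += i + len(pdfMagic)
	}
}

func main() {
	// Stand-in for a memory dump: junk bytes around two PDF headers.
	dump := []byte("\x00\x01junk%PDF-1.4 body %%EOF\xff\xff%PDF-1.7 more")
	fmt.Println(findPDFOffsets(dump))
}
```

For large dumps a real tool would likely stream the file in chunks instead of loading it into memory at once; the sketch skips that for brevity.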

Building

  1. Install Go using the package manager of your operating system (apt, pacman, winget) or follow the official installation guide
  2. Clone this repository using git clone
  3. Open a terminal in the repository
  4. Compile the program with go build .
  5. The file ./pdfdump666 should now exist

Using

You can use gcore, part of gdb, to create a memory dump of a running process on Linux without killing it.

Do a dry run first to see if there might be a pdf in your file.

./pdfdump666 -d <path to file>

If no matches are found, the data might be compressed or encrypted, which this tool does not support.

Next, extract the pdf files.

./pdfdump666 <path to file>

Extracted PDF files are written to an out directory within your current working directory. Use the -o option to choose a different output directory.

Next Steps

The tool may extract multiple files, and only some of them may open correctly, because extracted files can be damaged or incomplete. Try opening each file with several PDF readers.
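Before opening every file by hand, a rough plausibility check can help sort the candidates. The sketch below assumes a simple heuristic that is not part of pdfdump666: a file is worth trying first if it contains a %PDF- header followed somewhere later by the %%EOF marker that normally ends a PDF. The function name looksLikePDF is hypothetical, and passing the check does not guarantee that a reader can open the file.

```go
package main

import (
	"bytes"
	"fmt"
)

// looksLikePDF reports whether data contains a PDF header that is
// followed later by an end-of-file marker. This is a heuristic for
// triage only; it does not validate the PDF structure.
func looksLikePDF(data []byte) bool {
	header := bytes.Index(data, []byte("%PDF-"))
	eof := bytes.LastIndex(data, []byte("%%EOF"))
	return header >= 0 && eof > header
}

func main() {
	// In-memory stand-ins for extracted files.
	complete := []byte("%PDF-1.4\n1 0 obj\nendobj\n%%EOF\n")
	truncated := []byte("%PDF-1.4\n1 0 obj\n")

	fmt.Println("complete: ", looksLikePDF(complete))
	fmt.Println("truncated:", looksLikePDF(truncated))
}
```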

Consider processing the files with qpdf, which can check PDF files and may be able to repair damaged ones.

If none of the files can be opened, try another extraction that includes extra bytes before and after each match using the -b and -a options. Values between 0 and 1000 are a good starting range.
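To make the effect of -b and -a concrete, the following sketch widens a matched byte range and clamps it to the bounds of the input. It assumes the match is a half-open range [start, end) inside a file of a known size; the function name widenRange and the sample numbers are hypothetical, and how pdfdump666 itself computes the range is not shown here.

```go
package main

import "fmt"

// widenRange extends the half-open range [start, end) by `before` bytes
// at the front and `after` bytes at the back, clamped to [0, size].
// Hypothetical helper for illustration only.
func widenRange(start, end, before, after, size int) (int, int) {
	newStart := start - before
	if newStart < 0 {
		newStart = 0
	}
	newEnd := end + after
	if newEnd > size {
		newEnd = size
	}
	return newStart, newEnd
}

func main() {
	// A match at bytes 200..300 in a 1000-byte file,
	// widened by 50 bytes before and 20 bytes after.
	s, e := widenRange(200, 300, 50, 20, 1000)
	fmt.Println(s, e)
}
```

Clamping matters when a match sits close to the beginning or end of the dump, where the requested padding would otherwise run past the file.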

Usage

Usage:
  pdfdump666 <file> [flags]

Flags:
  -a, --after int       include N bytes after
  -b, --before int      include N bytes before
  -d, --dry-run         dry run
  -h, --help            help for pdfdump666
  -o, --output string   output directory (default "out")

License

Copyright 2026 Daniel Gekeler

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
