epub2md

A Python tool for converting EPUB ebooks to Markdown format with preserved structure, images, and metadata.

Features

  • Structure Preservation: Maintains chapter organization and hierarchy
  • Image Support: Extracts and saves images with proper linking
  • Metadata Extraction: Saves book metadata (title, author, description, etc.)
  • Flexible Output: Choose between separate chapter files or a single consolidated file
  • Batch Processing: Convert multiple EPUB files at once
  • Clean Conversion: Sanitizes filenames and handles various EPUB formats

Installation

Install from PyPI (when available)

pip install epub2md

Install from Source

git clone https://github.com/lavallee/epub2md.git
cd epub2md
pip install -e .

Install from GitHub

pip install git+https://github.com/lavallee/epub2md.git

Usage

Command Line

After installation, use the epub2md command:

# Convert a single EPUB file
epub2md book.epub

# Convert multiple EPUB files
epub2md book1.epub book2.epub

# Convert all EPUB files in current directory
epub2md *.epub

# Save as a single Markdown file instead of separate chapters
epub2md book.epub --single-file

# Specify custom output directory
epub2md book.epub --output-dir my_books

# Enable verbose logging
epub2md book.epub --verbose

Python Module

You can also run it as a Python module:

python -m epub2md book.epub

Programmatic Usage

from epub2md import EPUBToMarkdownConverter

converter = EPUBToMarkdownConverter(
    output_base_dir="output",
    single_file=False,
    preserve_structure=True
)

success, output_path = converter.convert_epub("book.epub")
if success:
    print(f"Conversion successful! Output: {output_path}")
else:
    print(f"Conversion failed: {output_path}")
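
The same call can be looped over several books. The helper below is a minimal sketch, not part of epub2md: `convert_many` is a hypothetical name, and it assumes only what the example above shows, namely that `convert_epub` returns a `(success, output_path)` tuple whose second value carries the failure detail when `success` is false.

```python
def convert_many(converter, epub_paths):
    """Convert several EPUB files and split the results by outcome.

    `converter` is any object with a convert_epub(path) method that
    returns a (success, detail) tuple, as in the example above.
    """
    succeeded, failed = {}, {}
    for path in epub_paths:
        success, detail = converter.convert_epub(str(path))
        # On success `detail` is the output path; otherwise it is the
        # failure detail reported by the converter.
        target = succeeded if success else failed
        target[str(path)] = detail
    return succeeded, failed


# Usage (requires epub2md to be installed):
#   from pathlib import Path
#   from epub2md import EPUBToMarkdownConverter
#   converter = EPUBToMarkdownConverter(
#       output_base_dir="output", single_file=False, preserve_structure=True
#   )
#   ok, bad = convert_many(converter, Path("books").glob("*.epub"))
```

Collecting failures instead of stopping at the first one lets a long batch finish and report every problem file at the end.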

Output Structure

Default (Separate Chapters)

output/
  book_title_author/
    README.md           # Table of contents with links
    metadata.json       # Book metadata
    images/             # Extracted images
      image1.jpg
      image2.png
    chapters/           # Individual chapter files
      01_chapter_name.md
      02_chapter_name.md
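
Names such as `01_chapter_name.md` suggest a zero-padded index followed by a sanitized title. The exact rules epub2md applies are not documented here, so the following is only an illustrative sketch. `sanitize_name` and `chapter_filename` are hypothetical helpers, and the choices they make (ASCII only, lowercase, underscores, a 60-character cap) are assumptions.

```python
import re
import unicodedata


def sanitize_name(text: str, max_length: int = 60) -> str:
    """Reduce arbitrary text to a lowercase, underscore-separated name."""
    # Decompose accented characters, then drop anything outside ASCII,
    # so "Café" becomes "Cafe".
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", ascii_text).strip("_").lower()
    # Fall back to a placeholder when nothing usable remains.
    return slug[:max_length].rstrip("_") or "untitled"


def chapter_filename(index: int, title: str) -> str:
    """Build a zero-padded chapter filename from an index and a title."""
    return f"{index:02d}_{sanitize_name(title)}.md"
```

Zero-padding the index keeps chapters in reading order when the directory is sorted alphabetically.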

Single File Mode

output/
  book_title_author/
    README.md           # Table of contents
    metadata.json       # Book metadata
    full_book.md        # Complete book in single file
    images/             # Extracted images
      image1.jpg
      image2.png
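
Downstream scripts can read either layout the same way. The sketch below assumes only the file and directory names shown in the two trees above. `load_converted_book` is a hypothetical helper, and because the keys inside `metadata.json` are not documented here, it returns the parsed JSON unchanged.

```python
import json
from pathlib import Path


def load_converted_book(book_dir):
    """Return (metadata, markdown_files) for one converted book directory."""
    book_dir = Path(book_dir)

    # metadata.json is present in both layouts; tolerate its absence anyway.
    metadata = {}
    metadata_path = book_dir / "metadata.json"
    if metadata_path.is_file():
        metadata = json.loads(metadata_path.read_text(encoding="utf-8"))

    # Single-file mode writes full_book.md; the default layout writes
    # numbered files under chapters/, which sort into reading order.
    single_file = book_dir / "full_book.md"
    if single_file.is_file():
        markdown_files = [single_file]
    else:
        markdown_files = sorted((book_dir / "chapters").glob("*.md"))
    return metadata, markdown_files
```

For example, `load_converted_book("output/book_title_author")` would return the metadata dictionary together with the ordered list of Markdown paths to read.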

Command Line Options

usage: epub2md [-h] [-o OUTPUT_DIR] [-s] [--no-structure] [-v]
               epub_files [epub_files ...]

Convert EPUB files to Markdown format

positional arguments:
  epub_files            EPUB file(s) to convert

options:
  -h, --help            show this help message and exit
  -o, --output-dir OUTPUT_DIR
                        Base output directory (default: output)
  -s, --single-file     Save as single markdown file instead of separate
                        chapters
  --no-structure        Do not preserve chapter structure
  -v, --verbose         Enable verbose logging
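
For readers wrapping the tool in their own scripts, the options above can be mirrored with `argparse`. This is a sketch reconstructed from the help text, not the project's actual source. In particular, the mapping from flags to the constructor arguments `output_base_dir`, `single_file`, and `preserve_structure` in `converter_kwargs` is an assumption inferred from the names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="epub2md",
        description="Convert EPUB files to Markdown format",
    )
    parser.add_argument("epub_files", nargs="+", help="EPUB file(s) to convert")
    parser.add_argument("-o", "--output-dir", default="output",
                        help="Base output directory (default: output)")
    parser.add_argument("-s", "--single-file", action="store_true",
                        help="Save as single markdown file instead of "
                             "separate chapters")
    parser.add_argument("--no-structure", action="store_true",
                        help="Do not preserve chapter structure")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable verbose logging")
    return parser


def converter_kwargs(args: argparse.Namespace) -> dict:
    """Translate parsed flags into constructor arguments (assumed mapping)."""
    return {
        "output_base_dir": args.output_dir,
        "single_file": args.single_file,
        # Assumption: --no-structure simply turns preserve_structure off.
        "preserve_structure": not args.no_structure,
    }


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["book.epub", "-s", "-o", "my_books"])
kwargs = converter_kwargs(args)
```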

Requirements

  • Python 3.8+
  • EbookLib
  • html2text
  • beautifulsoup4
  • Pillow
  • lxml

License

MIT License. See the LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request
