yanweian/PDFParser

PDFParser

Extract highlighted text from PDF files for NLP.

Each line of the output corresponds to one paragraph.
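
To illustrate the one-paragraph-per-line output, here is a minimal sketch of how extracted paragraphs could be flattened onto single lines. It is not taken from this project's source: the class `ParagraphLines` and its methods are hypothetical, and it assumes the paragraphs have already been extracted as strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of PDFParser: shows one way to write
// one paragraph per output line.
public class ParagraphLines {

    // Collapse every run of whitespace (including line breaks) into a
    // single space so the paragraph fits on one line.
    static String toSingleLine(String paragraph) {
        return paragraph.replaceAll("\\s+", " ").trim();
    }

    // Join the flattened paragraphs with newlines, skipping empty ones.
    static String format(List<String> paragraphs) {
        return paragraphs.stream()
                .map(ParagraphLines::toSingleLine)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Assumed input: two highlighted paragraphs, one wrapped across lines.
        List<String> paragraphs = Arrays.asList(
                "First highlighted\nparagraph.",
                "Second   one.");
        System.out.println(format(paragraphs));
    }
}
```

With output in this shape, a downstream NLP script can treat each line it reads as a complete paragraph.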

Usage

java -jar PDFHighLightExtractor.jar -i <inFile | directory> [-o output.txt]

  • -i is required; it takes a PDF file or a directory
  • -o is optional; the default is output.txt
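
The option rules above can be sketched in a few lines of Java. This is an illustration only, not the tool's actual argument parser: the class `CliArgs` and its fields are hypothetical, and rejecting unknown arguments is an assumption made here for simplicity.

```java
// Hypothetical sketch of the -i / -o handling described above;
// not taken from the PDFHighLightExtractor source.
public class CliArgs {
    final String input;   // file or directory given with -i
    final String output;  // path given with -o, or the default

    CliArgs(String input, String output) {
        this.input = input;
        this.output = output;
    }

    static CliArgs parse(String[] args) {
        String input = null;
        String output = "output.txt"; // default when -o is omitted
        for (int i = 0; i < args.length; i++) {
            if ("-i".equals(args[i]) && i + 1 < args.length) {
                input = args[++i];
            } else if ("-o".equals(args[i]) && i + 1 < args.length) {
                output = args[++i];
            } else {
                // Assumption: anything else is treated as an error.
                throw new IllegalArgumentException("Unexpected argument: " + args[i]);
            }
        }
        if (input == null) {
            throw new IllegalArgumentException("-i is required");
        }
        return new CliArgs(input, output);
    }

    public static void main(String[] args) {
        CliArgs parsed = parse(new String[] {"-i", "papers"});
        System.out.println(parsed.input + " -> " + parsed.output);
    }
}
```

Keeping the default for -o in one place makes it easy to see what happens when only -i is supplied.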

Thanks

Uses some code from https://github.com/juanerasmoe/PDFHighlightExtractor
