XpdfText can be used in different ways:
- Convert entire PDF files or individual pages to plain text
  - maintaining layout, or
  - converting to "reading order"
- Extract text from a specified rectangle on a page
  - useful for extracting text from forms
- Convert pages into word lists – for each word, you can retrieve:
  - font name and font size
  - text color
  - word position on the page
  - character offset (for highlight files)
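To make the word-list and rectangle ideas concrete, here is a minimal Python sketch. It does not use the XpdfText API: the `Word` record, its field names, the helper functions, and the coordinate convention (points, with y increasing down the page) are all assumptions chosen for illustration. It only shows how per-word attributes like those listed above could be filtered by a rectangle, for example to pull one field out of a form.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    # Hypothetical record, NOT the XpdfText API: field names are illustrative.
    text: str
    font_name: str
    font_size: float
    color: Tuple[float, float, float]  # assumed RGB, each component 0..1
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    char_offset: int  # assumed offset of the word in the extracted text


def words_in_rect(words: List[Word], x_min: float, y_min: float,
                  x_max: float, y_max: float) -> List[Word]:
    """Return the words whose bounding box lies entirely inside the rectangle."""
    return [
        w for w in words
        if w.x_min >= x_min and w.y_min >= y_min
        and w.x_max <= x_max and w.y_max <= y_max
    ]


def join_words(words: List[Word]) -> str:
    """Join word texts with single spaces, in the order given."""
    return " ".join(w.text for w in words)


# Made-up sample data standing in for one page's word list.
black = (0.0, 0.0, 0.0)
page_words = [
    Word("Name:", "Helvetica", 10.0, black, 50, 100, 80, 112, 0),
    Word("Alice", "Helvetica", 10.0, black, 90, 100, 120, 112, 6),
    Word("Footer", "Helvetica", 8.0, black, 50, 700, 90, 712, 12),
]

# Select only the words inside a form field's rectangle.
field_words = words_in_rect(page_words, 40, 90, 200, 120)
print(join_words(field_words))  # prints "Name: Alice"
```

The filter requires a word to lie fully inside the rectangle; accepting partial overlap is an equally reasonable choice and depends on how tightly the form fields are drawn.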
The extracted text can be converted to a wide choice of standard
encodings, including UTF-8 Unicode, ISO-8859-1 (Latin-1), 7-bit ASCII,
and various other language-specific encodings.
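As a rough illustration of what the choice of output encoding means in practice, the sketch below encodes the same string as UTF-8, ISO-8859-1, and 7-bit ASCII using only the Python standard library. It is not XpdfText code; the `encode_text` helper and the decision to replace unencodable characters with `?` are assumptions made for this example.

```python
def encode_text(text: str, encoding: str) -> bytes:
    """Encode text, substituting '?' for characters the encoding cannot represent."""
    return text.encode(encoding, errors="replace")


sample = "café"

utf8_bytes = encode_text(sample, "utf-8")         # é takes two bytes
latin1_bytes = encode_text(sample, "iso-8859-1")  # é takes one byte
ascii_bytes = encode_text(sample, "ascii")        # é is not representable

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'
print(ascii_bytes)   # b'caf?'
```

The same text therefore produces different byte sequences, and narrower encodings such as 7-bit ASCII can lose characters, which is worth keeping in mind when picking an output encoding for non-English documents.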
The XpdfText library also includes all of the functionality of XpdfInfo.
Supported platforms:
- Windows: DLL
- Windows: COM component - usable from .NET, Visual Basic, Delphi, etc.
- Mac OS X: shared library
- Linux: shared library
- 32-bit and 64-bit versions available for all platforms
- Other platforms: portable C++ source code for the library is available