
cbrunet / python-poppler


Python binding to Poppler-cpp pdf library

License: GNU General Public License v2.0

Python 64.12% C++ 32.08% TeX 1.39% C 0.88% Meson 1.53%
python pdf poppler pybind11 poppler-library python-poppler poppler-cpp

python-poppler's People

Contributors

bnewbold, bzamecnik, cbrunet, dodopriester, frispete, mara004, sandeepmistry

python-poppler's Issues

Building on windows

Ok, so I'm putzing about trying to get this to build on Windows.

  • Install toolchain and dependencies
    • You'll need Visual Studio of the appropriate version for your Python distro
    • Also CMake
    • And pkg-config (https://chocolatey.org/packages/pkgconfiglite from Chocolatey is nice in that it doesn't need glib)
      • I think we could probably conditionally not use pkg-config on Windows, but I really don't feel like debugging CMake files today, so I just rolled with it. I'm kind of curious why pkg-config was used instead of CMake's built-in find_library() call.
    • Get precompiled Poppler from somewhere (it's a giant pain to build). I'm using releases from https://github.com/oschwartz10612/poppler-windows/releases
      • These builds seem to have some hard-coded paths in their *.pc files, e.g. prefix=D:/bld/poppler_1595515154908/_h_env/Library. I replaced them with relative paths and that seems to have worked: prefix=../../_h_env/Library
  • Add an environment variable PKG_CONFIG_PATH pointing to the lib\pkgconfig subdirectory of wherever you unzipped the Poppler library
  • At this point, python setup.py bdist will configure successfully, and then try to build.
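
The prefix replacement described in the list above can be scripted. What follows is a minimal sketch, not part of python-poppler: the helper name rewrite_pc_prefix is hypothetical, and it assumes each *.pc file in the directory carries one prefix= line that should be replaced wholesale.

```python
import re
from pathlib import Path


def rewrite_pc_prefix(pc_dir, new_prefix):
    """Replace the `prefix=` line in every *.pc file directly under pc_dir.

    Returns the list of files that were actually modified.
    """
    changed = []
    for pc_file in sorted(Path(pc_dir).glob("*.pc")):
        text = pc_file.read_text(encoding="utf-8")
        # A callable replacement keeps any backslashes in new_prefix literal.
        new_text = re.sub(
            r"(?m)^prefix=.*$",
            lambda _match: f"prefix={new_prefix}",
            text,
        )
        if new_text != text:
            pc_file.write_text(new_text, encoding="utf-8")
            changed.append(pc_file)
    return changed
```

It would be called with the lib\pkgconfig directory of the unzipped Poppler build and the relative prefix quoted above (../../_h_env/Library). Running it a second time changes nothing, so it is safe to repeat.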

I'm now at the point where I'm hitting compiler differences:

C:\code\python-poppler\src\cpp\image.cpp(47,27): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'pybind11::buffer_info' [C:\code\python-poppler\build\temp.win-amd64-3.8\Release\image.vcxproj]
C:\code\python-poppler\src\cpp\image.cpp(54,5): message : No constructor could take the source type, or constructor overload resolution was ambiguous [C:\code\python-poppler\build\temp.win-amd64-3.8\Release\image.vcxproj]
C:\code\python-poppler\src\cpp\image.cpp(47,16): error C2064: term does not evaluate to a function taking 6 arguments [C:\code\python-poppler\build\temp.win-amd64-3.8\Release\image.vcxproj]

So it looks like it's not hugely difficult to get this to do things on windows.

Lovely!

Hello there!
You've made a lovely wrapper!
Thank you!

Non UTF-8 character fonts cause `UnicodeDecodeError`

I'm trying to parse a PDF that contains Chinese characters.
The text is extracted okay, but when I try to access fonts, I get the following error:

>>> box.get_font_name()  # Assume the box is extracted from some page, this box contains Chinese characters
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<PATH>/lib/python3.7/site-packages/poppler/utilities.py", line 90, in wrapped
    return fct(*args, **kwargs)
  File "<PATH>/lib/python3.7/site-packages/poppler/page.py", line 64, in get_font_name
    return self._text_box.get_font_name(i)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 7: invalid start byte

Trying to iterate fonts through the document itself results in the same error.

Environment:
Python 3.7.4
Poppler 21.12.0 (Compiled from source).
Happens on both Mac and Ubuntu.

I have seen other poppler bindings, such as this one that handles those errors (by using the replace keyword for decoding the string), but unfortunately it uses deprecated internal APIs and cannot be used with a newer version of poppler (even when trying to build from source).

If there were a way to supply the required encoding, or even to suppress/ignore those errors, it would be very beneficial.
I have seen a comment on another ticket that says we can request to expose the encoding/decoding in the cpp backend.
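
For illustration, the tolerant decoding mentioned above looks like this in plain Python. This is only a sketch of the strategy: the traceback suggests the conversion fails inside the compiled binding, before any Python code sees the raw bytes, so decode_font_name and the sample byte string are hypothetical and would only apply if the binding exposed the name as bytes.

```python
def decode_font_name(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode a font name, substituting U+FFFD for any undecodable bytes."""
    # errors="replace" never raises; valid input decodes exactly as in strict mode.
    return raw.decode(encoding, errors="replace")


# Hypothetical font name: ASCII text followed by two bytes that are not valid UTF-8.
sample = b"SimSun-\xba\xda"
print(decode_font_name(sample))
```

The encoding parameter is there because such names are often in a legacy encoding; passing the right codec, when it is known, recovers the real name instead of replacement characters.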

Build failure on i586 platform

Hi,

nice project, BTW. While packaging for openSUSE, I came across a build issue:

[   15s] Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.RKb8e4
[   15s] + umask 022
[   15s] + cd /home/abuild/rpmbuild/BUILD
[   15s] + /usr/bin/rm -rf /home/abuild/rpmbuild/BUILDROOT/python-poppler-0.2.1-0.i386
[   15s] ++ dirname /home/abuild/rpmbuild/BUILDROOT/python-poppler-0.2.1-0.i386
[   15s] + /usr/bin/mkdir -p /home/abuild/rpmbuild/BUILDROOT
[   15s] + /usr/bin/mkdir /home/abuild/rpmbuild/BUILDROOT/python-poppler-0.2.1-0.i386
[   15s] + cd python-poppler-0.2.1
[   15s] ++ '[' -f _current_flavor ']'
[   15s] ++ true
[   15s] + python_flavor=
[   15s] + '[' -z '' ']'
[   15s] + python_flavor=tmp
[   15s] + '[' tmp '!=' python3 ']'
[   15s] + '[' -d build ']'
[   15s] + '[' -d _build.python3 ']'
[   15s] + echo python3
[   15s] + /usr/bin/python3 setup.py build '--executable=/usr/bin/python3 -s'
[   15s] running build
[   15s] running build_py
[   15s] creating build
[   15s] creating build/lib.linux-i686-3.8
[   15s] creating build/lib.linux-i686-3.8/poppler
[   15s] copying src/poppler/__init__.py -> build/lib.linux-i686-3.8/poppler
[   15s] copying src/poppler/_version.py -> build/lib.linux-i686-3.8/poppler
[   15s] copying src/poppler/destination.py -> build/lib.linux-i686-3.8/poppler
[   15s] copying src/poppler/document.py -> build/lib.linux-i686-3.8/poppler
[   15s] copying src/poppler/embeddedfile.py -> build/lib.linux-i686-3.8/poppler
[   15s] copying src/poppler/font.py -> build/lib.linux-i686-3.8/poppler
[   15s] copying src/poppler/image.py -> build/lib.linux-i686-3.8/poppler
[   15s] copying src/poppler/page.py -> build/lib.linux-i686-3.8/poppler
[   15s] copying src/poppler/pagerenderer.py -> build/lib.linux-i686-3.8/poppler
[   15s] copying src/poppler/pagetransition.py -> build/lib.linux-i686-3.8/poppler
[   15s] copying src/poppler/rectangle.py -> build/lib.linux-i686-3.8/poppler
[   15s] copying src/poppler/toc.py -> build/lib.linux-i686-3.8/poppler
[   15s] copying src/poppler/utilities.py -> build/lib.linux-i686-3.8/poppler
[   15s] creating build/lib.linux-i686-3.8/poppler/cpp
[   15s] copying src/poppler/cpp/__init__.py -> build/lib.linux-i686-3.8/poppler/cpp
[   15s] running egg_info
[   15s] writing src/python_poppler.egg-info/PKG-INFO
[   15s] writing dependency_links to src/python_poppler.egg-info/dependency_links.txt
[   15s] writing top-level names to src/python_poppler.egg-info/top_level.txt
[   15s] reading manifest file 'src/python_poppler.egg-info/SOURCES.txt'
[   15s] reading manifest template 'MANIFEST.in'
[   15s] writing manifest file 'src/python_poppler.egg-info/SOURCES.txt'
[   15s] running build_ext
[   15s] -- The C compiler identification is GNU 10.2.1
[   15s] -- The CXX compiler identification is GNU 10.2.1
[   15s] -- Detecting C compiler ABI info
[   15s] -- Detecting C compiler ABI info - done
[   15s] -- Check for working C compiler: /usr/bin/cc - skipped
[   15s] -- Detecting C compile features
[   15s] -- Detecting C compile features - done
[   15s] -- Detecting CXX compiler ABI info
[   16s] -- Detecting CXX compiler ABI info - done
[   16s] -- Check for working CXX compiler: /usr/bin/c++ - skipped
[   16s] -- Detecting CXX compile features
[   16s] -- Detecting CXX compile features - done
[   16s] -- Found PythonInterp: /usr/bin/python3 (found version "3.8.5") 
[   16s] -- Found PythonLibs: /usr/lib/libpython3.8.so
[   16s] -- Performing Test HAS_CPP14_FLAG
[   16s] -- Performing Test HAS_CPP14_FLAG - Success
[   16s] -- pybind11 v2.5.0
[   16s] -- Found PkgConfig: /usr/bin/pkg-config (found version "1.7.3") 
[   16s] -- Checking for module 'poppler-cpp>=0.62.0'
[   16s] --   Found poppler-cpp, version 0.90.0
[   16s] -- Configuring done
[   16s] -- Generating done
[   16s] -- Build files have been written to: /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/build/temp.linux-i686-3.8
[   16s] Scanning dependencies of target global_
[   16s] Scanning dependencies of target version
[   16s] [  8%] Building CXX object CMakeFiles/version.dir/src/cpp/version.cpp.o
[   16s] [  8%] Building CXX object CMakeFiles/global_.dir/src/cpp/global.cpp.o
[   19s] [ 12%] Linking CXX shared module ../lib.linux-i686-3.8/poppler/cpp/version.cpython-38-i386-linux-gnu.so
[   19s] [ 12%] Built target version
[   19s] Scanning dependencies of target image
[   19s] [ 16%] Building CXX object CMakeFiles/image.dir/src/cpp/image.cpp.o
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp: In function ‘pybind11::buffer_info poppler::image_buffer_info(poppler::image&)’:
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:54:5: error: no matching function for call to ‘pybind11::buffer_info::buffer_info(void*, long int, std::string, long int, <brace-encl’
[   19s]    54 |     );
[   19s]       |     ^
[   19s] In file included from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pytypes.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/cast.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/attr.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pybind11.h:44,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:20:
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:89:5: note: candidate: ‘pybind11::buffer_info::buffer_info(pybind11::buffer_info::private_ctr_tag, void*, pybin’
[   19s]    89 |     buffer_info(private_ctr_tag, void *ptr, ssize_t itemsize, const std::string &format, ssize_t ndim,
[   19s]       |     ^~~~~~~~~~~
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:89:5: note:   candidate expects 8 arguments, 6 provided
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:64:5: note: candidate: ‘pybind11::buffer_info::buffer_info(pybind11::buffer_info&&)’
[   19s]    64 |     buffer_info(buffer_info &&other) {
[   19s]       |     ^~~~~~~~~~~
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:64:5: note:   candidate expects 1 argument, 6 provided
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:54:14: note: candidate: ‘pybind11::buffer_info::buffer_info(Py_buffer*, bool)’
[   19s]    54 |     explicit buffer_info(Py_buffer *view, bool ownview = true)
[   19s]       |              ^~~~~~~~~~~
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:54:14: note:   candidate expects 2 arguments, 6 provided
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:51:5: note: candidate: ‘template<class T> pybind11::buffer_info::buffer_info(const T*, pybind11::ssize_t, bool)’
[   19s]    51 |     buffer_info(const T *ptr, ssize_t size, bool readonly=true)
[   19s]       |     ^~~~~~~~~~~
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:51:5: note:   template argument deduction/substitution failed:
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:54:5: note:   candidate expects 3 arguments, 6 provided
[   19s]    54 |     );
[   19s]       |     ^
[   19s] In file included from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pytypes.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/cast.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/attr.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pybind11.h:44,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:20:
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:47:5: note: candidate: ‘template<class T> pybind11::buffer_info::buffer_info(T*, pybind11::ssize_t, bool)’
[   19s]    47 |     buffer_info(T *ptr, ssize_t size, bool readonly=false)
[   19s]       |     ^~~~~~~~~~~
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:47:5: note:   template argument deduction/substitution failed:
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:54:5: note:   candidate expects 3 arguments, 6 provided
[   19s]    54 |     );
[   19s]       |     ^
[   19s] In file included from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pytypes.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/cast.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/attr.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pybind11.h:44,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:20:
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:43:5: note: candidate: ‘pybind11::buffer_info::buffer_info(void*, pybind11::ssize_t, const string&, pybind11::s’
[   19s]    43 |     buffer_info(void *ptr, ssize_t itemsize, const std::string &format, ssize_t size, bool readonly=false)
[   19s]       |     ^~~~~~~~~~~
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:43:5: note:   candidate expects 5 arguments, 6 provided
[   19s] In file included from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pytypes.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/cast.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/attr.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pybind11.h:44,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:20:
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:40:5: note: candidate: ‘template<class T> pybind11::buffer_info::buffer_info(T*, pybind11::detail::any_containe’
[   19s]    40 |     buffer_info(T *ptr, detail::any_container<ssize_t> shape_in, detail::any_container<ssize_t> strides_in, bool readonly=false)
[   19s]       |     ^~~~~~~~~~~
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:40:5: note:   template argument deduction/substitution failed:
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:54:5: note:   candidate expects 4 arguments, 6 provided
[   19s]    54 |     );
[   19s]       |     ^
[   19s] In file included from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pytypes.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/cast.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/attr.h:13,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pybind11.h:44,
[   19s]                  from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:20:
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:29:5: note: candidate: ‘pybind11::buffer_info::buffer_info(void*, pybind11::ssize_t, const string&, pybind11::s’
[   19s]    29 |     buffer_info(void *ptr, ssize_t itemsize, const std::string &format, ssize_t ndim,
[   19s]       |     ^~~~~~~~~~~
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:30:89: note:   no known conversion for argument 6 from ‘<brace-enclosed initializer list>’ to ‘pybind11::detail’
[   19s]    30 |                 detail::any_container<ssize_t> shape_in, detail::any_container<ssize_t> strides_in, bool readonly=false)
[   19s]       |                                                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:27:5: note: candidate: ‘pybind11::buffer_info::buffer_info()’
[   19s]    27 |     buffer_info() { }
[   19s]       |     ^~~~~~~~~~~
[   19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:27:5: note:   candidate expects 0 arguments, 6 provided
[   20s] gmake[2]: *** [CMakeFiles/image.dir/build.make:82: CMakeFiles/image.dir/src/cpp/image.cpp.o] Error 1
[   20s] gmake[1]: *** [CMakeFiles/Makefile2:191: CMakeFiles/image.dir/all] Error 2
[   20s] gmake[1]: *** Waiting for unfinished jobs....
[   21s] [ 20%] Linking CXX shared module ../lib.linux-i686-3.8/poppler/cpp/global_.cpython-38-i386-linux-gnu.so
[   21s] [ 20%] Built target global_
[   21s] gmake: *** [Makefile:103: all] Error 2
[   21s] Traceback (most recent call last):
[   21s]   File "setup.py", line 76, in <module>
[   21s]     setup(
[   21s]   File "/usr/lib/python3.8/site-packages/setuptools/__init__.py", line 162, in setup
[   21s]     return distutils.core.setup(**attrs)
[   21s]   File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
[   21s]     dist.run_commands()
[   21s]   File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
[   21s]     self.run_command(cmd)
[   21s]   File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
[   21s]     cmd_obj.run()
[   21s]   File "/usr/lib/python3.8/distutils/command/build.py", line 135, in run
[   21s]     self.run_command(cmd_name)
[   21s]   File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
[   21s]     self.distribution.run_command(command)
[   21s]   File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
[   21s]     cmd_obj.run()
[   21s]   File "setup.py", line 39, in run
[   21s]     self.build_extension(ext)
[   21s]   File "setup.py", line 71, in build_extension
[   21s]     subprocess.check_call(
[   21s]   File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
[   21s]     raise CalledProcessError(retcode, cmd)
[   21s] subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'Release', '--', '-j2']' returned non-zero exit status 2.
[   21s] error: Bad exit status from /var/tmp/rpm-tmp.RKb8e4 (%build)

Apparently, the 32-bit signature of pybind11::buffer_info::buffer_info differs somehow.

Full build is available here.

Sorry, I'm in a hurry at the moment, therefore no PR for now.

Not able to install poppler on Google Colab

Hi everyone,

I am trying to install poppler on Google Colaboratory. I am getting the following error:

ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-trbzri3i/python-poppler/setup.py'"'"'; file='"'"'/tmp/pip-install-trbzri3i/python-poppler/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-20uytqm_/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.

Any idea how to solve this, or how to install poppler on Google Colab?
I have also tried installing from git, but that fails too, with the following error:

-- Configuring incomplete, errors occurred!
See also "/content/python-poppler/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log".
Traceback (most recent call last):
File "setup.py", line 106, in <module>
zip_safe=False,
File "/usr/local/lib/python3.7/dist-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.7/dist-packages/setuptools/command/install.py", line 67, in run
self.do_egg_install()
File "/usr/local/lib/python3.7/dist-packages/setuptools/command/install.py", line 109, in do_egg_install
self.run_command('bdist_egg')
File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.7/dist-packages/setuptools/command/bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/usr/local/lib/python3.7/dist-packages/setuptools/command/bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.7/dist-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/usr/lib/python3.7/distutils/command/install_lib.py", line 109, in build
self.run_command('build_ext')
File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "setup.py", line 39, in run
self.build_extension(ext)
File "setup.py", line 69, in build_extension
["cmake", ext.sourcedir] + cmake_args, cwd=self.build_temp, env=env
File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/content/python-poppler', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/content/python-poppler/build/lib.linux-x86_64-3.7/poppler/cpp', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.

Thank you in advance.

Segfault if the document object is not explicitly stored before using search

This segfaults:

from poppler import load_from_data, SearchDirection, CaseSensitivity, load_from_file

# https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf
page = load_from_file("dummy.pdf").create_page(0)

page_rect = page.page_rect()

dummy_rect = page.search("Dummy", page_rect, SearchDirection.from_top, CaseSensitivity.case_sensitive)

while this works

from poppler import load_from_data, SearchDirection, CaseSensitivity, load_from_file

# https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf
doc = load_from_file("dummy.pdf")
page = doc.create_page(0)

page_rect = page.page_rect()

dummy_rect = page.search("Dummy", page_rect, SearchDirection.from_top, CaseSensitivity.case_sensitive)
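
One workaround is to bundle the document and the page so that they share a lifetime. A minimal sketch, assuming the crash comes from the temporary document being garbage-collected while the page still refers to it (the report does not confirm this); PageHandle and open_page are hypothetical names, not part of python-poppler.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PageHandle:
    """Keeps a page together with the document it was created from."""

    document: Any  # held only so the document outlives the page
    page: Any


def open_page(load: Callable[[str], Any], path: str, index: int = 0) -> PageHandle:
    """Load a document with `load` and return one of its pages with the document attached."""
    document = load(path)
    return PageHandle(document=document, page=document.create_page(index))
```

With the names from the snippets above, usage would be handle = open_page(load_from_file, "dummy.pdf"), followed by calls on handle.page. As long as handle is referenced, so is the document.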

Installation issue with pybind11.wrap

Hey, I'm struggling with the following problem. I have to build poppler behind a proxy, i.e. I have no direct internet access.
Yet the install requires one to download pybind11 from github. Is there a way to change the pybind11 dependency somehow, use a custom source_url in the wrap file or something like this?

      The Meson build system
      Version: 1.2.3
      Source dir: /tmp/pip-install-7b10_g5_/python-poppler_76bcf6dbd94d427f98b8759336351404
      Build dir: /tmp/pip-install-7b10_g5_/python-poppler_76bcf6dbd94d427f98b8759336351404/.mesonpy-xuv01fdp
      Build type: native build
      Project name: python-poppler
      Project version: 0.4.1
      C++ compiler for the host machine: c++ (gcc 8.5.0 "c++ (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18)")
      C++ linker for the host machine: c++ ld.bfd 2.30-119
      Host machine cpu family: x86_64
      Host machine cpu: x86_64
      Found pkg-config: /usr/bin/pkg-config (1.4.2)
      Run-time dependency poppler-cpp found: YES 20.11.0
      Program python3 found: YES (/usr/bin/python3.11)
      Downloading pybind11 source from https://github.com/pybind/pybind11/archive/refs/tags/v2.10.3.tar.gz
      <urlopen error [Errno -2] Name or service not known>
      WARNING: failed to download with error: could not get https://github.com/pybind/pybind11/archive/refs/tags/v2.10.3.tar.gz is the internet available?. Trying after a delay...
      <urlopen error [Errno -2] Name or service not known>
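
One approach that may work offline: Meson looks in subprojects/packagecache for an already-downloaded archive before fetching the URL named in a wrap file. The tarball can therefore be downloaded on a machine with internet access and placed there by hand. The sketch below copies such an archive into place and returns its SHA-256, so it can be compared with the source_hash entry in the wrap file. The function name seed_package_cache is hypothetical, and the archive must be named exactly as the wrap file's source_filename entry, which this sketch does not check.

```python
import hashlib
import shutil
from pathlib import Path


def seed_package_cache(source_dir, archive):
    """Copy a pre-downloaded archive into <source_dir>/subprojects/packagecache.

    Returns the SHA-256 hex digest of the copied file.
    """
    cache = Path(source_dir) / "subprojects" / "packagecache"
    cache.mkdir(parents=True, exist_ok=True)
    target = cache / Path(archive).name
    shutil.copyfile(archive, target)
    return hashlib.sha256(target.read_bytes()).hexdigest()
```

Since pip unpacks the sdist into a temporary directory, this presumably requires unpacking the python-poppler source yourself, seeding the cache, and then installing from that directory.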

Issue in installing python-poppler on GCP VM PyTorch/CUDA11.0.GPU

Hi,

I've tried this nice tool on my Windows laptop, with no installation problems.

I want to scale up, so I tried to install it on my GCP VM (PyTorch:1.7/CUDA11.0.GPU), and the installation crashes.

The error message is below. Many thanks for your help.

Best regards, Olivier.

Collecting python-poppler
Using cached python-poppler-0.2.2.tar.gz (595 kB)
Building wheels for collected packages: python-poppler
Building wheel for python-poppler (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /opt/conda/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ar576gab/python-poppler_3308e3b17b604d49bb817e3fd73eeffe/setup.py'"'"'; file='"'"'/tmp/pip-install-ar576gab/python-poppler_3308e3b17b604d49bb817e3fd73eeffe/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-2ut924lj
cwd: /tmp/pip-install-ar576gab/python-poppler_3308e3b17b604d49bb817e3fd73eeffe/
Complete output (91 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/poppler
copying src/poppler/pagerenderer.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/image.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/font.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/__init__.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/document.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/rectangle.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/_version.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/utilities.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/embeddedfile.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/destination.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/pagetransition.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/page.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/toc.py -> build/lib.linux-x86_64-3.7/poppler
creating build/lib.linux-x86_64-3.7/poppler/cpp
copying src/poppler/cpp/__init__.py -> build/lib.linux-x86_64-3.7/poppler/cpp
running egg_info
writing src/python_poppler.egg-info/PKG-INFO
writing dependency_links to src/python_poppler.egg-info/dependency_links.txt
writing top-level names to src/python_poppler.egg-info/top_level.txt
reading manifest file 'src/python_poppler.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'src/python_poppler.egg-info/SOURCES.txt'
running build_ext
-- The C compiler identification is GNU 8.3.0
-- The CXX compiler identification is GNU 8.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PythonInterp: /opt/conda/bin/python3.7 (found version "3.7.10")
-- Found PythonLibs: /opt/conda/lib/libpython3.7m.so
-- Performing Test HAS_CPP14_FLAG
-- Performing Test HAS_CPP14_FLAG - Success
-- pybind11 v2.5.0
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29")
-- Checking for module 'poppler-cpp>=0.26.0'
-- No package 'poppler-cpp' found
CMake Error at /usr/share/cmake-3.13/Modules/FindPkgConfig.cmake:452 (message):
A required package was not found
Call Stack (most recent call first):
/usr/share/cmake-3.13/Modules/FindPkgConfig.cmake:622 (_pkg_check_modules_internal)
CMakeLists.txt:14 (pkg_check_modules)

-- Configuring incomplete, errors occurred!
See also "/tmp/pip-install-ar576gab/python-poppler_3308e3b17b604d49bb817e3fd73eeffe/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log".
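
The log shows the configure step stopping because pkg-config reports no poppler-cpp package, which usually means the Poppler C++ development files are not installed (on Debian-based images they are typically provided by libpoppler-cpp-dev). Before retrying pip, the same lookup can be run on its own. A minimal sketch: poppler_cpp_version is a hypothetical helper name, and it relies only on pkg-config's --modversion option.

```python
import shutil
import subprocess
from typing import Optional


def poppler_cpp_version(pkg_config: str = "pkg-config") -> Optional[str]:
    """Return the poppler-cpp version reported by pkg-config, or None if not found."""
    exe = shutil.which(pkg_config)
    if exe is None:
        # pkg-config itself is missing from PATH.
        return None
    result = subprocess.run(
        [exe, "--modversion", "poppler-cpp"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # pkg-config ran but could not find a poppler-cpp.pc file.
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    print(poppler_cpp_version() or "poppler-cpp not visible to pkg-config")
```

If this prints a version, the CMake check should find the library as well; if not, the build will fail the same way as above.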

meson-python: The poppler package is split between purelib and platlib

pip install python_poppler==0.4.0 on python 3.10 (Ubuntu 22.04) using meson 1.1.0 fails with:

#30 28.35       + meson install --no-rebuild --destdir /tmp/pip-install-d59qdq37/python-poppler_8100653bafcf4616b4f77887e540edc4/.mesonpy-zrqhtx8x/install
#30 28.35       
#30 28.35       meson-python: error: The poppler package is split between purelib and platlib: 'purelib/poppler/__init__.py' and 'platlib/poppler/cpp/global_.cpython-310-x86_64-linux-gnu.so', a "pure: false" argument may be missing in meson.build
#30 28.35       [end of output]

-> "pure: false" argument may be missing in meson.build

Caused by meson-python-0.13.0 release:

  • Raise an error when a package is split between platlib and purelib.

More info: #74

I'd guess that we need to pass pure: false in the meson.build file here (similar to test code of meson, release notes, docs):

python3 = python_mod.find_installation('python3', pure: false)

failing image format tests

Hi Charles,

I found 0.2.2 builds failing for some time in our distributions.
It turned out to require a patch similar to:

Index: b/tests/test_image.py
===================================================================
--- a/tests/test_image.py
+++ b/tests/test_image.py
@@ -40,8 +40,8 @@ def test_data_size(pdf_page):


 def test_image_format_to_str():
-    assert str(Image.Format.argb32) == "BGRA"
-    assert str(Image.Format.invalid) == ""
+    assert str(Image.Format.argb32) in ("BGRA", "format_enum.argb32")
+    assert str(Image.Format.invalid) in ("", "format_enum.invalid")
 
 
 def test_image_memory_view(pdf_page):

This looks related to the pybind11 version, since we use the system-provided one.

Is this intended? In other words, do you want me to prepare a PR with such a change, or do you plan to add some code to keep the old API?

If you want to check out my builds, look here.

The openSUSE_Leap_15.2 build is using an older pybind11, while the other builds use the current release.

Suppressing error messages

If the document is broken, Poppler spews error messages to stderr, which is messy (especially when processing multiple documents in parallel).

It would be great to have a way to silence these error messages (by setting Poppler's globalParams) or, even better, to provide a Python callback for reporting errors.
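
As a stopgap that does not depend on the binding, the stderr file descriptor can be pointed at the null device while the library runs. This is a generic OS-level sketch, not a python-poppler feature; silence_fd is a hypothetical name. The redirection is process-wide, so it hides output from every thread for its duration and suits worker processes better than threads.

```python
import os
from contextlib import contextmanager


@contextmanager
def silence_fd(fd: int = 2):
    """Temporarily redirect a file descriptor (default: stderr) to the null device."""
    saved = os.dup(fd)  # keep a handle on the original target
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, fd)  # fd now discards everything written to it
        yield
    finally:
        os.dup2(saved, fd)  # restore the original target
        os.close(devnull)
        os.close(saved)
```

Wrapping the calls that load and process a document in "with silence_fd():" would drop whatever native code writes to stderr in that block. Because it works on the descriptor rather than on sys.stderr, it also catches output written directly from C++.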

python-poppler not able to find poppler-cpp

Hello everyone,

I'm Arany. I am new to coding. I use Python. I'm here to ask about an issue with python-poppler. Let me explain the situation first.

So, I was trying to install python-poppler through pip and I have faced 3 issues so far and the first two are resolved. Let me list all of them below:

  1. Could not find vswhere.exe: I understood that I need both Visual Studio Installer (the management tool of VS) and Visual Studio to retain vswhere.exe and its functionalities, respectively.
  2. Could not find pkg-config: I downloaded pkg-config, glib and gettext-runtime and added it to the PATH environment variable. I placed pkg-config.exe, libglib-2.0.0.dll and intl.dll in the main folder and could not find it.
  3. (The current issue) Can't find poppler-cpp dependency

The third issue is where I need help. I installed Poppler, added the location into PATH and then python-poppler says it can't find it. It isn't a compatibility issue either! If this helps, to verify everything was done correctly, MS Copilot gave me the code 'poppler-cpp --version' and Command Prompt said it doesn't find any internal or external command, operable program or batch file named 'poppler-cpp'. I remembered that Copilot gave me the command 'pdftoppm -v' previously. 'pdftoppm' being a part of poppler, I used it as a substitute and it worked!

Please guide me through what I need to do to install python-poppler successfully.
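A note on the check itself: poppler-cpp is a library name, not a command, so "poppler-cpp --version" is expected to fail. Build logs elsewhere on this page show the build asking pkg-config for a module named poppler-cpp, so the question is whether pkg-config can see a poppler-cpp.pc file (its directory normally has to be listed in PKG_CONFIG_PATH, not PATH). A minimal sketch of that check, with pkg_config_version as a hypothetical helper name:

```python
import shutil
import subprocess


def pkg_config_version(module="poppler-cpp"):
    """Return the version pkg-config reports for `module`, or None if not found."""
    exe = shutil.which("pkg-config")
    if exe is None:
        return None                          # pkg-config itself is not on PATH
    result = subprocess.run(
        [exe, "--modversion", module],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None                          # no matching .pc file was found
    return result.stdout.strip()


if __name__ == "__main__":
    print("poppler-cpp version seen by pkg-config:", pkg_config_version())
```

If this prints None while pdftoppm works, the Poppler binaries are installed but the development files (headers and poppler-cpp.pc) are probably not visible to the build.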

Attribute error while installing poppler

Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\adity\AppData\Local\Temp\pip-install-ii9h7dq\python-poppler_c004275ea29e4cdbbb1a83cfed9b71c9\setup.py", line 76, in <module>
setup(
File "c:\program files\python39\lib\site-packages\setuptools\__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "c:\program files\python39\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "c:\program files\python39\lib\distutils\dist.py", line 966, in run_commands
self.run_command(cmd)
File "c:\program files\python39\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "c:\program files\python39\lib\site-packages\setuptools\command\install.py", line 61, in run
return orig.install.run(self)
File "c:\program files\python39\lib\distutils\command\install.py", line 546, in run
self.run_command('build')
File "c:\program files\python39\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "c:\program files\python39\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "c:\program files\python39\lib\distutils\command\build.py", line 135, in run
self.run_command(cmd_name)
File "c:\program files\python39\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "c:\program files\python39\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "C:\Users\adity\AppData\Local\Temp\pip-install-ii9h7dq\python-poppler_c004275ea29e4cdbbb1a83cfed9b71c9\setup.py", line 24, in run
out = subprocess.check_output(["cmake", "--version"])
File "c:\program files\python39\lib\subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "C:\Users\adity\AppData\Roaming\Python\Python39\site-packages\run\__init__.py", line 145, in __new__
process = cls.create_process(command, stdin, cwd=cwd, env=env, shell=shell)
File "C:\Users\adity\AppData\Roaming\Python\Python39\site-packages\run\__init__.py", line 121, in create_process
shlex.split(command),
File "c:\program files\python39\lib\shlex.py", line 315, in split
return list(lex)
File "c:\program files\python39\lib\shlex.py", line 300, in __next__
token = self.get_token()
File "c:\program files\python39\lib\shlex.py", line 109, in get_token
raw = self.read_token()
File "c:\program files\python39\lib\shlex.py", line 140, in read_token
nextchar = self.instream.read(1)
AttributeError: 'list' object has no attribute 'read'

Fails to install with old Ubuntu Xenial (16.04) build environment

I have a strange work environment: an Ubuntu Xenial VM, with libpoppler85 (via the cran/poppler PPA backport), python3.7.7 (via deadsnakes PPA), and a virtualenv for python modules:

bnewbold@bnewbold-dev$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

bnewbold@bnewbold-dev$ cmake --version
cmake version 3.17.3

CMake suite maintained and supported by Kitware (kitware.com/cmake).
bnewbold@bnewbold-dev$ python --version
Python 3.7.7
bnewbold@bnewbold-dev$ head -n2 /etc/os-release 
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"

This means I have a compatible version of libpoppler-cpp-dev, but a version of GCC which does not support C++17.

When building this wrapper package, I get the following build error both with pip install python-poppler (within virtualenv) or with python setup.py install (in git checkout, inside virtualenv):

  [pipenv.exceptions.InstallError]:   [ 62%] Linking CXX shared module ../lib.linux-x86_64-3.7/poppler/cpp/global_.cpython-37m-x86_64-linux-gnu.so
1870 [pipenv.exceptions.InstallError]:   /tmp/pip-install-b9bntv0h/python-poppler/src/cpp/version.cpp:22:18: error: expected ‘{’ before ‘::’ token
1871 [pipenv.exceptions.InstallError]:    namespace poppler::version
1872 [pipenv.exceptions.InstallError]:                     ^
1873 [pipenv.exceptions.InstallError]:   /tmp/pip-install-b9bntv0h/python-poppler/src/cpp/version.cpp:22:20: error: ‘version’ in namespace ‘::’ does not name a type
1874 [pipenv.exceptions.InstallError]:    namespace poppler::version
1875 [pipenv.exceptions.InstallError]:                       ^
1876 [pipenv.exceptions.InstallError]:   /tmp/pip-install-b9bntv0h/python-poppler/src/cpp/version.cpp:33:1: error: expected ‘}’ at end of input
1877 [pipenv.exceptions.InstallError]:    } // namespace poppler::version
1878 [pipenv.exceptions.InstallError]:    ^
1879 [pipenv.exceptions.InstallError]:   CMakeFiles/version.dir/build.make:62: recipe for target 'CMakeFiles/version.dir/src/cpp/version.cpp.o' failed
1880

My experience with C++ is pretty dated, but I believe this is because of using nested namespace syntax:

namespace poppler::version
{
// [...]
}

which is a C++17 feature (?). Build works with more verbose syntax:

namespace poppler
{
namespace version
{
// [...]
}
}

I know this is an old and strange environment to support, but it seems like the fix is very simple. Alternatively, if C++17 syntax is intended to be required, I think CMake can be informed of that dependency and give a cleaner error message.

Thank you for maintaining this wrapper package! As some context of how I am hoping to use this, I work at the Internet Archive and am looking to use poppler (from python) to extract metadata, text, and thumbnails for hundreds of millions of PDFs that we have crawled from the web. We are in the process of upgrading away from Xenial but it will take many months to complete the transition.

not able to install in windows

  1. Running "pip install python-poppler" on Windows 11 fails: the build cannot find poppler-cpp.

  2. But I do have "poppler-cpp.lib" in this directory (after running "conda install -c conda-forge poppler"), and its path is already included in the environment variable.


Segmentation fault (core dumped)

Hi, thank you for this amazing work. I have been using poppler on a number of PDFs and it works great for most of them, but for some I get the following error:

Segmentation fault (core dumped)

Since this is a memory fault, I can't wrap it in try/except, so my workers keep restarting and getting stuck on the same file. This has been a major problem for me.
Here is some context and the debugging I have done so far:

  1. The segmentation fault happens when I call page.text_list(page.TextListOption.text_list_include_font).
  2. If I remove the optional enum, the error goes away. pdf_document.create_font_iterator() also works, but I hit the error when I try to get font information at the text-box level.
  3. As soon as execution reaches boxes = self._page.text_list(opt_flag) in page.py, the process stops with the error.
  4. I initially thought this might be an upstream error in the C++ code itself, but other libraries based on poppler seem to work fine on this PDF, so I suspect something is going wrong in the Python bindings.

The metadata of the PDFs that trigger this error is mostly (not always):

{'Producer': 'macOS Version 11.2.3 (Build 20D91) Quartz PDFContext', 'Creator': 'Pages'}

The code to reproduce the error:

from poppler import load_from_file
file_path = "sample_pdf.pdf"
pdf_document = load_from_file(file_path)
no_of_pages = pdf_document.pages
for page_ind in range(no_of_pages):
    page = pdf_document.create_page(page_ind)
    text_list = page.text_list(page.TextListOption.text_list_include_font)

The link to the PDF: https://drive.google.com/file/d/180CDGyiJRfytvuzVsAiYKppHvaBABGkJ/view?usp=sharing
Please request access to the PDF, as I can't share it publicly. (Really sorry for this, but I hope you understand.)
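Because a segmentation fault kills the interpreter and cannot be caught with try/except, one way to keep workers alive is to run the risky call in a child process and inspect its exit status. This is a sketch under assumptions: run_isolated and CHILD_CODE are hypothetical names, and the child script simply mirrors the repro above. On POSIX a child killed by a signal reports a negative return code.

```python
import subprocess
import sys

# Script executed in the child interpreter; mirrors the repro above.
CHILD_CODE = """
import sys
from poppler import load_from_file
document = load_from_file(sys.argv[1])
for index in range(document.pages):
    page = document.create_page(index)
    page.text_list(page.TextListOption.text_list_include_font)
"""


def run_isolated(code, *args, timeout=60):
    """Run `code` in a fresh interpreter and return (ok, returncode)."""
    result = subprocess.run(
        [sys.executable, "-c", code, *args],
        capture_output=True,                 # keep the child's output out of the worker's streams
        timeout=timeout,                     # raises subprocess.TimeoutExpired on a hang
    )
    return result.returncode == 0, result.returncode
```

A worker could call run_isolated(CHILD_CODE, file_path) and skip or log the file when ok is False, instead of crashing itself. The cost is one interpreter start per document.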

ERROR!!! [HELP WANTED]

I am currently working on a PDF viewer with the help of python-poppler, but I ran into an error while installing the package.

error1
error2

What can I do to solve this error?

Segmentation fault with EmbeddedFile class

I want to use the EmbeddedFile class. However, I get a segmentation fault.

from poppler import load_from_file

pdf_document = load_from_file("Portfolio.pdf")

for file in pdf_document.embedded_files():
     print(file.name)

What further information would help? I saw in the TODO file that EmbeddedFile is not tested yet. Does it work?

Font name returns *ignored* and font size=-1

Hi,

I'm trying to get font information out of pdf document:

from poppler import load_from_file

file_path = "document.pdf"  # placeholder path, not from the original report
document = load_from_file(file_path)
page = document.create_page(0)
box = page.text_list()[0]

print(box.text)  # prints the text as expected
print(box.get_font_name())  # prints "*ignored*"
print(box.get_font_size())  # prints "-1"

Using poppler version 0.90 on macos
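A possible explanation, offered with some caution: another report on this page passes page.TextListOption.text_list_include_font to text_list, and the placeholder values here look like what is returned when font information was not requested. A minimal sketch of the call with that option follows; first_box_font is a hypothetical helper, and the option is only expected to exist with poppler 0.89 or later.

```python
def first_box_font(page):
    """Return (text, font_name, font_size) for the first text box, or None."""
    # Request font information explicitly; this option name appears in
    # another report on this page.
    option = page.TextListOption.text_list_include_font
    boxes = page.text_list(option)
    if not boxes:
        return None                          # the page has no text boxes
    box = boxes[0]
    return box.text, box.get_font_name(), box.get_font_size()
```

Called as first_box_font(document.create_page(0)), this would show whether the font name and size are filled in once the option is set.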

Issue in installing python-poppler

Hey

I am trying to install python-poppler on my Ubuntu machine. It installs successfully, but when I try to import it, I see this error:

>>> import poppler
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/poppler/build/py-virt3/lib/python3.6/site-packages/poppler/__init__.py", line 22, in <module>
    from poppler.document import load, load_from_file, load_from_data
  File "/home/ec2-user/poppler/build/py-virt3/lib/python3.6/site-packages/poppler/document.py", line 21, in <module>
    from poppler.destination import Destination
  File "/home/ec2-user/poppler/build/py-virt3/lib/python3.6/site-packages/poppler/destination.py", line 21, in <module>
    from poppler.cpp.destination import type_enum as DestinationType  # noqa
ImportError: cannot import name 'type_enum'

I have installed poppler from source using instructions mentioned here. Any help would be appreciated.

Thanks.

Does not handle loading failures

When I try to load a document that is not a PDF, load_from_file succeeds, but all subsequent methods crash. The reason is that self._document becomes None and most subsequent methods are decorated with @ensure_unlocked, which calls self._document.is_locked.

I am willing to fix it, but which solution do you prefer? We can add an is_broken method, which can be used after loading the document to check if it was loaded successfully. Or we can make loading throw an exception upon failure.
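To make the second option concrete, here is a rough sketch of a check that could run right after loading. The names PDFLoadError and ensure_loaded are hypothetical and not part of python-poppler; the only detail taken from the report is that the private attribute _document is None after a failed load.

```python
class PDFLoadError(Exception):
    """Hypothetical exception type; not part of python-poppler."""


def ensure_loaded(document, source="<unknown>"):
    """Return `document`, or raise PDFLoadError if nothing was loaded."""
    # `_document` is the private attribute named in the report above.
    if document is None or getattr(document, "_document", None) is None:
        raise PDFLoadError(f"could not load {source!r} as a PDF")
    return document
```

With this, ensure_loaded(load_from_file(path), path) would fail at load time with a clear message rather than later inside @ensure_unlocked. An is_broken method would wrap the same test.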

Is it possible to know the order of the text blocks

Firstly, thanks for your contribution thus far.

I've been using Poppler for a while now and it is not clear how to sort the boxes that we receive.

I noticed that Poppler has a "PopplerStructureElement" https://poppler.freedesktop.org/api/glib/PopplerStructureElement.html , which, I believe, allows us to understand which box comes after which.

Is there a way to replicate this? Or is there an alternative way to know the order for certain?

Thanks in advance

installation issue: can't build wheel for python-poppler

I'm running on an Intel Mac.

The output is:

Defaulting to user installation because normal site-packages is not writeable
Collecting python-poppler==0.3.0
Using cached python-poppler-0.3.0.tar.gz (823 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: python-poppler
Building wheel for python-poppler (pyproject.toml) ... error
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python3.6 /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/tmpgpdvlixh
cwd: /private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-install-oc4qvs72/python-poppler_d930c1d82b0346c29cf6165469df5d7f
Complete output (85 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.6
creating build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/toc.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/pagetransition.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/destination.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/_version.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/pagerenderer.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/__init__.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/page.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/font.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/document.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/rectangle.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/embeddedfile.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/utilities.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/image.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
creating build/lib.macosx-10.9-x86_64-3.6/poppler/cpp
copying src/poppler/cpp/__init__.py -> build/lib.macosx-10.9-x86_64-3.6/poppler/cpp
running egg_info
writing src/python_poppler.egg-info/PKG-INFO
writing dependency_links to src/python_poppler.egg-info/dependency_links.txt
writing top-level names to src/python_poppler.egg-info/top_level.txt
reading manifest file 'src/python_poppler.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE.txt'
writing manifest file 'src/python_poppler.egg-info/SOURCES.txt'
running build_ext
Traceback (most recent call last):
File "setup.py", line 24, in run
out = subprocess.check_output(["cmake", "--version"])
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 336, in check_output
**kwargs).stdout
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 403, in run
with Popen(*popenargs, **kwargs) as process:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'cmake': 'cmake'

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
main()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 262, in build_wheel
metadata_directory)
File "/private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-build-env-_l32gwp0/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 231, in build_wheel
wheel_directory, config_settings)
File "/private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-build-env-_l32gwp0/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 215, in _build_with_temp_dir
self.run_setup()
File "/private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-build-env-_l32gwp0/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 268, in run_setup
self).run_setup(setup_script=setup_script)
File "/private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-build-env-_l32gwp0/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 158, in run_setup
exec(compile(code, file, 'exec'), locals())
File "setup.py", line 108, in <module>
zip_safe=False,
File "/private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-build-env-_l32gwp0/overlay/lib/python3.6/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-build-env-_l32gwp0/overlay/lib/python3.6/site-packages/wheel/bdist_wheel.py", line 299, in run
self.run_command('build')
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "setup.py", line 28, in run
+ ", ".join(e.name for e in self.extensions)
RuntimeError: CMake must be installed to build the following extensions: poppler.cpp.modules

ERROR: Failed building wheel for python-poppler
Failed to build python-poppler
ERROR: Could not build wheels for python-poppler, which is required to install pyproject.toml-based projects
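The traceback above ends in FileNotFoundError for 'cmake' and the message "CMake must be installed", so the build tools are missing rather than anything in python-poppler being broken. A small preflight check, sketched here with the hypothetical helper name missing_build_tools, can confirm which tools are absent from PATH before retrying pip:

```python
import shutil


def missing_build_tools(tools=("cmake", "pkg-config")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Install these before running pip:", ", ".join(missing))
    else:
        print("cmake and pkg-config were both found on PATH.")
```

The default tool list is an assumption based on the logs on this page, which show both CMake and pkg-config being used by the 0.3.0 build.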

python-poppler-glib is not outdated

This is just a doc comment: your note on https://cbrunet.github.io/python-poppler/ that the python-poppler poppler-glib binding is outdated is quite misleading:

python-poppler
    Binding based on poppler-glib. Latest version is from 2009…

This module has not been updated because it is now an integral part of poppler-glib. It works fine with python3, and it's probably the most complete binding. The thing which is actually quite lagging is poppler-cpp (no access to annotations for example, which was my use case, but I guess that there are other things missing).

Because your project comes very early in search results for "python poppler", and because the existence of the poppler-glib binding is not easily discoverable, it would be really helpful if you could update this part of your documentation.

Is it possible to know using this tool if a font is embedded and has to_unicode map?

The workflow of one of my scripts relies on calling subprocess.run for each PDF page to get its pdffonts output. Those calls are expensive, so I'm looking for better ways to identify likely problematic fonts from the font name, encoding, "embeddedness" and Unicode conversion.

Example of pdffonts output:

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
DIIEDG+Arial                         CID TrueType      Identity-H       yes yes no   51834  0
DIIEDH+Arial                         CID TrueType      Identity-H       yes yes no   51831  0
DIIDPF+ArialMT                       CID TrueType      Identity-H       yes yes yes  51824  0
DIIEBG+TimesNewRomanPSMT             CID TrueType      Identity-H       yes yes yes  51821  0
[none]                               Type 3            Custom           yes no  no   51861  0
Arial                                TrueType          WinAnsi          yes no  no   67975  0

Is it possible to extract this information with just the poppler backend used here? I took a look at the source code and did not find anything like that.

Thanks in advance,
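As a stopgap while the binding does not expose the "uni" column, the cost might be reduced by running pdffonts once per document and parsing its table, rather than once per page. The sketch below only covers the parsing step. It assumes the fixed-width layout shown above, where the row of dashes marks the column boundaries, and the function names parse_pdffonts and suspicious_fonts are hypothetical.

```python
import re


def parse_pdffonts(output):
    """Parse pdffonts' fixed-width table into a list of dicts keyed by column name."""
    lines = [line for line in output.splitlines() if line.strip()]
    header, rule, rows = lines[0], lines[1], lines[2:]
    # Each run of dashes in the rule line marks one column's start and end.
    spans = [match.span() for match in re.finditer(r"-+", rule)]
    names = [header[start:end].strip() for start, end in spans]
    fonts = []
    for row in rows:
        cells = [row[start:end].strip() for start, end in spans[:-1]]
        cells.append(row[spans[-1][0]:].strip())   # last column runs to end of line
        fonts.append(dict(zip(names, cells)))
    return fonts


def suspicious_fonts(fonts):
    """Return fonts that are not embedded or lack a ToUnicode map."""
    return [font for font in fonts if font["emb"] != "yes" or font["uni"] != "yes"]
```

Slicing by column position instead of splitting on whitespace keeps values such as "CID TrueType" and "Type 3" intact.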

[Enhancement]: Provide a way to access images embedded in a PDF

I'm trying to extract both the text and images in a PDF, ideally without recompressing or converting the image formats if possible.

This is pretty trivial to do with pdfminer or PyPDF2, but they're cripplingly slow for extracting text (a PDF that poppler/python-poppler processes in ~1-2 seconds takes 200+ seconds). On the other hand, there doesn't seem to be any way to get images through python-poppler.

Now, poppler provides a pdfimages utility that uses poppler to extract images, but it looks like it's pretty annoying to do internally.

AFAICT, pdfimages extracts images by providing a custom page rendering device (ImageOutputDev) which, rather than doing actual compositing when rendering a page, ignores all draw commands other than drawImage_xxx() calls and simply saves the data passed to those calls.

I think it'd be pretty easy to tweak ImageOutputDev.cc to instead write images to memory, and then provide a python call that returns the images as bytes, but I don't have a great idea how to start integrating this with the wrapper bits.

Exceptions in `page.py` after poppler update

After applying distribution updates to poppler on KDE Neon User, I got AttributeError exceptions when importing python-poppler, caused by the following code passages:

if version() >= (0, 89, 0):
    dictionary["WritingMode"] = page.writing_mode_enum

if version() >= (0, 89, 0):
    dictionary["TextListOption"] = page.text_list_option_enum

Removing these blocks fixed the issue for me. I installed python-poppler from source. The distribution-provided poppler package is at version 22.04.0.
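Rather than deleting the blocks, a gentler workaround might be to test for the attribute itself and not the reported version. That would cover the case where the compiled extension lacks an enum that the runtime version check expects, which is an assumption about the cause here, not a confirmed diagnosis. The helper name optional_enums is hypothetical:

```python
def optional_enums(module, names):
    """Map public names to attributes of `module`, skipping any that are missing."""
    found = {}
    for public_name, attribute in names.items():
        value = getattr(module, attribute, None)   # None when the attribute is absent
        if value is not None:
            found[public_name] = value
    return found
```

In page.py this could be used as dictionary.update(optional_enums(page, {"WritingMode": "writing_mode_enum", "TextListOption": "text_list_option_enum"})), so the import no longer raises AttributeError when an enum is unavailable.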

Installing on MacOsX Catalina

I can't seem to be able to install python-poppler on my system (MacOS X Catalina).
I have Xcode installed with the commandline utilities (I can compile with clang).
I run brew install poppler with no issue, the files are correctly found at /usr/local/opt/poppler.
My CFLAGS and CPPFLAGS environment variables both include -I/usr/local/opt/poppler/include.
I manage python2/3 via pyenv. pyenv version gives me 3.8.2 as expected.

When I run pip install python-poppler the installation fails.
Examining the log it seems that the compiler fails to find the header files:

    running build_ext
    -- The C compiler identification is AppleClang 11.0.3.11030032
    -- The CXX compiler identification is AppleClang 11.0.3.11030032
    -- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
    -- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
    -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Found PkgConfig: /usr/local/bin/pkg-config (found version "0.29.2")
    -- Checking for module 'poppler-cpp>=0.62.0'
    --   Found poppler-cpp, version 0.89.0
    -- Found PythonInterp: /Users/bordaigorl/.pyenv/versions/3.8.2/bin/python3.8 (found version "3.8.2")
    -- Found PythonLibs: /Users/bordaigorl/.pyenv/versions/3.8.2/lib/libpython3.8.a
    -- Performing Test HAS_CPP14_FLAG
    -- Performing Test HAS_CPP14_FLAG - Success
    -- pybind11 v2.5.0
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- LTO enabled
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /private/var/folders/jx/r7g7g23n5rb_fdycrtdl5sdw0000gn/T/pip-install-j_u40vtr/python-poppler/build/temp.macosx-10.15-x86_64-3.8
    Scanning dependencies of target font
    Scanning dependencies of target page_renderer
    [  8%] Building CXX object CMakeFiles/page_renderer.dir/src/cpp/page_renderer.cpp.o
    [  8%] Building CXX object CMakeFiles/font.dir/src/cpp/font.cpp.o
    In file included from /private/var/folders/jx/r7g7g23n5rb_fdycrtdl5sdw0000gn/T/pip-install-j_u40vtr/python-poppler/src/cpp/page_renderer.cpp:19:
    /private/var/folders/jx/r7g7g23n5rb_fdycrtdl5sdw0000gn/T/pip-install-j_u40vtr/python-poppler/src/cpp/version.h:21:10: fatal error: 'poppler/cpp/poppler-version.h' file not found
    #include <poppler/cpp/poppler-version.h>
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1 error generated.

I was very confused because clang sees the includes just fine if run manually.
I then thought that maybe it's cmake's fault and found this:
https://gitlab.kitware.com/cmake/cmake/-/issues/19120
So I symlinked

sudo ln -s /usr/local/include/poppler /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/poppler

and now pip install python-poppler works.
Is there a way to fix this without dodgy symlinks?

Poor image rendering quality

Hello @cbrunet . Excellent tool. This is one of the fastest ways to generate PDF images. Thanks for building this.

I have tried python-poppler to render a PDF page, following the documentation available at https://cbrunet.net/python-poppler/usage.html. I have used the following code.

import poppler

doc = poppler.load_from_file(PDF_PATH)
renderer = poppler.PageRenderer()

page = doc.create_page(54)
image = renderer.render_page(page, xres=300, yres=300)
image.save('test.png', 'png', dpi=300)

However, the resulting image is poorly rendered compared to the ones generated using pdftocairo. I assume it is an issue with hinting. I am attaching sample images for comparison. Zoom in on each file to see the difference: the font edges are jagged in the image generated by python-poppler.

pdftocairo
python_poppler

install in ubuntu 22.04

I tried to install on Ubuntu 22.04 and hit the same problem as #28.
After I installed libpoppler-cpp-dev, everything worked.
I am opening this issue only to document how I fixed #28.
Here is my console log:

(avada) agent@tm:~/Downloads/python-poppler$ git submodule update --init --recursive
(avada) agent@tm:~/Downloads/python-poppler$ python setup.py install
running install
/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
creating src/python_poppler.egg-info
writing src/python_poppler.egg-info/PKG-INFO
writing dependency_links to src/python_poppler.egg-info/dependency_links.txt
writing top-level names to src/python_poppler.egg-info/top_level.txt
writing manifest file 'src/python_poppler.egg-info/SOURCES.txt'
reading manifest file 'src/python_poppler.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE.txt'
writing manifest file 'src/python_poppler.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/poppler
copying src/poppler/document.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/utilities.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/toc.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/rectangle.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/__init__.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/page.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/pagerenderer.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/pagetransition.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/font.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/destination.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/image.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/embeddedfile.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/_version.py -> build/lib.linux-x86_64-3.8/poppler
creating build/lib.linux-x86_64-3.8/poppler/cpp
copying src/poppler/cpp/__init__.py -> build/lib.linux-x86_64-3.8/poppler/cpp
running build_ext
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 11.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- pybind11 v2.9.2 
-- Found PythonInterp: /home/agent/anaconda3/envs/avada/bin/python (found version "3.8.13") 
-- Found PythonLibs: /home/agent/anaconda3/envs/avada/lib/libpython3.8.so
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2") 
-- Checking for module 'poppler-cpp>=0.26.0'
--   No package 'poppler-cpp' found
CMake Error at /usr/share/cmake-3.22/Modules/FindPkgConfig.cmake:603 (message):
  A required package was not found
Call Stack (most recent call first):
  /usr/share/cmake-3.22/Modules/FindPkgConfig.cmake:825 (_pkg_check_modules_internal)
  CMakeLists.txt:14 (pkg_check_modules)


-- Configuring incomplete, errors occurred!
See also "/home/agent/Downloads/python-poppler/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeOutput.log".
Traceback (most recent call last):
  File "setup.py", line 76, in <module>
    setup(
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 148, in setup
    return run_commands(dist)
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
    dist.run_commands()
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
    self.run_command(cmd)
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/dist.py", line 1214, in run_command
    super().run_command(command)
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
    cmd_obj.run()
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/install.py", line 74, in run
    self.do_egg_install()
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/install.py", line 123, in do_egg_install
    self.run_command('bdist_egg')
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/dist.py", line 1214, in run_command
    super().run_command(command)
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
    cmd_obj.run()
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 165, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 151, in call_command
    self.run_command(cmdname)
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/dist.py", line 1214, in run_command
    super().run_command(command)
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
    cmd_obj.run()
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/command/install_lib.py", line 107, in build
    self.run_command('build_ext')
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/dist.py", line 1214, in run_command
    super().run_command(command)
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
    cmd_obj.run()
  File "setup.py", line 39, in run
    self.build_extension(ext)
  File "setup.py", line 68, in build_extension
    subprocess.check_call(
  File "/home/agent/anaconda3/envs/avada/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/home/agent/Downloads/python-poppler', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/agent/Downloads/python-poppler/build/lib.linux-x86_64-3.8/poppler/cpp', '-DPYTHON_EXECUTABLE=/home/agent/anaconda3/envs/avada/bin/python', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.
(avada) agent@tm:~/Downloads/python-poppler$ sudo apt-get update -y
Hit:1 http://vn.archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://vn.archive.ubuntu.com/ubuntu jammy-updates InRelease                                                                                            
Hit:3 http://vn.archive.ubuntu.com/ubuntu jammy-backports InRelease                                                                                          
Hit:4 http://vn.archive.ubuntu.com/ubuntu jammy-security InRelease                                                                                           
Hit:5 http://packages.microsoft.com/repos/code stable InRelease                                                                                              
Hit:6 https://dl.google.com/linux/chrome/deb stable InRelease                                                                                                
Hit:7 https://packages.microsoft.com/repos/edge stable InRelease                                                                                             
Hit:8 https://linux.teamviewer.com/deb stable InRelease                                                                                                      
Hit:9 https://download.sublimetext.com apt/stable/ InRelease                                                                              
Hit:10 https://ppa.launchpadcontent.net/bamboo-engine/ibus-bamboo/ubuntu jammy InRelease                           
Hit:11 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:12 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Ign:13 https://ppa.launchpadcontent.net/numix/ppa/ubuntu jammy InRelease
Hit:14 https://ppa.launchpadcontent.net/papirus/papirus/ubuntu jammy InRelease
Err:15 https://ppa.launchpadcontent.net/numix/ppa/ubuntu jammy Release
  404  Not Found [IP: 185.125.190.52 443]
Reading package lists... Done
W: https://linux.teamviewer.com/deb/dists/stable/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
W: https://download.sublimetext.com/apt/stable/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
E: The repository 'https://ppa.launchpadcontent.net/numix/ppa/ubuntu jammy Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
(avada) agent@tm:~/Downloads/python-poppler$ sudo apt-get install -y libpoppler-cpp-dev
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following package was automatically installed and is no longer required:
  libmessaging-menu0
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  libpoppler-cpp-dev
0 upgraded, 1 newly installed, 0 to remove and 19 not upgraded.
Need to get 11,7 kB of archives.
After this operation, 89,1 kB of additional disk space will be used.
Get:1 http://vn.archive.ubuntu.com/ubuntu jammy/main amd64 libpoppler-cpp-dev amd64 22.02.0-2 [11,7 kB]
Fetched 11,7 kB in 1s (12,6 kB/s)             
Selecting previously unselected package libpoppler-cpp-dev:amd64.
(Reading database ... 392845 files and directories currently installed.)
Preparing to unpack .../libpoppler-cpp-dev_22.02.0-2_amd64.deb ...
Unpacking libpoppler-cpp-dev:amd64 (22.02.0-2) ...
Setting up libpoppler-cpp-dev:amd64 (22.02.0-2) ...
(avada) agent@tm:~/Downloads/python-poppler$ python setup.py install
running install
/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing src/python_poppler.egg-info/PKG-INFO
writing dependency_links to src/python_poppler.egg-info/dependency_links.txt
writing top-level names to src/python_poppler.egg-info/top_level.txt
reading manifest file 'src/python_poppler.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE.txt'
writing manifest file 'src/python_poppler.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
-- pybind11 v2.9.2 
-- Checking for module 'poppler-cpp>=0.26.0'
--   Found poppler-cpp, version 22.02.0
-- Configuring done
-- Generating done
-- Build files have been written to: /home/agent/Downloads/python-poppler/build/temp.linux-x86_64-3.8
[  4%] Building CXX object CMakeFiles/global_.dir/src/cpp/global.cpp.o
[  8%] Building CXX object CMakeFiles/version.dir/src/cpp/version.cpp.o
[ 12%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/version.cpython-38-x86_64-linux-gnu.so
[ 12%] Built target version
[ 16%] Building CXX object CMakeFiles/rectangle.dir/src/cpp/rectangle.cpp.o
[ 20%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/global_.cpython-38-x86_64-linux-gnu.so
[ 20%] Built target global_
[ 25%] Building CXX object CMakeFiles/image.dir/src/cpp/image.cpp.o
[ 29%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/rectangle.cpython-38-x86_64-linux-gnu.so
[ 29%] Built target rectangle
[ 33%] Building CXX object CMakeFiles/document.dir/src/cpp/document.cpp.o
[ 37%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/image.cpython-38-x86_64-linux-gnu.so
[ 37%] Built target image
[ 41%] Building CXX object CMakeFiles/page.dir/src/cpp/page.cpp.o
[ 45%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/document.cpython-38-x86_64-linux-gnu.so
[ 45%] Built target document
[ 50%] Building CXX object CMakeFiles/page_renderer.dir/src/cpp/page_renderer.cpp.o
[ 54%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/page.cpython-38-x86_64-linux-gnu.so
[ 54%] Built target page
[ 58%] Building CXX object CMakeFiles/page_transition.dir/src/cpp/page_transition.cpp.o
[ 62%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/page_renderer.cpython-38-x86_64-linux-gnu.so
[ 62%] Built target page_renderer
[ 66%] Building CXX object CMakeFiles/embedded_file.dir/src/cpp/embedded_file.cpp.o
[ 70%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/page_transition.cpython-38-x86_64-linux-gnu.so
[ 70%] Built target page_transition
[ 75%] Building CXX object CMakeFiles/destination.dir/src/cpp/destination.cpp.o
[ 79%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/embedded_file.cpython-38-x86_64-linux-gnu.so
[ 79%] Built target embedded_file
[ 83%] Building CXX object CMakeFiles/toc.dir/src/cpp/toc.cpp.o
[ 87%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/toc.cpython-38-x86_64-linux-gnu.so
[ 87%] Built target toc
[ 91%] Building CXX object CMakeFiles/font.dir/src/cpp/font.cpp.o
[ 95%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/destination.cpython-38-x86_64-linux-gnu.so
[ 95%] Built target destination
[100%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/font.cpython-38-x86_64-linux-gnu.so
[100%] Built target font
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/document.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/utilities.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/toc.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/rectangle.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/__init__.py -> build/bdist.linux-x86_64/egg/poppler
creating build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/image.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/page_renderer.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/destination.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/font.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/__init__.py -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/global_.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/version.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/page.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/toc.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/document.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/rectangle.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/page_transition.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/embedded_file.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/page.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/pagerenderer.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/pagetransition.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/font.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/destination.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/image.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/embeddedfile.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/_version.py -> build/bdist.linux-x86_64/egg/poppler
byte-compiling build/bdist.linux-x86_64/egg/poppler/document.py to document.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/utilities.py to utilities.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/toc.py to toc.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/rectangle.py to rectangle.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/cpp/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/page.py to page.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/pagerenderer.py to pagerenderer.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/pagetransition.py to pagetransition.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/font.py to font.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/destination.py to destination.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/image.py to image.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/embeddedfile.py to embeddedfile.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/_version.py to _version.cpython-38.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying src/python_poppler.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying src/python_poppler.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying src/python_poppler.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying src/python_poppler.egg-info/not-zip-safe -> build/bdist.linux-x86_64/egg/EGG-INFO
copying src/python_poppler.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
creating dist
creating 'dist/python_poppler-0.3.0-py3.8-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing python_poppler-0.3.0-py3.8-linux-x86_64.egg
creating /home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/python_poppler-0.3.0-py3.8-linux-x86_64.egg
Extracting python_poppler-0.3.0-py3.8-linux-x86_64.egg to /home/agent/anaconda3/envs/avada/lib/python3.8/site-packages
Adding python-poppler 0.3.0 to easy-install.pth file

Installed /home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/python_poppler-0.3.0-py3.8-linux-x86_64.egg
Processing dependencies for python-poppler==0.3.0
Finished processing dependencies for python-poppler==0.3.0
(avada) agent@tm:~/Downloads/python-poppler$ 

Cannot install on Windows

When I try to install poppler in Anaconda or CMD:
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org python-poppler

An error occurred. Log attached:
pytest.1.txt

Please advise how to fix it.

Relax `unlock()`/`load_from_...()` to also accept only one of both passwords

While doing tests with encrypted PDFs, I noticed that python-poppler strictly requires both the user and the owner password to be given for decrypting a file.

This behaviour is different from all other libraries I tried (pikepdf¹, pymupdf², redstork³, pypdfium-reboot⁴). These only have one password argument and then automatically detect whether it is the owner or user password.

The current behaviour does not make sense: once the owner password is given, no user password is needed, because the owner already has all permissions. Conversely, someone who only has the user password has no chance of reading the file with python-poppler, because the owner password is a strictly required argument (opening fails when it is set to "" or None).

So this should be changed so that either user_password or owner_password alone is sufficient, or (even better) both could be merged into a single password argument, with auto-detection of which one it is.
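A rough sketch of what the auto-detection could look like, in case it helps the discussion. Here `try_open` is a hypothetical callable (not part of python-poppler) that takes an owner password and a user password and returns a document, or None if the file is still locked. The sketch only illustrates the proposed behaviour; it does not describe how python-poppler works today.

```python
from typing import Callable, Optional, TypeVar

Doc = TypeVar("Doc")


def open_with_password(
    try_open: Callable[[str, str], Optional[Doc]],
    password: str,
) -> Doc:
    """Open a document when the caller has only one password.

    `try_open(owner_password, user_password)` is a hypothetical opener
    that returns None when the document could not be unlocked.
    """
    # Try the password as the owner password first (full permissions),
    # then fall back to treating it as the user password.
    for owner, user in ((password, ""), ("", password)):
        doc = try_open(owner, user)
        if doc is not None:
            return doc
    raise ValueError("password matches neither the owner nor the user password")
```

With an opener like this, callers would not need to know in advance which of the two passwords they hold.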

References:
¹ https://pikepdf.readthedocs.io/en/latest/api/main.html#pikepdf.Pdf.open
² https://pymupdf.readthedocs.io/en/latest/document.html#Document.authenticate
³ https://red-stork.readthedocs.io/en/latest/reference.html#redstork.Document
⁴ https://developers.foxit.com/resources/pdf-sdk/c_api_reference_pdfium/group___f_p_d_f_i_u_m.html#gaf783381b0fe5d3f579e9443b3877a7b1

I have attached the file that I used for testing - owner password is test_owner and user password test_user
encrypted.pdf

Clarification and Improvement of `rect` and `rectf` Handling

While using PyLance for local development, I've encountered an inconsistency in the Python bindings related to the handling of Rectangle objects (rect and rectf). Specifically, the constructor for a Rectangle can accept either four integers or four floats, creating a rect or rectf respectively. However, this distinction becomes unclear when using certain functions like Page.text() that specifically require a rectf for the bounding box.

Currently, the bindings do not clearly differentiate between a rect and a rectf from the perspective of a developer working with Rectangles. This leads to confusion, especially since there's no visible difference in the Python code.

Additionally, there's an issue with PyLance when attempting to construct a Rectangle using float values. PyLance reports an error because the default arguments for the constructor are integers, causing type conflicts.

Proposed Solutions:

To address these issues, I suggest one of the following approaches:

  1. Document and Distinguish rect and rectf Types:

    • Update the documentation to clearly differentiate between rect and rectf.
    • Overload the Rectangle constructor to include both (x: float, y: float, w: float, h: float) and (x: int, y: int, w: int, h: int). This provides explicit constructors for each type and makes it clear to the developer which type they are working with.
  2. Transparent Handling of Type Differences:

    • Modify the Rectangle constructor to handle both int and float types seamlessly. This approach would abstract the complexity from the developer, allowing for more flexible and intuitive usage of the API.
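To illustrate the second approach, here is a minimal sketch. `RectangleSketch` is a hypothetical stand-in written for this example, not the actual python-poppler Rectangle class, and the names `is_float` and `as_rectf` are assumptions. The idea is that the constructor is typed to accept both int and float, and a conversion to the float form is available wherever a rectf is required.

```python
from dataclasses import dataclass
from typing import Union

Number = Union[int, float]


@dataclass(frozen=True)
class RectangleSketch:
    """Hypothetical rectangle that accepts int or float coordinates."""

    x: Number = 0
    y: Number = 0
    w: Number = 0
    h: Number = 0

    @property
    def is_float(self) -> bool:
        # Treated as a "rectf" as soon as any coordinate is a float.
        return any(isinstance(v, float) for v in (self.x, self.y, self.w, self.h))

    def as_rectf(self) -> "RectangleSketch":
        # Convert every coordinate to float, e.g. before passing the
        # rectangle to a function that requires a rectf bounding box.
        return RectangleSketch(float(self.x), float(self.y), float(self.w), float(self.h))
```

Typing the arguments as `Union[int, float]` would also avoid the type-checker complaint about float arguments against integer defaults.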

installation fails on mac due to c++11 not being used

It fails like in https://stackoverflow.com/questions/45047508/error-unknown-type-name-constexpr-during-make-in-mac-os-x:

 FAILED: src/cpp/document.cpython-310-darwin.so.p/document.cpp.o
c++ -Isrc/cpp/document.cpython-310-darwin.so.p -Isrc/cpp -I../../src/cpp -I../../subprojects/pybind11-2.10.3/include -I/opt/homebrew/Cellar/poppler/23.04.0/include/poppler/cpp -I/opt/homebrew/Cellar/poppler/23.04.0/include/poppler -I/opt/homebrew/opt/python@3.10/Frameworks/Python.framework/Versions/3.10/include/python3.10 -fvisibility=hidden -fvisibility-inlines-hidden -fcolor-diagnostics -DNDEBUG -Wall -Winvalid-pch -O3 -MD -MQ src/cpp/document.cpython-310-darwin.so.p/document.cpp.o -MF src/cpp/document.cpython-310-darwin.so.p/document.cpp.o.d -o src/cpp/document.cpython-310-darwin.so.p/document.cpp.o -c ../../src/cpp/document.cpp

      ../../subprojects/pybind11-2.10.3/include/pybind11/detail/common.h:547:15: error: unknown type name 'constexpr'
      inline static constexpr size_t size_in_ptrs(size_t s) {

To fix it, add default_options : ['c_std=c11', 'cpp_std=c++11'] to the project() call in meson.build:

project(
    'python-poppler', 
    'cpp', 
    version: '0.4.0', 
    license: 'GNU General Public License v2 (GPLv2)', 
    # license_files: 'LICENSE.txt',
    meson_version: '>=1.0.0',
    default_options : ['c_std=c11', 'cpp_std=c++11']
)

Then installation works:

$ pip3 install meson meson-python
$ pip3 install python_poppler

Successfully installed python-poppler-0.4.0

Rotation Enum is very inconsistent and does not match the actual poppler-cpp

As of the time of writing, the Rotation Enum of python-poppler looks like this:

poppler.Rotation.rotate_0
poppler.Rotation.rotate_90
poppler.Rotation.rotate18_0
poppler.Rotation.rotate27_0

which can be confirmed with:

>>> import poppler
>>> vars(poppler.Rotation)
mappingproxy({'__init__': <instancemethod __init__ at 0x7f2f00e265b0>, '__doc__': <pybind11_builtins.pybind11_static_property object at 0x7f2f00dfa4f0>, '__module__': 'poppler.cpp.global_', '__entries': {'rotate_0': (rotation_enum.rotate_0, None), 'rotate_90': (rotation_enum.rotate_90, None), 'rotate18_0': (rotation_enum.rotate18_0, None), 'rotate27_0': (rotation_enum.rotate27_0, None)}, '__repr__': <instancemethod  at 0x7f2f00e262b0>, 'name': <property object at 0x7f2f00dfa630>, '__members__': <pybind11_builtins.pybind11_static_property object at 0x7f2f00dfa450>, '__eq__': <instancemethod  at 0x7f2f00e26490>, '__ne__': <instancemethod  at 0x7f2f00e264f0>, '__getstate__': <instancemethod  at 0x7f2f00e26550>, '__hash__': <instancemethod  at 0x7f2f00e26550>, '__int__': <instancemethod __int__ at 0x7f2f00e26610>, '__index__': <instancemethod __index__ at 0x7f2f00e26670>, '__setstate__': <instancemethod  at 0x7f2f00e266d0>, 'rotate_0': rotation_enum.rotate_0, 'rotate_90': rotation_enum.rotate_90, 'rotate18_0': rotation_enum.rotate18_0, 'rotate27_0': rotation_enum.rotate27_0})

The position of the underscore is very inconsistent and a likely cause of errors for users who don't look closely (especially since there is no note in the documentation concerning the Enum's actual attributes). I think most people would expect all attributes to start with rotate_, rather than weirdly having the underscore inside the rotation number for 180 and 270 degrees.

I think the Enum should be restructured to this:

poppler.Rotation.rotate_0
poppler.Rotation.rotate_90
poppler.Rotation.rotate_180
poppler.Rotation.rotate_270
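Until the names are changed, user code could guard against the inconsistency with a small lookup helper. The sketch below is only an illustration: `rotation_for` is a hypothetical helper, not part of python-poppler, and it assumes the enum exposes either the current names listed above or the proposed `rotate_<degrees>` names.

```python
def rotation_for(rotation_enum, degrees: int):
    """Return the enum member for `degrees`, whichever naming is in use.

    `rotation_enum` is any object exposing the rotation members as
    attributes (for example poppler.Rotation).
    """
    if degrees not in (0, 90, 180, 270):
        raise ValueError(f"unsupported rotation: {degrees}")

    # Proposed, consistent name first; then the current name with the
    # misplaced underscore for 180 and 270 degrees.
    consistent = f"rotate_{degrees}"
    legacy = {180: "rotate18_0", 270: "rotate27_0"}.get(degrees, consistent)

    for name in (consistent, legacy):
        if hasattr(rotation_enum, name):
            return getattr(rotation_enum, name)
    raise AttributeError(f"no rotation member found for {degrees} degrees")
```

Code written as `rotation_for(poppler.Rotation, 180)` would then keep working whether or not the enum is renamed.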

Executing tests

Hi,

Do you have an idea how to package pybind11_tests, which your tests require?

Pete
