cbrunet / python-poppler
Python binding to Poppler-cpp pdf library
License: GNU General Public License v2.0
Ok, so I'm putzing about trying to get this to build on Windows.
[...] find_library() call.
The *.pc files [...] prefix=D:/bld/poppler_1595515154908/_h_env/Library. I replaced them with relative paths and that seems to have worked: prefix=../../_h_env/Library
With PKG_CONFIG_PATH pointing to the lib\pkgconfig subdirectory of wherever you unzipped the poppler library, python setup.py bdist will configure successfully, and then try to build.
I'm now at the point where I'm hitting compiler differences:
C:\code\python-poppler\src\cpp\image.cpp(47,27): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'pybind11::buffer_info' [C:\code\python-poppler\build\temp.win-amd64-3.8\Release\image.vcxproj]
C:\code\python-poppler\src\cpp\image.cpp(54,5): message : No constructor could take the source type, or constructor overload resolution was ambiguous [C:\code\python-poppler\build\temp.win-amd64-3.8\Release\image.vcxproj]
C:\code\python-poppler\src\cpp\image.cpp(47,16): error C2064: term does not evaluate to a function taking 6 arguments [C:\code\python-poppler\build\temp.win-amd64-3.8\Release\image.vcxproj]
So it looks like it's not hugely difficult to get this to do things on Windows.
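The prefix rewrite described above could be scripted roughly as follows. This is only a sketch under assumptions: `relativize_pc_prefix` is a hypothetical helper (not part of python-poppler), and the default relative prefix is simply the one quoted in the post.

```python
from pathlib import Path


def relativize_pc_prefix(pkgconfig_dir, new_prefix="../../_h_env/Library"):
    """Rewrite the `prefix=` line of every .pc file in pkgconfig_dir.

    Hypothetical helper: new_prefix defaults to the relative path quoted in
    the post and has to match wherever the poppler library was unzipped.
    Returns the names of the files that were changed.
    """
    changed = []
    for pc_file in sorted(Path(pkgconfig_dir).glob("*.pc")):
        lines = pc_file.read_text(encoding="utf-8").splitlines(keepends=True)
        # Only the line that defines the prefix variable is replaced.
        rewritten = [
            f"prefix={new_prefix}\n" if line.startswith("prefix=") else line
            for line in lines
        ]
        if rewritten != lines:
            pc_file.write_text("".join(rewritten), encoding="utf-8")
            changed.append(pc_file.name)
    return changed
```

Calling it on the lib\pkgconfig directory before setting PKG_CONFIG_PATH would replace the hand edit; other variables in the .pc files are left untouched.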
Hello there!
You've made a lovely wrapper!
Thank you!
I'm trying to parse a PDF that contains Chinese characters.
The text is extracted okay, but when I try to access fonts, I get the following error:
>>> box.get_font_name() # Assume the box is extracted from some page, this box contains Chinese characters
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<PATH>/lib/python3.7/site-packages/poppler/utilities.py", line 90, in wrapped
return fct(*args, **kwargs)
File "<PATH>/lib/python3.7/site-packages/poppler/page.py", line 64, in get_font_name
return self._text_box.get_font_name(i)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 7: invalid start byte
Trying to iterate fonts through the document itself results in the same error.
Environment:
Python 3.7.4
Poppler 21.12.0 (Compiled from source).
Happens on both Mac and Ubuntu.
I have seen other poppler bindings, such as this one, that handle those errors (by using the replace keyword when decoding the string), but unfortunately it uses deprecated internal APIs and cannot be used with a newer version of poppler (even when trying to build from source).
If there were a way to supply the required encoding, or even to suppress or ignore those errors, it would be very beneficial.
I have seen a comment on another ticket that says we can request to expose the encoding/decoding in the cpp backend.
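To make the request concrete, here is what the replace error handler does in plain Python. The byte string is made up for illustration (an ASCII prefix followed by the byte 0xba at position 7, as in the traceback); python-poppler does not currently hand out the raw bytes, so this shows the decoding behaviour only, not a workaround.

```python
# Made-up font name bytes: 0xba at position 7 is not a valid UTF-8 start byte.
raw_name = b"ABCDEF+\xbaXYZ"

# Strict decoding (the default) raises, as in the traceback above.
try:
    raw_name.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# errors="replace" substitutes U+FFFD for the undecodable byte instead.
lenient_name = raw_name.decode("utf-8", errors="replace")
```

Here `lenient_name` keeps the readable part of the name and marks the bad byte, which is the behaviour the other binding is described as relying on.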
Hi,
nice project, BTW. While packaging for openSUSE, I came across a build issue:
[ 15s] Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.RKb8e4
[ 15s] + umask 022
[ 15s] + cd /home/abuild/rpmbuild/BUILD
[ 15s] + /usr/bin/rm -rf /home/abuild/rpmbuild/BUILDROOT/python-poppler-0.2.1-0.i386
[ 15s] ++ dirname /home/abuild/rpmbuild/BUILDROOT/python-poppler-0.2.1-0.i386
[ 15s] + /usr/bin/mkdir -p /home/abuild/rpmbuild/BUILDROOT
[ 15s] + /usr/bin/mkdir /home/abuild/rpmbuild/BUILDROOT/python-poppler-0.2.1-0.i386
[ 15s] + cd python-poppler-0.2.1
[ 15s] ++ '[' -f _current_flavor ']'
[ 15s] ++ true
[ 15s] + python_flavor=
[ 15s] + '[' -z '' ']'
[ 15s] + python_flavor=tmp
[ 15s] + '[' tmp '!=' python3 ']'
[ 15s] + '[' -d build ']'
[ 15s] + '[' -d _build.python3 ']'
[ 15s] + echo python3
[ 15s] + /usr/bin/python3 setup.py build '--executable=/usr/bin/python3 -s'
[ 15s] running build
[ 15s] running build_py
[ 15s] creating build
[ 15s] creating build/lib.linux-i686-3.8
[ 15s] creating build/lib.linux-i686-3.8/poppler
[ 15s] copying src/poppler/__init__.py -> build/lib.linux-i686-3.8/poppler
[ 15s] copying src/poppler/_version.py -> build/lib.linux-i686-3.8/poppler
[ 15s] copying src/poppler/destination.py -> build/lib.linux-i686-3.8/poppler
[ 15s] copying src/poppler/document.py -> build/lib.linux-i686-3.8/poppler
[ 15s] copying src/poppler/embeddedfile.py -> build/lib.linux-i686-3.8/poppler
[ 15s] copying src/poppler/font.py -> build/lib.linux-i686-3.8/poppler
[ 15s] copying src/poppler/image.py -> build/lib.linux-i686-3.8/poppler
[ 15s] copying src/poppler/page.py -> build/lib.linux-i686-3.8/poppler
[ 15s] copying src/poppler/pagerenderer.py -> build/lib.linux-i686-3.8/poppler
[ 15s] copying src/poppler/pagetransition.py -> build/lib.linux-i686-3.8/poppler
[ 15s] copying src/poppler/rectangle.py -> build/lib.linux-i686-3.8/poppler
[ 15s] copying src/poppler/toc.py -> build/lib.linux-i686-3.8/poppler
[ 15s] copying src/poppler/utilities.py -> build/lib.linux-i686-3.8/poppler
[ 15s] creating build/lib.linux-i686-3.8/poppler/cpp
[ 15s] copying src/poppler/cpp/__init__.py -> build/lib.linux-i686-3.8/poppler/cpp
[ 15s] running egg_info
[ 15s] writing src/python_poppler.egg-info/PKG-INFO
[ 15s] writing dependency_links to src/python_poppler.egg-info/dependency_links.txt
[ 15s] writing top-level names to src/python_poppler.egg-info/top_level.txt
[ 15s] reading manifest file 'src/python_poppler.egg-info/SOURCES.txt'
[ 15s] reading manifest template 'MANIFEST.in'
[ 15s] writing manifest file 'src/python_poppler.egg-info/SOURCES.txt'
[ 15s] running build_ext
[ 15s] -- The C compiler identification is GNU 10.2.1
[ 15s] -- The CXX compiler identification is GNU 10.2.1
[ 15s] -- Detecting C compiler ABI info
[ 15s] -- Detecting C compiler ABI info - done
[ 15s] -- Check for working C compiler: /usr/bin/cc - skipped
[ 15s] -- Detecting C compile features
[ 15s] -- Detecting C compile features - done
[ 15s] -- Detecting CXX compiler ABI info
[ 16s] -- Detecting CXX compiler ABI info - done
[ 16s] -- Check for working CXX compiler: /usr/bin/c++ - skipped
[ 16s] -- Detecting CXX compile features
[ 16s] -- Detecting CXX compile features - done
[ 16s] -- Found PythonInterp: /usr/bin/python3 (found version "3.8.5")
[ 16s] -- Found PythonLibs: /usr/lib/libpython3.8.so
[ 16s] -- Performing Test HAS_CPP14_FLAG
[ 16s] -- Performing Test HAS_CPP14_FLAG - Success
[ 16s] -- pybind11 v2.5.0
[ 16s] -- Found PkgConfig: /usr/bin/pkg-config (found version "1.7.3")
[ 16s] -- Checking for module 'poppler-cpp>=0.62.0'
[ 16s] -- Found poppler-cpp, version 0.90.0
[ 16s] -- Configuring done
[ 16s] -- Generating done
[ 16s] -- Build files have been written to: /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/build/temp.linux-i686-3.8
[ 16s] Scanning dependencies of target global_
[ 16s] Scanning dependencies of target version
[ 16s] [ 8%] Building CXX object CMakeFiles/version.dir/src/cpp/version.cpp.o
[ 16s] [ 8%] Building CXX object CMakeFiles/global_.dir/src/cpp/global.cpp.o
[ 19s] [ 12%] Linking CXX shared module ../lib.linux-i686-3.8/poppler/cpp/version.cpython-38-i386-linux-gnu.so
[ 19s] [ 12%] Built target version
[ 19s] Scanning dependencies of target image
[ 19s] [ 16%] Building CXX object CMakeFiles/image.dir/src/cpp/image.cpp.o
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp: In function ‘pybind11::buffer_info poppler::image_buffer_info(poppler::image&)’:
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:54:5: error: no matching function for call to ‘pybind11::buffer_info::buffer_info(void*, long int, std::string, long int, <brace-encl’
[ 19s] 54 | );
[ 19s] | ^
[ 19s] In file included from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pytypes.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/cast.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/attr.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pybind11.h:44,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:20:
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:89:5: note: candidate: ‘pybind11::buffer_info::buffer_info(pybind11::buffer_info::private_ctr_tag, void*, pybin’
[ 19s] 89 | buffer_info(private_ctr_tag, void *ptr, ssize_t itemsize, const std::string &format, ssize_t ndim,
[ 19s] | ^~~~~~~~~~~
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:89:5: note: candidate expects 8 arguments, 6 provided
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:64:5: note: candidate: ‘pybind11::buffer_info::buffer_info(pybind11::buffer_info&&)’
[ 19s] 64 | buffer_info(buffer_info &&other) {
[ 19s] | ^~~~~~~~~~~
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:64:5: note: candidate expects 1 argument, 6 provided
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:54:14: note: candidate: ‘pybind11::buffer_info::buffer_info(Py_buffer*, bool)’
[ 19s] 54 | explicit buffer_info(Py_buffer *view, bool ownview = true)
[ 19s] | ^~~~~~~~~~~
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:54:14: note: candidate expects 2 arguments, 6 provided
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:51:5: note: candidate: ‘template<class T> pybind11::buffer_info::buffer_info(const T*, pybind11::ssize_t, bool)’
[ 19s] 51 | buffer_info(const T *ptr, ssize_t size, bool readonly=true)
[ 19s] | ^~~~~~~~~~~
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:51:5: note: template argument deduction/substitution failed:
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:54:5: note: candidate expects 3 arguments, 6 provided
[ 19s] 54 | );
[ 19s] | ^
[ 19s] In file included from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pytypes.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/cast.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/attr.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pybind11.h:44,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:20:
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:47:5: note: candidate: ‘template<class T> pybind11::buffer_info::buffer_info(T*, pybind11::ssize_t, bool)’
[ 19s] 47 | buffer_info(T *ptr, ssize_t size, bool readonly=false)
[ 19s] | ^~~~~~~~~~~
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:47:5: note: template argument deduction/substitution failed:
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:54:5: note: candidate expects 3 arguments, 6 provided
[ 19s] 54 | );
[ 19s] | ^
[ 19s] In file included from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pytypes.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/cast.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/attr.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pybind11.h:44,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:20:
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:43:5: note: candidate: ‘pybind11::buffer_info::buffer_info(void*, pybind11::ssize_t, const string&, pybind11::s’
[ 19s] 43 | buffer_info(void *ptr, ssize_t itemsize, const std::string &format, ssize_t size, bool readonly=false)
[ 19s] | ^~~~~~~~~~~
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:43:5: note: candidate expects 5 arguments, 6 provided
[ 19s] In file included from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pytypes.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/cast.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/attr.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pybind11.h:44,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:20:
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:40:5: note: candidate: ‘template<class T> pybind11::buffer_info::buffer_info(T*, pybind11::detail::any_containe’
[ 19s] 40 | buffer_info(T *ptr, detail::any_container<ssize_t> shape_in, detail::any_container<ssize_t> strides_in, bool readonly=false)
[ 19s] | ^~~~~~~~~~~
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:40:5: note: template argument deduction/substitution failed:
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:54:5: note: candidate expects 4 arguments, 6 provided
[ 19s] 54 | );
[ 19s] | ^
[ 19s] In file included from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pytypes.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/cast.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/attr.h:13,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/pybind11.h:44,
[ 19s] from /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/src/cpp/image.cpp:20:
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:29:5: note: candidate: ‘pybind11::buffer_info::buffer_info(void*, pybind11::ssize_t, const string&, pybind11::s’
[ 19s] 29 | buffer_info(void *ptr, ssize_t itemsize, const std::string &format, ssize_t ndim,
[ 19s] | ^~~~~~~~~~~
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:30:89: note: no known conversion for argument 6 from ‘<brace-enclosed initializer list>’ to ‘pybind11::detail’
[ 19s] 30 | detail::any_container<ssize_t> shape_in, detail::any_container<ssize_t> strides_in, bool readonly=false)
[ 19s] | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:27:5: note: candidate: ‘pybind11::buffer_info::buffer_info()’
[ 19s] 27 | buffer_info() { }
[ 19s] | ^~~~~~~~~~~
[ 19s] /home/abuild/rpmbuild/BUILD/python-poppler-0.2.1/pybind11/include/pybind11/buffer_info.h:27:5: note: candidate expects 0 arguments, 6 provided
[ 20s] gmake[2]: *** [CMakeFiles/image.dir/build.make:82: CMakeFiles/image.dir/src/cpp/image.cpp.o] Error 1
[ 20s] gmake[1]: *** [CMakeFiles/Makefile2:191: CMakeFiles/image.dir/all] Error 2
[ 20s] gmake[1]: *** Waiting for unfinished jobs....
[ 21s] [ 20%] Linking CXX shared module ../lib.linux-i686-3.8/poppler/cpp/global_.cpython-38-i386-linux-gnu.so
[ 21s] [ 20%] Built target global_
[ 21s] gmake: *** [Makefile:103: all] Error 2
[ 21s] Traceback (most recent call last):
[ 21s] File "setup.py", line 76, in <module>
[ 21s] setup(
[ 21s] File "/usr/lib/python3.8/site-packages/setuptools/__init__.py", line 162, in setup
[ 21s] return distutils.core.setup(**attrs)
[ 21s] File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
[ 21s] dist.run_commands()
[ 21s] File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
[ 21s] self.run_command(cmd)
[ 21s] File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
[ 21s] cmd_obj.run()
[ 21s] File "/usr/lib/python3.8/distutils/command/build.py", line 135, in run
[ 21s] self.run_command(cmd_name)
[ 21s] File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
[ 21s] self.distribution.run_command(command)
[ 21s] File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
[ 21s] cmd_obj.run()
[ 21s] File "setup.py", line 39, in run
[ 21s] self.build_extension(ext)
[ 21s] File "setup.py", line 71, in build_extension
[ 21s] subprocess.check_call(
[ 21s] File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
[ 21s] raise CalledProcessError(retcode, cmd)
[ 21s] subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'Release', '--', '-j2']' returned non-zero exit status 2.
[ 21s] error: Bad exit status from /var/tmp/rpm-tmp.RKb8e4 (%build)
Obviously, the 32-bit signature of pybind11::buffer_info::buffer_info differs somehow.
Full build is available here.
Sorry, I'm in a hurry at the moment, therefore no PR for now.
Hi everyone,
I am trying to install poppler on Google Colaboratory. I am getting the following error:
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-trbzri3i/python-poppler/setup.py'"'"'; file='"'"'/tmp/pip-install-trbzri3i/python-poppler/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-20uytqm_/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.
Any idea how to solve this, or how to install poppler on Google Colab?
I have tried to install using the git option too, but I am still not able to install. I get the following error while trying to install using git:
-- Configuring incomplete, errors occurred!
See also "/content/python-poppler/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log".
Traceback (most recent call last):
File "setup.py", line 106, in <module>
zip_safe=False,
File "/usr/local/lib/python3.7/dist-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.7/dist-packages/setuptools/command/install.py", line 67, in run
self.do_egg_install()
File "/usr/local/lib/python3.7/dist-packages/setuptools/command/install.py", line 109, in do_egg_install
self.run_command('bdist_egg')
File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.7/dist-packages/setuptools/command/bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/usr/local/lib/python3.7/dist-packages/setuptools/command/bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.7/dist-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/usr/lib/python3.7/distutils/command/install_lib.py", line 109, in build
self.run_command('build_ext')
File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "setup.py", line 39, in run
self.build_extension(ext)
File "setup.py", line 69, in build_extension
["cmake", ext.sourcedir] + cmake_args, cwd=self.build_temp, env=env
File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/content/python-poppler', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/content/python-poppler/build/lib.linux-x86_64-3.7/poppler/cpp', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.
Thank you in advance.
This segfaults:
from poppler import load_from_data, SearchDirection, CaseSensitivity, load_from_file
# https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf
page = load_from_file("dummy.pdf").create_page(0)
page_rect = page.page_rect()
dummy_rect = page.search("Dummy", page_rect, SearchDirection.from_top, CaseSensitivity.case_sensitive)
While this works:
from poppler import load_from_data, SearchDirection, CaseSensitivity, load_from_file
# https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf
doc = load_from_file("dummy.pdf")
page = doc.create_page(0)
page_rect = page.page_rect()
dummy_rect = page.search("Dummy", page_rect, SearchDirection.from_top, CaseSensitivity.case_sensitive)
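The only difference between the two snippets is whether the document stays referenced while the page is used. One plausible reading, not confirmed in the report, is that the page does not keep its document alive. The pure-Python model below illustrates that object-lifetime pattern; `Document` and `Page` here are hypothetical stand-ins, not the poppler classes.

```python
import gc
import weakref


class Document:
    """Stand-in for an object that owns native resources."""

    def create_page(self, index):
        return Page(self, index)


class Page:
    """Stand-in for a page that does NOT hold a strong reference to its document."""

    def __init__(self, document, index):
        self._document_ref = weakref.ref(document)
        self.index = index

    def document_alive(self):
        return self._document_ref() is not None


# Chained call: the temporary Document has no other owner once the call returns.
page_a = Document().create_page(0)
gc.collect()
chained_alive = page_a.document_alive()

# Named document: the variable keeps it alive for as long as it is in scope.
doc = Document()
page_b = doc.create_page(0)
gc.collect()
kept_alive = page_b.document_alive()
```

In this model `chained_alive` ends up false and `kept_alive` true. If the binding behaves the same way, keeping `doc` in a variable for as long as `page` is in use is the safe pattern.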
Hey, I'm struggling with the following problem. I have to build poppler behind a proxy, i.e. I have no direct internet access.
Yet the install requires one to download pybind11 from GitHub. Is there a way to change the pybind11 dependency somehow, e.g. by using a custom source_url in the wrap file or something like this?
The Meson build system
Version: 1.2.3
Source dir: /tmp/pip-install-7b10_g5_/python-poppler_76bcf6dbd94d427f98b8759336351404
Build dir: /tmp/pip-install-7b10_g5_/python-poppler_76bcf6dbd94d427f98b8759336351404/.mesonpy-xuv01fdp
Build type: native build
Project name: python-poppler
Project version: 0.4.1
C++ compiler for the host machine: c++ (gcc 8.5.0 "c++ (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18)")
C++ linker for the host machine: c++ ld.bfd 2.30-119
Host machine cpu family: x86_64
Host machine cpu: x86_64
Found pkg-config: /usr/bin/pkg-config (1.4.2)
Run-time dependency poppler-cpp found: YES 20.11.0
Program python3 found: YES (/usr/bin/python3.11)
Downloading pybind11 source from https://github.com/pybind/pybind11/archive/refs/tags/v2.10.3.tar.gz
<urlopen error [Errno -2] Name or service not known>
WARNING: failed to download with error: could not get https://github.com/pybind/pybind11/archive/refs/tags/v2.10.3.tar.gz is the internet available?. Trying after a delay...
<urlopen error [Errno -2] Name or service not known>
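One possible approach to the wrap-file question, sketched under assumptions: Meson looks in subprojects/packagecache for the file named by source_filename before it tries to download, so a tarball fetched on another machine can be placed there in an unpacked source tree. The helper name `seed_package_cache` and the location subprojects/pybind11.wrap are assumptions about the project layout, not something stated in the report.

```python
import configparser
import hashlib
import shutil
from pathlib import Path


def seed_package_cache(source_dir, wrap_name, tarball):
    """Copy a pre-downloaded tarball into subprojects/packagecache.

    Hypothetical helper. It reads source_filename and source_hash from
    subprojects/<wrap_name>.wrap and refuses a tarball whose SHA-256
    does not match, since Meson would reject it as well.
    """
    source_dir = Path(source_dir)
    wrap = configparser.ConfigParser(interpolation=None)
    wrap.read(source_dir / "subprojects" / f"{wrap_name}.wrap")
    section = wrap["wrap-file"]

    actual_hash = hashlib.sha256(Path(tarball).read_bytes()).hexdigest()
    if actual_hash != section["source_hash"]:
        raise ValueError(f"hash mismatch for {tarball}: {actual_hash}")

    cache_dir = source_dir / "subprojects" / "packagecache"
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / section["source_filename"]
    shutil.copyfile(tarball, target)
    return target
```

This only helps when building from a local checkout or unpacked sdist (for example `pip install .` in that directory), because pip otherwise unpacks a fresh copy into a temporary directory.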
Hi,
I've tried this nice tool on my Windows laptop, with no problem during installation.
I want to scale up, so I tried to install it on my GCP VM (PyTorch:1.7/CUDA11.0.GPU), and the installation crashes.
You can find the error message below. Many thanks for your help.
Collecting python-poppler
Using cached python-poppler-0.2.2.tar.gz (595 kB)
Building wheels for collected packages: python-poppler
Building wheel for python-poppler (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /opt/conda/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ar576gab/python-poppler_3308e3b17b604d49bb817e3fd73eeffe/setup.py'"'"'; file='"'"'/tmp/pip-install-ar576gab/python-poppler_3308e3b17b604d49bb817e3fd73eeffe/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-2ut924lj
cwd: /tmp/pip-install-ar576gab/python-poppler_3308e3b17b604d49bb817e3fd73eeffe/
Complete output (91 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/poppler
copying src/poppler/pagerenderer.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/image.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/font.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/__init__.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/document.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/rectangle.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/_version.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/utilities.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/embeddedfile.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/destination.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/pagetransition.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/page.py -> build/lib.linux-x86_64-3.7/poppler
copying src/poppler/toc.py -> build/lib.linux-x86_64-3.7/poppler
creating build/lib.linux-x86_64-3.7/poppler/cpp
copying src/poppler/cpp/__init__.py -> build/lib.linux-x86_64-3.7/poppler/cpp
running egg_info
writing src/python_poppler.egg-info/PKG-INFO
writing dependency_links to src/python_poppler.egg-info/dependency_links.txt
writing top-level names to src/python_poppler.egg-info/top_level.txt
reading manifest file 'src/python_poppler.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'src/python_poppler.egg-info/SOURCES.txt'
running build_ext
-- The C compiler identification is GNU 8.3.0
-- The CXX compiler identification is GNU 8.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PythonInterp: /opt/conda/bin/python3.7 (found version "3.7.10")
-- Found PythonLibs: /opt/conda/lib/libpython3.7m.so
-- Performing Test HAS_CPP14_FLAG
-- Performing Test HAS_CPP14_FLAG - Success
-- pybind11 v2.5.0
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29")
-- Checking for module 'poppler-cpp>=0.26.0'
-- No package 'poppler-cpp' found
CMake Error at /usr/share/cmake-3.13/Modules/FindPkgConfig.cmake:452 (message):
A required package was not found
Call Stack (most recent call first):
/usr/share/cmake-3.13/Modules/FindPkgConfig.cmake:622 (_pkg_check_modules_internal)
CMakeLists.txt:14 (pkg_check_modules)
-- Configuring incomplete, errors occurred!
See also "/tmp/pip-install-ar576gab/python-poppler_3308e3b17b604d49bb817e3fd73eeffe/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log".
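The decisive line in this log is "No package 'poppler-cpp' found": the build asks pkg-config for the poppler-cpp development files and does not get them. A small check like the one below can confirm that before retrying pip. It is a sketch; `poppler_cpp_version` is a hypothetical helper name, while `pkg-config --modversion` is the standard way to query a module's version.

```python
import shutil
import subprocess


def poppler_cpp_version():
    """Return the poppler-cpp version reported by pkg-config, or None.

    None means either pkg-config itself is missing or it cannot find a
    poppler-cpp.pc file on its search path (see PKG_CONFIG_PATH).
    """
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(
        ["pkg-config", "--modversion", "poppler-cpp"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = poppler_cpp_version()
    print(version if version else "poppler-cpp development files not found")
```

If this prints "not found", installing the distribution's poppler-cpp development package (libpoppler-cpp-dev on Debian and Ubuntu) is the usual next step.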
Could be related to https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/894
To be investigated.
pip install python_poppler==0.4.0 on Python 3.10 (Ubuntu 22.04) using meson 1.1.0 fails with:
#30 28.35 + meson install --no-rebuild --destdir /tmp/pip-install-d59qdq37/python-poppler_8100653bafcf4616b4f77887e540edc4/.mesonpy-zrqhtx8x/install
#30 28.35
#30 28.35 meson-python: error: The poppler package is split between purelib and platlib: 'purelib/poppler/__init__.py' and 'platlib/poppler/cpp/global_.cpython-310-x86_64-linux-gnu.so', a "pure: false" argument may be missing in meson.build
#30 28.35 [end of output]
-> "pure: false" argument may be missing in meson.build
Caused by the meson-python 0.13.0 release:
- Raise an error when a package is split between platlib and purelib.
More info: #74
I'd guess that we need to pass pure: false in the meson.build file here (similar to test code of meson, release notes, docs):
python3 = python_mod.find_installation('python3', pure: false)
Maybe this is more of a general question. The same would also apply to libmagic-dev or tesseract-ocr vs. pytesseract. I hope to find some answer here.
Hi Charles,
I found 0.2.2 builds failing for some time in our distributions.
It turned out to require a patch similar to:
Index: b/tests/test_image.py
===================================================================
--- a/tests/test_image.py
+++ b/tests/test_image.py
@@ -40,8 +40,8 @@ def test_data_size(pdf_page):
def test_image_format_to_str():
- assert str(Image.Format.argb32) == "BGRA"
- assert str(Image.Format.invalid) == ""
+ assert str(Image.Format.argb32) in ("BGRA", "format_enum.argb32")
+ assert str(Image.Format.invalid) in ("", "format_enum.invalid")
def test_image_memory_view(pdf_page):
It looks to be related to the pybind11 version, as we use the system-provided one.
Is this intended? In other words, do you want me to prepare a PR with such a change, or do you plan to add some code to keep the old API?
If you want to check out my builds, look here.
The openSUSE_Leap_15.2 build is using an older pybind11, while the other builds use the current release.
I'm getting this error when importing the poppler module. Any help is greatly appreciated.
If the document is broken, Poppler spews error messages to stderr, which is messy (especially if I am processing multiple documents in parallel).
It would be great to be able to silence error messages (by setting Poppler's globalParams), or even better to provide a Python callback for reporting errors.
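Until something like that exists in the binding, a generic workaround for noisy native libraries is to point file descriptor 2 at the null device for the duration of a call. The sketch below is not a python-poppler feature, and `suppress_native_stderr` is a hypothetical name. It affects the whole process, so with threads it would also hide unrelated stderr output; separate worker processes avoid that.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def suppress_native_stderr():
    """Temporarily redirect OS-level stderr (fd 2) to the null device.

    Works on messages written by C/C++ code, which bypass sys.stderr.
    Process-wide and therefore not thread-safe.
    """
    sys.stderr.flush()
    saved_fd = os.dup(2)  # remember where stderr pointed
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)
        yield
    finally:
        os.dup2(saved_fd, 2)  # restore the original stderr
        os.close(devnull_fd)
        os.close(saved_fd)
```

A call that triggers the messages would then be wrapped in `with suppress_native_stderr(): ...`. The messages are discarded rather than captured, so this does not replace the requested callback.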
Hello everyone,
I'm Arany. I am new to coding. I use Python. I'm here to ask about an issue with python-poppler. Let me explain the situation first.
So, I was trying to install python-poppler through pip and I have faced 3 issues so far and the first two are resolved. Let me list all of them below:
The third issue is where I need help. I installed Poppler, added the location into PATH and then python-poppler says it can't find it. It isn't a compatibility issue either! If this helps, to verify everything was done correctly, MS Copilot gave me the code 'poppler-cpp --version' and Command Prompt said it doesn't find any internal or external command, operable program or batch file named 'poppler-cpp'. I remembered that Copilot gave me the command 'pdftoppm -v' previously. 'pdftoppm' being a part of poppler, I used it as a substitute and it worked!
Please guide me through what I need to do to install python-poppler successfully.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\adity\AppData\Local\Temp\pip-install-ii9h7dq\python-poppler_c004275ea29e4cdbbb1a83cfed9b71c9\setup.py", line 76, in <module>
setup(
File "c:\program files\python39\lib\site-packages\setuptools\__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "c:\program files\python39\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "c:\program files\python39\lib\distutils\dist.py", line 966, in run_commands
self.run_command(cmd)
File "c:\program files\python39\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "c:\program files\python39\lib\site-packages\setuptools\command\install.py", line 61, in run
return orig.install.run(self)
File "c:\program files\python39\lib\distutils\command\install.py", line 546, in run
self.run_command('build')
File "c:\program files\python39\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "c:\program files\python39\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "c:\program files\python39\lib\distutils\command\build.py", line 135, in run
self.run_command(cmd_name)
File "c:\program files\python39\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "c:\program files\python39\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "C:\Users\adity\AppData\Local\Temp\pip-install-ii9h7dq\python-poppler_c004275ea29e4cdbbb1a83cfed9b71c9\setup.py", line 24, in run
out = subprocess.check_output(["cmake", "--version"])
File "c:\program files\python39\lib\subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "C:\Users\adity\AppData\Roaming\Python\Python39\site-packages\run\__init__.py", line 145, in __new__
process = cls.create_process(command, stdin, cwd=cwd, env=env, shell=shell)
File "C:\Users\adity\AppData\Roaming\Python\Python39\site-packages\run\__init__.py", line 121, in create_process
shlex.split(command),
File "c:\program files\python39\lib\shlex.py", line 315, in split
return list(lex)
File "c:\program files\python39\lib\shlex.py", line 300, in __next__
token = self.get_token()
File "c:\program files\python39\lib\shlex.py", line 109, in get_token
raw = self.read_token()
File "c:\program files\python39\lib\shlex.py", line 140, in read_token
nextchar = self.instream.read(1)
AttributeError: 'list' object has no attribute 'read'
I have a strange work environment: an Ubuntu Xenial VM, with libpoppler85
(via the cran/poppler PPA backport), python3.7.7 (via deadsnakes PPA), and a virtualenv for python modules:
bnewbold@bnewbold-dev$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
bnewbold@bnewbold-dev$ cmake --version
cmake version 3.17.3
CMake suite maintained and supported by Kitware (kitware.com/cmake).
bnewbold@bnewbold-dev$ python --version
Python 3.7.7
bnewbold@bnewbold-dev$ head -n2 /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
This means I have a compatible version of libpoppler-cpp-dev, but a version of GCC which does not support C++17.
When building this wrapper package, I get the following build error both with pip install python-poppler (within virtualenv) and with python setup.py install (in git checkout, inside virtualenv):
[pipenv.exceptions.InstallError]: [ 62%] Linking CXX shared module ../lib.linux-x86_64-3.7/poppler/cpp/global_.cpython-37m-x86_64-linux-gnu.so
[pipenv.exceptions.InstallError]: /tmp/pip-install-b9bntv0h/python-poppler/src/cpp/version.cpp:22:18: error: expected ‘{’ before ‘::’ token
[pipenv.exceptions.InstallError]: namespace poppler::version
[pipenv.exceptions.InstallError]: ^
[pipenv.exceptions.InstallError]: /tmp/pip-install-b9bntv0h/python-poppler/src/cpp/version.cpp:22:20: error: ‘version’ in namespace ‘::’ does not name a type
[pipenv.exceptions.InstallError]: namespace poppler::version
[pipenv.exceptions.InstallError]: ^
[pipenv.exceptions.InstallError]: /tmp/pip-install-b9bntv0h/python-poppler/src/cpp/version.cpp:33:1: error: expected ‘}’ at end of input
[pipenv.exceptions.InstallError]: } // namespace poppler::version
[pipenv.exceptions.InstallError]: ^
[pipenv.exceptions.InstallError]: CMakeFiles/version.dir/build.make:62: recipe for target 'CMakeFiles/version.dir/src/cpp/version.cpp.o' failed
My experience with C++ is pretty dated, but I believe this is because of using nested namespace syntax:
namespace poppler::version
{
// [...]
}
which is a C++17 feature (?). The build works with the more verbose syntax:
namespace poppler
{
    namespace version
    {
        // [...]
    } // namespace version
} // namespace poppler
I know this is an old and strange environment to support, but it seems like the fix is very simple. Alternatively, if C++17 syntax is intended to be required, I think CMake can be informed of that dependency and give a cleaner error message.
Thank you for maintaining this wrapper package! As some context of how I am hoping to use this, I work at the Internet Archive and am looking to use poppler (from python) to extract metadata, text, and thumbnails for hundreds of millions of PDFs that we have crawled from the web. We are in the process of upgrading away from Xenial but it will take many months to complete the transition.
Hello
I am writing an application which requires getting the PNG bytestring of an Image directly rather than saving it to a file.
I think it would be nice for the Image class to have a method for this.
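Until such a method exists, one possible stopgap is to round-trip through a temporary file. This is only a sketch: it assumes the image object has a save(path, format) method like the image.save('test.png', 'png', dpi=300) call shown elsewhere on this page, and that save returns False on failure. The helper name image_to_png_bytes is made up for illustration and is not part of python-poppler.

```python
import os
import tempfile


def image_to_png_bytes(image) -> bytes:
    """Return PNG-encoded bytes for any object with a save(path, format) method.

    Hypothetical helper: it still writes a file, but hides it from the caller.
    """
    fd, path = tempfile.mkstemp(suffix=".png")
    os.close(fd)  # the image object opens the path itself
    try:
        # Assumption: save() returns False when encoding fails.
        if image.save(path, "png") is False:
            raise RuntimeError("image.save() reported a failure")
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.remove(path)  # always clean up the temporary file
```

The cost is one extra disk write per image, so a native in-memory method would still be preferable for large batches.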
Hi, thank you for this amazing work. Recently I was working with some PDFs, and poppler worked great for most of them, but for some of them I am seeing the following error:
Segmentation fault (core dumped)
Because this is a memory error, I can't wrap the call in a try/except, so my workers restart again and again only to get stuck at the same place. This has been a major problem for me.
To give you some context, here is the debugging I have done so far:
page.text_list(page.TextListOption.text_list_include_font)
pdf_document.create_font_iterator(): this also works, but when getting the font information at the text_box level I face this error
boxes = self._page.text_list(opt_flag) in page.py: the code is stopped with the error
The metadata for the PDFs that I see such errors with is mostly (not always):
{'Producer': 'macOS Version 11.2.3 (Build 20D91) Quartz PDFContext', 'Creator': 'Pages'}
The code to repro the error:-
from poppler import load_from_file
file_path = "sample_pdf.pdf"
pdf_document = load_from_file(file_path)
no_of_pages = pdf_document.pages
for page_ind in range(no_of_pages):
    page = pdf_document.create_page(page_ind)
    text_list = page.text_list(page.TextListOption.text_list_include_font)
The link to the pdf:- https://drive.google.com/file/d/180CDGyiJRfytvuzVsAiYKppHvaBABGkJ/view?usp=sharing
Please request access to the PDF, as I can't share it publicly. (Really sorry for this, but I hope you understand.)
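Since a segmentation fault kills the interpreter and cannot be caught with try/except, one possible workaround is to run the risky extraction in a child process and inspect its exit code. The sketch below is a generic isolation pattern, not a fix for the crash itself. The names run_isolated and EXTRACT_SNIPPET are made up for illustration, and the snippet only reuses the calls from the repro above.

```python
import subprocess
import sys

# Code executed in the child interpreter; it mirrors the repro above.
EXTRACT_SNIPPET = """
import sys
from poppler import load_from_file
doc = load_from_file(sys.argv[1])
for index in range(doc.pages):
    page = doc.create_page(index)
    boxes = page.text_list(page.TextListOption.text_list_include_font)
    print(index, len(boxes))
"""


def run_isolated(code, *args, timeout=60.0):
    """Run a Python snippet in a child interpreter.

    Returns (ok, returncode, stdout). A crash in native code only ends the
    child: on POSIX a signal shows up as a negative return code.
    subprocess.TimeoutExpired propagates if the child hangs.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.returncode, proc.stdout
```

A worker could then call run_isolated(EXTRACT_SNIPPET, "sample_pdf.pdf") and skip or quarantine the file when ok is False, instead of restarting on the same document.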
https://github.com/cbrunet/python-poppler/blob/master/src/cpp/page.cpp#L101
py::class_<page>(m, "page")
# ...
.def("search", &search, py::arg("text"), py::arg("r"), py::arg("direction"), py::arg("case_sensitivity"), py::arg("rotatin") = rotation_enum::rotate_0)
py::arg("rotatin") should likely be py::arg("rotation").
Details of the issue are mentioned here: python-poetry/poetry#7653
I want to use the EmbeddedFile class. However, I get a segmentation fault.
from poppler import load_from_file
pdf_document = load_from_file("Portfolio.pdf")
for file in pdf_document.embedded_files():
    print(file.name)
What kind of information would help further? I saw in the TODO file that EmbeddedFile is still not tested. Does it work yet?
Hi,
I'm trying to get font information out of pdf document:
document = load_from_file(file_path)
page = document.create_page(0)
box = page.text_list()[0]
print(box.text) # prints the text as expected
print(box.get_font_name()) # prints "*ignored*"
print(box.get_font_size()) # prints "-1"
Using poppler version 0.90 on macOS.
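One possible explanation, offered as an assumption rather than a confirmed diagnosis: the placeholder values appear because font information was not requested when the text list was built. Another report on this page calls page.text_list(page.TextListOption.text_list_include_font), and the sketch below uses the same call. The helper name boxes_with_fonts is made up for illustration.

```python
def boxes_with_fonts(page):
    """Return (text, font_name, font_size) for every text box on a page.

    Assumes the page object exposes TextListOption.text_list_include_font
    and that text_list() accepts it, as in the other report on this page.
    """
    include_font = page.TextListOption.text_list_include_font
    return [
        (box.text, box.get_font_name(), box.get_font_size())
        for box in page.text_list(include_font)
    ]
```

With a real document this would be called as boxes_with_fonts(document.create_page(0)).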
Hey
I am trying to install python-poppler on my Ubuntu machine. It installs successfully, but when I try to import it, I see this error:
>>> import poppler
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ec2-user/poppler/build/py-virt3/lib/python3.6/site-packages/poppler/__init__.py", line 22, in <module>
from poppler.document import load, load_from_file, load_from_data
File "/home/ec2-user/poppler/build/py-virt3/lib/python3.6/site-packages/poppler/document.py", line 21, in <module>
from poppler.destination import Destination
File "/home/ec2-user/poppler/build/py-virt3/lib/python3.6/site-packages/poppler/destination.py", line 21, in <module>
from poppler.cpp.destination import type_enum as DestinationType # noqa
ImportError: cannot import name 'type_enum'
I have installed poppler from source using instructions mentioned here. Any help would be appreciated.
Thanks.
When I try to load a document which is not a PDF, load_from_file succeeds, but all subsequent methods crash. The reason is that self._document becomes None and most subsequent methods are decorated with @ensure_unlocked, leading to a call of self._document.is_locked.
I am willing to fix it, but which solution do you prefer? We can add an is_broken method, which can be used after loading the document to check if it was loaded successfully. Or we can make loading throw an exception upon failure.
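The second option (raising on failure) could look roughly like the wrapper below. It is a sketch written outside the library: PdfLoadError and checked_load are hypothetical names, and the check relies on the private _document attribute described above, which may change between versions.

```python
class PdfLoadError(ValueError):
    """Raised when a file could not be loaded as a PDF (hypothetical name)."""


_MISSING = object()


def checked_load(loader, path):
    """Call loader(path) and raise instead of returning a broken document.

    loader is any callable such as load_from_file. A result counts as
    broken when it is None or when its private _document attribute is None.
    """
    document = loader(path)
    if document is None or getattr(document, "_document", _MISSING) is None:
        raise PdfLoadError(f"{path!r} could not be loaded as a PDF")
    return document
```

Usage would be checked_load(load_from_file, "maybe.pdf"), so callers get an exception at load time rather than an AttributeError on the first method call.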
Firstly, thanks for your contribution thus far.
I've been using Poppler for a while now and it is not clear how to sort the boxes that we receive.
I noticed in Poppler there is a "PopplerStructureElement" https://poppler.freedesktop.org/api/glib/PopplerStructureElement.html, which, I believe, allows us to understand which box comes after which.
Is there a way to replicate this? Or is there an alternative way to know the order for certain?
Thanks in advance
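If the structure tree is not available, a common fallback is a purely geometric sort: group boxes into lines by their vertical position, then order each line left to right. This is a heuristic sketch, not a replacement for PopplerStructureElement, and it will misorder multi-column layouts. It assumes each box has a bbox with x and y attributes and that y grows downward from the top of the page; sort_reading_order and line_tolerance are made-up names.

```python
def sort_reading_order(boxes, line_tolerance=2.0):
    """Order boxes top-to-bottom, then left-to-right within each line.

    Boxes whose bbox.y is within line_tolerance of the first box of the
    current line are treated as belonging to that line.
    """
    ordered = sorted(boxes, key=lambda box: (box.bbox.y, box.bbox.x))
    lines = []
    current = []
    current_y = None
    for box in ordered:
        if current_y is None or abs(box.bbox.y - current_y) <= line_tolerance:
            if current_y is None:
                current_y = box.bbox.y  # first box defines the line position
            current.append(box)
        else:
            lines.append(current)
            current = [box]
            current_y = box.bbox.y
    if current:
        lines.append(current)
    # Flatten, sorting each line by horizontal position.
    return [box for line in lines for box in sorted(line, key=lambda b: b.bbox.x)]
```

The tolerance is in page units and usually needs tuning per document, since baselines of differently sized fonts rarely share the same y value.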
I'm running on an Intel Mac.
The output is:
Defaulting to user installation because normal site-packages is not writeable
Collecting python-poppler==0.3.0
Using cached python-poppler-0.3.0.tar.gz (823 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: python-poppler
Building wheel for python-poppler (pyproject.toml) ... error
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python3.6 /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/tmpgpdvlixh
cwd: /private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-install-oc4qvs72/python-poppler_d930c1d82b0346c29cf6165469df5d7f
Complete output (85 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.6
creating build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/toc.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/pagetransition.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/destination.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/_version.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/pagerenderer.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/__init__.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/page.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/font.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/document.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/rectangle.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/embeddedfile.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/utilities.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
copying src/poppler/image.py -> build/lib.macosx-10.9-x86_64-3.6/poppler
creating build/lib.macosx-10.9-x86_64-3.6/poppler/cpp
copying src/poppler/cpp/__init__.py -> build/lib.macosx-10.9-x86_64-3.6/poppler/cpp
running egg_info
writing src/python_poppler.egg-info/PKG-INFO
writing dependency_links to src/python_poppler.egg-info/dependency_links.txt
writing top-level names to src/python_poppler.egg-info/top_level.txt
reading manifest file 'src/python_poppler.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE.txt'
writing manifest file 'src/python_poppler.egg-info/SOURCES.txt'
running build_ext
Traceback (most recent call last):
File "setup.py", line 24, in run
out = subprocess.check_output(["cmake", "--version"])
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 336, in check_output
**kwargs).stdout
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 403, in run
with Popen(*popenargs, **kwargs) as process:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'cmake': 'cmake'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
main()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 262, in build_wheel
metadata_directory)
File "/private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-build-env-_l32gwp0/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 231, in build_wheel
wheel_directory, config_settings)
File "/private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-build-env-_l32gwp0/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 215, in _build_with_temp_dir
self.run_setup()
File "/private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-build-env-_l32gwp0/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 268, in run_setup
self).run_setup(setup_script=setup_script)
File "/private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-build-env-_l32gwp0/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 158, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 108, in <module>
zip_safe=False,
File "/private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-build-env-_l32gwp0/overlay/lib/python3.6/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/private/var/folders/7z/f5yrvy6d489ctt6jnncd9yn40000gn/T/pip-build-env-_l32gwp0/overlay/lib/python3.6/site-packages/wheel/bdist_wheel.py", line 299, in run
self.run_command('build')
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "setup.py", line 28, in run
+ ", ".join(e.name for e in self.extensions)
RuntimeError: CMake must be installed to build the following extensions: poppler.cpp.modules
ERROR: Failed building wheel for python-poppler
Failed to build python-poppler
ERROR: Could not build wheels for python-poppler, which is required to install pyproject.toml-based projects
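The RuntimeError near the end of this log says that cmake was not found on PATH when setup.py ran cmake --version. A quick pre-flight check before pip install could look like the sketch below. The tool list is an assumption based on the build logs on this page (cmake and pkg-config both appear in them), and missing_build_tools is a made-up helper name.

```python
import shutil


def missing_build_tools(tools=("cmake", "pkg-config")):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Install these before building python-poppler:", ", ".join(missing))
    else:
        print("All build tools found.")
```

An empty list only shows the executables exist; the poppler-cpp development files are a separate requirement that this check does not cover.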
This is just a doc comment: your note on https://cbrunet.github.io/python-poppler/ that the python-poppler poppler-glib binding is outdated is quite misleading:
python-poppler
Binding based on poppler-glib. Latest version is from 2009…
This module has not been updated because it is now an integral part of poppler-glib. It works fine with python3, and it's probably the most complete binding. The thing which is actually quite lagging is poppler-cpp (no access to annotations for example, which was my use case, but I guess that there are other things missing).
Because your project comes very early in search results for "python poppler", and because the existence of the poppler-glib binding is not easily discoverable, it would be really helpful if you could update this part of your documentation.
The workflow of one of my scripts relies on calling subprocess.run for each PDF page to get its respective pdffonts output. Those calls are expensive, so I'm looking for better ways to identify likely problematic fonts from font name, encoding, "embeddedness" and Unicode conversion.
Example of pdffonts output:
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
DIIEDG+Arial                         CID TrueType      Identity-H       yes yes no   51834  0
DIIEDH+Arial                         CID TrueType      Identity-H       yes yes no   51831  0
DIIDPF+ArialMT                       CID TrueType      Identity-H       yes yes yes  51824  0
DIIEBG+TimesNewRomanPSMT             CID TrueType      Identity-H       yes yes yes  51821  0
[none]                               Type 3            Custom           yes no  no   51861  0
Arial                                TrueType          WinAnsi          yes no  no   67975  0
Is it possible to extract that information with just the poppler backend used here? I took a look at the source code and have not found anything like that.
Thanks in advance,
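Whatever the binding ends up exposing, the pdffonts table itself is easy to turn into records, which makes it cheaper to run the tool once over a page range and filter afterwards. The parser below is a sketch that assumes the fixed-width layout shown above, with column boundaries taken from the dashed rule. It will mis-split rows where a value overflows its column, and parse_pdffonts is a made-up name.

```python
import re


def parse_pdffonts(output):
    """Parse pdffonts text output into a list of dicts keyed by column name.

    Column spans are read from the dashed rule under the header; the last
    column extends to the end of the line.
    """
    lines = output.splitlines()
    rule_index = next(
        (i for i, line in enumerate(lines) if i > 0 and line.startswith("---")),
        None,
    )
    if rule_index is None:
        return []
    spans = [match.span() for match in re.finditer(r"-+", lines[rule_index])]
    header = lines[rule_index - 1]
    names = [header[start:end].strip() for start, end in spans]
    rows = []
    for line in lines[rule_index + 1:]:
        if not line.strip():
            continue
        cells = []
        for position, (start, end) in enumerate(spans):
            stop = None if position == len(spans) - 1 else end
            cells.append(line[start:stop].strip())
        rows.append(dict(zip(names, cells)))
    return rows
```

Likely problematic fonts could then be selected with something like [row for row in rows if row["uni"] == "no" or row["emb"] == "no"], depending on what counts as problematic for the workflow.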
I'm trying to extract both the text and images in a PDF, ideally without recompressing or converting the image formats if possible.
This is pretty trivial to do with pdfminer or PyPDF2, but they're cripplingly slow for extracting text (a PDF which poppler/python-poppler processes in ~1-2 seconds takes 200+ seconds). On the other hand, there doesn't seem to be any way to get images through python-poppler.
Now, poppler provides a pdfimages utility that uses poppler to extract images, but it looks like it's pretty annoying to do internally.
AFAICT, basically the way pdfimages extracts images is to provide a custom page rendering device (ImageOutputDev), which rather than doing actual compositing when rendering a page just ignores all draw commands other than drawImage_xxx() calls, and instead just saves the data passed to the draw image calls.
I think it'd be pretty easy to tweak ImageOutputDev.cc to instead write images to memory, and then provide a python call that returns the images as bytes, but I don't have a great idea how to start integrating this with the wrapper bits.
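As a stopgap until something like that exists in the wrapper, the pdfimages utility can be driven from Python and its output read back as bytes. This is a sketch that shells out rather than using python-poppler: it assumes pdfimages is on PATH and relies on its -all option, which writes images in their native format where possible. The names pdfimages_command and extract_images are made up.

```python
import subprocess
import tempfile
from pathlib import Path


def pdfimages_command(pdf_path, image_root):
    """Build the pdfimages command line; -all keeps native image formats."""
    return ["pdfimages", "-all", str(pdf_path), str(image_root)]


def extract_images(pdf_path):
    """Return {file_name: image_bytes} for every image pdfimages extracts.

    Files are written to a temporary directory that is removed afterwards.
    Raises subprocess.CalledProcessError if pdfimages fails.
    """
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(pdfimages_command(pdf_path, Path(tmp) / "img"), check=True)
        return {path.name: path.read_bytes() for path in sorted(Path(tmp).iterdir())}
```

This costs one process per document instead of one per page, and the file extensions in the returned names indicate which format each image was stored in.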
After applying distribution updates to poppler on KDE Neon User, I got AttributeError exceptions when importing python-poppler, caused by the following code passages:
python-poppler/src/poppler/page.py
Lines 27 to 28 in 33d36b1
python-poppler/src/poppler/page.py
Lines 77 to 78 in 33d36b1
22.04.0.
I can't seem to be able to install python-poppler on my system (MacOS X Catalina).
I have Xcode installed with the commandline utilities (I can compile with clang).
I run brew install poppler with no issue, the files are correctly found at /usr/local/opt/poppler.
My CFLAGS and CPPFLAGS environment variables both include -I/usr/local/opt/poppler/include.
I manage python2/3 via pyenv. pyenv version gives me 3.8.2 as expected.
When I run pip install python-poppler the installation fails.
Examining the log it seems that the compiler fails to find the header files:
running build_ext
-- The C compiler identification is AppleClang 11.0.3.11030032
-- The CXX compiler identification is AppleClang 11.0.3.11030032
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/local/bin/pkg-config (found version "0.29.2")
-- Checking for module 'poppler-cpp>=0.62.0'
-- Found poppler-cpp, version 0.89.0
-- Found PythonInterp: /Users/bordaigorl/.pyenv/versions/3.8.2/bin/python3.8 (found version "3.8.2")
-- Found PythonLibs: /Users/bordaigorl/.pyenv/versions/3.8.2/lib/libpython3.8.a
-- Performing Test HAS_CPP14_FLAG
-- Performing Test HAS_CPP14_FLAG - Success
-- pybind11 v2.5.0
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /private/var/folders/jx/r7g7g23n5rb_fdycrtdl5sdw0000gn/T/pip-install-j_u40vtr/python-poppler/build/temp.macosx-10.15-x86_64-3.8
Scanning dependencies of target font
Scanning dependencies of target page_renderer
[ 8%] Building CXX object CMakeFiles/page_renderer.dir/src/cpp/page_renderer.cpp.o
[ 8%] Building CXX object CMakeFiles/font.dir/src/cpp/font.cpp.o
In file included from /private/var/folders/jx/r7g7g23n5rb_fdycrtdl5sdw0000gn/T/pip-install-j_u40vtr/python-poppler/src/cpp/page_renderer.cpp:19:
/private/var/folders/jx/r7g7g23n5rb_fdycrtdl5sdw0000gn/T/pip-install-j_u40vtr/python-poppler/src/cpp/version.h:21:10: fatal error: 'poppler/cpp/poppler-version.h' file not found
#include <poppler/cpp/poppler-version.h>
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
I was very confused because clang sees the includes just fine if run manually.
I then thought that maybe it's cmake's fault and found this:
https://gitlab.kitware.com/cmake/cmake/-/issues/19120
So I symlinked
sudo ln -s /usr/local/include/poppler /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/poppler
and now pip install python-poppler works.
Is there a way to fix this without dodgy symlinks?
Hello @cbrunet . Excellent tool. This is one of the fastest ways to generate PDF images. Thanks for building this.
I have tried python-poppler to render a PDF page, following the documentation available at https://cbrunet.net/python-poppler/usage.html. I have used the following code.
import poppler
doc = poppler.load_from_file(PDF_PATH)
renderer = poppler.PageRenderer()
page = doc.create_page(54)
image = renderer.render_page(page, xres=300, yres=300)
image.save('test.png', 'png', dpi=300)
However, the resulting image is poorly rendered compared to the ones generated using pdftocairo. I am assuming that it is an issue with hinting. I am attaching sample images for your perusal. Zoom in on each of the files to see the difference. The font edges are jagged in the image generated by python-poppler.
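One thing that may be worth trying is enabling the renderer's antialiasing and hinting flags before rendering. The sketch below assumes the renderer has a set_render_hint(hint, on) method and that the hint enum has members named antialiasing, text_antialiasing and text_hinting, mirroring poppler-cpp's page_renderer. Both are assumptions to check against the installed version, and enable_smooth_text is a made-up helper name.

```python
def enable_smooth_text(renderer, hints):
    """Turn on the three render hints on a renderer and return it.

    renderer: object with set_render_hint(hint, on) (assumed API).
    hints: enum-like object with antialiasing, text_antialiasing and
    text_hinting members (assumed names).
    """
    for name in ("antialiasing", "text_antialiasing", "text_hinting"):
        renderer.set_render_hint(getattr(hints, name), True)
    return renderer
```

The renderer from the snippet above would be passed through enable_smooth_text together with the library's render-hint enum before calling render_page. Whether this matches pdftocairo's output is not guaranteed, since the two use different rendering backends.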
I tried to install on Ubuntu 22.04 and had the same problem as #28.
I installed libpoppler-cpp-dev and after that, everything worked well.
In this issue, I just want to share how I fixed #28.
Here is my console log:
(avada) agent@tm:~/Downloads/python-poppler$ git submodule update --init --recursive
(avada) agent@tm:~/Downloads/python-poppler$ python setup.py install
running install
/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
creating src/python_poppler.egg-info
writing src/python_poppler.egg-info/PKG-INFO
writing dependency_links to src/python_poppler.egg-info/dependency_links.txt
writing top-level names to src/python_poppler.egg-info/top_level.txt
writing manifest file 'src/python_poppler.egg-info/SOURCES.txt'
reading manifest file 'src/python_poppler.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE.txt'
writing manifest file 'src/python_poppler.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/poppler
copying src/poppler/document.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/utilities.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/toc.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/rectangle.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/__init__.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/page.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/pagerenderer.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/pagetransition.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/font.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/destination.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/image.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/embeddedfile.py -> build/lib.linux-x86_64-3.8/poppler
copying src/poppler/_version.py -> build/lib.linux-x86_64-3.8/poppler
creating build/lib.linux-x86_64-3.8/poppler/cpp
copying src/poppler/cpp/__init__.py -> build/lib.linux-x86_64-3.8/poppler/cpp
running build_ext
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 11.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- pybind11 v2.9.2
-- Found PythonInterp: /home/agent/anaconda3/envs/avada/bin/python (found version "3.8.13")
-- Found PythonLibs: /home/agent/anaconda3/envs/avada/lib/libpython3.8.so
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2")
-- Checking for module 'poppler-cpp>=0.26.0'
-- No package 'poppler-cpp' found
CMake Error at /usr/share/cmake-3.22/Modules/FindPkgConfig.cmake:603 (message):
A required package was not found
Call Stack (most recent call first):
/usr/share/cmake-3.22/Modules/FindPkgConfig.cmake:825 (_pkg_check_modules_internal)
CMakeLists.txt:14 (pkg_check_modules)
-- Configuring incomplete, errors occurred!
See also "/home/agent/Downloads/python-poppler/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeOutput.log".
Traceback (most recent call last):
File "setup.py", line 76, in <module>
setup(
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/__init__.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 148, in setup
return run_commands(dist)
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
dist.run_commands()
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
self.run_command(cmd)
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/dist.py", line 1214, in run_command
super().run_command(command)
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/install.py", line 74, in run
self.do_egg_install()
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/install.py", line 123, in do_egg_install
self.run_command('bdist_egg')
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/dist.py", line 1214, in run_command
super().run_command(command)
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 165, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 151, in call_command
self.run_command(cmdname)
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/dist.py", line 1214, in run_command
super().run_command(command)
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/command/install_lib.py", line 107, in build
self.run_command('build_ext')
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/dist.py", line 1214, in run_command
super().run_command(command)
File "/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "setup.py", line 39, in run
self.build_extension(ext)
File "setup.py", line 68, in build_extension
subprocess.check_call(
File "/home/agent/anaconda3/envs/avada/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/home/agent/Downloads/python-poppler', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/agent/Downloads/python-poppler/build/lib.linux-x86_64-3.8/poppler/cpp', '-DPYTHON_EXECUTABLE=/home/agent/anaconda3/envs/avada/bin/python', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.
(avada) agent@tm:~/Downloads/python-poppler$ sudo apt-get update -y
Hit:1 http://vn.archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://vn.archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:3 http://vn.archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:4 http://vn.archive.ubuntu.com/ubuntu jammy-security InRelease
Hit:5 http://packages.microsoft.com/repos/code stable InRelease
Hit:6 https://dl.google.com/linux/chrome/deb stable InRelease
Hit:7 https://packages.microsoft.com/repos/edge stable InRelease
Hit:8 https://linux.teamviewer.com/deb stable InRelease
Hit:9 https://download.sublimetext.com apt/stable/ InRelease
Hit:10 https://ppa.launchpadcontent.net/bamboo-engine/ibus-bamboo/ubuntu jammy InRelease
Hit:11 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:12 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Ign:13 https://ppa.launchpadcontent.net/numix/ppa/ubuntu jammy InRelease
Hit:14 https://ppa.launchpadcontent.net/papirus/papirus/ubuntu jammy InRelease
Err:15 https://ppa.launchpadcontent.net/numix/ppa/ubuntu jammy Release
404 Not Found [IP: 185.125.190.52 443]
Reading package lists... Done
W: https://linux.teamviewer.com/deb/dists/stable/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
W: https://download.sublimetext.com/apt/stable/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
E: The repository 'https://ppa.launchpadcontent.net/numix/ppa/ubuntu jammy Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
(avada) agent@tm:~/Downloads/python-poppler$ sudo apt-get install -y libpoppler-cpp-dev
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following package was automatically installed and is no longer required:
libmessaging-menu0
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
libpoppler-cpp-dev
0 upgraded, 1 newly installed, 0 to remove and 19 not upgraded.
Need to get 11,7 kB of archives.
After this operation, 89,1 kB of additional disk space will be used.
Get:1 http://vn.archive.ubuntu.com/ubuntu jammy/main amd64 libpoppler-cpp-dev amd64 22.02.0-2 [11,7 kB]
Fetched 11,7 kB in 1s (12,6 kB/s)
Selecting previously unselected package libpoppler-cpp-dev:amd64.
(Reading database ... 392845 files and directories currently installed.)
Preparing to unpack .../libpoppler-cpp-dev_22.02.0-2_amd64.deb ...
Unpacking libpoppler-cpp-dev:amd64 (22.02.0-2) ...
Setting up libpoppler-cpp-dev:amd64 (22.02.0-2) ...
(avada) agent@tm:~/Downloads/python-poppler$ python setup.py install
running install
/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing src/python_poppler.egg-info/PKG-INFO
writing dependency_links to src/python_poppler.egg-info/dependency_links.txt
writing top-level names to src/python_poppler.egg-info/top_level.txt
reading manifest file 'src/python_poppler.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE.txt'
writing manifest file 'src/python_poppler.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
-- pybind11 v2.9.2
-- Checking for module 'poppler-cpp>=0.26.0'
-- Found poppler-cpp, version 22.02.0
-- Configuring done
-- Generating done
-- Build files have been written to: /home/agent/Downloads/python-poppler/build/temp.linux-x86_64-3.8
[ 4%] Building CXX object CMakeFiles/global_.dir/src/cpp/global.cpp.o
[ 8%] Building CXX object CMakeFiles/version.dir/src/cpp/version.cpp.o
[ 12%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/version.cpython-38-x86_64-linux-gnu.so
[ 12%] Built target version
[ 16%] Building CXX object CMakeFiles/rectangle.dir/src/cpp/rectangle.cpp.o
[ 20%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/global_.cpython-38-x86_64-linux-gnu.so
[ 20%] Built target global_
[ 25%] Building CXX object CMakeFiles/image.dir/src/cpp/image.cpp.o
[ 29%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/rectangle.cpython-38-x86_64-linux-gnu.so
[ 29%] Built target rectangle
[ 33%] Building CXX object CMakeFiles/document.dir/src/cpp/document.cpp.o
[ 37%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/image.cpython-38-x86_64-linux-gnu.so
[ 37%] Built target image
[ 41%] Building CXX object CMakeFiles/page.dir/src/cpp/page.cpp.o
[ 45%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/document.cpython-38-x86_64-linux-gnu.so
[ 45%] Built target document
[ 50%] Building CXX object CMakeFiles/page_renderer.dir/src/cpp/page_renderer.cpp.o
[ 54%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/page.cpython-38-x86_64-linux-gnu.so
[ 54%] Built target page
[ 58%] Building CXX object CMakeFiles/page_transition.dir/src/cpp/page_transition.cpp.o
[ 62%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/page_renderer.cpython-38-x86_64-linux-gnu.so
[ 62%] Built target page_renderer
[ 66%] Building CXX object CMakeFiles/embedded_file.dir/src/cpp/embedded_file.cpp.o
[ 70%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/page_transition.cpython-38-x86_64-linux-gnu.so
[ 70%] Built target page_transition
[ 75%] Building CXX object CMakeFiles/destination.dir/src/cpp/destination.cpp.o
[ 79%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/embedded_file.cpython-38-x86_64-linux-gnu.so
[ 79%] Built target embedded_file
[ 83%] Building CXX object CMakeFiles/toc.dir/src/cpp/toc.cpp.o
[ 87%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/toc.cpython-38-x86_64-linux-gnu.so
[ 87%] Built target toc
[ 91%] Building CXX object CMakeFiles/font.dir/src/cpp/font.cpp.o
[ 95%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/destination.cpython-38-x86_64-linux-gnu.so
[ 95%] Built target destination
[100%] Linking CXX shared module ../lib.linux-x86_64-3.8/poppler/cpp/font.cpython-38-x86_64-linux-gnu.so
[100%] Built target font
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/document.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/utilities.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/toc.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/rectangle.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/__init__.py -> build/bdist.linux-x86_64/egg/poppler
creating build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/image.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/page_renderer.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/destination.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/font.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/__init__.py -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/global_.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/version.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/page.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/toc.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/document.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/rectangle.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/page_transition.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/cpp/embedded_file.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/poppler/cpp
copying build/lib.linux-x86_64-3.8/poppler/page.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/pagerenderer.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/pagetransition.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/font.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/destination.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/image.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/embeddedfile.py -> build/bdist.linux-x86_64/egg/poppler
copying build/lib.linux-x86_64-3.8/poppler/_version.py -> build/bdist.linux-x86_64/egg/poppler
byte-compiling build/bdist.linux-x86_64/egg/poppler/document.py to document.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/utilities.py to utilities.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/toc.py to toc.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/rectangle.py to rectangle.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/cpp/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/page.py to page.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/pagerenderer.py to pagerenderer.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/pagetransition.py to pagetransition.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/font.py to font.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/destination.py to destination.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/image.py to image.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/embeddedfile.py to embeddedfile.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/poppler/_version.py to _version.cpython-38.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying src/python_poppler.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying src/python_poppler.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying src/python_poppler.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying src/python_poppler.egg-info/not-zip-safe -> build/bdist.linux-x86_64/egg/EGG-INFO
copying src/python_poppler.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
creating dist
creating 'dist/python_poppler-0.3.0-py3.8-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing python_poppler-0.3.0-py3.8-linux-x86_64.egg
creating /home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/python_poppler-0.3.0-py3.8-linux-x86_64.egg
Extracting python_poppler-0.3.0-py3.8-linux-x86_64.egg to /home/agent/anaconda3/envs/avada/lib/python3.8/site-packages
Adding python-poppler 0.3.0 to easy-install.pth file
Installed /home/agent/anaconda3/envs/avada/lib/python3.8/site-packages/python_poppler-0.3.0-py3.8-linux-x86_64.egg
Processing dependencies for python-poppler==0.3.0
Finished processing dependencies for python-poppler==0.3.0
(avada) agent@tm:~/Downloads/python-poppler$
When I try to install poppler in Anaconda or CMD:
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org python-poppler
An error occurred. The log is attached:
pytest.1.txt
Please advise how to fix it.
For each line of text like
the words are where
can I get token locations like
'the ' -> (0, 4)
'words' -> (4, 9)
' are ' -> (9, 14)
'where' -> (14, 19)
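A minimal sketch of the kind of offsets being asked about, in plain Python and independent of python-poppler. It assumes the line is already available as a string and that tokens are whitespace-delimited words; `token_spans` is a hypothetical helper name, and the spans are character offsets within the string, not page coordinates. Unlike the example above, the surrounding spaces are not included in each token.

```python
import re
from typing import List, Tuple


def token_spans(line: str) -> List[Tuple[str, int, int]]:
    """Return (token, start, end) for each whitespace-delimited token.

    `start` and `end` are character offsets into `line`, with `end`
    exclusive, so line[start:end] == token.
    """
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", line)]


for token, start, end in token_spans("the words are where"):
    print(f"{token!r} -> ({start}, {end})")
```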
Hello.
I've written a small guide here on my website, and I would be happy if it is of some use to others. Maybe it could be included in the docs, README.md, or somewhere similar?
While doing tests with encrypted PDFs, I noticed that python-poppler strictly requires both the user and the owner password to be given for decrypting a file.
This behaviour is different from all other libraries I tried (pikepdf¹, pymupdf², redstork³, pypdfium-reboot⁴). These only have one password argument and then automatically detect whether it is the owner or user password.
The current behaviour does not make sense: there is no need for a user password when the owner password is given, since the owner already has all permissions. And if someone only has the user password, they have no chance of reading the file with python-poppler, because the owner password is a strictly required argument (opening fails when it is set to "" or None).
So this should be changed so that only one of the user_password and owner_password arguments is sufficient, or (even better) both could be merged into one password argument, with auto-detection of which one it is.
References:
¹ https://pikepdf.readthedocs.io/en/latest/api/main.html#pikepdf.Pdf.open
² https://pymupdf.readthedocs.io/en/latest/document.html#Document.authenticate
³ https://red-stork.readthedocs.io/en/latest/reference.html#redstork.Document
⁴ https://developers.foxit.com/resources/pdf-sdk/c_api_reference_pdfium/group___f_p_d_f_i_u_m.html#gaf783381b0fe5d3f579e9443b3877a7b1
I have attached the file that I used for testing: the owner password is test_owner and the user password is test_user.
encrypted.pdf
While using PyLance for local development, I've encountered an inconsistency in the Python bindings related to the handling of Rectangle objects (rect and rectf). Specifically, the constructor for a Rectangle can accept either four integers or four floats, creating a rect or rectf respectively. However, this distinction becomes unclear when using certain functions like Page.text() that specifically require a rectf for the bounding box.
Currently, the bindings do not clearly differentiate between a rect and a rectf from the perspective of a developer working with Rectangles. This leads to confusion, especially since there's no visible difference in the Python code.
Additionally, there's an issue with PyLance when attempting to construct a Rectangle using float values. PyLance reports an error because the default arguments for the constructor are integers, causing type conflicts.
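To illustrate how the int/float ambiguity could be made visible to a type checker, here is a minimal sketch using `typing.overload`. Everything in it is hypothetical: `RectInt`, `RectFloat`, and `make_rectangle` are illustrative stand-ins, not the names or the implementation used by python-poppler.

```python
from typing import overload


class RectInt:
    """Hypothetical stand-in for an integer rectangle (rect)."""

    def __init__(self, x: int, y: int, w: int, h: int) -> None:
        self.x, self.y, self.w, self.h = x, y, w, h


class RectFloat:
    """Hypothetical stand-in for a floating-point rectangle (rectf)."""

    def __init__(self, x: float, y: float, w: float, h: float) -> None:
        self.x, self.y, self.w, self.h = x, y, w, h


# The int overload comes first: type checkers pick the first match,
# and an int is also acceptable wherever a float is expected.
@overload
def make_rectangle(x: int, y: int, w: int, h: int) -> RectInt: ...
@overload
def make_rectangle(x: float, y: float, w: float, h: float) -> RectFloat: ...


def make_rectangle(x, y, w, h):
    """Return RectInt when all values are ints, otherwise RectFloat."""
    if all(isinstance(v, int) and not isinstance(v, bool) for v in (x, y, w, h)):
        return RectInt(x, y, w, h)
    return RectFloat(float(x), float(y), float(w), float(h))


int_rect = make_rectangle(0, 0, 10, 20)
float_rect = make_rectangle(0.0, 0.0, 10.5, 20.0)
```

With overloads like these, a checker such as PyLance can infer which rectangle type a call produces, so passing the integer variant where the float variant is required would be flagged before runtime.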
Proposed Solutions:
To address these issues, I suggest one of the following approaches:
Document and Distinguish rect and rectf Types:
rect and rectf.
(x: float, y: float, w: float, h: float) and (x: int, y: int, w: int, h: int). This provides explicit constructors for each type and makes it clear to the developer which type they are working with.
Transparent Handling of Type Differences:
int and float types seamlessly. This approach would abstract the complexity from the developer, allowing for more flexible and intuitive usage of the API.
It fails like in https://stackoverflow.com/questions/45047508/error-unknown-type-name-constexpr-during-make-in-mac-os-x:
FAILED: src/cpp/document.cpython-310-darwin.so.p/document.cpp.o
c++ -Isrc/cpp/document.cpython-310-darwin.so.p -Isrc/cpp -I../../src/cpp -I../../subprojects/pybind11-2.10.3/include -I/opt/homebrew/Cellar/poppler/23.04.0/include/poppler/cpp -I/opt/homebrew/Cellar/poppler/23.04.0/include/poppler -I/opt/homebrew/opt/python@3.10/Frameworks/Python.framework/Versions/3.10/include/python3.10 -fvisibility=hidden -fvisibility-inlines-hidden -fcolor-diagnostics -DNDEBUG -Wall -Winvalid-pch -O3 -MD -MQ src/cpp/document.cpython-310-darwin.so.p/document.cpp.o -MF src/cpp/document.cpython-310-darwin.so.p/document.cpp.o.d -o src/cpp/document.cpython-310-darwin.so.p/document.cpp.o -c ../../src/cpp/document.cpp
../../subprojects/pybind11-2.10.3/include/pybind11/detail/common.h:547:15: error: unknown type name 'constexpr'
inline static constexpr size_t size_in_ptrs(size_t s) {
To fix it, add default_options : ['c_std=c11', 'cpp_std=c++11'] to project() in meson.build:
project(
'python-poppler',
'cpp',
version: '0.4.0',
license: 'GNU General Public License v2 (GPLv2)',
# license_files: 'LICENSE.txt',
meson_version: '>=1.0.0',
default_options : ['c_std=c11', 'cpp_std=c++11']
)
Then installation works:
$ pip3 install meson meson-python
$ pip3 install python_poppler
Successfully installed python-poppler-0.4.0
As of the time of writing, the Rotation Enum of python-poppler looks like this:
poppler.Rotation.rotate_0
poppler.Rotation.rotate_90
poppler.Rotation.rotate18_0
poppler.Rotation.rotate27_0
which can be confirmed with
>>> import poppler
>>> vars(poppler.Rotation)
mappingproxy({'__init__': <instancemethod __init__ at 0x7f2f00e265b0>, '__doc__': <pybind11_builtins.pybind11_static_property object at 0x7f2f00dfa4f0>, '__module__': 'poppler.cpp.global_', '__entries': {'rotate_0': (rotation_enum.rotate_0, None), 'rotate_90': (rotation_enum.rotate_90, None), 'rotate18_0': (rotation_enum.rotate18_0, None), 'rotate27_0': (rotation_enum.rotate27_0, None)}, '__repr__': <instancemethod at 0x7f2f00e262b0>, 'name': <property object at 0x7f2f00dfa630>, '__members__': <pybind11_builtins.pybind11_static_property object at 0x7f2f00dfa450>, '__eq__': <instancemethod at 0x7f2f00e26490>, '__ne__': <instancemethod at 0x7f2f00e264f0>, '__getstate__': <instancemethod at 0x7f2f00e26550>, '__hash__': <instancemethod at 0x7f2f00e26550>, '__int__': <instancemethod __int__ at 0x7f2f00e26610>, '__index__': <instancemethod __index__ at 0x7f2f00e26670>, '__setstate__': <instancemethod at 0x7f2f00e266d0>, 'rotate_0': rotation_enum.rotate_0, 'rotate_90': rotation_enum.rotate_90, 'rotate18_0': rotation_enum.rotate18_0, 'rotate27_0': rotation_enum.rotate27_0})
The position of the underscore is inconsistent and a likely cause of errors for users who don't look closely (especially since the documentation has no note on the Enum's actual attributes). I think most people would expect all attributes to start with rotate_, rather than having the underscore inside the rotation number for 180 and 270 degrees.
I think the Enum should be restructured to this:
poppler.Rotation.rotate_0
poppler.Rotation.rotate_90
poppler.Rotation.rotate_180
poppler.Rotation.rotate_270
Hi,
Do you have an idea how to package pybind11_tests, which your tests require?
Pete