Comments (6)
Update: fix developed.
from pymupdf.
This post cannot be accepted without a reproducing file.
As a workaround for urgent situations, please use the argument fallback=True.
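A minimal sketch of the suggested workaround, assuming the installed release accepts the fallback keyword on doc.subset_fonts. The wrapper name subset_fonts_with_fallback and the file names in the usage comment are placeholders for illustration. A later comment in this thread reports that the fallback raises the same error for the attached file, so this may not help in every case.

```python
def subset_fonts_with_fallback(doc, use_fallback=True):
    """Subset fonts in place, optionally via the fallback implementation."""
    # `fallback` is the keyword named in the comment above; it is assumed
    # to be accepted by doc.subset_fonts in the installed release.
    doc.subset_fonts(fallback=use_fallback)
    return doc

# Typical use (requires PyMuPDF; file names are placeholders):
#   import fitz  # or `import pymupdf` in recent releases
#   doc = fitz.open("input.pdf")
#   subset_fonts_with_fallback(doc).save("output.pdf")
```

The wrapper only forwards the flag, so any exception raised by subset_fonts still propagates to the caller.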
Running doc.subset_fonts on the attached file (1 - Copy.pdf) raises an error in an earlier version.
With fallback, doc.subset_fonts raises the same error.
With the new version (without fallback), no error is raised, but the file written by doc.save after doc.subset_fonts has scrambled words.
I can reproduce the previous comment:
In [2]: fitz.version
Out[2]: ('1.23.3', '1.23.2', '20230831000001')
In [3]: d = fitz.open("1.-.Copy.pdf")
In [4]: d.subset_fonts()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[4], line 1
----> 1 d.subset_fonts()
File /usr/lib64/python3.12/site-packages/fitz/utils.py:5448, in subset_fonts(doc, verbose)
5445 # walk through the original font xrefs and replace each by the subset def
5446 for font_xref in xref_set:
5447 # we need the original '/W' and '/DW' width values
-> 5448 width_table, def_width = get_old_widths(font_xref)
5449 # ... and replace original font definition at xref with it
5450 doc.update_object(font_xref, font_str)
File /usr/lib64/python3.12/site-packages/fitz/utils.py:5175, in subset_fonts.<locals>.get_old_widths(xref)
5173 if df[0] != "array": # only handle xref specifications
5174 return None, None
-> 5175 df_xref = int(df[1][1:-1].replace("0 R", ""))
5176 widths = doc.xref_get_key(df_xref, "W")
5177 if widths[0] != "array": # no widths key found
ValueError: invalid literal for int() with base 10: '<</BaseFont/CIDFont+F1/CIDSystemInfo<</Ordering 13 /Registry 14 /Supplement 0>>/CIDToGIDMap/Identity/FontDescriptor<</Ascent 952/CapHeight 631/Descent -268/Flags 6/FontBBox 15 /FontFile2 16 /FontName
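The traceback suggests that get_old_widths expects the DescendantFonts value to be an array holding an indirect reference such as "[17 0 R]", whereas this file appears to store the descendant font as an inline dictionary. Below is a minimal sketch of a more defensive parse. The helper name descendant_xref is hypothetical, the sample strings are made up for illustration, and this is not the fix developed by the MuPDF team.

```python
import re

# Matches an array holding exactly one indirect reference, e.g. "[17 0 R]".
_SINGLE_REF = re.compile(r"^\[\s*(\d+)\s+0\s+R\s*\]$")

def descendant_xref(value):
    """Return the xref number in a '[N 0 R]' string, or None otherwise.

    Hypothetical helper: `value` stands for the second element of the tuple
    returned by doc.xref_get_key(xref, "DescendantFonts").
    """
    match = _SINGLE_REF.match(value.strip())
    return int(match.group(1)) if match else None

print(descendant_xref("[17 0 R]"))                  # 17
print(descendant_xref("[<</BaseFont/CIDFont+F1>>]"))  # None
```

Returning None for the inline-dictionary case lets a caller skip the width lookup instead of failing inside int().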
But with 1.24.3, I get no error, and upon save I see scrambled words.
The MuPDF team has developed a fix which I am currently testing.
I have a possibly-related issue where 1.24.3 leaves some misc chars on the page, which go away if I stop using subset_fonts. Haven't narrowed it down to a MWE yet, but one difference is I DO NOT get an error with older pymupdf: so it might not be quite the same issue... More to follow.
Downstream issue: https://gitlab.com/plom/plom/-/issues/3374