Comments (6)

JorjMcKie commented on June 10, 2024

Update: fix developed.

JorjMcKie commented on June 10, 2024

This post cannot be accepted without a reproducing file.
To circumvent an urgent situation, please use argument fallback=True.
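A minimal sketch of the suggested workaround follows. It assumes a PyMuPDF release whose Document.subset_fonts accepts the fallback argument; older releases such as 1.23.3 only take verbose (their signature appears in the traceback further down). The helper subset_fonts_with_fallback is hypothetical, not part of PyMuPDF. It requests the older implementation and, if the argument is unknown, calls subset_fonts() without it.

```python
def subset_fonts_with_fallback(doc):
    """Subset the fonts of an open PyMuPDF Document, preferring the older implementation.

    'doc' is assumed to be a PyMuPDF Document (e.g. the result of fitz.open()).
    Returns "fallback" if fallback=True was accepted, "default" otherwise.
    """
    try:
        # Releases that offer the new subsetting code accept fallback=True
        # to select the previous implementation instead.
        doc.subset_fonts(fallback=True)
        return "fallback"
    except TypeError:
        # Older releases reject the unknown keyword; their only
        # implementation is the one fallback=True would select.
        doc.subset_fonts()
        return "default"
```

With a real file this would be used as doc = fitz.open("input.pdf"), then subset_fonts_with_fallback(doc), then doc.save("output.pdf"). Note that catching TypeError would also swallow a TypeError raised inside subset_fonts itself, so this is only suitable as a stopgap.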

ragebear00 commented on June 10, 2024

Running doc.subset_fonts() on the attached file (1 - Copy.pdf) raises an error in an earlier version.

With fallback=True, doc.subset_fonts() raises the same error.

With the new version (without fallback), no error is raised, but the file written by doc.save() after doc.subset_fonts() has scrambled words.
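The reproduction can be automated by rendering each page before and after subsetting and comparing the pixels. This is a minimal sketch. It assumes PyMuPDF is installed and that a correctly subset file should render identically to the original. The helpers render_pages, changed_pages and check_subsetting are hypothetical, not PyMuPDF API.

```python
def render_pages(path, dpi=72):
    """Render each page of a PDF to raw pixel bytes (requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported here so changed_pages() works without it

    with fitz.open(path) as doc:
        return [page.get_pixmap(dpi=dpi).samples for page in doc]


def changed_pages(before, after):
    """Return the 0-based indices of pages whose rendered pixels differ."""
    if len(before) != len(after):
        raise ValueError("page counts differ")
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


def check_subsetting(path_in, path_out, fallback=False):
    """Subset fonts, save to path_out, and list pages that render differently."""
    import fitz  # PyMuPDF

    with fitz.open(path_in) as doc:
        # 'fallback' is only accepted by releases that offer the older implementation
        doc.subset_fonts(fallback=fallback)
        doc.save(path_out)
    return changed_pages(render_pages(path_in), render_pages(path_out))
```

Pixels are compared rather than extracted text because the reported symptom is visual. A call such as check_subsetting("1 - Copy.pdf", "subset.pdf") returning a non-empty list would flag the affected pages.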

cbm755 commented on June 10, 2024

I can reproduce the behavior described in the previous comment:

In [2]: fitz.version
Out[2]: ('1.23.3', '1.23.2', '20230831000001')

In [3]: d = fitz.open("1.-.Copy.pdf")

In [4]: d.subset_fonts()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 d.subset_fonts()

File /usr/lib64/python3.12/site-packages/fitz/utils.py:5448, in subset_fonts(doc, verbose)
   5445 # walk through the original font xrefs and replace each by the subset def
   5446 for font_xref in xref_set:
   5447     # we need the original '/W' and '/DW' width values
-> 5448     width_table, def_width = get_old_widths(font_xref)
   5449     # ... and replace original font definition at xref with it
   5450     doc.update_object(font_xref, font_str)

File /usr/lib64/python3.12/site-packages/fitz/utils.py:5175, in subset_fonts.<locals>.get_old_widths(xref)
   5173 if df[0] != "array":  # only handle xref specifications
   5174     return None, None
-> 5175 df_xref = int(df[1][1:-1].replace("0 R", ""))
   5176 widths = doc.xref_get_key(df_xref, "W")
   5177 if widths[0] != "array":  # no widths key found

ValueError: invalid literal for int() with base 10: '<</BaseFont/CIDFont+F1/CIDSystemInfo<</Ordering 13 /Registry 14 /Supplement 0>>/CIDToGIDMap/Identity/FontDescriptor<</Ascent 952/CapHeight 631/Descent -268/Flags 6/FontBBox 15 /FontFile2 16 /FontName
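The traceback suggests that get_old_widths assumes the /DescendantFonts array holds an indirect reference such as "[12 0 R]". In this file it holds an inline dictionary, so int() receives the dictionary text. Below is a minimal sketch of a more tolerant check. The helper descendant_font_xref is hypothetical, not part of PyMuPDF. It takes the string value that xref_get_key() returns (df[1] in the traceback) and returns None for anything other than a plain reference, so a caller could handle that case instead of crashing.

```python
import re

# A one-element array holding an indirect reference, e.g. "[12 0 R]"
_INDIRECT_REF = re.compile(r"\[\s*(\d+)\s+\d+\s+R\s*\]")


def descendant_font_xref(value):
    """Return the xref of a /DescendantFonts entry, or None if it is not a plain reference.

    'value' is the string part of what xref_get_key() returns for the key,
    e.g. "[12 0 R]" or an inline dictionary such as "[<</BaseFont/...>>]".
    """
    match = _INDIRECT_REF.fullmatch(value.strip())
    if match is None:
        # Inline dictionary (as in the failing file) or something unexpected:
        # let the caller decide instead of failing inside int().
        return None
    return int(match.group(1))
```

In the context of the traceback this would replace the int(df[1][1:-1].replace("0 R", "")) expression. How the inline case should then be handled (for example, reading /W from the inline dictionary) is left open here.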

But with 1.24.3, I get no error, and upon saving I see scrambled words:
[Screenshot of the scrambled words in the saved file]

JorjMcKie commented on June 10, 2024

The MuPDF team has developed a fix which I am currently testing.

cbm755 commented on June 10, 2024

I have a possibly related issue where 1.24.3 leaves some stray characters on the page; they go away if I stop using subset_fonts. I haven't narrowed it down to a minimal working example (MWE) yet, but one difference is that I DO NOT get an error with older PyMuPDF versions, so it might not be quite the same issue... More to follow.

Downstream issue: https://gitlab.com/plom/plom/-/issues/3374
