Comments (4)

kellemNegasi commented on July 22, 2024

Hi @ryankilroy, this issue is fixed in the new release (v3.60.0), which can be found at https://github.com/unidoc/unipdf/releases/tag/v3.60.0. Closing this ticket as fixed.

from unipdf.

kellemNegasi commented on July 22, 2024

Hi @ryankilroy, after some investigation, we found that the issue is in the ToUnicode map provided in the document. It maps the character code for the missing letter (l) to an invalid code point. Other tools were able to extract the correct character because they fell back to the Replacement Text data provided as part of the marked content. Our extractor doesn't currently implement this feature, so it took the invalid code point (which, incidentally, lies in the Private Use Area of Unicode) and extracted it as valid text. We plan to add this feature in the future and will post an update on this ticket when it is released.
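To illustrate the fallback described above, here is a minimal sketch in Go. It is not unipdf's implementation, and `hasPrivateUse` and `resolveText` are hypothetical helper names. It assumes the caller already has both the text decoded through the ToUnicode map and the replacement text from the marked content (in PDF this is carried by the `ActualText` entry), and prefers the replacement text whenever the decoded text contains a private-use code point.

```go
package main

import (
	"fmt"
	"unicode"
)

// hasPrivateUse reports whether s contains any rune in Unicode
// category Co (private use), e.g. U+E000 through U+F8FF.
func hasPrivateUse(s string) bool {
	for _, r := range s {
		if unicode.Is(unicode.Co, r) {
			return true
		}
	}
	return false
}

// resolveText is a hypothetical helper, not part of unipdf.
// decoded is the text obtained through the ToUnicode map;
// replacement is the replacement text attached to the same
// marked-content sequence, or "" if none is present.
func resolveText(decoded, replacement string) string {
	if replacement != "" && hasPrivateUse(decoded) {
		return replacement
	}
	return decoded
}

func main() {
	// U+E000 stands in for the letter that the ToUnicode map got wrong.
	fmt.Println(resolveText("he\uE000lo", "hello"))
}
```

When no replacement text is available, the sketch returns the decoded text unchanged, which matches the behaviour described in this comment.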

Regarding your second issue, font extraction: it fails because there are no fonts on pages 3 and beyond (those pages are scanned). However, the error message is not informative enough to convey this. We will improve that message too.

github-actions commented on July 22, 2024

Welcome! Thanks for posting your first issue. The way things work here is that customer issues are prioritized, while other issues go into our backlog, where they are assessed and fitted into the roadmap when suitable. If you need to get this done, consider buying a license, which also enables you to use it in your commercial products. More information can be found at https://unidoc.io/

kellemNegasi commented on July 22, 2024

Hi @ryankilroy, thank you for reporting this issue. We were able to reproduce it using the sample code and sample file you provided, and we are currently investigating the cause. We will post an update as soon as we identify the source of the issue and a fix.
