
Comments (5)

jrwagz commented on May 23, 2024

@azec-pdx, I'm on an M-series macOS device, and the code on your branch (commit azec-pdx@a2be61e) got me past this same problem. Thank you!

from safaribooks.

azec-pdx commented on May 23, 2024

@lorenzodifuccia and @ivanpagac,

I believe later versions of Python introduced a set of problems that lxml hasn't addressed yet.
I am watching the following:

  1. https://bugs.launchpad.net/lxml/+bug/1949271
  2. https://github.com/Donohue/medium-to-jekyll/pull/4/files
  3. Donohue/medium-to-jekyll#3

Independent of that external lxml problem, I found an issue in this project itself: emojis and other special Unicode characters are mishandled when the document is handed to lxml for parsing, even on Python versions where lxml otherwise behaves well.

I have addressed the issue in https://github.com/azec-pdx/safaribooks/tree/apiv2.
I confirmed the fix by testing on the books with IDs 9781098156817 and 9781617297274, both of which contain emojis and other offending characters. However, parsing only works for me with Python 3.9.16; with Python 3.9.10 it is still broken (I believe because of the additional issue linked above).
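The thread does not show what the apiv2 branch actually changed, so the following is only a minimal sketch of one common workaround for this class of problem, not the branch's fix. It assumes the trouble comes from handing lxml a Python `str` that contains characters outside the Basic Multilingual Plane, and sidesteps that by passing UTF-8 bytes with an explicitly configured parser. The helper names `non_bmp_chars`, `to_parser_bytes`, and `parse_html` are hypothetical.

```python
from typing import List


def non_bmp_chars(text: str) -> List[str]:
    """Return the characters above U+FFFF (most emoji live there)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def to_parser_bytes(text: str) -> bytes:
    """Encode the document as UTF-8 so the parser receives bytes, not str."""
    return text.encode("utf-8")


def parse_html(text: str):
    """Parse HTML from UTF-8 bytes with an explicit parser encoding.

    Assumption: lxml is installed. It is imported lazily so the helpers
    above remain usable without it.
    """
    from lxml import html  # third-party dependency

    parser = html.HTMLParser(encoding="utf-8")
    return html.fromstring(to_parser_bytes(text), parser=parser)


# A small sample with one emoji (non-BMP) and one accented letter (BMP).
sample = "<p>Ship it \U0001F680 caf\u00e9</p>"
offending = non_bmp_chars(sample)
```

Running `non_bmp_chars` over a chapter before parsing is a quick way to check whether a given book is likely to hit the problem at all.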

[Screenshot 2023-03-27 at 9:08:37 AM]

[Screenshot 2023-03-27 at 8:58:21 AM]


azec-pdx commented on May 23, 2024

I've seen lxml behave differently on the same Python version between macOS on an Apple M1 chip and macOS on an Intel chip. On M1 macOS it errors as described above, which my branch now handles; on Intel macOS it never errors.
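Since the behavior depends on the chip and the Python build, it may help to attach the exact environment when reporting results. Below is a small sketch that collects it; the function name `environment_report` is hypothetical, and the lxml fields are filled in only if lxml can be imported.

```python
import platform


def environment_report() -> dict:
    """Collect the details that seem to matter for this issue."""
    report = {
        "python": platform.python_version(),
        "system": platform.system(),
        # Typically "arm64" on Apple silicon and "x86_64" on Intel Macs.
        "machine": platform.machine(),
        "lxml": None,
        "libxml2": None,
    }
    try:
        from lxml import etree  # third-party; optional here
    except ImportError:
        return report
    # Both attributes are version tuples, e.g. (4, 9, 2, 0).
    report["lxml"] = ".".join(map(str, etree.LXML_VERSION))
    report["libxml2"] = ".".join(map(str, etree.LIBXML_VERSION))
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Comparing this output between an M1 machine and an Intel machine on the same Python version would show whether the lxml or libxml2 builds differ.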


trsudarshan commented on May 23, 2024

@azec-pdx thank you. Is there a version of lxml (with Python fixed at 3.9.x) that avoids this error? If so, pinning that version in requirements.txt would let users work around the problem locally until a formal PR resolving it gets merged.


dreampuf commented on May 23, 2024

#347 fixed this issue.

