
Comments (2)

chanwoood avatar chanwoood commented on August 22, 2024

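First, we need to decide what file type the final output should be. I think PDF is mostly fine. Isn't Gitbook for online reading only? I can't think of a way to package the content with Gitbook. Images in a PDF are static and can't be zoomed; that is a general PDF limitation and I can't do anything about it. If the final output is HTML, you might as well open the page in Chrome and press Ctrl + S.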

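Previously I converted the crawled data to HTML and then used pdfkit to generate the PDF. pdfkit did not handle images that fall across a page break well. A different approach is possible: raw crawled data --> Markdown document --> PDF document. The first arrow is fairly easy to implement. The second arrow can be done with typora; I tried it and images were not split across pages, but having to open typora and convert by hand is a bit tedious. There are also online converters that could be driven from a program, but they either handle Chinese poorly or are unstable.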
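The first arrow (raw crawled data --> Markdown) could look roughly like the sketch below. The record keys `title`, `text` and `images` are assumptions made for illustration; they are not the actual field names used by crawl-zsxq and would need to be mapped from whatever the crawler returns.

```python
def topic_to_markdown(topic):
    """Render one crawled topic as a Markdown section.

    The keys "title", "text" and "images" are hypothetical field names.
    """
    lines = ["## " + topic["title"], "", topic["text"].strip(), ""]
    for url in topic.get("images", []):
        # Reference the original image URL; each image gets its own paragraph.
        lines.append("![](" + url + ")")
        lines.append("")
    return "\n".join(lines)


def topics_to_markdown(topics, heading="Crawled topics"):
    """Join all topics into a single Markdown document."""
    parts = ["# " + heading, ""]
    parts.extend(topic_to_markdown(topic) for topic in topics)
    return "\n".join(parts)
```

The resulting string can be written to a `.md` file and opened in typora (or any other Markdown-to-PDF tool) for the second arrow.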

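As for the lower image resolution: I fetch each image from its original-image URL, so pdfkit may be to blame here; it may be lowering the resolution when it converts images into the PDF. Converting from Markdown to PDF might solve this.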
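If pdfkit stays in the pipeline, one thing that may be worth trying is passing wkhtmltopdf's `image-dpi` and `image-quality` options explicitly. This is only a sketch: whether these settings are the cause of the blurry images here is unconfirmed, the values chosen are arbitrary, and the helper names `build_pdf_options` and `html_to_pdf` are made up for illustration.

```python
def build_pdf_options(image_dpi=600, image_quality=100):
    """Build an options dict for pdfkit (forwarded to wkhtmltopdf as --flags)."""
    return {
        "image-dpi": str(image_dpi),          # target DPI for embedded images
        "image-quality": str(image_quality),  # JPEG compression quality, 0-100
        "encoding": "UTF-8",                  # avoids garbled Chinese text
    }


def html_to_pdf(html, out_path, options=None):
    """Convert an HTML string to a PDF file (hypothetical helper)."""
    # Imported lazily so this module loads even where pdfkit/wkhtmltopdf
    # is not installed.
    import pdfkit

    return pdfkit.from_string(html, out_path, options=options or build_pdf_options())
```

Comparing the output of `html_to_pdf` with and without these options on one image-heavy page would show quickly whether pdfkit is really the culprit.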

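Finally, the crawl stopping after 15 pages is probably because the next-page link is not being handled properly. This needs a case-by-case look; I have only crawled two planets (zsxq groups), so all kinds of bugs showing up on other planets are to be expected.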
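To narrow down why a crawl ends early, it can help to have the pagination loop report its stop reason. The sketch below is not the crawl-zsxq implementation: `fetch_page` is a hypothetical callable that takes a URL and returns `(items, next_url)`, with `next_url` set to `None` on the last page.

```python
def crawl_all_pages(fetch_page, start_url, max_pages=1000):
    """Follow next-page links and record why the loop stopped.

    fetch_page(url) -> (items, next_url or None) is an assumed interface.
    Returns (all_items, pages_fetched, stop_reason).
    """
    seen_urls = set()
    collected = []
    url = start_url
    while True:
        if url is None:
            stop_reason = "no next-page link"
            break
        if url in seen_urls:
            # A repeated link would otherwise loop forever.
            stop_reason = "next-page link repeated: " + url
            break
        if len(seen_urls) >= max_pages:
            stop_reason = "max_pages reached"
            break
        seen_urls.add(url)
        page_items, url = fetch_page(url)
        if not page_items:
            stop_reason = "empty page"
            break
        collected.extend(page_items)
    return collected, len(seen_urls), stop_reason
```

If the reason on page 15 is "no next-page link" while the site clearly has more content, the bug is most likely in how the next URL is built rather than in the download itself.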

I'm looking for an internship at the moment, so I probably won't have time to work on this. Sorry...

from crawl-zsxq.

 avatar commented on August 22, 2024

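@96chh Gitbook gives a very good reading experience, whereas converting to PDF causes all sorts of image problems. That is certainly not your fault :)
Personally, I think it would be enough to generate a directory structure in Gitbook's native HTML/Markdown format; anyone who needs a PDF can then use Gitbook's built-in tooling to convert it.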
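A minimal sketch of such a directory structure, assuming the conventional Gitbook layout of a `README.md`, a `SUMMARY.md` table of contents and one Markdown file per chapter. The function name `write_gitbook_tree` and the `chapter-NNN.md` naming scheme are made up for illustration.

```python
from pathlib import Path


def write_gitbook_tree(out_dir, book_title, chapters):
    """Write README.md, SUMMARY.md and one .md file per chapter.

    chapters is a list of (title, markdown_body) tuples.
    """
    root = Path(out_dir)
    root.mkdir(parents=True, exist_ok=True)
    (root / "README.md").write_text("# " + book_title + "\n", encoding="utf-8")

    summary = ["# Summary", "", "* [Introduction](README.md)"]
    for index, (title, body) in enumerate(chapters, start=1):
        name = "chapter-{:03d}.md".format(index)
        (root / name).write_text(
            "# " + title + "\n\n" + body + "\n", encoding="utf-8"
        )
        # Each SUMMARY.md entry links a chapter title to its file.
        summary.append("* [" + title + "](" + name + ")")

    (root / "SUMMARY.md").write_text("\n".join(summary) + "\n", encoding="utf-8")
    return root
```

For example, `write_gitbook_tree("book", "My planet", [("Week 1", "...")])` would produce `book/README.md`, `book/SUMMARY.md` and `book/chapter-001.md`.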

from crawl-zsxq.
