owulveryck / repocketable

Tool to fetch articles from (getPocket|the web) and turn them into epub

License: MIT License

Go 100.00%
readability getpocket epub epub-generation hacktoberfest2021 hacktoberfest

rePocketable

This tool and its webpage are under construction.

The best way to see what it will eventually do is to run one of the CLI tools, such as toEpub:

go run cmd/toEpub/*.go https://whateverpageyouwanttoread/

This utility accepts optional -H arguments to pass headers to the HTTP downloader. The option can be repeated, which keeps it compatible with the curl command line.

Example:

 toEpub -H 'sec-ch-ua: "Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36' \
  -H 'sec-ch-ua-platform: "macOS"'  https://thewebsite/thepage.html
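A repeatable flag like -H can be built in Go with the standard flag package by implementing flag.Value. The following is a minimal sketch of that technique, not the actual toEpub code: the names headerFlags and parseArgs are hypothetical and introduced only for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// headerFlags (hypothetical name) collects every -H occurrence.
// It implements flag.Value, so the flag package calls Set once per occurrence.
type headerFlags struct {
	header http.Header
}

// String is required by flag.Value.
func (h *headerFlags) String() string {
	if h == nil || h.header == nil {
		return ""
	}
	return fmt.Sprint(h.header)
}

// Set parses one "Name: value" pair, splitting on the first colon as curl does.
func (h *headerFlags) Set(value string) error {
	name, val, ok := strings.Cut(value, ":")
	if !ok || strings.TrimSpace(name) == "" {
		return fmt.Errorf("invalid header %q, expected \"Name: value\"", value)
	}
	if h.header == nil {
		h.header = make(http.Header)
	}
	h.header.Add(strings.TrimSpace(name), strings.TrimSpace(val))
	return nil
}

// parseArgs (hypothetical helper) returns the collected headers and the
// remaining positional arguments, which would be the URLs to fetch.
func parseArgs(args []string) (http.Header, []string, error) {
	fs := flag.NewFlagSet("toEpub", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // errors are returned to the caller instead of printed
	var h headerFlags
	fs.Var(&h, "H", "extra HTTP header, may be repeated")
	if err := fs.Parse(args); err != nil {
		return nil, nil, err
	}
	return h.header, fs.Args(), nil
}

func main() {
	headers, urls, err := parseArgs([]string{
		"-H", `sec-ch-ua-platform: "macOS"`,
		"-H", "User-Agent: Mozilla/5.0",
		"https://thewebsite/thepage.html",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(headers.Get("User-Agent"), urls)
}
```

The collected http.Header can then be copied onto each outgoing http.Request before it is sent.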

I wrote some explanation of the concept in a blog post

Hacktoberfest

This is a toy project, but I rely on it more and more. I think Hacktoberfest is a good opportunity to turn this project into a product. I will write a contributing guide soon; meanwhile, if you want to participate, the urgent matters are:

  • Writing a proper vision: discussing it in an issue and submitting a PR to mention it in the README
  • Writing autonomous end-to-end tests: grabbing a sample page, running an httptest server, running the toEpub code and analysing the result
  • Writing proper documentation
  • Adding a contribution guide
  • The sky is the limit: discuss in issues and submit a PR once the issues are discussed :D

Features

The internal libraries (used by the CLI) implement these features:

  • Webpage fetching and pre-processing
    • preprocessing and sanitization of figures to fetch the correct image from responsive and/or JavaScript tags (Medium and Towards Data Science)
    • experimental feature to turn LaTeX figures into pictures (github.com/go-latex/latex)
    • extraction of the content based on the Arc90 Readability project (github.com/cixtor/readability)
  • Opengraph processing to extract meta information (github.com/dyatlov/go-opengraph)
    • Generation of a cover picture with the front image of the website, the title and the author of the article
    • Generation of a first chapter with metadata such as the publication date
  • epub generation (github.com/bmaupin/go-epub)
  • experimental getpocket integration
    • reading the article lists and generating epubs from the list
    • a daemon mode that will eventually run on an e-reader device to sync the list (heavy WIP)
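To illustrate what picking "the correct image" from a responsive tag can involve, here is a small sketch that selects the widest candidate from an HTML srcset attribute. This is an assumption about one possible approach, not the project's actual preprocessing code, and largestFromSrcset is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// largestFromSrcset (hypothetical helper) returns the URL with the largest
// width descriptor ("640w") in a srcset value. Candidates without a width
// descriptor count as width 0. This simple version splits on commas, so it
// does not handle URLs that themselves contain commas.
func largestFromSrcset(srcset string) string {
	best, bestWidth := "", -1
	for _, candidate := range strings.Split(srcset, ",") {
		fields := strings.Fields(candidate)
		if len(fields) == 0 {
			continue
		}
		width := 0
		if len(fields) > 1 && strings.HasSuffix(fields[1], "w") {
			if n, err := strconv.Atoi(strings.TrimSuffix(fields[1], "w")); err == nil {
				width = n
			}
		}
		if width > bestWidth {
			best, bestWidth = fields[0], width
		}
	}
	return best
}

func main() {
	srcset := "small.png 320w, large.png 1400w, medium.png 640w"
	fmt.Println(largestFromSrcset(srcset))
}
```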

Configurations

These settings may influence various internal libraries.

KEY                           TYPE      DEFAULT  REQUIRED  DESCRIPTION
DOWNLOADER_LIVENESS_CHECK     Duration  5m       true
DOWNLOADER_PROBE_TIMEOUT      Duration  60m      true
DOWNLOADER_HTTP_TIMEOUT       Duration  10s      true
DOWNLOADER_TRANSPORT_TIMEOUT  Duration  5s       true

These settings are used by the CLI tools that rely on the Pocket integration.

KEY                      TYPE      DEFAULT                       REQUIRED  DESCRIPTION
POCKET_UPDATE_FREQUENCY  Duration  1h                            true      How often to query getPocket
POCKET_HEALTH_CHECK      Duration  30s                           true
POCKET_POCKET_URL        String    https://getpocket.com/v3/get  true
POCKET_CONSUMER_KEY      String                                  true      See https://getpocket.com/developer/apps/ to get a consumer key
POCKET_USERNAME          String                                            The pocket username (will try to fetch it if not found)
POCKET_TOKEN             String                                            The access token, will try to fetch it if not found or invalid
