cristoper / feedmixer

A self-hosted API to fetch and mix entries from Atom and RSS feeds (returns Atom, RSS, or JSON)

License: Do What The F*ck You Want To Public License

Python 97.43% Dockerfile 2.57%
feed atom rss api-server self-hosted python atom-feed

feedmixer's People

Contributors

cristoper, dependabot[bot]

feedmixer's Issues

returns no feeds

After running a simple test, the items array in the response is empty:

[mrgeorgen@mrgPC ~/.newsboat]$ http localhost:8080/json 'f==https://videos.lukesmith.xyz/feeds/videos.atom?videoChannelId=2' 'F==https://www.youtube.com/feeds/videos.xml?channel_id=UC7YOGHUfC1Tb6E4pudI9STA' n==20 full==true 
HTTP/1.1 200 OK 
Connection: close 
Date: Mon, 25 Jan 2021 09:48:26 GMT 
Server: gunicorn/20.0.4 
content-length: 370 
content-type: application/json 
x-fm-errors: %7B%22https%3A//videos.lukesmith.xyz/feeds/videos.atom%3FvideoChannelId%3D2%22%3A%20%22%27NoneType%27%20object%20has%20no%20attribute%20%27get%27%22%7D 
{ 
    "description": "json feed created by FeedMixer.", 
    "home_page_url": "http://localhost:8080/json?f=https%3A%2F%2Fvideos.lukesmith.xyz%2Ffeeds%2Fvideos.atom%3FvideoChannelId%3D2&F=https%3A%2F%2Fwww.youtube.com%2Ffeeds%2Fvideos.xml%3Fchannel_id%3DUC7YOGHUfC1Tb6E4pudI9STA&n=20&full=true", 
    "items": [], 
    "title": "FeedMixer feed", 
    "version": "https://jsonfeed.org/version/1" 
} 
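The x-fm-errors header in the response above appears to be a percent-encoded JSON object mapping each failing feed URL to an error message (an inference from the header value itself, not from feedmixer's documentation). A minimal sketch for decoding it with the standard library, where the helper name decode_fm_errors is hypothetical:

```python
import json
from urllib.parse import unquote

# Value of the x-fm-errors header, copied from the response above.
raw_header = (
    "%7B%22https%3A//videos.lukesmith.xyz/feeds/videos.atom%3FvideoChannelId%3D2"
    "%22%3A%20%22%27NoneType%27%20object%20has%20no%20attribute%20%27get%27%22%7D"
)


def decode_fm_errors(header_value):
    """Percent-decode the header value and parse the result as JSON."""
    return json.loads(unquote(header_value))


errors = decode_fm_errors(raw_header)
for url, message in errors.items():
    # One line per feed that could not be fetched or parsed.
    print(f"{url}: {message}")
```

Decoded this way, the header reports a 'NoneType' object has no attribute 'get' error for the videos.lukesmith.xyz feed.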

Allow passing database file path to api

Right now the Falcon API object is simply created at the top level of the feedmixer_app.py module. It would be better to create it in a function, so that we could pass things like the database file path to it. This would also allow cleaning up better after integration tests, which currently create and use the same database file as the app.

Cache pruning

Currently the feed cache will grow until it is manually deleted. Some sort of automatic pruning should be provided (even if just a script that can be run from cron).
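A cron-friendly pruning script might look roughly like the sketch below. It assumes the cache is a shelve database in which every value is a dict carrying a "fetched_at" Unix timestamp; that layout is an assumption made for illustration and may not match what the cache actually stores. The function name prune_cache is also hypothetical.

```python
import os
import shelve
import tempfile
import time


def prune_cache(db_path, max_age_seconds, now=None):
    """Delete entries older than max_age_seconds; return how many were removed."""
    now = time.time() if now is None else now
    with shelve.open(db_path) as db:
        # Collect stale keys first so the database is not modified while iterating.
        stale = [key for key, entry in db.items()
                 if now - entry["fetched_at"] > max_age_seconds]
        for key in stale:
            del db[key]
    return len(stale)


# Demonstration against a throwaway database with one old and one fresh entry.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "cache")
    with shelve.open(demo_path) as db:
        db["https://example.com/old.xml"] = {"fetched_at": 0, "body": "old"}
        db["https://example.com/new.xml"] = {"fetched_at": 1000, "body": "new"}
    removed = prune_cache(demo_path, max_age_seconds=500, now=1200)
    with shelve.open(demo_path) as db:
        remaining = sorted(db.keys())

print(removed, remaining)
```

Passing now explicitly keeps the function testable; a cron job would simply omit it and use the current time.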

Investigate running with PyPy

We should investigate whether running under PyPy brings worthwhile performance improvements (compared to speedparser, #3), and update the deployment documentation accordingly.

AttributeError: 'Enclosure' object has no attribute 'get'

feedmixer_1  | [2020-01-13 15:54:38 +0000] [12] [ERROR] Error handling request /json?f=https://kurier.at/xml/rssd&f=https://rss.orf.at/news.xml&f=https://www.derstandard.at/rss/inland&n=10
feedmixer_1  | Traceback (most recent call last):
feedmixer_1  |   File "/app/.venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 134, in handle
feedmixer_1  |     self.handle_request(listener, req, client, addr)
feedmixer_1  |   File "/app/.venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
feedmixer_1  |     respiter = self.wsgi(environ, resp.start_response)
feedmixer_1  |   File "/app/feedmixer_wsgi.py", line 59, in application
feedmixer_1  |     return api(environ, start_response)
feedmixer_1  |   File "falcon/api.py", line 274, in falcon.api.API.__call__
feedmixer_1  |   File "falcon/api.py", line 269, in falcon.api.API.__call__
feedmixer_1  |   File "/app/feedmixer_api.py", line 121, in on_get
feedmixer_1  |     resp.body = method()
feedmixer_1  |   File "/app/feedmixer.py", line 202, in json_feed
feedmixer_1  |     return self.__generate_feed(JSONFeed).writeString('utf-8')
feedmixer_1  |   File "/app/.venv/lib/python3.5/site-packages/feedgenerator/django/utils/feedgenerator.py", line 193, in writeString
feedmixer_1  |     self.write(s, encoding)
feedmixer_1  |   File "/app/.venv/lib/python3.5/site-packages/jsonfeed/core.py", line 28, in write
feedmixer_1  |     data['items'] += [self.add_item_elements(item), ]
feedmixer_1  |   File "/app/.venv/lib/python3.5/site-packages/jsonfeed/core.py", line 124, in add_item_elements
feedmixer_1  |     'url': attachment.get('enclosure_url'),
feedmixer_1  | AttributeError: 'Enclosure' object has no attribute 'get'

Example feeds that break it

https://www.kleinezeitung.at/rss/wirtschaft
https://kurier.at/xml/rssd
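The traceback above shows the JSON writer calling attachment.get('enclosure_url'), which only works when the attachment is a dict. A minimal sketch of a helper that accepts either form follows. The Enclosure class here is a stand-in with url, length, and mime_type attributes (assumed to mirror feedgenerator's class), the dict keys other than enclosure_url are guesses, and attachment_fields is a hypothetical name.

```python
class Enclosure:
    """Stand-in for an Enclosure object: plain attributes, no .get() method."""

    def __init__(self, url, length, mime_type):
        self.url = url
        self.length = length
        self.mime_type = mime_type


def attachment_fields(attachment):
    """Return JSON Feed attachment fields for a dict or an Enclosure-like object."""
    if isinstance(attachment, dict):
        # Dict form: the key enclosure_url appears in the traceback,
        # the other two keys are assumed.
        return {
            "url": attachment.get("enclosure_url"),
            "size_in_bytes": attachment.get("enclosure_length"),
            "mime_type": attachment.get("enclosure_mime_type"),
        }
    # Object form: read attributes instead of calling .get().
    return {
        "url": getattr(attachment, "url", None),
        "size_in_bytes": getattr(attachment, "length", None),
        "mime_type": getattr(attachment, "mime_type", None),
    }


from_object = attachment_fields(
    Enclosure("https://example.com/a.mp3", "1234", "audio/mpeg"))
from_dict = attachment_fields({
    "enclosure_url": "https://example.com/a.mp3",
    "enclosure_length": "1234",
    "enclosure_mime_type": "audio/mpeg",
})
print(from_object == from_dict)
```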

AttributeError: module 'feedparser.sgml' has no attribute 'charref'

I followed the instructions in the readme, but I get the following error:

[mrgeorgen@mrgPC ~/dev/python/feedmixer]$ pipenv run gunicorn feedmixer_wsgi
[2021-01-24 22:34:16 +0100] [2878] [INFO] Starting gunicorn 20.0.4
[2021-01-24 22:34:16 +0100] [2878] [INFO] Listening at: http://127.0.0.1:8000 (2878)
[2021-01-24 22:34:16 +0100] [2878] [INFO] Using worker: sync
[2021-01-24 22:34:16 +0100] [2884] [INFO] Booting worker with pid: 2884
[2021-01-24 22:34:16 +0100] [2884] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/workers/base.py", line 119, in init_process
    self.load_wsgi()
  File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
    return self.load_wsgiapp()
  File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/util.py", line 358, in import_app
    mod = importlib.import_module(module)
  File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 790, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/mrgeorgen/dev/python/feedmixer/feedmixer_wsgi.py", line 18, in <module>
    from feedmixer_api import wsgi_app
  File "/home/mrgeorgen/dev/python/feedmixer/feedmixer_api.py", line 54, in <module>
    from feedmixer import FeedMixer
  File "/home/mrgeorgen/dev/python/feedmixer/feedmixer.py", line 65, in <module>
    import feedparser
  File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/feedparser/__init__.py", line 39, in <module>
    from . import api
  File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/feedparser/api.py", line 53, in <module>
    from .html import _BaseHTMLProcessor
  File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/feedparser/html.py", line 10, in <module>
    from .sgml import *
AttributeError: module 'feedparser.sgml' has no attribute 'charref'
[2021-01-24 22:34:16 +0100] [2884] [INFO] Worker exiting (pid: 2884)
[2021-01-24 22:34:16 +0100] [2878] [INFO] Shutting down: Master
[2021-01-24 22:34:16 +0100] [2878] [INFO] Reason: Worker failed to boot.

Replace persistent cache with RAM cache

I can't imagine having a persistent cache is very important, but it is slow. The one advantage is that it allows the cache to be used between processes, but I don't know how important that is.

I'd like to write a RAM-based replacement for shelfcache.cache_get() (one that wraps requests.get() and handles the HTTP cache headers).
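As a rough illustration of the in-memory approach, the sketch below caches responses in a dict with a fixed time-to-live. It deliberately swaps in a simple TTL where the proposal calls for honoring HTTP cache headers, and it takes the fetch function as a parameter rather than calling requests.get() directly. MemoryCache, fake_fetch, and the TTL value are all hypothetical.

```python
import time


class MemoryCache:
    """Per-process cache: url -> (expiry time, value), with a fixed TTL."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch          # e.g. a wrapper around requests.get
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def get(self, url):
        now = self._clock()
        hit = self._entries.get(url)
        if hit is not None and hit[0] > now:
            return hit[1]            # still fresh: serve from memory
        value = self._fetch(url)     # missing or expired: fetch again
        self._entries[url] = (now + self._ttl, value)
        return value


# Demonstration with a fake fetcher and a controllable clock.
calls = []


def fake_fetch(url):
    calls.append(url)
    return f"<feed from {url}>"


fake_now = [0.0]
cache = MemoryCache(fake_fetch, ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("https://example.com/feed.xml")   # miss: fetches
cache.get("https://example.com/feed.xml")   # hit: served from memory
fake_now[0] = 61.0
cache.get("https://example.com/feed.xml")   # expired: fetches again
print(len(calls))
```

As the issue notes, the trade-off is that such a cache is not shared between worker processes, so each worker would fetch a feed once for itself.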

ValueError: invalid literal for int() with base 10

I get the following error:

[2021-01-29 15:52:14 +0100] [22341] [INFO] Starting gunicorn 20.0.4
[2021-01-29 15:52:14 +0100] [22341] [INFO] Listening at: http://127.0.0.1:8000 (22341)
[2021-01-29 15:52:14 +0100] [22341] [INFO] Using worker: sync
[2021-01-29 15:52:14 +0100] [22346] [INFO] Booting worker with pid: 22346
[2021-01-29 15:52:17 +0100] [22346] [ERROR] Error handling request /atom?n=0%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D15%3Fu%3Dcristoper%3Fp%3Dfeedmixer%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D13%3Fu%3Dcristoper%3Fp%3Dfeedmixer%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D3564%3Fu%3Ddense-analysis%3Fp%3Dale%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D192%3Fu%3Dthemoonisacheese%3Fp%3D2bored2wait%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D189%3Fu%3Dthemoonisacheese%3Fp%3D2bored2wait%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D194%3Fu%3Dthemoonisacheese%3Fp%3D2bored2wait%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D190%3Fu%3Dthemoonisacheese%3Fp%3D2bored2wait%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D191%3Fu%3Dthemoonisacheese%3Fp%3D2bored2wait%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D193%3Fu%3Dthemoonisacheese%3Fp%3D2bored2wait%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D339%3Fu%3DFabricMC%3Fp%3Dfabric-loom%3Fformat%3DAtom
Traceback (most recent call last):
  File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/workers/sync.py", line 134, in handle
    self.handle_request(listener, req, client, addr)
  File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/home/mrgeorgen/dev/python/feedmixer/feedmixer_wsgi.py", line 64, in application
    return api(environ, start_response)
  File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/falcon/api.py", line 269, in __call__
    responder(req, resp, **params)
  File "/home/mrgeorgen/dev/python/feedmixer/feedmixer_api.py", line 114, in on_get
    feeds, n, full = parse_qs(req)
  File "/home/mrgeorgen/dev/python/feedmixer/feedmixer_api.py", line 84, in parse_qs
    return ParsedQS(feeds, int(n), bool(full))
ValueError: invalid literal for int() with base 10: '0?f=https://rss.nixnet.services/index.php?action=display?bridge=GithubIssue?context=Issue comments?i=15?u=cristoper?p=feedmixer?format=Atom?f=https://rss.nixnet.services/index.php?action=display?brid

This happens after running a shell script:
curl 'https://github.com/search?o=desc&q=involves%3AMrGeorgen&s=updated' | github-issue-search
The github-issue-search script:

#!/bin/sh
feeds=`grep -Po '(?<=href=")[^"]*' < /dev/stdin |uniq -u |awk -F / '{ if($4 == "issues" || $4 == "pull") { printf "%s", "%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D"$5"%3Fu%3D"$2"%3Fp%3D"$3"%3Fformat%3DAtom"} }'`
curl "http://localhost:8000/atom?n=0$feeds"

I'm not sure whether my script passes unexpected parameters. If it does, feedmixer should print a proper error message saying which parameter is the problem.
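In the request logged above, the separators between parameters are themselves percent-encoded (%3F for "?" and %3D for "="), so the whole string arrives as the value of n, which is what int() then rejects. The sketch below shows one way to report that clearly and how urlencode builds a well-formed query. parse_feed_query and the default value for n are hypothetical, not feedmixer's actual parser.

```python
from urllib.parse import parse_qs, urlencode


def parse_feed_query(query_string, default_n=-1):
    """Return (feeds, n); raise ValueError naming the bad parameter."""
    params = parse_qs(query_string)
    feeds = params.get("f", [])
    raw_n = params.get("n", [str(default_n)])[-1]
    try:
        n = int(raw_n)
    except ValueError:
        # Truncate long values so the message stays readable.
        shown = raw_n if len(raw_n) <= 40 else raw_n[:40] + "..."
        raise ValueError(
            f"parameter 'n' must be an integer, got {shown!r}") from None
    return feeds, n


# Separators encoded by mistake: everything ends up inside n.
broken = "n=0%3Ff%3Dhttps%3A%2F%2Fexample.com%2Fa.atom"
try:
    parse_feed_query(broken)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# urlencode escapes only the values and joins the pairs with a literal "&".
good = urlencode([
    ("n", 0),
    ("f", "https://example.com/a.atom?x=1"),
    ("f", "https://example.com/b.atom"),
])
feeds, n = parse_feed_query(good)
print(n, feeds)
```

An API handler could catch this ValueError and return an HTTP 400 response containing the message instead of letting the worker log a traceback.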
