cristoper / feedmixer
A self-hosted API to fetch and mix entries from Atom and RSS feeds (returns Atom, RSS, or JSON)
License: Do What The F*ck You Want To Public License
After running a simple test, the items array is empty:
[mrgeorgen@mrgPC ~/.newsboat]$ http localhost:8080/json 'f==https://videos.lukesmith.xyz/feeds/videos.atom?videoChannelId=2' 'F==https://www.youtube.com/feeds/videos.xml?channel_id=UC7YOGHUfC1Tb6E4pudI9STA' n==20 full==true
HTTP/1.1 200 OK
Connection: close
Date: Mon, 25 Jan 2021 09:48:26 GMT
Server: gunicorn/20.0.4
content-length: 370
content-type: application/json
x-fm-errors: %7B%22https%3A//videos.lukesmith.xyz/feeds/videos.atom%3FvideoChannelId%3D2%22%3A%20%22%27NoneType%27%20object%20has%20no%20attribute%20%27get%27%22%7D
{
"description": "json feed created by FeedMixer.",
"home_page_url": "http://localhost:8080/json?f=https%3A%2F%2Fvideos.lukesmith.xyz%2Ffeeds%2Fvideos.atom%3FvideoChannelId%3D2&F=https%3A%2F%2Fwww.youtube.com%2Ffeeds%2Fvideos.xml%3Fchannel_id%3DUC7YOGHUfC1Tb6E4pudI9STA&n=20&full=true",
"items": [],
"title": "FeedMixer feed",
"version": "https://jsonfeed.org/version/1"
}
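The x-fm-errors header in the response above appears to be a percent-encoded JSON object that maps each failing feed URL to its error message. A minimal sketch for decoding it, assuming that format holds in general (the helper name decode_fm_errors is made up for illustration):

```python
import json
from urllib.parse import unquote


def decode_fm_errors(header_value):
    """Decode a percent-encoded JSON header into a {feed_url: error} dict."""
    return json.loads(unquote(header_value))


# Header value copied from the response above.
header = (
    "%7B%22https%3A//videos.lukesmith.xyz/feeds/videos.atom%3FvideoChannelId%3D2"
    "%22%3A%20%22%27NoneType%27%20object%20has%20no%20attribute%20%27get%27%22%7D"
)

for url, error in decode_fm_errors(header).items():
    print(url, "->", error)
```

Decoded this way, the header shows that only the videos.lukesmith.xyz feed reported an error.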
Right now the Falcon API object is simply created at the top level of the feedmixer_app.py module. It would be better to create it in a function, so we could pass things like the database file path to it (this would also allow cleaning up better after integration tests, which currently create and use the same database file as the app).
Currently the feed cache will grow until it is manually deleted. Some sort of automatic pruning should be provided (even if just a script that can be run from cron).
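A cron-friendly pruning script could look roughly like the sketch below. It assumes the cache is a standard-library shelve file whose values are dicts carrying a stored_at timestamp. That record layout and the function name prune_shelf are hypothetical, so the field access would need adapting to whatever the cache actually stores:

```python
import shelve
import time


def prune_shelf(path, max_age_seconds, now=None):
    """Delete entries older than max_age_seconds; return how many were removed."""
    now = time.time() if now is None else now
    with shelve.open(path) as db:
        # Collect the stale keys first so the shelf is not modified while iterating.
        stale = [
            key
            for key, value in db.items()
            if now - value["stored_at"] > max_age_seconds  # assumed field name
        ]
        for key in stale:
            del db[key]
    return len(stale)
```

Depending on the dbm backend, deleting entries may not shrink the file on disk, so a script like this bounds the number of entries rather than guaranteeing a smaller file.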
Should evaluate how well this works: https://github.com/jmoiron/speedparser
Package up a JavaScript client that displays results from feedmixer.
Should investigate whether running under PyPy brings worthwhile performance improvements (compared to speedparser #3), and update deployment documentation accordingly.
Maybe mention that this service should not be granted access to any internal services which provide RSS feeds (i.e., to prevent server-side request forgery: https://www.agwa.name/blog/post/preventing_server_side_request_forgery_in_golang).
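Besides documenting the risk, the service could refuse feed URLs that resolve to non-public addresses. The following is a minimal sketch of such a check using only the standard library. The function name is made up, and it is not a complete defence: the address is resolved again when the feed is actually fetched, so DNS rebinding and redirects would still need handling.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolves_to_public_address(url):
    """Return True only if every address the URL's host resolves to is global."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, None)
    except socket.gaierror:
        return False
    # Each result's sockaddr starts with the address string; drop any IPv6 zone id.
    return all(
        ipaddress.ip_address(info[4][0].split("%")[0]).is_global for info in infos
    )


for candidate in ("http://127.0.0.1/feed", "http://10.0.0.5/rss", "https://8.8.8.8/feed"):
    print(candidate, resolves_to_public_address(candidate))
```

Loopback, private and link-local hosts are rejected, so an internal feed server would not be reachable through the mixer.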
Specifically, we should log to stderr in the Dockerfile.
feedmixer_1 | [2020-01-13 15:54:38 +0000] [12] [ERROR] Error handling request /json?f=https://kurier.at/xml/rssd&f=https://rss.orf.at/news.xml&f=https://www.derstandard.at/rss/inland&n=10
feedmixer_1 | Traceback (most recent call last):
feedmixer_1 | File "/app/.venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 134, in handle
feedmixer_1 | self.handle_request(listener, req, client, addr)
feedmixer_1 | File "/app/.venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
feedmixer_1 | respiter = self.wsgi(environ, resp.start_response)
feedmixer_1 | File "/app/feedmixer_wsgi.py", line 59, in application
feedmixer_1 | return api(environ, start_response)
feedmixer_1 | File "falcon/api.py", line 274, in falcon.api.API.__call__
feedmixer_1 | File "falcon/api.py", line 269, in falcon.api.API.__call__
feedmixer_1 | File "/app/feedmixer_api.py", line 121, in on_get
feedmixer_1 | resp.body = method()
feedmixer_1 | File "/app/feedmixer.py", line 202, in json_feed
feedmixer_1 | return self.__generate_feed(JSONFeed).writeString('utf-8')
feedmixer_1 | File "/app/.venv/lib/python3.5/site-packages/feedgenerator/django/utils/feedgenerator.py", line 193, in writeString
feedmixer_1 | self.write(s, encoding)
feedmixer_1 | File "/app/.venv/lib/python3.5/site-packages/jsonfeed/core.py", line 28, in write
feedmixer_1 | data['items'] += [self.add_item_elements(item), ]
feedmixer_1 | File "/app/.venv/lib/python3.5/site-packages/jsonfeed/core.py", line 124, in add_item_elements
feedmixer_1 | 'url': attachment.get('enclosure_url'),
feedmixer_1 | AttributeError: 'Enclosure' object has no attribute 'get'
Example feeds that break it
https://www.kleinezeitung.at/rss/wirtschaft
https://kurier.at/xml/rssd
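The traceback shows jsonfeed calling .get('enclosure_url') on an Enclosure object, which has no such method. One possible workaround is a small accessor that accepts either form. The sketch below assumes the object exposes its address as a url attribute. StandInEnclosure is a hypothetical stand-in class so the example runs without feedgenerator installed:

```python
def enclosure_url(attachment):
    """Return the attachment URL from either a dict or an Enclosure-like object."""
    if isinstance(attachment, dict):
        return attachment.get("enclosure_url")
    return getattr(attachment, "url", None)  # assumed attribute name


class StandInEnclosure:
    """Hypothetical stand-in for the feed library's Enclosure class."""

    def __init__(self, url):
        self.url = url


print(enclosure_url({"enclosure_url": "https://example.org/a.mp3"}))
print(enclosure_url(StandInEnclosure("https://example.org/b.mp3")))
```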
I should revisit the Dockerfile and make the resulting image smaller.
I followed the instructions in the README, but I get the following error:
[mrgeorgen@mrgPC ~/dev/python/feedmixer]$ pipenv run gunicorn feedmixer_wsgi
[2021-01-24 22:34:16 +0100] [2878] [INFO] Starting gunicorn 20.0.4
[2021-01-24 22:34:16 +0100] [2878] [INFO] Listening at: http://127.0.0.1:8000 (2878)
[2021-01-24 22:34:16 +0100] [2878] [INFO] Using worker: sync
[2021-01-24 22:34:16 +0100] [2884] [INFO] Booting worker with pid: 2884
[2021-01-24 22:34:16 +0100] [2884] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/workers/base.py", line 119, in init_process
self.load_wsgi()
File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
self.wsgi = self.app.wsgi()
File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
return self.load_wsgiapp()
File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
return util.import_app(self.app_uri)
File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/util.py", line 358, in import_app
mod = importlib.import_module(module)
File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 790, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/home/mrgeorgen/dev/python/feedmixer/feedmixer_wsgi.py", line 18, in <module>
from feedmixer_api import wsgi_app
File "/home/mrgeorgen/dev/python/feedmixer/feedmixer_api.py", line 54, in <module>
from feedmixer import FeedMixer
File "/home/mrgeorgen/dev/python/feedmixer/feedmixer.py", line 65, in <module>
import feedparser
File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/feedparser/__init__.py", line 39, in <module>
from . import api
File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/feedparser/api.py", line 53, in <module>
from .html import _BaseHTMLProcessor
File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/feedparser/html.py", line 10, in <module>
from .sgml import *
AttributeError: module 'feedparser.sgml' has no attribute 'charref'
[2021-01-24 22:34:16 +0100] [2884] [INFO] Worker exiting (pid: 2884)
[2021-01-24 22:34:16 +0100] [2878] [INFO] Shutting down: Master
[2021-01-24 22:34:16 +0100] [2878] [INFO] Reason: Worker failed to boot.
I can't imagine having a persistent cache is very important, but it is slow. The one advantage is that it allows the cache to be used between processes, but I don't know how important that is.
I'd like to write a RAM-based replacement for shelfcache.cache_get() (that wraps requests.get() and handles the HTTP cache headers).
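A rough sketch of what such an in-memory replacement might look like is below. It is not shelfcache's API: the MemoryCache class, the injected fetch callable (which could be a thin wrapper around requests.get()) and the lower-case header names are all assumptions. Only Cache-Control max-age and ETag revalidation are handled, and error responses are cached like any other.

```python
import re
import time


class MemoryCache:
    """In-process HTTP cache keyed by URL (hypothetical, for illustration)."""

    def __init__(self, fetch, clock=time.monotonic):
        # fetch(url, headers) must return (status, response_headers, body).
        self._fetch = fetch
        self._clock = clock
        self._entries = {}  # url -> (expires_at, etag, body)

    @staticmethod
    def _max_age(headers):
        match = re.search(r"max-age=(\d+)", headers.get("cache-control", ""))
        return int(match.group(1)) if match else 0

    def get(self, url):
        entry = self._entries.get(url)
        if entry and self._clock() < entry[0]:
            return entry[2]  # still fresh: no network request at all
        request_headers = {}
        if entry and entry[1]:
            request_headers["If-None-Match"] = entry[1]  # ask for revalidation
        status, headers, body = self._fetch(url, request_headers)
        if status == 304 and entry:
            body = entry[2]  # server confirmed the cached copy is current
        etag = headers.get("etag") or (entry[1] if entry else None)
        self._entries[url] = (self._clock() + self._max_age(headers), etag, body)
        return body
```

Since the dictionary lives in one process, each gunicorn worker would keep its own copy, which is the trade-off against the persistent cache mentioned above.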
I get the following error:
[2021-01-29 15:52:14 +0100] [22341] [INFO] Starting gunicorn 20.0.4
[2021-01-29 15:52:14 +0100] [22341] [INFO] Listening at: http://127.0.0.1:8000 (22341)
[2021-01-29 15:52:14 +0100] [22341] [INFO] Using worker: sync
[2021-01-29 15:52:14 +0100] [22346] [INFO] Booting worker with pid: 22346
[2021-01-29 15:52:17 +0100] [22346] [ERROR] Error handling request /atom?n=0%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D15%3Fu%3Dcristoper%3Fp%3Dfeedmixer%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D13%3Fu%3Dcristoper%3Fp%3Dfeedmixer%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D3564%3Fu%3Ddense-analysis%3Fp%3Dale%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D192%3Fu%3Dthemoonisacheese%3Fp%3D2bored2wait%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D189%3Fu%3Dthemoonisacheese%3Fp%3D2bored2wait%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D194%3Fu%3Dthemoonisacheese%3Fp%3D2bored2wait%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D190%3Fu%3Dthemoonisacheese%3Fp%3D2bored2wait%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D191%3Fu%3Dthemoonisacheese%3Fp%3D2bored2wait%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D193%3Fu%3Dthemoonisacheese%3Fp%3D2bored2wait%3Fformat%3DAtom%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D339%3Fu%3DFabricMC%3Fp%3Dfabric-loom%3Fformat%3DAtom
Traceback (most recent call last):
File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/workers/sync.py", line 134, in handle
self.handle_request(listener, req, client, addr)
File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/home/mrgeorgen/dev/python/feedmixer/feedmixer_wsgi.py", line 64, in application
return api(environ, start_response)
File "/home/mrgeorgen/.local/share/virtualenvs/feedmixer-rEvpsW94/lib/python3.9/site-packages/falcon/api.py", line 269, in __call__
responder(req, resp, **params)
File "/home/mrgeorgen/dev/python/feedmixer/feedmixer_api.py", line 114, in on_get
feeds, n, full = parse_qs(req)
File "/home/mrgeorgen/dev/python/feedmixer/feedmixer_api.py", line 84, in parse_qs
return ParsedQS(feeds, int(n), bool(full))
ValueError: invalid literal for int() with base 10: '0?f=https://rss.nixnet.services/index.php?action=display?bridge=GithubIssue?context=Issue comments?i=15?u=cristoper?p=feedmixer?format=Atom?f=https://rss.nixnet.services/index.php?action=display?brid
This happens after running a shell script:
curl 'https://github.com/search?o=desc&q=involves%3AMrGeorgen&s=updated' | github-issue-search
github-issue-search script:
#!/bin/sh
feeds=`grep -Po '(?<=href=")[^"]*' < /dev/stdin |uniq -u |awk -F / '{ if($4 == "issues" || $4 == "pull") { printf "%s", "%3Ff%3Dhttps%3A%2F%2Frss.nixnet.services%2Findex.php%3Faction%3Ddisplay%3Fbridge%3DGithubIssue%3Fcontext%3DIssue+comments%3Fi%3D"$5"%3Fu%3D"$2"%3Fp%3D"$3"%3Fformat%3DAtom"} }'`
curl "http://localhost:8000/atom?n=0$feeds"
I'm not sure whether my script passes unexpected parameters. If it does, feedmixer should print a proper error message saying which parameter is the problem.
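Judging from the logged request, the script joins parameters with an encoded "?" (%3F) instead of "&", so the whole string after n= arrives as the value of n. The sketch below reproduces that with a shortened placeholder URL and shows one way to build the query so that each feed becomes its own f parameter. build_mixer_url and parse_count are hypothetical helpers. parse_count only illustrates the kind of error message requested and is not feedmixer's current behaviour:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_mixer_url(base, feeds, n=0):
    """Return a request URL with one `f` parameter per feed, joined by `&`."""
    params = [("n", str(n))] + [("f", feed) for feed in feeds]
    return base + "?" + urlencode(params)


def parse_count(raw):
    """Turn the raw `n` value into an int, or name the offending parameter."""
    try:
        return int(raw)
    except ValueError:
        raise ValueError(
            f"query parameter 'n' must be an integer, got {raw[:30]!r}"
        ) from None


# Same shape as the logged request, shortened to one placeholder feed.
broken = "http://localhost:8000/atom?n=0%3Ff%3Dhttps%3A%2F%2Fexample.org%2Ffeed"
print(parse_qs(urlsplit(broken).query))
# {'n': ['0?f=https://example.org/feed']}

fixed = build_mixer_url(
    "http://localhost:8000/atom", ["https://example.org/feed?a=1&b=2"]
)
print(parse_qs(urlsplit(fixed).query))
# {'n': ['0'], 'f': ['https://example.org/feed?a=1&b=2']}
```

The feed URLs inside the script separate their own parameters with %3F as well, so they would likely need the same treatment.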