pictuga / morss
Get full text RSS feeds
Home Page: https://morss.it/
License: GNU Affero General Public License v3.0
When trying to connect with Facebook, the following error appears:
Can't Load URL: The domain of this URL isn't included in the app's domains. To be able to load this URL, add all domains and subdomains of your app to the App Domains field in your app settings.
I'm getting this new error. I'm running in Docker, built from git.
URL: www.maketecheasier.com/feed
The problem occurs only when self-hosting; it works fine with morss.it.
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 196, in cgi_file_handler
return app(environ, start_response)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 132, in cgi_app
url, rss = FeedFetch(url, options)
File "/usr/lib/python3.8/site-packages/morss/morss.py", line 275, in FeedFetch
raise MorssException('Error downloading feed')
morss.morss.MorssException: Error downloading feed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 134, in handle
self.handle_request(listener, req, client, addr)
File "/usr/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 156, in app_wrap
return func(environ, start_response, app)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 266, in cgi_encode
out = app(environ, start_response)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 156, in app_wrap
return func(environ, start_response, app)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 260, in cgi_error_handler
log('ERROR: %s' % repr(e), force=True)
TypeError: log() got an unexpected keyword argument 'force'
[2021-01-03 15:29:41 +0000] [8] [ERROR] Error handling request /favicon.ico
Traceback (most recent call last):
File "/usr/lib/python3.8/urllib/request.py", line 1350, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/usr/lib/python3.8/http/client.py", line 1255, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1301, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1250, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1010, in _send_output
self.send(msg)
File "/usr/lib/python3.8/http/client.py", line 950, in send
self.connect()
File "/usr/lib/python3.8/http/client.py", line 921, in connect
self.sock = self._create_connection(
File "/usr/lib/python3.8/socket.py", line 787, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "/usr/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name does not resolve
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/morss/morss.py", line 272, in FeedFetch
req = crawler.adv_get(url=url, follow=('rss' if not options.items else None), delay=delay, timeout=TIMEOUT * 2)
File "/usr/lib/python3.8/site-packages/morss/crawler.py", line 92, in adv_get
con = custom_handler(*args, **kwargs).open(url, timeout=timeout)
File "/usr/lib/python3.8/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/lib/python3.8/urllib/request.py", line 1379, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/lib/python3.8/urllib/request.py", line 1353, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -2] Name does not resolve>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 252, in cgi_error_handler
return app(environ, start_response)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 156, in app_wrap
return func(environ, start_response, app)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 246, in cgi_dispatcher
return app(environ, start_response)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 156, in app_wrap
return func(environ, start_response, app)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 196, in cgi_file_handler
return app(environ, start_response)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 132, in cgi_app
url, rss = FeedFetch(url, options)
File "/usr/lib/python3.8/site-packages/morss/morss.py", line 275, in FeedFetch
raise MorssException('Error downloading feed')
morss.morss.MorssException: Error downloading feed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 134, in handle
self.handle_request(listener, req, client, addr)
File "/usr/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 156, in app_wrap
return func(environ, start_response, app)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 266, in cgi_encode
out = app(environ, start_response)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 156, in app_wrap
return func(environ, start_response, app)
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 260, in cgi_error_handler
log('ERROR: %s' % repr(e), force=True)
TypeError: log() got an unexpected keyword argument 'force'
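The chain above bottoms out in `socket.gaierror: [Errno -2] Name does not resolve`, and the logged request is `/favicon.ico`, so it may be worth checking which host names the container can actually resolve. A minimal sketch for running inside the container; `can_resolve` and its `resolver` parameter are hypothetical helpers for illustration, not part of morss:

```python
import socket


def can_resolve(host, port=80, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address."""
    try:
        # Same call shape as the one that fails in the traceback.
        return len(resolver(host, port, 0, socket.SOCK_STREAM)) > 0
    except socket.gaierror:
        return False
```

Calling `can_resolve("www.maketecheasier.com")` from a Python shell in the container would show whether the feed host itself resolves there.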
First I received a 50x error. Now the landing page is displayed, but it can't connect to any feed. Are you working on getting morss.it up and running again?
Hello
I tried to install morss on an up-to-date Arch Linux.
> pip install git+https://git.pictuga.com/pictuga/morss.git@master
Collecting git+https://git.pictuga.com/pictuga/morss.git@master
Cloning https://git.pictuga.com/pictuga/morss.git (to revision master) to /tmp/pip-req-build-wgrp5hhd
Running command git clone -q https://git.pictuga.com/pictuga/morss.git /tmp/pip-req-build-wgrp5hhd
Requirement already satisfied (use --upgrade to upgrade): morss==0.0.0 from git+https://git.pictuga.com/pictuga/morss.git@master in /usr/lib/python3.8/site-packages
Requirement already satisfied: lxml in /usr/lib/python3.8/site-packages (from morss==0.0.0) (4.5.2)
Requirement already satisfied: bs4 in /usr/lib/python3.8/site-packages (from morss==0.0.0) (0.0.1)
Requirement already satisfied: python-dateutil in /usr/lib/python3.8/site-packages (from morss==0.0.0) (2.8.1)
Requirement already satisfied: chardet in /usr/lib/python3.8/site-packages (from morss==0.0.0) (3.0.4)
Requirement already satisfied: pymysql in /usr/lib/python3.8/site-packages (from morss==0.0.0) (0.10.0)
Requirement already satisfied: beautifulsoup4 in /usr/lib/python3.8/site-packages (from bs4->morss==0.0.0) (4.9.1)
Requirement already satisfied: six>=1.5 in /usr/lib/python3.8/site-packages (from python-dateutil->morss==0.0.0) (1.15.0)
Requirement already satisfied: soupsieve>1.2 in /usr/lib/python3.8/site-packages (from beautifulsoup4->bs4->morss==0.0.0) (2.0.1)
Building wheels for collected packages: morss
Building wheel for morss (setup.py) ... done
Created wheel for morss: filename=morss-0.0.0-py3-none-any.whl size=62552 sha256=6075ad834cfcecdea16f668925aa5f1725db3c1ec27dfd2c67bb740af00426e5
Stored in directory: /tmp/pip-ephem-wheel-cache-1iodvrzu/wheels/fa/e1/35/7dc2cbdfdaa5b83a83c5ed461628da31febcef62e43ab29823
Successfully built morss
But when I try to use it:
morss --help
Traceback (most recent call last):
File "/usr/bin/morss", line 33, in <module>
sys.exit(load_entry_point('morss==0.0.0', 'console_scripts', 'morss')())
File "/usr/bin/morss", line 25, in importlib_load_entry_point
return next(matches).load()
File "/usr/lib/python3.8/importlib/metadata.py", line 79, in load
return functools.reduce(getattr, attrs, module)
AttributeError: module 'morss' has no attribute 'main'
Can you help me?
Thank you very much
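The traceback shows the console script resolving its target with `functools.reduce(getattr, attrs, module)`, so the `AttributeError` means the installed `morss` package exposes no top-level `main`. A minimal sketch of that lookup, assuming the usual `module:attr` entry-point syntax; `resolve_target` and `has_target` are hypothetical helper names:

```python
import functools
import importlib


def resolve_target(spec):
    """Resolve a 'module:attr.path' string the way a console script does."""
    module_name, _, attr_path = spec.partition(":")
    module = importlib.import_module(module_name)
    attrs = [part for part in attr_path.split(".") if part]
    return functools.reduce(getattr, attrs, module)


def has_target(spec):
    """Return True if the entry-point target can be imported and found."""
    try:
        resolve_target(spec)
        return True
    except (ImportError, AttributeError):
        return False
```

In the situation reported above, `has_target("morss:main")` would be expected to return `False`.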
Hi,
I've tried to install morss but have hit upon an issue when running the test scenario:
MyMachine$ python -m debug http://www/bbc/co/uk/bla/bla.xml
ERROR: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
Stackoverflow suggested adding ".text_factory = str" in, which I did in the init function of the SQLiteCache function in crawlers.py, which appears to have solved my problem. Later comments suggest this isn't the right solution and the data should be converted to binary:
I'm not sure what's right, but thought I'd flag it up to you. I've not had a chance to use this properly yet, but looks very useful - thanks for sharing it.
Cheers,
Iain.
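A sketch of the "convert to binary" alternative mentioned above, assuming a cache table with a BLOB column. The table layout and function names are hypothetical and not taken from morss. Wrapping raw bytes in `sqlite3.Binary` stores them as a BLOB rather than as text, so no `text_factory` override is needed:

```python
import sqlite3


def make_cache():
    """Create an in-memory cache table with a BLOB column for page bodies."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (url TEXT PRIMARY KEY, body BLOB)")
    return con


def store(con, url, raw_bytes):
    # sqlite3.Binary marks the value as binary data instead of text.
    con.execute(
        "INSERT OR REPLACE INTO data VALUES (?, ?)",
        (url, sqlite3.Binary(raw_bytes)),
    )


def load(con, url):
    row = con.execute("SELECT body FROM data WHERE url = ?", (url,)).fetchone()
    return bytes(row[0]) if row else None
```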
Hi there,
in file feeds.py there is an undefined variable "f" around line 320. Seems like remnant of some renaming. Here the whole function:
def __set__(self, instance, value):
    feedlist = self.__get__(instance)
    [x.remove() for x in [x for x in f.items]] # f is not defined
    [feedlist.append(x) for x in value]
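If `f` was meant to be `feedlist` (an assumption; the report does not say), the setter logic could look like the sketch below. `Item`, `FeedList` and `replace_items` are hypothetical stand-ins so the example runs on its own; they are not morss's classes:

```python
class Item:
    def __init__(self, owner, value):
        self.owner = owner
        self.value = value

    def remove(self):
        self.owner.items.remove(self)


class FeedList:
    def __init__(self):
        self.items = []

    def __iter__(self):
        return iter(self.items)

    def append(self, value):
        self.items.append(Item(self, value))


def replace_items(feedlist, values):
    """Drop every existing item, then append the new values."""
    # Iterate over a copy: removing from the live list while iterating skips items.
    for item in list(feedlist):
        item.remove()
    for value in values:
        feedlist.append(value)
```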
Just reporting that this feed is not working:
https://blog.sysaid.com/feed?post_type=sysaid_blog
https://morss.it/https://blog.sysaid.com/feed?post_type=sysaid_blog
I'm trying to use Morss with Android Police, but I get an error saying it couldn't load the feed.
Hi,
I managed to get some results by fetching rss via morss.it. It takes a long time but it's working.
The url used:
https://morss.it/:items=||*[class=dataList-cell]||a/https://platinmods.com/advancedsearch/advancedsearch-results?type=thread&keywords=&posted_by=&search_forums[]=156
The problem is that it doesn't work with my own server. After 30 seconds I get:
[2021-01-13 10:50:38 +0000] [30] [INFO] Booting worker with pid: 30
[2021-01-13 10:51:08 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:27)
[2021-01-13 10:51:08 +0000] [27] [INFO] Worker exiting (pid: 27)
Even if I set TIMEOUT to more than 30 seconds, the error still occurs after 30 seconds.
What are the morss.it server parameters?
Here is my docker-compose.yml:
version: "2"
services:
  morss:
    build: /home/xxxxxx/sources/morss
    ports:
      - '9090:8080'
    restart: unless-stopped
    environment:
      - MAX_ITEM=100
      - LIM_TIME=-1
      - MAX_TIME=-1
      - LIM_ITEM=100
      - TIMEOUT=40
      - IGNORE_SSL=1
      - DEBUG=1
Another thing: Even with DEBUG=1 my docker logs don't return more infos.
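The 30-second `WORKER TIMEOUT` matches gunicorn's default worker timeout, which is a gunicorn setting and not necessarily affected by an application-level `TIMEOUT` variable. A sketch of a `gunicorn.conf.py` that raises it; the 120-second value is an arbitrary assumption, and the other values mirror a `gunicorn --bind 0.0.0.0:8080 -w 4 --preload morss` style of invocation:

```python
# gunicorn.conf.py (gunicorn config files are plain Python)
bind = "0.0.0.0:8080"
workers = 4
preload_app = True
timeout = 120  # seconds a worker may stay silent before being killed; default is 30
```

It would be loaded with `gunicorn -c gunicorn.conf.py morss`; passing `--timeout 120` on the command line should have the same effect.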
Started having issues with feed (www.accountingtoday.com/feed?rss=true) through morss.it, results in Invalid SSL Certificate errors. Can you please help?
Hi,
For easily a month and a half now I've been trying to install morss (I don't know Python at all, which doesn't help...) without success.
Whatever method I use, I always end up with a "non-existent app" and/or import problem.
Latest example (tonight), following for the umpteenth time the tutorial at https://blog.ronsonchan.com/setting-up-morss-full-text-rss-expander-on-debian-wheezy/ (which you promoted on Twitter).
When I get to this step:
4 -Set up uWSGI
c. Test to see if it works, run uwsgi --http-socket :8080 --wsgi-file morss.py --callable cgi_wrapper
d. Access http://:8080 and you should get the morss default page.
In the terminal, I get:
*** Starting uWSGI 2.0.11.1 (32bit) on [Tue Sep 22 20:23:14 2015] ***
compiled with version: 4.6.3 on 21 September 2015 21:44:50
os: Linux-3.18.11+ #781 PREEMPT Tue Apr 21 18:02:18 BST 2015
nodename: raspberrypi
machine: armv6l
clock source: unix
detected number of CPU cores: 1
current working directory: /usr/share/nginx/www/morss
detected binary path: /usr/share/nginx/www/morss/morss_venv/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
*** WARNING: you are running uWSGI without its master process manager ***
your processes number limit is 3416
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to TCP address :8080 fd 3
Python version: 2.7.3 (default, Mar 18 2014, 05:13:23) [GCC 4.6.3]
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x1f8d780
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 64256 bytes (62 KB) for 1 cores
*** Operational MODE: single process ***
Traceback (most recent call last):
File "morss.py", line 15, in <module>
from . import feeds
ValueError: Attempted relative import in non-package
unable to load app 0 (mountpoint='') (callable not found or import error)
*** no app loaded. going in full dynamic mode ***
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI worker 1 (and the only) (pid: 10738, cores: 1)
In my browser:
Internal Server Error
Then (after the request from the browser) in my terminal:
--- no python application found, check your startup logs for errors ---
[pid: 10738|app: -1|req: -1/3] 192.168.0.254 () {30 vars in 503 bytes} [Tue Sep 22 20:25:15 2015] GET /favicon.ico => generated 21 bytes in 1 msecs (HTTP/1.1 500) 2 headers in 83 bytes (0 switches on core 0)
To begin with, I have trouble understanding the error on line 15: feeds.py is in the same directory as morss.py.
What's wrong?
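The log shows `morss.py` being loaded as a standalone file (`--wsgi-file morss.py`), so `from . import feeds` has no parent package to be relative to, even though `feeds.py` sits next to it. A small sketch that reproduces the effect on Python 3, where the message is an `ImportError` about "no known parent package" rather than Python 2.7's `ValueError`; the package name `demo_pkg` is made up for the demonstration:

```python
import os
import subprocess
import sys
import tempfile


def run_both_ways():
    """Return (script_ok, module_ok) for a tiny package using `from . import`."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "demo_pkg")  # hypothetical package name
        os.mkdir(pkg)
        files = {
            "__init__.py": "",
            "feeds.py": "NAME = 'feeds'\n",
            "main.py": "from . import feeds\nprint(feeds.NAME)\n",
        }
        for name, body in files.items():
            with open(os.path.join(pkg, name), "w") as handle:
                handle.write(body)

        # Run the file directly: Python sees a lone script, not a package member.
        as_script = subprocess.run(
            [sys.executable, os.path.join(pkg, "main.py")],
            cwd=root, capture_output=True, text=True,
        )
        # Run it as a module of the package: the relative import resolves.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.main"],
            cwd=root, capture_output=True, text=True,
        )
        return as_script.returncode == 0, as_module.returncode == 0
```

The direct run fails and the `-m` run succeeds, which suggests the code has to be loaded through its package rather than by file path.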
Hello,
I have an existing Docker Compose setup to which I want to add morss as another service.
Is there an image available on Docker Hub or GitHub?
If I try git.pictuga.com/pictuga/morss.git as is, it obviously gives an error.
When I download the source and build the Docker container, I see that the package 'wheel' is not installed.
How can I fix it?
Building morss
Step 1/5 : FROM alpine:latest
---> a24bb4013296
Step 2/5 : RUN apk add --no-cache python3 py3-lxml py3-gunicorn py3-pip git
---> Using cache
---> 5af6122bab90
Step 3/5 : ADD . /app
---> a670f3716eb1
Step 4/5 : RUN pip3 install /app
---> Running in 6b4f859dfe1a
Processing /app
Requirement already satisfied: lxml in /usr/lib/python3.8/site-packages (from morss==0.0.0) (4.5.1)
Collecting bs4
Downloading bs4-0.0.1.tar.gz (1.1 kB)
Collecting python-dateutil
Downloading python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Requirement already satisfied: chardet in /usr/lib/python3.8/site-packages (from morss==0.0.0) (3.0.4)
Collecting pymysql
Downloading PyMySQL-0.10.1-py2.py3-none-any.whl (47 kB)
Collecting beautifulsoup4
Downloading beautifulsoup4-4.9.3-py3-none-any.whl (115 kB)
Requirement already satisfied: six>=1.5 in /usr/lib/python3.8/site-packages (from python-dateutil->morss==0.0.0) (1.15.0)
Collecting soupsieve>1.2; python_version >= "3.0"
Downloading soupsieve-2.0.1-py3-none-any.whl (32 kB)
Using legacy setup.py install for morss, since package 'wheel' is not installed.
Using legacy setup.py install for bs4, since package 'wheel' is not installed.
Installing collected packages: soupsieve, beautifulsoup4, bs4, python-dateutil, pymysql, morss
Running setup.py install for bs4: started
Running setup.py install for bs4: finished with status 'done'
Running setup.py install for morss: started
Running setup.py install for morss: finished with status 'done'
Successfully installed beautifulsoup4-4.9.3 bs4-0.0.1 morss-0.0.0 pymysql-0.10.1 python-dateutil-2.8.1 soupsieve-2.0.1
Removing intermediate container 6b4f859dfe1a
---> fd192327b049
Step 5/5 : CMD gunicorn --bind 0.0.0.0:8080 -w 4 --preload morss
---> Running in e2194b9749eb
Removing intermediate container e2194b9749eb
---> 724ac9294c12
Successfully built 724ac9294c12
Hello,
Some feeds don't get their photos.
For example: http://www.sudouest.fr/pyrenees-atlantiques/bayonne/rss.xml
Hi!
I have some feeds that take a long time to fetch (like Google Trends). My ttrss instance has a 15 s timeout, and I wanted morss to stop trying to fetch a little before that.
So I tried to set MAX_TIME and LIM_TIME, but I didn't succeed. I don't know the unit (I suppose it is seconds), but I tried 10 for both.
I used environment variables for Docker:
environment:
  MAX_TIME: 10
  LIM_TIME: 10
Am I doing something wrong?
Hi,
I can't manage to add a favicon to my morss site. I added favicon.ico in each morss folder to be sure and added this line to index.html header:
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" sizes="32x32" />
I tried other things without success.
When I click the favicon link in the page source I get this:
<!-- The above is a description of an error in a Python program, formatted
for a Web browser because the 'cgitb' module was enabled. In case you
are not reading this in a Web browser, here is the original traceback:
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/morss/morss.py", line 294, in FeedFetch
req = crawler.adv_get(url=url, follow=('rss' if not options.items else None), delay=delay, timeout=TIMEOUT * 2)
File "/usr/lib/python3.8/site-packages/morss/crawler.py", line 68, in adv_get
con = custom_handler(*args, **kwargs).open(url, timeout=timeout)
File "/usr/lib/python3.8/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/lib/python3.8/urllib/request.py", line 1379, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/lib/python3.8/urllib/request.py", line 1319, in do_open
h = http_class(host, timeout=req.timeout, **http_conn_args)
File "/usr/lib/python3.8/http/client.py", line 835, in __init__
self._validate_host(self.host)
File "/usr/lib/python3.8/http/client.py", line 1208, in _validate_host
raise InvalidURL(f"URL can't contain control characters. {host!r} "
http.client.InvalidURL: URL can't contain control characters. "{% static '" (found at least ' ')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/morss/morss.py", line 639, in cgi_error_handler
return app(environ, start_response)
File "/usr/lib/python3.8/site-packages/morss/morss.py", line 533, in app_wrap
return func(environ, start_response, app)
File "/usr/lib/python3.8/site-packages/morss/morss.py", line 633, in cgi_dispatcher
return app(environ, start_response)
File "/usr/lib/python3.8/site-packages/morss/morss.py", line 533, in app_wrap
return func(environ, start_response, app)
File "/usr/lib/python3.8/site-packages/morss/morss.py", line 583, in cgi_file_handler
return app(environ, start_response)
File "/usr/lib/python3.8/site-packages/morss/morss.py", line 509, in cgi_app
url, rss = FeedFetch(url, options)
File "/usr/lib/python3.8/site-packages/morss/morss.py", line 297, in FeedFetch
raise MorssException('Error downloading feed')
morss.morss.MorssException: Error downloading feed
-->
Hey, there is a bug when trying to run morss with Python 3, namely when executing python3 main.py "$FEED".
I get ERROR: '%' must be followed by '%' or '(', found: '%a, %d %b %Y %H:%M:%S %Z\n%a, %d %b %Y %H:%M:%S %Z\n%Y-%m-%dT%H:%M:%SZ\n%Y-%m-%dT%H:%M:%SZ'
It seems to be related to the ConfigParser; this might be a related issue: CGATOxford/CGATPipelines#335
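The message matches what Python 3's `configparser` raises when a value containing bare `%` signs (such as a `strftime` pattern) goes through its default interpolation. A minimal sketch of the failure and one way around it, using a made-up section and option name; whether morss should disable interpolation or escape the values as `%%` is left open:

```python
import configparser

# Hypothetical config content; only the date pattern comes from the error above.
RAW = "[dates]\nformat = %a, %d %b %Y %H:%M:%S %Z\n"


def read_format(interpolation_enabled):
    """Read the date format with or without '%' interpolation."""
    if interpolation_enabled:
        parser = configparser.ConfigParser()
    else:
        parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(RAW)
    return parser.get("dates", "format")
```

`read_format(True)` raises `configparser.InterpolationSyntaxError`, while `read_format(False)` returns the pattern unchanged. Calling `parser.get(..., raw=True)` on a default parser is another option.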
Hi!
I have a big issue with morss: the app acts like it has a memory leak:
The only way to handle the problem is to restart the container, which is not very handy :).
I use the container version of morss, built from https://git.pictuga.com/pictuga/morss.git, without passing any argument (so on gunicorn). I rebuilt it yesterday without any improvement.
To handle this problem, I tried the following, without success:
I had some problems connecting morss to MariaDB, but I think it "works" because I see rows in the data tables:
I didn't fully understand how some environment variables work (MAX_ITEM & LIM_ITEM); maybe I need to use them?
Thanks for the help :).
Hello,
I have 2 websites that give rss link:
https://secouchermoinsbete.fr/feeds.atom
https://consomac.fr/rss/consomac.xml
But morss.it doesn't give the full text. Could you check?
Thank you
Hi,
I've installed morss using Docker, and when I type this URL: https://www.fcbarcelona.com/en/football/first-team/news I get these errors:
(but it works if I enter a valid RSS feed URL)
Just getting empty story bodies when trying to parse Re/Code: http://morss.it/recode.net/feed/
Side note: awesome app, just found it today and it is working great on several sites for me!
The Morss feed for hacker news is really great, however there's no easy way to access the comments.
I know Morss is more of a general rss tool, but any chance you'd consider adding features to combine the full text of hacker news posts and the comments?
@commit: 5c2151f
Running as CLI, you get this error
root@jolokia:~/git/morss# python2.7 -m morss http://www.***.it/rss/homepage/rss2.0.xml
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 151, in _run_module_as_main
mod_name, loader, code, fname = _get_module_details(mod_name)
File "/usr/lib/python2.7/runpy.py", line 109, in _get_module_details
return _get_module_details(pkg_main_name)
File "/usr/lib/python2.7/runpy.py", line 101, in _get_module_details
loader = get_loader(mod_name)
File "/usr/lib/python2.7/pkgutil.py", line 464, in get_loader
return find_loader(fullname)
File "/usr/lib/python2.7/pkgutil.py", line 474, in find_loader
for importer in iter_importers(fullname):
File "/usr/lib/python2.7/pkgutil.py", line 430, in iter_importers
__import__(pkg)
File "morss/__init__.py", line 1, in <module>
from .morss import *
File "morss/morss.py", line 202
'R': ':', 'S': 'www.', , 'T': '#', 'U': '$', 'V': '~', 'W': '!',
^
SyntaxError: invalid syntax
Massimo
I'm trying to use morss with Lifehacker (https://lifehacker.com/rss), and it works great in combined mode, except that all items from domains other than lifehacker.com are stripped out of the feed.
For instance, items with a link to https://lifehacker.com and https://vitals.lifehacker.com are kept, but items linking to https://kinjadeals.theinventory.com or https://theinventory.com are dropped.
Any ideas on how to avoid this?
There is a small problem when using the full-text content for some sites: when we use an RSS reader to read something from a site, the server may block access to the image sources inside the article. The server checks the request headers to see whether the request comes from a user who is reading the page on its website. This certainly helps keep annoying web spiders away, but it is not friendly to RSS users.
So I'm wondering whether we could embed the pictures from the article directly in the full-text feed?
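One way to embed pictures is to inline them as `data:` URIs, so the reader never has to request them from the original server. A minimal sketch, assuming the image bytes have already been downloaded; `to_data_uri` and `inline_image` are hypothetical helpers, and a real implementation would also need to consider feed size:

```python
import base64


def to_data_uri(image_bytes, mime_type):
    """Encode raw image bytes as a data: URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return "data:{};base64,{}".format(mime_type, encoded)


def inline_image(html, src, image_bytes, mime_type):
    """Replace one image URL in `html` with its inline data URI."""
    old = 'src="{}"'.format(src)
    new = 'src="{}"'.format(to_data_uri(image_bytes, mime_type))
    return html.replace(old, new)
```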
Hi there. I've been using this tool for some time, amazing work, thank you so much.
A few days back, after a lot of frustrations with the YouTube for iOS application I decided to remove all my subscriptions and turn that into RSS feeds.
But since YouTube feeds are crap, I tried using moRSS, but I get empty articles (with only the link to the original video). I think it would be great to put an embed link to the video in the article, so that, for example, some iOS RSS readers would use their native player.
The YouTube feeds come from the base URL:
http://gdata.youtube.com/feeds/base/users/[USERNAME]/uploads?client=ytapi-youtube-rss-redirect&alt=rss&orderby=updated&v=2
And the embed base URL is:
https://www.youtube.com/embed/[VIDEO_ID]
I will try to hack as soon as possible with your code, but I'm a beginner in Python, and only worked with Python 3 so there are some things that I will have to learn.
Thanks
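A rough sketch of the idea, assuming each feed item links to a `watch?v=VIDEO_ID` URL (that link format is an assumption, not something stated above). The embed base URL is the one quoted in the report; the function names are made up:

```python
from urllib.parse import parse_qs, urlparse

EMBED_BASE = "https://www.youtube.com/embed/"


def embed_url(watch_url):
    """Build the embed URL from a watch?v=... link, or return None."""
    video_ids = parse_qs(urlparse(watch_url).query).get("v")
    if not video_ids:
        return None
    return EMBED_BASE + video_ids[0]


def embed_html(watch_url):
    """Return an iframe snippet for the item body, or an empty string."""
    url = embed_url(watch_url)
    if url is None:
        return ""
    return '<iframe src="{}" allowfullscreen></iframe>'.format(url)
```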
I'm a complete newbie and I'm really sorry if this is a very simple problem to solve, but I'd be grateful for any guidance. I'm trying to get a local RSS to CSV downloader. This is what happens when I run morss from the terminal (Ubuntu 18.10):
python -m morss debug http://feeds.bbci.co.uk/news/rss.xml
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/david/programas/morss/morss/morss.py", line 14, in <module>
from . import feeds
ValueError: Attempted relative import in non-package
All solutions I find seem to suggest to change the script itself, not anything about the environment or the command.
Thanks!
I tried getting an rss feed of a public facebook page. (fb.com/pageUsername/posts)
But fb requires me to prove that I'm human.
It would be great if fb would accept the moRSS interface as non-robot.
a solution would be to let moRSS access the page through a dummy fb profile, as it is also done by archive.today
List of small bugs / things yet to be implemented
application/xhtml+xml as rss mimetype (kinda wrong but still the case sometimes...)
I'm trying your online tool and it has problems with:
Edit: Removed
3. https://blog.google/rss/ <-- I see that now they now are full text, so it doesn't need to pass by a full-text rss tool
Hello, thank you very much for your tool, it is a rare pearl in the web as it is today.
I'm looking for a job, so I want to synchronize the different sources in my RSS reader, thus I do not have to always check every website. There are 3 french websites for which it is impossible for me to add them:
Apec : https://www.apec.fr/candidat/recherche-emploi.html/emploi?motsCles=data%20scientist&lieux=711&sortsType=DATE
Here, according to Firefox's Inspect Element tool, the interesting class is container-result (the cardboard list)
Pôle Emploi : https://candidat.pole-emploi.fr/offres/recherche?lieux=11R&motsCles=data+scientist&offresPartenaires=false&range=0-19&rayon=10&tri=0
Here, the interesting class is result-list list-unstyled
Welcome to the jungle : https://www.welcometothejungle.com/fr/jobs?query=data%20scientist&page=1&aroundQuery=%C3%8Ele-de-France%2C%20France&refinementList%5Boffice.state%5D%5B0%5D=Ile-de-France&refinementList%5Boffice.country_code%5D%5B0%5D=FR
Here, the interesting class is qjc24y-1 jhGOKF
For each URL, I tried to customize the classes in morss.it but it doesn't work. What am I doing wrong?
Thank you for your precious help!
Hello, and sorry if it's a mistake on my part, but when trying to make a rss feed for https://shonumi.github.io/articles.html it only ends up grabbing the first article. I'm using the following command morss --items "//*[class=inner_text_large]" https://shonumi.github.io/articles.html
Using the website and selecting that element selects 5 articles. For some reason the cli version is stopping at the first one.
My version is current as I installed morss today.
Hi Pictuga.
I know, this is an issue request. But first I wanna thank you for this amazing project! For one of my current projects, it works like a charm and has everything I could have asked for.
Now I have one problem though:
I'm using morss in a Flask web app, running on Apache WSGI. If I try using a SQLite database for caching, I receive the following error:
File "/venv/lib/python3.6/site-packages/morss/crawler.py", line 616, in __setitem__
self.con.execute('INSERT INTO data VALUES (?,?,?,?,?,?) ON CONFLICT(url) DO UPDATE SET code=?, msg=?, headers=?, data=?, timestamp=?', (url,) + value + value)
sqlite3.OperationalError: near "ON": syntax error
I'm aware that this error is most definitely not caused by morss itself. In fact, I can run the same code from the same venv's BASH and it will work.
Still, I can't get it to work from my flask app (always getting this very same error).
What I have tried:
Hope you can help. Would be much appreciated :)
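The `ON CONFLICT ... DO UPDATE` (upsert) syntax needs SQLite 3.24.0 or later, so `near "ON": syntax error` would be consistent with the Apache process loading an older SQLite library than the shell does. That is a guess, but easy to check by logging `sqlite3.sqlite_version` from inside the Flask app. A sketch of a version check with a fallback, using a simplified two-column table rather than morss's real schema:

```python
import sqlite3

UPSERT_MIN_VERSION = (3, 24, 0)


def supports_upsert(version_info=sqlite3.sqlite_version_info):
    """True if the linked SQLite library understands ON CONFLICT ... DO UPDATE."""
    return tuple(version_info) >= UPSERT_MIN_VERSION


def save(con, url, data):
    if supports_upsert():
        con.execute(
            "INSERT INTO data VALUES (?, ?) "
            "ON CONFLICT(url) DO UPDATE SET data=?",
            (url, data, data),
        )
    else:
        # Older SQLite: replace the whole row instead.
        con.execute("INSERT OR REPLACE INTO data VALUES (?, ?)", (url, data))
```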
morss will also try to figure out whether the full content is already in place (for those websites which understood the whole point of RSS feeds). However this detection is very simple, and only works if the actual content is put in the "content" section in the feed and not in the "summary" section.
Many RSS feeds put the full content in the "summary" or "description" section. I wonder whether there is a better way to detect it.
Simply setting a content-length threshold does not work because there are many short feeds.
How about is_fulltext = num_images > 0 or content_length > 2000?
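The proposed heuristic could be sketched as follows with the standard library's HTML parser. The 2000-character threshold comes from the suggestion above; counting only visible text (not markup) is an added assumption, and none of this is morss's current detection logic:

```python
from html.parser import HTMLParser


class ContentStats(HTMLParser):
    """Count <img> tags and visible text length in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.num_images = 0
        self.text_length = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.num_images += 1

    def handle_data(self, data):
        self.text_length += len(data.strip())


def is_fulltext(html, min_length=2000):
    stats = ContentStats()
    stats.feed(html)
    stats.close()
    return stats.num_images > 0 or stats.text_length > min_length
```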
moRSS is a great tool for many pages that miss an RSS feed.
But for WordPress-based webpages, it does not parse the actual linked sub-page (e.g. for categories, tags, or search result pages). Instead it just takes the front page of the domain. It also does not load a preview where I can pick the CSS elements of the sub-page I linked.
Hi, I am interested in this project. Previously I was looking at something similar (https://bitbucket.org/fivefilters/full-text-rss) but I was glad to discover morss since I prefer Python over PHP.
I noticed that some feeds are more difficult than others, for example on this one:
http://www.internazionale.it/sitemaps/rss.xml
morss does not really extract the content:
http://test.morss.it/www.internazionale.it/sitemaps/rss.xml
What works well is Firefox reading mode. After some research I found that it is based on the old readability.com javascript code, which is now here: https://github.com/mozilla/readability; it can be run standalone from node.js.
Chrome has similar functionality in testing, and the source for that is here: https://github.com/chromium/dom-distiller; this one seems more complex to run as it depends on Java...
What is the best way to incorporate more sophisticated algorithms for content extraction in morss?
And what about customizing the extraction rules on a site-by-site basis? full-text-rss above has a repository of site-specific extraction rules: https://github.com/fivefilters/ftr-site-config
morss.it says it cannot load this feed: https://www.raggajungle.biz/category/free-downloads/feed
After today's update, morss stopped with the following error.
I am running on Docker.
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/usr/lib/python3.8/site-packages/gunicorn/workers/base.py", line 129, in init_process
self.load_wsgi()
File "/usr/lib/python3.8/site-packages/gunicorn/workers/base.py", line 138, in load_wsgi
self.wsgi = self.app.wsgi()
File "/usr/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/usr/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 52, in load
return self.load_wsgiapp()
File "/usr/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 41, in load_wsgiapp
return util.import_app(self.app_uri)
File "/usr/lib/python3.8/site-packages/gunicorn/util.py", line 350, in import_app
__import__(module)
File "/usr/lib/python3.8/site-packages/morss/__init__.py", line 3, in <module>
from .wsgi import application
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 20, in <module>
from . import cred
ImportError: cannot import name 'cred' from partially initialized module 'morss' (most likely due to a circular import) (/usr/lib/python3.8/site-packages/morss/__init__.py)
[2020-08-24 15:12:28 +0000] [8] [INFO] Worker exiting (pid: 8)
[2020-08-24 15:12:28 +0000] [9] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/usr/lib/python3.8/site-packages/gunicorn/workers/base.py", line 129, in init_process
self.load_wsgi()
File "/usr/lib/python3.8/site-packages/gunicorn/workers/base.py", line 138, in load_wsgi
self.wsgi = self.app.wsgi()
File "/usr/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/usr/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 52, in load
return self.load_wsgiapp()
File "/usr/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 41, in load_wsgiapp
return util.import_app(self.app_uri)
File "/usr/lib/python3.8/site-packages/gunicorn/util.py", line 350, in import_app
__import__(module)
File "/usr/lib/python3.8/site-packages/morss/__init__.py", line 3, in <module>
from .wsgi import application
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 20, in <module>
from . import cred
ImportError: cannot import name 'cred' from partially initialized module 'morss' (most likely due to a circular import) (/usr/lib/python3.8/site-packages/morss/__init__.py)
[2020-08-24 15:12:28 +0000] [9] [INFO] Worker exiting (pid: 9)
/usr/lib/python3.8/os.py:1023: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
return io.open(fd, *args, **kwargs)
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/gunicorn/arbiter.py", line 210, in run
self.sleep()
File "/usr/lib/python3.8/site-packages/gunicorn/arbiter.py", line 360, in sleep
ready = select.select([self.PIPE[0]], [], [], 1.0)
File "/usr/lib/python3.8/site-packages/gunicorn/arbiter.py", line 245, in handle_chld
self.reap_workers()
File "/usr/lib/python3.8/site-packages/gunicorn/arbiter.py", line 525, in reap_workers
raise HaltServer(reason, self.WORKER_BOOT_ERROR)
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/bin/gunicorn", line 11, in <module>
load_entry_point('gunicorn==19.9.0', 'console_scripts', 'gunicorn')()
File "/usr/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 61, in run
WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
File "/usr/lib/python3.8/site-packages/gunicorn/app/base.py", line 223, in run
super(Application, self).run()
File "/usr/lib/python3.8/site-packages/gunicorn/app/base.py", line 72, in run
Arbiter(self).run()
File "/usr/lib/python3.8/site-packages/gunicorn/arbiter.py", line 232, in run
self.halt(reason=inst.reason, exit_status=inst.exit_status)
File "/usr/lib/python3.8/site-packages/gunicorn/arbiter.py", line 345, in halt
self.stop()
File "/usr/lib/python3.8/site-packages/gunicorn/arbiter.py", line 393, in stop
time.sleep(0.1)
File "/usr/lib/python3.8/site-packages/gunicorn/arbiter.py", line 245, in handle_chld
self.reap_workers()
File "/usr/lib/python3.8/site-packages/gunicorn/arbiter.py", line 525, in reap_workers
raise HaltServer(reason, self.WORKER_BOOT_ERROR)
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
[2020-08-24 15:12:28 +0000] [10] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/usr/lib/python3.8/site-packages/gunicorn/workers/base.py", line 129, in init_process
self.load_wsgi()
File "/usr/lib/python3.8/site-packages/gunicorn/workers/base.py", line 138, in load_wsgi
self.wsgi = self.app.wsgi()
File "/usr/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/usr/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 52, in load
return self.load_wsgiapp()
File "/usr/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 41, in load_wsgiapp
return util.import_app(self.app_uri)
File "/usr/lib/python3.8/site-packages/gunicorn/util.py", line 350, in import_app
__import__(module)
File "/usr/lib/python3.8/site-packages/morss/__init__.py", line 3, in <module>
from .wsgi import application
File "/usr/lib/python3.8/site-packages/morss/wsgi.py", line 20, in <module>
from . import cred
ImportError: cannot import name 'cred' from partially initialized module 'morss' (most likely due to a circular import) (/usr/lib/python3.8/site-packages/morss/__init__.py)
[2020-08-24 15:12:28 +0000] [10] [INFO] Worker exiting (pid: 10)
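The traceback shows `morss/__init__.py` importing `.wsgi`, which in turn runs `from . import cred`. Python prints the "partially initialized module ... circular import" hint whenever a name is missing while the package is still being imported, and that includes the case where the submodule file is simply absent from the install. Whether that is what happened here is an assumption. The sketch below reproduces the message with a throwaway package (`demo_pkg` and the function name are hypothetical):

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reproduce_missing_submodule_error() -> str:
    """Import a package whose wsgi.py imports a sibling module that is absent."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package, not morss itself
        pkg.mkdir()
        (pkg / "__init__.py").write_text("from .wsgi import application\n")
        (pkg / "wsgi.py").write_text("from . import cred\napplication = None\n")
        # cred.py is deliberately not created.
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_pkg")
        except ImportError as exc:
            return str(exc)
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.split(".")[0] == "demo_pkg"]:
                del sys.modules[name]
    return ""


if __name__ == "__main__":
    print(reproduce_missing_submodule_error())
```

If the message matches, checking that `cred.py` actually exists under the installed `morss` directory in the container would be a reasonable first step.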
This feed used to work until a couple of days ago when it started returning the error "Error downloading feed (Invalid SSL Certificate)". Without parsing it through morss, the feed loads fine in Firefox.
I guess this is not related to morss but is this certificate check mandatory?
http://morss.it/https://authoritynutrition.com/feed
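On whether the check has to be mandatory: in plain Python the certificate check is a property of the SSL context, so a fetcher can make it optional. This is a minimal sketch using only the standard library; it is not morss's own API, and the names `make_context` and `make_opener` are hypothetical. Turning verification off removes protection against tampering, so it only makes sense for feeds you trust.

```python
import ssl
import urllib.request


def make_context(verify: bool = True) -> ssl.SSLContext:
    """Return an SSL context, optionally with certificate checks disabled."""
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be switched off before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def make_opener(verify: bool = True) -> urllib.request.OpenerDirector:
    """Build a urllib opener that uses the chosen context for HTTPS requests."""
    handler = urllib.request.HTTPSHandler(context=make_context(verify))
    return urllib.request.build_opener(handler)
```

With this, `make_opener(verify=False).open(url)` would fetch a page even when its certificate fails validation.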
Thanks for the great project.
When I ask morss to create an RSS feed from a sub-page linked in the menu of a wix.com-based website, there is a problem with the entry titles.
This is how the title of an RSS entry should look:
"Fakten über Pandemie-Impfstoffe"
This is how it actually looks at the moment:
Team DKDE Apr 13 2 Min. Fakten über Pandemie-Impfstoffe Fakt 1.1 "Thiomersal in Impfstoffen Die Pandemieimpfstoffe werden 5 μg bzw. 25 μg Thiomersal (entsprechend 2,5 μg bzw. 12,4 μg Quecksilber) pro Dosis enthalt... 0 Ansichten Kommentar verfassen
The morss class picker is a great tool. I loved using it:
https://morss.it/:items=%7C%7C*%5Bclass=_2eqGx%5D/https://www.dkde.online/blog
I was attempting to try your test site. It's giving a "Forbidden" error.
Couldn't load feed https://blog.path.net/.
Please try again later, or report on GitHub.
curl https://blog.path.net/
<html>
<head><title>307 Temporary Redirect</title></head>
<body>
<center><h1>307 Temporary Redirect</h1></center>
<hr><center>openresty</center>
</body>
</html>
It looks like that site has anti-bot protection.
Hi!
I'd love to try your lib, but the simplest example gives me an error. I am trying to use morss as a library with this code:
import morss
xml_string = morss.process('http://feeds.bbci.co.uk/news/rss.xml')
but I get this error:
Traceback (most recent call last):
File "a.py", line 3, in <module>
xml_string = morss.process('http://feeds.bbci.co.uk/news/rss.xml')
AttributeError: 'module' object has no attribute 'process'
Can you help me?
Hi
I am just starting to use morss, and many RSS feeds return "ERROR: Link provided is not a valid feed":
python -m morss debug http://www.megabolsa.com/feed
'random page'
u'text/html'
ERROR: Link provided is not a valid feed
python -m morss debug http://rss.elconfidencial.com/espana/
'random page'
'text/xml, charset=UTF-8'
ERROR: Link provided is not a valid feed
python -m morss debug http://rss.elconfidencial.com/mundo/
'random page'
'text/xml, charset=UTF-8'
ERROR: Link provided is not a valid feed
They seem valid to me and work fine in several RSS readers.
Any quick tip about the source of the problem before I dive deep into the code?
Thanks for your time
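The debug output above reports the type of the two elconfidencial.com feeds as `text/xml, charset=UTF-8`, with a comma where a semicolon normally separates the media type from its parameters. Whether that is what trips morss is an assumption, not something confirmed in this thread, and it would not explain the first URL, which is served as `text/html`. A minimal Python sketch of a tolerant check (the names `FEED_TYPES`, `media_type` and `looks_like_feed` are hypothetical, not part of morss):

```python
import re

# Media types commonly used for RSS/Atom feeds (illustrative, not exhaustive).
FEED_TYPES = {
    "text/xml",
    "application/xml",
    "application/rss+xml",
    "application/atom+xml",
}


def media_type(content_type: str) -> str:
    """Return the bare media type, accepting ';' or a stray ',' before parameters."""
    return re.split(r"[;,]", content_type, maxsplit=1)[0].strip().lower()


def looks_like_feed(content_type: str) -> bool:
    """True when the header's media type is one of the known feed types."""
    return media_type(content_type) in FEED_TYPES
```

A strict comparison of the whole header string against `text/xml` would reject the header shown above, while this version accepts it.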
Hi!
I managed to obtain a nice RSS feed from a search result, which is a list of files. The problem is that each file has a date that I can't manage to include. I don't know anything about XPath...
Here is the XPath of the date for the 8th file (tr[8]):
:item_time=||html|body|div[9]|main|div|div|section[3]|div|table|tbody|tr[8]|td[5]|div/
How can I write this so that each item has its own date?
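The general XPath idea is to select the rows once and then use a path relative to each row, instead of an absolute path that pins `tr[8]`. The sketch below shows this with Python's standard `xml.etree.ElementTree` on a made-up table; the sample markup and file names are invented for illustration. Whether morss's `item_time` option accepts the relative form (for example `td[5]|div`, using the same `|` substitution) is an assumption to verify.

```python
import xml.etree.ElementTree as ET

# Invented sample data standing in for the search-result table.
SAMPLE = """
<table><tbody>
  <tr><td>a.zip</td><td/><td/><td/><td><div>2020-01-01</div></td></tr>
  <tr><td>b.zip</td><td/><td/><td/><td><div>2020-02-15</div></td></tr>
  <tr><td>c.zip</td><td/><td/><td/><td><div>2020-03-30</div></td></tr>
</tbody></table>
"""

root = ET.fromstring(SAMPLE)

# Step 1: one expression that matches every item (every table row).
rows = root.findall("./tbody/tr")

# Step 2: a path evaluated relative to each row, so each item gets its own date.
dates = [row.findtext("./td[5]/div") for row in rows]
names = [row.findtext("./td[1]") for row in rows]

for name, date in zip(names, dates):
    print(name, date)
```

The absolute path in the question always lands on the 8th row, which is why it cannot yield a different date per item.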
Hi,
I tried to run this URL through morss: http://www.lequipe.fr/Xml/actu_rss.xml
morss.it only returns 9 articles although there are many more.
Is there a limit on the number of returned items?