feedhq / feedhq

FeedHQ is a web-based feed reader

License: BSD 3-Clause "New" or "Revised" License

Shell 0.05% Ruby 0.11% Makefile 0.26% Python 77.43% CSS 8.34% JavaScript 1.52% HTML 12.30%

atom-feed atom-feed-parser django elasticsearch feed-reader feeds postgresql python redis rss rss-aggregator rss-feed rss-feed-parser

feedhq's Introduction

FeedHQ

FeedHQ is a simple, lightweight web-based feed reader. Main features:

User-facing features

RSS and ATOM support
Grouping by categories
Awesome pagination and intelligent browsing
Great readability on all screen sizes (smartphones, tablets and desktops)
Mobile-first, retina-ready
Reading list management with Instapaper, Readability or Read It Later support
Filter out already read entries
Hides images/media by default (and therefore filters ads and tracking stuff)
Multiple user support
OPML import
Syntax highlighting, awesome for reading tech blogs
Keyboard navigation
Subtome support

Developer- / Sysadmin-facing features

Nice with web servers, uses ETag and Last-Modified HTTP headers
Handles HTTP status codes nicely (permanent redirects, gone, not-modified…)
Exponential backoff support
PubSubHubbub support
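
As a rough illustration of the crawler-friendly behaviour listed above, the sketch below builds conditional request headers from stored ETag and Last-Modified values and computes an exponentially growing polling delay. The function names, the 60-minute base delay and the one-day cap are assumptions made for this example and are not taken from FeedHQ's code.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build headers for a conditional GET from previously stored validators."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def backoff_delay(failures, base_minutes=60, max_minutes=60 * 24):
    """Double the polling delay after each consecutive failure, up to a cap."""
    return min(base_minutes * 2 ** failures, max_minutes)


print(conditional_headers(etag='"abc123"'))
print([backoff_delay(n) for n in range(6)])
```

A server that honours these headers can answer with a 304 Not Modified response instead of resending the whole feed.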

Installation

Requirements:

Python 3.4 or greater
Redis (2.6+ recommended)
PostgreSQL (9.2+ recommended but anything >= 8.4 should work)
Elasticsearch (see compatibility table below)

Getting the code:

git clone https://github.com/feedhq/feedhq.git
cd feedhq
virtualenv -p python3 env
source env/bin/activate
add2virtualenv .
pip install -r requirements.txt

Elasticsearch version requirements:

Commit hash	Date	ES version
Up to 3aea18	Sep 18, 2014	1.1
From 6d5cbc	Oct 28, 2014	1.3

Configuration

FeedHQ relies on environment variables for its configuration. The required environment variables are:

DJANGO_SETTINGS_MODULE: set it to feedhq.settings.
SECRET_KEY: set to a long random string.
ALLOWED_HOSTS: space-separated list of hosts which serve the web app. E.g. www.feedhq.org feedhq.org.
FROM_EMAIL: the email address that sends automated emails (password lost, etc.). E.g. FeedHQ <[email protected]>.
REDIS_URL: a URL for configuring redis. E.g. redis://localhost:6379/1.
DATABASE_URL: a heroku-like database URL. E.g. postgres://user:password@host:port/database.
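
Before starting the app it can help to check that the required variables are present and well-formed. The following is a minimal sketch using only the standard library; the helper names are hypothetical and FeedHQ's own settings module may parse these values differently.

```python
import os
from urllib.parse import urlsplit

# Required variables as listed above.
REQUIRED = (
    "DJANGO_SETTINGS_MODULE", "SECRET_KEY", "ALLOWED_HOSTS",
    "FROM_EMAIL", "REDIS_URL", "DATABASE_URL",
)


def missing_variables(environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]


def parse_allowed_hosts(value):
    """Split the space-separated ALLOWED_HOSTS value into a list."""
    return value.split()


def parse_database_url(url):
    """Break a heroku-like database URL into its components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "name": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    print("Missing:", missing_variables(os.environ))
```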

Optionally you can customize:

DEBUG: set it to a non-empty value to enable the Django debug mode.
MEDIA_ROOT: the absolute location where media files (user-generated) are stored. This must be a public directory on your webserver available under MEDIA_URL.
MEDIA_URL: the URL that handles media files (user-generated) served from MEDIA_ROOT. By default, it is set to /media/.
STATIC_ROOT: the absolute location where static files (CSS/JS files) are stored. This must be a public directory on your webserver available under the /static/ URL.
STATIC_URL: the URL that serves static files (CSS/JS files) located in STATIC_ROOT. By default, it is set to /static/.
SENTRY_DSN: a DSN to enable Sentry error reporting.
SESSION_COOKIE_PATH: the path set on the session cookie. E.g., /.
HTTPS: set it to a non-empty value to configure FeedHQ for SSL access.
EMAIL_URL: a URL for configuring email. E.g. smtp://user:password@host:port/?backend=my.EmailBackend&use_tls=true. The backend querystring parameter sets the Django EMAIL_BACKEND setting. By default emails only go to the development console.
ES_NODES: space-separated list of elasticsearch nodes to use for cluster discovery. Defaults to localhost:9200.
ES_INDEX: the name of the elasticsearch index. Defaults to feedhq.
ES_ALIAS_TEMPLATE: string template to format the per-user alias that is created in elasticsearch. Defaults to feedhq-{0}.
ES_SHARDS: number of shards that you want in your elasticsearch index. This cannot be changed once you have created the index. Defaults to 5.
ES_REPLICAS: number of times you want your shards to be replicated on the elasticsearch cluster. Defaults to 1. Set it to 0 if you only have one node. This is only used at index creation but the index setting can be altered dynamically.
HEALTH_SECRET: a shared secret between your FeedHQ server(s) and your monitoring. The /health/ endpoint can be protected by requiring clients to provide a shared secret in an X-Token header. If no secret is set, the health endpoint is open to all.
LOG_SYSLOG: set it to 1 to send logs to /dev/log instead of stdout (the default). Logs are formatted in JSON.
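
A monitoring probe for the /health/ endpoint described under HEALTH_SECRET could look like the sketch below. The base URL and secret are placeholders, and treating HTTP 200 as "healthy" is an assumption, since the response format is not documented here.

```python
import urllib.request


def build_health_request(base_url, secret=None):
    """Create a request for /health/, adding X-Token when a secret is set."""
    request = urllib.request.Request(base_url.rstrip("/") + "/health/")
    if secret:
        request.add_header("X-Token", secret)
    return request


def is_healthy(base_url, secret=None, timeout=5):
    """Return True if the endpoint answers with HTTP 200 (assumed to mean OK)."""
    request = build_health_request(base_url, secret)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False
```

For example, `is_healthy("https://feedhq.example.com", secret="...")` would send the shared secret in the X-Token header.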

For integration with external services:

READITLATER_API_KEY: your Pocket API key.
INSTAPAPER_CONSUMER_KEY, INSTAPAPER_CONSUMER_SECRET: your Instapaper API keys.
READABILITY_CONSUMER_KEY, READABILITY_CONSUMER_SECRET: your Readability API keys.

Then deploy the Django app using the recipe that fits your installation. More documentation is available in the Django deployment guide. The WSGI application is located at feedhq.wsgi.application.

To create the Elasticsearch index:

django-admin.py create_index

Then you'll need to create the appropriate PostgreSQL database and run:

django-admin.py syncdb
django-admin.py migrate

Note that in addition to the web server, you need to run one or more consumers for the task queue. This is done with the rqworker management command:

django-admin.py rqworker store high default low favicons

The arguments are queue names.

Once your application is deployed (you've run django-admin.py syncdb to create the database tables, django-admin.py migrate to run the initial migrations and django-admin.py collectstatic to collect your static files), you can add users to the application. On the admin interface, add as many users as you want. Then add some categories and feeds to your account using the regular interface.

Crawl for updates:

django-admin.py sync_scheduler
django-admin.py updatefeeds

Set up a cron job to update your feeds on a regular basis. This puts the oldest-updated feeds in the update queue:

*/5 * * * * /path/to/env/bin/django-admin.py updatefeeds

The updatefeeds command puts 1/12th of the feeds in the update queue. Feeds won't update if they've been updated in the past 60 minutes, so running the cron job every 5 minutes spreads the updates evenly across the 1-hour period.
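
The arithmetic can be checked with a small simulation. This is only a sketch of the idea (pick the oldest-updated twelfth, skip anything refreshed in the last 60 minutes); the function and the 24 feed names are made up for the example and do not reflect how FeedHQ's scheduler is implemented.

```python
def pick_batch(last_updated, now, fraction=12, min_age=60):
    """Pick the oldest-updated 1/fraction of feeds that are at least min_age minutes old."""
    batch_size = max(1, len(last_updated) // fraction)
    eligible = [url for url, ts in last_updated.items() if now - ts >= min_age]
    eligible.sort(key=lambda url: last_updated[url])
    return eligible[:batch_size]


# 24 hypothetical feeds, all last updated 60 minutes before the first run
feeds = {"feed-%d" % i: -60 for i in range(24)}
updates = {url: 0 for url in feeds}

for now in range(0, 60, 5):  # one cron run every 5 minutes for an hour
    for url in pick_batch(feeds, now):
        feeds[url] = now
        updates[url] += 1

print(updates)
```

After the twelve runs every feed has been picked exactly once, which is the even distribution the paragraph above describes.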

A cron job should also be set up for picking and updating favicons (the --all switch processes existing favicons in case they have changed, which you should probably do every month or so):

@monthly /path/to/env/bin/django-admin.py favicons --all

Here is a full list of management commands that you should schedule:

add_missing creates the missing denormalized URLs for crawling. Since URLs are denormalized it's recommended to run it every now and then to ensure consistency.

Recommended frequency: hourly.

Resource consumption: negligible (2 database queries).

delete_unsubscribed is the delete counterpart of add_missing.

Recommended frequency: hourly.

Resource consumption: negligible (2 database queries).

favicons --all forces fetching the favicons for all existing URLs. It's useful for picking up new favicons when they're updated. Depending on your volume of data, this can be resource-intensive.

Recommended frequency: monthly.

Resource consumption: the command itself only triggers async jobs but the jobs perform network I/O, HTML parsing, disk I/O and database queries.

updatefeeds picks 1/12th of the URLs and fetches them.

Recommended frequency: every 5 minutes.

Resource consumption: the command itself only triggers async jobs but the jobs perform network I/O, HTML parsing and, when updates are found, database queries.

sync_scheduler adds missing URLs to the scheduler. Also useful to run every now and then.

Recommended frequency: every hour.

Resource consumption: one large database query per chunk of 10k feeds which aren't in the scheduler, plus one redis HMSET per URL that's not in the scheduler. As a routine task it's not resource-intensive.

sync_pubsubhubbub unsubscribes from unneeded PubSubHubbub subscriptions.

Recommended frequency: once a day.

Resource consumption: low.

clean_rq removes stale RQ jobs.

Recommended frequency: once a day.

Resource consumption: low. Only makes requests to Redis.

delete_old removes expired entries as determined by each user's entry TTL.

Recommended frequency: once a day.

Resource consumption: medium, makes delete_by_query queries to ES (1 per user).

delete_expired_tokens removes expired API tokens. Tokens are valid for 7 days, after which they are renewed by client apps.

Recommended frequency: once a day.

Resource consumption: low (one DELETE query).
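
The recommended frequencies can be collected into crontab entries. The sketch below prints one line per command; it assumes a cron implementation that understands the @hourly, @daily and @monthly shortcuts, and it reuses the placeholder path from the examples above. The required environment variables still need to be made available to cron separately.

```python
DJANGO_ADMIN = "/path/to/env/bin/django-admin.py"  # placeholder path

# (cron schedule, management command) pairs from the list above
SCHEDULE = [
    ("@hourly", "add_missing"),
    ("@hourly", "delete_unsubscribed"),
    ("@monthly", "favicons --all"),
    ("*/5 * * * *", "updatefeeds"),
    ("@hourly", "sync_scheduler"),
    ("@daily", "sync_pubsubhubbub"),
    ("@daily", "clean_rq"),
    ("@daily", "delete_old"),
    ("@daily", "delete_expired_tokens"),
]


def crontab_lines(schedule=SCHEDULE, command=DJANGO_ADMIN):
    """Format each (schedule, command) pair as a crontab line."""
    return ["%s %s %s" % (when, command, name) for when, name in schedule]


print("\n".join(crontab_lines()))
```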

Development

Install the development requirements:

pip install -r requirements-dev.txt

Run the tests:

make test

Or if you want to run the tests with django-admin.py directly, make sure you use feedhq.test_settings as the DJANGO_SETTINGS_MODULE environment variable to avoid making network calls while running the tests.

The Django debug toolbar is enabled when the DEBUG environment variable is true and the django-debug-toolbar package is installed.

Foreman is used in development to start a lightweight Django server and run RQ workers. Environment variables are managed using a Python port of Daemontools' envdir utility. A running Redis server is required for this workflow:

make run

When running django-admin.py updatefeeds on your development machine, make sure you have the DEBUG environment variable present to avoid making PubSubHubbub subscription requests without any valid callback URL.

Environment variables for development are set in the envdir directory. For tests, they are located in the tests/envdir directory.

When working on frontend assets (SCSS or js files), watchman can be used to automatically run compass and uglify on file changes. Install watchman, Compass (gem install bundle && bundle install) and npm (part of nodejs) to get started. Then run:

make watch

Once you're done working with assets, simply kill watchman:

pkill watchman

feedhq's Issues

feed updation

I have installed feedhq locally on an Ubuntu 12.04 VM and have subscribed to a few feeds from the UI. I have the following questions:

Since I need to use a proxy for internet access, what changes are required for "updatefeeds" to work? In the Uniquefeed model, I tried supplying the "proxies" parameter to requests.get(...) but it's not working, so I am definitely on the wrong track here.
When I run updatefeeds, it doesn't give any output. How can I debug this, or where can I find the log it generates?
Since mine is a local installation, I am not sure how feeds that are published through a hub (superfeedr etc.) will get updated. Also, how does FeedHQ handle subscriptions that are not published through any hub? Does it poll for updates in such cases?

Thanks

Add "mark this page as read" option

Because "mark all as read" marks everything.

Maybe I have read page 1 of 5 and nothing there interests me anymore, so marking only the items on page 1 as read would be helpful.

English interface

I'm not sure it's a bug but it's certainly an issue: when I arrived on feedhq, the whole project blurb was in English.
At registration, still in English.
At login, in English.
But now I've got the interface in French (which is great).

Wouldn't it be better to have the whole interface (marketing stuff and registration process) translated?

Integrate transifex

The content is 100% translatable, transifex would be a nice interface for this.

Error when browsing new entries

When browsing unread entries, clicking on "previous" doesn't display the previous entry you just read. What seems to happen is that the "previous" link (mapped to the left arrow) displays the previous unread entry.

Is this the intended behaviour?

When an item has both content and description, only content should be displayed

One of the feeds I read (http://feeds.feedburner.com/typepad/OEkF) has both a content and a description for each item. However, the description is just a cut-off version of the content. I believe in this case, the description should not be displayed.

OPML export

There is OPML import but export should be possible as well, most likely from the profile edition page.

Add full view

List view is fine sometimes, but a full view is faster other times. Let the user choose, please.

Grouping

Entries with the same URL should be grouped across the same user, with a list of feeds as "tags". Otherwise getting someone's updates from their blog + planet X + planet Y is annoying.

Link inconsistency on the main FeedHQ link.

Since the resolution of #31 (thank you), sometimes the main "FeedHQ" link (top left) will go to "unread" but sometimes to the whole list.

I think I identified the case: I go on a feed on its list of unread items, I click the "mark all as read" button. Then I'm redirected on the feed, but on its main list with all items, not the "unread" ones.

And on this page the "FeedHQ" button is not set to "unread" (maybe because I'm not on an "unread" page).

add read it later support

https://readitlaterlist.com/api/

[feature req] per feed option for headline to link direct to external site

I have a mixture of sites which include all the content from posts in their rss feeds and others which have just a single trimmed sentence. For the latter I'd much prefer if I was able to have headlines from these feeds link directly to the external site rather than having to click in to see the truncated content in feedhq and then click the headline again to actually get to the content.

Add a setting to automatically load feed images

It could be useful for image-based feeds.

Tooltip with full title on list pages

Hi,

New sites can sometimes have long titles for their articles, so long that they get truncated on the list page. It would be handy if there was a tooltip which contained the full title of the item.

Thanks,
Floris

README gives the SSH access

One cannot just follow the README:

$ git clone [email protected]:feedhq/feedhq.git 
[…]
Permission denied (publickey).

Better use the Git or HTTP URL.

Check if Instapaper is up

Yesterday, Instapaper was down but FeedHQ was still showing a "Article successfully added to your reading list" when clicking "Add to Instapaper". Should return an error message in this case.

Footnotes styling

Footnote markers (mainly sup tags) are not really styled yet.

Handle relative links

I just clicked on a link in HTML content of a Github feed and got a 404 on feedhq.org.

Otherwise, great service. Keep it up! I'd flattr you.

View oldest first

I prefer to read older posts first, so I'd like an option to view items oldest to newest. (This makes most sense when viewing unread items.)

I can partially solve this now by viewing the last page of the unread view. Bookmarking a page that is higher than I'd ever have (e.g. https://feedhq.org/unread/100/) gets me to the oldest posts every time. But I'd still prefer the option, which would ideally include switching the "previous unread" and "next unread" in this mode.

Pillow installation Bad MD5 hash for package

I am trying to install Feedhq locally on Ubuntu 12.04 VM. While running "pip install -r requirements.txt" , it halts while installing pillow with error "InstallationError: Bad MD5 hash for package....". Any idea?

Don't cut title in grid view

Or, at least, full title should be shown on hover.

`<ol>` styling

They're currently not styled.

Mark all as read

Ability to mark all entries as read

API

Hypermedia, restful and everything, there should be an API.

Categories, Feeds: read-write
Entries: read-only except for the read field
Auth: xAuth?

infinite scrolling

It could be useful, on the main page, to detect the user position and to automatically load the next pages, so the users don't have to use the pagination.

Delete account

Ability to completely remove a user account from the profile.

Add a link to the entry at the end of it.

When reading long feeds, I sometimes want to see the comments for an entry. What I'm doing currently is going back to the beginning of the entry and then clicking on the link. It would be cool to have this link at the end of the entry as well.

As this could be considered redundant, we can probably only do that for "long" feeds, and not for "short" ones (though I'm not sure what "short" and "long" really mean in this context).

About the clickable area of the "mark all as read"

First, I think the button is too small. But maybe that's just my point of view.

But there is a visual problem because the highlighted node when hovering the button is not the link itself but a bigger DOM node (after looking at the source: the hover is on the form but the input is smaller than the form).

PS: a tiny little detail: the hover of the form is not vertically aligned with the previous links (by 1 pixel)

Deployment docs

Explain how to deploy an instance.

Search

Implement search in the archives. Please.

Rethink layout

The current grid is fixed, not fluid. Fluid grids are pretty cool for responsive sites as @uggedal suggested.

some feeds are truncated

Some feeds (WordPress blogs) are truncated, even if the feed contains the full entry.
I found that at least with https://blog.mozilla.com/webdev/feed/

Deployment problem

Hello,

I want to try feedhq on my kimsufi server.

The apache configuration seems to be good (if I try with a "hello world" file found here https://code.google.com/p/modwsgi/wiki/QuickConfigurationGuide it works).
But when I replace my "hello world" code with feedhq wsgi.py file, I've got a 500 error :

[Tue Mar 19 10:10:56 2013] [error] [client xx.xx.xx.xx] mod_wsgi (pid=3995): Target WSGI script '/var/www/feedhq/feedhq/wsgi.py' cannot be loaded as Python module.
[Tue Mar 19 10:10:56 2013] [error] [client xx.xx.xx.xx] mod_wsgi (pid=3995): Exception occurred processing WSGI script '/var/www/feedhq/feedhq/wsgi.py'.
[Tue Mar 19 10:10:56 2013] [error] [client xx.xx.xx.xx] Traceback (most recent call last):
[Tue Mar 19 10:10:56 2013] [error] [client xx.xx.xx.xx]   File "/var/www/feedhq/feedhq/wsgi.py", line 3, in <module>
[Tue Mar 19 10:10:56 2013] [error] [client xx.xx.xx.xx]     from django.core.wsgi import get_wsgi_application
[Tue Mar 19 10:10:56 2013] [error] [client xx.xx.xx.xx] ImportError: No module named wsgi

Here is my apache file :

<VirtualHost *:80>
    ServerAdmin [email protected]
    ServerAlias feedhq.mydomain.com

    DocumentRoot /var/www/feedhq
    <Directory />
        Order Deny,Allow
        Deny from all
        Options None
        AllowOverride None
    </Directory>

    WSGIScriptAlias / /var/www/feedhq/feedhq/wsgi.py

    <Directory /var/www/feedhq>
        Order Allow,Deny
        Allow from all
        Options -Indexes -ExecCGI
    </Directory>

    ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
    <Directory "/usr/lib/cgi-bin">
        AllowOverride None
        Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
        Order allow,deny
        Allow from all
    </Directory>

    ErrorLog ${APACHE_LOG_DIR}/error.log

    # Possible values include: debug, info, notice, warn, error, crit,
    # alert, emerg.
    LogLevel warn

    CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

It's my first deployment with python, so maybe I made mistakes ...

[css] can't click on the button !

see the screenshot

http://oi45.tinypic.com/ogcdxs.jpg

I think there's a wrap missing somewhere in the CSS so the button cannot be overlayed

[feature req] hide / show category on front page

I would find feedhq even better if I could easily choose to hide /show particular categories from display on the front page. Perhaps just a line of color coded category names across the top where the all | unread | mark all as read links are, which are clickable to hide (and then click again to return to including) all entries from that category.

navigation with "jk"

It could be nice to be able to navigate the feeds using the vim binding "jk" (or maybe be able to configure this?)

Remove the paginator if there is only one page

The "edit a feed" link image is not obvious

Currently, it is a "tick", and I have to say it is a little bit confusing (because nothing is really ticked).

Global flag for allowing external media

Some people don't bother at all, add a flag at the profile level to globally allow external images embedding.

Make the title optional when adding a feed

The title can be fetched in the feed data directly.

Remember the all/new settings

On the home page, I want to see only new items, but when I come back to feedhq, the "all" button is selected

Importing the same OPML file creates duplicate feeds

I was having issues with 500 errors with my Google Reader export. It was failing on number 4 of 24. Removed number 4 and it ran fine. After some experimentation it seemed like the combination was somehow causing it to fail.

However, after importing the same file multiple times, I now have duplicates. It would be nice if there was some duplicate resolution.

Import Google Reader archives

Since Google Reader is to be shut down on July 1st (2013), it would be very handy to allow users to import not only their OPML feeds, but their archive files, too. One can export them via Google Takeout. It's a few JSON files (zipped), along with the account OPML file.

security problem ?

Hi @feedhq !
Maybe it's not a bug, but I think that HTML shouldn't be parsed.
For example, this article http://openweb.eu.org/articles/html-media-capture displays a form in feedhq, instead of displaying HTML code like this

<form action="index.php" method="post" enctype="multipart/form-data">
  <input type="file" name="image" accept="image/*" capture />
  <input type="submit" value="Upload" />
</form>

Images handling on the mobile site

Specifically on retina devices: images can be tapped to be scaled to the actual pixel density, and they're wrapped in a scrollable div if they're larger than the viewport.

Sometimes tapping only fixes the horizontal size, not vertical.

When images are wrapped in <a> tags, they can't be properly tapped.

The alt text should also be shown on mobile devices.

Alternative async backend

Redis is a requirement for listen/notify tasks. It could be a "NO-GO" for people like me with shared hosting and no access to a dedicated redis instance.
It might be possible to store async-task-related data in the same db (using django models, for example). doable ? simply ? your turn.

More details on https://www.subtome.com/developers.html

thanks!

Unread by default

I nearly always read only unread items, so I'd like an option to make the unread view the default view.

(Clearly bookmarking https://feedhq.org/unread/ nearly solves this issue, but the option would still be nice if it's easy to do.)