
keithcu / linuxreport


Customizable Linux news site based on Python / Flask

Home Page: https://covidreport.keithcu.com/

License: GNU Lesser General Public License v3.0

Python 88.78% HTML 11.22%
flask linux newsfeed python

linuxreport's People

Contributors

keithcu, wommy


Forkers

tkulick wommy

linuxreport's Issues

Add timeout

This request hung:
https://www.google.com/alerts/feeds/12151242449143161443/16985802477674969984

That means any page request waiting on that fetch never returns either.

It seems to be working again now, but it would be nice to have socket timeout logic capped at 5 seconds or so.

Also, I could have some logic to temporarily cache an empty feedparser result so that anyone who was waiting could just use that value and not block the whole site.
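A minimal sketch of both ideas, assuming a process-wide socket timeout is acceptable (libraries that open sockets internally, like feedparser, inherit it without code changes). The in-memory dict stands in for the real file-system cache:

```python
import socket
import time

# Cap every socket operation at 5 seconds so a hung feed server cannot
# block page requests indefinitely.
FETCH_TIMEOUT_SECONDS = 5
socket.setdefaulttimeout(FETCH_TIMEOUT_SECONDS)

# Illustrative in-memory cache; the real site uses a file-system cache.
_feed_cache = {}
EMPTY_RESULT_TTL = 60  # serve the empty placeholder for up to a minute

def cache_empty_result(url):
    """Temporarily cache an empty result so waiting requests return fast."""
    _feed_cache[url] = {"entries": [], "cached_at": time.time()}

def get_cached(url):
    """Return a cached result if it is still fresh, else None."""
    entry = _feed_cache.get(url)
    if entry is not None and time.time() - entry["cached_at"] < EMPTY_RESULT_TTL:
        return entry
    return None
```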

A few architectural issues

  • Consider refactoring the two URL dictionaries into one, plus a URL list for each website.
  • That makes it easier to share feeds between both sites.
  • Most sites return results in less than 1 s, but some take 2 or 3 seconds. Return the stale version and spawn a thread to fetch the update; that way, typically no one ever waits on an HTTP fetch.
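The stale-then-refresh idea in the last bullet can be sketched as follows. This is a minimal in-memory version under stated assumptions: `fetcher` is a stand-in for the real per-feed fetch function, and the dict stands in for the file-system cache:

```python
import threading
import time

_cache = {}          # url -> (fetched_at, payload)
_refreshing = set()  # urls with an in-flight background fetch
_lock = threading.Lock()
TTL = 3600           # treat entries older than an hour as stale

def _do_fetch(url, fetcher):
    """Background worker: fetch, store, then clear the in-flight flag."""
    payload = fetcher(url)
    with _lock:
        _cache[url] = (time.time(), payload)
        _refreshing.discard(url)

def get_feed(url, fetcher):
    """Serve whatever is cached immediately; refresh stale entries
    on a daemon thread so no request ever waits on an HTTP fetch."""
    with _lock:
        entry = _cache.get(url)
        stale = entry is None or time.time() - entry[0] > TTL
        if stale and url not in _refreshing:
            _refreshing.add(url)
            threading.Thread(target=_do_fetch, args=(url, fetcher),
                             daemon=True).start()
    return entry[1] if entry else None
```

The `_refreshing` set ensures only one background fetch per URL is in flight even if several requests see the same stale entry.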

Dark mode

This would be a nice feature. I just need to pick some reasonable colors.

Faster when multiple feeds have expired

On startup, or when multiple feeds have expired, it will sequentially fetch the RSS feeds. That's a little slow when there are 9 or more to fetch, and some sites take 1.5 seconds. Note that most users won't have this problem because it's only 1 request per hour or so. Also, there is a little jitter to spread out the requests so it's unusual to have more than a few fetches.

It would be faster to switch to multi-processing or multi-threading so that several fetches can happen at the same time.

Multi-processing would be simplest, but each of the ~10 Python engines that respond to Apache requests would then carry its own pool of 2-5 idle processes.

It could also be sped up by fetching from multiple threads.

Ideally it would be done asynchronously: one thread could queue up 2-9 requests and spend most of its time waiting 0.5 to 1.5 seconds for responses.

I think a pool of Python threads is the best solution here: threads are much lighter weight than processes, and the logic is very simple.

Because this uses a file system cache, a solution using either processes or threads should work.
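A thread-pool version is only a few lines; a sketch, where `fetch` is a stand-in for the real per-feed fetch function:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch, max_workers=5):
    """Fetch every URL concurrently and return {url: result}.

    Each worker spends most of its time blocked on network I/O, so a
    small pool overlaps the 0.5-1.5 s waits instead of summing them.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```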

Mobile order

The mobile ordering isn't the same as on desktop. One fix would be to simply not create 3 columns on mobile. Whether a request is mobile can be determined in CSS, but here the decision is needed in the Python code.
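One way to get that answer into Python is to sniff the User-Agent header. A rough sketch: the token list is a heuristic assumption, and in Flask the string would come from `request.headers.get("User-Agent", "")`:

```python
# Common substrings that indicate a mobile browser; heuristic, not
# exhaustive.
MOBILE_TOKENS = ("Mobi", "Android", "iPhone", "iPad")

def is_mobile(user_agent):
    """Rough server-side mobile detection from the User-Agent string."""
    return any(token in user_agent for token in MOBILE_TOKENS)

def column_count(user_agent):
    # One column on mobile keeps the feed order identical to desktop.
    return 1 if is_mobile(user_agent) else 3
```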

Smarter refreshing via machine learning

Currently the system fetches each feed every hour, or every 6 hours for sites that usually update just once per day. It does this 24/7.

It should be possible to apply some machine learning per feed to have the system figure out when the site usually updates, and then only make requests around then. This could be done manually (by keeping track of a week's worth of updates), or by applying some machine learning algorithms. It would be great if it could keep learning over time.

This would also be better for the sites that update once per day, because it could try to catch them soon after they are usually posted, rather than up to 6 hours later.
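The manual version of this (tracking a week's worth of updates) can be sketched very simply; the hour-histogram approach and the polling window are assumptions, not the site's actual logic:

```python
from collections import Counter

def typical_update_hour(observed_hours):
    """Most common hour (0-23) at which new items appeared in the feed,
    learned from a history of observed update timestamps."""
    if not observed_hours:
        return None
    return Counter(observed_hours).most_common(1)[0][0]

def should_poll(current_hour, learned_hour, window=1):
    """Poll only within +/- `window` hours of the learned update hour."""
    if learned_hour is None:
        return True  # no history yet: fall back to regular polling
    distance = abs(current_hour - learned_hour)
    return min(distance, 24 - distance) <= window  # wrap around midnight
```

Appending each new observation to the history lets it keep learning over time, and the same histogram naturally adapts if a site changes its posting schedule.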

More URLs

I'm sure there are more URLs that would be worth adding to the page. People can already add them themselves, but it's a bit of work to dig up the RSS feed URL.

It would be nice to have some extra ones that aren't necessarily shown by default, but can be easily chosen without having to track down the RSS URL.
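One way to structure this is a registry of extra feeds that ship with the site but stay hidden by default; the name and URL below are illustrative placeholders, not verified feed endpoints:

```python
# Extra feeds bundled with the site but hidden until a user enables
# them; entries here are placeholders for illustration only.
EXTRA_FEEDS = {
    "Example Blog": {"url": "https://example.com/feed.rss", "default": False},
}

def visible_feeds(feeds, user_enabled=()):
    """Names of feeds shown: defaults plus anything the user enabled."""
    return [name for name, info in feeds.items()
            if info["default"] or name in user_enabled]
```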

Prevent excess RSS requests

When a feed has expired and two visitors arrive at the same time, the site may fetch the feed once for each of them. Implement logic along these lines when the feed has expired:

1. Check for URL + "FETCH" in the cache.
2. If it doesn't exist, create an entry containing this PID / TID.
3. Re-check that the entry holds our PID / TID.
4. If the entry exists and is not our PID / TID, sleep for 100 ms and keep checking until the cache entry disappears.
5. If it is ours, fetch the RSS feed, add the result to the cache, and then delete the URL + "FETCH" entry.

Jitter doesn't work with long page cache expirations

The jitter is a nice feature to spread out the requests slightly. That way, when someone shows up an hour after the server started, they only trigger one fetch, as the other hourly feeds expire over the next 5 minutes.

However, it only works for short page cache lengths. Right now, with a page cache of 10 minutes, the next user after it expires often has to wait for multiple fetches. So either shorten the page cache to around 1 minute, or take out the jitter.
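For reference, the jitter itself is just a random offset on each feed's expiry; a minimal sketch with assumed constants:

```python
import random

BASE_TTL = 3600      # hourly refresh
JITTER_WINDOW = 300  # spread expirations over 5 minutes

def jittered_ttl(base=BASE_TTL, window=JITTER_WINDOW):
    """Per-feed TTL with a random offset so feeds don't all expire at
    the same moment."""
    return base + random.uniform(0, window)
```

The scheme only helps when the page cache is shorter than the jitter window: with a 10-minute page cache, many jittered feed expirations fall inside a single page-cache period and pile up on one unlucky request.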
