keithcu / linuxreport
Customizable Linux news site based on Python / Flask
Home Page: https://covidreport.keithcu.com/
License: GNU Lesser General Public License v3.0
This request hung:
https://www.google.com/alerts/feeds/12151242449143161443/16985802477674969984
That means that any page request waiting on that feed never returns either.
It seems to be working again now, but it would be nice to have some socket timeout logic of 5 seconds or so at most.
Also, I could have some logic to temporarily cache an empty feedparser result so that anyone who is waiting can use that value instead of blocking the whole site.
would be a nice feature. I just need to grab some reasonable colors.
https://pythonhosted.org/feedparser/http-etag.html
To use this you have to save off some data. It also means that instead of expiring the whole feed to trigger a refresh, you keep it around and use a mechanism other than cache expiration to decide when to refetch the data.
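The saved-off data is just the per-feed etag and modified values. A sketch of the idea, with `parse` standing in for feedparser.parse (which accepts etag= and modified= keywords and sets result.status to 304 when the server reports the feed unchanged) and `feed_state` as a hypothetical in-memory store that would be persisted alongside the cache in practice:

```python
feed_state = {}  # per-URL etag/modified, persisted in practice

def conditional_fetch(url, parse):
    """Issue a conditional GET; return None when the feed is unchanged
    so the caller keeps serving the copy it already has."""
    prev = feed_state.get(url, {})
    result = parse(url, etag=prev.get("etag"), modified=prev.get("modified"))
    if getattr(result, "status", None) == 304:
        return None  # unchanged: no need to reparse or recache
    # Save the new validators for the next conditional request.
    feed_state[url] = {
        "etag": getattr(result, "etag", None),
        "modified": getattr(result, "modified", None),
    }
    return result
```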
On startup, or when multiple feeds have expired, it will sequentially fetch the RSS feeds. That's a little slow when there are 9 or more to fetch, and some sites take 1.5 seconds. Note that most users won't have this problem because it's only 1 request per hour or so. Also, there is a little jitter to spread out the requests so it's unusual to have more than a few fetches.
It would be faster to switch to multi-process or multi-threading to allow multiple fetches to happen at the same time.
It would be simplest to use multi-process, but that could mean that each of the ~10 Python engines that respond to Apache requests would probably have their own pool of 2-5 processes sitting around.
It could also be sped up by creating multiple threads, which Python supports.
Ideally it would be done in an async way. You could queue up 2-9 requests asynchronously with one thread, which would mostly sit idle for 0.5 to 1.5 seconds waiting for each response.
I think creating a pool of Python threads is the best solution here because they are lighter weight than Linux threads and obviously processes, and the logic is very simple.
Because this uses a file system cache, a solution using either processes or threads should work.
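The thread-pool idea fits in a few lines with the standard library. This is a sketch, not the site's code: `fetch` stands in for the real per-feed fetch-and-cache function.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch, workers=5):
    """Fetch several feeds concurrently. Each worker spends most of its
    time blocked on the network, so a small pool covers 9+ feeds."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```

With 9 feeds at up to 1.5 seconds each, 5 workers cuts the worst case from ~13 seconds sequential to roughly two rounds of fetches.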
The mobile order isn't the same as the desktop order. One way to fix it would be to not create 3 columns on mobile. Whether it's a mobile request can be determined in CSS, but here it's needed in the Python.
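On the Python side, a rough check of the User-Agent header is usually enough; in Flask that header is available as request.headers.get("User-Agent", ""). The token list below is conventional, not anything from the project:

```python
# Common substrings that mobile browsers put in their User-Agent strings.
MOBILE_TOKENS = ("Mobi", "Android", "iPhone", "iPad")

def is_mobile(user_agent):
    return any(token in user_agent for token in MOBILE_TOKENS)

def column_count(user_agent):
    # A single column on mobile preserves the desktop feed order.
    return 1 if is_mobile(user_agent) else 3
```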
Currently the system fetches every hour, or every 6 hours (for sites that usually update just once per day). It does this 24/7.
It should be possible to apply some machine learning per feed to have the system figure out when the site usually updates, and then only make requests around then. This could be done manually (by keeping track of a week's worth of updates), or by applying some machine learning algorithms. It would be great if it could keep learning over time.
This would also be better for the sites that update once per day, because it could try to catch them soon after they are usually posted, rather than up to 6 hours later.
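The manual version of this learning could be as simple as counting which hour of the day new posts tend to appear, then fetching only around that hour. A hypothetical sketch (all names are illustrative):

```python
from collections import Counter

class UpdateTracker:
    """Per-feed learner: record the hour (0-23) at which each new post
    is first seen, then concentrate fetches around the usual hour."""

    def __init__(self):
        self.hours = Counter()

    def record(self, hour):
        self.hours[hour] += 1  # keeps learning over time

    def likely_hour(self):
        if not self.hours:
            return None  # no history yet
        return self.hours.most_common(1)[0][0]

    def should_fetch(self, hour, window=1):
        likely = self.likely_hour()
        if likely is None:
            return True  # fall back to the current hourly schedule
        diff = abs(hour - likely)
        return min(diff, 24 - diff) <= window  # wraps around midnight
```

A real version would still want an occasional off-window fetch so a feed that changes its schedule isn't missed forever.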
I'm sure there are more URLs worth adding to the page. People can already add them themselves, but it's a bit of work to dig up the RSS feed.
It would be nice to have some extra ones that aren't necessarily shown by default, but can be easily chosen without having to track down the RSS URL.
When a feed has expired, if two people come to the website at the same time, it will possibly fetch for both of them. Implement some logic so that, when the feed has expired:
Check for URL + "FETCH" in the cache
if it doesn't exist, then create an entry containing this PID / TID
Then check to make sure it's our PID / TID
If it already exists and it's not our PID / TID, sleep for 100 ms and keep checking till cache entry disappears
If it is, then fetch the RSS feed, add to cache, and then delete the URL + FETCH cache entry
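The steps above can be sketched as follows. This is a simplified illustration with `cache` as an in-memory stand-in for the filesystem cache and `fetch` for the real RSS fetch; note that the check-then-set here is not atomic, so a cache backend with an atomic set-if-absent operation would close the remaining small race.

```python
import os
import threading
import time

cache = {}  # in-memory stand-in for the site's filesystem cache

def my_id():
    return f"{os.getpid()}/{threading.get_ident()}"  # PID / TID

def fetch_once(url, fetch, poll=0.1):
    key = url + "FETCH"
    if key not in cache:
        cache[key] = my_id()         # try to claim the fetch
    if cache.get(key) == my_id():    # confirm our claim actually won
        try:
            cache[url] = fetch(url)  # we own it: fetch and cache the feed
        finally:
            del cache[key]           # release, waking the pollers below
    else:
        while key in cache:          # someone else is fetching
            time.sleep(poll)         # sleep 100 ms and re-check
    return cache.get(url)
```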
The website works with SSL, but it has hard-coded "http:" in various places for images, which need to be fixed.
Is it worth using bootstrap or some custom CSS to look a little prettier?
The jitter is a nice feature to spread out the requests slightly. That way, when someone shows up an hour after the server started, they only trigger one fetch, as the other hourly feeds expire over the next 5 minutes.
However, it only works for short page cache lengths. Right now, with a page cache of 10 minutes, the next user after it expires often has to wait for multiple fetches. So either shorten the page cache to around 1 minute, or take out the jitter.
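For reference, the jitter described above amounts to staggering each feed's expiry by a random few minutes; names here are illustrative, not from the project:

```python
import random

HOUR = 3600

def expiry_with_jitter(base=HOUR, spread=300):
    """Spread each feed's expiry over up to 5 extra minutes so one page
    view rarely has to refresh more than a feed or two at once."""
    return base + random.uniform(0, spread)
```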