Comments (7)

nroi commented on August 19, 2024

Feature draft

This post is intended to summarize all information required to implement this feature, as well as information about what value this feature adds to Flexo.

Problem description:

Database files are currently not cached. With a large number of clients, this can add up in traffic. This is relevant especially for users with a slow internet connection or an ISP that throttles speed after a given amount of data has been downloaded (see also: #82 (comment)).

Background information:

Originally, no caching of database files was planned, so that Flexo would never serve outdated files. However, it turns out that some kind of caching should actually be possible:
Consider the case when pacman is used without Flexo. When pacman requests a database file, it sends the If-Modified-Since header. The remote mirror then either serves the file as usual if its copy of the database file is more recent than the timestamp in the header, or it just returns 304 Not Modified if no more recent file is available.
We therefore aim to implement something comparable for Flexo: If a new database file is available at the remote mirror, then Flexo should always serve this file instead of a stale, cached version. On the other hand, if Flexo already has the database file in a version that is more recent or just as recent as the version on the remote mirror, then no new download from a remote mirror should be required.
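The mirror-side decision described above can be sketched as follows. This is only an illustration of conditional-request semantics, not code from Flexo or from any particular mirror; the function name `mirror_status` and the choice to ignore unparsable headers are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def mirror_status(file_mtime: datetime, if_modified_since: Optional[str]) -> int:
    """Return the status code a mirror would plausibly send (hypothetical helper)."""
    if if_modified_since is None:
        return 200  # unconditional request: always send the file
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # assumption: an unparsable header is simply ignored
    if client_time.tzinfo is None:
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    newer = file_mtime.replace(microsecond=0) > client_time
    return 200 if newer else 304
```

With this rule, a client whose timestamp is equal to or later than the mirror's file time gets 304, which matches the "more recent or just as recent" case described above.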

Proposed solution:

  • Flexo stores database files locally.
  • Flexo behaves like pacman against the remote mirror: it sends the If-Modified-Since header when requesting database files. The value of this header should be the Modify or Change timestamp of the database file (need to find out which one pacman uses).
  • If the remote mirror then responds with 304, we just assume that the locally cached version is not stale, and serve this one to the requesting client.
  • If the remote mirror responds with 2xx, then we overwrite the locally stored version with the payload served by the remote mirror.
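The four steps above could look roughly like this. It is a minimal sketch and not Flexo's actual implementation: `CacheEntry`, `serve_database`, the in-memory `cache` dict and the `fetch` callable (which stands in for the request to the remote mirror) are all hypothetical names, and raising an error for other status codes is an assumption, since the proposal does not cover that case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CacheEntry:
    body: bytes
    modified: str  # HTTP-date sent back to the mirror as If-Modified-Since


# fetch(name, if_modified_since) -> (status, body, modified); stands in for the mirror.
Fetch = Callable[[str, Optional[str]], Tuple[int, bytes, str]]


def serve_database(name: str, cache: Dict[str, CacheEntry], fetch: Fetch) -> bytes:
    entry = cache.get(name)
    since = entry.modified if entry else None
    status, body, modified = fetch(name, since)
    if status == 304 and entry is not None:
        # The mirror has nothing newer: serve the locally stored copy.
        return entry.body
    if 200 <= status < 300:
        # The mirror sent a payload: overwrite the locally stored copy.
        cache[name] = CacheEntry(body=body, modified=modified)
        return body
    # Assumption: other statuses are treated as errors.
    raise RuntimeError(f"unexpected status {status} for {name}")
```

Passing `fetch` in as a parameter keeps the decision logic testable without any network access.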

from flexo.

nroi commented on August 19, 2024

@Zebradil Thanks for pointing this out. pacman sends the If-Modified-Since header, for example:

If-Modified-Since: Sun, 30 Jan 2022 10:17:26 GMT

Which means that the mirror may respond with a 304 Not Modified instead of sending the entire payload.

The timestamp seems to be set according to the Modify or Change timestamp of the file in /var/lib/pacman/sync. If you run sudo touch -m /var/lib/pacman/sync/core.db, then pacman sends a new If-Modified-Since timestamp.
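As a rough illustration, a header value in that format can be derived from a file's timestamp with Python's standard library. The sketch assumes the Modify timestamp (`st_mtime`) is the relevant one, which has not been confirmed here; the function names are hypothetical.

```python
import os
from email.utils import formatdate


def http_date(timestamp: float) -> str:
    """Format a Unix timestamp as an HTTP-date, which is always expressed in GMT."""
    return formatdate(timestamp, usegmt=True)


def if_modified_since_for(path: str) -> str:
    """Build the If-Modified-Since value from a file's Modify time (assumption)."""
    return http_date(os.stat(path).st_mtime)
```

For example, `http_date(1643537846)` yields `Sun, 30 Jan 2022 10:17:26 GMT`, the value shown in the header above.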

It makes sense for flexo to behave like pacman, so this is something that should change in flexo.

Zebradil commented on August 19, 2024

I also see an opportunity for improvement here. Maybe it makes sense to check how pacman handles this, because when I don't use flexo, database files are cached somehow.

sudo pacman -Sy
:: Synchronizing package databases...
 core is up to date
 extra is up to date
 community is up to date
 multilib is up to date

But when I use flexo, the database files are always being downloaded.

I can't check how pacman works right now, but I'll try to figure this out later.

nroi commented on August 19, 2024

Hi @bernhard-da,
although the logs may look like there is some kind of problem, Flexo works as intended here: Notice that the messages saying xxx is not available and xxx was unavailable at all remote mirrors only appear for those files that end with .db.sig, but .db files are served just fine. Flexo does not find db.sig files because they are simply not available at the remote mirror. Have a look at this thread where one of the Arch Linux maintainers explains:

Because the databases are not signed yet. The process for doing that is still being worked out...

So, the current status (even if you don't use Flexo) is that Pacman requests those files, receives a 404 response and then just silently ignores the response.

As I have quite a large number of internal clients the traffic (e.g from community.db) adds up over time.

Files ending with .db are another story: Flexo serves the .db files, but it does not cache them. This is intentional, and it cannot be changed at the moment. If Flexo cached database files like normal files, then clients would eventually receive outdated database files. Of course, one could implement some special caching logic for database files and only cache them for a configurable duration (e.g., so you can configure Flexo to serve the database from cache if the cached version is not more than one hour old). But I decided against this because I found that the benefit does not justify the added complexity. The community.db file is currently just ~ 6 MB, so I never saw an issue in downloading this file a couple of times.
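For illustration, the duration-based alternative mentioned above boils down to a freshness check like the following. This is a sketch of the idea only, not something Flexo implements; `is_fresh` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(cached_at: datetime, max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """Return True if a cached database file is still within the configured age."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - cached_at <= max_age
```

With `max_age=timedelta(hours=1)`, a file cached 59 minutes ago would be served from cache, while one cached 61 minutes ago would be fetched from the mirror again.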

May I ask how fast your internet connection is? Did you notice this behavior because pacman was slow to download the database files, or did you notice this just by inspecting Flexo's logs?

bernhard-da commented on August 19, 2024

hi @nroi
thx a lot for your detailed answer; indeed I was not really wondering about the .sig files but about the [CACHE MISS] for the .db files;

your explanation does make perfect sense. to answer your question:

May I ask how fast your internet connection is? 
Did you notice this behavior because pacman was slow to download the database files, or did you notice this just by inspecting Flexo's logs?

yes, i have an unreliable internet-connection which is often slow too (max around 20mbit down) and also my isp throttles speeds after a specific amount of downloaded data; so i realized that pacman was slow (on many clients) downloading the same .db files and I also noticed that the (total) size of downloaded .db files was quite high.

nroi commented on August 19, 2024

i have an unreliable internet-connection which is often slow too (max around 20mbit down) and also my isp throttles speeds after a specific amount of downloaded data; so i realized that pacman was slow (on many clients) downloading the same .db files and I also noticed that the (total) size of downloaded .db files was quite high.

I see. I guess there are other users with similar issues. In that case, I might reconsider whether it makes sense to implement some caching mechanism for database files. This should probably be disabled by default, and the duration after which locally stored database files are considered stale and downloaded again should be configurable.

But don't expect this to be implemented very soon, I'm currently prioritizing changes that improve the code-maintainability over new features.

bernhard-da commented on August 19, 2024

@nroi fair enough. thx again for your comments and working on flexo :)
