
Comments (9)

j08lue commented on May 20, 2024

> I think I understand what you have in mind here and I agree with the idea, but can you elaborate on how you would implement this?

something like

```python
class SentinelAPI:

    def _progress_bar(self, iterable):
        # Lazy import, so tqdm is only needed when a progress bar is actually shown
        from tqdm import tqdm
        for package in tqdm(iterable):
            yield package

    def download(self):
        packages = []  # placeholder for the list of products to download
        for package in self._progress_bar(packages):
            _download_package(package)  # placeholder for the actual per-product download
```

so I can do

```python
api = SentinelAPI()
api._progress_bar = some_qgis_progress_bar_func
```
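
For illustration, such a drop-in only needs to keep the generator contract; a minimal hypothetical sketch (the print call stands in for whatever progress API the host application, e.g. QGIS, provides):

```python
def some_qgis_progress_bar_func(iterable):
    # Hypothetical drop-in: same generator contract as SentinelAPI._progress_bar,
    # but reporting progress through the host application's own UI.
    items = list(iterable)
    for i, package in enumerate(items, start=1):
        print(f"Downloading {i}/{len(items)}")  # swap for a real QGIS progress update
        yield package
```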

j08lue commented on May 20, 2024

I do not have much experience with (the shortcomings of) homura, but I can see several advantages of getting rid of that dependency chain, one being that we/I/someone could easily make a QGIS sentinelsat plugin.

So the progress bar with tqdm (very nice choice) would be based on (parallel) iteration over download chunks?
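
For reference, a minimal sketch of what chunked downloading with requests and tqdm could look like (the function name and parameters are illustrative, not an actual sentinelsat API):

```python
import requests
from tqdm import tqdm

def download_with_progress(url, path, chunk_size=2**20):
    # Stream the response and advance the progress bar once per received chunk.
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        total = int(response.headers.get("Content-Length", 0))
        with open(path, "wb") as outfile, tqdm(total=total, unit="B", unit_scale=True) as pbar:
            for chunk in response.iter_content(chunk_size=chunk_size):
                outfile.write(chunk)
                pbar.update(len(chunk))
```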

Ideally for me, the progress bar should be a lazy dependency and replaceable by a drop-in (e.g. by overwriting an API class method).

The maximum number of download threads would be two, obviously?

kr-stn commented on May 20, 2024

Great idea @valgur!

From what I can remember, homura was adopted by @willemarcel mostly for its convenience (integrated progress bar, etc.), so it is not set in stone as a dependency.

  • removing a compiled dependency is a big, big plus, especially on Windows or for novice Python users, and it enables easier integration into other tools like QGIS
  • we need to keep an eye on performance, but curl vs. requests benchmarks suggest large file transfers should be about the same speed, and small requests (the search queries) are already done with requests anyway
  • for most applications I don't see the benefit of parallel download support. On all systems I tested, throughput is saturated by a single transfer anyway. Since scihub limits each account to two connections at a time, we would lose the ability to download and search at the same time. This could pose problems should we change the query to a generator (#64). It would be nice to benchmark whether concurrent downloads are faster, but I doubt it.
  • better logging would be nice, also clearing the way for a --verbose or --log option for the CLI, which would help us analyse issues people are having (e.g. in #89); see the sketch after this list
  • I'm always in favour of better testing
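
As a rough illustration of the logging point above, the standard library's logging module could back such a flag; a sketch only (configure_logging is a hypothetical helper, not an existing sentinelsat option):

```python
import logging

def configure_logging(verbose=False, log_file=None):
    # Hypothetical helper behind a --verbose / --log CLI option: emit log
    # records to the console and, optionally, to a file.
    handlers = [logging.StreamHandler()]
    if log_file:
        handlers.append(logging.FileHandler(log_file))
    logging.basicConfig(
        level=logging.DEBUG if verbose else logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        handlers=handlers,
    )
```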

I'm all in favour of replacing homura with requests, mostly to get rid of the curl dependency. Better logging, testing and progress bars are added benefits.

valgur commented on May 20, 2024

> Ideally for me, the progress bar should be a lazy dependency and replaceable by a drop-in (e.g. by overwriting an API class method).

I think I understand what you have in mind here and I agree with the idea, but can you elaborate on how you would implement this?

> we need to keep an eye on performance, but curl vs. requests benchmarks suggest large file transfers should be about the same speed, and small requests (the search queries) are already done with requests anyway

I definitely agree. I've looked at the same results you linked to, but they measure a slightly different use case: the speed of running new queries against the server, while in our case we repeatedly receive chunks from the same open connection. I suspect pycurl and requests perform relatively close in such a case, but I think I'll run some benchmarks myself to verify that.
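
A simple way to check would be to time streamed chunk reads over a single connection, along these lines (a sketch; the URL would be a real product download):

```python
import time
import requests

def measure_throughput(url, chunk_size=2**20):
    # Rough benchmark: how fast chunks come off one open connection with requests.
    start = time.monotonic()
    received = 0
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        for chunk in response.iter_content(chunk_size=chunk_size):
            received += len(chunk)
    elapsed = time.monotonic() - start
    print(f"{received / 2**20 / elapsed:.1f} MB/s over {elapsed:.1f} s")
```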

Regarding parallel downloads, the number of download threads would be configurable and would default to two, indeed, for obvious reasons. In my experience, download speeds vary greatly depending on the load on the servers and on which specific server you happen to connect to. I suspect the latter because I've noticed the download speed toggling between a reasonable 8 MB/s and a meager 1 MB/s when restarting a download within just a minute.
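
As a sketch of what that could look like (download_one and the default of two workers are illustrative, not a final API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_all(products, download_one, max_workers=2):
    # Hypothetical parallel driver: download_one fetches a single product;
    # max_workers defaults to two to respect the hub's per-account connection limit.
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, product): product for product in products}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```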

It's also worth keeping in mind that sentinelsat can be used with other hosts besides the main Copernicus Data Hub. I have some experience using it with the Finnish Data Hub. The Finnish Hub does not limit the number of concurrent downloads or the download speed at all. I did some testing and I needed two parallel downloads there to max out my local connection. I assume the other national hosts are similar in being quite relaxed in their download limits.

Thanks for the feedback, both of you. I appreciate it.

kr-stn commented on May 20, 2024

> It's also worth keeping in mind that sentinelsat can be used with other hosts besides the main Copernicus Data Hub.

Good point. I just tested up to 4 connections to another hub.

If you want to implement threaded downloads, I think you should. Giving the users the option to set their perfect/allowed/preferred connection number is a good idea. I was worried about the programming effort necessary to implement it (plus tests for cases like one thread failing while another survives, etc.), but once the tests are in place, maintenance shouldn't be an issue. So if you want to do that - knock yourself out 👍

j08lue commented on May 20, 2024

Just curious, how is this coming @valgur?

valgur commented on May 20, 2024

I have not started working on it yet. Don't have much time to spare right now, unfortunately. Maybe I'll find a couple of days to hammer this out in the coming months, but you should consider this idea to be on hold unless someone else feels like working on it.

kr-stn commented on May 20, 2024

@valgur Don't stress yourself out implementing this. I think we can earmark this as one implementation step towards the 1.0 milestone, or even > 1.0. While the curl dependency is not ideal, it looks like the download is working stably as is for most users right now.

kr-stn commented on May 20, 2024

Included in v0.11 release.
