Comments (9)

JasonGross commented on May 29, 2024

All the information you might want is on the colab I linked in the original issue report, which reproduces the issue in full...

The file size shouldn't be relevant, since, as I point out above, 49s (out of 51) is spent downloading URLs. But, sure, each artifact has one file in it, the average file size is 130.64 KB, and the total size of the files is 38.53 MB. I added a cell to the bottom of the colab calculating this. But, as I said, the problem is that wandb is spending 49 seconds downloading URLs when the files are already downloaded.
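
As a quick sanity check on these figures, the sketch below derives the artifact count from the quoted average and total sizes. It assumes one file per artifact (as stated) and binary units (1 MB = 1024 KB), which is a guess about how the colab cell reports sizes; the variable names are illustrative only.

```python
# Figures quoted in the comment above.
avg_file_kb = 130.64   # average file size
total_mb = 38.53       # total size of all files

# Assumption: binary units (1 MB = 1024 KB) and one file per artifact.
estimated_artifacts = total_mb * 1024 / avg_file_kb
print(f"estimated artifacts: {estimated_artifacts:.0f}")

# Share of wall time reported as spent downloading URLs (49 s of 51 s).
url_fraction = 49 / 51
print(f"time spent on URLs: {url_fraction:.0%}")
```

Under those assumptions the estimate comes out at about 302 artifacts, which agrees with the 302 calls to `download` in the profile posted in this thread.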

I also don't see how this could possibly be related to slow uploads, because, as I've said, the problem is spending time downloading URLs. I gave a profile including all relevant lines when reporting this issue, and the colab contains the complete profile.

0.16.1 is not capable of downloading artifacts at all, erroring with "CommError: It appears that you do not have permission to access the requested resource. Please reach out to the project owner to grant you access. If you have the correct permissions, verify that there are no issues with your networking setup.(Error 404: Not Found)" (#6783)

0.15.12 is basically the same as 0.16.3 (3m1s with no cache; 53s when everything is already cached).

from wandb.

JasonGross commented on May 29, 2024

I have added another cell at the bottom to test this option, but I expect this won't help at all. The problem is not that downloading is slower with the cache than without---it's not. Downloading with the cache takes 45 seconds while downloading without the cache takes 3 minutes and 17 seconds.
The problem is that even when the cache is turned on and the artifacts are all in the cache already, wandb is 20x slower at "downloading" artifacts than it needs to be: only about 5% of the time is spent fetching the artifacts from the cache, and the other 95% is spent downloading/fetching URLs.
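
The "20x" figure follows from the 5%/95% split: if only the cache reads are necessary, the best-case speedup is the reciprocal of that share. A small worked sketch using the numbers from this comment (the variable names are illustrative, and it assumes the URL fetching could be skipped entirely):

```python
cached_run_s = 45.0        # cached download time quoted above
useful_fraction = 0.05     # share spent actually reading from the cache

# If the URL fetching were skipped entirely, only the useful share remains.
ideal_run_s = cached_run_s * useful_fraction
max_speedup = 1 / useful_fraction

print(f"ideal cached run: {ideal_run_s:.2f} s")
print(f"upper bound on speedup: {max_speedup:.0f}x")
```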

jlzhao27 commented on May 29, 2024

Hey @JasonGross, thank you for the detailed report and sorry for the delayed response. I'm on the engineering team and can confirm that this is definitely an issue that we can address. We've filed an internal ticket to track it and will pick this up in the near future. I don't have an exact timeline right now as we're tackling several other high-priority performance tasks. While we're slower than we could be in the case of cached artifacts, the impact of the extra URL signing is small compared to some other use cases dealing with reference artifacts and large-scale downloads. We're currently working on improving performance for those cases, which is why it will take a little longer to get to this issue.

I just wanted to share the context and some internal roadmap for the team so you are aware of what is going on. Again, thank you for the detailed feedback and we will aim to get to it as soon as we can!

moredatarequired commented on May 29, 2024

@JasonGross @benrhodes26, I created #7245 to address this, but you might be disappointed at how little of a difference it makes if your script downloads all the artifacts in series. We still have to make a network request to get the manifest for each artifact--in your script this gets glossed over because you load the manifests when you pre-load the artifacts, but in a typical case you still need to download them.

Using your script on the current wandb release I was able to get it to run in <6 seconds by parallelizing the requests:

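from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor() as executor:
    # Submit one download per artifact; leaving the block waits for all of them.
    executor.map(lambda a: a.download(), logged_artifacts)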

Using your script as-is on my feature branch it runs in <200ms, but that ignores the 1m23s it takes to load the initial manifests. If you parallelize the initial load it finishes in <6 seconds.
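
To illustrate why parallelizing helps when each call is dominated by network latency, here is a self-contained sketch that replaces the real download with a simulated request. `fake_download` and the 50 ms delay are hypothetical stand-ins and are not part of wandb; only `ThreadPoolExecutor` is the real standard-library API used in the snippet above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item: int) -> int:
    """Hypothetical stand-in for artifact.download(): waits like a network call."""
    time.sleep(0.05)  # assumed per-request latency, for illustration only
    return item


items = list(range(16))

# Serial: total time is roughly len(items) * latency.
start = time.perf_counter()
serial_results = [fake_download(i) for i in items]
serial_s = time.perf_counter() - start

# Parallel: threads overlap the waits, so total time approaches one latency
# per batch of max_workers requests.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=16) as executor:
    # list() consumes the results so exceptions from workers are raised here.
    parallel_results = list(executor.map(fake_download, items))
parallel_s = time.perf_counter() - start

print(f"serial: {serial_s:.2f} s, parallel: {parallel_s:.2f} s")
```

Threads suit this case because the work is waiting on I/O rather than computing; actual gains with wandb will depend on server-side limits and the number of workers.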

thanos-wandb commented on May 29, 2024

Hi @JasonGross, thank you for reporting this issue. In wandb v0.16.3 we added the option to skip caching. Can you please try downloading the artifacts as follows and let us know whether that resolves the issue reported here?

a = artifact.download(skip_cache=True)

JasonGross commented on May 29, 2024

Profile of downloading artifacts without cache:
         9009763 function calls (8946111 primitive calls) in 204.670 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000  204.681  102.341 /usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py:3512(run_code)
      4/2    0.000    0.000  204.681  102.341 {built-in method builtins.exec}
        1    0.000    0.000  204.681  204.681 <ipython-input-7-c296c4927149>:1(<cell line: 3>)
        1    0.004    0.004  204.069  204.069 <ipython-input-7-c296c4927149>:3(<listcomp>)
      302    0.020    0.000  201.967    0.669 /usr/local/lib/python3.10/dist-packages/wandb/sdk/artifacts/artifact.py:1557(download)
      302    0.051    0.000  201.833    0.668 /usr/local/lib/python3.10/dist-packages/wandb/sdk/artifacts/artifact.py:1683(_download)
      910    0.026    0.000  134.378    0.148 /usr/local/lib/python3.10/dist-packages/requests/sessions.py:502(request)
      910    0.038    0.000  132.273    0.145 /usr/local/lib/python3.10/dist-packages/requests/sessions.py:673(send)
      910    0.036    0.000  131.859    0.145 /usr/local/lib/python3.10/dist-packages/requests/adapters.py:434(send)
      910    0.036    0.000  130.531    0.143 /usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py:596(urlopen)
      910    0.034    0.000  129.857    0.143 /usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py:381(_make_request)
      910    0.049    0.000  103.747    0.114 /usr/local/lib/python3.10/dist-packages/urllib3/connection.py:435(getresponse)
      910    0.013    0.000  103.441    0.114 /usr/lib/python3.10/http/client.py:1331(getresponse)
      910    0.032    0.000  103.378    0.114 /usr/lib/python3.10/http/client.py:311(begin)
      910    0.031    0.000  102.877    0.113 /usr/lib/python3.10/http/client.py:278(_read_status)
    22359    0.029    0.000  102.852    0.005 {method 'readline' of '_io.BufferedReader' objects}
      925    0.012    0.000  102.823    0.111 /usr/lib/python3.10/socket.py:691(readinto)
      925    0.008    0.000  102.809    0.111 /usr/lib/python3.10/ssl.py:1292(recv_into)
      925    0.009    0.000  102.800    0.111 /usr/lib/python3.10/ssl.py:1150(read)
      925  102.791    0.111  102.791    0.111 {method 'read' of '_ssl._SSLSocket' objects}
      906    0.028    0.000   94.507    0.104 /usr/local/lib/python3.10/dist-packages/wandb/sdk/artifacts/artifact.py:610(manifest)
      302    0.011    0.000   71.980    0.238 /usr/local/lib/python3.10/dist-packages/wandb/sdk/artifacts/artifact.py:2166(_load_manifest)
      302    0.005    0.000   71.825    0.238 /usr/local/lib/python3.10/dist-packages/requests/api.py:62(get)
      302    0.008    0.000   71.820    0.238 /usr/local/lib/python3.10/dist-packages/requests/api.py:14(request)
    14433   65.397    0.005   65.397    0.005 {method 'acquire' of '_thread.lock' objects}
      911    0.007    0.000   65.259    0.072 /usr/lib/python3.10/threading.py:589(wait)
     1183    0.016    0.000   65.253    0.055 /usr/lib/python3.10/threading.py:288(wait)
      604    0.013    0.000   64.593    0.107 /usr/lib/python3.10/concurrent/futures/_base.py:201(as_completed)
  910/608    0.008    0.000   64.324    0.106 /usr/local/lib/python3.10/dist-packages/wandb/sdk/lib/retry.py:210(wrapped_fn)
  910/608    0.031    0.000   64.319    0.106 /usr/local/lib/python3.10/dist-packages/wandb/sdk/lib/retry.py:94(__call__)
      608    0.006    0.000   63.728    0.105 /usr/local/lib/python3.10/dist-packages/wandb/apis/public/api.py:66(execute)
      608    0.005    0.000   63.722    0.105 /usr/local/lib/python3.10/dist-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py:48(execute)
      608    0.016    0.000   63.716    0.105 /usr/local/lib/python3.10/dist-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py:58(_get_result)
      608    0.022    0.000   63.698    0.105 /usr/local/lib/python3.10/dist-packages/wandb/sdk/lib/gql_request.py:42(execute)
      608    0.007    0.000   62.635    0.103 /usr/local/lib/python3.10/dist-packages/requests/sessions.py:626(post)
      302    0.006    0.000   42.088    0.139 /usr/local/lib/python3.10/dist-packages/wandb/sdk/artifacts/artifact.py:1773(_fetch_file_urls)
      910    0.006    0.000   25.508    0.028 /usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py:1089(_validate_conn)
      302    0.012    0.000   25.500    0.084 /usr/local/lib/python3.10/dist-packages/urllib3/connection.py:609(connect)
      302    0.014    0.000   24.446    0.081 /usr/local/lib/python3.10/dist-packages/urllib3/connection.py:708(_ssl_wrap_socket_and_match_hostname)
      302    0.008    0.000   24.158    0.080 /usr/local/lib/python3.10/dist-packages/urllib3/util/ssl_.py:398(ssl_wrap_socket)
      302   22.790    0.075   22.790    0.075 {method 'load_verify_locations' of '_ssl._SSLContext' objects}
      303    0.005    0.000    2.098    0.007 /usr/local/lib/python3.10/dist-packages/tqdm/std.py:1160(__iter__)
        5    0.001    0.000    1.953    0.391 /usr/local/lib/python3.10/dist-packages/wandb/apis/paginator.py:55(_load_page)
Indeed this does not address the issue here
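
The request counts and cumulative times in the profile above can be cross-checked with a little arithmetic. The numbers below are copied from the profile; attributing the `requests/api.py(get)` row to manifest loading and the `sessions.py(post)` row to GraphQL calls is an interpretation of the call counts, not something the profile states outright.

```python
total_s = 204.670          # total profiled time
request_s = 134.378        # cumtime of requests/sessions.py(request)

get_calls = 302            # requests/api.py(get), same count as _load_manifest
post_calls = 608           # requests/sessions.py(post), same count as gql execute
all_calls = get_calls + post_calls   # should match the 910 calls to request()

artifacts = 302            # calls to Artifact.download
print(f"requests per artifact: {all_calls / artifacts:.2f}")
print(f"share of time in HTTP requests: {request_s / total_s:.0%}")
print(f"mean time per request: {request_s / all_calls * 1000:.0f} ms")
```

Read this way, each artifact costs about three HTTP round trips, and roughly two thirds of the uncached run is spent inside `requests`.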

thanos-wandb commented on May 29, 2024

Thank you so much @JasonGross for the detailed download tests. Would it be possible to share some details: how many files you have in your artifacts, what type they are, and what their individual sizes are? Also, any chance you could downgrade to wandb 0.16.1 and try again? There seems to be a reported issue with upload speed in 0.16.3, and we wanted to confirm they're not related.

thanos-wandb commented on May 29, 2024

Thank you @JasonGross for the detailed information and for the Colab repro. This is a great suggestion, and I've logged it as a feature request. We're focusing on performance overall, including artifacts download/upload, and that's certainly something we would like to address with either of your suggested solutions. I will keep you updated on its progress here.

benrhodes26 commented on May 29, 2024

+1. I do think this deserves attention. Fixing would be a noticeable QoL boost.
