xolox commented on August 31, 2024

Hi Alexander and thanks for the feedback!

Short answer without details: the scipy package doesn't actually compile the second time. It is successfully cached, and on the second run the cached binary package is correctly reused. If you are hesitant to accept this explanation, I created a gist that demonstrates my statement. As shown in the linked gist, the first run takes 14 minutes and the second run takes 2 minutes (the 12 minute difference is that scipy doesn't have to be compiled the second time :-). However, subsequent runs are slower than they could be, and I can explain why.

Explanation of the cause of the slowdown: the scipy package depends on numpy and lists numpy as a setup_requires dependency in its setup.py script. What this comes down to is that every time scipy's setup.py script is run, it will download and build numpy (this is simplifying a bit; the next section contains more details).

General rant about the setup_requires "feature" (with more details): To be very blunt, the setup_requires feature of setuptools is tricky and hard to use "correctly", and consequently most people indeed don't use it "correctly". The effect is quite annoying and easy to demonstrate:

  1. Create an empty, clean Python virtual environment and activate it.
  2. Manually download the scipy source distribution archive (the *.tar.gz file) and unpack it.
  3. Navigate into the unpacked scipy source distribution and run an "innocent looking" command like python setup.py --version and be amazed while the answer to this very simple question takes a couple of minutes because numpy is being downloaded and compiled before the version number of the scipy source distribution is printed ಠ_ಠ
  4. The next time you run that command it will be fast because the setup_requires dependency is already satisfied (you can check with ls -ld numpy-*.egg), but if you unpack the source distribution to a clean directory and run step 3 again it will again be slow.
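The check from step 4 can also be done from Python. The sketch below uses a hypothetical helper, `leftover_setup_requires` (not part of setuptools or pip-accel), that lists the setup_requires eggs left behind in an unpacked source distribution. It looks both next to setup.py, where older setuptools releases put `*.egg` directories, and in the `.eggs` subdirectory that setuptools 7.0 and later use.

```python
from pathlib import Path


def leftover_setup_requires(source_dir):
    """List setup_requires eggs left behind in an unpacked source distribution.

    Older setuptools releases put these eggs directly next to setup.py
    (e.g. numpy-*.egg), while setuptools 7.0 and later collect them in a
    .eggs subdirectory, so both locations are checked.
    """
    root = Path(source_dir)
    # "*.egg" matches numpy-*.egg style entries but not the ".eggs" directory.
    found = [p.name for p in root.glob("*.egg")]
    eggs_dir = root / ".eggs"
    if eggs_dir.is_dir():
        found.extend(p.name for p in eggs_dir.glob("*.egg"))
    return sorted(found)
```

Running it against a freshly unpacked scipy directory should return an empty list; after the slow `python setup.py --version` run it should list the numpy egg that setup_requires pulled in.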

How this problem can be resolved: I've actually fixed this exact issue before by sending pull requests to Python projects, see e.g. pyca/cryptography#1257. So to bring this issue to some sort of conclusion: it is possible to fix this, but it requires cooperation from the package author(s), because I can't think of any way for pip-accel to fix this issue "from the outside". After all, the issue can be clearly demonstrated without invoking pip-accel at all :-).
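For package authors, one way to apply such a fix inside setup.py is to declare setup_requires only when the command being run may actually build something. This is an illustrative sketch under that assumption, not the code from pyca/cryptography#1257: the helper names, the set of "metadata only" arguments and the placeholder `example` package are all hypothetical.

```python
import sys

# Illustrative assumption: arguments that only print metadata or clean up,
# so they never need numpy to be importable.
NO_SETUP_REQUIRES_ARGS = {
    "-h", "--help", "--help-commands",
    "-V", "--version", "--name", "--fullname",
    "egg_info", "--egg-base", "clean",
}


def needs_setup_requires(argv):
    """Return True if this setup.py command line may trigger a real build."""
    args = argv[1:]
    for i, arg in enumerate(args):
        if arg in NO_SETUP_REQUIRES_ARGS:
            continue
        if i > 0 and args[i - 1] == "--egg-base":
            # pip runs "setup.py egg_info --egg-base DIR"; DIR is only a path.
            continue
        return True
    # No arguments at all (just usage output) or only metadata arguments.
    return False


def setup_keywords(argv=None):
    """Build setup() keywords, adding setup_requires only when needed."""
    argv = sys.argv if argv is None else argv
    keywords = {"name": "example", "version": "1.0"}  # placeholder metadata
    if needs_setup_requires(argv):
        keywords["setup_requires"] = ["numpy"]
    return keywords
```

In a real setup.py the result would be passed on as `setup(**setup_keywords())`. With this in place, `python setup.py --version` and pip's `egg_info` step would no longer trigger a numpy build, while `python setup.py install` would still get numpy first.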

Given my extensive explanation here and the additional information available in pyca/cryptography#1257, do you feel like creating a pull request for the scipy project?

I can do it as well, but 1) I have more than 30 open source projects to maintain and am already slacking on half of them because I just don't have the time and 2) I've never actually used scipy and have zero experience with it (I actually spent quite a bit of time determining its build dependencies before I was able to reproduce this issue! :-)

xolox commented on August 31, 2024

A small follow up:

I couldn't believe that this "issue" would have gone unfixed in such a popular, high-profile Python package as scipy, and it actually looks like the scipy people are aware of the issue and working on it. Between all of the issues, pull requests and commits I still don't see a full solution emerging, but maybe I'm confused at this point. The most useful reference I was able to find is scipy/scipy#453. Note that the last comment in that pull request is only two months old and, if I understand correctly, states that the issue hasn't been fully fixed in a released version.

Suor commented on August 31, 2024

Hello, and thank you for this explanation.

I, however, think you can fix it from the outside, with some hassle :) Here is what you can do:

  • when doing bdist_dumb, look for the ./.eggs directory and save it along with the binary distribution,
  • when installing, move that .eggs directory into the current directory, so that setup_requires will be satisfied.

Alternatively, you can keep a .eggs directory for each Python version, holding all sorts of stuff, and symlink it into the current directory before installation.
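The symlink variant of this proposal could look roughly like the sketch below. `link_shared_eggs` and the `cache_root/pythonX.Y/.eggs` layout are hypothetical; the sketch also assumes setuptools 7.0 or later (which stores setup_requires eggs in `.eggs`) and a platform that supports symbolic links.

```python
import os
import sys
from pathlib import Path


def link_shared_eggs(source_dir, cache_root):
    """Point source_dir/.eggs at a .eggs cache shared per Python version.

    Eggs that setuptools fetches for setup_requires then land in (and are
    found again in) the shared directory instead of a fresh, per-sdist one.
    """
    version_tag = "python%d.%d" % sys.version_info[:2]
    shared = Path(cache_root) / version_tag / ".eggs"
    shared.mkdir(parents=True, exist_ok=True)
    link = Path(source_dir) / ".eggs"
    # Leave an existing .eggs directory (or link) in the sdist alone.
    if not link.exists() and not link.is_symlink():
        os.symlink(shared, link)
    return link
```

Because every unpacked source distribution links to the same per-version directory, an egg built for one package's setup_requires is visible to the next one.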

xolox commented on August 31, 2024

Sorry for the long delay in replying here, other issues and projects got in the way of finishing this reply and summarizing my thoughts about your proposal to fix this from the outside.


If it turns out that there is a way to:

  1. Reliably cache setup_requires dependencies
  2. That will work for all users
  3. Without major downsides

Then I'm all for it and don't mind spending time on implementing this. However, I'm pretty sure things are not quite as simple as you explain them in your last comment, and this is the reason why I never seriously tackled this issue inside pip-accel before now:

  • In most use cases of pip-accel the user runs either pip-accel install PKG_NAME or pip-accel install -r REQ_FILE. The way this works internally is that pip-accel asks pip to (1) download and (2) unpack the requested Python packages and (3) collect metadata about the packages. The first feedback pip-accel gets from pip after this is when step (3) has already happened. But step (3) is the step which invokes the installation of setup_requires packages. Catch-22! So properly speaking there is no place for me to put the proposed "Please ensure this .eggs symbolic link exists" logic.
    • For the above point to make sense you must realize that during a run of pip-accel we can't talk about a single "working directory" because every source distribution is unpacked to a separate directory and when pip runs setup.py egg_info it does so in the directory that contains the setup.py script. So during a single run of pip-accel we could be talking about a dozen "working directories" that each get their own setup_requires packages.
  • As I said above, properly speaking there is no way, but of course there is always a way; it's just that it might become somewhat nasty :-). I'd basically have to monkey patch pip so that pip-accel gets a chance to create the symbolic link after pip has unpacked the source distribution but before it runs the python setup.py egg_info command that triggers the installation of setup_requires packages. Looking into the pip source code, it seems that I would have to monkey patch / override one or two of the below methods:
    • pip.req.run_egg_info()
    • pip.req.req_install.run_egg_info()
  • Your proposal depends on the use of the .eggs directory, which was added in setuptools 7.0. Before that version setuptools would create *.egg directories in each working directory. Theoretically I could proactively link all of these *.egg directories for each unpacked source distribution, but I'm not going to do this because it's madness. So if I want this to work for all users, I'd need to make sure that setuptools 7.0 or higher is installed. The natural way to do that is to just give pip-accel a setuptools >= 7.0 dependency.
  • If all of the above points are taken into account and this caching of setup_requires packages turns out to work, users would still be compiling setup_requires packages more times than strictly necessary, because the setuptools .eggs cache and the pip-accel binary cache aren't shared. Of course this is still a lot better than recompiling setup requirements on every single run!
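The monkey patching described above follows a general "run a hook before the original method" pattern. The sketch below shows that pattern on a stand-in `InstallRequirement` class rather than on pip itself, because pip's internals differ between versions; `run_before` and `prepare_eggs` are hypothetical names and the hook only records what it would do.

```python
import functools


class InstallRequirement:
    """Stand-in for pip's class of the same name (not the real pip API)."""

    def __init__(self, source_dir):
        self.source_dir = source_dir
        self.log = []

    def run_egg_info(self):
        # In pip this is where "python setup.py egg_info" runs, which is
        # also the moment setup_requires get installed.
        self.log.append("egg_info")


def run_before(cls, method_name, hook):
    """Monkey patch cls.method_name so that hook(self) runs first."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def patched(self, *args, **kwargs):
        hook(self)
        return original(self, *args, **kwargs)

    setattr(cls, method_name, patched)
    return original  # keep it so the patch can be undone later


def prepare_eggs(requirement):
    # Hypothetical hook: this is where a .eggs symlink would be created.
    requirement.log.append("link .eggs in " + requirement.source_dir)


original = run_before(InstallRequirement, "run_egg_info", prepare_eggs)
req = InstallRequirement("/tmp/build/scipy")
req.run_egg_info()
print(req.log)  # ['link .eggs in /tmp/build/scipy', 'egg_info']
```

Returning the original method makes the patch reversible, which matters for the upgrade concern below: every such patch is one more place that can break when pip's internals change.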

The best way for me to find out how realistic all of this is would be to (try to) implement the required changes. The issues caused by setup_requires have been a thorn in my side even before I created pip-accel and since then it hasn't gotten any better, so believe me when I say that I'd love to improve how this works :-).

However, there is also issue #57 suggesting an upgrade from pip 6.x to pip 7.x, and it seems wise to tackle that upgrade before I introduce yet more monkey patching (as explained above), because every additional pip monkey patch in pip-accel makes it a bit harder for me to upgrade to a new major version of pip.

Suor commented on August 31, 2024

Would be nice to see it finally :), but no rush.

xolox commented on August 31, 2024

Hi Alexander,

Sorry things took so long, and thanks for your persistence in getting this fixed on the side of pip-accel. I just released pip-accel 0.39, which 1) depends on setuptools >= 7.0 and 2) manages the creation of .eggs symbolic links to avoid recompilation of setup requirements.
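pip-accel 0.39 handles the setuptools requirement declaratively, as a package dependency. If another tool wanted to check at runtime whether the installed setuptools is new enough to use the .eggs directory, a minimal sketch could look like this. `supports_dot_eggs` is a hypothetical helper, and comparing only the major version number is a deliberate simplification of full version ordering.

```python
import re


def supports_dot_eggs(setuptools_version):
    """Check whether a setuptools version is 7.0 or newer.

    Simplification: only the leading major number is compared, so a
    pre-release such as "7.0b1" is treated as if it were 7.0.
    """
    match = re.match(r"(\d+)", setuptools_version)
    if match is None:
        raise ValueError("unrecognized version: %r" % setuptools_version)
    return int(match.group(1)) >= 7
```

For example, `supports_dot_eggs(setuptools.__version__)` checks the setuptools that is currently importable.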

I've now tested this with a couple of packages including the SciPy / NumPy combination and it seems to work very well! There is even an automated test to verify the functionality - getting the test to work correctly in all environments was actually a lot more work than the feature itself :-).


Given that I now have a way to manipulate unpacked source distributions before they are processed by pip, I'm considering extending this logic to inject allow_hosts and find_links options that can keep setuptools (easy_install) off the internet. Not sure how that would work yet, but I'm thinking about it :-).
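Since the exact mechanism is still open, the following is only one possible shape, assumed here rather than taken from pip-accel: write an `[easy_install]` section into the unpacked source distribution's setup.cfg, because setuptools passes options from that section (including find_links and allow_hosts) on to easy_install when it fetches setup_requires. `inject_easy_install_options` is a hypothetical helper and the values used with it are placeholders.

```python
import configparser
from pathlib import Path


def inject_easy_install_options(source_dir, find_links, allow_hosts):
    """Write find_links and allow_hosts into [easy_install] of setup.cfg."""
    setup_cfg = Path(source_dir) / "setup.cfg"
    # interpolation=None keeps '%' literal; optionxform=str keeps key case.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    parser.read(str(setup_cfg))  # a missing setup.cfg is silently skipped
    if not parser.has_section("easy_install"):
        parser.add_section("easy_install")
    # find_links adds extra places to look for distributions; allow_hosts
    # restricts which hostnames easy_install may contact.
    parser.set("easy_install", "find_links", find_links)
    parser.set("easy_install", "allow_hosts", allow_hosts)
    # Note: rewriting the file drops any comments the original setup.cfg had.
    with setup_cfg.open("w") as handle:
        parser.write(handle)
    return setup_cfg
```

A call such as `inject_easy_install_options(unpacked_dir, "/srv/dists", "localhost")` would, under these assumptions, make setup_requires look in a local directory first and refuse to download from other hosts.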
