Comments (4)

Byron commented on July 26, 2024

But I guess when doing this one would know they are ignored and not run gix clean -xd without specifying a path that omits them. Maybe precious files will ultimately be the solution to that.

Yes, precious files would add a layer of protection (though you shouldn't take my word for it :)). With the already implemented change, it would at least detect directly ignored repositories as such, so one has to pass -r explicitly.

Is this just for top-level ignored directories (i.e. those that are not subdirectories of ignored directories and that are ignored because they match an ignore pattern)?

If it applies to git repositories in all ignored directories no matter how deep down, then it would make it easier to understand -r because it would always be needed to delete nested repositories. Maybe that would go against the desire to be as fast as possible by limiting traversal. Then again, maybe that is not a problem in deletion, where one is always taking the time to traverse at least once anyway (since a recursive deletion traverses fully).

No, it's just for the top-level, no nesting. Indeed, this is for performance reasons and there is that warning indicating that ignored directories may include repositories (but they may not be one anymore).
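To illustrate the top-level-only behaviour described here, a minimal Rust sketch follows. It is not gix-dir's implementation: `collect_ignored_dirs`, `IgnoredDir` and the name-based ignore matching are hypothetical stand-ins. The walk never descends into an ignored directory, so only a `.git` entry directly inside it is checked, and repositories nested deeper go unnoticed.

```rust
use std::{fs, io, path::{Path, PathBuf}};

/// An ignored directory found by the walk (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub struct IgnoredDir {
    pub path: PathBuf,
    /// True if a `.git` entry sits directly inside the ignored directory.
    pub is_repository: bool,
}

/// Walk `root`, recording ignored directories without descending into them.
/// Assumption: a directory counts as ignored if its name is in `ignored_names`;
/// real ignore matching is pattern-based and much more involved.
pub fn collect_ignored_dirs(
    root: &Path,
    ignored_names: &[&str],
    out: &mut Vec<IgnoredDir>,
) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let name = entry.file_name();
        // Skip files and the repository's own `.git` directory.
        if !entry.file_type()?.is_dir() || name == ".git" {
            continue;
        }
        let path = entry.path();
        let ignored = name
            .to_str()
            .map_or(false, |n| ignored_names.iter().any(|i| *i == n));
        if ignored {
            // Top-level check only: one extra lookup, no traversal of the ignored tree.
            let is_repository = path.join(".git").exists();
            out.push(IgnoredDir { path, is_repository });
        } else {
            // Only non-ignored directories are traversed further.
            collect_ignored_dirs(&path, ignored_names, out)?;
        }
    }
    Ok(())
}
```

With `target/deep/nested/.git` on disk and `target` ignored, this sketch reports `target` with `is_repository: false`, which is the trade-off for not traversing ignored directories.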

This performance I really must protect: in GitButler, for instance, with node_modules/ and target/, it takes ~0.04s with gix clean -xd, but ~3.8s with git clean -nxd. It's day and night.

Interestingly, gix clean -xd --skip-hidden-repositories non-bare is still faster than git.

❯ hyperfine -M3 -w1 'git clean -nxd' 'gix clean -xd --skip-hidden-repositories non-bare'
Benchmark 1: git clean -nxd
  Time (mean ± σ):      4.404 s ±  0.738 s    [User: 0.390 s, System: 3.587 s]
  Range (min … max):    3.958 s …  5.255 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: gix clean -xd --skip-hidden-repositories non-bare
  Time (mean ± σ):      3.229 s ±  0.248 s    [User: 0.408 s, System: 2.621 s]
  Range (min … max):    3.075 s …  3.515 s    3 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (3.515 s). This could be caused by (filesystem) caches that were not filled until after the first run. You are already using the '--warmup' option which helps to fill these caches before the actual benchmark. You can either try to increase the warmup count further or re-run this benchmark on a quiet system in case it was a random outlier. Alternatively, consider using the '--prepare' option to clear the caches before each timing run.

Summary
  gix clean -xd --skip-hidden-repositories non-bare ran
    1.36 ± 0.25 times faster than git clean -nxd

from gitoxide.

Byron commented on July 26, 2024

Thanks so much for researching this and for the deep analysis - I was particularly impressed that you thought of submodules that have worktrees in the superproject.

Regarding the behaviour of git clean -f, I thought you might be interested in the undocumented 'double-force feature', which would then indeed remove nested repositories. It's worth noting that at some point git seemingly removed nested ignored repositories, but stopped doing so in favor of -ff. This is no argument at all for not protecting worktrees, just something I thought you might find interesting.

With that said, I also believe that it should never touch the worktrees of the repository it is run in, while being unsure of what to do with worktrees of submodules that are reaching into the superproject. My take here is that it would probably be so unlikely that it basically never happens, and if it does it's more of an accident. As such, it should probably show up as eligible to be cleaned. If one day that shouldn't be desired anymore, then implementing this will be trivial by collecting worktrees of all submodules recursively as well.

I see two points of action here:

  • add a way to classify worktrees and pass in their locations so that the algorithm can identify them. Having this as part of the API also makes callers aware.
  • when the mode is 'for deletion', always double-check if an ignored directory is also a repository based on the passed criterion (i.e. fast if it has a .git entry inside, or slow by an actual repository check). This would also mean that a 'status' would identify such a worktree as untracked, but clean would know more. That way, accidental deletion while ignored will be prevented, even if such a repository isn't a worktree. (One will have to specify -r.)
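The two points above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the gix-dir API: `RepoCheck`, `Disposition`, `looks_like_repository` and `classify_ignored_dir` are made-up names, and the "slow" variant is a crude stand-in for a real repository check.

```rust
use std::path::{Path, PathBuf};

/// How thoroughly to check whether a directory is a repository (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RepoCheck {
    /// Fast: the directory has a `.git` entry of any kind.
    DotGitEntry,
    /// Slower stand-in for an actual repository check: `.git` is a file
    /// (as in linked worktrees and submodules) or a directory containing `HEAD`.
    DotGitWithHead,
}

/// What a clean operation may do with an ignored directory (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum Disposition {
    /// A worktree passed in by the caller: never offered for deletion.
    ProtectedWorktree,
    /// Looks like a repository: only deleted when explicitly requested, e.g. with `-r`.
    NestedRepository,
    /// A plain ignored directory.
    Removable,
}

pub fn looks_like_repository(dir: &Path, check: RepoCheck) -> bool {
    let dot_git = dir.join(".git");
    match check {
        RepoCheck::DotGitEntry => dot_git.symlink_metadata().is_ok(),
        RepoCheck::DotGitWithHead => dot_git.is_file() || dot_git.join("HEAD").is_file(),
    }
}

/// Classify an ignored directory when the mode is 'for deletion'.
/// `worktrees` holds the worktree locations supplied by the caller.
pub fn classify_ignored_dir(dir: &Path, worktrees: &[PathBuf], check: RepoCheck) -> Disposition {
    if worktrees.iter().any(|wt| wt.as_path() == dir) {
        Disposition::ProtectedWorktree
    } else if looks_like_repository(dir, check) {
        Disposition::NestedRepository
    } else {
        Disposition::Removable
    }
}
```

Taking the worktree list as a parameter mirrors the idea that having it in the API makes callers aware of worktrees, while the repository check still guards ignored repositories that are not worktrees.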

EliahKagan commented on July 26, 2024

Regarding the behaviour of git clean -f, I thought you might be interested in the undocumented 'double-force feature', which would then indeed remove nested repositories. It's worth noting that at some point git seemingly removed nested ignored repositories, but stopped doing so in favor of -ff. This is no argument at all for not protecting worktrees, just something I thought you might find interesting.

Thanks--I was totally unaware of -ff!

With that said, I also believe that it should never touch the worktrees of the repository it is run in, while being unsure of what to do with worktrees of submodules that are reaching into the superproject. My take here is that it would probably be so unlikely that it basically never happens, and if it does it's more of an accident. As such, it should probably show up as eligible to be cleaned. If one day that shouldn't be desired anymore, then implementing this will be trivial by collecting worktrees of all submodules recursively as well.

It occurs to me--and maybe this is what you're already thinking of--that there is a case where a submodule's git worktree-managed worktree would be present in the superproject: if it is .gitignored in the superproject and created there deliberately so that it exists alongside the submodule's main worktree, in the same way that one would usually create a repository's extra worktrees in the parent directory of the repository's main working tree.

But I guess when doing this one would know they are ignored and not run gix clean -xd without specifying a path that omits them. Maybe precious files will ultimately be the solution to that.

when the mode is 'for deletion', always double-check if an ignored directory is also a repository based on the passed criterion

Is this just for top-level ignored directories (i.e. those that are not subdirectories of ignored directories and that are ignored because they match an ignore pattern)?

If it applies to git repositories in all ignored directories no matter how deep down, then it would make it easier to understand -r because it would always be needed to delete nested repositories. Maybe that would go against the desire to be as fast as possible by limiting traversal. Then again, maybe that is not a problem in deletion, where one is always taking the time to traverse at least once anyway (since a recursive deletion traverses fully).

Byron commented on July 26, 2024

Please note that this was implemented as a breaking change, which unfortunately prevents me from publishing a patch release. Thus this fix will be released in a month or two with the next regular 'breaking' one.

Technically, this is breaking just for gix-dir but not for gix, but the publishing system doesn't analyse the public API at all and thus relies on me flagging breaking changes, which are propagated 'downstream' within the workspace.
