
Comments (10)

joshtriplett commented on July 30, 2024

For example, cloning into the same location will have one git process fail early as it races to 'creation of .git directory' only, instead of racing for moving the cloned repository into place. Interestingly, when kill -9ed, git probably would be unable to recover such a partial clone as it couldn't differentiate a race from a dead process.

You could potentially handle that with file locks, which don't outlive the process holding them.
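A minimal sketch of that property, assuming a Unix system and Python's `fcntl.flock` (the file name `clone.lock` and the helper names are made up for illustration): a child process takes an exclusive lock and is then killed with SIGKILL, after which the lock is free again without any cleanup code having run.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Child program: take an exclusive lock, report it, then hang until killed.
CHILD = """
import fcntl, sys, time
f = open(sys.argv[1], "a")
fcntl.flock(f, fcntl.LOCK_EX)
print("locked", flush=True)
time.sleep(60)
"""


def try_lock(path):
    """Return an open, locked file object, or None if the lock is held elsewhere."""
    f = open(path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except BlockingIOError:
        f.close()
        return None


def demo():
    """Return (lock held while child is alive, lock free after kill -9)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "clone.lock")  # hypothetical lock file name
        child = subprocess.Popen(
            [sys.executable, "-c", CHILD, path],
            stdout=subprocess.PIPE,
            text=True,
        )
        child.stdout.readline()  # wait until the child reports "locked"
        held_while_alive = try_lock(path) is None

        child.kill()  # SIGKILL, the child gets no chance to clean up
        child.wait()
        child.stdout.close()

        f = try_lock(path)
        free_after_kill = f is not None
        if f is not None:
            f.close()
        return held_while_alive, free_after_kill
```

The lock file itself stays on disk after the kill; only the lock on it disappears, which is what lets a later process tell a dead holder from a live one.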

from gitoxide.

Byron commented on July 30, 2024

I absolutely agree, thanks for making that explicit.

How would you create multiple files atomically? This would be required to perfectly conclude a clone, which currently is done in a plumbing command by creating refs sequentially.

Would you go as far as to maintain an undo list to respond to interrupts properly?


joshtriplett commented on July 30, 2024

@Byron Catching interrupts won't handle kill -9, or a power failure, or similar.

You don't have to make entire operations atomic when that isn't possible. It's OK if an interrupted fetch leaves some refs created and others not; another fetch will clear that up. (Clones can be made atomic by doing them into a temporary directory and renaming that directory into place.) What must not happen is a ref getting created but the object it references not existing, or an incomplete pack existing where git looks for packs, or other cases where the repository is in an inconsistent state. There should always be an order you can perform operations in such that the repository remains consistent.
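The temporary-directory approach for clones could be sketched roughly as follows. This is an illustration in Python, not how git or gitoxide implement it; `atomic_clone`, the `populate` callback and the `.tmp-clone-` prefix are hypothetical names.

```python
import os
import shutil
import tempfile


def atomic_clone(dest, populate):
    """Build a repository in a sibling temp directory, then rename it into place.

    `populate` is a caller-supplied function that fills the given directory.
    Until the final rename succeeds, `dest` does not exist at all.
    """
    parent = os.path.dirname(os.path.abspath(dest))
    # Same parent directory, so the rename stays on one filesystem.
    tmp = tempfile.mkdtemp(prefix=".tmp-clone-", dir=parent)
    try:
        populate(tmp)
        # Fails with OSError if `dest` already exists and is not empty,
        # so the loser of a race does not clobber the winner.
        os.rename(tmp, dest)
    except BaseException:
        shutil.rmtree(tmp, ignore_errors=True)
        raise
```

If the process is killed with kill -9 while populating, the `.tmp-clone-*` directory is left behind as garbage, but `dest` never shows up half-written.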


Byron commented on July 30, 2024

I see, so rather than complicating things, try to design operations so they never corrupt the git repository, and ideally recover automatically the next time the operation is run in case resources have been leaked, like some refs still being present after interruption.

In the same vein, operations should probably be hardened against races, too. For example, cloning into the same location will have one git process fail early as it races to 'creation of .git directory' only, instead of racing for moving the cloned repository into place. Interestingly, when kill -9ed, git probably would be unable to recover such a partial clone as it couldn't differentiate a race from a dead process. Thinking about it, since there is a special kind of temp file which is always cleaned up if a process is killed, that could possibly be used as a marker to differentiate this case as well.

Generally, when seeing the .git repository as a database, all best practices should certainly be applied to prevent corruption and allow (auto) recovery.


Byron commented on July 30, 2024

git-tempfile and git-lock will help for sure in keeping the repository consistent in cases that are not kill -9. It's still to be figured out how to respond to stray locks though, especially on the server side where doing so should probably be automated.
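For readers unfamiliar with the pattern, here is a rough Python sketch of a lock-file based ref update. It does not use or mirror the git-tempfile or git-lock APIs; `update_ref` and `LockHeld` are hypothetical names, and the `.lock` suffix follows the convention git uses for its lock files.

```python
import os


class LockHeld(Exception):
    """Raised when a lock file for the ref already exists."""


def update_ref(ref_path, new_value):
    """Write `new_value` to `<ref_path>.lock`, then rename it over the ref."""
    ref_path = os.path.abspath(ref_path)
    os.makedirs(os.path.dirname(ref_path), exist_ok=True)
    lock_path = ref_path + ".lock"
    try:
        # O_EXCL makes creation fail if another writer (or a stray lock) exists.
        fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        raise LockHeld(lock_path)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new_value + "\n")
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old ref or the complete new one.
        os.replace(lock_path, ref_path)
    except BaseException:
        # On an ordinary failure the lock is removed; kill -9 skips this.
        try:
            os.unlink(lock_path)
        except FileNotFoundError:
            pass
        raise
```

A process killed between creating the lock and the rename leaves the `.lock` file behind, and every later `update_ref` call then raises `LockHeld`. That is the stray-lock situation that still needs an answer.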


kim commented on July 30, 2024

It is potentially interesting to note that git.git repacks refs when more than one is updated in a transaction (for --atomic support). Last time I checked, libgit2 doesn’t do that, making ref transactions not actually atomic.
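To illustrate why putting several refs into one file helps with atomicity, here is a simplified Python sketch. It is not git's or libgit2's implementation: the one-line-per-ref format is only loosely modelled on packed-refs (no peeled entries), and `write_refs_atomically` is a hypothetical name. All updates become visible through a single rename, so a reader never sees only some of them.

```python
import os


def write_refs_atomically(packed_path, updates):
    """Merge `updates` (ref name -> hex object id) into one file via one rename."""
    lock_path = packed_path + ".lock"
    # Take the lock first so the read-modify-write below cannot race.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        refs = {}
        if os.path.exists(packed_path):
            with open(packed_path) as existing:
                for line in existing:
                    line = line.strip()
                    if not line or line.startswith("#"):
                        continue
                    oid, name = line.split(" ", 1)
                    refs[name] = oid
        refs.update(updates)
        with os.fdopen(fd, "w") as f:
            for name in sorted(refs):
                f.write(f"{refs[name]} {name}\n")
            f.flush()
            os.fsync(f.fileno())
        # The single rename is the commit point for every ref in `updates`.
        os.replace(lock_path, packed_path)
    except BaseException:
        try:
            os.unlink(lock_path)
        except FileNotFoundError:
            pass
        raise
    return refs
```

With one loose file per ref, the same update would need one rename per ref, and an interruption between two renames would leave the transaction half applied.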


Byron commented on July 30, 2024

@kim And on top of that one has to write the reflog, which isn't contained in the packed-refs file and consists of a file per ref. I will still have to see how exactly that is locked, but I am hopeful it's made in a way to not completely tank performance.


kim commented on July 30, 2024

Oh my yes reflogs. I do think they’re a performance sink, which is why they are disabled by default for bare repos. I did, however, end up recently forcing creation on select refs and installing inotify watches for cache invalidation purposes (I cannot watch the refdb itself, because a maintenance pack refs would generate false remove events, and the volume of events would just be too high).

Guess you can see now why I keep crying “reftable” ;)


Byron commented on July 30, 2024

Oh my yes reflogs. I do think they’re a performance sink, which is why they are disabled by default for bare repos. I did, however, end up recently forcing creation on select refs and installing inotify watches for cache invalidation purposes (I cannot watch the refdb itself, because a maintenance pack refs would generate false remove events, and the volume of events would just be too high).

Super interesting, thanks for sharing! I checked it against my current sketch for transactions and am glad this would naturally be supported. I also noted that in bare repos, no reflog is created otherwise, which wasn't on my radar yet.

Guess you can see now why I keep crying “reftable” ;)

There is no way around it on the server for sure. Since it operates in blocks, it's probably less likely to be overwhelmed by the amount of filesystem events that it produces. But whatever it's going to be, there is no escape: if there are too many events, it's either the files backend or the reftable.


kim commented on July 30, 2024

if there are too many events

I misspoke there, it's more related to the notify crate not allowing one to set event masks for portability reasons, and also the inherent raciness of watching directories recursively.

