GithubHelp home page GithubHelp logo

Git-LFS support about bashbrew HOT 12 OPEN

docker-library avatar docker-library commented on August 11, 2024
Git-LFS support

from bashbrew.

Comments (12)

tianon avatar tianon commented on August 11, 2024 1

I've figured out a flow that works (on Git LFS 2.8.0) -- we can use git lfs ls-files 'GitCommit' to determine whether any LFS files need to be fetched, and git lfs fetch 'GitRepo' 'GitCommit' to get them fetched into our local directory, after which git archive ... will include the LFS objects!

(Edit: To be clear, we have to git lfs fetch manually instead of letting git archive's smudge filter do it automatically because our bare Git repo doesn't have any remotes configured for it to use, nor would it be able to properly determine which one to use automatically.)

from bashbrew.

andyshinn avatar andyshinn commented on August 11, 2024

I'm curious about this as well. The current workflow of maintaining tarballs as commits in git repositories is not very ideal. But I'm happy to pay for some git space somewhere to keep the tarballs in check. The workflow actually shouldn't change much. It seems like it would mostly just need the git LFS binaries installed on whatever builder hosts are cloning the repository.

from bashbrew.

tianon avatar tianon commented on August 11, 2024

I'm definitely interested in exploring Git LFS, especially now that GitLab supports it too. I think my biggest concern with it is making sure that it's not too invasive an addition on top of Git itself (ie, trivial to install, etc), since Git is widely packaged and available, but I don't know how easy Git LFS is to get set up with (since I haven't had a chance to play with it yet). I'm not very comfortable requiring a large additional non-trivial installation for re-building the official images from scratch unless there are some obvious and large benefits to outweigh it. For example, https://bugs.debian.org/792075 looks promising, but is still a ways out (which would be how it'd become an actual package in Debian and Ubuntu, because it appears to be written in Ruby, which for me personally is a point against it).

As a side note, if Git LFS could potentially give us the ability to "diff" tarballs, that'd be a real killer feature for me that would prioritize this over our current force-push insanity. 😄

from bashbrew.

vaygr avatar vaygr commented on August 11, 2024

Well, in our source-based Linux distro, for example, the only dependency for git-lfs is go.

from bashbrew.

justincormack avatar justincormack commented on August 11, 2024

@tianon at some point it was rewritten in Go https://github.com/github/git-lfs so it is not Ruby any more.

Had pretty good experience with it for a large project that at one time had a lot of binaries checked in.

from bashbrew.

tianon avatar tianon commented on August 11, 2024

Yeah, that's definitely nice! Thanks for chiming in. ❤️

I think this has sat stale long enough that I should write up my current concerns, just so they're clear (not as an argument for or against, just to keep the info current; I'm generally +1 on the idea):

  • what's the simplest way to install git-lfs, especially in Debian and Windows? (native distro packages would be preferred, but the Go dependency chain looks to be holding that back)
  • what's the failure mode like when git-lfs isn't installed? does the filename we actually want exist as metadata, or is it completely nonexistent causing a more clear failure?
  • how difficult is it for external tooling to interact with git-lfs? we currently use git archive to get our context tarball directly from Git itself without ever doing a clone or checkout (just a fetch), so I'm guessing we'd need to invoke git-lfs separately somehow?
    • bashbrew is written in Go, so might it be possible to use a library to include git-lfs support directly in our build tool?

    • https://github.com/github/git-lfs/tree/97306ee58215a3c7730e8588d57d1db133b5df75#using-lfs-from-other-go-code 😢

      At the moment git-lfs is only focussed on the stability of its command line interface, and the server APIs. The contents of the source packages is subject to change. We therefore currently discourage other Go code from depending on the git-lfs packages directly; an API to be used by external Go code may be provided in future.

from bashbrew.

tianon avatar tianon commented on August 11, 2024

More color specifically about git archive:

  • git-lfs/git-lfs#1322: git archive doesn't include LFS files

    I hoped that git archive would use the configured smudge filter, but apparently not. We'd have to modify git archive to support LFS, which probably won't happen since it's a core command, and LFS is an uncommon extension. I think implementing a git lfs archive command would be better.

Which then points to https://github.com/github/git-lfs/blob/master/ROADMAP.md where it's listed under "Possible Features" with a link back to that GitHub issue. 😢

from bashbrew.

justincormack avatar justincormack commented on August 11, 2024

Install is pretty simple see https://git-lfs.github.com/ - there are packages for every OS. We had a team across Linux, OSX and Windows with no issues.

Files are actually stored as a small text file that says where the real data is.

There is reasonable tooling to convert history in both directions (we redid our history and converted all the old checkins to git lfs) but if you rewrite history is is of course an issue - it was easier for us as it was a private repo so we could coordinate all the users.

Not sure about git archive...

from bashbrew.

tianon avatar tianon commented on August 11, 2024

Spent a second playing with this a little bit (https://github.com/tianon/test-lfs).

The "failure mode" looks pretty sane, actually:

$ git clone [email protected]:tianon/test-lfs.git
Cloning into 'test-lfs'...
remote: Counting objects: 7, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 7 (delta 0), reused 4 (delta 0), pack-reused 0
Receiving objects: 100% (7/7), done.
Checking connectivity... done.
git-lfs smudge -- 'git-lfs-linux-amd64-1.4.4.tar.gz': git-lfs: command not found
error: external filter git-lfs smudge -- %f failed -1
error: external filter git-lfs smudge -- %f failed
fatal: git-lfs-linux-amd64-1.4.4.tar.gz: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

Additionally, it looks like git archive does run filters, but since we don't clone a working copy, it fails:

tianon@00f82f5ec123:/tmp/test-lfs$ git init
Initialized empty Git repository in /tmp/test-lfs/.git/
tianon@00f82f5ec123:/tmp/test-lfs$ git fetch https://github.com/tianon/test-lfs.git
remote: Counting objects: 7, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 7 (delta 0), reused 4 (delta 0), pack-reused 0
Unpacking objects: 100% (7/7), done.
From https://github.com/tianon/test-lfs
 * branch            HEAD       -> FETCH_HEAD
tianon@00f82f5ec123:/tmp/test-lfs$ git archive 42e18259212eae0301fabbd7883fe97625a3eba4 | tar -t   
Downloading git-lfs-linux-amd64-1.4.4.tar.gz (2.98 MB)
Error downloading object: git-lfs-linux-amd64-1.4.4.tar.gz (2c1de8d00759587a93eb78b24c42192a76909d817214d4abc312135c345fbaca)

Errors logged to /tmp/test-lfs/.git/lfs/objects/logs/20161026T140914.513150585.log
Use `git lfs logs last` to view the log.
error: external filter git-lfs smudge -- %f failed 2
error: external filter git-lfs smudge -- %f failed
fatal: git-lfs-linux-amd64-1.4.4.tar.gz: smudge filter lfs failed
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
tianon@00f82f5ec123:/tmp/test-lfs$ cat .git/lfs/objects/logs/20161026T140914.513150585.log 
git-lfs/1.4.4 (GitHub; linux amd64; go 1.7.3; git cbf91a9)
git version 2.1.4

$ git-lfs smudge -- git-lfs-linux-amd64-1.4.4.tar.gz
Error downloading object: git-lfs-linux-amd64-1.4.4.tar.gz (2c1de8d00759587a93eb78b24c42192a76909d817214d4abc312135c345fbaca)

Smudge error: Error downloading 2c1de8d00759587a93eb78b24c42192a76909d817214d4abc312135c345fbaca: http: Post objects/batch: unsupported protocol scheme "": http: Post objects/batch: unsupported protocol scheme ""
github.com/github/git-lfs/errors.newWrappedError
    /Users/ttaylorr/dev/go/src/github.com/github/git-lfs/errors/types.go:166
github.com/github/git-lfs/errors.NewSmudgeError
    /Users/ttaylorr/dev/go/src/github.com/github/git-lfs/errors/types.go:252
github.com/github/git-lfs/lfs.PointerSmudge
    /Users/ttaylorr/dev/go/src/github.com/github/git-lfs/lfs/pointer_smudge.go:69
github.com/github/git-lfs/lfs.(*Pointer).Smudge
    /Users/ttaylorr/dev/go/src/github.com/github/git-lfs/lfs/pointer.go:64
github.com/github/git-lfs/commands.smudgeCommand
    /Users/ttaylorr/dev/go/src/github.com/github/git-lfs/commands/command_smudge.go:66
github.com/github/git-lfs/vendor/github.com/spf13/cobra.(*Command).execute
    /Users/ttaylorr/dev/go/src/github.com/github/git-lfs/vendor/github.com/spf13/cobra/command.go:477
github.com/github/git-lfs/vendor/github.com/spf13/cobra.(*Command).Execute
    /Users/ttaylorr/dev/go/src/github.com/github/git-lfs/vendor/github.com/spf13/cobra/command.go:551
github.com/github/git-lfs/commands.Run
    /Users/ttaylorr/dev/go/src/github.com/github/git-lfs/commands/run.go:65
main.main
    /Users/ttaylorr/dev/go/src/github.com/github/git-lfs/git-lfs.go:33
runtime.main
    /usr/local/Cellar/go/1.7.3/libexec/src/runtime/proc.go:183
runtime.goexit
    /usr/local/Cellar/go/1.7.3/libexec/src/runtime/asm_amd64.s:2086

ENV:
LocalWorkingDir=/tmp/test-lfs
LocalGitDir=/tmp/test-lfs/.git
LocalGitStorageDir=/tmp/test-lfs/.git
LocalMediaDir=/tmp/test-lfs/.git/lfs/objects
LocalReferenceDir=
TempDir=/tmp/test-lfs/.git/lfs/tmp
ConcurrentTransfers=3
TusTransfers=false
BasicTransfersOnly=false
BatchTransfer=true
SkipDownloadErrors=false
FetchRecentAlways=false
FetchRecentRefsDays=7
FetchRecentCommitsDays=0
FetchRecentRefsIncludeRemotes=true
PruneOffsetDays=3
PruneVerifyRemoteAlways=false
PruneRemoteName=origin
AccessDownload=none
AccessUpload=none
DownloadTransfers=basic
UploadTransfers=basic
GIT_PREFIX=

(it looks like git lfs smudge is looking at the local working copy explicitly to find the metadata necessary to fetch the file 😞)

I wonder if there's a way to disable that filters invocation so that we can invoke something after the fact instead.

from bashbrew.

tianon avatar tianon commented on August 11, 2024

http://stackoverflow.com/q/37457367/433558 is not encouraging... 😞

from bashbrew.

tianon avatar tianon commented on August 11, 2024

Checking in to note that I've just tested git-lfs-2.2.1, and got the same result as above (https://github.com/docker-library/official-images/issues/1095#issuecomment-256481338; git archive still unsupported).

from bashbrew.

tianon avatar tianon commented on August 11, 2024

So far, git lfs ls-files COMMIT was the best way I found to determine whether we need to run fetch, but the performance hit is pretty significant (not to mention I'm running into a bad rash of too many open files errors thanks to issues like git-lfs/git-lfs#3752 and our repository being massive and my default ulimit -n being set to 1024 by Debian's defaults).

Maybe we need to do something clever with git show COMMIT:.gitattributes to detect whether an lfs filter is in use at all? Not sure whether that would be faster, but would have the benefit of being pure-go instead of shelling out (since we've got https://github.com/src-d/go-git, which unfortunately doesn't support LFS; yet? src-d/go-git#1110 😄 😇).

We could use that as an initial smoke test for whether we need to run ls-files to determine whether any LFS objects actually exist in a given commit?

Edit: using this as a scratchpad:

  • For context, the "significant performance hit" is such that before the change, I can gather the FROM values from every current image in ~1.348s (across all supported architectures), but after it jumps to ~11m39.467s ❗

  • To implement .gitattributes checking properly, we'd need to check all of some/sub/dir/.gitattributes, some/sub/.gitattributes, some/.gitattributes, and .gitattributes (a bit hairy, but not insurmountable and potentially faster than git lfs ls-files ... which seems to be doing a lot more work than we need to for a at least a basic "do we care about LFS for this commit+directory combination?)

  • Doing a really naive .gitattributes parsing, I've managed ~2.519s (vs ~1.348s without LFS entirely).

  • With a simple cache in place, I'm down to ~1.6s for time bashbrew from --all > /dev/null.

Edit (2022-02-16): linking to the new issue on the new go-git repo instead: go-git/go-git#381

from bashbrew.

Related Issues (12)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.