go-ipld-git's Introduction

Git IPLD format

An IPLD codec for git objects allowing path traversals across the git graph.

Install

go get github.com/ipfs/go-ipld-git

About

This is an IPLD codec that handles git objects. Objects are transformed into an IPLD graph as detailed below, and are demonstrated here using both IPLD Schemas and example JSON forms.

Commit

type GpgSig String

type PersonInfo struct {
  date String
  timezone String
  email String
  name String
}

type Commit struct {
  tree &Tree # see "Tree" section below
  parents [&Commit]
  message String
  author optional PersonInfo
  committer optional PersonInfo
  encoding optional String
  signature optional GpgSig
  mergetag [Tag]
  other [String]
}

As JSON, real data would look something like:

{
  "author": {
    "date": "1503667703",
    "timezone": "+0200",
    "email": "author@mail",
    "name": "Author Name"
  },
  "committer": {
    "date": "1503667703",
    "timezone": "+0200",
    "email": "author@mail",
    "name": "Author Name"
  },
  "message": "Commit Message\n",
  "parents": [
    <LINK>, <LINK>, ...
  ],
  "tree": <LINK>
}
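As a rough sketch of where the PersonInfo fields come from: in a raw git commit, the author and committer headers look like `author Author Name <author@mail> 1503667703 +0200`. The hypothetical `parsePersonInfo` below (not part of go-ipld-git's API) splits that header value into name, email, date and timezone, keeping date and timezone as strings, as the schema does.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// PersonInfo mirrors the PersonInfo schema type. The Go field names are
// illustrative and not taken from the go-ipld-git source.
type PersonInfo struct {
	Date     string
	Timezone string
	Email    string
	Name     string
}

// parsePersonInfo is a hypothetical helper that splits the value of a raw
// git "author" or "committer" header, e.g.
// "Author Name <author@mail> 1503667703 +0200", into its four fields.
func parsePersonInfo(line string) (PersonInfo, error) {
	open := strings.LastIndex(line, "<")
	closeIdx := strings.LastIndex(line, ">")
	if open < 0 || closeIdx < open {
		return PersonInfo{}, errors.New("missing <email>")
	}
	// Everything after the closing '>' is "<unix seconds> <+/-hhmm>".
	rest := strings.Fields(line[closeIdx+1:])
	if len(rest) != 2 {
		return PersonInfo{}, errors.New("expected date and timezone")
	}
	return PersonInfo{
		Name:     strings.TrimSpace(line[:open]),
		Email:    line[open+1 : closeIdx],
		Date:     rest[0],
		Timezone: rest[1],
	}, nil
}

func main() {
	p, err := parsePersonInfo("Author Name <author@mail> 1503667703 +0200")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```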

Tag

type Tag struct {
  object &Any
  type String
  tag String
  tagger PersonInfo
  message String
}

As JSON, real data would look something like:

{
  "message": "message\n",
  "object": {
    "/": "baf4bcfg3mbz3yj3njqyr3ifdaqyfv3prei6h6bq"
  },
  "tag": "tagname",
  "tagger": {
    "date": "1503667703",
    "timezone": "+0200",
    "email": "author@mail",
    "name": "Author Name"
  },
  "type": "commit"
}
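The `object` field above uses the DAG-JSON link convention `{"/": "<cid>"}`. A minimal sketch of decoding such a tag with only `encoding/json`, assuming the CID can be treated as an opaque string (the Go type and field names are illustrative, not go-ipld-git's). The sample follows the PersonInfo schema, with separate `date` and `timezone` fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Link models the DAG-JSON link form {"/": "<cid>"}. The CID is kept as a
// plain string; real code would parse it with a CID library.
type Link struct {
	CID string `json:"/"`
}

type PersonInfo struct {
	Date     string `json:"date"`
	Timezone string `json:"timezone"`
	Email    string `json:"email"`
	Name     string `json:"name"`
}

// Tag mirrors the Tag schema; field names here are illustrative only.
type Tag struct {
	Object  Link       `json:"object"`
	Type    string     `json:"type"`
	Name    string     `json:"tag"`
	Tagger  PersonInfo `json:"tagger"`
	Message string     `json:"message"`
}

const exampleTag = `{
  "message": "message\n",
  "object": {"/": "baf4bcfg3mbz3yj3njqyr3ifdaqyfv3prei6h6bq"},
  "tag": "tagname",
  "tagger": {"date": "1503667703", "timezone": "+0200",
             "email": "author@mail", "name": "Author Name"},
  "type": "commit"
}`

// decodeTag unmarshals the JSON form into the Tag struct.
func decodeTag(data []byte) (Tag, error) {
	var t Tag
	err := json.Unmarshal(data, &t)
	return t, err
}

func main() {
	t, err := decodeTag([]byte(exampleTag))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s -> %s (%s)\n", t.Name, t.Object.CID, t.Type)
}
```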

Tree

type Tree {String:TreeEntry}

type TreeEntry struct {
  mode String
  hash &Any
}

As JSON, real data would look something like:

{
  "file.name": {
    "mode": "100664",
    "hash": <LINK>
  },
  "directoryname": {
    "mode": "40000",
    "hash": <LINK>
  },
  ...
}

Blob

type Blob Bytes

As JSON, real data would look something like:

"<base64 of 'blob <size>\0<data>'>"

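The bytes in the Blob form are the full git object, header included, which is also exactly what git hashes to get the object ID. A small sketch with the standard library only (the helper names are hypothetical) that reproduces the ID `git hash-object` reports for a file's contents:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"strconv"
)

// gitBlobObject frames raw file contents the way git stores a blob:
// "blob <size>\x00<data>".
func gitBlobObject(data []byte) []byte {
	header := "blob " + strconv.Itoa(len(data)) + "\x00"
	return append([]byte(header), data...)
}

// gitBlobHash returns the hex SHA-1 of the framed object, i.e. the git
// object ID for these contents.
func gitBlobHash(data []byte) string {
	sum := sha1.Sum(gitBlobObject(data))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(gitBlobHash([]byte("hello\n")))
}
```

For example, the contents `hello\n` hash to ce013625030ba8dba906f756967f9e9ca394464a, and an empty file to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, matching git.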
Lead Maintainers

Contribute

PRs are welcome!

Small note: if editing the README, please conform to the standard-readme specification.

License

MIT © Jeromy Johnson

go-ipld-git's People

Contributors

bollwyvl, hsanjuan, ipfs-mgmt-read-write[bot], kevina, kubuxu, magik6k, marten-seemann, mvdan, sameer, stebalien, web-flow, web3-bot, whyrusleeping, willscott

go-ipld-git's Issues

Good first issues / Help wanted & libgit2

Hello,
I came across go-ipld-git while working on a university project for putting git repos on IPFS. For now, I simply add the repo as a folder via ipfs. I would like to use go-ipld-git instead and was wondering where I could get started with helping.

Is there a reason go-ipld-git doesn't use the go bindings for libgit2 or some other library for parsing git-related information? Some of the issues like #16 could be solved by parsing the date in the commit. Are you avoiding using the cgo compiler?

IPFS max block size and IPLD GIT

I'm a little bit confused about how go-ipld-git deals with sharding and block size. This repo seems to make the assumption that Blobs all fit within one block, but isn't it possible that a git Blob is larger than the maximum block size (1 MB I believe)?

Windows test fail

https://ci.ipfs.team/blue/organizations/jenkins/IPFS%2Fgo-ipld-git/detail/master/1/tests

tests / windows / TestObjectParse – go-ipld-git
Error
Failed
Stacktrace
panic: runtime error: index out of range
c:/jenkins/tools/org.jenkinsci.plugins.golang.GolangInstallation/1.10.2/src/testing/testing.go:742 +0x2a4
c:/jenkins/tools/org.jenkinsci.plugins.golang.GolangInstallation/1.10.2/src/runtime/panic.go:502 +0x237
c:/jenkins/workspace/IPFS_go-ipld-git_master-66LUBMIUPM6OP6TSZ2OKJLPQ4APLAPM3ANAZQ7FC2VGOGWDQODYQ/src/github.com/ipfs/go-ipld-git/git_test.go:31 +0xb75
c:/jenkins/tools/org.jenkinsci.plugins.golang.GolangInstallation/1.10.2/src/path/filepath/path.go:357 +0x409
c:/jenkins/tools/org.jenkinsci.plugins.golang.GolangInstallation/1.10.2/src/path/filepath/path.go:381 +0x2c9
c:/jenkins/tools/org.jenkinsci.plugins.golang.GolangInstallation/1.10.2/src/path/filepath/path.go:381 +0x2c9
c:/jenkins/tools/org.jenkinsci.plugins.golang.GolangInstallation/1.10.2/src/path/filepath/path.go:403 +0x10d
c:/jenkins/workspace/IPFS_go-ipld-git_master-66LUBMIUPM6OP6TSZ2OKJLPQ4APLAPM3ANAZQ7FC2VGOGWDQODYQ/src/github.com/ipfs/go-ipld-git/git_test.go:22 +0xa7
c:/jenkins/tools/org.jenkinsci.plugins.golang.GolangInstallation/1.10.2/src/testing/testing.go:777 +0xd7
c:/jenkins/tools/org.jenkinsci.plugins.golang.GolangInstallation/1.10.2/src/testing/testing.go:824 +0x2e7

Size() shouldn't return a constant.

So, Node.Size() means different things in different places (ipfs/go-ipld-format#12). However, returning arbitrary constants (13, 42, etc...) as we're doing here is, unambiguously, wrong...

I know this is currently "not implemented" but this fact should probably be recorded in an issue so we can make sure to fix this when we actually decide on the semantics of Node.Size().
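One plausible reading of Size(), sketched here only to contrast with a hard-coded constant: report the length of the serialized object the node was decoded from. `gitNode` is hypothetical, and the `(uint64, error)` signature is assumed to match go-ipld-format's Node interface; which semantics is right is exactly the open question in ipfs/go-ipld-format#12.

```go
package main

import "fmt"

// gitNode is a hypothetical node that keeps the serialized git object it
// was decoded from (header included).
type gitNode struct {
	raw []byte
}

// RawData returns the serialized object without copying.
func (n *gitNode) RawData() []byte { return n.raw }

// Size reports the length of the serialized object instead of a constant.
func (n *gitNode) Size() (uint64, error) {
	return uint64(len(n.raw)), nil
}

func main() {
	n := &gitNode{raw: []byte("blob 6\x00hello\n")}
	size, _ := n.Size()
	fmt.Println(size)
}
```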

Testing objects/repo

It would be useful to be able to provide a custom git repository for tests, but I don't know how it should be created. Current ideas (discussed with @Kubuxu) include:

  • Providing static zip/tar archive with git repo which would be then unzipped in go test
    • This introduces some mess into test code
    • It's a bit harder to update tests that way
  • Have a script that would generate the repo
    • Easier to change/add test cases
    • Requires Git on CI

I'm not sure which is best, but I'm leaning toward the archive approach.
cc @whyrusleeping

Precompute CID

This should precompute the CID. We tend to call Cid() often, so that'll probably massively speed up adding/fetching git repos (edit: it doesn't, but we should still precompute it).

Date format

Do you think it might be better to use some standard date format? Like for example full ISO8601?
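For comparison, a sketch of converting git's stored pair (Unix seconds plus a ±hhmm offset) into an ISO 8601 timestamp with Go's `time` package. `toISO8601` is a hypothetical helper, and RFC 3339 is used as the ISO 8601 profile.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"time"
)

// toISO8601 converts git's "<unix seconds>" and "<+/-hhmm>" fields into an
// RFC 3339 timestamp expressed in the author's own UTC offset.
func toISO8601(date, tz string) (string, error) {
	secs, err := strconv.ParseInt(date, 10, 64)
	if err != nil {
		return "", err
	}
	if len(tz) != 5 || (tz[0] != '+' && tz[0] != '-') {
		return "", errors.New("timezone must look like +hhmm or -hhmm")
	}
	hh, err := strconv.Atoi(tz[1:3])
	if err != nil {
		return "", err
	}
	mm, err := strconv.Atoi(tz[3:5])
	if err != nil {
		return "", err
	}
	offset := hh*3600 + mm*60
	if tz[0] == '-' {
		offset = -offset
	}
	t := time.Unix(secs, 0).In(time.FixedZone(tz, offset))
	return t.Format(time.RFC3339), nil
}

func main() {
	s, err := toISO8601("1503667703", "+0200")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

With the example data used in this README, `toISO8601("1503667703", "+0200")` returns `2017-08-25T15:28:23+02:00`.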

Cache the cid

go-ipfs and friends make no attempt to cache Node CIDs and will call Cid() repeatedly as needed. Therefore, calling Cid() should not recompute the CID each time. Instead, the CID should be cached when it is first computed (preferably, when the node is first decoded).

Furthermore, the raw data should be kept around if possible so that calling RawData() is free. I don't expect this to tie up that much memory.
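A minimal sketch of the caching asked for here, under stated assumptions: `cachedNode` and `decodeNode` are hypothetical, and the identifier is the hex SHA-1 of the raw object, standing in for a real CID (which would wrap that digest in a multihash). The ID is computed once at decode time and stored next to the raw bytes, so repeated `Cid()` and `RawData()` calls are plain field reads.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// hashCalls counts digest computations so the caching can be observed.
var hashCalls int

// cachedNode keeps the raw serialized object and an identifier computed
// once, at decode time.
type cachedNode struct {
	raw []byte
	id  string
}

// decodeNode does the only hashing; parsing of the object itself is elided.
func decodeNode(raw []byte) *cachedNode {
	hashCalls++
	sum := sha1.Sum(raw)
	return &cachedNode{raw: raw, id: hex.EncodeToString(sum[:])}
}

// Cid returns the cached identifier without rehashing.
func (n *cachedNode) Cid() string { return n.id }

// RawData returns the retained bytes without re-serializing.
func (n *cachedNode) RawData() []byte { return n.raw }

func main() {
	n := decodeNode([]byte("blob 6\x00hello\n"))
	for i := 0; i < 3; i++ {
		fmt.Println(n.Cid())
	}
	fmt.Println("hash computations:", hashCalls)
}
```

Computing the ID during decoding keeps the node immutable after construction, so concurrent callers need no locking; a lazily computed ID would need something like `sync.Once` to stay safe.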

Handle blob objects larger than MessageSizeMax

go-libp2p-net.MessageSizeMax puts an upper limit of ~4 MiB on the size of messages on a libp2p protocol stream: https://github.com/libp2p/go-libp2p-net/blob/70a8d93f2d8c33b5c1a5f6cc4d2aea21663a264c/interface.go#L20

That means Bitswap will refuse to transfer blocks that are bigger than (1 << 22) - bitswapMsgHeaderLength, even though such blocks work fine locally. In unixfs that limit is fine, because we apply chunking. In ipld-git, however, we can't apply chunking, because we must retain the object's original hash. It's quite common to have files larger than 4 MiB in a Git repository, so we should come up with a way forward soon.

Here are four options:

  1. Leave it as is. Very unsatisfactory.
  2. Make MessageSizeMax configurable. Better, but still far from satisfactory.
  3. Make Bitswap capable of message fragmentation. The size limit exists mainly to prevent memory exhaustion due to reading big messages and not being able to verify and store them as we go. We could teach Bitswap how to verify and temporarily store fragmented messages. This would end up overly complex though, since these fragments are not ipld blocks, and thus can't reuse the stuff we already have.
  4. Introduce some kind of "virtual" blocks, which look similar to our existing chunking data structures, but whose hash is derived from the concatenated contents of their children. This is of course hacky because we can't verify the virtual block until we have fetched all children, but it lets us do 3) while reusing IPLD and the repo, and we can verify the children as we go.

Related issues: ipfs/kubo#4473 ipfs/kubo#3155 (and slightly less related ipfs/kubo#4280 ipfs/kubo#4378)
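To make option 4 concrete (the cap itself is 1 << 22 = 4,194,304 bytes before the header is subtracted): git's ID covers `blob <size>\0` followed by the whole contents, so a "virtual" block would need to record the git ID and the total size, and the digest can be fed chunk by chunk but only compared at the end. A hypothetical sketch of that final check, not an existing Bitswap or IPLD API:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"strconv"
)

// verifyVirtualBlob checks that ordinary chunks, concatenated, form the git
// blob named by gitID. totalSize is assumed to be stored in the virtual
// block, because git's header must be hashed before any content.
func verifyVirtualBlob(gitID string, totalSize int, chunks [][]byte) bool {
	h := sha1.New()
	h.Write([]byte("blob " + strconv.Itoa(totalSize) + "\x00"))
	seen := 0
	for _, c := range chunks {
		// In a real fetch each chunk would first be checked against its own
		// block hash; here it is only accumulated into the git digest.
		h.Write(c)
		seen += len(c)
	}
	if seen != totalSize {
		return false
	}
	return hex.EncodeToString(h.Sum(nil)) == gitID
}

func main() {
	chunks := [][]byte{[]byte("hel"), []byte("lo\n")}
	fmt.Println(verifyVirtualBlob("ce013625030ba8dba906f756967f9e9ca394464a", 6, chunks))
}
```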
