GithubHelp home page GithubHelp logo

Comments (7)

rvagg avatar rvagg commented on May 25, 2024 2

Hey, so a few of us have been furiously discussing this issue and the merits and technical issues involved. One outstanding question has been the Go stack and whether it's ever been able to properly recognise undefined. As I pointed out above, refmt isn't happy about undefined by default so my assumption has been that trying to get go-ipfs to decode one of these blocks is going to end up with pain, either with current versions or historical versions.

But I decided to dig deeper because you've insisted that this isn't correct and we've been doing some head scratching. So here's what I've found, and it looks like we're in a dejavu situation from almost exactly 4 years ago:

  1. go-ipld-cbor originally used whyrusleeping/cbor/go for its CBOR handling. My read of its major-7 handling is that it just ignored undefined and any of the other possible strange values that can come in via that route, effectively treating them as unset values (likely nil or whatever the unset value of the field would be, although I haven't spent much time figuring this bit out): https://github.com/whyrusleeping/cbor/blob/63513f603b11583741970c5045ea567130ddb492/go/cbor.go#L408-L438
  2. In September 2018, go-ipld-cbor has a big refactor and switches to the fairly new refmt: ipfs/go-ipld-cbor#30, at some point soon after this got released into go-ipfs.
  3. Someone files a bug report to go-ipfs complaining that their JS encoded data (using our old codec, which uses borc, which will encode undefined among other inappropriate terminals) no longer works with go-ipfs, because undefined is now rejected as an invalid token.
  4. The fix is to introduce coercion into refmt (polydawn/refmt@ede3ae2), make it into an opt-in (polydawn/refmt@bbf0067), and then opt in go-ipld-cbor (ipfs/go-ipld-cbor@c306514) and ship that in go-ipfs. That change is live today in go-ipld-cbor: https://github.com/ipfs/go-ipld-cbor/blame/master/encoding/unmarshaller.go#L29
  5. go-ipld-prime gets its own dag-cbor encoder, still using refmt, but not enabling that option.
  6. Over the course of this year, go-ipld-prime was integrated into go-ipfs, primarily for the DAG API, but this mostly (or completely?) replaces go-ipld-cbor as of v0.10. ipfs/kubo#7976 is the main PR for this but there are others for this work.
  7. Last year, I rewrote all the primary JS codecs, including dag-cbor, and it now uses a new encoder, cborg, which is intentionally strict and limited in comparison to borc as it's focused on our content-addressed use-case.
  8. During the course of this year, Alex upgraded js-ipfs to use our new js-multiformats stack and all our new codecs—our old set of codecs have been retired, we no longer use them anywhere.
  9. This issue.
  10. ipld/go-ipld-prime#308
  11. ... I guess a change on this repo is due too now.

So it looks like we're repeating 2018, but now with strictness on both JS and Go sides.

We still need to retain strictness on encode for this to all work nicely, though, so hopefully Ceramic, and anyone else who happens to be relying on undefined can find a work-around for existing code that may rely on it.

from js-dag-cbor.

rvagg avatar rvagg commented on May 25, 2024

Sorry, but undefined should never have been allowed and was only possible because it used https://github.com/dignifiedquire/borc which would even let you encode Dates, and its own custom BigNumber objects as well. The IPLD data model has a strict set of kinds that work throughout systems that support IPLD and undefined will break in many places. Your existing data with undefined is going to break if you pass it through Go IPLD / IPFS code for instance. https://ipld.io/docs/data-model/kinds/

You know, SemVer was created for a reason.

Snark isn't particularly helpful. And while I agree and would prefer that js-ipfs be >1 by now so we can do proper signalling, that's a decision by others and they are currently using <0 semver semantics which do allow for breaking changes in minor release. So technically it is following semver.

Could you lift the restrictions for undefined, maybe, so that old IPFS records could be properly decoded still?

You're welcome to do this now by using the BLOCK API to fetch the raw bytes and use the older DAG-CBOR codec or just use a CBOR decoder directly (using borc) and deal with the CID tags yourself. If you really wanted to you could even fork this repo and replace the CBOR decoder with borc to make it behave how you want and then force js-ipfs to use it instead of this codec whenever it encounters a DAG-CBOR block to decode (I believe you can override existing codecs when you supply new ones that conflict). In fact, it's probably easier than that since you may be able to just undo https://github.com/ipld/js-dag-cbor/blob/master/index.js#L88 and the undefineds will flow freely.

It's really important that you ensure that new data doesn't get encoded with undefined, it'll be treated as a bad block by the newer JS stack and the Go stack is the same, and since Go runs most of the backend infra around the world for IPFS, that's bad news: https://github.com/polydawn/refmt/blob/30ac6d18308e584ca6a2e74ba81475559db94c5f/cbor/cborDecoder.go#L213-L218

Sorry for the annoyance on this, clarifying the boundaries of our codecs and ensuring fully compatibility across the stack has been a long journey. Hopefully the current level of clarity helps, though.

from js-dag-cbor.

ukstv avatar ukstv commented on May 25, 2024

@rvagg Reference to go-ipfs is misleading in this case. Despite the linked change is 3 years old, go-ipfs 0.9 is able to return an old cbor too, unlike go-ipfs 0.10. Maybe this repository is not the right venue to discuss this, but anyway with the recent changes both go-ipfs and js-ipfs can not load old blocks.

from js-dag-cbor.

ukstv avatar ukstv commented on May 25, 2024

I could blame own negligence on not following IPLD spec, but changes to the spec making undefined prohibited are also recent. On 2020-10-14 it became prohibited, on 2021-01-28 the old codec became "legacy", and at that time there were no proper codec available.

from js-dag-cbor.

oed avatar oed commented on May 25, 2024

Hey @rvagg this is a big problem for us as it basically breaks all existing data on Ceramic.

It's really important that you ensure that new data doesn't get encoded with undefined

This makes sense and we will definitely do it going forward, but we need a solution for loading existing data. Does it really hurt to keep decode functions that allow for undefined to allow for loading of data that has already been created with older versions?

We also found that this strictness is also enforced on the go-ipfs side with the new go-ipld-prime, while it works in go-ipfs < 0.10.

from js-dag-cbor.

oed avatar oed commented on May 25, 2024

Thanks for digging into this and finding a path forward @rvagg extremely appreciated! Let us know if we can help in any way on the js side, since the go side has a fix (yet to be confirmed to work?).

We still need to retain strictness on encode for this to all work nicely, though, so hopefully Ceramic, and anyone else who happens to be relying on undefined can find a work-around for existing code that may rely on it.

Definitely, no problem for us to not use undefined on encode. We just need a way to be able to retrieve (and pin) old dag-cbor objects.

from js-dag-cbor.

rvagg avatar rvagg commented on May 25, 2024

#45 WIP, it's slightly more complicated than on the Go side which had the "coerce" flag ready to go.

from js-dag-cbor.

Related Issues (7)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.