
js-multiformats's Introduction


Interface for multihash, multicodec, multibase and CID

About

This library defines common interfaces and low level building blocks for various interrelated multiformat technologies (multicodec, multihash, multibase, and CID). They can be used to implement custom base encoders / decoders / codecs, codec encoders / decoders and multihash hashers that comply with the interfaces that the layers above assume.

This library provides implementations of the basics; many others can be found in the linked repositories.

import { CID } from 'multiformats/cid'
import * as json from 'multiformats/codecs/json'
import { sha256 } from 'multiformats/hashes/sha2'

const bytes = json.encode({ hello: 'world' })

const hash = await sha256.digest(bytes)
const cid = CID.create(1, json.code, hash)
//> CID(bagaaierasords4njcts6vs7qvdjfcvgnume4hqohf65zsfguprqphs3icwea)

Creating Blocks

import * as Block from 'multiformats/block'
import * as codec from '@ipld/dag-cbor'
import { sha256 as hasher } from 'multiformats/hashes/sha2'

const value = { hello: 'world' }

// encode a block
let block = await Block.encode({ value, codec, hasher })

block.value // { hello: 'world' }
block.bytes // Uint8Array
block.cid   // CID() w/ sha2-256 hash address and dag-cbor codec

// you can also decode blocks from their binary state
block = await Block.decode({ bytes: block.bytes, codec, hasher })

// if you have the cid you can also verify the hash on decode
block = await Block.create({ bytes: block.bytes, cid: block.cid, codec, hasher })

Multibase Encoders / Decoders / Codecs

CIDs can be serialized to a string representation using multibase encoders that implement the MultibaseEncoder interface. This library provides quite a few implementations that can be imported:

import { base64 } from "multiformats/bases/base64"
cid.toString(base64.encoder)
//> 'mAYAEEiCTojlxqRTl6svwqNJRVM2jCcPBxy+7mRTUfGDzy2gViA'

Parsing a string-serialized CID requires a multibase decoder that implements the MultibaseDecoder interface. This library provides a decoder for every encoder it provides:

CID.parse('mAYAEEiCTojlxqRTl6svwqNJRVM2jCcPBxy+7mRTUfGDzy2gViA', base64.decoder)
//> CID(bagaaierasords4njcts6vs7qvdjfcvgnume4hqohf65zsfguprqphs3icwea)

The dual of a multibase encoder & decoder is defined as a multibase codec, which exposes them as encoder and decoder properties. For added convenience, codecs also implement the MultibaseEncoder and MultibaseDecoder interfaces, so they can be used as either or both:

cid.toString(base64)
CID.parse(cid.toString(base64), base64)

Note: the CID implementation comes bundled with the base32 and base58btc multibase codecs, so CIDs can be serialized to their (version-specific) default base encoding and parsed without having to supply base encoders/decoders:

const v1 = CID.parse('bagaaierasords4njcts6vs7qvdjfcvgnume4hqohf65zsfguprqphs3icwea')
v1.toString()
//> 'bagaaierasords4njcts6vs7qvdjfcvgnume4hqohf65zsfguprqphs3icwea'

const v0 = CID.parse('QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n')
v0.toString()
//> 'QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n'
v0.toV1().toString()
//> 'bafybeihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku'

Multicodec Encoders / Decoders / Codecs

This library defines BlockEncoder, BlockDecoder and BlockCodec interfaces. Codec implementations should conform to the BlockCodec interface, which combines both BlockEncoder and BlockDecoder. Here is an example implementation of a JSON BlockCodec:

export const { name, code, encode, decode } = {
  name: 'json',
  code: 0x0200,
  encode: json => new TextEncoder().encode(JSON.stringify(json)),
  decode: bytes => JSON.parse(new TextDecoder().decode(bytes))
}
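
For reference, a round trip through the codec defined above is just a matter of calling its encode and decode exports (a minimal sketch reusing the names from the snippet above):

const bytes = encode({ hello: 'world' })
const value = decode(bytes)
// value is { hello: 'world' } again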

Multihash Hashers

This library defines the MultihashHasher and MultihashDigest interfaces and a convenient function for implementing them:

import crypto from 'crypto'
import * as hasher from 'multiformats/hashes/hasher'

const sha256 = hasher.from({
  // As per multiformats table
  // https://github.com/multiformats/multicodec/blob/master/table.csv#L9
  name: 'sha2-256',
  code: 0x12,

  encode: (input) => new Uint8Array(crypto.createHash('sha256').update(input).digest())
})

const hash = await sha256.digest(json.encode({ hello: 'world' }))
CID.create(1, json.code, hash)

//> CID(bagaaierasords4njcts6vs7qvdjfcvgnume4hqohf65zsfguprqphs3icwea)

Traversal

This library contains higher-order functions for traversing graphs of data easily.

walk() walks through the links in each block of a DAG calling a user-supplied loader function for each one, in depth-first order with no duplicate block visits. The loader should return a Block object and can be used to inspect and collect block ordering for a full DAG walk. The loader should throw on error, and return null if a block should be skipped by walk().

import { walk } from 'multiformats/traversal'
import * as Block from 'multiformats/block'
import * as codec from 'multiformats/codecs/json'
import { sha256 as hasher } from 'multiformats/hashes/sha2'

// build a DAG (a single block for this simple example)
const value = { hello: 'world' }
const block = await Block.encode({ value, codec, hasher })
const { cid } = block
console.log(cid)
//> CID(bagaaierasords4njcts6vs7qvdjfcvgnume4hqohf65zsfguprqphs3icwea)

// create a loader function that also collects CIDs of blocks in
// their traversal order
const load = (cid, blocks) => async (cid) => {
  // fetch a block using its cid
  // e.g.: const block = await fetchBlockByCID(cid)
  blocks.push(cid)
  return block
}

// collect blocks in this DAG starting from the root `cid`
const blocks = []
await walk({ cid, load: load(cid, blocks) })

console.log(blocks)
//> [CID(bagaaierasords4njcts6vs7qvdjfcvgnume4hqohf65zsfguprqphs3icwea)]

Legacy interface

blockcodec-to-ipld-format converts a multiformats BlockCodec into an interface-ipld-format for use with the ipld package. This can help bridge IPLD codecs implemented using the structure and interfaces defined here to existing code that assumes or requires interface-ipld-format. This bridge also includes the relevant TypeScript definitions.
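
A minimal sketch of using that bridge, assuming the convert export of blockcodec-to-ipld-format (check that package's README for the exact API):

import { convert } from 'blockcodec-to-ipld-format'
import * as json from 'multiformats/codecs/json'

// wrap the multiformats BlockCodec as a legacy interface-ipld-format object
const format = convert(json)
// format.util.serialize() / format.util.deserialize() etc. then follow the legacy interface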

Implementations

By default, no base encodings (other than base32 & base58btc), hash functions, or codec implementations are exposed by multiformats; you need to import the ones you need yourself.

Multibase codecs

| bases | import | repo |
| --- | --- | --- |
| base16 | multiformats/bases/base16 | multiformats/js-multiformats |
| base32, base32pad, base32hex, base32hexpad, base32z | multiformats/bases/base32 | multiformats/js-multiformats |
| base64, base64pad, base64url, base64urlpad | multiformats/bases/base64 | multiformats/js-multiformats |
| base58btc, base58flickr | multiformats/bases/base58 | multiformats/js-multiformats |

Other (less useful) bases implemented in multiformats/js-multiformats include: base2, base8, base10, base36 and base256emoji.

Multihash hashers

| hashes | import | repo |
| --- | --- | --- |
| sha2-256, sha2-512 | multiformats/hashes/sha2 | multiformats/js-multiformats |
| sha3-224, sha3-256, sha3-384, sha3-512, shake-128, shake-256, keccak-224, keccak-256, keccak-384, keccak-512 | @multiformats/sha3 | multiformats/js-sha3 |
| identity | multiformats/hashes/identity | multiformats/js-multiformats |
| murmur3-128, murmur3-32 | @multiformats/murmur3 | multiformats/js-murmur3 |
| blake2b-*, blake2s-* | @multiformats/blake2 | multiformats/js-blake2 |

IPLD codecs (multicodec)

| codec | import | repo |
| --- | --- | --- |
| raw | multiformats/codecs/raw | multiformats/js-multiformats |
| json | multiformats/codecs/json | multiformats/js-multiformats |
| dag-cbor | @ipld/dag-cbor | ipld/js-dag-cbor |
| dag-json | @ipld/dag-json | ipld/js-dag-json |
| dag-pb | @ipld/dag-pb | ipld/js-dag-pb |
| dag-jose | dag-jose | ceramicnetwork/js-dag-jose |

Install

$ npm i multiformats

Browser <script> tag

Loading this module through a script tag will make its exports available as Multiformats in the global namespace.

<script src="https://unpkg.com/multiformats/dist/index.min.js"></script>

API Docs

License

Licensed under either of Apache 2.0 (LICENSE-APACHE / http://www.apache.org/licenses/LICENSE-2.0) or MIT (LICENSE-MIT / http://opensource.org/licenses/MIT), at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.


js-multiformats's Issues

How to parse a base58 implicit CID with the identity hash?

This module supports Qmfoo-style base58 implicit CIDs with a sha2-256 hash. Peer IDs can be represented as a base58 implicit CID, but with the identity hash.

Am I missing something or is this not supported?

import { CID } from 'multiformats/cid'
import PeerId from 'peer-id'

const peerId = await PeerId.create({ keyType: 'secp256k1', bits: 256 })
const str = peerId.toB58String()

console.info(str)
// 16Uiu2HAmGKisxKeBRLyGyKD8MuQFAiUUnPABGhvWna28CSd7WzFE

CID.parse(str)
// Error: To parse non base32 or base58btc encoded CID multibase decoder must be provided

Passing an explicit base doesn't work either:

import { CID } from 'multiformats/cid'
import { base58btc } from 'multiformats/bases/base58'

CID.parse('16Uiu2HAmGKisxKeBRLyGyKD8MuQFAiUUnPABGhvWna28CSd7WzFE', base58btc)
// Error: Unable to decode multibase string "16Uiu2HAmSs7wbckmrVGFRuEadmUaDLys4zaYny7FiyuYf4knA9pV", base58btc decoder only supports inputs prefixed with z

The CID inspector says: base58btc - cidv0 - dag-pb - (identity : 296 : 08021221...)
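
A possible workaround until this is supported directly (a sketch only: the leading 'z' is added just to satisfy the base58btc multibase decoder, 0x72 is assumed to be the libp2p-key codec, and Digest.decode comes from multiformats/hashes/digest):

import { CID } from 'multiformats/cid'
import { base58btc } from 'multiformats/bases/base58'
import * as Digest from 'multiformats/hashes/digest'

const str = '16Uiu2HAmGKisxKeBRLyGyKD8MuQFAiUUnPABGhvWna28CSd7WzFE'
// prefix with 'z' so the multibase decoder accepts the bare base58btc string
const bytes = base58btc.decode(`z${str}`)
// interpret the bytes as a multihash (here an identity hash of the public key)
const digest = Digest.decode(bytes)
// wrap it in a CIDv1 using the libp2p-key codec
const cid = CID.createV1(0x72, digest)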

Compilation fails on Cannot find name 'T'

It's obvious that the T should not be there since it is an unknown generic. This should be typed differently. Does anyone else know how to make this library actually usable with TypeScript?

Error:

[typescript] Using TypeScript version 4.2.4
[eslint] Using ESLint version 7.32.0
[typescript] Encountered 4 TypeScript issues:
[typescript] Error: ../../common/temp/node_modules/.pnpm/[email protected]/node_modules/multiformats/types/codecs/json.d.ts:3:29 - (TS2304) Cannot find name 'T'.
[typescript] Error: ../../common/temp/node_modules/.pnpm/[email protected]/node_modules/multiformats/types/codecs/json.d.ts:3:66 - (TS2304) Cannot find name 'T'.
[typescript] Error: ../../common/temp/node_modules/.pnpm/[email protected]/node_modules/multiformats/types/codecs/json.d.ts:4:61 - (TS2304) Cannot find name 'T'.
[typescript] Error: ../../common/temp/node_modules/.pnpm/[email protected]/node_modules/multiformats/types/codecs/json.d.ts:4:68 - (TS2304) Cannot find name 'T'.

Actual Code:

export const name: string;
export const code: 512;
export const encode: (data: T) => import("./interface").ByteView<T>;
export const decode: (bytes: import("./interface").ByteView<T>) => T;
export type BlockCodec<Code extends number, T_1> = import('./interface').BlockCodec<Code, T_1>;
//# sourceMappingURL=json.d.ts.map

Replace CID class with a CID interface

Most of the ipfs / libp2p stack depends on the CID class, which introduces a large number of dependencies; it is made worse by the fact that TypeScript forces all of them to agree on the exact same package in order to type check.

We should instead introduce a CID interface and type our APIs so they take / produce values compatible with that interface. That way they become implementation agnostic, allowing us to swap / upgrade the implementation without coordinating these changes across the board.

The CID implementation in this library would just become one implementation of that interface.

Docs!

We need more docs.

The new CID and Block interfaces aren't documented at all.

UMD Build Option

I am trying to use js-multiformats in a low-code / no-code environment. Long story short, they execute JS libs in a sandbox and the lib needs to be compiled in UMD format. I see there are CJS and ESM builds compiled. Any way to publish this as a UMD?

For context, I'm trying to import the library using this link: https://unpkg.com/browse/[email protected]/esm/src/index.js

Proposal: Change APIs to support sync multihashers

I find the strictly async interface for MultihashHasher problematic, as it induces asynchrony even when that is impractical (from a performance and API point of view). My specific use case is a HAMT implementation which uses murmur3, a non-cryptographic hash that really doesn't need to be async, not to mention that the actual implementation is sync. More generally, I think it would be reasonable to have an API that:

  1. Provides a general async API that can be used across implementations.
  2. Allows the hasher user to conditionally avoid await at the expense of increased code complexity.
  3. Allows certain functions to demand a sync API as needed.

I propose to amend current interface definition as follows:

export interface MultihashHasher {
  /**
   * Takes binary `input` and returns its (multi)hash digest.
   * @param {Uint8Array} input
   */
  digest(input: Uint8Array): MultihashDigest|Promise<MultihashDigest>

  /**
   * Name of the multihash
   */
   name: string

  /**
   * Code of the multihash
   */
  code: number
}

export interface SyncMultihashHasher extends MultihashHasher {
   digest(input: Uint8Array): MultihashDigest
}

This way

  1. All existing MultihashHasher implementations remain compatible, as MultihashHasher just widens the return type of the digest function.
  2. All users of MultihashHasher can continue using await unconditionally, or choose to await only when the return type is a promise.
  3. Some implementations could switch to SyncMultihashHasher while retaining compatibility with all the existing code.
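
For illustration, a consumer of the widened interface could then skip the extra microtask whenever the hasher happens to be synchronous (a sketch written against the proposed MultihashHasher, not the current API):

const digestOf = async (hasher: MultihashHasher, bytes: Uint8Array) => {
  const result = hasher.digest(bytes)
  // only await when the hasher is actually asynchronous
  return result instanceof Promise ? await result : result
}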

Output of multiformats/legacy should be an interface-ipld-format

ipld/interface-ipld-format recently had types added - the return value from multiformats/legacy should be compatible with that type.

Right now js-IPFS expects IPLD formats to be of type https://github.com/ipfs/js-ipfs/blob/master/packages/ipfs-core-types/src/ipld/format.ts but these aren't the same as ipld/interface-ipld-format. The next js-IPFS release will expect ipld/interface-ipld-format so that's the type multiformats/legacy should return.

Refs: ipfs/js-ipfs#3586

Create a CID from a string or Uint8Array or CID-like object

This module has a number of ways of converting things into CID instances - CID.parse for strings, CID.decode for Uint8Arrays, CID.asCID for things CID-shaped.

.parse and .decode will throw; .asCID will not. We generally want to validate what we've been passed as a CID, so a common pattern seems to be forming:

function toCID (hash) {
  if (typeof hash === 'string') {
    return CID.parse(hash)
  } else if (hash instanceof Uint8Array) {
    return CID.decode(hash)
  }

  const cid = CID.asCID(hash)
  
  if (!cid) {
    throw new Error('Invalid CID')
  }

  return cid
}

If this seems familiar it may be because this is quite similar to how the constructor from the js-cids class behaves.

It is quite useful though; any objections to making this a utility method on the CID class?

Add ability to register codecs for CID.parse

Currently there are some default codecs hardcoded into CID.parse(), such as base58btc and base32; another common codec used with IPLD is base36.

Currently one needs to manually pass codecs into CID.parse() for anything not built in, which requires the application to know ahead of time what base decoder it should use.

It would be convenient if we could register more codecs with CID.parse() so that applications could just register the codec globally and use .parse() without extra checks.
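
For comparison, the closest workaround available today is composing decoders explicitly with .or() and handing the result to CID.parse() (a sketch; base36 is just the example extra base here):

import { CID } from 'multiformats/cid'
import { base32 } from 'multiformats/bases/base32'
import { base36 } from 'multiformats/bases/base36'
import { base58btc } from 'multiformats/bases/base58'

// one decoder that understands all three bases
const decoder = base32.decoder.or(base36.decoder).or(base58btc.decoder)
CID.parse('bagaaierasords4njcts6vs7qvdjfcvgnume4hqohf65zsfguprqphs3icwea', decoder)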

Proposal: Multiblock encoder interface

We're running into more and more cases where the BlockEncoder interface just does not fit the bill:

  1. With IPNFT geared towards NFTs we've discovered that NFT metadata can easily exceed 1MiB in size, which would hinder our ability to serve such blocks on gateways etc.
  2. With the new UnixFS code we basically want to pass a file and get a set of blocks with a root back.
  3. Now with UCANs we want to pass an auth chain and produce a block per link in the chain.

I am sure I'm forgetting some, and we are likely to encounter more use cases where we want to turn some input into a DAG represented by many blocks, which is why I would like to propose adopting the following interfaces:

export interface SyncDAGEncoder<Code extends number = number, T extends unknown = unknown> {
  encoder(data:T): IterableIterator<{ code: Code, bytes: Uint8Array }>
}

export interface AsyncDAGEncoder<Code extends number = number, T extends unknown = unknown> {
    encoder(data:T): AsyncIterableIterator<{ code: Code, bytes: Uint8Array }>
}

export type DAGEncoder<Code extends number = number, T extends unknown = unknown> =
  | SyncDAGEncoder<Code, T>
  | AsyncDAGEncoder<Code, T>

The last block would be the DAG root block (which is natural due to hash linking).

Such interfaces would cover all the above use cases. Additionally, we could make all our block codecs implement these interfaces too, making them compatible.
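
To make the contract concrete, a consumer of the proposed interface might look like this (a sketch against the proposed types; collect is a hypothetical helper and, per the proposal, the last yielded block is the root):

const collect = async <Code extends number, T>(dag: AsyncDAGEncoder<Code, T>, data: T) => {
  const blocks: Array<{ code: Code, bytes: Uint8Array }> = []
  for await (const block of dag.encoder(data)) {
    blocks.push(block)
  }
  // the last block is the DAG root, which is natural due to hash linking
  return { root: blocks[blocks.length - 1], blocks }
}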

add length requirements to hash functions

For security purposes we should allow hash implementations to come with hash size requirements, either a single integer or a range tuple. This would be used in the validate function and throw if a hash was not of a secure length.

block.bytes can be a nodejs Buffer

Would like to know if this is the desired behavior:

const Block = await import('multiformats/block')
const codec = await import('@ipld/dag-cbor')
const { sha256: hasher } = await import('multiformats/hashes/sha2')

const bytes = Buffer.from([ 100, 97, 115, 100, 102 ])
const block = await Block.decode({ bytes, codec, hasher })

block.bytes instanceof Buffer // true

Noticed this while testing with the ipfs block api in node. Buffer extends Uint8Array which is great, but it seems to override some of the characteristics of Uint8Array.
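
If a plain Uint8Array is needed, one workaround is to copy the bytes before decoding (a sketch reusing Block, codec, hasher and bytes from the snippet above; Uint8Array.from allocates a fresh copy):

const plainBytes = Uint8Array.from(bytes)
const plainBlock = await Block.decode({ bytes: plainBytes, codec, hasher })
plainBlock.bytes instanceof Buffer // false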

Document ESM and consequent node version requirements

I just ran into an issue where I was using an older version of Node (12.16.1) and getting a "Cannot find module 'multiformats'" error after installation. This seems to be because of the use of ESM modules, which are only supported in more recent versions of Node. It's probably worth adding a note about this to the README, as otherwise it's very difficult to debug.

Proposal: new registry (free) interface

As I started looking into integrating this into js-dag-service and shared worker projects, I've found that the registry and dependency injection introduce a lot of complexity. Below are a few highlights:

  • registry.add introduces a side effect where the same code can behave differently before and after .add.

  • Other libraries no longer have a CID; it's always bound to some registry, so libraries inherit all the dependency injection complexity.

  • The indirection of using identifiers (in the form of a number or string) that act as registry keys for its values has several downsides:

    • Typos can be hard to spot, e.g. it could take a while to notice the typo in multicodec.encode({ hello: 'world' }, 'dog-cbor'), and dynamic registries are likely to complicate things further.
    • Type checkers / linters etc. usually won't spot these kinds of issues.

    All of that could be resolved simply by passing references instead of identifiers, e.g. multicodec.encode({ hello: 'world' }, codecs.dogCbor) would:

    1. provide a better error because a dynamic registry is no longer at play.
    2. a linter may pick up that the codecs.dogCbor property does not exist.
    3. a type checker will pick up the missing property, and when codecs.dagCbor is used it will be able to infer and hint things about return values.

After going down this rabbit hole I have realized that all this enables is:

  1. cid.toString() to work without a default base.
    > Note: cid.toString('base32') can just be swapped for cid.toString(base32), which addresses some of the issues highlighted above.
  2. Enables CID.from(cid:string), because it needs a base decoder for the given string.
  3. Enables CID.from(cid:Uint8Array).toString() because it needs a base encoder to serialize to string.

I think there is a way to address the above constraints while:

  1. Remove dependency injection on every library that needs to create / decode / parse CIDs.
  2. Reduce margin for error & enable tooling to do more by removing indirection.
  3. Either get rid of dependency injection and the registry altogether or have a separate layer for it.

    Note: Libraries that need to create / decode / parse CIDs will be free of that layer.

  4. Have one CID class to rule them all instead of one per configuration.

Here is the proposed interface to accomplish that (in TS format):

// # SoleBase

/**
 * Interface implemented by arbitrary base encoders like base32 or base58btc.
 */
export interface BaseEncoder {
  /**
   * Name of the encoding.
   */
  name: string
  prefix: string

  /**
   * Encodes binary data into base encoded string.
   */
  encode(bytes: Uint8Array): string
}

/**
 * Interface implemented by arbitrary base decoder like base32 or base58btc.
 */
export interface BaseDecoder {
  decode(text: string): Uint8Array
}

/**
 * In practice (at least currently) bases are both encoders and decoders,
 * however it is useful to separate those capabilities as senders would
 * need encoder capability and receiver would need decoder capability.
 */
export interface BaseCodec extends BaseEncoder, BaseDecoder {}


// # Multibase


// Multibase encoder seems redundant, because it needs to be provided a base
// encoder and at that point it could just do `base.encode(data)`
export interface MultibaseEncoder {
  encode(bytes: Uint8Array, base: BaseEncoder): string
}

// Multibase decoder is API compatible with sole base decoder. Conceptually
// it is a composite base decoder that delegates to a base decoder based on
// prefix. That is to suggest that `MultibaseDecoder` is also redundant.
export interface MultibaseDecoder extends BaseDecoder {
}

// # Multihash

/**
 * Represents a multihash digest which carries information about the
 * hashing algorithm and an actual hash digest.
 */
// Note: In the current version there is no first class multihash
// representation (a plain Uint8Array is used instead); there seem to be
// a bunch of places that parse it to extract (code, digest, size). By creating
// this first class representation we avoid reparsing and things generally fit
// really nicely.
declare class MultihashDigest {
  /**
   * Turns bytes representation of multihash digest into an instance.
   */
  static from(bytes:Uint8Array):MultihashDigest

  /**
   * Creates a multihash digest.
   */
  constructor(code:number, digest:Uint8Array)

  /**
   * Code of the multihash
   */
  code: number

  /**
   * Raw digest (without a hashing algorithm info)
   */
  digest: Uint8Array

  /**
   * byte length of the `this.digest`
   */
  size: number

  /**
   * Binary representation of this multihash digest.
   */
  bytes: Uint8Array
}


/**
 * Hasher represents a hashing algorithm implementation that produces a
 * `MultihashDigest`.
 */
interface Hasher {
  /**
   * Takes binary `input` and returns its (multi)hash digest.
   * @param {Uint8Array} input
   */
  digest(input: Uint8Array): MultihashDigest
}


// This is now redundant because one could just do `hasher.digest(input)`
// instead.
interface Multihash {
  digest(input: Uint8Array, hasher:Hasher): MultihashDigest
}


// # IPLD Codec

/**
 * IPLD encoder part of the codec.
 */
export interface Encoder<T> {
  name: string
  code: number
  encode(data: T): Uint8Array
}

/**
 * IPLD decoder part of the codec.
 */
export interface Decoder<T> {
  decode(bytes: Uint8Array): T
}

/**
 * IPLD codec that is just Encoder + Decoder. However it is useful to
 * separate those capabilities, as the sender requires an encoder and the
 * receiver requires a decoder.
 */
export interface Codec<T> extends Encoder<T>, Decoder<T> {}


// This now also looks redundant because one could just do `encoder.encode(value)`
// and `decoder.decode(bytes)` without this indirection.
export interface Multicodec {
  encode<T>(value: T, encoder: Encoder<T>): Uint8Array
  decode<T>(bytes: Uint8Array, decoder: Decoder<T>): T
}


// CID



declare class CID {
  // Data representation of the CID.
  code: number
  version: number
  multihash: MultihashDigest
  bytes: Uint8Array

  /**
   * Serializes this CID with the provided base encoder. If not provided, uses the
   * base encoder this CID was supplied with during instantiation.
   */
  toString(base?:BaseEncoder):string

  
  // Create in addition takes `base` parameter so that it can be used for
  // string serialization.
  static create(version:number, code:number, digest:MultihashDigest, base:BaseEncoder):DecodedCID

  // This is replacing `CID.from('QmHash')` which requires a multibase registry
  // that will have to contain the base (decoder) the cid was encoded with (which
  // could be missing). All this introduces a lot of incidental complexity
  // that can be removed by separating concerns.
  /**
   * Takes a cid string representation and a base decoder that supports that
   * encoding (it could be a sole decoder like `base32` or a composite
   * multi decoder like `multibase([base32, base58btc, ....])`).
   *
   * Throws if the base encoding is not supported by the supplied `base` decoder.
   */
  static parse(cid:string, base:BaseDecoder):ParsedCID

  /**
   * Takes a cid in binary representation and a `base` encoder that will be used
   * for default cid serialization.
   *
   * Throws if a non `base58btc` encoder is supplied with CID V0.
   */
  static decode(cid:Uint8Array, base:BaseEncoder):DecodedCID

  /**
   * Creates CID from a binary representation.
   */
  // 💣💣💣💣💣💣💣💣💣
  // This method is problematic because there is no information about base
  // encoding to be used by `this.toString()`.
  // 
  // I think there are following options to resolve this without having to
  // resort to register bound CID classes:
  //
  // 1. Bundle additional subclasses e.g. `Base32CID` and `Base58btcCID` (with
  //    corresponding base encoders bound) so that `CID.from(new Uint8Array(...))`
  //    is able to return either one or the other depending on cid version.
  //    `DefaultCID` illustrates an implementation of that option.
  //
  // 2. Let `CID.from(new Uint8Array(...)).toString()` throw with no default
  //    base encoder error. Additionally we could expose `Base32CID`,
  //    `Base58btcCID`, `DefaultCID` classes (as illustrated below) so that
  //    users can turn bytes into a CID in a chosen base encoding.
  //
  // 3. Remove `CID.from` altogether and have it on base specific subclasses
  //    like `Base32CID`, `DefaultCID` instead. This way `cid.toString()` will
  //    never throw as it would be impossible to create one without an encoder.
  // 
  //    Note that we could still have something like `multiformats.cid(value)`
  //    where (value:Uint8Array|string|CID) to allow a registry approach when
  //    and if it is more convenient. Benefit is that other libs will be able
  //    to interop with CIDs returned by multibase without having to pull it in
  //    or be constrained by dependency injection.
  // 
  // 4. Remove `CID.from(bytes:Uint8Array)` in favour of
  //    `CID.decode(bytes:Uint8Array, base:BaseEncoder)` so that the user has to
  //    specify a default base.
  static from(cid: Uint8Array):CID
  

  // If another CID instance is provided this just creates a clone. If we remove all
  // the other forms of `CID.from` we should probably get rid of this one as
  // well. If that functionality is needed we could add a `.clone()` method instead.
  static from(cid: CID):CID
}


// This just illustrates how `CID.from(cid:Uint8Array)` could be
// implemented if the `base32` and `base58btc` encodings were bundled with it.
class DefaultCID extends CID {
  static from(cid) {
    if (cid instanceof Uint8Array) {
      const bytes = cid
      const [version, offset] = varint.decode(bytes)
      switch (version) {
        case 18: {
          const multihash = new ImplicitSha256Digest(0x12, bytes)
          return CID.create(0, 0x70, multihash, base58btc)
        }
        case 1: {
          const [code, length] = varint.decode(bytes.subarray(offset))
          const multihash = MultihashDigest.from(bytes.subarray(offset + length))
          return CID.create(1, code, multihash, base32)
        }
        default: {
          throw new RangeError(`Invalid CID version ${version}`)
        }
      }
    } else {
      return CID.from(cid)
    }
  }
}

// Because CIDv0 does not really use multihash but rather a plain sha256 hash
// digest we define this implicit sha256 digest to represent it.
class ImplicitSha256Digest extends MultihashDigest {
  constructor(code:number, digest:Uint8Array) {
    this.digest = digest
    this.bytes = digest
    this.code = code
    this.size = 32
  }
}

// Just an illustration of a CID bound to base58btc base.
class Base58btcCID extends CID {
  toString(base=base58btc) {
    return base.encode(this.bytes)
  }
}

// Just an illustration of a CID bound to a base32 base.
class Base32CID extends CID {
  toString(base=base32) {
    if (this.version === 0 && base.name !== 'base58btc') {
      throw new Error(`Cannot string encode V0 in ${base.name} encoding`)
    } else {
      return base.encode(this.bytes)
    }
  }
}


// I do not think we need this subclass, but it is useful for illustration
// purposes.
class DecodedCID extends CID {
  // When a CID is created from binary data by calling `CID.create` or
  // `CID.decode` it needs to hold a reference to a `BaseEncoder` instance
  // so that it can be used by `toString()` when `base` isn't supplied.
  private base: BaseEncoder

  toString(base=this.base) {
    if (this.version === 0 && base.name !== 'base58btc') {
      throw new Error(`Cannot string encode V0 in ${base.name} encoding`)
    } else {
      return base.encode(this.bytes)
    }
  }
}

// I do not think we need this subclass, but it is useful for illustration
// purposes.
class ParsedCID extends CID {
  // When a CID is parsed from a string it does not need a `BaseEncoder` to provide
  // a `toString()` implementation, because the original string representation can be
  // retained.
  private asString: string

  toString(base) {
    if (base == null) {
      return this.asString
    } else if (this.version === 0) {
      if (base.name !== 'base58btc') {
        throw new Error(`Cannot string encode V0 in ${base.name} encoding`)
      } else {
        // no need to encode again since for version 0 we're guaranteed
        // to have base58btc
        return this.asString
      }
    } else {
      return base.encode(this.bytes)
    }
  }
}


// Block

// Just a representation for awaitable `T`.
export type Awaitable<T> =
  | T
  | Promise<T>


export interface Block {
  cid(): Awaitable<CID>
  encode(): Awaitable<Uint8Array>
}

export interface BlockEncoder {
  encode<T>(value: T, codec: Encoder<T>, options?: EncodeOptions): Block
}

interface EncodeOptions {
  /**
   * Multihasher to be used for the CID of the block. Will use a default
   * if not provided.
   */
  hasher?: Hasher
  /**
   * Base encoder that will be used by the CID of the block.
   * Default is used if omitted.
   */
  base?: BaseEncoder
}



export interface BlockDecoder {
  decode<T>(block: Uint8Array, codec: Decoder<T>, options?: DecodeOptions): Block
}

interface DecodeOptions {
  /**
   * Multihasher to be used for the CID of the block. Will use a default
   * if not provided.
   */
  hasher?: Hasher

  /**
   * Base encoder that will be used by the CID of the block.
   * Default is used if omitted.
   */
  base?: BaseEncoder
}

interface BlockAPI extends BlockEncoder, BlockDecoder {

}

dag-pb implementation of new interface?

It seems like this is sufficient to make it compatible (works in my tests locally). Is there a more "official" package already available (for compatibility reasons)?

const multiformats = require("multiformats/basics.js");
const { Buffer } = require("buffer");
const { util: { serialize, deserialize } } = require("ipld-dag-pb");
const dagpb = {
  encode: serialize,
  decode: buffer => deserialize(Buffer.from(buffer)),
  code: 0x70,
  name: "dag-pb",
};
multiformats.multicodec.add(dagpb);

CIDs as interfaces

#161 was closed and it looks like we've backtracked on the CIDs as interfaces thing. Now we have a Link interface and CID class that implements the link interface. We also have CIDs as an ArrayBufferView which, I don't know, maybe didn't turn out to be as useful as we thought it might be at the time, but the end result is it has extra properties byteOffset and byteLength which are obscured from the uninterested caller by making them non-enumerable. This works but it is very slow.

If we're going to town on #199 maybe we could do the CID-as-interface. I'd like to propose a very minimal interface, any static methods would be exposed as named exports instead.

export type CIDVersion = 0 | 1

export interface CID {
  version: CIDVersion
  code: number
  multihash: MultihashDigest
  bytes: Uint8Array

  equals: (other: CID | Uint8Array | string) => boolean
}

// create CIDs
export function createCID (version: CIDVersion, code: number, digest: MultihashDigest): CID
export function cidFromString (string: string): CID
export function cidFromBytes (bytes: Uint8Array): CID

// housekeeping
export function toV0 (cid: CID): CID
export function toV1 (cid: CID): CID
export function isCID (value: any): value is CID

Some thoughts:

  • toV0/toV1: can we live without these? A quick grep of the js-ipfs codebase shows we only call them in tests and for top-level UI-type output (the cli and the http gateway), so I would vote to remove them and let users have their own utils for this
  • asCID/isCID my (perhaps unpopular) opinion is that these are named too similarly and that asCID is clunky to use in practice as it requires null-guarding on the output. If the CID interface is only the fields above perhaps we don't need to convert old impls into new ones any more (the original purpose of asCID IIRC)
  • parse/decode vs cidFromString/cidFromBytes - the meaning of the words 'parse' and 'decode' are so close as to be synonymous which makes the API hard to use without reference to documentation; let's just say what they do. @libp2p/peer-id has taken this approach and it makes the codebase a bit more straightforward to the newcomer.
  • toJSON can we live without this? Similar to toV0 and toV1 I only see it used in UIs.
  • toString(encoder?: MultibaseEncoder) omission - maybe if you have a non-default encoder you could just use it to encode the .bytes property of the CID instead of passing it to the toString method? Calling cid.toString() would only ever return the default encoding (base58btc for v0, base32 for v1). The current impl maintains a cache of strings which might be a reason to keep it? I am not sure where this has been a bottleneck in the past though.
  • CIDIface/CIDImpl - @multiformats/multiaddr and @libp2p/peer-id have avoided this sort of naming scheme by intentionally not exporting the default implementation. You only get factory functions that take some input and return something that conforms to the interface. I suggest following that convention here too.

Should a codec export a `codec({ encode, decode })` or bare `{ encode, decode }`?

Discussion so far spread through:

First PR was copying the pattern from the raw and json codecs in this library. @mikeal and I had a direct chat about this between the first two PRs and agreed that we should just export bare and allow wrapping by the consuming code.

I honestly don't care too much about what to export since we're likely going to be importing multiformats in these codecs anyway to deal with CID (and bytes is handy too) so it's not a big stretch to just wrap. I don't really see a need to have the bare ones available, it's just a matter of deciding how low-level codecs should be.

What I am concerned about is user ergonomics. For most codecs it's going to be fine, but I anticipate users wanting the prepare() helper from @ipld/dag-json so the import semantics become a concern if we return a wrapped codec(), as I said in the first PR above.

Various options if we want to wrap codecs include:

Export default and additions (a little bit gross for the user):

import dagPB, { prepare, validate } from '@ipld/dag-pb'

Export wrapped codec but decorating it on the way out (a little bit gross because it messes with the clean Codec):

import dagPB from '@ipld/dag-pb'

dagPB.prepare(...)
dagPB.validate(...)

Export both wrapped and bare versions (fine, but comes back to the question of the utility of having both, do we need them both or are we just risking adding more API surface area we'll need to maintain into the future?):

import dagPB from '@ipld/dag-pb'
import {  code, name, encode, decode, prepare } from '@ipld/dag-pb/codec' 

Export utilities separately (fine, not the prettiest API but not so gross):

import dagPB from '@ipld/dag-pb'
import prepare from '@ipld/dag-pb/prepare' 
import validate from '@ipld/dag-pb/validate' 

or

import dagPB from '@ipld/dag-pb'
import { prepare, validate } from '@ipld/dag-pb/util' 

// @mikeal @Gozala

Proposal: add some method e.g. `.toLink(): CID` on CID object

Rationale

Often when working with partial DAGs we find ourselves with a mix of materialized nodes for local blocks and links for external blocks. When working in such a domain I often find myself wishing:

  1. I could pass either a materialized node or a CID into functions without having to turn the node into a CID (often that also requires checking whether the thing at hand is a CID or a node).
  2. In functions that care about links, I wish I could allow passing anything that can be linked via a CID.

Both sides of the call would be happier if you could pass something "linkable" instead; that way callers could pass arbitrary nodes (as long as they are linkable) and the callee could just use a link without having to inspect its arguments.

Proposal

  1. I would like to propose introducing a new interface, e.g. Linkable:
interface Linkable {
  toLink(): CID
}
  2. Make CID an implementation of the proposed Linkable interface.

That way arbitrary objects could implement the Linkable interface, which would allow them to be used everywhere a Linkable is accepted. Functions that typically want CIDs could also start accepting more general Linkables and therefore allow passing arbitrary objects that can be linked.
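
A minimal sketch of the proposed interface and a node type implementing it (the names here are illustrative, not an existing API):

import type { CID } from 'multiformats/cid'

interface Linkable {
  toLink(): CID
}

// a hypothetical materialized node that knows its own CID
class Node implements Linkable {
  constructor (private cid: CID, public value: unknown) {}
  toLink (): CID {
    return this.cid
  }
}

// functions that only care about links can accept anything Linkable
const link = (value: Linkable): CID => value.toLink()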

Alternative

For what it's worth, we already have something along these lines in the form of the asCID property. However, it is a private property, so TS complains about it. We could make it non-private and achieve the same goal by implementing the asCID property on the desired nodes.

base64url and base64urlpad broken in browsers

This test passes in node but fails in the browser with Failed to execute 'atob' on 'Window': The string to be decoded is not correctly encoded.:

test('should round trip base64url', () => {
  const buf = Uint8Array.from([239, 250, 254])
  const enc = b64.base64url.encode(buf)
  const dec = b64.base64url.decode(enc)

  same(dec, buf)
})

I think this is because the browser implementation of base64 uses the btoa and atob functions which treat binary data as strings. When the data has values that do not map to printable characters it corrupts the data as we see above.

js-multibase handled this by using its own rfc4648-compliant encoding/decoding algorithm - maybe we could re-use that here instead?

We can then also remove the __browser property from the base64 exports making it consistent with the other bases and it becomes trivial to support the final few baseN encodings that are missing from this library.

Print a deprecation warning on use instead of throwing an error, which breaks compatibility

js-multiformats/src/cid.js

Lines 162 to 180 in f53f7aa

get toBaseEncodedString () {
  throw new Error('Deprecated, use .toString()')
}
get codec () {
  throw new Error('"codec" property is deprecated, use integer "code" property instead')
}
get buffer () {
  throw new Error('Deprecated .buffer property, use .bytes to get Uint8Array instead')
}
get multibaseName () {
  throw new Error('"multibaseName" property is deprecated')
}
get prefix () {
  throw new Error('"prefix" property is deprecated')
}

How to access `.or` prop of decoders

I'd like to compose a decoder, something like:

import * as b32 from 'multiformats/bases/base32'
import * as b36 from 'multiformats/bases/base36'
import * as b58 from 'multiformats/bases/base58'
import * as b64 from 'multiformats/bases/base64'
import { base32 } from 'multiformats/bases/base32'

const bases:Record<string, MultibaseCodec<any>> = {
  ...b32,
  ...b36,
  ...b58,
  ...b64
}

const baseDecoder = Object
  .keys(bases)
  .map(key => bases[key].decoder)
  .reduce(
    (acc, curr) => acc.or(curr),  // <--- fails because `.or` is not in the types though it is in the `Decoder` class
    base32.decoder
  )

It's not clear to me how I'm supposed to do this and not upset the type checker, any pointers?

I tried adding .or to the MultibaseDecoder type def but it explodes because elsewhere you need to know if you're being passed a UnibaseDecoder or a CombobaseDecoder to create a ComposedDecoder - I started trying to fix it but it got a little out of hand so I thought I'd ask instead.

Skypack build failing with [email protected]

multiformats module was previously working fine in skypack, but with the latest release (9.4.5) its build is now failing, as you can see in https://codepen.io/vascosantos/pen/qBjByRo?editors=0011


The build error can be seen in https://cdn.skypack.dev/error/build:[email protected] . Doing a simple https://cdn.skypack.dev/[email protected]/cid works fine, but if we change the version to the next one it will be forwarded to the build error: https://cdn.skypack.dev/[email protected]/cid

From 9.4.5 only the setup-node action was changed. Also, it seems that ipjs did not have any release between 9.4.4 and 9.4.5. With this in mind, I am clueless at this point as to what the problem is here.

Any ideas @rvagg ?

Any interest in using a synchronous hasher ?

In this line the exports are switched depending on whether a browser is loading this library or nodejs. This is the only export to switch in this way.

This means that the operation is synchronous in nodejs, but asynchronous in the browser, which can cause a headache for isomorphic codebases.

If @noble/hashes were used as the hasher on both Node.js and the browser, then this oddity could be removed, while keeping performance and benefiting from an audit.
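
A sketch of what that could look like using the existing hasher.from helper (assumptions: @noble/hashes is added as a dependency and exposes sha256 at @noble/hashes/sha256):

import { sha256 } from '@noble/hashes/sha256'
import { from } from 'multiformats/hashes/hasher'

// a sha2-256 hasher backed by @noble/hashes on both node and the browser
const nobleSha256 = from({
  name: 'sha2-256',
  code: 0x12,
  encode: (input) => sha256(input)
})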

Should we use `/` instead of `asCID` ?

It recently occurred to me that dag-json uses { '/': CID } for encoding links (I know, I hate it too). However, since we have introduced the new cid.asCID === cid, that got me thinking we could rename (or alias) the cid.asCID property to cid['/'], and it would technically hold { '/': CID }.

Not sure if this is just a brain-fart or there is something to it, but I thought I'd share to find out.

package.json exports do not work well with typescript

When trying to import multiformats modules like multiformats/block in TypeScript I keep getting errors. For example:

import * as Block from 'multiformats/block';

Produces the following exception when I run tsc:

Error: Cannot find module 'multiformats/block'
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:636:15)
    at Function.Module._load (internal/modules/cjs/loader.js:562:25)
    at Module.require (internal/modules/cjs/loader.js:692:17)
    at require (internal/modules/cjs/helpers.js:25:18)
    at Object.<anonymous> (/home/mikola/Projects/rabin-b-tree/src/rabin-b-tree.ts:1:15)
    at Module._compile (internal/modules/cjs/loader.js:778:30)
    at Module.m._compile (/home/mikola/Projects/rabin-b-tree/node_modules/ts-node/src/index.ts:1043:23)
    at Module._extensions..js (internal/modules/cjs/loader.js:789:10)
    at Object.require.extensions.(anonymous function) [as .ts] (/home/mikola/Projects/rabin-b-tree/node_modules/ts-node/src/index.ts:1046:12)
    at Module.load (internal/modules/cjs/loader.js:653:32)

I am currently using the latest versions of node and tsc in the project.

mikola@cybertoaster:~/Projects/rabin-b-tree$ ./node_modules/.bin/node --version
v15.0.1
mikola@cybertoaster:~/Projects/rabin-b-tree$ ./node_modules/.bin/tsc --version
Version 4.0.3

The underlying problem is probably that tsc doesn't recognize package.json exports yet: microsoft/TypeScript#33079

As a result it doesn't seem like you can really use multiformats with TypeScript yet.

Missing "main" entry point file

Some parsers don't understand exports and fall back to main... right now, this causes issues, as the "main" key points to a non-existent file. For example, in React Native, the following error occurs when importing this module:

Error: While trying to resolve module `multiformats` from file `.../MyApp/node_modules/some/dep/package/index.js`, the package `.../MyApp/node_modules/multiformats/package.json` was successfully found. However, this package itself specifies a `main` module field that could not be resolved (`.../MyApp/node_modules/multiformats/index`. Indeed, none of these files exist:

Moving source into `src`

Would you be OK with me moving all the source code into a src directory? That would make grepping the source code so much easier.

Removing version from CID constructor

I suggest that the version is removed from the constructor and that it is always v1 automatically. There could be a dedicated v0 constructor, or we could just allow constructing it from a multihash (which is probably most of its uses).
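
For what it's worth, dedicated static constructors along these lines already exist; the proposal would essentially make them the primary entry points (a sketch using the current CID.createV0 / CID.createV1 helpers; verify the exact signatures against the source):

import { CID } from 'multiformats/cid'
import * as json from 'multiformats/codecs/json'
import { sha256 } from 'multiformats/hashes/sha2'

const hash = await sha256.digest(json.encode({ hello: 'world' }))
const v1 = CID.createV1(json.code, hash) // no version argument
const v0 = CID.createV0(hash)            // dedicated v0 constructor (dag-pb + sha2-256 only)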

consider adding `toURL` method to embed best practices of serializing cid to URL

There have been various discussions about the fact that in the wild we see IPFS URLs pointing to specific gateways. I think we can bake a best practice into the CID implementation by adding a method like:

interface CID {
  // ... stuff we already have
  /**
   * Encodes CID into a URL object.
   *
   * @example
   * ```js
   * const cid = CID.parse('QmbrRJJNKmPDUAZ8CGwn1WNx2C7xP4J284VWoAUDaCiLaD')
   * cid.toURL().href // 'ipfs://bafybeigizayotjo4whdurcq6ge7nrgfyxox7ji7oviesmnvgrnxn3nakni/'
   * ```
   * 
   * Optionally you could provide a gateway URL to encode CID to a URL in that gateway.
   * @example
   * ```js
   * const cid = CID.parse('QmbrRJJNKmPDUAZ8CGwn1WNx2C7xP4J284VWoAUDaCiLaD')
   * cid.toURL({ gateway: new URL('https://dweb.link') }).href
   * // => 'https://dweb.link/ipfs/bafybeigizayotjo4whdurcq6ge7nrgfyxox7ji7oviesmnvgrnxn3nakni'
   * ```
   */
  toURL(options?: { gateway?: URL }):URL
}

Note that here we reinforce several of the best practices:

  1. ipfs://${cidv1} is the default that we want to see.
  2. We want to encourage CID v1 because it addresses a bunch of issues that we have with v0.
  3. Gateway URLs are origin-separated.
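
Until (or unless) something like this lands, the same behaviour can be sketched in userland on top of the existing API (toURL below is a hypothetical helper, not part of the library):

import { CID } from 'multiformats/cid'

// hypothetical helper: always upgrades to CIDv1 and defaults to the ipfs:// scheme
const toURL = (cid, { gateway } = {}) => {
  const v1 = cid.version === 0 ? cid.toV1() : cid
  return gateway === undefined
    ? new URL(`ipfs://${v1.toString()}/`)
    : new URL(`/ipfs/${v1.toString()}`, gateway)
}

toURL(CID.parse('QmbrRJJNKmPDUAZ8CGwn1WNx2C7xP4J284VWoAUDaCiLaD')).href
//> 'ipfs://bafybeigizayotjo4whdurcq6ge7nrgfyxox7ji7oviesmnvgrnxn3nakni/'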

Generated Typescript types fail to compile, are missing some generic arguments

Hi, all. The generated .d.ts files are missing some generic arguments, making builds fail when using these type definitions. For instance, in base.d.ts:

declare class ComposedDecoder<Prefix extends string> implements MultibaseDecoder, CombobaseDecoder {

MultibaseDecoder and CombobaseDecoder both require a Prefix generic type argument, but none is supplied.

This is the full set of errors I encountered, but it may not be exhaustive, since I'm not importing all submodules:

../../node_modules/multiformats/dist/types/bases/base.d.ts:16:75 - error TS2314: Generic type 'MultibaseCodec' requires 1 type argument(s).

16 export class Codec<Base extends string, Prefix extends string> implements MultibaseCodec, MultibaseEncoder, MultibaseDecoder, BaseCodec, BaseEncoder, BaseDecoder {
                                                                             ~~~~~~~~~~~~~~

../../node_modules/multiformats/dist/types/bases/base.d.ts:16:91 - error TS2314: Generic type 'MultibaseEncoder' requires 1 type argument(s).

16 export class Codec<Base extends string, Prefix extends string> implements MultibaseCodec, MultibaseEncoder, MultibaseDecoder, BaseCodec, BaseEncoder, BaseDecoder {
                                                                                             ~~~~~~~~~~~~~~~~

../../node_modules/multiformats/dist/types/bases/base.d.ts:16:109 - error TS2314: Generic type 'MultibaseDecoder' requires 1 type argument(s).

16 export class Codec<Base extends string, Prefix extends string> implements MultibaseCodec, MultibaseEncoder, MultibaseDecoder, BaseCodec, BaseEncoder, BaseDecoder {
                                                                                                               ~~~~~~~~~~~~~~~~

../../node_modules/multiformats/dist/types/bases/base.d.ts:94:78 - error TS2314: Generic type 'MultibaseEncoder' requires 1 type argument(s).

94 declare class Encoder<Base extends string, Prefix extends string> implements MultibaseEncoder, BaseEncoder {
                                                                                ~~~~~~~~~~~~~~~~

../../node_modules/multiformats/dist/types/bases/base.d.ts:132:78 - error TS2314: Generic type 'MultibaseDecoder' requires 1 type argument(s).

132 declare class Decoder<Base extends string, Prefix extends string> implements MultibaseDecoder, UnibaseDecoder, BaseDecoder {
                                                                                 ~~~~~~~~~~~~~~~~

../../node_modules/multiformats/dist/types/bases/base.d.ts:132:96 - error TS2314: Generic type 'UnibaseDecoder' requires 1 type argument(s).

132 declare class Decoder<Base extends string, Prefix extends string> implements MultibaseDecoder, UnibaseDecoder, BaseDecoder {
                                                                                                   ~~~~~~~~~~~~~~

../../node_modules/multiformats/dist/types/bases/base.d.ts:166:65 - error TS2314: Generic type 'MultibaseDecoder' requires 1 type argument(s).

166 declare class ComposedDecoder<Prefix extends string> implements MultibaseDecoder, CombobaseDecoder {
                                                                    ~~~~~~~~~~~~~~~~

../../node_modules/multiformats/dist/types/bases/base.d.ts:166:83 - error TS2314: Generic type 'CombobaseDecoder' requires 1 type argument(s).

166 declare class ComposedDecoder<Prefix extends string> implements MultibaseDecoder, CombobaseDecoder {
                                                                                      ~~~~~~~~~~~~~~~~

../../node_modules/multiformats/dist/types/codecs/codec.d.ts:19:78 - error TS2314: Generic type 'BlockEncoder' requires 2 type argument(s).

19 export class Encoder<T, Name extends string, Code extends number> implements BlockEncoder {
                                                                                ~~~~~~~~~~~~

../../node_modules/multiformats/dist/types/codecs/codec.d.ts:41:57 - error TS2314: Generic type 'BlockDecoder' requires 2 type argument(s).

41 export class Decoder<Code extends number, T> implements BlockDecoder {
                                                           ~~~~~~~~~~~~

../../node_modules/multiformats/dist/types/codecs/codec.d.ts:64:76 - error TS2314: Generic type 'BlockCodec' requires 2 type argument(s).

64 export class Codec<Name extends string, Code extends number, T> implements BlockCodec {
                                                                              ~~~~~~~~~~

Property names with '/' character cannot be retrieved using path

Thank you for this library - I appreciate that this problem is very rare.
In this file:

get (path = '/') {
the get function fails if any property keys contain a '/'.

e.g. for the object { 'this/will/be/funny': false }, if a block is made out of it, block.get('this/will/be/funny') will not work. I think that this library should change the interface to be get(['this/will/be/funny']) instead of a string, to make this unambiguous, but I do appreciate this issue may be too pedantic to warrant a change.

As a curious piece of trivia, linux paths cannot have the '/' in directory names not because of any config or other configurable parameter, but because it remains to this day hard coded into the kernel. JS objects on the other hand....

TODO: DAG walk for Block API

We've had APIs for DAG walks in various iterations but don't have a generic one for the js-multiformats stack and it would be a nice fit in the core package.

e.g. https://github.com/ipld/js-block/blob/master/reader.js#L4 was for the short-lived js-block experiment that predates js-multiformats, and I think there's a library that did it for the older interfaces too. Not a hard algorithm, but the complicating factor now is that we don't want to bundle codecs in the core API—the user needs to provide them. So the API might be a little gnarly with an array of codecs, or we could move forward with something like #38 to make this easier/cleaner.

https://github.com/ipfs/js-ipfs/blob/6a2c710e4b66e76184320769ff9789f1fbabe0d8/packages/ipfs-core/src/components/dag/export.js#L82-L107 was the last implementation of this done. Any implementation here should aim to replace most of that code with a call into this library.

ipfs-car would also be a logical consumer of such an API, to help make deterministic CARs: web3-storage/ipfs-car#76

Questions around ordering will need to have some clarity when comparing to Go APIs. Such DAG walks using go-ipld-prime will strictly use map field ordering as they appear in the encoded block. We have a little more difficulty in JS since we have to rely on Object property ordering to do this work. It would be good to test and document any potential footguns for users wrt compatibility and "determinism" related to this.

Multidecoder Interface

Right now -- I should say, from how I understand it -- js-multiformats defines an interface for encoding and decoding using individual codecs. This encapsulates the codec pattern perfectly well. In fact, for encoding, it's really all you need; seeing as you can only encode something using a single codec.

However, when dealing with IPLD in the more general sense (which strives to be polymorphic over its serializations), developers are left on their own to create codec/hash registries that will hopefully be complete enough to be able to deserialize what's given to them.

This is especially exacerbated in the case of libraries which take CIDs from their users. Library authors have to either:

  • Include as many codecs as possible
    • This introduces massive dependency size overhead
    • Future needs for codecs means library updates are necessary
  • Accept extra codecs to be matched against when deserializing data

Obviously, the former of the two is an obvious antipattern. The latter, though, has been used successfully in many of the higher level JavaScript IPFS libraries to provide generic interfaces libraries can consume -- dependency injection.

I propose a similar interface, named something like a PolyMultidecoder which exposes an add() and remove() to (de)register BlockDecoders, and a single decode() which resolves and uses the correct decoder, or throws an error if it's otherwise missing.
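
A rough sketch of what such an interface could look like (PolyMultidecoder is only the proposed name and nothing here exists in the library yet; the BlockDecoder shape is restated inline for the sketch):

interface BlockDecoder<Code extends number, T> {
  code: Code
  decode(bytes: Uint8Array): T
}

class PolyMultidecoder {
  private decoders = new Map<number, BlockDecoder<number, unknown>>()

  add (decoder: BlockDecoder<number, unknown>): void {
    this.decoders.set(decoder.code, decoder)
  }

  remove (code: number): void {
    this.decoders.delete(code)
  }

  // resolves the registered decoder for `code`, or throws if it is missing
  decode (code: number, bytes: Uint8Array): unknown {
    const decoder = this.decoders.get(code)
    if (decoder === undefined) {
      throw new Error(`No decoder registered for codec 0x${code.toString(16)}`)
    }
    return decoder.decode(bytes)
  }
}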

Proposal: CID+Block=Multiblock

We already have multibase and multihash, which in a nutshell are all metadata+data. We do not, however, have a similar thing for blocks, so it becomes impossible to derive what codec to use to decode them.

In the past when I was working on https://github.com/gozala/ipdf/ I came up with cid+block thing that I called inline blocks, so that graphs could contain encrypted and concealed sub-graphs that would only reveal themselves to the key holder.

CAR format seems to also pair CID+blocks.

And this thread #36 (comment) I think also illustrates lack of such abstraction.

Ironically JS Block instance also contains CID+Block but when you encoded you can no longer decode it back without additionally providing 'codec' information.

I think if we formalize such a building block it would allow for nice and composable libraries around it.
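
As a hand-wavy sketch of what that building block could be: the CID bytes followed by the block bytes, mirroring the CID+block pairing CAR uses. encodeMultiblock / decodeMultiblock are hypothetical names, and the decode side assumes CID.decodeFirst consumes the CID at the front of the buffer and returns the remainder.

import { CID } from 'multiformats/cid'

const encodeMultiblock = (cid, blockBytes) => {
  // CID bytes are self-delimiting, so plain concatenation is enough
  const out = new Uint8Array(cid.bytes.length + blockBytes.length)
  out.set(cid.bytes, 0)
  out.set(blockBytes, cid.bytes.length)
  return out
}

const decodeMultiblock = (bytes) => {
  // decodeFirst reads the CID and hands back the rest of the buffer
  const [cid, blockBytes] = CID.decodeFirst(bytes)
  return { cid, bytes: blockBytes }
}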

Type problem when doing a generic multibase decode

What is the right way to do a generic multibase decoder?

import { bases } from 'multiformats/basics';

const ds = Object.values(bases).map((c) => c.decoder).reduce(
  (d1, d2) => d1.or(d2)
);

ds.decode('...');

The above results in a type error:

Type 'ComposedDecoder<string>' is not assignable to type 'Decoder<"base58btc", "z"> | Decoder<string, string> | Decoder<"identity", "\0"> | Decoder<"base58flickr", "Z"> | Decoder<"base36", "k"> | Decoder<...> | Decoder<...>'.
  Type 'ComposedDecoder<string>' is missing the following properties from type 'Decoder<"base10", "9">': name, prefix, baseDecode  ts(2322)
lib.es5.d.ts(1379, 24): The expected type comes from the return type of this signature.

If I try to use any of the types from:

import type { UnibaseDecoder, CombobaseDecoder, MultibaseDecoder, BaseDecoder } from 'multiformats/bases/interface';

None of them work, because none of those interfaces declare the or method.

The other problem is that the Decoder class is not an exported class, and neither is ComposedDecoder.

This used to be possible with the old multibase library.
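
One workaround sketch, not an endorsement of the current types: widen every decoder to a local helper type that declares both decode and or, so the reduce accumulator stays a single type. AnyDecoder is a hypothetical local interface, and the casts assume the runtime objects keep behaving the way the snippet above shows.

import { CID } from 'multiformats/cid'
import { bases } from 'multiformats/basics'
import type { MultibaseDecoder } from 'multiformats/bases/interface'

// local helper type: MultibaseDecoder plus the `or` composition method
interface AnyDecoder extends MultibaseDecoder<string> {
  or(other: MultibaseDecoder<string>): AnyDecoder
}

const anybase = Object.values(bases)
  .map((base) => base.decoder as unknown as AnyDecoder)
  .reduce((acc, decoder) => acc.or(decoder))

// usable anywhere a MultibaseDecoder is expected, e.g. parsing CIDs in any base
const parseAnyBase = (input: string) => CID.parse(input, anybase)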

Performance of Object.defineProperties use in the CID class

I'm doing some profiling to debug high CPU usage and Object.defineProperties takes up rather a lot of execution time:

[profiler screenshot]

There's an interesting post here where someone tries to optimise very similar usage; one notable quote:

there [is] basically no truly efficient way to make properties nonenumerable

Would it be so terrible to remove the Object.defineProperties invocation here?

If we absolutely must have some sort of no-you-cannot-access-them-don't-even-think-about-it-the-sky-will-fall-on-our-heads-type protection for these fields could we switch to using private class fields instead?
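
For reference, a rough sketch of the private-field alternative; this is not the actual CID implementation, just an illustration that #-fields stay out of Object.keys and JSON.stringify output without a per-instance Object.defineProperties call.

class CIDLike {
  #version
  #code

  constructor (version, code, multihash, bytes) {
    this.#version = version
    this.#code = code
    this.multihash = multihash
    this.bytes = bytes
  }

  get version () { return this.#version }
  get code () { return this.#code }
}

const example = new CIDLike(1, 0x0200, null, new Uint8Array())
console.log(Object.keys(example)) // [ 'multihash', 'bytes' ]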

Usage react-scripts tests: TypeError: json.encode is not a function

When trying to use multiformats with react-scripts (jest), I get an odd error:

yarn create react-app multiformats-react
cd multiformats-react
yarn add multiformats

Then, I just add code from the readme in App.js:

import { useEffect } from 'react'
import { CID } from 'multiformats/cid'
import * as json from 'multiformats/codecs/json'
import { sha256 } from 'multiformats/hashes/sha2'
import { base64 } from 'multiformats/bases/base64'

function App() {

    useEffect(() => {
        const bytes = json.encode({ hello: 'world' })
        const hash = sha256.digest(bytes).then((hash) => {
            const cid = CID.create(1, json.code, hash)
            console.log(cid.toString(base64.encoder))
        })
    })
    // ...
}

And I get the following error when running react-scripts test (which uses jest under the hood):

yarn test
$ react-scripts test
 FAIL  src/App.test.js
  ✕ renders learn react link (89 ms)

  ● renders learn react link

    TypeError: json.encode is not a function

      12 |
      13 |     useEffect(() => {
    > 14 |         const bytes = json.encode({ hello: 'world' })
         |                            ^
      15 |         const hash = sha256.digest(bytes).then((hash) => {
      16 |             const cid = CID.create(1, json.code, hash)
      17 |             console.log(cid.toString(base64.encoder))

      at src/App.js:14:28
      at invokePassiveEffectCreate (node_modules/react-dom/cjs/react-dom.development.js:23487:20)
      at HTMLUnknownElement.callCallback (node_modules/react-dom/cjs/react-dom.development.js:3945:14)
      at HTMLUnknownElement.callTheUserObjectsOperation (node_modules/jsdom/lib/jsdom/living/generated/EventListener.js:26:30)
      at innerInvokeEventListeners (node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:318:25)
      at invokeEventListeners (node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:274:3)
      at HTMLUnknownElementImpl._dispatch (node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:221:9)
      at HTMLUnknownElementImpl.dispatchEvent (node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:94:17)
      at HTMLUnknownElement.dispatchEvent (node_modules/jsdom/lib/jsdom/living/generated/EventTarget.js:231:34)
      at Object.invokeGuardedCallbackDev (node_modules/react-dom/cjs/react-dom.development.js:3994:16)
      at invokeGuardedCallback (node_modules/react-dom/cjs/react-dom.development.js:4056:31)
      at flushPassiveEffectsImpl (node_modules/react-dom/cjs/react-dom.development.js:23574:9)
      at unstable_runWithPriority (node_modules/scheduler/cjs/scheduler.development.js:468:12)
      at runWithPriority$1 (node_modules/react-dom/cjs/react-dom.development.js:11276:10)
      at flushPassiveEffects (node_modules/react-dom/cjs/react-dom.development.js:23447:14)
      at Object.<anonymous>.flushWork (node_modules/react-dom/cjs/react-dom-test-utils.development.js:992:10)
      at act (node_modules/react-dom/cjs/react-dom-test-utils.development.js:1107:9)
      at render (node_modules/@testing-library/react/dist/pure.js:97:26)
      at Object.<anonymous> (src/App.test.js:5:3)

  console.error
    Error: Uncaught [TypeError: json.encode is not a function]
        at reportException (/home/froyer/src/multiformat-react/node_modules/jsdom/lib/jsdom/living/helpers/runtime-script-errors.js:62:24)
        at innerInvokeEventListeners (/home/froyer/src/multiformat-react/node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:333:9)
        at invokeEventListeners (/home/froyer/src/multiformat-react/node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:274:3)
        at HTMLUnknownElementImpl._dispatch (/home/froyer/src/multiformat-react/node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:221:9)
        at HTMLUnknownElementImpl.dispatchEvent (/home/froyer/src/multiformat-react/node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:94:17)
        at HTMLUnknownElement.dispatchEvent (/home/froyer/src/multiformat-react/node_modules/jsdom/lib/jsdom/living/generated/EventTarget.js:231:34)
        at Object.invokeGuardedCallbackDev (/home/froyer/src/multiformat-react/node_modules/react-dom/cjs/react-dom.development.js:3994:16)
        at invokeGuardedCallback (/home/froyer/src/multiformat-react/node_modules/react-dom/cjs/react-dom.development.js:4056:31)
        at flushPassiveEffectsImpl (/home/froyer/src/multiformat-react/node_modules/react-dom/cjs/react-dom.development.js:23574:9)
        at unstable_runWithPriority (/home/froyer/src/multiformat-react/node_modules/scheduler/cjs/scheduler.development.js:468:12) TypeError: json.encode is not a function
        at /home/froyer/src/multiformat-react/src/App.js:14:28
        at invokePassiveEffectCreate (/home/froyer/src/multiformat-react/node_modules/react-dom/cjs/react-dom.development.js:23487:20)
        at HTMLUnknownElement.callCallback (/home/froyer/src/multiformat-react/node_modules/react-dom/cjs/react-dom.development.js:3945:14)
        at HTMLUnknownElement.callTheUserObjectsOperation (/home/froyer/src/multiformat-react/node_modules/jsdom/lib/jsdom/living/generated/EventListener.js:26:30)
        at innerInvokeEventListeners (/home/froyer/src/multiformat-react/node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:318:25)
        at invokeEventListeners (/home/froyer/src/multiformat-react/node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:274:3)
        at HTMLUnknownElementImpl._dispatch (/home/froyer/src/multiformat-react/node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:221:9)
        at HTMLUnknownElementImpl.dispatchEvent (/home/froyer/src/multiformat-react/node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:94:17)
        at HTMLUnknownElement.dispatchEvent (/home/froyer/src/multiformat-react/node_modules/jsdom/lib/jsdom/living/generated/EventTarget.js:231:34)
        at Object.invokeGuardedCallbackDev (/home/froyer/src/multiformat-react/node_modules/react-dom/cjs/react-dom.development.js:3994:16)
        at invokeGuardedCallback (/home/froyer/src/multiformat-react/node_modules/react-dom/cjs/react-dom.development.js:4056:31)
        at flushPassiveEffectsImpl (/home/froyer/src/multiformat-react/node_modules/react-dom/cjs/react-dom.development.js:23574:9)
        at unstable_runWithPriority (/home/froyer/src/multiformat-react/node_modules/scheduler/cjs/scheduler.development.js:468:12)
        at runWithPriority$1 (/home/froyer/src/multiformat-react/node_modules/react-dom/cjs/react-dom.development.js:11276:10)
        at flushPassiveEffects (/home/froyer/src/multiformat-react/node_modules/react-dom/cjs/react-dom.development.js:23447:14)
        at Object.<anonymous>.flushWork (/home/froyer/src/multiformat-react/node_modules/react-dom/cjs/react-dom-test-utils.development.js:992:10)
        at act (/home/froyer/src/multiformat-react/node_modules/react-dom/cjs/react-dom-test-utils.development.js:1107:9)
        at render (/home/froyer/src/multiformat-react/node_modules/@testing-library/react/dist/pure.js:97:26)
        at Object.<anonymous> (/home/froyer/src/multiformat-react/src/App.test.js:5:3)
        at Promise.then.completed (/home/froyer/src/multiformat-react/node_modules/jest-circus/build/utils.js:276:28)
        at new Promise (<anonymous>)
        at callAsyncCircusFn (/home/froyer/src/multiformat-react/node_modules/jest-circus/build/utils.js:216:10)
        at _callCircusTest (/home/froyer/src/multiformat-react/node_modules/jest-circus/build/run.js:212:40)
        at processTicksAndRejections (node:internal/process/task_queues:96:5)
        at _runTest (/home/froyer/src/multiformat-react/node_modules/jest-circus/build/run.js:149:3)
        at _runTestsForDescribeBlock (/home/froyer/src/multiformat-react/node_modules/jest-circus/build/run.js:63:9)
        at run (/home/froyer/src/multiformat-react/node_modules/jest-circus/build/run.js:25:3)
        at runAndTransformResultsToJestFormat (/home/froyer/src/multiformat-react/node_modules/jest-circus/build/legacy-code-todo-rewrite/jestAdapterInit.js:176:21)
        at jestAdapter (/home/froyer/src/multiformat-react/node_modules/jest-circus/build/legacy-code-todo-rewrite/jestAdapter.js:109:19)
        at runTestInternal (/home/froyer/src/multiformat-react/node_modules/jest-runner/build/runTest.js:380:16)
        at runTest (/home/froyer/src/multiformat-react/node_modules/jest-runner/build/runTest.js:472:34)
        at Object.worker (/home/froyer/src/multiformat-react/node_modules/jest-runner/build/testWorker.js:133:12)

      at VirtualConsole.<anonymous> (node_modules/jsdom/lib/jsdom/virtual-console.js:29:45)
      at reportException (node_modules/jsdom/lib/jsdom/living/helpers/runtime-script-errors.js:66:28)
      at innerInvokeEventListeners (node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:333:9)
      at invokeEventListeners (node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:274:3)
      at HTMLUnknownElementImpl._dispatch (node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:221:9)
      at HTMLUnknownElementImpl.dispatchEvent (node_modules/jsdom/lib/jsdom/living/events/EventTarget-impl.js:94:17)

  console.error
    The above error occurred in the <App> component:
    
        at App (/home/froyer/src/multiformat-react/src/App.js:13:5)
    
    Consider adding an error boundary to your tree to customize error handling behavior.
    Visit https://reactjs.org/link/error-boundaries to learn more about error boundaries.

      at logCapturedError (node_modules/react-dom/cjs/react-dom.development.js:20085:23)
      at update.callback (node_modules/react-dom/cjs/react-dom.development.js:20118:5)
      at callCallback (node_modules/react-dom/cjs/react-dom.development.js:12318:12)
      at commitUpdateQueue (node_modules/react-dom/cjs/react-dom.development.js:12339:9)
      at commitLifeCycles (node_modules/react-dom/cjs/react-dom.development.js:20736:11)
      at commitLayoutEffects (node_modules/react-dom/cjs/react-dom.development.js:23426:7)
      at HTMLUnknownElement.callCallback (node_modules/react-dom/cjs/react-dom.development.js:3945:14)

Test Suites: 1 failed, 1 total
Tests:       1 failed, 1 total
Snapshots:   0 total
Time:        3.566 s, estimated 4 s
Ran all test suites related to changed files.

Proposal: Reconsider .code / .name fields of BlockEncoder / BlockDecoder

I find the .code field and (to a lesser degree) the .name field on the following interfaces to be troublesome:

export interface BlockEncoder<Code extends number, T> {
  name: string
  code: Code
  encode(data: T): ByteView<T>
}

/**
 * IPLD decoder part of the codec.
 */
export interface BlockDecoder<Code extends number, T> {
  code: Code
  decode(bytes: ByteView<T>): T
}
The problem is that it prevents one from defining codec compositions without introducing a subtle footgun. For example, dag-ucan could in theory be a composition of the dag-cbor and raw codecs, meaning it could decode a block in either cbor or raw encoding and similarly encode a node in either cbor or raw representation (depending on UCAN-specific nuances).

This double representation is an implementation detail, currently hidden behind the new 0x78c0 multicodec code (multiformats/multicodec#264).

Given the arguments in that thread, I have considered dropping the new code and making the implementation a UCAN-specialized BlockCodec<0x71 | 0x55>. However, there are some interesting challenges:

  1. .code could be either 0x71 or 0x55; while the type checker would be happy with either option, it is misleading because it is common to use that code field when creating CIDs, e.g.:
    const bytes = codec.encode(value)
    const hash = await hasher.digest(bytes)
    const cid = CID.create(1, codec.code, hash)
  2. I think this is a symptom of a broader problem I've experienced in different contexts: the result of encode carries no information about the codec, which is probably why I find myself resorting to { code, bytes } whenever I want to defer async CID creation.
    • In retrospect it seems silly that we identified the need for this in MultihashDigest but not here:
      export interface MultihashDigest<Code extends number = number> {
        /**
         * Code of the multihash
         */
        code: Code
        /**
         * Raw digest (without a hashing algorithm info)
         */
        digest: Uint8Array
        /**
         * byte length of the `this.digest`
         */
        size: number
        /**
         * Binary representation of this multihash digest.
         */
        bytes: Uint8Array
      }

Unfortunately I see no way to address this in a backwards-compatible manner. Maybe we could introduce a MultiblockEncoder alongside BlockEncoder, similar to how we have MultibaseEncoder producing prefixed values and BaseEncoder producing unprefixed ones:

/**
 * Base encoder just encodes bytes into base encoded string.
 */
export interface BaseEncoder {
  /**
   * Base encodes to a **plain** (and not a multibase) string. Unlike
   * `encode` no multibase prefix is added.
   * @param bytes
   */
  baseEncode(bytes: Uint8Array): string
}

export interface MultibaseEncoder<Prefix extends string> {
  /**
   * Name of the encoding.
   */
  name: string
  /**
   * Prefix character for that base encoding.
   */
  prefix: Prefix
  /**
   * Encodes binary data into **multibase** string (which will have a
   * prefix added).
   */
  encode(bytes: Uint8Array): Multibase<Prefix>
}


Maybe this is an even broader issue of having multicodec codes in the address as opposed to in the data itself. E.g. if we tagged the encoded bytes themselves with the multihash, all the intermediate representations would naturally carry that information, although that ship has probably sailed long ago.
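
To make the MultiblockEncoder idea concrete, a hypothetical sketch where encode returns { code, bytes }, in the same spirit as MultihashDigest carrying its code. Neither the interface nor the adapter exists in multiformats today, and the import path for the interface types is an assumption.

import type { ByteView, BlockEncoder } from 'multiformats/codecs/interface'

export interface MultiblockEncoder<Code extends number, T> {
  name: string
  code: Code
  // unlike BlockEncoder.encode, the result carries the codec code alongside
  // the bytes, so async CID creation can be deferred without extra bookkeeping
  encode(data: T): { code: Code, bytes: ByteView<T> }
}

// hypothetical adapter wrapping an existing BlockEncoder
export const toMultiblockEncoder = <Code extends number, T>(
  codec: BlockEncoder<Code, T>
): MultiblockEncoder<Code, T> => ({
  name: codec.name,
  code: codec.code,
  encode: (data: T) => ({ code: codec.code, bytes: codec.encode(data) })
})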

Is the new CID(baseEncodedString) constructor from js-cid not supported?

Hi, I'm trying to use the new CID(baseEncodedString) constructor so that I can convert between formats, but it seems the only constructor exposed by this library is new CID(version, codec, multihash, [multibaseName]). Is that correct or am I missing something?

Happy to give more context, but my question isn't really about my specific use case, so I figured I wouldn't bog it down with details.
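
Not an authoritative answer, but as far as I can tell the string-based construction moved from a constructor to static methods; here is a small sketch of the mapping, with the old call shown only for comparison.

import { CID } from 'multiformats/cid'
import { base58btc } from 'multiformats/bases/base58'

// old js-cid style, for comparison only:
//   const cid = new CID(baseEncodedString)

// js-multiformats style: parsing a base-encoded string is CID.parse,
// with a decoder supplied for non-default bases
const fromString = (baseEncodedString) => CID.parse(baseEncodedString, base58btc.decoder)

// converting between string formats is then toString on the parsed CID
const toDefaultV1String = (baseEncodedString) => fromString(baseEncodedString).toV1().toString()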

Which of the multibase formats preserves lexicographic order?

I've come across a situation requiring the ability to base-encode but preserve the lexicographic-order of the input bytes.

That is, the base-encoding alphabet should sort in the same order as the byte values it maps to.

An example of such a library is https://github.com/deanlandolt/base64-lex, a base64 variant that preserves lexicographic order.

Which base encodings in multibase support preservation of lexicographic order?
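
I don't have an authoritative answer, but one way to narrow it down is to probe the bundled bases empirically. This is a rough check over fixed-length random inputs, not a proof, and string comparison here is plain UTF-16 code-unit order.

import { bases } from 'multiformats/basics'

const compareBytes = (a, b) => {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1
  }
  return Math.sign(a.length - b.length)
}

const randomBytes = (n) => Uint8Array.from({ length: n }, () => Math.floor(Math.random() * 256))

for (const base of Object.values(bases)) {
  let preservesOrder = true
  for (let i = 0; i < 1000 && preservesOrder; i++) {
    const [a, b] = [randomBytes(8), randomBytes(8)]
    const [sa, sb] = [base.encoder.encode(a), base.encoder.encode(b)]
    const stringOrder = sa < sb ? -1 : sa > sb ? 1 : 0
    if (compareBytes(a, b) !== stringOrder) preservesOrder = false
  }
  console.log(base.name, preservesOrder ? 'appears to preserve byte order' : 'does not preserve byte order')
}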

How do I actually create an IPFS URL from bytes?

I have a byte contenthash string: "0xe30101550008efbbbf6361756e74"

A working tool I have found turns this into: "ipfs://bafkqachpxo7wgylvnz2a" which works in IPFS gateways

A different JS multiformat library called "content-hash" turns it into "ipfs://17bet1zbvSNbM", which gives me "invalid ipfs path: invalid path "/ipfs/17bet1zbvSNbM": invalid CID: selected encoding not supported" in every IPFS gateway

I have tried various combinations from the front page to try to convert the byte string I have into the working IPFS URL, without success, e.g. CID.create(0, 0xe3, val).

My question is: how do I get from contenthash bytes to either an IPFS (0xe3) or IPNS (0xe5) multihash, for example from the bytes above to the working ipfs:// URL I included? I am sorry for making an issue out of this, but I have run out of things to try.
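
Not an authoritative answer, but here is the approach I would try: read the leading 0xe3 0x01 as the varint-encoded ipfs-ns code and decode the remainder as a CID. Treat both that reading and the printed output as unverified.

import { CID } from 'multiformats/cid'

// the contenthash from the question, as bytes
const contenthash = '0xe30101550008efbbbf6361756e74'
const bytes = Uint8Array.from(contenthash.slice(2).match(/../g).map((byte) => parseInt(byte, 16)))

// skip the two varint bytes for the ipfs-ns code (0xe3 encodes as e3 01),
// then decode what is left as a CID
const cid = CID.decode(bytes.subarray(2))
console.log(`ipfs://${cid.toString()}`)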

Identity encoding decoding doesn't produce the same data

I'm not sure if identity encoding is meant to be used like this, but I noticed that after decoding, you don't get the same data:

import { bases } from 'multiformats/basics';

const codec = bases['identity'];

const u = new Uint8Array([
    6, 22, 184, 240, 237, 178,
  112,  0, 150, 137, 182,  54,
  220,  1, 217, 221
]);

const s = codec.encode(u);

const u_ = codec.decode(s);

console.log(u_);

/*
Uint8Array(36) [
    6,  22, 239, 191, 189, 239, 191, 189,
  239, 191, 189, 239, 191, 189, 112,   0,
  239, 191, 189, 239, 191, 189, 239, 191,
  189,  54, 239, 191, 189,   1, 239, 191,
  189, 239, 191, 189
]
*/
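
For what it's worth, the recurring 239, 191, 189 triplets in that output are the UTF-8 bytes of U+FFFD (the replacement character), which suggests the bytes above 127 are being round-tripped through a lossy text encode/decode step somewhere:

console.log(Array.from(new TextEncoder().encode('\uFFFD')))
// [ 239, 191, 189 ]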
