hyperdrive's Introduction

Hyperdrive

See API docs at docs.holepunch.to

Hyperdrive is a secure, real-time distributed file system

Install

npm install hyperdrive

Usage

const Hyperdrive = require('hyperdrive')
const Corestore = require('corestore')

const store = new Corestore('./storage')
const drive = new Hyperdrive(store)

await drive.put('/blob.txt', Buffer.from('example'))
await drive.put('/images/logo.png', Buffer.from('..'))
await drive.put('/images/old-logo.png', Buffer.from('..'))

const buffer = await drive.get('/blob.txt')
console.log(buffer) // => <Buffer ..> "example"

const entry = await drive.entry('/blob.txt')
console.log(entry) // => { seq, key, value: { executable, linkname, blob, metadata } }

await drive.del('/images/old-logo.png')

await drive.symlink('/images/logo.shortcut', '/images/logo.png')

for await (const file of drive.list('/images')) {
  console.log('list', file) // => { key, value }
}

const rs = drive.createReadStream('/blob.txt')
for await (const chunk of rs) {
  console.log('rs', chunk) // => <Buffer ..>
}

const ws = drive.createWriteStream('/blob.txt')
ws.write('new example')
ws.end()
ws.once('close', () => console.log('file saved'))

API

const drive = new Hyperdrive(store, [key])

Creates a new Hyperdrive instance. store must be an instance of Corestore.

By default it uses the core at { name: 'db' } from store, unless you pass the public key of an existing drive.
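
For example, to open an existing drive by its public key (a sketch; key here is assumed to be a 32-byte Buffer obtained elsewhere, e.g. drive.key from the machine that created the drive):

const Corestore = require('corestore')
const Hyperdrive = require('hyperdrive')

const store = new Corestore('./reader-storage')

// `key` is assumed to be the public key of a drive created somewhere else
const drive = new Hyperdrive(store, key)

await drive.ready()
console.log(drive.writable) // => false, since we don't hold the keypair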

await drive.ready()

Waits until internal state is loaded.

Use it once before reading synchronous properties like drive.discoveryKey, unless you have already awaited any of the other API methods.

await drive.close()

Fully closes this drive, including its underlying Hypercore-backed data structures.

drive.corestore

The Corestore instance used as storage.

drive.db

The underlying Hyperbee backing the drive file structure.

drive.core

The Hypercore used for drive.db.

drive.id

String containing the id (z-base-32 of the public key) identifying this drive.

drive.key

The public key of the Hypercore backing the drive.

drive.discoveryKey

The hash of the public key of the Hypercore backing the drive.

Can be used as a topic to seed the drive using Hyperswarm.

drive.contentKey

The public key of the Hyperblobs instance holding blobs associated with entries in the drive.

drive.writable

Boolean indicating if we can write or delete data in this drive.

drive.readable

Boolean indicating if we can read from this drive. After closing the drive this will be false.

drive.version

Number that indicates how many modifications were made, useful as a version identifier.

drive.supportsMetadata

Boolean indicating whether the drive handles metadata. Currently always true.

await drive.put(path, buffer, [options])

Creates a file at path in the drive. options are the same as in createWriteStream.
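
For example, a sketch of storing an executable file together with metadata (both options are described under createWriteStream below; the metadata value here is just an illustrative JSON object):

await drive.put('/bin/run.sh', Buffer.from('#!/bin/sh\necho hi\n'), {
  executable: true,
  metadata: { mime: 'text/x-shellscript' } // arbitrary JSON value
})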

const buffer = await drive.get(path, [options])

Returns the blob at path in the drive. If no blob exists, returns null.

It also returns null for symbolic links.

options include:

{
  wait: true, // Wait for block to be downloaded
  timeout: 0 // Wait at most this many milliseconds (0 means no timeout)
}
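
A sketch of reading without blocking on the network, assuming the drive from the Usage section (with wait: false the call is expected to resolve from local storage only; with a timeout it should fail if peers can't supply the blob in time):

// Resolve from local storage only, don't ask peers
const local = await drive.get('/blob.txt', { wait: false })

// Ask peers, but give up after 5 seconds
const remote = await drive.get('/blob.txt', { timeout: 5000 })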

const entry = await drive.entry(path, [options])

Returns the entry at path in the drive. It looks like this:

{
  seq: Number,
  key: String,
  value: {
    executable: Boolean, // Whether the blob at path is an executable
    linkname: null, // null if the entry is not a symlink, otherwise the path of the entry it links to
    blob: { // Hyperblobs id that can be used to fetch the blob associated with this entry
      blockOffset: Number,
      blockLength: Number,
      byteOffset: Number,
      byteLength: Number
    },
    metadata: null
  }
}

options include:

{
  follow: false, // Follow symlinks (up to 16 levels, otherwise an error is thrown)
  wait: true, // Wait for block to be downloaded
  timeout: 0 // Wait at most this many milliseconds (0 means no timeout)
}
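
For example, resolving the symlink created in the Usage section (a sketch; with follow: true the returned entry is expected to describe the link target instead of the link itself):

const link = await drive.entry('/images/logo.shortcut')
console.log(link.value.linkname) // => '/images/logo.png'

const target = await drive.entry('/images/logo.shortcut', { follow: true })
console.log(target.value.blob) // => blob id of /images/logo.png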

const exists = await drive.exists(path)

Returns true if an entry exists at path, otherwise false.

await drive.del(path)

Deletes the file at path from the drive.

const comparison = drive.compare(entryA, entryB)

Returns 0 if entries are the same, 1 if entryA is older, and -1 if entryB is older.

const cleared = await drive.clear(path, [options])

Deletes the blob from storage to free up space, but the file structure reference is kept.

options include:

{
  diff: false // Returned `cleared` bytes object is null unless you enable this
}
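
A sketch assuming the drive from the Usage section (the exact shape of the cleared object depends on the underlying Hypercore clear, so it is only logged here):

const cleared = await drive.clear('/images/logo.png', { diff: true })
console.log(cleared) // bytes freed; would be null without diff: true

// The entry itself is still listed, only its blob is gone from local storage
console.log(await drive.entry('/images/logo.png')) // => { seq, key, value }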

const cleared = await drive.clearAll([options])

Deletes all the blobs from storage to free up space, similar to how drive.clear() works.

options include:

{
  diff: false // Returned `cleared` bytes object is null unless you enable this
}

await drive.purge()

Purges both cores (db and blobs) from storage, completely removing all of the drive's data.

await drive.symlink(path, linkname)

Creates an entry in drive at path that points to the entry at linkname.

If a blob entry already exists at path it will be overwritten: drive.get(path) will return null, while drive.entry(path) will return the entry with symlink information.

const batch = drive.batch()

Useful for atomically mutating the drive; the batch has the same interface as Hyperdrive.

await batch.flush()

Commit a batch of mutations to the underlying drive.
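
A minimal sketch of batching a few writes and committing them together, assuming the drive from the Usage section:

const batch = drive.batch()

await batch.put('/docs/a.txt', Buffer.from('a'))
await batch.put('/docs/b.txt', Buffer.from('b'))

await batch.flush() // commits both entries to the drive at once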

const stream = drive.list(folder, [options])

Returns a stream of all entries in the drive at paths prefixed with folder.

options include:

{
  recursive: true | false // Whether to descend into all subfolders or not
}
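
For example, listing only the direct contents of a folder (a sketch assuming the drive from the Usage section):

for await (const file of drive.list('/images', { recursive: false })) {
  console.log(file.key) // => '/images/logo.png', '/images/logo.shortcut'
}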

const stream = drive.readdir(folder)

Returns a stream of all subpaths of entries in drive stored at paths prefixed by folder.
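
A sketch, assuming the drive from the Usage section; readdir yields names relative to the folder rather than full entries:

for await (const name of drive.readdir('/images')) {
  console.log(name) // => 'logo.png', 'logo.shortcut'
}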

const stream = await drive.entries([range], [options])

Returns a read stream of entries in the drive.

options are the same as Hyperbee().createReadStream([range], [options]).
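
For example, bounding the stream with Hyperbee-style range options (a sketch; gte/lt come from Hyperbee, not from Hyperdrive itself):

// '0' sorts right after '/', so this covers every key under '/images/'
for await (const entry of drive.entries({ gte: '/images/', lt: '/images0' })) {
  console.log(entry.key)
}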

const mirror = drive.mirror(out, [options])

Efficiently mirror this drive into another. Returns a MirrorDrive instance constructed with options.

Call await mirror.done() to wait for the mirroring to finish.
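
A sketch of mirroring into a second drive; the destination here is a hypothetical Hyperdrive on a namespaced Corestore, and the count property is assumed to report what was copied:

const dst = new Hyperdrive(store.namespace('backup'))

const mirror = drive.mirror(dst)
await mirror.done()

console.log(mirror.count) // e.g. { files, add, remove, change }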

const watcher = drive.watch([folder])

Returns an async iterator that watches folder (/ by default) and yields changes.

Usage example:

for await (const [current, previous] of watcher) {
  console.log(current.version)
  console.log(previous.version)
}

The current and previous values are snapshots that are auto-closed before the next value is yielded.

Don't close these snapshots yourself; they're used internally, so let them be auto-closed.

await watcher.ready()

Waits until the watcher is loaded and detecting changes.

await watcher.destroy()

Stops the watcher. You can also stop it by using break inside the loop.

const rs = drive.createReadStream(path, [options])

Returns a stream to read out the blob stored in the drive at path.

options include:

{
  start: Number, // `start` and `end` are inclusive
  end: Number,
  length: Number, // `length` overrides `end`, they're not meant to be used together
  wait: true, // Wait for blocks to be downloaded
  timeout: 0 // Wait at most this many milliseconds (0 means no timeout)
}
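
For example, reading a byte range out of the blob from the Usage section (a sketch):

// Read 4 bytes starting at byte 2 of '/blob.txt'
const rs = drive.createReadStream('/blob.txt', { start: 2, length: 4 })
for await (const chunk of rs) {
  console.log(chunk.toString()) // => 'ampl' if the blob still contains 'example'
}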

const ws = drive.createWriteStream(path, [options])

Streams a blob into the drive at path.

options include:

{
  executable: Boolean,
  metadata: null // Extended file information, i.e. an arbitrary JSON value
}

await drive.download(folder, [options])

Downloads the blobs corresponding to all entries in the drive at paths prefixed with folder.

options are the same as those for drive.list(folder, [options]).
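
For example, a sketch assuming a drive that is being replicated from peers:

// Make every blob under /images available locally
await drive.download('/images')

// Or only the blobs at the top level of the folder
await drive.download('/images', { recursive: false })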

const snapshot = drive.checkout(version)

Get a read-only snapshot of a previous version.
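
A sketch assuming the drive from the Usage section; the snapshot behaves like a read-only drive and can be closed independently:

const version = drive.version

await drive.put('/blob.txt', Buffer.from('updated'))

const old = drive.checkout(version)
console.log((await old.get('/blob.txt')).toString()) // => the contents before the update

await old.close()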

const stream = drive.diff(version, folder, [options])

Efficiently create a stream of the shallow changes to folder between version and drive.version.

Each entry is sorted by key and looks like this:

{
  left: Object, // Entry in folder at drive.version for some path
  right: Object, // Entry in folder at drive.checkout(version) for some path
}

If an entry exists in drive.version of the folder but not in version, then left is set and right will be null, and vice versa.
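
For example, printing what changed under /images since an older version (a sketch assuming the drive from the Usage section):

const before = drive.version

await drive.put('/images/banner.png', Buffer.from('..'))

for await (const { left, right } of drive.diff(before, '/images')) {
  if (left && !right) console.log('added', left.key)
  else if (!left && right) console.log('removed', right.key)
  else console.log('changed', left.key)
}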

await drive.downloadDiff(version, folder, [options])

Downloads all the blobs in folder corresponding to entries in drive.checkout(version) that are not in drive.version.

In other words, downloads all the blobs added to folder up to version of the drive.

await drive.downloadRange(dbRanges, blobRanges)

Downloads the entries and blobs stored in the ranges dbRanges and blobRanges.

const done = drive.findingPeers()

Indicates to Hyperdrive that you're finding peers in the background; requests will be put on hold until done() is called.

Call done() when your current discovery iteration is done, i.e. after swarm.flush() finishes.

const stream = drive.replicate(isInitiatorOrStream)

Usage example:

const swarm = new Hyperswarm()
const done = drive.findingPeers()
swarm.on('connection', (socket) => drive.replicate(socket))
swarm.join(drive.discoveryKey)
swarm.flush().then(done, done)

See more about how replicate works at corestore.replicate.

const updated = await drive.update([options])

Waits for an initial proof of a new drive version, holding until all findingPeers() calls are done.

options include:

{
  wait: false
}

Use drive.findingPeers() or { wait: true } to make await drive.update() blocking.
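
For example, a sketch that builds on the replication example above:

const updated = await drive.update({ wait: true })

if (updated) console.log('drive advanced to version', drive.version)
else console.log('no newer version found')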

const blobs = await drive.getBlobs()

Returns the Hyperblobs instance storing the blobs indexed by drive entries.

await drive.put('/file.txt', Buffer.from('hi'))

const buffer1 = await drive.get('/file.txt')

const blobs = await drive.getBlobs()
const entry = await drive.entry('/file.txt')
const buffer2 = await blobs.get(entry.value.blob)

// => buffer1 and buffer2 are equal

License

Apache-2.0

hyperdrive's Issues

Error Trying to dat share w/ Dat v1.0

I was trying to dat share a large folder and got an error after a few hundred (?) files. Karissa said the issue may be hyperdrive:

/Users/joe/node_modules/dat/node_modules/hyperdrive/node_modules/rabin/index.js:18
  this.rabin = rabin.initialize(avgBits, min, max)
                     ^
Error: the value of instance_counter is too damn high
    at Error (native)
    at new Rabin (/Users/joe/node_modules/dat/node_modules/hyperdrive/node_modules/rabin/index.js:18:22)
    at Rabin (/Users/joe/node_modules/dat/node_modules/hyperdrive/node_modules/rabin/index.js:10:40)
    at writeStream (/Users/joe/node_modules/dat/node_modules/hyperdrive/lib/write-stream.js:22:35)
    at Archive.entry (/Users/joe/node_modules/dat/node_modules/hyperdrive/lib/pack.js:46:17)
    at DestroyableTransform._transform (/Users/joe/node_modules/dat/index.js:47:22)
    at DestroyableTransform.Transform._read (/Users/joe/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:172:10)
    at DestroyableTransform.Transform._write (/Users/joe/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:160:12)
    at doWrite (/Users/joe/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:333:12)
    at writeOrBuffer (/Users/joe/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:319:5)

Describe protocol in implementation-agnostic way

Hi,

As discussed at dotJS, I'd really like to see a protocol specification for HyperDrive. Bonus points for it containing a comparison to well-known protocols in the same space (BitTorrent?), key assumptions behind it, and novel and old use cases where the new protocol would excel (mobile networks?).

Cheers, Markus

Useful events

  • When an archive is finished downloading
  • When an update arrives
  • When an update is finished downloading (can be the same event as when the archive is finished downloading)

missing docs

  • archive.lookup
  • var dl = archive.download (the dl progress events)

'Offset is out of bounds' crash bug in dat cli

While running dat . --live, I had beaker loading the page (locally) and I got this:

Beaker's using hyperdrive 6.5.1, and dat is 11.0.2.

Error dump:

Share Link 6ece0d8319611a8a834e56a979f484898aaeda172ad5905de929dbceee19d34e
The Share Link is secret and only those you share it with will be able to get the files
Sharing /Users/paulfrazee/tmp/mytestdat, connected to 1/3 sources
Uploading 185.22 kB/s, 926.09 kB Total
fs.js:681
  binding.read(fd, buffer, offset, length, position, req);
          ^

Error: Offset is out of bounds
    at Error (native)
    at Object.fs.read (fs.js:681:11)
    at onread (/Users/paulfrazee/npm/lib/node_modules/dat/node_modules/random-access-file/index.js:84:8)
    at FSReqWrap.wrapper [as oncomplete] (fs.js:675:17)

nasty error message

calling this https://github.com/maxogden/dat/blob/c29fe52378fcdd025c6b664f6fc371b0592e2a6b/index.js#L128

on an entry that has no data (in this case it is a folder that was added as a type: file) causes this error stack:

buffer.js:572
        return this.utf8Write(string, offset, length);
                    ^

TypeError: Argument must be a string
    at TypeError (native)
    at Buffer.write (buffer.js:572:21)
    at Object.exports.bytes.encode (/Users/max/src/js/dat/node_modules/hyperdrive/node_modules/protocol-buffers/encodings.js:35:17)
    at Object.encode (eval at <anonymous> (/Users/max/src/js/dat/node_modules/hyperdrive/node_modules/protocol-buffers/node_modules/generate-function/index.js:55:21), <anonymous>:13:10)
    at Protocol._push (/Users/max/src/js/dat/node_modules/hyperdrive/lib/protocol.js:290:7)
    at Protocol.join (/Users/max/src/js/dat/node_modules/hyperdrive/lib/protocol.js:302:8)
    at Swarm.join (/Users/max/src/js/dat/node_modules/hyperdrive/lib/swarm.js:135:19)
    at new Feed (/Users/max/src/js/dat/node_modules/hyperdrive/lib/feed.js:24:37)
    at Feed (/Users/max/src/js/dat/node_modules/hyperdrive/lib/feed.js:11:39)
    at Hyperdrive.get (/Users/max/src/js/dat/node_modules/hyperdrive/index.js:42:10)

Would be nice to have a more user-friendly error here.

max's list of hyperdrive feature requests

  • download a single file by filename randomly
  • upload progress (for showing total amount downloaded, total amount uploaded when sharing)
  • allow string or buffer as the hex link
  • use catfile for storage/decouple filesystem
  • fix performance bottleneck on large dats
  • switch from index-at-end to index-on-chunk
  • add digest block to end that has total size + file count
  • don't expose hash to swarm (only share hash of hash etc.)
  • remove the drive.add stats object

list() ranges

Iterating over every file in a hyperdrive could become slow for archives with very many files.

Adding leveldb-style iterator ranges would speed things up for some use-cases, such as listing all the files in a directory:

archive.list({ gt: 'dir1/', lt: 'dir1/\uffff' })

setting type: 'directory' doesn't override mode

I think this is the code: https://github.com/mafintosh/hyperdrive/blob/0926c869ed62a22fbe5e11f67cd4bc0ed9356346/lib/pack.js#L42

If I write this entry on one side:

{ name: 'folder',
  mode: 16877,
  uid: 501,
  gid: 20,
  mtime: 1452023306000,
  ctime: 1452023306000,
  type: 'directory' }

I get this on the other side:

{ type: 'file',
  value: 
   { name: 'folder',
     mode: 16877,
     size: 0,
     uid: 501,
     gid: 20,
     mtime: 1452023306000,
     ctime: 1452023306000 },
  link: null }

It's a folder on disk; I'm not sure why it's being treated as a file here.

API for flattened filesystem

archive.list and archive.lookup currently work on whole metadata records. The archive API should maintain a (block head) -> (flattened tree) mapping and provide an API to work on it.

mkdirp EEXIST error with long path

I added a node_modules folder to hyperdrive, but now I get this error (in Electron):

Error: EEXIST: file already exists, mkdir '/Users/max/Desktop/electron/node_modules/lightning-image-poly/node_modules/brfs/node_modules/static-module/node_modules/static-eval/node_modules/escodegen/node_modules/.bin/esvalidate'
    at Error (native)

which came from this line: https://github.com/mafintosh/hyperdrive/blob/73425f4665045a3092be37adbbc4ab5eef557e90/index.js#L231

I checked it out and other paths < 184 chars long worked. Not sure if this path being 184 chars long is what caused it.

hyperdrive 6.0 exception

Since the latest hyperdrive update (6.0) I get this exception:

Uncaught TypeError: peer.emit is not a function
    at Channel.onend (node_modules/hyperdrive/node_modules/hypercore/lib/replicate.js:203:8)
    at Protocol._close (node_modules/hyperdrive/node_modules/hypercore/node_modules/hypercore-protocol/index.js:492:13)
    at Protocol.onfinalize (node_modules/hyperdrive/node_modules/hypercore/node_modules/hypercore-protocol/index.js:193:14)
    at Protocol.Duplexify._destroy (node_modules/hyperdrive/node_modules/hypercore/node_modules/hypercore-protocol/node_modules/duplexify/index.js:191:8)
    at node_modules/hyperdrive/node_modules/hypercore/node_modules/hypercore-protocol/node_modules/duplexify/index.js:174:10

Hyperdrive version: 6.0
Hypercore version: 4.1.0
Hypercore protocol: 4.3.0

Regards,

Connecting node hyperdrive with browser based WebRTC hyperdrive

Would it be possible to have the node version talk to the browser version of the script?

Currently if I try to connect the second example in the README.md (example code below) to the browser based version (eg. http://mafintosh.github.io/hyperdrive/) the script crashes with the following error:

events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: connect ECONNREFUSED [browsers ip address]:[random port number]
    at Object.exports._errnoException (util.js:856:11)
    at exports._exceptionWithHostPort (util.js:879:20)
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1063:14)

Example code

var disc = require('discovery-channel')()
var hyperdrive = require('hyperdrive')
var net = require('net');
var levelup = require('levelup')
var aLevelDB = levelup('./mydb')
var drive = hyperdrive(aLevelDB)

var link = new Buffer({your-hyperdrive-link-from-the-above-example}, 'hex')

var server = net.createServer(function (socket) {
  socket.pipe(drive.createPeerStream()).pipe(socket)
})

server.listen(0, function () {
  function ann () {
    // discovery-channel currently only works with 20 bytes hashes
    disc.announce(link.slice(0, 20), server.address().port)
  }

  ann()
  setInterval(ann, 10000)

  var lookup = disc.lookup(link.slice(0, 20))

  lookup.on('peer', function (ip, port) {
    var socket = net.connect(port, ip)
    socket.pipe(drive.createPeerStream()).pipe(socket)
  })
})

Hyperdrive Name

Add hyperdrive name so we can create a download folder with that name.

 drive.createArchive(key, {
      name: 'my-data-folder'
 })

downloading twice never calls download callback

Create an archive and download the contents successfully, then exit the process. Then, run it again. You'll notice that the callback is never fired to finish the download. Example:

$ dat dat://.....
Download complete, etc.
^C
$ dat dat://
Connected to 1/1 peers

'download complete' never fires because the callback isn't called.

possible mem leak, investigate

> [email protected] start /home/maf/versions/dat.haus-1466086479375
> taco-nginx --domain dat.haus node index.js

Server is listening on port 41616
(node) warning: possible EventEmitter memory leak detected. 11 download listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Feed.addListener (events.js:239:17)
    at onindex (/home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:432:18)
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:421:7
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/node_modules/hypercore/node_modules/subleveldown/node_modules/levelup/lib/levelup.js:230:7
(node) warning: possible EventEmitter memory leak detected. 11 upload listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Feed.addListener (events.js:239:17)
    at onindex (/home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:436:18)
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:421:7
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/node_modules/hypercore/node_modules/subleveldown/node_modules/levelup/lib/levelup.js:230:7
(node) warning: possible EventEmitter memory leak detected. 11 download listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Feed.addListener (events.js:239:17)
    at onindex (/home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:432:18)
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:421:7
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/node_modules/hypercore/node_modules/subleveldown/node_modules/levelup/lib/levelup.js:230:7
(node) warning: possible EventEmitter memory leak detected. 11 upload listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Feed.addListener (events.js:239:17)
    at onindex (/home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:436:18)
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:421:7
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/node_modules/hypercore/node_modules/subleveldown/node_modules/levelup/lib/levelup.js:230:7
(node) warning: possible EventEmitter memory leak detected. 11 download listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Feed.addListener (events.js:239:17)
    at onindex (/home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:432:18)
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:421:7
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/node_modules/hypercore/node_modules/subleveldown/node_modules/levelup/lib/levelup.js:230:7
(node) warning: possible EventEmitter memory leak detected. 11 upload listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Feed.addListener (events.js:239:17)
    at onindex (/home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:436:18)
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:421:7
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/node_modules/hypercore/node_modules/subleveldown/node_modules/levelup/lib/levelup.js:230:7

Empty dir not created

#70 added a test case for that (but the .download() call was wrong and corrected by #82). We'll need the storage api to support mkdir: in addition to file: for this to work.

Appending existing file overwrites file & changes mtime

Adding a file to archive with createFileWriteStream overwrites the existing file and changes the mtime, when using hyperdrive + random-access-file.

  • This means we can't use mtime to check if files are updated because hyperdrive changes mtime (our own little observer effect).
  • Overwriting existing files seems like unwanted behavior.

This may be a storage/raf issue.

Related issue in hyperdrive-import-files

How about encrypted connection and peer authentication?

I tried to read as much as I could, but I saw nowhere anything related to peer-to-peer connections being authenticated and encrypted. It looks like as now hyperdrive has no security at all. Did I miss something or it is not in the scope of hyperdrive?

If somebody wants to have authentication and encryption of the connections, which npm module and public/private key system would you recommend?

NAT traversal botheration in example code

I'm not sure if this is that important, but I'm finding it hard to get the README example running because it requires NAT traversal. It works fine if I modify it to just connect the two hyperdrives directly to each other.

A couple ideas that could solve this:

  1. Connect the two hyperdrives together without using discovery-channel, e.g. by printing out the port to connect to.
  2. Use webrtc-swarm, which should take advantage of ICE and should always work to let a computer connect to itself, because there will be local ICE candidates.

[Spec] 16-bit Index block entries too restrictive?

Hi, great work from you and the Dat team on this and all of the other component packages. This is all very promising and already super useful! I've been reviewing everything top-to-bottom, and I've run into one design decision that seems a bit too constraining, even in the near-term, and so in the interest of "future-proofing" the spec a bit, I'd like to initiate a discussion.

The decision is the specific choice of exclusively using 16-bit values to hold the Rabin chunked block sizes in the "Index blocks" to provide seekability within the blocks making up a file's feed. This seems to place a rather artificial 64KB upper limit on the maximum size of any chunk, which for TB-scale files would require they be split into 10s of millions of chunks in the process. The block index alone for such a file would weigh-in at > 20MB. The 16-bit values are probably fine for files up to about 1 GB, but we already routinely handle individual files of 100 GB, with larger ones coming all of the time, and so I'd very interested in increasing the random access granularity and deduplication scale in exchange for somewhat lower overhead and better performance.

I've also been benchmarking streaming GB-scale files into replicated NoSQL databases (MongoDB and RethinkDB) using your Rabin chunking and content-addressable indexes, and I'm finding that 20KB mean length chunks are ~2-4x less efficient to store and retrieve (in terms of throughput) when compared to 200KB chunks. From 200KB to ~1MB, the performance benefit of using larger chunks flattens out, but up to that point, the improvement from using larger chunks is substantial.

Are you open to discussing the possibility of either increasing the block index to use 32-bit values, or alternatively, making the choice of 16/32-bit values an option on a per-file/feed basis?

Thanks! -Vaughn

Plans to continuously sync files or folders?

I'm wondering if there are any plans to add the ability to continuously sync a local folder such that all updates that I make are automatically transferred to anyone who wants the latest copy. I can imagine that a signed append-only log like secure-scuttlebutt would provide a way to do this, but maybe you guys already have something in the works.

Doing this with a multi-writer setup would be really difficult, because you would then have to deal with conflict resolution.

Error: entry.content is required for byte cursors

I got this cryptic error from hyperdrive. It happened when I tried to use createReadFileStream on an archive loaded from dat.land.

Error: entry.content is required for byte cursors
    at /Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/cursor.js:128:37
    at Archive.get (/Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/archive.js:108:55)
    at Cursor._open (/Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/cursor.js:126:16)
    at open (/Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/cursor.js:29:10)
    at run (/Users/paulfrazee/my/beaker-browser/app/node_modules/thunky/index.js:13:3)
    at Cursor.open (/Users/paulfrazee/my/beaker-browser/app/node_modules/thunky/index.js:27:3)
    at Cursor._openAndNext (/Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/cursor.js:163:8)
    at Cursor.next (/Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/cursor.js:49:33)
    at read (/Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/archive.js:296:9)
    at /Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/archive.js:305:7

Here's what the entries listing looks like:

"[
  {
    "blocks": 2,
    "content": null,
    "ctime": 1468883103188,
    "gid": 0,
    "length": 30266,
    "linkname": "",
    "mode": 0,
    "mtime": 1468883103188,
    "name": "figure-1.1.png",
    "type": "file",
    "uid": 0,
    "path": "/figure-1.1.png"
  },
  {
    "blocks": 2,
    "content": null,
    "ctime": 1468883103262,
    "gid": 0,
    "length": 22364,
    "linkname": "",
    "mode": 0,
    "mtime": 1468883103262,
    "name": "figure-1.1a.png",
    "type": "file",
    "uid": 0,
    "path": "/figure-1.1a.png"
  },
  {
    "blocks": 3,
    "content": null,
    "ctime": 1468883103329,
    "gid": 0,
    "length": 37023,
    "linkname": "",
    "mode": 0,
    "mtime": 1468883103329,
    "name": "figure-1.2.png",
    "type": "file",
    "uid": 0,
    "path": "/figure-1.2.png"
  },
  {
    "blocks": 4,
    "content": null,
    "ctime": 1468883103385,
    "gid": 0,
    "length": 53469,
    "linkname": "",
    "mode": 0,
    "mtime": 1468883103385,
    "name": "figure-1.3.png",
    "type": "file",
    "uid": 0,
    "path": "/figure-1.3.png"
  }
]"

Hyperdrive crashes trying to get empty files

This means that a dat clone operation will fail if the directory to be cloned includes any empty files. They can be written, but not read. Probably this is worth fixing, since empty files can be useful in some situations.

I added a failing test that illustrates this problem.

archive.list() doesnt callback

I'm getting some inconsistent behaviors from list(). Maybe my usage is wrong.

Here's what I'm doing:

import hyperdrive from 'hyperdrive'
import memdb from 'memdb'
import swarm from 'hyperdrive-archive-swarm'

var drive = hyperdrive(memdb())

function fetch (archiveKey, path) {
  // assume `archiveKey` is a valid key

  // start searching the network
  log('[DAT] Swarming archive', archiveKey)
  var archive = drive.createArchive(archiveKey)
  var sw = swarm(archive)

  // wait for a peer
  sw.once('peer', function (peer) {
    log('[DAT] Swarm peer:', peer, '('+archiveKey+')')

    // list archive contents
    log('[DAT] attempting to list archive')
    archive.list((err, entries) => {
      log('[DAT] list() results', err, entries)
    })
  })
}

What I'm seeing is that the 'list() results' log isn't ever happening. (I had it work a couple times, but it hasn't since.)

Any ideas? [email protected], [email protected]

What files does `list` list for a non-live archive?

Does the list stream end after it's listed all files in the archive? Or does it end after it's listed all files in the local copy? I think it does the second thing, but I was expecting it to do the first one. What do you guys think of updating the behavior of list in this case? If not, a clarification in the README would be cool.

Error: Invalid enum value: 39725

After the second file I store in an archive, I get this error:

Error: Invalid enum value: 39725
    at Object.decode (eval at <anonymous> (/home/substack/projects/hyperdrive/node_modules/generate-function/index.js:55:21), <anonymous>:5:50)
    at Object.decode (eval at <anonymous> (/home/substack/projects/hyperdrive/node_modules/generate-function/index.js:55:21), <anonymous>:31:27)
    at /home/substack/projects/hyperdrive/index.js:391:36
    at /home/substack/projects/hyperdrive/node_modules/levelup/lib/levelup.js:230:7

Here's the hex dump of the value that messages.Index.decode(buf) fails on:

080012213132373638383131303937373532363039362f4453435f3130323933322e6a706718adb6022005

crash on feed access

When we asked for feed.get for an index that didn't exist:

undefined:5
  if (!end) end = buf.length
                     ^

TypeError: Cannot read property 'length' of null
    at Object.decode (eval at <anonymous> (/Users/max/src/js/hyperdrive/node_modules/protocol-buffers/node_modules/generate-function/index.js:55:21), <anonymous>:5:22)
    at /Users/max/src/js/hyperdrive/lib/feed.js:61:32
    at /Users/max/src/js/hyperdrive/lib/feed.js:42:71
    at apply (/Users/max/src/js/hyperdrive/node_modules/thunky/index.js:16:28)
    at Feed.open (/Users/max/src/js/hyperdrive/node_modules/thunky/index.js:27:3)
    at Feed.get (/Users/max/src/js/hyperdrive/lib/feed.js:41:8)
    at downloadNext (/Users/max/src/js/dat/index.js:106:10)
    at /Users/max/src/js/dat/index.js:127:9
    at done (/Users/max/src/js/run-series/index.js:11:7)
    at Object.cb (/Users/max/src/js/dat/index.js:117:15)

Confusing behavior when a relative path is passed into appendFile

In the code below, the callback is never called. I don't think that hyperdrive necessarily needs to support relative paths, but either way the callback should definitely be called.

var memdb = require('memdb')
var os = require('os')
var hyperdrive = require('hyperdrive')

var drive = hyperdrive(memdb())
var tmpDir = os.tmpdir()
var archive = drive.add(tmpDir)
archive.appendFile('./README.md', console.log)

I added a test that shows this behavior, here.

Progress Stats for 6

Total size stats (once finalized):

  • Total archive size
  • Total archive file count

Progress Stats for Individual File Appends/Downloads:

  • Bytes downloaded/uploaded
  • bytes added (?)
  • Bytes total

Returning an object from append/download, like we did before, will work.

Along with these events, I think that's all we will need for the Dat CLI.

Empty files aren't downloaded

Empty files get added to metadata but never get transferred or downloaded.

Reproduce:

  • Run dat link in dat tests fixtures folder (with empty.txt)
  • Download the dat link to another folder.
  • folder and empty.txt are missing but metadata gets read (they appear in createEntryStream)

crash on empty file

/Users/max/src/js/hyperdrive/lib/pack.js:58
    for (var i = 0; i < content.index.length; i++) file.size += content.index[i]
                                     ^

TypeError: Cannot read property 'length' of null
    at onfinish (/Users/max/src/js/hyperdrive/lib/pack.js:58:38)
    at emitNone (events.js:72:20)
    at emit (events.js:166:7)
    at finishMaybe (/Users/max/src/js/hyperdrive/node_modules/duplexify/node_modules/readable-stream/lib/_stream_writable.js:509:14)
    at afterWrite (/Users/max/src/js/hyperdrive/node_modules/duplexify/node_modules/readable-stream/lib/_stream_writable.js:388:3)
    at onwrite (/Users/max/src/js/hyperdrive/node_modules/duplexify/node_modules/readable-stream/lib/_stream_writable.js:378:7)
    at WritableState.onwrite (/Users/max/src/js/hyperdrive/node_modules/duplexify/node_modules/readable-stream/lib/_stream_writable.js:123:5)
    at g (events.js:260:16)
    at emitNone (events.js:67:13)
    at emit (events.js:166:7)

deleting this file fixed the crash:

[screenshot: screen shot 2015-12-04 at 1 02 55 pm]

Document archive.live

(According to jhand) you can see if an archive is live with the .live attribute. This isn't in the docs, atm.

An easy way to check if the entire archive is present?

I'd like to check whether all files and metadata are present in a non-live archive. The way that I'm currently doing this is by calling list, then creating a read stream for each entry and waiting until all the streams emit end. I'm guessing that there's probably a better way for me to do this? Or, if not, is this worth adding?
