hyperdrive's Introduction

Hyperdrive

See API docs at docs.holepunch.to

Hyperdrive is a secure, real-time distributed file system

Install

npm install hyperdrive

Usage

const Hyperdrive = require('hyperdrive')
const Corestore = require('corestore')

const store = new Corestore('./storage')
const drive = new Hyperdrive(store)

await drive.put('/blob.txt', Buffer.from('example'))
await drive.put('/images/logo.png', Buffer.from('..'))
await drive.put('/images/old-logo.png', Buffer.from('..'))

const buffer = await drive.get('/blob.txt')
console.log(buffer) // => <Buffer ..> "example"

const entry = await drive.entry('/blob.txt')
console.log(entry) // => { seq, key, value: { executable, linkname, blob, metadata } }

await drive.del('/images/old-logo.png')

await drive.symlink('/images/logo.shortcut', '/images/logo.png')

for await (const file of drive.list('/images')) {
  console.log('list', file) // => { key, value }
}

const rs = drive.createReadStream('/blob.txt')
for await (const chunk of rs) {
  console.log('rs', chunk) // => <Buffer ..>
}

const ws = drive.createWriteStream('/blob.txt')
ws.write('new example')
ws.end()
ws.once('close', () => console.log('file saved'))

API

const drive = new Hyperdrive(store, [key])

Creates a new Hyperdrive instance. store must be an instance of Corestore.

By default it uses the core at { name: 'db' } from store, unless you pass the public key of an existing drive.
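
For example, to open an existing drive by its public key (a sketch; key here is assumed to be a 32-byte Buffer obtained elsewhere, e.g. drive.key from the machine that created the drive):

const Corestore = require('corestore')
const Hyperdrive = require('hyperdrive')

const store = new Corestore('./reader-storage')

// `key` is assumed to be the public key of a drive created somewhere else
const drive = new Hyperdrive(store, key)

await drive.ready()
console.log(drive.writable) // => false, since we don't hold the keypair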

await drive.ready()

Waits until internal state is loaded.

Use it once before reading synchronous properties like drive.discoveryKey, unless you have already awaited any of the other API methods.

await drive.close()

Fully closes this drive, including its underlying Hypercore-backed data structures.

drive.corestore

The Corestore instance used as storage.

drive.db

The underlying Hyperbee backing the drive file structure.

drive.core

The Hypercore used for drive.db.

drive.id

String containing the id (z-base-32 of the public key) identifying this drive.

drive.key

The public key of the Hypercore backing the drive.

drive.discoveryKey

The hash of the public key of the Hypercore backing the drive.

Can be used as a topic to seed the drive using Hyperswarm.

drive.contentKey

The public key of the Hyperblobs instance holding blobs associated with entries in the drive.

drive.writable

Boolean indicating if we can write or delete data in this drive.

drive.readable

Boolean indicating if we can read from this drive. After closing the drive this will be false.

drive.version

Number that indicates how many modifications were made, useful as a version identifier.

drive.supportsMetadata

Boolean indicating whether the drive handles metadata. Currently always true.

await drive.put(path, buffer, [options])

Creates a file at path in the drive. options are the same as in createWriteStream.
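
For example, a sketch of storing an executable file together with metadata (both options are described under createWriteStream below; the metadata value here is just an illustrative JSON object):

await drive.put('/bin/run.sh', Buffer.from('#!/bin/sh\necho hi\n'), {
  executable: true,
  metadata: { mime: 'text/x-shellscript' } // arbitrary JSON value
})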

const buffer = await drive.get(path, [options])

Returns the blob at path in the drive. If no blob exists, returns null.

It also returns null for symbolic links.

options include:

{
  wait: true, // Wait for block to be downloaded
  timeout: 0 // Wait at most this many milliseconds (0 means no timeout)
}
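
A sketch of reading without blocking on the network, assuming the drive from the Usage section (with wait: false the call is expected to resolve from local storage only; with a timeout it should fail if peers can't supply the blob in time):

// Resolve from local storage only, don't ask peers
const local = await drive.get('/blob.txt', { wait: false })

// Ask peers, but give up after 5 seconds
const remote = await drive.get('/blob.txt', { timeout: 5000 })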

const entry = await drive.entry(path, [options])

Returns the entry at path in the drive. It looks like this:

{
  seq: Number,
  key: String,
  value: {
    executable: Boolean, // Whether the blob at path is an executable
    linkname: null, // null if the entry is not a symlink, otherwise the path of the entry it links to
    blob: { // Hyperblobs id that can be used to fetch the blob associated with this entry
      blockOffset: Number,
      blockLength: Number,
      byteOffset: Number,
      byteLength: Number
    },
    metadata: null
  }
}

options include:

{
  follow: false, // Follow symlinks (up to 16 levels, otherwise an error is thrown)
  wait: true, // Wait for block to be downloaded
  timeout: 0 // Wait at most this many milliseconds (0 means no timeout)
}
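
For example, resolving the symlink created in the Usage section (a sketch; with follow: true the returned entry is expected to describe the link target instead of the link itself):

const link = await drive.entry('/images/logo.shortcut')
console.log(link.value.linkname) // => '/images/logo.png'

const target = await drive.entry('/images/logo.shortcut', { follow: true })
console.log(target.value.blob) // => blob id of /images/logo.png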

const exists = await drive.exists(path)

Returns true if an entry exists at path, otherwise false.

await drive.del(path)

Deletes the file at path from the drive.

const comparison = drive.compare(entryA, entryB)

Returns 0 if entries are the same, 1 if entryA is older, and -1 if entryB is older.

const cleared = await drive.clear(path, [options])

Deletes the blob from storage to free up space, but the file structure reference is kept.

options include:

{
  diff: false // Returned `cleared` bytes object is null unless you enable this
}
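
A sketch assuming the drive from the Usage section (the exact shape of the cleared object depends on the underlying Hypercore clear, so it is only logged here):

const cleared = await drive.clear('/images/logo.png', { diff: true })
console.log(cleared) // bytes freed; would be null without diff: true

// The entry itself is still listed, only its blob is gone from local storage
console.log(await drive.entry('/images/logo.png')) // => { seq, key, value }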

const cleared = await drive.clearAll([options])

Deletes all the blobs from storage to free up space, similar to how drive.clear() works.

options include:

{
  diff: false // Returned `cleared` bytes object is null unless you enable this
}

await drive.purge()

Purges both cores (db and blobs) from storage, completely removing all of the drive's data.

await drive.symlink(path, linkname)

Creates an entry in drive at path that points to the entry at linkname.

If a blob entry already exists at path it will be overwritten: drive.get(path) will return null, while drive.entry(path) will return the entry with symlink information.

const batch = drive.batch()

Useful for atomically mutating the drive; the batch has the same interface as Hyperdrive.

await batch.flush()

Commit a batch of mutations to the underlying drive.
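
A minimal sketch of batching a few writes and committing them together, assuming the drive from the Usage section:

const batch = drive.batch()

await batch.put('/docs/a.txt', Buffer.from('a'))
await batch.put('/docs/b.txt', Buffer.from('b'))

await batch.flush() // commits both entries to the drive at once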

const stream = drive.list(folder, [options])

Returns a stream of all entries in the drive at paths prefixed with folder.

options include:

{
  recursive: true | false // Whether to descend into all subfolders or not
}
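
For example, listing only the direct contents of a folder (a sketch assuming the drive from the Usage section):

for await (const file of drive.list('/images', { recursive: false })) {
  console.log(file.key) // => '/images/logo.png', '/images/logo.shortcut'
}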

const stream = drive.readdir(folder)

Returns a stream of all subpaths of entries in drive stored at paths prefixed by folder.
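
A sketch, assuming the drive from the Usage section; readdir yields names relative to the folder rather than full entries:

for await (const name of drive.readdir('/images')) {
  console.log(name) // => 'logo.png', 'logo.shortcut'
}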

const stream = await drive.entries([range], [options])

Returns a read stream of entries in the drive.

options are the same as Hyperbee().createReadStream([range], [options]).
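
For example, bounding the stream with Hyperbee-style range options (a sketch; gte/lt come from Hyperbee, not from Hyperdrive itself):

// '0' sorts right after '/', so this covers every key under '/images/'
for await (const entry of drive.entries({ gte: '/images/', lt: '/images0' })) {
  console.log(entry.key)
}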

const mirror = drive.mirror(out, [options])

Efficiently mirror this drive into another. Returns a MirrorDrive instance constructed with options.

Call await mirror.done() to wait for the mirroring to finish.
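
A sketch of mirroring into a second drive; the destination here is a hypothetical Hyperdrive on a namespaced Corestore, and the count property is assumed to report what was copied:

const dst = new Hyperdrive(store.namespace('backup'))

const mirror = drive.mirror(dst)
await mirror.done()

console.log(mirror.count) // e.g. { files, add, remove, change }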

const watcher = drive.watch([folder])

Returns an async iterator that watches folder (/ by default) and yields changes.

Usage example:

for await (const [current, previous] of watcher) {
  console.log(current.version)
  console.log(previous.version)
}

The current and previous values are snapshots that are auto-closed before the next value is yielded.

Don't close these snapshots yourself; they're used internally, so let them be auto-closed.

await watcher.ready()

Waits until the watcher is loaded and detecting changes.

await watcher.destroy()

Stops the watcher. You can also stop it by using break inside the loop.

const rs = drive.createReadStream(path, [options])

Returns a stream to read out the blob stored in the drive at path.

options include:

{
  start: Number, // `start` and `end` are inclusive
  end: Number,
  length: Number, // `length` overrides `end`, they're not meant to be used together
  wait: true, // Wait for blocks to be downloaded
  timeout: 0 // Wait at most this many milliseconds (0 means no timeout)
}
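
For example, reading a byte range out of the blob from the Usage section (a sketch):

// Read 4 bytes starting at byte 2 of '/blob.txt'
const rs = drive.createReadStream('/blob.txt', { start: 2, length: 4 })
for await (const chunk of rs) {
  console.log(chunk.toString()) // => 'ampl' if the blob still contains 'example'
}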

const ws = drive.createWriteStream(path, [options])

Streams a blob into the drive at path.

options include:

{
  executable: Boolean,
  metadata: null // Extended file information, i.e. an arbitrary JSON value
}

await drive.download(folder, [options])

Downloads the blobs corresponding to all entries in the drive at paths prefixed with folder.

options are the same as those for drive.list(folder, [options]).
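
For example, a sketch assuming a drive that is being replicated from peers:

// Make every blob under /images available locally
await drive.download('/images')

// Or only the blobs at the top level of the folder
await drive.download('/images', { recursive: false })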

const snapshot = drive.checkout(version)

Get a read-only snapshot of a previous version.
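
A sketch assuming the drive from the Usage section; the snapshot behaves like a read-only drive and can be closed independently:

const version = drive.version

await drive.put('/blob.txt', Buffer.from('updated'))

const old = drive.checkout(version)
console.log((await old.get('/blob.txt')).toString()) // => the contents before the update

await old.close()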

const stream = drive.diff(version, folder, [options])

Efficiently create a stream of the shallow changes to folder between version and drive.version.

Each entry is sorted by key and looks like this:

{
  left: Object, // Entry in folder at drive.version for some path
  right: Object, // Entry in folder at drive.checkout(version) for some path
}

If an entry exists in drive.version of the folder but not in version, then left is set and right will be null, and vice versa.
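
For example, printing what changed under /images since an older version (a sketch assuming the drive from the Usage section):

const before = drive.version

await drive.put('/images/banner.png', Buffer.from('..'))

for await (const { left, right } of drive.diff(before, '/images')) {
  if (left && !right) console.log('added', left.key)
  else if (!left && right) console.log('removed', right.key)
  else console.log('changed', left.key)
}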

await drive.downloadDiff(version, folder, [options])

Downloads all the blobs in folder corresponding to entries in drive.checkout(version) that are not in drive.version.

In other words, downloads all the blobs added to folder up to version of the drive.

await drive.downloadRange(dbRanges, blobRanges)

Downloads the entries and blobs stored in the ranges dbRanges and blobRanges.

const done = drive.findingPeers()

Indicates to Hyperdrive that you're finding peers in the background; requests will be put on hold until done() is called.

Call done() when your current discovery iteration is done, i.e. after swarm.flush() finishes.

const stream = drive.replicate(isInitiatorOrStream)

Usage example:

const swarm = new Hyperswarm()
const done = drive.findingPeers()
swarm.on('connection', (socket) => drive.replicate(socket))
swarm.join(drive.discoveryKey)
swarm.flush().then(done, done)

See more about how replicate works at corestore.replicate.

const updated = await drive.update([options])

Waits for an initial proof of a new drive version, holding until all findingPeers() calls are done.

options include:

{
  wait: false
}

Use drive.findingPeers() or { wait: true } to make await drive.update() blocking.
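
For example, a sketch that builds on the replication example above:

const updated = await drive.update({ wait: true })

if (updated) console.log('drive advanced to version', drive.version)
else console.log('no newer version found')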

const blobs = await drive.getBlobs()

Returns the Hyperblobs instance storing the blobs indexed by drive entries.

await drive.put('/file.txt', Buffer.from('hi'))

const buffer1 = await drive.get('/file.txt')

const blobs = await drive.getBlobs()
const entry = await drive.entry('/file.txt')
const buffer2 = await blobs.get(entry.value.blob)

// => buffer1 and buffer2 are equal

License

Apache-2.0

hyperdrive's Issues

Error Trying to dat share w/ Dat v1.0

I was trying to dat share a large folder and got an error after a few hundred (?) files. Karissa said the issue may be hyperdrive:

/Users/joe/node_modules/dat/node_modules/hyperdrive/node_modules/rabin/index.js:18
  this.rabin = rabin.initialize(avgBits, min, max)
                     ^
Error: the value of instance_counter is too damn high
    at Error (native)
    at new Rabin (/Users/joe/node_modules/dat/node_modules/hyperdrive/node_modules/rabin/index.js:18:22)
    at Rabin (/Users/joe/node_modules/dat/node_modules/hyperdrive/node_modules/rabin/index.js:10:40)
    at writeStream (/Users/joe/node_modules/dat/node_modules/hyperdrive/lib/write-stream.js:22:35)
    at Archive.entry (/Users/joe/node_modules/dat/node_modules/hyperdrive/lib/pack.js:46:17)
    at DestroyableTransform._transform (/Users/joe/node_modules/dat/index.js:47:22)
    at DestroyableTransform.Transform._read (/Users/joe/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:172:10)
    at DestroyableTransform.Transform._write (/Users/joe/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:160:12)
    at doWrite (/Users/joe/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:333:12)
    at writeOrBuffer (/Users/joe/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:319:5)

Describe protocol in implementation-agnostic way

Hi,

As discussed at dotJS, I'd really like to see a protocol specification for HyperDrive. Bonus points for it containing a comparison to well-known protocols in the same space (BitTorrent?), key assumptions behind it, and novel and old use cases where the new protocol would excel (mobile networks?).

Cheers, Markus

Useful events

  • When an archive is finished downloading
  • When an update arrives
  • When an update is finished downloading (can be the same event as when the archive is finished downloading)

missing docs

  • archive.lookup
  • var dl = archive.download (the dl progress events)

'Offset is out of bounds' crash bug in dat cli

While running dat . --live, I had beaker loading the page (locally) and I got this:

Beaker's using hyperdrive 6.5.1, and dat is 11.0.2.

Error dump:

Share Link 6ece0d8319611a8a834e56a979f484898aaeda172ad5905de929dbceee19d34e
The Share Link is secret and only those you share it with will be able to get the files
Sharing /Users/paulfrazee/tmp/mytestdat, connected to 1/3 sources
Uploading 185.22 kB/s, 926.09 kB Total
fs.js:681
  binding.read(fd, buffer, offset, length, position, req);
          ^

Error: Offset is out of bounds
    at Error (native)
    at Object.fs.read (fs.js:681:11)
    at onread (/Users/paulfrazee/npm/lib/node_modules/dat/node_modules/random-access-file/index.js:84:8)
    at FSReqWrap.wrapper [as oncomplete] (fs.js:675:17)

nasty error message

calling this https://github.com/maxogden/dat/blob/c29fe52378fcdd025c6b664f6fc371b0592e2a6b/index.js#L128

on an entry that has no data (in this case it is a folder that was added as a type: file) causes this error stack:

buffer.js:572
        return this.utf8Write(string, offset, length);
                    ^

TypeError: Argument must be a string
    at TypeError (native)
    at Buffer.write (buffer.js:572:21)
    at Object.exports.bytes.encode (/Users/max/src/js/dat/node_modules/hyperdrive/node_modules/protocol-buffers/encodings.js:35:17)
    at Object.encode (eval at <anonymous> (/Users/max/src/js/dat/node_modules/hyperdrive/node_modules/protocol-buffers/node_modules/generate-function/index.js:55:21), <anonymous>:13:10)
    at Protocol._push (/Users/max/src/js/dat/node_modules/hyperdrive/lib/protocol.js:290:7)
    at Protocol.join (/Users/max/src/js/dat/node_modules/hyperdrive/lib/protocol.js:302:8)
    at Swarm.join (/Users/max/src/js/dat/node_modules/hyperdrive/lib/swarm.js:135:19)
    at new Feed (/Users/max/src/js/dat/node_modules/hyperdrive/lib/feed.js:24:37)
    at Feed (/Users/max/src/js/dat/node_modules/hyperdrive/lib/feed.js:11:39)
    at Hyperdrive.get (/Users/max/src/js/dat/node_modules/hyperdrive/index.js:42:10)

Would be nice to have a more user-friendly error here.

max's list of hyperdrive feature requests

  • download a single file by filename randomly
  • upload progress (for showing total amount downloaded, total amount uploaded when sharing)
  • allow string or buffer as the hex link
  • use catfile for storage/decouple filesystem
  • fix performance bottleneck on large dats
  • switch from index-at-end to index-on-chunk
  • add digest block to end that has total size + file count
  • don't expose hash to swarm (only share hash of hash etc.)
  • remove the drive.add stats object

list() ranges

Iterating over every file in a hyperdrive could become slow for archives with very many files.

Adding leveldb-style iterator ranges would speed things up for some use-cases, such as listing all the files in a directory:

archive.list({ gt: 'dir1/', lt: 'dir1/\uffff' })

setting type: 'directory' doesn't override mode

I think this is the code: https://github.com/mafintosh/hyperdrive/blob/0926c869ed62a22fbe5e11f67cd4bc0ed9356346/lib/pack.js#L42

If I write this entry on one side:

{ name: 'folder',
  mode: 16877,
  uid: 501,
  gid: 20,
  mtime: 1452023306000,
  ctime: 1452023306000,
  type: 'directory' }

I get this on the other side:

{ type: 'file',
  value: 
   { name: 'folder',
     mode: 16877,
     size: 0,
     uid: 501,
     gid: 20,
     mtime: 1452023306000,
     ctime: 1452023306000 },
  link: null }

It's a folder on disk; I'm not sure why it's being treated as a file here.

API for flattened filesystem

archive.list and archive.lookup currently work on whole metadata records. The archive API should maintain a (block head) -> (flattened tree) mapping and provide an API to work on it.

mkdirp EEXIST error with long path

I added a node_modules folder to hyperdrive, but now I get this error (in Electron):

Error: EEXIST: file already exists, mkdir '/Users/max/Desktop/electron/node_modules/lightning-image-poly/node_modules/brfs/node_modules/static-module/node_modules/static-eval/node_modules/escodegen/node_modules/.bin/esvalidate'
    at Error (native)

which came from this line: https://github.com/mafintosh/hyperdrive/blob/73425f4665045a3092be37adbbc4ab5eef557e90/index.js#L231

I checked it out and other paths < 184 chars long worked. Not sure if this path being 184 chars long is what caused it.

hyperdrive 6.0 exception

Since the latest hyperdrive update (6.0) I get this exception:

Uncaught TypeError: peer.emit is not a function
    at Channel.onend (node_modules/hyperdrive/node_modules/hypercore/lib/replicate.js:203:8)
    at Protocol._close (node_modules/hyperdrive/node_modules/hypercore/node_modules/hypercore-protocol/index.js:492:13)
    at Protocol.onfinalize (node_modules/hyperdrive/node_modules/hypercore/node_modules/hypercore-protocol/index.js:193:14)
    at Protocol.Duplexify._destroy (node_modules/hyperdrive/node_modules/hypercore/node_modules/hypercore-protocol/node_modules/duplexify/index.js:191:8)
    at node_modules/hyperdrive/node_modules/hypercore/node_modules/hypercore-protocol/node_modules/duplexify/index.js:174:10

Hyperdrive version: 6.0
Hypercore version: 4.1.0
Hypercore protocol: 4.3.0

Regards,

Connecting node hyperdrive with browser based WebRTC hyperdrive

Would it be possible to have the node version talk to the browser version of the script?

Currently if I try to connect the second example in the README.md (example code below) to the browser based version (eg. http://mafintosh.github.io/hyperdrive/) the script crashes with the following error:

events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: connect ECONNREFUSED [browsers ip address]:[random port number]
    at Object.exports._errnoException (util.js:856:11)
    at exports._exceptionWithHostPort (util.js:879:20)
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1063:14)

Example code

var disc = require('discovery-channel')()
var hyperdrive = require('hyperdrive')
var net = require('net');
var levelup = require('levelup')
var aLevelDB = levelup('./mydb')
var drive = hyperdrive(aLevelDB)

var link = new Buffer({your-hyperdrive-link-from-the-above-example}, 'hex')

var server = net.createServer(function (socket) {
  socket.pipe(drive.createPeerStream()).pipe(socket)
})

server.listen(0, function () {
  function ann () {
    // discovery-channel currently only works with 20 bytes hashes
    disc.announce(link.slice(0, 20), server.address().port)
  }

  ann()
  setInterval(ann, 10000)

  var lookup = disc.lookup(link.slice(0, 20))

  lookup.on('peer', function (ip, port) {
    var socket = net.connect(port, ip)
    socket.pipe(drive.createPeerStream()).pipe(socket)
  })
})

Hyperdrive Name

Add hyperdrive name so we can create a download folder with that name.

 drive.createArchive(key, {
      name: 'my-data-folder'
 })

downloading twice never calls download callback

Create an archive and download the contents successfully, then exit the process. Then, run it again. You'll notice that the callback is never fired to finish the download. Example:

$ dat dat://.....
Download complete, etc.
^C
$ dat dat://
Connected to 1/1 peers

'download complete' never fires because the callback isn't called.

possible mem leak, investigate

> [email protected] start /home/maf/versions/dat.haus-1466086479375
> taco-nginx --domain dat.haus node index.js

Server is listening on port 41616
(node) warning: possible EventEmitter memory leak detected. 11 download listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Feed.addListener (events.js:239:17)
    at onindex (/home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:432:18)
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:421:7
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/node_modules/hypercore/node_modules/subleveldown/node_modules/levelup/lib/levelup.js:230:7
(node) warning: possible EventEmitter memory leak detected. 11 upload listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Feed.addListener (events.js:239:17)
    at onindex (/home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:436:18)
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:421:7
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/node_modules/hypercore/node_modules/subleveldown/node_modules/levelup/lib/levelup.js:230:7
(node) warning: possible EventEmitter memory leak detected. 11 download listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Feed.addListener (events.js:239:17)
    at onindex (/home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:432:18)
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:421:7
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/node_modules/hypercore/node_modules/subleveldown/node_modules/levelup/lib/levelup.js:230:7
(node) warning: possible EventEmitter memory leak detected. 11 upload listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Feed.addListener (events.js:239:17)
    at onindex (/home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:436:18)
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:421:7
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/node_modules/hypercore/node_modules/subleveldown/node_modules/levelup/lib/levelup.js:230:7
(node) warning: possible EventEmitter memory leak detected. 11 download listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Feed.addListener (events.js:239:17)
    at onindex (/home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:432:18)
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:421:7
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/node_modules/hypercore/node_modules/subleveldown/node_modules/levelup/lib/levelup.js:230:7
(node) warning: possible EventEmitter memory leak detected. 11 upload listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Feed.addListener (events.js:239:17)
    at onindex (/home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:436:18)
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/archive.js:421:7
    at /home/maf/versions/dat.haus-1466086479375/node_modules/hyperdrive/node_modules/hypercore/node_modules/subleveldown/node_modules/levelup/lib/levelup.js:230:7

Empty dir not created

#70 added a test case for that (but the .download() call was wrong and corrected by #82). We'll need the storage api to support mkdir: in addition to file: for this to work.

Appending existing file overwrites file & changes mtime

Adding a file to archive with createFileWriteStream overwrites the existing file and changes the mtime, when using hyperdrive + random-access-file.

  • This means we can't use mtime to check if files are updated because hyperdrive changes mtime (our own little observer effect).
  • Overwriting existing files seems like unwanted behavior.

This may be a storage/raf issue.

Related issue in hyperdrive-import-files

How about encrypted connection and peer authentication?

I tried to read as much as I could, but I saw nowhere anything related to peer-to-peer connections being authenticated and encrypted. It looks like as now hyperdrive has no security at all. Did I miss something or it is not in the scope of hyperdrive?

If somebody wants to have authentication and encryption of the connections, which npm module and public/private key system would you recommend?

NAT traversal botheration in example code

I'm not sure if this is that important, but I'm finding it hard to get the README example running because it requires NAT traversal. It works fine if I modify it to just connect the two hyperdrives directly to each other.

A couple ideas that could solve this:

  1. Connect the two hyperdrives together without using discovery-channel, e.g. by printing out the port to connect to.
  2. Use webrtc-swarm, which should take advantage of ICE and should always work to let a computer connect to itself, because there will be local ICE candidates.

[Spec] 16-bit Index block entries too restrictive?

Hi, great work from you and the Dat team on this and all of the other component packages. This is all very promising and already super useful! I've been reviewing everything top-to-bottom, and I've run into one design decision that seems a bit too constraining, even in the near-term, and so in the interest of "future-proofing" the spec a bit, I'd like to initiate a discussion.

The decision is the specific choice of exclusively using 16-bit values to hold the Rabin chunked block sizes in the "Index blocks" to provide seekability within the blocks making up a file's feed. This seems to place a rather artificial 64KB upper limit on the maximum size of any chunk, which for TB-scale files would require they be split into 10s of millions of chunks in the process. The block index alone for such a file would weigh-in at > 20MB. The 16-bit values are probably fine for files up to about 1 GB, but we already routinely handle individual files of 100 GB, with larger ones coming all of the time, and so I'd very interested in increasing the random access granularity and deduplication scale in exchange for somewhat lower overhead and better performance.

I've also been benchmarking streaming GB-scale files into replicated NoSQL databases (MongoDB and RethinkDB) using your Rabin chunking and content-addressable indexes, and I'm finding that 20KB mean length chunks are ~2-4x less efficient to store and retrieve (in terms of throughput) when compared to 200KB chunks. From 200KB to ~1MB, the performance benefit of using larger chunks flattens out, but up to that point, the improvement from using larger chunks is substantial.

Are you open to discussing the possibility of either increasing the block index to use 32-bit values, or alternatively, making the choice of 16/32-bit values an option on a per-file/feed basis?

Thanks! -Vaughn

Plans to continuously sync files or folders?

I'm wondering if there are any plans to add the ability to continuously sync a local folder such that all updates that I make are automatically transferred to anyone who wants the latest copy. I can imagine that a signed append-only log like secure-scuttlebutt would provide a way to do this, but maybe you guys already have something in the works.

Doing this with a multi-writer setup would be really difficult, because you would then have to deal with conflict resolution.

Error: entry.content is required for byte cursors

I got this cryptic error from hyperdrive. It happened when I tried to use createReadFileStream on an archive loaded from dat.land.

Error: entry.content is required for byte cursors
    at /Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/cursor.js:128:37
    at Archive.get (/Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/archive.js:108:55)
    at Cursor._open (/Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/cursor.js:126:16)
    at open (/Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/cursor.js:29:10)
    at run (/Users/paulfrazee/my/beaker-browser/app/node_modules/thunky/index.js:13:3)
    at Cursor.open (/Users/paulfrazee/my/beaker-browser/app/node_modules/thunky/index.js:27:3)
    at Cursor._openAndNext (/Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/cursor.js:163:8)
    at Cursor.next (/Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/cursor.js:49:33)
    at read (/Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/archive.js:296:9)
    at /Users/paulfrazee/my/beaker-browser/app/node_modules/hyperdrive/archive.js:305:7

Here's what the entries listing looks like:

"[
  {
    "blocks": 2,
    "content": null,
    "ctime": 1468883103188,
    "gid": 0,
    "length": 30266,
    "linkname": "",
    "mode": 0,
    "mtime": 1468883103188,
    "name": "figure-1.1.png",
    "type": "file",
    "uid": 0,
    "path": "/figure-1.1.png"
  },
  {
    "blocks": 2,
    "content": null,
    "ctime": 1468883103262,
    "gid": 0,
    "length": 22364,
    "linkname": "",
    "mode": 0,
    "mtime": 1468883103262,
    "name": "figure-1.1a.png",
    "type": "file",
    "uid": 0,
    "path": "/figure-1.1a.png"
  },
  {
    "blocks": 3,
    "content": null,
    "ctime": 1468883103329,
    "gid": 0,
    "length": 37023,
    "linkname": "",
    "mode": 0,
    "mtime": 1468883103329,
    "name": "figure-1.2.png",
    "type": "file",
    "uid": 0,
    "path": "/figure-1.2.png"
  },
  {
    "blocks": 4,
    "content": null,
    "ctime": 1468883103385,
    "gid": 0,
    "length": 53469,
    "linkname": "",
    "mode": 0,
    "mtime": 1468883103385,
    "name": "figure-1.3.png",
    "type": "file",
    "uid": 0,
    "path": "/figure-1.3.png"
  }
]"

Hyperdrive crashes trying to get empty files

This means that a dat clone operation will fail if the directory to be cloned includes any empty files. They can be written, but not read. Probably this is worth fixing, since empty files can be useful in some situations.

I added a failing test that illustrates this problem.

archive.list() doesnt callback

I'm getting some inconsistent behaviors from list(). Maybe my usage is wrong.

Here's what I'm doing:

import hyperdrive from 'hyperdrive'
import memdb from 'memdb'
import swarm from 'hyperdrive-archive-swarm'

var drive = hyperdrive(memdb())

function fetch (archiveKey, path) {
  // assume `archiveKey` is a valid key

  // start searching the network
  log('[DAT] Swarming archive', archiveKey)
  var archive = drive.createArchive(archiveKey)
  var sw = swarm(archive)

  // wait for a peer
  sw.once('peer', function (peer) {
    log('[DAT] Swarm peer:', peer, '('+archiveKey+')')

    // list archive contents
    log('[DAT] attempting to list archive')
    archive.list((err, entries) => {
      log('[DAT] list() results', err, entries)
    })
  })
}

What I'm seeing is that the 'list() results' log isn't ever happening. (I had it work a couple times, but it hasn't since.)

Any ideas? [email protected], [email protected]

What files does `list` list for a non-live archive?

Does the list stream end after it's listed all files in the archive? Or does it end after it's listed all files in the local copy? I think it does the second thing, but I was expecting it to do the first one. What do you guys think of updating the behavior of list in this case? If not, a clarification in the README would be cool.

Error: Invalid enum value: 39725

After the second file I store in an archive, I get this error:

Error: Invalid enum value: 39725
    at Object.decode (eval at <anonymous> (/home/substack/projects/hyperdrive/node_modules/generate-function/index.js:55:21), <anonymous>:5:50)
    at Object.decode (eval at <anonymous> (/home/substack/projects/hyperdrive/node_modules/generate-function/index.js:55:21), <anonymous>:31:27)
    at /home/substack/projects/hyperdrive/index.js:391:36
    at /home/substack/projects/hyperdrive/node_modules/levelup/lib/levelup.js:230:7

Here's the hex dump of the value that messages.Index.decode(buf) fails on:

080012213132373638383131303937373532363039362f4453435f3130323933322e6a706718adb6022005

crash on feed access

When we asked for feed.get for an index that didn't exist:

undefined:5
  if (!end) end = buf.length
                     ^

TypeError: Cannot read property 'length' of null
    at Object.decode (eval at <anonymous> (/Users/max/src/js/hyperdrive/node_modules/protocol-buffers/node_modules/generate-function/index.js:55:21), <anonymous>:5:22)
    at /Users/max/src/js/hyperdrive/lib/feed.js:61:32
    at /Users/max/src/js/hyperdrive/lib/feed.js:42:71
    at apply (/Users/max/src/js/hyperdrive/node_modules/thunky/index.js:16:28)
    at Feed.open (/Users/max/src/js/hyperdrive/node_modules/thunky/index.js:27:3)
    at Feed.get (/Users/max/src/js/hyperdrive/lib/feed.js:41:8)
    at downloadNext (/Users/max/src/js/dat/index.js:106:10)
    at /Users/max/src/js/dat/index.js:127:9
    at done (/Users/max/src/js/run-series/index.js:11:7)
    at Object.cb (/Users/max/src/js/dat/index.js:117:15)

Confusing behavior when a relative path is passed into appendFile

In the code below, the callback is never called. I don't think that hyperdrive necessarily needs to support relative paths, but either way the callback should definitely be called.

var memdb = require('memdb')
var os = require('os')
var hyperdrive = require('hyperdrive')

var drive = hyperdrive(memdb())
var tmpDir = os.tmpdir()
var archive = drive.add(tmpDir)
archive.appendFile('./README.md', console.log)

I added a test that shows this behavior, here.

Progress Stats for 6

Total size stats (once finalized):

  • Total archive size
  • Total archive file count

Progress Stats for Individual File Appends/Downloads:

  • Bytes downloaded/uploaded
  • bytes added (?)
  • Bytes total

Returning an object from append/download, like we did before, will work.

Along with these events, I think that's all we will need for the Dat CLI.

Empty files aren't downloaded

Empty files get added to metadata but never get transferred or downloaded.

Reproduce:

  • Run dat link in dat tests fixtures folder (with empty.txt)
  • Download the dat link to another folder.
  • folder and empty.txt are missing but metadata gets read (they appear in createEntryStream)

crash on empty file

/Users/max/src/js/hyperdrive/lib/pack.js:58
    for (var i = 0; i < content.index.length; i++) file.size += content.index[i]
                                     ^

TypeError: Cannot read property 'length' of null
    at onfinish (/Users/max/src/js/hyperdrive/lib/pack.js:58:38)
    at emitNone (events.js:72:20)
    at emit (events.js:166:7)
    at finishMaybe (/Users/max/src/js/hyperdrive/node_modules/duplexify/node_modules/readable-stream/lib/_stream_writable.js:509:14)
    at afterWrite (/Users/max/src/js/hyperdrive/node_modules/duplexify/node_modules/readable-stream/lib/_stream_writable.js:388:3)
    at onwrite (/Users/max/src/js/hyperdrive/node_modules/duplexify/node_modules/readable-stream/lib/_stream_writable.js:378:7)
    at WritableState.onwrite (/Users/max/src/js/hyperdrive/node_modules/duplexify/node_modules/readable-stream/lib/_stream_writable.js:123:5)
    at g (events.js:260:16)
    at emitNone (events.js:67:13)
    at emit (events.js:166:7)

deleting this file fixed the crash:

[screenshot: screen shot 2015-12-04 at 1 02 55 pm]

Document archive.live

(According to jhand) you can see if an archive is live with the .live attribute. This isn't in the docs, atm.

An easy way to check if the entire archive is present?

I'd like to check whether all files and metadata are present in a non-live archive. The way that I'm currently doing this is by calling list, then creating a read stream for each entry and waiting until all the streams emit end. I'm guessing that there's probably a better way for me to do this? Or, if not, is this worth adding?
