
filecoin-pickaxe's People

Contributors

jimpick, rvagg

Forkers

isabella232

filecoin-pickaxe's Issues

support storing using zip archives, split into files, with random-access and forward error correction

Use case:

I'd like to be able to import directories with large numbers of files and subdirectories (for example, source code trees such as the Linux source code). Filecoin would prefer to work with larger files rather than lots of small files, so it makes sense to use a file archive format (e.g. tar, zip).

By splitting the archive file into multiple chunks, it can be stored in multiple places, and uploads/downloads can happen in parallel. The chunk size can be selected to be a size that is optimal for the Filecoin network.
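To make the chunking step concrete, here is a minimal sketch in Python. The function names and the idea of holding the whole archive in memory are assumptions for illustration only; a real importer would stream from disk and pick a chunk size suited to the network.

```python
from typing import List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split data into consecutive chunks of at most chunk_size bytes.

    Hypothetical helper: the last chunk may be shorter than the others.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_chunks(chunks: List[bytes]) -> bytes:
    """Reassemble the original data from chunks kept in order."""
    return b"".join(chunks)
```

Because each chunk is independent, the chunks could be uploaded or downloaded in parallel and joined in order afterwards.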

If we re-encode the chunks using Forward Error Correction before storing them, we can add some extra resiliency against chunks that are lost and can't be retrieved (e.g. only 3 of 5 chunks are needed to restore the original). The zfec tool from tahoe-lafs is an easy-to-use tool for encoding and decoding, written in Python and C.
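As a rough illustration of the idea, the sketch below uses a single XOR parity chunk. This is deliberately not zfec and not a k-of-m erasure code: it only survives the loss of one chunk, and it assumes all chunks have the same length (the last one would need padding). The function names are hypothetical.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_with_parity(chunks: List[bytes]) -> List[bytes]:
    """Return the data chunks followed by one XOR parity chunk."""
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("all chunks must have the same length")
    parity = bytes(len(chunks[0]))  # all zero bytes
    for chunk in chunks:
        parity = xor_bytes(parity, chunk)
    return list(chunks) + [parity]


def recover_missing(shares: List[Optional[bytes]]) -> bytes:
    """Rebuild the single share marked as None from the remaining ones."""
    if sum(s is None for s in shares) != 1:
        raise ValueError("exactly one share must be missing")
    present = [s for s in shares if s is not None]
    rebuilt = bytes(len(present[0]))
    for share in present:
        rebuilt = xor_bytes(rebuilt, share)
    return rebuilt
```

A Reed-Solomon style coder such as zfec generalizes this so that any k of m shares are enough, which is what the "3 of 5" example calls for.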

It would be nice to be able to retrieve a file from the archive via random-access so only the chunk containing it would need to be retrieved from the Filecoin network instead of having to download all chunks for the archive.

Tar archives do not have an index. Zip files have an index appended at the end of the archive.
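A sketch of how the zip index could drive random access, using Python's standard zipfile module: each entry in the index records the offset of its local header, so we can work out which chunks cover a given file. The function name and the fixed chunk size are assumptions, and the end of each entry is approximated by the start of the next one (or the end of the archive for the last entry), which is conservative.

```python
import io
import zipfile
from typing import Dict, List


def chunks_for_members(zip_bytes: bytes, chunk_size: int) -> Dict[str, List[int]]:
    """Map each archive member to the indices of the chunks holding its bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        infos = sorted(zf.infolist(), key=lambda info: info.header_offset)
    mapping = {}
    for pos, info in enumerate(infos):
        start = info.header_offset
        # Conservative end: next member's header, or end of archive for the last one.
        if pos + 1 < len(infos):
            end = infos[pos + 1].header_offset
        else:
            end = len(zip_bytes)
        first = start // chunk_size
        last = (end - 1) // chunk_size
        mapping[info.filename] = list(range(first, last + 1))
    return mapping
```

With such a mapping, retrieving one file would only require fetching the listed chunks rather than the whole archive.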

Datpedia uses JavaScript and the zip index to provide random access to Wikipedia entries stored in a single zip file.

It would be nice to store the zip index separately for extra redundancy, optionally using zfec. I'd like to have the ability to still retrieve files from chunks, even if other chunks have been lost.
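One way the index could be pulled out for separate storage is sketched below. It locates the end-of-central-directory record and slices out the central directory bytes. This is a simplified illustration under stated assumptions: it ignores zip64 archives, and it assumes the record signature does not also appear inside an archive comment. The function name is hypothetical.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory record


def extract_central_directory(zip_bytes: bytes) -> bytes:
    """Return the raw central directory (the zip index) of a non-zip64 archive."""
    pos = zip_bytes.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("end-of-central-directory record not found")
    # Fields: signature, disk number, disk with directory, entries on this disk,
    # total entries, directory size, directory offset, comment length.
    fields = struct.unpack_from("<4s4H2LH", zip_bytes, pos)
    cd_size, cd_offset = fields[5], fields[6]
    return zip_bytes[cd_offset:cd_offset + cd_size]
```

The returned bytes could then be stored as their own object, optionally run through the same error-correction step as the data chunks.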

Some compression formats work better for random access. I've added some links to the wiki:
