
Contributors: bmann, expede


spec's Issues

Standard Task Types

Just using this Issue as a placeholder for planning task types to ship out of the box. These are all low effort, high value.

  • Deterministic Wasm
  • Dereference CID ("Content Handles")
  • Source of Randomness
  • HTTP (GET, PUT, POST, PATCH, DELETE, INDEX)
  • DNS (get, set)
  • Stretch: encryption/decryption (because there's some complexity in how to do this safely)
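As a rough illustration of what a "Content Handle" means: a handle is derived from the bytes it names, so dereferencing it always yields the same content, and the result can be checked on read. The sketch below is a toy in-memory store that assumes plain SHA-256 hex digests stand in for real CIDs (which additionally encode a multihash and codec). ContentStore and its methods are hypothetical names, not part of the spec.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store (hypothetical).

    Keys are SHA-256 hex digests standing in for real CIDs.
    """

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The handle is derived from the content itself.
        handle = hashlib.sha256(data).hexdigest()
        self._blocks[handle] = data
        return handle

    def dereference(self, handle: str) -> bytes:
        data = self._blocks[handle]  # raises KeyError if the content is unknown
        # Re-hash on read so a corrupted or substituted block is detected.
        if hashlib.sha256(data).hexdigest() != handle:
            raise ValueError(f"content does not match handle {handle}")
        return data


store = ContentStore()
handle = store.put(b"hello")
print(handle, store.dereference(handle))
```

Here store.put(b"hello") returns a 64-character handle and store.dereference(handle) returns b"hello". If the stored block were swapped out, dereference would raise ValueError instead of silently returning the wrong bytes.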

Please suggest more if there are others that are critical for your use case.

Scheduler Guarantees

As of #8 at the time of writing, tasks are classified as either ipvm/wasm or ipvm/effect. This is almost certainly the wrong split.

From chatting with @lukemarsden & @simonwo earlier, what we probably actually care about is:

  1. Signalling the kind of thing to be run (Wasm, Docker, HTTP, etc)
  2. Under which assumptions (e.g. deterministic Wasm subset, has direct disk access, etc)
  3. Scheduler guarantees (can be safely retried, needs oracle attestation, needs a job lock, must be reproducible for verification, etc)

I think it's possible to do this by classification rather than with a config file, which could be complex and self-contradictory (e.g. declaring a task both fully deterministic and with direct disk access).

The easy one is a delineation between pure computation and anything stateful. Docker itself falls into the stateful bucket: we cannot isolate its effects, so oracle attestation is the best level of reproducibility available (low). But Bacalhau (which runs Docker containers) is "safe" to run in the sense that it doesn't produce destructive effects (it's "nondestructive" in the current WIP classification). It does depend on the external world for randomness, time, and so on, but you could "safely" schedule these jobs in sequence or in parallel without breaking that contract.
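One way to read "classification rather than a config file" is that each task declares a single effect class and the scheduler derives its guarantees from that class, so contradictory combinations simply cannot be expressed. The sketch below assumes three ordered classes named after the pure / nondestructive / destructive split above. The class names, the four guarantee flags, and the rule that only destructive tasks need a job lock are illustrative assumptions, not the WIP classification itself.

```python
from dataclasses import dataclass
from enum import IntEnum


class EffectClass(IntEnum):
    """Hypothetical effect classes, ordered from most to least constrained."""
    PURE = 0            # e.g. deterministic Wasm: no access to the outside world
    NONDESTRUCTIVE = 1  # e.g. a Bacalhau job: reads the world, never mutates it
    DESTRUCTIVE = 2     # e.g. HTTP POST or DNS set: mutates external state


@dataclass(frozen=True)
class SchedulerGuarantees:
    safely_retryable: bool
    verifiable_by_rerun: bool
    needs_oracle_attestation: bool
    needs_job_lock: bool  # assumption: only destructive tasks need one


def guarantees_for(effect: EffectClass) -> SchedulerGuarantees:
    """Derive every guarantee from one classification instead of declaring each."""
    return SchedulerGuarantees(
        # Pure and nondestructive tasks can be rerun without harming anything.
        safely_retryable=effect <= EffectClass.NONDESTRUCTIVE,
        # Only pure computation is reproducible enough to verify by re-execution.
        verifiable_by_rerun=effect == EffectClass.PURE,
        # Anything touching the outside world falls back to attestation.
        needs_oracle_attestation=effect >= EffectClass.NONDESTRUCTIVE,
        # Destructive effects must not be run twice concurrently.
        needs_job_lock=effect == EffectClass.DESTRUCTIVE,
    )


def combine(*effects: EffectClass) -> EffectClass:
    """A workflow is as effectful as its most effectful step."""
    return max(effects, default=EffectClass.PURE)


print(guarantees_for(EffectClass.NONDESTRUCTIVE))
```

In this sketch a Bacalhau-style job (NONDESTRUCTIVE) comes out safely retryable and attestation-based but not verifiable by rerun. Because there is no separate "deterministic" flag to set, a task cannot claim to be both fully deterministic and stateful.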

Execution from source code

A question that was raised recently was "how would one run source code directly from a Task". For example, how would you run a Python script?

We know from other systems like Nix that this is actually a nontrivial case to make reproducible. It depends on the chip architecture, OS, available system libraries, and so on.

You can specify all of these by hash ahead of time. A generic task type "run this Python" could fall into one of the following strategies:

  1. If you don't care about reproducibility, set that expectation as a very loose/dangerous/attested effect
  2. Fully specify the expected environment (e.g. x86 Linux with specific versions, down to the hashes of the installed libraries)
  3. Run inside of an established Docker container (by CID) that has the relevant environment set up
  4. Feed the source to a Wasm Python interpreter
  5. Feed the source to a Wasm compiler
  6. Compile the Python to Wasm and execute it directly
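For strategy 2, "specify all of these by hash ahead of time" might look like hashing a canonical encoding of the environment description, so identical environments always get the same key. This sketch assumes the environment is a JSON-serializable dict (architecture, OS, library hashes). The field names are hypothetical, and canonicalization here is simply sorted keys with fixed separators.

```python
import hashlib
import json


def environment_key(env: dict) -> str:
    """Hash a (hypothetical) environment description into a stable key."""
    # Canonical JSON: sorted keys and fixed separators, so two descriptions
    # with the same content always serialize to the same bytes.
    canonical = json.dumps(env, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


env = {
    "arch": "x86_64",
    "os": "linux",
    "python": "3.11",
    # Placeholder values; a real description would pin actual library hashes.
    "libraries": {"numpy": "<hash-of-numpy>", "requests": "<hash-of-requests>"},
}
print(environment_key(env))
```

Two descriptions that differ only in key order produce the same key, while changing a single library hash produces a different one. sort_keys does not reorder lists, which is one reason to describe libraries as a name-to-hash mapping rather than a list.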

3, 4, and 5 are particularly interesting, because you don't even need a new kind of Task. The source can "just" be an argument, and you either get the result back directly (3, 4) or use the output in further steps (5).

5 additionally gets you new Wasm that can be cached for future invocations, which is really helpful for (automatically) publishing reproducible packages to registries, a common workflow in e.g. GitHub Actions.

4 and 5 are kind of microkernel-y, because you could later swap in a different (faster, bug-fixed, etc.) interpreter or compiler.
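Both the caching benefit of compiling to Wasm and the ability to swap compilers can fall out of keying compiled output on two content hashes: one for the compiler and one for the source. In this sketch CompileCache is a hypothetical name, and run_compiler is a caller-supplied stand-in for actually executing the compiler Wasm, which is out of scope here.

```python
import hashlib
from typing import Callable


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class CompileCache:
    """Memoize "compile this source with this compiler" by content hash.

    The compiler is identified by the hash of its own Wasm bytes, so swapping
    in a faster or bug-fixed compiler produces new cache entries instead of
    returning stale output from the old one.
    """

    def __init__(self) -> None:
        self._cache: dict[tuple[str, str], bytes] = {}
        self.misses = 0

    def compile(
        self,
        compiler_wasm: bytes,
        run_compiler: Callable[[bytes], bytes],  # stand-in for running compiler_wasm
        source: bytes,
    ) -> bytes:
        key = (digest(compiler_wasm), digest(source))
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = run_compiler(source)
        return self._cache[key]


# Hypothetical stand-in compiler: prefixes the Wasm magic bytes to the source.
def fake_compiler(src: bytes) -> bytes:
    return b"\x00asm" + src


cache = CompileCache()
cache.compile(b"compiler-v1", fake_compiler, b"print(1)")
cache.compile(b"compiler-v1", fake_compiler, b"print(1)")  # served from cache
print(cache.misses)
```

Compiling the same source twice with the same compiler bytes runs the compiler once. Passing a different (e.g. bug-fixed) compiler changes the first half of the key, so its output is cached separately rather than shadowed by the old compiler's.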

CC @simonwo
