
spacemeshos / poet


Spacemesh PoET service reference implementation

License: MIT License

Go 96.85% Dockerfile 0.17% Makefile 2.77% Shell 0.22%
golang go proof-of-concept proof-of-space proof-of-space-time proof-of-sequential-work

poet's Issues

Add retries to broadcast failure

When a round broadcast fails, the round stays in the "unbroadcasted" state, and only the recovery mechanism can initiate another broadcast attempt. This is not enough; we should introduce delayed retries after the initial failure.

Add support for multiple sha256() per Hx()

Add a server param - iter - the number of sequential iterations per Hx() when using sha256(), so we shift total cpu time of proof building to sha256() computations.

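// Hash computes Hx(data): sha256 over the statement x followed by the inputs,
// then chains the digest through h.iters additional sequential sha256 calls.
func (h *sha256Hash) Hash(data ...[]byte) []byte {
	h.hash.Reset()
	_, _ = h.hash.Write(h.x)

	for _, d := range data {
		_, _ = h.hash.Write(d)
	}

	temp := h.hash.Sum([]byte{})

	// Each iteration depends on the previous digest, so the work is sequential.
	for i := 0; i < h.iters; i++ {
		h.hash.Reset()
		_, _ = h.hash.Write(h.x)
		_, _ = h.hash.Write(temp)
		temp = h.hash.Sum(h.emptySlice)
	}

	return temp
}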

Clean the prover file when execution stops

The prover key-value storage file is cleaned up after execution has finished and the NIP has been generated.
The file also needs to be cleaned up when execution is interrupted before it finishes.

Open round recovery policy

Background

  • The PoET service starts by opening round 1 (the opening duration is specified by the mandatory initialduration flag).
  • Once that duration has elapsed, round 1 starts executing and round 2 opens.
  • The start of round 2 execution (and the opening of round 3) is defined by the rounds schedule (the optional duration flag). If it falls before round 1 finishes executing, 2 rounds execute in parallel. If it falls after, there is a period in which no round is executing. The current behaviour is to start executing round 2 when round 1 finishes if the scheduled time is later or not defined (so there is always at least 1 round executing).

Options for when attempting to recover a round in “open” state (after server crash or shutdown):

  1. Execute it immediately and open a new round instead.
  2. Keep it open until the previous round end of execution, regardless of the opening schedule.
  3. If schedule is/was defined, keep it open for the remaining duration, including the duration in which the server was down. If elapsed, execute it immediately.
  4. If schedule is/was defined, keep it open for the remaining duration, not including the duration in which the server was down. This might create a time drift in the overall scheduling.

Option 1 is currently implemented. It adds another parallel round execution on every server restart, and thus should be changed.

The main consideration is whether we plan to use the rounds schedule or just start executing each round when the previous round finishes executing. If the former, options 2-4 are viable. If the latter, option 2 is the fallback and needs to be implemented anyway.

Please provide feedback for the desired behaviour.

Make PoET thread safe

One case where we're not thread safe is when closing a round -- we may be losing the last membership submissions if they're coming in the last moment.

A thorough review is in order.

PoET core and service

Build the client/server protocol over the POET core implementation to serve multiple miners per one sequential work.

Log messages as JSON

For ES to parse poet log messages, the logs need to be JSON.
Use the logging infrastructure from go-spacemesh to print log messages as JSON when the --test-mode execution parameter is set.

Packages refactoring

  • Extract verifier code from internal to a separated verifier package.
  • Rename internal package as prover, and extract shared code to shared/common package.

PoET discovery

Overview / Motivation

We want to support multiple PoET services and allow a miner to discover them and select between them.

The Task

TODO: Clearly describe the issue requirements here...

Implementation Notes

TODO: Add links to relevant resources, specs, related issues, etc...

Contribution Guidelines

Important: Issues will be assigned to developers in the order of their application and according to their proficiency level relative to the task's complexity. We will not assign tasks to developers who haven't introduced themselves on our Gitter dev channel.

  1. Introduce yourself on go-spacemesh dev chat channel - ask our team any question you may have about this task
  2. Fork branch develop to your own repo and work in your repo
  3. You must document all methods, enums and types with godoc comments
  4. You must write go unit tests for all types and methods when submitting a component, and integration tests if you submit a feature
  5. When ready for code review, submit a PR from your repo back to branch develop
  6. Attach relevant issue to PR

Docker push workflow triggered on PR instead of on merge

With the introduction of dependabot a small issue popped up; PRs created with dependabot failed their CI builds (e.g. this one: #117)

The reason is that dependabot doesn't have access to the same set of secrets as PRs created by others. So it cannot authenticate against Dockerhub and the build fails.

The question here is should we move the "Docker build & push" step out of the CI workflow and move it into one that is only triggered AFTER the merge of a PR, instead of on every commit? At the moment a new image is released every time the image build has been successful, even if tests failed.

@dshulyak @lrettig @evt

Use the new merkle-tree implementation

The number of leaves to prove depends on the security parameter constant (T).
merkle-tree can be used to create a more efficient proof of multiple leaves, especially for the verifier, which currently uses a key-value cache to detect repeating nodes in the tree.

Verifier error handling

The verifier currently crashes (panics) if it is given incorrect values (an invalid proof). It needs to report these errors gracefully instead.

failed creating poet client harness: error during killing process: exit status 123 | kill: you need to specify whom to kill

Alpine Linux (which we use to run our tests in docker) has a funny version of lsof installed that doesn't tell the truth:

/go/src/github.com/spacemeshos/go-spacemesh # lsof -i tcp:18550 | grep LISTEN | awk '{print $2}' | xargs kill -9
kill: you need to specify whom to kill
/go/src/github.com/spacemeshos/go-spacemesh # apk add busybox-extras
(1/1) Installing busybox-extras (1.31.1-r19)
Executing busybox-extras-1.31.1-r19.post-install
Executing busybox-1.31.1-r16.trigger
OK: 160 MiB in 44 packages
/go/src/github.com/spacemeshos/go-spacemesh # busybox-extras telnet localhost 18550
Connected to localhost
@hello
/go/src/github.com/spacemeshos/go-spacemesh # lsof -i tcp:18550
1       /bin/busybox    /dev/pts/0
1       /bin/busybox    /dev/pts/0
1       /bin/busybox    /dev/pts/0
1       /bin/busybox    /dev/tty
2941    /tmp/poet/poet  /dev/null
2941    /tmp/poet/poet  /dev/null
2941    /tmp/poet/poet  pipe:[865420]
2941    /tmp/poet/poet  /root/.poet/logs/poet.log
2941    /tmp/poet/poet  anon_inode:[eventpoll]
2941    /tmp/poet/poet  pipe:[862641]
2941    /tmp/poet/poet  pipe:[862641]
2941    /tmp/poet/poet  socket:[862642]
2941    /tmp/poet/poet  socket:[862646]
2941    /tmp/poet/poet  socket:[864392]
2941    /tmp/poet/poet  socket:[863516]
2941    /tmp/poet/poet  /tmp/poet/data/1680/challengesDb/LOCK
2941    /tmp/poet/poet  /tmp/poet/data/1680/challengesDb/LOG
2941    /tmp/poet/poet  /tmp/poet/data/1680/challengesDb/MANIFEST-000000
2941    /tmp/poet/poet  /tmp/poet/data/1680/challengesDb/000001.log
2941    /tmp/poet/poet  /tmp/poet/data/1679/layercache_0.bin (deleted)
/go/src/github.com/spacemeshos/go-spacemesh # apk add lsof
(1/1) Installing lsof (4.93.2-r0)
Executing busybox-1.31.1-r16.trigger
OK: 160 MiB in 45 packages
/go/src/github.com/spacemeshos/go-spacemesh # lsof -i tcp:18550
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
poet    2941 root   11u  IPv4 862642      0t0  TCP localhost:18550 (LISTEN)
poet    2941 root   13u  IPv4 864392      0t0  TCP localhost:50380->localhost:18550 (ESTABLISHED)
poet    2941 root   14u  IPv4 863516      0t0  TCP localhost:18550->localhost:50380 (ESTABLISHED)
/go/src/github.com/spacemeshos/go-spacemesh # lsof -i tcp:18550 | grep LISTEN | awk '{print $2}'
2941
/go/src/github.com/spacemeshos/go-spacemesh # lsof -i tcp:18550 | grep LISTEN | awk '{print $2}' | xargs kill -9
/go/src/github.com/spacemeshos/go-spacemesh # lsof -i tcp:18550
/go/src/github.com/spacemeshos/go-spacemesh # 

This causes problems inside the harness:

args := fmt.Sprintf("lsof -i tcp:%d | grep LISTEN | awk '{print $2}' | xargs kill -9", addr.Port)

I see this error from several of the go-spacemesh tests in github.com/spacemeshos/go-spacemesh/cmd/node including this one:

/go/src/github.com/spacemeshos/go-spacemesh # go test -timeout 0 -p 1 -count 1 -v github.com/spacemeshos/go-spacemesh/cmd/node -run Test_PoETHarnessSanity
=== RUN   Test_PoETHarnessSanity
    Test_PoETHarnessSanity: app_test.go:72:
                Error Trace:    app_test.go:72
                Error:          Received unexpected error:
                                error during killing process: exit status 123 | kill: you need to specify whom to kill
                Test:           Test_PoETHarnessSanity
--- FAIL: Test_PoETHarnessSanity (1.16s)
FAIL
FAIL    github.com/spacemeshos/go-spacemesh/cmd/node    1.184s
FAIL

This code should make sure the output of the shell pipeline is not empty before passing it to kill.
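A sketch of such a guard, assuming the harness keeps the same lsof pipeline but parses the PIDs in Go instead of piping them into xargs kill. listenerPIDs and killListeners are hypothetical helpers, not the harness's actual functions.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// listenerPIDs parses whitespace-separated PIDs, as printed by
// `lsof ... | grep LISTEN | awk '{print $2}'`. Empty output is an error
// rather than something to hand to kill.
func listenerPIDs(output string) ([]int, error) {
	fields := strings.Fields(output)
	if len(fields) == 0 {
		return nil, fmt.Errorf("no listening process found")
	}
	pids := make([]int, 0, len(fields))
	for _, f := range fields {
		pid, err := strconv.Atoi(f)
		if err != nil {
			return nil, fmt.Errorf("unexpected lsof output %q: %w", f, err)
		}
		pids = append(pids, pid)
	}
	return pids, nil
}

// killListeners kills every process listening on the given TCP port.
func killListeners(port int) error {
	cmd := fmt.Sprintf("lsof -i tcp:%d | grep LISTEN | awk '{print $2}'", port)
	out, err := exec.Command("sh", "-c", cmd).Output()
	if err != nil {
		return fmt.Errorf("running lsof: %w", err)
	}
	pids, err := listenerPIDs(string(out))
	if err != nil {
		return fmt.Errorf("port %d: %w", port, err)
	}
	for _, pid := range pids {
		p, err := os.FindProcess(pid)
		if err != nil {
			return err
		}
		if err := p.Kill(); err != nil {
			return fmt.Errorf("killing pid %d: %w", pid, err)
		}
	}
	return nil
}

func main() {
	// Only the parsing is exercised here; killListeners would touch real processes.
	fmt.Println(listenerPIDs("2941\n"))
	fmt.Println(listenerPIDs(""))
}
```

With the busybox lsof shown above, the pipeline prints nothing, so the harness would get a clear "no listening process found" error instead of the exit status 123 from xargs.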

Integration with node broken

Recent updates #66 and #67 broke integration with the go-spacemesh node. Please check the node and run tests before, or at least after, merging changes.

Those changes break all integration in a way that go-spacemesh cannot be tested right now.

Implement sha256 using Intel sha256 instructions

This is an epic - should be broken down to several tasks

Background

  • Intel accelerated crypto instructions define sha256 extension instructions - they have not been widely implemented on the last few generations of Intel CPUs
  • AMD ships both server and desktop CPUs with the sha extension instructions (EPYC and RYZEN)
  • Servers with EPYC CPUs are now available on AWS (see https://aws.amazon.com/about-aws/whats-new/2018/11/introducing_amazon_ec2_instances_featuring_amd_epyc_processors/), and are good candidates to run the Spacemesh production poet for the testnet timeframe
  • Benchmarks show a 3x-5x performance improvement over optimized sha256 using Intel AVX2 instructions - there is no faster published sha256 hardware benchmark that I could find
  • There are already several open source libraries that use the SHA instructions to compute SHA256 for a small input buffer (add links)

Tasks

  • Integrate and test sha256() using Intel sha256 instructions using assembly code published by open source libraries
  • The hardware optimized sha256 should be automatically used when the CPU supports SHA (cpuid=SHA)...
  • Security and correctness review of the implementation by our research team
  • Test and profile on both RYZEN (desktop) and EPYC (server) AMD systems

@iddo333 @moshababo @zalmen @tal-m

Every execution round should be followed by an idle period

The PoET period should consist of an execution part, in which the PoSW is executed, and an idle period, during which the poet waits for the beginning of the next execution round. The purpose of the idle period is to allow miners to generate a PoST proof and get ready for the next round of PoET (assuming that a miner will mostly work with the same PoET for many epochs).

Re-broadcast whenever new gateway nodes are set

@noamnelke wrote:

When all PoET server attempts to broadcast a proof fail we currently can’t re-broadcast a proof without restarting the PoET server. This can easily be improved by initiating a re-broadcast whenever new gateway nodes are set via the API.

Out of memory crash

Either a memory leak or an out-of-memory issue in the current code base. n=33, 32GB RAM. Crashes at 75% of DAG generation. Blocks further benchmarks. We need a better way to generate the DAG than the naive in-memory depth-first traversal that is currently implemented. @zalmen @moshababo @noamnelke

Add versioning, tagging, docker push

We ran into an issue today where after #105 was merged, go-spacemesh system tests started failing because they don't pin the poet version (see spacemeshos/go-spacemesh#2189). In order to facilitate pinning, we should add versioning here and set up automatic dockerbuild/push when a new version is merged or a new release is cut.

Cleanup proof endpoints

Now that PoET sends membership and PoET proofs via gossip, we should clean up the old gRPC endpoints for obtaining proofs. This may require changing how some tests work.

Document usage

There's currently no documentation on how to run the POET server. Add at least a basic "# Running the server" section to the README. It should contain instructions on how to connect go-spacemesh to it.

Happy to work on this.

@moshababo, does anything special need to be done? When I run poet locally, I noticed that it's looking for a poet.conf. Can we add a sample of this to the repo? Do I need to supply any commandline args? Or should go-spacemesh connect to it (on the default port) with no special params/config required?

service.Info() accesses resources concurrently without protecting

service.go:Info()
reads from openRound and executingRounds while the routine started in Start
writes to these variables. There is even protection (a mutex) on executingRounds inside Start,
but none when reading in Info().
Note that Info() can be called at any time, as it is triggered by an API call.
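A minimal sketch of one way to close this race, assuming a single sync.RWMutex guards both fields. The Service and InfoResponse types below are simplified, hypothetical stand-ins; the real structs hold round objects rather than plain IDs.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Service is a simplified, hypothetical version of the poet service state.
type Service struct {
	mu              sync.RWMutex // guards openRound and executingRounds
	openRound       string
	executingRounds map[string]struct{}
}

// InfoResponse is a hypothetical snapshot returned to API callers.
type InfoResponse struct {
	OpenRoundID        string
	ExecutingRoundsIDs []string
}

func NewService(firstRound string) *Service {
	return &Service{openRound: firstRound, executingRounds: make(map[string]struct{})}
}

// executeOpenRound moves the open round to the executing set and opens next.
func (s *Service) executeOpenRound(next string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.executingRounds[s.openRound] = struct{}{}
	s.openRound = next
}

// Info takes the read lock and returns a copy, so callers never see
// the fields while a writer is halfway through an update.
func (s *Service) Info() InfoResponse {
	s.mu.RLock()
	defer s.mu.RUnlock()
	ids := make([]string, 0, len(s.executingRounds))
	for id := range s.executingRounds {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return InfoResponse{OpenRoundID: s.openRound, ExecutingRoundsIDs: ids}
}

func main() {
	s := NewService("1")
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // writer, like the routine started in Start
		defer wg.Done()
		for i := 2; i <= 100; i++ {
			s.executeOpenRound(fmt.Sprint(i))
		}
	}()
	go func() { // reader, like concurrent API calls
		defer wg.Done()
		for i := 0; i < 100; i++ {
			_ = s.Info()
		}
	}()
	wg.Wait()
	info := s.Info()
	fmt.Println(info.OpenRoundID, len(info.ExecutingRoundsIDs))
}
```

Running a test like this with go test -race is a cheap way to confirm that no unprotected access remains.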

cc: @moshababo

Faulty optimization in PoET verification

In verifier.Verify, the optimization that marks the proof as valid if we already handled one of the siblings on the way to the root is wrong. The fact that we reached a known sibling says nothing about the validity of the path that we already calculated (i.e. from the leaf to the current position). A correct optimization requires storing sibling pairs in the cache (instead of only one of them); if we reach a cached pair, we can indeed stop the verification and mark the proof as valid.
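The idea can be sketched on a plain binary Merkle tree. This is an illustration under assumptions, not the PoET verifier: the tree layout, hashPair, verifyPath, and the cache types are all hypothetical, and the cache is assumed to be used for a single root only.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// pairKey identifies a sibling pair by tree level and index of its left node.
type pairKey struct{ level, leftIndex int }

// pair holds both sibling values, which is what makes the shortcut sound.
type pair struct{ left, right []byte }

func hashPair(l, r []byte) []byte {
	h := sha256.Sum256(append(append([]byte{}, l...), r...))
	return h[:]
}

// verifyPath checks that leaf at index hashes up to root via siblings
// (ordered bottom-up). Pairs are cached only after the path is proven valid.
func verifyPath(root, leaf []byte, index int, siblings [][]byte, cache map[pairKey]pair) bool {
	cur := leaf
	visited := make(map[pairKey]pair)
	commit := func() {
		for k, p := range visited {
			cache[k] = p
		}
	}
	for level, sib := range siblings {
		left, right := cur, sib
		if index%2 == 1 {
			left, right = sib, cur
		}
		key := pairKey{level, index &^ 1}
		if known, ok := cache[key]; ok {
			// Stop early only if BOTH values match the verified pair; this
			// ties the value computed from the leaf to a known-valid path.
			if !bytes.Equal(known.left, left) || !bytes.Equal(known.right, right) {
				return false
			}
			commit()
			return true
		}
		visited[key] = pair{left, right}
		cur = hashPair(left, right)
		index /= 2
	}
	if !bytes.Equal(cur, root) {
		return false
	}
	commit()
	return true
}

func main() {
	l := [][]byte{[]byte("a"), []byte("b"), []byte("c"), []byte("d")}
	n01, n23 := hashPair(l[0], l[1]), hashPair(l[2], l[3])
	root := hashPair(n01, n23)
	cache := make(map[pairKey]pair)

	fmt.Println(verifyPath(root, l[0], 0, [][]byte{l[1], n23}, cache))        // full walk to the root
	fmt.Println(verifyPath(root, l[1], 1, [][]byte{l[0], n23}, cache))        // stops at the cached pair
	fmt.Println(verifyPath(root, []byte("x"), 1, [][]byte{l[0], n23}, cache)) // forged leaf is rejected
}
```

In the last call the sibling l[0] is already known, so a cache that stored only single nodes would have accepted the forged leaf; comparing the full pair rejects it.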

PoET CI should be executed on Windows and MacOS

Summary

PoET is used as a dependency of go-spacemesh. Since go-spacemesh is built for Linux, MacOS and Windows, PoET should also build on all 3 platforms.

Acceptance criteria

  • Update CI to build and run tests on all 3 platforms
    • Add standard set of linters & checkers in CI (staticcheck, golangci-lint, test-fmt, etc.)
    • Report passing & failing tests using JUnit Report
  • Target Go 1.19 instead of 1.18
  • Fix existing tests that might fail on non-Linux platforms
