status-im / nim-blscurve

Nim implementation of BLS signature scheme (Boneh-Lynn-Shacham) over Barreto-Lynn-Scott (BLS) curve BLS12-381

License: Apache License 2.0

Languages: Nim 94.80%, Sage 3.83%, Shell 1.21%, C 0.16%
Topics: bls, cryptography, pairing, pairing-cryptography, signature-scheme, elliptic-curve-cryptography, elliptic-curves, elliptic-curve-arithmetic

nim-blscurve's Introduction

BLS Signature Scheme over BLS12-381 pairing-friendly curve


This library implements:

  • The BLS signature scheme (Boneh-Lynn-Shacham)
  • over the BLS12-381 (Barreto-Lynn-Scott) pairing-friendly curve

Cipher suite ID: BLS_SIG_BLS12381G2_XMD:SHA-256_SSWU_RO_POP_
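For orientation, here is a minimal sign/verify sketch in Nim. The keyGen/sign/verify entry points and their exact signatures are assumptions based on the scheme described above, not a verbatim copy of the exported API; check the module's exports before relying on them.

import blscurve

var ikm: array[32, byte]
# Fill `ikm` from a cryptographically secure RNG in real code;
# it is left zeroed here purely for illustration.

var sk: SecretKey
var pk: PublicKey
doAssert keyGen(ikm, pk, sk)   # (PK, SK) = KeyGen(IKM), assumed entry point

let msg = "message to authenticate"
let sig = sk.sign(msg)         # assumed signing entry point
doAssert pk.verify(msg, sig)   # assumed verification entry point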

Installation

You can install the development version of the library through nimble with the following command:

nimble install https://github.com/status-im/nim-blscurve

Implementation stability

This repo follows Ethereum 2.0 requirements.

Besides the standardization work described below, no changes are planned upstream for the foreseeable future.

Standardization

Currently (Jun 2019) a cross-blockchain working group is working to standardize BLS signatures for the following blockchains:

  • Algorand
  • Chia Network
  • Dfinity
  • Ethereum 2.0
  • Filecoin
  • Zcash Sapling

Signature scheme

Hashing to curve

Note: the implementation was done following Hash-to-curve draft v7. Drafts v9 and v7 are protocol-compatible but have cosmetic changes (variable naming, precomputed constants, ...).

Curve implementation

Backend

This library uses:

  • BLST (SupraNational) on 64-bit platforms targeting ARM64, or x86-64 with SSSE3
  • Miracl otherwise

BLST uses SSSE3 by default if supported on the host. To disable it when building binaries destined for older CPUs, pass -d:BLSTuseSSSE3=0 to the Nim compiler.

Executing the test suite

We recommend working within the nimbus build environment described here: https://github.com/status-im/nim-beacon-chain/

To execute the test suite, just navigate to the root of this repo and execute:

nimble test

Please note that within the nimbus build environment, the repository will be located in nim-beacon-chain/vendor/nim-blscurve.

Executing the fuzzing tests

Before you start, please make sure that the regular test suite executes successfully (see the instructions above). To start a particular fuzzing test, navigate to the root of this repo and execute:

nim tests/fuzzing/run_fuzzing_test.nims <test-name>

You can specify the fuzzing engine being used by passing an additional --fuzzer parameter. The currently supported engines are libFuzzer (used by default) and afl.

All fuzzing tests are located in tests/fuzzing and use the following naming convention:

fuzz_<test-name>.nim

License

Licensed and distributed under either of

  • MIT license: http://opensource.org/licenses/MIT
  • Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

at your option. These files may not be copied, modified, or distributed except according to those terms.

Dependencies

  • SupraNational BLST is distributed under the Apache License, Version 2.0

nim-blscurve's People

Contributors

arnetheduck, asanso, avitkauskas, cheatfate, etan-k, etan-status, henridf, jangko, markspanbroek, mratsim, narimiran, stefantalpalaru, swader, tersec, yglukhov, yyoncho, zah


nim-blscurve's Issues

Alternate streaming scheme

In the Shasper spec, validators will have a bls_proof_of_possession; see https://github.com/ethereum/eth2.0-specs/blob/master/specs/casper_sharding_v2.1.md#pow-main-chain-changes and https://github.com/ethereum/eth2.0-specs/blob/master/specs/casper_sharding_v2.1.md#routine-for-adding-a-validator

This means that we do not need to build a big sequence of validator keys and then sort it; we can directly loop over the validators sequence.

Actually, it's probably required to sign in that order.

Miracl backend: Accelerate "KeyValidate"

Currently, verification incurs a significant scalar multiplication overhead due to the repeated need to validate public keys.

https://tools.ietf.org/html/draft-irtf-cfrg-bls-signature-04#section-2.5

2.5.  KeyValidate

   The KeyValidate algorithm ensures that a public key is valid.  In
   particular, it ensures that a public key represents a valid, non-
   identity point that is in the correct subgroup.  See Section 5.1 for
   further discussion.

   As an optimization, implementations MAY cache the result of
   KeyValidate in order to avoid unnecessarily repeating validation for
   known keys.

   result = KeyValidate(PK)

   Inputs:
   - PK, a public key in the format output by SkToPk.

   Outputs:
   - result, either VALID or INVALID

   Procedure:
   1. xP = pubkey_to_point(PK)
   2. If xP is INVALID, return INVALID
   3. If xP is the identity element, return INVALID
   4. If pubkey_subgroup_check(xP) is INVALID, return INVALID
   5. return VALID

The pubkey_subgroup_check is costly:

func subgroupCheck(P: GroupG1 or GroupG2): bool =
  ## Checks that a point `P`
  ## is actually in the subgroup G1/G2 of the BLS Curve
  var rP = P
  {.noSideEffect.}:
    rP.mul(CURVE_Order)
  result = rP.isInf()

We have 2 solutions:

  1. Instead of using scalar multiplication, use Bowe19 for the Miracl backend, as BLST does; see pairingwg/bls_standard#21
subgroup_test_g2(P)

Input: a point P
Output: True if P is in the order-q subgroup of E2, else False

Constants:
- z = -0xd201000000010000
- point_at_infinity_E2 is the point at infinity on E2

Steps:
1. pP = psi(P)
2. pP = psi(pP)
3. Q = P - pP
4. pP = psi(pP)
5. pP = z * pP
6. Q = Q + pP
7. return Q == point_at_infinity_E2

psi is already defined for clearCofactor

  2. We can cache the result of "KeyValidate" as suggested in the spec, probably by introducing a CheckedPublicKey type. Note that BLST doesn't offer verification primitives that skip the subgroup check.
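A sketch of solution 2, reusing the PublicKey and subgroupCheck shown above; the CheckedPublicKey name is the one proposed in this issue, and isInf as the point-at-infinity test is an assumption:

type
  CheckedPublicKey* = object
    ## Only constructible through keyValidate, so holders know the
    ## subgroup check has already run for this key.
    pk: PublicKey

func keyValidate*(pk: PublicKey, checked: var CheckedPublicKey): bool =
  ## One-shot KeyValidate (spec section 2.5). Caching is achieved by
  ## carrying the key in a type that verification APIs can trust
  ## without re-checking.
  if pk.point.isInf():
    return false
  if not subgroupCheck(pk.point):
    return false
  checked.pk = pk
  true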

Compiler regression with Nim > 1.2.0

Upstream nim-lang/Nim#14136

The following does not work on latest Nim devel:

type
  MDigest*[bits: static[int]] = object
    ## Message digest type
    data*: array[bits div 8, byte]

  Sha2Context*[bits: static[int],
               bsize: static[int],
               T: uint32|uint64] = object
    count: array[2, T]
    state: array[8, T]
    buffer: array[bsize, byte]

  sha256* = Sha2Context[256, 64, uint32]

template hmacSizeBlock*(h: typedesc): int =
  when (h is Sha2Context):
    int(h.bsize)
  else:
    {.fatal: "Choosen hash primitive is not yet supported!".}

type
  HMAC*[HashType] = object
    ## HMAC context object.
    mdctx: HashType
    opadctx: HashType
    ipad: array[HashType.hmacSizeBlock, byte]
    opad: array[HashType.hmacSizeBlock, byte]

func hkdfExtract*[T;S,I: char|byte](ctx: var HMAC[T],
                     prk: var MDigest[T.bits], # <------- error here "Error: type expected"
                     salt: openArray[S],
                     ikm: openArray[I]
                    ) =
  discard

var ctx: HMAC[sha256]
var prk: MDigest[sha256.bits]
let salt = [byte 0x00, 0x01, 0x02]
let ikm = "CompletelyRandomInput"

ctx.hkdfExtract(prk, salt, ikm)

This was not detected because, due to #39, nim-blscurve was temporarily removed from the "important packages" test suite.

Note: IETF Hash-To-Curve drafts 6 and 7 remove the need for HKDF, though we use it in nim-eth and nim-libp2p as well.

Upgrade BLST and remove 2 now unnecessary workarounds

Our BLST target commit is slightly old. We should update.

  1. The main benefit is the release of a portable SHA256 which doesn't require SSSE3:

And thus we wouldn't require this workaround:

when BLS_FORCE_BACKEND == "blst" or (
       BLS_FORCE_BACKEND == "auto" and
       sizeof(int) == 8 and
       (defined(arm64) or (
         defined(amd64) and
         gorgeEx(getEnv("CC", "gcc") & " -march=native -dM -E -x c /dev/null | grep -q SSSE3").exitCode == 0))
     ):
  const BLS_BACKEND* = BLST
else:
  const BLS_BACKEND* = Miracl

  2. The second benefit is that the aliasing bug supranational/blst#22 has been closed upstream

And thus we wouldn't require this workaround

when defined(gcc):
  # * Using ftree-loop-vectorize will miscompile scalar multiplication,
  #   for example used to derive the public key in blst_sk_to_pk_in_g1
  # * Using ftree-slp-vectorize miscompiles something when used
  #   in nim-beacon-chain in Travis CI (TODO: test case)
  # no-tree-vectorize removes both
  {.passC: "-fno-tree-vectorize".}

  3. This is not just a submodule bump; we need to take into account the API change at

Key generation according to draft standard

The current key generation predates the draft standard which is likely to be adopted (though it expired on Feb 9)

Implementation

proc random*(a: var BIG_384) =
  ## Generates random big number `bit by bit` using nimcrypto's sysrand
  ## generator.
  var
    rndBuffer: array[MODBYTES_384, byte]
    rndByte: byte
    j: int32
    k: int32
  doAssert(randomBytes(rndBuffer) == MODBYTES_384)
  let length = 8 * MODBYTES_384
  a.zero()
  for i in 0..<length:
    if j == 0:
      rndByte = rndBuffer[k]
      inc(k)
    else:
      rndByte = rndByte shr 1
    let bit = Chunk(rndByte and 1'u8)
    BIG_384_shl(a, 1)
    a[0] = a[0] + bit
    inc(j)
    j = j and 0x07

proc randomNum*(a: var BIG_384, q: BIG_384) =
  ## Generates random big number `bit by bit` over modulo ``q`` using
  ## nimcrypto's sysrand generator.
  var
    d: DBIG_384
    rndBuffer: array[MODBYTES_384 * 2, byte]
    rndByte: byte
    j: int32
    k: int32
  doAssert(randomBytes(rndBuffer) == MODBYTES_384 * 2)
  let length = 2 * BIG_384_nbits(q)
  a.zero()
  for i in 0..<length:
    if j == 0:
      rndByte = rndBuffer[k]
      inc(k)
    else:
      rndByte = rndByte shr 1
    let bit = Chunk(rndByte and 1'u8)
    BIG_384_dshl(d, 1)
    d[0] = d[0] + bit
    inc(j)
    j = j and 0x07
  BIG_384_dmod(a, d, q)

proc random*(t: typedesc[SigKey]): SigKey {.inline.} =
  ## Creates new random Signature (Private) key.
  randomNum(result.x, CURVE_Order)

proc random*(t: typedesc[KeyPair]): KeyPair {.inline.} =
  ## Create new random pair of Signature (Private) and Verification (Public)
  ## keys.
  result.sigkey = SigKey.random()
  result.verkey = result.sigkey.getKey()

Draft standard

https://tools.ietf.org/html/draft-irtf-cfrg-bls-signature-00#section-2.3

   The KeyGen algorithm generates a pair (PK, SK) deterministically
   using the secret octet string IKM.

   KeyGen uses HKDF \[RFC5869\] instantiated with the hash function H.

   For security, IKM MUST be infeasible to guess, e.g., generated by a
   trusted source of randomness.  IKM MUST be at least 32 bytes long,
   but it MAY be longer.
   Because KeyGen is deterministic, implementations MAY choose either to
   store the resulting (PK, SK) or to store IKM and call KeyGen to
   derive the keys when necessary.

   (PK, SK) = KeyGen(IKM)

   Inputs:
   - IKM, a secret octet string. See requirements above.

   Outputs:
   - PK, a public key encoded as an octet string.
   - SK, the corresponding secret key, an integer 0 <= SK < r.

   Definitions:
   - HKDF-Extract is as defined in RFC5869, instantiated with hash H.
   - HKDF-Expand is as defined in RFC5869, instantiated with hash H.
   - L is the integer given by ceil((1.5 * ceil(log2(r))) / 8).
   - "BLS-SIG-KEYGEN-SALT-" is an ASCII string comprising 20 octets.
   - "" is the empty string.

   Procedure:
   1. PRK = HKDF-Extract("BLS-SIG-KEYGEN-SALT-", IKM)
   2. OKM = HKDF-Expand(PRK, "", L)
   3. x = OS2IP(OKM) mod r
   4. xP = x * P
   5. SK = x
   6. PK = point_to_pubkey(xP)
   7. return (PK, SK)

HKDF with extract/expand step separation is already implemented as part of the hash_to_curve PR.
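A sketch of the draft KeyGen on top of that extract/expand split, following the hkdfExtract/hkdfExpand shapes quoted in this tracker and the Milagro helpers (fromBytes, BIG_384_dmod) shown in other issues; exact signatures are assumptions:

func keyGenDraft(ikm: openArray[byte], seckey: var BIG_384): bool =
  if ikm.len < 32:
    return false                    # IKM MUST be at least 32 bytes
  var ctx: HMAC[sha256]
  var prk: MDigest[sha256.bits]
  # 1. PRK = HKDF-Extract("BLS-SIG-KEYGEN-SALT-", IKM)
  ctx.hkdfExtract(prk, "BLS-SIG-KEYGEN-SALT-", ikm)
  # 2. OKM = HKDF-Expand(PRK, "", L), L = ceil((1.5 * ceil(log2(r))) / 8) = 48
  const L = 48
  var okm: array[L, byte]
  ctx.hkdfExpand(prk, "", okm)
  # 3-5. SK = OS2IP(OKM) mod r
  var okmBig: DBIG_384
  if not okmBig.fromBytes(okm):
    return false
  BIG_384_dmod(seckey, okmBig, CURVE_Order)
  true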

[BLST] Remove heap allocation on aggregateVerify

Currently this is needed:

func aggregateVerify*(
       publicKeys: openarray[PublicKey],
       proofs: openarray[ProofOfPossession],
       messages: openarray[string or seq[byte]],
       signature: Signature): bool =
  ## Check that an aggregated signature over several (publickey, message) pairs
  ## returns `true` if the signature is valid, `false` otherwise.
  ##
  ## Compared to the IETF spec API, it is modified to
  ## enforce proper usage of the proof-of-possessions
  # Note: we can't have openarray of openarrays until openarrays are first-class value types
  if publicKeys.len != proofs.len or publicKeys.len != messages.len:
    return false
  if not(publicKeys.len >= 1):
    return false
  # TODO: un-ref (stack smashing)
  var ctx{.noInit.}: ref ContextCoreAggregateVerify
  new ctx
  ctx[].init()
  for i in 0 ..< publicKeys.len:
    if not publicKeys[i].popVerify(proofs[i]):
      return false
    ctx[].update(publicKeys[i], messages[i], DST)
  return ctx[].finish(signature)

The context must be a ref, or we get corrupted results.

Update keygen for BLS v2

The HKDF info parameter was changed in the BLS draft v2 as mentioned in ethereum/EIPs#2337 (comment)

Current

1. PRK = HKDF-Extract("BLS-SIG-KEYGEN-SALT-", IKM)
2. OKM = HKDF-Expand(PRK, "", L)
3. SK = OS2IP(OKM) mod r
4. return SK

# 2. OKM = HKDF-Expand(PRK, "", L)
const L = 48
var okm: array[L, byte]
ctx.hkdfExpand(prk, "", okm)

Target

1. PRK = HKDF-Extract("BLS-SIG-KEYGEN-SALT-", IKM || I2OSP(0, 1))
2. OKM = HKDF-Expand(PRK, key_info || I2OSP(L, 2), L)
3. SK = OS2IP(OKM) mod r
4. return SK
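A sketch of the target on top of the snippet above, using the append parameter visible in the hkdf.nim excerpt quoted in the HKDF issue below (its position in hkdfExpand's signature, and the corresponding hkdfExtract change, are assumptions):

# 1. PRK = HKDF-Extract("BLS-SIG-KEYGEN-SALT-", IKM || I2OSP(0, 1))
#    -> append a single 0x00 byte to IKM before extraction
# 2. OKM = HKDF-Expand(PRK, key_info || I2OSP(L, 2), L)
let keyInfo = ""               # optional caller-supplied context
const L = 48
var okm: array[L, byte]
# I2OSP(L, 2) = [0x00, 0x30] for L = 48
ctx.hkdfExpand(prk, keyInfo, okm, append = [byte 0, byte L])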

Divide stack usage by 255 (keystore EIP 2333)

See #60

This implements keystore primitives for EIP-2333 (draft)
ethereum/EIPs#2333

This will unblock status-im/nim-beacon-chain#1093

cc @zah

Ready:

Note that the spec requires over 16KB of stack for parent_SK_to_Lamport_PK.

lamport_0 and lamport_1 are each 255 * 32 bytes; in the code we reorder the spec steps and reuse the same lamport buffer.

# 2. lamport_0 = IKM_to_lamport_SK(IKM, salt)
# TODO: this uses 8KB and has a high stack-overflow potential
var lamport {.noInit.}: array[255, array[32, byte]]
ikm.ikm_to_lamport_SK(salt, lamport)
# TODO: unclear inclusive/exclusive ranges in spec
# assuming exclusive:
# https://github.com/ethereum/EIPs/issues/2337#issuecomment-637521421
# 6. for i = 0 to 255
#        lamport_PK = lamport_PK | SHA256(lamport_0[i])
for i in 0 ..< 255:
  ctx.update(sha256.digest(lamport[i]).data)
# 3. not_IKM = flip_bits(parent_SK)
# We can flip the bits of the IKM instead,
# as flipping bits of the Milagro representation (Montgomery)
# doesn't make sense
var not_ikm {.noInit.}: array[32, byte]
for i in 0 ..< 32:
  not_ikm[i] = not ikm[i]
# 4. lamport_1 = IKM_to_lamport_SK(not_IKM, salt)
# We reuse the previous buffer to limit stack usage
not_ikm.ikm_to_lamport_SK(salt, lamport)
# TODO: inclusive/exclusive range?
# 7. for i = 0 to 255
#        lamport_PK = lamport_PK | SHA256(lamport_1[i])
for i in 0 ..< 255:
  ctx.update(sha256.digest(lamport[i]).data)

We could further divide stack usage by 255 (!) if we have an alternative HKDF Expand iterator:

let oArray = cast[ptr UncheckedArray[byte]](output)
for i in 0 .. N:
  ctx.init(prk.data)
  # T(0) = empty string
  if i != 0:
    ctx.update(t.data)
  ctx.update(info)
  ctx.update([uint8(i+1)])
  discard ctx.finish(t.data)
  let iStart = i * HashLen
  let size = min(HashLen, output.len - iStart)
  copyMem(oArray[iStart].addr, t.data.addr, size)
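A sketch of that iterator, reusing the types from the excerpt above: it yields one 32-byte chunk per HMAC round instead of materializing the whole 255 * 32-byte Lamport array. Names are illustrative, not the repo's API:

iterator hkdfExpandChunks(prk: MDigest[sha256.bits],
                          info: openArray[byte]): array[32, byte] =
  var ctx: HMAC[sha256]
  var t: MDigest[sha256.bits]
  for i in 0 ..< 255:
    ctx.init(prk.data)
    if i != 0:
      ctx.update(t.data)   # T(i) chains on T(i-1); T(0) is the empty string
    ctx.update(info)
    ctx.update([uint8(i + 1)])
    discard ctx.finish(t.data)
    yield t.data

# Each Lamport chunk can then be hashed into lamport_PK immediately,
# without the 8KB buffer:
#   for chunk in hkdfExpandChunks(prk, salt):
#     ctx.update(sha256.digest(chunk).data)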

Note: the tree might change to 32 instead of 255 as per ethereum/EIPs#2337

[SEC] Missing `norm` Operations

labels: nbc-audit-2020-1, status:reported
labels: difficulty:high, severity:medium, type:bug

Description

The isogeny_map_G2() function excerpted below as implemented in hash_to_curve.nim implements the 3-isogeny map from a point P' (x', y') on G'2 to a point P(x, y) on the G2 curve of BLS12-381. The bulk of the logic relates to similar calculations of xNum, xDen, yNum and yDen with the first shown below.

# xNum = k(1,3) * x'³ + k(1,2) * x'² + k(1,1) * x' + k(1,0)
let xNum = block:
  var xNum = k13.mul(xp3)
  norm(xNum)
  xNum.add xNum, k12.mul(xp2)
  norm(xNum)
  xNum.add xNum, k11.mul(xp)
  norm(xNum)
  xNum.add xNum, k10
  xNum

For 58-bit limbs, the add function ultimately resolves to BIG_384_58_add() in the csources, which does not reduce its result. Thus, the final result is missing a norm() operation prior to its return. This can impact correct operation as the xNum and yNum values subsequently become operands in a multiplication where the comments in the FP2_BLS381_mul() execution path state that inputs must be normed. A similar scenario applies to the case of 29-bit limbs.

Exploit Scenario

The logic may calculate incorrect results that will be extremely difficult to debug.

Mitigation Recommendation

Add a final norm to the calculation of xNum, xDen, yNum and yDen.
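Applied to the xNum excerpt above, the fix is a single extra norm after the last add, so the value is reduced before it feeds FP2_BLS381_mul, whose inputs must be normed:

# xNum = k(1,3) * x'³ + k(1,2) * x'² + k(1,1) * x' + k(1,0)
let xNum = block:
  var xNum = k13.mul(xp3)
  norm(xNum)
  xNum.add xNum, k12.mul(xp2)
  norm(xNum)
  xNum.add xNum, k11.mul(xp)
  norm(xNum)
  xNum.add xNum, k10
  norm(xNum)   # <-- the missing final norm
  xNum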

References

  1. # xNum = k(1,3) * x'³ + k(1,2) * x'² + k(1,1) * x' + k(1,0)

nim CI broken: eth2_vectors.nim(17, 3) Error: cannot open file: yaml

/cc @mratsim

  • your recent commit might've caused this f017051

  • wouldn't nim-blscurve's own CI prevent such regressions? => might indicate a bug in this repo's CI, which said: 9 checks passed

https://dev.azure.com/nim-lang/255dfe86-e590-40bb-a8a2-3c0295ebdeb1/_apis/build/builds/2970/logs/68

2020-03-01T22:13:52.0997954Z PASS: https://github.com/bluenote10/nim-heap C                     ( 4.59764791 secs)
2020-03-01T22:13:52.0998413Z FAIL: https://github.com/status-im/nim-blscurve C
2020-03-01T22:13:52.0998905Z Test "https://github.com/status-im/nim-blscurve" in category "nimble-packages"
2020-03-01T22:13:52.0999336Z Failure: reBuildFailed
2020-03-01T22:13:52.0999628Z package test failed
2020-03-01T22:13:52.0999928Z $ nimble test
2020-03-01T22:13:52.1000511Z   Executing task test in D:\a\1\s\pkgstemp\blscurve\blscurve.nimble
2020-03-01T22:13:52.1000900Z 
2020-03-01T22:13:52.1001244Z [Suite] [Before IETF standard] BLS381-12 test suite (public interface)
2020-03-01T22:13:52.1001553Z 
2020-03-01T22:13:52.1002130Z [Suite] [v0.9.x] Ethereum2 specification BLS381-12 test vectors suite
2020-03-01T22:13:52.1002653Z D:\a\1\s\pkgstemp\blscurve\tests\eth2_vectors.nim(17, 3) Error: cannot open file: yaml
2020-03-01T22:13:52.1003159Z stack trace: (most recent call last)
2020-03-01T22:13:52.1003607Z C:\Users\VSSADM~1\AppData\Local\Temp\nimblecache\nimscriptapi.nim(165, 16)
2020-03-01T22:13:52.1004437Z D:\a\1\s\pkgstemp\blscurve\blscurve_6708.nims(40, 8) testTask
2020-03-01T22:13:52.1005523Z D:\a\1\s\pkgstemp\blscurve\blscurve_6708.nims(21, 8) test
2020-03-01T22:13:52.1006040Z D:\a\1\s\lib\system\nimscript.nim(252, 7) exec
2020-03-01T22:13:52.1006683Z D:\a\1\s\lib\system\nimscript.nim(252, 7) Error: unhandled exception: FAILED: nim c -d:BLS_USE_IETF_API=true --outdir:build -r --hints:off --warnings:off tests/eth2_vectors.nim [OSError]
2020-03-01T22:13:52.1007324Z        Tip: 1 messages have been suppressed, use --verbose to show them.
2020-03-01T22:13:52.1007766Z      Error: Exception raised during nimble script execution
2020-03-01T22:13:52.1008035Z 
2020-03-01T22:35:18.4345562Z PASS: https://github.com/status-im/nim-bncurve C                   (127.68649817 secs)
2020-03-01T22:35:18.4347310Z PASS: https://github.com/nim-lang/c2nim C                          ( 3.22719979 secs)

which breaks my recent PRs (e.g. nim-lang/Nim#13546)

Batched signatures verification

One of the upcoming requirements will be catching up to the blockchain and receiving a batch of blocks that will need to be verified.

Assuming we need to verify 3 days of blocks which is below the weak subjectivity period, we would have:

  • 60 seconds * 60 minutes * 24 hours * 3 days = 259200 seconds
  • 1 block every 6 seconds: --> 43200 blocks to verify.

At the current speed on a low-power mobile device (#28), we can lower-bound throughput at 30 blocks per second per core.

So catching up to 3 days of blocks would require 1440s (24min) for cryptography alone on a single core.

We can batch signature verifications by doing multiexponentiation with Pippenger's algorithm and get a ~2x speedup; a sketch follows the links below.

See Vitalik's writeup: https://ethresear.ch/t/simple-guide-to-fast-linear-combinations-aka-multiexponentiations/7238

Aztec implementation: https://github.com/AztecProtocol/barretenberg/blob/master/barretenberg/src/aztec/ecc/curves/bn254/scalar_multiplication/scalar_multiplication.cpp#L113

Gnark implementation: https://github.com/ConsenSys/gnark/blob/d160f27275a740b879d4132138e642c9c6ea1b0c/ecc/bls381/g1.go#L388

Note: this is different from aggregate signature verification which is implemented.
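For reference, a compact sketch of Pippenger's bucket method for the multiexponentiation sum_i(scalar_i * P_i), over an abstract group providing identity/add/double (all assumed helpers); real implementations such as Aztec's, gnark's or BLST's add signed digits, precomputation and parallelism on top of this skeleton:

proc msmPippenger[Point](points: openArray[Point],
                         scalars: openArray[uint64],
                         c: static int = 4): Point =
  const numWindows = (64 + c - 1) div c   # scalars limited to 64 bits here
  result = identity(Point)
  for w in countdown(numWindows - 1, 0):
    for _ in 0 ..< c:                     # shift the accumulator left by c bits
      result = double(result)
    # bucket[b] accumulates every point whose current window digit is b
    var buckets: array[1 shl c, Point]
    for b in 0 ..< buckets.len:
      buckets[b] = identity(Point)
    for i in 0 ..< points.len:
      let digit = int((scalars[i] shr (w * c)) and uint64((1 shl c) - 1))
      if digit != 0:
        buckets[digit] = add(buckets[digit], points[i])
    # running-sum trick: sum_b(b * bucket[b]) in about 2^(c+1) additions
    var running = identity(Point)
    for b in countdown(buckets.len - 1, 1):
      running = add(running, buckets[b])
      result = add(result, running)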

Deserialization of ECP2_BLS381 is not stable.

An ECP2_BLS381 object's original value is not equal to the deserialized ECP2_BLS381 object's value; the difference is in the XRES fields of FP_BLS381. It's not critical but still needs investigation.

Constant-time operations / side-channel attack resistance

Context

AMCL v3.1. claims that critical calculations are performed in constant-time:

Version 3.1 is a major "under the hood" upgrade. Field arithmetic is
performed using ideas from http://eprint.iacr.org/2017/437 to ensure
that critical calculations are performed in constant time. This strongly
mitigates against side-channel attacks. Exception-free formulae are
now used for Weierstrass elliptic curves. A new standardised script
builds for the same set of curves across all languages.

Obviously the calculations involving private keys must be constant-time, but not having everything constant-time might leave users open to other clever exploits.

Current implementation

For example, comparison to zero or one is not constant-time, and the key length can be deduced from it:

https://github.com/status-im/nim-milagro-crypto/blob/4add8c3441802b9962c966d023b629dcfb207640/src/generated/big_384_29.c#L34-L51
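For contrast, a constant-time zero test does the same amount of work regardless of the value: OR-fold all chunks, then reduce to a boolean without a data-dependent branch. A minimal sketch with 32-bit chunks to match big_384_29:

func isZilchCT(chunks: openArray[uint32]): bool =
  var acc = 0'u32
  for c in chunks:
    acc = acc or c   # same work wherever the nonzero chunks are
  # (acc | -acc) has its top bit set iff acc != 0; map that to a bool
  ((not (acc or (0'u32 - acc))) shr 31) == 1'u32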

Modular inversion uses a lot of if statements, which cannot be constant-time: due to cache and branch-prediction misses, one can deduce which branches were taken:

https://github.com/status-im/nim-milagro-crypto/blob/4add8c3441802b9962c966d023b629dcfb207640/src/generated/big_384_29.c#L1398-L1459

There are also no tests counting clock cycles for the AMCL library.

Others

This is not an isolated issue: even OpenSSL does not have completely constant-time arithmetic (see openssl/openssl#6078), and it has suffered successful side-channel attacks against RSA (CacheBleed), AES and ECDSA.

Testing/implementing constant-time

Wishlist

  • Having a Nim macro that throws a compile-time error if something is not implemented in a constant-time manner or, at least, if a secret variable is used as a conditional.

Updated overview

https://github.com/status-im/nim-constantine/wiki/Constant-time-arithmetics

Azure Pipelines: % encoding failure

We now get the following in Azure

##[warning]%25 detected in ##vso command. In March 2021, the agent command parser will be updated to unescape this to %. To opt out of this behavior, set a job level variable DECODE_PERCENTS to false. Setting to true will force this behavior immediately. More information can be found at https://github.com/microsoft/azure-pipelines-agent/blob/master/docs/design/percentEncoding.md

From https://dev.azure.com/nimbus-dev/nim-blscurve/_build/results?buildId=10247&view=logs&j=4956719e-f3a9-5ba8-becf-10a7bdf2d055&t=3818f934-238f-5252-6b59-89a7dc0ae4f9&l=9

Points at infinity.

For milagro-crypto the points at infinity over FP and FP2 are:

https://github.com/status-im/nim-milagro-crypto/blob/290f927865f9e575920dca5f415c58b554dbe92e/src/milagro_crypto/generated/ecp_BLS381.c#L165-L178

https://github.com/status-im/nim-milagro-crypto/blob/290f927865f9e575920dca5f415c58b554dbe92e/src/milagro_crypto/generated/ecp2_BLS381.c#L41-L49

So for FP:

  • Edwards curve: infinity at (0, 1, 1)
  • Montgomery curve: infinity at (0, 0, 0)
  • Weierstrass curve: infinity at (0, 1, 0)

and for FP2: infinity at (0, 1, 0)

The research implementation of BLS uses the Z1 and Z2 infinity points defined in py_ecc:
https://github.com/ethereum/py_ecc/blob/master/py_ecc/optimized_bn128/optimized_curve.py#L39-L42

# Point at infinity over FQ
Z1 = (FQ.one(), FQ.one(), FQ.zero())
# Point at infinity for twisted curve over FQ2
Z2 = (FQ2.one(), FQ2.one(), FQ2.zero())

https://github.com/ethereum/beacon_chain/blob/eea52999a578fbd29751330a6f2bb27e60c67f7f/beacon_chain/utils/bls.py

from py_ecc.optimized_bn128 import (  # NOQA
    G1,
    G2,
    Z1,
    Z2,

...

def aggregate_sigs(sigs):
    o = Z2
    for s in sigs:
        o = add(o, decompress_G2(s))
    return compress_G2(o)


def aggregate_pubs(pubs):
    o = Z1
    for p in pubs:
        o = add(o, decompress_G1(p))
    return compress_G1(o)

So I'm not sure what curve type py_ecc is using.

Furthermore, BN128 is different from BLS12-381: at best it's a naming problem; otherwise we can't use the proof-of-concept test_bls as a reference.

Tests for fromBytes/fromHex into an uninitialized buffer

#43 and #44 required almost a day of investigation: a BigInt was being initialized by iterative sum, but the destination was not properly zero-initialized, so if the sum didn't reach all the limbs, the last limbs stayed uninitialized.

We need tests to ensure that:

  • BIG384 (SecretKey)
  • ECP (PublicKey)
  • ECP2 (Signature / Proof-of-Possession)
    are properly initialized even if the destination buffer contains random data.
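A sketch of such a test for the BIG384 case: deserialize once into a zero-initialized destination and once into a poisoned one, and require identical results; the same pattern extends to ECP/ECP2. It reuses the fromBytes/fromHex/toHex helpers quoted elsewhere in this tracker (the Chunk conversion is an assumption):

import blscurve/[milagro, common], nimcrypto/utils

proc testNoStaleLimbs() =
  let raw = fromHex"00000000000000000000000000000000292bcd8b78ffc85782ed9c1052f581cca2c96d80a3c46c2bc1527b3cf188022b"
  var clean: BIG_384                 # zero-initialized by Nim
  doAssert clean.fromBytes(raw)
  var dirty: BIG_384
  for i in 0 ..< dirty.len:          # poison the destination limbs
    dirty[i] = Chunk(0xAAAA_AAAA)
  doAssert dirty.fromBytes(raw)
  doAssert dirty.toHex() == clean.toHex()  # every limb must be overwritten

testNoStaleLimbs()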

Add KeyValidate + Sanity checks

The BLS standard has a KeyValidate procedure.

Internally this is unnecessary because if the public key is parsed successfully, it is valid.

Additionally, some sanity checks to ensure secret keys are in the proper range would be helpful to avoid a regression after #40 / #41.
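A sketch of the range check (iszilch and BIG_384_comp are the Milagro-style helpers; the exact binding names are assumptions):

func isValidSecretKey(sk: BIG_384): bool =
  ## Reject 0 and anything >= CURVE_Order.
  not sk.iszilch() and BIG_384_comp(sk, CURVE_Order) < 0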

Stack corruption due to ECP_mul temporary

A followup on #40 and another debugging session on status-im/nimbus-eth2#780

By instrumenting privToPub and ECP_MUL with

func privToPub*(secretKey: SecretKey): PublicKey =
  ## Generates a public key from a secret key
  debugEcho "Entering privToPub"
  debugEcho "  seckey: ", secretKey.toHex()
  result.point = generator1()
  result.point.mul(secretKey.intVal)
  debugEcho "  pubkey: ", result.toHex()
  debugEcho "Exiting privToPub"
void ECP_BLS381_mul(ECP_BLS381 *P,BIG_384_58 e)
{
#if CURVETYPE_BLS381==MONTGOMERY
   // [...]
#else
    /* fixed size windows */
    int i,nb,s,ns;
    BIG_384_58 mt,t;
    ECP_BLS381 Q,W[8],C;
    sign8 w[1+(NLEN_384_58*BASEBITS_384_58+3)/4];

    if (ECP_BLS381_isinf(P)) return;
    if (BIG_384_58_iszilch(e))
    {
        ECP_BLS381_inf(P);
        return;
    }

    ECP_BLS381_affine(P);

    /* precompute table */

    ECP_BLS381_copy(&Q,P);
    ECP_BLS381_dbl(&Q);

    ECP_BLS381_copy(&W[0],P);

    for (i=1; i<8; i++)
    {
        ECP_BLS381_copy(&W[i],&W[i-1]);
        ECP_BLS381_add(&W[i],&Q);
    }

//printf("W[1]= ");ECP_output(&W[1]); printf("\n");

    /* make exponent odd - add 2P if even, P if odd */
    BIG_384_58_copy(t,e);
    s=BIG_384_58_parity(t);
    BIG_384_58_inc(t,1);
    BIG_384_58_norm(t);
    ns=BIG_384_58_parity(t);
    BIG_384_58_copy(mt,t);
    BIG_384_58_inc(mt,1);
    BIG_384_58_norm(mt);
    BIG_384_58_cmove(t,mt,s);
    ECP_BLS381_cmove(&Q,P,ns);
    ECP_BLS381_copy(&C,&Q);

    nb=1+(BIG_384_58_nbits(t)+3)/4;

    printf("ECP_mul:\n");
    printf("  sign8 w[%d]\n", 1+(NLEN_384_58*BASEBITS_384_58+3)/4);
    printf("  nb: %d\n", nb);
    printf("  t: ");
    BIG_384_58_output(t);
    printf("\n");

    /* convert exponent to signed 4-bit window */
    for (i=0; i<nb; i++)
    {
        w[i]=BIG_384_58_lastbits(t,5)-16;
        BIG_384_58_dec(t,w[i]);
        BIG_384_58_norm(t);
        BIG_384_58_fshr(t,4);
    }
    w[nb]=BIG_384_58_lastbits(t,5);

    ECP_BLS381_copy(P,&W[(w[nb]-1)/2]);
    for (i=nb-1; i>=0; i--)
    {
        ECP_BLS381_select(&Q,W,w[i]);
        ECP_BLS381_dbl(P);
        ECP_BLS381_dbl(P);
        ECP_BLS381_dbl(P);
        ECP_BLS381_dbl(P);
        ECP_BLS381_add(P,&Q);
    }
    ECP_BLS381_sub(P,&C); /* apply correction */
#endif
    ECP_BLS381_affine(P);

    printf("Exiting ECP_mul\n");
}

We get the following stacktrace after building beacon_node with

source env.sh
nim c --cc:clang -d:release --import:libbacktrace --verbosity:0 --hints:off --warnings:off --passC:-fsanitize=address --passL:"-fsanitize=address" -o:build/beacon_node_asan beacon_chain/beacon_node
build/beacon_node_asan --nat:extip:127.0.0.1 --data-dir=local_testnet_data/node0 --state-snapshot=local_testnet_data/network_dir/genesis.ssz
INF 2020-03-12 17:19:40+01:00 Initializing networking                    tid=462177 file=eth2_network.nim:148 announcedAddresses=@[/ip4/127.0.0.1/tcp/9000] bootstrapNodes=@[] hostAddress=/ip4/0.0.0.0/tcp/9000
Control socket: /unix/tmp/nim-p2pd-462177-1.sock
Peer ID: 16Uiu2HAm6wkfwKLZor7HvwGbR14n6b7GH7ZjwUj385XpvUHBQmmS
Peer Addrs:
/ip4/127.0.0.1/tcp/9000
INF 2020-03-12 17:19:41+01:00 LibP2P daemon started                      tid=462177 file=eth2_network.nim:179 addresses=@[/ip4/127.0.0.1/tcp/9000] peer=16Uiu2HAm6wkfwKLZor7HvwGbR14n6b7GH7ZjwUj385XpvUHBQmmS
INF 2020-03-12 17:19:41+01:00 Waiting for connections                    topics="beacnde" tid=462177 file=beacon_node.nim:252
INF 2020-03-12 17:19:41+01:00 Starting beacon node                       topics="beacnde" tid=462177 file=beacon_node.nim:909 SECONDS_PER_SLOT=6 SLOTS_PER_EPOCH=8 SPEC_VERSION=0.10.1 cat=init dataDir=local_testnet_data/node0 finalizedRoot=5668f217 finalizedSlot=0 headRoot=5668f217 headSlot=0 pcs=start_beacon_node timeSinceFinalization=-1784 version="0.3.0 (b2faac7, libp2p_daemon)"
 peers: 0 ❯ epoch: 37, slot: 1/8 (297) ❯ finalized epoch: 0 (5668f217)                                                                                                                                                       ETH: 0 Entering privToPub
  seckey: 000000000000000000000000000000005b136035599c4233c2de3ed4e2eb78f1e3bf40cd550b333dc050695878b49075
ECP_mul:
  sign8 w[103]
  nb: 100
  t: d68000000000000000000000000000000005b136035599c4233c2de3ed4e2eb78f1e3bf40cd550b333dc050695878b49077
Exiting ECP_mul
  pubkey: 8a8ce89d5ae099ca6d86c8a25d6f1dc8b5c1d455cd5483699910ada7c9607f568bf5b6971495a99325fe73dc3dd17c6e
Exiting privToPub
WRN 2020-03-12 17:19:41+01:00 Validator not in registry (yet?)           topics="beacnde" tid=462177 file=beacon_node.nim:273 pubKey="real: 0x8a8ce89d5ae099ca6d86c8a25d6f1dc8b5c1d455cd5483699910ada7c9607f568bf5b6971495a99325fe73dc3dd17c6e"
INF 2020-03-12 17:19:41+01:00 Local validator attached                   tid=462177 file=validator_pool.nim:21 pubKey="real: 0x8a8ce89d5ae099ca6d86c8a25d6f1dc8b5c1d455cd5483699910ada7c9607f568bf5b6971495a99325fe73dc3dd17c6e" validator="real: 0x"
 peers: 0 ❯ epoch: 37, slot: 1/8 (297) ❯ finalized epoch: 0 (5668f217)                                                                                                                                                       ETH: 0 Entering privToPub
  seckey: 00000000000000000000000000000000292bcd8b78ffc85782ed9c1052f581cca2c96d80a3c46c2bc1527b3cf188022b
ECP_mul:
  sign8 w[103]
  nb: 104
  t: 0b4907500000000000000000000000000000000292bcd8b78ffc85782ed9c1052f581cca2c96d80a3c46c2bc1527b3cf188022d
=================================================================
==462177==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffedfc8c7e7 at pc 0x5654edbd17e1 bp 0x7ffedfc8bd10 sp 0x7ffedfc8bd08
WRITE of size 1 at 0x7ffedfc8c7e7 thread T0
    #0 0x5654edbd17e0 in ECP_BLS381_mul /home/beta/Programming/Status/nim-beacon-chain/vendor/nim-blscurve/blscurve/csources/64/ecp_BLS381.c:1112:13
    #1 0x5654edfc31c3 in mul__8dasrHsBDivosc1xoLwN9aQcommon /home/beta/Programming/Status/nim-beacon-chain/vendor/nim-blscurve/blscurve/common.nim:297:2
    #2 0x5654edfc31c3 in privToPub__SbUVL7n9atErGXu7gDCy72Q /home/beta/Programming/Status/nim-beacon-chain/vendor/nim-blscurve/blscurve/bls_signature_scheme.nim:99:2
    #3 0x5654edfc5d0e in pubKey__HCx9cqWY5g0ZVsIUBYuD9cVA /home/beta/Programming/Status/nim-beacon-chain/beacon_chain/spec/crypto.nim:101:20
    #4 0x5654ee28ed0b in addLocalValidator__cSSHpZxKVcAbxlaA9bLXQsQ /home/beta/Programming/Status/nim-beacon-chain/beacon_chain/beacon_node.nim:267:11
    #5 0x5654ee290a9e in addLocalValidators__l9bqvDlqEn0zFromo2S35YQ /home/beta/Programming/Status/nim-beacon-chain/beacon_chain/beacon_node.nim:279:13
    #6 0x5654ee2a34dd in start__ZJSNFUSOl2Ftt60X6ooHFQ_2 /home/beta/Programming/Status/nim-beacon-chain/beacon_chain/beacon_node.nim:929:2
    #7 0x5654ee2ad2ce in NimMainModule /home/beta/Programming/Status/nim-beacon-chain/beacon_chain/beacon_node.nim:1172:4
    #8 0x5654ee2af69a in NimMain /home/beta/Programming/Status/nim-beacon-chain/vendor/nim-eth/eth/common/eth_types.nim:595:2
    #9 0x5654ee2af69a in main /home/beta/Programming/Status/nim-beacon-chain/vendor/nim-eth/eth/common/eth_types.nim:602:2
    #10 0x7fda7d270022 in __libc_start_main (/usr/lib/libc.so.6+0x27022)
    #11 0x5654ed6655fd in _start (/home/beta/Programming/Status/nim-beacon-chain/build/beacon_node_asan+0x15f5fd)

Address 0x7ffedfc8c7e7 is located in stack of thread T0 at offset 2759 in frame
    #0 0x5654edbd10bf in ECP_BLS381_mul /home/beta/Programming/Status/nim-beacon-chain/vendor/nim-blscurve/blscurve/csources/64/ecp_BLS381.c:1022

  This frame has 7 object(s):
    [32, 224) 'NQ.i' (line 985)
    [288, 344) 'mt' (line 1059)
    [384, 440) 't' (line 1059)
    [480, 672) 'Q' (line 1060)
    [736, 2272) 'W' (line 1060)
    [2400, 2592) 'C' (line 1060)
    [2656, 2759) 'w' (line 1061) <== Memory access at offset 2759 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /home/beta/Programming/Status/nim-beacon-chain/vendor/nim-blscurve/blscurve/csources/64/ecp_BLS381.c:1112:13 in ECP_BLS381_mul
Shadow bytes around the buggy address:
  0x10005bf898a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10005bf898b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10005bf898c0: f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2
  0x10005bf898d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10005bf898e0: 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2 f2 f2
=>0x10005bf898f0: 00 00 00 00 00 00 00 00 00 00 00 00[07]f3 f3 f3
  0x10005bf89900: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
  0x10005bf89910: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10005bf89920: 00 00 00 00 f1 f1 f1 f1 f8 f8 f8 f8 f8 f8 f2 f2
  0x10005bf89930: f2 f2 00 00 f2 f2 00 00 f2 f2 f8 f8 f8 f8 f8 f8
  0x10005bf89940: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==462177==ABORTING



Trying to isolate this with the following test case gives a different result:

import blscurve/[milagro, common], nimcrypto/utils

proc main3() =

  var okm: DBIG_384
  doAssert okm.fromBytes fromHex"00000000000000000000000000000000292bcd8b78ffc85782ed9c1052f581cca2c96d80a3c46c2bc1527b3cf188022b"

  debugEcho "CurveOrder: ", CURVE_Order.toHex()

  var secretkey: BIG_384
  BIG_384_dmod(secretkey, okm, CURVE_Order)

  echo "seckey: ", secretkey

  var pubkey = generator1()
  pubkey.mul(secretkey)

  echo "publickey: ", pubkey

main3()
CurveOrder: 0000000000000000000000000000000073eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001
seckey: 00000000000000000000000000000000292bcd8b78ffc85782ed9c1052f581cca2c96d80a3c46c2bc1527b3cf188022b
ECP_mul:
  sign8 w[103]
  nb: 65
  t: 00000000000000000000000000000000292bcd8b78ffc85782ed9c1052f581cca2c96d80a3c46c2bc1527b3cf188022d
Exiting ECP_mul
publickey: (14773bb3c55136094b5c57afd2bc82df920f18e90494585859b929b1c458422df4e946316d94eb2b36953dc37452ea79, 1449e5104eb4fcf6c19bc20e05412906e26309385dc0c6050a548b1eedc6026aff9bfd1b620eea48e90bb7ca56ec6bef)

Notice that the first bits of the temporary variable t are not polluted.

Review usage of exceptions and try/except blocks

I don't think we should have any exceptions or try/except blocks at all in a crypto backend.

This was triggered on an empty aggregate test:


func fromHex*(res: var ECP2_BLS381, a: string): bool {.inline.} =
  ## Unserialize ECP2(G2) point from hexadecimal string ``a`` to ``res``.
  ##
  ## This procedure supports only compressed form of serialization.
  ##
  ## Returns ``true`` if conversion was successful.
  try:
    fromBytes(res, hexToSeqByte(a))
  except ValueError:
    false

And there is another one here:

func fromHex*(res: var BIG_384, a: string): bool {.inline.} =
  ## Unserialize big integer from hexadecimal string ``a`` to ``res``.
  ##
  ## Returns ``true`` if conversion was successful.
  try:
    fromBytes(res, hexToSeqByte(a))
  except ValueError:
    false

i.e., add {.push raises: [].} and introduce an exception-free hexToSeqByte.
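A sketch of such an exception-free parser, returning false instead of raising on bad input (illustrative, not the repo's code):

func hexCharToNibble(c: char, b: var byte): bool =
  result = true
  case c
  of '0'..'9': b = byte(ord(c) - ord('0'))
  of 'a'..'f': b = byte(ord(c) - ord('a') + 10)
  of 'A'..'F': b = byte(ord(c) - ord('A') + 10)
  else: result = false

func hexToBytes(a: string, output: var openArray[byte]): bool =
  ## Parses hex (optionally "0x"-prefixed) into `output`; false on any error.
  var start = 0
  if a.len >= 2 and a[0] == '0' and a[1] in {'x', 'X'}:
    start = 2
  if a.len - start != 2 * output.len:
    return false
  for i in 0 ..< output.len:
    var hi, lo: byte
    if not hexCharToNibble(a[start + 2*i], hi): return false
    if not hexCharToNibble(a[start + 2*i + 1], lo): return false
    output[i] = (hi shl 4) or lo
  true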

[SEC] Infinity Public Key and Signature

labels: nbc-audit-2020-1, status:reported
labels: difficulty:high, severity:informational, type:documentation

The purpose of this strictly informational issue is to raise awareness that the infinity public key is subject to current debate, may be a source of incompatibility with other (incorrect) implementations, and that its handling could potentially change in a future draft of the BLS Signature specification.

Description

The coreVerify() function excerpted below, as implemented in bls_signature_scheme.nim, performs a subgroup check but no check for an infinity signature or public key. Such values will produce a true verification result.

func coreVerify[T: byte|char](
       publicKey: PublicKey,
       message: openarray[T],
       sig_or_proof: Signature or ProofOfPossession,
       domainSepTag: static string): bool =

<...comments removed...>

  if not subgroupCheck(publicKey.point):
    return false
  let Q = hashToG2(message, domainSepTag)

  # pairing(Q, xP) == pairing(R, P)
  return multiPairing(
           Q, publicKey.point,
           sig_or_proof.point, generator1()
         )

The desirability of detecting and rejecting an Infinity public key or signatures is hotly debated. The code matches the IETF specification which makes no mention of the Infinity (or zero) check so the code is compliant. However, this handling conflicts with the common perception of a signature "giving the recipient very strong reason to believe that the message was created by a known sender and that the message was not altered in transit". More importantly, the first comment in the discussion indicates that the blst code does detect this condition when utilized via the Golang bindings.

Exploit Scenario

Expectations involving an Infinity public key or signature may change and/or calculations may not give consistent results across implementations.

Mitigation Recommendation

Prominently document an explicit statement on how an Infinity public key or signature are expected to be handled.

References

  1. https://github.com/status-im/nim-blscurve/blob/da9ae49dab4cbb95b2cdaa68ecc221ee15c67a9a/blscurve/miracl/bls_signature_scheme.nim

  2. https://github.com/supranational/blst/blob/master/bindings/go/blst.go#L267

side-effect tracking

For crypto, it's valuable to know that most routines are side-effect free and pure math.

However at the moment, even init/fromBytes cannot be tagged as func/noSideEffect.
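What this asks for, sketched: once init/fromBytes are effect-free, whole modules can be bracketed like this, and the compiler rejects any routine inside that raises or mutates global state:

{.push raises: [], noSideEffect.}

func example(x: int): int =
  x + 1   # must be pure and exception-free, enforced at compile time

{.pop.}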

Multi-signatures verification with offending signature detection

This is similar to #52 with a twist.

#52 is suitable for syncing and catching up to the chain.

However, once we have caught up, signature verification is still a bottleneck, but we may have to verify multiple signatures that are not aggregated.

Research: https://ethresear.ch/t/fast-verification-of-multiple-bls-signatures/5407

We could aggregate signatures and then use fastAggregateVerify

func fastAggregateVerify*[T: byte|char](
       publicKeys: openarray[PublicKey],
       proofs: openarray[ProofOfPossession],
       message: openarray[T],
       signature: Signature
     ): bool =

but in case a wrong signature was aggregated, we want to slash that exact validator, so we would have to recheck the non-aggregated signatures.

This would allow batch verification of non-aggregated signatures while being able to pinpoint the wrong signature.
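A sketch of that recheck as a bisection: verify the whole batch once and only split on failure, so a clean batch costs one check and a single bad signature is found in O(log n) extra checks. Item and batchVerify are assumed stand-ins for a (pubkey, message, signature) triple and any of the batched schemes above:

proc findOffending[Item](batch: seq[Item],
                         batchVerify: proc(s: seq[Item]): bool): int =
  ## Returns the index of one invalid entry, or -1 if the whole batch verifies.
  proc go(lo, hi: int): int =
    if batchVerify(batch[lo ..< hi]):
      return -1                  # this slice is fully valid
    if hi - lo == 1:
      return lo                  # a single failing entry: the culprit
    let mid = (lo + hi) div 2
    result = go(lo, mid)
    if result == -1:
      result = go(mid, hi)
  if batch.len == 0:
    return -1
  go(0, batch.len)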

[SEC] Unnecessary HMAC Call in HKDF

(Tags: nbc-audit-2020-1, difficulty:undetermined, severity:informational, bug)

Location: blscurve/eth2_keygen/hkdf.nim, lines 149-162

Description

When the underlying hash function has an output length of n bytes, HKDF-Expand() produces its output with successive HMAC calls, each HMAC invocation producing exactly n bytes. The hkdfExpand() implementation computes the number N of full HMAC blocks, then loops exactly N+1 times:

  let N = output.len div HashLen
  var t: MDigest[T.bits]
  let oArray = cast[ptr UncheckedArray[byte]](output)

  for i in 0 .. N:
    ctx.init(prk.data)
    # T(0) = empty string
    if i != 0:
      ctx.update(t.data)
    ctx.update(info)
    when append.len > 0:
      ctx.update(append)
    ctx.update([uint8(i)+1]) # For byte 255, this append "0" and not "256"
    discard ctx.finish(t.data)

    let iStart = i * HashLen
    let size = min(HashLen, output.len - iStart)
    copyMem(oArray[iStart].addr, t.data.addr, size)

If the requested output length (output.len) is exactly a multiple of the hash function output size (HashLen), then N calls to HMAC are sufficient; the value computed in the last iteration is ultimately discarded: the size value is then 0, and none of the bytes are copied to the output.

In the context of the BLS implementation, this inefficiency has negligible impact:

  • The hash function is SHA-256, with a 32-byte output; when hashing data into a curve point, each invocation of HKDF is for 48 bytes, which is not a multiple of 32.
  • The only other use of HKDF is for key generation, when producing the Lamport secret key, of size 8160 bytes. 255 HMAC invocations are needed, and the implementation does 256; the unnecessary 256th HMAC invocation is an overhead of only 0.4%. The perceived overhead will be in practice even lower, because key pair generation also involves other steps which are more expensive, thus further reducing the relative cost induced by this issue.

This is a "security issue" only insofar as it makes the HKDF implementation use too much CPU, which may increase the effect of DoS attacks, if the HKDF implementation were to be used in a context where its cost represents a computational bottleneck (which does not appear to be the case here, hence the sev:info status).

Recommended Mitigation

A comment in hkdf.nim says that it should be merged with the implementation in https://github.com/status-im/nim-eth/blob/b7ebf8ed/eth/p2p/discoveryv5/hkdf.nim. The latter happens to have an extra test to account for the case described above and make sure that N contains exactly the number of required HMAC calls; the loop then performs N iterations.
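The fix borrowed from that nim-eth variant, sketched against the excerpt above: compute N as a ceiling division so an exact multiple of HashLen no longer triggers the discarded extra HMAC round, then loop exactly N times:

let N = (output.len + HashLen - 1) div HashLen   # ceil(output.len / HashLen)
for i in 0 ..< N:
  ctx.init(prk.data)
  if i != 0:
    ctx.update(t.data)
  ctx.update(info)
  when append.len > 0:
    ctx.update(append)
  ctx.update([uint8(i) + 1])
  discard ctx.finish(t.data)
  let iStart = i * HashLen
  copyMem(oArray[iStart].addr, t.data.addr, min(HashLen, output.len - iStart))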

[SEC] Extraneous Exports and Dead Code

labels: nbc-audit-2020-1, status:reported
labels: difficulty:high, severity:low, type:bug

Description

The common.nim source file contains a significant amount of unnecessary exports as well as unused code.

The code was installed locally and the following functions removed from export. The executable was built and the self-tests successfully run.

  • zero()
  • bitsCount()
  • copy()
  • two (overloaded) sub()
  • shiftr()
  • four (overloaded) setx()

The above ten functions are low-level and should not be usable from outside the library. Note that this list is not meant to be exhaustive.

Additionally, a variety of low-level functions can be removed completely including both sub() and two of the four setx() functions. Again, these specific examples are not exhaustive.

Exploit Scenario

The presence of extraneous exports and dead code could indicate simple coding oversights or an incomplete code refactoring. The availability of this functionality may allow unexpected or incorrect usage outside of the intended API. If untested dead code were to become active then it may become a source of unanticipated bugs.

Mitigation Recommendation

Remove unnecessary exports and unnecessary code. Test coverage may be an informative (initial) method of identification for the latter.

References

  1. https://cwe.mitre.org/data/definitions/561.html

  2. https://github.com/status-im/nim-blscurve/blob/da9ae49dab4cbb95b2cdaa68ecc221ee15c67a9a/blscurve/miracl/common.nim

Carry/Borrow in bigint addition/subtraction

There is no carry/borrow handling in the code. Apparently this is made possible by the field arithmetic.

https://github.com/status-im/nim-milagro-crypto/blob/4add8c3441802b9962c966d023b629dcfb207640/src/generated/big_384_29.c#L357-L369

Citing the help - http://docs.milagro.io/en/amcl/milagro-crypto-library-white-paper.html#42-addition-and-subtraction:

4.2 Addition and Subtraction
The existence of a word excess means for example that multiple field elements can be added together digit by digit, without processing of carries, before overflow can occur.

Only occasionally will there be a requirement to normalize these extended values, that is to force them back into the original format. Note that this is independent of the modulus.

The existence of a field excess means that, independent of the word excess, multiple field elements can be added together before it is required to reduce the sum with respect to the modulus. In the literature this is referred to as lazy, or delayed, reduction. In fact we allow the modulus to be as small as 254 bits, which obviously increases the field excess.

Note that these two mechanisms associated with the word excess and the field excess (often confused in the literature) operate largely independently of each other.

AMCL has no support for negative numbers. Therefore subtraction will be implemented as field negation followed by addition. Negation is performed using the method described as Option 1 in
[aranha−karabina−longa−gebotys−lopez]. Basically the number of the active bits in the field excess of the number to be negated is determined, the modulus is shifted left by this amount plus one, and the value to be negated is subtracted from this value.

Note that because of the "plus 1", this will always produce a positive result at the cost of eating a bit into the field excess.
[Figure: small 256-bit number representation]

Normalization of extended numbers requires the word excess of each digit to be shifted right by the number of base bits, and added to the next digit, working right to left. Note that when numbers are subtracted digit-by-digit individual digits may become negative. However since we are avoiding using the sign bit, due to the magic of 2's complement arithmetic, this all works fine without any conditional branches.

Reduction of unreduced BIG numbers is carried out using a simple shift-compare-and-subtract of the modulus, with one subtraction needed on average half of the time for every active bit in the field excess. Hopefully such reductions will rarely be required, as they are slow and involve unpredictable program branches.

Since the length of field elements is fixed at compile time, it is expected that the compiler will unroll most of the time-critical loops. In any case the conditional branch required at the foot of a fixed-size loop can be accurately predicted by modern hardware.

The problem now is to decide when to normalize and when to reduce numbers to avoid the possibility of overflow. There are two ways of doing this. One is to monitor the excesses at run-time and act when the threat of overflow arises. The second is to do a careful analysis of the code and insert normalization and reduction code at points where the possibility of overflow may arise, based on a static worst-case analysis.

The field excess En of a number n is easily captured by a simple masking and shifting of the top word. If two normalized numbers a and b are to be added then the excess of their sum will be at worst Ea+Eb+1. As long as this is less than 2^FE where FE is the field excess, then we are fine. Otherwise both numbers should be reduced prior to the addition.

In AMCL these checks are performed at run-time. However, as we shall see, in practice these reductions are very rarely required. So the if statement used to control them is highly predictable. Observe that even in the worst case, for a 16-bit implementation, the excess is a generous FE=4, and so many elements can be added or subtracted before reduction is required.

The worst case word excess for the result of a calculation is harder to calculate at run time, as it would require inspection of every digit of every BIG. This would slow computation down to an unacceptable extent. Therefore in this case we use static analysis and insert normalization code where we know it might be needed. This process was supported by special debugging code that warned of places where overflow was possible, based on a simple worst-case analysis.

Confirm curves to use for aggregate signature and public keys

Context

Blocking #4 - Implement AggregateSignature

From spec: https://crypto.stanford.edu/~dabo/pubs/papers/BLSmultisig.html

BLS signature Scheme

  • KeyGen(): choose a random α ← Zq and set h ← g1^α ∈ G1. Output pk := (h) and sk := (α). (pk public key, sk secret/private key)
  • Sign(sk, m): output σ ← H0(m)^α ∈ G0. The signature is a single group element.
  • Verify(pk, m, σ): if e(g1, σ) = e(pk, H0(m)) output "accept", otherwise output "reject".

In Milagro libraries:

  • the spec's G1 is AMCL's G, the generator of ECP (see ECP_BLS381_KEY_PAIR_GENERATE)
  • G2 is not used for signing

Referring to Zcash spec: https://github.com/zkcrypto/pairing/tree/master/src/bls12_381#bls12-381-instantiation

Fq elements are 48 bytes (ECP in milagro crypto?)
Fq2 elements are 96 bytes (ECP2 in milagro crypto?)

Referring to Vitalik's Elliptic Curve Pairing: https://medium.com/@VitalikButerin/exploring-elliptic-curve-pairings-c73c1864e627

G1 is on Fq
G2 is on Fq12

Note that signatures in Milagro crypto are passed as a tuple (c, d),
which seems to match the real + imaginary components used for ECP2.

Some checks

A check that could be done would be making sure that
G1
https://github.com/status-im/nim-milagro-crypto/blob/290f927865f9e575920dca5f415c58b554dbe92e/src/milagro_crypto/generated/rom_curve_BLS381.c#L19-L20

and G2
https://github.com/status-im/nim-milagro-crypto/blob/290f927865f9e575920dca5f415c58b554dbe92e/src/milagro_crypto/generated/rom_curve_BLS381.c#L27-L30

match G1

x = 3685416753713387016781088315183077757961620795782546409894578378688607592378376318836054947676345821548104185464507
y = 1339506544944476473020471379941921221584933875938349620426543736416511423956333506472724655353366534992391756441569

and G2

x = 3059144344244213709971259814753781636986470325476647558659373206291635324768958432433509563104347017837885763365758*u + 352701069587466618187139116011060144890029952792775240219908644239793785735715026873347600343865175952761926303160
y = 927553665492332455747201965776037880757740193453592970025027978793976877002675564980949289727957565575433344219582*u + 1985150602287291935568054521177171638300868978215655730859378665066344726373823718423869104263333984641494340347905

from Zcash https://github.com/zkcrypto/pairing/tree/master/src/bls12_381.

We can see from the Zcash hex version that the last few Zcash hex digits match the first hex digits of Milagro:
17F1D3A73197D7942695638C4FA9AC0FC3688C4F9774B905A14E3A3F171BAC586C55E83FF97A1AEFFB3AF00ADB22C6BB

However, Milagro's "BIG" representation uses a word excess + base word (see the Milagro whitepaper) instead of a standard representation, so we can't compare with Milagro's raw hex directly.

Other lib using Milagro for aggregate signature

This library https://github.com/lovesh/signature-schemes/blob/master/src/bls/aggr_new.rs is using Milagro Rust as a backend for BLS aggregate signature.

  • G1/ECP for signature aggregation
  • G2/ECP2 for public key aggregation

which seems strange

Benchmark vs MCL on x86

How to reproduce

MCL

git clone https://github.com/herumi/mcl
cd mcl

make -j ${ncpu}
make bin/bls12_test.exe # even on Linux

bin/bls12_test.exe

nim-blscurve

git clone https://github.com/status-im/nim-blscurve
cd nim-blscurve
nimble bench

Results

On i9-9980XE. Note: overclocked at 4.1 GHz while the nominal clock is 3.0 GHz, so the cycle counts are off by a factor of 4.1/3.0.

Reading the results:

  • G2 multiplication corresponds to sign
  • pairing (composed, among others, of the Miller loop and the final exponentiation) corresponds to verify

MCL using JIT (x86-only)

Highlighted the important parts

$  bin/bls12_test.exe
JIT 1
ctest:module=size
ctest:module=naive
i=0 curve=BLS12_381
G1
G2
GT
G1::mulCT      198.538Kclk <-------------------------
G1::mul        207.139Kclk <-------------------------
G1::add          1.252Kclk <-------------------------
G1::dbl        880.09 clk
G2::mulCT      383.361Kclk <-------------------------
G2::mul        407.538Kclk <-------------------------
G2::add          3.424Kclk <-------------------------
G2::dbl          2.169Kclk
GT::pow        802.285Kclk
G1::setStr chk 289.740Kclk
G1::setStr       2.606Kclk
G2::setStr chk 723.913Kclk
G2::setStr       5.467Kclk
hashAndMapToG1 225.867Kclk
hashAndMapToG2 467.456Kclk
Fp::add         14.27 clk
Fp::sub          9.28 clk
Fp::neg          8.05 clk
Fp::mul        107.01 clk
Fp::sqr         98.88 clk
Fp::inv          4.542Kclk
Fp2::add        19.23 clk
Fp2::sub        14.30 clk
Fp2::neg        13.90 clk
Fp2::mul       305.68 clk
Fp2::mul_xi     22.51 clk
Fp2::sqr       228.53 clk
Fp2::inv         5.111Kclk
FpDbl::addPre   13.24 clk
FpDbl::subPre   13.21 clk
FpDbl::add      19.52 clk
FpDbl::sub      12.88 clk
FpDbl::mulPre   46.70 clk
FpDbl::sqrPre   40.28 clk
FpDbl::mod      58.60 clk
Fp2Dbl::mulPre  203.47 clk
Fp2Dbl::sqrPre  113.19 clk
GT::add        114.57 clk
GT::mul          7.663Kclk
GT::sqr          5.490Kclk
GT::inv         16.616Kclk
FpDbl::mulPre   46.79 clk
pairing          2.237Mclk <-------------------------
millerLoop     906.605Kclk <-------------------------
finalExp         1.329Mclk <-------------------------
precomputeG2   201.104Kclk
precomputedML  686.266Kclk
millerLoopVec    4.616Mclk
ctest:module=finalExp
finalExp   1.338Mclk
ctest:module=mul_012
ctest:module=pairing
ctest:module=multi
BN254
calcBN1  32.201Kclk
naiveG2  17.132Kclk
calcBN2  64.564Kclk
naiveG2  46.680Kclk
BLS12_381
calcBN1  77.113Kclk
naiveG1  56.032Kclk
calcBN2 157.989Kclk
naiveG2 129.702Kclk
ctest:module=eth2
mapToG2  org-cofactor 904.585Kclk
mapToG2 fast-cofactor 476.747Kclk
ctest:module=deserialize
verifyOrder(1)
deserializeG1 345.128Kclk
deserializeG2 842.046Kclk
verifyOrder(0)
deserializeG1  57.978Kclk
deserializeG2 121.257Kclk
ctest:name=bls12_test, module=8, total=3600, ok=3600, ng=0, exception=0

MCL using Assembly from LLVM i256 and i384 (x86 and ARM)

$  bin/bls12_test.exe -m llvm_mont
JIT 1
ctest:module=size
ctest:module=naive
i=0 curve=BLS12_381
G1
G2
GT
G1::mulCT      208.328Kclk <-------------------------
G1::mul        212.953Kclk <-------------------------
G1::add          1.279Kclk <-------------------------
G1::dbl        919.59 clk
G2::mulCT      522.604Kclk <-------------------------
G2::mul        537.523Kclk <-------------------------
G2::add          4.638Kclk <-------------------------
G2::dbl          2.704Kclk
GT::pow        916.182Kclk
G1::setStr chk 301.920Kclk
G1::setStr       2.684Kclk
G2::setStr chk 935.164Kclk
G2::setStr       5.723Kclk
hashAndMapToG1 239.817Kclk
hashAndMapToG2 582.365Kclk
Fp::add         13.37 clk
Fp::sub         13.86 clk
Fp::neg          8.78 clk
Fp::mul        110.30 clk
Fp::sqr        110.73 clk
Fp::inv          4.594Kclk
Fp2::add        25.68 clk
Fp2::sub        27.10 clk
Fp2::neg        19.76 clk
Fp2::mul       453.66 clk
Fp2::mul_xi     35.83 clk
Fp2::sqr       254.18 clk
Fp2::inv         5.168Kclk
FpDbl::addPre   14.72 clk
FpDbl::subPre   14.64 clk
FpDbl::add      19.02 clk
FpDbl::sub      17.55 clk
FpDbl::mulPre   54.56 clk
FpDbl::sqrPre   51.65 clk
FpDbl::mod      80.36 clk
Fp2Dbl::mulPre  243.72 clk
Fp2Dbl::sqrPre  145.11 clk
GT::add        168.85 clk
GT::mul          8.658Kclk
GT::sqr          6.252Kclk
GT::inv         20.416Kclk
FpDbl::mulPre   53.03 clk
pairing          2.717Mclk <-------------------------
millerLoop       1.105Mclk <-------------------------
finalExp         1.550Mclk <-------------------------
precomputeG2   269.778Kclk
precomputedML  852.854Kclk
millerLoopVec    5.598Mclk
ctest:module=finalExp
finalExp   1.547Mclk
ctest:module=mul_012
ctest:module=pairing
ctest:module=multi
BN254
calcBN1  34.282Kclk
naiveG2  17.954Kclk
calcBN2  65.749Kclk
naiveG2  48.280Kclk
BLS12_381
calcBN1  82.024Kclk
naiveG1  60.722Kclk
calcBN2 167.170Kclk
naiveG2 139.621Kclk
ctest:module=eth2
mapToG2  org-cofactor   1.150Mclk
mapToG2 fast-cofactor 585.033Kclk
ctest:module=deserialize
verifyOrder(1)
deserializeG1 366.132Kclk
deserializeG2   1.152Mclk
verifyOrder(0)
deserializeG1  62.822Kclk
deserializeG2 130.251Kclk
ctest:name=bls12_test, module=8, total=3600, ok=3600, ng=0, exception=0

Nim-blscurve using Milagro

Warmup: 0.9038 s, result 224 (displayed to avoid compiler optimizing warmup away)


Compiled with GCC
Optimization level => no optimization: false | release: true | danger: true
Using Milagro with 64-bit limbs
Running on Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz



⚠️ Cycles measurements are approximate and use the CPU nominal clock: Turbo-Boost and overclocking will skew them.
i.e. a 20% overclock will be about 20% off (assuming no dynamic frequency scaling)

=================================================================================================================

Scalar multiplication G1                                   2399.307 ops/s       416787 ns/op      1250378 cycles
Scalar multiplication G2                                    890.692 ops/s      1122722 ns/op      3368208 cycles
EC add G1                                                911577.028 ops/s         1097 ns/op         3291 cycles
EC add G2                                                323310.702 ops/s         3093 ns/op         9281 cycles
Pairing (Milagro builtin double pairing)                    420.039 ops/s      2380729 ns/op      7142276 cycles
Pairing (Multi-Pairing with delayed Miller and Exp)         417.711 ops/s      2393999 ns/op      7182089 cycles

⚠️ Warning: using draft v5 of IETF Hash-To-Curve (HKDF-based).
           This is an outdated draft.

Hash to G2 (Draft #5)                                       937.034 ops/s      1067197 ns/op      3201632 cycles

Conclusion

(Our cycles and MCL clocks/clk are the same unit.)

Scalar multiplication in G2 (signing) is about 8x slower
Pairing (verification) is about 3x slower

Optimize Simple SWU (Shallue-van de Woestijne-Ulas)

The mapping we use for hashToCurve is the simple one, described as:

map_to_curve_simple_swu(u)

Input: u, an element of F.
Output: (x, y), a point on E.

Constants:
1.  c1 = -B / A
2.  c2 = -1 / Z

Steps:
1.  tv1 = Z * u^2
2.  tv2 = tv1^2
3.   x1 = tv1 + tv2
4.   x1 = inv0(x1)
5.   e1 = x1 == 0
6.   x1 = x1 + 1
7.   x1 = CMOV(x1, c2, e1)    # If (tv1 + tv2) == 0, set x1 = -1 / Z
8.   x1 = x1 * c1      # x1 = (-B / A) * (1 + (1 / (Z^2 * u^4 + Z * u^2)))
9.  gx1 = x1^2
10. gx1 = gx1 + A
11. gx1 = gx1 * x1
12. gx1 = gx1 + B             # gx1 = g(x1) = x1^3 + A * x1 + B
13.  x2 = tv1 * x1            # x2 = Z * u^2 * x1
14. tv2 = tv1 * tv2
15. gx2 = gx1 * tv2           # gx2 = (Z * u^2)^3 * gx1
16.  e2 = is_square(gx1)
17.   x = CMOV(x2, x1, e2)    # If is_square(gx1), x = x1, else x = x2
18.  y2 = CMOV(gx2, gx1, e2)  # If is_square(gx1), y2 = gx1, else y2 = gx2
19.   y = sqrt(y2)
20.  e3 = sgn0(u) == sgn0(y)  # Fix sign of y
21.   y = CMOV(-y, y, e3)
22. return (x, y)

and implemented here:

func mapToIsoCurveSimpleSWU_G2(u: FP2_BLS381): tuple[x, y: FP2_BLS381] =
  ## Implementation of map_to_curve_simple_swu
  ## to map an element of FP2 to a curve isogenous
  ## to the G2 curve of BLS12-381 curve.
  ##
  ## SWU stands for Shallue-van de Woestijne-Ulas mapping
  ## described in https://tools.ietf.org/html/draft-irtf-cfrg-hash-to-curve-04#section-6.5.2
  ##
  ## Input:
  ## - u, an element of FP2
  ##
  ## Output:
  ## - (x, y), a point on G'2, a curve isogenous to G2 curve of BLS12-381

  {.noSideEffect.}: # Only globals accessed are A, B, Z, c1, c2.
                    # We use globals to ensure they are computed only once.
    let # Constants, see 8.9.2. BLS12-381 G2 suite
      A  {.global.} = toFP2(   0,  240)   # A' = 240 * I
      B  {.global.} = toFP2(1012, 1012)   # B' = 1012 * (1+I)
      Z  {.global.} = neg toFP2(2, 1)     # Z = -(2+I)
      c1 {.global.} = neg mul(B, inv(A))  # -B/A
      c2 {.global.} = neg inv(Z)          # -1/Z
    var one {.global.} = block:
      # TODO, we need an increment procedure
      #       this is incredibly inefficient
      var one: FP2_BLS381
      setOne(one)
      one

  {.noSideEffect.}:
    let tv1 = mul(Z, sqr(u))
    var tv2 = sqr(tv1)
    var x1 = add(tv1, tv2)
    x1 = inv(x1)                     # TODO: Spec defines inv0(0) == 0; inv0(x) == x^(q-2)
    let e1 = x1.isZilch()
    x1.add(x1, one)
    x1.cmov(c2, e1)                  # If (tv1 + tv2) == 0, set x1 = -1 / Z
    x1.mul(x1, c1)                   # x1 = (-B / A) * (1 + (1 / (Z² * u⁴ + Z * u²)))
    var gx1 = sqr(x1)
    gx1.add(gx1, A)
    gx1.mul(gx1, x1)
    gx1.add(gx1, B)                  # gx1 = g(x1) = x1³ + A * x1 + B
    let x2 = mul(tv1, x1)            # x2 = Z * u² * x1
    tv2.mul(tv1, tv2)
    let gx2 = mul(gx1, tv2)          # gx2 = (Z * u²)³ * gx1
    let e2 = gx1.isSquare()
    let x = cmov(x2, x1, e2)         # If is_square(gx1), x = x1, else x = x2
    let y2 = cmov(gx2, gx1, e2)      # If is_square(gx1), y2 = gx1, else y2 = gx2
    var y = sqrt(y2)
    let e3 = u.isNeg() == y.isNeg()  # Fix sign of y
    y = cmov(neg y, y, e3)

    result.x = x
    result.y = y

There exists an optimized implementation that leverages the fact that the prime q' of the isogenous curve E' satisfies q' ≡ 9 (mod 16) to avoid the expensive square root operation (i.e. roughly 381 field multiplications are avoided in the fast case, when the target prime q of E satisfies q ≡ 3 (mod 4), which is the case for BLS12-381 G2):

D.2.3.  q = 9 (mod 16)

The following is a straight-line implementation of the Simplified SWU
mapping that applies to any curve over GF(q) where q = 9 (mod 16).
This includes the curve isogenous to BLS12-381 G2 (Section 8.8.2).

map_to_curve_simple_swu_9mod16(u)

Input: u, an element of F.
Output: (xn, xd, yn, yd) such that (xn / xd, yn / yd) is a
        point on the target curve.

Constants:
1. c1 = (q - 9) / 16            # Integer arithmetic
2. c2 = sqrt(-1)
3. c3 = sqrt(c2)
4. c4 = sqrt(Z^3 / c3)
5. c5 = sqrt(Z^3 / (c2 * c3))

Steps:
1.  tv1 = u^2
2.  tv3 = Z * tv1
3.  tv5 = tv3^2
4.   xd = tv5 + tv3
5.  x1n = xd + 1
6.  x1n = x1n * B
7.   xd = -A * xd
8.   e1 = xd == 0
9.   xd = CMOV(xd, Z * A, e1)   # If xd == 0, set xd = Z * A
10. tv2 = xd^2
11. gxd = tv2 * xd              # gxd == xd^3
12. tv2 = A * tv2
13. gx1 = x1n^2
14. gx1 = gx1 + tv2             # x1n^2 + A * xd^2
15. gx1 = gx1 * x1n             # x1n^3 + A * x1n * xd^2
16. tv2 = B * gxd
17. gx1 = gx1 + tv2             # x1n^3 + A * x1n * xd^2 + B * xd^3
18. tv4 = gxd^2
19. tv2 = tv4 * gxd             # gxd^3
20. tv4 = tv4^2                 # gxd^4
21. tv2 = tv2 * tv4             # gxd^7
22. tv2 = tv2 * gx1             # gx1 * gxd^7
23. tv4 = tv4^2                 # gxd^8
24. tv4 = tv2 * tv4             # gx1 * gxd^15
25.   y = tv4^c1                # (gx1 * gxd^15)^((q - 9) / 16)
26.   y = y * tv2               # This is almost sqrt(gx1)
27. tv4 = y * c2                # check the four possible sqrts
28. tv2 = tv4^2
29. tv2 = tv2 * gxd
30.  e2 = tv2 == gx1
31.   y = CMOV(y, tv4, e2)
32. tv4 = y * c3
33. tv2 = tv4^2
34. tv2 = tv2 * gxd
35.  e3 = tv2 == gx1
36.   y = CMOV(y, tv4, e3)
37. tv4 = tv4 * c2
38. tv2 = tv4^2
39. tv2 = tv2 * gxd
40.  e4 = tv2 == gx1
41.   y = CMOV(y, tv4, e4)      # if x1 is square, this is its sqrt
42. gx2 = gx1 * tv5
43. gx2 = gx2 * tv3             # gx2 = gx1 * Z^3 * u^6
44. tv5 = y * tv1
45. tv5 = tv5 * u               # This is almost sqrt(gx2)
46. tv1 = tv5 * c4              # check the four possible sqrts
47. tv4 = tv1 * c2
48. tv2 = tv4^2
49. tv2 = tv2 * gxd
50.  e5 = tv2 == gx2
51. tv1 = CMOV(tv1, tv4, e5)
52. tv4 = tv5 * c5
53. tv2 = tv4^2
54. tv2 = tv2 * gxd
55.  e6 = tv2 == gx2
56. tv1 = CMOV(tv1, tv4, e6)
57. tv4 = tv4 * c2
58. tv2 = tv4^2
59. tv2 = tv2 * gxd
60.  e7 = tv2 == gx2
61. tv1 = CMOV(tv1, tv4, e7)
62. tv2 = y^2
63. tv2 = tv2 * gxd
64.  e8 = tv2 == gx1
65.   y = CMOV(tv1, y, e8)      # choose correct y-coordinate
66. tv2 = tv3 * x1n             # x2n = x2n / xd = Z * u^2 * x1n / xd
67.  xn = CMOV(tv2, x1n, e8)    # choose correct x-coordinate
68.  e9 = sgn0(u) == sgn0(y)    # Fix sign of y
69.   y = CMOV(-y, y, e9)
70. return (xn, xd, y, 1)

Notes

As we might still switch BLS implementations, we should first decide which library to use in the long term before doing this optimization.

Many upstreams possible

AMCL repo contains the sources we were tracking. The current commit we track is quite old:

Since then:

For action

Ultimately, the codebases are still pretty similar at the moment but we should expect them to diverge with time, including with bug fixes.

This means that it may make sense to reconsider using Apache Milagro as upstream and to use MIRACL instead.

[SEC] Insufficient Input Validation

labels: nbc-audit-2020-1, status:reported
labels: difficulty:high, severity:medium, type:bug

Description

The fromHex() function shown below, as implemented in bls_sig_io.nim, does not validate the size of its input string prior to processing. Note that the absence of leading zeroes will cause this function to execute in non-constant time on a SecretKey, although this latter aspect is likely not significant.

func fromHex*[T: SecretKey|PublicKey|Signature|ProofOfPossession](
       obj: var T,
       hexStr: string
     ): bool {.inline.} =
  ## Initialize a BLS signature scheme object from
  ## its hex raw bytes representation.
  ## Returns true on a success and false otherwise
  when obj is SecretKey:
    result = obj.intVal.fromHex(hexStr)
  else:
    result = obj.point.fromHex(hexStr)

Similarly, the fromBytes() function in the same source file does not validate the size of its input, and the export functions present in this source file do not ensure that a constant-length output string is returned. Note that the missing checks are indeed performed in the bls_sig_min_pubkey_size_pop.nim source file (in blst), thus also presenting an inconsistency.

Separately, the common.nim source file contains a large number of functions similar to that shown below.

func fromBytes*(res: var ECP2_BLS12381, data: openarray[byte]): bool =
  ## Unserialize ECP2(G2) point from array of bytes ``data``.
  ##
  ## This procedure supports only compressed form of serialization.
  ##
  ## Returns ``true`` on success, ``false`` otherwise.
  if len(data) >= MODBYTES_384 * 2:
  ...

In the above code, the input data is checked via len(data) >= MODBYTES_384 * 2, which unnecessarily allows oversized input. There are a total of 5 functions in this source file containing >=. Additionally, line 581 contains a check that 'cleans' the length of incorrectly sized data, thereby explicitly allowing processing to continue.

Exploit Scenario

Incorrectly sized input data will not be rejected but rather sent into downstream logic. This logic may not be hardened sufficiently to prevent unanticipated or undesirable behavior. An opportunity to detect malicious activity or correct bad code is missed.

Mitigation Recommendation

Perform precise checks on expected input length and reject malformed input.
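
A minimal sketch of the recommended shape (the constant mirrors Milagro's MODBYTES_384; the proc name and the standalone check are illustrative, not the library's actual code), using exact equality so that oversized input is rejected up front instead of being passed downstream as the >= check allows:

const
  MODBYTES_384 = 48                    # bytes per Fp element (Milagro constant)
  G2CompressedSize = MODBYTES_384 * 2  # 96 bytes for a compressed G2 point

func hasExactG2Size(data: openArray[byte]): bool =
  ## Exact-length gate: `==` instead of `>=`.
  data.len == G2CompressedSize

when isMainModule:
  doAssert hasExactG2Size(newSeq[byte](96))
  doAssert not hasExactG2Size(newSeq[byte](97))  # accepted by `>=`, rejected here

The same pattern applies to the other functions in common.nim using >=.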

References

  1. https://github.com/status-im/nim-blscurve/blob/da9ae49dab4cbb95b2cdaa68ecc221ee15c67a9a/blscurve/miracl/bls_sig_io.nim

  2. https://github.com/status-im/nim-blscurve/blob/da9ae49dab4cbb95b2cdaa68ecc221ee15c67a9a/blscurve/miracl/common.nim

Public key aggregation is slow

On my i5-5257U (dual-core mobile Broadwell, 2.7 GHz with 3.1 GHz Turbo, from 2015), I get the following stats for signature aggregation:

Warmup: 1.1908 s, result 224 (displayed to avoid compiler optimizing warmup away)

#### Block parameters
Number of validators:                                                                       482
Number of block parent hashes:                                                               12
Fork version:                                                                                 3
Slot:                                                                                      4246
Shard_id:                                                                                   555
Parent_hash[0]:                99D2587E07003CFE8023D46401577191EF89BFCC239A6EF1922AC49A687116A2
Shard_block_hash:              0CF579DC04024D8D4292A4BBCFCAD24F6A20C44AF665A7A4144CE84E8821E77A
justified_slot:                                                                            1846


#### Message, crypto keys and signatures
482 secret and public keys pairs generated in 2.014 s
Throughput: 239.279 kps/s (key pairs/second)


Message generated in 0.010 ms


482 public key and message signature pairs generated in 1.153 s
Throughput: 418.150 kps/s (keysig pairs/second)


#### Benchmark: signature aggregation

Benchmarking signature aggregation
Collected 100 samples in 153.974 seconds
Average time: 1539.735 ms
Stddev  time: 3.821 ms
Min     time: 1536.821 ms
Max     time: 1558.711 ms

Display computation result to make sure it's not optimized away
0418ff7d1d14353af2f95bb25724fa9787cd4e95c4b5040dbddf1ff3a601c29943974ad5cf806c89b04fda4564c513d2ae1420cecdeaaa0bd4888a5b066efafa2222425216e8e8a43982735c68ddf37ef0494cfc1830e8be270bd5d026804f19f8

But uncommenting the public key aggregation benchmark leaves the bench stuck: not even 10 samples can be benchmarked in 2 minutes.


If we dive into the details of ECP2_BLS381_mul, FP2_BLS381_mul is a huge bottleneck.

This is due to BIG_384_29_mul and BIG_384_29_monty (Montgomery reduction).


Reject invalid serialized infinity points

From status-im/nimbus-eth2#555

It's an invalid encoding of an infinity point.

The issue is that a signature is represented as a (G1, G1) pair, with G1 using 381 bits out of 48 bytes (384 bits); some of the extra 3 bits are used as flags according to the Zcash serialization spec: https://github.com/zcash/librustzcash/tree/f55f094e/pairing/src/bls12_381#serialization
Note that the Zcash serialization spec is recommended in IETF (https://tools.ietf.org/id/draft-irtf-cfrg-pairing-friendly-curves-07.html#name-zcash-serialization-format-)


The Ethereum test generators have a 50% chance of setting this infinity bit to 1. But if it is set to 1, the rest of the bits should be set to 0.

The NBC fix: this was raised by test data for the SSZ test vectors, and for that suite we only need an opaque blob; we never need to deserialize the signatures.

The crypto fix is to always reject invalid encodings of infinity points.

In BLST this is done by copying the input bytes, then checking if they were all zeros: https://github.com/supranational/blst/blob/27d3083e/src/e2.c#L300-L307

This is not done in Milagro.
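
Below is a minimal sketch of the missing check, following the Zcash flag layout described above (the proc and its name are ours, not Milagro's actual code path): if the infinity bit is set, every remaining bit of the encoding must be zero.

func isCanonicalInfinity(data: openArray[byte]): bool =
  ## `data` is a compressed G1 (48 bytes) or G2 (96 bytes) blob.
  ## Returns false only for non-canonical infinity encodings.
  if data.len == 0:
    return false
  if (data[0] and 0b0100_0000'u8) == 0:
    return true                      # infinity bit unset: nothing to check here
  # Infinity claimed: apart from the compression/infinity flag bits,
  # the whole encoding must be zero.
  if (data[0] and 0b0011_1111'u8) != 0:
    return false
  for i in 1 ..< data.len:
    if data[i] != 0:
      return false
  true

With such a gate, test-generator blobs that set the infinity bit but leave other bits non-zero would be rejected before any point deserialization.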

Signature parsing issue

The following signature is extracted from SSZ consensus test:

import blscurve

let sigbytes = @[byte 217, 149, 255, 97, 73, 133, 236, 43, 248, 34, 30, 10, 15, 45, 82, 72, 243, 179, 53, 17, 27, 17, 248, 180, 7, 92, 200, 153, 11, 3, 111, 137, 124, 171, 29, 218, 191, 246, 148, 57, 160, 50, 232, 129, 81, 90, 72, 161, 110, 138, 243, 116, 0, 88, 125, 180, 67, 153, 194, 181, 117, 152, 166, 147, 13, 77, 15, 91, 33, 50, 140, 199, 150, 10, 15, 10, 209, 165, 38, 57, 56, 114, 175, 29, 49, 11, 11, 126, 55, 189, 170, 46, 218, 240, 189, 144]

var sig: Signature

let success = init(sig, sigbytes)

echo success
echo sig

It results in

true
c00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Instead of the expected

true
0xd995ff614985ec2bf8221e0a0f2d5248f3b335111b11f8b4075cc8990b036f897cab1ddabff69439a032e881515a48a16e8af37400587db44399c2b57598a6930d4d0f5b21328cc7960a0f0ad1a526393872af1d310b0b7e37bdaa2edaf0bd90
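
Continuing the snippet above, a serialization round-trip assertion would have flagged the regression (toHex on Signature is exported by blscurve; bytesToHex is a local helper, and the exact casing/prefix of toHex's output is an assumption here):

import std/strutils

func bytesToHex(data: openArray[byte]): string =
  for b in data:
    result.add b.toHex()  # 2 uppercase hex characters per byte

# Re-encoding the parsed signature must reproduce the input bytes; with
# the bug above, this assertion fires because `sig` decoded to the
# infinity point instead of the original signature.
doAssert sig.toHex().toUpperAscii() == bytesToHex(sigbytes),
  "parse/serialize round-trip mismatch"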

Compiled version

Hello,

this repo looks very interesting. I would like to use this library for another project (written in Go).

Therefore I was wondering whether there is a precompiled binary available, which supports the ETH 2.0 interface as stated here.

That way, I could test and use this library.

Thanks in advance.

Fuzzing

At a minimum we need to add fuzzing to Hash-To-Curve, as we might receive forged messages that trigger edge cases.

One thing worth noting is that Milagro's addition formulas are not exception-free: they fail to handle infinity points, and for a point P(x, y), the inputs Q(x, y) (doubling) and Q(x, -y) (inverse) need special handling; these are exactly the edge cases a fuzzer should target.

The issue stems from the short Weierstrass addition law

P + Q = R
(Px, Py) + (Qx, Qy) = (Rx, Ry)

with
Rx = λ² - Px - Qx
Ry = λ(Px - Rx) - Py

and `λ = (Qy - Py) / (Qx - Px)`,
which divides by 0 if Px == Qx. A toy sketch of the hazard follows below.
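
As a toy illustration (a tiny curve over F₁₇, not BLS12-381; all names are ours), the generic chord formula below has to assert away precisely the inputs a fuzzer should try to reach:

const p = 17  # toy curve y² = x³ + 7 over F₁₇

func fmodp(a: int): int = ((a mod p) + p) mod p

func invMod(a: int): int =
  ## Modular inverse via Fermat's little theorem (p prime): a⁻¹ = a^(p-2).
  var (base, exp, r) = (fmodp(a), p - 2, 1)
  while exp > 0:
    if (exp and 1) == 1: r = r * base mod p
    base = base * base mod p
    exp = exp shr 1
  r

func addAffine(px, py, qx, qy: int): (int, int) =
  ## Generic chord addition; undefined when Px == Qx.
  doAssert px != qx, "λ divides by zero: doubling and P + (-P) need special handling"
  let lam = fmodp((qy - py) * invMod(qx - px))
  let rx = fmodp(lam * lam - px - qx)
  let ry = fmodp(lam * (px - rx) - py)
  (rx, ry)

when isMainModule:
  # P = (1, 5) and Q = (2, 7) are on the curve: 5² ≡ 1³ + 7 and 7² ≡ 2³ + 7 (mod 17).
  echo addAffine(1, 5, 2, 7)  # (1, 12), also on the curve
  # addAffine(1, 5, 1, 12) would hit the assertion: Q = -P, so P + Q is the infinity point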

For actual elliptic curve testing, it's quite probable that a fuzzer won't be able to create valid elliptic curve points (though AFL learned to create valid JPEGs from nothing but fuzzing: https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html), so we will need to turn to differential fuzzing.

Thankfully there is a host of alternative implementations that we can use and that are sufficiently fast:

And somewhat slower:

Fuzzing: multiple definition of functions

The recent BLST version cannot be compiled for use in libnfuzz, which prevents fuzzing Nimbus.

Command:

NIMFLAGS="-d:disableLTO" make libnfuzz.so libnfuzz.a

Example duplicated proc

  • quot_rem_64 (but there are dozens of others)

[SEC] Missing Check in `aggregate*()` Which Cannot Return INVALID

labels: nbc-audit-2020-1, status:reported
labels: difficulty:high, severity:medium, type:bug

Description

The two aggregate() functions shown below as implemented in bls_signature_scheme.nim exhibit diverging behavior when presented with an empty signature set. The former will simply return while the latter will encounter a failing assertion.

proc aggregate*(agg: var AggregateSignature, sigs: openarray[Signature]) =
  ## Aggregates an array of signatures `sigs` into a signature `sig`
  for s in sigs:
    agg.point.add(s.point)

...

proc aggregate*(sigs: openarray[Signature]): Signature =
  ## Aggregates array of signatures ``sigs``
  ## and return aggregated signature.
  ##
  ## Array ``sigs`` must not be empty!
  # TODO: what is the correct empty signature to return?
  #       for now we assume that empty aggregation is handled at the client level
  doAssert(len(sigs) > 0)
  result = sigs[0]
  for i in 1 ..< sigs.len:
    result.point.add(sigs[i].point)

Separately, the aggregate*() functions are not able to signal INVALID to calling code, which is a minor divergence from the BLS signature specification. While INVALID is primarily associated with point deserialization, which is performed elsewhere and so is not needed here, it also pertains to the len(sigs) == 0 condition, which the latter function handles via the assertion. Further, should infinity signatures be handled differently in the future, this code will struggle to adapt.

This issue is also present in blscurve/blst/bls_sig_min_pubkey_size_pop.nim on lines 256-279.

Exploit Scenario

The first aggregate*() function will silently accept an empty set of signatures and return. This violates the specification and may impact downstream logic which may not be hardened sufficiently to prevent unanticipated or undesirable behavior.

Mitigation Recommendation

Ensure both functions handle an empty set of signatures in the same way (ideally by returning INVALID, but an assertion is acceptable). Consider adapting the functions to return a boolean indication of success along with their result, perhaps via a Nim Result type.
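
A minimal sketch of the recommended shape (mirroring the quoted code's point.add; whether the point field is accessible this way is an assumption, and the proc name is ours):

import blscurve

proc tryAggregate(sigs: openArray[Signature], agg: var Signature): bool =
  ## Returns false (INVALID) on an empty input instead of asserting
  ## or silently succeeding, so both entry points can behave identically.
  if sigs.len == 0:
    return false
  agg = sigs[0]
  for i in 1 ..< sigs.len:
    agg.point.add(sigs[i].point)
  true

A Result[Signature, cstring] from nim-result (reference 3) would carry the same information with an explicit error value.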

References

  1. proc aggregate*(sigs: openarray[Signature]): Signature =

  2. https://tools.ietf.org/html/draft-irtf-cfrg-bls-signature-02

  3. https://github.com/arnetheduck/nim-result

[SEC] EIP 2386 Specification Completeness

labels: nbc-audit-2020-2 🛂, status:reported
labels: difficulty:medium, severity:medium, type:bug

Description

The EIP 2386 specification is silent on a number of aspects relevant to blscurve/eth2_keygen, including:

  • The maximum length of the name string and expectations regarding Unicode normalization.
  • The type or maximum allowed nextaccount value.
  • The type or maximum expected version value (though this is currently hardcoded and thus not yet necessary).
  • Required or expected behavior when extraneous fields are present.

For example, allowing a name string of length one gigabyte is not necessary and may surface downstream impacts stemming from unexpected/unnecessary memory allocations.

While the name string is a simple identifier, different UTF-8 encodings may arise from malicious intent or simply the broad range of participating devices, operating systems, languages and applications. Characters with accents or other modifiers can have multiple correct Unicode encodings. For example, the Á (a-acute) glyph can be encoded as a single character U+00C1 (the "composed" form) or as two separate characters U+0041 then U+0301 (the "decomposed" form). In some cases, the order of a glyph's combining elements is significant and in other cases different orders must be considered equivalent. Normalization is the process of standardizing string representation such that if two strings are canonically equivalent and are normalized to the same normal form, their byte representations will be the same. Only then can string comparison and ordering operations be relied upon. Performing this step is best practice to support user expectations related to rendering consistency.
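
A quick demonstration of the pitfall, using only the Nim standard library (no normalization library involved): the composed and decomposed encodings of the same glyph render identically but compare unequal.

import std/unicode

let composed   = "\u00C1"        # Á as a single code point
let decomposed = "\u0041\u0301"  # A followed by a combining acute accent

echo composed == decomposed                        # false: different byte sequences
echo composed.runeLen, " vs ", decomposed.runeLen  # 1 vs 2 code points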

Regarding nextaccount, values beyond 2**53 are likely not necessary and may encounter problems related to the JavaScript number type having a 53-bit floating-point mantissa. Further, if this value is related to an index in EIP 2333, then a constraint of 2**32 is more reasonable.
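
The 2**53 boundary is easy to demonstrate (plain Nim, using float64 as a stand-in for the JavaScript number type):

let boundary = float64(1'u64 shl 53)  # 9_007_199_254_740_992
echo boundary == boundary + 1.0       # true: 2^53 and 2^53 + 1 collide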

Specifying similar constraints for version can be done alongside other modifications for completeness, though this is not currently necessary.

Specifying required or expected behavior when extraneous fields are present will improve implementation interoperability.

Mitigation Recommendation

To summarize, it is recommended to consider a maximum string length for name and to indicate that implementations should immediately normalize this value to the NFKC form per section 2.11.2.B.2 of Unicode Technical Report #36. Additionally, consider specifying the type and maximum values for both the nextaccount and version values, as well as what should happen if extraneous fields are present.

This issue is also submitted to the EIP 2386 discussion at ethereum/EIPs#2386.

Both nim-blscurve and nim-beacon-chain should track the resolution and implement the corresponding validation checks.

[Tech debt] Cleanup repo

Part of the tech debt week status-im/nimbus-eth2#867

With the various schemes we have supported over time, we have accumulated some cruft in the repo.
In preparation for auditing, the code paths that are not used should be cleaned up.

For example

Implement Aggregate Signature

Unfortunately, Milagro Crypto doesn't come with aggregate signatures so they must be implemented in C or Nim.

We can follow implementation in Rust done here: https://github.com/lovesh/signature-schemes.

Challenges

Octet string. One major pain point is having to work around the "octet string" introduced by Milagro Crypto (probably to work around C limitations that Nim doesn't have).

This octet string requires a lot of boilerplate, and a high-level API that hides all of it away needs to be built on top of Milagro-Crypto.

Boilerplate means allocating a storage backend with the proper size and lifetime (arrays are collected when the function exits).

https://github.com/status-im/nim-milagro-crypto/blob/9ad68657cf0d6c238220a3b35efd62fddd6b2ab5/tests/all_tests.nim#L58-L93

Need to wrap lots of low level types and primitives

BIG. The big int implementation also brings problems due to pointer/lifetime/array passing in C.
https://github.com/status-im/nim-milagro-crypto/blob/9ad68657cf0d6c238220a3b35efd62fddd6b2ab5/src/generated/big_384_29.h#L39-L60

typedef chunk BIG_384_29[NLEN_384_29];     /**< Define type BIG as array of chunks */

Other low-level types are stack objects and are more straightforward:

  • FP ~ field point,
  • ECP_BLS381 ~ field point on elliptic curve,
  • FP2 ~ complex field point (with real and imaginary parts),
  • ECP2_BLS381 ~ complex points on elliptic curve

Need to provide glue

To manage lifetimes properly, we will probably need glue, but we also need to develop constant-time primitives so as not to grow the attack surface mentioned in #2.

Stack corruption on keypair generation

The stack can be corrupted on key-pair generation.

For some reason this does not happen when keys are generated in a loop, for example:

import blscurve, nimcrypto

proc main() =

  for i in 0 ..< 100:
    echo "iteration ", i, ":"
    var ikm: array[32, byte]

    let written = randomBytes(ikm)
    doAssert written >= 32, "Key generation failure"

    var secKey: SecretKey
    var pubKey: PublicKey

    let ok = keygen(ikm, pubkey, seckey)

    doAssert ok
    echo "  seckey: ", seckey.toHex()
    echo "  pubkey: ", pubkey.toHex()

main()

But it happens when keys are generated into global arrays (even on the heap), as in PR status-im/nimbus-eth2#780, unless it has to do with object variants?

import
  # Specs
  ../../beacon_chain/spec/[datatypes, crypto]

# this is being indexed inside "mock_deposits.nim" by a value up to `validatorCount`
# which is `num_validators` which is `MIN_GENESIS_ACTIVE_VALIDATOR_COUNT`
proc genMockPrivKeys(privkeys: var array[MIN_GENESIS_ACTIVE_VALIDATOR_COUNT, ValidatorPrivKey]) =
  for i in 0 ..< privkeys.len:
    let pair = newKeyPair()
    privkeys[i] = pair.priv

proc genMockPubKeys(
       pubkeys: var array[MIN_GENESIS_ACTIVE_VALIDATOR_COUNT, ValidatorPubKey],
       privkeys: array[MIN_GENESIS_ACTIVE_VALIDATOR_COUNT, ValidatorPrivKey]
     ) =
  for i in 0 ..< privkeys.len:
    pubkeys[i] = pubkey(privkeys[i])

# TODO: Ref array necessary to use a proc to avoid stack smashing in ECP_BLS381_mul (see gdb)
var MockPrivKeys*: ref array[MIN_GENESIS_ACTIVE_VALIDATOR_COUNT, ValidatorPrivKey]
new MockPrivKeys
genMockPrivKeys(MockPrivKeys[])

var MockPubKeys*: ref array[MIN_GENESIS_ACTIVE_VALIDATOR_COUNT, ValidatorPubKey]
new MockPubKeys
genMockPubKeys(MockPubKeys[], MockPrivKeys[])

type MockKey = ValidatorPrivKey or ValidatorPubKey

template `[]`*[N: static int](a: array[N, MockKey], idx: ValidatorIndex): MockKey =
  a[idx.int]

when isMainModule:
  from blscurve import toHex

  echo "========================================"
  echo "Mock keys"
  for i in 0 ..< MIN_GENESIS_ACTIVE_VALIDATOR_COUNT:
    echo "  validator ", i
    echo "    seckey: ", MockPrivKeys[i].toHex()
    echo "    pubkey: ", MockPubKeys[i]

In Clang

This seems to manifest as a secret key that is bigger than the curve order.

In GDB, we crash soon after.

Benchmarks vs MCL on ARM

For reference, this is MCL's speed on an ARM-32 Raspberry Pi 4:

https://github.com/mratsim/mcl/blob/2b318e84/bench-arm32-pi4.log

JIT 0
ctest:module=size
ctest:module=naive
i=0 curve=BLS12_381
G1
G2
GT
G1::mulCT        1.932msec
G1::mul          1.976msec
G1::add         11.765usec
G1::dbl          8.713usec
G2::mulCT        4.389msec
G2::mul          4.527msec
G2::add         39.784usec
G2::dbl         21.466usec
GT::pow          7.413msec
G1::setStr chk   2.834msec
G1::setStr      12.919usec
G2::setStr chk   7.554msec
G2::setStr      26.452usec
hashAndMapToG1   2.722msec
hashAndMapToG2   5.766msec
Fp::add         47.45nsec
Fp::sub         56.04nsec
Fp::neg         29.36nsec
Fp::mul        894.09nsec
Fp::sqr          1.371usec
Fp::inv        121.102usec
Fp2::add        93.46nsec
Fp2::sub       109.45nsec
Fp2::neg        58.77nsec
Fp2::mul         4.102usec
Fp2::mul_xi    116.80nsec
Fp2::sqr         1.948usec
Fp2::inv       129.694usec
FpDbl::addPre   83.59nsec
FpDbl::subPre   84.77nsec
FpDbl::add      83.48nsec
FpDbl::sub      86.79nsec
FpDbl::mulPre  888.74nsec
FpDbl::sqrPre  846.63nsec
FpDbl::mod     525.81nsec
Fp2Dbl::mulPre    3.015usec
Fp2Dbl::sqrPre    1.910usec
GT::add        570.59nsec
GT::mul         71.228usec
GT::sqr         49.861usec
GT::inv        258.619usec
FpDbl::mulPre  888.75nsec
pairing         21.020msec
millerLoop       9.109msec
finalExp        11.902msec
precomputeG2     2.191msec
precomputedML    6.881msec
millerLoopVec   47.449msec
ctest:module=finalExp
finalExp  11.990msec
ctest:module=mul_012
ctest:module=pairing
ctest:module=multi
BN254
calcBN1 499.901usec
naiveG2 214.120usec
calcBN2 981.032usec
naiveG2 725.049usec
BLS12_381
calcBN1   1.122msec
naiveG1 690.237usec
calcBN2   2.271msec
naiveG2   1.818msec
ctest:module=eth2
mapToG2  org-cofactor  10.949msec
mapToG2 fast-cofactor   6.192msec
ctest:name=bls12_test, module=7, total=832, ok=832, ng=0, exception=0

Reproduction command:

git clone git@github.com:herumi/mcl
cd mcl
make bin/bls12_test.exe MCL_USE_GMP=0 MCL_USE_OPENSSL=0
bin/bls12_test.exe

Hash_to_curve property-based testing

As highlighted by 6a4e452 in #30, we need better coverage of input message ranges, as from time to time we need to call FP2 normalization to bring the FP2 value back into the 0 ..< Prime range, especially after multiplication (norm calls):

var xp2 = sqr(xp)
norm(xp2)
var xp3 = mul(xp, xp2)
norm(xp3)
{.noSideEffect.}: # TODO overload `+` and `*` for readability
  # xNum = k(1,3) * x'³ + k(1,2) * x'² + k(1,1) * x' + k(1,0)
  let xNum = block:
    var xNum = k13.mul(xp3)
    norm(xNum)
    xNum.add xNum, k12.mul(xp2)
    norm(xNum)
    xNum.add xNum, k11.mul(xp)
    norm(xNum)
    xNum.add xNum, k10
    xNum
  # xDen = x'² + k(2,1) * x' + k(2,0)
  let xDen = block:
    var xDen = xp2
    xDen.add xDen, k21.mul(xp)
    norm(xDen)
    xDen.add xDen, k20
    xDen
  # yNum = k(3,3) * x'³ + k(3,2) * x'² + k(3,1) * x' + k(3,0)
  let yNum = block:
    var yNum = k33.mul(xp3)
    norm(yNum)
    yNum.add yNum, k32.mul(xp2)
    norm(yNum)
    yNum.add yNum, k31.mul(xp)
    norm(yNum)
    yNum.add yNum, k30
    yNum
  # yDen = x'³ + k(4,2) * x'² + k(4,1) * x' + k(4,0)
  let yDen = block:
    var yDen = xp3
    yDen.add yDen, k42.mul(xp2)
    norm(yDen)
    yDen.add yDen, k41.mul(xp)
    norm(yDen)
    yDen.add yDen, k40
    yDen

One way to do that would be to add property-based testing.

Invariants that hash_to_curve should abide by are available in the test suite from:

Other relevant repos to generate test vectors:

Example property based testing: https://github.com/status-im/nim-stint/blob/9e49b00148884a01d61478ae5d2c69b543b93ceb/tests/property_based_uint256.nim
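
A sketch of what such a test could look like; hashToCurveG2 and isOnCurve are HYPOTHETICAL wrappers (the real entry point, its signature, and domain-separation handling differ), and only the structure matters: many random messages, including degenerate ones, with invariants asserted.

import std/random

randomize(1337)  # fixed seed so failures stay reproducible

proc randomMessage(): seq[byte] =
  result = newSeq[byte](rand(0 .. 256))
  for b in result.mitems:
    b = byte rand(0 .. 255)

for _ in 0 ..< 1000:
  let msg = randomMessage()
  # Invariant 1: determinism, the same input must map to the same point.
  doAssert hashToCurveG2(msg) == hashToCurveG2(msg)
  # Invariant 2: the output must be a valid point (on curve / right subgroup).
  doAssert hashToCurveG2(msg).isOnCurve()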
