blake2 / blake2 Goto Github PK

BLAKE2 official implementations

Home Page: https://blake2.net

License: Creative Commons Zero v1.0 Universal

C 94.61% C# 5.06% Makefile 0.26% Roff 0.07%

blake2's Introduction

BLAKE2

This is the reference source code package of BLAKE2, which includes

ref/: C implementations of BLAKE2b, BLAKE2bp, BLAKE2s, BLAKE2sp, aimed at portability and simplicity.
sse/: C implementations of BLAKE2b, BLAKE2bp, BLAKE2s, BLAKE2sp, optimized for speed on CPUs supporting SSE2, SSSE3, SSE4.1, AVX, or XOP.
neon/: Implementations of BLAKE2{s,b} using the NEON/ASIMD ARM instruction set.
power8/: Implementations of BLAKE2{s,b} for POWER8, using the VSX and Altivec extensions.
csharp/: C# implementation of BLAKE2b.
b2sum/: Command line utility to hash files, based on the sse/ implementations.
bench/: Benchmark tool to measure cycles-per-byte speeds and produce graphs.

All code is triple-licensed under the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at your choosing.

More: https://blake2.net.

Contact: [email protected]

blake2's Issues

SSE41 codepath is SLOWER than SSE2

In my benchmarks, the SSE41 codepath for message loading (LOAD_MSG_0_1 and friends) is slower than the SSE2 one.

Looking at the assembly output, GCC seems able to optimize the _mm_set_epi32 calls into a combination of punpckldq, punpcklqdq and pinsrd, requiring at most one of each per message mix-in, and in quite a few cases fewer.

Detect CPU features (such as SSE4) at runtime.

I suggest detecting CPU features for the SSE implementations at runtime, using cpuid, rather than at compile time, because often, especially on Windows, one fat binary is distributed to many end-user machines.

This would work by creating multiple blake2s_compress and blake2b_compress functions, each compiled for the appropriate CPU features. Then once, at startup or at the first call, the correct function is selected and stored in a function pointer (costing no branches during the actual time-critical parts).

The disadvantage is a slightly larger binary, although the cache footprint does not grow, since only the selected variant is executed. On the common user machines where this feature is wanted, the extra kilobyte(s) of binary space are well worth the optimum performance on every machine.

On very space-limited machines (embedded devices), this feature is not wanted and can be trivially disabled.
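The selection step described above could look roughly like the sketch below. Everything in it is hypothetical: the feature bits, the stub compress functions (which only return an ID so the choice is observable) and the `blake2s_dispatch_init` name are made up for illustration, and a real build would fill the feature mask from cpuid.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; a real build would derive them from cpuid. */
enum { CPU_SSE2 = 1u << 0, CPU_SSSE3 = 1u << 1, CPU_SSE41 = 1u << 2 };

typedef int (*compress_fn)(uint32_t h[8], const uint8_t block[64]);

/* Stand-ins for per-ISA builds of blake2s_compress. Each one only
   returns an ID so that the selection can be observed. */
static int compress_ref(uint32_t h[8], const uint8_t block[64])
{ (void)h; (void)block; return 0; }
static int compress_sse2(uint32_t h[8], const uint8_t block[64])
{ (void)h; (void)block; return 1; }
static int compress_sse41(uint32_t h[8], const uint8_t block[64])
{ (void)h; (void)block; return 2; }

/* Pick the most capable variant the CPU supports. */
static compress_fn select_compress(unsigned features)
{
    if (features & CPU_SSE41) return compress_sse41;
    if (features & CPU_SSE2)  return compress_sse2;
    return compress_ref;
}

/* Selected once; the hot path then makes a single indirect call. */
static compress_fn blake2s_compress_ptr = compress_ref;

static void blake2s_dispatch_init(unsigned features)
{
    blake2s_compress_ptr = select_compress(features);
}
```

On a space-limited target the pointer could simply be replaced by a direct call to the one variant that is compiled in.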

Some questions about the C version of Blake2B..

Hi,
I am writing this out in C++, and wonder if you might clarify a couple of things..
In the blake2bp_update function, bytes that are not processed are copied to the blake2bp_state 'buf' array, but not to the buffers in the individual blake2b_state members, (as the update functions should always receive full blocks), but in the blake2bp_final, you are processing those buffers as 2 * block size arrays of zero bytes.. maybe I've read this wrong, but if not, why process them at all?
I've added constructors to the structs and initialize the individual state buffers to 0 size, saving 1024 bytes of memory, and 8 (unnecessary?) compression cycles, but of course, it will not align to your kats.
Another point, in blake2bp_state, you initialize the 'S' state structures as a 2 dimensional array (blake2b_state S[4][1];), but the depth is fixed at 2, so I am not sure why you need the extra dimension..

memcpy of overlapping buffer in final

Coverity Scan complains about overlapping buffers in blake2s_final() and blake2b_final().

if( S->buflen > BLAKE2S_BLOCKBYTES )
{
  blake2s_increment_counter( S, BLAKE2S_BLOCKBYTES );
  blake2s_compress( S, S->buf );
  S->buflen -= BLAKE2S_BLOCKBYTES;
  memcpy( S->buf, S->buf + BLAKE2S_BLOCKBYTES, S->buflen ); /* flagged by Coverity */
}

CID 1372514 (#1 of 1): Overlapping buffer in memory copy (BUFFER_SIZE)
4. overlapping_buffer: The source buffer &S->buf[BLAKE2S_BLOCKBYTES] potentially overlaps with the destination buffer S->buf, which results in undefined behavior for memcpy.
Use memmove instead of memcpy.
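A minimal sketch of the change Coverity suggests, using a hypothetical helper and a stand-in `BLOCKBYTES` constant rather than the real state struct. As long as the remaining length is at most one block the two ranges do not actually overlap, but memmove is defined either way, so it would silence the warning without changing behavior.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCKBYTES 64  /* stand-in for BLAKE2S_BLOCKBYTES */

/* Hypothetical helper: drop the first block of a two-block buffer and
   slide the remaining bytes to the front. The caller guarantees
   BLOCKBYTES < buflen <= 2 * BLOCKBYTES. Returns the new length. */
static size_t shift_out_block(uint8_t buf[2 * BLOCKBYTES], size_t buflen)
{
    size_t rest = buflen - BLOCKBYTES;
    memmove(buf, buf + BLOCKBYTES, rest);  /* defined even for overlapping ranges */
    return rest;
}
```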

Reference Implementation for Blake2 parameter block

Hi,

As part of ongoing work on verifying implementations of Blake2, I am looking for a reference implementation of Blake2 with parameter blocks. As far as I understand, the current implementation in ref/ implements the newer Blake2x, which has some minor differences (for instance addition of the xof_length parameter, different ranges allowed for some other parameters).

Is a reference implementation available somewhere? Additionally, are there existing test vectors exercising the parameters that would help validating formal specifications?

Thanks in advance
cc @protz

Question about proposed drbg Blake2X

Hey,
Just read the paper (https://blake2.net/blake2x.pdf) and would like some clarification.
Is there any source code for this?
The text is a little light on details, but am I to take it that the message is first hashed, and a counter pre-pended to the hash state and this becomes the message input for subsequent generation cycles?
What purpose does the XOF flag serve?

Thanks,
John

Want to package for OS X Homebrew, but BLAKE2 does not tag any releases.

I noticed that the b2sum utility is not available in Homebrew on OS X. If you are not familiar with it, Homebrew is a very popular way to distribute compiled-from-source packages for OS X for programs that are not included with the OS.

http://brew.sh/

I was surprised to find that there was no package for b2sum out of the thousands that they offer.

After submitting issue #25 I am able to cleanly compile b2sum on OS X using the built in dev tools so I wanted to create a homebrew formula (which is just a Ruby script that downloads, compares to a known hash, compiles and installs a binary). This would be a relatively easy thing for me to do and would be useful to others.

However, one of the requirements of getting a package merged into Homebrew core is the ability to reference a release tarball or git tag. Sadly I think I discovered why no-one has packaged up b2sum to date, because there are no releases or milestone tags in this repository!

https://github.com/Homebrew/brew/blob/master/share/doc/homebrew/Formula-Cookbook.md

https://github.com/BLAKE2/BLAKE2/tags

Homebrew needs this so that they have a stable anchor tarball that can be hashed so that everyone who installs from that tarball is assured of getting what the packager intended.

If the project would cut a release, or tag the current head of the master branch, I would be able to complete this contribution. This is just a sound practice in any case.

Can you please cut a release, or even just an annotated tag, of the current state of the repository? If so I will create that homebrew package. There are no other actions that this project would need to take other than periodically tagging new releases (especially those with changes to b2sum).

Add SunCC 12.4 support

Attached is the diff for Sun Studio 12.4 support. It may make a good branch, or even an addition to Master. The SunCC folks will know what to do given the starting point.

SunCC 12.1 added GNU style inline assembler support; while SunCC 12.4 added the necessary SSE4 and intrinsic support. The thrust of the change was additional makefiles with the following. The various -D defines were required because SunCC is not like Clang, GCC and ICC. Also see How to print preprocessor macros under Sun Studio on Stack Overflow.

CC=/opt/solarisstudio12.4/bin/cc
CFLAGS=-w -O3 -native -m64 -xarch=aes -D__SSE2__ -D__SSE3__ -D__SSSE3__ -D__SSE4_1__ -D__SSE4_2__

blake2b.c required a check for __SIZEOF_INT128__ >= 16 to guard __uint128_t. That's another Clang, GCC and ICC-ism. Also see 128-bit integer documentation on the GCC mailing list.
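A guard of the kind described might look like the sketch below. It is not the code from the diff: `counter_add` is a hypothetical helper that adds to a 128-bit counter held as two 64-bit words, using `__uint128_t` only when the compiler advertises it and a plain carry otherwise.

```c
#include <stdint.h>

/* Hypothetical helper: add `inc` to a 128-bit counter stored as
   t[0] (low word) and t[1] (high word). */
static void counter_add(uint64_t t[2], uint64_t inc)
{
#if defined(__SIZEOF_INT128__) && (__SIZEOF_INT128__ >= 16)
    /* Clang, GCC and ICC path: native 128-bit arithmetic. */
    __uint128_t v = ((__uint128_t)t[1] << 64) | t[0];
    v += inc;
    t[0] = (uint64_t)v;
    t[1] = (uint64_t)(v >> 64);
#else
    /* Portable path: detect the carry out of the low word. */
    t[0] += inc;
    t[1] += (t[0] < inc);
#endif
}
```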

BLAKE2b and BLAKE2s numbers are very respectable when testing under a VirtualBox VM. With a Core i5 4th gen running at 2.6 GHz capped at 92%, BLAKE2s achieved 4.86 cpb; while BLAKE2b achieved 3.83 cpb.

Here's the diff of the changes: BLAKE2-suncc.diff.zip. My apologies for not providing a Pull Request. Git never ceases to amaze me at how difficult it is to perform simple tasks.

Porting on SPARC CPU

Hello there,
is it possible to have a porting on SPARC CPU?

Here the cpucaps:
lush,stbar,swap,muldiv,v9,blkinit,n2,mul32,div32,fsmuld,v8plus,popc,vis,vis2,ASIBlkInit,fmaf,vis3,hpc,ima,pause,cbcond,adp,vis3b,pause-nsec,mwait,sparc5,vamask,aes,des,camellia,md5,sha1,sha256,sha512,mpmul,montmul,montsqr,crc32c,xmpmul,xmontmul,xmontsqr
Thanks,
Marco

Provide a reference implementation for BLAKE2sp and BLAKE2bp in Java

Could you please provide a reference implementation for BLAKE2sp and BLAKE2bp in the Java programming language?

cpuinfo under different architectures

I'm trying to get BLAKE2 on MIPSel and while I have compiled b2sum successfully, the bench program seems to be the annoying part. In particular, it's the cpucycles() function. Since this is MIPSel I cannot use the included cpuinfo headers. I tried replacing the asm code with a random integer but that just produced all 0s in the output.
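One possible stand-in for cpucycles() on a platform without a cycle counter is a monotonically advancing timer. The sketch below uses the standard clock() function; `cpucycles_fallback` and `ticks_to_ns` are hypothetical names, and the result is processor-time ticks rather than cycles, so the bench output would read as ticks per byte. A constant or unrelated value cannot work here because the benchmark depends on the difference between two readings.

```c
#include <stdint.h>
#include <time.h>

/* Portable stand-in for cpucycles(): processor-time ticks from clock(),
   not CPU cycles. Resolution is CLOCKS_PER_SEC ticks per second. */
static uint64_t cpucycles_fallback(void)
{
    return (uint64_t)clock();
}

/* Convert a tick difference to nanoseconds for reporting. */
static double ticks_to_ns(uint64_t ticks)
{
    return (double)ticks * 1e9 / (double)CLOCKS_PER_SEC;
}
```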

Having trouble getting b2sum to compile on OS X

I cannot get b2sum to compile on OS X (10.11.5 El Capitan). If I use the OS X command line compile tools installed with Xcode I see:

$ make
cc b2sum.c ../sse/blake2b.c ../sse/blake2s.c ../sse/blake2bp.c ../sse/blake2sp.c  -O3 -march=native -static -Werror=declaration-after-statement -std=c99 -I../sse -fopenmp  -o b2sum
clang: error: unsupported option '-fopenmp'
clang: error: unsupported option '-fopenmp'
clang: error: unsupported option '-fopenmp'
clang: error: unsupported option '-fopenmp'
clang: error: unsupported option '-fopenmp'
clang: error: unsupported option '-fopenmp'
make: *** [all] Error 1

If I try to use homebrew installed gcc I get:

$ export CC=/usr/local/bin/gcc-6
$ make
/usr/local/bin/gcc-6 b2sum.c ../sse/blake2b.c ../sse/blake2s.c ../sse/blake2bp.c ../sse/blake2sp.c  -O3 -march=native -static -Werror=declaration-after-statement -std=c99 -I../sse -fopenmp  -o b2sum
ld: library not found for -lcrt0.o
collect2: error: ld returned 1 exit status
make: *** [all] Error 1

Is this a bug, or am I doing it wrong?

:if ??

Looks to me like there's an erroneous ":" that's snuck into blake2sp-ref.c:

% make
cc blake2sp-ref.c blake2s-ref.c -o blake2sp -std=c99 -Wall -pedantic -DBLAKE2SP_SELFTEST
blake2sp-ref.c: In function ‘blake2sp’:
blake2sp-ref.c:191:3: error: expected expression before ‘:’ token
:if ( NULL == in && inlen > 0 ) return -1;
^
make: *** [blake2sp] Error 1

Include Blake2x

===> https://www.blake2.net/blake2x.pdf

npm-gyp fails to rebuild blake2

I'm trying to run .\node_modules\.bin\electron-rebuild.cmd
but this is what I get:

× Rebuild Failed

An unhandled error occurred inside electron-rebuild
node-gyp failed to rebuild 'C:\Users\sanfe\Desktop\Storytell\node_modules\blake2'.
For more information, rerun with the DEBUG environment variable set to "electron-rebuild".

Error: Could not find any Visual Studio installation to use



Error: node-gyp failed to rebuild 'C:\Users\sanfe\Desktop\Storytell\node_modules\blake2'.
For more information, rerun with the DEBUG environment variable set to "electron-rebuild".

Error: Could not find any Visual Studio installation to use


    at NodeGyp.rebuildModule (C:\Users\sanfe\Desktop\Storytell\node_modules\electron-rebuild\lib\src\module-type\node-gyp.js:109:19)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async ModuleRebuilder.rebuildNodeGypModule (C:\Users\sanfe\Desktop\Storytell\node_modules\electron-rebuild\lib\src\module-rebuilder.js:94:9)
    at async Rebuilder.rebuildModuleAt (C:\Users\sanfe\Desktop\Storytell\node_modules\electron-rebuild\lib\src\rebuild.js:226:9)
    at async Rebuilder.rebuild (C:\Users\sanfe\Desktop\Storytell\node_modules\electron-rebuild\lib\src\rebuild.js:184:17)
    at async C:\Users\sanfe\Desktop\Storytell\node_modules\electron-rebuild\lib\src\cli.js:154:9

I've been trying to fix this for days. Everything is correctly installed on the system, from Visual Studio to the C++ developer tools and Python 2.7. I can't figure out how to make it work.

Question: Where do IV values come from?

There are values called IVs used by Blake and I want to know more about them:

static const uint64_t blake2b_IV[8] =
{
  0x6a09e667f3bcc908ULL, 0xbb67ae8584caa73bULL,
  0x3c6ef372fe94f82bULL, 0xa54ff53a5f1d36f1ULL,
  0x510e527fade682d1ULL, 0x9b05688c2b3e6c1fULL,
  0x1f83d9abfb41bd6bULL, 0x5be0cd19137e2179ULL
};

static const uint32_t blake2s_IV[8] =
{
  0x6A09E667UL, 0xBB67AE85UL, 0x3C6EF372UL, 0xA54FF53AUL,
  0x510E527FUL, 0x9B05688CUL, 0x1F83D9ABUL, 0x5BE0CD19UL
};

What was the criterion/method used to establish them?

If I change them and use random values, will this compromise the security of Blake2?
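On the first question: as stated in RFC 7693, the BLAKE2s IV is the same as the SHA-256 IV and the BLAKE2b IV is the same as the SHA-512 IV, that is, the leading fractional bits of the square roots of the first eight primes (2, 3, 5, 7, 11, 13, 17, 19). The sketch below reproduces the 32-bit words with a bit-by-bit integer square root; `sqrt_frac32` is a hypothetical helper written for this illustration, not part of the package.

```c
#include <stdint.h>

/* First 32 fractional bits of sqrt(p) for small p (p < 64), computed
   with the bit-by-bit integer square root, so no floating point is used. */
static uint32_t sqrt_frac32(uint32_t p)
{
    uint64_t root = 0, rem = 0;
    int i;

    /* Three bit pairs cover the integer part for p < 64. */
    for (i = 2; i >= 0; i--) {
        rem = (rem << 2) | ((p >> (2 * i)) & 3u);
        root <<= 1;
        if (rem >= 2 * root + 1) { rem -= 2 * root + 1; root |= 1; }
    }
    /* Thirty-two zero pairs produce thirty-two fractional bits. */
    for (i = 0; i < 32; i++) {
        rem <<= 2;
        root <<= 1;
        if (rem >= 2 * root + 1) { rem -= 2 * root + 1; root |= 1; }
    }
    return (uint32_t)root;  /* low 32 bits are the fractional part */
}
```

Calling it on 2 and 19 gives the first and last entries of blake2s_IV quoted above.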

Undefined behavior in rotates

From blake2-impl.h. There are four of them similar to this:

static inline uint32_t rotr32( const uint32_t w, const unsigned c )
{
    return ( w >> c ) | ( w << ( 32 - c ) );
}

When c is 0, the left shift amount is 32 - 0 = 32. The valid range of shift amounts for a 32-bit operand is [0,31] inclusive, so shifting by 32 is undefined behavior.

Perhaps the following would better serve Blake's needs:

static inline uint32_t rotr32( const uint32_t w, const unsigned c )
{
    static const uint32_t MASK = sizeof(w)*8 - 1;
    return ( w >> c ) | ( w << (-c & MASK) );
}

The pattern is recognized by most of the major compilers, including Clang, GCC and Intel's ICC. It reduces to a single rotate instruction. For references, see:

My apologies if you were aware of the behavior or you don't call the function with a rotate amount of 0. (Branching to avoid a 0 rotate may introduce a side channel, so it's sometimes better to unconditionally make the call).
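The same masking pattern carries over to the 64-bit rotate used by BLAKE2b. This is a sketch of that counterpart, not the code in the repository; it also masks the right-shift amount so that counts of 64 or more stay defined.

```c
#include <stdint.h>

/* Rotate right by c bits without ever shifting by the full word width. */
static inline uint64_t rotr64(const uint64_t w, const unsigned c)
{
    const unsigned mask = 8 * sizeof(w) - 1;  /* 63 */
    return (w >> (c & mask)) | (w << (-c & mask));
}
```

With c equal to 0 both shift amounts are 0, so the result is simply w.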

Personalization misaligned in C#

Hi,
First off, thanks for the great work!
I've been looking over the code from both submissions and I noticed that a personalization value is not being added to the correct position within Blake2IvBuilder::ConfigB method.
The second half of the input just overwrites the first half at the same position within the rawConfig array:

// Personalization
...
rawConfig[6] = Blake2BCore.BytesToUInt64(config.Personalization, 0);
rawConfig[6] = Blake2BCore.BytesToUInt64(config.Personalization, 8); // should start at 56 bytes [7]

Diagonal shuffle optimization for BLAKE2s

This BLAKE2b optimization has a BLAKE2s counterpart, which was implemented by Sean Gulley at oconnor663/blake2_simd@e26796e. This repo might benefit from porting that change. (Same for https://github.com/BLAKE2/libb2, though I won't duplicate the issue.)

Questions about Blake2x: Its state size (internal state) and its security when generating keys with size more than 256/512bits

I read the Blake2x paper: https://www.blake2.net/blake2x.pdf

It says Blake2x can be used to build a "DRBG" (CSPRNG): https://csrc.nist.gov/glossary/term/deterministic_random_bit_generator

"An algorithm that produces a sequence of bits that are uniquely determined from an initial value called a seed. The output of the DRBG “appears” to be random, i.e., the output is statistically indistinguishable from random values. A cryptographic DRBG has the additional property that the output is unpredictable, given that the seed is not known. A DRBG is sometimes also called a Pseudo Random Number Generator (PRNG) or a deterministic random number generator."

===

I want to know if the initial state (state size) of Blake2x is "enlarged" when hashing, because I didn't understand this notation:

Although the internal state of BLAKE2 is 256/512 bits, can Blake2x be used to build a stream cipher (CSPRNG/DRBG) with more than 256/512 bits of security, given a seed of a larger size?

If I have a source full of entropy (like a high-resolution photo) and I hash it with Blake2x to a key of 8192 bits, for example, will I get key material of this size?

Can some Blake2 enthusiast answer my questions? (I tried to contact one of the Blake2x authors, Jean-Philippe Aumasson, but I got no response).

Tree test vectors missing

It would be nice to have test vectors for the "generic tree mode" mentioned in the whitepaper.

Segfault with clang + blake2bp/blake2sp

If you compile b2sum with clang-3.5 or clang-3.6, with the optimization level set to -O1, -O2, or -O3 (not -O0), running b2sum -a blake2bp FILE or b2sum -a blake2sp FILE will segfault quickly. The serial algorithms, blake2b and blake2s will work fine, though.

# ls -l ~/bigfile 
-rw-rw---- 1 at at 1,297,903,616 2015-04-30 19:27 /home/at/bigfile

# ./b2sum -a blake2bp ~/bigfile 
zsh: segmentation fault (core dumped)  ./b2sum -a blake2bp ~/bigfile

# valgrind ./b2sum -a blake2bp ~/bigfile
==54656== Memcheck, a memory error detector
==54656== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==54656== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==54656== Command: ./b2sum -a blake2bp /home/at/bigfile
==54656== 
==54656== Conditional jump or move depends on uninitialised value(s)
==54656==    at 0x43D146: __linkin_atfork (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x419963: ptmalloc_init.part.7 (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x419CED: malloc_hook_ini (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x46DDA2: _dl_get_origin (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x43D7EE: _dl_non_dynamic_init (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x43E5E7: __libc_init_first (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x4050C1: (below main) (in /home/at/BLAKE2/b2sum/b2sum)
==54656== 
==54656== Conditional jump or move depends on uninitialised value(s)
==54656==    at 0x415399: _int_free (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x4678B0: fillin_rpath (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x467F8E: _dl_init_paths (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x43DCCC: _dl_non_dynamic_init (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x43E5E7: __libc_init_first (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x4050C1: (below main) (in /home/at/BLAKE2/b2sum/b2sum)
==54656== 
==54656== Conditional jump or move depends on uninitialised value(s)
==54656==    at 0x4153FF: _int_free (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x4678B0: fillin_rpath (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x467F8E: _dl_init_paths (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x43DCCC: _dl_non_dynamic_init (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x43E5E7: __libc_init_first (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x4050C1: (below main) (in /home/at/BLAKE2/b2sum/b2sum)
==54656== 
==54656== Conditional jump or move depends on uninitialised value(s)
==54656==    at 0x414C08: malloc_consolidate (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x416930: _int_malloc (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x41894C: malloc (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x40135A: blake2bp_stream (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x401590: main (in /home/at/BLAKE2/b2sum/b2sum)
==54656== 
==54656== 
==54656== Process terminating with default action of signal 11 (SIGSEGV)
==54656==  General Protection Fault
==54656==    at 0x40169C: blake2b_init_param (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x404078: blake2bp_init_leaf (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x403F8F: blake2bp_init (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x40137D: blake2bp_stream (in /home/at/BLAKE2/b2sum/b2sum)
==54656==    by 0x401590: main (in /home/at/BLAKE2/b2sum/b2sum)
==54656== 
==54656== HEAP SUMMARY:
==54656==     in use at exit: 0 bytes in 0 blocks
==54656==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==54656== 
==54656== All heap blocks were freed -- no leaks are possible
==54656== 
==54656== For counts of detected and suppressed errors, rerun with: -v
==54656== Use --track-origins=yes to see where uninitialised values come from
==54656== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
zsh: segmentation fault (core dumped)  valgrind ./b2sum -a blake2bp ~/bigfile

# uname -a
Linux lindev 3.13.0-49-generic #83-Ubuntu SMP Fri Apr 10 20:11:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Any plan to make a new release?

Hi,

Is there any plan to make a new release, e.g. 20190724?
Thanks.

different read blocksize result in failed checksum

I have a simple Python program for checking data integrity during copy operations.

tools used:

Blake2 implementation in Python 3.x on macOS X 10.14.2
coreutils-blake2 from brew ports on macOS X 10.14.2
Debian Linux 2.6.x coreutils-blake2

Since large files cannot be read without a read buffer, I have to read them in chunks with read(BLOCKSIZE). I used to detect the optimal blocksize automatically for each file system. For example, my Apple MBP uses a 1 MB blocksize for its SSD.

Unfortunately, various filesystems have various blocksizes. Whenever a different blocksize is used to create the checksum and then to re-check it, the checksum FAILS!

Real life example:

50GB tar archive split into 2GB chunks. Before tar I have run Python script on whole directory tree in order to check data integrity after copying to server.

After uploading the chunks, rejoining them and using the official b2sum tool, all files failed. I then used the old initial checksums made by shasum, and all files are OK.

I do not know exactly how Blake2 works, and I like its speed, but this blocksize issue is weird: even if you read with various blocksizes, the final checksum must always be the same, right?
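For a streaming hash the digest should indeed depend only on the concatenated bytes, not on how they are split across update calls. The sketch below illustrates that property with 64-bit FNV-1a as a small stand-in (it is not BLAKE2, and the function names are made up for this example); the same loop shape applies to any init/update/final interface.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t h; } fnv_state;

/* 64-bit FNV-1a, used here only as a tiny stand-in streaming hash. */
static void fnv_init(fnv_state *s) { s->h = 14695981039346656037ULL; }

static void fnv_update(fnv_state *s, const uint8_t *p, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) { s->h ^= p[i]; s->h *= 1099511628211ULL; }
}

/* Hash `len` bytes, feeding at most `chunk` bytes per update call.
   `chunk` must be greater than zero. */
static uint64_t hash_chunked(const uint8_t *data, size_t len, size_t chunk)
{
    fnv_state s;
    size_t off = 0;
    fnv_init(&s);
    while (off < len) {
        size_t n = (len - off < chunk) ? len - off : chunk;
        fnv_update(&s, data + off, n);
        off += n;
    }
    return s.h;
}
```

If two runs over the same bytes disagree, the chunk size is unlikely to be the cause; comparing the algorithm variant, digest length and the exact bytes fed in is a more promising place to look.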

b2sum actually slower than crc32/md5?

I got the AUR package compiled with -march=native on an AMD Phenom II X4 970, and testing a 1.9 GB file I get the following results with time:

2.064s for crc32
3.310s for md5sum
6.590s for b2sum

Since it's supposed to be faster than md5, I must be doing something wrong.

Please provide function to init with both key and params

I don't see how to use salt, personalization etc in combination with keyed hashing, without copy-pasting the memset/memcpy logic inside blake2b_init_key. Am I missing something that should have been obvious?

blake2b_init doesn't take a key.
blake2b_init_key forces params.
blake2b_init_param looks like a lower-level helper, and I don't see a clean, non copy-paste, way of getting the key update done after it.

I'm mostly interested in the SSE variant.

Please provide a reference implementation for ARM NEON

Hi Gentlemen,

This is more a feature request than anything else. BLAKE2b is being cut-in on 32-bit ARM platforms based on code from @sneves and @veorq. The code is production quality, and its numbers are very good, and usually somewhere in the ballpark of a 3x speedup.

Here are the numbers copied from a request for Jack Lloyd's Botan. The "NEON implementation" is a modified version of BLAKE2's intrinsic implementation for ARM.

... The dev-boards used for testing were a BeagleBoard v3 (Cortex-A8), Banana Pi (Cortex-A7) and CubieTruck v5 (Cortex-A7). The BeagleBoard was configured with -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=hard; and the CubieTruck was configured with -march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard.

Here are the relative numbers:

BeagleBoard, CXX implementation (0.95 GiHz)
$ ./botan speed --msec=3000 Blake2b
Blake2b(512) [base] hash 11.721 MiB/sec (35.164 MiB in 3000.092 ms)
BeagleBoard, NEON implementation (0.95 GiHz)
$ ./botan speed --msec=3000 Blake2b
Blake2b(512) [base] hash 31.662 MiB/sec (94.988 MiB in 3000.038 ms)
BananaPi, CXX implementation (0.96 GiHz)
$ ./botan speed --msec=3000 Blake2b
Blake2b(512) [base] hash 15.769 MiB/sec (47.309 MiB in 3000.182 ms)
BananaPi, NEON implementation (0.96 GiHz)
$ ./botan speed --msec=3000 Blake2b
Blake2b(512) [base] hash 41.872 MiB/sec (125.617 MiB in 3000.044 ms)
CubieTruck, CXX implementation (1.7 GiHz)
$ ./botan speed --msec=3000 Blake2b
Blake2b(512) [base] hash 27.119 MiB/sec (81.359 MiB in 3000.123 ms)
CubieTruck, NEON implementation (1.7 GiHz)
$ ./botan speed --msec=3000 Blake2b
Blake2b(512) [base] hash 70.449 MiB/sec (211.348 MiB in 3000.020 ms)

Support for Apple Silicon on macOS (mach-o ARM64)

Can you look to support M1/Apple Silicon?

OID for blake2s?

Are there existing OIDs for blake2s with RSA (pkcs1-1.5 or RSA-PSS)?

Invalid JSON format for test vectors

The last array member in blake2-kat.json ends with a trailing comma, which is not allowed in JSON, so conforming parsers can't parse it.

Keyed hash test case for an independent implementation

Hi, I am implementing blake2b/2s-based keyed hashing in the cryptographic library raaz (https://github.com/piyush-kurur/raaz/tree/mac). The test cases seem to be working except the one for the empty string, which means that there is some issue with the way empty strings are handled in keyed hashing.

The standard says the following: The first block is the key padded with necessary 0-bytes. For an empty string the standard says that it will only compress this block.

What should be the length in this case (key size or the block size)?
What should be the value of the finalisation flag? Should it be that of the final block or that of the initial blocks?
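Going by the pseudocode in RFC 7693, the final compression receives the message length plus one full block when a key is present, and the finalisation flag is set on the last block only. For a keyed empty message the padded key block is therefore both the first and the last block. The sketch below encodes just that bookkeeping for BLAKE2b; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BLAKE2B_BLOCKBYTES 128

/* Byte counter passed to the final compression: the message length,
   plus one whole block if a padded key block was prepended. */
static uint64_t final_counter(size_t msg_len, size_t key_len)
{
    return (uint64_t)msg_len + (key_len > 0 ? BLAKE2B_BLOCKBYTES : 0);
}

/* Number of compression calls: the key block (if any) plus the message
   blocks. An unkeyed empty message still takes one call. Only the last
   call has the finalisation flag set. */
static size_t block_count(size_t msg_len, size_t key_len)
{
    size_t dd = (key_len > 0 ? 1 : 0)
              + (msg_len + BLAKE2B_BLOCKBYTES - 1) / BLAKE2B_BLOCKBYTES;
    return dd == 0 ? 1 : dd;
}
```

For BLAKE2s the same reasoning would apply with a 64-byte block.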

Performance degradation on ARM Cortex-M3 with NATIVE_LITTLE_ENDIAN defined

ARM Cortex-M3 is little-endian, and with NATIVE_LITTLE_ENDIAN I expected performance to increase. However, it decreases instead.
For 1M cycles of hashing a 32-byte message on Teensy 3 (ARM Cortex-M3 @ 72 MHz) the following results were achieved:
blake2s without NATIVE_LITTLE_ENDIAN - 59968 ms
blake2s with NATIVE_LITTLE_ENDIAN - 79803 ms

The problem is in memcpy: it is slow since it is designed to handle any alignment of source and destination.
In cases where you control the alignment of the source and destination, a simple 32-bit assignment gives much better performance.
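The three ways of loading a message word that this report contrasts can be sketched as follows. The function names are illustrative rather than the ones in blake2-impl.h, and which variant is fastest on a given core is something only a measurement like the one above can settle.

```c
#include <stdint.h>
#include <string.h>

/* memcpy-based load: valid for any alignment, but yields the expected
   value only on little-endian hosts. */
static uint32_t load32_memcpy(const void *src)
{
    uint32_t w;
    memcpy(&w, src, sizeof w);
    return w;
}

/* Byte-wise little-endian load: independent of host endianness and of
   the alignment of src. */
static uint32_t load32_bytes(const void *src)
{
    const uint8_t *p = (const uint8_t *)src;
    return ((uint32_t)p[0] <<  0) | ((uint32_t)p[1] <<  8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* If the caller already holds the block as aligned 32-bit words on a
   little-endian host, plain word assignments are enough. */
static void load_block_words(uint32_t m[16], const uint32_t words[16])
{
    int i;
    for (i = 0; i < 16; i++) m[i] = words[i];
}
```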

Unify libb2 and the main blake2 repo?

It seems that the code in libb2 is basically the same as in the main repository, with some modest type and style changes. With the addition of blake2x, or the C89 compatibility fixes, the two repositories continue to diverge further.

I'd like to encourage building a libb2 from the main repository. This has the advantage of not having to do stuff twice, and having one upstream project that can be fixed from a downstream perspective. You could even build a static libb2 and then link it with b2sum.

blake2

I tried to introduce the blake2 module, but the project reported an error and I couldn't find the binding module.

Update binaries at blake2.net

Please, update the b2sum binaries at blake2.net. The ones out there are 4.5 years old.

Specifically, I came here because I need the -L option in the Windows 32- and 64-bit executables. I see that it's present in the sources of the b2sum utility since about 2 years ago. Unfortunately, I don't have GCC on my Windows machine, and my very limited experience tells me it's a huge time sink to get MinGW/Cygwin toolchain working.

I tried borrowing b2sum.exe from Git for Windows 32- and 64-bit editions. They do provide the -L option, but my performance tests show two things: 1) 64-bit build is about 4% slower than the b2sum from blake2.net; 2) the 32-bit build is about 288% (almost three times) slower than the b2sum from blake2.net. Yeah, they really messed up the 32-bit build somehow.

If there is anyone kind enough to share a set of pre-built Windows executables of b2sum for 32- and 64-bits, I would be most grateful!

Detect PARALLELISM_DEGREE at runtime.

Currently PARALLELISM_DEGREE is hardcoded at 8, which is not a bad guess, but it's certainly no more than that - a guess.

I think we should be able to do better than that, at runtime. A good starting point in code could be this SO answer: http://stackoverflow.com/a/3082553/565635
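On POSIX-like systems the number of online cores can be queried as in the sketch below; `online_cpus` is a hypothetical helper and Windows would need a different call. Note that in BLAKE2bp and BLAKE2sp the number of leaves is part of the parameter block, so I would expect a different degree to change the digest; a count like this seems better suited to choosing how many worker threads to run.

```c
#include <unistd.h>

/* Hypothetical helper (POSIX-like systems only): number of online
   cores, or 1 if the value cannot be determined. */
static long online_cpus(void)
{
#ifdef _SC_NPROCESSORS_ONLN
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
#else
    return 1;
#endif
}
```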

Question: What is the maximum key length that BLAKE2x can process?

In this document: https://www.blake2.net/blake2x.pdf

It is said that the key length is 256 bytes, but can BLAKE2B process keys larger than this?

Autotools configure.ac and Makefile.am

Hi Everyone/Samuel,

PR #64 has configure.ac and Makefile.am for Autotools. The BLAKE2 project is free to rip them. They were placed in public domain.

I found it easier to use Autotools rather than patching the makefiles and sources. Distros like Debian and Fedora should enjoy it, too. Autotools makes it very easy for a distro to uptake the project.

To use the files, unpack the zip file in the root of the BLAKE2 release directory. The root of the BLAKE2 release directory will have b2sum/, bench/ and friends below it. Then run:

$ autoreconf --install --force
$ ./configure
$ make
$ make check
$ make install

There's a configure option for regular users (non-distro users): --enable-native. It attempts to find the best compiler options for the native platform. On x86_64 it would be -march=native. On arm it would be -march=armv7 -mfpu=neon if available (which it usually is). It is off-by-default so distros don't accidentally distribute files built for the wrong arch (i.e., SSE4 or AVX on x86_64).

The SSE sources are used when appropriate. That currently includes i686 and x86_64. NEON sources are also used when appropriate.

Configure produces three or four artifacts:

b2sum
blake2-genkat-c
blake2-genkat-json
blake2-bench (x86 and x64 only)

The self tests are built and run during make check but they are not installed. For installation, the self tests would need to be wired-up to find their KATs in $ORIGIN/../share/b2sum/testvectors/. That has not happened.

You can create a folder like contrib/autotools, and drop them in the directory. You could also drop the CMake files into the contrib/cmake folder. Add a README that says things in contrib/ are not officially supported.

'b2sum' names conflict

Hello.

The b2sum binary uses the same name as the b2sum tool provided by coreutils (widely popular on GNU/Linux distributions); this creates a name conflict inside the /usr/bin directory.

Any chance to rename blake2's b2sum?

Status of blake2.net implementation list

Hello,

I recently completed a SPARK83/Ada 87 (https://github.com/lkujaw/ada-blake2) package of the BLAKE2s hash function, and I was wondering if implementations are still being validated for the list on https://www.blake2.net/ .

Apologies for asking questions on the issue tracker, but connections to the blake2.net email server are timing out for me.

Kind regards, Lev

The usage of tree hashing in .NET (C# version)

Hi!

I'm not sure if this is the right place to ask, but is the C# implementation of tree hashing useable? Looking at the repo, the tree specific parts seem to be commented out. Is the implementation missing?

I have another question too – and I may be misguided here – considering the following piece of demo code:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication1
{
    public struct Node
    {
        //The payload over which to calculate the hash of this node.
        public byte[] Payload;

        //The hash from the leaves up until now.
        //this.HashOfThisNodePayloadAndSubNodes = Hash(Hash(Payload).Concat(SubNodes.Concat(HashOfThisNodePayloadAndSubNodes)).
        public UInt32 HashOfThisNodePayloadAndSubNodes;

        //The subnodes in this tree.
        public List<Node> SubNodes;
    }


    class Program
    {
        static void Main(string[] args)
        {
            var root = new Node
            {
                Payload = Enumerable.Range(1, 5).Select(i => (byte)i).ToArray(),
                SubNodes = new List<Node>(new[]
                {
                    new Node
                    {
                        Payload = Enumerable.Range(1, 5).Select(i => (byte)i).ToArray()
                    },
                    new Node
                    {
                        Payload = Enumerable.Range(1, 5).Select(i => (byte)i).ToArray()
                    }
                })
            };

            //Puzzled here, should a function be called? How could the algorithm know
            //how to traverse the tree? There must be something else going on here...
            //var hashTree = ...
        }
    }
}

Would these assumptions about creating a hash tree be correct in general? If so, how could it be done using Blake2b, assuming the C# implementation works? I also tried to look at the C implementation for an example, and if there were tree-specific bits, I managed to miss them. There might be a .NET implementation somewhere too, but I haven't managed to find one.

Feature request: Add -c argument to b2sum

Take md5sum for instance, where you can save the results to a text file and use the -c argument to recheck the hashes. This simplifies the task of verifying file integrity, especially when dealing with a large number of files and folders.

Kindly request you add this feature to b2sum. It should,

Behave like -c argument in md5sum, sha1sum, etc.
Support subdirectory scan.
Able to auto detect UTF-8 characters in file and folder names. This is important since saved text file probably won't have BOM header.
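The core of a check mode is parsing lines in the md5sum layout: a hex digest, a space, then a space or `*`, then the file name. The sketch below shows one way that step could be written; `parse_check_line` is a hypothetical helper, not an existing b2sum function, and it handles neither escaped file names nor recursion into subdirectories.

```c
#include <stddef.h>
#include <string.h>

/* Split one checksum line of the form "<hex digest>  <file name>" in
   place. On success returns 0 and points *digest and *name into the
   line; returns -1 if the line is malformed. */
static int parse_check_line(char *line, char **digest, char **name)
{
    size_t hexlen = strspn(line, "0123456789abcdefABCDEF");

    if (hexlen == 0 || line[hexlen] != ' ') return -1;
    if (line[hexlen + 1] != ' ' && line[hexlen + 1] != '*') return -1;

    line[hexlen] = '\0';                      /* terminate the digest */
    *digest = line;
    *name = line + hexlen + 2;
    (*name)[strcspn(*name, "\r\n")] = '\0';   /* strip the line ending */
    return **name != '\0' ? 0 : -1;
}
```

The file name is kept as raw bytes, which sidesteps encoding detection: a UTF-8 name read from the list is passed to the file system unchanged.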
