xxhash's Introduction

This is a native Go implementation of the excellent xxhash* algorithm, an extremely fast non-cryptographic hash algorithm that runs at speeds close to RAM limits.

  • *The original C implementation is Copyright (c) 2012-2014, Yann Collet.

Install

go get github.com/OneOfOne/xxhash

Features

  • On Go 1.7+, the pure Go version is faster than CGO for all inputs.
  • Supports ChecksumString{32,64} and XXHash{32,64}.WriteString, which avoid copying when possible and fall back to copying on App Engine.
  • The native version falls back to a less optimized version on App Engine because unsafe is not available there.
  • Almost as fast as the mostly pure assembly version written by the brilliant cespare, while also supporting seeds.
  • To force the App Engine version manually, build with -tags safe.

Benchmark

Core i7-4790 @ 3.60GHz, Linux 4.12.6-1-ARCH (64bit), Go tip (+ff90f4af66 2017-08-19)

➤ go test -bench '64' -count 5 -tags cespare | benchstat /dev/stdin
name                          time/op

# https://github.com/cespare/xxhash
XXSum64Cespare/Func-8          160ns ± 2%
XXSum64Cespare/Struct-8        173ns ± 1%
XXSum64ShortCespare/Func-8    6.78ns ± 1%
XXSum64ShortCespare/Struct-8  19.6ns ± 2%

# this package (default mode, using unsafe)
XXSum64/Func-8                 170ns ± 1%
XXSum64/Struct-8               182ns ± 1%
XXSum64Short/Func-8           13.5ns ± 3%
XXSum64Short/Struct-8         20.4ns ± 0%

# this package (appengine, *not* using unsafe)
XXSum64/Func-8                 241ns ± 5%
XXSum64/Struct-8               243ns ± 6%
XXSum64Short/Func-8           15.2ns ± 2%
XXSum64Short/Struct-8         23.7ns ± 5%

CRC64ISO-8                    1.23µs ± 1%
CRC64ISOString-8              2.71µs ± 4%
CRC64ISOShort-8               22.2ns ± 3%

Fnv64-8                       2.34µs ± 1%
Fnv64Short-8                  74.7ns ± 8%

Usage

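	// Imports used: fmt, io, strings, github.com/OneOfOne/xxhash
	h := xxhash.New64()
	// To hash a file instead of a string:
	// r, err := os.Open("......")
	// if err != nil { log.Fatal(err) }
	// defer r.Close()
	r := strings.NewReader(F) // F is the input string; any io.Reader works
	if _, err := io.Copy(h, r); err != nil {
		panic(err)
	}
	fmt.Println("xxhash.Backend:", xxhash.Backend)
	fmt.Println("File checksum:", h.Sum64())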

Playground: https://play.golang.org/p/rhRN3RdQyd

TODO

  • Rewrite the 32-bit version to be more optimized.
  • General cleanup as the Go inliner gets smarter.

License

This project is released under the Apache v2 license. See LICENSE for more details.

xxhash's People

Contributors

andrewkroh, asellappen, dlsniper, edwardbetts, johnaoss, millerkil, oneofone, shibukawa, virrages

xxhash's Issues

uint64 too large

Thanks for the fantastic xxhash lib!

I ran into this problem on a 10-year-old Atom netbook running Fedora 26. I am wondering whether it is a Go compiler problem, a library-specific problem, or a my-machine-is-too-old problem (e.g. the whole "32 bits physical, 32 bits virtual" thing). I am fairly confident that it is architecture-dependent. It is not critical, but I am curious what is going on.

λ git describe
v1.2-17-g7606720

λ go build
# github.com/OneOfOne/xxhash
./xxhash_unsafe.go:53: type [2147483647]uint64 larger than address space
./xxhash_unsafe.go:53: type [2147483647]uint64 too large
./xxhash_unsafe.go:110: type [2147483647]uint64 larger than address space
./xxhash_unsafe.go:110: type [2147483647]uint64 too large
./xxhash_unsafe.go:211: type [2147483647]uint64 larger than address space
./xxhash_unsafe.go:211: type [2147483647]uint64 too large

λ uname -a
Linux avaps-demo 4.13.9-200.fc26.i686+PAE #1 SMP Mon Oct 23 14:13:49 UTC 2017 i686 i686 i386 GNU/Linux

λ free -h
              total        used        free      shared  buff/cache   available
Mem:           2.0G        357M        646M         59M        1.0G        1.5G
Swap:            0B          0B          0B

λ cat /etc/redhat-release
Fedora release 26 (Twenty Six)

λ cat /proc/cpuinfo
processor       : 0
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 28
  model name      : Intel(R) Atom(TM) CPU N280   @ 1.66GHz
  stepping        : 2
  microcode       : 0x218
  cpu MHz         : 1666.479
  cache size      : 512 KB
  physical id     : 0
  siblings        : 2
  core id         : 0
  cpu cores       : 1
  apicid          : 0
  initial apicid  : 0
  fdiv_bug        : no
  f00f_bug        : no
  coma_bug        : no
  fpu             : yes
  fpu_exception   : yes
  cpuid level     : 10
  wp              : yes
  flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon pebs bts cpuid aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 xtpr pdcm movbe lahf_lm dtherm
  bugs            :
  bogomips        : 3332.95
  clflush size    : 64
  cache_alignment : 64
  address sizes   : 32 bits physical, 32 bits virtual
  power management:

processor       : 1
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 28
  model name      : Intel(R) Atom(TM) CPU N280   @ 1.66GHz
  stepping        : 2
  microcode       : 0x219
  cpu MHz         : 1666.479
  cache size      : 512 KB
  physical id     : 0
  siblings        : 2
  core id         : 0
  cpu cores       : 1
  apicid          : 1
  initial apicid  : 1
  fdiv_bug        : no
  f00f_bug        : no
  coma_bug        : no
  fpu             : yes
  fpu_exception   : yes
  cpuid level     : 10
  wp              : yes
  flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon pebs bts cpuid aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 xtpr pdcm movbe lahf_lm dtherm
  bugs            :
  bogomips        : 3332.82
  clflush size    : 64
  cache_alignment : 64
  address sizes   : 32 bits physical, 32 bits virtual
  power management:

TestSum64 test fails on bigendian

I am running the test using go test -v on my s390x system. Logs are below:

=== RUN   Test
--- PASS: Test (0.00s)
        xxhash_test.go:34: xxhash backend: GoUnsafe
        xxhash_test.go:35: Benchmark string len: 2235
=== RUN   TestHash32
--- PASS: TestHash32 (0.00s)
=== RUN   TestHash32Short
--- PASS: TestHash32Short (0.00s)
=== RUN   TestWriteStringNil
--- PASS: TestWriteStringNil (0.01s)
=== RUN   TestSum64
--- FAIL: TestSum64 (0.00s)
        xxhash_test.go:99: [i=4,chunkSize=1] got 0x84140adb817143db; want 0x415872f599cea71e
FAIL
exit status 1
FAIL    github.com/OneOfOne/xxhash 0.035s

I think the function New64() produces a different result on big-endian systems. Can someone help me with this?

New tag

Hello,

It would be useful to create a new tag that includes 2c166c6.

The Debian package for xxhash relies on tags for its versioning, so it needs a new tag to pick up this fix.

The new tag will help us fix the build errors on mips for Debian.

Playground doesn't work

Hi, I followed the link in the README to https://play.golang.org/p/rhRN3RdQyd, but running the code leads to an error:

go: finding github.com/OneOfOne/xxhash v1.2.5
go: downloading github.com/OneOfOne/xxhash v1.2.5
go: extracting github.com/OneOfOne/xxhash v1.2.5
build command-line-arguments: cannot load github.com/OneOfOne/xxhash/native: module github.com/OneOfOne/xxhash@latest found (v1.2.5), but does not contain package github.com/OneOfOne/xxhash/native

Panic occasionally with unsafe version of writeString()

I wrote something like this in my project:

    h := xxhash.New32()
    h.WriteString(param)  // please notice that |param| was passed by value, not pointer

And I occasionally got a panic like this:

WARN[0141] panic occurred: runtime error: invalid memory address or nil pointer dereference  func=runtime.panicmem line=62
WARN[0141] goroutine 1519 [running]:
framework/logger.GetStacktraces(0xc42052ba40, 0xc4203eebf0)
	/Users/foo/gohome/src/framework/logger/logger.go:29 +0x74
afsserver/user.catchReturnErr(0xc4203ef688)
	/Users/foo/gohome/src/afsserver/user/helpers.go:34 +0x1d9
panic(0x6215c0, 0xc4200140e0)
	/usr/local/Cellar/go/1.7.3/libexec/src/runtime/panic.go:458 +0x243
github.com/OneOfOne/xxhash.writeString(0x9890a0, 0xc420493a70, 0x0, 0x0, 0x699580, 0xc4203eee01, 0xc420493a70)
	/Users/foo/gohome/src/github.com/OneOfOne/xxhash/xxhash_unsafe.go:29 +0x27
github.com/OneOfOne/xxhash.(*XXHash32).WriteString(0xc420493a70, 0x0, 0x0, 0x1, 0xc4202b4248, 0x1)
	/Users/foo/gohome/src/github.com/OneOfOne/xxhash/xxhash.go:79 +0x4b
... omitted stack

If I change the code to:

    h := xxhash.New32()
    h.Write([]byte(param))

the panic disappears.

new release?

Hello,

I'm looking at v1.2...master and wondering whether it's time for a new release/tag, so that dep could pick up the newer version. What do you think?

OneOfOne and cespare xxhash implementations give different results

It appears that when using the Write interface for multiple blocks of data, this xxhash and the @cespare xxhash produce different results. For a single block they do agree.

The following test program demonstrates the issue:

package hash_test

import (
	"testing"

	xx2 "github.com/OneOfOne/xxhash"
	xx1 "github.com/cespare/xxhash"
)

var lines = [][]byte{
	[]byte("Lorem ipsum dolor sit amet, consectetuer adipiscing elit."),
	[]byte("Aenean commodo ligula eget dolor. Aenean massa."),
}

func TestHashImplementations(t *testing.T) {
	h1 := xx1.New()
	h2 := xx2.New64()
	for i, line := range lines {
		s1 := xx1.Sum64(line)
		s2 := xx2.Checksum64(line)
		if s1 != s2 {
			t.Errorf("line %d single: %x != %x\n", i, s1, s2)
		}
		h1.Write(line)
		h2.Write(line)
		var b1, b2 [8]byte
		copy(b1[:], h1.Sum(nil))
		copy(b2[:], h2.Sum(nil))
		if b1 != b2 {
			t.Errorf("line %d composite: %x != %x\n", i, b1, b2)
		}
	}
}

The output is:

--- FAIL: TestHashImplementations (0.00s)
	hash2_test.go:30: line 1 composite: b70cd033547f374d != fefd5c3b483cdc2a
FAIL

runtime error: unsafe pointer conversion

Hi @OneOfOne

I see this error when running with the race detector using this version of Go:

go version devel +3c0fbee Tue Nov 5 05:22:07 2019 +0000 linux/amd64

2019/11/05 10:30:22 http: panic serving host:41960: runtime error: unsafe pointer conversion
goroutine 28 [running]:
net/http.(*conn).serve.func1(0xc0003701e0)
/home/ubuntu/repos/go/src/net/http/server.go:1772 +0x147
panic(0xe59c20, 0xc0001d4140)
/home/ubuntu/repos/go/src/runtime/panic.go:967 +0x396
github.com/OneOfOne/xxhash.(*XXHash64).WriteString(0xc000306060, 0xc0001da0f0, 0xa, 0x7fef142c9028, 0xc000306060, 0x1)
/home/ubuntu/go/pkg/mod/github.com/!one!of!one/xxhash@…/xxhash_unsafe.go:52 +0x90
io.WriteString(0x111c9c0, 0xc000306060, 0xc0001da0f0, 0xa, 0x7fef1a51b008, 0xc0001e2760, 0x7fef142c9008)
/home/ubuntu/repos/go/src/io/io.go:291 +0x82
strings.(*Reader).WriteTo(0xc0001d4120, 0x111c9c0, 0xc000306060, 0x7fef142c9008, 0xc0001d4120, 0x1)
/home/ubuntu/repos/go/src/strings/reader.go:137 +0x141
io.copyBuffer(0x111c9c0, 0xc000306060, 0x111d3a0, 0xc0001d4120, 0x0, 0x0, 0x0, 0x5dc14f3e, 0xc0001da0f0, 0xa)
/home/ubuntu/repos/go/src/io/io.go:387 +0x47f
io.Copy(...)
/home/ubuntu/repos/go/src/io/io.go:364
main.hashString(0xc0001da0f0, 0xa, 0x0)
/home/ubuntu/repos/code/main.go:713 +0x288
main.main.func2(0x11278a0, 0xc0002ea8c0, 0xc0001c2400)
/home/ubuntu/repos/code/main.go:1531 +0x90b
net/http.HandlerFunc.ServeHTTP(0xedde38, 0x11278a0, 0xc0002ea8c0, 0xc0001c2400)
/home/ubuntu/repos/go/src/net/http/server.go:2012 +0x52
net/http.(*ServeMux).ServeHTTP(0x1629300, 0x11278a0, 0xc0002ea8c0, 0xc0001c2400)
/home/ubuntu/repos/go/src/net/http/server.go:2387 +0x289
net/http.serverHandler.ServeHTTP(0xc0002ea620, 0x11278a0, 0xc0002ea8c0, 0xc0001c2400)
/home/ubuntu/repos/go/src/net/http/server.go:2807 +0xcf
net/http.(*conn).serve(0xc0003701e0, 0x112a7e0, 0xc0001a11c0)
/home/ubuntu/repos/go/src/net/http/server.go:1895 +0x838
created by net/http.(*Server).Serve
/home/ubuntu/repos/go/src/net/http/server.go:2932 +0x5c0
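Both panics above come from the unsafe WriteString path. In the first trace, WriteString receives a string argument of (0x0, 0x0), i.e. an empty string, and an empty string's data pointer may be nil. The second appears to come from the pointer checks (checkptr) that -race enables in newer Go versions. Below is a minimal sketch of a zero-copy string-to-bytes view that avoids both problems, assuming Go 1.20 or newer: it uses unsafe.StringData and unsafe.Slice, which those checks understand, plus an explicit empty-string guard. The helper name stringBytes is hypothetical, and this is not the package's implementation.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringBytes returns a read-only []byte view of s without copying.
// It needs Go 1.20+ for unsafe.StringData. The caller must never modify
// the result, since Go strings are immutable.
func stringBytes(s string) []byte {
	if len(s) == 0 {
		// An empty string's data pointer may be nil; never dereference it.
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := stringBytes("www.")
	fmt.Println(len(stringBytes("")), string(b)) // 0 www.
}
```

A view like this is only safe to pass to a Write method that neither modifies nor retains the slice. That holds for a hash's Write, but not for io.Writer in general. The h.Write([]byte(param)) workaround from the earlier report is always safe because it copies.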

Streaming support

Is there a way to hash a large amount of data without loading it all into RAM at once? Thanks

Wrong checksum if size >= 2,147,483,647

After updating Go and pulling the latest code from git, we encountered wrong checksums for data larger than 2,147,483,647 bytes.

With the following code:

package main

import (
	"flag"
	"fmt"
	"io"
	"log"
	"os"

	"github.com/OneOfOne/xxhash"
)

func main() {
	var name string
	flag.StringVar(&name, "filename", "", "file to hash")
	flag.Parse()

	t, err := os.Open(name)
	if err != nil {
		log.Fatal(err)
	}
	defer t.Close()

	h := xxhash.New64()
	if _, err := io.Copy(h, t); err != nil {
		log.Fatal(err)
	}
	fmt.Print(h.Sum64())
}

We compared against the Python version, which works correctly, on random data (dd if=/dev/urandom of=test-rd count=1048576 bs=2048, i.e. 2,147,483,648 bytes) and got the following:

  • 11325124925539256852 (python3.4.3 xxhash)
  • 11833478386888335343 (golang1.6.1)

Running the dd command again:

  • 430381004049939737 (python3)
  • 11833478386888335343 (golang1.6)

With one block fewer (dd if=/dev/urandom of=test-rd count=1048575 bs=2048, i.e. 2,147,481,600 bytes), the checksums are correct:
1705419376331282536 (for both)
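In the runs above, the failing file is 1048576 × 2048 = 2,147,483,648 bytes (2^31) and the passing one is 2,147,481,600 bytes. The breakage therefore starts just past math.MaxInt32. That pattern is consistent with, though not proof of, a running length kept in a 32-bit signed integer. XXH64 adds the total input length into the final hash, so a wrapped counter changes the result. A negative length could also cause data to be skipped, which would fit the Go output being identical for two different random files. The hypothetical helpers below only demonstrate the wrap.

```go
package main

import "fmt"

// Hypothetical running-length counters for a streaming hasher that receives
// several Write calls of the given sizes.
func totalLen32(writes []int) int32 {
	var total int32
	for _, n := range writes {
		total += int32(n) // wraps to a negative value past math.MaxInt32
	}
	return total
}

func totalLen64(writes []int) uint64 {
	var total uint64
	for _, n := range writes {
		total += uint64(n) // 64-bit counter: no wrap for realistic inputs
	}
	return total
}

func main() {
	writes := []int{1 << 30, 1 << 30} // 2 GiB delivered in two writes
	fmt.Println(totalLen32(writes), totalLen64(writes)) // -2147483648 2147483648
}
```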

multiple writes and native returns bad hash ?

Hi,

Am I doing something wrong?

Native and non-native return different hashes in this test:

// note: fails with go version go1.4.2 windows/amd64

package xxhash_test

import (
    "fmt"
    "testing"

    C "github.com/OneOfOne/xxhash"
    N "github.com/OneOfOne/xxhash/native"
)

func TestERR(t *testing.T) {
    t.Logf("test with NATIVE")
    h := N.New64()

    //
    p1 := "http"
    p2 := "://"
    p3 := "www.marmiton.org"
    p4 := "/recettes/recherche.aspx"
    p5 := "?st=2&aqt=gateau&"

    url := p1 + p2 + p3 + p4 + p5

    // compute hash by parts
    h.Write([]byte(p1))
    h.Write([]byte(p2))
    h.Write([]byte(p3))
    h.Write([]byte(p4))
    h.Write([]byte(p5))
    s1 := h.Sum64()

    // compute hash once
    h.Reset()
    h.Write([]byte(url))
    s2 := h.Sum64()

    // should be the same, right ?
    if s1 != s2 {
        t.Logf("parts = '%s%s%s%s%s'", p1, p2, p3, p4, p5)
        t.Logf("url   = '%s'", url)
        t.Errorf("%x %x", s1, s2)
    }
}

func TestOK(t *testing.T) {
    t.Logf("test with NON NATIVE")
    h := C.New64()

    //
    p1 := "http"
    p2 := "://"
    p3 := "www.marmiton.org"
    p4 := "/recettes/recherche.aspx"
    p5 := "?st=2&aqt=gateau&"

    url := p1 + p2 + p3 + p4 + p5

    // compute hash by parts
    h.Write([]byte(p1))
    h.Write([]byte(p2))
    h.Write([]byte(p3))
    h.Write([]byte(p4))
    h.Write([]byte(p5))
    s1 := h.Sum64()

    // compute hash once
    h.Reset()
    h.Write([]byte(url))
    s2 := h.Sum64()

    // should be the same, right ?
    if s1 != s2 {
        t.Logf("parts = '%s%s%s%s%s'", p1, p2, p3, p4, p5)
        t.Logf("url   = '%s'", url)
        t.Errorf("%x %x", s1, s2)
    } else {
        fmt.Printf("test 'C' : %x %x\n", s1, s2)
    }
}

Getting incorrect hash for "www."

Here is a program to reproduce the issue.

The correct 64-bit xxhash for "www." is 17563882986220463421.

The cgo version (cxhash below) returns the correct value.

package main

import (
    "fmt"
    "github.com/OneOfOne/xxhash/native"
    cxhash "github.com/OneOfOne/xxhash"
    xxhash1 "github.com/pierrec/xxHash/xxHash64"
)



func main() {
    fmt.Println(xxhash.Checksum64([]byte("www.")))   // 11951406202292033109
    fmt.Println(xxhash1.Checksum([]byte("www."), 0)) // 17563882986220463421
    fmt.Println(cxhash.Checksum64([]byte("www.")))   // 17563882986220463421
}

get hash of full object if i have parts

Hi. I have a 16 MB file and a hash for each 1 MB chunk (16 hashes). Is it possible to get the hash of the complete 16 MB file from these 1 MB hashes, or do I need to recalculate it over all 16 MB?
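XXH64 carries internal state from one stripe to the next, and a finished chunk hash does not preserve that state. The whole-file xxhash therefore cannot be derived from independently computed chunk hashes; getting that exact value means rehashing all 16 MB, which streaming keeps memory-bounded. If all you need is a single fingerprint derived from the parts, one alternative scheme is a hash of the part hashes. Its value will differ from the whole-file xxhash. Below is a minimal sketch of that swapped-in technique. combineParts and fnvSum are hypothetical names, and FNV stands in for a one-shot hash such as xxhash.Checksum64.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// combineParts derives one fingerprint from per-part hashes by hashing
// their concatenated little-endian encodings (a hash of hashes). It is
// order-sensitive and is NOT equal to hashing the whole input.
func combineParts(parts []uint64, sum func([]byte) uint64) uint64 {
	buf := make([]byte, 8*len(parts))
	for i, p := range parts {
		binary.LittleEndian.PutUint64(buf[8*i:], p)
	}
	return sum(buf)
}

// fnvSum is a stand-in one-shot 64-bit hash.
func fnvSum(b []byte) uint64 {
	h := fnv.New64a()
	h.Write(b)
	return h.Sum64()
}

func main() {
	parts := []uint64{fnvSum([]byte("part 1")), fnvSum([]byte("part 2"))}
	fmt.Printf("%x\n", combineParts(parts, fnvSum))
}
```

Any reader verifying such a fingerprint must use the same chunk size, order and encoding, so the scheme only helps when both sides agree on it.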

Unable to write more than 1G of data using CGO engine

See : https://github.com/OneOfOne/xxhash/blob/master/xxhash_cgo.go#L111

This limitation does not apply to the pure Go engine.
Looking at the code in https://github.com/OneOfOne/xxhash/blob/master/c-trunk/xxhash.c#L875, which uses a size_t: the C standard defines size_t as an unsigned integer type of at least 16 bits. It is usually 32 bits on 32-bit architectures and 64 bits on 64-bit architectures. Why is there a limit of 1G? I would expect the limit to be either 16 bits wide or the same as Go's int.

Ref (https://en.wikipedia.org/wiki/C_data_types#Size_and_pointer_difference_types)

App engine build fails

In /native/xxhash_safe.go, the newByteReader method is incompatible with the other implementations because it takes a *[]byte instead of a []byte, so building with goapp fails.

let's try to get this code into the standard library.

Have you made any effort to get this hashing algorithm into the standard library's "hash" package yet? The code in the native package is impressively consistent with the idiomatic development style found in Go's standard library.

Is there any chance you would be willing to switch to Go's licensing scheme?

Remove the cgo version.

This is a reminder that the cgo version will eventually be removed once Go 1.7 is released.

[PENDING] Windows binary

Dear Ahmed,
Could you be so kind as to build an .exe for the rest of us who are mere Windows users without a compiler?
