
xid's Introduction

Globally Unique ID Generator


Package xid is a globally unique id generator library, ready to be used safely and directly in your server code.

Xid uses the Mongo Object ID algorithm to generate globally unique ids with a different serialization (base32hex) to make it shorter when transported as a string: https://docs.mongodb.org/manual/reference/object-id/

  • 4-byte value representing the seconds since the Unix epoch,
  • 3-byte machine identifier,
  • 2-byte process id, and
  • 3-byte counter, starting with a random value.
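The sketch below packs these four fields into 12 bytes with encoding/binary and then unpacks them. It assumes big-endian byte order, which is what keeps the leading timestamp sortable. The helper names buildID and splitID are hypothetical, and this is not xid's own implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// buildID packs the four fields into 12 bytes in the order listed above
// (hypothetical helper, big-endian layout assumed).
func buildID(secs uint32, machine [3]byte, pid uint16, counter uint32) [12]byte {
	var id [12]byte
	binary.BigEndian.PutUint32(id[0:4], secs) // bytes 0-3: seconds since the Unix epoch
	copy(id[4:7], machine[:])                 // bytes 4-6: machine identifier
	binary.BigEndian.PutUint16(id[7:9], pid)  // bytes 7-8: process id
	id[9] = byte(counter >> 16)               // bytes 9-11: low 24 bits of the counter
	id[10] = byte(counter >> 8)
	id[11] = byte(counter)
	return id
}

// splitID reverses buildID.
func splitID(id [12]byte) (secs uint32, machine [3]byte, pid uint16, counter uint32) {
	secs = binary.BigEndian.Uint32(id[0:4])
	copy(machine[:], id[4:7])
	pid = binary.BigEndian.Uint16(id[7:9])
	counter = uint32(id[9])<<16 | uint32(id[10])<<8 | uint32(id[11])
	return secs, machine, pid, counter
}

func main() {
	id := buildID(1700000000, [3]byte{0xaa, 0xbb, 0xcc}, 4242, 0x123456)
	secs, machine, pid, counter := splitID(id)
	fmt.Printf("% x\n", id)
	fmt.Println(secs, machine, pid, counter)
}
```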

The binary representation of the id is compatible with Mongo's 12-byte Object IDs. The string representation uses base32hex (without padding) for better space efficiency when stored in that form (20 bytes). The hex variant of base32 is used to retain the sortable property of the id.

Xid doesn't use base64 because case sensitivity and the 2 non-alphanumeric characters may be an issue when the id is transported as a string between various systems. Base36 wasn't retained either because 1/ it's not standard, 2/ the resulting size is not predictable (not bit aligned), and 3/ it would not remain sortable. To validate a base32 xid, expect a 20-character, all-lowercase sequence of the letters a to v and the digits 0 to 9 ([0-9a-v]{20}).
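Here is a minimal sketch of the string form. It assumes the string is plain base32hex with a lowercase alphabet and no padding, as described above. It uses the standard library's encoding/base32 rather than xid's own encoder, and the names xidEncoding, encode and validXID are hypothetical.

```go
package main

import (
	"encoding/base32"
	"fmt"
	"regexp"
)

// xidEncoding is the base32hex alphabet (RFC 4648 "extended hex") in lowercase,
// without padding (an assumption based on the description above).
var xidEncoding = base32.NewEncoding("0123456789abcdefghijklmnopqrstuv").WithPadding(base32.NoPadding)

// validXID is the validation pattern quoted above, anchored to the whole string.
var validXID = regexp.MustCompile(`^[0-9a-v]{20}$`)

// encode turns a 12-byte id into its 20-character string form.
func encode(id [12]byte) string {
	return xidEncoding.EncodeToString(id[:])
}

func main() {
	// Decode the sample id from the Usage section and encode it again.
	b, err := xidEncoding.DecodeString("9m4e2mr0ui3e8a215n4g")
	if err != nil {
		panic(err)
	}
	var id [12]byte
	copy(id[:], b)
	s := encode(id)
	fmt.Println(s, len(s), validXID.MatchString(s))
}
```

The 20-character length follows from the arithmetic. 12 bytes are 96 bits, and each character carries 5 bits. So 19 characters carry 95 bits and a 20th carries the last bit.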

UUIDs are 16 bytes (128 bits) and 36 chars as a string. Twitter Snowflake ids are 8 bytes (64 bits) but require machine/data-center configuration and/or central generator servers. xid stands in between with 12 bytes (96 bits) and a more compact URL-safe string representation (20 chars). No configuration or central generator server is required, so it can be used directly in server code.

Name        Binary Size   String Size      Features
UUID        16 bytes      36 chars         configuration free, not sortable
shortuuid   16 bytes      22 chars         configuration free, not sortable
Snowflake   8 bytes       up to 20 chars   needs machine/DC configuration, needs central server, sortable
MongoID     12 bytes      24 chars         configuration free, sortable
xid         12 bytes      20 chars         configuration free, sortable

Features:

  • Size: 12 bytes (96 bits), smaller than UUID, larger than Snowflake
  • Base32 hex encoded by default (20 chars when transported as a printable string, still sortable)
  • No configuration: you don't need to set a unique machine and/or data center id
  • K-ordered
  • Embedded time with 1 second precision
  • Uniqueness guaranteed for up to 16,777,216 (24 bits) ids per second per host/process
  • Lock-free (unlike UUIDv1 and v2)

Best used with zerolog's RequestIDHandler.

Notes:

  • Xid depends on the system time and a monotonic counter, so it is not cryptographically secure. If unpredictability of IDs is important, you should not use Xids. Note that most other UUID-like implementations are not cryptographically secure either. If you want a truly random ID generator, use a library that relies on a cryptographically secure source (like /dev/urandom on Unix or crypto/rand in Go).


Install

go get github.com/rs/xid

Usage

guid := xid.New()

println(guid.String())
// Output: 9m4e2mr0ui3e8a215n4g

Get xid embedded info:

guid.Machine()
guid.Pid()
guid.Time()
guid.Counter()

Benchmark

Benchmark against Maxim Bublis's Go UUID library.

BenchmarkXID        	20000000	        91.1 ns/op	      32 B/op	       1 allocs/op
BenchmarkXID-2      	20000000	        55.9 ns/op	      32 B/op	       1 allocs/op
BenchmarkXID-4      	50000000	        32.3 ns/op	      32 B/op	       1 allocs/op
BenchmarkUUIDv1     	10000000	       204 ns/op	      48 B/op	       1 allocs/op
BenchmarkUUIDv1-2   	10000000	       160 ns/op	      48 B/op	       1 allocs/op
BenchmarkUUIDv1-4   	10000000	       195 ns/op	      48 B/op	       1 allocs/op
BenchmarkUUIDv4     	 1000000	      1503 ns/op	      64 B/op	       2 allocs/op
BenchmarkUUIDv4-2   	 1000000	      1427 ns/op	      64 B/op	       2 allocs/op
BenchmarkUUIDv4-4   	 1000000	      1452 ns/op	      64 B/op	       2 allocs/op

Note: UUIDv1 requires a global lock, hence the performance degradation as we add more CPUs.

Licenses

All source code is licensed under the MIT License.


xid's Issues

Add an id pool layer

userid_pool := xid.NewPool()
orderid_pool := xid.NewPool()

userid := userid_pool.New()
orderid := orderid_pool.New()

Keep a separate idCounter in each pool.

Ids in the same pool are unique.

But ids in different pools do not need to be unique.

big bug: not sortable

package xid_test

import (
	"fmt"
	"sync"
	"testing"

	"github.com/rs/xid"
)

var list = make([]string, 0)
var busy = make(chan bool, 1) // one-slot channel used as a lock around list

// AddXidToList appends a new id, tagged with the caller's label, to list.
func AddXidToList(i string) {
	busy <- true
	id := xid.New()
	list = append(list, id.String()+" "+i)
	<-busy
}

func Test_test6(t *testing.T) {
	wg := sync.WaitGroup{}
	wg.Add(2)
	go func() {
		for i := 0; i < 1000000; i++ {
			AddXidToList("1")
		}
		wg.Done()
	}()
	go func() {
		for i := 0; i < 1000000; i++ {
			AddXidToList("2")
		}
		wg.Done()
	}()
	wg.Wait()
	// Ids are appended in generation order, so each should sort after the previous one.
	for i := 0; i < len(list); i++ {
		if i+1 < len(list) && list[i] > list[i+1] {
			//1820049 cdiebmnlt656j2nvvvvg 2 cdiebmnlt656j2g00000 1
			fmt.Println(i, list[i], list[i+1])
			t.Error("big bug")
		}
	}
	if list[len(list)-2] > list[len(list)-1] {
		t.Error("big bug2")
		fmt.Println(list[len(list)-2], list[len(list)-1])
	}
	fmt.Println("END")
}

func Test_test7(t *testing.T) {
	i := 0
	for {
		i++
		Test_test6(t)
		if i == 10 {
			break
		}
	}
}

Running Test_test7 prints the following:
1969577 cdj6vqvlt652mmnvvvvg 1 cdj6vqvlt652mmg00000 1
id_test.go:177: big bug

How do you pronounce xid?

I've been saying "zid", but sometimes "x id". Would be good to know how you say "xid" out loud so I can copy it.

A version tag

Thank you for a nice implementation of the id generator.

I just wonder whether it's possible to add a semantic version tag, something like 1.0.0, so that I can specify it in my glide.yaml file.

Cryptographically secure?

Hi, thanks for the package. I have been using it in a project of mine and it is very helpful. This is more a usage question than an issue.

Are the ids generated by this package cryptographically secure? You use quite a few sources (machine id, process id, counter, etc.). However, the documentation does not say whether the package is cryptographically secure (unpredictable) when that is required. It would be good to mention the answer in the README. Thanks once again.

Remove panics

This library is used by many other applications, so perhaps a "panic" is not the correct way to handle errors. A panic causes the user's app to die when that app should be able to choose what to do. The "New()" function should then return (ID, error).

Breaking Change

XID as a 12 byte (bytea) unique indexed Postgres column

In my Postgres 9.6+ tables, I want to use an xid as a unique indexed column and also sometimes as primary key or foreign key column too. Is it more performant to store it as a 20 character string or 12 byte binary? I initially thought binary would be more efficient, but this article suggests otherwise.

Does anyone have any experience using a bytea as a primary key? If so, is it more efficient to use the default bytea hex format or the older bytea escape format?

Cryptographically secure comparing to ULID

There is a note in the README:

Xid is dependent on the system time, a monotonic counter and so is not cryptographically secure. If unpredictability of IDs is important, you should not use Xids. It is worth noting that most other UUID-like implementations are also not cryptographically secure. You should use libraries that rely on cryptographically secure sources (like /dev/urandom on unix, crypto/rand in golang), if you want a truly random ID generator.

Can XID be used with those random generators and how?

On the other hand, for ULID that should be possible: https://github.com/ulid/javascript#pseudo-random-number-generators, and here is an example of how: https://github.com/prometheus/prometheus/pull/6867/files

Is there a similar example for XID (if this is at all possible)?

Out of Sequence ID

An out-of-sequence phenomenon can be seen by running this code (or there is a mistake in my approach). The code compares each pair of consecutively generated ids and stops at the first pair that is out of order:

package main

import (
	"bytes"
	"fmt"

	"github.com/rs/xid"
)

func main() {
	var (
		p, n []byte
	)

	var cnt int64
	for p, n = nil, []byte("0"); bytes.Compare(p, n) < 0; p, n = n, next() {
		cnt++
	}

	idp := conv(p)
	idn := conv(n)
	fmt.Printf("%v %s %s\n", cnt, idp.String(), idn.String())
}

func next() []byte {
	id := xid.New()
	return id[:]
}

func conv(b []byte) xid.ID {
	var id xid.ID
	copy(id[:], b)
	return id
}

Sample outputs:

6135364 bah53vtgl2r1p4vvvvvg bah53vtgl2r1p4o00000
2586623 bah540dgl2r1q1fvvvvg bah540dgl2r1q1800000
3508359 bah55elgl2r1sdfvvvvg bah55elgl2r1sd800000

'/proc/self/cpuset' not found inside the container

I noticed that xid XORs the PID with the CRC of '/proc/self/cpuset' to guard against the case where the process is running inside a container. Unfortunately, '/proc/self/cpuset' could not be found inside my containers.

Would '/proc/self/cgroup' be a better choice?

Thanks

What is "K-ordered" and "sortable"?

Hi! Could you please explain what exactly is K-ordered and sortable? I'm not sure if I got the meaning of these aspects compared to other approaches.

Does that mean by any chance that the xid generated at two distinct moments e.g. start of today will necessarily be sorted before any xid generated afterwards (behaving like a timestamp)?

Thank you! :)
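One way to read "sortable" is that two conditions hold together. First, the timestamp occupies the leading bytes. Second, base32hex preserves byte order in string order, because its alphabet is in ascending ASCII order and every id encodes to the same length. So an id from a later second sorts after one from an earlier second, given a correct clock.

Ids created within the same second are a different case. They are ordered by their machine, pid and counter bytes rather than by creation time. That is why "K-ordered" (roughly ordered) is the more precise term.

The sketch below checks the encoding property on random byte pairs. It assumes the lowercase base32hex alphabet described in the README; orderMismatches is a hypothetical helper.

```go
package main

import (
	"bytes"
	"encoding/base32"
	"fmt"
	"math/rand"
	"strings"
)

// Lowercase base32hex without padding (assumed to match xid's string form).
var xidEncoding = base32.NewEncoding("0123456789abcdefghijklmnopqrstuv").WithPadding(base32.NoPadding)

// orderMismatches encodes n random pairs of 12-byte ids and counts how often
// comparing the strings disagrees with comparing the raw bytes.
func orderMismatches(n int, seed int64) int {
	r := rand.New(rand.NewSource(seed))
	mismatches := 0
	for i := 0; i < n; i++ {
		a, b := make([]byte, 12), make([]byte, 12)
		r.Read(a)
		r.Read(b)
		copy(b, a[:r.Intn(12)]) // share a random-length prefix so near-equal ids are tested too
		if bytes.Compare(a, b) != strings.Compare(xidEncoding.EncodeToString(a), xidEncoding.EncodeToString(b)) {
			mismatches++
		}
	}
	return mismatches
}

func main() {
	fmt.Println("mismatches:", orderMismatches(100000, 1))
}
```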

Failing to get Xid embedded info

Let me start by saying I'm new to golang :)

I can't seem to get the embedded info out of the UUID. Could you shed some light on what might be the issue?

./test.go:15: undefined: xid.Time

package main

import (
	"log"
	"fmt"
	"github.com/rs/xid"
)

func getUuid() string {
	guid := xid.New()
	return guid.String()
}

func printUuidData(s string) {
	ID, err := xid.FromString(s)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(ID)
	return
}

func printUuidTime(s string) {
	t := xid.Time(s)
	fmt.Println(t)
	return
}

func main() {
	testUuid := getUuid()
	printUuidData("Print UUID: " + testUuid)
	printUuidTime(testUuid)
}

Rewriting this some recommendations

You can replace your misnamed RandomInt32 (which should be RandomUint32 because it returns a uint32) with a single return rand.Uint32()

Most of the bitwise code could be replaced with binary.(Big|Little)Endian.PutUint(16|32). As it stands, it is unintelligible to non-C developers and often makes them cry when they look at it.

I also found a way to shrink the timestamp data to 2 bytes while keeping the most important aspects of it.

If there is any interest, I can make pull requests with some of these changes. They would make the code more readable to most Go programmers and easier to modify and update. Right now I imagine most developers who don't know bitwise operations very well look at this codebase and cry, and the bitwise style isn't making it more efficient.

There are tons of efficiency issues in the code; it uses way more memory than it needs to.

README: Update the example

The example in the README shows output looking like base64 (mixing lower and upper case letters), not base32.

Potential curse/profane/offensive words generated using xid (in a user-facing setting)

My question is around xid's that may be facing users of a software (i.e. not just stored as keys in a database).

An xid is a 20-character string whose alphabet includes most letters. So there is a decent probability that the accidental f*ck shows up, or even that whole offensive short sentences are formed. Has anyone considered this aspect before and, if so, found a practical solution?

I could think of:

  • re-encoding the bytes with a reduced alphabet
  • switching some of aeiou for wxyz

What happens when the 4 byte time value overflows?

4-byte value representing the seconds since the Unix epoch,

It should be in 2038 if I'm not mistaken. It's sooner than you think: 20 years of my life went by like a finger snap.

Shouldn't a 64bit (8byte) value be used instead?

Current Time

Using time.Now() to get the current time is problematic when this package runs on different servers with different time zones. It seems logical to use time.Now().UTC() as a sane default. Alternatively, add a package-level (or type-level) default function for getting the current time that one could change if needed:

var (
	TimeFunc = func() time.Time {
		return time.Now().UTC()
	}
)

Tag a release

It would be nice if you could tag a release @rs, including the latest fixes, even if there are not too many of them.

Pro-tip: the auto-generated release notes on GitHub are not too bad; they include a link to each PR merged since the last release. I notice that there is no GitHub release for v1.3.0, just a v1.3.0 git tag. I think that still works as far as go.mod is concerned. Still, you could create a release for that tag first if you want better auto-generated release notes.

Usage in docker

I'm always running my golang executables within containers in docker.
The PID is then always 1, which then defeats the purpose of having this field.

Would there be any possible replacement?

At the very least, I think this caveat should be mentioned in the README.

Is it possible to generate XID from SQL?

Hi,
To set a DEFAULT value for a table's primary key, I wonder whether anyone has tried to generate a valid xid value in SQL.

For example, for UUIDv4 in PostgreSQL, we can generate a default value even when the uuid-ossp extension is not installed:

CREATE TABLE IF NOT EXISTS table (
   id VARCHAR(36) DEFAULT md5(random()::text || clock_timestamp()::text)::uuid  NOT NULL,
....

Of course it's better to generate ids in the application, but a default value makes manual interaction with tables simpler.

Has anyone tried to develop an SQL function to generate xids?

What if machine ids are the same?

Suppose

  1. I have two machines, and both of them have the same hostname, they will get the same machine id.
  2. I have two machines, and both of them failed to get the hostname, and their random number generator give the same number as machine id.

How do we prevent this from happening?

How do I use/implement it?

Do I use it with a Google Cloud Function?

Or do I use a $5 Digital Ocean Droplet with a node/go app where I use it?

Is it 100% safe uniquely, or only 99.9999999999999999999999999999999999%?

If I need many unique IDs per second, I can just use ten $5 Digital Ocean droplets with the same node app, and they all gonna still spit out 100% unique IDs? Or also only 99.99999999999999999999%?

sorry for my dumbness, I am a beginner

Kind regards

Is xid thread-safe?

Can we safely use the library within goroutines without worrying about locking the call?
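The README's "Lock-free" bullet suggests a counter advanced without a mutex. The sketch below assumes that counter is advanced with sync/atomic; this is an assumption, not xid's code. Each goroutine draws counter values without taking a lock, and the test confirms no value is handed out twice. The mutex only guards the map used for checking. The names nextCounter and collect are hypothetical, and this sketch starts the counter at 0 rather than at a random value.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counter stands in for a per-process id counter.
var counter uint32

// nextCounter bumps the counter atomically and keeps 24 bits, like the 3-byte field.
func nextCounter() uint32 {
	return atomic.AddUint32(&counter, 1) & 0xffffff
}

// collect runs workers goroutines that each draw perWorker counter values,
// and returns how many times each value was seen.
func collect(workers, perWorker int) map[uint32]int {
	var mu sync.Mutex // protects seen only; the counter itself needs no lock
	seen := make(map[uint32]int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := make([]uint32, 0, perWorker)
			for i := 0; i < perWorker; i++ {
				local = append(local, nextCounter())
			}
			mu.Lock()
			for _, c := range local {
				seen[c]++
			}
			mu.Unlock()
		}()
	}
	wg.Wait()
	return seen
}

func main() {
	seen := collect(8, 10000)
	dups := 0
	for _, n := range seen {
		if n > 1 {
			dups++
		}
	}
	fmt.Println(len(seen), dups)
}
```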

Release Tagging

The releases section hasn't been updated since 2018. It might be beneficial to mark the tags as releases so the project page accurately reflects the latest info.

I came to this after running go get -u ./..., seeing there was an update, and wanting to find what changed from 1.4.0 to 1.5.0.

pid in container managed by systemd

Hi guys, my server is managed by systemd and runs in a Docker container, so it probably has the same PID (which is not 1) in every container. This causes xid conflicts when multiple instances are deployed on one host.
Any ideas on how to solve this?

Use raw []byte value in sql interfaces for smaller indices

As mentioned in #14, XID may be stored as a bytea in Postgres, which makes it take up 16 bytes rather than 24.

As far as I understand, read/write performance is impacted slightly. The benefit of BYTEA over TEXT is its smaller size, which in turn means smaller (and thus, in theory, faster) indices, particularly for large tables. In addition, there are fewer "special rules" for comparison (e.g. Unicode / locale encoding rules). Again in theory, that should make it faster to query as well.

https://www.db-fiddle.com/f/jgYzsKTFGu3NU9ZjDjfRUw/0

The link above shows a simple table with an ID as either bytea or string. Given 50,000 entries, using bytea reduces the index size from 2496 kB to 2048 kB.

I don't know at which table sizes this becomes significant, or whether it really matters. A proper benchmark with a few million rows and a few queries is probably wise before making any changes.

How to define bytea IDs

Assuming bytea is used for the ID column in the schema, here is test code to encode/decode XIDs to and from binary:

package models // illustrative package name

import (
	"database/sql/driver"
	"fmt"

	"github.com/rs/xid"
)

// XID wraps xid.ID so that it is stored as raw bytes (bytea) instead of a string.
type XID struct {
	xid.ID
}

// NewXID generates a new XID instance.
func NewXID() XID {
	return XID{ID: xid.New()}
}

// Value implements the driver.Valuer interface.
func (id XID) Value() (driver.Value, error) {
	if id.IsNil() {
		return nil, nil
	}
	return id.Bytes(), nil
}

// Scan implements the sql.Scanner interface.
func (id *XID) Scan(value interface{}) error {
	switch b := value.(type) {
	case []byte:
		_id, err := xid.FromBytes(b)
		if err != nil {
			return err
		}
		id.ID = _id
		return nil
	case nil:
		id.ID = xid.ID{}
		return nil
	default:
		return fmt.Errorf("xid: scanning unsupported type: %T", value)
	}
}

`kern.uuid` is misinterpreted as a machine ID when it's actually a kernel version identifier

The readPlatformMachineID function in hostid_darwin.go performs a syscall to obtain kern.uuid, and it uses the result as the machine ID. This value isn't actually meant to be a unique ID for a machine, but rather a unique ID for the currently running kernel version.

To verify this, you can run sysctl kern.uuid and then search for the value on Google, and as long as your current version of macOS has been released for a while, you'll likely find other people with the same ID.

(For concrete examples of this, see shirou/gopsutil#1058.)

Question around readMachineID

Thank you for your work! I've been using it as an inspiration for ID generation for a particular project.

I'm most likely wrong here, but I'm curious to verify my understanding. Feel free to ignore the question, and just close the ticket!

I cannot see how readMachineID is stable.

func readMachineID() []byte {
	id := make([]byte, 3)
	hid, err := readPlatformMachineID()
	if err != nil || len(hid) == 0 {
		hid, err = os.Hostname()
	}
	if err == nil && len(hid) != 0 {
		hw := md5.New()
		hw.Write([]byte(hid))
		copy(id, hw.Sum(nil))
	} else {
		// Fallback to rand number if machine id can't be gathered
		if _, randErr := rand.Reader.Read(id); randErr != nil {
			panic(fmt.Errorf("xid: cannot get hostname nor generate a random number: %v; %v", err, randErr))
		}
	}
	return id
}

Assuming a Linux platform, readPlatformMachineID would return the following, according to the machine-id man page:

The machine ID is a single newline-terminated, hexadecimal, 32-character, lowercase ID

If we successfully read it and the length is correct, we then MD5-hash it and copy only the first 3 of the 16 bytes.

Wouldn't this be prone to collisions, since we only use the first 3 bytes of the MD5 hash?

I wrote a tiny test that follows the same approach, but randomly generates the string to be fed into the hash:

package unique

import (
	"crypto/md5"
	"math/rand"
	"testing"

	"github.com/stretchr/testify/require"
)

const chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

func randStr(r *rand.Rand) string { // random 32-char source, same length as a machine-id
	b := make([]byte, 32)
	for i := range b {
		b[i] = chars[r.Intn(len(chars))]
	}
	return string(b)
}

func genID(hid string) []byte { // mirrors readMachineID: keep the first 3 MD5 bytes
	id := make([]byte, 3)
	hw := md5.New()
	hw.Write([]byte(hid))
	copy(id, hw.Sum(nil))
	return id
}

func TestUnique(t *testing.T) {
	r := rand.New(rand.NewSource(999))

	rounds := 100000

	mStr := map[string]interface{}{}
	mID := map[string]string{}

	for i := 0; i < rounds; i++ {
		src := randStr(r)
		_, ok := mStr[src]
		if ok { // skip when we've already randomly generated the same string as source
			continue
		}
		mStr[src] = struct{}{}

		id := string(genID(src))
		storedSrc, ok := mID[id]
		if ok {
			t.Fatalf("collision? round: %d, storedSrc: '%s', src: '%s', id: '%v'",
				i, storedSrc, src, id)
		}
		mID[id] = src
	}
}

func TestCollision(t *testing.T) {
	src1 := "pplbyfSYmSkuUQbjJvcOWsUuSwoPYOTk"
	src2 := "LRnRfzVvPAWbEhDNOegktwBvpaCnutyH"
	require.NotEqual(t, genID(src1), genID(src2))
}
➜  unique go test -v ./...
=== RUN   TestUnique
    unique_test.go:48: collision? round: 12209, storedSrc: 'pplbyfSYmSkuUQbjJvcOWsUuSwoPYOTk', src: 'LRnRfzVvPAWbEhDNOegktwBvpaCnutyH', id: '�(m'
--- FAIL: TestUnique (0.01s)
=== RUN   TestCollision
    unique_test.go:58: 
        	Error Trace:	unique_test.go:58
        	Error:      	Should not be: []byte{0xa3, 0x28, 0x6d}
        	Test:       	TestCollision
--- FAIL: TestCollision (0.00s)
FAIL
FAIL	github.com/sata/unique	0.013s
FAIL

My takeaway is that after 12,209 attempts we ended up with two identical machine IDs, even though their sources are different.

Is the xid ID considered to be stable since we take epoch time + machine identifier + local process id + start counter at a random value? i.e the likelihood of there being a collision of xid IDs is so low due to the other factors?

Command-line tool

Dear @rs,
I'm not sure how to appreciate the beauty of your solution without being able to run it on Windows as a binary that produces the result. E.g. check ULID and CUIDv2

$ ulid.exe
01GPR0A4J919E253QDAGVR8MK7

$ cuidgen.exe
gib227a07c6a1njttd9jn982

upgrade go version to 1.13 or later?

So many errors with golint.
Although the error is reported, the package can still be used normally:

invalid operation: signed shift count 16 (untyped int constant) requires go1.13 or later


network partitioning

You use time when generating the UUID.
So if the network partitions, the clocks will skew.
https://en.wikipedia.org/wiki/Clock_skew

I remember reading Google's paper on this for Spanner once.
The buggers have atomic clocks in each data center, and they account for clock skew and also bandwidth latency during clock synchronisation.

Time and Space is the biggest computing problem :)

So it's a heuristic approach. I wonder if there is benchmarking code that tests many servers and clients under controlled partitioning and latency. That would show how useful the Google approach is. If you can get ordered keys and use OT (operational transforms), all your chickens will come home to roost...
