astarstartup / linearid

Generates monotonic and linear unique IDs blazing fast; faster than UUID and ULID; for use with React, Next.JS, Vue, PlanetScale and other sharded databases and web frameworks.

linearid's Introduction

LinearId is an npm package for generating 64-bit and 128-bit monotonic unique IDs for use with sharded databases like PlanetScale. Please read this ReadMe file on GitHub for the latest updates, where you can also contribute code, bug reports, feature requests, documentation improvements, etc.

Contributing

We need some people to help test LinearId and improve our documentation to ensure everything is working properly, to use LinearId with the popular database engines, and to contribute to the unit tests to get optimal test coverage.

I can't figure out how to configure the npm package so that the import syntax works, so for now you have to use the const require syntax. I could use some help on #11. Please start by reading the Contributing Guide. Thanks.

Solution

When your website grows to a large number of users, you need to shard the database and use multiple SQL servers. Once the database is split across servers, the autoincrement primary key isn't valid anymore, because each server would hand out the same values. PlanetScale automatically shards the database to scale to more users, which is why there are no foreign keys with PlanetScale. While you might be tempted to use UUID, it does not generate values that always increase (i.e. it is not monotonically increasing), which is not good for doing binary searches. Binary searches require monotonically increasing search indexes, and the SQL database engine uses the inode structure in your data drives to search for SQL table rows.

Another solution is to use a Universally Unique Lexicographically Sortable Identifier (ULID), but it uses a 48-bit millisecond timestamp in the MSB and an 80-bit random number in the LSB. There are two problems with this design approach. The first is that the x86 CPU doesn't have a sub-second timestamp, so databases do not use them. This means that to translate between milliseconds and seconds when you work with the database you have to divide and multiply by 1000, which is slow and error prone, and getting a sub-second timestamp on an x86 server requires a dedicated thread running a spin clock with an inter-process pipe, which is complex and unnecessary. The second is that the random part has to be generated at runtime. We want an approach that doesn't have to generate any random numbers at runtime, that works in seconds, and that will work for almost everything for hundreds of years.

128-bit LinearId (LID16) uses a 33-bit Unix seconds timestamp in the Most-Significant Bits (MSB), followed by a 22-bit sub-second spin ticker and a 73-bit Cryptographically-Secure Generated-Upon-Boot Random Number (CSGUBRN):

 v--MSb                        128-bit LID                           LSb--v
+---------------------------------------------------------------------------+
| 33-bit seconds timestamp | 22-bit sub-second spin ticker | 73-bit CSGUBRN |
+---------------------------------------------------------------------------+
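The layout above can be sketched with BigInt shifts and masks. This is a minimal illustration, not the package's implementation; `packLid` and `unpackLid` are hypothetical names, and only the three field widths come from the diagram.

```typescript
// Field widths from the diagram above; the function names are hypothetical.
const TIMESTAMP_BITS = 33;
const TICKER_BITS = 22;
const SOURCE_BITS = 73;

// Pack the three fields into one 128-bit bigint, timestamp in the MSBs.
// Each field is truncated to its width so it cannot spill into a neighbor.
function packLid(seconds: bigint, ticker: bigint, source: bigint): bigint {
  const ts = BigInt.asUintN(TIMESTAMP_BITS, seconds);
  const tk = BigInt.asUintN(TICKER_BITS, ticker);
  const src = BigInt.asUintN(SOURCE_BITS, source);
  return (ts << BigInt(TICKER_BITS + SOURCE_BITS)) | (tk << BigInt(SOURCE_BITS)) | src;
}

// Split a 128-bit value back into its three fields.
function unpackLid(lid: bigint): { seconds: bigint; ticker: bigint; source: bigint } {
  return {
    seconds: BigInt.asUintN(TIMESTAMP_BITS, lid >> BigInt(TICKER_BITS + SOURCE_BITS)),
    ticker: BigInt.asUintN(TICKER_BITS, lid >> BigInt(SOURCE_BITS)),
    source: BigInt.asUintN(SOURCE_BITS, lid),
  };
}
```

Because the timestamp and ticker sit above the source id, values from one source compare in generation order as plain integers.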

Statistically, the 73-bit CSGUBRN has 2^73 = 9,444,732,965,739,290,427,392 possible values, so when you have two web servers active, the probability that both servers generate the same random number is 1 in 9,444,732,965,739,290,427,392, or about 1.06e-20%. If you had 1,000 servers running, the probability that a newly booted server matches any one of them would be about 1.06e-17%, which is a 1 in 9,444,732,965,739,290,427 chance. If you had 1,000,000 servers running, the probability would be about 1.06e-14%, which is a 1 in 9,444,732,965,739,290 chance; that denominator is a 54-bit number. If there ever actually is more than one server with the same source id, the affected server will have to regenerate its 73-bit random source id, which means the first database write from that server will have to be performed again. This makes this bit pattern statistically acceptable to use for military and banking applications.
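The arithmetic behind these odds can be checked with a short sketch. It assumes source ids are drawn uniformly at random from the 73-bit space; the function names are hypothetical, and the any-pair figure uses the usual birthday approximation n(n-1)/2 divided by the size of the space.

```typescript
const SOURCE_ID_BITS = 73;
// Number of possible source ids: 2^73 = 9,444,732,965,739,290,427,392.
const sourceIdSpace = 2n ** BigInt(SOURCE_ID_BITS);

// Chance that one newly booted server picks an id already held by
// one of `servers` existing servers (as a fraction, not a percentage).
function newServerCollision(servers: number): number {
  return servers / Number(sourceIdSpace);
}

// Birthday approximation: chance that any two of `servers` servers share an id.
function anyPairCollision(servers: number): number {
  return (servers * (servers - 1)) / 2 / Number(sourceIdSpace);
}
```

For example, `newServerCollision(1_000_000)` is on the order of 1e-16, and multiplying by 100 gives the percentage.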

The 22-bit sub-second spin ticker caps the number of calls you can make per second at 2^22, which is 4,194,304. If you make more calls than this in one second, the algorithm will spin wait until the next second and then reset the sub-second ticker. Assume the upper limit of a normal computer is a clock of no more than 4,294,967,296 Hz (4.3 GHz), which just so happens to be 2^32, making the math easy. This would give you about 2^32 / 2^22 = 1,024 clock cycles between calls to LID. Given that not all CPU instructions are single-cycle, you're usually waiting for memory, and you're going to be creating a data structure, it's highly unlikely you'll ever hit this cap, and if you ever did you'd probably have no problem with the delay. This is an edge case.
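A rough sketch of the spin-wait behavior described above follows. It is not the package's code: `makeTicker` is a hypothetical helper, and the clock and the limit are parameters only so the behavior can be exercised without waiting for a real second to pass.

```typescript
const TICKER_LIMIT = 2 ** 22; // 4,194,304 ids per second

// Returns a function that yields [seconds, ticker] pairs. When the ticker is
// exhausted within one second, it busy-waits until the clock moves on.
function makeTicker(
  nowSeconds: () => number = () => Math.floor(Date.now() / 1000),
  limit: number = TICKER_LIMIT,
): () => [number, number] {
  let lastSecond = nowSeconds();
  let ticker = 0;
  return function next(): [number, number] {
    let second = nowSeconds();
    if (second > lastSecond) {
      // A new second started: reset the sub-second ticker.
      lastSecond = second;
      ticker = 0;
    } else if (ticker >= limit) {
      // Cap reached: spin until the next second, then reset.
      while ((second = nowSeconds()) <= lastSecond) {
        /* busy wait */
      }
      lastSecond = second;
      ticker = 0;
    }
    return [lastSecond, ticker++];
  };
}
```

If the clock ever steps backwards, this sketch keeps counting in the last second it saw, so the pairs it returns never decrease.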

The 33-bit timestamp has an epoch span of about 272.2 years (2^33 seconds); the 36-bit timestamp used in the v0.1 layout spanned 2,177.6 years. By that time everything we know, including your software and hardware, will be long gone and forgotten. The above characteristics make the 22-bit spin ticker and 73-bit CSGUBRN a sweet spot that will work for almost every computer and not be outdated for hundreds of years.
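The epoch span for a given timestamp width is simple to compute. The helper below is a hypothetical illustration that assumes a year of 365.25 days.

```typescript
const SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60; // 31,557,600

// Years until an n-bit seconds counter wraps around.
function epochSpanYears(timestampBits: number): number {
  return 2 ** timestampBits / SECONDS_PER_YEAR;
}

const span32 = epochSpanYears(32); // about 136.1 years
const span33 = epochSpanYears(33); // about 272.2 years
const span36 = epochSpanYears(36); // about 2,177.6 years
```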

The benefit of LID is that you don't need a naming server. If you do use a naming server, you can use a 32-bit timestamp, a 22-bit sub-second ticker, and a 10-bit server id, which gives you an optimized 64-bit index, but each thread that uses LID has to have its own source id, so you can quickly run out of source ids.

To optimize for SQL and other database searches, we need to take advantage of the 64-bit index in the inode data structure used by all in-disk database engines. Indexing can be very complicated, and you can index your database tables in different ways at runtime to optimize your lookups. You don't just want to XOR the LID LSW and MSW together because you'll get clustering, the result will be non-monotonic, and as the database grows you will get collisions. For this reason it's better to create new database rows using 128-bit indexes that you then index contiguously.

For users of your websites using LID, they will get HTML where the items with LIDs show up with an HTML property uid that is a string. When this string is 32 characters long (in hex, so that is 16 bytes), it's a 128-bit LID that has not been compacted to a 64-bit UID. In the OS filesystem, inodes have timestamps, so when you see these 32-character UIDs you will need to extract the seconds from the timestamp and search for the database row by timestamp and UID.
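As a sketch of that lookup step, the helper below tells the two string lengths apart and pulls the seconds out of a 128-bit LID. `describeUid` is a hypothetical name; it assumes the 33/22/73 layout shown earlier and that a compacted 64-bit UID is rendered as 16 hex characters.

```typescript
const LID128_TIMESTAMP_SHIFT = 95n; // 22 ticker bits + 73 source-id bits

// Classify a uid hex string by length and, for a 128-bit LID,
// extract the seconds timestamp from the most significant bits.
function describeUid(uidHex: string): { bits: 64 | 128; seconds?: bigint } {
  if (!/^[0-9a-fA-F]+$/.test(uidHex)) {
    throw new RangeError("uid is not a hex string");
  }
  if (uidHex.length === 32) {
    const lid = BigInt("0x" + uidHex);
    return { bits: 128, seconds: lid >> LID128_TIMESTAMP_SHIFT };
  }
  if (uidHex.length === 16) {
    return { bits: 64 };
  }
  throw new RangeError("expected 16 or 32 hex characters");
}
```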

Also, check out my open-source C++ software at https://github.com/KabukiStarship. Thanks.

Quickstart

1. Install npm package:

npm install linearid

2. Set compiler options:

{
  "compilerOptions": {
    "target": "es2020",
    //...
  }
  //...
}

3. Add to your Drizzle ORM schema:

import { bigint, mysqlTable, varbinary } from "drizzle-orm/mysql-core";

export const UserAccounts = mysqlTable('UserAccounts', {
  // 64-bit contiguous index; Drizzle's bigint column needs an explicit mode.
  uidx: bigint('uidx', { mode: 'bigint' }).primaryKey(),
  // 128-bit LID stored as 16 raw bytes.
  uid: varbinary('uid', { length: 16 }),
  time_created: bigint('time_created', { mode: 'number' }),
  //...
});

4. Add to your TypeScript or JavaScript imports:

//import { LIDFromHex, LIDNext, LIDNextBuffer, LIDToHex } from "linearid";
//import { randomInt } from 'crypto';

// or

const { randomInt } = require('crypto');
const { BufferToBigInt, HexToBigInt, LIDNext, LIDNextBuffer, LIDTimestamp
} = require("linearid");

5. Generate LIDs (Drizzle example in TypeScript):

import { and, eq, gte, lte } from 'drizzle-orm';

// `db` is your Drizzle database instance and `UserAccount` is your row type;
// both are defined elsewhere in your app.

let lid = LIDNext(randomInt);
const LidHexString = lid.toString(16);
console.log('\nExample LID hex string:0x' + LidHexString);
lid = HexToBigInt(LidHexString);

let Buf = LIDNextBuffer(randomInt);
lid = BufferToBigInt(Buf);

const InsertUserAccount = async (account: UserAccount) => {
  return db.insert(UserAccounts).values(account);
}

// Seconds timestamp of the LID; Number() because time_created uses
// mode 'number' in the schema.
const Timestamp = Number(LIDTimestamp(lid));

/* I think this is how you do it but I'm still trying to get
this working. If you know how to do this please contribute on
GitHub and we will love you forever; thanks. */
// `uid` is the 128-bit LID value that was stored in the uid column.
let results = await db.select().from(UserAccounts).where(
  and(
    gte(UserAccounts.time_created, Timestamp),
    lte(UserAccounts.time_created, Timestamp + 2),
    eq(UserAccounts.uid, uid)
  )
);

// When you have converted the LID to a contiguous 64-bit uidx,
// look the row up by uidx directly.
results = await db.select().from(UserAccounts).where(
  eq(UserAccounts.uidx, uidx)
);

64-bit Local LIDs

When rendering UI components on the client, and in many other situations, you need to generate a UID, for example a key in React, without adding it to a database, so there is no need for a source id. JavaScript uses a millisecond timestamp natively, which when truncated to 32 bits provides an epoch of 49.7 days, much longer than the expected time that a webpage stays open. It's not expected that users will need to generate 2^32 Local LIDs (LLIDs) per second, but it's nice and easy to just use either a 32-bit seconds timer or the lower 32 bits of a milliseconds timer in the most significant 32 bits, ORed with a 32-bit sub-timer ticker; sub-timer meaning either sub-second or sub-millisecond.

import { LLIDNextHex } from 'linearid';

const ExampleItems = [ 'Foo', 'Bar' ];

export function ExampleList() {
  return <ul>
    { ExampleItems.map((item) => {
      // React expects a `key` on list items; string refs do not work
      // in function components.
      return <li key={LLIDNextHex()}>{item}</li>
    })}
  </ul>
}
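The Local LID layout described in this section could be sketched as follows. This is not the code behind `LLIDNextHex`; `makeLlidGenerator` and `llidToHex` are hypothetical names, and the sketch uses the millisecond variant (lower 32 bits of the millisecond clock in the upper word, a sub-millisecond ticker in the lower word).

```typescript
// Returns a generator of 64-bit Local LIDs as bigints.
function makeLlidGenerator(nowMs: () => number = () => Date.now()): () => bigint {
  let lastMs = -1;
  let ticker = 0;
  return (): bigint => {
    // `>>> 0` keeps only the lower 32 bits of the millisecond timestamp.
    const ms = nowMs() >>> 0;
    if (ms !== lastMs) {
      // New millisecond: restart the sub-millisecond ticker.
      lastMs = ms;
      ticker = 0;
    }
    const id = (BigInt(ms) << 32n) | BigInt(ticker);
    ticker = (ticker + 1) >>> 0;
    return id;
  };
}

// Fixed-width hex rendering: 16 characters for 64 bits.
function llidToHex(id: bigint): string {
  return id.toString(16).padStart(16, "0");
}
```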

License

Copyright AStartup.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

linearid's People

Contributors

cookingwithcale

linearid's Issues

Add descriptor for LID format with epoch to pass as optional argument

Problem

The problem I am addressing on this mission is that some people want to set the epoch, which will allow them to pack the timestamp into fewer bits to suit their needs. Others want more ticker bits.

Solution

The solution that I'm addressing on this mission is to use a JSON object to describe the bit map and set the timestamp epoch.
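One possible shape for such a descriptor is sketched below. The field names and `packWithDescriptor` are assumptions made for illustration; the issue does not specify the format.

```typescript
// Hypothetical descriptor: an epoch plus the width of each field.
interface LidDescriptor {
  epoch: number;         // Unix seconds that count as timestamp 0
  timestampBits: number;
  tickerBits: number;
  sourceBits: number;
}

// Pack a LID according to the descriptor; unixSeconds, d.epoch and ticker
// must be integers because they are converted with BigInt().
function packWithDescriptor(
  d: LidDescriptor,
  unixSeconds: number,
  ticker: number,
  source: bigint,
): bigint {
  const ts = BigInt.asUintN(d.timestampBits, BigInt(unixSeconds - d.epoch));
  const tk = BigInt.asUintN(d.tickerBits, BigInt(ticker));
  const src = BigInt.asUintN(d.sourceBits, source);
  return (ts << BigInt(d.tickerBits + d.sourceBits)) | (tk << BigInt(d.sourceBits)) | src;
}

// Example: a 64-bit layout with a custom epoch.
const lid64Descriptor: LidDescriptor = {
  epoch: 1_700_000_000,
  timestampBits: 32,
  tickerBits: 22,
  sourceBits: 10,
};
```

Moving the epoch forward is what lets a narrower timestamp field cover the same future time span.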

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright 2023 AStartup; all rights reserved.

Add 64-bit Local LID (LLID)

Problem

The problem I am addressing on this mission is that I need a faster method to assign UIDs in React on the client that has nothing to do with a database.

Solution

The solution that I'm addressing on this mission is to use the JavaScript millisecond timestamp and use a 32-bit second or millisecond timer and a 32-bit sub-second ticker.

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright 2023 AStartup; all rights reserved.

Add LIDBufferFromHex

Problem

The problem I am addressing on this mission is that in practice I will need to give the users a LID hex string, and I will need to convert the hex string to a Buffer.

Solution

The solution that I'm addressing on this mission is self-evident.
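For reference, a hex-to-bytes conversion might look like the sketch below. `hexToBytes` is a hypothetical helper that returns a `Uint8Array` in big-endian order so it runs in both Node and the browser; in Node, `Buffer.from(hex, 'hex')` performs a similar conversion for even-length strings.

```typescript
// Convert a hex string to a fixed-length, big-endian byte array.
// Short strings are left-padded with zeros.
function hexToBytes(hex: string, byteLength: number = 16): Uint8Array {
  if (!/^[0-9a-fA-F]*$/.test(hex) || hex.length > byteLength * 2) {
    throw new RangeError("invalid hex string for the requested length");
  }
  const padded = hex.padStart(byteLength * 2, "0");
  const bytes = new Uint8Array(byteLength);
  for (let i = 0; i < byteLength; i++) {
    bytes[i] = parseInt(padded.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}
```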

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright 2023 AStartup; all rights reserved.

Remove console.log() statements

Problem

The problem I am addressing on this mission is there are a lot of console.log statements in the npm package code. Looks bad.

Solution

The solution that I'm addressing on this mission is that I just deleted the console.log statements, then rebuilt and patched the npm package, so this code is live. I still don't know how to automate the changeset.

File Affected

  1. **/*.*

Hierarchy

Tags

Mission

A

Sessions

License

Copyright AStartup; all rights reserved.

Add function to print and parse LID string

Problem

The problem I am addressing on this mission is that just because I don't want to use a string doesn't mean others don't.

Solution

The solution that I'm addressing on this mission is to print the LID to a hex string without any delimiters. To test this I need to parse LIDs so I need an optimal scanner.

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright 2023 AStartup; all rights reserved.

Add 64-bit LID functions with postfix 8

Problem

The problem I am addressing on this mission is that most people will be able to work just fine using a 64-bit LID.

Solution

What I am accomplishing on this mission to solve the problem is that I'm going to keep the 128-bit LID functions the same, but I'm going to add the new functions with a postfix 8, as in 8 bytes. The other option is to use a postfix 128, but that is more typing that isn't required so I'm going to have to say no on this one.

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright 2023 AStartup; all rights reserved.

Add hybrid 128-bit and 64-bit LID format

Problem

The problem I am addressing on this mission is that I need to be able to start out my website using 64-bit LIDs and pack those up into contiguous indexes 1-N, and upgrade to 128-bit LID when I have enough servers where

Solution

The solution that I'm addressing on this mission is that if the value is above 64 bits then it's a 128-bit LID, but if it's less than 2^64 then it's a 64-bit LID. The 128-bit LIDs, or LID16, use a 33-bit timestamp and a 73-bit source id. The 64-bit LIDs use a 32-bit seconds timestamp and a 16-bit source id.

On the matter of the epoch, only systems that have a 0 in b64-b72 will be affected, and the solution is to just increment the ticker so it's not at 0.

Undone

Part of this task is also to figure out how to pack up the 64-bit LIDs into packed uids; I'm kicking the can to another Mission. Maybe I need to bump down to a 31-bit timestamp and if the MSb is asserted then it's a packed 64-bit UID. That would be acceptable for up to (2^31) -1 UIDs. Either way when you get up to having that many table rows maybe this isn't the solution for your exact tech stack. Sounds good enough for my foreseeable company needs.

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright AStartup; all rights reserved.

Fix broken LIDNextBuffer function

Problem

The problem I am addressing on this mission is that I forgot to update the LIDNextBuffer arguments to take randomInt.

Solution

The solution that I'm addressing on this mission is self-evident.

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright 2023 AStartup; all rights reserved.

Add function to convert a LID to a Buffer; Switch LID word order of [msb, lsb] to [lsb, msb]

Problem

The problem I am addressing on this mission is that the first element is element 0 of the array, which is the LSB.

Solution

The solution that I'm addressing on this mission is to just make element 0 the Least-Significant Word and element 1 the Most-Significant Word, and I'm going to add a function to convert a LID from [bigint, bigint] to a Buffer by hand with some good old bit shifting and masking.
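The shift-and-mask conversion could look roughly like this. `lidWordsToBytes` is a hypothetical name, a `Uint8Array` stands in for a Node Buffer, and writing the bytes in big-endian order is an assumption made so that byte-wise comparison matches numeric order.

```typescript
// Convert a LID given as [lsw, msw] (element 0 is the least-significant
// 64-bit word) into 16 big-endian bytes.
function lidWordsToBytes(words: [bigint, bigint]): Uint8Array {
  const [lsw, msw] = words;
  let value = (BigInt.asUintN(64, msw) << 64n) | BigInt.asUintN(64, lsw);
  const bytes = new Uint8Array(16);
  // Peel off the lowest byte each round and fill the array from the end.
  for (let i = 15; i >= 0; i--) {
    bytes[i] = Number(value & 0xffn);
    value >>= 8n;
  }
  return bytes;
}
```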

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright AStartup; all rights reserved.

Remove calls to crypto.randomInt and add instruction in ReadMe to use the CryptoRandomInt function

Problem

The problem I am addressing on this mission is that when you compile a module that uses crypto I'm unable to import the function because the way it's coded in the browser is not the same as the way that it's coded in Node.

Solution

The solution for the above Problem I will finish on this mission is to use a function pointer to pass the random number generator function in like so:

const { randomInt } = require('crypto');
let [lsw_i, msw_i] = LIDNext(randomInt);

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright 2023 AStartup; all rights reserved.

Add function to convert LID to seconds timestamp

Problem

The problem I am addressing on this mission is that I just learned that to search for SQL table rows the fastest way to do so is by timestamp.

Solution

The solution that I'm addressing on this mission is just to do some bit shifting and divide by 1000.

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright AStartup; all rights reserved.

Switch to random LIDSource

Problem

The problem I am addressing on this mission is that we are going to have to expose these numbers to users over the web to use them as indexes.

Solution

The solution that I'm addressing on this mission is that last time I did it wrong: I had to use BigInt, generate two 32-bit random numbers, bit shift one up 32 bits, and OR them together. I also had to upgrade to "target": "es2020" in the tsconfig.json { "compilerOptions": { ... }}.
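The two-word approach can be sketched as below. `random64` is a hypothetical name, and the random source is passed in as a function, in the same spirit as passing `randomInt` to `LIDNext`; it is assumed to return an integer in the range 0 to 2^32 - 1.

```typescript
// Build a 64-bit random bigint from two 32-bit random integers.
function random64(random32: () => number): bigint {
  // `>>> 0` forces each draw into the unsigned 32-bit range.
  const high = BigInt(random32() >>> 0);
  const low = BigInt(random32() >>> 0);
  return (high << 32n) | low;
}
```

In Node, a suitable source would be `() => randomInt(0, 2 ** 32)` with `randomInt` imported from `crypto`.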

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright 2023 AStartup; all rights reserved.

Update ReadMe

Problem

The problem I am addressing on this mission is...

Solution

The solution that I'm addressing on this mission is...

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright 2023 AStartup; all rights reserved.

Implement n-bit LID (v0.3)

Problem

The problem we must solve to pass this milestone is that not everyone wants to use 33-bit timestamps and 22-bit sub-second tickers.

Solution

The solution that passes this milestone is to provide a descriptor object that allows you to set the bit pattern.

Requirements

  1. The system shall be...

Problems with Solution

The problems with this solution are...

File Affected

  1. **/*.*

Hierarchy

Tags

Milestone

License

Copyright AStartup; all rights reserved.

Upgrade to 22-bit ticker

Problem

The problem I am addressing on this mission is that there are still some zeros in the MSb of the timestamp when I bit shift 16 bits up.

Solution

The solution that I'm addressing on this mission is that when I bit shift up 22 bits there is always a 0 in the bits. 2^22 is 4,194,304, and it's unlikely, though not impossible, that I'll need to make that many calls per second to LID; in that case I wouldn't care if it had to spin wait for 5% of the time.

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright 2023 AStartup; all rights reserved.

Implement system to convert from 64 and 128-bit LIDs to contiguous UIDs; switch back to LID64 and LID128

Problem

The problem I am addressing on this mission is that I don't have a solid plan to convert from 128-bit to 64-bit UIDs.

Solution

The solution that I'm addressing on this mission is to provide multiple strategies to convert 128-bit UIDs into 64-bit UIDs. If I XOR the bits together then it won't be linear, and I'll have to change names. When the SQL table rows are first inserted, we won't be expecting to have high database reads on that row for some time. During this time I need to search for the row by timestamp and LID. I need to seek consultation on whether this is appropriate; sounds good to me.

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright 2023 AStartup; all rights reserved.

Utilities: Fix test

Problem

The problem I am addressing on this mission is that the unit tests were all broken.

Solution

The solution that I'm addressing on this mission is that I finally got the Utilities unit test to pass. The game plan is to debug the LLID first, then the LID64, then the LID128.

File Affected

  1. **/*.*

Hierarchy

Tags

Mission

A

Sessions

License

Copyright AStartup; all rights reserved.

This

Mission

Our mission is to generate n-bit monotonically increasing unique ids as fast as possible without generating any random numbers at runtime.

Milestones

  1. #22
  2. #32
  3. #33

License

Copyright AStartup; all rights reserved.

Implement 128-bit LID MVP (v0.1)

Problem

The problem that I have solved when I pass this milestone is that I need a prototype of an algorithm that generates 128-bit unique ids without generating random numbers at runtime.

Solution

The solution that got us past this milestone was to use a 36-bit second timestamp in the MSB, a 22-bit sub-second ticker, and a 70-bit randomly generated upon boot source id.

Problems with Solution

Different users will want different numbers of bits for the timestamp, subsecond ticker, and source ids. Also, I'm new to JavaScript and I bozoed how to use BigInt. For some reason, I thought that a BigInt was just a 64-bit integer, but it's an n-bit integer and I was still in the C++ mindset, so this code is not optimal. For 0.2.0 I will completely rewrite the algorithms to use BigInt correctly.

File Affected

  1. **/*.*

Hierarchy

Tags

Milestone

License

Copyright AStartup; all rights reserved.

Debug algorithm

Problem

The problem I am addressing on this mission is that the code I checked in I just hammered out quickly but did not debug.

Solution

The solution that I'm addressing on this mission is that I debugged the algorithm and set it up to be an NPM package but have not uploaded the package. There are still console.log("...") statements.

File Affected

  1. ?

Hierarchy

Tags

Mission

A

Sessions

License

Copyright AStartup; all rights reserved.

Implement n-bit LIDs (v0.3)

Problem

The problem we must solve to pass this milestone is that people may want to use any number of bits for the timestamp, subsecond ticker, or source id.

Solution

The solution that passes this milestone is to use a descriptor object to configure the number of bits for the timestamp, subsecond ticker, and source id.

Requirements

  1. The system shall be...

Problems with Solution

The problems with this solution are...

File Affected

  1. **/*.*

Hierarchy

Tags

Milestone

License

Copyright AStartup; all rights reserved.

Implement hybrid 64-bit 128-bit LIDs and Local LID (v0.2)

Problem

The problem we must solve to pass this milestone is that it's much faster to search through inodes using 64-bit indexes.

Solution

The solution that passes this milestone is to implement an MVP of the 64-bit LID using the same LID128 format with different bit counts.

Requirements

  1. The system shall be resistant to timestamp epoch cycles.
  2. The system shall be able to differentiate between 64 and 128-bit LIDs and contiguous UIDs.
    1. The asserted MSb of the timestamp shall indicate a contiguous UID and all UIDs shall be 31-bit.
      1. Reasoning: This makes indexes a positive 32-bit signed integer, and all negative numbers are LIDs.

Problems with Solution

The problems with this solution are...

File Affected

  1. **/*.*

Hierarchy

Tags

Milestone

License

Copyright AStartup; all rights reserved.
