aergoio / litetree

SQLite with Branches

License: MIT License

C 98.35% Makefile 0.07% Python 1.58%

litetree's Introduction

LiteTree: SQLite with Branches

Imagine being able to have many connections to the same database, each one reading a separate branch or commit at the same time. Or even writing to separate branches.

This is possible with LiteTree. It is a modification of the SQLite engine to support branching, like git!

Database branching is a very useful tool for blockchain implementations and LiteTree will be at the core of Aergo.

This is how it works:

Each database transaction is saved as a commit, and each commit has an incremental number. Let's consider an empty db in which we run this first SQL command:

```
CREATE TABLE t1 (name)
```

Now the database has its first commit (number 1) in the automatically created master branch.

When we execute new transactions, they add new commits to the current branch:

```
INSERT INTO t1 VALUES ('first')
INSERT INTO t1 VALUES ('second')
```

Now we have 3 commits in the master branch.

To include many SQL commands in a single commit we must enclose them in BEGIN and COMMIT commands.
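Below is a minimal sketch of grouping statements into one commit from Python's standard `sqlite3` module. It assumes the module is linked against LiteTree. With a stock SQLite build it still runs, just without branching, which is why it uses `:memory:` for the demo. The helper name `run_in_one_commit` is hypothetical. Opening the connection with `isolation_level=None` stops the module from issuing its own implicit `BEGIN`, so the explicit `BEGIN`/`COMMIT` pair alone defines the transaction, and therefore the commit.

```python
import sqlite3
from typing import Iterable


def run_in_one_commit(conn: sqlite3.Connection, statements: Iterable[str]) -> None:
    """Run every statement inside a single explicit transaction.

    On LiteTree each transaction is saved as one commit, so the whole
    batch should add exactly one commit to the current branch.
    """
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
    except Exception:
        # Some errors already roll back automatically; only roll back if needed.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


# isolation_level=None disables the module's implicit BEGIN, so the
# BEGIN/COMMIT above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t1 (name)")  # outside BEGIN/COMMIT: its own transaction
run_in_one_commit(conn, [
    "INSERT INTO t1 VALUES ('first')",
    "INSERT INTO t1 VALUES ('second')",
])
```

If any statement fails, the whole batch is rolled back, so no partial commit is left behind.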

We create a new branch by specifying the source branch and commit number:

```
PRAGMA new_branch=test at master.2
```

After this command is executed, the new branch exists but no new data has been added to it yet. The database connection also moves to the new branch, making it the current branch.

We can check the current branch with the command:

```
PRAGMA branch
```

In this case it will return: test

If we execute an SQL command on this connection, the commit is saved in the connection's current branch:

```
INSERT INTO t1 VALUES ('from test branch')
```

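Now the graph state is: master has commits 1, 2 and 3, while the test branch shares commits 1 and 2 with master and has its own commit 3.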

We can also read the database at this new branch:

```
SELECT * FROM t1
```

It will return these values:

first

from test branch

We can move to the master branch:

```
PRAGMA branch=master
```

Executing the same SELECT command on the master branch returns:

first

second

Different content for the same table on separate branches.

Commit numbers are based on the distance from the first commit, so commits in separate branches can have the same number. In this example, both master and test have a commit 3.

We can read the database in a previous point-in-time by moving to that commit, like this:

```
PRAGMA branch=master.2
```

At this point table t1 has a single row, so a SELECT returns just first.

We cannot write to the database while positioned at a specific commit; writing is only possible at the head of each branch. If you want to make changes starting from a previous commit, you must create a new branch that starts at that commit.
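The walkthrough above can be reproduced with a small in-memory model. This is a hypothetical toy, not LiteTree's implementation. LiteTree stores modified database pages in LMDB; it does not replay SQL. In the toy, each branch records its source branch and the commit it was forked at. Reading `<branch>.<commit>` replays the SQL of commits 1..n into a fresh `:memory:` database. The class and function names are illustrative only.

```python
import sqlite3
from typing import Dict, List, Optional


class ToyBranch:
    def __init__(self, name: str, parent: Optional["ToyBranch"] = None, fork_at: int = 0) -> None:
        self.name = name
        self.parent = parent      # branch this one was created from
        self.fork_at = fork_at    # last commit number shared with the parent
        self.own: List[str] = []  # SQL text of commits made on this branch

    def head(self) -> int:
        # Numbering counts the distance from the first commit, so a
        # branch's own commits continue from its fork point.
        return self.fork_at + len(self.own)


class ToyTree:
    def __init__(self) -> None:
        self.branches: Dict[str, ToyBranch] = {"master": ToyBranch("master")}

    def commit(self, branch: str, sql: str) -> int:
        """Append a commit at the head of a branch and return its number."""
        b = self.branches[branch]
        b.own.append(sql)
        return b.head()

    def new_branch(self, name: str, source: str, commit: int) -> None:
        src = self.branches[source]
        if not 1 <= commit <= src.head():
            raise ValueError(f"{source}.{commit} does not exist")
        # Nothing is copied: the new branch only records where it starts.
        self.branches[name] = ToyBranch(name, src, commit)

    def history(self, branch: str, upto: Optional[int] = None) -> List[str]:
        """SQL of commits 1..upto as seen from the given branch."""
        b = self.branches[branch]
        upto = b.head() if upto is None else upto
        if not 0 <= upto <= b.head():
            raise ValueError(f"{branch}.{upto} does not exist")
        if b.parent is not None and upto <= b.fork_at:
            return self.history(b.parent.name, upto)
        inherited = self.history(b.parent.name, b.fork_at) if b.parent is not None else []
        return inherited + b.own[: upto - b.fork_at]


def read_t1(tree: ToyTree, branch: str, commit: Optional[int] = None) -> List[str]:
    """Rebuild the state at branch[.commit] and return the rows of t1."""
    db = sqlite3.connect(":memory:")
    for sql in tree.history(branch, commit):
        db.execute(sql)
    rows = [row[0] for row in db.execute("SELECT name FROM t1 ORDER BY rowid")]
    db.close()
    return rows


tree = ToyTree()
tree.commit("master", "CREATE TABLE t1 (name)")            # master.1
tree.commit("master", "INSERT INTO t1 VALUES ('first')")   # master.2
tree.commit("master", "INSERT INTO t1 VALUES ('second')")  # master.3
tree.new_branch("test", "master", 2)
tree.commit("test", "INSERT INTO t1 VALUES ('from test branch')")  # test.3
```

Here `read_t1(tree, "test")`, `read_t1(tree, "master")` and `read_t1(tree, "master", 2)` give the same rows as the SELECT results in the walkthrough. In the toy, writes can only go to a branch head, mirroring the rule that commits are only added at the head of a branch.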

It is also possible to truncate a branch at a specific commit, rename a branch, delete it and retrieve branch info.

Supported commands

Selecting the active branch:
```
 PRAGMA branch=<name>
```
Selecting a specific commit in a branch:
```
 PRAGMA branch=<name>.<commit>
```
Retrieving the current/active branch:
```
 PRAGMA branch
```
Listing the existing branches:
```
 PRAGMA branches
```

Creating a new branch:
```
 PRAGMA new_branch=<name> at <source>.<commit>
```

Deleting a branch:
```
 PRAGMA del_branch(<name>)
```

Renaming a branch:
```
 PRAGMA rename_branch <old_name> <new_name>
```

Truncating a branch at a specific commit:
```
 PRAGMA branch_truncate(<name>.<commit>)
```

Displaying the tree structure:
```
 PRAGMA branch_tree
```
Retrieving the branch info:
```
 PRAGMA branch_info(<name>)
```
Showing the commit and SQL log/history for a branch:
```
 PRAGMA branch_log(<name>)
```
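Applications that build these commands dynamically need to split and compose `<name>.<commit>` references. Below is a minimal sketch with hypothetical helper names. The allowed branch-name characters are an assumption, because the README does not specify them. Names are therefore restricted to a conservative set before being spliced into SQL text.

```python
import re
from typing import Optional, Tuple

# Assumption: branch names limited to letters, digits, '_' and '-'.
# This keeps a name from being confused with the ".<commit>" suffix and
# from smuggling extra SQL into the PRAGMA text.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")
_COMMIT = re.compile(r"[1-9][0-9]*")


def branch_ref(name: str, commit: Optional[int] = None) -> str:
    """Format <name> or <name>.<commit>."""
    if not _NAME.fullmatch(name):
        raise ValueError(f"invalid branch name: {name!r}")
    if commit is None:
        return name
    if commit < 1:
        raise ValueError(f"commit numbers start at 1, got {commit}")
    return f"{name}.{commit}"


def parse_ref(ref: str) -> Tuple[str, Optional[int]]:
    """Split 'master.2' into ('master', 2) and 'master' into ('master', None)."""
    name, sep, commit = ref.rpartition(".")
    if not sep:
        return branch_ref(ref), None
    if not _COMMIT.fullmatch(commit):
        raise ValueError(f"invalid commit number in {ref!r}")
    return branch_ref(name), int(commit)


def pragma_branch(ref: str) -> str:
    """Text for selecting a branch or a specific commit in it."""
    name, commit = parse_ref(ref)
    return f"PRAGMA branch={branch_ref(name, commit)}"


def pragma_new_branch(name: str, source: str, commit: int) -> str:
    """Text for creating a branch at <source>.<commit>."""
    return f"PRAGMA new_branch={branch_ref(name)} at {branch_ref(source, commit)}"
```

For example, `pragma_new_branch("test", "master", 2)` produces the same command used in the walkthrough. A rejected name raises `ValueError` instead of reaching the database.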

Not yet available

Some of these commands are being developed:

Modifying a commit:
```
 PRAGMA branch_log [--set|--add|--del] <name> <sql commands>
```

Showing the diff between 2 branches or commits:
```
 PRAGMA branch_diff <from_branch>[.<commit>] <to_branch>[.<commit>]
```

Saving metadata to each branch and/or commit
Merging 2 branches

And maybe these extended features could be supported:

Access control by branch

Check the roadmap on our wiki. Feature requests and suggestions are welcome.

Technologies

We can use LiteTree with big databases (many gigabytes). No data is copied when a new branch is created. When a new transaction is committed, only the modified database pages are copied.

LiteTree is implemented by storing the SQLite db pages in LMDB.

The data is not compressed, and each db page is stored in just one disk sector (4096 bytes by default). This is achieved by reserving some bytes at the end of each SQLite db page so that it fits into one LMDB overflow page, which can hold 4080 (4096 - 16) bytes.
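The sketch below counts LMDB pages per stored SQLite page to show why the reservation matters. It makes two assumptions for illustration. First, a stored value occupies ceil((size + 16) / 4096) LMDB overflow pages, using the 16-byte header implied by the 4096 - 16 figure. Second, LiteTree stores only the non-reserved part of each SQLite page. Under these assumptions, a full 4096-byte page would spill into a second LMDB page without the 16 reserved bytes.

```python
import math

PAGE_SIZE = 4096   # default LMDB page and SQLite page size in this README
LMDB_HEADER = 16   # implied by "4080 (4096 - 16)"


def lmdb_pages_per_value(value_size: int) -> int:
    """LMDB overflow pages needed for one stored value (assumed rule)."""
    return math.ceil((value_size + LMDB_HEADER) / PAGE_SIZE)


def file_bytes(n_sqlite_pages: int, reserved: int) -> int:
    """Approximate space used by n SQLite pages, ignoring LMDB's B-tree pages.

    Assumes only the non-reserved part of each SQLite page is stored.
    """
    stored = PAGE_SIZE - reserved
    return n_sqlite_pages * lmdb_pages_per_value(stored) * PAGE_SIZE


with_reserve = file_bytes(1000, reserved=LMDB_HEADER)  # 4080-byte values
without_reserve = file_bytes(1000, reserved=0)          # full 4096-byte values
```

Under these assumptions, reserving 16 bytes per page halves the space used for page data compared with storing full 4096-byte pages.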

Performance

LiteTree is much faster than normal SQLite (rollback journal mode) and its performance is comparable to WAL mode.

Here are some results:

Linux

writing:
--------
normal   = 22.8921730518 seconds
wal      = 10.7780168056 seconds
mmap     = 10.4009709358 seconds
litetree = 10.8633410931 seconds

reading:
--------
normal   = 0.817955970764 seconds
wal      = 0.660045146942 seconds
mmap     = 0.592491865158 seconds
litetree = 0.619393110275 seconds

MacOSX

writing:
--------
normal   = 1.9102909565 seconds
wal      = 1.30300784111 seconds
mmap     = 1.21677088737 seconds
litetree = 0.988132953644 seconds

reading:
--------
normal   = 0.999235868454 seconds
wal      = 0.776713132858 seconds
mmap     = 0.653935909271 seconds
litetree = 0.714652061462 seconds

Windows

writing:
--------
normal   = 68.0931215734 seconds
litetree = 39.239919979 seconds

reading:
--------
normal   = 0.012673914421 seconds
litetree = 0.00631055510799 seconds
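The ratios below are computed directly from the timings reported above. They only re-express those figures and add no new measurements. A ratio above 1 means LiteTree finished faster than the baseline.

```python
# Timings in seconds, copied from the results above ("normal" = journal mode).
RESULTS = {
    "Linux": {
        "writing": {"normal": 22.8921730518, "wal": 10.7780168056,
                    "mmap": 10.4009709358, "litetree": 10.8633410931},
        "reading": {"normal": 0.817955970764, "wal": 0.660045146942,
                    "mmap": 0.592491865158, "litetree": 0.619393110275},
    },
    "MacOSX": {
        "writing": {"normal": 1.9102909565, "wal": 1.30300784111,
                    "mmap": 1.21677088737, "litetree": 0.988132953644},
        "reading": {"normal": 0.999235868454, "wal": 0.776713132858,
                    "mmap": 0.653935909271, "litetree": 0.714652061462},
    },
    "Windows": {
        "writing": {"normal": 68.0931215734, "litetree": 39.239919979},
        "reading": {"normal": 0.012673914421, "litetree": 0.00631055510799},
    },
}


def speedup(times, baseline="normal"):
    """Baseline time divided by LiteTree time; > 1 means LiteTree was faster."""
    return times[baseline] / times["litetree"]


for system, ops in RESULTS.items():
    for op, times in ops.items():
        print(f"{system:8} {op:8} {speedup(times):.2f}x vs normal")
```

For example, on Linux LiteTree writes about 2.1 times faster than normal mode, and its writing time is within 1% of WAL mode.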

You can run your own benchmark (after installing LiteTree) with this command:

```
make benchmark
```

Current Limits

Number of branches: 1024 branches (can be increased)

Number of commits per branch: 2^64 - 1 = 18,446,744,073,709,551,615 commits

Concurrent db connections to the same db: XXX readers
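The commit limit corresponds to the largest unsigned 64-bit integer. The snippet below checks that figure and uses `ctypes` to show that a 64-bit counter would wrap to 0 on one more increment. Treating the commit number as a `uint64` is an assumption about LiteTree's internals. The one-commit-per-microsecond workload is purely hypothetical.

```python
import ctypes

MAX_COMMITS = 2**64 - 1  # largest unsigned 64-bit value
assert f"{MAX_COMMITS:,}" == "18,446,744,073,709,551,615"

# ctypes does no overflow checking, so a uint64 counter one past the
# maximum wraps around to 0.
wrapped = ctypes.c_uint64(MAX_COMMITS + 1).value

# Hypothetical workload: one commit every microsecond, nonstop.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years_to_exhaust = MAX_COMMITS / 1_000_000 / SECONDS_PER_YEAR
```

Even at that rate, a single branch would take roughly 585 thousand years to run out of commit numbers.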

Some Limitations

A database file created on one architecture cannot be used on another. This is a limitation of LMDB. To move it, we need to dump the database using mdb_dump and load it using mdb_load.

The db file cannot be opened by unmodified SQLite libraries.

Savepoints are not yet supported.
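A deployment script can guard against the architecture limitation by recording which kind of machine created a database. This minimal sketch assumes the properties that matter are byte order and word size, the usual reasons LMDB files are not portable. The function names, and the idea of storing the signature when the database is created, are hypothetical.

```python
import struct
import sys


def arch_signature() -> str:
    """Byte order and pointer width of the running process."""
    bits = struct.calcsize("P") * 8  # size of a C pointer in this build
    return f"{sys.byteorder}-endian/{bits}-bit"


def can_open_directly(recorded_signature: str) -> bool:
    """False means: dump with mdb_dump on the old machine, load with mdb_load here."""
    return recorded_signature == arch_signature()
```

The signature would be saved alongside the database file when it is created and compared before opening it elsewhere.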

How to use

LiteTree can be used in many programming languages via existing SQLite wrappers.

Update your app to open the database file using a URI containing the branches parameter, like this:
```
file:data.db?branches=on
```
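Below is a minimal sketch of opening such a URI from Python's standard `sqlite3` module and checking which branch is active. The helper names are hypothetical. The check relies on documented SQLite behavior: unknown URI query parameters are not an error, and unknown pragmas are silently ignored. A stock library therefore returns no row for `PRAGMA branch`, which tells you LiteTree was not actually loaded.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def open_with_branches(path: str) -> sqlite3.Connection:
    """Open a database file with the branches URI parameter enabled."""
    # as_uri() needs an absolute path and percent-encodes special characters.
    uri = Path(path).resolve().as_uri() + "?branches=on"
    # uri=True makes sqlite3 interpret the string as an SQLite URI filename.
    return sqlite3.connect(uri, uri=True)


def current_branch(conn: sqlite3.Connection) -> Optional[str]:
    """Return the active branch name, or None if the pragma is ignored."""
    row = conn.execute("PRAGMA branch").fetchone()
    return row[0] if row else None
```

With LiteTree loaded, `current_branch(conn)` should return a branch name such as master. A result of `None` means the app is still using an unmodified SQLite library.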
Make your app use this new library instead of the pre-installed SQLite library:

On Linux

This can be achieved in 4 ways:

Using the LD_LIBRARY_PATH environment variable:
```
 LD_LIBRARY_PATH=/usr/local/lib/litetree ./myapp
```
This can be used with all programming languages and wrappers.

Patching your wrapper or app to search for the library in the new path:
```
 patchelf --set-rpath /usr/local/lib/litetree lib_or_app
```
Setting the rpath at the link time:
```
 LIBPATH=/usr/local/lib/litetree
 gcc myapp.c -Wl,-rpath,$LIBPATH -L$LIBPATH -lsqlite3
```
You can use this if your app links directly to the LiteTree library.

Replacing the pre-installed SQLite library on your system

This can also be used with many programming languages, but use it with care because the native library may have been compiled with different compile-time options.

On Mac OSX

This can be achieved in these ways:

Patching your wrapper or app to search for the library in the new path:
```
 install_name_tool -change /old/path/to/libsqlite3.dylib /usr/local/lib/litetree/libsqlite3.dylib lib_or_app
```
You can check the old path with this command:
```
 otool -L lib_or_app
```
This method can be used with all programming languages and wrappers as long as they are not protected by the OS.

If it is protected, you will need to install a new copy of the wrapper, modify it and use it instead of the protected one.

Using the DYLD_LIBRARY_PATH environment variable:
```
 DYLD_LIBRARY_PATH=/usr/local/lib/litetree ./myapp
```
This can be used if the wrapper was linked to just the library name and does not contain any path.

If that does not work, we can patch the wrapper so that it does not contain any path:
```
 install_name_tool -change /old/path/to/libsqlite3.dylib libsqlite3.dylib lib_or_app
```
But if you are able to modify the wrapper with install_name_tool then the first method above may be better.

Linking to the LiteTree library:
```
 gcc myapp.c -L/usr/local/lib/litetree -lsqlite3
```
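The environment-variable approach can also be scripted. This sketch launches a program with LiteTree's directory searched first. It prepends the directory to `LD_LIBRARY_PATH` on Linux or `DYLD_LIBRARY_PATH` on macOS. The function names are hypothetical, and `/usr/local/lib/litetree` is the install path used in this README. On macOS, System Integrity Protection strips `DYLD_*` variables when launching protected executables, so this only helps for unprotected programs.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional

LITETREE_LIBDIR = "/usr/local/lib/litetree"  # install path used in this README


def litetree_env(base: Optional[Mapping[str, str]] = None,
                 libdir: str = LITETREE_LIBDIR,
                 platform: str = sys.platform) -> Dict[str, str]:
    """Copy of the environment with libdir searched first for shared libraries."""
    env = dict(os.environ if base is None else base)
    # Linux uses LD_LIBRARY_PATH, macOS uses DYLD_LIBRARY_PATH.
    var = "DYLD_LIBRARY_PATH" if platform == "darwin" else "LD_LIBRARY_PATH"
    previous = env.get(var)
    env[var] = f"{libdir}:{previous}" if previous else libdir
    return env


def run_with_litetree(cmd):
    """Run cmd (for example ["./myapp"]) so it loads LiteTree's libsqlite3 first."""
    return subprocess.run(cmd, env=litetree_env(), check=True)
```

Any existing search path is kept after the LiteTree directory rather than replaced.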

On Windows

Copy the modified SQLite library to the system folder.

On 64-bit Windows:

C:\Windows\System32 (if 64-bit DLL)

C:\Windows\SysWOW64 (if 32-bit DLL)

On 32-bit Windows:

C:\Windows\System32

Compiling and installing

On Linux and Mac OSX

Install LMDB if not already installed:

```
git clone https://github.com/lmdb/lmdb
cd lmdb/libraries/liblmdb
make
sudo make install
```

Then install LiteTree:

```
git clone https://github.com/aergoio/litetree
cd litetree
make
sudo make install
```

On Windows

You can use these pre-compiled binaries (they may be outdated):

Or follow these steps:

Compile LMDB using MinGW or Visual Studio (1 or 2)
Compile LiteTree using MinGW or Visual Studio
Copy the libraries to the Windows System folder

Running the Tests

The tests are written in Python using the pysqlite wrapper.

On MacOSX we cannot use a modified SQLite library with the pre-installed system Python due to System Integrity Protection, so we need to install another copy of pysqlite and link it to the LiteTree library:

```
git clone https://github.com/ghaering/pysqlite
cd pysqlite
echo "include_dirs=/usr/local/include" >> setup.cfg
echo "library_dirs=/usr/local/lib/litetree" >> setup.cfg
python setup.py build
sudo python setup.py install
```

To run the tests:

```
make test
```

License

MIT

Creator

Developed by Bernardo Ramos at


litetree's Issues

Branch and thread?

Does multi-branch access work with multiple threads or processes?

For example, can branch master be used on thread A and branch test on thread B at the same time without locking?

Does it support merging branches?

Builtin multiindex primary key support

Hi. I guess this may not be the right place to post a bug report, but I'm going to post it here since the SQLite project has some issues with its bug tracker and mailing list.

The problem is that I need FAST lookups by 2 indexes and ranges. Currently I solve this problem by packing 2 numbers into one rowid by concatenation, i.e. major_index << n_bits | minor_index, with lookups like where oid == (major_index << n_bits | minor_index) and where oid >= major_index << n_bits and oid <= major_index << n_bits | minor_index_mask. This solves the problem pretty well, but it ruins the schema: parts of the rowid cannot be part of the schema, so I cannot store this relation in the DB and make the DB engine enforce consistency. I mean that major_index is a foreign key referring to a primary key in another table.

I wonder if it makes sense have this feature built into the db.
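The rowid-packing workaround described in this issue can be sketched with Python's `sqlite3` module. The 32/32 bit split and the helper names are assumptions for illustration. SQLite rowids are signed 64-bit integers, so the major index is kept below 2^31 to keep packed values positive and sorted in (major, minor) order.

```python
import sqlite3
from typing import Tuple

MINOR_BITS = 32                      # assumed split: 32 bits for the minor index
MINOR_MASK = (1 << MINOR_BITS) - 1


def pack(major: int, minor: int) -> int:
    """Concatenate two indexes into one rowid: major << n_bits | minor."""
    if not (0 <= major < 1 << 31 and 0 <= minor <= MINOR_MASK):
        raise ValueError("index out of range")
    return major << MINOR_BITS | minor


def unpack(rowid: int) -> Tuple[int, int]:
    return rowid >> MINOR_BITS, rowid & MINOR_MASK


def major_range(major: int) -> Tuple[int, int]:
    """Inclusive rowid bounds covering every minor index of one major index."""
    return pack(major, 0), pack(major, MINOR_MASK)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
for major, minor in [(1, 0), (1, 7), (2, 3)]:
    conn.execute("INSERT INTO items VALUES (?, ?)", (pack(major, minor), f"{major}/{minor}"))

lo, hi = major_range(1)
rows = conn.execute(
    "SELECT id, payload FROM items WHERE id BETWEEN ? AND ? ORDER BY id", (lo, hi)
).fetchall()
```

The range query uses the rowid B-tree directly, which is what makes the scheme fast. As the issue notes, though, nothing in the schema tells SQLite that the upper bits refer to another table.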

Command line interface

Is there any command line interface to interact with litetree database?

Where are the sources without amalgamation?

Hello!
Nice project, but I wonder why you do not also show the individual src files (no amalgamation) so that others can see what changes you made to the original sqlite3, or even contribute to your project.
Cheers!

Size limit 512 MB

When my .db file reaches 512 MB, every UPDATE/INSERT query fails with an SQL logic error.

Keeping up with SQLite development and special SQLite features?

Some questions (maybe FAQ candidates) regarding the whole SQLite environment:

How up to date is the SQLite engine in litetree? Can the SQLite code be updated manually to keep up with new SQLite versions, or are many patches necessary?
Are SQLite features like encryption, spatial indexing extensions, etc. transparently supported?
Is it right that the SQLite session extension could be completely replaced by litetree?

Document how to merge newer SQLite releases

As a fork, LiteTree is inevitably going to need updating as newer versions of SQLite are released (At the moment SQLite is up to 3.30, while litetree is still based on 3.27.2.) Keeping up with SQLite can be very important if one needs new features or bug fixes.

How does one perform such an update? All I see in here are heavily modified versions of sqlite3.c and sqlite3.h, not any tools for updating the source. Does one just run a 3-way-merge tool and feed it litetree's source, the unmodified version of the same SQLite release, and the new SQLite release?

It would be good to document this, so potential users of the library can feel that they're not going to be stuck on an old release of SQLite forever, or be dependent on someone else to keep it up to date for them.

Error building make test

I am on High Sierra. I tried to follow the given instructions to build litetree on my machine, and it gave me the following errors:

```
macoss-iMac:litetree macos$ make test
cd test && python test.py -v
test00_read_config (__main__.TestSQLiteBranches) ... ERROR
test01_branches (__main__.TestSQLiteBranches) ... FAIL
test02_branch_info (__main__.TestSQLiteBranches) ... FAIL
test02_branch_tree (__main__.TestSQLiteBranches) ... ERROR
test02b_sql_log (__main__.TestSQLiteBranches) ... ERROR
test03_reading_branches_at_the_same_time (__main__.TestSQLiteBranches) ... FAIL
test04_concurrent_access (__main__.TestSQLiteBranches) ... FAIL
test05_single_connection_uri (__main__.TestSQLiteBranches) ... FAIL
test06_invalid_branch_name (__main__.TestSQLiteBranches) ... ERROR
test07_rename_branch (__main__.TestSQLiteBranches) ... FAIL
test08_truncate_branch (__main__.TestSQLiteBranches) ... ERROR
test09_delete_branch (__main__.TestSQLiteBranches) ... FAIL
test10_rollback (__main__.TestSQLiteBranches) ... ERROR
test11_attached_dbs (__main__.TestSQLiteBranches) ... FAIL
test12_temporary_db (__main__.TestSQLiteBranches) ... FAIL
test13_discard_commits (__main__.TestSQLiteBranches) ... FAIL
test14_forward_merge (__main__.TestSQLiteBranches) ... FAIL
test15_forward_merge (__main__.TestSQLiteBranches) ... FAIL
test18_savepoints (__main__.TestSQLiteBranches) ... ERROR
test19_closed_connection (__main__.TestSQLiteBranches) ... ERROR
test20_open_while_writing (__main__.TestSQLiteBranches) ... ERROR
test21_internal_temporary_dbs (__main__.TestSQLiteBranches) ... ERROR
test22_normal_sqlite (__main__.TestSQLiteBranches) ... ok
```

Is there anything I can trace from the logs? Where are they located?

EDIT: I fixed the formatting, it did not go well with the code tag

Need a go wrapper

Since aergo blockchain kernel is written in go, is there a golang wrapper for litetree?

Comparison with Fossil, an SCM based on SQLite

Doesn't work with sqlite3 ruby

I tried litetree with sqlite3 ruby gem which didn't work.

Does PRAGMA branch_truncate() delete the "future" or the "past"?

At first look, I thought this would be great for "game saves", but one thing is not clear to me: how to prevent the DB from growing infinitely.

Keeping the entire transaction history of a DB, which is the only way one can implement "branching from a previous transaction", has a significant cost.

Is "PRAGMA branch_truncate()" used to truncate the "history of the past", thereby "compacting" it, or is it there to remove all changes after some point, thereby performing a "revert"?

If "PRAGMA branch_truncate()" is used to "revert" a branch to a previous state, then how do you "merge all changes" before some point, thereby losing the history, to compact the database? Without it, the DB will grow infinitely, even if you only ever modify one single row in one single table.

Single file option?

Is it possible to store all the necessary information in a single file? This would make it possible to copy files around that contain the complete life-cycle.

branch PRAGMA not working: SQL logic error

I have compiled the latest litetree on macOS 10.10.5, and I cannot get the branch commands to work. Judging from the output, the new commands are available, but for some reason they are not working:

```
sqlite> PRAGMA new_branch=test at master.2;
Error: SQL logic error
sqlite> PRAGMA new_branch=test at master.1;
Error: SQL logic error
sqlite> PRAGMA new_branch=test at master;
Error: SQL logic error
sqlite> PRAGMA new_branch=test;
Error: SQL logic error
sqlite> PRAGMA new_branch;
Error: argument required
sqlite> PRAGMA branch;
sqlite> PRAGMA branches;
```

Pruning of old commits / revisions

As far as I understand, litetree works by keeping copies of all versions of any changed database pages. But that also means that the total storage requirement grows over time, particularly if the database is changed very often (even if the database itself does not grow), right? Is there a way to tell litetree that some commit / old version is no longer needed and can be discarded (like checkpointing in SQLite's WAL mode)?

Particularly for blockchain applications (which seems to be the intent of litetree) this can be useful: In order to handle chain reorgs, it is necessary to keep (persistent) snapshots of the last couple of states. But as more and more blocks are added on top of an old one, it can be pruned eventually to save space. (In some sense, revisions in litetree are like undo data in Bitcoin Core, which can (optionally) be pruned for old blocks.)

Perhaps my understanding is also completely wrong or this is already supported. If so, please let me know.
