
simonw / sqlite-diffable

Tools for dumping/loading a SQLite database to diffable directory structure

License: Apache License 2.0

Python 100.00%
sqlite datasette-tool datasette-io

sqlite-diffable's Introduction

sqlite-diffable

Installation

pip install sqlite-diffable

Demo

The repository at simonw/simonwillisonblog-backup contains a backup of the database behind my blog, https://simonwillison.net/, created using this tool.

Dumping a database

Given a SQLite database called fixtures.db containing a table facetable, the following will dump out that table to the dump/ directory:

sqlite-diffable dump fixtures.db dump/ facetable

To dump out every table in that database, use --all:

sqlite-diffable dump fixtures.db dump/ --all

Loading a database

To load a previously dumped database, run the following:

sqlite-diffable load restored.db dump/

This will show an error if any of the tables that are being restored already exist in the database file.

You can replace those tables (dropping them before restoring them) using the --replace option:

sqlite-diffable load restored.db dump/ --replace
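The load step can be pictured with Python's built-in sqlite3 module. This is a minimal sketch, not the tool's actual implementation: load_dump() and its replace parameter are hypothetical names that mirror the --replace option, and the sketch assumes each table's files are named after the table (table_name.metadata.json and table_name.ndjson), with the metadata holding the name, columns and schema keys described under Storage format.

```python
import json
import sqlite3
from pathlib import Path


def load_dump(conn: sqlite3.Connection, dump_dir: str, replace: bool = False) -> list:
    """Hypothetical helper: restore every table in dump_dir, return the names loaded."""
    loaded = []
    for meta_path in sorted(Path(dump_dir).glob("*.metadata.json")):
        meta = json.loads(meta_path.read_text())
        name, columns, schema = meta["name"], meta["columns"], meta["schema"]
        exists = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
        ).fetchone()
        if exists:
            if not replace:
                # Mirrors the default behaviour: refuse to overwrite an existing table
                raise ValueError(f"Table {name!r} already exists, use replace=True")
            conn.execute(f'DROP TABLE "{name}"')
        conn.execute(schema)
        column_list = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        # Each line of table_name.ndjson is one row, as a JSON array in column order
        with meta_path.with_name(f"{name}.ndjson").open() as fp:
            rows = (json.loads(line) for line in fp if line.strip())
            conn.executemany(
                f'INSERT INTO "{name}" ({column_list}) VALUES ({placeholders})', rows
            )
        loaded.append(name)
    conn.commit()
    return loaded
```

Naming the columns explicitly in the INSERT means the row arrays only have to match the order in the metadata file. Table and column names are wrapped in plain double quotes, which is enough for a sketch but not for names that themselves contain quote characters.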

Converting to JSON objects

Table rows are stored in the .ndjson files as newline-delimited JSON arrays, like this:

["a", "a", "a-a", 63, null, 0.7364712141640124, "$null"]
["a", "b", "a-b", 51, null, 0.6020187290499803, "$null"]

Sometimes it can be more convenient to work with a list of JSON objects.

The sqlite-diffable objects command can read a .ndjson file and its accompanying .metadata.json file and output JSON objects to standard output:

sqlite-diffable objects fixtures.db dump/sortable.ndjson

The output of that command looks something like this:

{"pk1": "a", "pk2": "a", "content": "a-a", "sortable": 63, "sortable_with_nulls": null, "sortable_with_nulls_2": 0.7364712141640124, "text": "$null"}
{"pk1": "a", "pk2": "b", "content": "a-b", "sortable": 51, "sortable_with_nulls": null, "sortable_with_nulls_2": 0.6020187290499803, "text": "$null"}
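Building those objects only needs the column list from the metadata file: each row array is zipped with it. The following is a hedged Python sketch of that idea; ndjson_to_objects() is a hypothetical helper that reads just the two dump files, and it assumes the metadata file shares the .ndjson file's base name.

```python
import json
from pathlib import Path
from typing import Iterator


def ndjson_to_objects(ndjson_path: str) -> Iterator[dict]:
    """Hypothetical helper: yield one dict per row, keyed by column name."""
    path = Path(ndjson_path)
    # Assumes foo.ndjson sits next to foo.metadata.json
    meta_path = path.with_name(path.name[: -len(".ndjson")] + ".metadata.json")
    columns = json.loads(meta_path.read_text())["columns"]
    with path.open() as fp:
        for line in fp:
            if line.strip():
                # Pair each value in the row array with its column name
                yield dict(zip(columns, json.loads(line)))
```

Printing json.dumps(obj) for each yielded object gives newline-delimited output, while json.dumps(list(ndjson_to_objects(...))) gives a single JSON array like the one produced by --array.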

Add -o to write that output to a file:

sqlite-diffable objects fixtures.db dump/sortable.ndjson -o output.txt

Add --array to output a JSON array of objects, as opposed to a newline-delimited file:

sqlite-diffable objects fixtures.db dump/sortable.ndjson --array

Output:

[
{"pk1": "a", "pk2": "a", "content": "a-a", "sortable": 63, "sortable_with_nulls": null, "sortable_with_nulls_2": 0.7364712141640124, "text": "$null"},
{"pk1": "a", "pk2": "b", "content": "a-b", "sortable": 51, "sortable_with_nulls": null, "sortable_with_nulls_2": 0.6020187290499803, "text": "$null"}
]

Storage format

Each table is represented as two files. The first, table_name.metadata.json, contains metadata describing the structure of the table. For a table called redirects_redirect that file might look like this:

{
    "name": "redirects_redirect",
    "columns": [
        "id",
        "domain",
        "path",
        "target",
        "created"
    ],
    "schema": "CREATE TABLE [redirects_redirect] (\n   [id] INTEGER PRIMARY KEY,\n   [domain] TEXT,\n   [path] TEXT,\n   [target] TEXT,\n   [created] TEXT\n)"
}

It is an object with three keys: name is the name of the table, columns is an array of column name strings, and schema is the SQL schema text used for that table.

The second file, table_name.ndjson, contains newline-delimited JSON for every row in the table. Each row is represented as a JSON array with items corresponding to each of the columns defined in the metadata.

For the redirects_redirect table, that file (redirects_redirect.ndjson) might look like this:

[1, "feeds.simonwillison.net", "swn-everything", "https://simonwillison.net/atom/everything/", "2017-10-01T21:11:36.440537+00:00"]
[2, "feeds.simonwillison.net", "swn-entries", "https://simonwillison.net/atom/entries/", "2017-10-01T21:12:32.478849+00:00"]
[3, "feeds.simonwillison.net", "swn-links", "https://simonwillison.net/atom/links/", "2017-10-01T21:12:54.820729+00:00"]
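Producing these two files from a live table can be sketched with Python's standard sqlite3 module. dump_table() is a hypothetical helper, not the tool's code: it reads the schema from sqlite_master and the column names from PRAGMA table_info, then writes one JSON array per row. It assumes a rowid table (WITHOUT ROWID tables would need a different ORDER BY) and does not handle BLOB values, which json.dumps() cannot serialize.

```python
import json
import sqlite3
from pathlib import Path


def dump_table(conn: sqlite3.Connection, table: str, dump_dir: str) -> None:
    """Hypothetical helper: write table.metadata.json and table.ndjson for one table."""
    out = Path(dump_dir)
    out.mkdir(parents=True, exist_ok=True)
    schema = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()[0]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    metadata = {"name": table, "columns": columns, "schema": schema}
    (out / f"{table}.metadata.json").write_text(json.dumps(metadata, indent=4))
    quoted = ", ".join(f'"{c}"' for c in columns)
    with (out / f"{table}.ndjson").open("w") as fp:
        # A fixed row order keeps consecutive dumps stable, so diffs stay small
        for row in conn.execute(f'SELECT {quoted} FROM "{table}" ORDER BY rowid'):
            fp.write(json.dumps(list(row)) + "\n")
```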

sqlite-diffable's Issues

Restoring dumps which contain AUTOINCREMENT columns

Hi, great tool, this is exactly what I'm looking for.

I do however have an issue when using AUTOINCREMENT columns.

If you create a table like this

CREATE TABLE IF NOT EXISTS DummyTable (
    DummyTableId INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
);

and insert a row into that table

INSERT INTO DummyTable VALUES(NULL);

the exported dump will contain both the .metadata.json and the .ndjson file of the sqlite_sequence table.

Importing that dump into a new database file fails with:

Error: object name reserved for internal use: sqlite_sequence

Importing into an existing file fails when the tool tries to drop that table.

Updating the table does seem to work, though, so a special case that simply inserts or updates the values in sqlite_sequence might work.
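The special case suggested above could look something like this Python sketch. restore_sqlite_sequence() is a hypothetical helper: instead of running the dumped CREATE TABLE for sqlite_sequence, it writes the dumped (name, seq) rows into the table SQLite already maintains. SQLite allows ordinary INSERT, UPDATE and DELETE on sqlite_sequence, but that table only exists once an AUTOINCREMENT table has been created, so this would have to run after those tables are restored.

```python
import sqlite3


def restore_sqlite_sequence(conn: sqlite3.Connection, rows: list) -> None:
    """Hypothetical helper: copy dumped (name, seq) pairs into sqlite_sequence.

    Skips CREATE/DROP entirely, since SQLite reserves sqlite_sequence for its
    own use and creates it automatically for AUTOINCREMENT tables.
    """
    for name, seq in rows:
        # Delete first so an existing entry for this table is replaced, not duplicated
        conn.execute("DELETE FROM sqlite_sequence WHERE name = ?", (name,))
        conn.execute("INSERT INTO sqlite_sequence (name, seq) VALUES (?, ?)", (name, seq))
    conn.commit()
```

Deleting before inserting lets the same dump be applied to a file that already has a sequence entry for the table.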

Ability to round-trip binary data

e.g. for the binary numbits column in the .coverage SQLite database generated by coveragepy.

Those currently end up represented like this:

[4, 1, "b'\\xfe\\xff\\xfd{\\xe0\\x02\\x10\\x00W}o\\xdb{\\xef}o\\xef\\xbd\\xf7\\x92\\xe8\\x00\\x00\\xca\\t\\xe0\\xfb\\xdf\\x07y\\xdb\\xbe\\xf3\\x97s\\xd7\\xd8\\xeb\\x06\\xd9Y\\x16A\\x17\\xe6\\x02\\x02 @\\x08\\x10\\x00\\xbcH\\xc1$@\\xf7}?\\x01\\x04 \\x00\\x00\\x00\\x00\\x04%\\x00\\x04\\x00\\x00\\x00\\x00\\x00<\\x17H\\x00\\x00\\x12 \\xe9\\xc8\\x08\\x00\\x00\\x00\\x00\\x00\\x00@\\x00\\x00\\x00\\xd4M\\xb5\\x18\\x00w\\xd7\\xdd\\xdd\\xb6m\\xba\\xa9\\xe0\\xa7\\xf3Z\\x82\\xfbN\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00$`\\x00\\x04'"]

Once I implement the load command (#3) these will be a problem, because they won't round-trip correctly.

I need some kind of special-case syntax for storing binary values such that they can be round-tripped properly.
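One possible special-case syntax, sketched in Python: wrap each bytes value in a JSON object holding its base64 encoding. The {"$base64": true, "encoded": ...} shape and all helper names here are hypothetical, chosen only for illustration. A JSON object works as an unambiguous marker because SQLite values are only NULL, integers, reals, text or blobs, and none of the non-blob types serializes to an object.

```python
import base64
import json


def encode_value(value):
    """Hypothetical convention: wrap bytes so they survive a JSON round-trip."""
    if isinstance(value, bytes):
        return {"$base64": True, "encoded": base64.b64encode(value).decode("ascii")}
    return value


def decode_value(value):
    # Only encode_value() can have produced a JSON object inside a row array
    if isinstance(value, dict) and value.get("$base64"):
        return base64.b64decode(value["encoded"])
    return value


def row_to_line(row) -> str:
    """Serialize one row as a line of newline-delimited JSON."""
    return json.dumps([encode_value(v) for v in row])


def line_to_row(line: str) -> list:
    """Parse one line back into a row, restoring any bytes values."""
    return [decode_value(v) for v in json.loads(line)]
```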

`load` should handle case where database tables already exist

% sqlite-diffable load simonwillisonblog.db simonwillisonblog
Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/sqlite-diffable-J1UjzcIN/bin/sqlite-diffable", line 33, in <module>
    sys.exit(load_entry_point('sqlite-diffable', 'console_scripts', 'sqlite-diffable')())
  File "/Users/simon/.local/share/virtualenvs/sqlite-diffable-J1UjzcIN/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/simon/.local/share/virtualenvs/sqlite-diffable-J1UjzcIN/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/simon/.local/share/virtualenvs/sqlite-diffable-J1UjzcIN/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/simon/.local/share/virtualenvs/sqlite-diffable-J1UjzcIN/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/simon/.local/share/virtualenvs/sqlite-diffable-J1UjzcIN/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/simon/Dropbox/Development/sqlite-diffable/sqlite_diffable/cli.py", line 90, in load
    db.execute(schema)
  File "/Users/simon/.local/share/virtualenvs/sqlite-diffable-J1UjzcIN/lib/python3.10/site-packages/sqlite_utils/db.py", line 465, in execute
    return self.conn.execute(sql)
sqlite3.OperationalError: table "blog_entry" already exists

That error should be neater - but there should also be options for running this against a previously created database.

Command for outputting a dump as NL objects

Rows are currently stored on disk in this format:

[1, "Simon"]
[2, "Cleo"]

You have to consult the accompanying metadata JSON file to find out what the column names are.

A command that streams out one of these dumps as newline separated objects with the column names as keys would be useful.
