
brunodb's Introduction

brunodb

Brunodb is a lightweight but useful Python interface for SQLite and Postgres. It is tailored to data science workflows, which are mostly high-throughput streaming computation patterns rather than transactional ones.

The idea is to use databases instead of files, and to do most of your work in pure Python in a streaming fashion rather than with batch libraries like pandas and other data frame libraries. Databases allow operations like joins, ordering and simple aggregations without having to put everything in memory.
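As a rough illustration of that point, the sketch below lets the database do a join, an aggregation and an ordering, then iterates over the result one row at a time. It uses only the standard library sqlite3 module, not the brunodb API; the table names, column names and the `stream_totals` helper are made up for the example.

```python
import sqlite3


def stream_totals(conn):
    """Yield (customer name, total amount) rows, largest total first."""
    query = """
        SELECT c.name, SUM(o.amount) AS total
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.name
        ORDER BY total DESC
    """
    # The cursor is an iterator, so rows are consumed one at a time
    # instead of being collected into a list first.
    for row in conn.execute(query):
        yield row


# Hypothetical in-memory tables, just for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 5.0), (1, 2.5)])

totals = list(stream_totals(conn))
```

The join, grouping and sort all happen inside the database; the Python side only ever holds the current row.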

The library is part of a strategy to enable very productive proofs of concept on local resources (your laptop) which can migrate naturally and painlessly into production applications without extensive rewrites. Brunodb can be an efficient solution by itself for moderate data sizes. Streaming pattern pipelines can be ported to Spark or some other distributed cluster compute system fairly easily.
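A streaming pattern pipeline in this sense can be sketched with plain Python generators, where each stage pulls one record at a time from the previous one. This is only an illustration under assumed names (`read_rows`, `clean`, `running_total` and the sample CSV are hypothetical) and does not use brunodb itself.

```python
import csv
import io


def read_rows(lines):
    """Yield one dict per CSV record without reading the whole input."""
    yield from csv.DictReader(lines)


def clean(rows):
    """Drop rows with a missing price and convert the price to float."""
    for row in rows:
        if row["price"]:
            yield {**row, "price": float(row["price"])}


def running_total(rows):
    """Yield (item, cumulative price) pairs as rows arrive."""
    total = 0.0
    for row in rows:
        total += row["price"]
        yield row["item"], total


# A small in-memory stand-in for a file or other stream.
sample = io.StringIO("item,price\napple,1.5\npear,\nplum,2.0\n")
result = list(running_total(clean(read_rows(sample))))
```

Because every stage is a per-record transformation, the same steps map fairly directly onto the map and filter operations of a distributed system.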

Brunodb frees you from some of the lower-level details of dealing with the Python database clients. It gives you an easy and natural way to define a schema and load data from either files or streams. It gives you some shortcuts for queries while still allowing full SQL functionality when you need it. It makes working with SQLite and Postgres the same. And it allows very fast bulk loads into Postgres by leveraging the dbcrossbar library.

There are no real dependencies besides sqlite3, which is a standard library module, and pytest for running tests. psycopg2 and a Postgres database are needed to run the interface on Postgres. dbcrossbar (an easy-to-install Rust library) is required for extremely fast bulk loads into Postgres.

To install

pip install brunodb

See here for a demo:

Or to run demo:

from brunodb import demo
demo()

To run tests:

python -m pytest test

If you have postgres installed, you can test it as well. Put the database password in the POSTGRES_PWD environment variable and use the usual defaults: running on localhost, the default port, user name postgres, etc.

python -m pytest test_postscript 

If you install dbcrossbar you can do much faster postgres loads, around 80X faster.

python -m pytest test_postgres_bulk_load

Or run all tests if you have postgres and dbcrossbar installed:

python -m pytest

There is a wrapper for either Database class called DBase:

For in memory sqlite database:

from brunodb import DBase
config = {'db_type': 'sqlite'}
dbase = DBase(config)

Or with a file:

config = {'db_type': 'sqlite', 'filename': 'path/my_database.db'}
dbase = DBase(config)

Or using postgres:

config = {'db_type': 'postgres'}
dbase = DBase(config)

Or add other config options:

config = {'db_type': 'postgres', 'port': 5555, 'password':'foo'}
dbase = DBase(config)
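If you would rather not hard-code the password, the config dict can be built from the environment first. This is a minimal sketch: `postgres_config` is a hypothetical helper, not part of brunodb, and it only assumes the `db_type`, `port` and `password` keys shown above plus the POSTGRES_PWD variable used by the tests.

```python
import os


def postgres_config(env=os.environ, port=None):
    """Build a config dict for DBase, taking the password from POSTGRES_PWD."""
    config = {"db_type": "postgres"}
    password = env.get("POSTGRES_PWD")
    if password is not None:
        config["password"] = password
    if port is not None:
        # Only set the port when a non-default one is requested.
        config["port"] = port
    return config


# A plain dict stands in for os.environ here so the example is reproducible.
config = postgres_config({"POSTGRES_PWD": "foo"}, port=5555)
```

The resulting dict can then be passed to DBase(config) as in the examples above.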


brunodb's Issues

Using brunodb for batch uploading (insert and update)

Hi there!

Upon investigating dbcrossbar I stumbled across this package. Both dbcrossbar and brunodb look very promising for my use case. However, as specific examples are missing (i.e. ones similar to what I want to do) and I have little experience with databases, I wanted to ask for advice on whether brunodb is suited for what I want to do.

I currently have scraped data (~50 000 records) which I scrape every month. Every run I get two types of records: 1) completely new records and 2) records which require a value to be updated. My current plan is to simply dump everything in a JSON file and use brunodb (or dbcrossbar) to upsert into a PostgreSQL table. Hopefully I can get reliable INSERT and UPDATE operations this way. If this doesn't work I can simply create an insert file and an update file, and again use brunodb for this operation.

Does this sound like a proper use case for this package? Or does dbcrossbar alone already suffice, or is dbcrossbar also not suited for this idea?

If it is, I'll dive a bit deeper and possibly contribute to the package if necessary.
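For reference, the insert-or-update behaviour described in this issue corresponds to the SQL `INSERT ... ON CONFLICT ... DO UPDATE` statement, which PostgreSQL supports and which SQLite (3.24 or later) accepts in the same form. The sketch below uses the standard library sqlite3 module as a stand-in. It is not the brunodb or dbcrossbar API, and the `items` table, its columns and the `upsert_records` helper are assumptions made for the example.

```python
import json
import sqlite3


def upsert_records(conn, records):
    """Insert new rows and update the value of rows whose id already exists."""
    conn.executemany(
        """
        INSERT INTO items (id, value) VALUES (:id, :value)
        ON CONFLICT (id) DO UPDATE SET value = excluded.value
        """,
        records,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")

# Rows from an earlier run.
upsert_records(conn, [{"id": 1, "value": "old"}, {"id": 2, "value": "keep"}])

# This month's dump: one changed record and one new record.
monthly = json.loads('[{"id": 1, "value": "new"}, {"id": 3, "value": "added"}]')
upsert_records(conn, monthly)

rows = conn.execute("SELECT id, value FROM items ORDER BY id").fetchall()
```

With a unique key on the table, a single statement covers both cases, so separate insert and update files would not be needed.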
