fagan2888 / antarctic

This project forked from tschm/antarctic

Persist Pandas objects within a MongoDB database

License: MIT License

Dockerfile 0.02% Makefile 0.01% Python 0.16% Jupyter Notebook 99.81%

Antarctic

Project to persist Pandas data structures in a MongoDB database.

Installation

pip install antarctic

Usage

Unlike the popular arctic project (which I admire), this project is built on top of MongoEngine, see https://pypi.org/project/mongoengine/. MongoEngine is an object-document mapper for MongoDB (the document-database counterpart of an ORM), and MongoDB stores documents. We introduce two new fields here: one for a Pandas Series and one for a Pandas DataFrame.

import pandas as pd
from mongoengine import Document, connect
from antarctic.PandasFields import SeriesField, FrameField

# Connect to your existing MongoDB instance (here we use mongomock, a popular library that mocks MongoDB)
client = connect(db="test", host="mongomock://localhost")

# Define the blueprint for a portfolio document
class Portfolio(Document):
    nav = SeriesField()
    weights = FrameField()
    prices = FrameField()

Portfolio objects work exactly the way you would expect:

p = Portfolio()
p.nav = pd.Series(...)
p.prices = pd.DataFrame(...)
p.save()

print(p.nav)
print(p.prices)

Behind the scenes we convert both Series and DataFrame objects into JSON documents and store them in a MongoDB database.

We don't apply any clever conversion into compressed bytestreams. Performance is not our main concern here.
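The JSON round trip described above can be sketched with plain pandas. This is a minimal illustration, not antarctic's actual implementation: the helper names (frame_to_json, frame_from_json, series_to_json, series_from_json) and the choice of the "split" orientation are assumptions made for this example.

```python
import json

import pandas as pd


def frame_to_json(frame: pd.DataFrame) -> str:
    """Serialize a DataFrame to JSON, keeping index, columns and data separate."""
    return frame.to_json(orient="split")


def frame_from_json(payload: str) -> pd.DataFrame:
    """Rebuild a DataFrame from the JSON produced by frame_to_json."""
    d = json.loads(payload)
    return pd.DataFrame(data=d["data"], index=d["index"], columns=d["columns"])


def series_to_json(series: pd.Series) -> str:
    """Serialize a Series to JSON, keeping name, index and data separate."""
    return series.to_json(orient="split")


def series_from_json(payload: str) -> pd.Series:
    """Rebuild a Series from the JSON produced by series_to_json."""
    d = json.loads(payload)
    return pd.Series(data=d["data"], index=d["index"], name=d.get("name"))


# Hypothetical sample data; string labels keep the round trip simple.
dates = ["2020-01-01", "2020-01-02"]
prices = pd.DataFrame({"A": [100.0, 101.5], "B": [50.5, 51.25]}, index=dates)
nav = pd.Series([1.0, 1.015], index=dates, name="nav")

# The JSON strings are what would end up inside the stored document.
restored_prices = frame_from_json(frame_to_json(prices))
restored_nav = series_from_json(series_to_json(nav))
```

A plain-text representation like this is easy to inspect in the database, at the cost of size and speed compared with a compressed binary format.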

Database?

Storing JSON or bytestream representations of Pandas objects does not make for a database in the usual sense. Appending is rather expensive: one has to extract the original Pandas object, append to it, and convert the new object back into a JSON or bytestream representation. Clever sharding can mitigate such effects, but at the end of the day you shouldn't update such objects too often. Practitioners often use a small database for recording recent data (e.g. over the last 24 hours) and update the MongoDB database once a day. Reading the Pandas objects back out of such a construction is extremely fast.
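The cost of appending, and the once-a-day update pattern, can be sketched as follows. This is an illustration under stated assumptions rather than part of antarctic: encode, decode and append_rows are hypothetical helpers, the history string stands in for the stored document, and a plain Python list stands in for the small recording database.

```python
import json

import pandas as pd


def encode(frame: pd.DataFrame) -> str:
    """Serialize the whole frame to a JSON string."""
    return frame.to_json(orient="split")


def decode(payload: str) -> pd.DataFrame:
    """Rebuild the whole frame from its JSON string."""
    d = json.loads(payload)
    return pd.DataFrame(data=d["data"], index=d["index"], columns=d["columns"])


def append_rows(payload: str, new_rows: pd.DataFrame) -> str:
    """Append rows: decode everything, concatenate, re-encode everything."""
    stored = decode(payload)
    return encode(pd.concat([stored, new_rows]))


# Long-term history, as it would sit in the stored document.
history = encode(
    pd.DataFrame({"price": [100.0, 101.5]}, index=["2020-01-01", "2020-01-02"])
)

# Recording buffer: appending to a list is cheap and touches nothing else.
buffer = []
buffer.append({"date": "2020-01-03", "price": 102.25})
buffer.append({"date": "2020-01-04", "price": 103.75})

# Periodic flush: pay the full decode/encode cost once, not once per record.
new_rows = pd.DataFrame(buffer).set_index("date")
new_rows.index.name = None
history = append_rows(history, new_rows)
buffer.clear()

result = decode(history)
```

Each call to append_rows rewrites the entire object, which is why batching updates matters more as the history grows.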

Also note that in theory one could try to build this on top of pyarrow and support both R and Python.

Contributors

tschm
