
mongodict's Introduction

mongodict - MongoDB-backed Python dict-like interface

So you are storing some key-value pairs in a dict, but your data has grown larger than your memory or you want to persist it on disk? Then mongodict is for you!

Because it uses MongoDB to store the data, you get all the cool MongoDB features, like sharding and replica sets. It uses the pickle module from the Python standard library to serialize/deserialize data and stores everything as bson.Binary in MongoDB. You can also provide another codec (serializer/deserializer).

mongodict is tested under Python 2.7.5 and Python 3.3.2.

Installation

As simple as:

pip install mongodict

Usage

Because it uses collections.MutableMapping as its base, you only need to change the line that creates your dict. For instance, just replace:

>>> my_dict = {}

with:

>>> from mongodict import MongoDict
>>> my_dict = MongoDict(host='localhost', port=27017, database='my_dict',
                        collection='store')

and then use it like a normal dict:

>>> my_dict['python'] = 'rules'
>>> print(my_dict['python'])
rules
>>> del my_dict['python']
>>> print(my_dict['python'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mongodict.py", line 82, in __getitem__
    raise KeyError(key)
KeyError: u'python'
>>> my_dict['spam'] = 'eggs'
>>> my_dict['ham'] = 'damn'
>>> for key, value in my_dict.items():
...    print('{} = {}'.format(key, value))
...
spam = eggs
ham = damn

If you want to use another codec, pass a (serialize, deserialize) pair of functions as the codec parameter when creating the instance. For example, to use JSON:

>>> import json
>>> json_dict = MongoDict(host='localhost', port=27017,
                          database='json_dict', collection='store',
                          codec=(json.dumps, json.loads))
>>> # use json_dict as usual
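
Any pair of functions that round-trips your values should work as a codec. As a minimal sketch, here is a compressed-pickle codec built only from the standard library. The function names are illustrative, and the assumption that MongoDict hands the deserializer something convertible with bytes() is not confirmed by this README:

```python
import pickle
import zlib


def serialize(obj):
    """Pickle the object, then compress the resulting bytes."""
    # Protocol 2 can be read by both Python 2.7 and Python 3.
    return zlib.compress(pickle.dumps(obj, protocol=2))


def deserialize(data):
    """Reverse of serialize: decompress, then unpickle."""
    return pickle.loads(zlib.decompress(bytes(data)))


# Hypothetical usage (needs a running MongoDB server):
# compressed_dict = MongoDict(host='localhost', port=27017,
#                             database='compressed_dict', collection='store',
#                             codec=(serialize, deserialize))
```

Compression trades CPU time for smaller documents, so it only pays off for large, repetitive values.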

Enjoy! :-)

Note

There is no in-memory cache of any kind, so every key lookup is translated into a MongoDB query. Since the MongoDB server keeps as much data as it can in memory, this will probably not be a problem, as long as your working set fits entirely in memory.
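
If the round trip to the server does become a bottleneck, a small read-through cache can sit in front of any mapping. The sketch below is not part of mongodict: CachedMapping and its max_size parameter are hypothetical names, it targets Python 3 only, and it assumes no other process writes to the same collection (otherwise cached values can go stale):

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class CachedMapping(MutableMapping):
    """Hypothetical read-through LRU cache in front of any mapping."""

    def __init__(self, backend, max_size=1024):
        self._backend = backend
        self._cache = OrderedDict()
        self._max_size = max_size

    def _remember(self, key, value):
        self._cache[key] = value
        self._cache.move_to_end(key)
        if len(self._cache) > self._max_size:
            self._cache.popitem(last=False)  # evict the least recently used key

    def __getitem__(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)
            return self._cache[key]
        value = self._backend[key]  # KeyError propagates; misses are not cached
        self._remember(key, value)
        return value

    def __setitem__(self, key, value):
        self._backend[key] = value  # write through to the backend first
        self._remember(key, value)

    def __delitem__(self, key):
        del self._backend[key]
        self._cache.pop(key, None)

    def __iter__(self):
        return iter(self._backend)

    def __len__(self):
        return len(self._backend)
```

Wrapping would look like `cached = CachedMapping(my_dict, max_size=10000)`, where my_dict is any dict-like object.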

Authentication

If the database MongoDict connects to requires MongoDB authentication, you just need to provide an auth parameter, as in this example:

from mongodict import MongoDict


my_dict = MongoDict(host='localhost', port=27017, database='mydb',
                    collection='mongodict',
                    auth=('my username', 'my password'))

Why not Redis?

Redis is a "remote dictionary server". It's a great piece of software and can do the job if all your data fits in memory. On the other hand, MongoDB already has mature sharding and replica set features. So, if you need to store lots of key-value pairs that don't fit in memory, mongodict can solve your problem.

Note

mongodict does not have the same API as other key-value software (like memcached). Some features needed to compete directly with this kind of software are missing (for now we only have the dict-like behaviour), but I plan to add them soon.

Contributing

You can run the tests either with or without tox.

Without tox

This is the simplest approach: you'll test only for one Python version. To do it, just execute:

mkvirtualenv --no-site-packages mongodict-without-tox
pip install -r requirements/develop.txt
make test

With tox

With tox you can test for more than one Python version (currently for 2.7 and 3.2). You just need to create a virtualenv, install and run it:

mkvirtualenv --no-site-packages tox-for-mongodict
pip install tox
tox

tox will create one virtualenv for each Python version, install the requirements and then run the tests in each of them. Note that you need the Python binaries (2.7 and 3.2) available on your system to run the tests.

Author

This software was written and is maintained by Álvaro Justen (aka Turicas). Please contact me at alvarojusten at gmail dot com.

License

It's licensed under GPL version 3.

mongodict's People

Contributors

flavioamieiro, jucacrispim, turicas

mongodict's Issues

Add authentication option

Although the default is to have no authentication, some MongoDB databases require it. We need to provide a way to specify a user and password.

Specify index type (key only or key + value)

I was using mongodict inside PyPLN and got this error:

Tue Nov 12 00:54:41.380 [conn3]  pypln_demo.analysis ERROR: key too large len:41575 max:1024 41575 pypln_demo.analysis.$_id_1_value_1

The value is too large to store it in a compound index, so there should be an option on MongoDict's instantiation to specify which kind of index I want (key-only or key + value).

Store objects using pickle

Currently, pymongo is responsible for serializing key and value objects. This is not good behaviour, since it encodes them as BSON, which (like JSON) is not fully compatible with Python objects. For example, if we store a tuple, the stored value will be an array and the deserialized value will be a list (not a tuple at all).
So, we need to serialize keys and values using pickle and then store the result using bson.Binary. We also need to deserialize using pickle, obviously.
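
The type loss described above can be reproduced with the standard library alone. In this small sketch JSON stands in for any format without a tuple type, and the bson.Binary wrapping step is left out:

```python
import json
import pickle

original = ('spam', 42)

# JSON has no tuple type, so the tuple comes back as a list.
via_json = json.loads(json.dumps(original))

# pickle records the Python type, so the tuple survives the round trip.
via_pickle = pickle.loads(pickle.dumps(original))

print(type(via_json).__name__)    # list
print(type(via_pickle).__name__)  # tuple
```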

Don't force utf-8 encoding

from mongodict import MongoDict
from bson import Binary
md = MongoDict()
pdf_chunk = '%PDF-1.4\n%\xc7\xec\x8f\xa2\n5 0 o'
md['blob'] = Binary(pdf_chunk)

UnicodeDecodeError: 'utf8' codec can't decode byte 0xc7 in position 10: invalid continuation byte

Use pickle for unknown objects

Currently it'll work only for objects pymongo can store (as key or as value). It should pickle other types of objects.

OperationFailure: .... failed: exception: distinct too big, 16mb cap

In [1]: from mongodict import MongoDict

In [2]: data_dict = MongoDict(host='XXX.XXX.XXX.XXX', port=27017, database='XXXXX', collection='XXXXXX',safe=True)

In [4]: for i in data_dict.keys():
   ...:     raw_input()
   ...:     print i
   ...:
---------------------------------------------------------------------------
OperationFailure                          Traceback (most recent call last)
 in ()
----> 1 for i in data_dict.keys():
      2     raw_input()
      3     print i
      4 

/usr/lib/python2.7/_abcoll.pyc in keys(self)
    364 
    365     def keys(self):
--> 366         return list(self)
    367 
    368     def items(self):

/home/xxxxx/.virtualenvs/xxxxxxx/lib/python2.7/site-packages/mongodict.pyc in __iter__(self)
     96     def __iter__(self):
     97         ''' Iterate over all stored keys '''
---> 98         for result in iter(self._collection.distinct('_id')):
     99             yield result
    100 

/home/xxxxx/.virtualenvs/xxxxx/lib/python2.7/site-packages/pymongo/collection.pyc in distinct(self, key)
   1076         .. versionadded:: 1.1.1
   1077         """
-> 1078         return self.find().distinct(key)
   1079 
   1080     def map_reduce(self, map, reduce, out, full_response=False, **kwargs):

/home/xxxxx/.virtualenvs/xxxxx/lib/python2.7/site-packages/pymongo/cursor.pyc in distinct(self, key)
    578                                 self.__collection.name,
    579                                 uuid_subtype = self.__uuid_subtype,
--> 580                                 **options)["values"]
    581 
    582     def explain(self):

/home/xxxxx/.virtualenvs/xxxxx/lib/python2.7/site-packages/pymongo/database.pyc in command(self, command, value, check, allowable_errors, uuid_subtype, **kwargs)
    391             msg = "command %s failed: %%s" % repr(command).replace("%", "%%")
    392             helpers._check_command_response(result, self.connection.disconnect,
--> 393                                             msg, allowable_errors)
    394 
    395         return result

/home/xxxxx/.virtualenvs/xxxxx/lib/python2.7/site-packages/pymongo/helpers.pyc in _check_command_response(response, reset, msg, allowable_errors)
    142                                (details["assertionCode"],))
    143                 raise OperationFailure(ex_msg, details.get("assertionCode"))
--> 144             raise OperationFailure(msg % details["errmsg"])
    145 
    146 

OperationFailure: command SON([('distinct', u'dict_processos'), ('key', '_id')]) failed: exception: distinct too big, 16mb cap
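
The distinct command returns all values in a single response document, which is why it runs into the 16 MB cap. One possible workaround, shown here only as a sketch and not necessarily the fix the project adopted, is to stream the keys through a cursor with a projection on _id. The helper name iter_keys is hypothetical:

```python
def iter_keys(collection):
    """Yield every _id in the collection, one document at a time.

    `collection` is expected to behave like a pymongo Collection.
    """
    # An empty filter matches all documents; the projection asks the
    # server to return only the _id field of each one.
    for document in collection.find({}, {'_id': 1}):
        yield document['_id']
```

Because a cursor fetches results in batches, no single response has to hold the whole key set.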

Provide a memcached-like API

Should provide a memcached-like API, with methods like:

  • MongoDict.set(key, value)
  • MongoDict.get(key, default=None)
  • MongoDict.delete(key)
  • MongoDict.incr(key, increment=1)
  • MongoDict.decr(key, decrement=1)
  • and so on...
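
As a rough illustration of the methods listed above, they can be layered on top of the existing dict-like interface. This is a sketch under assumptions, not the project's design: MemcachedLikeMixin and Store are hypothetical names, and a plain dict is used as the stand-in so the example runs without MongoDB:

```python
class MemcachedLikeMixin(object):
    """Hypothetical mixin adding memcached-style methods to a mutable mapping."""

    def set(self, key, value):
        self[key] = value

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def delete(self, key):
        """Remove key; return True if it existed, False otherwise."""
        try:
            del self[key]
        except KeyError:
            return False
        return True

    def incr(self, key, increment=1):
        # Read-modify-write: NOT atomic when several clients update the key.
        value = self[key] + increment
        self[key] = value
        return value

    def decr(self, key, decrement=1):
        return self.incr(key, -decrement)


class Store(MemcachedLikeMixin, dict):
    """Plain dict stand-in so the sketch runs without MongoDB."""


store = Store()
store.set('hits', 1)
print(store.incr('hits'))  # 2
```

Since values are stored pickled, incr and decr in this form have to read and rewrite the value on the client side, so concurrent writers could lose updates.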

Optimize queries

All queries (except for __del__) can be optimized to use indexOnly=true (so they only hit the index).
An example can be seen on this gist.

Run Python's dict tests

The tests for Python's dict implementation are written in Python and can be customized to run against another class instead of dict. So, to ensure that mongodict's API is compatible with dict's, it's better to use dict's tests.
The tests can be found in the Python source at ./Lib/test/mapping_tests.py.

Use the same connection or collection object

If we are using lots of MongoDicts it's better to reuse the same connection for all of them. MongoDict.__init__ should have an option to receive a connection object.
If you have a MongoDB connection in your code, maybe you want to pass only the collection object, so MongoDict.__init__ should accept it as a parameter too.

PyMongo 'Connection' API was changed in version 3

Currently, with the latest pymongo, the following exception is raised:

AttributeError                            Traceback (most recent call last)

/Users/prashantsinha/.virtualenvs/revisions/lib/python3.5/site-packages/mongodict.py in __init__(self, host, port, database, collection, codec, safe, auth, default, index_type)
     54         `auth` must be (login, password)'''
     55         super(MongoDict, self).__init__()
---> 56         self._connection = pymongo.Connection(host=host, port=port, safe=safe)
     57         self._safe = safe
     58         self._db = self._connection[database]

AttributeError: module 'pymongo' has no attribute 'Connection'

This is because PyMongo 3 removed Connection in favor of MongoClient.
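
One way to stay compatible with both old and new PyMongo releases is to pick whichever client class the installed module offers. The helper below is a hypothetical sketch, not the project's actual fix; it takes the module as an argument so it can be tried without PyMongo installed. Other constructor arguments, such as safe, may also need adapting and are not covered here:

```python
def pick_client_class(pymongo_module):
    """Return MongoClient when available, else the legacy Connection class."""
    client_class = getattr(pymongo_module, 'MongoClient', None)
    if client_class is None:
        # PyMongo releases that predate MongoClient only offer Connection.
        client_class = pymongo_module.Connection
    return client_class


# Hypothetical usage inside MongoDict.__init__ (requires pymongo installed):
# import pymongo
# self._connection = pick_client_class(pymongo)(host=host, port=port)
```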
