miserlou / nodb

NoDB isn't a database.. but it sort of looks like one.

Home Page: https://blog.zappa.io/posts/introducing-nodb-pythonic-data-store-s3

Python 99.10% Shell 0.90%

python s3 zappa serverless nodb nosql json aws lambda

nodb's Introduction

NoDB

NoDB isn't a database.. but it sort of looks like one!

NoDB is an incredibly simple, Pythonic object store based on Amazon's S3 static file storage.

It's useful for prototyping, casual hacking, and (maybe) even low-traffic server-less backends for Zappa apps!

Features

Schema-less!
Server-less!
Uses S3 as a datastore.
Loads to native Python objects with cPickle
Can use JSON as a serialization format for untrusted data
Local filestore based caching
Cheap(ish)!
Fast(ish)! (Especially from Lambda)

Performance

An initial load test with Goad of 10,000 requests (500 concurrent), each performing a write and a subsequent read of the same index, showed an average time of 400ms. This should be more than acceptable for many applications, even those whose data isn't sparse, although sparse data is preferred.

Installation

NoDB can be installed easily via pip, like so:

$ pip install nodb

Warning!

NoDB is insecure by default! Do not use it for untrusted data before setting serializer to "json"!
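
To illustrate why the default matters: unpickling can call any importable function chosen by whoever produced the bytes, while JSON decoding only ever yields plain data. The sketch below uses only the standard library and does not touch NoDB; the Payload class is a deliberately harmless stand-in for hostile input.

```python
import json
import pickle


class Payload(object):
    """Stand-in for attacker-controlled data; purely illustrative."""

    def __reduce__(self):
        # pickle will call eval("1 + 1") when the bytes are loaded.
        # A hostile payload could name any importable callable instead.
        return (eval, ("1 + 1",))


untrusted_bytes = pickle.dumps(Payload())

# Loading runs the embedded call and returns its result, not a Payload.
loaded = pickle.loads(untrusted_bytes)

# The JSON decoder rejects the same bytes instead of executing anything.
try:
    json.loads(untrusted_bytes.decode("latin-1"))
    json_rejected = False
except ValueError:
    json_rejected = True
```

Here loaded ends up holding the result of the embedded call rather than a Payload instance, which is exactly the behaviour that makes pickle unsuitable for data you did not write yourself.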

Usage

NoDB is super easy to use!

You simply make a NoDB object, point it to your bucket and tell it what field you want to index on.

from nodb import NoDB

nodb = NoDB("my-s3-bucket")
nodb.index = "name"

After that, you can save and load literally anything you want, whenever you want!

# Save an object!
user = {"name": "Jeff", "age": 19}
nodb.save(user) # True

# Load our object!
user = nodb.load("Jeff")
print(user['age']) # 19

# Delete our object
nodb.delete("Jeff") # True

By default, you can save and load any Python object.

Here's the same example, but with a class. Note the import and configuration is the same!

class User(object):
    name = None
    age = None
    
    def print_name(self):
        print("Hi, I'm " + self.name + "!")
    
    def __repr__(self):
        """ show a human readable representation of this class """
        return "<%s: %s (%s)>" % (self.__class__.__name__, self.name, self.age)

new_user = User()
new_user.name = "Jeff"
new_user.age = 19
nodb.save(new_user) 
# True

jeff = nodb.load("Jeff")
jeff.print_name() 
# Hi, I'm Jeff!

You can return a list of all objects using the .all() method.

Here's an example following from the code above, adding some extra users to the database and then listing all.

newer_user = User()
newer_user.name = "Ben"
newer_user.age = 38
nodb.save(newer_user)
# True

newest_user = User()
newest_user.name = "Thea"
newest_user.age = 33
nodb.save(newest_user)
# True

nodb.all()
# [<User: Jeff (19)>, <User: Ben (38)>, <User: Thea (33)>]

Advanced Usage

Different Serializers

To use a safer, non-Pickle serializer, just set JSON as your serializer:

nodb = NoDB()
nodb.serializer = "json"

Note that for this to work, your object must be JSON-serializable.
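
For instance, an instance of a custom class is not JSON-serializable as-is, but its attribute dictionary usually is. A minimal sketch using only the standard library; it assumes you are happy to store and load plain dicts, and says nothing about how NoDB's JSON serializer treats class instances.

```python
import json


class User(object):
    name = None
    age = None


user = User()
user.name = "Jeff"
user.age = 19

# A class instance is rejected by the JSON encoder.
try:
    json.dumps(user)
    serializable = True
except TypeError:
    serializable = False

# vars() returns the instance attributes as a plain dict, which encodes fine.
record = vars(user)
encoded = json.dumps(record, sort_keys=True)
```

A dict like record can then be saved the same way as the dict in the basic usage example.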

Object Metadata

You can get metainfo (datetime and UUID) for a given object by passing metainfo=True to load, like so:

# Load our object and metainfo!
user, datetime, uuid = nodb.load("Jeff", metainfo=True)

You can also pass in a default argument for non-existent values.

user = nodb.load("Jeff", default={}) # {}

Human Readable Indexes

By default, the indexes are hashed. If you want to be able to debug through the AWS console, set human_readable_indexes to True:

nodb.human_readable_indexes = True
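
To see why hashed keys are hard to read in the console, here is a sketch of how an index value might map to an object key. The choice of SHA-256 and the helper name format_key are assumptions made for illustration; NoDB's actual scheme may differ.

```python
import hashlib


def format_key(index_value, human_readable=False):
    """Hypothetical helper: map an index value to a storage key."""
    if human_readable:
        # Readable keys are easy to spot when browsing the bucket.
        return index_value
    # Hashed keys have a uniform length and tolerate any characters,
    # but are opaque to a human reading the bucket listing.
    return hashlib.sha256(index_value.encode("utf-8")).hexdigest()


hashed = format_key("Jeff")
readable = format_key("Jeff", human_readable=True)
```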

Caching

You can enable local file caching, which stores previously retrieved values in a local filestore so they can be read from there rather than from the remote one.

nodb.cache = True

AWS settings override

You can override your AWS Profile information or boto3 session by passing either as an initial keyword argument.

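import boto3
from nodb import NoDB

# Use a named profile from your AWS configuration...
nodb = NoDB(profile_name='my_aws_development_profile')

# ...or supply a boto3 session. ACCESS_KEY, SECRET_KEY and SESSION_TOKEN
# are placeholders for your own credentials.
session = boto3.Session(
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN,
)
nodb = NoDB(session=session)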

TODO (Maybe?)

Tests with Placebo
Local file storage
Querying ranges (numeric IDs only), etc.
Different serializers
Custom serializers
Multiple indexes
Compression
Bucket management
Pseudo-locking
Performance/load testing

Related Projects

Zappa - Python's server-less framework!
K.E.V. - a Python ORM for key-value stores based on Redis, S3, and an S3/Redis hybrid backend.
s3sqlite - An S3-backed database engine for Django

Contributing

This project is still young, so there is still plenty to be done. Contributions are more than welcome!

Please file tickets for discussion before submitting patches. Pull requests should target master and should leave NoDB in a "shippable" state if merged.

If you are adding a non-trivial amount of new code, please include a functioning test in your PR. For AWS calls, we use the placebo library, which you can learn to use in their README. The test suite will be run by Travis CI once you open a pull request.

Please include the GitHub issue or pull request URL that has discussion related to your changes as a comment in the code (example). This greatly helps for project maintainability, as it allows us to trace back use cases and explain decision making.

License

nodb's Issues

Compatibility warnings

Great little project, thanks for this.

FYI a few warnings come up when installing, see below.

Python: 3.6.6
Pip: 10.0.1

nodb 0.3.2 has requirement botocore==1.5.38, but you'll have botocore 1.10.65 which is incompatible.
nodb 0.3.2 has requirement jmespath==0.9.2, but you'll have jmespath 0.9.3 which is incompatible.
nodb 0.3.2 has requirement python-dateutil==2.6.0, but you'll have python-dateutil 2.6.1 which is incompatible.
nodb 0.3.2 has requirement six==1.10.0, but you'll have six 1.11.0 which is incompatible.
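
These warnings come from exact == pins. One possible fix, sketched here as a hypothetical requirements list rather than the project's actual setup.py, is to turn each pin into a lower bound so that newer releases satisfy it. The version numbers are the ones reported in the warnings; the names INSTALL_REQUIRES and exact_pins are made up for the example.

```python
# Hypothetical excerpt: lower bounds instead of exact pins.
INSTALL_REQUIRES = [
    "botocore>=1.5.38",
    "jmespath>=0.9.2",
    "python-dateutil>=2.6.0",
    "six>=1.10.0",
]


def exact_pins(requirements):
    """Return the requirement strings that still pin an exact version."""
    return [req for req in requirements if "==" in req]


# Empty when every requirement is expressed as a range.
remaining = exact_pins(INSTALL_REQUIRES)
```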

README.md is out of date with master branch

Changes in the code from PR #12 have broken the documentation.

.all() is not listed in the document at all
setting the bucket is done via an attribute and not at init, however, I'm considering this a different bug.

Delete doesn't invalidate the cache

If you instantiate NoDB and set cache to True, then you can save an object and delete the object.

If you load the object at this point, the deleted object will be returned since it's still in the cache. Delete is not cache aware; it deletes from the S3 bucket, but doesn't update the cache.

nodb=NoDB()
nodb.cache=True
nodb.index = 'name'
nodb.save({'name': 'jeff', 'age': 19})
nodb.delete('jeff')
nodb.load('jeff') #=> {'name': 'jeff', 'age': 19}

That's probably not the expected behavior.
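
A sketch of what cache-aware deletion could look like. This is not NoDB's code: CachedStore, its dict-backed remote, and the file layout are stand-ins chosen so the example runs without S3.

```python
import os
import pickle
import tempfile


class CachedStore(object):
    """Hypothetical stand-in: a remote dict plus a local file cache."""

    def __init__(self, cache_dir):
        self.remote = {}  # plays the role of the S3 bucket
        self.cache_dir = cache_dir

    def _cache_path(self, key):
        return os.path.join(self.cache_dir, key)

    def save(self, key, obj):
        self.remote[key] = pickle.dumps(obj)

    def load(self, key, default=None):
        path = self._cache_path(key)
        if os.path.exists(path):
            # Cache hit: serve the local copy.
            with open(path, "rb") as f:
                return pickle.loads(f.read())
        if key not in self.remote:
            return default
        data = self.remote[key]
        # Cache miss: remember the remote value locally.
        with open(path, "wb") as f:
            f.write(data)
        return pickle.loads(data)

    def delete(self, key):
        self.remote.pop(key, None)
        # The step the report says is missing: drop the cached copy too.
        path = self._cache_path(key)
        if os.path.exists(path):
            os.remove(path)


store = CachedStore(tempfile.mkdtemp())
store.save("jeff", {"name": "jeff", "age": 19})
first = store.load("jeff")         # populates the cache
store.delete("jeff")
after_delete = store.load("jeff")  # falls through to the default
```

Without the removal inside delete(), the final load would find the cached file and return the stale dict, which is the behaviour described in this report.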

Outdated Requirements

The requirements required here are outdated and caused issues with the xray stuff in zappa. Miserlou/Zappa#1182

Deletion of Objects

Can't believe I forgot about this until literally right now.

setting the bucket via an attribute fails

since PR #12 was merged, the documented practice for setting up a bucket fails

previously

db = NoDB()
db.bucket = 'mybucket'

now

db = NoDB('mybucket')

whilst the new method makes a lot of sense, we shouldn't break all the existing code.
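
One way to accept the new call style without breaking the old one is to make the constructor argument optional and keep bucket as a plain attribute. The class below is a hypothetical sketch, not NoDB's implementation.

```python
class CompatNoDB(object):
    """Hypothetical sketch of a constructor that supports both styles."""

    def __init__(self, bucket=None):
        # New style: CompatNoDB("mybucket")
        # Old style: db = CompatNoDB(); db.bucket = "mybucket"
        self.bucket = bucket

    def _require_bucket(self):
        # Check at use time, so old-style code can set the bucket late.
        if not self.bucket:
            raise ValueError("No bucket set; pass one or set .bucket")
        return self.bucket


old_style = CompatNoDB()
old_style.bucket = "mybucket"

new_style = CompatNoDB("mybucket")
```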

Error using NoDB on python3

When I tried the basic example I got the following error:

user = {"name": "Jeff", "age": 19}
nodb.save(user) # True

AttributeError: 'dict' object has no attribute 'has_key'
This seems like a Python 3 compatibility issue.
Can anyone help?
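
For context, dict.has_key() was removed in Python 3, while the in operator works on both Python 2 and 3. A minimal sketch of an index lookup written that way follows; the function name and the attribute fallback are assumptions for illustration, not NoDB's actual code.

```python
def get_object_index(obj, index):
    """Hypothetical helper: read the index field from a dict or object."""
    if isinstance(obj, dict):
        # `in` replaces obj.has_key(index) and works on Python 2 and 3.
        if index in obj:
            return obj[index]
    elif hasattr(obj, index):
        return getattr(obj, index)
    raise KeyError("Object has no index field %r" % index)


user = {"name": "Jeff", "age": 19}
index_value = get_object_index(user, "name")
```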

.all() not working

I'm using the latest version, installed with pip install git+https://github.com/Miserlou/NoDB.git

pip list | ag nodb
nodb                          0.4.0

After running the example, I tried to show all the stored values using nodb.all():

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-3-9ca61584a6c9> in <module>
----> 1 nodb.all()

~/miniconda3/lib/python3.7/site-packages/nodb/__init__.py in all(self, metainfo)
    188             serialized = obj.get()["Body"].read()
    189             # deserialize and add to list
--> 190             deserialized_objects.append(self._deserialize(serialized))
    191 
    192         # sort by insert datetime

~/miniconda3/lib/python3.7/site-packages/nodb/__init__.py in _deserialize(self, serialized)
    236 
    237         obj = None
--> 238         deserialized = json.loads(serialized)
    239         return_me = {}
    240 

~/miniconda3/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346             parse_int is None and parse_float is None and
    347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
    349     if cls is None:
    350         cls = JSONDecoder

~/miniconda3/lib/python3.7/json/decoder.py in decode(self, s, _w)
    338         end = _w(s, end).end()
    339         if end != len(s):
--> 340             raise JSONDecodeError("Extra data", s, end)
    341         return obj
    342 

JSONDecodeError: Extra data: line 1 column 3 (char 2)

Publish latest versions to PyPI

The newer 0.3.3 release doesn't appear to have been published to PyPI. Can you publish it so that we don't have to use a GitHub link in editable mode when installing with pipenv (relevant docs)?

Test error

I found an error while running the tests.

Traceback (most recent call last):
  File "/Users/kwangin/workspace/github_source/NoDB/tests/tests.py", line 56, in test_nodb
    nodb._deserialize(serialized)
  File "/Users/kwangin/workspace/github_source/NoDB/nodb/__init__.py", line 170, in _deserialize
    return_me['obj'] = pickle.loads(base64.b64decode(des_obj))
  File "/Users/kwangin/.pyenv/versions/3.5.1/lib/python3.5/base64.py", line 90, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

It looks like a known issue, since the decoding line carries a "#python 3" comment.
Is that right?
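
The traceback shows the stored object being base64-decoded and then unpickled out of a JSON envelope. Below is a sketch of a round trip that behaves the same on Python 2 and 3. The function names and the envelope fields other than obj are assumptions. The point to note is that base64.b64encode returns bytes on Python 3, so the value has to be decoded to text before it goes into JSON, which is one place where Python 2 and 3 code paths tend to diverge.

```python
import base64
import json
import pickle
import uuid
from datetime import datetime


def serialize(obj):
    """Hypothetical sketch: wrap a pickled object in a JSON envelope."""
    packed = base64.b64encode(pickle.dumps(obj, protocol=2))
    envelope = {
        # b64encode returns bytes on Python 3; JSON needs text.
        "obj": packed.decode("ascii"),
        "dt": datetime.now().isoformat(),  # assumed field name
        "uuid": str(uuid.uuid4()),         # assumed field name
    }
    return json.dumps(envelope)


def deserialize(serialized):
    """Reverse of serialize(): returns only the stored object."""
    envelope = json.loads(serialized)
    return pickle.loads(base64.b64decode(envelope["obj"]))


original = {"name": "Jeff", "age": 19}
restored = deserialize(serialize(original))
```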

S3 Select support

Hi, do you plan to support S3 Select in the future?

Ranged Queries

zappa.io domain down

The website link for this repo is down due to zappa.io being down: https://blog.zappa.io/posts/introducing-nodb-pythonic-data-store-s3

there is no How To connect with amazon credentials

There is no documentation about how to pass Amazon credentials.

My first attempt failed with: "The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256".

What next?

Python3 Support

This should be a pretty easy fix - I think. The only constraint is that both Python2 and Python3 versions need to be able to read the same data!
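
One consequence for the pickle serializer: Python 2 cannot read pickle protocols above 2, so data written from Python 3 would need to pin the protocol. A minimal sketch, independent of NoDB:

```python
import pickle

record = {"name": "Jeff", "age": 19}

# Protocol 2 is the highest protocol Python 2.7 can load; Python 3
# defaults to a newer one, which Python 2 would reject.
portable = pickle.dumps(record, protocol=2)

# Pickles from protocol 2 onward begin with the PROTO opcode and version.
header = portable[:2]

restored = pickle.loads(portable)
```

Text types still differ between the two interpreters (a Python 3 str arrives as unicode on Python 2), so pinning the protocol is necessary but may not be sufficient.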

Authorization problem for fresh AWS regions

us-default supports both v2 and v4 auth, but Frankfurt for example doesn't support v2 (used by default).
So instead of:

s3 = boto3.resource('s3')

it should be:

import boto3
from botocore.client import Config

s3 = boto3.resource('s3', config=Config(signature_version='s3v4'))

(or please document some global way to do it).

bug

Python 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> from nodb import NoDB
>>> nodb = NoDB("my-s3-bucket")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: NoDB() takes no arguments
>>> nodb = NoDB('my-s3-bucket')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: NoDB() takes no arguments
>>> nodb = NoDB()
>>> nodb.index = "name"
>>> user = {"name": "Jeff", "age": 19}
>>> nodb.save(user)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\nodb\__init__.py", line 63, in save
    real_index = self._get_object_index(obj, self.index)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\nodb\__init__.py", line 241, in _get_object_index
    if obj.has_key(index):
AttributeError: 'dict' object has no attribute 'has_key'

.all() not working

#31 is still an issue.

I've installed 0.5.1 from tag 0.5.1:
pipenv install git+https://github.com/Miserlou/NoDB.git@0.5.1#egg=nodb
and tried locally from the 0.5.1 tarball also

(btw, the version in setup.py still shows 0.4.0)

nodb.all() returns the error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dschofield/clients/mdg/NoDB-0.5.1/nodb/__init__.py", line 202, in all
    deserialized_objects.append(self._deserialize(serialized))
  File "/home/dschofield/clients/mdg/NoDB-0.5.1/nodb/__init__.py", line 252, in _deserialize
    deserialized = json.loads(serialized)
  File "/usr/local/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
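
This error is raised by json.loads on the raw object body, which would happen if some object in the bucket is not a JSON envelope written by NoDB. That cause is an assumption; the report does not confirm it. A defensive listing loop could skip such bodies instead of failing. The sketch below uses plain byte strings in place of S3 objects, and the function name is hypothetical.

```python
import json


def decode_all(bodies):
    """Hypothetical sketch: decode JSON bodies, skipping invalid ones."""
    decoded, skipped = [], 0
    for body in bodies:
        try:
            decoded.append(json.loads(body.decode("utf-8")))
        except ValueError:
            # JSONDecodeError and UnicodeDecodeError both subclass
            # ValueError, so foreign objects end up here.
            skipped += 1
    return decoded, skipped


bodies = [
    b'{"obj": "abc"}',  # parses as a JSON object
    b"",                # empty body, fails with "Expecting value"
    b"{}x",             # trailing bytes, fails with "Extra data"
]
decoded, skipped = decode_all(bodies)
```

Skipping silently hides data problems, so a real implementation would probably want to log or report the skipped keys.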

Use generic cloud storage API

While this is obviously intended for use with AWS, we will probably want to support other providers in the future as they implement serverless frameworks. Using a library like OFS will allow this to support local file systems, as well as provide a mechanism for supporting other providers going forward.
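
A sketch of what a provider-neutral storage layer could look like. The interface and class names are hypothetical and are not the OFS API; the local-filesystem backend is included only so the example runs without cloud credentials.

```python
import os
import tempfile


class StorageBackend(object):
    """Hypothetical minimal interface a NoDB-like store would need."""

    def put(self, key, data):
        raise NotImplementedError

    def get(self, key):
        raise NotImplementedError

    def delete(self, key):
        raise NotImplementedError


class LocalBackend(StorageBackend):
    """Stores each key as a file under a root directory."""

    def __init__(self, root):
        self.root = root

    def _path(self, key):
        return os.path.join(self.root, key)

    def put(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()

    def delete(self, key):
        os.remove(self._path(key))


backend = LocalBackend(tempfile.mkdtemp())
backend.put("jeff", b'{"name": "Jeff"}')
stored = backend.get("jeff")
backend.delete("jeff")
still_there = os.path.exists(os.path.join(backend.root, "jeff"))
```

An S3 backend would implement the same three methods on top of boto3, leaving the serialization and indexing code unaware of which provider is in use.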

Potential dependency conflicts between NoDB and botocore

Hi, as shown in the following full dependency graph of NoDB, NoDB requires botocore (the latest version), while the installed version of s3transfer(0.2.1) requires botocore>=1.12.36,<2.0.0.

According to Pip's “first found wins” installation strategy, botocore 1.12.199 is the actually installed version.

Although the first found package version botocore 1.12.199 just satisfies the later dependency constraint (botocore>=1.12.36,<2.0.0), it will lead to a build failure once developers release a newer version of botocore in the near future, which is greater than 2.0.0.

Dependency tree--------

NoDB(version range:)
| +-appdirs(version range:>=1.4.3)
| +-boto3(version range:>=1.4.4)
| +-botocore(version range:>=1.5.38)
| +-docutils(version range:>=0.13.1)
| +-funcsigs(version range:>=1.0.2)
| +-futures(version range:>=3.0.5)
| +-jmespath(version range:>=0.9.2)
| +-packaging(version range:>=16.8)
| +-pbr(version range:>=2.0.0)
| +-pyparsing(version range:>=2.2.0)
| +-python-dateutil(version range:==2.6.0)
| +-s3transfer(version range:>=0.1.10)
| | +-botocore(version range:>=1.12.36,<2.0.0)
| +-six(version range:>=1.10.0)

Thanks for your attention.
Best,
Neolith