
miserlou / nodb


NoDB isn't a database... but it sort of looks like one.

Home Page: https://blog.zappa.io/posts/introducing-nodb-pythonic-data-store-s3

Python 99.10% Shell 0.90%
python s3 zappa serverless nodb nosql json aws lambda

nodb's Introduction

NoDB

NoDB isn't a database... but it sort of looks like one!

NoDB is an incredibly simple, Pythonic object store based on Amazon's S3 static file storage.

It's useful for prototyping, casual hacking, and (maybe) even low-traffic server-less backends for Zappa apps!

Features

  • Schema-less!
  • Server-less!
  • Uses S3 as a datastore.
  • Loads into native Python objects with cPickle
  • Can use JSON as a serialization format for untrusted data
  • Local filestore-based caching
  • Cheap(ish)!
  • Fast(ish)! (Especially from Lambda)

Performance

An initial load test with Goad, sending 10,000 requests (500 concurrent) consisting of a write followed by a read of the same index, showed an average response time of 400 ms. This should be more than acceptable for many applications, even those without sparse data, although sparse data is preferred.

Installation

NoDB can be installed easily via pip, like so:

$ pip install nodb

Warning!

NoDB is insecure by default! Do not use it with untrusted data until you have set the serializer to "json"!
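To see why, here is a harmless sketch of what pickle allows: a payload can name any callable to run while it is being loaded. The `Surprise` class is purely illustrative; it makes `pickle.loads` evaluate an arithmetic expression, and a malicious payload could run arbitrary code in the same way. `json.loads`, by contrast, only ever builds plain dicts, lists, strings, numbers, booleans and `None`.

```python
import json
import pickle


class Surprise(object):
    """Illustrative payload: unpickling it calls eval() instead of rebuilding Surprise."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')" as the recipe for recreating this object
        return (eval, ("6 * 7",))


payload = pickle.dumps(Surprise())
result = pickle.loads(payload)  # eval() runs here, during deserialization
print(result)  # 42

# json.loads can only build plain data structures; it never calls functions
safe = json.loads('{"name": "Jeff", "age": 19}')
print(safe)  # {'name': 'Jeff', 'age': 19}
```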

Usage

NoDB is super easy to use!

You simply make a NoDB object, point it to your bucket and tell it what field you want to index on.

from nodb import NoDB

nodb = NoDB("my-s3-bucket")
nodb.index = "name"

After that, you can save and load literally anything you want, whenever you want!

# Save an object!
user = {"name": "Jeff", "age": 19}
nodb.save(user) # True

# Load our object!
user = nodb.load("Jeff")
print(user['age']) # 19

# Delete our object
nodb.delete("Jeff") # True
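If you'd like to try this flow before creating a bucket, the hypothetical `InMemoryStore` below mimics the `save`, `load` and `delete` calls above with a plain dict. It is a prototyping aid sketched for this README, not part of NoDB, and it skips NoDB's serialization, hashing and S3 access entirely.

```python
class InMemoryStore(object):
    """Hypothetical local stand-in for NoDB's save/load/delete, backed by a dict.

    Unlike NoDB, it stores references (no serialization) and never touches S3.
    """

    def __init__(self, index):
        self.index = index  # name of the field to use as the key
        self._data = {}

    def _key(self, obj):
        # Dicts are indexed by item, other objects by attribute
        if isinstance(obj, dict):
            return obj[self.index]
        return getattr(obj, self.index)

    def save(self, obj):
        self._data[self._key(obj)] = obj
        return True

    def load(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)  # deleting a missing key is a no-op here
        return True


store = InMemoryStore(index="name")
store.save({"name": "Jeff", "age": 19})  # True
print(store.load("Jeff")["age"])         # 19
store.delete("Jeff")                     # True
print(store.load("Jeff", default={}))    # {}
```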

By default, you can save and load any Python object.

Here's the same example, but with a class. Note that the import and configuration are the same!

class User(object):
    name = None
    age = None
    
    def print_name(self):
        print("Hi, I'm " + self.name + "!")
    
    def __repr__(self):
        """ show a human readable representation of this class """
        return "<%s: %s (%s)>" % (self.__class__.__name__, self.name, self.age)

new_user = User()
new_user.name = "Jeff"
new_user.age = 19
nodb.save(new_user) 
# True

jeff = nodb.load("Jeff")
jeff.print_name() 
# Hi, I'm Jeff!

You can get a list of all stored objects with the .all() method.

Here's an example, following on from the code above, that adds some extra users to the database and then lists them all.

newer_user = User()
newer_user.name = "Ben"
newer_user.age = 38
nodb.save(newer_user)
# True

newest_user = User()
newest_user.name = "Thea"
newest_user.age = 33
nodb.save(newest_user)
# True

nodb.all()
# [<User: Jeff (19)>, <User: Ben (38)>, <User: Thea (33)>]

Advanced Usage

Different Serializers

To use a safer, non-Pickle serializer, just set JSON as your serializer:

nodb = NoDB("my-s3-bucket")
nodb.serializer = "json"

Note that for this to work, your object must be JSON-serializable.
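For example, an instance of a plain class like the `User` class in the usage section can't be passed to `json.dumps` directly. One simple workaround, shown here as an illustration rather than something NoDB does for you, is to save a dict built from the instance's attributes:

```python
import json


class User(object):
    name = None
    age = None


new_user = User()
new_user.name = "Jeff"
new_user.age = 19

try:
    json.dumps(new_user)  # a plain class instance is not JSON-serializable
except TypeError as err:
    print("Can't serialize:", err)

# vars() returns the instance's attribute dict, which JSON can handle
user_dict = vars(new_user)
print(json.dumps(user_dict))  # {"name": "Jeff", "age": 19}
```

You would then save `user_dict` with the JSON serializer and rebuild a `User` yourself after loading it.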

Object Metadata

You can get metainfo (datetime and UUID) for a given object by passing metainfo=True to load, like so:

# Load our object and metainfo!
user, datetime, uuid = nodb.load("Jeff", metainfo=True)

You can also pass a default value to return when the requested object doesn't exist:

user = nodb.load("Nobody", default={}) # {}

Human Readable Indexes

By default, the indexes are hashed. If you want to be able to debug through the AWS console, set human_readable_indexes to True:

nodb.human_readable_indexes = True
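The README doesn't say which hash function NoDB uses, so the sketch below assumes SHA-256 just to illustrate the trade-off: hashed keys are uniform whatever the index value looks like, but you can't tell from the AWS console which object a key belongs to. `object_key` is a hypothetical helper, not a NoDB function.

```python
import hashlib


def object_key(index_value, human_readable=False):
    """Hypothetical helper: turn an index value into an S3 object key.

    Assumption: SHA-256 is used for illustration; NoDB's actual scheme may differ.
    """
    text = str(index_value)
    if human_readable:
        return text  # easy to spot in the AWS console
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(object_key("Jeff"))                       # 64 hex characters
print(object_key("Jeff", human_readable=True))  # Jeff
```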

Caching

You can enable local file caching, which stores previously retrieved values in the local filestore instead of fetching them again from the remote one.

nodb.cache = True
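Conceptually, a read-through file cache checks local disk first and only goes to the remote store on a miss. The hypothetical `FileCache` below sketches that pattern with a temporary directory, and a `fetch` callback stands in for the S3 read; it is not NoDB's actual caching code.

```python
import os
import pickle
import tempfile


class FileCache(object):
    """Hypothetical read-through cache that keeps one file per key in a local directory."""

    def __init__(self, directory=None):
        self.directory = directory or tempfile.mkdtemp(prefix="nodb-cache-")

    def _path(self, key):
        # Assumes keys are already safe to use as filenames (e.g. hashed index values)
        return os.path.join(self.directory, key)

    def get(self, key, fetch):
        path = self._path(key)
        if os.path.exists(path):
            # Cache hit: read the local copy, no remote call
            with open(path, "rb") as f:
                return pickle.load(f)
        # Cache miss: fetch from the remote store, then write the local copy
        value = fetch(key)
        with open(path, "wb") as f:
            pickle.dump(value, f)
        return value


calls = []


def fetch_from_s3(key):
    """Stand-in for a real S3 read; records each remote call."""
    calls.append(key)
    return {"name": key, "age": 19}


cache = FileCache()
first = cache.get("Jeff", fetch_from_s3)   # miss: calls fetch_from_s3
second = cache.get("Jeff", fetch_from_s3)  # hit: served from the local file
print(len(calls))  # 1
```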

AWS settings override

You can override your AWS profile or boto3 session by passing either one as a keyword argument when you create the NoDB object.

import boto3
from nodb import NoDB

nodb = NoDB(profile_name='my_aws_development_profile')
# or supply the session
# (ACCESS_KEY, SECRET_KEY and SESSION_TOKEN are placeholders for your own credentials)
session = boto3.Session(
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN,
)
nodb = NoDB(session=session)

TODO (Maybe?)

  • Tests with Placebo
  • Local file storage
  • Querying ranges (numeric IDs only), etc.
  • Different serializers
  • Custom serializers
  • Multiple indexes
  • Compression
  • Bucket management
  • Pseudo-locking
  • Performance/load testing

Related Projects

  • Zappa - Python's server-less framework!
  • K.E.V. - a Python ORM for key-value stores based on Redis, S3, and a S3/Redis hybrid backend.
  • s3sqlite - An S3-backed database engine for Django

Contributing

This project is still young, so there is still plenty to be done. Contributions are more than welcome!

Please file tickets for discussion before submitting patches. Pull requests should target master and should leave NoDB in a "shippable" state if merged.

If you are adding a non-trivial amount of new code, please include a functioning test in your PR. For AWS calls, we use the placebo library, which you can learn to use in their README. The test suite will be run by Travis CI once you open a pull request.

Please include the GitHub issue or pull request URL that has discussion related to your changes as a comment in the code (example). This greatly helps for project maintainability, as it allows us to trace back use cases and explain decision making.

License

(C) Rich Jones 2017, MIT License.


Made by Gun.io

nodb's People

Contributors

bendog, bojakeca, nephlm, szilagyiabo

nodb's Issues

Compatibility warnings

Great little project, thanks for this.

FYI a few warnings come up when installing, see below.

Python: 3.6.6
Pip: 10.0.1

nodb 0.3.2 has requirement botocore==1.5.38, but you'll have botocore 1.10.65 which is incompatible.
nodb 0.3.2 has requirement jmespath==0.9.2, but you'll have jmespath 0.9.3 which is incompatible.
nodb 0.3.2 has requirement python-dateutil==2.6.0, but you'll have python-dateutil 2.6.1 which is incompatible.
nodb 0.3.2 has requirement six==1.10.0, but you'll have six 1.11.0 which is incompatible.

README.md is out of date with master branch

Changes in the code from PR #12 have broken the documentation.

  • .all() is not listed in the document at all
  • setting the bucket is done via an attribute and not at init; however, I'm considering this a different bug.

Delete doesn't invalidate the cache

If you instantiate NoDB and set cache to True, you can save an object and then delete it.

If you load the object at this point, the deleted object is returned, since it's still in the cache. Delete is not cache-aware: it deletes the object from the S3 bucket but doesn't update the cache.

nodb=NoDB()
nodb.cache=True
nodb.index = 'name'
nodb.save({'name': 'jeff', 'age': 19})
nodb.delete('jeff')
nodb.load('jeff') #=> {'name': 'jeff', 'age': 19}

That's probably not the expected behavior.
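One way to make delete cache-aware is to drop the local copy whenever the remote object is deleted. In the hypothetical `CachedStore` sketch below, two dicts stand in for the S3 bucket and the local cache; it doesn't mirror NoDB's internals, but it shows where the missing invalidation step would go.

```python
class CachedStore(object):
    """Hypothetical store with a read-through cache; dicts stand in for S3 and the local cache."""

    def __init__(self, index):
        self.index = index
        self.remote = {}  # stands in for the S3 bucket
        self.cache = {}   # stands in for the local file cache

    def save(self, obj):
        key = obj[self.index]
        self.remote[key] = dict(obj)
        self.cache[key] = dict(obj)  # keep the cache in step with the write
        return True

    def load(self, key, default=None):
        if key in self.cache:
            return self.cache[key]
        if key in self.remote:
            self.cache[key] = self.remote[key]
            return self.remote[key]
        return default

    def delete(self, key):
        self.remote.pop(key, None)
        self.cache.pop(key, None)  # invalidate the cached copy too
        return True


store = CachedStore(index="name")
store.save({"name": "jeff", "age": 19})
store.delete("jeff")
print(store.load("jeff"))  # None
```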

Setting the bucket via an attribute fails

Since PR #12 was merged, the documented way of setting up a bucket fails.

previously

db = NoDB()
db.bucket = 'mybucket'

now

db = NoDB('mybucket')

Whilst the new method makes a lot of sense, we shouldn't break all the existing code.
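One common way to support both styles is to make the bucket an optional constructor argument while keeping it assignable as an attribute. The `Store` class below is a hypothetical sketch of that pattern, not NoDB's actual constructor:

```python
class Store(object):
    """Hypothetical sketch: the bucket can be passed at init or assigned later."""

    def __init__(self, bucket=None):
        self.bucket = bucket  # stays None until the caller provides one

    def get_bucket(self):
        # Fail with a clear message instead of an obscure S3 error later on
        if not self.bucket:
            raise ValueError("No bucket set: pass one to Store() or assign .bucket")
        return self.bucket


old_style = Store()
old_style.bucket = "mybucket"  # previous attribute-based setup still works

new_style = Store("mybucket")  # new constructor-based setup
```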

Error using NoDB on python3

When I tried the basic example I got the following error:

user = {"name": "Jeff", "age": 19}
nodb.save(user) # True

AttributeError: 'dict' object has no attribute 'has_key'

This seems like a Python 3 compatibility issue. Can anyone help?
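`dict.has_key()` no longer exists in Python 3, while the `in` operator works on both Python 2 and 3. A version-neutral lookup could look like the sketch below; `get_object_index` is a hypothetical helper, not the project's actual fix.

```python
def get_object_index(obj, index):
    """Hypothetical helper: read the index field from a dict or a regular object.

    Uses `in` instead of dict.has_key(), which was removed in Python 3.
    """
    if isinstance(obj, dict):
        if index in obj:
            return obj[index]
        raise KeyError(index)
    # Fall back to attribute access for class instances
    return getattr(obj, index)


class User(object):
    name = "Ben"


print(get_object_index({"name": "Jeff", "age": 19}, "name"))  # Jeff
print(get_object_index(User(), "name"))                       # Ben
```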

.all() not working

I'm using the latest version: pip install git+https://github.com/Miserlou/NoDB.git

pip list | ag nodb
nodb                          0.4.0  

After running the example, I tried to list all the stored values using nodb.all():

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-3-9ca61584a6c9> in <module>
----> 1 nodb.all()

~/miniconda3/lib/python3.7/site-packages/nodb/__init__.py in all(self, metainfo)
    188             serialized = obj.get()["Body"].read()
    189             # deserialize and add to list
--> 190             deserialized_objects.append(self._deserialize(serialized))
    191 
    192         # sort by insert datetime

~/miniconda3/lib/python3.7/site-packages/nodb/__init__.py in _deserialize(self, serialized)
    236 
    237         obj = None
--> 238         deserialized = json.loads(serialized)
    239         return_me = {}
    240 

~/miniconda3/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346             parse_int is None and parse_float is None and
    347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
    349     if cls is None:
    350         cls = JSONDecoder

~/miniconda3/lib/python3.7/json/decoder.py in decode(self, s, _w)
    338         end = _w(s, end).end()
    339         if end != len(s):
--> 340             raise JSONDecodeError("Extra data", s, end)
    341         return obj
    342 

JSONDecodeError: Extra data: line 1 column 3 (char 2)

Test error

I ran into an error while running the tests.

Traceback (most recent call last):
  File "/Users/kwangin/workspace/github_source/NoDB/tests/tests.py", line 56, in test_nodb
    nodb._deserialize(serialized)
  File "/Users/kwangin/workspace/github_source/NoDB/nodb/__init__.py", line 170, in _deserialize
    return_me['obj'] = pickle.loads(base64.b64decode(des_obj))
  File "/Users/kwangin/.pyenv/versions/3.5.1/lib/python3.5/base64.py", line 90, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

It looks like a known issue, since the decoding line has a "# python 3" comment.
Is that right?

Python3 Support

This should be a pretty easy fix, I think. The only constraint is that both the Python 2 and Python 3 versions need to be able to read the same data!
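One detail that matters here: Python 2 can only read pickle protocols 0 to 2, while Python 3 defaults to a newer protocol. Writing with `protocol=2`, and passing `encoding="latin1"` when Python 3 loads pickles written by Python 2, is one way to keep shared data readable from both. The helpers below are a hypothetical sketch of the Python 3 side, wrapping the pickle in base64 text:

```python
import base64
import pickle


def dumps_portable(obj):
    """Hypothetical helper: pickle with protocol 2 (readable by Python 2), then base64-encode."""
    raw = pickle.dumps(obj, protocol=2)
    return base64.b64encode(raw).decode("ascii")  # plain ASCII text


def loads_portable(text):
    """Hypothetical helper: reverse dumps_portable on Python 3.

    encoding="latin1" lets Python 3 decode 8-bit str objects pickled under Python 2.
    """
    raw = base64.b64decode(text)
    return pickle.loads(raw, encoding="latin1")


encoded = dumps_portable({"name": "Jeff", "age": 19})
print(loads_portable(encoded))  # {'name': 'Jeff', 'age': 19}
```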

Authorization problem for fresh AWS regions

us-default supports both v2 and v4 auth, but Frankfurt, for example, doesn't support v2 (which is used by default).
So instead of:

s3 = boto3.resource('s3')

it should be:

import boto3
import botocore.client

s3 = boto3.resource('s3', config=botocore.client.Config(signature_version='s3v4'))

(or please document some global way to do it).

bug

Python 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> from nodb import NoDB
>>> nodb = NoDB("my-s3-bucket")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: NoDB() takes no arguments
>>> nodb = NoDB('my-s3-bucket')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: NoDB() takes no arguments
>>> nodb = NoDB()
>>> nodb.index = "name"
>>> user = {"name": "Jeff", "age": 19}
>>> nodb.save(user)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\nodb\__init__.py", line 63, in save
    real_index = self._get_object_index(obj, self.index)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\nodb\__init__.py", line 241, in _get_object_index
    if obj.has_key(index):
AttributeError: 'dict' object has no attribute 'has_key'

.all() not working

#31 is still an issue.

I've installed 0.5.1 from tag 0.5.1:
pipenv install git+https://github.com/Miserlou/NoDB.git@0.5.1#egg=nodb
and tried locally from the 0.5.1 tarball also

(btw, the version in setup.py still shows 0.4.0)

nodb.all() returns the error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dschofield/clients/mdg/NoDB-0.5.1/nodb/__init__.py", line 202, in all
    deserialized_objects.append(self._deserialize(serialized))
  File "/home/dschofield/clients/mdg/NoDB-0.5.1/nodb/__init__.py", line 252, in _deserialize
    deserialized = json.loads(serialized)
  File "/usr/local/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Use generic cloud storage API

While this is obviously intended for use with AWS, we will probably want to support other providers in the future as they implement serverless frameworks. Using a library like OFS would allow this to support local file systems, as well as provide a mechanism for supporting other providers going forward.
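As a rough sketch of the idea, NoDB-style code could talk to a small storage interface that each provider implements. `StorageBackend` and `LocalBackend` are hypothetical names, and this hand-rolled interface stands in for OFS, whose API isn't shown here:

```python
import abc
import os
import tempfile


class StorageBackend(abc.ABC):
    """Hypothetical minimal interface a storage provider would implement."""

    @abc.abstractmethod
    def put(self, key, data: bytes) -> None: ...

    @abc.abstractmethod
    def get(self, key) -> bytes: ...

    @abc.abstractmethod
    def delete(self, key) -> None: ...

    @abc.abstractmethod
    def keys(self) -> list: ...


class LocalBackend(StorageBackend):
    """Stores each object as a file in a local directory."""

    def __init__(self, root=None):
        self.root = root or tempfile.mkdtemp(prefix="nodb-local-")

    def _path(self, key):
        return os.path.join(self.root, key)

    def put(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()

    def delete(self, key):
        os.remove(self._path(key))

    def keys(self):
        return sorted(os.listdir(self.root))


backend = LocalBackend()
backend.put("jeff", b'{"name": "Jeff"}')
print(backend.keys())       # ['jeff']
print(backend.get("jeff"))  # b'{"name": "Jeff"}'
```

An S3 backend would implement the same four methods on top of boto3, so the rest of the code would not need to know which provider it is talking to.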

Potential dependency conflicts between NoDB and botocore

Hi, as shown in the following full dependency graph of NoDB, NoDB requires botocore (the latest version), while the installed version of s3transfer(0.2.1) requires botocore>=1.12.36,<2.0.0.

According to Pip's “first found wins” installation strategy, botocore 1.12.199 is the actually installed version.

Although the first-found version, botocore 1.12.199, happens to satisfy the later constraint (botocore>=1.12.36,<2.0.0), this could lead to a build failure once botocore releases version 2.0.0 or later: NoDB's own requirement (botocore>=1.5.38) has no upper bound, so pip would pick the 2.x release, which violates s3transfer's <2.0.0 constraint.

Dependency tree--------

NoDB(version range:)
| +-appdirs(version range:>=1.4.3)
| +-boto3(version range:>=1.4.4)
| +-botocore(version range:>=1.5.38)
| +-docutils(version range:>=0.13.1)
| +-funcsigs(version range:>=1.0.2)
| +-futures(version range:>=3.0.5)
| +-jmespath(version range:>=0.9.2)
| +-packaging(version range:>=16.8)
| +-pbr(version range:>=2.0.0)
| +-pyparsing(version range:>=2.2.0)
| +-python-dateutil(version range:==2.6.0)
| +-s3transfer(version range:>=0.1.10)
| | +-botocore(version range:>=1.12.36,<2.0.0)
| +-six(version range:>=1.10.0)

Thanks for your attention.
Best,
Neolith

Divide classes in __init__

This is just a suggestion, since I'm not a strong Python user.
It looks like all the functionality, such as the CRUD and serialization logic, lives in the __init__ file, which doesn't seem like a good design. I understand you may not have much time to work on it.

How about dividing these into separate Python files?

Compression

Rather than just storing B64 encoded data, we could compress it as well.
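Base64 on its own makes data about a third larger, so compressing first can help. A minimal standard-library sketch: compress the serialized bytes with `zlib` before base64-encoding them, and reverse both steps on read. `pack` and `unpack` are hypothetical helper names, and how much space this saves depends entirely on the data.

```python
import base64
import json
import zlib


def pack(serialized: bytes) -> str:
    """Hypothetical helper: compress, then base64-encode for text-safe storage."""
    return base64.b64encode(zlib.compress(serialized)).decode("ascii")


def unpack(packed: str) -> bytes:
    """Reverse of pack(): base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(packed))


payload = json.dumps({"name": "Jeff", "bio": "hello " * 200}).encode("utf-8")
packed = pack(payload)
print(len(payload), len(packed))  # repetitive data shrinks a lot
assert unpack(packed) == payload
```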

.all()

This is another obvious omission.
