datastore / datastore
Unified API for multiple data stores
License: MIT License
Passing in a key like ../../../../etc/passwd or ../../.bashc will make it read/write outside the given path.
Trying to contain user-controlled keys under a prefix doesn't work either: the resulting key path just becomes foo/bar/../../../../etc/passwd, which normalizes to ../../etc/passwd.
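A minimal sketch of why prefixing alone fails and how containment could be checked after normalization (the function name and store layout here are hypothetical, not part of the datastore API):

```python
import os.path

def is_contained(base, user_key):
    # Join the user-supplied key onto the base path, normalize away
    # any "..", and only accept results that stay under `base`.
    candidate = os.path.normpath(os.path.join(base, user_key.lstrip('/')))
    return candidate == base or candidate.startswith(base + os.sep)

print(is_contained('/srv/data', 'foo/bar'))                 # True
print(is_contained('/srv/data', '../../../../etc/passwd'))  # False
print(is_contained('/srv/data', 'foo/bar/../../../x'))      # False
```

The check must run on the fully joined, normalized path; validating the key before joining is exactly the prefixing mistake described above.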
I have noticed that the code is not fully PEP 8 compliant. For instance, some method names use mixed case instead of lowercase with underscores.
Are there any intentions to improve PEP 8 compliance? If so, how would backward compatibility be handled? Thanks.
Concurrency in the absence of transactions is painful. A LockShimDatastore that provides a locking scheme for safe access to an underlying datastore would be nice. It could use a second datastore to store the locks (e.g. a memcached datastore shared across threads/processes).
Perhaps an interface like:
with store.synchronize(key):
    value = store.read(key)
    new_value = modify(value)
    store.write(key, new_value)
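A rough single-process sketch of that idea, using a dict as the stand-in lock datastore (all names here are hypothetical; a real memcached-backed version would need an atomic add operation where this uses a threading.Lock guard):

```python
import contextlib
import threading
import uuid

class LockShimDatastore:
    def __init__(self, child, lock_store=None):
        self.child = child                       # dict-like data store
        self.lock_store = lock_store if lock_store is not None else {}
        self._guard = threading.Lock()           # stands in for an atomic add

    @contextlib.contextmanager
    def synchronize(self, key):
        token = uuid.uuid4().hex
        lock_key = key + '.lock'
        while True:                              # spin until the lock record is ours
            with self._guard:
                if lock_key not in self.lock_store:
                    self.lock_store[lock_key] = token
                    break
        try:
            yield
        finally:
            del self.lock_store[lock_key]        # release the lock record

    def read(self, key):
        return self.child.get(key)

    def write(self, key, value):
        self.child[key] = value

store = LockShimDatastore({'counter': 1})
with store.synchronize('counter'):
    store.write('counter', store.read('counter') + 1)
print(store.child['counter'])   # 2
```

A production version would also need lock expiry, so a crashed holder cannot wedge the key forever.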
There seems to be an error with FileSystemDatastore.
Consider:
>>> from datastore import *
>>> from datastore.impl.filesystem import FileSystemDatastore
>>>
>>> fds = FileSystemDatastore('/tmp/dstest')
>>> fds.put(Key('/a/b/c:d'), '1')
>>> fds.get(Key('/a/b/c:d'))
'1'
>>>
>>> list(fds.query(Query(Key('/a/b/c'))))
['1']
which all works fine. But instead of producing the expected result:
>>> list(fds.query(Query(Key('/a'))))
[]
an error is raised:
>>> list(fds.query(Query(Key('/a'))))
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Library/Python/2.7/site-packages/datastore/query.py", line 502, in next
next = self._iterator.next()
File "/Library/Python/2.7/site-packages/datastore/impl/filesystem.py", line 140, in _read_object_gen
yield self._read_object(filename)
File "/Library/Python/2.7/site-packages/datastore/impl/filesystem.py", line 130, in _read_object
raise RuntimeError('%s is a directory, not a file.' % path)
RuntimeError: /tmp/dstest/a/a is a directory, not a file.
FileSystemDatastore should either ignore subdirectories in queries, or include them recursively. But it certainly should not raise an error.
I wanted to install datastore using pip but was unable to do so.
This is what I get.
ubuntu@ip-172-31-29-228:~$ sudo pip install datastore
The directory '/home/ubuntu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/ubuntu/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already satisfied (use --upgrade to upgrade): datastore in /usr/local/lib/python2.7/dist-packages/datastore-0.3.6-py2.7.egg
Collecting smhasher==0.136.2 (from datastore)
Downloading smhasher-0.136.2.tar.gz (56kB)
100% |################################| 61kB 2.3MB/s
Installing collected packages: smhasher
Running setup.py install for smhasher ... error
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-q_RIxB/smhasher/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-IqNg1W-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_ext
building 'smhasher' extension
creating build
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/smhasher
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DMODULE_VERSION="0.136.2" -Ismhasher -I/usr/include/python2.7 -c smhasher.cpp -o build/temp.linux-x86_64-2.7/smhasher.o
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DMODULE_VERSION="0.136.2" -Ismhasher -I/usr/include/python2.7 -c smhasher/MurmurHash3.cpp -o build/temp.linux-x86_64-2.7/smhasher/MurmurHash3.o
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
smhasher/MurmurHash3.cpp:81:23: warning: always_inline function might not be inlinable [-Wattributes]
FORCE_INLINE uint64_t fmix ( uint64_t k )
^
smhasher/MurmurHash3.cpp:68:23: warning: always_inline function might not be inlinable [-Wattributes]
FORCE_INLINE uint32_t fmix ( uint32_t h )
^
smhasher/MurmurHash3.cpp:60:23: warning: always_inline function might not be inlinable [-Wattributes]
FORCE_INLINE uint64_t getblock ( const uint64_t * p, int i )
^
smhasher/MurmurHash3.cpp:55:23: warning: always_inline function might not be inlinable [-Wattributes]
FORCE_INLINE uint32_t getblock ( const uint32_t * p, int i )
^
smhasher/MurmurHash3.cpp: In function 'void MurmurHash3_x86_32(const void*, int, uint32_t, void*)':
smhasher/MurmurHash3.cpp:55:23: error: inlining failed in call to always_inline 'uint32_t getblock(const uint32_t*, int)': function body can be overwritten at link time
smhasher/MurmurHash3.cpp:112:36: error: called from here
uint32_t k1 = getblock(blocks,i);
^
smhasher/MurmurHash3.cpp:68:23: error: inlining failed in call to always_inline 'uint32_t fmix(uint32_t)': function body can be overwritten at link time
FORCE_INLINE uint32_t fmix ( uint32_t h )
^
smhasher/MurmurHash3.cpp:143:16: error: called from here
h1 = fmix(h1);
^
smhasher/MurmurHash3.cpp:55:23: error: inlining failed in call to always_inline 'uint32_t getblock(const uint32_t*, int)': function body can be overwritten at link time
FORCE_INLINE uint32_t getblock ( const uint32_t * p, int i )
^
smhasher/MurmurHash3.cpp:112:36: error: called from here
uint32_t k1 = getblock(blocks,i);
^
smhasher/MurmurHash3.cpp:68:23: error: inlining failed in call to always_inline 'uint32_t fmix(uint32_t)': function body can be overwritten at link time
FORCE_INLINE uint32_t fmix ( uint32_t h )
^
smhasher/MurmurHash3.cpp:143:16: error: called from here
h1 = fmix(h1);
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-q_RIxB/smhasher/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-IqNg1W-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-q_RIxB/smhasher/
On analysis, I found the error is occurring at this line:
smhasher/MurmurHash3.cpp:55:23: error: inlining failed in call to always_inline 'uint32_t getblock(const uint32_t*, int)': function body can be overwritten at link time
I'm porting datastore to go, as I now need it there.
For go, it's possible to write multiple packages in one repository, e.g.:
datastore.go/ # core
datastore.go/s3
It works for Python too, and the many-repositories approach is not a great one.
So I want to merge all the Python repositories into one (datastore.py). The old repos would be deleted. (Packages are still published independently, so the change is only a GitHub one.) So:
github.com/datastore/datastore --> github.com/datastore/datastore.py/core
github.com/datastore/datastore.s3 --> github.com/datastore/datastore.py/s3
github.com/datastore/datastore.redis --> github.com/datastore/datastore.py/redis
...
Any objections to this?
@willembult
@ali01
@antiface
@davidlesieur
@mmahesh
@techdragon
@qiangli20171
@Dorianux
@adelevie
(Notifying people it may affect)
IndexDatastore adds query support to any datastore by tracking indices of all parent keys. Interface:
>>> ds = IndexDatastore(ds, index_ds)
>>> ds = IndexDatastore(ds) # index_ds defaults to ds
>>>
>>> # indices updated on put
>>> ds.put(Key('/foo/bar'), 'bar')
>>> ds.put(Key('/foo/baz'), 'baz')
>>>
>>> # query support
>>> for item in ds.query(Query(Key('/foo'))):
... print item
'bar'
'baz'
>>>
>>> # getting the index (`.index` is placeholder. extension TBD)
>>> for key in ds.get(Key('/foo.index')):
... print key
Key('/foo/bar')
Key('/foo/baz')
>>>
>>> # indices updated on delete
>>> ds.delete(Key('/foo/baz'))
>>> for key in ds.get(Key('/foo.index')):
... print key
Key('/foo/bar')
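A toy sketch of the proposed behavior over a plain dict (keys are strings here rather than Key objects, and every name is an assumption, not shipped code):

```python
class IndexDatastore:
    def __init__(self, child, index=None):
        self.child = child                                 # dict-like value store
        self.index = index if index is not None else {}    # parent -> set of child keys

    def _parent(self, key):
        return key.rsplit('/', 1)[0] or '/'

    def put(self, key, value):
        # store the value and register the key under its parent's index
        self.child[key] = value
        self.index.setdefault(self._parent(key), set()).add(key)

    def delete(self, key):
        # remove the value and unregister the key from the parent index
        self.child.pop(key, None)
        self.index.get(self._parent(key), set()).discard(key)

    def query(self, prefix):
        # enumerate children via the index instead of scanning the store
        for key in sorted(self.index.get(prefix, ())):
            yield self.child[key]

ds = IndexDatastore({})
ds.put('/foo/bar', 'bar')
ds.put('/foo/baz', 'baz')
print(list(ds.query('/foo')))   # ['bar', 'baz']
ds.delete('/foo/baz')
print(list(ds.query('/foo')))   # ['bar']
```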
I'm transferring ownership of this repo to the datastore org, and then forking a mirror to avoid breaking links.
This means the canonical repo will be datastore/datastore.
DirectoryDatastore will support the creation of key collections. The actual method names are still to be determined; this just specifies the interface.
>>> ds = DirectoryDatastore(ds)
>>> ds.put(Key('/foo/bar'), 'bar')
>>> ds.put(Key('/foo/baz'), 'baz')
>>>
>>> # initialize directory at /foo
>>> ds.directory(Key('/foo'))
>>>
>>> # adding directory entries
>>> ds.directoryAdd(Key('/foo'), Key('/foo/bar'))
>>> ds.directoryAdd(Key('/foo'), Key('/foo/baz'))
>>>
>>> # value is a generator returning all the keys in this dir
>>> for key in ds.get(Key('/foo')):
... print key
Key('/foo/bar')
Key('/foo/baz')
>>>
>>> # querying for a collection works
>>> for item in ds.query(Query(Key('/foo'))):
... print item
'bar'
'baz'
>>>
>>> # removing directory entries
>>> ds.directoryRemove(Key('/foo'), Key('/foo/baz'))
>>> for key in ds.get(Key('/foo')):
... print key
Key('/foo/bar')
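The same interface sketched over a plain dict (method names follow the issue above; the semantics are my assumption):

```python
class DirectoryDatastore:
    def __init__(self, child):
        self.child = child      # dict-like value store
        self.dirs = {}          # directory key -> ordered list of entries

    def put(self, key, value):
        self.child[key] = value

    def directory(self, dir_key):
        # initialize an (empty) directory at dir_key
        self.dirs.setdefault(dir_key, [])

    def directoryAdd(self, dir_key, key):
        entries = self.dirs.setdefault(dir_key, [])
        if key not in entries:
            entries.append(key)

    def directoryRemove(self, dir_key, key):
        if key in self.dirs.get(dir_key, []):
            self.dirs[dir_key].remove(key)

    def get(self, key):
        if key in self.dirs:
            return iter(self.dirs[key])   # generator over contained keys
        return self.child.get(key)

    def query(self, dir_key):
        for key in self.dirs.get(dir_key, []):
            yield self.child[key]

ds = DirectoryDatastore({})
ds.put('/foo/bar', 'bar')
ds.put('/foo/baz', 'baz')
ds.directory('/foo')
ds.directoryAdd('/foo', '/foo/bar')
ds.directoryAdd('/foo', '/foo/baz')
print(list(ds.get('/foo')))     # ['/foo/bar', '/foo/baz']
print(list(ds.query('/foo')))   # ['bar', 'baz']
```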
I'm exploring the library; good work!
One feature I find missing is the ability for a Query to specify fields to retrieve, instead of retrieving full objects. MongoDB allows this. I'm not too familiar with other datastores, but I guess a query could always return full objects if the datastore does not support selecting fields.
What would you think about such a feature?
Thanks!
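For backends without server-side field selection, the fallback could be a client-side projection over full objects; a hedged sketch (the `project` helper is hypothetical, not a datastore API):

```python
def project(results, fields=None):
    # Trim each result object down to the requested fields; with
    # fields=None, pass objects through unchanged (the full-object fallback).
    for obj in results:
        if fields is None:
            yield obj
        else:
            yield {f: obj[f] for f in fields if f in obj}

rows = [{'name': 'juan', 'age': 24, 'phone': None}]
print(list(project(rows, ['name'])))   # [{'name': 'juan'}]
```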
BTW, issue tracking is not enabled on https://github.com/datastore/datastore. Not sure if that is intentional or not, so I thought I'd let you know.
pip install datastore results in pages & pages of gcc errors after
building 'smhasher' extension
runs, ending with:
Cleaning up...
Command /home/lxch/.virtualenvs/p/bin/python -c "import setuptools;__file__='/home/lxch/.r/.virtualenvs/p/build/smhasher/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-Wgz_sk-record/install-record.txt --single-version-externally-managed --install-headers /home/lxch/.virtualenvs/p/include/site/python2.7 failed with error code 1 in /home/lxch/.r/.virtualenvs/p/build/smhasher
Traceback (most recent call last):
File "/home/lxch/.virtualenvs/p/bin/pip", line 9, in <module>
load_entry_point('pip==1.4.1', 'console_scripts', 'pip')()
File "/home/lxch/.r/.virtualenvs/p/local/lib/python2.7/site-packages/pip/__init__.py", line 148, in main
return command.main(args[1:], options)
File "/home/lxch/.r/.virtualenvs/p/local/lib/python2.7/site-packages/pip/basecommand.py", line 169, in main
text = '\n'.join(complete_log)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 42: ordinal not in range(128)
I'm not sure what libraries are required for smhasher (on Ubuntu specifically, -dev packages with dev headers usually need to be installed for many things to compile properly, and I'm not finding anything relevant).
Also, the smhasher dependency of this package is pinned to an earlier version, not allowing for newer versions; i.e. it will always install a previous version (0.136.2, whereas the current smhasher is 0.150-something).
After discussions, @ali01 and I determined that Datastore.query should be changed to return either:
a key iterator, or
a (key, value) iterator
(it currently returns a value iterator). The rationale is that one can retrieve values from keys, but not vice versa.
Note that this is a backwards-incompatible change.
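Illustrating the rationale over a plain dict (the function and store shape are assumptions): a key iterator lets callers recover values, while a value iterator cannot recover keys.

```python
def query_keys(store, prefix):
    # Yield matching keys; values stay reachable through the store.
    for key in sorted(store):
        if key.startswith(prefix):
            yield key

store = {'/foo/bar': 'bar', '/foo/baz': 'baz', '/qux': 'q'}
keys = list(query_keys(store, '/foo'))
print(keys)                              # ['/foo/bar', '/foo/baz']
print([(k, store[k]) for k in keys])     # the (key, value) iterator variant
```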
(abk-py39) λ python -m pip install datastore
Collecting datastore
Using cached datastore-0.3.6.tar.gz (30 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [7 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\AppData\Local\Temp\pip-install-yhfke41y\datastore_61b3125fe70b48ed9b5ec5c847e78a47\setup.py", line 21
print 'something went wrong reading the README.md file.'
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print('something went wrong reading the README.md file.')?
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
I believe there is some compatibility issue with Python 3.9. Are you planning to fix and release the lib for Python 3.9?
Please suggest any other alternatives.
I am using the Amplify CLI with DataStore. It is working fine, but it is not syncing data with DynamoDB: the Post gets saved in DataStore's local storage but does not go into the DynamoDB Post table. My code is below:
const { Amplify, DataStore } = require("aws-amplify"); // DataStore was missing from the imports
const Post = require("./models");
const awsconfig = require("./aws-exports");

Amplify.configure(awsconfig.awsmobile);

exports.handler = async (event) => {
  try {
    console.log("inside");
    let response = await DataStore.save(
      new Post.Post({
        title: "My second Post",
        status: "DRAFT",
        rating: 7,
      })
    );
    console.log("Post saved successfully!");
    return response;
  } catch (error) {
    console.log("Error saving post", error);
  }
};
This is not storing my data in DynamoDB.
There are some serious issues getting this to work with Python 3 for me.
Namespace packaging is a bit different now (PEP 420), so the current structure does not translate exactly.
Can't do this: https://github.com/datastore/datastore/blob/master/datastore/core/__init__.py#L54-L64
There are tons of other issues, but these were the first major stumbling blocks I encountered.
It's unclear what the best way to wrap datastores is. Consider:
foo = SomeDatastore()
bar = OtherDatastore(foo)
baz = AnotherDatastore(bar)
As wrapping happens, the interface gets swallowed. This is not ideal for things like SymlinkDatastore or DirectoryDatastore, which should continue providing their methods.
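One way to keep a wrapped datastore's extra methods reachable is attribute delegation; a sketch under the assumption that shims forward unknown attributes to their child (the class names are made up):

```python
class ShimDatastore:
    def __init__(self, child):
        self.child = child

    def __getattr__(self, name):
        # Only consulted when `name` is not found on the wrapper itself,
        # so the shim's own methods still take precedence.
        return getattr(self.child, name)

class SymlinkLike:
    def symlink(self, src, dst):
        return (src, dst)

# even double-wrapped, the innermost interface stays reachable
wrapped = ShimDatastore(ShimDatastore(SymlinkLike()))
print(wrapped.symlink('/a', '/b'))   # ('/a', '/b')
```

The tradeoff is that delegation makes the effective interface implicit; a wrapper that intentionally hides a method would have to block it explicitly.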
Perhaps add this?
import nanotime
import datastore.core


class TimeCacheDatastore(datastore.core.Datastore):

    sentinel = 'time_cache'

    def __init__(self, expire_seconds, child_datastore=None):
        if not child_datastore:
            child_datastore = datastore.core.DictDatastore()
        self.child_datastore = child_datastore
        self.expire = nanotime.seconds(expire_seconds)

    def valid(self, key):
        '''Whether the object pointed to by `key` is still valid.'''
        time_key = self._timestamp_key(key)
        time_value = self.child_datastore.get(time_key)
        if time_value is None:
            return False
        t = nanotime.nanoseconds(int(time_value))
        return nanotime.now() - t <= self.expire

    def get(self, key):
        '''Returns object pointed to by `key` (if timestamp still valid).'''
        try:
            valid = self.valid(key)
        except Exception:
            valid = False
        if not valid:
            self.delete(key)
            return None
        return self.child_datastore.get(key)

    def put(self, key, value):
        '''Stores `value` pointed to by `key`, and logs a timestamp.'''
        self.child_datastore.put(key, value)
        # store time_cache timestamp
        time_key = self._timestamp_key(key)
        time_value = str(nanotime.now().nanoseconds())
        self.child_datastore.put(time_key, time_value)

    def delete(self, key):
        '''Removes object pointed to by `key`, and cleans up timestamp.'''
        self.child_datastore.delete(key)
        self.child_datastore.delete(self._timestamp_key(key))

    def _timestamp_key(self, source_key):
        '''Returns the time cache key for given `source_key`.'''
        return source_key.child(self.sentinel)
A datastore that supports DynamoDB would be excellent. I'd be interested and eager to contribute to such a sub-project.
Note: In writing this, a clear solution seems to have emerged. However, it's still worth posting in case anyone has feedback between now and when I implement the change.

datastore wraps specific datastores with a unified API. In order to do this, datastore uses a set of specific impl (implementation) modules that translate between the datastore API and the underlying system's API. (See http://datastore.readthedocs.org/en/latest/package/datastore.impl.html)
Naturally, the impl modules depend on the packages for that given system. A question arises: how should these modules be distributed?

Within the datastore package

Note: this is how impl modules are currently distributed.

The impl modules are contained within the datastore package. In order to not balloon the dependencies of datastore itself, the system-specific dependencies of the impl modules are not included in datastore's requirements.

Pros:
- everything ships in one datastore package. Without having to find more packages, datastore is a simple tool
- impl modules are tracked as part of the main distribution, and thus can be tested and updated as datastore changes.

Cons:
- ImportErrors: presumably, the system-specific package is already installed. If it is not, however, the impl module will prompt the user to install it with an ImportError.
- impl modules actually require specific versions of the underlying system-specific packages (e.g. datastore.impl.mongo is currently tested with MongoDB 1.8.2 and pymongo 1.11). Losing this dependency chain is cumbersome and error-prone.

As independent packages

The impl modules are developed and released as entirely independent packages following a naming convention (e.g. datastore_mongo). They can specify their own sets of dependencies accordingly, but require that users install every specific impl module they want to use.

Pros:
- impl modules can specify their own requirements and thus can track dependencies
- no ImportErrors: installing a specific impl module would also install the system-specific dependencies (at the correct version!)

Cons:
- many packages complicate datastore development, potentially introducing more headaches than datastore solves.
- fragmentation across pypi, stackoverflow, and github.

As extras within the datastore package

A hybrid of the previous two approaches, including the impl modules within the datastore package as extras, seems to be the best of both worlds. impl modules are included in the main distribution. Dependencies are tracked via the extras_require field.

Within datastore/setup.py:

setup(
    ...
    name="datastore",
    extras_require={
        'mongo': ["pymongo>=1.11"],
        ...
    },
    ...
)

Within my_project/setup.py:

setup(
    ...
    install_requires=['datastore[mongo]'],
    ...
)

This should do the right thing (install datastore by itself, and install the datastore[mongo] dependencies when my_project is installed).

Pros:
- everything ships in one datastore package. Without having to find more packages, datastore is a simple tool
- impl modules are tracked as part of the main distribution, and thus can be tested and updated as datastore changes.
- impl modules can specify their own requirements and thus can track dependencies

Cons:
- the ImportError problem may still emerge when developing. This may be a non-issue with good documentation.

I am leaning towards the extras approach. However, I just found out about it, and have had no (conscious) experience using it. So any notes on taking this route (in particular cautionary tales) would be helpful to read before taking this direction.
It would be great to have an object mapper on top of datastore, along the lines of dronestore's or django's.
The interface would be something like:
>>> from datastore.objects import Model, Attribute
>>>
>>> # define a class with attributes
>>> class Person(Model):
... name = Attribute(type=str, required=True)
... age = Attribute(type=int, default=0)
... phone = Attribute(type=str)
>>>
>>> juan = Person('juan')
>>> juan.key
Key('/person:juan')
>>> juan.name = 'Juan Benet'
>>> juan.age = '24'
TypeError: '24' is not of type 'int'
>>> juan.age = 24
>>>
>>> # wrap a ds with a special object manager ds
>>> omds = ObjectDatastore(ds, [Person])
>>>
>>> # put instances directly
>>> omds.put(juan.key, juan)
>>> omds.put(juan) # key defaults to value.key
>>>
>>> # get returns instances
>>> juan2 = omds.get(juan.key)
>>> juan2 == juan
True
>>> juan2 is juan
False
>>>
>>> # put non-instances fails
>>> omds.put(juan.key, 'foo')
TypeError: 'foo' is not of type '<Person>'
>>>
>>> # underlying values are serialized
>>> # (support for changing omds.serializer)
>>> ds.get(juan.key)
{
"key": "/person:juan",
"name": "Juan Benet",
"age": 24,
}
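A small descriptor-based sketch of just the type-checking part of such a mapper (a dronestore-style implementation would do much more; everything here is an assumption, not a proposed final design):

```python
class Attribute:
    def __init__(self, type=str, required=False, default=None):
        self.type = type
        self.required = required
        self.default = default
        self.name = None

    def __set_name__(self, owner, name):
        # record the attribute name the descriptor is bound to
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        # reject values of the wrong type at assignment time
        if not isinstance(value, self.type):
            raise TypeError('%r is not of type %r' % (value, self.type.__name__))
        obj.__dict__[self.name] = value

class Model:
    def __init__(self, name_segment):
        # key derived from the lowercased class name, as in the transcript
        self.key = '/%s:%s' % (type(self).__name__.lower(), name_segment)

class Person(Model):
    name = Attribute(type=str, required=True)
    age = Attribute(type=int, default=0)

juan = Person('juan')
juan.name = 'Juan Benet'
juan.age = 24
print(juan.key)   # /person:juan
```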
It'd be really cool if datastore were on PyPI -- I'd like to be able to put datastore in a project's install_requires and have pip automatically install it.