datastore / datastore
Unified API for multiple data stores
License: MIT License
Passing in a key like ../../../../etc/passwd or ../../.bashc will make it read/write outside the given path.
Trying to contain user-controlled keys under a prefix doesn't work either: the resulting key path just becomes foo/bar/../../../../etc/passwd, which normalizes to ../../etc/passwd.
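A minimal sketch of why prefixing alone fails and how containment could be checked after normalization (the function name and store layout here are hypothetical, not part of the datastore API):

```python
import os.path

def is_contained(base, user_key):
    # Join the user-supplied key onto the base path, normalize away
    # any "..", and only accept results that stay under `base`.
    candidate = os.path.normpath(os.path.join(base, user_key.lstrip('/')))
    return candidate == base or candidate.startswith(base + os.sep)

print(is_contained('/srv/data', 'foo/bar'))                 # True
print(is_contained('/srv/data', '../../../../etc/passwd'))  # False
print(is_contained('/srv/data', 'foo/bar/../../../x'))      # False
```

The check must run on the fully joined, normalized path; validating the key before joining is exactly the prefixing mistake described above.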
I have noticed that the code is not fully PEP 8 compliant. For instance, some method names use mixed case instead of lowercase with underscores.
Are there any intentions to improve PEP 8 compliance? If so, how would backward compatibility be handled? Thanks.
Concurrency in the absence of transactions is painful. A LockShimDatastore that provides a locking scheme for safe access to an underlying datastore would be nice. It could use a second datastore to store the locks (e.g. a memcached datastore shared across threads/processes).
Perhaps an interface like:
with store.synchronize(key):
    value = store.read(key)
    new_value = modify(value)
    store.write(key, new_value)
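A rough single-process sketch of that idea, using a dict as the stand-in lock datastore (all names here are hypothetical; a real memcached-backed version would need an atomic add operation where this uses a threading.Lock guard):

```python
import contextlib
import threading
import uuid

class LockShimDatastore:
    def __init__(self, child, lock_store=None):
        self.child = child                       # dict-like data store
        self.lock_store = lock_store if lock_store is not None else {}
        self._guard = threading.Lock()           # stands in for an atomic add

    @contextlib.contextmanager
    def synchronize(self, key):
        token = uuid.uuid4().hex
        lock_key = key + '.lock'
        while True:                              # spin until the lock record is ours
            with self._guard:
                if lock_key not in self.lock_store:
                    self.lock_store[lock_key] = token
                    break
        try:
            yield
        finally:
            del self.lock_store[lock_key]        # release the lock record

    def read(self, key):
        return self.child.get(key)

    def write(self, key, value):
        self.child[key] = value

store = LockShimDatastore({'counter': 1})
with store.synchronize('counter'):
    store.write('counter', store.read('counter') + 1)
print(store.child['counter'])   # 2
```

A production version would also need lock expiry, so a crashed holder cannot wedge the key forever.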
There seems to be an error with FileSystemDatastore.
Consider:
>>> from datastore import *
>>> from datastore.impl.filesystem import FileSystemDatastore
>>>
>>> fds = FileSystemDatastore('/tmp/dstest')
>>> fds.put(Key('/a/b/c:d'), '1')
>>> fds.get(Key('/a/b/c:d'))
'1'
>>>
>>> list(fds.query(Query(Key('/a/b/c'))))
['1']
which all works fine. But instead of producing the expected result:
>>> list(fds.query(Query(Key('/a'))))
[]
an error is raised:
>>> list(fds.query(Query(Key('/a'))))
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Library/Python/2.7/site-packages/datastore/query.py", line 502, in next
next = self._iterator.next()
File "/Library/Python/2.7/site-packages/datastore/impl/filesystem.py", line 140, in _read_object_gen
yield self._read_object(filename)
File "/Library/Python/2.7/site-packages/datastore/impl/filesystem.py", line 130, in _read_object
raise RuntimeError('%s is a directory, not a file.' % path)
RuntimeError: /tmp/dstest/a/a is a directory, not a file.
FileSystemDatastore should either ignore subdirectories in queries, or include them recursively. But it certainly should not raise an error.
I wanted to install datastore using pip but was unable to do so.
This is what I get.
ubuntu@ip-172-31-29-228:~$ sudo pip install datastore
The directory '/home/ubuntu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/ubuntu/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already satisfied (use --upgrade to upgrade): datastore in /usr/local/lib/python2.7/dist-packages/datastore-0.3.6-py2.7.egg
Collecting smhasher==0.136.2 (from datastore)
Downloading smhasher-0.136.2.tar.gz (56kB)
100% |################################| 61kB 2.3MB/s
Installing collected packages: smhasher
Running setup.py install for smhasher ... error
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-q_RIxB/smhasher/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-IqNg1W-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_ext
building 'smhasher' extension
creating build
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/smhasher
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DMODULE_VERSION="0.136.2" -Ismhasher -I/usr/include/python2.7 -c smhasher.cpp -o build/temp.linux-x86_64-2.7/smhasher.o
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DMODULE_VERSION="0.136.2" -Ismhasher -I/usr/include/python2.7 -c smhasher/MurmurHash3.cpp -o build/temp.linux-x86_64-2.7/smhasher/MurmurHash3.o
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
smhasher/MurmurHash3.cpp:81:23: warning: always_inline function might not be inlinable [-Wattributes]
FORCE_INLINE uint64_t fmix ( uint64_t k )
^
smhasher/MurmurHash3.cpp:68:23: warning: always_inline function might not be inlinable [-Wattributes]
FORCE_INLINE uint32_t fmix ( uint32_t h )
^
smhasher/MurmurHash3.cpp:60:23: warning: always_inline function might not be inlinable [-Wattributes]
FORCE_INLINE uint64_t getblock ( const uint64_t * p, int i )
^
smhasher/MurmurHash3.cpp:55:23: warning: always_inline function might not be inlinable [-Wattributes]
FORCE_INLINE uint32_t getblock ( const uint32_t * p, int i )
^
smhasher/MurmurHash3.cpp: In function 'void MurmurHash3_x86_32(const void*, int, uint32_t, void*)':
smhasher/MurmurHash3.cpp:55:23: error: inlining failed in call to always_inline 'uint32_t getblock(const uint32_t*, int)': function body can be overwritten at link time
smhasher/MurmurHash3.cpp:112:36: error: called from here
uint32_t k1 = getblock(blocks,i);
^
smhasher/MurmurHash3.cpp:68:23: error: inlining failed in call to always_inline 'uint32_t fmix(uint32_t)': function body can be overwritten at link time
FORCE_INLINE uint32_t fmix ( uint32_t h )
^
smhasher/MurmurHash3.cpp:143:16: error: called from here
h1 = fmix(h1);
^
smhasher/MurmurHash3.cpp:55:23: error: inlining failed in call to always_inline 'uint32_t getblock(const uint32_t*, int)': function body can be overwritten at link time
FORCE_INLINE uint32_t getblock ( const uint32_t * p, int i )
^
smhasher/MurmurHash3.cpp:112:36: error: called from here
uint32_t k1 = getblock(blocks,i);
^
smhasher/MurmurHash3.cpp:68:23: error: inlining failed in call to always_inline 'uint32_t fmix(uint32_t)': function body can be overwritten at link time
FORCE_INLINE uint32_t fmix ( uint32_t h )
^
smhasher/MurmurHash3.cpp:143:16: error: called from here
h1 = fmix(h1);
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-q_RIxB/smhasher/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-IqNg1W-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-q_RIxB/smhasher/
On analysis, I found the error is occurring at this line:
smhasher/MurmurHash3.cpp:55:23: error: inlining failed in call to always_inline 'uint32_t getblock(const uint32_t*, int)': function body can be overwritten at link time
I'm porting datastore to go, as I now need it there.
For go, it's possible to write multiple packages in one repository, e.g.:
datastore.go/ # core
datastore.go/s3
It works for Python too, and the many-repositories approach is not a great one.
So I want to merge all the Python repositories into one (datastore.py). The old repos would be deleted. (Packages are still published independently, so the change is only a GitHub one.) So:
github.com/datastore/datastore --> github.com/datastore/datastore.py/core
github.com/datastore/datastore.s3 --> github.com/datastore/datastore.py/s3
github.com/datastore/datastore.redis --> github.com/datastore/datastore.py/redis
...
Any objections to this?
@willembult
@ali01
@antiface
@davidlesieur
@mmahesh
@techdragon
@qiangli20171
@Dorianux
@adelevie
(Notifying people it may affect)
IndexDatastore adds query support to any datastore by tracking indices of all parent keys. Interface:
>>> ds = IndexDatastore(ds, index_ds)
>>> ds = IndexDatastore(ds) # index_ds defaults to ds
>>>
>>> # indices updated on put
>>> ds.put(Key('/foo/bar'), 'bar')
>>> ds.put(Key('/foo/baz'), 'baz')
>>>
>>> # query support
>>> for item in ds.query(Query(Key('/foo'))):
... print item
'bar'
'baz'
>>>
>>> # getting the index (`.index` is placeholder. extension TBD)
>>> for key in ds.get(Key('/foo.index')):
... print key
Key('/foo/bar')
Key('/foo/baz')
>>>
>>> # indices updated on delete
>>> ds.delete(Key('/foo/baz'))
>>> for key in ds.get(Key('/foo.index')):
... print key
Key('/foo/bar')
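A toy sketch of the proposed behavior over a plain dict (keys are strings here rather than Key objects, and every name is an assumption, not shipped code):

```python
class IndexDatastore:
    def __init__(self, child, index=None):
        self.child = child                                 # dict-like value store
        self.index = index if index is not None else {}    # parent -> set of child keys

    def _parent(self, key):
        return key.rsplit('/', 1)[0] or '/'

    def put(self, key, value):
        # store the value and register the key under its parent's index
        self.child[key] = value
        self.index.setdefault(self._parent(key), set()).add(key)

    def delete(self, key):
        # remove the value and unregister the key from the parent index
        self.child.pop(key, None)
        self.index.get(self._parent(key), set()).discard(key)

    def query(self, prefix):
        # enumerate children via the index instead of scanning the store
        for key in sorted(self.index.get(prefix, ())):
            yield self.child[key]

ds = IndexDatastore({})
ds.put('/foo/bar', 'bar')
ds.put('/foo/baz', 'baz')
print(list(ds.query('/foo')))   # ['bar', 'baz']
ds.delete('/foo/baz')
print(list(ds.query('/foo')))   # ['bar']
```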
I'm transferring ownership of this repo to the datastore org, and then forking a mirror to avoid breaking links.
This means the canonical repo will be datastore/datastore.
DirectoryDatastore will support the creation of key collections. The actual method names are still to be determined; this just specifies the interface.
>>> ds = DirectoryDatastore(ds)
>>> ds.put(Key('/foo/bar'), 'bar')
>>> ds.put(Key('/foo/baz'), 'baz')
>>>
>>> # initialize directory at /foo
>>> ds.directory(Key('/foo'))
>>>
>>> # adding directory entries
>>> ds.directoryAdd(Key('/foo'), Key('/foo/bar'))
>>> ds.directoryAdd(Key('/foo'), Key('/foo/baz'))
>>>
>>> # value is a generator returning all the keys in this dir
>>> for key in ds.get(Key('/foo')):
... print key
Key('/foo/bar')
Key('/foo/baz')
>>>
>>> # querying for a collection works
>>> for item in ds.query(Query(Key('/foo'))):
... print item
'bar'
'baz'
>>>
>>> # removing directory entries
>>> ds.directoryRemove(Key('/foo'), Key('/foo/baz'))
>>> for key in ds.get(Key('/foo')):
... print key
Key('/foo/bar')
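The same interface sketched over a plain dict (method names follow the issue above; the semantics are my assumption):

```python
class DirectoryDatastore:
    def __init__(self, child):
        self.child = child      # dict-like value store
        self.dirs = {}          # directory key -> ordered list of entries

    def put(self, key, value):
        self.child[key] = value

    def directory(self, dir_key):
        # initialize an (empty) directory at dir_key
        self.dirs.setdefault(dir_key, [])

    def directoryAdd(self, dir_key, key):
        entries = self.dirs.setdefault(dir_key, [])
        if key not in entries:
            entries.append(key)

    def directoryRemove(self, dir_key, key):
        if key in self.dirs.get(dir_key, []):
            self.dirs[dir_key].remove(key)

    def get(self, key):
        if key in self.dirs:
            return iter(self.dirs[key])   # generator over contained keys
        return self.child.get(key)

    def query(self, dir_key):
        for key in self.dirs.get(dir_key, []):
            yield self.child[key]

ds = DirectoryDatastore({})
ds.put('/foo/bar', 'bar')
ds.put('/foo/baz', 'baz')
ds.directory('/foo')
ds.directoryAdd('/foo', '/foo/bar')
ds.directoryAdd('/foo', '/foo/baz')
print(list(ds.get('/foo')))     # ['/foo/bar', '/foo/baz']
print(list(ds.query('/foo')))   # ['bar', 'baz']
```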
I'm exploring the library; good work!
One feature I find missing is the ability for a Query to specify fields to retrieve, instead of retrieving full objects. MongoDB allows this. I'm not too familiar with other datastores, but I guess a query could always return full objects if the datastore does not support selecting fields.
What would you think about such a feature?
Thanks!
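For backends without server-side field selection, the fallback could be a client-side projection over full objects; a hedged sketch (the `project` helper is hypothetical, not a datastore API):

```python
def project(results, fields=None):
    # Trim each result object down to the requested fields; with
    # fields=None, pass objects through unchanged (the full-object fallback).
    for obj in results:
        if fields is None:
            yield obj
        else:
            yield {f: obj[f] for f in fields if f in obj}

rows = [{'name': 'juan', 'age': 24, 'phone': None}]
print(list(project(rows, ['name'])))   # [{'name': 'juan'}]
```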
BTW, issue tracking is not enabled on https://github.com/datastore/datastore. Not sure if that is intentional or not, so I thought I'd let you know.
pip install datastore results in pages & pages of gcc errors after
building 'smhasher' extension
runs, ending with:
Cleaning up...
Command /home/lxch/.virtualenvs/p/bin/python -c "import setuptools;__file__='/home/lxch/.r/.virtualenvs/p/build/smhasher/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-Wgz_sk-record/install-record.txt --single-version-externally-managed --install-headers /home/lxch/.virtualenvs/p/include/site/python2.7 failed with error code 1 in /home/lxch/.r/.virtualenvs/p/build/smhasher
Traceback (most recent call last):
File "/home/lxch/.virtualenvs/p/bin/pip", line 9, in <module>
load_entry_point('pip==1.4.1', 'console_scripts', 'pip')()
File "/home/lxch/.r/.virtualenvs/p/local/lib/python2.7/site-packages/pip/__init__.py", line 148, in main
return command.main(args[1:], options)
File "/home/lxch/.r/.virtualenvs/p/local/lib/python2.7/site-packages/pip/basecommand.py", line 169, in main
text = '\n'.join(complete_log)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 42: ordinal not in range(128)
I'm not sure what libraries are required for smhasher (on Ubuntu specifically, -dev packages with dev headers usually need to be installed for many things to compile properly, and I'm not finding anything relevant).
Also, the smhasher dependency of this package is pinned to an earlier version, not allowing for newer versions; i.e. it will always install a previous version (0.136.2, whereas the current smhasher is 0.150-something).
After discussions, @ali01 and I determined that Datastore.query should be changed to return either:
a key iterator, or
a (key, value) iterator
(it currently returns a value iterator). The rationale is that one can retrieve values from keys, but not vice versa.
Note that this is a backwards-incompatible change.
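Illustrating the rationale over a plain dict (the function and store shape are assumptions): a key iterator lets callers recover values, while a value iterator cannot recover keys.

```python
def query_keys(store, prefix):
    # Yield matching keys; values stay reachable through the store.
    for key in sorted(store):
        if key.startswith(prefix):
            yield key

store = {'/foo/bar': 'bar', '/foo/baz': 'baz', '/qux': 'q'}
keys = list(query_keys(store, '/foo'))
print(keys)                              # ['/foo/bar', '/foo/baz']
print([(k, store[k]) for k in keys])     # the (key, value) iterator variant
```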
(abk-py39) λ python -m pip install datastore
Collecting datastore
Using cached datastore-0.3.6.tar.gz (30 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [7 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\AppData\Local\Temp\pip-install-yhfke41y\datastore_61b3125fe70b48ed9b5ec5c847e78a47\setup.py", line 21
print 'something went wrong reading the README.md file.'
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print('something went wrong reading the README.md file.')?
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
I believe there is some compatibility issue with Python 3.9. Are you planning to fix and release the lib for Python 3.9?
Please suggest any other alternatives.
I am using the Amplify CLI with DataStore. It is working fine, but it is not syncing data with DynamoDB: the Post gets saved in DataStore's local storage but does not go into the DynamoDB Post table. My code is below:
const { Amplify, DataStore } = require("aws-amplify"); // DataStore was missing from the imports
const Post = require("./models");
const awsconfig = require("./aws-exports");

Amplify.configure(awsconfig.awsmobile);

exports.handler = async (event) => {
  try {
    console.log("inside");
    let response = await DataStore.save(
      new Post.Post({
        title: "My second Post",
        status: "DRAFT",
        rating: 7,
      })
    );
    console.log("Post saved successfully!");
    return response;
  } catch (error) {
    console.log("Error saving post", error);
  }
};
This is not storing my data in DynamoDB.
There are some serious issues getting this to work with Python 3 for me.
Namespace packaging is a bit different now (PEP 420), so the current structure does not translate exactly.
Can't do this: https://github.com/datastore/datastore/blob/master/datastore/core/__init__.py#L54-L64
There are tons of other issues, but these were the first major stumbling blocks I encountered.
It's unclear what the best way to wrap datastores is. Consider:
foo = SomeDatastore()
bar = OtherDatastore(foo)
baz = AnotherDatastore(bar)
As wrapping happens, the interface gets swallowed. This is not ideal for things like SymlinkDatastore or DirectoryDatastore, which should continue providing their methods.
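One way to keep a wrapped datastore's extra methods reachable is attribute delegation; a sketch under the assumption that shims forward unknown attributes to their child (the class names are made up):

```python
class ShimDatastore:
    def __init__(self, child):
        self.child = child

    def __getattr__(self, name):
        # Only consulted when `name` is not found on the wrapper itself,
        # so the shim's own methods still take precedence.
        return getattr(self.child, name)

class SymlinkLike:
    def symlink(self, src, dst):
        return (src, dst)

# even double-wrapped, the innermost interface stays reachable
wrapped = ShimDatastore(ShimDatastore(SymlinkLike()))
print(wrapped.symlink('/a', '/b'))   # ('/a', '/b')
```

The tradeoff is that delegation makes the effective interface implicit; a wrapper that intentionally hides a method would have to block it explicitly.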
Perhaps add this?
import nanotime
import datastore.core


class TimeCacheDatastore(datastore.core.Datastore):

    sentinel = 'time_cache'

    def __init__(self, expire_seconds, child_datastore=None):
        if not child_datastore:
            child_datastore = datastore.core.DictDatastore()
        self.child_datastore = child_datastore
        self.expire = nanotime.seconds(expire_seconds)

    def valid(self, key):
        '''Whether the object pointed to by `key` is still valid.'''
        time_key = self._timestamp_key(key)
        time_value = self.child_datastore.get(time_key)
        if time_value is None:
            return False
        t = nanotime.nanoseconds(int(time_value))
        return nanotime.now() - t <= self.expire

    def get(self, key):
        '''Returns object pointed to by `key` (if timestamp still valid).'''
        try:
            valid = self.valid(key)
        except Exception:
            valid = False
        if not valid:
            self.delete(key)
            return None
        return self.child_datastore.get(key)

    def put(self, key, value):
        '''Stores `value` pointed to by `key`, and logs a timestamp.'''
        self.child_datastore.put(key, value)
        # store time_cache timestamp
        time_key = self._timestamp_key(key)
        time_value = str(nanotime.now().nanoseconds())
        self.child_datastore.put(time_key, time_value)

    def delete(self, key):
        '''Removes object pointed to by `key`, and cleans up timestamp.'''
        self.child_datastore.delete(key)
        self.child_datastore.delete(self._timestamp_key(key))

    def _timestamp_key(self, source_key):
        '''Returns the time cache key for given `source_key`.'''
        return source_key.child(self.sentinel)
A datastore that supports DynamoDB would be excellent. I'd be interested and eager to contribute to such a sub-project.
Note: In writing this, a clear solution seems to have emerged. However, it's still worth posting in case anyone has feedback between now and when I implement the change.

datastore wraps specific datastores with a unified API. In order to do this, datastore uses a set of specific impl (implementation) modules that translate between the datastore API and the underlying system's API. (See http://datastore.readthedocs.org/en/latest/package/datastore.impl.html)
Naturally, the impl modules depend on the packages for that given system. A question arises: how should these modules be distributed?

Within the datastore package

Note: this is how impl modules are currently distributed.

The impl modules are contained within the datastore package. In order to not balloon the dependencies of datastore itself, the system-specific dependencies of the impl modules are not included in datastore's requirements.

Pros:
- everything ships in one datastore package. Without having to find more packages, datastore is a simple tool
- impl modules are tracked as part of the main distribution, and thus can be tested and updated as datastore changes.

Cons:
- ImportErrors: presumably, the system-specific package is already installed. If it is not, however, the impl module will prompt the user to install it with an ImportError.
- impl modules actually require specific versions of the underlying system-specific packages (e.g. datastore.impl.mongo is currently tested with MongoDB 1.8.2 and pymongo 1.11). Losing this dependency chain is cumbersome and error-prone.

As independent packages

The impl modules are developed and released as entirely independent packages following a naming convention (e.g. datastore_mongo). They can specify their own sets of dependencies accordingly, but require that users install every specific impl module they want to use.

Pros:
- impl modules can specify their own requirements and thus can track dependencies
- no ImportErrors: installing a specific impl module would also install the system-specific dependencies (at the correct version!)

Cons:
- many packages complicate datastore development, potentially introducing more headaches than datastore solves.
- fragmentation across pypi, stackoverflow, and github.

As extras within the datastore package

A hybrid of the previous two approaches, including the impl modules within the datastore package as extras, seems to be the best of both worlds. impl modules are included in the main distribution. Dependencies are tracked via the extras_require field.

Within datastore/setup.py:

setup(
    ...
    name="datastore",
    extras_require={
        'mongo': ["pymongo>=1.11"],
        ...
    },
    ...
)

Within my_project/setup.py:

setup(
    ...
    install_requires=['datastore[mongo]'],
    ...
)

This should do the right thing (install datastore by itself, and install the datastore[mongo] dependencies when my_project is installed).

Pros:
- everything ships in one datastore package. Without having to find more packages, datastore is a simple tool
- impl modules are tracked as part of the main distribution, and thus can be tested and updated as datastore changes.
- impl modules can specify their own requirements and thus can track dependencies

Cons:
- the ImportError problem may still emerge when developing. This may be a non-issue with good documentation.

I am leaning towards the extras approach. However, I just found out about it, and have had no (conscious) experience using it. So any notes on taking this route (in particular cautionary tales) would be helpful to read before taking this direction.
It would be great to have an object mapper on top of datastore, along the lines of dronestore's or django's.
The interface would be something like:
>>> from datastore.objects import Model, Attribute
>>>
>>> # define a class with attributes
>>> class Person(Model):
... name = Attribute(type=str, required=True)
... age = Attribute(type=int, default=0)
... phone = Attribute(type=str)
>>>
>>> juan = Person('juan')
>>> juan.key
Key('/person:juan')
>>> juan.name = 'Juan Benet'
>>> juan.age = '24'
TypeError: '24' is not of type 'int'
>>> juan.age = 24
>>>
>>> # wrap a ds with a special object manager ds
>>> omds = ObjectDatastore(ds, [Person])
>>>
>>> # put instances directly
>>> omds.put(juan.key, juan)
>>> omds.put(juan) # key defaults to value.key
>>>
>>> # get returns instances
>>> juan2 = omds.get(juan.key)
>>> juan2 == juan
True
>>> juan2 is juan
False
>>>
>>> # put non-instances fails
>>> omds.put(juan.key, 'foo')
TypeError: 'foo' is not of type '<Person>'
>>>
>>> # underlying values are serialized
>>> # (support for changing omds.serializer)
>>> ds.get(juan.key)
{
"key": "/person:juan",
"name": "Juan Benet",
"age": 24,
}
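A small descriptor-based sketch of just the type-checking part of such a mapper (a dronestore-style implementation would do much more; everything here is an assumption, not a proposed final design):

```python
class Attribute:
    def __init__(self, type=str, required=False, default=None):
        self.type = type
        self.required = required
        self.default = default
        self.name = None

    def __set_name__(self, owner, name):
        # record the attribute name the descriptor is bound to
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        # reject values of the wrong type at assignment time
        if not isinstance(value, self.type):
            raise TypeError('%r is not of type %r' % (value, self.type.__name__))
        obj.__dict__[self.name] = value

class Model:
    def __init__(self, name_segment):
        # key derived from the lowercased class name, as in the transcript
        self.key = '/%s:%s' % (type(self).__name__.lower(), name_segment)

class Person(Model):
    name = Attribute(type=str, required=True)
    age = Attribute(type=int, default=0)

juan = Person('juan')
juan.name = 'Juan Benet'
juan.age = 24
print(juan.key)   # /person:juan
```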
It'd be really cool if datastore were on PyPI -- I'd like to be able to put datastore in a project's install_requires and have pip automatically install it.