umongo's People

Contributors

denya, elyssonmr, fsonntag, imbolc, lafrech, mandarup, mvolfik, pkulev, serjshevchenko, touilleman, wfscheper, whophil

umongo's Issues

Method field

I need computed attributes on my model objects.
Looking at marshmallow, it seems the most appropriate way to do this is by implementing the Method field, like:

import datetime as dt
from marshmallow import Schema, fields

class UserSchema(Schema):
    since_created = fields.Method("get_days_since_created")

    def get_days_since_created(self, obj):
        return dt.datetime.now().day - obj.created_at.day

Looking at umongo, the Method field is not implemented yet. For testing I tried subclassing a new field type like

class MethodField(BaseField, marshmallow_fields.Method):
    pass

which of course results in an error. Marshmallow does not seem to have access to the method of the subclass (at this point) since self.parent isn't the umongo object but the marshmallow schema object.

What would be the right path to implement the Method field functionality? Has something already been done in this direction, or should I dig deeper and find a solution for passing the method call on to marshmallow?

Troubles using super() inside Document

It could be me doing things wrong.

When using inheritance, you sometimes want to use super().

AFAIU, you can't use the zero argument form super() because the compiler will pass it the template class, not the implementation class. So you have to write super(MyDocument, self).

        @instance.register
        class Doc(Document):

            name = fields.StrField()

            def __init__(self, **kwargs):
                super(Doc, self).__init__(**kwargs)

This does not look like such a big deal. I mean I can easily live with this.

Unfortunately, I'm afraid it does not work that great if registration is not done using a decorator. More precisely:

This works:

        class Doc(Document):

            name = fields.StrField()

            def __init__(self, **kwargs):
                super(Doc, self).__init__(**kwargs)

        Doc = self.instance.register(Doc)
        doc = Doc()

However, this does not:

        class Doc(Document):

            name = fields.StrField()

            def __init__(self, **kwargs):
                super(Doc, self).__init__(**kwargs)

        SuperDoc = self.instance.register(Doc)
        doc = SuperDoc()

Here's what I get:

>       super(Doc, self).__init__(**kwargs)
E       TypeError: super(type, obj): obj must be an instance or subtype of type

Indeed, in this latter case, when in __init__, type(Doc) returns

'<class 'umongo.template.MetaTemplate'>'

In other words, it worked in the first case because at init time, Doc, in the namespace, now refers to the implementation. (I'm not familiar with Python internals, but that's how it looks.)

But in the general case, it seems broken because Doc still refers to the template.

BTW, this works (for illustration purpose but pretty useless in practice):

    def test_super_in_inheritance(self):

        class Doc(Document):

            name = fields.StrField()

            def __init__(self, **kwargs):
                super(SuperDoc, self).__init__(**kwargs)

        SuperDoc = self.instance.register(Doc)
        doc = SuperDoc()

The consequence of this is that some use cases are broken:

  • super + Multi-driver as described in the docs.
  • super + Registration happening in another file

I'm in that second use case and I don't see any workaround.

super(type(self), self).__init__(**kwargs) makes the test pass but I imagine it would break in a GrandParent - Parent - Child case.

Should we add some method to fetch the right implementation?

In the method, we know self and we know the template class (or we could know it if we kept a link to it at registration), so we could move up the inheritance tree from the implementation class until we find the one that corresponds to the template.
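
A minimal sketch of that idea, assuming implementations keep a reference to their originating template (the opts.template attribute and the helper name are assumptions, not existing umongo API):

def find_implementation(self, template_cls):
    """Walk the MRO of the instance until we reach the implementation
    class that was generated from template_cls (sketch)."""
    for cls in type(self).__mro__:
        if getattr(getattr(cls, 'opts', None), 'template', None) is template_cls:
            return cls
    raise TypeError('%r was not built from %r' % (type(self), template_cls))

# which would allow writing, inside __init__:
#     super(self.find_implementation(Doc), self).__init__(**kwargs)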

Am I missing something?

Synchronizing referencer and referenced document

I've been hit by this. Sometimes you have a Document and a referenced Document and when you're working on both instances at the same time you can find yourself in a sort of broken state.

I've pushed a branch with commented tests to explain the idea.

Currently, fetch() fetches from the database, so if you have a referenced Document instance you're working on, the changes won't be seen until it is committed and the Document is reloaded. This is shown here.

The branch shows two measures that would mitigate this:

  • Ensure that when assigning a Document to the reference field, that Document is set as _document so that fetch returns it even if it is not committed in DB. Shown here. Admittedly, it looks like a bit of a hack. The point is to see what behavior this would provide.

  • Provide a force_reload parameter to fetch() so that if the referenced Document instance is committed, it is possible to refresh the reference without reloading the whole Document. Shown here.

I'm not sure about the first approach. It looks like it strives to provide transparent reference management, but this is bound to fail at some point, so we'd rather keep things explicit.

On the other hand, the force_reload parameter is explicit and has limited impact. It just avoids reloading the whole Document (which could be problematic if work is ongoing). Without this (and without the other approach described above), there is no way to get fetch() to return the updated Document if it has changed in DB.

In fact, the more I think about it, the more force_reload seems like the right approach to me. Anyway, I'm exposing both here for feedback.
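
For illustration, here is how the proposed force_reload parameter would read in user code (a sketch of the proposal, not existing umongo API):

ref_doc = doc.ref.fetch()                   # may return the cached instance
ref_doc = doc.ref.fetch(force_reload=True)  # always re-queries the database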

Converting aggregation results to umongo objects

I'm using umongo to gather documents using Mongo's aggregation framework:

MyDocumentClass.collection.aggregate([
            {'$match': {
                'field1': value1,
                'field2': value2
            }},
            # more pipes
        ])

The resulting documents are plain dictionaries. It wouldn't be that bad since I want to convert them to JSON for serving over an API anyway, but I lose umongo's nice dump() integration that way and I would have to convert ObjectIds, dates, etc. myself.

My current solution is iterating over the results and converting them into my desired umongo class:

raw_docs = await above_future.to_list(None)
docs = [MyDocumentClass.build_from_mongo(doc) for doc in raw_docs]

Is there a better way to do this, or what is your recommended solution for converting aggregation results to umongo Document objects? (I can totally see why you wouldn't do this automatically, since aggregation results don't have to be structured like the modeled Document classes.)

abstract and allow_inheritance in EmbeddedDocument inheritance

It seems Meta attributes abstract and allow_inheritance have no effect in EmbeddedDocument. Indeed, they do not appear in EmbeddedDocumentOpts.

There is no collection consideration involved, but a model may still need abstract EmbeddedDocuments that are not meant to be instantiated, only subclassed. And one may want to allow inheritance on only some embedded documents.
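
For illustration, this is what the requested options would look like, reusing the Meta conventions Document already has (a sketch of the proposal, not current EmbeddedDocument behavior):

@instance.register
class BaseItem(EmbeddedDocument):
    name = fields.StrField()

    class Meta:
        abstract = True            # only meant to be subclassed, never instantiated


@instance.register
class SpecialItem(BaseItem):
    quantity = fields.IntField()

    class Meta:
        allow_inheritance = False  # no further subclassing allowed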

I think I addressed those in my PR since I copied a lot (maybe too much) from Document.

Document.created

Maybe it would be better to rename this to something like is_created? It's common practice to use created and updated for the creation and update times of a document.

EmbeddedDocument inheritance: GrandChild can't be used as Parent

While trying to create a test case for #56 (comment), I think I found another (unrelated) issue.

In test_embedded_inheritance:

        # Test child can be passed as parent
        # This passes
        MyDoc(parent={'cls': 'EmbeddedChild', 'a': 2})

        # Test grandchild can be passed as parent
        # This fails
        MyDoc(parent={'cls': 'GrandChild', 'd': 2})
        # Is this due to a recursion issue in embedded_document_cls.opts.children?
        # This fails as well
        assert 'GrandChild' in EmbeddedParent.opts.children

Either embedded_document_cls.opts.children should know about its grandchildren, or the test in EmbeddedField._deserialize should be recursive.
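
A sketch of what the recursive variant could look like; opts.children is assumed to hold class names (as the test above suggests), and retrieve_embedded_document is a hypothetical helper mapping a name back to its implementation:

def _iter_descendant_names(embedded_doc_cls):
    """Recursively yield the names of all children, grandchildren, etc."""
    for child_name in embedded_doc_cls.opts.children:
        yield child_name
        child_cls = retrieve_embedded_document(child_name)  # hypothetical lookup
        yield from _iter_descendant_names(child_cls)

# EmbeddedField._deserialize could then accept any name yielded by this
# generator, so 'GrandChild' would be recognized as a valid Parent.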

Deserialization of Embedded child from Mongo

I think there is an issue with deserialization of inherited embedded documents.

Basically, when a child embedded document is read from mongo, umongo should use the cls field to know which class to use, but it does not.

Something should happen in _deserialize_from_mongo.

This is what I tried to explain and illustrate in #56. This PR is pretty messy so I'm opening a new issue to hopefully make this clearer.

Here's a simple test case to reproduce. Add this code at the end of test_embedded_inheritance:

        # Test embedded child deserialization from mongo
        child = EmbeddedChild(c=69)
        doc = MyDoc(parent=child)
        mongo_data = doc.to_mongo()
        MyDoc.build_from_mongo(mongo_data)

The parent field in MyDoc does not know child is an EmbeddedChild instance, so it chokes on unknown fields. It should use the cls field to pick the right EmbeddedDocumentImplementation.
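
Roughly, the fix would need something of this shape in EmbeddedField (only a sketch: retrieve_embedded_document is a hypothetical lookup, the key holding the class name and the way the implementation is built may differ in umongo):

def _deserialize_from_mongo(self, value):
    cls_name = value.get('cls')  # or '_cls' in mongo world; exact key is an assumption
    if cls_name is not None:
        # pick the child implementation named in the data
        embedded_cls = retrieve_embedded_document(cls_name)  # hypothetical lookup
    else:
        embedded_cls = self._embedded_document_cls
    return embedded_cls.build_from_mongo(value)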

pymongo in requirements

I'm not sure, but maybe it's better to remove pymongo from the requirements.
Right now, if I do pip install umongo -U, it upgrades pymongo to the latest version (3.2.2).
But motor requires pymongo==2.8.0, so I have to reinstall it afterwards.

Tests fail if python-dateutil is installed

I'm having issues running the tests.

The docs say:

$ flake8 umongo
$ py.test tests
$ tox

Running py.test from Debian Jessie

I'm using a virtualenv, but py.test does not use the py.test version from the virtualenv and I don't know how to make it find it. I installed Debian Jessie's packaged pytest version, python3-pytest (v2.6.3, while the latest from pip is 2.9.2). To make sure it uses the virtualenv's Python, I call it like this:

python $(which py.test-3.4) tests

Using this command, I get a lot of errors related to skipif (see PR #16).

I also have a datetime issue (presumably because marshmallow uses python-dateutil to parse datetimes when it is installed, which yields timezone-aware values):

====================================================== test session starts ======================================================
platform linux -- Python 3.4.2 -- py-1.4.25 -- pytest-2.6.3
collected 133 items 

tests/test_data_proxy.py ..............
tests/test_document.py ...................
tests/test_fields.py .F..x......
tests/test_i18n.py ...
tests/test_indexes.py .....
tests/test_inheritance.py ....
tests/test_instance.py .........
tests/test_query_mapper.py ..
tests/frameworks/test_mongomock.py s
tests/frameworks/test_motor_asyncio.py ssssssssssssssssssssss
tests/frameworks/test_pymongo.py sssssssssssssssssssss
tests/frameworks/test_txmongo.py ssssssssssssssssssssss

=========================================================== FAILURES ============================================================
___________________________________________________ TestFields.test_datetime ____________________________________________________

self = <tests.test_fields.TestFields object at 0x7f58d52d9be0>

    def test_datetime(self):

        class MySchema(EmbeddedSchema):
            a = fields.DateTimeField()

        s = MySchema(strict=True)
        data, _ = s.load({'a': datetime(2016, 8, 6)})
        assert data['a'] == datetime(2016, 8, 6)
        data, _ = s.load({'a': "2016-08-06T00:00:00Z"})
>       assert data['a'] == datetime(2016, 8, 6)
E       assert datetime.datetime(2016, 8, 6, 0, 0, tzinfo=tzutc()) == datetime.datetime(2016, 8, 6, 0, 0)
E        +  where datetime.datetime(2016, 8, 6, 0, 0) = datetime(2016, 8, 6)

tests/test_fields.py:98: AssertionError
================================== 1 failed, 65 passed, 66 skipped, 1 xfailed in 0.67 seconds ===================================

Is Jessie's py.test supposed to be "supported" or should we only use the latest version from pip (in which case I'd appreciate a hint, as I don't know how to run it without tox)?

Running tox and latest py.test

Both errors don't appear when running

tox -e py34-pymongo

which I suppose uses pytest from the virtualenv. So these issues may be due to py.test itself.

Using tox, I get a lot of flake8 errors (including errors in umongo's dependencies) and the report says a test xfailed, but the test that failed with Jessie's pytest passes.

tests/test_data_proxy.py ..............
tests/test_document.py ...................
tests/test_fields.py ....x......
tests/test_i18n.py ...
tests/test_indexes.py .....
tests/test_inheritance.py ....
tests/test_instance.py .........
tests/test_query_mapper.py ..
tests/frameworks/test_mongomock.py s
tests/frameworks/test_motor_asyncio.py ssssssssssssssssssssss
tests/frameworks/test_pymongo.py sssssssssssssssssssss
tests/frameworks/test_txmongo.py ssssssssssssssssssssss

missing_accessor does not propagate to embedded documents

as_marshmallow_schema calls as_marshmallow_field on all fields, and if the field is an EmbeddedField, it calls as_marshmallow_schema on the embedded document class schema.

However, the missing_accessor parameter is lost along the chain as it is not passed to as_marshmallow_field.

A simple solution would be to pass it to as_marshmallow_field. It will be ignored in most cases, and used in EmbeddedField.

Currently, due to #73, this breaks the whole mongo_world feature, because when setting mongo_world to True, you can't reliably set missing_accessor to False (as it won't propagate). This specific sub-issue is solved with the simple fix here. But the issue remains for the OO world schema if the user wants to set missing_accessor to False for some reason.
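
A sketch of the simple fix: BaseSchema.as_marshmallow_schema would forward missing_accessor to each field's as_marshmallow_field(), and EmbeddedField would accept it and pass it on to the nested schema (signatures approximated; the internals are assumptions):

def as_marshmallow_field(self, params=None, mongo_world=False, missing_accessor=True):
    # Overwrite default `as_marshmallow_field` to handle nesting
    kwargs = self._extract_marshmallow_field_params(mongo_world)
    if params:
        kwargs.update(params)
    nested_ma_schema = self._embedded_document_cls.schema.as_marshmallow_schema(
        mongo_world=mongo_world, missing_accessor=missing_accessor)
    return ma_fields.Nested(nested_ma_schema, **kwargs)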

ListField(EmbeddedField): need to manually declare modifications in EmbeddedField

When an embedded document is modified in a ListField, the modification is not automatically detected. The list itself is not modified, but its elements are mutable, so either they should notify the ListField when they are modified, or the ListField should check its elements when asked for changes.

Here's a (failing) test: https://github.com/Nobatek/umongo/tree/embeddeddoc_listfield

Is this a known issue? I'm not sure how to fix it. I tried overriding is_modified and clear_modified in List, but I ended up breaking things.

Current workaround is calling set_modified on the ListField explicitly when modifying an element (see comment in test).
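
A sketch of the second option, with the List data object checking its own elements (the _modified flag and the method names are assumptions based on the names used in this issue):

class List(list):
    # ... existing umongo List code ...

    def is_modified(self):
        # modified if the list itself changed, or if any of its elements did
        return self._modified or any(
            getattr(item, 'is_modified', lambda: False)() for item in self)

    def clear_modified(self):
        self._modified = False
        for item in self:
            if hasattr(item, 'clear_modified'):
                item.clear_modified()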

Add on_set / on_get callbacks to fields

Following the discussion here.

I wrote:

I still think it would be nice to have a way to call the callback at get or set time.

class User(Document):
    birthday = DateTimeField(required=True, on_set=self._compute_age)  # either here
    age = IntField(required=True, on_get=self._compute_age)  # or there

    def _compute_age(self):
        self.age = (datetime.utcnow() - self.birthday).days / 365

One nice thing with umongo is that validation occurs at set time, so the object is always valid. You don't have to wait until commit (or explicit validation) to check that. get/set callbacks would allow those dependencies between fields to be enforced anytime just as well.

@touilleMan answered:

My 0.2$ on this: given that an umongo document internally keeps its data in the mongo world representation, it would be a bit cumbersome to add a check to do dynamic evaluation when retrieving a field (like you propose with on_get).
On the other hand, on_set seems much easier given that data always goes through the BaseField.deserialize_from_mongo method. We could add a simple check there to call the function if defined.
However, this will create small troubles if multiple fields are updated and have their on_set param pointing to the same callback. The user would like the callback to be called once with the final value; instead it will be called twice, the first time with only half the data up to date...

I didn't think about the update use case. It seems this feature is more complicated than I expected.

I'd like to add a new requirement. on_[g|s]et should be lists of callbacks rather than callbacks, to allow several callbacks to be set, with a deterministic order.

class User(Document):
    birthday = DateTimeField(required=True, on_set=[self._compute_age, another_callback])  # either here
    age = IntField(required=True, on_get=[self._compute_age, yet_another_callback])  # or there

    def _compute_age(self):
        self.age = (datetime.utcnow() - self.birthday).days / 365

Mess of pymongo versions

Hi :)

It looks like the pymongo driver is implemented for pymongo 3, but the motor driver for pymongo 2.
So if I install the lib for motor, I can't use it with pymongo. E.g. if I try to run the homepage sample, I get:

Traceback (most recent call last):
  File "test_umongo_1.py", line 18, in <module>
    goku.commit()
  File "/home/imbolc/.pyenv/versions/3.5.0/lib/python3.5/site-packages/umongo/dal/pymongo.py", line 70, in commit
    ret = self.collection.insert_one(payload)
  File "/home/imbolc/.pyenv/versions/3.5.0/lib/python3.5/site-packages/pymongo/collection.py", line 1773, in __call__
    self.__name.split(".")[-1])
TypeError: 'Collection' object is not callable. If you meant to call the 'insert_one' method on a 'Collection' object it is failing because no such method exists.

How do you test an app?

We have a web app which uses MongoDB via MongoEngine. In our tests we use https://github.com/theorm/mongobox to start a MongoDB instance on a port before the tests start; then, before each test method (we are using unittest), all the collections and databases are dropped. It works for us.

But I see that Mongobox is no longer developed and has some unfixed bugs. This makes me think that other people use something different in their tests.

I've seen mentions of mongomock ( http://docs.mongoengine.org/guide/mongomock.html ) but I'd like to use a real MongoDB instance in our tests.

What do you use?

async / await style

Hi :)

MotorAsyncIODal.commit doesn't work with the await syntax. I think it's better to rewrite all the async code in the 3.5 style; nobody cares about 3.4. Thanks for the great lib! Do you have time to work on the docs?
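
For reference, this is the 3.5-style usage being asked for (a sketch assuming commit() and find_one() are coroutines under the motor/asyncio framework):

async def create_user(User):
    user = User(name='Goku')
    await user.commit()                           # instead of yield from user.commit()
    return await User.find_one({'name': 'Goku'})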

Value to return by document.get('field') when field is missing in DB/data_proxy

I'm opening a dedicated issue for this topic, to follow up on #23.

Currently, when a value is missing in the data proxy, None is returned:

    def get(self, name, to_raise=KeyError):
        if name not in self._fields:
            raise to_raise(name)
        field = self._fields[name]
        name = field.attribute or name
        value = self._data[name]
        if value is missing:
            if self.partial:
                raise FieldNotLoadedError(name)
            elif field.default is not missing:
                return field.default
            else:
                return None
        return value

@touilleMan:

The use of None is always slippery given you never know what it really means.

The policy in umongo is to say "None is None", so if you want to set a field to None you must have set allow_none=True in your field config.

The only exception is fields that are missing in the database, which are represented to the user as None, given that comparing against a missing object would be too cumbersome (thus we could make the missing value configurable, and None by default).

Another option is to raise an exception, just like any dict would do when a value is missing. The advantage of this is that it always respects the difference between None and "no value". But it is less practical to manage (you need to catch the exception or call get with a default value).

So there's a trade-off here between ease of use and absolute data integrity.

@touilleMan's proposal is that in most cases, None is equivalent to "no value", so let's keep things simple, and for users who really need to make the distinction, introduce a feature allowing them to specify another default value.

Maybe we could extend this last feature by allowing the user to ask for an exception to be raised rather than a return value.

I have no definitive opinion about this. I'm just trying to summarize the options.

Another consequence of this choice: when serializing a Document using schema.dump(document), the value returned by document.get (None or another default value) will appear in the dump. If get raises an exception, however, the field disappears from the dump. Right now, I like this behavior (it allows my API to only return values that are in the database), but I don't have the expertise to tell if it is the best. And anyway, maybe it is the user's problem and it shouldn't be taken into account by umongo.

pymongo2 and data_objects.List

When I run the following code:

user = User.find_one({"email": '[email protected]'})
print(user.friends)

With the pymongo driver + pymongo 3, I get:

<object umongo.data_objects.List([<object umongo.dal.pymongo.PyMongoReference(document=User, pk=ObjectId('572192eb444bcb21df700488'))>])>

But if I run the same code with the pymongo driver + pymongo 2.8, I get:

<object umongo.data_objects.List([])>

So data_objects.List doesn't work with pymongo 2.

Should EmbeddedDocument child of abstract EmbeddedDocument have a cls field?

EmbeddedDocument and Document behave differently with regard to the cls field:

  • If ChildDocument inherits from AbstractDocument, it has no cls field.
  • If ChildEmbeddedDocument inherits from AbstractEmbeddedDocument, it gets a cls field.

This results from a difference in the code:

def _is_child(bases):
    """Find if the given inheritance leeds to a child document (i.e.
    a document that shares the same collection with a parent)
    """
    return any(b for b in bases if issubclass(b, DocumentImplementation) and not b.opts.abstract)

def _is_child_embedded_document(bases):
    """Same thing than _is_child, but for EmbeddedDocument...
    """
    return any(b for b in bases
               if issubclass(b, EmbeddedDocumentImplementation) and
               b is not EmbeddedDocumentImplementation)

I'm not sure I understand the rationale behind this.

We want to create an abstract parent EmbeddedDocument in our app with shared fields and methods for our EmbeddedDocuments to inherit, and now we get cls fields in every EmbeddedDocument, which pollute the API output. It would be easier to deal with this if it only happened when a concrete EmbeddedDocument is subclassed (which is much less frequent).

It's easy not to add cls to a concrete EmbeddedDocument when its parent is abstract. Just copy what Document does:

def _is_child_embedded_document(bases):
    """Same thing than _is_child, but for EmbeddedDocument...
    """
    return any(b for b in bases
               if issubclass(b, EmbeddedDocumentImplementation) and not b.opts.abstract)

but if we do that, we should also prevent EmbeddedField from accepting an abstract EmbeddedDocument, otherwise, it won't be able to deserialize that concrete child.

Is this the reason you did things this way?

It boils down to what we expect from an abstract EmbeddedDocument.

  • For real polymorphism, where the abstract EmbeddedDocument is Vehicle and its children are Car and Truck, it makes sense to keep things the way they are. We may want to create an EmbeddedField that expects a concrete Vehicle subclass but not a Vehicle.

  • If we create a SuperEmbeddedDocument for our EmbeddedDocuments to inherit, it does not make much sense to accept it in EmbeddedField and I don't see the use for the cls field.

It looks like we're dealing with two different concepts. Are we abusing inheritance by doing this? Should we do it differently? Should there be two different levels of abstraction?

Notes:

  • If we keep things this way, we may need to add a protection in EmbeddedField._deserialize because IIUC, if value has no cls key, it will try to instantiate the parent abstract EmbeddedDocument, which will trigger umongo.exceptions.AbstractDocumentError: Cannot instantiate an abstract EmbeddedDocument rather than a ValidationError (I could be wrong, I didn't test that).

  • Such a restriction applies to Document: you can't reference an abstract VehicleDocument since it has no collection.

  • I don't see any simple way to prevent the API from spitting those cls fields out. Any nested structures with a little bit of complexity would require awful params trickery in as_marshmallow_schema.

check_unknown_fields raises ValidationError if passed a dump_only field even if value is 'missing'

I'm using webargs to parse API requests using the Schema I get from the Document and I have an issue with dump_only fields.

Consider a document/resource user with dump_only id field.

When sending a request like this:

http POST http://127.0.0.1:5000/user/ name=Test

webargs searches for all attributes in the query locations and original_data in check_unknown_fields is:

{'id': <marshmallow.missing>, 'name': 'Test'}

Then, since id is not in loadable_fields, check_unknown_fields raises an error.

    @validates_schema(pass_original=True)
    def check_unknown_fields(self, data, original_data):
        loadable_fields = [k for k, v in self.fields.items() if not v.dump_only]
        for key in original_data:
            if key not in loadable_fields:
                raise ValidationError(_('Unknown field name {field}.').format(field=key))

This happens when I'm reusing the Schema in webargs, not on typical umongo usage like in the Flask example.

Possible fixes/workarounds:

Stick to the version of check_unknown_fields suggested in Marshmallow's documentation:

    @validates_schema(pass_original=True)
    def check_unknown_fields(self, data, original_data):
        for key in original_data:
            if key not in self.fields:
                raise ValidationError('Unknown field name {}'.format(key))

Doing this removes the error but also won't raise any error for

http POST http://127.0.0.1:5000/user/ name=Test id=whatever

If we want to get an error for this, then another fix could be

    @validates_schema(pass_original=True)
    def check_unknown_fields(self, data, original_data):
        for key, value in original_data.items():
            if key not in self.fields:
                raise ValidationError(_('Unknown field name {field}.').format(field=key))
            if self.fields[key].dump_only and value is not missing:
                raise ValidationError(_('Field {field} is dump_only.').format(field=key))

which also raises a different message for dump_only and unknown fields.

Another fix would be to stop webargs from parsing dump_only fields in the first place so that they don't appear in original_data. After all, I don't see the point in adding those fields since they are ignored in the next step.

It would happen here:

            for argname, field_obj in iteritems(argdict):
                argname = field_obj.load_from or argname
                parsed_value = self.parse_arg(argname, field_obj, req, locations)
                parsed[argname] = parsed_value

This could be changed into

            for argname, field_obj in iteritems(argdict):
                if field_obj.dump_only:
                    continue
                argname = field_obj.load_from or argname
                parsed_value = self.parse_arg(argname, field_obj, req, locations)
                parsed[argname] = parsed_value

But this would also silently mask cases where data is provided to a dump_only field:

http POST http://127.0.0.1:5000/user/ name=Test id=whatever

so maybe this is not any better.

Besides, I'm not familiar enough with webargs' insides to be confident with this change.

Anyway, if we want to raise an error if a dump_only field receives a value, we should make sure it is not "missing".

What do you think about the value is not missing test alternative?

Problem when modifying a Document pk

If I create and commit a Document, then change its id, I get an error when committing:

    def test_create(self, classroom_model):
        Student = classroom_model.Student
        john = Student(name='John Doe', birthday=datetime(1995, 12, 12))
        ret = john.commit()

        print(john.id)
        # 58580041f667370f03a677e1
        print(john.is_created)
        # True
        john.id = ObjectId("5672d47b1d41c88dcd37ef05")
        # No error
        print(john.is_created)
        # True
        john.commit()
        # umongo.exceptions.UpdateError: <pymongo.results.UpdateResult object at 0x7f19b14d4a20>

This is because in commit(), we fall into the update case and pymongo searches for a doc with the new id.

umongo should either prevent changing the ID field after the Document has been committed, or take necessary precautions if allowing it.

If we want to allow it, then maybe the role of is_created needs to be redefined, I don't know. At least, there should be a way to remember the old pk value so that the update can find the document.

I'm not sure this is a critical use case, but if feasible, I'd rather allow it.

Labelling this as a bug as it leads to an UpdateError exception while nothing tells the user they're doing something wrong.

Issue passing a missing value to a ListField

Current code:

class ListField(BaseField, ma_fields.List):

    def __init__(self, *args, **kwargs):
        kwargs.setdefault('default', [])
        kwargs.setdefault('missing', lambda: List(self.container))
        super().__init__(*args, **kwargs)

If I put this in my model

test = fields.ListField(
    fields.IntField(),
    missing=[1,2,3])

then when instantiating my Document, I end up with a list instead of a umongo List, which is not good. For instance, when appending to the list, document.is_modified() returns False. There may be other consequences.

Is the user expected to import umongo's List and feed it to the field (not sure that actually works)? From a user perspective, since setting the default to anything other than a List is pretty dangerous, maybe there should be a safeguard here.

Or should we sort of cast missing to List in the __init__? Not so obvious as List([1,2,3]) or List(container, [1,2,3]) won't work.

It's not a blocking issue for me but I thought I'd write about it here before I forget.

Edit: In real life, don't pass a mutable as default. It would be shared by all field instances. Use a tuple.

marshmallow {pre,post}_{load,dump} functions handled as fields

When I added a @pre_load decorator to a function in my Document class:

@instance.register
class Test(Document):
    name = fields.StringField()

    @pre_load
    def test(self, data):
        return data

the function is handled like a field and I end up with the error:

(...)
  File "/<path_to_venv>/lib64/python3.4/site-packages/umongo/schema.py", line 15, in find_id_field
    if (name == '_id' and not field.attribute) or field.attribute == '_id':
AttributeError: 'function' object has no attribute 'attribute'

I checked the tests (test_document.py), and it looks like the error is avoided there by defining

name = fields.StrField(attribute='_id')  # Overwrite automatic pk

which, to me, is not the point of these tests.

Also, the tests are run with their own MokedBuilder, which avoids testing with the real builders. There I found another error (again because the function is treated like a field):

  File "/<path_to_venv>/lib64/python3.4/site-packages/umongo/frameworks/pymongo.py", line 296, in _patch_field
    validators = field.io_validate
AttributeError: 'function' object has no attribute 'io_validate'

I corrected the errors where they appear, for now, but I think a real fix would look different.

The function builder._collect_fields() is responsible for separating "fields and non-fields" as the docstring says, but I am not sure why items with a __marshmallow_tags__ attribute are included in the fields here.

Please help me find the right place to fix this issue.
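
For what it's worth, this is the kind of separation I would have expected in builder._collect_fields (only a sketch; the real umongo logic differs, and the hook functions carrying __marshmallow_tags__ would still need to be forwarded to the generated schema some other way):

from umongo.abstract import BaseField

def _collect_fields(namespace):
    """Split a class namespace into real fields and everything else (sketch)."""
    fields, non_fields = {}, {}
    for name, value in namespace.items():
        if isinstance(value, BaseField):
            fields[name] = value
        else:
            # @pre_load / @post_dump decorated functions end up here instead of
            # being treated as fields
            non_fields[name] = value
    return fields, non_fields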

Can mongo_world=True work with missing_accessor=True?

AFAIU, in as_marshmallow_schema, one can pass

missing_accessor=True, mongo_world=True

But it is bound to fail, because schema_from_umongo_get_attribute expects obj to be an object, not a dict.

def schema_from_umongo_get_attribute(self, attr, obj, default):
    ret = MaSchema.get_attribute(self, attr, obj, default)
    if ret is None and ret is not default and attr in obj.schema.fields:
        raw_ret = obj._data.get(attr)
        return default if raw_ret is missing else raw_ret
    else:
        return ret

Shouldn't this be prevented? Like, by setting missing_accessor to False when mongo_world is True?

EmbeddedDocument required field validation issue

When a field is required, it is still possible to instantiate the Document without any value for this field; the validation error only occurs at commit time.

However, this does not seem to apply to embedded documents:

        @self.instance.register
        class MyEmbeddedDocument(EmbeddedDocument):
            a = fields.IntField(required=True, dump_only=True, missing=69)

        @self.instance.register
        class MyDoc(Document):
            e = fields.EmbeddedField(MyEmbeddedDocument)
            l = fields.ListField(fields.EmbeddedField(MyEmbeddedDocument))
            b = fields.IntField(required=True)

        data = {'e': {}, 'l': [{}]}

        d = MyDoc(**data)

This raises

marshmallow.exceptions.ValidationError: {'l': {0: {'a': ['Missing data for required field.']}}, 'e': {'a': ['Missing data for required field.']}}

I think we have two errors, here:

  • The required condition should not be checked at instantiation time.
  • No error should be reported when the field is both required and dump_only.

I've been digging into this for hours and couldn't find the root cause, not even a workaround.

Pass meta attributes to schema in as_marshmallow_schema

Not sure whether this is a feature request or a question.

I would like to exclude a field from a schema of an EmbeddedDocument when calling as_marshmallow_schema on its container Document.

To do that on a Schema generated by as_marshmallow_schema, I would just call as_marshmallow_schema to get the Schema then subclass it to add a Meta class:

schema = Doc.as_marshmallow_schema()

class MySchema(schema):
    class Meta:
        exclude = ('hidden_field',)

It would be cool to be able to call

schema = Doc.as_marshmallow_schema(meta={'exclude': ('hidden_field',)})

but it's not critical, at least in my current use of umongo.

It becomes more useful if the Schema I need to modify is nested in the document, so that the call to as_marshmallow_schema is not direct but cascaded through EmbeddedField. In this case, the meta attributes need to be provided to as_marshmallow_schema by as_marshmallow_field in EmbeddedField.

Maybe it could take the (absolutely untested) following form:

    # In BaseSchema
    def as_marshmallow_schema(self, params=None, base_schema_cls=MaSchema,
                              check_unknown_fields=True, missing_accessor=True,
                              mongo_world=False, meta=None):

        [...]
        if meta is not None:
            # Do what is to be done

    # In EmbeddedField
    def as_marshmallow_field(self, params=None, mongo_world=False):
        # Overwrite default `as_marshmallow_field` to handle nesting
        kwargs = self._extract_marshmallow_field_params(mongo_world)
        if params:
            nested_params = params.pop('params', None)
            nested_meta = params.pop('meta', None)
            kwargs.update(params)
        else:
            nested_params = None
            nested_meta = None
        nested_ma_schema = self._embedded_document_cls.schema.as_marshmallow_schema(
            params=nested_params, mongo_world=mongo_world, meta=nested_meta)
        return ma_fields.Nested(nested_ma_schema, **kwargs)

Or am I missing an already existing feature?

ListField not validated on append

I'm not sure whether this is expected behavior or not.

One can set a validator on a ListField, but it is used only when setting values, not when appending.

Here's a trivial example:

        @db_instance.register
        class Doc(Document):
            l = fields.ListField(
                fields.IntField,
                validate=NoneOf(([1, 2, 3],))
            )

        d = Doc()

        with pytest.raises(ValidationError):
            d.l = [1, 2, 3]

        d.l = [1, 2]

        # The test fails here. No Exception raised.
        with pytest.raises(ValidationError):
            d.l.append(3)

        # However, an exception is raised here.
        with pytest.raises(ValidationError):
            d.l = d.l

This does not seem consistent. Assuming this is a bug/shortcoming, I'm not sure how I would address this.

My use case is a custom validator that checks that a subitem is unique in a list of embedded documents: I have a ListField of EmbeddedField and I want to be sure that there are no two embedded documents with the same value for some specific field (or list of fields).

The validator is applied to the list, as it can obviously not be applied to the embedded documents individually.

Worse, I need it to be applied not only when an embedded document is appended to the list, but also when an embedded document in the list is mutated. If the latter is not feasible, I guess I can work around this by making the field read_only in my API. Better than nothing.

Is there support for OrderedDict

If you want to find embedded documents, the order of the key/value pairs in the embedded document is essential, as 'find' will only match documents with the same order. As the mongo docs say (https://docs.mongodb.com/manual/tutorial/query-documents/#exact-match-on-the-embedded-document):

Equality matches on an embedded document require an exact match of the specified document, including the field order.
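
For illustration, this is the kind of query where the stored order matters (plain pymongo, with made-up collection and field names; db is a pymongo Database):

from collections import OrderedDict

# Matches only documents whose embedded 'address' has exactly these fields,
# stored in exactly this order.
db.users.find_one({'address': OrderedDict([('street', 'Main St'), ('zip', '12345')])})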

pymongo thus supports python OrderedDict:
http://stackoverflow.com/a/30787769/4273834

as does marshmallow:
http://marshmallow.readthedocs.io/en/latest/quickstart.html#ordering-output

Is there a way to use OrderedDict in umongo?

EmbeddedDocument inheritance

Currently, only Document supports inheritance. EmbeddedDocument inheritance should be covered as well.

I could try to have a go at it, although I'm not sure when and how difficult it would be. It impacts the core of umongo.

@touilleMan, do you have any advice? Is it something you'd rather do yourself?

DateTime and timezone awareness

I'm pulling my hair with datetime TZ awareness issues.

I initiate the connection to MongoDB with tz_aware = False. I'm no expert about this, and I had never thought much about it before now, but from what I gathered, it seems like a reasonable choice to make. Besides, it's pymongo's default (but Flask-PyMongo hardcodes tz_aware to True).

When a document is pulled from the DB, its DateTimeField attribute's TZ awareness depends only on MongoClient's tz_aware parameter (no Marshmallow schema involved):

tz_aware=True -> pymongo provides an aware datetime -> umongo returns an aware datetime
tz_aware=False -> pymongo provides a naive datetime -> umongo returns a naive datetime

This is one more reason to set tz_aware = False, because I'm passing a document/object to a lib that expects a naive datetime. (The OAuth2 lib expects the expires timestamp to be naive and compares it to datetime.utcnow() (see code).) I suppose I could alternatively modify my getters to make all datetimes naive before returning tokens/grants, but in some use cases it could get cumbersome.

Marshmallow, however, returns every datetime as TZ aware (doc). For this reason, umongo's DateTimeField's _deserialize method returns a TZ aware datetime. Since I'm using it (via webargs) to parse the inputs to my API, I'm getting TZ aware datetimes. Likewise, calling load() on a document will use _deserialize and result in a TZ aware datetime.

So if I load a date, it becomes TZ aware. Therefore, to compare it to a date from the database, this one needs to be aware as well.

Should I understand that umongo is meant to be used with tz_aware=True, so that dates fetched from the database can be compared to dates loaded through Marshmallow schemas?

Could there be a flag/meta allowing to specify if a DateTimeField should return a naive datetime?

I made a quick and dirty patch to DateTime's _deserialize to remove the TZ from the returned output.

    def _deserialize(self, value, attr, data):
        dt = super()._deserialize(value, attr, data)
        return dt.replace(tzinfo=None)

This seems to work for my use case. However, the day we complete the pure Marshmallow schema export, the exported schema I pass to webargs for API input parsing won't have that feature. Unless this is added to Marshmallow as well. (I asked about it there: marshmallow-code/marshmallow#520.)

Feedback welcome. Those TZ issues are new to me, so I may be totally misguided.

Allow exportation of the Marshmallow Schema of a Document

Currently, the Schema of a Document can be obtained from Document.Schema. However, this Schema is made to (de)serialize between the "DB / data_proxy" and "OO world", not between the "OO world" and the "client/JSON world". (See the diagram in the docs.)

In other words, uMongo is made to be used like this:

    document.dump(schema=schema)
    # which is equivalent to
    schema.dump(document._data._data)
    # but not equivalent to
    schema.dump(document)

The difference being that the data_proxy does not behave like the document:

  • It may have keys named differently if "attribute" is used
  • document returns None if a key is missing
  • ...

Therefore, using the Schema to serialize a Document may work but it currently has corner cases.

@touilleMan confirms that the ability to export a Marshmallow Schema without the uMongo specificities is in the scope of uMongo and is a feature we want to have.

The idea could be to add a method or attribute to the document to provide that "cleaned up" Schema.

I'm opening this issue to centralize the reflections about that.

Currently, here are the issues I found:

  • check_unknown_fields raises ValidationError if passed a dump_only field even if value is missing, which is an issue when validating a document before deserialization. (Bug report: #18, PR: #19)
  • Some uMongo fields fail if value is None. Maybe this one is unrelated but it just happened to occur while calling schema.dump(document) (PR: #32)
  • Fields with "attribute" not being None will fail because the Schema tries to find the value in "attribute", while in the document, it is available at the field's name. To avoid this, we could set "attribute" to None in all the fields. (PR: #33)

#33 drafts a way of exporting the Marshmallow Schema from the document.

Validation issue

Hi.

I'm a bit confused about document validation.

Consider this test from test_pymongo.py:

    def test_required(self, classroom_model):

        Student = classroom_model.Student
        student = Student(birthday=datetime(1968, 6, 9))

        with pytest.raises(exceptions.ValidationError):
            student.io_validate()

        with pytest.raises(exceptions.ValidationError):
            student.commit()

        student.name = 'Marty'
        student.commit()

The idea is that before a newly created document is committed, all its fields are validated.

See io_validate():

    def io_validate(self, validate_all=False):
        """
        Run the io_validators of the document's fields.

        :param validate_all: If False only run the io_validators of the
            fields that have been modified.
        """
        if validate_all:
            _io_validate_data_proxy(self.schema, self._data)
        else:
            _io_validate_data_proxy(
                self.schema, self._data, partial=self._data.get_modified_fields())

When calling io_validate() on a new document, with validate_all=False by default, we call

_io_validate_data_proxy(self.schema, self._data, partial=self._data.get_modified_fields())

But self._data.get_modified_fields() is equal to [] (empty list) because data_proxy's load() never marks the fields as modified.

Thus, in _io_validate_data_proxy(), partial is [] and evaluates to False so that all fields are validated.

Good, because we want those fields validated, but it looks like this was unintended.

Shouldn't all the fields be marked as modified, so that they are listed explicitly when calling get_modified_fields()? I thought a field was marked as modified as soon as it differed from what is in the database.

It becomes an issue if we modify a field after loading the document:

        Student = classroom_model.Student
        student = Student(birthday=datetime(1968, 6, 9))

        print(student._data.get_modified_fields())
        # prints []
        del student.birthday
        print(student._data.get_modified_fields())
        # prints ['birthday']

        with pytest.raises(exceptions.ValidationError):
            student.io_validate()
            # This time, only birthday is checked, not name.
            # Failed: DID NOT RAISE <class 'marshmallow.exceptions.ValidationError'>

In this case, partial is ['birthday'] and the other fields are never checked. The required condition on name is never checked and validation passes.

AFAIU, the root cause is the fields not being marked as changed on load. Is that it?

Rethink the missing/default field attributes

Those attributes come from marshmallow, which is all about serializing/deserializing a document:

  • missing is used when the field is not present in the dict to deserialize
  • default is used when the field is not present in the document to serialize

However, in umongo the focus shifts toward the document itself, so those terms seem a bit clunky:

  • the missing object is already used inside umongo's DataProxy to represent field not present in MongoDB. Hence the missing attribute should mean what value to return if the value is missing
  • the default attribute sounds like the value to set by default to the field when creating a new document

In a nutshell, the meanings of missing and default are reversed between marshmallow and umongo...

What I'm thinking to do:

  • Remove the missing attribute (in fact just hide it from abstract.BaseField constructor and documentation)
  • Only use the default attribute for both missing and default (in marshmallow's logic)
  • Add methods to get from the umongo document an equivalent pure mashmallow Schema (see #34 )

The idea is to hide the marshmallow logic and expose a more consistent API from the umongo user's point of view, then provide a way to get back a pure Marshmallow Schema when needed for all the custom serialization work.

example:

@instance.register
class Person(Document):
    name = fields.StrField(default='John Doe')

p = Person()
# Default is set
assert p.name == 'John Doe'
# Default value will be written in database
assert p._data.get('name') == 'John Doe'
# If we want more cunning behavior (e.g. only save in database non-default value)
# we should use a `MethodField` to define custom method for serialization/deseriazation

del p.name
assert p._data.get('name') == missing
# If not present on database, we switch back to default value as well
assert p.name == 'John Doe'

# Now it's customization time !
PersonSchema = p.get_marshmallow_schema()
class MyCustomSchema(PersonSchema):
    ...

# It could be also useful to provide method to get a specific field
class MyCustomSchema2(marshmallow.Schema):
    name = p.get_marshmallow_field('name')

@lafrech What do you think ?

Store byte object as field

Is there a way to store a Python bytes object as a field using umongo, without relying on GridFS?
In my case I want to store pretty small binary objects that change rarely.
In pymongo the recommended way seems to be a BSON Binary field, but I cannot find a related field in umongo's fields.py.
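
In the meantime, a custom field wrapping bson.Binary might do. A rough sketch follows; BaseField comes from umongo.abstract, and the _serialize_to_mongo/_deserialize_from_mongo hook names are assumed from how other umongo fields are described in these issues:

from bson import Binary
from marshmallow import fields as ma_fields
from umongo.abstract import BaseField

class BytesField(BaseField, ma_fields.Field):
    """Store small binary blobs directly in the document (no GridFS)."""

    def _serialize(self, value, attr, obj):
        # OO world -> client world
        return bytes(value) if value is not None else None

    def _deserialize(self, value, attr, data):
        # client world -> OO world
        return bytes(value)

    def _serialize_to_mongo(self, obj):
        # OO world -> mongo world (BSON binary)
        return Binary(obj)

    def _deserialize_from_mongo(self, value):
        # mongo world -> OO world
        return bytes(value)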

pymongo: UnboundLocalError when committing unmodified Document

To reproduce, you may for instance duplicate this line in the tests to commit a Document twice in a row:

    def test_create(self, classroom_model):
        Student = classroom_model.Student
        john = Student(name='John Doe', birthday=datetime(1995, 12, 12))
        ret = john.commit()
        ret = john.commit()

The problem is in commit():

self.is_created = True
payload = None

then ret is never assigned a value before return ret, hence the UnboundLocalError.

Motor 1.0+ support

Motor 1.0 was released on 3 Nov 2016; it also has some support for MongoDB 3.4.

But, for now, umongo only supports motor < 1.0.

Auto-indexes in Instance?

What would you think about a kind of semi-automatic, opt-in index generation like this?

# Mark Document for indexing
@instance.register(auto_index=True)
class MyDocument(Document):
...

In Instance:

    def register(self, template, as_attribute=True, auto_index=False):
        # Retrieve the template if another implementation has been provided instead
        template = get_template(template)
        if issubclass(template, DocumentTemplate):
            implementation = self._register_doc(template)
+           if auto_index:
+               self._auto_indexes.append(implementation)
        else:  # EmbeddedDocumentTemplate
+           if auto_index:
+               raise SomeException
            implementation = self._register_embedded_doc(template)
        if as_attribute:
            setattr(self, implementation.__name__, implementation)
        return implementation

+   def ensure_indexes(self):
+       for doc in self._auto_indexes:
+           doc.ensure_indexes()

umongo.data_objects.List with asyncio

How to use umongo.data_objects.List with motor driver?

If I use the pymongo driver with pymongo 3, I get:

user = User.find_one({"email": '[email protected]'})
print(user.friends)

<object umongo.data_objects.List([<object umongo.dal.pymongo.PyMongoReference(document=User, pk=ObjectId('572192eb444bcb21df700488'))>])>

But if I use the motor driver with pymongo 2, I get:

user = await User.find_one(email='[email protected]')
print(user)

<object Document __main__.User({'birthday': datetime.datetime(1984, 11, 20, 0, 0), 'email': '[email protected]', 'friends': <object umongo.data_objects.List([])>, '_id': ObjectId('572192eb444bcb21df700488')})>

Could you explain how data_objects.List works? Should it fetch the related data in the main query? Or, if it is lazy, we need something like await user.friends.fetch() for async code, because we can't use a getter here.

ReferenceField lazy/auto dereference

Currently, a ReferenceField deserializes as (for instance) a PyMongoReference, and must be dereferenced manually using fetch().

from pymongo import MongoClient
from umongo import Instance, Document, fields

db = MongoClient().test
instance = Instance(db)

@instance.register
class Ref(Document):
    a = fields.IntField()

@instance.register
class Doc(Document):
    ref = fields.ReferenceField(Ref)

ref = Ref(a=12)
ref.commit()

# Here's my object
print(ref)
# <object Document __main__.Ref({'a': 12, '_id': ObjectId('57c4ae2cf66737156a9ebfff')})>

# Let's reference it in another object
doc = Doc(ref=ref)

# I can't get it directly from here
print(doc.ref)
# <object umongo.frameworks.pymongo.PyMongoReference(document=Ref, pk=ObjectId('57c4ae2cf66737156a9ebfff'))>

# I need to call fetch manually
print(doc.ref.fetch())
# <object Document __main__.Ref({'a': 12, '_id': ObjectId('57c4ae2cf66737156a9ebfff')})>

This is not as practical as automatic dereferencing (systematic or lazy as in MongoEngine).

Is this a design choice (simplicity) or something that you'd like to see improved?

In practice, I could call fetch() every time I'm accessing a ref field, but there are cases where you need to pass an object to another method you can't modify (imported library) and you really want doc.ref to be the referenced document, not a PyMongoReference.

print(doc.ref.a)
AttributeError: 'PyMongoReference' object has no attribute 'a'

And I can't even dereference every reference manually to get a "clean" object:

# This won't work
doc.ref = doc.ref.fetch()
print(doc.ref)
# <object umongo.frameworks.pymongo.PyMongoReference(document=Ref, pk=ObjectId('57c4ae2cf66737156a9ebfff'))>

I'm kinda stuck. Is there a way around this? Am I missing something?
