
Store byte object as field (umongo issue, open)

scille commented on May 30, 2024
Store byte object as field

from umongo.

Comments (8)

touilleMan commented on May 30, 2024

That's a good idea. Besides, this doesn't seem hard to add: just create a new ByteField that mimics marshmallow's String but checks for bytes in _deserialize instead of basestring.

PRs welcome 👍


thodnev commented on May 30, 2024

@touilleMan @chenjr0719 @lafrech @martinjuhasz
Guys, this issue seems really old, but this is where search engines lead you when you're looking for ways to add binary fields to a uMongo document. In my opinion, it is a shame that Marshmallow does not provide a binary field. They leave it out on purpose, but there are really no good reasons for doing so:

  • BSON spec has Binary and Mongo supports it.
  • A binary field is needed for numerous applications (whether it is an avatar, some hash or a small blob). And this is where Mongo is strong in terms of efficiency.
  • People are trying to store bytes either as a UTF-8 encoded string, which will eventually fail (for example, b'\xd5\xce\xe1\x86\xcf' is not valid UTF-8), or as a base64-encoded value, which is more reliable but introduces inconveniences (no obvious way to check length, slice, ... without decoding first) and computational overhead.
  • Others are trying to store blobs in GridFS. Stack Overflow is full of such recommendations. Of course, this is not the use case GridFS was originally made for.
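
The point about UTF-8 versus base64 can be checked with the standard library alone. This is a minimal sketch using the byte string quoted in the list; the variable names are only illustrative:

```python
import base64

raw = b'\xd5\xce\xe1\x86\xcf'  # the 5-byte example quoted in the list

# Treating arbitrary bytes as UTF-8 text fails as soon as a sequence is invalid.
try:
    raw.decode('utf-8')
    decodable = True
except UnicodeDecodeError:
    decodable = False

# base64 always round-trips, but the stored value is longer than the payload
# and has to be decoded before it can be measured or sliced.
encoded = base64.b64encode(raw)
restored = base64.b64decode(encoded)

print(decodable)               # False
print(len(raw), len(encoded))  # 5 8
print(restored == raw)         # True
```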

Conclusion from the above: uMongo needs BinaryField. If Marshmallow guys refuse to add support for it – f*ck them, let's do it in uMongo

Unfortunately, I'm not a uMongo developer and haven't dug deep into how everything works. Here is an example of a BinaryField I came up with:

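import bson
# marshmallow.compat exists in marshmallow 2 only; binary_type is bytes on Python 3
from marshmallow import compat as ma_compat, fields as ma_fields
from umongo import fields


class BinaryField(fields.BaseField, ma_fields.Field):
    """Store raw bytes as BSON Binary; no text encoding is applied."""

    default_error_messages = {
        'invalid': 'Not a valid byte sequence.'
    }

    def _serialize(self, value, attr, data):
        # Dumping: pass None through, otherwise coerce to bytes
        if value is None:
            return None
        return ma_compat.binary_type(value)

    def _deserialize(self, value, attr, data):
        # Loading: accept bytes only, never encode a str implicitly
        if not isinstance(value, ma_compat.binary_type):
            self.fail('invalid')
        return value

    def _serialize_to_mongo(self, obj):
        # Wrap in the explicit BSON Binary type before writing to MongoDB
        return bson.binary.Binary(obj)

    def _deserialize_from_mongo(self, value):
        # Convert whatever the driver returns back to plain bytes
        return bytes(value)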

Maybe there are some obscure caveats, maybe not. This is the code I currently have in my project and it seems to work like a charm. (I'm using Motor.)

Would be nice if someone familiar with internals of uMongo could take a look


martinjuhasz commented on May 30, 2024

Okay, good. But what would the default serialization method do? Byte data isn't necessarily convertible into a string, right? How would someone want a byte Field to be serialized?

In my special case the byte field represents some pickled state of an object that I don't even want to be serialized and sent over my API (is there a way to exclude fields on serialization?).

class ByteField(BaseField, ma_fields.String):
    def _deserialize(self, value, attr, data):
        if isinstance(value, bytes):
            return value
        return super()._deserialize(value, attr, data)

This works fine for storing, and if the stored bytes are valid UTF-8 they get converted into a string on serialization.


touilleMan commented on May 30, 2024

Bytes is a valid bson type (named Binary data in mongodb types)

Besides, it seems pymongo does the bytes <=> Binary data conversion by itself:

>>> hello = 'héllo'
>>> doc_id = db.test.insert({'str': hello, 'bytes': hello.encode()})
>>> db.test.find_one(doc_id)
{'bytes': b'h\xc3\xa9llo', 'str': 'héllo', '_id': ObjectId('57ad9b0713adf23b7095fcee')}

So I think the _deserialize method should check that the incoming data is bytes, and that's it! pymongo will gladly take care of those bytes for us ;-)

"In my special case the byte field represents some pickled state of an object that i don't even want to be serialized and sent over my api (is there a way to exclude fields on serialization?)."

Yes, there is! You should use the load_only attribute on your field. This way it will never be serialized.
I guess you should also use the dump_only attribute so that your API does not accept incoming data for this field during deserialization.

import json

from umongo import Document, fields

@instance.register
class MyDoc(Document):
    # BytesField is the field proposed in this thread
    pickled_stuff = fields.BytesField(load_only=True, dump_only=True)
    public_name = fields.StrField()

# inside your POST API
payload = get_payload_from_request()
# raises ValidationError if a 'pickled_stuff' field is present
my_doc = MyDoc(**payload)
assert my_doc.pickled_stuff is None
my_doc.pickled_stuff = pickle_my_stuff()  # must return bytes
my_doc.commit()
return 200, 'Ok'

# inside your GET API
my_doc = MyDoc.find_one({'id': my_id})
print(my_doc)
# <... {'pickled_stuff': b'<pickled data>', 'public_name': 'test' }...>
my_doc.dump()
# {'public_name': 'test'}
return 200, json.dumps(my_doc.dump())

You should also have a look at the flask example, which shows how to use umongo inside an API with a custom loading/dumping schema.


martinjuhasz commented on May 30, 2024

Thanks for sharing this, great stuff!
So you think BytesField should try to serialize using ensure_text_type (as it does when inheriting from BaseField, ma_fields.String)? It will fail on binary data that's not UTF-8 encoded, but I guess that's fine, because if you want binary data to be serialized you would have thought about encoding before storing it.

My pickled data would fail on serialization:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
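
The error is easy to reproduce outside umongo. This is a minimal sketch with the standard library only; the pickled object and the explicit protocol number are arbitrary choices for illustration:

```python
import pickle

state = {'counter': 1}
pickled = pickle.dumps(state, protocol=4)

# Pickle protocols 2 and later start with the PROTO opcode, byte 0x80,
# which is not a valid first byte of a UTF-8 sequence.
first_byte = pickled[0]

try:
    pickled.decode('utf-8')
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(hex(first_byte))  # 0x80
print(error.start)      # 0
```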


touilleMan commented on May 30, 2024

I think we should not try to do any string encoding/decoding inside umongo. It should be the user's responsibility to provide bytes, since otherwise we would have to make too many assumptions about their workflow (which encoding does the user want? what to do with bytes that can't be decoded? etc.).
Besides, this makes the implementation simpler and more straightforward, so why bother ;-)
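
A minimal sketch of that division of responsibility, written as a plain function rather than as a umongo or marshmallow field. The name check_bytes is hypothetical and not part of either library:

```python
def check_bytes(value):
    """Accept bytes unchanged; reject everything else without guessing an encoding."""
    if not isinstance(value, bytes):
        raise TypeError('Not a valid byte sequence.')
    return value


# The caller chooses the encoding explicitly before handing the data over.
stored = check_bytes('héllo'.encode('utf-8'))

# A str is rejected instead of being encoded implicitly.
try:
    check_bytes('héllo')
    rejected = False
except TypeError:
    rejected = True

print(stored)    # b'h\xc3\xa9llo'
print(rejected)  # True
```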


chenjr0719 commented on May 30, 2024

Is this enhancement still needed? If so, I'd like to give it a try.


kevinbosak commented on May 30, 2024

@thodnev Thanks for the code, I'm going to use it as I need the ability to store binary data (in this case, a password salt created from os.urandom). If this code works, I can't imagine it would be too hard to add to a PR (if you haven't already).
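
For context, a salt of this kind and the hash derived from it are both raw bytes, which is why a text field is a poor fit. This is a minimal sketch with the standard library; the password, salt length and iteration count are illustrative assumptions, not recommendations from this thread:

```python
import hashlib
import os

salt = os.urandom(16)  # 16 random bytes; generally not valid UTF-8
digest = hashlib.pbkdf2_hmac('sha256', b'correct horse', salt, 100_000)

print(type(salt).__name__, len(salt))      # bytes 16
print(type(digest).__name__, len(digest))  # bytes 32
```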
