marshmallow-code / marshmallow

A lightweight library for converting complex objects to and from simple Python datatypes.

Home Page: https://marshmallow.readthedocs.io/

License: MIT License

Python 100.00%
serialization deserialization validation python marshalling python-3 serde schema hacktoberfest

marshmallow's Introduction

marshmallow: simplified object serialization

marshmallow is an ORM/ODM/framework-agnostic library for converting complex datatypes, such as objects, to and from native Python datatypes.

from datetime import date
from pprint import pprint

from marshmallow import Schema, fields


class ArtistSchema(Schema):
    name = fields.Str()


class AlbumSchema(Schema):
    title = fields.Str()
    release_date = fields.Date()
    artist = fields.Nested(ArtistSchema())


bowie = dict(name="David Bowie")
album = dict(artist=bowie, title="Hunky Dory", release_date=date(1971, 12, 17))

schema = AlbumSchema()
result = schema.dump(album)
pprint(result, indent=2)
# { 'artist': {'name': 'David Bowie'},
#   'release_date': '1971-12-17',
#   'title': 'Hunky Dory'}

In short, marshmallow schemas can be used to:

  • Validate input data.
  • Deserialize input data to app-level objects.
  • Serialize app-level objects to primitive Python types. The serialized objects can then be rendered to standard formats such as JSON for use in an HTTP API.

Get It Now

$ pip install -U marshmallow

Documentation

Full documentation is available at https://marshmallow.readthedocs.io/ .

Requirements

  • Python >= 3.8

Ecosystem

A list of marshmallow-related libraries can be found at the GitHub wiki here:

https://github.com/marshmallow-code/marshmallow/wiki/Ecosystem

Credits

Contributors

This project exists thanks to all the people who contribute.

You're highly encouraged to participate in marshmallow's development. Check out the Contributing Guidelines to see how you can help.

Thank you to all who have already contributed to marshmallow!

Backers

If you find marshmallow useful, please consider supporting the team with a donation. Your donation helps move marshmallow forward.

Thank you to all our backers! [Become a backer]

Sponsors

Support this project by becoming a sponsor (or ask your company to support this project by becoming a sponsor). Your logo will show up here with a link to your website. [Become a sponsor]

Professional Support

Professionally-supported marshmallow is now available through the Tidelift Subscription.

Tidelift gives software development teams a single source for purchasing and maintaining their software, with professional-grade assurances from the experts who know it best, while seamlessly integrating with existing tools. [Get professional support]

License

MIT licensed. See the bundled LICENSE file for more details.

marshmallow's People

Contributors

3rdcycle, aganezov, arbor-dwatson, deckar01, dependabot-preview[bot], dependabot-support, dependabot[bot], dunstrom, ecarreras, eprikazc, frol, hugovk, hukkinj1, imhoffd, jmcarp, kelvinhammond, lafrech, mahenzon, mgetka, pre-commit-ci[bot], pyup-bot, rooterkyberian, sayanarijit, sirosen, sloria, svenstaro, taion, vgavro, yuriheupa, zblz

marshmallow's Issues

Keep fields order

Fields are returned in a random order after marshalling. I would not mind if the return type were not OrderedDict. Why use OrderedDict if the fields are still returned in random order?

I think the cause is the use of an unordered set here.

It would be great if the fields were returned in the order in which they are declared in the serializer. That is much prettier for RESTful APIs.

Performance of serializing nested collections is poor

I worked up a quick test using the nose timed decorator.

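import random
import string
import unittest

from nose.tools import timed

# User, Blog, UserSerializer and BlogSerializer are the application's own
# classes and are not shown here.


class TestSerializerTime(unittest.TestCase):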

    def setUp(self):
        self.users = []
        self.blogs = []
        letters = list(string.ascii_letters)

        for i in range(500):
            self.users.append(User(''.join(random.sample(letters, 15)),
                email='[email protected]', age=random.randint(10, 50)))

        for i in range(500):
            self.blogs.append(Blog(''.join(random.sample(letters, 50)),
                user=random.choice(self.users)))

    @timed(.2)
    def test_small_blog_set(self):
        res = BlogSerializer(self.blogs[:20], many=True)

    @timed(.4)
    def test_medium_blog_set(self):
        res = BlogSerializer(self.blogs[:250], many=True)

    @timed(1)
    def test_large_blog_set(self):
        res = BlogSerializer(self.blogs, many=True)

    @timed(.1)
    def test_small_user_set(self):
        res = UserSerializer(self.users[:20], many=True)

    @timed(.2)
    def test_medium_user_set(self):
        res = UserSerializer(self.users[:250], many=True)

    @timed(.5)
    def test_large_user_set(self):
        res = UserSerializer(self.users, many=True)

The user tests all pass, but the medium and large blog tests do not. Obviously, these could pass on some machines, but it's still rather slow.

I did a little bit more testing with profile. Serializing the whole blog collection was running between 5 and 6s.

It looks like the bottleneck is the deepcopy operation in serializer.py and it doesn't seem like the call can be removed, or changed to a pickle/unpickle operation.

I'm going to keep digging to see what I can do. If you have any insight, I'd appreciate the help. Thanks!

Better handling of "two-way" nesting

Having two serializers that nest each other is quite awkward. For example, for a many-to-one relationship between Books and Authors, you'd have to do something like the following:

class BaseBookMarshal(Serializer):
    date_created = fields.DateTime()
    isbn = fields.String()

class AuthorMarshal(Serializer):
    created = fields.DateTime(attribute='date_created')
    books = fields.Nested(BaseBookMarshal, many=True)

class BookMarshal(BaseBookMarshal):
    author = fields.Nested(AuthorMarshal, allow_null=True)

While this certainly works, having to create the extra BaseBookMarshal class is a bit clunky. It would be nice if you could declare nested serializers without having to worry about declaration order, and just pass class names into the Nested field:

class AuthorMarshal(Serializer):
    created = fields.DateTime(attribute='date_created')
    books = fields.Nested('BookMarshal', many=True)

class BookMarshal(Serializer):
    author = fields.Nested('AuthorMarshal', allow_null=True)
    date_created = fields.DateTime()
    isbn = fields.String()

I'm still undecided on whether this is a good idea. Not only would this require more metaclass magic, but it would necessarily involve implicit removal of fields in order to prevent infinite recursion.
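The name-based lookup proposed above could be sketched with a simple class registry. This is only an illustration in plain Python: the Serializer and Nested classes below are hypothetical stand-ins rather than marshmallow's own, and the sketch does nothing about the infinite-recursion concern.

```python
class Serializer:
    """Hypothetical base class that records every subclass by name."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Serializer.registry[cls.__name__] = cls


class Nested:
    """Hypothetical field that accepts a serializer class or its name."""

    def __init__(self, nested, many=False):
        self.nested = nested
        self.many = many

    def resolve(self):
        # Look the name up lazily, after all classes have been defined.
        if isinstance(self.nested, str):
            return Serializer.registry[self.nested]
        return self.nested


class AuthorMarshal(Serializer):
    books = Nested("BookMarshal", many=True)  # BookMarshal is not defined yet


class BookMarshal(Serializer):
    author = Nested("AuthorMarshal")


print(AuthorMarshal.books.resolve() is BookMarshal)   # True
print(BookMarshal.author.resolve() is AuthorMarshal)  # True
```

Because the name is resolved only when the field is used, declaration order no longer matters.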

Creating additional fields on the fly

I'm trying to create a very flexible serializer, such that users can generate additional fields in the future. Let's say that today they only need the defaults I've provided:

class PostSerializer(Serializer):
    id = fields.String()
    title = fields.String(default="Untitled")
    body = fields.String(default=None)
    author = fields.List(fields.String) 

The user creates several posts, and they decide they want a field for "category." I provide an interface where they set a new category field. Now perhaps I store this field in a dictionary.

additional_fields = {
    "category" : "list"
}

I then modify the serializer on the fly (the only way that seems to work is via Meta.additional; setattr never seems to work):

s = PostSerializer
PostSerializer.Meta.additional = additional_fields.keys()

Posts which were created without the 'category' field will cause the following AttributeError:

AttributeError: "category" is not a valid field for {'id': '123456', 'title': 'Cool Post', 'body': 'Lorem Ipsum...', 'author': ['John', 'Steve']}

How can I maintain flexibility to add user generated fields, but also protect myself in the future? Is there a way to set a global default for additional fields?

Non ValidationError Exceptions still get silently put into errors list

For instance, if I forget importing ValidationError itself, I get a list like this:

{ "username": [ "'Marshmallow' object has no attribute 'ValidationError'" ] }

Now, obviously this exception should not be caught by marshmallow. I'm not sure why that even works. Any idea whether this is a bug or a problem on my side?

My code:

def duplicate_email_validator(email):
    # ... (duplicate-check logic omitted in the original) ...
    raise ma.ValidationError("Email already exists")

class UserInputSchema(ma.Schema):
    username = ma.Email(validate=duplicate_email_validator, required=True)
    password = ma.String(required=True)

result, errors = UserInputSchema(strict=False).load(request.json)

using an attribute named items when serializing a dict

When serializing a dict that has a key called "items", marshmallow fails to get the correct "items" value and instead gets the items method of the dict object.
The problem is in utils line 298:
if isinstance(key, basestring) and hasattr(obj, key):
For a dict, hasattr(obj, "items") is always True, so the attribute lookup wins over the key lookup.
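A minimal sketch of the problem and one possible way around it, in plain Python. The get_value function here is a hypothetical rewrite for illustration, not the library's actual implementation: it checks for a mapping before falling back to attribute access.

```python
from collections.abc import Mapping


def get_value(obj, key, default=None):
    """Hypothetical getter: item lookup for mappings, attributes otherwise."""
    if isinstance(obj, Mapping):
        return obj.get(key, default)
    return getattr(obj, key, default)


data = {"items": [1, 2, 3]}

# The attribute check succeeds because every dict has an ``items`` method.
print(hasattr(data, "items"))            # True
print(callable(getattr(data, "items")))  # True

# Checking for a mapping first returns the stored value instead.
print(get_value(data, "items"))          # [1, 2, 3]
```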

skip_missing is not working for fields of type String

It seems that the skip_missing option works only when a field has a None value.
If the input dict lacks a key that is declared as a String field, the result contains an empty string for it.

Sample code:

class UserSchema(Schema):
    first = String()
    last = String()

    class Meta:
        skip_missing = True


test_data = dict(
    first='Name',
)

sch = UserSchema()
print(sch.dump(test_data))

[question] Skip missing fields instead of default values

I tried to find the answer myself but failed; it seems this is not supported right now.

In case I've missed it: is it possible to skip missing fields instead of assigning default values during serialization?

I will try to describe it with an example:

some_data = dict(
    first_name='Joe',
    age=20,
)


class TestSchema(Schema):
    first_name = String()
    family_name = String()
    age = Integer()


schema = TestSchema()
print(schema.dump(some_data).data)

Current result: OrderedDict([('first_name', u'Joe'), ('family_name', ''), ('age', 20)])
Desired result: OrderedDict([('first_name', u'Joe'), ('age', 20)])

Of course, it is possible to filter the result afterwards. That is somewhat tricky because the default values differ by type (e.g. strings vs. integers), but it is possible and I've already done it.

I am just curious if I've missed some core functionality.
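The post-filtering workaround mentioned above might look like the following sketch. It sidesteps the per-type defaults by comparing keys against the input rather than inspecting values. The drop_missing helper is hypothetical, and it assumes field names match the input keys (no attribute remapping).

```python
from collections import OrderedDict


def drop_missing(serialized, source):
    """Keep only the serialized keys that were present in the source dict."""
    return OrderedDict((k, v) for k, v in serialized.items() if k in source)


some_data = {"first_name": "Joe", "age": 20}
# Stand-in for the serializer output shown above.
serialized = OrderedDict([("first_name", "Joe"), ("family_name", ""), ("age", 20)])

filtered = drop_missing(serialized, some_data)
print(list(filtered.items()))  # [('first_name', 'Joe'), ('age', 20)]
```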

Make attribute getter function configurable

Currently, marshmallow.utils.get_value is used to pull values from many different types of objects (both simple and complex types).

It may be useful to override this behavior, e.g. via a class Meta option, when you know exactly what type of objects you will be serializing and how to pull values from them.

I see two use cases for this:

  • Handling objects that get_value will not work with
  • Optimizing serialization
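Both use cases can be illustrated with a small sketch in which the getter is an overridable method. MiniSerializer, RowSerializer and get_attribute are hypothetical names chosen for this example; the proposal itself leaves the mechanism open (for instance a class Meta option).

```python
class MiniSerializer:
    """Hypothetical stand-in for a serializer with an overridable getter."""

    field_names = ()

    def get_attribute(self, obj, key):
        # Generic fallback: item lookup for dicts, attribute lookup otherwise.
        if isinstance(obj, dict):
            return obj.get(key)
        return getattr(obj, key, None)

    def dump(self, obj):
        return {name: self.get_attribute(obj, name) for name in self.field_names}


class RowSerializer(MiniSerializer):
    """Serializes plain tuples, which the generic getter cannot handle."""

    field_names = ("hostname", "module", "enabled")
    positions = {"hostname": 0, "module": 1, "enabled": 2}

    def get_attribute(self, obj, key):
        # The object type is known, so go straight to the tuple index.
        return obj[self.positions[key]]


print(RowSerializer().dump(("HostA", "Backup", 1)))
# {'hostname': 'HostA', 'module': 'Backup', 'enabled': 1}
```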

Nested class can't process dict

Hello, and apologies in advance for my English.

Why can't the Nested field process a dict? It seems to accept only object instances.

class Book(object):
    title = ''
    author = ''

class BookSerializer(Serializer):
    title = fields.String()
    author = fields.String()

class BookList(object):
    items = list()

class BookListSerializer(Serializer):
    items = fields.Nested(BookSerializer, many=True)

On its own, the BookSerializer class accepts both types, object and dict, without problems.
For example:

# Case 1: using an object.
book = Book()
book.title = 'hello android'
book.author = 'leejaycoke'
return jsonify(BookSerializer(book).data)

# Case 2: using a dict.
book = {'title': 'hello android', 'author': 'leejaycoke'}
return jsonify(BookSerializer(book).data)

However, with a Nested field the list of books cannot live in a dict, although an object works.
For example:

# Case 1: the parent is an object (works).
book1 = {'title': 'hello android', 'author': 'leejaycoke'}
book2 = {'title': 'hello iOS', 'author': 'tommy'}
book_list = BookList()
book_list.items = [book1, book2]
return jsonify(BookListSerializer(book_list).data)
"""
{
    "items": [
        {
            "title": "hello android",
            "author": "leejaycoke"
        },
        {
            "title": "hello iOS",
            "author": "tommy"
        }
    ]
}
"""

# Case 2: the parent is a dict (fails).
book1 = {'title': 'hello android', 'author': 'leejaycoke'}
book2 = {'title': 'hello iOS', 'author': 'tommy'}
book = {'items': [book1, book2]}
return jsonify(BookListSerializer(book).data)
"""
TypeError: Could not marshal nested object due to error:
"'builtin_function_or_method' object is not iterable"
If the nested object is a collection, you need to set "many=True".\
"""

Can you help me?

[discuss] Validation behavior during deserialization vs. serialization

Is it OK that required fields don't work in the load() method?

From quickstart example:

class UserSchema(Schema):
    name = fields.String(required=True)
    email = fields.Email()

user = {'name': None, 'email': '[email protected]'}
data, errors = UserSchema().dump(user)
errors  # {'name': 'Missing data for required field.'}

user = {'name': None, 'email': '[email protected]'}
data, errors = UserSchema().load(user)
errors  # {}

I thought that the load() method is used for loading model objects from input data and SHOULD support required fields. The dump() method, on the contrary, is used to serialize internal data and should not require validation at all. Do I understand this correctly?

Field value incorrect with many=True option.

Hi there,

First let me apologize by stating that my attempts to create a small, reproducible example have failed. I'm hoping to instead provide examples of what I'm seeing and perhaps you'll be able to tell me what I'm doing incorrectly!

Serialize two items of a list, individually:

(Pdb) EventSerializer(events[0]).data
OrderedDict([('event_id', 11), ('index', None), ('contact_id', 1), ('profile_id', None), ('action', 'updated'), ('type', 'contact')])
(Pdb) EventSerializer(events[1]).data
OrderedDict([('event_id', 13), ('index', None), ('contact_id', None), ('profile_id', 2),('action', 'added'), ('type', 'profile')])

Notice how the 'contact_id' key is 1 in the first, and None in the second. This is as I'd expect. Now, when I serialize the entire list as a whole:

(Pdb) EventSerializer(events, many=True).data
[OrderedDict([('event_id', 11), ('index', None), ('contact_id', 1), ('profile_id', None),  ('action', 'updated'), ('type', 'contact')]), OrderedDict([('event_id', 13), ('index', None), ('contact_id', 0), ('profile_id', 2), ('action', 'added'), ('type', 'profile')])]

Notice that the 'contact_id' of the second list element is now 0 and not None. Odd!

My serializer definition looks like this:

class EventSerializer(Serializer):
    action = fields.Method('action_to_text')
    type = fields.Method('type_to_text')
    event_id = fields.Integer(attribute='id')

    class Meta:
        fields = ['event_id', 'action', 'profile_id', 
                  'index', 'contact_id', 'type']

    def action_to_text(self, obj):
        return ActionType.to_text(obj.action)

    def type_to_text(self, obj):
        return EventType.to_text(obj.type)

I'm probably missing something obvious...

Thank you for your time!

Improved error message when many omitted

I ran into this when I accidentally omitted many=True in my serializer when it was actually a many relation.

Traceback (most recent call last):
  File "./shell.py", line 26, in <module>
    serializers.MyBSerializer(b).data
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/serializer.py", line 183, in __init__
    raw_data = self.marshal(self.obj, self.fields, many=self.many)
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/fields.py", line 106, in marshal
    item = (key, field_obj.output(attr_name, data))
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/fields.py", line 306, in output
    self.serializer._update_fields(nested_obj)
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/serializer.py", line 234, in _update_fields
    ret = self.__filter_fields(field_names)
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/serializer.py", line 284, in __filter_fields
    print('type(obj_dict[key]): ', type(obj_dict[key])) # Error as obj_dict is a query, not an ORM
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/sqlalchemy/orm/dynamic.py", line 255, in __getitem__
    return self._clone(sess).__getitem__(index)
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/sqlalchemy/orm/query.py", line 2206, in __getitem__
    return list(self[item:item + 1])[0]
TypeError: Can't convert 'int' object to str implicitly

This happens at https://github.com/sloria/marshmallow/blob/dev/marshmallow/serializer.py#L280

Perhaps this exception handling should also handle TypeError and then (using the value of self.many) raise an informative message. Basically a suggestion that many=True may have been omitted.

Minimal test case:

class MyB(db.Model):
    __tablename__ = 'myb'
    id = db.Column(db.Integer, primary_key=True)

    myas = db.relationship('MyA', backref='myb', lazy='dynamic')


class MyA(db.Model):
    __tablename__ = 'mya'
    id = db.Column(db.Integer, primary_key=True)
    myb_id = db.Column(db.Integer, db.ForeignKey('myb.id'),
                       nullable=False)


class MyASerializer(Serializer):
    class Meta:
        fields = ('id', 'myb_id')


class MyBSerializer(Serializer):
    myas = fields.Nested(MyASerializer)              # Accidentally Broken
    # myas = fields.Nested(MyASerializer, many=True) # Correct

    class Meta:
        fields = ('id', 'myas')


b = models.MyB()
db.session.add(b)

a1 = models.MyA(myb=b)
a2 = models.MyA(myb=b)
db.session.add(a1)
db.session.add(a2)

db.session.commit()

serializers.MyBSerializer(b).data

Inconsistent behavior between dump(obj, many=True) and Schema(many=True)

Doing

    schema = MySchema(many=True)
    print(schema.dump(mythings))

I get the correct behavior and everything works fine. However, doing

    schema = MySchema()
    print(schema.dump(mythings, many=True))

results in

myfile.py:29: in get
    print(schema.dump(mythings, many=True))
env/lib/python3.4/site-packages/marshmallow/schema.py:435: in dump
    self._update_fields(obj)
env/lib/python3.4/site-packages/marshmallow/schema.py:583: in _update_fields
    ret = self.__filter_fields(field_names, obj)
env/lib/python3.4/site-packages/marshmallow/schema.py:630: in __filter_fields
    attribute_type = type(obj_dict[key])
E   TypeError: list indices must be integers, not str

Something's up here.

Serializing None with fields.Integer yields 0.0 float

When serializing None with a field specified as Integer, you get the float 0.0 back. I think this is rather strange; you should get either 0 or perhaps None. I will be happy to submit a patch after discussing it a bit first.

I suspect that the behavior comes from this line, as Integer inherits from Number, which has a default of 0.0 here:

https://github.com/sloria/marshmallow/blob/dev/marshmallow/fields.py#L348

What do you think @sloria? I would think that serializing None would yield None back.
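The suspected cause can be reproduced with a small sketch. The classes below are hypothetical reconstructions based only on the description above (a Number base class with a 0.0 default that Integer inherits); they are not the library's source.

```python
class Number:
    """Hypothetical base field: falls back to a float default for None."""

    num_type = float
    default = 0.0

    def format(self, value):
        if value is None:
            return self.default
        return self.num_type(value)


class Integer(Number):
    # Only the conversion type is overridden, so the 0.0 default leaks through.
    num_type = int


class NoneAwareInteger(Integer):
    """One possible fix: pass None through unchanged."""

    def format(self, value):
        return None if value is None else super().format(value)


print(repr(Integer().format(None)))           # 0.0
print(repr(NoneAwareInteger().format(None)))  # None
print(repr(NoneAwareInteger().format("7")))   # 7
```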

Setting default on Nested item

For nested items, with many=True, not only do I want to allow null, but if the field is in fact null, I'd like to return an empty array using default.

This will allow API users to ignore checks for null.

Is there a way to accomplish this with the current feature set?

Always pass single instance to a data handler

Currently, data handler functions are passed the serialized data as is. This means that if you pass many=True when serializing data, you have to handle a list instead of a single dictionary.

class AuthorSerializer(Serializer):
    first = fields.String()
    last = fields.String()

@AuthorSerializer.data_handler
def add_fullname(ser, data, obj):
    if ser.many: # data is a list
        for each in data:
            each['fullname'] = ' '.join([each['first'], each['last']])
    else:
        data['fullname'] = ' '.join([data['first'], data['last']])
    return data

It may be more user-friendly to always pass a single dictionary to the data handler function and have marshmallow handle the many parameter automatically. The following code would then work whether you serialize a list or a single object:

class AuthorSerializer(Serializer):
    first = fields.String()
    last = fields.String()

@AuthorSerializer.data_handler
def add_fullname(ser, data, obj):
    data['fullname'] = ' '.join([data['first'], data['last']])
    return data
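One way the library could do this dispatch is sketched below in plain Python. The per_item decorator and its many argument are hypothetical; the sketch only shows a handler written for one dict being applied to either a dict or a list.

```python
def per_item(handler):
    """Wrap a single-dict handler so it also accepts a list of dicts."""

    def wrapper(data, many=False):
        if many:
            return [handler(item) for item in data]
        return handler(data)

    return wrapper


@per_item
def add_fullname(data):
    # Written once, for a single serialized dict.
    data["fullname"] = " ".join([data["first"], data["last"]])
    return data


print(add_fullname({"first": "David", "last": "Bowie"}))
print(add_fullname([{"first": "Lou", "last": "Reed"}], many=True))
```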

mock.Mock objects don't serialize correctly

This has been a known bug for a while; finally posting it here.

Mock objects from the mock package (or Py3's unittest.mock) are not serialized correctly.

from unittest.mock import Mock
from marshmallow import Schema, fields, pprint

class UserSchema(Schema):
    name = fields.Str()
    email = fields.Email()

schema = UserSchema()
mock_user = Mock()
mock_user.email = 'hi guys'
pprint(schema.dump(mock_user).data)
# {"name": "<Mock name='mock.name' id='4379527880'>", "email": null}
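The name value above is itself a Mock because a bare Mock fabricates any attribute on access. The standard-library sketch below shows that behavior and one possible workaround on the test side, restricting the mock with spec. This is offered as an assumption about what helps in tests, not as the fix adopted by the project.

```python
from unittest.mock import Mock

plain = Mock()
plain.email = "hi guys"

# A bare Mock creates attributes on demand, so "name" appears to exist.
print(hasattr(plain, "name"))         # True
print(isinstance(plain.name, Mock))   # True

# With ``spec``, attributes outside the list raise AttributeError.
strict = Mock(spec=["email"])
strict.email = "hi guys"
print(hasattr(strict, "name"))        # False
print(strict.email)                   # hi guys
```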

Remove legacy API

In version 2.0, the pre-1.0 legacy API will be completely removed from the codebase.

This includes:

  • Passing object to be serialized to Schema constructor
  • data and errors properties of Schema
  • The error param of Fields (still in question)
  • Arbitrary, Fixed and Price fields (remove in 2.0)
  • Select field (remove in 2.0)
  • Deprecated function validators. Alias validator classes?
  • context argument of Method fields? (in question)
  • @Schema.preprocessor, @Schema.data_handler, etc.
  • MarshallingError and UnmarshallingError (remove in 2.1)
  • QuerySelect and QuerySelectList (remove in 2.2)
  • allow_none and required string arguments (remove in 2.2)

EDIT: Updated checklist based on comments.

pprint displays booleans incorrectly

Because marshmallow's pprint function JSON-encodes OrderedDicts, booleans display as JavaScript-style booleans, with lowercase letters.

from collections import OrderedDict
from marshmallow import pprint
>>> d = OrderedDict([('foo', True), ('bar', False)])
>>> pprint(d)
{"foo": true, "bar": false}
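The difference can be reproduced with the standard library alone: JSON encoding yields lowercase booleans, while the built-in pprint module keeps Python's capitalized literals. Converting to a plain dict before formatting is just one illustrative option; note that pprint sorts dict keys by default.

```python
import json
from collections import OrderedDict
from pprint import pformat

d = OrderedDict([("foo", True), ("bar", False)])

as_json = json.dumps(d)
as_python = pformat(dict(d))

print(as_json)    # {"foo": true, "bar": false}
print(as_python)  # {'bar': False, 'foo': True}
```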

Support for read-only fields

It would be nice to have the ability to mark fields as read-only. When deserializing, validation should fail if field(s) marked as read-only are present in the target dictionary.
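The requested behavior could be sketched as a pre-deserialization check. The ReadOnlyFieldError class and reject_read_only function are hypothetical names for this illustration only.

```python
class ReadOnlyFieldError(ValueError):
    """Raised when input data contains fields marked as read-only."""

    def __init__(self, errors):
        super().__init__(errors)
        self.errors = errors


def reject_read_only(data, read_only):
    """Return data unchanged, or raise if any read-only key is present."""
    errors = {key: ["Field is read-only."] for key in data if key in read_only}
    if errors:
        raise ReadOnlyFieldError(errors)
    return data


try:
    reject_read_only({"id": 1, "title": "Hunky Dory"}, read_only={"id"})
except ReadOnlyFieldError as exc:
    print(exc.errors)  # {'id': ['Field is read-only.']}
```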

Add option for generating envelopes

This should be an option for the Schema class, such as MySchema(envelope="things"), that wraps the generated output and assumes the same envelope on the input, like this:

schema = AlbumSchema(envelope="album")
result = schema.dump(album)
pprint(result.data, indent=2)
# {'album':
#   { 'artist': {'name': 'David Bowie'},
#     'release_date': '1971-12-17',
#     'title': 'Hunky Dory'}}

The reasoning is partly security (http://flask.pocoo.org/docs/0.10/security/#json-security), though this concern is becoming outdated, and partly that some APIs actually work like this. I think this is something marshmallow should have.
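A minimal sketch of the wrapping and unwrapping this option implies, in plain Python. The wrap and unwrap helpers are hypothetical, and raising KeyError on a missing envelope is only one possible choice.

```python
def wrap(data, envelope=None):
    """Nest serialized data under ``envelope`` when one is configured."""
    return {envelope: data} if envelope else data


def unwrap(payload, envelope=None):
    """Undo ``wrap`` on input; raise if the expected envelope key is absent."""
    if not envelope:
        return payload
    if envelope not in payload:
        raise KeyError("missing envelope key: " + envelope)
    return payload[envelope]


album = {"title": "Hunky Dory", "release_date": "1971-12-17"}
wrapped = wrap(album, envelope="album")

print(wrapped)
print(unwrap(wrapped, envelope="album") == album)  # True
```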

many=True renders wrong when given nothing to render

from marshmallow import Schema, fields, pprint

class User(object):
    def __init__(self, name, email=None, age=None):
        self.name = name

class ChildSchema(Schema):
    name = fields.String()

class ParentSchema(Schema):
    name = fields.String()
    children = fields.Nested(ChildSchema, many=True)

user = User(name="Monty")
schema = ParentSchema()
result = schema.dump(user)
pprint(result.data)
# -> {'children': {'name': ''}, 'name': u'Monty'}

I would expect the result to be {'children': [], 'name': u'Monty'}. If I set a field to many=True, it should always be a list, no exceptions. What I'm getting instead looks like a mistake.

[feature] Schema-level validation

  • I couldn't find any way to do Schema level validation in the docs. I mean something to do the sort of validations that involve multiple fields (e.g. if field A is less than 10 then field B must be greater than 50). Any suggestion?
  • I found Schema.data_handler for data post-processing. Is there something for data pre-processing? For instance: remove all the spaces from a string before validation, or more generally, apply a function to the data before validation (but after deserialization).

Thanks

MongoEngine model instances always marshal as a list

The way that marshmallow checks whether it should marshal a list is perhaps a bit error-prone when dealing with object instances that implement the __iter__ magic method. For example, MongoEngine document instances implement this method, so any time I try to serialize a MongoEngine document instance it always returns a list. The first thing that came to mind was a flag for the serializer constructor that would force a single instance. However, I'm wary of suggesting you should pollute that space with more args.

Another approach I just thought of is to make the Serializer._marshal property configurable by passing one to the constructor. Otherwise, use the default implementation.
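The pitfall and the first suggestion (an explicit flag that overrides the guess) can be sketched as follows. Document, guess_many and serialize are hypothetical stand-ins written for this illustration; they are not MongoEngine or marshmallow code.

```python
class Document:
    """Hypothetical stand-in for a model instance that defines __iter__."""

    def __init__(self, **fields):
        self._fields = fields

    def __iter__(self):
        return iter(self._fields)


def guess_many(obj):
    # Naive heuristic: anything iterable that is not a string or dict is a list.
    return hasattr(obj, "__iter__") and not isinstance(obj, (str, bytes, dict))


def serialize(obj, many=None):
    # An explicit ``many`` always wins; the heuristic is only a fallback.
    if many is None:
        many = guess_many(obj)
    if many:
        return [dict(item._fields) for item in obj]
    return dict(obj._fields)


doc = Document(title="hello")
print(guess_many(doc))             # True: the single document is misclassified
print(serialize(doc, many=False))  # {'title': 'hello'}
```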

Serialized SQLAlchemy Query resulting in empty dictionaries or an error

Using sqlalchemy, flask & marshmallow.
I have an issue when Serializing an sqlalchemy query.
I seem to get empty dictionaries when I serialize the query result.
If I try to specify column names to serialize then it errors.
Using a query such as:

modules = db.session.query(Hosts.hostname, Modules.name, HostMatrix.enabled).filter(Hosts.hostname == host).all()

To recreate:

import sqlalchemy
from marshmallow import Serializer

modules = [sqlalchemy.util._collections.KeyedTuple((u'HostA', u'Backup', 1)), sqlalchemy.util._collections.KeyedTuple((u'HostA', u'Backup', 0)), sqlalchemy.util._collections.KeyedTuple((u'HostA', u'Backup', 0)), sqlalchemy.util._collections.KeyedTuple((u'HostA', u'Backup', 1))]

Serializer(modules, many=True).data

[OrderedDict(), OrderedDict(), OrderedDict(), OrderedDict()]

Serializer(modules, only=('name', 'enabled'), many=True).data

Traceback (most recent call last):
  File "", line 1, in
    Serializer(modules, only=('name', 'enabled'), many=True).data
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/marshmallow/serializer.py", line 193, in __init__
    self._update_fields(obj)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/marshmallow/serializer.py", line 294, in _update_fields
    ret = self.__filter_fields(self.only)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/marshmallow/serializer.py", line 359, in __filter_fields
    attribute_type = type(obj_dict[key])
TypeError: tuple indices must be integers, not str

I also created a class to specify the fields to include but that results in the same error.

Deprecate legacy serialization API

As of 1.0.0 the correct way to serialize objects is to use the Serializer.dump method.

Usage of Serializer(some_obj).data will be deprecated, as will the related Serializer.errors and Serializer.is_valid members (dump returns both the serialized data and a dictionary of errors, so these validation methods are redundant).

For the 1.0.0 release, deprecation warnings should be raised.

Deserialization errors not accessible from exception

I'd like to parse incoming request data using schema.load() and have my framework handle any parsing errors.

It seems that returning a clear error message to the user is hard to do, because I cannot figure out how to know from an UnmarshallingError which field actually failed.

Deserialization support is missing

I'm not sure whether this can be considered an issue, but I think that supporting deserialization would really benefit marshmallow.
Obviously not every serializer can provide the reverse operation, but many can. If this is not against your view of what this library should be, I can work on extending marshmallow to support it and prepare a pull request.

Email Field validation not working for Schema.dump()

Email field type validation does not appear to work when using Schema.dump() but works fine for Schema.load(). Working example included below:

from datetime import datetime
from marshmallow import Schema, fields, pprint

# model
class Person(object):
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.date_born = datetime.now()

# serializer schema
class PersonSchema(Schema):
    name = fields.String()
    email = fields.Email()
    date_born = fields.DateTime()

person = Person(name='Guido van Rossum', email='invalid-email')
schema = PersonSchema()
dumps = schema.dump(person)

print('--DUMPS--')
pprint(dumps.data)
pprint(dumps.errors)

loads = schema.load({'name': 'Guido van Rossum', 'email': 'invalid-email'})

print('--LOADS--')
pprint(loads.data)
pprint(loads.errors)

Required fields

I can't seem to find in the docs whether there is a way to make certain fields required. Is there no present implementation that marks the is_valid call as invalid if a certain field is missing? I would be willing to contribute 😄

Using many=True results in problem with Date type

So I have an everyday query like:

things = Thing.query.all()
ThingSerializer(things, many=True).data

This results in contained DateTime objects getting correctly serialized while Date objects don't get serialized!

Example output:

[OrderedDict([('end_date', datetime.date(2011, 1, 4)), ('updated_at', 'Fri, 06 Jun 2014 20:59:56 -0000')])]

Note that end_date is from my SQLAlchemy declarative model and it's a Column.Date type while updated_at is a Column.DateTime type.

However, doing

thing = Thing.query.first()
ThingSerializer(thing).data

results in

OrderedDict([('end_date', '2011-01-01'), ('updated_at', 'Fri, 06 Jun 2014 20:59:56 -0000')])

Note how end_date gets serialized in this case but not in the other and how updated_at always get serialized correctly.

I suspect a typo somewhere that has to do with Date not getting tested a lot or something. Hopefully you can find the issue quickly. :D

Error using "only" parameter with tuple in Serializer.__init__

You have an error which seems to be the result of accidentally iterating over a string as if it were an array. Here is the code to produce the error:

from marshmallow import Serializer
from marshmallow import fields


class UserInputSerializer(Serializer):
    email = fields.String()
    username = fields.String()


json = {"email": "blah"}

user = UserInputSerializer(json, only=('email'))

Error generated:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "<string>", line 12, in <module>
  File "/Users/miles/.../marshmallow/serializer.py", line 193, in __init__
    self._update_fields(obj)
  File "/Users/miles/.../marshmallow/serializer.py", line 294, in _update_fields
    ret = self.__filter_fields(self.only)
  File "/Users/miles/.../marshmallow/serializer.py", line 362, in __filter_fields
    '"{0}" is not a valid field for {1}.'.format(key, self.obj))
AttributeError: "e" is not a valid field for {'email': 'blah'}.

For diagnostic purposes, consider the following code (it shouldn't run -- but it should give a different error). If I change the name of the "email" field to "e", like so:

from marshmallow import Serializer
from marshmallow import fields


class UserInputSerializer(Serializer):
    e = fields.String()
    username = fields.String()


json = {"email": "blah"}

user = UserInputSerializer(json, only=('email'))

... and run this script, I get a similar error (notice the difference is "m" not "e"):

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "<string>", line 12, in <module>
  File "/Users/miles/.../marshmallow/serializer.py", line 193, in __init__
    self._update_fields(obj)
  File "/Users/miles/.../marshmallow/serializer.py", line 294, in _update_fields
    ret = self.__filter_fields(self.only)
  File "/Users/miles/.../marshmallow/serializer.py", line 362, in __filter_fields
    '"{0}" is not a valid field for {1}.'.format(key, self.obj))
AttributeError: "m" is not a valid field for {'email': 'blah'}.

If I change the field name to simply "e", use "e" as the key in the input, and pass only=('e'), the code does not generate an error:

from marshmallow import Serializer
from marshmallow import fields

class UserInputSerializer(Serializer):
    e = fields.String()
    username = fields.String()


json = {"e": "blah"}

user = UserInputSerializer(json, only=('e'))

(No error)

The good news is, it seems like the problem is only with tuples. The following code, using a list for the parameter, executes with no errors:

from marshmallow import Serializer
from marshmallow import fields


class UserInputSerializer(Serializer):
    email = fields.String()
    username = fields.String()


json = {"email": "blah"}

user = UserInputSerializer(json, only=['email'])

The problem appears to be in the __filter_fields function of serializer.py.

Please let me know if this is expected behavior and I'm doing something wrong...
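
A possible explanation, offered as a sketch rather than a confirmed diagnosis of marshmallow's internals: in Python, ('email') is just the string 'email' wrapped in parentheses, and only a trailing comma makes it a one-element tuple. Iterating over the string yields single characters, which would match the "e" and "m" in the error messages above, and would also explain why only=('e') happens to work. The helper collect_field_names below is hypothetical and only mimics iterating over the only argument.

```python
# ('email') is a parenthesized string; ('email',) is a one-element tuple.
as_string = ('email')
as_tuple = ('email',)


def collect_field_names(only):
    """Hypothetical stand-in: iterate over `only` the way a field filter might."""
    return [name for name in only]


print(type(as_string).__name__)        # str
print(type(as_tuple).__name__)         # tuple
print(collect_field_names(as_string))  # ['e', 'm', 'a', 'i', 'l']
print(collect_field_names(as_tuple))   # ['email']
print(collect_field_names(['email']))  # ['email']
```

If this is indeed the cause, writing only=('email',) with the trailing comma should behave the same as the list form.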

Better namespacing in class registry.

"Namespaces are one honking great idea -- let's do more of those!"

Currently, class registry uses a global dictionary. This works. However, when developing a versioned API (my current situation), this will potentially lead to schema names like V1_SomeSchema or SomeSchema_V2.

I understand why the class registry was implemented in this fashion. However, I'm proposing one of two changes.

Easy: Add a schema_group (or similarly named) attribute on schemas that groups schemas into... well, groups, with schema_group defaulting to something sane (such as default or base). This could be implemented either on the actual schema or (even better) on the Meta options for the Schema.

Example

# v1/schemas/__init__.py:
class SomeSchema(BaseSchema):
    class Meta:
        schema_group = 'v1'

# v2/schemas/__init__.py:
class SomeSchema(BaseSchema):
    class Meta:
        schema_group = 'v2'

And then _registry would resemble:

{
    'v1': {'SomeSchema': [v1.schemas.SomeSchema]},
    'v2': {'SomeSchema': [v2.schemas.SomeSchema]},
}

Of course, class_registry.get_class, and the places that rely on it such as fields.Nested, would also have to change to accommodate this.

Harder: Somehow create instances of the registry and explicitly pass them around, or somehow tie them to Schemas (think SQLAlchemy's metadata object). This would be more difficult to implement, as things like class_registry would need to change completely. Again, the most likely home for this would be on the Meta class:

v1_reg = Registry()

class SomeSchema(BaseSchema):
    class Meta:
        registry = v1_reg
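
To make the "easy" option more concrete, here is a minimal sketch of a registry keyed first by group and then by class name. GroupedRegistry, its register and get_class methods, and the "default" group name are all assumptions made for illustration; none of this is marshmallow's actual class_registry API.

```python
class GroupedRegistry:
    """Hypothetical registry that keys schema classes by group, then by name."""

    def __init__(self, default_group="default"):
        self.default_group = default_group
        self._groups = {}

    def register(self, classname, cls, group=None):
        """Store cls under (group, classname); same-named classes accumulate in a list."""
        group = group or self.default_group
        self._groups.setdefault(group, {}).setdefault(classname, []).append(cls)

    def get_class(self, classname, group=None):
        """Return the single class registered under classname in the given group."""
        group = group or self.default_group
        try:
            classes = self._groups[group][classname]
        except KeyError:
            raise LookupError(
                f"{classname!r} is not registered in group {group!r}"
            ) from None
        if len(classes) > 1:
            raise LookupError(f"{classname!r} is ambiguous in group {group!r}")
        return classes[0]


# Two unrelated classes that share a name, as in a versioned API.
V1SomeSchema = type("SomeSchema", (), {"version": 1})
V2SomeSchema = type("SomeSchema", (), {"version": 2})

registry = GroupedRegistry()
registry.register("SomeSchema", V1SomeSchema, group="v1")
registry.register("SomeSchema", V2SomeSchema, group="v2")

print(registry.get_class("SomeSchema", group="v1").version)  # 1
print(registry.get_class("SomeSchema", group="v2").version)  # 2
```

The same class could serve the "harder" option as well: each API version would create its own GroupedRegistry instance and reference it from Meta instead of sharing one global object.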

Factory function for creating a serialization function?

I've been toying around with the idea of a factory that allows you to generate serialization functions:

serialize_user = UserSerializer.factory()
serialize_user(user)  # {'name': 'Steve Loria' ...}

# Pass in default params
serialize_user = UserSerializer.factory(strict=True)
serialize_user(invalid_user)  # MarshallingError
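
One way the factory idea could work is a classmethod that closes over default keyword arguments. This is only a sketch: the UserSerializer and MarshallingError below are simplified stand-ins written for this example, not marshmallow's classes, and the "name is required" rule is an assumption used to trigger the strict error.

```python
class MarshallingError(Exception):
    """Stand-in for the error named in the example above."""


class UserSerializer:
    """Hypothetical minimal serializer; not marshmallow's Serializer class."""

    def __init__(self, obj, strict=False):
        self.errors = {}
        if not obj.get("name"):
            self.errors["name"] = "Missing name."
        if strict and self.errors:
            raise MarshallingError(self.errors)
        self.data = {"name": obj.get("name")}

    @classmethod
    def factory(cls, **defaults):
        """Return a function that serializes one object with preset keyword arguments."""

        def serialize(obj, **overrides):
            # Per-call keyword arguments take precedence over the factory defaults.
            options = {**defaults, **overrides}
            return cls(obj, **options).data

        return serialize


serialize_user = UserSerializer.factory()
print(serialize_user({"name": "Steve Loria"}))  # {'name': 'Steve Loria'}

strict_serialize_user = UserSerializer.factory(strict=True)
try:
    strict_serialize_user({})
except MarshallingError as err:
    print("strict mode raised:", err)
```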

How to handle (polymorphic) subclasses

I'm using SQLAlchemy's polymorphic identities and have been trying to figure out how to get the UserMarshal to use the BusinessProfileMarshal if the Profile attached to the User is actually a BusinessProfile:

class User(db.Model):
    profile = db.relationship('Profile', backref='users')

class Profile(db.Model):

    __mapper_args__ = {
        'polymorphic_identity': 'profile',
        'polymorphic_on': type
    }
    ...

class BusinessProfile(Profile):

    __tablename__ = 'profile_business'
    __mapper_args__ = {
        'polymorphic_identity': 'business',
    }
    ...


class UserMarshal(ma.Serializer):
    class Meta:
        fields = (
            'email',
            'profile',
        )

    profile = fields.Nested(ProfileMarshal)


class ProfileMarshal(ma.Serializer):
    class Meta:
        fields = (
            'first_name',
        )

class BusinessProfileMarshal(ma.Serializer):
    class Meta:
        fields = (
            'first_name',
            'company_name',
        )
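
One possible approach, sketched here without marshmallow or SQLAlchemy, is to pick the serializer from the runtime class of the nested object. The plain Profile and BusinessProfile classes, the FIELDS_BY_CLASS mapping, and the serialize_profile function are all hypothetical stand-ins for the models and Marshal classes above.

```python
class Profile:
    def __init__(self, first_name):
        self.first_name = first_name


class BusinessProfile(Profile):
    def __init__(self, first_name, company_name):
        super().__init__(first_name)
        self.company_name = company_name


# Map each model class to the field names its serializer would expose.
FIELDS_BY_CLASS = {
    Profile: ("first_name",),
    BusinessProfile: ("first_name", "company_name"),
}


def serialize_profile(profile):
    """Serialize using the most specific entry found along the object's MRO."""
    for cls in type(profile).__mro__:
        if cls in FIELDS_BY_CLASS:
            return {name: getattr(profile, name) for name in FIELDS_BY_CLASS[cls]}
    raise TypeError(f"No serializer registered for {type(profile).__name__}")


print(serialize_profile(Profile("Ada")))
# {'first_name': 'Ada'}
print(serialize_profile(BusinessProfile("Ada", "Analytical Engines")))
# {'first_name': 'Ada', 'company_name': 'Analytical Engines'}
```

Walking the MRO means a subclass without its own entry falls back to the serializer of its nearest registered parent.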

Change datetimes to ISO8601 format in docs

As of 1.0.0, DateTime fields serialize to ISO8601 format by default. This makes a number of the examples in the docs show incorrect output (the former default was RFC822). These examples should be updated.
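
For reference, the difference between the two formats can be seen with the standard library alone. This sketch does not call marshmallow, whose own formatting may differ in details such as the timezone suffix; the timestamp is the one used in an earlier example on this page.

```python
from datetime import datetime
from email.utils import format_datetime

updated_at = datetime(2014, 6, 6, 20, 59, 56)  # naive datetime, no tzinfo

print(updated_at.isoformat())       # ISO 8601: 2014-06-06T20:59:56
print(format_datetime(updated_at))  # RFC 822 style: Fri, 06 Jun 2014 20:59:56 -0000
```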
