Python Data Structures for Humans™.

Home Page: http://schematics.readthedocs.org/

License: Other

Python 100.00%

python validation datastructures types schema serialization deserialization

schematics's Introduction

Schematics

Python Data Structures for Humans™.

About

Project documentation: https://schematics.readthedocs.io/en/latest/

Schematics is a Python library to combine types into structures, validate them, and transform the shapes of your data based on simple descriptions.

The internals are similar to ORM type systems, but there is no database layer in Schematics. Instead, we believe that building a database layer is easy when Schematics handles everything except writing the query.

Schematics can be used for tasks where having a database involved is unusual.

Some common use cases:

Design and document specific data structures
Convert structures to and from different formats such as JSON or MsgPack
Validate API inputs
Remove fields based on access rights of some data's recipient
Define message formats for communications protocols, like an RPC
Custom persistence layers

Example

This is a simple Model.

>>> from schematics.models import Model
>>> from schematics.types import StringType, URLType
>>> class Person(Model):
...     name = StringType(required=True)
...     website = URLType()
...
>>> person = Person({'name': u'Joe Strummer',
...                  'website': 'http://soundcloud.com/joestrummer'})
>>> person.name
u'Joe Strummer'

Serializing the data to JSON.

>>> import json
>>> json.dumps(person.to_primitive())
'{"name": "Joe Strummer", "website": "http://soundcloud.com/joestrummer"}'

Let's try validating without a name value, since it's required.

>>> person = Person()
>>> person.website = 'http://www.amontobin.com/'
>>> person.validate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "schematics/models.py", line 231, in validate
    raise DataError(e.messages)
schematics.exceptions.DataError: {'name': ['This field is required.']}

Add the field and validation passes.

>>> person = Person()
>>> person.name = 'Amon Tobin'
>>> person.website = 'http://www.amontobin.com/'
>>> person.validate()
>>>

Testing & Coverage support

Run coverage and check the missing statements:

$ coverage run --source schematics -m py.test && coverage report

schematics's People

Contributors

d1on tomwaits tzuryby st0w sjhewitt droot seanoc talos gone brubeck jhalcrow cqfd sethmurphy chopachom nicolaiarocci martyanov fawce okoye keitheis rozza idonethis shreyansb eroh92 caustin nod pombredanne hansent davidykay laprice robspectre talksum jbeluch quantopian jnrowe-retired-forks szaydel nkey yalon jie st4lk gummihaf stuntgoat serverdensity ryanolson exitio krone chenl imbolc mauriciosl coppersmith lesnik mentat-enki jensrantil malicustommade olegpidsadnyi cristobalcl hakenkje reduxdj imclab web5design adamlwgriffiths nside bdickason models duncm joshzzheng punkrockpolly agiledata todazar estebistec voidfiles emmanuj tobigue kstrauser gennady-andreyev cmaron jvantuyl miphreal hoov lipvun bintoro thedrow jmizgajski eeaston kevinjqiu honewatson secat rpk jzlatov nvdnkpr svisser afthill yoloseem kura afey amineyaiche herczy ghotiv qin0385 radaniba martino

schematics's Issues

required=True/False is too binary ;-)

A fairly common feature of RESTful APIs is that there are fields which are expected on Read (GET) which aren't settable on Create (POST or maybe PUT). Examples of these fields are auto-increment IDs and datestamps. To deal with this in Dictshield, I've had to create two documents for each RESTful resource:

class BaseRepresentation(Document):
    id = URLField(required=True)
    createdAt = DateTimeField(required=True)
    updatedAt = DateTimeField(required=True)

class SalesShipment(BaseRepresentation):
    <list of fields>

class NewSalesShipment(Document):
    <identical list of fields>

Note that NewSalesShipment does not inherit from BaseRepresentation - because the id, createdAt and updatedAt fields shouldn't exist in a new sales shipment.

The above works fine but it is obviously less than DRY for long lists of fields. Also having a set of resources called New* looks a bit lame!

Just riffing here but maybe an alternative would be something like:

id = URLField(required=["GET","DELETE"])
...
shipment.validate("POST") # No id is fine

Just a thought! Feel free to close as not a bug, but thought it was worth raising.
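
The idea above could be sketched roughly as follows. This is a standalone illustration, not DictShield code: `Field`, `Resource`, `ValidationError`, and the `method` argument to `validate()` are hypothetical names invented for this sketch.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for DictShield's ShieldException."""


class Field:
    def __init__(self, required=False):
        # Assumption of this sketch: `required` is either a bool or a
        # collection of HTTP methods for which the field must be present.
        self.required = required

    def is_required(self, method=None):
        if isinstance(self.required, bool):
            return self.required
        return method in self.required


class Resource:
    fields = {}

    def __init__(self, **data):
        self.data = data

    def validate(self, method=None):
        # Collect every field that is required for this method but absent.
        missing = sorted(
            name for name, field in self.fields.items()
            if field.is_required(method) and self.data.get(name) is None
        )
        if missing:
            raise ValidationError("Required fields missing: %s" % ", ".join(missing))


class SalesShipment(Resource):
    fields = {
        "id": Field(required=["GET", "DELETE"]),
        "cost": Field(required=True),
    }


shipment = SalesShipment(cost=0.92)
shipment.validate("POST")  # passes: id is not required on POST
```

With this shape, a single class covers both the "new" and the "existing" representation, and `shipment.validate("GET")` would raise because `id` is missing.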

EmbeddedDocumentField doesn't allow a set to None

Hi, from the source I think this is a feature and not a bug, but I'd like a workaround. Setting required=False on an EmbeddedDocumentField lets you omit it when creating the model, but trying to set it to None later fails.

From the setter in source we can see that this is the intended behaviour:

def __set__(self, instance, value):
        if value is None:
            return
        if not isinstance(value, self.document_type):
            value = self.document_type(**value)
        instance._data[self.field_name] = value

So how can we set a EmbeddedDocumentField to None, once it has been assigned a value?

Thanks
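
One possible shape for a setter that accepts None is sketched below. It is a self-contained illustration rather than a patch to DictShield: `EmbeddedField`, `Address`, and `Customer` are hypothetical names, and only the `__set__` logic mirrors the snippet quoted above.

```python
class Address:
    def __init__(self, city=None):
        self.city = city


class EmbeddedField:
    """Hypothetical descriptor: like the quoted setter, but it stores None."""

    def __init__(self, document_type):
        self.document_type = document_type

    def __set_name__(self, owner, name):
        self.field_name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance._data.get(self.field_name)

    def __set__(self, instance, value):
        # Only build the embedded type from a mapping when a value is given;
        # None falls through and overwrites any previous value.
        if value is not None and not isinstance(value, self.document_type):
            value = self.document_type(**value)
        instance._data[self.field_name] = value


class Customer:
    address = EmbeddedField(Address)

    def __init__(self):
        self._data = {}


c = Customer()
c.address = {"city": "London"}
c.address = None  # clears the field instead of being ignored
```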

Make ujson a DictShield dependency

It's extremely fast. DictShield should use it by default.

ListField(required=True) doesn't invalidate

I have the following Document:

class Article(Document):
    url = URLField(required=True)
    pub_date = DateTimeField(required=True, default=datetime.datetime.utcnow)
    authors = ListField(ObjectIdField(required=True), required=True)

and the test:

article1 = Article(url=ARTICLE_URL, pub_date=datetime.datetime.strptime("21/11/12 21:00", "%d/%m/%y %H:%M"))
self.assertRaises(ShieldException, article1.validate)

The test fails because validate() does not raise an exception, even though the required authors field is missing.

required=True doesn't allow to set fields to None

Hello, I get "ShieldException: Required field missing - cover:None", while my cover is defined as follows:
cover = StringField(default=None, required=True)

from the following snippet from document.py around line ~310:

# treat empty strings is nonexistent
if value is not None and value != '':
    try:
        field._validate(value)
    except (ValueError, AttributeError, AssertionError):
        raise ShieldException('Invalid value', field.field_name,
                              value)
elif field.required:
    raise ShieldException('Required field missing',
                          field.field_name,
                          value)

I understand treating empty strings as nonexistent, but I just want to make sure I don't miss any field required by my schema. It is OK to have keys set to None or null in my MongoDB. I want some fields to be "required" mainly because I want to avoid a triple state in JSON documents: 1. the key is set and its value is not null, 2. the key is set and its value is null, 3. the key is not set. I just want a stricter schema that treats the third case as an error.

So I suggest introducing a new option, something like 'allow_empty', which only makes sense when required=True; when set to True, it allows None and '' as valid values.
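
The three states described above can be made concrete with a small sketch. `check_field` and `ValidationError` are hypothetical names used only for illustration; `allow_empty` is the option proposed in this issue, not an existing DictShield argument.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for ShieldException."""


def check_field(name, data, required=False, allow_empty=False):
    """Check presence of `name` in `data` under the proposed rules."""
    if name not in data:
        # State 3: the key is not set at all.
        if required:
            raise ValidationError("Required field missing: %s" % name)
        return
    value = data[name]
    if value is None or value == "":
        # State 2: the key is set but empty.
        if required and not allow_empty:
            raise ValidationError("Required field is empty: %s" % name)
    # State 1: the key is set and non-empty; type validation would go here.


# A null cover is accepted, but the key itself must be present.
check_field("cover", {"cover": None}, required=True, allow_empty=True)
```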

DateTimeField does not serialize properly when used in a ListField EmbeddedDocument

The following code snippet is using the most recent commit of dictshield, 995ba29

I suspect you're already aware of this issue, based on the note in commit 0558c82 but I wanted to enter something for tracking purposes and others' awareness.

import datetime

from dictshield.document import Document, EmbeddedDocument
from dictshield.fields import DateTimeField, EmbeddedDocumentField, ListField

class Item(EmbeddedDocument):
    due = DateTimeField()

class Container(Document):
    items = ListField(EmbeddedDocumentField(Item))

i = Item(due=datetime.datetime.utcnow())
c = Container(items=[i,])

print '------ Item -------'
print i.to_python()
print i.to_json()

print '----- Container -----'
print c.to_python()
print c.to_json()

Output ::
------ Item -------
{'_types': ['Item'], 'due': datetime.datetime(2011, 7, 25, 21, 24, 46, 510683), '_cls': 'Item'}
{"_types": ["Item"], "due": "2011-07-25T21:24:46.510683", "_cls": "Item"}
----- Container -----
{'_types': ['Container'], 'items': [{'_types': ['Item'], 'due': datetime.datetime(2011, 7, 25, 21, 24, 46, 510683), '_cls': 'Item'}], '_cls': 'Container'}
Traceback (most recent call last):
File "ex.py", line 21, in <module>
print c.to_json()
File "/Users/st0w/.virtualenvs/obts-tracking/lib/python2.6/site-packages/dictshield/base.py", line 443, in to_json
return json.dumps(data)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/__init__.py", line 230, in dumps
return _default_encoder.encode(obj)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 367, in encode
chunks = list(self.iterencode(o))
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 309, in _iterencode
for chunk in self._iterencode_dict(o, markers):
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 275, in _iterencode_dict
for chunk in self._iterencode(value, markers):
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 306, in _iterencode
for chunk in self._iterencode_list(o, markers):
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 204, in _iterencode_list
for chunk in self._iterencode(value, markers):
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 309, in _iterencode
for chunk in self._iterencode_dict(o, markers):
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 275, in _iterencode_dict
for chunk in self._iterencode(value, markers):
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 317, in _iterencode
for chunk in self._iterencode_default(o, markers):
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 323, in _iterencode_default
newobj = self.default(o)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 344, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: datetime.datetime(2011, 7, 25, 21, 24, 46, 510683) is not JSON serializable
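
Until the library converts nested values itself, a caller-side workaround is to pass a fallback encoder to the standard library's `json.dumps` through its `default` parameter. The sketch below uses plain dicts in place of the documents above; `encode_extra` is a hypothetical helper name.

```python
import datetime
import json


def encode_extra(obj):
    """Fallback called by json.dumps for objects it cannot serialize."""
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    raise TypeError(repr(obj) + " is not JSON serializable")


# Same nesting as Container -> items -> Item.due in the report above.
container = {"items": [{"due": datetime.datetime(2011, 7, 25, 21, 24, 46, 510683)}]}
encoded = json.dumps(container, default=encode_extra)
```

This does not fix the underlying bug; it only keeps the datetime from reaching the encoder unconverted.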

id fields don't subclass correctly

The following code does not honor the id field via subclassing. Comment or uncomment the id field in SalesShipment in the code snippet below to demonstrate the bug.

#!/usr/bin/env python

from dictshield.base import ShieldException
from dictshield.document import Document
from dictshield.fields import (URLField,
                               DateTimeField,
                               FloatField,
                               StringField)

class BaseRepresentation(Document):
    id = URLField(required=True)
    createdAt = DateTimeField(required=True)
    updatedAt = DateTimeField(required=True)

class SalesShipment(BaseRepresentation):
    id = URLField(required=True)
    cost = FloatField(required=True, min_value=0)
    currency = StringField(required=True)


data = {
  "id" : "http://localhost:8080/sales-shipments/21e5bde8-f7e9-11e0-be50-0800200c9a66",
  "currency" : "GBP",
  "updatedAt" : "2011-10-28T19:39:22.783271",
  "createdAt" : "2011-10-11T08:39:30",
  "cost" : 0.92
}


shipment = SalesShipment(**data)
try:
    shipment.validate()
except ShieldException, se:
    print 'ShieldException caught: %s' % se

print "This shipment cost %f %s" % (shipment.cost, shipment.currency)

Documentation has to be updated

Example Uses
There are a few ways to use DictShield. A simple case is to create a class structure that has typed fields. DictShield offers multiple types in fields.py, like an EmailField or DecimalField.

There is no fields.py file; all field implementations are located in the fields directory (fields/__init__.py, base.py, temporal.py, and mongo.py). An earlier version may have had a fields.py, but the docs need to be updated to reflect the current layout.

Postpone validation until validate() is called

Example:

from dictshield.document import Document
from dictshield.fields import StringField, DateTimeField

class BlogPost(Document):
    title = StringField(max_length=40)
    body = StringField(max_length=4096)
    dt = DateTimeField()

data = {
    'title': 'aaa',
    'body': 'bbb',
    'dt': 'ccc'
}

#bp = BlogPost(**data)
bp = BlogPost()
bp.title = data['title']
bp.body = data['body']
bp.dt = data['dt']
bp.validate()

Gives me:

Traceback (most recent call last):
  File "test1.py", line 19, in <module>
    bp.dt = data['dt']
  File "/Users/up/.virtualenvs/test/lib/python2.7/site-packages/dictshield/fields/base.py", line 561, in __set__
    value = DateTimeField.iso8601_to_date(value)
  File "/Users/up/.virtualenvs/test/lib/python2.7/site-packages/dictshield/fields/base.py", line 586, in iso8601_to_date
    date_info = elements[0]
IndexError: list index out of range

and validate() is never reached.
The problem with this approach is that I have to wrap this code in try blocks covering every possible exception plus ShieldException on validate(), or catch a generic Exception, which is a bad idea. I must also catch ValueError because the inner implementation of DateTimeField uses datetime.datetime(). The same applies to ObjectId with the InvalidId exception, and possibly to other *Fields.

>>> date_digits = [1000, 1000, 1000, 1000, 1000, 1000, 1000,]
>>> datetime.datetime(*date_digits)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: month must be in 1..12

My guess is that __set__() should just store the value on its object, and validation should be postponed until validate() is actually called. Let the user call validate() manually, and have it throw only ShieldException. If a user wants to know how the date is converted from its string representation to a datetime object, introduce a separate _check() or _convert() that exposes any specific error (ValueError and IndexError in the example case).

In other words:
If a user doesn't want to know what exactly is wrong in data coming from, for example, a web form, they just wrap model.validate() in try ... except ShieldException, not in handlers for every possible exception or a wildcard Exception. If a user wants to validate each field separately, they call validate() on each field (which wraps all data-conversion exceptions) and catch only ShieldException. If a user wants to catch a specific low-level data-conversion error, they call _check() or _convert() on each field object (or _check_all() on the model?), which is also used internally and wrapped by validate(). Actually, I have no idea who needs these specific exceptions; a ShieldException saying that the model or field is wrong is enough, so I propose keeping _check() or _convert() private.
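
A minimal sketch of the proposed split between storing, converting, and validating follows. It is not DictShield's implementation: `ValidationError` stands in for ShieldException, `convert()` plays the role of the suggested `_convert()`, and the fixed `strptime` format is a simplification chosen for this example.

```python
import datetime


class ValidationError(Exception):
    """Hypothetical stand-in for ShieldException."""


class DateTimeField:
    def __set_name__(self, owner, name):
        self.field_name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance._data.get(self.field_name)

    def __set__(self, instance, value):
        # Store the raw value; no conversion happens on assignment.
        instance._data[self.field_name] = value

    def convert(self, value):
        # Low-level conversion; may raise ValueError or TypeError.
        if isinstance(value, datetime.datetime):
            return value
        return datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")

    def validate(self, value):
        # Wrap every conversion error in the single public exception type.
        try:
            return self.convert(value)
        except (ValueError, TypeError) as exc:
            raise ValidationError("%s: %s" % (self.field_name, exc))


class BlogPost:
    dt = DateTimeField()

    def __init__(self):
        self._data = {}

    def validate(self):
        type(self).dt.validate(self.dt)


bp = BlogPost()
bp.dt = "ccc"  # accepted here; the error surfaces only in bp.validate()
```

Callers then need a single `except ValidationError` around `bp.validate()`, which is the behaviour requested above.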

Coerce IntField

Hello,

This may be a feature rather than a bug, but I found that IntField does not coerce its value to an int, while UUIDField does coerce a string to a UUID:

class Book(Document):
    uuid = UUIDField() # assign a string, it will create a UUID
    title = StringField(max_length=60)
    year = IntField() # assign a string... won't cast to int

Maybe I'm missing something... if so, how do I enforce an int?
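
For comparison, a coercing integer field could look roughly like this. The class below is a hypothetical standalone descriptor, not DictShield's IntField, and it assumes that coercion on assignment is the desired behaviour.

```python
class IntField:
    """Hypothetical field that coerces assigned values to int."""

    def __set_name__(self, owner, name):
        self.field_name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.field_name)

    def __set__(self, instance, value):
        if value is not None and not isinstance(value, int):
            value = int(value)  # raises ValueError for input such as "abc"
        instance.__dict__[self.field_name] = value


class Book:
    year = IntField()


book = Book()
book.year = "1984"  # stored as the integer 1984
```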

Recommend a BCrypt Field

For no other reason than to promote it and make it readily accessible :)

EmbeddedDocumentField does not work when not in a ListField

I believe this may be related to issue #5, however I don't have permission to re-open it and I felt it may be different enough to warrant creation of a new issue.

Currently, EmbeddedDocumentFields only work when used in the context of a ListField. When attempting to embed directly in an object, an error is generated upon instantiating an instance of said object. Here's the most simple example for comparison:

from dictshield.document import EmbeddedDocument,Document
from dictshield.fields import StringField, EmbeddedDocumentField, ListField

class Action(EmbeddedDocument):
    name = StringField(required=True, max_length=256)

class TaskL(Document):
    action = ListField(EmbeddedDocumentField(Action))

class Task(Document):
    action = EmbeddedDocumentField(Action)

a = Action(name='Phone call')

tl = TaskL()
t=Task()

Upon running this, an error is generated when python reaches the last line:

Traceback (most recent call last):
  File "test-simple.py", line 16, in <module>
    t=Task()
  File "/Users/st0w/.virtualenvs/obts-tracking/lib/python2.6/site-packages/dictshield/base.py", line 312, in __init__
    setattr(self, attr_name, value)
  File "/Users/st0w/.virtualenvs/obts-tracking/lib/python2.6/site-packages/dictshield/fields.py", line 458, in __set__
    value = self.document_type(**value)
TypeError: DocumentMetaclass object argument after ** must be a mapping, not NoneType

Add support for MongoDB BinaryField

Here's what works for me:

class BinaryField(BaseField):
    def __init__(self, subtype=None, **kwargs):
        self.subtype = subtype
        super(BinaryField, self).__init__(**kwargs)

    def __set__(self, instance, value):
        if isinstance(value, (str, unicode)):
            kwargs = {}
            if not self.subtype is None:
                kwargs = {'subtype': self.subtype}
            value = pymongo.binary.Binary(value, **kwargs)

        instance._data[self.field_name] = value

    def _jsonschema_type(self):
        return 'string'

    def for_python(self, value):
        try:
            return pymongo.binary.Binary(value)
        except Exception, e:
            raise ShieldException('Invalid Binary', self.field_name, value)

    def for_json(self, value):
        return str(value)

    def validate(self, value):
        if not isinstance(value, pymongo.binary.Binary):
            try:
                kwargs = {}
                if not self.subtype is None:
                    kwargs = {'subtype': self.subtype}
                value = pymongo.binary.Binary(value, **kwargs)
            except Exception, e:
                raise ShieldException('Invalid Binary', self.field_name, value)
        return value

Another documentation clarification

The documentation doesn't make clear that make_json_publicsafe and make_json_ownersafe take the document as an argument, e.g:

data = customer.make_json_publicsafe(customer)

(Out of interest, why is it necessary to pass the customer back into the function as an argument?)

'dict' object has no attribute 'to_json' marshalling error for complex Documents

There's an issue with make_json_*safe for complex representations. Here is the error:

Traceback (most recent call last):
  File "./scratchpad.py", line 205, in <module>
    data = rep.make_json_ownersafe(rep)
  File "/dictshield/document.py", line 183, in make_json_ownersafe
    white_list=white_list)
  File "/dictshield/dictshield/document.py", line 151, in make_safe
    doc_dict[k] = doc_converter(v)
  File "/dictshield/dictshield/document.py", line 177, in <lambda>
    doc_converter = lambda d: d.make_json_ownersafe(doc_encoder(d), encode=False)
  File "/dictshield/dictshield/document.py", line 183, in make_json_ownersafe
    white_list=white_list)
  File "/dictshield/dictshield/document.py", line 156, in make_safe
    doc_dict[k] = field_converter(k, v)
  File "/dictshield/dictshield/document.py", line 175, in <lambda>
    field_converter = lambda f, v: cls._fields[f].for_json(v)
  File "/dictshield/dictshield/fields/base.py", line 604, in for_json
    return value.to_json(encode=False)
AttributeError: 'dict' object has no attribute 'to_json'

There's a gist with a complete test script for this problem here: https://gist.github.com/1427099

Design question: why does Document.validate() raise an exception for one particular field?

The Document.validate() method loops through all fields (in no predictable order, as _fields is just a regular dict) and raises a ShieldException for the first error it finds, which means when you validate() a model you have no idea how many of the fields are invalid. An obvious use case where this isn't ideal is form validation - the user filling out the form can only be alerted of one error when there might be many (unless you explicitly validate each field one by one, but that defeats the point of having a Document-level validate() method)

Is there a benefit to the current behavior that I'm missing?
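
An alternative that reports every invalid field at once might be sketched like this. All names here (`validate_all`, `non_empty`, `positive`, and this `ValidationError`) are hypothetical; the sketch only shows the aggregation pattern, not DictShield's validate().

```python
class ValidationError(Exception):
    def __init__(self, errors):
        super().__init__(errors)
        self.errors = errors  # maps field name -> error message


def validate_all(data, validators):
    """Run every validator and raise once with all failures collected."""
    errors = {}
    for name in sorted(validators):  # sorted for a predictable order
        try:
            validators[name](data.get(name))
        except (ValueError, TypeError) as exc:
            errors[name] = str(exc)
    if errors:
        raise ValidationError(errors)


def non_empty(value):
    if not value:
        raise ValueError("This field is required.")


def positive(value):
    if value is None or value <= 0:
        raise ValueError("Must be a positive number.")
```

A form handler could then catch the single exception and show `exc.errors` next to each input.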

Ability to call make_*safe() routines directly on EmbeddedDocuments

It would seem a good idea to have the ability to call the make_*safe() routines directly on EmbeddedDocument objects. As a use case, consider where the possible options for EmbeddedDocument values are stored in a table in a database, and then used to generate the EmbeddedDocument objects at runtime, which are then used within the embedding objects.

One might desire to present a list of such possible values for an EmbeddedDocument utilizing the make_json_publicsafe() method to sanitize the results. However as it currently stands, this cannot be done because the make_*safe() routines are part of the Document class.

Alternatively, one may wish to present only the values of a particular EmbeddedDocument. However as it currently stands, the only method is to make_publicsafe() on the Python object, and then use json.dumps() since the result of make_publicsafe() is a dict rather than a Document object.

Could they be moved to the BaseDocument class so all derived classes have access, or would that cause problems with how they're currently called on embedded documents?

Feature request: add global for underscored internal fields only

To work with the new permissions system I'm having to override the _internal_private_fields in a few places:

class BaseRepresentation(Representation):
    """The fields shared by all existing representations
    """
    _internal_fields = ['_id', '_cls', '_types'] # Exclude id as this is an actual field

    id = UUIDField(required=True)
    created_at = DateTimeField(required=True)
    updated_at = DateTimeField(required=True)

class RepresentationLink(EmbeddedDocument):
    """A RepresentationLink holds the ID of and HATEOAS
       path to an individual Representation
    """
    _internal_fields = ['_id', '_cls', '_types'] # Exclude id as this is an actual field

    id = UUIDField(required=True)
    link = EmbeddedDocumentField(AtomLink)

    def __repr__(self):
        return "<RepresentationLink(%s, %s)>" % (self.id, self.link)

This works okay but it's a bit fragile - if DictShield introduces a new internal field (e.g. _timestamp), then my code will start exposing this field incorrectly. One solution would be a DictShield global like:

UNDERSCORE_ONLY = ['_id', '_cls', '_types']

And then I could override with:

_internal_fields = UNDERSCORE_ONLY

I imagine that this will be a very common use case - because id is a very common public field in JSONs.

Add an enum field

Is there something similar to an enum field in dictshield? I understand there is no enum in Python, but it would still be nice to be able to limit the range of possible values for a field. Something similar to this:

class mydocument(Document):
    title = StringField(max_length=60, required=True)
    collection = EnumField(values=["books", "magazines","papers"])

I'm willing to implement it if I can get some good advice from you.
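
As a starting point, the core check for such a field is small. The sketch below is standalone and hypothetical: `EnumField` here does not derive from any DictShield base class, and `ValidationError` stands in for ShieldException.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for ShieldException."""


class EnumField:
    def __init__(self, values):
        # Freeze the allowed values so later mutation cannot change them.
        self.values = tuple(values)

    def validate(self, value):
        if value not in self.values:
            raise ValidationError("%r is not one of %r" % (value, self.values))
        return value


collection = EnumField(values=["books", "magazines", "papers"])
collection.validate("books")  # returns "books"
```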

Rename EmbeddedDocument to EmbeddableDocument to reflect it works as a Document too

The examples on embedded documents imply that a typical representation will inherit either from Document or from EmbeddedDocument, depending on whether the item is a document itself or embedded within a larger document.

In fact in a RESTful API, every Document is typically also an EmbeddedDocument - because GET "/orders/1" yields an individual order Document, but GET "/orders" will return a list of all orders, which I can define like this in DictShield:

class OrderWrapper(Document):
    orders = ListField(EmbeddedDocumentField(Order))

It turns out that it's fine in DictShield to define a representation as both a document and an embedded document, like this:

class Order(Document, EmbeddedDocument):
    <blah>

I wasn't expecting this to be okay, because Document and EmbeddedDocument sound like alternatives (unlike, say, Document and EmbeddableMixin)...

Maybe something should be added to the documentation to make it clear that it's possible to make a Document "embeddable" by mixing in EmbeddedDocument?

Support for naming convention transforms

Naming convention transforms are a common feature in serialisation libs (e.g. Java Jackson) and ORMs (e.g. Python SQLAlchemy and Scala Squeryl). The basic idea is to support different naming conventions (e.g. camelCase, snake_case) expressed in the JSON/SQL table/whatever, without having to breach the style guide of the host language.

At the moment with Dictshield, if I have a JSON which contains updatedAt and createdAt, then I need to rename my Python fields to match - I can't use updated_at and created_at.

The Squeryl way of supporting this is really DRY:

// Auto-translate Scala camelCase field names into database lower_underscore field names
override def columnNameFromPropertyName(n:String) =
  SquerylNamingConventionTransforms.camelCase2LowerUnderscore(n)

The SQLAlchemy conversion functions are in model_generator.py

Jerkson (Scala Jackson) uses a @JsonSnakeCase annotation per-field, powered by a snakeCase method.
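
In Python, the two transforms themselves are short. The helper names below (`camel_to_snake`, `snake_to_camel`, `rename_keys`) are hypothetical and not part of DictShield; the regex treats every capital letter as a word boundary, so acronyms such as "HTTPServer" would need extra handling.

```python
import re


def camel_to_snake(name):
    """updatedAt -> updated_at"""
    # Insert "_" before each capital letter that is not at the start.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_to_camel(name):
    """created_at -> createdAt"""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def rename_keys(data, transform):
    """Apply a naming transform to the top-level keys of a dict."""
    return {transform(key): value for key, value in data.items()}


incoming = {"updatedAt": "2011-10-28T19:39:22", "createdAt": "2011-10-11T08:39:30"}
pythonic = rename_keys(incoming, camel_to_snake)
```

Applying `rename_keys(pythonic, snake_to_camel)` reverses the mapping for output.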

for_json() attempting to apply isoformat() to a string

This is with the latest version in pip. Test script:

#!/usr/bin/env python

from dictshield.document import Document, EmbeddedDocument
from dictshield.base import UUIDField
from dictshield.fields import DateTimeField, EmbeddedDocumentField
from datetime import datetime

class TestRepresentation(EmbeddedDocument):
    id = UUIDField(required=True)
    created_at = DateTimeField(required=True)

    def __init__(self, id, created_at):
        super(EmbeddedDocument, self).__init__() # Need to call Document constructor
        self.id = id
        self.created_at = created_at

class RootRepresentation(Document):
    test = EmbeddedDocumentField(TestRepresentation, required=True)

    def __init__(self, test):
        super(Document, self).__init__() # Need to call Document constructor
        self.test = test

test = TestRepresentation("c3491590-1ce9-11e1-8bc2-0800200c9a66", datetime.now())
print test.make_json_ownersafe(test)

root = RootRepresentation(test)
print root.make_json_ownersafe(root)

Output:

{"created_at": "2011-12-02T14:28:31.239244"}
Traceback (most recent call last):
  <snip>
  File "/usr/local/lib/python2.7/dist-packages/dictshield/document.py", line 176, in <lambda>
    field_converter = lambda f, v: cls._fields[f].for_json(v)
  File "/usr/local/lib/python2.7/dist-packages/dictshield/fields/base.py", line 362, in for_json
    v = DateTimeField.date_to_iso8601(value)
  File "/usr/local/lib/python2.7/dist-packages/dictshield/fields/base.py", line 351, in date_to_iso8601
    iso_dt = dt.isoformat()
AttributeError: 'str' object has no attribute 'isoformat'

make_json_ownersafe doesn't traverse list of EmbeddedDocumentFields

>>> class Test(EmbeddedDocument):
...     text = StringField()
...
>>> class Tester(Document):
...     items = ListField(EmbeddedDocumentField(Test))
...
>>> t = Tester(items=[Test(text='mytest')])
>>> Tester.make_json_ownersafe(t)
'{"items": [{"_types": ["Test"], "text": "mytest", "_cls": "Test"}]}'

Without the ListField wrapping the embedded documents, it works just fine.

Ability to manually override field names

At the moment there is no way to unmarshal the following JSON using DictShield:

{
  "code" : "GBP",
  "self" : "http://localhost:8080/currencies/GBP" # Replacing with "link": works fine
}

... because the self key can't be defined as a field in a Document subclass. And similarly it's hard to work with the following:

{
    "customer" : "Bob",
    "email" : "[email protected]",
    "link" : {
      "rel" : "self",
      "href" : "http://localhost:8080/customer/2",
      "type" : "text/xml"
    },
    "link" : {
      "rel" : "next",
      "href" : "http://localhost:8080/customer/3",
      "type" : "text/xml"
    },
    "link" : {
      "rel" : "prev",
      "href" : "http://localhost:8080/customer/1",
      "type" : "text/xml"
    },
}

... because there are three elements with the same name.

It would be nice to have some sort of Jackson @JsonProperty-style naming override, like this:

self_link = URLField(property_name="self", required=True)

self_atom = EmbeddedDocumentField(AtomLink, property_name="link")
next_atom = EmbeddedDocumentField(AtomLink, property_name="link")
prev_atom = EmbeddedDocumentField(AtomLink, property_name="link")
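
Both problems can be worked around at the JSON layer today, as the sketch below suggests. `PROPERTY_NAMES` and `collect_pairs` are hypothetical names; `object_pairs_hook` is a real parameter of the standard library's `json.loads`.

```python
import json

# Hypothetical mapping from Python attribute name to JSON property name.
PROPERTY_NAMES = {"code": "code", "self_link": "self"}

raw = '{"code": "GBP", "self": "http://localhost:8080/currencies/GBP"}'
payload = json.loads(raw)
currency = {attr: payload[key] for attr, key in PROPERTY_NAMES.items()}


def collect_pairs(pairs):
    """Keep every value for a repeated key instead of only the last one."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    # Keys seen once keep their plain value; repeated keys become a list.
    return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}


dup = '{"customer": "Bob", "link": {"rel": "self"}, "link": {"rel": "next"}}'
customer = json.loads(dup, object_pairs_hook=collect_pairs)
rels = [link["rel"] for link in customer["link"]]
```

Without the hook, `json.loads` silently keeps only the last "link" object.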

Reason why UUIDField is in base.py not fields.py?

It looks like the original commit was into fields.py but somewhere along the line the UUIDField got moved into base.py - is there a reason why? Maybe a comment could be added to the code if it's important it stays there.

Update DictShield version in pip

Unless I'm mistaken the version of DictShield in pip is very old - e.g. it doesn't have a UUIDField yet...

Can't pass "id" to a document class

I get this error:

  File "/usr/local/lib/python2.6/dist-packages/dictshield/document.py", line 262, in __init__
    setattr(self, attr_name, attr_value)
  File "/usr/local/lib/python2.6/dist-packages/dictshield/fields/base.py", line 161, in __set__
    value = uuid.UUID(value)
  File "/usr/lib/python2.6/uuid.py", line 134, in __init__
    raise ValueError('badly formed hexadecimal UUID string')

Add method to return just fields and values

When using a model to build a SQL query (and possibly other use cases), it's common to want to iterate over just the fields and their values in the dictionary.

A method that excluded key/value pairs that are not a field in the model (like _cls, _types, etc.) would be helpful.
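
What such a method might return is sketched below with a plain dict. `field_items` is a hypothetical helper, and the `books` table and its columns are made up for the example.

```python
def field_items(data, field_names):
    """Yield (name, value) pairs for declared fields only."""
    for name in field_names:
        if name in data:
            yield name, data[name]


# A serialized document, including internal keys such as _cls and _types.
doc = {"_cls": "Book", "_types": ["Book"], "title": "Dune", "year": 1965}

pairs = list(field_items(doc, ["title", "year"]))
columns = ", ".join(name for name, _ in pairs)
placeholders = ", ".join("?" for _ in pairs)
sql = "INSERT INTO books (%s) VALUES (%s)" % (columns, placeholders)
params = [value for _, value in pairs]
```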

Can't load UUIDField

For some odd reason I can't seem to import the UUIDField. This is with the latest (master) version of DictShield:

>>> import dictshield
>>> from dictshield.fields import StringField
>>> from dictshield.base import UUIDField
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name UUIDField

Move MongoDB specific stuff into separate module

The plan is to move fields.py into fields/__init__.py and then add a mongo.py module to contain the MongoDB-specific ObjectIdField.

Design question: is field deletion avoided on purpose?

I'm considering hooking dictshield in with an ongoing project, and noticed the lack of __del__ methods on dictshield.fields.*. Is this a deliberate design decision? Or is it more "haven't needed it yet, why complexify unnecessarily"?

FWIW, I'd like to log destructive accesses to fields (__set__ and __del__ if it appears) for some crude STM.
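
For reference, the descriptor hook that Python calls on `del obj.attr` is `__delete__`; `__del__` is the object finalizer. A logging field along the lines described above could be sketched as follows, with `LoggedField` and `Record` as hypothetical names.

```python
class LoggedField:
    """Hypothetical descriptor that records destructive accesses."""

    def __set_name__(self, owner, name):
        self.field_name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance._data.get(self.field_name)

    def __set__(self, instance, value):
        instance._log.append(("set", self.field_name, value))
        instance._data[self.field_name] = value

    def __delete__(self, instance):
        # Called for `del instance.<field>`.
        instance._log.append(("delete", self.field_name))
        instance._data.pop(self.field_name, None)


class Record:
    name = LoggedField()

    def __init__(self):
        self._data = {}
        self._log = []


r = Record()
r.name = "a"
del r.name
```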

Extending Document with metaclass

I want to extend my Model with metaclass, is it possible?

    class ModelMetaClass(document.TopLevelDocumentMetaclass):
        def __new__(cls, name, bases, attrs):
            super_new = super(ModelMetaClass, cls).__new__
            klass = super_new(cls, name, bases, attrs)

            klass._collection_name = cls.__name__.lower()

            return klass

    class Model(Document, TimeStamped):
        __metaclass__ = ModelMetaClass

I get this error:

Traceback (most recent call last):
  File "/Users/up/t/tests/test_models.py", line 25, in test_timestamped
    class Model(Document, TimeStamped):
  File "/Users/up/t/models/tests/test_models.py", line 21, in __new__
    klass = super_new(cls, name, bases, attrs)
  File "/Users/up/.virtualenvs/dictshield/lib/python2.7/site-packages/dictshield/document.py", line 195, in __new__
    for field_name, field in klass._fields.items():
AttributeError: type object 'Model' has no attribute '_fields'

make_json_*safe() functions do not call for_json() to clean field values

Problem in the master branch - consider e.g:

@classmethod
def make_json_publicsafe(cls, doc_dict_or_dicts):
    """Trims the object using make_publicsafe and dumps to JSON
    """
    trimmed = cls.make_publicsafe(doc_dict_or_dicts)
    return json.dumps(trimmed)

There is no calling of for_json() for each field prior to dumping, unlike with e.g. base.py's to_json():

def to_json(self, encode=True):
    """Return data prepared for JSON. By default, it returns a JSON encoded
    string, but disabling the encoding to prevent double encoding with
    embedded documents.
    """
    fun = lambda f, v: f.for_json(v)
    data = self._to_fields(fun)
    if encode:
        return json.dumps(data)
    else:
        return data

Example of the problem:

#!/usr/bin/env python

from dictshield.base import ShieldException
from dictshield.document import Document
from dictshield.fields import DateTimeField
from datetime import datetime

class TestRepresentation(Document):
    created_at = DateTimeField(required=True)

test = TestRepresentation()
test.created_at = datetime.now()

try:
    test.validate()
except ShieldException, se:
    print "Narcolepsy validation error: %s" % se

data = test.make_json_publicsafe(test)

Throws error:

  File "/usr/lib/python2.7/dist-packages/simplejson/encoder.py", line 192, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: datetime.datetime(2011, 11, 30, 18, 31, 45, 842086) is not JSON serializable
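
The traceback is what `json.dumps` does with any `datetime` it is handed directly. As a rough sketch of converting values before dumping, the standard library's `default` hook can be used. `to_jsonable` is a hypothetical helper and not part of DictShield; a fix inside the library would presumably call each field's for_json() instead.

```python
import json
from datetime import datetime


def to_jsonable(value):
    """Hypothetical fallback that json.dumps calls for unknown types."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(repr(value) + " is not JSON serializable")


# Stand-in for the trimmed dict that make_publicsafe would return.
trimmed = {'created_at': datetime(2011, 11, 30, 18, 31, 45, 842086)}
encoded = json.dumps(trimmed, default=to_jsonable)
```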

Allow '$' and '.' in DictField keys

The restriction on https://github.com/j2labs/dictshield/blob/master/dictshield/fields/base.py#L644 is undocumented and looks completely arbitrary.

add validation to minimized_field_name

so that two fields cannot have the same minimized name; a clash should throw an exception at initialization time.

Currently, the exception is cryptic and does not make the root cause obvious.
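
A minimal sketch of such a check, assuming the minimized names are available as a plain mapping from full field name to minimized name. `check_minimized_names` is a hypothetical function, not an existing DictShield API.

```python
def check_minimized_names(minimized):
    """Raise ValueError if two fields share a minimized name.

    `minimized` maps full field names to their minimized names.
    """
    seen = {}  # minimized name -> first field that claimed it
    for field_name in sorted(minimized):
        short = minimized[field_name]
        if short in seen:
            raise ValueError(
                "fields %r and %r share minimized name %r"
                % (seen[short], field_name, short))
        seen[short] = field_name


check_minimized_names({'username': 'u', 'email': 'e'})  # passes silently
```

Naming both offending fields in the message is the point: it makes the root cause visible at class creation time.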

Support for to_python using small key names

Rob Spychala requests support for shortening key names so they take less space in Mongo.

https://twitter.com/robspychala/status/57914932720697344

Error when using make_json_publicsafe() on Document with EmbeddedDocumentField

Sample code:

from dictshield.document import Document, EmbeddedDocument
from dictshield.fields import EmbeddedDocumentField, IntField, StringField

class Status(EmbeddedDocument):
    _public_fields = ('status_id',
                      'name',)

    status_id = IntField(required=True, min_value=1)
    name = StringField(required=True, max_length=64)


class StudySubject(Document):
    _public_fields = ('subj_id',
                      'status',)

    subj_id = IntField(required=True, min_value=1)
    status = EmbeddedDocumentField(Status, required=True)

stat = Status(name='ON STUDY', status_id=2)

subj = StudySubject(
    status=stat,
    subj_id=123,
)

print subj.to_python()
print subj.make_publicsafe(subj)
print subj.make_json_publicsafe(subj)

Resulting output:

{'status': <Status: Status object>, '_types': ['StudySubject'], '_cls': 'StudySubject', 'subj_id': 123}
{'status': <Status: Status object>, 'subj_id': 123}
Traceback (most recent call last):
  File "./test-status.py", line 30, in <module>
    print subj.make_json_publicsafe(subj)
  File "/Users/st0w/.virtualenvs/obts-tracking/lib/python2.6/site-packages/dictshield/document.py", line 147, in make_json_publicsafe
    return json.dumps(trimmed)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 367, in encode
    chunks = list(self.iterencode(o))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 275, in _iterencode_dict
    for chunk in self._iterencode(value, markers):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 317, in _iterencode
    for chunk in self._iterencode_default(o, markers):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 323, in _iterencode_default
    newobj = self.default(o)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 344, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <Status: Status object> is not JSON serializable

make_json_*safe() methods adding lots of escaping to the output

Master branch: make_json_*safe() are outputting JSONs with a ton of incorrect extra \" escaping.
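
One common cause of this symptom is double encoding, where a string that is already JSON gets passed through `json.dumps` again. Whether that is what happens here is an assumption, not something the report confirms. The effect can be reproduced with the standard library alone:

```python
import json

doc = {'title': 'Misc Media'}

once = json.dumps(doc)    # a JSON object
twice = json.dumps(once)  # a JSON string whose content is escaped JSON
```

Here `twice` contains `\"` around every key and value, and it takes two `json.loads` calls to get back to `doc`.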

JSON field called "self" throws DictShield error

Test code:

#!/usr/bin/env python

from dictshield.base import ShieldException
from dictshield.document import Document
from dictshield.fields import StringField

data = {
  "code" : "GBP",
  "self" : "http://localhost:8080/currencies/GBP" # Replacing with "link": works fine
}

class Currency(Document):
    code = StringField(required=True)

curr = Currency(**data)
try:
    curr.validate()
except ShieldException, se:
    print 'ShieldException caught: %s' % se

print "This currency's code is %s" % curr.code

Error:

Traceback (most recent call last):
  File "./scratchpad.py", line 14, in <module>
    curr = Currency(**data)
TypeError: __init__() got multiple values for keyword argument 'self'
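
The error message suggests the constructor takes field values as keyword arguments, so a key named "self" collides with the method's own first parameter. The sketch below reproduces that with plain classes. `KeywordInit` and `DictInit` are hypothetical stand-ins, and the assumption that DictShield's constructor has this shape comes only from the traceback.

```python
class KeywordInit(object):
    """Mimics a constructor that takes field values as keyword arguments."""

    def __init__(self, **values):
        self.values = values


class DictInit(object):
    """Hypothetical alternative: take the field values as a single dict."""

    def __init__(self, values=None):
        self.values = dict(values or {})


data = {'code': 'GBP', 'self': 'http://localhost:8080/currencies/GBP'}

try:
    KeywordInit(**data)
    collided = False
except TypeError:
    collided = True  # 'self' arrives both positionally and by keyword

safe = DictInit(data)  # no unpacking, so no collision
```

Passing the input as one dict sidesteps the problem for any key name, at the cost of changing the call signature.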

add auto_now field to DateTimeField

so that initializing a model automatically sets the field to the current time when its value is None.

useful for created_date fields.
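
A rough sketch of the requested behaviour, written as a standalone function rather than as a change to DateTimeField. `resolve_datetime` and its `auto_now` and `now` parameters are hypothetical names; how this would hook into field initialization is left open.

```python
from datetime import datetime


def resolve_datetime(value, auto_now=False, now=datetime.now):
    """Hypothetical helper: fill a missing value with the current time."""
    if value is None and auto_now:
        return now()
    return value  # explicit values, and None without auto_now, pass through


created = resolve_datetime(None, auto_now=True)
```

Taking `now` as a parameter keeps the behaviour testable with a fixed clock.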

demos/diff_obj_id.py fails

The demos/diff_obj_id.py demo throws the following error:

SimpleDoc:
Traceback (most recent call last):
  File "demos/diff_obj_id.py", line 29, in <module>
    print 'SimpleDoc:', sd.to_python()
  File "/Users/talos/.virtualenvs/caustic/lib/python2.7/site-packages/dictshield/document.py", line 424, in to_python
    data = self._to_fields(fun)
  File "/Users/talos/.virtualenvs/caustic/lib/python2.7/site-packages/dictshield/document.py", line 406, in _to_fields
    data[field.uniq_field] = field_converter(field, value)
  File "/Users/talos/.virtualenvs/caustic/lib/python2.7/site-packages/dictshield/document.py", line 423, in <lambda>
    fun = lambda f, v: f.for_python(v)
  File "/Users/talos/.virtualenvs/caustic/lib/python2.7/site-packages/dictshield/fields/mongo.py", line 42, in for_python
    raise ShieldException('Invalid ObjectId', self.field_name, value)
dictshield.base.ShieldException: Invalid ObjectId - id:4edef13acc50ff07e4000000

ObjectIdField appears to never validate properly once it is assigned.

documentation says ObjectIDField, should be ObjectIdField

Unexpected behavior with field named `id`

It seems that DictShield's internal use of the field named id for tracking ObjectIds can cause some confusion if someone attempts to create their own field named id.

from dictshield.document import Document
from dictshield.fields import IntField

class StudySubject(Document):
    id = IntField(required=True, min_value=1)

s = StudySubject()
s.id = -1
print 'ID: %d' % s.id
s.validate()

Output shows ID: -1, but no exception is thrown for the negative value. This appears to be due to a conflict on the field name id in the following block of code in dictshield.base:

            if field.id_field:
                current_id = new_class._meta['id_field']
                if current_id and current_id != field_name:
                    raise ValueError('Cannot override id_field')

                new_class._meta['id_field'] = field_name
                # Make 'Document.id' an alias to the real primary key field
                new_class.id = field

        if not new_class._meta['id_field']:
            new_class._meta['id_field'] = 'id' # <-- Here be dragons
            new_class._fields['id'] = ObjectIdField(uniq_field='_id') # <-- Here too
            new_class.id = new_class._fields['id']

In this case, the resulting id field that is created is of type <dictshield.base.ObjectIdField>. So when the user tries to validate() the document, the validation code for ObjectIdField is called - and not, as might be expected in this example case, validation for IntField.

It would seem to me that DictShield should protect its own internal id tracking variable, or throw an exception if the user tries to create a field named id, or at least display some kind of warning. This could lead to significant user confusion, as id is a very commonly used field name.

Understandably, the ability to control an object's ID is very powerful and beneficial for the end-developer and as such it should remain in place. Perhaps a developer wishing to override OID handling should have to do so via a specific parameter passed to BaseDocument.__init__(), say for example id_field as it currently stands? If this were done and DictShield tracked internal OIDs via a protected parameter, then users could still use id as a field name without the conflict.

Ability to set/validate a root key in the JSON

At the moment, DictShield produces JSONs without a root key, as in the following example:

class Media(Document):
    """Simple document that has one StringField member
    """
    title = StringField(max_length=40)

producing:

{
    '_types': ['Media'],
    '_cls': 'Media',
    'title': u'Misc Media'
}

However, many RESTful APIs (see e.g. the Shopify API) wrap the JSON in a root key, like so:

{
  "article": {
    "created_at": "2008-07-31T20:00:00-04:00",
    "body_html": "<p>Do <em>you</em> have an <strong>IPod</strong> yet?</p>",
    <yada>
  }
}

This behaviour isn't something JSON really needs, but it appears in a lot of APIs to achieve document-equivalence with XML (which always has a root key).

You can model a root key in DictShield by making your Document an EmbeddedDocument and wrapping it in a Document which just holds the root key. But it would be nice to have that functionality pre-rolled somehow, either declaratively on each Document class definition or imperatively via the make_json_*safe methods and similar.

A few notes:

This would nicely complement #22 to customise the root key name (e.g. "MovieMedia" -> "movie-media")
Here's how the equivalent functionality works (badly!) in Java Jackson: Use class name as root key for JSON Jackson serialization
Any root key setting should be ignored when a class is being used as an EmbeddedDocument
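
As a sketch of the imperative variant, the wrapping can be done on the serialized dict after the fact. Both helpers are hypothetical and sit outside DictShield; the "MovieMedia" to "movie-media" rule is just the example from the notes above.

```python
import re


def root_key_for(class_name):
    """Hypothetical naming rule: 'MovieMedia' -> 'movie-media'."""
    # Insert a hyphen before every capital letter except the first one.
    return re.sub(r'(?<!^)(?=[A-Z])', '-', class_name).lower()


def wrap_in_root(data, root_key):
    """Wrap an already-serialized dict in a single root key."""
    return {root_key: data}


wrapped = wrap_in_root({'title': 'Misc Media'}, root_key_for('MovieMedia'))
```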

Documentation typos/clarifications

print 'ShieldException caught: %s' % (se))

^ Additional bracket on end

json_string = request.get_arg('data')
user_input = json.loads(json_string)
u.validate(**user_input)

^ u not defined above. (Also: "This method builds a User instance out of the input, which also throws away keys that aren't in the User definition." - how does the above code know to build a User instance rather than say a Media document? I can't see any reference to User.)

Add FileField

Add support for storing information and validating file uploads using DictShield models.

Static validation methods do not remove rogue fields

The default behavior for static methods is to leave rogue fields in the input. This means that validation only covers the fields the model knows about, and no action is taken on fields that shouldn't be there.
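
A minimal sketch of the missing step, assuming the set of known field names is available as a plain collection. `strip_rogue_fields` is a hypothetical helper, not an existing DictShield method.

```python
def strip_rogue_fields(data, known_fields):
    """Return a copy of `data` limited to keys named in `known_fields`."""
    return dict((k, v) for k, v in data.items() if k in known_fields)


user_input = {'name': 'Joe', 'is_admin': True}
clean = strip_rogue_fields(user_input, ('name', 'bio'))
```

The input dict is left untouched, so a caller can still compare the two to report which keys were dropped.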

when i have an objectIdField in Document to_json can't serialize

I'm using dictshield with mongodb backend.
I'm trying things out with a basic user:
class User(Document):
    username = StringField(max_length=50)

class UserProfile(Document):
    owner = ObjectIdField()
    city = StringField(max_length=50)

I create the user and save it to mongo
which returns the ObjectId of the user (ie ObjectId('4e1b2d46d1f5ce5af4000000'))
then I create the User profile with the owner = ObjectId of new user
If I try user_profile.to_json() I get error "can't serialize objectId" (line 407 base.py)

I solved the issue by returning unicode(value) for ObjectIdField.for_python (line 141 in base.py).

I'm not sure this is correct solution...

if you could confirm or let me know what i'm doing wrong.

thx

PS: this happens using both objectId from base.py and bson.py

i18n support

Any i18n support planned for DictShield? I'd be glad to help.

choices always validates if Document has an EmbeddedDocumentField

Not sure if this is intended behaviour, but it is strange that a Document validates a StringField with an invalid choice when that Document has a non-empty EmbeddedDocumentField.
See the Test Case for details.