
Core validation logic for pydantic, written in Rust

License: MIT License



pydantic-core


This package provides the core functionality for pydantic validation and serialization.

Pydantic-core is currently around 17x faster than pydantic V1. See tests/benchmarks/ for details.

Example of direct usage

NOTE: You should not need to use pydantic-core directly; instead, use pydantic, which in turn uses pydantic-core.

from pydantic_core import SchemaValidator, ValidationError


v = SchemaValidator(
    {
        'type': 'typed-dict',
        'fields': {
            'name': {
                'type': 'typed-dict-field',
                'schema': {
                    'type': 'str',
                },
            },
            'age': {
                'type': 'typed-dict-field',
                'schema': {
                    'type': 'int',
                    'ge': 18,
                },
            },
            'is_developer': {
                'type': 'typed-dict-field',
                'schema': {
                    'type': 'default',
                    'schema': {'type': 'bool'},
                    'default': True,
                },
            },
        },
    }
)

r1 = v.validate_python({'name': 'Samuel', 'age': 35})
assert r1 == {'name': 'Samuel', 'age': 35, 'is_developer': True}

# pydantic-core can also validate JSON directly
r2 = v.validate_json('{"name": "Samuel", "age": 35}')
assert r1 == r2

try:
    v.validate_python({'name': 'Samuel', 'age': 11})
except ValidationError as e:
    print(e)
    """
    1 validation error for model
    age
      Input should be greater than or equal to 18
      [type=greater_than_equal, context={ge: 18}, input_value=11, input_type=int]
    """

Getting Started

You'll need Rust stable installed, or Rust nightly if you want to generate accurate coverage.

With Rust and Python 3.8+ installed, compiling pydantic-core should be possible with roughly the following:

# clone this repo or your fork
git clone git@github.com:pydantic/pydantic-core.git
cd pydantic-core
# create a new virtual env
python3 -m venv env
source env/bin/activate
# install dependencies and install pydantic-core
make install

That should be it; the example shown above should now run.

You might find it useful to look at python/pydantic_core/_pydantic_core.pyi and python/pydantic_core/core_schema.py for more information on the Python API. Beyond that, tests/ provides a large number of examples of usage.

If you want to contribute to pydantic-core, you'll want to use some other make commands:

  • make build-dev to build the package during development
  • make build-prod to perform an optimised build for benchmarking
  • make test to run the tests
  • make testcov to run the tests and generate a coverage report
  • make lint to run the linter
  • make format to format python and rust code
  • make to run format build-dev lint test

Profiling

It's possible to profile the code using the flamegraph utility from flamegraph-rs. (Tested on Linux.) You can install this with cargo install flamegraph.

Run make build-profiling to install a release build with debugging symbols included (needed for profiling).

Once that is built, you can profile pytest benchmarks with (e.g.):

flamegraph -- pytest tests/benchmarks/test_micro_benchmarks.py -k test_list_of_ints_core_py --benchmark-enable

The flamegraph command will produce an interactive SVG at flamegraph.svg.

Releasing

  1. Bump the package version locally. Do not just edit Cargo.toml on GitHub; both Cargo.toml and Cargo.lock need to be updated.
  2. Make a PR for the version bump and merge it.
  3. Go to https://github.com/pydantic/pydantic-core/releases and click "Draft a new release"
  4. In the "Choose a tag" dropdown enter the new tag v<the.new.version> and select "Create new tag on publish" when the option appears.
  5. Enter the release title in the form "v<the.new.version>"
  6. Click the "Generate release notes" button
  7. Click "Publish release"
  8. Go to https://github.com/pydantic/pydantic-core/actions and ensure that all release builds complete successfully.
  9. Go to https://pypi.org/project/pydantic-core/ and ensure that the latest release is published.
  10. Done 🎉

pydantic-core's People

Contributors

adriangb, alexmojaki, aminalaee, art049, czotomo, danielsanchezq, davidhewitt, dependabot[bot], dmontagu, dswij, hramezani, jeanarhancet, kludex, lig, markussintonen, messense, mgorny, neevcohen, ollz272, philhchen, prettywood, realdragonium, samdobson, samuelcolvin, sanders41, stranger6667, sydney-runkle, viicos, vvanglro, yohanvalencia


pydantic-core's Issues

strict JSON types

I seem to have had a mental lapse and forgotten about strict working properly on JSON types.

JSON input should match Python behaviour, e.g. (sketched after the list):

  • only allow int for int input
  • only allow float for float inputs
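
A sketch of the intended behaviour (using the strict flag on an int schema; per this issue, it doesn't yet work properly for JSON input):

from pydantic_core import SchemaValidator

v = SchemaValidator({'type': 'int', 'strict': True})
assert v.validate_json('123') == 123  # a JSON number that is an int: accepted
# v.validate_json('123.0') should fail: a JSON float is not an int in strict mode
# v.validate_json('"123"') should fail: a JSON string is not an int in strict mode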

Python Exception for custom kind and message

We need a way for errors raised in Python to properly populate kind, message, context, maybe even loc.

Solutions:

  1. the simplest solution is to just look for some attributes on all ValueErrors
  2. we could also create and export a custom Exception, then check for that - thereby avoiding unnecessary getattr calls
  3. a variation on 1) would be to look for one attribute which must be a dict, then get items from that

I guess we should do some profiling to see which is fastest.
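
A rough Python sketch of option 2 - the class name and attributes here are illustrative, not an existing API:

# hypothetical custom exception carrying the error details, so the Rust side
# can do a single isinstance check instead of probing every ValueError
class CustomKindError(ValueError):
    def __init__(self, kind, message, context=None):
        super().__init__(message)
        self.kind = kind
        self.message = message
        self.context = context or {}

def check_age(value):
    if value < 18:
        raise CustomKindError('too_young', 'Input should be at least 18', {'ge': 18})
    return value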

`on_error` validator and field property

  • A validator which returns a default value when validation fails
  • A setting on TypedDict fields which allows the value to be omitted, or the default to be used, if an error occurs (a hypothetical schema shape is sketched below)
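
A hypothetical schema shape for this (the 'on-error' type and its keys are illustrative, not an existing API):

schema = {
    'type': 'on-error',
    'schema': {'type': 'int', 'ge': 18},
    'on_error': 'default',  # or 'omit' to drop the field from the output
    'default': 18,
}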

recursive dictionaries passed into SchemaValidator result in a panic

For obvious reasons, passing in a recursive dict causes a segfault. It's certainly user error, but it might be nice to raise a Python RecursionError instead of segfaulting.

from pydantic_core import SchemaValidator
schema = {"type": "union", "choices": []}
schema['choices'].append(schema)
SchemaValidator(schema)

Include internal details on some errors

We could add another property to line errors with internal details on what went wrong.

E.g. info that we wouldn't want to show to end users but which might help a developer debugging the errors.

Example would be DateTimeObjectInvalid, see #77.

`Sequence` Validator

Like the other sequence-like validators, but keeping the input type.

What do we do about str? I guess we have to allow it, but there are common scenarios where you want "any sequence other than a string".
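
The check users regularly want could look something like this (a sketch; note bytes is also a Sequence and probably wants excluding too):

from collections.abc import Sequence

def is_non_str_sequence(value):
    return isinstance(value, Sequence) and not isinstance(value, (str, bytes))

assert is_non_str_sequence([1, 2, 3])
assert not is_non_str_sequence('abc')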

Validating existing models

As per #21 we need to make sure values are copied.

As @tiangolo points out at in pydantic/pydantic#4218 (comment) we need to be able to validate a model without relying on isinstance.

I think we should therefore change how existing models are validated to effectively revalidate model.__dict__. That should solve copying (of models at least) and avoid subclasses being validated as parent classes.

This might have some performance impact, but it'll be much smaller than a hack in python to work around it.
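
A minimal Python sketch of the idea, where fields_validator stands in for a validator over the model's fields (names here are illustrative):

def revalidate(model_cls, instance, fields_validator):
    # re-run field validation over the instance's attributes instead of
    # accepting any isinstance(instance, model_cls) value as-is
    data = fields_validator.validate_python(instance.__dict__)
    new = model_cls.__new__(model_cls)
    new.__dict__.update(data)  # a fresh dict, so the result is a copy
    return new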

decide about `try_instance`

lax_dict takes a try_instance argument and can build a dict from a python object.

https://github.com/samuelcolvin/pydantic-core/blob/9330a17ea51e566539ba761709f8f5b9aec7553d/src/input/input_python.rs#L140

We need to decide when this should be used and when not; this also needs to be reflected in LookupKey.

Presumably this should be a config setting? But if it's just a config setting, we can't easily have a from_orm method (where this would apply).

Perhaps we're happy to have it set via config, then if from_orm = True in config, Model.parse_obj(my_object) would just work.

We should also use a consistent name, e.g. either from_orm or try_instance or something better, everywhere.

I would like to avoid runtime configuration options if possible.

@PrettyWood what do you think?

Also relates to #108.

make sure all values are copied

I guess in the Python versions of to_py we could somehow deep copy everything to match current pydantic behaviour.

Won't help performance but I think it's correct.

support positional arguments

We need a way to support positional arguments; this will be helpful for:

  • functions
  • named tuples
  • dataclasses - it should simplify our logic for dataclasses by allowing us to do all the validation before creating the dataclass

I'd like to reuse as much of the logic from TypedDictValidator as possible; here we can learn from the logic of validate_arguments in pydantic.

My proposal would be this (a rough sketch follows the list):

  • a new validator which wraps a TypedDictValidator
  • converts positional arguments to named arguments
  • we pass the resulting dict to the function
  • we say we can't support positional-only arguments for now - if we really need to support them, we could either have another validator or some custom logic; it'll be a bit slow, but that's fine
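
A rough sketch of the positional-to-named conversion, assuming the parameter names are known from the function signature:

def convert_args(arg_names, args, kwargs):
    named = dict(zip(arg_names, args))
    overlap = named.keys() & kwargs.keys()
    if overlap:
        raise TypeError(f'got multiple values for arguments: {sorted(overlap)}')
    named.update(kwargs)
    return named  # this dict is what the wrapped TypedDictValidator would see

assert convert_args(['a', 'b'], (1,), {'b': 2}) == {'a': 1, 'b': 2}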

switch to maturin

If this can be done easily, it should allow much faster cross-platform builds.

separation of validation and python interface

Hi!

I have been a long time user of pydantic (great library btw) and have been following the development of the rust library.

I was wondering if it would make sense/be possible to separate the Python-related code (use of Py* etc.) from the Rust implementation.

That way it'd be possible to use this great library not only in Python code, but also to port it to e.g. JS, or to use it for validation in Rust code.

I'd be happy to take a stab at it and create a PR - if wanted.

More types

  • union - move straight to smart union behaviour
  • literal - need specific cases for all strings and all ints (both using rust sets, call strict_str etc. before) and a general case using a python set #31
  • strict string
  • strict int etc.
  • custom types - perhaps done, can mostly use the function validators
  • date #77
  • datetime #77
  • set
  • frozen set #86
  • tuple #73
  • bytes #80
  • Decimal - this can just be a function

Performance questions?

I'm keen to "run onto the spike" and find any big potential performance improvements in pydantic-core while the API can be changed easily.

I'd therefore love anyone with experience of rust and/or pyo3 to have a look through the code and see if I'm doing anything dumb.

Particular concerns:

  • The "cast_as vs. extract" issues described in PyO3/pyo3#2278 was a bit scary as I only found the solution by chance, are there any other similar issues with pyo3?
  • generally copying and cloning values - am I copying stuff where I don't have to? In particular, is input or parts of input (in the case of a dict/list/tuple/set etc.) copied when it doesn't need to be?
  • Similarly, could we use a PyObject instead of PyAny or visa-versa and improve performance?
  • here and in a number of other implementations of ListInput and DictInput we do a totally unnecessary map, is this avoidable? Is this having a performance impact? Is there another way to give a general interface to the underlying datatypes that's more performance
  • The code for generating models here seems to be pretty slow compared to other validators, can anything be done?
  • Recursive models are slowing than I had expected, I thought it might be the use of RwLock that was causing the performance problems, but I managed to remove that (albeit in a slightly unsafe way) in #32 and it didn't make a difference. Is something else the problem? Could we remove Arc completely?
  • lifetimes get pretty complicated, I haven't even checked if get a memory leak from running repeat validation, should we/can we change any lifetimes?

I'll add to this list if anything else comes to me.

More generally I wonder if there are performance improvements that I'm not even aware of? "What you don't know, you can't optimise"

@pauleveritt @robcxyz

Changes to list, tuple, set, frozenset coercion

Stop coercing set / frozenset to list / tuple?

Although this is not "losing information", the result is not deterministic/repeatable.

E.g. if you have the field Tuple[PositiveInt, NegativeInt, str] then the input set {1, -1, 'a'} will sometimes work and sometimes fail - this is pretty confusing.

I think we should change this.
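
A sketch of the failure mode with pydantic V1 - across interpreter runs this sometimes validates and sometimes raises, because the iteration order of the set depends on string hash randomisation:

from typing import Tuple
from pydantic import BaseModel, PositiveInt, NegativeInt

class Model(BaseModel):
    f: Tuple[PositiveInt, NegativeInt, str]

Model(f={1, -1, 'a'})  # may or may not raise a ValidationError, run to run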

Stop coercing list / tuple to set / frozenset?

Should we allow coercing a list to a set? In this case we are "losing information" (e.g. order), however creating a set from a list is often desired - e.g. when parsing a format (yaml, toml etc.) that only has a list type.

I think we should not change this.

Add coercing dict_key to set / frozenset?

Not that common, but we have it now, and I think it kind of makes sense since dict_keys "feel like" (sorry to be fluffy) a set.

I guess since dict_keys are ordered, it should be fine to coerce them to list and tuple too.

I guess, as currently, we should allow dict_values to be coerced to all these types too?

I think we should change this.

Generators?

In pydantic V1 we allow converting a generator to any of these types.

I think we should allow converting a generator to a list or tuple, but not set or frozenset.

@PrettyWood @tiangolo thoughts?

First class field validator

Documenting in person discussion.

It might make sense to have a "type": "field" schema, or something along these lines, to collect options that apply to the field rather than to the type of the field - like a "not required" option that would leave the field unpopulated if it is not included.
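
Something like this, hypothetically (the 'field' type and 'required' key are illustrative, not an existing API):

schema = {
    'type': 'typed-dict',
    'fields': {
        'nickname': {
            'type': 'field',    # collects options about the field itself
            'required': False,  # leave the field unpopulated when absent
            'schema': {'type': 'str'},
        },
    },
}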

implement `timedelta`

Somehow I forgot timedelta. The work in speedate is done; this just needs the type implementing.

support NamedTuple / other initialization methods

From in person discussion.

Initializing a NamedTuple fails because it's immutable at a low level (object.__setattr__ and such tricks won't work).
Is this something we want to support?
Should we have a field option to specify how the field should be set (__new__ or setattr)? May be related to #59
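
A quick demonstration of the problem, and of the __new__ route:

from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

p = Point.__new__(Point, 1, 2)     # works: tuple contents are set in __new__
try:
    object.__setattr__(p, 'x', 3)  # the usual immutability workaround fails
except AttributeError as e:
    print(e)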

wasm support

I really want pydantic-core to support wasm. This is mostly so that the examples in pydantic's docs can be edited and run in the browser, but also for wider use of pydantic.

As per PyO3/pyo3#2412 (comment), it looks like it should be possible.

But I'm not sure how to integrate that with the maturin GitHub Actions. @messense any pointers? Or would you be willing to submit a PR?

Also, as well as getting wheels to build, what more do we need to do to get pydantic-core working with Pyodide?

Support self referencing models

E.g. like

from typing import Optional

from pydantic import BaseModel
from devtools import debug


class Branch(BaseModel):
    name: str
    sub_branch: Optional['Branch'] = None


b = Branch(name='main', sub_branch=Branch(name='sub'))
debug(b)

Feature Request: 3rd party non-JSON serialization/deserialization

Hi, author of pydantic-yaml here. I have no idea about anything Rust-related, unfortunately, but hopefully this feature request will make sense in Python land.

I'm going off this slide in this presentation by @samuelcolvin, specifically:

We could add support for other formats (e.g. yaml, toml) - the only side effect would be bigger binaries.

Here's a relevant discussion about "3rd party" deserialization from v1: pydantic/pydantic#3025

It would be great if pydantic-core were built in a way where non-JSON formats could be added "on top" rather than necessarily being built into the core. I understand performance is a big question in this rewrite, so ideally these would be high-level interfaces that can be hacked in Python (or implemented in Rust/etc. for better performance).

From the examples available already, it's possible that such a feature could be quite simple on the pydantic-core side - the 3rd party would create their own function à la validate_json, possibly just calling validate_python. However, care would be needed on how format-specific details are sent between pydantic and the implementation. In V1 this is done with the Config class and the special json_encoder/decoder attributes, which have been a pain to re-implement properly for YAML (without way too much hackery).

Ideally for V2, this would be something more easily addable and configurable. The alternative would be to just implement TOML, YAML etc. directly in the binary (and I wouldn't have to keep supporting my project, ha!)

Thanks again for Pydantic!

add tagged union

As you may already know I love unions: smart, strict and tagged ones 😄
I would like to work on a PR to add this.

The proposed syntax would be

'type': 'model',
'fields': {
    'pet': {
        'schema': {
            'type': 'union',
            'tag': 'species',
            'choices': [
                {
                    'type': 'model',
                    'fields': {
                        'species': {'schema': {'type': 'literal', 'expected': ['cat']}},
                        'lives': {'schema': {'type': 'int'}, 'default': 9},
                    },
                },
                {
                    'type': 'model',
                    'fields': {
                        'species': {'schema': {'type': 'literal', 'expected': ['dog']}},
                        'barks': {'schema': {'type': 'bool'}},
                    },
                },
            ],
        }
    },
},

I guess the UnionValidator would become something like

pub struct UnionValidator {
    choices: Vec<CombinedValidator>,
    strict: bool,
    tag: Option<Tag>
}

pub struct Tag {
    name: String,
    field_validator_mapping: HashMap<*const str, *const CombinedValidator>
}

What do you think @samuelcolvin?

Parsing JSON directly

Would be amazing if we could parse and validate JSON directly, without first creating Python objects and then validating them.

The basic idea would be to create traits to achieve all the conversions used here, then implement those traits for both serde types, and pyo3 types.

Then use those types instead of pyo3 types throughout validators.

If we did this, it also opens the door to using pydantic-core without Python 👀 - e.g. in an entirely theoretical "Tydantic" TypeScript package.

Recursion with function before

The recursion guard is unable to catch the following, which results in a segfault:

@pytest.mark.skip(reason='This case causes a seg-fault since the recursion checker cannot detect the cycle')
def test_function_change_id():
    def f(input_value, **kwargs):
        return input_value + ' Changed'

    v = SchemaValidator(
        {
            'choices': [
                {
                    'type': 'function',
                    'mode': 'before',
                    'function': f,
                    'schema': {'schema_ref': 'root-schema', 'type': 'recursive-ref'},
                },
                'int',
            ],
            'ref': 'root-schema',
            'type': 'union',
        }
    )

    with pytest.raises(ValidationError) as exc_info:
        assert v.validate_python('input value') == 'input value Changed'

    print(str(exc_info.value))

This is because the input is changing on each step, so the id isn't found in the recursion guard's lookup set.
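
A bounded sketch of why the id-based check never fires (assuming the guard keeps a set of id() values):

seen_ids, keep_alive = set(), []
value = 'input value'
for _ in range(5):  # bounded here; in the validator the recursion never ends
    assert id(value) not in seen_ids   # the guard's check never triggers
    seen_ids.add(id(value))
    keep_alive.append(value)           # mirrors values held on the recursion stack
    value = value + ' Changed'         # a new object, so a fresh id, every step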

I don't see how we can detect this without introducing a mini-stack, which would really harm performance.

I think we just put a note in the docs saying "if you do really dumb stuff, you can get the validator to recurse infinitely".

add `loc` and `file_position` to `PydanticValueError`

More changes after #185.

Also, pydantic/pydantic#4254.

We should add loc to PydanticValueError, which gets appended to the error loc.

Also add file_position (tuple[int, int] of (line, col)); one day when we have a custom JSON parser we can populate this in pydantic-core, until then we just add it via PydanticValueError.

file_position will require some pretty output in error messages.
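
A hypothetical sketch of what a resulting line error might carry (field names taken from this issue, not a settled API):

line_error = {
    'type': 'custom_error',
    'loc': ('parent_field',) + ('inner', 0),  # validator loc + the error's own loc appended
    'msg': 'something specific went wrong',
    'file_position': (3, 17),  # (line, col) in the source JSON, once a custom parser exists
}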

PyPy support

@tiangolo asked about PyPy support. @messense do you have input?

Looks like pyo3 does support PyPy, see here and PyO3/rust-numpy#219.

But I know pydantic-core uses some non-abi3 methods; maybe we use stuff that would cause problems with PyPy. We should probably try it sooner rather than later.

simplify recursive refs

I'm not sure if this would be possible (I'm guessing it's not), but it would be nice to be able to recursively refer to a parent validator without having to know a priori whether it will be recursive or not.

Currently, if you are parsing something like:

class Outer:
    inner: List[Outer]

You would have to know that Outer is recursive before you parse its fields.

Would it be possible to have an optional "id" property on every validator that acts as the reference for recursive schemas instead of a special "recursive-container" validator? So:

from typing import List

from pydantic_core import SchemaValidator


class Outer:
    inner: "List[Outer]"


v = SchemaValidator(
    {
        "type": "model-class",
        # id is optional, required for this to be usable as a recursive ref
        # the value is arbitrary, the id of the type seems like a safe choice
        "id": str(id(Outer)),
        "schema": {
            "type": "model",
            "fields": {
                "inner": {
                    "type": "list",
                    "items": {
                        "type": "recursive-ref",
                        "id": str(id(Outer)),
                    }
                },
            },
        },
    }
)

Cleanup and test `Config`

Currently I'm not clear where config is used, and what attributes are respected.

E.g. there are some properties that are used in string.rs that are not in the python types.

It also has minimal tests; we need to test it properly - perhaps in a separate test file to keep things clean.

What's going on here?

I realise I invited collaborators from pydantic to this repo without an explanation of what's going on.

The plan is obviously to make this repo public, but I want to get the basic design solid before flipping the switch (maybe that's unnecessary, I don't know).

I'd love feedback on this idea in general, and specifics if possible.

I'm not "announcing" this yet, but feel free to discuss it with others if that helps.

The idea is not particularly secret after this, but I'm hoping to build some suspense/FOMO before going public.

TODO before v0.1 release

This is not going to be the release which pydantic V2 is released with, but we should get a proper release out to give a target for other work.

What needs fixing before we do that?

cache `PyString` for short strings

As per PyO3/pyo3#2463, we could cache the PyString value for short (length <63 say) strings to achieve a significant time saving.

Although it would save time on the specific case of building dicts with repeated inputs, I wonder how much time it would really save in the real world?

Definitely this is an optimisation that should be looked at after pydantic v2 is released.

If we do do it, I guess it should be configurable.

cache.rs in orjson might be a useful starting point for this.
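
The idea in Python terms (a sketch of the caching strategy, not the pyo3 implementation):

_string_cache = {}

def cached_str(raw: bytes) -> str:
    # memoise the conversion for short strings only, as proposed above
    if len(raw) < 64:
        s = _string_cache.get(raw)
        if s is None:
            s = _string_cache[raw] = raw.decode()
        return s
    return raw.decode()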

Panic on Ctrl+C

Just got this error when hitting ctrl+c while running SchemaValidator - e.g. creating a validator.

^Cthread '<unnamed>' panicked at 'a Display implementation returned an error unexpectedly: Error', /rustc/90ca44752a79dd414d9a0ccf7a74533a99080988/library/alloc/src/string.rs:2478:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/Users/samuel/code/pydantic-core/create_many.py", line 8, in <module>
    v = SchemaValidator(
pyo3_runtime.PanicException: a Display implementation returned an error unexpectedly: Error

Validation Context

See pydantic/pydantic@c8ba8f1 for an explanation of usage.

I implemented some of the plumbing for this early on, then forgot the porcelain. Needs completing and testing.

I guess we should also make sure the other kwargs to validation functions make sense at the same time.

Questions regarding performance

In your presentation you talk about achieving a 12x performance improvement for validating a list of dicts 100 elements long - does this test consistently achieve the same numbers for bigger lists?

The other question I have is about the decision to choose Rust as the language for the core of pydantic - what were the criteria for choosing Rust over other languages like C, C++, or even .NET?

Yaml native support?

I read here that pydantic v2 will have native JSON support....

however a lot of devs are using .yaml for configs (it is much more readable for humans).

I suspect I will be able to load it as a Python object, but then strict mode can't be used, and the data goes from native code (probably C) to Python and back to native code in Rust. That is at least kind of stupid.

I also suspect other people may want to add validation for other serialization formats (like BSON, protobuf, or whatever they need). Some of those people would like to do it in Rust (ideally (runtime) pluggable, since not everyone will want everything - I am not sure that is easily achievable, but it is possible).

That could make pydantic-core serialization-agnostic while keeping the same performance; validation would not care how the data got to it (and I understand it has to have a structure and types at least somewhat resembling JSON).

The applicability would be huge, because a good developer should always write some validation, and a lot of the time you are doing it all by hand in some validate method - checking ranges of numbers, writing regex matchers for strings, converting int/str to date objects... you get the idea. With pydantic it would be much easier to do it exhaustively.

Those are just ideas and I wanted to share them publicly. Maybe the effort to split serialization out from validation would be too big.

But that is for @samuelcolvin to decide.

The main thing is not to rush this... if it makes it in at some point (I am not saying it must be 2.0.0) I would be happy.
