gatkin / declxml Goto Github PK
View Code? Open in Web Editor NEWDeclarative XML processing for Python
Home Page: https://declxml.readthedocs.io/en/latest/
License: MIT License
Declarative XML processing for Python
Home Page: https://declxml.readthedocs.io/en/latest/
License: MIT License
Hi! Is this funcionality implemented?
xml.integer('..', attribute='nFrames', alias='_nFrames')]
I need for each child to have a copy of its parent "nFrames" attribute. Is this achievable with XPath?
I'm confused about the behaviour of aggregate processors (dictionary
and in particular user_object
) when using required=False
. Consider the following example:
>>> import declxml as xml
>>>
>>> person = xml.dictionary('person', [
... xml.string('name', required=True),
... ], required=False)
>>> root = xml.dictionary('root', [
... person,
... ])
>>>
>>> xml.parse_from_string(root, '<root></root>')
{'person': {}}
>>> xml.serialize_to_string(root, {'person': {}})
'<root />'
I find the default value {}
for missing optional dictionaries a bit strange. It's consistent with the primitive processors (that return e.g. ''
instead of None
or a missing dict key for strings), but it's counterintuitive to me that {'person': {}}
is a valid input to the root processor, but {}
is invalid for the person processor:
>>> xml.serialize_to_string(person, {})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/centos/declxml/declxml.py", line 356, in serialize_to_string
root = root_processor.serialize(value, state)
File "/home/centos/declxml/declxml.py", line 1009, in serialize
self._serialize(end_element, value, state)
File "/home/centos/declxml/declxml.py", line 1041, in _serialize
child.serialize_on_parent(element, child_value, state)
File "/home/centos/declxml/declxml.py", line 1261, in serialize_on_parent
state.raise_error(MissingValue, self._missing_value_message(parent))
File "/home/centos/declxml/declxml.py", line 1369, in raise_error
raise exception_type(error_message)
declxml.MissingValue: Missing required value for element "name" at person/name
This becomes particularly annoying when it comes to optional user_object
processors: when the corresponding element is omitted from the XML, the parser produces an instance of the custom class with no attributes set, which I would think is rarely meaningful.
In my current codebase I'm using the following workaround in my base model class, which results in missing objects parsing to None
:
def __new__(cls, *args, **kwargs):
if all(v is None for v in kwargs.values()):
return None
return super().__new__(cls)
This could alternatively be achieved with an after_parse hook. Either way, though, it's not perfect: in the case where the subprocessor validates against an empty element, empty elements appear as if they are missing, and as far as I can tell there's no way to distinguish them. For example, if we set required=False
on person.name, the documents <root />
and <root><person/></root>
will parse to the same structure.
Changing the way optional processors work would obviously be a large breaking change, but what do you think about adding a default=
parameter to dictionary
, user_object
(and maybe array
)? Then users could simply specify default=None
.
Serialization fails if a namedtuple or object value is None when the processor attempts to convert the namedtuple or object to a dictionary by calling _asdict()
or __dict__
on the None value.
Hi, first thanks for the nice library, looks well-designed and fits the needs of a project I am currently working on.
I have a (heavily) nested document structure, where I use a myriad of named tuples to represent the structure.
When testing it on a real-world example, the named tuple processors work OK in the case the related XML is there, but it does start acting up in case it's not -- even though I specifically set the processor to consider it non-required.
Snippets: (slightly altered)
Processor:
import declxml as xml
processor = xml.named_tuple(
"Invoice",
Invoice,
[
...
xml.named_tuple(
"Reference",
Reference,
[xml.string("ID")],
required=False,
),
]
)
Named tuple definition:
Reference = namedtuple("Reference", ["ID"])
Issue I am getting is:
dict_value = {}
def _from_dict(dict_value):
> return tuple_type(**dict_value)
E TypeError: __new__() takes exactly 2 arguments (1 given)
My best guess is that it has something to do with the named_tuple parsing, where it doesn't really check for any non-required values:
Line 1515 in afefe8a
Can you confirm, or am I missing something?
I did check when specifying it an array (much like the examples given in the docs), it does work, but I actually want a 0 or 1 relationship, rather than a 0 or n relationship.
Running
import json
import declxml as xml
processor = xml.dictionary('Container', [
xml.string('.', attribute='test')
])
container = """
{
"test": 1
}
"""
print('Without indent:', xml.serialize_to_string(processor, json.loads(container)))
print('-----------------------------')
print('With indent', xml.serialize_to_string(processor, json.loads(container), indent=' '))
produces
Without indent: <Container test="1" />
-----------------------------
With indent <?xml version="1.0" encoding="utf-8"?>
<Container test="1"/>
I have an xml as follows:
<elements resource-type="VNF">
<element>
<is-active>true</is-active>
<is-nullable>false</is-nullable>
<is-ref>false</is-ref>
<max-length>8</max-length>
<min-length>8</min-length>
<name-of-the-field>Site Code</name-of-the-field>
<sequence-id>2</sequence-id>
<value>ABC-1234</value>
</element>
</elements>
How will I give the attribute name to this array.
seems following code is not working as attribute is not there for array:
resource_info_processor = xml.dictionary('resource-info',[
xml.array(element_processor, attribute='resource-type', alias='elements'),
xml.boolean('is-active')
]
)
Hi! I really like declxml, and it almost does exactly what I was looking for. I understand if this is not something you want to support.
I would like to construct a custom class from an XML file, but do the transformations of the data in the constructor. As I understand the setup right now, attributes are "magically" set from declxml, instead of being sent to the constructor where I could transform them before saving them.
For this dictionary:
{'Name': 'Shear',
'Type': 'RASResultsMap',
'Checked': 'True',
'Filename': '.\\Test Plan\\Shear Stress (08Aug2020 08 00 00).vrt',
'MapParameters': [{'MapType': 'Shear',
'LayerName': 'Shear Stress',
'OutputMode': 'Stored Current Terrain',
'StoredFilename': '.\\Test Plan\\Shear Stress (08Aug2020 08 00 00).vrt',
'Terrain': 'CBR_041619',
'ProfileIndex': '576',
'ProfileName': '08Aug2020 08:00:00',
'ArrivalDepth': '0'}]}
and this processor:
ras_processor = xml.dictionary('Layer',[
xml.string('.', attribute='Name'),
xml.string('.', attribute='Type'),
xml.string('.', attribute='Checked'),
xml.string('.', attribute='Filename'),
xml.array(xml.dictionary('MapParameters', [
xml.string('.', attribute = "MapType"),
xml.string('.', attribute = 'LayerName'),
xml.string('.', attribute = 'OutputMode'),
xml.string('.', attribute = 'StoredFilename'),
xml.string('.', attribute = 'Terrain'),
xml.string('.', attribute = 'ProfileIndex'),
xml.string('.', attribute = 'ProfileName'),
xml.string('.', attribute = 'ArrivalDepth')
])
)])
After I run xml.serialize_to_string(ras_processor, test, indent=' ')
. The order of the attributes has changed to alphabetical. I would prefer to have the same attribute order as the input dictionary. Can this be done?
<?xml version="1.0" encoding="utf-8"?>
<Layer Checked="True" Filename=".\Test Plan\Shear Stress (08Aug2020 08 00 00).vrt" Name="Shear" Type="RASResultsMap">
<MapParameters ArrivalDepth="0" LayerName="Shear Stress" MapType="Shear" OutputMode="Stored Current Terrain" ProfileIndex="576" ProfileName="08Aug2020 08:00:00" StoredFilename=".\Test Plan\Shear Stress (08Aug2020 08 00 00).vrt" Terrain="CBR_041619"/>
</Layer>
Thanks!
Hello,
I have an XML of a structure:
<categories>
<category parentId="" id="000000001" order="1" tagid="">Red</category>
<category parentId="000000001" id="000000002" order="1" tagid="">Blue</category>
<category parentId="000000002" id="000000254" order="1" tagid="">Green</category>
</categories>
So, you see, it's an array of dictionaries.
Could you please give me a hint how to parse such data?
After reading through the docs here:
http://declxml.readthedocs.io/en/latest/guide.html#attributes
I realized the error I was receiving earlier was due to me trying to read attributes as if they were elements.
However, the docs as linked above only show how to read an attribute from a child element to add an attribute to the parent. What if the element-in-question has no children?
Following the example code, say the XML looked like this:
How would one parse this xml with declxml such that the resulting output had "favoriteFood" with a value of pizza as an attribute of author?
When I parse this:
<Mode System="B">Cable</Mode>
with:
xml.dictionary('Mode', [
xml.string('.', attribute='System')
])
the result when serialized and printed is:
<Mode System="B"/>
What happened to the value (CABLE
)?
Full code:
proc = xml.dictionary('Mode', [
xml.string('.', attribute='System')
])
bar = xml.parse_from_string(proc, '<?xml version="1.0" encoding="UTF-8"?><Mode System="B">Cable</Mode>')
print(xml.serialize_to_string(proc, bar, indent=' '))
Hi @gatkin,
This looks like a great package, providing a much saner way to interact with XML data; the documentation is complete and clear as well.
Being able to use both classes and namedtuples is a very convenient, but I feel there's some duplication of info going on if you're using type annotations (as they've been added in recent Python version). To demonstrate what I mean:
from dataclasses import dataclass
import decxml as xml
@dataclass
class Extent:
xmin: float
ymin: float
xmax: float
ymax: float
extent_processor = xml.user_object("extent", Extent, [
xml.floating_point("xmin"),
xml.floating_point("ymin"),
xml.floating_point("xmax"),
xml.floating_point("ymax"),
]
I'm stating twice that the attributes should be floats. It's pretty straightforward to define a function which does this for you:
type_mapping = {
bool: xml.boolean,
int: xml.integer,
float: xml.floating_point,
string: xml.string,
}
def make_processor(datacls):
fields = []
for name, vartype in datacls.__annotations__.items():
xml_type = type_mapping[vartype]
field = xml_type(name)
fields.append(field)
return xml.user_object(datacls.__name__.lower(), datacls, fields)
extent_processor = make_processor(Extent)
This is all you need for simple processors (for typing.NamedTuple as well, mutatis mutandis).
Aggregate processors are easy to include via recursion, although you probably want to encode the "aggregateness" somewhere. After some playing around, I find encoding it in the type to be most straightforward:
import abc
class Aggregate(abc.ABC):
pass
@dataclass
class Extent(Aggregate):
xmin: float
ymin: float
xmax: float
ymax: float
@dataclass
class SpatialData(Aggregate):
epsg: str
extent: Extent
def make_processor(datacls):
fields = []
for name, vartype in datacls.__annotations__.items():
if issubclass(vartype, Aggregate):
field = make_processor(vartype)
else:
xml_type = type_mapping[vartype]
field = xml_type(name)
fields.append(field)
return xml.user_object(datacls.__name__.lower(), datacls, fields)
spatialdata_processor = make_processor(SpatialData)
This provides a very concise way of defining (nested) data structures -- which I'd generally want to do anyway -- and turn them into XML processors with a single function call and adding a new base class (which can even be monkey-patched at runtime, if needed).
I'm not sure you'd really want to put this in declxml
(see the trouble below), but I do think it's useful (and non-trivial) enough to maybe warrant a section in the documentation. What do you reckon?
I haven't tried it yet, but I'm pretty sure you can use typing.Optional and typing.List to map to the declxml equivalents.
There's some trouble due to with the fact that XML has a separation between attributes and elements. For the XML's I'm working with, I don't really see a reason to separate between attributes and elements (of course, neither does JSON, or TOML, etc.) But you need to encode it somehow, or it won't end up the in the right place of the XML. But I can solve in it a slightly hacky way, by (ab)using typing.Union:
from typing import Union
class Attribute(abc.ABC):
pass
@dataclass
class Example:
a: Union[Attribute, int]
b: int
c: int
example = Example(1, 2, 3)
To write an XML:
<example a=1>
<b>2</b>
<c>3</c>
</example>
We can check again by inspecting the annotations:
def is_union(vartype):
return hasattr(vartype, "__args__") and (vartype.__args__[0] is Attribute)
This shouldn't trip up any type checker, but it is clearly not quite intended use: you'll never provide an Attribute as the value.
There's more issues with the fact that sometimes you need to include names that aren't part of the dataclass or the namedtuple, e.g. an array in the xml, where every entry is tagged "item":
<item value="-5980.333333333333" label="-5980" alpha="255" color="#0d0887"/>
<item value="-5863.464309399999" label="-5863" alpha="255" color="#1b068d"/>
I can't use something as general as "item" as my class name. This how I want to see it in Python:
@dataclass
class Color(Attribute):
value: str
label: str
alpha: str
color: str
Of course, I can just fall back to regular use at any time, and provide the name which is only part of the processor, not of the dataclass:
color_processor = xml.user_object(
"item",
Color,
[
xml.string(".", attribute="value"),
xml.string(".", attribute="label"),
xml.string(".", attribute="alpha"),
xml.string(".", attribute="color"),
],
)
At any rate, you can just mix and match as needed: when everything's encoded in the dataclass or namedtuple, you can generate the processors automatically; if not, you just have to write a few extra lines or provide an explicit name.
Similarly, there's cases where aliases are required. In my case, I'm lowering class names and replacing underscores by dashes: so it's sorta implicitly defined. Stuff like this makes me think it might be smarter to let the user figure out the details of their idiosyncratic XML format, and provide a "base recipe" to help them along a little.
Or perhaps you see a better way that is nice and general?
Parsing XML into an immutable object such as
@attr.s(frozen=True)
class Book(object):
title = attr.ib()
author = attr.ib()
fails because the class does not allow calls to the __setattr__
method
Is there a way to model a recursive element?
e.g.
<foos>
<foo>
<bar>hello world</bar>
<foos>
<foo>
<bar>blah blah</bar>
<foos/>
</foo>
</foos>
</foo>
</foos>
where a list of foos
has a foo
which itself may have a list of foos
For better documentation and integration with other projects that prefer to use type hints, all public API functions should be updated to include type hints.
First, thanks for writing this. It's a great little tool that does it's job well, and I love the declaritive style.
One issue I wanted to point out though is that the exceptions that are thrown leave much to be desired in terms of usefulness. I'm stuck right now on processing an xml file because I get:
declxml.MissingValue: Missing required element: "name"
In my case, I've deduced that the problem is likely something not being marked as optional that should be and I'm sure I'll track it down shortly, but I had to figure that out on my own. At the VERY least, it would be extremely helpful if the error message explained what the current state of the parser was at the time it found an issue. What node was the parser looking at where it couldn't find an attribute "name"? Or is this an issue with the user_object being passed in not having an attribute called "name"? It's too vague which lead to me having to spend quite a bit of extra time stepping through declxml code to figure out what was going on.
Just a suggestion.
Under python2, using non ASCII characters raises UnicodeEncodeError.
How would you like this to be addressed?
Hi! I'm trying to parse a tricky piece of XML that has some important information in attributes, that I would like to turn into a dictionary. Instead of explaining this in writing, here's some code... I'm trying to write a processor that would make the assert True:
import declxml as xml
data = """
<item>
<poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="25">
<results numplayers="1">
<result value="Best" numvotes="0"/>
<result value="Recommended" numvotes="0"/>
<result value="Not Recommended" numvotes="9"/>
</results>
<results numplayers="2">
<result value="Best" numvotes="4"/>
<result value="Recommended" numvotes="12"/>
<result value="Not Recommended" numvotes="4"/>
</results>
<results numplayers="3">
<result value="Best" numvotes="11"/>
<result value="Recommended" numvotes="7"/>
<result value="Not Recommended" numvotes="1"/>
</results>
</poll>
</item>
"""
processor = ...
parsed = xml.parse_from_string(processor, data)
assert parsed == {
"players": {
"1": {"Best": 0, "Recommended": 0, "Not Recommended": 9},
"2": {"Best": 4, "Recommended": 12, "Not Recommended": 4},
"3": {"Best": 11, "Recommended": 7, "Not Recommended": 1},
}
}, parsed
I've tried two different ways to get this working.
element/@attribute
syntax to "select" the value of the element and use the result of that as the key of the dictionary:processor = xml.dictionary('item', [
xml.dictionary("poll[@name='suggested_numplayers']", [
xml.dictionary("results/@numplayers", [
xml.dictionary("result/@value", [
xml.integer("result", attribute="numvotes"),
])
])
], alias="players"),
])
This fails with "KeyError: '@'", likely because it searches for an element, not an attribute.
attribute
to filter down to the element's value.processor = xml.dictionary('item', [
xml.dictionary("poll[@name='suggested_numplayers']", [
xml.dictionary(xml.string("results", attribute="numplayers"), [
xml.dictionary(xml.string("result", attribute="value"), [
xml.integer("result", attribute="numvotes"),
])
])
], alias="players"),
])
This fails with "TypeError: '_PrimitiveValue' object is not subscriptable", likely because it doesn't expect a primitive processor there, but a string.
Is there any way to get the result I'm looking for? Anything similar to what I'm looking for?
Hi @gatkin. Great project! I started using declxml
this week and hope to contribute in the future.
I wanted to let you know your project is available on Arch Linux platforms:
https://aur.archlinux.org/packages/python-declxml-git/
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.