mahmoud / glom
☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
Home Page: https://glom.readthedocs.io
License: Other
annoying_xml = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Placemark>
<name>Simple placemark</name>
<description>Attached to the ground. Intelligently places itself
at the height of the underlying terrain.</description>
<Point>
<coordinates>-122.0822035425683,37.42228990140251,0</coordinates>
</Point>
</Placemark>
</kml>
"""
# Does this work?
import glom
glom.glom(target=annoying_xml, spec='Placemark.Point.coordinates')
# What about this?
import xml.etree.ElementTree as ET
tree = ET.fromstring(annoying_xml)
glom.glom(target=tree, spec='Placemark.Point.coordinates')
# I'm trying to get this
print(tree[0][2][0].text)
# -122.0822035425683,37.42228990140251,0
Each of those two specs works separately.
from glom import Spec, T, glom
d = {'a': [1, {'b': 2}, 3], 'c': {'d': [4, 5]}}
spec = Spec(T['a'][1]['b'], T['c']['d'][1])
print(glom(d, spec))
We already have a snippet documented for automatic Django ORM type iteration. Should this behavior happen automatically if glom and Django are in the same env?
If so, I think we may want an environment variable disabling this behavior, for those who wish to avoid the runtime import overhead. Right now glom.core depends on nothing but the stdlib and represents a very lightweight import. That won't stay true if it tries importing from a bunch of paths, either failing or loading large codebases that aren't even necessarily used. (see also: mahmoud/ashes#31)
I was thinking about the challenge of calculating the "coverage" of Glom that @mahmoud raised on the Test & Code podcast.
Manually writing parameterised tests for pytest would be cumbersome, and you also wouldn't know the coverage.
In the JSON specification there are objects and arrays: an object can contain values drawn from a fixed list of types, and an array can contain either arrays or objects. Since there is only a fixed number of types for a value, you could convert this into a feature matrix and then (after deciding on N) map out the potential combinations:
Feature 1 | Feature 2 | Feature 3 | Feature 4 | Feature N
---|---|---|---|---
null | object | object | object | object
string | number | array | array |
string | | | |
Then, converting that feature matrix into a numpy array, you could dynamically generate all of the possible combinations. Since JSON supports an infinite level of nesting, you would have to fix the depth limit at N.
Once you have this, you can calculate the possible number of combinations, create test data for each, and use them as parameterised values.
Then, since Glom is a DSL, you again decide on N levels of operations-deep and calculate the same feature matrix for Glom.
The possible number of combinations (and your 100% coverage) is then a product of the 2 feature matrices.
You could apply the same technique to generate the same tests.
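The combination-enumeration step above can be sketched with nothing but the stdlib; numpy isn't strictly needed for the counting (the type names and N here are illustrative choices):

```python
import itertools

# illustrative JSON value types and a fixed nesting depth N
JSON_TYPES = ['null', 'boolean', 'number', 'string', 'array', 'object']
N = 2

# every ordered combination of types at depth N
combos = list(itertools.product(JSON_TYPES, repeat=N))

# the total (your "100% coverage" denominator) is len(JSON_TYPES) ** N
assert len(combos) == len(JSON_TYPES) ** N
```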
('path', 'segment') currently gives a much worse error message than 'path.segment'.
Maybe there's a way we can make the error message of the tuple form just as good.
I have some array of objects like this:
target = [
{'id': 0},
{'id': 1},
...
]
I now want to get the object with id=0 for example. Is there a way to do this with glom?
My current solution looks like this and doesn't feel that elegant:
glom.glom(target, [lambda t: t if t['id']==0 else glom.OMIT])[0]
I also got it working with
glom.glom(target, ([lambda t: t if t['id'] == pk else glom.OMIT], glom.T[0]))
which still doesn't look that clean to me.
Python has the extremely handy enumerate function, allowing you to iterate through a list and get the index of the current element. What's the team's view on adding similar functionality to glom? It could be very useful for data exploration/validation (c.f. #7), so that you can find your way back to where a value came from.
Don't have a good idea of the best interface for this, but something like:
target = {
"foo": [
{"a": 1, "b": 2},
{"a": 3, "b": 4}
]
}
spec = ("foo", ["b"])
result = glom(target, spec, trace=True)
# [("foo.0.b", 2), ("foo.1.b", 4)]
So if I was using
this to power a validator ensuring that, say, no "b" was greater than 3, it could do:
for path, value in result:
if value > 3:
print(f"Warning: {path} greater than 3!")
It seems like the information needed to power this is all there already, based on the output of a recursive Inspect
. Is there already some easy way of achieving this that I'm missing?
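Until something like trace=True exists, the (path, value) pairs can be produced with a small stdlib-only walker (a sketch; walk() is a hypothetical helper, not glom API):

```python
def walk(target, path=()):
    """Recursively yield ('dotted.path', leaf_value) pairs."""
    if isinstance(target, dict):
        for k, v in target.items():
            yield from walk(v, path + (k,))
    elif isinstance(target, list):
        for i, v in enumerate(target):
            yield from walk(v, path + (i,))
    else:
        yield '.'.join(map(str, path)), target

target = {"foo": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]}
result = [(p, v) for p, v in walk(target) if p.endswith('.b')]
# result == [('foo.0.b', 2), ('foo.1.b', 4)]
```

The validator loop from the proposal works unchanged over these pairs.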
=================================== FAILURES ===================================
________________________________ test_coalesce _________________________________
def test_coalesce():
val = {'a': {'b': 'c'}, # basic dictionary nesting
'd': {'e': ['f'], # list in dictionary
'g': 'h'},
'i': [{'j': 'k', 'l': 'm'}], # list of dictionaries
'n': 'o'}
assert glom(val, 'a.b') == 'c'
assert glom(val, Coalesce('xxx', 'yyy', 'a.b')) == 'c'
with pytest.raises(CoalesceError) as exc_info:
glom(val, Coalesce('xxx', 'yyy'))
msg = exc_info.exconly()
assert "'xxx'" in msg
assert "'yyy'" in msg
assert msg.count('PathAccessError') == 2
> assert "[PathAccessError(KeyError('xxx',), Path('xxx'), 0), PathAccessError(KeyError('yyy',), Path('yyy'), 0)], [])" in repr(exc_info.value)
E assert "[PathAccessError(KeyError('xxx',), Path('xxx'), 0), PathAccessError(KeyError('yyy',), Path('yyy'), 0)], [])" in "CoalesceError(<glom.core.Coalesce object at 0x7ffff236b550>, [PathAccessError(KeyError('xxx'), Path('xxx'), 0), PathAccessError(KeyError('yyy'), Path('yyy'), 0)], [])"
E + where "CoalesceError(<glom.core.Coalesce object at 0x7ffff236b550>, [PathAccessError(KeyError('xxx'), Path('xxx'), 0), PathAccessError(KeyError('yyy'), Path('yyy'), 0)], [])" = repr(CoalesceError(<glom.core.Coalesce object at 0x7ffff236b550>, [PathAccessError(KeyError('xxx'), Path('xxx'), 0), PathAccessError(KeyError('yyy'), Path('yyy'), 0)], []))
E + where CoalesceError(<glom.core.Coalesce object at 0x7ffff236b550>, [PathAccessError(KeyError('xxx'), Path('xxx'), 0), PathAccessError(KeyError('yyy'), Path('yyy'), 0)], []) = <ExceptionInfo CoalesceError tblen=4>.value
glom/test/test_basic.py:75: AssertionError
________________________ test_path_access_error_message ________________________
def test_path_access_error_message():
# test fuzzy access
with raises(GlomError) as exc_info:
glom({}, 'a.b')
assert ("PathAccessError: could not access 'a', part 0 of Path('a', 'b'), got error: KeyError"
in exc_info.exconly())
> assert repr(exc_info.value) == "PathAccessError(KeyError('a',), Path('a', 'b'), 0)"
E assert "PathAccessEr...'a', 'b'), 0)" == "PathAccessErr...'a', 'b'), 0)"
E - PathAccessError(KeyError('a'), Path('a', 'b'), 0)
E + PathAccessError(KeyError('a',), Path('a', 'b'), 0)
E ? +
glom/test/test_path_and_t.py:56: AssertionError
The snippet "Filter Iterable"
glom(['cat', 1, 'dog', 2], Check(types=str, default=OMIT))
gives me this result:
Sentinel('OMIT')
Adding a pair of brackets gives me the expected result:
glom(['cat', 1, 'dog', 2], [Check(types=str, default=OMIT)])
Result:
['cat', 'dog']
Not sure if an issue but was hoping to get some insight as to how you might specify this spec.
I have a dictionary that looks like this:
data = {"data": [{'name': 'bob',
'custom': {
'hobbies': [{
'swimming': True
}]
}
},
{'name': 'joey',
'custom': {
'hobbies': [{
'swimming': False
}]
}
}]
}
and I want to get a dictionary that looks like:
{'names': ['bob', 'joey'], 'swims': [True, False]}
the closest I'm able to get is by using this spec:
spec = {"names": ("data", ["name"]), "swims": ("data", ["custom"], ["hobbies"], [['swimming']])}
but the 'swims' attribute comes back as [[True], [False]]
Is there a way to get those attributes out of those inner lists by adjusting the spec? It seems trivial, but nested lists multiple levels deep seem pretty common in JSON.
BTW glom is super handy!
Thanks, W
would be handy if Assign could take S-based paths as well as T-based
>>> glom({}, Assign(T['foo'], 'bar'))
{'foo': 'bar'}
it seems intuitive that Assign(S...) would work the same way:
>>> glom({}, (Assign(S['foo'], 'bar'), S['foo']))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\users\kurt\workspace\glom\glom\mutable.py", line 100, in __init__
path = Path(path)
File "c:\users\kurt\workspace\glom\glom\core.py", line 277, in __init__
% sub_parts[0])
ValueError: path segment must be path from T, not T
just expanding ideas, getting some terminology out that can nucleate further docs / cookbook items and guide future development --
a "macro" or "glomacro" in the glom context is a function or callable that:
- outputs a spec
- may take anything as input (a valid spec, any Python objects, or both)
- is meant to run once, at spec definition time

a "compiler" or "glompiler" in the glom context is a function or callable that:
- takes a spec as input
- may output anything (a valid spec, or any Python object)
- is meant to run once against a spec (since specs are meant to be small in number and global, things generated from specs should be similar)
both of these concepts could eventually be supported by official Macro and Compiler types; this would be a stepping stone to important tools like coverage checking
both of these concepts (but especially compilers) could be supported by glom specs that accept and/or output other glom specs, aka "meta-specs" or "glometas"
I downloaded the sources and ran sudo ./configure, which failed with:
configure: error: *** A compiler with support for C++14 language features is required.
Can you help me?
Assign is a great enhancement to glom's features, but it seems to work only with fixed values, whereas the fetch syntax allows execution of a function at a given path. The simplest example would be:
from glom import assign, Call, T
o = {'a': 2}
def f(x):
return 2 * x
# classic python
o['a'] = f(o['a'])
# > {'a': 4}
assign(o, 'a', f) # doesn't work
# > {'a': <function f at 0x7effe21b91e0>}
assign(o, 'a', Call(f)) # doesn't work
# > Call(<function f at 0x7f65c8c7a1e0>, args=(), kwargs={})
assign(o, 'a', f(T['a']))
# TypeError: unsupported operand type(s) for *: 'int' and 'TType'
A more realistic use case would be:
from functools import partial
from operator import itemgetter
sort_by_quantity = partial(sorted,key=itemgetter('quantity'))
assign(fat_nested_structure,'long.path.to.a.list.of.dicts',sort_by_quantity)
# expected fat_nested_structure["long"]["path"]["to"]["a"]["list"]["of"]["dicts"] get sorted inplace by 'quantity'
Hi, thanks for the awesome project!
Is there a way to convert [{"id": 1, "name": "foo"}, {"id": 2, "name": "bar"}] to {1: "foo", 2: "bar"}?
Best regards, Artem.
Might be nice to have the equivalent of re.compile(), or a runnable spec object.
E.g.
s = Spec({ glom-spec })
result = s.glom(target)
I'm mapping a large dictionary to a smaller (but still many-property) one using glom, and ran into a strange issue. I'm using Coalesce to easily enable a default for every property in the mapped values, and T to ease the legibility of my code. I'm not sure if I'm missing something here and using it incorrectly, or if this is a difference which shouldn't happen. It basically looks like, in the context of Coalesce, T works differently than a simple string path.
Simplified version:
from glom import glom, T, Coalesce
complete_data = {
'object': {
'prop1': 1,
'prop2': {
'nested': 2,
},
},
}
limited_data = {
'object': {
'prop1': 1,
'prop2': None,
},
}
def full_coalesce(obj_spec):
return {k: Coalesce(v, default=None) for k, v in obj_spec.items()}
obj = T['object']
working_spec = full_coalesce({
'prop1': obj['prop1'],
'nested': 'object.prop2.nested',
})
broken_spec = full_coalesce({
'prop1': obj['prop1'],
'nested': obj['prop2']['nested'],
})
print glom(complete_data, working_spec)
print glom(limited_data, working_spec)
print glom(complete_data, broken_spec)
print glom(limited_data, broken_spec)
The output:
{'prop1': 1, 'nested': 2}
{'prop1': 1, 'nested': None}
{'prop1': 1, 'nested': 2}
Traceback (most recent call last):
File "testing-glom.py", line 41, in <module>
print glom(limited_data, broken_spec)
File "./venv/lib/python2.7/site-packages/glom/core.py", line 919, in glom
ret = self._glom(target, spec, path=path, inspector=inspector)
File "./venv/lib/python2.7/site-packages/glom/core.py", line 949, in _glom
val = self._glom(target, subspec, path=path, inspector=next_inspector)
File "./venv/lib/python2.7/site-packages/glom/core.py", line 994, in _glom
ret = self._glom(target, subspec, path=path, inspector=next_inspector)
File "./venv/lib/python2.7/site-packages/glom/core.py", line 978, in _glom
ret = _t_eval(spec, target, path, inspector, self._glom)
File "./venv/lib/python2.7/site-packages/glom/core.py", line 678, in _t_eval
cur = cur[arg]
TypeError: 'NoneType' object has no attribute '__getitem__'
Show options for how to filter a list:
1. lambda list comprehension on the outside: lambda t: [e for e in t if cond]
2. lambda returning OMIT: lambda v: v if cond else OMIT
First of all, this is a wonderful package, and basically ideal for my core use case.
When writing tests for glommy kinds of stuff, I sometimes have to go the other way: assigning something to some deeply nested data structure:
my_obj["a"]["b"]["c"]["d"] = True
Right now, all my ways of handling this are kind of kludgy. But it seems like there could be an inverse glom that could do something like this:
iglom(my_obj, "a.b.c.d", True)
Realistic? Is there something that already does this? I bet there's something that already does this and I missed the memo.
as we are building larger and larger glom-specs it is important to make sure unit tests are covering all the nooks and crannies; for that reason add to Inspect or wherever is appropriate a coverage ability --
c = Coverage(SPEC)
result = glom(target, c)
print(c.coverage_report())
something akin to this (maybe Coverage is really Inspect).
Two steps:
1. walk the whole spec, get all children specs, and put them in a set
2. during execution, remove specs from the set as they are hit

Afterwards, generate a report -- a first idea for this is a pretty-printed version of the glom with children that were hit colored green and children that were missed colored red.
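Step 1 above (collecting child specs) can be roughed out with the stdlib alone; this toy version only descends through tuple, list, and dict specs and uses repr() as a stand-in for spec identity:

```python
def collect_subspecs(spec, seen=None):
    """Gather every child spec of a (toy) glom spec into a set."""
    seen = set() if seen is None else seen
    seen.add(repr(spec))  # repr as a hashable stand-in for identity
    if isinstance(spec, (tuple, list)):
        for sub in spec:
            collect_subspecs(sub, seen)
    elif isinstance(spec, dict):
        for sub in spec.values():
            collect_subspecs(sub, seen)
    return seen

subspecs = collect_subspecs(('system.planets', ['name']))
# step 2 would discard entries from this set as each spec executes
```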
Hi,
sorry if this is too obvious, but from the tutorial I can't tell whether glom supports grouping, or if this is not something it's intended to do.
What if you extend the planets example to include a category and want to sum up the moons by that category?
from glom import glom, T
target = {'system': {'planets': [{'name': 'earth', 'category':1, 'moons': 1},
{'name': 'pluto', 'category':1, 'moons': 5},
{'name': 'uranus', 'category':2, 'moons': 5},
{'name': 'jupiter', 'category':2, 'moons': 69}]}}
spec = T['system']['planets']
if I wanted to sum up the moons by category, I could run
[
sum([p['moons'] for p in glom(target, spec) if p['category'] == 1]),
sum([p['moons'] for p in glom(target, spec) if p['category'] == 2])
]
But is there a way of adding filtering (for example, summing only category 2) or grouping directly to the spec?
Or is this outside the scope of glom?
UPDATE:
A spec without the T notation works for filtering:
from glom import OMIT
spec = ('system.planets', [lambda x : x['moons'] if x['category'] == 2 else OMIT], sum)
As the T-notation is not fixed yet, I'm closing the issue.
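For the grouping half of the question, a stdlib defaultdict outside of the spec is still the simplest route (a sketch on the same planets data):

```python
from collections import defaultdict

planets = [{'name': 'earth', 'category': 1, 'moons': 1},
           {'name': 'pluto', 'category': 1, 'moons': 5},
           {'name': 'uranus', 'category': 2, 'moons': 5},
           {'name': 'jupiter', 'category': 2, 'moons': 69}]

# sum moons per category in one pass
moons_by_category = defaultdict(int)
for p in planets:
    moons_by_category[p['category']] += p['moons']
# dict(moons_by_category) == {1: 6, 2: 74}
```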
Now that #32 has added scopes to glom, it's possible to do multi-target glomming. I think the next frontier in this area is enabling relative paths.
The obvious analogy is that of filesystem paths: T is . and we could add something like .., which could enable embeddable spec components and other self-referential fun.
There's an undocumented UP constant that works with T, but it doesn't work on Path, and it doesn't function like .. in that it can't appear at the start of a path.
I haven't needed UP much, but I'll be keeping an eye out for utility. If it starts getting useful, I suggest we close the gaps above, and rename it to U to keep the pattern with S and T.
the job of a Traverse is to walk its target recursively and return an iterator over all of the bits (as in depth-first or breadth-first traversal) -- this could perhaps share some bits with TargetRegistry
this is very useful when combined with Check and Assign for a kind of pattern-matching strategy:
# not sure if Traverse even needs an argument or if it should just implicitly walk current target
# maybe the argument should specify what it iterates over: just items, items + paths, etc
glom(target, (Traverse(T), (Check(T.val, validate=lambda t: t < 0), Assign('val', 0))))
# ensure T.val >= 0
if there were an un-traverse glom possible, that would be even more powerful; but in the absence of that, being able to do something to the items being traversed is still useful
the ultimate goal of this kind of approach is a useful meta-glom -- you can imagine transformations like "set all defaults to a unique marker object that stores the path" to debug why an output is coming back as None
the ultimate, ultimate goal is useful glom-macros (glomacros?) and glom-compilation (glompilation?)
The YAML test test_yaml_target fails due to a missing YAML file (test_valid.yaml). That file doesn't exist in the distribution, and it's not clear where it's supposed to come from.
=================================== FAILURES ===================================
_______________________________ test_yaml_target _______________________________
def test_yaml_target():
cwd = os.path.dirname(os.path.abspath(__file__))
# Handles the filepath if running tox
if '.tox' in cwd:
cwd = os.path.join(cwd.split('.tox')[0] + '/glom/test/')
path = os.path.join(cwd, 'data/test_valid.yaml')
argv = ['__', '--target-file', path, '--target-format', 'yml', 'Hello']
> assert main(argv) == 0
glom/test/test_main.py:23:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
glom/cli.py:83: in main
return cmd.run(argv) or 0
/nix/store/jpyxgdwhniixch3cqq9g922vrsg8pfkj-python2.7-face-0.1.0/lib/python2.7/site-packages/face/command.py:380: in run
return inject(wrapped, kwargs)
/nix/store/jpyxgdwhniixch3cqq9g922vrsg8pfkj-python2.7-face-0.1.0/lib/python2.7/site-packages/face/sinter.py:59: in inject
return f(**kwargs)
<string>:6: in next_
???
glom/cli.py:173: in mw_get_target
_error('could not read target file %r, got: %s' % (target_file, ose))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
msg = "could not read target file '/build/glom-18.3.1/glom/test/data/test_valid.yaml', got: [Errno 2] No such file or directory: '/build/glom-18.3.1/glom/test/data/test_valid.yaml'"
def _error(msg):
# TODO: build this functionality into face
print('error:', msg)
> raise CommandLineError(msg)
E CommandLineError: could not read target file '/build/glom-18.3.1/glom/test/data/test_valid.yaml', got: [Errno 2] No such file or directory: '/build/glom-18.3.1/glom/test/data/test_valid.yaml'
glom/cli.py:101: CommandLineError
----------------------------- Captured stdout call -----------------------------
error: could not read target file '/build/glom-18.3.1/glom/test/data/test_valid.yaml', got: [Errno 2] No such file or directory: '/build/glom-18.3.1/glom/test/data/test_valid.yaml'
I work on a project that flings around deeply nested Python structures with wild abandon. glom nicely handles the "get something from this structure even if all the branches of the path aren't there" and now I can replace some code I wrote. Yay!
The other side of things that I need to handle is setting a value in a deeply nested structure where the branches of the path may not be there.
For example, maybe something like this which uses dicts:
>>> from glom import glom_set
>>> foo = {}
>>> glom_set(foo, 'a.b.c', value=5)
>>> foo
{'a': {'b': {'c': 5}}}
There are more complex tree manipulations that could be done, but at the moment I'm thinking about setting a single leaf value.
Is manipulating deeply nested data structures in place in-scope for glom?
I'm dealing with some nested lists and would like to do something like:
>>> target = {
... 'f1': 'v',
... 'f2': [{
... 'f3': 'a',
... 'f4': 0,
... 'f5': [{
... 'f6': 1,
... 'f7': 2, ...
... }, ...], ...
... }, ...], ...
... }
>>> glom(target, ...)
[{'f1': 'v', 'f2.f3': 'a', ..., 'f2.f5.f6': 1},
{'f1': 'v', 'f2.f4': 0, ..., 'f2.f5.f6': 1},
{'f1': 'v', 'f2.f3': 'a', ..., 'f2.f5.f7': 2}, ...]
I can get to the list of 'f2.f5.f6' kind of fields but how do I merge this list with parent values? Is this even possible?
Look at this beauty ❤️
from toolz import curry
from toolz.curried import pipe, map
from glom import glom
callsigns = [{'callsign':'goose'}, {'callsign':'maverick'}]
@curry
def glom_curried(spec, v):
return glom(v, spec)
# --- userland code ---
convert_callsigns = glom_curried({'name':'callsign'})
print(pipe(callsigns,
map(convert_callsigns),
list))
perhaps this can be a nice API:
from glom.curried import glom
convert_callsigns = glom({'name':'callsign'})
print(pipe(callsigns,
map(convert_callsigns),
list))
Is there a spec that can do the following?
d = {'a': 1, 'b': 2}
glom(d, <some_spec_here>)
>> (1,2)
My intended use case is to extract tuples of selected data from complex items, which could then be used for sorting or grouping. The glom API easily creates dict outputs, but in this case the output needs to be an immutable tuple. It also differs from the nested-list example in the tutorial because each element of the tuple is reached through a different path.
@mahmoud @kurtbrose Hi guys, there is something I can't understand, please help me.
from glom import glom
target = {'system': {'planets': [{'name': 'earth'}, {'name': 'jupiter'}, {'name2': 'jupiters'}]}}
print glom(target, ('system.planets', ['name']))
Why does this raise a PathAccessError?
Does this mean the dict whose key is 'name2' can not be in the list?
And how can I get data like {'plants': ['earth', 'jupiter', 'jupiters']} in a Pythonic way using glom?
thank you.
Related to #81
I have the same problem with other attributes in dicts that must be renamed before actually working with them.
def tranform_data(webhook):
return glom(webhook_data, (
Assign('project.id', Spec('project.uid')),
# Issue tranformation:
Assign('object_attributes.id', Spec('object_attributes.uid')),
Assign('issue', Spec('object_attributes')),
))
And then usage:
webhook_data = {'project': {'uid': 1}, 'object_attributes': {'uid': 2}}
modified = tranform_data(webhook_data)
# => {'project': {'id': 1, 'uid': 1}, 'object_attributes': {'id': 2, 'uid': 2}, 'issue': {'id': 2, 'uid': 2}}
That can quickly become too complex, when we really just want to rename the key.
So that's why I suggest adding a Rename mutation.
That's how it would work:
That's how it would work:
def tranform_data(webhook):
return glom(webhook_data, (
Rename('project.id', Spec('project.uid')),
# Issue tranformation:
Rename('object_attributes.id', Spec('object_attributes.uid')),
Rename('issue', Spec('object_attributes')),
))
webhook_data = {'project': {'uid': 1}, 'object_attributes': {'uid': 2}}
modified = tranform_data(webhook_data)
# => {'project': {'id': 1}, 'issue': {'id': 2}}
I've been using glom a fair amount lately, but one thing that's mildly frustrating is how obvious it is that I'm using it:
a = data['owner']['name']['last']
b = glom(data, T['owner']['name']['last'])
Personally, I think the second assignment is a bit more convoluted and hard to read at a glance. However, if there were some sort of operator overload on T, I think it could be a lot cleaner:
c = data | T['owner']['name']['last']
Looking at the definition of TType, it seems like adding an operator overload would be pretty simple. In this case, it would be the __ror__ operator.
Another idea that I think might clean things up a bit is a wrapper that facilitates something like this:
d = G(data)['owner']['name']['last']
But I'm not sure how you would "conclude" the lookup and tell it to evaluate, rather than keep providing a nestable object.
T = target, S = scope; P = path?
Is P the 0 of path operations? Path(T, P, P, P, P) == Path(T)
Currently there are a few places we assume T is the "blank" path; this paints us into a corner when we want to break a path into chunks, as in assign, where we break the "GOTO" prefix from the "ASSIGN THIS VALUE" tail of the path.
Maybe we should have a true "empty" path global, and T and S should be like different root directories.
P for Path? R for Relative?
Right now glom is only extensible in the sense that you can register new types for automatic handling, etc.
But internally there's an emerging signature of what plugins to glom's recursion could look like. I think after #7 adds validation, we could look at turning that API into a GlomContext object and exposing this.
Per @jcollado's comment in #7, most validation libraries don't support custom error messages.
I've just merged #25, which brings in the first iteration of Check(), which can perform a variety of validations. With that in place, we can discuss custom error messages as an enhancement.
If you look at the docstring of Check, you'll see that there's a validate kwarg, which accepts callables. I'm thinking of making this a mapping of callables, where the key is the callable and the value is a message or message template.
Thoughts, @jcollado, @kurtbrose, others?
glom.glom may be able to use yield for the same reason as twisted / asyncio / etc. -- to form a trampoline function and avoid infinite recursion.
That is, we could avoid the recursion limit in cases like this:
>>> a = {'a': glom.T}
>>> for i in range(500): a = {'a': a}
...
>>> glom.glom(1, a)
#...
File "glom\chainmap_backport.py", line 113, in new_child
return self.__class__(m, *self.maps)
File "glom\chainmap_backport.py", line 63, in __init__
self.maps = list(maps) or [{}] # always at least one map
RuntimeError: maximum recursion depth exceeded while calling a Python object
A real-world example popped up of using the scope vs. data structures outside of the scope.
I thought it might be helpful to rework it for the cookbook or docs -- showing, in a relatively simple case, how both approaches work.
def totally_outside_glom():
models = queryset()
models_by_id = { model.id: model for model in models}
values = models.values('id', 'bar')
for valdict in values:
valdict['model'] = models_by_id[valdict['id']]
glom(values,
[{
'foo': ('model.foo', T()),
'bar': 'bar'
}])
def using_scope():
models = queryset()
values = models.values('id', 'bar')
glom(values,
[{
'foo': S['models-by-id'][T['id']].foo(),
'bar': 'bar',
}],
scope={
'models-by-id': { model.id: model for model in models}
}
)
maybe it could be pushed even further down into the scope
Have you considered adding a "safe navigation operator" to glom? It could be very elegant and powerful to use a spec like a?.b?.c, and it's a well-established feature in several languages.
While this wasn't on my mind when I first started out, it's been pointed out to me that glom may also benefit from a validation story.
Some preliminary design work suggests the following would work well:
- a Check() specifier type
- type=..., value=..., and maybe other kwargs
- an action kwarg to determine what to do if the Check fails the condition ('omit', 'raise', other?)
- usage like Inspect, where it can wrap a spec or appear on its own (probably after the spec it's supposed to check)

This is great for assert-like functionality here and there, but for heavily Checked specs, we may want to have a convenience construct of some sort.
From https://glom.readthedocs.io/en/latest/snippets.html
glom({1:2, 2:3}, Call(dict, args=T.items())  <--- missing a paren
glom({1:2, 2:3}, lambda t: dict(t.items()))
glom({1:2, 2:3}, dict)
It also crashes when the missing paren is fixed:
>>> from glom import glom, T, Call
>>> glom({1:2, 2:3}, Call(dict, args=T.items()))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/glom/core.py", line 1446, in glom
ret = _glom(target, spec, scope)
File "/usr/local/lib/python3.6/dist-packages/glom/core.py", line 1462, in _glom
return spec.glomit(target, scope)
File "/usr/local/lib/python3.6/dist-packages/glom/core.py", line 747, in glomit
return _eval(self.func)(*args, **kwargs)
TypeError: dict expected at most 1 arguments, got 2
I got there because I can't figure out how to transform into a dict where the keys come from the data, which is a bunch of dicts in a list (the data is from a JQL response). For example:
{ 'issues' : [ { 'id': '999999', 'summary': 'this is issue 999999'},
{'id': '888888', 'summary': 'this is 888888'}
]
}
I'd like to get this into a new dict that looks like:
{ '999999': 'this is issue 999999', '888888': 'this is issue 888888' }
The data-driven example above makes some sense, but I can't wrap my head around how to do it in the context of the issues list.
Will go back and try some more. Thanks
T and S are really hard to google for -- see Q and F in Django queries.
In the documentation verbiage, refer to "T" as "TARGET" and "S" as "SCOPE"; then in all code samples we can use T and S, and advise users to do so. But if you google for "what is TARGET glom", you'll get meaningful hits in our docs.
python_requires allows you to bundle actual machine-readable metadata about what Python versions are supported in your distributions. It is best to add it before you need it, since pip falls back to the most recent version of a library where your Python version meets the requirements, so you don't want to add this after your support matrix changes.
I think this project needs to add this to its setup():
python_requires=">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"
Note that you must also build with a recent version of setuptools and upload your packages with twine in order for PyPI to respect it.
Right now the tutorial is coherently designed, tested, and even documented. However, it doesn't build up in a way that's very beginner friendly. It establishes glom's value and then immediately uses it at an intermediate level.
I'd like it if it were a bit more drawn out, using basic features first and then adding a multi-line Coalesce as the finisher. The announcement blog post does a better job of this, but doesn't go far enough before jumping ahead to T.
At first, T and Path seem like they have a lot of overlap. But T is for very specific access, and Path() is for looser, more general access. It doesn't look like either is going away, or merging into the other, and that's good, because it means this is all we need to do:
- allow T objects inside of Path, such that Path(T) gives the same result as Path() (see note below)
- make _get_path support resolving Ts
- drop GlomKeyError and so forth, opting instead for PathAccessError throughout

The one disadvantage is that users won't be able to use T as keys in their own dict targets, but that seems a very, very niche case. The one challenge is making the PathAccessError messaging around the "index" of the path with the problem reflect the T traversal.
Sometimes we need to move data from key to key.
def format_data(webhook_data):
return glom(webhook_data, (
Assign('object_attributes.project', Spec('project')),
))
And then using it:
webhook_data = {'project': {'uid': 1}, 'object_attributes': {'uid': 2}}
modified = format_data(webhook_data)
# => {'project': {'uid': 1}, 'object_attributes': {'uid': 2, 'project': {'uid': 1}}}
It works, but this way we tend to use more memory than we need: storing the extra 'project' key is not required, and it may be way too memory-consuming when dictionaries are big and there are many inputs.
So that's why I suggest adding a Move class to handle that.
With Move, it would work like so:
def format_data(webhook_data):
return glom(webhook_data, (
Move('object_attributes.project', Spec('project')),
))
webhook_data = {'project': {'uid': 1}, 'object_attributes': {'uid': 2}}
modified = format_data(webhook_data)
# => {'object_attributes': {'uid': 2, 'project': {'uid': 1}}}
If you like the idea - I would be happy to work on it.
I'm running into a weird issue using glom in PySpark on Databricks.
This expression:
glom(ping, (T[stub]["values"].values(), sum), default=0)
(where stub is "a11y_time")
is consistently throwing this exception when I run it on my real data:
/databricks/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    318             raise Py4JJavaError(
    319                 "An error occurred while calling {0}{1}{2}.\n".
--> 320                 format(target_id, ".", name), value)
    321         else:
    322             raise Py4JError(

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 22.0 failed 4 times, most recent failure: Lost task 0.3 in stage 22.0 (TID 243, 10.166.248.213, executor 2): org.apache.spark.api.python.PythonException:
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 229, in main
    process()
  File "/databricks/spark/python/pyspark/worker.py", line 224, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/databricks/spark/python/pyspark/serializers.py", line 372, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/databricks/spark/python/pyspark/rdd.py", line 1354, in takeUpToNumLeft
    yield next(iterator)
  File "<command-26292>", line 10, in to_row
  File "<command-26292>", line 5, in histogram_measures
  File "/databricks/python/local/lib/python2.7/site-packages/glom/core.py", line 753, in __getitem__
    return _t_child(self, '[', item)
  File "/databricks/python/local/lib/python2.7/site-packages/glom/core.py", line 791, in _t_child
    _T_PATHS[t] = _T_PATHS[parent] + (operation, arg)
  File "/usr/lib/python2.7/weakref.py", line 330, in __getitem__
    return self.data[ref(key)]
KeyError: <weakref at 0x7f84c7d2f6d8; to '_TType' at 0x7f84c8933f30>
The object that's crashing it is, itself, totally unremarkable:
{'submission_date': u'20180718', 'a11y_count': None, 'a11y_node_inspected_count': None, 'a11y_service_time': None, 'toolbox_time': None, 'toolbox_count': None, 'a11y_time': None, 'branch': u'Treatment', 'client_id': u'some-random-uuid', 'a11y_picker_time': None, 'a11y_select_accessible_for_node': None}
The Python that Databricks is running looks like 2.7.12 (default, Dec 4 2017, 14:50:18) [GCC 5.4.0 20160609].
I can't reproduce it on my Mac in 2.7.14 or 2.7.12.
{T} could work exactly like [T], but build up a set rather than a list.
Python doesn't have a great track record of CLIs that handle piping well. Basically, we need to break that mold and make sure glom plays nicely in shell pipelines.
Some links on the topic:
face middleware should make it easy to semi-contextually catch the error, register a different signal handler, or even inject wrapped stdin/stdout handles.
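For reference, the usual minimal fix in Python CLIs (a common pattern, not glom's or face's actual implementation) is to restore the default SIGPIPE disposition so that writing into a closed pipe, e.g. `glom ... | head`, exits quietly instead of raising BrokenPipeError:

```python
import signal

# Restore default SIGPIPE handling; Python installs SIG_IGN-style handling
# at startup, which turns closed pipes into BrokenPipeError tracebacks.
if hasattr(signal, "SIGPIPE"):  # SIGPIPE doesn't exist on Windows
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)
```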
How could I find a branch item by name (e.g. name='branch-a-a-a' or name='branch-b') in a deeply nested JSON document (depth may be > 10)?
Can anyone help me? Thanks.
{
"root": [
{
"type": "branch",
"name": "branch-a",
"children": [
{
"type": "branch",
"name": "branch-a-a",
"children": [
{
"type": "branch",
"name": "branch-a-a-a",
"children": [
{
"type": "leaf",
"name": "leaf-a"
},
{
"type": "leaf",
"name": "leaf-aa"
}
]
}
]
}
]
},
{
"type": "branch",
"name": "branch-b",
"children": [
{
"type": "leaf",
"name": "leaf-ba"
},
{
"type": "leaf",
"name": "leaf-bb"
}
]
}
]
}
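A minimal answer sketch, assuming the shape shown above (dicts with a 'name' and an optional 'children' list); plain Python recursion keeps it simple:

```python
def find_by_name(node, name):
    """Depth-first search for the first dict whose 'name' matches."""
    if isinstance(node, dict):
        if node.get('name') == name:
            return node
        # descend into 'children', or 'root' at the top level
        for child in node.get('children', []) or node.get('root', []):
            found = find_by_name(child, name)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_by_name(item, name)
            if found is not None:
                return found
    return None

tree = {"root": [{"type": "branch", "name": "branch-a", "children": [
    {"type": "branch", "name": "branch-a-a", "children": [
        {"type": "leaf", "name": "leaf-a"}]}]}]}
print(find_by_name(tree, "branch-a-a")["name"])  # branch-a-a
```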
The CLI help specifies that 'python-full' is an acceptable option for the spec format, but if you try to use it, you get an error: expected spec-format to be one of python or json