As in several places in Python's syntax, commas in subscripts are a bit overloaded. Do they simply form a tuple, or are they a separator in a distinct special form, as they are in e.g. function calls?
At runtime, Python tries really hard to pretend that they simply form a tuple: `foo[1, 2, 3]` and `foo[(1, 2, 3)]` are indistinguishable (in both cases the `__getitem__` of `foo`'s class receives the tuple `(1, 2, 3)`); even `foo[1, 2, 3:4]` just sends the tuple `(1, 2, slice(3, 4, None))` to `__getitem__`.
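To make that concrete, here is a minimal sketch using a throwaway class (`Recorder` is a name made up for this example) whose `__getitem__` simply returns whatever key it was handed:

```python
class Recorder:
    """Hypothetical helper: hands back the key passed to __getitem__."""

    def __getitem__(self, key):
        return key


foo = Recorder()

# With or without parentheses, the same tuple arrives.
plain = foo[1, 2, 3]
parenthesized = foo[(1, 2, 3)]

# A slice among the elements becomes a slice object inside the tuple.
mixed = foo[1, 2, 3:4]
```

Here `plain` and `parenthesized` are both the tuple `(1, 2, 3)`, and `mixed` is `(1, 2, slice(3, 4, None))`.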
But there's a catch: the literal syntax for slices (e.g. `3:4`) is not valid anywhere except directly inside a subscript. It is not valid in a tuple. So while `foo[1, 2, 3:4]` is valid syntax, `foo[(1, 2, 3:4)]` is not, giving the lie to the idea that these forms should be fully equivalent. (What `foo[1, 2, 3:4]` actually means is a different question entirely; it's not used in the standard library, but it is used for e.g. slicing multi-dimensional NumPy arrays.)
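The difference is easy to check without evaluating anything, since the built-in `compile()` only needs to parse the source. This is a small sketch; `is_valid_expression` is a helper name invented for the example:

```python
def is_valid_expression(source: str) -> bool:
    """Return True if `source` parses as a Python expression."""
    try:
        # mode="eval" parses a single expression; nothing is executed,
        # so `foo` does not need to exist.
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


bare = is_valid_expression("foo[1, 2, 3:4]")
wrapped = is_valid_expression("foo[(1, 2, 3:4)]")
```

`bare` comes out `True` and `wrapped` comes out `False`: the slice literal is only accepted directly inside the subscript brackets.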
This leaves some gray area in terms of how a syntax tree should represent these forms. It seems that `foo[1, 2, 3]` should be a simple tuple index. And in fact, Python's AST does represent it that way! But `foo[1, 2, 3:4]` cannot be represented that way without weakening the tuple node to allow it to contain a literal slice, which in general it cannot. So the AST invents a special `ExtSlice` node, which is effectively a tuple that can contain slices, and which occurs in the AST only when a subscript contains multiple comma-separated values, at least one of which is a slice.
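If you want to see what the `ast` module of your own interpreter produces, a sketch like the following reports the node type stored in the subscript's `slice` field (`slice_node_type` is a name made up here). One caveat: the result depends on the Python version. As far as I know, the `Index`/`ExtSlice` shape described above is what Python 3.8 and earlier produce, while Python 3.9 and later report a plain `Tuple` for both forms.

```python
import ast


def slice_node_type(source: str) -> str:
    """Return the class name of the node in a subscript's `slice` field."""
    subscript = ast.parse(source, mode="eval").body
    if not isinstance(subscript, ast.Subscript):
        raise ValueError("expected a single subscript expression")
    return type(subscript.slice).__name__


# Node names vary by interpreter version, so no fixed output is shown.
without_slice = slice_node_type("foo[1, 2, 3]")
with_slice = slice_node_type("foo[1, 2, 3:4]")
```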
LibCST handles this similarly, with one difference. Today in LibCST the `slice` attribute of a `Subscript` node can be one of three types:
- An `Index` node containing any arbitrary expression (e.g. the `1` in `foo[1]`). This is like the AST.
- A `Slice` node representing a slice such as `1:` or `2:4` or `2:8:2`. This is also like the AST.
- A `Sequence` of `ExtSlice` nodes, each of which has an optional trailing comma and a value that is either an `Index` or a `Slice`. This is used for cases like `foo[1, 2, 3:4]`, but also for simple `foo[1, 2, 3]`. Unlike the AST, LibCST uses `ExtSlice` any time the subscript has commas at the top level, not only when one of the elements is a slice.
There's no perfect answer here; either the representation of `foo[1, 2, 3]` will be awkwardly un-parallel to `foo[(1, 2, 3)]` (the LibCST choice) or awkwardly un-parallel to `foo[1, 2, 3:4]` (the AST choice).
In practice, we've found that the irregularity of LibCST's current representation is painful to work with, particularly with subscripts in PEP 484 generic types, because `Foo[Bar]` and `Foo[Bar, Baz]` have totally different LibCST representations (the former subscript is an `Index` containing a `Name`, the latter is a length-2 sequence of `ExtSlice`, each containing a `Name`). This runs counter to LibCST's core value of regularity.
One possible fix would be to move in the direction of the AST, and use an `Index` containing a `Tuple` whenever possible, falling back to `ExtSlice` only if one of the elements is a slice. This improves regularity a bit (you still have to handle both `Name` and `Tuple` in the above case) as long as you are not using slices in a multi-element subscript, but if you ever do, things get awkwardly irregular again.
So after much discussion with @DragonMinded, we feel that the best option here is to move in the other direction, and regularize a `Subscript` to always contain a `Sequence` of `ExtSlice`, each of which can contain either an `Index` or a `Slice`. This adds an additional layer in the simple cases of `foo[1]` and `foo[2:3]`, but it means that traversing a `Subscript` is always regular.
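As a rough illustration of what "always regular" buys, here is a sketch of the proposed shape using plain dataclasses. These are stand-ins that borrow the node names from this proposal; they are not LibCST's actual classes, and expressions are stored as source strings purely to keep the example short. The helper `index_values` is likewise made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass
class Index:
    value: str  # source text of the expression (simplification)


@dataclass
class Slice:
    lower: Optional[str] = None
    upper: Optional[str] = None
    step: Optional[str] = None


@dataclass
class ExtSlice:
    slice: Union[Index, Slice]


@dataclass
class Subscript:
    value: str
    slice: Sequence[ExtSlice]  # always a sequence, even for one element


def index_values(node: Subscript) -> List[str]:
    """Collect the non-slice elements with a single uniform loop."""
    return [el.slice.value for el in node.slice if isinstance(el.slice, Index)]


one = Subscript("Foo", [ExtSlice(Index("Bar"))])
two = Subscript("Foo", [ExtSlice(Index("Bar")), ExtSlice(Index("Baz"))])
mixed = Subscript(
    "foo",
    [ExtSlice(Index("1")), ExtSlice(Index("2")), ExtSlice(Slice("3", "4"))],
)
```

The same loop handles `Foo[Bar]`, `Foo[Bar, Baz]`, and `foo[1, 2, 3:4]`, with no special case for the single-element form.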
Ideally I might suggest that `ExtSlice` should also be renamed to something like `SubscriptElement`, since the LibCST `ExtSlice` bears very little resemblance to the AST one (the AST one is a singular container for a list of children, not a single element in the list), and on its own the name `ExtSlice` doesn't communicate clearly (already today, and especially under the new proposal, it will often exist in the absence of any slice at all). This rename could be done backwards-compatibly with a deprecation period if we provide an import shim for the old name.
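One way such a shim could look, as a sketch only: a module-level `__getattr__` (PEP 562, Python 3.7+) that resolves the old name to the new class and emits a `DeprecationWarning`. The `nodes` module below is built on the fly with `types.ModuleType` so the example is self-contained; it and the empty `SubscriptElement` class are placeholders, not LibCST's real module layout.

```python
import types
import warnings


class SubscriptElement:
    """Placeholder for the renamed node class."""


# Hypothetical module that would export the node classes.
nodes = types.ModuleType("nodes")
nodes.SubscriptElement = SubscriptElement


def _module_getattr(name: str):
    # Called only when normal attribute lookup on the module fails.
    if name == "ExtSlice":
        warnings.warn(
            "ExtSlice is deprecated; use SubscriptElement instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return SubscriptElement
    raise AttributeError(f"module 'nodes' has no attribute {name!r}")


nodes.__getattr__ = _module_getattr
```

Because the old name resolves to the very same class object, existing `isinstance` checks against `nodes.ExtSlice` would keep working during the deprecation period.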