instagram / libcst

A concrete syntax tree parser and serializer library for Python that preserves many aspects of Python's abstract syntax tree

Home Page: https://libcst.readthedocs.io/

License: Other

Python 86.03% Rust 13.96% Shell 0.01%

libcst's People

Contributors

akx, amyreese, bgw, cdonovick, dependabot[bot], dragonminded, giomeg, hauntsaninja, isidentical, jakkdl, jimmylai, josieesh, kelsolaar, kit1980, kronuz, lpetre, lrjball, luciawlli, mapleccc, martindemello, orf, shannonzhu, sk-, stroxler, thatch, venkatsubramaniam, zac-hd, zhammer, zsol, zzl0


libcst's Issues

Bowler integration

Bowler provides a nice API and CLI for writing codemods. However, Bowler uses the lib2to3 CST format, which you claim is complex and "makes it hard to extract the semantics we care about."
Still, Bowler is useful in that it provides a CLI interface, which makes it very easy to use in CI/CD environments.
Is there any plan to integrate libCST with Bowler? If so, when?

Immutable module attribute on libcst visitor / best way to use code_for_node

A common flow for me when debugging transformers is adding a new attribute to a visitor/transformer to capture the visited module as self.module. i'd do this in the visit_Module handler so that I can later do print(self.module.code_for_node(current_node)).
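
A minimal sketch of that flow (the leave_Call handler is purely illustrative):

import libcst as cst

class DebugTransformer(cst.CSTTransformer):
    def visit_Module(self, node: cst.Module) -> bool:
        # Capture the module so nodes can be rendered later via code_for_node.
        self.module = node
        return True

    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.Call:
        # Render the original node; it's the one attached to self.module.
        print(self.module.code_for_node(original_node))
        return updated_node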

is this in sync with how code_for_node is being used internally at instagram? this gets a bit confusing if the original module is mutated during tree traversal, so that code_for_node on an updated_node may not be totally accurate. (though this is where my full understanding of the visitor pattern gets fuzzier.)

just curious how y'all use code_for_node in debugging workflows.

[WIP] API for fully qualified name

Probably a FullQualifiedNameProvider that uses FullRepoManager.
Questions: how can we make sure FullRepoManager is easy to use? If people run TypeInferenceProvider without FullRepoManager, will it raise an exception?
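
For reference, a sketch of the FullRepoManager flow as documented (the path is hypothetical):

from libcst.metadata import FullRepoManager, TypeInferenceProvider

# FullRepoManager performs the full-repo analysis (e.g. pyre queries) up front,
# then hands out pre-populated metadata wrappers per file.
manager = FullRepoManager(".", {"pkg/mod.py"}, {TypeInferenceProvider})
wrapper = manager.get_metadata_wrapper_for_path("pkg/mod.py")
types = wrapper.resolve(TypeInferenceProvider)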

Add a `with_deep_changes` helper to make changing a deep Literal/Sequence attribute easier

We have useful helpers to generate an immutable updated node: .with_changes(**changes) and .deep_replace(old_node: CSTNode, new_node: CSTNode).

In many cases, users may want to update an attribute holding a non-CSTNode value (a str/int literal or a sequence), and if it's deep in the tree, we need something like this example:

# we want to replace "old_value" with "new_value"
node = cst.parse_expression('fn(a = "old_value", b = 2)')
updated_node = node.deep_replace(
    node.args[0].value, node.args[0].value.with_changes(value='"new_value"')
)
# note: we cannot pass node.args[0].value.value to deep_replace since
# it's a str instead of a CSTNode.

If we had a .with_deep_changes(old_node: CSTNode, **changes) that combines .with_changes() and .deep_replace(), the code could be really simple:

updated_node = node.with_deep_changes(node.args[0].value, value='"new_value"')

Lint Warning or Test for Missing Copyright Headers

When creating a new file, it's easy to forget to add the license preamble and copyright header. It would be nice if either a test or a lint rule caught this before landing. Ideally this would be a lint rule, but our lint framework that uses LibCST isn't open source yet, which makes it difficult to implement the rule there. This isn't high priority, but would be nice to add to our lint tox env once it is open sourced.
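
Until then, even a plain test without LibCST would catch most cases; a minimal sketch (the header text is an assumption):

from pathlib import Path

# Hypothetical expected preamble; adjust to the project's actual header.
HEADER = "# Copyright (c) Facebook, Inc. and its affiliates."

def test_copyright_headers() -> None:
    for path in Path("libcst").rglob("*.py"):
        assert path.read_text().startswith(HEADER), f"missing header: {path}"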

Add a py.typed file

Can you add a py.typed file (https://www.python.org/dev/peps/pep-0561/) for libcst? Without it I can't use the libcst types in my mypy typechecking when importing the package.

I understand if you're not ready to export the type definitions yet. Also, given I haven't used pyre before, maybe pyre doesn't require a py.typed file for exporting types?

Here's my error log from mypy:

Zachs-MBP:tornado-async-transformer zhammer$ mypy tornado_async_transformer
tornado_async_transformer/helpers.py:4: error: Cannot find module named 'libcst'
tornado_async_transformer/helpers.py:4: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
tornado_async_transformer/tornado_async_transformer.py:3: error: Cannot find module named 'libcst'
tornado_async_transformer/tool.py:8: error: Cannot find module named 'libcst'
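
For what it's worth, shipping the marker per PEP 561 is usually just a matter of including an empty py.typed file as package data; a sketch assuming a setuptools build:

from setuptools import find_packages, setup

setup(
    name="libcst",
    packages=find_packages(),
    # Ship the (empty) libcst/py.typed marker so type checkers pick up the
    # package's inline annotations (PEP 561).
    package_data={"libcst": ["py.typed"]},
    zip_safe=False,
)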

ExpressionContextProvider assigns wrong value to For.target

It looks like the ast module considers the target of a for loop to be a Store; however, the ExpressionContextProvider considers it a Load. I think there's value in having consistency between the two, and the target of a for loop does seem like it's actually a Store. Is this a bug, and should we fix it?

Repro steps:

In [8]: code = """
   ...: for _ in []:
   ...:     pass
   ...: """

In [9]: ast_module = ast.parse(code)

In [10]: ast_module.body[0].target.ctx
Out[10]: <_ast.Store at 0x7f79aa68bbe0>

In [11]: import libcst

In [12]: from libcst.metadata.expression_context_provider import (
    ...:     ExpressionContext,
    ...:     ExpressionContextProvider,
    ...: )

In [13]: from libcst.metadata import MetadataWrapper

In [14]: wrapper = libcst.MetadataWrapper(libcst.parse_module(code))

In [15]: metadata = wrapper.resolve(ExpressionContextProvider)

In [16]: wrapper.module.body[0].target
Out[16]:
Name(
    value='_',
    lpar=[],
    rpar=[],
)

In [17]: metadata[wrapper.module.body[0].target]
Out[17]: <ExpressionContext.LOAD: 1>

Add a utility to get docstring node for module/class/function

Something similar to ast.get_docstring() could be useful to find docstrings more easily.

It could be even better if this were available as Module.docstring, ClassDef.docstring, and FunctionDef.docstring, but that might not fit your API design that well, especially since a docstring isn't a special part of Python's grammar; it's just a regular string whose only special property is that it's the first statement in a module/class/function.

My own use case for such a utility would be modifying a class's docstring, or adding one if it's missing, in a visitor. I can obviously do this without such a utility, but I think it would make things easier.
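
A rough sketch of what a module-level helper could look like, mirroring ast.get_docstring (it only handles the plain SimpleString case):

from typing import Optional

import libcst as cst

def get_docstring(module: cst.Module) -> Optional[str]:
    # The docstring, if any, is the first statement: an Expr wrapping a string.
    if not module.body:
        return None
    first = module.body[0]
    if isinstance(first, cst.SimpleStatementLine) and first.body:
        expr = first.body[0]
        if isinstance(expr, cst.Expr) and isinstance(expr.value, cst.SimpleString):
            return expr.value.evaluated_value
    return None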

Validation too strict on "except"

Found this pattern in real-world code; it fails validation with "Internal error: Must have at least one space after except when ExceptHandler has a type."

try:
  ...
except(Exception):
  pass

and also:

try:
  ...
except(IOError, ImportError):
  pass

Make ParserSyntaxError more generic, use it in more places

ParserSyntaxError is a little too coupled to how pgen2 works (my fault), which prevents us from using it everywhere we should. Today it's possible to get a parser error that isn't a ParserSyntaxError; that shouldn't be possible.

I propose that we:

  • Remove the expected/encountered fields, and instead provide a helper function for creating an error message with those values. Not every syntax error follows this format.
  • Initialize line and column numbers to a sentinel. We can make the exposed line/column properties raise an exception if the sentinel was never replaced. While an exception may temporarily lack position information during construction, it should always have a line/column number from the end user's perspective.
  • Intercept exceptions caused by conversion functions in _base_parser, and attach line and column numbers before re-raising them.

[RFC] Analyze attribute access from imports to support dead import clean up better

Per @zsol 's request, propose this change. Feedback is welcome.

In this example, the access a.b is recorded as an access of both the a.b and a.c assignments, so the RemoveUnusedImportsCommand cleans up nothing.

import a.b
import a.c

a.b

This is because we only record the top-level name of assignments and accesses.
https://github.com/Instagram/LibCST/blob/master/libcst/metadata/scope_provider.py#L310-L311

For example, when we see an access a.b, a could be an import or a local object.
To better support dead import cleanup, we can record the best assignment for an access when multiple imports are available in infer_access.

https://github.com/Instagram/LibCST/blob/master/libcst/metadata/scope_provider.py#L786-L800

That means that, given an access, we can check all import assignments sharing the same top-level name (a.b and a.c) and only record the access against the assignment whose fully qualified name it starts with.
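
A sketch of that matching rule in plain Python (names are illustrative):

from typing import List, Optional

def best_assignment(access_name: str, import_names: List[str]) -> Optional[str]:
    # Among imports sharing the top-level name, pick the one whose dotted name
    # is a prefix of the access (longest match wins): for access "a.b",
    # import "a.b" matches but "a.c" does not.
    candidates = [
        name
        for name in import_names
        if access_name == name or access_name.startswith(name + ".")
    ]
    return max(candidates, key=len, default=None)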

Specify minimum versions for each dependency

I've been using libCST in Hypothesmith to generate source code, which has been lots of fun. (let me know if you're interested in bug reports, or how stable tests need to be before you'd want a PR)

It turns out that one of my environments had parso==0.2 already installed, and while pip install libcst worked fine, parso 0.2 is not actually compatible. Can I suggest specifying a minimum version for all your dependencies, and adding a test env that checks they're all still supported?

ImportFrom nodes do not reflect relative imports in module value

It seems impossible currently to tell whether an ImportFrom statement is importing relative to the package. The leading dots are dropped when the ImportFrom node's module Name is populated.

To reproduce:

Create a package:

mkdir import_issue
touch import_issue/__init__.py import_issue/cli.py import_issue/greeting.py

Make greeting.py look like this:

greeting = 'hello world'

Make cli.py look like this:

from .greeting import greeting
print(greeting)

Test that you get the right greeting:

python -m import_issue.cli

Generate CST for import_issue/cli.py:

python -m libcst.tool print import_issue/cli.py

The output looks like this:

Module(
  body=[
    SimpleStatementLine(
      body=[
        ImportFrom(
          module=Name(
            value='greeting',
          ),
          names=[
            ImportAlias(
              name=Name(
                value='greeting',
              ),
            ),
          ],
        ),
      ],
    ),
    SimpleStatementLine(
      body=[
        Expr(
          value=Call(
            func=Name(
              value='print',
            ),
            args=[
              Arg(
                value=Name(
                  value='greeting',
                ),
              ),
            ],
          ),
        ),
      ],
    ),
  ],
)

If you modify import_issue/cli.py to look like this (a non-relative import), you get the exact same CST:

from greeting import greeting
print(greeting)

feature question: infer a function def for a call

I adapted a pylint plugin today and found it has a helpful util that tries to infer the function def for a call node while traversing the AST.

here's how i use safe_infer:

def visit_call(self, node):
    if not self.linter.is_message_enabled(self.MESSAGE_ID):
        return

    func_def = utils.safe_infer(node.func)
    # i then check if the called function has a specific coroutine
    ...

here's the definition: https://github.com/PyCQA/pylint/blob/47fdef57e6c453ddbd65d4960db20c0e580ee041/pylint/checkers/utils.py#L1077-L1097

is there anything like this in use in libcst tools at instagram? it gives a lot more power to linters/codemods imo. i imagine this'd look something like:

  1. do an initial pass to collect metadata on function (as well as other) definitions, keeping their file/module path
  2. use that metadata in a visitor to help infer the definition of some function/class/etc.

Recipe for adding an import

Curious if you have a preferred way of adding an import to a module. This was a quick solution I put together, but it wouldn't work in a lot of scenarios and is messy:

    @staticmethod
    def _with_added_import(module_node: cst.Module, import_node: cst.Import) -> cst.Module:
        """
        Adds the new import `import_node` after the first import in the module `module_node`.
        """
        updated_body: List[Union[cst.SimpleStatementLine, cst.BaseCompoundStatement]] = []
        added_import = False
        for line in module_node.body:
            updated_body.append(line)
            if not added_import and IterItemsTransformer._is_import_line(line):
                updated_body.append(cst.SimpleStatementLine(body=tuple([import_node])))
                added_import = True

        return module_node.with_changes(body=tuple(updated_body))

    @staticmethod
    def _is_import_line(line: Union[cst.SimpleStatementLine, cst.BaseCompoundStatement]) -> bool:
        return (
            isinstance(line, cst.SimpleStatementLine) and
            len(line.body) == 1 and
            isinstance(line.body[0], (cst.Import, cst.ImportFrom))
        )
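
As an aside, later LibCST releases grew a dedicated helper for this in the codemod layer; a sketch using AddImportsVisitor (the imported names are made up):

from libcst.codemod import CodemodContext
from libcst.codemod.visitors import AddImportsVisitor

context = CodemodContext()
# Schedule `from collections import OrderedDict` to be added if it's missing;
# the import is inserted when the codemod transforms run over the module.
AddImportsVisitor.add_needed_import(context, "collections", "OrderedDict")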

Improve regularity of Subscript slice

As in several places in Python's syntax, commas in subscripts are a bit overloaded. Do they simply form a tuple, or are they a separator in a distinct special form, as they are in e.g. function calls?

At runtime, Python tries really hard to pretend that they simply form a tuple: foo[1, 2, 3] and foo[(1, 2, 3)] are indistinguishable (in both cases the __getitem__ of foo's class receives the tuple (1, 2, 3)); even foo[1, 2, 3:4] just sends the tuple (1, 2, slice(3, 4, None)) to __getitem__.

But there's a catch: the literal syntax for slices (e.g. 3:4) is not valid anywhere except directly inside a subscript. It is not valid in a tuple. So while foo[1, 2, 3:4] is valid syntax, foo[(1, 2, 3:4)] is not, giving the lie to the idea that these forms should be fully equivalent. (What that syntax means is a different question entirely; it's not used in the standard library, but it is used for e.g. slicing multi-dimensional numpy arrays.)

This leaves some gray area in terms of how a syntax tree should represent these forms. It seems that foo[1, 2, 3] should be a simple tuple index. And in fact, Python's AST does represent it that way! But foo[1, 2, 3:4] cannot be represented that way without weakening the tuple node to allow it to contain a literal slice, which in general it cannot. So the AST invents a special ExtSlice node which is effectively much like a tuple that can contain slices, and that occurs in the AST only when a subscript contains multiple comma-separated values, at least one of which is a slice.

LibCST handles it similarly, but slightly differently. Today in LibCST the slice attribute of a Subscript node can be one of three types:

  1. An Index node containing any arbitrary expression (e.g. the 1 in foo[1]). This is like the AST.
  2. A Slice node representing a slice such as 1: or 2:4 or 2:8:2. This is also like the AST.
  3. A Sequence of ExtSlice nodes, each of which has an optional trailing comma and which itself has a value that is either an Index or a Slice. This is used for cases like foo[1, 2, 3:4], but also for simple foo[1, 2, 3]. Unlike the AST, LibCST uses ExtSlice anytime the subscript has commas at the top level, not only when one of the elements is a slice.

There's no perfect answer here; either the representation of foo[1, 2, 3] will be awkwardly un-parallel to foo[(1, 2, 3)] (the LibCST choice) or awkwardly un-parallel to foo[1, 2, 3:4] (the AST choice).

In practice, we've found that the irregularity of LibCST's current representation is painful to work with, particularly with subscripts in PEP 484 generic types, because Foo[Bar] and Foo[Bar, Baz] have totally different LibCST representations (the former subscript is an Index containing a Name, the latter is a length-2 sequence of ExtSlice each containing a Name). This runs counter to LibCST's core value of regularity.

One possible fix would be to move in the direction of the AST, and use an Index containing a Tuple whenever possible, falling back to ExtSlice only if one of the elements is a slice. This improves regularity (a bit: you still have to handle both Name and Tuple in the above case) as long as you are not using slices in a multi-element subscript, but if you ever do, things get awkwardly irregular again.

So after much discussion with @DragonMinded, we feel that the best option here is to move in the other direction, and regularize a Subscript to always contain a Sequence of ExtSlice, each of which can contain either an Index or a Slice. This adds an additional layer in the simple cases of foo[1] and foo[2:3], but it means that traversing a Subscript is always regular.

Ideally I might suggest that ExtSlice should also be renamed to something like SubscriptElement, since the LibCST ExtSlice bears very little resemblance to the AST one (the AST one is a singular container for a list of children, not a single element in the list), and on its own the name ExtSlice doesn't communicate clearly (already today, and especially in the new proposal, it will often exist in the absence of any slice at all). This rename could be done backwards-compatibly with a deprecation period if we provide an import shim for the old name.
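
Under the proposal, traversal would then look the same no matter how many elements the subscript has; a sketch using the proposed SubscriptElement name (attribute names assumed to match today's ExtSlice):

import libcst as cst

node = cst.parse_expression("foo[1, 2:3]")
assert isinstance(node, cst.Subscript)
for element in node.slice:  # always a sequence under the proposal
    inner = element.slice   # each element wraps either an Index or a Slice
    if isinstance(inner, cst.Index):
        print("index:", inner.value)
    elif isinstance(inner, cst.Slice):
        print("slice:", inner.lower, inner.upper)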

Cannot deep_replace multiple nodes

One consequence of the issue mentioned here, where CSTTransformer always replaces nodes, is that methods that rely on node identity become harder to use in bulk.

For example, I'm trying to implement a rename(mod, src, dst) function that takes all instances of a src variable and renames them to dst, using the ScopeProvider. However, this code does not work:

for access in scope.accesses[src]:
  mod = mod.deep_replace(access.node, cst.Name(dst))

This is because after the first deep_replace, all nodes have been replaced, and the original scope is now invalidated. To fix this, you either have to rebuild the scopes after every deep_replace (inefficient), or batch-replace all nodes at the same time.

Perhaps y'all can consider adding a deep_replace_many? That is, enhance the _ChildReplacementTransformer to take multiple nodes. For example, I'm using the following class:

import libcst as cst
from typing import Dict

class ReplaceNodes(cst.CSTTransformer):
    def __init__(self, replacements: Dict[cst.CSTNode, cst.CSTNode]) -> None:
        self.replacements = replacements

    def on_leave(self, original_node, updated_node):
        # Fall back to the (possibly updated) node when no replacement is registered.
        return self.replacements.get(original_node, updated_node)
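
With that class, the rename example becomes a single batched pass, so the stale-identity problem goes away:

# Build all replacements against the original tree, then apply them at once.
replacements = {access.node: cst.Name(dst) for access in scope.accesses[src]}
mod = mod.visit(ReplaceNodes(replacements))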

Inserting statements into blocks

I need to refactor code in a way that inserts statements into blocks. For example, let's say I want to extract a constant as a variable. That is, go from:

y = 1 + 2

Into:

x1 = 1
x2 = 2
y = x1 + x2

Using the standard AST library, I can express this as follows using NodeTransformers:

import ast
import astor

class FindNumbers(ast.NodeTransformer):
    def __init__(self):
        self.assgns = []
        self.i = 0
        
    def fresh(self):
        self.i += 1
        return f'x{self.i}'
        
    def visit_Num(self, node):
        var = self.fresh()
        self.assgns.append(ast.Assign(targets=[ast.Name(id=var)], value=node))
        return ast.Name(id=var)

class ExtractNumbers(ast.NodeTransformer):
    def visit_Assign(self, assgn):
        finder = FindNumbers()
        finder.visit(assgn)
        return finder.assgns + [assgn]
    
stmt = ast.parse('y = 1 + 2')
print(astor.to_source(ExtractNumbers().visit(stmt)))

One key element of this is that NodeTransformers can return a list of statements, which enables the code above to insert the additional assignments during visit_Assign. The relevant code in CPython is here: https://github.com/python/cpython/blob/3.8/Lib/ast.py#L439-L440

As far as I can tell, this is not possible in LibCST. The transformer leave methods must return a CSTNode, and there are no nodes which can represent a block of statements that aren't indented.

Is there another way to accomplish this task in LibCST? If not, would y'all be open to a PR to add this feature?
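
For the record, newer LibCST releases added a FlattenSentinel return type for exactly this case; a sketch (the inserted statement is illustrative only):

import libcst as cst

class InsertBefore(cst.CSTTransformer):
    def leave_SimpleStatementLine(
        self,
        original_node: cst.SimpleStatementLine,
        updated_node: cst.SimpleStatementLine,
    ):
        # Returning a FlattenSentinel splices several statements into the
        # parent block in place of the original one.
        extra = cst.parse_statement("x1 = 1")
        return cst.FlattenSentinel([extra, updated_node])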

Parameters for metadata providers

In my application, I have information about whether particular line numbers were executed. I would like to use a metadata provider to turn this information into AST node-level annotations. However, this requires my IsNodeExecutedProvider to be parameterized by information external to the AST. As far as I can tell, neither the wrapper.resolve nor METADATA_DEPENDENCIES APIs allow metadata providers to take parameters, e.g. by overriding __init__.

Is there a way to do this that I have not found, or is this otherwise on the roadmap?
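
For concreteness, this is the kind of API the question is after (hypothetical; providers cannot currently take constructor parameters):

from typing import Set

import libcst as cst
from libcst.metadata import BatchableMetadataProvider

class IsNodeExecutedProvider(BatchableMetadataProvider[bool]):
    # Hypothetical __init__; today the framework instantiates providers itself.
    def __init__(self, executed_lines: Set[int]) -> None:
        super().__init__()
        self.executed_lines = executed_lines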

`RemoveImportsVisitor` incorrectly removes imports in `try/except` and `if` blocks

Consider the following input:

try:
  import a
except Exception:
  import a

a.hello()

Calling libcst.codemod.visitors.RemoveImportsVisitor.remove_unused_import for the imports in the above file will incorrectly remove them, even though they're referenced later on.

Here's a small test harness to demonstrate:

from typing import Tuple

import click
import libcst as cst
from libcst import codemod
from libcst.codemod.visitors import RemoveImportsVisitor


class RemoveUnusedImportsVisitor(codemod.VisitorBasedCodemodCommand):
    def visit_Import(self, node: cst.Import) -> bool:
        RemoveImportsVisitor.remove_unused_import_by_node(self.context, node)
        return False

    def visit_ImportFrom(self, node: cst.ImportFrom) -> bool:
        RemoveImportsVisitor.remove_unused_import_by_node(self.context, node)
        return False


@click.command()
@click.argument("src", nargs=-1, type=click.Path(writable=True))
def main(src: Tuple[str]) -> None:
    files = codemod.gather_files(src)
    context = codemod.CodemodContext()
    transform = RemoveUnusedImportsVisitor(context)
    codemod.parallel_exec_transform_with_prettyprint(transform, files)


if __name__ == "__main__":
    main()

[feature][metadata] add SuperClassProvider

In some lint or codemod use cases, we want to know the superclasses of a class, to see whether it inherits from a specific class, in order to enforce conventions or codemods on all subclasses.
The superclass information requires full-repository analysis due to multiple inheritance. We can leverage Pyre for that, similar to the existing TypeInferenceProvider.
https://libcst.readthedocs.io/en/latest/metadata.html#type-inference-metadata

Prerequisite: pyre query needs to support a superclasses query given a list of paths.
The current pyre query "superclasses(...)" only supports a fully qualified class name, which requires the caller to pass the name. To make this more efficient, we want to add path support to pyre query, so FullRepoManager can pass a list of paths to pyre query and read all classes in those paths along with their superclasses. To map the superclasses data onto the LibCST syntax tree, we also need the location info.
The pyre query interface for TypeInferenceProvider is: pyre query "types(path=path1, path=path2, ...)" and the output format is

{
  "types": [
    {
      "location": {
        "start": {"line": 8, "column": 19},
        "stop": {"line": 8, "column": 27}
      },
      "annotation": "typing.Type[typing.Sequence]"
    },
    ...
  ]
}

We want to have a similar interface for SuperClassProvider: pyre query "superclasses(path=path1, path=path2, ...)" and the output format is

List[
  {
    class_name: <fully qualified string>,
    location: <location fields>,
    superclasses: <list of fully qualified strings>
  }
]

CC @shannonzhu

Implementing SuperClassProvider
We want to implement a SuperClassProvider which returns a list of superclasses on ClassDef node or ClassDef.name node.
TypeInferenceProvider can be used as an example.

class TypeInferenceProvider(BatchableMetadataProvider[str]):

We also need some mocked integration tests by extending this script:
https://github.com/Instagram/LibCST/blob/2fb0db33d1b393228a6b45f7749e6df659f186b2/libcst/tests/test_pyre_integration.py

The script generates Pyre JSON output by running pyre query and stores it as test artifacts in this dir:
https://github.com/Instagram/LibCST/tree/2fb0db33d1b393228a6b45f7749e6df659f186b2/libcst/tests/pyre
The test cases in the file check whether each node has the expected output.
We also run the artifact generation in CI to make sure we address any needed changes when we upgrade Pyre or update the test examples. https://github.com/Instagram/LibCST/blob/master/.circleci/config.yml#L78

For unit tests, we can reuse the generated artifacts and verify the correctness of the superclasses, like:

def _test_simple_class_helper(test: UnitTest, wrapper: MetadataWrapper) -> None:

[docs] Metadata on CSTVisitor

  1. We should make it more obvious in the examples how to use metadata with CSTVisitor; if you are making incremental improvements and already have a visitor, the docs don't make it super obvious that

module.visit(SomeClass())

becomes

wrapper = cst.MetadataWrapper(module)
wrapper.visit(SomeClass())

  2. If you forget that, get_metadata() just raises a KeyError. We can probably improve that and give a more descriptive error when there is no metadata.
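
A complete minimal example of the working pattern (PositionProvider is just one example dependency, using current provider names):

import libcst as cst
from libcst.metadata import MetadataWrapper, PositionProvider

class NamePrinter(cst.CSTVisitor):
    METADATA_DEPENDENCIES = (PositionProvider,)

    def visit_Name(self, node: cst.Name) -> None:
        pos = self.get_metadata(PositionProvider, node).start
        print(f"{node.value} at line {pos.line}")

wrapper = MetadataWrapper(cst.parse_module("x = 1"))
wrapper.visit(NamePrinter())  # note: wrapper.visit, not module.visit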

Running scope analysis example raises exception

Running this example https://libcst.readthedocs.io/en/latest/scope_tutorial.html
raises the following error when I run it locally:

Traceback (most recent call last):
  File "~/bla.py", line 24, in <module>
    wrapper = cst.metadata.MetadataWrapper(cst.parse_module(source))
AttributeError: module 'libcst.metadata' has no attribute 'MetadataWrapper'

but it works in the Binder notebook. There are also differences in the module structure.

Is the code on master turned into a package and pushed to pypi on every successful merge?

Handle deprecation in a standard and consistent way?

We have a couple things deprecated and pending removal in the future.

Can we have a standard and consistent methodology to handle deprecation and removal?

What we have now:

  • BaseAssignment.accesses calls warnings.warn(..., DeprecationWarning) when it is called.
  • BatchableMetadataProvider, MetadataWrapper, and other metadata-related classes were moved to libcst.metadata but are still available in libcst for backward compatibility.
  • ExtSlice is deprecated and we have inline comments in the code.
  • SyntacticPositionProvider and BasicPositionProvider are about to be deprecated in #114
  • TODO: move BaseMetadataProvider to libcst.metadata

The warnings.warn(..., DeprecationWarning) call seems to be a widely adopted way to warn about deprecation. Can we try to use it in all cases? For class deprecation, we don't want to just subclass, since that may require lots of other changes to keep existing call sites and type checking working. Can we somehow have a helper that wraps all methods of a class to call warnings.warn(..., DeprecationWarning)?

There are more things we can do, e.g. add the deprecation warning to the readthedocs docs so readers see it, or explicitly declare the target version in which deprecated things will be removed.
There are deprecation packages that provide a @deprecated decorator to make all of this easier.

Some projects put all deprecated things together in a doc for tracking, e.g.
https://django.readthedocs.io/en/1.5.x/internals/deprecation.html
numpy/numpy#11521
This is more of a nice-to-have to me.

Inconsistency and confusion in `FunctionDef.params` and `Call.args`, can we consolidate it?

This is feedback I got during PyCon Taiwan, and it makes sense to me.

Here are differences:
FunctionDef.params is a Parameters node. Parameters has different attributes to categorize the different types of Param: params, default_params, star_arg, kwonly_params, star_kwarg, which is a good design. (One thing that could be improved here is renaming params to positional_params or position_params, which I discussed with @bgw recently.)

Call.args is a list of Arg; each Arg has attributes like value, keyword (for keyword args), and star (for asterisk args). In this pattern, users need to check each Arg's keyword attribute to figure out which args are keyword args. Why not follow the pattern used in Parameters and categorize them as positional_args, kw_args, star_kwarg? That would improve API consistency and usability. Is there any particular reason we don't want to do it?
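
To illustrate the current state, classifying a call's args today means inspecting each Arg by hand:

import libcst as cst

call = cst.parse_expression("fn(1, b=2, *rest, **extra)")
assert isinstance(call, cst.Call)
# Each Arg must be classified via its `keyword` and `star` fields.
positional = [a for a in call.args if a.keyword is None and a.star == ""]
keywords = [a for a in call.args if a.keyword is not None]
starred = [a for a in call.args if a.star in ("*", "**")]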

Expose add_slots as public API

Currently we import add_slots like this: from libcst._add_slots import add_slots.
We'll need add_slots when building other frameworks.
Any suggestion on the namespace we should use? E.g. libcst.common, or directly under libcst like the other classes?

LibCST doesn't generate a `tree`; it's a graph in which some nodes can be children of multiple parent nodes

Surprisingly, LibCST doesn't generate a tree; it's a graph in which some nodes can be children of multiple parent nodes.

I found this issue while working on ParentNodeProvider (#71).

Here is a simple example test case to reproduce the issue.

def test_no_duplicate_node(self):
    module = cst.parse_module(
        dedent(
            """
            foo = 'toplevel'
            fn1(foo)
            fn2(foo)
            def fn_def():
                foo = 'shadow'
                fn3(foo)
            """
        )
    )

    class CountVisitor(cst.CSTVisitor):
        def __init__(self) -> None:
            self.count = Counter()
            self.nodes = {}

        def on_visit(self, node: cst.CSTNode) -> bool:
            self.count[id(node)] += 1
            self.nodes[id(node)] = node
            return True

    visitor = CountVisitor()
    module.visit(visitor)
    for _id, count in visitor.count.items():
        if count != 1:
            print(count, visitor.nodes[_id])

The test uses a simple visitor to traverse the tree and count the number of visits per node id. One node was reused three times, as the child of three different parent nodes.

SimpleWhitespace(
    value='',
)

Test result: https://circleci.com/gh/Instagram/LibCST/2283?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

Example source code to be parsed:

        foo = 'toplevel'
        fn1(foo)
        fn2(foo)
        def fn_def():
            foo = 'shadow'
            fn3(foo)

Due to this issue, I wasn't able to implement the desired ParentNodeProvider, because a child node can have multiple parents while we only expect one. We need this to be fixed to unblock ParentNodeProvider. CC @DragonMinded @bgw

[discussion] LibCST based lint framework provides auto fixes

A LibCST-based lint framework can provide the following benefits:

  • auto fixes.
  • rich information from Metadata API for advanced lint rules.
  • flexible patterns to traverse syntax tree: visitor or matcher.

Autofix:

  • suggest code changes for developers to accept in one click.
  • suggested changes could be wrong, and developers can silence them with a special comment.

Examples:

More context: "Lint Fatigue" section in https://instagram-engineering.com/static-analysis-at-scale-an-instagram-story-8f498ab71a0c

Add attribute visitors

We've got visit_If, but we should also have visitors for each node's attribute, like visit_If_test for people that only want to track specific parts of a node.
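
A sketch of what the proposed API might look like (the signature is assumed to mirror the existing node visitors):

import libcst as cst

class IfTestCollector(cst.CSTVisitor):
    def __init__(self) -> None:
        self.tests = []

    # Called when traversal reaches the `test` attribute of an If node,
    # without having to handle the whole If in on_visit.
    def visit_If_test(self, node: cst.If) -> None:
        self.tests.append(node.test)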

It sounds like @DragonMinded wants to take this since it's related to #23.

Can you ignore ParserSyntaxError (for reserved async keyword in 3.6)?

I'd like to use this on a project that is Python 3.6 syntax but not fully 3.7 yet (as far as I know the only violation is that there are still some async keyword args, and async became reserved in 3.7).

Here's the exception I'm getting when visiting a module:

libcst._exceptions.ParserSyntaxError: Syntax Error: incomplete input @ 200:76.
Encountered 'async', but expected one of ["')'"].

        results = client.query_performers(query, "api_ingestion_consumer", async=False)

Would it be possible to suppress this error until the code is refactored to be fully 3.7 compliant? I realize "LibCST parses Python 3.7" is specifically noted in the README, so understood if that's not possible.

[low-pri][doc] Add document to explain the design differences between libcst and ast

ast and libcst look similar but differ in many design details. To make this clearer and prevent confusion for users familiar with ast (I'm one of them), it's worth documenting the differences in a few paragraphs.

Item with difference | LibCST | ast
string value (e.g. asname in Import/ImportFrom) | Name | str
value of a Name | value | id
FunctionDef | FunctionDef | FunctionDef and AsyncFunctionDef
decorators in ClassDef | decorators | decorator_list
accessing a statement inside a for/while loop | For.body.body[0].body[0].value (For.body is an IndentedBlock, For.body.body[0] is a SimpleStatementLine, For.body.body[0].body[0] is an Expr) | For.body[0].value (body[0] is an Expr)
Call.args | Sequence[Arg]; Arg.value stores the value | Sequence[value]
keyword arg in Call | an Arg in Call.args with Arg.keyword is not None | keyword.arg for each keyword in Call.keywords
ImportFrom.module | Name or Attribute | str or Attribute

Accessing node parent?

Hi,

I'm looking for a way to access a node's parent, but unless I overlooked something there doesn't seem to be a way to do that. I see a libcst.CSTNode.children property but nothing for the parent.

I have a setup.py file that I would like to modify from:

setup_kwargs = {
    'name': 'colour-science',
    'version': '0.3.14',
    'description': 'Colour Science for Python',
    'long_description': '',
    'author': 'Colour Developers',
    'author_email': None,
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://www.colour-science.org/',
    'package_dir': package_dir,
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'extras_require': extras_require,
    'python_requires': '>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*',
}


setup(**setup_kwargs)

to

setup(
    name= 'colour-science',
    version= '0.3.14',
    description= 'Colour Science for Python',
    long_description= '',
    author= 'Colour Developers',
    author_email= None,
    maintainer= None,
    maintainer_email= None,
    url= 'https://www.colour-science.org/',
    package_dir= package_dir,
    packages= packages,
    package_data= package_data,
    install_requires= install_requires,
    extras_require= extras_require,
    python_requires= '>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*',
)

So I was looking for all the Dict nodes, with the hope of being able to find the name they are assigned to by going up the tree (and maybe back down). With ast it is possible to walk the tree directly and set an attribute as follows:

for node in ast.walk(root):
    for child in ast.iter_child_nodes(node):
        child.parent = node
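
If it helps, a sketch of the metadata-based equivalent that LibCST later gained (ParentNodeProvider):

import libcst as cst
from libcst.metadata import MetadataWrapper, ParentNodeProvider

class DictFinder(cst.CSTVisitor):
    METADATA_DEPENDENCIES = (ParentNodeProvider,)

    def visit_Dict(self, node: cst.Dict) -> None:
        # Walk upward until we find the Assign this dict belongs to.
        parent = self.get_metadata(ParentNodeProvider, node)
        while not isinstance(parent, (cst.Assign, cst.Module)):
            parent = self.get_metadata(ParentNodeProvider, parent)
        if isinstance(parent, cst.Assign):
            print(parent.targets[0].target)

wrapper = MetadataWrapper(cst.parse_module("setup_kwargs = {'name': 'x'}"))
wrapper.visit(DictFinder())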

[question] Extract body of the root functions from module

Hello,

I recently discovered this project while looking for a solution to my problem.
I have a single Python file which contains one or more functions. I'd like to extract just the bodies of those functions, to split them into separate files later.

  1. Could you please review my attempt and suggest any improvements?
  2. Also, I don't know how to separate nested functions in the visitor: I tried to find some property for indentation, but couldn't locate it. I'd be glad if you could help me with that.

https://repl.it/@Grigory1/splitter

Thank you.

3.8 Support

I have a test runner, ptr, that uses Pyre as an option for type-checking. I put up a diff to move CI to start using official 3.8 for test running on Travis, and this results in the expected ValueError:

raise ValueError(
ValueError: LibCST can only parse code using one of the following versions of Python's grammar: 3.5, 3.6, 3.7. More versions may be supported by future releases.

Is there an expected timeframe for 3.8 support? I will have the tool not run Pyre on >=3.8 for now. What are your thoughts on adding a section to the README explaining the 3.8 support plans, now that 3.8 is an official version?

mypy CSTTypedTransformerFunctions issue

context

ran into a peculiar bug where the CSTTypedTransformerFunctions types seem to not work with mypy. leave_{Node} methods fail the mypy type check with Signature of "leave_{Node}" is incompatible with supertype "CSTTypedTransformerFunctions".

the definition that mypy says is incompatible doesn't look incompatible to me, though:

@mark_no_op
def leave_Module(self, original_node: "Module", updated_node: "Module") -> "Module":
    return updated_node

i'm not sure yet if this is an issue with mypy or libcst, but wanted to share here for replication. (mypy's error messages aren't super descriptive, so i'm still trying to figure out where mypy is getting the incompatible definition from.)

replicate (using: python 3.7.5)

./requirements.txt

libcst==0.2.4
mypy==0.750

./transformer.py

import libcst

class Transformer(libcst.CSTTransformer):

    def leave_Module(self, original_node: libcst.Module, updated_node: libcst.Module) -> libcst.Module:
        return updated_node

    def leave_Call(self, original_node: libcst.Call, updated_node: libcst.Call) -> libcst.Call:
        return updated_node

mypy errors

$ mypy transformer.py 
transformer.py:5: error: Signature of "leave_Module" incompatible with supertype "CSTTypedTransformerFunctions"
transformer.py:8: error: Signature of "leave_Call" incompatible with supertype "CSTTypedTransformerFunctions"
Found 2 errors in 1 file (checked 1 source file)

Compatibility with older python syntax

Because I've noticed a couple of references in other projects' bugs: I'm already working on this and don't anticipate that it will be super difficult. I will be adding support back to 2.7 (but probably stopping there). I have future-import parsing working as well, which is necessary for print statements and barry_as_FLUFL handling.

https://github.com/thatch/python-grammar-changes has a hand-compiled list of what changed, but I hope to create snippets and mechanically verify what versions they work on as well.

3.10+ support

PEP 617 is out (and if it's not yet accepted, it probably will be), and I'm wondering whether LibCST's underlying parser is capable of parsing a PEG grammar. With 3.10, the LL(1) restriction on the grammar will be dropped, which means that lib2to3.pgen2 won't be able to parse new changes to the Python grammar. I'm not sure about the internals of LibCST, but from what I've seen in the readme it uses something based on lib2to3.pgen2. Will LibCST continue to support newer Python versions and their grammar? (We currently use lib2to3 as our refactoring tool in unimport, but we might need to migrate to another tool to support 3.10+, which is why I'm asking.)

Add a `.code` alternative for all CSTNode subclasses

Related to the discussion here: #15 (review)

.code exists on Module, but not on other CSTNode subclasses.

We can only generate code in the context of a module because the module contains information about the default indentation level, newline, etc.

However, it'd be useful to have a .code property on every node for debugging purposes. I just don't know how we can communicate these pitfalls to the developer.

I'm thinking we could define an API like this:

def code_with_context(self, default_newline: str = "\n", default_indent: str = "    "):
    ...

What do people think? Is it obvious enough when reading node.code_with_context() that the generated code might not exactly match the code in the original module? Should we use a different name? Is this a bad idea? Should we stick with Module.code_for_node?
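
In the meantime, the usual workaround is to render through a default-configured Module; a small sketch:

import libcst as cst

node = cst.parse_expression("a + b")
# Module(body=[]) supplies the default newline/indent config for rendering.
print(cst.Module(body=[]).code_for_node(node))  # prints: a + b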

Missing dependency in wheel uploaded to pypi (v20.4.1)

 ❯❯❯ pip install monkeytype
Collecting monkeytype
  Downloading MonkeyType-20.4.1-py3-none-any.whl (41 kB)
     |████████████████████████████████| 41 kB 18 kB/s 
Collecting mypy-extensions
  Downloading mypy_extensions-0.4.3-py2.py3-none-any.whl (4.5 kB)
Installing collected packages: mypy-extensions, monkeytype
Successfully installed monkeytype-20.4.1 mypy-extensions-0.4.3
 ❯❯❯ monkeytype --help
Traceback (most recent call last):
  File "/tmp/nwani_1585950476/dev/bin/monkeytype", line 5, in <module>
    from monkeytype.cli import entry_point_main
  File "/tmp/nwani_1585950476/dev/lib/python3.8/site-packages/monkeytype/cli.py", line 16, in <module>
    from libcst import parse_module
ModuleNotFoundError: No module named 'libcst'
 ❯❯❯                          
