instagram / libcst

A concrete syntax tree parser and serializer library for Python that preserves many aspects of Python's abstract syntax tree

Home Page: https://libcst.readthedocs.io/

License: Other

Python 86.03% Rust 13.96% Shell 0.01%

libcst's People

Contributors

akx, amyreese, bgw, cdonovick, dependabot[bot], dragonminded, giomeg, hauntsaninja, isidentical, jakkdl, jimmylai, josieesh, kelsolaar, kit1980, kronuz, lpetre, lrjball, luciawlli, mapleccc, martindemello, orf, shannonzhu, sk-, stroxler, thatch, venkatsubramaniam, zac-hd, zhammer, zsol, zzl0


libcst's Issues

Bowler integration

Bowler provides a nice API and CLI for writing codemods. However, Bowler uses the lib2to3 CST format, which you claim is complex and "makes it hard to extract the semantics we care about."
Still, Bowler is useful in that it provides a CLI interface, which makes it very easy to use in CI/CD environments.
Is there any plan to integrate libCST with Bowler? If so, when?

Immutable module attribute on libcst visitor / best way to use code_for_node

A common flow for me when debugging transformers is adding a new attribute to a visitor/transformer to capture the visited module as self.module. i'd do this in the visit_Module handler so that I can later do print(self.module.code_for_node(current_node)).
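
A minimal sketch of that flow (the leave_Call handler is purely illustrative):

import libcst as cst

class DebugTransformer(cst.CSTTransformer):
    def visit_Module(self, node: cst.Module) -> bool:
        # Capture the module so nodes can be rendered later via code_for_node.
        self.module = node
        return True

    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.Call:
        # Render the original node; it's the one attached to self.module.
        print(self.module.code_for_node(original_node))
        return updated_node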

is this in sync with how code_for_node is being used internally at instagram? this gets a bit confusing if the original module is mutated during tree traversal, so that code_for_node on an updated_node may not be totally accurate. (though this is where my full understanding of the visitor pattern gets fuzzier.)

just curious how y'all use code_for_node in debugging workflows.

[WIP] API for fully qualified name

Probably a FullQualifiedNameProvider that uses FullRepoManager.
Questions: how can we make sure FullRepoManager is easy to use? If people run TypeInferenceProvider without FullRepoManager, will it raise an exception?
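
For reference, a sketch of the FullRepoManager flow as documented (the path is hypothetical):

from libcst.metadata import FullRepoManager, TypeInferenceProvider

# FullRepoManager performs the full-repo analysis (e.g. pyre queries) up front,
# then hands out pre-populated metadata wrappers per file.
manager = FullRepoManager(".", {"pkg/mod.py"}, {TypeInferenceProvider})
wrapper = manager.get_metadata_wrapper_for_path("pkg/mod.py")
types = wrapper.resolve(TypeInferenceProvider)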

Add a `with_deep_changes` helper to make changing a deep Literal/Sequence attribute easier

We have useful helpers to generate an immutable updated node: .with_changes(**changes) and .deep_replace(old_node: CSTNode, new_node: CSTNode).

In many cases, users may want to update an attribute holding a non-CSTNode value (a str/int literal or a sequence), and if it's deep in the tree, we need something like this example:

# we want to replace "old_value" with "new_value"
node = cst.parse_expression('fn(a = "old_value", b = 2)')
updated_node = node.deep_replace(
    node.args[0].value, node.args[0].value.with_changes(value='"new_value"')
)
# note: we cannot pass node.args[0].value.value to deep_replace since
# it's a str instead of a CSTNode.

If we had a .with_deep_changes(old_node: CSTNode, **changes) that combines .with_changes() and .deep_replace(), the code could be really simple:

updated_node = node.with_deep_changes(node.args[0].value, value='"new_value"')

Lint Warning or Test for Missing Copyright Headers

When creating a new file, it's easy to forget to add the license preamble and copyright header. It would be nice if either a test or a lint rule caught this before landing. Ideally this would be a lint rule, but our lint framework that uses LibCST isn't open source yet, which makes it difficult to implement the rule there. This isn't high priority, but would be nice to add to our lint tox env once it is open sourced.
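
Until then, even a plain test without LibCST would catch most cases; a minimal sketch (the header text is an assumption):

from pathlib import Path

# Hypothetical expected preamble; adjust to the project's actual header.
HEADER = "# Copyright (c) Facebook, Inc. and its affiliates."

def test_copyright_headers() -> None:
    for path in Path("libcst").rglob("*.py"):
        assert path.read_text().startswith(HEADER), f"missing header: {path}"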

Add a py.typed file

Can you add a py.typed file (https://www.python.org/dev/peps/pep-0561/) for libcst? Without it I can't use the libcst types in my mypy typechecking when importing the package.

I understand if you're not ready to export the type definitions yet. Also, given I haven't used pyre before, maybe pyre doesn't require a py.typed file for exporting types?

Here's my error log from mypy:

Zachs-MBP:tornado-async-transformer zhammer$ mypy tornado_async_transformer
tornado_async_transformer/helpers.py:4: error: Cannot find module named 'libcst'
tornado_async_transformer/helpers.py:4: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
tornado_async_transformer/tornado_async_transformer.py:3: error: Cannot find module named 'libcst'
tornado_async_transformer/tool.py:8: error: Cannot find module named 'libcst'
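
For what it's worth, shipping the marker per PEP 561 is usually just a matter of including an empty py.typed file as package data; a sketch assuming a setuptools build:

from setuptools import find_packages, setup

setup(
    name="libcst",
    packages=find_packages(),
    # Ship the (empty) libcst/py.typed marker so type checkers pick up the
    # package's inline annotations (PEP 561).
    package_data={"libcst": ["py.typed"]},
    zip_safe=False,
)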

ExpressionContextProvider assigns wrong value to For.target

It looks like the ast module considers the target of a for loop to be a Store; however, the ExpressionContextProvider considers it a Load. I think there's value in having consistency between the two, and the target of a for loop does seem like it's actually a Store. Is this a bug, and should we fix it?

Repro steps:

In [8]: code = """
   ...: for _ in []:
   ...:     pass
   ...: """

In [9]: ast_module = ast.parse(code)

In [10]: ast_module.body[0].target.ctx
Out[10]: <_ast.Store at 0x7f79aa68bbe0>

In [11]: import libcst

In [12]: from libcst.metadata.expression_context_provider import (
    ...:     ExpressionContext,
    ...:     ExpressionContextProvider,
    ...: )

In [13]: from libcst.metadata import MetadataWrapper

In [14]: wrapper = libcst.MetadataWrapper(libcst.parse_module(code))

In [15]: metadata = wrapper.resolve(ExpressionContextProvider)

In [16]: wrapper.module.body[0].target
Out[16]:
Name(
    value='_',
    lpar=[],
    rpar=[],
)

In [17]: metadata[wrapper.module.body[0].target]
Out[17]: <ExpressionContext.LOAD: 1>

Add a utility to get docstring node for module/class/function

Something similar to ast.get_docstring() could be useful to find docstrings more easily.

It could be even better if this were available as Module.docstring, ClassDef.docstring, and FunctionDef.docstring, but that might not fit your API design that well, especially since a docstring isn't a special part of Python's grammar; it's just a regular string whose only special property is that it's the first statement in a module/class/function.

My own use case for such a utility would be modifying a class's docstring, or adding one if it's missing, in a visitor. I can obviously do this without such a utility, but I think it would make things easier.
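
A rough sketch of what a module-level helper could look like, mirroring ast.get_docstring (it only handles the plain SimpleString case):

from typing import Optional

import libcst as cst

def get_docstring(module: cst.Module) -> Optional[str]:
    # The docstring, if any, is the first statement: an Expr wrapping a string.
    if not module.body:
        return None
    first = module.body[0]
    if isinstance(first, cst.SimpleStatementLine) and first.body:
        expr = first.body[0]
        if isinstance(expr, cst.Expr) and isinstance(expr.value, cst.SimpleString):
            return expr.value.evaluated_value
    return None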

Validation too strict on "except"

Found this pattern in real-world code; it fails validation with "Internal error: Must have at least one space after except when ExceptHandler has a type."

try:
  ...
except(Exception):
  pass

and also:

try:
  ...
except(IOError, ImportError):
  pass

Make ParserSyntaxError more generic, use it in more places

ParserSyntaxError is a little too coupled to how pgen2 works (my fault), which prevents us from using it everywhere we should. Today it's possible to get a parser error that isn't a ParserSyntaxError; that shouldn't be possible.

I propose that we:

  • Remove the expected/encountered fields, and instead provide a helper function for creating an error message with those values. Not every syntax error follows this format.
  • Initialize line and column numbers to a sentinel. We can make the exposed line/column properties raise an exception if the sentinel was never replaced. While an exception may temporarily lack position information during construction, it should always have a line/column number from the end user's perspective.
  • Intercept exceptions caused by conversion functions in _base_parser, and attach line and column numbers before re-raising them.

[RFC] Analyze attribute access from imports to support dead import clean up better

Per @zsol 's request, propose this change. Feedback is welcome.

In this example, the access a.b is recorded as an access of both the a.b and a.c assignments, so the RemoveUnusedImportsCommand cleans up nothing.

import a.b
import a.c

a.b

This is because we only record the top-level name of assignments and accesses.
https://github.com/Instagram/LibCST/blob/master/libcst/metadata/scope_provider.py#L310-L311

For example, when we see an access a.b, a could be an import or a local object.
To better support dead import cleanup, we can record the best assignment for an access when multiple imports are available in infer_access.

https://github.com/Instagram/LibCST/blob/master/libcst/metadata/scope_provider.py#L786-L800

That means that, given an access, we can check all import assignments sharing the same top-level name (a.b and a.c) and only record the access against the assignment whose fully qualified name it starts with.
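
A sketch of that matching rule in plain Python (names are illustrative):

from typing import List, Optional

def best_assignment(access_name: str, import_names: List[str]) -> Optional[str]:
    # Among imports sharing the top-level name, pick the one whose dotted name
    # is a prefix of the access (longest match wins): for access "a.b",
    # import "a.b" matches but "a.c" does not.
    candidates = [
        name
        for name in import_names
        if access_name == name or access_name.startswith(name + ".")
    ]
    return max(candidates, key=len, default=None)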

Specify minimum versions for each dependency

I've been using libCST in Hypothesmith to generate source code, which has been lots of fun. (let me know if you're interested in bug reports, or how stable tests need to be before you'd want a PR)

It turns out that one of my environments had parso==0.2 already installed, and while pip install libcst worked fine, parso 0.2 is not actually compatible. Can I suggest specifying a minimum version for all your dependencies, and adding a test env that checks they're all still supported?

ImportFrom nodes do not reflect relative imports in module value

It seems impossible currently to tell whether an ImportFrom statement is importing relative to the package. The leading dots are dropped when the ImportFrom node's module Name is populated.

To reproduce:

Create a package:

mkdir import_issue
touch import_issue/__init__.py import_issue/cli.py import_issue/greeting.py

Make greeting.py look like this:

greeting = 'hello world'

Make cli.py look like this:

from .greeting import greeting
print(greeting)

Test that you get the right greeting:

python -m import_issue.cli

Generate CST for import_issue/cli.py:

python -m libcst.tool print import_issue/cli.py

The output looks like this:

Module(
  body=[
    SimpleStatementLine(
      body=[
        ImportFrom(
          module=Name(
            value='greeting',
          ),
          names=[
            ImportAlias(
              name=Name(
                value='greeting',
              ),
            ),
          ],
        ),
      ],
    ),
    SimpleStatementLine(
      body=[
        Expr(
          value=Call(
            func=Name(
              value='print',
            ),
            args=[
              Arg(
                value=Name(
                  value='greeting',
                ),
              ),
            ],
          ),
        ),
      ],
    ),
  ],
)

If you modify import_issue/cli.py to look like this (a non-relative import), you get the exact same CST:

from greeting import greeting
print(greeting)

feature question: infer a function def for a call

I adapted a pylint plugin today and found it has a helpful util that tries to infer the function def for a call node while traversing the AST.

here's how i use safe_infer:

def visit_call(self, node):
    if not self.linter.is_message_enabled(self.MESSAGE_ID):
        return

    func_def = utils.safe_infer(node.func)
    # i then check if the called function has a specific coroutine
    ...

here's the definition: https://github.com/PyCQA/pylint/blob/47fdef57e6c453ddbd65d4960db20c0e580ee041/pylint/checkers/utils.py#L1077-L1097

is there anything like this in use in libcst tools at instagram? it gives a lot more power to linters/codemods imo. i imagine this'd look something like:

  1. do an initial pass to collect metadata on function (as well as other) definitions, keeping their file/module path
  2. use that metadata in a visitor to help infer the definition of some function/class/etc.

Recipe for adding an import

Curious if you have a preferred way of adding an import to a module. This was a quick solution I put together, but it wouldn't work in a lot of scenarios and is messy:

    @staticmethod
    def _with_added_import(module_node: cst.Module, import_node: cst.Import) -> cst.Module:
        """
        Adds the new import `import_node` after the first import in the module `module_node`.
        """
        updated_body: List[Union[cst.SimpleStatementLine, cst.BaseCompoundStatement]] = []
        added_import = False
        for line in module_node.body:
            updated_body.append(line)
            if not added_import and IterItemsTransformer._is_import_line(line):
                updated_body.append(cst.SimpleStatementLine(body=tuple([import_node])))
                added_import = True

        return module_node.with_changes(body=tuple(updated_body))

    @staticmethod
    def _is_import_line(line: Union[cst.SimpleStatementLine, cst.BaseCompoundStatement]) -> bool:
        return (
            isinstance(line, cst.SimpleStatementLine) and
            len(line.body) == 1 and
            isinstance(line.body[0], (cst.Import, cst.ImportFrom))
        )
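
As an aside, later LibCST releases grew a dedicated helper for this in the codemod layer; a sketch using AddImportsVisitor (the imported names are made up):

from libcst.codemod import CodemodContext
from libcst.codemod.visitors import AddImportsVisitor

context = CodemodContext()
# Schedule `from collections import OrderedDict` to be added if it's missing;
# the import is inserted when the codemod transforms run over the module.
AddImportsVisitor.add_needed_import(context, "collections", "OrderedDict")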

Improve regularity of Subscript slice

As in several places in Python's syntax, commas in subscripts are a bit overloaded. Do they simply form a tuple, or are they a separator in a distinct special form, as they are in e.g. function calls?

At runtime, Python tries really hard to pretend that they simply form a tuple: foo[1, 2, 3] and foo[(1, 2, 3)] are indistinguishable (in both cases the __getitem__ of foo's class receives the tuple (1, 2, 3)); even foo[1, 2, 3:4] just sends the tuple (1, 2, slice(3, 4, None)) to __getitem__.

But there's a catch: the literal syntax for slices (e.g. 3:4) is not valid anywhere except directly inside a subscript. It is not valid in a tuple. So while foo[1, 2, 3:4] is valid syntax, foo[(1, 2, 3:4)] is not, giving the lie to the idea that these forms should be fully equivalent. (What that syntax means is a different question entirely; it's not used in the standard library, but it is used for e.g. slicing multi-dimensional numpy arrays.)

This leaves some gray area in terms of how a syntax tree should represent these forms. It seems that foo[1, 2, 3] should be a simple tuple index. And in fact, Python's AST does represent it that way! But foo[1, 2, 3:4] cannot be represented that way without weakening the tuple node to allow it to contain a literal slice, which in general it cannot. So the AST invents a special ExtSlice node which is effectively much like a tuple that can contain slices, and that occurs in the AST only when a subscript contains multiple comma-separated values, at least one of which is a slice.

LibCST handles it similarly, but slightly differently. Today in LibCST the slice attribute of a Subscript node can be one of three types:

  1. An Index node containing any arbitrary expression (e.g. the 1 in foo[1]). This is like the AST.
  2. A Slice node representing a slice such as 1: or 2:4 or 2:8:2. This is also like the AST.
  3. A Sequence of ExtSlice nodes, each of which has an optional trailing comma and which itself has a value that is either an Index or a Slice. This is used for cases like foo[1, 2, 3:4], but also for simple foo[1, 2, 3]. Unlike the AST, LibCST uses ExtSlice anytime the subscript has commas at the top level, not only when one of the elements is a slice.

There's no perfect answer here; either the representation of foo[1, 2, 3] will be awkwardly un-parallel to foo[(1, 2, 3)] (the LibCST choice) or awkwardly un-parallel to foo[1, 2, 3:4] (the AST choice).

In practice, we've found that the irregularity of LibCST's current representation is painful to work with, particularly with subscripts in PEP 484 generic types, because Foo[Bar] and Foo[Bar, Baz] have totally different LibCST representations (the former subscript is an Index containing a Name, the latter is a length-2 sequence of ExtSlice each containing a Name). This runs counter to LibCST's core value of regularity.

One possible fix would be to move in the direction of the AST, and use an Index containing a Tuple whenever possible, falling back to ExtSlice only if one of the elements is a slice. This improves regularity (a bit: you still have to handle both Name and Tuple in the above case) as long as you are not using slices in a multi-element subscript, but if you ever do, things get awkwardly irregular again.

So after much discussion with @DragonMinded, we feel that the best option here is to move in the other direction, and regularize a Subscript to always contain a Sequence of ExtSlice, each of which can contain either an Index or a Slice. This adds an additional layer in the simple cases of foo[1] and foo[2:3], but it means that traversing a Subscript is always regular.

Ideally I might suggest that ExtSlice should also be renamed to something like SubscriptElement, since the LibCST ExtSlice bears very little resemblance to the AST one (the AST one is a singular container for a list of children, not a single element in the list), and on its own the name ExtSlice doesn't communicate clearly (already today, and especially in the new proposal, it will often exist in the absence of any slice at all). This rename could be done backwards-compatibly with a deprecation period if we provide an import shim for the old name.
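
Under the proposal, traversal would then look the same no matter how many elements the subscript has; a sketch using the proposed SubscriptElement name (attribute names assumed to match today's ExtSlice):

import libcst as cst

node = cst.parse_expression("foo[1, 2:3]")
assert isinstance(node, cst.Subscript)
for element in node.slice:  # always a sequence under the proposal
    inner = element.slice   # each element wraps either an Index or a Slice
    if isinstance(inner, cst.Index):
        print("index:", inner.value)
    elif isinstance(inner, cst.Slice):
        print("slice:", inner.lower, inner.upper)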

Cannot deep_replace multiple nodes

One consequence of the issue mentioned here, where CSTTransformer always replaces nodes, is that methods that rely on node identity become harder to use in bulk.

For example, I'm trying to implement a rename(mod, src, dst) function that takes all instances of a src variable and renames them to dst, using the ScopeProvider. However, this code does not work:

for access in scope.accesses[src]:
  mod = mod.deep_replace(access.node, cst.Name(dst))

This is because after the first deep_replace, all nodes have been replaced, and the original scope is now invalidated. To fix this, you either have to rebuild the scopes after every deep_replace (inefficient), or batch-replace all nodes at the same time.

Perhaps y'all can consider adding a deep_replace_many? That is, enhance the _ChildReplacementTransformer to take multiple nodes. For example, I'm using the following class:

import libcst as cst
from typing import Dict

class ReplaceNodes(cst.CSTTransformer):
    def __init__(self, replacements: Dict[cst.CSTNode, cst.CSTNode]) -> None:
        self.replacements = replacements

    def on_leave(self, original_node, updated_node):
        # Fall back to the (possibly updated) node when no replacement is registered.
        return self.replacements.get(original_node, updated_node)
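
With that class, the rename example becomes a single batched pass, so the stale-identity problem goes away:

# Build all replacements against the original tree, then apply them at once.
replacements = {access.node: cst.Name(dst) for access in scope.accesses[src]}
mod = mod.visit(ReplaceNodes(replacements))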

Inserting statements into blocks

I need to refactor code in a way that inserts statements into blocks. For example, let's say I want to extract a constant as a variable. That is, go from:

y = 1 + 2

Into:

x1 = 1
x2 = 2
y = x1 + x2

Using the standard AST library, I can express this as follows using NodeTransformers:

import ast
import astor

class FindNumbers(ast.NodeTransformer):
    def __init__(self):
        self.assgns = []
        self.i = 0
        
    def fresh(self):
        self.i += 1
        return f'x{self.i}'
        
    def visit_Num(self, node):
        var = self.fresh()
        self.assgns.append(ast.Assign(targets=[ast.Name(id=var)], value=node))
        return ast.Name(id=var)

class ExtractNumbers(ast.NodeTransformer):
    def visit_Assign(self, assgn):
        finder = FindNumbers()
        finder.visit(assgn)
        return finder.assgns + [assgn]
    
stmt = ast.parse('y = 1 + 2')
print(astor.to_source(ExtractNumbers().visit(stmt)))

One key element of this is that NodeTransformers can return a list of statements, which enables the code above to insert the additional assignments during visit_Assign. The relevant code in CPython is here: https://github.com/python/cpython/blob/3.8/Lib/ast.py#L439-L440

As far as I can tell, this is not possible in LibCST. The transformer leave methods must return a CSTNode, and there are no nodes which can represent a block of statements that aren't indented.

Is there another way to accomplish this task in LibCST? If not, would y'all be open to a PR to add this feature?
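
For the record, newer LibCST releases added a FlattenSentinel return type for exactly this case; a sketch (the inserted statement is illustrative only):

import libcst as cst

class InsertBefore(cst.CSTTransformer):
    def leave_SimpleStatementLine(
        self,
        original_node: cst.SimpleStatementLine,
        updated_node: cst.SimpleStatementLine,
    ):
        # Returning a FlattenSentinel splices several statements into the
        # parent block in place of the original one.
        extra = cst.parse_statement("x1 = 1")
        return cst.FlattenSentinel([extra, updated_node])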

Parameters for metadata providers

In my application, I have information about whether particular line numbers were executed. I would like to use a metadata provider to turn this information into AST node-level annotations. However, this requires my IsNodeExecutedProvider to be parameterized by information external to the AST. As far as I can tell, neither the wrapper.resolve nor METADATA_DEPENDENCIES APIs allow metadata providers to take parameters, e.g. by overriding __init__.

Is there a way to do this that I have not found, or is this otherwise on the roadmap?
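
For concreteness, this is the kind of API the question is after (hypothetical; providers cannot currently take constructor parameters):

from typing import Set

import libcst as cst
from libcst.metadata import BatchableMetadataProvider

class IsNodeExecutedProvider(BatchableMetadataProvider[bool]):
    # Hypothetical __init__; today the framework instantiates providers itself.
    def __init__(self, executed_lines: Set[int]) -> None:
        super().__init__()
        self.executed_lines = executed_lines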

`RemoveImportsVisitor` incorrectly removes imports in `try/except` and `if` blocks

Consider the following input:

try:
  import a
except Exception:
  import a

a.hello()

Calling libcst.codemod.visitors.RemoveImportsVisitor.remove_unused_import for the imports in the above file will incorrectly remove them, even though they're referenced later on.

Here's a small test harness to demonstrate:

from typing import Tuple

import click
import libcst as cst
from libcst import codemod
from libcst.codemod.visitors import RemoveImportsVisitor


class RemoveUnusedImportsVisitor(codemod.VisitorBasedCodemodCommand):
    def visit_Import(self, node: cst.Import) -> bool:
        RemoveImportsVisitor.remove_unused_import_by_node(self.context, node)
        return False

    def visit_ImportFrom(self, node: cst.ImportFrom) -> bool:
        RemoveImportsVisitor.remove_unused_import_by_node(self.context, node)
        return False


@click.command()
@click.argument("src", nargs=-1, type=click.Path(writable=True))
def main(src: Tuple[str]) -> None:
    files = codemod.gather_files(src)
    context = codemod.CodemodContext()
    transform = RemoveUnusedImportsVisitor(context)
    codemod.parallel_exec_transform_with_prettyprint(transform, files)


if __name__ == "__main__":
    main()

[feature][metadata] add SuperClassProvider

In some lint or codemod use cases, we want to know the superclasses of a class, to see whether it inherits from a specific class, in order to enforce conventions or codemods on all subclasses.
The superclass information requires full-repository analysis due to multiple inheritance. We can leverage Pyre for that, similar to the existing TypeInferenceProvider.
https://libcst.readthedocs.io/en/latest/metadata.html#type-inference-metadata

Prerequisite: pyre query needs to support a superclasses query given a list of paths.
The current pyre query "superclasses(...)" only supports a fully qualified class name, which requires the caller to pass the name. To make this more efficient, we want to add path support to pyre query, so FullRepoManager can pass a list of paths to pyre query and read all classes in those paths along with their superclasses. To map the superclasses data onto the LibCST syntax tree, we also need the location info.
The pyre query interface for TypeInferenceProvider is: pyre query "types(path=path1, path=path2, ...)" and the output format is

{
  "types": [
    {
      "location": {
        "start": {"line": 8, "column": 19},
        "stop": {"line": 8, "column": 27}
      },
      "annotation": "typing.Type[typing.Sequence]"
    },
    ...
  ]
}

We want to have a similar interface for SuperClassProvider: pyre query "superclasses(path=path1, path=path2, ...)" and the output format is

List[
  {
    class_name: <fully qualified string>,
    location: <location fields>,
    superclasses: <list of fully qualified strings>
  }
]

CC @shannonzhu

Implementing SuperClassProvider
We want to implement a SuperClassProvider which returns a list of superclasses on ClassDef node or ClassDef.name node.
TypeInferenceProvider can be used as an example.

class TypeInferenceProvider(BatchableMetadataProvider[str]):

We also need some mocked integration tests by extending this script:
https://github.com/Instagram/LibCST/blob/2fb0db33d1b393228a6b45f7749e6df659f186b2/libcst/tests/test_pyre_integration.py

The script generates Pyre JSON output by running pyre query and stores it as test artifacts in this dir:
https://github.com/Instagram/LibCST/tree/2fb0db33d1b393228a6b45f7749e6df659f186b2/libcst/tests/pyre
The test cases in the file check whether each node has the expected output.
We also run the artifact generation in CI to make sure we address any needed changes when we upgrade Pyre or update the test examples. https://github.com/Instagram/LibCST/blob/master/.circleci/config.yml#L78

For unit tests, we can reuse the generated artifacts and verify the correctness of the superclasses, like:

def _test_simple_class_helper(test: UnitTest, wrapper: MetadataWrapper) -> None:

[docs] Metadata on CSTVisitor

  1. We should make it more obvious in the examples how to use metadata with CSTVisitor; if you are making incremental improvements and already have a visitor, the docs don't make it super obvious that

module.visit(SomeClass())

becomes

wrapper = cst.MetadataWrapper(module)
wrapper.visit(SomeClass())

  2. If you forget that, get_metadata() just raises a KeyError. We can probably improve that and give a more descriptive error when there is no metadata.
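
A complete minimal example of the working pattern (PositionProvider is just one example dependency, using current provider names):

import libcst as cst
from libcst.metadata import MetadataWrapper, PositionProvider

class NamePrinter(cst.CSTVisitor):
    METADATA_DEPENDENCIES = (PositionProvider,)

    def visit_Name(self, node: cst.Name) -> None:
        pos = self.get_metadata(PositionProvider, node).start
        print(f"{node.value} at line {pos.line}")

wrapper = MetadataWrapper(cst.parse_module("x = 1"))
wrapper.visit(NamePrinter())  # note: wrapper.visit, not module.visit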

Running scope analysis example raises exception

Running this example https://libcst.readthedocs.io/en/latest/scope_tutorial.html
raises the following error when I run it locally:

Traceback (most recent call last):
  File "~/bla.py", line 24, in <module>
    wrapper = cst.metadata.MetadataWrapper(cst.parse_module(source))
AttributeError: module 'libcst.metadata' has no attribute 'MetadataWrapper'

but it works in the Binder notebook. There are also differences in the module structure.

Is the code on master turned into a package and pushed to pypi on every successful merge?

Handle deprecation in a standard and consistent way?

We have a couple things deprecated and pending removal in the future.

Can we have a standard and consistent methodology to handle deprecation and removal?

What we have now:

  • BaseAssignment.accesses calls warnings.warn(..., DeprecationWarning) when it is called.
  • BatchableMetadataProvider, MetadataWrapper, and other metadata-related classes were moved to libcst.metadata but are still available in libcst for backward compatibility.
  • ExtSlice is deprecated and we have inline comments in the code.
  • SyntacticPositionProvider and BasicPositionProvider are about to be deprecated in #114
  • TODO: move BaseMetadataProvider to libcst.metadata

The warnings.warn(..., DeprecationWarning) call seems to be a widely adopted way to warn about deprecation. Can we try to use it in all cases? For class deprecation, we don't want to just subclass, since that may require lots of other changes to keep existing call sites and type checking working. Can we somehow have a helper that wraps all methods of a class to call warnings.warn(..., DeprecationWarning)?

There are more things we can do, e.g. add the deprecation warning to the readthedocs docs so readers see it, or explicitly declare the target version in which deprecated things will be removed.
There are deprecation packages that provide a @deprecated decorator to make all of this easier.

Some projects put all deprecated things together in a doc for tracking, e.g.
https://django.readthedocs.io/en/1.5.x/internals/deprecation.html
numpy/numpy#11521
This is more of a nice-to-have to me.

Inconsistency and confusion in `FunctionDef.params` and `Call.args`, can we consolidate it?

This is feedback I got during PyCon Taiwan, and it makes sense to me.

Here are differences:
FunctionDef.params is a Parameters node. Parameters has different attributes to categorize the different types of Param: params, default_params, star_arg, kwonly_params, star_kwarg, which is a good design. (One thing that could be improved here is renaming params to positional_params or position_params, which I discussed with @bgw recently.)

Call.args is a list of Arg; each Arg has attributes like value, keyword (for keyword args), and star (for asterisk args). In this pattern, users need to check each Arg's keyword attribute to figure out which args are keyword args. Why not follow the pattern used in Parameters and categorize them as positional_args, kw_args, star_kwarg? That would improve API consistency and usability. Is there any particular reason we don't want to do it?
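
To illustrate the current state, classifying a call's args today means inspecting each Arg by hand:

import libcst as cst

call = cst.parse_expression("fn(1, b=2, *rest, **extra)")
assert isinstance(call, cst.Call)
# Each Arg must be classified via its `keyword` and `star` fields.
positional = [a for a in call.args if a.keyword is None and a.star == ""]
keywords = [a for a in call.args if a.keyword is not None]
starred = [a for a in call.args if a.star in ("*", "**")]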

Expose add_slots as public API

Currently we import add_slots like this: from libcst._add_slots import add_slots.
We'll need add_slots when building other frameworks.
Any suggestion on the namespace we should use? E.g. libcst.common, or directly under libcst like the other classes?

LibCST doesn't generate a `tree`; it's a graph in which some nodes can be children of multiple parent nodes

Surprisingly, LibCST doesn't generate a tree; it's a graph in which some nodes can be children of multiple parent nodes.

I found this issue while working on ParentNodeProvider (#71).

Here is a simple example test case to reproduce the issue.

def test_no_duplicate_node(self):
    module = cst.parse_module(
        dedent(
            """
            foo = 'toplevel'
            fn1(foo)
            fn2(foo)
            def fn_def():
                foo = 'shadow'
                fn3(foo)
            """
        )
    )

    class CountVisitor(cst.CSTVisitor):
        def __init__(self) -> None:
            self.count = Counter()
            self.nodes = {}

        def on_visit(self, node: cst.CSTNode) -> bool:
            self.count[id(node)] += 1
            self.nodes[id(node)] = node
            return True

    visitor = CountVisitor()
    module.visit(visitor)
    for _id, count in visitor.count.items():
        if count != 1:
            print(count, visitor.nodes[_id])

The test uses a simple visitor to traverse the tree and count the number of visits per node id. One node was reused three times, as the child of three different parent nodes.

SimpleWhitespace(
    value='',
)

Test result: https://circleci.com/gh/Instagram/LibCST/2283?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

Example source code to be parsed:

        foo = 'toplevel'
        fn1(foo)
        fn2(foo)
        def fn_def():
            foo = 'shadow'
            fn3(foo)

Due to this issue, I wasn't able to implement the desired ParentNodeProvider, because a child node can have multiple parents while we only expect one. We need this to be fixed to unblock ParentNodeProvider. CC @DragonMinded @bgw

[discussion] LibCST based lint framework provides auto fixes

A LibCST-based lint framework can provide the following benefits:

  • auto fixes.
  • rich information from Metadata API for advanced lint rules.
  • flexible patterns to traverse syntax tree: visitor or matcher.

Autofix:

  • suggest code changes for developers to accept in one click.
  • suggested changes could be wrong, and developers can silence them with a special comment.

Examples:

More context: "Lint Fatigue" section in https://instagram-engineering.com/static-analysis-at-scale-an-instagram-story-8f498ab71a0c

Add attribute visitors

We've got visit_If, but we should also have visitors for each node's attribute, like visit_If_test for people that only want to track specific parts of a node.
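
A sketch of what the proposed API might look like (the signature is assumed to mirror the existing node visitors):

import libcst as cst

class IfTestCollector(cst.CSTVisitor):
    def __init__(self) -> None:
        self.tests = []

    # Called when traversal reaches the `test` attribute of an If node,
    # without having to handle the whole If in on_visit.
    def visit_If_test(self, node: cst.If) -> None:
        self.tests.append(node.test)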

It sounds like @DragonMinded wants to take this since it's related to #23.

Can you ignore ParserSyntaxError (for reserved async keyword in 3.6)?

I'd like to use this on a project that is Python 3.6 syntax but not fully 3.7 yet (as far as I know the only violation is that there are still some async keyword args, and async became reserved in 3.7).

Here's the exception I'm getting when visiting a module:

libcst._exceptions.ParserSyntaxError: Syntax Error: incomplete input @ 200:76.
Encountered 'async', but expected one of ["')'"].

        results = client.query_performers(query, "api_ingestion_consumer", async=False)

Would it be possible to suppress this error until the code is refactored to be fully 3.7 compliant? I realize "LibCST parses Python 3.7" is specifically noted in the README, so understood if that's not possible.

[low-pri][doc] Add document to explain the design differences between libcst and ast

ast and libcst look similar but differ in many design details. To make this clearer and prevent confusion for users familiar with ast (I'm one of them), it's worth documenting the differences in a few paragraphs.

Item with difference | LibCST | ast
string value (e.g. asname in Import/ImportFrom) | Name | str
value of a Name | value | id
FunctionDef | FunctionDef | FunctionDef and AsyncFunctionDef
decorators in ClassDef | decorators | decorator_list
accessing a statement inside a for/while loop | For.body.body[0].body[0].value (For.body is an IndentedBlock, For.body.body[0] is a SimpleStatementLine, For.body.body[0].body[0] is an Expr) | For.body[0].value (body[0] is an Expr)
Call.args | Sequence[Arg]; Arg.value stores the value | Sequence[value]
keyword arg in Call | an Arg in Call.args with Arg.keyword is not None | keyword.arg for each keyword in Call.keywords
ImportFrom.module | Name or Attribute | str or Attribute

Accessing node parent?

Hi,

I'm looking for a way to access a node's parent, but unless I overlooked something there doesn't seem to be a way to do that. I see a libcst.CSTNode.children property but nothing for the parent.

I have a setup.py file that I would like to modify from:

setup_kwargs = {
    'name': 'colour-science',
    'version': '0.3.14',
    'description': 'Colour Science for Python',
    'long_description': '',
    'author': 'Colour Developers',
    'author_email': None,
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://www.colour-science.org/',
    'package_dir': package_dir,
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'extras_require': extras_require,
    'python_requires': '>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*',
}


setup(**setup_kwargs)

to

setup(
    name= 'colour-science',
    version= '0.3.14',
    description= 'Colour Science for Python',
    long_description= '',
    author= 'Colour Developers',
    author_email= None,
    maintainer= None,
    maintainer_email= None,
    url= 'https://www.colour-science.org/',
    package_dir= package_dir,
    packages= packages,
    package_data= package_data,
    install_requires= install_requires,
    extras_require= extras_require,
    python_requires= '>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*',
)

So I was looking for all the Dict nodes, with the hope of being able to find the name they are assigned to by going up the tree (and maybe back down). With ast it is possible to walk the tree directly and set an attribute as follows:

for node in ast.walk(root):
    for child in ast.iter_child_nodes(node):
        child.parent = node
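
If it helps, a sketch of the metadata-based equivalent that LibCST later gained (ParentNodeProvider):

import libcst as cst
from libcst.metadata import MetadataWrapper, ParentNodeProvider

class DictFinder(cst.CSTVisitor):
    METADATA_DEPENDENCIES = (ParentNodeProvider,)

    def visit_Dict(self, node: cst.Dict) -> None:
        # Walk upward until we find the Assign this dict belongs to.
        parent = self.get_metadata(ParentNodeProvider, node)
        while not isinstance(parent, (cst.Assign, cst.Module)):
            parent = self.get_metadata(ParentNodeProvider, parent)
        if isinstance(parent, cst.Assign):
            print(parent.targets[0].target)

wrapper = MetadataWrapper(cst.parse_module("setup_kwargs = {'name': 'x'}"))
wrapper.visit(DictFinder())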

[question] Extract body of the root functions from module

Hello,

I recently discovered this project while looking for a solution to my problem.
I have a single Python file which contains one or more functions. I'd like to extract just the bodies of those functions, to split them into separate files later.

  1. Could you please review my attempt and suggest any improvements?
  2. Also, I don't know how to separate nested functions in the visitor: I tried to find some property for indentation, but couldn't locate it. I'd be glad if you could help me with that.

https://repl.it/@Grigory1/splitter

Thank you.

3.8 Support

I have a test runner, ptr, that uses Pyre as an option for type-checking. I put up a diff to move CI to start using official 3.8 for test running on Travis, and this results in the expected ValueError:

raise ValueError(
ValueError: LibCST can only parse code using one of the following versions of Python's grammar: 3.5, 3.6, 3.7. More versions may be supported by future releases.

Is there an expected timeframe for 3.8 support? I will have the tool not run Pyre on >=3.8 for now. What are your thoughts on adding a section to the README explaining the 3.8 support plans, now that 3.8 is an official version?

mypy CSTTypedTransformerFunctions issue

context

ran into a peculiar bug where the CSTTypedTransformerFunctions types seem to not work with mypy. leave_{Node} methods fail the mypy type check with Signature of "leave_{Node}" is incompatible with supertype "CSTTypedTransformerFunctions".

the definition that mypy says is incompatible doesn't look incompatible to me, though:

@mark_no_op
def leave_Module(self, original_node: "Module", updated_node: "Module") -> "Module":
    return updated_node

i'm not sure yet if this is an issue with mypy or libcst, but wanted to share here for replication. (mypy's error messages aren't super descriptive, so i'm still trying to figure out where mypy is getting the incompatible definition from.)

replicate (using: python 3.7.5)

./requirements.txt

libcst==0.2.4
mypy==0.750

./transformer.py

import libcst

class Transformer(libcst.CSTTransformer):

    def leave_Module(self, original_node: libcst.Module, updated_node: libcst.Module) -> libcst.Module:
        return updated_node

    def leave_Call(self, original_node: libcst.Call, updated_node: libcst.Call) -> libcst.Call:
        return updated_node

mypy errors

$ mypy transformer.py 
transformer.py:5: error: Signature of "leave_Module" incompatible with supertype "CSTTypedTransformerFunctions"
transformer.py:8: error: Signature of "leave_Call" incompatible with supertype "CSTTypedTransformerFunctions"
Found 2 errors in 1 file (checked 1 source file)

Compatibility with older python syntax

Because I've noticed a couple of references in other projects' bugs: I'm already working on this and don't anticipate that it will be super difficult. I will be adding support back to 2.7 (but probably stopping there). I have future-import parsing working as well, which is necessary for print statements and barry_as_FLUFL handling.

https://github.com/thatch/python-grammar-changes has a hand-compiled list of what changed, but I hope to create snippets and mechanically verify what versions they work on as well.

3.10+ support

PEP 617 is out (and if it's not yet accepted, it probably will be), and I'm wondering whether LibCST's underlying parser is capable of parsing a PEG grammar. With 3.10, the LL(1) restriction on the grammar will be dropped, which means that lib2to3.pgen2 won't be able to parse new changes to the Python grammar. I'm not sure about the internals of LibCST, but from what I've seen in the readme it uses something based on lib2to3.pgen2. Will LibCST continue to support newer Python versions and their grammar? (We currently use lib2to3 as our refactoring tool in unimport, but we might need to migrate to another tool to support 3.10+, which is why I'm asking.)

Add a `.code` alternative for all CSTNode subclasses

Related to the discussion here: #15 (review)

.code exists on Module, but not on other CSTNode subclasses.

We can only generate code in the context of a module because the module contains information about the default indentation level, newline, etc.

However, it'd be useful to have a .code property on every node for debugging purposes. I just don't know how we can communicate these pitfalls to the developer.

I'm thinking we could define an API like this:

def code_with_context(self, default_newline: str = "\n", default_indent: str = "    "):
    ...

What do people think? Is it obvious enough when reading node.code_with_context() that the generated code might not exactly match the code in the original module? Should we use a different name? Is this a bad idea? Should we stick with Module.code_for_node?
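
In the meantime, the usual workaround is to render through a default-configured Module; a small sketch:

import libcst as cst

node = cst.parse_expression("a + b")
# Module(body=[]) supplies the default newline/indent config for rendering.
print(cst.Module(body=[]).code_for_node(node))  # prints: a + b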

Missing dependency in wheel uploaded to pypi (v20.4.1)

 ❯❯❯ pip install monkeytype
Collecting monkeytype
  Downloading MonkeyType-20.4.1-py3-none-any.whl (41 kB)
     |████████████████████████████████| 41 kB 18 kB/s 
Collecting mypy-extensions
  Downloading mypy_extensions-0.4.3-py2.py3-none-any.whl (4.5 kB)
Installing collected packages: mypy-extensions, monkeytype
Successfully installed monkeytype-20.4.1 mypy-extensions-0.4.3
 ❯❯❯ monkeytype --help
Traceback (most recent call last):
  File "/tmp/nwani_1585950476/dev/bin/monkeytype", line 5, in <module>
    from monkeytype.cli import entry_point_main
  File "/tmp/nwani_1585950476/dev/lib/python3.8/site-packages/monkeytype/cli.py", line 16, in <module>
    from libcst import parse_module
ModuleNotFoundError: No module named 'libcst'
 ❯❯❯                          
