tophat / codewatch

[deprecated] Monitor and manage deeply customizable metrics about your python code using ASTs

License: Apache License 2.0

Python 92.14% JavaScript 5.40% CSS 1.67% HTML 0.79%

python opensource code-metrics abstract-syntax-tree

codewatch's Introduction

This project is currently deprecated and may be archived. If you're looking for something similar, you could try bellybutton or writing a custom checker in pylint instead.

Overview

Monitor and manage deeply customizable metrics about your python code using ASTs.

Codewatch lets you write simple python code to track statistics about the state of your codebase and write lint-like assertions on those statistics. Use this to incrementally improve and evolve the quality of your code base, increase the visibility of problematic code, to encourage use of new patterns while discouraging old ones, to enforce coding style guides, or to prevent certain kinds of regression errors.

What codewatch does:

Traverses your project directory
Parses your code into AST nodes and calls your visitor functions
Your visitor functions run and populate a stats dictionary
After all visitor functions are called, your assertion functions are called
Your assertion functions can assert on data in the stats dictionary, save metrics to a dashboard, or anything you can think of
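
The steps above can be sketched in a few lines. This is only an illustration of the flow, not codewatch's implementation: it uses the standard library ast module instead of astroid, a Counter in place of the stats object, and run_pipeline, count_import, imports_not_too_high and demo are hypothetical names.

```python
import ast
import os
import tempfile
from collections import Counter


def run_pipeline(base_dir, visitors, assertions):
    """Walk base_dir, feed AST nodes to visitors, then run the assertions.

    visitors: dict mapping an ast node class to a list of callables
              taking (node, stats, rel_file_path).
    assertions: list of callables taking (stats).
    Returns (stats, failures); failures maps assertion names to messages.
    """
    stats = Counter()
    for dir_path, dir_names, file_names in os.walk(base_dir):
        dir_names.sort()  # keep the traversal order deterministic
        for file_name in sorted(file_names):
            if not file_name.endswith(".py"):
                continue
            full_path = os.path.join(dir_path, file_name)
            rel_path = os.path.relpath(full_path, base_dir)
            with open(full_path) as source_file:
                tree = ast.parse(source_file.read(), filename=rel_path)
            for node in ast.walk(tree):
                for visitor in visitors.get(type(node), []):
                    visitor(node, stats, rel_path)

    # Assertions only run once every visitor has seen every node.
    failures = {}
    for assertion in assertions:
        try:
            assertion(stats)
        except AssertionError as err:
            failures[assertion.__name__] = str(err)
    return stats, failures


def count_import(node, stats, _rel_file_path):
    stats["total_imports_num"] += 1


def imports_not_too_high(stats):
    assert stats["total_imports_num"] <= 2, "too many imports"


def demo():
    """Run the pipeline over a throwaway directory holding one file."""
    with tempfile.TemporaryDirectory() as base_dir:
        with open(os.path.join(base_dir, "example.py"), "w") as example:
            example.write("import os\nimport sys\nfrom collections import Counter\n")
        visitors = {ast.Import: [count_import], ast.ImportFrom: [count_import]}
        return run_pipeline(base_dir, visitors, [imports_not_too_high])
```

Calling demo() returns the collected stats together with a dictionary of failed assertions, which a caller could print or forward to a dashboard.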

Installation

Python: 2.7, 3.6, 3.7

Execute the following in your terminal:

pip install codewatch

Usage

codewatch codewatch_config_module

codewatch_config_module is a module that should contain your visitors, assertions and filters (if required)

Visitors

Register visitors with the @visit decorator. The node passed in is an astroid node, which follows an API similar to that of the nodes in Python's ast module.

from codewatch import visit


def _count_import(stats):
    stats.increment('total_imports_num')

@visit('import')
def count_import(node, stats, _rel_file_path):
    _count_import(stats)

@visit('importFrom')
def count_import_from(node, stats, _rel_file_path):
    _count_import(stats)

This will build a stats dictionary that contains something like the following:

{
    "total_imports_num": 763
}

Assertions

Also in codewatch_config_module, you can add assertions against this stats dictionary using the @assertion decorator:

from codewatch import assertion


@assertion()
def number_of_imports_not_too_high(stats):
    threshold = 700
    actual = stats.get('total_imports_num')
    err = 'There were {} total imports detected which exceeds threshold of {}'.format(actual, threshold)
    assert actual <= threshold, err

In this case, the assertion would fail because total_imports_num is 763, which exceeds the threshold of 700. The following message would be printed:

There were 763 total imports detected which exceeds threshold of 700

Filters

You can add the following optional filters:

directory_filter (defaults to skip test and migration directories)

# visit all directories
def directory_filter(_dir_name):
    return True

file_filter (defaults to only include python files, and skips test files)

# visit all files
def file_filter(_file_name):
    return True

Tune these filters to suit your needs.
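
As a slightly more selective example, the filters below skip a few directories by name and keep only non-test Python files. This is a sketch that assumes each filter receives a bare directory or file name and returns True to visit it; the directory names in SKIPPED_DIRECTORIES are made up for illustration.

```python
# Hypothetical directory names; adjust them to your own project layout.
SKIPPED_DIRECTORIES = {"node_modules", "migrations", "build"}


def directory_filter(dir_name):
    # Return True to descend into the directory, False to skip it.
    return dir_name not in SKIPPED_DIRECTORIES


def file_filter(file_name):
    # Visit Python files only, and leave out test modules.
    return file_name.endswith(".py") and not file_name.startswith("test_")
```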

Contributing

See the Contributing docs

Contributors

Thanks go to these wonderful people (emoji key):

Josh Doncaster Marsiglio 💻
Rohit Jain 💻
Chris Abiad 💻
Francois Campbell 💻
Monica Moore 🎨
Jay Crumb 📖
Jake Bolam 🚇
Shouvik D'Costa 📖
Siavash Bidgoly 🚇
Noah Negin-Ulster 💻
Vardan Nadkarni 💻
greenkeeper[bot] 🚇
Kazushige Tominaga 💻

We welcome contributions from the community, Top Hatters and non-Top Hatters alike. Check out our contributing guidelines for more details.

Credits

Special thanks to Carol Skelly for donating the 'tophat' GitHub organization.

codewatch's Issues

An in-range update of docusaurus is breaking the build 🚨

The devDependency docusaurus was updated from `1.6.2` to `1.7.0`.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

docusaurus is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details

❌ ci/circleci: website: Your tests failed on CircleCI (Details).
✅ ci/circleci: python-27: Your tests passed on CircleCI! (Details).
✅ ci/circleci: python-36: Your tests passed on CircleCI! (Details).
✅ ci/circleci: python-37: Your tests passed on CircleCI! (Details).

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.

Your Greenkeeper Bot 🌴

Explore changing assertion function API to `AssertionError` on failure

See #1 (comment)

Gather default metrics (imports, method and function calls) and make them available for assertion methods. Document metrics and how to use them in assertions.

bulk assertions generator

In a large codebase, we saw that we were writing lots of similar assertions that compared stats in a certain namespace to a dict representing the expected results. Seemed like it might be unnecessary boilerplate. Maybe there are easy ways we can help kill it.
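
One possible shape for such a generator is sketched below. It assumes stats can be read as a nested dictionary keyed by namespace, which may not match the real stats object, and make_namespace_assertions is a hypothetical helper rather than part of codewatch.

```python
def make_namespace_assertions(namespace, expected):
    """Build one assertion function per key of the expected dict."""

    def build(key, expected_value):
        def check(stats):
            actual = stats.get(namespace, {}).get(key)
            assert actual == expected_value, (
                "{}.{}: expected {}, got {}".format(
                    namespace, key, expected_value, actual
                )
            )

        # A readable name makes failures easier to trace back.
        check.__name__ = "assert_{}_{}".format(namespace, key)
        return check

    # Sort the keys so the generated assertions come out in a stable order.
    return [build(key, value) for key, value in sorted(expected.items())]
```

Each generated function could then be registered with the @assertion decorator in the config module, replacing a hand-written assertion per key.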

Project deprecated?

@lime-green / @rohitjain

I've been focusing on other things than this project for a while now and haven't seen many changes coming through from other contributors, including for the auto-detected security issues.

I'm curious about your thoughts on whether it's time to mark this project as deprecated.

Add GitHub Issue Template

Support explicit ordering of visitor functions rather than purely relying on declaration order

Is your feature request related to a problem? Please describe.

Ensuring a strict ordering on the execution of "visitors" is necessary if we want to guarantee deterministic assertions. If visitor_a increments counter_a for some node of type X and visitor_b increments counter_b for a node of type X only if counter_a is greater than 0, then changing the order in which visitor_a and visitor_b are called will change the final value of the computed statistics, meaning assertion results may differ between runs that use different orderings.

While typing out this issue I was trying to come up with a good use case for having one visitor function depend on another. I could not. Perhaps someone else can think of an example?

Even if we were to consider this an anti-pattern, we should attempt to minimize possibility of error / flakiness.

Changing to a reducer-style approach for visitors may encourage inter-visitor dependencies as per #10.

Currently the order of execution of visitors is defined by declaration order, purely because we use a central registry with a global array that is appended to whenever the @visit decorator is executed. The use of a global registry seems like an anti-pattern; however, in order to remove this registry, we'd need some way to guarantee visitor order (dir(module) sorts alphabetically, while module.__dict__.keys() is only guaranteed to be ordered in py3.6+).

Describe the solution you'd like

A pattern similar to https://pytest-dependency.readthedocs.io/en/latest/about.html#what-is-the-purpose, where you essentially mark the names of tests that must be run first. We can thus use a topological sort of the dependencies.

e.g.

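from astroid import nodes
from codewatch import visit

@visit(nodes.FunctionDef)
def some_func(node, stats, _rel_file_path):
    pass

# Proposed syntax only: depends_on is the new parameter suggested in this issue
@visit(nodes.FunctionDef, predicate=None, depends_on=['some_func', 'some_other_func'])
def count_funcs(node, stats, _rel_file_path):
    pass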

We'd examine "depends_on" to generate a graph and then topological sort. We could sort alphabetically for visitors with no dependencies (or that are tied in sorting order).
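
A minimal sketch of that ordering step, using Kahn's algorithm with a heap so that visitors that are ready at the same time come out alphabetically. The function order_visitors and its input format (a dict of visitor name to the names it depends on) are assumptions made for illustration.

```python
import heapq


def order_visitors(dependencies):
    """Return visitor names so that every visitor follows its dependencies.

    dependencies: dict mapping a visitor name to the names it depends on.
    Ties are broken alphabetically.
    """
    remaining = {name: set(deps) for name, deps in dependencies.items()}
    for name, deps in remaining.items():
        unknown = deps - set(remaining)
        if unknown:
            raise ValueError(
                "{} depends on unknown visitors: {}".format(name, sorted(unknown))
            )

    # Reverse edges: which visitors are waiting on a given visitor.
    dependents = {name: [] for name in remaining}
    for name, deps in remaining.items():
        for dep in deps:
            dependents[dep].append(name)

    ready = [name for name, deps in remaining.items() if not deps]
    heapq.heapify(ready)  # the smallest name is popped first
    ordered = []
    while ready:
        name = heapq.heappop(ready)
        ordered.append(name)
        for dependent in dependents[name]:
            remaining[dependent].discard(name)
            if not remaining[dependent]:
                heapq.heappush(ready, dependent)

    if len(ordered) != len(remaining):
        raise ValueError("dependency cycle detected among visitors")
    return ordered
```

Raising on unknown names and on cycles keeps a misconfigured depends_on from silently changing the execution order.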

Describe alternatives you've considered

Use inspect module to identify line numbers of items in dir(). Use this to order. Pro: consistent in all versions of python. Con: probably slow (?) and will grow in complexity when your config is spread across multiple files.
Rely on module.__dict__ which is sorted by declaration order in py3.6+ and should remain in whatever arbitrary order it ends up in pre-3.6 (i.e. re-running program should keep same order even if not declaration order). Con: behaviour isn't easy to understand pre-3.6.
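
The line-number alternative might look like the sketch below. Rather than calling into the inspect module, it reads the first line number recorded on each function's code object, which avoids re-reading source files. The helper name functions_in_declaration_order is hypothetical, and sorting by file name first is just one arbitrary way to handle a config spread across several files.

```python
import types


def functions_in_declaration_order(namespace):
    """Sort the plain functions in a namespace by where they were defined."""
    functions = [
        value for value in namespace.values()
        if isinstance(value, types.FunctionType)
    ]
    return sorted(
        functions,
        key=lambda func: (func.__code__.co_filename, func.__code__.co_firstlineno),
    )
```

For a single config module this could be called with vars(module). A decorator that returns a wrapper function would report the wrapper's location, so it only works as-is when the decorator returns the original function.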

Additional context

https://tophat-opensource.slack.com/archives/CE14KJGET/p1543705988030800

Implement docusaurus for codewatch

`Stats` instances should support comparison to `dict`s

Otherwise, it's easy to accidentally compare a Stats instance to a dict and not realize why it fails.
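
A rough sketch of what dict comparison could look like. ComparableStats is a hypothetical stand-in written for this example; only the increment and get method names are taken from the usage shown in the README, and the internals of the real Stats class are not assumed.

```python
class ComparableStats(object):
    """Toy stats container that compares equal to a dict with the same data."""

    def __init__(self, initial=None):
        self._data = dict(initial or {})

    def increment(self, key, amount=1):
        self._data[key] = self._data.get(key, 0) + amount

    def get(self, key, default=None):
        return self._data.get(key, default)

    def __eq__(self, other):
        if isinstance(other, ComparableStats):
            return self._data == other._data
        if isinstance(other, dict):
            return self._data == other
        return NotImplemented

    def __ne__(self, other):
        # Needed on Python 2, where != does not fall back to __eq__.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Mutable containers that define __eq__ should not be hashable.
    __hash__ = None
```

Because dict.__eq__ returns NotImplemented for unknown types, a comparison written as {"a": 1} == stats falls back to the method above as well.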

Public example app/repo using the codewatch

Is your feature request related to a problem? Please describe.
It would be great to see something like thm config that is public, so it's easy for people to understand how to use this project.

Describe the solution you'd like
A publicly available example/repo/project on how to use codewatch.

Describe alternatives you've considered
n/a

Additional context
n/a

make it possible to specify the base path instead of just using cwd

#1 (comment)

Add sane defaults for file and directory filters

#1 (comment)

Add code of conduct

If you don't return node in your visit functions, there's no error or warning

Maybe raise an error if the transformation doesn't return a node, or just return it explicitly from our NodeVisitor class

Document package release process

Is your feature request related to a problem? Please describe.

Missing documentation on core repo processes, mainly "release" process. What constitutes a release? Are there any requirements? Semantic versioning?

Describe the solution you'd like

Documentation added (and maybe a "process" / issue label specifically for questions around repo policies).

By default, file walking will include non-python files

Related to #11 for sure.

See #1 (comment)

Keeping a `ast.NodeVisitor` compatible API may have performance limitations

See #1 (comment)

Another problem with the NodeVisitor API is that it couples registering a visitor subclass (batch of related node visitors) to walking (and visiting) the AST.

It's not clear to me whether re-walking each AST for each subclass would scale reasonably if we had dozens or hundreds of registered subclasses. In other words, would we linearly increase total execution time for a given set of ASTs or would some level of caching dominate, making the subsequent AST walks negligible.

If we see performance problems, a clear optimization path to consider is to walk once and call each visitor (method) registered for that node type
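
The walk-once idea can be sketched as a registry keyed by node type, so that each tree is traversed a single time no matter how many visitors are registered. This uses the standard library ast module for brevity, and VisitorRegistry and the visitor names are hypothetical.

```python
import ast
from collections import Counter, defaultdict


class VisitorRegistry(object):
    """Map node types to visitor functions and dispatch during one walk."""

    def __init__(self):
        self._visitors = defaultdict(list)

    def visit(self, node_type):
        def decorator(func):
            self._visitors[node_type].append(func)
            return func
        return decorator

    def run(self, tree, stats):
        # One traversal; each node is handed to every visitor for its type.
        for node in ast.walk(tree):
            for visitor in self._visitors.get(type(node), ()):
                visitor(node, stats)
        return stats


registry = VisitorRegistry()


@registry.visit(ast.FunctionDef)
def count_functions(node, stats):
    stats["functions"] += 1


@registry.visit(ast.FunctionDef)
def count_private_functions(node, stats):
    if node.name.startswith("_"):
        stats["private_functions"] += 1


@registry.visit(ast.Import)
def count_imports(node, stats):
    stats["imports"] += 1
```

With this layout the per-tree cost is one traversal plus one dictionary lookup per node, rather than one traversal per registered visitor class.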

directory_filter should receive full relative path

Directory filter methods receive the folder name but should receive the full path, relative to the base directory
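
A sketch of the requested behaviour, assuming the walker is built on os.walk: the filter is handed each directory's path relative to the base directory, and rejected directories are pruned so they are never entered. iter_directories and demo are hypothetical names.

```python
import os
import tempfile


def iter_directories(base_dir, directory_filter):
    """Yield relative paths of directories accepted by directory_filter."""
    for dir_path, dir_names, _file_names in os.walk(base_dir):
        kept = []
        for dir_name in sorted(dir_names):
            rel_path = os.path.relpath(os.path.join(dir_path, dir_name), base_dir)
            if directory_filter(rel_path):
                kept.append(dir_name)
                yield rel_path
        # Editing dir_names in place stops os.walk from entering the rest.
        dir_names[:] = kept


def demo():
    """Skip app/tests while still visiting the top-level tests directory."""
    with tempfile.TemporaryDirectory() as base_dir:
        for parts in (("app", "tests"), ("app", "models"), ("tests",)):
            os.makedirs(os.path.join(base_dir, *parts))
        skipped = os.path.join("app", "tests")
        return sorted(iter_directories(base_dir, lambda rel: rel != skipped))
```

With only the folder name available, the filter in demo could not tell app/tests apart from the top-level tests directory.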

Update for async API calling pattern (for debug-time use of linter failures)

For large codebases, it would be good if lint results could be communicated to devs before CI / pre-commit time. We were wondering if we could integrate with our web back end when it's in debug mode to show devs ASAP if there are problems with what they're doing.

Use pip-tools to fully pin all dependencies

Mostly just an idea for now.

Thinking about / reading this: https://hynek.me/articles/python-app-deps-2018/

We don’t really have deployment needs or anything, but having actually-repeatable-CI would still be pretty nice and I think would require all deps (even implicit ones) to be pinned.

pip tools is a nice way to maintain the difference between the explicit and implicit deps while still having them fully pinned

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on Greenkeeper branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet. We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please click the 'fix repo' button on account.greenkeeper.io.

bin/codewatch should return an appropriate exit code if there were errors in the assertion functions

Return a non 0 exit if there are any errors caught in the assertion functions:
https://github.com/tophat/codewatch/blob/master/bin/codewatch#L39

Improve readability of result output

just printing the various dicts isn't exactly user-friendly ;)

Improve `visit()` decorator's argument syntax

See #1 (comment)

It's not clear to me what the normalization here (adding an initial cap to the passed in string param) adds to our API.

Maybe node_name should have to be a valid node class name?

I might also consider making the param to visit the actual class, but the boilerplate of importing everything you need seems painful. Would give us the comfy feeling of an enumeration though.

Consider making `wrapped_node_visitor` private (leading underscore)

See #1 (comment)

Don't traverse hidden directories/files by default

Is your feature request related to a problem? Please describe.

The default directory and file filters do not capture some basic cases.

Describe the solution you'd like

Add rules to default directory & file filters to ignore any file or directory that begins with a ".", excluding current directory (this captures ".git")
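
A sketch of what those default rules could look like, assuming the filters receive bare names. The is_hidden helper is hypothetical, and the file filter keeps the existing "Python files only" behaviour described in the README.

```python
def is_hidden(name):
    # "." and ".." refer to the current and parent directory, not hidden entries.
    return name.startswith(".") and name not in (".", "..")


def directory_filter(dir_name):
    return not is_hidden(dir_name)


def file_filter(file_name):
    return file_name.endswith(".py") and not is_hidden(file_name)
```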

Describe alternatives you've considered

By default (or via opt-out option outside of default dir/file filter system), parse and then ignore any files/dirs that match a .gitignore rule.

Additional context

Alternative is now possible to implement due to #94

Check whether "expected" mis-uses of the API give meaningful error messages

See #1 (comment) for one example.

What kind of error do we see if we pass an invalid node name to our visit decorator?

Make this project Windows compatible

Right now codewatch does not work in a Windows environment due to its usage of paths and subprocess

Consider and explore redux-style reducer approach for stats

I tend to agree that following the stats flow through the code is a little unclear. Maybe this would help.

See #1 (comment)

just a random comment: what about exploring a redux-style reducer approach for stats, where a visitor would receive the node and the top-level stats tree as arguments and return a new stats object?
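
To make the suggestion concrete, here is a small sketch of a reducer-style visitor: it receives the node and the current stats and returns a new stats dictionary instead of mutating the one it was given. It uses the standard library ast module and plain dicts, and count_imports and reduce_tree are hypothetical names.

```python
import ast
from functools import reduce


def count_imports(node, stats):
    """Reducer-style visitor: returns new stats, never mutates its input."""
    if isinstance(node, (ast.Import, ast.ImportFrom)):
        new_stats = dict(stats)
        new_stats["total_imports_num"] = stats.get("total_imports_num", 0) + 1
        return new_stats
    return stats


def reduce_tree(reducers, tree, initial_stats):
    """Fold every node of the tree through every reducer, in order."""

    def apply_all(stats, node):
        return reduce(lambda acc, reducer: reducer(node, acc), reducers, stats)

    return reduce(apply_all, ast.walk(tree), initial_stats)
```

Since each step returns a fresh object, the flow of stats is explicit in the return values, at the cost of copying the dictionary whenever something changes.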

remove debug level logging from bin/codewatch

Remove, or set to warning, or be user configurable

Read about astroid + potential proof of concept - https://astroid.readthedocs.io/en/latest/

An in-range update of all-contributors-cli is breaking the build 🚨

🚨 Reminder! Less than one month left to migrate your repositories over to Snyk before Greenkeeper says goodbye on June 3rd! 💜 🚚💨 💚

Find out how to migrate to Snyk at greenkeeper.io

The devDependency all-contributors-cli was updated from `6.14.2` to `6.15.0`.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

all-contributors-cli is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details

❌ ci/circleci: python-27: CircleCI is running your tests (Details).
❌ ci/circleci: python-36: CircleCI is running your tests (Details).
✅ ci/circleci: website: Your tests passed on CircleCI! (Details).
❌ ci/circleci: python-37: Your tests failed on CircleCI (Details).

Release Notes for v6.15.0

6.15.0 (2020-05-24)

Features

contribution-types: add missing contribution types (#261) (bcc0d99)

Commits

The new version differs by 9 commits.

bcc0d99 feat(contribution-types): add missing contribution types (#261)
e987eb0 chore(package): update cz-conventional-changelog to version 3.0.0 (#198)
4573e29 docs: add AnandChowdhary as a contributor (#219)
33e1a43 chore(package): update semantic-release to version 16.0.0 (#242)
77923a3 docs: add kharaone as a contributor (#212)
b5d85de chore(package): update git-cz to version 4.1.0 (#243)
e2ed91d docs: add ilai-deutel as a contributor (#257)
d26cd47 docs: add MarceloAlves as a contributor (#222)
9a6cf19 chore(package): update kcd-scripts to version 5.0.0 (#246)

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.

Your Greenkeeper Bot 🌴

Automatically enforce black-style code formatting

Use filename instead of module path for the config module

Is your feature request related to a problem? Please describe.
When calling codewatch my_codewatch_config_module.py, we are greeted with an ImportError: No module named py. If you know that codewatch interprets config_module.py as a Python module path, you know to remove the .py, but if you don't it's an annoying interface.

Describe the solution you'd like
In the event that #85 is not accepted, or if it is accepted with a manual override for the config file, make the CLI use the filename of the config module rather than a Python module path.
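
Loading a config module from a file path could be sketched with importlib, as below. This relies on importlib.util, which is only available on Python 3, so the 2.7 support listed in the README would need a different approach. load_config_module and demo are hypothetical names.

```python
import importlib.util
import os
import tempfile


def load_config_module(config_path):
    """Import a Python file by path and return the resulting module."""
    module_name = os.path.splitext(os.path.basename(config_path))[0]
    spec = importlib.util.spec_from_file_location(module_name, config_path)
    if spec is None or spec.loader is None:
        raise ImportError("cannot load config from {}".format(config_path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


def demo():
    """Write a tiny config file, load it by filename and read a value back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        config_path = os.path.join(tmp_dir, "my_codewatch_config_module.py")
        with open(config_path, "w") as config_file:
            config_file.write("IMPORT_THRESHOLD = 700\n")
        return load_config_module(config_path).IMPORT_THRESHOLD
```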

Use well-known file for the config module

Is your feature request related to a problem? Please describe.
The CLI command feels verbose compared to other tools.

Describe the solution you'd like
Like make, invoke, and eslint, use a well-known filename, perhaps codewatch.py, as a default config module filename. This would make the CLI simpler, with the potential for just running codewatch and nothing more. This is related to #12 in that the CLI could become `codewatch .`

We could preserve the ability to specify the config module under a named CLI parameter, such as codewatch --config <path to config file>.
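
A sketch of how such a command line could be declared with argparse. The optional base_path positional and the codewatch.py default follow the ideas in this issue and in #12; none of it reflects the current CLI, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="codewatch")
    parser.add_argument(
        "base_path",
        nargs="?",
        default=".",
        help="directory to analyse (defaults to the current directory)",
    )
    parser.add_argument(
        "--config",
        default="codewatch.py",
        help="path to the config file (defaults to codewatch.py)",
    )
    return parser
```

With these defaults, running the tool with no arguments would analyse the current directory using codewatch.py, while `codewatch src --config ci/metrics.py` would override both.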

Describe alternatives you've considered
No major alternatives, this feels like either it's changed or not.
