flacle / truegitcodechurn


Python script to compute "true" code churn of a Git repository.

License: MIT License

Python 100.00%

truegitcodechurn's People

alexanderpinkerton sboysel jonathanguerne newmountain aishwarryavs kkane-wesleyan changecx-org

truegitcodechurn's Issues

Add the ability to measure churn for all authors within a given timespan without specifying a specific author filter.

Currently you have to manually compile a list of the authors active within a certain timespan and measure churn for each one. A typical use case is a snapshot that compares time periods, where knowing exactly which contributors were active is irrelevant.
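A minimal sketch of how the author list for a timespan might be collected automatically, assuming the text output of `git log --after=... --before=... --format=%an` is already available. The helper name `unique_authors` is hypothetical and not part of the script.

```python
def unique_authors(log_output: str) -> list[str]:
    """Return distinct author names from `git log --format=%an` output.

    Hypothetical helper: names are kept in first-seen order and blank
    lines are ignored.
    """
    seen: dict[str, None] = {}
    for line in log_output.splitlines():
        name = line.strip()
        if name:
            seen.setdefault(name, None)
    return list(seen)


# Illustrative input only, not real repository data.
sample = "alice\nbob\nalice\n\ncarol\n"
print(unique_authors(sample))
```

Churn could then be computed once per returned name, or summed across all of them for a period snapshot.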

understanding the jargon used in the code

Sorry for the naive question. Can you please clarify what exactly contribution and churn mean in this output?

contribution: 11000
churn: -900

Does contribution here mean entirely new work, that is, brand-new code that does not replace any older code?

Does churn here mean only changes made to the same lines of code?

I think it would be better to also take efficiency and legacy refactoring into account.

Add contribution guideline

Given the recent increase in support (really awesome), it makes sense to add a contribution guideline, as suggested by GitHub.

Transition codebase to classes

Currently the script is a simple set of functions. Encapsulating all functionality in classes would allow for better interoperability with existing systems and workflows.

Improve analytics data by tracking remove and add counts by line in `files`

Related to several other issues, it would make analytics easier if the files structure, rather than storing data presently as:

{
        "README.md": {
            2: 0,
            8: 0,
            10: 0,
            11: 0,
            ...
            24: 2,
            31: 0,
            33: 1,
            35: 1,
            37: 3,
            41: 12,
        },
        "gitcodechurn.py": {
            0: 0,
            1: 190,
            2: 4,
            4: 0,
            11: -1,
            15: 6,
            16: 5,
            37: 2,
            ...
            167: 1,
            172: 0,
            173: 2,
            189: 1,
            191: 1,
            192: 5,
            193: 0,
            196: 2,
            197: 0,
            198: 25,
            200: 1,
            217: 14,
            223: 1,
            224: 1,
        },
    }

instead tracked the count of removed lines and the count of added lines. This additional data would allow more detailed analytics and add nuance to questions about user-specific churn, among others.

I propose we instead utilize a structure such as:

{
        "README.md": [
            {"added": 0, "removed": 0, "line_number": 2},
            {"added": 3, "removed": 1, "line_number": 42},
            ...
        ]
}

I would be happy to submit a PR in support of this.
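As a rough illustration of the proposed structure, the sketch below builds one entry from a unified diff hunk header. It assumes the diff is produced with zero context lines (for example `git diff --unified=0`), so the hunk counts equal the removed and added line counts. The helper name `hunk_to_entry` and the choice of the new-file start line as `line_number` are assumptions, not the script's current behavior.

```python
import re

# Unified diff hunk header, e.g. "@@ -42,1 +42,3 @@"; an omitted count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_to_entry(header: str) -> dict:
    """Convert a hunk header into an {"added", "removed", "line_number"} entry."""
    match = HUNK_RE.match(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    _old_start, old_count, new_start, new_count = match.groups()
    return {
        "added": int(new_count) if new_count is not None else 1,
        "removed": int(old_count) if old_count is not None else 1,
        # Assumption: index entries by the starting line in the new file.
        "line_number": int(new_start),
    }


files = {"README.md": [hunk_to_entry("@@ -42,1 +42,3 @@")]}
print(files)
```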

Code churn per file

It would be interesting to be able to output the churn rate for each file in the repository (or a subfolder thereof) in order to detect potentially fragile code, since a high churn rate could indicate more bugs. Would such a feature be possible to add?
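A minimal sketch of a per-file summary, assuming a `files` mapping of file path to `{line_number: count}` like the one shown in the analytics issue on this page. Summing the per-line counters is only one possible definition of a file's churn, and the function name `churn_per_file` is hypothetical.

```python
def churn_per_file(files: dict[str, dict[int, int]]) -> list[tuple[str, int]]:
    """Sum the per-line counters of each file, highest total first."""
    totals = {path: sum(lines.values()) for path, lines in files.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative values only.
example = {
    "README.md": {24: 2, 33: 1},
    "gitcodechurn.py": {1: 190, 11: -1},
}
for path, total in churn_per_file(example):
    print(path, total)
```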

Add ability to exclude certain folders

Sometimes project folders or configuration files get checked in. These may or may not be part of efforts to increase software quality, so users of this tool should be able to specify an optional parameter to exclude specific folders.

Please add testing

Hi @flacle !

First, thank you all for your work. Some colleagues of mine have expressed interest in this script and have requested a few features in line with some of the current issues (which I would also like to contribute to ;) ).

Before I work on these features, I wanted to establish a safety net so I don't break anything. As such, I would like to request (and then submit) some unit tests to make sure functionality does not regress.

Date range is different when the day and/or month is omitted from the before and/or after arguments

The date parser of Git (https://github.com/git/git/blob/master/date.c) provides a different result when month and day values are omitted. Appending default values for each attribute (before, after) will ensure better consistency.
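One way the suggested defaults could look is sketched below. The function name `normalize_date` and the choice of `01` for a missing month or day are assumptions; for the `before` argument, a different default (such as the end of the period) might be more appropriate.

```python
def normalize_date(value: str, default_month: str = "01", default_day: str = "01") -> str:
    """Expand "YYYY" or "YYYY-MM" to a full "YYYY-MM-DD" string."""
    parts = value.split("-")
    if not 1 <= len(parts) <= 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected YYYY[-MM[-DD]], got {value!r}")
    year = parts[0]
    month = parts[1] if len(parts) > 1 else default_month
    day = parts[2] if len(parts) > 2 else default_day
    return f"{year}-{month.zfill(2)}-{day.zfill(2)}"


print(normalize_date("2021"))
print(normalize_date("2021-09"))
```

Passing the fully expanded value to Git avoids relying on how its date parser fills in missing fields.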

Add ability to specify more than one excluded directory

Sometimes you may need to exclude several subdirectories from your root directory, so we should be able to pass in a list of some sort.
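A small sketch of how a list of excluded directories might be checked, assuming paths are relative to the repository root and use forward slashes. The helper name `is_excluded` is hypothetical.

```python
from pathlib import PurePosixPath


def is_excluded(path: str, excluded_dirs: list[str]) -> bool:
    """Return True if `path` lies inside any of the excluded directories."""
    parts = PurePosixPath(path).parts
    for directory in excluded_dirs:
        ex_parts = PurePosixPath(directory).parts
        # Compare whole path components so "bin" does not match "binary/".
        if ex_parts and parts[: len(ex_parts)] == ex_parts:
            return True
    return False


print(is_excluded("bin/run.sh", ["bin", "docs"]))
print(is_excluded("binary/tool.py", ["bin", "docs"]))
```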

How is the bar chart generated?

As a user, I would like to generate the bar chart. How would I do so?

Add plotting mechanisms/options

Sometimes you may want the tool to export a PNG (or SVG) of a plot, either as part of a pipeline or for ease of use. Users should be able to request this optionally in a simple way.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 650113: invalid continuation byte

Getting this error on my latest run:

Traceback (most recent call last):
  File "/truegitcodechurn/./gitcodechurn.py", line 264, in <module>
    main()
  File "/truegitcodechurn/./gitcodechurn.py", line 94, in main
    [files, contribution, churn] = get_loc(
  File "/truegitcodechurn/./gitcodechurn.py", line 121, in get_loc
    results = get_proc_out(command, dir).splitlines()
  File "/truegitcodechurn/./gitcodechurn.py", line 247, in get_proc_out
    return process.communicate()[0].decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 650113: invalid continuation byte
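The traceback shows the failure in the strict `.decode("utf-8")` call inside `get_proc_out`. A possible workaround, offered here as an assumption rather than the project's actual fix, is to decode with a lenient error handler so that bytes that are not valid UTF-8 do not abort the run. The helper name `decode_output` is hypothetical.

```python
def decode_output(raw: bytes) -> str:
    """Decode process output, replacing undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# 0xd0 starts a two-byte UTF-8 sequence; a following space is not a valid
# continuation byte, which mirrors the error reported above.
print(decode_output(b"abc\xd0 def"))
```

The trade-off is that affected characters are lost in the decoded text, which should not matter if only line counts are read from the output.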

dir argument is not picked up

Hi,

Please assist. I am trying to test the script, but I am getting an error about passing the "dir" argument. Here is how I am calling the script:

python ./gitcodechurn.py after="2021-09-21” before="2021-09-27” author="" dir="/Users/admin/Desktop/jmeter" -exdir="/Users/admin/Desktop/jmeter/bin"
usage: python [/]gitcodechurn.py after="YYYY[-MM[-DD]]" before="YYYY[-MM[-DD]]" author="flacle" dir="[/]path" [-exdir="[*/]path"]
gitcodechurn.py: error: the following arguments are required: dir

I am using Python 3.10.
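One possible explanation, not confirmed in this thread, is that the command above closes the two date values with typographic quotes (`”`) rather than plain `"`. A POSIX shell does not treat `”` as a quote character, so the first plain `"` stays open until the next one. The sketch below uses Python's `shlex.split`, which approximates POSIX shell splitting, to show how the arguments would then be grouped.

```python
import shlex

# The argument part of the command from the report, copied verbatim,
# including the typographic closing quotes after the two dates.
command = (
    'after="2021-09-21” before="2021-09-27” author="" '
    'dir="/Users/admin/Desktop/jmeter" -exdir="/Users/admin/Desktop/jmeter/bin"'
)

tokens = shlex.split(command)
for token in tokens:
    print(repr(token))
```

Under this reading, `after` and `before` collapse into a single token, leaving only three positional values for the four that the usage line lists, which would be consistent with `dir` being reported as missing. Retyping the command with plain `"` characters would be a quick way to check.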

Search by commit instead of dates

Searching by commit would probably be done through optional arguments.

There are two arguments: the start and end commit hashes.
By default the search uses the short hash. Any hash longer than this gets truncated to its short version, and any hash shorter than this produces an error message.
If the timestamps of the start and end commits are not in order (reversed or equal), the order has to be fixed. If the same commit is used for both start and end, the git command should just take one hash.
The optional commit arguments should override the positional date arguments.

Reference: #10
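The rules above can be sketched roughly as follows. The short-hash length of 7, the function names, and passing the commit timestamps in as plain integers are all assumptions made for illustration.

```python
SHORT_HASH_LENGTH = 7  # assumption: a fixed short-hash length


def normalize_hash(commit_hash: str, length: int = SHORT_HASH_LENGTH) -> str:
    """Truncate long hashes to the short form; reject hashes that are too short."""
    if len(commit_hash) < length:
        raise ValueError(f"hash {commit_hash!r} is shorter than {length} characters")
    return commit_hash[:length]


def order_commits(start: str, end: str, start_ts: int, end_ts: int) -> list[str]:
    """Return [older, newer] short hashes, or a single hash if both are the same."""
    start, end = normalize_hash(start), normalize_hash(end)
    if start == end:
        return [start]
    if start_ts > end_ts:
        # Timestamps are reversed, so swap the two commits.
        start, end = end, start
    return [start, end]


print(order_commits("abcdef1234", "1234567890", start_ts=200, end_ts=100))
```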

Improve user experience

True Git Code Churn could use some additional minor enhancements:

change the order of before & after in the documentation and usage descriptions
add more specific usage copy to the README
also print the author, to reduce copy & paste mistakes when outputs are manually copied into a sheet

Churn on particular commit [QUESTION]

First of all, thank you for creating this awesome package. I have one question. If you have time, can you please tell me how you would calculate the churn of a particular commit? Suppose I made 3 commits and need to find the churn for each one separately. I then need to check whether the changes in that particular commit were made within 21 days. How can I make sure the churn is based on a 21-day window for each commit?

Add ability to specify more than one repository

The use case is authors who work on multiple repos. We would probably then output a list in the console per repo.
