flacle / truegitcodechurn
Python script to compute "true" code churn of a Git repository.
License: MIT License
Search by commit would probably be optional arguments.
Reference: #10
As a user, I would like to generate the bar chart. How would I do so?
It would be interesting to be able to output the churn rate for each file in the repository (or a subfolder thereof), to detect potentially fragile code (a high churn rate could indicate more bugs). Would such a feature be possible to add?
Hi @flacle !
First, thank you all for your work. Some colleagues of mine have expressed interest in this script and have requested a few features in line with some of the current issues (which I would also like to contribute ;) ).
Before I work on these features, I wanted to establish a safety net so I don't break anything. As such, I would like to request (and then submit) some unit tests to make sure functionality does not regress.
Hi,
Please assist. I am trying to test the script, but I am getting an error about passing the "dir" argument. Here's how I am calling the script:
python ./gitcodechurn.py after="2021-09-21” before="2021-09-27” author="" dir="/Users/admin/Desktop/jmeter" -exdir="/Users/admin/Desktop/jmeter/bin"
usage: python [/]gitcodechurn.py after="YYYY[-MM[-DD]]" before="YYYY[-MM[-DD]]" author="flacle" dir="[/]path" [-exdir="[*/]path"]
gitcodechurn.py: error: the following arguments are required: dir
I am using Python 3.10.
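One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the closing quotes after the two dates are typographic (curly) quotes rather than straight ones, a POSIX shell does not treat them as quote characters, so neighbouring arguments merge and the script receives too few positional values. The sketch below uses Python's `shlex` module to approximate shell word splitting; `split_like_shell` is a hypothetical helper name, and the `-exdir` option is left out for brevity.

```python
import shlex


def split_like_shell(command_line: str) -> list[str]:
    """Approximate POSIX shell word splitting for an argument string."""
    return shlex.split(command_line)


# Arguments from the report, assuming curly closing quotes (U+201D) after the dates.
curly = (
    'after="2021-09-21\u201d before="2021-09-27\u201d '
    'author="" dir="/Users/admin/Desktop/jmeter"'
)
# The same arguments with straight quotes throughout.
straight = (
    'after="2021-09-21" before="2021-09-27" '
    'author="" dir="/Users/admin/Desktop/jmeter"'
)

curly_args = split_like_shell(curly)
straight_args = split_like_shell(straight)

# With curly quotes, the first two arguments collapse into a single word.
print(len(curly_args), curly_args)
print(len(straight_args), straight_args)
```

If this is what happened, retyping the command with straight quotes should give the script all four positional arguments.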
Use case: authors who work on multiple repos. We would probably then output a list in the console per repo.
Currently you have to manually compile a list of the authors active within a certain timespan and measure churn for each. One use case is a snapshot for comparing time periods, where knowing specifically which contributors were active is irrelevant.
Getting this error on my latest run:
Traceback (most recent call last):
File "/truegitcodechurn/./gitcodechurn.py", line 264, in <module>
main()
File "/truegitcodechurn/./gitcodechurn.py", line 94, in main
[files, contribution, churn] = get_loc(
File "/truegitcodechurn/./gitcodechurn.py", line 121, in get_loc
results = get_proc_out(command, dir).splitlines()
File "/truegitcodechurn/./gitcodechurn.py", line 247, in get_proc_out
return process.communicate()[0].decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 650113: invalid continuation byte
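The traceback shows the failure in the strict `decode("utf-8")` call inside `get_proc_out`. One possible workaround, sketched here under the assumption that losing a few undecodable characters is acceptable for churn counting, is to decode with `errors="replace"`. The helper name `decode_git_output` is hypothetical and is not part of the script.

```python
def decode_git_output(raw: bytes) -> str:
    """Decode subprocess output as UTF-8, substituting U+FFFD for invalid bytes."""
    return raw.decode("utf-8", errors="replace")


# 0xd0 followed by a space is not a valid UTF-8 sequence, as in the reported error.
sample = b"caf\xc3\xa9 \xd0 end"
decoded = decode_git_output(sample)
print(decoded)
```

Replacing rather than ignoring invalid bytes keeps the line structure of the output intact, which matters because the result is later passed to `splitlines()`.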
Sometimes you may want the tool to export a PNG (or SVG) of a plot, either as part of a pipeline or for ease of use. Users should be able to request this optionally, in a simple way.
First of all, thank you for creating this awesome package. I have one question. If you have time, can you please tell me how you would calculate the churn of a particular commit? Suppose I made 3 commits and need to find the churn for each commit separately. I also need to check whether the changes in a particular commit were made within 21 days. How can I make sure the churn is based on 21 days for each commit?
Related to several other issues, it would make analytics easier if the files structure, rather than storing data as it presently does:
{
    "README.md": {
        2: 0,
        8: 0,
        10: 0,
        11: 0,
        ...
        24: 2,
        31: 0,
        33: 1,
        35: 1,
        37: 3,
        41: 12,
    },
    "gitcodechurn.py": {
        0: 0,
        1: 190,
        2: 4,
        4: 0,
        11: -1,
        15: 6,
        16: 5,
        37: 2,
        ...
        167: 1,
        172: 0,
        173: 2,
        189: 1,
        191: 1,
        192: 5,
        193: 0,
        196: 2,
        197: 0,
        198: 25,
        200: 1,
        217: 14,
        223: 1,
        224: 1,
    },
}
instead tracked the count of removed and count of added. This additional data would allow more detailed analytics and nuance to questions regarding user specific churn and other questions.
I propose we instead utilize a structure such as:
{
    "README.md": [
        {"added": 0, "removed": 0, "line_number": 2},
        {"added": 3, "removed": 1, "line_number": 42},
        ...
    ]
}
I would be happy to submit a PR in support of this.
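To illustrate the kind of analytics the proposed structure would allow, here is a minimal sketch that totals added and removed lines per file. The function name `summarize` and the derived `net` field are assumptions made for illustration; only the `added`, `removed`, and `line_number` keys come from the proposal above.

```python
from typing import Dict, List

LineChange = Dict[str, int]


def summarize(files: Dict[str, List[LineChange]]) -> Dict[str, Dict[str, int]]:
    """Total the added and removed counts for each file in the proposed structure."""
    summary = {}
    for name, changes in files.items():
        added = sum(change["added"] for change in changes)
        removed = sum(change["removed"] for change in changes)
        # "net" is a derived value for illustration, not part of the proposal.
        summary[name] = {"added": added, "removed": removed, "net": added - removed}
    return summary


# Sample data taken from the two entries shown in the proposal.
files = {
    "README.md": [
        {"added": 0, "removed": 0, "line_number": 2},
        {"added": 3, "removed": 1, "line_number": 42},
    ],
}
totals = summarize(files)
print(totals)
```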
Sometimes you may need to exclude several subdirectories from your root directory, so we should be able to pass in a list of some sort.
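A list-based exclusion check could look roughly like the sketch below. The function `is_excluded` and its parameters are hypothetical; this is not how the script currently handles `-exdir`. Paths are assumed to be repository-relative with forward slashes, as Git reports them.

```python
from pathlib import PurePosixPath
from typing import Iterable


def is_excluded(file_path: str, excluded_dirs: Iterable[str]) -> bool:
    """Return True if file_path lies inside any of the excluded directories."""
    parts = PurePosixPath(file_path).parts
    for directory in excluded_dirs:
        prefix = PurePosixPath(directory).parts
        # Compare whole path components so "bin" does not match "binary/".
        if prefix and parts[:len(prefix)] == prefix:
            return True
    return False


excluded = ["bin", "docs/generated"]
print(is_excluded("bin/run.sh", excluded))
print(is_excluded("binary/tool.py", excluded))
print(is_excluded("docs/generated/api/index.html", excluded))
```

Comparing path components rather than raw string prefixes avoids accidentally excluding sibling directories that merely share a name prefix.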
Given the recent increase in support (really awesome), it makes sense to add a contribution guideline, as suggested by GitHub.
Currently the script is a simple set of functions. Encapsulating all functionality with OO would allow for better interoperability within existing systems/workflows.
True Git Code Churn can use some additional minor enhancements:
Sometimes project folders or configuration files get checked in, and these may or may not be part of efforts to increase software quality. Users of this tool should be able to specify an optional parameter to exclude specific folders.
Sorry for a naive question. Can you please clarify what exactly contribution and churn are in this output?
contribution: 11000
churn: -900
Does contribution here mean totally new work without any changes to existing code? I mean brand-new code that does not replace any older code?
Does churn here mean only the changes made to the same lines of code?
I think it will be better to consider efficiency and legacy refactor.
The date parser of Git (https://github.com/git/git/blob/master/date.c) provides a different result when month and day values are omitted. Appending default values for each attribute (before, after) will ensure better consistency.
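A minimal sketch of such padding follows, assuming dates arrive as `YYYY`, `YYYY-MM`, or `YYYY-MM-DD` (the formats in the script's usage line). The helper name `pad_date` and the choice of `01` as the default month and day are assumptions for illustration, not the script's confirmed behaviour.

```python
def pad_date(value: str) -> str:
    """Expand YYYY or YYYY-MM to YYYY-MM-DD by appending default components."""
    parts = value.split("-")
    if not 1 <= len(parts) <= 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected YYYY[-MM[-DD]], got {value!r}")
    # Assumed default: first month / first day when a component is omitted.
    while len(parts) < 3:
        parts.append("01")
    return "-".join(parts)


print(pad_date("2021"))
print(pad_date("2021-09"))
print(pad_date("2021-09-21"))
```

Passing a fully specified date to Git removes the dependence on how its parser fills in missing components.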