hinnefe2 / gitrisky
Predict code bug risk with git metadata
License: MIT License
After installing gitrisky and cd'ing into a repo, I can train a model without any errors. The output says: Model trained on 5464 training examples with 0 positive cases
If I then run gitrisky predict
or gitrisky predict -c id
I see the following output:
Traceback (most recent call last):
File "/usr/local/bin/gitrisky", line 9, in <module>
load_entry_point('gitrisky==0.1.2', 'console_scripts', 'gitrisky')()
File "/Library/Python/2.7/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/Library/Python/2.7/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/Library/Python/2.7/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Library/Python/2.7/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Library/Python/2.7/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/Library/Python/2.7/site-packages/gitrisky/cli.py", line 67, in predict
[(_, score)] = model.predict_proba(features)
ValueError: need more than 1 value to unpack
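The traceback looks consistent with the "0 positive cases" training output. A scikit-learn classifier fitted on labels that contain a single class exposes only that class in classes_, and predict_proba returns one column per class, so unpacking a row into two values fails. Below is a minimal sketch of a more defensive lookup. The helper name positive_class_score and its arguments are hypothetical and not part of gitrisky; plain lists stand in for the model's classes_ attribute and one row of predict_proba output.

```python
def positive_class_score(proba_row, classes, positive_label=1):
    """Return the probability of positive_label, or 0.0 if the model never saw it.

    proba_row stands in for one row of predict_proba output and
    classes for the fitted model's classes_ attribute (both assumptions).
    """
    for label, prob in zip(classes, proba_row):
        if label == positive_label:
            return float(prob)
    # The positive class was absent from the training labels.
    return 0.0


# Model trained with both classes present: two probability columns.
two_class_score = positive_class_score([0.8, 0.2], classes=[0, 1])

# Model trained with 0 positive cases: a single column, which is the
# situation where unpacking into two values raises ValueError.
one_class_score = positive_class_score([1.0], classes=[0])

print(two_class_score, one_class_score)
```

Returning 0.0 is only one possible choice here; refusing to predict and reporting that training found no bug-fix commits may be more informative for the user.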
Hello,
Thanks for the nice library.
I am trying to run the training script on my repo, but I get an error starting with 'Fatal: no such path [ ]'.
Training then finishes with 0 positive cases, which leads to a feature extraction error during prediction.
When I run predict with the trained model, it returns the error 'feature extraction error', and it seems that all the labels are set to 0.
This does not happen every time: for some repos it works well, but for others it doesn't.
Is there any reason for this?
Thanks,
For larger commit histories it would be nice to have more detailed logging about the steps in model training (e.g. 'building features', 'calculating labels', 'training model'). We should probably be using the logging module anyway. Or maybe tqdm progress bars?
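A rough sketch of what step-level logging could look like with the standard logging module. The logger name, the function train_with_logging, and the three callables are placeholders for illustration only; they are not gitrisky's actual functions.

```python
import logging

# Hypothetical logger name; gitrisky may choose a different one.
logger = logging.getLogger("gitrisky.train")


def train_with_logging(build_features, build_labels, fit_model):
    """Run the three training steps, logging a message before each one."""
    logger.info("building features")
    features = build_features()

    logger.info("calculating labels")
    labels = build_labels()

    logger.info("training model on %d examples", len(features))
    return fit_model(features, labels)


# Configure output once, at the CLI entry point rather than in library code.
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

# Toy stand-ins for the real steps, just to show the call shape.
result = train_with_logging(
    build_features=lambda: [[1], [2]],
    build_labels=lambda: [0, 1],
    fit_model=lambda X, y: (len(X), sum(y)),
)
```

Using a named logger instead of print lets users silence or redirect the messages, and a progress bar could still be added around the slow steps later.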
Python 2.7.14
$ gitrisky train
Traceback (most recent call last):
File "/usr/local/bin/gitrisky", line 11, in <module>
load_entry_point('gitrisky==0.1.0rc0', 'console_scripts', 'gitrisky')()
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/gitrisky/cli.py", line 29, in train
model.fit(features, labels.label)
File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/forest.py", line 247, in fit
X = check_array(X, accept_sparse="csc", dtype=DTYPE)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 433, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: MERGE
I enjoyed your talk on how git metadata can be used to detect bugs in code. Could a similar approach be applied pre-emptively to detect risk factors or vulnerabilities in software?
Hi Henry,
In _get_commit_lines() in gitcmds.py, should it be match = re.match('@@ -(.) +(.) @@', header).group(2) instead of group(1), if you want to get the part prefixed by '+'?
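To make the question concrete: in a unified diff hunk header of the form @@ -a,b +c,d @@, the part after '-' describes the old file and the part after '+' describes the new file. The pattern and sample header below are assumptions chosen for illustration, so the actual regex in gitcmds.py may differ.

```python
import re

# Assumed pattern for a unified diff hunk header; not copied from gitcmds.py.
HUNK_HEADER = re.compile(r'@@ -(\S+) \+(\S+) @@')

# Made-up sample header.
header = '@@ -12,3 +15,4 @@ def foo():'
match = HUNK_HEADER.match(header)

old_range = match.group(1)  # range in the old file, prefixed by '-'
new_range = match.group(2)  # range in the new file, prefixed by '+'

print(old_range, new_range)  # prints "12,3 15,4"
```

Which group is the right one depends on whether the lines are meant to be looked up in the parent commit (old file) or in the commit itself (new file).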
Right now we don't use any categorical string features, e.g. the commit tag (REF, BUG, TST, etc.) or author name. This is because these features would need to be one-hot encoded before being passed to a sklearn estimator, and doing that consistently is annoying.
One way to implement this would be to have the model be a sklearn Pipeline with a DictVectorizer step, or maybe the OneHotEncoder that's coming in v0.20.
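To illustrate what "consistently" means here, below is a small hand-rolled sketch in plain Python. It is a stand-in for the role DictVectorizer would play in a Pipeline, not sklearn code: the column layout is learned once at training time and reused at prediction time, so both matrices line up. The class name SimpleOneHot and the author names are made up for the example.

```python
class SimpleOneHot:
    """Hand-rolled one-hot encoder for dict rows (illustration only)."""

    def fit(self, rows):
        # Learn one output column per (feature, value) pair seen in training.
        pairs = sorted({(key, value) for row in rows for key, value in row.items()})
        self.columns_ = {pair: index for index, pair in enumerate(pairs)}
        return self

    def transform(self, rows):
        encoded = []
        for row in rows:
            vector = [0.0] * len(self.columns_)
            for pair in row.items():
                index = self.columns_.get(pair)
                if index is not None:  # values unseen at fit time are ignored
                    vector[index] = 1.0
            encoded.append(vector)
        return encoded


train_rows = [
    {"tag": "BUG", "author": "alice"},
    {"tag": "TST", "author": "bob"},
]
encoder = SimpleOneHot().fit(train_rows)
train_matrix = encoder.transform(train_rows)

# At prediction time the same encoder is reused, so the columns stay aligned
# even though the tag "REF" never appeared in training.
new_matrix = encoder.transform([{"tag": "REF", "author": "alice"}])
```

Bundling the encoder and the estimator into a single Pipeline object would mean the learned column layout gets saved and loaded together with the model.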
Right now the feature generation all happens in one (kind of gross) function in parsing.py. We should refactor this to