hinnefe2 / gitrisky

42.0 42.0 14.0 54 KB

Predict code bug risk with git metadata

License: MIT License

Python 100.00%

gitrisky's People

Contributors

Stargazers

Watchers

Forkers

deenaik nunofernandes-plight jacsonrbinf jbravo harrybaa cxm0000 samir-kulkarni jg8481 imrohu peijiexie arthur-befumo sohanlal1234 mhyeonsoo louison400

gitrisky's Issues

Running gitrisky predict throws error ValueError: need more than 1 value to unpack

After installing gitrisky and cd'ing into a repo, I can train a model without any errors. The output says: Model trained on 5464 training examples with 0 positive cases

If I then run gitrisky predict or gitrisky predict -c id, I see the following output:

Traceback (most recent call last):
  File "/usr/local/bin/gitrisky", line 9, in <module>
    load_entry_point('gitrisky==0.1.2', 'console_scripts', 'gitrisky')()
  File "/Library/Python/2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Library/Python/2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Library/Python/2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Library/Python/2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/gitrisky/cli.py", line 67, in predict
    [(_, score)] = model.predict_proba(features)
ValueError: need more than 1 value to unpack
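
One plausible reading of this traceback, given the "0 positive cases" message above: scikit-learn's predict_proba returns one column per class seen during training, so a model trained on a single class yields one-element rows and the two-name unpacking on line 67 of cli.py fails. A minimal sketch of a guard, assuming that is the cause; positive_class_score is a hypothetical helper, not part of gitrisky:

```python
def positive_class_score(classes, proba_row, positive_label=1):
    """Return the probability assigned to positive_label.

    classes:   class labels in column order (e.g. model.classes_)
    proba_row: one row of predict_proba output
    """
    classes = list(classes)
    if positive_label not in classes:
        # The model never saw a positive example during training,
        # so it has no column for that class at all.
        return 0.0
    return float(proba_row[classes.index(positive_label)])


# Two-class model: columns are [P(label 0), P(label 1)].
print(positive_class_score([0, 1], [0.8, 0.2]))  # 0.2
# Single-class model (0 positive cases): only one column exists.
print(positive_class_score([0], [1.0]))  # 0.0
```

Returning 0.0 hides the underlying problem, so a real fix would probably also warn at training time when no positive cases are found.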

Fatal: no such path error within git blame subprocess

Hello,

Thanks for the nice library.
I am trying to run the training script on my repo, but I get an error starting with 'Fatal: no such path [ ]'.
Training then finishes with 0 positive cases, which leads to a feature extraction error during prediction.
When I run predict with the trained model, it returns 'feature extraction error', and it seems that all the labels are set to 0.

This does not happen every time: it works well for some repos but not for others.
Is there any reason for this?

Thanks,
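
The cause is not established in this thread. One common way to get "fatal: no such path" from git blame is to ask for a file at a revision where it does not exist, for example because it was added, renamed, or deleted around that commit. A hedged sketch of a pre-check under that assumption; path_exists_at and blameable_paths are hypothetical helpers, not gitrisky functions:

```python
import subprocess


def path_exists_at(rev, path, run=subprocess.run):
    """Return True if `path` exists in the tree of commit `rev`.

    `git cat-file -e <rev>:<path>` exits with status 0 only when the
    object exists, so no output needs to be parsed.
    """
    result = run(
        ["git", "cat-file", "-e", "{}:{}".format(rev, path)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def blameable_paths(rev, paths, run=subprocess.run):
    """Keep only the paths that git blame could resolve at `rev`."""
    return [p for p in paths if path_exists_at(rev, p, run=run)]
```

The `run` parameter exists only so the check can be exercised without a real repository; in normal use the default subprocess.run is called from inside the repo.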

Add verbose option to generate more logging

For larger commit histories it would be nice to have more detailed logging about the steps in model training (e.g. 'building features', 'calculating labels', 'training model'). We should probably be using the logging module anyway. Or maybe tqdm progress bars?
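
A minimal sketch of what this could look like with the standard logging module. The logger name, the verbose parameter, and the placeholder train function are assumptions for illustration, not the gitrisky CLI:

```python
import logging

logger = logging.getLogger("gitrisky_sketch")


def configure_logging(verbose=False):
    """Log to stderr; show INFO and above only when verbose is set."""
    level = logging.INFO if verbose else logging.WARNING
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.handlers = [handler]
    logger.setLevel(level)
    return logger


def train(verbose=False):
    """Placeholder training flow that only reports its stages."""
    log = configure_logging(verbose)
    for stage in ("building features", "calculating labels", "training model"):
        log.info(stage)
        # ... the real work for each stage would go here ...
    return log


train(verbose=True)
```

A progress bar would suit the long per-commit loops better than log lines; the two approaches are not mutually exclusive.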

App crashes at launch

Python 2.7.14

$ gitrisky train
Traceback (most recent call last):
  File "/usr/local/bin/gitrisky", line 11, in <module>
    load_entry_point('gitrisky==0.1.0rc0', 'console_scripts', 'gitrisky')()
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/gitrisky/cli.py", line 29, in train
    model.fit(features, labels.label)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/forest.py", line 247, in fit
    X = check_array(X, accept_sparse="csc", dtype=DTYPE)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 433, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: MERGE
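
The last frame suggests that a string value ('MERGE') reached the estimator inside the feature matrix. A sketch of one way to separate numeric columns from string columns before fitting, assuming the features are available as one dict per commit; the feature names below are made up for illustration:

```python
from numbers import Number


def split_numeric_columns(rows):
    """Split feature names into numeric and non-numeric.

    rows: list of dicts mapping feature name -> value, one dict per commit.
    A column counts as numeric only if every value in it is a number.
    """
    numeric, other = [], []
    for name in sorted(rows[0]):
        values = [row[name] for row in rows]
        if all(isinstance(v, Number) for v in values):
            numeric.append(name)
        else:
            other.append(name)
    return numeric, other


rows = [
    {"lines_added": 12, "files_changed": 2, "tag": "MERGE"},
    {"lines_added": 3, "files_changed": 1, "tag": "BUG"},
]
numeric, other = split_numeric_columns(rows)
print(numeric)  # ['files_changed', 'lines_added']
print(other)    # ['tag']
```

Only the numeric columns would then be passed to model.fit; the string columns need encoding first.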

Can git metadata be used to detect risk factors or vulnerabilities in the code?

I enjoyed your talk on how git metadata can be used to detect bugs in code. Could a similar approach be applied pre-emptively to detect risk factors or vulnerabilities in software?

Group the git diff header

Hi Henry,

In _get_commit_lines() in gitcmds.py, should it be match = re.match('@@ -(.) +(.) @@', header).group(2) instead of group(1), if you want to get the part prefixed by '+'?
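
To make the question concrete, here is how the two groups of a hunk-header match differ. The pattern below is written for illustration and is not necessarily the one used in gitcmds.py:

```python
import re

# Illustrative pattern: group 1 captures the '-' side, group 2 the '+' side.
HUNK_HEADER = re.compile(r"@@ -(\S+) \+(\S+) @@")

header = "@@ -10,7 +12,9 @@ def train():"
match = HUNK_HEADER.match(header)

old_range = match.group(1)  # start,count in the parent revision
new_range = match.group(2)  # start,count in the commit itself
print(old_range, new_range)  # 10,7 12,9
```

Which group is appropriate depends on whether the caller needs line numbers in the parent revision or in the commit being inspected.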

Add one-hot encoded string features

Right now we don't use any categorical string features, e.g. the commit tag (REF, BUG, TST, etc.) or the author name. This is because these features would need to be one-hot encoded before being passed to a sklearn estimator, and doing that consistently between training and prediction is annoying.

One way to implement this would be to have the model be a sklearn Pipeline with a DictVectorizer step, or maybe the OneHotEncoder that's coming in v0.20.
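
The consistency problem can be sketched in plain Python: learn the set of (feature, value) columns once at training time and reuse it at prediction time. This is a stand-in for what a DictVectorizer step in a Pipeline would provide, not a proposal to replace it; OneHotVocabulary and the author names are hypothetical:

```python
class OneHotVocabulary:
    """Remember the (feature, value) pairs seen at fit time and reuse them."""

    def fit(self, rows):
        pairs = {(name, value) for row in rows for name, value in row.items()}
        self.columns_ = sorted(pairs)
        return self

    def transform(self, rows):
        # Values not seen during fit map to zeros, so the output width
        # never changes between training and prediction.
        return [
            [1 if row.get(name) == value else 0 for name, value in self.columns_]
            for row in rows
        ]


train_rows = [{"tag": "BUG", "author": "alice"}, {"tag": "TST", "author": "bob"}]
encoder = OneHotVocabulary().fit(train_rows)
print(encoder.columns_)
# [('author', 'alice'), ('author', 'bob'), ('tag', 'BUG'), ('tag', 'TST')]
print(encoder.transform([{"tag": "REF", "author": "alice"}]))
# [[1, 0, 0, 0]]
```

Bundling the encoder with the estimator in a single saved object is what keeps the columns aligned when predict runs in a later process.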

Refactor feature generation

Right now the feature generation all happens in one (kind of gross) function in parsing.py. We should refactor this to:

- make the code less gross (potentially using the gitpython library we use elsewhere)
- make this more extensible so that it's easy to add new features.
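
One possible shape for the extensible part is a registry of small feature functions. This is a sketch under assumptions: the commit record layout and the feature names are invented for illustration and do not reflect parsing.py:

```python
FEATURES = {}


def feature(name):
    """Register a function that computes one feature from a commit record."""
    def register(func):
        FEATURES[name] = func
        return func
    return register


@feature("lines_added")
def lines_added(commit):
    return sum(f["added"] for f in commit["files"])


@feature("files_changed")
def files_changed(commit):
    return len(commit["files"])


def build_features(commit):
    """Apply every registered feature function to one commit."""
    return {name: func(commit) for name, func in FEATURES.items()}


commit = {"files": [{"added": 10, "deleted": 2}, {"added": 3, "deleted": 0}]}
print(build_features(commit))  # {'lines_added': 13, 'files_changed': 2}
```

Adding a feature then means writing one decorated function, with no change to the code that builds the feature matrix.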
