gitrisky's People

Contributors

hinnefe2

gitrisky's Issues

Running gitrisky predict throws error ValueError: need more than 1 value to unpack

After installing gitrisky and cd'ing into a repo, I can train a model without any errors. The output says: Model trained on 5464 training examples with 0 positive cases

If I then run gitrisky predict or gitrisky predict -c id, I see the following output:

Traceback (most recent call last):
  File "/usr/local/bin/gitrisky", line 9, in <module>
    load_entry_point('gitrisky==0.1.2', 'console_scripts', 'gitrisky')()
  File "/Library/Python/2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Library/Python/2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Library/Python/2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Library/Python/2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/gitrisky/cli.py", line 67, in predict
    [(_, score)] = model.predict_proba(features)
ValueError: need more than 1 value to unpack
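
One plausible reading of this traceback, given that training reported 0 positive cases: scikit-learn classifiers return one probability column per class seen during fitting, so a model that only ever saw the negative class yields a single-element row, and the two-value unpacking in cli.py fails. The sketch below is not gitrisky's code; positive_class_probability and its arguments are hypothetical names, and it works on plain lists so that it runs without scikit-learn.

```python
def positive_class_probability(proba_row, classes, positive_label=1):
    """Return the probability of positive_label, or 0.0 if the model never saw it.

    proba_row: one row of a predict_proba-style output.
    classes:   class labels in column order (like a classifier's classes_).
    """
    if len(proba_row) != len(classes):
        raise ValueError("proba_row and classes must have the same length")
    for label, prob in zip(classes, proba_row):
        if label == positive_label:
            return prob
    # The positive class was absent from the training data.
    return 0.0


# A model trained on both classes has two columns.
two_class = positive_class_probability([0.8, 0.2], classes=[0, 1])

# A model trained with 0 positive cases has a single column.
one_class = positive_class_probability([1.0], classes=[0])
```

Looking the column up by label avoids assuming that the output always has exactly two columns. Whether returning 0.0 or refusing to predict is the better behaviour for a single-class model is a design choice left open here.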

Fatal: no such path error within git blame subprocess

Hello,

Thanks for the nice library.
I am trying to run the training script on my repo, but I get an error starting with 'Fatal: no such path [ ]'.
Training then finishes with 0 positive cases.
When I run predict with the trained model, it returns the error 'feature extraction error', and it seems that all the labels are set to 0.

I found that this does not happen every time: it works well for some repos but not for others.
Is there any reason for this?

Thanks,

Add verbose option to generate more logging

For larger commit histories it would be nice to have more detailed logging about the steps in model training (e.g. 'building features', 'calculating labels', 'training model'). We should probably be using the logging module anyway. Or maybe tqdm progress bars?
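
A minimal sketch of what a verbose switch built on the standard logging module might look like. The logger name, configure_logging, and the train stub are all hypothetical; the step names are the ones suggested above, and wiring the flag into the real CLI is left out.

```python
import logging

# Hypothetical logger name, not taken from the gitrisky source.
logger = logging.getLogger("gitrisky_sketch")


def configure_logging(verbose=False):
    """Show INFO messages when verbose is set, otherwise only warnings."""
    level = logging.INFO if verbose else logging.WARNING
    logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
    logger.setLevel(level)
    return level


def train(verbose=False):
    """Stub training routine that only logs the steps it would perform."""
    configure_logging(verbose)
    steps = ["building features", "calculating labels", "training model"]
    for step in steps:
        logger.info(step)
    return steps
```

Calling train(verbose=True) would emit one timestamped line per step, while train() stays quiet.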

App crashes at launch

Python 2.7.14

$ gitrisky train
Traceback (most recent call last):
  File "/usr/local/bin/gitrisky", line 11, in <module>
    load_entry_point('gitrisky==0.1.0rc0', 'console_scripts', 'gitrisky')()
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/gitrisky/cli.py", line 29, in train
    model.fit(features, labels.label)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/forest.py", line 247, in fit
    X = check_array(X, accept_sparse="csc", dtype=DTYPE)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 433, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: MERGE
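
The final error suggests that a string value ('MERGE') reached the estimator inside the feature matrix. As a rough illustration only, the check below finds non-numeric columns before fitting. The row layout and the column names (lines_added, lines_deleted, tag) are assumptions made for the example, not gitrisky's actual feature names.

```python
from numbers import Number


def non_numeric_columns(rows):
    """Return the sorted names of columns that hold any non-numeric value."""
    bad = set()
    for row in rows:
        for name, value in row.items():
            # bool is a Number subclass, so True/False count as numeric here.
            if not isinstance(value, Number):
                bad.add(name)
    return sorted(bad)


# Hypothetical feature rows, one dict per commit.
rows = [
    {"lines_added": 10, "lines_deleted": 2, "tag": "MERGE"},
    {"lines_added": 3, "lines_deleted": 0, "tag": "BUG"},
]
offending = non_numeric_columns(rows)
```

Columns reported this way would need to be dropped or encoded before being passed to model.fit.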

Group the git diff header

Hi Henry,

In _get_commit_lines() in gitcmds.py, should it be match = re.match('@@ -(.) +(.) @@', header).group(2) instead of group(1), if you want to get the part prefixed by '+'?
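
To make the group numbering concrete, here is a standalone sketch that parses a unified-diff hunk header. The pattern is written for this example and is not the one in gitcmds.py; parse_hunk_header is a hypothetical helper. In a header of the form @@ -a,b +c,d @@, the '-' part describes the old file and the '+' part the new file.

```python
import re

# Illustrative pattern: start line and optional line count for each side.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(header):
    """Return (start, count) for the old ('-') and new ('+') side of a hunk."""
    match = HUNK_HEADER.match(header)
    if match is None:
        raise ValueError("not a hunk header: %r" % header)
    old_start, old_count, new_start, new_count = match.groups()
    # When the count is omitted, the unified diff format implies 1.
    return {
        "old": (int(old_start), int(old_count) if old_count is not None else 1),
        "new": (int(new_start), int(new_count) if new_count is not None else 1),
    }


parsed = parse_hunk_header("@@ -12,5 +14,7 @@ def foo():")
```

With two capture groups per side, the '-' values come first and the '+' values second, which is the ordering the question above is about.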

Add one-hot encoded string features

Right now we don't use any categorical string features, e.g. the commit tag (REF, BUG, TST, etc.) or the author name. This is because these features would need to be one-hot encoded before being passed to a sklearn estimator, and doing that consistently is annoying.

One way to implement this would be to have the model be a sklearn Pipeline with a DictVectorizer step, or maybe the OneHotEncoder that's coming in v0.20.
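
The consistency problem can be illustrated without scikit-learn. The hand-rolled sketch below is a stand-in for what a DictVectorizer step would do, not a use of it: the column vocabulary is fixed at training time and reused at prediction time, so unseen values simply map to all zeros. The function names and sample values are hypothetical.

```python
def fit_vocabulary(rows, string_keys):
    """Collect the sorted 'key=value' column names seen during training."""
    columns = set()
    for row in rows:
        for key in string_keys:
            columns.add("%s=%s" % (key, row[key]))
    return sorted(columns)


def one_hot(row, string_keys, vocabulary):
    """Encode one row against a fixed vocabulary; unseen values become zeros."""
    active = {"%s=%s" % (key, row[key]) for key in string_keys}
    return [1 if column in active else 0 for column in vocabulary]


# Hypothetical training rows with a commit tag and an author name.
train_rows = [
    {"tag": "BUG", "author": "alice"},
    {"tag": "TST", "author": "bob"},
]
keys = ["tag", "author"]
vocab = fit_vocabulary(train_rows, keys)

# 'carol' was never seen in training, so no author column is set.
encoded = one_hot({"tag": "BUG", "author": "carol"}, keys, vocab)
```

Keeping the vocabulary inside the fitted model object, as a Pipeline would, is what guarantees that train and predict produce the same columns in the same order.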

Refactor feature generation

Right now the feature generation all happens in one (kind of gross) function in parsing.py. We should refactor this to

  1. make the code less gross (potentially using the gitpython library we use elsewhere)
  2. make this more extensible so that it's easy to add new features.
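
One possible shape for the second point, sketched under assumptions: each feature is a small function registered under a name, so adding a feature means adding one function. The registry, the decorator, and the commit dict layout are all hypothetical and do not reflect the current parsing.py.

```python
# Hypothetical registry mapping feature names to the functions computing them.
FEATURES = {}


def feature(name):
    """Decorator that registers a feature function under the given name."""
    def register(func):
        FEATURES[name] = func
        return func
    return register


@feature("lines_changed")
def lines_changed(commit):
    return commit["lines_added"] + commit["lines_deleted"]


@feature("files_touched")
def files_touched(commit):
    return len(commit["files"])


def build_features(commit):
    """Compute every registered feature for one commit."""
    return {name: func(commit) for name, func in FEATURES.items()}


# Hypothetical commit record; the real data would come from git.
example = build_features(
    {"lines_added": 10, "lines_deleted": 2, "files": ["a.py", "b.py"]}
)
```

Each feature function can then be unit tested in isolation, which is harder when everything lives in one large function.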
