laurawartschinski / vulnerabilitydetection

vulnerability detection in python source code with LSTM networks

TeX 53.73% Python 46.27%
deep-learning lstm machine-learning python thesis vulnerability-detection word2vec word2vec-model

vulnerabilitydetection's Issues

About dataset.

Hey, good afternoon. The dataset file names end in X or Y. What is the difference between the X and Y files, and where is each one used?

The output of example is different

Hello, thank you for your work.
I tried to reproduce your work, but something went wrong.
First, I wanted to use your w2v model, word2vec_withString10-100-200, but loading it requires a .npy file, as another issue mentions, so I had to train the model myself.
When I run model.wv.most_similar("if"), the output is a little different from yours:
image
Then I tried to run one of the examples the way you do, but the output is very different from yours:
image

I really don't know why. Could you please help me? I would really appreciate it.

About labeling

Hey, your master's thesis says in section "5.2 Processing the data" that "A small focus window traverses through the whole source code in steps of length n". I wonder what "steps of length n" means. Do n and m refer to a number of tokens, a number of lines of code, or some other unit?
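
For readers with the same question, here is a minimal sketch of one possible reading of the quoted sentence. It is deliberately unit-agnostic: the sequence can be a list of tokens or a string of characters, since the issue does not settle which one the thesis means. The function name `focus_windows` and the choice to centre a context of about m units on the focus window are illustrative assumptions, not taken from the repository.

```python
from typing import Iterator, Sequence, Tuple


def focus_windows(seq: Sequence, n: int, m: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (focus_start, focus_end, context_start, context_end) index tuples.

    The focus window advances in steps of n units. The context is a span of
    roughly m units centred on the focus window, clipped to the sequence bounds.
    """
    if n <= 0 or m < n:
        raise ValueError("need n > 0 and m >= n")
    pad = (m - n) // 2  # extra context on each side of the focus window
    for start in range(0, len(seq), n):
        end = min(start + n, len(seq))
        yield start, end, max(0, start - pad), min(len(seq), end + pad)


# Hypothetical input: whitespace-separated tokens of a one-line function.
tokens = "def f ( x ) : return x + 1".split()
for fs, fe, cs, ce in focus_windows(tokens, n=2, m=6):
    print(tokens[fs:fe], "in context", tokens[cs:ce])
```

Passing a string instead of a token list gives the character-based reading with the same code.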

Common samples within training and test set

Hi,
As described in the paper, for each vulnerable token you generate several samples (perhaps 200) with a moving window. My question is: do you prevent samples derived from the same token from appearing in both the training and the test set? Because the two sets would share many tokens, such overlap would lead to overly optimistic performance.
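
A common way to rule out this kind of overlap is to split by source unit (for example by file or commit) rather than by individual sample, so that all windows derived from the same unit land on one side. The sketch below illustrates the idea with the standard library only. The `(group_id, features, label)` sample layout and the name `group_split` are hypothetical, and nothing here describes how the repository actually splits its data.

```python
import random
from typing import Hashable, List, Sequence, Tuple

Sample = Tuple[Hashable, object, int]  # (group_id, features, label): hypothetical layout


def group_split(samples: Sequence[Sample], test_fraction: float = 0.2,
                seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Split so that every group_id appears in exactly one of train/test."""
    groups = sorted({s[0] for s in samples}, key=repr)
    random.Random(seed).shuffle(groups)  # reproducible shuffle of the groups
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [s for s in samples if s[0] not in test_groups]
    test = [s for s in samples if s[0] in test_groups]
    return train, test


# Hypothetical data: 50 windows drawn from 5 source files.
samples = [(f"file{i % 5}", [i], i % 2) for i in range(50)]
train, test = group_split(samples)
print(len(train), len(test))
```

The fraction applies to groups, not samples, so the resulting set sizes depend on how many windows each group contributes.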

Vulnerability datasets

Where can the vulnerability datasets in the data folder be downloaded? I need these datasets.

requirements.txt

I haven't found the environment needed to run the example. Could you share a requirements.txt?

About w2v_pythoncorpus.py

Hey, I am an undergraduate from Harbin Institute of Technology, China, and I am very interested in your research. But I have some trouble running the file w2v_pythoncorpus.py. When I enter 'python3 w2v_pythoncorpus.py' in CMD, it shows some errors. Could you help me with it? Thank you very much!

Error while testing model

I get this error on this line: yhat_classes = model.predict_classes(X_finaltest, verbose=0)
.
.
.
inferred_from[input_arg.type_attr]))

TypeError: Input 'b' of 'MatMul' Op has type float32 that does not match type int32 of argument 'a'.

I encounter the same error while trying to run the example demos as well. Could you tell me your TensorFlow, Keras, and NumPy versions, please?
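
A type mismatch of this shape often means the input array reaches the first layer as integers while the layer weights are float32. Below is a minimal sketch of two possible workarounds, assuming X_finaltest is a NumPy array and the model ends in a single sigmoid unit; the issue states neither, so both are assumptions. Since `predict_classes` is no longer available in newer TensorFlow releases, the second helper thresholds the probabilities returned by `model.predict` instead. Both helper names are hypothetical.

```python
import numpy as np


def as_float32(x):
    """Cast model input to float32 so it matches float32 layer weights."""
    return np.asarray(x, dtype=np.float32)


def classes_from_probabilities(probabilities, threshold=0.5):
    """Turn sigmoid outputs into 0/1 labels by thresholding."""
    return (np.asarray(probabilities) > threshold).astype("int32")


# Intended use (not executed here; `model` would be the trained Keras model):
#   probabilities = model.predict(as_float32(X_finaltest), verbose=0)
#   yhat_classes = classes_from_probabilities(probabilities)

X_int = np.array([[1, 2], [3, 4]], dtype=np.int32)  # stand-in for integer input
print(as_float32(X_int).dtype)
print(classes_from_probabilities([[0.1], [0.9]]).ravel())
```

If the model instead ends in a softmax layer, taking `np.argmax(probabilities, axis=-1)` would be the corresponding replacement.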

Cannot load pretrained Word2vec model

Please help. When I load the Word2Vec model, I run into a problem that I cannot solve.

The code to run is as follows:

from gensim.models import Word2Vec
w2v_model = Word2Vec.load('E:/projectlzy/data/word2vec_withString10-100-200.model')

FileNotFoundError Traceback (most recent call last)
Input In [1], in <cell line: 2>()
1 from gensim.models import Word2Vec
----> 2 w2v_model = Word2Vec.load('E:/projectlzy/data/word2vec_withString10-100-200.model')

File D:\Anaconda\envs\mykeras\lib\site-packages\gensim\models\word2vec.py:1141, in Word2Vec.load(cls, *args, **kwargs)
1122 """Load a previously saved :class:~gensim.models.word2vec.Word2Vec model.
1123
1124 See Also
(...)
1138
1139 """
1140 try:
-> 1141 model = super(Word2Vec, cls).load(*args, **kwargs)
1143 # for backward compatibility for max_final_vocab feature
1144 if not hasattr(model, 'max_final_vocab'):

File D:\Anaconda\envs\mykeras\lib\site-packages\gensim\models\base_any2vec.py:1230, in BaseWordEmbeddingsModel.load(cls, *args, **kwargs)
1199 @classmethod
1200 def load(cls, *args, **kwargs):
1201 """Load a previously saved object (using :meth:~gensim.models.base_any2vec.BaseWordEmbeddingsModel.save) from file.
1202
1203 Also initializes extra instance attributes in case the loaded model does not include them.
(...)
1228
1229 """
-> 1230 model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
1231 if not hasattr(model, 'ns_exponent'):
1232 model.ns_exponent = 0.75

File D:\Anaconda\envs\mykeras\lib\site-packages\gensim\models\base_any2vec.py:602, in BaseAny2VecModel.load(cls, fname_or_handle, **kwargs)
575 @classmethod
576 def load(cls, fname_or_handle, **kwargs):
577 """Load a previously saved object (using :meth:gensim.models.base_any2vec.BaseAny2VecModel.save) from a file.
578
579 Parameters
(...)
600
601 """
--> 602 return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)

File D:\Anaconda\envs\mykeras\lib\site-packages\gensim\utils.py:436, in SaveLoad.load(cls, fname, mmap)
433 compress, subname = SaveLoad._adapt_by_suffix(fname)
435 obj = unpickle(fname)
--> 436 obj._load_specials(fname, mmap, compress, subname)
437 logger.info("loaded %s", fname)
438 return obj

File D:\Anaconda\envs\mykeras\lib\site-packages\gensim\utils.py:467, in SaveLoad._load_specials(self, fname, mmap, compress, subname)
465 logger.info("loading %s recursively from %s.* with mmap=%s", attrib, cfname, mmap)
466 with ignore_deprecation_warning():
--> 467 getattr(self, attrib)._load_specials(cfname, mmap, compress, subname)
469 for attrib in getattr(self, '__numpys', []):
470 logger.info("loading %s from %s with mmap=%s", attrib, subname(fname, attrib), mmap)

File D:\Anaconda\envs\mykeras\lib\site-packages\gensim\utils.py:478, in SaveLoad._load_specials(self, fname, mmap, compress, subname)
476 val = np.load(subname(fname, attrib))['val']
477 else:
--> 478 val = np.load(subname(fname, attrib), mmap_mode=mmap)
480 with ignore_deprecation_warning():
481 setattr(self, attrib, val)

File D:\Anaconda\envs\mykeras\lib\site-packages\numpy\lib\npyio.py:390, in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
388 own_fid = False
389 else:
--> 390 fid = stack.enter_context(open(os_fspath(file), "rb"))
391 own_fid = True
393 # Code to distinguish from NumPy binary files and pickles.

FileNotFoundError: [Errno 2] No such file or directory: 'E:/projectlzy/data/word2vec_withString10-100-200.model.wv.vectors.npy'
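
The last frame shows gensim trying to open a second file whose name is the model path plus a `.wv.vectors.npy` suffix, which suggests the large arrays were saved next to the `.model` file and must sit in the same directory when loading. The sketch below checks for such companion files before calling `Word2Vec.load`. It uses only the standard library; the helper names are hypothetical, and the `.wv.vectors.npy` suffix is the only one taken from the traceback.

```python
import glob
import tempfile
from pathlib import Path
from typing import List


def companion_files(model_path: str) -> List[Path]:
    """Return the .npy files stored alongside a saved model file."""
    path = Path(model_path)
    pattern = glob.escape(path.name) + ".*.npy"
    return sorted(path.parent.glob(pattern))


def has_vectors_file(model_path: str) -> bool:
    """Check for the <model>.wv.vectors.npy file named in the traceback."""
    return Path(model_path + ".wv.vectors.npy").is_file()


# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    model = str(Path(tmp) / "demo.model")
    Path(model).write_bytes(b"")
    print(has_vectors_file(model), companion_files(model))
    Path(model + ".wv.vectors.npy").write_bytes(b"")
    print(has_vectors_file(model), [p.name for p in companion_files(model)])
```

If the check fails for a downloaded model, the companion file was probably not distributed with it, and copying or regenerating it is the only remedy.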

about w2v_pythoncorpus.py

Hello, I'm very interested in your project, but I'm a beginner. When I run w2v_pythoncorpus.py, the data at "https://github.com/numpy/numpy" is not available. As a beginner, I don't know how to run this project from start to finish. It would be great if you could make a video tutorial. Thank you very much.

pydriller version

Hello, when I run w2v_pythoncorpus.py, it shows that my pydriller version is inappropriate. What is your pydriller version? Thank you very much.
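
When asking about or reporting versions, it helps to list what is actually installed. Here is a small sketch using the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical, and the package list is a guess at what the project uses, based on the libraries named in these issues.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Assumed package list; adjust to the project's actual dependencies.
for name in ("pydriller", "gensim", "tensorflow", "keras", "numpy"):
    print(name, installed_version(name) or "not installed")
```

Pasting this output into an issue makes version-related errors much easier to diagnose.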

About highlighting the vulnerable code snippets

The highlighting of defective code snippets is really great! But I don't fully understand how it works. Could you please share the ideas behind the implementation?
Thank you very much! :)
