jeffsvajlenko / bigcloneeval


BigCloneEval - A Clone Detection Tool Evaluation Framework for BigCloneBench

License: GNU General Public License v2.0

Shell 0.83% Batchfile 0.23% Java 98.79% Makefile 0.14%


bigcloneeval's People

Contributors

exkazuu, jeffsvajlenko, qw3ry, simonbaars, t45k


bigcloneeval's Issues

each function has same number of clones

I queried the clones table and the functions table with the following queries to obtain the clones for each functionality.
sql="SELECT function_id_one, function_id_two from clones where functionality_id=8";
sql="SELECT id, name, startline, endline from functions WHERE id = " + function_id_one;
sql="SELECT id, name, startline, endline from functions WHERE id = " + function_id_two;
I observed that every function has the same set of clone files. For example, functionality 8 has a total of 276 clones in the clones table, yet only 23 functions make up all of these clone pairs. Of these 23 functions, one is the query function and the other 22 are clones of that query document/function. How can all functions have an equal number of clones? What am I doing wrong?
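One way to check how the pairs are distributed is to count how often each function id appears across both columns of the result. The sketch below does this in Python on synthetic data, not on the real database. It also illustrates a possible explanation, which is an assumption and not confirmed anywhere on this page: if the table lists every unordered pair among the functions of a functionality, then every function necessarily has the same number of clones. For reference, 276 is exactly the number of unordered pairs among 24 items (24 * 23 / 2).

```python
from collections import Counter
from itertools import combinations


def clones_per_function(pairs):
    """Count how many clone pairs each function id takes part in."""
    counts = Counter()
    for one, two in pairs:
        counts[one] += 1
        counts[two] += 1
    return counts


# Synthetic data (hypothetical ids, not taken from bcb.h2.db):
# every unordered pair among 24 made-up function ids.
ids = range(1, 25)
pairs = list(combinations(ids, 2))

counts = clones_per_function(pairs)
print(len(pairs))            # total number of pairs
print(set(counts.values()))  # distinct per-function pair counts
```

Feeding the (function_id_one, function_id_two) rows returned by the first query into clones_per_function would show whether the real data has this all-pairs shape.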

EvaluateTool fails to consider imported clones

I want to evaluate a tool with BigCloneEval.

To this end, I started with a single file to ensure that the whole pipeline works without issue. I ran my tool on BinarySearch.java.

Here are the detected clone pairs:

sample,BinarySearch.java,6,6,sample,BinarySearch.java,19,19
sample,BinarySearch.java,6,16,sample,BinarySearch.java,19,29
sample,BinarySearch.java,6,16,sample,BinarySearch.java,32,45
sample,BinarySearch.java,6,16,sample,BinarySearch.java,48,60
sample,BinarySearch.java,19,29,sample,BinarySearch.java,32,45
sample,BinarySearch.java,19,29,sample,BinarySearch.java,48,60
sample,BinarySearch.java,32,32,sample,BinarySearch.java,48,48
sample,BinarySearch.java,32,45,sample,BinarySearch.java,48,60

Following the README instructions, I registered a new tool, imported the pairs reported in the above file and ran the evaluateTool command in the following way.

./evaluateTool -t 1 -o report -st both -mis 0 -mil 6 -mip 6 -mit 50 -m "CoverageMatcher 0.7"

In the generated report recall is equal to 0 for every clone type despite the clone pairs in BinarySearch.java having been correctly detected.

I manually checked the tools database and the pairs were imported without issue.

Do you know what could be causing this problem? Are the parameter values I am using in evaluateTool inappropriate?

evaluate tool error

There is a bug in the ./evaluateTool command:

> ./evaluateTool -t1 -mis0 -mil0 -mip0 -mit0
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
	at tasks.EvaluateTool.main(EvaluateTool.java:243)

This is caused by the CLI parser treating -m as the matcher option: it tries to split the matcher string and fails.

Suggested solution:

  • Print an error message when the matcher string cannot be parsed
  • Use two long option names for those min/max options instead of a short and a long option, so the call above would become ./evaluateTool -t1 --mis0 --mil0 --mip0 --mit0

I'd be happy to do a PR for this, if this is wanted.

While browsing the code, I thought of picocli, which would make the whole option parsing more compact. If this is an option, I could port the whole project from Apache Commons CLI to picocli as part of the PR, if you are interested.
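The first suggestion can be sketched as follows. This is a Python illustration only (the project itself is Java), and it assumes a matcher specification is a matcher name followed by at least one space-separated parameter, as in "CoverageMatcher 0.7". The function name parse_matcher_spec is hypothetical and is not part of BigCloneEval.

```python
def parse_matcher_spec(spec):
    """Split a matcher specification into (name, parameters).

    Assumed format (not confirmed by the project): a matcher name
    followed by at least one space-separated parameter.
    """
    parts = spec.split()
    if len(parts) < 2:
        # Fail with a readable message instead of an index error.
        raise ValueError(
            f"Cannot parse matcher specification {spec!r}: "
            "expected a matcher name followed by its parameters"
        )
    return parts[0], parts[1:]


print(parse_matcher_spec("CoverageMatcher 0.7"))

try:
    # Assumption: "-mis0" is read as option -m with the value "is0".
    parse_matcher_spec("is0")
except ValueError as err:
    print(err)
```

Validating the length before indexing is what turns the ArrayIndexOutOfBoundsException into a message the user can act on.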

Error loading SubsumeMatcher

Trying to run ./evaluateTool with the SubsumeMatcher results in an error:

./evaluateTool -t 1 -o ~/report -st both  -m "SubsumeMatcher 1 ratio 0.7" -mis 50 -mil 10 -mip 10 -mit 50
Error loading clone matcher.  Please see the exception for details:
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at cloneMatchingAlgorithms.CloneMatcher.load(CloneMatcher.java:21)
	at tasks.EvaluateTool.main(EvaluateTool.java:243)
Caused by: org.h2.jdbc.JdbcSQLException: Column "TOOL_ID" not found; SQL statement:
SELECT 1 FROM tool_1_clones where tool_id = ? and type1 = ? and name1 = ? and startline1 <= ? and endline1 >= ? and type2 = ? and name2 = ? and startline2 <= ? and endline2 >= ? [42122-176]
	at org.h2.message.DbException.getJdbcSQLException(DbException.java:344)
	at org.h2.message.DbException.get(DbException.java:178)
	at org.h2.message.DbException.get(DbException.java:154)
	at org.h2.expression.ExpressionColumn.optimize(ExpressionColumn.java:148)
	at org.h2.expression.Comparison.optimize(Comparison.java:179)
	at org.h2.expression.ConditionAndOr.optimize(ConditionAndOr.java:131)
	at org.h2.expression.ConditionAndOr.optimize(ConditionAndOr.java:131)
	at org.h2.expression.ConditionAndOr.optimize(ConditionAndOr.java:131)
	at org.h2.expression.ConditionAndOr.optimize(ConditionAndOr.java:131)
	at org.h2.expression.ConditionAndOr.optimize(ConditionAndOr.java:131)
	at org.h2.expression.ConditionAndOr.optimize(ConditionAndOr.java:131)
	at org.h2.expression.ConditionAndOr.optimize(ConditionAndOr.java:131)
	at org.h2.expression.ConditionAndOr.optimize(ConditionAndOr.java:131)
	at org.h2.command.dml.Select.prepare(Select.java:834)
	at org.h2.command.Parser.prepareCommand(Parser.java:248)
	at org.h2.engine.Session.prepareLocal(Session.java:442)
	at org.h2.engine.Session.prepareCommand(Session.java:384)
	at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1188)
	at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:73)
	at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:276)
	at com.jolbox.bonecp.ConnectionHandle.prepareStatement(ConnectionHandle.java:1024)
	at cloneMatchingAlgorithms.SubsumeMatcher.init(SubsumeMatcher.java:56)
	at cloneMatchingAlgorithms.SubsumeMatcher.<init>(SubsumeMatcher.java:48)
	... 6 more

Any quick fixes for this?

importClones command problem

I was able to detect clones using NiCad 6.0, and that works great. However, when I execute ./importClones -t 1 -c ../nicad.clones I get the error below. I have tried many times. Please help; thank you in advance.

(base) mrbot@bot:~/Desktop/bigcloneeval/BigCloneEval/commands$ ./importClones -t 1 -c ../nicad.clones
Some error occured with the database connection or interaction.

Please try a fresh copy of the datbase, and report the error to.

the developers.

org.h2.jdbc.JdbcSQLException: NULL not allowed for column "NAME1"; SQL statement:
INSERT INTO tool_1_clones SELECT * FROM csvread('/home/mrrobot/Desktop/SanjaySir/bigcloneeval/BigCloneEval/commands/../nicad.clones','type1,name1,startline1,endline1,type2,name2,startline2,endline2') [23502-176]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:344)
at org.h2.message.DbException.get(DbException.java:178)
at org.h2.message.DbException.get(DbException.java:154)
at org.h2.table.Column.validateConvertUpdateSequence(Column.java:308)
at org.h2.table.Table.validateConvertUpdateSequence(Table.java:726)
at org.h2.command.dml.Insert.addRow(Insert.java:196)
at org.h2.command.dml.Insert.insertRows(Insert.java:173)
at org.h2.command.dml.Insert.update(Insert.java:115)
at org.h2.command.CommandContainer.update(CommandContainer.java:79)
at org.h2.command.Command.executeUpdate(Command.java:254)
at org.h2.jdbc.JdbcStatement.executeUpdateInternal(JdbcStatement.java:132)
at org.h2.jdbc.JdbcStatement.executeUpdate(JdbcStatement.java:117)
at com.jolbox.bonecp.StatementHandle.executeUpdate(StatementHandle.java:497)
at database.Clones.importClones(Clones.java:40)
at tasks.ImportClones.main(ImportClones.java:121)

Rogue file encoding

This is more of a PSA, although it would be convenient if a future version of the dataset either replaced this file with a copy converted to UTF-8 or included a note.

I noticed that there is one file, bcb_reduced/3/selected/649201.java, which uses the file encoding cp932. All other files are (at least compatible with) UTF-8, but I had to use a dedicated tool (https://pypi.org/project/charset-normalizer/) to infer the encoding of this file, so I thought I'd mention it in case others encounter the same problem.
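For anyone who wants to find such files without an extra dependency, a strict UTF-8 decode is enough to flag them. This is a minimal sketch that does not use charset-normalizer. The cp932 fallback is hard-coded from the finding above and does not infer encodings in general. The function names are hypothetical, and the demo files are made up.

```python
import tempfile
from pathlib import Path


def find_non_utf8(root, pattern="*.java"):
    """Return paths under root whose bytes are not valid UTF-8."""
    bad = []
    for path in sorted(Path(root).rglob(pattern)):
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad


def read_source(path, fallback="cp932"):
    """Read a file as UTF-8, falling back to another encoding on failure."""
    data = Path(path).read_bytes()
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


# Demo on two made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.java").write_bytes("// ok".encode("utf-8"))
    Path(tmp, "b.java").write_bytes("// 日本語".encode("cp932"))
    for path in find_non_utf8(tmp):
        print(path.name, "->", read_source(path))
```

On the dataset itself, the call would be find_non_utf8("ijadataset/bcb_reduced"), assuming the directory layout mentioned elsewhere on this page.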

VM import error with virtualbox

VirtualBox reports an SHA1 mismatch on import:

Digest mismatch (VERR_NOT_EQUAL): Attribute 'SHA1' on 'BigCloneEval-disk1.vmdk' does not match ('a05158d159d8bde33b0ef5a7d8aa8118b777879f' vs. 'eb8c1b8fd8167c800f59018ddef45622921d1c36').
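To see which of the two digests corresponds to the file actually on disk, the SHA1 of the extracted disk image can be computed independently. A minimal Python sketch using only the standard library follows; the function name sha1_of_file is made up for illustration.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of a file, read in chunks so that
    large disk images do not have to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha1_of_file("BigCloneEval-disk1.vmdk") on the extracted image and comparing the result with both values in the message shows whether the download or the manifest is at fault.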

table not found exception

There are 12 tables in the database and I am only able to query 4 of them.
CLONES
FALSE_POSITIVES
FUNCTIONALITIES
FUNCTIONS

For the rest I get the following exception:
org.h2.jdbc.JdbcSQLException: Table " whatever the name " not found; SQL statement:
The schema really needs to be explained.

evaluate failed

$ ./evaluateTool -t 1 -o nicad.report

There is some error with the database. Try a new copy of the database, or report the error:
org.h2.jdbc.JdbcSQLException: Table "FUNCTIONALITIES" not found; SQL statement:
select id from functionalities [42102-176]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:344)
at org.h2.message.DbException.get(DbException.java:178)
at org.h2.message.DbException.get(DbException.java:154)
at org.h2.command.Parser.readTableOrView(Parser.java:5213)
at org.h2.command.Parser.readTableFilter(Parser.java:1220)
at org.h2.command.Parser.parseSelectSimpleFromPart(Parser.java:1859)
at org.h2.command.Parser.parseSelectSimple(Parser.java:1968)
at org.h2.command.Parser.parseSelectSub(Parser.java:1853)
at org.h2.command.Parser.parseSelectUnion(Parser.java:1674)
at org.h2.command.Parser.parseSelect(Parser.java:1662)
at org.h2.command.Parser.parsePrepared(Parser.java:434)
at org.h2.command.Parser.parse(Parser.java:306)
at org.h2.command.Parser.parse(Parser.java:278)
at org.h2.command.Parser.prepareCommand(Parser.java:243)
at org.h2.engine.Session.prepareLocal(Session.java:442)
at org.h2.engine.Session.prepareCommand(Session.java:384)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1188)
at org.h2.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:75)
at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464)
at database.Functionalities.getFunctionalityIds(Functionalities.java:17)
at evaluate.ToolEvaluator.<init>(ToolEvaluator.java:2480)
at tasks.EvaluateTool.main(EvaluateTool.java:412)

Possible Mistakes in README file

We found a couple of possible mistakes and would like to bring them to your attention.

In "Step 3":

Extract the contents of BigCloneBench into the 'ijadataset' directory of the
BigCloneEval distribution.

Should it be: Extract the contents of "IJadataset_BCEvalVersion.tar.gz" into "ijadataset" directory of the BigCloneEval distribution.

This should create a directory 'ijadataset/bcb_sample/' which contains one sub-directory
per functionality in BigCloneBench.

Should it be: This should create a directory 'ijadataset/bcb_reduced/' which contains one sub-directory
per functionality in BigCloneBench.

Thanks

The problem of the import

The importClones command returns 0 after execution, but when I narrow the dataset it returns the correct number of rows. Is there a limit on the number of lines it can import?

How to evaluate parse tree based tool on IJADataset

To detect clones, we convert Java code to parse trees and then compute the similarity of two parse trees to decide whether they are clones. BigCloneBench gives an error; could you advise how we can convert IJADataset to parse trees? We convert Java code to a parse tree using an ANTLR grammar, which needs a main function in the Java code. (IJADataset contains Java files without a main function.)
Kindly suggest how we can proceed to evaluate our work on BigCloneBench.
(Screenshot attached: 2020-07-15 14-46-29)

Discrepancy between the file and the DB

Is there a discrepancy between the contents of bcb_reduced/14/selected/1280123.java and FUNCTIONS table in bcb.h2.db?
For example, the record with name: 1280123.java, type: selected, startline: 280, endline: 290 is presumably meant to point to this code fragment:

519    private final int search(char c, boolean exact) {
520        int low = 0;
521        int high = children.size() - 1;
522        while (low <= high) {
523            int middle = (low + high) / 2;
524            char cmiddle = get(middle).getLabelStart();
525            if (cmiddle < c) low = middle + 1; else if (c < cmiddle) high = middle - 1; else return middle;
526        }
527        if (exact) return -1;
528        return high;
529    }

but lines 280-290 actually fall inside this method:

275    public Iterator getPrefixedBy(String prefix, int startOffset, int stopOffset) {
276        TrieNode node = root;
277        for (int i = startOffset; i < stopOffset; ) {
278            TrieEdge edge = node.get(prefix.charAt(i));
279            if (edge == null) {
280                return EMPTY_ITERATOR;
281            }
282            node = edge.getChild();
283            String label = edge.getLabel();
284            int j = match(prefix, i, stopOffset, label);
285            if (i + j == stopOffset) {
286                break;
287            } else if (j >= 0) {
288                node = null;
289                break;
290            } else {
291            }
292            i += label.length();
293        }
294        if (node == null) return EMPTY_ITERATOR; else return new ValueIterator(node);
295    }

(Due to this problem, type-1 clone recall of a clone detector I implemented won't reach 1.)
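Checking a record against the file comes down to slicing the file by the recorded 1-based, inclusive line range. A small Python sketch of that check follows. The function name extract_fragment is made up, and the demo text is synthetic.

```python
import re


def extract_fragment(source_text, startline, endline):
    """Return lines startline..endline (1-based, inclusive).

    Lines are split only on \\r\\n, \\r and \\n, so form feeds and
    other characters that str.splitlines() would treat as line
    breaks do not shift the numbering.
    """
    lines = re.split(r"\r\n|\r|\n", source_text)
    if not 1 <= startline <= endline <= len(lines):
        raise ValueError(
            f"range {startline}-{endline} is outside 1-{len(lines)}"
        )
    return "\n".join(lines[startline - 1:endline])


# Synthetic ten-line text standing in for a source file.
demo = "\n".join(f"line {n}" for n in range(1, 11))
print(extract_fragment(demo, 3, 5))
```

Applied to the text of bcb_reduced/14/selected/1280123.java with startline 280 and endline 290, this would reproduce the slice the record currently selects.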

Where is the source code?

I need the source code of the files and their clones as training data for my language model. However, when I open bcb.h2.db, none of the tables contains source code, only ids and file names. Where can I find the source code?

How to obtain source code of each clone pair in this benchmark?

Dear authors,
I am very interested in this widely used clone detection benchmark.
I would like to obtain the details (i.e., the source code) of each clone pair. However, after downloading the file, I found that it is in a database format. I would appreciate it if you could explain how to do this.

Thanks a lot~

The recorded source ranges for some functions are incorrect

I'm experiencing a similar problem to #23 - except that the errors I've found lead to the snippets being syntactically invalid.

I'm trying to evaluate an AST-based analysis tool. My tool isn't strictly speaking a code-clone-detector but it can be used as one, in a way, and associated recall is relevant for my evaluation.

However, the way BigCloneEval performs the evaluation isn't appropriate for my needs, so I decided to extract the information on condition-positive clone pairs from the database, scrape the corresponding source code snippets from the source files using a Python script, and then run my own experiments.

I only captured a subset of the data (in accordance with the recommended settings for BCE), but so far I have found 14 functions that are mislabelled and generate syntactically-erroneous snippets.

For now I'm going to work on the basis that the label errors are offset by some small number of lines and correct my local tables accordingly. There may be other cases, like yours, where they point to markedly distinct functions, but infrequent statistical errors like that aren't particularly concerning for me at this time.

Here are the IDs of the erroneously-labelled functions I've come across so far:

  • 10056705
  • 10315215
  • 13877909
  • 13877913
  • 13877919
  • 13877920
  • 13877923
  • 16805228
  • 22648747
  • 3198237
  • 3357568
  • 5180413
  • 8234997
  • 8996916
  • 9191794

I might post my corrections, and/or scripts for locating such errors later if anyone is interested.
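One cheap way to flag candidates of this kind is a brace-balance check on each extracted snippet. The sketch below is a crude heuristic of my own and not the script mentioned above: it ignores braces inside strings, comments and character literals, so it can produce both false alarms and misses. The function names and sample snippets are made up.

```python
def brace_balance(snippet):
    """Return (net, lowest): net is the number of opening minus
    closing braces, lowest is the minimum running depth reached."""
    depth = 0
    lowest = 0
    for char in snippet:
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            lowest = min(lowest, depth)
    return depth, lowest


def looks_misaligned(snippet):
    """True if braces do not balance or a block closes before it opens."""
    net, lowest = brace_balance(snippet)
    return net != 0 or lowest < 0


# Made-up snippets: a whole method, and a range shifted by one line.
whole = "int f() {\n    return 1;\n}"
shifted = "    return 1;\n}\nint g() {"

print(looks_misaligned(whole))
print(looks_misaligned(shifted))
```

A snippet that passes this check can still be wrong, for example when the range selects a different but complete method, so it only narrows down where to look.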

had a non-zero return value: 1

Hello,
I tried to run this evaluation framework with a NiCad 6 runner, but I am getting this error. I have tried everything I could think of to debug this configuration issue, without success. Thank you in advance.
(Screenshot attached: 2020-07-11 13-11-43)

tasks.Init file is missing

The last step of the installation section in the README asks the user to run the init script inside the commands directory.

Running the script, however, results in a failure:

Error: Could not find or load main class tasks.Init

This comes from the fact that there is no tasks.Init file in the top level directory of the project. Was running this script only necessary in prior versions of BigCloneEval?

Is there a partitionInput command?

Dear Sir:

I am trying to use BigCloneEval, and I found that there is no partitionInput file. I used the command "./detectClones -tr ~/nicadRunner -o ~/clones -mf 2000".

The config (myconfig) is defaultreport in NiCad4.0/config. The nicadRunner script:


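#!/bin/bash
ulimit -s hard
# Split the input directory passed as $1 into parent path and directory name.
root=$(dirname "$1")
dir=$(basename "$1")
path="$root/$dir"
cd /home/shi/software/NiCad-4.0/ || exit 1
# Run NiCad on the input directory, discarding its console output.
./nicad4 functions java "$path" defaultreport > /dev/null 2> /dev/null
# Convert the NiCad XML report with Convert.jar.
java -jar Convert.jar "${path}_functions-blind-abstract-clones/${dir}_functions-blind-abstract-clones-0.30.xml" 2> /dev/null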

The output file clones is empty, and some XML files appear in ijadataset/bcb_reduced/. From reading the README, I think the output should be a CSV file containing some results.
