
dupl's Introduction

dupl

dupl is a tool written in Go for finding code clones. So far it can find clones only in Go source files. The method builds a suffix tree over serialized ASTs. It ignores the values of AST nodes and operates only on their types (e.g. if a == 13 {} and if x == 100 {} are considered the same, provided the matched sequence exceeds the minimal token sequence size).

Because of this method, dupl can report so-called "false positives". These are matches we would not consider clones, either because they are too small or because the values of the matched tokens are completely different.
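To make the "types only" idea concrete, here is a minimal sketch using the standard go/parser and go/ast packages. It is not dupl's actual implementation: the helpers nodeTypes and sameShape are made up for this illustration, and no suffix tree is involved. It only shows that two statements that differ in identifiers and literal values yield the same sequence of AST node types.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"reflect"
)

// nodeTypes parses a statement (wrapped in a dummy function) and returns the
// AST node type names in depth-first order. Identifier names and literal
// values are dropped, so only the shape of the code remains.
// Hypothetical helper, not part of dupl.
func nodeTypes(stmt string) ([]string, error) {
	src := "package p\nfunc f() {\n" + stmt + "\n}\n"
	file, err := parser.ParseFile(token.NewFileSet(), "", src, 0)
	if err != nil {
		return nil, err
	}
	var types []string
	ast.Inspect(file, func(n ast.Node) bool {
		if n != nil {
			types = append(types, fmt.Sprintf("%T", n))
		}
		return true
	})
	return types, nil
}

// sameShape reports whether two statements produce identical type sequences.
func sameShape(x, y string) (bool, error) {
	a, err := nodeTypes(x)
	if err != nil {
		return false, err
	}
	b, err := nodeTypes(y)
	if err != nil {
		return false, err
	}
	return reflect.DeepEqual(a, b), nil
}

func main() {
	same, _ := sameShape("if a == 13 {}", "if x == 100 {}")
	fmt.Println(same) // true: same shape, different values

	same, _ = sameShape("if a == 13 {}", "for i := 0; i < 3; i++ {}")
	fmt.Println(same) // false: different shape
}
```

The same reasoning explains the false positives: two fragments with matching shapes are reported even when their identifiers and constants have nothing in common.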

Installation

go get -u github.com/mibk/dupl

Usage

Usage of dupl:
  dupl [flags] [paths]

Paths:
  If the given path is a file, dupl will use it regardless of
  the file extension. If it is a directory it will recursively
  search for *.go files in that directory.

  If no path is given dupl will recursively search for *.go
  files in the current directory.

Flags:
  -files
        read file names from stdin one at each line
  -html
        output the results as HTML, including duplicate code fragments
  -plumbing
        plumbing (easy-to-parse) output for consumption by scripts or tools
  -t, -threshold size
        minimum token sequence size as a clone (default 15)
  -vendor
        check files in vendor directory
  -v, -verbose
        explain what is being done

Examples:
  dupl -t 100
        Search clones in the current directory of size at least
        100 tokens.
  dupl $(find app/ -name '*_test.go')
        Search for clones in tests in the app directory.
  find app/ -name '*_test.go' |dupl -files
        The same as above.
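The path rules in the usage text can be sketched as follows. This is an illustration only, not dupl's code: the function collectGoFiles and its includeVendor parameter are hypothetical, and the assumption that "vendor" directories are skipped by name unless -vendor is given is a guess based on the flag description.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// collectGoFiles follows the rules from the usage text: a file is taken
// as-is regardless of extension, a directory is searched recursively for
// *.go files. Directories named "vendor" below the root are skipped unless
// includeVendor is set (assumed behaviour, see the lead-in).
func collectGoFiles(path string, includeVendor bool) ([]string, error) {
	info, err := os.Stat(path)
	if err != nil {
		return nil, err
	}
	if !info.IsDir() {
		return []string{path}, nil
	}
	var files []string
	err = filepath.WalkDir(path, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if p != path && d.Name() == "vendor" && !includeVendor {
				return filepath.SkipDir
			}
			return nil
		}
		if strings.HasSuffix(d.Name(), ".go") {
			files = append(files, p)
		}
		return nil
	})
	return files, err
}

func main() {
	root := "." // no path given: search the current directory
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	files, err := collectGoFiles(root, false)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```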

Example

For the Docker source code, a reduced HTML report can be produced by running dupl with the following parameters.

$ dupl -t 200 -html >docker.html

dupl's People

Contributors

dmitshur, honshu, iwankgb, krhubert, mibk, shalecraig, wgh-


dupl's Issues

Remove -connect and -serve

It would be nice to clean up the code a little by removing these flags and the code that uses them. They all relate to running one instance of dupl as a server and other instances of dupl as clients to speed up the search.

I know of no repository where the speed of the search was an issue, and sadly the client/server mode is not faster but much, much slower. I therefore propose removing it. I also think that nobody is using it anyway; more likely, people are confused about what it is doing there.

I'm pretty sure nobody is using it, but I'm filing this issue just in case. If nobody argues against it, I'm going to remove it in a week or so. Thank you.

Please mention that this tool requires Go 1.4 or greater

The code uses the for range form, which Go supports only since version 1.4.

If you use Go 1.3, you will get:

Installing dupl -> go get -u github.com/mibk/dupl
# github.com/mibk/dupl/suffixtree
../../mibk/dupl/suffixtree/suffixtree.go:44: syntax error: unexpected range, expecting {
../../mibk/dupl/suffixtree/suffixtree.go:49: syntax error: unexpected }
gometalinter: error: failed to install dupl: exit status 2: exit status 2
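For reference, the variable-free range form was added in Go 1.4, and the "unexpected range, expecting {" error above is what an older compiler reports for it. The sketch below contrasts it with the blank-identifier form that older compilers also accept. The function names are illustrative, and the code at suffixtree.go:44 is not reproduced here.

```go
package main

import "fmt"

// countItems uses the variable-free range form, accepted by Go 1.4 and later.
func countItems(items []string) int {
	n := 0
	for range items {
		n++
	}
	return n
}

// countItemsCompat uses the blank-identifier form, which compilers older
// than Go 1.4 also accept.
func countItemsCompat(items []string) int {
	n := 0
	for _ = range items {
		n++
	}
	return n
}

func main() {
	items := []string{"a", "b", "c"}
	fmt.Println(countItems(items), countItemsCompat(items)) // 3 3
}
```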

Allow disabling of checks with in-code comments

Great tool, by the way! It has enabled me to write better code by pointing out places where things can be DRYed up.

It would be nice to disable output for certain places in code where I know there is duplication, but I've chosen to keep the duplication in place. Sometimes there are reasons for this.

I'm thinking something akin to a linting comment to disable duplication checking:

// dupl-disable
Context("when specifying parameter set 1", func() {
    BeforeEach(func() {
        request, _ = http.NewRequest("GET", "/?paramA=20&paramB=40", nil)
    })
    It("should reflect the desired parameters", func() {
        server.GET("/", func(c *gin.Context) {
            paramSet1, _ := getParamSet1(c)
            Expect(paramSet1.A).To(Equal(20))
            Expect(paramSet1.B).To(Equal(40))
        })
        server.ServeHTTP(recorder, request)
    })
})
// dupl-enable

...

// dupl-disable
Context("when specifying parameter set 2", func() {
    BeforeEach(func() {
        request, _ = http.NewRequest("GET", "/?paramX=5000&paramY=10000", nil)
    })
    It("will reflect the desired parameters", func() {
        server.GET("/", func(c *gin.Context) {
            paramSet2, _ := getParamSet2(c)
            Expect(paramSet2.X).To(Equal(5000))
            Expect(paramSet2.Y).To(Equal(10000))
        })
        server.ServeHTTP(recorder, request)
    })
})
// dupl-enable
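As a rough sketch of how such markers could be located, the snippet below uses go/parser with comments enabled to compute the line ranges enclosed by the two comments. Both markers are hypothetical: they come from this proposal, and nothing here implies dupl supports them. The names lineRange and disabledRanges are likewise made up for the example.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strings"
)

// lineRange is an inclusive range of source lines.
type lineRange struct{ Start, End int }

// disabledRanges returns the line ranges enclosed by "// dupl-disable" and
// "// dupl-enable" line comments. The markers are hypothetical; an unclosed
// "// dupl-disable" is ignored in this sketch.
func disabledRanges(src string) ([]lineRange, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	var ranges []lineRange
	start := 0 // 0 means "not inside a disabled region"
	for _, group := range file.Comments {
		for _, c := range group.List {
			line := fset.Position(c.Pos()).Line
			switch strings.TrimSpace(strings.TrimPrefix(c.Text, "//")) {
			case "dupl-disable":
				if start == 0 {
					start = line
				}
			case "dupl-enable":
				if start != 0 {
					ranges = append(ranges, lineRange{start, line})
					start = 0
				}
			}
		}
	}
	return ranges, nil
}

func main() {
	src := `package p

// dupl-disable
func a() {}
// dupl-enable

func b() {}
`
	ranges, err := disabledRanges(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(ranges) // [{3 5}]
}
```

A clone detector could then drop any reported fragment that falls entirely inside one of these ranges.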

Why does dupl report code that uses different function names?

We have two similar functions that operate on two different variables, and their test cases are shown below. dupl reports these test cases as duplicates.

pkg/collector/metric/types/types_test.go:134: 134-146 lines are duplicate of `pkg/collector/metric/types/types_test.go:147-159` (dupl)
                It("SumAllDeltaValues", func() {
                        value := instance.SumAllDeltaValues()
                        Expect(value).To(Equal(uint64(0)))
                        instance.SetDeltaStat("SumAllDeltaValues", uint64(2))
                        value = instance.SumAllDeltaValues()
                        Expect(value).To(Equal(uint64(2)))
                        instance.SetDeltaStat("SumAllDeltaValues", uint64(0))
                        value = instance.SumAllDeltaValues()
                        Expect(value).To(Equal(uint64(2)))
                        instance.SetDeltaStat("SumAllDeltaValues1", uint64(2))
                        value = instance.SumAllDeltaValues()
                        Expect(value).To(Equal(uint64(4)))
                })
pkg/collector/metric/types/types_test.go:147: 147-159 lines are duplicate of `pkg/collector/metric/types/types_test.go:134-146` (dupl)
                It("SumAllAggrValues", func() {
                        value := instance.SumAllAggrValues()
                        Expect(value).To(Equal(uint64(0)))
                        instance.SetAggrStat("SumAllAggrValues", uint64(2))
                        value = instance.SumAllAggrValues()
                        Expect(value).To(Equal(uint64(2)))
                        instance.SetAggrStat("SumAllAggrValues", uint64(0))
                        value = instance.SumAllAggrValues()
                        Expect(value).To(Equal(uint64(2)))
                        instance.SetAggrStat("SumAllAggrValues1", uint64(2))
                        value = instance.SumAllAggrValues()
                        Expect(value).To(Equal(uint64(4)))
                })

Why is the output non-deterministic?

I've found that running dupl on the same codebase multiple times produces different results. I'm not sure why this would be the case. Any ideas?

Here are some examples to demonstrate what I mean:

>for i in $(seq 1 100);do dupl ~/go/src/github.com/spf13/cobra/ |grep Found;done  |sort |uniq -c
      6 Found total 82 clone groups.
     27 Found total 83 clone groups.
     52 Found total 84 clone groups.
     13 Found total 85 clone groups.
      2 Found total 86 clone groups.
> for i in $(seq 1 100);do dupl ~/go/src/golang.org/x/net/webdav |grep Found;done  |sort |uniq -c
      2 Found total 228 clone groups.
     30 Found total 229 clone groups.
     30 Found total 230 clone groups.
     23 Found total 231 clone groups.
     12 Found total 232 clone groups.
      3 Found total 233 clone groups.

I don't see a way to check the version of dupl I have installed, but I just updated to the latest commit as of the time of me posting this ticket (415e882)

dupl thinks two switch/case clauses are the same when they differ only in the type of a declared variable. Mention the actual behaviour in README.

Consider this Go code:

switch raw.VersionField {
// v1.Version and v2.Version are just strings aliases.
case v1.Version:
    dec := json.NewDecoder(bytes.NewReader(b))
    dec.UseNumber()
    var m v1.QueueMessage
    // some more code
case v2.Version:
    dec := json.NewDecoder(bytes.NewReader(b))
    dec.UseNumber()
    var m v2.QueueMessage
    // some more code, similar to the previous clause.
}

Running dupl on this returns:

duplicate of ./api.go:119-150 (dupl) // Lines belong to the second clause
duplicate of ./api.go:87-118 (dupl)  // Lines belong to the first clause

There are two issues here:

  1. These two cases aren't the same; they differ in the var m v1.QueueMessage line.
  2. Presuming that they are the same, they link to each other in the message. I guess the link should be one-way, from the duplicate (the second clause, the one with v2) to the first.

Output in a format more easily digested by editors

Hello,

Interesting tool. Would it be possible to have an alternate output format that is more easily integrated with editors/IDEs?

Instead of:

found 3 clones:
  loc 1: main.go, line 139-139,
  loc 2: main.go, line 140-140,
  loc 3: main.go, line 150-150,

Something like:

main.go:139-139: duplicate of main.go:140-140, main.go:150-150
main.go:140-140: duplicate of main.go:139-139, main.go:150-150
main.go:150-150: duplicate of main.go:139-139, main.go:140-140
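The proposed format could be generated along these lines. This is only a sketch of the request: the clone type and the editorLines function are hypothetical and are not taken from dupl's code.

```go
package main

import (
	"fmt"
	"strings"
)

// clone is one location of a duplicated fragment (hypothetical type).
type clone struct {
	File       string
	Start, End int
}

// String renders a location as file:start-end.
func (c clone) String() string {
	return fmt.Sprintf("%s:%d-%d", c.File, c.Start, c.End)
}

// editorLines renders a clone group as one line per location, each naming
// all other locations in the group.
func editorLines(group []clone) []string {
	lines := make([]string, 0, len(group))
	for i, c := range group {
		var others []string
		for j, o := range group {
			if j != i {
				others = append(others, o.String())
			}
		}
		lines = append(lines, fmt.Sprintf("%s: duplicate of %s", c, strings.Join(others, ", ")))
	}
	return lines
}

func main() {
	group := []clone{
		{"main.go", 139, 139},
		{"main.go", 140, 140},
		{"main.go", 150, 150},
	}
	for _, l := range editorLines(group) {
		fmt.Println(l)
	}
}
```

Each line starts with a file:line prefix, which is the shape most editors expect when jumping to a location from tool output.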

dupl not the same as dupl *.go

Running dupl will find duplicates across files. Running dupl *.go will only find duplicates inside each file. Given the feedback on #6, is it possible for multiple files specified on the command line to work the same as dupl on a directory?
