boyter / scc

Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go

License: MIT License

Sloc Cloc and Code (scc)

A tool similar to cloc, sloccount and tokei for counting the lines of code, blank lines, comment lines and physical lines of source code in many programming languages.

The goal is to be the fastest code counter possible, while also performing COCOMO calculations like sloccount and estimating code complexity similar to cyclomatic complexity calculators. In short, one tool to rule them all.

It also has a very short name that is easy to type: scc.

If you don't like Sloc, Cloc and Code, feel free to use the name Succinct Code Counter.

Dual-licensed under MIT or the UNLICENSE.

Support

Using scc commercially? If you want priority support for scc, you can purchase a year's worth at https://boyter.gumroad.com/l/kgenuv, which entitles you to priority direct email support from the developer.

Install

Go Get

If you are comfortable using Go and have >= 1.17 installed:

go install github.com/boyter/scc/v3@latest

or bleeding edge with

go install github.com/boyter/scc@master

Snap

A snap install exists thanks to Ricardo.

$ sudo snap install scc

NB: Snap-installed applications cannot run outside of /home (see https://askubuntu.com/questions/930437/permission-denied-error-when-running-apps-installed-as-snap-packages-ubuntu-17), so you may encounter issues if you use snap and attempt to run outside this directory.

Homebrew

Or if you have Homebrew installed

$ brew install scc

MacPorts

On macOS, you can also install via MacPorts

$ sudo port install scc

Scoop

Or if you are using Scoop on Windows

$ scoop install scc

Chocolatey

Or if you are using Chocolatey on Windows

$ choco install scc

FreeBSD

On FreeBSD, scc is available as a package

$ pkg install scc

Or, if you prefer to build from source, you can use the ports tree

$ cd /usr/ports/devel/scc && make install clean

Run in Docker

Go to the directory you want to run scc from.

Run the command below to run the latest release of scc on your current working directory:

docker run --rm -it -v "$PWD:/pwd"  ghcr.io/lhoupert/scc:master scc /pwd

Manual

Binaries for Windows, GNU/Linux and macOS for both i386 and x86_64 machines are available from the releases page.

GitHub Action workflow

https://github.com/marketplace/actions/scc-docker-action https://github.com/iRyanBell/scc-docker-action

.github/workflows/main.yml

on: [push]

jobs:
  scc_job:
    runs-on: ubuntu-latest
    name: A job to count the lines of code.
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Get the lines of code.
        id: scc
        uses: iryanbell/[email protected]
        with:
          args: ${{ env.workspace }} -i js,go,html,css

GitLab

https://about.gitlab.com/blog/2023/02/15/code-counting-in-gitlab/

Other

If you would like to assist with getting scc added into apt/chocolatey/etc... please submit a PR or at least raise an issue with instructions.

Background

Read all about how it came to be along with performance benchmarks,

Some reviews of scc

A talk given at the first GopherCon AU about scc (press S to see speaker notes)

For performance see the Performance section

Other similar projects,

  • SLOCCount the original sloc counter
  • cloc, inspired by SLOCCount; implemented in Perl for portability
  • gocloc a sloc counter in Go inspired by tokei
  • loc rust implementation similar to tokei but often faster
  • loccount Go implementation written and maintained by ESR
  • polyglot ATS sloc counter
  • tokei fast, accurate and written in rust
  • sloc coffeescript code counter

Interesting reading about other code counting projects tokei, loc, polyglot and loccount

Further reading about the performance of processing files on disk

Using scc to process 40 TB of files from Github/Bitbucket/Gitlab

Pitch

Why use scc?

  • It is very fast and gets faster the more CPU you throw at it
  • Accurate
  • Works very well across multiple platforms without slowdown (Windows, Linux, macOS)
  • Large language support
  • Can ignore duplicate files
  • Has complexity estimations
  • You need to tell the difference between Coq and Verilog in the same directory
  • cloc YAML output support, so potentially a drop-in replacement for some users
  • Can identify or ignore minified files
  • Able to identify many #! files ADVANCED! #115
  • Can ignore large files by lines or bytes

Why not use scc?

  • You don't like Go for some reason
  • It cannot count D source with different nested multi-line comments correctly #27

Differences

There are some important differences between scc and other tools that are out there. Here are a few important ones for you to consider.

Blank lines inside comments are counted as comments. While the line is technically blank, the decision was made that once in a comment, everything there should be considered a comment until that comment is ended. As such, the following,

/* blank lines follow


*/

Would be counted as 4 lines of comments. This is noticeable when comparing scc's output to other tools on large repositories.

scc is able to count verbatim strings correctly. For example in C# the following,

private const string BasePath = @"a:\";
// The below is returned to the user as a version
private const string Version = "1.0.0";

Because of the @ prefix, this string ends at the trailing " (the escape character \ is ignored), so the snippet should be counted as 2 code lines and 1 comment. Some tools are unable to deal with this and instead treat everything up to the "1.0.0" as a string, which can cause the middle comment to be counted as code rather than a comment.

scc will also tell you the number of bytes it has processed (for most output formats) allowing you to estimate the cost of running some static analysis tools.

Usage

Command line usage of scc is designed to be as simple as possible. Full details can be found in scc --help or scc -h. Note that the below reflects the state of master, not a release, so features listed below may be missing from your installation.

$ scc -h                                                                                      
Sloc, Cloc and Code. Count lines of code in a directory with complexity estimation.
Version 3.2.0
Ben Boyter <[email protected]> + Contributors

Usage:
  scc [flags] [files or directories]

Flags:
      --avg-wage int                 average wage value used for basic COCOMO calculation (default 56286)
      --binary                       disable binary file detection
      --by-file                      display output for every file
      --ci                           enable CI output settings where stdout is ASCII
      --cocomo-project-type string   change COCOMO model type [organic, semi-detached, embedded, "custom,1,1,1,1"] (default "organic")
      --count-as string              count extension as language [e.g. jsp:htm,chead:"C Header" maps extension jsp to html and chead to C Header]
      --currency-symbol string       set currency symbol (default "$")
      --debug                        enable debug output
      --eaf float                    the effort adjustment factor derived from the cost drivers (1.0 if rated nominal) (default 1)
      --exclude-dir strings          directories to exclude (default [.git,.hg,.svn])
  -x, --exclude-ext strings          ignore file extensions (overrides include-ext) [comma separated list: e.g. go,java,js]
      --file-gc-count int            number of files to parse before turning the GC on (default 10000)
  -f, --format string                set output format [tabular, wide, json, csv, csv-stream, cloc-yaml, html, html-table, sql, sql-insert, openmetrics] (default "tabular")
      --format-multi string          have multiple format output overriding --format [e.g. tabular:stdout,csv:file.csv,json:file.json]
      --gen                          identify generated files
      --generated-markers strings    string markers in head of generated files (default [do not edit,<auto-generated />])
  -h, --help                         help for scc
  -i, --include-ext strings          limit to file extensions [comma separated list: e.g. go,java,js]
      --include-symlinks             if set will count symlink files
  -l, --languages                    print supported languages and extensions
      --large-byte-count int         number of bytes a file can contain before being removed from output (default 1000000)
      --large-line-count int         number of lines a file can contain before being removed from output (default 40000)
      --min                          identify minified files
  -z, --min-gen                      identify minified or generated files
      --min-gen-line-length int      number of bytes per average line for file to be considered minified or generated (default 255)
      --no-cocomo                    remove COCOMO calculation output
  -c, --no-complexity                skip calculation of code complexity
  -d, --no-duplicates                remove duplicate files from stats and output
      --no-gen                       ignore generated files in output (implies --gen)
      --no-gitignore                 disables .gitignore file logic
      --no-ignore                    disables .ignore file logic
      --no-large                     ignore files over certain byte and line size set by max-line-count and max-byte-count
      --no-min                       ignore minified files in output (implies --min)
      --no-min-gen                   ignore minified or generated files in output (implies --min-gen)
      --no-size                      remove size calculation output
  -M, --not-match stringArray        ignore files and directories matching regular expression
  -o, --output string                output filename (default stdout)
      --overhead float               set the overhead multiplier for corporate overhead (facilities, equipment, accounting, etc.) (default 2.4)
      --remap-all string             inspect every file and remap by checking for a string and remapping the language [e.g. "-*- C++ -*-":"C Header"]
      --remap-unknown string         inspect files of unknown type and remap by checking for a string and remapping the language [e.g. "-*- C++ -*-":"C Header"]
      --size-unit string             set size unit [si, binary, mixed, xkcd-kb, xkcd-kelly, xkcd-imaginary, xkcd-intel, xkcd-drive, xkcd-bakers] (default "si")
      --sloccount-format             print a more SLOCCount like COCOMO calculation
  -s, --sort string                  column to sort by [files, name, lines, blanks, code, comments, complexity] (default "files")
      --sql-project string           use supplied name as the project identifier for the current run. Only valid with the --format sql or sql-insert option
  -t, --trace                        enable trace output (not recommended when processing multiple files)
  -v, --verbose                      verbose output
      --version                      version for scc
  -w, --wide                         wider output with additional statistics (implies --complexity)

Output should look something like the below for the redis project:

$ scc redis 
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
C                          296    180267    20367     31679   128221      32548
C Header                   215     32362     3624      6968    21770       1636
TCL                        143     28959     3130      1784    24045       2340
Shell                       44      1658      222       326     1110        187
Autoconf                    22     10871     1038      1326     8507        953
Lua                         20       525       68        70      387         65
Markdown                    16      2595      683         0     1912          0
Makefile                    11      1363      262       125      976         59
Ruby                        10       795       78        78      639        116
gitignore                   10       162       16         0      146          0
YAML                         6       711       46         8      657          0
HTML                         5      9658     2928        12     6718          0
C++                          4       286       48        14      224         31
License                      4       100       20         0       80          0
Plain Text                   3       185       26         0      159          0
CMake                        2       214       43         3      168          4
CSS                          2       107       16         0       91          0
Python                       2       219       12         6      201         34
Systemd                      2        80        6         0       74          0
BASH                         1       118       14         5       99         31
Batch                        1        28        2         0       26          3
C++ Header                   1         9        1         3        5          0
Extensible Styleshe…         1        10        0         0       10          0
Smarty Template              1        44        1         0       43          5
m4                           1       562      116        53      393          0
───────────────────────────────────────────────────────────────────────────────
Total                      823    271888    32767     42460   196661      38012
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $6,918,301
Estimated Schedule Effort (organic) 28.682292 months
Estimated People Required (organic) 21.428982
───────────────────────────────────────────────────────────────────────────────
Processed 9425137 bytes, 9.425 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────

Note that you don't have to specify the directory you want to run against. Running scc will assume you want to run against the current directory.

You can also run against multiple files or directories, e.g. scc directory1 directory2 file1 file2, with the results aggregated in the output.

Ignore Files

scc mostly supports .ignore files inside the directories it scans, similar to how ripgrep, ag and tokei work. .ignore files use exactly the same syntax as .gitignore files, and scc will ignore the files and directories listed in them. You can add .ignore files to ignore things like vendored dependencies that are checked in. The idea is to let you add a file or folder to git and still have it ignored in the count.

Interesting Use Cases

Used inside Intel Nemu Hypervisor to track code changes between revisions (https://github.com/intel/nemu/blob/topic/virt-x86/tools/cloc-change.sh#L9). It also appears to be used inside http://codescoop.com/, https://pinpoint.com/ and https://github.com/chaoss/grimoirelab-graal.

It is also used to count code and guess language types in https://searchcode.com/, which makes it one of the most frequently run code counters in the world.

You can also hook scc into your GitLab pipeline: https://gitlab.com/guided-explorations/ci-cd-plugin-extensions/ci-cd-plugin-extension-scc

Also used by CodeQL #317 and Scaleway https://twitter.com/Scaleway/status/1488087029476995074?s=20&t=N2-z6O-ISDdDzULg4o4uVQ

Features

scc uses a small state machine to determine what state the code is in when it reaches a newline (\n). As such, it is aware of and able to count:

  • Single Line Comments
  • Multi Line Comments
  • Strings
  • Multi Line Strings
  • Blank lines

Because of this it is able to accurately determine if a comment is in a string or is actually a comment.

It also attempts to count the complexity of code. This is done by checking for branching operations in the code. For example, in Java each of the following, if encountered, would increment that file's complexity by one: for if switch while else || && != ==

Complexity Estimates

Let's take a minute to discuss the complexity estimate itself.

The complexity estimate is really just a number that is only comparable between files in the same language. It should not be used to compare languages directly without weighting them. The reason for this is that it's calculated by looking for branch and loop statements in the code and incrementing a counter for that file.

Because some languages don't have loops and instead use recursion they can have a lower complexity count. Does this mean they are less complex? Probably not, but the tool cannot see this because it does not build an AST of the code as it only scans through it.

Generally, though, the complexity estimate is there to help compare projects written in the same language, or to find the most complex file in a project with scc --by-file -s complexity. This can be useful when you are estimating how hard something is to maintain, or when looking for files that should probably be refactored.

As for how it works.

It's my own definition, but tries to be an approximation of cyclomatic complexity https://en.wikipedia.org/wiki/Cyclomatic_complexity although done only on a file level.

The reason it's an approximation is that it's calculated almost for free from a CPU point of view (since it's a cheap lookup when counting), whereas a real cyclomatic complexity count would need to parse the code. It gives a reasonable guess in practice, even if it fails to identify recursive methods. The goal was never for it to be exact.

In short, when scc is looking through what it has identified as code, it increments a counter whenever it notices what are usually branch conditions.

The conditions it looks for are compiled into the code and you can get an idea for them by looking at the JSON inside the repository. See https://github.com/boyter/scc/blob/master/languages.json#L3524 for an example of what it's looking at for a file that's Java.

The increment happens for each of the matching conditions and produces the number you see.

COCOMO

The COCOMO statistics displayed at the bottom of any command line run can be configured as needed.

Estimated Cost to Develop (organic) $664,081
Estimated Schedule Effort (organic) 11.772217 months
Estimated People Required (organic) 5.011633

To change the COCOMO parameters, you can use one of the default COCOMO models:

scc --cocomo-project-type organic
scc --cocomo-project-type semi-detached
scc --cocomo-project-type embedded

You can also supply your own parameters if you are familiar with COCOMO as follows,

scc --cocomo-project-type "custom,1,1,1,1"

See below for details about the model choices and the parameters they use.

Organic – A software project is said to be of organic type if the required team size is adequately small, the problem is well understood and has been solved in the past, and the team members have nominal experience with the problem.

scc --cocomo-project-type "organic,2.4,1.05,2.5,0.38"

Semi-detached – A software project is said to be of semi-detached type if its vital characteristics, such as team size, experience and knowledge of the various programming environments, lie in between those of organic and embedded projects. Projects classified as semi-detached are comparatively less familiar and more difficult to develop than organic ones, and require more experience, better guidance and creativity. E.g., compilers or various embedded systems can be considered semi-detached.

scc --cocomo-project-type "semi-detached,3.0,1.12,2.5,0.35"

Embedded – A software project requiring the highest level of complexity, creativity and experience falls under this category. Such software requires a larger team than the other two models, and the developers need to be sufficiently experienced and creative to develop such complex models.

scc --cocomo-project-type "embedded,3.6,1.20,2.5,0.32"

Large File Detection

You can have scc exclude large files from the output.

The option to do so is --no-large which by default will exclude files over 1,000,000 bytes or 40,000 lines.

You can control the size of either value using --large-byte-count or --large-line-count.

For example, to exclude files over 1,000 lines or 50,000 bytes, you could use the following,

scc --no-large --large-byte-count 50000 --large-line-count 1000

Minified/Generated File Detection

You can have scc identify and optionally remove files identified as being minified or generated from the output.

You can do so by enabling the -z flag, e.g. scc -z, which will identify any file with an average line byte size >= 255 (by default) as being minified.

Minified files appear like so in the output.

$ scc --no-cocomo -z ./examples/minified/jquery-3.1.1.min.js
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
JavaScript (min)             1         4        0         1        3         17
───────────────────────────────────────────────────────────────────────────────
Total                        1         4        0         1        3         17
───────────────────────────────────────────────────────────────────────────────
Processed 86709 bytes, 0.087 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────

Minified files are indicated with the text (min) after the language name.

Generated files are indicated with the text (gen) after the language name.

You can control the average line byte size using --min-gen-line-length such as scc -z --min-gen-line-length 1. Please note you need -z as modifying this value does not imply minified detection.

You can exclude minified files from the count entirely using the flag --no-min-gen. Files which match the minified check will be excluded from the output.
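The minified test is a single ratio. The Go sketch below uses a hypothetical isMinified helper and the numbers from the jquery example above: 86,709 bytes over 4 lines, roughly 21,677 bytes per line. It does not cover generated-file detection, which according to the help output looks for markers such as "do not edit".

```go
package main

import "fmt"

// isMinified flags a file when its bytes divided by its lines reaches the
// threshold (255 by default, adjustable with --min-gen-line-length).
// Integer division is safe here because the threshold is a whole number.
func isMinified(totalBytes, lines, threshold int) bool {
	if lines == 0 {
		return false // empty file, nothing to average
	}
	return totalBytes/lines >= threshold
}

func main() {
	// Numbers from the jquery-3.1.1.min.js example above.
	fmt.Println(isMinified(86709, 4, 255)) // true: about 21,677 bytes per line
}
```

An average does not depend on how lines are distributed. A normally formatted file with one huge embedded line can still pass, and a minified file split into many medium lines can slip under the threshold.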

Remapping

Some files may not have an extension. They will be checked to see if they are a #! file. If they are, the language will be remapped to the correct language. Otherwise, they will not be processed.

However, you may want to remap such files based on a string inside them. To do so, you can use --remap-unknown

 scc --remap-unknown "-*- C++ -*-":"C Header"

The above will inspect any file with no extension looking for the string -*- C++ -*- and if found remap the file to be counted using the C Header rules. You can have multiple remap rules if required,

 scc --remap-unknown "-*- C++ -*-":"C Header","other":"Java"

There is also the --remap-all parameter which will remap all files.

Note that in all cases, if the remap rule does not apply, the normal #! rules will apply.
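The Go sketch below shows one way a --remap-unknown value could be parsed and applied. The parsing rules are assumptions based on the examples above: the shell strips the quotes, rules are comma-separated, and the language follows the last colon. The names remapRule, parseRemap and remap are hypothetical, and scc's own parser may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// remapRule pairs a string to search for with the language to count as.
type remapRule struct{ needle, language string }

// parseRemap splits a value such as `-*- C++ -*-:C Header,other:Java`
// (what the program sees once the shell has removed the quotes) into rules.
// It assumes no rule contains a comma and no language contains a colon.
func parseRemap(value string) []remapRule {
	var rules []remapRule
	for _, part := range strings.Split(value, ",") {
		i := strings.LastIndex(part, ":")
		if i <= 0 {
			continue // malformed entry, skip it
		}
		rules = append(rules, remapRule{needle: part[:i], language: part[i+1:]})
	}
	return rules
}

// remap returns the language of the first rule whose needle appears in the
// content. ok is false when nothing matches, in which case the normal #!
// detection would take over.
func remap(content string, rules []remapRule) (language string, ok bool) {
	for _, r := range rules {
		if strings.Contains(content, r.needle) {
			return r.language, true
		}
	}
	return "", false
}

func main() {
	rules := parseRemap("-*- C++ -*-:C Header,other:Java")
	fmt.Println(remap("// -*- C++ -*-\nint x;", rules)) // C Header true
}
```

Rules are tried in the order given, so a more specific needle should be listed before a more general one.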

Output Formats

By default scc will output to the console. However, you can produce output in other formats if you require.

The different options are tabular, wide, json, csv, csv-stream, cloc-yaml, html, html-table, sql, sql-insert, openmetrics.

Note that you can write scc output to disk using the -o, --output option. This allows you to specify a file to write your output to. For example scc -f html -o output.html will run scc against the current directory, and output the results in html to the file output.html.

You can also write to multiple output files, or multiple types to stdout if you want using the --format-multi option. This is most useful when working in CI/CD systems where you want HTML reports as an artefact while also displaying the counts in stdout.

scc --format-multi "tabular:stdout,html:output.html,csv:output.csv"

The above will run against the current directory, outputting to standard output the default output, as well as writing to output.html and output.csv with the appropriate formats.
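The Go sketch below illustrates splitting a --format-multi value into format and destination pairs. The output type and parseFormatMulti function are hypothetical. Splitting each entry on its first colon is an assumption of this sketch, not documented scc behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// output pairs a format name with its destination ("stdout" or a file name).
type output struct {
	format, dest string
}

// parseFormatMulti splits a value like "tabular:stdout,html:output.html"
// into format/destination pairs. Each entry is split on its first colon only.
func parseFormatMulti(value string) ([]output, error) {
	var outs []output
	for _, entry := range strings.Split(value, ",") {
		parts := strings.SplitN(entry, ":", 2)
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("invalid entry %q: want format:destination", entry)
		}
		outs = append(outs, output{format: parts[0], dest: parts[1]})
	}
	return outs, nil
}

func main() {
	outs, err := parseFormatMulti("tabular:stdout,html:output.html,csv:output.csv")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, o := range outs {
		fmt.Printf("%s -> %s\n", o.format, o.dest)
	}
}
```

The whole value is rejected if any entry is malformed. This surfaces a typo in a CI configuration immediately instead of silently dropping one report.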

Tabular

This is the default output format when scc is run.

Wide

Wide produces some additional information which is the complexity/lines metric. This can be useful when trying to identify the most complex file inside a project based on the complexity estimate.

JSON

JSON produces JSON output. It is mostly designed to allow scc to feed into other programs.

Note that this format will give you the byte size of every file scc reads allowing you to get a breakdown of the number of bytes processed.

CSV

CSV as an option is good for importing into a spreadsheet for analysis.

Note that this format will give you the byte size of every file scc reads allowing you to get a breakdown of the number of bytes processed. Also note that CSV respects --by-file and as such will return a summary by default.

CSV-Stream

csv-stream is an option useful for processing very large repositories where you are likely to run into memory issues. Its output format is 100% the same as CSV.

Note that you should not use this with the --format-multi option, as it will always print to standard output, and because of how it works, combining them will negate the memory savings that this option provides. Note also that no sort is applied with this option.

cloc-yaml

This is a drop-in replacement for cloc's YAML output option. It is quite often used for passing into other build systems and can help with replacing cloc if required.

$ scc -f cloc-yml processor
# https://github.com/boyter/scc/
header:
  url: https://github.com/boyter/scc/
  version: 2.11.0
  elapsed_seconds: 0.008
  n_files: 21
  n_lines: 6562
  files_per_second: 2625
  lines_per_second: 820250
Go:
  name: Go
  code: 5186
  comment: 273
  blank: 1103
  nFiles: 21
SUM:
  code: 5186
  comment: 273
  blank: 1103
  nFiles: 21

$ cloc --yaml processor
      21 text files.
      21 unique files.
       0 files ignored.

---
# http://cloc.sourceforge.net
header :
  cloc_url           : http://cloc.sourceforge.net
  cloc_version       : 1.60
  elapsed_seconds    : 0.196972846984863
  n_files            : 21
  n_lines            : 6562
  files_per_second   : 106.613679608407
  lines_per_second   : 33314.2364566841
Go:
  nFiles: 21
  blank: 1137
  comment: 606
  code: 4819
SUM:
  blank: 1137
  code: 4819
  comment: 606
  nFiles: 21

HTML and HTML-TABLE

The HTML output options produce a minimal HTML report using a table, either as standalone HTML (html) or as just the table (html-table), which can be injected into your own HTML pages. The only difference between the two is that the html option includes html head and body tags with minimal styling.

The markup is designed to allow your own custom styles to be applied. An example report is here to view.

Note that the HTML options follow the command line options, so you can use scc --by-file -f html to produce a report with every file and not just the summary.

Note that with the --by-file option this format will give you the byte size of every file scc reads, allowing you to get a breakdown of the number of bytes processed.

SQL and SQL-Insert

The SQL output format is "mostly" compatible with cloc's SQL output format https://github.com/AlDanial/cloc#sql-

While all queries on the cloc documentation should work as expected, you will not be able to append output from scc and cloc into the same database. This is because the table format is slightly different to account for scc including complexity counts and bytes.

The difference between sql and sql-insert is that sql will include table creation while the latter will only have the insert commands.

Usage is 100% the same as any other scc command but sql output will always contain per file details. You can compute totals yourself using SQL.

The below will run scc against the current directory, name the output as the project scc and then pipe the output to sqlite to put into the database code.db

scc --format sql --sql-project scc . | sqlite3 code.db

Assuming you then wanted to append another project:

scc --format sql-insert --sql-project redis . | sqlite3 code.db

You could then run SQL against the database,

sqlite3 code.db 'select project,file,max(nCode) as nL from t
                         group by project order by nL desc;'

See the cloc documentation for more examples.

OpenMetrics

OpenMetrics is a metric reporting format specification extending the Prometheus exposition text format.

The produced output is natively supported by Prometheus and GitLab CI.

Note that OpenMetrics respects --by-file and as such will return a summary by default.

The output includes a metadata header containing definitions of the returned metrics:

# TYPE scc_files count
# HELP scc_files Number of sourcecode files.
# TYPE scc_lines count
# UNIT scc_lines lines
# HELP scc_lines Number of lines.
# TYPE scc_code count
# HELP scc_code Number of lines of actual code.
# TYPE scc_comments count
# HELP scc_comments Number of comments.
# TYPE scc_blanks count
# HELP scc_blanks Number of blank lines.
# TYPE scc_complexity count
# HELP scc_complexity Code complexity.
# TYPE scc_bytes count
# UNIT scc_bytes bytes
# HELP scc_bytes Size in bytes.

The header is followed by the metric data in either language summary form:

scc_files{language="Go"} 1
scc_lines{language="Go"} 1000
scc_code{language="Go"} 1000
scc_comments{language="Go"} 1000
scc_blanks{language="Go"} 1000
scc_complexity{language="Go"} 1000
scc_bytes{language="Go"} 1000

or, if --by-file is present, in per file form:

scc_lines{language="Go",file="./bbbb.go"} 1000
scc_code{language="Go",file="./bbbb.go"} 1000
scc_comments{language="Go",file="./bbbb.go"} 1000
scc_blanks{language="Go",file="./bbbb.go"} 1000
scc_complexity{language="Go",file="./bbbb.go"} 1000
scc_bytes{language="Go",file="./bbbb.go"} 1000
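The format is plain text, so a quick consumer does not need a Prometheus client library. The minimal Go sketch below assumes that every sample line has the name{labels} value shape shown above, and that label values never contain a closing brace. The sumMetric helper is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// samplePattern matches sample lines such as scc_code{language="Go"} 1000.
// It assumes label values never contain a closing brace.
var samplePattern = regexp.MustCompile(`^(\w+)\{([^}]*)\}\s+(\S+)$`)

// sumMetric adds up every sample of the named metric, skipping the
// # TYPE / # UNIT / # HELP metadata lines.
func sumMetric(text, name string) float64 {
	total := 0.0
	for _, line := range strings.Split(text, "\n") {
		m := samplePattern.FindStringSubmatch(strings.TrimSpace(line))
		if m == nil || m[1] != name {
			continue
		}
		if v, err := strconv.ParseFloat(m[3], 64); err == nil {
			total += v
		}
	}
	return total
}

func main() {
	text := `# TYPE scc_code count
# HELP scc_code Number of lines of actual code.
scc_code{language="Go",file="./a.go"} 120
scc_code{language="Go",file="./b.go"} 30`
	fmt.Println(sumMetric(text, "scc_code")) // 150
}
```

Summing per-file samples this way gives the same kind of total as the language summary form, which is handy when only the --by-file output was saved.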

Performance

Generally scc will be the fastest code counter compared to any I am aware of and have compared against. The comparisons below are taken from the fastest alternative counters; see Other similar projects above for all of the other code counters compared against. It is designed to scale to as many CPU cores as you can provide.

However, if you want greater performance and you have RAM to spare, you can disable the garbage collector, e.g. on Linux with GOGC=-1 scc ., which should speed things up considerably. For some repositories, turning off the code complexity calculation via -c can reduce runtime as well.

Benchmarks were run on 2022/09/20 on a fresh 32-core CPU Optimised DigitalOcean virtual machine, all using hyperfine with 3 warm-up runs and 10 timed runs.

scc v3.1.0
tokei v12.1.2
loc v0.5.0
polyglot v0.5.29

See https://github.com/boyter/scc/blob/master/benchmark.sh to see how the benchmarks are run.

Benchmark 1: scc redis
  Time (mean ± σ):      20.2 ms ±   1.7 ms    [User: 127.1 ms, System: 47.0 ms]
  Range (min … max):    16.8 ms …  25.8 ms    132 runs
 
Benchmark 2: scc -c redis
  Time (mean ± σ):      17.0 ms ±   1.4 ms    [User: 91.6 ms, System: 32.7 ms]
  Range (min … max):    14.3 ms …  21.6 ms    169 runs
 
Benchmark 3: tokei redis
  Time (mean ± σ):      33.7 ms ±   5.0 ms    [User: 246.4 ms, System: 55.0 ms]
  Range (min … max):    24.2 ms …  47.5 ms    76 runs
 
Benchmark 4: loc redis
  Time (mean ± σ):      36.9 ms ±  30.6 ms    [User: 756.5 ms, System: 20.7 ms]
  Range (min … max):     9.9 ms … 123.9 ms    71 runs
 
Benchmark 5: polyglot redis
  Time (mean ± σ):      21.8 ms ±   0.9 ms    [User: 32.1 ms, System: 46.3 ms]
  Range (min … max):    20.0 ms …  28.4 ms    138 runs
 
Summary
  'scc -c redis' ran
    1.19 ± 0.14 times faster than 'scc redis'
    1.28 ± 0.12 times faster than 'polyglot redis'
    1.98 ± 0.33 times faster than 'tokei redis'
    2.17 ± 1.81 times faster than 'loc redis'

Benchmark 1: scc cpython
  Time (mean ± σ):      52.6 ms ±   3.8 ms    [User: 624.3 ms, System: 121.5 ms]
  Range (min … max):    45.3 ms …  62.3 ms    47 runs
 
Benchmark 2: scc -c cpython
  Time (mean ± σ):      46.0 ms ±   3.8 ms    [User: 468.0 ms, System: 111.2 ms]
  Range (min … max):    40.0 ms …  58.0 ms    67 runs
 
Benchmark 3: tokei cpython
  Time (mean ± σ):     110.4 ms ±   6.6 ms    [User: 1239.8 ms, System: 114.5 ms]
  Range (min … max):    98.3 ms … 123.6 ms    26 runs
 
Benchmark 4: loc cpython
  Time (mean ± σ):      52.9 ms ±  25.2 ms    [User: 1103.0 ms, System: 57.4 ms]
  Range (min … max):    30.0 ms … 118.9 ms    49 runs
 
Benchmark 5: polyglot cpython
  Time (mean ± σ):      82.4 ms ±   3.0 ms    [User: 153.3 ms, System: 168.8 ms]
  Range (min … max):    74.8 ms …  88.7 ms    36 runs
 
Summary
  'scc -c cpython' ran
    1.14 ± 0.13 times faster than 'scc cpython'
    1.15 ± 0.56 times faster than 'loc cpython'
    1.79 ± 0.16 times faster than 'polyglot cpython'
    2.40 ± 0.24 times faster than 'tokei cpython'

Benchmark 1: scc linux
  Time (mean ± σ):     743.0 ms ±  18.8 ms    [User: 17133.4 ms, System: 1280.2 ms]
  Range (min … max):   709.4 ms … 778.8 ms    10 runs
 
Benchmark 2: scc -c linux
  Time (mean ± σ):     528.8 ms ±  11.8 ms    [User: 10272.0 ms, System: 1236.9 ms]
  Range (min … max):   508.9 ms … 543.1 ms    10 runs
 
Benchmark 3: tokei linux
  Time (mean ± σ):     736.5 ms ±  18.2 ms    [User: 13098.3 ms, System: 2276.0 ms]
  Range (min … max):   699.3 ms … 760.8 ms    10 runs
 
Benchmark 4: loc linux
  Time (mean ± σ):     567.1 ms ± 113.4 ms    [User: 15984.5 ms, System: 1037.0 ms]
  Range (min … max):   381.8 ms … 656.3 ms    10 runs
 
Benchmark 5: polyglot linux
  Time (mean ± σ):      1.241 s ±  0.027 s    [User: 2.973 s, System: 2.636 s]
  Range (min … max):    1.196 s …  1.299 s    10 runs
 
Summary
  'scc -c linux' ran
    1.07 ± 0.22 times faster than 'loc linux'
    1.39 ± 0.05 times faster than 'tokei linux'
    1.41 ± 0.05 times faster than 'scc linux'
    2.35 ± 0.07 times faster than 'polyglot linux'

If you enable duplicate detection, expect scc's performance to fall by about 20%.

Performance is tracked across releases and presented below. Currently the most recent release, 3.1.0, is the fastest version.


https://jsfiddle.net/m1w7kgqv/

CI/CD Support

Some CI/CD systems, which will remain nameless, do not render the box-drawing lines used by scc very well. To better support those systems there is a --ci option, which changes the default output to ASCII only.

$ scc --ci main.go
-------------------------------------------------------------------------------
Language                 Files     Lines   Blanks  Comments     Code Complexity
-------------------------------------------------------------------------------
Go                           1       272        7         6      259          4
-------------------------------------------------------------------------------
Total                        1       272        7         6      259          4
-------------------------------------------------------------------------------
Estimated Cost to Develop $6,539
Estimated Schedule Effort 2.268839 months
Estimated People Required 0.341437
-------------------------------------------------------------------------------
Processed 5674 bytes, 0.006 megabytes (SI)
-------------------------------------------------------------------------------

The --format-multi option is especially useful in CI/CD, where you may want several output formats at once for storage or reporting.

Development

If you want to hack away, feel free! PRs are accepted. Some things to keep in mind: if you want to change a language definition, you need to update languages.json and then run go generate, which will convert it into the processor/constants.go file.

For all other changes, ensure you run all tests before submitting. You can run the unit tests with go test ./... but for maximum coverage please run test-all.sh, which runs gofmt, the unit tests and the race detector, and then all of the integration tests. All of those must pass to ensure a stable release.

API Support

The core of scc, the counting engine, is exposed publicly so that it can be integrated into other Go applications. See https://github.com/pinpt/ripsrc for an example of how to do this.

It also powers all of the code calculations displayed on https://searchcode.com/ such as https://searchcode.com/file/169350674/main.go/ making it one of the more widely used code counters in the world.

However, as a quick start, consider the following.

Note that you must pass in the number of bytes in the content in order to ensure it is counted!

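package main

import (
	"fmt"
	"os"

	"github.com/boyter/scc/v3/processor"
)

// statsProcessor receives a callback for every line scc classifies.
type statsProcessor struct{}

// ProcessLine is called once per line; returning true continues counting.
func (p *statsProcessor) ProcessLine(job *processor.FileJob, currentLine int64, lineType processor.LineType) bool {
	switch lineType {
	case processor.LINE_BLANK:
		fmt.Println(currentLine, "lineType", "BLANK")
	case processor.LINE_CODE:
		fmt.Println(currentLine, "lineType", "CODE")
	case processor.LINE_COMMENT:
		fmt.Println(currentLine, "lineType", "COMMENT")
	}
	return true
}

func main() {
	const filename = "somefile.go"

	bts, err := os.ReadFile(filename) // os.ReadFile replaces the deprecated ioutil.ReadFile
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	processor.ProcessConstants() // Required to load the language information and need only be done once

	filejob := &processor.FileJob{
		Filename: filename,
		Language: "Go",
		Content:  bts,
		Callback: &statsProcessor{},
		Bytes:    int64(len(bts)), // must be set or the content will not be counted
	}

	processor.CountStats(filejob)
	fmt.Println("lines", filejob.Lines, "code", filejob.Code, "comments", filejob.Comment, "blanks", filejob.Blank)
}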

Adding/Modifying Languages

To add or modify a language you will need to edit the languages.json file in the root of the project, and then run go generate to build it into the application. You can then go install or go build as normal to produce the binary with your modifications.

Issues

It's possible that you may see the counts vary between runs. This usually means one of two things: either something is changing or locking the files under scc, or you are hitting ulimit restrictions. To change the ulimit see the following links.

To help identify this issue, run scc with verbose output, like so: scc -v . and look for the message "too many open files" in the output. If it is there, you can rectify it by setting your ulimit to a higher value.

Low Memory

If you are running scc in a low memory environment (less than 512 MB of RAM), you may need to set --file-gc-count to a lower value, such as 0, to force the garbage collector to be on at all times.

A sign that this is required is scc crashing with panic errors.

Tests

scc is pretty well tested, with many unit tests, integration tests and benchmarks to ensure that it is fast and complete.

Package

Packaging as of version v3.1.0 is done through https://goreleaser.com/

Containers

Note if you plan to run scc in Alpine containers you will need to build with CGO_ENABLED=0.

The Dockerfile below, based on issue #208, is an example of how to achieve this.

FROM golang AS scc-get

ENV GOOS=linux \
    GOARCH=amd64 \
    CGO_ENABLED=0

ARG VERSION
RUN git clone --branch $VERSION --depth 1 https://github.com/boyter/scc
WORKDIR /go/scc
RUN go build -ldflags="-s -w"

FROM alpine
COPY --from=scc-get /go/scc/scc /bin/
ENTRYPOINT ["scc"]

Badges (beta)

You can use scc to provide badges for your open GitHub, Bitbucket, GitLab or sr.ht repositories. The format to do so is:

https://sloc.xyz/PROVIDER/USER/REPO

An example of the badge for scc is included below, and is used on this page.

[![Scc Count Badge](https://sloc.xyz/github/boyter/scc/)](https://github.com/boyter/scc/)

By default the badge will show the repo's line count. You can also make it show a different category by using the ?category= query string.

Valid values include code, blanks, lines, comments and cocomo, and examples of their appearance are included below.

Scc Count Badge Scc Count Badge Scc Count Badge Scc Count Badge Scc Count Badge

For cocomo you can also set the avg-wage value, similar to scc itself. For example:

https://sloc.xyz/github/boyter/scc/?category=cocomo&avg-wage=1
https://sloc.xyz/github/boyter/scc/?category=cocomo&avg-wage=100000

Note that the avg-wage value must be a positive integer; otherwise it reverts to the default value of 56286.
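For scripts that generate badge links, the URL format and query parameters described above can be assembled with Go's net/url package. badgeURL is a hypothetical helper. It omits avg-wage unless the value is positive, since the service falls back to its default anyway. Note that url.Values.Encode sorts parameters by key, so avg-wage comes before category. Query parameter order is not normally significant.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// badgeURL builds a sloc.xyz badge URL for provider/user/repo. An empty
// category leaves the default (lines); avgWage is only sent for "cocomo"
// and only when positive.
func badgeURL(provider, user, repo, category string, avgWage int) string {
	u := url.URL{
		Scheme: "https",
		Host:   "sloc.xyz",
		Path:   "/" + provider + "/" + user + "/" + repo + "/",
	}
	q := url.Values{}
	if category != "" {
		q.Set("category", category)
	}
	if category == "cocomo" && avgWage > 0 {
		q.Set("avg-wage", strconv.Itoa(avgWage))
	}
	u.RawQuery = q.Encode() // empty query means no "?" is emitted
	return u.String()
}

func main() {
	fmt.Println(badgeURL("github", "boyter", "scc", "", 0))
	fmt.Println(badgeURL("github", "boyter", "scc", "cocomo", 100000))
}
```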

NB: it may not work for VERY large repositories (it has been tested on Apache Hadoop/Spark without issue).

You can find the source code for badges in the repository at https://github.com/boyter/scc/blob/master/cmd/badges/main.go

An example for each supported provider

Languages

List of supported languages. The master version of scc supports 239 languages at last count. Note that this assumes you built from master, and the list might trail behind what is actually supported. To see what your version of scc supports, run scc --languages.


scc's People

Contributors

anthonymastrean, apocelipes, assistcontrol, boyter, carterli, commonloon102, dbaggerman, dependabot[bot], elindsey, elliotwutingfeng, foxdd, fschaefer, garklein, heliodex, jan-guenter, kaathewisegit, lhoupert, lukas-brenning, lunarwatcher, nemith, pombredanne, righolt, rmg, rtennill, serkonda7, shynur, steverusso, sylr, vearutop, walter-weinmann


scc's Issues

Doesn't exit with non-zero exit code on flag error

$ scc --bad-flag . 
Error: unknown flag: --bad-flag
Usage:
  scc [flags]

Flags:
      --avg-wage int          average wage value used for basic COCOMO calculation (default 56286)
      --binary                disable binary file detection
      --by-file               display output for every file
      --cocomo                remove COCOMO calculation output
      --debug                 enable debug output
      --exclude-dir strings   directories to exclude (default [.git,.hg,.svn])
      --file-gc-count int     number of files to parse before turning the GC on (default 10000)
  -f, --format string         set output format [tabular, wide, json, csv] (default "tabular")
  -h, --help                  help for scc
  -i, --include-ext strings   limit to file extensions [comma separated list: e.g. go,java,js]
  -l, --languages             print supported languages and extensions
  -c, --no-complexity         skip calculation of code complexity
  -d, --no-duplicates         remove duplicate files from stats and output
  -M, --not-match string      ignore files and directories matching regular expression
  -o, --output string         output filename (default stdout)
  -s, --sort string           column to sort by [files, name, lines, blanks, code, comments, complexity] (default "files")
  -t, --trace                 enable trace output. Not recommended when processing multiple files
  -v, --verbose               verbose output
      --version               version for scc
  -w, --wide                  wider output with additional statistics (implies --complexity)

$ echo $?
0

Some Ideas

Hi guys, thank you for building the project; I really enjoyed it.

Here are some ideas/suggestions.

  1. Angular projects put all their logic in HTML files, but the estimated cost is zero. Perhaps if one finds ng-app or ng-if (or whatever the Angular kids are doing these days) it should be possible to add complexity to the mix. I think the problem is that when code is hiding inside HTML files, as is often the case with JS, the cost counter under-estimates the cost. The opposite is also true:
    when the JavaScript files have declarative structures, like JSX or React-looking things, it is possible the tool considers these "code" when in reality they are merely "layout", and it may over-estimate the cost.

Is there something that can be done about this?
Detecting HTML or XML inside JS and starting a React mode?

  2. How come the cost is always so high? I know estimating isn't perfect, but did it really take 4 people, 11 months, and nearly half a million dollars to build this particular tool? (Maybe even WAY more, considering salaries might be very underestimated given the talents of the SCC team?)

Is it over-estimated?
Under-estimated?

  3. Should there be a link explaining the numbers and how they were calculated,
    simply, so that when managers see the numbers they don't run away afraid?

  4. Can the tool add some default ignore folders, like vendor in PHP and node_modules?

  5. Can minified code be detected and omitted when a flag is present? (Mostly libraries not written by the code authors, or build artifacts.)

  6. How should things like md, txt and html be calculated for cost, particularly in the presence of generators?

  7. Can there be an estimate for the cost of maintenance?

  8. Should cost consider the complexity of programming languages? 10 lines of CSS are not the same as 10 lines of ML code, 10 lines of template meta-programming multi-threaded hardened C++ server code, or 10 lines of GLSL shader code. Or maybe 10 lines of a language like Haskell might be counted in complexity the same as 10 lines of BASIC; does that even make sense? Can there be a parameter where one can indicate the average cost for each developer? I know estimates aren't perfect, but I'd like to know this.

  9. How is code that generates other code estimated for cost? Is that more complex?

  10. What about self-modifying code?

  11. Do generics impact the estimated cost? What about things like macros? Can configuration files be auto-detected as such?

I leave these ideas here in the hopes they spark an interesting conversation :)

localhost:scc-master b$ ~/scc .
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines     Code  Comments   Blanks Complexity
───────────────────────────────────────────────────────────────────────────────
Java                        21      3725     2345       620      760        521
Go                          19      5452     4359       454      639        673
Python                       8       343      306        19       18         29
Markdown                     5       614      498         0      116          0
Powershell                   2       240       46       159       35          8
Plain Text                   2        31       24         0        7          0
License                      2        45       37         0        8          0
ignore                       2         2        2         0        0          0
Report Definition L…         1         0        0         0        0          0
Futhark                      1        29       16        10        3          2
Alloy                        1        50        4        40        6          0
Gherkin Specificati…         1         0        0         0        0          0
gitignore                    1        33       19         7        7          0
JSON                         1      6852     6851         0        1          0
nuspec                       1        22       22         0        0          0
Alchemist                    1        20       20         0        0         55
Wren                         1       188      131        35       22          8
Luna                         1        23       17         1        5          0
Q#                           1        31       23         2        6          5
Bitbucket Pipeline           1        23       22         0        1          0
Macromedia eXtensib…         1         0        0         0        0          0
TOML                         1        38        9        25        4          0
Varnish Configurati…         1         0        0         0        0          0
JavaServer Pages             1         0        0         0        0          0
Extensible Styleshe…         1         0        0         0        0          0
Docker ignore                1         3        2         1        0          0
Flow9                        1        21       12         6        3          5
Bosque                       1       179      139         8       32          1
Freemarker Template          1         0        0         0        0          0
Coq                          1       168      141         9       18          5
SystemVerilog                1        79       51        21        7          5
Monkey C                     1        74       62         0       12         12
YAML                         1        20       19         0        1          0
Shell                        1       294      252         8       34         36
V                            1        73       35        30        8          3
───────────────────────────────────────────────────────────────────────────────
Total                       88     18672    15464      1455     1753       1368
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop $479,052
Estimated Schedule Effort 11.599424 months
Estimated People Required 4.892170
───────────────────────────────────────────────────────────────────────────────

Mix of CRLF (windows) and LF line endings

There seems to be a mix of CRLF and LF line endings when checked out. Perhaps a .gitattributes file should be in place?

https://help.github.com/articles/dealing-with-line-endings/

CRLF:

$ grep -lr $'\r' * 
CODE_OF_CONDUCT.md
CONTRIBUTING.md
examples/speedwalk/main.go
examples/godirwalk/main.go
examples/mmap/main_test.go
examples/mmap/main.go
examples/complexity/complexity.java
examples/create_folders_with_files.py
examples/walk/main.go
examples/stringconcat/main_test.go
examples/fileread/100k
examples/fileread/main_test.go
examples/fileread/10k
examples/fileread/main.go
examples/fileread/fileread
examples/fileread/1000k
examples/fileread/linuxaverage
examples/cwalk/main.go
examples/nativewalk/main.go
.git/objects/pack/pack-3c13ce3ebdc90d9918848fd20574e2bf9ed60025.idx
.git/objects/pack/pack-3c13ce3ebdc90d9918848fd20574e2bf9ed60025.pack
.git/index
languages.json
processor/helpers.go
processor/helpers_test.go
processor/formatters_test.go
processor/cocomo_test.go
processor/file.go
processor/structs_test.go
processor/file_test.go
processor/processor_test.go
processor/cocomo.go
UNLICENSE

LF:

$ grep -Lr $'\r' * 
assets/database_languages2.json
assets/merge.py
assets/database_languages.json
examples/onenewline.py
examples/oneline.py
examples/twonewline.py
examples/threenewline.py
examples/twolines.py
examples/nolines.py
examples/fileread/textfile.json
.git/logs/HEAD
.git/logs/refs/heads/master
.git/logs/refs/remotes/origin/HEAD
.git/hooks/pre-receive.sample
.git/hooks/pre-rebase.sample
.git/hooks/update.sample
.git/hooks/pre-applypatch.sample
.git/hooks/post-update.sample
.git/hooks/pre-commit.sample
.git/hooks/pre-push.sample
.git/hooks/commit-msg.sample
.git/hooks/fsmonitor-watchman.sample
.git/hooks/prepare-commit-msg.sample
.git/hooks/applypatch-msg.sample
.git/description
.git/packed-refs
.git/info/exclude
.git/HEAD
.git/refs/heads/master
.git/refs/remotes/origin/HEAD
.git/config
.gitignore
Gopkg.lock
Gopkg.toml
LICENSE
main.go
processor/workers.go
processor/processor.go
processor/workers_test.go
processor/structs.go
processor/formatters.go
processor/constants.go
README.md
scripts/include.go
.travis.yml
vendor/github.com/monochromegane/go-gitignore/match.go
vendor/github.com/monochromegane/go-gitignore/util.go
vendor/github.com/monochromegane/go-gitignore/pattern.go
vendor/github.com/monochromegane/go-gitignore/index_scan_patterns.go
vendor/github.com/monochromegane/go-gitignore/gitignore.go
vendor/github.com/monochromegane/go-gitignore/patterns.go
vendor/github.com/monochromegane/go-gitignore/.travis.yml
vendor/github.com/monochromegane/go-gitignore/README.md
vendor/github.com/monochromegane/go-gitignore/LICENSE
vendor/github.com/monochromegane/go-gitignore/initial_holder.go
vendor/github.com/monochromegane/go-gitignore/depth_holder.go
vendor/github.com/monochromegane/go-gitignore/full_scan_patterns.go
vendor/github.com/pkg/errors/stack.go
vendor/github.com/pkg/errors/appveyor.yml
vendor/github.com/pkg/errors/errors.go
vendor/github.com/pkg/errors/.travis.yml
vendor/github.com/pkg/errors/.gitignore
vendor/github.com/pkg/errors/README.md
vendor/github.com/pkg/errors/LICENSE
vendor/github.com/urfave/cli/flag.go
vendor/github.com/urfave/cli/help.go
vendor/github.com/urfave/cli/flag-types.json
vendor/github.com/urfave/cli/CHANGELOG.md
vendor/github.com/urfave/cli/funcs.go
vendor/github.com/urfave/cli/generate-flag-types
vendor/github.com/urfave/cli/flag_generated.go
vendor/github.com/urfave/cli/category.go
vendor/github.com/urfave/cli/appveyor.yml
vendor/github.com/urfave/cli/runtests
vendor/github.com/urfave/cli/app.go
vendor/github.com/urfave/cli/command.go
vendor/github.com/urfave/cli/errors.go
vendor/github.com/urfave/cli/.travis.yml
vendor/github.com/urfave/cli/cli.go
vendor/github.com/urfave/cli/.gitignore
vendor/github.com/urfave/cli/README.md
vendor/github.com/urfave/cli/LICENSE
vendor/github.com/urfave/cli/context.go
vendor/github.com/urfave/cli/.flake8
vendor/github.com/edsrzf/mmap-go/mmap_unix.go
vendor/github.com/edsrzf/mmap-go/mmap_windows.go
vendor/github.com/edsrzf/mmap-go/.gitignore
vendor/github.com/edsrzf/mmap-go/README.md
vendor/github.com/edsrzf/mmap-go/msync_netbsd.go
vendor/github.com/edsrzf/mmap-go/msync_unix.go
vendor/github.com/edsrzf/mmap-go/LICENSE
vendor/github.com/edsrzf/mmap-go/mmap.go
vendor/github.com/iafan/cwalk/cwalk.go
vendor/github.com/iafan/cwalk/MIT-LICENSE.txt
vendor/github.com/iafan/cwalk/.gitignore
vendor/github.com/iafan/cwalk/README.md
vendor/github.com/MichaelTJones/walk/symlink_windows.go
vendor/github.com/MichaelTJones/walk/path_windows.go
vendor/github.com/MichaelTJones/walk/path_plan9.go
vendor/github.com/MichaelTJones/walk/path_unix.go
vendor/github.com/MichaelTJones/walk/walk.go
vendor/github.com/MichaelTJones/walk/README.md
vendor/github.com/MichaelTJones/walk/symlink.go
vendor/github.com/karrick/godirwalk/withoutNamlen.go
vendor/github.com/karrick/godirwalk/readdir_unix.go
vendor/github.com/karrick/godirwalk/dirent_fileno.go
vendor/github.com/karrick/godirwalk/readdir_windows.go
vendor/github.com/karrick/godirwalk/doc.go
vendor/github.com/karrick/godirwalk/withNamlen.go
vendor/github.com/karrick/godirwalk/walk.go
vendor/github.com/karrick/godirwalk/go.mod
vendor/github.com/karrick/godirwalk/.gitignore
vendor/github.com/karrick/godirwalk/README.md
vendor/github.com/karrick/godirwalk/LICENSE
vendor/github.com/karrick/godirwalk/readdir.go
vendor/github.com/karrick/godirwalk/dirent_ino.go

Panic When Using Glob

I got a panic when running scc on the command line in Windows 10.

The command was:

scc */*.cpp 

when run on a directory with no sub-directories (I forgot which directory I was in and ran this by accident). The same command works fine when there are sub-directories to enter.

The error message was:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x10 pc=0x52ad92]

goroutine 4 [running]:
github.com/boyter/scc/processor.walkDirectoryParallel(0x1140a080, 0x7, 0x11482140)
        /home/bboyter/Go/src/github.com/boyter/scc/processor/file.go:84 +0xf2
github.com/boyter/scc/processor.Process.func1.1(0x1140a080, 0x7, 0x11482140, 0x1140a070)
        /home/bboyter/Go/src/github.com/boyter/scc/processor/processor.go:344 +0x31
created by github.com/boyter/scc/processor.Process.func1
        /home/bboyter/Go/src/github.com/boyter/scc/processor/processor.go:343 +0xa5

Not a big issue, since the glob doesn't match any files anyway, but an error message would be nicer than a panic.

First comment line in C# is counted as Code

Describe the bug
When the first line in a C# file contains a single-line comment (starting with //), it is counted as code.

To Reproduce

  1. Download file FooClass.zip
  2. Extract
  3. Run scc

Expected behavior

───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines     Code  Comments   Blanks Complexity
───────────────────────────────────────────────────────────────────────────────
C#                           1        14       11         2        1          0
───────────────────────────────────────────────────────────────────────────────
Total                        1        14       11         2        1          0
───────────────────────────────────────────────────────────────────────────────

Actual behavior

───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines     Code  Comments   Blanks Complexity
───────────────────────────────────────────────────────────────────────────────
C#                           1        14       12         1        1          0
───────────────────────────────────────────────────────────────────────────────
Total                        1        14       12         1        1          0
───────────────────────────────────────────────────────────────────────────────

Desktop

  • Windows 10

Nicer help message

The help message produced by scc is somewhat confusing and hard to read.

The help text is currently 170 columns wide. It could benefit from line wrapping, ideally automatic based on terminal width, but I understand if that's out of scope.

Current:

   --wide, -w           Set to check produce more output such as complexity and code vs complexity ranking. Same as setting format to wide

Wrapping for 100 columns:

   --wide, -w           Set to check produce more output such as complexity and
                        code vs complexity ranking. Same as setting format to wide.

In addition, many of the help messages are phrased awkwardly, which makes the width issue worse.

For example, --files has the description "Set to specify you want to see the output for every file" which could be written more concisely as "Display output for every file".

options not recognized if path is specified

Thanks for the great app.
Using scc version 1.6.0, I noted the following correct behaviour:

➜  drgo scc --wl=go  
-------------------------------------------------------------------------------
Language                 Files     Lines     Code  Comments   Blanks Complexity
-------------------------------------------------------------------------------
Go                         377     47522    34848      7085     5589       6683
-------------------------------------------------------------------------------
Total                      377     47522    34848      7085     5589       6683
-------------------------------------------------------------------------------
Estimated Cost to Develop $1,124,299
Estimated Schedule Effort 16.040798 months
Estimated People Required 8.302536
-------------------------------------------------------------------------------

But this does not work

drgo scc rosewood --wl=go
-------------------------------------------------------------------------------
Language                 Files     Lines     Code  Comments   Blanks Complexity
-------------------------------------------------------------------------------
Go                          57      5532     3723      1313      496        820
HTML                        48      3464     3078         0      386          0
Plain Text                   9      1523     1463         0       60          0
JSON                         6       702      702         0        0          0
Markdown                     5       267      201         0       66          0
ASP.NET                      4        69       69         0        0          1
CSS                          4       295      211        33       51          0
JavaScript                   1        18       12         3        3          2
Makefile                     1        56       32         9       15          1
gitignore                    1        68       39        13       16          0
-------------------------------------------------------------------------------
Total                      136     11994     9530      1371     1093        824
-------------------------------------------------------------------------------
Estimated Cost to Develop $288,166
Estimated Schedule Effort 9.562139 months
Estimated People Required 3.569788
-------------------------------------------------------------------------------

Please let me know if you would like any assistance.

Thanks

/salah
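A likely explanation, offered with some hedging: scc 1.6.0 appears to use urfave/cli (it shows up in the vendor listing in the line-endings issue below). That library builds on Go's standard flag package, and flag stops parsing at the first non-flag argument. In scc rosewood --wl=go, the --wl=go is therefore treated as a second path. The sketch below demonstrates this standard-library behaviour. The parse function is illustrative, not scc's code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parse mimics a CLI with a single --wl flag and returns the flag value
// plus the remaining positional arguments.
func parse(args []string) (wl string, rest []string, err error) {
	fs := flag.NewFlagSet("scc", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	fs.StringVar(&wl, "wl", "", "restrict counting to these extensions")
	if err = fs.Parse(args); err != nil {
		return "", nil, err
	}
	return wl, fs.Args(), nil
}

func main() {
	for _, args := range [][]string{
		{"--wl=go", "rosewood"}, // flag before the path: --wl is applied
		{"rosewood", "--wl=go"}, // path first: parsing stops at "rosewood"
	} {
		wl, rest, err := parse(args)
		fmt.Printf("args=%q wl=%q rest=%q err=%v\n", args, wl, rest, err)
	}
}
```

Until the parser is changed, the workaround is to put flags before the path, as in scc --wl=go rosewood.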

Add support for typings definition

Filename: test.d.ts
Should identify as: TS Typings
Use the same definitions as TypeScript

Requires a change to the way extensions are calculated. It needs to split on the first . and check for d.ts, then fall back to .ts.
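The approach described here can be sketched as follows. First try everything after the first . of the file name as the extension, then fall back to the final extension if that is unknown. The extToLanguage table is a stand-in for illustration; scc's real mapping is generated from languages.json.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Hypothetical extension table for this sketch only.
var extToLanguage = map[string]string{
	"d.ts": "TS Typings",
	"ts":   "TypeScript",
	"go":   "Go",
}

// languageFor checks everything after the first '.' of the base name first
// (so "test.d.ts" tries "d.ts"), then falls back to the final extension.
func languageFor(filename string) (string, bool) {
	base := filepath.Base(filename)
	if i := strings.Index(base, "."); i >= 0 {
		if lang, ok := extToLanguage[strings.ToLower(base[i+1:])]; ok {
			return lang, true
		}
	}
	ext := strings.TrimPrefix(filepath.Ext(base), ".")
	lang, ok := extToLanguage[strings.ToLower(ext)]
	return lang, ok
}

func main() {
	for _, name := range []string{"test.d.ts", "src/app.ts", "vendor.min.ts", "main.go"} {
		lang, ok := languageFor(name)
		fmt.Printf("%-14s -> %q (known: %v)\n", name, lang, ok)
	}
}
```

Because the full suffix is tried first, a name like vendor.min.ts still falls back to TypeScript, as min.ts is not in the table.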

Improve Performance Through Intelligent Memory Management

The big takeaway from https://boyter.org/posts/sloc-cloc-code/ is that disabling the garbage collector speeds processing up by a factor of two. If it were possible to eliminate the GC pressure that would be a free performance gain.

Some ideas.

  • Profile memory and look for any obvious gains
  • Investigate checking for the available memory on the system and the memory usage and turn off/on the GC

A short term gain might be to just turn off GC until a threshold of files has been identified in the walker. Then turn it on.

false statistics about a language that is not actually present in repo.

Describe the bug
When I run scc from the command line directly in the repo directory I get correct statistics, but when I run it as shown below I also get ASP.NET in the statistics. How can that happen?

To Reproduce

  1. Use scc from the command line.
  2. Command line arguments: $ scc /path/to/a/repo

Expected behavior
I expect it to show statistics about Golang, Markdown, Makefile and gitignore only.

Desktop (please complete the following information):

  • OS: [Linux Ubuntu 19.04]

Blanks are not counted correctly in Java

Describe the bug
In some cases blank lines are not counted correctly in Java files.
I've simplified the example somewhat, but could not pinpoint problematic lines.

To Reproduce

  1. Download file AbstractSACParser.zip
  2. Extract
  3. Run scc

Expected behavior

───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines     Code  Comments   Blanks Complexity
───────────────────────────────────────────────────────────────────────────────
Java                         1       139      103        16       20          8

Actual behavior

───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines     Code  Comments   Blanks Complexity
───────────────────────────────────────────────────────────────────────────────
Java                         1       139      119        15        5          8

Desktop

  • Windows 10

Unable to tell the difference between Coq and Verilog

This is becoming more of an issue, especially with the news that V Lang is also going to use the .v extension, along with Coq and Verilog (vlang/v#6).

Annoying, because it's quite a large refactor to accommodate this. More to the point, how should it be implemented? One thought is to scan the first 500 bytes or so and look for tokens that indicate one language or another. This is likely to be problematic, however.
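A minimal sketch of the "scan the first 500 bytes" idea: score each candidate language by counting indicator tokens and pick the highest. The token lists are illustrative guesses that would need tuning against real repositories; the bare word `module` is deliberately left out because both Verilog and V use it. `guessDotV` is a hypothetical name, not part of scc:

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffBytes is how much of each file is inspected.
const sniffBytes = 500

// Hypothetical indicator tokens per language.
var indicators = map[string][]string{
	"Coq":     {"Theorem ", "Lemma ", "Proof.", "Qed.", "Require Import"},
	"Verilog": {"endmodule", "always @", "assign ", "wire ", "reg "},
	"V":       {"fn ", "mut ", "pub fn"},
}

// guessDotV counts indicator tokens in the first sniffBytes of content and
// returns the best-scoring language, or fallback if nothing matched.
func guessDotV(content []byte, fallback string) string {
	if len(content) > sniffBytes {
		content = content[:sniffBytes]
	}
	best, bestScore := fallback, 0
	// A fixed iteration order keeps ties deterministic (map order is random).
	for _, lang := range []string{"Coq", "Verilog", "V"} {
		score := 0
		for _, tok := range indicators[lang] {
			score += bytes.Count(content, []byte(tok))
		}
		if score > bestScore {
			best, bestScore = lang, score
		}
	}
	return best
}

func main() {
	fmt.Println(guessDotV([]byte("Theorem t : True.\nProof.\n  trivial.\nQed.\n"), "Verilog"))
	fmt.Println(guessDotV([]byte("module add(input a, b, output y);\n  assign y = a ^ b;\nendmodule\n"), "Verilog"))
	fmt.Println(guessDotV([]byte("module main\n\nfn main() {\n\tmut x := 1\n}\n"), "Verilog"))
}
```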

It might be worth looking at how https://github.com/vmchale/polyglot does it and copying that approach.

Things to keep in mind: this needs to be done for every file with the extension, at least up to some point, so it needs to be fast. One possibility is that after checking some number of files, if all of them have been determined to be Coq, flip over to assuming Coq for every remaining file.
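A sketch of the "flip over after some number of files" shortcut, assuming any per-file classifier can be plugged in. `stickyDetector` and `lockAfter` are hypothetical names. The trade-off is that once it locks, a later file in a different language would be misclassified:

```go
package main

import "fmt"

// stickyDetector wraps a per-file classifier. If the first lockAfter files
// with the ambiguous extension all get the same answer, it stops calling the
// classifier and returns that answer for every remaining file.
type stickyDetector struct {
	classify  func(content []byte) string
	lockAfter int
	seen      int
	first     string
	agree     bool // false as soon as two files disagree; then it never locks
	locked    bool
}

func newStickyDetector(classify func([]byte) string, lockAfter int) *stickyDetector {
	return &stickyDetector{classify: classify, lockAfter: lockAfter, agree: true}
}

func (s *stickyDetector) Detect(content []byte) string {
	if s.locked {
		return s.first // fast path: no scanning at all
	}
	lang := s.classify(content)
	s.seen++
	if s.seen == 1 {
		s.first = lang
	} else if lang != s.first {
		s.agree = false
	}
	if s.agree && s.seen >= s.lockAfter {
		s.locked = true
	}
	return lang
}

func main() {
	calls := 0
	classify := func([]byte) string { calls++; return "Coq" } // stand-in classifier
	d := newStickyDetector(classify, 3)
	for i := 0; i < 100; i++ {
		d.Detect([]byte("Theorem t : True."))
	}
	fmt.Println("classifier calls:", calls) // 3: the other 97 used the locked result
}
```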

API support

I've been looking into your repo and would like to know if you might be interested in turning this into an API, so that it could be used as a library in addition to a command-line program.

Looking through your code, it seems like we could generalize FileJob into a more generic type or interface and make countStats public, and it could then easily be called as a library.

My use case is that I'd like to be able to feed a reader into countStats so that I can dynamically pipe a source file (in my case from memory) into your API and have it return me the results.

Is this something you'd be interested in discussing further?

--exclude flag issues once you have >3, or after a directory exclusion

It seems that when you have more than three exclusions, it forgets the first three and starts over.

That is, the output of:

C:\Tools> scc X:\path\to\repo

is the same as

C:\Tools> scc -e thing1-e thing2 -e thing3 -e nonexistent X:\path\to\repo

Similarly, the output of (assuming thing4 exists):

C:\Tools> scc -e thing4 X:\path\to\repo

is the same as:

C:\Tools> scc -e thing1-e thing2 -e thing3 -e thing4 X:\path\to\repo

Also, exclusions (both file and directory) placed after a directory exclusion seem to drop that directory exclusion, but reversing the order works if one of them is a file.

zsh: command not found: scc

I installed it with go get -u github.com/boyter/scc/.

But running scc outputs zsh: command not found: scc

Desktop (please complete the following information):

  • OS: Windows WSL Ubuntu

Use scc as a library

It would be great if the code could be updated so that scc can be used as a library.

Feature: Support .ignore file

It's used by both ag and rg and should be supported. It would be a nice way to add vendor as an exclusion, for example.

Missing packages during build

When building via go build, I receive the following errors:

main.go:4:2: cannot find package "github.com/boyter/scc/processor" in any of:
    /usr/local/Cellar/go/1.10.2/libexec/src/github.com/boyter/scc/processor (from $GOROOT)
    /Users/myuser/go/src/github.com/boyter/scc/processor (from $GOPATH)
main.go:5:2: cannot find package "github.com/urfave/cli" in any of:
    /usr/local/Cellar/go/1.10.2/libexec/src/github.com/urfave/cli (from $GOROOT)
    /Users/myuser/go/src/github.com/urfave/cli (from $GOPATH)

The problem is fixed by installing the necessary dependencies (as indicated above) with:

go get github.com/boyter/scc/processor github.com/urfave/cli

The build then works. I'm considering this a workaround and only added this issue as documentation for others who may (will?) run into this.

OS: macOS Sierra (10.12.6)
go version: go1.10.2 darwin/amd64

PS: I'm new to Go, so bear with me if this is implicit.

Template Language Support

Languages like Mako, Jinja, etc. can break into code. However, scc is unaware of this and considers the whole file to be code. There needs to be some sort of system that allows toggling between code and comment mode, i.e. inverting the usual case: instead of expecting code that can switch into a comment, expect non-code text that can switch into code.

A quick hack to resolve this might be a flag that starts the file in comment mode by default, while still running the complexity checks in order to find the template conditionals.

Verilog files identified as V

Describe the bug

Verilog files are being identified as V.

To Reproduce

  1. Clone https://github.com/seldridge/verilog.git
  2. Run scc over the folder ./src inside it
  3. Observe the output where V language is identified

$ scc
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines     Code  Comments   Blanks Complexity
───────────────────────────────────────────────────────────────────────────────
SystemVerilog                7       598      383       171       44         66
V                            6       965      771       143       51         36
───────────────────────────────────────────────────────────────────────────────
Total                       13      1563     1154       314       95        102
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop $31,398
Estimated Schedule Effort 4.118292 months
Estimated People Required 0.903127
───────────────────────────────────────────────────────────────────────────────

Expected behavior

All the files in this directory should be identified as Verilog.

Desktop (please complete the following information):

  • OS: Windows, Linux, macOS
  • Version 2.4.0

Allow passing multiple regexes (--not-match)

Consider accepting multiple regular expressions instead of just one. For example:

scc --not-match ".*\.csv" --not-match "specialDirectory/.*\.txt" .

Currently scc only uses the last --not-match regex specified.
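A sketch of a repeatable flag using only the standard `flag` package: a type implementing `flag.Value` appends on every `Set` call, so each `--not-match` adds a pattern instead of replacing the previous one. scc itself uses cobra, so this illustrates the technique rather than a patch; `regexList` and `parseNotMatch` are hypothetical names:

```go
package main

import (
	"flag"
	"fmt"
	"regexp"
	"strings"
)

// regexList implements flag.Value; every occurrence of the flag is appended.
type regexList []*regexp.Regexp

func (r *regexList) String() string {
	if r == nil {
		return ""
	}
	parts := make([]string, len(*r))
	for i, re := range *r {
		parts[i] = re.String()
	}
	return strings.Join(parts, ",")
}

func (r *regexList) Set(value string) error {
	re, err := regexp.Compile(value)
	if err != nil {
		return err // an invalid regex is reported as a flag error
	}
	*r = append(*r, re)
	return nil
}

// excluded reports whether path matches any of the patterns.
func (r regexList) excluded(path string) bool {
	for _, re := range r {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

func parseNotMatch(args []string) (regexList, []string, error) {
	fs := flag.NewFlagSet("scc-sketch", flag.ContinueOnError)
	var notMatch regexList
	fs.Var(&notMatch, "not-match", "regex of paths to ignore (repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, nil, err
	}
	return notMatch, fs.Args(), nil
}

func main() {
	notMatch, rest, err := parseNotMatch([]string{
		"--not-match", `.*\.csv`,
		"--not-match", `specialDirectory/.*\.txt`,
		".",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(notMatch), "patterns, paths:", rest)
	for _, p := range []string{"data/a.csv", "specialDirectory/notes.txt", "main.go"} {
		fmt.Println(p, notMatch.excluded(p))
	}
}
```

Go's `flag` package accepts both `-not-match` and `--not-match`, and stops at the first non-flag argument, which here is the path.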

C# line count is broken if verbatim string with backslash is present

Describe the bug
The count is broken when a C# file contains a line with a verbatim string like this:
private const string BasePath = @"a:\";

To Reproduce

  1. Download file SccTokeiFailure.zip
  2. Extract
  3. scc

Actual behavior

Language                 Files     Lines     Code  Comments   Blanks Complexity
C#                           1        20       20         0        0          0

Expected behavior

Language                 Files     Lines     Code  Comments   Blanks Complexity
C#                           1        20       14         3        3          0

Tested on

  • Debian 9.7
  • Windows 10
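The likely trap is that in a C# verbatim string (`@"..."`) a backslash is literal and the only escape is a doubled quote (`""`), whereas in a regular string a backslash escapes the next character. A scanner that treats `@"a:\"` as a regular string sees `\"` as an escaped quote and never leaves the string. A minimal sketch of a state machine that distinguishes the two (character literals, comments and interpolated strings are ignored for brevity; `scanLine` is a hypothetical name, not scc's code):

```go
package main

import "fmt"

// scanState is the lexer state at the end of a line of C#.
type scanState int

const (
	inCode     scanState = iota
	inString             // "...": backslash escapes the next character
	inVerbatim           // @"...": backslash is literal, "" is an escaped quote
)

// scanLine advances the state across one line and returns the new state.
func scanLine(line string, st scanState) scanState {
	for i := 0; i < len(line); i++ {
		c := line[i]
		switch st {
		case inCode:
			if c == '@' && i+1 < len(line) && line[i+1] == '"' {
				st = inVerbatim
				i++ // skip the opening quote
			} else if c == '"' {
				st = inString
			}
		case inString:
			if c == '\\' {
				i++ // skip the escaped character
			} else if c == '"' {
				st = inCode
			}
		case inVerbatim:
			if c == '"' {
				if i+1 < len(line) && line[i+1] == '"' {
					i++ // "" is an escaped quote: stay inside the string
				} else {
					st = inCode
				}
			}
		}
	}
	return st
}

func main() {
	st := scanLine(`private const string BasePath = @"a:\";`, inCode)
	fmt.Println(st == inCode) // true: the verbatim string closed on this line
}
```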

scc duplicate detection -d option produces odd results

Describe the bug

When running scc with the -d option, the file counts sometimes change between runs.

To Reproduce

Run from the command line with the -d option.

counting-tests>scc.exe -d
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines     Code  Comments   Blanks Complexity
───────────────────────────────────────────────────────────────────────────────
Java                        52      6665     2439      3411      815         95
XML                         11       651      519        56       76          0
Plain Text                   5       128       91         0       37          0
XML Schema                   2       958      906         0       52          0
gitignore                    1        47       26        10       11          0
Properties File              1         5        3         2        0          0
───────────────────────────────────────────────────────────────────────────────
Total                       72      8454     3984      3479      991         95
───────────────────────────────────────────────────────────────────────────────

// !!!!!!!!!!!!!!
// NOW HERE the number of Java files goes down by 4
// and the number of Properties files is 2 instead of 1
// !!!!!!!!!!!!!!

counting-tests>scc.exe -d
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines     Code  Comments   Blanks Complexity
───────────────────────────────────────────────────────────────────────────────
Java                        48      6201     2269      3172      760         91
XML                         11       651      519        56       76          0
Plain Text                   5       128       91         0       37          0
Properties File              2        12        8         4        0          0
XML Schema                   2       958      906         0       52          0
gitignore                    1        47       26        10       11          0
───────────────────────────────────────────────────────────────────────────────
Total                       69      7997     3819      3242      936         91
───────────────────────────────────────────────────────────────────────────────

Expected behavior

The number of Java and Properties files should not change between runs with the -d option. Note that --by-file output may legitimately differ between runs, since which file is marked as the duplicate depends on which one is processed first.

Desktop (please complete the following information):

  • OS: Windows
  • Version: scc 2.1.0 and 2.2.0

Ignore files in gitignore

This might be against the purpose of this project, but personally I've been spoiled by ag.

When I was using scc to measure a few of my projects, it spent an awfully long time working in node_modules and dist, when I just wanted to see the results for my TypeScript. Ignoring files listed in .gitignore would make the results more accurate in this case.

On the other hand, it is interesting to see the cumulative amount of work done in an open source project, so maybe it should just be a flag.

Python docstrings are counted as code

I tried scc 2.2.0 on Windows on a Python file, and it appears that Python docstrings are counted as code instead of comments.

'''This is a module docstring'''

class C:
  '''
  This is a class docstring
  '''
  
  def f():
    """This is a function docstring.
    simple quotes and double quotes are equivalent
    """
    pass

lines: 12
code: 10
comments: 0
blanks: 2

This should be:
lines: 12
code: 3
comments: 7
blanks: 2

Missing Language Output

# bboyter @ SurfaceBook2 in ~/Go/src/github.com/boyter/scc/examples/long on git:master x [9:18:03]
$ scc
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines     Code  Comments   Blanks Complexity
───────────────────────────────────────────────────────────────────────────────
                             7         0        0         0        0          0
───────────────────────────────────────────────────────────────────────────────
Total                        7         0        0         0        0          0
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop $0
Estimated Schedule Effort 0.000000 months
Estimated People Required NaN
───────────────────────────────────────────────────────────────────────────────

Not sure what is going on here. Need to investigate before release.

Allow passing multiple directories

For example:

scc src/ include/

Currently scc only uses the first path:

scc/main.go, lines 20 to 25 in 2a32681:

Run: func(cmd *cobra.Command, args []string) {
	processor.DirFilePaths = args
	processor.ConfigureGc()
	processor.ConfigureLazy(true)
	processor.Process()
},

// DirFilePaths is not set via flags but by arguments following the flags for file or directory to process
var DirFilePaths = []string{}

fpath := filepath.Clean(DirFilePaths[0])

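Formatting issue with Varnish

The language name Varnish Configuration is longer than the others, and its row is shifted one character to the right relative to the other rows.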

$ scc
-------------------------------------------------------------------------------
Language                 Files     Lines     Code  Comments   Blanks Complexity
-------------------------------------------------------------------------------
JavaScript                  67    105600    71931     17008    16661      18812
Python                      66     18236    14135       797     3304       2572
HTML                        46     27966    27239        40      687          0
SVG                         21       581      581         0        0          0
CSS                         21     19368    15996       535     2837          0
Plain Text                  14       880      716         0      164          0
SQL                         10       647      233       367       47          0
Perl                         4     32511    26928      3394     2189       4163
Shell                        4        43       31         7        5          2
Java                         3      2828     1883       633      312        366
JSON                         2      9359     9359         0        0          0
C++                          2      2389     1642       311      436         84
C#                           2        40       35         1        4          4
F#                           2       101       64         2       35          5
Forth                        2        33       23         2        8          3
Markdown                     2        12       11         0        1          0
PHP                          1         2        2         0        0          0
gitignore                    1        57       39        13        5          0
Varnish Configuration         1        50       38         0       12          0
Ruby                         1        22        3        18        1          0
License                      1        20       16         0        4          0
Ruby HTML                    1       267      243         0       24         20
C Header                     1        70       20        40       10          0
-------------------------------------------------------------------------------
Total                      275    221082   171168     23168    26746      26031
-------------------------------------------------------------------------------

Nested Comments in D-Lang Problematic

/* 8 lines 5 code 1 comments 2 blanks */

void main() {
    auto x = 5; /+ a /+ nested +/ comment /* +/
    writefln("hello");
    auto y = 4; // */
}

Note that in the above, /+ +/ comments can nest, while a /* inside a /+ +/ comment does not open a block comment, so the later */ is simply part of a // line comment. Currently scc is unable to deal with this.

Could not identify CMake files in project Vespa

Project: https://github.com/vespa-engine/vespa
Commit: 389801098797ab37c7bc4ac5a3888ef4d92214e7
CMake files are marked as Plain Text.

# scc --version
scc version 1.1.0
# scc
-------------------------------------------------------------------------------
Language                 Files     Lines     Code  Comments   Blanks Complexity
-------------------------------------------------------------------------------
Java                      7264    776647   577947     84844   113856      47825
C++                       3160    568874   479707     16565    72602      51343
C Header                  2802    218981   136097     44848    38036       4763
Plain Text                1208    288657   288021         0      636          0
XML                        620    156449   155398       204      847          0
Module-Definition          252      7498     5992         0     1506        626
Shell                      189      9843     7441      1077     1325       1289
JSON                       135      9918     9904         0       14          0
C++ Header                 133     20360    17209       810     2341       2651
Scala                       59      5242     3811       303     1128        220
Markdown                    49       843      607         0      236          0
Perl                        49      7547     6076       627      844        926
HTML                        18      2171     1950        11      210          0
Autoconf                    12       131      107        19        5          6
Ruby                         9       342      294         9       39         18
Python                       9      1118      815        97      206        191
C                            7      1881     1427       179      275        141
Emacs Lisp                   4      2752     2118       403      231         93
CMake                        3       676      473        79      124         58
Makefile                     2        41       36         2        3          0
YAML                         1        27       19         1        7          0
Dockerfile                   1        16       10         2        4          3
MSBuild                      1         6        6         0        0          0
Protocol Buffers             1       464      143       255       66          0
-------------------------------------------------------------------------------
Total                    15988   2080484  1695608    150335   234541     110153
-------------------------------------------------------------------------------
Estimated Cost to Develop $66,433,488
Estimated Schedule Effort 75.578429 months
Estimated People Required 104.122323
-------------------------------------------------------------------------------
# tokei --version
tokei 7.0.3 compiled without serialization formats.
# tokei -s files
-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Java                 7264       795587       578434       103023       114130
 C++                  3160       570774       481222        16697        72855
 C Header             2802       228614       136351        53756        38507
 CMake                1018        14243        12801         1127          315
 XML                   620       157314       155418          985          911
 Module-Definition     252         7499         5993            0         1506
 Shell                 203        11145         8490         1155         1500
 Plain Text            193       275090       275090            0            0
 JSON                  135         9918         9918            0            0
 C++ Header            133        20455        17210          902         2343
 Scala                  59         5313         3817          365         1131
 Perl                   52         8409         6733          711          965
 Markdown               49          843          843            0            0
 HTML                   18         2190         1950           29          211
 Autoconf               12          131          107           19            5
 Python                  9         1118          817           95          206
 Ruby                    9          342          294            9           39
 C                       7         1918         1432          198          288
 Emacs Lisp              4         2752         2118          403          231
 Makefile                2           41           36            2            3
 Dockerfile              1           16           10            2            4
 Protocol Buffers        1          464          143          255           66
-------------------------------------------------------------------------------
 Total               16003      2114176      1699227       179733       235216
-------------------------------------------------------------------------------
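The two tables suggest where the files went: scc reports 1,015 more Plain Text files than tokei (1,208 vs 193), and tokei reports 1,015 more CMake files than scc (1,018 vs 3). That is consistent with CMake files named `CMakeLists.txt` being matched by their `.txt` extension. A sketch of checking the exact file name before the extension; the lookup tables and the `detect` function are hypothetical stand-ins for a tool's language definitions:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Hypothetical lookup tables, keyed in lower case.
var byFilename = map[string]string{
	"cmakelists.txt": "CMake",
	"makefile":       "Makefile",
	"dockerfile":     "Dockerfile",
}

var byExtension = map[string]string{
	"txt":   "Plain Text",
	"cmake": "CMake",
	"java":  "Java",
}

// detect checks the exact (case-insensitive) file name before the extension.
func detect(p string) (string, bool) {
	base := strings.ToLower(filepath.Base(p))
	if lang, ok := byFilename[base]; ok {
		return lang, true
	}
	ext := strings.TrimPrefix(filepath.Ext(base), ".")
	lang, ok := byExtension[ext] // ext == "" finds nothing
	return lang, ok
}

func main() {
	for _, p := range []string{"searchlib/CMakeLists.txt", "docs/notes.txt", "cmake/functions.cmake"} {
		lang, _ := detect(p)
		fmt.Println(p, "=>", lang)
	}
}
```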

File without an extension is counted as if its name were an extension

Hi,
I just installed scc with go get and tried it on my project. I was surprised by this line:

Robot Framework              1     54580    53988         0      592          0

Actually, I have a binary named robot, and scc counts this binary as a .robot file. When I renamed the binary to robot1, the line disappeared. Your program looks fine, but it should match only on extensions, not on file names that happen to equal an extension name.

P.S.: Counting adoc (AsciiDoc) files is almost the same as counting Markdown files. We use them for documentation. It is a pity adoc is not in the default configuration.
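A sketch of extracting only a real extension: the standard `filepath.Ext` returns the suffix starting at the final dot in the last path element, and an empty string when there is no dot, so a file named `robot` yields no extension at all. `extensionOf` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extensionOf returns the lower-case extension without the dot, or "" when
// the file name has no extension.
func extensionOf(p string) string {
	return strings.ToLower(strings.TrimPrefix(filepath.Ext(p), "."))
}

func main() {
	for _, p := range []string{"bin/robot", "tests/Login.robot", "robot1"} {
		fmt.Printf("%s => %q\n", p, extensionOf(p))
	}
}
```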

No counts for single file

See https://raw.githubusercontent.com/corretto/corretto-8/da7e192d1f7a73340cb1ec9b2f08645f8d189769/src/jdk/src/share/native/sun/java2d/cmm/lcms/cmscgats.c for instance

$ wget https://raw.githubusercontent.com/corretto/corretto-8/da7e192d1f7a73340cb1ec9b2f08645f8d189769/src/jdk/src/share/native/sun/java2d/cmm/lcms/cmscgats.c
$ ./scc cmscgats.c
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines     Code  Comments   Blanks Complexity
───────────────────────────────────────────────────────────────────────────────
───────────────────────────────────────────────────────────────────────────────
Total                        0         0        0         0        0          0
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop $0
Estimated Schedule Effort 0.000000 months
Estimated People Required NaN
───────────────────────────────────────────────────────────────────────────────

but

$ perl ../cloc-1.80.pl cmscgats.c 
       1 text file.
       1 unique file.                              
       0 files ignored.

github.com/AlDanial/cloc v 1.80  T=0.02 s (63.5 files/s, 179074.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                                1            811            212           1796

New language + go generate; does not show up

I tried to add language support for SAS and Stata (see diff).

I ran go generate and then go build, but these languages do not show up in the list from scc --languages, nor in the output when running the main version.

Running Go 1.11 on Windows.

Any advice?

Chocolatey package

In the same vein as the Homebrew ticket, how about a package on Chocolatey?
