blackducksoftware / ohcount

This project forked from korsakov/ohcount

The Ohloh source code line counter

Home Page: https://github.com/blackducksoftware/ohcount

License: GNU General Public License v2.0

Ruby 15.69% Shell 2.89% CSS 0.02% Python 4.20% C 66.23% ActionScript 1.92% Objective-C 0.68% BlitzMax 0.27% C# 1.24% Coq 0.12% Common Lisp 2.48% Eiffel 0.12% Smalltalk 0.62% XSLT 0.22% CoffeeScript 0.36% D 0.43% TeX 0.15% eC 1.57% Logtalk 0.21% IDL 0.58%
line-counter tool ubuntu ohloh ohcount

ohcount's Introduction

Ohcount

Ohloh's source code line counter.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License Version 2 as published by the Free Software Foundation.

License

Ohcount is specifically licensed under GPL v2.0, and no later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Overview

Ohcount is a library for counting lines of source code. It was originally developed at Ohloh, and is used to generate the reports at www.openhub.net.

Ohcount supports multiple languages within a single file: for example, a complex HTML document might include regions of both CSS and JavaScript.

Ohcount has two main components: a detector which determines the primary language family used by a particular source file, and a parser which provides a line-by-line breakdown of the contents of a source file.
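To make the idea of a line-by-line breakdown concrete, here is a toy sketch in Python. It is not Ohcount's implementation (the real parsers are generated from Ragel grammars). The names classify_line and count_lines, and the rule that a comment is any line starting with "#", are assumptions made purely for illustration.

```python
def classify_line(line, comment_prefix="#"):
    """Return 'blank', 'comment', or 'code' for one source line.

    Hypothetical helper: it only recognises single-line comments that
    begin with comment_prefix; a trailing comment counts as code.
    """
    stripped = line.strip()
    if not stripped:
        return "blank"
    if stripped.startswith(comment_prefix):
        return "comment"
    return "code"


def count_lines(text, comment_prefix="#"):
    """Tally code, comment and blank lines in a block of text."""
    counts = {"code": 0, "comment": 0, "blank": 0}
    for line in text.splitlines():
        counts[classify_line(line, comment_prefix)] += 1
    return counts


sample = "# greet the user\nprint('hello')\n\nprint('bye')  # trailing\n"
print(count_lines(sample))
```

A real parser also has to track state across lines (block comments, strings, embedded languages), which is why a per-line prefix test like this one is only a starting point.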

Ohcount includes a command line tool that allows you to count individual files or whole directory trees. It also allows you to find source code files by language family, or to create a detailed annotation of an individual source file.

Ohcount includes a Ruby binding which allows you to directly access its language detection features from a Ruby application.

Language Support

See: src/languages.h

System Requirements

Ohcount is supported on Ubuntu 18.04 LTS. It has also been tested on Fedora 29. Other Unix-like environments should also work, but your mileage may vary.

Ohcount does not support Windows.

Building Ohcount

$ git clone https://github.com/blackducksoftware/ohcount.git
$ cd ohcount

Dockerfile

One may use the bundled Dockerfile to build ohcount for Ubuntu:

$ docker build -t ohcount:ubuntu .

Manual build

Last updated: 2021-12-09

Ohcount needs Ruby 2.* to run tests. The ruby dev headers provided by Ubuntu/Fedora package managers were found to be missing a config.h header file. If the default ruby and ruby-dev packages do not work, install ruby using brew/rbenv/asdf/rvm, which work reliably with ohcount.

You will need ragel 7.0 or higher, bash, gperf, libpcre3-dev, libmagic-dev, gcc (version 7.3 or greater) and swig (>= 3.0.0). For older gcc versions one could try this fix.

Ubuntu/Debian
$ sudo apt-get install libpcre3 libpcre3-dev libmagic-dev gperf gcc ragel swig
$ ./build
Fedora
$ sudo dnf install gcc file-devel gperf ragel swig pcre-devel
$ ./build
macOS
$ brew install libmagic pcre ragel swig
$ ./build
Other Unix systems
  • If the build fails with a missing ohcount.so error and a ruby/x86.../ folder contains that file, copy it to the ruby/ folder.

Using Ohcount

Once you've built ohcount, the executable program will be at bin/ohcount. The most basic use is to count lines of code in a directory tree:

$ bin/ohcount path/to/directory

Ohcount supports several options. Run ohcount --help for more information.

Building Ruby and Python Libraries

To build the ruby wrapper:

$ ./build ruby

To build the python wrapper:

$ python python/setup.py build
$ python python/setup.py install

The python wrapper is currently unsupported.

Contributing

  • Look at an existing PR contribution and follow its pattern. For example, see this.
  • Run ./build to compile the ragel files.
  • While writing the test/expected_dir files, disable any whitespace/tab replacing options from your editor.
  • Ohcount output has tabs in it, so the test/expected_dir also needs to contain tab characters.
  • Sample format of test/expected_dir is as follows. There is a Tab character after dart, code & comment:
dart	code	void main() {
dart	comment	  // Line comment
  • Some editors convert Tab to Space. The following steps help ensure that the proper character is added:
      • Open the file in the Vim editor.
      • Run :set list. This makes hidden characters such as Tab visible.
      • Type dart, then press Ctrl+V followed by Tab.
      • Run the tests to confirm these changes: ./build tests.
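
As a complement to the Vim steps, a small script can flag lines in an expected file where an editor has replaced tabs with spaces. This is a minimal sketch and not part of the Ohcount test suite. The function name check_expected_lines is hypothetical, and the assumed format is the one shown in the sample: language, Tab, line kind, Tab, source text.

```python
def check_expected_lines(lines):
    """Return (line_number, text) pairs for lines missing the two Tab separators."""
    problems = []
    for number, line in enumerate(lines, start=1):
        # Split at most twice: the source text itself may contain tabs.
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) < 3:
            problems.append((number, line))
    return problems


good = "dart\tcode\tvoid main() {\n"
bad = "dart    comment   // Line comment\n"  # tabs were converted to spaces
print(check_expected_lines([good, bad]))
```

To check a real file, pass an open file object, for example check_expected_lines(open(path)), where path points at one of the files under test/expected_dir.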

ohcount's People

Contributors

alex-sig, amujumdar, bnkr, borisfaure, bytbox, chris-morgan, ciaranm, d0k, earl, genisysram, grubba, haraldkl, jasonriedy, jerstlouis, koraktor, krajaratnam, mjl-, mwh, notalex, oblomov, pdegenportnoy, peti, pfusik, pmoura, pshiryaev, raphink, samb, sylvestre, tsee, wildmichael

ohcount's Issues

modelica.rl:59:13: parse error

Hello, I'm a developer with MacPorts, trying to update our ohcount package to your latest version, 3.1.1. I can't get it to build. Here's a partial build log:

Generating hash headers
Found gperf, making headers...
1 input keys have identical hash values, examine output carefully...
Found ragel, compiling...
Compiling actionscript.rl
Compiling ada.rl
Compiling ampl.rl
[...]
modelica.rl:59:13: parse error
[...]
Compiling xml.rl
Compiling xmlschema.rl
Compiling xslt.rl
Building src/parser.c (will take a while)
clang: warning: argument unused during compilation: '-L/opt/local/lib' [-Wunused-command-line-argument]
In file included from src/parser.c:8:
parsers.gperf:63:10: fatal error: '../parsers/modelica.h' file not found
#include "../parsers/modelica.h"
         ^~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

I have ragel 7.0.0.11 installed.

Possibility of cutting a release for downstream packagers?

The latest release, 3.0.0, is from 2009, and this is the release packaged by packagers (Debian, Homebrew and MacPorts on macOS, etc). 175 commits have been added to master since then, with 19 new languages added:

augeas
bfpp
brainfuck
chaiscript
coffeescript
coq
forth
golang
jam
logtalk
modula2
modula3
nsis
oberon
prolog
puppet
rebol
rust
tex_dtx

In particular, CoffeeScript, Go and Rust are very widely used these days, but the packaged version is useless for all these languages.

Therefore, could you please consider cutting a release so that downstream packagers could pick up the (not-so-)recent improvements? Thanks.

CVE-2017-16926: Command injection through file names

As reported in bugs.debian.org, there is a critical defect in Ohcount.

The issue, in brief, is that an attack can be executed by using a specially crafted file name that will cause Ohcount to execute arbitrary statements in a shell as the user that is running Ohcount.

The Black Duck Open Hub team is aware of the report and defect and is working on a fix.
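
This class of defect typically arises when a file name is pasted into a shell command string. The sketch below does not reproduce Ohcount's code or its eventual fix. It only illustrates, in Python, the general mitigation of passing the file name as a separate argument so that no shell ever interprets it. The file name hostile_name is a made-up example.

```python
import shlex
import subprocess
import sys

# A made-up file name containing shell metacharacters.
hostile_name = "report.txt; echo INJECTED"

# Argument list, no shell: the name reaches the child process as a single
# argv entry, so the ';' is just a character in the name.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile_name],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())

# If a shell command string is unavoidable, quote the name first.
print(shlex.quote(hostile_name))
```

The first form is generally preferable because it removes the shell from the picture entirely. Quoting is a fallback for cases where a command string cannot be avoided.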

Compilation error, travis failing also

Compilation fails on the current master (6654d48) on both:

  • Ubuntu 20.04.1 LTS, gcc version 9.3.0, gperf v3.1
  • OSX 11.2.3, Apple clang version 12.0.0 (clang-1200.0.32.29), gperf v3.0.3

It also seems that your Travis build is failing and Travis mistakenly finishes with success: https://travis-ci.org/github/blackducksoftware/ohcount/jobs/760100858

Error:

Building src/parser.c (will take a while)
clang: warning: argument unused during compilation: '-L/opt/local/lib' [-Wunused-command-line-argument]
src/parser.c:10:20: error: conflicting types for 'ohcount_hash_parser_from_language'
struct ParserMap * ohcount_hash_parser_from_language (register const char *str, register size_t len);
                   ^
parsers.gperf:171:1: note: previous definition is here
ohcount_hash_parser_from_language (register const char *str, register unsigned int len)

Please tag a new release

It has been a while since the last release.
It would be nice for packagers (Debian, for example) to have a new tag.
Thanks

Comparing output to wc -l for sanity checking

Why is there such a big difference between find -type f | xargs wc -l's 270997 and ohcount's 4129992?

I'm looking at https://webkitgtk.org/releases/webkitgtk-2.28.2.tar.xz

       5 ./Documentation/jsc-glib-4.0/html/right.png
     117 ./Documentation/jsc-glib-4.0/html/api-index-2-24.html
     355 ./Documentation/jsc-glib-4.0/html/index-all.html
      73 ./Documentation/jsc-glib-4.0/html/annotation-glossary.html
       5 ./Documentation/jsc-glib-4.0/html/left.png
     185 ./Documentation/jsc-glib-4.0/html/jsc-glib-4.0-JSCVersion.html
       3 ./Documentation/jsc-glib-4.0/html/right-insensitive.png
     189 ./Documentation/jsc-glib-4.0/html/jsc-glib-4.0.devhelp2
    2429 ./NEWS
  270997 total
[hendry@t480s webkitgtk-2.28.2]$ ohcount
Examining 19472 file(s)

                          Ohloh Line Count Summary

Language          Files       Code    Comment  Comment %      Blank      Total
----------------  -----  ---------  ---------  ---------  ---------  ---------
cpp               14023    2345093     520098      18.2%     499145    3364336
javascript          843     181268      30452      14.4%      54820     266540
html                472     147073       2728       1.8%        783     150584
c                  1001      76467      47129      38.1%      14921     138517
xml                  29      68888        565       0.8%       1247      70700
python              155      22985       9162      28.5%       6824      38971
css                 266      19816       6578      24.9%       4452      30846
cmake               134      15500       1907      11.0%       2090      19497
perl                 29      14214       1541       9.8%       3354      19109
ruby                 46      13957       1902      12.0%       1849      17708
assembler             5       8167         73       0.9%       1592       9832
shell                25       1409        310      18.0%        256       1975
autoconf              3        394         59      13.0%         52        505
glsl                  7        327         76      18.9%         66        469
bat                   4        272         19       6.5%         45        336
postscript            1         58          0       0.0%          9         67
----------------  -----  ---------  ---------  ---------  ---------  ---------
Total             17043    2915888     622599      17.6%     591505    4129992

By the way, the http://www.ohloh.net/ link in the help output is broken.
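
One possible contributor to the gap, offered as a hypothesis rather than a diagnosis: with this many files, xargs may split the list across several wc invocations, and each invocation prints its own "total" line, so the last "total" could cover only the final batch. The sketch below counts newlines for a whole tree in a single process, which sidesteps that effect. The function name count_newlines is hypothetical, and the temporary directory is only a demonstration fixture.

```python
import os
import tempfile


def count_newlines(root):
    """Sum newline bytes over every regular file under root, in one process."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files, similar to `find -type f`.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    total += chunk.count(b"\n")
    return total


# Demonstration on a throwaway tree with two small files.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.txt"), "w") as handle:
        handle.write("one\ntwo\n")
    os.mkdir(os.path.join(root, "sub"))
    with open(os.path.join(root, "sub", "b.txt"), "w") as handle:
        handle.write("three\n")
    demo_total = count_newlines(root)
print(demo_total)
```

Even with a correct grand total, the two numbers need not match exactly: wc counts every file it is given, including images and other binaries, while the summary above reports 17043 counted files out of 19472 examined.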

Add support for swift language

I ran the code analyser on a project that has Objective-C plus Swift files, and only the Objective-C, xib, etc. files are listed, not the Swift code.

It would be great if this language could be included in the analyser.

Please take into account .gitattributes linguist-language entries, including for unrecognized languages

As a new feature, I suggest you support .gitattributes linguist-language entries, including for unrecognized languages. This is already supported by Gitea and means if a repository contains a completely new, and/or very domain-specific niche language, an entry in .gitattributes provided by the repository maintainers will easily make everything "just work". If the tooling plays along of course, which ohcount currently doesn't.

Current output:

$ git init .
Initialized empty Git repository in /home/user/test/.git/
$ echo -e "A\n\nB\nC" > ./test.mylang
$ echo "*.mylang linguist-language=MyLang" > .gitattributes
$ git add test.mylang .gitattributes 
$ git commit -m "Language count test"
[main (root-commit) e716e77] Language count test
 2 files changed, 4 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 test.mylang
$ ohcount .
Examining 2 file(s)

                          Ohloh Line Count Summary                          

Language          Files       Code    Comment  Comment %      Blank      Total
----------------  -----  ---------  ---------  ---------  ---------  ---------
----------------  -----  ---------  ---------  ---------  ---------  ---------
Total                 0          0          0       0.0%          0          0

Expected output:

$ git init .
Initialized empty Git repository in /home/user/test/.git/
$ echo -e "A\n\nB\nC" > ./test.mylang
$ echo "*.mylang linguist-language=MyLang" > .gitattributes
$ git add test.mylang .gitattributes 
$ git commit -m "Language count test"
[main (root-commit) e716e77] Language count test
 2 files changed, 4 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 test.mylang
$ ohcount .
Examining 2 file(s)

                          Ohloh Line Count Summary                          

Language          Files       Code    Comment  Comment %      Blank      Total
----------------  -----  ---------  ---------  ---------  ---------  ---------
MyLang                1          3          0       0.0%          1          4
----------------  -----  ---------  ---------  ---------  ---------  ---------
Total                 1          3          0       0.0%          1          4
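
A rough sketch of what reading such entries could look like, in Python. This is not an existing Ohcount feature or API. The function names parse_linguist_overrides and language_for are hypothetical, and the pattern matching uses fnmatch as a simplification of the real gitattributes rules (it ignores directory-relative patterns, for instance).

```python
import fnmatch


def parse_linguist_overrides(gitattributes_text):
    """Return a list of (pattern, language) pairs for linguist-language entries."""
    overrides = []
    for raw in gitattributes_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attributes = line.split()
        for attribute in attributes:
            key, _, value = attribute.partition("=")
            if key == "linguist-language" and value:
                overrides.append((pattern, value))
    return overrides


def language_for(filename, overrides):
    """Return the overriding language for filename, or None if nothing matches.

    Later lines win over earlier ones, as in gitattributes.
    """
    language = None
    for pattern, value in overrides:
        if fnmatch.fnmatchcase(filename, pattern):
            language = value
    return language


overrides = parse_linguist_overrides("*.mylang linguist-language=MyLang\n")
print(language_for("test.mylang", overrides))
```

A detector could consult such a table before falling back to its usual extension and content checks. How an unrecognized language's lines would then be split into code, comment and blank is a separate question that this sketch does not address.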

Currently Not Building

It's currently not building for me. Following the instructions as provided I'm seeing the following error message:

Building src/parser.c (will take a while)
src/parser.c:10:20: error: conflicting types for ‘ohcount_hash_parser_from_language’
 struct ParserMap * ohcount_hash_parser_from_language (register const char *str, register unsigned int len);
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from src/parser.c:8:0:
parsers.gperf:170:1: note: previous definition of ‘ohcount_hash_parser_from_language’ was here

Digging in a little further, it's clear that there are some issues with the gperf step, as it produces the warning:

Found gperf, making headers...
1 input keys have identical hash values, examine output carefully...

and many of the generated files contain the error message:

#error "gperf generated tables don't work with this execution character set. Please report a bug to <[email protected]>."

By running the commands in generate_headers manually, it becomes clear that the specific file causing the identical hash value warning turns out to be parsers.gperf, and this would seem to tally with the build-killing error observed above.

/etc/issue on Fedora contains escape sequence \S

Fedora uses an escape sequence to insert the OS name into /etc/issue output when processed by agetty

[user@localhost ~]$ cat /etc/issue
\S
Kernel \r on an \m (\l)

From the agetty man page

       s      Insert  the  system  name (the name of the operating system).  Same as `uname -s'.  See also
              the \S escape code.

       S or S{VARIABLE}
              Insert the VARIABLE data from /etc/os-release.  If this file does not exist then  fall  back
              to  /usr/lib/os-release.   If  the  VARIABLE argument is not specified, then use PRETTY_NAME
              from the file or the system name (see \s).  This escape code allows to keep /etc/issue
              distribution and release independent.  Note that \S{ANSI_COLOR} is converted to the real
              terminal escape sequence.

Ideally the standard way of obtaining the OS name should be used

[user@localhost ~]$ lsb_release -i
Distributor ID:	Fedora
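
Another option, on systems where lsb_release is not installed, is to read the os-release file that the man page excerpt mentions. The following Python sketch is only an illustration and not a patch for the build script. The function names parse_os_release and read_os_release are hypothetical, and the sample text is made-up data.

```python
import shlex


def parse_os_release(text):
    """Parse KEY=value lines in os-release format into a dict."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be shell-quoted; shlex removes the quotes.
        parts = shlex.split(value)
        fields[key] = parts[0] if parts else ""
    return fields


def read_os_release():
    """Read /etc/os-release, falling back to /usr/lib/os-release."""
    for path in ("/etc/os-release", "/usr/lib/os-release"):
        try:
            with open(path, encoding="utf-8") as handle:
                return parse_os_release(handle.read())
        except FileNotFoundError:
            continue
    return {}


# Made-up sample input, not taken from a real system.
sample = 'NAME="Example OS"\nID=example\n# a comment\n'
print(parse_os_release(sample)["ID"])
```

The ID field is intended for scripts, which makes it a more stable key for distribution checks than the human-readable text in /etc/issue.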

Problem: Elm language not supported

I work on an open source project called Exosphere. It is a SPA (single page app) written mostly in the Elm programming language. See the GitLab repository analytics page: https://gitlab.com/exosphere/exosphere/-/graphs/master/charts

Elm: 71%
CSS: 26%
JavaScript: 0.99%
Python: 0.67%
Gherkin: 0.27%

But when I look at the Open Hub overview: https://openhub.net/p/exosphere

It reports:

... is mostly written in JavaScript
with a very low number of source code comments

And on the Open Hub languages summary page: https://openhub.net/p/exosphere/analyses/latest/languages_summary

It reports:

CSS: 91%
JavaScript: 5.5%
Python: 2.7%
HTML: 0.5%
shell script: 0.3%

One of the compelling aspects of our project is that it is mostly written in Elm with very little JavaScript and CSS.
We'd love for this to be reflected on Open Hub.

Note: Elm's syntax is very similar to Haskell's, so the Haskell parser could easily be modified to work.

PHP being recognized as HTML

The first line of a PHP file (which also gives away that it's PHP) is identified as HTML:

u:~/ohcount/bin$ ./ohcount -a test.php
html	lcode	<?php
php	lcode	if ( !is_multisite() ) {
