isabella232 / clouseau

This project forked from cfpb/clouseau

Search your repository's git history for undesirable text patterns such as passwords, SSH keys, and other personally identifiable information.

License: Creative Commons Zero v1.0 Universal

Python 86.25% Shell 4.43% HTML 8.62% Dockerfile 0.70%

Clouseau

What is Clouseau?

Clouseau is a silly git repo inspector.

Clouseau is a P.I. for your PII. It searches git commits, both source code and commit messages, for undesirable text patterns such as passwords, SSH keys, and personally identifiable information (PII). You can search for profanity or other content by supplying a new pattern file or a regular expression on the command line.
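
As a rough illustration of the idea (not Clouseau's actual code), the sketch below scans lines of text against a few regular expressions. The username pattern appears in the sample output later in this readme; the SSN-like pattern and the find_matches helper are assumptions made up for this example.

```python
import re

# Illustrative patterns only. The first appears in this readme's sample output;
# the second is a hypothetical SSN-like pattern, not taken from Clouseau.
PATTERNS = [
    r"username[ ]*=[ ]*.+",
    r"\b\d{3}-\d{2}-\d{4}\b",
]


def find_matches(text, patterns=PATTERNS):
    """Return (line_number, pattern, line) for every line matching a pattern."""
    compiled = [re.compile(p) for p in patterns]
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for regex in compiled:
            if regex.search(line):
                hits.append((number, regex.pattern, line))
    return hits


sample = "my username=foo\nnothing to see here\nssn: 123-45-6789"
for number, pattern, line in find_matches(sample):
    print(f"Line:{number}  {pattern!r}  {line}")
```

A real scan would feed commit diffs and commit messages through the same kind of loop instead of a hard-coded string.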

See the Getting involved section at the end of this readme for the current status of this project and how to contribute.

Dependencies

See the requirements.txt file for additional dependencies to be installed in the quick setup.

Quick setup

  1. Clone this repository somewhere you can execute Python code.

  2. From the cloned Clouseau project root, set up a virtualenv:

    virtualenv venv                                    # creates the virtualenv named "venv" (recent virtualenv releases no longer accept --no-site-packages or --distribute)
    source venv/bin/activate                           # activates (places you in) the virtualenv
  3. Install the requirements:

    pip install -r requirements.txt
  4. Tell Python to also look in this directory for libraries.

    export PYTHONPATH=$PYTHONPATH:.

And that's it! Now follow the usage instructions below.

Basic Usage

Search a GitHub repository for pattern matches:

$ bin/clouseau --url [repo-url]

For example:

$ bin/clouseau --url https://github.com/virtix/cato.git

This will search against the default pattern file (clouseau/patterns/default.txt) and display any matches for each of the patterns the file contains.
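
The exact layout of the pattern file is not described here. Assuming it holds one regular expression per line, a loader could be sketched as follows; load_patterns is a hypothetical helper, and the temporary file only stands in for clouseau/patterns/default.txt.

```python
import re
import tempfile
from pathlib import Path


def load_patterns(path):
    """Read one regular expression per line, skipping blank lines (assumed format)."""
    patterns = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if line:
            patterns.append(re.compile(line))
    return patterns


# Usage with a throwaway file standing in for a real pattern file.
with tempfile.TemporaryDirectory() as tmp:
    pattern_file = Path(tmp) / "patterns.txt"
    pattern_file.write_text("password[ ]*=[ ]*.+\n\nusername[ ]*=[ ]*.+\n", encoding="utf-8")
    loaded = load_patterns(pattern_file)
    print([p.pattern for p in loaded])
```

Compiling each line up front means a malformed expression fails at load time with re.error rather than partway through a scan.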

The results should look something like this:

[Screenshot: sample Clouseau output]

Additional Usage Options

Search using a single regular expression:
$ bin/clouseau --url https://github.com/virtix/cato.git --term "Your Name"

Search the entire history for a single term (quite slow and needs threading or multi-process work):
$ bin/clouseau --url https://github.com/virtix/cato.git --term "Your Name" --revlist all

Search the current revision using a different pattern file:
$ bin/clouseau -u https://github.com/virtix/cato.git --patterns ~/projects/patterns/profanity.txt

Search the current revision using multiple pattern files:
$ bin/clouseau -u https://github.com/virtix/cato.git --patterns ~/projects/patterns/profanity.txt,~/projects/patterns/custom_pattern.txt
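
One detail worth noting about the command above: shells such as bash only expand ~ at the start of a word, so the second path in a comma-separated list reaches the program with a literal ~. Whether Clouseau expands it itself is not stated here. A hypothetical helper that splits the value and expands each path might look like this (split_pattern_paths is an assumed name, not part of Clouseau):

```python
import os


def split_pattern_paths(value):
    """Split a comma-separated --patterns value and expand '~' in each path."""
    paths = []
    for part in value.split(","):
        part = part.strip()
        if part:
            paths.append(os.path.expanduser(part))
    return paths


print(split_pattern_paths("~/projects/patterns/profanity.txt,~/projects/patterns/custom_pattern.txt"))
```

If in doubt, passing absolute paths avoids the question entirely.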

Skip either cloning or pulling and just scan:
$ bin/clouseau -u https://github.com/virtix/cato.git --skip

Search a specific revision:
$ bin/clouseau -u https://github.com/virtix/cato.git --revlist 5c0b30b007

Search between the range of two commits:
$ bin/clouseau -u https://github.com/virtix/cato.git --revlist d46868fe...3ea013e8

Search since a given date:
$ bin/clouseau -u https://github.com/virtix/cato.git --after 03/10/13

Search commits by a particular author (blame):
$ bin/clouseau -u https://github.com/virtix/cato.git --author bill
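
The revision, date, and author options above correspond closely to standard git revision selectors. As a sketch only (this is not taken from Clouseau's source, and build_git_log_args is a hypothetical helper), the following shows how such options could be translated into a git log invocation that lists matching commit hashes:

```python
def build_git_log_args(revlist="HEAD", after=None, before=None, author=None):
    """Build an argv list for `git log` that prints one commit hash per line."""
    args = ["git", "log", "--format=%H"]
    if after:
        args.append(f"--after={after}")
    if before:
        args.append(f"--before={before}")
    if author:
        args.append(f"--author={author}")
    if revlist == "all":
        args.append("--all")          # start from every ref, covering the reachable history
    else:
        args.extend(revlist.split())  # a commit, a range, or several revisions
    return args


print(build_git_log_args(revlist="d46868fe...3ea013e8"))
print(build_git_log_args(revlist="all", after="2013-03-10", author="bill"))
```

Note that git log with a single commit also walks that commit's ancestors; limiting the result to exactly the listed commits would need git's --no-walk option. How Clouseau itself handles this is not covered here.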

Intended command-line interface

$ bin/clouseau -h
usage: clouseau [-h] [-v] --url URL [--term TERM] [--patterns PATTERNS]
                [--clean] [--output OUTPUT_FORMAT]
                [--output-destination OUTPUT_DESTINATION] [--dest DEST]
                [--revlist REVLIST]

Clouseau: A silly git inspector

 optional arguments:
   -h, --help               show this help message and exit
   -v, --version            show program's version number and exit
   --url URL, -u URL        Fully qualified git URL (http://www.kernel.org/pub//software/scm/git/docs/git-clone.html)
   --term TERM, -t TERM     Search for a single regular expression instead of every term in patterns.txt
   --patterns PATTERNS, -p PATTERNS
                            Path to list of regular expressions to use.
   --clean, -c              Delete the existing git repo and re-clone
   --output OUTPUT_FORMAT, -o OUTPUT_FORMAT  (NOT YET IMPLEMENTED)
                            Output formats: console, markdown, raw, html, json
   --output-destination OUTPUT_DESTINATION, -od OUTPUT_DESTINATION  (NOT YET IMPLEMENTED)
                            Location where the output is to be stored. Default ./temp.
   --dest DEST, -d DEST  The directory where the git repo is stored. Default: ./temp  (NOT YET IMPLEMENTED)
   --revlist REVLIST, -rl REVLIST
                           A space-delimited list of revisions (commits) to search.
                           Defaults to HEAD. Specify 'all' to search the entire history.
   --before BEFORE, -b BEFORE
                            Search commits that occur prior to this date; e.g., Mar-08-2013
   --after AFTER, -a AFTER
                            Search commits that occur after this date; e.g., Mar-10-2013
   --author AUTHOR         Search only commits made by AUTHOR; e.g., an email address or name.
   --skip   SKIP           If specified, skips any calls to git-clone or git-pull.
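
For readers who want to experiment with a similar interface, here is a partial argparse sketch covering a handful of the options listed above. It is an illustration written for this readme, not Clouseau's actual parser; the help strings are abbreviated and build_parser is an assumed name.

```python
import argparse


def build_parser():
    """A partial, illustrative parser; NOT Clouseau's actual source."""
    parser = argparse.ArgumentParser(prog="clouseau", description="Clouseau: A silly git inspector")
    parser.add_argument("--url", "-u", required=True, help="Fully qualified git URL")
    parser.add_argument("--term", "-t", help="Search for a single regular expression")
    parser.add_argument("--patterns", "-p", help="Path to list of regular expressions to use")
    parser.add_argument("--revlist", "-rl", default="HEAD", help="Revisions to search, or 'all'")
    parser.add_argument("--skip", action="store_true", help="Skip git-clone and git-pull")
    return parser


args = build_parser().parse_args(["-u", "https://github.com/virtix/cato.git", "--term", "Your Name"])
print(args.url, args.term, args.revlist, args.skip)
```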

Minimal output

For continuous integration environments, minimal output may be desirable. In that case, use bin/clouseau_thin:

$ bin/clouseau_thin -u [git_url] ...

clouseau_thin supports all clouseau options and differs only in the verbosity and attractiveness of its output.

Running locally on a cloned repository

Run Clouseau from your cloned project root, with your repository's GitHub URL in place of $remote_url:

$ clouseau_thin -u "$remote_url" --skip --dest "$(dirname "$(pwd)")" --revlist="HEAD"

This is useful for checking local repositories for sensitive data before pushing to a public URL.
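
If you drive this from a Python script rather than a shell, the same command can be assembled as an argument list. The sketch below is an assumption-laden illustration: local_scan_command is a hypothetical helper, and it simply mirrors the flags shown above, with --dest set to the parent of the repository directory.

```python
import os


def local_scan_command(remote_url, repo_root):
    """Build the clouseau_thin argv for scanning an already-cloned repository."""
    parent = os.path.dirname(os.path.abspath(repo_root))  # same idea as $(dirname $(pwd))
    return ["clouseau_thin", "-u", remote_url, "--skip", "--dest", parent, "--revlist=HEAD"]


command = local_scan_command("https://github.com/virtix/cato.git", os.getcwd())
print(" ".join(command))
# To actually run it: subprocess.run(command, check=False)
```

Passing a list to subprocess.run avoids shell word-splitting, so directory names containing spaces are handled without extra quoting.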

Running as a post-commit hook

First, install Clouseau by changing to your cloned Clouseau project root and running pip install -e ./

Test the install by changing to any other directory and running clouseau and clouseau_thin.

Now, change to one of your local git repos.

Create .git/hooks/post-commit and make it executable (chmod +x .git/hooks/post-commit)

Edit it with content such as this:

#!/bin/sh

echo "running clouseau"
remote_url=$(git config --get remote.origin.url)
# Quote the expansions so URLs and paths containing spaces are passed intact.
clouseau_thin -u "$remote_url" --skip --dest "$(dirname "$(pwd)")" --revlist="HEAD"

Now, make a commit to that project.

You should see that Clouseau runs and finds nothing.

Make another commit, this time adding something that looks like an SSN or IP address to a file and/or the commit message. Clouseau runs again, and you should see output such as this:

running clouseau
Skipping git-clone or git-pull as --skip was found on the command line.
Clouseau: a silly git inspector, searching [your_git_url]

✓  hooktest.txt
Search term:  username[ ]*=[ ]*.+
[email protected]:marcesher/cato/commit/0731c34b40bcd4322c6b4daf044ec3587211808a
Author: Marc Esher <[email protected]> Date:   Tue Feb 25 15:41:37 2014 -0500
my username=foo

+production_ip=127.0.0.1  Line:19
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

✓  Commit Message
Search term:  username[ ]*=[ ]*.+
[email protected]:marcesher/cato/commit/0731c34b40bcd4322c6b4daf044ec3587211808a
Author: Marc Esher <[email protected]> Date:   Tue Feb 25 15:41:37 2014 -0500
my username=foo

my username=foo  Line:1
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Clouseau's output can also be sent to the Mac OS X Notification Center via a post-commit hook, which is useful for users of GitHub's GUI client for Mac.

Running with Docker

Clouseau is now in the Docker index and you can run it with a simple docker command:

docker run -i -e "GIT_URL=https://github.com/virtix/cato.git" -t dlapiduz/clouseau

Running unit tests

To run unit tests, issue:

nosetests

Getting involved

If you're interested in using Clouseau to scan your source code and commit messages for undesirable content, please get involved.

Clouseau is currently in an early stage of development and not recommended for production use.

  • Proof of concept
  • Multiple output formats
  • Works on reasonably sized repos (concurrency)
  • Stores previous runs

The intent is that this can be run against any repo and it will search the index for file blobs containing the patterns defined in a patterns.txt file or a regular expression specified on the command line.

We welcome feature requests, bug reports, and code / documentation improvements. We also welcome stories of how you're using Clouseau.

General instructions on how to contribute are described in CONTRIBUTING.

Open source licensing info

  1. TERMS
  2. LICENSE
  3. CFPB Source Code Policy

Contributors

cfarm, contolini, dlapiduz, kimberlymunoz, laminfaty, m3brown, marcesher, mehtadev17, rosskarchner, scotchester, virginiacc, virtix, willbarton
