
This project forked from sap/credential-digger


License: Apache License 2.0


Credential Digger

Credential Digger is a GitHub scanning tool that identifies hardcoded credentials (passwords, API keys, secret keys, tokens, personal information, etc.) and filters out false positive data through machine learning models.


Why

In data protection, one of the most critical threats is posed by hardcoded (or plaintext) credentials in open-source projects. Several tools already exist to detect leaks on open-source platforms, but the diversity of credentials (which depends on factors such as the programming language, code development conventions, or developers' personal habits) limits their effectiveness. Their lack of precision leads to a very high number of code snippets being incorrectly flagged as leaked secrets. Data wrongly detected as a leak is called false positive data, and it makes up the vast majority of what currently available tools detect.

The goal of Credential Digger is to reduce the amount of false positive data in the output of the scanning phase by leveraging machine learning models.

Architecture

The tool supports several scan flavors: public and private repositories on GitHub and GitLab, wiki pages, GitHub organizations, local git repositories, and local files and folders. Please refer to the Wiki for the complete documentation.

For a complete description of Credential Digger's approach, see the following publication:

@InProceedings{lrnto-icissp21,
    author = {S. Lounici and M. Rosa and C. M. Negri and S. Trabelsi and M. Önen},
    booktitle = {Proc. of the 8th International Conference on Information Systems Security and Privacy (ICISSP)},
    title = {Optimizing Leak Detection in Open-Source Platforms with Machine Learning Techniques},
    month = {February},
    day = {11-13},
    year = {2021}
}

Requirements

Credential Digger supports Python >= 3.6 and < 3.10 and works only on Linux and macOS. If you don't meet these requirements, consider running the Docker container, which also includes a user interface.
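If you are unsure whether your machine qualifies, a quick pre-flight check can be scripted with the standard library. The helper name `is_supported_platform` is hypothetical and not part of Credential Digger; it only encodes the version and OS constraints stated above.

```python
import platform
import sys


def is_supported_platform(version=None, system=None):
    """Return True if Python and the OS match the documented requirements.

    Hypothetical helper: Python >= 3.6 and < 3.10, Linux or macOS only.
    Defaults to inspecting the running interpreter and OS.
    """
    version = tuple(version if version is not None else sys.version_info[:2])
    system = system if system is not None else platform.system()
    python_ok = (3, 6) <= version < (3, 10)
    # platform.system() reports "Linux" on Linux and "Darwin" on macOS
    os_ok = system in ("Linux", "Darwin")
    return python_ok and os_ok


if __name__ == "__main__":
    if is_supported_platform():
        print("Native installation should be possible.")
    else:
        print("Unsupported environment: consider the Docker container instead.")
```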

Download and Installation

First, install Hyperscan, the regular expression matching library. Make sure build-essential and python3-dev are installed too. On apt-based Linux distributions:

sudo apt install -y libhyperscan-dev build-essential python3-dev

or, on macOS:

brew install hyperscan

Then, install the Credential Digger module using pip:

pip install credentialdigger

How to run

Add rules

One of the core components of Credential Digger is the regular expression scanner. You can choose the regular expression rules you want, following the provided template. We provide a list of patterns in the rules.yml file, which is also included in the UI. The scanner supports rules of four categories: password, token, crypto_key, and other.

Before the very first scan, you need to add the rules that will be used by the scanner. This step is only needed once.

python -m credentialdigger add_rules --sqlite /path/to/data.db /path/to/rules.yml
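To illustrate what a rule-based scanner does with such rules, here is a minimal sketch. It is not Credential Digger's implementation: the real scanner uses Hyperscan, while this sketch swaps in Python's `re` module. The rule field names (`regex`, `category`, `description`) and the two sample patterns are assumptions made for illustration, not copies of rules.yml.

```python
import re
from typing import List, NamedTuple

# Hypothetical rules mirroring the idea of rules.yml; the field names
# (regex, category, description) are an assumption, not the real template.
RULES = [
    {"regex": r"(?i)passw(or)?d\s*[:=]", "category": "password",
     "description": "password assignment"},
    {"regex": r"-----BEGIN [A-Z ]*PRIVATE KEY-----", "category": "crypto_key",
     "description": "PEM private key header"},
]

# The four rule categories supported by the scanner.
CATEGORIES = {"password", "token", "crypto_key", "other"}


class Discovery(NamedTuple):
    line_number: int
    category: str
    snippet: str


def compile_rules(rules):
    """Reject unknown categories and precompile each regular expression."""
    compiled = []
    for rule in rules:
        if rule["category"] not in CATEGORIES:
            raise ValueError("unknown category: " + rule["category"])
        compiled.append((re.compile(rule["regex"]), rule["category"]))
    return compiled


def scan_text(text: str, compiled_rules) -> List[Discovery]:
    """Return one Discovery for every (line, rule) pair that matches."""
    discoveries = []
    for number, line in enumerate(text.splitlines(), start=1):
        for pattern, category in compiled_rules:
            if pattern.search(line):
                discoveries.append(Discovery(number, category, line.strip()))
    return discoveries
```

For example, scanning the two lines `user = "admin"` and `password = "hunter2"` yields a single password discovery on line 2. Compiling rules once up front mirrors the "add rules once, scan many times" workflow described above.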

Scan a repository

After adding the rules, you can scan a repository:

python -m credentialdigger scan https://github.com/user/repo --sqlite /path/to/data.db

Machine learning models are optional but highly recommended, as they reduce the manual effort of reviewing the results of a scan:

python -m credentialdigger scan https://github.com/user/repo --sqlite /path/to/data.db --models PathModel PasswordModel
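The PathModel and PasswordModel are trained classifiers whose internals are not described here. The sketch below only illustrates the general idea of setting aside likely false positives before manual review, using a hypothetical keyword heuristic on file paths as a stand-in for a learned model. The marker list and the `Finding` type are assumptions for illustration.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical path fragments that often point to test or sample code.
# Illustrative heuristic only: the real PathModel is a trained classifier.
SUSPECT_PATH_PARTS = ("test", "example", "sample", "mock", "fixture")


class Finding(NamedTuple):
    file_path: str
    snippet: str


def looks_like_false_positive(path: str) -> bool:
    """True if the lower-cased path contains a non-production marker."""
    lowered = path.lower()
    return any(part in lowered for part in SUSPECT_PATH_PARTS)


def split_findings(findings: List[Finding]) -> Tuple[List[Finding], List[Finding]]:
    """Partition findings into (to_review, likely_false_positives)."""
    to_review, likely_false = [], []
    for finding in findings:
        if looks_like_false_positive(finding.file_path):
            likely_false.append(finding)
        else:
            to_review.append(finding)
    return to_review, likely_false
```

A fixed keyword list is brittle, which is exactly the precision problem that motivates learned models; the sketch only shows where such a filter sits in the pipeline.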

Like the models, the similarity feature is optional but highly recommended, as it reduces the manual effort of assessing discoveries after a scan:

python -m credentialdigger scan https://github.com/user/repo --sqlite /path/to/data.db --similarity --models PathModel PasswordModel
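The benefit of similarity is that near-identical discoveries can be reviewed together. As a rough sketch of that idea only, the code below greedily groups snippets using `difflib.SequenceMatcher` from the standard library. Credential Digger's actual similarity computation may work quite differently, and the 0.8 threshold is a hypothetical value.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two snippets."""
    return SequenceMatcher(None, a, b).ratio()


def group_similar(snippets, threshold=0.8):
    """Greedily group snippets whose similarity to a group's first member
    reaches the (hypothetical) threshold."""
    groups = []
    for snippet in snippets:
        for group in groups:
            if similarity(group[0], snippet) >= threshold:
                group.append(snippet)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append([snippet])
    return groups
```

With this grouping, `password = "changeme"` and `password = "changeme1"` land in the same group while `api_key = "AKIA0000"` forms its own, so a reviewer can assess the first group with a single decision.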

Docker container

To get a ready-to-use instance of Credential Digger with a user interface, build the Docker container. This option requires Docker and Docker Compose.

git clone https://github.com/SAP/credential-digger.git
cd credential-digger
cp .env.sample .env
sudo docker-compose up --build

The UI is available at http://localhost:5000/

It is preferable to have at least 8 GB of free RAM when using Docker containers.

Advanced Installation

Credential Digger is modular, and offers a wide choice of components and adaptations.

Build from source

After installing the dependencies listed above, you can install Credential Digger as follows.

Configure a virtual environment for Python 3 (optional) and clone the main branch of the project:

virtualenv -p python3 ./venv
source ./venv/bin/activate

git clone https://github.com/SAP/credential-digger.git
cd credential-digger

Install the requirements from the requirements.txt file, then install the library:

pip install -r requirements.txt
python setup.py install

Then, you can add the rules and scan a repository as described above.

External postgres database

Alternatively, you can run a ready-to-use instance of Credential Digger with the UI that uses a dockerized postgres database instead of a local sqlite one:

git clone https://github.com/SAP/credential-digger.git
cd credential-digger
cp .env.sample .env
vim .env  # set credentials for postgres
sudo docker-compose -f docker-compose.postgres.yml up --build

WARNING: Unlike the sqlite version, this setup requires configuring the .env file with the postgres credentials (by modifying POSTGRES_USER, POSTGRES_PASSWORD and POSTGRES_DB).

Advanced users may also wish to use an external postgres database instead of the dockerized one provided in docker-compose.postgres.yml.
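Since a missing postgres credential in .env is an easy mistake, a small check can catch it before starting the containers. The sketch below is a hypothetical helper, not part of Credential Digger; it checks only the three keys named in the warning above, and its minimal parser deliberately ignores quoting, escapes and `export` prefixes.

```python
REQUIRED_KEYS = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Minimal, hypothetical parser: no quoting, escapes or `export` support.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_postgres_keys(values):
    """Return the required postgres keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Usage: `missing_postgres_keys(parse_env(open(".env").read()))` returns an empty list when all three credentials are set.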

How to update the project

If you are already running Credential Digger and you want to update it to a newer version, you can refer to the wiki for the needed steps.

Python library usage

Once credentialdigger is installed from pip (or from source), you can instantiate the client and scan a repository.

Instantiate the client for the chosen database:

# Using a Sqlite database
from credentialdigger import SqliteClient
c = SqliteClient(path='/path/to/data.db')

# Using a postgres database
from credentialdigger import PgClient
c = PgClient(dbname='my_db_name',
             dbuser='my_user',
             dbpassword='my_password',
             dbhost='localhost_or_ip',
             dbport=5432)

Add rules

Add rules before launching your first scan.

c.add_rules_from_file('/path/to/rules.yml')

Scan a repository

new_discoveries = c.scan(repo_url='https://github.com/user/repo',
                         models=['PathModel', 'PasswordModel'],
                         debug=True)

WARNING: Make sure you add the rules before your first scan.

Please refer to the Wiki for further information on the arguments.

CLI - Command Line Interface

Credential Digger also offers a simple CLI to scan a repository. The CLI supports both sqlite and postgres databases. With postgres, you need to either export the database connection credentials as environment variables or set up a .env file. With sqlite, the path of the database must be passed as an argument.

Refer to the Wiki for all the supported commands and their usage.
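If you want to drive the CLI from a Python script, one option is to assemble the `scan` invocation as an argument list and pass it to `subprocess.run(cmd, check=True)`. The helper below is a hypothetical sketch covering only the sqlite case; it uses only the `scan`, `--sqlite`, `--similarity` and `--models` options shown earlier in this README and assumes credentialdigger is installed in the same interpreter.

```python
import sys


def build_scan_command(repo_url, db_path, models=(), similarity=False):
    """Assemble the sqlite `scan` invocation shown earlier as a list.

    Hypothetical helper: only flags documented in this README are used.
    """
    cmd = [sys.executable, "-m", "credentialdigger", "scan", repo_url,
           "--sqlite", db_path]
    if similarity:
        cmd.append("--similarity")
    if models:
        cmd += ["--models", *models]
    return cmd
```

Passing a list rather than a single shell string avoids quoting problems with paths or URLs that contain special characters.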

Wiki

For further information, please refer to the Wiki

Contributing

We invite your participation to the project through issues and pull requests. Please refer to the Contributing guidelines for how to contribute.

How to obtain support

As a first step, we suggest reading the Wiki. If you don't find the answers you need, you can open an issue or contact the maintainers.

