acceis / leakscraper

LeakScraper is an efficient set of tools to process and visualize huge text files containing credentials. These tools are designed to help penetration testers and red teamers doing OSINT by gathering credentials belonging to their target.

License: GNU General Public License v3.0

Languages: Shell 1.23%, Python 71.60%, CSS 4.66%, Smarty 18.67%, JavaScript 3.84%
Topics: python3, pentesting, osint, leak, credentials-gathering, redteam, redteaming, credentials

leakscraper's Introduction

leakScraper

(Badges: mongodb version, Python 3.5|3.6, license)

LeakScraper is an efficient set of tools to process and visualize huge text files containing credentials. These tools are designed to help pentesters and red teamers with OSINT, credential gathering and credential stuffing attacks.

Installation

  • First things first: have a working MongoDB server.
  • Then:
     git clone -b mongodb https://github.com/Acceis/leakScraper
     cd leakScraper
     sudo ./install.sh
    This installs a few pip and Debian packages (python-magic, python3-pymongo and bottle).
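
If something fails later with connection errors, the MongoDB server is usually simply not running. A minimal way to check that it is reachable on the default localhost:27017 (a quick sketch, not part of leakScraper):

    from pymongo import MongoClient
    from pymongo.errors import ServerSelectionTimeoutError

    # Fail fast (3 s) instead of waiting for the default 30 s server selection timeout.
    client = MongoClient("mongodb://localhost:27017/", serverSelectionTimeoutMS=3000)
    try:
        client.admin.command("ping")  # cheap round-trip to the server
        print("MongoDB is reachable")
    except ServerSelectionTimeoutError as exc:
        print("MongoDB is not reachable:", exc)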

Requirements

Linux (Debian), Python 3.x and a MongoDB server.

Usage

See the wiki.

Screenshot

The different tools

LeakScraper is split into three parts:

  • leakStandardizer: A tool to standardize leaks you got from some legit place on the internet. It takes as input a file containing credentials in some arbitrary format, possibly containing non-ASCII characters, empty lines, lines with invalid emails or lines with no password. With your help (using regular expressions), it produces an easily greppable file using the following format: email:hash:plain ("plain" standing for "plain-text password"). A rough sketch of this step follows the list.
  • leakImporter: A tool to import a standardized leak file into a MongoDB database. It takes care of extracting data from the file, putting it into a format the database can work with, creating and managing indexes, and so on.
  • leakScraper: A tool and an interface to extract data from the database and display it nicely.
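
As a rough illustration of what the standardizing step boils down to, here is a minimal sketch (not the actual leakStandardizer code; the separator and the regular expression are hypothetical and depend on the leak being processed):

    import re

    # Hypothetical pattern for a leak formatted as "email;password" - adapt it per leak.
    LINE_RE = re.compile(r"^(?P<email>[^@\s;]+@[^@\s;]+\.[A-Za-z]{2,});(?P<plain>.+)$")

    def standardize(src_path, dst_path):
        with open(src_path, "r", encoding="utf-8", errors="ignore") as src, \
             open(dst_path, "w", encoding="ascii", errors="ignore") as dst:
            for line in src:                       # streaming: one line in memory at a time
                m = LINE_RE.match(line.strip())
                if not m:                          # skip empty lines and invalid entries
                    continue
                # Target format: email:hash:plain (hash left empty when the leak has none)
                dst.write("%s::%s\n" % (m.group("email"), m.group("plain")))

    standardize("raw_leak.txt", "standardized_leak.txt")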

Postulates

  • The covered use case is the following: searching for credentials belonging to a specific organization/company/structure. This is done by searching for credentials associated with an email belonging to the organization in question. E.g. searching for credentials belonging to Microsoft means searching for credentials associated with accounts registered with an email ending in "@microsoft.com" (a query sketch follows this list). It is the only use case covered, and that drives a lot of the technical choices (database indexes and data representation in general).

  • Leaks can weigh several gigabytes. This means that each step (standardizing, importing, searching) uses memory-bounded, in-place algorithms. You can know beforehand how much memory these tools will use to process a specific file, and they will never exhaust your computer's resources (unless you have a very old one).

  • Processing huge files and working with a lot of data takes time. In my opinion it is important to have visual, real-time feedback on how long processing or importing a file will take. You want to know whether you just started a 7-hour-long process or a 1,200-year-long one.
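
To make the covered use case concrete, a lookup for every credential registered with an email of a given domain could look like the sketch below (the database, collection and field names are assumptions, not necessarily what leakImporter actually creates):

    import re
    from pymongo import MongoClient

    def search_domain(domain):
        credentials = MongoClient()["leakScraper"]["credentials"]  # hypothetical names
        # Anchored, escaped suffix match: every account registered with an @<domain> email.
        pattern = re.compile(re.escape("@" + domain) + "$")
        for cred in credentials.find({"email": pattern}):
            print(cred.get("email"), cred.get("hash"), cred.get("plain"))

    search_domain("microsoft.com")

Note that a suffix-anchored regex cannot use an ordinary ascending index efficiently, which is presumably one reason the data representation and index strategy matter so much for this use case.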

leakscraper's People

Contributors

almandin, gbonfiglio


leakscraper's Issues

DeprecationWarning

Hey

I'm using Kali 2.

I followed the install process. When I launch the script

python3 leakScraper.py

everything works with no syntax error, but when I visit

localhost:8080

I get this error

Listening on http://127.0.0.1:8080/
Hit Ctrl-C to quit.

leakScraper.py:58: DeprecationWarning: count is deprecated. Use estimated_document_count or count_documents instead. Please note that $where must be replaced by $expr, $near must be replaced by $geoWithin with $center, and $nearSphere must be replaced by $geoWithin with $centerSphere
  count = credentials.count()

Any solution?
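
For reference, the replacement hinted at by the warning is a one-line change on recent pymongo versions (a sketch; "credentials" is the collection opened in leakScraper.py):

    # Collection.count() is deprecated; either of these does the job:
    count = credentials.estimated_document_count()  # fast, based on collection metadata
    count = credentials.count_documents({})         # exact count, takes a filter document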

Internal error

Is the project still maintained? Following your installation guide I get multiple errors, and when running leakScraper.py and browsing to port 8080 I get a 500 error.

Here is the error:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/dist-packages/bottle.py", line 876, in _handle
        return route.call(**args)
      File "/usr/local/lib/python3.9/dist-packages/bottle.py", line 1759, in wrapper
        rv = callback(*a, **ka)
      File "/usr/local/lib/python3.9/dist-packages/bottle.py", line 3688, in wrapper
        result = func(*args, **kwargs)
      File "/home/parrot/Downloads/leakScraper/leakScraper.py", line 58, in index
        count = credentials.count()
      File "/usr/lib/python3/dist-packages/pymongo/collection.py", line 1865, in count
        return self._count(cmd, collation, session)
      File "/usr/lib/python3/dist-packages/pymongo/collection.py", line 1664, in _count
        return self.__database.client._retryable_read(
      File "/usr/lib/python3/dist-packages/pymongo/mongo_client.py", line 1460, in _retryable_read
        server = self._select_server(
      File "/usr/lib/python3/dist-packages/pymongo/mongo_client.py", line 1278, in _select_server
        server = topology.select_server(server_selector)
      File "/usr/lib/python3/dist-packages/pymongo/topology.py", line 241, in select_server
        return random.choice(self.select_servers(selector,
      File "/usr/lib/python3/dist-packages/pymongo/topology.py", line 199, in select_servers
        server_descriptions = self._select_servers_loop(
      File "/usr/lib/python3/dist-packages/pymongo/topology.py", line 215, in _select_servers_loop
        raise ServerSelectionTimeoutError(
    pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 65267068bccd10bc362b42fd, topology_type: Single, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused')>]>

    127.0.0.1 - - [11/Oct/2023] "GET / HTTP/1.1" 500 741
    127.0.0.1 - - [11/Oct/2023] "GET /favicon.ico HTTP/1.1" 404 742

Edit: It seems like count is deprecated.

Inventory notification

Your tool/software has been inventoried on Rawsec's CyberSecurity Inventory.

https://inventory.rawsec.ml/tools.html#leakScraper

What is Rawsec's CyberSecurity Inventory?

An inventory of tools and resources about CyberSecurity. This inventory aims to help people find everything related to CyberSecurity.

  • Open source: all the information is available and kept up to date. If a piece of information is missing or outdated, you are invited to help us.
  • Practical: content is categorized and presented in tables, allowing you to search, browse, sort and filter.
  • Fast: built with static, client-side technologies, resulting in fast browsing.
  • Rich tables: search, sort, browse, filter, clear
  • Fancy informational popups
  • Badges / Shields
  • Static API
  • Twitter bot

More details about features here.

Note: the inventory is a FLOSS (Free, Libre and Open-Source Software) project.

Why?

  • Specialized websites: some websites reference tools, but additional information is not available or browsable, and extra searches take time.
  • Curated lists: curated lists are rarely exhaustive, up to date or browsable, and tend to be topic-specific.
  • Search engines: search engines sometimes find nothing; some tools or resources are too obscure or not referenced anywhere. This is where crowdsourcing beats robots.

Why should you care about being inventoried?

Mainly because it gives visibility to your tool: more and more people are using Rawsec's CyberSecurity Inventory, and it helps them find what they need.

Badges

The badge shows your community that you are inventoried. It also shows that you care about your project and want it to grow, and that your tool is not abandonware.

Feel free to claim your badge here: http://inventory.rawsec.ml/features.html#badges. It looks like the Rawsec's CyberSecurity Inventory badge, but several styles are available.

Want to thank us?

If you want to thank us, you can help make the project better known by tweeting about it! For example: Twitter URL

So what?

That's all; this message is just a notification, in case you care.

Standardiser hangs during use

I'm trying to troubleshoot an issue with the script hanging on unexpected characters.

I originally identified control characters causing problems, so I have been stripping those out of my input files, but I'm still getting problems. Is there a debug mode to indicate which strings it crashed on?

I will start collecting examples of the strings it is crashing on now.
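
In the meantime, a quick way to locate suspicious input lines before running the standardizer is to scan the file for control characters (a standalone sketch, not a leakScraper feature):

    import sys

    # Control bytes, minus tab, line feed and carriage return.
    CONTROL = set(range(0x20)) - {0x09, 0x0A, 0x0D}

    with open(sys.argv[1], "rb") as f:
        for lineno, raw in enumerate(f, 1):
            if any(b in CONTROL for b in raw):
                print("line %d contains control characters: %r" % (lineno, raw[:80]))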
