nccgroup / dirble

609 stars · 18 watchers · 87 forks · 386 KB

Fast directory scanning and scraping tool

License: GNU General Public License v3.0

Languages: Makefile 2.11%, Rust 95.19%, Dockerfile 0.42%, Python 1.18%, Shell 1.10%
Topics: pentest-tool, pentest, web, tool

Introduction

Dirble is a website directory scanning tool for Windows and Linux. It's designed to be fast to run and easy to use.

How to Use

Download one of the precompiled binaries for Linux, Windows, or Mac, or compile the source using Cargo, then run it from a terminal. The default wordlist Dirble uses is dirble_wordlist.txt in the same directory as the executable.

It can be installed on BlackArch using sudo pacman -S dirble

There is also a docker image, which can be run as: docker run --rm -t isona/dirble [dirble arguments]

The help text can be displayed using dirble --help; alternatively, it can be found on the GitHub wiki: https://github.com/nccgroup/dirble/wiki/Help-Text

Example Uses

Run against a website using the default dirble_wordlist.txt from the current directory: dirble [address]

Run with a different wordlist and including .php and .html extensions: dirble [address] -w example_wordlist.txt -x .php,.html

With listable directory scraping enabled: dirble [address] --scrape-listable

Providing a list of extensions and a list of hosts: dirble [address] -X wordlists/web.lst -U hostlist.txt

Providing multiple hosts to scan via command line: dirble [address] -u [address] -u [address]

Running with threading in Gobuster's default style, disabling recursion and having 10 threads scanning the main directory: dirble [address] --max-threads 10 --wordlist-split 10 -r

Building from source

To build on your current platform, ensure cargo is installed and then run cargo build --release. Alternatively, running make will build the binary in release mode (internally running cargo build --release).

To cross-compile for 32- and 64-bit Linux and Windows targets, there is a handy Makefile: make release will build for all four targets using cross. This requires cross and docker to be installed (cargo install cross).

Features

  • Cookies
  • Custom Headers
  • Extensions and prefixes
  • HTTP basic auth
  • Listable directory detection and scraping
  • Save output to file
  • Save output in XML and JSON formats
  • Proxy support
  • Recursion
  • Status code blacklisting and whitelisting
  • Threading
  • Request throttling
  • Detect the "not found" response of each directory based on response code and length
  • Ability to provide list of URLs to be scanned
  • User agents
  • Scanning with GET, POST or HEAD requests
  • Exclude ranges of response lengths from output

Performance

The following graph was generated by running each tool with Hyperfine against a test server with 5 ms latency and 1% packet loss. (Gobuster was omitted due to its lack of recursion.)

[Figure: performance comparison graph of Dirble against other scanning tools]

How it works

Directory Detection

Dirble detects files based on the response code sent by the server. The behaviour can be loosely categorised by response code type.

  • 200: the path exists and is valid
  • 301, 302: redirection; report the code, size, and Location header
  • 404: not found; by default these responses are not reported
  • All other response codes are reported in the Dirble format of + [url] (CODE:[code]|SIZE:[size])

A path is classified as a directory if a request to [url] (with no trailing slash) returns a 301 or 302 redirection to [url]/ (with a trailing slash). This gets reported with a D prefix and, if recursion is enabled, the directory is added to the scan queue. This method does not depend on the redirection target existing or being accessible, so a separate request is made to determine the response code and size of the directory.
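As a rough illustration of that rule, here is a minimal Rust sketch (the struct and function names are invented for this example, not dirble's actual internals):

```rust
// Minimal sketch of the directory-classification rule described above.
// `Response` is an illustrative type, not dirble's internal representation.
struct Response {
    url: String,               // requested URL, without trailing slash
    code: u32,                 // HTTP status code
    location: Option<String>,  // Location header, if present
}

/// A path is treated as a directory if requesting `url` (no trailing
/// slash) returned a 301/302 redirect to exactly `url` + "/".
fn is_directory(resp: &Response) -> bool {
    matches!(resp.code, 301 | 302)
        && resp
            .location
            .as_deref()
            .map_or(false, |loc| loc == format!("{}/", resp.url))
}
```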

Listable directories are detected by inspecting the content of url/: if it returns a 200 response code and the body contains either "parent directory", "up to " or "directory listing for" (case insensitive), then it is likely to be a listable directory. If --scrape-listable is enabled, URLs are parsed out of the listing (ignoring sorting links or out of scope links) and added to the scan queue if they have a trailing slash. Listable directories have an L prefix in the output.
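A hedged sketch of that heuristic (hypothetical helper; the marker strings mirror the description above):

```rust
// Sketch of the listable-directory heuristic described above.
fn looks_listable(code: u32, body: &str) -> bool {
    if code != 200 {
        return false;
    }
    // Case-insensitive search for the markers named above.
    let lower = body.to_lowercase();
    ["parent directory", "up to ", "directory listing for"]
        .iter()
        .any(|marker| lower.contains(marker))
}
```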

Threading

The threading behaviour of Dirble is based on the concepts of wordlists and jobs. A job is any task which can be run independently of other tasks, for example requesting a series of URLs. A wordlist is a list of words with a defined transformation, for example the list {admin, config, shop} together with the transformation append ".php" forms a single wordlist instance.

To improve performance further, we introduce the concept of wordlist splitting. This is the process by which a single wordlist instance (i.e. words with a transformation) is broken up into multiple jobs, each responsible for a portion of the list. The number of interleaved portions that each wordlist is split into is defined by the --wordlist-split option (default 3).
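Interleaved splitting can be sketched as follows (illustrative helper, not dirble's code): with a split factor of 3, the list {admin, config, shop, test, backup} becomes the jobs {admin, test}, {config, backup}, {shop}.

```rust
// Interleaved wordlist splitting: entry i is assigned to job i % split.
fn split_wordlist(words: &[String], split: usize) -> Vec<Vec<String>> {
    let mut jobs = vec![Vec::new(); split];
    for (i, word) in words.iter().enumerate() {
        jobs[i % split].push(word.clone());
    }
    jobs
}
```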

Whenever a directory is detected (and recursion is enabled) new jobs are created for each split wordlist (with transformation) and added to a central job queue.

The maximum number of concurrent tasks is defined by the --max-threads parameter, and Dirble will start jobs as they are added to the queue, up to this limit. Whenever a job completes (i.e. a split wordlist is exhausted) Dirble will take the next job from the queue and start it.
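A minimal sketch of that scheduling loop, assuming jobs are plain closures (dirble's real thread management is more involved):

```rust
use std::collections::VecDeque;
use std::thread;

// A job here is just a closure; dirble's real jobs carry wordlist state.
type Job = Box<dyn FnOnce() + Send>;

fn run_jobs(mut queue: VecDeque<Job>, max_threads: usize) {
    let mut running: Vec<thread::JoinHandle<()>> = Vec::new();
    while !queue.is_empty() || !running.is_empty() {
        // Start queued jobs until the concurrency limit is reached.
        while running.len() < max_threads {
            match queue.pop_front() {
                Some(job) => running.push(thread::spawn(job)),
                None => break,
            }
        }
        // Wait for one running job to finish, then loop to refill the slots.
        if let Some(handle) = running.pop() {
            handle.join().unwrap();
        }
    }
}
```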

Released under GPL v3.0, see LICENSE for more information

dirble's People

Contributors: dependabot[bot], ipv4v6, isona, kwesthaus, lgtm-com[bot], sciguy16, stcktrce


dirble's Issues

Directory detection fails if you specify port 80

If you run Dirble with a URL containing port 80 or 443, e.g. http://[url]:80, most websites will redirect to the URL without the port number, breaking directory detection.
This could be fixed by removing :80 or :443 when the given URL begins with http:// or https:// respectively; a sketch of such a fix follows the example output below.

Invoked as:
dirble http://[url]:80
Output line showing the issue:
+ http://[url]:80/javascript (CODE:301|SIZE:317|DEST:http://[url]/javascript/
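A minimal sketch of the suggested fix, assuming a standalone helper (not dirble's actual URL handling):

```rust
// Drop ":80" after an http:// host or ":443" after an https:// host.
// e.g. "http://example.com:80/javascript" -> "http://example.com/javascript"
fn strip_default_port(url: &str) -> String {
    fn strip(rest: &str, port: &str, scheme: &str) -> Option<String> {
        // Split the host from the rest of the URL at the first '/'.
        let (host, path) = match rest.find('/') {
            Some(i) => (&rest[..i], &rest[i..]),
            None => (rest, ""),
        };
        host.strip_suffix(port)
            .map(|host| format!("{}{}{}", scheme, host, path))
    }
    if let Some(rest) = url.strip_prefix("http://") {
        if let Some(fixed) = strip(rest, ":80", "http://") {
            return fixed;
        }
    } else if let Some(rest) = url.strip_prefix("https://") {
        if let Some(fixed) = strip(rest, ":443", "https://") {
            return fixed;
        }
    }
    url.to_string()
}
```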

Silent option should not show host timeout

Hi, I think the silent option (-S) should not show host timeouts ("Timeout was reached") at runtime; they could be shown in the final results instead.
Also, in silent mode it would be nice to have a progress bar. :D

%ext% support

Hi, I see dirble does not support %ext%/%EXT%. Many wordlists use this format, replacing the placeholder with each extension.
Ex: admin.%ext% -> admin.php / admin.asp / admin.jsp
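A hedged sketch of what such an expansion could look like (hypothetical helper; a real version would presumably handle %EXT% the same way):

```rust
// Expand "%ext%" in a wordlist entry into one entry per extension,
// e.g. "admin.%ext%" with ["php", "asp"] -> ["admin.php", "admin.asp"].
fn expand_ext(word: &str, extensions: &[&str]) -> Vec<String> {
    if word.contains("%ext%") {
        extensions
            .iter()
            .map(|ext| word.replace("%ext%", ext))
            .collect()
    } else {
        vec![word.to_string()]
    }
}
```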

Follow initial redirect

In testing out dirble, I noticed that it will attempt exactly the url that is given, but seems to not understand what to do if, for example, the following scenario is encountered:

./dirble --host abc.com
<dirble brutes abc.com, but abc.com 301's absolutely every request>

curl -skv abc.com
301 to https://abc.com

curl -skv https://abc.com
301 to https://www.abc.com

The real site resides at https://www.abc.com, but the input provided is just abc.com.

wpscan handles this pretty well with a function called 'follow initial redirect'.
If something like that could be possible here, it would greatly improve workflow!

Options for wordlist.

Hi, could you add an option to control whether a leading '/' in wordlist entries is stripped? Currently, if a wordlist entry starts with '/', it is not removed and the request is made with '//'. Some wordlists have a leading '/' by default. Thanks.

Output missing \n

In output.txt, you are missing \n after Dirble Scan Report for https://domain.com
Ex:

Dirble Scan Report for https://domain.com:8443/:Dirble Scan Report for https://domain.com/:+ https://domain.com/.passwd (CODE:0|SIZE:0)
+ https://domain.com/2005 (CODE:0|SIZE:0)

Scanner Tripped up

Could you add a way to detect similar-size responses from a tarpit throwing 200s, and either disengage from the host or, better yet, discard all results that match a similar size when the same size occurs over and over?

Missing `follow redirects` feature

Hey,

While trying this out, I noticed that it is missing the follow-redirects feature which both dirsearch and gobuster have.

It surely helps with servers that redirect every request, like:

  1. port 80 redirecting requests to 443
  2. redirecting to add a forward slash at the end.

Thanks

Further false positive detection

Hi!

I just wanted to drop here another use case that would be great to exclude from the results, marking it as a false positive.

During nonexistent-path detection, it would be great to test a random file with different extensions, as I've seen several cases where the response varies depending only on the appended extension. E.g.:

$ curl -s -o /dev/null -w "%{size_download}" http://[REDACTED]/error/1.html
14
$ curl -s -o /dev/null -w "%{size_download}" http://[REDACTED]/error/1.php
60

In this example, any request that ends in .html will have a size of 14 bytes, and any request that ends in .php will have a size of 60 bytes.

It would be great if the nonexistent-path detection routine could handle these cases too.

My two cents!
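A rough sketch of what a per-extension calibration probe could look like (fetch_size and the probe name are invented for illustration):

```rust
use std::collections::HashMap;

// Record the "not found" body size per extension by requesting a random
// nonexistent name with each extension appended.
fn nonexistent_sizes(
    extensions: &[&str],
    fetch_size: impl Fn(&str) -> usize, // assumed: returns body size for a path
) -> HashMap<String, usize> {
    let mut sizes = HashMap::new();
    for ext in extensions {
        let probe = format!("/dirble-probe-a8f3k{}", ext); // unlikely to exist
        sizes.insert(ext.to_string(), fetch_size(&probe));
    }
    sizes
}
```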

Embed version numbers in release zips

It would be useful to embed the version number in the release zipfile and binary names to make it easier to keep track of which version has been downloaded and whether it is up to date.

JSON Output format

In the JSON output, can we include the target? Like -> .target = "https://google.com"

I'm trying to see if I can integrate this wonderful tool with my automation platform (replacing dirsearch), and I'm trying to match the format that was previously going into the database. Do you ever expect the url to have a different host than the target? Unless you're extracting links from a page that go to other subdomains, I doubt you would, so even just a simple .path = "/api/v1/users/all" would be perfect. I don't really need the entire url in there, but eh.

Thanks 👍

Filter and display response headers

It would be useful to be able to optionally display all or a subset of the response headers from each request, or to flag up when a response header matches a particular search string.

A recent test I did involved checking a load of API endpoints for header injection; my requests included "X-Some-Custom-Header: <script>alert(1)</script>", and the server would sometimes duplicate this header in its response or copy the header value into the response body. Being able to filter responses, across a wordlist of endpoints, based on whether they included the payload would dramatically speed up this testing.

Increase wordlist splitting factor for base scan

Increasing the wordlist splitting factor for the initial scan of the base URL to max(wordlist_split, max_threads - 2) will dramatically increase the speed of the initial discovery phase while leaving a couple of "spare" threads available to start working on any discovered directories. Perhaps this could be the default behaviour, reverting to a fixed splitting factor when the user explicitly provides one on the command line.

Related: I don't think the splitting factor is sanity checked against the thread limit - would it make sense to cap wordlist_split at max_threads when validating the config?
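In code, the proposal amounts to something like this (hypothetical helper around dirble's config, not the actual implementation):

```rust
// Proposed defaults: a larger split for the initial base-URL scan,
// leaving two spare threads, and a sanity cap at the thread limit.
fn effective_splits(wordlist_split: usize, max_threads: usize) -> (usize, usize) {
    let base_split = wordlist_split.max(max_threads.saturating_sub(2));
    let recursive_split = wordlist_split.min(max_threads);
    (base_split, recursive_split)
}
```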

Error `Option::unwrap()`

Hi, I got this error with the newest source code from GitHub

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src/libcore/option.rs:347:21
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::_print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::continue_panic_fmt
   6: rust_begin_unwind
   7: core::panicking::panic_fmt
   8: core::panicking::panic
   9: dirble::output::directory_name
  10: alloc::slice::<impl [T]>::sort_by::{{closure}}
  11: alloc::slice::merge_sort
  12: dirble::output::print_report
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any', src/libcore/result.rs:999:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::_print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::continue_panic_fmt
   6: rust_begin_unwind
   7: core::panicking::panic_fmt
   8: core::result::unwrap_failed
   9: dirble::main
  10: std::rt::lang_start::{{closure}}
  11: std::panicking::try::do_call
  12: __rust_maybe_catch_panic
  13: std::rt::lang_start_internal
  14: main

Project Roadmap

A list of features that would be nice to add, listed in no particular order:

Input

  • Load base request from a file
  • Load headers from a file
  • Remove empty lines from a wordlist when it's read in, but always scan [url]/
  • Support for multiple wordlists
  • Load command line options from a config file
  • Better detection of where the default wordlist is located
  • Option to pause and resume scans later

Error Checking

  • Check before scanning if a certificate is invalid
  • Optionally output certificate details
  • Better errors when curl returns an error; this is currently represented as code 0
  • Detection and handling of URL rewriting
  • Wait after receiving a 429 - Too Many Requests
  • Detect when all responses are 401 - Unauthorized or 403 - Forbidden

Output

  • Better header on report
  • Separate report sections for different hosts
  • JSON output format
  • XML output format
  • Option to store all output formats
  • Filter output based on regex
  • Filter on response length
  • Option to output all "found" content to a folder
  • Option to display when a cookie is set by the server
  • Output colouring based on response code
  • Security header audit
  • Option to output Page Title
  • Filtering on MIME type
  • Option to report MIME type

Scraping

  • Scrape pages for in scope URLs to scan
  • Printing of interesting comments: things such as TODOs, URLs, and high-entropy sections such as hashes
  • Scrape robots.txt for URLs to scan

Scanning

  • Detect if a server is case sensitive
  • Detect if a server replies sensibly to HEAD requests and if it does, use those to save bandwidth (would potentially interfere with scraping however)
  • Support for different HTTP verbs
  • Option to change string used to detect if a directory is listable
  • Set which status codes to ignore/output
  • Interactive recursion
  • Option to not scan without an extension
  • Set subdirectories to exclude in the scan
  • Options to set predefined user agents
  • Option to use random user agents
  • Better 30x handling
  • Wordlist prefixes
  • Vhost bruteforcing
  • Not found tuning
  • Max recursion depth setting
  • Get OPTIONS for each folder scanned
  • Option to take a screenshot of each page found, similar to Eyewitness
  • Check if a 401 page is requesting basic auth

Releasing

  • Mac build
  • Debian dpkg
  • Arch pkgbuild
  • Ubuntu snap
  • Ubuntu PPA
  • Centos/Fedora RPM
  • Gentoo ebuild
  • Windows installer (self-updating?)
  • Mac Homebrew release
  • Release on crates.io
  • Generate man page with clap
  • Generate auto-completion script with clap

Actions

  • Run tests on Windows, Mac, Linux
  • Cross-compile for ARM
  • Build releases
  • Build dpkg & RPM

stream did not contain valid UTF-8

I quickly tested dirble and I came across this error message.

$ ./dirble http://127.0.0.1:8000 -w ~/clones/SecLists/Discovery/Web-Content/big.txt
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: InvalidData, error: StringError("stream did not contain valid UTF-8") }', src/libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

It seems like this line causes it. Removing the line fixed the error.

https://github.com/danielmiessler/SecLists/blob/master/Discovery/Web-Content/big.txt#L16072
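One way to make the loader tolerant of such lines is a lossy read instead of a strict unwrap; a minimal sketch (illustrative, not dirble's actual wordlist module):

```rust
use std::fs;

// Read the wordlist as raw bytes and replace invalid UTF-8 sequences
// with U+FFFD rather than panicking on a strict conversion.
fn read_wordlist(path: &str) -> std::io::Result<Vec<String>> {
    let bytes = fs::read(path)?;
    Ok(String::from_utf8_lossy(&bytes)
        .lines()
        .filter(|line| !line.is_empty())
        .map(str::to_string)
        .collect())
}
```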

Library to parse JSON output

It would be useful to have a library that can parse the JSON output and provide iterators over the discovered content.

Raise error when -oA is provided

Currently the "output all" short alias is --oA with two dashes. Nmap uses -oA with one dash, and attempts to use "-oA" with dirble result in weird error messages (e.g. #43).

Short term: Error when -oA is specified
Long term: Make it so that -oA works as expected

Utf8Error - Result::unwrap()

Hi!

I'm using dirble to run a scan using this wordlist: https://gist.github.com/jhaddix/b80ea67d85c13206125806f0828f4d10

with this options:

RUST_BACKTRACE=full ./dirble -l --scrape-listable --scan-401 --scan-403 --show-htaccess -w ../../content_discovery_all.txt -x js,php,java,bak,sql,inc,config,old,1 -u http://blank.blank

and this is what I get:

Dirble 1.4.2 (commit b6c46aa, build 2019-10-28)
Developed by Izzy Whistlecroft
Targets: http://blank.blank
Wordlists: ../../content_discovery_all.txt
No Prefixes
Extensions: 1 bak config inc java js old php sql
No lengths hidden

[INFO] Detected nonexistent paths for http:/blank.blank/ are (CODE:301)
[INFO] Increasing wordlist-split for initial scan of http://blank.blank/ to 8
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 41, error_len: Some(1) }', src/libcore/result.rs:1084:5
stack backtrace:
   0:     0x5635b03d54ab - std::panicking::default_hook::{{closure}}::hd4d730f4b49280ac
   1:     0x5635b03d5186 - std::panicking::default_hook::h15ad337e082b11af
   2:     0x5635b03d5c1d - std::panicking::rust_panic_with_hook::h1ae6f71213bb644c
   3:     0x5635b03d57a2 - std::panicking::continue_panic_fmt::h7260e5946830995a
   4:     0x5635b03d5686 - rust_begin_unwind
   5:     0x5635b03ece2d - core::panicking::panic_fmt::h0f33ccf7fc2a1201
   6:     0x5635b03ecf27 - core::result::unwrap_failed::h5f2f3948a0c719bd
   7:     0x5635b02a9bac - dirble::request::make_request::h6d5658e1e763b468
   8:     0x5635b02810af - dirble::request_thread::thread_spawn::h8b4fdc807a27d39e
   9:     0x5635b0284155 - std::sys_common::backtrace::__rust_begin_short_backtrace::hb12b1413905fb8ee
  10:     0x5635b029c476 - std::panicking::try::do_call::h1c781cdca5ded62e
  11:     0x5635b03d87da - __rust_maybe_catch_panic
  12:     0x5635b0285986 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hcdba3607b5c903c6
  13:     0x5635b03c9daf - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h31390944ec2de39e
  14:     0x5635b03d7ef0 - std::sys::unix::thread::Thread::new::thread_start::h98ef2794a4d7713d
  15:     0x7f2d2d3fd4cf - start_thread
  16:     0x7f2d2d3122d3 - clone
  17:                0x0 - <unknown>

Curl error after requesting c:/Users/Personal%201/Desktop/Portofolio : [1] Unsupported protocol
+ c:/Users/Personal%201/Desktop/Portofolio (CODE:0|SIZE:0)
Curl error after requesting c:/Users/ctyi/Desktop : [1] Unsupported protocol
+ c:/Users/ctyi/Desktop (CODE:0|SIZE:0)
Curl error after requesting c:/Users/ctyi/Desktop1 : [1] Unsupported protocol
+ c:/Users/ctyi/Desktop1 (CODE:0|SIZE:0)
Curl error after requesting c:/Users/K.HOW/Desktop/code/Responsive-Portfolio : [1] Unsupported protocol
+ c:/Users/K.HOW/Desktop/code/Responsive-Portfolio (CODE:0|SIZE:0)

Proper serialisation into JSON and XML

Currently there is no sanitisation or encoding of RequestResponse data when it gets serialised into XML and JSON. A URL or Location header could contain braces or angle brackets, which will mess up the structure. We should implement or derive the Serialize trait properly and write unit tests with unusual inputs.

https://github.com/nccgroup/dirble/blob/2eb9801b4083aa2a2181229199d66db899925758/src/output_format.rs#L97-L113
https://github.com/nccgroup/dirble/blob/2eb9801b4083aa2a2181229199d66db899925758/src/output_format.rs#L116-L134
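For the JSON side, a minimal sketch of the derive approach, assuming serde and serde_json as dependencies and an illustrative subset of RequestResponse's fields:

```rust
use serde::Serialize;

// Illustrative shape; the real RequestResponse struct has more fields.
#[derive(Serialize)]
struct RequestResponse {
    url: String,
    code: u32,
    content_len: usize,
    redirect_url: Option<String>,
}

fn main() {
    let record = RequestResponse {
        url: "http://example.com/<script>\"}".into(),
        code: 200,
        content_len: 14,
        redirect_url: None,
    };
    // serde_json escapes quotes and control characters itself, so unusual
    // URLs or Location headers cannot break the output structure.
    println!("{}", serde_json::to_string(&record).unwrap());
}
```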
