nccgroup / dirble

609 stars · 18 watchers · 87 forks · 386 KB

Fast directory scanning and scraping tool

License: GNU General Public License v3.0

Languages: Makefile 2.11%, Rust 95.19%, Dockerfile 0.42%, Python 1.18%, Shell 1.10%
Topics: pentest-tool, pentest, web, tool

Introduction

Dirble is a website directory scanning tool for Windows and Linux. It's designed to be fast to run and easy to use.

How to Use

Download one of the precompiled binaries for Linux, Windows, or Mac, or compile the source using Cargo, then run it from a terminal. The default wordlist Dirble uses is dirble_wordlist.txt in the same directory as the executable.

It can be installed on BlackArch using sudo pacman -S dirble

There is also a docker image, which can be run as: docker run --rm -t isona/dirble [dirble arguments]

The help text can be displayed using dirble --help; alternatively, it can be found on the GitHub wiki: https://github.com/nccgroup/dirble/wiki/Help-Text

Example Uses

Run against a website using the default dirble_wordlist.txt from the current directory: dirble [address]

Run with a different wordlist and including .php and .html extensions: dirble [address] -w example_wordlist.txt -x .php,.html

With listable directory scraping enabled: dirble [address] --scrape-listable

Providing a list of extensions and a list of hosts: dirble [address] -X wordlists/web.lst -U hostlist.txt

Providing multiple hosts to scan via command line: dirble [address] -u [address] -u [address]

Running with threading in Gobuster's default style, disabling recursion and having 10 threads scanning the main directory: dirble [address] --max-threads 10 --wordlist-split 10 -r

Building from source

To build on your current platform, ensure cargo is installed and then run cargo build --release. Alternatively, running make will build the binary in release mode (internally running cargo build --release).

To cross-compile for 32- and 64-bit Linux and Windows targets, there is a handy Makefile: make release will build for all four targets using cross. This requires cross and docker to be installed (cargo install cross).

Features

  • Cookies
  • Custom Headers
  • Extensions and prefixes
  • HTTP basic auth
  • Listable directory detection and scraping
  • Save output to file
  • Save output in XML and JSON formats
  • Proxy support
  • Recursion
  • Status code blacklisting and whitelisting
  • Threading
  • Request throttling
  • Detect the "not found" response of each directory based on response code and length
  • Ability to provide list of URLs to be scanned
  • User agents
  • Scanning with GET, POST or HEAD requests
  • Exclude ranges of response lengths from output

Performance

The following graph was generated by running each tool with Hyperfine against a test server with 5 ms latency and 1% packet loss. (Gobuster was omitted due to its lack of recursion.)

[Figure: performance comparison graph of Dirble against other scanning tools]

How it works

Directory Detection

Dirble detects files based on the response code sent by the server. The behaviour can be loosely categorised by response code type.

  • 200: the path exists and is valid
  • 301, 302: redirection; report the code, size, and Location header
  • 404: not found; by default these responses are not reported
  • All other response codes are reported in the Dirble format of + [url] (CODE:[code]|SIZE:[size])

A path is classified as a directory if a request to [url] (with no trailing slash) returns a 301 or 302 redirection to [url]/ (with a trailing slash). This gets reported with a D prefix and, if recursion is enabled, the directory is added to the scan queue. This method does not depend on the redirection target existing or being accessible, so a separate request is made to determine the response code and size of the directory.
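As a rough illustration of that rule, here is a minimal Rust sketch (the struct and function names are invented for this example, not dirble's actual internals):

```rust
// Minimal sketch of the directory-classification rule described above.
// `Response` is an illustrative type, not dirble's internal representation.
struct Response {
    url: String,               // requested URL, without trailing slash
    code: u32,                 // HTTP status code
    location: Option<String>,  // Location header, if present
}

/// A path is treated as a directory if requesting `url` (no trailing
/// slash) returned a 301/302 redirect to exactly `url` + "/".
fn is_directory(resp: &Response) -> bool {
    matches!(resp.code, 301 | 302)
        && resp
            .location
            .as_deref()
            .map_or(false, |loc| loc == format!("{}/", resp.url))
}
```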

Listable directories are detected by inspecting the content of url/: if it returns a 200 response code and the body contains either "parent directory", "up to " or "directory listing for" (case insensitive), then it is likely to be a listable directory. If --scrape-listable is enabled, URLs are parsed out of the listing (ignoring sorting links or out of scope links) and added to the scan queue if they have a trailing slash. Listable directories have an L prefix in the output.
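A hedged sketch of that heuristic (hypothetical helper; the marker strings mirror the description above):

```rust
// Sketch of the listable-directory heuristic described above.
fn looks_listable(code: u32, body: &str) -> bool {
    if code != 200 {
        return false;
    }
    // Case-insensitive search for the markers named above.
    let lower = body.to_lowercase();
    ["parent directory", "up to ", "directory listing for"]
        .iter()
        .any(|marker| lower.contains(marker))
}
```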

Threading

The threading behaviour of Dirble is based on the concepts of wordlists and jobs. A job is any task which can be run independently of other tasks, for example requesting a series of URLs. A wordlist is a list of words with a defined transformation, for example the list {admin, config, shop} together with the transformation append ".php" forms a single wordlist instance.

To improve performance further, we introduce the concept of wordlist splitting. This is the process by which a single wordlist instance (i.e. words with a transformation) is broken up into multiple jobs, each responsible for a portion of the list. The number of interleaved portions that each wordlist is split into is defined by the --wordlist-split option (default 3).
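Interleaved splitting can be sketched as follows (illustrative helper, not dirble's code): with a split factor of 3, the list {admin, config, shop, test, backup} becomes the jobs {admin, test}, {config, backup}, {shop}.

```rust
// Interleaved wordlist splitting: entry i is assigned to job i % split.
fn split_wordlist(words: &[String], split: usize) -> Vec<Vec<String>> {
    let mut jobs = vec![Vec::new(); split];
    for (i, word) in words.iter().enumerate() {
        jobs[i % split].push(word.clone());
    }
    jobs
}
```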

Whenever a directory is detected (and recursion is enabled) new jobs are created for each split wordlist (with transformation) and added to a central job queue.

The maximum number of concurrent tasks is defined by the --max-threads parameter, and Dirble will start jobs as they are added to the queue, up to this limit. Whenever a job completes (i.e. a split wordlist is exhausted) Dirble will take the next job from the queue and start it.
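A minimal sketch of that scheduling loop, assuming jobs are plain closures (dirble's real thread management is more involved):

```rust
use std::collections::VecDeque;
use std::thread;

// A job here is just a closure; dirble's real jobs carry wordlist state.
type Job = Box<dyn FnOnce() + Send>;

fn run_jobs(mut queue: VecDeque<Job>, max_threads: usize) {
    let mut running: Vec<thread::JoinHandle<()>> = Vec::new();
    while !queue.is_empty() || !running.is_empty() {
        // Start queued jobs until the concurrency limit is reached.
        while running.len() < max_threads {
            match queue.pop_front() {
                Some(job) => running.push(thread::spawn(job)),
                None => break,
            }
        }
        // Wait for one running job to finish, then loop to refill the slots.
        if let Some(handle) = running.pop() {
            handle.join().unwrap();
        }
    }
}
```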

Released under GPL v3.0, see LICENSE for more information

dirble's People

Contributors: dependabot[bot], ipv4v6, isona, kwesthaus, lgtm-com[bot], sciguy16, stcktrce


dirble's Issues

Directory detection fails if you specify port 80

If you run Dirble with a URL containing port 80 or 443, e.g. http://[url]:80, most websites will redirect to the URL without the port number, breaking directory detection.
This could be fixed by removing :80 or :443 when the given URL begins with http:// or https:// respectively; a sketch of such a fix follows the example output below.

Invoked as:
dirble http://[url]:80
Output line showing the issue:
+ http://[url]:80/javascript (CODE:301|SIZE:317|DEST:http://[url]/javascript/
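A minimal sketch of the suggested fix, assuming a standalone helper (not dirble's actual URL handling):

```rust
// Drop ":80" after an http:// host or ":443" after an https:// host.
// e.g. "http://example.com:80/javascript" -> "http://example.com/javascript"
fn strip_default_port(url: &str) -> String {
    fn strip(rest: &str, port: &str, scheme: &str) -> Option<String> {
        // Split the host from the rest of the URL at the first '/'.
        let (host, path) = match rest.find('/') {
            Some(i) => (&rest[..i], &rest[i..]),
            None => (rest, ""),
        };
        host.strip_suffix(port)
            .map(|host| format!("{}{}{}", scheme, host, path))
    }
    if let Some(rest) = url.strip_prefix("http://") {
        if let Some(fixed) = strip(rest, ":80", "http://") {
            return fixed;
        }
    } else if let Some(rest) = url.strip_prefix("https://") {
        if let Some(fixed) = strip(rest, ":443", "https://") {
            return fixed;
        }
    }
    url.to_string()
}
```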

Silent option should not show host timeout

Hi, I think the silent option (-S) should not show host timeouts ("Timeout was reached") at runtime; they could be shown in the final results instead.
Also, in silent mode it would be nice to have a progress bar. :D

%ext% support

Hi, I see dirble does not support %ext%/%EXT%. Many wordlists use this format, replacing the placeholder with each extension.
Ex: admin.%ext% -> admin.php / admin.asp / admin.jsp
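A hedged sketch of what such an expansion could look like (hypothetical helper; a real version would presumably handle %EXT% the same way):

```rust
// Expand "%ext%" in a wordlist entry into one entry per extension,
// e.g. "admin.%ext%" with ["php", "asp"] -> ["admin.php", "admin.asp"].
fn expand_ext(word: &str, extensions: &[&str]) -> Vec<String> {
    if word.contains("%ext%") {
        extensions
            .iter()
            .map(|ext| word.replace("%ext%", ext))
            .collect()
    } else {
        vec![word.to_string()]
    }
}
```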

Follow initial redirect

In testing out dirble, I noticed that it will attempt exactly the url that is given, but seems to not understand what to do if, for example, the following scenario is encountered:

./dirble --host abc.com
<dirble brutes abc.com, but abc.com 301's absolutely every request>

curl -skv abc.com
301 to https://abc.com

curl -skv https://abc.com
301 to https://www.abc.com

The real site resides at https://www.abc.com, but the input provided is just abc.com.

wpscan handles this pretty well with a function called 'follow initial redirect'.
If something like that could be possible here, it would greatly improve workflow!

Options for wordlist.

Hi, could you add an option to control whether a leading '/' in wordlist entries is stripped? Currently, if a wordlist entry starts with '/', it is not removed and the request is made with '//'. Some wordlists have a leading '/' by default. Thanks.

Output missing \n

In output.txt, you are missing \n after Dirble Scan Report for https://domain.com
Ex:

Dirble Scan Report for https://domain.com:8443/:Dirble Scan Report for https://domain.com/:+ https://domain.com/.passwd (CODE:0|SIZE:0)
+ https://domain.com/2005 (CODE:0|SIZE:0)

Scanner Tripped up

Could you add a way to detect similar-size responses from a tarpit throwing 200s, and either disengage from the host or, better yet, discard all results that match a similar size when the same size occurs over and over?

Missing `follow redirects` feature

Hey,

While trying this out, I noticed that it is missing the follow-redirects feature which both dirsearch and gobuster have.

It surely helps with servers that redirect every request, like:

  1. port 80 redirecting requests to 443
  2. redirecting to add a forward slash at the end.

Thanks

Further false positive detection

Hi!

I just wanted to drop here another use case that would be great to exclude from the results, marking it as a false positive.

During nonexistent-path detection, it would be great to test a random file with different extensions, as I've seen several cases where the response varies depending only on the appended extension. E.g.:

$ curl -s -o /dev/null -w "%{size_download}" http://[REDACTED]/error/1.html
14
$ curl -s -o /dev/null -w "%{size_download}" http://[REDACTED]/error/1.php
60

In this example, any request that ends in .html will have a size of 14 bytes, and any request that ends in .php will have a size of 60 bytes.

It would be great if the nonexistent-path detection routine could handle these cases too.

My two cents!
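A rough sketch of what a per-extension calibration probe could look like (fetch_size and the probe name are invented for illustration):

```rust
use std::collections::HashMap;

// Record the "not found" body size per extension by requesting a random
// nonexistent name with each extension appended.
fn nonexistent_sizes(
    extensions: &[&str],
    fetch_size: impl Fn(&str) -> usize, // assumed: returns body size for a path
) -> HashMap<String, usize> {
    let mut sizes = HashMap::new();
    for ext in extensions {
        let probe = format!("/dirble-probe-a8f3k{}", ext); // unlikely to exist
        sizes.insert(ext.to_string(), fetch_size(&probe));
    }
    sizes
}
```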

Embed version numbers in release zips

It would be useful to embed the version number in the release zipfile and binary names to make it easier to keep track of which version has been downloaded and whether it is up to date.

JSON Output format

In the JSON output, can we include the target? Like -> .target = "https://google.com"

I'm trying to see if I can integrate this wonderful tool with my automation platform (replacing dirsearch), and I'm trying to match the format that was previously going into the database. Do you ever expect the url to have a different host than the target? Unless you're extracting links from a page that go to other subdomains, I doubt you would, so even just a simple .path = "/api/v1/users/all" would be perfect. I don't really need the entire url in there, but eh.

Thanks 👍

Filter and display response headers

It would be useful to be able to optionally display all or a subset of the response headers from each request, or to flag up when a response header matches a particular search string.

A recent test I did involved checking a load of API endpoints for header injection; my requests included "X-Some-Custom-Header: <script>alert(1)</script>", and the server would sometimes duplicate this header in its response or copy the header value into the response body. Being able to filter responses, across a wordlist of endpoints, based on whether they included the payload would dramatically speed up this testing.

Increase wordlist splitting factor for base scan

Increasing the wordlist splitting factor for the initial scan of the base URL to max(wordlist_split, max_threads - 2) will dramatically increase the speed of the initial discovery phase while leaving a couple of "spare" threads available to start working on any discovered directories. Perhaps this could be the default behaviour, reverting to a fixed splitting factor when the user explicitly provides one on the command line.

Related: I don't think the splitting factor is sanity checked against the thread limit - would it make sense to cap wordlist_split at max_threads when validating the config?
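In code, the proposal amounts to something like this (hypothetical helper around dirble's config, not the actual implementation):

```rust
// Proposed defaults: a larger split for the initial base-URL scan,
// leaving two spare threads, and a sanity cap at the thread limit.
fn effective_splits(wordlist_split: usize, max_threads: usize) -> (usize, usize) {
    let base_split = wordlist_split.max(max_threads.saturating_sub(2));
    let recursive_split = wordlist_split.min(max_threads);
    (base_split, recursive_split)
}
```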

Error `Option::unwrap()`

Hi, I got this error with the newest source code from GitHub

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src/libcore/option.rs:347:21
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::_print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::continue_panic_fmt
   6: rust_begin_unwind
   7: core::panicking::panic_fmt
   8: core::panicking::panic
   9: dirble::output::directory_name
  10: alloc::slice::<impl [T]>::sort_by::{{closure}}
  11: alloc::slice::merge_sort
  12: dirble::output::print_report
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any', src/libcore/result.rs:999:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::_print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::continue_panic_fmt
   6: rust_begin_unwind
   7: core::panicking::panic_fmt
   8: core::result::unwrap_failed
   9: dirble::main
  10: std::rt::lang_start::{{closure}}
  11: std::panicking::try::do_call
  12: __rust_maybe_catch_panic
  13: std::rt::lang_start_internal
  14: main

Project Roadmap

A list of features that would be nice to add, listed in no particular order:

Input

  • Load base request from a file
  • Load headers from a file
  • Remove empty lines from a wordlist when it's read in, but always scan [url]/
  • Support for multiple wordlists
  • Load command line options from a config file
  • Better detection of where the default wordlist is located
  • Option to pause and resume scans later

Error Checking

  • Check before scanning if a certificate is invalid
  • Optionally output certificate details
  • Better errors when curl returns an error; this is currently represented as code 0
  • Detection and handling of URL rewriting
  • Wait after receiving a 429 - Too Many Requests
  • Detect when all responses are 401 - Unauthorized or 403 - Forbidden

Output

  • Better header on report
  • Separate report sections for different hosts
  • JSON output format
  • XML output format
  • Option to store all output formats
  • Filter output based on regex
  • Filter on response length
  • Option to output all "found" content to a folder
  • Option to display when a cookie is set by the server
  • Output colouring based on response code
  • Security header audit
  • Option to output Page Title
  • Filtering on MIME type
  • Option to report MIME type

Scraping

  • Scrape pages for in scope URLs to scan
  • Printing of interesting comments: things such as TODOs, URLs, and high-entropy sections such as hashes
  • Scrape robots.txt for URLs to scan

Scanning

  • Detect if a server is case sensitive
  • Detect if a server replies sensibly to HEAD requests and if it does, use those to save bandwidth (would potentially interfere with scraping however)
  • Support for different HTTP verbs
  • Option to change string used to detect if a directory is listable
  • Set which status codes to ignore/output
  • Interactive recursion
  • Option to not scan without an extension
  • Set subdirectories to exclude in the scan
  • Options to set predefined user agents
  • Option to use random user agents
  • Better 30x handling
  • Wordlist prefixes
  • Vhost bruteforcing
  • Not found tuning
  • Max recursion depth setting
  • Get OPTIONS for each folder scanned
  • Option to take a screenshot of each page found, similar to Eyewitness
  • Check if a 401 page is requesting basic auth

Releasing

  • Mac build
  • Debian dpkg
  • Arch pkgbuild
  • Ubuntu snap
  • Ubuntu PPA
  • Centos/Fedora RPM
  • Gentoo ebuild
  • Windows installer (self-updating?)
  • Mac Homebrew release
  • Release on crates.io
  • Generate man page with clap
  • Generate auto-completion script with clap

Actions

  • Run tests on Windows, Mac, Linux
  • Cross-compile for ARM
  • Build releases
  • Build dpkg & RPM

stream did not contain valid UTF-8

I quickly tested dirble and I came across this error message.

$ ./dirble http://127.0.0.1:8000 -w ~/clones/SecLists/Discovery/Web-Content/big.txt
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: InvalidData, error: StringError("stream did not contain valid UTF-8") }', src/libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

It seems like this line causes it. Removing the line fixed the error.

https://github.com/danielmiessler/SecLists/blob/master/Discovery/Web-Content/big.txt#L16072
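One way to make the loader tolerant of such lines is a lossy read instead of a strict unwrap; a minimal sketch (illustrative, not dirble's actual wordlist module):

```rust
use std::fs;

// Read the wordlist as raw bytes and replace invalid UTF-8 sequences
// with U+FFFD rather than panicking on a strict conversion.
fn read_wordlist(path: &str) -> std::io::Result<Vec<String>> {
    let bytes = fs::read(path)?;
    Ok(String::from_utf8_lossy(&bytes)
        .lines()
        .filter(|line| !line.is_empty())
        .map(str::to_string)
        .collect())
}
```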

Library to parse JSON output

It would be useful to have a library that can parse the JSON output and provide iterators over the discovered content.

Raise error when -oA is provided

Currently the "output all" short alias is --oA with two dashes. Nmap uses -oA with one dash, and attempts to use "-oA" with dirble result in weird error messages (e.g. #43).

Short term: Error when -oA is specified
Long term: Make it so that -oA works as expected

Utf8Error - Result::unwrap()

Hi!

I'm using dirble to run a scan using this wordlist: https://gist.github.com/jhaddix/b80ea67d85c13206125806f0828f4d10

with this options:

RUST_BACKTRACE=full ./dirble -l --scrape-listable --scan-401 --scan-403 --show-htaccess -w ../../content_discovery_all.txt -x js,php,java,bak,sql,inc,config,old,1 -u http://blank.blank

and this is what I get:

Dirble 1.4.2 (commit b6c46aa, build 2019-10-28)
Developed by Izzy Whistlecroft
Targets: http://blank.blank
Wordlists: ../../content_discovery_all.txt
No Prefixes
Extensions: 1 bak config inc java js old php sql
No lengths hidden

[INFO] Detected nonexistent paths for http:/blank.blank/ are (CODE:301)
[INFO] Increasing wordlist-split for initial scan of http://blank.blank/ to 8
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 41, error_len: Some(1) }', src/libcore/result.rs:1084:5
stack backtrace:
   0:     0x5635b03d54ab - std::panicking::default_hook::{{closure}}::hd4d730f4b49280ac
   1:     0x5635b03d5186 - std::panicking::default_hook::h15ad337e082b11af
   2:     0x5635b03d5c1d - std::panicking::rust_panic_with_hook::h1ae6f71213bb644c
   3:     0x5635b03d57a2 - std::panicking::continue_panic_fmt::h7260e5946830995a
   4:     0x5635b03d5686 - rust_begin_unwind
   5:     0x5635b03ece2d - core::panicking::panic_fmt::h0f33ccf7fc2a1201
   6:     0x5635b03ecf27 - core::result::unwrap_failed::h5f2f3948a0c719bd
   7:     0x5635b02a9bac - dirble::request::make_request::h6d5658e1e763b468
   8:     0x5635b02810af - dirble::request_thread::thread_spawn::h8b4fdc807a27d39e
   9:     0x5635b0284155 - std::sys_common::backtrace::__rust_begin_short_backtrace::hb12b1413905fb8ee
  10:     0x5635b029c476 - std::panicking::try::do_call::h1c781cdca5ded62e
  11:     0x5635b03d87da - __rust_maybe_catch_panic
  12:     0x5635b0285986 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hcdba3607b5c903c6
  13:     0x5635b03c9daf - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h31390944ec2de39e
  14:     0x5635b03d7ef0 - std::sys::unix::thread::Thread::new::thread_start::h98ef2794a4d7713d
  15:     0x7f2d2d3fd4cf - start_thread
  16:     0x7f2d2d3122d3 - clone
  17:                0x0 - <unknown>

Curl error after requesting c:/Users/Personal%201/Desktop/Portofolio : [1] Unsupported protocol
+ c:/Users/Personal%201/Desktop/Portofolio (CODE:0|SIZE:0)
Curl error after requesting c:/Users/ctyi/Desktop : [1] Unsupported protocol
+ c:/Users/ctyi/Desktop (CODE:0|SIZE:0)
Curl error after requesting c:/Users/ctyi/Desktop1 : [1] Unsupported protocol
+ c:/Users/ctyi/Desktop1 (CODE:0|SIZE:0)
Curl error after requesting c:/Users/K.HOW/Desktop/code/Responsive-Portfolio : [1] Unsupported protocol
+ c:/Users/K.HOW/Desktop/code/Responsive-Portfolio (CODE:0|SIZE:0)

Proper serialisation into JSON and XML

Currently there is no sanitisation or encoding of RequestResponse data when it gets serialised into XML and JSON. A URL or Location header could contain braces or angle brackets, which will mess up the structure. We should implement or derive the Serialize trait properly and write unit tests with unusual inputs.

https://github.com/nccgroup/dirble/blob/2eb9801b4083aa2a2181229199d66db899925758/src/output_format.rs#L97-L113
https://github.com/nccgroup/dirble/blob/2eb9801b4083aa2a2181229199d66db899925758/src/output_format.rs#L116-L134
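For the JSON side, a minimal sketch of the derive approach, assuming serde and serde_json as dependencies and an illustrative subset of RequestResponse's fields:

```rust
use serde::Serialize;

// Illustrative shape; the real RequestResponse struct has more fields.
#[derive(Serialize)]
struct RequestResponse {
    url: String,
    code: u32,
    content_len: usize,
    redirect_url: Option<String>,
}

fn main() {
    let record = RequestResponse {
        url: "http://example.com/<script>\"}".into(),
        code: 200,
        content_len: 14,
        redirect_url: None,
    };
    // serde_json escapes quotes and control characters itself, so unusual
    // URLs or Location headers cannot break the output structure.
    println!("{}", serde_json::to_string(&record).unwrap());
}
```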
