

EDEFuzz - Hunting excessive data exposure in web APIs

A tool to flag excessive data exposure vulnerabilities in web APIs. Tested on Ubuntu 20.04 and Windows 10 22H2.

The design and implementation of EDEFuzz are described in our paper.

Citing EDEFuzz

EDEFuzz was accepted for publication at the 46th International Conference on Software Engineering (ICSE 2024).

If you use EDEFuzz in your research, please cite our ICSE 2024 paper.

@inproceedings{pan2024edefuzz,
  title={EDEFuzz: A Web API Fuzzer for Excessive Data Exposures},
  author={Pan, Lianglu and Cohney, Shaanan and Murray, Toby and Pham, Van-Thuan},
  booktitle={Proceedings of the 46th IEEE/ACM International Conference on Software Engineering},
  pages={1--12},
  year={2024}
}

Folder structure

EDEFuzz
├── config: this folder stores configuration files, prepared manually (discussed in section 4.1)
│   ├── *.config: the configuration file, containing a sequence of instructions for EDEFuzz to interact with the web page under test
├── tests: contents in this folder are generated by the tool
│   ├── *.json: the API response produced by the server, used as a baseline in our experiments
│   └── *.data: the captured request-response pairs, cached so that we can run a simulated server (discussed in section 4.2)
├── cache.py: to generate the cache file for a target API, used by `fuzzer.py`
├── constants.py: defines some constant values used in EDEFuzz, nothing special
├── database.py: database utility functions
├── engine.py: the fuzzer engine, used by `fuzzer.py` to execute test cases
├── fuzzer.py: main entry of EDEFuzz
├── mutate.py: generates mutations of an API response
├── README.md
├── report.py: the result analyser, used by `fuzzer.py` to analyse results
└── report_html.py: generates the HTML file that highlights flagged excessive fields, used by `report.py`

Initial setup

Docker setup

Run sudo docker-compose up. This builds two Docker images and starts them. Once complete, you should see output lines beginning with edefuzz-mysql | ..., indicating that the database is running.

  • edefuzz-mysql: the database used by EDEFuzz to store data
  • edefuzz: EDEFuzz will run within this container

If you encounter an http: invalid Host header error, try sudo snap refresh docker --channel=latest/edge. You may find more information about this error in this link.

Usage

In a new terminal window, open a shell in the edefuzz Docker container with docker exec -it edefuzz bash, then change into the EDEFuzz folder with cd edefuzz.

Preparation

  • Create a configuration file under the config/ folder (discussed in section 4.1). A few examples are provided in the config/ folder.
  • Run python3 fuzzer.py c [target] (e.g. python3 fuzzer.py c wikipedia) to generate the request-response cache (used by the simulated server in the fuzzing stage, discussed in section 4.2). Note that the configuration files we provide may no longer work, as the target websites may have changed.

The request-response cache is stored at tests/[target].data. This process should typically take no more than a few seconds.
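The cache lets the fuzzing stage run against a simulated server instead of the live website. The sketch below illustrates that idea only: the SimulatedServer class, the (method, URL) cache key and the in-memory dictionary are hypothetical and do not reflect the actual format of tests/[target].data or the code in cache.py. It also assumes that, during fuzzing, every request is answered from the cache except the target API, which returns the mutated response currently under test.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical cache key: (HTTP method, URL).
Key = Tuple[str, str]


@dataclass
class SimulatedServer:
    """Hypothetical replay server that answers requests from a request-response cache."""

    cache: Dict[Key, str]           # captured responses, keyed by request
    target: Key                     # the API under test
    mutant: Optional[str] = None    # mutated response body for the target API, if any

    def respond(self, method: str, url: str) -> str:
        key = (method.upper(), url)
        # Assumption: the target API returns the current mutant instead of the baseline.
        if key == self.target and self.mutant is not None:
            return self.mutant
        if key not in self.cache:
            # A request that was never captured cannot be replayed.
            raise KeyError(f"request not in cache: {method} {url}")
        return self.cache[key]
```

Replaying from a cache keeps each test case fast and repeatable, and avoids sending thousands of requests to the real website.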

Test execution

  • Run python3 fuzzer.py f [target] to execute test cases (e.g. python3 fuzzer.py f wikipedia). The screenshot below shows test execution in progress.

[Screenshot: test execution in progress]

The duration of this stage depends mostly on the number of mutants, ranging from a couple of minutes to a few days. Outputs are stored in the database (the edefuzz-mysql Docker container). For reference, the wikipedia example took around ten minutes in our setup.
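Why the number of mutants drives the run time can be seen from a simple mutation strategy. The sketch below assumes one mutant per data field, each produced by deleting that single field from the baseline JSON response; the function names are hypothetical, and this is not necessarily how mutate.py generates mutants. Under this assumption the number of test cases grows linearly with the number of data fields in the API response.

```python
import copy
import json
from typing import Any, Iterator, List, Tuple, Union

# A path addresses one field: dictionary keys and list indices from the root.
Path = Tuple[Union[str, int], ...]


def leaf_paths(node: Any, prefix: Path = ()) -> Iterator[Path]:
    """Yield the path to every leaf value (data field) in a parsed JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaf_paths(value, prefix + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from leaf_paths(value, prefix + (index,))
    else:
        yield prefix


def delete_path(doc: Any, path: Path) -> Any:
    """Return a deep copy of doc with the field at path removed."""
    mutant = copy.deepcopy(doc)
    parent = mutant
    for step in path[:-1]:
        parent = parent[step]
    del parent[path[-1]]
    return mutant


def field_deletion_mutants(response_text: str) -> List[Tuple[Path, str]]:
    """Hypothetical strategy: one mutant per leaf field, with that field deleted."""
    doc = json.loads(response_text)
    # An empty path means the response is a bare scalar; there is nothing to delete.
    return [(path, json.dumps(delete_path(doc, path))) for path in leaf_paths(doc) if path]
```

Each mutant is built from a fresh copy of the baseline, so deleting a list element never shifts the indices used by other mutants.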

Analysing results

  • Run python3 fuzzer.py r [target] to generate the results. The full process is discussed in section 4.4.
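The three stages (c, f, r) are run in sequence for each target from the edefuzz folder inside the container. When testing several targets, a small wrapper can chain them. The script below is a hypothetical helper (for example saved as run_all.py, a name we introduce here); it only calls the fuzzer.py commands documented above and stops at the first stage that exits with an error.

```python
import subprocess
import sys
from typing import List

# Stages as documented: c = build request-response cache, f = fuzz, r = report.
STAGES = ("c", "f", "r")


def stage_commands(target: str, python: str = "python3") -> List[List[str]]:
    """Build the fuzzer.py command line for every stage of one target."""
    return [[python, "fuzzer.py", stage, target] for stage in STAGES]


def run_pipeline(target: str, python: str = "python3") -> None:
    """Run all stages for one target; check=True raises if a stage fails."""
    for cmd in stage_commands(target, python):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Usage (hypothetical): python3 run_all.py wikipedia ikea
    for name in sys.argv[1:]:
        run_pipeline(name)
```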

Flagged excessive data fields are reported in tests/[target].csv. Each line in tests/[target].csv indicates an identified excessive data field. A more user-friendly output is generated in tests/[target]_flagged.html (an example is shown below), containing the original JSON object, with flagged excessive fields highlighted in red.

[Screenshot: flagged results]
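The highlighted HTML report can be approximated with a few lines of standard-library Python. In this sketch the functions highlight and to_page, and the representation of flagged fields as tuples of keys and list indices, are assumptions; the real report_html.py may render the JSON differently. Leaf fields whose path is flagged are wrapped in a red span.

```python
import html
import json
from typing import Any, List, Set, Tuple

# A path addresses one field: dictionary keys and list indices from the root.
Path = Tuple[Any, ...]


def highlight(node: Any, flagged: Set[Path], path: Path = (), depth: int = 0) -> List[str]:
    """Render a JSON object or array as indented HTML lines, flagged leaves in red."""
    lines: List[str] = []
    items = node.items() if isinstance(node, dict) else enumerate(node)
    pad = "&nbsp;" * (4 * depth)
    for key, value in items:
        child = path + (key,)
        label = html.escape(str(key))
        if isinstance(value, (dict, list)):
            # Containers get their own line, then their children one level deeper.
            lines.append(f"{pad}{label}:")
            lines.extend(highlight(value, flagged, child, depth + 1))
        else:
            text = f"{label}: {html.escape(json.dumps(value))}"
            if child in flagged:
                text = f'<span style="color:red">{text}</span>'
            lines.append(pad + text)
    return lines


def to_page(doc: Any, flagged: Set[Path]) -> str:
    """Wrap the rendered lines in a minimal HTML document."""
    body = "<br>\n".join(highlight(doc, flagged))
    return f"<html><body><code>\n{body}\n</code></body></html>"
```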

A statistical summary is generated in report_stat.csv. Each line in report_stat.csv is in the form [target name], [test duration (min)], [number of data fields in API], [number of executed test cases], [number of excessive fields flagged], [extra flags indicating why EDEFuzz failed (explained in section 5.2 RQ2)]. This statistical summary is useful after testing multiple targets.
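When several targets have been tested, report_stat.csv can be loaded for further analysis. This sketch follows the column order documented above and additionally assumes that the file has no header row, that values may be followed by a space after each comma, and that any columns after the fifth hold the failure flags; StatRow and parse_report_stat are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class StatRow:
    target: str
    duration_min: float
    api_fields: int
    test_cases: int
    flagged: int
    flags: List[str]  # assumption: trailing columns carry the failure flags

    @property
    def flagged_ratio(self) -> float:
        """Share of the API's data fields that were flagged as excessive."""
        return self.flagged / self.api_fields if self.api_fields else 0.0


def parse_report_stat(text: str) -> List[StatRow]:
    """Parse report_stat.csv content (assumed to have no header row)."""
    rows: List[StatRow] = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not rec:
            continue  # skip blank lines
        rows.append(StatRow(
            target=rec[0],
            duration_min=float(rec[1]),
            api_fields=int(rec[2]),
            test_cases=int(rec[3]),
            flagged=int(rec[4]),
            flags=[f for f in rec[5:] if f],  # drop empty trailing fields
        ))
    return rows
```

To process the real file, pass the result of open("report_stat.csv").read() to parse_report_stat.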

Dataset Used in Our Paper

The Australian Dataset

| Appears in paper | Entity name | [target] name used in EDEFuzz |
|---|---|---|
| Company-A | Australia Post | auspost |
| Company-B | Canvas | canvas |
| Company-C | Chemist Warehouse | chemistwarehouse |
| Company-D | Ikea Australia | ikea |
| Company-E | JB-HIFI | jbhifi |
| Company-F | JetStar | jetstar |
| Company-G | Myer | myer |
| Company-H | Volkswagen Australia | volkswagen |

  • Some businesses in our Australian dataset have since updated their websites (e.g. new API endpoints, or the tested page is no longer accessible). The config files for those targets were revised in January 2024.
  • Company-A's website is now protected by a CAPTCHA (discussed in section 5.2 RQ2). While we still provide a config file for Company-A, it may fail the preparation phase.
  • The config file for Company-B is not provided, as it contains the author's login credentials.

The Alexa Top 200 Dataset

The Alexa Top 200 dataset, minus the 131 websites we excluded for various reasons (discussed in section 5.2 RQ2), leaving the 69 targets below.

360 3dmgame adobe ali213 alibaba aliexpress
alipay aliyun amazon apple archive baidu
bankofamerica bilibili binance bing cctv chess
cnn coinmarketcap csdn deepl deviantart douban
douyu dropbox etsy freepik hupu ilive
imdb imgur iqiyi ixigua lenovo linkedin
microsoft msn nytimes paypal pinterest primevideo
qq salesforce shutterstock sina sogou sohu
spotify stackexchange steamcommunity telegram theguardian tiktok
tradingview tumblr twitter weather wetransfer wikipedia
wordpress yahoo youdao youku youtube zhihu
zhihuishu zoho zoom

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
