
microsoft / private-benchmarking


A platform that enables users to perform private benchmarking of machine learning models. The platform facilitates the evaluation of models based on different trust levels between the model owners and the dataset owners.

License: MIT License

Dockerfile 0.85% Python 76.45% Shell 6.60% HTML 16.10%
benchmarking inference llms-benchmarking mpc private private-benchmarking secure ezpc large-language-models contamination

private-benchmarking's Introduction

Private Benchmarking of Machine Learning Models

Project Status

Warning: This is an academic proof-of-concept prototype and has not received careful code review. This implementation is NOT ready for production use.

points

  • SSL certificate security for the website (file: settings.py) (deployment tasks)
  • Implement trust levels 1, 2, 3, 4, and 5
  • Testing
  • Documentation
  • CI/CD workflows for GitHub Actions

Project Description

This project aims to create a platform that enables users to perform private benchmarking of machine learning models. The platform facilitates the evaluation of models based on different trust levels between the model owners and the dataset owners.

This repository provides the accompanying code for the paper https://arxiv.org/abs/2403.00393

TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs

Tanmay Rajore, Nishanth Chandran, Sunayana Sitaram, Divya Gupta, Rahul Sharma, Kashish Mittal, Manohar Swaminathan

Installation

For a complete build with EzPC LLM support:

  • Modify setup.sh to match your system's NVIDIA driver and CUDA configuration (the defaults are CUDA 11.8 and GPU architecture 90, i.e. Hopper).
        (In setup.sh)
        line 42: export CUDA_VERSION=11.8
        line 43: export GPU_ARCH=90
    
  • Run setup.sh and enter the server IP address when prompted.
    ./setup.sh
    Enter the Server IP address: <your_server_IP>
    

For the platform only:

pip install -r requirements.txt
cd eval_website/eval_website
python manage.py makemigrations
python manage.py migrate
python manage.py runserver 0.0.0.0:8000

Usage

After installation, open the site in a browser:

http://127.0.0.1:8000 (on Localhost) or http://<your_server_IP>:8000 (on Public IP)
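A quick way to confirm that the development server is answering is to request the landing page from a script. The sketch below is illustrative only: `site_is_up` is a hypothetical helper that is not part of the repository, and it assumes the site is served over plain HTTP on port 8000, as in the `runserver` command above.

```python
import urllib.error
import urllib.request


def site_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if any HTTP response (even an error status) comes back from url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar problems.
        return False
```

For example, `site_is_up("http://127.0.0.1:8000")` should return `True` once `python manage.py runserver 0.0.0.0:8000` is running on the same machine.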

  • Sample User Credentials

    • Model Owner
      • username: ModelOwner
      • password: helloFriend
    • Dataset Owner
      • username: DatasetOwner
      • password: helloFriend
  • Certain ports are pre-assigned as follows:

    • 8000: the main website
    • 8001: EzPC LLM secure communication with the Trusted Third Party (TTP) server
    • 7000: communication between the Trusted Execution Environment (TEE) and the website
    • 7001: the TTP server receives model files
    • 7002: the TTP server receives dataset files
    • 9000: the dataset owner receives key files for EzPC from the website
    • 9001: the model owner receives key files for EzPC from the website
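Because these ports are fixed, it can help to check that nothing else on the machine is already using them before starting the services. The following is a minimal sketch under stated assumptions: `PREASSIGNED_PORTS`, `port_is_free`, and `busy_ports` are hypothetical names introduced here for illustration and do not come from the repository. The port numbers themselves are copied from the list above.

```python
import socket

# Port assignments as listed in the README; the dictionary name is illustrative.
PREASSIGNED_PORTS = {
    8000: "main website",
    8001: "EzPC LLM secure communication with the TTP server",
    7000: "TEE to website communication",
    7001: "TTP server receives model files",
    7002: "TTP server receives dataset files",
    9000: "dataset owner receives EzPC key files",
    9001: "model owner receives EzPC key files",
}


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def busy_ports(ports=PREASSIGNED_PORTS) -> list[int]:
    """List the given ports that another process is already using."""
    return sorted(p for p in ports if not port_is_free(p))
```

Calling `busy_ports()` before launching the website and the TTP server returns the pre-assigned ports that are already taken; an empty list means all of them can be bound.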
  • Trusted Third Party (TTP) Server

    • The TTP server is a separate server that performs the secure computation on the model, so it must be running before an evaluation can take place. Start it with the following commands.
      cd utils/TTP_TEE_files
      python ttp_server.py
    • Assumptions:
      • The TTP server details are set in the platform's backend database.
      • The TTP server receives the model files on port 7001 and the dataset files on port 7002 from the respective parties.
      • The TTP server performs the secure computation and returns the results to the platform.
      • The TTP server also requires server.crt and server.key in the same directory as ttp_server.py. These files secure the communication between the TTP server and the platform. They must be signed by the CA that the platform generates on its first run, and can be created with the following commands.
      openssl req -newkey rsa:2048 -nodes -keyout "./server.key" -out server.csr -subj /CN=127.0.0.1
      
      # ca.crt and ca.key are generated by the eval_website root; replace <path> and <days> with your own values
      openssl x509 -req -in server.csr -CA <path>/ca.crt -CAkey <path>/ca.key -CAcreateserial -out ./server.crt -days <days>
    • The TTP/TEE server requires the environment variable ENCRYPTION_KEY to be set to a 32-byte (256-bit) key.
      export ENCRYPTION_KEY="32 bytes key"
      # generate a 32-byte key (printed as hex) using the following command
      python -c 'import os, binascii; print(binascii.hexlify(os.urandom(32)).decode("utf-8"))'
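A wrongly sized key is easy to miss until the server fails, so a small check at startup can be useful. The sketch below is an assumption-laden illustration, not the repository's code: `generate_key_hex` and `load_encryption_key` are hypothetical helpers, and the sketch assumes that ENCRYPTION_KEY holds the 64-character hex string printed by the one-liner above. The README does not say how the server interprets the value, so adapt the decoding step if the server expects a different encoding.

```python
import binascii
import os


def generate_key_hex() -> str:
    """Return 32 random bytes as a 64-character hex string (same as the one-liner)."""
    return binascii.hexlify(os.urandom(32)).decode("utf-8")


def load_encryption_key(env=os.environ) -> bytes:
    """Read ENCRYPTION_KEY from env and return it as 32 raw bytes.

    Assumption: the variable holds hex text as produced by generate_key_hex().
    """
    value = env.get("ENCRYPTION_KEY")
    if not value:
        raise RuntimeError("ENCRYPTION_KEY is not set")
    try:
        key = bytes.fromhex(value)
    except ValueError as exc:
        raise RuntimeError("ENCRYPTION_KEY is not valid hex") from exc
    if len(key) != 32:
        raise RuntimeError(f"ENCRYPTION_KEY must decode to 32 bytes, got {len(key)}")
    return key
```

Passing a plain dictionary, for example `load_encryption_key({"ENCRYPTION_KEY": generate_key_hex()})`, makes the check easy to try without touching the real environment.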
      
  • Trusted Execution Environment (TEE)

    • The TEE is a separate server that performs the secure computation on the model, based on the TTP scripts. It must be running before an evaluation can take place. Detailed setup instructions can be found in the TTP/TEE.
  • EzPC LLM

    • EzPC currently supports the following models:

      • bert-tiny
      • bert-base
      • bert-large
      • gpt2
      • gpt-neo
      • llama7b
      • llama13b
    • For more information on how to use EzPC LLM, refer to the EzPC LLM.
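When scripting evaluations, it can be convenient to reject an unsupported model name before any secure computation is set up. This is only a sketch: `SUPPORTED_EZPC_MODELS` and `is_supported_model` are hypothetical names introduced here, and the identifiers are copied from the list above, which may not match the exact strings that the EzPC tooling expects.

```python
# Model names copied from the README list; treat the exact spelling as an assumption.
SUPPORTED_EZPC_MODELS = (
    "bert-tiny",
    "bert-base",
    "bert-large",
    "gpt2",
    "gpt-neo",
    "llama7b",
    "llama13b",
)


def is_supported_model(name: str) -> bool:
    """Return True if name matches one of the listed models, ignoring case and padding."""
    return name.strip().lower() in SUPPORTED_EZPC_MODELS
```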

Artifacts Evaluation

The artifact evaluation used to generate the table in the paper can be found in the Artifacts Evaluation.

Contributing

If you would like to contribute to this project, please follow the guidelines outlined in the contributing.md file.

License

This project is licensed under the MIT License. Please see the LICENSE file for more information.

private-benchmarking's People

Contributors

dependabot[bot], rahsharmar, trajore


private-benchmarking's Issues

Action required: migrate or opt-out of migration to GitHub inside Microsoft

Migrate non-Open Source or non-External Collaboration repositories to GitHub inside Microsoft

In order to protect and secure Microsoft, private or internal repositories in GitHub for Open Source which are not related to open source projects or require collaboration with 3rd parties (customer, partners, etc.) must be migrated to GitHub inside Microsoft a.k.a GitHub Enterprise Cloud with Enterprise Managed User (GHEC EMU).

Action

✍️ Please RSVP to opt-in or opt-out of the migration to GitHub inside Microsoft.

❗Only users with admin permission in the repository are allowed to respond. Failure to provide a response will result in your repository being automatically archived.🔒

Instructions

Reply with a comment on this issue containing one of the following optin or optout command options below.

✅ Opt-in to migrate

@gimsvc optin --date <target_migration_date in mm-dd-yyyy format>

Example: @gimsvc optin --date 03-15-2023

OR

❌ Opt-out of migration

@gimsvc optout --reason <staging|collaboration|delete|other>

Example: @gimsvc optout --reason staging

Options:

  • staging : This repository will ship as Open Source or go public
  • collaboration : Used for external or 3rd party collaboration with customers, partners, suppliers, etc.
  • delete : This repository will be deleted because it is no longer needed.
  • other : Other reasons not specified

Need more help? 🖐️
