
CATE Benchmark

A testing platform to assess the performance of CATE estimators across popular datasets.

Installation

The easiest way to replicate the running environment is through Anaconda. Once Anaconda is installed, follow the steps below.

  1. Download the repo.
  2. Enter the directory (i.e. cd cate-benchmark).
  3. Run the following command to recreate the 'cate-bench' conda environment:

conda env create -f environment.yml
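Then activate the new environment before running any experiments (the name 'cate-bench' comes from environment.yml, as noted above):

conda activate cate-bench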

  4. Download the datasets from here. Once downloaded, extract them to the 'datasets' directory.

Usage

The 'experiments' folder contains a few example scripts that run the code, ranging from basic to more advanced. To learn more about available script parameters, see the contents of 'main.py'.

By default, any files created as part of running the scripts are saved under 'results'.

Example - 'basic'

Go to 'experiments' and run the basic script:

bash basic.sh

This script tests the Lasso model against one iteration of the IHDP dataset. You should see the relevant metrics and the performance obtained by the estimator printed in the console. In the same script, it is easy to change the number of iterations, the dataset, or the estimator.

Example - 'advanced'

Go to 'experiments' and run the advanced script:

bash advanced.sh

This script covers more estimators and 10 iterations of IHDP. Once it's done, you should see a summary of the results printed in the console; the same summary is saved to 'results/combined.csv'.

Example - 'extensive'

This script tests almost all estimators against all four datasets. Depending on the computational power of your machine, it may take days or even weeks to complete. To run it, go to 'experiments' and run:

bash extensive.sh

Analysing results

When running the scripts, a separate directory is created for each selected estimator to store its results. The following result files are usually created:

  • info.log (intermediate results logged as the script executes)
  • scores.csv (final scores per relevant metric)
  • times.csv (training and prediction time, in seconds, consumed by the estimator)

In addition, it is possible to get a summary of multiple estimators in a single table. This can be done via the 'results/process.py' script, which in turn produces the 'combined.csv' file. For example usage, see the existing running scripts.
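For intuition, here is a minimal sketch of what such a combining step might look like. It is not the actual 'results/process.py'; the directory layout and the added 'estimator' column are assumptions based on the files described above.

import glob
import os

import pandas as pd

# Illustrative sketch only -- the real logic lives in 'results/process.py'.
# Assumes each estimator directory under 'results/' contains a scores.csv.
frames = []
for path in glob.glob(os.path.join('results', '*', 'scores.csv')):
    df = pd.read_csv(path)
    # Tag each row with the estimator name taken from its directory.
    df['estimator'] = os.path.basename(os.path.dirname(path))
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
combined.to_csv(os.path.join('results', 'combined.csv'), index=False)
print(combined.head())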

Adding other estimators

The code can be easily extended to support additional estimators; a sketch of the pattern follows the steps below.

  1. Go to 'main.py'.
  2. Edit 'get_parser()': add a new key to 'estimation_model'.
  3. Edit '_get_model()': using the new key, return an instance of your model.
  4. Edit 'estimate()': train your model on the data and provide predictions.
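The following self-contained sketch illustrates the pattern with a hypothetical Ridge estimator. The actual signatures of 'get_parser()', '_get_model()' and 'estimate()' in 'main.py' may differ, so treat this as the shape of the change rather than code to paste in.

import argparse

import numpy as np
from sklearn.linear_model import Ridge

# Step 2 -- get_parser(): register the new key among the allowed values.
def get_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--estimation_model', type=str,
                        choices=['lasso', 'ridge'])  # 'ridge' is the new key
    return parser

# Step 3 -- _get_model(): map the new key to an instance of the model.
def _get_model(name):
    if name == 'ridge':
        return Ridge(alpha=1.0)
    raise ValueError(f'Unknown estimator: {name}')

# Step 4 -- estimate(): train the model and return its predictions.
def estimate(model, X_train, y_train, X_test):
    model.fit(X_train, y_train)
    return model.predict(X_test)

if __name__ == '__main__':
    # Tiny smoke test on synthetic data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
    model = _get_model('ridge')
    print(estimate(model, X[:80], y[:80], X[80:])[:5])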

Other projects

Projects using the CATE benchmark:
