shaikhmubin02 / leetcode-hard-gym

This project forked from tmwilliamlin168/leetcode-hard-gym


A hard gym for programming

Shell 0.40% Python 99.60%


Leetcode-Hard Gym

An RL environment interface to LeetCode's submission server for evaluating code-generation agents, built on top of OpenAI's Gym.

Written by: Beck Labash

Supported languages:

  • c
  • c#
  • java
  • python
  • javascript
  • ruby
  • swift
  • go
  • scala
  • kotlin
  • rust
  • php
  • typescript
  • racket
  • erlang
  • elixir
  • dart
  • mysql

Leaderboard for Leetcode Hard (Python): Pass@1

  • OpenAI's GPT-4: 10.7 (source)
  • OpenAI's Codex: 3.6 (source)
  • OpenAI's GPT-3.5: 0.0 (source)
  • Reflexion + GPT-4: 15.0 (source)
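Pass@1 is the fraction of problems whose first generated solution is accepted. For reference, a common way to compute pass@k in general is the unbiased estimator introduced with OpenAI's Codex paper; the sketch below implements that general estimator and is not taken from this repository. With one sample per problem (n = 1, k = 1) it reduces to the fraction of problems solved.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: sampling budget.
    """
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    # 1 - P(all k drawn samples fail)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems; `results` holds one (n, c) pair per problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)
```

For example, `mean_pass_at_k([(1, 1), (1, 0), (1, 0), (1, 1)])` returns `0.5`: two of four problems are solved with a single sample each.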

Setup:

  1. Clone the repository:
git clone https://github.com/GammaTauAI/leetcode-hard-gym.git && cd leetcode-hard-gym
  2. Create a virtual environment and install the leetcode_env module and its dependencies:
python -m venv venv
source venv/bin/activate
python -m pip install -e .
  3. Set the environment variable LEETCODE_SESSION to the value of the LEETCODE_SESSION cookie from a signed-in LeetCode session. The cookie can be found with your browser's DevTools or with a cookie-editor extension such as EditThisCookie.
export LEETCODE_SESSION=...
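Because the environment authenticates with this cookie, a script may want to fail fast when it is missing. The helper below is a hypothetical sketch, not part of leetcode_env; it only checks that the variable is non-empty, not that the session is still valid.

```python
import os


def require_leetcode_session() -> str:
    """Hypothetical helper: return the LEETCODE_SESSION value or fail early."""
    session = os.environ.get("LEETCODE_SESSION", "").strip()
    if not session:
        raise RuntimeError(
            "LEETCODE_SESSION is not set; export the cookie from a signed-in "
            "LeetCode session before creating the environment."
        )
    return session
```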

Example usage:

First we write some code:

code = """
class Solution:
    def twoSum(self, nums, target):
        l = len(nums)
        for i in range(l - 1):
            for j in range(i + 1, l):
                if nums[i] + nums[j] == target:
                    return [i, j]
"""

Then we can build a submission ...

from leetcode_env.types import LeetCodeSubmission, ProgrammingLanguage
sub = LeetCodeSubmission(code=code,
                         lang=ProgrammingLanguage.PYTHON3,
                         question_slug='two-sum',
                         timeout=5)

... and instantiate a submission environment ...

from leetcode_env.environment import LeetcodeEnv
env = LeetcodeEnv()

Finally, we can step through the environment with the submission:

status, reward, done, submission_result = env.step(sub)
print(status, reward, done, submission_result)
# Wrong Answer
# False
# False
# {'status_code': 11, 'lang': 'python3', 'run_success': True, 'status_runtime': 'N/A', 'memory': 14160000, 'question_id': '4', 'elapsed_time': 105, 'compare_result': '00010000000...00000000001000', 'code_output': '1.00000', 'std_output': '', 'last_testcase': '[1,3]\n[2]', 'expected_output': '2.00000', 'task_finish_time': 1680132323596, 'total_correct': 6, 'total_testcases': 2094, 'runtime_percentile': None, 'status_memory': 'N/A', 'memory_percentile': None, 'pretty_lang': 'Python3', 'submission_id': '924506780', 'input_formatted': '[1,3], [2]', 'input': '[1,3]\n[2]', 'status_msg': 'Wrong Answer', 'state': 'SUCCESS'}

Note: the compare_result value was shortened here. It is a string of 0/1 flags, one per test case, indicating whether each test passed.
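Assuming each character of compare_result marks one test case ('1' passed, '0' failed) and that `env.step` returns the 4-tuple shown above, the sketch below steps an environment over several submissions and summarizes each result. Both function names are hypothetical, and `env` is duck-typed so the logic can be exercised with a stub; the `total_correct` / `total_testcases` fields in the result are an alternative source for the same ratio.

```python
from typing import Any, Dict, Iterable, List, Tuple


def passed_fraction(submission_result: Dict[str, Any]) -> float:
    """Fraction of test cases passed, read from the 0/1 flags in compare_result."""
    flags = submission_result.get("compare_result") or ""
    if not flags:
        return 0.0  # no compare_result in this result dict
    return flags.count("1") / len(flags)


def run_submissions(env: Any, submissions: Iterable[Any]) -> List[Tuple[str, bool, float]]:
    """Step `env` once per submission and collect (status, reward, passed fraction)."""
    rows = []
    for sub in submissions:
        status, reward, _done, result = env.step(sub)
        rows.append((status, bool(reward), passed_fraction(result)))
    return rows
```

With the `env` and `sub` built above, `run_submissions(env, [sub])` returns one row per submission.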

LeetcodeHardGym Dataset

A script is provided to build an uncontaminated set of free LeetCode Hard problems in a format similar to HumanEval. It fetches the problem set, filters out class-dependent, void, and class-implementation problems, and formats the remaining problems for the specified programming languages. Optionally, it can use GPT to extract test cases from the examples in problem descriptions, or remove those examples from the generated docstrings.

Usage

To build the dataset, leetcode_env must be installed in the current environment. Then, we can run the following command from the leetcode_dataset/ directory of this repository:

python build.py --langs python3 rust --log_level INFO --output_dir ./build

Arguments

  • --langs: List of languages. Current options are: rust, python3.
  • --log_level: Logging level. Options: DEBUG, INFO, WARNING, ERROR, CRITICAL. Default is INFO.
  • --output_dir: Directory to save the built dataset. Default is ./build.
  • --extract_test_cases: If set, test cases will be extracted from problem descriptions using GPT.
  • --remove_examples: If set, examples will be removed from the generated docstrings. Cannot be used with --extract_test_cases.
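To make the flag constraints concrete, here is a hypothetical argparse parser that mirrors the documented options. It is not build.py's actual source; in particular, whether --langs is required or has a default is not documented, so none is assumed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented build.py flags."""
    parser = argparse.ArgumentParser(description="Build the LeetcodeHardGym dataset")
    # No default for --langs is documented, so none is assumed here.
    parser.add_argument("--langs", nargs="+", choices=["rust", "python3"])
    parser.add_argument("--log_level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"])
    parser.add_argument("--output_dir", default="./build")
    # The two example-handling flags are documented as incompatible.
    examples = parser.add_mutually_exclusive_group()
    examples.add_argument("--extract_test_cases", action="store_true")
    examples.add_argument("--remove_examples", action="store_true")
    return parser
```

Passing both --extract_test_cases and --remove_examples makes argparse print an error and exit, matching the documented restriction.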

Environment Variables

  • LEETCODE_SESSION: This environment variable must be set for the script to run. Please refer to the Setup section for instructions on how to obtain your session cookie.
  • OPENAI_API_KEY: This environment variable is required if the --extract_test_cases option is used. Please refer to the OpenAI API documentation for instructions on how to obtain your API key.

Dependencies

If the --extract_test_cases option is used, the openai and langchain libraries (plus termcolor) are required. These can be installed with:

 pip3 install openai langchain termcolor

Output

The script will output a .jsonl file for each specified language in the output directory. The filename will be in the format leetcode-hard-uncontaminated-{lang}.jsonl.
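Each output file is JSON Lines, so it can be loaded with the standard library. The loader below is a hypothetical sketch that only relies on the documented filename pattern; it makes no assumption about the field names inside each record.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_problems(output_dir: str, lang: str) -> List[Dict[str, Any]]:
    """Read leetcode-hard-uncontaminated-{lang}.jsonl: one JSON object per line."""
    path = Path(output_dir) / f"leetcode-hard-uncontaminated-{lang}.jsonl"
    with path.open(encoding="utf-8") as f:
        # Skip blank lines, e.g. a trailing newline at the end of the file.
        return [json.loads(line) for line in f if line.strip()]
```

For example, `load_problems("./build", "python3")` reads the Python 3 file produced by the command above.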

Cite

This benchmark was introduced in the following paper:

@misc{shinn2023reflexion,
      title={Reflexion: Language Agents with Verbal Reinforcement Learning}, 
      author={Noah Shinn and Federico Cassano and Beck Labash and Ashwin Gopinath and Karthik Narasimhan and Shunyu Yao},
      year={2023},
      eprint={2303.11366},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

Contributors

becklabs, noahshinn024, noahshinn
