GithubHelp home page GithubHelp logo

coboleval's Introduction

COBOLEval: LLM Evaluation for COBOL

COBOLEval is a dataset to evaluate the code generation abilities of Large Language Models on the COBOL programming language. It is a transpilation of the widely-used HumanEval benchmark from Python into COBOL. This repo contains both the Python to COBOL transpiler, and an evaluation harness for the dataset.

Installation

COBOLEval uses GnuCOBOL to compile the generated COBOL solutions. Download version 3.2.0 here and follow the installation instructions: https://sourceforge.net/projects/gnucobol/files/.

Check that the installation was successful with:

>>> cobc -v
cobc (GnuCOBOL) 3.2.0

Using Python3.10 or later:

python -m venv coboleval
source coboleval/bin/activate
pip install -r requirements.txt

To run the Python to COBOL transpiler, you'll need to install Rust.

Usage

This program runs untrusted model-generated code. Users are strongly encouraged not to do so outside of a robust security sandbox. Following HumanEval, the execution call in evaluation.py is deliberately commented out to ensure users read this disclaimer before running code in a potentially unsafe manner.

Generate completions

Configure the model and the number of samples-per-problem in scripts/generate.py then run.

if __name__ == "__main__":
    model = Model(name="gpt-4", samples_per_task=1)
    runner = OpenAIChat(model)
    runner.eval()

This will create a samples.jsonl file in preds/gpt-4 which contains the generated COBOL solutions.

Calculate Pass@k

Configure the model and the number of samples in the entrypoint() function in scripts/evaluate_functional_correctness.py:

def entrypoint():
    all_results = []
    run_folders = ["gpt-4"]  # edit
    for folder in run_folders:
        all_results.append(eval(f"preds/{folder}", "1"))

    for res, folder in zip(all_results, run_folders):
        print(f"{folder}: {res}")

Outputs are written to preds/gpt-4/samples_results.jsonl and Pass@k is printed:

gpt-4: {'pass@1': 0.10273972602739725}

coboleval's People

Contributors

rmuller-ml avatar ggordonhall avatar

Stargazers

Kaito Sugimoto avatar ishan avatar Naosuke Ito avatar  avatar Arromal Jamuna Jayan avatar Jialiang Chen avatar  avatar  avatar  avatar Richard Metzler avatar  avatar Tushar Jain avatar Jakub Wieczorek avatar Sayan Das avatar  avatar Dao Trung Hieu avatar Nikos Katirtzis avatar  avatar  avatar  avatar MacGnG avatar Nicolás Pérez avatar Alex Kwiatkowski avatar Yakup Cemil KAYABAŞ avatar Rob Mills avatar fumbl3b avatar

Watchers

 avatar

coboleval's Issues

Support for FIM based prompting

As mentioned in this blogpost, there is a FIM based way to prompt the models followed by rearranging the prompt + output so that the Working Storage Section and Linkage Section are in correct order. Do you plan to add support for this style of prompting in this repo?

I am assuming that the Pass@1 and %Compile scores mentioned in the blog are without using FIM based prompting?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.