
math's Introduction

Measuring Mathematical Problem Solving With the MATH Dataset

This is the repository for Measuring Mathematical Problem Solving With the MATH Dataset by Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt.

This repository contains dataset loaders and evaluation code.

Download the MATH dataset here.

Download the AMPS pretraining dataset here.

Citation

If you find this useful in your research, please consider citing

@article{hendrycksmath2021,
  title={Measuring Mathematical Problem Solving With the MATH Dataset},
  author={Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt},
  journal={NeurIPS},
  year={2021}
}

math's People

Contributors

collin-burns, erictang000, hacobe, hendrycks, ssss1029


math's Issues

ValueError: not enough values to unpack (expected 2, got 1)

Hi, sorry, I'm new to this field.

!python tune_gpt.py --khan-dataroot /content/amps/khan/ --save-dir /content/drive/MyDrive/model/

When I run the above command on Google Colab, I get this error:

Traceback (most recent call last):
  File "tune_gpt.py", line 333, in <module>
    main()
  File "tune_gpt.py", line 318, in main
    train_data = get_dataset(args)
  File "tune_gpt.py", line 239, in get_dataset
    len_multiplier, dirname = args.khan_dataroot.split("@")
ValueError: not enough values to unpack (expected 2, got 1)

How to fix this?
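For what it's worth, the traceback suggests get_dataset unconditionally splits --khan-dataroot on "@", so the flag apparently supports an optional MULTIPLIER@PATH form and a plain path then fails. A tolerant parse might look like this (hypothetical helper, not the repo's code; the multiplier type is an assumption):

```python
def parse_khan_dataroot(arg: str):
    """Split an argument of the (assumed) form 'MULTIPLIER@PATH'.

    Falls back to a multiplier of 1 when no '@' is present, so a plain
    path such as /content/amps/khan/ also works.
    """
    if "@" in arg:
        len_multiplier, dirname = arg.split("@")
        return float(len_multiplier), dirname
    return 1.0, arg

print(parse_khan_dataroot("/content/amps/khan/"))
print(parse_khan_dataroot("5.0@/content/amps/khan/"))
```

Alternatively, prefixing the path on the command line (e.g. --khan-dataroot 1.0@/content/amps/khan/) may sidestep the error without any code changes.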

File lists for mathematica

For pre-training on AMPS, where are the files "no_steps_flist_relative.txt" and "with_steps_flist_relative.txt"?

Are they the concatenation of all *.txt files in the folder "data_file_lists"?

There is a stray swap file "make_flists.py.swp" in the mathematica root folder, but the script itself is missing. If possible, could you please share it?

For example, is this what is expected:

# Concatenate the per-topic file lists into one combined list.
filenames = [
    'no_steps_flist_relative_algebra.txt',
    'no_steps_flist_relative_calculus.txt',
    'no_steps_flist_relative_counting_and_statistics.txt',
    'no_steps_flist_relative_geometry.txt',
    'no_steps_flist_relative_linear_algebra.txt',
    'no_steps_flist_relative_number_theory.txt',
]
with open('./no_steps_flist_relative.txt', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

Some latex typos in Math dataset

I tried rendering the dataset to a PDF file, and most of the LaTeX sources convert cleanly, but there are still some LaTeX typos in:

  1. test/intermediate_algebra/44.json

A URL appears to be mistakenly included in the solution, like

    "solution": "... the diagram,\n\nhttp://aops-classroom.s3.amazonaws.com/Algebra3/Algebra3_Putnam1958_Morning1.png\n\n ..."

Is it possible to correct this into a parsable form?

  2. test/prealgebra/1117.json

    "solution": "A $60 coat with a 20$\\%$ discount

  3. test/number_theory/407.json

    something like:
    6 3 _ _ _ _

  4. ./test/prealgebra/1645.json

    times $24.50 per square yard, or $\boxed{735}$ dollars.

No module named 'dataset.deepmind'

Hi, similar to the previous issue:

Traceback (most recent call last):
  File "eval_math_gpt.py", line 37, in <module>
    from dataset.deepmind import DeepMindMathDataset
ModuleNotFoundError: No module named 'dataset.deepmind'

After commenting out that import, the next error is:

Traceback (most recent call last):
  File "eval_math_gpt.py", line 360, in <module>
    parser.add_argument('--arch', default='gpt2', choices=transformers.GPT2_PRETRAINED_MODEL_ARCHIVE_LIST)
AttributeError: module 'transformers' has no attribute 'GPT2_PRETRAINED_MODEL_ARCHIVE_LIST'
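The second error is a library-version mismatch: recent transformers releases dropped the GPT2_PRETRAINED_MODEL_ARCHIVE_LIST constant. One workaround (a sketch; the checkpoint list is an assumption about what the constant used to contain) is to hard-code the choices:

```python
import argparse

# Stand-in for transformers.GPT2_PRETRAINED_MODEL_ARCHIVE_LIST,
# which newer transformers versions no longer export.
GPT2_ARCHS = ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl", "distilgpt2"]

parser = argparse.ArgumentParser()
parser.add_argument("--arch", default="gpt2", choices=GPT2_ARCHS)

args = parser.parse_args(["--arch", "gpt2-medium"])
print(args.arch)
```

Pinning an older transformers release that still exports the constant would be the other option.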

math_equivalence.is_equiv behaves incorrectly when the model output is a floating-point number and the correct answer is an integer

As the title describes, math_equivalence.is_equiv fails to work properly when the model output is a floating-point number and the correct answer in the dataset is an integer.

Here is a test case that should pass:

def test_decimals_integer(self):
    test_in = "1.0"
    test_out = "1"
    self.assertTrue(is_equiv(test_in, test_out))

I'd like to discuss this issue; if it's confirmed that this behavior should be supported, I can submit a pull request.
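For discussion purposes, one possible normalization step would map numerically equal strings to a canonical form before comparison (a hypothetical helper, not the repo's code):

```python
def normalize_number(s: str) -> str:
    """Map numerically equal strings like '1.0' and '1' to one canonical form."""
    try:
        f = float(s)
    except ValueError:
        return s  # not a number; leave untouched
    if f == int(f):
        return str(int(f))  # '1.0' -> '1'
    return s

def is_equiv_numeric(a: str, b: str) -> bool:
    """Equality check after numeric normalization of both sides."""
    return normalize_number(a) == normalize_number(b)

print(is_equiv_numeric("1.0", "1"))
```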

Some answers are incorrect

Q1

If $\det \mathbf{A} = -1,$ then find $\det (\mathbf{7A}).$

GT

In general, $\det (k \mathbf{A}) = k^2 \det \mathbf{A}.$ Thus,
\[\det (7 \mathbf{A}) = 7^2 (-1) = \boxed{-49}.\]

Problem & Correction

The GT answer is only true if $\mathbf{A}$ is $2 \times 2$, but that's not stated or implied in the question.

The general solution is $-7^n$, where $n$ is the dimension of the matrix.
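The correction is easy to sanity-check numerically, since $\det(k\mathbf{A}) = k^n \det \mathbf{A}$ for an $n \times n$ matrix. A quick pure-Python check (illustrative only):

```python
def det(m):
    """Determinant via Laplace expansion along the first row (fine for small n)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(n))

A = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]        # a 3x3 matrix with det(A) = -1
sevenA = [[7 * x for x in row] for row in A]

assert det(A) == -1
assert det(sevenA) == -(7 ** 3)               # -7^n with n = 3, not -49
```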

Q2

In the diagram below, $AB = AC = 115,$ $AD = 38,$ and $CF = 77.$ Compute $\frac{[CEF]}{[DBE]}.$

[asy]
unitsize(0.025 cm);

pair A, B, C, D, E, F;

B = (0,0);
C = (80,0);
A = intersectionpoint(arc(B,115,0,180),arc(C,115,0,180));
D = interp(A,B,38/115);
F = interp(A,C,(115 + 77)/115);
E = extension(B,C,D,F);

draw(C--B--A--F--D);

label("$A$", A, N);
label("$B$", B, SW);
label("$C$", C, NE);
label("$D$", D, W);
label("$E$", E, SW);
label("$F$", F, SE);
[/asy]

GT

\begin{align*} \frac{[CEF]}{[DBE]} &= \frac{\frac{1}{2} \cdot EF \cdot CE \cdot \sin \angle CEF}{\frac{1}{2} \cdot DE \cdot BE \cdot \sin \angle BED} \\ &= \frac{EF}{DE} \cdot \frac{CE}{BE} \cdot \frac{\sin \angle CEF}{\sin \angle BED} \\ &= \boxed{\frac{19}{96}}. \end{align*}

(In case it doesn't render, the final answer says 19/96)

Problem & Correction

$\frac{[CEF]}{[DBE]} = \frac{EF}{DE} \cdot \frac{CE}{BE} = 1 \cdot \frac{96}{19} = \frac{96}{19}$

In other words, the GT answer is inverted.

Q3

The real number $x$ satisfies
\[3x + \frac{1}{2x} = 3.\]

Find
\[64x^6 + \frac{1}{729x^6}.\]

GT

Multiplying both sides of $3x + \frac{1}{2x} = 3$ by $\frac{2}{3},$ we get
\[2x + \frac{1}{3x} = 2.\]

Squaring both sides, we get
\[4x^2 + \frac{4}{3} + \frac{1}{9x^2} = 4,\]

so
\[4x^2 + \frac{1}{9x^2} = \frac{8}{3}.\]

Cubing both sides, we get

\[64x^3 + 3 \cdot \frac{(4x^2)^2}{9x^2} + 3 \cdot \frac{4x^2}{(9x^2)^2} + \frac{1}{729x^6} = \frac{512}{27}.\]

Then
\begin{align*}
64x^3 + \frac{1}{729x^6} &= \frac{512}{27} - \frac{3 \cdot 4x^2}{9x^2} \left( 4x^2 + \frac{1}{9x^2} \right) \\
&= \frac{512}{27} - \frac{3 \cdot 4}{9} \cdot \frac{8}{3} \\
&= \boxed{\frac{416}{27}}.
\end{align*}

Problem & Correction

The first term in the cubed expression should be $64x^6$, not $64x^3$.
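Despite the $64x^3$ typo in the intermediate lines, the boxed value itself checks out numerically (a quick sanity check under the stated constraint):

```python
import math

# One root of 3x + 1/(2x) = 3, i.e. 6x^2 - 6x + 1 = 0:
x = (3 + math.sqrt(3)) / 6
assert abs(3 * x + 1 / (2 * x) - 3) < 1e-9      # the constraint holds

value = 64 * x**6 + 1 / (729 * x**6)
assert abs(value - 416 / 27) < 1e-9             # matches the boxed 416/27
```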

Q4

Let $a,$ $b,$ $c$ be distinct complex numbers such that
\[\frac{a}{1 - b} = \frac{b}{1 - c} = \frac{c}{1 - a} = k.\]

Find the sum of all possible values of $k.$

GT

From the given equation,
\begin{align*}
a &= k(1 - b), \\
b &= k(1 - c), \\
c &= k(1 - a).
\end{align*}Then
\begin{align*}
a &= k(1 - b) \\
&= k(1 - k(1 - c)) \\
&= k(1 - k(1 - k(1 - a))).
\end{align*}Expanding, we get $ak^3 + a - k^3 + k^2 - k = 0,$ which factors as
\[(k^2 - k + 1)(ak + a - k) = 0.\]If $ak + a - k = 0,$ then $a = \frac{k}{k + 1},$ in which case $b = c = \frac{k}{k + 1}.$ This is not allowed, as $a,$ $b,$ and $c$ are distinct, so $k^2 - k + 1 = 0.$ The sum of the roots is $\boxed{1}.$

Note: The roots of $k^2 - k + 1 = 0$ are
\[\frac{1 \pm i \sqrt{3}}{2}.\]For either value of $k,$ we can take $a = 0,$ $b = 1,$ and $c = k.$

Problem & Correction

$a = 0,$ $b = 1,$ and $c = k$ is not permitted, since it would make $\frac{a}{1 - b}$ undefined.

typo in `_strip_string`?

Description

There appears to be a typo in the _strip_string function of the math_equivalence.py file, where an unnecessary escape character is used with the percent sign. The code uses string.replace("\%", "") when it seems to me that string.replace("%", "") was intended. I'm not 100% certain this behavior is unintentional, but currently is_equiv("50%", "50") == False. The current code also throws a warning on Python 3.11.

string = string.replace("\%", "")

Suggestion

string = string.replace("\%", "")
->
string = string.replace("%", "")
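To make the behavior concrete: in Python source, "\%" is not a recognized escape sequence, so it denotes a literal backslash followed by a percent sign (which is also why newer Pythons warn about it). A short demonstration, written with an explicit "\\%" to avoid the warning:

```python
# "\%" in source code is just backslash + percent (two characters):
s = "\\%"
assert len(s) == 2

# Hence replace("\%", "") only strips a backslash-percent pair and
# leaves the bare "%" in model outputs untouched:
assert "50%".replace("\\%", "") == "50%"   # unchanged, so is_equiv("50%", "50") fails
assert "50%".replace("%", "") == "50"      # the likely intended behavior
```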

train data has an example without a { } for box

data point

data/MATH/train/algebra/24014.json

contains this string:

{
    "problem": "What is the largest value of $x$, if $\\frac{x}{5} + \\frac{1}{5x} = \\frac{1}{2}$?",
    "level": "Level 3",
    "type": "Algebra",
    "solution": "We multiply both sides of the equation by $10x$ to clear the fractions, leaving us with $2x^2 + 2 = 5x$. Rearranging the terms, we have $2x^2 - 5x + 2 = 0$. We can now solve for $x$ by factoring: $(2x - 1)(x - 2) = 0$. We could also use the quadratic formula:  $$x = \\frac{5 \\pm \\sqrt{(-5)^2 - 4(2)(2)}}{4}.$$Either way, we find that $x = 1/2$ or $x = 2$. Since we want the largest value of $x$, our answer is $\\boxed 2$."
}

but it should be:

{
    "problem": "What is the largest value of $x$, if $\\frac{x}{5} + \\frac{1}{5x} = \\frac{1}{2}$?",
    "level": "Level 3",
    "type": "Algebra",
    "solution": "We multiply both sides of the equation by $10x$ to clear the fractions, leaving us with $2x^2 + 2 = 5x$. Rearranging the terms, we have $2x^2 - 5x + 2 = 0$. We can now solve for $x$ by factoring: $(2x - 1)(x - 2) = 0$. We could also use the quadratic formula:  $$x = \\frac{5 \\pm \\sqrt{(-5)^2 - 4(2)(2)}}{4}.$$Either way, we find that $x = 1/2$ or $x = 2$. Since we want the largest value of $x$, our answer is $\\boxed{2}$."
}

How does a custom merges file affect the tokenizer?

What is the default value used for the tokenizer-merges-file?

Do you use the default merges_gpt2.txt, or the custom file with digit merges removed, merges_gpt2_single_digit_numbers.txt?

My understanding is that the merges.txt file is built during training of the BBPE (byte-level BPE) tokenizer on the corpus: it gains a new entry (line) at each iteration as the tokenizer finds the most frequent byte pairs.

How did you verify this design decision? I understand the need for the "clean" merges file, but wouldn't using a new merges file with pre-trained weights be an error, since tokens are now missing compared to what GPT-2 was trained with?

Or should one re-train the tokenizer itself on the current dataset's vocabulary?
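As a toy illustration of why the merges file matters (a simplified sketch of BPE, not the actual GPT-2 tokenizer): each line of a merges file names a pair to fuse, so deleting digit merges forces numbers to stay split into single-digit tokens.

```python
def bpe(word, merges):
    """Toy byte-pair encoding: repeatedly apply the highest-priority merge.

    `merges` is an ordered list of pairs, mimicking the lines of a
    merges.txt file (earlier lines = higher priority).
    """
    rank = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)
    while True:
        pairs = [(rank.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(tokens, tokens[1:]))]
        best_rank, i = min(pairs, default=(float("inf"), -1))
        if best_rank == float("inf"):
            return tokens  # no applicable merge left
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]

# With a digit merge present, "25" fuses into one token:
assert bpe("25", [("2", "5")]) == ["25"]
# With digit merges removed (as in merges_gpt2_single_digit_numbers.txt),
# numbers stay split into single digits:
assert bpe("25", []) == ["2", "5"]
```

The compatibility worry is real, though: the token IDs a pre-trained GPT-2 expects are tied to its original merges, so this only shows the mechanics, not whether swapping files under pre-trained weights is safe.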

evaluation accuracy remains 0

Thanks for your great work in creating this dataset. I have some questions from evaluating llama2-7b-chat on it.

  • The accuracy of the llama2-7b-chat output remains 0 as training goes on. Here is my code:

def make_acc(tokenizer):
    def acc(eval_preds: EvalPrediction):
        logits, labels = eval_preds
        preds = tokenizer.batch_decode(logits, skip_special_tokens=True)
        labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
        save_results(preds, labels)  # save results to a json file
        preds = [last_boxed_only_string(s) for s in preds]
        correct = 0
        total = 0
        for pred, label in zip(preds, labels):
            if is_equiv(pred, label):
                correct += 1
            total += 1
        return {"accuracy": correct / total}
    return acc

  • Is the preprocessing step required or recommended here?
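One thing to check: in the snippet above, last_boxed_only_string is applied to preds but not to labels, so if the labels decode to full solution strings the comparison may never match. A minimal boxed-answer extractor for experimenting (a hypothetical sketch; the repo's last_boxed_only_string may differ):

```python
def extract_boxed(s: str):
    """Return the contents of the last \\boxed{...} in s, or None.

    A minimal brace-matching sketch, not the repo's implementation.
    """
    start = s.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(s) and depth:
        c = s[i]
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                break  # matched the closing brace of \boxed{...}
        out.append(c)
        i += 1
    return "".join(out) if depth == 0 else None

print(extract_boxed("so the answer is $\\boxed{\\frac{19}{96}}$."))
```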

Which version of transformers lib is being used ?

Hi, thanks for the updates.
I'm still unable to run the evaluation script.


Traceback (most recent call last):
  File "eval_math_gpt.py", line 373, in <module>
    run_eval(args)
  File "eval_math_gpt.py", line 174, in run_eval
    output_ids = model.generate(
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/torch/nn/modules/module.py", line 947, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'GPT2LMHeadModel' object has no attribute 'generate'

Meaning of 3-digit folder names under Khan dataset

Hi,

First of all, thank you so much for making the dataset publicly available!

For the Khan dataset as part of AMPS, I wanted to ask about the meanings/topics of the 3-digit folder names, which vary between 124 and 547. Is there any way to find the mapping for these?

Thanks in advance!
