
math's Introduction

Measuring Mathematical Problem Solving With the MATH Dataset

This is the repository for Measuring Mathematical Problem Solving With the MATH Dataset by Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt.

This repository contains dataset loaders and evaluation code.

Download the MATH dataset here.

Download the AMPS pretraining dataset here.

Citation

If you find this useful in your research, please consider citing

@article{hendrycksmath2021,
  title={Measuring Mathematical Problem Solving With the MATH Dataset},
  author={Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt},
  journal={NeurIPS},
  year={2021}
}

math's People

Contributors

collin-burns, erictang000, hacobe, hendrycks, ssss1029


math's Issues

ValueError: not enough values to unpack (expected 2, got 1)

Hi, sorry, I'm new to this field.

!python tune_gpt.py --khan-dataroot /content/amps/khan/ --save-dir /content/drive/MyDrive/model/

When I run the above command on Google Colab, I get this error:

Traceback (most recent call last):
  File "tune_gpt.py", line 333, in <module>
    main()
  File "tune_gpt.py", line 318, in main
    train_data = get_dataset(args)
  File "tune_gpt.py", line 239, in get_dataset
    len_multiplier, dirname = args.khan_dataroot.split("@")
ValueError: not enough values to unpack (expected 2, got 1)

How to fix this?
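For what it's worth, the traceback suggests get_dataset unconditionally splits --khan-dataroot on "@", so the flag apparently supports an optional MULTIPLIER@PATH form and a plain path then fails. A tolerant parse might look like this (hypothetical helper, not the repo's code; the multiplier type is an assumption):

```python
def parse_khan_dataroot(arg: str):
    """Split an argument of the (assumed) form 'MULTIPLIER@PATH'.

    Falls back to a multiplier of 1 when no '@' is present, so a plain
    path such as /content/amps/khan/ also works.
    """
    if "@" in arg:
        len_multiplier, dirname = arg.split("@")
        return float(len_multiplier), dirname
    return 1.0, arg

print(parse_khan_dataroot("/content/amps/khan/"))
print(parse_khan_dataroot("5.0@/content/amps/khan/"))
```

Alternatively, prefixing the path on the command line (e.g. --khan-dataroot 1.0@/content/amps/khan/) may sidestep the error without any code changes.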

File lists for mathematica

For pre-training on AMPS, where are the files "no_steps_flist_relative.txt" and "with_steps_flist_relative.txt"?

Are they the concatenation of all *.txt files in the folder "data_file_lists"?

There is a stray swap file "make_flists.py.swp" in the mathematica root folder, but the script itself is missing. If possible, could you please share it?

For example, is this what is expected:

# Concatenate the per-topic file lists into one combined list.
filenames = [
    'no_steps_flist_relative_algebra.txt',
    'no_steps_flist_relative_calculus.txt',
    'no_steps_flist_relative_counting_and_statistics.txt',
    'no_steps_flist_relative_geometry.txt',
    'no_steps_flist_relative_linear_algebra.txt',
    'no_steps_flist_relative_number_theory.txt',
]
with open('./no_steps_flist_relative.txt', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

Some latex typos in Math dataset

I tried rendering the dataset to a PDF file, and most of the LaTeX sources convert cleanly, but there are still some LaTeX typos in:

  1. test/intermediate_algebra/44.json

A URL appears to be mistakenly included in the solution, like

    "solution": "... the diagram,\n\nhttp://aops-classroom.s3.amazonaws.com/Algebra3/Algebra3_Putnam1958_Morning1.png\n\n ..."

Is it possible to correct this into a parsable form?

  2. test/prealgebra/1117.json

    "solution": "A $60 coat with a 20$\\%$ discount

  3. test/number_theory/407.json

    something like:
    6 3 _ _ _ _

  4. ./test/prealgebra/1645.json

    times $24.50 per square yard, or $\boxed{735}$ dollars.

No module named 'dataset.deepmind'

Hi, similar to the previous issue:

Traceback (most recent call last):
  File "eval_math_gpt.py", line 37, in <module>
    from dataset.deepmind import DeepMindMathDataset
ModuleNotFoundError: No module named 'dataset.deepmind'

After commenting out that import, the next error is:

Traceback (most recent call last):
  File "eval_math_gpt.py", line 360, in <module>
    parser.add_argument('--arch', default='gpt2', choices=transformers.GPT2_PRETRAINED_MODEL_ARCHIVE_LIST)
AttributeError: module 'transformers' has no attribute 'GPT2_PRETRAINED_MODEL_ARCHIVE_LIST'
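The second error is a library-version mismatch: recent transformers releases dropped the GPT2_PRETRAINED_MODEL_ARCHIVE_LIST constant. One workaround (a sketch; the checkpoint list is an assumption about what the constant used to contain) is to hard-code the choices:

```python
import argparse

# Stand-in for transformers.GPT2_PRETRAINED_MODEL_ARCHIVE_LIST,
# which newer transformers versions no longer export.
GPT2_ARCHS = ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl", "distilgpt2"]

parser = argparse.ArgumentParser()
parser.add_argument("--arch", default="gpt2", choices=GPT2_ARCHS)

args = parser.parse_args(["--arch", "gpt2-medium"])
print(args.arch)
```

Pinning an older transformers release that still exports the constant would be the other option.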

math_equivalence.is_equiv behaves incorrectly when the model output is a floating-point number and the correct answer is an integer

As the title describes, math_equivalence.is_equiv fails to work properly when the model output is a floating-point number and the correct answer in the dataset is an integer.

Here is a test case that should pass:

def test_decimals_integer(self):
    test_in = "1.0"
    test_out = "1"
    self.assertTrue(is_equiv(test_in, test_out))

I'd like to discuss this issue; if it's confirmed that this behavior should be supported, I can submit a pull request.
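For discussion purposes, one possible normalization step would map numerically equal strings to a canonical form before comparison (a hypothetical helper, not the repo's code):

```python
def normalize_number(s: str) -> str:
    """Map numerically equal strings like '1.0' and '1' to one canonical form."""
    try:
        f = float(s)
    except ValueError:
        return s  # not a number; leave untouched
    if f == int(f):
        return str(int(f))  # '1.0' -> '1'
    return s

def is_equiv_numeric(a: str, b: str) -> bool:
    """Equality check after numeric normalization of both sides."""
    return normalize_number(a) == normalize_number(b)

print(is_equiv_numeric("1.0", "1"))
```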

Some answers are incorrect

Q1

If $\det \mathbf{A} = -1,$ then find $\det (\mathbf{7A}).$

GT

In general, $\det (k \mathbf{A}) = k^2 \det \mathbf{A}.$ Thus,
\[\det (7 \mathbf{A}) = 7^2 (-1) = \boxed{-49}.\]

Problem & Correction

The GT answer is only true if $\mathbf{A}$ is $2 \times 2$, but that's not stated or implied in the question.

The general solution is $-7^n$, where $n$ is the dimension of the matrix.
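The correction is easy to sanity-check numerically, since $\det(k\mathbf{A}) = k^n \det \mathbf{A}$ for an $n \times n$ matrix. A quick pure-Python check (illustrative only):

```python
def det(m):
    """Determinant via Laplace expansion along the first row (fine for small n)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(n))

A = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]        # a 3x3 matrix with det(A) = -1
sevenA = [[7 * x for x in row] for row in A]

assert det(A) == -1
assert det(sevenA) == -(7 ** 3)               # -7^n with n = 3, not -49
```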

Q2

In the diagram below, $AB = AC = 115,$ $AD = 38,$ and $CF = 77.$ Compute $\frac{[CEF]}{[DBE]}.$

[asy]
unitsize(0.025 cm);

pair A, B, C, D, E, F;

B = (0,0);
C = (80,0);
A = intersectionpoint(arc(B,115,0,180),arc(C,115,0,180));
D = interp(A,B,38/115);
F = interp(A,C,(115 + 77)/115);
E = extension(B,C,D,F);

draw(C--B--A--F--D);

label("$A$", A, N);
label("$B$", B, SW);
label("$C$", C, NE);
label("$D$", D, W);
label("$E$", E, SW);
label("$F$", F, SE);
[/asy]

GT

\begin{align*} \frac{[CEF]}{[DBE]} &= \frac{\frac{1}{2} \cdot EF \cdot CE \cdot \sin \angle CEF}{\frac{1}{2} \cdot DE \cdot BE \cdot \sin \angle BED} \\ &= \frac{EF}{DE} \cdot \frac{CE}{BE} \cdot \frac{\sin \angle CEF}{\sin \angle BED} \\ &= \boxed{\frac{19}{96}}. \end{align*}

(In case it doesn't render, the final answer says 19/96)

Problem & Correction

$\frac{[CEF]}{[DBE]} = \frac{EF}{DE} \cdot \frac{CE}{BE} = 1 \cdot \frac{96}{19} = \frac{96}{19}$

In other words, the GT answer is inverted.

Q3

The real number $x$ satisfies
\[3x + \frac{1}{2x} = 3.\]

Find
\[64x^6 + \frac{1}{729x^6}.\]

GT

Multiplying both sides of $3x + \frac{1}{2x} = 3$ by $\frac{2}{3},$ we get
\[2x + \frac{1}{3x} = 2.\]

Squaring both sides, we get
\[4x^2 + \frac{4}{3} + \frac{1}{9x^2} = 4,\]

so
\[4x^2 + \frac{1}{9x^2} = \frac{8}{3}.\]

Cubing both sides, we get

\[64x^3 + 3 \cdot \frac{(4x^2)^2}{9x^2} + 3 \cdot \frac{4x^2}{(9x^2)^2} + \frac{1}{729x^6} = \frac{512}{27}.\]

Then
\begin{align*}
64x^3 + \frac{1}{729x^6} &= \frac{512}{27} - \frac{3 \cdot 4x^2}{9x^2} \left( 4x^2 + \frac{1}{9x^2} \right) \\
&= \frac{512}{27} - \frac{3 \cdot 4}{9} \cdot \frac{8}{3} \\
&= \boxed{\frac{416}{27}}.
\end{align*}

Problem & Correction

The first term in the cubed expression should be $64x^6$, not $64x^3$.
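Despite the $64x^3$ typo in the intermediate lines, the boxed value itself checks out numerically (a quick sanity check under the stated constraint):

```python
import math

# One root of 3x + 1/(2x) = 3, i.e. 6x^2 - 6x + 1 = 0:
x = (3 + math.sqrt(3)) / 6
assert abs(3 * x + 1 / (2 * x) - 3) < 1e-9      # the constraint holds

value = 64 * x**6 + 1 / (729 * x**6)
assert abs(value - 416 / 27) < 1e-9             # matches the boxed 416/27
```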

Q4

Let $a,$ $b,$ $c$ be distinct complex numbers such that
\[\frac{a}{1 - b} = \frac{b}{1 - c} = \frac{c}{1 - a} = k.\]

Find the sum of all possible values of $k.$

GT

From the given equation,
\begin{align*}
a &= k(1 - b), \\
b &= k(1 - c), \\
c &= k(1 - a).
\end{align*}Then
\begin{align*}
a &= k(1 - b) \\
&= k(1 - k(1 - c)) \\
&= k(1 - k(1 - k(1 - a))).
\end{align*}Expanding, we get $ak^3 + a - k^3 + k^2 - k = 0,$ which factors as
\[(k^2 - k + 1)(ak + a - k) = 0.\]If $ak + a - k = 0,$ then $a = \frac{k}{k + 1},$ in which case $b = c = \frac{k}{k + 1}.$ This is not allowed, as $a,$ $b,$ and $c$ are distinct, so $k^2 - k + 1 = 0.$ The sum of the roots is $\boxed{1}.$

Note: The roots of $k^2 - k + 1 = 0$ are
\[\frac{1 \pm i \sqrt{3}}{2}.\]For either value of $k,$ we can take $a = 0,$ $b = 1,$ and $c = k.$

Problem & Correction

$a = 0,$ $b = 1,$ and $c = k$ is not permitted, since it would make $\frac{a}{1 - b}$ undefined.

typo in `_strip_string`?

Description

There appears to be a typo in the _strip_string function of the math_equivalence.py file, where an unnecessary escape character is used with the percent sign. The code uses string.replace("\%", "") when it seems to me that string.replace("%", "") was intended. I'm not 100% certain this behavior is unintentional, but currently is_equiv("50%", "50") == False. The current code also throws a warning on Python 3.11.

string = string.replace("\%", "")

Suggestion

string = string.replace("\%", "")
->
string = string.replace("%", "")
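To make the behavior concrete: in Python source, "\%" is not a recognized escape sequence, so it denotes a literal backslash followed by a percent sign (which is also why newer Pythons warn about it). A short demonstration, written with an explicit "\\%" to avoid the warning:

```python
# "\%" in source code is just backslash + percent (two characters):
s = "\\%"
assert len(s) == 2

# Hence replace("\%", "") only strips a backslash-percent pair and
# leaves the bare "%" in model outputs untouched:
assert "50%".replace("\\%", "") == "50%"   # unchanged, so is_equiv("50%", "50") fails
assert "50%".replace("%", "") == "50"      # the likely intended behavior
```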

train data has an example without a { } for box

data point

data/MATH/train/algebra/24014.json

contains this string:

{
    "problem": "What is the largest value of $x$, if $\\frac{x}{5} + \\frac{1}{5x} = \\frac{1}{2}$?",
    "level": "Level 3",
    "type": "Algebra",
    "solution": "We multiply both sides of the equation by $10x$ to clear the fractions, leaving us with $2x^2 + 2 = 5x$. Rearranging the terms, we have $2x^2 - 5x + 2 = 0$. We can now solve for $x$ by factoring: $(2x - 1)(x - 2) = 0$. We could also use the quadratic formula:  $$x = \\frac{5 \\pm \\sqrt{(-5)^2 - 4(2)(2)}}{4}.$$Either way, we find that $x = 1/2$ or $x = 2$. Since we want the largest value of $x$, our answer is $\\boxed 2$."
}

but it should be:

{
    "problem": "What is the largest value of $x$, if $\\frac{x}{5} + \\frac{1}{5x} = \\frac{1}{2}$?",
    "level": "Level 3",
    "type": "Algebra",
    "solution": "We multiply both sides of the equation by $10x$ to clear the fractions, leaving us with $2x^2 + 2 = 5x$. Rearranging the terms, we have $2x^2 - 5x + 2 = 0$. We can now solve for $x$ by factoring: $(2x - 1)(x - 2) = 0$. We could also use the quadratic formula:  $$x = \\frac{5 \\pm \\sqrt{(-5)^2 - 4(2)(2)}}{4}.$$Either way, we find that $x = 1/2$ or $x = 2$. Since we want the largest value of $x$, our answer is $\\boxed{2}$."
}

How does a custom merges file affect the tokenizer?

What is the default value used for the tokenizer-merges-file?

Do you use the default merges_gpt2.txt, or the custom file with digit merges removed, merges_gpt2_single_digit_numbers.txt?

My understanding is that the merges.txt file is built during training of the BBPE (byte-level BPE) tokenizer on the corpus: it gains a new entry (line) at each iteration as the tokenizer finds the most frequent byte pairs.

How did you verify this design decision? I understand the need for the "clean" merges file, but wouldn't using a new merges file with pre-trained weights be an error, since tokens are now missing compared to what GPT-2 was trained with?

Or should one re-train the tokenizer itself on the current dataset's vocabulary?
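As a toy illustration of why the merges file matters (a simplified sketch of BPE, not the actual GPT-2 tokenizer): each line of a merges file names a pair to fuse, so deleting digit merges forces numbers to stay split into single-digit tokens.

```python
def bpe(word, merges):
    """Toy byte-pair encoding: repeatedly apply the highest-priority merge.

    `merges` is an ordered list of pairs, mimicking the lines of a
    merges.txt file (earlier lines = higher priority).
    """
    rank = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)
    while True:
        pairs = [(rank.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(tokens, tokens[1:]))]
        best_rank, i = min(pairs, default=(float("inf"), -1))
        if best_rank == float("inf"):
            return tokens  # no applicable merge left
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]

# With a digit merge present, "25" fuses into one token:
assert bpe("25", [("2", "5")]) == ["25"]
# With digit merges removed (as in merges_gpt2_single_digit_numbers.txt),
# numbers stay split into single digits:
assert bpe("25", []) == ["2", "5"]
```

The compatibility worry is real, though: the token IDs a pre-trained GPT-2 expects are tied to its original merges, so this only shows the mechanics, not whether swapping files under pre-trained weights is safe.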

evaluation accuracy remains 0

Thanks for your great work in creating this dataset. I have some questions from evaluating llama2-7b-chat on it.

  • The accuracy of the llama2-7b-chat output remains 0 as training goes on. Here is my code:

def make_acc(tokenizer):
    def acc(eval_preds: EvalPrediction):
        logits, labels = eval_preds
        preds = tokenizer.batch_decode(logits, skip_special_tokens=True)
        labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
        save_results(preds, labels)  # save results to a json file
        preds = [last_boxed_only_string(s) for s in preds]
        correct = 0
        total = 0
        for pred, label in zip(preds, labels):
            if is_equiv(pred, label):
                correct += 1
            total += 1
        return {"accuracy": correct / total}
    return acc

  • Is the preprocessing step required or recommended here?
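One thing to check: in the snippet above, last_boxed_only_string is applied to preds but not to labels, so if the labels decode to full solution strings the comparison may never match. A minimal boxed-answer extractor for experimenting (a hypothetical sketch; the repo's last_boxed_only_string may differ):

```python
def extract_boxed(s: str):
    """Return the contents of the last \\boxed{...} in s, or None.

    A minimal brace-matching sketch, not the repo's implementation.
    """
    start = s.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(s) and depth:
        c = s[i]
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                break  # matched the closing brace of \boxed{...}
        out.append(c)
        i += 1
    return "".join(out) if depth == 0 else None

print(extract_boxed("so the answer is $\\boxed{\\frac{19}{96}}$."))
```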

Which version of transformers lib is being used ?

Hi, thanks for the updates.
I'm still unable to run the evaluation script.


Traceback (most recent call last):
  File "eval_math_gpt.py", line 373, in <module>
    run_eval(args)
  File "eval_math_gpt.py", line 174, in run_eval
    output_ids = model.generate(
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/torch/nn/modules/module.py", line 947, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'GPT2LMHeadModel' object has no attribute 'generate'

Meaning of 3-digit folder names under Khan dataset

Hi,

First of all, thank you so much for making the dataset publicly available!

For the Khan dataset as part of AMPS, I wanted to ask about the meanings/topics of the 3-digit folder names, which vary between 124 and 547. Is there any way to find the mapping for these?

Thanks in advance!
