akjindal53244 / arithmo

68.0 3.0 5.0 492 KB

Small and Efficient Mathematical Reasoning LLMs

License: Apache License 2.0

Python 100.00%

gsm8k large-language-models llm mathematical-reasoning mistral-7b

arithmo's People

Contributors

Stargazers

Watchers

Forkers

techthiyanes pruthwik shaunstoltz anminhhung blaze2017

arithmo's Issues

Request to add MuggleMATH as a new baseline in the "Comparing Arithmo-Mistral-7B with other LLM models" section

Hi,
Could you please add MuggleMATH as a new baseline in the "Comparing Arithmo-Mistral-7B with other LLM models" section of the GitHub page? MuggleMATH mainly investigates the scaling law and generalization of data augmentation for LLM math reasoning and achieves comparable results on GSM8K. The paper is available at https://arxiv.org/abs/2310.05506.
Thanks a lot!

How did you construct your test set (11k)?

I appreciate your dataset for math reasoning. Could you provide more details on how you constructed the test data (the 11k split listed on Hugging Face)? https://huggingface.co/datasets/akjindal53244/Arithmo-Data

Training script?

It would be good to release the training process as well :)

Inquiry on Data Deduplication, Random Lower-Casing, and PoT Prompts Diversity

Hello,

I truly admire your work on fine-tuning LLMs for mathematical reasoning and I have a few questions about the data preprocessing. I would appreciate some insights into the following aspects:

1. Data Deduplication Impact

In the data preparation phase, you mentioned that deduplication was applied. Could you specify what percentage of the data was removed through this process? How does this affect the overall quality and diversity of the final dataset?

2. Effects of Random Lower-Casing

The preprocessing steps include randomly lower-casing a certain percentage of inputs. What was the rationale behind this choice? Does the case variation of letters impact the fine-tuning results of the model?

3. Diversity of PoT Prompts

The training process incorporates a diverse set of Python prompts for the PoT. Could you share some insights on how this diversity compares to using a single prompt style in terms of model performance? What led to the decision to use such a varied approach?

I am also looking forward to any papers or further documentation you might release on this project, as I believe they would be incredibly informative.

Thank you for your dedication to this project and for taking the time to address my inquiries. Your work is truly inspiring.

Best regards,
lyf-00
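For readers following question 1 and question 2, the two preprocessing steps being asked about can be sketched roughly as below. This is an illustration only, not the repository's actual pipeline: the `question`/`answer` field names, the whitespace-normalized exact-match rule, and the `fraction` parameter are all assumptions made for the example.

```python
import random

def deduplicate(examples):
    """Drop examples whose whitespace-normalized question was already seen.

    Assumption: exact-match deduplication on the question text; the real
    pipeline may use a different key or a fuzzy criterion.
    """
    seen = set()
    unique = []
    for ex in examples:
        key = " ".join(ex["question"].split())
        if key not in seen:
            seen.add(key)
            unique.append(ex)
    return unique

def randomly_lowercase(examples, fraction, seed=0):
    """Lower-case the question of roughly `fraction` of the examples."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    out = []
    for ex in examples:
        question = ex["question"]
        if rng.random() < fraction:
            question = question.lower()
        out.append({**ex, "question": question})
    return out

# Toy data, invented for the example.
data = [
    {"question": "What is 2 + 3?", "answer": "5"},
    {"question": "What is  2 + 3?", "answer": "5"},  # duplicate up to whitespace
    {"question": "Solve X + 1 = 4.", "answer": "3"},
]

unique = deduplicate(data)
removed_pct = 100 * (len(data) - len(unique)) / len(data)
print(f"removed {removed_pct:.1f}% of examples")

augmented = randomly_lowercase(unique, fraction=0.5)
```

Measuring `removed_pct` on the real dataset is one way to answer the "what percentage was removed" question directly.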

CUDA is out of memory

Hi,

When I try to run inference with your model on my data, I get a 'CUDA is out of memory' error.
When I try to quantize the model with bitsandbytes via your query_model.py, I get the following error while importing bitsandbytes:

File "/home/.conda/envs/designtodoc/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1355, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
[Errno 13] Permission denied: '/fs/applications/jupyterhub/gpu.jupyterhub.rng-dl01/srv/jupyterhub'
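As background for the out-of-memory report, a back-of-the-envelope estimate shows why precision matters for a model of this size. The sketch below assumes roughly 7 billion parameters (the exact count differs slightly) and counts the weights only; activations, the KV cache, and framework overhead come on top.

```python
# Assumption: about 7 billion parameters; the real count is somewhat different.
APPROX_PARAMS = 7_000_000_000

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16/bfloat16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}

def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>17}: ~{weight_memory_gb(APPROX_PARAMS, nbytes):.1f} GB")
```

If the GPU has less memory than the estimate for the chosen precision, loading will fail before any inference runs, which is why quantized loading is the usual workaround.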

unable to load the model

File "/home/prdsfwjehm/semanticgraph/few-shot-learning-mistral.py", line 7, in <module>
model = AutoModelForCausalLM.from_pretrained("akjindal53244/Arithmo-Mistral-7B", device_map={"": "cpu"})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/prdsfwjehm/.conda/envs/semanticgraph/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 527, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/prdsfwjehm/.conda/envs/semanticgraph/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1039, in from_pretrained
config_class = CONFIG_MAPPING[config_dict["model_type"]]
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/prdsfwjehm/.conda/envs/semanticgraph/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 734, in __getitem__
raise KeyError(key)
KeyError: 'mistral'
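A `KeyError: 'mistral'` raised from `CONFIG_MAPPING` usually means the installed transformers release predates the Mistral architecture, so the `model_type` in the checkpoint's config is unknown to it. A quick check is sketched below; the 4.34.0 threshold reflects my understanding of when Mistral support first shipped and should be treated as an assumption to confirm against the release notes. `parse_version` and `supports_mistral` are hypothetical helper names.

```python
from importlib import metadata

# Assumption: transformers 4.34.0 is the first release with Mistral support.
MIN_VERSION = (4, 34, 0)

def parse_version(version):
    """Turn '4.33.2' or '4.35.0.dev0' into a 3-tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0' or '0rc1'
        parts.append(int(piece))
    parts += [0] * (3 - len(parts))  # pad short versions like '4.34'
    return tuple(parts[:3])

def supports_mistral(version):
    """True if the given transformers version is at least MIN_VERSION."""
    return parse_version(version) >= MIN_VERSION

try:
    installed = metadata.version("transformers")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("transformers is not installed in this environment")
elif supports_mistral(installed):
    print(f"transformers {installed} should recognize model_type 'mistral'")
else:
    print(f"transformers {installed} is too old; try: pip install -U transformers")
```

If the check reports an old version, upgrading inside the same conda environment shown in the traceback is the first thing to try.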
