

grok's People

Contributors

aletheap, eltociear, yburda


grok's Issues

[Bug] Function definition missing in data.py

The scripts/make_data.py file invokes create_data_files(args.data_directory), where the default data directory is data.

The function invokes

ArithmeticTokenizer.create_token_file(data_dir)
ArithmeticDataset.create_dataset_files(data_dir)

neither of which is defined in the corresponding class. This breaks the code when running multiple experiments.

Questions regarding modular division

Thanks for the very interesting paper! I have two questions regarding modular division:

  1. Does figure 1 of the paper correspond to the equation x◦y = x/y (mod p) for 0 ≤ x < p, 0 < y < p, with p = 97?
  2. Wouldn't x/y (mod p) produce fractional results? How would the cross-entropy loss (against these fractional targets) be computed then?

I tried staring at the code but couldn't really connect the dots :(
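For what it's worth, division mod a prime is usually defined via the modular inverse rather than fractions, so the targets stay integers in [0, p). A minimal sketch of the standard construction (not necessarily the exact code path in this repo):

```python
def mod_div(x, y, p):
    # x / y (mod p) means x * y^(-1) (mod p); for prime p, Fermat's little
    # theorem gives the inverse as y^(p-2) mod p, so the result is always
    # an integer in [0, p) and cross-entropy over p classes is well defined.
    return (x * pow(y, p - 2, p)) % p

print(mod_div(1, 2, 97))  # → 49, since 2 * 49 = 98 ≡ 1 (mod 97)
```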

Grok is actually llama!

I've always wondered how little work went into Grok.

It's just a finetune of the LLaMA 7B model.

What do you say?

Setup package versions

Hello,

Can you include which versions of the libraries used in this project are compatible with it? Someone has an open pull request fixing issues with pytorch-lightning, but even with that fix there are still issues on my version of torch. It would be helpful to know what was used to reproduce the results.

Attribute error when running train.py

As the title says, I'm running into an AttributeError: can't set attribute when running train.py. Is this because I'm using Python 3.8, or is something off with the passing of hparams?

Non-working code from OpenAI

Why is this bug happening?

```
(myenv) shyamaluser@Shyamals-iMac grok % ./scripts/train.py
Namespace(random_seed=-1, gpu=0, max_epochs=None, max_steps=100000, batchsize=0, n_layers=2, n_heads=4, d_model=128, dropout=0.0, weight_noise=0.0, non_linearity='relu', max_context_len=50, math_operator='+', operand_length=None, train_data_pct=5, warmup_steps=10, anneal_lr_steps=100000, anneal_lr=False, max_lr=0.001, weight_decay=0, weight_decay_kind='to_zero', noise_factor=0, save_activations=False, save_outputs=False, logdir='/Users/shyamaluser/grok', datadir='/Users/shyamaluser/grok/data')
Traceback (most recent call last):
  File "/Users/shyamaluser/grok/./scripts/train.py", line 14, in <module>
    print(grok.training.train(hparams))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shyamaluser/grok/grok/training.py", line 703, in train
    model = TrainableTransformer(hparams).float()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shyamaluser/grok/grok/training.py", line 50, in __init__
    self.hparams = hparams  # type: ignore
    ^^^^^^^^^^^^
  File "/Users/shyamaluser/grok/myenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in __setattr__
    super().__setattr__(name, value)
AttributeError: property 'hparams' of 'TrainableTransformer' object has no setter
```
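This looks like the known pytorch-lightning API change: in newer versions, `LightningModule.hparams` is a read-only property, so `self.hparams = hparams` raises, and the recommended replacement is calling `self.save_hyperparameters(hparams)` in `__init__`. A pure-Python reproduction of the mechanism (the `Base` class here mimics the library and is illustrative, not its actual internals):

```python
class Base:
    # Mimics newer pytorch-lightning, where `hparams` is a property
    # with no setter, so plain assignment raises AttributeError.
    @property
    def hparams(self):
        return getattr(self, "_hparams", {})

    def save_hyperparameters(self, hparams):
        self._hparams = hparams

class TrainableTransformer(Base):
    def __init__(self, hparams):
        # self.hparams = hparams  # AttributeError on recent versions
        self.save_hyperparameters(hparams)  # the recommended replacement

model = TrainableTransformer({"n_layers": 2, "d_model": 128})
print(model.hparams["d_model"])  # → 128
```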

what? 😂

This is the troll that got me to join Github 🤣


The label is in the input

Thank you for sharing your implementation.

If I understood correctly, the toy example in this paper is to train a network (Transformer) to solve equation:

$$a \circ b = c$$

given $a$, $b$ as inputs, predicting the correct $c$.

To translate this for the Transformer, we tokenize everything and add the end-of-sentence <|EOS|> token in the following fashion (as suggested by this code):

<|EOS|> <a> <OP> <b> <=> <c> <|EOS|>

where <a> <b> and <c> are integers.

By design, we may use <|EOS|> <a> <OP> <b> <=> <?> <|EOS|> as the input, where <?> is a placeholder token for the solution to the equation. The output of the Transformer would then be the predicted equation <|EOS|> <a> <OP> <b> <=> <c_> <|EOS|>, where <c_> indicates the predicted token for c, and the target would be the correct, full equation <|EOS|> <a> <OP> <b> <=> <c> <|EOS|>. We would then calculate the loss based on the second-to-last tokens, <c> and <c_>.

However, in this implementation, the input is the first 6 tokens, i.e. <|EOS|> <a> <OP> <b> <=> <c>, while the target is the last 6 tokens, i.e. <a> <OP> <b> <=> <c> <|EOS|>. The attached figure shows the first batch of x (input) and y (target), obtained while debugging at the following position:

https://github.com/openai/grok/blob/43efed280af24a8837b05fd9c97a3d14f295666f/grok/training.py#L292C1-L293C63

[figure: first batch of x (input) and y (target)]

In the figure above, 0 indicates the <|EOS|> token, 1 indicates the '<=>' token, and 6 indicates the '**2+' operation (a conditional operation depending on whether <a> is odd or even).

The problem is that the solution is already present in the input x. Therefore, I think the model is being trained on the wrong task.
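For context, this input/target layout matches the standard next-token (causal language modeling) shift. Under that reading, with a causal attention mask, the position whose target is <c> attends only to tokens up to <=>, so <c> appearing in x is not necessarily leaked; whether that actually holds here depends on the attention mask in this repo's Transformer. A minimal sketch of the shift:

```python
tokens = ["<|EOS|>", "<a>", "<OP>", "<b>", "<=>", "<c>", "<|EOS|>"]

# Standard causal-LM shift: the input is the first 6 tokens and the
# target is the last 6, so each position t is trained to predict
# token t+1. With a causal mask, the position whose target is <c>
# sees only the tokens up to <=>, even though <c> appears later in x.
x = tokens[:-1]
y = tokens[1:]

for inp, tgt in zip(x, y):
    print(f"{inp} -> {tgt}")
```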
