karpathy / makemore
An autoregressive character-level language model for making more things
License: MIT License
Hi!
Thanks for this little piece of juicy code!
Just out of curiosity, I've noticed that in your implementation you are using nn.LayerNorm with the standard denominator constant eps=1e-5, whereas other implementations (DINO [here] and ViT in timm [here]) explicitly set this parameter to eps=1e-6.
I know it is a small detail, but details are sometimes super-important for getting better models.
Do you think the model is sensitive to this kind of parameter change? Have you ever tried/noticed it?
Thanks!
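For reference, a minimal standalone sketch (mine, not a patch to makemore.py) of what the two settings look like in PyTorch and how little the outputs differ for typical activations:

```python
import torch
import torch.nn as nn

# nn.LayerNorm computes (x - mean) / sqrt(var + eps); eps only guards
# against division by zero, so it matters most when the variance is tiny.
x = torch.randn(4, 64)
ln_default = nn.LayerNorm(64)          # eps=1e-5, the PyTorch default
ln_small = nn.LayerNorm(64, eps=1e-6)  # the value used in DINO / timm ViT
print((ln_default(x) - ln_small(x)).abs().max())  # tiny for unit-scale inputs
```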
Hi @karpathy, thanks for that great repo!
Maybe it would be better to note in your code that while you're training by minimizing the CE loss, Bengio actually maximized the log-likelihood. I know the two are equivalent in this case (one-hot vectors as ground truth), but that's not the case in general, so it might be worth a note. Thanks!
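To make the equivalence concrete, a small standalone sketch (my own, not from the repo): with one-hot targets, cross-entropy reduces to the negative log-probability of the true class, so minimizing it is exactly maximizing the log-likelihood.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(5, 27)           # (batch, vocab) scores
targets = torch.randint(0, 27, (5,))  # true class indices (the one-hot case)

ce = F.cross_entropy(logits, targets)  # what the training loop minimizes
log_probs = F.log_softmax(logits, dim=-1)
nll = -log_probs[torch.arange(5), targets].mean()  # negative log-likelihood
assert torch.allclose(ce, nll)  # identical when targets are one-hot
```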
In addition to PyTorch, makemore.py requires a TensorBoard installation. Maybe add that to the README file?
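(If I'm reading it right, that's because makemore.py imports SummaryWriter from torch.utils.tensorboard, which needs the separate tensorboard package, i.e. pip install tensorboard.)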
Thank you karpathy for open-sourcing this great course series.
I think the Discussions board should be enabled.
I found that in the process of learning, I had many thoughts and questions rather than issues. I think these thoughts could be enlightening to others, but because they are not issues, I can't find a suitable place to post them.
To those in the know: are there any newer alternatives that work better and do the same thing as this? I fear I'm missing out on something more effective. I literally just want to make more stuff from a dataset.
Here you are padding the tensor with the special starting token. It looks strange to me that you are doing this inside the embedding step. Aren't you supposed to pass the special token through the embedding first and then add that as the padding?
```python
tok_emb = self.wte(idx)  # token embeddings of shape (b, t, n_embd)
idx = torch.roll(idx, 1, 1)
# something like this instead, i.e. embed the special token first?
idx[:, 0] = self.wte(self.vocab_size)  # special <BLANK> token
embs.append(tok_emb)
```
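For what it's worth, a standalone sketch (not from makemore.py; the sizes are made up) suggesting the two orderings are equivalent: padding idx with the <BLANK> id and then embedding gives exactly the same vectors as embedding first and overwriting position 0 with the <BLANK> embedding, because nn.Embedding is just a row lookup.

```python
import torch
import torch.nn as nn

vocab_size, n_embd = 27, 8
wte = nn.Embedding(vocab_size + 1, n_embd)  # extra row for <BLANK>
idx = torch.randint(0, vocab_size, (4, 6))  # (b, t) token ids

with torch.no_grad():
    # pad with the <BLANK> *id*, then embed everything (the repo's order)
    padded = torch.roll(idx, 1, 1)
    padded[:, 0] = vocab_size
    emb_a = wte(padded)

    # embed first, then overwrite slot 0 with the <BLANK> *embedding*
    emb_b = wte(torch.roll(idx, 1, 1))
    emb_b[:, 0] = wte(torch.tensor(vocab_size))

assert torch.allclose(emb_a, emb_b)  # same result either way
```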
If we had labels for these names, such as:
| name   | is_palindrome | h_index | scrabble_score |
|--------|---------------|---------|----------------|
| anna   | 1             | 4       | 4              |
| jake   | 0             | 1       | 15             |
| bob    | 1             | 7       | 7              |
| karen  | 0             | 8       | 8              |
| andrej | 0             | 11      | 14             |
| ...    |               |         |                |
Can makemore-style generative models be modified to perform classification, so I can feed in a new name like asdf and get a prediction for its h_index?
While a suggestion like "add this layer here" would absolutely be helpful, I'm secretly hoping someone will share a general, intuitive way to think about repurposing machine learning models for new tasks...
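In case a concrete anchor helps, here's a minimal sketch (hypothetical names, a GRU stand-in for the backbone, nothing from makemore itself) of the usual recipe: keep the character-level backbone, pool its hidden state, and swap the next-character head for a small prediction head.

```python
import torch
import torch.nn as nn

class NameRegressor(nn.Module):
    def __init__(self, vocab_size, n_embd=64):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, n_embd)           # char embeddings
        self.rnn = nn.GRU(n_embd, n_embd, batch_first=True)   # backbone stand-in
        self.head = nn.Linear(n_embd, 1)   # scalar output, e.g. h_index

    def forward(self, idx):               # idx: (b, t) character ids
        x = self.wte(idx)                 # (b, t, n_embd)
        _, h = self.rnn(x)                # h: (1, b, n_embd), last hidden state
        return self.head(h[-1])           # (b, 1) prediction per name
```

The general pattern is that only the head and the loss change (e.g. MSE for h_index, binary cross-entropy for is_palindrome); the body that learned the character statistics is reusable.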