karpathy / makemore
An autoregressive character-level language model for making more things
License: MIT License
Hi!
Thanks for this little piece of juicy code!
Just out of curiosity, I've noticed that in your implementation you are using nn.LayerNorm with the standard denominator constant eps=1e-5, whereas other implementations (DINO [here] and ViT in timm [here]) explicitly set this parameter to eps=1e-6.
I know it is a small detail, but details are sometimes super-important for getting better models.
Do you think the model is sensitive to this kind of parameter change? Have you ever tried/noticed it?
Thanks!
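For reference, a minimal standalone sketch (mine, not a patch to makemore.py) of what the two settings look like in PyTorch and how little the outputs differ for typical activations:

```python
import torch
import torch.nn as nn

# nn.LayerNorm computes (x - mean) / sqrt(var + eps); eps only guards
# against division by zero, so it matters most when the variance is tiny.
x = torch.randn(4, 64)
ln_default = nn.LayerNorm(64)          # eps=1e-5, the PyTorch default
ln_small = nn.LayerNorm(64, eps=1e-6)  # the value used in DINO / timm ViT
print((ln_default(x) - ln_small(x)).abs().max())  # tiny for unit-scale inputs
```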
Hi @karpathy, thanks for that great repo!
Maybe it would be better to note in your code that while you're training by minimizing the CE loss, Bengio actually maximized the log-likelihood. I know the two are equivalent in this case (one-hot vectors as ground truth), but that's not the case in general, so it might be worth a note. Thanks!
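To make the equivalence concrete, a small standalone sketch (my own, not from the repo): with one-hot targets, cross-entropy reduces to the negative log-probability of the true class, so minimizing it is exactly maximizing the log-likelihood.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(5, 27)           # (batch, vocab) scores
targets = torch.randint(0, 27, (5,))  # true class indices (the one-hot case)

ce = F.cross_entropy(logits, targets)  # what the training loop minimizes
log_probs = F.log_softmax(logits, dim=-1)
nll = -log_probs[torch.arange(5), targets].mean()  # negative log-likelihood
assert torch.allclose(ce, nll)  # identical when targets are one-hot
```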
In addition to PyTorch, makemore.py requires a TensorBoard installation. Maybe add that to the README file?
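(If I'm reading it right, that's because makemore.py imports SummaryWriter from torch.utils.tensorboard, which needs the separate tensorboard package, i.e. pip install tensorboard.)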
Thank you karpathy for open-sourcing this great course series.
I think the Discussions board should be enabled.
I found that in the process of learning, I had many thoughts and questions rather than issues. I think these thoughts could be enlightening to others, but because they are not issues, I can't find a suitable place to post them.
To those in the know: are there any newer alternatives that work better and do the same thing as this? I fear I'm missing out on something more effective. I literally just want to make more stuff from a dataset.
Here you are padding the tensor with the special starting token. It looks strange to me that you are doing this inside the embedding step. Aren't you supposed to pass the special token through the embedding first and then add that as the padding?
```python
tok_emb = self.wte(idx)  # token embeddings of shape (b, t, n_embd)
idx = torch.roll(idx, 1, 1)
# something like this instead, i.e. embed the special token first?
idx[:, 0] = self.wte(self.vocab_size)  # special <BLANK> token
embs.append(tok_emb)
```
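For what it's worth, a standalone sketch (not from makemore.py; the sizes are made up) suggesting the two orderings are equivalent: padding idx with the <BLANK> id and then embedding gives exactly the same vectors as embedding first and overwriting position 0 with the <BLANK> embedding, because nn.Embedding is just a row lookup.

```python
import torch
import torch.nn as nn

vocab_size, n_embd = 27, 8
wte = nn.Embedding(vocab_size + 1, n_embd)  # extra row for <BLANK>
idx = torch.randint(0, vocab_size, (4, 6))  # (b, t) token ids

with torch.no_grad():
    # pad with the <BLANK> *id*, then embed everything (the repo's order)
    padded = torch.roll(idx, 1, 1)
    padded[:, 0] = vocab_size
    emb_a = wte(padded)

    # embed first, then overwrite slot 0 with the <BLANK> *embedding*
    emb_b = wte(torch.roll(idx, 1, 1))
    emb_b[:, 0] = wte(torch.tensor(vocab_size))

assert torch.allclose(emb_a, emb_b)  # same result either way
```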
If we had labels for these names, such as:
| name   | is_palindrome | h_index | scrabble_score |
|--------|---------------|---------|----------------|
| anna   | 1             | 4       | 4              |
| jake   | 0             | 1       | 15             |
| bob    | 1             | 7       | 7              |
| karen  | 0             | 8       | 8              |
| andrej | 0             | 11      | 14             |
| ...    |               |         |                |
Can makemore-style generative models be modified to perform classification, so I can feed in a new name like asdf and get a prediction for its h_index?
While a suggestion like "add this layer here" would absolutely be helpful, I'm secretly hoping someone will share a general, intuitive way to think about repurposing machine learning models for new tasks...
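In case a concrete anchor helps, here's a minimal sketch (hypothetical names, a GRU stand-in for the backbone, nothing from makemore itself) of the usual recipe: keep the character-level backbone, pool its hidden state, and swap the next-character head for a small prediction head.

```python
import torch
import torch.nn as nn

class NameRegressor(nn.Module):
    def __init__(self, vocab_size, n_embd=64):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, n_embd)           # char embeddings
        self.rnn = nn.GRU(n_embd, n_embd, batch_first=True)   # backbone stand-in
        self.head = nn.Linear(n_embd, 1)   # scalar output, e.g. h_index

    def forward(self, idx):               # idx: (b, t) character ids
        x = self.wte(idx)                 # (b, t, n_embd)
        _, h = self.rnn(x)                # h: (1, b, n_embd), last hidden state
        return self.head(h[-1])           # (b, 1) prediction per name
```

The general pattern is that only the head and the loss change (e.g. MSE for h_index, binary cross-entropy for is_palindrome); the body that learned the character statistics is reusable.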