Comments (1)
Formally, this does not matter in our case because we ignore spaces, but in general the original text is not exactly restored here, right?
Yes, that's correct; we can't completely restore the original text. I was going for simplicity here to keep the code shorter and easier to follow, since we are using BPE anyway.
Could you please tell me whether you are interested in minor feedback like this, or whether it is not worth notes or new issues?
In general, I do appreciate comments like this. Thanks! Some readers may have similar questions and thus it doesn't hurt to add short notes about it.
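To illustrate the point being discussed, here is a minimal sketch of a whitespace-and-punctuation tokenizer in the style used early in the book. The regex and the example string are illustrative, not the book's exact code; it shows why rejoining tokens with single spaces cannot recover the original text:

```python
import re

# Illustrative input with irregular spacing
text = "Hello,  world. This  is   a test."

# Split on punctuation and whitespace, keeping punctuation as separate tokens;
# the filter drops the whitespace tokens entirely
tokens = re.split(r'([,.:;?_!"()\']|--|\s)', text)
tokens = [t for t in tokens if t.strip()]

# Rejoining with single spaces loses the original spacing and adds
# spaces before punctuation
restored = " ".join(tokens)
print(restored)           # "Hello , world . This is a test ."
print(restored == text)   # False
```

A byte-pair-encoding tokenizer such as the one used later avoids this issue, since it encodes whitespace as part of the tokens, so decoding the token IDs reproduces the input exactly.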
from llms-from-scratch.
Related Issues (20)
- In 3.3.1, there seems to be a missing image between "The attention weights and context vector calculation are summarized in the figure below:" and "The code below walks through the figure above step by step." HOT 1
- RuntimeError: size mismatch - ch05/03_bonus_pretraining_on_gutenberg HOT 2
- book feedback HOT 1
- Inconsistencies between the code in the book and the notebooks (2.6 Data sampling with a sliding window) HOT 7
- Output of the cell without variable specified (Embedding Layers and Linear Layers) HOT 1
- Wrong number of token ids specified in the notebook (2.7 Creating token embeddings) HOT 1
- Incorrect description of function torch.arange() (2.8 Encoding word positions) HOT 1
- Inconsistencies in output for dropout section (3.5.2 Masking additional attention weights with dropout) HOT 1
- Probably a typo in multi-head attention description (3.6.1 Stacking multiple single-head attention layers) HOT 1
- Solution for Exercise 3.2 is included in the notebook with main code (3.6.1 Stacking multiple single-head attention layers) HOT 1
- Question about implementation of CausalAttention class (3.5.3 Implementing a compact causal self-attention class) HOT 6
- Inconsistencies in unsqueeze operation description in the book and in notebook and its necessity (3.6.2 Implementing multi-head attention with weight splits) HOT 4
- Solution for Exercise 3.3 is included in the notebook with main code (3.6.2 Implementing multi-head attention with weight splits) HOT 1
- Inconsistencies in MHA Wrapper Implementation Between Chapter 3 Main Content and Bonus Material HOT 1
- Offering Chinese Translation for 'Build a Large Language Model From Scratch HOT 3
- Chapter 5 - Context Size and the DataLoaders HOT 2
- Feedback: Stripe output from notebook HOT 2
- About endoftext in ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py HOT 14
- Contributions for Chinese simplified version HOT 4
- {Q}: Replacing the Hugging Face LlamaDecoderLayer Class With New LongNet