Comments (9)
Hello @QuantumEntangledAndy
In general I would like to keep the pipelines at a higher level of abstraction, so maybe for this case there is a solution that does not require manipulating the models and tokenizers directly. Note that

```rust
conversation.history = conversation_manager.model.get_tokenizer().convert_tokens_to_ids(
    &conversation_manager.model.get_tokenizer().tokenize_list(texts.to_vec()),
);
```

would not generate a properly formed history, which expects EOS tokens and actual token ids (`tokenize_list` returns a list of `String`).
Would it be more convenient if the pipeline stored the history sequence (i.e. kept the sequence of prompts/responses separate as a `Vec<Vec<i64>>`) instead of an aggregated vector? This would probably be sufficient to trim to the last N inputs.
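Concretely, that could look like this minimal sketch (the field and method names here are illustrative only, not the actual rust-bert API):

```rust
// Illustrative sketch only: field and method names are assumptions,
// not the actual rust-bert API.
struct Conversation {
    // One Vec<i64> of token ids per prompt/response, in order.
    history: Vec<Vec<i64>>,
}

impl Conversation {
    /// Keep only the last `n` history entries.
    fn trim_to_last_n(&mut self, n: usize) {
        let len = self.history.len();
        if len > n {
            self.history.drain(..len - n);
        }
    }

    /// Flatten to the aggregated form, separating entries with the EOS id.
    fn aggregate(&self, eos_token_id: i64) -> Vec<i64> {
        let mut out = Vec::new();
        for entry in &self.history {
            out.extend_from_slice(entry);
            out.push(eos_token_id);
        }
        out
    }
}
```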
Would you still require access to the tokenizer? An alternative would be to load a tokenizer manually and use it to re-encode the prompts:

```rust
let tokenizer = Gpt2Tokenizer::from_file(
    vocab_path.to_str().unwrap(),
    merges_path.to_str().unwrap(),
    false,
)?;
```
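Re-encoding a stored prompt could then be a small helper along these lines (a sketch only; `tokenize` and `convert_tokens_to_ids` are the tokenizer methods discussed above, and exact import paths and signatures may differ between rust_tokenizers versions):

```rust
// Sketch: re-encode a stored prompt back into token ids.
// Import paths and signatures may differ across rust_tokenizers versions.
use rust_tokenizers::{Gpt2Tokenizer, Tokenizer};

fn encode_prompt(tokenizer: &Gpt2Tokenizer, prompt: &str) -> Vec<i64> {
    let tokens = tokenizer.tokenize(prompt);
    tokenizer.convert_tokens_to_ids(&tokens)
}
```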
I would like a reliable way to read the history in from a file too. Currently I save it as text (TOML actually) with this format:

```rust
enum Speaker {
    Bot,
    User,
}

struct Past {
    speaker: Speaker,
    idx: u64,
    message: String,
}
```
But on load I have no way of getting this data into the history.
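Loading the structs themselves is straightforward; the gap is converting them into the tokenized history. For reference, the load side might look like this minimal sketch (assuming serde and toml as dependencies; the `History` wrapper and `[[past]]` table layout are my own file-format assumptions):

```rust
// Sketch of loading the saved TOML history with serde.
// The `History` wrapper struct and `[[past]]` table layout are assumptions.
use serde::Deserialize;

#[derive(Deserialize, Debug, PartialEq)]
enum Speaker {
    Bot,
    User,
}

#[derive(Deserialize, Debug)]
struct Past {
    speaker: Speaker,
    idx: u64,
    message: String,
}

#[derive(Deserialize, Debug)]
struct History {
    past: Vec<Past>,
}

fn load_history(path: &std::path::Path) -> Result<History, Box<dyn std::error::Error>> {
    let text = std::fs::read_to_string(path)?;
    Ok(toml::from_str(&text)?)
}
```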
Also, if we do want to go down the abstraction route and focus on the higher level, it might be better to not expose history at all. It is the tokenized, id form, which is not reliably readable or writable without the tokenizer.
What about a setup like this for the read part?

```rust
#[derive(PartialEq)]
enum HistoryKind {
    Input,
    Output,
}

struct HistoryItem {
    kind: HistoryKind,
    message: Vec<i64>,
}

struct Conversation {
    history: Vec<HistoryItem>,
}

impl Conversation {
    pub fn get_outputs(&self) -> Vec<String> {
        self.history
            .iter()
            .filter(|i| i.kind == HistoryKind::Output)
            // tokens_to_string: placeholder for tokenizer-backed decoding
            .map(|i| tokens_to_string(&i.message))
            .collect()
    }

    pub fn get_inputs(&self) -> Vec<String> {
        self.history
            .iter()
            .filter(|i| i.kind == HistoryKind::Input)
            .map(|i| tokens_to_string(&i.message))
            .collect()
    }
}
```
This completely disposes of `generated_responses` and `inputs` and relies on getting them from the history. This doesn't solve my issue, but it changes the model to one source of truth and sets it up for easy trimming to the last N inputs.
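Trimming on this representation could then be a small method (a sketch; it keeps everything from the n-th most recent input onwards, so each retained input stays paired with its responses):

```rust
impl Conversation {
    /// Drop everything before the n-th most recent input.
    pub fn trim_to_last_n_inputs(&mut self, n: usize) {
        let cut = self
            .history
            .iter()
            .enumerate()
            .rev()
            .filter(|(_, item)| item.kind == HistoryKind::Input)
            .nth(n.saturating_sub(1))
            .map(|(idx, _)| idx);
        if let Some(idx) = cut {
            self.history.drain(..idx);
        }
    }
}
```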
As a less invasive change we could:

- maintain the current input and response strings as the source of the actual inputs and outputs, regardless of the context that generated them;
- change history to a list of lists of ints, split on the EOS token (see the sketch after this list);
- add methods to set and get the history as a list of strings.
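Splitting the aggregated history on the EOS id is straightforward with std slices (a sketch; `eos_token_id` is whatever id the model's tokenizer uses for EOS):

```rust
// Sketch: split a flat token-id history into per-turn chunks on the EOS id.
fn split_on_eos(history: &[i64], eos_token_id: i64) -> Vec<Vec<i64>> {
    history
        .split(|&id| id == eos_token_id)
        .filter(|chunk| !chunk.is_empty())
        .map(|chunk| chunk.to_vec())
        .collect()
}
```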
The issue with getting and setting the history as strings is that it requires the tokenizer, which is available only to the manager, not to `Conversation`, so these two methods would have to be `ConversationManager` methods.
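Roughly like this (a sketch only; `decode_to_string` is a hypothetical stand-in for whatever tokenizer decoding the manager actually has access to, not a real rust-bert method):

```rust
// Sketch: string accessors live on the manager because only it holds the tokenizer.
// `decode_to_string` is a hypothetical stand-in, not a real rust-bert method.
impl ConversationManager {
    pub fn history_as_strings(&self, conversation: &Conversation) -> Vec<String> {
        conversation
            .history
            .iter()
            .map(|item| self.decode_to_string(&item.message))
            .collect()
    }
}
```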
@QuantumEntangledAndy thank you for providing more details, I now understand better what you are trying to achieve.
> As a less invasive change we could:
> - maintain the current input and response strings as the source of the actual inputs and outputs, regardless of the context that generated them;
> - change history to a list of lists of ints, split on the EOS token;
> - add methods to set and get the history as a list of strings.

These were my thoughts as well; pushing some changes that should allow you to load conversations from snapshots (see #89).
Thanks, I have tested #89 and can confirm that it is working as intended :)
Thank you, merged #89 to master