Comments (7)
Hi,
Currently, the content of the model directory is an implementation detail and its structure could change in the future.
I'm not sure when you would need to have the model files in memory. Since you have to load the actual files at some point, isn't it equivalent to build a Model
instance and then move it around (the instance being the in-memory representation of the model on disk)? Or are you dynamically generating the model and vocabularies?
from ctranslate2.
I want to extract the files directly from an archive into memory as std::vector<unsigned char>s, for simplicity and security reasons. Then, I would like to reinterpret_cast them to the right types and use them as usual at runtime. Could that work?
Another case I can think of is if someone wants to share vocabularies or vmaps among different models.
I'm thinking we could add a ModelReader interface with the following methods (to be refined):

- std::unique_ptr<std::istream> get_model_binary()
- bool is_vocabulary_shared() (if true, we build a single Vocabulary instance)
- std::unique_ptr<std::istream> get_source_vocabulary()
- std::unique_ptr<std::istream> get_target_vocabulary()
- std::unique_ptr<std::istream> get_vocabulary_mapping() (returns nullptr if no vocabulary mapping)
Then we add a new overload that accepts an instance implementing this interface.
In your code, you would then extend this base class and implement your own loading logic. I think there are ways to wrap a stringstream over an existing buffer.
What do you think?
That would be great! I need to dig a bit deeper into the relevant classes and interfaces in the code, but I'd like to get a first grasp of your idea and the work it will involve: all these methods should be templates with arguments, and the derived class will implement the conversion of these arguments into the right types, do I get that right?
ModelReader would be an abstract class with methods left unimplemented. Derived classes (such as ModelFileReader) can implement arbitrary loading logic as long as they meet the interface: returning a stream over the requested objects.
The main goal is not to integrate your loading logic into the main codebase, as it is very specific to your use case, but to allow plugging it in.
Do you want to take this one? I can also implement it if you prefer.
Actually, a single method could be enough, assuming you can easily map a filename to a stream:
std::unique_ptr<std::istream> get_file(const std::string& filename)
> ModelReader would be an abstract class with methods left unimplemented. Derived classes (such as ModelFileReader) can implement arbitrary loading logic as long as they meet the interface: returning a stream over the requested objects. The main goal is not to integrate your loading logic into the main codebase, as it is very specific to your use case, but to allow plugging it in.

Yes, absolutely, I get the general idea, and it's great as long as I have a way to implement my logic.

> Do you want to take this one? I can also implement it if you prefer.

I would love to work on that, but honestly you have already implemented it in your mind, along with alternatives :), so I would prefer and appreciate it if you could add it; it would save us a great deal of time and effort.
Thank you!