
Comments (6)

Mivg avatar Mivg commented on August 28, 2024

Hi @mdrpanwar
Thanks for your question. Any model checkpoint that can be loaded with HuggingFace can be used, even if it is not pushed as a model card.
However, in case you have a custom model with no AutoClass functionality, it will indeed not work in its current form.
Can you please add some details on what you have and are trying to achieve, and I'll try to add support for that?

from sled.

mdrpanwar avatar mdrpanwar commented on August 28, 2024

Hi @Mivg,

Thanks for replying.

My question and request were the following:
The official code of new transformer models is not always released in the form of Hugging Face models with AutoClass functionality. So, the current implementation of SLED restricts the direct usage of such base models. I was hoping for a more general implementation that can take in any base model implemented in PyTorch, regardless of the AutoClass. Perhaps it will require more work. Is this something you are targeting in the near future?

Please feel free to close this issue. I shall get back when I have a more concrete requirement for a specific model.

Thanks.


Mivg avatar Mivg commented on August 28, 2024

Hi @mdrpanwar

Thanks for the details. Sure, that makes sense and there is no reason SLED could not support it.
Before I think up a possible solution, I want to be precise about the goal. Is it correct to assume your model is implemented in PyTorch and inherits from PreTrainedModel (part of transformers) but is just not registered to be used as an AutoClass? I.e., you are able to do model = MyCustomModel(...) and pass it to the trainer as if it were e.g. BART?
If so, do you also have a custom config class that inherits from PretrainedConfig?
Finally, if the two above are true, does your model support MyCustomModel.from_pretrained('some local checkpoint')?

In any case, supporting the above should be rather straightforward. The other possible solution, assuming only the answer to the first question is yes, is to support something like SledForConditionalGeneration.wrap_model(backbone_model) and use it instead of the from_pretrained initialization.
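For illustration, a wrap_model-style API could amount to a thin delegating wrapper around an existing backbone instance. The sketch below is pure Python and entirely hypothetical (the class, method, and parameter names mirror the suggestion above, not actual SLED code):

```python
class SledStyleWrapper:
    """Hypothetical sketch of wrapping an already-constructed backbone
    model instead of loading it via from_pretrained. Not SLED code."""

    def __init__(self, backbone, context_size=256, window_fraction=0.5):
        self.backbone = backbone
        self.context_size = context_size
        self.window_fraction = window_fraction

    @classmethod
    def wrap_model(cls, backbone, **kwargs):
        # Mirrors the proposed SledForConditionalGeneration.wrap_model entry point
        return cls(backbone, **kwargs)

    def __getattr__(self, name):
        # Delegate anything the wrapper does not define to the backbone,
        # so the wrapped model can be passed to a trainer like the original
        return getattr(self.backbone, name)


class DummyBackbone:
    """Stand-in for any PyTorch model instance."""

    def generate(self, text):
        return "generated: " + text


wrapped = SledStyleWrapper.wrap_model(DummyBackbone(), context_size=512)
print(wrapped.generate("hello"))  # delegated to the backbone: "generated: hello"
print(wrapped.context_size)       # 512, overridden at wrap time
```

The design point is that wrapping only needs a live model object, so no AutoClass registration or from_pretrained support is required of the backbone.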


mdrpanwar avatar mdrpanwar commented on August 28, 2024

Hi @Mivg,

Thank you for your detailed response. It is fine to assume that base models are written in PyTorch. Beyond that, there are two classes of base models:
1. The base model is written using Hugging Face's transformers library. In this case, it is fair to assume that it inherits from PreTrainedModel and the custom config class inherits from PretrainedConfig. However, for wider applicability, we can only assume the former to be true.
2. The base model is not written using the transformers library (it is written only in PyTorch or using some other library, e.g. fairseq). In this case, we need to come up with some minimal interface that is expected of the base model so that it can be used under the SLED framework.

Ideally, we would like to support both 1 and 2 to be exhaustive; but 1 already covers a large number of possible base models. So, we can start with 1 and gradually support 2 over time if you think it to be a valid use case.
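As a sketch of what a "minimal interface" for case 2 could look like, one option is a structural type that any backbone must satisfy. The protocol and method names below are illustrative assumptions only, not anything SLED currently defines:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Seq2SeqBackbone(Protocol):
    """Hypothetical minimal interface a non-transformers backbone
    (plain PyTorch, fairseq, etc.) might need to expose to a
    SLED-style wrapper. Method names here are assumptions."""

    def forward(self, input_ids, attention_mask=None, labels=None): ...

    def resize_token_embeddings(self, new_num_tokens): ...


class ToyBackbone:
    """Minimal stand-in that satisfies the protocol structurally."""

    def forward(self, input_ids, attention_mask=None, labels=None):
        return {"logits": input_ids}

    def resize_token_embeddings(self, new_num_tokens):
        return new_num_tokens


print(isinstance(ToyBackbone(), Seq2SeqBackbone))  # True
```

Because the check is structural rather than based on inheritance, a fairseq or hand-rolled PyTorch model could qualify without depending on transformers at all; note that runtime_checkable only verifies method presence, not signatures.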


leoribeiro avatar leoribeiro commented on August 28, 2024

Hello @Mivg, is there any update on this issue? Can I use SLED with other HF models?

@Mivg, can I do something like this:

import sled
from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer

config = AutoConfig.from_pretrained("google/flan-t5-small")
config.model_type = "tau/sled"
config.underlying_config = "facebook/bart-base"
config.context_size = 256
config.window_fraction = 0.5
config.prepend_prefix = True
config.encode_prefix = True
config.sliding_method = "dynamic"

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small", config=config)
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

Would this code enable SLED on Flan-T5?


leoribeiro avatar leoribeiro commented on August 28, 2024

@mdrpanwar would you please help? Were you able to use SLED with other LMs in HF?

