lightning-universe / lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning

Home Page: https://lightning-transformers.readthedocs.io

License: Apache License 2.0

Languages: Python 99.59%, Makefile 0.41%
Topics: pytorch, pytorch-lightning, transformers, hydra

lightning-transformers's Introduction

Deprecation notice 🔒

This repository was archived (read-only) on Nov 21, 2022. Thanks to everyone who contributed to lightning-transformers; we feel it's time to move on.

🤗 Transformers can already be easily trained using the Lightning ⚡ Trainer. Here's a recent example from the community: https://sachinruk.github.io/blog/deep-learning/2022/11/07/t5-for-grammar-correction.html. Note that there are no limitations or workarounds; things just work out of the box.

The lightning-transformers repo explored the possibility of providing task-specific modules and pre-baked defaults, at the cost of introducing extra abstractions. In the spirit of keeping ourselves focused, these abstractions are not something we wish to continue supporting.

If you liked lightning-transformers and want to continue developing it in the future, feel free to fork the repo and choose another name for the project.


Flexible components pairing 🤗 Transformers with PyTorch Lightning ⚡


Docs • Community


Installation

pip install lightning-transformers
From Source
git clone https://github.com/PyTorchLightning/lightning-transformers.git
cd lightning-transformers
pip install .

What is Lightning-Transformers

Lightning Transformers provides LightningModules, LightningDataModules and Strategies to use 🤗 Transformers with the PyTorch Lightning Trainer.

Quick Recipes

Train bert-base-cased on the CARER emotion dataset using the Text Classification task.

import pytorch_lightning as pl
from transformers import AutoTokenizer

from lightning_transformers.task.nlp.text_classification import (
    TextClassificationDataModule,
    TextClassificationTransformer,
)

tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path="bert-base-cased"
)
dm = TextClassificationDataModule(
    batch_size=1,
    dataset_name="emotion",
    max_length=512,
    tokenizer=tokenizer,
)
model = TextClassificationTransformer(
    pretrained_model_name_or_path="bert-base-cased", num_labels=dm.num_classes
)

trainer = pl.Trainer(accelerator="auto", devices="auto", max_epochs=1)

trainer.fit(model, dm)

Train a pre-trained mt5-base backbone on the WMT16 dataset using the Translation task.

import pytorch_lightning as pl
from transformers import AutoTokenizer

from lightning_transformers.task.nlp.translation import (
    TranslationTransformer,
    WMT16TranslationDataModule,
)

tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path="google/mt5-base"
)
model = TranslationTransformer(
    pretrained_model_name_or_path="google/mt5-base",
    n_gram=4,
    smooth=False,
    val_target_max_length=142,
    num_beams=None,
    compute_generate_metrics=True,
)
dm = WMT16TranslationDataModule(
    # WMT translation datasets: ['cs-en', 'de-en', 'fi-en', 'ro-en', 'ru-en', 'tr-en']
    dataset_config_name="ro-en",
    source_language="en",
    target_language="ro",
    max_source_length=128,
    max_target_length=128,
    padding="max_length",
    tokenizer=tokenizer,
)
trainer = pl.Trainer(accelerator="auto", devices="auto", max_epochs=1)

trainer.fit(model, dm)

Lightning Transformers supports a wide range of 🤗 tasks and datasets. See the documentation.

Billion Parameter Model Support

Big Model Inference

It's easy to enable large-model support for the pre-built LightningModule 🤗 tasks.

Below is an example of enabling automatic model partitioning (across CPU/GPU, even leveraging disk space) to run text generation using a 6B parameter model.

import torch
from accelerate import init_empty_weights
from transformers import AutoTokenizer

from lightning_transformers.task.nlp.language_modeling import (
    LanguageModelingTransformer,
)

with init_empty_weights():
    model = LanguageModelingTransformer(
        pretrained_model_name_or_path="EleutherAI/gpt-j-6B",
        tokenizer=AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B"),
        low_cpu_mem_usage=True,
        device_map="auto",  # automatically partitions the model based on the available hardware.
    )

output = model.generate("Hello, my name is", device=torch.device("cuda"))
print(model.tokenizer.decode(output[0].tolist()))

For more information see Big Transformers Model Inference.

Big Model Training with DeepSpeed

Below is an example of how you can train a 6B parameter transformer model using Lightning Transformers and DeepSpeed.

import pytorch_lightning as pl
from transformers import AutoTokenizer

from lightning_transformers.task.nlp.language_modeling import (
    LanguageModelingDataModule,
    LanguageModelingTransformer,
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path="gpt2")

model = LanguageModelingTransformer(
    pretrained_model_name_or_path="EleutherAI/gpt-j-6B",
    tokenizer=AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B"),
    deepspeed_sharding=True,  # defer initialization of the model to shard/load pre-train weights
)

dm = LanguageModelingDataModule(
    batch_size=1,
    dataset_name="wikitext",
    dataset_config_name="wikitext-2-raw-v1",
    tokenizer=tokenizer,
)
trainer = pl.Trainer(
    accelerator="gpu",
    devices="auto",
    strategy="deepspeed_stage_3",
    precision=16,
    max_epochs=1,
)

trainer.fit(model, dm)

For more information see DeepSpeed Training with Big Transformers Models or the Model Parallelism documentation.

Contribute

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Community

For help or questions, join our huge community on Slack!

lightning-transformers's People

Contributors

akihironitta, aribornstein, benschreiber, bofenghuang, borda, bzantium, carmocca, dependabot[bot], edenlightning, espoirmur, ethanwharris, ghomashudson, junnyu, karthikrangasai, kaushikb11, lantiga, maksym-taranukhin, mariomeissner, mathemusician, mauvilsa, pre-commit-ci[bot], rohitgr7, rr-28023, seannaren, taeminlee, tanmoyio, tchaton, williamfalcon, zippeurfou


lightning-transformers's Issues

Implement pipeline prediction capabilities

🚀 Feature

I should be able to do something like:

model = ConditionalGenerationTransformer.load_from_checkpoint('my_model.pt')

output = model.generate("This is a prefix condition")

It should be as minimal as possible, as HF has done a lot of work around building 'pipelines': https://huggingface.co/transformers/task_summary.html#summary-of-the-tasks

We should be able to wrap the underlying pipelines such that the user just loads the appropriate model in Lightning and can access the pipeline underneath. We've stored the tokenizer/model such that the underlying pipeline can be instantiated easily.

I'm unsure whether pipelines have the drawback of not supporting mini-batches; this could be very limiting if so.
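
For illustration only, a minimal sketch of what this could look like, assuming the task module exposes .model and .tokenizer attributes as described above; the "text2text-generation" task string is an assumption for conditional generation:

from transformers import pipeline

model = ConditionalGenerationTransformer.load_from_checkpoint("my_model.pt")

# Wrap the stored HF backbone and tokenizer in a standard HF pipeline.
pipe = pipeline(
    "text2text-generation",
    model=model.model,          # underlying HF PreTrainedModel stored on the LightningModule
    tokenizer=model.tokenizer,  # tokenizer stored on the LightningModule
)
print(pipe("This is a prefix condition"))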

Create scripts directory

🚀 Feature

Move the train.py script:

train.py -> lightning_transformers/cli/train.py

This way we can import and test it easily, and users can subclass it.
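
For illustration, the moved script could expose a Hydra entrypoint roughly like this (a sketch only; the config path/name and the elided body are assumptions):

# lightning_transformers/cli/train.py (sketch)
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="../../conf", config_name="config")
def hydra_entry(cfg: DictConfig) -> None:
    # Existing training logic from train.py goes here (elided).
    ...


if __name__ == "__main__":
    hydra_entry()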

We can also add it as an entrypoint in setup.py:

setuptools.setup(
    entry_points={
        "console_scripts": [
            "pl-transformers-train=lightning_transformers.cli.train:hydra_entry",
        ],
    }
)

so users can do:

$ pl-transformers-train +task=huggingface/text_classification +dataset=text_classification/emotion 

[API] Design approach for data metrics dependency

There are metrics that rely on the datamodule. For example, text classification requires the model to know the number of classes, which is tied to the dataset.

This is currently passed as below:

    model: LitAutoModelTransformer = instantiate_downstream_model(
        ...
        **data_module.data_model_kwargs
    )

Where data_module.data_model_kwargs is:

class LitTextClassificationDataModule(LitTransformerDataModule):
    @property
    def num_classes(self):
        return self.labels.num_classes

    @property
    def data_model_kwargs(self):
        return {
            'num_classes': self.num_classes
        }

This isn't very easy to grok, in my opinion. The kwargs can be anything, making it hard to tell what is being passed to the module in the first place.

Potential solution 1

Pass the data module to the model (I don't think save_hyperparameters works in this instance though):

class LitAutoModelTextClassificationTransformer(LitAutoModelTransformer):
    def __init__(self,
                 downstream_model_type: str,
                 backbone: DictConfig,
                 optim: DictConfig,
                 datamodule: LitTextClassificationDataModule,
                 scheduler: Optional[DictConfig] = None):
        self.num_classes = datamodule.num_classes
        super().__init__(
            downstream_model_type=downstream_model_type,
            backbone=backbone,
            optim=optim,
            scheduler=scheduler
        )
        self._initialize_metrics(num_classes=datamodule.num_classes) # use num classes from datamodule directly

Potential solution 2

Initialize metrics when we know we can since the trainer now contains a reference to the dataset:

class LitAutoModelTextClassificationTransformer(LitAutoModelTransformer):
    def on_fit_start(self):
        datamodule = self.trainer.datamodule
        self._initialize_metrics(num_classes=datamodule.num_classes)

We could add this hook within the base class and expose initialize_metrics with the datamodule, for example.
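
For illustration, a rough sketch of what such a base-class hook could look like (names are illustrative, not the current API):

import pytorch_lightning as pl


class TaskTransformer(pl.LightningModule):
    # Sketch: the base class hands the datamodule to an overridable hook
    # once the trainer (and therefore the datamodule) is available.
    def on_fit_start(self) -> None:
        self.initialize_metrics(self.trainer.datamodule)

    def initialize_metrics(self, datamodule) -> None:
        """Override in task subclasses, e.g. build Accuracy(num_classes=datamodule.num_classes)."""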

[API] Improve API for adding custom models (lessons learnt from Performer model)

Recently I added a Performer model using the writing prompts dataset provided in FairSeq. This involved building the 'backbone' model, the task-specific wrapper for language modeling, and the data module. Throughout this process I learnt a few things that we need to address:

Our conf structure tends towards promises that cannot be kept

For example, I created my Performer backbone model and a config:

conf/backbones/performer.yaml
task/language_modeling/lucidrains/performer.yaml
python main.py +task=language_modeling/lucidrains/performer backbone=performer # This works

But given the current conf structure, it's intuitive to assume that a new backbone should also work for other tasks:

python main.py +task=language_modeling/huggingface backbone=performer # Crashes due to code incompatibility

Possible solutions:

  • Have backbones work with all tasks automatically. In my opinion this is extremely difficult, if not impossible.
  • Have backbones tied to tasks via repo name, in a manner which makes it clearer that you're mixing things

I believe option 2 is the right solution. HuggingFace downstream tasks/backbones/tokenizers work together. If we wanted to build a FairSeq language model, we would need to define a completely new set of code, which should be clear from the conf structure.

conf/task/huggingface/language_modeling.yaml
conf/backbone/huggingface/bert-base-cased.yaml

With Hydra 1.1, huggingface/language_modeling.yaml can add a recursive default of model: /model/huggingface/bert-base-cased.

Now when I add my performer model:

conf/task/lucidrains/language_modeling.yaml
conf/backbone/lucidrains/performer.yaml

Now if a user tries to do an incompatible combination, just from the command it's clearer why it doesn't work:

# Can't expect a task from a different repo to be compatible with huggingface
python main.py +task=lucidrains/language_modeling +backbone=huggingface/bert-base-cased

What about code structure?

lightning_transformers/
    task/
        lucidrains/
            language_modeling/
                core/
                    model.py
        huggingface/
            ...
    backbones/
        lucidrains/
            performer.py
    tokenizers/
        custom_tokenizer.py # This is compatible with other models, external to lucidrains

We duplicate the 'task' (language_modeling, text_classification, etc.) but we make it explicit what is supported, which is more important in my opinion.

Custom DataModules should be built upon Transformer Datasets

If you're going to make a custom dataset, you should standardize on the datasets.Dataset class. They have a nice example of how to create a dataset here.

Our datamodules should just be thin wrappers on top of this, allowing you to use the Transformers-based data collators as we currently have set up.
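
For illustration, a minimal sketch of such a thin wrapper (the dataset name, column names and batch size are assumptions, not the repo's actual implementation):

import pytorch_lightning as pl
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, DataCollatorWithPadding


class ThinTextClassificationDataModule(pl.LightningDataModule):
    # Sketch: wrap an HF dataset and data collator with as little extra logic as possible.
    def __init__(self, dataset_name: str = "emotion", model_name: str = "bert-base-cased", batch_size: int = 32):
        super().__init__()
        self.dataset_name = dataset_name
        self.model_name = model_name
        self.batch_size = batch_size

    def setup(self, stage=None):
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        dataset = load_dataset(self.dataset_name)
        dataset = dataset.map(lambda batch: self.tokenizer(batch["text"], truncation=True), batched=True)
        self.dataset = dataset.remove_columns(["text"])  # keep tokenized columns + labels
        self.collator = DataCollatorWithPadding(self.tokenizer)  # dynamic padding at batch time

    def train_dataloader(self):
        return DataLoader(self.dataset["train"], batch_size=self.batch_size, collate_fn=self.collator)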

Clean up Question Answering data class/Squad data class

Implement Q/A

Implement a Q/A example in a manner unified with text classification.

Transformers Tests Tracker

Task Smoke Tests #33

  • lightning_transformer/task/huggingface/language_modeling
  • lightning_transformer/task/huggingface/multiple_choice
  • lightning_transformer/task/huggingface/question_answering
  • lightning_transformer/task/huggingface/summarization
  • lightning_transformer/task/huggingface/text_classification
  • lightning_transformer/task/huggingface/token_classification
  • lightning_transformer/task/huggingface/translation
  • train.py (related: #34)

unittests (minimal tests)

  • lightning_transformer/core
  • lightning_transformer/huggingface
  • lightning_transformer/huggingface/seq2seq
  • lightning_transformer/task/huggingface/language_modeling
  • lightning_transformer/task/huggingface/multiple_choice
  • lightning_transformer/task/huggingface/question_answering
  • lightning_transformer/task/huggingface/summarization
  • lightning_transformer/task/huggingface/text_classification
  • lightning_transformer/task/huggingface/token_classification
  • lightning_transformer/task/huggingface/translation

Others

  • Clean-up CI
  • Format codebase
  • Run pre-commit in CI
  • Clean-up requirements

Add finetune strategy property to tasks

As a user, I finetune A LOT. Whenever I use a task, I want to enable finetuning or customize it with my own strategy.

Certain models have a very specific strategy that works well. In that case, the user should just enable finetuning and we'll use the best strategy:

class TextClassifier:
    def __init__(self, finetune_strategy=None):
        if finetune_strategy is True:
            finetune_strategy = SomeCustomStrategyForThisModel()

We should be able to support different strategies:

  • freeze then unfreeze
  • unfreeze from the beginning
  • custom user strategy:
  • finetune_strategy=MyFineTuneStrategy()

This is implemented as a callback, but because finetuning is very model-specific we attach it to the model and not the trainer.
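
For example, a rough sketch of a "freeze then unfreeze" strategy built on Lightning 1.x's BaseFinetuning callback (the pl_module.model.base_model attribute path and the unfreeze epoch are assumptions):

from pytorch_lightning.callbacks import BaseFinetuning


class FreezeThenUnfreezeBackbone(BaseFinetuning):
    # Sketch: freeze the pre-trained backbone at the start, unfreeze it at a chosen epoch.
    def __init__(self, unfreeze_at_epoch: int = 1):
        super().__init__()
        self.unfreeze_at_epoch = unfreeze_at_epoch

    def freeze_before_training(self, pl_module):
        self.freeze(pl_module.model.base_model)  # assumes the task stores the HF model as .model

    def finetune_function(self, pl_module, current_epoch, optimizer, optimizer_idx):
        if current_epoch == self.unfreeze_at_epoch:
            self.unfreeze_and_add_param_group(modules=pl_module.model.base_model, optimizer=optimizer)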

Finalise fundamental API with Hydra Support

🚀 Feature

Complete the API to prevent changes down the line once we've moved ahead with adding all tasks.

It is important that we maintain a clean, elegant API on top of Hydra, without adding any hacky code. It would be nice to leverage dataclasses as described in #19 if this fits a good API design.

Ensure that for text classification, the API class/conf structure is agreed upon.

Add DataClass Hydra Support

Related #37

The final piece is to add automatic defaults + type checking for dataclasses.

Two options:

  • Via the ConfigStore: https://hydra.cc/docs/next/tutorials/structured_config/schema/ which would need to store all configs that we'd like to type check/duck-type via Hydra. The object remains a DictConfig (duck-typing)
  • Carlos' idea: provide a _name_ within the config to automatically convert to a specific dataclass, going through the tree recursively

Choice 1 is closer to Hydra; Choice 2 is better because we actually pass dataclasses around.
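
For reference, a minimal sketch of option 1, registering a dataclass schema in Hydra's ConfigStore (the config group and field names are illustrative):

from dataclasses import dataclass

from hydra.core.config_store import ConfigStore


@dataclass
class OptimizerConfig:
    # Illustrative fields only.
    lr: float = 5e-5
    weight_decay: float = 0.0


cs = ConfigStore.instance()
# Register the schema so Hydra can type-check conf/optimizer/*.yaml against it;
# at runtime the object is still a DictConfig that duck-types as OptimizerConfig.
cs.store(group="optimizer", name="base_optimizer", node=OptimizerConfig)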

Ensure compatibility with TPUs

🚀 Feature

I haven't tried to see if TPUs work out of the box. We should ensure that the code runs as expected using the tasks.

User Experience: Train a standard pre-trained model on a custom dataset

🚀 Feature

Related to #11.

Less of a feature and more of a gut check for our API. We want the API to be easy to use and for users who are familiar with transformers to be able to leverage PL without friction.

Choose a task and a huggingface dataset, and fine-tune a model on this data. In addition, you could also attempt to provide test/inference results.

Give a report on the process, and be as ruthless as possible with any potential feedback you may have!

Fix Datamodules

Datamodules should work with minimal code wrapping HF's dataset classes. Nate's original code defined a lot of conditions/variables that should be extracted from HF's dataset class.

Create a datamodule that can take in dataset names and generate the datasets for text classification. Then it should be made to fit all types of data structures via inheritance if needed, with the text classification code extracted out.

I.e. TextClassificationDataModule etc. if needed (preferably everything fits in one, but if it gets too big then we need to split).

Allow modifying base model (from scratch training)

python language_modeling/train.py --model_name_or_path bert-base-cased --dataset_name common_crawl --encoder_hidden_layers 24 --encoder_hidden_size 2048

Allow users to train based on a model type, then we scale the parameters once the config is made for them. Warn them that it's from scratch. Not sure if it's possible, but it would be a useful feature to have.
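
For illustration, a sketch of what this could do under the hood with plain transformers APIs (the flag names above would need to be mapped onto the backbone's config attributes, e.g. BERT's num_hidden_layers/hidden_size):

from transformers import AutoConfig, AutoModelForMaskedLM

# Start from bert-base-cased's config, then override the size-related attributes.
config = AutoConfig.from_pretrained(
    "bert-base-cased",
    num_hidden_layers=24,
    hidden_size=2048,
    num_attention_heads=16,  # hidden_size must stay divisible by the head count
)
# from_config builds a randomly initialized model, i.e. training starts from scratch.
model = AutoModelForMaskedLM.from_config(config)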

Finalise Documentation

Will need many iterations to get the documentation right. There is a first pass available on master.

Add test.py script to the repo

🚀 Feature

As suggested by @hadasah, we should add a test.py script to test models once trained. This can be super simple and use the predict.py script as reference.

Store tokenizer metadata/object within model

When a model is saved, we do not store information pertaining to the tokenizer. This means we require the tokenizer to be re-created and assigned like below at inference/test time:

model = LitAutoModelTransformer.load_from_checkpoint('checkpoint.pt')
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
model.tokenizer = tokenizer
...

It would be preferable that, after specifying the tokenizer at training time, inference knows which tokenizer to use.
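
One possible approach (a sketch, not the implemented behaviour) is to persist the tokenizer's name_or_path through Lightning's checkpoint hooks and rebuild it on load; the checkpoint key is an assumption:

import pytorch_lightning as pl
from transformers import AutoTokenizer


class LitAutoModelTransformer(pl.LightningModule):
    # Only the checkpoint-related hooks are sketched here; assumes the module
    # stores self.tokenizer at training time (see above).
    def on_save_checkpoint(self, checkpoint: dict) -> None:
        checkpoint["tokenizer_name_or_path"] = self.tokenizer.name_or_path

    def on_load_checkpoint(self, checkpoint: dict) -> None:
        self.tokenizer = AutoTokenizer.from_pretrained(checkpoint["tokenizer_name_or_path"])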

Infer max steps for schedulers

Many schedulers require total_steps to be defined: https://github.com/PyTorchLightning/lightning-transformers/blob/master/conf/scheduler/linear_schedule_with_warmup.yaml#L3-L4

We should define a function within this repo to determine the total number of steps from the data module, appropriately considering limit_train_batches/num_processes etc. This is similar to what has been done in NeMo, and I have some pseudocode here:

import math

def total_training_steps(self,
                         train_dataloader,
                         max_epochs,
                         max_steps,
                         accumulate_grad_batches,
                         limit_train_batches,
                         num_distributed) -> int:
    if max_steps > 0:
        return max_steps
    # Compute effective max_steps from the train dataloader and trainer settings
    num_samples = len(train_dataloader.dataset)
    batch_size = train_dataloader.batch_size
    drop_last = train_dataloader.drop_last
    return self.compute_max_steps(
        max_epochs=max_epochs,
        accumulate_grad_batches=accumulate_grad_batches,
        limit_train_batches=limit_train_batches,
        num_distributed=num_distributed,
        num_samples=num_samples,
        batch_size=batch_size,
        drop_last=drop_last,
    )

def compute_max_steps(
        self,
        max_epochs,
        accumulate_grad_batches,
        limit_train_batches,
        num_distributed,
        num_samples,
        batch_size,
        drop_last):
    _round = math.floor if drop_last else math.ceil
    sampler_num_samples = math.ceil(num_samples / num_distributed)
    steps_per_epoch = _round(sampler_num_samples / batch_size)
    if isinstance(limit_train_batches, int) or limit_train_batches == 0.0:
        steps_per_epoch = min(steps_per_epoch, int(limit_train_batches))
    elif steps_per_epoch != float('inf'):
        # limit_train_batches is a percentage of batches per epoch
        steps_per_epoch = int(steps_per_epoch * limit_train_batches)
        if accumulate_grad_batches == 1:
            steps_per_epoch = max(steps_per_epoch, 1)
    return math.ceil(steps_per_epoch / accumulate_grad_batches) * max_epochs

Eventually I feel like this code should go into Lightning as a utility function; there are already relevant issues in PL around this.

Remove input kwargs passing to datamodules for specific arguments

To clear up what's being passed to the data modules, we need to remove kwargs as much as possible and be explicit in the class about what the arguments are. I.e. currently:

https://github.com/PyTorchLightning/lightning-transformers/blob/master/lightning_transformers/core/data.py#L26-L29

This shouldn't exist; the arguments should be explicit in the class, something like this:

class QuestionAnsweringTransformerDataModule(TransformerDataModule):

    def __init__(self,
                 max_seq_length: int,
                 pad_to_max_length: int,
                 do_train: bool,
                 doc_stride: int,
                 version_2_with_negative: bool,
                 n_best_size: int,
                 max_answer_length: int,
                 null_score_diff_threshold: float,
                 output_dir: str,
                 *args,
                 **kwargs):
        super().__init__(*args, **kwargs)
        self.max_seq_length = max_seq_length
        self.pad_to_max_length = pad_to_max_length
        self.do_train = do_train
        self.doc_stride = doc_stride
        self.version_2_with_negative = version_2_with_negative
        self.n_best_size = n_best_size
        self.max_answer_length = max_answer_length
        self.null_score_diff_threshold = null_score_diff_threshold
        self.output_dir = output_dir

Not sure what the cleanest way to support this is, cc @carmocca!

[RFC] Allow Hydra to live in the base class

🚀 RFC

Proposal

Marry the project with Hydra and allow Hydra to live in a base class TaskTransformer (or something similarly named). Remove complications with the instantiator and make it clear that we're using hydra.utils.instantiate in the code.

Motivation

I've been working on #53 and ran into many API issues and confusion. The big issue is that if I'm running into issues (after developing the code), then a normal user would be even more confused.

It isn't clear what base class to use.

I want to use Hydra conf as it provides niceties like instantiating my optimizer/scheduler, but at the same time our current base class doesn't provide any of that:

class TaskTransformer(LitTransformer):
    """
    Base class for task specific transformers
    """

    def setup(self, stage: str):
        self.configure_metrics(stage)

    def configure_metrics(self, stage: str) -> Optional[Any]:
        """
        Override to configure metrics for train/validation/test.
        This is called on fit start to have access to the data module,
        and initialize any data specific metrics.
        """
        pass

My proposal is to move configure_optimizers here, so instantiation happens in the base class. If a user doesn't want to use instantiation, fine: override configure_optimizers. This moves on to the next point.

    def __init__(self, optimizer_cfg: Any, scheduler_cfg: Any):
        super().__init__()
        self.optimizer_cfg = optimizer_cfg
        self.scheduler_cfg = scheduler_cfg

    def configure_optimizers(self) -> Dict:
        self.optimizer = self.optimizer(self.optimizer_cfg, self.model)
        # prepare_warmup needs the datamodule to be available when `self.num_training_steps`
        # is called that is why this is done here and not in the __init__
        self.prepare_warmup(self.scheduler_cfg)
        self.scheduler = self.scheduler(self.scheduler_cfg, self.optimizer)
        return super().configure_optimizers()

    def prepare_warmup(self, cfg: SchedulerConfig):
        if cfg.num_training_steps < 0:
            # less than 0 specifies to infer number of training steps
            cfg.num_training_steps = self.num_training_steps
            log.info(f"Inferring number of training steps, set to {cfg.num_training_steps}")

        if isinstance(cfg.num_warmup_steps, float):
            # Convert float values to percentage of training steps to use as warmup
            cfg.num_warmup_steps *= cfg.num_training_steps
            log.info(f"Inferring number of warmup steps from ratio, set to {cfg.num_warmup_steps}")

    def optimizer(self, cfg: Any, model: torch.nn.Module) -> torch.optim.Optimizer:
        no_decay = ["bias", "LayerNorm.weight"]
        grouped_parameters = [
            {
                "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
                "weight_decay": cfg.weight_decay,
            },
            {
                "params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
                "weight_decay": 0.0,
            },
        ]
        return hydra.utils.instantiate(cfg, grouped_parameters)

    def scheduler(self, cfg: Any, optimizer: torch.optim.Optimizer) -> torch.optim.lr_scheduler._LRScheduler:
        return hydra.utils.instantiate(cfg, optimizer=optimizer)

When using Hydra, what should my class look like?

Unless I look at the HF base class, I have no idea how the instantiator works. Once I look at the HF base class it makes sense:

class HFTransformer(HydraTaskTransformer):
    """
    Base class for task specific transformers, wrapping pre-trained language models for downstream tasks.
    The API is built on top of AutoModel and AutoConfig, provided by HuggingFace.

    see: https://huggingface.co/transformers/model_doc/auto.html
    """

    def __init__(
        self,
        downstream_model_type: str,
        backbone: HFBackboneConfig,
        optimizer: OptimizerConfig,
        scheduler: HFSchedulerConfig,
        **config_data_args,
    ):

Hydra passes these configs, which are defined in the task conf. But the fact that there is no class signature to inherit from made this extremely difficult to parse. So my proposal here is to be opinionated and set this class signature in the TaskTransformer.

Aside from the additional scheduler config and removing the instantiator functions that are now hard-coded into the module, these were the changes required to get DALLE in (from the model perspective; the data API needs some changes but it's less controversial).

Notes

Let's say a user doesn't want to pass in a backbone config. What should they do? They should be able to just omit it from their class signature. I think the best way to support this is to have the class signature take an optional backbone config (but in most cases you will want one).

What if I don't want my base class to take all these configs? Use the LitTransformer base class.

Custom Transformer Support

class MyTransformer(BaseTransformer):

    def __init__(self, model_params):
        super().__init__()
        self.model = MyCustomModel(model_params)

This will allow people to train their own models via language modeling, then use them on examples/datasets supported by HF datasets.
