
adapter-hub / adapters

2.4K stars · 27 watchers · 319 forks · 83.42 MB

A Unified Library for Parameter-Efficient and Modular Transfer Learning

Home Page: https://docs.adapterhub.ml

License: Apache License 2.0

Shell 0.04% Makefile 0.05% Python 39.51% Jupyter Notebook 60.40%
nlp natural-language-processing adapters transformers bert pytorch parameter-efficient-learning parameter-efficient-tuning lora

adapters's People

Contributors

aranjan25, calpt, eltociear, hsterz, lenglaender, sotwi, stefan-it, timoimhof


adapters's Issues

KeyError: 'mh_adapter' in add_adapter() with config containing "MH_adapter"

๐Ÿ› Bug

Information

I get this KeyError when loading an adapter (see code below).
Inspecting the adapter_config shows that it contains no mh_adapter key, but an MH_adapter key instead.

To reproduce

Steps to reproduce the behavior:

model = BertModelWithHeads.from_pretrained('bert-base-uncased', cache_dir=transformers_cache_dir)
model.load_adapter("sentiment/sst@example-org", cache_dir=transformers_cache_dir)

Traceback (most recent call last):
File "C:\Users\Gregor\AppData\Roaming\JetBrains\IdeaIC2020.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1438, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Users\Gregor\AppData\Roaming\JetBrains\IdeaIC2020.1\plugins\python-ce\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/Gregor/Documents/Programming/efficient-adapters/evaluation/timing/measure_inference.py", line 59, in
results = measure_inference_gpu(True, [10, 512], [1, 2, 128], cache_dir, repetitions=5)
File "C:/Users/Gregor/Documents/Programming/efficient-adapters/evaluation/timing/measure_inference.py", line 16, in measure_inference_gpu
model.load_adapter("sentiment/sst@example-org", cache_dir=transformers_cache_dir)
File "C:\Users\Gregor\Anaconda3\envs\adapter\lib\site-packages\transformers\adapter_model_mixin.py", line 748, in load_adapter
super().load_adapter(
File "C:\Users\Gregor\Anaconda3\envs\adapter\lib\site-packages\transformers\adapter_model_mixin.py", line 650, in load_adapter
load_dir, load_name = loader.load(adapter_name_or_path, config, version, model_name, load_as, **kwargs)
File "C:\Users\Gregor\Anaconda3\envs\adapter\lib\site-packages\transformers\adapter_model_mixin.py", line 375, in load
self.model.add_adapter(adapter_name, config["type"], config=config["config"])
File "C:\Users\Gregor\Anaconda3\envs\adapter\lib\site-packages\transformers\adapter_model_mixin.py", line 705, in add_adapter
self.base_model.add_adapter(adapter_name, adapter_type, config)
File "C:\Users\Gregor\Anaconda3\envs\adapter\lib\site-packages\transformers\adapter_bert.py", line 471, in add_adapter
self.encoder.add_adapter(adapter_name, adapter_type)
File "C:\Users\Gregor\Anaconda3\envs\adapter\lib\site-packages\transformers\adapter_bert.py", line 413, in add_adapter
layer.add_adapter(adapter_name, adapter_type)
File "C:\Users\Gregor\Anaconda3\envs\adapter\lib\site-packages\transformers\adapter_bert.py", line 389, in add_adapter
self.attention.output.add_adapter(adapter_name, adapter_type)
File "C:\Users\Gregor\Anaconda3\envs\adapter\lib\site-packages\transformers\adapter_bert.py", line 38, in add_adapter
if adapter_config and adapter_config["mh_adapter"]:
KeyError: 'mh_adapter'
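
A possible local workaround (just a sketch, assuming the adapter files are already downloaded, the path below is a placeholder, and the uppercase key is the only mismatch; the traceback suggests the inner "config" dict holds the architecture settings): lowercase the keys in adapter_config.json and then load the adapter from the local directory.

import json

config_path = "path/to/downloaded/adapter/adapter_config.json"  # placeholder path

with open(config_path) as f:
    cfg = json.load(f)

# the loader later reads cfg["config"]["mh_adapter"], so normalize the key casing
cfg["config"] = {k.lower(): v for k, v in cfg["config"].items()}

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)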

no warning when missing prediction head

๐Ÿ› Bug

Information

When training e.g. fusion with multiple heads, a warning is issued for every adapter saying that no prediction head was stored.
Since not storing a head per adapter is the desired behavior here, no warning should be issued.

Model I am using (Bert, XLNet ...):
any model

Language I am using the model on (English, Chinese ...):
any language

Adapter setup I am using (if any):

The problem arises when using:

  • the official example scripts: (give details below)
  • [x] my own modified scripts: (give details below)

The tasks I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • [x] my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. train fusion with multiple heads
  2. when storing the model, warnings are printed for every adapter

Expected behavior

Environment info

  • transformers version:
  • Platform:
  • Python version:
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?):
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

ImportError: cannot import name 'AutoModelWithHeads' from 'transformers'

Hi,
I am trying this example colab:
https://colab.research.google.com/github/Adapter-Hub/website/blob/master/app/static/notebooks/Adapter_Quickstart_Training.ipynb#scrollTo=Lbwb3NRf8mBF

and I am getting this error:

Traceback (most recent call last):
  File "test.py", line 11, in <module>
    from transformers import AutoTokenizer, EvalPrediction, GlueDataset, GlueDataTrainingArguments, AutoModelWithHeads, AdapterType
ImportError: cannot import name 'AutoModelWithHeads' from 'transformers' (/idiap/user/rkarimi/libs/anaconda3/envs/adapter/lib/python3.7/site-packages/transformers/__init__.py)

versions

(adapter) rkarimi@italix17:/idiap/user/rkarimi/dev/internship/seq2seq/adapter-transformers$ conda list | grep transformers
adapter-transformers      1.0.1                     <pip>
transformers              3.5.1                     <pip>
(adapter) rkarimi@italix17:/idiap/user/rkarimi/dev/internship/seq2seq/adapter-transformers$ conda list | grep pytorch
pytorch-lightning         1.0.4                     <pip>
adapter-transformers from GitHub is installed as well.
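
For reference, a quick way to check which installation actually provides the transformers module (adapter-transformers installs under the same transformers package name, so a separately installed plain transformers can shadow or conflict with it):

import transformers

# the reported file path should point into the adapter-transformers installation;
# if it resolves to a plain transformers install, AutoModelWithHeads will not exist
print(transformers.__file__)
print(hasattr(transformers, "AutoModelWithHeads"))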

fix adapters with `*adapter_attention*`

๐Ÿ› Bug

Old versions of the adapters initialized *adapter_attention* parameters which were never used but still stored.
I propose a two-stage fix (a sketch of the second step follows the list):

  • hot fix which does not log the warning that the parameters were not instantiated

  • remove the parameters from all adapters
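
A minimal sketch of the second step (not an official migration script; it assumes a locally stored adapter whose weights file is named pytorch_adapter.bin, as in the save logs elsewhere in this tracker):

import torch

state_dict = torch.load("pytorch_adapter.bin", map_location="cpu")
# drop the unused *adapter_attention* parameters and write the file back
cleaned = {k: v for k, v in state_dict.items() if "adapter_attention" not in k}
torch.save(cleaned, "pytorch_adapter.bin")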

Information

Model I am using (Bert, XLNet ...): e.g. RoBERTa-Base

Language I am using the model on (English, Chinese ...): English

Adapter setup I am using (if any):
many but e.g.

model = AutoModel.from_pretrained("roberta-base")
model.load_adapter("comsense/csqa@ukp", "text_task", config="{'using': 'pfeiffer'}")

The problem arises when using:

  • [x] the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The tasks I am working on is:

  • [x] an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. load the adapter, you will get a warning for parameters which are not required

Expected behavior

no warning

Environment info

  • transformers version:
  • Platform:
  • Python version:
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?):
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

large memory requirement and getting low accuracy after adding adapter layers

Hi
I am using a custom model, so I had to read the library and implement the adapter layers myself. Here is how I did it:
I defined an adapter class like below:

"""Implements an Adapter block.

Code is adapted from: https://github.com/Adapter-Hub/adapter-transformers/blob/master/\
src/transformers/adapter_modeling.py
"""
import torch.nn as nn
from .adapter_utils import Activations

class Adapter(nn.Module):
    """
    Implementation of a single Adapter block.
    """

    def __init__(self, input_size, config):
        super().__init__()
        self.input_size = input_size
        self.add_layer_norm_before = config.add_layer_norm_before
        self.add_layer_norm_after = config.add_layer_norm_after
        self.residual_before_layer_norm = config.residual_before_layer_norm

        # list for all modules of the adapter, passed into nn.Sequential()
        seq_list = []
        # If we want to have a layer norm on input, we add it to seq_list
        if self.add_layer_norm_before:
            seq_list.append(nn.LayerNorm(self.input_size))

        # if a downsample size is not passed, we just half the size of the original input
        reduction_factor = config.reduction_factor if config.reduction_factor is not None else 2
        self.down_sample_size = self.input_size//reduction_factor
        seq_list.append(nn.Linear(self.input_size, self.down_sample_size))
        self.non_linearity = Activations(config.non_linearity.lower())
        seq_list.append(self.non_linearity)

        # sequential adapter, first downproject, then non-linearity then upsample.
        # In the forward pass we include the residual connection
        self.adapter_down = nn.Sequential(*seq_list)

        # Up projection to input size
        self.adapter_up = nn.Linear(self.down_sample_size, self.input_size)

        # If we want to have a layer norm on output, we apply it later after a
        # separate residual connection. This means that we learn a new output layer norm,
        # which replaces another layer norm learned in the bert layer
        if self.add_layer_norm_after:
            self.adapter_norm_after = nn.LayerNorm(self.input_size)

    def forward(self, x): #, residual_input):
        down = self.adapter_down(x)
        up = self.adapter_up(down)
        output = up

        # todo add brief documentation what that means
        #if self.residual_before_layer_norm:
        #    output = output + residual_input

        # todo add brief documentation what that means
        if self.add_layer_norm_after:
            output = self.adapter_norm_after(output)

        # todo add brief documentation what that means
        #if not self.residual_before_layer_norm:
        #    output = output + residual_input

        return output #, down, up

then I add them between the layers of my class

class LayerFF(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.DenseReluDense = T5DenseReluDense(config)
        self.layer_norm = T5LayerNorm(config.d_model, eps=config.layer_norm_epsilon)
        self.dropout = nn.Dropout(config.dropout_rate)
        # TODO: remove it later.
        self.add_adapters = config.add_adapters
        if self.add_adapters:
            # TODO: adapter config should be a part of model config then. or do we want a separate config?
            adapter_config = AdapterConfig()
            self.adapter = Adapter(config.d_model, adapter_config)
    def forward(self, hidden_states):
        norm_x = self.layer_norm(hidden_states)
        y = self.DenseReluDense(norm_x)
        if self.add_adapters:
            y = self.adapter(y)
        layer_output = hidden_states + self.dropout(y)
        return layer_output

Then I freeze all model parameters by setting requires_grad=False and set only the adapter parameters to trainable, roughly as sketched below. I still see a large memory requirement, around the same as the full model; could you tell me if anything is missing? Thanks.
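
A minimal sketch of that freezing step (assuming the adapter modules are stored under an attribute named "adapter", as in the LayerFF class above, so their parameter names contain "adapter"):

for name, param in model.named_parameters():
    # train only the adapter parameters, freeze everything else
    param.requires_grad = "adapter" in name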

Additionally, I get accuracy close to that of an untrained model, i.e. very low. Could you give me possible suggestions to improve the accuracy? Thank you.

Example code for the Inter-Adapter attention plots in Adapter Fusion

🚀 Feature request

Example code to produce the (supercool!) Adapter Fusion inter-Adapter attention plots in figure 5 from the paper AdapterFusion: Non-Destructive Task Composition for Transfer Learning.

[Screenshot: inter-Adapter attention plot, Figure 5 of the AdapterFusion paper]

Motivation

Checking out the attention scores in the AdapterFusion module for analysis is exciting. But I didn't find it easy to create them. The challenge was accessing the relevant tensors and then creating them in the right format. Hence my request for some help ;).

Creating square attention plots from the attention tensor saved in BertFusion.recent_attention (https://github.com/Adapter-Hub/adapter-transformers/blob/master/src/transformers/adapter_modeling.py#L218): as far as I understand, this tensor is of shape [batch_size, seq_len, num_adapters], and when I average over the first two dimensions (mean(0).mean(0), which I will do for all batches in the prediction data) I get a tensor of num_adapters floats that sums to 1.
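
For concreteness, the averaging described above as a short sketch (fusion_module here is an assumed handle to the BertFusion module; recent_attention is assumed to have shape [batch_size, seq_len, num_adapters]):

attn = fusion_module.recent_attention   # saved during the last forward pass
per_adapter = attn.mean(0).mean(0)      # shape: [num_adapters], sums to 1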

Should I understand this to be the attention displayed in the above figure? But how do I get something of shape [num_adapters, num_adapters]?

Accessing the stored attention tensors from the Bert encoder during the prediction forward passes: I have been trying to trace the BertFusion module through transformers.adapter_bert to understand where this module ends up in the Bert model, and thus how I can access it from the top down. My guess from https://github.com/Adapter-Hub/adapter-transformers/blob/master/src/transformers/adapter_bert.py#L80 would be that

model.encoder.adapter_fusion_layer[adapter_fusion_name]

should give me the BertFusion module, which in turn would allow access to recent_attention after a prediction forward pass. But that does not seem to work (if I understand correctly, because model.encoder has no attribute adapter_fusion_layer).

How should I do this?

Your contribution

All that I have to contribute are the incomplete findings I shared above. But my guess is that the authors of Adapter Fusion would have some snippets lying around. I could turn those into an example snippet, in a notebook or something. Whatever you prefer!

Query regarding adapter type

Thanks for this project. I have a query which has not been discussed in the docs: when we use the run_glue.py example from this repo, which type of adapter is added?

  1. Is the type the same as Fig. 2 (right) (two feed-forward layers, down- and up-projection), or is it different?
  2. When we use the above script, do we use the AdapterFusion method by default?

Automatically determine whether we train adapters in Trainer

🚀 Feature request

Currently we need to manually pass is_training_adapter to the Trainer.
See: https://github.com/Adapter-Hub/adapter-transformers/blob/d24649cea108baa2f33c4f3ac9c040b88a43abc0/src/transformers/trainer.py#L180

I did not know about this option and wondered why my script never exported adapters. Further, this only has an effect on checkpointing, not on the training. So we might want to rename it. Or better: automatically determine whether to export adapters or the full model.
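
One rough sketch of how the automatic detection could look (purely an assumed heuristic, not existing library code): treat the run as adapter training if every trainable parameter belongs to an adapter.

def is_adapter_training(model):
    # adapter training typically freezes all base-model weights outside the adapters
    trainable = [name for name, param in model.named_parameters() if param.requires_grad]
    return len(trainable) > 0 and all("adapter" in name for name in trainable)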

Upgrade transformers from 1.0.1 to 3.x

🚀 Feature request

transformers is now at version 3.x, with cleaner data processing, improved stability, and multiple bug fixes.

Motivation

This would make it much easier to create new adapters on custom datasets which are not handled automatically by the GLUE scripts. E.g. the __call__ API of AutoTokenizer reduces the separate tokenize, pad, encode, and create-attention-masks steps to a single API call.
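
For illustration, the single-call tokenizer API referred to above (transformers 3.x):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# tokenizes, pads, truncates and builds attention masks in one call
batch = tokenizer(["first example sentence", "second example"],
                  padding=True, truncation=True, return_tensors="pt")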

Your contribution

If the developers can point out places, e.g. classes or function calls which could act as a starting point for this upgrade - I'd be happy to start a PR. I might need a little help to warm up and get comfortable with the code flow here though.

"leave_out" in an adapter config does not work if config is a dict

๐Ÿ› Bug

Information

BertEncoderAdaptersMixin.add_adapter() checks for "leave_out" with hasattr. This does not work if the config is a dict, because then "leave_out" is not an attribute.

To reproduce

adapter_config = resolve_adapter_config("pfeiffer")
adapter_config["leave_out"] = [0, 1]
#adapter_config = AdapterConfig.from_dict(adapter_config)
model.add_adapter(name, AdapterType.text_task, config=adapter_config)

If the third line stays commented out, the 0th and 1st layers will not be skipped in BertEncoderAdaptersMixin.add_adapter().
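
A sketch of a check that would cover both cases (just an illustration, not the library's actual fix):

if isinstance(config, dict):
    leave_out = config.get("leave_out", [])
else:
    leave_out = getattr(config, "leave_out", [])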

Expected behavior

A dict config should work with "leave_out", especially since resolve_adapter_config returns a dict.

Run_squad script does not parse adapter args correctly and does not save adapters

๐Ÿ› Bug

Information

Model I am using (Bert, XLNet ...):
mBERT

Language I am using the model on (English, Chinese ...):
Arabic, but the issue is language/dataset independent

Adapter setup I am using (if any):
Arabic lang adapter from adapterhub, new squad task adapter

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The tasks I am working on is:

  • an official GLUE/SQUaD task:
  • my own task or dataset: (give details below)
    Arabic Reading Comprehension Dataset (ARCD) but should be the same for any dataset

To reproduce

Steps to reproduce the behavior:

Issue of args not being parsed correctly:

  1. Try to define language adapter by providing --load_lang_adapter [adapter name]
  2. Script fails as argument parser expects it to be --load_language_adapter
  3. Try to define language adapter by providing --load_language_adapter instead
  4. Script fails as setup_task_adapter_training function of adapter_training.py expects it to be --load_lang_adapter

Issue of adapters not being saved:

  1. Run script to finetune adapters
  2. Script stores full model for each checkpoint and at the end of training

Expected behavior

I should be able to define the language adapter with the --load_lang_adapter flag and its config with the --lang_adapter_config flag. When using adapters to finetune my model, I would usually like to store the adapters, not the full model.

Environment info

  • transformers version: 2.11.0
  • Platform: macOS-10.15.6-x86_64-i386-64bit
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.5.1 (False)
  • Tensorflow version (GPU?): 2.3.0 (False)
  • Using GPU in script?: False
  • Using distributed or parallel set-up in script?: False

Should layer norm be set to trainable or not?

Hi,
I see that some implementations set layer_norm to requires_grad=True. Could you tell me whether all layer norms of the model need to be set to requires_grad=True, or whether only the ones inside the adapter layers need this?
thanks.

Trainer doesn't work with multi-gpu setup

๐Ÿ› Bug

Information

Model I am using (Bert, XLNet ...): bert-base-uncased

Language I am using the model on (English, Chinese ...): English

Adapter setup I am using (if any): Default

The problem arises when using:

  • the official example scripts: (give details below): run_glue_wh.py
  • my own modified scripts: (give details below)

The tasks I am working on is:

  • an official GLUE/SQUaD task: (give the name) MNLI
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

The HF default Trainer wraps the model in DataParallel if training_args.n_gpu > 1. As a result, the wrapped model does not expose the config attribute. When I use your modified Trainer, I get the error: DataParallel object has no attribute 'config'.
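
A minimal illustration of the wrapping behaviour described above (independent of the Trainer itself):

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
wrapped = torch.nn.DataParallel(model)

print(hasattr(wrapped, "config"))              # False: DataParallel does not forward the attribute
print(wrapped.module.config is model.config)   # True: the underlying model is reachable via .module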

Expected behavior

It should not raise the above error.

Environment info

Using the master version of this repo. As a workaround, I had to use the CUDA_VISIBLE_DEVICES flag to restrict the run to a single GPU.

Cannot load saved AdapterFusion from directory with model.load_adapter_fusion()

๐Ÿ› Bug

Information

Model I am using (Bert, XLNet ...): Bert-base

Language I am using the model on (English, Chinese ...): English

Adapter setup I am using (if any): AdapterFusion

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The tasks I am working on is:

  • an official GLUE/SQUaD task: QQP, SNLI
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Train AdapterFusion using run_fusion_glue.py, loading two pre-trained single-task adapters "qqp" and "snli"
  2. AdapterFusion weights and config (adapter_fusion_config.json, pytorch_model_adapter_fusion.bin) saved in a directory /qqp,snli
  3. When trying to load AdapterFusion with model.load_adapter_fusion("qqp,snli") I get the following error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/om304/anaconda3/lib/python3.7/site-packages/transformers/adapter_model_mixin.py", line 837, in load_adapter_fusion
    load_dir, load_name = loader.load(adapter_fusion_name_or_path, load_as)
  File "/home/om304/anaconda3/lib/python3.7/site-packages/transformers/adapter_model_mixin.py", line 485, in load
    self.model.add_fusion(adapter_fusion_name, config["config"])
  File "/home/om304/anaconda3/lib/python3.7/site-packages/transformers/adapter_model_mixin.py", line 716, in add_fusion
    self.base_model.add_fusion_layer(adapter_names)
  File "/home/om304/anaconda3/lib/python3.7/site-packages/transformers/adapter_bert.py", line 585, in add_fusion_layer
    self.encoder.add_fusion_layer(adapter_names)
  File "/home/om304/anaconda3/lib/python3.7/site-packages/transformers/adapter_bert.py", line 479, in add_fusion_layer
    layer.add_fusion_layer(adapter_names)
  File "/home/om304/anaconda3/lib/python3.7/site-packages/transformers/adapter_bert.py", line 461, in add_fusion_layer
    self.attention.output.add_fusion_layer(adapter_names)
  File "/home/om304/anaconda3/lib/python3.7/site-packages/transformers/adapter_bert.py", line 69, in add_fusion_layer
    adapter_config = self.config.adapters.common_config(adapter_names)
  File "/home/om304/anaconda3/lib/python3.7/site-packages/transformers/adapter_config.py", line 243, in common_config
    adapter_config = AdapterConfig.from_dict(adapter_config)
  File "/home/om304/anaconda3/lib/python3.7/site-packages/transformers/adapter_config.py", line 87, in from_dict
    return cls(**config)
TypeError: ABCMeta object argument after ** must be a mapping, not NoneType
 

Expected behavior

I would expect to be able to load the trained adapter-fusion from the directory to which it was saved.

Environment info

  • transformers version: 2.11.0
  • Platform: Linux-4.15.0-58-generic-x86_64-with-debian-stretch-sid
  • Python version: 3.7.4
  • PyTorch version (GPU?): 1.4.0 (True)
  • Tensorflow version (GPU?): 2.1.0 (False)
  • Using GPU in script?: True
  • Using distributed or parallel set-up in script?: False

possible bug with adapter_bert implementation

Hi
Looking into adapter_bert, inside the "adapter_stack_layer" function you first call self.get_adapter_preparams; there, for the case of the Houlsby adapter config, the residual is set to hidden_states. After this function call (line 178), both hidden_states and residual hold the same value, and both are then fed into adapter_layer(). Is feeding the same input to this layer the expected behaviour? Thanks

Merge with original transformers library

🚀 Feature request

Merge this into the original transformers library.

Motivation

This library is awesome, so thanks a lot, but it would be much more convenient to have it merged into the original transformers library. The Huggingface team seems to be focused on adding lightweight options for their models, and adapters are huge time- and memory-savers for multitask use cases, so they would be a great addition to transformers.

Your contribution

You've done the integration here already, so it should be straightforward, but I'm happy to help. I've posted an issue on Huggingface's end as well.

multi-label classification / paperswithcode dataset

Hi guys,

Hope you are all well !

I was wondering if adapter-transformers can handle multi-label classification with 1560 labels.

More precisely, I would like to apply it to the paperswithcode dataset, where the labels are called tasks.

Refs:

Thanks for any insights or inputs on that.

Cheers,
X

Adapters with heads are stored multiple times after more than one checkpoint

๐Ÿ› Bug

When checkpointing at more than one step, we seem to be storing the same adapter and head multiple times (in a loop).
I think I was able to zero in on the problem:
https://github.com/Adapter-Hub/adapter-transformers/blob/a994914cbb5290a633e0f3e1e6b7cfd7fb91ecbe/src/transformers/adapter_model_mixin.py#L729
custom_weights_loaders.append(PredictionHeadLoader(self, error_on_missing=False))
When storing the model we append a prediction head loader, so this list grows every time we save the model.
The same happens here:
https://github.com/Adapter-Hub/adapter-transformers/blob/a994914cbb5290a633e0f3e1e6b7cfd7fb91ecbe/src/transformers/adapter_model_mixin.py#L767
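
A stripped-down illustration of that pattern (not the library code, only the growth behaviour):

custom_weights_loaders = []

def save_all_adapters(save_directory):
    # appending on every call means each later save writes the head once per previous save
    custom_weights_loaders.append("PredictionHeadLoader")
    for loader in custom_weights_loaders:
        print(save_directory, "saving head via", loader)

save_all_adapters("checkpoint-2")   # head saved once
save_all_adapters("checkpoint-4")   # head saved twice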

Information

Model I am using (Bert, XLNet ...):
mBERT
Language I am using the model on (English, Chinese ...):
English
Adapter setup I am using (if any):
SST-2 in Glue script
The problem arises when using:

  • [x] the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The tasks I am working on is:

  • [x] an official GLUE/SQUaD task: SST-2
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. run GLUE on any task
  2. set --save_steps 2 --logging_steps 2

{"eval_loss": 0.33147413358775846, "eval_acc": 0.8692660550458715, "epoch": 2.2802850356294537, "step": 4800}
07/08/2020 09:55:17 - INFO - transformers.trainer - Saving model checkpoint to data_models/glue_testing/checkpoint-4800
07/08/2020 09:55:17 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/adapter_config.json
07/08/2020 09:55:17 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_adapter.bin
07/08/2020 09:55:17 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:17 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:17 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:17 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
[... the same pair of lines — "Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json" and "Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin" — is repeated over a hundred more times for this single checkpoint ...]
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/sst-2/head_config.json
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/sst-2/pytorch_model_head.bin
07/08/2020 09:55:18 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/adapter_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_adapter.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Module weights saved in data_models/glue_testing/checkpoint-4800/en/pytorch_model_head.bin
07/08/2020 09:55:19 - INFO - transformers.adapter_model_mixin - Configuration saved in data_models/glue_testing/checkpoint-4800/en/head_config.json
[... the same two log lines repeat roughly a dozen more times for this single checkpoint ...]

Expected behavior

Only store the adapter and head once per checkpoint

Environment info

  • transformers version: latest
  • Platform:
  • Python version: 3.6
  • PyTorch version (GPU?): latest
  • Tensorflow version (GPU?):
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Prediction head not loaded

๐Ÿ› Bug

Information

Model I am using (Bert, XLNet ...): RoBERTa

Language I am using the model on (English, Chinese ...): English

Adapter setup I am using (if any): Pfeiffer

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The tasks I am working on is:

  • an official GLUE/SQUaD task: STS-B
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Train a model with prediction head on STS-B
  2. Store this model with model.save_pretrained(output_dir)
  3. Load this model with the following code
config = AutoConfig.from_pretrained(load_path)
model = RobertaModelWithHeads.from_pretrained(load_path, config=config)

The config from AutoConfig (line 1) shows the head:

"prediction_heads": {
    "sts-b": {
      "activation_function": "tanh",
      "head_type": "classification",
      "layers": 2,
      "num_labels": 1
    }
  }

The config of the RobertaModelWithHeads removes the head from the config:

  "prediction_heads": {},

Apparently the weights of the head are also not loaded:

>>> [(k,v) for (k,v) in model.named_parameters() if 'bert' not in k]
[]

However, they are in the pickled checkpoint file:

>>> import torch
>>> x = torch.load(in_dir + '/pytorch_model.bin')
>>> [k for k in x.keys() if 'bert' not in k]
['heads.sts-b.1.weight', 'heads.sts-b.1.bias', 'heads.sts-b.4.weight', 'heads.sts-b.4.bias']

Expected behavior

Model also loads the prediction head of the checkpoint.

Environment info

  • transformers version: adapter-transformer pre-release version
  • Platform: macos catalina
  • Python version: 3.7.4
  • PyTorch version (GPU?): 1.5.0+cu92
  • Tensorflow version (GPU?):
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?:

is pretraining of adapter layers is necessary for performance?

Hi
I have added adapter layers to my custom model and am currently getting very low performance. Could you tell me whether adapter layers need special pretraining and, if so, how I can do this pretraining? I am defining them from scratch, freezing the model, and then training only the adapters. Thanks.
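
For context, the setup I am trying to mirror is the plain task-adapter recipe from this library, where a freshly initialized adapter is trained directly on the downstream task without any adapter pre-training (a minimal sketch; the model, adapter name, and label count are placeholders):

from transformers import AdapterType, BertModelWithHeads

model = BertModelWithHeads.from_pretrained("bert-base-uncased")

# add a new, randomly initialized task adapter -- no adapter pre-training involved
model.add_adapter("my_task", AdapterType.text_task)

# freeze all pre-trained weights and unfreeze only the adapter parameters
model.train_adapter(["my_task"])

# make sure the adapter is actually used in every forward pass
model.set_active_adapters("my_task")

# a task-specific classification head trained together with the adapter
model.add_classification_head("my_task", num_labels=2)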

moving computation inside get_adapter_preparams to the adapter forward

Hi
I see that part of the adapter-layer computation is currently implemented inside get_adapter_preparams
(https://github.com/Adapter-Hub/adapter-transformers/blob/master/src/transformers/adapter_bert.py, line 107). This is confusing; in my opinion it would be better to put all of the computation in one place, namely inside the Adapter class in https://github.com/Adapter-Hub/adapter-transformers/blob/master/src/transformers/adapter_modeling.py (around line 43), so the method is easier to follow.
Thanks.
Best
Rabeeh

Loading custom adapters and 'output_attentions' for AdapterFusion

Question

Information

Model I am using (Bert, XLNet ...): XLM-RoBERTa-base

Language I am using the model on (English, Chinese ...): Korean

Adapter setup I am using (if any):

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The tasks I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)
  • Datasets: KorNLI and KorSTS (Machine translated Korean MNLI & STS-B dataset)
  • Its format and size are the same as the original datasets (MNLI & STS-B)

Background

What I'm doing is that:

  1. train Task-Adapters for KorNLI and KorSTS on the XLM-RoBERTa-base model (to train on Korean datasets) using the official code, 'run_glue_alt.py'
  2. fuse both adapters with a fusion layer using 'run_fusion_glue.py'

Questions

Sorry that I'm not familiar with the adapter-transformers codebase.
Here are some questions about the AdapterFusion framework.

  1. Is it possible to load my own pre-trained adapters with the 'model.load_adapter' function in the current framework? (I'm using the latest version of adapter-transformers.)
  2. The performance on the target task (KorSTS) when fusing the KorSTS and KorNLI single-task adapters is markedly lower than that of the single-task adapter trained on KorSTS alone. Even with an extensive hyperparameter search (batch size, epochs, learning rate, fusion config, ...), the performance does not seem to improve. Is there a way to check whether the fusion layer is trained properly?
  3. Related to the questions above, is it possible to inspect the attention distribution of the trained fusion layer? I've seen the 'output_attentions' option defined in the BertModel class, but I could not find a way to output the attention weights of the fusion layers rather than the self-attention layers of the original pre-trained model. (A rough sketch of what I mean follows below.)
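
Regarding question 3, this is the kind of inspection I have in mind -- a rough sketch using plain PyTorch forward hooks, assuming the fusion submodules can be found by a name match (the "adapter_fusion" substring is my guess, not a documented API; model is the fused model and inputs is a tokenized batch from my setup):

import torch

fusion_outputs = {}

def make_hook(name):
    def hook(module, hook_inputs, output):
        # keep whatever the fusion module produced so it can be inspected later
        fusion_outputs[name] = output
    return hook

# attach a hook to every submodule whose name looks like a fusion layer
for name, module in model.named_modules():
    if "adapter_fusion" in name:
        module.register_forward_hook(make_hook(name))

with torch.no_grad():
    model(**inputs)  # any forward pass fills fusion_outputs as a side effect

print(list(fusion_outputs.keys()))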

Environment info

  • transformers version:
  • Platform:
  • Python version: 3.6.3
  • PyTorch version (GPU?): 1.4
  • Tensorflow version (GPU?):
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No, I'm using a single GPU

Unintuitive slowdown in data loading and model updating on using adapters

Environment info

  • transformers version: 1.0.1
  • Platform: Linux-3.10.0-1127.19.1.el7.x86_64-x86_64-with-glibc2.10
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.7.0 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Yes

Who can help:
@LysandreJik @patrickvonplaten

Model I am using: Bert

Language I am using the model on:English

Adapter setup I am using (if any): HoulsbyConfig

The problem arises when using:
My own modified scripts:
I want to use adapters for a project of mine, which will require fine-tuning BERT multiple times. To get an understanding of how much speedup I will get from using adapters, I profiled the various steps in the BERT training loop, both with and without adapters.
The tasks I am working on is:
Stanford Natural Language Inference (SNLI)

To reproduce

Steps to reproduce the behavior:
The following function is executed for a period of 4 hours on identical GPUs (via an LSF batch system), once with UseAdapter set to True and once with it set to False. The path contains a preloaded and tokenized version of the SNLI training set (as well as the test and dev sets, dropped here via underscores).

from pickle import load
from time import time

import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset
from transformers import (
    AdamW,
    AdapterType,
    BertForSequenceClassification,
    HoulsbyConfig,
    get_linear_schedule_with_warmup,
)

EPOCHS = 15

def load_and_train(path, UseAdapter):
    x_train,y_train,a_train,t_train,_,_,_,_,_,_,_,_=load(open(path,"rb"))
    train_inst=torch.tensor(x_train)
    train_att=torch.tensor(a_train)
    train_types=torch.tensor(t_train)
    train_targ=torch.tensor(y_train)
    train_data = TensorDataset(train_inst, train_att, train_types,train_targ)
    train_sampler = RandomSampler(train_data)
    train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=32)
    model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)
    if UseAdapter:
        model.add_adapter("SNLI",AdapterType.text_task,HoulsbyConfig().__dict__)
        model.train_adapter(["SNLI"])
        model.set_active_adapters(["SNLI"])
    model.cuda()
    optimizer=AdamW(model.parameters(),lr=1e-4)
    scheduler=get_linear_schedule_with_warmup(optimizer,0,len(train_dataloader)*EPOCHS)
    iter=0
    time_load=0
    time_cler=0
    time_forw=0
    time_back=0
    time_updt=0
    for e in range(15):
        model.train()
        for batch in train_dataloader:
            last=time()
            x=batch[0].cuda()
            a=batch[1].cuda()
            t=batch[2].cuda()
            y=batch[3].cuda()
            time_load+=time()-last
            last=time()
            model.zero_grad()
            time_cler+=time()-last
            last=time()
            outputs = model(x, token_type_ids=t, attention_mask=a, labels=y)
            time_forw+=time()-last
            last=time()
            loss=outputs[0]
            loss.backward()
            time_back+=time()-last
            last=time()
            optimizer.step()
            scheduler.step()
            time_updt+=time()-last
            iter+=1
            print(time_load,time_cler,time_forw,time_back,time_updt)

Expected behavior

  1. With Adapters the trainer is able to run through more batches than without by the time the job gets timed out
  2. Per Batch time_load is identical for both cases
  3. Per Batch time_cler is slightly lower with adapters due to the presence of fewer gradients
  4. Per Batch time_forw is slightly higher with adapters due to extra layers that are introduced
  5. Per Batch time_back is significantly lower with adapters since it needs to save fewer gradients
  6. Per Batch time_updt is lower with adapters due to having fewer parameters to update

Observed Behaviour

Overall times (seconds):

Adapter | Load Time   | Clear Time  | Forward Prop | Backward Prop | Update   | Total    | No. of Batches
No      | 9.141064644 | 349.405822  | 873.8870151  | 11770.82554   | 1159.772 | 14163.03 | 69022
Yes     | 2721.683394 | 394.4980106 | 1652.686945  | 3192.402303   | 6304.335 | 14265.61 | 95981

Per-batch times (seconds):

Adapter | Load Time   | Clear Time  | Forward Prop | Backward Prop | Update
No      | 0.000132437 | 0.005062238 | 0.012660992  | 0.1705373     | 0.016803
Yes     | 0.028356481 | 0.004110168 | 0.017218897  | 0.033260774   | 0.065683

As is evident from these tables, expectations 2 and 6 are not satisfied.
Note that similar observations were made in two reruns of the experiment.
It is unclear to me whether there is an explanation I am missing or whether this is an implementation issue. (A small sanity check I plan to run is sketched below.)
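
For completeness, a small sanity check I plan to run to confirm that only the adapter parameters are trainable in the adapter run, after building the model as above (plain PyTorch, nothing version-specific):

# count how many parameters actually receive gradients
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print("trainable parameters: %d of %d (%.2f%%)" % (trainable, total, 100.0 * trainable / total))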

[model.load_adapter_fusion] Cannot load pre-trained adapter fusion into model

Even though all the layers are loaded into the model, the weights do not seem to be applied at all.
Is there a version issue, or did I miss something?

model.load_adapter("/home/test/siqa_default", "text_task",config=PfeifferConfig(), with_head=False)
model.load_adapter("/home/test/a", "text_task",config=PfeifferConfig(), with_head=False)
model.load_adapter("/home/test/b", "text_task",config=PfeifferConfig(), with_head=False)
model.load_adapter("/home/test/c", "text_task",config=PfeifferConfig(), with_head=False)
model.load_adapter("/home/test/d", "text_task",config=PfeifferConfig(), with_head=False)

adapter_names = [
        [
            "siqa_default",
            "a",
            "b",
            "c",
            "d"
        ]
    ]

# pre-trained fusion_path
fusion_path = "/home/test/fusion/siqa_defaut,a,b,c,d"
model.load_adapter_fusion(fusion_path)

# test_dataset
test_dataset = (
    MultipleChoiceDataset(
        data_dir=data_args.data_dir,
        tokenizer=tokenizer,
        task=task_type,
        max_seq_length=data_args.max_seq_length,
        overwrite_cache=data_args.overwrite_cache,
        mode=Split.test,
    )
)

def compute_metrics(p: EvalPrediction) -> Dict:
    preds = np.argmax(p.predictions, axis=1)
    return {"acc": simple_accuracy(preds, p.label_ids)}

# Initialize our Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    adapter_names=adapter_names
)

Language modeling head for flexible head classes

Hi! I have many language adapters, each trained with masked language modeling on different (English) datasets.

I want to be able to load 1 BERT model, load each of the adapters, and then decide which adapter to use on any given forward pass. This would save on memory and loading time, as opposed to loading a separate BERT for each adapter. This seems possible -- here's what I'm doing:

model_name = 'bert-base-cased'
model = BertForMaskedLM.from_pretrained(model_name)
model.load_adapter(ADAPTER1)
model.load_adapter(ADAPTER2)
# In practice, I have many more adapters

output1 = model(input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask, adapter_names=[ADAPTER1])
output2 = model(input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask, adapter_names=[ADAPTER2])

However, I am getting different outputs than having one model per adapter. For example, output1 above is different from output1 below, and output2 above is different from output2 below.

model_name = 'bert-base-cased'

model1 = BertForMaskedLM.from_pretrained(model_name)
model1.load_adapter(ADAPTER1)
output1 = model1(input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask, adapter_names=[ADAPTER1])

model2 = BertForMaskedLM.from_pretrained(model_name)
model2.load_adapter(ADAPTER2)
output2 = model2(input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask, adapter_names=[ADAPTER2])

I'm trying to read https://github.com/Adapter-Hub/adapter-transformers/blob/master/src/transformers/adapter_model_mixin.py#L339 to see why this would be the case. Is this expected behavior? Am I doing something wrong?

Thanks!

Inference results are different every time

Hi,

I built my own sentiment-analysis adapter, but I found that the inference results differ between runs even though the same text is input.
Is it necessary to freeze the layers during inference, or is there something wrong with my training configuration?

Training code (similar to: https://colab.research.google.com/github/Adapter-Hub/website/blob/master/app/static/notebooks/Adapter_Quickstart_Training.ipynb#scrollTo=M6vjtq3NHtxS)

model_name = "cl-tohoku/bert-base-japanese-whole-word-masking"
# BERT model for Japanese
data_args = GlueDataTrainingArguments(task_name="sst-2", data_dir="./glue_data/yahoo_movie_reviews/")
# yahoo_movie_reviews is a dataset very similar with sst-2 but in Japanese
training_args = TrainingArguments(
    logging_steps=1000, 
    per_device_train_batch_size=32, 
    per_device_eval_batch_size=64, 
    save_steps=1000,
    evaluate_during_training=True,
    output_dir="./models/yahoo_movie_reviews",
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    do_predict=True,
    learning_rate=0.0001,
    num_train_epochs=10,
)
set_seed(training_args.seed)
num_labels = glue_tasks_num_labels[data_args.task_name]

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelWithHeads.from_pretrained(model_name)
model.add_adapter("sst-2", AdapterType.text_task)
model.train_adapter(["sst-2"])
model.add_classification_head("sst-2", num_labels=num_labels)
model.set_active_adapters([["sst-2"]])

train_dataset = GlueDataset(data_args, tokenizer=tokenizer)
eval_dataset = GlueDataset(data_args, tokenizer=tokenizer, mode="dev")

def compute_metrics(p: EvalPrediction):
    preds = np.argmax(p.predictions, axis=1)
    return glue_compute_metrics(data_args.task_name, preds, p.label_ids)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)

trainer.train()
trainer.evaluate()

Inference code is almost the same as https://colab.research.google.com/github/Adapter-Hub/website/blob/master/app/static/notebooks/Adapter_Quickstart_Inference.ipynb#scrollTo=2xwdA1sz7eZO
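
Concretely, this is how I run inference -- a minimal sketch following the quickstart notebook (the adapter_names argument and the output indexing may differ between versions, and device placement is omitted). I am not sure whether the model.eval() call is required here or whether dropout is already disabled:

import torch

model.eval()  # without this, dropout stays active and repeated runs can differ

text = "この映画はとても面白かった"  # any example review
input_ids = tokenizer.encode(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(input_ids, adapter_names=["sst-2"])
print(torch.argmax(outputs[0], dim=1))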

Thanks!
Lai

Run_ner.py by default tries to load prediction head from language adapter directory

๐Ÿ› Bug

Information

Model I am using (Bert, XLNet ...):
bert-base-multilingual-cased

Language I am using the model on (English, Chinese ...):
Finnish

Adapter setup I am using (if any):
Fi language adapter (fine-tuned the pre-trained one from AdapterHub) & NER task adapter (newly initialized and fine-tuned)

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

Using run_ner.py

The tasks I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

NER on the FiNER dataset

To reproduce

Steps to reproduce the behavior:

  1. I execute the run_ner.py script to fine-tune my adapters; the fine-tuned adapters are stored in my output_dir
  2. After fine-tuning, both adapters are stored with a pytorch_model_head.bin each
  3. I run the run_ner.py script again with do_train=False and do_evaluate=False in order to predict only, with load_lang_adapter pointing to the directory where the fine-tuned language adapter is stored and load_task_adapter pointing to where the fine-tuned task adapter is stored
  4. The script loads the prediction head weights from the language adapter's pytorch_model_head.bin, whereas the task adapter's pytorch_model_head.bin is not used
07/20/2020 14:50:17 - INFO - transformers.adapter_model_mixin -   Loading module configuration from best_model/ner/adapter_config.json
07/20/2020 14:50:17 - INFO - transformers.adapter_config -   Adding adapter 'ner' of type 'text_task'.
07/20/2020 14:50:17 - INFO - transformers.adapter_model_mixin -   Loading module weights from best_model/ner/pytorch_adapter.bin
07/20/2020 14:50:17 - INFO - transformers.adapter_model_mixin -   Loading module configuration from best_model/fi/adapter_config.json
07/20/2020 14:50:17 - INFO - transformers.adapter_config -   Adding adapter 'fi' of type 'text_lang'.
07/20/2020 14:50:17 - INFO - transformers.adapter_model_mixin -   Loading module weights from best_model/fi/pytorch_adapter.bin
07/20/2020 14:50:17 - INFO - transformers.adapter_model_mixin -   Loading module configuration from best_model/fi/head_config.json
07/20/2020 14:50:17 - INFO - transformers.adapter_model_mixin -   Loading module weights from best_model/fi/pytorch_model_head.bin

Expected behavior

My intuition is that the pytorch_model_head.bin in both the fi language adapter's and the ner task adapter's directories should be identical, since I used both during training, but it's unclear to me whether that is the case. Since this is an NER script, I would also expect the head to be loaded from the NER task adapter directory. If I wanted to use a different language adapter for Finnish now, I would have to copy pytorch_model_head.bin from the old language adapter's directory to the new one, because the script by default tries to load it from there. If it loaded the head from the NER adapter's directory instead, this would not be an issue. I'm not aware of any additional flags I can set to change this. It may not really be a bug, but it definitely caused some confusion on my side.

Environment info

  • transformers version: 2.11.0
  • Platform: Darwin-19.5.0-x86_64-i386-64bit
  • Python version: 3.7.5
  • PyTorch version (GPU?): 1.5.1 (False)
  • Tensorflow version (GPU?): 2.2.0 (False)
  • Using GPU in script?: False
  • Using distributed or parallel set-up in script?: False

Quick Start Adapter was not found

๐Ÿ› Bug

Hi, I tried to run the Quickstart Tutorial here, so I'm using that exact code. When loading the adapter with model.load_adapter('sst') I get the error mentioned below. Using "sst-2" instead of "sst" returns another warning instead.

Information

Model I am using (Bert, XLNet ...):

Language I am using the model on (English, Chinese ...):

Adapter setup I am using (if any):

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The tasks I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Run code from here

Using

# load pre-trained task adapter from Adapter Hub
# with with_head=True given, we also load a pre-trained classification head for this task
model.load_adapter('sst', config='pfeiffer', with_head=True)

# activate the adapter we just loaded, so that it is used in every forward pass
model.set_active_adapters('sst')

Returns

    raise EnvironmentError("No adapter with name '{}' was found in the adapter index.".format(specifier))
OSError: No adapter with name 'sst' was found in the adapter index.

Using:

# load pre-trained task adapter from Adapter Hub
# with with_head=True given, we also load a pre-trained classification head for this task
model.load_adapter('sst-2', config='pfeiffer', with_head=True)

# activate the adapter we just loaded, so that it is used in every forward pass
model.set_active_adapters('sst-2')

Returns

INFO:transformers.adapter_bert:No prediction head for task_name 'sst-2' available.
WARNING:transformers.adapter_bert:No prediction head is used.

Expected behavior

Environment info

  • transformers version: 2.11.0
  • Platform: Linux-4.15.0-108-generic-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.5.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: using quickstart tutorial

Supporting DistilBERT model

Hi,

I really like this project and I am wondering whether adapter-transformers could support the DistilBERT model,
since I get the following errors when I train an adapter for my own DistilBERT:

Training: https://colab.research.google.com/github/Adapter-Hub/website/blob/master/app/static/notebooks/Adapter_Quickstart_Inference.ipynb#scrollTo=2xwdA1sz7eZO

model_name = "bandainamco-mirai/distilbert-base-japanese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelWithHeads.from_pretrained(model_name)
model.add_adapter("yahoo_movie_reviews", AdapterType.text_task)
model.train_adapter(["yahoo_movie_reviews"])

Then I got:

ValueError: Unrecognized configuration class for this kind of AutoModel: AutoModelWithHeads.
Model type should be one of XLMRobertaConfig, RobertaConfig, BertConfig.

Inference: https://colab.research.google.com/github/Adapter-Hub/website/blob/master/app/static/notebooks/Adapter_Quickstart_Inference.ipynb#scrollTo=2xwdA1sz7eZO

model = AutoModelForSequenceClassification.from_pretrained("bandainamco-mirai/distilbert-base-japanese")
tokenizer = AutoTokenizer.from_pretrained("bandainamco-mirai/distilbert-base-japanese")
model.load_adapter("adapter/yahoo_movie_reviews")

Then I got:

AttributeError: 'DistilBertForSequenceClassification' object has no attribute 'load_adapter'

I wonder whether it is possible to train an adapter for DistilBERT by changing part of the existing code.

Thanks for any insights on that.

Examples for zero-shot cross-lingual transfer for Classification

🌟 New adapter setup

Add code examples for zero-shot cross-lingual transfer, e.g. for a classification problem.

Motivation

As described in the Adapter documentation, I understand that adapters can be used for zero-shot cross-lingual transfer. I am facing the problem that I want to build a classification model that will be trained on English and tested on another language such as Chinese or Spanish, so could you add an example of using adapters for this setting?

I have already gone through the available examples, but I still could not figure out how to set this up; please let me know if I missed something. (A rough sketch of the setup I have in mind follows below.)
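
To make the request concrete, this is roughly the setup I have in mind, pieced together from the documentation (a rough sketch only; the adapter identifiers, the nested set_active_adapters syntax for stacking, and the model name are assumptions on my side):

from transformers import AdapterType, AutoModelWithHeads, AutoTokenizer

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelWithHeads.from_pretrained(model_name)

# pre-trained language adapters for the source and target languages (placeholder names)
model.load_adapter("en/wiki@ukp", "text_lang", with_head=False)
model.load_adapter("zh/wiki@ukp", "text_lang", with_head=False)

# a new task adapter plus classification head, trained on English data only
model.add_adapter("my_task", AdapterType.text_task)
model.add_classification_head("my_task", num_labels=3)
model.train_adapter(["my_task"])

# training: stack the English language adapter below the task adapter
model.set_active_adapters([["en"], ["my_task"]])
# ... train with the usual Trainer on the English dataset ...

# zero-shot inference: swap in the target-language adapter, keep the same task adapter
model.set_active_adapters([["zh"], ["my_task"]])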

Thank you so much

How can I use adapters when each input has multiple [cls] tokens

Hi!

I'm trying to use adapters for a task where the input is a set of documents, each containing multiple sentences (hence multiple [CLS] tokens). The goal is to assign a binary label to each sentence of each document. In the case of fine-tuning, I would take the output of a pretrained BertModel, grab the [CLS] embeddings, and feed them through a simple classifier. But with adapters, I'm not sure how I can access the output of the base language model, process it, and feed it to a classification head. Any pointers would be greatly appreciated! (A minimal sketch of the setup I'd like to reproduce is below.)
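
To make it concrete, this is a minimal sketch of the setup I would like to reproduce with adapters instead of full fine-tuning (the way I gather the per-sentence [CLS] positions and the two-class head are just illustrations of my use case, not library API):

import torch
from torch import nn
from transformers import AdapterType, BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# adapter variant: add a task adapter, freeze the rest of the model, and activate it
model.add_adapter("sent_labeling", AdapterType.text_task)
model.train_adapter(["sent_labeling"])
model.set_active_adapters("sent_labeling")

classifier = nn.Linear(model.config.hidden_size, 2)  # one binary label per sentence

def sentence_logits(input_ids, attention_mask, cls_positions):
    # cls_positions: (batch, num_sentences) indices of every [CLS] token in the input
    hidden = model(input_ids, attention_mask=attention_mask)[0]  # (batch, seq_len, hidden)
    idx = cls_positions.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
    cls_embeddings = hidden.gather(1, idx)       # (batch, num_sentences, hidden)
    return classifier(cls_embeddings)            # (batch, num_sentences, 2)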

Hinglish Sentiment Adapter

🌟 New Adapter setup

Model and Data Description

Hinglish: a Romanized version of Hindi, immensely popular in India, where Hindi is spoken by millions of people but quite often typed in Roman script

Dataset: SemEval 2020 Task 9 Sentiment Analysis: 3 classes, +ve, -ve and neutral

Open source status

What I need help with

  • Because there were no examples other than the GLUE datasets, I ended up implementing a new HinglishDataset class and other skeleton code -- I'd appreciate a review if I got something wrong (a minimal sketch of the dataset class follows below)
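
For reference, a minimal sketch of the kind of dataset class I mean (the TSV layout, column order, and label mapping here are illustrative, not the exact SemEval files or my exact code):

import csv

import torch
from torch.utils.data import Dataset

LABEL2ID = {"negative": 0, "neutral": 1, "positive": 2}

class HinglishDataset(Dataset):
    """Sentiment examples read from a TSV file with 'text<TAB>label' rows."""

    def __init__(self, path, tokenizer, max_length=128):
        self.examples = []
        with open(path, encoding="utf-8") as f:
            for text, label in csv.reader(f, delimiter="\t"):
                features = tokenizer.encode_plus(
                    text, max_length=max_length, pad_to_max_length=True
                )
                features["labels"] = LABEL2ID[label]
                self.examples.append(features)

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        return {key: torch.tensor(value) for key, value in self.examples[i].items()}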

Next Steps

If all is well in the code above, I'd like to continue along and contribute an adapter for Hinglish under the Sentiment task.

questions on reproducing AdapterFusion paper's results

Hi there,

I need to reproduce the AdapterFusion paper's results in Table 1 and have a couple of questions:

  1. I was wondering if you could share the scripts needed to reproduce this table.
  2. The paper mentions that results are reported on the dev set; do you also tune the hyperparameters on the dev set? Could you specify how you tune them?
  3. Could you assist me in getting the "Argument" and "CB" datasets in the way you processed them? Did you download these two datasets from the HuggingFace datasets library? If so, I could not find them -- could you share their corresponding names in the datasets library?
  4. Winogrande comes in multiple sizes; could you specify which Winogrande variant is reported in the paper?

Thank you.

All the Colabs with adapter training fail due to tensors being on different devices

Hi,

I'm trying to train an adapter and I'm getting the following error:
RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)
To reproduce this, you can just run any of the colabs provided in the tutorials. For example, running the cells in this one, the training line (trainer.train()) throws this error.

Token classification: TypeError: 'NoneType' object is not subscriptable

๐Ÿ› Bug

Information

Hi, I just wanted to train an adapter with the token classification example (using the CoNLL-2003 NER dataset). I'm using the following JSON-based configuration:

{
    "data_dir": "./data_en",
    "labels": "./data_en/labels.txt",
    "model_name_or_path": "bert-large-cased",
    "output_dir": "conll2003-en-1",
    "max_seq_length": 128,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 32,
    "save_steps": 750,
    "seed": 1,
    "do_train": true,
    "do_eval": true,
    "do_predict": true,
    "fp16": true,
    "train_adapter": true,
    "adapter_config": "pfeiffer",
    "language": "en"
}

and run it with python3 run_ner.py <config>.json. Then the following error message is thrown:

Traceback (most recent call last):
  File "run_ner.py", line 323, in <module>
    main()
  File "run_ner.py", line 248, in main
    model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
  File "/mnt/adapter-transformers/src/transformers/trainer.py", line 484, in train
    tr_loss += self._training_step(model, inputs, optimizer)
  File "/mnt/adapter-transformers/src/transformers/trainer.py", line 592, in _training_step
    outputs = model(**inputs, adapter_names=self.adapter_names)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/adapter-transformers/src/transformers/modeling_bert.py", line 1463, in forward
    adapter_names=adapter_names,
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/adapter-transformers/src/transformers/modeling_bert.py", line 780, in forward
    adapter_names=adapter_names,
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/adapter-transformers/src/transformers/modeling_bert.py", line 437, in forward
    adapter_names=adapter_names,
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/adapter-transformers/src/transformers/modeling_bert.py", line 403, in forward
    layer_output = self.output(intermediate_output, attention_output, attention_mask, adapter_names=adapter_names)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/adapter-transformers/src/transformers/modeling_bert.py", line 368, in forward
    hidden_states = self.adapters_forward(hidden_states, input_tensor, attention_mask, adapter_names)
  File "/mnt/adapter-transformers/src/transformers/adapter_bert.py", line 443, in adapters_forward
    adapter_stack=adapter_stack,
  File "/mnt/adapter-transformers/src/transformers/adapter_bert.py", line 374, in adapter_stack_layer
    hidden_states, query, residual = self.get_adapter_preparams(adapter_config, hidden_states, input_tensor)
  File "/mnt/adapter-transformers/src/transformers/adapter_bert.py", line 320, in get_adapter_preparams
    if adapter_config["residual_before_ln"]:
TypeError: 'NoneType' object is not subscriptable

Do I need to provide additional options? 🤔

Many thanks in advance,

Stefan

Environment info

  • transformers version: latest from master, ee2adad
  • Platform: nvidia/cuda:10.2-cudnn7-devel
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.5.1 + GPU
  • Tensorflow version (GPU?): None
  • Using GPU in script?: Yes + fp16
  • Using distributed or parallel set-up in script?: No

Cannot load some adapters

๐Ÿ› Bug

Information

Model I am using (Bert, XLNet ...): bert-base-uncased

Language I am using the model on (English, Chinese ...): EN

Adapter setup I am using (if any):

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The tasks I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. self.bert.load_adapter("sts/qqp@ukp", "text_task", config=PfeifferConfig())
  2. self.bert.load_adapter("nli/rte@ukp", "text_task", config=PfeifferConfig())
  3. self.bert.load_adapter("nli/qnli@ukp", "text_task", config=PfeifferConfig())
  4. self.bert.load_adapter("nli/multinli@ukp", "text_task", config=PfeifferConfig())
  5. self.bert.load_adapter("lingaccept/cola@ukp", "text_task", config=PfeifferConfig())

AdapterFusion version of QQP works. Does not work for cola and multinli.

Expected behavior

Can load adapters.

Add label information to adapter

🚀 Feature request

When I download an adapter, I would like to have something like

adapter.get_labels()
adapter.get_labels_dict()

so that I do not need to find the right labels in the README, which would also reduce errors when copying them.

Maybe also

adapter.get_autoclass()

to know whether it is token classification or sequence classification

Motivation

I want adapters to be self-explanatory.
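
As a stop-gap, something along these lines is what I do by hand today -- a rough sketch assuming the downloaded adapter directory contains a head_config.json with a label mapping (the 'label2id' key and its exact location in the file are assumptions about the on-disk format, not a documented API):

import json
import os

def get_labels_dict(adapter_dir):
    """Read the label mapping stored next to a downloaded adapter, if present."""
    with open(os.path.join(adapter_dir, "head_config.json")) as f:
        head_config = json.load(f)
    # assumption: the mapping lives under config -> label2id
    return head_config.get("config", {}).get("label2id", {})

def get_labels(adapter_dir):
    label2id = get_labels_dict(adapter_dir)
    return sorted(label2id, key=label2id.get)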
