asyml / texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Home Page: https://asyml.io

License: Apache License 2.0

Python 99.66% Perl 0.23% Shell 0.06% Dockerfile 0.05%
machine-learning natural-language-processing pytorch deep-learning text-generation python machine-translation dialog-systems texar bert

texar-pytorch's People

Contributors

atif93, avinashbukkittu, codle, gpengzhi, haoransh, haoyulucas, hunterhector, huzecong, imgaojun, jennyzhang-petuum, jieralice13, mylibrar, odp, qinzzz, swapnull7, tanyuqian, tomnong, wanglec, wanglechuan-gif, weiwei718, zeyawang, zhanyuanucb, zhitinghu


texar-pytorch's Issues

BERT Classifier: Loss is 0?

Hey,

I am trying to run the BERT Classifier example with the MRPC task.

I get the following error:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2018.3.3\helpers\pydev\pydevd.py", line 1741, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm 2018.3.3\helpers\pydev\pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm 2018.3.3\helpers\pydev\pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2018.3.3\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Development/Git/texar-pytorch/examples/bert/bert_classifier_main.py", line 274, in <module>
    main()
  File "C:/Development/Git/texar-pytorch/examples/bert/bert_classifier_main.py", line 258, in main
    _train_epoch()
  File "C:/Development/Git/texar-pytorch/examples/bert/bert_classifier_main.py", line 156, in _train_epoch
    input_ids = batch["input_ids"]
  File "C:\Development\Git\texar-pytorch\texar\data\data\dataset_utils.py", line 78, in __getitem__
    return self._batch[item]
KeyError: 'input_ids'

Changing the keys, for example input_ids = batch["input_ids"] to input_ids = batch["data_input_ids"], seems to fix these errors.

However, I then ran the script like this:

python bert_classifier_main.py --do_train --do_eval

and got the following logs:

Pretrained model loaded from bert_pretrained_models/uncased_L-12_H-768_A-12\bert_model.ckpt
INFO:root:step: 50; loss: 0
INFO:root:step: 100; loss: 0
INFO:root:step: 150; loss: 0
INFO:root:step: 200; loss: 0
INFO:root:step: 250; loss: 0
INFO:root:step: 300; loss: 0
INFO:root:eval accu: 0.8578; loss: 0.3904; nsamples: 408

Is this correct?

Different output of `UnidirectionalRNNEncoder` on `texar.tf` and `texar.torch`

Currently I found the following two pieces of code have different outputs:
texar.tf:

import tensorflow as tf
import texar.tf as tx

hp = {
    'type': 'LSTMCell',
    'kwargs': {
        'num_units': 256,
        'forget_bias': 0.
    },
    'dropout': {'output_keep_prob': 1},
    'num_layers': 1
}
encoder = tx.modules.UnidirectionalRNNEncoder(hparams={"rnn_cell": hp})

inputs = tf.zeros([32, 50, 256])

sequence_length = [26, 17, 13, 25, 36, 29, 25, 34, 11, 17, 10,
                   22, 23, 24, 33, 18, 21, 17, 22, 20, 34, 22,
                   40, 50, 19, 18, 14, 22, 14, 34, 22, 28]
sequence_length = tf.convert_to_tensor(sequence_length, dtype=tf.int32)
_, states = encoder(inputs, sequence_length)
initializer = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(initializer)
    print("states[0]", sess.run(states[0]))
    print("states[1]", sess.run(states[1]))

texar.torch:

import torch
import texar.torch as tx

hp = {
    'type': 'LSTMCell',
    'kwargs': {
        'num_units': 256,
        'forget_bias': 0.
    },
    'dropout': {'output_keep_prob': 1},
    'num_layers': 1
}
encoder = tx.modules.UnidirectionalRNNEncoder(input_size=256, hparams={"rnn_cell": hp})

inputs = torch.zeros([32, 50, 256])

sequence_length = torch.Tensor([26, 17, 13, 25, 36, 29, 25, 34, 11, 17, 10,
                                22, 23, 24, 33, 18, 21, 17, 22, 20, 34, 22,
                                40, 50, 19, 18, 14, 22, 14, 34, 22, 28]).to(torch.int32)
_, states = encoder(inputs, sequence_length)
print("states[0]", states[0])
print("states[1]", states[1])

texar.tf prints

states[0] [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]

states[1] [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]

while texar.torch prints

states[0] tensor([[ 0.0346,  0.0295, -0.0203,  ..., -0.0030, -0.0020, -0.0020],
        [ 0.0345,  0.0294, -0.0203,  ..., -0.0030, -0.0020, -0.0020],
        [ 0.0345,  0.0294, -0.0202,  ..., -0.0030, -0.0021, -0.0020],
        ...,
        [ 0.0346,  0.0295, -0.0203,  ..., -0.0030, -0.0020, -0.0020],
        [ 0.0346,  0.0295, -0.0203,  ..., -0.0030, -0.0020, -0.0020],
        [ 0.0346,  0.0295, -0.0203,  ..., -0.0030, -0.0020, -0.0020]],
       grad_fn=<StackBackward>)
states[1] tensor([[ 0.0679,  0.0591, -0.0420,  ..., -0.0061, -0.0039, -0.0040],
        [ 0.0679,  0.0591, -0.0420,  ..., -0.0061, -0.0039, -0.0040],
        [ 0.0679,  0.0591, -0.0419,  ..., -0.0061, -0.0042, -0.0041],
        ...,
        [ 0.0679,  0.0591, -0.0420,  ..., -0.0061, -0.0039, -0.0040],
        [ 0.0679,  0.0591, -0.0420,  ..., -0.0061, -0.0039, -0.0040],
        [ 0.0679,  0.0591, -0.0420,  ..., -0.0061, -0.0039, -0.0040]],

The sizes of both states are the same, but the values are different. Can this be solved by changing parameters? Thanks!

Multiple inheritance with `DecoderBase`

An initialization issue occurs when we construct a class using multiple inheritance with DecoderBase. For example, we have class XLNetDecoder(XLNetEncoder, DecoderBase). When we call super().__init__(...) in XLNetDecoder, the initialization of DecoderBase cannot execute properly because of the parameter input_size in DecoderBase.__init__(): hparams is actually assigned to input_size, and hparams ends up being None in ModuleBase.

init order: XLNetDecoder -> XLNetEncoder -> XLNetBase -> DecoderBase -> ModuleBase.
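
A minimal, self-contained sketch of the problem (class names simplified; these are not the actual Texar classes):

class ModuleBase:
    def __init__(self, hparams=None):
        self.hparams = hparams

class EncoderBase(ModuleBase):
    def __init__(self, hparams=None):
        super().__init__(hparams)  # forwards its only argument positionally

class DecoderBase(ModuleBase):
    def __init__(self, input_size=None, hparams=None):
        super().__init__(hparams)
        self.input_size = input_size

class Decoder(EncoderBase, DecoderBase):
    # MRO: Decoder -> EncoderBase -> DecoderBase -> ModuleBase
    def __init__(self, hparams=None):
        super().__init__(hparams)

d = Decoder(hparams={"num_layers": 2})
# EncoderBase passes hparams positionally to the next class in the MRO,
# DecoderBase, which binds it to input_size:
print(d.input_size)  # {'num_layers': 2} -- hparams misassigned
print(d.hparams)     # None              -- lost on the way to ModuleBase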

About the Conv1DClassifier

Hello, I'm trying to reproduce an example (originally implemented with Texar TF) using texar-pytorch, but I got confused about the Conv1DClassifier.

  1. The first question is about the documentation:

(screenshot of the Conv1DClassifier documentation omitted)

However, there doesn't seem to be a hyperparameter called filter. Maybe it means out_channels?

  2. Currently the Conv1DClassifier requires inputs with a shape of [batch_size, channels, length], which means I have to manually transpose the input, because in most cases it has a shape like [batch_size, length, dim]. I understand that this requirement results from PyTorch's torch.nn.Conv1d, but I wonder if the classifier could do the transposing for me, so that its input style stays in line with other classifiers (which always use the length as the second dim).

  3. In the TensorFlow version of Texar, I can set 'other_conv_kwargs': {'padding': 'same'} to make sure the output shape is identical to the input shape, but I can't do this in texar-pytorch since torch.nn.Conv1d only accepts an integer for the padding size. Now I have 3 different kernel sizes [3, 4, 5] and my workaround is 'other_conv_kwargs': {'padding': 2}. But I wonder if I can manually set different paddings for different kernel sizes, like padding: [1, 2, 2]. It would be even better if Texar could calculate the expected paddings for me (see the sketch at the end of this issue).

  4. The Conv1DClassifier adds a default hyperparameter:

hparams.update({
    "name": "conv1d_classifier",
    "num_classes": 2,  # set to <=0 to avoid appending output layer
    "logit_layer_kwargs": {
        "in_features": hparams["out_features"],
        "bias": True
    }
})

When I set 'num_dense_layers': 0, the Linear layer below will still get the hyperparameter in_features, which is wrong because there isn't any dense layer in the Conv1DEncoder:

if self._hparams.num_dense_layers <= 0:
    self._encoder.append_layer({"type": "Flatten"})

logit_kwargs = self._hparams.logit_layer_kwargs
if logit_kwargs is None:
    logit_kwargs = {}
elif not isinstance(logit_kwargs, HParams):
    raise ValueError(
        "hparams['logit_layer_kwargs'] must be a dict.")
else:
    logit_kwargs = logit_kwargs.todict()
logit_kwargs.update({"out_features": self._num_classes})

self._encoder.append_layer({"type": "Linear",
                            "kwargs": logit_kwargs})

In fact, in this situation the correct in_features should be the number of different kernel sizes times the number of out channels. Say I have 'kernel_size': [3, 4, 5] and 'out_channels': 128; after Encoder-Flatten-Dropout, in_features should be len([3, 4, 5]) * 128 = 384.
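
For reference, a plain-Python sketch (not a texar-pytorch API) of the two calculations discussed above:

# Per-kernel 'same'-style paddings for stride-1 convolutions. This is exact
# for odd kernel sizes; for even sizes, a single symmetric integer padding
# can never reproduce TF's 'same' exactly.
kernel_sizes = [3, 4, 5]
paddings = [(k - 1) // 2 for k in kernel_sizes]  # [1, 1, 2]

# Expected in_features of the logit layer when 'num_dense_layers' is 0:
out_channels = 128
in_features = len(kernel_sizes) * out_channels   # 3 * 128 = 384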

Support dynamic batch sizes

Support dynamic batch sizes (e.g. each batch contains no more than x words), like torchtext does. Since our implementation uses PyTorch's BatchSampler, we can provide an interface for the user to supply their own BatchSampler; a sketch follows the notes below.

Note for lower-level implementation (only for those who understand how multi-processing in tx.data.DataBase works): users might need to access data stored in examples (e.g. sentence length) to decide whether to include another example in the batch. This is possible without further changes to the code base because:

  1. Samplers are executed on the main process only.
  2. _prefetch_source is called in the (non-batch) samplers before the next index is yielded. So even if lazy loading is enabled, examples will be accessible by the time the user tries to access them in BatchSampler code.
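
A minimal sketch of what such a user-supplied sampler could look like; this is not an existing texar-pytorch API, and lengths is assumed to be precomputed per example:

from typing import Iterator, List
from torch.utils.data import Sampler

class MaxTokensBatchSampler(Sampler):
    """Yields batches of indices whose total length is at most max_tokens."""
    def __init__(self, lengths: List[int], max_tokens: int):
        self.lengths = lengths
        self.max_tokens = max_tokens

    def __iter__(self) -> Iterator[List[int]]:
        batch, total = [], 0
        for idx, length in enumerate(self.lengths):
            if batch and total + length > self.max_tokens:
                yield batch
                batch, total = [], 0
            batch.append(idx)
            total += length
        if batch:
            yield batch

# usage: DataLoader(dataset, batch_sampler=MaxTokensBatchSampler(lengths, 4096))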

Add a default implementation of no-op for `DataBase.process`

In many cases the user needs to implement their own data class, e.g. when using data that they've prepared themselves. In this case, it is often unnecessary to implement process, but since we don't provide the default impl., they would still need to override process with a no-op. Would it make sense to provide no-op as the default?
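
For illustration, the no-op override users currently have to write themselves, which could become the default (import path and signature assumed):

import texar.torch as tx

class MyData(tx.data.DataBase):
    def process(self, raw_example):
        # no-op: the raw example is already in its final form
        return raw_example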

Also, the name IterDataSource might be misleading, because by design the iterator can only be iterated over once if no caching options are set. Should we change or remove it, or make the point more prominent in the docstring?

@AvinashBukkittu What do you think?

Conv1DNetwork returns tensor of incorrect shape

When using Conv1DNetwork with multiple filters and no dense layers, the returned tensor has incorrect shape.

For example:

char_embed_size = 8
char_cnn = tx.modules.Conv1DEncoder(
    in_channels=char_embed_size, hparams={
        "kernel_size": [3, 4, 5],
        "out_channels": 50,
        "num_dense_layers": 0,
        "conv_activation": tx.core.identity,
        "dropout_conv": [],
        "dropout_rate": 0.0,
    })
batch_size, seq_len = 20, 30
# (batch, in_channels, in_features)
input = torch.randn(batch_size, char_embed_size, seq_len)
output = char_cnn(input)
out_channels = 150  # out_channels is multiplied by length of kernel_size
assert output.size() == (batch_size, out_channels)  # raises AssertionError
# actual returned size is (batch_size, 50, 3)

This is because, internally, Conv1DNetwork constructs a MergeLayer that concats outputs from differently-sized kernels but does not specify the dim argument, so the default value of 2 is used. However, for convolutional layers, the channel dimension (dimension 1 in PyTorch by default) should be concat'd.
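
A small demonstration of the two concat dimensions, assuming each branch is pooled to shape (batch, out_channels, 1):

import torch

pooled = [torch.randn(20, 50, 1) for _ in range(3)]  # kernel sizes 3, 4, 5
print(torch.cat(pooled, dim=2).size())  # torch.Size([20, 50, 3])  -- current
print(torch.cat(pooled, dim=1).size())  # torch.Size([20, 150, 1]) -- intended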

The test cases failed to capture this bug because dense layers were used. This resulted in a Flatten layer being added, and the in_features of the following Linear layer is inferred by actually running an input example through the network.

BidirectionalRNNEncoder does not support multi-layer RNNs

Cells for multi-layer RNNs are created as a list of per-layer RNN cells stacked together by MultiRNNCell. The state representation for MultiRNNCell is therefore a list of states from each layer. However, _dynamic_rnn_loop only supports single-layer RNNCells, and only checks the special case of LSTMCell, where the state is a tuple of two tensors.

It is hard to extend such logic to arbitrarily nested cells (because theoretically a MultiRNNCell could contain other MultiRNNCells, although normal users don't do that). For now, we can extend the logic by checking whether the state is a list or tuple, and aggregating states over all elements in the list or tuple.
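
A sketch of the suggested extension (helper name hypothetical; namedtuple states, such as LSTM state tuples, would need extra handling):

def map_structure(fn, state):
    """Apply fn to every tensor in a possibly nested list/tuple of states."""
    if isinstance(state, (list, tuple)):
        return type(state)(map_structure(fn, s) for s in state)
    return fn(state)

# e.g. selecting the i-th batch element from every per-layer state:
# sliced = map_structure(lambda s: s[i], nested_state)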

ModuleNotFoundError: No module named 'texar.torch'

Installed texar (tf version) fine but having problems running scripts in texar-pytorch.

All scripts return this error:

ModuleNotFoundError: No module named 'texar.torch'

For example: xlnet_generation_main.py

Have installed according to instructions. Can import "texar" followed by "torch" from python but not "import texar.torch as tx" at the start of each script.

Should a path be added? Any help appreciated.

pip install -e .

Installing collected packages: numpy, idna, certifi, chardet, urllib3, requests, funcsigs, mypy-extensions, texar-pytorch
Found existing installation: texar-pytorch 0.0.1
Uninstalling texar-pytorch-0.0.1:
Successfully uninstalled texar-pytorch-0.0.1
Running setup.py develop for texar-pytorch

python3
Python 3.6.8 (default, Jan 14 2019, 11:02:34)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import texar.torch as tx
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'texar.torch'
>>> import texar
>>> import torch

gpt2 example polish

  • Port gpt2_train_main.py too

  • from texar.modules.embedders.embedders import WordEmbedder
    from texar.modules.embedders.position_embedders import PositionEmbedder
    from texar.modules.decoders.transformer_decoders import TransformerDecoder

    do not expose internal details. ---> from texar.modules import TransformerDecoder

parser.add_argument('--config_model',
                    type=str,
                    default="configs.config_model_117M",
                    help="The model configuration file to configure the "
                         "model. The config file type is define by the "
                         "'config_type',it be of texar type or json type."
                         "For '--config_type=json', set the json "
                         "config file path like: '--config_model "
                         "gpt2_pretrained_models/model_117M/hparams.json';"
                         "For '--config_type=texar', set the texar "
                         "config file like: "
                         "'--config_model configs.config_model_117M'.")

Too sparse. Reformat to something like:

parser.add_argument(
    '--config_model',
    ...

Extra head-tail empty string in data batch

Currently, in PR #116, there are possibly extra head and tail empty strings ("") when loading the PTB data (train, valid, and test) using MonoTextData and TrainTestDataIterator.

E.g., in

def forward(self, data_batch, kl_weight):

the correct data_batch["text"] would be ['<BOS>', 'abc', ..., 'cba', '<EOS>'], but the actual data is ['<BOS>', '', 'abc', ..., 'cba', '', '<EOS>']. Two extra empty strings appear after the head and before the tail, and the same happens with data_batch["text_ids"].

So far, this issue has only been found when loading the PTB data.

Generator issue in BufferShuffleSampler

Here, return <something> in a generator is equivalent to raise StopIteration(<something>), which means the function _iterator_given_size yields nothing when self.buffer_size >= size.

Here is a simple example:

def f():
    return iter([1, 2, 3])
    yield 2

print(list(f()))  # []

AttentionRNNDecoder interface refactor

Requiring input_size and encoder_output_size, especially in AttentionRNNDecoder, is too much, e.g., here:

tx.modules.AttentionRNNDecoder(
    encoder_output_size=(self.encoder.cell_fw.hidden_size +
                         self.encoder.cell_bw.hidden_size),
    input_size=(self.target_embedder.dim +
                config_model.decoder['attention']['attention_layer_size']),
    ...)

Can we refactor the interface a bit so that something like

encoder_output_size=self.encoder.output_size,
input_size=self.target_embedder.dim

is enough?

related issue: #42

@gpengzhi @huzecong @TomNong

SinusoidsPositionEmbedder does not support negative indices

SinusoidsPositionEmbedder now requires the argument position_size (i.e. maximum sequence length) in its constructor, and precomputes position embeddings for indices in the range [0, position_size). This improves efficiency during embedding lookup.

However, in certain cases the maximum sequence length cannot be known at model construction time (e.g. models supporting arbitrary-length sequences), or sinusoids at negative indices are required (e.g. TransformerXL with relative positional encoding). Neither of these cases is supported by the current implementation.

My suggestion is to add a flag cache_embeddings to the constructor. If True, follow the current precomputing scheme; if False, compute the encodings on the fly during lookup.
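
A sketch of the on-the-fly path; the exact frequency layout here is an assumption and may differ from the current implementation:

import math
import torch

def sinusoid_embeddings(positions: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal embeddings for arbitrary (even negative) integer positions.
    positions: integer tensor of any shape; dim: even embedding dimension."""
    half = dim // 2
    inv_freq = torch.exp(torch.arange(half, dtype=torch.float)
                         * (-math.log(10000.0) / (half - 1)))
    angles = positions.float().unsqueeze(-1) * inv_freq  # (..., half)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

emb = sinusoid_embeddings(torch.arange(-5, 5), dim=16)  # negative indices OK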

Documentation issues

Currently there are a plethora of issues with the documentation. In a certain sense, documentation is more important than code for end-users so we should also fix these before release.

  • Add CI test for building documentations.
  • Fix docstrings with incorrect/inconsistent Sphinx format. Examples include:
    • Unescaped markup characters: "*_keep_prob" should be "\*_keep_prob". Also make sure all docstrings are raw strings (r"""docstring""").
    • Incorrect indentation: Docstrings should follow same indentation as code (4 spaces).
    • Collapsed strings where there should be structures (e.g. lists).
       Wrong (will render as one line):
       - This is a long line that
       is wrapped.
       - Second item.
       Correct:
       - This is a long line that
         is wrapped.
       - Second item.
    • Modules, classes, methods, and attributes should use the corresponding Sphinx roles (:mod:, :class:, :meth:, :attr:).
    • String literals, code snippets, and other identifiers should be set in monospace font (``"string"``, :python:`code`, ``identifier``).
  • Fix docstrings that are incorrect. Examples include:
    • Method docstrings with arguments that no longer exist.
    • Class docstrings with attributes / __init__ arguments that no longer exist.
    • Code examples that still use TensorFlow functions, or do not conform to the current APIs should be changed accordingly.
    • External links to PyTorch documents should exist for each PyTorch class/function/module referenced.

This is tedious work for the documentation maintainer, so in the future I hope each contributor will also make sure that the docstrings they add conform to the above standards when they make a pull request.

Add "output_size" for all Texar modules

In PyTorch, users are required to know the output size of previous modules in order to create parameters for following modules that take as input the previous outputs. For certain modules, manually computing the output size could be difficult for the user (e.g. convolutional layers). It would be useful to add an output_size property to ModuleBase.
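
A toy sketch of the idea (module and attribute names hypothetical):

import torch.nn as nn

class ToyEncoder(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.linear = nn.Linear(input_size, hidden_size)

    @property
    def output_size(self) -> int:
        # the feature dimension of what forward() returns
        return self.linear.out_features

encoder = ToyEncoder(128, 256)
head = nn.Linear(encoder.output_size, 2)  # no manual size bookkeeping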

utils.pad_and_concat cannot pad torch.LongTensor list

The pad_and_concat function cannot pad a list of torch.LongTensor. For example:

>>> a = [torch.tensor([[1,2]]), torch.tensor([[3]])]
>>> a[0].dtype
torch.int64
>>> texar.utils.pad_and_concat(a, 0)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/haoransh/PycharmProjects/texar-pytorch/texar/utils/shapes.py", line 221, in pad_and_concat
    values[i] = torch.cat((v, padding), dim=pad_dim)
RuntimeError: Expected object of scalar type Long but got scalar type Float for sequence element 1 in sequence argument at position #1 'tensors'
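
A runnable illustration of the underlying problem: torch.cat refuses to mix dtypes, so the padding presumably needs to be created with the dtype (and device) of the tensors being padded:

import torch

v = torch.tensor([[1, 2]])                   # int64 (LongTensor)
bad_pad = torch.zeros(1, 1)                  # float32, triggers the error
# torch.cat((v, bad_pad), dim=1)             # RuntimeError: Long vs. Float
good_pad = torch.zeros(1, 1, dtype=v.dtype)  # match the input dtype
print(torch.cat((v, good_pad), dim=1))       # tensor([[1, 2, 0]])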

Inconsistent behavior between hparams.dataset.bos_token/vocab.bos_token and hparams.dataset.eos_token/vocab.eos_token

Currently, we set vocab.bos_token and vocab.eos_token to their default values (<BOS> and <EOS>, respectively) when bos_token and eos_token in hparams are set to the empty string. This creates inconsistencies between the two sets of variables, since in the process method we use the special tokens from hparams.

One possible use case of this happening is when the input files already contain bos and eos tokens and the user does not want any additional tokens to be added during processing.

Originally posted by @huzecong in #53

Windows not supported?

When I run the code on Windows, I get the error

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got CUDAType instead

but when I run the same code on Linux, everything is fine.

I checked that the PyTorch versions were both 1.1.0.

Write clear docs for output_size methods. In a few cases, we return the size of only one of the tensors.

Then the docstring "The final dimension(s) of :meth:`forward` output tensor(s)" should be updated. It's not exactly the size(s) of the forward outputs, but rather case-by-case? E.g., sometimes it's the dimensions of only some of the output tensors.

And, what does "final dimension(s)" mean?

Since it's case-by-case, every output_size should explicitly and clearly explain what it is in its particular class.

Originally posted by @ZhitingHu in asyml/texar#182

seq2seq_attn example polish

DataBase.to should return itself

A commonly used paradigm is tensor = tensor.to(device). Currently, DataBase.to is implemented as a no-op (it just stores the device; actual data movement happens in process and collate). To support this paradigm, it should return self.
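
A sketch of the suggested change (heavily simplified; the real class does much more):

class DataBase:
    def to(self, device):
        self.device = device  # actual data movement happens in process/collate
        return self           # enables data = data.to(device)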

Implementing XLNet with IMDB cannot run

Traceback (most recent call last):
  File "xlnet_classification_main.py", line 274, in <module>
    main(_args)
  File "xlnet_classification_main.py", line 166, in main
    iterator = tx.data.DataIterator(datasets)
  File "/home/yh/texar-pytorch/texar/torch/data/data/data_iterators.py", line 594, in __init__
    for name, dataset in datasets.items()}
  File "/home/yh/texar-pytorch/texar/torch/data/data/data_iterators.py", line 594, in <dictcomp>
    for name, dataset in datasets.items()}
  File "/home/yh/texar-pytorch/texar/torch/data/data/data_iterators.py", line 465, in __init__
    sampler = RandomSampler(dataset)
  File "/home/yh/texar-pytorch/texar/torch/data/data/data_iterators.py", line 167, in __init__
    data, replacement, num_samples)
  File "/home/yh/.local/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 66, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

I tried changing the default values of args such as replacement=True and num_samples=(some integer) in data_iterator.RandomSampler, but it doesn't work... Any advice would be appreciated!

Move the default download directory outside the package directory

The default download directory texar_download is currently under the package directory texar-pytorch. The preferred installation method is now pip install . instead of pip install -e ., so we should store those large checkpoint files outside the package directory. We could probably set the default directory to ~/texar_download (a folder under the home directory).

transformer example polish

Seq2Seq Example with GPU Support

Hello,

How can I run the Seq2Seq example with my GPU?

I already modified the training data to use the cuda device as well as the model:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 

train_data = tx.data.PairedTextData(hparams=config_data.train, device=device)
val_data = tx.data.PairedTextData(hparams=config_data.val, device=device)
test_data = tx.data.PairedTextData(hparams=config_data.test, device=device)

model = Seq2SeqAttn(train_data)
model.to(device)

Redesign module constructor interfaces

When subclassing existing modules, a common use case I've encountered is to automatically fill certain arguments of the superclass constructor based on HParams values. By design, HParams are realized in the base class (ModuleBase), where the default values are filled in and user-provided values are type-checked. However, in order to call the base class constructor, all its arguments must be filled, and these arguments may rely on default HParams values, which are not available yet.

As a result, one must explicitly realize the HParams in the derived class. But there is no way to prevent the base class from realizing the HParams again, and in cases with multiple levels of inheritance, HParams realization could happen multiple times.

For example, let's say I'm writing a CharCNN module and want to subclass texar.modules.encoders.Conv1DEncoder. The signature of the constructor for Conv1DEncoder is: __init__(self, in_channels: int, in_features: Optional[int] = None, hparams=None). I have a field named "char_embed_dim" in default_hparams, which I am using to fill the in_channels argument. To do that, I have to realize HParams.

One possible solution is to change the constructor of HParams to __init__(self, hparams, default_hparams): if hparams is already an HParams instance, we don't do anything. This way, when a subclass realizes HParams, it can pass the realized HParams into the superclass constructor, thus preventing repeated realization.
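
A simplified sketch of this solution (the real HParams also recursively type-checks values against default_hparams):

class HParams:
    def __init__(self, hparams, default_hparams):
        if isinstance(hparams, HParams):
            # Already realized by a subclass: reuse it, skipping a second
            # round of merging and type-checking.
            self._values = hparams._values
            return
        values = dict(default_hparams or {})
        values.update(hparams or {})  # type-checking omitted in this sketch
        self._values = values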

embedding_fn of GPT2Decoder

Here, _embedding_fn should be defined inside GPT2Decoder. And in Ex. Use 1), there is no need to specify decoder(..., embedding=_embedding_fn)

Add hidden state in decoder return values

As pointed out in #110, it would be useful to record hidden states in return values.
A problem with the current interface is that we only have a type annotation for State, which might include internal variables (e.g., TransformerDecoder uses Cache as its State, which includes per-layer memories). We might as well refactor the decoder interfaces.

Util bleu_moses.py does not work on Windows

Hey,

When executing, for example, the seq2seq attention example, the function corpus_bleu_moses in bleu_moses.py is used to evaluate the current model. This does not work on Windows:

Traceback (most recent call last):
  File "seq2seq_attn.py", line 190, in <module>
    main()
  File "seq2seq_attn.py", line 178, in main
    val_bleu = _eval_epoch('val')
  File "seq2seq_attn.py", line 172, in _eval_epoch
    hypotheses=hypos)
  File "c:\development\git\texar-pytorch\texar\evals\bleu_moses.py", line 154, in corpus_bleu_moses
    multi_bleu_cmd, stdin=hyp_input, stderr=subprocess.STDOUT)
  File "C:\Development\Python36\lib\subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "C:\Development\Python36\lib\subprocess.py", line 403, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Development\Python36\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Development\Python36\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
OSError: [WinError 193] %1 is not a valid Win32 application

I fixed this error by changing line 147 of bleu_moses.py to

multi_bleu_cmd = ["perl"] + [multi_bleu_path]

I am not sure if this works without "perl" on Linux. Also, I don't know whether you want to support Windows at all, but maybe you could state somewhere in the documentation that Perl is required for some features (I guess it's already pre-installed on Linux, right?).

Path in Transformer Example is wrong

Hello,

just wanted to report a small mistake in the wmt_14_en_de.sh script (Transformer Example):

lines 22+23:

DOWNLOADED_DATA_DIR="data/en_de_temp/"
OUTPUT_DIR_CACHE="${DOWNLOADED_DATA_DIR}/cache"

OUTPUT_DIR_CACHE then becomes data/en_de_temp//cache. The double slash is a problem, at least on my system.

You can easily fix this by removing one slash.

embedder_utils.get_embedding: No need to update the reference of embedding with the return value from initializers

In the current implementation of embedder_utils.get_embedding, the reference of embedding is updated with the return value of the initialization function. However, this is not necessary, because the initialization functions in PyTorch are in-place operations. Moreover, it causes errors with customized initializations such as https://github.com/asyml/texar-pytorch/blob/master/texar/custom/initializers.py#L10.
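
A short illustration of the point:

import torch
import torch.nn as nn

embedding = nn.Parameter(torch.empty(100, 64))
nn.init.xavier_uniform_(embedding)  # in-place; the return value can be ignored
# Re-assigning, as in embedding = init_fn(embedding), is unnecessary, and it
# breaks custom initializers that return None.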
