asyml / texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Home Page: https://asyml.io

License: Apache License 2.0

Python 99.66% Perl 0.23% Shell 0.06% Dockerfile 0.05%
machine-learning natural-language-processing pytorch deep-learning text-generation python machine-translation dialog-systems texar bert

texar-pytorch's People

Contributors

atif93, avinashbukkittu, codle, gpengzhi, haoransh, haoyulucas, hunterhector, huzecong, imgaojun, jennyzhang-petuum, jieralice13, mylibrar, odp, qinzzz, swapnull7, tanyuqian, tomnong, wanglec, wanglechuan-gif, weiwei718, zeyawang, zhanyuanucb, zhitinghu


texar-pytorch's Issues

BERT Classifier: Loss is 0?

Hey,

I am trying to run the BERT Classifier example with the MRPC task.

I get the following error:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2018.3.3\helpers\pydev\pydevd.py", line 1741, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm 2018.3.3\helpers\pydev\pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm 2018.3.3\helpers\pydev\pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2018.3.3\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Development/Git/texar-pytorch/examples/bert/bert_classifier_main.py", line 274, in <module>
    main()
  File "C:/Development/Git/texar-pytorch/examples/bert/bert_classifier_main.py", line 258, in main
    _train_epoch()
  File "C:/Development/Git/texar-pytorch/examples/bert/bert_classifier_main.py", line 156, in _train_epoch
    input_ids = batch["input_ids"]
  File "C:\Development\Git\texar-pytorch\texar\data\data\dataset_utils.py", line 78, in __getitem__
    return self._batch[item]
KeyError: 'input_ids'

Changing the keys, for example input_ids = batch["input_ids"] to input_ids = batch["data_input_ids"], seems to fix these errors.

However, I then ran the script like this:

python bert_classifier_main.py --do_train --do_eval

and got the following logs:

Pretrained model loaded from bert_pretrained_models/uncased_L-12_H-768_A-12\bert_model.ckpt
INFO:root:step: 50; loss: 0
INFO:root:step: 100; loss: 0
INFO:root:step: 150; loss: 0
INFO:root:step: 200; loss: 0
INFO:root:step: 250; loss: 0
INFO:root:step: 300; loss: 0
INFO:root:eval accu: 0.8578; loss: 0.3904; nsamples: 408

Is this correct?

Different output of `UnidirectionalRNNEncoder` on `texar.tf` and `texar.torch`

Currently I found the following two pieces of code have different outputs:
texar.tf:

import tensorflow as tf
import texar.tf as tx

hp = {
    'type': 'LSTMCell',
    'kwargs': {
        'num_units': 256,
        'forget_bias': 0.
    },
    'dropout': {'output_keep_prob': 1},
    'num_layers': 1
}
encoder = tx.modules.UnidirectionalRNNEncoder(hparams={"rnn_cell": hp})

inputs = tf.zeros([32, 50, 256])

sequence_length = [26, 17, 13, 25, 36, 29, 25, 34, 11, 17, 10,
                   22, 23, 24, 33, 18, 21, 17, 22, 20, 34, 22,
                   40, 50, 19, 18, 14, 22, 14, 34, 22, 28]
sequence_length = tf.convert_to_tensor(sequence_length, dtype=tf.int32)
_, states = encoder(inputs, sequence_length)
initializer = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(initializer)
    print("states[0]", sess.run(states[0]))
    print("states[1]", sess.run(states[1]))

texar.torch:

import torch
import texar.torch as tx

hp = {
    'type': 'LSTMCell',
    'kwargs': {
        'num_units': 256,
        'forget_bias': 0.
    },
    'dropout': {'output_keep_prob': 1},
    'num_layers': 1
}
encoder = tx.modules.UnidirectionalRNNEncoder(input_size=256, hparams={"rnn_cell": hp})

inputs = torch.zeros([32, 50, 256])

sequence_length = torch.Tensor([26, 17, 13, 25, 36, 29, 25, 34, 11, 17, 10,
                                22, 23, 24, 33, 18, 21, 17, 22, 20, 34, 22,
                                40, 50, 19, 18, 14, 22, 14, 34, 22, 28]).to(torch.int32)
_, states = encoder(inputs, sequence_length)
print("states[0]", states[0])
print("states[1]", states[1])

texar.tf prints

states[0] [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]

states[1] [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]

while texar.torch prints

states[0] tensor([[ 0.0346,  0.0295, -0.0203,  ..., -0.0030, -0.0020, -0.0020],
        [ 0.0345,  0.0294, -0.0203,  ..., -0.0030, -0.0020, -0.0020],
        [ 0.0345,  0.0294, -0.0202,  ..., -0.0030, -0.0021, -0.0020],
        ...,
        [ 0.0346,  0.0295, -0.0203,  ..., -0.0030, -0.0020, -0.0020],
        [ 0.0346,  0.0295, -0.0203,  ..., -0.0030, -0.0020, -0.0020],
        [ 0.0346,  0.0295, -0.0203,  ..., -0.0030, -0.0020, -0.0020]],
       grad_fn=<StackBackward>)
states[1] tensor([[ 0.0679,  0.0591, -0.0420,  ..., -0.0061, -0.0039, -0.0040],
        [ 0.0679,  0.0591, -0.0420,  ..., -0.0061, -0.0039, -0.0040],
        [ 0.0679,  0.0591, -0.0419,  ..., -0.0061, -0.0042, -0.0041],
        ...,
        [ 0.0679,  0.0591, -0.0420,  ..., -0.0061, -0.0039, -0.0040],
        [ 0.0679,  0.0591, -0.0420,  ..., -0.0061, -0.0039, -0.0040],
        [ 0.0679,  0.0591, -0.0420,  ..., -0.0061, -0.0039, -0.0040]],

The sizes of both states are the same, but the values are different. Can this be solved by changing parameters? Thanks!

Multiple inheritance with `DecoderBase`

An initialization issue occurs when we construct a class using multiple inheritance with DecoderBase. For example, we have class XLNetDecoder(XLNetEncoder, DecoderBase). When we call super().__init__(...) in XLNetDecoder, the initialization of DecoderBase cannot execute properly because of the parameter input_size in DecoderBase.__init__(): hparams is actually assigned to input_size, and hparams ends up being None in ModuleBase.

init order: XLNetDecoder -> XLNetEncoder -> XLNetBase -> DecoderBase -> ModuleBase.
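
A minimal, self-contained sketch of the problem (class names simplified; these are not the actual Texar classes):

class ModuleBase:
    def __init__(self, hparams=None):
        self.hparams = hparams

class EncoderBase(ModuleBase):
    def __init__(self, hparams=None):
        super().__init__(hparams)  # forwards its only argument positionally

class DecoderBase(ModuleBase):
    def __init__(self, input_size=None, hparams=None):
        super().__init__(hparams)
        self.input_size = input_size

class Decoder(EncoderBase, DecoderBase):
    # MRO: Decoder -> EncoderBase -> DecoderBase -> ModuleBase
    def __init__(self, hparams=None):
        super().__init__(hparams)

d = Decoder(hparams={"num_layers": 2})
# EncoderBase passes hparams positionally to the next class in the MRO,
# DecoderBase, which binds it to input_size:
print(d.input_size)  # {'num_layers': 2} -- hparams misassigned
print(d.hparams)     # None              -- lost on the way to ModuleBase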

About the Conv1DClassifier

Hello, I'm trying to reproduce an example (originally implemented with Texar TF) using texar-pytorch, but I got confused about the Conv1DClassifier.

  1. The first question is about the documentation:

(screenshot of the Conv1DClassifier documentation omitted)

However, there doesn't seem to be a hyperparameter called filter. Maybe it means out_channels?

  2. Currently the Conv1DClassifier requires inputs with a shape of [batch_size, channels, length], which means I have to manually transpose the input, because in most cases it has a shape like [batch_size, length, dim]. I understand that this requirement results from PyTorch's torch.nn.Conv1d, but I wonder if the classifier could do the transposing for me, so that its input style stays in line with other classifiers (which always use the length as the second dim).

  3. In the TensorFlow version of Texar, I can set 'other_conv_kwargs': {'padding': 'same'} to make sure the output shape is identical to the input shape, but I can't do this in texar-pytorch since torch.nn.Conv1d only accepts an integer for the padding size. Now I have 3 different kernel sizes [3, 4, 5] and my workaround is 'other_conv_kwargs': {'padding': 2}. But I wonder if I can manually set different paddings for different kernel sizes, like padding: [1, 2, 2]. It would be even better if Texar could calculate the expected paddings for me (see the sketch at the end of this issue).

  4. The Conv1DClassifier adds a default hyperparameter:

hparams.update({
    "name": "conv1d_classifier",
    "num_classes": 2,  # set to <=0 to avoid appending output layer
    "logit_layer_kwargs": {
        "in_features": hparams["out_features"],
        "bias": True
    }
})

When I set 'num_dense_layers': 0, the Linear layer below will still get the hyperparameter in_features, which is wrong because there isn't any dense layer in the Conv1DEncoder:

if self._hparams.num_dense_layers <= 0:
    self._encoder.append_layer({"type": "Flatten"})

logit_kwargs = self._hparams.logit_layer_kwargs
if logit_kwargs is None:
    logit_kwargs = {}
elif not isinstance(logit_kwargs, HParams):
    raise ValueError(
        "hparams['logit_layer_kwargs'] must be a dict.")
else:
    logit_kwargs = logit_kwargs.todict()
logit_kwargs.update({"out_features": self._num_classes})

self._encoder.append_layer({"type": "Linear",
                            "kwargs": logit_kwargs})

In fact, in this situation the correct in_features should be the number of different kernel sizes times the number of out channels. Say I have 'kernel_size': [3, 4, 5] and 'out_channels': 128; after Encoder-Flatten-Dropout, in_features should be len([3, 4, 5]) * 128 = 384.
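
For reference, a plain-Python sketch (not a texar-pytorch API) of the two calculations discussed above:

# Per-kernel 'same'-style paddings for stride-1 convolutions. This is exact
# for odd kernel sizes; for even sizes, a single symmetric integer padding
# can never reproduce TF's 'same' exactly.
kernel_sizes = [3, 4, 5]
paddings = [(k - 1) // 2 for k in kernel_sizes]  # [1, 1, 2]

# Expected in_features of the logit layer when 'num_dense_layers' is 0:
out_channels = 128
in_features = len(kernel_sizes) * out_channels   # 3 * 128 = 384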

Support dynamic batch sizes

Support dynamic batch sizes (e.g. each batch contains no more than x words), like torchtext does. Since our implementation uses PyTorch's BatchSampler, we can provide an interface for the user to supply their own BatchSampler; a sketch follows the notes below.

Note for lower-level implementation (only for those who understand how multi-processing in tx.data.DataBase works): users might need to access data stored in examples (e.g. sentence length) to decide whether to include another example in the batch. This is possible without further changes to the code base because:

  1. Samplers are executed on the main process only.
  2. _prefetch_source is called in the (non-batch) samplers before the next index is yielded. So even if lazy loading is enabled, examples will be accessible by the time the user tries to access them in BatchSampler code.
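
A minimal sketch of what such a user-supplied sampler could look like; this is not an existing texar-pytorch API, and lengths is assumed to be precomputed per example:

from typing import Iterator, List
from torch.utils.data import Sampler

class MaxTokensBatchSampler(Sampler):
    """Yields batches of indices whose total length is at most max_tokens."""
    def __init__(self, lengths: List[int], max_tokens: int):
        self.lengths = lengths
        self.max_tokens = max_tokens

    def __iter__(self) -> Iterator[List[int]]:
        batch, total = [], 0
        for idx, length in enumerate(self.lengths):
            if batch and total + length > self.max_tokens:
                yield batch
                batch, total = [], 0
            batch.append(idx)
            total += length
        if batch:
            yield batch

# usage: DataLoader(dataset, batch_sampler=MaxTokensBatchSampler(lengths, 4096))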

Add a default implementation of no-op for `DataBase.process`

In many cases the user needs to implement their own data class, e.g. when using data that they've prepared themselves. In this case, it is often unnecessary to implement process, but since we don't provide the default impl., they would still need to override process with a no-op. Would it make sense to provide no-op as the default?
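
For illustration, the no-op override users currently have to write themselves, which could become the default (import path and signature assumed):

import texar.torch as tx

class MyData(tx.data.DataBase):
    def process(self, raw_example):
        # no-op: the raw example is already in its final form
        return raw_example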

Also, the name IterDataSource might be misleading, because by design the iterator can only be iterated over once if no caching options are set. Should we change or remove it, or make the point more prominent in the docstring?

@AvinashBukkittu What do you think?

Conv1DNetwork returns tensor of incorrect shape

When using Conv1DNetwork with multiple filters and no dense layers, the returned tensor has incorrect shape.

For example:

char_embed_size = 8
char_cnn = tx.modules.Conv1DEncoder(
    in_channels=char_embed_size, hparams={
        "kernel_size": [3, 4, 5],
        "out_channels": 50,
        "num_dense_layers": 0,
        "conv_activation": tx.core.identity,
        "dropout_conv": [],
        "dropout_rate": 0.0,
    })
batch_size, seq_len = 20, 30
# (batch, in_channels, in_features)
input = torch.randn(batch_size, char_embed_size, seq_len)
output = char_cnn(input)
out_channels = 150  # out_channels is multiplied by length of kernel_size
assert output.size() == (batch_size, out_channels)  # raises AssertionError
# actual returned size is (batch_size, 50, 3)

This is because, internally, Conv1DNetwork constructs a MergeLayer that concats outputs from differently-sized kernels but does not specify the dim argument, so the default value of 2 is used. However, for convolutional layers, the channel dimension (dimension 1 in PyTorch by default) should be concat'd.
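
A small demonstration of the two concat dimensions, assuming each branch is pooled to shape (batch, out_channels, 1):

import torch

pooled = [torch.randn(20, 50, 1) for _ in range(3)]  # kernel sizes 3, 4, 5
print(torch.cat(pooled, dim=2).size())  # torch.Size([20, 50, 3])  -- current
print(torch.cat(pooled, dim=1).size())  # torch.Size([20, 150, 1]) -- intended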

The test cases failed to capture this bug because dense layers were used. This resulted in a Flatten layer being added, and the in_features of the following Linear layer is inferred by actually running an input example through the network.

BidirectionalRNNEncoder does not support multi-layer RNNs

Cells for multi-layer RNNs are created as a list of per-layer RNN cells stacked together by MultiRNNCell. The state representation for MultiRNNCell is therefore a list of states from each layer. However, _dynamic_rnn_loop only supports single-layer RNNCells, and only checks the special case of LSTMCell, where the state is a tuple of two tensors.

It is hard to extend such logic to arbitrarily nested cells (because theoretically a MultiRNNCell could contain other MultiRNNCells, although normal users don't do that). For now, we can extend the logic by checking whether the state is a list or tuple, and aggregating states over all elements in the list or tuple.
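
A sketch of the suggested extension (helper name hypothetical; namedtuple states, such as LSTM state tuples, would need extra handling):

def map_structure(fn, state):
    """Apply fn to every tensor in a possibly nested list/tuple of states."""
    if isinstance(state, (list, tuple)):
        return type(state)(map_structure(fn, s) for s in state)
    return fn(state)

# e.g. selecting the i-th batch element from every per-layer state:
# sliced = map_structure(lambda s: s[i], nested_state)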

ModuleNotFoundError: No module named 'texar.torch'

Installed texar (tf version) fine but having problems running scripts in texar-pytorch.

All scripts return this error:

ModuleNotFoundError: No module named 'texar.torch'

For example: xlnet_generation_main.py

Have installed according to instructions. Can import "texar" followed by "torch" from python but not "import texar.torch as tx" at the start of each script.

Should a path be added? Any help appreciated.

pip install -e .

Installing collected packages: numpy, idna, certifi, chardet, urllib3, requests, funcsigs, mypy-extensions, texar-pytorch
Found existing installation: texar-pytorch 0.0.1
Uninstalling texar-pytorch-0.0.1:
Successfully uninstalled texar-pytorch-0.0.1
Running setup.py develop for texar-pytorch

python3
Python 3.6.8 (default, Jan 14 2019, 11:02:34)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import texar.torch as tx
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'texar.torch'
>>> import texar
>>> import torch

gpt2 example polish

  • Port gpt2_train_main.py too

  • from texar.modules.embedders.embedders import WordEmbedder
    from texar.modules.embedders.position_embedders import PositionEmbedder
    from texar.modules.decoders.transformer_decoders import TransformerDecoder

    do not expose internal details. ---> from texar.modules import TransformerDecoder

parser.add_argument('--config_model',
                    type=str,
                    default="configs.config_model_117M",
                    help="The model configuration file to configure the "
                         "model. The config file type is define by the "
                         "'config_type',it be of texar type or json type."
                         "For '--config_type=json', set the json "
                         "config file path like: '--config_model "
                         "gpt2_pretrained_models/model_117M/hparams.json';"
                         "For '--config_type=texar', set the texar "
                         "config file like: "
                         "'--config_model configs.config_model_117M'.")

Too sparse. Reformat to something like:

parser.add_argument(
    '--config_model',
    ...

Extra head-tail empty string in data batch

Currently, in PR #116, there are possibly extra head and tail empty strings ("") when loading the PTB data (train, valid, and test) using MonoTextData and TrainTestDataIterator.

E.g., in

def forward(self, data_batch, kl_weight):

the correct data_batch["text"] would be ['<BOS>', 'abc', ..., 'cba', '<EOS>'], but the actual data is ['<BOS>', '', 'abc', ..., 'cba', '', '<EOS>']. Two extra empty strings appear after the head and before the tail, and the same happens with data_batch["text_ids"].

So far, this issue has only been found when loading the PTB data.

Generator issue in BufferShuffleSampler

Here, return <something> in a generator is equivalent to raise StopIteration(<something>), which means the function _iterator_given_size yields nothing when self.buffer_size >= size.

Here is a simple example:

def f():
    return iter([1, 2, 3])
    yield 2

print(list(f()))  # []

AttentionRNNDecoder interface refactor

Requiring input_size and encoder_output_size, especially in AttentionRNNDecoder, is too much, e.g., here:

tx.modules.AttentionRNNDecoder(
    encoder_output_size=(self.encoder.cell_fw.hidden_size +
                         self.encoder.cell_bw.hidden_size),
    input_size=(self.target_embedder.dim +
                config_model.decoder['attention']['attention_layer_size']),
    ...)

Can we refactor the interface a bit so that something like

encoder_output_size=self.encoder.output_size,
input_size=self.target_embedder.dim

is enough?

related issue: #42

@gpengzhi @huzecong @TomNong

SinusoidsPositionEmbedder does not support negative indices

SinusoidsPositionEmbedder now requires the argument position_size (i.e. maximum sequence length) in its constructor, and precomputes position embeddings for indices in the range [0, position_size). This improves efficiency during embedding lookup.

However, in certain cases the maximum sequence length cannot be known at model construction time (e.g. models supporting arbitrary-length sequences), or sinusoids at negative indices are required (e.g. TransformerXL with relative positional encoding). Neither of these cases is supported by the current implementation.

My suggestion is to add a flag cache_embeddings to the constructor. If True, follow the current precomputing scheme; if False, compute the encodings on the fly during lookup.
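
A sketch of the on-the-fly path; the exact frequency layout here is an assumption and may differ from the current implementation:

import math
import torch

def sinusoid_embeddings(positions: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal embeddings for arbitrary (even negative) integer positions.
    positions: integer tensor of any shape; dim: even embedding dimension."""
    half = dim // 2
    inv_freq = torch.exp(torch.arange(half, dtype=torch.float)
                         * (-math.log(10000.0) / (half - 1)))
    angles = positions.float().unsqueeze(-1) * inv_freq  # (..., half)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

emb = sinusoid_embeddings(torch.arange(-5, 5), dim=16)  # negative indices OK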

Documentation issues

Currently there are a plethora of issues with the documentation. In a certain sense, documentation is more important than code for end-users so we should also fix these before release.

  • Add CI test for building documentations.
  • Fix docstrings with incorrect/inconsistent Sphinx format. Examples include:
    • Unescaped markup characters: "*_keep_prob" should be "\*_keep_prob". Also make sure all docstrings are raw strings (r"""docstring""").
    • Incorrect indentation: Docstrings should follow same indentation as code (4 spaces).
    • Collapsed strings where there should be structures (e.g. lists).
       Wrong (will render as one line):
       - This is a long line that
       is wrapped.
       - Second item.
       Correct:
       - This is a long line that
         is wrapped.
       - Second item.
    • Modules, classes, methods, and attributes should use the corresponding Sphinx roles (:mod:, :class:, :meth:, :attr:).
    • String literals, code snippets, and other identifiers should be set in monospace font (``"string"``, :python:`code`, ``identifier``).
  • Fix docstrings that are incorrect. Examples include:
    • Method docstrings with arguments that no longer exist.
    • Class docstrings with attributes / __init__ arguments that no longer exist.
    • Code examples that still use TensorFlow functions, or do not conform to the current APIs should be changed accordingly.
    • External links to PyTorch documents should exist for each PyTorch class/function/module referenced.

This is tedious work for the documentation maintainer, so in the future I hope each contributor will also make sure that the docstrings they add conform to the above standards when they make a pull request.

Add "output_size" for all Texar modules

In PyTorch, users are required to know the output size of previous modules in order to create parameters for following modules that take as input the previous outputs. For certain modules, manually computing the output size could be difficult for the user (e.g. convolutional layers). It would be useful to add an output_size property to ModuleBase.
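
A toy sketch of the idea (module and attribute names hypothetical):

import torch.nn as nn

class ToyEncoder(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.linear = nn.Linear(input_size, hidden_size)

    @property
    def output_size(self) -> int:
        # the feature dimension of what forward() returns
        return self.linear.out_features

encoder = ToyEncoder(128, 256)
head = nn.Linear(encoder.output_size, 2)  # no manual size bookkeeping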

utils.pad_and_concat cannot pad torch.LongTensor list

The pad_and_concat function cannot pad a list of torch.LongTensor. For example:

>>> a = [torch.tensor([[1,2]]), torch.tensor([[3]])]
>>> a[0].dtype
torch.int64
>>> texar.utils.pad_and_concat(a, 0)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/haoransh/PycharmProjects/texar-pytorch/texar/utils/shapes.py", line 221, in pad_and_concat
    values[i] = torch.cat((v, padding), dim=pad_dim)
RuntimeError: Expected object of scalar type Long but got scalar type Float for sequence element 1 in sequence argument at position #1 'tensors'
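
A runnable illustration of the underlying problem: torch.cat refuses to mix dtypes, so the padding presumably needs to be created with the dtype (and device) of the tensors being padded:

import torch

v = torch.tensor([[1, 2]])                   # int64 (LongTensor)
bad_pad = torch.zeros(1, 1)                  # float32, triggers the error
# torch.cat((v, bad_pad), dim=1)             # RuntimeError: Long vs. Float
good_pad = torch.zeros(1, 1, dtype=v.dtype)  # match the input dtype
print(torch.cat((v, good_pad), dim=1))       # tensor([[1, 2, 0]])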

Inconsistent behavior between hparams.dataset.bos_token/vocab.bos_token and hparams.dataset.eos_token/vocab.eos_token

Currently, we set vocab.bos_token and vocab.eos_token to their default values (<BOS> and <EOS>, respectively) when bos_token and eos_token in hparams are set to the empty string. This creates inconsistencies between the two sets of variables, since in the process method we use the special tokens from hparams.

One possible use case of this happening is when the input files already contain bos and eos tokens and the user does not want any additional tokens to be added during processing.

Originally posted by @huzecong in #53

Windows not supported?

When I run the code on Windows, I get the error

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got CUDAType instead

but when I run the same code on Linux, everything is fine.

I checked that the PyTorch versions were both 1.1.0.

Write clear docs for output_size methods. In a few cases, we return the size of only one of the tensors.

Then the docstring "The final dimension(s) of :meth:`forward` output tensor(s)" should be updated. It's not exactly the size(s) of the forward outputs, but rather case-by-case? E.g., sometimes it's the dimensions of only some of the output tensors.

And, what does "final dimension(s)" mean?

Since it's case-by-case, every output_size should explicitly and clearly explain what it is in its particular class.

Originally posted by @ZhitingHu in asyml/texar#182

seq2seq_attn example polish

DataBase.to should return itself

A commonly used paradigm is tensor = tensor.to(device). Currently, DataBase.to is implemented as a no-op (it just stores the device; actual data movement happens in process and collate). To support this paradigm, it should return self.
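
A sketch of the suggested change (heavily simplified; the real class does much more):

class DataBase:
    def to(self, device):
        self.device = device  # actual data movement happens in process/collate
        return self           # enables data = data.to(device)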

Implementing XLNet with IMDB cannot run

Traceback (most recent call last):
  File "xlnet_classification_main.py", line 274, in <module>
    main(_args)
  File "xlnet_classification_main.py", line 166, in main
    iterator = tx.data.DataIterator(datasets)
  File "/home/yh/texar-pytorch/texar/torch/data/data/data_iterators.py", line 594, in __init__
    for name, dataset in datasets.items()}
  File "/home/yh/texar-pytorch/texar/torch/data/data/data_iterators.py", line 594, in <dictcomp>
    for name, dataset in datasets.items()}
  File "/home/yh/texar-pytorch/texar/torch/data/data/data_iterators.py", line 465, in __init__
    sampler = RandomSampler(dataset)
  File "/home/yh/texar-pytorch/texar/torch/data/data/data_iterators.py", line 167, in __init__
    data, replacement, num_samples)
  File "/home/yh/.local/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 66, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

I tried changing the default values of args such as replacement=True and num_samples=(some integer) in data_iterator.RandomSampler, but it doesn't work... Any advice would be appreciated!

Move the default download directory outside the package directory

The default download directory texar_download is currently under the package directory texar-pytorch. The preferred installation method is now pip install . instead of pip install -e ., so we should store those large checkpoint files outside the package directory. We could probably set the default directory to ~/texar_download (a folder under the home directory).

transformer example polish

Seq2Seq Example with GPU Support

Hello,

How can I run the Seq2Seq example with my GPU?

I already modified the training data to use the cuda device as well as the model:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 

train_data = tx.data.PairedTextData(hparams=config_data.train, device=device)
val_data = tx.data.PairedTextData(hparams=config_data.val, device=device)
test_data = tx.data.PairedTextData(hparams=config_data.test, device=device)

model = Seq2SeqAttn(train_data)
model.to(device)

Redesign module constructor interfaces

When subclassing existing modules, a common use case I've encountered is to automatically fill certain arguments of the superclass constructor based on HParams values. By design, HParams are realized in the base class (ModuleBase), where the default values are filled in and user-provided values are type-checked. However, in order to call the base class constructor, all its arguments must be filled, and these arguments may rely on default HParams values, which are not available yet.

As a result, one must explicitly realize the HParams in the derived class. But there is no way to prevent the base class from realizing the HParams again, and in cases with multiple levels of inheritance, HParams realization could happen multiple times.

For example, let's say I'm writing a CharCNN module and want to subclass texar.modules.encoders.Conv1DEncoder. The signature of the constructor for Conv1DEncoder is: __init__(self, in_channels: int, in_features: Optional[int] = None, hparams=None). I have a field named "char_embed_dim" in default_hparams, which I am using to fill the in_channels argument. To do that, I have to realize HParams.

One possible solution is to change the constructor of HParams to __init__(self, hparams, default_hparams): if hparams is already an HParams instance, we don't do anything. This way, when a subclass realizes HParams, it can pass the realized HParams into the superclass constructor, thus preventing repeated realization.
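
A simplified sketch of this solution (the real HParams also recursively type-checks values against default_hparams):

class HParams:
    def __init__(self, hparams, default_hparams):
        if isinstance(hparams, HParams):
            # Already realized by a subclass: reuse it, skipping a second
            # round of merging and type-checking.
            self._values = hparams._values
            return
        values = dict(default_hparams or {})
        values.update(hparams or {})  # type-checking omitted in this sketch
        self._values = values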

embedding_fn of GPT2Decoder

Here, _embedding_fn should be defined inside GPT2Decoder. And in Ex. Use 1), there is no need to specify decoder(..., embedding=_embedding_fn)

Add hidden state in decoder return values

As pointed out in #110, it would be useful to record hidden states in return values.
A problem with the current interface is that we only have a type annotation for State, which might include internal variables (e.g., TransformerDecoder uses Cache as its State, which includes per-layer memories). We might as well refactor the decoder interfaces.

Util bleu_moses.py does not work on Windows

Hey,

When executing, for example, the seq2seq attention example, the function corpus_bleu_moses in bleu_moses.py is used to evaluate the current model. This does not work on Windows:

Traceback (most recent call last):
  File "seq2seq_attn.py", line 190, in <module>
    main()
  File "seq2seq_attn.py", line 178, in main
    val_bleu = _eval_epoch('val')
  File "seq2seq_attn.py", line 172, in _eval_epoch
    hypotheses=hypos)
  File "c:\development\git\texar-pytorch\texar\evals\bleu_moses.py", line 154, in corpus_bleu_moses
    multi_bleu_cmd, stdin=hyp_input, stderr=subprocess.STDOUT)
  File "C:\Development\Python36\lib\subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "C:\Development\Python36\lib\subprocess.py", line 403, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Development\Python36\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Development\Python36\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
OSError: [WinError 193] %1 is not a valid Win32 application

I fixed this error by changing line 147 of bleu_moses.py to

multi_bleu_cmd = ["perl"] + [multi_bleu_path]

I am not sure if this works without "perl" on Linux. Also, I don't know whether you want to support Windows at all, but maybe you could state somewhere in the documentation that Perl is required for some features (I guess it's already pre-installed on Linux, right?).

Path in Transformer Example is wrong

Hello,

just wanted to report a small mistake in the wmt_14_en_de.sh script (Transformer Example):

lines 22+23:

DOWNLOADED_DATA_DIR="data/en_de_temp/"
OUTPUT_DIR_CACHE="${DOWNLOADED_DATA_DIR}/cache"

OUTPUT_DIR_CACHE then becomes data/en_de_temp//cache. The double slash is a problem, at least on my system.

You can easily fix this by removing one slash.

embedder_utils.get_embedding: No need to update the reference of embedding with the return value from initializers

In the current implementation of embedder_utils.get_embedding, the reference of embedding is updated with the return value of the initialization function. However, this is not necessary, because the initialization functions in PyTorch are in-place operations. Moreover, it causes errors with customized initializations such as https://github.com/asyml/texar-pytorch/blob/master/texar/custom/initializers.py#L10.
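
A short illustration of the point:

import torch
import torch.nn as nn

embedding = nn.Parameter(torch.empty(100, 64))
nn.init.xavier_uniform_(embedding)  # in-place; the return value can be ignored
# Re-assigning, as in embedding = init_fn(embedding), is unnecessary, and it
# breaks custom initializers that return None.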
