I'm trying to run this on a cluster with N_GPUS=3 in

Detecting multiple GPUs, about lvapeab/nmt-keras

Comments (9)

lvapeab commented on May 24, 2024

It looks like TensorFlow doesn't detect any GPUs. Is tensorflow-gpu installed? Is the environment variable CUDA_VISIBLE_DEVICES properly set?
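
Both questions can be checked from Python. The following is a minimal sketch, not part of nmt-keras: the helper names visible_gpu_ids and tensorflow_gpu_names are hypothetical. It assumes that device_lib.list_local_devices is importable from the installed TensorFlow, and it returns None when TensorFlow is missing.

```python
import os


def visible_gpu_ids(env=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES, or None if it is unset."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # unset: CUDA exposes every GPU on the machine
    # An empty string yields an empty list, which hides all GPUs.
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


def tensorflow_gpu_names():
    """Return the GPU device names TensorFlow reports, or None if it is not installed."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    # Note: this call initializes the GPUs, so it can take a few seconds.
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]


print("CUDA_VISIBLE_DEVICES ->", visible_gpu_ids())
```

If visible_gpu_ids() lists the expected ids but tensorflow_gpu_names() returns an empty list, the installed TensorFlow build or its CUDA libraries are the more likely culprit than the environment variable.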

from nmt-keras.

VP007-py commented on May 24, 2024

echo $CUDA_VISIBLE_DEVICES gives me 0,1,2, and after installing tensorflow-gpu I get:

Using TensorFlow backend.
2020-08-11 20:15:00.601205: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
[11/08/2020 20:15:07] Running training.
[11/08/2020 20:15:07] Building mydata55_hien dataset
[11/08/2020 20:15:08] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:08] Creating vocabulary for data with data_id 'target_text'.
[11/08/2020 20:15:08] 	 Total: 34428 unique words in 80000 sentences with a total of 401199 words.
[11/08/2020 20:15:08] Creating dictionary of all words
[11/08/2020 20:15:08] Loaded "train" set outputs of data_type "text-features" with data_id "target_text" and length 80000.
[11/08/2020 20:15:08] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:08] Loaded "val" set outputs of data_type "text" with data_id "target_text" and length 1800.
[11/08/2020 20:15:08] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:08] Loaded "test" set outputs of data_type "text" with data_id "target_text" and length 1000.
[11/08/2020 20:15:09] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:09] Creating vocabulary for data with data_id 'source_text'.
[11/08/2020 20:15:09] 	 Total: 42963 unique words in 80000 sentences with a total of 466800 words.
[11/08/2020 20:15:09] Creating dictionary of all words
[11/08/2020 20:15:10] Loaded "train" set inputs of data_type "text-features" with data_id "source_text" and length 80000.
[11/08/2020 20:15:10] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:10] Creating vocabulary for data with data_id 'state_below'.
[11/08/2020 20:15:10] 	 Total: 34428 unique words in 80000 sentences with a total of 401199 words.
[11/08/2020 20:15:10] Creating dictionary of all words
[11/08/2020 20:15:11] Loaded "train" set inputs of data_type "text-features" with data_id "state_below" and length 80000.
[11/08/2020 20:15:11] Loaded "train" set inputs of type "file-name" with id "raw_source_text".
[11/08/2020 20:15:11] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:11] Loaded "val" set inputs of data_type "text-features" with data_id "source_text" and length 1800.
[11/08/2020 20:15:11] Loaded "val" set inputs of data_type "ghost" with data_id "state_below" and length 1800.
[11/08/2020 20:15:11] Loaded "val" set inputs of type "file-name" with id "raw_source_text".
[11/08/2020 20:15:11] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:11] Loaded "test" set inputs of data_type "text-features" with data_id "source_text" and length 1000.
[11/08/2020 20:15:11] Loaded "test" set inputs of data_type "ghost" with data_id "state_below" and length 1000.
[11/08/2020 20:15:11] Loaded "test" set inputs of type "file-name" with id "raw_source_text".
[11/08/2020 20:15:11] Keeping 1 captions per input on the val set.
[11/08/2020 20:15:11] Samples reduced to 1800 in val set.
[11/08/2020 20:15:11] <<< Saving Dataset instance to datasets/Dataset_mydata55_hien.pkl ... >>>
[11/08/2020 20:15:12] <<< Dataset instance saved >>>
[11/08/2020 20:15:12] <<< Building AttentionRNNEncoderDecoder Translation_Model >>>
Traceback (most recent call last):
  File "main.py", line 51, in <module>
    train_model(parameters, args.dataset)
  File "/home/pandramish.vinay/nmt-keras/nmt_keras/training.py", line 93, in train_model
    clear_dirs=clear_dirs)
  File "/home/pandramish.vinay/nmt-keras/nmt_keras/model_zoo.py", line 155, in __init__
    eval('self.' + model_type + '(params)')
  File "<string>", line 1, in <module>
  File "/home/pandramish.vinay/nmt-keras/nmt_keras/model_zoo.py", line 457, in AttentionRNNEncoderDecoder
    src_text = Input(name=self.ids_inputs[0], batch_shape=tuple([None, None]), dtype='int32')
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/engine/input_layer.py", line 178, in Input
    input_tensor=tensor)
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/engine/input_layer.py", line 87, in __init__
    name=self.name)
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 650, in placeholder
    x = tf.placeholder(dtype, shape=shape, name=name)
AttributeError: module 'tensorflow' has no attribute 'placeholder'

lvapeab commented on May 24, 2024

I think you installed TensorFlow 2.x. You should install 1.x (e.g. pip install tensorflow-gpu==1.15).
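
The AttributeError above is what TensorFlow 2.x produces here, because tf.placeholder was moved to tf.compat.v1.placeholder. A small guard can catch the mismatch before the model is built. This is a sketch only: the helper names are hypothetical, and the 1.x requirement is taken from the advice above, not from any nmt-keras API.

```python
def tf_major_version(version):
    """Extract the major version number from a string such as '1.15.0'."""
    return int(version.split(".")[0])


def supports_tf1_graph_api(version):
    """True if this TensorFlow version still exposes tf.placeholder at the top level."""
    return tf_major_version(version) < 2


# Example version strings; in practice pass tensorflow.__version__ instead.
for v in ("1.15.0", "2.3.0"):
    status = "OK" if supports_tf1_graph_api(v) else "too new, install a 1.x release"
    print(v, "->", status)
```

Calling supports_tf1_graph_api(tensorflow.__version__) at the top of a launch script would fail fast with a clear message instead of the traceback shown earlier.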

VP007-py commented on May 24, 2024

This step resolves the error, but now I'm running out of GPU memory, as shown below:

python3 main.py 
Using TensorFlow backend.
[11/08/2020 20:50:13] Running training.
[11/08/2020 20:50:13] Building mydata55_hien dataset
[11/08/2020 20:50:13] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:14] Creating vocabulary for data with data_id 'target_text'.
[11/08/2020 20:50:14] 	 Total: 34428 unique words in 80000 sentences with a total of 401199 words.
[11/08/2020 20:50:14] Creating dictionary of all words
[11/08/2020 20:50:14] Loaded "train" set outputs of data_type "text-features" with data_id "target_text" and length 80000.
[11/08/2020 20:50:14] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:14] Loaded "val" set outputs of data_type "text" with data_id "target_text" and length 1800.
[11/08/2020 20:50:14] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:14] Loaded "test" set outputs of data_type "text" with data_id "target_text" and length 1000.
[11/08/2020 20:50:15] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:15] Creating vocabulary for data with data_id 'source_text'.
[11/08/2020 20:50:15] 	 Total: 42963 unique words in 80000 sentences with a total of 466800 words.
[11/08/2020 20:50:15] Creating dictionary of all words
[11/08/2020 20:50:15] Loaded "train" set inputs of data_type "text-features" with data_id "source_text" and length 80000.
[11/08/2020 20:50:16] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:16] Creating vocabulary for data with data_id 'state_below'.
[11/08/2020 20:50:16] 	 Total: 34428 unique words in 80000 sentences with a total of 401199 words.
[11/08/2020 20:50:16] Creating dictionary of all words
[11/08/2020 20:50:17] Loaded "train" set inputs of data_type "text-features" with data_id "state_below" and length 80000.
[11/08/2020 20:50:17] Loaded "train" set inputs of type "file-name" with id "raw_source_text".
[11/08/2020 20:50:17] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:17] Loaded "val" set inputs of data_type "text-features" with data_id "source_text" and length 1800.
[11/08/2020 20:50:17] Loaded "val" set inputs of data_type "ghost" with data_id "state_below" and length 1800.
[11/08/2020 20:50:17] Loaded "val" set inputs of type "file-name" with id "raw_source_text".
[11/08/2020 20:50:17] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:17] Loaded "test" set inputs of data_type "text-features" with data_id "source_text" and length 1000.
[11/08/2020 20:50:17] Loaded "test" set inputs of data_type "ghost" with data_id "state_below" and length 1000.
[11/08/2020 20:50:17] Loaded "test" set inputs of type "file-name" with id "raw_source_text".
[11/08/2020 20:50:17] Keeping 1 captions per input on the val set.
[11/08/2020 20:50:17] Samples reduced to 1800 in val set.
[11/08/2020 20:50:17] <<< Saving Dataset instance to datasets/Dataset_mydata55_hien.pkl ... >>>
[11/08/2020 20:50:18] <<< Dataset instance saved >>>
[11/08/2020 20:50:18] <<< Building AttentionRNNEncoderDecoder Translation_Model >>>
[11/08/2020 20:50:18] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:650: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

[11/08/2020 20:50:18] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:4786: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

[11/08/2020 20:50:18] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:157: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

[11/08/2020 20:50:24] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:3561: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
[11/08/2020 20:50:26] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:292: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

[11/08/2020 20:50:26] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:299: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

[11/08/2020 20:50:26] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:308: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2020-08-11 20:50:26.283764: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-11 20:50:26.311783: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2399890000 Hz
2020-08-11 20:50:26.313223: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x9548a50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-11 20:50:26.313241: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-11 20:50:26.314794: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-08-11 20:50:27.979868: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x9618b90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-11 20:50:27.979910: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-08-11 20:50:27.979920: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-08-11 20:50:27.979927: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-08-11 20:50:27.982474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
2020-08-11 20:50:27.983341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:03:00.0
2020-08-11 20:50:27.985994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 2 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:83:00.0
2020-08-11 20:50:27.991316: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-08-11 20:50:28.058570: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-08-11 20:50:28.091144: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-08-11 20:50:28.102065: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-08-11 20:50:28.178217: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-08-11 20:50:28.227447: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-08-11 20:50:28.227515: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-11 20:50:28.233741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1, 2
2020-08-11 20:50:28.233800: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-08-11 20:50:28.238665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-11 20:50:28.238686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 1 2 
2020-08-11 20:50:28.238712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N Y N 
2020-08-11 20:50:28.238721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1:   Y N N 
2020-08-11 20:50:28.238732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 2:   N N N 
2020-08-11 20:50:28.243290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10481 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
2020-08-11 20:50:28.244710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10481 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2020-08-11 20:50:28.245980: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10481 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:83:00.0, compute capability: 6.1)
[11/08/2020 20:50:28] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:312: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

[11/08/2020 20:50:28] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:321: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

[11/08/2020 20:50:28] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:328: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

-----------------------------------------------------------------------------------
		TranslationModel instance
-----------------------------------------------------------------------------------
_model_type: AttentionRNNEncoderDecoder
name: mydata55_hien_AttentionRNNEncoderDecoder_src_emb_500_bidir_True_enc_LSTM_756_dec_ConditionalLSTM_756_deepout_linear_trg_emb_500_Adam_0.001
model_path: /scratch/trained_models/mydata55_hien_AttentionRNNEncoderDecoder_src_emb_500_bidir_True_enc_LSTM_756_dec_ConditionalLSTM_756_deepout_linear_trg_emb_500_Adam_0.001/
verbose: 1

Params:
	ACCUMULATE_GRADIENTS: 1
	ADDITIONAL_OUTPUT_MERGE_MODE: Add
	ALIGN_FROM_RAW: True
	ALPHA_FACTOR: 0.6
	AMSGRAD: False
	APPLY_DETOKENIZATION: False
	ATTENTION_DROPOUT_P: 0.0
	ATTENTION_MODE: add
	ATTENTION_SIZE: 756
	BATCH_NORMALIZATION_MODE: 1
	BATCH_SIZE: 412
	BEAM_SEARCH: True
	BEAM_SIZE: 3
	BETA_1: 0.9
	BETA_2: 0.999
	BIDIRECTIONAL_DEEP_ENCODER: True
	BIDIRECTIONAL_ENCODER: True
	BIDIRECTIONAL_MERGE_MODE: concat
	BPE_CODES_PATH: examples/mydata55//training_codes.joint
	CLASSIFIER_ACTIVATION: softmax
	CLIP_C: 5.0
	CLIP_V: 0.0
	COVERAGE_NORM_FACTOR: 0.2
	COVERAGE_PENALTY: False
	DATASET_NAME: mydata55
	DATASET_STORE_PATH: datasets/
	DATA_AUGMENTATION: False
	DATA_ROOT_PATH: examples/mydata55/
	DECODER_HIDDEN_SIZE: 756
	DECODER_RNN_TYPE: ConditionalLSTM
	DEEP_OUTPUT_LAYERS: [('linear', 500)]
	DETOKENIZATION_METHOD: detokenize_none
	DOUBLE_STOCHASTIC_ATTENTION_REG: 0.0
	DROPOUT_P: 0.0
	EARLY_STOP: True
	EMBEDDINGS_FREQ: 1
	ENCODER_HIDDEN_SIZE: 756
	ENCODER_RNN_TYPE: LSTM
	EPOCHS_FOR_SAVE: 1
	EPSILON: 1e-08
	EVAL_EACH: 1
	EVAL_EACH_EPOCHS: True
	EVAL_ON_SETS: ['val']
	EXTRA_NAME: 
	FF_SIZE: 128
	FILL: end
	FORCE_RELOAD_VOCABULARY: False
	GLOSSARY: None
	GRU_RESET_AFTER: True
	HEURISTIC: 0
	HOMOGENEOUS_BATCHES: False
	INIT_ATT: glorot_uniform
	INIT_FUNCTION: glorot_uniform
	INIT_LAYERS: ['tanh']
	INNER_INIT: orthogonal
	INPUTS_IDS_DATASET: ['source_text', 'state_below']
	INPUTS_IDS_MODEL: ['source_text', 'state_below']
	INPUTS_TYPES_DATASET: ['text-features', 'text-features']
	INPUT_VOCABULARY_SIZE: 42966
	JOINT_BATCHES: 4
	KERAS_METRICS: ['perplexity']
	LABEL_SMOOTHING: 0.0
	LENGTH_NORM_FACTOR: 0.2
	LENGTH_PENALTY: False
	LOG_DIR: tensorboard_logs
	LOSS: categorical_crossentropy
	LR: 0.001
	LR_DECAY: None
	LR_GAMMA: 0.8
	LR_HALF_LIFE: 100
	LR_REDUCER_EXP_BASE: -0.5
	LR_REDUCER_TYPE: exponential
	LR_REDUCE_EACH_EPOCHS: False
	LR_START_REDUCTION_ON_EPOCH: 0
	MAPPING: examples/mydata55//mapping.hi_en.pkl
	MAXLEN_GIVEN_X: True
	MAXLEN_GIVEN_X_FACTOR: 2
	MAX_EPOCH: 15
	MAX_INPUT_TEXT_LEN: 150
	MAX_OUTPUT_TEXT_LEN: 150
	MAX_OUTPUT_TEXT_LEN_TEST: 450
	MAX_PLOT_Y: 100.0
	METRICS: ['perplexity']
	MINLEN_GIVEN_X: True
	MINLEN_GIVEN_X_FACTOR: 3
	MIN_DELTA: 0.0
	MIN_LR: 1e-09
	MIN_OCCURRENCES_INPUT_VOCAB: 0
	MIN_OCCURRENCES_OUTPUT_VOCAB: 0
	MODE: training
	MODEL_NAME: mydata55_hien_AttentionRNNEncoderDecoder_src_emb_500_bidir_True_enc_LSTM_756_dec_ConditionalLSTM_756_deepout_linear_trg_emb_500_Adam_0.001
	MODEL_SIZE: 32
	MODEL_TYPE: AttentionRNNEncoderDecoder
	MOMENTUM: 0.0
	MULTIHEAD_ATTENTION_ACTIVATION: linear
	NESTEROV_MOMENTUM: False
	NOISE_AMOUNT: 0.01
	NORMALIZE_SAMPLING: False
	N_GPUS: 3
	N_HEADS: 8
	N_LAYERS_DECODER: 2
	N_LAYERS_ENCODER: 2
	N_SAMPLES: 5
	OPTIMIZED_SEARCH: True
	OPTIMIZER: Adam
	OUTPUTS_IDS_DATASET: ['target_text']
	OUTPUTS_IDS_MODEL: ['target_text']
	OUTPUTS_TYPES_DATASET: ['text-features']
	OUTPUT_VOCABULARY_SIZE: 34431
	PAD_ON_BATCH: True
	PARALLEL_LOADERS: 1
	PATIENCE: 5
	PLOT_EVALUATION: False
	POS_UNK: True
	REBUILD_DATASET: True
	RECURRENT_DROPOUT_P: 0.0
	RECURRENT_INPUT_DROPOUT_P: 0.0
	RECURRENT_WEIGHT_DECAY: 0.0
	REGULARIZATION_FN: L2
	RELOAD: 0
	RELOAD_EPOCH: False
	RHO: 0.9
	SAMPLE_EACH_UPDATES: 300
	SAMPLE_ON_SETS: ['train', 'val']
	SAMPLE_WEIGHTS: True
	SAMPLING: max_likelihood
	SAMPLING_SAVE_MODE: list
	SAVE_EACH_EVALUATION: True
	SCALE_SOURCE_WORD_EMBEDDINGS: False
	SCALE_TARGET_WORD_EMBEDDINGS: False
	SEARCH_PRUNING: False
	SKIP_VECTORS_HIDDEN_SIZE: 500
	SKIP_VECTORS_SHARED_ACTIVATION: tanh
	SOURCE_TEXT_EMBEDDING_SIZE: 500
	SRC_LAN: hi
	SRC_PRETRAINED_VECTORS: None
	SRC_PRETRAINED_VECTORS_TRAINABLE: True
	START_EVAL_ON_EPOCH: 1
	START_SAMPLING_ON_EPOCH: 1
	STOP_METRIC: Bleu_4
	STORE_PATH: /scratch/trained_models/mydata55_hien_AttentionRNNEncoderDecoder_src_emb_500_bidir_True_enc_LSTM_756_dec_ConditionalLSTM_756_deepout_linear_trg_emb_500_Adam_0.001/
	TARGET_TEXT_EMBEDDING_SIZE: 500
	TASK_NAME: mydata55
	TEMPERATURE: 1
	TENSORBOARD: True
	TEXT_FILES: {'train': 'train.', 'val': 'val.', 'test': 'TEST.'}
	TIE_EMBEDDINGS: False
	TOKENIZATION_METHOD: tokenize_none
	TOKENIZE_HYPOTHESES: True
	TOKENIZE_REFERENCES: True
	TRAINABLE_DECODER: True
	TRAINABLE_ENCODER: True
	TRAIN_ON_TRAINVAL: False
	TRG_LAN: en
	TRG_PRETRAINED_VECTORS: None
	TRG_PRETRAINED_VECTORS_TRAINABLE: True
	USE_BATCH_NORMALIZATION: True
	USE_CUDNN: False
	USE_L1: False
	USE_L2: False
	USE_NOISE: False
	USE_PRELU: False
	USE_TF_OPTIMIZER: True
	VERBOSE: 1
	WARMUP_EXP: -1.5
	WEIGHT_DECAY: 0.0001
	WRITE_VALID_SAMPLES: True
-----------------------------------------------------------------------------------
Model: "mydata55_hien_AttentionRNNEncoderDecoder_src_emb_500_bidir_True_enc_LSTM_756_dec_ConditionalLSTM_756_deepout_linear_trg_emb_500_Adam_0.001_training"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
source_text (InputLayer)        (None, None)         0                                            
__________________________________________________________________________________________________
source_word_embedding (Embeddin (None, None, 500)    21483000    source_text[0][0]                
__________________________________________________________________________________________________
src_embedding_batch_normalizati (None, None, 500)    2000        source_word_embedding[0][0]      
__________________________________________________________________________________________________
remove_mask_1 (RemoveMask)      (None, None, 500)    0           src_embedding_batch_normalization
__________________________________________________________________________________________________
bidirectional_encoder_LSTM (Bid (None, None, 1512)   7602336     remove_mask_1[0][0]              
__________________________________________________________________________________________________
annotations_batch_normalization (None, None, 1512)   6048        bidirectional_encoder_LSTM[0][0] 
__________________________________________________________________________________________________
bidirectional_encoder_1 (Bidire (None, None, 1512)   13722912    annotations_batch_normalization[0
__________________________________________________________________________________________________
annotations_1_batch_normalizati (None, None, 1512)   6048        bidirectional_encoder_1[0][0]    
__________________________________________________________________________________________________
add_1 (Add)                     (None, None, 1512)   0           annotations_batch_normalization[0
                                                                 annotations_1_batch_normalization
__________________________________________________________________________________________________
source_text_mask (GetMask)      (None, None, 500)    0           src_embedding_batch_normalization
__________________________________________________________________________________________________
annotations (ApplyMask)         (None, None, 1512)   0           add_1[0][0]                      
                                                                 source_text_mask[0][0]           
__________________________________________________________________________________________________
state_below (InputLayer)        (None, None)         0                                            
__________________________________________________________________________________________________
ctx_mean (MaskedMean)           (None, 1512)         0           annotations[0][0]                
__________________________________________________________________________________________________
target_word_embedding (Embeddin (None, None, 500)    17215500    state_below[0][0]                
__________________________________________________________________________________________________
initial_state (Dense)           (None, 756)          1143828     ctx_mean[0][0]                   
__________________________________________________________________________________________________
initial_memory (Dense)          (None, 756)          1143828     ctx_mean[0][0]                   
__________________________________________________________________________________________________
state_below_batch_normalization (None, None, 500)    2000        target_word_embedding[0][0]      
__________________________________________________________________________________________________
initial_state_batch_normalizati (None, 756)          3024        initial_state[0][0]              
__________________________________________________________________________________________________
initial_memory_batch_normalizat (None, 756)          3024        initial_memory[0][0]             
__________________________________________________________________________________________________
decoder_AttConditionalLSTMCond  [(None, None, 756),  12378745    state_below_batch_normalization[0
                                                                 annotations[0][0]                
                                                                 initial_state_batch_normalization
                                                                 initial_memory_batch_normalizatio
__________________________________________________________________________________________________
proj_h0_batch_normalization (Ba (None, None, 756)    3024        decoder_AttConditionalLSTMCond[0]
__________________________________________________________________________________________________
permute_general_1 (PermuteGener multiple             0           decoder_AttConditionalLSTMCond[0]
                                                                 logit_ctx[0][0]                  
__________________________________________________________________________________________________
decoder_LSTMCond1 (LSTMCond)    [(None, None, 756),  9147600     proj_h0_batch_normalization[0][0]
                                                                 permute_general_1[0][0]          
                                                                 initial_state_batch_normalization
                                                                 initial_memory_batch_normalizatio
__________________________________________________________________________________________________
proj_h1_batch_normalization (Ba (None, None, 756)    3024        decoder_LSTMCond1[0][0]          
__________________________________________________________________________________________________
add_2 (Add)                     (None, None, 756)    0           proj_h0_batch_normalization[0][0]
                                                                 proj_h1_batch_normalization[0][0]
__________________________________________________________________________________________________
logit_ctx (TimeDistributed)     (None, None, 500)    756500      decoder_AttConditionalLSTMCond[0]
__________________________________________________________________________________________________
logit_lstm (TimeDistributed)    (None, None, 500)    378500      add_2[0][0]                      
__________________________________________________________________________________________________
logit_emb (TimeDistributed)     (None, None, 500)    250500      state_below_batch_normalization[0
__________________________________________________________________________________________________
out_layer_mlp_batch_normalizati (None, None, 500)    2000        logit_lstm[0][0]                 
__________________________________________________________________________________________________
out_layer_ctx_batch_normalizati (None, None, 500)    2000        permute_general_1[1][0]          
__________________________________________________________________________________________________
out_layer_emb_batch_normalizati (None, None, 500)    2000        logit_emb[0][0]                  
__________________________________________________________________________________________________
additional_input (Add)          (None, None, 500)    0           out_layer_mlp_batch_normalization
                                                                 out_layer_ctx_batch_normalization
                                                                 out_layer_emb_batch_normalization
__________________________________________________________________________________________________
activation_1 (Activation)       (None, None, 500)    0           additional_input[0][0]           
__________________________________________________________________________________________________
linear_0 (TimeDistributed)      (None, None, 500)    250500      activation_1[0][0]               
__________________________________________________________________________________________________
out_layer_linear_0_batch_normal (None, None, 500)    2000        linear_0[0][0]                   
__________________________________________________________________________________________________
target_text (TimeDistributed)   (None, None, 34431)  17249931    out_layer_linear_0_batch_normaliz
==================================================================================================
Total params: 102,759,872
Trainable params: 102,741,776
Non-trainable params: 18,096
__________________________________________________________________________________________________
[11/08/2020 20:50:40] From /home/pandramish.vinay/nmt-keras/nmt_keras/model_zoo.py:213: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

[11/08/2020 20:50:40] Preparing optimizer and compiling. Optimizer configuration: 
	 LR: 0.001
	 LOSS: categorical_crossentropy
	 BETA_1: 0.9
	 BETA_2: 0.999
	 EPSILON: 1e-08
[11/08/2020 20:50:40] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:1192: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

[11/08/2020 20:50:40] <<< Training model >>>
[11/08/2020 20:50:40] Training parameters: { 
	batch_size: 412
	class_weights: None
	da_enhance_list: []
	da_patch_type: resize_and_rndcrop
	data_augmentation: False
	each_n_epochs: 1
	epoch_offset: 0
	epochs_for_save: 1
	eval_on_epochs: True
	eval_on_sets: None
	extra_callbacks: [<keras_wrapper.extra.callbacks.EvalPerformance object at 0x14d0400de320>, <keras_wrapper.extra.callbacks.Sample object at 0x14d0400e6860>]
	homogeneous_batches: False
	initial_lr: 0.001
	joint_batches: 4
	lr_decay: None
	lr_gamma: 0.8
	lr_half_life: 100
	lr_reducer_exp_base: -0.5
	lr_reducer_type: exponential
	lr_warmup_exp: -1.5
	maxlen: 150
	mean_substraction: False
	metric_check: Bleu_4
	min_delta: 0.0
	min_lr: 1e-09
	n_epochs: 15
	n_gpus: 3
	n_parallel_loaders: 1
	normalization_type: None
	normalize: False
	num_iterations_val: None
	patience: 5
	patience_check_split: val
	reduce_each_epochs: False
	reload_epoch: 0
	shuffle: True
	start_eval_on_epoch: 1
	start_reduction_on_epoch: 0
	tensorboard: True
	tensorboard_params: {'write_grads': False, 'batch_size': 412, 'update_freq': 'epoch', 'embeddings_layer_names': None, 'histogram_freq': 0, 'write_images': False, 'word_embeddings_labels': None, 'write_graph': True, 'log_dir': 'tensorboard_logs', 'embeddings_freq': None, 'embeddings_metadata': None}
	verbose: 1
	wo_da_patch_type: whole
}
[11/08/2020 20:50:40] <<< creating directory /scratch/trained_models/mydata55_hien_AttentionRNNEncoderDecoder_src_emb_500_bidir_True_enc_LSTM_756_dec_ConditionalLSTM_756_deepout_linear_trg_emb_500_Adam_0.001/tensorboard_logs ... >>>
[11/08/2020 20:50:44] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/callbacks/tensorboard_v1.py:200: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

[11/08/2020 20:50:44] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/callbacks/tensorboard_v1.py:203: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

Epoch 1/15
2020-08-11 20:51:21.244817: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-08-11 20:51:36.044298: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 27.4KiB (rounded to 28160).  Current allocation summary follows.
2020-08-11 20:51:36.044578: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256): 	Total Chunks: 243, Chunks in use: 239. 60.8KiB allocated for chunks. 59.8KiB in use in bin. 2.5KiB client-requested in use in bin.
2020-08-11 20:51:36.044605: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512): 	Total Chunks: 13, Chunks in use: 11. 9.2KiB allocated for chunks. 8.2KiB in use in bin. 5.9KiB client-requested in use in bin.
2020-08-11 20:51:36.044629: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024): 	Total Chunks: 2, Chunks in use: 2. 2.2KiB allocated for chunks. 2.2KiB in use in bin. 1.5KiB client-requested in use in bin.
2020-08-11 20:51:36.044640: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2048): 	Total Chunks: 252, Chunks in use: 251. 632.2KiB allocated for chunks. 630.2KiB in use in bin. 606.3KiB client-requested in use in bin.
2020-08-11 20:51:36.044665: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4096): 	Total Chunks: 58, Chunks in use: 58. 310.2KiB allocated for chunks. 310.2KiB in use in bin. 275.8KiB client-requested in use in bin.
2020-08-11 20:51:36.044674: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8192): 	Total Chunks: 70, Chunks in use: 70. 740.5KiB allocated for chunks. 740.5KiB in use in bin. 719.1KiB client-requested in use in bin.
2020-08-11 20:51:36.044682: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16384): 	Total Chunks: 211, Chunks in use: 211. 3.44MiB allocated for chunks. 3.44MiB in use in bin. 3.30MiB client-requested in use in bin.
2020-08-11 20:51:36.044691: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (32768): 	Total Chunks: 1, Chunks in use: 1. 32.2KiB allocated for chunks. 32.2KiB in use in bin. 16.1KiB client-requested in use in bin.
2020-08-11 20:51:36.044700: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (65536): 	Total Chunks: 195, Chunks in use: 195. 19.33MiB allocated for chunks. 19.33MiB in use in bin. 19.29MiB client-requested in use in bin.
2020-08-11 20:51:36.044708: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (131072): 	Total Chunks: 62, Chunks in use: 62. 11.79MiB allocated for chunks. 11.79MiB in use in bin. 9.23MiB client-requested in use in bin.
2020-08-11 20:51:36.044717: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (262144): 	Total Chunks: 2601, Chunks in use: 2601. 1015.97MiB allocated for chunks. 1015.97MiB in use in bin. 1012.12MiB client-requested in use in bin.
2020-08-11 20:51:36.044726: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (524288): 	Total Chunks: 422, Chunks in use: 422. 332.47MiB allocated for chunks. 332.47MiB in use in bin. 256.36MiB client-requested in use in bin.
2020-08-11 20:51:36.044750: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1048576): 	Total Chunks: 9, Chunks in use: 9. 13.00MiB allocated for chunks. 13.00MiB in use in bin. 11.03MiB client-requested in use in bin.
2020-08-11 20:51:36.044759: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2097152): 	Total Chunks: 56, Chunks in use: 56. 127.48MiB allocated for chunks. 127.48MiB in use in bin. 126.31MiB client-requested in use in bin.
2020-08-11 20:51:36.044767: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4194304): 	Total Chunks: 122, Chunks in use: 122. 648.19MiB allocated for chunks. 648.19MiB in use in bin. 639.69MiB client-requested in use in bin.
2020-08-11 20:51:36.044775: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8388608): 	Total Chunks: 112, Chunks in use: 112. 1.21GiB allocated for chunks. 1.21GiB in use in bin. 1.18GiB client-requested in use in bin.
2020-08-11 20:51:36.044782: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16777216): 	Total Chunks: 99, Chunks in use: 99. 2.11GiB allocated for chunks. 2.11GiB in use in bin. 2.02GiB client-requested in use in bin.
2020-08-11 20:51:36.044790: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (33554432): 	Total Chunks: 2, Chunks in use: 2. 69.81MiB allocated for chunks. 69.81MiB in use in bin. 47.76MiB client-requested in use in bin.
2020-08-11 20:51:36.044798: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (67108864): 	Total Chunks: 15, Chunks in use: 15. 1.09GiB allocated for chunks. 1.09GiB in use in bin. 1.04GiB client-requested in use in bin.
2020-08-11 20:51:36.044805: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (134217728): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-11 20:51:36.044813: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456): 	Total Chunks: 6, Chunks in use: 6. 3.63GiB allocated for chunks. 3.63GiB in use in bin. 3.59GiB client-requested in use in bin.
2020-08-11 20:51:36.044821: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 27.5KiB was 16.0KiB, Chunk State: 
2020-08-11 20:51:36.044827: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 2399508736
2020-08-11 20:51:36.044843: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8b0000000 next 3716 of size 964619008
2020-08-11 20:51:36.044849: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8e97eeb00 next 4149 of size 414464
2020-08-11 20:51:36.044855: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8e9853e00 next 4150 of size 414464
2020-08-11 20:51:36.044861: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8e98b9100 next 4151 of size 828672
2020-08-11 20:51:36.044867: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8e9983600 next 4152 of size 24857344
2020-08-11 20:51:36.044873: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb138100 next 4153 of size 16640
2020-08-11 20:51:36.044879: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb13c200 next 4154 of size 414464
2020-08-11 20:51:36.044884: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb1a1500 next 4155 of size 417536
2020-08-11 20:51:36.044890: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb207400 next 4156 of size 417536
2020-08-11 20:51:36.044896: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb26d300 next 4157 of size 417536
2020-08-11 20:51:36.044902: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb2d3200 next 4158 of size 208896
2020-08-11 20:51:36.044908: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb306200 next 4159 of size 104448
2020-08-11 20:51:36.044914: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb31fa00 next 4160 of size 104448
2020-08-11 20:51:36.044919: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb339200 next 4161 of size 104448
2020-08-11 20:51:36.044942: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb352a00 next 4162 of size 417536
2020-08-11 20:51:36.044947: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb3b8900 next 4163 of size 834816
2020-08-11 20:51:36.044953: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb484600 next 4164 of size 25038848
2020-08-11 20:51:36.044959: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecc65600 next 4165 of size 16640
2020-08-11 20:51:36.044964: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecc69700 next 4166 of size 16640
2020-08-11 20:51:36.044970: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecc6d800 next 4167 of size 16640
2020-08-11 20:51:36.044975: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecc71900 next 4168 of size 834816
2020-08-11 20:51:36.044981: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecd3d600 next 4169 of size 417536
2020-08-11 20:51:36.044986: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecda3500 next 4170 of size 417536
2020-08-11 20:51:36.044992: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ece09400 next 4171 of size 417536
2020-08-11 20:51:36.044998: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ece6f300 next 4172 of size 417536
2020-08-11 20:51:36.045003: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eced5200 next 4173 of size 417536
2020-08-11 20:51:36.045009: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecf3b100 next 4174 of size 16640
2020-08-11 20:51:36.045014: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecf3f200 next 4175 of size 12519424
2020-08-11 20:51:36.045020: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8edb2fa00 next 4176 of size 417536
2020-08-11 20:51:36.045026: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8edb95900 next 4177 of size 2286336
2020-08-11 20:51:36.045031: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eddc3c00 next 4178 of size 417536
2020-08-11 20:51:36.045037: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ede29b00 next 4180 of size 417536
2020-08-11 20:51:36.045042: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ede8fa00 next 4181 of size 12519424
2020-08-11 20:51:36.045048: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eea80200 next 4182 of size 417536
2020-08-11 20:51:36.045054: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eeae6100 next 4183 of size 417536
2020-08-11 20:51:36.045059: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eeb4c000 next 4184 of size 16640
2020-08-11 20:51:36.045065: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eeb50100 next 4185 of size 417536
2020-08-11 20:51:36.045070: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eebb6000 next 4186 of size 417536
2020-08-11 20:51:36.045076: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eec1bf00 next 4187 of size 417536
2020-08-11 20:51:36.045082: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eec81e00 next 4188 of size 828672
2020-08-11 20:51:36.045087: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eed4c300 next 4189 of size 414464
2020-08-11 20:51:36.045093: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eedb1600 next 4190 of size 414464
2020-08-11 20:51:36.045098: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eee16900 next 4191 of size 414464
2020-08-11 20:51:36.045104: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eee7bc00 next 4192 of size 414464
2020-08-11 20:51:36.045109: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eeee0f00 next 4193 of size 414464
2020-08-11 20:51:36.045115: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eef46200 next 4194 of size 414464
2020-08-11 20:51:36.045121: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eefab500 next 4195 of size 274176
2020-08-11 20:51:36.045126: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eefee400 next 4196 of size 414464
2020-08-11 20:51:36.045131: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ef053700 next 4197 of size 414464
2020-08-11 20:51:36.045153: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ef0b8a00 next 4198 of size 414464

For other datasets, it simply gets killed (the rest of the output is the same as above):

Epoch 1/15
Killed



lvapeab commented on May 24, 2024

I see that you are using a batch size of 412. This is probably too large. Try reducing it.
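As a rough back-of-the-envelope check of why 412 is too large, the sketch below estimates the size of a single float32 tensor of shape (batch, time, vocabulary), such as a decoder's output before the softmax. It is an upper-bound estimate under assumptions, not a measurement: it takes `maxlen: 150` and the 34428-word target vocabulary from the logs above, assumes every sentence is padded to the full length, and ignores any vocabulary cap or split of the batch across GPUs. The helper name `softmax_output_bytes` is made up for illustration.

```python
def softmax_output_bytes(batch_size, max_len, vocab_size, bytes_per_value=4):
    """Upper bound on the size of one (batch, time, vocab) float32 tensor."""
    return batch_size * max_len * vocab_size * bytes_per_value


GIB = 1024 ** 3

# Values taken from the training log: maxlen 150, target vocabulary 34428.
for batch_size in (412, 64):
    size = softmax_output_bytes(batch_size, max_len=150, vocab_size=34428)
    print("batch_size=%d -> %.2f GiB" % (batch_size, size / GIB))
```

Under these assumptions a single such tensor is close to 8 GiB at batch size 412 and a bit over 1 GiB at 64, before counting gradients or any other activations.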


VP007-py commented on May 24, 2024

It works with a batch size as low as 64. Thank you!


VP007-py commented on May 24, 2024

The same error still occurs, even with batch sizes of 16, 32, and 64. Maybe keras_wrapper could be updated to limit GPU memory growth, as described here.


lvapeab commented on May 24, 2024

keras_wrapper should not modify the GPU memory growth behavior. If you want to change it, you can set an environment variable (export TF_FORCE_GPU_ALLOW_GROWTH=true).
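If exporting the variable in the shell is inconvenient (for example in a cluster job script), the same setting can be applied from Python. This is a minimal sketch, assuming a TensorFlow release that honors `TF_FORCE_GPU_ALLOW_GROWTH` and that the assignment runs before TensorFlow is imported anywhere in the process; it is not part of nmt-keras or keras_wrapper.

```python
import os

# Assumption: this runs at the very top of the entry script, before any
# module imports TensorFlow, so the setting is in place when TensorFlow
# first initializes the GPUs.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import only after the variable is set
print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

With memory growth enabled, TensorFlow allocates GPU memory as needed instead of reserving almost all of it up front. It does not reduce how much memory the model actually needs.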

Regarding your Killed error, it seems to be a problem with main (host) memory rather than GPU memory.
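One way to check whether host memory is the problem is to log the peak resident memory of the process, for example after the dataset has been built. The snippet below is only a sketch using the standard-library `resource` module (Unix only); the helper name `peak_rss_mib` is made up for illustration and is not part of nmt-keras.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


print("Peak resident memory so far: %.1f MiB" % peak_rss_mib())
```

If this value approaches the memory limit of the cluster job shortly before the process dies, the bare `Killed` message most likely comes from the kernel or the scheduler ending the process, not from TensorFlow.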


VP007-py commented on May 24, 2024

Hey, I filtered out very long sentences from the dataset and it ran without issues (in an ideal scenario, the system should work for longer sentences too).
Yeah, I will set the environment variable if I run into errors!
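For anyone hitting the same problem, the filtering can be sketched as below. This is an illustrative stand-alone helper, not the exact script I used: it assumes whitespace-tokenized parallel sentences held in two lists, and the name `filter_long_pairs` and the default threshold are made up for the example.

```python
def filter_long_pairs(src_lines, trg_lines, max_len=150):
    """Keep only sentence pairs where both sides have at most max_len tokens."""
    kept_src, kept_trg = [], []
    for src, trg in zip(src_lines, trg_lines):
        # Tokens are counted by splitting on whitespace.
        if len(src.split()) <= max_len and len(trg.split()) <= max_len:
            kept_src.append(src)
            kept_trg.append(trg)
    return kept_src, kept_trg


src = ["a b c", "a b c d e f"]
trg = ["x y", "x y z"]
print(filter_long_pairs(src, trg, max_len=4))
```

A pair is dropped as soon as either side is too long, so the source and target files stay aligned line by line.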

