
Detecting multiple GPUs (nmt-keras issue, closed)

VP007-py commented on May 24, 2024
Detecting multiple GPUs

from nmt-keras.

Comments (9)

lvapeab commented on May 24, 2024

It looks like TensorFlow doesn't detect any GPUs. Is tensorflow-gpu installed? Is the environment variable CUDA_VISIBLE_DEVICES set correctly?
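Both points can be checked with a short script. What follows is a minimal sketch, not part of nmt-keras: the helper names parse_visible_devices and tf_gpu_names are made up for illustration, and the TensorFlow import is guarded so the script still runs on a machine without TensorFlow. It assumes device_lib.list_local_devices(), a TensorFlow utility available in both 1.x and 2.x.

```python
import os


def parse_visible_devices(value):
    """Interpret a CUDA_VISIBLE_DEVICES value.

    None (variable unset) means no restriction is applied; an empty
    string hides every GPU; otherwise the value is a comma-separated list.
    """
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


def tf_gpu_names():
    """Return the GPU device names TensorFlow reports, or None if TensorFlow is missing."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    # Keep only real GPU devices (TF 1.x also lists XLA_GPU entries).
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES:", parse_visible_devices(os.environ.get("CUDA_VISIBLE_DEVICES")))
    print("GPUs seen by TensorFlow:", tf_gpu_names())
```

If the first line lists devices but the second prints an empty list, the installed TensorFlow build most likely lacks GPU support or cannot load the CUDA libraries.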


VP007-py commented on May 24, 2024

echo $CUDA_VISIBLE_DEVICES gives me 0,1,2, and after installing tensorflow-gpu I get:

Using TensorFlow backend.
2020-08-11 20:15:00.601205: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
[11/08/2020 20:15:07] Running training.
[11/08/2020 20:15:07] Building mydata55_hien dataset
[11/08/2020 20:15:08] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:08] Creating vocabulary for data with data_id 'target_text'.
[11/08/2020 20:15:08] 	 Total: 34428 unique words in 80000 sentences with a total of 401199 words.
[11/08/2020 20:15:08] Creating dictionary of all words
[11/08/2020 20:15:08] Loaded "train" set outputs of data_type "text-features" with data_id "target_text" and length 80000.
[11/08/2020 20:15:08] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:08] Loaded "val" set outputs of data_type "text" with data_id "target_text" and length 1800.
[11/08/2020 20:15:08] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:08] Loaded "test" set outputs of data_type "text" with data_id "target_text" and length 1000.
[11/08/2020 20:15:09] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:09] Creating vocabulary for data with data_id 'source_text'.
[11/08/2020 20:15:09] 	 Total: 42963 unique words in 80000 sentences with a total of 466800 words.
[11/08/2020 20:15:09] Creating dictionary of all words
[11/08/2020 20:15:10] Loaded "train" set inputs of data_type "text-features" with data_id "source_text" and length 80000.
[11/08/2020 20:15:10] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:10] Creating vocabulary for data with data_id 'state_below'.
[11/08/2020 20:15:10] 	 Total: 34428 unique words in 80000 sentences with a total of 401199 words.
[11/08/2020 20:15:10] Creating dictionary of all words
[11/08/2020 20:15:11] Loaded "train" set inputs of data_type "text-features" with data_id "state_below" and length 80000.
[11/08/2020 20:15:11] Loaded "train" set inputs of type "file-name" with id "raw_source_text".
[11/08/2020 20:15:11] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:11] Loaded "val" set inputs of data_type "text-features" with data_id "source_text" and length 1800.
[11/08/2020 20:15:11] Loaded "val" set inputs of data_type "ghost" with data_id "state_below" and length 1800.
[11/08/2020 20:15:11] Loaded "val" set inputs of type "file-name" with id "raw_source_text".
[11/08/2020 20:15:11] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:15:11] Loaded "test" set inputs of data_type "text-features" with data_id "source_text" and length 1000.
[11/08/2020 20:15:11] Loaded "test" set inputs of data_type "ghost" with data_id "state_below" and length 1000.
[11/08/2020 20:15:11] Loaded "test" set inputs of type "file-name" with id "raw_source_text".
[11/08/2020 20:15:11] Keeping 1 captions per input on the val set.
[11/08/2020 20:15:11] Samples reduced to 1800 in val set.
[11/08/2020 20:15:11] <<< Saving Dataset instance to datasets/Dataset_mydata55_hien.pkl ... >>>
[11/08/2020 20:15:12] <<< Dataset instance saved >>>
[11/08/2020 20:15:12] <<< Building AttentionRNNEncoderDecoder Translation_Model >>>
Traceback (most recent call last):
  File "main.py", line 51, in <module>
    train_model(parameters, args.dataset)
  File "/home/pandramish.vinay/nmt-keras/nmt_keras/training.py", line 93, in train_model
    clear_dirs=clear_dirs)
  File "/home/pandramish.vinay/nmt-keras/nmt_keras/model_zoo.py", line 155, in __init__
    eval('self.' + model_type + '(params)')
  File "<string>", line 1, in <module>
  File "/home/pandramish.vinay/nmt-keras/nmt_keras/model_zoo.py", line 457, in AttentionRNNEncoderDecoder
    src_text = Input(name=self.ids_inputs[0], batch_shape=tuple([None, None]), dtype='int32')
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/engine/input_layer.py", line 178, in Input
    input_tensor=tensor)
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/engine/input_layer.py", line 87, in __init__
    name=self.name)
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 650, in placeholder
    x = tf.placeholder(dtype, shape=shape, name=name)
AttributeError: module 'tensorflow' has no attribute 'placeholder'


lvapeab commented on May 24, 2024

I think you installed TensorFlow 2.x. You should install 1.x (e.g. pip install tensorflow-gpu==1.15).
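The AttributeError on tf.placeholder in the traceback above is a typical symptom of TensorFlow 2.x, where that symbol is only reachable as tf.compat.v1.placeholder. A hedged sketch for checking the installed major version before launching training; tf_major_version and installed_tf_version are hypothetical helpers, not part of nmt-keras or TensorFlow:

```python
def tf_major_version(version_string):
    """Extract the major version number from a string such as '1.15.0'."""
    return int(version_string.split(".")[0])


def installed_tf_version():
    """Return tensorflow.__version__, or None when TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__


if __name__ == "__main__":
    version = installed_tf_version()
    if version is None:
        print("TensorFlow is not installed")
    elif tf_major_version(version) != 1:
        print("Found TensorFlow", version, "- a 1.x release such as 1.15 is suggested here")
    else:
        print("Found TensorFlow", version)
```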


VP007-py commented on May 24, 2024

This step resolves the errors, but I'm running out of memory as shown.

python3 main.py 
Using TensorFlow backend.
[11/08/2020 20:50:13] Running training.
[11/08/2020 20:50:13] Building mydata55_hien dataset
[11/08/2020 20:50:13] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:14] Creating vocabulary for data with data_id 'target_text'.
[11/08/2020 20:50:14] 	 Total: 34428 unique words in 80000 sentences with a total of 401199 words.
[11/08/2020 20:50:14] Creating dictionary of all words
[11/08/2020 20:50:14] Loaded "train" set outputs of data_type "text-features" with data_id "target_text" and length 80000.
[11/08/2020 20:50:14] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:14] Loaded "val" set outputs of data_type "text" with data_id "target_text" and length 1800.
[11/08/2020 20:50:14] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:14] Loaded "test" set outputs of data_type "text" with data_id "target_text" and length 1000.
[11/08/2020 20:50:15] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:15] Creating vocabulary for data with data_id 'source_text'.
[11/08/2020 20:50:15] 	 Total: 42963 unique words in 80000 sentences with a total of 466800 words.
[11/08/2020 20:50:15] Creating dictionary of all words
[11/08/2020 20:50:15] Loaded "train" set inputs of data_type "text-features" with data_id "source_text" and length 80000.
[11/08/2020 20:50:16] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:16] Creating vocabulary for data with data_id 'state_below'.
[11/08/2020 20:50:16] 	 Total: 34428 unique words in 80000 sentences with a total of 401199 words.
[11/08/2020 20:50:16] Creating dictionary of all words
[11/08/2020 20:50:17] Loaded "train" set inputs of data_type "text-features" with data_id "state_below" and length 80000.
[11/08/2020 20:50:17] Loaded "train" set inputs of type "file-name" with id "raw_source_text".
[11/08/2020 20:50:17] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:17] Loaded "val" set inputs of data_type "text-features" with data_id "source_text" and length 1800.
[11/08/2020 20:50:17] Loaded "val" set inputs of data_type "ghost" with data_id "state_below" and length 1800.
[11/08/2020 20:50:17] Loaded "val" set inputs of type "file-name" with id "raw_source_text".
[11/08/2020 20:50:17] 	Applying tokenization function: "tokenize_none".
[11/08/2020 20:50:17] Loaded "test" set inputs of data_type "text-features" with data_id "source_text" and length 1000.
[11/08/2020 20:50:17] Loaded "test" set inputs of data_type "ghost" with data_id "state_below" and length 1000.
[11/08/2020 20:50:17] Loaded "test" set inputs of type "file-name" with id "raw_source_text".
[11/08/2020 20:50:17] Keeping 1 captions per input on the val set.
[11/08/2020 20:50:17] Samples reduced to 1800 in val set.
[11/08/2020 20:50:17] <<< Saving Dataset instance to datasets/Dataset_mydata55_hien.pkl ... >>>
[11/08/2020 20:50:18] <<< Dataset instance saved >>>
[11/08/2020 20:50:18] <<< Building AttentionRNNEncoderDecoder Translation_Model >>>
[11/08/2020 20:50:18] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:650: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

[11/08/2020 20:50:18] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:4786: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

[11/08/2020 20:50:18] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:157: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

[11/08/2020 20:50:24] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:3561: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
[11/08/2020 20:50:26] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:292: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

[11/08/2020 20:50:26] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:299: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

[11/08/2020 20:50:26] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:308: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2020-08-11 20:50:26.283764: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-11 20:50:26.311783: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2399890000 Hz
2020-08-11 20:50:26.313223: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x9548a50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-11 20:50:26.313241: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-11 20:50:26.314794: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-08-11 20:50:27.979868: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x9618b90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-11 20:50:27.979910: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-08-11 20:50:27.979920: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-08-11 20:50:27.979927: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-08-11 20:50:27.982474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
2020-08-11 20:50:27.983341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:03:00.0
2020-08-11 20:50:27.985994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 2 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:83:00.0
2020-08-11 20:50:27.991316: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-08-11 20:50:28.058570: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-08-11 20:50:28.091144: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-08-11 20:50:28.102065: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-08-11 20:50:28.178217: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-08-11 20:50:28.227447: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-08-11 20:50:28.227515: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-11 20:50:28.233741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1, 2
2020-08-11 20:50:28.233800: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-08-11 20:50:28.238665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-11 20:50:28.238686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 1 2 
2020-08-11 20:50:28.238712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N Y N 
2020-08-11 20:50:28.238721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1:   Y N N 
2020-08-11 20:50:28.238732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 2:   N N N 
2020-08-11 20:50:28.243290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10481 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
2020-08-11 20:50:28.244710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10481 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2020-08-11 20:50:28.245980: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10481 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:83:00.0, compute capability: 6.1)
[11/08/2020 20:50:28] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:312: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

[11/08/2020 20:50:28] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:321: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

[11/08/2020 20:50:28] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:328: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

-----------------------------------------------------------------------------------
		TranslationModel instance
-----------------------------------------------------------------------------------
_model_type: AttentionRNNEncoderDecoder
name: mydata55_hien_AttentionRNNEncoderDecoder_src_emb_500_bidir_True_enc_LSTM_756_dec_ConditionalLSTM_756_deepout_linear_trg_emb_500_Adam_0.001
model_path: /scratch/trained_models/mydata55_hien_AttentionRNNEncoderDecoder_src_emb_500_bidir_True_enc_LSTM_756_dec_ConditionalLSTM_756_deepout_linear_trg_emb_500_Adam_0.001/
verbose: 1

Params:
	ACCUMULATE_GRADIENTS: 1
	ADDITIONAL_OUTPUT_MERGE_MODE: Add
	ALIGN_FROM_RAW: True
	ALPHA_FACTOR: 0.6
	AMSGRAD: False
	APPLY_DETOKENIZATION: False
	ATTENTION_DROPOUT_P: 0.0
	ATTENTION_MODE: add
	ATTENTION_SIZE: 756
	BATCH_NORMALIZATION_MODE: 1
	BATCH_SIZE: 412
	BEAM_SEARCH: True
	BEAM_SIZE: 3
	BETA_1: 0.9
	BETA_2: 0.999
	BIDIRECTIONAL_DEEP_ENCODER: True
	BIDIRECTIONAL_ENCODER: True
	BIDIRECTIONAL_MERGE_MODE: concat
	BPE_CODES_PATH: examples/mydata55//training_codes.joint
	CLASSIFIER_ACTIVATION: softmax
	CLIP_C: 5.0
	CLIP_V: 0.0
	COVERAGE_NORM_FACTOR: 0.2
	COVERAGE_PENALTY: False
	DATASET_NAME: mydata55
	DATASET_STORE_PATH: datasets/
	DATA_AUGMENTATION: False
	DATA_ROOT_PATH: examples/mydata55/
	DECODER_HIDDEN_SIZE: 756
	DECODER_RNN_TYPE: ConditionalLSTM
	DEEP_OUTPUT_LAYERS: [('linear', 500)]
	DETOKENIZATION_METHOD: detokenize_none
	DOUBLE_STOCHASTIC_ATTENTION_REG: 0.0
	DROPOUT_P: 0.0
	EARLY_STOP: True
	EMBEDDINGS_FREQ: 1
	ENCODER_HIDDEN_SIZE: 756
	ENCODER_RNN_TYPE: LSTM
	EPOCHS_FOR_SAVE: 1
	EPSILON: 1e-08
	EVAL_EACH: 1
	EVAL_EACH_EPOCHS: True
	EVAL_ON_SETS: ['val']
	EXTRA_NAME: 
	FF_SIZE: 128
	FILL: end
	FORCE_RELOAD_VOCABULARY: False
	GLOSSARY: None
	GRU_RESET_AFTER: True
	HEURISTIC: 0
	HOMOGENEOUS_BATCHES: False
	INIT_ATT: glorot_uniform
	INIT_FUNCTION: glorot_uniform
	INIT_LAYERS: ['tanh']
	INNER_INIT: orthogonal
	INPUTS_IDS_DATASET: ['source_text', 'state_below']
	INPUTS_IDS_MODEL: ['source_text', 'state_below']
	INPUTS_TYPES_DATASET: ['text-features', 'text-features']
	INPUT_VOCABULARY_SIZE: 42966
	JOINT_BATCHES: 4
	KERAS_METRICS: ['perplexity']
	LABEL_SMOOTHING: 0.0
	LENGTH_NORM_FACTOR: 0.2
	LENGTH_PENALTY: False
	LOG_DIR: tensorboard_logs
	LOSS: categorical_crossentropy
	LR: 0.001
	LR_DECAY: None
	LR_GAMMA: 0.8
	LR_HALF_LIFE: 100
	LR_REDUCER_EXP_BASE: -0.5
	LR_REDUCER_TYPE: exponential
	LR_REDUCE_EACH_EPOCHS: False
	LR_START_REDUCTION_ON_EPOCH: 0
	MAPPING: examples/mydata55//mapping.hi_en.pkl
	MAXLEN_GIVEN_X: True
	MAXLEN_GIVEN_X_FACTOR: 2
	MAX_EPOCH: 15
	MAX_INPUT_TEXT_LEN: 150
	MAX_OUTPUT_TEXT_LEN: 150
	MAX_OUTPUT_TEXT_LEN_TEST: 450
	MAX_PLOT_Y: 100.0
	METRICS: ['perplexity']
	MINLEN_GIVEN_X: True
	MINLEN_GIVEN_X_FACTOR: 3
	MIN_DELTA: 0.0
	MIN_LR: 1e-09
	MIN_OCCURRENCES_INPUT_VOCAB: 0
	MIN_OCCURRENCES_OUTPUT_VOCAB: 0
	MODE: training
	MODEL_NAME: mydata55_hien_AttentionRNNEncoderDecoder_src_emb_500_bidir_True_enc_LSTM_756_dec_ConditionalLSTM_756_deepout_linear_trg_emb_500_Adam_0.001
	MODEL_SIZE: 32
	MODEL_TYPE: AttentionRNNEncoderDecoder
	MOMENTUM: 0.0
	MULTIHEAD_ATTENTION_ACTIVATION: linear
	NESTEROV_MOMENTUM: False
	NOISE_AMOUNT: 0.01
	NORMALIZE_SAMPLING: False
	N_GPUS: 3
	N_HEADS: 8
	N_LAYERS_DECODER: 2
	N_LAYERS_ENCODER: 2
	N_SAMPLES: 5
	OPTIMIZED_SEARCH: True
	OPTIMIZER: Adam
	OUTPUTS_IDS_DATASET: ['target_text']
	OUTPUTS_IDS_MODEL: ['target_text']
	OUTPUTS_TYPES_DATASET: ['text-features']
	OUTPUT_VOCABULARY_SIZE: 34431
	PAD_ON_BATCH: True
	PARALLEL_LOADERS: 1
	PATIENCE: 5
	PLOT_EVALUATION: False
	POS_UNK: True
	REBUILD_DATASET: True
	RECURRENT_DROPOUT_P: 0.0
	RECURRENT_INPUT_DROPOUT_P: 0.0
	RECURRENT_WEIGHT_DECAY: 0.0
	REGULARIZATION_FN: L2
	RELOAD: 0
	RELOAD_EPOCH: False
	RHO: 0.9
	SAMPLE_EACH_UPDATES: 300
	SAMPLE_ON_SETS: ['train', 'val']
	SAMPLE_WEIGHTS: True
	SAMPLING: max_likelihood
	SAMPLING_SAVE_MODE: list
	SAVE_EACH_EVALUATION: True
	SCALE_SOURCE_WORD_EMBEDDINGS: False
	SCALE_TARGET_WORD_EMBEDDINGS: False
	SEARCH_PRUNING: False
	SKIP_VECTORS_HIDDEN_SIZE: 500
	SKIP_VECTORS_SHARED_ACTIVATION: tanh
	SOURCE_TEXT_EMBEDDING_SIZE: 500
	SRC_LAN: hi
	SRC_PRETRAINED_VECTORS: None
	SRC_PRETRAINED_VECTORS_TRAINABLE: True
	START_EVAL_ON_EPOCH: 1
	START_SAMPLING_ON_EPOCH: 1
	STOP_METRIC: Bleu_4
	STORE_PATH: /scratch/trained_models/mydata55_hien_AttentionRNNEncoderDecoder_src_emb_500_bidir_True_enc_LSTM_756_dec_ConditionalLSTM_756_deepout_linear_trg_emb_500_Adam_0.001/
	TARGET_TEXT_EMBEDDING_SIZE: 500
	TASK_NAME: mydata55
	TEMPERATURE: 1
	TENSORBOARD: True
	TEXT_FILES: {'train': 'train.', 'val': 'val.', 'test': 'TEST.'}
	TIE_EMBEDDINGS: False
	TOKENIZATION_METHOD: tokenize_none
	TOKENIZE_HYPOTHESES: True
	TOKENIZE_REFERENCES: True
	TRAINABLE_DECODER: True
	TRAINABLE_ENCODER: True
	TRAIN_ON_TRAINVAL: False
	TRG_LAN: en
	TRG_PRETRAINED_VECTORS: None
	TRG_PRETRAINED_VECTORS_TRAINABLE: True
	USE_BATCH_NORMALIZATION: True
	USE_CUDNN: False
	USE_L1: False
	USE_L2: False
	USE_NOISE: False
	USE_PRELU: False
	USE_TF_OPTIMIZER: True
	VERBOSE: 1
	WARMUP_EXP: -1.5
	WEIGHT_DECAY: 0.0001
	WRITE_VALID_SAMPLES: True
-----------------------------------------------------------------------------------
Model: "mydata55_hien_AttentionRNNEncoderDecoder_src_emb_500_bidir_True_enc_LSTM_756_dec_ConditionalLSTM_756_deepout_linear_trg_emb_500_Adam_0.001_training"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
source_text (InputLayer)        (None, None)         0                                            
__________________________________________________________________________________________________
source_word_embedding (Embeddin (None, None, 500)    21483000    source_text[0][0]                
__________________________________________________________________________________________________
src_embedding_batch_normalizati (None, None, 500)    2000        source_word_embedding[0][0]      
__________________________________________________________________________________________________
remove_mask_1 (RemoveMask)      (None, None, 500)    0           src_embedding_batch_normalization
__________________________________________________________________________________________________
bidirectional_encoder_LSTM (Bid (None, None, 1512)   7602336     remove_mask_1[0][0]              
__________________________________________________________________________________________________
annotations_batch_normalization (None, None, 1512)   6048        bidirectional_encoder_LSTM[0][0] 
__________________________________________________________________________________________________
bidirectional_encoder_1 (Bidire (None, None, 1512)   13722912    annotations_batch_normalization[0
__________________________________________________________________________________________________
annotations_1_batch_normalizati (None, None, 1512)   6048        bidirectional_encoder_1[0][0]    
__________________________________________________________________________________________________
add_1 (Add)                     (None, None, 1512)   0           annotations_batch_normalization[0
                                                                 annotations_1_batch_normalization
__________________________________________________________________________________________________
source_text_mask (GetMask)      (None, None, 500)    0           src_embedding_batch_normalization
__________________________________________________________________________________________________
annotations (ApplyMask)         (None, None, 1512)   0           add_1[0][0]                      
                                                                 source_text_mask[0][0]           
__________________________________________________________________________________________________
state_below (InputLayer)        (None, None)         0                                            
__________________________________________________________________________________________________
ctx_mean (MaskedMean)           (None, 1512)         0           annotations[0][0]                
__________________________________________________________________________________________________
target_word_embedding (Embeddin (None, None, 500)    17215500    state_below[0][0]                
__________________________________________________________________________________________________
initial_state (Dense)           (None, 756)          1143828     ctx_mean[0][0]                   
__________________________________________________________________________________________________
initial_memory (Dense)          (None, 756)          1143828     ctx_mean[0][0]                   
__________________________________________________________________________________________________
state_below_batch_normalization (None, None, 500)    2000        target_word_embedding[0][0]      
__________________________________________________________________________________________________
initial_state_batch_normalizati (None, 756)          3024        initial_state[0][0]              
__________________________________________________________________________________________________
initial_memory_batch_normalizat (None, 756)          3024        initial_memory[0][0]             
__________________________________________________________________________________________________
decoder_AttConditionalLSTMCond  [(None, None, 756),  12378745    state_below_batch_normalization[0
                                                                 annotations[0][0]                
                                                                 initial_state_batch_normalization
                                                                 initial_memory_batch_normalizatio
__________________________________________________________________________________________________
proj_h0_batch_normalization (Ba (None, None, 756)    3024        decoder_AttConditionalLSTMCond[0]
__________________________________________________________________________________________________
permute_general_1 (PermuteGener multiple             0           decoder_AttConditionalLSTMCond[0]
                                                                 logit_ctx[0][0]                  
__________________________________________________________________________________________________
decoder_LSTMCond1 (LSTMCond)    [(None, None, 756),  9147600     proj_h0_batch_normalization[0][0]
                                                                 permute_general_1[0][0]          
                                                                 initial_state_batch_normalization
                                                                 initial_memory_batch_normalizatio
__________________________________________________________________________________________________
proj_h1_batch_normalization (Ba (None, None, 756)    3024        decoder_LSTMCond1[0][0]          
__________________________________________________________________________________________________
add_2 (Add)                     (None, None, 756)    0           proj_h0_batch_normalization[0][0]
                                                                 proj_h1_batch_normalization[0][0]
__________________________________________________________________________________________________
logit_ctx (TimeDistributed)     (None, None, 500)    756500      decoder_AttConditionalLSTMCond[0]
__________________________________________________________________________________________________
logit_lstm (TimeDistributed)    (None, None, 500)    378500      add_2[0][0]                      
__________________________________________________________________________________________________
logit_emb (TimeDistributed)     (None, None, 500)    250500      state_below_batch_normalization[0
__________________________________________________________________________________________________
out_layer_mlp_batch_normalizati (None, None, 500)    2000        logit_lstm[0][0]                 
__________________________________________________________________________________________________
out_layer_ctx_batch_normalizati (None, None, 500)    2000        permute_general_1[1][0]          
__________________________________________________________________________________________________
out_layer_emb_batch_normalizati (None, None, 500)    2000        logit_emb[0][0]                  
__________________________________________________________________________________________________
additional_input (Add)          (None, None, 500)    0           out_layer_mlp_batch_normalization
                                                                 out_layer_ctx_batch_normalization
                                                                 out_layer_emb_batch_normalization
__________________________________________________________________________________________________
activation_1 (Activation)       (None, None, 500)    0           additional_input[0][0]           
__________________________________________________________________________________________________
linear_0 (TimeDistributed)      (None, None, 500)    250500      activation_1[0][0]               
__________________________________________________________________________________________________
out_layer_linear_0_batch_normal (None, None, 500)    2000        linear_0[0][0]                   
__________________________________________________________________________________________________
target_text (TimeDistributed)   (None, None, 34431)  17249931    out_layer_linear_0_batch_normaliz
==================================================================================================
Total params: 102,759,872
Trainable params: 102,741,776
Non-trainable params: 18,096
__________________________________________________________________________________________________
[11/08/2020 20:50:40] From /home/pandramish.vinay/nmt-keras/nmt_keras/model_zoo.py:213: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

[11/08/2020 20:50:40] Preparing optimizer and compiling. Optimizer configuration: 
	 LR: 0.001
	 LOSS: categorical_crossentropy
	 BETA_1: 0.9
	 BETA_2: 0.999
	 EPSILON: 1e-08
[11/08/2020 20:50:40] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:1192: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

[11/08/2020 20:50:40] <<< Training model >>>
[11/08/2020 20:50:40] Training parameters: { 
	batch_size: 412
	class_weights: None
	da_enhance_list: []
	da_patch_type: resize_and_rndcrop
	data_augmentation: False
	each_n_epochs: 1
	epoch_offset: 0
	epochs_for_save: 1
	eval_on_epochs: True
	eval_on_sets: None
	extra_callbacks: [<keras_wrapper.extra.callbacks.EvalPerformance object at 0x14d0400de320>, <keras_wrapper.extra.callbacks.Sample object at 0x14d0400e6860>]
	homogeneous_batches: False
	initial_lr: 0.001
	joint_batches: 4
	lr_decay: None
	lr_gamma: 0.8
	lr_half_life: 100
	lr_reducer_exp_base: -0.5
	lr_reducer_type: exponential
	lr_warmup_exp: -1.5
	maxlen: 150
	mean_substraction: False
	metric_check: Bleu_4
	min_delta: 0.0
	min_lr: 1e-09
	n_epochs: 15
	n_gpus: 3
	n_parallel_loaders: 1
	normalization_type: None
	normalize: False
	num_iterations_val: None
	patience: 5
	patience_check_split: val
	reduce_each_epochs: False
	reload_epoch: 0
	shuffle: True
	start_eval_on_epoch: 1
	start_reduction_on_epoch: 0
	tensorboard: True
	tensorboard_params: {'write_grads': False, 'batch_size': 412, 'update_freq': 'epoch', 'embeddings_layer_names': None, 'histogram_freq': 0, 'write_images': False, 'word_embeddings_labels': None, 'write_graph': True, 'log_dir': 'tensorboard_logs', 'embeddings_freq': None, 'embeddings_metadata': None}
	verbose: 1
	wo_da_patch_type: whole
}
[11/08/2020 20:50:40] <<< creating directory /scratch/trained_models/mydata55_hien_AttentionRNNEncoderDecoder_src_emb_500_bidir_True_enc_LSTM_756_dec_ConditionalLSTM_756_deepout_linear_trg_emb_500_Adam_0.001/tensorboard_logs ... >>>
[11/08/2020 20:50:44] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/callbacks/tensorboard_v1.py:200: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

[11/08/2020 20:50:44] From /home/pandramish.vinay/.local/lib/python3.5/site-packages/keras/callbacks/tensorboard_v1.py:203: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

Epoch 1/15
2020-08-11 20:51:21.244817: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-08-11 20:51:36.044298: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 27.4KiB (rounded to 28160).  Current allocation summary follows.
2020-08-11 20:51:36.044578: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256): 	Total Chunks: 243, Chunks in use: 239. 60.8KiB allocated for chunks. 59.8KiB in use in bin. 2.5KiB client-requested in use in bin.
2020-08-11 20:51:36.044605: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512): 	Total Chunks: 13, Chunks in use: 11. 9.2KiB allocated for chunks. 8.2KiB in use in bin. 5.9KiB client-requested in use in bin.
2020-08-11 20:51:36.044629: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024): 	Total Chunks: 2, Chunks in use: 2. 2.2KiB allocated for chunks. 2.2KiB in use in bin. 1.5KiB client-requested in use in bin.
2020-08-11 20:51:36.044640: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2048): 	Total Chunks: 252, Chunks in use: 251. 632.2KiB allocated for chunks. 630.2KiB in use in bin. 606.3KiB client-requested in use in bin.
2020-08-11 20:51:36.044665: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4096): 	Total Chunks: 58, Chunks in use: 58. 310.2KiB allocated for chunks. 310.2KiB in use in bin. 275.8KiB client-requested in use in bin.
2020-08-11 20:51:36.044674: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8192): 	Total Chunks: 70, Chunks in use: 70. 740.5KiB allocated for chunks. 740.5KiB in use in bin. 719.1KiB client-requested in use in bin.
2020-08-11 20:51:36.044682: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16384): 	Total Chunks: 211, Chunks in use: 211. 3.44MiB allocated for chunks. 3.44MiB in use in bin. 3.30MiB client-requested in use in bin.
2020-08-11 20:51:36.044691: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (32768): 	Total Chunks: 1, Chunks in use: 1. 32.2KiB allocated for chunks. 32.2KiB in use in bin. 16.1KiB client-requested in use in bin.
2020-08-11 20:51:36.044700: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (65536): 	Total Chunks: 195, Chunks in use: 195. 19.33MiB allocated for chunks. 19.33MiB in use in bin. 19.29MiB client-requested in use in bin.
2020-08-11 20:51:36.044708: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (131072): 	Total Chunks: 62, Chunks in use: 62. 11.79MiB allocated for chunks. 11.79MiB in use in bin. 9.23MiB client-requested in use in bin.
2020-08-11 20:51:36.044717: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (262144): 	Total Chunks: 2601, Chunks in use: 2601. 1015.97MiB allocated for chunks. 1015.97MiB in use in bin. 1012.12MiB client-requested in use in bin.
2020-08-11 20:51:36.044726: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (524288): 	Total Chunks: 422, Chunks in use: 422. 332.47MiB allocated for chunks. 332.47MiB in use in bin. 256.36MiB client-requested in use in bin.
2020-08-11 20:51:36.044750: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1048576): 	Total Chunks: 9, Chunks in use: 9. 13.00MiB allocated for chunks. 13.00MiB in use in bin. 11.03MiB client-requested in use in bin.
2020-08-11 20:51:36.044759: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2097152): 	Total Chunks: 56, Chunks in use: 56. 127.48MiB allocated for chunks. 127.48MiB in use in bin. 126.31MiB client-requested in use in bin.
2020-08-11 20:51:36.044767: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4194304): 	Total Chunks: 122, Chunks in use: 122. 648.19MiB allocated for chunks. 648.19MiB in use in bin. 639.69MiB client-requested in use in bin.
2020-08-11 20:51:36.044775: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8388608): 	Total Chunks: 112, Chunks in use: 112. 1.21GiB allocated for chunks. 1.21GiB in use in bin. 1.18GiB client-requested in use in bin.
2020-08-11 20:51:36.044782: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16777216): 	Total Chunks: 99, Chunks in use: 99. 2.11GiB allocated for chunks. 2.11GiB in use in bin. 2.02GiB client-requested in use in bin.
2020-08-11 20:51:36.044790: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (33554432): 	Total Chunks: 2, Chunks in use: 2. 69.81MiB allocated for chunks. 69.81MiB in use in bin. 47.76MiB client-requested in use in bin.
2020-08-11 20:51:36.044798: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (67108864): 	Total Chunks: 15, Chunks in use: 15. 1.09GiB allocated for chunks. 1.09GiB in use in bin. 1.04GiB client-requested in use in bin.
2020-08-11 20:51:36.044805: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (134217728): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-11 20:51:36.044813: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456): 	Total Chunks: 6, Chunks in use: 6. 3.63GiB allocated for chunks. 3.63GiB in use in bin. 3.59GiB client-requested in use in bin.
2020-08-11 20:51:36.044821: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 27.5KiB was 16.0KiB, Chunk State: 
2020-08-11 20:51:36.044827: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 2399508736
2020-08-11 20:51:36.044843: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8b0000000 next 3716 of size 964619008
2020-08-11 20:51:36.044849: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8e97eeb00 next 4149 of size 414464
2020-08-11 20:51:36.044855: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8e9853e00 next 4150 of size 414464
2020-08-11 20:51:36.044861: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8e98b9100 next 4151 of size 828672
2020-08-11 20:51:36.044867: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8e9983600 next 4152 of size 24857344
2020-08-11 20:51:36.044873: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb138100 next 4153 of size 16640
2020-08-11 20:51:36.044879: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb13c200 next 4154 of size 414464
2020-08-11 20:51:36.044884: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb1a1500 next 4155 of size 417536
2020-08-11 20:51:36.044890: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb207400 next 4156 of size 417536
2020-08-11 20:51:36.044896: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb26d300 next 4157 of size 417536
2020-08-11 20:51:36.044902: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb2d3200 next 4158 of size 208896
2020-08-11 20:51:36.044908: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb306200 next 4159 of size 104448
2020-08-11 20:51:36.044914: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb31fa00 next 4160 of size 104448
2020-08-11 20:51:36.044919: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb339200 next 4161 of size 104448
2020-08-11 20:51:36.044942: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb352a00 next 4162 of size 417536
2020-08-11 20:51:36.044947: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb3b8900 next 4163 of size 834816
2020-08-11 20:51:36.044953: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eb484600 next 4164 of size 25038848
2020-08-11 20:51:36.044959: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecc65600 next 4165 of size 16640
2020-08-11 20:51:36.044964: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecc69700 next 4166 of size 16640
2020-08-11 20:51:36.044970: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecc6d800 next 4167 of size 16640
2020-08-11 20:51:36.044975: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecc71900 next 4168 of size 834816
2020-08-11 20:51:36.044981: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecd3d600 next 4169 of size 417536
2020-08-11 20:51:36.044986: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecda3500 next 4170 of size 417536
2020-08-11 20:51:36.044992: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ece09400 next 4171 of size 417536
2020-08-11 20:51:36.044998: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ece6f300 next 4172 of size 417536
2020-08-11 20:51:36.045003: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eced5200 next 4173 of size 417536
2020-08-11 20:51:36.045009: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecf3b100 next 4174 of size 16640
2020-08-11 20:51:36.045014: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ecf3f200 next 4175 of size 12519424
2020-08-11 20:51:36.045020: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8edb2fa00 next 4176 of size 417536
2020-08-11 20:51:36.045026: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8edb95900 next 4177 of size 2286336
2020-08-11 20:51:36.045031: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eddc3c00 next 4178 of size 417536
2020-08-11 20:51:36.045037: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ede29b00 next 4180 of size 417536
2020-08-11 20:51:36.045042: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ede8fa00 next 4181 of size 12519424
2020-08-11 20:51:36.045048: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eea80200 next 4182 of size 417536
2020-08-11 20:51:36.045054: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eeae6100 next 4183 of size 417536
2020-08-11 20:51:36.045059: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eeb4c000 next 4184 of size 16640
2020-08-11 20:51:36.045065: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eeb50100 next 4185 of size 417536
2020-08-11 20:51:36.045070: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eebb6000 next 4186 of size 417536
2020-08-11 20:51:36.045076: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eec1bf00 next 4187 of size 417536
2020-08-11 20:51:36.045082: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eec81e00 next 4188 of size 828672
2020-08-11 20:51:36.045087: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eed4c300 next 4189 of size 414464
2020-08-11 20:51:36.045093: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eedb1600 next 4190 of size 414464
2020-08-11 20:51:36.045098: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eee16900 next 4191 of size 414464
2020-08-11 20:51:36.045104: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eee7bc00 next 4192 of size 414464
2020-08-11 20:51:36.045109: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eeee0f00 next 4193 of size 414464
2020-08-11 20:51:36.045115: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eef46200 next 4194 of size 414464
2020-08-11 20:51:36.045121: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eefab500 next 4195 of size 274176
2020-08-11 20:51:36.045126: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8eefee400 next 4196 of size 414464
2020-08-11 20:51:36.045131: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ef053700 next 4197 of size 414464
2020-08-11 20:51:36.045153: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x14c8ef0b8a00 next 4198 of size 414464

For other datasets, the process simply gets killed (the rest of the output is the same as above):

Epoch 1/15
Killed

lvapeab avatar lvapeab commented on May 24, 2024

I see that you are using a batch size of 412. This is probably too large. Try reducing it.
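
A rough back-of-the-envelope check suggests why 412 is large here. The sketch below only estimates the size of a single dense float32 tensor of shape (batch, target length, vocabulary). It assumes the full `maxlen` of 150 from the training parameters and the 34428-word target vocabulary reported in the log, and it ignores weights, optimizer state and every other activation. `output_tensor_bytes` is a hypothetical helper, not part of nmt-keras or keras_wrapper.

```python
def output_tensor_bytes(batch_size, max_len, vocab_size, bytes_per_value=4):
    """Bytes for one dense (batch, time, vocab) tensor; float32 by default."""
    return batch_size * max_len * vocab_size * bytes_per_value


GIB = 1024 ** 3

# Compare the batch sizes mentioned in this thread.
for batch_size in (412, 64, 16):
    size = output_tensor_bytes(batch_size, max_len=150, vocab_size=34428)
    print("batch_size=%d: about %.2f GiB" % (batch_size, size / GIB))
```

With `n_gpus: 3` each device would typically see only part of the batch, and real batches may be shorter than `maxlen`, so treat these figures as an illustration of how memory scales with batch size rather than as a measurement.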

VP007-py avatar VP007-py commented on May 24, 2024

It works with a batch size as low as 64. Thank you!

VP007-py avatar VP007-py commented on May 24, 2024

The same error persists even now (with batch sizes of 16/32/64 too)... Maybe keras_wrapper could be updated to limit GPU memory growth?

lvapeab avatar lvapeab commented on May 24, 2024

keras_wrapper should not modify the GPU memory growth behavior. If you want to change it, you can set an environment variable (export TF_FORCE_GPU_ALLOW_GROWTH=true).
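
If exporting the variable in the shell is inconvenient, the same setting can be applied from Python. This is a minimal sketch, assuming it runs at the very top of the launch script, before TensorFlow or Keras is imported, since the variable is read when TensorFlow sets up its GPU devices.

```python
import os

# Set this before the first `import tensorflow` / `import keras`;
# once the GPU devices have been initialized it has no effect.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

Note that memory growth only changes how TensorFlow reserves GPU memory (on demand instead of up front). It does not reduce what the model actually needs, so an out-of-memory error caused by a too-large batch can still occur.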

Regarding your Killed error, it looks like a problem with main (host) memory rather than GPU memory.
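
A bare `Killed` with no Python traceback usually means the operating system stopped the process, often because host RAM ran out. One way to check is to log the peak resident memory of the training process at a few points, for example after the dataset is built and after each epoch. The helper below is a sketch using only the standard library; `peak_rss_mib` is a hypothetical name and it works on Unix-like systems only.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


print("peak host memory: %.1f MiB" % peak_rss_mib())
```

If the reported value approaches the machine's (or the job scheduler's) memory limit just before the process dies, the problem is on the host side and changing GPU settings will not help.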

VP007-py avatar VP007-py commented on May 24, 2024

Hey, I filtered out very long sentences from the dataset and it ran without issues (in an ideal scenario, the system should work for longer sentences too).
Yeah, I will set the environment variable if I run into errors!
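
For anyone hitting the same problem, the length filter described above can be sketched as follows. This is an illustration, not the script actually used in this thread: `filter_long_pairs` is a hypothetical helper, it assumes whitespace-separated tokens, and the default threshold simply mirrors the `maxlen: 150` training parameter.

```python
def filter_long_pairs(src_lines, trg_lines, max_len=150):
    """Keep only sentence pairs where both sides have at most max_len tokens."""
    kept = []
    for src, trg in zip(src_lines, trg_lines):
        if len(src.split()) <= max_len and len(trg.split()) <= max_len:
            kept.append((src, trg))
    return kept


# Tiny example with a threshold of 4 tokens: the second pair is dropped
# because its source side has 6 tokens.
pairs = filter_long_pairs(
    ["a short sentence", "one two three four five six"],
    ["ek chhota vakya", "x y z"],
    max_len=4,
)
print(pairs)
```

Filtering source and target together keeps the two sides of the parallel corpus aligned, which would break if each file were filtered independently.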
