** fvcore version of PathManager will be deprecated soon. **
** Please migrate to the version in iopath repo. **
https://github.com/facebookresearch/iopath
####### overrides: ['hydra.verbose=true', 'config=pretrain/rotnet/rotnet_8gpu_resnet', 'config.DATA.TRAIN.DATA_SOURCES=[disk_folder]', 'config.DATA.TRAIN.LABEL_SOURCES=[disk_folder]', 'config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder]', 'config.DATA.TRAIN.DATA_PATHS=[dummy_data/train]', 'config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=2', 'config.DATA.TEST.DATA_SOURCES=[disk_folder]', 'config.DATA.TEST.LABEL_SOURCES=[disk_folder]', 'config.DATA.TEST.DATASET_NAMES=[dummy_data_folder]', 'config.DATA.TEST.DATA_PATHS=[dummy_data/val]', 'config.DATA.TEST.BATCHSIZE_PER_REPLICA=2', 'config.DISTRIBUTED.NUM_NODES=1', 'config.DISTRIBUTED.NUM_PROC_PER_NODE=1', 'config.OPTIMIZER.num_epochs=2', 'config.OPTIMIZER.param_schedulers.lr.values=[0.01,0.001]', 'config.OPTIMIZER.param_schedulers.lr.milestones=[1]', 'config.CHECKPOINT.DIR=./checkpoints', 'hydra.verbose=true']
INFO 2021-04-09 05:46:13,602 __init__.py: 34: Provided Config has latest version: 1
INFO 2021-04-09 05:46:13,603 run_distributed_engines.py: 163: Spawning process for node_id: 0, local_rank: 0, dist_rank: 0, dist_run_id: localhost:56173
INFO 2021-04-09 05:46:13,603 train.py: 67: Env set for rank: 0, dist_rank: 0
INFO 2021-04-09 05:46:13,603 env.py: 38: CONDA_DEFAULT_ENV: vissl_2
INFO 2021-04-09 05:46:13,603 env.py: 38: CONDA_EXE: /home/ec2-user/miniconda3/bin/conda
INFO 2021-04-09 05:46:13,603 env.py: 38: CONDA_PREFIX: /home/ec2-user/miniconda3/envs/vissl_2
INFO 2021-04-09 05:46:13,603 env.py: 38: CONDA_PREFIX_1: /home/ec2-user/miniconda3
INFO 2021-04-09 05:46:13,603 env.py: 38: CONDA_PROMPT_MODIFIER: (vissl_2)
INFO 2021-04-09 05:46:13,603 env.py: 38: CONDA_PYTHON_EXE: /home/ec2-user/miniconda3/bin/python
INFO 2021-04-09 05:46:13,603 env.py: 38: CONDA_SHLVL: 2
INFO 2021-04-09 05:46:13,603 env.py: 38: HISTCONTROL: ignoredups
INFO 2021-04-09 05:46:13,603 env.py: 38: HISTSIZE: 1000
INFO 2021-04-09 05:46:13,603 env.py: 38: HOME: /home/ec2-user
INFO 2021-04-09 05:46:13,603 env.py: 38: HOSTNAME: ip-10-0-6-212.vpc.internal
INFO 2021-04-09 05:46:13,603 env.py: 38: LANG: en_US.UTF-8
INFO 2021-04-09 05:46:13,603 env.py: 38: LESSOPEN: ||/usr/bin/lesspipe.sh %s
INFO 2021-04-09 05:46:13,603 env.py: 38: LOCAL_RANK: 0
INFO 2021-04-09 05:46:13,604 env.py: 38: LOGNAME: ec2-user
INFO 2021-04-09 05:46:13,604 env.py: 38: LS_COLORS: rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
INFO 2021-04-09 05:46:13,604 env.py: 38: MAIL: /var/spool/mail/ec2-user
INFO 2021-04-09 05:46:13,604 env.py: 38: OLDPWD: /home/ec2-user/vissl/configs
INFO 2021-04-09 05:46:13,604 env.py: 38: PATH: /usr/local/cuda-11.2/bin:/usr/local/cuda-11.2/bin:/home/ec2-user/miniconda3/envs/vissl_2/bin:/home/ec2-user/miniconda3/condabin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/aws/bin:/home/ec2-user/miniconda3/bin:/home/ec2-user/.local/bin:/home/ec2-user/bin:/opt/aws/bin:/home/ec2-user/miniconda3/bin:/home/ec2-user/.local/bin:/home/ec2-user/bin
INFO 2021-04-09 05:46:13,604 env.py: 38: PWD: /home/ec2-user/vissl
INFO 2021-04-09 05:46:13,604 env.py: 38: RANK: 0
INFO 2021-04-09 05:46:13,604 env.py: 38: SELINUX_LEVEL_REQUESTED:
INFO 2021-04-09 05:46:13,604 env.py: 38: SELINUX_ROLE_REQUESTED:
INFO 2021-04-09 05:46:13,604 env.py: 38: SELINUX_USE_CURRENT_RANGE:
INFO 2021-04-09 05:46:13,604 env.py: 38: SHELL: /bin/bash
INFO 2021-04-09 05:46:13,604 env.py: 38: SHLVL: 2
INFO 2021-04-09 05:46:13,604 env.py: 38: SSH_CLIENT: 207.207.163.8 60744 22
INFO 2021-04-09 05:46:13,604 env.py: 38: SSH_CONNECTION: 207.207.163.8 60744 10.0.6.212 22
INFO 2021-04-09 05:46:13,604 env.py: 38: SSH_TTY: /dev/pts/0
INFO 2021-04-09 05:46:13,604 env.py: 38: TERM: screen
INFO 2021-04-09 05:46:13,604 env.py: 38: TMUX: /tmp/tmux-1000/default,12776,0
INFO 2021-04-09 05:46:13,604 env.py: 38: TMUX_PANE: %0
INFO 2021-04-09 05:46:13,604 env.py: 38: USER: ec2-user
INFO 2021-04-09 05:46:13,604 env.py: 38: WORLD_SIZE: 1
INFO 2021-04-09 05:46:13,604 env.py: 38: XDG_RUNTIME_DIR: /run/user/1000
INFO 2021-04-09 05:46:13,604 env.py: 38: XDG_SESSION_ID: 53
INFO 2021-04-09 05:46:13,604 env.py: 38: _: /home/ec2-user/miniconda3/envs/vissl_2/bin/python3
INFO 2021-04-09 05:46:13,604 env.py: 38: _CE_CONDA:
INFO 2021-04-09 05:46:13,604 env.py: 38: _CE_M:
INFO 2021-04-09 05:46:13,604 misc.py: 133: Set start method of multiprocessing to forkserver
INFO 2021-04-09 05:46:13,604 train.py: 78: Setting seed....
INFO 2021-04-09 05:46:13,604 misc.py: 146: MACHINE SEED: 1
INFO 2021-04-09 05:46:13,742 hydra_config.py: 63: Training with config:
INFO 2021-04-09 05:46:13,747 hydra_config.py: 67: {'CHECKPOINT': {'APPEND_DISTR_RUN_ID': False,
'AUTO_RESUME': True,
'BACKEND': 'disk',
'CHECKPOINT_FREQUENCY': 1,
'CHECKPOINT_ITER_FREQUENCY': -1,
'DIR': './checkpoints',
'LATEST_CHECKPOINT_RESUME_FILE_NUM': 1,
'OVERWRITE_EXISTING': False,
'USE_SYMLINK_CHECKPOINT_FOR_RESUME': False},
'CLUSTERFIT': {'CLUSTER_BACKEND': 'faiss',
'FEATURES': {'DATASET_NAME': '',
'DATA_PARTITION': 'TRAIN',
'LAYER_NAME': ''},
'NUM_CLUSTERS': 16000,
'N_ITER': 50},
'DATA': {'DDP_BUCKET_CAP_MB': 25,
'ENABLE_ASYNC_GPU_COPY': True,
'NUM_DATALOADER_WORKERS': 1,
'PIN_MEMORY': True,
'TEST': {'BATCHSIZE_PER_REPLICA': 2,
'COLLATE_FUNCTION': 'default_collate',
'COLLATE_FUNCTION_PARAMS': {},
'COPY_DESTINATION_DIR': '',
'COPY_TO_LOCAL_DISK': False,
'DATASET_NAMES': ['dummy_data_folder'],
'DATA_LIMIT': -1,
'DATA_LIMIT_SAMPLING': {'IS_BALANCED': False,
'SEED': 0,
'SKIP_NUM_SAMPLES': 0},
'DATA_PATHS': ['dummy_data/val'],
'DATA_SOURCES': ['disk_folder'],
'DEFAULT_GRAY_IMG_SIZE': 224,
'DROP_LAST': False,
'ENABLE_QUEUE_DATASET': False,
'INPUT_KEY_NAMES': ['data'],
'LABEL_PATHS': [],
'LABEL_SOURCES': ['disk_folder'],
'LABEL_TYPE': 'standard',
'MMAP_MODE': True,
'NEW_IMG_PATH_PREFIX': '',
'REMOVE_IMG_PATH_PREFIX': '',
'TARGET_KEY_NAMES': ['label'],
'TRANSFORMS': [{'name': 'ImgRotatePil'},
{'name': 'Resize', 'size': 256},
{'name': 'CenterCrop', 'size': 224},
{'name': 'ToTensor'},
{'mean': [0.485, 0.456, 0.406],
'name': 'Normalize',
'std': [0.229, 0.224, 0.225]}],
'USE_STATEFUL_DISTRIBUTED_SAMPLER': False},
'TRAIN': {'BATCHSIZE_PER_REPLICA': 2,
'COLLATE_FUNCTION': 'default_collate',
'COLLATE_FUNCTION_PARAMS': {},
'COPY_DESTINATION_DIR': '',
'COPY_TO_LOCAL_DISK': False,
'DATASET_NAMES': ['dummy_data_folder'],
'DATA_LIMIT': -1,
'DATA_LIMIT_SAMPLING': {'IS_BALANCED': False,
'SEED': 0,
'SKIP_NUM_SAMPLES': 0},
'DATA_PATHS': ['dummy_data/train'],
'DATA_SOURCES': ['disk_folder'],
'DEFAULT_GRAY_IMG_SIZE': 224,
'DROP_LAST': False,
'ENABLE_QUEUE_DATASET': False,
'INPUT_KEY_NAMES': ['data'],
'LABEL_PATHS': [],
'LABEL_SOURCES': ['disk_folder'],
'LABEL_TYPE': 'standard',
'MMAP_MODE': True,
'NEW_IMG_PATH_PREFIX': '',
'REMOVE_IMG_PATH_PREFIX': '',
'TARGET_KEY_NAMES': ['label'],
'TRANSFORMS': [{'name': 'ImgRotatePil'},
{'name': 'RandomResizedCrop', 'size': 224},
{'name': 'RandomHorizontalFlip'},
{'name': 'ToTensor'},
{'mean': [0.485, 0.456, 0.406],
'name': 'Normalize',
'std': [0.229, 0.224, 0.225]}],
'USE_STATEFUL_DISTRIBUTED_SAMPLER': False}},
'DISTRIBUTED': {'BACKEND': 'nccl',
'BROADCAST_BUFFERS': True,
'INIT_METHOD': 'tcp',
'MANUAL_GRADIENT_REDUCTION': False,
'NCCL_DEBUG': False,
'NCCL_SOCKET_NTHREADS': '',
'NUM_NODES': 1,
'NUM_PROC_PER_NODE': 1,
'RUN_ID': 'auto'},
'HOOKS': {'LOG_GPU_STATS': True,
'MEMORY_SUMMARY': {'LOG_ITERATION_NUM': 0,
'PRINT_MEMORY_SUMMARY': True},
'MODEL_COMPLEXITY': {'COMPUTE_COMPLEXITY': False,
'INPUT_SHAPE': [3, 224, 224]},
'PERF_STATS': {'MONITOR_PERF_STATS': False,
'PERF_STAT_FREQUENCY': -1,
'ROLLING_BTIME_FREQ': -1},
'TENSORBOARD_SETUP': {'EXPERIMENT_LOG_DIR': 'tensorboard',
'FLUSH_EVERY_N_MIN': 5,
'LOG_DIR': '.',
'LOG_PARAMS': True,
'LOG_PARAMS_EVERY_N_ITERS': 310,
'LOG_PARAMS_GRADIENTS': True,
'USE_TENSORBOARD': False}},
'IMG_RETRIEVAL': {'DATASET_PATH': '',
'EVAL_BINARY_PATH': '',
'EVAL_DATASET_NAME': 'Paris',
'FEATS_PROCESSING_TYPE': '',
'GEM_POOL_POWER': 4.0,
'N_PCA': 512,
'RESIZE_IMG': 1024,
'SHOULD_TRAIN_PCA_OR_WHITENING': True,
'SPATIAL_LEVELS': 3,
'TEMP_DIR': '/tmp/instance_retrieval/',
'TRAIN_DATASET_NAME': 'Oxford',
'WHITEN_IMG_LIST': ''},
'LOG_FREQUENCY': 100,
'LOSS': {'CrossEntropyLoss': {'ignore_index': -1},
'bce_logits_multiple_output_single_target': {'normalize_output': False,
'reduction': 'none',
'world_size': 1},
'cross_entropy_multiple_output_single_target': {'ignore_index': -1,
'normalize_output': False,
'reduction': 'mean',
'temperature': 1.0,
'weight': None},
'deepclusterv2_loss': {'BATCHSIZE_PER_REPLICA': 256,
'DROP_LAST': True,
'kmeans_iters': 10,
'memory_params': {'crops_for_mb': [0],
'embedding_dim': 128},
'num_clusters': [3000, 3000, 3000],
'num_crops': 2,
'num_train_samples': -1,
'temperature': 0.1},
'ignore_index': -1,
'moco_loss': {'embedding_dim': 128,
'momentum': 0.999,
'queue_size': 65536,
'temperature': 0.2},
'multicrop_simclr_info_nce_loss': {'buffer_params': {'effective_batch_size': 4096,
'embedding_dim': 128,
'world_size': 64},
'num_crops': 2,
'temperature': 0.1},
'name': 'cross_entropy_multiple_output_single_target',
'nce_loss_with_memory': {'loss_type': 'nce',
'loss_weights': [1.0],
'memory_params': {'embedding_dim': 128,
'memory_size': -1,
'momentum': 0.5,
'norm_init': True,
'update_mem_on_forward': True},
'negative_sampling_params': {'num_negatives': 16000,
'type': 'random'},
'norm_constant': -1,
'norm_embedding': True,
'num_train_samples': -1,
'temperature': 0.07,
'update_mem_with_emb_index': -100},
'simclr_info_nce_loss': {'buffer_params': {'effective_batch_size': 4096,
'embedding_dim': 128,
'world_size': 64},
'temperature': 0.1},
'swav_loss': {'crops_for_assign': [0, 1],
'embedding_dim': 128,
'epsilon': 0.05,
'normalize_last_layer': True,
'num_crops': 2,
'num_iters': 3,
'num_prototypes': [3000],
'output_dir': '.',
'queue': {'local_queue_length': 0,
'queue_length': 0,
'start_iter': 0},
'temp_hard_assignment_iters': 0,
'temperature': 0.1,
'use_double_precision': False},
'swav_momentum_loss': {'crops_for_assign': [0, 1],
'embedding_dim': 128,
'epsilon': 0.05,
'momentum': 0.99,
'momentum_eval_mode_iter_start': 0,
'normalize_last_layer': True,
'num_crops': 2,
'num_iters': 3,
'num_prototypes': [3000],
'queue': {'local_queue_length': 0,
'queue_length': 0,
'start_iter': 0},
'temperature': 0.1,
'use_double_precision': False}},
'MACHINE': {'DEVICE': 'gpu'},
'METERS': {'accuracy_list_meter': {'meter_names': [],
'num_meters': 1,
'topk_values': [1]},
'enable_training_meter': True,
'mean_ap_list_meter': {'max_cpu_capacity': -1,
'meter_names': [],
'num_classes': 9605,
'num_meters': 1},
'name': 'accuracy_list_meter'},
'MODEL': {'ACTIVATION_CHECKPOINTING': {'NUM_ACTIVATION_CHECKPOINTING_SPLITS': 2,
'USE_ACTIVATION_CHECKPOINTING': False},
'AMP_PARAMS': {'AMP_ARGS': {'opt_level': 'O1'},
'AMP_TYPE': 'apex',
'USE_AMP': False},
'CUDA_CACHE': {'CLEAR_CUDA_CACHE': False, 'CLEAR_FREQ': 100},
'FEATURE_EVAL_SETTINGS': {'EVAL_MODE_ON': False,
'EVAL_TRUNK_AND_HEAD': False,
'EXTRACT_TRUNK_FEATURES_ONLY': False,
'FREEZE_TRUNK_AND_HEAD': False,
'FREEZE_TRUNK_ONLY': False,
'LINEAR_EVAL_FEAT_POOL_OPS_MAP': [],
'SHOULD_FLATTEN_FEATS': True},
'FSDP_CONFIG': {'flatten_parameters': True,
'fp32_reduce_scatter': False,
'mixed_precision': True},
'GRAD_CLIP': {'MAX_NORM': 1, 'NORM_TYPE': 2, 'USE_GRAD_CLIP': False},
'HEAD': {'BATCHNORM_EPS': 1e-05,
'BATCHNORM_MOMENTUM': 0.1,
'INPLACE_RELU': True,
'PARAMS': [['mlp', {'dims': [2048, 4]}]],
'PARAMS_MULTIPLIER': 1.0},
'INPUT_TYPE': 'rgb',
'MULTI_INPUT_HEAD_MAPPING': [],
'NON_TRAINABLE_PARAMS': [],
'SHARDED_DDP_SETUP': {'reduce_buffer_size': -1},
'SINGLE_PASS_EVERY_CROP': False,
'SYNC_BN_CONFIG': {'CONVERT_BN_TO_SYNC_BN': False,
'GROUP_SIZE': -1,
'SYNC_BN_TYPE': 'pytorch'},
'TEMP_FROZEN_PARAMS_ITER_MAP': [],
'TRUNK': {'NAME': 'resnet',
'TRUNK_PARAMS': {'EFFICIENT_NETS': {},
'REGNET': {},
'RESNETS': {'DEPTH': 50,
'GROUPNORM_GROUPS': 32,
'GROUPS': 1,
'LAYER4_STRIDE': 2,
'NORM': 'BatchNorm',
'STANDARDIZE_CONVOLUTIONS': False,
'WIDTH_MULTIPLIER': 1,
'WIDTH_PER_GROUP': 64,
'ZERO_INIT_RESIDUAL': False},
'VISION_TRANSFORMERS': {'ATTENTION_DROPOUT_RATE': 0,
'CLASSIFIER': 'token',
'DROPOUT_RATE': 0,
'DROP_PATH_RATE': 0,
'HIDDEN_DIM': 768,
'IMAGE_SIZE': 224,
'MLP_DIM': 3072,
'NUM_HEADS': 12,
'NUM_LAYERS': 12,
'PATCH_SIZE': 16,
'QKV_BIAS': False,
'QK_SCALE': False,
'name': None}}},
'WEIGHTS_INIT': {'APPEND_PREFIX': '',
'PARAMS_FILE': '',
'REMOVE_PREFIX': '',
'SKIP_LAYERS': ['num_batches_tracked'],
'STATE_DICT_KEY_NAME': 'classy_state_dict'}},
'MULTI_PROCESSING_METHOD': 'forkserver',
'NEAREST_NEIGHBOR': {'L2_NORM_FEATS': False, 'SIGMA': 0.1, 'TOPK': 200},
'OPTIMIZER': {'betas': [0.9, 0.999],
'construct_single_param_group_only': False,
'head_optimizer_params': {'use_different_lr': False,
'use_different_wd': False,
'weight_decay': 0.0001},
'larc_config': {'clip': False,
'eps': 1e-08,
'trust_coefficient': 0.001},
'momentum': 0.9,
'name': 'sgd',
'nesterov': False,
'non_regularized_parameters': [],
'num_epochs': 2,
'param_schedulers': {'lr': {'auto_lr_scaling': {'auto_scale': True,
'base_lr_batch_size': 1,
'base_value': 0.1},
'end_value': 0.0,
'interval_scaling': [],
'lengths': [],
'milestones': [1],
'name': 'multistep',
'schedulers': [],
'start_value': 0.1,
'update_interval': 'epoch',
'value': 0.1,
'values': [0.2, 0.02]},
'lr_head': {'auto_lr_scaling': {'auto_scale': True,
'base_lr_batch_size': 1,
'base_value': 0.1},
'end_value': 0.0,
'interval_scaling': [],
'lengths': [],
'milestones': [1],
'name': 'multistep',
'schedulers': [],
'start_value': 0.1,
'update_interval': 'epoch',
'value': 0.1,
'values': [0.2, 0.02]}},
'regularize_bias': True,
'regularize_bn': False,
'use_larc': False,
'use_zero': False,
'weight_decay': 0.0001},
'SEED_VALUE': 1,
'SLURM': {'COMMENT': 'vissl job',
'CONSTRAINT': '',
'LOG_FOLDER': '.',
'MEM_GB': 250,
'NAME': 'vissl',
'PARTITION': 'learnfair',
'PORT_ID': 40050,
'TIME_HOURS': 72,
'USE_SLURM': False},
'SVM': {'cls_list': [],
'costs': {'base': -1.0,
'costs_list': [0.1, 0.01],
'power_range': [4, 20]},
'cross_val_folds': 3,
'dual': True,
'force_retrain': False,
'loss': 'squared_hinge',
'low_shot': {'dataset_name': 'voc',
'k_values': [1, 2, 4, 8, 16, 32, 64, 96],
'sample_inds': [1, 2, 3, 4, 5]},
'max_iter': 2000,
'normalize': True,
'penalty': 'l2'},
'TEST_EVERY_NUM_EPOCH': 5,
'TEST_MODEL': True,
'TEST_ONLY': False,
'TRAINER': {'TASK_NAME': 'self_supervision_task',
'TRAIN_STEP_NAME': 'standard_train_step'},
'VERBOSE': False}
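The overrides requested `lr.values=[0.01,0.001]`, yet the resolved config prints `'values': [0.2, 0.02]`. That is consistent with `auto_lr_scaling` (`auto_scale: True`, `base_value: 0.1`, `base_lr_batch_size: 1`) rescaling the schedule for a global batch of 2 images (1 node x 1 process x `BATCHSIZE_PER_REPLICA=2`). A minimal sketch of that arithmetic, assuming linear scaling and a constant decay ratio between steps; the function name `scale_multistep_lr` is hypothetical and is not VISSL's API:

```python
def scale_multistep_lr(values, base_value, base_lr_batch_size, global_batch_size):
    """Hypothetical helper: rescale a multistep LR schedule by batch size."""
    # Linear scaling rule: the LR grows in proportion to the global batch size.
    scaled_lr = base_value * global_batch_size / base_lr_batch_size
    # Preserve the decay ratio between consecutive steps of the schedule.
    gamma = round(values[1] / values[0], 6) if len(values) > 1 else 1.0
    return [round(scaled_lr * gamma ** i, 8) for i in range(len(values))]


# Values taken from the overrides and the resolved config above.
global_batch_size = 2 * 1 * 1  # BATCHSIZE_PER_REPLICA * NUM_PROC_PER_NODE * NUM_NODES
scaled = scale_multistep_lr([0.01, 0.001], base_value=0.1,
                            base_lr_batch_size=1,
                            global_batch_size=global_batch_size)
print(scaled)
```

Under these assumptions the sketch reproduces the logged schedule, so the first epoch would run at 0.2 and the second (after milestone 1) at 0.02.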
INFO 2021-04-09 05:46:14,206 train.py: 90: System config:
-------------------  -------------------------------------------------------------------------------------------
sys.platform         linux
Python               3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
numpy                1.19.5
Pillow               8.2.0
vissl                0.1.5 @/home/ec2-user/vissl/vissl
GPU available        True
GPU 0,1,2,3          Tesla T4
CUDA_HOME            /usr/local/cuda-11.2
torchvision          0.9.1+cu102 @/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/torchvision
hydra                1.0.6 @/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/hydra
classy_vision        0.6.0.dev @/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/classy_vision
apex                 0.1 @/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/apex
cv2                  4.5.1
PyTorch              1.8.1+cu102 @/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/torch
PyTorch debug build  False
-------------------  -------------------------------------------------------------------------------------------
PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
CPU info:
-------------------  ----------------------------------------------
Architecture         x86_64
CPU op-mode(s)       32-bit, 64-bit
Byte Order           Little Endian
CPU(s)               48
On-line CPU(s) list  0-47
Thread(s) per core   2
Core(s) per socket   24
Socket(s)            1
NUMA node(s)         1
Vendor ID            GenuineIntel
CPU family           6
Model                85
Model name           Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping             7
CPU MHz              2998.569
BogoMIPS             4999.99
Hypervisor vendor    KVM
Virtualization type  full
L1d cache            32K
L1i cache            32K
L2 cache             1024K
L3 cache             36608K
NUMA node0 CPU(s)    0-47
-------------------  ----------------------------------------------
INFO 2021-04-09 05:46:14,207 train_task.py: 194: Not using Automatic Mixed Precision
INFO 2021-04-09 05:46:14,207 trainer_main.py: 109: Using Distributed init method: tcp://localhost:56173, world_size: 1, rank: 0
INFO 2021-04-09 05:46:14,209 distributed_c10d.py: 187: Added key: store_based_barrier_key:1 to store for rank: 0
INFO 2021-04-09 05:46:14,209 trainer_main.py: 130: | initialized host ip-10-0-6-212.vpc.internal as rank 0 (0)
INFO 2021-04-09 05:46:14,209 img_rotate_pil.py: 56: ImgRotatePil | Using num_angles: 4
INFO 2021-04-09 05:46:14,209 img_rotate_pil.py: 58: ImgRotatePil | Using num_rotations_per_img: 1
INFO 2021-04-09 05:46:14,210 ssl_dataset.py: 153: Rank: 0 split: TEST Data files:
['dummy_data/val']
INFO 2021-04-09 05:46:14,210 ssl_dataset.py: 156: Rank: 0 split: TEST Label files:
['dummy_data/val']
INFO 2021-04-09 05:46:14,210 disk_dataset.py: 83: Loaded 10 samples from folder dummy_data/val
INFO 2021-04-09 05:46:14,211 img_rotate_pil.py: 56: ImgRotatePil | Using num_angles: 4
INFO 2021-04-09 05:46:14,211 img_rotate_pil.py: 58: ImgRotatePil | Using num_rotations_per_img: 1
INFO 2021-04-09 05:46:14,211 ssl_dataset.py: 153: Rank: 0 split: TRAIN Data files:
['dummy_data/train']
INFO 2021-04-09 05:46:14,211 ssl_dataset.py: 156: Rank: 0 split: TRAIN Label files:
['dummy_data/train']
INFO 2021-04-09 05:46:14,211 disk_dataset.py: 83: Loaded 10 samples from folder dummy_data/train
INFO 2021-04-09 05:46:14,211 misc.py: 133: Set start method of multiprocessing to forkserver
INFO 2021-04-09 05:46:14,211 __init__.py: 109: Created the Distributed Sampler....
INFO 2021-04-09 05:46:14,211 __init__.py: 90: Distributed Sampler config:
{'num_replicas': 1, 'rank': 0, 'epoch': 0, 'num_samples': 10, 'total_size': 10, 'shuffle': True, 'seed': 0}
INFO 2021-04-09 05:46:14,212 __init__.py: 173: Wrapping the dataloader to async device copies
INFO 2021-04-09 05:46:17,220 misc.py: 133: Set start method of multiprocessing to forkserver
INFO 2021-04-09 05:46:17,220 __init__.py: 109: Created the Distributed Sampler....
INFO 2021-04-09 05:46:17,220 __init__.py: 90: Distributed Sampler config:
{'num_replicas': 1, 'rank': 0, 'epoch': 0, 'num_samples': 10, 'total_size': 10, 'shuffle': True, 'seed': 0}
INFO 2021-04-09 05:46:17,220 __init__.py: 173: Wrapping the dataloader to async device copies
INFO 2021-04-09 05:46:17,220 train_task.py: 422: Building model....
INFO 2021-04-09 05:46:17,220 resnext.py: 63: ResNeXT trunk, supports activation checkpointing. Deactivated
INFO 2021-04-09 05:46:17,221 resnext.py: 83: Building model: ResNeXt50-1x64d-w1-BatchNorm2d
INFO 2021-04-09 05:46:17,751 train_task.py: 596: Broadcast model BN buffers from master on every forward pass
INFO 2021-04-09 05:46:17,751 classification_task.py: 377: Synchronized Batch Normalization is disabled
INFO 2021-04-09 05:46:17,751 train_task.py: 342: Building loss...
INFO 2021-04-09 05:46:17,788 optimizer_helper.py: 254:
Trainable params: 161,
Non-Trainable params: 0,
Trunk Regularized Parameters: 53,
Trunk Unregularized Parameters 106,
Head Regularized Parameters: 2,
Head Unregularized Parameters: 0
Remaining Regularized Parameters: 0
Remaining Unregularized Parameters: 0
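The counts above appear to be numbers of parameter tensors rather than scalar weights, and they can be cross-checked against the ResNet-50 layout printed in the model dump in this log (3, 4, 6, and 3 bottleneck blocks for `DEPTH: 50`). A rough sketch of the tally, assuming each convolution contributes one weight tensor (`bias=False`), each BatchNorm layer contributes a weight and a bias tensor (unregularized, since `regularize_bn` is `False`), and the `mlp` head with `dims: [2048, 4]` is a single linear layer:

```python
blocks_per_stage = [3, 4, 6, 3]  # layer1..layer4 of ResNet-50

# One stem conv, three convs per bottleneck, one downsample conv per stage.
num_convs = 1 + 3 * sum(blocks_per_stage) + len(blocks_per_stage)
# Every conv is followed by a BatchNorm2d with a weight and a bias tensor.
num_bn_tensors = 2 * num_convs
# Assumed head: one Linear(2048, 4) with a weight and a bias tensor.
num_head_tensors = 2

total = num_convs + num_bn_tensors + num_head_tensors
print(num_convs, num_bn_tensors, num_head_tensors, total)
```

Under these assumptions the tally matches the log: 53 regularized trunk tensors, 106 unregularized trunk tensors, 2 head tensors, 161 in total.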
INFO 2021-04-09 05:46:17,789 trainer_main.py: 246: Training 2 epochs. One epoch = 5 iterations
INFO 2021-04-09 05:46:17,789 trainer_main.py: 248: Total 10 iterations for training
INFO 2021-04-09 05:46:17,789 trainer_main.py: 249: Total 10 samples in one epoch
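These totals follow from the dataset size and batch settings reported earlier in the log: 10 training samples, `BATCHSIZE_PER_REPLICA=2`, and a world size of 1. A small sketch of the arithmetic, assuming the last partial batch is kept (`DROP_LAST: False`), which makes no difference here because 10 divides evenly by 2:

```python
import math

num_samples = 10       # "Loaded 10 samples from folder dummy_data/train"
batch_per_replica = 2  # config.DATA.TRAIN.BATCHSIZE_PER_REPLICA
world_size = 1         # NUM_NODES * NUM_PROC_PER_NODE
num_epochs = 2         # config.OPTIMIZER.num_epochs

# Each iteration consumes one batch per replica across all replicas.
iters_per_epoch = math.ceil(num_samples / (batch_per_replica * world_size))
total_iters = iters_per_epoch * num_epochs
print(iters_per_epoch, total_iters)
```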
INFO 2021-04-09 05:46:18,039 logger.py: 80: Fri Apr 9 05:46:17 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1B.0 Off |                    0 |
| N/A   24C    P0    25W /  70W |   1166MiB / 15109MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:00:1C.0 Off |                    0 |
| N/A   24C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            On   | 00000000:00:1D.0 Off |                    0 |
| N/A   23C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   24C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     36191      C   python3                          1163MiB |
+-----------------------------------------------------------------------------+
INFO 2021-04-09 05:46:18,040 trainer_main.py: 166: Model is:
Classy <class 'vissl.models.base_ssl_model.BaseSSLMultiInputOutputModel'>:
BaseSSLMultiInputOutputModel(
(_heads): ModuleDict()
(trunk): ResNeXt(
(_feature_blocks): ModuleDict(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv1_relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): Bottleneck(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer2): Sequential(
(0): Bottleneck(
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer3): Sequential(
(0): Bottleneck(
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(4): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(5): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer4): Sequential(
(0): Bottleneck(
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(<SUPPORTED_L4_STRIDE.two: 2>, <SUPPORTED_L4_STRIDE.two: 2>), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(<SUPPORTED_L4_STRIDE.two: 2>, <SUPPORTED_L4_STRIDE.two: 2>), bias=False)
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(flatten): Flatten()
)
)
(heads): ModuleList(
(0): MLP(
(clf): Sequential(
(0): Linear(in_features=2048, out_features=4, bias=True)
)
)
)
)
INFO 2021-04-09 05:46:18,040 trainer_main.py: 167: Loss is: CrossEntropyMultipleOutputSingleTargetLoss(
(_losses): ModuleList()
)
INFO 2021-04-09 05:46:18,041 trainer_main.py: 168: Starting training....
INFO 2021-04-09 05:46:18,041 __init__.py: 90: Distributed Sampler config:
{'num_replicas': 1, 'rank': 0, 'epoch': 0, 'num_samples': 10, 'total_size': 10, 'shuffle': True, 'seed': 0}
** fvcore version of PathManager will be deprecated soon. **
** Please migrate to the version in iopath repo. **
https://github.com/facebookresearch/iopath
Traceback (most recent call last):
File "run_distributed_engines.py", line 194, in <module>
hydra_main(overrides=overrides)
File "run_distributed_engines.py", line 179, in hydra_main
hook_generator=default_hook_generator,
File "run_distributed_engines.py", line 123, in launch_distributed
hook_generator=hook_generator,
File "run_distributed_engines.py", line 166, in _distributed_worker
process_main(cfg, dist_run_id, local_rank=local_rank, node_id=node_id)
File "run_distributed_engines.py", line 159, in process_main
hook_generator=hook_generator,
File "/home/ec2-user/vissl/vissl/engines/train.py", line 103, in train_main
trainer.train()
File "/home/ec2-user/vissl/vissl/trainer/trainer_main.py", line 171, in train
self._advance_phase(task) # advances task.phase_idx
File "/home/ec2-user/vissl/vissl/trainer/trainer_main.py", line 291, in _advance_phase
phase_type, epoch=task.phase_idx, compute_start_iter=compute_start_iter
File "/home/ec2-user/vissl/vissl/trainer/train_task.py", line 506, in recreate_data_iterator
self.data_iterator = iter(self.dataloaders[phase_type])
File "/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/classy_vision/dataset/dataloader_async_gpu_wrapper.py", line 40, in __iter__
self.preload()
File "/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/classy_vision/dataset/dataloader_async_gpu_wrapper.py", line 46, in preload
self.cache_next = next(self._iter)
File "/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ec2-user/vissl/vissl/data/ssl_dataset.py", line 355, in __getitem__
item = self.transform(item)
File "/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 60, in __call__
img = t(img)
File "/home/ec2-user/vissl/vissl/data/ssl_transforms/__init__.py", line 144, in __call__
output = self.transform(sample["data"][idx])
File "/home/ec2-user/vissl/vissl/data/ssl_transforms/img_rotate_pil.py", line 39, in __call__
img = TF.rotate(image, self.angles[label])
File "/home/ec2-user/miniconda3/envs/vissl_2/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 949, in rotate
raise TypeError("Argument angle should be int or float")
TypeError: Argument angle should be int or float
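A possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the value passed as `self.angles[label]` may be numeric but not a built-in Python `int` or `float` (for example, an element of a tensor or array), so a strict type check rejects it. The snippet below reproduces that situation using only the standard library. `ScalarLike`, `check_angle`, and the angle table `0, 90, 180, 270` are hypothetical stand-ins, not code from VISSL or torchvision. The check is an assumed approximation that reuses only the error message shown above.

```python
class ScalarLike:
    """Hypothetical stand-in for a numeric scalar that is not a Python int/float
    (e.g. a single element taken from a tensor or array)."""

    def __init__(self, value):
        self._value = value

    def __float__(self):
        # Allows an explicit float(...) conversion, as many scalar wrappers do.
        return float(self._value)


def check_angle(angle):
    """Assumed approximation of the type check behind the error in the traceback."""
    if not isinstance(angle, (int, float)):
        raise TypeError("Argument angle should be int or float")
    return angle


# Hypothetical angle table with four entries, one per rotation class.
angles = [ScalarLike(a) for a in (0, 90, 180, 270)]
label = 1

# Passing the wrapped scalar directly fails the strict type check.
try:
    check_angle(angles[label])
    failed = False
except TypeError:
    failed = True

# Converting to a built-in float first satisfies the check.
fixed = check_angle(float(angles[label]))
print(failed, fixed)
```

If this is what happens in `img_rotate_pil.py`, converting the selected angle to a plain Python number before the `TF.rotate` call would be one way to avoid the error. Whether that matches the actual code would need to be checked against the installed VISSL and torchvision versions.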