zhugekongkong / sgg-g2s
License: Other
Hi Yuyu,
Sorry about posting yet another question for today. I'm going through the code but I don't see any direct mention of the target domain construction that's central to BPL. I do see that the eval code uses information_content loaded from VG-SGG-dicts-with-attri-info.json. I have a couple of questions for you:
VG-SGG-dicts-with-attri-info.json and its Wikipedia counterpart? No new training targets?
VG-SGG-dicts-with-attri-info.json with the information content is only used for the recall calculations at eval. How is BPL used at train? Or is it just the dynamic adjustment of targets through additional linear layers (post_cat_clean and rel_compress_clean) in predictor meta-models on top of the underlying model with the pred_adj_nor adjustment?
with_clean_classifier and with_transfer logic, which is basically the BPL fine-tuning code. By default, the codebase seems to use SA and BPL at the same time. I'm wondering how one can do one without the other.
Hi,
I saw that you had GCN layers defined in the code but you didn't seem to end up using them. Is that correct? If so, why did you comment them out? I'm interested in using GCN in SGG.
The relevant code is at
Got this error. Any advice?
(sgb) zhanwen@zhanwen-Legion-7-16ITHg6:~/sgb/bpl$ bash scripts/train.sh 0
TRAINING Predcls
mkdir: cannot create directory ‘./checkpoints/transformer_predcls_dist15_3k_FixPModel_lr1e3_B16_FCMat/’: No such file or directory
cp: cannot create regular file './checkpoints/transformer_predcls_dist15_3k_FixPModel_lr1e3_B16_FCMat/': No such file or directory
cp: cannot create regular file './checkpoints/transformer_predcls_dist15_3k_FixPModel_lr1e3_B16_FCMat/': No such file or directory
cp: cannot create regular file './checkpoints/transformer_predcls_dist15_3k_FixPModel_lr1e3_B16_FCMat/': No such file or directory
cp: cannot create regular file './checkpoints/transformer_predcls_dist15_3k_FixPModel_lr1e3_B16_FCMat/': No such file or directory
cp: cannot create regular file './checkpoints/transformer_predcls_dist15_3k_FixPModel_lr1e3_B16_FCMat/': No such file or directory
Traceback (most recent call last):
  File "tools/relation_train_net.py", line 446, in <module>
    main()
  File "tools/relation_train_net.py", line 414, in main
    cfg.merge_from_list(args.opts)
  File "/home/zhanwen/anaconda3/envs/sgb/lib/python3.8/site-packages/yacs/config.py", line 243, in merge_from_list
    _assert_with_logging(subkey in d, "Non-existent key: {}".format(full_key))
  File "/home/zhanwen/anaconda3/envs/sgb/lib/python3.8/site-packages/yacs/config.py", line 545, in _assert_with_logging
    assert cond, msg
AssertionError: Non-existent key: MODEL.PRETRAINED_MODEL_CKPT
Killing subprocess 34020
Traceback (most recent call last):
  File "/home/zhanwen/anaconda3/envs/sgb/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/zhanwen/anaconda3/envs/sgb/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/zhanwen/anaconda3/envs/sgb/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/home/zhanwen/anaconda3/envs/sgb/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/home/zhanwen/anaconda3/envs/sgb/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/zhanwen/anaconda3/envs/sgb/bin/python', '-u', 'tools/relation_train_net.py', '--local_rank=0', '--config-file', 'configs/e2e_relation_X_101_32_8_FPN_1x_transformer.yaml', 'MODEL.ROI_RELATION_HEAD.USE_GT_BOX', 'True', 'MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL', 'True', 'MODEL.ROI_RELATION_HEAD.PREDICTOR', 'TransformerTransferPredictor', 'MODEL.ROI_RELATION_HEAD.PREDICT_USE_BIAS', 'True', 'DTYPE', 'float32', 'SOLVER.IMS_PER_BATCH', '16', 'TEST.IMS_PER_BATCH', '1', 'SOLVER.MAX_ITER', '16000', 'SOLVER.BASE_LR', '1e-3', 'SOLVER.SCHEDULE.TYPE', 'WarmupMultiStepLR', 'SOLVER.STEPS', '(10000, 16000)', 'SOLVER.VAL_PERIOD', '3000', 'SOLVER.CHECKPOINT_PERIOD', '2000', 'GLOVE_DIR', './datasets/vg/', 'MODEL.PRETRAINED_DETECTOR_CKPT', './checkpoints/pretrained_faster_rcnn/model_final.pth', 'MODEL.PRETRAINED_MODEL_CKPT', './checkpoints_best/transformer_predcls_float32_epoch16_batch16/model_final.pth', 'MODEL.ROI_RELATION_HEAD.WITH_CLEAN_CLASSIFIER', 'True', 'MODEL.ROI_RELATION_HEAD.WITH_TRANSFER_CLASSIFIER', 'True', 'OUTPUT_DIR', './checkpoints/transformer_predcls_dist15_3k_FixPModel_lr1e3_B16_FCMat']' returned non-zero exit status 1.
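The first traceback above comes from the config layer rejecting a command-line override for a key that has no declared default. The sketch below mimics that behaviour with a plain nested dict rather than yacs itself. The DEFAULTS table and the merge_from_list helper are hypothetical stand-ins, and this version raises KeyError where yacs raises AssertionError.

```python
import copy

# Hypothetical defaults; the real project declares its keys elsewhere.
DEFAULTS = {
    "MODEL": {"PRETRAINED_DETECTOR_CKPT": ""},
    "OUTPUT_DIR": ".",
}


def merge_from_list(cfg, opts):
    """Apply [key, value, key, value, ...] overrides to a nested dict.

    Dotted keys such as "MODEL.X" address nested entries. Unknown keys are
    rejected instead of being created silently.
    """
    if len(opts) % 2 != 0:
        raise ValueError("opts must contain key/value pairs")
    merged = copy.deepcopy(cfg)
    for full_key, value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = full_key.split(".")
        node = merged
        for part in parents:
            if not isinstance(node, dict) or part not in node:
                raise KeyError("Non-existent key: {}".format(full_key))
            node = node[part]
        if not isinstance(node, dict) or leaf not in node:
            raise KeyError("Non-existent key: {}".format(full_key))
        node[leaf] = value
    return merged


opts = ["MODEL.PRETRAINED_MODEL_CKPT", "./model_final.pth"]

# The override is rejected because the key has no declared default.
try:
    merge_from_list(DEFAULTS, opts)
    rejected = False
except KeyError:
    rejected = True

# Declaring the key in the defaults first makes the same override valid.
declared = copy.deepcopy(DEFAULTS)
declared["MODEL"]["PRETRAINED_MODEL_CKPT"] = ""
merged = merge_from_list(declared, opts)
```

In the real project the equivalent options would be to declare the key wherever the defaults are defined, or to drop the override from the command. The mkdir and cp errors at the top of the log are a separate problem: they suggest that ./checkpoints did not exist when the script ran.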
Hi Yuyu,
I noticed that in the codebase (https://github.com/ZhuGeKongKong/SGG-G2S/blob/main/tools/relation_train_net.py#L186), the training logic makes a single pass through the train loader with a max of 16000 batches. But I see from your best model that you had an epoch 16. Did you modify the code to run epochs on top of the iterations, or did you just manually run train.sh again and again?
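For context, training loops derived from maskrcnn_benchmark are usually iteration-based: a batch sampler keeps restarting the underlying data order until the requested number of iterations is reached, so one pass over the loader can span several epochs. Whether this repository does exactly that is an assumption to verify against its data loading code. A minimal sketch of the idea, with hypothetical function names and a made-up dataset size:

```python
def iteration_based_batches(batches, num_iterations, start_iter=0):
    """Yield batches, restarting from the first one, until num_iterations.

    batches must be re-iterable (for example a list), because it is
    traversed once per simulated epoch.
    """
    iteration = start_iter
    while iteration < num_iterations:
        produced = False
        for batch in batches:
            if iteration >= num_iterations:
                return
            produced = True
            iteration += 1
            yield batch
        if not produced:
            raise ValueError("batches is empty")


def approx_epochs(max_iter, batch_size, dataset_size):
    """Number of passes over the data implied by an iteration budget."""
    return max_iter * batch_size / dataset_size


# Three batches per epoch, seven iterations requested: the data is reused.
epoch = [[0, 1], [2, 3], [4]]
schedule = list(iteration_based_batches(epoch, num_iterations=7))

# Made-up dataset size, only to illustrate the arithmetic.
epochs = approx_epochs(max_iter=16000, batch_size=16, dataset_size=64000)
```

Under this reading, an epoch count in a checkpoint name could simply be derived from the iteration budget rather than from an explicit epoch loop.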
Do you have any tools for processing custom datasets (I have only .jpg pictures) for the same task? I want to use this on my own dataset, but I don't know how to create the right .json label file, because the tools used for the VG dataset were not published. I have looked at the KaihuaTang/Scene-Graph-Benchmark.pytorch code, but I have no idea how to process my custom dataset for training and testing. Thanks very much! My QQ is 2473572913.
Hi Yuyu,
Thank you so much for answering my previous questions. I also noticed that the IMPPredictor is the only predictor (except for CausalAnalysisPredictor) that doesn't use with_clean_classifier and with_transfer like the other ones. Would you please explain why this is the case? Is there any particular trait of IMPPredictor that prohibits transfer?
Sorry I can't figure this one out after fixing the last issue. Can you please take a look?
2022-02-14 20:08:56,786 maskrcnn_benchmark INFO: Saving config into: ./checkpoints/transformer_predcls_dist15_3k_FixPModel_lr1e3_B16_FCMat/config.yml
2022-02-14 20:08:56,796 maskrcnn_benchmark INFO: #################### prepare training ####################
Traceback (most recent call last):
  File "tools/relation_train_net.py", line 446, in <module>
    main()
  File "tools/relation_train_net.py", line 439, in main
    model = train(cfg, args.local_rank, args.distributed, logger)
  File "tools/relation_train_net.py", line 55, in train
    model = build_detection_model(cfg)
  File "/home/zhanwen/sgb/sgb/maskrcnn_benchmark/modeling/detector/detectors.py", line 10, in build_detection_model
    return meta_arch(cfg)
  File "/home/zhanwen/sgb/sgb/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 31, in __init__
    self.roi_heads = build_roi_heads(cfg, self.backbone.out_channels)
  File "/home/zhanwen/sgb/sgb/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 89, in build_roi_heads
    roi_heads.append(("relation", build_roi_relation_head(cfg, in_channels)))
  File "/home/zhanwen/sgb/sgb/maskrcnn_benchmark/modeling/roi_heads/relation_head/relation_head.py", line 105, in build_roi_relation_head
    return ROIRelationHead(cfg, in_channels)
  File "/home/zhanwen/sgb/sgb/maskrcnn_benchmark/modeling/roi_heads/relation_head/relation_head.py", line 33, in __init__
    self.predictor = make_roi_relation_predictor(cfg, feat_dim)
  File "/home/zhanwen/sgb/sgb/maskrcnn_benchmark/modeling/roi_heads/relation_head/roi_relation_predictors.py", line 689, in make_roi_relation_predictor
    func = registry.ROI_RELATION_PREDICTOR[cfg.MODEL.ROI_RELATION_HEAD.PREDICTOR]
KeyError: 'TransformerTransferPredictor'
Killing subprocess 4536
Traceback (most recent call last):
  File "/home/zhanwen/anaconda3/envs/sgb/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/zhanwen/anaconda3/envs/sgb/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/zhanwen/anaconda3/envs/sgb/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/home/zhanwen/anaconda3/envs/sgb/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/home/zhanwen/anaconda3/envs/sgb/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/zhanwen/anaconda3/envs/sgb/bin/python', '-u', 'tools/relation_train_net.py', '--local_rank=0', '--config-file', 'configs/e2e_relation_X_101_32_8_FPN_1x_transformer.yaml', 'MODEL.ROI_RELATION_HEAD.USE_GT_BOX', 'True', 'MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL', 'True', 'MODEL.ROI_RELATION_HEAD.PREDICTOR', 'TransformerTransferPredictor', 'MODEL.ROI_RELATION_HEAD.PREDICT_USE_BIAS', 'True', 'DTYPE', 'float32', 'SOLVER.IMS_PER_BATCH', '16', 'TEST.IMS_PER_BATCH', '1', 'SOLVER.MAX_ITER', '16000', 'SOLVER.BASE_LR', '1e-3', 'SOLVER.SCHEDULE.TYPE', 'WarmupMultiStepLR', 'SOLVER.STEPS', '(10000, 16000)', 'SOLVER.VAL_PERIOD', '3000', 'SOLVER.CHECKPOINT_PERIOD', '2000', 'GLOVE_DIR', './datasets/vg/', 'MODEL.PRETRAINED_DETECTOR_CKPT', './checkpoints/pretrained_faster_rcnn/model_final.pth', 'OUTPUT_DIR', './checkpoints/transformer_predcls_dist15_3k_FixPModel_lr1e3_B16_FCMat']' returned non-zero exit status 1.
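The KeyError above is a registry lookup failing: the predictor name passed on the command line was not registered in the module that got imported. The traceback also resolves maskrcnn_benchmark under /home/zhanwen/sgb/sgb/, whereas the earlier command was launched from ~/sgb/bpl, so it may be worth checking which copy of the package Python is importing. A minimal sketch of both ideas follows. The Registry class is a simplified stand-in for the project's own, and the predictor name and helper functions are hypothetical.

```python
import importlib.util


class Registry(dict):
    """Name-to-class table filled in by a decorator (simplified stand-in)."""

    def register(self, name):
        def decorator(obj):
            self[name] = obj
            return obj
        return decorator


PREDICTORS = Registry()


@PREDICTORS.register("ExamplePredictor")  # hypothetical predictor name
class ExamplePredictor:
    pass


def lookup(registry, name):
    """Return the registered object, or fail listing the known names."""
    try:
        return registry[name]
    except KeyError:
        known = ", ".join(sorted(registry))
        message = "{!r} is not registered; known: {}".format(name, known)
        raise KeyError(message) from None


def module_origin(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin
```

Calling module_origin("maskrcnn_benchmark") inside the training environment would show whether the import comes from the intended checkout.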
Would you please provide instructions on how you made the conf_mat_freq_train.npy and the (commented out) conf_mat_adj_mat.npy? I was able to locate conf_mat_freq_train.npy in the repo.
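The procedure the authors used is not described here. Purely as an illustration of what a file with this name might contain, and not the authors' confirmed method, the sketch below counts how often each ground-truth predicate is predicted as each predicate and then row-normalises the counts. All function names and the toy labels are hypothetical.

```python
def confusion_counts(true_labels, predicted_labels, num_classes):
    """Count (true, predicted) label pairs into a square table."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    counts = [[0] * num_classes for _ in range(num_classes)]
    for true, pred in zip(true_labels, predicted_labels):
        counts[true][pred] += 1
    return counts


def row_normalise(counts):
    """Turn each row of counts into frequencies; all-zero rows stay zero."""
    normalised = []
    for row in counts:
        total = sum(row)
        normalised.append([c / total if total else 0.0 for c in row])
    return normalised


# Toy labels for three predicate classes, invented for this example.
true_labels = [0, 0, 1, 2, 2, 2]
predicted_labels = [0, 1, 1, 2, 0, 2]

counts = confusion_counts(true_labels, predicted_labels, num_classes=3)
freq = row_normalise(counts)
```

A table like this could then be converted to an array and written with numpy.save to obtain a .npy file.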