vamosc / clip4str
An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".
License: Apache License 2.0
Thank you for your great work!
I tried to run train.py (not from a pretrained checkpoint) on Google Colab and got this error:
Error executing job with overrides: []
Error locating target 'strhub.models.vl_str.system.VL4STR', see chained exception above.
full_key: model
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
I have tried using an absolute path for the "_target_" field in "clip4str/configs/model/vl4str.yaml", but I still get the above error.
I'm using the hydra_core version specified in requirements.txt (1.2.0).
Do you have any suggestions? Thank you!
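Hydra's "Error locating target" message usually wraps another exception raised while importing the target module, which is why the log suggests setting HYDRA_FULL_ERROR=1. A minimal sketch of another way to surface that underlying exception, assuming it is run from the repository root so that `strhub` is importable; `diagnose_target` is a hypothetical helper, not part of Hydra or this repository:

```python
import importlib
import traceback


def diagnose_target(dotted_path):
    """Try to import `package.module.Name` and report what fails.

    Returns (obj, None) on success or (None, formatted_traceback) on failure.
    """
    module_path, _, attr_name = dotted_path.rpartition(".")
    try:
        module = importlib.import_module(module_path)
        return getattr(module, attr_name), None
    except Exception:
        # The formatted traceback shows the real cause, e.g. a missing dependency.
        return None, traceback.format_exc()


if __name__ == "__main__":
    # Target string copied from the error message above.
    obj, error = diagnose_target("strhub.models.vl_str.system.VL4STR")
    print(error or f"Imported {obj!r}")
```

If the import itself fails (for example because a dependency is missing on Colab), the printed traceback names the module that could not be loaded, which changing the path in the YAML file would not fix.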
It would be nice if you could add latency results to the README as well. I am planning to use this for an industry application, but before experimenting I would like to know whether it is even a feasible option, since I have an SLA of about 1 second per image.
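Until such numbers are published, latency can be measured locally. The following is a minimal timing sketch, not a benchmark from the authors: `measure_latency` and its parameters are hypothetical names, and `predict` stands for whatever callable runs the model on one preprocessed image.

```python
import statistics
import time


def measure_latency(predict, sample, warmup=3, runs=20):
    """Return per-call latency statistics (in seconds) for predict(sample)."""
    # Warm-up calls are excluded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        predict(sample)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append(time.perf_counter() - start)

    timings.sort()
    p95_index = min(len(timings) - 1, int(0.95 * len(timings)))
    return {
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
        "p95": timings[p95_index],
        "max": timings[-1],
    }
```

When timing on a GPU, the callable should wait for the device to finish (for example by calling `torch.cuda.synchronize()` before returning), otherwise the measured time may only cover the kernel launch. Comparing the "p95" or "max" value against the 1 second budget is more conservative than comparing the mean.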
Thank you for your great work!
I am trying to use Multilingual_CLIP to train CLIP4STR for Vietnamese (with a charset of 229 tokens) on Google Colab.
I have changed the charset and the code in strhub/models/vl_str/system.py and other files so that I can use the text encoder from Multilingual_CLIP for Vietnamese.
Now I am getting the following learning rate scheduler error:
675 M Trainable params
332 M Non-trainable params
1.0 B Total params
4,031.815 Total estimated model params size (MB)
[dataset] mean (0.48145466, 0.4578275, 0.40821073), std (0.26862954, 0.26130258, 0.27577711)
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py:117: UserWarning: When using `Trainer(accumulate_grad_batches != 1)` and overriding `LightningModule.optimizer_{step,zero_grad}`, the hooks will not be called on every batch (rather, they are called on every optimization step).
rank_zero_warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[VL4STR] The length of encoder params with and without weight decay is 259 and 479, respectively.
[VL4STR] The length of decoder params with and without weight decay is 14 and 38, respectively.
Loading `train_dataloader` to estimate number of stepping batches.
dataset root: /content/drive/MyDrive/clip4str/dataset/str_dataset/train/real
lmdb: ArT num samples: 34984
lmdb: The number of training samples is 34984
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:560: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Error executing job with overrides: []
Traceback (most recent call last):
File "/content/drive/MyDrive/clip4str/code/clip4str/train.py", line 145, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/hydra/main.py", line 90, in decorated_main
_run_hydra(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
_run_app(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
run_and_report(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 216, in run_and_report
raise ex
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
return func()
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
lambda: hydra.run(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/content/drive/MyDrive/clip4str/code/clip4str/train.py", line 104, in main
trainer.fit(model, datamodule=datamodule, ckpt_path=config.ckpt_path)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
self._call_and_handle_interrupt(
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1217, in _run
self.strategy.setup(self)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/single_device.py", line 72, in setup
super().setup(trainer)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 139, in setup
self.setup_optimizers(trainer)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 128, in setup_optimizers
self.optimizers, self.lr_scheduler_configs, self.optimizer_frequencies = _init_optimizers_and_lr_schedulers(
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/optimizer.py", line 195, in _init_optimizers_and_lr_schedulers
_validate_scheduler_api(lr_scheduler_configs, model)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/optimizer.py", line 350, in _validate_scheduler_api
raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: The provided lr scheduler `OneCycleLR` doesn't follow PyTorch's LRScheduler API. You should override the `LightningModule.lr_scheduler_step` hook with your own logic if you are using a custom LR scheduler.
I cannot see any problem with OneCycleLR. Do you have any suggestions on this? Could it be a package version problem?
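The exception message itself names one workaround: overriding the `lr_scheduler_step` hook. A commonly reported cause of this error is a version mismatch, where a newer PyTorch release is paired with an older PyTorch Lightning whose scheduler check does not recognise the built-in schedulers, so installing the versions pinned in requirements.txt is the other option to try. The sketch below only illustrates the override under stated assumptions: `SchedulerStepMixin` is a hypothetical name, it would have to be listed before the LightningModule base class of the model, and the variadic signature is an assumption made because the hook's positional arguments differ between Lightning versions.

```python
class SchedulerStepMixin:
    """Hypothetical mixin that overrides Lightning's lr_scheduler_step hook.

    Intended use (assumption): class MySystem(SchedulerStepMixin, LightningModule).
    """

    def lr_scheduler_step(self, scheduler, *args):
        # The last positional argument is treated as the monitored metric,
        # which is None for schedulers that do not monitor a value.
        metric = args[-1] if args else None
        if metric is None:
            scheduler.step()
        else:
            scheduler.step(metric)
```

With this in place the scheduler is stepped manually, so the API check that raised the exception is expected to be skipped. This is a workaround, and it does not rule out other incompatibilities between the installed package versions.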
Hi, I am trying to perform inference using the following script:
bash code/clip4str/scripts/read.sh 7 clip4str_b_plus.ckpt /home/shreyans/scratch/tata1mg/clip4str_og/code/clip4str/misc/test_image
The error I get is:
Additional keyword arguments: {}
args.checkpoint /home/shreyans/scratch/tata1mg/clip4str_og/output/clip4str_base16x16_d70bde1f2d.ckpt
config of VL4STR:
image_freeze_nlayer: 0, text_freeze_nlayer: 6, freeze_language_backbone: False, freeze_image_backbone: False
use_language_model: True, context_length: 16, cross_token_embeding: False, cross_loss_weight: 1.0
use_share_dim: True, image_detach: True
cross_gt_context: True, cross_cloze_mask: False, cross_fast_decode: False
config of VL4STR:
image_freeze_nlayer: -1, text_freeze_nlayer: 6, freeze_language_backbone: False, freeze_image_backbone: False
use_language_model: True, context_length: 16, cross_token_embeding: False, cross_loss_weight: 1.0
use_share_dim: True, image_detach: True
cross_gt_context: True, cross_cloze_mask: False, cross_fast_decode: False
loading checkpoint from /home/shreyans/scratch/tata1mg/clip4str_og/pretrained/clip/ViT-B-16.pt
The dimension of the visual decoder is 512.
Traceback (most recent call last):
File "/DATA/scratch/shreyans/tata1mg/clip4str_og/code/clip4str/strhub/models/utils.py", line 104, in load_from_checkpoint
model = ModelClass.load_from_checkpoint(checkpoint_path, **kwargs)
File "/home/shreyans/scratch/miniconda3/envs/clip4str/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 161, in load_from_checkpoint
model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
File "/home/shreyans/scratch/miniconda3/envs/clip4str/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 203, in _load_model_state
model = cls(**_cls_kwargs)
File "/DATA/scratch/shreyans/tata1mg/clip4str_og/code/clip4str/strhub/models/vl_str/system.py", line 70, in __init__
assert os.path.exists(kwargs["clip_pretrained"])
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/shreyans/scratch/tata1mg/clip4str_og/code/clip4str/read.py", line 54, in <module>
main()
File "/home/shreyans/scratch/miniconda3/envs/clip4str/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/shreyans/scratch/tata1mg/clip4str_og/code/clip4str/read.py", line 37, in main
model = load_from_checkpoint(args.checkpoint, **kwargs).eval().to(args.device)
File "/DATA/scratch/shreyans/tata1mg/clip4str_og/code/clip4str/strhub/models/utils.py", line 113, in load_from_checkpoint
model.load_state_dict(checkpoint)
File "/home/shreyans/scratch/miniconda3/envs/clip4str/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VL4STR:
Missing key(s) in state_dict: "clip_model.positional_embedding", "clip_model.text_projection", "clip_model.logit_scale", "clip_model.visual.class_embedding", "clip_model.visual.positional_embedding", "clip_model.visual.proj", "clip_model.visual.conv1.weight", "clip_model.visual.ln_pre.weight", "clip_model.visual.ln_pre.bias", "clip_model.visual.transformer.resblocks.0.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.0.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.0.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.0.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.0.ln_1.weight", "clip_model.visual.transformer.resblocks.0.ln_1.bias", "clip_model.visual.transformer.resblocks.0.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.0.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.0.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.0.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.0.ln_2.weight", "clip_model.visual.transformer.resblocks.0.ln_2.bias", "clip_model.visual.transformer.resblocks.1.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.1.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.1.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.1.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.1.ln_1.weight", "clip_model.visual.transformer.resblocks.1.ln_1.bias", "clip_model.visual.transformer.resblocks.1.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.1.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.1.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.1.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.1.ln_2.weight", "clip_model.visual.transformer.resblocks.1.ln_2.bias", "clip_model.visual.transformer.resblocks.2.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.2.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.2.attn.out_proj.weight", 
"clip_model.visual.transformer.resblocks.2.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.2.ln_1.weight", "clip_model.visual.transformer.resblocks.2.ln_1.bias", "clip_model.visual.transformer.resblocks.2.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.2.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.2.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.2.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.2.ln_2.weight", "clip_model.visual.transformer.resblocks.2.ln_2.bias", "clip_model.visual.transformer.resblocks.3.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.3.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.3.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.3.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.3.ln_1.weight", "clip_model.visual.transformer.resblocks.3.ln_1.bias", "clip_model.visual.transformer.resblocks.3.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.3.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.3.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.3.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.3.ln_2.weight", "clip_model.visual.transformer.resblocks.3.ln_2.bias", "clip_model.visual.transformer.resblocks.4.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.4.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.4.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.4.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.4.ln_1.weight", "clip_model.visual.transformer.resblocks.4.ln_1.bias", "clip_model.visual.transformer.resblocks.4.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.4.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.4.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.4.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.4.ln_2.weight", "clip_model.visual.transformer.resblocks.4.ln_2.bias", 
"clip_model.visual.transformer.resblocks.5.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.5.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.5.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.5.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.5.ln_1.weight", "clip_model.visual.transformer.resblocks.5.ln_1.bias", "clip_model.visual.transformer.resblocks.5.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.5.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.5.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.5.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.5.ln_2.weight", "clip_model.visual.transformer.resblocks.5.ln_2.bias", "clip_model.visual.transformer.resblocks.6.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.6.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.6.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.6.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.6.ln_1.weight", "clip_model.visual.transformer.resblocks.6.ln_1.bias", "clip_model.visual.transformer.resblocks.6.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.6.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.6.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.6.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.6.ln_2.weight", "clip_model.visual.transformer.resblocks.6.ln_2.bias", "clip_model.visual.transformer.resblocks.7.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.7.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.7.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.7.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.7.ln_1.weight", "clip_model.visual.transformer.resblocks.7.ln_1.bias", "clip_model.visual.transformer.resblocks.7.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.7.mlp.c_fc.bias", 
"clip_model.visual.transformer.resblocks.7.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.7.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.7.ln_2.weight", "clip_model.visual.transformer.resblocks.7.ln_2.bias", "clip_model.visual.transformer.resblocks.8.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.8.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.8.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.8.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.8.ln_1.weight", "clip_model.visual.transformer.resblocks.8.ln_1.bias", "clip_model.visual.transformer.resblocks.8.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.8.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.8.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.8.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.8.ln_2.weight", "clip_model.visual.transformer.resblocks.8.ln_2.bias", "clip_model.visual.transformer.resblocks.9.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.9.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.9.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.9.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.9.ln_1.weight", "clip_model.visual.transformer.resblocks.9.ln_1.bias", "clip_model.visual.transformer.resblocks.9.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.9.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.9.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.9.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.9.ln_2.weight", "clip_model.visual.transformer.resblocks.9.ln_2.bias", "clip_model.visual.transformer.resblocks.10.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.10.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.10.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.10.attn.out_proj.bias", 
"clip_model.visual.transformer.resblocks.10.ln_1.weight", "clip_model.visual.transformer.resblocks.10.ln_1.bias", "clip_model.visual.transformer.resblocks.10.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.10.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.10.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.10.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.10.ln_2.weight", "clip_model.visual.transformer.resblocks.10.ln_2.bias", "clip_model.visual.transformer.resblocks.11.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.11.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.11.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.11.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.11.ln_1.weight", "clip_model.visual.transformer.resblocks.11.ln_1.bias", "clip_model.visual.transformer.resblocks.11.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.11.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.11.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.11.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.11.ln_2.weight", "clip_model.visual.transformer.resblocks.11.ln_2.bias", "clip_model.visual.ln_post.weight", "clip_model.visual.ln_post.bias", "clip_model.transformer.resblocks.0.attn.in_proj_weight", "clip_model.transformer.resblocks.0.attn.in_proj_bias", "clip_model.transformer.resblocks.0.attn.out_proj.weight", "clip_model.transformer.resblocks.0.attn.out_proj.bias", "clip_model.transformer.resblocks.0.ln_1.weight", "clip_model.transformer.resblocks.0.ln_1.bias", "clip_model.transformer.resblocks.0.mlp.c_fc.weight", "clip_model.transformer.resblocks.0.mlp.c_fc.bias", "clip_model.transformer.resblocks.0.mlp.c_proj.weight", "clip_model.transformer.resblocks.0.mlp.c_proj.bias", "clip_model.transformer.resblocks.0.ln_2.weight", "clip_model.transformer.resblocks.0.ln_2.bias", "clip_model.transformer.resblocks.1.attn.in_proj_weight", 
"clip_model.transformer.resblocks.1.attn.in_proj_bias", "clip_model.transformer.resblocks.1.attn.out_proj.weight", "clip_model.transformer.resblocks.1.attn.out_proj.bias", "clip_model.transformer.resblocks.1.ln_1.weight", "clip_model.transformer.resblocks.1.ln_1.bias", "clip_model.transformer.resblocks.1.mlp.c_fc.weight", "clip_model.transformer.resblocks.1.mlp.c_fc.bias", "clip_model.transformer.resblocks.1.mlp.c_proj.weight", "clip_model.transformer.resblocks.1.mlp.c_proj.bias", "clip_model.transformer.resblocks.1.ln_2.weight", "clip_model.transformer.resblocks.1.ln_2.bias", "clip_model.transformer.resblocks.2.attn.in_proj_weight", "clip_model.transformer.resblocks.2.attn.in_proj_bias", "clip_model.transformer.resblocks.2.attn.out_proj.weight", "clip_model.transformer.resblocks.2.attn.out_proj.bias", "clip_model.transformer.resblocks.2.ln_1.weight", "clip_model.transformer.resblocks.2.ln_1.bias", "clip_model.transformer.resblocks.2.mlp.c_fc.weight", "clip_model.transformer.resblocks.2.mlp.c_fc.bias", "clip_model.transformer.resblocks.2.mlp.c_proj.weight", "clip_model.transformer.resblocks.2.mlp.c_proj.bias", "clip_model.transformer.resblocks.2.ln_2.weight", "clip_model.transformer.resblocks.2.ln_2.bias", "clip_model.transformer.resblocks.3.attn.in_proj_weight", "clip_model.transformer.resblocks.3.attn.in_proj_bias", "clip_model.transformer.resblocks.3.attn.out_proj.weight", "clip_model.transformer.resblocks.3.attn.out_proj.bias", "clip_model.transformer.resblocks.3.ln_1.weight", "clip_model.transformer.resblocks.3.ln_1.bias", "clip_model.transformer.resblocks.3.mlp.c_fc.weight", "clip_model.transformer.resblocks.3.mlp.c_fc.bias", "clip_model.transformer.resblocks.3.mlp.c_proj.weight", "clip_model.transformer.resblocks.3.mlp.c_proj.bias", "clip_model.transformer.resblocks.3.ln_2.weight", "clip_model.transformer.resblocks.3.ln_2.bias", "clip_model.transformer.resblocks.4.attn.in_proj_weight", "clip_model.transformer.resblocks.4.attn.in_proj_bias", 
"clip_model.transformer.resblocks.4.attn.out_proj.weight", "clip_model.transformer.resblocks.4.attn.out_proj.bias", "clip_model.transformer.resblocks.4.ln_1.weight", "clip_model.transformer.resblocks.4.ln_1.bias", "clip_model.transformer.resblocks.4.mlp.c_fc.weight", "clip_model.transformer.resblocks.4.mlp.c_fc.bias", "clip_model.transformer.resblocks.4.mlp.c_proj.weight", "clip_model.transformer.resblocks.4.mlp.c_proj.bias", "clip_model.transformer.resblocks.4.ln_2.weight", "clip_model.transformer.resblocks.4.ln_2.bias", "clip_model.transformer.resblocks.5.attn.in_proj_weight", "clip_model.transformer.resblocks.5.attn.in_proj_bias", "clip_model.transformer.resblocks.5.attn.out_proj.weight", "clip_model.transformer.resblocks.5.attn.out_proj.bias", "clip_model.transformer.resblocks.5.ln_1.weight", "clip_model.transformer.resblocks.5.ln_1.bias", "clip_model.transformer.resblocks.5.mlp.c_fc.weight", "clip_model.transformer.resblocks.5.mlp.c_fc.bias", "clip_model.transformer.resblocks.5.mlp.c_proj.weight", "clip_model.transformer.resblocks.5.mlp.c_proj.bias", "clip_model.transformer.resblocks.5.ln_2.weight", "clip_model.transformer.resblocks.5.ln_2.bias", "clip_model.transformer.resblocks.6.attn.in_proj_weight", "clip_model.transformer.resblocks.6.attn.in_proj_bias", "clip_model.transformer.resblocks.6.attn.out_proj.weight", "clip_model.transformer.resblocks.6.attn.out_proj.bias", "clip_model.transformer.resblocks.6.ln_1.weight", "clip_model.transformer.resblocks.6.ln_1.bias", "clip_model.transformer.resblocks.6.mlp.c_fc.weight", "clip_model.transformer.resblocks.6.mlp.c_fc.bias", "clip_model.transformer.resblocks.6.mlp.c_proj.weight", "clip_model.transformer.resblocks.6.mlp.c_proj.bias", "clip_model.transformer.resblocks.6.ln_2.weight", "clip_model.transformer.resblocks.6.ln_2.bias", "clip_model.transformer.resblocks.7.attn.in_proj_weight", "clip_model.transformer.resblocks.7.attn.in_proj_bias", "clip_model.transformer.resblocks.7.attn.out_proj.weight", 
"clip_model.transformer.resblocks.7.attn.out_proj.bias", "clip_model.transformer.resblocks.7.ln_1.weight", "clip_model.transformer.resblocks.7.ln_1.bias", "clip_model.transformer.resblocks.7.mlp.c_fc.weight", "clip_model.transformer.resblocks.7.mlp.c_fc.bias", "clip_model.transformer.resblocks.7.mlp.c_proj.weight", "clip_model.transformer.resblocks.7.mlp.c_proj.bias", "clip_model.transformer.resblocks.7.ln_2.weight", "clip_model.transformer.resblocks.7.ln_2.bias", "clip_model.transformer.resblocks.8.attn.in_proj_weight", "clip_model.transformer.resblocks.8.attn.in_proj_bias", "clip_model.transformer.resblocks.8.attn.out_proj.weight", "clip_model.transformer.resblocks.8.attn.out_proj.bias", "clip_model.transformer.resblocks.8.ln_1.weight", "clip_model.transformer.resblocks.8.ln_1.bias", "clip_model.transformer.resblocks.8.mlp.c_fc.weight", "clip_model.transformer.resblocks.8.mlp.c_fc.bias", "clip_model.transformer.resblocks.8.mlp.c_proj.weight", "clip_model.transformer.resblocks.8.mlp.c_proj.bias", "clip_model.transformer.resblocks.8.ln_2.weight", "clip_model.transformer.resblocks.8.ln_2.bias", "clip_model.transformer.resblocks.9.attn.in_proj_weight", "clip_model.transformer.resblocks.9.attn.in_proj_bias", "clip_model.transformer.resblocks.9.attn.out_proj.weight", "clip_model.transformer.resblocks.9.attn.out_proj.bias", "clip_model.transformer.resblocks.9.ln_1.weight", "clip_model.transformer.resblocks.9.ln_1.bias", "clip_model.transformer.resblocks.9.mlp.c_fc.weight", "clip_model.transformer.resblocks.9.mlp.c_fc.bias", "clip_model.transformer.resblocks.9.mlp.c_proj.weight", "clip_model.transformer.resblocks.9.mlp.c_proj.bias", "clip_model.transformer.resblocks.9.ln_2.weight", "clip_model.transformer.resblocks.9.ln_2.bias", "clip_model.transformer.resblocks.10.attn.in_proj_weight", "clip_model.transformer.resblocks.10.attn.in_proj_bias", "clip_model.transformer.resblocks.10.attn.out_proj.weight", "clip_model.transformer.resblocks.10.attn.out_proj.bias", 
"clip_model.transformer.resblocks.10.ln_1.weight", "clip_model.transformer.resblocks.10.ln_1.bias", "clip_model.transformer.resblocks.10.mlp.c_fc.weight", "clip_model.transformer.resblocks.10.mlp.c_fc.bias", "clip_model.transformer.resblocks.10.mlp.c_proj.weight", "clip_model.transformer.resblocks.10.mlp.c_proj.bias", "clip_model.transformer.resblocks.10.ln_2.weight", "clip_model.transformer.resblocks.10.ln_2.bias", "clip_model.transformer.resblocks.11.attn.in_proj_weight", "clip_model.transformer.resblocks.11.attn.in_proj_bias", "clip_model.transformer.resblocks.11.attn.out_proj.weight", "clip_model.transformer.resblocks.11.attn.out_proj.bias", "clip_model.transformer.resblocks.11.ln_1.weight", "clip_model.transformer.resblocks.11.ln_1.bias", "clip_model.transformer.resblocks.11.mlp.c_fc.weight", "clip_model.transformer.resblocks.11.mlp.c_fc.bias", "clip_model.transformer.resblocks.11.mlp.c_proj.weight", "clip_model.transformer.resblocks.11.mlp.c_proj.bias", "clip_model.transformer.resblocks.11.ln_2.weight", "clip_model.transformer.resblocks.11.ln_2.bias", "clip_model.token_embedding.weight", "clip_model.ln_final.weight", "clip_model.ln_final.bias", "visual_decoder.pos_queries", "visual_decoder.layers.0.self_attn.in_proj_weight", "visual_decoder.layers.0.self_attn.in_proj_bias", "visual_decoder.layers.0.self_attn.out_proj.weight", "visual_decoder.layers.0.self_attn.out_proj.bias", "visual_decoder.layers.0.cross_attn.in_proj_weight", "visual_decoder.layers.0.cross_attn.in_proj_bias", "visual_decoder.layers.0.cross_attn.out_proj.weight", "visual_decoder.layers.0.cross_attn.out_proj.bias", "visual_decoder.layers.0.linear1.weight", "visual_decoder.layers.0.linear1.bias", "visual_decoder.layers.0.linear2.weight", "visual_decoder.layers.0.linear2.bias", "visual_decoder.layers.0.norm1.weight", "visual_decoder.layers.0.norm1.bias", "visual_decoder.layers.0.norm2.weight", "visual_decoder.layers.0.norm2.bias", "visual_decoder.layers.0.norm_q.weight", 
"visual_decoder.layers.0.norm_q.bias", "visual_decoder.layers.0.norm_c.weight", "visual_decoder.layers.0.norm_c.bias", "visual_decoder.text_embed.embedding.weight", "visual_decoder.norm.weight", "visual_decoder.norm.bias", "visual_decoder.head.weight", "visual_decoder.head.bias", "cross_decoder.pos_queries", "cross_decoder.layers.0.self_attn.in_proj_weight", "cross_decoder.layers.0.self_attn.in_proj_bias", "cross_decoder.layers.0.self_attn.out_proj.weight", "cross_decoder.layers.0.self_attn.out_proj.bias", "cross_decoder.layers.0.cross_attn.in_proj_weight", "cross_decoder.layers.0.cross_attn.in_proj_bias", "cross_decoder.layers.0.cross_attn.out_proj.weight", "cross_decoder.layers.0.cross_attn.out_proj.bias", "cross_decoder.layers.0.linear1.weight", "cross_decoder.layers.0.linear1.bias", "cross_decoder.layers.0.linear2.weight", "cross_decoder.layers.0.linear2.bias", "cross_decoder.layers.0.norm1.weight", "cross_decoder.layers.0.norm1.bias", "cross_decoder.layers.0.norm2.weight", "cross_decoder.layers.0.norm2.bias", "cross_decoder.layers.0.norm_q.weight", "cross_decoder.layers.0.norm_q.bias", "cross_decoder.layers.0.norm_c.weight", "cross_decoder.layers.0.norm_c.bias", "cross_decoder.text_embed.embedding.weight", "cross_decoder.norm.weight", "cross_decoder.norm.bias", "cross_decoder.head.weight", "cross_decoder.head.bias".
Unexpected key(s) in state_dict: "epoch", "global_step", "pytorch-lightning_version", "state_dict", "loops", "callbacks", "optimizer_states", "lr_schedulers", "NativeMixedPrecisionPlugin", "hparams_name", "hyper_parameters"
I have done everything as described. What is the cause?
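Reading the two tracebacks together suggests a likely explanation. The first failure is the assertion `os.path.exists(kwargs["clip_pretrained"])`, meaning the pretrained CLIP path stored in the checkpoint's hyperparameters does not exist on this machine. The fallback then passes the whole checkpoint to `load_state_dict`, and the "unexpected keys" ("epoch", "global_step", "state_dict", ...) are the top-level entries of a Lightning checkpoint, not model weights. A minimal sketch of unwrapping such a checkpoint, assuming the weights sit under the "state_dict" key as the error message indicates; `extract_state_dict` is a hypothetical helper, not code from this repository:

```python
def extract_state_dict(checkpoint):
    """Return the model weights from a checkpoint dictionary.

    Lightning-style checkpoints keep the weights under "state_dict" next to
    training metadata; a plain weights dictionary is returned unchanged.
    """
    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
        return checkpoint["state_dict"]
    return checkpoint
```

Fixing the `clip_pretrained` path so that the assertion passes is probably the more direct remedy, since the fallback branch would then not be reached at all.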
How can I convert a model to ONNX for inference implementation?
This is excellent work, and I am very interested in the CLIP attention maps shown in it.
Could you share the code used to generate the CLIP attention maps in the paper?
Thank you very much.
Is it possible to train CLIP with other languages?
Thank you for the great work and released models. I noticed the tokenizer does not include spaces. Was the model not trained on them or is there a way to add them to the tokenizer?
Is the patch_size 16*16?