comeeasy / clip_for_classificaion_with_descriptions

A modified CLIP model for classification tasks guided by class descriptions

Python 88.07% Jupyter Notebook 11.93%

CLIP for classification using descriptions.


Introduction

CLIP is trained on a huge amount of paired (image, caption) data. We can therefore assume that CLIP has captured “general” knowledge about how natural images and text relate.

Consequently, CLIP is a popular model for zero-shot image and text tasks. However, it is hard to apply directly to specific, “subtle” tasks. For instance, classifying which type of wallpaper defect an image shows is hard for CLIP, because the task is not “general”.

Therefore we need to fine-tune the CLIP model before applying it to such specific tasks. This can be done by following the paper “Finetune like you pretrain: Improved finetuning of zero-shot vision models”.

This repository does the following.

  1. Implements code to use the HuggingFace CLIP model.
  2. Uses PyTorch Lightning to make full use of the available hardware.
  3. Constrains the training procedure with user-defined class descriptions.
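
As a rough illustration of item 3, classification with descriptions can be thought of as comparing an image embedding against one text embedding per class description and picking the closest match. The sketch below uses plain Python lists as stand-in embeddings. The names `cosine_similarity` and `classify_by_description`, and the toy vectors, are hypothetical and not taken from this repository, where the embeddings would come from the CLIP image and text encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_by_description(image_embedding, description_embeddings):
    """Return the class whose description embedding is closest to the image."""
    scores = {
        name: cosine_similarity(image_embedding, embedding)
        for name, embedding in description_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-ins for encoder outputs (assumed values, for illustration only).
toy_descriptions = {"class0": [1.0, 0.0, 0.0], "class1": [0.0, 1.0, 0.0]}
toy_image = [0.9, 0.1, 0.0]

predicted, scores = classify_by_description(toy_image, toy_descriptions)
print(predicted, scores)
```

During fine-tuning, the same similarity scores can serve as logits for a loss, which is how the descriptions end up constraining training.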

Installation

conda env create -f environments.yaml

Usage

  1. Create your dataset for classification.

  2. The dataset must follow the structure below.

    train_dataset_dir
    ├── class0
    ├── class1
    ├── class2
    ...
    ├── classN-3
    ├── classN-2
    └── classN-1
  3. Open cfg.py and set the path of train_dataset_dir (e.g. /home/path/to/dset)

  4. Open info/class_descriptions.yaml and write a description for every class, as follows

    "가구수정": separation of wallpaper that is attached next to furniture with compartments, a defect that occurs in places like built-in wardrobes or drawers
    "걸레받이수정": gap that has occurred between the mopboard and the wallpaper, mopboard is a material used to connect the side of the wallpaper and the floor, and this describes a defect that has occurred around that area
    "곰팡이": blue mold that occurs between the wallpaper surface or moldings and the wallpaper. The defect arises over time in damp conditions or due to water leakage
    ...
    classK: description for classK
    ...
    "틈새과다": excessive gaps that have occurred between the wallpaper surface and moldings
    "피스": tearing that occurred in the wallpaper surface due to improperly installed screws
    "훼손": damage to the wallpaper surface itself and the damage occurring between the wallpaper and moldings
  5. Train!

    python train.py
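
Because every class directory needs a matching description, it can help to check the two against each other before training. The following is a minimal sketch under stated assumptions: `list_classes` and `find_missing_descriptions` are hypothetical helpers, not part of this repository, and the descriptions are passed in as a plain dict (for example, the result of loading info/class_descriptions.yaml with a YAML parser).

```python
import tempfile
from pathlib import Path


def list_classes(dataset_dir):
    """Class names are the immediate sub-directories of the dataset root."""
    return sorted(p.name for p in Path(dataset_dir).iterdir() if p.is_dir())


def find_missing_descriptions(dataset_dir, descriptions):
    """Return class names that have no (or an empty) description."""
    return [name for name in list_classes(dataset_dir) if not descriptions.get(name)]


# Demo on a throwaway directory tree with two classes and one description.
with tempfile.TemporaryDirectory() as root:
    for class_name in ("class0", "class1"):
        (Path(root) / class_name).mkdir()
    demo_descriptions = {"class0": "description for class0"}
    missing = find_missing_descriptions(root, demo_descriptions)

print(missing)
```

An empty result means every class directory has a description.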

See CFG in cfg.py

  • It governs the entire training procedure.
from torchvision import transforms


class CFG:
    # model
    batch_size = 1  # for now, batch size is fixed to 1
    img_transform_size_W = img_transform_size_H = 512
    num_classes = -1  # automatically calculated from train_dataset_dir
    label_smoothing = 0.1  # label smoothing factor

    sim_weight = 0.5  # weight multiplied by the similarity loss proposed in the CLIP paper
    fc_weight = 0.5  # weight multiplied by the classification loss

    # optimizer settings
    lr = 1e-6
    optim_betas = (0.9, 0.98)
    optim_eps = 1e-8
    optim_weight_decay = 0.05
    temperature = 1.072508  # exp(t), t = 0.07 from the CLIP paper

    # dataset
    test_size = 0.2  # fraction of the data held out for validation
    train_transforms = transforms.Compose([
        # Resize expects (height, width)
        transforms.Resize((img_transform_size_H, img_transform_size_W)),
        transforms.TrivialAugmentWide(),
        transforms.ToTensor(),
    ])
    val_transforms = transforms.Compose([
        transforms.Resize((img_transform_size_H, img_transform_size_W)),
        transforms.ToTensor(),
    ])
    train_dataset_dir = "/path/to/training/dataset/of/yours"
    class_description_yaml_file = "/path/to/class_description/yaml/written/in/step4"
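
To make the roles of `sim_weight`, `fc_weight`, `label_smoothing`, and `temperature` more concrete, here is a minimal pure-Python sketch of how such a weighted objective could be combined. It is an illustration under assumptions, not the code in train.py: the function names are hypothetical, and both the way `temperature` scales the similarity logits and the use of label smoothing on both terms are guesses about the intended usage.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross entropy against a label-smoothed one-hot target."""
    num_classes = len(logits)
    loss = 0.0
    for index, log_prob in enumerate(log_softmax(logits)):
        # Every class gets smoothing / N; the true class gets the remainder.
        weight = smoothing / num_classes
        if index == target:
            weight += 1.0 - smoothing
        loss -= weight * log_prob
    return loss


def combined_loss(sim_logits, fc_logits, target,
                  sim_weight=0.5, fc_weight=0.5,
                  smoothing=0.1, temperature=math.exp(0.07)):
    """Weighted sum of a similarity loss and a classification loss."""
    # Assumption: the temperature multiplies the image-text similarity logits.
    scaled = [temperature * x for x in sim_logits]
    sim_loss = smoothed_cross_entropy(scaled, target, smoothing)
    fc_loss = smoothed_cross_entropy(fc_logits, target, smoothing)
    return sim_weight * sim_loss + fc_weight * fc_loss


print(round(math.exp(0.07), 6))  # the value stored in CFG.temperature
print(combined_loss([2.0, 0.5, 0.1], [1.5, 0.2, 0.3], target=0))
```

With `sim_weight = fc_weight = 0.5`, both terms contribute equally. Raising one weight shifts the objective toward that term.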

References

https://dacon.io/competitions/official/236082/data

https://www.analyticsvidhya.com/blog/2021/06/5-techniques-to-handle-imbalanced-data-for-a-classification-problem/

https://ratsgo.github.io/insight-notes/docs/interpretable/smoothing

https://arxiv.org/pdf/2311.12065.pdf

https://sebastianraschka.com/blog/2023/data-augmentation-pytorch.html

https://youtu.be/jzJHgL9VGV4?si=R_FQq_XerelO6tRB
