
Pattern Recognition & Computer Vision

Fine-tuning code and pre-trained models
Explore the official paper »

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. Acknowledgements

About The Project

Statistical pattern recognition, nowadays often known under the term "machine learning",
is a key element of modern computer science. Its goal is to find, learn, and recognize patterns in complex data,
for example in images, speech, biological pathways, and the internet.

  • This repository is a condensed implementation of the Vision Transformer (ViT), introduced in the paper "An Image is Worth 16x16 Words"
  • It builds on the PyTorch implementation available here
  • The PyTorch repository provides pre-trained weights


The code is a rewrite and straightforward implementation of the VisionTransformer class with minor modifications
and simplifications, which make the class easier to run and to modify in future work on patching and embedding images for classification.

A list of commonly used resources that I find helpful is given in the Acknowledgements section.

Built With

The code is built with Python 3.7.9 and pip 20.0.

Getting Started


A quick overview of the architecture

The Vision Transformer is an image classifier that takes in an image and outputs a class prediction. However,
it does so without any convolutional layers. Instead, it uses the attention layers already common in NLP: an attention mechanism mimics the act of selectively concentrating on a few relevant things
while ignoring others. In computer vision, however, convolutional neural networks (CNNs) are still the norm, and self-attention has only slowly begun to enter the main body of research.
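To make the attention step more concrete, here is a minimal single-head scaled dot-product self-attention sketch in NumPy. It is an illustration, not the repository's code: the projection matrices `w_q`, `w_k` and `w_v` are random stand-ins for learned weights, and the multi-head split controlled by `n_heads` is omitted.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: (n_tokens, dim) sequence of patch embeddings.
    w_q, w_k, w_v: (dim, d_head) projection matrices (random placeholders here).
    Returns (output, weights), where weights[i, j] is how strongly token i attends to token j.
    """
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (n_tokens, n_tokens) similarity scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
n_tokens, dim, d_head = 5, 32, 8
tokens = rng.standard_normal((n_tokens, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, d_head)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (5, 8) (5, 5)
```

Each row of `weights` is a probability distribution over the tokens, which is the "selective concentration" described above: a token's output is a weighted mix of the value vectors of the tokens it attends to most.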

The image is turned into a sequence of 1D tokens so that the Transformer architecture can be applied, and the network is trained in three steps:

  • Fine-tuning of the global features pretrained on ImageNet, with the image patches flattened into 1D vectors.
  • Mask inference to obtain the cropped images, followed by fine-tuning of the local features. Here, the weights of the global features are fixed.
  • Concatenation of the global and local feature outputs, followed by fine-tuning of the fusion features while the weights of the other features are frozen.

The position embedding allows the network to determine which part of the image a specific patch came from.
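The tokenisation step (flattening patches into 1D vectors, prepending a class token as in the original ViT, and adding position embeddings) can be sketched as below. This is a minimal NumPy illustration assuming the default 384x384 RGB input and 16x16 patches; the function name `embed_patches` is hypothetical, and the projection matrix, class token and position embeddings are random placeholders for weights the real model learns.

```python
import numpy as np


def embed_patches(image, patch_size, w_proj, cls_token, pos_embed):
    """Turn an (H, W, C) image into a (1 + n_patches, embed_dim) token sequence."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    gh, gw = h // patch_size, w // patch_size
    # Split into a (gh, gw) grid of patches, then flatten each patch to a 1D vector.
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch_size * patch_size * c)
    tokens = patches @ w_proj                     # linear projection to embed_dim
    tokens = np.concatenate([cls_token, tokens])  # prepend the class token
    return tokens + pos_embed                     # add position information per token


rng = np.random.default_rng(0)
img_size, patch_size, in_chans, embed_dim = 384, 16, 3, 768
n_patches = (img_size // patch_size) ** 2         # 24 * 24 = 576
image = rng.random((img_size, img_size, in_chans))
w_proj = rng.standard_normal((patch_size * patch_size * in_chans, embed_dim)) * 0.02
cls_token = np.zeros((1, embed_dim))
pos_embed = rng.standard_normal((1 + n_patches, embed_dim)) * 0.02
tokens = embed_patches(image, patch_size, w_proj, cls_token, pos_embed)
print(tokens.shape)  # (577, 768)
```

Patches are read row by row, so token 1 is the top-left patch and token 25 is the first patch of the second row. The flattening order (row, column, channel) is one choice among several; a convolution-based patch embedding, as commonly used in PyTorch implementations, orders its weights differently.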

Prerequisites

Install the dependencies before running the compute.py file

  • pip
    $ pip install -r requirements.txt

Usage

First, build and download the model with the following command:

python run_model.py

You can change the model attributes and parameters with a configuration such as the following (the default image size is 384x384):

custom_config = {
    "img_size": 384,
    "in_chans": 3,
    "patch_size": 16,
    "embed_dim": 768,
    "depth": 12,
    "n_heads": 12,
    "qkv_bias": True,
    "mlp_ratio": 4,
}
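As a sanity check on the configuration above, the sketch below derives the token sequence length, per-head dimension and MLP width, and estimates the parameter count. It is plain arithmetic under stated assumptions: the layer layout is assumed to follow a common ViT block design (pre-norm LayerNorms, a fused qkv projection, a two-layer MLP, a final LayerNorm and a linear head), the helper `vit_stats` is hypothetical, and `n_classes = 1000` (ImageNet) is an assumed value that does not appear in `custom_config`.

```python
def vit_stats(img_size, in_chans, patch_size, embed_dim, depth, n_heads,
              qkv_bias, mlp_ratio, n_classes=1000):
    """Derive token counts and an estimated parameter count for a ViT config.

    n_classes=1000 is an assumption (ImageNet-1k head); it is not in custom_config.
    """
    n_patches = (img_size // patch_size) ** 2
    seq_len = n_patches + 1                      # +1 for the class token
    head_dim = embed_dim // n_heads
    hidden = int(embed_dim * mlp_ratio)

    def linear(n_in, n_out, bias=True):
        return n_in * n_out + (n_out if bias else 0)

    layer_norm = 2 * embed_dim                   # LayerNorm weight + bias
    patch_embed = linear(in_chans * patch_size ** 2, embed_dim)
    block = (layer_norm
             + linear(embed_dim, 3 * embed_dim, bias=qkv_bias)  # fused q, k, v
             + linear(embed_dim, embed_dim)                     # attention output projection
             + layer_norm
             + linear(embed_dim, hidden)                        # MLP fc1
             + linear(hidden, embed_dim))                       # MLP fc2
    total = (patch_embed
             + embed_dim                         # class token
             + seq_len * embed_dim               # position embeddings
             + depth * block
             + layer_norm                        # final norm
             + linear(embed_dim, n_classes))     # classification head
    return {"n_patches": n_patches, "seq_len": seq_len, "head_dim": head_dim,
            "mlp_hidden": hidden, "n_params": total}


custom_config = {
    "img_size": 384, "in_chans": 3, "patch_size": 16, "embed_dim": 768,
    "depth": 12, "n_heads": 12, "qkv_bias": True, "mlp_ratio": 4,
}
print(vit_stats(**custom_config))
# {'n_patches': 576, 'seq_len': 577, 'head_dim': 64, 'mlp_hidden': 3072, 'n_params': 86859496}
```

Note that `embed_dim` must be divisible by `n_heads`, and `img_size` by `patch_size`; changing `img_size` alters only the position-embedding size, while `depth` and `embed_dim` dominate the parameter count.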

To run the classification function and output the predicted probabilities:

python compute.py -image <image destination, usually the base dir>

(the short form -i can be used instead of -image)
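The final part of classification, turning the model's raw output scores (logits) into probabilities, can be illustrated as follows. This is a hedged sketch rather than the contents of compute.py: the `top_k_predictions` helper, the label names and the logit values are all hypothetical, and a real script also has to load, resize (384x384 by default) and normalise the image before the forward pass.

```python
import numpy as np


def top_k_predictions(logits, labels, k=3):
    """Convert a 1D array of logits to probabilities and return the k most likely labels."""
    logits = np.asarray(logits, dtype=float)
    exp = np.exp(logits - logits.max())   # subtract the max for numerical stability
    probs = exp / exp.sum()               # softmax: probabilities sum to 1
    order = np.argsort(probs)[::-1][:k]   # indices sorted by descending probability
    return [(labels[i], float(probs[i])) for i in order]


# Hypothetical labels and logits standing in for the model's output.
labels = ["cat", "dog", "fox", "owl"]
logits = [2.0, 1.0, 0.1, -1.0]
for label, prob in top_k_predictions(logits, labels, k=3):
    print(f"{label}: {prob:.3f}")
```

Softmax preserves the ordering of the logits, so the top-ranked label is the same with or without it; the conversion matters when the probabilities themselves are reported.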

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/FeaturePatch-VisionTransformation)
  3. Commit your Changes (git commit -m 'Add some updates')
  4. Push to the Branch (git push origin feature/FeaturePatch-VisionTransformation)
  5. Open a Pull Request

Acknowledgements

Contributors

akshaykalucha
