three0-s / clip-ivp

[MICCAI2023 Workshop] Official implementation of CLIP-IVP: CLIP-based Intraoral View Prediction.

License: Other

Dockerfile 0.23% Python 89.24% Shell 0.43% C++ 3.10% Cuda 7.00%

clip-ivp's Introduction

[MICCAI2023 Workshop] Official PyTorch Implementation of CLIP-IVP

This is the official implementation of CLIP-IVP: CLIP-based Intraoral View Prediction, presented at the MICCAI 2023 1st MedAGI workshop. Our implementation builds on the source code of the stylegan2-ada repository.

In this study, we propose a novel method called CLIP-based Intraoral View Prediction (CLIP-IVP) for predicting novel views of a patient's intraoral structure from a single frontal image of the teeth. This task has not been explored in previous medical imaging applications.

Our approach leverages a pre-trained CLIP image encoder to represent frontal intraoral images, which reduces the time and resources needed for training. Our model achieves a Fréchet Inception Distance (FID) score of 3.4 on the intraoral view prediction task, suggesting the effectiveness of our method. Furthermore, we demonstrate that our model can be used to predict the orthodontic treatment process in a one-shot manner, which might be useful in treatment planning and prediction. Overall, our proposed model provides a new framework for generating high-fidelity medical images and opens up possibilities for future research in this field. Our work has the potential to provide more accurate predictions using only a small amount of data, and to support clinicians in treatment planning. Additionally, our approach can be adapted for use in other domains.
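
For reference, the FID quoted above is conventionally defined as below. This is the standard definition of the metric, not something specific to this repository. Here $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the mean and covariance of Inception features computed over real and generated images respectively, and lower values indicate closer feature distributions.

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
$$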

Network Diagram

This figure illustrates the training and inference processes for CLIP-IVP. The pre-trained CLIP image encoder is frozen during both the training and inference phases.

Semantic-Nudge

An example of a one-shot semantic nudge using a pair of source images. We manipulate the base image so that it reflects the semantic difference between the two source images, such as pre- and post-treatment intraoral images. By applying the semantic nudge progressively, we can predict the course of a patient's orthodontic treatment. The predicted images show the process of leveling and alignment.
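
One plausible reading of the semantic nudge is simple latent arithmetic: add a scaled copy of the difference between the two source embeddings to the base embedding, then decode. The sketch below illustrates only that arithmetic, on toy 3-D vectors in pure Python. The function names, the strength parameter `alpha`, and the re-normalization step are assumptions made for illustration, not taken from this repository. In practice the vectors would be CLIP image embeddings held in PyTorch tensors.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def semantic_nudge(base, src_pre, src_post, alpha):
    """Shift `base` along the direction (src_post - src_pre).

    `alpha` is a hypothetical strength parameter: 0.0 leaves the base
    direction unchanged, larger values apply more of the source difference.
    Re-normalizing the result is an assumption of this sketch.
    """
    if not (len(base) == len(src_pre) == len(src_post)):
        raise ValueError("all embeddings must have the same dimension")
    nudged = [b + alpha * (q - p) for b, p, q in zip(base, src_pre, src_post)]
    return normalize(nudged)


# Toy stand-ins for the embeddings of a base image and a
# pre-/post-treatment source pair.
base = normalize([1.0, 0.0, 0.0])
pre = normalize([1.0, 1.0, 0.0])
post = normalize([1.0, 0.0, 1.0])

# Increasing alpha gives a progression of latents, each of which
# would be passed to the generator to produce one predicted image.
steps = [semantic_nudge(base, pre, post, a) for a in (0.0, 0.5, 1.0)]
```

Sweeping `alpha` over a range of values is what would produce a sequence of intermediate predictions rather than a single edited image.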

Text-To-Image

We tested the zero-shot text-to-image generation ability of our model. For this task, we simply substituted the CLIP text encoder for the CLIP image encoder at the inference phase, without any further training. Since our model is never trained on CLIP text embeddings directly, this method cannot guarantee the results. Nevertheless, it gives us a chance to explore the pre-trained CLIP latent space when medical terminology is projected onto it.

Direction-Dependency of CLIP Embeddings

Since CLIP latents are unit-normalized during CLIP training, we hypothesize that information is encoded in the direction of the latent code and that the norm of the latent has only a trivial impact. We test this assumption by multiplying unit-normalized CLIP embeddings by a series of constants and decoding the resulting intermediate latents. The results in the figure above are consistent with our conjecture.
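
The intuition behind this test can be checked numerically: multiplying a unit-normalized vector by a positive constant changes its norm but leaves its direction (cosine similarity to the original) unchanged. The sketch below shows this on a toy vector in pure Python. The helper names and the particular constants are hypothetical, and the sketch does not involve the decoder. It only shows which property of the latent the scaling experiment varies.

```python
import math


def l2_norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def scale(v, c):
    """Multiply every component of `v` by the constant `c`."""
    return [c * x for x in v]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (l2_norm(a) * l2_norm(b))


# Toy stand-in for a unit-normalized CLIP embedding.
raw = [3.0, 4.0, 0.0]
unit = scale(raw, 1.0 / l2_norm(raw))

# Hypothetical constants. Each scaled latent would be decoded
# in the actual experiment.
constants = (0.5, 1.0, 2.0, 4.0)
scaled_latents = [scale(unit, c) for c in constants]

for c, z in zip(constants, scaled_latents):
    print(f"c={c}: norm={l2_norm(z):.3f}, cos={cosine_similarity(unit, z):.3f}")
```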
