gianscarpe / positional-diffusion

This project forked from iit-pavis/positional_diffusion

Code for "Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models"

License: MIT License

Python 100.00%

Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models

Positional reasoning is the process of ordering the unsorted parts of a set into a consistent structure. We present Positional Diffusion, a plug-and-play graph formulation that uses Diffusion Probabilistic Models to address positional reasoning. The forward process maps the positions of the elements in a set to random positions in a continuous space. Positional Diffusion learns to reverse this noising process and recover the original positions through an Attention-based Graph Neural Network. We conduct extensive experiments on benchmark datasets, including two puzzle datasets, three sentence ordering datasets, and one visual storytelling dataset. Our method outperforms long-standing research on puzzle solving by up to 18% over the second-best deep learning method, and performs on par with state-of-the-art methods on sentence ordering and visual storytelling. Our work highlights the suitability of diffusion models for ordering problems and proposes a novel formulation and method for solving various ordering tasks.
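The forward (noising) step described above can be illustrated with a minimal sketch. It assumes the standard DDPM closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with a linear beta schedule, and 1-D positions scaled to [-1, 1]. The function names, the schedule values, and the 1-D setting are all illustrative assumptions, not taken from this repository.

```python
import math
import random


def linear_alpha_bar(steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / max(steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def initial_positions(n):
    """Evenly spaced ground-truth positions in [-1, 1] for n ordered elements."""
    if n == 1:
        return [0.0]
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]


def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) using the standard DDPM closed form."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


def order_from_positions(positions):
    """Read an ordering back from continuous positions by sorting them."""
    return sorted(range(len(positions)), key=lambda i: positions[i])


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bar = linear_alpha_bar(300)
    x0 = initial_positions(5)
    for t in (0, 150, 299):
        xt = forward_noise(x0, t, alpha_bar, rng)
        print(t, order_from_positions(xt))
```

In this sketch the ordering read from lightly noised positions matches the original one, and it degrades as t grows. The learned reverse process is what recovers the ordering from heavily noised positions.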

Method

Datasets

Puzzles

Text

  • ROCStories: Link
  • NeurIPS Abstracts:
  • Wikipedia Movie Plots: Link

VIST

Rooms

  • Code and dataset at Link

Environment

  • We provide the environment definition in singularity/build/conda_env.yaml
  • Singularity image is also available at [WIP]
  • Requirements:
  - pytorch==1.12.1
  - cudatoolkit<=11.3.10
  - pyg
  - einops
  - black
  - pre-commit
  - pytorch-lightning<1.8
  - pip
  - matplotlib
  - wandb
  - transformers
  - timm
  - kornia
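
The pinned versions above can be checked in an existing environment with a small script. This is a minimal sketch rather than part of the repository: it assumes the conda package pytorch corresponds to the pip distribution name torch, and it only compares leading numeric version components. The helper names are hypothetical.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.12.1+cu113' into (1, 12, 1), ignoring local and pre-release suffixes."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings with one of the operators used in the list above."""
    a, b = parse_version(installed), parse_version(required)
    if op == "==":
        return a == b
    if op == "<":
        return a < b
    if op == "<=":
        return a <= b
    raise ValueError(f"unsupported operator: {op}")


# Assumed pip distribution names for two of the pinned requirements.
CONSTRAINTS = [("torch", "==", "1.12.1"), ("pytorch-lightning", "<", "1.8")]


def check_environment(constraints=CONSTRAINTS):
    """Return a short status string per distribution."""
    report = {}
    for dist, op, required in constraints:
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = "not installed"
            continue
        ok = satisfies(installed, op, required)
        report[dist] = f"{installed} ({'ok' if ok else 'expected ' + op + required})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```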

Training

Puzzles

Training script for puzzles:

  • Choose between two datasets: wikiart, celeba
  • Train the model on all puzzle sizes: 6,8,10,12
  • At inference, choose between zero-center sampling (--noise_weight 0) or Gaussian sampling (--noise_weight 1)
python puzzle_diff/train_script.py -dataset [wikiart,celeba] -puzzle_sizes 6,8,10,12 -inference_ratio 10 -sampling DDIM -gpu 1 -batch_size 8 -steps 300 -num_workers 6 --noise_weight [0,1] --predict_xstart True
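
The bracketed choices in the command above ([wikiart,celeba] and [0,1]) are placeholders, and one value must be picked for each. As a minimal sketch, the following hypothetical helper expands them into concrete command lines. It only reuses the flags and values shown above, and the function name is an assumption for illustration.

```python
import shlex


def puzzle_train_command(dataset, noise_weight, puzzle_sizes=(6, 8, 10, 12)):
    """Build the puzzle training command as an argument list."""
    if dataset not in ("wikiart", "celeba"):
        raise ValueError(f"unknown puzzle dataset: {dataset}")
    return [
        "python", "puzzle_diff/train_script.py",
        "-dataset", dataset,
        "-puzzle_sizes", ",".join(str(s) for s in puzzle_sizes),
        "-inference_ratio", "10",
        "-sampling", "DDIM",
        "-gpu", "1",
        "-batch_size", "8",
        "-steps", "300",
        "-num_workers", "6",
        "--noise_weight", str(noise_weight),
        "--predict_xstart", "True",
    ]


if __name__ == "__main__":
    # Print one command per dataset and sampling mode.
    for dataset in ("wikiart", "celeba"):
        for noise_weight in (0, 1):
            print(shlex.join(puzzle_train_command(dataset, noise_weight)))
```

The argument list can also be passed directly to subprocess.run to launch a run from Python.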

[Figures: PuzzleWikiArt]

Text

Training script for Text:

  • Choose between three datasets: roc,wiki,nips
python puzzle_diff/train_text.py -dataset roc -inference_ratio 10 -sampling DDIM -gpus 2 -batch_size 16 -steps 100 -num_workers 6 --predict_xstart True

[Figure: NIPS]

VIST

Training script for VIST:

python puzzle_diff/train_vist.py -dataset sind -inference_ratio 10 -sampling DDIM -gpus 1 -batch_size 8 -steps 100 -num_workers 6 --predict_xstart True

[Figure: VIST]

Room Rearrangement

  • WIP

Additional parameters

To continue training from a model checkpoint stored at /path/to/ckpt, pass its path with --checkpoint_path:

... --checkpoint_path /path/to/ckpt

Contributors

fgiuliari, gianscarpe
