费政聪's Projects
Attention-Aligned Transformer for Image Captioning
Actor-Critic Sequence Generation for Relative Difference Captioning
All In One: General Multimodal Large Language Model
Multimodal dataset for arXiv
An image captioner for the Chinese language
A strategy for producing a clean WebVid-10M dataset
When CLIP Meets MAE and Beyond
Incorporating CLIP features into Transformer-based image captioning
Cross Lingual Knowledge Alignment for Stable Diffusion Models
Dynamic Early Exit for Image Captioning
Multi-modal dialogue system
Controllable Image Captioning with Diffusion Model
A tutorial on diffusion models for text-guided image generation
Scaling RWKV-Like Architectures for Diffusion Models
Transformer-Mamba Diffusion Models
Scalable Diffusion Models with State Space Backbone
Efficient Vision Transformers with Dynamic Token Routing
Scaling Diffusion Transformers with Mixture of Experts
Promoting Coherence and Diversity in Image Captioning
Descriptive synthetic captions in DALL-E 3
Fast sorting for massive data
Transformer-based Food-Comment Matching model
Efficient modeling of future context for image captioning
Official implementation of the GameTag algorithm
PyTorch implementation of a Graph Convolutional Network
Gradient-Free Textual Inversion for Personalized Text-to-Image Generation
Image Editing Anything