amusi / cvpr2024-papers-with-code

A collection of CVPR 2024 papers and open-source projects

cvpr cvpr2020 computer-vision deep-learning machine-learning object-detection image-segmentation paper image-processing visual-tracking

cvpr2024-papers-with-code's Introduction

CVPR 2024 Papers and Open-Source Projects (Papers with Code)

CVPR 2024 decisions are now available on OpenReview!

Note 1: Everyone is welcome to open an issue to share CVPR 2024 papers and open-source projects!

Note 2: For papers from previous top CV conferences, other high-quality CV papers, and comprehensive roundups, see: https://github.com/amusi/daily-paper-computer-vision

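Scan the QR code to join the CVer Academic Exchange Group, the largest computer vision and AI Knowledge Planet community! It is updated daily and shares the latest learning materials on computer vision, AI painting, image processing, deep learning, autonomous driving, medical imaging, AIGC, and more. Start learning!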

[CVPR 2024 Papers with Code: Contents]

3DGS(Gaussian Splatting)

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction

SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis

DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

Avatars

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

Real-Time Simulated Avatar from Head-Mounted Sensors

Backbone

RepViT: Revisiting Mobile CNN From ViT Perspective

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

CLIP

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

FairCLIP: Harnessing Fairness in Vision-Language Learning

MAE

Embodied AI

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

LEMON: Learning 3D Human-Object Interaction Relation from 2D Images

GAN

OCR

An Empirical Study of Scaling Law for OCR

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

NeRF

PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF

DETR

DETRs Beat YOLOs on Real-time Object Detection

Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Prompt

Multimodal Large Language Models (MLLM)

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Link-Context Learning for Multimodal LLMs

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Making Large Multimodal Models Understand Arbitrary Visual Prompts

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

OneLLM: One Framework to Align All Modalities with Language

Large Language Models (LLM)

VTimeLLM: Empower LLM to Grasp Video Moments

NAS

ReID (Re-Identification)

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Noisy-Correspondence Learning for Text-to-Image Person Re-identification

Diffusion Models

InstanceDiffusion: Instance-level Control for Image Generation

Residual Denoising Diffusion Models

DeepCache: Accelerating Diffusion Models for Free

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

SVGDreamer: Text Guided SVG Generation with Diffusion Model

InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model

MMA-Diffusion: MultiModal Attack on Diffusion Models

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Vision Transformer

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

RepViT: Revisiting Mobile CNN From ViT Perspective

A General and Efficient Training for Transformer via Token Expansion

Vision-Language

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

FairCLIP: Harnessing Fairness in Vision-Language Learning

Object Detection

DETRs Beat YOLOs on Real-time Object Detection

Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation

YOLO-World: Real-Time Open-Vocabulary Object Detection

Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Anomaly Detection

Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection

Object Tracking

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

Semantic Segmentation

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Medical Image

Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images

Medical Image Segmentation

Autonomous Driving

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

Memory-based Adapters for Online 3D Scene Perception

Symphonize 3D Semantic Scene Completion with Contextual Instance Queries

A Real-world Large-scale Dataset for Roadside Cooperative Perception

Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

Traffic Scene Parsing through the TSP6K Dataset

3D Point Cloud

3D Object Detection

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

UniMODE: Unified Monocular 3D Object Detection

3D Semantic Segmentation

Image Editing

Edit One for All: Interactive Batch Image Editing

Video Editing

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers

Low-level Vision

Residual Denoising Diffusion Models

Boosting Image Restoration via Priors from Pre-trained Models

Super-Resolution

SeD: Semantic-Aware Discriminator for Image Super-Resolution

APISR: Anime Production Inspired Real-World Anime Super-Resolution

Denoising

Image Denoising

3D Human Pose Estimation

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Image Generation

InstanceDiffusion: Instance-level Control for Image Generation

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

Instruct-Imagen: Image Generation with Multi-modal Instruction

Residual Denoising Diffusion Models

UniGS: Unified Representation for Image Generation and Segmentation

Multi-Instance Generation Controller for Text-to-Image Synthesis

SVGDreamer: Text Guided SVG Generation with Diffusion Model

InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model

Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following

Video Generation

Vlogger: Make Your Dream A Vlog

VBench: Comprehensive Benchmark Suite for Video Generative Models

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

3D Generation

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

Video Understanding

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Knowledge Distillation

Logit Standardization in Knowledge Distillation

Efficient Dataset Distillation via Minimax Diffusion

Stereo Matching

Neural Markov Random Field for Stereo Matching

Scene Graph Generation

HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation

Video Quality Assessment

KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos

Datasets

A Real-world Large-scale Dataset for Roadside Cooperative Perception

Traffic Scene Parsing through the TSP6K Dataset

Others

Object Recognition as Next Token Prediction

ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks

Seamless Human Motion Composition with Blended Positional Encodings

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update

MoMask: Generative Masked Modeling of 3D Human Motions

Amodal Ground Truth and Completion in the Wild

Improved Visual Grounding through Self-Consistent Explanations

ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object

Learning from Synthetic Human Group Activities

A Cross-Subject Brain Decoding Framework

Multi-Task Dense Prediction via Mixture of Low-Rank Experts

Contrastive Mean-Shift Learning for Generalized Category Discovery

cvpr2024-papers-with-code's People

Contributors

amusi, anirudh257, daveredrum, jihongju-tomtom

cvpr2024-papers-with-code's Issues

Add a few papers

Class Add

Can you add a new class about Face Age Estimation?
