amusi / cvpr2024-papers-with-code

A collection of CVPR 2024 papers and open-source projects

cvpr cvpr2020 computer-vision deep-learning machine-learning object-detection image-segmentation paper image-processing visual-tracking

cvpr2024-papers-with-code's Introduction

CVPR 2024 Papers and Open-Source Projects (Papers with Code)

CVPR 2024 decisions are now available on OpenReview!

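Note 1: Everyone is welcome to open an issue to share CVPR 2024 papers and open-source projects!

Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision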

ECCV 2024

CVPR 2023

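Scan the QR code to join the CVer academic exchange group, the largest computer vision and AI Knowledge Planet (知识星球)! It is updated daily and shares the latest learning materials on computer vision, AI painting, image processing, deep learning, autonomous driving, medical imaging, AIGC, and related areas.

[CVPR 2024 Open-Source Papers: Table of Contents]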

3DGS(Gaussian Splatting)
Avatars
Backbone
CLIP
MAE
Embodied AI
GAN
GNN
Multimodal Large Language Models (MLLM)
Large Language Models (LLM)
NAS
OCR
NeRF
DETR
Prompt
Diffusion Models
ReID (Re-Identification)
Long-Tail Distribution
Vision Transformer
Vision-Language
Self-supervised Learning
Data Augmentation
Object Detection
Anomaly Detection
Visual Tracking
Semantic Segmentation
Instance Segmentation
Panoptic Segmentation
Medical Image
Medical Image Segmentation
Video Object Segmentation
Video Instance Segmentation
Referring Image Segmentation
Image Matting
Image Editing
Low-level Vision
Super-Resolution
Denoising
Deblur
Autonomous Driving
3D Point Cloud
3D Object Detection
3D Semantic Segmentation
3D Object Tracking
3D Semantic Scene Completion
3D Registration
3D Human Pose Estimation
3D Human Mesh Estimation
Image Generation
Video Generation
3D Generation
Video Understanding
Action Detection
Text Detection
Knowledge Distillation
Model Pruning
Image Compression
3D Reconstruction
Depth Estimation
Trajectory Prediction
Lane Detection
Image Captioning
Visual Question Answering
Sign Language Recognition
Video Prediction
Novel View Synthesis
Zero-Shot Learning
Stereo Matching
Feature Matching
Scene Graph Generation
Implicit Neural Representations
Image Quality Assessment
Video Quality Assessment
Datasets
New Tasks
Others

3DGS(Gaussian Splatting)

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Homepage: https://city-super.github.io/scaffold-gs/
Paper: https://arxiv.org/abs/2312.00109
Code: https://github.com/city-super/Scaffold-GS

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Homepage: https://shunyuanzheng.github.io/GPS-Gaussian
Paper: https://arxiv.org/abs/2312.02155
Code: https://github.com/ShunyuanZheng/GPS-Gaussian

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

Paper: https://arxiv.org/abs/2312.02134
Code: https://github.com/huliangxiao/GaussianAvatar

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Paper: https://arxiv.org/abs/2311.14521
Code: https://github.com/buaacyw/GaussianEditor

Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction

Homepage: https://ingra14m.github.io/Deformable-Gaussians/
Paper: https://arxiv.org/abs/2309.13101
Code: https://github.com/ingra14m/Deformable-3D-Gaussians

SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

Homepage: https://yihua7.github.io/SC-GS-web/
Paper: https://arxiv.org/abs/2312.14937
Code: https://github.com/yihua7/SC-GS

Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis

Homepage: https://oppo-us-research.github.io/SpacetimeGaussians-website/
Paper: https://arxiv.org/abs/2312.16812
Code: https://github.com/oppo-us-research/SpacetimeGaussians

DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization

Homepage: https://fictionarry.github.io/DNGaussian/
Paper: https://arxiv.org/abs/2403.06912
Code: https://github.com/Fictionarry/DNGaussian

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Paper: https://arxiv.org/abs/2310.08528
Code: https://github.com/hustvl/4DGaussians

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

Paper: https://arxiv.org/abs/2310.08529
Code: https://github.com/hustvl/GaussianDreamer

Avatars

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

Paper: https://arxiv.org/abs/2312.02134
Code: https://github.com/huliangxiao/GaussianAvatar

Real-Time Simulated Avatar from Head-Mounted Sensors

Homepage: https://www.zhengyiluo.com/SimXR/
Paper: https://arxiv.org/abs/2403.06862

Backbone

RepViT: Revisiting Mobile CNN From ViT Perspective

Paper: https://arxiv.org/abs/2307.09283
Code: https://github.com/THU-MIG/RepViT

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

Paper: https://arxiv.org/abs/2311.17132
Code: https://github.com/DaiShiResearch/TransNeXt

CLIP

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Paper: https://arxiv.org/abs/2312.03818
Code: https://github.com/SunzeY/AlphaCLIP

FairCLIP: Harnessing Fairness in Vision-Language Learning

MAE

Embodied AI

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Homepage: https://tai-wang.github.io/embodiedscan/
Paper: https://arxiv.org/abs/2312.16170
Code: https://github.com/OpenRobotLab/EmbodiedScan

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Homepage: https://iranqin.github.io/MP5.github.io/
Paper: https://arxiv.org/abs/2312.07472
Code: https://github.com/IranQin/MP5

LEMON: Learning 3D Human-Object Interaction Relation from 2D Images

Paper: https://arxiv.org/abs/2312.08963
Code: https://github.com/yyvhang/lemon_3d

GAN

OCR

An Empirical Study of Scaling Law for OCR

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

Paper: https://arxiv.org/abs/2403.00303
Code: https://github.com/PriNing/ODM

NeRF

PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF

Paper: https://arxiv.org/abs/2311.13099
Code: https://github.com/FYTalon/pienerf/

DETR

DETRs Beat YOLOs on Real-time Object Detection

Paper: https://arxiv.org/abs/2304.08069
Code: https://github.com/lyuwenyu/RT-DETR

Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Paper: https://arxiv.org/abs/2403.16131
Code: https://github.com/xiuqhou/Salience-DETR

Prompt

Multimodal Large Language Models (MLLM)

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Link-Context Learning for Multimodal LLMs

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Paper: https://arxiv.org/abs/2311.17911
Code: https://github.com/shikiw/OPERA

Making Large Multimodal Models Understand Arbitrary Visual Prompts

Homepage: https://vip-llava.github.io/
Paper: https://arxiv.org/abs/2312.00784

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

Paper: https://arxiv.org/abs/2310.00582
Code: https://github.com/SY-Xuan/Pink

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Paper: https://arxiv.org/abs/2311.08046
Code: https://github.com/PKU-YuanGroup/Chat-UniVi

OneLLM: One Framework to Align All Modalities with Language

Paper: https://arxiv.org/abs/2312.03700
Code: https://github.com/csuhan/OneLLM

Large Language Models (LLM)

VTimeLLM: Empower LLM to Grasp Video Moments

Paper: https://arxiv.org/abs/2311.18445
Code: https://github.com/huangb23/VTimeLLM

NAS

ReID (Re-Identification)

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Paper: https://arxiv.org/abs/2403.10254
Code: https://github.com/924973292/EDITOR

Noisy-Correspondence Learning for Text-to-Image Person Re-identification

Paper: https://arxiv.org/abs/2308.09911
Code: https://github.com/QinYang79/RDE

Diffusion Models

InstanceDiffusion: Instance-level Control for Image Generation

Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
Paper: https://arxiv.org/abs/2402.03290
Code: https://github.com/frank-xwang/InstanceDiffusion

Residual Denoising Diffusion Models

Paper: https://arxiv.org/abs/2308.13712
Code: https://github.com/nachifur/RDDM

DeepCache: Accelerating Diffusion Models for Free

Paper: https://arxiv.org/abs/2312.00858
Code: https://github.com/horseee/DeepCache

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

Homepage: https://tianhao-qi.github.io/DEADiff/
Paper: https://arxiv.org/abs/2403.06951
Code: https://github.com/Tianhao-Qi/DEADiff_code

SVGDreamer: Text Guided SVG Generation with Diffusion Model

Paper: https://arxiv.org/abs/2312.16476
Code: https://ximinng.github.io/SVGDreamer-project/

InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model

Paper: https://arxiv.org/abs/2312.05849
Code: https://github.com/jiuntian/interactdiffusion

MMA-Diffusion: MultiModal Attack on Diffusion Models

Paper: https://arxiv.org/abs/2311.17516
Code: https://github.com/yangyijune/MMA-Diffusion

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Homepage: https://video-motion-customization.github.io/
Paper: https://arxiv.org/abs/2312.00845
Code: https://github.com/HyeonHo99/Video-Motion-Customization

Vision Transformer

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

Paper: https://arxiv.org/abs/2311.17132
Code: https://github.com/DaiShiResearch/TransNeXt

RepViT: Revisiting Mobile CNN From ViT Perspective

Paper: https://arxiv.org/abs/2307.09283
Code: https://github.com/THU-MIG/RepViT

A General and Efficient Training for Transformer via Token Expansion

Paper: https://arxiv.org/abs/2404.00672
Code: https://github.com/Osilly/TokenExpansion

Vision-Language

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

Paper: https://arxiv.org/abs/2403.02781
Code: https://github.com/zhengli97/PromptKD

FairCLIP: Harnessing Fairness in Vision-Language Learning

Object Detection

DETRs Beat YOLOs on Real-time Object Detection

Paper: https://arxiv.org/abs/2304.08069
Code: https://github.com/lyuwenyu/RT-DETR

Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation

YOLO-World: Real-Time Open-Vocabulary Object Detection

Paper: https://arxiv.org/abs/2401.17270
Code: https://github.com/AILab-CVC/YOLO-World

Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Paper: https://arxiv.org/abs/2403.16131
Code: https://github.com/xiuqhou/Salience-DETR

Anomaly Detection

Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection

Paper: https://arxiv.org/abs/2310.12790
Code: https://github.com/mala-lab/AHL

Object Tracking

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

Semantic Segmentation

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

Paper: https://arxiv.org/abs/2312.04265
Code: https://github.com/w1oves/Rein

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Paper: https://arxiv.org/abs/2311.15537
Code: https://github.com/xb534/SED

Medical Image

Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology

Paper: https://arxiv.org/abs/2402.17228
Code: https://github.com/DearCaat/RRT-MIL

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

Paper: https://arxiv.org/abs/2402.17300
Code: https://github.com/Luffy03/VoCo

ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images

Paper: https://arxiv.org/abs/2311.15264
Code: https://github.com/nicoboou/chada_vit

Medical Image Segmentation

Autonomous Driving

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

Paper: https://arxiv.org/abs/2310.08370
Code: https://github.com/Nightmare-n/UniPAD

Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

Paper: https://arxiv.org/abs/2311.17663
Code: https://github.com/haomo-ai/Cam4DOcc

Memory-based Adapters for Online 3D Scene Perception

Paper: https://arxiv.org/abs/2403.06974
Code: https://github.com/xuxw98/Online3D

Symphonize 3D Semantic Scene Completion with Contextual Instance Queries

Paper: https://arxiv.org/abs/2306.15670
Code: https://github.com/hustvl/Symphonies

A Real-world Large-scale Dataset for Roadside Cooperative Perception

Paper: https://arxiv.org/abs/2403.10145
Code: https://github.com/AIR-THU/DAIR-RCooper

Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

Paper: https://arxiv.org/abs/2403.07535
Code: https://github.com/Junda24/AFNet

Traffic Scene Parsing through the TSP6K Dataset

Paper: https://arxiv.org/pdf/2303.02835.pdf
Code: https://github.com/PengtaoJiang/TSP6K

3D Point Cloud

3D Object Detection

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

Paper: https://arxiv.org/abs/2312.08371
Code: https://github.com/kuanchihhuang/PTT

UniMODE: Unified Monocular 3D Object Detection

Paper: https://arxiv.org/abs/2402.18573

3D Semantic Segmentation

Image Editing

Edit One for All: Interactive Batch Image Editing

Homepage: https://thaoshibe.github.io/edit-one-for-all
Paper: https://arxiv.org/abs/2401.10219
Code: https://github.com/thaoshibe/edit-one-for-all

Video Editing

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers

Homepage: https://maskint.github.io
Paper: https://arxiv.org/abs/2312.12468

Low-level Vision

Residual Denoising Diffusion Models

Paper: https://arxiv.org/abs/2308.13712
Code: https://github.com/nachifur/RDDM

Boosting Image Restoration via Priors from Pre-trained Models

Paper: https://arxiv.org/abs/2403.06793

Super-Resolution

SeD: Semantic-Aware Discriminator for Image Super-Resolution

Paper: https://arxiv.org/abs/2402.19387
Code: https://github.com/lbc12345/SeD

APISR: Anime Production Inspired Real-World Anime Super-Resolution

Paper: https://arxiv.org/abs/2403.01598
Code: https://github.com/Kiteretsu77/APISR

Denoising

Image Denoising

3D Human Pose Estimation

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Paper: https://arxiv.org/abs/2311.12028
Code: https://github.com/NationalGAILab/HoT

Image Generation

InstanceDiffusion: Instance-level Control for Image Generation

Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
Paper: https://arxiv.org/abs/2402.03290
Code: https://github.com/frank-xwang/InstanceDiffusion

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

Homepage: https://eclipse-t2i.vercel.app/
Paper: https://arxiv.org/abs/2312.04655
Code: https://github.com/eclipse-t2i/eclipse-inference

Instruct-Imagen: Image Generation with Multi-modal Instruction

Paper: https://arxiv.org/abs/2401.01952

Residual Denoising Diffusion Models

Paper: https://arxiv.org/abs/2308.13712
Code: https://github.com/nachifur/RDDM

UniGS: Unified Representation for Image Generation and Segmentation

Paper: https://arxiv.org/abs/2312.01985

Multi-Instance Generation Controller for Text-to-Image Synthesis

Paper: https://arxiv.org/abs/2402.05408
Code: https://github.com/limuloo/migc

SVGDreamer: Text Guided SVG Generation with Diffusion Model

Paper: https://arxiv.org/abs/2312.16476
Code: https://ximinng.github.io/SVGDreamer-project/

InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model

Paper: https://arxiv.org/abs/2312.05849
Code: https://github.com/jiuntian/interactdiffusion

Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following

Paper: https://arxiv.org/abs/2311.17002
Code: https://github.com/ali-vilab/Ranni

Video Generation

Vlogger: Make Your Dream A Vlog

Paper: https://arxiv.org/abs/2401.09414
Code: https://github.com/Vchitect/Vlogger

VBench: Comprehensive Benchmark Suite for Video Generative Models

Homepage: https://vchitect.github.io/VBench-project/
Paper: https://arxiv.org/abs/2311.17982
Code: https://github.com/Vchitect/VBench

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Homepage: https://video-motion-customization.github.io/
Paper: https://arxiv.org/abs/2312.00845
Code: https://github.com/HyeonHo99/Video-Motion-Customization

3D Generation

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

Homepage: https://haozhexie.com/project/city-dreamer/
Paper: https://arxiv.org/abs/2309.00610
Code: https://github.com/hzxie/city-dreamer

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

Paper: https://arxiv.org/abs/2311.11284
Code: https://github.com/EnVision-Research/LucidDreamer

Video Understanding

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Knowledge Distillation

Logit Standardization in Knowledge Distillation

Efficient Dataset Distillation via Minimax Diffusion

Paper: https://arxiv.org/abs/2311.15529
Code: https://github.com/vimar-gu/MinimaxDiffusion

Stereo Matching

Neural Markov Random Field for Stereo Matching

Paper: https://arxiv.org/abs/2403.11193
Code: https://github.com/aeolusguan/NMRF

Scene Graph Generation

HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation

Homepage: https://zhangce01.github.io/HiKER-SGG/
Paper: https://arxiv.org/abs/2403.12033
Code: https://github.com/zhangce01/HiKER-SGG

Video Quality Assessment

KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos

Homepage: https://lixinustc.github.io/projects/KVQ/
Paper: https://arxiv.org/abs/2402.07220
Code: https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024

Datasets

A Real-world Large-scale Dataset for Roadside Cooperative Perception

Paper: https://arxiv.org/abs/2403.10145
Code: https://github.com/AIR-THU/DAIR-RCooper

Traffic Scene Parsing through the TSP6K Dataset

Paper: https://arxiv.org/pdf/2303.02835.pdf
Code: https://github.com/PengtaoJiang/TSP6K

Others

Object Recognition as Next Token Prediction

Paper: https://arxiv.org/abs/2312.02142
Code: https://github.com/kaiyuyue/nxtp

ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks

Paper: https://arxiv.org/abs/2306.14525
Code: https://parameternet.github.io/

Seamless Human Motion Composition with Blended Positional Encodings

Paper: https://arxiv.org/abs/2402.15509
Code: https://github.com/BarqueroGerman/FlowMDM

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update

Homepage: https://clova-tool.github.io/
Paper: https://arxiv.org/abs/2312.10908

MoMask: Generative Masked Modeling of 3D Human Motions

Paper: https://arxiv.org/abs/2312.00063
Code: https://github.com/EricGuo5513/momask-codes

Amodal Ground Truth and Completion in the Wild

Homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/
Paper: https://arxiv.org/abs/2312.17247
Code: https://github.com/Championchess/Amodal-Completion-in-the-Wild

Improved Visual Grounding through Self-Consistent Explanations

Paper: https://arxiv.org/abs/2312.04554
Code: https://github.com/uvavision/SelfEQ

ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object

Homepage: https://chenshuang-zhang.github.io/imagenet_d/
Paper: https://arxiv.org/abs/2403.18775
Code: https://github.com/chenshuang-zhang/imagenet_d

Learning from Synthetic Human Group Activities

Homepage: https://cjerry1243.github.io/M3Act/
Paper: https://arxiv.org/abs/2306.16772
Code: https://github.com/cjerry1243/M3Act

A Cross-Subject Brain Decoding Framework

Homepage: https://littlepure2333.github.io/MindBridge/
Paper: https://arxiv.org/abs/2404.07850
Code: https://github.com/littlepure2333/MindBridge

Multi-Task Dense Prediction via Mixture of Low-Rank Experts

Paper: https://arxiv.org/abs/2403.17749
Code: https://github.com/YuqiYang213/MLoRE

Contrastive Mean-Shift Learning for Generalized Category Discovery

Homepage: https://postech-cvlab.github.io/cms/
Paper: https://arxiv.org/abs/2404.09451
Code: https://github.com/sua-choi/CMS


cvpr2024-papers-with-code's Issues

Add a new paper: 3D point cloud registration

3D point cloud registration

Feature-metric Registration: A Fast Semi-supervised Approach for Robust Point Cloud Registration without Correspondences
Paper: https://arxiv.org/abs/2005.01014
Code: https://github.com/XiaoshuiHuang/fmr

Add a paper on GANs for representation learning: "Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning"

Full-text link: https://arxiv.org/abs/1912.01899
code link: https://github.com/SsGood/DBGAN

Add an adversarial examples paper

Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction
Paper: https://openaccess.thecvf.com/content_CVPR_2020/papers/Lu_Enhancing_Cross-Task_Black-Box_Transferability_of_Adversarial_Examples_With_Dispersion_Reduction_CVPR_2020_paper.pdf
Code: https://github.com/erbloo/dr_cvpr20

Deep learning related

We hope you can add our paper (accepted):

Title: Filter Grafting for Deep Neural Networks

Paper: https://arxiv.org/abs/2001.05868
Code: https://github.com/fxmeng/filter-grafting
Paper explanation: https://www.zhihu.com/question/372070853/answer/1041569335

Thanks!

Neural network pruning paper

Title: HRank: Filter Pruning using High-Rank Feature Map
Paper: http://arxiv.org/abs/2002.10179
Code: https://github.com/lmbxmu/HRank

A style transfer paper

Title: "Diversified Arbitrary Style Transfer via Deep Feature Perturbation"
Paper: https://arxiv.org/abs/1909.08223
Code: https://github.com/EndyWon/Deep-Feature-Perturbation

Thanks!

Add several papers

TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style
Dataset, 3D reconstruction
Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf
Code: https://github.com/chaitanya100100/TailorNet
Dataset: https://github.com/zycliao/TailorNet_dataset

Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion
3D reconstruction
Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Chibane_Implicit_Functions_in_Feature_Space_for_3D_Shape_Reconstruction_and_CVPR_2020_paper.pdf
Code: https://github.com/jchibane/if-net

Learning to Transfer Texture from Clothing Images to 3D Humans
3D reconstruction
Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Mir_Learning_to_Transfer_Texture_From_Clothing_Images_to_3D_Humans_CVPR_2020_paper.pdf
Code: https://github.com/aymenmir1/pix2surf

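Are there open-source code collections for other top conferences?

Hello, and thank you for this excellent collection. This one covers open-source code for CVPR 2020; have you compiled open-source code for top conferences from other years, such as 2019? Thanks.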

[Semantic Scene Completion] 1 paper

3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior

Paper: https://arxiv.org/abs/2003.14052
Code: https://github.com/charlesCXK/3D-SketchAware-SSC

Thank you very much!

Kenan Dai, Yunhua Zhang, Dong Wang, Jianhua Li, Huchuan Lu, Xiaoyun Yang, High-Performance Long-Term Tracking with Meta-Updater, CVPR 2020
Paper: https://arxiv.org/abs/2004.00305
Code: https://github.com/Daikenan/LTMU

Bin Yan, Dong Wang, Huchuan Lu, Xiaoyun Yang, Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises, CVPR 2020
Paper: https://arxiv.org/abs/2003.09595
Code: https://github.com/MasterBin-IIAU/CSA

Add an object detection paper

Noise-Aware Fully Webly Supervised Object Detection

Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Shen_Noise-Aware_Fully_Webly_Supervised_Object_Detection_CVPR_2020_paper.html

Code: https://github.com/shenyunhang/NA-fWebSOD/

Thanks

[Code Link Changed]: Could you please update it?

Hi, our open-source code has a new link, and we will release the code next week. Could you please update the information? Thank you very much!

3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior

Paper Link: https://arxiv.org/abs/2003.14052
Code Link: https://github.com/charlesCXK/TorchSSC

Class Add

Can you add a new class about Face Age Estimation?

Request for adding a new paper

Title: An Efficient PointLSTM for Point Clouds Based Gesture Recognition
Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Min_An_Efficient_PointLSTM_for_Point_Clouds_Based_Gesture_Recognition_CVPR_2020_paper.html
Code: https://github.com/Blueprintf/pointlstm-gesture-recognition-pytorch

Thank you!

Duplicate video super-resolution paper

There are two entries for Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

Add one video super-resolution paper and one video interpolation paper

Video super-resolution:

TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution

Paper: https://arxiv.org/abs/1812.02898
Code: https://github.com/YapengTian/TDAN-VSR-CVPR-2020

Video interpolation:

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

Paper: https://arxiv.org/abs/2002.11616
Code: https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020

Please add our paper about deep learning

Title: Deep Image Spatial Transformation for person image generation
Code: https://github.com/RenYurui/Global-Flow-Local-Attention
Paper: https://arxiv.org/abs/2003.00696

Add a model compression paper, thanks

A paper on model compression (binary neural networks), accepted
Title: Forward and Backward Information Retention for Accurate Binary Neural Networks
Paper (arXiv): https://arxiv.org/abs/1909.10788
Code: https://github.com/htqin/IR-Net/

Add a scene text detection paper

ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection
Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_ContourNet_Taking_a_Further_Step_Toward_Accurate_Arbitrary-Shaped_Scene_Text_CVPR_2020_paper.pdf
Code: https://github.com/wangyuxin87/ContourNet

[CVPR2020] Learning Meta Face Recognition in Unseen Domains

Area: cross-domain face recognition
Title: Learning Meta Face Recognition in Unseen Domains
Paper: ArXiv or here
Protocol & evaluation code: https://github.com/cleardusk/MFR (being updated)
Write-up: https://mp.weixin.qq.com/s/YZoEnjpnlvb90qSI3xdJqQ

If you can add it, thank you very much!

Request for adding a new paper

Title: Oops! Predicting Unintentional Action in Video
Paper: https://arxiv.org/abs/1911.11206
Code: https://github.com/cvlab-columbia/oops
Dataset: https://oops.cs.columbia.edu/data/
Website: https://oops.cs.columbia.edu/

Thank you!

Add a 3D object detection paper

Full-text link: http://openaccess.thecvf.com/content_CVPR_2020/papers/Peng_IDA-3D_Instance-Depth-Aware_3D_Object_Detection_From_Stereo_Vision_for_Autonomous_CVPR_2020_paper.pdf
code link: https://github.com/swords123/IDA-3D

Please add our paper: Rethinking Performance Estimation in Neural Architecture Search

Already accepted
Title: Rethinking Performance Estimation in Neural Architecture Search
Paper: in preparation
Code: https://github.com/zhengxiawu/rethinking_performance_estimation_in_NAS
Explanation: https://www.zhihu.com/question/372070853/answer/1035234510

If you can add it, thank you very much!

Please add our paper on deep learning

title: Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives
paper: https://arxiv.org/abs/2003.10739
code: https://github.com/d-li14/DHM

[CVPR2020] EfficientDet: Scalable and Efficient Object Detection

Area: object detection
Title: EfficientDet: Scalable and Efficient Object Detection
ArXiv: https://arxiv.org/abs/1911.09070
GitHub: https://github.com/google/automl/tree/master/efficientdet

A very good paper I came across by chance; it seems to be missing here.

Add a 3D object detection paper

Structure Aware Single-stage 3D Object Detection from Point Cloud

Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.html

Code: https://github.com/skyhehe123/SA-SSD

Thanks

CVPR2020 Polarized Reflection Removal

Reflection removal using a polarization camera; the code and dataset will be open-sourced later.

project website: https://leichenyang.weebly.com/project-polarized.html
code: https://github.com/ChenyangLEI/CVPR2020-Polarized-Reflection-Removal-with-Perfect-Alignment

Thank you very much!

Thanks!

Add an HOI paper

PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection
https://arxiv.org/pdf/1912.12898
https://github.com/YueLiao/PPDM

Request for adding a new paper on Domain Generalization

Learning to Learn Single Domain Generalization

Paper: https://arxiv.org/abs/2003.13216

Code: https://github.com/joffery/M-ADA

Thanks!

Add a video frame interpolation paper

FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation

Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Gui_FeatureFlow_Robust_Video_Interpolation_via_Structure-to-Texture_Generation_CVPR_2020_paper.html

Code: https://github.com/CM-BF/FeatureFlow

Thanks!

A CVPR 2020 paper on adversarial examples

A paper on adversarial examples, accepted
Title: Zhao, Zhengyu, Zhuoran Liu, and Martha Larson. "Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance."
Paper (arXiv): https://arxiv.org/abs/1911.02466
Code: https://github.com/ZhengyuZhao/PerC-Adversarial

Please add it, many thanks!

Code Address: https://github.com/sugarruy/hashstash

Thanks!

Point Cloud Segmentation

Hi @amusi, could you update this paper (CVPR2020)

"PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling"

paper: https://arxiv.org/pdf/2003.00492.pdf
code: https://github.com/yanx27/PointASNL

Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation (NICE-GAN)
Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Chen_Reusing_Discriminators_for_Encoding_Towards_Unsupervised_Image-to-Image_Translation_CVPR_2020_paper.html
Code: https://github.com/alpc91/NICE-GAN-pytorch

Already accepted; we hope it can be added

Area: video enhancement (video deblurring)
Title: Cascaded Deep Video Deblurring Using Temporal Sharpness Prior
Paper: in preparation
Code: https://github.com/csbhr/CDVD-TSP
Project page: https://csbhr.github.io/projects/cdvd-tsp/index.html

If you can add it, thank you very much!

please kindly add HOReID

Thanks for your awesome work.
please kindly add the following Re-ID work.

Title: High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification
Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Wang_High-Order_Information_Matters_Learning_Relation_and_Topology_for_Occluded_Person_CVPR_2020_paper.html
Code: https://github.com/wangguanan/HOReID

Add a 3D human pose estimation paper

Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data
Paper: https://arxiv.org/abs/2006.07778
Code: https://github.com/Nicholasli1995/EvoSkeleton

Request for adding a new oral paper

Interpretable and Accurate Fine-grained Recognition via Region Grouping (CVPR 2020, Oral)

Paper: https://arxiv.org/pdf/2005.10411.pdf
Code: https://github.com/zxhuang1698/interpretability-by-parts
Category: Interpretability / Image Classification / Others

Thanks!

When adding code repositories, could you also note the framework, such as TensorFlow or PyTorch?

thx

Add a wireframe parsing paper

Holistically-Attracted Wireframe Parser

Fulltext: http://openaccess.thecvf.com/content_CVPR_2020/html/Xue_Holistically-Attracted_Wireframe_Parsing_CVPR_2020_paper.html

Code: https://github.com/cherubicXN/hawp
