hongbo123467 / awesome-vlm-ad-its

This project forked from ge25nab/awesome-vlm-ad-its

This repository collects research papers on large Vision-Language Models in Autonomous Driving and Intelligent Transportation Systems. It will be continuously updated to track the latest work.

License: Apache License 2.0

awesome-vlm-ad-its's Introduction

Vision-Language Models (VLMs) in Autonomous Driving (AD) and Intelligent Transportation Systems (ITS) 🚘

This repository collects research papers on Vision-Language Models in Autonomous Driving and Intelligent Transportation Systems. The repo, maintained by TUM-AIR, will be continuously updated to track the latest work in the community.

If there are any omissions or suggestions, you're warmly welcome to reach out to us ([email protected] or [email protected]).

Keywords: Vision Language Models, Large Language Models, Autonomous Driving, Intelligent Transportation Systems

Citation

Please visit Vision Language Models in Autonomous Driving and Intelligent Transportation Systems for more details and comprehensive information. If you find our paper and repo helpful, please consider citing it as follows:

@misc{zhou2023vision,
      title={Vision Language Models in Autonomous Driving and Intelligent Transportation Systems}, 
      author={Xingcheng Zhou and Mingyu Liu and Bare Luka Zagar and Ekim Yurtsever and Alois C. Knoll},
      year={2023},
      eprint={2310.14414},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

📃 Introduction

The application of Vision-Language Models (VLMs) in Autonomous Driving (AD) and Intelligent Transportation Systems (ITS) has attracted widespread attention due to their outstanding performance and their ability to leverage Large Language Models (LLMs). By integrating language data, vehicles and transportation systems can understand real-world environments more deeply, improving driving safety and efficiency.

🌟 Large VLMs in Autonomous Driving

Perception and Understanding

| Method | Year | Task | Code Link |
|---|---|---|---|
| The Traffic Scene Understanding and Prediction Based on Image Captioning | 2020 | Image Captioning | |
| VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision | 2023 | Pedestrian Detection | Github |
| Unsupervised Multi-view Pedestrian Detection | 2023 | Pedestrian Detection | |
| Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | 2023 | Single Object Referring | |
| Referring Multi-Object Tracking | 2023 | Multiple Objects Referring and Tracking | Github |
| Language Prompt for Autonomous Driving | 2023 | Multiple Objects Referring and Tracking | Github |
| OpenScene: 3D Scene Understanding with Open Vocabularies | 2023 | Open-Voc 3D Semantic Segmentation | Github |
| CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP | 2023 | Open-Voc 3D Semantic Segmentation | Github |
| Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving | 2023 | Open-Voc 3D Object Detection and Tracking | |
| Zelda: Video Analytics using Vision-Language Models | 2023 | Language-guided Video Retrieval | |
| NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario | 2023 | Visual Question Answering | Github |
| Talk2BEV: Language-Enhanced Bird's Eye View (BEV) Maps | 2023 | Visual Spatial Reasoning, Open-loop Decision Making | Github |
| Semantic Anomaly Detection with Large Language Models | 2023 | Semantic Anomaly Detection | |

Navigation and Planning

| Method | Year | Task | Code Link |
|---|---|---|---|
| Talk to the vehicle: Language conditioned autonomous navigation of self driving cars | 2019 | Language-Guided Navigation | |
| Ground then Navigate: Language-guided Navigation in Dynamic Scenes | 2022 | Language-Guided Navigation | |
| ALT-Pilot: Autonomous navigation with Language augmented Topometric maps | 2023 | Vision-Language Localization, Language-Guided Navigation | Page |
| GPT-Driver: Learning to Drive with GPT | 2023 | Motion Planning | Github |
| Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving | 2023 | Trajectory Prediction | |

Decision-Making and Control

| Method | Year | Task | Code Link |
|---|---|---|---|
| Advisable Learning for Self-driving Vehicles by Internalizing Observation-to-Action Rules | 2020 | Open-loop Decision-Making | |
| LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving | 2023 | Open-loop Decision-Making | |
| Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles | 2023 | Open-loop Decision-Making, Motion Planning | |
| Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving | 2023 | Open-loop Control, Visual Spatial Reasoning | Github |
| DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models | 2023 | Closed-loop Decision-Making | |
| SurrealDriver: Designing Generative Driver Agent Simulation Framework in Urban Contexts based on Large Language Model | 2023 | Closed-loop Decision-Making | |
| Drive Like a Human: Rethinking Autonomous Driving with Large Language Models | 2023 | Closed-loop Decision-Making | Github |

End-to-End Autonomous Driving

| Method | Year | Task | Code Link |
|---|---|---|---|
| DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model | 2023 | Open-loop Control, Visual Question Answering | |
| ADAPT: Action-aware Driving Caption Transformer | 2023 | Open-loop Decision-Making, Visual Spatial Reasoning | Github |

Data Generation

| Method | Year | Task | Code Link |
|---|---|---|---|
| DriveGAN: Towards a Controllable High-Quality Neural Simulation | 2021 | Conditional Video Generation | Page |
| GAIA-1: A Generative World Model for Autonomous Driving | 2023 | Conditional Video Generation | Page |
| DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving | 2023 | Conditional Video Generation | Github |
| DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model | 2023 | Conditional Multi-view Video Generation | Github |
| BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout | 2023 | Conditional Image Generation | |

🌟 Large VLMs in Intelligent Transportation Systems

ITS Perception and Understanding

| Method | Year | Task | Code Link |
|---|---|---|---|
| A Multi-granularity Retrieval System for Natural Language-based Vehicle Retrieval | 2022 | Language-Guided Vehicle Retrieval | Page |
| Tracked-Vehicle Retrieval by Natural Language Descriptions With Multi-Contextual Adaptive Knowledge | 2023 | Language-Guided Vehicle Retrieval | Page |
| A Unified Multi-modal Structure for Retrieving Tracked Vehicles through Natural Language Descriptions | 2023 | Language-Guided Vehicle Retrieval | Page |
| Traffic-Domain Video Question Answering with Automatic Captioning | 2023 | Image Captioning, Visual Question Answering | |
| Causality-aware Visual Scene Discovery for Cross-Modal Question Reasoning | 2023 | Visual Question Answering | |
| Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer | 2023 | Visual Question Answering | Github |
| Delving into CLIP latent space for Video Anomaly Recognition | 2023 | Video Anomaly Recognition | Github |

ITS Management System

| Method | Year | Task | Code Link |
|---|---|---|---|
| LLM Powered Sim-to-real Transfer for Traffic Signal Control | 2023 | Traffic Signal Control | |

🌟 Dataset

Autonomous Driving Datasets

| Dataset | Year | Task | Data Link |
|---|---|---|---|
| Pedestrian Detection: A Benchmark | 2009 | 2D OD | Link |
| Vision meets robotics: The kitti dataset | 2012 | 2D/3D OD, SS, OT | Link |
| The Cityscapes Dataset for Semantic Urban Scene Understanding | 2016 | 2D/3D OD, SS | Link |
| Citypersons: A diverse dataset for pedestrian detection | 2017 | 2D OD | Link |
| SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences | 2019 | 3D SS | Link |
| Cityflow: A city-scale benchmark for multi-target multi-camera vehicle tracking and re-identification | 2019 | OT, ReID | Link |
| nuscenes: A multimodal dataset for autonomous driving | 2020 | 2D/3D OD, 2D/3D SS, OT, MP | Link |
| BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning | 2020 | 2D OD, 2D SS, OT | Link |
| Scalability in Perception for Autonomous Driving: Waymo Open Dataset | 2020 | 2D/3D OD, 2D/3D SS, OT | Link |

Language-Enhanced Autonomous Driving Datasets

| Dataset | Year | Task | Data Link |
|---|---|---|---|
| Textual explanations for self-driving vehicles | 2018 | Textual Explanation | Link |
| Object referring in videos with language and human gaze | 2018 | Object Detection | |
| Touchdown: Natural language navigation and spatial reasoning in visual street environments | 2019 | Visual-Spatial Reasoning, Vision-Language Navigation | Link |
| Talk to the vehicle: Language conditioned autonomous navigation of self driving cars | 2019 | Vision-Language Navigation | |
| Grounding human-to-vehicle advice for self-driving vehicles | 2019 | Human-to-Vehicle Advice | Link |
| Talk2car: Taking control of your self-driving car | 2020 | Single Object Referring | Link |
| Cityflow-nl: Tracking and retrieval of vehicles at city scale by natural language descriptions | 2021 | Vehicle Retrieval, Object Tracking | |
| Ground then navigate: Language-guided navigation in dynamic scenes | 2022 | Vision-Language Navigation | |
| Language prompt for autonomous driving | 2023 | Object Tracking | Link |
| NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario | 2023 | Visual Question Answering | Link |
| Referring multi-object tracking | 2023 | Object Tracking | Link |
| Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving | 2023 | Visual-Spatial Reasoning, Decision Making | Link |
| Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving | 2023 | Visual Question Answering | Link |
| DRAMA: Joint Risk Localization and Captioning in Driving | 2023 | Image Captioning, Visual Question Answering | Link |
| Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning | 2023 | Importance Ranking, Visual-Spatial Reasoning | |

Language-Enhanced Intelligent Transportation Systems Datasets

| Dataset | Year | Task | Data Link |
|---|---|---|---|
| Future Frame Prediction for Anomaly Detection – A New Baseline | 2018 | Anomaly Detection | Link |
| Real-world Anomaly Detection in Surveillance Videos | 2018 | Anomaly Detection | Link |
| Sutd-trafficqa: A question answering benchmark and an efficient network for video reasoning over traffic events | 2021 | Visual Question Answering | Link |
| AerialVLN: Vision-and-Language Navigation for UAVs | 2023 | Vision-Language Navigation | Link |

License

This repository is released under the Apache 2.0 license.
