📢 A collection of remote sensing multimodal large language model papers focusing on the vision-language domain.
School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University
In this repository, we collect and document researchers and their outstanding work on remote sensing multimodal large language models (vision-language).
- The list will be continuously updated 🔥🔥
- 📦 Coming soon!
- Papers
- Remote Sensing Vision-Language Dataset
- Related: Remote Sensing Vision-Language Foundation Models
## Papers

- 🔥 Feb-4-24: LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
arXiv 2024 (arXiv:2402.02544). D. Muhtar, Z. Li, F. Gu, X. Zhang, and P. Xiao. [Paper][Code]
- 🔥 Jan-30-24: EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
arXiv 2024 (arXiv:2401.16822). W. Zhang, M. Cai, T. Zhang, Y. Zhuang, and X. Mao. [Paper][Code: N/A]
- 🔥 Jan-18-24: SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
arXiv 2024 (arXiv:2401.09712). Y. Zhan, Z. Xiong, and Y. Yuan. [Paper][Code]
- 🔥 Nov-30-23: Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
arXiv 2023 (arXiv:2311.14656). J. Roberts, T. Lüddecke, R. Sheikh, K. Han, and S. Albanie. [Paper][Code]
- 🔥 Nov-28-23: GeoChat: Grounded Large Vision-Language Model for Remote Sensing
arXiv 2023 (arXiv:2311.15826). K. Kuckreja, M. S. Danish, M. Naseer, A. Das, S. Khan, and F. S. Khan. [Paper][Code]
- 🔥 Jul-28-23: RSGPT: A Remote Sensing Vision Language Model and Benchmark
arXiv 2023 (arXiv:2307.15266). Y. Hu, J. Yuan, and C. Wen. [Paper][Code]
## Remote Sensing Vision-Language Dataset

- 🔥 Feb-17-24: ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing
arXiv 2024 (arXiv:2402.11325). Z. Yuan, Z. Xiong, L. Mou, and X. X. Zhu. [Paper][Code: N/A]
- 🔥 Jan-2-24: RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
arXiv 2023 (arXiv:2306.11300). Z. Zhang, T. Zhao, Y. Guo, and J. Yin. [Paper][Code]
- 🔥 Dec-20-23: SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
AAAI 2024 (arXiv:2312.12856). Z. Wang, R. Prabha, T. Huang, J. Wu, and R. Rajagopal. [Paper][Code]
## Remote Sensing Vision-Language Foundation Models

- 🔥 Jan-2-24: RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
arXiv 2023 (arXiv:2306.11300). Z. Zhang, T. Zhao, Y. Guo, and J. Yin. [Paper][Code]
- 🔥 Dec-12-23: Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
arXiv 2023 (arXiv:2312.06960). U. Mall, C. P. Phoo, M. K. Liu, C. Vondrick, B. Hariharan, and K. Bala. [Paper][Code: N/A]
- 🔥 Aug-10-23: RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
arXiv 2023 (arXiv:2306.11029). F. Liu, D. Chen, Z. Guan, X. Zhou, J. Zhu, and J. Zhou. [Paper][Code]