Yang Zhan 👋

I am currently pursuing a Ph.D. degree at the School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an, China.

🏆My research interests

Vision and Language, Large Language Model, Multimodal Machine Learning, AI for Remote Sensing, and Data Mining.

💬Projects

📢News

🔥 [……]:

🔥 [2024]: Our remote sensing multimodal large language model is an ongoing project, and we will continue to improve it.

🔥 [2024/1]: SkyEyeGPT is now available on arXiv.

  • This work explores a remote sensing multimodal large language model (vision-language). We meticulously curate SkyEye-968k, a high-quality remote sensing multimodal instruction-tuning dataset that includes single-task and multi-task conversation instructions. We develop SkyEyeGPT, which unifies remote sensing vision-language tasks and breaks new ground in the unified modeling of remote sensing vision and LLMs. Experiments on 8 datasets for remote sensing vision-language tasks demonstrate SkyEyeGPT’s superiority in image-level and region-level tasks. Notably, it shows encouraging results in some tests compared with GPT-4V.

🔥 [2024/1]: A curated list of Remote Sensing Multimodal Large Language Model (Vision-Language) resources has been created.

🔥 [2023/12]: Proposed the Mono3DVG task and constructed the Mono3DRefer dataset (accepted by AAAI 2024)!

  • For intelligent systems and robots, understanding objects based on language expressions in real 3D scenes is an important capability for human-machine interaction. However, existing 2D visual grounding cannot capture the true 3D extent of the referred objects. 3D visual grounding requires LiDAR or RGB-D sensors, whose high cost and device constraints greatly limit its application scenarios. Monocular 3D object detection is low-cost and widely applicable, but it cannot localize specific objects. We introduce a novel task of 3D visual grounding in monocular RGB images using language descriptions with appearance and geometry information. We create Mono3DRefer, which is the first dataset that leverages ChatGPT to generate descriptions. We believe Mono3DVG can be widely applied since it does not require strict conditions such as RGB-D sensors, LiDARs, or industrial cameras. Potential application scenarios include drones, surveillance systems, intelligent vehicles, robots, and other devices equipped with cameras.

🔥 [2023/08]: Proposed a novel PE-RSITR task and provided empirical studies (accepted by T-GRS)!

  • This work explores parameter-efficient transfer learning for remote sensing image-text retrieval. Our proposed MRS-Adapter reduces the number of fine-tuned parameters by 98.9%, and its performance exceeds traditional methods by 7% to 13%.

🔥 [2023/02]: Proposed the RSVG task and constructed the DIOR-RSVG dataset (accepted by T-GRS)!

  • This work explores visual grounding in the remote sensing domain. DIOR-RSVG uses the DIOR dataset as its data source and is built using an automatic generation algorithm with manual verification. A novel transformer-based MGVLF model is devised to address the cluttered backgrounds and scale variation of RS images.

🔥 [2022/08]: Proposed STMGCN for vessel traffic flow prediction (accepted by T-ITS)!

  • This work explores a multi-graph convolutional network for vessel traffic flow prediction. Because water traffic differs from land traffic, we propose a big data-driven maritime traffic network extraction algorithm to construct a "road network". We then design STMGCN to make full use of maritime graphs and multi-graph learning (including a distance graph, an interaction graph, and a correlation graph).

🔥 [2021/08]: Proposed MVFFNet for imbalanced ship classification (accepted by PRLetters)!

🌱 Academic Services

  • Journal Reviewer: IEEE Transactions on Geoscience and Remote Sensing (T-GRS), Neural Networks (NEUNET), IEEE Geoscience and Remote Sensing Letters (IEEE GRSL)

📫 Contact

Email: [email protected]

⚡ Fact

ZhanYang's Projects

mono3dvg

[AAAI 2024] Mono3DVG: 3D Visual Grounding in Monocular Images, AAAI, 2024

pe-rsitr

Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval, 2023

rsvg-pytorch

RSVG: Exploring Data and Model for Visual Grounding on Remote Sensing Data, 2022

skyeyegpt

SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
