ainisa20 / mplug-docowl

This project forked from x-plug/mplug-docowl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

License: Apache License 2.0

Shell 0.91% Python 99.09%

mplug-docowl's Introduction

The Powerful Multi-modal LLM Family
for OCR-free Document Understanding

Alibaba Group

News

🔥🔥🔥 [2024.4.3] We have built demos of DocOwl1.5 on both ModelScope and HuggingFace 🤗, powered by DocOwl1.5-Omni. The source code for launching a local demo is also released in DocOwl1.5.
🔥🔥 [2024.3.28] We release the training data (DocStruct4M, DocDownstream-1.0, DocReason25K), code, and models (DocOwl1.5-stage1, DocOwl1.5, DocOwl1.5-Chat, DocOwl1.5-Omni) of mPLUG-DocOwl 1.5 on both HuggingFace 🤗 and ModelScope.
🔥 [2024.3.20] We release the arXiv paper of mPLUG-DocOwl 1.5, a SOTA 8B Multimodal LLM for OCR-free Document Understanding (DocVQA 82.2, InfoVQA 50.7, ChartQA 70.2, TextVQA 68.6).
[2024.01.13] Our Scientific Diagram Analysis dataset M-Paper is now available on both HuggingFace 🤗 and ModelScope, containing 447k high-resolution diagram images and corresponding paragraph analysis.
[2023.10.13] The training data and models of mPLUG-DocOwl/UReader have been open-sourced.
[2023.10.10] Our paper UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model has been accepted by EMNLP 2023.

[2023.07.10] The demo of mPLUG-DocOwl on ModelScope is available.
[2023.07.07] We release the technical report and evaluation set of mPLUG-DocOwl.

Models

mPLUG-DocOwl1.5 (arXiv 2024) - mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
mPLUG-PaperOwl (arXiv 2023) - mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model
UReader (EMNLP 2023) - UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
mPLUG-DocOwl (arXiv 2023) - mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Online Demo

Note: The HuggingFace demo is less stable than the ModelScope demo because GPUs in HuggingFace ZeroGPU Spaces are dynamically assigned.

ModelScope

HuggingFace

Cases

Related Projects
