ainisa20 / mplug-docowl

This project forked from x-plug/mplug-docowl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

License: Apache License 2.0

Shell 0.91% Python 99.09%

mplug-docowl's Introduction

The Powerful Multi-modal LLM Family for OCR-free Document Understanding

Alibaba Group

News

  • 🔥🔥🔥 [2024.4.3] We built demos of DocOwl1.5 on both ModelScope and HuggingFace 🤗, powered by DocOwl1.5-Omni. The source code for launching a local demo is also released in DocOwl1.5.
  • 🔥🔥 [2024.3.28] We release the training data (DocStruct4M, DocDownstream-1.0, DocReason25K), code, and models (DocOwl1.5-stage1, DocOwl1.5, DocOwl1.5-Chat, DocOwl1.5-Omni) of mPLUG-DocOwl 1.5 on both HuggingFace 🤗 and ModelScope.
  • 🔥 [2024.3.20] We release the arXiv paper of mPLUG-DocOwl 1.5, a SOTA 8B Multimodal LLM on OCR-free Document Understanding (DocVQA 82.2, InfoVQA 50.7, ChartQA 70.2, TextVQA 68.6).
  • [2024.01.13] Our Scientific Diagram Analysis dataset M-Paper is now available on both HuggingFace 🤗 and ModelScope, containing 447k high-resolution diagram images and the corresponding paragraph analysis.
  • [2023.10.13] The training data and models of mPLUG-DocOwl/UReader have been open-sourced.
  • [2023.10.10] Our paper UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model was accepted by EMNLP 2023.
  • [2023.07.10] The demo of mPLUG-DocOwl on ModelScope is available.
  • [2023.07.07] We release the technical report and evaluation set of mPLUG-DocOwl.

Models

  • mPLUG-DocOwl1.5 (Arxiv 2024) - mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

  • mPLUG-PaperOwl (Arxiv 2023) - mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

  • UReader (EMNLP 2023) - UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

  • mPLUG-DocOwl (Arxiv 2023) - mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Online Demo

Note: The HuggingFace demo is less stable than the ModelScope demo because GPUs in HuggingFace ZeroGPU Spaces are dynamically assigned.

ModelScope

HuggingFace

Cases

Related Projects

mplug-docowl's People

Contributors

hawlyq, lukeforeveryoung
