
pre-trained-vl-model's Introduction

Pretrained model summary


pretrained language model

| title | paper link | code link |
| --- | --- | --- |
| Improving Language Understanding by Generative Pre-Training | [paper] | [code(pytorch)] |
| ELMo: Deep contextualized word representations | [paper] | [code(tensorflow)] |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | [paper] | [code(tensorflow)] [code(pytorch)] |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | [paper] | [code(tensorflow)] [code(pytorch)] |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | [paper] | [code(pytorch)] |
| Language Models are Unsupervised Multitask Learners | [paper] | [code(tensorflow)] |
| Language Models are Few-Shot Learners | [paper] | [code] |
| XLNet: Generalized Autoregressive Pretraining for Language Understanding | [paper] | [code(tensorflow)] |

pretrained image model

| title | paper link | code link |
| --- | --- | --- |
| Identity Mappings in Deep Residual Networks | [paper] | [code] |
| Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks | [paper] | [code(pytorch)] |
| Mask R-CNN | [paper] | [code(tensorflow)] [code(pytorch)] |
| You Only Look Once: Unified, Real-Time Object Detection | [paper] | [code(tensorflow)] |
| YOLOv3: An Incremental Improvement | [paper] | [code(tensorflow)] [code(pytorch)] |
| YOLOv4: Optimal Speed and Accuracy of Object Detection | [paper] | [code(tensorflow)] |

pretrained video model

| title | paper link | code link |
| --- | --- | --- |
| Looking Fast and Slow: Memory-Guided Mobile Video Object Detection | [paper] | [code(tensorflow)] |
| Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection | [paper] | [code(tensorflow)] |
| Optimizing Video Object Detection via a Scale-Time Lattice | [paper] | [code(pytorch)] |
| Mobile Video Object Detection with Temporally-Aware Feature Maps | [paper] | [code(pytorch)] |
| X3D: Expanding Architectures for Efficient Video Recognition | [paper] | [code(pytorch)] |

pretrained image and language model

summary table

image

paper and code

| title | paper link | code link |
| --- | --- | --- |
| ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | [paper] | [code(pytorch)] |
| 12-in-1: Multi-Task Vision and Language Representation Learning | [paper] | [code(pytorch)] |
| LXMERT: Learning Cross-Modality Encoder Representations from Transformers | [paper] | [code(pytorch)] |
| VisualBERT: A Simple and Performant Baseline for Vision and Language | [paper] | [code(pytorch)] |
| VL-BERT: Pre-training of Generic Visual-Linguistic Representations | [paper] | [code(pytorch)] |
| UNITER: Learning Universal Image-Text Representations | [paper] | [code(pytorch)] |
| Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training | [paper] | [code(pytorch)] |
| Large-Scale Adversarial Training for Vision-and-Language Representation Learning | [paper] | [code(pytorch)] |
| Fusion of Detected Objects in Text for Visual Question Answering | [paper] | [code(tensorflow)] |
| ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | [paper] | [code] |
| X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers | [paper] | [code] |

pretrained video and language model

| title | paper link | code link |
| --- | --- | --- |
| VideoBERT: A Joint Model for Video and Language Representation Learning | [paper] | [code] |
| UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation | [paper] | [code] |
| Multi-modal Circulant Fusion for Video-to-Language and Backward | [paper] | [code] |
| Video-Grounded Dialogues with Pretrained Generation Language Models | [paper] | [code] |
| Deep Extreme Cut: From Extreme Points to Object Segmentation | [paper] | [code(pytorch)] |
| Integrating Multimodal Information in Large Pretrained Transformers | [paper] | [code(pytorch)] |
| Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text | [paper] | [code(caffe)] |

pretrained knowledge and language model

| title | paper link | code link |
| --- | --- | --- |
| Knowledge Enhanced Contextual Word Representations | [paper] | [code(pytorch)] |
| Why Do Masked Neural Language Models Still Need Commonsense Repositories to Handle Semantic Variations in Question Answering? | [paper] | [code] |
| SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge | [paper] | [code] |
| Acquiring Knowledge from Pre-trained Model to Neural Machine Translation | [paper] | [code] |
| Knowledge-Aware Language Model Pretraining | [paper] | [code] |
| Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model | [paper] | [code] |

pre-trained-vl-model's People

Contributors

jaeyun95
