xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!

Home Page: https://huggingface.co/docs/transformers.js

License: Apache License 2.0

JavaScript 91.26% Python 8.74%
browser javascript transformers webml

transformers.js's Introduction


transformers.js javascript library logo


State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!

Transformers.js is designed to be functionally equivalent to Hugging Face's transformers python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as:

  • 📝 Natural Language Processing: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
  • 🖼️ Computer Vision: image classification, object detection, and segmentation.
  • 🗣️ Audio: automatic speech recognition and audio classification.
  • 🙏 Multimodal: zero-shot image classification.

Transformers.js uses ONNX Runtime to run models in the browser. The best part is that you can easily convert your pretrained PyTorch, TensorFlow, or JAX models to ONNX using 🤗 Optimum.

For more information, check out the full documentation.

Quick tour

It's super simple to translate from existing code! Just like the python library, we support the pipeline API. Pipelines group together a pretrained model with preprocessing of inputs and postprocessing of outputs, making it the easiest way to run models with the library.

Python (original):

from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
pipe = pipeline('sentiment-analysis')

out = pipe('I love transformers!')
# [{'label': 'POSITIVE', 'score': 0.999806941}]

JavaScript (ours):

import { pipeline } from '@xenova/transformers';

// Allocate a pipeline for sentiment-analysis
let pipe = await pipeline('sentiment-analysis');

let out = await pipe('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.999817686}]

You can also use a different model by specifying the model id or path as the second argument to the pipeline function. For example:

// Use a different model for sentiment-analysis
let pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');

Installation

To install via NPM, run:

npm i @xenova/transformers

Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using ES Modules, you can import the library with:

<script type="module">
    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers';
</script>

Examples

Want to jump straight in? Get started with one of our sample applications/templates:

  • Whisper Web: Speech recognition w/ Whisper (code, demo)
  • Doodle Dash: Real-time sketch-recognition game (blog, code, demo)
  • Code Playground: In-browser code completion website (code, demo)
  • Semantic Image Search (client-side): Search for images with text (code, demo)
  • Semantic Image Search (server-side): Search for images with text, using Supabase (code, demo)
  • Vanilla JavaScript: In-browser object detection (video, code, demo)
  • React: Multilingual translation website (code, demo)
  • Text to speech (client-side): In-browser speech synthesis (code, demo)
  • Browser extension: Text classification extension (code)
  • Electron: Text classification application (code)
  • Next.js (client-side): Sentiment analysis with in-browser inference (code, demo)
  • Next.js (server-side): Sentiment analysis with Node.js inference (code, demo)
  • Node.js: Sentiment analysis API (code)
  • Demo site: A collection of demos (code, demo)

Check out the Transformers.js template on Hugging Face to get started in one click!

Custom usage

By default, Transformers.js uses hosted pretrained models and precompiled WASM binaries, which should work out-of-the-box. You can customize this as follows:

Settings

import { env } from '@xenova/transformers';

// Specify a custom location for models (defaults to '/models/').
env.localModelPath = '/path/to/models/';

// Disable the loading of remote models from the Hugging Face Hub:
env.allowRemoteModels = false;

// Set location of .wasm files. Defaults to use a CDN.
env.backends.onnx.wasm.wasmPaths = '/path/to/files/';

For a full list of available settings, check out the API Reference.

Convert your models to ONNX

We recommend using our conversion script to convert your PyTorch, TensorFlow, or JAX models to ONNX in a single command. Behind the scenes, it uses 🤗 Optimum to perform conversion and quantization of your model.

python -m scripts.convert --quantize --model_id <model_name_or_path>

For example, convert and quantize bert-base-uncased using:

python -m scripts.convert --quantize --model_id bert-base-uncased

This will save the following files to ./models/:

bert-base-uncased/
├── config.json
├── tokenizer.json
├── tokenizer_config.json
└── onnx/
    ├── model.onnx
    └── model_quantized.onnx

For the full list of supported architectures, see the Optimum documentation.
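
To load the converted files with Transformers.js, you can combine the settings shown earlier with the usual pipeline call. A minimal sketch, assuming the output above was saved under ./models/ and that feature-extraction is a suitable task for the converted model:

import { pipeline, env } from '@xenova/transformers';

// Resolve model ids against the local ./models/ directory produced by the conversion script,
// and skip the Hugging Face Hub entirely.
env.localModelPath = './models/';
env.allowRemoteModels = false;

// 'bert-base-uncased' now resolves to ./models/bert-base-uncased/ (config, tokenizer, onnx/).
let extractor = await pipeline('feature-extraction', 'bert-base-uncased');
let output = await extractor('Hello world!');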

Supported tasks/models

Here is the list of all tasks and architectures currently supported by Transformers.js. If you don't see your task/model listed here or it is not yet supported, feel free to open up a feature request here.

To find compatible models on the Hub, select the "transformers.js" library tag in the filter menu (or visit this link). You can refine your search by selecting the task you're interested in (e.g., text-classification).

Tasks

Natural Language Processing

Task ID Description Supported?
Fill-Mask fill-mask Masking some of the words in a sentence and predicting which words should replace those masks. ✅ (docs) (models)
Question Answering question-answering Retrieve the answer to a question from a given text. ✅ (docs) (models)
Sentence Similarity sentence-similarity Determining how similar two texts are. ✅ (docs) (models)
Summarization summarization Producing a shorter version of a document while preserving its important information. ✅ (docs) (models)
Table Question Answering table-question-answering Answering a question about information from a given table. ❌
Text Classification text-classification or sentiment-analysis Assigning a label or class to a given text. ✅ (docs) (models)
Text Generation text-generation Producing new text by predicting the next word in a sequence. ✅ (docs) (models)
Text-to-text Generation text2text-generation Converting one text sequence into another text sequence. ✅ (docs) (models)
Token Classification token-classification or ner Assigning a label to each token in a text. ✅ (docs) (models)
Translation translation Converting text from one language to another. ✅ (docs) (models)
Zero-Shot Classification zero-shot-classification Classifying text into classes that are unseen during training. ✅ (docs) (models)
Feature Extraction feature-extraction Transforming raw data into numerical features that can be processed while preserving the information in the original dataset. ✅ (docs) (models)

Vision

Task ID Description Supported?
Depth Estimation depth-estimation Predicting the depth of objects present in an image. ✅ (docs) (models)
Image Classification image-classification Assigning a label or class to an entire image. ✅ (docs) (models)
Image Segmentation image-segmentation Divides an image into segments where each pixel is mapped to an object. This task has multiple variants such as instance segmentation, panoptic segmentation and semantic segmentation. ✅ (docs) (models)
Image-to-Image image-to-image Transforming a source image to match the characteristics of a target image or a target image domain. ✅ (docs) (models)
Mask Generation mask-generation Generate masks for the objects in an image. ❌
Object Detection object-detection Identify objects of certain defined classes within an image. ✅ (docs) (models)
Video Classification n/a Assigning a label or class to an entire video. ❌
Unconditional Image Generation n/a Generating images with no condition in any context (like a prompt text or another image). ❌
Image Feature Extraction image-feature-extraction Transforming raw data into numerical features that can be processed while preserving the information in the original image. ✅ (docs) (models)

Audio

Task ID Description Supported?
Audio Classification audio-classification Assigning a label or class to a given audio. ✅ (docs) (models)
Audio-to-Audio n/a Generating audio from an input audio source. ❌
Automatic Speech Recognition automatic-speech-recognition Transcribing a given audio into text. ✅ (docs) (models)
Text-to-Speech text-to-speech or text-to-audio Generating natural-sounding speech given text input. ✅ (docs) (models)

Tabular

Task ID Description Supported?
Tabular Classification n/a Classifying a target category (a group) based on set of attributes. ❌
Tabular Regression n/a Predicting a numerical value given a set of attributes. ❌

Multimodal

Task ID Description Supported?
Document Question Answering document-question-answering Answering questions on document images. ✅ (docs) (models)
Image-to-Text image-to-text Output text from a given image. ✅ (docs) (models)
Text-to-Image text-to-image Generates images from input text. ❌
Visual Question Answering visual-question-answering Answering open-ended questions based on an image. ❌
Zero-Shot Audio Classification zero-shot-audio-classification Classifying audios into classes that are unseen during training. ✅ (docs) (models)
Zero-Shot Image Classification zero-shot-image-classification Classifying images into classes that are unseen during training. ✅ (docs) (models)
Zero-Shot Object Detection zero-shot-object-detection Identify objects of classes that are unseen during training. ✅ (docs) (models)

Reinforcement Learning

Task ID Description Supported?
Reinforcement Learning n/a Learning from actions by interacting with an environment through trial and error and receiving rewards (negative or positive) as feedback. ❌

Models

  1. ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
  2. Audio Spectrogram Transformer (from MIT) released with the paper AST: Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung, James Glass.
  3. BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
  4. BEiT (from Microsoft) released with the paper BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.
  5. BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
  6. Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
  7. BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
  8. BLOOM (from BigScience workshop) released by the BigScience Workshop.
  9. CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
  10. Chinese-CLIP (from OFA-Sys) released with the paper Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.
  11. CLAP (from LAION-AI) released with the paper Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
  12. CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
  13. CLIPSeg (from University of Göttingen) released with the paper Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker.
  14. CodeGen (from Salesforce) released with the paper A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
  15. CodeLlama (from MetaAI) released with the paper Code Llama: Open Foundation Models for Code by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
  16. ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
  17. ConvNeXT (from Facebook AI) released with the paper A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
  18. ConvNeXTV2 (from Facebook AI) released with the paper ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
  19. DeBERTa (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
  20. DeBERTa-v2 (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
  21. DeiT (from Facebook) released with the paper Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
  22. Depth Anything (from University of Hong Kong and TikTok) released with the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
  23. DETR (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
  24. DINOv2 (from Meta AI) released with the paper DINOv2: Learning Robust Visual Features without Supervision by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski.
  25. DistilBERT (from HuggingFace), released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.
  26. DiT (from Microsoft Research) released with the paper DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
  27. Donut (from NAVER), released together with the paper OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
  28. DPT (from Intel Labs) released with the paper Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
  29. EfficientNet (from Google Brain) released with the paper EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks by Mingxing Tan, Quoc V. Le.
  30. ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
  31. ESM (from Meta AI) are transformer protein language models. ESM-1b was released with the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. ESM-1v was released with the paper Language models enable zero-shot prediction of the effects of mutations on protein function by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. ESM-2 and ESMFold were released with the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
  32. Falcon (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.
  33. FLAN-T5 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
  34. GLPN (from KAIST) released with the paper Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
  35. GPT Neo (from EleutherAI) released in the repository EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
  36. GPT NeoX (from EleutherAI) released with the paper GPT-NeoX-20B: An Open-Source Autoregressive Language Model by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
  37. GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
  38. GPT-J (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
  39. GPTBigCode (from BigCode) released with the paper SantaCoder: don't reach for the stars! by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
  40. HerBERT (from Allegro.pl, AGH University of Science and Technology) released with the paper KLEJ: Comprehensive Benchmark for Polish Language Understanding by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.
  41. Hubert (from Facebook) released with the paper HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
  42. LongT5 (from Google AI) released with the paper LongT5: Efficient Text-To-Text Transformer for Long Sequences by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
  43. LLaMA (from The FAIR team of Meta AI) released with the paper LLaMA: Open and Efficient Foundation Language Models by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
  44. Llama2 (from The FAIR team of Meta AI) released with the paper Llama2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.
  45. M2M100 (from Facebook) released with the paper Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
  46. MarianMT Machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
  47. mBART (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
  48. mBART-50 (from Facebook) released with the paper Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
  49. Mistral (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
  50. MMS (from Facebook) released with the paper Scaling Speech Technology to 1,000+ Languages by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
  51. MobileBERT (from CMU/Google Brain) released with the paper MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
  52. MobileViT (from Apple) released with the paper MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari.
  53. MobileViTV2 (from Apple) released with the paper Separable Self-attention for Mobile Vision Transformers by Sachin Mehta and Mohammad Rastegari.
  54. MPNet (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
  55. MPT (from MosaicML) released with the repository llm-foundry by the MosaicML NLP Team.
  56. MT5 (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
  57. NLLB (from Meta) released with the paper No Language Left Behind: Scaling Human-Centered Machine Translation by the NLLB team.
  58. Nougat (from Meta AI) released with the paper Nougat: Neural Optical Understanding for Academic Documents by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic.
  59. OPT (from Meta AI) released with the paper OPT: Open Pre-trained Transformer Language Models by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
  60. OWL-ViT (from Google AI) released with the paper Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
  61. OWLv2 (from Google AI) released with the paper Scaling Open-Vocabulary Object Detection by Matthias Minderer, Alexey Gritsenko, Neil Houlsby.
  62. Phi (from Microsoft) released with the papers - Textbooks Are All You Need by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, Textbooks Are All You Need II: phi-1.5 technical report by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
  63. Qwen2 (from the Qwen team, Alibaba Group) released with the paper Qwen Technical Report by Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou and Tianhang Zhu.
  64. ResNet (from Microsoft Research) released with the paper Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
  65. RoBERTa (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
  66. RoFormer (from ZhuiyiTechnology), released together with the paper RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
  67. SegFormer (from NVIDIA) released with the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
  68. Segment Anything (from Meta AI) released with the paper Segment Anything by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
  69. SigLIP (from Google AI) released with the paper Sigmoid Loss for Language Image Pre-Training by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer.
  70. SpeechT5 (from Microsoft Research) released with the paper SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
  71. SqueezeBERT (from Berkeley) released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
  72. StableLm (from Stability AI) released with the paper StableLM 3B 4E1T (Technical Report) by Jonathan Tow, Marco Bellagente, Dakota Mahan, Carlos Riquelme Ruiz, Duy Phung, Maksym Zhuravinskyi, Nathan Cooper, Nikhil Pinnaparaju, Reshinth Adithyan, and James Baicoianu.
  73. Starcoder2 (from BigCode team) released with the paper StarCoder 2 and The Stack v2: The Next Generation by Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, and Harm de Vries.
  74. Swin Transformer (from Microsoft) released with the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
  75. Swin2SR (from University of Würzburg) released with the paper Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
  76. T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
  77. T5v1.1 (from Google AI) released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
  78. Table Transformer (from Microsoft Research) released with the paper PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents by Brandon Smock, Rohith Pesala, Robin Abraham.
  79. TrOCR (from Microsoft), released together with the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
  80. UniSpeech (from Microsoft Research) released with the paper UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
  81. UniSpeechSat (from Microsoft Research) released with the paper UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
  82. Vision Transformer (ViT) (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
  83. ViTMatte (from HUST-VL) released with the paper ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang.
  84. VITS (from Kakao Enterprise) released with the paper Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech by Jaehyeon Kim, Jungil Kong, Juhee Son.
  85. Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
  86. Wav2Vec2-BERT (from Meta AI) released with the paper Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team.
  87. WavLM (from Microsoft Research) released with the paper WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
  88. Whisper (from OpenAI) released with the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.
  89. XLM (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
  90. XLM-RoBERTa (from Facebook AI), released together with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmรกn, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
  91. YOLOS (from Huazhong University of Science & Technology) released with the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.

transformers.js's People

Contributors

celsodias12, chelouche9, chrislee973, d4ve-r, davidgortega, dependabot[bot], do-me, ekolve, felladrin, jonathanpv, josephrocca, julien-c, kit-p, kungfooman, lian1230, lsb, mishig25, perborgen, pulsejet, pushpendersaini0, rubiagatra, samlhuillier, xenova


transformers.js's Issues

[Feature request] whisper word level timestamps

I am new to both transformers.js and whisper, so I am sorry for a lame question in advance.

I am trying to make return_timestamps parameter work...

I managed to customize script.js from the transformers.js demo locally and added data.generation.return_timestamps = "char"; around line ~447 inside the GENERATE_BUTTON click handler in order to pass the parameter. With that change in place, I can see timestamps appear as chunks (the result var in worker.js):

{
    "text": " And so my fellow Americans ask not what your country can do for you ask what you can do for your country.",
    "chunks": [
        {
            "timestamp": [0,8],
            "text": " And so my fellow Americans ask not what your country can do for you"
        },
        {
            "timestamp": [8,11],
            "text": " ask what you can do for your country."
        }
    ]
}

However, the chunks are not "char level" granular, as I expected from the return_timestamps docs.

I am looking for ideas on how to achieve char/word-level timestamp granularity with transformers.js and Whisper. Do some models/tools need to be updated and/or rebuilt?
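
For context, this is roughly the call I end up making; the options object is my assumption about how the parameter gets forwarded (a sketch only, not a confirmed API, and the audio path is a placeholder):

import { pipeline } from '@xenova/transformers';

// Assumption: the ASR pipeline forwards return_timestamps to the underlying generate() call.
let transcriber = await pipeline('automatic-speech-recognition');
let output = await transcriber('audio.wav', { return_timestamps: 'char' });
console.log(output.chunks); // currently sentence-level chunks, not char/word-level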

Converting to embeddings

Hello team,

How do I simply output tokens / embeddings from a model like "bert-base-multilingual-cased" using this library?

Thanks.
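
A minimal sketch of one way to try this, based on the 'embeddings' pipeline used in other issues here (whether this is the intended way to get embeddings from bert-base-multilingual-cased is exactly the question):

const { pipeline } = require("@xenova/transformers");

(async () => {
    // Assumption: a converted ONNX version of the model is available under this id.
    let embedder = await pipeline('embeddings', 'bert-base-multilingual-cased');
    let output = await embedder(['How do I get embeddings out of this model?']);
    console.log(output[0]); // one embedding vector per input sentence
})();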

Simple demo on jsfiddle fails

Dear @xenova ,

thank you for this repo! It is a pity that Microsoft did nothing similar for their own ONNX web runtime, and I am grateful that you've done it.

I tried to make the simplest demo at all in JSfiddle (and codepen).
After adding the jsdelivr source, just added the following js:

import { pipeline } from "/@xenova/transformers";

// Allocate a pipeline for sentiment-analysis
let pipe = await pipeline('embeddings', 'sentence-transformers/all-MiniLM-L6-v2');

let out = await pipe('I love transformers!');
console.log(out);

but get no output in the console.

Expected: is the embedding of the text.

Adding the leading "/" was needed, due to this error:
"Uncaught TypeError: Failed to resolve module specifier "@xenova/transformers". Relative references must start with either "/", "./", or "../"."
Could that be the source of the problem?

JSFiddle is here.

thank you for your help!
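
For comparison, the README's CDN instructions use the full jsDelivr URL as the module specifier rather than a bare package name; a minimal sketch of that variant (same model id as above):

<script type="module">
    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers';

    // Same calls as above, but imported via the full CDN URL instead of a bare specifier.
    let pipe = await pipeline('embeddings', 'sentence-transformers/all-MiniLM-L6-v2');
    let out = await pipe('I love transformers!');
    console.log(out);
</script>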

Default summarizer doesn't work

Just wanted to mention that I tried pipeline('summarization') without a second parameter and it didn't work (default is sshleifer/distilbart-cnn-12-6)

it did work when I provided the model that is specified in the test - sshleifer/distilbart-cnn-6-6.

I saw that 12-6 is an existing model on huggingface, is there a process that makes a model compatible? I'm sorry if these newbie questions are answered somewhere? I saw the section on how to convert a model to onnx... was it done for all the models manually?

Another question is how to integrate a model that works slightly differently, for example this model that is oriented towards Q&A?

Thanks
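
For reference, this is the call that worked for me, spelled out (the model id is the one from the tests):

import { pipeline } from '@xenova/transformers';

// Works when the model from the tests is specified explicitly.
let summarizer = await pipeline('summarization', 'sshleifer/distilbart-cnn-6-6');
let out = await summarizer('Some long article text ...');
console.log(out);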

[Bug] Percent X and transcripts get stuck with console error

Describe the bug
A clear and concise description of what the bug is.

How to reproduce
Steps or a minimal working example to reproduce the behavior
Hindi audio: transcription gets stuck at X%.

Expected behavior
A clear and concise description of what you expected to happen.

It should generate subtitles

Logs/screenshots
If applicable, add logs/screenshots to help explain your problem.
Screenshot attached, app name is screenrun.app
IMG_1838
IMG_1836

Environment

  • Transformers.js version:
  • Browser (if applicable):
  • Operating system (if applicable):
  • Other:

Additional context
Add any other context about the problem here.

TypeError: Cannot convert undefined to a BigInt

I am running the flan-t5 model for text2text-generation (e.g. await pipeline("text2text-generation", "flan-t5-base");) in a service worker. Seeing this error:

TypeError: Cannot convert undefined to a BigInt
    at BigInt (<anonymous>)
    at eval (models.js?a626:289:1)
    at Array.map (<anonymous>)
    at Function.toI64Tensor (models.js?a626:289:1)
    at Function.prepare_inputs (models.js?a626:303:1)
    at Function.forward (models.js?a626:557:1)
    at Function.runBeam (models.js?a626:544:1)
    at Function.generate_single (models.js?a626:366:1)
    at eval (models.js?a626:326:1)
    at Array.map (<anonymous>)

[Feature request] Add text-to-speech with SpeechT5

Name of the feature
Text-to-speech using SpeechT5, which was recently added to Transformers.

Reason for request
The browser's default TTS API is quite bad if you want to create an experience that works nicely across all browsers. Firefox's voices in particular are extremely robotic. Some applications require that the voice is consistent, and of a particular style/tone/etc. SpeechT5 allows you to create 512-dim speaker embeddings so you can use an arbitrary voice style.

Additional context

  • The model runs in realtime on the CPU (PyTorch), so with WebGPU we should easily have realtime generation on the web.
  • According to the above-linked models repo, the models are 600M (T5) and 300M (Hi-Fi-GAN), but I've just tried running it locally with the new docker integration on Hugging Face and it downloads a 585M model and a 50M model. So I'm not sure what's going on with the GAN size difference. Maybe they have quantized the GAN, but not T5? Hoping that the T5 model can be quantized, because that would move it from "reasonable" to "good" territory in terms of size. I'm assuming that it's currently in 16-bit format.

Example clip from the Spaces demo (this embedding is pretty monotone):

tmptgsysvc8.webm
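
To make the request concrete, here is a hypothetical sketch of what the API could look like; the task name, model id, and speaker_embeddings option are assumptions, not an existing API:

import { pipeline } from '@xenova/transformers';

// Hypothetical text-to-speech pipeline with a 512-dim speaker embedding supplied as an option.
let synthesizer = await pipeline('text-to-speech', 'microsoft/speecht5_tts');
let audio = await synthesizer('Hello, my dog is cute.', {
    speaker_embeddings: 'https://example.com/speaker_embedding.bin', // placeholder URL
});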

[Model request] Facebook/BlenderBot for text-generation

Hello,
This project is really cool. I was wondering if it's possible to use the BlenderBot model.
My current code gives me an error:

let pipe = await pipeline('text-generation', model = 'facebook/blenderbot_small-90M')
let out = await pipe("Prompt");
console.log(out);
GET https://huggingface.co/Xenova/transformers.js/resolve/main/quantized/facebook/blenderbot_small-90M/causal-lm-with-past/tokenizer.json 404
Uncaught (in promise) Error: File not found. Could not locate "https://huggingface.co/Xenova/transformers.js/resolve/main/quantized/facebook/blenderbot_small-90M/causal-lm-with-past/tokenizer.json".

I saw that this was added in your Hugging Face repository, but I was wondering what the correct way to implement this is.
Thank you!

[Feature request] nodejs caching

Hi, thank you for your work.

I'm a Node.js user and I read that there is no model cache implementation right now, and that you are working on it.

Do you have an idea of when you will be able to push a release with a cache implementation?

Just asking because I was about to code it on my side.
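
For the sake of discussion, this is roughly what I had in mind implementing on my side; the env property is an assumption about what a filesystem cache setting could look like, not an existing API:

import { env, pipeline } from '@xenova/transformers';

// Hypothetical: persist downloaded model files to disk so later runs skip the download.
env.cacheDir = './.cache';  // assumed setting name, not (yet) part of the library

let pipe = await pipeline('sentiment-analysis');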

Error: Unable to create tensor

Getting this error when running the embeddings pipeline:

Error: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=true' and 'truncation=true' to have batched tensors with the same length.
    at Function._call (tokenizers.js?dd68:928:1)
    at Function.closure [as tokenizer] (utils.js?aa0e:362:44)
    at Function._call (pipelines.js?5d72:46:1)
    at Function._call (pipelines.js?5d72:263:1)
    at closure (utils.js?aa0e:362:44)
    at _callee$ (utils.ts?2b07:20:28)
    at tryCatch (runtime.js?7efe:45:16)
    at Generator.invoke [as _invoke] (runtime.js?7efe:274:1)
    at prototype.<computed> [as next] (runtime.js?7efe:97:1)
    at asyncGeneratorStep (utils.ts?2b07:1:53)

I read the source code and saw that padding and truncation are set to true by default, so not sure how this is happening. Any ideas?
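
One way to narrow this down might be to call the tokenizer directly with padding/truncation passed explicitly; a minimal sketch (the model id is a placeholder, and AutoTokenizer usage is assumed from the source referenced above):

import { AutoTokenizer } from '@xenova/transformers';

// Tokenize a batch directly, forcing padding/truncation so every sequence has the same length.
let tokenizer = await AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2');
let inputs = tokenizer(['a short text', 'a much longer piece of text that may need truncation'], {
    padding: true,
    truncation: true,
});
console.log(inputs.input_ids.dims); // e.g. [2, max_length]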

[Feature request] Typescript Support

Hi,

I love this library and I use it a lot, thank you for the effort.
Is there a plan to add types to this package? It could help us understand how to use and configure the different pipelines.
Is there other documentation? The Hugging Face docs are more general and less specific to this library.

Next.js Support

Hi @xenova,

Thanks again for all the hard work and this amazing library.
I am trying to make it work on Next.js. Everything works fine but when I build (static site generation) and I run the website, I get weird errors. I think they might be related to webpack.

This is the error:
Uncaught SyntaxError: Label 'r' has already been declared

I am using next.js 13.2.4

I have seen you made some commits to fix Jimp on Next.js; do you have an example of how to use it?

Error: invalid input 'encoder_attention_mask'

I'm trying to run a slightly larger model (https://huggingface.co/facebook/bart-large-cnn). It's a Bart model, and I converted it to .onnx successfully using your script. Size comes out to ~1gb.

I get this error when running the summarization and text2text generation pipelines:

Error: invalid input 'encoder_attention_mask'
    at eval (ort-web.min.js?d8cf:6:446899)
    at Array.forEach (<anonymous>)
    at e.OnnxruntimeWebAssemblySessionHandler.run (ort-web.min.js?d8cf:6:446819)
    at InferenceSession.run (inference-session-impl.js?f23d:91:1)
    at sessionRun (models.js?a626:34:1)
    at seq2seq_forward (models.js?a626:111:1)
    at async Function.forward (models.js?a626:971:1)
    at async seq2seqRunBeam (models.js?a626:168:1)
    at async Function.runBeam (models.js?a626:964:1)
    at async Function.generate (models.js?a626:562:1)

[Feature request] Split out the demo site from the library

The site source code should be removed from the library.
There are several solutions:

  • Make this repo a monorepo: I would use Lerna to keep and maintain the examples and the library all together with many packages:
    • transformers
    • demo
  • Split out the demo in another repo

convert.py script exits with message `Killed`

When running the following:
python3.9 ./transformers.js/scripts/convert.py --model_id google/flan-ul2 --from_hub --quantize --task seq2seq-lm-with-past

I receive the following:

Downloading (…)lve/main/config.json: 100%|██████████| 784/784 [00:00<00:00, 86.3kB/s]
Downloading (…)model.bin.index.json: 100%|██████████| 67.5k/67.5k [00:00<00:00, 3.05MB/s]
Downloading (…)l-00001-of-00008.bin: 100%|██████████| 4.69G/4.69G [00:54<00:00, 85.5MB/s]
Downloading (…)l-00002-of-00008.bin: 100%|██████████| 4.97G/4.97G [01:02<00:00, 79.1MB/s]
Downloading (…)l-00003-of-00008.bin: 100%|██████████| 4.97G/4.97G [01:05<00:00, 75.9MB/s]
Downloading (…)l-00004-of-00008.bin: 100%|██████████| 4.96G/4.96G [01:08<00:00, 72.5MB/s]
Downloading (…)l-00005-of-00008.bin: 100%|██████████| 5.00G/5.00G [01:02<00:00, 79.5MB/s]
Downloading (…)l-00006-of-00008.bin: 100%|██████████| 4.93G/4.93G [01:32<00:00, 53.3MB/s]
Downloading (…)l-00007-of-00008.bin: 100%|██████████| 5.00G/5.00G [01:39<00:00, 50.4MB/s]
Downloading (…)l-00008-of-00008.bin: 100%|██████████| 4.93G/4.93G [02:10<00:00, 37.9MB/s]
Killed

Subsequent runs of the script will simply output Killed after about 15s.

Is there something I can do to fix this? I am running ubuntu 18.04 on x86 arch.

TypeError: A float32 tensor's data must be type of function Float32Array() when running under jest

Given the following script:

const { pipeline, env } = require("@xenova/transformers");
env.onnx.wasm.numThreads = 1;


(async() => {
  let embedder = await pipeline('embeddings', 'sentence-transformers/all-MiniLM-L6-v2');
  let sentences = [
      'The quick brown fox jumps over the lazy dog.'
  ];
  let output = (await embedder(sentences)).tolist();
  console.log(output);
})();

This will execute without error when running the following from the shell

$ node test.js

But if the same script is executed using jest, I receive the following error:

$ npx jest test.js
TypeError: A float32 tensor's data must be type of function Float32Array() { [native code] }
    at new h (node_modules/onnxruntime-common/dist/webpack:/onnxruntime-common/lib/tensor-impl.ts:111:17)
    at m.run (node_modules/onnxruntime-common/dist/webpack:/onnxruntime-common/lib/inference-session-impl.ts:112:28)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at sessionRun (node_modules/@xenova/transformers/src/models.js:52:18)
    at Function._call (node_modules/@xenova/transformers/src/models.js:365:16)
    at Function._call (node_modules/@xenova/transformers/src/pipelines.js:69:23)
    at Function._call (node_modules/@xenova/transformers/src/pipelines.js:351:33)
    at test.js:10:17

This appears to be related to the choice of the ONNX runtime. If this line: https://github.com/xenova/transformers.js/blob/main/src/backends/onnx.js#L10 is changed to onnxruntime-web (instead of onnxruntime-node). Executing under jest will now succeed, so there appears to be some issue with jest + onnxruntime-node. In terms of resolution, one option would be to detect within backends/onnx.js whether execution is being done under jest, which can be done by checking process.env.JEST_WORKER_ID (this will be populated when running under jest). In terms of root cause, I'm not sure where the actual bug is (jest or onnxruntime-node), but it would make the most sense for it to be resolved there if its possible to determine which package is responsible.

Steps to reproduce:

  • create new directory
  • copy test script to file named test.js
  • npm install @xenova/transformers jest
  • node test.js
  • npx jest test.js

Is it possible to run this in node?

I got this error when trying:

TypeError [ERR_WORKER_PATH]: The worker script or module filename must be an absolute path or a relative path starting with './' or '../'. Received "blob:nodedata:....

[Feature request] Mobile Browsers Support

When trying to run the models from a mobile browser it fails.

Mobile browsers have limited storage for webpages (let's assume 100MB since it depends on the browser and smartphone).
When using smaller models like whisper-tiny (STT) or distilbert-base-uncased (QA) it works.

Any ideas on how can we solve it?

I was thinking about pruning big models and having a less-than-100MB version of them, like TensorFlow Lite. What do you think?

[Chinese characters] Transformers.js library's pipeline output is inconsistent with transformers pipeline

When using the pipeline from transformers.js library for inference, the output answer span and score are very different from the results obtained by using transformers Pipeline. Specifically, when inputting the same question and passage, the answer span output by transformers.js library is not in the passage, and has a low score; while the answer span output by transformers Pipeline is in the passage, and has a high score. This indicates that there is a problem with the question answering model from transformers.js library.

After testing, I found that under the condition of using the same model and input, the output results of fill mask and question answering are very different from transformers pipeline.

Generation config commented out?

Is there a reason why usage of several generation config params is commented out in the _get_logits_processor function on line 370 of models.js?

_get_logits_processor(
        generation_config,
        input_ids_seq_length,
        // encoder_input_ids, TODO
        // prefix_allowed_tokens_fn, TODO
        logits_processor = null
    ) {
        const processors = new LogitsProcessorList();

        // if (generation_config.diversity_penalty !== null && generation_config.diversity_penalty > 0.0) {
        //     processors.push(new HammingDiversityLogitsProcessor(
        //         generation_config.diversity_penalty,
        //         generation_config.num_beams,
        //         generation_config.num_beam_groups
        //     ));
        // }

Uncaught ReferenceError: self is not defined

I'm trying to use this library to run whisper in a browser environment, outside of a webworker, using NextJS. However, because of this line, I get an error.

Here's how I triggered it:

      const pipe = await pipeline(
        "automatic-speech-recognition",
        "openai/whisper-base"
      );
      const out = await pipe(
        "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"
      );

Code sandbox

[Bug] Converter doesn't convert Whisper model to ONNX

Describe the bug
A clear and concise description of what the bug is.

Converter doesn't convert Whisper model to ONNX.
Converter doesn't work for non-*.en models

How to reproduce
Steps or a minimal working example to reproduce the behavior
python ./scripts/convert.py --model_id openai/whisper-tiny --from_hub --quantize --task speech2seq-lm-with-past

result:
Merging decoders
Traceback (most recent call last):
File "D:\Users\Dimq1\source\OpenAI\transformers.js\scripts\convert.py", line 301, in
main()
File "D:\Users\Dimq1\source\OpenAI\transformers.js\scripts\convert.py", line 293, in main
merge_decoders(
File "C:\Users\Dimq1\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\onnx\graph_transformations.py", line 135, in merge_decoders
_unify_onnx_outputs(decoder, decoder_with_past)
File "C:\Users\Dimq1\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\onnx\transformations_utils.py", line 147, in _unify_onnx_outputs
_check_num_outputs(model1, model2)
File "C:\Users\Dimq1\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\onnx\transformations_utils.py", line 136, in _check_num_outputs
raise ValueError(
ValueError: Two model protos need to have the same outputs. But one has 18 outputs while the other has 10 outputs.
PS D:\Users\Dimq1\source\OpenAI\transformers.js>

Embeddings differ between transformers.js and sentence-transformers (Python)

Running the following:

global.self = global;

const { pipeline, env } = require("@xenova/transformers");
env.onnx.wasm.numThreads = 1;
(async()=> {
        let embedder = await pipeline('embeddings', 'sentence-transformers/all-MiniLM-L6-v2')
        let sentences = [
            'The quick brown fox jumps over the lazy dog.'
        ]
        let output = await embedder(sentences)
        console.log(output[0][0]);
})();

and this (installable via pip install sentence-transformers):

from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer("all-MiniLM-L6-v2")
query = model.encode("The quick brown fox jumps over the lazy dog.", convert_to_tensor=True)
print(query[0])

results in different values (-0.07890480756759644 and 0.0439). Any idea why the embedding values are different? I also noticed that the cosine similarity values for the same two sentences differed quite a bit between Python and transformers.js (which is likely caused by the difference in embedding values).
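
For reference, sentence-transformers applies mean pooling over the token embeddings followed by L2 normalization before returning a sentence embedding, so raw per-token values won't match it directly. A minimal sketch of that post-processing (the [numTokens][dims] shape of tokenEmbeddings is an assumption about what the embeddings pipeline returns here):

// Sketch only: mean pooling + L2 normalization, as applied by
// sentence-transformers for all-MiniLM-L6-v2.
// `tokenEmbeddings` is assumed to be an array of per-token vectors ([numTokens][dims]).
function sentenceEmbedding(tokenEmbeddings) {
    const dims = tokenEmbeddings[0].length;

    // Mean pooling over tokens
    const mean = new Array(dims).fill(0);
    for (const token of tokenEmbeddings) {
        for (let i = 0; i < dims; i++) mean[i] += token[i];
    }
    for (let i = 0; i < dims; i++) mean[i] /= tokenEmbeddings.length;

    // L2 normalization
    const norm = Math.sqrt(mean.reduce((sum, v) => sum + v * v, 0));
    return mean.map((v) => v / norm);
}

Comparing values after this step (rather than raw outputs) would show whether the underlying model outputs actually diverge.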

[Feature request] Improve technical debt

To enable proper collaboration and keep the project healthy, we need to reduce some of the current technical debt, similar to CML (disclaimer: I worked on that).

  • Linter
  • Unit tests
  • GitHub Actions checks
  • GitHub Actions releases

Error when using with React

I tested with React but got an error.
Code:

import { pipeline } from "@xenova/transformers";
....
const classifier = await pipeline("translation");

Error:

tokenizers.js:1080 Uncaught (in promise) TypeError: text.split is not a function
    at Function._encode_text (tokenizers.js:1080:1)
    at Function.encode (tokenizers.js:1105:1)
    at tokenizers.js:970:1
    at Array.map (<anonymous>)
    at Function._call (tokenizers.js:970:1)
    at Function.closure [as tokenizer] (utils.js:372:1)
    at Function._call (pipelines.js:214:1)
    at closure (utils.js:372:1)
    at basicStateReducer (react-dom.development.js:16540:1)
    at updateReducer (react-dom.development.js:16664:1)
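
For comparison, a minimal sketch of calling the pipeline from a React component with a plain string (this is an assumed usage pattern, not a confirmed diagnosis; the TypeError above suggests something other than a string reached the tokenizer, e.g. a function passed to a state setter would be invoked as an updater):

import { useEffect, useState } from 'react';
import { pipeline } from '@xenova/transformers';

// Sketch only: load the pipeline inside an effect and call it with a string.
function Translator({ text }) {
  const [result, setResult] = useState(null);

  useEffect(() => {
    let cancelled = false;
    (async () => {
      const translator = await pipeline('translation');
      const output = await translator(text);
      if (!cancelled) setResult(output);
    })();
    return () => { cancelled = true; };
  }, [text]);

  return <pre>{JSON.stringify(result, null, 2)}</pre>;
}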

[Model request] GPT-Neo

Would it be possible to add GPT-Neo as an available model? I believe it is pretty similar to GPT-2 (they both use GPT2Tokenizer), so I don't think it should be too difficult to implement. It would also be pretty cool to have a model that knows about things like quantum entanglement, since it was trained on The Pile.

Uncaught (in promise) Error: failed to call OrtRun(). error code = 6

Describe the bug

Uncaught (in promise) Error: failed to call OrtRun(). error code = 6.
    at e.run (ort-web.min.js:6:454860)
    at e.run (ort-web.min.js:6:444208)
    at e.OnnxruntimeWebAssemblySessionHandler.run (ort-web.min.js:6:447139)
    at o.run (inference-session-impl.js:91:44)
    at x (models.js:52:32)
    at A (models.js:147:34)
    at Function.forward (models.js:936:22)
    at O (models.js:202:29)
    at Function.runBeam (models.js:927:22)
    at Function.generate (models.js:558:41)

How to reproduce
Try on this audio file in Chrome for macOS: file.webm

Environment

  • Transformers.js version: latest from npm
  • Browser (if applicable): Chrome
  • Operating system (if applicable): macOS
  • Other:

Current use of execution providers is suboptimal

Currently the library only uses the WASM backend, even in Node.js. A better approach would be to use the native Node bindings when running in Node, and in the browser to add the WebGL execution provider with an automatic fallback to WASM when WebGL doesn't support the model.

Currently in the code:

let session = await InferenceSession.create(buffer, {
  // executionProviders: ["webgl"]
  executionProviders: ["wasm"]
});

A hack to at least improve things on the web:

let session;

try {
    session = await InferenceSession.create(buffer, {
        executionProviders: ['webgl', 'wasm']
    });
} catch (err) {
    session = await InferenceSession.create(buffer, {
        executionProviders: ['wasm']
    });
}

I can prepare a PR to improve execution in both Node and the web 😃
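
For the Node side, a minimal sketch of what that selection could look like (assuming onnxruntime-node is added as a dependency; buffer is the model bytes as in the snippets above, and the environment check is illustrative):

// Sketch only: use the native binding in Node, onnxruntime-web elsewhere.
const isNode = typeof process !== 'undefined' && !!process.versions?.node;

const { InferenceSession } = isNode
    ? await import('onnxruntime-node')
    : await import('onnxruntime-web');

const session = await InferenceSession.create(buffer, {
    // Native CPU provider in Node; WASM in the browser (the webgl try/catch
    // hack above could still be layered on top for the web case).
    executionProviders: isNode ? ['cpu'] : ['wasm'],
});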

[Feature request] URL parameter API with JSON output for the web app

I would love to see a discussion on using the web app's URL interface as a way into using the various AI models and getting some form of semantically structured JSON back. I think this would make integrating the library a lot easier.

In my mind, each task is like a function name and the model context data are the arguments to that function. There can also be global arguments that are applied to all tasks, e.g. the output format type (and perhaps the language).

I'm currently using the demo app with some URL parameters: https://conze.pt/app/ai/?task=summarization&arg1=Lightning%20is%20a%20naturally%20occurring&l=en

Maybe a good first step would be to have a set of JS objects describing the:

  • Available tasks,
  • Supported models for each task,
  • The model arguments (defaults, optional, required). Later these could be more formalized using some schema.org schema (or similar), e.g. AchieveAction.
const tasks = {

  'summarization': {

    'description': 'text summarization',
    'type': 'text-generation',

    'models': {

      't5-small': {
        'options': {
          'output_formats': ['text', 'json'],
          'output_name': 'summary',
          'max_new_tokens': 50,
          'num_beams': 1,
          'temperature': 1,
          'top_k': 20,
          'do_sample': true,
          'summary_text': 'text',
        }
      },

      't5-base': { /* ... */ },
      't5-v1_1-small': { /* ... */ },
      't5-v1_1-base': { /* ... */ },
      'facebook/bart-large-cnn': { /* ... */ },
      'sshleifer/distilbart-cnn-6-6': { /* ... */ },
      'sshleifer/distilbart-cnn-12-6': { /* ... */ },
    }
  },

  // ...another task

}

TODO: Maybe add some "required / optional" and data-type declarations on the model options too.

Once we have this we can create a URL parameter structure to call a task with its options and get some plaintext or JSON back.
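
To make that concrete, here is a minimal sketch of mapping URL parameters onto a pipeline call and returning JSON (only task and arg1 come from the example URL above; the model parameter and the output shape are assumptions):

import { pipeline } from '@xenova/transformers';

// Sketch only: ?task=summarization&arg1=<text>&model=<optional model id>
async function runFromUrl(url) {
    const params = new URL(url).searchParams;
    const task = params.get('task');
    const input = params.get('arg1');
    const model = params.get('model') ?? undefined; // fall back to the task's default model

    const pipe = await pipeline(task, model);
    const output = await pipe(input);

    return JSON.stringify({ task, model: model ?? 'default', output });
}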

And perhaps (later) look into creating an OpenAPI schema (never did that myself): https://editor.swagger.io

I'm not yet sure of the best way to pass larger amounts of text to the models. I'm currently using HTTP GET, but a POST might be better suited for this; the drawback is that linking then has to become indirect.

How to convert bloomz model

While converting the bloomz model, I am getting an 'invalid syntax' error. Is conversion limited to predefined model types only?
If not, please provide the syntax for converting the above model with quantization.

(I will run the inference in nodejs and not in browser, so memory will not be an issue in inference.)

Error: There was an error while processing timestamps, we haven't found a timestamp as last token.

On longer audio files (e.g. > 1 minute or so), I get this error:

Uncaught (in promise) Error: There was an error while processing timestamps, we haven't found a timestamp as last token.
    at Function._decode_asr (tokenizers.js:1497:23)
    at Function._call (pipelines.js:489:56)

My worker code is as follows:

importScripts('https://cdn.jsdelivr.net/npm/@xenova/transformers/dist/transformers.min.js');

async function speech_to_text(data) {
    let pipe = await pipeline('automatic-speech-recognition', 'openai/whisper-tiny.en');
    return await pipe(data.audio, {
        top_k: 0,
        do_sample: false,
        chunk_length_s: 30,
        stride_length_s: 5,
        return_timestamps: true,
        force_full_sequences: false,
    });
}

Is there anything I can do to prevent this error?

[Bug] Helsinki Multilingual models errors when using required >>id<< tokens

Describe the bug

Helsinki multilingual models (Helsinki-NLP/opus-mt-en-mul and Helsinki-NLP/opus-mt-mul-en) require a specific token of the form >>id<<, as per the models' documentation. However, using those tokens causes the models.js file to throw an error:

models.js:73 An error occurred during model execution: "Error: failed to call OrtRun(). error code = 6.".

How to reproduce

const pipe = await pipeline('translation','Helsinki-NLP/opus-mt-en-mul')
const result = await pipe(">>jpn<< I love transformer.js, it's a wonderful library");

Expected behavior

I expect to get the translated text.

Logs/screenshots
Screenshot 2023-04-06 at 11 27 58

Environment

  • Transformers.js version: 1.4.0
  • Browser (if applicable): Arc 0.96.0
  • Operating system (if applicable): macOS Ventura 13.2
  • Other:

Additional context

It should be noted that the models do not error without the >>id<< token and run just fine (although the translation is obviously wrong because the target language wasn't specified).

RangeError: offset is out of bounds

I am running the flan-t5 model for text2text-generation (e.g. await pipeline("text2text-generation", "flan-t5-base");) in a service worker. Inference runs without an issue for the first several runs, but eventually I get this error:

RangeError: offset is out of bounds
    at Uint8Array.set (<anonymous>)
    at e.createSessionAllocate (webpack-internal:///../../node_modules/onnxruntime-web/dist/ort-web.min.js:7:450663)
    at Object.e.createSession (webpack-internal:///../../node_modules/onnxruntime-web/dist/ort-web.min.js:7:451415)
    at e.createSession (webpack-internal:///../../node_modules/onnxruntime-web/dist/ort-web.min.js:7:443678)
    at e.OnnxruntimeWebAssemblySessionHandler.loadModel (webpack-internal:///../../node_modules/onnxruntime-web/dist/ort-web.min.js:7:446572)
    at Object.createSessionHandler (webpack-internal:///../../node_modules/onnxruntime-web/dist/ort-web.min.js:7:156408)
    at Function.create (webpack-internal:///../../node_modules/onnxruntime-common/dist/lib/inference-session-impl.js:176:39)
    at async constructSession (webpack-internal:///../../node_modules/@xenova/transformers/src/models.js:20:19)
    at async Promise.all (index 2)
    at async Function.from_pretrained (webpack-internal:///../../node_modules/@xenova/transformers/src/models.js:102:75)

I ran with the same input params 5 times in a row and still get the error, so it doesn't seem like it's an issue with an invalid input value.
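
One thing worth checking (an assumption based on the createSession frames in the trace, not a confirmed diagnosis): if the pipeline is re-created for every request, each run loads a new ONNX session, which can eventually exhaust the WASM heap. A minimal sketch of caching a single instance in the worker:

import { pipeline } from '@xenova/transformers';

// Sketch only: create the flan-t5 pipeline (and its ONNX session) once and
// reuse it for every inference, instead of calling pipeline(...) per request.
let generatorPromise = null;

function getGenerator() {
    generatorPromise ??= pipeline('text2text-generation', 'flan-t5-base');
    return generatorPromise;
}

async function generate(text) {
    const generator = await getGenerator();
    return await generator(text);
}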
