Comments (16)
Hi @TerryLaw535
To import T5 models (fine-tuned or already on HF) you can use one of these 2 notebooks:
- TensorFlow: https://github.com/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/HuggingFace%20in%20Spark%20NLP%20-%20T5.ipynb
- ONNX: https://github.com/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/onnx/HuggingFace_ONNX_in_Spark_NLP_T5.ipynb
Depending on the format in which you saved (exported) your T5 model, follow the matching notebook to import it into Spark NLP. (I personally recommend trying ONNX first, since it has faster inference.)
Hi maziyarpanahi, thanks for the resources. However, I tried several versions of transformers and tensorflow and none of them worked; each combination fails with a different error. The code does not run correctly on Colab either. Could you please check it out? Thank you so much!
Thank you so much for your reply. I tried ONNX and it failed with:
    [ONNXRuntimeError] : 1 : FAIL : Loading the model from onnx_models/google-t5/t5-small/decoder_model_merged.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model.cc:179 onnxruntime::Model::Model(onnx::ModelProto&&, const onnxruntime::PathString&, const onnxruntime::IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 10, max supported IR version: 9
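The message above says the exported file declares ONNX IR version 10 while the ONNX Runtime build that loads it accepts at most version 9, so the exporter and the runtime are out of step. A minimal sketch for pulling those two numbers out of such a message, and for reading the IR version stored in a file, might look like this. The helper names `parse_ir_mismatch` and `model_ir_version` are hypothetical and not part of Spark NLP or the notebooks; the second helper assumes the `onnx` package is installed.

```python
import re
from typing import Optional, Tuple

# Pattern taken from the error text quoted in this thread.
IR_ERROR = re.compile(
    r"Unsupported model IR version: (\d+), max supported IR version: (\d+)"
)


def parse_ir_mismatch(message: str) -> Optional[Tuple[int, int]]:
    """Return (model_ir, max_supported_ir), or None if the message is a different error."""
    match = IR_ERROR.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def model_ir_version(path: str) -> int:
    """Read the IR version recorded in an ONNX file (assumes `onnx` is installed)."""
    import onnx  # imported lazily so the parser above works without it

    return onnx.load(path).ir_version


message = "Unsupported model IR version: 10, max supported IR version: 9"
print(parse_ir_mismatch(message))  # (10, 9)
```

If the first number is larger than the second, the usual options are to load the file with a newer ONNX Runtime or to export it with an older `onnx` release; which of the two applies here depends on the environment and is not confirmed in this thread.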
I also tried the TensorFlow method. I completely followed the instructions except for setting tensorflow == 2.8, since this version is too old and no longer available. I tried the code:
    try:
        model = T5ExportModel.from_pretrained(MODEL_NAME)
    except:
        model = T5ExportModel.from_pretrained(MODEL_NAME, from_pt=True)
    model.export(EXPORT_PATH, use_cache=True)
and it reported:
    TypeError: in user code:
        File "/tmp/ipykernel_70993/430537825.py", line 50, in decoder_init_serving *
            logits = self.shared(sequence_output, mode="linear")
        File "/home/weichengyu/.local/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler **
            raise e.with_traceback(filtered_tb) from None
        TypeError: Embedding.call() got an unexpected keyword argument 'mode'
Do you know how to deal with these problems? I suspect both are caused by version mismatches.
Thank you!
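Since both failures look version related (the traceback shows `self.shared` resolving to a Keras `Embedding` layer whose `call()` does not accept `mode`), it may help to attach the exact installed versions to the report. The following is a minimal sketch using only the standard library; the function name `installed_versions` and the package list are assumptions chosen for illustration, not something the notebooks prescribe.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Hypothetical list: packages that plausibly matter for the two export paths.
PACKAGES = [
    "transformers",
    "tensorflow",
    "keras",
    "onnx",
    "onnxruntime",
    "optimum",
    "spark-nlp",
    "pyspark",
]

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Pasting this output next to the error makes it easier for someone else to reproduce the failure with the same combination of libraries.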
I will assign someone to look at the TensorFlow to Spark NLP export; something may have changed on the Hugging Face side. For T5, I will ask someone to run a quick test to check whether the notebooks are up to date.