
johnsnowlabs / nlu


1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.

License: Apache License 2.0

Python 99.93% Shell 0.07%
nlu natural-language-understanding sentiment-classifier text-classification transformers language-detection named-entity-recognition seq2seq t5 lemmatizer

nlu's Introduction

NLU: The Power of Spark NLP, the Simplicity of Python

John Snow Labs' NLU is a Python library for applying state-of-the-art text mining directly on any dataframe with a single line of code. As a facade for the award-winning Spark NLP library, it comes with 1000+ pretrained models in 100+ languages, all production-grade, scalable, and trainable, with everything in 1 line of code.

NLU in Action

See how easy it is to use any of the thousands of models in 1 line of code. There are hundreds of tutorials and simple examples you can copy and paste into your projects to achieve state-of-the-art results easily.

NLU & Streamlit in Action

This 1 line lets you visualize and play with 1000+ SOTA NLU & NLP models in 200 languages:

streamlit run https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/examples/streamlit/01_dashboard.py

NLU provides a tight and simple integration with Streamlit, which enables building powerful web apps that showcase NLU models in just 1 line of code. View the NLU & Streamlit documentation or the NLU & Streamlit examples section.

All NLU resources overview

Take a look at our official NLU page, https://nlu.johnsnowlabs.com/, for user documentation and examples.

Resource Description
Install NLU Just run pip install nlu pyspark==3.0.2
The NLU Namespace Find all the names of models you can load with nlu.load()
The nlu.load(<Model>) function Load any of the 1000+ models in 1 line
The nlu.load(<Model>).predict(data) function Predict on Strings, List of Strings, Numpy Arrays, Pandas, Modin and Spark Dataframes
The nlu.load(<train.Model>).fit(data) function Train a text classifier for 2-Class, N-Classes, Multi-N-Classes, Named-Entity-Recognition or Parts of Speech Tagging
The nlu.load(<Model>).viz(data) function Visualize the results of Word Embedding Similarity Matrix, Named Entity Recognizers, Dependency Trees & Parts of Speech, Entity Resolution, Entity Linking or Entity Status Assertion
The nlu.load(<Model>).viz_streamlit(data) function Display an interactive GUI which lets you explore and test every model and feature in NLU in 1 click.
General Concepts General concepts in NLU
The latest release notes Newest features added to NLU
Overview NLU 1-liners examples Most commonly used models and their results
Overview NLU 1-liners examples for healthcare models Most commonly used healthcare models and their results
Overview of all NLU tutorials and Examples 100+ tutorials on how to use NLU on text datasets for various problems and from various sources like Twitter, Chinese News, Crypto News Headlines, Airline Traffic communication and Product review classifier training
Connect with us on Slack Problems, questions or suggestions? We have a very active and helpful community of 2000+ AI enthusiasts putting NLU, Spark NLP & Spark OCR to good use
Discussion Forum Want a more in-depth discussion with the community? Post a thread in our discussion forum
John Snow Labs Medium Articles and tutorials on NLU, Spark NLP and Spark OCR
John Snow Labs Youtube Videos and tutorials on NLU, Spark NLP and Spark OCR
NLU Website The official NLU website
Github Issues Report a bug
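
As a quick illustration of the trainable spells referenced above, here is a minimal sketch of fitting a sentiment classifier. It assumes the labeled data is a pandas DataFrame with a 'text' column and a 'y' label column, the convention used in the NLU training tutorials; exact column names may differ between releases.

import pandas as pd
import nlu

# Tiny labeled dataset; NLU trainable spells expect the label in a 'y' column
# next to the 'text' column (assumption based on the training tutorials).
train_df = pd.DataFrame({
    'text': ['I love this movie', 'This was a terrible film'],
    'y':    ['positive', 'negative'],
})

# Load a trainable spell, fit it on the labeled data and predict with the
# freshly trained pipeline.
fitted_pipe = nlu.load('train.sentiment').fit(train_df)
print(fitted_pipe.predict(['I really enjoyed it', 'Not my cup of tea']))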

Getting Started with NLU

To get your hands on the power of NLU, you just need to install it via pip and ensure Java 8 is installed and properly configured. Check out the Quickstart for more info.

pip install nlu pyspark==3.0.2
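
Since NLU runs Spark NLP on the JVM, a quick sanity check of the Java setup before importing nlu can save debugging time. This is only an illustrative sketch; the JAVA_HOME path below is an example and will differ per machine.

import os
import subprocess

# Java prints its version banner to stderr; any Java 8 build should show '1.8'.
print(subprocess.run(['java', '-version'], capture_output=True, text=True).stderr)

# If the wrong Java is picked up, point JAVA_HOME at a Java 8 installation
# before importing nlu (example path, adjust for your system):
# os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk-amd64'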

Loading and predicting with any model in 1 line

import nlu 
nlu.load('sentiment').predict('I love NLU! <3') 

Loading and predicting with multiple models in 1 line

Get 6 different embeddings in 1 line and use them for downstream data science tasks!

nlu.load('bert elmo albert xlnet glove use').predict('I love NLU! <3') 
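
The call above returns a pandas DataFrame with one column per annotator output, so the embeddings can be fed straight into downstream tooling. The exact column names depend on the models and the NLU version, so the sketch below selects them generically instead of hard-coding names.

import nlu

df = nlu.load('bert elmo albert xlnet glove use').predict('I love NLU! <3')

# Pick out the embedding columns generically and inspect the vector sizes
# (the column naming scheme is an assumption and varies across NLU releases).
embedding_cols = [c for c in df.columns if 'embedding' in c]
for col in embedding_cols:
    print(col, len(df[col].iloc[0]))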

What kind of models does NLU provide?

NLU provides everything a data scientist might wish for in one line of code:

  • 1000+ pre-trained models
  • 100+ of the latest NLP word embeddings (BERT, ELMO, ALBERT, XLNET, GLOVE, BIOBERT, ELECTRA, COVIDBERT) and different variations of them
  • 50+ of the latest NLP sentence embeddings (BERT, ELECTRA, USE) and different variations of them
  • 100+ Classifiers (NER, POS, Emotion, Sarcasm, Questions, Spam)
  • 300+ Supported Languages
  • Summarize Text and Answer Questions with T5
  • Labeled and Unlabeled Dependency parsing
  • Various Text Cleaning and Pre-Processing methods like Stemming, Lemmatizing, Normalizing, Filtering, Cleaning pipelines and more
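
For example, the T5 summarization and question answering mentioned in the list above are also one-liners. The sketch below uses the en.t5.base and answer_question spells from the tutorial table further down; the 'summarize:' task prefix follows the T5 convention shown in the NLU T5 tutorials (an assumption, exact task prefixes may vary).

import nlu

# Text summarization via a T5 task prefix (assumed convention).
t5 = nlu.load('en.t5.base')
print(t5.predict('summarize: NLU wraps Spark NLP so that thousands of pretrained '
                 'models can be applied to text with a single line of code.'))

# Closed-book question answering with the dedicated spell.
qa = nlu.load('answer_question')
print(qa.predict('What is the capital of France?'))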

Classifiers trained on many different datasets

Choose the right tool for the right task! Whether you analyze movies or Twitter data, NLU has the right model for you!

  • trec6 classifier
  • trec10 classifier
  • spam classifier
  • fake news classifier
  • emotion classifier
  • cyberbullying classifier
  • sarcasm classifier
  • sentiment classifier for movies
  • IMDB Movie Sentiment classifier
  • Twitter sentiment classifier
  • NER pretrained on OntoNotes
  • NER trainer on CONLL
  • Language classifier for 20 languages on the wiki 20 lang dataset.
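
All of these classifiers load the same way; only the spell name changes. A couple of hedged examples, using spell names that appear in the tutorial table below:

import nlu

# Emotion and sarcasm classification, each in one line.
print(nlu.load('emotion').predict('I am so happy the project finally shipped!'))
print(nlu.load('en.classify.sarcasm').predict('Oh great, another Monday morning meeting.'))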

Utilities for the Data Science NLU applications

Working with text data can sometimes be quite a dirty job. NLU helps you keep your hands clean by providing components that take the data-engineering-intensive tasks off your hands.

  • Datetime Matcher
  • Pattern Matcher
  • Chunk Matcher
  • Phrases Matcher
  • Stopword Cleaners
  • Pattern Cleaners
  • Slang Cleaner
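
These components are loaded like any other spell. A short sketch showing the date matcher and the stopword cleaner, using the match.datetime and stopwords spell names from the tutorial table below:

import nlu

# Extract date expressions from raw text.
print(nlu.load('match.datetime').predict('We met on 2021-01-05 and will meet again next Friday.'))

# Remove stopwords from a sentence.
print(nlu.load('stopwords').predict('This is just a simple sentence with some stopwords.'))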

Where can I see all models available in NLU?

For all models available to load with NLU, see the NLU Namespace, the John Snow Labs Models Hub, or go straight to the source.

Supported Data Types

  • Pandas DataFrame and Series
  • Spark DataFrames
  • Modin with Ray backend
  • Modin with Dask backend
  • Numpy arrays
  • Strings and lists of strings
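
Whatever the input type, predict() returns a pandas DataFrame with the results. A minimal sketch, assuming the text lives in a column named 'text' (an assumption; see the NLU docs for how input columns are resolved):

import pandas as pd
import nlu

pipe = nlu.load('sentiment')

# Strings, lists of strings and pandas DataFrames all work as input.
print(pipe.predict('NLU is great'))
print(pipe.predict(['NLU is great', 'This release is disappointing']))

pdf = pd.DataFrame({'text': ['NLU is great', 'This release is disappointing']})
print(pipe.predict(pdf))

# A Spark or Modin DataFrame built from the same data can be passed the same
# way (requires a running Spark session or a Modin Ray/Dask backend).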

Overview of all tutorials using the NLU-Library

The following table lists all available tutorials using NLU. These tutorials will help you learn how to use the NLU library for your own tasks. Some of the tasks NLU handles are translating from any language to English, lemmatizing, tokenizing, cleaning text of symbols or unwanted syntax, spellchecking, detecting entities, analyzing sentiment and many more!
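
For instance, translation to English is itself a one-liner. The sketch below uses the zh.translate_to.en spell listed in the table; the other translate_to spells work the same way.

import nlu

# Translate Chinese text to English with a pretrained Marian model.
print(nlu.load('zh.translate_to.en').predict('你好，世界'))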


Tutorial Description NLU Spells Used Open In Colab Dataset and Paper References
Albert Word Embeddings with NLU albert, sentiment pos albert emotion Open In Colab Albert-Paper, Albert on Github, Albert on TensorFlow, T-SNE, T-SNE-Albert, Albert_Embedding
Bert Word Embeddings with NLU bert, pos sentiment emotion bert Open In Colab Bert-Paper, Bert Github, T-SNE, T-SNE-Bert, Bert_Embedding
BIOBERT Word Embeddings with NLU biobert , sentiment pos biobert emotion Open In Colab BioBert-Paper, Bert Github , BERT: Deep Bidirectional Transformers, Bert Github, T-SNE, T-SNE-Biobert, Biobert_Embedding
COVIDBERT Word Embeddings with NLU covidbert, sentiment covidbert pos Open In Colab CovidBert-Paper, Bert Github, T-SNE, T-SNE-CovidBert, Covidbert_Embedding
ELECTRA Word Embeddings with NLU electra, sentiment pos en.embed.electra emotion Open In Colab Electra-Paper, T-SNE, T-SNE-Electra, Electra_Embedding
ELMO Word Embeddings with NLU elmo, sentiment pos elmo emotion Open In Colab ELMO-Paper, Elmo-TensorFlow, T-SNE, T-SNE-Elmo, Elmo-Embedding
GLOVE Word Embeddings with NLU glove, sentiment pos glove emotion Open In Colab Glove-Paper, T-SNE, T-SNE-Glove , Glove_Embedding
XLNET Word Embeddings with NLU xlnet, sentiment pos xlnet emotion Open In Colab XLNet-Paper, Bert Github, T-SNE, T-SNE-XLNet, Xlnet_Embedding
Multiple Word-Embeddings and Part of Speech in 1 Line of code bert electra elmo glove xlnet albert pos Open In Colab Bert-Paper, Albert-Paper, ELMO-Paper, Electra-Paper, XLNet-Paper, Glove-Paper
Normalizing with NLU norm Open In Colab -
Detect sentences with NLU sentence_detector.deep, sentence_detector.pragmatic, xx.sentence_detector Open In Colab Sentence Detector
Spellchecking with NLU n.a. n.a. -
Stemming with NLU en.stem, de.stem Open In Colab -
Stopwords removal with NLU stopwords Open In Colab Stopwords
Tokenization with NLU tokenize Open In Colab -
Normalization of Documents norm_document Open In Colab -
Open and Closed book question answering with Google's T5 en.t5 , answer_question Open In Colab T5-Paper, T5-Model
Overview of every task available with T5 en.t5.base Open In Colab T5-Paper, T5-Model
Translate between more than 200 Languages in 1 line of code with Marian Models tr.translate_to.fr, en.translate_to.fr ,fr.translate_to.he , en.translate_to.de Open In Colab Marian-Papers, Translation-Pipeline (En to Fr), Translation-Pipeline (En to Ger)
BERT Sentence Embeddings with NLU embed_sentence.bert, pos sentiment embed_sentence.bert Open In Colab Bert-Paper, Bert Github, Bert-Sentence_Embedding
ELECTRA Sentence Embeddings with NLU embed_sentence.electra, pos sentiment embed_sentence.electra Open In Colab Electra Paper, Sentence-Electra-Embedding
USE Sentence Embeddings with NLU use, pos sentiment use emotion Open In Colab Universal Sentence Encoder, USE-TensorFlow, Sentence-USE-Embedding
Sentence similarity with NLU using BERT embeddings embed_sentence.bert, use en.embed_sentence.electra embed_sentence.bert Open In Colab Bert-Paper, Bert Github, Bert-Sentence_Embedding
Part of Speech tagging with NLU pos Open In Colab Part of Speech
NER Aspect Airline ATIS en.ner.aspect.airline Open In Colab NER Airline Model, Atis intent Dataset
NLU-NER_CONLL_2003_5class_example ner Open In Colab NER-Piple
Named-entity recognition with Deep Learning ONTO NOTES ner.onto Open In Colab NER_Onto
Aspect based NER-Sentiment-Restaurants en.ner.aspect_sentiment Open In Colab -
Detect Named Entities (NER), Part of Speech Tags (POS) and Tokenize in Chinese zh.segment_words, zh.pos, zh.ner, zh.translate_to.en Open In Colab Translation-Pipeline (Zh to En)
Detect Named Entities (NER), Part of Speech Tags (POS) and Tokenize in Japanese ja.segment_words, ja.pos, ja.ner, ja.translate_to.en Open In Colab Translation-Pipeline (Ja to En)
Detect Named Entities (NER), Part of Speech Tags (POS) and Tokenize in Korean ko.segment_words, ko.pos, ko.ner.kmou.glove_840B_300d, ko.translate_to.en Open In Colab -
Date Matching match.datetime Open In Colab -
Typed Dependency Parsing with NLU dep Open In Colab Dependency Parsing
Untyped Dependency Parsing with NLU dep.untyped Open In Colab -
E2E Classification with NLU e2e Open In Colab e2e-Model
Language Classification with NLU lang Open In Colab -
Cyberbullying Classification with NLU classify.cyberbullying Open In Colab Cyberbullying-Classifier
Sentiment Classification with NLU for Twitter emotion Open In Colab Emotion detection
Fake News Classification with NLU en.classify.fakenews Open In Colab Fakenews-Classifier
Intent Classification with NLU en.classify.intent.airline Open In Colab Airline-Intention classifier, Atis-Dataset
Question classification based on the TREC dataset en.classify.questions Open In Colab Question-Classifier
Sarcasm Classification with NLU en.classify.sarcasm Open In Colab Sarcasm-Classifier
Sentiment Classification with NLU for Twitter en.sentiment.twitter Open In Colab Sentiment_Twitter-Classifier
Sentiment Classification with NLU for Movies en.sentiment.imdb Open In Colab Sentiment_imdb-Classifier
Spam Classification with NLU en.classify.spam Open In Colab Spam-Classifier
Toxic text classification with NLU en.classify.toxic Open In Colab Toxic-Classifier
Unsupervised keyword extraction with NLU using the YAKE algorithm yake Open In Colab -
Grammatical Chunk Matching with NLU match.chunks Open In Colab -
Getting n-Grams with NLU ngram Open In Colab -
Assertion en.med_ner.clinical en.assert, en.med_ner.clinical.biobert en.assert.biobert, ... Open In Colab Healthcare-NER, NER_Clinical-Classifier, Toxic-Classifier
De-Identification Model overview med_ner.jsl.wip.clinical en.de_identify, med_ner.jsl.wip.clinical en.de_identify.clinical, ... Open In Colab NER-Clinical
Drug Normalization norm_drugs Open In Colab -
Entity Resolution med_ner.jsl.wip.clinical en.resolve_chunk.cpt_clinical, med_ner.jsl.wip.clinical en.resolve.icd10cm, ... Open In Colab NER-Clinical, Entity-Resolver clinical
Medical Named Entity Recognition en.med_ner.ade.clinical, en.med_ner.ade.clinical_bert, en.med_ner.anatomy,en.med_ner.anatomy.biobert, ... Open In Colab -
Relation Extraction en.med_ner.jsl.wip.clinical.greedy en.relation, en.med_ner.jsl.wip.clinical.greedy en.relation.bodypart.problem, ... Open In Colab -
Visualization of NLP-Models with Spark-NLP and NLU ner, dep.typed, med_ner.jsl.wip.clinical resolve_chunk.rxnorm.in, med_ner.jsl.wip.clinical resolve.icd10cm Open In Colab NER-Piple, Dependency Parsing, NER-Clinical, Entity-Resolver (Chunks) clinical
NLU Covid-19 Emotion Showcase emotion Open In GitHub Emotion detection
NLU Covid-19 Sentiment Showcase sentiment Open In GitHub Sentiment classification
NLU Airline Emotion Demo emotion Open In GitHub Emotion detection
NLU Airline Sentiment Demo sentiment Open In GitHub Sentiment classification
Bengali NER Hindi Embeddings for 30 Models bn.ner, bn.lemma, ja.lemma, am.lemma, bh.lemma, en.ner.onto.bert.small_l2_128,.. Open In Colab Bengali-NER, Bengali-Lemmatizer, Japanese-Lemmatizer, Amharic-Lemmatizer
Entity Resolution med_ner.jsl.wip.clinical en.resolve.umls, med_ner.jsl.wip.clinical en.resolve.loinc, med_ner.jsl.wip.clinical en.resolve.loinc.biobert Open In Colab -
NLU 20 Minutes Crashcourse - the fast Data Science route spell, sentiment, pos, ner, yake, en.t5, emotion, answer_question, en.t5.base ... Open In Colab T5-Model, Part of Speech, NER-Piple, Emotion detection , Spellchecker, Sentiment classification
Chapter 0: Intro: 1-liners sentiment, pos, ner, bert, elmo, embed_sentence.bert Open In Colab Part of Speech, NER-Piple, Sentiment classification, Elmo-Embedding, Bert-Sentence_Embedding
Chapter 1: NLU base-features with some classifiers on testdata emotion, yake, stem Open In Colab Emotion detection
Chapter 2: Translation between 300+ languages with Marian tr.translate_to.en, en.translate_to.fr, en.translate_to.he Open In Colab Translation-Pipeline (En to Fr), Translation (En to He)
Chapter 3: Answer questions and summarize Texts with T5 answer_question, en.t5, en.t5.base Open In Colab T5-Model
Chapter 4: Overview of T5-Tasks en.t5.base Open In Colab T5-Model
Graph NLU 20 Minutes Crashcourse - State of the Art Text Mining for Graphs spell, sentiment, pos, ner, yake, emotion, med_ner.jsl.wip.clinical, ... Open In Colab Part of Speech, NER-Piple, Emotion detection, Spellchecker, Sentiment classification
Healthcare with NLU med_ner.human_phenotype.gene_biobert, med_ner.ade_biobert, med_ner.anatomy, med_ner.bacterial_species,... Open In Colab -
Part 0: Intro: 1-liners spell, sentiment, pos, ner, bert, elmo, embed_sentence.bert Open In Colab Bert-Paper, Bert Github, T-SNE, T-SNE-Bert , Part of Speech, NER-Piple, Spellchecker, Sentiment classification, Elmo-Embedding , Bert-Sentence_Embedding
Part 1: NLU base-features with some classifiers on Testdata yake, stem, ner, emotion Open In Colab NER-Piple, Emotion detection
Part 2: Translate between 200+ Languages in 1 line of code with Marian-Models en.translate_to.de, en.translate_to.fr, en.translate_to.he Open In Colab Translation-Pipeline (En to Fr), Translation-Pipeline (En to Ger), Translation (En to He)
Part 3: More Multilingual NLP-translations for Asian Languages with Marian en.translate_to.hi, en.translate_to.ru, en.translate_to.zh Open In Colab Translation (En to Hi), Translation (En to Ru), Translation (En to Zh)
Part 4: Unsupervised Chinese Keyword Extraction, NER and Translation from Chinese news zh.translate_to.en, zh.segment_words, yake, zh.lemma, zh.ner Open In Colab Translation-Pipeline (Zh to En), Zh-Lemmatizer
Part 5: Multilingual sentiment classifier training for 100+ languages train.sentiment, xx.embed_sentence.labse train.sentiment n.a. Sentence_Embedding.Labse
Part 6: Question-answering and Text-summarization with the T5-Model answer_question, en.t5, en.t5.base Open In Colab T5-Paper
Part 7: Overview of all tasks available with T5 en.t5.base Open In Colab T5-Paper
Part 8: Overview of some of the Multilingual modes with State Of the Art accuracy (1-liner) bn.lemma, ja.lemma, am.lemma, bh.lemma, zh.segment_words, ... Open In Colab Bengali-Lemmatizer, Japanese-Lemmatizer , Amharic-Lemmatizer
Overview of some Multilingual modes available with State Of the Art accuracy (1-liner) bn.ner.cc_300d, ja.ner, zh.ner, th.ner.lst20.glove_840B_300D, ar.ner Open In Colab Bengali-NER
NLU 20 Minutes Crashcourse - the fast Data Science route - Open In Colab -

Need help?

Simple NLU Demos

Features in NLU Overview

  • Tokenization
  • Trainable Word Segmentation
  • Stop Words Removal
  • Token Normalizer
  • Document Normalizer
  • Stemmer
  • Lemmatizer
  • NGrams
  • Regex Matching
  • Text Matching
  • Chunking
  • Date Matcher
  • Sentence Detector
  • Deep Sentence Detector (Deep learning)
  • Dependency parsing (Labeled/unlabeled)
  • Part-of-speech tagging
  • Sentiment Detection (ML models)
  • Spell Checker (ML and DL models)
  • Word Embeddings (GloVe and Word2Vec)
  • BERT Embeddings (TF Hub models)
  • ELMO Embeddings (TF Hub models)
  • ALBERT Embeddings (TF Hub models)
  • XLNet Embeddings
  • Universal Sentence Encoder (TF Hub models)
  • BERT Sentence Embeddings (42 TF Hub models)
  • Sentence Embeddings
  • Chunk Embeddings
  • Unsupervised keyword extraction
  • Language Detection & Identification (up to 375 languages)
  • Multi-class Sentiment analysis (Deep learning)
  • Multi-label Sentiment analysis (Deep learning)
  • Multi-class Text Classification (Deep learning)
  • Neural Machine Translation
  • Text-To-Text Transfer Transformer (Google T5)
  • Named entity recognition (Deep learning)
  • Easy TensorFlow integration
  • GPU Support
  • Full integration with Spark ML functions
  • 1000+ pre-trained models in 200+ languages!
  • Multi-lingual NER models: Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu and more
  • Natural Language inference
  • Coreference resolution
  • Sentence Completion
  • Word sense disambiguation
  • Clinical entity recognition
  • Clinical Entity Linking
  • Entity normalization
  • Assertion Status Detection
  • De-identification
  • Relation Extraction
  • Clinical Entity Resolution
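
Many of the features above are exposed through the same one-line interface. As one more hedged example, language identification (the 'lang' spell from the tutorial table) can be used to route mixed-language text to the right models:

import nlu

# Detect the language of each input string.
print(nlu.load('lang').predict(['This is an English sentence',
                                'Dies ist ein deutscher Satz',
                                'Ceci est une phrase française']))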

Citation

We have published a paper that you can cite for the NLU library:

@article{KOCAMAN2021100058,
    title = {Spark NLP: Natural language understanding at scale},
    journal = {Software Impacts},
    pages = {100058},
    year = {2021},
    issn = {2665-9638},
    doi = {https://doi.org/10.1016/j.simpa.2021.100058},
    url = {https://www.sciencedirect.com/science/article/pii/S2665963821000063},
    author = {Veysel Kocaman and David Talby},
    keywords = {Spark, Natural language processing, Deep learning, Tensorflow, Cluster},
    abstract = {Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192+ languages. It supports nearly all the NLP tasks and modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and experiencing 9x growth since January 2020, Spark NLP is used by 54% of healthcare organizations as the world’s most widely used NLP library in the enterprise.}
}

nlu's People

Contributors

agsfer, ahmedlone127, alexott, arkajyotichakraborty, brollb, c-k-loan, davebulaval, dcecchini, dependabot[bot], devintdha, diatrambitas, gadde5300, josejuanmartinez, luca-martial, mahmoodbayeshi, maziyarpanahi, meryem1425, milyiyo, murat-karadag, rajeshkppt, roverrwe, skocer, sonurdogan


nlu's Issues

Unknown environment issue with BioBert

I am using the nlu BioBert mapper to improve upon a tool that already exists called text2term. A few weeks ago, I was able to get the tool working on a personal computer (Mac), but shortly after when I switched to my new work computer (also Mac, same OS but with an Apple Chip instead of Intel), the program no longer worked even with the same source code, Python, and Java version.

A coworker recreated the issue with an Apple Chip computer, Python 3.9.5, and Java 17. If you have any insights, please let me know.

Here are the requirements with their versions, as well as the error:
Python 3.10.6 (Also tried 3.9.13)
Java version "1.8.0_341" (Also tried Java 16)
requirements.txt:

Owlready2==0.36
argparse==1.4.0
pandas==1.4.1
numpy==1.23.2
gensim==4.1.2
scipy==1.8.0
scikit-learn==1.0.2
setuptools==60.9.3
requests==2.27.1
tqdm==4.62.3
sparse_dot_topn==0.3.1
bioregistry==0.4.63
nltk==3.7
rapidfuzz==2.0.5
shortuuid==1.0.9

Error:

ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/py4j/clientserver.py", line 516, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/py4j/java_gateway.py", line 1038, in send_command
    response = connection.send_command(command)
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/py4j/clientserver.py", line 539, in send_command
    raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
[OK!]
Traceback (most recent call last):
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/nlu/pipe/component_resolution.py", line 276, in get_trained_component_for_nlp_model_ref
    component.get_pretrained_model(nlp_ref, lang, model_bucket),
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/nlu/components/embeddings/sentence_bert/BertSentenceEmbedding.py", line 13, in get_pretrained_model
    return BertSentenceEmbeddings.pretrained(name,language,bucket) \
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sparknlp/annotator/embeddings/bert_sentence_embeddings.py", line 231, in pretrained
    return ResourceDownloader.downloadModel(BertSentenceEmbeddings, name, lang, remote_loc)
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sparknlp/pretrained/resource_downloader.py", line 40, in downloadModel
    j_obj = _internal._DownloadModel(reader.name, name, language, remote_loc, j_dwn).apply()
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sparknlp/internal/__init__.py", line 317, in __init__
    super(_DownloadModel, self).__init__("com.johnsnowlabs.nlp.pretrained." + validator + ".downloadModel", reader,
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sparknlp/internal/extended_java_wrapper.py", line 26, in __init__
    self._java_obj = self.new_java_obj(java_obj, *args)
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sparknlp/internal/extended_java_wrapper.py", line 36, in new_java_obj
    return self._new_java_obj(java_class, *args)
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pyspark/ml/wrapper.py", line 86, in _new_java_obj
    return java_obj(*java_args)
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pyspark/sql/utils.py", line 190, in deco
    return f(*a, **kw)
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/py4j/protocol.py", line 334, in get_return_value
    raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/nlu/__init__.py", line 234, in load
    nlu_component = nlu_ref_to_component(nlu_ref)
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/nlu/pipe/component_resolution.py", line 160, in nlu_ref_to_component
    resolved_component = get_trained_component_for_nlp_model_ref(lang, nlu_ref, nlp_ref, license_type, model_params)
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/nlu/pipe/component_resolution.py", line 287, in get_trained_component_for_nlp_model_ref
    raise ValueError(f'Failure making component, nlp_ref={nlp_ref}, nlu_ref={nlu_ref}, lang={lang}, \n err={e}')
ValueError: Failure making component, nlp_ref=sent_biobert_pmc_base_cased, nlu_ref=en.embed_sentence.biobert.pmc_base_cased, lang=en, 
 err=An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/jason/Documents/GitHub/ontology-mapper/text2term/__main__.py", line 48, in <module>
    Text2Term().map_file(arguments.source, arguments.target, output_file=arguments.output, csv_columns=csv_columns,
  File "/Users/jason/Documents/GitHub/ontology-mapper/text2term/t2t.py", line 63, in map_file
    return self.map(source_terms, target_ontology, source_terms_ids=source_terms_ids, base_iris=base_iris,
  File "/Users/jason/Documents/GitHub/ontology-mapper/text2term/t2t.py", line 115, in map
    self._do_biobert_mapping(source_terms, target_terms, biobert_file)
  File "/Users/jason/Documents/GitHub/ontology-mapper/text2term/t2t.py", line 161, in _do_biobert_mapping
    biobert = BioBertMapper(ontology_terms)
  File "/Users/jason/Documents/GitHub/ontology-mapper/text2term/biobert_mapper.py", line 28, in __init__
    self.biobert = self.load_biobert()
  File "/Users/jason/Documents/GitHub/ontology-mapper/text2term/biobert_mapper.py", line 34, in load_biobert
    biobert = nlu.load('en.embed_sentence.biobert.pmc_base_cased')
  File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/nlu/__init__.py", line 249, in load
    raise Exception(
Exception: Something went wrong during creating the Spark NLP model_anno_obj for your request =  en.embed_sentence.biobert.pmc_base_cased Did you use a NLU Spell?

Installing Java

Hello,
I wanted to know if it's possible to use nlu in a 'non-cells' editor like VS Code.
I tried to, but I get this error:
Exception: Java gateway process exited before sending its port number

I looked into the Colab file and I think I need to paste this:

os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]

but I don't have this file /usr/lib/jvm/java-8-openjdk-amd64

Could you please send it to me or post it on GitHub if that's the solution?
Thanks for your help

bad casing for nlp ref

Some NLP refs in the spellbook have wrong casing.
Double-check against the Models Hub / S3 metadata and fix.

did the last version support python==3.8.10

hx@hx-image:~$ streamlit run https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/examples/streamlit/01_dashboard.py
Traceback (most recent call last):
  File "/home/hx/.local/bin/streamlit", line 5, in <module>
    from streamlit.web.cli import main
  File "/home/hx/.local/lib/python3.8/site-packages/streamlit/__init__.py", line 55, in <module>
    from streamlit.delta_generator import DeltaGenerator as _DeltaGenerator
  File "/home/hx/.local/lib/python3.8/site-packages/streamlit/delta_generator.py", line 38, in <module>
    from streamlit import config, cursor, env_util, logger, runtime, type_util, util
  File "/home/hx/.local/lib/python3.8/site-packages/streamlit/cursor.py", line 18, in <module>
    from streamlit.runtime.scriptrunner import get_script_run_ctx
  File "/home/hx/.local/lib/python3.8/site-packages/streamlit/runtime/__init__.py", line 16, in <module>
    from streamlit.runtime.runtime import Runtime as Runtime
  File "/home/hx/.local/lib/python3.8/site-packages/streamlit/runtime/runtime.py", line 28, in <module>
    from streamlit.runtime.app_session import AppSession
  File "/home/hx/.local/lib/python3.8/site-packages/streamlit/runtime/app_session.py", line 35, in <module>
    from streamlit.runtime import caching, legacy_caching
  File "/home/hx/.local/lib/python3.8/site-packages/streamlit/runtime/caching/__init__.py", line 21, in <module>
    from streamlit.runtime.state.session_state import WidgetMetadata
  File "/home/hx/.local/lib/python3.8/site-packages/streamlit/runtime/state/__init__.py", line 16, in <module>
    from streamlit.runtime.state.safe_session_state import (
  File "/home/hx/.local/lib/python3.8/site-packages/streamlit/runtime/state/safe_session_state.py", line 20, in <module>
    from streamlit.runtime.state.session_state import (
  File "/home/hx/.local/lib/python3.8/site-packages/streamlit/runtime/state/session_state.py", line 44, in <module>
    from streamlit.type_util import ValueFieldName, is_array_value_field_name
  File "/home/hx/.local/lib/python3.8/site-packages/streamlit/type_util.py", line 35, in <module>
    import pyarrow as pa
  File "/home/hx/.local/lib/python3.8/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
  File "pyarrow/compat.pxi", line 43, in init pyarrow.lib
  File "/home/hx/.local/lib/python3.8/site-packages/cloudpickle/__init__.py", line 3, in <module>
    from cloudpickle.cloudpickle import *
  File "/home/hx/.local/lib/python3.8/site-packages/cloudpickle/cloudpickle.py", line 167, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "/home/hx/.local/lib/python3.8/site-packages/cloudpickle/cloudpickle.py", line 148, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)

Model Loading

I am loading the model like this:

import sparknlp
import nlu

spark = sparknlp.start()
df = spark.read.csv("nlp_data.csv")
res = nlu.load("pos").predict(df[["text"]].rdd.flatMap(lambda x: x).collect())
print(res)
spark.stop()

Each time I get the following messages in my console:

com.johnsnowlabs.nlp#spark-nlp_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-3f17e4b8-0bdf-40c5-9879-d62f9c2dc974;1.0
        confs: [default]
        found com.johnsnowlabs.nlp#spark-nlp_2.12;5.2.3 in central
        found com.typesafe#config;1.4.2 in central
        found org.rocksdb#rocksdbjni;6.29.5 in central
        found com.amazonaws#aws-java-sdk-s3;1.12.500 in central
        found com.amazonaws#aws-java-sdk-kms;1.12.500 in central
        found com.amazonaws#aws-java-sdk-core;1.12.500 in central
        found commons-logging#commons-logging;1.1.3 in central
        found commons-codec#commons-codec;1.15 in central
        found org.apache.httpcomponents#httpclient;4.5.13 in central
        found org.apache.httpcomponents#httpcore;4.4.13 in central
        found software.amazon.ion#ion-java;1.0.2 in central
        found com.fasterxml.jackson.dataformat#jackson-dataformat-cbor;2.12.6 in central
        found joda-time#joda-time;2.8.1 in central
        found com.amazonaws#jmespath-java;1.12.500 in central
        found com.github.universal-automata#liblevenshtein;3.0.0 in central
        found com.google.protobuf#protobuf-java-util;3.0.0-beta-3 in central
        found com.google.protobuf#protobuf-java;3.0.0-beta-3 in central
        found com.google.code.gson#gson;2.3 in central
        found it.unimi.dsi#fastutil;7.0.12 in central
        found org.projectlombok#lombok;1.16.8 in central
        found com.google.cloud#google-cloud-storage;2.20.1 in central
        found com.google.guava#guava;31.1-jre in central
        found com.google.guava#failureaccess;1.0.1 in central
        found com.google.guava#listenablefuture;9999.0-empty-to-avoid-conflict-with-guava in central
        found com.google.errorprone#error_prone_annotations;2.18.0 in central
        found com.google.j2objc#j2objc-annotations;1.3 in central
        found com.google.http-client#google-http-client;1.43.0 in central
        found io.opencensus#opencensus-contrib-http-util;0.31.1 in central
        found com.google.http-client#google-http-client-jackson2;1.43.0 in central
        found com.google.http-client#google-http-client-gson;1.43.0 in central
        found com.google.api-client#google-api-client;2.2.0 in central
        found com.google.oauth-client#google-oauth-client;1.34.1 in central
        found com.google.http-client#google-http-client-apache-v2;1.43.0 in central
        found com.google.apis#google-api-services-storage;v1-rev20220705-2.0.0 in central
        found com.google.code.gson#gson;2.10.1 in central
        found com.google.cloud#google-cloud-core;2.12.0 in central
        found io.grpc#grpc-context;1.53.0 in central
        found com.google.auto.value#auto-value-annotations;1.10.1 in central
        found com.google.auto.value#auto-value;1.10.1 in central
        found javax.annotation#javax.annotation-api;1.3.2 in central
        found com.google.cloud#google-cloud-core-http;2.12.0 in central
        found com.google.http-client#google-http-client-appengine;1.43.0 in central
        found com.google.api#gax-httpjson;0.108.2 in central
        found com.google.cloud#google-cloud-core-grpc;2.12.0 in central
        found io.grpc#grpc-alts;1.53.0 in central
        found io.grpc#grpc-grpclb;1.53.0 in central
        found org.conscrypt#conscrypt-openjdk-uber;2.5.2 in central
        found io.grpc#grpc-auth;1.53.0 in central
        found io.grpc#grpc-protobuf;1.53.0 in central
        found io.grpc#grpc-protobuf-lite;1.53.0 in central
        found io.grpc#grpc-core;1.53.0 in central
        found com.google.api#gax;2.23.2 in central
        found com.google.api#gax-grpc;2.23.2 in central
        found com.google.auth#google-auth-library-credentials;1.16.0 in central
        found com.google.auth#google-auth-library-oauth2-http;1.16.0 in central
        found com.google.api#api-common;2.6.2 in central
        found io.opencensus#opencensus-api;0.31.1 in central
        found com.google.api.grpc#proto-google-iam-v1;1.9.2 in central
        found com.google.protobuf#protobuf-java;3.21.12 in central
        found com.google.protobuf#protobuf-java-util;3.21.12 in central
        found com.google.api.grpc#proto-google-common-protos;2.14.2 in central
        found org.threeten#threetenbp;1.6.5 in central
        found com.google.api.grpc#proto-google-cloud-storage-v2;2.20.1-alpha in central
        found com.google.api.grpc#grpc-google-cloud-storage-v2;2.20.1-alpha in central
        found com.google.api.grpc#gapic-google-cloud-storage-v2;2.20.1-alpha in central
        found com.fasterxml.jackson.core#jackson-core;2.14.2 in central
        found com.google.code.findbugs#jsr305;3.0.2 in central
        found io.grpc#grpc-api;1.53.0 in central
        found io.grpc#grpc-stub;1.53.0 in central
        found org.checkerframework#checker-qual;3.31.0 in central
        found io.perfmark#perfmark-api;0.26.0 in central
        found com.google.android#annotations;4.1.1.4 in central
        found org.codehaus.mojo#animal-sniffer-annotations;1.22 in central
        found io.opencensus#opencensus-proto;0.2.0 in central
        found io.grpc#grpc-services;1.53.0 in central
        found com.google.re2j#re2j;1.6 in central
        found io.grpc#grpc-netty-shaded;1.53.0 in central
        found io.grpc#grpc-googleapis;1.53.0 in central
        found io.grpc#grpc-xds;1.53.0 in central
        found com.navigamez#greex;1.0 in central
        found dk.brics.automaton#automaton;1.11-8 in central
        found com.johnsnowlabs.nlp#tensorflow-cpu_2.12;0.4.4 in central
        found com.microsoft.onnxruntime#onnxruntime;1.16.3 in central
:: resolution report :: resolve 1966ms :: artifacts dl 54ms
        :: modules in use:
        com.amazonaws#aws-java-sdk-core;1.12.500 from central in [default]
        com.amazonaws#aws-java-sdk-kms;1.12.500 from central in [default]
        com.amazonaws#aws-java-sdk-s3;1.12.500 from central in [default]
        com.amazonaws#jmespath-java;1.12.500 from central in [default]
        com.fasterxml.jackson.core#jackson-core;2.14.2 from central in [default]
        com.fasterxml.jackson.dataformat#jackson-dataformat-cbor;2.12.6 from central in [default]
        com.github.universal-automata#liblevenshtein;3.0.0 from central in [default]
        com.google.android#annotations;4.1.1.4 from central in [default]
        com.google.api#api-common;2.6.2 from central in [default]
        com.google.api#gax;2.23.2 from central in [default]
        com.google.api#gax-grpc;2.23.2 from central in [default]
        com.google.api#gax-httpjson;0.108.2 from central in [default]
        com.google.api-client#google-api-client;2.2.0 from central in [default]
        com.google.api.grpc#gapic-google-cloud-storage-v2;2.20.1-alpha from central in [default]
        com.google.api.grpc#grpc-google-cloud-storage-v2;2.20.1-alpha from central in [default]
        com.google.api.grpc#proto-google-cloud-storage-v2;2.20.1-alpha from central in [default]
        com.google.api.grpc#proto-google-common-protos;2.14.2 from central in [default]
        com.google.api.grpc#proto-google-iam-v1;1.9.2 from central in [default]
        com.google.apis#google-api-services-storage;v1-rev20220705-2.0.0 from central in [default]
        com.google.auth#google-auth-library-credentials;1.16.0 from central in [default]
        com.google.auth#google-auth-library-oauth2-http;1.16.0 from central in [default]
        com.google.auto.value#auto-value;1.10.1 from central in [default]
        com.google.auto.value#auto-value-annotations;1.10.1 from central in [default]
        com.google.cloud#google-cloud-core;2.12.0 from central in [default]
        com.google.cloud#google-cloud-core-grpc;2.12.0 from central in [default]
        com.google.cloud#google-cloud-core-http;2.12.0 from central in [default]
        com.google.cloud#google-cloud-storage;2.20.1 from central in [default]
        com.google.code.findbugs#jsr305;3.0.2 from central in [default]
        com.google.code.gson#gson;2.10.1 from central in [default]
        com.google.errorprone#error_prone_annotations;2.18.0 from central in [default]
        com.google.guava#failureaccess;1.0.1 from central in [default]
        com.google.guava#guava;31.1-jre from central in [default]
        com.google.guava#listenablefuture;9999.0-empty-to-avoid-conflict-with-guava from central in [default]
        com.google.http-client#google-http-client;1.43.0 from central in [default]
        com.google.http-client#google-http-client-apache-v2;1.43.0 from central in [default]
        com.google.http-client#google-http-client-appengine;1.43.0 from central in [default]
        com.google.http-client#google-http-client-gson;1.43.0 from central in [default]
        com.google.http-client#google-http-client-jackson2;1.43.0 from central in [default]
        com.google.j2objc#j2objc-annotations;1.3 from central in [default]
        com.google.oauth-client#google-oauth-client;1.34.1 from central in [default]
        com.google.protobuf#protobuf-java;3.21.12 from central in [default]
        com.google.protobuf#protobuf-java-util;3.21.12 from central in [default]
        com.google.re2j#re2j;1.6 from central in [default]
        com.johnsnowlabs.nlp#spark-nlp_2.12;5.2.3 from central in [default]
        com.johnsnowlabs.nlp#tensorflow-cpu_2.12;0.4.4 from central in [default]
        com.microsoft.onnxruntime#onnxruntime;1.16.3 from central in [default]
        com.navigamez#greex;1.0 from central in [default]
        com.typesafe#config;1.4.2 from central in [default]
        commons-codec#commons-codec;1.15 from central in [default]
        commons-logging#commons-logging;1.1.3 from central in [default]
        dk.brics.automaton#automaton;1.11-8 from central in [default]
        io.grpc#grpc-alts;1.53.0 from central in [default]
        io.grpc#grpc-api;1.53.0 from central in [default]
        io.grpc#grpc-auth;1.53.0 from central in [default]
        io.grpc#grpc-context;1.53.0 from central in [default]
        io.grpc#grpc-core;1.53.0 from central in [default]
        io.grpc#grpc-googleapis;1.53.0 from central in [default]
        io.grpc#grpc-grpclb;1.53.0 from central in [default]
        io.grpc#grpc-netty-shaded;1.53.0 from central in [default]
        io.grpc#grpc-protobuf;1.53.0 from central in [default]
        io.grpc#grpc-protobuf-lite;1.53.0 from central in [default]
        io.grpc#grpc-services;1.53.0 from central in [default]
        io.grpc#grpc-stub;1.53.0 from central in [default]
        io.grpc#grpc-xds;1.53.0 from central in [default]
        io.opencensus#opencensus-api;0.31.1 from central in [default]
        io.opencensus#opencensus-contrib-http-util;0.31.1 from central in [default]
        io.opencensus#opencensus-proto;0.2.0 from central in [default]
        io.perfmark#perfmark-api;0.26.0 from central in [default]
        it.unimi.dsi#fastutil;7.0.12 from central in [default]
        javax.annotation#javax.annotation-api;1.3.2 from central in [default]
        joda-time#joda-time;2.8.1 from central in [default]
        org.apache.httpcomponents#httpclient;4.5.13 from central in [default]
        org.apache.httpcomponents#httpcore;4.4.13 from central in [default]
        org.checkerframework#checker-qual;3.31.0 from central in [default]
        org.codehaus.mojo#animal-sniffer-annotations;1.22 from central in [default]
        org.conscrypt#conscrypt-openjdk-uber;2.5.2 from central in [default]
        org.projectlombok#lombok;1.16.8 from central in [default]
        org.rocksdb#rocksdbjni;6.29.5 from central in [default]
        org.threeten#threetenbp;1.6.5 from central in [default]
        software.amazon.ion#ion-java;1.0.2 from central in [default]
        :: evicted modules:
        commons-logging#commons-logging;1.2 by [commons-logging#commons-logging;1.1.3] in [default]
        commons-codec#commons-codec;1.11 by [commons-codec#commons-codec;1.15] in [default]
        com.google.protobuf#protobuf-java-util;3.0.0-beta-3 by [com.google.protobuf#protobuf-java-util;3.21.12] in [default]
        com.google.protobuf#protobuf-java;3.0.0-beta-3 by [com.google.protobuf#protobuf-java;3.21.12] in [default]
        com.google.code.gson#gson;2.3 by [com.google.code.gson#gson;2.10.1] in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   85  |   0   |   0   |   5   ||   80  |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-3f17e4b8-0bdf-40c5-9879-d62f9c2dc974
        confs: [default]
        0 artifacts copied, 80 already retrieved (0kB/27ms)


pos_anc download started this may take some time.
Approximate size to download 3.9 MB
[ / ]pos_anc download started this may take some time.
Approximate size to download 3.9 MB
[ — ]Download done! Loading the resource.
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[ | ]sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[ / ]Download done! Loading the resource.
[ — ]2024-02-06 14:43:45.340048: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[OK!]

Is it indicating that I am downloading the model(s) from the internet again and again, or am I loading them from the jar files?
I assume that the jar files are now on my local system, since it took some time when I first installed spark-nlp, and now it just prints the jar information almost immediately when I run the code.

Something went wrong during loading and fitting the pipe...

I saw this error occur in the closed issues and I believe it was fixed in a later version. I'm not sure if this is the same issue as well.

system:
Windows 10
Python 3.8.8
Pyspark 3.0.2
NLU 3.1.1
Spark 3.1.2

An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: java.lang.UnsatisfiedLinkError:

Exception:
Something went wrong during loading and fitting the pipe. Check the other prints for more information and also verbose mode. Did you use a correct model reference?

Breaking dependencies

Hello, I'm trying to run your library in WSL but an error occurs with dependencies. The full trace is below:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/nlu/pipe/pipeline.py", line 468, in predict
    return __predict__(self, data, output_level, positions, keep_stranger_features, metadata, multithread,
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/nlu/pipe/utils/predict_helper.py", line 166, in __predict__
    pipe.fit()
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/nlu/pipe/pipeline.py", line 202, in fit
    self.vanilla_transformer_pipe = self.spark_estimator_pipe.fit(self.get_sample_spark_dataframe())
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/nlu/pipe/pipeline.py", line 101, in get_sample_spark_dataframe
    return sparknlp.start().createDataFrame(data=text_df)
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/pyspark/sql/session.py", line 673, in createDataFrame
    return super(SparkSession, self).createDataFrame(
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/pyspark/sql/pandas/conversion.py", line 299, in createDataFrame
    data = self._convert_from_pandas(data, schema, timezone)
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/pyspark/sql/pandas/conversion.py", line 331, in _convert_from_pandas
    for column, series in pdf.iteritems():
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/pandas/core/generic.py", line 6202, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'iteritems'
callmarl@LAPTOP-QS9M6N2F ~/workzone/nlp % python --version
Python 3.9.2
callmarl@LAPTOP-QS9M6N2F ~/workzone/nlp % pip freeze
asttokens==2.4.0
backcall==0.2.0
certifi==2023.7.22
charset-normalizer==3.2.0
click==8.1.7
colorama==0.4.6
databricks-api==0.9.0
databricks-cli==0.17.7
dataclasses==0.6
decorator==5.1.1
exceptiongroup==1.1.3
executing==1.2.0
idna==3.4
ipython==8.15.0
jedi==0.19.0
johnsnowlabs==5.0.7
matplotlib-inline==0.1.6
nlu==5.0.0
numpy==1.25.2
oauthlib==3.2.2
pandas==2.1.0
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
pkg_resources==0.0.0
prompt-toolkit==3.0.39
ptyprocess==0.7.0
pure-eval==0.2.2
py4j==0.10.9
pyarrow==13.0.0
pydantic==1.10.11
Pygments==2.16.1
PyJWT==2.8.0
pyspark==3.1.2
python-dateutil==2.8.2
pytz==2023.3.post1
requests==2.31.0
six==1.16.0
spark-nlp==5.0.2
spark-nlp-display==4.1
stack-data==0.6.2
svgwrite==1.4
tabulate==0.9.0
traitlets==5.9.0
typing_extensions==4.7.1
tzdata==2023.3
urllib3==1.26.16
wcwidth==0.2.6

GPU support

Hi,
I am using the Marian Models for translation.
It works fine, but I am assuming it works only on CPU
(I am using the following code:
pipe_translate = nlu.load('hu.translate_to.en')
translate = pipe_translate.predict("Sziasztok, mi a helyzet?")
and the predict part takes about 5 seconds, and I have an A100 GPU,
I don't think this should take so long...)
I can't figure out how to use the GPU, or how to check whether it is using the GPU...
(print(tf.test.gpu_device_name()) shows that the GPU is there...)
Where can I find some documentation/info about this issue?
I had some issues with the CUDA and Java installation, but right now these look fine...

Thanks

[New Feature] LLMs for Machine Translation of slot-annotated data

Describe the feature
Expansion of SLU to new languages requires a lot of manual data annotation. To significantly reduce the amount of work, LLMs can be used to machine-translate slot-annotated data, e.g.
"play me <a> Dune <a> on <b> Youtube <b>" => "Spiele mir <a> Dune <a> auf <b> Youtube <b>"

Such a feature is especially useful for expanding On-Device SLU to new languages, as high-quality multilingual transformers/LLMs cannot be used as the core SLU model in this case.

Expected behavior
The MT-LLM pipeline expects English sentences annotated in a generic <> tags format (for example: "play me <a> Dune <a> on <b> Youtube <b>") and outputs the translated sentence in the same format ("Spiele mir <a> Dune <a> auf <b> Youtube <b>"). Such a data format can be easily converted to BIO annotation and to other popular NLU formats.

Additional context
https://paperswithcode.com/paper/large-language-models-for-expansion-of-spoken

In our recent work, we fine-tuned an MT-LLM called BigTranslate for MT of slot-annotated NLU data. We used the parallel Amazon MASSIVE dataset for fine-tuning. There is a significant performance improvement after fine-tuning (compared to zero-shot LLM-based machine translation) on the multiATIS++ benchmark.

Here you can find fine-tuned BigTranslate: https://huggingface.co/Samsung/BigTranslateSlotTranslator
Here you can find code for fine-tuning + code for NLU training: https://github.com/samsung/mt-llm-nlu

In summary, we are wondering how we can merge our work into this project, and what parts of our work might be useful for it (e.g., scripts for conversion from BIO to tags format?).

using NLU for biobert embeddings -- takes a really long time on list of 10,000 words, and on 1 word

Hi, so we are working on generating BioBERT embeddings for our project. When we run it on a single word it takes about a second or so. When we run it on a list of 10,000 words, it either times out or takes upwards of hours to run. Is this normal? Below is how we are using it:

def load_biobert(self):
    # Load BioBERT model (for sentence-type embeddings)
    self.logger.info("Loading BioBERT model...")
    start = time.time()
    biobert = nlu.load('en.embed_sentence.biobert.pmc_base_cased')
    end = time.time()
    self.logger.info('done (BioBERT loading time: %.2fs seconds)', end - start)
    return biobert

def get_biobert_embeddings(self, strings):
    embedding_list = []
    for string in strings:
        self.logger.debug("...Generating embedding for: %s", string)
        embedding_list.append(self.get_biobert_embedding(string))
    return embedding_list

def get_biobert_embedding(self, string):
    embedding = self.biobert.predict(string, output_level='sentence', get_embeddings=True)
    return embedding.sentence_embedding_biobert.values[0]

Missing embedding results

Hello,

It seems like embeddings are not returned when running any of the embedding predictions. Sentiment and other models do return results fine though.

Any ideas what I could be missing here?

(screenshot of the prediction output omitted)

nlu 3.3.0
pyspark 3.0.2
py4j 0.10.9
spark-nlp 3.3.2
running on google colab

logging

Hi!
I would like to ask if it is possible to turn off logging or change the logging level from a Python script that uses the nlu library.
Even a simple 'import nlu' generates lines of logs, and when loading models there are tons of them...

Before importing nlu, I am trying to create a PySpark context and set the desired log level, as suggested in the log output from importing nlu: Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel)

But it doesn't seem to help; actually the opposite, I then can't load models and run the predictions...

The other approach was setting logging levels for all possible loggers: nlu, py4j, py4j.java_gateway to CRITICAL in my case

logging.getLogger('nlu').setLevel(logging.CRITICAL)
logging.getLogger('py4j').setLevel(logging.CRITICAL)
logging.getLogger('py4j.java_gateway').setLevel(logging.CRITICAL)

But it also didn't help.
There are still messages from e.g. WARN SparkSession$Builder, WARN ApacheUtils, I tensorflow/core/platform/cpu_feature_guard.cc:142], etc...

spark nlu load error

I am trying to explore the NLU models first and then the NLU Healthcare models.
The nlu.load('emotion') step is failing. The logs are attached.

OS – Linux RHEL
Pyspark – version 3.0.1
Command used for install - python3 -m pip install nlu pyspark==3.0.1 --trusted-host pypi.org --trusted-host files.pythonhosted.org
I have created a python venv and install the NLU per above command.

I also tried reinstalling with below command:
python3 -m pip install --upgrade nlu streamlit pyspark==3.0.2

Code below:
import nlu
pp=nlu.load('emotion')
classifierdl_use_emotion download started this may take some time.
Approximate size to download 21.3 MB
[ / ]
An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: java.lang.NoClassDefFoundError: org/tensorflow/Tensor
at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.read(TensorflowWrapper.scala:397)
at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel(TensorflowSerializeModel.scala:145)
at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel$(TensorflowSerializeModel.scala:120)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLModel$.readTensorflowModel(ClassifierDLModel.scala:291)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ReadClassifierDLTensorflowModel.readTensorflow(ClassifierDLModel.scala:278)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ReadClassifierDLTensorflowModel.readTensorflow$(ClassifierDLModel.scala:276)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLModel$.readTensorflow(ClassifierDLModel.scala:291)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ReadClassifierDLTensorflowModel.$anonfun$$init$$1(ClassifierDLModel.scala:285)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ReadClassifierDLTensorflowModel.$anonfun$$init$$1$adapted(ClassifierDLModel.scala:285)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:47)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:46)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:46)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:57)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:57)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:35)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:333)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:327)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadModel(ResourceDownloader.scala:456)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel(ResourceDownloader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.tensorflow.Tensor
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 34 more
[OK!]
EXCEPTION: Could not resolve singular Component for type=emotion and nlp_ref=classifierdl_use_emotion and nlu_ref=emotion and lang =en
Traceback (most recent call last):
File "/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/nlu/pipe/component_resolution.py", line 852, in construct_component_from_identifier
is_licensed=is_licensed)
File "/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/nlu/components/classifier.py", line 69, in init
else : self.model = ClassifierDl.get_pretrained_model(nlp_ref, language)
File "/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/nlu/components/classifiers/classifier_dl/classifier_dl.py", line 11, in get_pretrained_model
return ClassifierDLModel.pretrained(name,language,bucket)
File "/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/sparknlp/annotator.py", line 8063, in pretrained
return ResourceDownloader.downloadModel(ClassifierDLModel, name, lang, remote_loc)
File "/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/sparknlp/pretrained.py", line 62, in downloadModel
raise e
File "/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/sparknlp/pretrained.py", line 59, in downloadModel
j_obj = _internal._DownloadModel(reader.name, name, language, remote_loc, j_dwn).apply()
File "/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/sparknlp/internal.py", line 214, in init
name, language, remote_loc)
File "/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/sparknlp/internal.py", line 165, in init
self._java_obj = self.new_java_obj(java_obj, *args)
File "/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/sparknlp/internal.py", line 175, in new_java_obj
return self._new_java_obj(java_class, *args)
File "/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/pyspark/ml/wrapper.py", line 69, in _new_java_obj
return java_obj(*java_args)
File "/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/py4j/java_gateway.py", line 1305, in call
answer, self.gateway_client, self.target_id, self.name)
File "/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/pyspark/sql/utils.py", line 128, in deco
return f(*a, **kw)
File "/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: java.lang.NoClassDefFoundError: org/tensorflow/Tensor
at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.read(TensorflowWrapper.scala:397)
at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel(TensorflowSerializeModel.scala:145)
at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel$(TensorflowSerializeModel.scala:120)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLModel$.readTensorflowModel(ClassifierDLModel.scala:291)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ReadClassifierDLTensorflowModel.readTensorflow(ClassifierDLModel.scala:278)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ReadClassifierDLTensorflowModel.readTensorflow$(ClassifierDLModel.scala:276)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLModel$.readTensorflow(ClassifierDLModel.scala:291)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ReadClassifierDLTensorflowModel.$anonfun$$init$$1(ClassifierDLModel.scala:285)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ReadClassifierDLTensorflowModel.$anonfun$$init$$1$adapted(ClassifierDLModel.scala:285)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:47)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:46)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:46)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:57)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:57)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:35)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:333)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:327)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadModel(ResourceDownloader.scala:456)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel(ResourceDownloader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.tensorflow.Tensor
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 34 more


ValueError Traceback (most recent call last)
/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/nlu/__init__.py in load(request, path, verbose, gpu, streamlit_caching)
341 if nlu_ref == '': continue
--> 342 nlu_component = nlu_ref_to_component(nlu_ref, authenticated=is_authenticated)
343 # if we get a list of components, then the NLU reference is a pipeline, we do not need to check order

/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/nlu/pipe/component_resolution.py in nlu_ref_to_component(nlu_reference, detect_lang, authenticated, is_recursive_call)
322 authenticated=authenticated,
--> 323 is_recursive_call=is_recursive_call)
324 if resolved_component is None:

/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/nlu/pipe/component_resolution.py in resolve_component_from_parsed_query_data(lang, component_type, dataset, component_embeddings, nlu_ref, trainable, path, authenticated, is_recursive_call)
467 if constructed_component is None:
--> 468 raise ValueError(f'EXCEPTION : Could not create NLU component for nlp_ref={nlp_ref} and nlu_ref={nlu_ref}')
469 else:

ValueError: EXCEPTION : Could not create NLU component for nlp_ref=classifierdl_use_emotion and nlu_ref=emotion

During handling of the above exception, another exception occurred:

Exception Traceback (most recent call last)
in
----> 1 pp=nlu.load('emotion')

/apps/sparknlp/spark-nlu/lib64/python3.6/site-packages/nlu/__init__.py in load(request, path, verbose, gpu, streamlit_caching)
360 print(e[1])
361 raise Exception(
--> 362 "Something went wrong during loading and fitting the pipe. Check the other prints for more information and also verbose mode. Did you use a correct model reference?")
363
364

Exception: Something went wrong during loading and fitting the pipe. Check the other prints for more information and also verbose mode. Did you use a correct model reference?

`pyspark.sql.utils.IllegalArgumentException` on fresh install

  • Using Windows 10, same errors in Ubuntu WSL
  • Java version: openjdk version "1.8.0_282" (equivalent to JDK 8)
  • Installed with pip
  • in Python 3.6: >>> import nlu without errors
>>> nlu.load('tokenize').predict('Each word and symbol in a sentence will generate token.') # From the homepage
Ivy Default Cache set to: C:\Users\USERNAME\.ivy2\cache
The jars for the packages stored in: C:\Users\USERNAME\.ivy2\jars
:: loading settings :: url = jar:file:/C:/Users/USERNAME/.conda/envs/py36nlp/Lib/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.johnsnowlabs.nlp#spark-nlp_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-f4225ffb-68be-4e92-a6fc-c8cf6d7928e2;1.0
        confs: [default]
        found com.johnsnowlabs.nlp#spark-nlp_2.11;2.7.5 in central
        found com.typesafe#config;1.3.0 in central
        found org.rocksdb#rocksdbjni;6.5.3 in central
        found com.amazonaws#aws-java-sdk;1.7.4 in central
        found commons-logging#commons-logging;1.1.1 in central
        found org.apache.httpcomponents#httpclient;4.2 in central
        found org.apache.httpcomponents#httpcore;4.2 in central
        found commons-codec#commons-codec;1.3 in central
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.ivy.util.url.IvyAuthenticator (file:/C:/Users/USERNAME/.conda/envs/py36nlp/Lib/site-packages/pyspark/jars/ivy-2.4.0.jar) to field java.net.Authenticator.theAuthenticator
WARNING: Please consider reporting this to the maintainers of org.apache.ivy.util.url.IvyAuthenticator
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
        found joda-time#joda-time;2.10.10 in central
        [2.10.10] joda-time#joda-time;[2.2,)
        found com.github.universal-automata#liblevenshtein;3.0.0 in central
        found com.google.code.findbugs#annotations;3.0.1 in central
        found net.jcip#jcip-annotations;1.0 in central
        found com.google.code.findbugs#jsr305;3.0.1 in central
        found com.google.protobuf#protobuf-java-util;3.0.0-beta-3 in central
        found com.google.protobuf#protobuf-java;3.0.0-beta-3 in central
        found com.google.code.gson#gson;2.3 in central
        found it.unimi.dsi#fastutil;7.0.12 in central
        found org.projectlombok#lombok;1.16.8 in central
        found org.slf4j#slf4j-api;1.7.21 in central
        found com.navigamez#greex;1.0 in central
        found dk.brics.automaton#automaton;1.11-8 in central
        found org.json4s#json4s-ext_2.11;3.5.3 in central
        found org.joda#joda-convert;1.8.1 in central
        found org.tensorflow#tensorflow;1.15.0 in central
        found org.tensorflow#libtensorflow;1.15.0 in central
        found org.tensorflow#libtensorflow_jni;1.15.0 in central
        found net.sf.trove4j#trove4j;3.0.3 in central
:: resolution report :: resolve 1184ms :: artifacts dl 28ms
        :: modules in use:
        com.amazonaws#aws-java-sdk;1.7.4 from central in [default]
        com.github.universal-automata#liblevenshtein;3.0.0 from central in [default]
        com.google.code.findbugs#annotations;3.0.1 from central in [default]
        com.google.code.findbugs#jsr305;3.0.1 from central in [default]
        com.google.code.gson#gson;2.3 from central in [default]
        com.google.protobuf#protobuf-java;3.0.0-beta-3 from central in [default]
        com.google.protobuf#protobuf-java-util;3.0.0-beta-3 from central in [default]
        com.johnsnowlabs.nlp#spark-nlp_2.11;2.7.5 from central in [default]
        com.navigamez#greex;1.0 from central in [default]
        com.typesafe#config;1.3.0 from central in [default]
        commons-codec#commons-codec;1.3 from central in [default]
        commons-logging#commons-logging;1.1.1 from central in [default]
        dk.brics.automaton#automaton;1.11-8 from central in [default]
        it.unimi.dsi#fastutil;7.0.12 from central in [default]
        joda-time#joda-time;2.10.10 from central in [default]
        net.jcip#jcip-annotations;1.0 from central in [default]
        net.sf.trove4j#trove4j;3.0.3 from central in [default]
        org.apache.httpcomponents#httpclient;4.2 from central in [default]
        org.apache.httpcomponents#httpcore;4.2 from central in [default]
        org.joda#joda-convert;1.8.1 from central in [default]
        org.json4s#json4s-ext_2.11;3.5.3 from central in [default]
        org.projectlombok#lombok;1.16.8 from central in [default]
        org.rocksdb#rocksdbjni;6.5.3 from central in [default]
        org.slf4j#slf4j-api;1.7.21 from central in [default]
        org.tensorflow#libtensorflow;1.15.0 from central in [default]
        org.tensorflow#libtensorflow_jni;1.15.0 from central in [default]
        org.tensorflow#tensorflow;1.15.0 from central in [default]
        :: evicted modules:
        commons-codec#commons-codec;1.6 by [commons-codec#commons-codec;1.3] in [default]
        joda-time#joda-time;2.9.5 by [joda-time#joda-time;2.10.10] in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   29  |   1   |   0   |   2   ||   27  |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-f4225ffb-68be-4e92-a6fc-c8cf6d7928e2
        confs: [default]
        0 artifacts copied, 27 already retrieved (0kB/17ms)
21/03/08 16:01:46 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2823)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.spark.deploy.DependencyUtils$.org$apache$spark$deploy$DependencyUtils$$resolveGlobPath(DependencyUtils.scala:191)
        at org.apache.spark.deploy.DependencyUtils$$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:147)
        at org.apache.spark.deploy.DependencyUtils$$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:145)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
        at org.apache.spark.deploy.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:145)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$3.apply(SparkSubmit.scala:343)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$3.apply(SparkSubmit.scala:343)
        at scala.Option.map(Option.scala:146)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:343)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/03/08 16:01:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
No accepted Data type or usable columns found or applying the NLU models failed.
Make sure that the first column you pass to .predict() is the one that nlu should predict on OR rename the column you want to predict on to 'text'
If you are on Google Collab, click on Run time and try factory reset Runtime run the setup script again, you might have used too much memory
On Kaggle try to reset restart session and run the setup script again, you might have used too much memory
Full Stacktrace: see bottom
Additional info:
<class 'pyspark.sql.utils.IllegalArgumentException'> pipeline.py 1380
Stuck? Contact us on Slack! https://join.slack.com/t/spark-nlp/shared_invite/zt-lutct9gm-kuUazcyFKhuGY3_0AMkxqA

Same errors occur when running nlu.load('tokenize').predict('Each word and symbol in a sentence will generate token.')
Full stack trace:

Full Stacktrace was (<class 'pyspark.sql.utils.IllegalArgumentException'>, IllegalArgumentException('Unsupported class file major version 55', 'org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166)
         at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:148)
         at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:136)
         at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:237)
         at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:50)
         at org.apache.spark.util.FieldAccessFinder$$anon$4$$anonfun$visitMethodInsn$7.apply(ClosureCleaner.scala:845)
         at org.apache.spark.util.FieldAccessFinder$$anon$4$$anonfun$visitMethodInsn$7.apply(ClosureCleaner.scala:828)
         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
         at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
         at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
         at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
         at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:134)
         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
         at org.apache.spark.util.FieldAccessFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:828)
         at org.apache.xbean.asm6.ClassReader.readCode(ClassReader.java:2175)
         at org.apache.xbean.asm6.ClassReader.readMethod(ClassReader.java:1238)
         at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:631)
         at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:355)
         at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:272)
         at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:271)
         at scala.collection.immutable.List.foreach(List.scala:392)
         at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:271)
         at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
         at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
         at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:820)
         at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:819)
         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
         at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
         at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:819)
         at org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
         at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
         at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391)
         at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:43)
         at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
         at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
         at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.GenerateExec.doExecute(GenerateExec.scala:80)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
         at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
         at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391)
         at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:43)
         at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
         at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
         at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
         at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:296)
         at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3263)
         at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3260)
         at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
         at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
         at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
         at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
         at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3369)
         at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3260)
         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
         at py4j.Gateway.invoke(Gateway.java:282)
         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
         at py4j.commands.CallCommand.execute(CallCommand.java:79)
         at py4j.GatewayConnection.run(GatewayConnection.java:238)
         at java.base/java.lang.Thread.run(Thread.java:834)'), <traceback object at 0x0000024A29857188>)
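For the winutils part of this trace, a common workaround (a sketch with placeholder paths, and only for the "Failed to locate the winutils binary" error) is to point HADOOP_HOME at a folder that contains bin\winutils.exe before importing nlu. Note also that class file major version 55 corresponds to Java 11, so the JVM Spark actually picked up here may not be the Java 8 install listed above.

import os
os.environ["HADOOP_HOME"] = r"C:\hadoop"              # placeholder: folder containing bin\winutils.exe
os.environ["PATH"] += os.pathsep + r"C:\hadoop\bin"

import nlu
nlu.load('tokenize').predict('Each word and symbol in a sentence will generate token.')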

DataFrame problem with pyspark and pandas interaction

When executing the following code, an error occurs

from johnsnowlabs import nlp

pipeline = nlp.load('sentiment')
pipeline.predict("I love this Documentation! It's so good!")
...
Approximate size to download 354.6 KB
Download done! Loading the resource.
[OK!]
Warning::Spark Session already created, some configs may not take.
Traceback (most recent call last):
  File "/home/user/Documents/test/nlu/test_maen.py", line 8, in <module>
    pipeline.predict("I love this Documentation! It's so good!")
  File "/home/user/Documents/test/nlu/.venv/lib64/python3.10/site-packages/nlu/pipe/pipeline.py", line 468, in predict
    return __predict__(self, data, output_level, positions, keep_stranger_features, metadata, multithread,
  File "/home/user/Documents/test/nlu/.venv/lib64/python3.10/site-packages/nlu/pipe/utils/predict_helper.py", line 166, in __predict__
    pipe.fit()
  File "/home/user/Documents/test/nlu/.venv/lib64/python3.10/site-packages/nlu/pipe/pipeline.py", line 202, in fit
    self.vanilla_transformer_pipe = self.spark_estimator_pipe.fit(self.get_sample_spark_dataframe())
  File "/home/user/Documents/test/nlu/.venv/lib64/python3.10/site-packages/nlu/pipe/pipeline.py", line 101, in get_sample_spark_dataframe
    return sparknlp.start().createDataFrame(data=text_df)
  File "/home/user/Documents/test/nlu/.venv/lib64/python3.10/site-packages/pyspark/sql/session.py", line 603, in createDataFrame
    return super(SparkSession, self).createDataFrame(
  File "/home/user/Documents/test/nlu/.venv/lib64/python3.10/site-packages/pyspark/sql/pandas/conversion.py", line 299, in createDataFrame
    data = self._convert_from_pandas(data, schema, timezone)
  File "/home/user/Documents/test/nlu/.venv/lib64/python3.10/site-packages/pyspark/sql/pandas/conversion.py", line 327, in _convert_from_pandas
    for column, series in pdf.iteritems():
  File "/home/user/Documents/test/nlu/.venv/lib64/python3.10/site-packages/pandas/core/generic.py", line 6202, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'iteritems'. Did you mean: 'isetitem'?

There is a solution for this error on Stack Overflow.
Maybe you should pin the right pandas version in the dependencies of the johnsnowlabs module?
For example pandas >= 1.3.5, < 2
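Two possible workarounds until such a pin lands (both are assumptions, not an official fix): install a pandas release below 2.0, or alias the removed DataFrame.iteritems back to DataFrame.items so the installed pyspark can keep calling it.

# pip install "pandas>=1.3.5,<2"    <- option 1: pin pandas below 2.0
import pandas as pd

if not hasattr(pd.DataFrame, "iteritems"):
    pd.DataFrame.iteritems = pd.DataFrame.items   # option 2: restore the alias removed in pandas 2.0

from johnsnowlabs import nlp
pipeline = nlp.load('sentiment')
pipeline.predict("I love this Documentation! It's so good!")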

Platform - Fedora Linux 36

openjdk version "11.0.19" 2023-04-18
OpenJDK Runtime Environment (Red_Hat-11.0.19.0.7-2.fc36) (build 11.0.19+7)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.19.0.7-2.fc36) (build 11.0.19+7, mixed mode, sharing)

Error when loading match.datetime component

import nlu
nlu.load('match.datetime').predict('In the years 2000/01/01 to 2010/01/01 a lot of things happened')

Running it in Colab after pip install nlu pyspark==3.0.2, I get this error:
Exception: Something went wrong during loading and fitting the pipe. Check the other prints for more information and also verbose mode. Did you use a correct model reference?

Error while trying to load nlu.load('embed_sentence.bert')

I am trying to create a sentence similarity model using Spark NLP, but I am getting the two different errors below.

sent_small_bert_L2_128 download started this may take some time.
Approximate size to download 16.1 MB
[OK!]

IllegalArgumentException Traceback (most recent call last)
File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\nlu\pipe\component_resolution.py:276, in get_trained_component_for_nlp_model_ref(lang, nlu_ref, nlp_ref, license_type, model_configs)
274 if component.get_pretrained_model:
275 component = component.set_metadata(
--> 276 component.get_pretrained_model(nlp_ref, lang, model_bucket),
277 nlu_ref, nlp_ref, lang, False, license_type)
278 else:

File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\nlu\components\embeddings\sentence_bert\BertSentenceEmbedding.py:13, in BertSentence.get_pretrained_model(name, language, bucket)
11 @staticmethod
12 def get_pretrained_model(name, language, bucket=None):
---> 13 return BertSentenceEmbeddings.pretrained(name,language,bucket)
14 .setInputCols('sentence')
15 .setOutputCol("sentence_embeddings")

File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\sparknlp\annotator\embeddings\bert_sentence_embeddings.py:231, in BertSentenceEmbeddings.pretrained(name, lang, remote_loc)
230 from sparknlp.pretrained import ResourceDownloader
--> 231 return ResourceDownloader.downloadModel(BertSentenceEmbeddings, name, lang, remote_loc)

File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\sparknlp\pretrained\resource_downloader.py:40, in ResourceDownloader.downloadModel(reader, name, language, remote_loc, j_dwn)
39 try:
---> 40 j_obj = _internal._DownloadModel(reader.name, name, language, remote_loc, j_dwn).apply()
41 except Py4JJavaError as e:

File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\sparknlp\internal\__init__.py:317, in _DownloadModel.__init__(self, reader, name, language, remote_loc, validator)
316 def __init__(self, reader, name, language, remote_loc, validator):
--> 317 super(_DownloadModel, self).__init__("com.johnsnowlabs.nlp.pretrained." + validator + ".downloadModel", reader,
318 name, language, remote_loc)

File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\sparknlp\internal\extended_java_wrapper.py:26, in ExtendedJavaWrapper.__init__(self, java_obj, *args)
25 self.sc = SparkContext._active_spark_context
---> 26 self._java_obj = self.new_java_obj(java_obj, *args)
27 self.java_obj = self._java_obj

File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\sparknlp\internal\extended_java_wrapper.py:36, in ExtendedJavaWrapper.new_java_obj(self, java_class, *args)
35 def new_java_obj(self, java_class, *args):
---> 36 return self._new_java_obj(java_class, *args)

File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\pyspark\ml\wrapper.py:69, in JavaWrapper._new_java_obj(java_class, *args)
68 java_args = [_py2java(sc, arg) for arg in args]
---> 69 return java_obj(*java_args)

File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\py4j\java_gateway.py:1304, in JavaMember.__call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1307 for temp_arg in temp_args:

File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\pyspark\sql\utils.py:134, in capture_sql_exception.<locals>.deco(*a, **kw)
131 if not isinstance(converted, UnknownException):
132 # Hide where the exception came from that shows a non-Pythonic
133 # JVM exception message.
--> 134 raise_from(converted)
135 else:

File <string>:3, in raise_from(e)

IllegalArgumentException: requirement failed: Was not found appropriate resource to download for request: ResourceRequest(sent_small_bert_L2_128,Some(en),public/models,4.0.2,3.3.0) with downloader: com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader@c7c973f

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\nlu\__init__.py:234, in load(request, path, verbose, gpu, streamlit_caching, m1_chip)
233 continue
--> 234 nlu_component = nlu_ref_to_component(nlu_ref)
235 # if we get a list of components, then the NLU reference is a pipeline, we do not need to check order

File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\nlu\pipe\component_resolution.py:160, in nlu_ref_to_component(nlu_ref, detect_lang, authenticated)
159 else:
--> 160 resolved_component = get_trained_component_for_nlp_model_ref(lang, nlu_ref, nlp_ref, license_type, model_params)
162 if resolved_component is None:

File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\nlu\pipe\component_resolution.py:287, in get_trained_component_for_nlp_model_ref(lang, nlu_ref, nlp_ref, license_type, model_configs)
286 except Exception as e:
--> 287 raise ValueError(f'Failure making component, nlp_ref={nlp_ref}, nlu_ref={nlu_ref}, lang={lang}, \n err={e}')
289 return component

ValueError: Failure making component, nlp_ref=sent_small_bert_L2_128, nlu_ref=embed_sentence.bert, lang=en,
err=requirement failed: Was not found appropriate resource to download for request: ResourceRequest(sent_small_bert_L2_128,Some(en),public/models,4.0.2,3.3.0) with downloader: com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader@c7c973f

During handling of the above exception, another exception occurred:

Exception Traceback (most recent call last)
Cell In [16], line 2
1 import nlu
----> 2 pipe = nlu.load('embed_sentence.bert')
3 print("pipe",pipe)

File c:\users\ramesar2\appdata\local\programs\python\python38\lib\site-packages\nlu\__init__.py:249, in load(request, path, verbose, gpu, streamlit_caching, m1_chip)
247 print(e[1])
248 print(err)
--> 249 raise Exception(
250 f"Something went wrong during creating the Spark NLP model_anno_obj for your request = {request} Did you use a NLU Spell?")
251 # Complete Spark NLP Pipeline, which is defined as a DAG given by the starting Annotators
252 try:

Exception: Something went wrong during creating the Spark NLP model_anno_obj for your request = embed_sentence.bert Did you use a NLU Spell?
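Assuming the download error can be resolved, a minimal sentence-similarity sketch on top of embed_sentence.bert could look like this; the exact embedding column name varies by model, so it is looked up dynamically rather than hard-coded.

import nlu
import numpy as np

pipe = nlu.load('embed_sentence.bert')
preds = pipe.predict(['I love NLU', 'I really like NLU'], output_level='document')

emb_col = [c for c in preds.columns if 'embedding' in c][0]   # column name depends on the model
a, b = (np.array(v) for v in preds[emb_col])
similarity = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(similarity)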

Elmo does not work

I installed all the packages and ran the examples, but Elmo does not work.
(two screenshots of the error were attached)

please help me!

Unable to load en.ner.dl.bert

I have the following code:

documents = ["Open my files on oceans.", "Open my presentation on oceans.", "open my presentation on week 6 day 3"]
nlu_model = nlu.load('en.ner.dl.bert')
nlu_model.predict(documents, output_level='token') 

The nlu.load('en.ner.dl.bert') part causes an error that I am not sure how to fix:

ner_dl_bert download started this may take some time.
Approximate size to download 15.4 MB
[OK!]
pos_anc download started this may take some time.
Approximate size to download 4.3 MB
[OK!]
bert_base_cased download started this may take some time.
Approximate size to download 389.1 MB
[OK!]
<class 'AttributeError'>
'NoneType' object has no attribute '__set_missing_model_attributes__'
Something went wrong during loading and fitting the pipe. Check the other prints for more information and also verbose mode. Did you use a correct model reference?
The NLU components could not be properly created. Please check previous print messages and Verbose mode for further info

My environment:
ubuntu 20.10
python 3.7.9
pyspark 2.4.7
spark-nlp 2.6.5

I appreciate your help

Embed Japanese Sentences with Bert

Hi, thanks for such a convenient tool! I would like to ask the authors: does this tool support embedding Japanese sentences with BERT? Thank you

combining 'sentiment' and 'emotion' models causes crash

I'm working in a Google Colab notebook and I set up via

!wget http://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash

import nlu

A quick version check with nlu.version() confirms 3.4.2.

Several of the official tutorial notebooks (for ex.: XLNet) create a multi-model pipeline that includes both 'sentiment' and 'emotion'.

Direct copy of content from the notebook:

import pandas as pd

# Download the dataset 
!wget -N https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/sarcasm/train-balanced-sarcasm.csv -P /tmp

# Load dataset to Pandas
df = pd.read_csv('/tmp/train-balanced-sarcasm.csv')

pipe = nlu.load('sentiment pos xlnet emotion') 

df['text'] = df['comment']

max_rows = 200

predictions = pipe.predict(df.iloc[0:100][['comment','label']], output_level='token')

predictions

However, running a prediction on this pipe results in the following error:


sentimentdl_glove_imdb download started this may take some time.
Approximate size to download 8.7 MB
[OK!]
pos_anc download started this may take some time.
Approximate size to download 3.9 MB
[OK!]
xlnet_base_cased download started this may take some time.
Approximate size to download 417.5 MB
[OK!]
classifierdl_use_emotion download started this may take some time.
Approximate size to download 21.3 MB
[OK!]
glove_100d download started this may take some time.
Approximate size to download 145.3 MB
[OK!]
tfhub_use download started this may take some time.
Approximate size to download 923.7 MB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
---------------------------------------------------------------------------
IllegalArgumentException                  Traceback (most recent call last)
<ipython-input-1-9b2e4a06bf65> in <module>()
     34 
     35 # NLU to gives us one row per embedded word by specifying the output level
---> 36 predictions = pipe.predict( df.iloc[0:5][['text','label']], output_level='token' )
     37 
     38 display(predictions)

9 frames
/usr/local/lib/python3.7/dist-packages/pyspark/sql/utils.py in raise_from(e)

IllegalArgumentException: requirement failed: Wrong or missing inputCols annotators in SentimentDLModel_6c1a68f3f2c7.

Current inputCols: sentence_embeddings@glove_100d. Dataset's columns:
(column_name=text,is_nlp_annotator=false)
(column_name=document,is_nlp_annotator=true,type=document)
(column_name=sentence,is_nlp_annotator=true,type=document)
(column_name=sentence_embeddings@tfhub_use,is_nlp_annotator=true,type=sentence_embeddings).
Make sure such annotators exist in your pipeline, with the right output names and that they have following annotator types: sentence_embeddings

Having experimented with various combinations of models, it turns out that the problem is caused whenever 'sentiment' and 'emotion' models are specified in the same pipeline (regardless of pipeline order or what other models are listed).

Running pipe = nlu.load('emotion ANY OTHER MODELS') or pipe = nlu.load('sentiment ANY OTHER MODELS') will be successful, so it really appears to be only a result of combining 'sentiment' and 'emotion'

Is this a known bug? Does anyone have any suggestions for fixing?

My temporary solution has been to run emoPipe = nlu.load('emotion').predict() in isolation, then inner join the resulting dataframe to the resulting df of pipe = nlu.load('sentiment pos xlnet').predict().
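A rough sketch of that workaround (the column handling and output_level are assumptions; document-level output keeps one row per input, so a plain index join works):

import nlu
import pandas as pd

df = pd.read_csv('/tmp/train-balanced-sarcasm.csv').iloc[0:100].copy()
df['text'] = df['comment']

emo_df  = nlu.load('emotion').predict(df[['text', 'label']], output_level='document')
rest_df = nlu.load('sentiment pos xlnet').predict(df[['text', 'label']], output_level='document')

merged = rest_df.join(emo_df, rsuffix='_emotion')   # join the two result frames on their shared index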

However, I would like to understand better what is failing and to know if there is a way to streamline the inclusion of all models.

Thanks

Issue with nlu.load('sentiment')

I'm trying to follow the example at nlu/examples/colab/component_examples/sequence2sequence/translation_demo.ipynb but I keep getting this error when calling nlu.load('sentiment').

My code:

import nlu 
nlu.load('sentiment').predict('I love NLU! <3') 

My error:

analyze_sentiment download started this may take some time.
Approx size to download 4.9 MB
[OK!]
<class 'pyspark.sql.utils.IllegalArgumentException'>
'Unsupported class file major version 55'
Something went wrong during loading and fitting the pipe. Check the other prints for more information and also verbose mode. Did you use a correct model reference?
<nlu.NluError at 0x7f046d214be0>

Java problems when using the library

Hello, I followed all the installation steps in the documentation, but it was not enough to get the library working.

Then I had to install the JDK and specify the interpreter and the path to the JDK:

import os
from johnsnowlabs import nlp

os.environ["PYSPARK_DRIVER_PYTHON"] = "D:\\myproject\\nlp_command\\.venv\\Scripts"
os.environ["JAVA_HOME"] = "C:\\Program Files\\Java\\jdk-20"

pipeline = nlp.load('sentiment')
# pipeline.predict("I love this Documentation! It's so good!")

But I'm still getting the "Java gateway process exited before sending its port number" error.

Platform - windows 10

java version "20.0.2" 2023-07-18
Java(TM) SE Runtime Environment (build 20.0.2+9-78)
Java HotSpot(TM) 64-Bit Server VM (build 20.0.2+9-78, mixed mode, sharing)
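Two things are worth checking here (both are assumptions about this setup, not a confirmed diagnosis): PYSPARK_DRIVER_PYTHON is normally expected to point at the python executable itself rather than at the Scripts folder, and Spark NLP releases target JDK 8 or 11 rather than JDK 20. A sketch with placeholder paths:

import os

os.environ["PYSPARK_PYTHON"] = r"D:\myproject\nlp_command\.venv\Scripts\python.exe"
os.environ["PYSPARK_DRIVER_PYTHON"] = r"D:\myproject\nlp_command\.venv\Scripts\python.exe"
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk-11"   # placeholder for a JDK 8/11 install

from johnsnowlabs import nlp
pipeline = nlp.load('sentiment')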

How to set the batch size?

Hi,

The prediction process takes a long time to finish, so I checked the GPU memory usage and found it only uses 3 GB of memory (I have a 16 GB GPU).
I want to set a larger batch size to speed up the process, but I can't find the argument.
How do I set the batch size when using the predict function?

import nlu
pipe = nlu.load('xx.embed_sentence.labse', gpu=True)
pipe.predict(text, output_level='document')

Thanks
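If nlu.load() does not expose a batch-size argument for this model, one fallback (a sketch, not the official NLU answer; the model name "labse" and the column names are assumptions to check against the Models Hub) is to build the same embedding step with Spark NLP directly, where transformer annotators provide setBatchSize():

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import BertSentenceEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start(gpu=True)

document = DocumentAssembler().setInputCol("text").setOutputCol("document")
embeddings = (BertSentenceEmbeddings.pretrained("labse", "xx")
              .setInputCols(["document"])
              .setOutputCol("sentence_embeddings")
              .setBatchSize(64))   # larger batches trade GPU memory for throughput

pipeline = Pipeline(stages=[document, embeddings])
df = spark.createDataFrame([("some text to embed",)], ["text"])
result = pipeline.fit(df).transform(df)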

error while downloading Hebrew NER

I am running the official Docker image of nlp-server and trying to run NER on a Hebrew sentence, but it fails to download the model. I also tried to download the model manually and it says access denied.

load error

I get the following error when running:

import nlu
nlu.load('elmo')

using configuration:
OS: Windows 10
Java version: 1.8.0_311 (Java 8)
Pyspark – version: 3.1.2

:: loading settings :: url = jar:file:/C:/Spark/spark-3.2.0-bin-hadoop3.2/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: C:\Users\Lukas\.ivy2\cache
The jars for the packages stored in: C:\Users\Lukas\.ivy2\jars
com.johnsnowlabs.nlp#spark-nlp_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-f9a2f2a7-e7ac-44f5-a922-ae1493621cbc;1.0
confs: [default]
found com.johnsnowlabs.nlp#spark-nlp_2.12;3.3.4 in central
found com.typesafe#config;1.4.1 in central
found org.rocksdb#rocksdbjni;6.5.3 in central
found com.amazonaws#aws-java-sdk-bundle;1.11.603 in central
found com.github.universal-automata#liblevenshtein;3.0.0 in central
found com.google.code.findbugs#annotations;3.0.1 in central
found net.jcip#jcip-annotations;1.0 in central
found com.google.code.findbugs#jsr305;3.0.1 in central
found com.google.protobuf#protobuf-java-util;3.0.0-beta-3 in central
found com.google.protobuf#protobuf-java;3.0.0-beta-3 in central
found com.google.code.gson#gson;2.3 in central
found it.unimi.dsi#fastutil;7.0.12 in central
found org.projectlombok#lombok;1.16.8 in central
found org.slf4j#slf4j-api;1.7.21 in central
found com.navigamez#greex;1.0 in central
found dk.brics.automaton#automaton;1.11-8 in central
found org.json4s#json4s-ext_2.12;3.5.3 in central
found joda-time#joda-time;2.9.5 in central
found org.joda#joda-convert;1.8.1 in central
found com.johnsnowlabs.nlp#tensorflow-cpu_2.12;0.3.3 in central
found net.sf.trove4j#trove4j;3.0.3 in central
:: resolution report :: resolve 391ms :: artifacts dl 16ms
:: modules in use:
com.amazonaws#aws-java-sdk-bundle;1.11.603 from central in [default]
com.github.universal-automata#liblevenshtein;3.0.0 from central in [default]
com.google.code.findbugs#annotations;3.0.1 from central in [default]
com.google.code.findbugs#jsr305;3.0.1 from central in [default]
com.google.code.gson#gson;2.3 from central in [default]
com.google.protobuf#protobuf-java;3.0.0-beta-3 from central in [default]
com.google.protobuf#protobuf-java-util;3.0.0-beta-3 from central in [default]
com.johnsnowlabs.nlp#spark-nlp_2.12;3.3.4 from central in [default]
com.johnsnowlabs.nlp#tensorflow-cpu_2.12;0.3.3 from central in [default]
com.navigamez#greex;1.0 from central in [default]
com.typesafe#config;1.4.1 from central in [default]
dk.brics.automaton#automaton;1.11-8 from central in [default]
it.unimi.dsi#fastutil;7.0.12 from central in [default]
joda-time#joda-time;2.9.5 from central in [default]
net.jcip#jcip-annotations;1.0 from central in [default]
net.sf.trove4j#trove4j;3.0.3 from central in [default]
org.joda#joda-convert;1.8.1 from central in [default]
org.json4s#json4s-ext_2.12;3.5.3 from central in [default]
org.projectlombok#lombok;1.16.8 from central in [default]
org.rocksdb#rocksdbjni;6.5.3 from central in [default]
org.slf4j#slf4j-api;1.7.21 from central in [default]
---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   21  |   0   |   0   |   0   ||   21  |   0   |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-f9a2f2a7-e7ac-44f5-a922-ae1493621cbc
confs: [default]
0 artifacts copied, 21 already retrieved (0kB/0ms)
22/01/14 17:30:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
elmo download started this may take some time.
22/01/14 17:31:05 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
EXCEPTION: Could not resolve singular Component for type=elmo and nlp_ref=elmo and nlu_ref=elmo and lang =en
Traceback (most recent call last):
File "D:.venv\python3.8_nlu\lib\site-packages\nlu\pipe\component_resolution.py", line 708, in construct_component_from_identifier
return Embeddings(get_default=False, nlp_ref=nlp_ref, nlu_ref=nlu_ref, lang=language,
File "D:.venv\python3.8_nlu\lib\site-packages\nlu\components\embedding.py", line 98, in init
else : self.model =SparkNLPElmo.get_pretrained_model(nlp_ref, lang)
File "D:.venv\python3.8_nlu\lib\site-packages\nlu\components\embeddings\elmo\spark_nlp_elmo.py", line 14, in get_pretrained_model
return ElmoEmbeddings.pretrained(name,language)
File "D:.venv\python3.8_nlu\lib\site-packages\sparknlp\annotator.py", line 7760, in pretrained
return ResourceDownloader.downloadModel(ElmoEmbeddings, name, lang, remote_loc)
File "D:.venv\python3.8_nlu\lib\site-packages\sparknlp\pretrained.py", line 50, in downloadModel
file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
File "D:.venv\python3.8_nlu\lib\site-packages\sparknlp\internal.py", line 231, in init
super(_GetResourceSize, self).init(
File "D:.venv\python3.8_nlu\lib\site-packages\sparknlp\internal.py", line 165, in init
self._java_obj = self.new_java_obj(java_obj, *args)
File "D:.venv\python3.8_nlu\lib\site-packages\sparknlp\internal.py", line 175, in new_java_obj
return self._new_java_obj(java_class, *args)
File "D:.venv\python3.8_nlu\lib\site-packages\pyspark\ml\wrapper.py", line 66, in _new_java_obj
return java_obj(*java_args)
File "D:.venv\python3.8_nlu\lib\site-packages\py4j\java_gateway.py", line 1304, in call
return_value = get_return_value(
File "D:.venv\python3.8_nlu\lib\site-packages\pyspark\sql\utils.py", line 111, in deco
return f(*a, **kw)
File "D:.venv\python3.8_nlu\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize.
: java.lang.NoClassDefFoundError: org/json4s/package$MappingException
at org.json4s.ext.EnumNameSerializer.deserialize(EnumSerializer.scala:53)
at org.json4s.Formats$$anonfun$customDeserializer$1.applyOrElse(Formats.scala:66)
at org.json4s.Formats$$anonfun$customDeserializer$1.applyOrElse(Formats.scala:66)
at scala.collection.TraversableOnce.collectFirst(TraversableOnce.scala:180)
at scala.collection.TraversableOnce.collectFirst$(TraversableOnce.scala:167)
at scala.collection.AbstractTraversable.collectFirst(Traversable.scala:108)
at org.json4s.Formats$.customDeserializer(Formats.scala:66)
at org.json4s.Extraction$.customOrElse(Extraction.scala:775)
at org.json4s.Extraction$.extract(Extraction.scala:454)
at org.json4s.Extraction$.extract(Extraction.scala:56)
at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:22)
at com.johnsnowlabs.util.JsonParser$.parseObject(JsonParser.scala:28)
at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$.parseJson(ResourceMetadata.scala:101)
at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$$anonfun$readResources$1.applyOrElse(ResourceMetadata.scala:129)
at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$$anonfun$readResources$1.applyOrElse(ResourceMetadata.scala:128)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
at scala.collection.Iterator$$anon$13.next(Iterator.scala:593)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at scala.collection.AbstractIterator.to(Iterator.scala:1431)
at scala.collection.TraversableOnce.toList(TraversableOnce.scala:350)
at scala.collection.TraversableOnce.toList$(TraversableOnce.scala:350)
at scala.collection.AbstractIterator.toList(Iterator.scala:1431)
at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$.readResources(ResourceMetadata.scala:128)
at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$.readResources(ResourceMetadata.scala:123)
at com.johnsnowlabs.client.aws.AWSGateway.getMetadata(AWSGateway.scala:78)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.downloadMetadataIfNeed(S3ResourceDownloader.scala:62)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.resolveLink(S3ResourceDownloader.scala:68)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.getDownloadSize(S3ResourceDownloader.scala:145)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.getDownloadSize(ResourceDownloader.scala:445)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.getDownloadSize(ResourceDownloader.scala:577)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize(ResourceDownloader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.json4s.package$MappingException
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 51 more

Traceback (most recent call last):
File "D:.venv\python3.8_nlu\lib\site-packages\nlu_init_.py", line 236, in load
nlu_component = nlu_ref_to_component(nlu_ref, authenticated=is_authenticated)
File "D:.venv\python3.8_nlu\lib\site-packages\nlu\pipe\component_resolution.py", line 171, in nlu_ref_to_component
resolved_component = resolve_component_from_parsed_query_data(language, component_type, dataset,
File "D:.venv\python3.8_nlu\lib\site-packages\nlu\pipe\component_resolution.py", line 320, in resolve_component_from_parsed_query_data
raise ValueError(f'EXCEPTION : Could not create NLU component for nlp_ref={nlp_ref} and nlu_ref={nlu_ref}')
ValueError: EXCEPTION : Could not create NLU component for nlp_ref=elmo and nlu_ref=elmo

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "D:.venv\python3.8_nlu\lib\site-packages\nlu_init_.py", line 255, in load
raise Exception(
Exception: Something went wrong during loading and fitting the pipe. Check the other prints for more information and also verbose mode. Did you use a correct model reference?

Unsupported class file major version 58

Hi,
First, thanks a lot for this nice work!
It seems that you used Spark and the Java version of TensorFlow to accomplish that, right (with Python as a wrapper)?

I tried to install the nlu package on my Python 3.8, which for now doesn't work (and that's okay, see #11).
So I created a virtual environment with Python 3.7.

Launching ipython, import nlu works.

However, when I try to do as the docs say: nlu.load('sentiment').predict('Why is NLU is awesome? Because of the sauce!') (in fact just nlu.load('sentiment') is the cause of the crash)

It returns:

<class 'pyspark.sql.utils.IllegalArgumentException'>
'Unsupported class file major version 58'

I'm using Arch Linux (zen kernel 5.9.1) in a new virtual env with only wheel and nlu installed, on Python 3.7.9.

I have java 8 / 11 and 14 installed
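Class file major version 58 corresponds to Java 14, so the JVM pyspark picked up was most likely the Java 14 install rather than Java 8. A sketch (the JVM path is a placeholder for an Arch Linux layout) that pins Java 8 before nlu starts Spark:

import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk"                      # placeholder path
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]  # put that java first on PATH

import nlu
nlu.load('sentiment').predict('Why is NLU is awesome? Because of the sauce!')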

nlu support on Python 3.8

On import nlu, it looks like pyspark/cloudpickle.py is failing with:

TypeError: an integer is required (got type bytes). From some research, I found this is an issue with running pyspark on Python 3.8. I am not sure if this is the only cause, but if it is, I recommend adding a Python < 3.8 requirement.
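A sketch of that suggestion (an assumption about where it would live in nlu's own setup.py, not a change that has been made): declaring python_requires makes pip refuse to install the package on Python 3.8 until the pyspark/cloudpickle issue is resolved.

from setuptools import setup, find_packages

setup(
    name="nlu",
    packages=find_packages(),
    python_requires=">=3.6,<3.8",   # reject Python 3.8+ at install time
)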

Unable to pip install nlu (macOS BigSur, Python3.9)

Hello,
I was able to pip install nlu before upgrading macOS.
After the upgrade, I wanted to get a clean environment, and when I tried to install nlu again I got this error:

pablos-MBP:spark pablo$ pip install nlu
Defaulting to user installation because normal site-packages is not writeable
Collecting nlu
  Using cached nlu-1.0.2-py3-none-any.whl (150 kB)
Collecting pyarrow>=0.16.0
  Using cached pyarrow-1.0.1.tar.gz (1.3 MB)
  Installing build dependencies ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/local/opt/python@3.9/bin/python3.9 /Users/pablo/Library/Python/3.9/lib/python/site-packages/pip install --ignore-installed --no-user --prefix /private/var/folders/8j/lbsf0k851g391m73x6y10rsr0000gn/T/pip-build-env-xh65myma/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'cython >= 0.29' 'numpy==1.14.5; python_version<'"'"'3.7'"'"'' 'numpy==1.16.0; python_version>='"'"'3.7'"'"'' setuptools setuptools_scm wheel
       cwd: None
  Complete output (4217 lines):

If you want, I can share the 4217 lines of the complete error. It is probably the same error as the other ticket about compatibility with Python 3.8, just with 3.9 in this case, so does anything above Python 3.7 really break?
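
For what it's worth, those 4217 lines are most likely pyarrow 1.0.1 being compiled from source because it ships no pre-built wheels for Python 3.9. One hedged workaround, assuming nlu's pyarrow>=0.16.0 pin is the only constraint, is to pre-install a newer pyarrow that does ship 3.9 wheels before installing nlu:

# Hedged workaround sketch: install a pyarrow release with Python 3.9 wheels first,
# so pip never tries to build 1.0.1 from source. Assumes nlu only pins pyarrow>=0.16.0.
import subprocess, sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "pyarrow>=3.0.0"])
subprocess.check_call([sys.executable, "-m", "pip", "install", "nlu", "pyspark==3.0.2"])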

Remove the hard dependency on pyspark

Right now, the nlu package has a hard dependency on pyspark, which makes it hard to use with the Databricks runtime or other compatible Spark runtimes. Instead, this package should either rely entirely on an implicit dependency, or use something like the findspark package, similar to what is done here.

P.S. the spark-nlp package itself doesn't depend on pyspark.
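
A rough sketch of the findspark approach the issue describes, assuming the surrounding runtime (e.g. Databricks) already provides a compatible Spark installation, could look like this:

# Rough sketch of an optional pyspark dependency: fall back to findspark
# when pyspark is not pip-installed but the runtime ships Spark.
try:
    import pyspark
except ImportError:
    import findspark
    findspark.init()   # locates SPARK_HOME and puts pyspark on sys.path
    import pyspark

print(pyspark.__version__)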

error while using biobert PubMed PMC

Hi, I am really interested in this NLU biobert library. It's easy to implement yet understandable. However, I faced difficulties while trying to use NLU biobert for my project. I want to run this code:

import nlu

embeddings_df2 = nlu.load('en.embed.biobert.pubmed_pmc_base_cased', gpu=True).predict(df['text'], output_level='token')
embeddings_df2

I am using Google Colab with a GPU. After approximately 40 minutes, it suddenly stopped and produced the error below:

biobert_pubmed_pmc_base_cased download started this may take some time.
Approximate size to download 386.7 MB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]


Exception happened during processing of request from ('127.0.0.1', 40522)
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1207, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1033, in send_command
response = connection.send_command(command)
File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1212, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "/usr/lib/python3.7/socketserver.py", line 316, in _handle_request_noblock
self.process_request(request, client_address)
File "/usr/lib/python3.7/socketserver.py", line 347, in process_request
self.finish_request(request, client_address)
File "/usr/lib/python3.7/socketserver.py", line 360, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python3.7/socketserver.py", line 720, in init
self.handle()
File "/usr/local/lib/python3.7/dist-packages/pyspark/accumulators.py", line 268, in handle
poll(accum_updates)
File "/usr/local/lib/python3.7/dist-packages/pyspark/accumulators.py", line 241, in poll
if func():
File "/usr/local/lib/python3.7/dist-packages/pyspark/accumulators.py", line 245, in accum_updates
num_updates = read_int(self.rfile)
File "/usr/local/lib/python3.7/dist-packages/pyspark/serializers.py", line 595, in read_int
raise EOFError
EOFError

ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:35473)
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/nlu/pipe/pipeline.py", line 438, in predict
self.configure_light_pipe_usage(data.count(), multithread)
File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/dataframe.py", line 585, in count
return int(self._jdf.count())
File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1305, in call
answer, self.gateway_client, self.target_id, self.name)
File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/utils.py", line 128, in deco
return f(*a, **kw)
File "/usr/local/lib/python3.7/dist-packages/py4j/protocol.py", line 336, in get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling o1231.count

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 977, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1115, in start
self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
Exception occured
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/nlu/pipe/pipeline.py", line 438, in predict
self.configure_light_pipe_usage(data.count(), multithread)
File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/dataframe.py", line 585, in count
return int(self._jdf.count())
File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1305, in call
answer, self.gateway_client, self.target_id, self.name)
File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/utils.py", line 128, in deco
return f(*a, **kw)
File "/usr/local/lib/python3.7/dist-packages/py4j/protocol.py", line 336, in get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling o1231.count

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/nlu/pipe/pipeline.py", line 435, in predict
data, stranger_features, output_datatype = DataConversionUtils.to_spark_df(data, self.spark, self.raw_text_column)
TypeError: cannot unpack non-iterable NoneType object
ERROR:nlu:Exception occured
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/nlu/pipe/pipeline.py", line 438, in predict
self.configure_light_pipe_usage(data.count(), multithread)
File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/dataframe.py", line 585, in count
return int(self._jdf.count())
File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1305, in call
answer, self.gateway_client, self.target_id, self.name)
File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/utils.py", line 128, in deco
return f(*a, **kw)
File "/usr/local/lib/python3.7/dist-packages/py4j/protocol.py", line 336, in get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling o1231.count

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/nlu/pipe/pipeline.py", line 435, in predict
data, stranger_features, output_datatype = DataConversionUtils.to_spark_df(data, self.spark, self.raw_text_column)
TypeError: cannot unpack non-iterable NoneType object
No accepted Data type or usable columns found or applying the NLU models failed.
Make sure that the first column you pass to .predict() is the one that nlu should predict on OR rename the column you want to predict on to 'text'
On try to reset restart Jupyter session and run the setup script again, you might have used too much memory
Full Stacktrace was (<class 'TypeError'>, TypeError('cannot unpack non-iterable NoneType object'), <traceback object at 0x7f4ed5dd60f0>)
Additional info:
<class 'TypeError'> pipeline.py 435
cannot unpack non-iterable NoneType object
Stuck? Contact us on Slack! https://join.slack.com/t/spark-nlp/shared_invite/zt-lutct9gm-kuUazcyFKhuGY3_0AMkxqA

I already tried 2-3 times; in my opinion it is probably due to running out of RAM. However, I already enabled the GPU. Any solution for this? Thanks in advance.
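
The Py4J "Answer from Java side is empty" / connection-refused chain usually means the JVM driver process died, and on Colab that is most often an out-of-memory kill, which matches your suspicion. One hedged workaround, sketched below, is to feed the text column to predict() in smaller chunks so the driver never holds the whole embedded corpus at once; the chunk size of 1000 rows is an arbitrary assumption to tune for your data:

# Hedged sketch: predict in chunks and concatenate, to keep peak RAM low.
# `df` and the 1000-row chunk size are assumptions; adjust for your dataset.
import nlu
import pandas as pd

pipe = nlu.load('en.embed.biobert.pubmed_pmc_base_cased', gpu=True)

chunks = []
for start in range(0, len(df), 1000):
    batch = df['text'].iloc[start:start + 1000]
    chunks.append(pipe.predict(batch, output_level='token'))

embeddings_df2 = pd.concat(chunks, ignore_index=True)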

predict() - pyspark IndexError on python 3.11.4

Python version: 3.11.4
pyspark version: 3.1.2

model.predict('I love NLU! <3')
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]

Warning::Spark Session already created, some configs may not take.
Traceback (most recent call last):
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/serializers.py", line 437, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 72, in dumps
    cp.dump(obj)
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 540, in dump
    return Pickler.dump(self, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 630, in reducer_override
    return self._function_reduce(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 503, in _function_reduce
    return self._dynamic_function_reduce(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 484, in _dynamic_function_reduce
    state = _function_getstate(func)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 156, in _function_getstate
    f_globals_ref = _extract_code_globals(func.__code__)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/cloudpickle/cloudpickle.py", line 236, in _extract_code_globals
    out_names = {names[oparg] for _, oparg in _walk_global_ops(co)}
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/cloudpickle/cloudpickle.py", line 236, in <setcomp>
    out_names = {names[oparg] for _, oparg in _walk_global_ops(co)}
                 ~~~~~^^^^^^^
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users//nlu/nlu/pipe/pipeline.py", line 485, in predict
    return __predict__(self, data, output_level, positions, keep_stranger_features, metadata, multithread,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//nlu/nlu/pipe/utils/predict_helper.py", line 267, in __predict__
    pipe.fit()
  File "/Users//nlu/nlu/pipe/pipeline.py", line 204, in fit
    self.vanilla_transformer_pipe = self.spark_estimator_pipe.fit(self.get_sample_spark_dataframe())
                                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//nlu/nlu/pipe/pipeline.py", line 103, in get_sample_spark_dataframe
    return sparknlp.start().createDataFrame(data=text_df)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/sql/session.py", line 673, in createDataFrame
    return super(SparkSession, self).createDataFrame(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/sql/pandas/conversion.py", line 300, in createDataFrame
    return self._create_dataframe(data, schema, samplingRatio, verifySchema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/sql/session.py", line 701, in _create_dataframe
    jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/rdd.py", line 2618, in _to_java_object_rdd
    return self.ctx._jvm.SerDeUtil.pythonToJava(rdd._jrdd, True)
                                                ^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/rdd.py", line 2949, in _jrdd
    wrapped_func = _wrap_function(self.ctx, self.func, self._prev_jrdd_deserializer,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/rdd.py", line 2828, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/rdd.py", line 2814, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
                      ^^^^^^^^^^^^^^^^^^
  File "/Users//miniconda3/lib/python3.11/site-packages/pyspark/serializers.py", line 447, in dumps
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: IndexError: tuple index out of range
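
The root cause here is that the cloudpickle bundled with pyspark 3.1.2 predates the bytecode changes in Python 3.11, so it trips over the new opcode layout while pickling the sample-dataframe function. Until a pyspark release that supports 3.11 is used, a small guard like the hedged sketch below at least fails fast with a readable message (the exact supported range is an assumption, not an official statement):

# Hedged guard sketch: fail fast on Python 3.11+ instead of hitting the
# cloudpickle IndexError. The version ceiling is an assumption for pyspark 3.1.x.
import sys

if sys.version_info >= (3, 11):
    raise RuntimeError(
        "pyspark 3.1.x cannot pickle functions on Python 3.11+; "
        "use an older interpreter (e.g. 3.8/3.9) or a newer pyspark."
    )

import nlu
print(nlu.load('sentiment').predict('I love NLU! <3'))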

using NLU-biobert for entity linking or word embedding

I just wanted to ask how one can use this model for entity linking. I believe I did see some linking and POS tagging, but is there documentation that shows matching words to their meaning rather than just matching by similarity? I want to load a Spark dataset, use the model to perform word embedding by meaning on the whole dataset, store the output in another data frame, and also be able to measure its performance with various metrics.
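
In case it helps while waiting for an answer: predict() also accepts Pandas and Spark DataFrames, so token-level BioBERT vectors can be attached to a dataset and then compared with ordinary cosine similarity. The sketch below is hedged: the 'token' and 'word_embedding_biobert' column names are assumptions about the predict() output schema, and similarity over contextual vectors is still not the same as dictionary-backed entity linking (that is what entity-resolution models are for):

# Hedged sketch: token-level embeddings for a small dataset plus a plain
# numpy cosine similarity. Column names 'token' and 'word_embedding_biobert'
# are assumptions about the output schema; inspect emb_df.columns to confirm.
import nlu
import numpy as np
import pandas as pd

pipe = nlu.load('en.embed.biobert.pubmed_pmc_base_cased')
texts = pd.DataFrame({'text': ['The tumor was benign.', 'The cancer had spread.']})
emb_df = pipe.predict(texts, output_level='token')   # one row per token

def vec(word):
    rows = emb_df[emb_df['token'].str.lower() == word]
    return np.array(rows.iloc[0]['word_embedding_biobert'])

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vec('tumor'), vec('cancer')))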

Could not locate executable null\bin\winutils.exe

Hi, thanks for the package, I'm starting to explore it and it looks good so far!
I've just faced some minor issues when trying to run it on my Windows machine and thought about giving a heads-up here in case someone runs into a similar problem.

First, you need to run your Python as admin, because folders are created (to store downloaded models, I presume) and this causes errors if no permission is granted. Is there a way to choose where these models are downloaded to? That might help with this.

Second, after the installation steps in the instructions I got this error when trying to run nlu:

Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

I found the answer to this problem here: https://stackoverflow.com/a/50430966/11483674
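
Concretely, the "null\bin\winutils.exe" message just means HADOOP_HOME is unset; the linked answer amounts to downloading winutils.exe and pointing HADOOP_HOME at its parent folder. A hedged sketch of doing that from Python before nlu starts Spark (the C:\hadoop location is an assumption, put winutils.exe wherever you prefer):

# Hedged sketch for Windows: make Spark find winutils.exe before it starts.
# C:\hadoop is an assumed location containing bin\winutils.exe; adjust as needed.
import os

os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["PATH"] += os.pathsep + r"C:\hadoop\bin"

import nlu
print(nlu.load('sentiment').predict('NLU on Windows works now'))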
