Comments (5)
Can you try with more data? For some reason, I can run it with 3 examples from the Bosque dataset, but when I add more examples it crashes. Also, could this be related to the data being written in Portuguese?
That's interesting! This might be a bug. There is probably a character or a token it doesn't like. In my opinion it shouldn't crash; it should just skip that row/sentence.
Will assign this for further inspection.
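While the issue is being inspected, one way to narrow down "a character or a token it doesn't like" is to scan the training file for rows that deviate from the plain CoNLL-U token layout (10 tab-separated columns, integer ID and HEAD). The sketch below is only a diagnostic aid under stated assumptions: the helper name `find_suspect_rows` is hypothetical, and flagging multiword-token ranges (IDs such as `1-2`, which are common in Portuguese treebanks because of contractions) or empty nodes (IDs such as `1.1`) is a guess at a possible trigger, not a confirmed cause of the crash.

```python
def find_suspect_rows(lines):
    """Return (line_number, reason) pairs for rows that deviate from
    the plain 10-column CoNLL-U token layout.

    Assumption: such rows are candidates for what the trainer rejects;
    this is a hypothesis to test, not a confirmed diagnosis.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Blank lines separate sentences; '#' lines are comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            problems.append(
                (lineno, f"expected 10 tab-separated columns, found {len(cols)}")
            )
            continue
        token_id, head = cols[0], cols[6]
        if "-" in token_id or "." in token_id:
            # Multiword-token range (e.g. 1-2) or empty node (e.g. 1.1).
            problems.append((lineno, f"multiword or empty-node ID {token_id!r}"))
        elif not token_id.isdigit() or not head.isdigit():
            problems.append((lineno, f"non-integer ID/HEAD: {token_id!r}/{head!r}"))
    return problems
```

To use it on a file, pass an open handle, for example `with open(path, encoding="utf-8") as f: print(find_suspect_rows(f))`, where `path` points at the CoNLL-U file used for training. Removing the flagged sentences and retrying would show whether they are related to the crash.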
from spark-nlp.
This seems like great news! How can I install this fix?
@Arierref46 you just need to update to the latest version, spark-nlp==5.3.3.
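For a Python environment, the upgrade is typically done with `pip install --upgrade spark-nlp==5.3.3`. The following sketch checks whether the installed package is at least that version. The helper names `release_tuple` and `has_fix` are hypothetical, and it assumes the fix ships in 5.3.3 as stated above; it only inspects the Python package, not the JAR loaded by the Spark session.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(ver):
    """Turn a version string such as '5.3.3' into (5, 3, 3).

    Only the first three numeric parts are kept; suffixes are ignored.
    """
    return tuple(int(n) for n in re.findall(r"\d+", ver)[:3])


def has_fix(pkg="spark-nlp", minimum="5.3.3"):
    """Return True if `pkg` is installed at version `minimum` or later."""
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        # Package is not installed in this environment.
        return False
    return release_tuple(installed) >= release_tuple(minimum)


if __name__ == "__main__":
    print("spark-nlp >= 5.3.3 installed:", has_fix())
```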
I'll share some links here just in case.
I am not sure about that data type, but I just tested a file that looks like this:
# sent_id = weblog-juancole.com_juancole_20030911085700_ENG_20030911_085700-0022
# text = It should continue to be defanged.
1 It it PRON PRP Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs 3 nsubj 3:nsubj|6:nsubj:xsubj _
2 should should AUX MD VerbForm=Fin 3 aux 3:aux _
3 continue continue VERB VB VerbForm=Inf 0 root 0:root _
4 to to PART TO _ 6 mark 6:mark _
5 be be AUX VB VerbForm=Inf 6 aux:pass 6:aux:pass _
6 defanged defange VERB VBN Tense=Past|VerbForm=Part|Voice=Pass 3 xcomp 3:xcomp SpaceAfter=No
7 . . PUNCT . _ 3 punct 3:punct _

# sent_id = weblog-blogspot.com_healingiraq_20040409053012_ENG_20040409_053012-0015
# text = So what happened?
1 So so ADV RB _ 3 advmod 3:advmod _
2 what what PRON WP PronType=Int 3 nsubj 3:nsubj _
3 happened happen VERB VBD Mood=Ind|Tense=Past|VerbForm=Fin 0 root 0:root SpaceAfter=No
4 ? ? PUNCT . _ 3 punct 3:punct _

# sent_id = weblog-typepad.com_ripples_20040407125600_ENG_20040407_125600-0055
# text = That too was stopped.
1 That that PRON DT Number=Sing|PronType=Dem 4 nsubj:pass 4:nsubj:pass _
2 too too ADV RB _ 4 advmod 4:advmod _
3 was be AUX VBD Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin 4 aux:pass 4:aux:pass _
4 stopped stop VERB VBN Tense=Past|VerbForm=Part|Voice=Pass 0 root 0:root SpaceAfter=No
5 . . PUNCT . _ 4 punct 4:punct _

- General example: https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/61cb48470ad75c7f33cb771a6a711a253ace62ee/Spark_NLP_Udemy_MOOC/Open_Source/12.01.DependencyParser_TypedDependencyParser.ipynb
- Docs for Dep parser: https://sparknlp.org/api/python/reference/autosummary/sparknlp/annotator/dependency/dependency_parser/index.html#sparknlp.annotator.dependency.dependency_parser.DependencyParserApproach.setConllU