DependencyParserApproach throws "IllegalArgumentException: For input string: "_"" when training with CONLLU dataset (spark-nlp issue, closed)

Arierref46 commented on May 26, 2024

DependencyParserApproach throws "IllegalArgumentException: For input string: "_"" when training with CONLLU dataset

from spark-nlp.

Comments (5)

maziyarpanahi commented on May 26, 2024

> Can you try with more data? For some reason, I can run with 3 examples from the Bosque dataset, but when I add more examples it crashes. Also, could it be related to the data being written in Portuguese?

That's interesting! This might be a bug. There is probably a character or token it doesn't like. In my opinion it shouldn't crash; it should just skip that row/sentence.

Will assign this for further inspection.
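
One way to look for the "character or token it doesn't like" is to scan the training file for rows whose HEAD column is not an integer. This is only a diagnostic sketch: it assumes the `"_"` in the error message comes from a numeric column such as HEAD, which the thread does not confirm. The function name `find_non_integer_heads` and the `1-2` row in the sample are hypothetical, made up for illustration.

```python
from typing import Iterable, List, Tuple


def find_non_integer_heads(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, ID, HEAD) for token rows whose HEAD is not an integer."""
    offenders = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank sentence separators and "# ..." comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 10:
            # CoNLL-U rows have exactly 10 tab-separated columns.
            offenders.append((line_number, fields[0], "<malformed row>"))
            continue
        head = fields[6]  # HEAD is the 7th column
        if not (head.isascii() and head.isdigit()):
            offenders.append((line_number, fields[0], head))
    return offenders


sample = [
    "# text = So what happened?",
    "1\tSo\tso\tADV\tRB\t_\t3\tadvmod\t3:advmod\t_",
    "1-2\tdo\t_\t_\t_\t_\t_\t_\t_\t_",  # hypothetical multiword-token row
]
print(find_non_integer_heads(sample))  # [(3, '1-2', '_')]
```

To check a real file, pass an open file handle instead of the list, for example `find_non_integer_heads(open(path, encoding="utf-8"))`, where `path` is your own CoNLL-U file.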

danilojsl commented on May 26, 2024

> This seems like great news! How can I install this fix?

@Arierref46, you just need to update to the latest version: spark-nlp==5.3.3
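
To confirm that an environment already has that release, a small check along these lines may help. It is a sketch using only the standard library; the helper names `version_tuple` and `has_fix` are hypothetical, and the only fact taken from this thread is the version number 5.3.3.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = "5.3.3"  # version named in this thread


def version_tuple(text: str) -> tuple:
    """Turn '5.3.3' into (5, 3, 3); non-numeric suffixes are ignored."""
    parts = []
    for part in text.split(".")[:3]:
        match = re.match(r"\d+", part)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def has_fix(installed: str, minimum: str = FIXED_IN) -> bool:
    """True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = version("spark-nlp")
    status = "includes the fix" if has_fix(installed) else "needs upgrading"
    print(installed, status)
except PackageNotFoundError:
    print("spark-nlp is not installed in this environment")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "5.10.0" as older than "5.3.3".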

maziyarpanahi commented on May 26, 2024

I share some links here just in case

I am not sure about that data type, but I just tested a file that is like this:

# sent_id = weblog-juancole.com_juancole_20030911085700_ENG_20030911_085700-0022
# text = It should continue to be defanged.
1	It	it	PRON	PRP	Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs	3	nsubj	3:nsubj|6:nsubj:xsubj	_
2	should	should	AUX	MD	VerbForm=Fin	3	aux	3:aux	_
3	continue	continue	VERB	VB	VerbForm=Inf	0	root	0:root	_
4	to	to	PART	TO	_	6	mark	6:mark	_
5	be	be	AUX	VB	VerbForm=Inf	6	aux:pass	6:aux:pass	_
6	defanged	defange	VERB	VBN	Tense=Past|VerbForm=Part|Voice=Pass	3	xcomp	3:xcomp	SpaceAfter=No
7	.	.	PUNCT	.	_	3	punct	3:punct	_

# sent_id = weblog-blogspot.com_healingiraq_20040409053012_ENG_20040409_053012-0015
# text = So what happened?
1	So	so	ADV	RB	_	3	advmod	3:advmod	_
2	what	what	PRON	WP	PronType=Int	3	nsubj	3:nsubj	_
3	happened	happen	VERB	VBD	Mood=Ind|Tense=Past|VerbForm=Fin	0	root	0:root	SpaceAfter=No
4	?	?	PUNCT	.	_	3	punct	3:punct	_

# sent_id = weblog-typepad.com_ripples_20040407125600_ENG_20040407_125600-0055
# text = That too was stopped.
1	That	that	PRON	DT	Number=Sing|PronType=Dem	4	nsubj:pass	4:nsubj:pass	_
2	too	too	ADV	RB	_	4	advmod	4:advmod	_
3	was	be	AUX	VBD	Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin	4	aux:pass	4:aux:pass	_
4	stopped	stop	VERB	VBN	Tense=Past|VerbForm=Part|Voice=Pass	0	root	0:root	SpaceAfter=No
5	.	.	PUNCT	.	_	4	punct	4:punct	_
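
In the sample above, every row has a plain integer ID and HEAD. CoNLL-U files can also contain multiword-token rows (IDs such as `1-2`) and empty-node rows (IDs such as `8.1`), whose HEAD is `_`. If such rows turn out to be what triggers the exception, which is an assumption and not something established in this thread, a workaround could be to drop them before training. The function name `keep_plain_token_rows` and the Portuguese sample rows are hypothetical.

```python
from typing import Iterable, List


def keep_plain_token_rows(lines: Iterable[str]) -> List[str]:
    """Drop rows whose ID is not a plain integer (e.g. '1-2' or '8.1')."""
    kept = []
    for line in lines:
        stripped = line.rstrip("\n")
        # Comments and blank sentence separators are always kept.
        if stripped and not stripped.startswith("#"):
            token_id = stripped.split("\t", 1)[0]
            if not (token_id.isascii() and token_id.isdigit()):
                continue  # multiword-token range or empty node
        kept.append(stripped)
    return kept


rows = [
    "# text = do livro",                       # hypothetical sentence
    "1-2\tdo\t_\t_\t_\t_\t_\t_\t_\t_",         # multiword token: de + o
    "1\tde\tde\tADP\t_\t_\t3\tcase\t_\t_",
    "2\to\to\tDET\t_\t_\t3\tdet\t_\t_",
    "3\tlivro\tlivro\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
]
for row in keep_plain_token_rows(rows):
    print(row)
```

Dropping the range rows loses the surface form of contractions (here `do`), so this only makes sense when the syntactic words and their heads are all that training needs.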

Arierref46 commented on May 26, 2024

Can you try with more data? For some reason, I can run with 3 examples from the Bosque dataset, but when I add more examples it crashes. Also, could it be related to the data being written in Portuguese?

Arierref46 commented on May 26, 2024

This seems like great news! How can I install this fix?
