
DependencyParserApproach throws "IllegalArgumentException: For input string: "_"" when training with CONLLU dataset (spark-nlp, closed)

Arierref46 commented on May 26, 2024
DependencyParserApproach throws "IllegalArgumentException: For input string: "_"" when training with CONLLU dataset

from spark-nlp.

Comments (5)

maziyarpanahi commented on May 26, 2024

Can you try with more data? For some reason, I can run it with 3 examples from the bosque dataset, but when I add more examples it crashes. Also, could it be related to the data being written in Portuguese?

That's interesting! This might be a bug. There is probably a character or a token it doesn't like. In my opinion it shouldn't crash; it should just skip that row/sentence.

Will assign this for further inspection.
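
One plausible trigger, though nothing in this thread confirms it, is a row whose HEAD column holds `_` rather than an integer. In CoNLL-U, multiword-token lines (IDs such as `2-3`) and empty-node lines (IDs such as `1.1`) carry `_` in HEAD, and Portuguese treebanks use multiword tokens for contractions. A minimal sketch for locating such rows before training follows. `find_non_integer_heads` is a hypothetical helper, not part of Spark NLP, and the sample sentence is made up for illustration, not taken from bosque.

```python
from typing import Iterable, List, Tuple

HEAD_COLUMN = 6  # zero-based index of HEAD in the 10-column CoNLL-U layout


def find_non_integer_heads(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, token_id, head) for rows whose HEAD is not an integer."""
    problems = []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank sentence separators and comment lines such as "# text = ...".
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 10:
            problems.append((line_number, columns[0], "<malformed row>"))
            continue
        head = columns[HEAD_COLUMN]
        if not head.isdecimal():
            problems.append((line_number, columns[0], head))
    return problems


# Illustrative sentence (an assumption, not from the reporter's data):
# "ao" is a multiword token covering word lines 2 and 3.
sample = [
    "# text = Vamos ao mercado.",
    "1\tVamos\tir\tVERB\t_\t_\t0\troot\t_\t_",
    "2-3\tao\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\ta\ta\tADP\t_\t_\t4\tcase\t_\t_",
    "3\to\to\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tmercado\tmercado\tNOUN\t_\t_\t1\tobl\t_\t_",
    "5\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
]
print(find_non_integer_heads(sample))  # [(3, '2-3', '_')]
```

Running the same function over `open(path, encoding="utf-8")` would list every row worth inspecting in a real file.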


danilojsl commented on May 26, 2024

This seems like great news! How can I install this fix?

@Arierref46 you just need to update to the latest version, spark-nlp==5.3.3
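
To check whether the Python environment already has that release, something like the following may help. It is only a sketch: `parse_version` and `installed_spark_nlp_version` are hypothetical helpers, and it inspects only the installed Python package, not the JAR that a Spark session loads.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '5.3.3' into (5, 3, 3), ignoring any non-numeric suffix."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def installed_spark_nlp_version() -> Optional[str]:
    """Return the installed spark-nlp version string, or None if it is absent."""
    try:
        return metadata.version("spark-nlp")
    except metadata.PackageNotFoundError:
        return None


found = installed_spark_nlp_version()
if found is None:
    print("spark-nlp is not installed in this environment")
elif parse_version(found) < parse_version("5.3.3"):
    print(f"spark-nlp {found} is older than 5.3.3; "
          "upgrade with: pip install --upgrade spark-nlp==5.3.3")
else:
    print(f"spark-nlp {found} is at least 5.3.3")
```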


maziyarpanahi commented on May 26, 2024

I share some links here just in case

I am not sure about that data type, but I just tested a file that looks like this:

# sent_id = weblog-juancole.com_juancole_20030911085700_ENG_20030911_085700-0022
# text = It should continue to be defanged.
1	It	it	PRON	PRP	Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs	3	nsubj	3:nsubj|6:nsubj:xsubj	_
2	should	should	AUX	MD	VerbForm=Fin	3	aux	3:aux	_
3	continue	continue	VERB	VB	VerbForm=Inf	0	root	0:root	_
4	to	to	PART	TO	_	6	mark	6:mark	_
5	be	be	AUX	VB	VerbForm=Inf	6	aux:pass	6:aux:pass	_
6	defanged	defange	VERB	VBN	Tense=Past|VerbForm=Part|Voice=Pass	3	xcomp	3:xcomp	SpaceAfter=No
7	.	.	PUNCT	.	_	3	punct	3:punct	_

# sent_id = weblog-blogspot.com_healingiraq_20040409053012_ENG_20040409_053012-0015
# text = So what happened?
1	So	so	ADV	RB	_	3	advmod	3:advmod	_
2	what	what	PRON	WP	PronType=Int	3	nsubj	3:nsubj	_
3	happened	happen	VERB	VBD	Mood=Ind|Tense=Past|VerbForm=Fin	0	root	0:root	SpaceAfter=No
4	?	?	PUNCT	.	_	3	punct	3:punct	_

# sent_id = weblog-typepad.com_ripples_20040407125600_ENG_20040407_125600-0055
# text = That too was stopped.
1	That	that	PRON	DT	Number=Sing|PronType=Dem	4	nsubj:pass	4:nsubj:pass	_
2	too	too	ADV	RB	_	4	advmod	4:advmod	_
3	was	be	AUX	VBD	Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin	4	aux:pass	4:aux:pass	_
4	stopped	stop	VERB	VBN	Tense=Past|VerbForm=Part|Voice=Pass	0	root	0:root	SpaceAfter=No
5	.	.	PUNCT	.	_	4	punct	4:punct	_
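
Each token line in the sample above follows the standard 10-column, tab-separated CoNLL-U layout. A small sketch of how one row maps onto the named fields follows; `parse_conllu_row` is a hypothetical helper written for illustration, not a Spark NLP function.

```python
from typing import Dict

# Column names as defined by the CoNLL-U format, in order.
CONLLU_FIELDS = [
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
]


def parse_conllu_row(row: str) -> Dict[str, str]:
    """Split one tab-separated token line into a field-name -> value mapping."""
    values = row.rstrip("\n").split("\t")
    if len(values) != len(CONLLU_FIELDS):
        raise ValueError(
            f"expected {len(CONLLU_FIELDS)} tab-separated columns, got {len(values)}"
        )
    return dict(zip(CONLLU_FIELDS, values))


# Third token of the first sentence in the sample.
row = "3\tcontinue\tcontinue\tVERB\tVB\tVerbForm=Inf\t0\troot\t0:root\t_"
token = parse_conllu_row(row)
print(token["HEAD"], token["DEPREL"])  # 0 root
```

In this sample every HEAD value is an integer, while `_` appears only in columns such as FEATS and MISC, where it marks an empty value.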


Arierref46 commented on May 26, 2024

Can you try with more data? For some reason, I can run it with 3 examples from the bosque dataset, but when I add more examples it crashes. Also, could it be related to the data being written in Portuguese?


Arierref46 commented on May 26, 2024

This seems like great news! How can I install this fix?
