
Comments (6)

kleag commented on July 22, 2024

You're right, the output is completely wrong, but there is not enough information to debug it. Could you please try the command-line version (analyzeText) and paste (or attach) here both the text and all the console output?

from lima.

kleag commented on July 22, 2024

As you can see, the results are much better under Linux:

# sent_id = 1
# text = February 23 - A revolt against the government of King Joseph I of Portugal takes place in the city of Oporto.
1	February	February	PROPN	_	NUMBER=SING	_	_	_	NE=DateTime.DATE|Pos=1|Len=8
2	23	23	NUM	_	_	_	_	_	NE=DateTime.DATE|Pos=10|Len=2
3	-	-	COLON	_	_	3	Dummy	_	Pos=13|Len=1
4	A	a	DET	_	_	4	det	_	Pos=15|Len=1
5	revolt	revolt	NOUN	_	NUMBER=SING	13	SUJ_V	_	Pos=17|Len=6
6	against	against	ADP	_	_	7	PREPSUB	_	Pos=24|Len=7
7	the	the	DET	_	_	7	det	_	Pos=32|Len=3
8	government	government	NOUN	_	NUMBER=SING	4	COMPDUNOM	_	Pos=36|Len=10
9	of	of	ADP	_	_	10	PREPSUB	_	Pos=47|Len=2
10	King	king	NOUN	_	NUMBER=SING	10	ADJPRENSUB	_	Pos=50|Len=4
11	Joseph	Joseph	PROPN	_	NUMBER=SING	_	_	_	NE=Person.PERSON|Pos=55|Len=6
12	I	I	PRON	_	_	_	_	_	NE=Person.PERSON|Pos=62|Len=1
13-14	joseph	_	_	_	_	_	_	_	_
13	of	of	ADP	_	_	12	PREPSUB	_	Pos=64|Len=2
14	Portugal	Portugal	PROPN	_	NUMBER=SING	_	_	_	NE=Location.LOCATION|Pos=67|Len=8
15	takes	take	VERB	_	_	0	_	_	Pos=76|Len=5
16	place	place	NOUN	_	NUMBER=SING	13	COD_V	_	Pos=82|Len=5
17	in	in	ADP	_	_	17	PREPSUB	_	Pos=88|Len=2
18	the	the	DET	_	_	17	det	_	Pos=91|Len=3
19	city	city	NOUN	_	NUMBER=SING	14	COMPDUNOM	_	Pos=95|Len=4
20	of	of	ADP	_	_	19	PREPSUB	_	Pos=100|Len=2
21	Oporto	Oporto	PROPN	_	NUMBER=SING	_	_	_	NE=Location.LOCATION|Pos=103|Len=6
22	.	.	SENT	_	_	0	_	_	Pos=109|Len=1

We need more information to understand what is happening under Windows.


ebarbot commented on July 22, 2024

This is what I get. I don't know whether I am supposed to set something to print more logs.

test_josephI_output

H:\test_lima_windows>analyzeText -l eng joseph_I.txt
Analyzing 1/1 (100.00%) 'joseph_I.txt'# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# sent_id = 1
# text = February 23 - A revolt against the government of King Joseph I of Portugal takes place in the city of Oporto.
1 February February PROPN _ NUMBER=SING _ _ _ NE=DateTime.DATE|Pos=1|Len=8
2 23 23 NUM _ _ _ _ _ NE=DateTime.DATE|Pos=10|Len=2
3 - - COMMA _ _ 3 Dummy _ Pos=13|Len=1
4 A A PROPN _ NUMBER=SING 4 ADJPRENSUB _ Pos=15|Len=1
5 revolt revolt NOUN _ NUMBER=SING 5 ADJPRENSUB _ Pos=17|Len=6
6 against against NOUN _ NUMBER=SING 6 ADJPRENSUB _ Pos=24|Len=7
7 the the NOUN _ NUMBER=SING 7 ADJPRENSUB _ Pos=32|Len=3
8 government government NOUN _ NUMBER=SING 8 ADJPRENSUB _ Pos=36|Len=10
9 of of NOUN _ NUMBER=SING 10 ADJPRENSUB _ Pos=47|Len=2
10 King King PROPN _ NUMBER=SING 10 SUBSUBJUX _ Pos=50|Len=4
11 Joseph Joseph PROPN _ NUMBER=SING _ _ _ NE=Person.PERSON|Pos=55|Len=6
12 I i NUM _ NUMBER=SING _ _ _ NE=Person.PERSON|Pos=62|Len=1
13 of of NOUN _ NUMBER=SING 12 ADJPRENSUB _ Pos=64|Len=2
14 Portugal Portugal PROPN _ NUMBER=SING _ _ _ NE=Location.LOCATION|Pos=67|Len=8
15 takes takes NOUN _ NUMBER=SING 14 ADJPRENSUB _ Pos=76|Len=5
16 place place NOUN _ NUMBER=SING 15 ADJPRENSUB _ Pos=82|Len=5
17 in in NOUN _ NUMBER=SING 16 ADJPRENSUB _ Pos=88|Len=2
18 the the NOUN _ NUMBER=SING 17 ADJPRENSUB _ Pos=91|Len=3
19 city city NOUN _ NUMBER=SING 18 ADJPRENSUB _ Pos=95|Len=4
20 of of NOUN _ NUMBER=SING 19 ADJPRENSUB _ Pos=100|Len=2
21 Oporto Oporto PROPN _ NUMBER=SING _ _ _ NE=Location.LOCATION|Pos=103|Len=6
22 . . SENT _ _ 0 _ _ Pos=109|Len=1


kleag commented on July 22, 2024

@victorbocharov, you are the last developer to have ensured a successful Windows build. Have you noticed problems like this?


victorbocharov commented on July 22, 2024

No, I haven't. Moreover, I don't have a Windows computer, so I won't be able to reproduce this. I can only offer a few guesses:

  • PoS tags are assigned according to some tokenization rules: starts with a capital letter => PROPN, digits => NUM, ...
  • lemmatization doesn't work (takes -> takes)
  • NER works

It looks like the English dictionary isn't used or is empty. @kleag: How can we check this?
@ebarbot: Is the pipeline "main" unchanged?
@ebarbot: How old is your version of LIMA?
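
The symptoms listed above can be checked mechanically on a pasted output. This is only a minimal sketch, assuming the output is available as whitespace-separated CoNLL-U rows; `FUNCTION_WORDS`, `parse_rows` and `suspicious_tokens` are hypothetical names chosen for illustration and are not part of LIMA. It does not inspect the dictionary itself; it only flags closed-class words that were tagged as nouns, which is what one would expect if the dictionary lookup returned nothing.

```python
# Heuristic check, not a LIMA API: flag closed-class English words that were
# tagged NOUN or PROPN in CoNLL-U-style output.

# Hypothetical, deliberately tiny list of function words seen in the test sentence.
FUNCTION_WORDS = {"a", "the", "of", "in", "against"}


def parse_rows(conllu_text):
    """Return (form, lemma, upos) tuples from whitespace-separated CoNLL-U rows."""
    rows = []
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines such as "# sent_id = 1"
        cols = line.split()
        if len(cols) < 4 or not cols[0].isdigit():
            continue  # progress messages and multiword ranges such as "13-14"
        rows.append((cols[1], cols[2], cols[3]))
    return rows


def suspicious_tokens(conllu_text):
    """Return (form, upos) pairs for function words tagged NOUN or PROPN."""
    return [
        (form, upos)
        for form, _lemma, upos in parse_rows(conllu_text)
        if form.lower() in FUNCTION_WORDS and upos in {"NOUN", "PROPN"}
    ]


# A few rows copied from the Windows output earlier in this thread.
WINDOWS_SAMPLE = """\
# sent_id = 1
4 A A PROPN _ NUMBER=SING 4 ADJPRENSUB _ Pos=15|Len=1
5 revolt revolt NOUN _ NUMBER=SING 5 ADJPRENSUB _ Pos=17|Len=6
7 the the NOUN _ NUMBER=SING 7 ADJPRENSUB _ Pos=32|Len=3
11 Joseph Joseph PROPN _ NUMBER=SING _ _ _ NE=Person.PERSON|Pos=55|Len=6
"""

print(suspicious_tokens(WINDOWS_SAMPLE))
```

On these rows the check reports "A" and "the". The same rows from the Linux output, where they are tagged DET, would produce an empty list.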


ebarbot commented on July 22, 2024

I downloaded version 3.0.0.20210912222206-0c3404de. If I explicitly write analyzeText -l eng -p main joseph_I.txt, I get the same result.

