Hello Jeff Thompson, I'm trying to run your in python but I'm

Issues with new version of Gensim about word2vecandtsne HOT 6 CLOSED

arthurbr140896 commented on May 26, 2024

Issues with new version of Gensim

from word2vecandtsne.

Comments (6)

jeffThompson commented on May 26, 2024

Hmm. What line is the error showing up on? The code is intended for Python 2.7, so Python 3 might be the culprit.

arthurbr140896 commented on May 26, 2024

Thank you so much for your response, it is much appreciated.

So I have installed Python 2.7.0.
It seems to have fixed the Tuple Parameter Error that I was getting, but I can't be sure yet.
I have run the code again and this is the traceback that I am getting:

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\gensim\utils.py", line 860
    warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
UserWarning: detected Windows; aliasing chunkize to chunkize_serial
building vocabulary...
training model...

Traceback (most recent call last):
  File "C:\Users\arthu\Desktop\Arthur\Documents\UCI\Fall 2017\Independent Study with LaFarge\Slime Mold\Word2VecAndTsne-master (1)\Word2VecAndTsne-master\TrainModel.py", line 38, in <module>
    model.train(sentences)
  File "C:\Python27\lib\site-packages\gensim\models\word2vec.py", line 940, in train
    "You must specify either total_examples or total_words, for proper alpha and progress calculations. "
ValueError: You must specify either total_examples or total_words, for proper alpha and progress calculations. The usual value is total_examples=model.corpus_count.

Again thank you so much for your help.

Arthur Rodrigues

jeffThompson commented on May 26, 2024

You can just ignore the warning. See this Stack Overflow answer (https://stackoverflow.com/a/41951301) for how to suppress it, if you want to.
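
For reference, one way to silence just this warning is the standard-library `warnings` module. This is a minimal sketch and not necessarily what the linked post recommends. The helper name `silence_gensim_windows_warning` is made up for illustration, and the filter has to be installed before `gensim` is imported, since the warning is raised at import time.

```python
import warnings


def silence_gensim_windows_warning():
    """Ignore only the 'aliasing chunkize' UserWarning shown in the traceback.

    Hypothetical helper: call it before `import gensim`.
    """
    warnings.filterwarnings(
        "ignore",
        # Matched as a regular expression against the start of the message.
        message="detected Windows; aliasing chunkize to chunkize_serial",
        category=UserWarning,
    )
```

Other warnings still show up as usual, because the filter matches on the message text and does not switch off `UserWarning` as a whole.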

The error is, I think, due to a new version of Gensim. It seems maybe it needs two extra arguments. Can you try modifying this section of the TrainModel.py file to be like this:

# train model
print 'training model...'
if skip_gram:
	model.train(sentences, total_examples=self.corpus_count, epochs=self.iter, sg=1)
else:
	model.train(sentences, total_examples=self.corpus_count, epochs=self.iter)

And let me know if it works?

arthurbr140896 commented on May 26, 2024

Alright I tested it out and this is the traceback that I got:

Traceback (most recent call last):
  File "C:\Users\arthu\Desktop\Arthur\Documents\UCI\Fall 2017\Independent Study with LaFarge\Slime Mold\Word2VecAndTsne-master (1)\Word2VecAndTsne-master\TrainModel.py", line 37, in <module>
    model.train(sentences, total_examples=self.corpus_count, epochs=self.iter)
NameError: name 'self' is not defined

Thank you,
Arthur Rodrigues

arthurbr140896 commented on May 26, 2024

I think I was able to fix the issue:

print 'training model...'
if skip_gram:
	model.train(sentences, total_examples=model.corpus_count, epochs=model.iter)
else:
	model.train(sentences, total_examples=model.corpus_count, epochs=model.iter)

However now I have to run the TrainModel script with the new version of Gensim and the rest of
the scripts with the older version of Gensim.
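
A rough sketch of how the training call could be wrapped so it tolerates the attribute name changing between releases. Assumptions, not confirmed in this thread: the model exposes the epoch count as either `model.epochs` (newer releases) or `model.iter` (older ones), and the installed `train()` accepts `total_examples` and `epochs`. The helper name `train_compat` is made up for illustration.

```python
def train_compat(model, sentences):
    """Train with the arguments that newer Gensim releases ask for.

    Hypothetical helper. Prefers `model.epochs` and falls back to
    `model.iter` when `epochs` is missing (assumed older releases).
    Returns the epoch count that was used.
    """
    epochs = getattr(model, "epochs", None)
    if epochs is None:
        epochs = model.iter
    model.train(sentences, total_examples=model.corpus_count, epochs=epochs)
    return epochs
```

In TrainModel.py this would replace the `model.train(...)` line, e.g. `train_compat(model, sentences)`. As far as I know, skip-gram is selected with `sg=1` when the `Word2Vec` model is constructed and not in `train()`, which would explain why the two branches above end up identical.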

Also, a question I have is how do I get it to take the CSV file and output it as a PNG? In the blog post
it tells us to use "The included Processing script", but I'm not sure which one that is.

Again thank you so much.

But also just another quick question: in the models I'm getting, words with similar meanings aren't really being clustered together. That is probably because I haven't trained the algorithm on a large enough dataset, right? (I've trained it only on Michel Foucault's The Order of Things.)

Thank you

Arthur Rodrigues

jeffThompson commented on May 26, 2024

Glad you fixed it!

Re older version:
I'll have to take a look when I have some free time. For now, if you have the older version installed, you could use that through the whole process. Or, if you find fixes for any other problems with the new version, let me know and I'll update everything.

Re image file:
That's the Processing sketch in the VisualizeSpace folder. It will create a visual representation of your vector space.

Re proximity:
Yes, you probably need a larger dataset. A single book probably isn't enough to get accurate results for every word, but it should still produce clear clusters. If it doesn't, the cause is probably your t-SNE variables, so try tweaking them for better results.
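
One way to make that tweaking systematic is to generate a small grid of settings and compare the resulting plots. This is only a sketch: which t-SNE implementation the scripts use is not stated in this thread, the parameter names `perplexity` and `learning_rate` follow scikit-learn's `TSNE`, and the helper name and default values are made up for illustration.

```python
from itertools import product


def tsne_settings(perplexities=(5, 30, 50), learning_rates=(10.0, 200.0, 1000.0)):
    """Yield one keyword-argument dict per combination of t-SNE settings.

    Hypothetical helper; the defaults are arbitrary starting points.
    """
    for perplexity, learning_rate in product(perplexities, learning_rates):
        yield {"perplexity": perplexity, "learning_rate": learning_rate}
```

Each dict could then be passed to the t-SNE call, for example `TSNE(n_components=2, **params)` if scikit-learn is what is installed, saving one output per combination so the clusterings can be compared side by side.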
