akb89 / word2vec

Re-implementation of Word2Vec using Tensorflow v2 Estimators and Datasets

License: MIT License

Python 100.00%
word2vec tensorflow tensorflow-estimators tensorflow-datasets tensorflow2

word2vec's People

Contributors

akb89, cocophotos, dependabot[bot], minimalparts

word2vec's Issues

ModuleNotFoundError

Exception has occurred: ModuleNotFoundError
No module named 'logging.config'
  File "C:\Users\Olek\word2vec-master\word2vec\main.py", line 10, in <module>
    import logging.config
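
A note on diagnosing this: `logging.config` ships with the Python standard library, so this error often means that some other module named `logging` (for example a local `logging.py`) is being found first on the import path. That is an assumption, since the report does not show the working directory or `sys.path`. The sketch below prints where `logging` resolves from and lists candidate shadowing files; `find_shadowing_files` is a hypothetical helper, not part of word2vec.

```python
import importlib.util
from pathlib import Path


def find_shadowing_files(module_name, search_dirs):
    """Return files in search_dirs that could shadow a stdlib module (hypothetical helper)."""
    hits = []
    for directory in search_dirs:
        directory = Path(directory)
        candidates = (
            directory / f"{module_name}.py",               # e.g. ./logging.py
            directory / module_name / "__init__.py",       # e.g. ./logging/__init__.py
        )
        hits.extend(c for c in candidates if c.is_file())
    return hits


if __name__ == "__main__":
    # Show which file Python would actually import for `logging`.
    spec = importlib.util.find_spec("logging")
    print("logging resolves to:", spec.origin if spec else None)
    # Look for a local file that would take precedence over the stdlib package.
    print("possible shadowing files:", find_shadowing_files("logging", [Path.cwd()]))
```

If the printed path points inside the project rather than the Python installation, renaming that file is a likely fix.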

Unicode error

Hello,

While training word2vec on sample English Wikipedia data, I'm getting the following error:
w2v train --data enwiki.20190120.sample10.0.balanced.txt.7z --outputdir output
2020-03-23 15:06:25,712 - word2vec.main - INFO - Training Tensorflow implementation of Word2Vec
2020-03-23 15:06:25,714 - word2vec.estimators.word2vec - INFO - Building vocabulary from file enwiki.20190120.sample10.0.balanced.txt.7z
2020-03-23 15:06:25,714 - word2vec.estimators.word2vec - INFO - Loading word counts...
Traceback (most recent call last):
  File "/usr/local/bin/w2v", line 11, in <module>
    load_entry_point('tf-word2vec', 'console_scripts', 'w2v')()
  File "/home/giosal/word2vec/word2vec/main.py", line 126, in main
    args.func(args)
  File "/home/giosal/word2vec/word2vec/main.py", line 47, in _train
    w2v.build_vocab(args.datafile, vocab_filepath, args.min_count)
  File "/home/giosal/word2vec/word2vec/estimators/word2vec.py", line 49, in build_vocab
    for line in data_stream:
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 2: invalid start byte
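
One plausible reading of this traceback: the `--data` argument points at a `.7z` archive, and every 7z file starts with the signature bytes `37 7A BC AF 27 1C`. Byte `0xbc` at position 2 is exactly what a UTF-8 decoder would trip over when reading such an archive as text. The sketch below assumes the tool expects plain UTF-8 text (the report does not say whether compressed input is supported); `SEVEN_ZIP_MAGIC` and `looks_like_7z` are hypothetical names used for illustration only.

```python
# The six-byte signature at the start of every 7z archive.
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"


def looks_like_7z(path):
    """Return True if the file at `path` starts with the 7z signature."""
    with open(path, "rb") as stream:
        return stream.read(len(SEVEN_ZIP_MAGIC)) == SEVEN_ZIP_MAGIC


def describe_decode_failure(raw):
    """Try to decode bytes as UTF-8 and report the failing byte offset, if any."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return f"byte 0x{raw[err.start]:02x} at position {err.start}: {err.reason}"
    return "decodes cleanly"


if __name__ == "__main__":
    # Decoding the signature itself reproduces the offset and byte from the report.
    print(describe_decode_failure(SEVEN_ZIP_MAGIC))
```

If `looks_like_7z` returns True for the data file, extracting the archive first and passing the extracted text file to `w2v train` would be the thing to try.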

Setuptools installs tensorflow 2.0

Setuptools installs the wrong version of TensorFlow:

~/unige/M1/METL/tp3/test master*
❯ python3 --version
Python 3.6.7

~/unige/M1/METL/tp3/test master*
❯ python3 -m virtualenv env_tf            

Using base prefix '/usr'
New python executable in /home/<username>/unige/M1/METL/tp3/test/env_tf/bin/python3
Also creating executable in /home/<username>/unige/M1/METL/tp3/test/env_tf/bin/python
Installing setuptools, pip, wheel...
done.

~/unige/M1/METL/tp3/test master*
❯ source env_tf/bin/activate

~/unige/M1/METL/tp3/test master*
env_tf ❯ python --version            
Python 3.6.7

~/unige/M1/METL/tp3/test master*
env_tf ❯ python3 --version
Python 3.6.7

~/unige/M1/METL/tp3/test master*
env_tf ❯ git clone https://github.com/akb89/word2vec
Cloning into 'word2vec'...
remote: Enumerating objects: 2066, done.
remote: Counting objects: 100% (2066/2066), done.
remote: Compressing objects: 100% (792/792), done.
remote: Total 2066 (delta 1273), reused 2042 (delta 1249), pack-reused 0
Receiving objects: 100% (2066/2066), 5.29 MiB | 2.48 MiB/s, done.
Resolving deltas: 100% (1273/1273), done.

~/unige/M1/METL/tp3/test master*
env_tf ❯ cd word2vec 

~/unige/M1/METL/tp3/test/word2vec master
env_tf ❯ /home/<username>/unige/M1/METL/tp3/test/env_tf/bin/python3 --version
Python 3.6.7

~/unige/M1/METL/tp3/test/word2vec master
env_tf ❯ /home/<username>/unige/M1/METL/tp3/test/env_tf/bin/python3 setup.py develop 
running develop
running egg_info
creating tf_word2vec.egg-info
writing tf_word2vec.egg-info/PKG-INFO
writing dependency_links to tf_word2vec.egg-info/dependency_links.txt
writing entry points to tf_word2vec.egg-info/entry_points.txt
writing requirements to tf_word2vec.egg-info/requires.txt
writing top-level names to tf_word2vec.egg-info/top_level.txt
writing manifest file 'tf_word2vec.egg-info/SOURCES.txt'
reading manifest file 'tf_word2vec.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'tf_word2vec.egg-info/SOURCES.txt'
running build_ext
Creating /home/<username>/unige/M1/METL/tp3/test/env_tf/lib/python3.6/site-packages/tf-word2vec.egg-link (link to .)
Adding tf-word2vec 0.1.5 to easy-install.pth file
Installing w2v script to /home/<username>/unige/M1/METL/tp3/test/env_tf/bin

Installed /home/<username>/unige/M1/METL/tp3/test/word2vec
Processing dependencies for tf-word2vec==0.1.5
Searching for tensorflow>=1.13.1
Reading https://pypi.org/simple/tensorflow/
Downloading https://files.pythonhosted.org/packages/29/39/f99185d39131b8333afcfe1dcdb0629c2ffc4ecfb0e4c14ca210d620e56c/tensorflow-2.0.0a0-cp36-cp36m-manylinux1_x86_64.whl#sha256=3eecb1412df267336d4a9c611e65f8dad98f01be7ba9b7ff2b6601b200e166e5
Best match: tensorflow 2.0.0a0
Processing tensorflow-2.0.0a0-cp36-cp36m-manylinux1_x86_64.whl
Installing tensorflow-2.0.0a0-cp36-cp36m-manylinux1_x86_64.whl to /home/<username>/unige/M1/METL/tp3/test/env_tf/lib/python3.6/site-packages
writing requirements to /home/<username>/unige/M1/METL/tp3/test/env_tf/lib/python3.6/site-packages/tensorflow-2.0.0a0-py3.6-linux-x86_64.egg/EGG-INFO/requires.txt
Adding tensorflow 2.0.0a0 to easy-install.pth file
Installing freeze_graph script to /home/<username>/unige/M1/METL/tp3/test/env_tf/bin
Installing saved_model_cli script to /home/<username>/unige/M1/METL/tp3/test/env_tf/bin
Installing tensorboard script to /home/<username>/unige/M1/METL/tp3/test/env_tf/bin
Installing tf_upgrade_v2 script to /home/<username>/unige/M1/METL/tp3/test/env_tf/bin
Installing tflite_convert script to /home/<username>/unige/M1/METL/tp3/test/env_tf/bin
Installing toco script to /home/<username>/unige/M1/METL/tp3/test/env_tf/bin
Installing toco_from_protos script to /home/<username>/unige/M1/METL/tp3/test/env_tf/bin

Installed /home/<username>/unige/M1/METL/tp3/test/env_tf/lib/python3.6/site-packages/tensorflow-2.0.0a0-py3.6-linux-x86_64.egg
Searching for pyyaml>=5.1
Reading https://pypi.org/simple/pyyaml/
Downloading https://files.pythonhosted.org/packages/9f/2c/9417b5c774792634834e730932745bc09a7d36754ca00acf1ccd1ac2594d/PyYAML-5.1.tar.gz#sha256=436bc774ecf7c103814098159fbb84c2715d25980175292c648f2da143909f95
Best match: PyYAML 5.1
Processing PyYAML-5.1.tar.gz
Writing /tmp/easy_install-w68v23u5/PyYAML-5.1/setup.cfg
Running PyYAML-5.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-w68v23u5/PyYAML-5.1/egg-dist-tmp-ki5acp6g
In file included from ext/_yaml.c:591:0:
ext/_yaml.h:2:10: fatal error: yaml.h: No such file or directory
 #include <yaml.h>
          ^~~~~~~~
compilation terminated.
Error compiling module, falling back to pure Python
zip_safe flag not set; analyzing archive contents...
Moving PyYAML-5.1-py3.6-linux-x86_64.egg to /home/<username>/unige/M1/METL/tp3/test/env_tf/lib/python3.6/site-packages
Adding PyYAML 5.1 to easy-install.pth file

I then created a fresh virtual environment, and the installation appears to succeed once tensorflow==1.13.1 is pinned in setup.py. The minimal command then works.
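
For reference, the log shows the open-ended requirement `tensorflow>=1.13.1` resolving to `Best match: tensorflow 2.0.0a0`, because a `>=` bound has no upper limit. A minimal sketch of the pin described above follows. The layout of the project's actual setup.py is not shown in this report, so the `INSTALL_REQUIRES` name and the commented `setup()` call are assumptions; only the two requirement names and versions come from the log.

```python
# Hypothetical excerpt of a setup.py; the variable name is illustrative only.
INSTALL_REQUIRES = [
    # Exact pin: the resolver can no longer pick tensorflow 2.0.0a0.
    "tensorflow==1.13.1",
    # Unchanged from the requirement seen in the install log.
    "pyyaml>=5.1",
]

# In setup.py this list would be passed on as:
#   from setuptools import setup
#   setup(name="tf-word2vec", install_requires=INSTALL_REQUIRES)
```

An exact pin is the most conservative choice; it trades automatic bug-fix upgrades for a reproducible install.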

Enhancement - doc2vec

Hello,

I have been trying to convert a simple doc2vec example to use the TensorFlow data pipeline, and I have been struggling with the window sequence generation. I stumbled upon your project hoping for some inspiration. Unfortunately, my skill level in Python/TF is at least an order of magnitude below yours, so I was awed but could not take away enough to solve my issue.

Is there any interest in adding a doc2vec train_mode?
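
For readers with the same question, here is a minimal pure-Python sketch of window generation for a doc2vec-style (distributed memory) setup, where each example pairs a document id and the surrounding context words with a target word. This is not how the word2vec project implements its pipeline; `window_examples` and its parameters are hypothetical.

```python
def window_examples(doc_id, tokens, window_size):
    """Yield (doc_id, context_tokens, target_token) for every position in `tokens`.

    The context is up to `window_size` tokens on each side of the target,
    so it is shorter near the start and end of the document.
    """
    for position, target in enumerate(tokens):
        left = tokens[max(0, position - window_size):position]
        right = tokens[position + 1:position + 1 + window_size]
        yield doc_id, left + right, target


if __name__ == "__main__":
    for example in window_examples(0, ["the", "cat", "sat"], window_size=1):
        print(example)
```

A generator like this could in principle be wrapped with `tf.data.Dataset.from_generator`, although the variable-length contexts would then need padding or a fixed-size encoding first.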
