Comments (3)
I guess the file named vec500flickr30m.tar.gz (3.0G) has not been downloaded completely.
from dual_encoding.
Hello.
I have the exact same problem.
First I got this encoding problem when trying to read the id.txt file
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/dual_encoding-master/venv/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 1277060: invalid continuation byte
because my PC uses UTF-8 as the default encoding. I tried ISO-8859-1 instead by changing this line of __init__
in basic/bigfile.py to:
self.names = open(id_file, encoding='ISO-8859-1').read().strip().split()
and I could read the file, but now len(self.names) = 1746908
instead of the 1743364 reported in shape.txt, so the encoding I chose must be wrong.
Any idea which encoding I should use to read id.txt?
Update: I tried the files from Google Drive and from http://lixirong.net/data/w2vv-tmm2018/word2vec.tar.gz, but the problem persists with both.
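A possible explanation for the larger count, not confirmed anywhere in this thread: in Python 3, str.split() with no arguments splits on every Unicode whitespace character, and ISO-8859-1 decodes bytes such as 0xa0 (no-break space) and 0x85 (next line) into characters that count as whitespace. Splitting the raw bytes, as Python 2 does, only breaks on ASCII whitespace. A minimal sketch with made-up sample bytes (not taken from the real id.txt):

```python
# Hypothetical sample bytes, chosen only to illustrate the effect.
# 0xa0 is a no-break space and 0x85 a "next line" control in ISO-8859-1.
sample = b"caf\xe9 na\xa0ve ab\x85cd\n"

# Python 2 style: split the raw bytes on ASCII whitespace only.
byte_tokens = sample.strip().split()

# Python 3 text style: decode first, then split on any Unicode whitespace.
text_tokens = sample.decode("ISO-8859-1").strip().split()

print(len(byte_tokens), len(text_tokens))  # prints: 3 5
```

If id.txt contains such bytes inside some ids, decoding before splitting would cut those ids into extra pieces, which would fit a count above the one in shape.txt.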
from dual_encoding.
Found the solution: the problem is that I was running the code in Python 3, but "id.txt" was written with Python 2.7, which handles the file's contents a bit differently from Python 3.
The solution is either to run the code with Python 2.7, or:
1.- With Python 2.7, open the file "id.txt" and get the list of words with .strip().split()
names = open("id.txt").read().strip().split()
2.- Save the list as JSON with the option ensure_ascii=False,
like this:
json.dump(names, open("id.json", "w"), ensure_ascii=False)
3.- Run the BigFile code with Python 3 (with json imported and id_file pointing to the new "id.json"), replacing
self.names = open(id_file).read().strip().split()
with
self.names = json.load(open(id_file, "r", encoding='latin-1'))
And done: len(self.names) = 1743364,
as intended, so the list of vectors is read the same way as in the original.
Hope it helps!
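For anyone without a Python 2.7 install, the same result might be reachable from Python 3 alone by splitting the file as bytes and decoding each id afterwards. This is only a sketch under that assumption: read_names is a hypothetical helper, not part of basic/bigfile.py, and it has not been run against the real id.txt here. The self-check uses a made-up temporary file.

```python
import os
import tempfile


def read_names(id_file):
    """Read whitespace-separated ids, splitting on ASCII whitespace only."""
    with open(id_file, "rb") as f:
        # bytes.split() ignores non-ASCII bytes such as 0xa0 and 0x85,
        # which is how Python 2's str.split() treated the file.
        raw_tokens = f.read().strip().split()
    # latin-1 maps every byte to exactly one code point, so decoding cannot fail.
    return [tok.decode("latin-1") for tok in raw_tokens]


# Self-check on a made-up file; the bytes are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "id.txt")
    with open(path, "wb") as f:
        f.write(b"caf\xe9 na\xa0ve ab\x85cd\n")
    names = read_names(path)
    print(len(names))  # prints: 3
```

With the real file, len(read_names("id.txt")) should be compared against the count in shape.txt before relying on it. As with the JSON route above, ids containing non-ASCII bytes come back as latin-1 strings, so any lookup has to use the same decoding.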
from dual_encoding.