Comments (14)

andresusanopinto commented on August 18, 2024

Closed due to inactivity.


afaq-ahmad commented on August 18, 2024

At the time of exporting the inference graph, I am also facing the same problem.


arnoegw commented on August 18, 2024

Hmm, strange. Please help me understand whether this is an issue with the cached result of downloading and decompressing, or an issue with its subsequent use. According to your logs, the module 'https://tfhub.dev/google/nnlm-en-dim128/1' has been cached in /tmp/tfhub_modules/32f2b2259e1cc8ca58c876921748361283e73997.

Please open a shell session, cd into that cache directory, and see if the following commands give you the same file sizes and checksums:

$ ls -gGR
.:
total 24
drwxr-x--- 2  4096 Mar 19 10:02 assets/
-rw-r----- 1 11093 Mar 19 10:02 saved_model.pb
-rw-r----- 1     2 Mar 19 10:02 tfhub_module.pb
drwxr-x--- 2  4096 Mar 19 10:02 variables/

./assets:
total 8252
-rw-r----- 1 8449127 Mar 19 10:02 tokens.txt

./variables:
total 486896
-rw-r----- 1 498570752 Mar 19 10:02 variables.data-00000-of-00001
-rw-r----- 1       141 Mar 19 10:02 variables.index

$ find . -type f | xargs md5sum
6fff604a2ca98db081d03a93b5c3eb21  ./saved_model.pb
67fbacf272d157a66875d14fdd6bc0cb  ./variables/variables.index
86c053ab977204d7ac3e70ed8169133a  ./variables/variables.data-00000-of-00001
829ca3b0f994475ae7786114c4a7c526  ./assets/tokens.txt
6d13541d2cedd620921d41aece9c01d3  ./tfhub_module.pb

If you get different checksums (or file sizes), there was a download problem. In that case, please delete the cache directory, make sure you have enough free disk space, and re-run your program (which should re-download the module).

If you get the same checksums but the original error message persists, I need to dig deeper.
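
For anyone without md5sum handy, here is a minimal cross-platform Python sketch of the same check. The cache path is the one from the logs above; adjust as needed.

# Walk the TF-Hub cache directory and print an MD5 checksum, size, and
# relative path for every file, roughly equivalent to `ls -gGR` plus
# `find . -type f | xargs md5sum`.
import hashlib
import os

CACHE_DIR = "/tmp/tfhub_modules/32f2b2259e1cc8ca58c876921748361283e73997"

for root, _, files in os.walk(CACHE_DIR):
    for name in files:
        path = os.path.join(root, name)
        md5 = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                md5.update(chunk)
        print(md5.hexdigest(), os.path.getsize(path),
              os.path.relpath(path, CACHE_DIR))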


tokestermw commented on August 18, 2024

Hi, so here are the results of the commands. Looks like the checksum for ./variables/variables.data-00000-of-00001 is different.

motoki@here:/tmp/tfhub_modules/32f2b2259e1cc8ca58c876921748361283e73997$ ls -gGR
.:
total 24
drwxr-xr-x 2  4096 Apr 30 14:22 assets
-rw-rw-r-- 1 11093 Apr 30 14:22 saved_model.pb
-rw-rw-r-- 1     2 Apr 30 14:22 tfhub_module.pb
drwxr-xr-x 2  4096 Apr 30 14:22 variables

./assets:
total 8252
-rw-rw-r-- 1 8449127 Apr 30 14:22 tokens.txt

./variables:
total 486892
-rw-rw-r-- 1 498570752 Apr 30 14:23 variables.data-00000-of-00001
-rw-rw-r-- 1       141 Apr 30 14:22 variables.index
motoki@here:/tmp/tfhub_modules/32f2b2259e1cc8ca58c876921748361283e73997$ find . -type f | xargs md5sum
67fbacf272d157a66875d14fdd6bc0cb  ./variables/variables.index
4d9090c03df283800f8b5590d0175847  ./variables/variables.data-00000-of-00001
829ca3b0f994475ae7786114c4a7c526  ./assets/tokens.txt
6d13541d2cedd620921d41aece9c01d3  ./tfhub_module.pb
6fff604a2ca98db081d03a93b5c3eb21  ./saved_model.pb

I did try deleting and re-downloading, and I got yet another checksum for ./variables/variables.data-00000-of-00001.

motoki@here:/tmp/tfhub_modules/32f2b2259e1cc8ca58c876921748361283e73997$ find . -type f | xargs md5sum
67fbacf272d157a66875d14fdd6bc0cb  ./variables/variables.index
30ed10f1887c0d361fb518254198b042  ./variables/variables.data-00000-of-00001
829ca3b0f994475ae7786114c4a7c526  ./assets/tokens.txt
6d13541d2cedd620921d41aece9c01d3  ./tfhub_module.pb
6fff604a2ca98db081d03a93b5c3eb21  ./saved_model.pb

What's strange is that I remember loading the model worked the first time I downloaded it. I was playing with changing cache dirs and deleting the cache, and now the model doesn't load regardless of the cache directory.


tokestermw commented on August 18, 2024

Interestingly, I ran it in a fresh Python 3.5 venv (as opposed to 3.6) and I am getting segfaults.

import tensorflow as tf
import tensorflow_hub as hub

import os
# default is '/tmp/tfhub_modules/'
# TFHUB_CACHE_DIR = '/tmp/my_module_cache'
# os.environ['TFHUB_CACHE_DIR'] = TFHUB_CACHE_DIR

with tf.Graph().as_default():
  embed = hub.Module("https://tfhub.dev/google/nnlm-en-dim128/1")
  embeddings = embed(["A long sentence.", "single-word", "http://example.com"])

  with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())

    print(sess.run(embeddings))
Output:
INFO:tensorflow:Using /tmp/tfhub_modules to cache modules.
INFO:tensorflow:Downloading TF-Hub Module 'https://tfhub.dev/google/nnlm-en-dim128/1'.
INFO:tensorflow:Downloaded TF-Hub Module 'https://tfhub.dev/google/nnlm-en-dim128/1'.
Segmentation fault (core dumped)

Then, I refreshed with Python 3.6 venv and got the same DataLossError as above, and then I refreshed again with Python 3.5 venv and got a DataLossError instead of a segfault. 😅


arnoegw commented on August 18, 2024

Let's focus on the faulty downloads that you have seen repeatedly with Python 3.6 (Ubuntu 16.04). Has anyone else seen these? I've been investigating for a while but did not find anything.

Let me provide some context so that others can better look at it.

The URL passed to hub.Module gets augmented to https://tfhub.dev/google/nnlm-en-dim128/1?tf-hub-format=compressed, which redirects to https://storage.googleapis.com/tensorflow-hub/google/nnlm-en-dim128/1.tar.gz.

https://github.com/tensorflow/hub/blob/r0.1/tensorflow_hub/compressed_module_resolver.py#L101
opens that.

https://github.com/tensorflow/hub/blob/r0.1/tensorflow_hub/resolver.py#L105
iterates over the entries of the tarball (using tarfile's transparent decompression), gets the fileobj for each file in it, makes a matching GFile fileobj for output, and throws the two at shutil.copyfileobj():
https://github.com/python/cpython/blob/3.6/Lib/shutil.py#L76

That seems just about right.

The extracted fileobj knows its size from what is stored in the tarball, not from the end of the stream:
https://github.com/python/cpython/blob/3.6/Lib/tarfile.py#L610
So it's not clear to me right now whether the correct file size you see means much.
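
If you want to poke at this locally, here is a rough sketch of that chain — not the actual library code (which writes through GFile), just the same idea: stream the tarball with transparent decompression and copy each member out via shutil.copyfileobj(). The destination directory is arbitrary.

import os
import shutil
import tarfile
import urllib.request

URL = "https://tfhub.dev/google/nnlm-en-dim128/1?tf-hub-format=compressed"
DEST = "/tmp/manual_module"  # scratch directory, name is arbitrary

with urllib.request.urlopen(URL) as response:  # follows the redirect
    # "r|*" streams the archive with transparent decompression.
    with tarfile.open(fileobj=response, mode="r|*") as tgz:
        for member in tgz:
            if not member.isfile():
                continue
            out_path = os.path.join(DEST, member.name)
            os.makedirs(os.path.dirname(out_path), exist_ok=True)
            # extractfile() reports the size stored in the tar header.
            src = tgz.extractfile(member)
            with open(out_path, "wb") as dst:
                shutil.copyfileobj(src, dst)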

Can you observe problems somewhere along that call chain? Maybe too few bytes downloaded?

Is there any clue in the diff between the variables.data-00000-of-00001 in your cache, and what you get from manually downloading and untarring the module?
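
For example, assuming the sketch above untarred a fresh copy into /tmp/manual_module, this finds the first byte offset at which the cached file diverges from it:

CACHED = ("/tmp/tfhub_modules/32f2b2259e1cc8ca58c876921748361283e73997"
          "/variables/variables.data-00000-of-00001")
FRESH = "/tmp/manual_module/variables/variables.data-00000-of-00001"

with open(CACHED, "rb") as fa, open(FRESH, "rb") as fb:
    offset = 0
    while True:
        a, b = fa.read(1 << 20), fb.read(1 << 20)
        if a != b:
            # Narrow down to the first differing byte within this chunk.
            for i, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    print("first difference at byte offset", offset + i)
                    break
            else:
                print("one file is a prefix of the other, near offset",
                      offset + min(len(a), len(b)))
            break
        if not a:
            print("files are byte-identical")
            break
        offset += len(a)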


gowthamkpr commented on August 18, 2024

@arnoegw Did you find a solution for this issue? Thanks!


arnoegw commented on August 18, 2024

This was never really resolved. But then, no further instances of the problem have been reported, so it is probably very rare. Also, we have no way to reproduce and analyze this, so there seems to be nothing left to do here.


arnoegw commented on August 18, 2024

To clarify, I believe this is different from issue #305 (despite the similar error message), because this issue is about problems after downloading a Hub module, while that other one appears to be about TensorFlow's ordinary checkpointing during training.


MISSIVA20 commented on August 18, 2024

I got this error when I tried to export the SSD MobileNet V2 model. Has anyone been able to solve it?
DataLossError (see above for traceback): Checksum does not match: stored 1858275978 vs. calculated on the restored bytes 2637572513
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]


afaq-ahmad commented on August 18, 2024

At the time of exporting the inference graph, I am also facing the same problem.

When I used a previous checkpoint, it successfully converted to a frozen graph.



serser commented on August 18, 2024

A typical mistake is saving the same model from multiple processes, which easily corrupts the checkpoint.
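
For illustration, here is a minimal TF1-style sketch of the usual guard; is_chief is a hypothetical flag (e.g. task_index == 0 in a distributed cluster), and the point is that only one designated process ever calls saver.save().

import tensorflow as tf

is_chief = True  # hypothetical: exactly one worker sets this to True

global_step = tf.Variable(0, name="global_step", trainable=False)
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run training steps here ...
    if is_chief:  # concurrent writers to one checkpoint path cause corruption
        saver.save(sess, "/tmp/model.ckpt", global_step=global_step)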

