dalgu90 / resnet-18-tensorflow

ResNet-18 TensorFlow implementation, including conversion of Torch .t7 weights into a TensorFlow checkpoint


resnet-18-tensorflow's Introduction

resnet-18-tensorflow

A TensorFlow implementation of ResNet-18 (https://arxiv.org/abs/1512.03385)

Prerequisites

  1. TensorFlow 1.8
  2. The ImageNet dataset
  • All image files are required to be valid JPEG files. See this gist.
  • It is highly recommended to resize every image so that its shorter side is 256 pixels (a minimal resize sketch follows this list).
  3. (Optional) torchfile, to convert the ResNet-18 .t7 checkpoint into a TensorFlow checkpoint (install with pip install torchfile)
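
Resizing is not handled by the scripts in this repository; below is a minimal sketch of resizing an image so that its shorter side becomes 256, using Pillow (the use of Pillow and the file paths are assumptions, not part of this repo):

import os
from PIL import Image

def resize_shorter_side(src_path, dst_path, target=256):
    """Resize an image so that its shorter side equals `target`, keeping the aspect ratio."""
    img = Image.open(src_path).convert('RGB')
    w, h = img.size
    scale = target / float(min(w, h))
    img = img.resize((int(round(w * scale)), int(round(h * scale))), Image.BILINEAR)
    img.save(dst_path, 'JPEG', quality=95)

# Example (hypothetical paths):
# resize_shorter_side('n01440764_10026.JPEG', 'resized/n01440764_10026.JPEG')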

How To Run

  1. (Optional) Convert the Torch .t7 checkpoint into a TensorFlow checkpoint:
# Download the ResNet-18 torch checkpoint
wget https://d2j0dndfm35trm.cloudfront.net/resnet-18.t7
# Convert into tensorflow checkpoint
python extract_torch_t7.py
  2. Modify train_scratch.sh (training from scratch) or train.sh (finetuning the pretrained weights) so that the following arguments have valid values:
  • train_dataset, train_image_root, val_dataset, val_image_root: paths to the train/val dataset list files and to the image root directories
  • num_gpus and the corresponding GPU IDs (CUDA_VISIBLE_DEVICES on the first line)
  3. Run!
  • ./train.sh if you want to finetune the converted ResNet (NOTE: the model needs to be finetuned for some epochs)
  • ./train_scratch.sh if you want to train ResNet from scratch
  4. Evaluate the trained model
  • ./eval.sh to evaluate the trained model (change the arguments in eval.sh to your preference)

Note

  • The extracted weights should be finetuned for several epochs (run ./train.sh) to reach full performance. If you run the evaluation code without finetuning, the single-crop top-1 validation accuracy is about 60%, which is lower than the accuracy reported in the original paper. I guess there is some minor issue that I have missed.


resnet-18-tensorflow's Issues

Testing images get wrong results

Hi, I'm new to ResNet-18, and I really appreciate your code. I modified your file extract_torch_t7.py (attached) to test the model on some pictures I chose, but they were all classified incorrectly. So I'm wondering about your test results: which pictures did you use, and what performance did you get? Could you give me some advice on improving my test results? Thanks!

with tf.Graph().as_default():
    global_step = tf.Variable(0, trainable=False, name='global_step')
    images = [tf.placeholder(tf.float32, [2, 224, 224, 3])]
    labels = [tf.placeholder(tf.int32, [2])]

    # Build model
    print("Build ResNet-18 model")
    hp = resnet.HParams(batch_size=2,
                        num_gpus=1,
                        num_classes=1000,
                        weight_decay=0.001,
                        momentum=0.9,
                        finetune=False)
    network_train = resnet.ResNet(hp, images, labels, global_step, name="train")
    network_train.build_model()
    print('Number of Weights: %d' % network_train._weights)
    print('FLOPs: %d' % network_train._flops)

    # Build an initialization operation to run below.
    init = tf.global_variables_initializer()

    # Start running operations on the Graph.
    sess = tf.Session(config=tf.ConfigProto(
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.96),
        allow_soft_placement=True,
        log_device_placement=False))
    sess.run(init)

    # Set variables values
    print('Set variables to loaded weights')
    all_vars = tf.trainable_variables()
    for v in all_vars:
        print('\t' + v.op.name)
        assign_op = v.assign(model_weights[v.op.name])
        sess.run(assign_op)
    image_path = '/images/'
    images = []
    for imgname in os.listdir(image_path):
        img = cv2.imread(image_path + "/" + imgname)
        img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = img.astype(np.uint8)
    pred = sess.run(network_val.preds)

    # Save as checkpoint
    print('Save as checkpoint: %s' % INIT_CHECKPOINT_DIR)
    #sys.exit(1)
    if not os.path.exists(INIT_CHECKPOINT_DIR):
        os.mkdir(INIT_CHECKPOINT_DIR)
    saver = tf.train.Saver(tf.global_variables())
    saver.save(sess, os.path.join(INIT_CHECKPOINT_DIR, 'model.ckpt'))

print('Done!')
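
As a side note on the snippet above: the loop keeps the pixels as uint8 without normalization, never stacks or feeds the preprocessed images into the graph, and runs network_val, which is never built. Below is a minimal sketch of what the inference step might look like, assuming the 2-image placeholder built at the top of the script is kept under the (hypothetical) name images_ph instead of being overwritten, that the ResNet class exposes preds and is_train as train.py's feed_dict suggests, and using the ImageNet mean/std constants quoted in the preprocessing question below; the exact preprocessing in imagenet_input.py may differ:

import os
import cv2
import numpy as np

# Mean/std on the 0-255 pixel scale (the constants quoted in the preprocessing issue below).
imagenet_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32) * 255.0
imagenet_std = np.array([0.229, 0.224, 0.225], dtype=np.float32) * 255.0

batch = []
for imgname in sorted(os.listdir(image_path))[:2]:  # the placeholder expects a batch of 2
    img = cv2.imread(os.path.join(image_path, imgname))
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)
    batch.append((img - imagenet_mean) / imagenet_std)  # normalize; do not cast back to uint8

# images_ph: the tf.placeholder(tf.float32, [2, 224, 224, 3]) built at the top of the script.
pred = sess.run(network_train.preds,
                feed_dict={images_ph: np.stack(batch),
                           network_train.is_train: False})  # is_train as fed in train.py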

finetune model

Hi, @dalgu90
Thanks for your wonderful work! Could you share your fine-tuned model based on the converted weights? I have no hardware to fine-tune the model on ImageNet, but I want to use the ResNet-18 model to fine-tune on my own dataset.

Key conv1/bn/beta/Momentum not found in checkpoint

Hi,

Nice work.
I was using train.py to train the model from the converted ImageNet weights, and I am getting the following error. I have checked the variables and they all look fine.

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key conv1/bn/beta/Momentum not found in checkpoint
	 [[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
	 [[{{node save/restore_all/NoOp_1/_10}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_447_save/restore_all/NoOp_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
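
The missing keys are Momentum optimizer slots, which the converted checkpoint from extract_torch_t7.py does not contain. This is not the repo's official fix, but one hedged workaround is to build a Saver only over the variables that actually exist in the checkpoint:

import tensorflow as tf

ckpt_path = 'init/model.ckpt'  # assumed location of the converted checkpoint
ckpt_var_names = {name for name, _ in tf.train.list_variables(ckpt_path)}

# Restore only variables present in the checkpoint (skips */Momentum slots);
# everything else keeps its fresh initialization.
restore_vars = [v for v in tf.global_variables() if v.op.name in ckpt_var_names]
restore_saver = tf.train.Saver(restore_vars)
# restore_saver.restore(sess, ckpt_path)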

Question about input preprocessing

Hello,

Thank you for your contribution, I really appreciate it.
According to your code, you normalized the input using the ImageNet mean and std as follows:
imagenet_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32) * 255.0
imagenet_std = np.array([0.229, 0.224, 0.225], dtype=np.float32) * 255.0
image = (image - imagenet_mean) / imagenet_std

However, this seems a little different from the normalization used in the official Torch version for ResNet training.
Which one is the correct one that you used for the pretraining? (torch version? or the one in your code?)

Thanks in advance.
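
A side note on the two forms: assuming the official Torch pipeline first scales pixels to [0, 1] and then normalizes with mean [0.485, 0.456, 0.406] and std [0.229, 0.224, 0.225] (as fb.resnet.torch does), the two are algebraically equivalent, since (x / 255 - m) / s = (x - 255 m) / (255 s). A quick numerical check:

import numpy as np

x = np.random.uniform(0, 255, size=(4, 3)).astype(np.float32)  # fake pixel values
m = np.array([0.485, 0.456, 0.406], dtype=np.float32)
s = np.array([0.229, 0.224, 0.225], dtype=np.float32)

torch_style = (x / 255.0 - m) / s            # normalize after scaling to [0, 1]
repo_style = (x - m * 255.0) / (s * 255.0)   # normalize directly on 0-255 pixels
print(np.allclose(torch_style, repo_style, atol=1e-4))  # True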

Does the shortcut convolutional layer have BN?

As shown in common ResNet-N models, if the number of output channels is larger than the number of input channels, the shortcut should be a convolutional layer with Batch Normalization. However, in the code provided, that Batch Normalization does not appear to exist. Is it not present in the pretrained Torch (.t7) model?
Thank you very much!
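
For context, here is a minimal sketch of the standard projection shortcut (1x1 convolution followed by Batch Normalization) written with plain tf.layers calls; this is the textbook form, not necessarily how this repo's utils.py builds it:

import tensorflow as tf

def projection_shortcut(x, out_channels, stride, training):
    # 1x1 convolution to match channels/stride, then BN, as in the original ResNet paper.
    shortcut = tf.layers.conv2d(x, out_channels, kernel_size=1, strides=stride,
                                padding='same', use_bias=False, name='shortcut_conv')
    shortcut = tf.layers.batch_normalization(shortcut, training=training, name='shortcut_bn')
    return shortcut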

Is it possible to generate a *.lua file for this model?

I would like to train this model using NVIDIA's DIGITS, and in order to do this I need to load the pre-trained model into DIGITS with a) its weights (resnet-18.t7) and b) its architecture (described by a *.lua file). How would I generate a *.lua file for this model?

Thanks in advance for any suggestions.

Small bug fix

Hey,

Thanks for your code. It looks like there's a small bug in the _bn function of utils.py. When you call _bn in other parts of your code (e.g., in the residual_block function of resnet.py), you pass the scope name as the third argument, whereas global_step is the third parameter of _bn. I would suggest switching the order of the global_step and name arguments and not setting any default value for the name argument in _bn.
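
In other words, the suggestion is roughly the following (a hypothetical sketch; the actual signature of _bn in utils.py may differ):

# Before (hypothetical): global_step comes before name, so callers that pass the
# scope name as the third positional argument silently bind it to global_step.
# def _bn(x, is_train, global_step=None, name='bn'):
#     ...

# After (suggested): name is the third parameter, with no default value.
def _bn(x, is_train, name, global_step=None):
    # actual batch-norm body unchanged
    ...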

Can't restore the generated TF model?

Hi,
I downloaded the model (.t7 file), installed torchfile, and ran "python extract_torch_t7.py". I got the following files under init/:
-rw-r--r-- 1 domain^users 3277 Nov 17 10:29 model.ckpt.index
-rw-r--r-- 1 domain^users 46782116 Nov 17 10:29 model.ckpt.data-00000-of-00001
-rw-r--r-- 1 domain^users 77 Nov 17 10:29 checkpoint
-rw-r--r-- 1 domain^users 47121651 Nov 17 10:29 model.ckpt.meta

I then tried to load them in python:
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('init/model.ckpt.meta')
    saver.restore(sess, 'init/model.ckpt')

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "~/tensorflow/tensorflow_ve/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1439, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "~/tensorflow/tensorflow_ve/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "~/tensorflow/tensorflow_ve/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "~/tensorflow/tensorflow_ve/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "~/tensorflow/tensorflow_ve/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device to node 'tower_0/conv5_2/bn_2/cond/pred_id_3': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
    [[Node: tower_0/conv5_2/bn_2/cond/pred_id_3 = Identity[T=DT_BOOL, _device="/device:GPU:0"]]]

Caused by op u'tower_0/conv5_2/bn_2/cond/pred_id_3', defined at:
  File "<stdin>", line 2, in <module>
  File "~/tensorflow/tensorflow_ve/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1577, in import_meta_graph
    **kwargs)
  File "~/tensorflow/tensorflow_ve/local/lib/python2.7/site-packages/tensorflow/python/framework/meta_graph.py", line 498, in import_scoped_meta_graph
    producer_op_list=producer_op_list)
  File "~/tensorflow/tensorflow_ve/local/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 287, in import_graph_def
    op_def=op_def)
  File "~/tensorflow/tensorflow_ve/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "~/tensorflow/tensorflow_ve/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'tower_0/conv5_2/bn_2/cond/pred_id_3': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
    [[Node: tower_0/conv5_2/bn_2/cond/pred_id_3 = Identity[T=DT_BOOL, _device="/device:GPU:0"]]]
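
The meta graph saved by extract_torch_t7.py carries explicit /device:GPU:0 placements, so importing it on a machine where no GPU is registered fails. A hedged sketch of the usual workarounds (not a confirmed fix for this repo): strip the saved device assignments with clear_devices=True, or allow soft placement in the session config.

import tensorflow as tf

# Option 1: drop the saved device assignments when importing the meta graph.
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('init/model.ckpt.meta', clear_devices=True)
    saver.restore(sess, 'init/model.ckpt')

# Option 2: let TensorFlow place GPU-pinned ops on whatever device is available.
# config = tf.ConfigProto(allow_soft_placement=True)
# with tf.Session(config=config) as sess:
#     ...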

tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence When Using Custom Dataset

Hello, I am new to this and I am trying to train a ResNet-18 model with your code on my custom dataset (8 classes).

I made changes in the config file (train_scratch.sh), and I got the error below when training starts (at step 0).

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[{{node IteratorGetNext}} = IteratorGetNextoutput_shapes=[[?,224,224,3], [?]], output_types=[DT_FLOAT, DT_INT32], _device="/device:CPU:0"]]
[[{{node train_image/ExperimentalFunctionBufferingResourceGetNext_1}} = ExperimentalFunctionBufferingResourceGetNextoutput_types=[DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:GPU:1"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 263, in
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "train.py", line 259, in main
train()
File "train.py", line 233, in train
feed_dict={network_train.is_train:True, network_train.lr:lr_value})
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[{{node IteratorGetNext}} = IteratorGetNextoutput_shapes=[[?,224,224,3], [?]], output_types=[DT_FLOAT, DT_INT32], _device="/device:CPU:0"]]
[[node train_image/ExperimentalFunctionBufferingResourceGetNext_1 (defined at /shared/workspace/yjoh/resnet-18-tensorflow/imagenet_input.py:176) = ExperimentalFunctionBufferingResourceGetNextoutput_types=[DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:GPU:1"]]

My tensorflow version is 1.12 and Python 3.6.

Let me know if more info or code is needed. Thank you in advance!
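
One possible cause (a guess, not a confirmed diagnosis) is that the input pipeline is exhausted before the first training step, for example if the dataset list file is empty, shorter than one batch, or not repeated. Below is a minimal tf.data sketch of the repeat-and-batch pattern, independent of the repo's imagenet_input.py (decoding and preprocessing via .map() are omitted):

import tensorflow as tf

# Minimal, schematic pipeline: repeat the dataset so the iterator does not raise
# OutOfRangeError during training, and drop incomplete batches.
filenames = tf.constant(['img_0.jpg', 'img_1.jpg'])   # hypothetical inputs
labels = tf.constant([0, 1], dtype=tf.int32)

dataset = (tf.data.Dataset.from_tensor_slices((filenames, labels))
           .shuffle(buffer_size=2)
           .repeat()                       # loop over the data indefinitely
           .batch(2, drop_remainder=True))
images_batch, labels_batch = dataset.make_one_shot_iterator().get_next()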

How to make a data set?

ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: 'Tensor("train_image/DecodeCSV:0", shape=(), dtype=float32, device=/device:CPU:0)'
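
The error suggests that a column of the list file is being parsed with a float record default where a string (the image path) is expected. A hedged sketch of parsing a space-separated "<image path> <label>" line with tf.decode_csv follows; the actual list-file format expected by this repo's imagenet_input.py may differ:

import tensorflow as tf

# Hypothetical list-file line: "<relative image path> <integer label>"
line = tf.constant("n01440764/n01440764_10026.JPEG 0")

# record_defaults define the column dtypes: string for the path, int32 for the label.
path, label = tf.decode_csv(line, record_defaults=[[''], [0]], field_delim=' ')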
