visipedia / tfrecords

Functions for creating tfrecords for TensorFlow models.

License: MIT License

tfrecords's Introduction

tfrecords

Convenience functions to create tfrecords that can be used with classification, detection and keypoint localization systems. The create_tfrecords.py file will help create the correct tfrecords to feed into those systems.

Configuration parameters control whether the raw images are stored in the tfrecords (store_images=True in the create_tfrecords.create method, or --store_images when calling create_tfrecords.py from the command line). If you choose not to store the raw images, keep in mind that the filename field must be a valid path on the system where you will process the tfrecords. Also, if the images are very large, your model's input pipeline may struggle to keep its input queues filled. Resizing images to around 800px has worked well in practice.
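The README does not say whether "800px" refers to the longer or the shorter side. The sketch below assumes the longer side is capped at 800 pixels and the aspect ratio is preserved, so normalized coordinates stay valid after resizing. The helper name `resized_dims` is hypothetical, and the actual resize would be done with whatever image library you already use.

```python
def resized_dims(width, height, max_side=800):
    """Return (new_width, new_height) with the longer side at most max_side.

    Assumption: "resize to 800px" means capping the longer side. The aspect
    ratio is preserved so normalized box/part coordinates remain correct.
    """
    longer = max(width, height)
    if longer <= max_side:
        # Already small enough; leave the image untouched.
        return width, height
    scale = max_side / float(longer)
    # Round to the nearest pixel and never collapse a side to zero.
    return max(1, int(round(width * scale))), max(1, int(round(height * scale)))


print(resized_dims(4000, 3000))  # (800, 600)
print(resized_dims(640, 480))    # (640, 480)
```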

Inputs

The data needs to be stored in an Example protocol buffer. The protocol buffer will have the following fields:

Key Value
image/id string containing an identifier for this image.
image/filename string containing a file system path to the image file.
image/encoded string containing the JPEG-encoded image in the RGB colorspace.
image/height integer, image height in pixels
image/width integer, image width in pixels
image/colorspace string, specifying the colorspace, e.g. 'RGB'
image/channels integer, specifying the number of channels, e.g. 3
image/format string, specifying the format, e.g. 'JPEG'
image/extra string, any extra data can be stored here. For example, this can be a string encoded json structure.
image/class/label integer specifying the index in a classification layer. The label ranges over [0, num_labels), e.g. 0-99 if there are 100 classes.
image/class/text string specifying the human-readable version of the label e.g. 'White-throated Sparrow'
image/class/conf float value specifying the confidence of the label. For example, a probability output from a classifier.
image/object/count an integer, the number of object annotations. For example, this should match the number of bounding boxes.
image/object/area a float array of normalized object areas. In the simplest case this is the area of each bounding box; it could also be the area of a segmentation. Normalized here means that the area is divided by the image area (image width x image height).
image/object/id an array of strings indicating the id of each object.
image/object/bbox/xmin a float array, the left edge of the bounding boxes; normalized coordinates.
image/object/bbox/xmax a float array, the right edge of the bounding boxes; normalized coordinates.
image/object/bbox/ymin a float array, the top edge of the bounding boxes; normalized coordinates.
image/object/bbox/ymax a float array, the bottom edge of the bounding boxes; normalized coordinates.
image/object/bbox/score a float array, the score for the bounding box. For example, the confidence of a detector.
image/object/bbox/label an integer array, specifying the index in a classification layer. The label ranges from [0, num_labels)
image/object/bbox/text an array of strings, specifying the human readable label for the bounding box.
image/object/bbox/conf a float array, the confidence of the label for the bounding box. For example, a probability output from a classifier.
image/object/parts/x a float array of x locations for a part; normalized coordinates.
image/object/parts/y a float array of y locations for a part; normalized coordinates.
image/object/parts/v an integer array of visibility flags for the parts. 0 indicates the part is not visible (e.g. out of the image plane). 1 indicates the part is occluded. 2 indicates the part is visible.
image/object/parts/score a float array of scores for the parts. For example, the confidence of a keypoint localizer.

Take note:

  • Many of the above fields can be empty. Most of the different systems using the tfrecords will only need a subset of the fields.

  • The bounding box coordinates, part coordinates, and areas need to be normalized. For the bounding boxes and parts, this means that the x values have been divided by the width of the image and the y values have been divided by the height of the image. This ensures that the pixel location can be recovered on any aspect-preserving resized version of the original image. The areas are normalized by the area of the image.

  • The origin of an image is the top left. All pixel locations will be interpreted with respect to that origin.
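As a worked example of the normalization rules above, the following sketch converts a pixel-space box and a part location (origin at the top left, so ymin is the top edge) into normalized values. The function names `normalize_bbox` and `normalize_point` are hypothetical helpers for preparing your data; they are not part of create_tfrecords.py.

```python
def normalize_bbox(left, top, right, bottom, image_width, image_height):
    """Normalize a pixel-space box (origin at the top left of the image).

    left/right are x pixel positions, top/bottom are y pixel positions.
    Returns the four normalized edges plus the box area divided by the
    image area, mirroring the image/object/bbox/* and image/object/area fields.
    """
    w, h = float(image_width), float(image_height)
    return {
        "xmin": left / w,
        "xmax": right / w,
        "ymin": top / h,     # top edge
        "ymax": bottom / h,  # bottom edge
        "area": (right - left) * (bottom - top) / (w * h),
    }


def normalize_point(x, y, image_width, image_height):
    """Normalize a part (keypoint) location given in pixels."""
    return x / float(image_width), y / float(image_height)


# A 140 x 70 px box in a 200 x 100 px image gives xmin 0.1, xmax 0.8,
# ymin 0.2, ymax 0.9 and area 0.49, the values used in the example
# image_data dictionary of this README.
box = normalize_bbox(20, 20, 160, 90, image_width=200, image_height=100)
part = normalize_point(40, 30, image_width=200, image_height=100)  # (0.2, 0.3)
```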

The create_tfrecords.py file has a convenience function for generating the tfrecord files. You will need to preprocess your dataset into a Python list of dicts. Each dict represents one image and should have a structure that mirrors the tfrecord structure above, except that slashes are replaced by nested dictionaries and the outermost image dictionary is implied. Here is an example of a valid dictionary structure for one image:

image_data = {
  "filename" : "/path/to/image_1.jpg", 
  "id" : "0",
  "class" : {
    "label" : 1,
    "text" : "Indigo Bunting",
    "conf" : 0.9
  },
  "object" : {
    "count" : 1,
    "area" : [.49],
    "id" : ["1"],
    "bbox" : {
      "xmin" : [0.1],
      "xmax" : [0.8],
      "ymin" : [0.2],
      "ymax" : [0.9],
      "label" : [1],
      "score" : [0.8],
      "conf" : [0.9]
    },
    "parts" : {
      "x" : [0.2, 0.5],
      "y" : [0.3, 0.6],
      "v" : [2, 1],
      "score" : [1.0, 1.0]
    }
  }
}
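To make the mapping between nested dictionaries and the slash-separated tfrecord keys concrete, here is a small sketch that flattens an image_data dict into "image/..." keys. It only illustrates the naming convention described above; whether create_tfrecords.py performs an equivalent mapping internally is an assumption, and `flatten_image_data` is a hypothetical name.

```python
def flatten_image_data(data, prefix="image"):
    """Flatten nested dicts into slash-separated keys rooted at 'image/'."""
    flat = {}
    for key, value in data.items():
        path = prefix + "/" + key
        if isinstance(value, dict):
            # Nested dictionaries become additional path components.
            flat.update(flatten_image_data(value, path))
        else:
            flat[path] = value
    return flat


example = {"id": "0", "class": {"label": 1}, "object": {"bbox": {"xmin": [0.1]}}}
print(flatten_image_data(example))
# {'image/id': '0', 'image/class/label': 1, 'image/object/bbox/xmin': [0.1]}
```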

Not all of the fields are required. For example, if you just want to train a classifier using the whole image as an input, then your dictionaries could look like:

image_data = {
  "filename" : "/path/to/image_1.jpg", 
  "id" : "0",
  "class" : {
    "label" : 1
  }
}
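If your annotations start out as (file path, class name) pairs, a sketch like the following could produce these minimal classification dicts, assigning integer labels in [0, num_labels) by sorting the class names. The helper name, the sorted-name label order, and the sequential string ids are all illustrative assumptions; use whatever label mapping your model expects.

```python
def build_classification_dataset(samples):
    """Build minimal image_data dicts from (filename, class_name) pairs.

    Class names are sorted and mapped to integer labels in [0, num_labels).
    Returns the list of dicts and a label -> class name mapping.
    """
    class_names = sorted({name for _, name in samples})
    label_of = {name: idx for idx, name in enumerate(class_names)}
    dataset = []
    for idx, (filename, name) in enumerate(samples):
        dataset.append({
            "filename": filename,
            "id": str(idx),  # ids are strings in the tfrecord schema
            "class": {"label": label_of[name], "text": name},
        })
    return dataset, dict(enumerate(class_names))


dataset, names = build_classification_dataset([
    ("/path/to/image_1.jpg", "Indigo Bunting"),
    ("/path/to/image_2.jpg", "White-throated Sparrow"),
    ("/path/to/image_3.jpg", "Indigo Bunting"),
])
```

Keeping the returned label-to-name mapping alongside the tfrecords makes it easy to fill image/class/text consistently and to interpret predictions later.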

If the encoded key is not provided and you have requested that the images be stored in the tfrecords, the create method will read each image from the path in its filename field. In this case, the image is assumed to be in either JPEG or PNG format, and it will be converted to JPEG for storage in the tfrecord. If encoded is provided, you must also provide height, width, format, colorspace, and channels.
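A pre-flight check can catch image dicts that supply encoded without the accompanying fields before a long conversion run. This sketch assumes those keys sit at the top level of each image_data dict (the outer image level being implied); `missing_encoded_fields` is a hypothetical helper, not part of create_tfrecords.py.

```python
# Fields that must accompany a pre-encoded image, per the paragraph above.
REQUIRED_WITH_ENCODED = ("height", "width", "format", "colorspace", "channels")


def missing_encoded_fields(image_data):
    """Return the fields that must accompany 'encoded' but are absent."""
    if "encoded" not in image_data:
        # The image will be read from 'filename' instead; nothing to check.
        return []
    return [key for key in REQUIRED_WITH_ENCODED if key not in image_data]


print(missing_encoded_fields({"encoded": b"...", "height": 10, "width": 20}))
# ['format', 'colorspace', 'channels']
```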

Once you have your dataset preprocessed, you can use the create method in create_tfrecords.py to create the tfrecords files. For example:

# this should be your array of image data dictionaries. 
# Don't forget that you'll want to separate your training and testing data.
train_dataset = [...]

from create_tfrecords import create
failed_images = create(
  dataset=train_dataset,
  dataset_name="train",
  output_directory="/home/gvanhorn/Desktop/train_dataset",
  num_shards=10,
  num_threads=5,
  store_images=True
)

This call to the create method will use 5 threads to produce 10 tfrecord files, each prefixed with the name train in the directory /home/gvanhorn/Desktop/train_dataset.
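One common way to divide this kind of work is to give each thread a contiguous slice of the dataset and have it write num_shards / num_threads shards from that slice, which requires num_shards to be a multiple of num_threads. Whether create_tfrecords.py uses exactly this scheme is an assumption; the sketch below, built around a hypothetical `shard_ranges` helper, only computes such a partition so you can reason about shard sizes.

```python
def shard_ranges(num_images, num_shards, num_threads):
    """Partition [0, num_images) into per-thread lists of per-shard ranges.

    Assumes num_shards is divisible by num_threads, so every thread writes
    the same number of shards.
    """
    if num_shards % num_threads != 0:
        raise ValueError("num_shards must be a multiple of num_threads")
    shards_per_thread = num_shards // num_threads
    plan = []
    for t in range(num_threads):
        # Contiguous slice of the dataset handled by thread t.
        start = num_images * t // num_threads
        end = num_images * (t + 1) // num_threads
        shards = []
        for s in range(shards_per_thread):
            s_start = start + (end - start) * s // shards_per_thread
            s_end = start + (end - start) * (s + 1) // shards_per_thread
            shards.append((s_start, s_end))
        plan.append(shards)
    return plan


# 10 shards over 5 threads: each thread writes 2 shards of 100 images.
plan = shard_ranges(num_images=1000, num_shards=10, num_threads=5)
```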

All images that cause errors will be returned to the caller. An extra field, error_msg, will be added to the dictionary for each such image, containing the error message that was raised while processing it. Typically, errors are caused by filename fields that point to files that don't exist.

print("%d images failed." % (len(failed_images),))
for image_data in failed_images:
  print("Image %s: %s" % (image_data['id'], image_data['error_msg']))
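Because missing files are the most common cause of failures, it can be cheaper to check the filename fields before calling create. This minimal sketch uses only the standard library; `find_missing_files` is a hypothetical helper, and the check only matters when images are read from disk (store_images=True without encoded, or a downstream reader that loads images via filename).

```python
import os
import tempfile


def find_missing_files(dataset):
    """Return the image_data dicts whose 'filename' is not an existing file."""
    return [d for d in dataset if not os.path.isfile(d.get("filename", ""))]


# Usage sketch with one real temporary file and one missing path.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "image_1.jpg")
    open(present, "wb").close()
    dataset = [
        {"id": "0", "filename": present},
        {"id": "1", "filename": os.path.join(tmp, "missing.jpg")},
    ]
    missing = find_missing_files(dataset)
    print([d["id"] for d in missing])  # ['1']
```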

If you do not want the images to be stored in the tfrecords, you can pass store_images=False to the create method. In that case, the code that reads the tfrecords is expected to load each image itself using the filename field.

If you have saved your preprocessed dataset list into a json file, such as train_tfrecords_dataset.json, then you can call create_tfrecords.py from the command line to create the tfrecords:

python create_tfrecords.py \
--dataset_path /home/gvanhorn/Desktop/train_dataset/train_tfrecords_dataset.json \
--prefix train \
--output_dir /home/gvanhorn/Desktop/train_dataset \
--shards 10 \
--threads 5 \
--shuffle \
--store_images

If you do not want the images stored in the tfrecords, then you can exclude the --store_images argument.
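If your preprocessed list is still in memory, a sketch like the one below could write it to a JSON file such as train_tfrecords_dataset.json for use with --dataset_path. It assumes every value in the dicts is JSON-serializable; raw bytes in an encoded field are not, and json.dump raises TypeError on them. The helper names are hypothetical.

```python
import json
import os
import tempfile


def save_dataset_json(dataset, path):
    """Write the list of image_data dicts to a JSON file for --dataset_path."""
    with open(path, "w") as f:
        json.dump(dataset, f)


def load_dataset_json(path):
    """Read the list of image_data dicts back from a JSON file."""
    with open(path) as f:
        return json.load(f)


train_dataset = [{"filename": "/path/to/image_1.jpg", "id": "0", "class": {"label": 1}}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train_tfrecords_dataset.json")
    save_dataset_json(train_dataset, path)
    restored = load_dataset_json(path)  # round-trips to an equal list
```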

tfrecords's People

Contributors

gvanhorn38, kulits

tfrecords's Issues

Strange question: have you ever run into this?

This error occurs on Windows but not on Ubuntu: OverflowError: Python int too large to convert to C long.

import tensorflow as tf

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

I want to change the Int64 feature to uint32, but TensorFlow only supports int64, float, and string (bytes) features. What should I do?
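Every uint32 value (at most 2**32 - 1) fits in a signed 64-bit integer, so one option is to keep the Int64List feature and store the uint32 values in it unchanged. On Windows the C long type is 32 bits wide even on 64-bit systems, which commonly causes this OverflowError when a value passes through a C long somewhere in the pipeline (for example NumPy 1.x's default integer type on Windows); the exact trigger in this report is not known. The sketch below, built around a hypothetical `as_int64_value` helper, converts each value to a plain Python int and range-checks it before it would be handed to tf.train.Int64List.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def as_int64_value(value):
    """Convert an integer-like value (e.g. a NumPy scalar) to a plain Python int
    that is safe to place in an int64 feature.

    Raises ValueError if the value does not fit in a signed 64-bit integer.
    """
    v = int(value)  # Python ints have arbitrary precision on every platform
    if not INT64_MIN <= v <= INT64_MAX:
        raise ValueError("value %d does not fit in int64" % v)
    return v


# The largest uint32 fits comfortably in int64.
print(as_int64_value(2**32 - 1))  # 4294967295
```

In _int64_feature this would mean passing value=[as_int64_value(value)]; on the reading side the int64 values can be converted back if a uint32 type is really required.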

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, assertion failed: [Unable to decode bytes as JPEG, PNG, GIF, or BMP]

Hi all,
I use Python 2.7.13 and TensorFlow 1.3.0 on CPU.

I want to use DenseNet (https://github.com/pudae/tensorflow-densenet) for a regression problem. My data contains 60000 JPEG images with 37 float labels for each image.
I saved my data into tfrecords files with:

def Read_Labels(label_path):
    labels_csv = pd.read_csv(label_path)
    labels = np.array(labels_csv)
    return labels[:,1:]

def load_image(addr):
    # read an image and resize to (224, 224)
    img = cv2.imread(addr)
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.astype(np.float32)
    return img

def Shuffle_images_with_labels(shuffle_data, photo_filenames, labels):
    if shuffle_data:
        c = list(zip(photo_filenames, labels))
        shuffle(c)
        addrs, labels = zip(*c)
    return addrs, labels

def image_to_tfexample_mine(image_data, image_format, height, width, label):
    return tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': bytes_feature(image_data),
        'image/format': bytes_feature(image_format),
        'image/class/label': _float_feature(label),
        'image/height': int64_feature(height),
        'image/width': int64_feature(width),
    }))

def _convert_dataset(split_name, filenames, labels, dataset_dir):
    assert split_name in ['train', 'validation']

    num_per_shard = int(math.ceil(len(filenames) / float(_NUM_SHARDS)))

    with tf.Graph().as_default():

        for shard_id in range(_NUM_SHARDS):
            output_filename = _get_dataset_filename(dataset_path, split_name, shard_id)

            with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
                start_ndx = shard_id * num_per_shard
                end_ndx = min((shard_id+1) * num_per_shard, len(filenames))
                for i in range(start_ndx, end_ndx):
                    sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
                        i+1, len(filenames), shard_id))
                    sys.stdout.flush()

                    img = load_image(filenames[i])
                    image_data = tf.compat.as_bytes(img.tostring())

                    label = labels[i]

                    example = image_to_tfexample_mine(image_data, image_format, height, width, label)

                    # Serialize to string and write on the file
                    tfrecord_writer.write(example.SerializeToString())

    sys.stdout.write('\n')
    sys.stdout.flush()

def run(dataset_dir):

    labels = Read_Labels(dataset_dir + '/training_labels.csv')

    photo_filenames = _get_filenames_and_classes(dataset_dir + '/images_training')

    shuffle_data = True

    photo_filenames, labels = Shuffle_images_with_labels(
        shuffle_data, photo_filenames, labels)

    training_filenames = photo_filenames[_NUM_VALIDATION:]
    training_labels = labels[_NUM_VALIDATION:]

    validation_filenames = photo_filenames[:_NUM_VALIDATION]
    validation_labels = labels[:_NUM_VALIDATION]

    _convert_dataset('train',
                     training_filenames, training_labels, dataset_path)
    _convert_dataset('validation',
                     validation_filenames, validation_labels, dataset_path)

    print('\nFinished converting the Flowers dataset!')

And I decode it by:

with tf.Session() as sess:

    feature = {
        'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
        'image/format': tf.FixedLenFeature((), tf.string, default_value='jpeg'),
        'image/class/label': tf.FixedLenFeature(
            [37,], tf.float32, default_value=tf.zeros([37,], dtype=tf.float32)),
    }

    filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)

    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)

    features = tf.parse_single_example(serialized_example, features=feature)

    image = tf.decode_raw(features['image/encoded'], tf.float32)
    print(image.get_shape())

    label = tf.cast(features['image/class/label'], tf.float32)

    image = tf.reshape(image, [224, 224, 3])

    images, labels = tf.train.shuffle_batch([image, label], batch_size=10, capacity=30, num_threads=1, min_after_dequeue=10)

    init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
    sess.run(init_op)

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for batch_index in range(6):
        img, lbl = sess.run([images, labels])
        img = img.astype(np.uint8)
        print(img.shape)
        for j in range(6):
            plt.subplot(2, 3, j+1)
            plt.imshow(img[j, ...])
        plt.show()

    coord.request_stop()

    coord.join(threads)

Everything works up to this point. But when I use the following code to decode the TFRecord files:

reader = tf.TFRecordReader

keys_to_features = {
    'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature((), tf.string, default_value='raw'),
    'image/class/label': tf.FixedLenFeature(
        [37,], tf.float32, default_value=tf.zeros([37,], dtype=tf.float32)),
}

items_to_handlers = {
    'image': slim.tfexample_decoder.Image('image/encoded'),
    'label': slim.tfexample_decoder.Tensor('image/class/label'),
}

decoder = slim.tfexample_decoder.TFExampleDecoder(
    keys_to_features, items_to_handlers)


I get the following error.

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, assertion failed: [Unable to decode bytes as JPEG, PNG, GIF, or BMP]
[[Node: case/If_0/decode_image/cond_jpeg/cond_png/cond_gif/Assert_1/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](case/If_0/decode_image/cond_jpeg/cond_png/cond_gif/is_bmp, case/If_0/decode_image/cond_jpeg/cond_png/cond_gif/Assert_1/Assert/data_0)]]
INFO:tensorflow:Caught OutOfRangeError. Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.


To use DenseNet for my problem, I need to fix this error first.
Could anybody please help me with this problem? This code works perfectly for datasets like flowers, MNIST, and CIFAR10 available at https://github.com/pudae/tensorflow-densenet/tree/master/datasets but does not work for my data.
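The assertion suggests that the bytes stored under image/encoded are not an encoded image file. In the writer above, image/encoded holds img.tostring(), i.e. the raw float32 pixel buffer, whereas slim.tfexample_decoder.Image decodes the bytes as a JPEG, PNG, GIF, or BMP file unless the image/format value in each record selects raw decoding. The earlier tf.decode_raw pipeline never tried to decode an image file, which would explain why it worked. The sketch below is an offline diagnostic under that assumption: it checks the leading signature bytes of those formats and whether the length matches a 224 x 224 x 3 float32 array (602,112 bytes). The helper names are hypothetical.

```python
import array

# Leading "magic" bytes of the formats named in the error message.
SIGNATURES = (
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
)


def sniff_image_format(data):
    """Return 'jpeg', 'png', 'gif' or 'bmp' if data starts with that signature."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None  # not a recognized encoded image


def looks_like_raw_float32(data, height=224, width=224, channels=3):
    """True if the byte length matches an unencoded float32 HxWxC array."""
    return len(data) == height * width * channels * 4  # 4 bytes per float32


# Simulate what img.astype(np.float32).tostring() produces: raw pixel floats.
raw = array.array("f", [128.0] * (224 * 224 * 3)).tobytes()
print(sniff_image_format(raw), looks_like_raw_float32(raw))  # None True
```

If the stored bytes turn out to be raw, two options are to encode each image before writing (for example as a uint8 JPEG with OpenCV's cv2.imencode), or to configure the decoder for raw float32 data with a matching dtype and shape; which fits better depends on the rest of the DenseNet input pipeline.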
