
Comments (14)

yashk2810 commented on May 22, 2024

Hey, you can find the tutorials here: https://github.com/tensorflow/docs/tree/master/site/en/r2/tutorials/quickstart

from datasets.

rsepassi commented on May 22, 2024

So the only line you're missing is to get an iterator:

data, info = tfds.load("omniglot", with_info=True)
train_data, test_data = data['train'], data['test']

train_data = train_data.repeat().shuffle(10000).batch(32).prefetch(tf.data.experimental.AUTOTUNE)

example = tf.compat.v1.data.make_one_shot_iterator(train_data).get_next()
# These are Tensors
image, label = example["image"], example["label"]
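If you are on TF 2.x with eager execution enabled, the compat iterator above isn't needed: a plain Python loop over the dataset yields the same dict of Tensors. A minimal sketch, using a toy in-memory dataset standing in for the Omniglot pipeline (the shapes and labels here are made up for illustration):

```python
import tensorflow as tf

# Toy stand-in for the Omniglot pipeline: 4 fake 8x8 grayscale images.
toy = tf.data.Dataset.from_tensor_slices(
    {"image": tf.zeros([4, 8, 8, 1]), "label": tf.constant([0, 1, 2, 3])}
).batch(2)

# In eager mode (the TF 2.x default), iterating yields batched Tensors.
for example in toy:
    image, label = example["image"], example["label"]
    print(image.shape, label.numpy())
```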


slerman12 commented on May 22, 2024

Would the process be analogous for training on Omniglot?


slerman12 commented on May 22, 2024

Could an example be provided without the use of eager execution? I'm used to using placeholders. Would I have to first use the numpy API and pass it to a feed dict?


slerman12 commented on May 22, 2024

Here was my guess:

I would ordinarily use two placeholders called image and label. I want to replace that workflow with TensorFlow Datasets. Here is how I tried to get those same objects with the Datasets workflow:

data, info = tfds.load("omniglot", with_info=True)
train_data, test_data = data['train'], data['test']

train_data = train_data.repeat().shuffle(10000).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
image, label = train_data["image"], train_data["label"]


slerman12 commented on May 22, 2024

Thank you!


slerman12 commented on May 22, 2024

A few more questions, if it's alright...

How do I do the same thing but for testing? I'm not super familiar with TensorFlow's data API. I imagine I wouldn't need the batch iterator. Ideally, I would have a training, validation, and testing set. Is there a way I could split the training data once more so that I have some data reserved for the validation set? Is there a way I can check if the train batch iterator completed an epoch so that I can print results at that time (using a regular print statement)?

So my questions are:

  1. How do I achieve the same thing but with the test data?
  2. How do I partition into training and validation?
  3. How do I dynamically print validation results every epoch?

Thank you so much if you can help me with these three areas.


rsepassi commented on May 22, 2024
  1. Splits: Pass split="train" or split="test" to get different splits. Different datasets expose different splits. If you need finer-grained splits, see the Splits documentation: https://www.tensorflow.org/datasets/splits
  2. See 1.
  3. You can train on multiple epochs by calling dataset.repeat(5) (to train on 5 epochs). tf.data unfortunately does not expose anything special for the end of an epoch, but with TFDS every dataset has stats on the number of examples per split (info.splits["train"].num_examples), so you can compute how many steps make up an epoch (depending on your batch size). Alternatively, you could skip dataset.repeat() and instead recreate the input pipeline at the top of each epoch; in that version, tf.data raises an OutOfRangeError at the end of the dataset, which you can catch and which means you have reached the end of the dataset/epoch.
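The steps-per-epoch arithmetic is just the split size divided by the batch size. A sketch with a hypothetical example count (in practice you would read it from info.splits["train"].num_examples):

```python
# Hypothetical numbers for illustration; in practice read the count
# from info.splits["train"].num_examples.
num_examples = 19280
batch_size = 32

# Number of full batches in one epoch (drops the final partial batch).
steps_per_epoch = num_examples // batch_size
print(steps_per_epoch)
```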


slerman12 commented on May 22, 2024

Thank you. For 1, I meant, how can I substitute my training dataset with the testing dataset? For example, with regular placeholders, I would use a feed dict to feed first the training data into the placeholders and then the testing data in order to get testing results. Here, my only option seems to be copying the full set of code for a testing phase. Is there a way I can substitute the training variables with testing variables (sans the iterator/shuffling/batching)?


slerman12 commented on May 22, 2024

Also, if I catch an OutOfRangeError, how do I continue iterating through the dataset after that? Will repeating still happen automatically and the error go away?


rsepassi commented on May 22, 2024
  1. You can have a function that takes the dataset and first call it with the training dataset, then the testing dataset.
  2. OutOfRangeError: that indicates you've reached the end of the dataset. To do another epoch, you can create another iterator out of the dataset, or reset your iterator.

It'd be worth reading the guide on tf.data.
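The first suggestion can be sketched as a single function that accepts whichever split you pass in. Here a toy tf.data.Dataset stands in for the real train/test splits, and the per-batch statistic is made up for illustration:

```python
import tensorflow as tf

# Toy splits standing in for the Omniglot train/test datasets.
train_ds = tf.data.Dataset.range(6).batch(2)
test_ds = tf.data.Dataset.range(4).batch(2)

def run_epoch(dataset):
    # Same code path for either split: one pass over the dataset,
    # returning a per-batch statistic (here, the batch sum).
    return [int(tf.reduce_sum(batch)) for batch in dataset]

print(run_epoch(train_ds))
print(run_epoch(test_ds))
```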


slerman12 commented on May 22, 2024

Thanks. My last question, if it's ok:

I did as suggested, including an analogous procedure for retrieving test batches:

test_data = test_data.repeat().shuffle(10000).batch(200).prefetch(tf.data.experimental.AUTOTUNE)
example = tf.compat.v1.data.make_one_shot_iterator(test_data).get_next()
test_images, test_labels = example["image"], example["label"]

However, I'd like to just retrieve the full test set, not batches. The one shot iterator seems to be for iterating through the dataset. Is there a way I can modify the above code to simply serve the full test set?


rsepassi commented on May 22, 2024

Sure, pass batch_size=-1. That will give you a Tensor with the full split. If you want NumPy, see here.


00krishna commented on May 22, 2024

So this is interesting. I was looking for a simple end-to-end model as well. It looks like there are a couple of tutorials that use TensorFlow Datasets, but they are not the introductory tutorials. There is an advanced tutorial entitled "Load text with tf.data" which uses TensorFlow Datasets and runs a model. I think it runs a Keras embedding model, but the point is the same: to see how to get data from a Dataset into a running model. There is another tutorial that I have linked below. The odd thing is that the introductory tutorial for text still uses the Keras version of the IMDB dataset, so that should probably be fixed as well.

There really should be a cross-reference to this existing tutorial on the TF Datasets website. I spent a good part of the day looking for something like this, only to find it this evening :).

https://www.tensorflow.org/alpha/tutorials/load_data/text
https://www.tensorflow.org/alpha/tutorials/text/text_classification_rnn

NOTE: You won't need to use the tf.compat.v1.data.make_one_shot_iterator() function as mentioned in a previous note. The tutorials show better ways to get the data into the model.

