Comments (14)
Hey, you can find the tutorials here: https://github.com/tensorflow/docs/tree/master/site/en/r2/tutorials/quickstart
from datasets.
So the only line you're missing is to get an iterator:
import tensorflow as tf
import tensorflow_datasets as tfds

data, info = tfds.load("omniglot", with_info=True)
train_data, test_data = data['train'], data['test']
train_data = train_data.repeat().shuffle(10000).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
example = tf.compat.v1.data.make_one_shot_iterator(train_data).get_next()
# These are Tensors
image, label = example["image"], example["label"]
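In graph mode (TF1-style, no eager execution), those tensors are consumed by running them in a session rather than through a feed dict. A minimal sketch of the same pipeline, using a tiny synthetic dataset in place of omniglot so it runs without a download (the shapes are made up for illustration):

```python
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # graph mode: tensors are evaluated in a session

# Stand-in for tfds.load("omniglot")["train"]: a dict-structured dataset
# with the same {"image", "label"} keys.
train_data = tf.data.Dataset.from_tensor_slices({
    "image": np.zeros([100, 28, 28, 1], dtype=np.float32),
    "label": np.arange(100, dtype=np.int64),
})
train_data = train_data.repeat().shuffle(100).batch(32)

example = tf.compat.v1.data.make_one_shot_iterator(train_data).get_next()
image, label = example["image"], example["label"]

with tf.compat.v1.Session() as sess:
    # Each sess.run() advances the iterator and yields the next batch,
    # so no placeholders or feed_dict are needed.
    image_np, label_np = sess.run([image, label])
    print(image_np.shape)  # (32, 28, 28, 1)
```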
Would the process be analogous for training on Omniglot?
Could an example be provided without the use of eager execution? I'm used to using placeholders. Would I have to first use the numpy API and pass it to a feed dict?
Here was my guess:
I would ordinarily use two placeholders called image and label. I want to replace that workflow with TensorFlow Datasets. Here is how I tried to get those same objects with the Datasets workflow:
data, info = tfds.load("omniglot", with_info=True)
train_data, test_data = data['train'], data['test']
train_data = train_data.repeat().shuffle(10000).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
image, label = train_data["image"], train_data["label"]
Thank you!
A few more questions, if it's alright...
How do I do the same thing but for testing? I'm not super familiar with TensorFlow's data API. I imagine I wouldn't need the batch iterator. Ideally, I would have a training, validation, and testing set. Is there a way I could split the training data once more so that I have some data reserved for the validation set? Is there a way I can check if the train batch iterator completed an epoch so that I can print results at that time (using a regular print statement)?
So my questions are:
- How do I achieve the same thing but with the test data?
- How do I partition into training and validation?
- How do I dynamically print validation results every epoch?
Thank you so much if you can help me with these three areas.
- Splits: Pass split="train" or split="test" to get different splits. Different datasets expose different splits. If you need finer-grained splits, see the [Splits documentation](https://www.tensorflow.org/datasets/splits).
- See 1.
- You can train on multiple epochs by calling dataset.repeat(5) (to train on 5 epochs). tf.data unfortunately does not expose anything special for the end of an epoch, but with TFDS, every dataset has stats on the number of examples per split (info.splits["train"].num_examples), so you can compute how many steps are in your epoch (depending on your batch size). You could also not call dataset.repeat() and instead recreate the input pipeline at the top of each epoch. In that version, at the end of the dataset, tf.data will raise an OutOfRangeError, which you can catch and which means that you reached the end of the dataset/epoch.
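The steps-per-epoch bookkeeping described above is plain arithmetic. A sketch, assuming a batch size of 32 and using a hard-coded example count in place of info.splits["train"].num_examples (so it runs without downloading the dataset):

```python
import math

# In practice: num_examples = info.splits["train"].num_examples from tfds.load.
# Hard-coded here to stay self-contained; 19280 is omniglot's train split size.
num_examples = 19280
batch_size = 32

# With drop_remainder=False, the last (partial) batch still counts as a step.
steps_per_epoch = math.ceil(num_examples / batch_size)

# On a .repeat()ed dataset, print results whenever a step count crosses
# an epoch boundary.
for step in range(steps_per_epoch * 3):  # e.g. 3 epochs
    if (step + 1) % steps_per_epoch == 0:
        epoch = (step + 1) // steps_per_epoch
        print(f"end of epoch {epoch}")
```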
Thank you. For 1, I meant, how can I substitute my training dataset with the testing dataset? For example, with regular placeholders, I would use a feed dict to feed first the training data into the placeholders and then the testing data in order to get testing results. Here, my only option seems to be copying the full set of code for a testing phase. Is there a way I can substitute the training variables with testing variables (sans the iterator/shuffling/batching)?
Also, if I catch an OutOfRangeError, how do I continue iterating through the dataset after that? Will repeating still happen automatically and the error go away?
- You can have a function that takes the dataset and first call it with the training dataset, then the testing dataset.
- OutOfRangeError: that indicates you've reached the end of the dataset. To do another epoch, you can create another iterator out of the dataset, or reset your iterator.
It'd be worth reading the guide on tf.data.
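Both suggestions can be combined into one small sketch: a helper that builds batch tensors from any {"image", "label"} split, so the train and test datasets share a single code path, plus an epoch loop that catches OutOfRangeError on an un-repeated dataset. A synthetic dataset stands in for a TFDS split, and the helper name is made up:

```python
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

def make_batch_tensors(dataset, batch_size):
    """Build per-batch tensors from any {"image", "label"} dataset,
    so the same function serves both the train and the test split."""
    dataset = dataset.batch(batch_size)  # no .repeat(): one pass == one epoch
    example = tf.compat.v1.data.make_one_shot_iterator(dataset).get_next()
    return example["image"], example["label"]

# Synthetic stand-in for a TFDS split, to keep the sketch self-contained.
ds = tf.data.Dataset.from_tensor_slices({
    "image": np.zeros([10, 4, 4, 1], dtype=np.float32),
    "label": np.arange(10, dtype=np.int64),
})
image, label = make_batch_tensors(ds, batch_size=4)

num_batches = 0
with tf.compat.v1.Session() as sess:
    try:
        while True:
            sess.run([image, label])
            num_batches += 1
    except tf.errors.OutOfRangeError:
        # End of the un-repeated dataset: one full epoch is done.
        print(f"epoch finished after {num_batches} batches")
```

Note that a one-shot iterator cannot be reset; to run another epoch you would call make_batch_tensors again (or switch to tf.compat.v1.data.make_initializable_iterator and re-run its initializer).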
Thanks. My last question, if it's ok:
I did as suggested, including an analogous procedure for retrieving test batches:
test_data = test_data.repeat().shuffle(10000).batch(200).prefetch(tf.data.experimental.AUTOTUNE)
example = tf.compat.v1.data.make_one_shot_iterator(test_data).get_next()
test_images, test_labels = example["image"], example["label"]
However, I'd like to just retrieve the full test set, not batches. The one shot iterator seems to be for iterating through the dataset. Is there a way I can modify the above code to simply serve the full test set?
Sure, pass batch_size=-1. That will give you a Tensor with the full split. If you want NumPy, see here.
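What batch_size=-1 does is batch the entire split in one step, yielding a single tensor per feature. A self-contained sketch of the same idea (the tfds call itself appears only in a comment, since it requires downloading the dataset):

```python
import numpy as np
import tensorflow as tf

# The tfds equivalent (not run here, needs a download):
#   data = tfds.load("omniglot", batch_size=-1)
#   test_images = data["test"]["image"]  # one Tensor holding the full split
#
# The same effect with a plain tf.data pipeline: batch the whole split at once.
num_examples = 50
ds = tf.data.Dataset.from_tensor_slices({
    "image": np.zeros([num_examples, 4, 4, 1], dtype=np.float32),
    "label": np.arange(num_examples, dtype=np.int64),
})
# Eager mode: pull the single batch that contains every example.
full = next(iter(ds.batch(num_examples)))
test_images, test_labels = full["image"], full["label"]
print(test_images.shape)  # (50, 4, 4, 1)
```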
So this is interesting. I was looking for a simple end-to-end model as well. It looks like there are a couple of tutorials that use TensorFlow Datasets, but they are not the introductory tutorials. There is an advanced tutorial entitled "Load text with tf.data" which uses TensorFlow Datasets and runs a model. I think it runs a Keras embedding model, but the point is the same: to see how to get data from a Dataset into a running model. There is another tutorial that I have linked below. The odd thing is that the introductory tutorial for text still uses the Keras version of the IMDB dataset, so that should probably be fixed as well.
There really should be a cross-reference to this existing tutorial on the TF Datasets website. I spent a good part of the day looking for something like this, only to find it this evening :).
https://www.tensorflow.org/alpha/tutorials/load_data/text
https://www.tensorflow.org/alpha/tutorials/text/text_classification_rnn
NOTE: You won't need to use the tf.compat.v1.data.make_one_shot_iterator() function as mentioned in a previous note. The tutorials show better ways to get the data into the model.
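The iterator-free route those tutorials take is to hand the tf.data pipeline straight to Keras, which drives the iteration itself. A minimal sketch under that assumption, with synthetic data standing in for a TFDS split (the shapes and layer sizes are made up):

```python
import numpy as np
import tensorflow as tf

# Stand-in for a TFDS split; in practice this would come from tfds.load(...)
# mapped to (image, label) tuples, e.g.:
#   ds.map(lambda ex: (ex["image"], ex["label"]))
images = np.zeros([64, 8, 8, 1], dtype=np.float32)
labels = np.zeros([64], dtype=np.int64)
train_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(16)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(8, 8, 1)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit consumes the dataset directly: no placeholders, no manual iterator.
history = model.fit(train_ds, epochs=1, verbose=0)
```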