Comments (11)

vdumoulin commented on July 21, 2024

I think we should benchmark what performance hit we're looking at if we choose to do examplewise preprocessing.

It probably won't make much of a difference for large models (especially if we have good multithreading support to do the preprocessing in parallel), but for small models it may have a big impact.
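
A minimal sketch of such a benchmark, with a made-up preprocessing step and data shape standing in for a real transformer:

```python
# Hypothetical micro-benchmark: compare one batchwise call against a
# loop of examplewise calls for the same (made-up) preprocessing step.
import timeit

import numpy

data = numpy.random.rand(128, 784)  # one batch of 128 examples

def transform(x):
    # Stand-in preprocessing step (global standardization)
    return (x - x.mean()) / x.std()

batchwise = timeit.timeit(lambda: transform(data), number=1000)
examplewise = timeit.timeit(
    lambda: [transform(example) for example in data], number=1000)
print('batchwise: {:.3f}s, examplewise: {:.3f}s'.format(
    batchwise, examplewise))
```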

It's not clear to me yet why batchwise and examplewise preprocessing should be mutually exclusive. I haven't looked at the code long enough to get a good high-level feel for how things fit together, so the following suggestion may not be applicable, but would it be possible to require that both batchwise and examplewise preprocessing be supported, with a default batchwise implementation that simply concatenates a bunch of examplewise calls?
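
Something along these lines, as a rough sketch (class and method names are hypothetical, not actual Fuel API):

```python
# Rough sketch of the suggestion: subclasses implement get_example, and
# the default get_batch just maps it over the examples in a batch.
class Transformer(object):
    def get_example(self, example):
        raise NotImplementedError

    def get_batch(self, batch):
        # Default batchwise implementation built from examplewise calls
        return [self.get_example(example) for example in batch]
```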

bartvm commented on July 21, 2024

They're not mutually exclusive per se, although right now there is no good way of checking whether you received a single example or a batch, besides inspecting the shape of the data. This can get quite messy: you end up with code like "if it's a list but the first element is also a list, then I'm going to assume it's a batch", but some transformers should in principle work for lists, tuples, NumPy arrays, etc.
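
To illustrate how fragile that kind of heuristic gets (purely hypothetical code, not something from Fuel):

```python
import numpy

# Shape-sniffing breaks down quickly: a single 2-D example looks like a
# batch, and a list-valued example looks like a batch of lists.
def looks_like_batch(data):
    if isinstance(data, numpy.ndarray):
        return data.ndim > 1  # wrong for an example that is itself 2-D
    if isinstance(data, (list, tuple)):
        return bool(data) and isinstance(
            data[0], (list, tuple, numpy.ndarray))
    return False
```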

I'm not too keen on the idea of hard-coding an is_batch flag, although that would make it easier to implement transformers that deal with both. Transformers could then have a get_example and a get_batch method instead of the current get_data, with get_batch defaulting to get_example(example) for example in batch.

My current proposal is simply to make most transformers example-only by default. For cases where the speed-up is significant and the demand is high, we could implement a second, batchwise version, e.g. a Whiten transformer as well as a BatchWhiten transformer.

vdumoulin commented on July 21, 2024

I still need to read the code more carefully, but I think I understand what you're getting at.

Depending on the number of useful batch transformations, we may end up having lots of Transform/BatchTransform pairs, though.

bartvm commented on July 21, 2024

Mm, rather than having separate transformers, or automatically trying to deduce whether something is a batch, maybe we can introduce a flag batch=True which transformers can optionally support? Transformers that don't support it just act on examples, and those that do support it implement two methods, and use one or the other based on the value of the batch flag.
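
As a rough sketch of what that flag could look like (hypothetical names, not an actual Fuel interface):

```python
# Hypothetical dispatch on an optional batch=True flag: transformers
# that support batches define get_batch; everything else is
# examplewise-only and rejects the flag.
class Transformer(object):
    def __init__(self, batch=False):
        if batch and not hasattr(self, 'get_batch'):
            raise ValueError('this transformer does not support batches')
        self.batch = batch

    def get_data(self, data):
        # Subclasses define get_example (and optionally get_batch)
        return self.get_batch(data) if self.batch else self.get_example(data)
```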

vdumoulin commented on July 21, 2024

That would seem reasonable to me.

rizar commented on July 21, 2024

I fully agree that processing example-wise should be the predominant way of writing transformers. That will save lots of time for people writing and using them.

The idea of an optionally supported "batch mode" that can be turned on seems very reasonable.

bartvm commented on July 21, 2024

So here's an idea in slightly more detail:

  • get_data becomes get_example and get_batch
  • Each transformer takes a keyword argument batch which defaults to False. A transformer which only supports batches sets batch = True as a class attribute.
  • If a transformer doesn't have a get_batch method but batch=True was passed, no child_epoch_iterator will be set by the get_epoch_iterator method. Instead, the DataIterator will call batch = next(self.data_stream.data_stream) to retrieve the next batch and set self.data_stream.child_epoch_iterator to iter_(batch), iterating over the examples. It will then return [self.data_stream.get_example() for _ in range(len(batch))], as sketched below.
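
A rough sketch of that fallback (simplified, with a hypothetical attribute layout, and treating the wrapped stream as an iterator for brevity):

```python
# Simplified sketch of the third bullet: when the transformer only has
# get_example but a batch was requested, pull one batch from the wrapped
# stream, iterate over its examples, and reassemble a batch.
class DataIterator(object):
    def __init__(self, data_stream):
        self.data_stream = data_stream  # an example-only transformer

    def __next__(self):
        # The stream wrapped by the transformer yields batches
        batch = next(self.data_stream.data_stream)
        self.data_stream.child_epoch_iterator = iter(batch)
        return [self.data_stream.get_example() for _ in range(len(batch))]
```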

This proposal has the following limitations, but they seem sensible:

  • An example-transformer can't be applied to a batch if it needs an iteration scheme (because it's not clear whether each request applies to the entire batch, or if there should be one per example).
  • NumPy ndarrays will end up being converted to lists of arrays. I'm not sure whether to special-case this (just calling numpy.asarray on batches that were ndarrays when coming in) or to just expect the user to add a kind of AsNumpyArray transformer at the end.

bartvm commented on July 21, 2024

I've thought about it a bit more, and I'm now wondering whether we should try to handle batches intelligently at all. I can think of quite a few issues:

  • Imagine a transformer which filters examples (rejecting them based on some sort of criterion). If we feed it a batch, should the size of the batch be maintained, or should it just filter the given batch? And in the latter case, what do we do if it filters out every example in the batch?
  • Likewise for padding; it doesn't make sense to apply the Padding stream example-wise.

So perhaps the simplest solution is the best: transformers can implement two methods (get_example and get_batch). Which one is used depends on the default of the transformer or, in case both are supported, on whether the batch=True flag is set.
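
For instance, a batch-only transformer in the spirit of Padding might look like this (hypothetical sketch, not Fuel's actual Padding):

```python
import numpy

# Hypothetical batch-only transformer: padding has no sensible
# examplewise form, so only get_batch is defined and batch defaults on.
class PadToMaxLength(object):
    batch = True  # batch-only, as a class attribute

    def get_batch(self, batch):
        # Zero-pad variable-length 1-D examples to the longest in the batch
        max_length = max(len(example) for example in batch)
        padded = numpy.zeros((len(batch), max_length))
        for i, example in enumerate(batch):
            padded[i, :len(example)] = example
        return padded
```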

rizar commented on July 21, 2024

In the first case I would not support batch input at all. I think it is okay for some transformers to be example-only or batch-only, like your second example.

Your final proposal sounds good.

bartvm commented on July 21, 2024

Being addressed in #40

bartvm commented on July 21, 2024

Closed via #45 (rebase of #40).
