Comments (8)
Ah, that's also possible, as just an extra check. What about:
- `save_every_minutes`
- `log_every_minutes`
- `eval_every_minutes`

as additional arguments in `TrainingArguments`? Yeah, for the delta we can just do `(datetime - datetime).total_seconds() / 60`.
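The delta computation mentioned above can be sketched with plain `datetime` arithmetic (the timestamps here are made up for illustration):

```python
from datetime import datetime, timedelta

# Hypothetical timestamps standing in for the train start time and "now".
train_start = datetime(2024, 5, 1, 12, 0, 0)
now = train_start + timedelta(minutes=90)

# Subtracting two datetimes yields a timedelta; convert it to minutes.
elapsed_minutes = (now - train_start).total_seconds() / 60
print(elapsed_minutes)  # 90.0
```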
Yep! :D
And now it's a very simple API.
FYI @muellerzr
Seems like a good idea to me! Re: `_TRAIN_START_TIME`, we can set that when `trainer.train()` is called, I think. The callbacks have a workflow that's invoked on training begin (literally called `on_train_begin`), which only gets called once in `_inner_training_loop`, before the epoch iterations start.
@muellerzr Just an idea: maybe the start time can be added as a property to `TrainerState`? It can then be read in the `on_step_end` and `on_epoch_end` of the callbacks, since the state is passed to them. That would mean the train start time is set at the initialization of the trainer, though, so perhaps the training time in the state should be set/updated in `on_train_begin`.

So concretely:
- add `train_start_time` to `TrainerState`
- add `on_train_begin` to `DefaultFlowCallback`, which will set `train_start_time` in the state to the current time
- add logic so that, if time-based save/log/evaluate is set in the args, `on_step_end` and `on_epoch_end` will set `should_X` to true and reset the timer

If that sounds good I can give it a go.
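The three steps above could look roughly like this. This is a minimal, framework-free sketch, not the actual `transformers` API: `TimeFlowCallback`, the injectable `clock` parameter, and the `-99` sentinel (suggested later in the thread) are illustrative choices.

```python
import time

class TimeFlowCallback:
    """Toy version of the proposed flow: record a start time on train
    begin, then flip should_save once the configured number of minutes
    has elapsed, resetting the timer each time it fires."""

    UNSET = -99  # sentinel: instantiated but training not yet started

    def __init__(self, save_every_minutes, clock=time.monotonic):
        self.save_every_minutes = save_every_minutes
        self.clock = clock  # injectable for testing
        self.train_start_time = self.UNSET

    def on_train_begin(self):
        # Called once before the epoch iterations start.
        self.train_start_time = self.clock()

    def on_step_end(self):
        # Returns the would-be control.should_save flag.
        now = self.clock()
        elapsed_minutes = (now - self.train_start_time) / 60
        if elapsed_minutes >= self.save_every_minutes:
            self.train_start_time = now  # reset the timer
            return True
        return False
```

With a fake clock, a 30-minute interval fires on the step where 35 minutes have elapsed and then stays quiet until the next interval passes.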
Yes, I'm open to that!
On init we can set it to `-99` or something equivalent, to know that it's been instantiated but not started.
@muellerzr I started working on this. I am not entirely sure how to specify the interval, though. In case `IntervalStrategy == TIME`, do we assume that `logging_steps` (and save, eval) are given in minutes? I considered allowing datetime strings, but I fear that would be a typing nightmare on the CLI, so keeping it as an int seems best. WDYT?
Just an aside: to me this would be both, and better to over-save than under. The time is more of a "backup", and we keep the `epoch`- and `step`-based behavior.

For the interval time, use `timedelta`, similar to what `torch.distributed` uses for its timeout: https://docs.python.org/3/library/datetime.html#datetime.timedelta
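A `timedelta`-based interval compares directly against a datetime difference, with no unit conversion. A small sketch (the `save_interval` name and the timestamps are hypothetical):

```python
from datetime import datetime, timedelta

# Hypothetical config: save at most every 30 minutes.
save_interval = timedelta(minutes=30)

last_save = datetime(2024, 5, 1, 12, 0, 0)
now = datetime(2024, 5, 1, 12, 45, 0)

# timedeltas compare directly, so the check is a single expression.
should_save = (now - last_save) >= save_interval
print(should_save)  # True
```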