
How does the first test reward come before the first epoch? (tianshou, open issue)

zichunxx commented on May 18, 2024

How does the first test reward come before the first epoch?

Comments (1)

Trinkle23897 commented on May 18, 2024

https://github.com/copilot/c/91bb6b3c-b325-4400-ba0e-85e87af043f7

Q: where does it perform test_episode before first train step?

A: In the BaseTrainer class from the tianshou.trainer.base module, testing is performed before the first training step in the reset method. This is executed when the trainer object is initialized or made into an iterable with __iter__.

Here is the relevant code snippet for clarification from the reset method in the class BaseTrainer:

    def reset(self) -> None:
        """Initialize or reset the instance to yield a new iterator from zero."""
        self.is_run = False
        self.env_step = 0
        if self.resume_from_log:
            (
                self.start_epoch,
                self.env_step,
                self.gradient_step,
            ) = self.logger.restore_data()

        self.last_rew, self.last_len = 0.0, 0
        self.start_time = time.time()
        if self.train_collector is not None:
            self.train_collector.reset_stat()

            if self.train_collector.policy != self.policy or self.test_collector is None:
                self.test_in_train = False

        if self.test_collector is not None:    # Testing starts here
            assert self.episode_per_test is not None
            assert not isinstance(self.test_collector, AsyncCollector)  # Issue 700
            self.test_collector.reset_stat()
            test_result = test_episode(
                self.policy,
                self.test_collector,
                self.test_fn,
                self.start_epoch,
                self.episode_per_test,
                self.logger,
                self.env_step,
                self.reward_metric,
            )
            ...

As shown, if a test_collector is provided, reset calls test_episode before the first training step, so the untrained policy is evaluated with the test collector and the first test reward is logged under start_epoch, before any epoch has run. If a train_collector is provided, reset also calls reset_stat on it to clear any previous statistics. If the train collector's policy differs from the trainer's policy, or no test_collector is provided, test_in_train is set to False, so no testing is performed during the training phase.
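The ordering described above can be reproduced with a small stand-alone sketch. This is not tianshou code: ToyTrainer, evaluate, max_epoch, and log are hypothetical names chosen for illustration, and the sketch assumes only what the answer states, namely that reset runs when the trainer is turned into an iterator and that reset evaluates once before any training.

```python
from typing import Callable, List, Optional, Tuple


class ToyTrainer:
    """Hypothetical stand-in for an epoch-based trainer; not tianshou's API."""

    def __init__(self, evaluate: Optional[Callable[[], float]], max_epoch: int) -> None:
        self.evaluate = evaluate  # plays the role of the test collector
        self.max_epoch = max_epoch
        self.epoch = 0
        self.log: List[Tuple[int, float]] = []  # (epoch, test reward)

    def reset(self) -> None:
        """Start from zero and, if possible, record a baseline test reward."""
        self.epoch = 0
        self.log = []
        if self.evaluate is not None:
            # Evaluation happens here, before any training step has run.
            self.log.append((self.epoch, self.evaluate()))

    def __iter__(self) -> "ToyTrainer":
        # Creating the iterator triggers reset, hence the early test.
        self.reset()
        return self

    def __next__(self) -> int:
        if self.epoch >= self.max_epoch:
            raise StopIteration
        self.epoch += 1
        # A real trainer would collect data and update the policy here.
        if self.evaluate is not None:
            self.log.append((self.epoch, self.evaluate()))
        return self.epoch


# Fake test rewards, consumed one per evaluation call.
rewards = iter([0.1, 0.5, 0.9])
trainer = ToyTrainer(evaluate=lambda: next(rewards), max_epoch=2)
epochs = list(trainer)
print(trainer.log)
```

With two epochs the log holds three entries, the first tagged with epoch 0. That extra entry corresponds to the test reward that appears before the first epoch in the training output.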
