[python-package] Early stopping callback added when early_stopping_round = 0 (microsoft/lightgbm)

Comments (15)

jmoralez commented on June 24, 2024

I think that's precisely the problem, that the documentation says early_stopping_round<=0 means disable and we're enabling it there, so maybe the check should be if params.get("early_stopping_round", 0) > 0:
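
To make the difference concrete, here is a minimal sketch in plain Python (no LightGBM required). The helper names `presence_check` and `value_check` are made up for illustration; only the two conditions themselves come from this thread.

```python
from typing import Any, Dict


def presence_check(params: Dict[str, Any]) -> bool:
    # Condition discussed in this thread: true whenever the key exists,
    # regardless of its value.
    return "early_stopping_round" in params


def value_check(params: Dict[str, Any]) -> bool:
    # Suggested replacement: true only for a positive value.
    # Assumes a None value has already been removed from params.
    return params.get("early_stopping_round", 0) > 0


for rounds in (5, 0, -1):
    params = {"early_stopping_round": rounds}
    print(rounds, presence_check(params), value_check(params))
```

For 0 and -1 the presence check still answers yes, while the value check answers no, which matches the documented meaning of non-positive values.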

from lightgbm.

ddelzell commented on June 24, 2024

Thanks for the quick reply, James. Yes, I would be happy to submit a PR for this. It saves us a PR on our end to not have to code a workaround! I'll give it a whirl and you can see what you think!

jameslamb commented on June 24, 2024

No problem, happy to help! We have an R package here too and lots to do on it if you'd ever like to help out with that 😊

We already have a test checking that error message, so you'll have to just modify that.

LightGBM/tests/python_package_test/test_callback.py

Lines 24 to 32 in 28536a0

def test_early_stopping_callback_rejects_invalid_stopping_rounds_with_informative_errors():
    with pytest.raises(ValueError, match="stopping_rounds should be an integer and greater than 0. got: 0"):
        lgb.early_stopping(stopping_rounds=0)
    with pytest.raises(ValueError, match="stopping_rounds should be an integer and greater than 0. got: -1"):
        lgb.early_stopping(stopping_rounds=-1)
    with pytest.raises(ValueError, match="stopping_rounds should be an integer and greater than 0. got: neverrrr"):
        lgb.early_stopping(stopping_rounds="neverrrr")

For testing the rest of this change (that passing "early_stopping_rounds": 0 or "early_stopping_rounds": -1 through params works), follow the example in this test:

LightGBM/tests/python_package_test/test_engine.py

Line 913 in 28536a0

@pytest.mark.parametrize("first_metric_only", [True, False])

Add a new test under that one, but passing "early_stopping_rounds": 0 through params, and assertions confirming that LightGBM did the right thing.

Avoid duplicating all the test code for different values (e.g. 0, -1, -10, None) by using pytest.mark.parametrize.

LightGBM/tests/python_package_test/test_engine.py

Line 118 in 28536a0

 @pytest.mark.parametrize("objective", ["regression", "regression_l1", "huber", "fair", "poisson", "quantile"]) 
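
As a rough sketch of what such a parametrized test could look like, the example below uses a hypothetical helper, `early_stopping_requested`, as a stand-in for the check inside train() and cv(). A real test would call lgb.train() and assert on the resulting booster instead; the helper and the test names are assumptions made only to keep the sketch runnable without LightGBM.

```python
from typing import Any, Dict

import pytest


def early_stopping_requested(params: Dict[str, Any]) -> bool:
    # Hypothetical stand-in for the library check: early stopping is on
    # only when a positive value was supplied.
    rounds = params.get("early_stopping_round")
    return rounds is not None and rounds > 0


@pytest.mark.parametrize("rounds", [0, -1, -10, None])
def test_non_positive_rounds_disable_early_stopping(rounds):
    assert not early_stopping_requested({"early_stopping_round": rounds})


@pytest.mark.parametrize("rounds", [1, 5])
def test_positive_rounds_enable_early_stopping(rounds):
    assert early_stopping_requested({"early_stopping_round": rounds})
```

One decorated function covers every value in the list, so adding another case means adding one element rather than another copy of the test body.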

ddelzell commented on June 24, 2024

Alright, draft PR submitted on a forked copy of the repo. @jameslamb it looks like your review was already requested, but just in case I'm leaving a comment so you see it. Thanks!

jameslamb commented on June 24, 2024

Thanks for using LightGBM and for the report @ddelzell ! I agree, this is a bug.

This condition is only checking for the presence of early_stopping_round in params, instead of its value ...

LightGBM/python-package/lightgbm/engine.py

Line 239 in 28536a0

if "early_stopping_round" in params:

... because it's assuming that if early_stopping_round was not provided or was set explicitly to None, that it will have already been removed by this:

LightGBM/python-package/lightgbm/engine.py

Lines 185 to 192 in 28536a0

# setting early stopping via global params should be possible
params = _choose_param_value(
    main_param_name="early_stopping_round",
    params=params,
    default_value=None,
)
if params["early_stopping_round"] is None:
    params.pop("early_stopping_round")

But doing that .pop() means that whether you passed None or 0 or -1 or -1000 for early_stopping_rounds is thrown away and not saved in the model file. So I think the right fix will be to update these two conditions (one in train(), one in cv()):

LightGBM/python-package/lightgbm/engine.py

Line 239 in 28536a0

if "early_stopping_round" in params:

LightGBM/python-package/lightgbm/engine.py

Line 763 in 28536a0

if "early_stopping_round" in params:

To check the value.

Are you interested in contributing that fix?

It'd mean doing the following:

updating those 2 checks
adding new unit tests for early stopping that capture this specific case

There are some tips on how to contribute in #6350, and you could open a draft pull request and @ me or the other maintainers for help with the process.

jameslamb commented on June 24, 2024

It'd also be worthwhile, before we consider this issue fixed by any PRs, to check whether the R package is also not treating params[["early_stopping_round"]] <= 0 correctly.

ddelzell commented on June 24, 2024

@jameslamb FYI I have my dev environment set up and all the unit tests pass except for those in test_arrow.py with the same errors reported here. So I'm going to follow your previous advice and just ignore that test. Sound good?

ALSO, just to confirm we are on the same page. We want to remove the pop so that we always keep the parameter's user-set value and just check if it's >0 before adding the callback.

jameslamb commented on June 24, 2024

... just ignore that test. Sound good?

That's totally fine... this shouldn't affect Arrow functionality and we can rely on Continuous Integration to confirm that.

But if you want to see the entire test suite pass, follow the advice a few more comments down (#6350 (comment)) and install pyarrow in your local dev environment.

We want to remove the pop

No, that .pop() only runs if early_stopping_round was not provided at all. It should be kept.

I was thinking we'd replace this

if "early_stopping_round" in params:
  callbacks_set.add(
    callback.early_stopping(

With this:

if params.get("early_stopping_round", 0) > 0:
  callbacks_set.add(
    callback.early_stopping(

But actually, let's slow down for a second... could you share a minimal example showing a way that you used lightgbm code with early_stopping_round = 0 where it did not raise an exception?

I'm wondering how this error wasn't raised:

LightGBM/python-package/lightgbm/callback.py

Lines 283 to 284 in 28536a0

if not isinstance(stopping_rounds, int) or stopping_rounds <= 0:
    raise ValueError(f"stopping_rounds should be an integer and greater than 0. got: {stopping_rounds}")

Sorry, forgot about that check until today.

ddelzell commented on June 24, 2024

OK, so to your first point. Is this .pop command really to take care of an invalid value for early_stopping_round? I'd agree with that given the documentation. So we DON'T want to keep that key if the user (for whatever reason) set the value to 'None'. Right?

Your 2 code chunks are the same. typo? I was going to replace

if "early_stopping_round" in params:

with

if params.get("early_stopping_round", 0) > 0:

That would work if a None key was already removed.
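
The point about None can be illustrated with a small sketch in plain Python. The helper `drop_unset_early_stopping` is hypothetical; it only mirrors the existing .pop() on a copy of the dict and is not LightGBM code.

```python
from typing import Any, Dict


def drop_unset_early_stopping(params: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical helper mirroring the existing .pop(): remove the key
    # only when its value is None, and work on a copy.
    cleaned = dict(params)
    if cleaned.get("early_stopping_round") is None:
        cleaned.pop("early_stopping_round", None)
    return cleaned


raw = {"early_stopping_round": None}

try:
    raw.get("early_stopping_round", 0) > 0
except TypeError as err:
    # Python 3 refuses to order None against an int.
    print(f"without the pop: TypeError: {err}")

cleaned = drop_unset_early_stopping(raw)
# With the key gone, .get() falls back to 0 and the comparison is safe.
print(cleaned.get("early_stopping_round", 0) > 0)
```

So the value check relies on the None case having been handled first; values such as 0 or -1 pass through the helper untouched and are simply evaluated as "not greater than 0".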

ddelzell commented on June 24, 2024

Ah, just saw new comment. So I think we agree!

ddelzell commented on June 24, 2024

@jameslamb to your last question, that's exactly the exception that was raised for me when early_stopping_round = 0.

jameslamb commented on June 24, 2024

that's exactly the exception that was raised for me

Sorry, trying to do too many things at once 😅 .

I was thinking about this first sentence in your description: "when early_stopping_round=0, the callback that initiates early stopping is added". That couldn't be true, because that error should be raised when calling callback.early_stopping(). But re-reading it, I understand now that you were not saying "it is successfully added" but more like "lightgbm attempts to add it". My fault!

Your 2 code chunks are the same. typo?

Yes sorry, hit ENTER too fast 😅 . I just edited it and I agree with you and @jmoralez , it would be params.get("early_stopping_round", 0) > 0.

But now that I'm looking closely at it ... I think we also should change the constructor of the early stopping callback like this.

Before:

if not isinstance(stopping_rounds, int) or stopping_rounds <= 0:
    raise ValueError(f"stopping_rounds should be an integer and greater than 0. got: {stopping_rounds}")

self.stopping_rounds = stopping_rounds
self.enabled = True

After:

if not isinstance(stopping_rounds, int):
    raise ValueError(f"stopping_rounds should be an integer. Got {type(stopping_rounds)}")

self.stopping_rounds = stopping_rounds

if stopping_rounds > 0:
    self.enabled = True
else:
    self.enabled = False
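
For anyone who wants to try the proposed behaviour in isolation, here is a minimal runnable sketch. `EarlyStoppingSketch` is a hypothetical stand-in class, not LightGBM's real callback; it only reproduces the validation and the enabled flag from the "After" snippet.

```python
class EarlyStoppingSketch:
    """Hypothetical stand-in, not LightGBM's real callback class."""

    def __init__(self, stopping_rounds: int) -> None:
        if not isinstance(stopping_rounds, int):
            raise ValueError(f"stopping_rounds should be an integer. Got {type(stopping_rounds)}")
        self.stopping_rounds = stopping_rounds
        # Non-positive values no longer raise; they just switch the callback off.
        self.enabled = stopping_rounds > 0


for rounds in (5, 0, -1):
    print(rounds, EarlyStoppingSketch(rounds).enabled)
```

If the constructor changed along these lines, the existing test that expects a ValueError for 0 and -1 would have to be updated, as noted earlier in the thread.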

It's possible (and actually encouraged) to import lightgbm.early_stopping and pass it directly to the callbacks list, instead of initializing it via passing early_stopping_round in params.

Like this

LightGBM/examples/python-guide/sklearn_example.py

Line 25 in 28536a0

 gbm.fit(X_train, y_train, eval_set=[(X_test, y_test)], eval_metric="l1", callbacks=[lgb.early_stopping(5)]) 

Is this .pop command really to take care of an invalid value for early_stopping_round?

It helps to handle the case where "early_stopping_round": None was passed through by user code, but that's not the only reason we use that there.

Most lightgbm parameters can be supplied via several different aliases. For example, if you look at https://lightgbm.readthedocs.io/en/latest/Parameters.html#early_stopping_round it lists any of these as equivalent:

early_stopping_round
early_stopping_rounds
early_stopping
n_iter_no_change

In addition to that, wrappers like the Python and R packages also have keyword arguments in some of their APIs which conflict with the other LightGBM parameters. For example, in the train() function in the Python package, these 2 calls are equivalent:

lgb.train(..., num_boost_round=10)
lgb.train(..., params={"num_iterations": 10})

So code in those interfaces often has to resolve multiple competing sources of configuration for the same value. In roughly this order of precedence (latest item in the list, if present, wins):

keyword argument passed to function (like num_boost_round=10 in train())
any alias for that value passed through params dictionary (e.g. {"num_tree": 10})
the "main" parameter passed through params (e.g. {"num_iterations": 10})
- the "main" one is the one furthest to the left in the docs, not included in "aliases: ", e.g. num_iterations at https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_iterations

Since that's done so frequently, we centralized that logic in an internal function _choose_param_value().

LightGBM/python-package/lightgbm/basic.py

Line 626 in 28536a0

 def _choose_param_value(main_param_name: str, params: Dict[str, Any], default_value: Any) -> Dict[str, Any]: 

It takes in a params dictionary and returns an updated copy of it, which is why the call sites assign the result back to params.

So this call right before the .pop() we're discussing ...

LightGBM/python-package/lightgbm/engine.py

Lines 186 to 190 in 28536a0

params = _choose_param_value(
    main_param_name="early_stopping_round",
    params=params,
    default_value=None,
)

... will add "early_stopping_round": None to the params dict if there were not any instances of early_stopping_round or its aliases passed through params.

The .pop() after is there to remove it in such instances.
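
The resolve-then-pop sequence can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the real _choose_param_value(): the alias table is hard-coded from the list above, and `choose_param_value_sketch` is a made-up name. The sketch works on a copy of the dict; that is a choice made here to keep the example free of side effects.

```python
from copy import deepcopy
from typing import Any, Dict, List

# Aliases copied from the list in this comment; the table itself is a
# simplification for this sketch, not LightGBM's internal alias registry.
ALIASES: Dict[str, List[str]] = {
    "early_stopping_round": ["early_stopping_rounds", "early_stopping", "n_iter_no_change"],
}


def choose_param_value_sketch(main_param_name: str, params: Dict[str, Any], default_value: Any) -> Dict[str, Any]:
    params = deepcopy(params)  # leave the caller's dict untouched
    aliases = ALIASES.get(main_param_name, [])
    if main_param_name not in params:
        # Fall back to the first alias found, else to the default.
        for alias in aliases:
            if alias in params:
                params[main_param_name] = params[alias]
                break
        else:
            params[main_param_name] = default_value
    # Only the main name survives.
    for alias in aliases:
        params.pop(alias, None)
    return params


for user_params in ({"n_iter_no_change": 0}, {"learning_rate": 0.1}):
    resolved = choose_param_value_sketch("early_stopping_round", user_params, None)
    if resolved["early_stopping_round"] is None:
        resolved.pop("early_stopping_round")  # nothing was supplied: drop the placeholder
    print(resolved)
```

In the first case the alias value 0 is carried over to the main name and survives the pop; in the second case nothing was supplied, so the None placeholder is added and then removed again.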

jameslamb commented on June 24, 2024

Hope that helps. Sorry this is so complicated.

Parameter resolution is a particularly complex part of using LightGBM, and especially its Python and R packages. There's a higher-than-is-probably-necessary level of flexibility in these interfaces that has been in the libraries for a while, which we've chosen to preserve to avoid breaking existing code relying on it.

If you see any opportunities to make non-breaking simplifications while you're looking through the code, we'd welcome that 😊

ddelzell commented on June 24, 2024

Wow, that was thorough, and as a relatively new Python user (I was an R-exclusive academic until about 1.5 years ago) super, super informative. Thank you!

I'll make the change to the callback, as well.

And lastly, any input on the unit test(s) (also a relatively new thing for me)? I'm guessing I want to test if early stopping stays disabled if it's 0? Also check that I get the value error if it's not an integer? And should that be a type error, not a value error?

jameslamb commented on June 24, 2024

🙌🏻 thanks so much! We can move the discussion over there.
