
Robyn Refresh error about robyn HOT 41 OPEN

bart-vanvlerken commented on May 29, 2024
Robyn Refresh error

from robyn.

Comments (41)

bart-vanvlerken commented on May 29, 2024

@laresbernardo that works, thanks for the fix!

shuvayan commented on May 29, 2024

I am also getting the same error while refreshing the model :
image

I think the issue is with the robyn_chain function, as the "json_new$InputCollect$refreshSourceID" part returns NULL.
Maybe it is not able to read the correct file.

laresbernardo commented on May 29, 2024

There are a couple of bugs I've found in the refresh functionality, and I am working on fixes on my end. Can you try updating Robyn with Robyn::robyn_update(ref = "bl01") and see how it goes? If it goes well, I can merge to main afterwards.

gufengzhou commented on May 29, 2024

Your second screenshot might be caused by too few iterations for the refresh. Can you increase refresh_iters and retry? For the other errors, I'm waiting for the results of Bernardo's fixes first.

laresbernardo commented on May 29, 2024

@shuvayan @bart-vanvlerken we are planning on landing a stable CRAN version this week. Please let us know ASAP whether the issue persists with the branch version I shared so we can either move forward or hold it.

bart-vanvlerken commented on May 29, 2024

@laresbernardo @gufengzhou I'm going to try Bernardo's latest version today and will let you know! Thanks for all the help.

bart-vanvlerken commented on May 29, 2024

Update. I updated to Bernardo's latest version, and the refresh functionality works again.
image

However, I'm running into the same error: 'Provided train_size but ts_validation = FALSE. Time series validation inactive.' even though I set ts_validation = TRUE in my base model. In addition, I'm getting two peculiar new warning messages (3 & 4, see screenshot below).
image

In addition, the output tells me it selected 2_69_2 as the optimal model and exported it as JSON, but I'm not seeing this model as a one-pager in the output folder. When trying to produce the one-pager with robyn_csv(), I'm getting the following error:
image
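For context on the train_size warning, here is a rough, self-contained sketch of the three-way split that ts_validation = TRUE is documented to control. The exact proportions (remainder split evenly between validation and test) are an assumption for illustration, not verified against Robyn's source:

```r
# Hypothetical illustration of a train/validation/test split driven by a
# train_size hyperparameter (assumed scheme, not Robyn source code).
n <- 104                          # weeks in the modeling window
train_size <- 0.8                 # e.g. the train_size hyperparameter
n_train <- floor(n * train_size)  # training weeks
n_val   <- floor((n - n_train) / 2)
n_test  <- n - n_train - n_val    # whatever remains
c(train = n_train, validation = n_val, test = n_test)  # 83 10 11
```

With ts_validation = FALSE, no such split applies, which is why supplying train_size alone triggers the warning.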

laresbernardo commented on May 29, 2024

Thanks for confirming @bart-vanvlerken. So these are all warnings, not errors, which is good because you can still refresh the model. About the warnings:

  • Intended: The first warning refers to the ts_validation parameter that uses the train_size hyperparameter. If the original model had ts_validation = FALSE, then the refreshed model does too.
  • Intended: The second warning is based on the calibration constraints and recommends more iterations.
  • Intended: We haven't been able to replicate this, but when the structure changes (i.e., you refresh a model and store it elsewhere, not within the original model's folder), you can get these warnings. It will still work but won't have the origin information. I can see in the logs that the rf1 folder was created correctly, but it's not inside an original model's folder.
  • May or may not be intended: probably a consequence of the previous point (no chain, so it can't create that data.frame). We could get rid of this one with a quick fix. Checking the origin: Robyn:::refresh_plots_json()

By default, Robyn creates one-pagers for the models with the lowest combined errors per cluster. The default selected model on refresh is the one with the lowest DECOMP.RSSD error (NRMSE is not used in the criteria), which won't necessarily match. We could improve this behavior in future versions, but it's not exactly an error. You can recreate any one-pager with robyn_onepagers() ;)

About the file: did you check that specific folder for the CSV file created and shown in the log? Was it not created?
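The selection logic described above can be illustrated with made-up numbers (the combined-error metric shown is a stand-in for illustration, not Robyn's exact formula): the refresh winner minimizes DECOMP.RSSD alone, so it can differ from the combined-error picks that get one-pagers.

```r
# Toy example: why the refresh winner may lack a one-pager.
models <- data.frame(
  solID       = c("2_69_2", "1_45_3", "3_12_8"),
  nrmse       = c(0.09, 0.04, 0.07),
  decomp.rssd = c(0.02, 0.10, 0.05)
)
# Refresh winner: lowest DECOMP.RSSD only (NRMSE ignored)
winner <- models$solID[which.min(models$decomp.rssd)]
# One-pager candidates: lowest combined error (illustrative Euclidean norm)
combined <- sqrt(models$nrmse^2 + models$decomp.rssd^2)
best_combined <- models$solID[which.min(combined)]
winner         # "2_69_2"
best_combined  # "3_12_8" -- a different model, hence no one-pager for the winner
```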

bart-vanvlerken commented on May 29, 2024

Hi @laresbernardo, you're right that they are warnings and not errors. However, I did specify ts_validation = TRUE in my base model (and in the exported JSON), so I find it strange that robyn_refresh() does not recognize this, when in the current CRAN version this is not an issue.

Regarding the use of robyn_csv() instead of robyn_onepagers(), that was my bad! Thank you for clearing that up :)

Regarding the exported one-pagers in the refresh: it exported four one-pagers, and the optimal model (which was also exported as CSV) was not one of them.

laresbernardo commented on May 29, 2024

Alright @bart-vanvlerken would you mind updating to my branch once again and retrying? I've just deployed a fix to the ts_validation issue.

On the one-pagers, as I mentioned before:

By default, Robyn creates one-pagers for the models with the lowest combined errors per cluster. The default selected model on refresh is the one with the lowest DECOMP.RSSD error (NRMSE is not used in the criteria), which won't necessarily match. We could improve this behavior in future versions, but it's not exactly an error. You can recreate any one-pager with robyn_onepagers() ;)

I'll try to add the selected refresh model to the list of exported one-pagers (in addition to the clustered ones) soon, and will let you know in this thread.

bart-vanvlerken commented on May 29, 2024

Hi @laresbernardo, I updated and tested. The ts_validation issue is now resolved! However, the 'optimal' model chosen by the refresh function is still not exported as a one-pager. I wanted to recreate it myself, since the JSON file was exported correctly; however, when running robyn_recreate() I ran into the following error:
image

laresbernardo commented on May 29, 2024

Alright @bart-vanvlerken, would you mind updating Robyn to my branch (bl01) and retrying? You should get the one-pager for the winning refresh model created by default.

Also, I've probably fixed the "penalty" hyperparameters check issue. Please confirm.

shuvayan commented on May 29, 2024

Hello @laresbernardo ,

I am getting the below error after updating to your branch:

Recreating model
Imported JSON file successfully: C:/Users/SD/Documents/Robyn_Modular/Robyn_202405021221_init/RobynModel-models.json

>> Running feature engineering...
NOTE: potential improvement on splitting channels for better exposure fitting. Threshold (Minimum R2) = 0.8 
  Check: InputCollect$modNLS$plots outputs
  Weak relationship for: "Audio_i", "Cable_i", "CTV_Hulu_i", "Display_i", "META_i", "Radio_i", "SA360_i", "Snapchat_i", "TikTok_i", "TV_i", "YouTube_i" and their spend
Error in UseMethod("select") : 
  no applicable method for 'select' applied to an object of class "NULL"

I am using the below code for model refresh:

RobynRefresh <- robyn_refresh(
  json_file = json_file,
  dt_input = mmm_input_ufpf,
  dt_holidays = dt_prophet_holidays,
  refresh_steps = 7,
  refresh_iters = 1000, # 1k is an estimation
  refresh_trials = 1
)

laresbernardo commented on May 29, 2024

@shuvayan I can't replicate your issue. Maybe it's related to the impressions variables. Would you mind sharing your mmm_input_ufpf CSV and the JSON file with me via email? It's my user @ gmail.com.

@bart-vanvlerken is it running OK for you?

shuvayan commented on May 29, 2024

@laresbernardo, I have sent the email, please check!

bart-vanvlerken commented on May 29, 2024

Hi @laresbernardo, I used your latest version and the one-pager is now exported correctly. However, I'm still not able to reproduce the model with the exported JSON. It gives me the following error:
image

In addition, I question the decision to optimize refreshing models based on DECOMP.RSSD since it has a poor impact on model accuracy as you can see below.
image

Finally, I see quite a discrepancy in ROIs between the base model...
image
... and the refreshed model, despite only adding 1 new observation to the data.
image
This is quite difficult to communicate to stakeholders, I hope you can have a look at this as well. Thanks for all your help so far!

laresbernardo commented on May 29, 2024

Imported JSON file successfully: C:/Users/SD/Documents/Robyn_Modular/Robyn_202405021221_init/RobynModel-models.json

@shuvayan it seems you're trying to use a JSON file that is NOT a single model. RobynModel-models.json is not a valid JSON file for recreating a model; it captures the whole iterations process. Let me check, though, whether I can replicate the issue.

In addition, I question the decision to optimize refreshing models based on DECOMP.RSSD since it has a poor impact on model accuracy as you can see below.

Finally, I see quite a discrepancy in ROIs between the base model ... and the refreshed model, despite only adding 1 new observation to the data.

@bart-vanvlerken would you mind opening new threads to discuss these? I may agree with you, but I'd like @gufengzhou to explain why he decided to change from the combined minimum error to DECOMP.RSSD only and to provide some more context.

Also, if you'd like to share the model's JSON file and CSV with me, please send them so I can try to replicate your issue with robyn_recreate().

gufengzhou commented on May 29, 2024

Hi @bart-vanvlerken , I've just tested refresh with 4 new datapoints. It looks better than what you showed. You only used 200 refresh_iters, right? That's probably not enough. I used 1k. Regarding decomposition, please look at report_decomposition.png, not the default onepager.

Regarding objectives, refresh still uses both NRMSE and DECOMP.RSSD to optimise. Only the final automated winner selection will rely on decomp, because the challenge of refresh is often the too big changes in decomp.

report_actual_fitted
report_decomposition

bart-vanvlerken commented on May 29, 2024

Hi @gufengzhou, that is odd. In the screenshots above I actually used 3 x 2000 iterations, so convergence should not have been an issue. However, it still did not converge, perhaps because I based my refresh on a calibrated model (so MAPE was active as well). Could that also be the reason we're seeing such big differences in model fit?

I only see the left [0] visualization in my report_decomposition.png file for some reason.

@laresbernardo Here is my JSON file, it's basically a configuration that aims to test most functionalities that Robyn has to offer.
RobynModel-1_1020_3.json

And the corresponding CSVs, basically the data that comes with the Robyn package:
clean_data.csv
clean_prophet.csv

laresbernardo commented on May 29, 2024

(...) However, I'm still not able to reproduce the model with the exported JSON. It gives me the following error:

@bart-vanvlerken I can't seem to reproduce your issue recreating a model. As you can see, with your JSON and CSV I was able to recreate the model with the latest version on branch "bl01":

csv <- read.csv("~/Desktop/clean_data.csv")
json_file <- "~/Desktop/RobynModel-1_1020_3.json"
temp <- Robyn::robyn_recreate(json_file, dt_input = csv)
>>> Recreating 1_1020_3
Imported JSON file successfully: ~/Desktop/RobynModel-1_1020_3.json
>> Running feature engineering...
Input data has 208 weeks in total: 2015-11-23 to 2019-11-11
Initial model is built on rolling window of 104 week: 2016-11-21 to 2018-11-12
>>> Calculating response curves for all models' media variables (5)...
Successfully recreated model ID: 1_1020_3
Warning messages:
1: In check_calibration(dt_input, date_var, calibration_input, dayInterval,  :
  Your calibration's spend (42,148) for facebook_S between 2018-05-01 and 2018-06-10 does not match your dt_input spend (~14.05K). Please, check again your dates or split your media inputs into separate media channels.
2: In check_calibration(dt_input, date_var, calibration_input, dayInterval,  :
  Your calibration's spend (2,841) for tv_S between 2018-04-03 and 2018-06-03 does not match your dt_input spend (~947). Please, check again your dates or split your media inputs into separate media channels.
3: In check_calibration(dt_input, date_var, calibration_input, dayInterval,  :
  Your calibration's spend (67,039) for facebook_S+search_S between 2018-07-01 and 2018-07-20 does not match your dt_input spend (~22.35K). Please, check again your dates or split your media inputs into separate media channels.

bart-vanvlerken commented on May 29, 2024

@laresbernardo I'm sorry, I shared the base model with you (which works fine for me as well). Could you try to reproduce the refreshed model instead? That's what gave me the error. Here is the JSON:
RobynModel-1_168_7.json

laresbernardo commented on May 29, 2024

Alright, I was now able to find the problem and have fixed it. Thanks.
Would you mind updating to "bl01" again and retrying? It was an issue when recreating a model that used the penalties parameter, @bart-vanvlerken.

laresbernardo commented on May 29, 2024

I guess in this thread the only pending issue is:

I only see the left [0] visualization in my report_decomposition.png file for some reason.

@gufengzhou would you mind checking this one? Are you able to reproduce it? Is it because it doesn't find the original models, maybe? Perhaps we should include all past models' information in refreshed models (JSON) instead of just the model IDs / chain? I'd leave this improvement as a backlog task for now.

laresbernardo commented on May 29, 2024

Changes are ready to land in the main branch. For the report_decomposition.png issue, I'd suggest we open a new, clean thread. FYI: if an error occurs during the creation of this file, the pipeline won't crash, given it's wrapped with try().
Pull Ref: #969 @gufengzhou -> pending review
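A minimal sketch of the try() safeguard mentioned (illustrative only, not the actual Robyn code path): a failure in the plot-export step is caught, so the surrounding pipeline keeps going.

```r
# Simulate a failing plot-export step wrapped in try(); the caller can
# inspect the result instead of crashing the whole refresh.
plot_result <- try(stop("could not render report_decomposition.png"),
                   silent = TRUE)
if (inherits(plot_result, "try-error")) {
  message("Plot export failed; continuing with model results.")
}
```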

laresbernardo commented on May 29, 2024

Landed to main: v3.10.7. Please update and retest. If no issues are reported in a couple of weeks we will land this version as the latest stable version in CRAN. Thanks for the feedback @bart-vanvlerken @shuvayan

bart-vanvlerken commented on May 29, 2024

Hi @laresbernardo, I just tested refreshing a model on 3.10.7 and I'm getting the following error:
image

Here is my data so you can reproduce:
RobynModel-3_185_8.json
clean_data_refresh.csv
clean_prophet.csv

laresbernardo commented on May 29, 2024

Hi @bart-vanvlerken, thanks for sharing your files. I was able to replicate the error and have fixed the issue on my end. Would you mind testing with branch "bl02" now and confirming whether it's running as expected for you? Update with robyn_update(ref = "bl02"), refresh your R session, and retry. After that, we can merge with the main branch again as v3.10.7.9000 in a new PR. Thanks!

bart-vanvlerken commented on May 29, 2024

Hi @laresbernardo the refresh functionality works, but even with one new observation of data it's generating completely different ROAS figures than the base model:
image
Interestingly, the report_decomposition.png paints a different picture, but the ROAS metrics in this visual do not correspond with the base model that was built.
image

gufengzhou commented on May 29, 2024

Hi, regarding the interpretation of refresh: the second plot is related to the file report_aggregated.csv. The concept behind refresh is that it should maintain a stable baseline compared to the initial model while reflecting the changes in the new data. You should use report_aggregated.csv and report_decomposition.png to "report" the refresh results.

Let's say you add 4 weeks; then refresh will try to find the best fit and a smaller decomp error for the added 4 weeks while keeping the refresh baseline similar to the initial model. That means, in the second plot above, 2_128_7 [0] is from the initial model (initial modeling window) and 1_450_5 [1] is from the 4 refresh weeks.

The "normal one-pagers" always refer to the entire modeling window, which is not exactly relevant for refresh. Hope it makes sense.
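The windowing idea above can be sketched with assumed dates: the initial window is reported under the initial model ([0]) and only the newly added weeks under the refresh model ([1]).

```r
# Assumed example: 104 initial weeks plus 4 refresh weeks.
weeks <- seq(as.Date("2016-11-21"), by = "week", length.out = 108)
initial_window <- weeks[1:104]    # reported as e.g. "2_128_7 [0]"
refresh_window <- weeks[105:108]  # the 4 added weeks, e.g. "1_450_5 [1]"
length(refresh_window)            # 4
```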

bart-vanvlerken commented on May 29, 2024

Hi @gufengzhou, thanks for your swift reply! There are two things confusing to me that I hope you can clear up:

  • You mention the plot below is from the modeling window of the initial model; then why are the ROAS numbers different from the initial model's one-pager (which also reports results for the modeling window)?
    image
    image
  • You mention the plot below is from the 4 refresh weeks of the refreshed model; then why are the ROAS numbers identical to the one-pager of the refresh (which reports results over the entire modeling window)?
    image
    image

gufengzhou commented on May 29, 2024

Hi, I just spent some time looking through the code base and checked a refresh case. I'm using the latest GitHub version. So far, the ROAS of each model is identical in the one-pagers and the report decomposition PNG.

Your result definitely doesn't look right. What caught my eye is that in this comment from you, the report_decomposition.png plot shows the initial model 2_128_7 [0] on the right side and the 1st refresh on the left side. This shouldn't be the case; I'm not sure what caused it for you. You can see in my example below that the initial model is always on the left side. Maybe you can run a quick new model with the latest version (no need for full iterations), then refresh it to see if this is still the case?

However, you're right about one thing: the ROAS of the 1st refresh (1_71_1 in my case) in report_decomposition.png shouldn't be identical to its one-pager. Right now both numbers are the ROAS of the media across the entire refresh modeling window, while for report_decomposition.png we actually want the ROAS for the new periods only. We'll make sure this gets fixed. Also FYI: unlike ROAS, the effect share (or decomposition) of 1_71_1 in report_decomposition.png is indeed reporting the new period, NOT the entire refresh modeling window.

image
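As a worked illustration of the distinction above (all numbers invented): ROAS for a reporting window is the decomposed incremental response divided by spend, restricted to that window only.

```r
# Invented numbers: 4 new refresh weeks for one media channel.
spend_new    <- c(100, 120, 90, 110)   # spend in the new weeks
response_new <- c(250, 300, 200, 260)  # decomposed incremental response
roas_new <- sum(response_new) / sum(spend_new)
round(roas_new, 2)  # 2.4 -- ROAS for the new period only
```

Computing the same ratio over the entire refresh modeling window instead would reproduce the one-pager figure, which is the mismatch being described.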

bart-vanvlerken commented on May 29, 2024

Hi @gufengzhou, great that you will work on a fix! How strange that your ROAS figures correspond and mine don't. I tried again using the latest GitHub main branch, and the visualization issue unfortunately persists as well.

Here is the model and (demo) data used to come to my findings (note that clean_data_refresh is the full dt_simulated_weekly dataset, where I used a trimmed version to build the base model).
RobynModel-1_119_3.json
clean_data_refresh.csv
clean_prophet.csv
Here is some more additional information:
image

In addition, I'm getting the following output at the end of the refresh that might have something to do with it:
image

gufengzhou commented on May 29, 2024

I found the issue. You're using both ts_validation = TRUE and add_penalty = TRUE, and there's a bug where the train_size and the penalties are not picked up when recreating the model.

I've rerun a job with ts_validation and penalty both TRUE, then exported the JSON and recreated it using the following code. Now the results are identical, which was not the case before.

library(Robyn)     # robyn_recreate
library(dplyr)     # bind_rows, select, left_join, mutate, %>%
library(jsonlite)  # read_json

json_path <- "/Users/gufengzhou/Desktop/Robyn_202405141527_init/RobynModel-1_136_7.json"
RobynRecreated <- robyn_recreate(
  json_file = json_path,
  dt_input = dt_simulated_weekly,
  dt_holidays = dt_prophet_holidays,
  quiet = FALSE)
InputCollectX <- RobynRecreated$InputCollect
OutputCollectX <- RobynRecreated$OutputCollect

get_json <- read_json(json_path)
get_json_tab <- bind_rows(sapply(get_json$ExportedModel$summary, function(x) as.data.frame(t(as.matrix(unlist(x))))))

OutputCollectX$xDecompAgg %>% 
  select(solID, rn, xDecompAgg) %>% 
  left_join(
    select(get_json_tab, rn = variable, xDecompAgg_json = decompAgg) %>% 
      mutate(xDecompAgg_json = as.numeric(xDecompAgg_json)), 
    by = "rn")

image

Can you please check? It's on the branch fix_recreate_with_penalty

bart-vanvlerken commented on May 29, 2024

Hi @gufengzhou, I've updated to the branch you mentioned, and I noticed that the refresh function does not work on recreated models: when I use the JSON file generated by recreating the base model (as done in the demo script), it gives me the following error:
image
The JSON of the base model itself works, which implies that these files representing the same model are in some way different.

Unfortunately, the main issue is not solved for me either: in report_decomposition.png the visualizations are still swapped, and the ROAS figures do not correspond with the one-pager of the base model. Could it be something else?

gufengzhou commented on May 29, 2024

This is very strange. I've just tested robyn_refresh again and it works; see my screenshot. The report_decomposition.png also looks correct. Are you 100% certain that you've got the right branch?

Also, can you verify if this script gives you the identical result?

image

I also see that you're getting the "Must provide 'hyperparameters' in robyn_inputs..." error again; I'm not sure why I don't. I do notice that whenever I use your JSON to test robyn_recreate, I don't get identical decomp between the JSON and the recreated model, but when I use JSONs exported from the latest package version, I get identical results. If you can get identical results from the script mentioned above, then we're one step closer.

bart-vanvlerken commented on May 29, 2024

@gufengzhou I will try your solution; in the meantime, I'd like to send you my R code so you can try to reproduce it. Can I mail it to you?

gufengzhou commented on May 29, 2024

sure. [email protected]

bart-vanvlerken commented on May 29, 2024

@gufengzhou I'm positive I've used your version, as you can see below:
image

Your script gives me identical results, but I'm afraid we are comparing different things: you are comparing the base model JSON with the recreated model, while I was comparing the base model JSON with the recreated base model's JSON. To produce the recreated base model JSON, I used the alternative approach mentioned in the demo script (since robyn_recreate() does not do so automatically):
image
The model JSON generated by this approach does not match, as you can see below, and generates an error when used in robyn_refresh():
Untitled

Have you had any luck reproducing my errors with the code I sent you over email? I hope we can resolve this!

gufengzhou commented on May 29, 2024

Alright... you're obviously a very thorough person :) You're looking at differences in the 5th digit, so I'm going to count this one as matched. There are several rounding steps that I don't even remember that might be the reason for this discrepancy, but I'm quite certain this is acceptable for the vast majority of use cases. I did another comparison across the original object, original JSON, recreated object, and recreated JSON; I'd count them as identical. I'll check your refresh error a bit later.

image
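Given the fifth-digit differences discussed, a tolerance-based comparison is more appropriate than exact equality when diffing exported JSON decomp figures against the in-memory object. A sketch with made-up values:

```r
# Made-up decomp values: the JSON export rounds slightly.
decomp_object <- c(0.1234567, 2.3456789)
decomp_json   <- c(0.1234600, 2.3456800)
identical(decomp_object, decomp_json)                            # FALSE
isTRUE(all.equal(decomp_object, decomp_json, tolerance = 1e-4))  # TRUE
```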

bart-vanvlerken commented on May 29, 2024

That would be great, because a client is currently not able to refresh their model (on either the CRAN version or the dev version). I've sent you the data + JSON over mail so you can inspect them at your convenience!

gufengzhou commented on May 29, 2024

After a quick check, I can't recreate the same model result from the JSON in your email (before we even get to the refresh issue). I noticed that the JSON was created with a previous version, so I suspect this is the cause. Would you please remodel it with the latest version, select a candidate comparable to your old one, then use the new JSON to check recreate & refresh? Please understand that this package is still undergoing constant development, so backwards compatibility is not always possible, even though we try.
