Hi, I am testing a sample dataset that has the same format as the oa

Inconsistency with cv_blup function about metan HOT 5 CLOSED

tiagoolivoto commented on June 29, 2024

from metan.

Comments (5)

TiagoOlivoto commented on June 29, 2024

Hi, thank you for your report.
Could you please convert the column BLOCK to a factor with dataset <- to_factor(dataset, BLOCK), and report whether any warning is generated when you run inspect(data)?
Are there any missing values in the data? Please check with has_na(data).
Based on the levels of your experiment, the expected number of rows would be 1170 (13 x 30 x 3). Am I right? Perhaps your data are unbalanced? Try making a two-way table with make_mat(dataset, ENV, GEN, GY) and see whether any cell has NA.
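
The checks suggested above can be approximated in base R without metan. The sketch below uses a hypothetical toy trial (the `toy` data frame and its values are invented for illustration), and `tapply()` only stands in for the two-way table; it is not meant to reproduce what `make_mat()` does internally.

```r
# Hypothetical toy trial: 2 environments x 3 genotypes x 3 replicates.
toy <- expand.grid(REP = factor(1:3),
                   GEN = factor(c("G1", "G2", "G3")),
                   ENV = factor(c("E1", "E2")))
toy$GY <- seq(2, by = 0.1, length.out = nrow(toy))

# Make the trial unbalanced: genotype G3 was not grown in environment E2.
toy <- toy[!(toy$ENV == "E2" & toy$GEN == "G3"), ]

# Rows expected for a fully balanced trial versus rows actually present.
expected_rows <- nlevels(toy$ENV) * nlevels(toy$GEN) * nlevels(toy$REP)
observed_rows <- nrow(toy)

# Any missing value recorded in the response itself?
any_na <- anyNA(toy$GY)

# Two-way table of means (environments in rows, genotypes in columns).
# A cell with no observations shows up as NA.
two_way <- tapply(toy$GY, list(toy$ENV, toy$GEN), mean)
missing_cells <- which(is.na(two_way), arr.ind = TRUE)

print(c(expected = expected_rows, observed = observed_rows))
print(two_way)
```

A gap between the expected and observed row counts, or an NA cell in the two-way table, points to an unbalanced trial even when the response column contains no explicit NA.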

You can use the reprex package to report the results here.
Regards!

TiagoOlivoto commented on June 29, 2024

@billh2006 I accidentally closed this issue. I added a commit that improves the data checks performed before computing the cross-validation. Please install the development version and check it with your data.

billh2006 commented on June 29, 2024

@TiagoOlivoto - Thanks for your quick response. Yes, my dataset is unbalanced. Certain genotypes are not represented in all environments, and 8 of the genotype-environment combinations (thank you for the new error message) are missing data for one or more of the reps. Is a complete, balanced dataset a necessary condition for cv_blup and downstream functions (e.g. waasb)?

TiagoOlivoto commented on June 29, 2024

@billh2006 - You can have unbalanced data, but all genotype-environment combinations must have the same number of replicates (a complete dataset). Assuming three replicates, in each cross-validation round the function randomly selects two replicates of each genotype-environment combination to serve as the training set; the remaining replicate serves as the validation set. If some GEI combinations have only two reps, the model cannot be computed because there will be no validation data for those combinations. Please see the toy example below, which uses the example data data_ge.

library(metan)
#> Registered S3 method overwritten by 'GGally':
#>   method from   
#>   +.gg   ggplot2
#> []=====================================================[]
#> [] Multi-Environment Trial Analysis (metan) v1.4.0.9000[]
#> [] Author: Tiago Olivoto                               []
#> [] Type citation('metan') to know how to cite metan    []
#> [] Type vignette('metan_start') for a short tutorial   []
#> [] Visit http://bit.ly/2TIq6JE for a complete tutorial []
#> []=====================================================[]
df <- data_ge
df[1, 4] <- NA
df
#> # A tibble: 420 x 5
#>    ENV   GEN   REP      GY    HM
#>    <fct> <fct> <fct> <dbl> <dbl>
#>  1 E1    G1    1     NA     44.9
#>  2 E1    G1    2      2.50  46.9
#>  3 E1    G1    3      2.43  47.8
#>  4 E1    G2    1      3.21  45.2
#>  5 E1    G2    2      2.93  45.3
#>  6 E1    G2    3      2.56  45.5
#>  7 E1    G3    1      2.77  46.7
#>  8 E1    G3    2      3.62  43.2
#>  9 E1    G3    3      2.28  47.8
#> 10 E1    G4    1      2.36  47.9
#> # ... with 410 more rows
cv_blup(df, env = ENV, gen = GEN, rep = REP, resp = GY, nboot = 10)
#>   ENV GEN n
#> 1  E1  G1 2
#> Error: Combinations of genotype and environment with different number of replication than observed in the trial (3)

# Unbalanced data: all three replicates of E1-G1 are missing
df2 <- data_ge
df2[1:3, 4] <- NA
df2
#> # A tibble: 420 x 5
#>    ENV   GEN   REP      GY    HM
#>    <fct> <fct> <fct> <dbl> <dbl>
#>  1 E1    G1    1     NA     44.9
#>  2 E1    G1    2     NA     46.9
#>  3 E1    G1    3     NA     47.8
#>  4 E1    G2    1      3.21  45.2
#>  5 E1    G2    2      2.93  45.3
#>  6 E1    G2    3      2.56  45.5
#>  7 E1    G3    1      2.77  46.7
#>  8 E1    G3    2      3.62  43.2
#>  9 E1    G3    3      2.28  47.8
#> 10 E1    G4    1      2.36  47.9
#> # ... with 410 more rows
cv_blup(df2, env = ENV, gen = GEN, rep = REP, resp = GY, nboot = 10)
#> $RMSPD
#>          MODEL     RMSPD
#> 1  BLUP_g_RCBD 0.3552628
#> 2  BLUP_g_RCBD 0.3763954
#> 3  BLUP_g_RCBD 0.4266109
#> 4  BLUP_g_RCBD 0.4532448
#> 5  BLUP_g_RCBD 0.4043090
#> 6  BLUP_g_RCBD 0.4221250
#> 7  BLUP_g_RCBD 0.4243280
#> 8  BLUP_g_RCBD 0.3874620
#> 9  BLUP_g_RCBD 0.3697422
#> 10 BLUP_g_RCBD 0.3845615
#> 
#> $RMSPDmean
#> # A tibble: 1 x 6
#>   MODEL        mean     sd      se  Q2.5 Q97.5
#>   <chr>       <dbl>  <dbl>   <dbl> <dbl> <dbl>
#> 1 BLUP_g_RCBD 0.400 0.0308 0.00972 0.359 0.447
#> 
#> attr(,"class")
#> [1] "cvalidation"

Created on 2020-04-07 by the reprex package (v0.3.0)
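
The replicate split described above (two replicates for training, one for validation, drawn separately in every genotype-environment combination) can be sketched in base R. This is only an illustration of the idea on a hypothetical toy trial, not the code that cv_blup() runs internally; the `toy` data frame and the `chosen` and `is_valid` names are assumptions made for the example.

```r
set.seed(1)  # only to make the random draw reproducible

# Hypothetical toy trial: 2 environments x 2 genotypes x 3 replicates.
toy <- expand.grid(REP = 1:3,
                   GEN = c("G1", "G2"),
                   ENV = c("E1", "E2"),
                   stringsAsFactors = FALSE)
toy$GY <- round(rnorm(nrow(toy), mean = 2.5, sd = 0.3), 2)

# Within each genotype-environment cell, draw one replicate at random.
# ave() repeats the drawn replicate number on every row of its cell.
chosen <- ave(toy$REP, toy$ENV, toy$GEN,
              FUN = function(reps) reps[sample.int(length(reps), 1)])

# The drawn replicate is held out for validation; the rest is used for training.
is_valid   <- toy$REP == chosen
training   <- toy[!is_valid, ]
validation <- toy[is_valid, ]

print(table(training$ENV, training$GEN))      # two rows per cell
print(table(validation$ENV, validation$GEN))  # one row per cell
```

With three replicates per cell, each round leaves two training rows and one validation row in every combination. A cell with fewer replicates than the rest of the trial cannot supply both, which is the situation the error message in the first example reports.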

You can use both unbalanced and incomplete data with waasb() and gamem_met(). Since no validation set is needed, these functions simply ignore the missing data.

library(metan)
#> Registered S3 method overwritten by 'GGally':
#>   method from   
#>   +.gg   ggplot2
#> []=====================================================[]
#> [] Multi-Environment Trial Analysis (metan) v1.4.0.9000[]
#> [] Author: Tiago Olivoto                               []
#> [] Type citation('metan') to know how to cite metan    []
#> [] Type vignette('metan_start') for a short tutorial   []
#> [] Visit http://bit.ly/2TIq6JE for a complete tutorial []
#> []=====================================================[]
df <- data_ge
df[1, 4] <- NA
df
#> # A tibble: 420 x 5
#>    ENV   GEN   REP      GY    HM
#>    <fct> <fct> <fct> <dbl> <dbl>
#>  1 E1    G1    1     NA     44.9
#>  2 E1    G1    2      2.50  46.9
#>  3 E1    G1    3      2.43  47.8
#>  4 E1    G2    1      3.21  45.2
#>  5 E1    G2    2      2.93  45.3
#>  6 E1    G2    3      2.56  45.5
#>  7 E1    G3    1      2.77  46.7
#>  8 E1    G3    2      3.62  43.2
#>  9 E1    G3    3      2.28  47.8
#> 10 E1    G4    1      2.36  47.9
#> # ... with 410 more rows
mod <- waasb(df, ENV, GEN, REP, GY)
#> Warning: Row(s) 1 with NA values deleted.
#> Method: REML/BLUP
#> Random effects: GEN, GEN:ENV
#> Fixed effects: ENV, REP(ENV)
#> Denominador DF: Satterthwaite's method
#> ---------------------------------------------------------------------------
#> P-values for Likelihood Ratio Test of the analyzed traits
#> ---------------------------------------------------------------------------
#>     model       GY
#>  COMPLETE       NA
#>       GEN 1.19e-05
#>   GEN:ENV 2.02e-11
#> ---------------------------------------------------------------------------
#> All variables with significant (p < 0.05) genotype-vs-environment interaction
gmd(mod)
#> Class of the model: waasb
#> Variable extracted: genpar
#> # A tibble: 9 x 2
#>   Parameters                GY
#>   <chr>                  <dbl>
#> 1 Phenotypic variance    0.181
#> 2 Heritability           0.154
#> 3 GEIr2                  0.313
#> 4 Heribatility of means  0.814
#> 5 Accuracy               0.902
#> 6 rge                    0.370
#> 7 CVg                    6.24 
#> 8 CVr                   11.6  
#> 9 CV ratio               0.537

Created on 2020-04-07 by the reprex package (v0.3.0)
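
Whether a dataset suits cv_blup() or only waasb() and gamem_met() comes down to the replicate counts per genotype-environment combination. The base R sketch below classifies a trial along those lines. `rep_status()` is a hypothetical helper written for this illustration (it is not part of metan), and it assumes columns named ENV, GEN and REP as in the examples above.

```r
# Hypothetical helper (not part of metan): classify a trial by the number of
# non-missing observations in each environment x genotype cell.
rep_status <- function(data, resp) {
  obs <- data[!is.na(data[[resp]]), ]
  n <- table(obs$ENV, obs$GEN)
  n_present <- n[n > 0]
  if (length(unique(n_present)) > 1) {
    "incomplete"               # observed cells differ in replicate number
  } else if (any(n == 0)) {
    "unbalanced but complete"  # some cells absent, the rest fully replicated
  } else {
    "balanced"
  }
}

# Hypothetical toy trial: 2 environments x 2 genotypes x 3 replicates.
toy <- expand.grid(REP = factor(1:3),
                   GEN = factor(c("G1", "G2")),
                   ENV = factor(c("E1", "E2")))
toy$GY <- seq(2, by = 0.1, length.out = nrow(toy))

one_rep_missing <- toy
one_rep_missing$GY[1] <- NA    # one replicate of E1-G1 missing

cell_missing <- toy
cell_missing$GY[1:3] <- NA     # all replicates of E1-G1 missing

rep_status(toy, "GY")              # "balanced"
rep_status(one_rep_missing, "GY")  # "incomplete"
rep_status(cell_missing, "GY")     # "unbalanced but complete"
```

Following the explanation in this thread, only the "incomplete" case would be expected to stop cv_blup(), while all three cases can be passed to waasb() and gamem_met().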

billh2006 commented on June 29, 2024

Thank you!
