mages / chainladder
Claims reserving models in R
Home Page: https://mages.github.io/ChainLadder/
When working with the package, I would find it useful to be able to easily extract the SW-NE diagonal of a triangle, the same way base::diag extracts the NW-SE diagonal of a matrix.
Would this be a welcome addition to the package? I'll gladly contribute the patch.
It's thank time again, eh Markus? Thanks to github/ChainLadder/Compare, the current NEWS holds all the changes to the repository since CRAN2015 (August). I think this is an opportunity for contributors to brag a little bit. Let's put together a little vignette that demonstrates the reasons for the changes with use cases. I can volunteer to start the RMarkdown file -- or someone else can if you get to it within the next couple of days. Then when we fork and add material, hopefully git will keep it all straight. (This wouldn't be the first time I misunderstood how git works, so if I'm wrong, correct me! :-) ) Would be nice to make this available with the next CL release. Markus, what do you think about targeting CRAN2016 for either Aug if there's no broad interest or September if folks are interested. Simply comment to this "Issue".
Hi,
Is there a built-in method to calculate the cashflow projection from the complete triangle? Currently I am running the following code, which might be useful for others as well:
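triang2cashflow <- function(mat) {
  # one entry per future calendar period (diagonal) of the completed triangle
  cf <- rep(0, nrow(mat) - 1)
  for (i in seq_along(cf)) {
    idx_row <- nrow(mat):(1 + i)
    idx_col <- (1 + i):nrow(mat)
    tmp <- 0
    # walk along the i-th future diagonal and add up its cells
    for (j in seq_along(idx_col)) {
      tmp <- tmp + mat[idx_row[j], idx_col[j]]
    }
    cf[i] <- tmp
  }
  cf
}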
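A shorter way to get the same diagonal sums, offered as a sketch rather than a package feature. It assumes `mat` is a completed square matrix of incremental claims, and the name `triang2cashflow_vec` is made up for illustration.

```r
# Sum each future calendar-period diagonal of a completed square triangle.
triang2cashflow_vec <- function(mat) {
  n <- nrow(mat)
  k <- row(mat) + col(mat)        # cells on the same diagonal share row + col
  future <- k > n + 1             # the latest observed diagonal has row + col == n + 1
  as.vector(tapply(mat[future], k[future], sum))
}

m <- matrix(1:16, nrow = 4)
triang2cashflow_vec(m)
```

The result has one element per future diagonal, ordered from the nearest period to the furthest, which matches the loop version.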
When two origin periods are at the same age, the statistics are calculated but the recursive generation at future ages fails due to incorrect looping logic. Here is a 3x3 example from GenIns:
G <- GenIns[8:10,1:3]
summary(MackChainLadder(G, est.sigma = "Mack"))$ByOrigin
Latest Dev.To.Date Ultimate IBNR Mack.S.E CV(IBNR)
8 2864498 1.0000000 2864498 0 0.0 NaN
9 1363294 0.4961176 2747925 1384631 234192.7 0.1691373
10 344014 0.1311672 2622713 2278699 305432.5 0.1340381
Now duplicate the last row:
G <- rbind(G, "11" = G["10",])
summary(MackChainLadder(G, est.sigma = "Mack"))$ByOrigin
Latest Dev.To.Date Ultimate IBNR Mack.S.E CV(IBNR)
8 2864498 1.0000000 2864498 0 0.0 NaN
9 1363294 0.4961176 2747925 1384631 0.0 0.00000000
10 344014 0.1311672 2622713 2278699 226228.3 0.09927961
11 344014 0.1311672 2622713 2278699 305432.5 0.13403811
Origin year 9 loses its standard error and origin years 10 and 11 should be the same.
The ability to handle rows at the same age is important when analyzing origins broken down into more detail. I will work on a solution that incorporates ChainLadder's GetLatestCumulative function.
When plotting the chain-ladder developments by origin period, I get the following warning: In expand.grid(origin = as.numeric(dimnames(.FullTriangle)$origin), :
NAs introduced by coercion. Any suggestions on resolving it?
filemock3 <-file.choose()
mock3 <- read.csv(filemock3, header = FALSE)
mock3 <- as.triangle(as.matrix(mock3))
mock3
dev
origin V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 573 1392 1736 2035 2259 2360 2349 2311 2324 2324 2324
2 710 1600 1889 2026 2077 2093 2105 2105 2107 2110 NA
3 919 1947 2237 2372 2458 2460 2464 2463 2463 NA NA
4 1459 3488 4325 4623 4717 4728 4734 4738 NA NA NA
5 2145 6824 8875 9777 10093 10128 10131 NA NA NA NA
6 3976 11049 13727 14736 14960 15077 NA NA NA NA NA
7 6747 15757 18463 19713 20110 NA NA NA NA NA NA
8 6019 13139 15679 16952 NA NA NA NA NA NA NA
9 5360 12320 14707 NA NA NA NA NA NA NA NA
10 6012 12641 NA NA NA NA NA NA NA NA NA
11 6603 NA NA NA NA NA NA NA NA NA NA
mock3mack <- MackChainLadder(mock3, est.sigma = "Mack")
mock3mack
MackChainLadder(Triangle = mock3, est.sigma = "Mack")
Latest Dev.To.Date Ultimate IBNR Mack.S.E CV(IBNR)
1 2,324 1.000 2,324 0.00 0.000 NaN
2 2,110 1.000 2,110 0.00 0.977 Inf
3 2,463 0.999 2,465 1.67 3.129 1.876
4 4,738 0.997 4,752 13.55 13.965 1.031
5 10,131 1.000 10,129 -1.55 62.252 -40.085
6 15,077 1.000 15,084 7.38 89.183 12.078
7 20,110 0.992 20,275 165.02 186.541 1.130
8 16,952 0.967 17,521 569.46 306.574 0.538
9 14,707 0.896 16,405 1,698.44 389.187 0.229
10 12,641 0.741 17,050 4,409.25 646.818 0.147
11 6,603 0.314 21,046 14,443.33 2,244.681 0.155
Totals
Latest: 107,856.00
Dev: 0.84
Ultimate: 129,162.55
IBNR: 21,306.55
Mack.S.E 2,548.62
CV(IBNR): 0.12
plot(mock3mack)
plot(mock3mack, lattice=TRUE)
Warning message:
In expand.grid(origin = as.numeric(dimnames(.FullTriangle)$origin), :
NAs introduced by coercion
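A likely cause, offered as a guess: the dimension labels that come from `read.csv()` (`V1`, `V2`, ...) cannot be converted to numbers. The base R sketch below reproduces the coercion problem and relabels a small made-up matrix `m` with numeric labels, which is what one would do before passing it to `as.triangle()`.

```r
labels <- c("V1", "V2", "V3")
suppressWarnings(as.numeric(labels))   # all NA: this is what triggers the warning

# Relabel with plain numbers so that the labels convert cleanly
m <- matrix(c(573, 710, 919, 1392, 1600, NA, 1736, NA, NA), nrow = 3,
            dimnames = list(NULL, labels))
dimnames(m) <- list(origin = seq_len(nrow(m)), dev = seq_len(ncol(m)))
as.numeric(dimnames(m)$dev)
```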
When I use the attached data and code to produce bootstrap results the output seems odd. The IBNR S.E for origin year 5 (a mature year) is for example 10% for seed 12 and 17% for seed 99 (not cherry picking seeds, just picked a couple at random).
Both of these are much higher than the Mack SE of 0.08%.
I have been working on a lot of triangles and for many bootstrap and mack give similar results, but for some there are very large differences, particularly in mature years. Sometimes the difference only appears on certain seeds and sometimes running with the same seed but a slight tweak to the data such as different number of decimal places or extra (stable) year of development on a large triangle cause the odd bootstrap SE result to appear or disappear.
Any suggestions? Am I right in thinking that this is likely to be an issue with the package rather than a difference between the Mack and bootstrap methods?
set.seed(12)
B <- BootChainLadder(data, R=1000, process.distr="gamma")
mack <- MackChainLadder(data, est.sigma="Mack")
BootChainLadder(Triangle = data, R = 1000, process.distr = "gamma")
Latest Mean Ultimate Mean IBNR IBNR.S.E IBNR 75% IBNR 95%
1 0.216 0.216 0.00e+00 0.0000 0.00e+00 0.00e+00
2 0.247 0.247 9.55e-05 0.0125 1.62e-76 1.04e-06
3 0.251 0.250 -8.37e-04 0.0150 3.70e-21 3.54e-04
4 0.968 0.966 -2.52e-03 0.0497 1.37e-04 3.59e-02
5 1.725 1.723 -2.61e-03 0.1033 3.94e-03 6.75e-02
6 1.952 1.949 -2.81e-03 0.0846 7.29e-03 9.43e-02
7 1.952 1.944 -7.88e-03 0.1030 8.87e-03 8.66e-02
8 1.825 1.819 -5.92e-03 0.0965 8.62e-03 8.30e-02
9 0.894 0.884 -9.46e-03 0.0671 1.15e-03 6.18e-02
10 0.260 0.258 -2.34e-03 0.0295 4.08e-06 1.80e-02
11 0.520 0.517 -2.42e-03 0.0413 5.02e-04 5.20e-02
12 1.532 1.513 -1.98e-02 0.1081 5.44e-03 7.66e-02
13 1.166 1.144 -2.17e-02 0.0928 1.71e-03 7.20e-02
14 0.354 0.350 -3.96e-03 0.0416 5.46e-04 3.70e-02
15 0.623 0.618 -5.40e-03 0.0556 3.61e-03 6.41e-02
16 1.119 1.099 -2.06e-02 0.0914 5.97e-03 8.31e-02
17 0.803 0.775 -2.80e-02 0.0846 -1.42e-04 6.09e-02
18 0.761 0.741 -1.93e-02 0.0798 6.35e-03 8.32e-02
19 0.455 0.451 -4.34e-03 0.0638 1.12e-02 9.51e-02
20 0.426 0.446 1.96e-02 0.0724 4.52e-02 1.49e-01
21 0.449 0.505 5.54e-02 0.0909 9.35e-02 2.11e-01
22 0.691 0.944 2.53e-01 0.1860 3.57e-01 5.99e-01
23 0.112 0.426 3.14e-01 0.2955 4.56e-01 8.38e-01
MackChainLadder(Triangle = data, est.sigma = "Mack")
Latest Dev.To.Date Ultimate IBNR Mack.S.E CV(IBNR)
1 0.216 1.000 0.216 0.00e+00 0.00e+00 NaN
2 0.247 1.000 0.247 -2.28e-05 2.54e-05 -1.113
3 0.251 1.000 0.251 3.90e-06 6.64e-05 17.006
4 0.968 1.001 0.968 -6.36e-04 4.58e-04 -0.721
5 1.725 1.001 1.724 -9.99e-04 7.62e-04 -0.763
6 1.952 1.001 1.950 -1.98e-03 1.11e-03 -0.562
7 1.952 1.004 1.945 -7.04e-03 8.75e-03 -1.244
8 1.825 1.004 1.817 -7.69e-03 8.45e-03 -1.099
9 0.894 1.010 0.885 -8.67e-03 8.57e-03 -0.988
10 0.260 1.007 0.258 -1.92e-03 5.33e-03 -2.774
11 0.520 1.009 0.515 -4.40e-03 7.91e-03 -1.799
12 1.532 1.012 1.515 -1.76e-02 1.71e-02 -0.973
13 1.166 1.015 1.149 -1.72e-02 2.04e-02 -1.189
14 0.354 1.010 0.350 -3.61e-03 1.51e-02 -4.189
15 0.623 1.012 0.616 -7.47e-03 2.18e-02 -2.917
16 1.119 1.019 1.099 -2.03e-02 3.85e-02 -1.894
17 0.803 1.035 0.776 -2.70e-02 3.99e-02 -1.475
18 0.761 1.023 0.743 -1.73e-02 4.69e-02 -2.719
19 0.455 1.002 0.454 -8.51e-04 4.52e-02 -53.153
20 0.426 0.950 0.449 2.26e-02 8.38e-02 3.711
21 0.449 0.883 0.509 5.94e-02 1.03e-01 1.732
22 0.691 0.724 0.954 2.63e-01 2.28e-01 0.865
23 0.112 0.258 0.435 3.23e-01 3.96e-01 1.227
Hello,
I have this incremental triangle:
incr_tri <- structure(c(1426070.24536192, 1736770.1007, 2639782.9874, 3587024.63956496,
3865940.962, 5673143.20642889, 5714213.50358944, 4301115.41676496,
6693050.8, 8217307.30397754, 11307235.24834, 9905168.2946652,
12130, 1359660.9508, 1633281.9735, 3215907.0814, 6528376.49714343,
7486409.03571738, 5076725.2128, 1004333.56011442, 6730057.650404,
7331244.53689184, 5148580.23881475, 5202152.529135, 9842034.6463464,
NA, 470925.445, 653460.5987, 1095930.8238, 2305059.52683733,
2980199.38384495, 1223521.20321201, 1465304.4065, 2154081.68602803,
2257224.018628, 2045705.0784234, 2503710.4930872, NA, NA, 51722.662,
229922.0163, 778652.711590961, 841699.634600001, 2074852.01257497,
692920.451400001, 892086.349364856, 862832.063100001, 1328214.8,
1339655.7333272, NA, NA, NA, 66196.5024000001, -5951.36274700332,
304899.7604, 484964.062129008, 1361228.6917, -670073.959978774,
215230.3069, 862344.206280001, 0, NA, NA, NA, NA, 136136.177211975,
189860.1108, 298799.72226939, 56610.0911999997, -259425.922708202,
188747.8818, 191191.1053, 618330.214400001, NA, NA, NA, NA, NA,
28622.3487999998, 121967.659968453, 96199.9843999995, 31803.0385980848,
339990.021899998, 66964.3233000003, 599504.000800001, NA, NA,
NA, NA, NA, NA, 154710.976175, -57885.6664000005, 115649.495765802,
-17023.9271000009, 40990.3735000007, 16334.6632000003, NA, NA,
NA, NA, NA, NA, NA, 4964.32774377428, -1413.74210000038, 2367.94040000066,
-21930.2992000002, 128457.843200002, NA, NA, NA, NA, NA, NA,
NA, NA, 2558.33874706039, -2120.56850000005, 19535.0869999994,
0, NA, NA, NA, NA, NA, NA, NA, NA, NA, 320, -5278.58229999989,
0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 217.206400000025,
5382.70040000044, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), dim = c(13L,
13L), dimnames = list(NULL, c("0", "1", "2", "3", "4", "5", "6",
"7", "8", "9", "10", "11", "12")))
I convert it to cumulative:
# cumulative triangle:
cum_tri <- ChainLadder::incr2cum(incr_tri)
If I use MackChainLadder() with tail = FALSE, it works fine:
# tail = FALSE:
tail_false <- ChainLadder::MackChainLadder(
Triangle = cum_tri,
tail = FALSE
)
But when I set tail = TRUE, it errors:
# tail = TRUE:
tail_true <- ChainLadder::MackChainLadder(
Triangle = cum_tri,
tail = TRUE
)
Error:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'
In addition: Warning message:
In log(.f - 1) : NaNs produced
Why does it error? Is there a workaround?
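The warning points at `log(.f - 1)`, which suggests that the tail fit works on the logarithm of the development factors minus one. Assuming that is so, any factor at or below 1 breaks it, and the negative increments in the later columns of this triangle would produce such factors. A base R illustration with made-up factors:

```r
f <- c(1.80, 1.20, 1.05, 1.00, 0.98)   # hypothetical age-to-age factors
y <- suppressWarnings(log(f - 1))
y                                      # the last two values are -Inf and NaN
usable <- is.finite(y)                 # only these could enter a log-linear fit
f[usable]
```

If that diagnosis is right, one possible workaround is to pass a numeric tail factor instead of `tail = TRUE`; as far as I recall, the help page of `MackChainLadder()` says that `tail` may also be a number.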
In the chainladder() method, it was unclear to me how the weights matrix would be applied. I mistakenly assumed that the weight W_{i,j+1} corresponded to the age-to-age factor given by C_{i,j+1}/C_{i,j}. Turns out the weight W_{i,j} corresponds to that factor.
Didn't see any mention of this in the documentation. Would be nice to include it so others don't make the same mistake I did.
Thanks
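To make the indexing concrete, here is a hand-rolled volume-weighted factor that follows the convention described above, where the weight in column j applies to the factor from column j to column j + 1. It is a base R sketch of the convention, not the package's code; the triangle, the weights and the name `weighted_factor` are invented.

```r
tri <- matrix(c(100, 110, 120,
                150, 168,  NA,
                165,  NA,  NA), nrow = 3,
              dimnames = list(origin = 1:3, dev = 1:3))
w <- matrix(1, 3, 3)
w[1, 1] <- 0                      # W[1, 1] switches off the factor tri[1, 2] / tri[1, 1]

weighted_factor <- function(tri, w, j) {
  ok <- !is.na(tri[, j + 1])      # origins observed at both ages
  sum(w[ok, j] * tri[ok, j + 1]) / sum(w[ok, j] * tri[ok, j])
}
weighted_factor(tri, w, 1)        # only origin 2 contributes: 168 / 110
```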
Correct me if I'm wrong, but I think the parametrization of the rgamma function is wrong within the BootChainLadder function.
England and Verrall (2002) state that, when using the gamma distribution for the process variance, E[C_ij] = m_ij and Var[C_ij] = m_ij^2*phi.
The documentation of the rgamma function correctly states that, with a as the shape parameter and s as the scale parameter, E(X) = a*s and Var(X) = a*s^2.
By my calculations the shape and scale parameters should be a = 1/phi and s = m_ij*phi respectively. But in BootChainLadder it is parametrized as a = m_ij/phi and s = phi. The mean is correct but the variance is m_ij*phi, which does not follow England and Verrall.
From the BootChainLadder function:
if (process.distr == "gamma") processTriangle[!is.na(simExp)] <- sign(simExp[!is.na(simExp)]) * rgamma(length(simExp[!is.na(simExp)]), shape = abs(simExp[!is.na(simExp)]/scale.phi), scale = scale.phi)
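The two parametrizations can be compared directly from the moment formulas quoted above (mean = a*s, variance = a*s^2). The values of `m` and `phi` are arbitrary and the helper `gamma_moments` is only for illustration.

```r
m <- 500; phi <- 4                     # arbitrary mean and dispersion

gamma_moments <- function(shape, scale) {
  c(mean = shape * scale, var = shape * scale^2)
}

proposed <- gamma_moments(shape = 1 / phi, scale = m * phi)   # a = 1/phi, s = m*phi
current  <- gamma_moments(shape = m / phi, scale = phi)       # a = m/phi, s = phi

proposed   # mean m, variance m^2 * phi
current    # mean m, variance m * phi
```

Both choices give the same mean, so the difference only shows up in the spread of the simulated process error.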
Some print methods do not return their argument unchanged. For example:
library(ChainLadder)
mcl <- MackChainLadder(RAA)
# Printing changes its argument
mcl_print <- print(MackChainLadder(RAA))
identical(mcl, mcl_print) # Returns FALSE
# Brackets change the assigned object
(mcl_brackets <- MackChainLadder(RAA))
identical(mcl, mcl_brackets) # Returns FALSE
# If object is assigned to a name before printing / brackets, results differ
mcl_print2 <- print(mcl)
identical(mcl_print, mcl_print2) # Returns TRUE
(mcl_brackets2 <- mcl)
identical(mcl_brackets, mcl_brackets2) # Returns FALSE
This behaviour can lead to confusion and is not in line with the print generic's documentation (?print), which says that print returns its argument invisibly (via invisible(x)).
The behaviour I was expecting, illustrated with summary():
mtc_smry <- summary(mtcars)
# Printing returns its argument
mtc_smry_print <- print(summary(mtcars))
identical(mtc_smry, mtc_smry_print) # Returns TRUE
# Brackets have no impact
(mtc_smry_brackets <- summary(mtcars))
identical(mtc_smry, mtc_smry_brackets) # Returns TRUE
# No impact if object is assigned to a name before printing / brackets
mtc_smry_print2 <- print(mtc_smry)
identical(mtc_smry_print, mtc_smry_print2) # Returns TRUE
(mtc_smry_brackets2 <- mtc_smry)
identical(mtc_smry_brackets, mtc_smry_brackets2) # Returns TRUE
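For reference, a print method that follows this convention ends with `invisible(x)` and does not modify `x` along the way. A minimal sketch with a made-up class `demo`:

```r
print.demo <- function(x, ...) {
  cat("demo object with", length(x$values), "values\n")
  invisible(x)                       # hand back the argument unchanged
}

obj <- structure(list(values = 1:3), class = "demo")
out <- print(obj)
identical(obj, out)
```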
I am using the current GitHub version of ChainLadder. Here's my sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.5 (Maipo)
Matrix products: default
BLAS: /opt/R/3.5.0/lib64/R/lib/libRblas.so
LAPACK: /opt/R/3.5.0/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ChainLadder_0.2.6
loaded via a namespace (and not attached):
[1] biglm_0.9-1 statmod_1.4.30 zoo_1.8-2 tidyselect_0.2.4 purrr_0.2.5
[6] reshape2_1.4.3 splines_3.5.0 haven_1.1.1 lattice_0.20-35 carData_3.0-1
[11] colorspace_1.3-2 stats4_3.5.0 yaml_2.1.19 rlang_0.2.1 pillar_1.2.3
[16] foreign_0.8-70 glue_1.3.0 tweedie_2.3.2 readxl_1.1.0 bindrcpp_0.2.2
[21] bindr_0.1.1 plyr_1.8.4 stringr_1.3.1 munsell_0.5.0 cplm_0.7-7
[26] gtable_0.2.0 cellranger_1.1.0 zip_1.0.0 expint_0.1-4 coda_0.19-1
[31] systemfit_1.1-22 rio_0.5.10 forcats_0.3.0 lmtest_0.9-36 curl_3.2
[36] Rcpp_0.12.17 scales_0.5.0 abind_1.4-5 ggplot2_3.0.0 stringi_1.2.3
[41] openxlsx_4.1.0 dplyr_0.7.6 grid_3.5.0 tools_3.5.0 sandwich_2.4-0
[46] magrittr_1.5 lazyeval_0.2.1 tibble_1.4.2 car_3.0-0 pkgconfig_2.0.1
[51] MASS_7.3-50 Matrix_1.2-14 data.table_1.11.4 actuar_2.3-1 assertthat_0.2.0
[56] minqa_1.2.4 R6_2.2.2 nlme_3.1-137 compiler_3.5.0
Line 85 in "Triangles.R" has a bug which leads to incorrect naming of columns:
In the following code, the order of "dev" and "origin" are swapped in the aggregate function and the line which names the columns.
aggTriangle <- stats::aggregate(Triangle[[value]],
list(Triangle[[dev]], Triangle[[origin]]),
sum)
names(aggTriangle) <- c(origin, dev, value)
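A small base R check of the ordering, with invented data: `aggregate()` returns the grouping columns in the order of the `by` list, so the names have to be assigned in that same order.

```r
long <- data.frame(origin = c(2001, 2001, 2002),
                   dev    = c(1, 2, 1),
                   value  = c(10, 20, 30))

agg <- stats::aggregate(long[["value"]],
                        list(long[["dev"]], long[["origin"]]),
                        sum)
default_names <- names(agg)             # grouping columns come first: dev, then origin
names(agg) <- c("dev", "origin", "value")
agg
```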
The following code will generate an error:
op <- as.Date(paste0(2001:2010, "-01-01"))
lags <- 1:10
triangle <- expand.grid(op, lags)
names(triangle) <- c("origin", "dev")
set.seed(1234)
triangle$value <- rnorm(100)
triCL <- ChainLadder::as.triangle(triangle)
plot(triCL, lattice=TRUE)
The error is:
Error in .as.LongTriangle(x, na.rm) :
The origin and dev. period columns have to be of type numeric
or a character which can be converted into numeric.
The origin column in the triangle is numeric. A call to typeof will return "double" and class will return "Date".
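The check that fails is presumably an `is.numeric()`-style test: a `Date` is stored as a double, yet `is.numeric()` returns FALSE for it. One workaround, sketched under that assumption, is to convert the origin to a plain number (here the year) before calling `as.triangle()`.

```r
op <- as.Date(paste0(2001:2010, "-01-01"))
typeof(op)        # "double"
class(op)         # "Date"
is.numeric(op)    # FALSE, because Date objects are deliberately not treated as numeric

origin_year <- as.numeric(format(op, "%Y"))
is.numeric(origin_year)
```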
I think there's a typo in the check for whether the user supplied the argument weight:
if (!("triangle") %in% class(triangle))
stop("triangle must be of class 'triangle'")
if ("offset" %in% names(list(...)))
stop("'offset' should be passed using the
'exposure' attribute of the triangle!")
if ("weigth" %in% names(list(...)))
stop("'weight' should not be used")
Shouldn't it be if ("weight" %in% names(list(...)))?
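The effect of the misspelling can be seen with a stripped-down function (`check_args` is hypothetical): the guard never fires for `weight`, only for the misspelled name.

```r
check_args <- function(...) {
  if ("weigth" %in% names(list(...)))   # misspelled, as in the quoted code
    stop("'weight' should not be used")
  "no error"
}

check_args(weight = 1)                                  # passes silently
res <- try(check_args(weigth = 1), silent = TRUE)       # only the typo is caught
inherits(res, "try-error")
```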
ChainLadder/vignettes/ChainLadder.Rnw
Line 286 in 3ef9731
Hi,
Wonder if the example should be that "the age of the 13112 value, evaluated as of 1990, is three years". Looks like a typo to me.
Regards,
Ben
My issue is with your R code for MackChainLadder formula in particular with both “Total.ProcessRisk” and “Total.ParameterRisk” elements. They both seem to produce incorrect results. If you'd like detailed explanation of the issue send me an email to [email protected]
This is a question more than an issue, but I'm not sure where to post questions.
How can I simulate scenarios from MackChainLadder for each year and period such that the simulations have the correct distributions? In particular, I want results that, given enough simulations, are consistent with the Mack standard errors for each year and in aggregate.
Could you consider adding a function for the task named in the title? Here is a version that we wrote:
interpolate_ldfs <- function(observed_ldf_df, interp_age){
# observed_ldf_df <- sel_data
# interp_age <- 9
## At some age ('ldf_2_one') all selected 'ldfs' = 1 for all 'ages' >= ldf_2_one
## Hence our 'pct_ibnr' -> inf for all 'ages' >= 'ldf_2_one',
## and we receive an error when fitting the linear model
## Test if 'interp_age' >= 'ldf_2_one' then return 1. Else proceed to interpolation
ldf_2_one <- min(observed_ldf_df$age[observed_ldf_df$ldf == 1])
#the first age which the ldf is 1
if (interp_age >= ldf_2_one) {
return(1)
} else {
## Exclude rows from 'observed_ldf_df' where ldf == 1
observed_ldf_df <- observed_ldf_df[observed_ldf_df$ldf != 1,]
## base R assignment, so that no pipe or dplyr import is needed
observed_ldf_df$pct_ibnr <- 1 - (1 / observed_ldf_df$ldf)
## Fit weibull model
weibul_model <- lm(log(-log(observed_ldf_df$pct_ibnr)) ~
log(observed_ldf_df$age)) # Boor Eq (8)
## Define the age of the ldfs above and below the interpolated age
age_below <- interp_age - (interp_age %% 12)
age_above <- interp_age + (12 - (interp_age %% 12))
fit_below <- exp(-exp(weibul_model$coefficients[1] +
weibul_model$coefficients[2] * log(age_below)))
fit_above <- exp(-exp(weibul_model$coefficients[1] +
weibul_model$coefficients[2] * log(age_above)))
fit_at <- exp(-exp(weibul_model$coefficients[1] +
weibul_model$coefficients[2] * log(interp_age)))
## Selected ldfs at age_below and age_above
observed_below <- observed_ldf_df$pct_ibnr[observed_ldf_df$age == age_below]
observed_above <- observed_ldf_df$pct_ibnr[observed_ldf_df$age == age_above]
## observed_below is na when age_below < 12. Set equal to 1
if(interp_age < 12) observed_below = 1
## variables to make extrapolation easier
max_obs_age <- max(observed_ldf_df$age)
if(interp_age < max_obs_age){ # interpolate
interp_along_curve <- observed_below + (((fit_at - fit_below) /
(fit_above - fit_below)) *
(observed_above - observed_below))
} else{ # extrapolate
fit_at_max_age <- exp(-exp(weibul_model$coefficients[1] +
weibul_model$coefficients[2] *
log(max_obs_age)))
obs_at_max_age <- observed_ldf_df$pct_ibnr[observed_ldf_df$age ==
max_obs_age]
interp_along_curve <- fit_at * obs_at_max_age / fit_at_max_age
}
## Calculate ldf
implied_ldf <- 1 / (1 - interp_along_curve)
## Adjust for age < 12 months
implied_full_ay_ldf <- ifelse(interp_age >= 12, implied_ldf,
implied_ldf * 12 / interp_age)
return(implied_full_ay_ldf)
}}
When I run glmReserve in RExcel with var.power = 1, cum = FALSE, mse.method = "bootstrap" and nsim = 1000, I get the error message "Microsoft Excel is waiting for another application to complete an OLE action".
When I run it in R it takes absolutely ages, and eventually I have to kill it without getting any results.
Hi all, great library.
Playing with the demo "DatabaseExamples", I have added data for a new LOB (Marine1) in the MS Access database. As you can see, some data are missing (origin year 2013 has no claims):
Marine1 | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
2012 | 17850 | | | | | |
2013 | | | | | | |
2014 | 1167.96 | 53010 | | | | |
2015 | 4900 | | | | | |
2016 | 9308.33 | 69289.5 | | | | |
2017 | 377 | | | | | |
As a result, the origin years are lost and replaced by 1, 3, 4, 5, 6 instead of 2012, 2014, 2015, 2016, 2017, and 2013 is not reported at all:
BootChainLadderResults
$Marine1
FUN(Triangle = X[[i]], R = 999, process.distr = ..2)
Origin | Latest | Mean Ultimate | Mean IBNR | IBNR.S.E | IBNR 75% | IBNR 95% |
---|---|---|---|---|---|---|
1 | 17,850 | 17,850 | 0 | 0 | 0 | 0 |
3 | 54,178 | 54,178 | 0 | 0 | 0 | 0 |
4 | 4,900 | 4,900 | 0 | 0 | 0 | 0 |
5 | 78,598 | 78,598 | 0 | 0 | 0 | 0 |
6 | 377 | 377 | 0 | 0 | 0 | 0 |
Totals
Latest: 155,903
Mean Ultimate: 155,903
Mean IBNR: 0
IBNR.S.E 0
Total IBNR 75%: 0
Total IBNR 95%: 0
Following the discussion that started at #44, I'd like to continue my investigation on norms for models.
For univariate (and non-bootstrap) models, do we agree that:
Am I missing something that's important to one model or another?
My goal is to be able to fit models from the same function, the caret way, something like:
data(ABC)
mod1 <- fitTriangle(ABC,method="Mack",...)
mod2 <- fitTriangle(ABC,method="Merz",...)
mod3 <- fitTriangle(ABC,method="glm",family=quasipoisson(link="log"),...)
mod4 <- fitTriangle(ABC,method="glm.nb",...)
mod.list = list(mod1,mod2,mod3,mod4)
and then to get extraction in a standardized way :
purrr::map(mod.list,CDR)
purrr::map(mod.list,"ultimates")
purrr::map(mod.list,"ultimate.s.e")
purrr::map(mod.list,"total.s.e")
etc.
Thoughts ?
I'm quite new to R and have had some trouble pulling Mack and Bootstrap reserve risk information from R.
I understand that (Total Mack S.E.)^2 = (Process Risk)^2 + (Parameter Risk)^2, but I'd like to see the individual breakout.
Additionally, the Mack S.E's and Ultimates sometimes show up in abbreviated format (2.03e+06) rather than something like 2028950. Is there any way to re-format the Mack summary?
Thanks in advance.
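On the formatting question, base R can switch off scientific notation either per call or globally. The numbers below are invented; the same calls should work on any numeric column of the summary.

```r
x <- c(2.03e+06, 305432.5)

format(x, scientific = FALSE, big.mark = ",")          # fixed notation with separators
formatC(x, format = "f", digits = 0, big.mark = ",")   # rounded to whole numbers

old <- options(scipen = 999)   # discourage scientific notation for the whole session
print(2.03e+06)
options(old)                   # restore the previous setting
```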
Hi,
I think it would be very useful to add an option in the BootChainLadder function in order to allow the user to force development factors other than those stemming from a pure application of the chainladder method. Indeed the calibration of development factors encompasses a certain level of expert judgement which can give rise to user defined development factors. The bootstrap should then be based on those DF and not on the canonical ones.
Is this something possible ? Thanks.
The functions are set up to only handle nxn triangles. The AY correlation function appears to just need the value of n adjusted for years with one LDF. However, the CY test function may need more nuance.
Correct me if I'm wrong, but it looks like the getExpected function is calculating the fitted triangle based on the Chain Ladder ultimates. This is different than England/Verrall Appendix 3 in which you "Obtain cumulative fitted values for the past triangle by backwards recursion, starting with the observed cumulative paid to date in the latest diagonal."
getExpected <- function(ults, ultDFs){
  ults <- expandArray(ults, 2, dim(ultDFs)[2])
  ultDFs <- expandArray(ultDFs, 1, dim(ults)[1])
  return(ults * ultDFs)
}
While the manual states "The implementation of BootChainLadder follows closely the discussion of the bootstrap model in section 8 and appendix 3 of the paper by England and Verrall (2002)", I think it's appropriate to note the difference in methods.
If I want a volume all, or simple all set of link ratios a call to ata(paid.tri) works fine (where paid.tri = cumulative paid development triangle).
If I want to create a volume-weighted fit on the last five years worth of LDF factors I have found that I can utilise the weights argument as follows: chainladder(paid.tri,weights=wgt.vol.5), where I supply an input triangle of weights.
For "annual-annual" triangles, a five-year weighted average can be achieved using this method provided that the user supplies a weights triangle where the latest 6 diagonals are set = 1 and the balance are NAs. However, when I try the same on a annual/quarterly input triangle (i.e. non square) it fails and returns the following...
"Error in checkTriangle(Triangle) :
Number of origin periods, 10, is less than the number of development periods, 39."
By annual/quarterly I mean annual origin cohorts but tracking development progress quarter on quarter, which is common for Lloyd's and reinsurance companies. In my case I had data running out to 9.75 development years, and had 10 origin years. The error message therefore makes sense but essentially indicates that the chainladder algorithm requires a square-input matrix.
I'm wondering if there is a smart way to overcome this? Personally i know that a lot of people would find it easier to adopt Chainladder if they could easily derive simple and volume-weighted averages using a call to a pre-built UDF, e.g. chainladder(paid.tri, no.diag=5, TypeOfAverage="Vol")
I've found that a n-year simple average can be achieved using the following and I think this would be a good addition to your help file even if you don't recast the "chainladder"function.
######################################
require(zoo)
#quick UDF to get rid of NAs:
link_ratio_simple_n_yrs<-function(x,no.diag){
mean(tail(na.trim(x,sides="both"),no.diag))
}
#derive link ratios using build in UDF:
paid.link.ratios<-ata(paid.tri)
#get simple average as:
apply(paid.link.ratios,2,link_ratio_simple_n_yrs,no.diag=5)
######################################
However I can't tell (and this is annoying) how to achieve the same type of thing for volume weighted averages? Any thoughts? or a pain in a a$$?
I suspect my work around for simple weighted average will fail if there are inf values in the triangle. The following page may provide a better alternative to na.trim above: https://artax.karlin.mff.cuni.cz/r-help/library/IDPmisc/html/NaRV.omit.html
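For the volume-weighted case, one option is to work from the cumulative triangle directly: for each development column, take the last `no.diag` origin periods that have both ages observed and divide the column sums. This is a base R sketch with a made-up function name and triangle, not an existing ChainLadder function, and it does not require a square triangle.

```r
link_ratio_vol_n_yrs <- function(tri, no.diag) {
  sapply(seq_len(ncol(tri) - 1), function(j) {
    ok <- which(!is.na(tri[, j]) & !is.na(tri[, j + 1]))
    ok <- tail(ok, no.diag)                # most recent origins with both ages observed
    sum(tri[ok, j + 1]) / sum(tri[ok, j])
  })
}

paid <- matrix(c(100, 200, 300, 400,
                 150, 320, 420,  NA,
                 180, 352,  NA,  NA,
                 189,  NA,  NA,  NA), nrow = 4)
link_ratio_vol_n_yrs(paid, no.diag = 2)
```

Working from sums of cumulative amounts rather than from the individual link ratios also sidesteps the Inf values mentioned above, as long as the column sums are non-zero.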
I tried to install the package for a Jupyter notebook with install.packages("ChainLadder", "/Users/mymac/anaconda/lib/R/library") and I get the message below when I load it with library(ChainLadder). Any idea?
Thanks!
Error: package or namespace load failed for ‘ChainLadder’
Traceback:
library(ChainLadder)
stop(gettextf("package or namespace load failed for %s", sQuote(package)),
call. = FALSE, domain = NA)
Hello,
The tweedieReserve function is not working when the var.power argument is NULL.
The documentation of the package says that "If NULL, it will be assumed to be in (1,2) and estimated using the cplm package.". I think it is a mistake in the documentation. Indeed, when analysing the code, I don't think the author has tried to implement any code that tackles a NULL value for the var.power argument.
This option is only available with the function glmReserve.
Is that correct ?
It could be nice to implement it within the tweedieReserve code since this function allows ODP model based on the calendar year which is not the case of the glmReserve.
Thank you in advance.
Regards,
BD
Do you plan on implementing the variability calculations from Section 4.2 of the CLFM paper in this package?
Hi,
An error message:
"Error: package or namespace load failed for ‘ChainLadder’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
there is no package called ‘car’"
is found after I have installed the package and try as.triangle function.
Can you have a look into it and please advise? Thanks a lot
Best regards
Unit testing some code and came across this example:
MackChainLadder(auto$PersonalAutoIncurred,alpha=2,tail=TRUE)
I believe this set-up produces unrealistic MackS.E.s
MultiChainLadder(list(GenIns, GenIns))
fails with lapack error
Error in solve(sigma, tol = solvetol) :
Lapack routine dsptrf returned error code 1
I am getting systemfit-generated errors with other triangles as well.
Another enhancement proposal I'm willing to work on.
Currently, one creates triangle objects using as.triangle on matrices or data frames. This works very well for imported data, but I think we could do better for triangles created from the command line (or in a script file, which amounts to the same).
Just like there are matrix and as.matrix, or data.frame and as.data.frame, we could have a function triangle to directly create triangle objects from vectors of data. The added benefit would be that one would not have to supply NA values for the lower triangle. Think of an interface similar to this to create a 4 x 4 triangle:
triangle(c(1:4), c(1:3), c(1:2), 1, byrow = TRUE)
Any interest?
Cheers
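To make the proposal concrete, here is a rough sketch of such a constructor in base R. The name `make_triangle` is hypothetical, it only handles the row-wise case, and it returns a plain matrix that would still need `as.triangle()` to become a triangle object.

```r
make_triangle <- function(...) {
  rows <- list(...)
  n <- length(rows[[1]])                       # the first origin period is the longest
  out <- t(vapply(rows,
                  function(r) c(as.numeric(r), rep(NA_real_, n - length(r))),
                  numeric(n)))                 # pad each shorter row with NA
  dimnames(out) <- list(origin = seq_len(nrow(out)), dev = seq_len(n))
  out
}

res <- make_triangle(1:4, 1:3, 1:2, 1)
res
```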
I am one of the presenters at the R workshop at CLRS (2016). I was assigned the ChainLadder package. Here is my presentation in case you have any thoughts (no obligation, of course):
http://rpubs.com/rajesh06_2016/chainladder_clrs
Thanks - Raj
The value returned by MackChainLadder() depends on whether Triangle is passed directly (i.e. as a function argument) or using magrittr's pipe operator (%>%):
library(ChainLadder)
library(magrittr)
# Pass Triangle directly
mcl <- MackChainLadder(RAA)
# Pipe Triangle
mcl_piped <- RAA %>%
MackChainLadder()
identical(mcl, mcl_piped) # Returns FALSE
Differences are in elements "call" and "Model":
idx.diff <- which(vapply(
seq_along(mcl),
function(i) !identical(mcl[[i]], mcl_piped[[i]]),
logical(1))
)
names(mcl)[idx.diff]
Arguably, the only difference is in the original name of the Triangle object. This difference may look minor and cosmetic. However, it will create confusion for anybody trying to verify that two pieces of code lead to the same outcome. Also, pipes are so prevalent these days that they shouldn't be ignored.
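The same effect can be reproduced without ChainLadder: any function that stores its call produces objects that differ when the argument is spelled differently, even though the fitted numbers agree. A base R illustration with `lm()` and made-up data:

```r
d  <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))
d2 <- d                                   # same data under another name

fit1 <- lm(y ~ x, data = d)
fit2 <- lm(y ~ x, data = d2)

identical(fit1$call, fit2$call)           # FALSE: the stored calls name different objects
identical(coef(fit1), coef(fit2))         # TRUE: the estimates are the same
```

Comparing the numeric components rather than the whole object is one way to check that two pieces of code lead to the same outcome.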
I am using the current GitHub version of ChainLadder. Here's my sessionInfo():
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.5 (Maipo)
Matrix products: default
BLAS: /opt/R/3.5.0/lib64/R/lib/libRblas.so
LAPACK: /opt/R/3.5.0/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] magrittr_1.5 ChainLadder_0.2.6
loaded via a namespace (and not attached):
[1] biglm_0.9-1 statmod_1.4.30 zoo_1.8-2 tidyselect_0.2.4 purrr_0.2.5
[6] reshape2_1.4.3 splines_3.5.0 haven_1.1.1 lattice_0.20-35 carData_3.0-1
[11] colorspace_1.3-2 stats4_3.5.0 yaml_2.1.19 rlang_0.2.1 pillar_1.2.3
[16] foreign_0.8-70 glue_1.3.0 tweedie_2.3.2 readxl_1.1.0 bindrcpp_0.2.2
[21] bindr_0.1.1 plyr_1.8.4 stringr_1.3.1 munsell_0.5.0 cplm_0.7-7
[26] gtable_0.2.0 cellranger_1.1.0 zip_1.0.0 expint_0.1-4 coda_0.19-1
[31] systemfit_1.1-22 rio_0.5.10 forcats_0.3.0 lmtest_0.9-36 curl_3.2
[36] Rcpp_0.12.17 scales_0.5.0 abind_1.4-5 ggplot2_3.0.0 stringi_1.2.3
[41] openxlsx_4.1.0 dplyr_0.7.6 grid_3.5.0 tools_3.5.0 sandwich_2.4-0
[46] lazyeval_0.2.1 tibble_1.4.2 car_3.0-0 pkgconfig_2.0.1 MASS_7.3-50
[51] Matrix_1.2-14 data.table_1.11.4 actuar_2.3-1 assertthat_0.2.0 minqa_1.2.4
[56] R6_2.2.2 nlme_3.1-137 compiler_3.5.0
Hello,
When rereserving = FALSE is specified, summary.tweedie does not work.
This is because the second part of the summary.tweedie code (the part applied if rereserving = FALSE) reads:
else{
out<- list(
Reserve=data.frame(
IBNR=c(mean(res$distr.res_ult),
sd(res$distr.res_ult),
#sd(res$distr.res_ult)/mean(res$distr.res_ult),
quantile(res$distr.res_ult,q)
)
),
Diagnostic=c(GLMReserve=res$GLMReserve,
"mean(IBNR)"=mean(res$distr.res_ult))
)
}
rownames(out$Prediction) <- c("mean", "sd", paste0(q*100, "%"))
print(out)
}
Therefore there is an error, because out$Prediction does not exist.
It is easily fixed by changing, for example:
Reserve=data.frame(
IBNR=c(mean(res$distr.res_ult),
sd(res$distr.res_ult),
#sd(res$distr.res_ult)/mean(res$distr.res_ult),
quantile(res$distr.res_ult,q)
)
by
Prediction=data.frame(
IBNR=c(mean(res$distr.res_ult),
sd(res$distr.res_ult),
#sd(res$distr.res_ult)/mean(res$distr.res_ult),
quantile(res$distr.res_ult,q)
)
Thank you in advance for your opinion.
Edit:
Problem solved. Issue to close.
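The failure reported above can be reproduced outside the package. The sketch below is a minimal illustration, not the package's code: it uses simulated numbers in place of res$distr.res_ult, and the names distr_res_ult, out_bad and out_ok are made up for this example. It shows that assigning row names to a list element that does not exist raises an error, while the same assignment works once the element is called Prediction.

```r
# Simulated stand-in for res$distr.res_ult; the numbers are invented.
set.seed(1)
distr_res_ult <- rnorm(1000, mean = 100, sd = 10)
q <- c(0.5, 0.75, 0.95, 0.995)

stats <- c(mean(distr_res_ult), sd(distr_res_ult),
           unname(quantile(distr_res_ult, q)))
labels <- c("mean", "sd", paste0(q * 100, "%"))

# Element named "Reserve": out_bad$Prediction is NULL, so `rownames<-` fails.
out_bad <- list(Reserve = data.frame(IBNR = stats))
failed <- tryCatch({
  rownames(out_bad$Prediction) <- labels
  FALSE
}, error = function(e) TRUE)

# Element named "Prediction": the same assignment succeeds.
out_ok <- list(Prediction = data.frame(IBNR = stats))
rownames(out_ok$Prediction) <- labels

print(failed)
print(out_ok$Prediction)
```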
The Mack method returns NaN for the standard error if "sigma[i - 2]^2" is zero in the Mack.S.E function.
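One way a NaN of this kind can arise is a 0/0 in the extrapolation of the last variance parameter. The sketch below assumes the extrapolation rule from Mack (1993), where the last sigma^2 is min(s2_prev^2 / s2_prev2, min(s2_prev2, s2_prev)). The function name and its arguments are hypothetical, the indexing may not match the package's, and this is not the package's own code.

```r
# Assumed rule (Mack 1993): extrapolate the last sigma^2 from the two before it.
# s2_prev2 = second-to-last known sigma^2, s2_prev = last known sigma^2.
extrapolate_sigma2 <- function(s2_prev2, s2_prev) {
  min(s2_prev^2 / s2_prev2, min(s2_prev2, s2_prev))
}

regular <- extrapolate_sigma2(s2_prev2 = 4, s2_prev = 1)  # ordinary case
degenerate <- extrapolate_sigma2(s2_prev2 = 0, s2_prev = 0)  # 0/0 inside min()

print(regular)
print(degenerate)
# Any standard error built from sqrt(degenerate) is then NaN as well.
print(sqrt(degenerate))
```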
Is it possible to perform as.triangle on a triangle that isn't complete? I have older data where I'm only looking at the most recent ~25 years of development, so the upper left corner of my triangle is empty. When I use as.triangle on the data it seems to start the triangle at the first not null development period, which will then mess up recent accident years of the triangle. Is there an option to have the triangle start at the smallest (or first if data is ordered) origin value and development value?
Example triangle for what I'd like to make:
This is the triangle that as.triangle would generate from that data:
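A possible workaround, sketched in base R only, is to build the full origin-by-development grid first so that the missing upper-left cells stay as explicit NA values. The data below are invented, and the column names origin, dev and value are assumptions about the layout of the long-format data.

```r
# Invented long-format data: older origin years lack their early development periods.
long <- data.frame(
  origin = c(2001, 2001, 2002, 2002, 2003, 2003, 2004),
  dev    = c(3, 4, 2, 3, 1, 2, 1),
  value  = c(300, 340, 210, 290, 100, 190, 110)
)

origins <- sort(unique(long$origin))
devs <- seq_len(max(long$dev))  # always start at development period 1

# Start from an all-NA matrix and fill only the observed cells.
tri <- matrix(NA_real_, nrow = length(origins), ncol = length(devs),
              dimnames = list(origin = as.character(origins),
                              dev = as.character(devs)))
tri[cbind(match(long$origin, origins), match(long$dev, devs))] <- long$value

print(tri)
```

A matrix like this could then be passed to as.triangle. Whether the downstream models accept NA cells in the upper-left corner is not something this sketch verifies.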
The standardized residuals I calculate are different from the ones R produces.
I am unsure whether this is a bug since the fitted values are fine.
I am using {Average ATA (weighted or simple or ordinary regression) minus ATA factors from the data} divided by the standard deviation of ATAs from the data.
Please note that ATAs here refer to link development ratios. Also, in this example I have assumed that the future development is based on all years simple average ATAs.
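For reference, the calculation described in this post can be written out in base R as below. This is only a sketch of the formula as stated here (all-years simple average ATA minus observed ATA, divided by the standard deviation of the observed ATAs), on an invented triangle. It is not a claim about how the package computes its standardized residuals.

```r
# Invented cumulative triangle (rows = origin years, columns = development ages).
tri <- matrix(c(100, 150, 165, 170,
                110, 176, 198,  NA,
                120, 168,  NA,  NA,
                130,  NA,  NA,  NA), nrow = 4, byrow = TRUE)

# Observed link ratios (ATAs); NA where the later age is not yet observed.
ata <- tri[, -1] / tri[, -ncol(tri)]

avg <- colMeans(ata, na.rm = TRUE)       # all-years simple average per age
sdv <- apply(ata, 2, sd, na.rm = TRUE)   # NA where only one ratio is observed

# Repeat the per-age average and standard deviation down the rows.
avg_mat <- matrix(avg, nrow = nrow(ata), ncol = ncol(ata), byrow = TRUE)
sd_mat  <- matrix(sdv, nrow = nrow(ata), ncol = ncol(ata), byrow = TRUE)

# Residuals as described in the post: (average ATA - observed ATA) / sd(ATA).
std_resid <- (avg_mat - ata) / sd_mat
print(round(std_resid, 4))
```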
This can also be seen on page 10 of the paper "Flexible Factor Chain Ladder Model: A Stochastic Framework for Reasonable Link Ratio Selections" by Emanuel Bardis, Ali Majidi, and Daniel Murphy. I have attached the paper.
01_Murphy.pdf
I have created an excel file highlighting the differences from rows 60 to 94 in the tab "St. Residuals" of the attachment.
I am concerned about cells H44 and J44 (in red font).
Kindly let me know why I see these differences.
Std Residual analysis.xlsx
Hello,
When using the Mack method in Excel, we get #DIV/0! errors if the tail portion of our triangle sees no development for several periods on all relevant AYs. As can be seen from the picture below, I could hard-code a 0 in the highlighted cell as a workaround.
However, after playing around with the tail.se and tail.sigma arguments in ChainLadder, I've not been able to figure out how I would hard-code something in a similar fashion.
Is there such a workaround for triangles with this type of development when running a MackChainLadder process?
Thanks again in advance.
When the upper-left cell of the triangle is NA, plot fails. Here is a simple example:
GenIns[1, 1] <- NA
plot(MultiChainLadder(list(GenIns)))
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ
In addition: Warning message:
In cbind(do.call("rbind", fitted.values), dev) :
number of rows of result is not a multiple of vector length (arg 2)
This appears to be caused in the fitted method. By removing NA's in the variable 'x' (in the "MCL" case) after line 1272 thus
1272: x <- sapply(Triangles, "[", 1:(m-i),i)
x <- x[!is.na(x)]
old 1273: fitted[[i]] <- x%*%diag(B[[i]],nrow=p)
the package compiled and the error went away for me.
I did not thoroughly test, nor test the "GMCL" case.
Perhaps a more elegant solution would utilize each model's fitted method as is currently done for the 'residuals' method, but I did not investigate that.
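One caveat with the patch above, shown as a base R sketch with invented numbers: when there is more than one triangle, x is a matrix, and x[!is.na(x)] flattens it to a plain vector. Dropping incomplete rows instead keeps the matrix shape that the multiplication by diag(B[[i]], nrow=p) relies on. The objects A, B2, B_i and kept are hypothetical stand-ins, not package objects.

```r
# Two invented triangles whose upper-left cell is NA.
A  <- matrix(c(NA, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3)
B2 <- A * 10
Triangles <- list(A, B2)
p <- length(Triangles)

# Same extraction pattern as in the quoted line: first column, rows 1 to 3.
x <- sapply(Triangles, "[", 1:3, 1)   # 3 x 2 matrix, first row is NA

flat <- x[!is.na(x)]                          # plain vector, dimensions lost
kept <- x[complete.cases(x), , drop = FALSE]  # 2 x 2 matrix, NA row dropped

B_i <- c(1.5, 2)                      # hypothetical development factors
fitted_i <- kept %*% diag(B_i, nrow = p)

print(dim(flat))   # NULL
print(fitted_i)
```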
Thanks,
Dan
R.version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          5.3
year           2019
month          03
day            11
svn rev        76217
language       R
version.string R version 3.5.3 (2019-03-11)
nickname       Great Truth
Within the BootChainLadder function, adjusted Pearson residuals are calculated as follows:
adj.resids <- unscaled.residuals * sqrt(nobs/scale.factor)
This is consistent with the formula given in the "Addendum to 'Analytic and Bootstrap Estimates of Prediction Errors in Claims Reserving'" paper (England, 2001, equations 2.4 and 3.1).
However, it is not consistent with the Stochastic Claims Reserving In General Insurance paper (England & Verrall, 2002, Appendix 3). Here the bootstrapping procedure is described with the instruction to adjust the Pearson residuals using the following slightly different formula:
adj.resids <- unscaled.residuals * sqrt(n/scale.factor)
i.e. with n rather than nobs in the numerator.
n is the number of rows/columns in the claims triangle, nobs is the total number of data points observed ( nobs <- 0.5 * n * (n + 1) ), and scale.factor <- (nobs - 2 * n + 1), or "n - p" in the original paper.
The latter source is the paper actually referenced in the R help file for BootChainLadder, although this in turn references the former.
Does anybody know how the Pearson residual adjustment formula is derived? Are there cases for using either version or is one incorrect? There can be a significant impact on the resulting calculation of standard error of IBNR depending on which is used.
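To give a sense of the size of the difference, here is the arithmetic for a 10 x 10 triangle, using the definitions given in this post. Note that the post equates scale.factor with the paper's "n - p", and scale.factor = nobs - (2n - 1). So the paper's n may well denote the number of observations, in which case the two formulas would coincide. That reading is an assumption here, not something checked against the paper.

```r
n <- 10                            # rows/columns of the triangle
nobs <- 0.5 * n * (n + 1)          # observed data points
p <- 2 * n - 1                     # parameter count implied by scale.factor
scale.factor <- nobs - 2 * n + 1   # same as nobs - p

factor_nobs <- sqrt(nobs / scale.factor)  # adjustment with nobs in the numerator
factor_n    <- sqrt(n / scale.factor)     # adjustment with n in the numerator

print(c(nobs = nobs, scale.factor = scale.factor))
print(c(factor_nobs = factor_nobs, factor_n = factor_n))

# The two versions differ by a constant factor of sqrt(nobs / n).
ratio <- factor_nobs / factor_n
print(ratio)
```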
Hi,
There is an inconsistency where Mack summary has all origin periods, but glmReserve drops the first origin period.
library(ChainLadder)
dev_glm <- glmReserve(GenIns)
dev_mack <- MackChainLadder(GenIns)
dev_glm$summary
summary(dev_mack)$ByOrigin
Hi,
I'm actually trying to add some missing reserving models to the package. Are there norms that all models should follow to be considered viable models for triangles?
I've heard about the TriangleModel class, but I can't find it. Furthermore, which S3 methods should be implemented for my new models? What are the necessary outputs of a TriangleModel? The issue is that with S3 classes everything is optional, so it's hard to understand how things are bound together.
I'm implementing both univariate and multivariate models (notably the recursive Mack/MW bootstraps). Do I need to know anything special about the multivariate case (special classes? special formatting of outputs?)
Is there technical documentation about the package somewhere that could help me bring my code up to the package standards?
I am trying to calculate the CDR for Solvency II purposes. How can I get the CDR from a BootChainLadder object at 99.5%? Thanks!
updated: CDR(BCL, probs=c(0.995))
hi 😄
I have a short question regarding the ChainLadder package. I found a website with instructions for the Munich Chain Ladder that is included in the ChainLadder package:
https://cran.rstudio.com/web/packages/ChainLadder/vignettes/ChainLadder.html#munich-chain-ladder
Unfortunately, the data sets "MCLpaid" and "MCLincurred" used there are both pre-saved, and I have no idea how to replace them with my own data.
I would be very happy if somebody could help me, and I am sorry if this is a stupid question, but I am not very advanced in working with R.
Many Greetings
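On the Munich Chain Ladder question above: as far as I know, MCLpaid and MCLincurred are ordinary cumulative triangles stored as matrices, so own data can be typed in or read into the same shape. Below is a minimal sketch with invented numbers, using NA for the cells that are not yet observed. The names my_paid and my_incurred are hypothetical.

```r
# Invented cumulative paid triangle: rows = origin years, columns = development years.
my_paid <- matrix(c(100, 150, 170,
                    110, 168,  NA,
                    120,  NA,  NA),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(origin = c("2019", "2020", "2021"),
                                  dev = c("1", "2", "3")))

# Invented cumulative incurred triangle with the same layout.
my_incurred <- matrix(c(160, 175, 180,
                        180, 190,  NA,
                        190,  NA,  NA),
                      nrow = 3, byrow = TRUE,
                      dimnames = dimnames(my_paid))

print(my_paid)
print(my_incurred)
```

These matrices could then take the place of MCLpaid and MCLincurred in the vignette's MunichChainLadder call. A triangle this small is only meant to show the shape, and the model may need more data to fit.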
Hello!
I have 2 questions and really hope someone can help me understand.
1. Is there any way to find the process and parameter error from the MackChainLadder output? Are process and parameter error the same as the process risk and parameter risk that the package provides? If they are, is the sum of the two not supposed to be equal to Mack.S.E for a given accident year?
Am I missing something? I need help on these issues.
Thanks in advance
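On the question of the sum: under Mack's model the mean squared error of the reserve splits into a process variance and an estimation (parameter) variance, so the two components would be expected to add in squares rather than linearly. Here is a small arithmetic sketch with invented numbers. The variable names are illustrative and are not package output.

```r
# Invented standard-error components for one accident year.
process_risk   <- 30
parameter_risk <- 40

# Variances add, so the total standard error is the root of the sum of squares.
mack_se   <- sqrt(process_risk^2 + parameter_risk^2)
plain_sum <- process_risk + parameter_risk

print(mack_se)    # 50
print(plain_sum)  # 70, larger than the total standard error
```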