acclab / dabestr

Data Analysis with Bootstrap Estimation in R

Home Page: https://acclab.github.io/dabestr

License: Apache License 2.0

Language: R (100%)

Topics: data-visualization, data-analysis, statistics, estimation, r

dabestr's Introduction

dabestr

dabestr is a package for Data Analysis using Bootstrap-Coupled ESTimation.

Estimation statistics is a simple framework that avoids the pitfalls of significance testing. It uses familiar statistical concepts: means, mean differences, and error bars. More importantly, it focuses on the effect size of one’s experiment/intervention, as opposed to a false dichotomy engendered by P values.

An estimation plot has two key features.

It presents all datapoints as a swarmplot, which orders each point to display the underlying distribution.
It presents the effect size as a bootstrap 95% confidence interval on a separate but aligned axis.
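The second feature can be illustrated outside the package. Below is a minimal base-R sketch of a bootstrap 95% confidence interval for an unpaired mean difference. `boot_mean_diff()` is an illustrative helper, not a dabestr function, and the data are simulated. For clarity it uses a plain percentile interval. The dabestr printouts quoted in the issues below report bias-corrected and accelerated intervals instead, so its limits will not match dabestr exactly.

```r
# Illustrative helper (not part of dabestr): percentile bootstrap CI
# for mean(test) - mean(control)
boot_mean_diff <- function(control, test, reps = 5000, conf = 0.95) {
  diffs <- replicate(reps, {
    # resample each group independently, with replacement
    mean(sample(test, replace = TRUE)) - mean(sample(control, replace = TRUE))
  })
  alpha <- (1 - conf) / 2
  list(estimate = mean(test) - mean(control),
       ci = unname(quantile(diffs, c(alpha, 1 - alpha))))
}

set.seed(2019)
control <- rnorm(20, mean = 100, sd = 15)   # simulated data
test    <- rnorm(20, mean = 115, sd = 15)
res <- boot_mean_diff(control, test)
res$estimate   # observed mean difference
res$ci         # lower and upper 95% limits
```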

DABEST powers estimationstats.com, allowing everyone access to high-quality estimation plots.

Installation

# Install it from CRAN
install.packages("dabestr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github(repo = "ACCLAB/dabestr", ref = "dev")

Usage

library(dabestr)

data("non_proportional_data")

dabest_obj.mean_diff <- load(
  data = non_proportional_data,
  x = Group,
  y = Measurement,
  idx = c("Control 1", "Test 1")
) %>%
  mean_diff()

dabest_plot(dabest_obj.mean_diff, TRUE)

Please refer to the official tutorial for more useful code snippets.

Citation

Moving beyond P values: Everyday data analysis with estimation plots

Joses Ho, Tayfun Tumkaya, Sameer Aryal, Hyungwon Choi, Adam Claridge-Chang

Nature Methods (2019). ISSN 1548-7105. doi:10.1038/s41592-019-0470-3

Paywalled publisher site; Free-to-view PDF

Contributing

Please report any bugs on the GitHub issue tracker.

All contributions are welcome; please read the Guidelines for contributing first.

We also have a Code of Conduct to foster an inclusive and productive space.

Acknowledgements

We would like to thank alpha testers from the Claridge-Chang lab: Sangyu Xu, Xianyuan Zhang, Farhan Mohammad, Jurga Mituzaitė, and Stanislav Ott.

DABEST in other languages

DABEST is also available in Python (DABEST-python) and Matlab (DABEST-Matlab).


dabestr's Issues

Difficulty plotting based on levels

Error in `levels<-`(`*tmp*`, value = as.character(levels)) :
  factor level [4] is duplicated

Dataframe:
Z-Score_Alpha_Right_Temporal_GroupbyMeasurement.xlsx

Code:
library(readxl)
library(dabestr)   # legacy dabest() interface

url <- "https://github.com/ACCLAB/dabestr/files/5042212/Z-Score_Alpha_Right_Temporal_GroupbyMeasurement.xlsx"
tmp <- tempfile(fileext = ".xlsx")
download.file(url, tmp, mode = "wb")   # read_excel() reads local files
Z_Score_Alpha_Right_Temporal <- read_excel(tmp)

multi.two.group.unpaired <-
  Z_Score_Alpha_Right_Temporal %>%
  dabest(Group, Measurement,
         idx = list(c("ASDV", "TD"),
                    c("ASDMLV", "TD"),
                    c("ASDMLV", "ASDV")),
         paired = FALSE)
multi.two.group.unpaired.meandiff <- mean_diff(multi.two.group.unpaired)
plot(multi.two.group.unpaired.meandiff, color.column = Group)

The group appears to have 3 levels based on:
multi.two.group.unpaired.meandiff[[1]]$Group <- factor(multi.two.group.unpaired.meandiff[[1]]$Group, levels = unique(multi.two.group.unpaired.meandiff[[1]]$Group), ordered=FALSE)
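The position in the error message fits a simple explanation. This is a hypothesis, not a reading of dabestr's source. Suppose the plotting code flattens `idx` into factor levels. The three comparisons then yield six names, and the fourth (`"TD"`) is the first repeat. Base R reproduces the same message, and `unique()` removes it. The variable names below are illustrative.

```r
# Hypothetical reconstruction: the three comparisons from the issue,
# flattened into one vector of candidate factor levels
idx <- list(c("ASDV", "TD"), c("ASDMLV", "TD"), c("ASDMLV", "ASDV"))
all_groups <- unlist(idx)   # "ASDV" "TD" "ASDMLV" "TD" "ASDMLV" "ASDV"

# Element 4 repeats "TD", so factor() refuses these levels
err <- tryCatch(factor(c("TD", "ASDV"), levels = all_groups),
                error = function(e) conditionMessage(e))
err                         # "factor level [4] is duplicated"

# De-duplicating keeps the three distinct groups in first-seen order
fixed <- factor(c("TD", "ASDV"), levels = unique(all_groups))
levels(fixed)               # "ASDV" "TD" "ASDMLV"
```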

subscript and formulas in rawplot.ylabel

Hi,
I would like to include special characters and chemical formulas in rawplot.ylabel. I have tried a few approaches but so far no luck.
Do you have any suggestions?
thanks
D
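In R graphics, subscripts, superscripts and chemical formulas are usually written as plotmath expressions (see `?plotmath`). The sketch below builds such a label and draws it with base graphics on a null device. ggplot2 axis labels also accept expressions. We have not checked whether dabestr's `rawplot.ylabel` passes an expression through unchanged, so treat that part as an assumption to test.

```r
# Plotmath label: NO3- (subscript 3, superscript minus) in mg per litre
ylab_expr <- expression(NO[3]^"-" ~ "(mg" ~ L^-1 * ")")

pdf(NULL)                       # open a device that writes no file
plot(1:3, ylab = ylab_expr)     # base graphics renders the formatting
invisible(dev.off())
```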

Finetune application of ggplot themes

While we are able to apply themes when creating a plot, we need to finetune the theme application (remove gridlines etc).

contradictory results

Dear Adam and Joses,

it seems there is a bug in your program.
With the attached paired data set I get contradictory results:
The paired Hedges' g between TS_hist and TS_a is 0.174 [95.0%CI -0.134, 0.483].
The two-sided P value of the Wilcoxon test is 0.000693.
JASP gives d = 0.35 with a 95% CI from 0.13 to 0.57 and the same p = 0.00069.
TS.zip

Best regards,
Nikita
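One possible source of the gap is that a paired effect size can be standardised in more than one way. This is a hypothesis, not a diagnosis of either program. The mean difference can be divided by the SD of the within-pair differences (often called d_z). Alternatively, it can be divided by an average of the two conditions' SDs (d_av), which ignores the pairing. When pairs are strongly correlated, d_z is much larger than d_av. We have not verified which convention dabestr or JASP uses for the numbers above. The data below are made up.

```r
# Two standardisations of the same paired mean difference (illustrative data)
x <- as.numeric(1:10)                          # condition A
y <- x + rep(c(0.4, 0.6, 0.5, 0.5, 0.4), 2)    # condition B, tightly paired with A
d <- y - x                                     # within-pair differences

d_z  <- mean(d) / sd(d)                        # standardised by SD of differences
d_av <- mean(d) / sqrt((var(x) + var(y)) / 2)  # standardised by average group SD
round(c(d_z = d_z, d_av = d_av), 3)
```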

Misplaced parenthesis causing error to be misreported

dabestr/R/plot.R, line 445 in commit 1d322cf:

 stop(paste(stringr::str_interp("'${palette}' is not a valid ggplot2 palette.\n"), 

One-Sample t-test

Hi. The package is amazing and I generally find it very useful. Great job! One issue I am having is that I cannot run one-sample t-tests. Is there a way around this, and are you planning to introduce such a feature?
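Until a dedicated option exists, a one-sample question can be answered outside dabestr. The base-R sketch below runs the classical one-sample t-test against a reference value. In the estimation spirit of the package, it also computes a percentile bootstrap CI for the mean minus that reference. The data and the reference value `mu0` are made up for illustration.

```r
set.seed(42)
x   <- rnorm(25, mean = 0.4, sd = 1)   # illustrative measurements
mu0 <- 0                               # hypothetical reference value

tt <- t.test(x, mu = mu0)              # classical one-sample t-test
tt$conf.int                            # t-based 95% CI for the mean

# Bootstrap the effect size mean(x) - mu0
boot_diff <- replicate(5000, mean(sample(x, replace = TRUE)) - mu0)
boot_ci <- unname(quantile(boot_diff, c(0.025, 0.975)))
boot_ci
```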

Adding error bar to points

Hello!

I've been trying to add error bars to the individual points (for technical replicates in an experiment). It seems the plot function generates a ggplot2 object, but maybe it's not maintaining the original variable names?

library(dabestr)
library(dplyr)
library(ggplot2)

GroupCtrl <- rnorm(n = 5, mean = 5000000, sd = 500000)
GroupTest <- rnorm(n = 5, mean = 4500000, sd = 450000)

# 5 samples per group, each measured 3 times (technical replicates).
# Replicate identifies the sample, so it cycles 1-5 in each round of measurements.
SimulatedData <- data.frame(Group = factor(rep(c(0, 1), each = 15)),
                            Replicate = factor(rep(rep(1:5, times = 3), 2)),
                            PeakArea = c(GroupCtrl + rnorm(5, mean = 0, sd = 35000),
                                         GroupCtrl + rnorm(5, mean = 0, sd = 45000),
                                         GroupCtrl + rnorm(5, mean = 0, sd = 15000),
                                         GroupTest + rnorm(5, mean = 0, sd = 25000),
                                         GroupTest + rnorm(5, mean = 0, sd = 20000),
                                         GroupTest + rnorm(5, mean = 0, sd = 30000)))
levels(SimulatedData$Group) <- c("Control", "Treatment")

SEM <- function(x) sd(x) / sqrt(length(x))

# Average the technical replicates of each sample
GroupedData <- SimulatedData %>%
  group_by(Group, Replicate) %>%
  summarize(AvrgPA = mean(PeakArea), SEM = SEM(PeakArea), .groups = "drop")

unpaired_mean_diff <- dabest(GroupedData, Group, AvrgPA,
                             idx = c("Control", "Treatment"),
                             paired = FALSE)

p <- plot(unpaired_mean_diff)
p

So far so good, but if I try to add error bars:

p + geom_pointrange(aes(ymin=AvrgPA-SEM, ymax=AvrgPA+SEM))

Error in FUN(X[[i]], ...) : object 'AvrgPA' not found

Any workaround for this?

Thank you so much!

big data

Your program cannot analyze big data, e.g. the attached Size.xlsx.
For data like this, a jitter option seems necessary.
Nikita

Error in reporting n numbers in printed output

In the printed output of the dabest test, the n numbers for the control and test groups are reversed.

Examination of the output using str() shows the correct values, e.g. 9 control and 15 test in my example:

..$ control_group: chr "control"
..$ test_group : chr "test"
..$ control_size : int 9
..$ test_size : int 15

However, the printed output shows the control and test n numbers reversed:

DABEST (Data Analysis with Bootstrap Estimation) v0.2.4

Variable: value

Unpaired mean difference of test (n=9) minus control (n=15)
-18.7 [95CI -23.2; -13.9]

5000 bootstrap resamples.
All confidence intervals are bias-corrected and accelerated.

Examination of the R code suggests that two lines in printrow_dabest are swapped:

printrow_dabest <- function(my.row, sigdig = 3) {
  if (identical(my.row$paired, TRUE)) p <- "Paired" else p <- "Unpaired"
  ffunc <- my.row$func
  line1 <- stringr::str_interp(
    c(
      "${p} ${ffunc} difference of ",
      "${my.row$test_group} ",
      "(n=${my.row$control_size}) ",
      "minus ${my.row$control_group} ",
      "(n=${my.row$test_size})\n"
    )
  )
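For comparison, here is a hypothetical stand-alone helper, not the package's code. It builds the same first line but places each n next to its own group name. It uses base `sprintf()` instead of `stringr::str_interp()` so that it runs without extra packages. The field names mirror the `str()` output above.

```r
# Hypothetical helper: pair each group name with its own sample size
format_header <- function(my.row) {
  p <- if (isTRUE(my.row$paired)) "Paired" else "Unpaired"
  sprintf("%s %s difference of %s (n=%d) minus %s (n=%d)\n",
          p, my.row$func,
          my.row$test_group, my.row$test_size,        # test name with test n
          my.row$control_group, my.row$control_size)  # control name with control n
}

row <- list(paired = FALSE, func = "mean",
            control_group = "control", test_group = "test",
            control_size = 9L, test_size = 15L)
cat(format_header(row))   # Unpaired mean difference of test (n=15) minus control (n=9)
```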

Error

When I run dabest on the example in the vignette,

unpaired_mean_diff <- dabest(iris, Species, Petal.Width,
                             idx = c("setosa", "versicolor"),
                             paired = FALSE)

I get the following error:

Some components of ... were not used: ..1

It seems that this error appeared when compiling on CRAN: ftp://stat.ethz.ch/Software/CRAN/web/checks/check_results_dabestr.html

Here's my sessionInfo(). Any thoughts are greatly appreciated:

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dabestr_0.2.0 magrittr_1.5 boot_1.3-22

loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 assertthat_0.2.1 packrat_0.5.0 crayon_1.3.4 dplyr_0.8.1 R6_2.4.0 cellranger_1.1.0 pillar_1.4.1
[9] rlang_0.3.4 readxl_1.3.1 rstudioapi_0.10 ellipsis_0.1.0 forcats_0.4.0 tools_3.6.0 simpleboot_1.1-7 glue_1.3.1.9000
[17] purrr_0.3.2 xfun_0.7 compiler_3.6.0 pkgconfig_2.0.2 tidyselect_0.2.5 knitr_1.23 tibble_2.1.3

Extending color choice

Hello,

The issue: I needed flexibility in the choice of colours for the raw data in an estimation plot AND a colour vector long enough to cover more than 11 data groups, the number of colours in the bigger RColorBrewer palettes. The following worked for me. In the code file R/plot.R from GitHub, I replaced the original line 452,

ggplot2::scale_color_brewer(palette = palette) +

with

ggplot2::scale_colour_manual(values = rep(c("#70A4CE", "#70A4CE", "#CF955C", "#CF955C"), 3)) +

This was suitable for 12 data groups paired by color, alternating blue and dark orange as in the DABEST paper by Ho et al., Nature Methods, 2019. I'm running dabestr_0.2.2 from CRAN, and had to read in the (non-modified) plot_helpers.R and flat_violin.R files to make this work, in addition to the edited plot.R file.
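The hard-coded `rep(..., 3)` covers exactly 12 groups. The base-R sketch below generalises it by recycling the blue/orange pairs to any number of groups. `paired_palette()` is a name we made up, and the two hex codes are the ones from the post. If you need many distinct colours rather than alternating pairs, `grDevices::hcl.colors()` (R >= 3.6.0) avoids the fixed size of the Brewer palettes. The result would be passed as `values =` in the `scale_colour_manual()` line above.

```r
# Hypothetical helper: repeat each colour twice (one pair per comparison),
# then recycle the pattern to n groups
paired_palette <- function(n, colours = c("#70A4CE", "#CF955C")) {
  rep_len(rep(colours, each = 2), n)
}

paired_palette(12)          # same 12 values as the hard-coded vector
paired_palette(14)          # still works past 12 groups
hcl.colors(14, "Dark 3")    # 14 distinct colours from a built-in HCL palette
```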

Color palettes unavailable for multi-paired plot

I appear to have sorted out the issues related to #36 but have found a new issue (in v. 0.2.1.9). When doing a multi-paired plot, if a color column is passed to the plot function, there is no way to change the palette.

Eyeball test says that plot.R should have ggplot2::scale_color_brewer(palette = palette) + on line 410.

Reprex (designed for consistency with my current real data):

library(dabestr)
library(tidyverse)
demo_data <- data.frame(id = rep(1:32, each = 2),
                        group = rep(c("amber", "blue"), 
                                    times = c(34, 30)),
                        phase = rep(c("baseline", "ptx"),
                                    times = 32, each = 1),
                        score = rnorm(64, mean = 10, sd = 2),
                        sex = rep(c("Male", "Female"),
                                  times = 16, each = 2))

demo_data <- demo_data %>%
  unite(group, c("group", "phase"), sep = "_")


demo_dabest <- dabest(demo_data,
                      group, score,
                      idx = list(c("blue_baseline", "blue_ptx"),
                                 c("amber_baseline", "amber_ptx")),
                      paired = TRUE,
                      id.col = id)

# Black and white
plot(demo_dabest)

# Set1 colors
plot(demo_dabest,
     color.column = sex)

# Should be Dark2, but is in fact Set1
plot(demo_dabest,
     color.column = sex,
     palette = "Dark2")

Add standardized effect sizes

This feature will address #31 , and also bring dabestr up to speed with DABEST-Python.

Issue with effsize.y axis

I am having issues with the effsize.y axis. For some reason the mean and 95%CI is not aligning to the swarm plot. Please see attached screenshot and code. I have 5 other dabestr plots that I have created with other variables and they are all fine so I am unsure what is happening here. Have tried restarting R, clearing plots and environment and restarting computer.

changeACWR_est <- dabest(wlSNs_nobeta, group1, change_raACWR, idx=c("s", "ns"))
changeACWR_est_plot <- plot(changeACWR_est, rawplot.ylabel = "Change21 ACWR", effsize.ylabel = "")

Any advice would be much appreciated. Thanks in advance.

copyright details for `geom_flat_violin.R`

geom_flat_violin used by Allen and colleagues was written by David Robinson for R:
original: https://gist.github.com/dgrtwo/eb7750e74997891d7c20
version with minor fixes: tidyverse/ggplot2#2459

But you are attributing copyright only to Allen et al.:

dabestr/R/flat_violin.R, lines 1 to 2 in commit b2bb127:

 # Copyright (c) 2018 Micah Allen, Davide Poggiali, Kirstie Whitaker,
 # Tom Rhys Marshall and Rogier Kievit.

Changing legend appearance

Hi there,

Thanks for a great package!

I was wondering if there is a way to change the order of the legend items, labels, title etc. for a multi-paired (Cumming) plot. I have tried various options with no success (I'm still relatively new to R).

Any help would be great,

Thanks

requirement for ellipsis version 0.2.0.1

When I test the example code with plot(unpaired_mean_diff), an error occurs: "Error: 'check_dots_empty' is not an exported object from 'namespace:ellipsis'". I checked sessionInfo() and found ellipsis v0.1.0 installed. After updating it to the newer version, the plots show up.
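A quick way to check for this before plotting is a small base-R test. The 0.2.0.1 threshold comes from this issue's title, and the helper name `is_outdated()` is ours. The sketch only reports the problem and leaves the reinstall to you.

```r
# Hypothetical helper: TRUE if a package is missing or older than min_version
is_outdated <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(TRUE)  # not installed
  utils::packageVersion(pkg) < min_version
}

if (is_outdated("ellipsis", "0.2.0.1")) {
  message("ellipsis is missing or older than 0.2.0.1; ",
          "consider install.packages(\"ellipsis\")")
}
```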

number of digit in summary of dabestr

Hi all,
The summary of dabest() gives only integer results for the mean or median difference and the 95% CI.
How can I get one, two or more decimal digits?

Thank you
Massimo
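The printing code quoted in an earlier issue (`printrow_dabest(my.row, sigdig = 3)`) suggests that the summary rounds to three significant digits rather than to a fixed number of decimals. For values in the hundreds or thousands, three significant digits leave nothing after the decimal point, which looks like integer output. The base-R lines below show the difference on made-up numbers. We have not confirmed whether `sigdig` can be changed from user code in your version.

```r
x <- c(estimate = 1234.5678, lower = 987.6543, upper = 1501.2345)  # made-up values
signif(x, 3)   # three significant digits: 1230, 988, 1500
round(x, 2)    # two decimal places: 1234.57, 987.65, 1501.23
```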

Allow control of ggplot object

Hi, thank you for such an awesome package. I don't know if this is possible, but I would very much like to be able to modify the plot produced by dabestr as I can with ggplot. Some things are available as arguments to the plot function, but I would rather be able to do:

plot(unpaired_mean_diff, rawplot.ylabel = "y") +
  labs(title = "My awesome title here",
       caption = "My caption here") +
  theme(legend.position = "bottom") # and other theme stuff here

Even though the returned object is of class "gg" "ggplot" I cannot interact normally with it (as in, add more layers).
For example, if I store a minimum plot() call into p and try:

p + theme(panel.border = element_rect(color="black"))

I get a blank plot

Changing the shape of data

Hi there,
Thanks for this great package!

I want to change the shape of the data points in a multi-paired (Cumming) plot. The default shape is a dot. How can I specify the shape for each column of data points? For example, my multi-paired plot has four columns of data points (x-axis groups: group1-low-ability, group1-high-ability, group2-low-ability, group2-high-ability), and I want to change the shape of the points for "group1-low-ability" and "group2-low-ability".

any suggestion?
Thank you very much !

New version of dabestr crashing

When we recently updated our packages, including dabestr, the data that used to run without a problem are now crashing with the following error:

Error in dabest(data, group, PIs, idx = groupnames, paired = FALSE,  : 
  unused argument (func = median)

Here a reproducible example straight from our data:

groupnames = c("WTB1","WTB2")
names(groupnames)=c("name","name")
tt <- "group       category        PIs
1   WTB1  '7 left_torque'  0.1900000
2   WTB1  '7 left_torque' -0.8083333
3   WTB1 '7 right_torque' -0.2416667
4   WTB1 '7 right_torque' -0.6300000
5   WTB1  '7 left_torque' -0.6516667
6   WTB1 '7 right_torque' -1.0000000
7   WTB1  '7 left_torque' -0.3925000
8   WTB2 '7 right_torque' -0.9250000
9   WTB2  '7 left_torque' -0.7391667
10  WTB2  '7 left_torque' -0.9991667
11  WTB2 '8 right_torque' -0.9791667
12  WTB2  '7 left_torque' -0.8366667"
data <- read.table(text=tt, header = TRUE)
dabest(data, group, PIs, idx = groupnames, paired = FALSE, func = median)

This all worked fine before the update. Thanks for any help!

P.S.: The code is part of our evaluation suite at https://github.com/brembslab/DTSevaluations and can be found at the end of the file project.Rmd

Error in bca.ci for larger sample sizes

I've taken one of the examples and increased the sample size, which caused the code to fail:

library(dplyr)
library(dabestr)
set.seed(54321)
N <- 500
c1 <- rnorm(N, mean = 100, sd = 25)
c2 <- rnorm(N, mean = 100, sd = 50)
g1 <- rnorm(N, mean = 120, sd = 25)
g2 <- rnorm(N, mean = 80, sd = 50)
g3 <- rnorm(N, mean = 100, sd = 12)
g4 <- rnorm(N, mean = 100, sd = 50)
gender <- c(rep('Male', N/2), rep('Female', N/2))
dummy <- rep("Dummy", N)
id <- 1:N

wide.data <-
  tibble::tibble(
    Control1 = c1, Control2 = c2,
    Group1 = g1, Group2 = g2, Group3 = g3, Group4 = g4,
    Dummy = dummy,
    Gender = gender, ID = id)

my.data <-
  wide.data %>%
  tidyr::gather(key = Group, value = Measurement, -ID, -Gender, -Dummy)

shared.control <-
  my.data %>%
  dabest(Group, Measurement,
         idx = c("Control1", "Control2", "Group1", "Group2", "Group3", "Group4"),
         paired = FALSE,
         reps = 500)

this fails with

Error in bca.ci(boot.out, conf, index[1L], L = L, t = t.o, t0 = t0.o,  : 
  estimated adjustment 'a' is NA
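A likely explanation comes from how the boot package works rather than from dabestr's code. `boot.ci()` estimates the BCa acceleration `a` from empirical influence values. For an ordinary bootstrap it estimates those by regression on the bootstrap replicates, which needs more replicates than observations; otherwise `a` comes out NA. If dabestr bootstraps each comparison as one data set, each comparison here has 1,000 observations but only `reps = 500`. Keeping `reps` well above the pooled sample size should therefore avoid the error. The simulated sketch below calls boot directly with R comfortably above n.

```r
library(boot)   # recommended package, ships with standard R installs

set.seed(54321)
n <- 400
dat <- data.frame(value = c(rnorm(n / 2, 100, 25), rnorm(n / 2, 120, 25)),
                  grp   = rep(c("Control", "Test"), each = n / 2))

# Statistic in boot's (data, indices) form: mean(Test) - mean(Control)
mean_diff_stat <- function(d, i) {
  d <- d[i, ]
  mean(d$value[d$grp == "Test"]) - mean(d$value[d$grp == "Control"])
}

# R (replicates) well above n (rows), so the regression-based influence
# values behind the acceleration estimate are identifiable
b  <- boot(dat, mean_diff_stat, R = 2000, strata = factor(dat$grp))
ci <- boot.ci(b, conf = 0.95, type = "bca")
ci$bca[4:5]   # lower and upper BCa limits
```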

Two group unpaired mean difference - negative values

In dabestr version 0.3.0,
when both groups have negative values, the unpaired_mean_diff plot on the right side shows the wrong mean difference.
For example, if one group's values are around -330 and the other group's values are around -300, the unpaired mean difference plot shows -630 as the mean difference.
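As a sanity check on the expected value: the difference of those two means is about +30, while about -630 is what you get by adding them. That hints at a dropped sign somewhere, though we have not traced where it happens in the package. The data below are simulated to match the description, not taken from the report.

```r
set.seed(7)
group_a <- rnorm(20, mean = -330, sd = 5)   # simulated "around -330" group
group_b <- rnorm(20, mean = -300, sd = 5)   # simulated "around -300" group

mean(group_b) - mean(group_a)   # about +30: the expected mean difference
mean(group_b) + mean(group_a)   # about -630: what a lost minus sign produces
```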

Misaligned mean difference line

When plotting proportional data, I found that the mean difference line is not aligned to the mean difference distribution.

Here is an example.

df <- cbind(data.frame(value = sample(1:9, 100, replace = TRUE) / 10),
            data.frame(grp = sample(c("A", "B"), 100, replace = TRUE)))

plot(dabest(df, x = grp, y = value, idx = c("A","B")))

Using par() and effect size color on plots

Hi guys!

I'm currently changing some of my analysis for a publication using estimation statistics and estimation plots.
Firstly, it is fantastic!

I could use some help with two small issues; I hope they are not too naive (I can't say I'm an R expert).

I'm trying to use par(mfrow()) to plot multiple plots side-by-side but I've failed so far. Can't figure out why. Could you guys help me with this one?
The default color for the bootstrap sampling distribution in Gardner–Altman plots is grey and I can't find a way to change this. However, in Fig. 1e (Ho et al. 2019) it is red. Was it changed manually in another software? I read that color setting is an issue for v0.3.0, but if there is an alternative available, it would be of great help.

Thank you!

consider adding Bayesian bootstrap as difference option?

Love the package! It makes a comparison in a paper I'm contributing really easy to visualize.

I would like to use the Bayesian bootstrap so as to be consistent with the other analytics in the paper. In the context of the plot/data, the difference between the methods will be indistinguishable, but statistical reviewers would know it's technically incorrect. Rasmus Bååth has a package already for this at https://github.com/rasmusab/bayesboot and probably there are some others that might be easy to integrate.

Thanks for considering this!
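For reference, Rubin's Bayesian bootstrap replaces resampling with random Dirichlet(1, ..., 1) weights on the observed points. Below is a minimal base-R sketch for a two-group mean difference. It draws the Dirichlet weights by normalising standard exponentials and reports a 95% interval of the posterior draws. The helper names are ours and are not part of dabestr. The bayesboot package linked above is a more complete implementation.

```r
# Hypothetical helper: one Dirichlet(1, ..., 1) weight vector of length n
rdirichlet1 <- function(n) {
  g <- rexp(n)          # Exp(1) draws are Gamma(1, 1)
  g / sum(g)            # normalised gammas follow a flat Dirichlet
}

# Posterior draws of mean(test) - mean(control) as weighted means
bayes_boot_mean_diff <- function(control, test, draws = 4000) {
  replicate(draws,
            sum(rdirichlet1(length(test)) * test) -
              sum(rdirichlet1(length(control)) * control))
}

set.seed(1)
control <- rnorm(30, mean = 10, sd = 2)   # simulated data
test    <- rnorm(30, mean = 12, sd = 2)
post <- bayes_boot_mean_diff(control, test)
quantile(post, c(0.025, 0.975))   # 95% interval of the posterior draws
```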

Unable to change effsize.ylim

Hi! I'm having trouble changing the y-axis scale of the effect size in a two-group unpaired DABEST plot. I have two different plots to construct, and I want them to share the same scale to make them easier to compare.

Here is the code with some data to exemplify:

cont1 <- c(17, 18.6, 21, 27, 13, 26, 19.5, 14.8, 12, 23, 10, 14.6, 23, 24)
treat1 <- c(30, 24, 18, 21, 12, 27.1, 25.5, 29, 45, 22, 29.3, 34, 22, 13)
cont2 <- c(6, 6, 22, 20, 15, 21, 13, 6, 6, 19, 6, 6, 31, 29)
treat2 <- c(6.1, 25, 22, 20, 15, 21, 13, 30, 60, 19, 6, 6, 31, 29)

First dataset

data.a <- tibble::tibble(Control = cont1, Treatment = treat1)
data.a.f <-
  data.a %>%
  tidyr::gather(key = Group, value = Measurement)
dabest.a <-
  data.a.f %>%
  dabest(Group, Measurement,
         # The idx below passes "Control" as the control group,
         # and "Treatment" as the test group. The mean difference
         # will be computed as mean(Treatment) - mean(Control).
         idx = c("Control", "Treatment"),
         paired = FALSE)
plot(dabest.a,
     rawplot.ylabel = "Concentration",
     effsize.ylabel = "Mean difference", palette = "Dark2",
     rawplot.markersize = 4, rawplot.groupwidth = 0.2,
     rawplot.ylim = c(10, 50),
     effsize.ylim = c(0, 30))

Second dataset

data.b <- tibble::tibble(Control = cont2, Treatment = treat2)
data.b.f <-
  data.b %>%
  tidyr::gather(key = Group, value = Measurement)
dabest.b <-
  data.b.f %>%
  dabest(Group, Measurement,
         # The idx below passes "Control" as the control group,
         # and "Treatment" as the test group. The mean difference
         # will be computed as mean(Treatment) - mean(Control).
         idx = c("Control", "Treatment"),
         paired = FALSE)
plot(dabest.b,
     rawplot.ylabel = "Concentration",
     effsize.ylabel = "Mean difference", palette = "Dark2",
     rawplot.markersize = 4, rawplot.groupwidth = 0.2,
     rawplot.ylim = c(10, 50),
     effsize.ylim = c(0, 30))

Another question: how do I change the tick interval (for example, every 5 rather than every 10)? Thanks!

palette argument does not work

I try to follow Joses' tutorial: https://cran.r-project.org/web/packages/dabestr/vignettes/using-dabestr.html

I use the code in the tutorial:

plot(multi.group, color.column = Gender, palette = c("#FFA500", "sienna4"))

But it does not work and gives a warning message:

Warning messages:
1: In if (!palette %in% unlist(brewer)) { :
  the condition has length > 1 and only the first element will be used
2: In pal_name(palette, type) : Unknown palette #FFA500sienna4

In my own case, my aim is to manually change the colour of rawdata swarmplot. In addition (maybe related or not), I also wish to manually change the colour of slope lines (for example, to "grey"). Any idea how to do it? Thanks!

Marker Transparency

Thank you for generating this package - I'm excited about the possibilities it opens for interpreting my data!

I would like to render the datapoints/markers semi-transparent (alpha < 1) so that overplotted data will be easier to visualize. I have attempted using the following arguments, based on code from the readme, tutorial, and manual files, though have not had any luck. I do not receive any errors using the following code, though the transparency of the data points in the plot is not altered.

plot(unpaired_mean_diff, rawplot.type = "swarmplot", palette = c(rgb(0, 0, 0, 0.2), alpha("sienna4", 0.2)))

plot(unpaired_mean_diff, rawplot.type = "swarmplot", palette = c(rgb(0, 0, 0), "sienna4")) + geom.quasirandom(alpha = 0.2)

I have also attempted using the swarmplot.params argument (based on geom_quasirandom()), but receive the error "Error in plot.dabest_effsize(unpaired_mean_diff, rawplot.type = "swarmplot", : swarmplot.params is not a list."

plot(unpaired_mean_diff, rawplot.type = "swarmplot", palette = c(rgb(0, 0, 0), "sienna4"), swarmplot.params = (alpha = 0.2))

Interestingly, if I change the plot type to "sinaplot", alpha can be changed through the palette argument, though the group width becomes rather thin and is not responsive to the rawplot.groupwidth argument.

plot(unpaired_mean_diff, rawplot.type = "sinaplot", rawplot.groupwidth = 1, palette = c(rgb(0, 0, 0, 0.2), alpha("sienna4", 0.2)))

I am currently troubleshooting with RStudio v1.3.1056 and dabestr v0.3.0 using the iris dataset. As I'm relatively new to R, I would like to know whether I am missing a simple argument or command, or whether there is a bug and/or a workaround. Any advice would be appreciated!
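Two base-R facts may help here. First, a colour's transparency can be encoded directly in its hex code, where the last two hex digits are the alpha channel. Second, `grDevices::adjustcolor()` adds alpha to named colours without extra packages (the `alpha()` used above comes from the scales package). Going by the error message, `swarmplot.params` also appears to expect a list, i.e. `list(alpha = 0.2)` rather than `(alpha = 0.2)`. We have not verified that dabestr forwards `alpha` to the swarm layer.

```r
# 20% opacity: 0.2 * 255 = 51 = 0x33, appended as the alpha channel
rgb(0, 0, 0, alpha = 0.2)                 # "#00000033"
adjustcolor("sienna4", alpha.f = 0.2)     # sienna4 with alpha 0x33
semi <- c(rgb(0, 0, 0, 0.2), adjustcolor("sienna4", alpha.f = 0.2))

# Candidate value for swarmplot.params, suggested by the error message (unverified)
swarm_params <- list(alpha = 0.2)
```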

NAs in plot dabestr

Hi,
I am trying to plot a two.group.unpaired object with the package dabestr. I am able to create the object and find the mean difference between my two groups, but not to plot it.
Briefly, I have 18 moose calves that were captured in January (time = 1) and recaptured in spring (time = 2). However, only 11 were actually recaptured in spring due to mortality (so my two groups are not the same size). We weighed the moose at each capture, so we have their mass in January and in spring.

The NAs seem to be causing problems but I don't know how to solve it.
Here is an example of what happens.

id <- c('18001','18001', '18002', '18002', '18003', '18003', '18004', '18004', 
        '18005', '18005', '18006', '18006', '18007', '18007', '18008', '18008',
        '18009', '18009', '18010', '18010', '18011', '18011', '18012', '18012',
        '18013', '18013', '18014', '18014', '18015', '18015', '18016', '18016',
        '18017', '18017', '18018', '18018')
time <- c('1', '2', '1', '2', '1', '2', '1', '2', '1', '2', '1', '2', '1', '2',
          '1', '2', '1', '2', '1', '2', '1', '2', '1', '2', '1', '2', '1', '2',
          '1', '2', '1', '2', '1', '2', '1', '2')
location<- c('N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N',
             'N', 'N', 'N', 'N', 'N', 'N', 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'S',
             'S', 'S', 'S', 'S', 'S', 'S', 'S', 'S')
masskg<- c(235,220,174,169,193,163,221,207,188, NA,242,217,175, NA,196,190,196,186,
           184, 172, 168, NA, 205, 191, 172, NA, 189, 169, 203, 166, 167, NA, 219, NA,
           194,NA)

mm <- data.frame(id, time, location, masskg)

library(tidyverse)
library(dabestr)

#> Loading required package: boot
#> Loading required package: magrittr
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
#> The following object is masked from 'package:tidyr':
#> 
#>     extract

two.group.unpaired <- mm %>% dabest(time, masskg, idx = c("1", "2"), paired = FALSE)
two.group.unpaired

#> DABEST (Data Analysis with Bootstrap Estimation) v0.2.2
#> =======================================================
#> 
#> Variable: masskg 
#> 
#> Unpaired mean difference of 2 (n=18) minus 1 (n=11)
#>  -9.25 [95CI  -24.7; 5.6]
#> 
#> 
#> 5000 bootstrap resamples.
#> All confidence intervals are bias-corrected and accelerated.

plot(two.group.unpaired, color.column = location)
#> Error in quantile.default(~masskg): missing values and NaN's not allowed if 'na.rm' is FALSE

plot(two.group.unpaired, color.column = location, na.rm = T)
#> Error: `...` is not empty.
#> 
#> We detected these problematic arguments:
#> * `na.rm`
#> 
#> These dots only exist to allow future extensions and should be empty.
#> Did you misspecify an argument?

plot(na.omit(two.group.unpaired, color.column = location))
#> Error in quantile.default(~masskg): missing values and NaN's not allowed if 'na.rm' is FALSE

Created on 2019-10-22 by the reprex package (v0.3.0)

Thank you

Delphine
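Since `dabest()` itself ran, one workaround to try is dropping the rows with missing mass before building the object, so nothing downstream has to handle NA. The sketch rebuilds the posted data frame compactly and filters it with base R. It only prepares the input; whether the plot then succeeds is untested here. The counts also show that the "n=18" and "n=11" in the printout above belong to the other group, which matches the n-number issue reported earlier.

```r
# Same data as the post, rebuilt with rep()
mm <- data.frame(
  id       = rep(sprintf("180%02d", 1:18), each = 2),
  time     = rep(c("1", "2"), times = 18),
  location = rep(c("N", "S"), times = c(20, 16)),
  masskg   = c(235, 220, 174, 169, 193, 163, 221, 207, 188, NA, 242, 217,
               175, NA, 196, 190, 196, 186, 184, 172, 168, NA, 205, 191,
               172, NA, 189, 169, 203, 166, 167, NA, 219, NA, 194, NA)
)

mm_complete <- mm[!is.na(mm$masskg), ]   # keep only animals that were weighed
table(mm_complete$time)                  # 18 at time 1, 11 at time 2

# Then, as in the post:
# two.group.unpaired <- mm_complete %>%
#   dabest(time, masskg, idx = c("1", "2"), paired = FALSE)
# plot(two.group.unpaired, color.column = location)
```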

Outcomes in WEB and R don't match

By chance I noticed that the outcome of a paired (X and Y) mean difference I ran in R (dabestr v0.3.0) differs considerably from the one I obtained by copying and pasting, and also uploading, the data to https://www.estimationstats.com/#/analyze/paired.

In R: Paired mean difference of X (n = 202) minus Y (n = 202) 0.161 [95CI -0.371; 0.713]
In WEB: The paired mean difference between X and Y is 0.161 [95.0%CI -0.0476, 0.367]

In both cases 202 paired observations were used (I attached the csv file). On the web, both variables (X, Y) were uploaded as expected in separate columns, and in R I prepared a tidy dataset as explained here.

As you can see, the limits of the 95% CI are very different and I don't understand why. Which one is the correct outcome?

By the way, I wonder to what extent the width of the 95% CI can be used to assess whether the scores of my two variables (X, Y) are 'equivalent', as is done in TOST analysis (e.g. the TOSTER R package). Suppose I define two scores in my research field as equivalent if their mean paired difference falls between -0.5 and +0.5. If all I said is correct, the web outcome [95.0%CI -0.0476, 0.367] would support equivalence but the R outcome does not. Am I correct to use the 95% CI from dabestr to assess equivalence, or do I have to run specific tests for this?

Thanks you so much for your software. Actually, as soon as I was able to solve this issue I'm going to use it in my next paper. 👍

Best regards from Spain!

202_X_Y_paired_data.zip
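A plausible cause, not confirmed from the attached file: when the point estimate is identical but the interval is several times wider, the pairing may have been lost. Resampling the two columns independently produces exactly that pattern, for example if the id column did not line up the pairs in the tidy data. The simulated sketch below compares a paired bootstrap, which resamples within-pair differences, with an unpaired one on strongly correlated data.

```r
set.seed(202)
n <- 202
x <- rnorm(n, mean = 5, sd = 2)
y <- x - 0.16 + rnorm(n, sd = 0.5)   # strongly correlated with x (simulated)
reps <- 5000

# Paired: resample the within-pair differences
d <- x - y
paired <- replicate(reps, mean(sample(d, replace = TRUE)))

# Unpaired: resample each column independently, ignoring the pairing
unpaired <- replicate(reps,
  mean(sample(x, replace = TRUE)) - mean(sample(y, replace = TRUE)))

ci_paired   <- quantile(paired,   c(0.025, 0.975))
ci_unpaired <- quantile(unpaired, c(0.025, 0.975))
widths <- c(paired = unname(diff(ci_paired)), unpaired = unname(diff(ci_unpaired)))
widths   # the unpaired interval is several times wider
```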

Direction of effect

Hi,

I was wondering why the direction of effect is the same regardless of which way around the groups are. See the attached graphs for examples. Both display the same data, but one is "off minus on", while the other is "on minus off". Yet they have the same direction of effect.

Thanks.

Median difference in R

Hello,

I have been using your website interface to graph my data but would like to switch over to graphing in R to take advantage of choosing some details, like the color of the plots. Is there a function to analyze the data with the median (instead of the default mean) in R? I can't seem to find it, if it's out there already!

Thanks for this wonderful resource!
Miwa
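As far as we know, dabestr 0.3.x and later provide a `median_diff()` function that follows the same pattern as `mean_diff()` in the Usage section. Check the reference manual of your installed version. Independently of the package, the base-R sketch below shows what a bootstrap CI for a median difference computes, using a percentile interval and made-up skewed data.

```r
set.seed(11)
control <- rexp(40, rate = 1 / 10)   # made-up skewed data
test    <- rexp(40, rate = 1 / 14)

med_diff <- median(test) - median(control)
boot_med <- replicate(5000,
  median(sample(test, replace = TRUE)) - median(sample(control, replace = TRUE)))
ci <- unname(quantile(boot_med, c(0.025, 0.975)))
print(c(estimate = med_diff, lower = ci[1], upper = ci[2]))
```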

using superscript in the names of groups in plot

Hi all,
I need to change the group names in a dabestr plot using superscripts and subscripts,

from Asaia phM4 to Asaia^phM4

any suggestion?

Thank you very much

Massimo
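Group labels with superscripts can be written as plotmath expressions, for example by turning "Asaia phM4" into `Asaia^phM4` with `parse()`. The base-R sketch below draws such labels on an axis, using a null device. ggplot2 discrete scales also accept expression labels (e.g. via `scale_x_discrete(labels = ...)`). Whether adding a scale to a dabestr plot works in your version is an assumption; see the issue on adding layers above.

```r
groups <- c("Control", "Asaia phM4")
# Replace the space with ^ and parse each name into a plotmath expression
labs <- parse(text = sub(" ", "^", groups, fixed = TRUE))

pdf(NULL)                                   # draw without writing a file
plot(1:2, c(3, 5), xaxt = "n", xlab = "", xlim = c(0.5, 2.5))
axis(1, at = 1:2, labels = labs)            # second label drawn as Asaia^phM4
invisible(dev.off())
```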

How to remove the legend

How can I get rid of the legend? I tried a ggplot-like argument (legend.position = "none"), but I get: Error in theme_classic(legend.position = "none") : unused argument (legend.position = "none")

Thanks

EDIT - OK, ignore: I just remembered that the legend goes away if I don't include a color.column argument.

Import plyr

The new version of ggplot2 no longer depends on plyr. dabestr has somehow been able to make plyr:: calls in its code without explicitly depending on plyr, but these will now break because plyr is no longer present on a clean system.

To fix this, add plyr to Imports in the DESCRIPTION file and submit to CRAN

Removing the distribution plot

Hi,

I want to combine 9 plots in one graph, but I noticed that the graph is too big.
Is it possible to remove the distribution plot and just present the effect size plot?

And how can I remove the legends and the x- and y-axis labels?

Thanks
Cheers,

Giulliana

Error with unused argument "..1"

Hi there, the package looks great and I'm keen to run it in R, but when I try to run the dabest function, I get the following error:

dabestr::dabest(iris, Species, Petal.Width, idx = c("setosa", 
    "versicolor", "virginica"), paired = FALSE)
#> 1 components of `...` were not used.
#> 
#> We detected these problematic arguments:
#> * `..1`
#> 
#> Did you misspecify an argument?

The traceback from the error is:

    x
 1. \-dabestr::dabest(...)
 2.   \-forcats::as_factor(data.out[[x_quoname]], all_groups)
 3.     \-(function (env = parent.frame()) ...
 4.       \-ellipsis:::stop_dots(...)

My session info is:

#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 7 x64 (build 7601) Service Pack 1
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.1252 
#> [2] LC_CTYPE=English_United Kingdom.1252   
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_3.6.0  magrittr_1.5    tools_3.6.0     htmltools_0.3.6
#>  [5] yaml_2.2.0      Rcpp_1.0.1      stringi_1.4.3   rmarkdown_1.13 
#>  [9] highr_0.8       knitr_1.23      stringr_1.4.0   xfun_0.7       
#> [13] digest_0.6.19   evaluate_0.14

No idea what's causing this, but any help in getting dabestr to run would be much appreciated!
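The traceback points at `forcats::as_factor(data.out[[x_quoname]], all_groups)`. The second argument seems intended to set the level order. However, `as_factor()` has no levels argument, so in newer forcats versions it lands in `...` and is rejected as unused. One package-side fix, offered as a suggestion rather than the maintainers' patch, is base `factor()` with explicit levels. The sketch below uses hypothetical stand-ins for the two variables.

```r
# Hypothetical stand-ins for data.out[[x_quoname]] and all_groups
species    <- c("versicolor", "setosa", "virginica", "setosa")
all_groups <- c("setosa", "versicolor", "virginica")

# factor() takes the desired level order explicitly
f <- factor(species, levels = all_groups)
levels(f)   # "setosa" "versicolor" "virginica"
```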

Controlling Aesthetics in dabestr

Great package so far, but I'm having trouble controlling some key aesthetics.
I want to be able to control the y-axis title and move it closer to its axis.
This can normally be achieved in ggplot using theme(axis.title.y = element_text(hjust=3.5)).
However, if we add this line of code to the plotting of a Gardner-Altman plot using dabestr, it doesn't work.
Is there some way to control this aesthetic?

Below is an example using the iris dataset, and the associated error if we try to alter the theme:
data(iris)
iris <- dplyr::filter(iris, Species != "setosa")
iris_plot <- dabest(iris, Species, Sepal.Length, idx = c("versicolor", "virginica"), paired= FALSE)
plot(iris_plot, rawplot.ylabel = "Sepal Length (cm)", effsize.ylabel = "Mean difference", rawplot.type = "sinaplot", theme(axis.title.y = element_text(hjust=3.5)))

Error: ... is not empty.

We detected these problematic arguments:

..1

These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?
Run rlang::last_error() to see where the error occurred.

Side note: can we obtain a printout of specific values from the Gardner-Altman plot?
I've attached a picture to hopefully explain this a little better.

Thanks in advance!

Setting function to Cohen's d or Hedges' g

Hi,

In a paired comparison, is it possible to set the function (func =) to obtain the Cohen's d or Hedges' g effect size, as on the website version?

On the website version, I can choose between mean, median, Cohen's d and Hedges' g. In R, I tried func = mean and func = median, and both work. But what argument should I pass to get the Cohen's d effect size, as on the website?

Thank you very much.

paired_mean_diff <- data %>% dabest(., x, y, idx = c("Control", "Group"), paired = TRUE, id.column = id, ci = 95, func = mean)
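The call above suggests that func is the summary applied to each group before the groups are compared (mean, median). Cohen's d and Hedges' g instead standardise the difference by a spread estimate, so they may not be reachable through func at all. For reference, here is a base-R sketch of the usual formulas. Both helpers are hypothetical and not part of dabestr. Unpaired d uses the pooled SD. Paired d divides the mean paired difference by the average of the two group SDs, which is one common convention; others divide by the SD of the differences. Hedges' g applies the approximate correction J = 1 - 3/(4(n1 + n2) - 9). The website's exact conventions are not confirmed here.

```r
# Hypothetical helpers (not part of dabestr). Effect sizes are "y minus x",
# i.e. test minus control.
cohens_d <- function(x, y, paired = FALSE) {
  if (paired) {
    stopifnot(length(x) == length(y))
    # Mean of the paired differences over the average of the two group SDs
    return(mean(y - x) / sqrt((var(x) + var(y)) / 2))
  }
  n1 <- length(x)
  n2 <- length(y)
  # Pooled standard deviation
  sd_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(y) - mean(x)) / sd_pooled
}

hedges_g <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  # Approximate small-sample bias correction (unpaired case only)
  correction <- 1 - 3 / (4 * (n1 + n2) - 9)
  cohens_d(x, y) * correction
}

control <- c(1, 2, 3)
test <- c(4, 5, 6)
cohens_d(control, test)                # 3
hedges_g(control, test)                # 2.4
cohens_d(control, test, paired = TRUE) # 3
```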

Error: 'check_dots_empty' is not an exported object from 'namespace:ellipsis'

I followed the code in the vignette and got this error when I tried to plot. I was able to create the dabest object but can't plot it. The development version also has this issue. Any advice? Thanks!

Paired data with more than two measures don't show a slopegraph

Problem

Paired data with more than two measures don't show a slopegraph, meaning that the main data points are not visually paired.

One fix is to use color, e.g.,

color.column = groups

But with large datasets this is hard to parse, and requires some theme adjustment to avoid a huge legend, e.g.,

theme = ggplot2::theme_classic() + ggplot2::theme(legend.position = "none")

Solution

Ideally, each subject's data point in one measure would be connected by a line to the same subject's point in the next consecutive measure.
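The proposed layout can be sketched in base R graphics, independently of dabestr. The idea is to build one segment per subject between each pair of consecutive measures and then draw all the segments. The long-format column names (id, measure, value), the simulated data and the helper slope_segments() are assumptions made for illustration.

```r
# Hypothetical helper: one row per line segment joining a subject's value at
# measure k to the same subject's value at measure k + 1.
slope_segments <- function(long, measures) {
  pieces <- list()
  for (k in seq_len(length(measures) - 1)) {
    from <- long[long$measure == measures[k], c("id", "value")]
    to <- long[long$measure == measures[k + 1], c("id", "value")]
    # Only subjects observed at both measures get a segment
    joined <- merge(from, to, by = "id", suffixes = c("_from", "_to"))
    pieces[[k]] <- data.frame(id = joined$id,
                              x0 = k, y0 = joined$value_from,
                              x1 = k + 1, y1 = joined$value_to)
  }
  do.call(rbind, pieces)
}

set.seed(1)
measures <- c("baseline", "week1", "week2")
long <- data.frame(id = rep(1:20, times = 3),
                   measure = rep(measures, each = 20),
                   value = rnorm(60, mean = 10, sd = 2))
segs <- slope_segments(long, measures)

# Draw the slopegraph: one grey line per subject, points on top
plot(NA, xlim = c(0.8, length(measures) + 0.2), ylim = range(long$value),
     xaxt = "n", xlab = "", ylab = "value")
axis(1, at = seq_along(measures), labels = measures)
segments(segs$x0, segs$y0, segs$x1, segs$y1, col = "grey60")
points(match(long$measure, measures), long$value, pch = 19)
```

Because each segment is tied to a subject id rather than a colour, no legend is needed, however large the dataset.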

How to output Cliff's Delta as effect size

Hi,

Thanks for the excellent estimation stats package. May I ask how I can output Cliff's Delta as effect size measure in the Gardner-Altman estimation plot? I am new to using dabestr and I apologize if I have missed the method in the package documentation.

Thank you.

Regards,
Yng Miin Loke
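Cliff's delta is the probability that a randomly chosen test value exceeds a randomly chosen control value, minus the probability of the reverse. Below is a base-R sketch of the point estimate. The helper is hypothetical, and whether a given dabestr version provides this effect size directly is not established here. A confidence interval could then be bootstrapped like any other statistic.

```r
# Hypothetical helper: Cliff's delta for test (y) versus control (x).
# Compares every test value with every control value; ties count as zero.
cliffs_delta <- function(x, y) {
  comparisons <- sign(outer(y, x, "-"))  # +1 if y > x, -1 if y < x, 0 if tied
  mean(comparisons)
}

cliffs_delta(c(1, 2, 3), c(4, 5, 6))  # 1: every test value is larger
cliffs_delta(c(1, 2, 3), c(2, 3, 4))  # 5/9
```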

Adding Darker Average Line

Hello,
Is there a way to add a darker average line for the multi-paired two group plot?

Thank you,
Ben
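Whatever dabestr itself offers here, the idea can be sketched in base graphics. First draw each subject's paired line in a light colour, then overlay the group means as a thicker, darker line. The data below are simulated, and the colours and line width are arbitrary choices.

```r
set.seed(42)
n <- 15
pre <- rnorm(n, mean = 10, sd = 2)
post <- pre + rnorm(n, mean = 1, sd = 1)
group_means <- c(mean(pre), mean(post))

plot(NA, xlim = c(0.8, 2.2), ylim = range(pre, post),
     xaxt = "n", xlab = "", ylab = "value")
axis(1, at = 1:2, labels = c("pre", "post"))
# One light line per subject
segments(1, pre, 2, post, col = "grey75")
# Darker, thicker line joining the two group means
lines(1:2, group_means, col = "black", lwd = 4)
```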

Visualization is different across computers?

When I plot dabest objects, I get annoying black borders around the effect size distributions, seen here. They look ugly and can partially hide the actual confidence intervals.

Additionally, when I re-knitted "bootstrap-confidence-intervals.Rmd," all the images with distribution curves (even the images not created with dabestr) had black borders. Not sure what that means.

Anyway, I included my session info below, if that helps. I'm running Windows, and I know not everyone else does.

Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] dabestr_0.2.0   magrittr_1.5    boot_1.3-20     forcats_0.3.0   stringr_1.3.1  
 [6] dplyr_0.8.0.1   purrr_0.2.5     readr_1.3.1     tidyr_0.8.2     tibble_2.0.1   
[11] ggplot2_3.1.0   tidyverse_1.2.1

loaded via a namespace (and not attached):
 [1] beeswarm_0.2.3     tidyselect_0.2.5   xfun_0.8           haven_2.0.0       
 [5] lattice_0.20-38    colorspace_1.4-0   generics_0.0.2     htmltools_0.3.6   
 [9] yaml_2.2.0         rlang_0.3.1        pillar_1.3.1       simpleboot_1.1-7  
[13] glue_1.3.0         withr_2.1.2        RColorBrewer_1.1-2 modelr_0.1.2      
[17] readxl_1.2.0       plyr_1.8.4         munsell_0.5.0      gtable_0.2.0      
[21] cellranger_1.1.0   rvest_0.3.2        evaluate_0.14      labeling_0.3      
[25] knitr_1.23         vipor_0.4.5        broom_0.5.1        Rcpp_1.0.0        
[29] backports_1.1.3    scales_1.0.0       jsonlite_1.6       hms_0.4.2         
[33] digest_0.6.18      stringi_1.2.4      cowplot_0.9.4      grid_3.5.2        
[37] cli_1.0.1          tools_3.5.2        lazyeval_0.2.1     crayon_1.3.4      
[41] pkgconfig_2.0.2    rsconnect_0.8.13   xml2_1.2.0         ggbeeswarm_0.6.0  
[45] lubridate_1.7.4    assertthat_0.2.0   httr_1.4.0         rstudioapi_0.9.0  
[49] R6_2.3.0           nlme_3.1-137       compiler_3.5.2

Sinaplot

I like your idea! It seems, though, that you are actually not making a swarm plot but, wisely, a sina plot. The name should be changed in the README and corrected in your publication. Swarm plots have a branching structure, which I don't see in your plots. Sina plots have random jitter within the distribution (like jitter within a violin plot). For small samples the difference is hard to see, but for large ones it is clear. Either way, I suggest calling it a sinaplot.
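To make the distinction concrete, here is a base-R sketch of sina-style placement. Each point is jittered horizontally at random, within the width of a kernel-density "violin" evaluated at the point's own y value. A beeswarm layout, by contrast, places points deterministically so that they do not overlap; that is not shown here. The helper sina_offsets() and its max_width parameter are hypothetical. The sketch says nothing about which method dabestr actually uses.

```r
# Hypothetical helper: sina-style horizontal offsets. Each point is jittered
# uniformly within the kernel-density width at its own y value.
sina_offsets <- function(y, max_width = 0.4) {
  dens <- density(y)
  # Density at each observation, by linear interpolation of the KDE grid
  local <- approx(dens$x, dens$y, xout = y)$y
  half_width <- max_width * local / max(dens$y)
  runif(length(y), min = -half_width, max = half_width)
}

set.seed(7)
y <- rnorm(500)
x <- 1 + sina_offsets(y)
plot(x, y, pch = 16, cex = 0.5, xlim = c(0.5, 1.5), xaxt = "n", xlab = "")
```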

p value

Is there any way I can get the p-value of a permutation t-test using this package? Thanks!
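Whether or not dabestr reports one, a permutation p-value can be computed in a few lines of base R. This sketch assumes two independent groups and uses the difference in means (test minus control) as the statistic, with a two-sided comparison. The helper perm_test() and its reps argument are hypothetical.

```r
# Two-sided permutation test for a difference in means (test minus control).
perm_test <- function(control, test, reps = 5000) {
  observed <- mean(test) - mean(control)
  pooled <- c(control, test)
  n_control <- length(control)
  perm_diffs <- replicate(reps, {
    shuffled <- sample(pooled)  # random relabelling of all observations
    mean(shuffled[-seq_len(n_control)]) - mean(shuffled[seq_len(n_control)])
  })
  # Add-one correction keeps the p-value away from exactly zero
  p_value <- (sum(abs(perm_diffs) >= abs(observed)) + 1) / (reps + 1)
  list(difference = observed, p_value = p_value)
}

set.seed(2019)
result <- perm_test(control = rnorm(20, mean = 0), test = rnorm(20, mean = 1))
result$p_value
```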

BCa CI not possible

Hello
Cool package.
You write:
"All confidence intervals are bias-corrected and accelerated".
But I think this is a mistake.

In your main.R, at line 333, you write:
bootci <- boot::boot.ci(boot, conf = ci/100, type = c("perc", "bca"))
I cannot find the formula for the BCa CI, but you can see from this example that it does not work:

my_boot <- simpleboot::two.boot(rnorm(n = 100, mean = 1), rnorm(n = 151, mean = 3),  R= 100, FUN = mean)
boot::boot.ci(my_boot, conf = 0.95, type = "bca")

This is because the BCa formula depends on the acceleration a, which in turn depends on n. A difference of means has no single n, only n1 and n2. It is, however, possible to construct a bias-corrected (BC) CI.

My point is that it is not possible to compute a BCa CI for a difference in means (see slide 36 here http://users.stat.umn.edu/~helwig/notes/bootci-Notes.pdf)
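For comparison, here is a sketch of one way a BCa interval is sometimes computed for a two-sample difference in means. The two groups are resampled independently. The acceleration a is estimated by jackknifing: one observation at a time is left out of either group, giving n1 + n2 jackknife values in total. Generalising the acceleration to two samples this way is an assumption. It is not a description of what dabestr's main.R or boot::boot.ci do, and the helper bca_mean_diff() is hypothetical.

```r
# Sketch: BCa interval for mean(y) - mean(x), resampling each group
# independently and using a pooled jackknife for the acceleration constant.
bca_mean_diff <- function(x, y, reps = 5000, conf = 0.95) {
  theta_hat <- mean(y) - mean(x)
  boot_diffs <- replicate(reps,
    mean(sample(y, replace = TRUE)) - mean(sample(x, replace = TRUE)))

  # Bias correction z0: median bias of the bootstrap distribution
  z0 <- qnorm(mean(boot_diffs < theta_hat))

  # Jackknife: drop one x at a time, then one y at a time
  jack <- c(vapply(seq_along(x), function(i) mean(y) - mean(x[-i]), numeric(1)),
            vapply(seq_along(y), function(j) mean(y[-j]) - mean(x), numeric(1)))
  d <- mean(jack) - jack
  a <- sum(d^3) / (6 * sum(d^2)^1.5)

  # Adjusted percentile levels, then read them off the bootstrap distribution
  z <- qnorm(c((1 - conf) / 2, (1 + conf) / 2))
  adj_probs <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
  unname(quantile(boot_diffs, probs = adj_probs))
}

set.seed(333)
x <- rnorm(100, mean = 1)
y <- rnorm(151, mean = 3)
bca_mean_diff(x, y)
```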

Multi-paired analysis with unequal samples

Greetings,
I'm hoping you can help me. I'm looking to analyze some data from a repeated measures ANOVA design (2 groups, pre- and post-measures). Unfortunately there is some missing data and so currently my sample sizes are unbalanced (n=12 and n=15).

I can use dabestr to analyze the groups separately using a paired analysis. However, if I try to use the multi-group paired method, I get an error. Any advice or help would be great.

See below for a reproducible example

Data

library(tidyverse)
library(dabestr)

demo_data <- data.frame(id = rep(1:27, each = 2),
                        # 15 amber and 12 blue subjects, two rows (phases) each
                        group = rep(c("amber", "blue"),
                                    times = c(15, 12) * 2),
                        phase = rep(c("baseline", "ptx"),
                                    times = 27, each = 1),
                        score = rnorm(54, mean = 10, sd = 2))

Single group analyses:

demo_data %>%
  filter(group == "blue") %>%
  dabest(phase, score,
         idx = list(c("baseline", "ptx")),
         paired = TRUE,
         id.col = id)

DABEST (Data Analysis with Bootstrap Estimation) v0.2.0

Variable: score

Paired mean difference of ptx (n=12) minus baseline (n=12)
0.255 [95CI -1.61; 1.85]

5000 bootstrap resamples.
All confidence intervals are bias-corrected and accelerated.

demo_data %>%
  filter(group == "amber") %>%
  dabest(phase, score,
         idx = list(c("baseline", "ptx")),
         paired = TRUE,
         id.col = id)

DABEST (Data Analysis with Bootstrap Estimation) v0.2.0

Variable: score

Paired mean difference of ptx (n=15) minus baseline (n=15)
-0.719 [95CI -2.12; 0.619]

5000 bootstrap resamples.
All confidence intervals are bias-corrected and accelerated.

However, it fails if I try to run both groups together:

# Merge group and phase for a single column

demo_data %>%
  unite(group, c("group", "phase"), sep = "_") %>%
  dabest(group, score,
         idx = list(c("blue_baseline", "blue_ptx"),
                    c("amber_baseline", "amber_ptx")),
         paired = TRUE,
         id.col = id)

This fails with:

Error in dabest(., group, score, idx = list(c("blue_baseline", "blue_ptx"), :
The two groups are not the same size, but paired = TRUE.

The group sizes are:

demo_data %>%
  unite(group, c("group", "phase"), sep = "_") %>%
  group_by(group) %>%
  summarise(count = n())

# A tibble: 4 x 2
  group          count
  <chr>          <int>
1 amber_baseline    15
2 amber_ptx         15
3 blue_baseline     12
4 blue_ptx          12
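Because the pairing is within each group, unequal group sizes should not matter in principle, as long as every subject has both phases. Here is a base-R sketch, independent of dabestr, that computes each group's paired mean difference (ptx minus baseline) with a percentile bootstrap CI, resampling subjects. It rebuilds the demo data so that each subject stays in a single group, which is assumed to be the intended design. The helper paired_diff_ci() is hypothetical, and a percentile interval is used here rather than BCa.

```r
set.seed(1)
# 15 amber and 12 blue subjects, each measured at baseline and ptx
demo_data <- data.frame(id = rep(1:27, each = 2),
                        group = rep(c("amber", "blue"), times = c(15, 12) * 2),
                        phase = rep(c("baseline", "ptx"), times = 27),
                        score = rnorm(54, mean = 10, sd = 2))

# Hypothetical helper: paired mean difference (ptx - baseline) within one
# group, with a percentile bootstrap CI obtained by resampling subjects.
paired_diff_ci <- function(d, reps = 5000, conf = 0.95) {
  wide <- merge(d[d$phase == "baseline", c("id", "score")],
                d[d$phase == "ptx", c("id", "score")],
                by = "id", suffixes = c("_baseline", "_ptx"))
  diffs <- wide$score_ptx - wide$score_baseline
  boots <- replicate(reps, mean(sample(diffs, replace = TRUE)))
  ci <- quantile(boots, c((1 - conf) / 2, (1 + conf) / 2))
  data.frame(group = d$group[1], n = length(diffs), mean_diff = mean(diffs),
             lower = ci[[1]], upper = ci[[2]])
}

results <- do.call(rbind, lapply(split(demo_data, demo_data$group), paired_diff_ci))
results
```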

Column names conflict with local variables

I got the strange result shown below.

> iris %>% dabestr::dabest(Species,Sepal.Length,c("virginica","setosa"))
DABEST (Data Analysis with Bootstrap Estimation) v0.2.0
=======================================================

Variable: Sepal.Length 

Unpaired mean difference of setosa (n=50) minus virginica (n=50)
 -1.58 [95CI  -1.78; -1.38]


5000 bootstrap resamples.
All confidence intervals are bias-corrected and accelerated.

> iris %>% rename(group=Species) %>% dabestr::dabest(group,Sepal.Length,c("virginica","setosa"))
DABEST (Data Analysis with Bootstrap Estimation) v0.2.0
=======================================================

Variable: Sepal.Length 

Unpaired mean difference of setosa (n=50) minus virginica (n=50)
 0 [95CI  -0.144; 0.128]


5000 bootstrap resamples.
All confidence intervals are bias-corrected and accelerated.

The cause of this is that data_for_diff %>% dplyr::filter(!!x_enquo == group[1]) in main.R is evaluated as data_for_diff %>% dplyr::filter(group == group[1]) in the latter case.
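The same masking can be reproduced in base R with subset(), which also evaluates its condition inside the data frame first. In the sketch below, group[1] and group[2] resolve to the first values of the column rather than to the local vector. Both "groups" therefore end up being setosa, and the difference is zero, matching the output above. In dplyr, the usual remedy is to force the local value with !!group[1] or the .env pronoun (.env$group[1]). The base-R fix shown here simply indexes outside the data frame. The data frame name dat is illustrative.

```r
dat <- data.frame(group = iris$Species, y = iris$Sepal.Length)
group <- c("virginica", "setosa")  # local variable with the same name as a column

# Inside subset(), `group` is the column, so group[1] and group[2] are the
# species of the first two rows ("setosa") in both calls.
masked_a <- subset(dat, group == group[1])
masked_b <- subset(dat, group == group[2])
mean(masked_b$y) - mean(masked_a$y)  # 0

# Indexing outside the data frame uses the local vector, avoiding the clash.
safe_a <- dat[dat$group == group[1], ]
safe_b <- dat[dat$group == group[2], ]
mean(safe_b$y) - mean(safe_a$y)  # setosa minus virginica: -1.582
```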

	# Copyright (c) 2018 Micah Allen, Davide Poggiali, Kirstie Whitaker,
	# Tom Rhys Marshall and Rogier Kievit.