GithubHelp home page GithubHelp logo

rigr's Introduction

CRAN status R-CMD-check codecov

rigr: Regression, Inference, and General Data Analysis Tools for R

Introduction

rigr is an R package to streamline data analysis in R. Learning both R and introductory statistics at the same time can be challenging, and so we created rigr to facilitate common data analysis tasks and enable learners to focus on statistical concepts.

rigr, formerly known as uwIntroStats, provides easy-to-use interfaces for descriptive statistics, one- and two-sample inference, and regression analyses. rigr output includes key information while omitting unnecessary details that can be confusing to beginners. Heteroskedasticity-robust (“sandwich”) standard errors are returned by default, and multiple partial F-tests and tests for contrasts are easy to specify. A single regression function (regress()) can fit both linear models, generalized linear models, and proportional hazards models, allowing students to more easily make connections between different classes of models.

Installation

You can install the stable release of rigr from CRAN as follows:

install.packages("rigr")

You can install the development version of rigr from GitHub using the code below. The installment is through the R package remotes.

# run the following code to make sure that the remote package it installed
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
# install the package
remotes::install_github("statdivlab/rigr")

rigr is maintained by the StatDivLab, but relies on community support to log issues and implement new features. Is there a method you would like to have implemented? Please submit a pull request or start a discussion!

Documentation

Examples of how to use the main functions in rigr are provided in three vignettes, available here. One details the regress function and its utilities, one details the descrip function for descriptive statistics, and the third details functions used for one- and two-sample inference, including ttest, wilcoxon, and proptest.

Humans

Maintainer: Amy Willis

Authors: Scott S Emerson, Brian D Williamson, Charles Wolock, Taylor Okonek, Yiqun T Chen, Jim Hughes, Amy Willis, Andrew J Spieker and Travis Y Hee Wai.

Issues

If you encounter any bugs, please file an issue. Better yet, submit a pull request!

Do you have a question? Please first check out the vignettes, then please post on the Discussions.

Code of Conduct

Please note that the rigr project is released with a [Contributor Code of Conduct](https://statdivlab.github.io/rigr/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.

rigr's People

Contributors

adw96 avatar ameerd avatar bdwilliamson avatar cwolock avatar gthopkins avatar svteichman avatar taylorokonek avatar yiqunchen avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

rigr's Issues

factors in bplot

bplot should be able to handle factors

bplot(fev,agecat,xlab="Age Category",ylab="FEV (liters)") # fails if agecat is a factor

coefficient names unclear for categorical variables in `regress`

Note that when the following code is run:

library(rigr); data(mri)
regress("mean", atrophy ~ sex, data=mri)

the coefficient names printed are "Intercept" and "sex", whereas from lm() we'd have "(Intercept)" and "sexMale", this telling us which value of the categorical variable the coefficient corresponds to. We should have something similar to this in our print method for regress. Maybe something like "sex: Male"?

Alternative hypothesis in ttest

The alternative hypothesis in test is specified with the argument "test.type". It would be better to be consistent with other R functions (particularly t.test) and use the argument "alternative". The attached version of ttest changes the
argument name everywhere.
my_ttest.txt

Does `ifelse1` need to exist?

I think this function has identical behavior and interface as ifelse - I'll look into it but it doesn't seem like we need this.

Make styling consistent

Currently, function names and objects follow inconsistent styling. Ideally, we would follow a style guide (e.g., Hadley Wickham's or Google's) so that it is easier to read and debug the code.

I propose following Hadley's for the most part, and using underscores (_) in both function and object names.

Also, ideally the main user-facing functions will have their own files (e.g., descrip.R) and internal helper functions will live in a separate file (e.g., descrip_utils.R or possibly a separate file for each internal function).

descrip no longer works is intended on `mri` data

A previous line in test_cases vignette was descrip(mri). This no longer works. Possible reasons include that mri is now a tibble rather than a dataframe, but perhaps also because it contains variables of different types (not only numeric).

Streamline `ttest.R` code

The code for this function is much more verbose than it needs to be, including many statements like if (condition){}; else{do stuff}. For ease of future maintenance this should be cleaned up.

Document helper functions

Many helper functions (i.e., non-exported functions) have no documentation. I think a good goal is to have all functions be documented in some way.

why does `regress` no longer support proportional hazards regression, and what do we want moving forward?

The documentation for regress seems to suggest you can do proportional hazards regression using fnctl = "hazard", but the code will actually throw an error if you specify fnctl = "hazard", stating "proportional hazards regression no longer supported". That being said, the remainder of the code seems to still have if/else cases and other sections in case fnctl = "hazard", so is this error something that was thrown in recently? @bdwilliamson do you have any insight?

Another question is whether we want regress to be able to handle proportional hazards regression moving forward.

online documentation

It would be wonderful to have online documentation. I haven't looked into this for many years but I recall this being pretty easy with github.io and pkgdown.

Fix `ttesti` documentation

ttesti.R documentation needs some changes - a few of the parameter descriptions aren't working properly.

Make TRUE and FALSE consistent

Currently, T and F are sometimes used as shorthand for TRUE and FALSE. While this is possible, it isn't good practice -- TRUE and FALSE are protected, while T and F are valid object names and thus can be overwritten by the user.

We should use the full words!

`proptest`

Implement proptest function as a branch from ttest. Add continuity correction, which is not currently implemented.

devtools::check() fails

As a result of some of the change is associated with modernizing the package, devtools::check() fails.

updated functions from scott emerson

In an earlier email from Scott Emerson to me: "I have some updated functions that have not yet been put in the package."

Could the summer TA team please that reach out to him about this and incorporate those edits?

remove `version` parameter from `descrip()`?

Currently if you specify version = TRUE in descrip(), the function will return "20160730" and nothing else. I'd imagine we either want to to return a date that's a bit closer to today, or just remove this parameter entirely. Personally, I'm in favor of removing it entirely. Any objections or other thoughts?

bplot axis label issue

the followiong produces a plot with double x axis labels

bplot(fev,as.integer(agecat),xlab="Age Category",ylab="FEV (liters)")

CRAN

It would be great to be able to install this package from CRAN -- this would enormously help many of our students with using it for the first time. Target date: Sept 13.

interaction terms in regress

notice in the output of the following, the interaction term is not clearly labelled

also the overall F-statistic from regress does not agree with the F-statistic from lm

regress("mean",fev~ht*as.integer(sex))

`equal` function could be cleaner

In termTraverse.R (soon to be in regress_utils.R), there is a function defined called equal, and the value returned calls the function itself (not sure if there are any implications to this?). The function seems to be doing the same thing as checking if length(unique(x)) == 1, other than that they would handle vectors that include NAs differently. This is mostly a reminder to myself to edit this helper function to be a bit cleaner in the future.

equal <- function(x){
if(length(x)==1){
return(TRUE)
} else {
if(x[1]!=x[2]){
return(FALSE)
} else {
return(equal(x[-1]))
}
}
}

lincom

It would be very useful if lincom could handle lm objects as well as uRegress objects

Remove `version` parameter from functions, add version and/or package creation date when package is loaded

It will be far more maintainable to have the version listed in one spot (for example, when the package is loaded) as opposed to in individual functions. version parameter has already been removed from regress, descrip, and both of their utility functions, but the parameter likely still exists elsewhere.

At the suggestion of @bdwilliamson, we could consider is adding a print message when the package gets loaded that says something about the version and/or the package creation date.

This issue builds off of #33

`wilxcoxon()` returns incorrect `data.name`

This seems to me to be a scoping issue, since wilcoxon() calls wilcoxon.do(). I tried using sys.frame(1) in the substitute(), without success. Commented out failing unit tests for this - restore these when fixed.

Streamline `wilcoxon.R` code

wilcoxon.R has some redundancies, including calculating the p-value multiple times. The whole function should be simplified.

`wilcoxon()` provides two (different) p-values when `exact = TRUE`

As currently written, wilcoxon() is (as far as I can tell) largely a copy-pasted version of wilcox.test() from the stats package. That copied code provides a p-value (variable PVAL in the code), either using an exact test, or Normal approximation.

One of the additions in wilcoxon() is the inf return object, which is described as "a formatted table of inference values, for printing." This object provides the PVAL mentioned above, but also includes the Z-score and corresponding Normal approximation p-value. This p-value will correspond to PVAL when exact = FALSE. However, when exact = TRUE is used, PVAL is an exact p-value, but the Normal approximation p-value is still included in inf.

Long story short: when exact = TRUE, wilcoxon() provides both the exact and approximate p-value, which I think would be confusing to a user. To see this run:

set.seed(2)
y <- rnorm(100)
wilcoxon(y, exact = TRUE)

The inf return object has p-value 0.70012 (exact) and 0.69762 (Normal approximation).

My feeling is that we shouldn't report two different p-values. Thoughts?

regress doesn't accept factors

Other regression commands in R accept factor covariates but
regress("mean", fev ~ ht+sex,data=fevdata) # fails if sex is a factor

Incorrect label in ttest

I downloaded the rigr package on 7/23 and was using ttest() and got this result. Notice the incorrect label above the output that says "One-sample t-test"

with(hiv,ttest(cd4[art==0],cd4[art==1]))

Call:
ttest(var1 = cd4[art == 0], var2 = cd4[art == 1])

Two-sample t-test allowing for unequal variances :

One-sample t-test :

Summary:
Group Obs Missing Mean Std. Err. Std. Dev. 95% CI
cd4[art == 0] 78 14 388.5 32.5 260 [323.5, 454]
cd4[art == 1] 50 19 298.5 32.9 183 [231.3, 366]
Difference 128 33 90.1 46.3 [-2.02, 182]

Ho: difference in means = 0 ;
Ha: difference in means != 0
t = 1.946 , df = 80.7
Pr(|T| > t) = 0.0551195

force users to write out the full fnctl argument in `regress`

We no longer want to support shorted fnctl strings in regress

Discussed in #53

Originally posted by taylorokonek August 9, 2021
Currently, users can enter a shortened character string for the fnctl argument in regress, and so long as it corresponds to a unique substring, the function will run. As an example, a user could simply set fnctl = "g", and the function would take this to mean fnctl = "geometric mean". Personally, I don't think there's a huge harm in forcing users to spell out the fnctl argument. Does anyone have any insight as to why this particular functionality was introduced (@bdwilliamson)?

bug in ttesti

one-sided p-value in ttesti wrong for Ha: <. Fixed in the attached file (to find changes, search for "JPH")
my_ttesti.txt

intercepts in linear models

If I recall correctly, you can use the argument intercept = FALSE in regress(). Is it possible to add the -1 functionality?

salary dataset as dataset

Currently both some vignettes and some examples run the following to get the salary data

salary <- read.table("http://www.emersonstatistics.com/datasets/salary.txt", header = TRUE, stringsAsFactors = FALSE)

This should instead be a documented dataset released with the package.

This impact the vignette regress_intro amongst others.

what is 'stratify' supposed to do in bplot?

I can't seem to get an example working (that doesn't throw an error) where the 'strata' argument in bplot is used. Is this argument supposed to do something analogous to facet_wrap in ggplot, or something else?

changed behavior for descrp

The newest version of descrip prints, by default, the columns Max Restriction, FirstEvent, LastEvent, isDate. Those seem confusing for the average user, especially if none of the measures is a time-to-event. Can we have descrip not print those columns by default?

Also, at the bottom of the descrip output I see
attr(,"class")
[1] "uDescriptives"
which will be confusing to the naive user.

Finally, all statistics are printed in scientific notation which is different from before and harder to read.

Having said all this, I just figured out that I think the issue is simply that the print.uDescriptives function has not been included in the rigr package?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.