ohdsi / selfcontrolledcohort

An R package for performing self-controlled cohort analyses, a method to estimate risk by comparing time exposed with time unexposed among the exposed cohort.

Home Page: http://ohdsi.github.io/SelfControlledCohort

R 98.57% Perl 0.60% Shell 0.83%

hades

selfcontrolledcohort's Introduction

SelfControlledCohort

SelfControlledCohort is part of HADES.

Introduction

This package provides a method to estimate risk by comparing time exposed with time unexposed among the exposed cohort.
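The core comparison is an incidence rate ratio (IRR): the outcome rate during exposed time divided by the outcome rate during unexposed time, both measured in the same exposed cohort. The sketch below is a minimal illustration under stated assumptions. You already have outcome counts and time at risk for both periods, and the helper name `computeIrrSketch()` is hypothetical. The package itself appears to use `rateratio.test` (see the first issue below). This sketch swaps in base R's `stats::poisson.test()`, which gives the same rate ratio with an exact conditional confidence interval.

```r
# Hypothetical helper, not part of the SelfControlledCohort API.
# Rate ratio = (outcomes exposed / time exposed) / (outcomes unexposed / time unexposed)
computeIrrSketch <- function(numOutcomesExposed, numOutcomesUnexposed,
                             timeAtRiskExposed, timeAtRiskUnexposed) {
  # Two-sample exact Poisson test: estimate is the ratio of the two rates
  test <- stats::poisson.test(x = c(numOutcomesExposed, numOutcomesUnexposed),
                              T = c(timeAtRiskExposed, timeAtRiskUnexposed))
  c(irr = unname(test$estimate),
    irrLb95 = test$conf.int[1],
    irrUb95 = test$conf.int[2])
}

# 20 outcomes in 1000 exposed days vs 10 outcomes in 2000 unexposed days
res <- computeIrrSketch(20, 10, 1000, 2000)
res
```

Here the IRR is (20/1000) / (10/2000) = 4. An IRR above 1 suggests a higher outcome rate during exposed time.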

Features

Extracts the necessary data from a database in OMOP Common Data Model format.
Supports stratification by age, gender, and index year.
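One way to picture stratification is to pool counts and time at risk within each stratum before forming the rate ratio. The sketch below assumes a hypothetical person-level data frame. Its column names (`outcomesExposed`, `daysExposed`, and so on) and the helper `irrByStratum()` are illustrative only and are not the package's output format or API. It uses base R only and computes crude point estimates without confidence intervals.

```r
# Hypothetical sketch: crude IRR per stratum (e.g. gender, age group, index year).
irrByStratum <- function(persons, stratum) {
  # Sum outcome counts and days at risk within each level of the stratum column
  f <- stats::as.formula(paste(
    "cbind(outcomesExposed, outcomesUnexposed, daysExposed, daysUnexposed) ~",
    stratum))
  agg <- stats::aggregate(f, data = persons, FUN = sum)
  # Rate during exposed time divided by rate during unexposed time
  agg$irr <- (agg$outcomesExposed / agg$daysExposed) /
             (agg$outcomesUnexposed / agg$daysUnexposed)
  agg
}

persons <- data.frame(gender            = c("F", "F", "M", "M"),
                      outcomesExposed   = c(2, 1, 1, 0),
                      outcomesUnexposed = c(1, 1, 2, 2),
                      daysExposed       = c(100, 100, 100, 100),
                      daysUnexposed     = c(200, 200, 200, 200))
irrByStratum(persons, "gender")
```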

Example

library(SelfControlledCohort)

# createConnectionDetails() comes from DatabaseConnector, which is loaded with the package
connectionDetails <- createConnectionDetails(dbms = "postgresql",
                                             user = "joe",
                                             password = "secret",
                                             server = "myserver")

# Three exposures and one outcome, with outcomes taken from the condition_era table
sccResults <- runSelfControlledCohort(connectionDetails,
                                      cdmDatabaseSchema = "cdm_data",
                                      exposureIds = c(767410, 1314924, 907879),
                                      outcomeIds = 444382,
                                      outcomeTable = "condition_era")

summary(sccResults)

Technology

SelfControlledCohort is an R package.

System Requirements

Requires R. Some of the libraries used by SelfControlledCohort require Java.

Getting Started

See the instructions here for configuring your R environment, including Java.
In R, use the following commands to download and install SelfControlledCohort:

install.packages("remotes")
remotes::install_github("ohdsi/SelfControlledCohort")

User Documentation

Documentation can be found on the package website.

PDF versions of the documentation are also available:

Package manual: SelfControlledCohort.pdf

Support

Developer questions/comments/feedback: OHDSI Forum
We use the GitHub issue tracker for all bugs/issues/enhancements.

Contributing

Read here how you can contribute to this package.

License

SelfControlledCohort is licensed under Apache License 2.0.

Development

SelfControlledCohort is being developed in RStudio.

Development status

Beta

selfcontrolledcohort's People

jamieweaver azimov ablack3

selfcontrolledcohort's Issues

Remove i argument from computeIrr function

It appears that the argument i in the computeIrr function is never assigned a value, unless I'm missing something: mapply only supplies the four named count and time-at-risk arguments. In any case, the code gives identical results when the i argument is removed.

library(SelfControlledCohort)
#> Loading required package: DatabaseConnector

computeIrrs <- function(estimates) {
  computeIrr <- function(i, numOutcomesExposed, numOutcomesUnexposed, timeAtRiskExposed, timeAtRiskUnexposed) {
    test <- rateratio.test::rateratio.test(x = c(numOutcomesExposed[i],
                                                 numOutcomesUnexposed[i]),
                                           n = c(timeAtRiskExposed[i],
                                                 timeAtRiskUnexposed[i]))
    return(c(test$estimate[1], test$conf.int))
  }
  irrs <- mapply(computeIrr,
                 numOutcomesExposed = estimates$numOutcomesExposed,
                 numOutcomesUnexposed = estimates$numOutcomesUnexposed,
                 timeAtRiskExposed = estimates$timeAtRiskExposed,
                 timeAtRiskUnexposed = estimates$timeAtRiskUnexposed)
  estimates$irr <- irrs[1, ]
  estimates$irrLb95 <- irrs[2, ]
  estimates$irrUb95 <- irrs[3, ]
  return(estimates)
}


computeIrrs2 <- function(estimates) {
  computeIrr <- function(numOutcomesExposed, numOutcomesUnexposed, timeAtRiskExposed, timeAtRiskUnexposed) {
    test <- rateratio.test::rateratio.test(x = c(numOutcomesExposed,
                                                 numOutcomesUnexposed),
                                           n = c(timeAtRiskExposed,
                                                 timeAtRiskUnexposed))
    return(c(test$estimate[1], test$conf.int))
  }
  irrs <- mapply(computeIrr,
                 numOutcomesExposed = estimates$numOutcomesExposed,
                 numOutcomesUnexposed = estimates$numOutcomesUnexposed,
                 timeAtRiskExposed = estimates$timeAtRiskExposed,
                 timeAtRiskUnexposed = estimates$timeAtRiskUnexposed)
  estimates$irr <- irrs[1, ]
  estimates$irrLb95 <- irrs[2, ]
  estimates$irrUb95 <- irrs[3, ]
  return(estimates)
}



# test difference on some simulated data
res <- replicate(10, {
  lambdas <- rgamma(4, max(1, rnorm(1, 10, 2)), scale = runif(1, .5, 3))
  estimates <- data.frame(numOutcomesExposed   = pmax(1, rpois(1000, lambdas[1])),
                          numOutcomesUnexposed = pmax(1, rpois(1000, lambdas[2])),
                          timeAtRiskExposed    = pmax(1, rpois(1000, lambdas[3])),
                          timeAtRiskUnexposed  = pmax(1, rpois(1000, lambdas[4])))

  identical(computeIrrs(estimates), computeIrrs2(estimates))
})

all(res)
#> [1] TRUE


suppressPackageStartupMessages(library(dplyr))
# test difference on Eunomia
estimates <- runSelfControlledCohort(Eunomia::getEunomiaConnectionDetails(),
               cdmDatabaseSchema = "main",
               exposureIds = '',
               outcomeIds = '') %>%
  {.$estimates}
#> Connecting using SQLite driver
#> Retrieving counts from database
#> Executing SQL took 0.898 secs
#> Computing incidence rate ratios and exact confidence intervals
#> Performing SCC analysis took 2.2 secs

estimates <- estimates %>%
  select(numOutcomesExposed, numOutcomesUnexposed, timeAtRiskExposed, timeAtRiskUnexposed)


identical(computeIrrs(estimates), computeIrrs2(estimates))
#> [1] TRUE

Created on 2021-01-08 by the reprex package (v0.3.0)
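Dropping i makes no difference because of how R handles missing arguments. When a function passes a formal argument that was never supplied on to `[`, R treats it as an empty subscript. That selects every element, so `numOutcomesExposed[i]` simply returns the scalar that mapply passed in. The following standalone illustration uses hypothetical helper names:

```r
# `i` is declared but never supplied, so it is "missing" inside the function.
# `[` receives it as an empty subscript and returns x unchanged.
pick <- function(i, x) x[i]
pick(x = c(5, 7))

# Same pattern as computeIrr(): mapply() only passes the named arguments a and b,
# so a[i] and b[i] are just the current scalar elements.
res <- mapply(function(i, a, b) a[i] + b[i], a = 1:3, b = 4:6)
res
```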

Document usage of utilities for examining time to event (e.g. to uncover protopathic bias)

This should include a vignette that describes the following:

What the time at risk tables represent
How to utilise incidence rates over time (e.g. graphing incidence in binned time windows before and after exposure index)
How to visualise this data with ggplot2
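Such a vignette might show binned incidence in a way like the following. This minimal sketch rests on several assumptions. The input is a hypothetical vector of outcome onset days relative to the exposure index date, with negative values before exposure. All `nPersons` subjects are assumed to be observed for the whole window. The helper `binIncidence()` is illustrative and is not a package function. The ggplot2 part runs only if ggplot2 is installed.

```r
# Hypothetical helper: count outcomes in fixed-width bins around the index date
# and convert to a rate, assuming every person is observed for the full window.
# (window[2] - window[1]) should be a multiple of binWidth.
binIncidence <- function(daysFromIndex, nPersons, binWidth = 30,
                         window = c(-360, 360)) {
  breaks <- seq(window[1], window[2], by = binWidth)
  inWindow <- daysFromIndex >= window[1] & daysFromIndex < window[2]
  # right = FALSE gives half-open bins [start, start + binWidth)
  bins <- cut(daysFromIndex[inWindow], breaks = breaks, right = FALSE)
  counts <- as.vector(table(bins))  # table() keeps empty bins as 0
  data.frame(binStart = head(breaks, -1),
             binMid = head(breaks, -1) + binWidth / 2,
             outcomes = counts,
             ratePer1000PersonDays = 1000 * counts / (nPersons * binWidth))
}

set.seed(1)
days <- round(runif(500, -360, 359))  # simulated onset days, for illustration only
inc <- binIncidence(days, nPersons = 1000)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(inc, aes(x = binMid, y = ratePer1000PersonDays)) +
    geom_col() +
    geom_vline(xintercept = 0, linetype = "dashed") +  # exposure index
    labs(x = "Days relative to exposure index",
         y = "Outcomes per 1,000 person-days")
  print(p)
}
```

A spike in the bins just before day 0 is the kind of pattern that can hint at protopathic bias.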

Error message with time-at-risk boundaries incorrect

Currently runSelfControlledCohort will stop if riskWindowEndExposed < riskWindowStartExposed: https://github.com/OHDSI/SelfControlledCohort/blob/master/R/SelfControlledCohort.R#L168

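However, if addLengthOfExposureExposed = TRUE, this check should not apply. In that case the length of exposure is added to the end of the risk window, so riskWindowEndExposed can legitimately be smaller than riskWindowStartExposed.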
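A corrected check might look roughly like the following. This is a hypothetical sketch, not the package's actual code. The parameter names are taken from the issue, and the helper name `checkRiskWindowExposed()` is illustrative. The raw offsets are compared only when both are counted from the exposure start.

```r
# Hypothetical validation helper: only enforce end >= start when the end offset
# is NOT shifted by the length of exposure.
checkRiskWindowExposed <- function(riskWindowStartExposed,
                                   riskWindowEndExposed,
                                   addLengthOfExposureExposed) {
  if (!addLengthOfExposureExposed &&
      riskWindowEndExposed < riskWindowStartExposed) {
    stop("riskWindowEndExposed must be >= riskWindowStartExposed ",
         "when addLengthOfExposureExposed = FALSE")
  }
  invisible(TRUE)
}

checkRiskWindowExposed(1, 30, FALSE)      # passes
checkRiskWindowExposed(1, 0, TRUE)        # passes: end offset is added to the exposure length
try(checkRiskWindowExposed(1, 0, FALSE))  # error
```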

Fails when running with large number of exposures/outcomes

When running with many thousands of cohorts, the query using SQL IN in inst/sql/sql_server/Scc.sql fails, because the list of IDs inlined into the IN clause becomes too long for the database. To resolve this, I have created an associated PR that places the IDs in temporary tables.
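The temp-table approach can be sketched in base R by generating the SQL that loads the IDs in batches. The main query then joins against that table instead of inlining an IN list. This is a hypothetical illustration and not the code in the PR. The table name `#exposure_ids`, the column name, and the helper `buildIdTableSql()` are assumptions. A leading `#` marks a temp table in OHDSI's SQL Server-based dialect, which SqlRender translates for other platforms. In practice, DatabaseConnector's `insertTable()` with `tempTable = TRUE` can handle the upload instead.

```r
# Hypothetical sketch: build SQL that creates a temp table of IDs and fills it
# in batches (SQL Server allows at most 1000 rows per INSERT ... VALUES).
# as.integer() assumes the IDs fit in a 32-bit integer.
buildIdTableSql <- function(ids, tableName = "#exposure_ids", batchSize = 1000) {
  ids <- unique(as.integer(ids))
  batches <- split(ids, ceiling(seq_along(ids) / batchSize))
  inserts <- vapply(batches, function(batch) {
    sprintf("INSERT INTO %s (cohort_definition_id) VALUES %s;",
            tableName, paste0("(", batch, ")", collapse = ", "))
  }, character(1))
  c(sprintf("CREATE TABLE %s (cohort_definition_id BIGINT);", tableName),
    unname(inserts))
}

# The main query can then join instead of using IN, e.g.:
#   INNER JOIN #exposure_ids ids ON c.cohort_definition_id = ids.cohort_definition_id
sql <- buildIdTableSql(1:2500)
length(sql)  # 1 CREATE + 3 INSERT batches
```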

Add vignette that describes the method in detail with examples

I'm thinking of writing a vignette that describes this method in detail with examples for anyone not familiar with it. @schuemie, what do you think?

Add a NEWS.md file

Similar to other HADES repos (e.g. https://github.com/OHDSI/SqlRender/blob/master/NEWS.md). This will automatically be included in the package website when running pkgdown.

Addition of time at risk distributions for exposure/outcome cohorts

For some time I have needed to compute statistics on the time at risk of subjects in the exposure and outcome cohorts used in the SCC analysis. Essentially, these are the distributions of exposure durations. For example, they allow the length of exposure to be compared between drug users who have the outcome of interest and drug users who do not.

The code also computes the distribution of the absolute difference between exposure start and outcome start. I'm not sure whether this statistic is as useful or informative, but I was asked to compute and include it.
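The two statistics described above can be sketched in base R as follows. This is a hypothetical illustration, not the code in the branch or PR. It assumes a data frame with one row per exposure era and columns `exposureStart`, `exposureEnd`, and `outcomeStart` (`NA` when there is no outcome). The helper name `summarizeTimeAtRisk()` is illustrative.

```r
# Hypothetical sketch: quantiles of exposure duration by outcome status, plus
# quantiles of |outcome start - exposure start| among eras with an outcome.
summarizeTimeAtRisk <- function(eras) {
  eras$exposureDays <- as.numeric(eras$exposureEnd - eras$exposureStart) + 1
  eras$hasOutcome <- !is.na(eras$outcomeStart)
  probs <- c(0, 0.25, 0.5, 0.75, 1)
  # One row per outcome status ("FALSE" / "TRUE"), one column per quantile
  durationByGroup <- do.call(rbind, lapply(split(eras$exposureDays, eras$hasOutcome),
                                           quantile, probs = probs))
  withOutcome <- eras[eras$hasOutcome, ]
  absDiff <- abs(as.numeric(withOutcome$outcomeStart - withOutcome$exposureStart))
  list(exposureDuration = durationByGroup,
       absStartDifference = quantile(absDiff, probs = probs))
}

eras <- data.frame(
  exposureStart = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01")),
  exposureEnd   = as.Date(c("2020-01-10", "2020-02-20", "2020-03-05", "2020-04-30")),
  outcomeStart  = as.Date(c(NA, "2020-02-15", NA, "2020-04-03")))
summarizeTimeAtRisk(eras)
```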

This code lives elsewhere in my own project but I have made a branch and PR that may be of use here. The primary benefit for me of having it live in this package is that I won't have to compute the risk windows twice.

@schuemie Let me know if this is useful and if the PR should live here, or there are any additions or suggestions that can go along with it.