philipdelff / NMdata
Prepare and document data for Nonmem - and automatically retrieve results
Home Page: https://philipdelff.github.io/NMdata/
License: Other
Hi @philipdelff,
When I run R CMD check on NMdata with the new version of data.table from GitHub master, I get the following new error, which is not present when using the data.table release version from CRAN:
* checking tests ...
Running 'spelling.R'
Running 'testthat.R'
ERROR
Running the tests in 'tests/testthat.R' failed.
Last 13 lines of output:
2. \-testthat::expect_known_value(..., update = update)
-- Failure ('test_NMscanInput.R:152'): CYCLE=DROP ------------------------------
`nm1` has changed from known value recorded in 'testReference/NMscanInput_7.rds'.
Component "input.colnames": Attributes: < Names: 1 string mismatch >
Component "input.colnames": Attributes: < Length mismatch: comparison on first 2 components >
Component "input.colnames": Attributes: < Component 2: Attributes: < target is NULL, current is list > >
Component "input.colnames": Attributes: < Component 2: Numeric: lengths (24, 0) differ >
Backtrace:
x
1. \-testthat::expect_equal_to_reference(nm1, fileRef, version = 2) at test_NMscanInput.R:152:4
2. \-testthat::expect_known_value(..., update = update)
[ FAIL 6 | WARN 0 | SKIP 0 | PASS 210 ]
Error: Test failures
Execution halted
Looking at your tests, it seems that you expect some computed value from your code to equal the value stored in an RDS file. The input.colnames table has a new attribute named "index", which is causing the error, since the stored RDS value has no such attribute.
Can you please update your tests and/or code so that this ERROR goes away, and then submit a new version to CRAN?
This will help facilitate the release of a new version of data.table to CRAN. (The data.table developers must make sure no reverse dependencies break before submitting a new version to CRAN.)
This package no longer passes its checks with the current tibble release candidate, perhaps triggered by tidyverse/tibble#1574 (now keeping attributes named "x" and "n" after new_tibble() and as_tibble()). Please see https://github.com/tidyverse/tibble/blob/main/revdep/problems.md#nmdata for details.
Should we be doing things differently? Can you please take a look and, if appropriate, submit an update to CRAN? Thanks!
This is somewhat related to #30 (in that it's about column naming in NMcheckData()).
I like how NMcheckData() does not require the default column names, as stated in #30; I often use names that are closer to the SDTM and ADaM source data names for easier tracking back to the origin.
With that, it appears that NMcheckData() supports column renaming for many, but not all, of the columns that are checked. The columns that do not appear to support other names, as far as I've found, are col.dv, col.mdv, and col.amt. col.evid also doesn't appear to exist, but that doesn't seem like a name I'd use something different for. (So, col.evid may be of interest for completeness, but it's not a real need for me.)
A simple workaround that I'm doing right now is renaming the columns as they go into NMcheckData(), which is not a significant hardship.
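The workaround can be sketched in base R. The column names CONC and MDVFL are illustrative (not from NMdata); dat.check is the renamed copy that would then be passed to NMcheckData():

```r
## Illustrative sketch of the renaming workaround. CONC and MDVFL are
## made-up source column names; dat.check is the copy handed to the check.
dat <- data.frame(USUBJIDN = 1, CONC = 0.5, MDVFL = 0)

dat.check <- dat
names(dat.check)[names(dat.check) == "CONC"]  <- "DV"
names(dat.check)[names(dat.check) == "MDVFL"] <- "MDV"

names(dat.check)  # USUBJIDN, DV, MDV
```

dat.check would then go into NMcheckData() while the original dat keeps the source-like names.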
Hello,
My name is Eric Anderson and I work at Metrum Research Group as a Data Scientist. For starters, I just want to mention that I find this package very interesting and useful.
The function I have been exploring the most is NMdata::NMcheckData().
I have a potential feature request regarding the duplicate checking. It seems like right now the function checks across these columns: ID, CMT, EVID, TIME.
Sometimes I work with data sets that have additional columns that define unique rows (e.g. DVID, DRUGID, etc.). Have you considered adding an argument to the function that allows for this?
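To illustrate what such an argument could do, here is a rough base-R sketch. This is not NMdata's actual implementation, and by.extra is a hypothetical argument name:

```r
## Rough sketch: find rows that are duplicated on the default key
## columns plus any user-supplied extra key columns (by.extra is a
## hypothetical argument name, e.g. "DVID").
find_dups <- function(data, by.extra = NULL){
  by.cols <- c("ID", "CMT", "EVID", "TIME", by.extra)
  key <- data[, by.cols, drop = FALSE]
  data[duplicated(key) | duplicated(key, fromLast = TRUE), , drop = FALSE]
}

dat <- data.frame(ID = 1, CMT = 2, EVID = 0, TIME = c(0, 0), DVID = c(1, 2))
find_dups(dat)                     # both rows flagged without DVID in the key
find_dups(dat, by.extra = "DVID")  # no duplicates once DVID is in the key
```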
I'm trying to make a reprex for this, but the quick issue is: when I use the following call to load several data files, I get this error:
NMdata::NMscanData("NONMEM/PK_akrv2/nash18.lst", use.input=FALSE)
# Warning in NMreadTab(meta[I, file], quiet = TRUE, tab.count = tab.count, :
# Duplicated column names found: DV. Cleaning.
# Warning in NMreadTab(meta[I, file], quiet = TRUE, tab.count = tab.count, :
# Duplicated column names found: ET_QCENTP. Cleaning.
# Error in file.exists(file.mod) : invalid 'file' argument
When I traced the error a bit, it appears that file.mod is NULL. There are several places within NMscanData() where file.exists(file.mod) is called, and I don't immediately see which is causing the problem.
Most of the NONMEM data that I work with codes missing data as a period ("."). When I make an error with how I'm loading the data in R (mainly, when I'm not thinking about the loading carefully and just use read.csv() without modification), I will have those periods in the data.
I think it would be a useful feature to have NMcheckData() check whether cells in the data contain periods and suggest that those should perhaps be converted to NA with code like type_convert(data, na=c(".", "NA")).
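A base-R sketch of what such a conversion could look like (readr::type_convert() as mentioned above does this more generally; the data here is made up):

```r
## Columns read with read.csv() where NONMEM-style missing values are
## coded as "." come in as character. Replace "." with NA, then convert
## columns that are otherwise numeric.
dat <- data.frame(ID = c("1", "1", "2"),
                  DV = c("0.5", ".", "1.2"),
                  stringsAsFactors = FALSE)

dat[] <- lapply(dat, function(x){
  if(is.character(x)) x[x == "."] <- NA
  ## convert to numeric only if nothing non-numeric remains
  if(is.character(x) && !anyNA(suppressWarnings(as.numeric(x[!is.na(x)]))))
    x <- as.numeric(x)
  x
})

dat$DV  # 0.5, NA, 1.2
```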
As you can tell, I'm using NMdata on some real projects now. So, I'm having lots of good thoughts about it! :)
I often use nonstandard names for NONMEM column names because I prefer keeping them closer to the SDTM- and ADaM-like names in source data. With that, I found that the col.id argument does not appear to be used by NMcheckData(), based on the fact that it says the ID column is not found (when I think it should not be expected):
library(NMdata)
#> Warning: package 'NMdata' was built under R version 4.1.3
#> Welcome to NMdata. Best place to browse NMdata documentation is
#> https://philipdelff.github.io/NMdata/
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.1.3
#> Warning: package 'tibble' was built under R version 4.1.3
#> Warning: package 'dplyr' was built under R version 4.1.3
dat <- readRDS(system.file("examples/data/xgxr2.rds", package="NMdata"))
dat2 <- dat %>%
  rename(USUBJIDN = ID)
NMcheckData(dat2, col.id="USUBJIDN")
#> column check N Nid
#> EVID Subject has no obs 19 0
#> ID Column not found 1 0
#> MDV Column not found 1 0
Created on 2022-05-17 by the reprex package (v2.0.1)
I just got the error "After applying filters to input data, the resulting number of rows differ from the number of rows in output data".
It would help track down the source of the error if the number of rows in the input and output were reported. (I realize that the NMdata-preferred solution is to use a ROWID column. I'm trying to work within someone else's data management for the moment.)
My preferred error would look something like the following:
After applying filters to input data, the resulting number of rows differ (input = 123 rows) from the number of rows in output data (output = 456 rows)
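A sketch of how such an error could be raised, assuming the two row counts are available at the point the check fails (the function and variable names here are made up, not NMdata's):

```r
## Hypothetical sketch; nrow.input and nrow.output stand in for the row
## counts NMscanData() has at hand when the consistency check fails.
check_rows <- function(nrow.input, nrow.output){
  if(nrow.input != nrow.output){
    stop(sprintf(paste(
      "After applying filters to input data, the resulting number of",
      "rows (input = %d rows) differs from the number of rows in",
      "output data (output = %d rows)"),
      nrow.input, nrow.output))
  }
  invisible(TRUE)
}

## check_rows(123, 456) would fail with both row counts in the message
```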
After running NMcheckData, it would be convenient to have a way to extract subsets of data for plotting or "data scrolling". I am starting a discussion on what such a function could look like. For a "row-level" finding, one may want to extract all data related to the subjects affected, and plot involved columns.
Inputs appreciated!
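As a starting point for the discussion, one shape such a helper could take (all names here are hypothetical, and the findings table is assumed to carry the affected IDs):

```r
## Hypothetical helper: given the full data set and a findings table
## that carries an ID column, return all rows for affected subjects.
extract_affected <- function(data, findings){
  subset(data, ID %in% unique(findings$ID))
}

dat <- data.frame(ID = c(1, 1, 2, 3), TIME = c(0, 1, 0, 0), DV = c(NA, 2, 3, 4))
findings <- data.frame(ID = 1, column = "DV", check = "NA in DV")
extract_affected(dat, findings)  # the two rows for subject 1
```

The returned subset could then feed directly into plotting of the involved columns.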