nhanes's People

Contributors

ainilaha, cjendres1, deepayan, muschellij2, rgentlem

nhanes's Issues

Update minimum R version requirement?

nhanesA currently depends on R >= 3.0.0, but @rgentlem's code already uses the base pipe operator |>, which does not work in R < 4.1.0. We could of course change these to %>%, but I prefer the native pipe, which is more efficient and handles errors better.

Is it OK to increase the dependency to R >= 4.1.0 (which is now almost 2.5 years old)?

add withdrawal argument to nhanesSearchTableNames

nhanesA::nhanesSearchTableNames("PAHS", details = TRUE) returns,

      Years                                                  Data.File.Name   Doc.File
1 2011-2012        Polyaromatic Hydrocarbons (PAHs)- Urine - Special Sample PAHS_G Doc
2 2013-2014 Polycyclic Aromatic Hydrocarbons (PAH) - Urine - Special Sample PAHS_H Doc
3 2015-2016 Polycyclic Aromatic Hydrocarbons (PAH) - Urine - Special Sample PAHS_I Doc
                     Data.File Date.Published
1   PAHS_G Data [XPT - 426 KB]  February 2014
2 PAHS_H Data [XPT - 354.6 KB]      **Withdrawn**
3 PAHS_I Data [XPT - 310.7 KB]      July 2020

Withdrawn datasets are not really useful and cannot be downloaded with the nhanes() function, so it would be very nice to have an option to automatically remove them from the search results.
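
A minimal sketch of what such an option could do, assuming the search result is a data frame with a Date.Published column as in the output above. The helper name drop_withdrawn is hypothetical and not part of nhanesA; the toy data frame only mimics the relevant columns.

```r
# Toy copy of the search result shown above (only the relevant columns).
res <- data.frame(
  Years = c("2011-2012", "2013-2014", "2015-2016"),
  Doc.File = c("PAHS_G Doc", "PAHS_H Doc", "PAHS_I Doc"),
  Date.Published = c("February 2014", "Withdrawn", "July 2020"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop rows whose publication date is marked as withdrawn.
drop_withdrawn <- function(df) {
  keep <- !grepl("withdrawn", df$Date.Published, ignore.case = TRUE)
  df[keep, , drop = FALSE]
}

kept <- drop_withdrawn(res)
print(kept$Doc.File)
```

In the package itself this would presumably sit behind an argument of nhanesSearchTableNames(), so that the current behaviour stays available.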

Option to work from local mirror

I think it would be useful to have the option to work from locally downloaded copies of the NHANES doc / data files, both for debugging as well as for those who cannot easily install the database. I have an initial implementation here:

https://github.com/deepayan/nhanes/blob/mirror/R/nhanes_mirror.R

If you think that this is worth including in the nhanesA package, I will create a PR with the additional changes required to have other functions like nhanes() use the local copies if nhanesOptions("local.mirror") is set.

To summarize the effect of this: I ran

$ time Rscript -e "library(nhanesA); nhanesMirror('nhanes-mirror', quiet = TRUE)"
[logs skipped]
real	77m55.093s
user	1m58.471s
sys	1m15.858s

This failed to download 23 files (the larger ones) because they hit the download timeout of 60 seconds. After setting

options(timeout = 500)

and running again, the remaining count went down to 2:

$ grep FALSE nhanes-mirror/manifest.csv
"1159",1159,1159,1159,"PAXMIN_G","PAXMIN_G","/nchs/nhanes/2011-2012/paxmin_g.htm","/nchs/nhanes/2011-2012/paxmin_g.xpt","2011-2012","Updated October 2022",FALSE,TRUE
"1160",1160,1160,1160,"PAXMIN_H","PAXMIN_H","/nchs/nhanes/2013-2014/paxmin_h.htm","/nchs/nhanes/2013-2014/paxmin_h.xpt","2013-2014","Updated October 2022",FALSE,TRUE

These are largish physical activity record files (7.7GB and 8.9GB) that time out with nhanes() as well. Maybe they should also be excluded from the download list.

Excluding these, the size of the local copy is:

$ du -sh nhanes-mirror/
6.1G	nhanes-mirror/

which is not too bad to have locally.

Range of Values not translated

Variables whose codebook lists a range of values alongside a category (such as DMDHHSIZ in DEMO_G and DEMO_H) are not translated. For other variables, such as RIDAGEMN, the range should not be translated to a factor.

The issue arises from how https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DEMO_G.htm#DMDHHSIZ represents the data (1 to 6 as a range, then 7 as a category)

versus how DEMO_H represents it https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DEMO_H.htm#DMDHHSIZ (1, 2, 3, 4, 5, 6, then 7 is category)

When trying to compare or combine waves (which has a number of issues), this breaks down. Below is an example of how it currently runs. Ideally, the 1 to 6 would be translated (sending a PR):

library(nhanesA)
translations = nhanesTranslate(nh_table = "DEMO_G", colnames = c("DMDHHSIZ", "RIDAGEMN"))
translations$DMDHHSIZ
#>   Code.or.Value                 Value.Description
#> 1        1 to 6                   Range of Values
#> 2             7 7 or more people in the Household
#> 3             .                           Missing
translations$RIDAGEMN
#>   Code.or.Value Value.Description
#> 1       0 to 24   Range of Values
#> 2             .           Missing

translations = nhanesTranslate(nh_table = "DEMO_H", colnames = c("DMDHHSIZ", "RIDAGEMN"))
translations$DMDHHSIZ
#>   Code.or.Value                 Value.Description
#> 1             1                                 1
#> 2             2                                 2
#> 3             3                                 3
#> 4             4                                 4
#> 5             5                                 5
#> 6             6                                 6
#> 7             7 7 or more people in the Household
#> 8             .                           Missing
translations$RIDAGEMN
#>   Code.or.Value Value.Description
#> 1       0 to 24   Range of Values
#> 2             .           Missing

Created on 2023-08-03 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Big Sur ... 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-08-03
#>  pandoc   3.1.5 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.2.0)
#>  curl          5.0.1   2023-06-07 [1] CRAN (R 4.2.0)
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 4.2.0)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.2.0)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.2.0)
#>  foreign       0.8-84  2022-12-06 [1] CRAN (R 4.2.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.2.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.2.0)
#>  httr          1.4.6   2023-05-08 [1] CRAN (R 4.2.0)
#>  knitr         1.43    2023-05-25 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  nhanesA     * 0.7.4   2023-08-03 [1] Github (cjendres1/nhanes@fa90130)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.2.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  plyr          1.8.8   2022-11-11 [1] CRAN (R 4.2.0)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.2.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
#>  Rcpp          1.0.11  2023-07-06 [1] CRAN (R 4.2.0)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.0)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.23    2023-07-01 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.2.0)
#>  rvest         1.0.3   2022-08-19 [1] CRAN (R 4.2.0)
#>  selectr       0.4-2   2019-11-20 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi       1.7.12  2023-01-11 [1] CRAN (R 4.2.0)
#>  stringr       1.5.0   2022-12-02 [1] CRAN (R 4.2.0)
#>  styler        1.10.1  2023-06-05 [1] CRAN (R 4.2.0)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.2.0)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.2.0)
#>  vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.2.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.2.0)
#>  xml2          1.3.5   2023-07-06 [1] CRAN (R 4.2.0)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
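
For the DMDHHSIZ case above, one way to make DEMO_G line up with DEMO_H is to expand the "1 to 6" row into one row per integer code. The sketch below is only an illustration under that assumption; expand_range_rows is a hypothetical helper, not the code in the PR.

```r
# Codebook rows for DMDHHSIZ in DEMO_G, copied from the reprex above.
cb <- data.frame(
  Code.or.Value = c("1 to 6", "7", "."),
  Value.Description = c("Range of Values",
                        "7 or more people in the Household",
                        "Missing"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace each "a to b" row by one row per integer code.
expand_range_rows <- function(cb) {
  pieces <- lapply(seq_len(nrow(cb)), function(i) {
    code <- cb$Code.or.Value[i]
    if (grepl("^[0-9]+ to [0-9]+$", code)) {
      bounds <- as.integer(strsplit(code, " to ", fixed = TRUE)[[1]])
      vals <- as.character(seq(bounds[1], bounds[2]))
      data.frame(Code.or.Value = vals, Value.Description = vals,
                 stringsAsFactors = FALSE)
    } else {
      cb[i, , drop = FALSE]
    }
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

expanded <- expand_range_rows(cb)
print(expanded)
```

Applied blindly, this would also expand the "0 to 24" row of RIDAGEMN, which should stay numeric. A real implementation would need a rule for when to expand, for example only when the codebook also contains a non-missing category row.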

Error obtaining details of specific variables

Thanks a lot for integrating the new code in release 0.7.1!

While obtaining details of all variables at least one of them threw an error:
nhanesCodebook(nh_table = 'DRXIFF', colname = 'SEQN')

results in:

Error in names(tabletrans) <- colname : 
  'names' attribute [1] must be the same length as the vector [0]

As a sidenote, I had to load the libraries stringr and rvest to use the altered nhanesCodebook function. Without those I got a couple of errors: missing str_c and html_elements functions.

Functionality to obtain `SAS Label` of each variable

Would it be possible to obtain the SAS Label of each variable, to facilitate analysis/post-processing of study variables?

For example, to have all the following details of each variable:

Variable Name: DRDINT
SAS Label: Number of days of intake
English Text: Indicates whether the sample person has intake data for one or two days.
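
As an illustration of how such labels could travel with the data, the sketch below stores a label in each column's "label" attribute using base R only. The helper attach_labels and the toy data are hypothetical; how nhanesA would obtain the labels is not shown here.

```r
# Toy data and labels; the label text is taken from the example above.
dat <- data.frame(SEQN = c(1, 2), DRDINT = c(2, 1))
sas_labels <- c(DRDINT = "Number of days of intake")

# Hypothetical helper: store each label in the column's "label" attribute.
attach_labels <- function(df, labels) {
  for (nm in intersect(names(labels), names(df))) {
    attr(df[[nm]], "label") <- labels[[nm]]
  }
  df
}

dat <- attach_labels(dat, sas_labels)
print(attr(dat$DRDINT, "label"))
```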

Subset with grep

The following statement in nhanesSearchTableNames caused an error:
df <- subset(df, grep(paste(pattern,collapse="|"), Doc.File))

I resolved the error with the following:
df <- subset(df, Doc.File %in% grep(paste(pattern,collapse="|"), Doc.File, value=TRUE))

@deepayan please review my edit (which I already checked in as 38fa70e) and let me know if you agree with the change.
The combination of subset with grep also appears elsewhere. I would appreciate it if you could take a look and make sure those statements work as expected. Thanks.
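
A small self-contained check of why the original statement fails, using a toy data frame. subset() expects a logical condition, while grep() returns integer positions; grepl() is a more direct alternative to the %in% workaround and returns one TRUE/FALSE per row.

```r
df <- data.frame(Doc.File = c("PAHS_G Doc", "DEMO_G Doc", "PAHS_I Doc"),
                 stringsAsFactors = FALSE)
pattern <- c("PAHS", "XYZ")
rx <- paste(pattern, collapse = "|")

# grep() returns integer positions, which subset() rejects as a condition.
failed <- inherits(try(subset(df, grep(rx, Doc.File)), silent = TRUE), "try-error")

# grepl() returns one TRUE/FALSE per row, which is what subset() expects.
hits <- subset(df, grepl(rx, Doc.File))
print(hits$Doc.File)
```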

Hoping analysis functions can be added

Because of the weighted design of NHANES, many statistical analyses require a weight variable. The detailed algorithm really confuses me. I hope one day there can be an easy-to-use API. Much appreciated :)
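
To show why the weights matter, here is a toy comparison of an unweighted and a weighted mean in base R. The values are made up, and WTMEC2YR is used only as an example of a weight column. Correct standard errors additionally require the survey design (strata and sampling units), for which a dedicated package such as survey is the usual route; that part is not shown.

```r
# Made-up values for illustration only; WTMEC2YR serves as the example weight column.
toy <- data.frame(
  BMXBMI   = c(22, 30, 26),
  WTMEC2YR = c(1000, 3000, 1000)
)

unweighted <- mean(toy$BMXBMI)
weighted   <- weighted.mean(toy$BMXBMI, w = toy$WTMEC2YR)
print(c(unweighted = unweighted, weighted = weighted))
```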

Extended Codebook Data

Hello,

This tool is very helpful, thank you for making it. I am currently working on a project with NHANES that also assesses the Codebook/variable data, so your nhanesTableVars functionality is life-saving.

On that front, is there any way to extend that functionality to include the expected values and their keys? For example, variable AUQ020D has the label "Earache Last 24 Hours, Left?", the description "Have you had an earache in left ear in the last 24 hours?" and the set of expected values Yes, No, Don't know, or Missing. The raw data does not exist in any form I have been able to find or download, and it would be invaluable.

Should we use factors?

The nhanes(translated = TRUE) function currently converts categorical variables into factors. Is this a good idea? Often we would want to combine data across cycles, and if the factor levels do not match across cycles, the result may not be what we expect.

I haven't checked to see if this problem is widespread, but here is one example (look at ALQ110):

> str(nhanes("ALQ_H"))
'data.frame':	5924 obs. of  10 variables:
 $ SEQN   : num  73557 73558 73559 73561 73562 ...
 $ ALQ101 : Factor w/ 3 levels "Yes","No","Don't know": 1 1 1 1 1 2 1 1 1 2 ...
 $ ALQ110 : Factor w/ 3 levels "Yes","No","Don't know": NA NA NA NA NA 1 NA NA NA 1 ...
 $ ALQ120Q: num  1 7 0 0 5 2 1 4 2 2 ...
 $ ALQ120U: Factor w/ 3 levels "Week","Month",..: 3 1 NA NA 3 3 1 1 1 3 ...
 $ ALQ130 : num  1 4 NA NA 1 1 1 3 2 1 ...
 $ ALQ141Q: num  0 2 NA NA 0 0 0 0 1 0 ...
 $ ALQ141U: Factor w/ 3 levels "Week","Month",..: NA 1 NA NA NA NA NA NA 3 NA ...
 $ ALQ151 : Factor w/ 3 levels "Yes","No","Don't know": 1 1 2 2 2 2 2 2 2 2 ...
 $ ALQ160 : num  NA 0 NA NA 0 NA NA NA 0 NA ...
> str(nhanes("ALQ_I"))
'data.frame':	5735 obs. of  10 variables:
 $ SEQN   : num  83732 83733 83734 83735 83736 ...
 $ ALQ101 : Factor w/ 3 levels "Yes","No","Don't know": 1 1 1 2 2 2 1 1 NA 1 ...
 $ ALQ110 : Factor w/ 4 levels "Yes","No","Refused",..: NA NA NA 1 1 2 NA NA NA NA ...
 $ ALQ120Q: num  1 7 0 3 1 NA 3 1 NA 0 ...
 $ ALQ120U: Factor w/ 3 levels "Week","Month",..: 2 1 NA 3 3 NA 2 2 NA NA ...
 $ ALQ130 : num  1 6 NA 1 1 NA 8 1 NA NA ...
 $ ALQ141Q: num  0 7 NA 0 0 NA 20 0 NA NA ...
 $ ALQ141U: Factor w/ 3 levels "Week","Month",..: NA 1 NA NA NA NA 3 NA NA NA ...
 $ ALQ151 : Factor w/ 4 levels "Yes","No","Refused",..: 2 1 1 2 2 NA 1 2 NA 1 ...
 $ ALQ160 : num  NA 0 NA NA NA NA 2 NA NA NA ...

The general trend in recent years has been to move towards stringsAsFactors = FALSE by default. Should we also keep these as character strings instead of converting to factors?
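
If factors are kept, one defensive way to combine cycles is to go through character vectors and rebuild a single factor on the union of levels. The sketch uses toy levels loosely modelled on the ALQ110 example; they are not the actual codebook levels.

```r
# Toy factors with different level sets, standing in for two cycles.
h <- factor(c("Yes", "No"), levels = c("Yes", "No", "Don't know"))
i <- factor(c("Refused", "Yes"), levels = c("Yes", "No", "Refused"))

# Combine as character first, then rebuild one factor on the union of levels.
all_levels <- union(levels(h), levels(i))
combined <- factor(c(as.character(h), as.character(i)), levels = all_levels)
print(levels(combined))
```

Keeping the columns as character strings, as suggested above, would make the first step unnecessary.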

NOTE reported during cran check

Hi Deepayan,
I received this message when I ran a cran check after your latest merge:
❯ checking R code for possible problems ... NOTE
nhanesManifest_public: no visible binding for global variable 'DataURL'
Undefined global functions or variables:
DataURL

I believe it's complaining about this line in nhanes_tables.R:
df <- subset(df, startsWith(DataURL, "/") & endsWith(toupper(DataURL), ".XPT"))

It appears that the code is working as expected, i.e. the function recognizes DataURL as a column. Just need to make sure the Note doesn't appear during cran checks.
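
One common way to avoid this NOTE is to replace the non-standard evaluation in subset() with explicit df$ references. The sketch below uses a toy data frame; the second URL is made up. Another option is to declare the name with utils::globalVariables("DataURL") at the top level of the package.

```r
df <- data.frame(
  DataURL = c("/nchs/nhanes/2011-2012/paxmin_g.xpt", "https://example.org/other.zip"),
  stringsAsFactors = FALSE
)

# Same filter as the subset() call, but with explicit df$ references so that
# R CMD check sees no undefined global variable.
keep <- startsWith(df$DataURL, "/") & endsWith(toupper(df$DataURL), ".XPT")

# which() drops NA conditions, matching what subset() does with them.
df_xpt <- df[which(keep), , drop = FALSE]
print(df_xpt$DataURL)
```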

Detection of numeric columns in `nhanesTranslate()`

While looking at nhanesTranslate(), I came across this example which puzzled me for a while:

> d <- nhanesTranslate("LAB04", data = nhanes("LAB04", translated = FALSE))
Warning message:
In nhanesTranslate("LAB04", data = nhanes("LAB04", translated = FALSE)) :
  No columns were translated

This is surprising because the table has categorical variables that should be translated, e.g., https://wwwn.cdc.gov/Nchs/Nhanes/1999-2000/LAB04.htm#LBDWBFLC.

The reason is that the current code mistakes these for numeric variables because one of the value descriptions is "Detectable Result and Exceeds the Calibrated Range of Assay", which includes the word "Range", and hence is flagged here:

https://github.com/cjendres1/nhanes/blob/master/R/nhanes_translate.R#L151

Should we change

nskip <- grep('Range', translations)

to

nskip <- grep('Range of values', translations)

?
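
A quick check of the two patterns on toy value descriptions. Note that grep() is case sensitive by default, and the codebook output shown elsewhere on this page spells the phrase "Range of Values" with a capital V, so the stricter pattern probably needs that spelling (or ignore.case = TRUE).

```r
translations <- c("Range of Values",
                  "Detectable Result and Exceeds the Calibrated Range of Assay",
                  "Missing")

# The current pattern also flags the categorical description.
loose <- grep("Range", translations)

# A stricter pattern with the codebook's capitalization matches only the range row.
strict <- grep("Range of Values", translations, fixed = TRUE)

# With a lower-case "v" nothing matches, because grep() is case sensitive.
lower <- grep("Range of values", translations)

print(list(loose = loose, strict = strict, lower = lower))
```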

Adding SAS labels when importing

Hi Chris-

Is there a way to add SAS labels to variables (in the data frame) when importing an NHANES table using the function nhanes?

Thanks!
Huda

please bump version number

Hi,
Every commit should involve a change to the version number - otherwise we really get in trouble with bug reports...
thx

Error with nhanesSearchVarName()

nhanesSearchVarName('BMXLEG')
Error in matrix(unlist(ttlist), nrow = length(ttlist), byrow = TRUE) :
'data' must be of a vector type, was 'NULL'

This is with the latest version of R and nhanesA (0.6.4.4).
R version 3.5.1 (2018-07-02) -- "Feather Spray"
Platform: x86_64-w64-mingw32/x64 (64-bit)

Did NHANES change the location of the tables again?

Name suggestion for alternative translation interface

In the "translation" branch, I have a slightly different implementation of the translation workflow. This differs basically in how numeric variables are handled; instead of retaining as-is, an attempt is made to convert some special values into NAs. The translation of categorical variables is also simplified.

Some context for this is available here.

The current interface to this is through the function nhanesTranslateRaw(nh_table). I am not sure if this is a good name, but I can't think of a better one.

Any suggestions?

Inconsistent capitalization in factor levels (from codebook)

I am not sure how best to deal with this, but thought I should mention it for the record. The codebooks for the same variables in different cycles sometimes change the capitalization of the same values. This leads to apparently different levels which are actually the same. One example:

> nhanesCodebook("DEMO_J")$DMDEDUC3$DMDEDUC3
# A tibble: 21 × 5
   `Code or Value` `Value Description`           Count Cumulative `Skip to Item`
   <chr>           <chr>                         <int>      <int> <lgl>         
 1 0               Never attended / kindergarte…   184        184 NA            
 2 1               1st grade                       176        360 NA            
 3 2               2nd grade                       202        562 NA            
 4 3               3rd grade                       202        764 NA            
 5 4               4th grade                       179        943 NA            
 6 5               5th grade                       199       1142 NA            
 7 6               6th grade                       154       1296 NA            
 8 7               7th grade                       151       1447 NA            
 9 8               8th grade                       154       1601 NA            
10 9               9th grade                       154       1755 NA            
# ℹ 11 more rows
# ℹ Use `print(n = ...)` to see more rows
> nhanesCodebook("DEMO_C")$DMDEDUC3$DMDEDUC3
# A tibble: 21 × 5
   `Code or Value` `Value Description`           Count Cumulative `Skip to Item`
   <chr>           <chr>                         <int>      <int> <lgl>         
 1 0               Never Attended / Kindergarte…   205        205 NA            
 2 1               1st Grade                       177        382 NA            
 3 2               2nd Grade                       193        575 NA            
 4 3               3rd Grade                       158        733 NA            
 5 4               4th Grade                       183        916 NA            
 6 5               5th Grade                       210       1126 NA            
 7 6               6th Grade                       286       1412 NA            
 8 7               7th Grade                       277       1689 NA            
 9 8               8th Grade                       312       2001 NA            
10 9               9th Grade                       302       2303 NA            
# ℹ 11 more rows
# ℹ Use `print(n = ...)` to see more rows

This is tricky because we don't know what the duplicates are without downloading everything...

One option is to convert based on a known dictionary (which is pre-computed using the database and then stored in the package).
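
A minimal sketch of the dictionary idea, assuming a lookup keyed on the lower-cased description. The dictionary contents and the helper normalize_levels are hypothetical; in the package the dictionary would be pre-computed rather than typed by hand.

```r
# Hypothetical dictionary mapping lower-cased descriptions to one canonical form.
canonical <- c("1st grade" = "1st grade", "2nd grade" = "2nd grade")

normalize_levels <- function(x, dict) {
  key <- tolower(trimws(x))
  out <- unname(dict[key])
  # Fall back to the original text when the dictionary has no entry.
  ifelse(is.na(out), x, out)
}

print(normalize_levels(c("1st Grade", "2nd grade", "Missing"), canonical))
```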

"Tables" functions returning errors

I've just installed nhanesA, and encountered errors trying to follow the vignette: https://cran.r-project.org/web/packages/nhanesA/vignettes/Introducing_nhanesA.html

In particular:

> nhanesTables('EXAM', 2005)
Error in `[.data.frame`(df, , c("Data.File.Name", "Data.File.Description")) : 
  undefined columns selected
> nhanesTableVars('EXAM', 'BMX_D')
Error in nhanesTableVars("EXAM", "BMX_D") : 
  Table BMX_D not present in the EXAM survey

However, bmx_d <- nhanes('BMX_D') appears to be working as expected.

Unable to access survey tables from 2019 and 2021

When running the following line for survey years 2019 and 2021:
tables_2019 <- nhanesTables(data_group=nhanes_data_group, year=2019)

I obtain the following error:

Error in `[.data.frame`(df, , c("Data.File.Name", "Data.File.Description")) : undefined columns selected

5. stop("undefined columns selected")
4. `[.data.frame`(df, , c("Data.File.Name", "Data.File.Description"))
3. df[, c("Data.File.Name", "Data.File.Description")]
2. unique(df[, c("Data.File.Name", "Data.File.Description")])
1. nhanesTables(data_group = nhanes_data_group, year = 2019)

a minor bug in nhanesTranslate()

nhanesTranslate() raises errors when we translate the data with nchar = 512 or larger.
For example:

> demo = nhanes("DEMO_C")
> demo_trans = nhanesTranslate("DEMO_C",colnames = colnames(demo)[2:nrow(demo)],data = demo,nchar = 1024)

Error message:

Timeout was reached: No data pulled

Error in UseMethod("xml_find_all") : 
  no applicable method for 'xml_find_all' applied to an object of class "NULL"
In addition: There were 38 warnings (use warnings() to see them)

It often works if data= is not used.
For example:

nhanesTranslate("DEMO_C",colnames = colnames(demo)[2:nrow(demo)],nchar = 1024)

often prints the correct information.
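
Separately from the nchar question, the reproduction indexes the column names with 2:nrow(demo). A table usually has far more rows than columns, so this produces NA names, as the toy example below shows. Whether that contributes to the reported error is not established here; 2:ncol(demo) was probably intended.

```r
demo_toy <- data.frame(SEQN = 1:5, RIAGENDR = c(1, 2, 1, 2, 1), RIDAGEYR = 20:24)

by_rows <- colnames(demo_toy)[2:nrow(demo_toy)]  # runs past the last column
by_cols <- colnames(demo_toy)[2:ncol(demo_toy)]  # stops at the last column
print(by_rows)
print(by_cols)
```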

unable to browse or pull data/tables for modified cycle 2017-March2020 (prepandemic)

Thank you for this awesome package. Pardon me if this is obvious, as I am very much a beginner. I have been able to browse and pull data for all years/cycles except the most recent one, the 3.2-year data from 2017 to March 2020. Using 2017 as the year points to the 2017-2018 data, which is okay, but not what I want. How can I point to the modified cycle 2017-2020?

nhanes() is not working

Hi Chris, all other functions in the nhanesA package work well for me after loading the package, except the file importing function nhanes(). When I run demo <- nhanes('DEMO_I'), it tells me that the dataset DEMO_I is not available. Is something wrong with the way I am using the function? Please help. Thanks,

nhanes() cannot import dietary files only when using lapply

According to your tutorial "Introducing nhanesA", I wrote the following code to import files from NHANES data. Strangely, this code works for all other data groups but DIET.

dr2i_names <- nhanesSearchTableNames('DR2IFF', 2003, 2013)
dr2is <- lapply(dr2i_names, nhanes)
names(dr2is) <- dr2i_names

This yields the following error:
simpleWarning in download.file(url, tf, mode = "wb", quiet = TRUE): downloaded length 0 != reported length 0

I tried importing one file only from the table names but still got the same error.

> dr1<-nhanes(dr2i_names[1])
simpleWarning in download.file(url, tf, mode = "wb", quiet = TRUE): downloaded length 0 != reported length 0
> dr1<-nhanes("DR2IFF_C")
simpleWarning in download.file(url, tf, mode = "wb", quiet = TRUE): downloaded length 0 != reported length 0
> dr1<-nhanes('DR2IFF_C')
simpleWarning in download.file(url, tf, mode = "wb", quiet = TRUE): downloaded length 0 != reported length 0

Do you have any idea why this happens and how to fix the issue? Thanks.
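
This does not fix the download itself, but wrapping each call in tryCatch() at least keeps one failing table from aborting the whole lapply() and shows which tables fail. The sketch uses a stand-in loader instead of nhanes() so that it runs without network access; fake_loader and safe_load are hypothetical names.

```r
# Stand-in for a downloader such as nhanes(); it fails for one table on purpose.
fake_loader <- function(name) {
  if (name == "DR2IFF_C") warning("downloaded length 0 != reported length 0")
  data.frame(table = name)
}

# Wrap each call so a failing table yields NULL instead of aborting the loop.
safe_load <- function(name, loader) {
  tryCatch(loader(name),
           warning = function(w) { message(name, ": ", conditionMessage(w)); NULL },
           error   = function(e) { message(name, ": ", conditionMessage(e)); NULL })
}

tables <- c("DR2IFF_C", "DR2IFF_D")
loaded <- lapply(tables, safe_load, loader = fake_loader)
names(loaded) <- tables

# Keep only the tables that loaded successfully.
ok <- Filter(Negate(is.null), loaded)
print(names(ok))
```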

Import data issue

Hi,

When I run the command nhanesTables('DEMO', 2015), I get all the available tables in 2015. However, when I type nhanes('DEMO_I') to load the table, I get the following error message:
simpleWarning in download.file(url, tf, mode = "wb", quiet = TRUE): cannot open URL 'https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT': HTTP status was '404 Not Found'
I think the route has changed and has not been updated in this project.
What should I do to fix the problem?

In addition, is it possible to make the 2016, 2017 and 2018 years available in the package?

Regards,

Bryan

Translation bug for `SDDSRVYR`?

We have

> nhanesCodebook("DEMO_C")$SDDSRVYR |> str()
List of 5
 $ Variable Name:: chr "SDDSRVYR"
 $ SAS Label:    : chr "Data Release Number"
 $ English Text: : chr "Data Release Number."
 $ Target:       : chr "Both males and females 0 YEARS -\r 150 YEARS"
 $ SDDSRVYR      : tibble [2 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ Code or Value    : chr [1:2] "3" "."
  ..$ Value Description: chr [1:2] "NHANES 2003-2004 Public Release" "Missing"
  ..$ Count            : int [1:2] 10122 0
  ..$ Cumulative       : int [1:2] 10122 10122
  ..$ Skip to Item     : logi [1:2] NA NA

Yet, SDDSRVYR is not translated by nhanes():

> nhanes("DEMO_C")$SDDSRVYR |> head()
[1] 3 3 3 3 3 3

I think this is due to the default of mincategories = 2 in nhanesTranslate() (there are no missing values, and so only one unique value):

https://github.com/cjendres1/nhanes/blob/master/R/nhanes_translate.R#L52

I don't think this is sensible behavior. Is there any particular reason for the default not to be 1? My suggestion would be to change the default to 1 otherwise.

Also, currently there is no way to specify this when calling nhanes(), which makes this effectively hard-coded. There should be a provision to pass this on to nhanesTranslate() from nhanes().
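
To make the effect of the threshold concrete, here is a toy version of the check, assuming it counts distinct non-missing values in the column. That reading is based only on the description above; translate_if is a hypothetical helper and not the logic in nhanes_translate.R.

```r
x <- c(3, 3, 3, 3)
codebook <- data.frame(
  Code.or.Value = c("3", "."),
  Value.Description = c("NHANES 2003-2004 Public Release", "Missing"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate only if the column has at least
# `mincategories` distinct non-missing values.
translate_if <- function(x, cb, mincategories) {
  if (length(unique(x[!is.na(x)])) < mincategories) return(x)
  cb <- cb[cb$Code.or.Value != ".", ]
  factor(x, levels = as.numeric(cb$Code.or.Value), labels = cb$Value.Description)
}

kept_raw   <- translate_if(x, codebook, mincategories = 2)  # left as numbers
translated <- translate_if(x, codebook, mincategories = 1)  # becomes a factor
print(head(translated))
```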

Difficulty with nhanesA and pre pandemic data

Dear Mr. Endres,

Thank you for creating the nhanesA package, it has made working with the NHANES data easier.

I am having difficulty importing data from the 2017-2020 pre-pandemic cycle.
I am wondering whether others have encountered similar problems and what the solution is.

For example, the following code works for all other years but not for the 2017-2020 data:

nhanesTableVars(data_group="DEMO", nh_table="P_DEMO",namesonly = TRUE)

Thank you

Inconsistency in codebook column names

I was initially confused by this inconsistency in whether column names are 'fixed':

> nhanesCodebook('AUX_D', 'AUQ020D')$AUQ020D
# A tibble: 4 × 5
  `Code or Value` `Value Description`     Count Cumulative `Skip to Item`
  <chr>           <chr>                   <int>      <int> <lgl>         
1 1               Yes (checkbox checked)     11         11 NA            
2 2               No (checkbox unchecked)   342        353 NA            
3 9               Don't know                  1        354 NA            
4 .               Missing                  2680       3034 NA            
> nhanesTranslate('AUX_D', colnames = 'AUQ020D', details = TRUE)$AUQ020D
  Code.or.Value       Value.Description Count Cumulative Skip.to.Item
1             1  Yes (checkbox checked)    11         11           NA
2             2 No (checkbox unchecked)   342        353           NA
3             9              Don't know     1        354           NA
4             .                 Missing  2680       3034           NA

I eventually figured out that this is because nhanesTranslate() has

  tabletrans <- html_elements(tabletree, 'table') |> html_table() |> as.data.frame()

while nhanesCodebook() does not have the as.data.frame() coercion.

  tabletrans <- html_elements(tabletree, 'table') |> html_table()

I am conflicted about which is the right thing to do, but I guess we should at least be consistent?

And a lot of the existing code assumes that the as.data.frame() coercion has been done, so probably that's the easiest choice for now.
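
The dotted names are what base R's make.names() produces, which data.frame() applies when check.names = TRUE. The toy example below reproduces both naming styles without rvest or tibble.

```r
cols <- c("Code or Value", "Value Description", "Count", "Cumulative", "Skip to Item")

# make.names() turns spaces into dots, giving the names seen in nhanesTranslate().
fixed <- make.names(cols)
print(fixed)

# The same happens when a data frame is rebuilt with name checking enabled.
raw    <- data.frame(`Code or Value` = "1", check.names = FALSE)
dotted <- data.frame(raw, check.names = TRUE)
print(c(names(raw), names(dotted)))
```

Whichever style is chosen, normalizing the names in one shared helper would keep nhanesCodebook() and nhanesTranslate() from drifting apart again.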

load translated data by default

Hi Chris

Could you update nhanes(nh_table, includelabels = FALSE) to nhanes(nh_table, includelabels = FALSE, translated = TRUE) by adding a translated = TRUE parameter? In the docker version, the function loads the translated data unless the user explicitly asks for the raw data by setting translated = FALSE.
For example,
nhanes("DEMO_C") # By default, it would be better if categorical columns were translated: the gender column (RIAGENDR) would contain Male and Female, and the race column (RIDRETH1) would contain Mexican American, Other Hispanic, Non-Hispanic White, ... instead of integer codes.

Thanks,
Laha
