cjendres1 / nhanes
nhanesA: R package for browsing and retrieving NHANES data
nhanesA currently depends on R >= 3.0.0, but @rgentlem's code already uses the base pipe operator |>, which does not work in R < 4.1.0. We could of course change these to %>%, but I prefer the native pipe, which is more efficient and has better error handling. Is it OK to increase the dependency to R >= 4.1.0 (which is now almost 2.5 years old)?
nhanesA::nhanesSearchTableNames("PAHS", details = TRUE)
returns,
Years Data.File.Name Doc.File
1 2011-2012 Polyaromatic Hydrocarbons (PAHs)- Urine - Special Sample PAHS_G Doc
2 2013-2014 Polycyclic Aromatic Hydrocarbons (PAH) - Urine - Special Sample PAHS_H Doc
3 2015-2016 Polycyclic Aromatic Hydrocarbons (PAH) - Urine - Special Sample PAHS_I Doc
Data.File Date.Published
1 PAHS_G Data [XPT - 426 KB] February 2014
2 PAHS_H Data [XPT - 354.6 KB] **Withdrawn**
3 PAHS_I Data [XPT - 310.7 KB] July 2020
Withdrawn datasets are not really useful and cannot be downloaded with the nhanes() function, so it would be very nice to have an option to automatically remove withdrawn datasets from the search results.
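A minimal base-R sketch of such a filter follows. It assumes the search results arrive as a data frame whose Date.Published column carries the literal marker "**Withdrawn**", as in the output above. The helper drop_withdrawn() is hypothetical and not part of nhanesA.

```r
# Hypothetical helper (not part of nhanesA): drop rows whose
# Date.Published field marks the dataset as withdrawn.
drop_withdrawn <- function(df) {
  # grepl() returns one TRUE/FALSE per row; fixed = TRUE matches the
  # word literally, so the surrounding asterisks need no escaping.
  withdrawn <- grepl("Withdrawn", df$Date.Published, fixed = TRUE)
  df[!withdrawn, , drop = FALSE]
}

# Toy data frame mimicking the nhanesSearchTableNames() output above
res <- data.frame(
  Years = c("2011-2012", "2013-2014", "2015-2016"),
  Data.File.Name = c("PAHS_G", "PAHS_H", "PAHS_I"),
  Date.Published = c("February 2014", "**Withdrawn**", "July 2020"),
  stringsAsFactors = FALSE
)
drop_withdrawn(res)
```

An option such as a hypothetical `exclude.withdrawn = TRUE` argument could then apply this step before returning the results.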
I think it would be useful to have the option to work from locally downloaded copies of the NHANES doc / data files, both for debugging and for those who cannot easily install the database. I have an initial implementation here:
https://github.com/deepayan/nhanes/blob/mirror/R/nhanes_mirror.R
If you think that this is worth including in the nhanesA package, I will create a PR with the additional changes required to have other functions like nhanes() use the local copies if nhanesOptions("local.mirror") is set.
To summarize the effect of this: I ran
$ time Rscript -e "library(nhanesA); nhanesMirror('nhanes-mirror', quiet = TRUE)"
[logs skipped]
real 77m55.093s
user 1m58.471s
sys 1m15.858s
This failed to download 23 files (the larger ones) because they hit the default download timeout of 60 seconds. After setting options(timeout = 500) and running again, the number of remaining files went down to 2:
$ grep FALSE nhanes-mirror/manifest.csv
"1159",1159,1159,1159,"PAXMIN_G","PAXMIN_G","/nchs/nhanes/2011-2012/paxmin_g.htm","/nchs/nhanes/2011-2012/paxmin_g.xpt","2011-2012","Updated October 2022",FALSE,TRUE
"1160",1160,1160,1160,"PAXMIN_H","PAXMIN_H","/nchs/nhanes/2013-2014/paxmin_h.htm","/nchs/nhanes/2013-2014/paxmin_h.xpt","2013-2014","Updated October 2022",FALSE,TRUE
These are large physical activity record files (7.7 GB and 8.9 GB) that also time out with nhanes(). Maybe they should be excluded from the download list as well.
Excluding these, the size of the local copy is:
$ du -sh nhanes-mirror/
6.1G nhanes-mirror/
which is not too bad to have locally.
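Raising the timeout globally affects every later download in the session. A minimal sketch of a scoped alternative follows. It raises R's timeout option only while one expression is evaluated and then restores the previous value. with_download_timeout() is a hypothetical helper, and the usage line in the comment refers to nhanesMirror() from the proposed branch linked above, not to a released nhanesA function.

```r
# Hypothetical helper: evaluate `expr` with a temporarily raised download
# timeout (in seconds), restoring the previous setting afterwards.
with_download_timeout <- function(expr, seconds = 500) {
  old <- options(timeout = seconds)  # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore even if expr throws an error
  force(expr)                        # lazy evaluation: expr runs only here
}

# e.g. with_download_timeout(nhanesMirror('nhanes-mirror', quiet = TRUE))
with_download_timeout(getOption("timeout"), seconds = 500)
```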
As additional columns in the data frame returned by the nhanesA::nhanesSearchTableNames() function (as discussed in ccb-hms/phonto#22).
Variables that have a range of values but also a category (such as DMDHHSIZ in DEMO_G and DEMO_H) are not translated. For other variables such as RIDAGEMN, which have only a range, the values shouldn't be translated to a factor.
The issue comes from how https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DEMO_G.htm#DMDHHSIZ represents the data (1 to 6, then 7 is a category) versus how DEMO_H represents it at https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DEMO_H.htm#DMDHHSIZ (1, 2, 3, 4, 5, 6, then 7 is a category).
When trying to compare or combine waves (which has a number of issues of its own), this breaks down. Below is an example of how it currently runs. Ideally, the 1 to 6 would be translated (sending a PR):
library(nhanesA)
translations = nhanesTranslate(nh_table = "DEMO_G", colnames = c("DMDHHSIZ", "RIDAGEMN"))
translations$DMDHHSIZ
#> Code.or.Value Value.Description
#> 1 1 to 6 Range of Values
#> 2 7 7 or more people in the Household
#> 3 . Missing
translations$RIDAGEMN
#> Code.or.Value Value.Description
#> 1 0 to 24 Range of Values
#> 2 . Missing
translations = nhanesTranslate(nh_table = "DEMO_H", colnames = c("DMDHHSIZ", "RIDAGEMN"))
translations$DMDHHSIZ
#> Code.or.Value Value.Description
#> 1 1 1
#> 2 2 2
#> 3 3 3
#> 4 4 4
#> 5 5 5
#> 6 6 6
#> 7 7 7 or more people in the Household
#> 8 . Missing
translations$RIDAGEMN
#> Code.or.Value Value.Description
#> 1 0 to 24 Range of Values
#> 2 . Missing
Created on 2023-08-03 with reprex v2.0.2
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.1 (2022-06-23)
#> os macOS Big Sur ... 10.16
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2023-08-03
#> pandoc 3.1.5 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cli 3.6.1 2023-03-23 [1] CRAN (R 4.2.0)
#> curl 5.0.1 2023-06-07 [1] CRAN (R 4.2.0)
#> digest 0.6.33 2023-07-07 [1] CRAN (R 4.2.0)
#> evaluate 0.21 2023-05-05 [1] CRAN (R 4.2.0)
#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.2.0)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.2.0)
#> foreign 0.8-84 2022-12-06 [1] CRAN (R 4.2.0)
#> fs 1.6.3 2023-07-20 [1] CRAN (R 4.2.0)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
#> htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.2.0)
#> httr 1.4.6 2023-05-08 [1] CRAN (R 4.2.0)
#> knitr 1.43 2023-05-25 [1] CRAN (R 4.2.0)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
#> nhanesA * 0.7.4 2023-08-03 [1] Github (cjendres1/nhanes@fa90130)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.2.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
#> plyr 1.8.8 2022-11-11 [1] CRAN (R 4.2.0)
#> purrr 1.0.1 2023-01-10 [1] CRAN (R 4.2.0)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.0)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.0)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.0)
#> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.2.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
#> Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.2.0)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.0)
#> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.2.0)
#> rmarkdown 2.23 2023-07-01 [1] CRAN (R 4.2.0)
#> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.2.0)
#> rvest 1.0.3 2022-08-19 [1] CRAN (R 4.2.0)
#> selectr 0.4-2 2019-11-20 [1] CRAN (R 4.2.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
#> stringi 1.7.12 2023-01-11 [1] CRAN (R 4.2.0)
#> stringr 1.5.0 2022-12-02 [1] CRAN (R 4.2.0)
#> styler 1.10.1 2023-06-05 [1] CRAN (R 4.2.0)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.2.0)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.0)
#> vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.2.0)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
#> xfun 0.39 2023-04-20 [1] CRAN (R 4.2.0)
#> xml2 1.3.5 2023-07-06 [1] CRAN (R 4.2.0)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.2.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
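The following sketch illustrates what translating "1 to 6" could mean. It assumes a two-column codebook table with the Code.or.Value and Value.Description columns shown in the reprex. It expands an integer "a to b" row into one row per code, so DEMO_G would match the DEMO_H layout. A table containing only a range plus Missing, like RIDAGEMN, is left untouched so that it stays numeric. expand_range_rows() is a hypothetical helper, not the code in the PR.

```r
# Hypothetical helper: expand integer "a to b" range rows into individual
# codes, but only when the table also has other (categorical) codes.
# Assumes exactly two columns: Code.or.Value and Value.Description.
expand_range_rows <- function(tab) {
  code <- as.character(tab$Code.or.Value)
  is_range <- grepl("^[0-9]+ to [0-9]+$", code)
  is_missing <- code == "."
  # Only a range (plus Missing), as for RIDAGEMN: keep it numeric.
  if (!any(is_range) || !any(!is_range & !is_missing)) return(tab)
  pieces <- lapply(seq_len(nrow(tab)), function(i) {
    if (!is_range[i]) return(tab[i, , drop = FALSE])
    bounds <- as.integer(strsplit(code[i], " to ", fixed = TRUE)[[1]])
    codes <- as.character(seq(bounds[1], bounds[2]))
    # Each code becomes its own label, as in the DEMO_H codebook
    data.frame(Code.or.Value = codes, Value.Description = codes,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, pieces)  # rbind matches columns by name
  rownames(out) <- NULL
  out
}

dmd <- data.frame(Code.or.Value = c("1 to 6", "7", "."),
                  Value.Description = c("Range of Values",
                                        "7 or more people in the Household",
                                        "Missing"),
                  stringsAsFactors = FALSE)
expand_range_rows(dmd)
```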
Currently, there must be at least two categories (e.g. Male, Female) for nhanesTranslate to apply a code translation to a data set directly. We want to allow the user to specify the desired minimum number of categories.
Bug when nhanesSearch() should return NULL, e.g.:
nhanesSearch("creatinine", ystart=1999, ystop=2017, ignore.case=TRUE)
Thanks a lot for integrating the new code in release 0.7.1!
While obtaining details of all variables, at least one of them threw an error:
nhanesCodebook(nh_table = 'DRXIFF', colname = 'SEQN')
results in:
Error in names(tabletrans) <- colname :
'names' attribute [1] must be the same length as the vector [0]
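The message indicates that names<- received one name for a zero-length result, presumably because no code table was found for SEQN. Below is a minimal sketch of a guard. It assumes the lookup yields an empty list when a variable has no table. name_code_table() is hypothetical and does not reproduce nhanesA's actual internals.

```r
# Hypothetical sketch: attach the column name to a (possibly empty) list
# of code tables, returning NULL instead of failing when none was found.
name_code_table <- function(tabletrans, colname) {
  if (length(tabletrans) == 0) {
    return(NULL)  # e.g. an identifier such as SEQN with no code table
  }
  names(tabletrans) <- colname
  tabletrans
}

name_code_table(list(), "SEQN")  # NULL rather than an error
```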
As a side note, I had to load the stringr and rvest libraries to use the altered nhanesCodebook function. Without them I got a couple of errors about missing str_c and html_elements functions.
Would it be possible to obtain the SAS Label of each variable, to facilitate analysis/post-processing of study variables?
For example, to have all the following details of each variable:
Variable Name: DRDINT
SAS Label: Number of days of intake
English Text: Indicates whether the sample person has intake data for one or two days.
The following statement in nhanesSearchTableNames caused an error:
df <- subset(df, grep(paste(pattern,collapse="|"), Doc.File))
I resolved the error with the following:
df <- subset(df, Doc.File %in% grep(paste(pattern,collapse="|"), Doc.File, value=TRUE))
@deepayan please review my edit and let me know if you agree with the change (which I already checked in: 38fa70e).
The combination of subset with grep also appears elsewhere. I would appreciate it if you could take a look and make sure those statements work as expected. Thanks.
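The original statement fails because grep() returns integer positions, while subset() requires a logical condition. The sketch below shows an equivalent fix that is arguably simpler: grepl() returns one TRUE/FALSE per element and can be passed to subset() directly. The toy data frame and pattern values are made up for illustration.

```r
# Toy stand-in for the table-name data frame (values are illustrative)
df <- data.frame(Doc.File = c("DEMO_G", "BMX_G", "PAHS_G", "DEMO_H"),
                 stringsAsFactors = FALSE)
pattern <- c("DEMO", "PAHS")

# grepl() yields a logical vector aligned with the rows of df,
# which is what subset() expects (grep() would give row positions).
matched <- subset(df, grepl(paste(pattern, collapse = "|"), Doc.File))
matched$Doc.File
```

The same grepl() substitution should apply to the other subset-with-grep statements mentioned above.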
Due to the weighted design of NHANES, many statistical analyses require weight variables. The detailed algorithm really confuses me; I hope one day there can be an easy-to-use API. Much appreciated :)
Hello,
This tool is very helpful; thank you for making it. I am currently working on a project with NHANES that also accesses the codebook/variable data, so your nhanesTableVars functionality is life-saving.
On that front, is there any way to extend that functionality to include the expected values and their keys? For example, variable AUQ020D has the label "Earache Last 24 Hours, Left?", the description "Have you had an earache in left ear in the last 24 hours?", and the set of expected values Yes, No, Don't know, or Missing. The raw data does not exist in any form I have been able to find or download, and it would be invaluable.
The nhanes(translated = TRUE) function currently converts categorical variables into factors. Is this a good idea? Often we would want to combine data across cycles, and if the factor levels do not match across cycles, the result may not be what we expect.
I haven't checked to see if this problem is widespread, but here is one example (look at ALQ110):
> str(nhanes("ALQ_H"))
'data.frame': 5924 obs. of 10 variables:
$ SEQN : num 73557 73558 73559 73561 73562 ...
$ ALQ101 : Factor w/ 3 levels "Yes","No","Don't know": 1 1 1 1 1 2 1 1 1 2 ...
$ ALQ110 : Factor w/ 3 levels "Yes","No","Don't know": NA NA NA NA NA 1 NA NA NA 1 ...
$ ALQ120Q: num 1 7 0 0 5 2 1 4 2 2 ...
$ ALQ120U: Factor w/ 3 levels "Week","Month",..: 3 1 NA NA 3 3 1 1 1 3 ...
$ ALQ130 : num 1 4 NA NA 1 1 1 3 2 1 ...
$ ALQ141Q: num 0 2 NA NA 0 0 0 0 1 0 ...
$ ALQ141U: Factor w/ 3 levels "Week","Month",..: NA 1 NA NA NA NA NA NA 3 NA ...
$ ALQ151 : Factor w/ 3 levels "Yes","No","Don't know": 1 1 2 2 2 2 2 2 2 2 ...
$ ALQ160 : num NA 0 NA NA 0 NA NA NA 0 NA ...
> str(nhanes("ALQ_I"))
'data.frame': 5735 obs. of 10 variables:
$ SEQN : num 83732 83733 83734 83735 83736 ...
$ ALQ101 : Factor w/ 3 levels "Yes","No","Don't know": 1 1 1 2 2 2 1 1 NA 1 ...
$ ALQ110 : Factor w/ 4 levels "Yes","No","Refused",..: NA NA NA 1 1 2 NA NA NA NA ...
$ ALQ120Q: num 1 7 0 3 1 NA 3 1 NA 0 ...
$ ALQ120U: Factor w/ 3 levels "Week","Month",..: 2 1 NA 3 3 NA 2 2 NA NA ...
$ ALQ130 : num 1 6 NA 1 1 NA 8 1 NA NA ...
$ ALQ141Q: num 0 7 NA 0 0 NA 20 0 NA NA ...
$ ALQ141U: Factor w/ 3 levels "Week","Month",..: NA 1 NA NA NA NA 3 NA NA NA ...
$ ALQ151 : Factor w/ 4 levels "Yes","No","Refused",..: 2 1 1 2 2 NA 1 2 NA 1 ...
$ ALQ160 : num NA 0 NA NA NA NA 2 NA NA NA ...
The general trend in recent years has been to move towards stringsAsFactors = FALSE by default. Should we also keep these as character strings instead of converting to factors?
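The following base-R illustration uses made-up ALQ110 values. The level sets follow the str() output above, and the fourth ALQ_I level, truncated there, is assumed to be "Don't know". The internal integer codes mean different things in the two cycles. Combining as character strings and calling factor() once on the pooled data avoids depending on either cycle's level order.

```r
# Made-up ALQ110 values with the level sets seen in ALQ_H and ALQ_I
alq_h <- factor(c("Yes", "No"), levels = c("Yes", "No", "Don't know"))
alq_i <- factor(c("Refused", "Yes"),
                levels = c("Yes", "No", "Refused", "Don't know"))

# Code 3 is "Don't know" in alq_h but "Refused" in alq_i
as.integer(alq_h)
as.integer(alq_i)

# Pooling as character keeps the labels intact; levels are set once
pooled <- c(as.character(alq_h), as.character(alq_i))
pooled_f <- factor(pooled, levels = c("Yes", "No", "Refused", "Don't know"))
table(pooled_f)
```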
Hi Deepayan,
I received this message when I ran a CRAN check after your latest merge:
❯ checking R code for possible problems ... NOTE
nhanesManifest_public: no visible binding for global variable 'DataURL'
Undefined global functions or variables:
DataURL
I believe it's complaining about this line in nhanes_tables.R:
df <- subset(df, startsWith(DataURL, "/") & endsWith(toupper(DataURL), ".XPT"))
It appears that the code is working as expected, i.e. the function recognizes DataURL as a column. We just need to make sure the NOTE doesn't appear during CRAN checks.
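The sketch below shows two common ways to avoid this NOTE on a toy data frame. The first refers to the column explicitly through the data frame, instead of relying on subset()'s non-standard evaluation. The second declares the name once with utils::globalVariables() at the top level of a package file. The manifest values are illustrative.

```r
# Toy manifest with a DataURL column (values are illustrative)
df <- data.frame(DataURL = c("/Nchs/Nhanes/2011-2012/DEMO_G.XPT",
                             "https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DEMO_G.htm",
                             "/Nchs/Nhanes/2013-2014/demo_h.xpt"),
                 stringsAsFactors = FALSE)

# Option 1: standard evaluation, so R CMD check sees no free variable
keep <- startsWith(df$DataURL, "/") & endsWith(toupper(df$DataURL), ".XPT")
df <- df[keep, , drop = FALSE]

# Option 2 (in a package R file, outside any function):
# utils::globalVariables("DataURL")
df
```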
There appears to be a fundamental change in table formatting such that code lookups do not work for 2013-2014 variables.
While looking at nhanesTranslate(), I came across this example, which puzzled me for a while:
> d <- nhanesTranslate("LAB04", data = nhanes("LAB04", translated = FALSE))
Warning message:
In nhanesTranslate("LAB04", data = nhanes("LAB04", translated = FALSE)) :
No columns were translated
This is surprising because the table has categorical variables that should be translated, e.g., https://wwwn.cdc.gov/Nchs/Nhanes/1999-2000/LAB04.htm#LBDWBFLC.
The reason is that the current code mistakes these for numeric variables because one of the value descriptions is "Detectable Result and Exceeds the Calibrated Range of Assay", which includes the word "Range", and hence is flagged here:
https://github.com/cjendres1/nhanes/blob/master/R/nhanes_translate.R#L151
Should we change
nskip <- grep('Range', translations)
to
nskip <- grep('Range of values', translations)
?
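One detail is worth checking before making that change. The codebook tables shown elsewhere use "Range of Values" with a capital V, and grep() is case-sensitive by default, so 'Range of values' may never match. The small comparison below runs on a made-up vector of descriptions. The third string is the LAB04 description quoted above.

```r
desc <- c("Range of Values",
          "Missing",
          "Detectable Result and Exceeds the Calibrated Range of Assay")

grep("Range", desc)                                # 1 3: too broad
grep("Range of values", desc)                      # integer(0): case mismatch
grep("Range of values", desc, ignore.case = TRUE)  # 1
which(desc == "Range of Values")                   # 1: exact match
```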
Hi Chris-
Is there a way to add SAS labels to variables (in the data frame) when importing an NHANES table using the function nhanes?
Thanks!
Huda
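Below is a minimal base-R sketch of one way to do this. It assumes the variable labels have already been collected into a named character vector, for example from the codebook's "SAS Label" field. Each label is stored in a "label" attribute on the matching column, the same attribute name that haven uses. add_labels() and the label values are illustrative.

```r
# Hypothetical helper: attach a "label" attribute to each matching column
add_labels <- function(df, labels) {
  for (v in intersect(names(df), names(labels))) {
    attr(df[[v]], "label") <- labels[[v]]
  }
  df
}

demo <- data.frame(SEQN = c(1, 2), RIAGENDR = c(1, 2))
labs <- c(SEQN = "Respondent sequence number", RIAGENDR = "Gender")
demo <- add_labels(demo, labs)
attr(demo$RIAGENDR, "label")
```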
Hi,
Every commit should involve a change to the version number; otherwise we really get in trouble with bug reports...
thx
nhanesSearchVarName('BMXLEG')
Error in matrix(unlist(ttlist), nrow = length(ttlist), byrow = TRUE) :
'data' must be of a vector type, was 'NULL'
This is with the latest version of R and nhanesA (0.6.4.4).
R version 3.5.1 (2018-07-02) -- "Feather Spray"
Platform: x86_64-w64-mingw32/x64 (64-bit)
Did NHANES change the location of the tables again?
Great work on this package; it's super helpful!
It looks like there is a restriction on the character length of variable descriptions obtained via the nhanesTableVars() function (line 332 in commit 92cb256).
Would it be possible to not hard-code the restriction to 128 characters so that we can obtain the full descriptions?
In the "translation" branch, I have a slightly different implementation of the translation workflow. The main difference is in how numeric variables are handled: instead of retaining them as-is, the code attempts to convert some special values into NAs. The translation of categorical variables is also simplified.
Some context for this is available here.
The current interface to this is through the function nhanesTranslateRaw(nh_table). I am not sure if this is a good name, but I can't think of a better one. Any suggestions?
ggplot figures are not rendered well. Please see: Survey Weight html
Please note that a vignette engine has been added to UsingSurveyWeights.rmd. To view the changes, see 41c12a1.
I am not sure how best to deal with this, but thought I should mention it for the record. The codebooks for the same variables in different cycles sometimes change the capitalization of the same values. This leads to apparently different levels which are actually the same. One example:
> nhanesCodebook("DEMO_J")$DMDEDUC3$DMDEDUC3
# A tibble: 21 × 5
`Code or Value` `Value Description` Count Cumulative `Skip to Item`
<chr> <chr> <int> <int> <lgl>
1 0 Never attended / kindergarte… 184 184 NA
2 1 1st grade 176 360 NA
3 2 2nd grade 202 562 NA
4 3 3rd grade 202 764 NA
5 4 4th grade 179 943 NA
6 5 5th grade 199 1142 NA
7 6 6th grade 154 1296 NA
8 7 7th grade 151 1447 NA
9 8 8th grade 154 1601 NA
10 9 9th grade 154 1755 NA
# ℹ 11 more rows
# ℹ Use `print(n = ...)` to see more rows
> nhanesCodebook("DEMO_C")$DMDEDUC3$DMDEDUC3
# A tibble: 21 × 5
`Code or Value` `Value Description` Count Cumulative `Skip to Item`
<chr> <chr> <int> <int> <lgl>
1 0 Never Attended / Kindergarte… 205 205 NA
2 1 1st Grade 177 382 NA
3 2 2nd Grade 193 575 NA
4 3 3rd Grade 158 733 NA
5 4 4th Grade 183 916 NA
6 5 5th Grade 210 1126 NA
7 6 6th Grade 286 1412 NA
8 7 7th Grade 277 1689 NA
9 8 8th Grade 312 2001 NA
10 9 9th Grade 302 2303 NA
# ℹ 11 more rows
# ℹ Use `print(n = ...)` to see more rows
This is tricky because we don't know what the duplicates are without downloading everything...
One option is to convert based on a known dictionary (which is pre-computed using the database and then stored in the package).
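Here is a rough sketch of the dictionary idea. It assumes a pre-computed lookup from a case-folded description to one canonical spelling. The two-entry dictionary and canonicalize() are hypothetical. Labels missing from the dictionary pass through unchanged.

```r
# Hypothetical pre-computed dictionary: case-folded label -> canonical label
canonical <- c("1st grade" = "1st grade", "2nd grade" = "2nd grade")

canonicalize <- function(x, dict) {
  key <- tolower(x)
  hit <- key %in% names(dict)
  x[hit] <- unname(dict[key[hit]])  # replace known labels only
  x
}

canonicalize(c("1st Grade", "2nd grade", "9th Grade"), canonical)
```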
I've just installed nhanesA, and encountered errors trying to follow the vignette: https://cran.r-project.org/web/packages/nhanesA/vignettes/Introducing_nhanesA.html
In particular:
> nhanesTables('EXAM', 2005)
Error in `[.data.frame`(df, , c("Data.File.Name", "Data.File.Description")) :
undefined columns selected
> nhanesTableVars('EXAM', 'BMX_D')
Error in nhanesTableVars("EXAM", "BMX_D") :
Table BMX_D not present in the EXAM survey
However, bmx_d <- nhanes('BMX_D') appears to be working as expected.
When running the following line for survey years 2019 and 2021:
tables_2019 <- nhanesTables(data_group=nhanes_data_group, year=2019)
I obtain the following error:
Error in `[.data.frame`(df, , c("Data.File.Name", "Data.File.Description")) : undefined columns selected
5. stop("undefined columns selected")
4. `[.data.frame`(df, , c("Data.File.Name", "Data.File.Description"))
3. df[, c("Data.File.Name", "Data.File.Description")]
2. unique(df[, c("Data.File.Name", "Data.File.Description")])
1. nhanesTables(data_group = nhanes_data_group, year = 2019)
Robert (@rgentlem), I will check the HTML parse function and fix it.
nhanesTranslate() raises errors when we translate the data with nchar = 512 or larger.
For example:
> demo = nhanes("DEMO_C")
> demo_trans = nhanesTranslate("DEMO_C", colnames = colnames(demo)[2:ncol(demo)], data = demo, nchar = 1024)
Error message:
Timeout was reached: No data pulled
Error in UseMethod("xml_find_all") :
no applicable method for 'xml_find_all' applied to an object of class "NULL"
In addition: There were 38 warnings (use warnings() to see them)
It often works if data= is not used.
For example:
nhanesTranslate("DEMO_C", colnames = colnames(demo)[2:ncol(demo)], nchar = 1024)
often prints the correct information.
Thank you for this awesome package. Pardon me if this is obvious, as I am a complete beginner. I have been able to browse and pull data for all years/cycles except the most recent one, which is available as 3.2 years of data (2017 to March 2020). Using 2017 as the year points to the 2017-2018 data, which is okay, but not what I want. How can I point to the modified 2017-2020 cycle?
Hi Chris, all other functions in the nhanesA package work well for me after loading the package, except the file-importing function nhanes(). When I run demo <- nhanes('DEMO_I'), it tells me that the dataset DEMO_I is not available. Is something wrong with the way I am using the function? Please help. Thanks,
Following your tutorial "Introducing nhanesA", I wrote the following code to import files from NHANES data. Strangely, this code works for all other data groups except DIET.
dr2i_names<-nhanesSearchTableNames('DR2IFF',2003,2013)
dr2is<-lapply(dr2i_names,nhanes)
names(dr2is)<- dr2i_names
This yields the following error:
simpleWarning in download.file(url, tf, mode = "wb", quiet = TRUE): downloaded length 0 != reported length 0
I tried importing one file only from the table names but still got the same error.
> dr1<-nhanes(dr2i_names[1])
simpleWarning in download.file(url, tf, mode = "wb", quiet = TRUE): downloaded length 0 != reported length 0
> dr1<-nhanes("DR2IFF_C")
simpleWarning in download.file(url, tf, mode = "wb", quiet = TRUE): downloaded length 0 != reported length 0
> dr1<-nhanes('DR2IFF_C')
simpleWarning in download.file(url, tf, mode = "wb", quiet = TRUE): downloaded length 0 != reported length 0
Do you have any idea why this happens and how to fix the issue? Thanks.
Hi,
When I run the command nhanesTables('DEMO', 2015), I get all the available tables in 2015. However, when I type nhanes('DEMO_I') to load the table, I get the following error message:
simpleWarning in download.file(url, tf, mode = "wb", quiet = TRUE): cannot open URL 'https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT': HTTP status was '404 Not Found'
I think the URL has changed and has not been updated in this project.
What should I do to fix the problem?
In addition, is it possible to make the 2016, 2017 and 2018 years available in the package?
Regards,
Bryan
Hello there,
The package works really well! However, it's not possible to access the pre-pandemic data (NHANES 2017 to March 2020). Could you enable access?
https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Demographics&Cycle=2017-2020
All the best,
Jenny
We have
> nhanesCodebook("DEMO_C")$SDDSRVYR |> str()
List of 5
$ Variable Name:: chr "SDDSRVYR"
$ SAS Label: : chr "Data Release Number"
$ English Text: : chr "Data Release Number."
$ Target: : chr "Both males and females 0 YEARS -\r 150 YEARS"
$ SDDSRVYR : tibble [2 × 5] (S3: tbl_df/tbl/data.frame)
..$ Code or Value : chr [1:2] "3" "."
..$ Value Description: chr [1:2] "NHANES 2003-2004 Public Release" "Missing"
..$ Count : int [1:2] 10122 0
..$ Cumulative : int [1:2] 10122 10122
..$ Skip to Item : logi [1:2] NA NA
Yet, SDDSRVYR is not translated by nhanes():
> nhanes("DEMO_C")$SDDSRVYR |> head()
[1] 3 3 3 3 3 3
I think this is due to the default of mincategories = 2 in nhanesTranslate() (there are no missing values, and so only one unique value):
https://github.com/cjendres1/nhanes/blob/master/R/nhanes_translate.R#L52
I don't think this is sensible behavior. Is there any particular reason for the default not to be 1? If not, my suggestion would be to change the default to 1.
Also, currently there is no way to specify this when calling nhanes(), which makes it effectively hard-coded. There should be a provision to pass this on to nhanesTranslate() from nhanes().
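To make the proposal concrete, here is a toy sketch of a translation step with a mincategories argument. A wrapper forwards that argument through `...`, so the value is no longer effectively hard-coded. translate_column() and load_table() are hypothetical stand-ins, not nhanesA's nhanesTranslate() or nhanes().

```r
# Hypothetical translation step: map codes to labels only if the column
# has at least `mincategories` distinct non-missing values.
translate_column <- function(x, codes, mincategories = 1) {
  if (length(unique(na.omit(x))) < mincategories) return(x)
  unname(codes[as.character(x)])
}

# Hypothetical loader that forwards extra arguments to the translation step
load_table <- function(x, codes, translated = TRUE, ...) {
  if (!translated) return(x)
  translate_column(x, codes, ...)
}

codes <- c("3" = "NHANES 2003-2004 Public Release")
sddsrvyr <- rep(3, 4)
load_table(sddsrvyr, codes)                     # translated (default of 1)
load_table(sddsrvyr, codes, mincategories = 2)  # left as numbers
```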
Dear Mr. Endres,
Thank you for creating the nhanesA package, it has made working with the NHANES data easier.
I am having difficulty importing data from the 2017-2020 pre-pandemic cycle.
I wonder whether others have also encountered similar problems and what the solution is.
For example, the following code works for all other years but not for the 2017-2020 data:
nhanesTableVars(data_group="DEMO", nh_table="P_DEMO",namesonly = TRUE)
Thank you
I was initially confused by this inconsistency in whether column names are 'fixed':
> nhanesCodebook('AUX_D', 'AUQ020D')$AUQ020D
# A tibble: 4 × 5
`Code or Value` `Value Description` Count Cumulative `Skip to Item`
<chr> <chr> <int> <int> <lgl>
1 1 Yes (checkbox checked) 11 11 NA
2 2 No (checkbox unchecked) 342 353 NA
3 9 Don't know 1 354 NA
4 . Missing 2680 3034 NA
> nhanesTranslate('AUX_D', colnames = 'AUQ020D', details = TRUE)$AUQ020D
Code.or.Value Value.Description Count Cumulative Skip.to.Item
1 1 Yes (checkbox checked) 11 11 NA
2 2 No (checkbox unchecked) 342 353 NA
3 9 Don't know 1 354 NA
4 . Missing 2680 3034 NA
I eventually figured out that this is because nhanesTranslate() has
tabletrans <- html_elements(tabletree, 'table') |> html_table() |> as.data.frame()
while nhanesCodebook() does not have the as.data.frame() coercion:
tabletrans <- html_elements(tabletree, 'table') |> html_table()
I am conflicted about which is the right thing to do, but I guess we should at least be consistent?
A lot of the existing code assumes that the as.data.frame() coercion has been done, so that is probably the easiest choice for now.
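The dotted names appear to come from base R's name checking. as.data.frame() on a list of tables builds the result with check.names = TRUE, and that applies make.names(), which turns spaces into dots. The small illustration below uses a made-up table.

```r
raw_names <- c("Code or Value", "Value Description", "Count",
               "Cumulative", "Skip to Item")
make.names(raw_names)  # spaces become dots

# A data frame built with check.names = FALSE keeps the original names...
tab <- data.frame(`Code or Value` = c("1", "."), Count = c(11L, 2680L),
                  check.names = FALSE, stringsAsFactors = FALSE)
names(tab)
# ...while as.data.frame() on a list containing it sanitizes them
names(as.data.frame(list(tab)))
```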
From the Doc for L06_2_B there should be a column LB2DAY, but upon inspecting the columns downloaded using nhanesFromURL() this column is missing.
colnames(nhanesFromURL("https://wwwn.cdc.gov/Nchs/Nhanes/2001-2002/L06_2_B.XPT"))
Hi Chris
Could you update nhanes(nh_table, includelabels = FALSE) to nhanes(nh_table, includelabels = FALSE, translated = TRUE) by adding a translated = TRUE parameter? In the docker version, the function loads the translated data unless users explicitly request the raw data by setting translated = FALSE.
For example,
nhanes("DEMO_C")
# By default, it would be better if categorical variables were translated: the gender (RIAGENDR) column should contain Male and Female, and the race (RIDRETH1) column should contain Mexican American, Other Hispanic, Non-Hispanic White, ... instead of integer codes.
Thanks,
Laha