
datapackr's People

Contributors

christian-domaas, chuqyang, cnemarich, flopez-bao, jacksonsj, jason-p-pickering, jordanbalesbao, lownin, sam-bao

datapackr's Issues

rePackSNUxIM giving distribution error because of decimals

With this datapack: https://www.pepfar.net/Project-Pages/collab-38/Shared%20Documents/Data%20Pack%202019%20Staging%20Area/Support%20Files/datapacks/DataPack_Malawi_03182019.xlsx

I am receiving a large number of errors related to incorrectly distributing data on the SNU x IM tab:

13 :  WARNING!: 519 cases where distributed total is either more or less than total Target. To identify these, go to your SNU x IM tab and filter the Rollup column for Pink cells. This has affected the following indicators -> 
	* GEND_GBV.N.ViolenceServiceType.20T.physEmot
	* GEND_GBV.N.ViolenceServiceType.20T.postRape
	* HTS_INDEX_COM.N.Age/Sex/Result.20T.NewNeg
	* HTS_INDEX_COM.N.Age/Sex/Result.20T.NewPos
	* HTS_INDEX_FAC.N.Age/Sex/Result.20T.NewNeg
	* HTS_INDEX_FAC.N.Age/Sex/Result.20T.NewPos
	* HTS_SELF.N.Age/Sex/HIVSelfTest.20T.Directly_Assisted
	* HTS_SELF.N.HIVSelfTest.20T.Unassisted
	* HTS_TST_Inpat.N.Age/Sex/Result.20T.Negative
	* HTS_TST_Inpat.N.Age/Sex/Result.20T.Positive
	* HTS_TST_MobileMod.N.Age/Sex/Result.20T.Negative
	* HTS_TST_MobileMod.N.Age/Sex/Result.20T.Positive
	* HTS_TST_OtherMod.N.Age/Sex/Result.20T.Negative
	* HTS_TST_OtherPITC.N.Age/Sex/Result.20T.Negative
	* HTS_TST_OtherPITC.N.Age/Sex/Result.20T.Positive
	* HTS_TST_PMTCTPostANC1.N.Age/Sex/Result.20T.Negative
	* HTS_TST_PMTCTPostANC1.N.Age/Sex/Result.20T.Positive
	* HTS_TST_STIClinic.N.Age/Sex/Result.20T.Negative
	* HTS_TST_STIClinic.N.Age/Sex/Result.20T.Positive
	* HTS_TST_VCT.N.Age/Sex/Result.20T.Negative
	* HTS_TST_VCT.N.Age/Sex/Result.20T.Positive
	* HTS_TST.N.KeyPop/Result.20T.Negative
	* HTS_TST.N.KeyPop/Result.20T.Positive
	* KP_PREV.N.KeyPop.20T
	* PMTCT_ART.N.Age/NewExistingART/Sex/HIVStatus.20T.Already
	* PMTCT_ART.N.Age/NewExistingART/Sex/HIVStatus.20T.New
	* PMTCT_STAT.D.Age/Sex.20T
	* PMTCT_STAT.N.Age/Sex/KnownNewResult.20T.NewNeg
	* PMTCT_STAT.N.Age/Sex/KnownNewResult.20T.NewPos
	* PP_PREV.N.Age/Sex.20T
	* PrEP_CURR.N.Age/Sex.20T
	* PrEP_CURR.N.KeyPop.20T
	* PrEP_NEW.N.Age/Sex.20T
	* PrEP_NEW.N.KeyPop.20T
	* TB_ART.N.Age/Sex/NewExistingART/HIVStatus.20T.Already
	* TB_ART.N.Age/Sex/NewExistingART/HIVStatus.20T.New
	* TB_PREV.D.Age/TherapyType/NewExistingArt/HIVStatus.20T.IPTNew
	* TB_PREV.N.Age/TherapyType/NewExistingArt/HIVStatus.20T.IPTNew
	* TX_CURR.N.Age/Sex/HIVStatus.20T
	* TX_NEW.N.Age/Sex/HIVStatus.20T
	* TX_NEW.N.KeyPop/HIVStatus.20T
	* TX_PVLS.D.Age/Sex/Indication/HIVStatus.20T.Routine
	* TX_PVLS.N.Age/Sex/Indication/HIVStatus.20T.Routine
	* TX_TB.D.Age/Sex/TBScreen/NewExistingART/HIVStatus.20T.ScreenNegAlready
	* TX_TB.D.Age/Sex/TBScreen/NewExistingART/HIVStatus.20T.ScreenNegNew
	* TX_TB.D.Age/Sex/TBScreen/NewExistingART/HIVStatus.20T.ScreenPosAlready
	* TX_TB.D.Age/Sex/TBScreen/NewExistingART/HIVStatus.20T.ScreenPosNew
	* VMMC_CIRC.N.Age/Sex/HIVStatus.20T.Negative
	* VMMC_CIRC.N.Age/Sex/HIVStatus.20T.Positive
	* VMMC_CIRC.N.Age/Sex/HIVStatus.20T.Unknown

Many, if not all, of these appear to occur because the targets on the source tabs are not rounded, while the SNUxIM tab contains rounded targets.

Gend GBV tab

image

SNUxIM tab

image

Note that these cases are not actually flagged in pink in the Data Pack, as the error message suggests they should be.

This is the related code.

datapackr/R/rePackSNUxIM.R

Lines 91 to 120 in 9b34ce5

# TEST where attempted distribution sum != target
imbalancedDistribution <- d$data$distributedMER %>%
  tidyr::drop_na(value, distribution) %>%
  dplyr::select(-Age, -distribution, -mechanism_code) %>%
  dplyr::group_by_at(dplyr::vars(dplyr::everything(), -value)) %>%
  dplyr::summarize(value = round(sum(value), digits = 5)) %>%
  dplyr::ungroup() %>%
  dplyr::group_by_at(dplyr::vars(dplyr::everything(), -SNUxIM_value)) %>%
  dplyr::summarize(SNUxIM_value = round(sum(SNUxIM_value), digits = 5)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(value != SNUxIM_value)

if (NROW(imbalancedDistribution) > 0) {
  d$tests$imbalancedDistribution <- imbalancedDistribution
  imbalancedDistribution_inds <- imbalancedDistribution %>%
    dplyr::select(indicator_code) %>%
    dplyr::distinct() %>%
    dplyr::arrange(indicator_code) %>%
    dplyr::pull(indicator_code)
  warning_msg <-
    paste0(
      "WARNING!: ",
      NROW(imbalancedDistribution),
      " cases where distributed total is either more or less than total Target.",
      " To identify these, go to your SNU x IM tab and filter the Rollup column for Pink cells.",
      " This has affected the following indicators -> \n\t* ",
      paste(imbalancedDistribution_inds, collapse = "\n\t* "),
      "\n")
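Given the rounding hypothesis above, the exact-equality filter could be relaxed to a tolerance. A minimal base R sketch, assuming a hypothetical `is_imbalanced` helper and a half-unit tolerance (the actual threshold is a design decision):

```r
# Flag only differences larger than plausible rounding error, instead of
# comparing rounded and unrounded totals for exact equality.
is_imbalanced <- function(value, snuxim_value, tol = 0.5) {
  abs(value - snuxim_value) > tol
}

is_imbalanced(10.4, 10)  # within rounding tolerance
is_imbalanced(12, 10)    # genuinely imbalanced
```

Inside the pipeline this would replace `dplyr::filter(value != SNUxIM_value)` with something like `dplyr::filter(abs(value - SNUxIM_value) > 0.5)`.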

Fix buggy code in createKeychainInfo

Lines 98-103 in createKeyChainInfo appear intended to compare the sheet names from the schema against the sheet names contained in the DataPack being parsed.

  1. This does not do what I think the author intended:

any(tab_names_expected != tab_names_received)

If any tabs have been added and the two vectors have different lengths, this warning appears because R recycles the shorter vector:

Warning in tab_names_expected != tab_names_received :
  longer object length is not a multiple of shorter object length

  2. Pretty sure this is already handled in checkStructure anyway.
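For illustration, a base R sketch of the recycling problem and a comparison that is safe for vectors of differing lengths (the tab names here are hypothetical):

```r
tab_names_expected <- c("Home", "Prioritization", "SNU x IM")
tab_names_received <- c("Home", "Prioritization", "SNU x IM", "Extra Tab")

# any(tab_names_expected != tab_names_received) recycles the shorter
# vector, producing the warning above and an unreliable result.

# setdiff() compares the two sets directly, regardless of length or order:
missing_tabs <- setdiff(tab_names_expected, tab_names_received)
extra_tabs   <- setdiff(tab_names_received, tab_names_expected)
length(missing_tabs) == 0 && length(extra_tabs) == 0
```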

@jacksonsj could you have a look and fix/remove?

Revisit solution in PR #89

Revisit the solution in #89. The implemented fix seems functional but clunky; due to time constraints we shipped it anyway, but we consider it technical debt.

See pull request for details on the issue and solution.

Update dependencies

All developers need to use the same versions of dependencies to ensure that everything is reproducible across different environments.

Try and get type of tool and COP year if not specified

The current OPU and Datapack app share basically the same code and functionality. Keeping both of these apps maintained will be laborious and duplicative. With one app, we should be able to perform the necessary validations on both OPU DataPacks and normal DataPacks, since the vast majority of the code is essentially the same.

We should be able to determine fairly easily what type of tool we are working with, and from there decide how the app should handle it. Ideally, the specific tool type would be written to a dedicated range of cells on the Home tab; currently it is available in cell B10, as a label like "COP21 Data Pack" or "COP20 OPU Data Pack". Once the app has this information, it can proceed with the specific processing each tool requires.

Command line users or apps would still be able to specify this information for specific use cases, but if it is left blank (NULL) we would try to obtain it from the Home tab.
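A sketch of how the label in cell B10 might be parsed. The helper name `parseToolLabel` is hypothetical, and reading from sheet "Home", range "B10" via readxl is an assumption about where the label lives:

```r
# Derive tool type and COP year from a Home-tab label like
# "COP21 Data Pack" or "COP20 OPU Data Pack".
parseToolLabel <- function(label) {
  list(
    tool     = if (grepl("OPU", label, fixed = TRUE)) "OPU Data Pack" else "Data Pack",
    cop_year = regmatches(label, regexpr("COP[0-9]{2}", label))
  )
}

# The label itself would come from the tool (not run; cell and sheet name
# are assumptions):
# label <- readxl::read_excel(submission_path, sheet = "Home",
#                             range = "B10", col_names = FALSE)[[1]]

parseToolLabel("COP20 OPU Data Pack")
```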

Site Tool: Change OU sum logic

Change the way the site tool computes OU sums from the Data Pack. Instead of pulling from d$data$site$distributed, pull from d$data$MER for the most direct link.

Revise site tool schema

The current structure of datapackr::site_tool_schema does not actually reflect the output schema.

getMechMap doesn't return the column name

@jacksonsj I am getting an error when trying to pack a site tool that I have traced to this point.

dplyr::select(name, code) %>%

It appears that getMechList is not returning a column named name as expected at the referenced point in the code (and maybe at some later points, e.g. x = data.frame(mechID = mechList$name)). I get these columns when calling getMechList directly:

> names(mechList)
[1] "mechanism" "code"      "uid"       "partner"   "primeid"   "agency"    "ou"        "startdate"
[9] "enddate"

I don't feel I know the code well enough to fix this bug myself. Perhaps we should be using mechanism instead of name, or perhaps we need to rename what getMechList returns.
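If renaming is the right fix, a minimal sketch of the second option (sample data here is made up; the dplyr equivalent would be `dplyr::rename(mechList, name = mechanism)`):

```r
# Rename the column returned by getMechList so downstream code that
# expects `name` keeps working.
mechList <- data.frame(
  mechanism = c("Mechanism A", "Mechanism B"),
  code      = c("12345", "67890")
)

names(mechList)[names(mechList) == "mechanism"] <- "name"
mechMap <- mechList[, c("name", "code")]
```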

Improve validation of Prioritization tab

As noted in the code, _Military PSNUs should not have any prioritization, and even if they do, it should not be imported and just ignored.

The code in this section of the parser could be improved a bit to provide better feedback to the user.

Unable to install package

FYI @sam-bao @jacksonsj

@gsarfaty in SA and I are having some trouble installing datapackr. There seem to be some upstream issues with installing datacommons, which depends on doMC. We are both working from R 4.0.3.

remotes::install_github("pepfar-datim/datapackr")
#> Using github PAT from envvar GITHUB_PAT
#> Downloading GitHub repo pepfar-datim/datapackr@HEAD
#> datapackc... (NA -> cc99f39e4...) [GitHub]
#> piton        (NA -> 1.0.0       ) [CRAN]
#> tidyxl       (NA -> 1.0.7       ) [CRAN]
#> Downloading GitHub repo pepfar-datim/data-pack-commons@HEAD
#> Skipping 1 packages not available: doMC
#>          checking for file 'C:\Users\achafetz\AppData\Local\Temp\2\Rtmp61BBOY\remotes1ae4793a38f\pepfar-datim-data-pack-commons-cc99f39/DESCRIPTION' ...  v  checking for file 'C:\Users\achafetz\AppData\Local\Temp\2\Rtmp61BBOY\remotes1ae4793a38f\pepfar-datim-data-pack-commons-cc99f39/DESCRIPTION' (711ms)
#>       -  preparing 'datapackcommons':
#>    checking DESCRIPTION meta-information ...     checking DESCRIPTION meta-information ...   v  checking DESCRIPTION meta-information
#>       -  checking for LF line-endings in source and make files and shell scripts
#>       -  checking for empty or unneeded directories
#>       -  building 'datapackcommons_0.2.1.tar.gz'
#>      
#> 
#> Installing package into 'C:/Users/achafetz/Documents/R/win-library/4.0'
#> (as 'lib' is unspecified)
#> Error: Failed to install 'datapackr' from GitHub:
#>   Failed to install 'datapackcommons' from GitHub:
#>   (converted from warning) installation of package 'C:/Users/achafetz/AppData/Local/Temp/2/Rtmp61BBOY/file1ae4676b2d13/datapackcommons_0.2.1.tar.gz' had non-zero exit status
remotes::install_github("pepfar-datim/data-pack-commons")
#> Using github PAT from envvar GITHUB_PAT
#> Downloading GitHub repo pepfar-datim/data-pack-commons@HEAD
#> Skipping 1 packages not available: doMC
#>          checking for file 'C:\Users\achafetz\AppData\Local\Temp\2\RtmpOQqKpN\remotes22c442247a37\pepfar-datim-data-pack-commons-cc99f39/DESCRIPTION' ...  v  checking for file 'C:\Users\achafetz\AppData\Local\Temp\2\RtmpOQqKpN\remotes22c442247a37\pepfar-datim-data-pack-commons-cc99f39/DESCRIPTION' (720ms)
#>       -  preparing 'datapackcommons':
#>    checking DESCRIPTION meta-information ...     checking DESCRIPTION meta-information ...   v  checking DESCRIPTION meta-information
#>       -  checking for LF line-endings in source and make files and shell scripts
#>       -  checking for empty or unneeded directories
#>       -  building 'datapackcommons_0.2.1.tar.gz'
#>      
#> 
#> Installing package into 'C:/Users/achafetz/Documents/R/win-library/4.0'
#> (as 'lib' is unspecified)
#> Error: Failed to install 'datapackcommons' from GitHub:
#>   (converted from warning) installation of package 'C:/Users/achafetz/AppData/Local/Temp/2/RtmpOQqKpN/file22c4e174685/datapackcommons_0.2.1.tar.gz' had non-zero exit status
install.packages("doMC")
#> Installing package into 'C:/Users/achafetz/Documents/R/win-library/4.0'
#> (as 'lib' is unspecified)
#> Warning: package 'doMC' is not available for this version of R
#> 
#> A version of this package for your version of R might be available elsewhere,
#> see the ideas at
#> https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

Created on 2021-01-28 by the reprex package (v0.3.0)

Better handling of authentication files.

Related to pepfar-datim/datimutils#5

Login functions should:

  1. Accept a config file with no slash, one slash, or multiple slashes on baseurl, and normalize this to a single trailing slash
  2. Ensure all other API calls in code never start with a slash
  3. Provide a utility function to a) encode all URIs and b) check for double slashes (which we know will fail), e.g. a wrapper around utils::URLencode that throws an error if there is a "//" anywhere other than in "https://"
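A sketch of that utility under the name `safe_url` (the name and the exact scheme-stripping regex are assumptions, not the implemented API):

```r
# Encode a URI and fail fast on double slashes outside the scheme.
safe_url <- function(uri) {
  encoded <- utils::URLencode(uri)
  # Drop the "https://" (or similar) scheme before checking for "//"
  path_part <- sub("^[a-z]+://", "", encoded)
  if (grepl("//", path_part, fixed = TRUE)) {
    stop("Malformed URL (double slash): ", encoded)
  }
  encoded
}

safe_url("https://www.datim.org/api/me")      # OK
# safe_url("https://www.datim.org//api/me")   # throws an error
```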

Eventually, this function should be replaced entirely by something similar from the upcoming datimutils package.

HTS.Index

Reviewing the indicators in the schema (cop23_data_pack_schema): why is index testing the only indicator that does not match MER, stored as HTS.Index as opposed to HTS_INDEX?

  datapackr::cop23_data_pack_schema |>
    tibble::as_tibble() |>
    dplyr::filter(col_type == "target") |>
    dplyr::select(indicator_code) |>
    dplyr::distinct(indicator_code) |>
    dplyr::mutate(indicator = stringr::str_extract(indicator_code, "[^\\.]+")) |>
    dplyr::arrange(indicator) |>
    print(n = Inf)

Remove use of "file.choose" if possible

We will often need to automate these scripts, and requiring user interaction is problematic. Be sure to remove the use of file.choose when the required file path is not supplied as a parameter to the function that needs it.
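One way to keep the interactive convenience without blocking automation is to fall back to file.choose only in an interactive session. A sketch with a hypothetical `getFilePath` helper:

```r
# Resolve the submission path: use the supplied argument, prompt only when
# running interactively, and fail loudly in scripted/automated runs.
getFilePath <- function(submission_path = NULL) {
  if (is.null(submission_path)) {
    if (interactive()) {
      submission_path <- file.choose()
    } else {
      stop("submission_path must be supplied when running non-interactively.")
    }
  }
  submission_path
}

getFilePath("DataPack_Malawi.xlsx")  # passes the path straight through
```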

Bug in imbalancedDistribution Test

With this data pack: https://www.pepfar.net/Project-Pages/collab-38/Shared%20Documents/Data%20Pack%202019%20Staging%20Area/Support%20Files/datapacks/DataPack_Malawi_03182019.xlsx

I am getting erroneous imbalanced distribution warnings:

13 :  WARNING!: 131 cases where distributed total is either more or less than total Target. To identify these, go to your SNU x IM tab and filter the Rollup column for Pink cells. This has affected the following indicators -> 
	* GEND_GBV.N.ViolenceServiceType.20T.physEmot
	* GEND_GBV.N.ViolenceServiceType.20T.postRape
	* HTS_INDEX_COM.N.Age/Sex/Result.20T.NewNeg
	* HTS_INDEX_COM.N.Age/Sex/Result.20T.NewPos
	* HTS_INDEX_FAC.N.Age/Sex/Result.20T.NewPos
	* HTS_SELF.N.Age/Sex/HIVSelfTest.20T.Directly_Assisted
	* HTS_SELF.N.HIVSelfTest.20T.Unassisted
	* HTS_TST_OtherMod.N.Age/Sex/Result.20T.Negative
	* HTS_TST_OtherPITC.N.Age/Sex/Result.20T.Positive
	* HTS_TST.N.KeyPop/Result.20T.Negative
	* HTS_TST.N.KeyPop/Result.20T.Positive
	* KP_PREV.N.KeyPop.20T
	* PrEP_CURR.N.Age/Sex.20T
	* PrEP_CURR.N.KeyPop.20T
	* PrEP_NEW.N.Age/Sex.20T
	* PrEP_NEW.N.KeyPop.20T
	* TB_ART.N.Age/Sex/NewExistingART/HIVStatus.20T.Already
	* TB_ART.N.Age/Sex/NewExistingART/HIVStatus.20T.New
	* TX_CURR.N.Age/Sex/HIVStatus.20T
	* TX_NEW.N.Age/Sex/HIVStatus.20T
	* TX_NEW.N.KeyPop/HIVStatus.20T
	* VMMC_CIRC.N.Age/Sex/HIVStatus.20T.Negative
	* VMMC_CIRC.N.Age/Sex/HIVStatus.20T.Positive
	* VMMC_CIRC.N.Age/Sex/HIVStatus.20T.Unknown

This screenshot shows two rows from the same PSNU. Note that the value column has a different (exactly double) entry in the second row.

image

If we look at the data pack we see the targets are correctly allocated:

image

image

The affected code is here:

datapackr/R/rePackSNUxIM.R

Lines 92 to 120 in 9b34ce5

imbalancedDistribution <- d$data$distributedMER %>%
  tidyr::drop_na(value, distribution) %>%
  dplyr::select(-Age, -distribution, -mechanism_code) %>%
  dplyr::group_by_at(dplyr::vars(dplyr::everything(), -value)) %>%
  dplyr::summarize(value = round(sum(value), digits = 5)) %>%
  dplyr::ungroup() %>%
  dplyr::group_by_at(dplyr::vars(dplyr::everything(), -SNUxIM_value)) %>%
  dplyr::summarize(SNUxIM_value = round(sum(SNUxIM_value), digits = 5)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(value != SNUxIM_value)

if (NROW(imbalancedDistribution) > 0) {
  d$tests$imbalancedDistribution <- imbalancedDistribution
  imbalancedDistribution_inds <- imbalancedDistribution %>%
    dplyr::select(indicator_code) %>%
    dplyr::distinct() %>%
    dplyr::arrange(indicator_code) %>%
    dplyr::pull(indicator_code)
  warning_msg <-
    paste0(
      "WARNING!: ",
      NROW(imbalancedDistribution),
      " cases where distributed total is either more or less than total Target.",
      " To identify these, go to your SNU x IM tab and filter the Rollup column for Pink cells.",
      " This has affected the following indicators -> \n\t* ",
      paste(imbalancedDistribution_inds, collapse = "\n\t* "),
      "\n")

This seems like a problem in the group-by/aggregation of the data.

Add check for row 5 to colStructure checks

To detect cases where users have added rows above row 5, causing problems, or where row 6 is not the beginning of the data.

col_check <- schema %>%
  dplyr::filter(sheet_name == sheet
                & !(sheet == "SNU x IM" & indicator_code == "Mechanism1")) %>%
  dplyr::select(indicator_code, template_order = col) %>%
  dplyr::full_join(submission_cols, by = c("indicator_code" = "indicator_code")) %>%
  dplyr::mutate(order_check = template_order == submission_order)

False Positives for decimal values in unPackSheet

This line of code does not reliably detect non-integers:

dplyr::filter(value %% 1 != 0)

As an example for this data pack: https://www.pepfar.net/Project-Pages/collab-38/Shared%20Documents/Data%20Pack%202019%20Staging%20Area/Support%20Files/datapacks/71_DataPack_Uganda_20190124160453_03082019.xlsx,

integer values are flagged as decimals on the PMTCT_STAT_ART tab in the PMTCT_STAT.D.Age/Sex.20T column. However, inspecting the Excel version of the data pack does not reveal any non-integer numbers. There appears to be some floating point error introduced when readxl::read_excel initially reads the sheet.
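A tolerance-based check avoids flagging floating point noise introduced on read. A base R sketch with a hypothetical `is_decimal` helper and an assumed epsilon of 1e-9:

```r
# Treat values within a tiny epsilon of an integer as integers, so floating
# point error from file import does not trigger false positives.
is_decimal <- function(value, eps = 1e-9) {
  abs(value - round(value)) > eps
}

is_decimal(80 + 1e-14)  # floating point noise, not a real decimal
is_decimal(80.5)        # genuinely non-integer
```

In the pipeline, `dplyr::filter(value %% 1 != 0)` would become `dplyr::filter(is_decimal(value))`.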

image

Error message on SNUxIM not distributed when target <.5

For this data pack: https://www.pepfar.net/Project-Pages/collab-38/Shared%20Documents/Data%20Pack%202019%20Staging%20Area/Support%20Files/datapacks/DataPack_Malawi_03182019.xlsx

I am receiving this blocking error:

13 :  ERROR!: 1 cases where no distribution was attempted for Targets. To identify these, go to your SNU x IM tab and filter the Rollup column for Pink cells. This has affected the following indicators -> 
	* PMTCT_STAT.N.Age/Sex/KnownNewResult.20T.NewPos

Investigating, I find that the source of the error is a target < 0.5 that is rounded to 0 on the SNUxIM tab, so no distribution against this target was made.

image

image

This is the code that produces the error.

undistributed <- d$data$distributedMER %>%
  dplyr::filter(!is.na(value) & is.na(distribution))

if (NROW(undistributed) > 0) {
  d$tests$undistributed <- undistributed
  undistributed_inds <- undistributed %>%
    dplyr::select(indicator_code) %>%
    dplyr::distinct() %>%
    dplyr::arrange(indicator_code) %>%
    dplyr::pull(indicator_code)
  warning_msg <-
    paste0(
      "ERROR!: ",
      NROW(undistributed),
      " cases where no distribution was attempted for Targets.",
      " To identify these, go to your SNU x IM tab and filter the Rollup column for Pink cells.",
      " This has affected the following indicators -> \n\t* ",
      paste(undistributed_inds, collapse = "\n\t* "),
      "\n")
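A possible mitigation, sketched in base R under the assumption that targets which round to zero on the SNU x IM tab cannot meaningfully be distributed and so should be excluded from the blocking check rather than flagged:

```r
# Hypothetical pre-filter: drop targets that round to zero before running
# the undistributed-targets test (0.37 mirrors the case above).
targets <- c(0.37, 2, 5)
distributable <- targets[round(targets) > 0]
distributable
```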

Site Tool: Cannot validate regional Data Packs

It is currently not possible to validate the West Africa Regional Data Pack, due to the lack of a UID.

> d<-unPackSiteToolData("/home/jason/consultancy/DATIM/Site Tool_West-Central Africa Region_20190410085106.15Apr2019.GLMSBT.xlsx")
[1] "Checking the file exists..."
[1] "Checking the OU name and UID on HOME tab..."
Error in if (d$info$datapack_name != datapack_name | d$info$datapack_uid !=  :                                                     
  argument is of length zero

That is the error I receive.

Problem seems to be here.

I would rather not fix a hack with another hack. For West Africa, can't we just use the UID which is in DATIM?
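Whatever the eventual fix, a defensive guard would at least turn the cryptic length-zero error into an actionable message. A base R sketch with a hypothetical `checkUID` helper:

```r
# Fail with a clear message when the HOME tab yields no UID, instead of
# letting if() crash with "argument is of length zero".
checkUID <- function(datapack_uid) {
  if (length(datapack_uid) == 0 || is.na(datapack_uid) || datapack_uid == "") {
    stop("No OU UID found on the HOME tab; regional tools need a valid UID in DATIM.")
  }
  invisible(TRUE)
}

checkUID("abc12345678")  # passes silently
```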

Improve performance of adornMechanisms

There are significant performance issues when calling adornMechanisms: each time the function is applied, an API request must be made to the DATIM server, which is fairly slow. This does not happen if a support file is present, which is simply an RDS file containing the API view.

With the deployment of the app on the new Connect servers, we need a slightly different mechanism to store this file. This function will be refactored slightly to:

  1. Load the cached file if it is present and less than a day old.
  2. Attempt to retrieve the file and cache it locally if the cached copy is stale or not present.
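The two steps above can be sketched as a small cache-with-TTL helper. The function name, cache location, and `fetch` callback are all assumptions for illustration, not the refactored implementation:

```r
# Return cached mechanisms if the RDS file is fresh; otherwise fetch from
# the API (via the supplied callback) and refresh the cache.
getCachedMechs <- function(cache_path, max_age_days = 1, fetch) {
  fresh <- file.exists(cache_path) &&
    difftime(Sys.time(), file.mtime(cache_path), units = "days") < max_age_days
  if (fresh) {
    return(readRDS(cache_path))
  }
  mechs <- fetch()           # e.g. the DATIM API call
  saveRDS(mechs, cache_path)
  mechs
}
```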

Trying datapackr, not working

Scott et al.,

I ran this after what seemed a successful install, and after a restart (so I don't have the whole history of the install), and it dropped me out early with an error.

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> require(devtools)
Loading required package: devtools
> require(datapackr)
Loading required package: datapackr
> d <- unPackData()
[1] "Checking the file exists..."
[1] "Checking the OU name and UID on HOME tab..."
Error: expected <

Here is link to latest datapack which I was trying to check:
https://www.pepfar.net/ou/vietnam/HQ%20Collaboration/COP%202019%20%E2%80%93%20FY%202020/Original%20Submission%20of%20Required%20Tools%20(Feb%2021)/DataPack_Vietnam%2020190225%2018h00.xlsb

Year 2 sheet bug on check

Running datapackr (master) I run into a bug with the newly introduced Year 2 tab. The tab does not match the structure of the other tabs (no PSNU information), which results in an error when mapping through the data import. Not sure if this is part of your PR @jason-p-pickering in the parse-year2 branch.

image

Flawed logic in dedupe resolution.

There was an issue in the following lines of code:

dplyr::group_by(PSNU,psnuid,indicator_code,Age,Sex,KeyPop,support_type) %>% 
    dplyr::summarize(distribution = sum(distribution)) %>% 
    dplyr::mutate(distribution_diff = abs(distribution - 1.0)) %>% 
    dplyr::filter(distribution_diff >= 1e-3 & distribution != 1.0) %>% 

So, since the data was being grouped by support_type and then summed... well, it was just wrong: a sloppy copy and paste from the pure dedupe section.

The correct way to identify dedupes is to calculate the count of components (DSD/DSD or TA/TA) for pure duplication, and for crosswalks, to determine whether there is any DSD/TA allocation for the same data element disagg. There is no need at the identification phase to worry about what the allocation is. It's better just to count how many potential data element/disaggs overlap, and then filter for the 100% allocations.
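A toy base R sketch of the counting approach described above (data and column names are made up; the real identification would group on the full data element/disagg key):

```r
# Count potential overlaps per data element/disagg before worrying about
# allocation percentages. PSNU "a" has two mechanisms spanning DSD and TA,
# so it is a crosswalk candidate; "b" has a single mechanism.
df <- data.frame(
  psnuid         = c("a", "a", "b"),
  support_type   = c("DSD", "TA", "DSD"),
  mechanism_code = c("m1", "m2", "m3")
)

overlaps <- table(df$psnuid)
names(overlaps[overlaps > 1])  # candidates for dedupe/crosswalk review
```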

Implement encryption of support files

The writePSNUxIM function requires the path to the SNUxIM model file. This works fine for local installs where a file path is known, but does not work well for server installations/apps, where there is no intrinsic ability to control the location of the file on the server. We have previously been unable to store the model file as part of the source code due to security concerns.

The basic approach is to use a single symmetric key, which we can then store on the server and retrieve as an environment variable.

# Create a random string of 32 characters
k <- stringi::stri_rand_strings(1, 32)
k
#> [1] "JiVsc14Ob9L7FClK6OVTxvAfHW9U7XZS"

# Convert this to a sodium key
key <- cyphr::key_sodium(charToRaw(k))

# Read the data to be encrypted
foo <- readRDS("PSNUxIM_20200319.rds")

# Save as an encrypted file
cyphr::encrypt(saveRDS(foo, "foo.encrypted"), key)

# This does not work
readRDS("foo.encrypted")
#> Error in readRDS("foo.encrypted") : unknown input format

# This does work
cyphr::decrypt(readRDS("foo.encrypted"), key)

The encrypted file cannot be read without the key, and can thus be securely stored as part of the source code in GitHub (as long as the key itself is kept secret).

This approach should alleviate the issues we have with being unable to store support files, such as the model file, as part of the source code itself, which is needed to deploy the app to the server without intrinsic knowledge of where the file will be stored.

Thoughts @sam-bao @jacksonsj ?

Bug/bad error message in unpacking site tool data

if ( any( has_positive_dedupe ) ) {

While validating a South Africa site tool, the validation app states:
image

Running the code from the terminal gives:

Error in if (any(has_positive_dedupe)) { :
  missing value where TRUE/FALSE needed

I determined there were rows in the PrEP tab of the site tool with blank mechanism codes (the very last rows of the table, to be exact). Once these cells were populated, the validation worked.
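A minimal reproduction of the crash, plus one defensive fix: blank mechanism codes produce NA in `has_positive_dedupe`, `any()` then returns NA, and `if (NA)` aborts. Whether NAs should be ignored or validated away first is a design decision; the `na.rm` approach below is just a sketch:

```r
# NA from blank mechanism codes propagates through any() and crashes if().
has_positive_dedupe <- c(FALSE, NA, FALSE)

# if (any(has_positive_dedupe)) ...
# Error: missing value where TRUE/FALSE needed

# Ignoring NAs (or equivalently wrapping in isTRUE()) avoids the crash:
any(has_positive_dedupe, na.rm = TRUE)
```

A better error message would flag the blank mechanism codes to the user before this check runs.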

Bug if >1 column contains decimal values

Receiving this error:

Error in d[["tests"]][["decimal_cols"]][[as.character(sheet)]] <- decimal_cols : 
  more elements supplied than there are to replace

while trying to parse this data pack:

https://www.pepfar.net/Project-Pages/collab-38/Shared%20Documents/Data%20Pack%202019%20Staging%20Area/Support%20Files/datapacks/DataPack_Namibia_20190314_1200.xlsx

The code I used to parse the data pack:

country_uids <- c("FFVkaV9Zk1S")

submission_path <- "###"
## Note that submission_path is optional in this setup. If not supplied, a console window will pop up to allow you to pick the file.

d <- datapackr::unPackTool(submission_path = submission_path,
                tool = "Data Pack",
                country_uids = country_uids)

Problem is related to this line of code:

d[["tests"]][["decimal_cols"]][[as.character(sheet)]] <- decimal_cols

It is not obvious to me what should go into d[["tests"]][["decimal_cols"]][[as.character(sheet)]], but it seems that this works:

d[["tests"]][["decimal_cols"]][[as.character(sheet)]] <- list(decimal_cols)

This same issue may be repeated in other pieces of code, such as:

 d[["tests"]][["non_numeric"]][[as.character(sheet)]] <- non_numeric
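A plausible minimal reproduction, assuming the test slot starts life as an atomic vector rather than a list: `[[<-` on an atomic vector requires a length-1 value, which fails as soon as more than one column contains decimals, while wrapping the value in list() coerces the slot to a list and succeeds.

```r
decimal_cols <- c("col_a", "col_b")  # >1 decimal column triggers the bug

# Assigning a length-2 vector into one element of an atomic vector fails:
tests_slot <- character(0)
result <- tryCatch({
  tests_slot[["PMTCT_STAT_ART"]] <- decimal_cols
  "no error"
}, error = function(e) conditionMessage(e))
result  # "more elements supplied than there are to replace"

# Wrapping in list() coerces the slot to a list, matching the workaround:
tests_slot <- character(0)
tests_slot[["PMTCT_STAT_ART"]] <- list(decimal_cols)
is.list(tests_slot)
```

This would explain why the error only appears when more than one column has decimal values.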
