
data-pack-importer's Introduction

Jekyll Now

Jekyll is a static site generator that's perfect for GitHub hosted blogs (Jekyll Repository)

Jekyll Now makes it easier to create your Jekyll blog by eliminating a lot of the up-front setup.

  • You don't need to touch the command line
  • You don't need to install/configure ruby, rvm/rbenv, ruby gems ☺️
  • You don't need to install runtime dependencies like Markdown processors, Pygments, etc
  • If you're on Windows, this will make setting up Jekyll a lot easier
  • It's easy to try out, you can just delete your forked repository if you don't like it

In a few minutes you'll be set up with a minimal, responsive blog like the one below, giving you more time to spend on writing epic blog posts!

[screenshot: Jekyll Now theme]

Quick Start

Step 1) Fork Jekyll Now to your User Repository

Fork this repo, then rename the repository to yourgithubusername.github.io.

Your Jekyll blog will often be viewable immediately at http://yourgithubusername.github.io (if it's not, you can often force it to build by completing step 2)

[screenshot: Step 1]

Step 2) Customize and view your site

Enter your site name, description, avatar and many other options by editing the _config.yml file. You can easily turn on Google Analytics tracking, Disqus commenting and social icons here too.

Making a change to _config.yml (or any file in your repository) will force GitHub Pages to rebuild your site with Jekyll. Your rebuilt site will be viewable a few seconds later at http://yourgithubusername.github.io; if not, give it ten minutes as GitHub suggests and it'll appear soon.

There are 3 different ways that you can make changes to your blog's files:

  1. Edit files within your new username.github.io repository in the browser at GitHub.com (shown below).
  2. Use a third-party GitHub content editor, like Prose by Development Seed. It's optimized for use with Jekyll, making Markdown editing, writing drafts, and uploading images really easy.
  3. Clone down your repository and make updates locally, then push them to your GitHub repository.

[screenshot: _config.yml]

Step 3) Publish your first blog post

Edit /_posts/2014-3-3-Hello-World.md to publish your first blog post. This Markdown Cheatsheet might come in handy.

[screenshot: first post]

You can add additional posts in the browser on GitHub.com too! Hit the + icon in /_posts/ to create new content. Just make sure to include the front-matter block at the top of each new blog post and name the file in this format: year-month-day-title.md
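For reference, a minimal front-matter block looks like this (a sketch; Jekyll Now's posts are assumed to use the post layout):

---
layout: post
title: My First Post
---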

Local Development

  1. Install Jekyll and plug-ins in one fell swoop: gem install github-pages. This mirrors the plug-ins used by GitHub Pages on your local machine, including Jekyll, Sass, etc.
  2. Clone down your fork: git clone git@github.com:yourusername/yourusername.github.io.git
  3. Serve the site and watch for markup/sass changes: jekyll serve
  4. View your website at http://0.0.0.0:4000
  5. Commit any changes and push everything to the master branch of your GitHub user repository. GitHub Pages will then rebuild and serve your website.

Moar!

I've created a more detailed walkthrough, Build A Blog With Jekyll And GitHub Pages, over at the Smashing Magazine website. Check it out if you'd like more depth and some background on Jekyll. 🤘

It covers:

  • A more detailed walkthrough of setting up your Jekyll blog
  • Common issues that you might encounter while using Jekyll
  • Importing from WordPress, using your own domain name, and blogging in your favorite editor
  • Theming in Jekyll, with Liquid templating examples
  • A quick look at Jekyll 2.0’s new features, including Sass/CoffeeScript support and Collections

Jekyll Now Features

✓ Command-line free fork-first workflow, using GitHub.com to create, customize and post to your blog
✓ Fully responsive and mobile optimized base theme (Theme Demo)
✓ Sass/CoffeeScript support using Jekyll 2.0
✓ Free hosting on your GitHub Pages user site
✓ Markdown blogging
✓ Syntax highlighting
✓ Disqus commenting
✓ Google Analytics integration
✓ SVG social icons for your footer
✓ 3 HTTP requests, including your avatar

✘ No installing dependencies
✘ No need to set up local development
✘ No configuring plugins
✘ No need to spend time on theming
✘ More time to code other things ... wait ✓!

Questions?

Open an Issue and let's chat!

Other forkable themes

You can use the Quick Start workflow with other themes that are set up to be forked too! Here are some of my favorites:

Credits

Contributing

Issues and Pull Requests are greatly appreciated. If you've never contributed to an open source project before I'm more than happy to walk you through how to create a pull request.

You can start by opening an issue describing the problem that you're looking to resolve and we'll go from there.

I want to keep Jekyll Now as minimal as possible. Every line of code should be one that's useful to 90% of the people using it. Please bear that in mind when submitting feature requests. If it's not something that most people will use, it probably won't get merged. 💂‍♂️

data-pack-importer's People

Contributors

achafetz, jacksonsj, jason-p-pickering, sam-bao

Stargazers

1 stargazer

Watchers

8 watchers

data-pack-importer's Issues

Identified variation in Disagg Tool structure

For Zimbabwe and DRC, fields in "Already Alloc Targets" are as follows:

kp_prev_f_pwid_fy19 D_kp_prev_fsw_fy19 kp_prev_m_pwid_fy19 D_kp_prev_msm_not_sw_fy19 kp_prev_msm_sw_fy19 kp_prev_prison kp_prev_tg_notsw kp_prev_tg_sw kp_mat_f kp_mat_m D_ovc_hivstat_fy19 D_pmtct_art_already_fy19 D_pmtct_art_new_fy19 D_pmtct_eid_u2mo_fy19 D_pmtct_eid_o2mo_fy19

For other countries and in data-pack-importer schema (as provided in "final" template from @achafetz), fields in same tab are as follows:

kp_prev_f_pwid kp_prev_fsw kp_prev_m_pwid kp_prev_msm_notsw kp_prev_msm_sw kp_prev_prison kp_prev_tg_notsw kp_prev_tg_sw kp_mat_f kp_mat_m ovc_hivstat D_pmtct_art_already_fy19 D_pmtct_art_new_fy19 D_pmtct_eid_u2mo_fy19 D_pmtct_eid_o2mo_fy19

This causes a validation error for files using the first version, as it is out of sync with version 2 as captured in the data-pack-importer schemas.

Number formatting

Asking for a friend

To whom it may concern:

Please fix number formatting -> "#,##0;-#,##0;;"

Love,
A concerned citizen

[screenshot omitted]

Adding in Validation Columns?

One feature currently missing from the disagg tool is a validation check to ensure the disagg allocations sum to 100% for each grouping. Would it be possible for me to add one column per disagg grouping in each tab, prior to the data pack target columns?

For example, TX_CURR would have a validation check for <15 and 15+ (the gold headed columns in the image below).

[screenshot omitted]

prepare_export_to_datim dropping dataElements

@jason-p-pickering - for some reason the prepare_export_to_datim function is dropping select dataElements when I attempt to produce PSNU import files.

Example: Namibia, PMTCT_STAT (D, DSD, Age/Sex) (l7M8cLykXri) getting dropped in import file.

Screenshot of these values in Disagg Tool:

[screenshot omitted]

When I run:
DataPackR(disagg_tool, distribution_method = distribution_year, support_files_path = support_files)

I don't see these in the resulting CSV file (l7M8cLykXri should appear between KobylmT2gvc and M5m4zQMsdX5):

[screenshot omitted]

However, when I break prepare_export_to_datim apart and run the code this way:

# Parse the Disagg Tool workbook
d <- ImportSheets(wb_path,
                  distribution_method = distribution_method,
                  support_files_path = support_files_path)

# Pick the output prefix; cluster-level data are redistributed to PSNU
# before a PSNU import file is produced
if (d$wb_info$wb_type %in% c("HTS", "NORMAL")) {
  file_prefix <- "/psnu_import_"
  d <- distributeCluster(d)
} else if (d$wb_info$wb_type %in% c("HTS_SITE", "NORMAL_SITE")) {
  file_prefix <- "/site_import_"
}

# Round values, drop anything below 1, and keep only the DATIM import columns
# (note: value is already character by the time it is filtered, so the
# comparison against 1 coerces and compares lexicographically)
export_data <- d$data %>%
  dplyr::mutate(value = as.character(round_trunc(as.numeric(value)))) %>%
  dplyr::filter(!(value < 1)) %>%
  dplyr::select(
    dataelement,
    period,
    orgunit,
    categoryoptioncombo,
    attributeoptioncombo,
    value
  ) %>%
  na.omit()

# Build a timestamped output path next to the source workbook
output_file_path <- paste0(
  dirname(d$wb_info$wb_path),
  file_prefix,
  d$wb_info$wb_type,
  "_",
  d$wb_info$ou_name,
  "_",
  format(Sys.time(), "%Y%m%d%H%M%S"),
  ".csv"
)

utils::write.table(
  export_data,
  file = output_file_path,
  quote = TRUE,
  sep = ",",
  row.names = FALSE,
  col.names = TRUE
)

I see those values in the export (appearing between KobylmT2gvc and M5m4zQMsdX5):

[screenshot omitted]

Can't figure out a cause.

This seems to be happening for many dataElements for many OUs and is raising numerous flags in datim validations.

Change in tabs needed to be imported from Data Pack

Just uploaded a new version of the map between Data Pack Codes and DATIM DE.CoC UIDs. This version specifies which document (DisaggTool or HTS DisaggTool) and in which tabs each data pack code is found.

Uniquely selecting DataPackFilename and DataPackTabName will give a list of which tabs to keep within each tool.
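For example, something like this should produce that list (a sketch; de_coc_map is a hypothetical name for the loaded map):

dplyr::distinct(de_coc_map, DataPackFilename, DataPackTabName)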

Note the following tabs that are outliers in the DisaggTool:

Already Alloc Targets

  • KP_MAT
  • KP_PREV
  • OVC_HIVSTAT
  • PMTCT_ART (denominator)
  • PMTCT_EID (denominator)

IMPATT Table

  • plhiv_fy19

Add method to export PSNU and site level dataset

ImportSheets will parse the disagg tool.

For non-clustered countries, we can export the data directly to JSON/CSV.

For clustered countries, we should either

  1. Distribute to site, reaggregate back up to PSNU, and then export
    or
  2. Can we aggregate the percentage allocations by PSNU first and then export?

@jacksonsj thoughts?

Address bug in how ValidateSheet is reading fields from schemas$schema

Seeing an issue with how ValidateSheet is reading field names from main_schema and hts_schema.

Hypothesis: using sheet_name as a function variable, where this is also explicitly named as a field in schemas$schema, causes list.find to always return the field names for the first sheet in each workbook, rather than for each sheet as looped over in ValidateSheets.

https://github.com/jason-p-pickering/data-pack-importer/blob/prod/R/data-pack-importer.R#L11
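A minimal illustration of the suspected pitfall (hypothetical data, not the package's actual schema):

library(rlist)

schemas <- list(
  list(sheet_name = "KP_MAT", fields = c("psnu", "kp_mat_f")),
  list(sheet_name = "TX_CURR", fields = c("psnu", "tx_curr_u15"))
)

get_fields <- function(sheet_name) {
  # list.find evaluates the condition inside each list element, so
  # `sheet_name` on BOTH sides resolves to the element's own field:
  # the condition is always TRUE and the first sheet always wins.
  rlist::list.find(schemas, sheet_name == sheet_name, n = 1)
}

get_fields("TX_CURR")  # returns KP_MAT's entry, not TX_CURR's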

Checksum on support files

Undiscovered issues may come up if the importer is not run with the most recent support files.

  • Create a checksum of each support file
  • Proceed if it matches the checksum recorded in this repo
  • Bail out if it doesn't, with the message "update your support files"
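A minimal sketch of that check; the expected_md5 argument is a hypothetical stand-in for checksums recorded in this repo:

check_support_file <- function(path, expected_md5) {
  actual <- unname(tools::md5sum(path))
  if (!identical(actual, expected_md5)) {
    stop("Support file ", basename(path),
         " does not match the expected checksum. Update your support files.")
  }
  invisible(TRUE)
}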

Unknown warnings in Uganda's HTS tool

warnings()
Warning messages:
1: In read_fun(path = path, sheet = sheet, limits = limits, ... :
Expecting numeric in AT7 / R7C46: got a date
2: In read_fun(path = path, sheet = sheet, limits = limits, ... :
Expecting numeric in AU7 / R7C47: got a date
3: In read_fun(path = path, sheet = sheet, limits = limits, ... :
Expecting numeric in AV7 / R7C48: got a date
4: In read_fun(path = path, sheet = sheet, limits = limits, ... :
Expecting numeric in AW7 / R7C49: got a date

Difference in Centrally Supported reference between Disagg Tool and data-pack-importer::impatt$options$dp_code

The code below creates the impatt option set used to translate Data Pack codes to values as seen in the IMPATT data set, using "CtrlSupported" with no space.

"dp_code": "CtrlSupported"

Disagg Tool lists this as "Ctrl Supported" with a space.
[screenshot omitted]

This is preventing ImportSheet from correctly importing Centrally Supported data, and raising a warning about coercion of non-numeric values into NAs here:

has_negative_numbers <- as.numeric(df$value) < 0
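One possible fix, pending agreement on which side should change (a sketch): normalize whitespace on the Disagg Tool side before matching against dp_code.

normalize_dp_code <- function(x) gsub("[[:space:]]+", "", x)
normalize_dp_code("Ctrl Supported")  # "CtrlSupported"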

Handling of very small values from the Disagg Tool

I ran into an issue with UG's file related to this. Small positive values produced site-level values between 0 and 0.5. These were rounded to 0 and dropped in transition between Disagg Tool and Site Level Tool. The result was immediate differences between Disagg Tool values and Site Level roll-ups. Current process makes it difficult for users to identify where this has happened.

Possible Solutions:

  1. Keep as is. Make this clear in guidance.
  2. Push rounding operations to right before exporting to the site level tool (i.e., move them from the distributeSite function to export_site_level_tool or write_site_level_sheet). Then, do not drop values between 0 and 0.5 where these were produced by our automatic distribution process (it is ok to drop small positive values inherited directly from the Disagg Tool). Write these to the site level tool as 0s to indicate where this has occurred; see the sketch below.

So far, option 1 has been fine, but I felt the need to flag this and ask for feedback.
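A sketch of what option 2 could look like, assuming a hypothetical distributed flag marks rows produced by the automatic distribution:

library(dplyr)

site_values <- data.frame(
  value = c(12.7, 0.3, 0.2),
  distributed = c(FALSE, TRUE, FALSE)  # hypothetical flag for auto-distributed rows
) %>%
  mutate(value = round(value)) %>%
  # keep zeros only where our distribution created them, so reviewers can
  # see where rounding occurred; drop small values inherited from the tool
  filter(value != 0 | distributed)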

Remove references to local files

@jacksonsj In trying to merge your code from the COP18 repo into this repo, I found a number of references to files which are not available.

As an example

read.csv(file = paste0(GoogleDrive,
                       "/Facility Distribution Script/Distribution Source Files/COP18DistributionSource_20180123.csv"),
         stringsAsFactors = FALSE, header = TRUE)

All of these types of references need to be removed. I can help to refactor all of this, but could you make all of these files available (here if they are not sensitive, or on SharePoint otherwise)?

Check for invalid mechanisms

In some data packs, people are using mechanisms like "TBD". We should check for the presence of invalid mechanisms and issue a warning if they exist.
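A sketch of such a check, assuming valid mechanism codes are purely numeric (df$mechanism is a hypothetical column name):

df <- data.frame(mechanism = c("13717", "TBD", "9183"))
invalid <- unique(df$mechanism[!grepl("^[0-9]+$", df$mechanism)])
if (length(invalid) > 0) {
  warning("Invalid mechanisms found: ", paste(invalid, collapse = ", "))
}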

Burundi data pack hierarchy change and re-run

ICPI/COP#92

Jasmine:
Request received from SI Advisor (Yesenia)
Approved by ICPI
Assigned to Aaron and Pooja

Request
Based on this morning's discussion, Burundi will need to make a change to our DATIM hierarchy. We need to combine/cluster Bujumbura Mairie and Bujumbura Rural into Bujumbura. Also, we are proposing (with Chair support) to reactivate Gitega, which already exists in DATIM and the Data Pack; just noting as an FYI in case of consequence.

Apart from the aforementioned, no other changes.

Please let me know what is required, timelines and next steps for the update.

I will also swing by the ICPI helpdesk again for follow up.

Aaron:
Sent the cluster form to Y.Saulino to complete, and then I can regenerate the Data Pack (it will be done off FV Q4v3_2; current is v2_1). The Data Pack should be a quick run (<20 min) once the file is received; the DTs will take longer since I have to access the server and move files over (on this internet with poor bandwidth).

@jason-p-pickering @jacksonsj @davidhuser FYI

Changes/Change Requests to (Normal) Disagg Tool

I went through the Disagg Tool this afternoon to align it with the Target entry screens. I've detailed some proposed changes below (plus a couple I already made, where minor issues made them necessary to protect the tool's integrity). I'd like your okay on making these structural changes.
COP18DisaggToolTemplate v2018.01.29.zip

  • HTS_SELF
    • potential issue: target entry screens show the fine disaggs, compared with the new "finerer" ones reflected in the MER 2.0 guidance and Disagg Tool
  • PMTCT
    • fixed #REF formula error in the Age/Sex disaggs
    • can I rename this sheet PMTCT_STAT rather than just PMTCT?
    • can I remove a middle column for a Sex disagg that is no longer in the entry screen?
  • TB_ART
    • can I remove 2 allocation columns for ART HIVStatus, as they are no longer being used/collected?
    • can I remove last column for HIVStatus, no longer collected?
  • TB_PREV
    • removed group header (row 3) in four columns (allocation and target for D and N); grouping all of TherapyType together for Denom and Num
    • can I remove two middle columns for HIVStatus, no longer collected?
  • TX_CURR
    • can I remove last column for HIVStatus, no longer collected?
  • TX_NEW
    • can I remove a middle column for HIVStatus, no longer collected?
  • TX_PVLS
    • added DENOMINATOR to all header labels in row 2
    • issue with allocation variable names row 6, replaced with correct ones
    • can I remove a middle column for HIVStatus, no longer collected?
  • TX_RET
    • can I remove two middle columns for HIVStatus, no longer collected?
  • TX_TB
    • added DENOMINATOR to all header labels in row 2
    • can I remove two middle columns for HIVStatus, no longer collected?
  • VMMC
    • can I remove two middle columns for Sex, no longer collected?

BWA --> 3 new clusters

J.Roffenbender added 3 clusters for Botswana. New Data Pack was run yesterday to include them. Plan on rerunning the Disagg Tools for them today.

Additions in the table below with newly assigned PSNU cluster UIDs:

operatingunit | psnu | psnuuid | currentsnuprioritization | cluster_psnu | cluster_snu1 | cluster_psnuuid | cluster_currentsnuprioritization
Botswana | Serowe | YmeyoDakwFX | 4 - Sustained | SerowePalapye Cluster | [Clustered] | eIMbzhLw0CH | [NOT DEFINED]
Botswana | Palapye | TcLMNlsgbcz | [NOT DEFINED] | SerowePalapye Cluster | [Clustered] | eIMbzhLw0CH | [NOT DEFINED]
Botswana | Charleshill | v3v0gu2Gtol | [NOT DEFINED] | CharleshillGhanzi Cluster | [Clustered] | xy7foRJ2vp9 | [NOT DEFINED]
Botswana | Ghanzi | saOJtBcENzM | 6 - Sustained: Commodities | CharleshillGhanzi Cluster | [Clustered] | xy7foRJ2vp9 | [NOT DEFINED]
Botswana | Moshupa | xLLFT8x1CF5 | [NOT DEFINED] | MoshupaSouthern Cluster | [Clustered] | LRwB4ApIz9u | [NOT DEFINED]
Botswana | Southern | LEUjALXInGD | 1 - Scale-Up: Saturation | MoshupaSouthern Cluster | [Clustered] | LRwB4ApIz9u | [NOT DEFINED]

Import of follow-on mechs

@jacksonsj Not sure how to handle the follow-on mechs. I would guess you need that information from the parser. My initial thought was to produce a data structure like

{ data : [...],
  follow_on_mechs: [...] }

which would contain both objects. The data object would be just a normal data frame as we have discussed, and the follow-on mechs would contain some map which would be parsed from that sheet. Is there any need to worry about this?

Create function to export negative values

  # Parse the PSNU data
  psnu_values <- ImportSheets(disagg_tool,
                              distribution_method = distribution_year,
                              support_files_path = support_files)

This would provide the PSNU data. Create a pipeline to remap all of the UIDs to human-readable forms and create an export file to send back to the OU.
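A sketch of such a pipeline, assuming hypothetical lookup tables de_map (dataelement UID to name) and ou_map (orgunit UID to name):

library(dplyr)

negatives <- psnu_values$data %>%
  filter(as.numeric(value) < 0) %>%
  left_join(de_map, by = "dataelement") %>%  # human-readable data elements
  left_join(ou_map, by = "orgunit")          # human-readable org units

write.csv(negatives, "negative_values_for_ou_review.csv", row.names = FALSE)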

Issue with Cameroon's data pack

@jacksonsj it looks like the structure of the distrClusterFY17.rda file has been changed to include a uidlevel3 field. Has this been altered in your COP18 repo and the new version of the file which is on SharePoint?

Problem is here

dplyr::left_join(Pcts[Pcts$uidlevel3 == ou_uid, ], by = c("whereWhoWhatHuh")) %>% on line 82 of perform_distribution.R

Remove sysdata.rda

We need to be able to regenerate the schemas more modularly. Remove the sysdata.rda file and replace with standard package datasets instead.
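One way to do that (a sketch; assumes the schema objects have already been rebuilt in the current session):

# Store schemas as regular exported datasets under data/ rather than in
# the single opaque R/sysdata.rda blob, so each can be regenerated alone
usethis::use_data(main_schema, hts_schema, overwrite = TRUE)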

Military Prioritizations being read as NAs --> throwing off negative values check

I received this error when validating Uganda's non-HTS Disagg Tool:

[screenshot omitted]

After investigation, I found this:

[screenshot omitted]

"Mil" is not defined as an option in "./data-raw/impatt_options_set.json" so is being left in the value column as a string.

When the below code is run:

has_negative_numbers <- as.numeric(df$value) < 0
if (any(has_negative_numbers)) {
  foo <- df[has_negative_numbers, ]
  warning("Negative values were found in the data!")
  print(foo)
}

"Mil" prioritization becomes NA, causing the any function to yield NA, explaining the error above

Problems initializing follow-on mech mapping

Uganda was the first OU to leverage this feature and I found several issues with how it's currently being staged.

In the UG example, three mechanisms (13717, 9183, and 17978) are merging to become 17978. Initially, follow-on mech mapping was designed to accommodate one mechanism being replaced entirely by a completely new mechanism. In UG's case, all three previous mechanisms -- one of which was the new mechanism -- already had complete sets of percent distributions within each PSNU/dataelement/categoryoptioncombo combination. This resulted in dramatic inflation of values when they were combined.

E.g.,

13717 = (1/3 + 1/3 + 1/3), 9183 = (1/2 + 1/2), 17978 = (1/4 + 1/2 + 1/4)

If 13717 and 9183 both become 17978, the current process produces:
17978 = (1/3 + 1/3 + 1/3 + 1/2 + 1/2 + 1/4 + 1/2 + 1/4)

This triples intended values.

Options for resolving:

  1. Drop follow-on mech feature entirely. Country Teams will need to manually remap in Site Level Tool.
  2. Retain values in the cluster and site distribution support files. After receiving the Disagg Tool, remap follow-on mechs in the support files, and THEN compute percentages within these new groups (sketched below).

Another note: Currently, follow-on mech mapping only occurs for non-HTS indicators, because the follow-on mech table only appears in this file and not in the HTS Disagg Tool. Follow-on mech mapping should apply to both HTS and non-HTS indicators.

This is a very fragile part of the process currently with many downstream impacts.
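A sketch of option 2, assuming hypothetical column names in the support file (psnu, dataelement, categoryoptioncombo, mechanism, value) and a follow-on map with old_mech/new_mech columns:

library(dplyr)

support <- support %>%
  left_join(follow_on_map, by = c("mechanism" = "old_mech")) %>%
  mutate(mechanism = coalesce(new_mech, mechanism)) %>%   # remap FIRST
  select(-new_mech) %>%
  group_by(psnu, dataelement, categoryoptioncombo, mechanism) %>%
  summarise(value = sum(value), .groups = "drop") %>%
  # ...THEN recompute percentages, so they sum to 1 within each combination
  group_by(psnu, dataelement, categoryoptioncombo) %>%
  mutate(pct = value / sum(value)) %>%
  ungroup()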

For clustered OUs, datimvalidation::validateData output may be confusing

When datimvalidation::validateData is fed a PSNU import file for a Clustered OU, it outputs a log showing validation rule violations at the PSNU level, made possible because of operations in distributeCluster which moves data from cluster to PSNU level in producing the PSNU import file.

This makes it difficult for reviewers (e.g., SI Advisors) to identify where in their Disagg Tools (at the Cluster level) to correct when looking at violations expressed at PSNU level in validateData's output.

For clustered OUs, could we run validateData as part of datapackimporter::ImportSheets, prior to data being distributed from cluster to PSNU, so that these validation checks happen earlier?

This change would require validateData to be in sync with cluster mappings.

Filter out values -0.5 < x <0.5

These appear in Disagg Tools as -0 or 0, and this issue is tripping up many Country Teams. Given that these are approved as they're seen in the Disagg Tool (as zeros), and our team has been instructed to drop all zero values, filter them out along with the zeros.

There is debate over whether to cut off at 0.4, 0.45, or 0.49; Excel visually rounds anything <0.5 and >-0.5 to 0. See below.

[screenshot omitted]
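If Excel's display behavior is the standard, the filter is a one-liner (a sketch):

# drop anything Excel would display as 0, i.e. -0.5 < x < 0.5
df <- df[abs(as.numeric(df$value)) >= 0.5, ]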

Certain numerators are not appearing

Certain numerators like D_tb_art_fy19 are present in the disagg tool, but are not being distributed into the SLRT.

I suspect this has to do with the data element mapping.

Investigating now.

Thoughts @jacksonsj?
