Comments (19)

ddooley commented on July 17, 2024

I did that already!
I will also change the code to handle the "#validated using ... " message better. We need a way to control, for each template or export format, whether it is added and under what column label. I'll make it a kind of field data type, so people can change its column name.

from dataharmonizer.

ddooley commented on July 17, 2024

At the moment the test data doesn't validate on the new Sequencing Centre selections. But I have all the field changes working, except that I'm waiting for the new comments field for the version.

ddooley commented on July 17, 2024

Technical note: I’m thinking of switching file export output for ".csv" and ".txt" files to UTF-16 with no BOM. It looks like the easiest way to do that is to have “txt” as the booktype rather than “csv” (while keeping the delimiters appropriate for csv and tsv).

cmrn-rhi commented on July 17, 2024

We can put a hold on this until after the CanCOGeN meeting on Friday 2020-Oct-16; they may be able to resolve some of these issues on their end.

I will follow-up afterwards.

cmrn-rhi commented on July 17, 2024

Updates - we need to implement the following changes:

  1. Include a "Sequencing Centre" field.
  2. Change "Primary Specimen Identification Number" to "Primary Specimen ID".
  3. Remove the "#validated using data harmonizer version 0.13.1" etc. output from the header row. Lev is going to send me a field name that they use for additional comments that we can output that information into for each observation.

cmrn-rhi commented on July 17, 2024
  1. @cmrn-rhi will add a "Sequencing Centre" picklist to the "DataHarmonizer Templates" document that will be used to populate the new field.

ddooley commented on July 17, 2024

Ah, though I see you mention "picklist", so you're setting up some values for the field. It's a select field then.

cmrn-rhi commented on July 17, 2024

Yes! I just made it a select field and added the list (from the CNPHI template) to the DataHarmonizer templates (rows 880-898 at this time).

Update: CNPHI's list doesn't include the null values that ours does, so I did a manual check (via order review) and it accepted all of them. It seems that it just requires the field to be filled and doesn't restrict what it is filled with.

ddooley commented on July 17, 2024

I've set it up so that on any template we can specify a field of datatype = provenance, which at the moment receives a "DataHarmonizer vX.Y.Z" value in the first row of that column's data, but only if you press "Validate". So we can talk about what kind of functionality tweaks we should have for this, i.e. do we

  • want the DataHarmonizer version to be included any time someone saves the file, or just updated when someone presses Validate?
  • want the version info to be written to every row of output data?

One can actually put other content in the provenance field. The version will always be inserted before that content.
The specification is shown in the CanCoGen Covid19 tab, on line 120.
This is demonstrated on the multi-template DataHarmonizer website.

ddooley commented on July 17, 2024

Resolved that version info is written to each row of output data.
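
A minimal sketch of what per-row provenance stamping could look like, not the actual DataHarmonizer code. The function name `stampProvenance`, the `"; "` separator, and the assumption that version strings contain only digits and dots are all hypothetical; the only behaviour taken from the thread is that the version goes into every row and is inserted before any other content in the provenance field.

```javascript
// Hypothetical helper: write "DataHarmonizer vX.Y.Z" into the provenance
// column of every row, ahead of whatever content the cell already holds.
function stampProvenance(rows, provenanceColumn, version) {
  const stamp = `DataHarmonizer v${version}`;
  // Assumption: an earlier stamp looks like "DataHarmonizer v1.2.3" and is
  // dropped first, so repeated validation does not pile up version strings.
  const oldStamp = /^DataHarmonizer v[\d.]+;?\s*/;
  return rows.map((row) => {
    const copy = row.slice();
    const cell = copy[provenanceColumn] == null ? "" : String(copy[provenanceColumn]);
    const existing = cell.trim().replace(oldStamp, "");
    // Assumption: "; " separates the version from user-entered content.
    copy[provenanceColumn] = existing ? `${stamp}; ${existing}` : stamp;
    return copy;
  });
}

const stamped = stampProvenance([["s1", ""], ["s2", "rerun"]], 1, "0.13.1");
const restamped = stampProvenance(stamped, 1, "0.13.2");
```

The input rows are copied rather than modified in place, so the grid data stays untouched if the export is cancelled.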

cmrn-rhi commented on July 17, 2024

I'm having an issue with Primary Specimen ID now, as there appear to be some leading characters that make it unrecognizable to CNPHI. I can see these characters when the file is viewed with UTF-7 encoding.

image

I double checked previous DH -> CNPHI export files and these characters did not appear in them.

cmrn-rhi commented on July 17, 2024

leading characters = 

ddooley commented on July 17, 2024

Can you send me the test file containing this? Is it in the header row? I'm not seeing it in a saved Laser dataset.

cmrn-rhi commented on July 17, 2024

Sent via Slack.

I'm also seeing those leading characters in files I saved from DataHarmonizer yesterday. In some previous DH1 testing I did (October 6th, 2020), neither the saved files nor the CNPHI export had these leading characters in them.

ddooley commented on July 17, 2024

OK, so the current JavaScript-generated export file includes a BOM header, which is often added to UTF-8 files to indicate that encoding. It's unpleasant that CNPHI objects to its presence on import, because these same characters enable Excel to recognize a comma-delimited file as such. I'm checking whether those characters can be safely dropped or whether we have to tailor something just for the CNPHI export.
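
For reference, the UTF-8 BOM is the three bytes EF BB BF at the very start of the file. A minimal sketch of how one might check an export for it and drop it, using the standard `TextEncoder`/`TextDecoder` APIs; the helper names `hasUtf8Bom` and `stripUtf8Bom` and the sample row are hypothetical, not taken from the DataHarmonizer code.

```javascript
// UTF-8 byte order mark: bytes EF BB BF at the start of a file.
const UTF8_BOM = [0xef, 0xbb, 0xbf];

// Hypothetical helper: true if the byte array starts with the UTF-8 BOM.
function hasUtf8Bom(bytes) {
  return UTF8_BOM.every((b, i) => bytes[i] === b);
}

// Hypothetical helper: return a view of the bytes without a leading BOM.
function stripUtf8Bom(bytes) {
  return hasUtf8Bom(bytes) ? bytes.subarray(UTF8_BOM.length) : bytes;
}

// Made-up sample export: a BOM followed by one CSV line.
const payload = new TextEncoder().encode("specimen_id,ABC-1\n");
const withBom = new Uint8Array([...UTF8_BOM, ...payload]);

const clean = stripUtf8Bom(withBom);
const text = new TextDecoder("utf-8").decode(clean);
```

A reader that does not expect the BOM treats those three bytes as part of the first cell, which would explain why only the first field of the file is affected.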

ddooley commented on July 17, 2024

The solution was to create a special .csv output format encoded as UTF-8 but without the BOM. Awaiting a test on whether that works with character sets going into CNPHI. Alternatively, we could force the output format to ASCII.
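
A minimal sketch of an export routine where the BOM is optional, so one format can keep it for Excel and another can omit it for CNPHI. This is an illustration under assumptions, not the project's export code: `escapeCell`, `encodeDelimited`, and the `bom` and `delimiter` options are hypothetical names, and the quoting follows the usual CSV convention of doubling embedded quotes.

```javascript
// Hypothetical helper: quote a cell only when it contains the delimiter,
// a double quote, or a line break; embedded quotes are doubled.
function escapeCell(value, delimiter) {
  const s = String(value);
  const needsQuotes =
    s.includes(delimiter) || s.includes('"') || s.includes("\n") || s.includes("\r");
  return needsQuotes ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Hypothetical helper: serialize rows to UTF-8 bytes, adding the BOM
// (EF BB BF) only when the caller asks for it.
function encodeDelimited(rows, { delimiter = ",", bom = false } = {}) {
  const text =
    rows.map((row) => row.map((c) => escapeCell(c, delimiter)).join(delimiter)).join("\r\n") +
    "\r\n";
  const body = new TextEncoder().encode(text);
  if (!bom) return body;
  const out = new Uint8Array(body.length + 3);
  out.set([0xef, 0xbb, 0xbf], 0);
  out.set(body, 3);
  return out;
}

const plain = encodeDelimited([["a", "b"]]);
const excelFriendly = encodeDelimited([["a", "b"]], { bom: true });
const tsv = encodeDelimited([["a", "b"]], { delimiter: "\t" });
```

Keeping the BOM as a per-format option avoids changing the default save behaviour for users who open the files in Excel.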

cmrn-rhi commented on July 17, 2024

CNPHI now accepts the import; however, they don't appear to accept symbols in UTF-8 (at least not the greater-than-or-equal-to sign or the degree sign).

CNPHI Upload Error:
image

Have emailed NML/CNPHI liaison outlining these issues and asked:

  1. Does CNPHI require ASCII text?
  2. Are there other acceptable characters for mathematical logic?
  3. Would it be possible for CNPHI to accept the BOM?

ddooley commented on July 17, 2024

This might be fixed now with a .csv (ASCII) format save. Awaiting testing.
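
A minimal sketch of how an ASCII-only save could flag or replace the problem symbols before upload. The thread does not say how the .csv (ASCII) format treats them, so everything here is an assumption: the helper names `findNonAscii` and `toAscii`, the replacement table, and the `"?"` fallback are hypothetical, and CNPHI has not confirmed any of these substitutions.

```javascript
// Hypothetical helper: list every non-ASCII character in a table of cells,
// with its row and column index, so a user can review them before export.
function findNonAscii(rows) {
  const hits = [];
  rows.forEach((row, r) => {
    row.forEach((cell, c) => {
      for (const ch of String(cell)) {
        if (ch.codePointAt(0) > 0x7f) hits.push({ row: r, column: c, char: ch });
      }
    });
  });
  return hits;
}

// Assumed replacements, for illustration only.
const ASCII_FALLBACKS = new Map([
  ["\u2265", ">="], // greater-than-or-equal-to sign
  ["\u2264", "<="], // less-than-or-equal-to sign
  ["\u00b0", "deg"], // degree sign
]);

// Hypothetical helper: replace known symbols, and anything else outside
// ASCII with "?", so the result contains only 7-bit characters.
function toAscii(text) {
  return Array.from(String(text), (ch) => {
    if (ch.codePointAt(0) <= 0x7f) return ch;
    return ASCII_FALLBACKS.has(ch) ? ASCII_FALLBACKS.get(ch) : "?";
  }).join("");
}

const hits = findNonAscii([["ok", "\u226538"]]);
const converted = toAscii("\u226538\u00b0C");
```

Reporting the offending cells first, rather than converting silently, lets the data owner decide whether a substitution changes the meaning of a value.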

cmrn-rhi commented on July 17, 2024

The .csv (ASCII) format has been tested and there have been no further issues.
