CNPHI Export Header Issues - sequencing centre, primary specimen id, #version (closed)

cidgoh commented on July 17, 2024

CNPHI Export Header Issues - sequencing centre, primary specimen id, #version

from dataharmonizer.

Comments (19)

ddooley commented on July 17, 2024

I did that already!
I will also change the code to handle the "#validated using ..." message better. We need a way to control, for each template or export format, whether it is added and under what column label. I'll make it a kind of field data type, so people can change its column name.

ddooley commented on July 17, 2024

At the moment the test data doesn't validate against the new Sequencing Centre selections. All the field changes are working, though; I'm just waiting for the new comments field for the version.

ddooley commented on July 17, 2024

Technical note: I’m thinking of switching file export output for ".csv" and ".txt" files to UTF-16 with no BOM. Looks like the easiest way to do that is have “txt” as the booktype rather than “csv” (while keeping the delimiters appropriate for csv and tsv).

cmrn-rhi commented on July 17, 2024

We can put a hold on this until after the CanCOGeN meeting on Friday 2020-Oct-16; they may be able to resolve some of these issues on their end.

I will follow up afterwards.

cmrn-rhi commented on July 17, 2024

Updates - we need to implement the following changes:

Include a "Sequencing Centre" field.
Change "Primary Specimen Identification Number" to "Primary Specimen ID".
Remove the "#validated using data harmonizer version 0.13.1" etc. output from the header row. Lev is going to send me a field name that they use for additional comments that we can output that information into for each observation.

cmrn-rhi commented on July 17, 2024

@cmrn-rhi will add a "Sequencing Centre" picklist to the "DataHarmonizer Templates" document that will be used to populate the new field.

ddooley commented on July 17, 2024

Ah, though I see you mention "picklist", so you're setting up some values for the field. It's a select field then.

cmrn-rhi commented on July 17, 2024

Yes! I just made it a select field and added the list (from the CNPHI template) to the DataHarmonizer templates (rows 880-898 at this time).

Update: CNPHI doesn't include the null values that we do on their list, so I did a manual check (via order review) and it accepted all of them. Seems that it just requires the field be filled and doesn't have restrictions on what it is filled with.

ddooley commented on July 17, 2024

I've set it up so that on any template we can specify a field of datatype = provenance, which at the moment receives a "DataHarmonizer vX.Y.Z" value in the first row of that column's data, but only if you press "Validate". So we can talk about what kind of functionality tweaks we should have for this, i.e. do we:

want the DataHarmonizer version to be included any time someone saves the file, or just updated when someone presses Validate?
want the version info to be written to every row of output data?
One can actually put other content in the provenance field. Version will always be inserted before that content.
The specification is shown in the CanCoGen Covid19 tab, on line 120.
This is demonstrated on the multi-template DataHarmonizer website.

ddooley commented on July 17, 2024

Resolved that version info is written to each row of output data.
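A minimal sketch of that behaviour, assuming rows are plain objects keyed by column label. The helper name `stampProvenance`, the `;` separator, and the row shape are hypothetical and are not taken from the DataHarmonizer code base. It writes the version string to every row and keeps any other provenance content after it:

```javascript
// Hypothetical helper, not DataHarmonizer's actual implementation.
// rows:    array of objects keyed by column label
// column:  label of the field whose datatype is "provenance"
// version: e.g. "0.13.1"
function stampProvenance(rows, column, version) {
  const stamp = `DataHarmonizer v${version}`;
  return rows.map((row) => {
    const existing = String(row[column] ?? "").trim();
    // Remove a stamp left by an earlier run so it is not duplicated.
    const rest = existing.replace(/^DataHarmonizer v[^;\s]+;?\s*/, "");
    // The version always comes first; other content follows the separator.
    return { ...row, [column]: rest ? `${stamp};${rest}` : stamp };
  });
}
```

Because the old stamp is removed before the new one is added, calling the helper on every save or on every "Validate" gives the same result, so either trigger discussed above would work with it.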

cmrn-rhi commented on July 17, 2024

I'm having an issue with primary specimen id now, as there appear to be some leading characters that make it unrecognizable to CNPHI. I can see these characters when viewing the file with UTF-7 encoding.

I double checked previous DH -> CNPHI export files and these characters did not appear in them.

cmrn-rhi commented on July 17, 2024

leading characters = ï»¿

ddooley commented on July 17, 2024

Can you send me the test file containing this? Is it in the header row? I'm not seeing it in a saved Laser dataset.

cmrn-rhi commented on July 17, 2024

Sent via Slack.

I'm also seeing those leading characters in files I saved from DataHarmonizer yesterday. I looked at some previous DH1 testing I did (October 6th, 2020), and neither the saved files nor the CNPHI export had these leading characters in them.

ddooley commented on July 17, 2024

Ok, so the current JavaScript-generated export file starts with a BOM (byte order mark), which is often included to indicate that the file is encoded in UTF-8. It's unfortunate that CNPHI objects to the presence of this on import, because these same characters enable Excel to recognize a comma delimited file as such. I'm just checking that out and determining whether those characters can be safely dropped or if we have to tailor something just for the CNPHI export.
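For reference, the UTF-8 BOM is the three bytes EF BB BF, and those bytes read as single-byte Latin-1 characters are exactly the "ï»¿" reported above. A small sketch for checking a file's bytes; the function names are illustrative and not part of DataHarmonizer:

```javascript
// The three bytes of the UTF-8 byte order mark.
const UTF8_BOM = [0xef, 0xbb, 0xbf];

// True when a byte array (Uint8Array or Node Buffer) starts with the BOM.
function hasUtf8Bom(bytes) {
  return bytes.length >= 3 && UTF8_BOM.every((b, i) => bytes[i] === b);
}

// Returns a view of the bytes without a leading BOM, if one is present.
function stripUtf8Bom(bytes) {
  return hasUtf8Bom(bytes) ? bytes.subarray(3) : bytes;
}

// Reading each BOM byte as its own Latin-1 character gives the
// "leading characters" seen in the export file.
const bomAsLatin1 = String.fromCharCode(...UTF8_BOM);
```

Running `hasUtf8Bom` on the first bytes of an older export and a current one would confirm whether the BOM is the only difference in the header row.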

ddooley commented on July 17, 2024

The solution was to create a special .csv output format encoded as UTF-8 but without the BOM. Awaiting a test of whether that works with the character sets going into CNPHI. Alternatively, we force the output format to ASCII.
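As an illustration of the idea only (this is not the export code itself), a CSV encoder can make the BOM an explicit option, so a CNPHI-specific format leaves it out while an Excel-oriented format keeps it. The function name `encodeCsv` and its `bom` option are hypothetical:

```javascript
// Hypothetical sketch: encode rows (arrays of cell values) as UTF-8 CSV
// bytes, with the byte order mark added only on request.
function encodeCsv(rows, { bom = false } = {}) {
  // Quote a cell when it contains a quote, comma, or line break.
  const quote = (value) => {
    const s = String(value ?? "");
    return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const text = rows.map((row) => row.map(quote).join(",")).join("\r\n") + "\r\n";
  const body = new TextEncoder().encode(text); // TextEncoder never adds a BOM
  if (!bom) return body;
  const out = new Uint8Array(body.length + 3);
  out.set([0xef, 0xbb, 0xbf], 0); // UTF-8 BOM
  out.set(body, 3);
  return out;
}
```

With this shape, the default export could call `encodeCsv(rows, { bom: true })` and the CNPHI export `encodeCsv(rows)`; the data bytes are identical in both cases.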

cmrn-rhi commented on July 17, 2024

CNPHI now accepts the import; however, they don't appear to accept symbols in UTF-8 (at least not the greater-than-or-equal-to and degree symbols).

CNPHI Upload Error:

Have emailed NML/CNPHI liaison outlining these issues and asked:

Does CNPHI require ASCII text?
Are there other acceptable characters for mathematical logic?
Would it be possible for CNPHI to accept the BOM?

ddooley commented on July 17, 2024

This might be fixed now with a .csv (ASCII) format save. Awaiting testing.
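How the ASCII save treats non-ASCII symbols isn't described here. One possible approach, shown purely as a sketch, is to substitute known symbols and fall back to a placeholder for everything else. The replacement table, the `?` fallback, and both function names are assumptions made for illustration:

```javascript
// Hypothetical replacement table covering only the two symbols named in
// this thread; the chosen ASCII spellings are assumptions.
const ASCII_REPLACEMENTS = {
  "\u2265": ">=", // greater than or equal to
  "\u00b0": "deg", // degree sign
};

// Convert text to 7-bit ASCII: keep ASCII characters, substitute known
// symbols, and replace anything else with the fallback.
function toAscii(text, replacements = ASCII_REPLACEMENTS, fallback = "?") {
  let out = "";
  for (const ch of text) { // iterates by code point, not UTF-16 unit
    out += ch.codePointAt(0) < 0x80 ? ch : replacements[ch] ?? fallback;
  }
  return out;
}

// List the distinct non-ASCII characters in a string, e.g. to warn the
// user before export instead of silently substituting.
function findNonAscii(text) {
  return [...new Set([...text].filter((ch) => ch.codePointAt(0) > 0x7f))];
}
```

Reporting the output of `findNonAscii` for each affected cell may be preferable to the silent fallback, since a `?` in exported data loses information.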

cmrn-rhi commented on July 17, 2024

.csv (ASCII) format has been tested and there have been no further issues.
