Comments (19)
I did that already!
I will also change the code to handle the "#validated using ... " message better. We need a way to control, for each template or export format, whether it is added, and under what column label. I'll make it a kind of field data type, so people can change its column name.
from dataharmonizer.
At the moment the test data doesn't validate against the new Sequencing Centre selections. All the other field changes are working; I'm still waiting on the new comments field for the version information.
from dataharmonizer.
Technical note: I’m thinking of switching file export output for ".csv" and ".txt" files to UTF-16 with no BOM. Looks like the easiest way to do that is have “txt” as the booktype rather than “csv” (while keeping the delimiters appropriate for csv and tsv).
from dataharmonizer.
We can put a hold on this until after the CanCOGeN meeting on Friday 2020-Oct-16; they may be able to resolve some of these issues on their end.
I will follow up afterwards.
from dataharmonizer.
Updates - we need to implement the following changes:
- Include a "Sequencing Centre" field.
- Change "Primary Specimen Identification Number" to "Primary Specimen ID".
- Remove the "#validated using data harmonizer version 0.13.1" etc. output from the header row. Lev is going to send me the name of a field they use for additional comments, so that we can write that information there for each observation.
from dataharmonizer.
- @cmrn-rhi will add a "Sequencing Centre" picklist to the "DataHarmonizer Templates" document that will be used to populate the new field.
from dataharmonizer.
Ah, though I see you mention "picklist", so you are setting up some values for the field. It's a select field then.
from dataharmonizer.
Yes! I just made it a select field and added the list (from the CNPHI template) to the DataHarmonizer templates (rows 880-898 at this time).
Update: CNPHI doesn't include the null values that we do on their list, so I did a manual check (via order review) and it accepted all of them. Seems that it just requires the field be filled and doesn't have restrictions on what it is filled with.
from dataharmonizer.
I've set it up so that on any template, we can specify a field of datatype = provenance, which at the moment receives a "DataHarmonizer vX.Y.Z" value in the first row of that column's data - but only if you press "Validate". So we can talk about what kind of functionality tweaks we should have for this, i.e. do we
- want dataharmonizer version to be included anytime someone saves the file, or just updated when someone presses Validate?
- want the version info to be written to every row of output data?
One can actually put other content in the provenance field. Version will always be inserted before that content.
The specification is shown in the CanCoGen Covid19 tab, on line 120.
This is demonstrated on the multi-template DataHarmonizer website.
from dataharmonizer.
Resolved that version info is written to each row of output data.
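For illustration only, the resolved behaviour (version written to every row, inserted before any existing provenance content) could be sketched as below. This is not DataHarmonizer's actual code: the function name `add_provenance`, the `;` separator, and the sample column name are all assumptions made for the example.

```python
import re

# Hypothetical helper: prepend a version tag to the provenance column of every row.
# The ";" separator and the tag pattern are assumptions, not DataHarmonizer's format.
def add_provenance(rows, column, version):
    tag = f"DataHarmonizer v{version}"
    stale_tag = re.compile(r"^DataHarmonizer v\d+\.\d+\.\d+;?\s*")
    updated = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is left untouched
        existing = (new_row.get(column) or "").strip()
        # Drop a tag left by an earlier save so versions do not pile up.
        existing = stale_tag.sub("", existing)
        # The version always comes first, followed by any other content.
        new_row[column] = f"{tag};{existing}" if existing else tag
        updated.append(new_row)
    return updated


rows = [
    {"specimen": "A", "comments": ""},
    {"specimen": "B", "comments": "re-run"},
]
tagged = add_provenance(rows, "comments", "0.13.1")
```

Stripping a stale tag before prepending keeps the operation repeatable, so it would not matter whether the tag is refreshed on every save or only on "Validate".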
from dataharmonizer.
Having an issue with Primary Specimen ID now, as there appear to be some leading characters that make it unrecognizable to CNPHI. I can see these characters when I view the file with UTF-7 encoding.
I double checked previous DH -> CNPHI export files and these characters did not appear in them.
from dataharmonizer.
leading characters = 
from dataharmonizer.
Can you send me the test file containing this? Is it in the header row? I'm not seeing it in a saved Laser dataset.
from dataharmonizer.
Sent via SLACK.
I'm also seeing those leading characters in files I saved from DataHarmonizer yesterday. In some previous DH1 testing I did (October 6th, 2020), neither the saved files nor the CNPHI export had these leading characters in them.
from dataharmonizer.
Ok, so the current JavaScript-generated export file starts with a BOM (byte order mark), which is often included with UTF-8 to indicate that the file is encoded in UTF-8. It is unfortunate that CNPHI objects to its presence on import, because these same characters enable Excel to recognize a comma-delimited file as such. I'm checking that out and determining whether those characters can be safely dropped or whether we have to tailor something just for the CNPHI export.
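As a quick way to check a saved file for this, here is a minimal Python sketch (not part of DataHarmonizer; the helper names and the sample header text are made up for the example). It relies on the fact that the UTF-8 BOM is the three bytes EF BB BF, which show up as stray leading characters when the file is read with a single-byte encoding.

```python
import codecs

# Hypothetical helpers for inspecting an export file's raw bytes.
def has_utf8_bom(data: bytes) -> bool:
    # codecs.BOM_UTF8 is b"\xef\xbb\xbf"
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


# "utf-8-sig" prepends the BOM on encoding; plain "utf-8" does not.
with_bom = "Primary Specimen ID,Sequencing Centre\n".encode("utf-8-sig")
without_bom = strip_utf8_bom(with_bom)

# Read as Latin-1, the three BOM bytes become three visible characters
# glued to the front of the first column name.
leading = with_bom[:3].decode("latin-1")
```

An importer that compares the first header cell literally would see those extra characters as part of the column name, which is consistent with the first field not being recognized.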
from dataharmonizer.
The solution was to create a special .csv output format encoded as UTF-8 but without the BOM. Awaiting a test on whether that works with character sets going into CNPHI. Alternatively, we force the output format to ASCII.
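The difference between the two output formats can be sketched in Python as follows. This only illustrates the encoding choice and is not the export code used by DataHarmonizer; `to_csv_bytes` and the sample values are hypothetical.

```python
import csv
import io

# Hypothetical helper: serialize rows to CSV bytes, with or without a UTF-8 BOM.
def to_csv_bytes(header, rows, bom=False):
    buffer = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    # "utf-8-sig" writes the BOM (helps Excel); "utf-8" omits it.
    encoding = "utf-8-sig" if bom else "utf-8"
    return buffer.getvalue().encode(encoding)


header = ["Primary Specimen ID", "Sequencing Centre"]
rows = [["ABC-123", "Example Lab"]]

plain = to_csv_bytes(header, rows)             # BOM-free variant
excel = to_csv_bytes(header, rows, bom=True)   # BOM-prefixed variant
```

The two outputs differ only in the three leading bytes, so offering both as separate export formats keeps the Excel-friendly file available while giving BOM-sensitive importers a clean one.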
from dataharmonizer.
CNPHI now accepts the import; however, they don't appear to accept symbols in UTF-8 (at least not the greater-than-or-equal-to and degree symbols).
Have emailed NML/CNPHI liaison outlining these issues and asked:
- Does CNPHI require ASCII text?
- Are there other acceptable characters for mathematical logic?
- Would it be possible for CNPHI to accept the BOM?
from dataharmonizer.
This might be fixed now with a .csv (ASCII) format save. Awaiting testing.
from dataharmonizer.
.csv (ASCII) format has been tested and there have been no further issues.
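For readers wondering what an ASCII-only export implies for symbols such as ≥ and °, here is one possible approach, sketched in Python. The thread does not say how the actual .csv (ASCII) format treats these characters, so the substitution table and the `to_ascii` helper below are purely assumptions.

```python
# Hypothetical substitution table; the real export may handle symbols differently.
ASCII_SUBSTITUTES = {
    "≥": ">=",
    "≤": "<=",
    "°": " deg ",
}

def to_ascii(text, substitutes=ASCII_SUBSTITUTES):
    # Replace known symbols with readable ASCII spellings first.
    for symbol, replacement in substitutes.items():
        text = text.replace(symbol, replacement)
    # Anything still outside ASCII becomes "?" rather than raising an error.
    return text.encode("ascii", errors="replace").decode("ascii")


converted = to_ascii("≥ 37°C")
```

Substituting before encoding keeps the meaning of the common symbols, while the `errors="replace"` fallback guarantees the output is valid ASCII even for characters the table does not cover.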
from dataharmonizer.