Towards a reporting standard for NLP-assisted observational research
This work is an example implementation of Daniel G Smith's proposed reporting standard, in which he notes:
We therefore encourage the use of reporting standards, heavily influenced by the work of the Hongfang lab and their collaborators (Fu et al., 2023; Wang et al., 2022), in an effort to support FAIR and RITE principles in methods promoting evidence generation using datasets containing NLP.
PoC study details: https://github.com/OHDSI/NLPTools/wiki/NOTE-NLP-Proposal-POC
- Detailed Cohort Definition (Atlas)
{
  "ConceptSets": [
    {
      "id": 0,
      "name": "Simple brain",
      "expression": {
        "items": []
      }
    },
    {
      "id": 1,
      "name": "Simple brainer",
      "expression": {
        "items": [
          {
            "concept": {
              "CONCEPT_ID": 443588,
              "CONCEPT_NAME": "Malignant neoplasm of brain",
              "STANDARD_CONCEPT": "S",
              "STANDARD_CONCEPT_CAPTION": "Standard",
              "INVALID_REASON": "V",
              "INVALID_REASON_CAPTION": "Valid",
              "CONCEPT_CODE": "428061005",
              "DOMAIN_ID": "Condition",
              "VOCABULARY_ID": "SNOMED",
              "CONCEPT_CLASS_ID": "Clinical Finding"
            },
            "includeDescendants": true,
            "includeMapped": true
          }
        ]
      }
    }
  ],
  "PrimaryCriteria": {
    "CriteriaList": [
      {
        "ConditionOccurrence": {
          "CodesetId": 1,
          "OccurrenceStartDate": {
            "Value": "2010-01-01",
            "Extent": "2020-01-01",
            "Op": "bt"
          }
        }
      }
    ],
    "ObservationWindow": {
      "PriorDays": 0,
      "PostDays": 0
    },
    "PrimaryCriteriaLimit": {
      "Type": "First"
    }
  },
  "QualifiedLimit": {
    "Type": "First"
  },
  "ExpressionLimit": {
    "Type": "First"
  },
  "InclusionRules": [],
  "CensoringCriteria": [],
  "CollapseSettings": {
    "CollapseType": "ERA",
    "EraPad": 0
  },
  "CensorWindow": {},
  "cdmVersionRange": ">=5.0.0"
}
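To make the entry logic of this definition explicit when reporting it, the sketch below reads the primary criteria from an abbreviated copy of the JSON above and applies them to a few toy condition records. It is an illustration only, not how Atlas/Circe actually generates cohort SQL. It assumes that descendant and mapped-concept expansion has already been done (so codeset 1 is just concept 443588), that the "bt" date range is inclusive on both ends, and that observation periods need not be checked. The person IDs, dates, and the second concept ID are invented.

```python
import json
from datetime import date

# Abbreviated copy of the cohort definition above, keeping only the fields read here.
COHORT_JSON = """
{
  "ConceptSets": [
    {"id": 1, "name": "Simple brainer",
     "expression": {"items": [{"concept": {"CONCEPT_ID": 443588},
                               "includeDescendants": true}]}}
  ],
  "PrimaryCriteria": {
    "CriteriaList": [
      {"ConditionOccurrence": {
        "CodesetId": 1,
        "OccurrenceStartDate": {"Value": "2010-01-01", "Extent": "2020-01-01", "Op": "bt"}}}
    ],
    "PrimaryCriteriaLimit": {"Type": "First"}
  }
}
"""


def entry_events(definition, condition_rows):
    """Return {person_id: entry_date} using the first qualifying condition per person.

    Simplifications (assumptions, not full Atlas/Circe behavior): descendant and
    mapped-concept expansion is skipped, the "bt" range is treated as inclusive
    on both ends, and observation periods are not checked.
    """
    codesets = {
        cs["id"]: {item["concept"]["CONCEPT_ID"] for item in cs["expression"]["items"]}
        for cs in definition["ConceptSets"]
    }
    criterion = definition["PrimaryCriteria"]["CriteriaList"][0]["ConditionOccurrence"]
    window = criterion["OccurrenceStartDate"]
    if window["Op"] != "bt":
        raise ValueError("this sketch only handles the 'bt' (between) operator")
    concepts = codesets[criterion["CodesetId"]]
    lower = date.fromisoformat(window["Value"])
    upper = date.fromisoformat(window["Extent"])

    entries = {}
    for person_id, concept_id, start_date in condition_rows:
        if concept_id in concepts and lower <= start_date <= upper:
            # PrimaryCriteriaLimit "First": keep only the earliest qualifying event.
            if person_id not in entries or start_date < entries[person_id]:
                entries[person_id] = start_date
    return entries


# Toy (person_id, condition_concept_id, condition_start_date) rows; all invented.
rows = [
    (1, 443588, date(2012, 5, 1)),
    (1, 443588, date(2011, 3, 9)),  # earlier event, so this becomes person 1's entry
    (2, 443588, date(2021, 1, 1)),  # after the 2010-01-01..2020-01-01 window
    (3, 123456, date(2015, 7, 4)),  # hypothetical concept ID outside codeset 1
]
print(entry_events(json.loads(COHORT_JSON), rows))  # {1: datetime.date(2011, 3, 9)}
```

Writing the entry rule as explicit, checkable logic like this makes a reader's assumptions about inclusivity and "First" limits visible. Those are the assumptions a reporting standard should ask authors to state.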
- Methods for creating cohort
- Generation in Atlas on the Tufts Medical Center TRDW "TMC_RED" database
- Evaluation of individuals in cohort
- NA (CohortDiagnostics results?)
- Validity of cohort screening methods reported
- NA
- Definitions of context
- Synonym normalization (e.g., UMLS, ICD-O-3, NAACCR, CAP)
- Definitions of rules, patterns, and logic
- References and code access for any published part of the pipeline
- Generic text processors and statistical models
- Primary NLP model used (see the note_nlp.nlp_system field definition and guidance)
- Measurement level (e.g., concept, document, patient)
- Duration of NLP evaluation
- Refinement procedures based on error analysis
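Several of the items above (synonym normalization, context definitions, rules, nlp_system, measurement level) are easier to report once the pipeline's output shape is pinned down. The sketch below is a hypothetical dictionary-plus-rule tagger, not the PoC's actual pipeline. The synonym strings, the negation cues, the 30-character negation window (a crude rule loosely inspired by NegEx/ConText, not either algorithm), and the nlp_system string are all invented for illustration. It emits rows that use a subset of the OMOP CDM NOTE_NLP column names (note_id, snippet, offset, lexical_variant, note_nlp_concept_id, nlp_system, term_exists). It then rolls the concept-level mentions up to document level and patient level.

```python
import re
from dataclasses import dataclass

# Hypothetical synonym dictionary: lexical variant -> standard concept ID.
# 443588 ("Malignant neoplasm of brain") comes from the cohort concept set;
# the variant strings are illustrative, not a validated lexicon.
SYNONYMS = {
    "malignant neoplasm of brain": 443588,
    "malignant brain tumor": 443588,
    "brain cancer": 443588,
}
# Hypothetical context rule: a negation cue within N characters before a mention.
NEGATION_CUES = ("no", "denies", "negative for", "without")
NEGATION_WINDOW_CHARS = 30
NLP_SYSTEM = "example-dictionary-tagger 0.1"  # hypothetical nlp_system value


@dataclass
class NoteNlpRow:
    """Subset of OMOP CDM NOTE_NLP columns populated by this sketch."""
    note_id: int
    snippet: str
    offset: int
    lexical_variant: str
    note_nlp_concept_id: int
    nlp_system: str
    term_exists: str  # "Y" if asserted, "N" if negated by the context rule


def extract(note_id, text):
    """Concept-level extraction: one row per dictionary match in the note."""
    lowered = text.lower()  # assumes ASCII text so offsets stay aligned with `text`
    rows = []
    for variant, concept_id in SYNONYMS.items():
        for match in re.finditer(r"\b" + re.escape(variant) + r"\b", lowered):
            start, end = match.span()
            preceding = lowered[max(0, start - NEGATION_WINDOW_CHARS):start]
            negated = any(re.search(r"\b" + re.escape(cue) + r"\b", preceding)
                          for cue in NEGATION_CUES)
            rows.append(NoteNlpRow(
                note_id=note_id,
                snippet=text[max(0, start - 20):end + 20],
                offset=start,
                lexical_variant=text[start:end],
                note_nlp_concept_id=concept_id,
                nlp_system=NLP_SYSTEM,
                term_exists="N" if negated else "Y",
            ))
    return rows


def document_level(rows):
    """A (note, concept) pair is positive if any mention in the note is asserted."""
    return {(r.note_id, r.note_nlp_concept_id) for r in rows if r.term_exists == "Y"}


def patient_level(doc_hits, note_to_person):
    """A (person, concept) pair is positive if any of the person's notes is positive."""
    return {(note_to_person[note_id], concept_id) for note_id, concept_id in doc_hits}


# Invented example notes; not data from the PoC.
notes = {
    101: "History of brain cancer, s/p resection.",
    102: "MRI negative for malignant brain tumor.",
}
note_to_person = {101: 1, 102: 2}
rows = [row for note_id, text in notes.items() for row in extract(note_id, text)]
docs = document_level(rows)
print(sorted(docs))                                 # [(101, 443588)]
print(sorted(patient_level(docs, note_to_person)))  # [(1, 443588)]
```

The roll-up rules shown here ("any asserted mention" at document level, "any positive note" at patient level) are one possible choice among several. Whichever rule a study uses is exactly what the measurement-level item asks authors to state.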
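For the error-analysis and refinement item, NLP output is commonly compared against a manually annotated gold standard at the declared measurement level. The same comparison can support reporting cohort validity once a reviewed sample exists. The following is a minimal sketch, assuming both sides have already been reduced to sets of keys at the same level. Returning 0.0 for empty sets is an arbitrary convention chosen here. The keys in the example are invented and are not results from this PoC.

```python
def evaluate(predicted, gold):
    """Compare system output with a gold standard at a single measurement level.

    `predicted` and `gold` are sets of hashable keys at the same level, e.g.
    (note_id, concept_id) for document level or (person_id, concept_id) for
    patient level. Specificity is omitted because it needs the full set of
    reviewed negatives, which these two sets alone do not provide.
    """
    true_pos = predicted & gold
    false_pos = predicted - gold
    false_neg = gold - predicted
    precision = len(true_pos) / len(predicted) if predicted else 0.0  # PPV
    recall = len(true_pos) / len(gold) if gold else 0.0  # sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # These lists are the input to error analysis and rule refinement.
        "false_positives": sorted(false_pos),
        "false_negatives": sorted(false_neg),
    }


# Hypothetical document-level keys for illustration only (not PoC results).
predicted = {(101, 443588), (103, 443588), (104, 443588)}
gold = {(101, 443588), (104, 443588), (105, 443588)}
report = evaluate(predicted, gold)
print(report["false_positives"], report["false_negatives"])  # [(103, 443588)] [(105, 443588)]
```

Here two of three predictions match the gold standard, so precision, recall, and F1 are each 2/3. Each refinement round would review the listed false positives and false negatives, adjust rules or synonyms, and re-run the comparison.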