Comments (11)
Hi @gowthamrao ,
Just trying to understand what the feature is that you'd like to construct. Do you want to know the unique list of places of service where a person has received care? Or do you want one list where they got their drugs, another where they were diagnosed? Or do you want a list for every drug-place combination, every visit-place combination, etc.?
from featureextraction.
Do you want to know the unique list of places of service where a person has received care?
we want count of visit records per place of service e.g. number of unique visits per person to doctor's office
Or do you want one list where they got their drugs, another where they were diagnosed?
yes - number of records in drug_occurrence at doctor's office. e.g. number of unique visits to doctors office per person where chemotherapy was administered
or
number of records in visit_occurrence where primary diagnosis was 'influenza'
or
number of unique visits to with inpatient place of service per person where primary diagnosis was 'chest pain' and procedure was 'cardiac catheterization'
Or do you want a list for every drug-place combination, every visit-place combination, etc.?
no
from featureextraction.
@schuemie - i was thinking of something like below, where @place_of_service will be a new optional parameter in R-function call of FeatureExtraction2.
ConceptCount.sql
SELECT subject_id,
cohort_start_date,
{@temporal} ? {
time_id,
}
concept_count
INTO #raw_data
} : {
SELECT 1000 + @analysis_id AS covariate_id,
{@temporal} ? {
time_id,
}
row_id,
concept_count AS covariate_value
INTO @covariate_table
}
FROM (
SELECT
{@temporal} ? {
time_id,
}
{@aggregated} ? {
subject_id,
cohort_start_date,
} : {
cohort.@row_id_field AS row_id,
}
{@sub_type == 'distinct'} ? {
COUNT(DISTINCT @domain_concept_id) AS concept_count
} : {
COUNT(*) AS concept_count
}
FROM @cohort_table cohort
INNER JOIN @cdm_database_schema.@domain_table
ON cohort.subject_id = @domain_table.person_id
{@place_of_service} ? {
{@domain_table == 'visit_occurrence'} ? {
INNER JOIN @cdm_database_schema.care_site care_site
ON @domain_table.care_site_id = care_site.care_site_id
}
{@domain_table %in% ('condition_occurrence', 'device_exposure', 'drug_exposure', 'measurement','observation','procedure_occurrence')} ? {
INNER JOIN @cdm_database_schema.visit_occurrence visit_occurrence
ON @domain_table.visit_occurrence_id = visit_occurrence.visit_occurrence_id
INNER JOIN @cdm_database_schema.care_site care_site
ON visit_occurrence.care_site_id = care_site.care_site_id
}
}
{@temporal} ? {
INNER JOIN #time_period time_period
ON @domain_start_date <= DATEADD(DAY, time_period.end_day, cohort.cohort_start_date)
AND @domain_end_date >= DATEADD(DAY, time_period.start_day, cohort.cohort_start_date)
WHERE @domain_concept_id != 0
} : {
WHERE @domain_start_date <= DATEADD(DAY, @end_day, cohort.cohort_start_date)
AND @domain_end_date >= DATEADD(DAY, @start_day, cohort.cohort_start_date)
AND @domain_concept_id != 0
}
{@excluded_concept_table != ''} ? {
AND @domain_concept_id NOT IN (SELECT id FROM @excluded_concept_table)
{@place_of_service} ? {
AND care_site.place_of_service_concept_id NOT IN (SELECT id from @excluded_concept_table)
}
}
{@included_concept_table != ''} ? {
AND @domain_concept_id IN (SELECT id FROM @included_concept_table)
{@place_of_service} ? {
AND care_site.place_of_service_concept_id IN (SELECT id from @included_concept_table)
}
}
{@cohort_definition_id != -1} ? { AND cohort.cohort_definition_id = @cohort_definition_id}
GROUP BY
{@temporal} ? {
time_id,
}
{@aggregated} ? {
subject_id,
cohort_start_date
} : {
cohort.@row_id_field
}
) raw_data;
from featureextraction.
Just checking: do you really mean place_of_service_concept_id
, or are you actually thinking of visit_concept_id
? The first can have many values, like:
concept_id | concept_name |
---|---|
8782 | Urgent Care Facility |
8809 | Independent Laboratory |
8827 | Custodial Care Facility |
8844 | Other Place of Service |
8850 | Ambulance - Air or Water |
8851 | Group Home |
8858 | Mass Immunization Center |
… | … |
The second can have only five different values:
concept_id | concept_name |
---|---|
9201 | Inpatient Visit |
42898160 | Long Term Care Visit |
9203 | Emergency Room Visit |
9202 | Outpatient Visit |
262 | Emergency Room and Inpatient Visit |
from featureextraction.
Definitely place_of_service_concept_id
. visit_concept_id is a derived field that is calculated during the ETL process, most source data have place_of_service_concept_id
. http://www.ohdsi.org/web/atlas/#/conceptset/1655609/details
e.g. visit_concept_id has 'Outpatient Visit', but it includes many place_of_service_concept_id's like
Concept ID | Concept Code | Concept Name | Standard Concept |
---|---|---|---|
8883 | 24 | Ambulatory Surgical Center | S |
8716 | 49 | Independent Clinic | S |
8940 | 11 | Office | S |
8977 | 71 | Public Health Clinic | S |
8761 | 72 | Rural Health Clinic | S |
8782 | 20 | Urgent Care Facility | S |
38003620 | 17 | Walk---in Retail Health Clinic | S |
using just visit_concept_id is lossy. e.g. we want to separately calculate 8883, and 8940, and combination of 8863 (Skilled Nursing Facility) & 8676 (Nursing Facility). We want to calculate at both concept_id level and combination of concept_id's
from featureextraction.
Use case -- if we are predicting admission to hospital outcome for a cohort of diabetics, we would like to use the feature - number of 8940 place of service visit occurrence in long-term period prior to cohort entry.
from featureextraction.
I'll second @gowthamrao suggestion. Definitely would be useful to have by specific place_of_service_concept_id
, especially for the outpatient category in visit_concept_id
.
And, if possible, to create an "all other outpatient" category based on those that haven't been broken out on their own. For example if I want to look at Office visits (8940) and Nursing facility stays (8863 & 8676) I'd also want to see all other outpatient services not included in either of those buckets.
from featureextraction.
@gowthamrao : do you want to take a stab at implementing the proposed changes? I've just brought the develop branch up-to-date, we can land your pull-requests there and work on debugging the code.
from featureextraction.
Hi @gowthamrao . Sorry, took me a while to make time to take a thorough look at your code. I have some questions and comments:
In general:
I thought the idea was to have counts per place_of_service_concept_id? Currently this code is just counting visits that have some (combination of) type of records associated with them.
Also, there are currently 128 new analyses which each generate exactly 1 covariate ID. Our analysis ID space is limited to 3 digits while our covariate ID space is virtually limitless. Maybe we could group some of these analyses?
More specific:
What is the difference between analysis 120 (VisitOccurrenceRecordCountsLongTerm) and analysis 917 (VisitCountLongTerm)?
I'm having trouble understand what the purpose of analysis 121 (VisitOccurrenceRecordCountsByPosLongTerm) is. What the code does is count the number of visits in the window that are associated with care sites that do not have care_site_id = 0. What is the significance of care_site_id = 0?
Similarly, what is the purpose of analysis 122 (VisitOccurrenceRecordCountsByCoLongTerm)? This counts the number of visits in the window that are associated with a condition_occurrence, where the visit_occurrence_id is not 0. visit_occurrence_id is an auto-increment number. What is the significance of visit_occurrence_id = 0?
Comments and questions specific to RecordCounts.sql
:
line 141: please don't hardcode analysis IDs. What if someone want to use different analysis IDs? This is precisely what the sub_type field is for (which you didn't use).
line 142: @domain_table_id only works because @domain_table is replaced with the table name. This is a bit confusing, because it suggests there is a variable called @domain_table_id. Why not simply count the number of records using COUNT(*)?
line 240: why would @domain_table_id be 0? (It is an auto-increment number). What is the significance of @domain_table_id = 0? Do you mean @domain_concept_id?
line 246: Here's the only difference I see between analysis 120 and 917: analysis 917 includes visits where the end date is after the window start, even though the visit start might be before the window start. In contrast, analysis 120 only considered visits that start in the window. Is the distinction important? If it is, this is something to explicitly mention in the descriptions.
Line 369-382: Having just 2 different covariate names for 128 different covariates might not be very informative.
from featureextraction.
Can you help us understand the concept of covariates?
from featureextraction.
Ok, let me give it a shot:
First, note that I tend to use the words 'covariate', 'feature', and 'baseline characteristic' interchangeably.
An analysis is defined as a process that creates one or more similar covariates. For example, prespecified analysis 102 creates a separate covariate for each condition concept encountered in the long-term window (default is past 365 days).
A covariate is a variable that will have one value for each entry in the cohort table, and that value can be different for each cohort entry. One such covariate is whether the concept 'fever' (concept ID 437663) has been encountered in the long term window. In analysis 102, we construct a unique covariate ID by concatenating the concept ID to the analysis ID, so this covariate will have ID 437663102. In this case, this covariate is binary; it can only have the value 0 (concept not found in window) or 1 (concept found in window).
So if I think about your analysis 121 (VisitOccurrenceRecordCountsByPosLongTerm), I would expect it to generate one covariate per unique place_of_service_concept_id.
from featureextraction.
Related Issues (20)
- Switch unit tests to use temp cohort tables to avoid collisions between runs HOT 2
- Throw an error if two covariate builders generate overlapping covariate IDs
- Create cohort-based standard table 1 HOT 7
- Add function for creating interactions between covariates
- FeatureExtraction (as well as CohortDiagnostics) not compatible with newest versions of R? HOT 2
- EHR-based electronic frailty index HOT 1
- minimum threshold for covariate mean HOT 1
- in arrow branch - tidyCovariates fails if it removes rare, but not redundant covariates HOT 3
- Vignette 'CreatingCustomCovariateBuilders': duplicate cdmVersion arguments HOT 1
- Unit tests failing on various DBMS HOT 2
- Methodology for normalization with max value division HOT 6
- Can Feature Extraction support computation with respect to target cohort_end_date HOT 1
- Fix minor typo in UsingFeatureExtraction.Rmd HOT 1
- Deprecate `cohortId` and use `cohortIds`
- Inclusion of Pediatrics age groups as optional Features
- Atlas - Generate with Reports HOT 2
- Inconsistency of output for temporal covariates HOT 2
- custom covariate feature metaData aggregation issue
- Standardize build tooling and common libs HOT 1
- Error when specifying only one temporal window in temporalCovariateSettings HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from featureextraction.