<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

I'll second <a class="user-mention notranslate" data-hovercard-type="user" data-hoverc

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

FeatureExtraction2 - Support place_of_service_concept_id about featureextraction HOT 11 CLOSED

ohdsi commented on June 2, 2024

FeatureExtraction2 - Support place_of_service_concept_id

from featureextraction.

Comments (11)

schuemie commented on June 2, 2024

Hi @gowthamrao ,

Just trying to understand what the feature is that you'd like to construct. Do you want to know the unique list of places of service where a person has received care? Or do you want one list where they got their drugs, another where they were diagnosed? Or do you want a list for every drug-place combination, every visit-place combination, etc.?

from featureextraction.

gowthamrao commented on June 2, 2024

@schuemie

Do you want to know the unique list of places of service where a person has received care?

we want count of visit records per place of service e.g. number of unique visits per person to doctor's office

Or do you want one list where they got their drugs, another where they were diagnosed?

yes - number of records in drug_occurrence at doctor's office. e.g. number of unique visits to doctors office per person where chemotherapy was administered
or
number of records in visit_occurrence where primary diagnosis was 'influenza'
or
number of unique visits to with inpatient place of service per person where primary diagnosis was 'chest pain' and procedure was 'cardiac catheterization'

Or do you want a list for every drug-place combination, every visit-place combination, etc.?

from featureextraction.

gowthamrao commented on June 2, 2024

@schuemie - i was thinking of something like below, where @place_of_service will be a new optional parameter in R-function call of FeatureExtraction2.

ConceptCount.sql

SELECT subject_id,
	cohort_start_date,
{@temporal} ? {
    time_id,
}
	concept_count
INTO #raw_data
} : {
SELECT 1000 + @analysis_id AS covariate_id,
{@temporal} ? {
    time_id,
}
	row_id,
	concept_count AS covariate_value
INTO @covariate_table
}
FROM (
	SELECT
{@temporal} ? {
		time_id,
}
{@aggregated} ? {
		subject_id,
		cohort_start_date,
} : {
		cohort.@row_id_field AS row_id,
}
{@sub_type == 'distinct'} ? {
		COUNT(DISTINCT @domain_concept_id) AS concept_count
} : {
		COUNT(*) AS concept_count
}
	FROM @cohort_table cohort
	INNER JOIN @cdm_database_schema.@domain_table
		ON cohort.subject_id = @domain_table.person_id
{@place_of_service} ? {
	{@domain_table == 'visit_occurrence'} ? {
		INNER JOIN @cdm_database_schema.care_site care_site
			 ON @domain_table.care_site_id = care_site.care_site_id
	}
	{@domain_table %in% ('condition_occurrence', 'device_exposure', 'drug_exposure', 'measurement','observation','procedure_occurrence')} ? {
		INNER JOIN @cdm_database_schema.visit_occurrence visit_occurrence
			 ON @domain_table.visit_occurrence_id = visit_occurrence.visit_occurrence_id
		INNER JOIN @cdm_database_schema.care_site care_site
			 ON visit_occurrence.care_site_id = care_site.care_site_id
	}
}
{@temporal} ? {
	INNER JOIN #time_period time_period
		ON @domain_start_date <= DATEADD(DAY, time_period.end_day, cohort.cohort_start_date)
		AND @domain_end_date >= DATEADD(DAY, time_period.start_day, cohort.cohort_start_date)
	WHERE @domain_concept_id != 0
} : {
	WHERE @domain_start_date <= DATEADD(DAY, @end_day, cohort.cohort_start_date)
		AND @domain_end_date >= DATEADD(DAY, @start_day, cohort.cohort_start_date)
		AND @domain_concept_id != 0
}
{@excluded_concept_table != ''} ? {
		AND @domain_concept_id NOT IN (SELECT id FROM @excluded_concept_table)
		{@place_of_service} ? {
			AND care_site.place_of_service_concept_id NOT IN (SELECT id from @excluded_concept_table)
		}
}
{@included_concept_table != ''} ? {
		AND @domain_concept_id IN (SELECT id FROM @included_concept_table)
		{@place_of_service} ? {
			AND care_site.place_of_service_concept_id IN (SELECT id from @included_concept_table)
		}
}
{@cohort_definition_id != -1} ? {		AND cohort.cohort_definition_id = @cohort_definition_id}
	GROUP BY
{@temporal} ? {
		time_id,
}
{@aggregated} ? {
		subject_id,
		cohort_start_date
} : {
		cohort.@row_id_field
}
	) raw_data;

from featureextraction.

schuemie commented on June 2, 2024

Just checking: do you really mean place_of_service_concept_id, or are you actually thinking of visit_concept_id? The first can have many values, like:

concept_id	concept_name
8782	Urgent Care Facility
8809	Independent Laboratory
8827	Custodial Care Facility
8844	Other Place of Service
8850	Ambulance - Air or Water
8851	Group Home
8858	Mass Immunization Center
…	…

The second can have only five different values:

concept_id	concept_name
9201	Inpatient Visit
42898160	Long Term Care Visit
9203	Emergency Room Visit
9202	Outpatient Visit
262	Emergency Room and Inpatient Visit

from featureextraction.

gowthamrao commented on June 2, 2024

Definitely place_of_service_concept_id. visit_concept_id is a derived field that is calculated during the ETL process, most source data have place_of_service_concept_id. http://www.ohdsi.org/web/atlas/#/conceptset/1655609/details

e.g. visit_concept_id has 'Outpatient Visit', but it includes many place_of_service_concept_id's like

Concept ID	Concept Code	Concept Name	Standard Concept
8883	24	Ambulatory Surgical Center	S
8716	49	Independent Clinic	S
8940	11	Office	S
8977	71	Public Health Clinic	S
8761	72	Rural Health Clinic	S
8782	20	Urgent Care Facility	S
38003620	17	Walk---in Retail Health Clinic	S

using just visit_concept_id is lossy. e.g. we want to separately calculate 8883, and 8940, and combination of 8863 (Skilled Nursing Facility) & 8676 (Nursing Facility). We want to calculate at both concept_id level and combination of concept_id's

from featureextraction.

gowthamrao commented on June 2, 2024

Use case -- if we are predicting admission to hospital outcome for a cohort of diabetics, we would like to use the feature - number of 8940 place of service visit occurrence in long-term period prior to cohort entry.

from featureextraction.

davekern commented on June 2, 2024

I'll second @gowthamrao suggestion. Definitely would be useful to have by specific place_of_service_concept_id, especially for the outpatient category in visit_concept_id.
And, if possible, to create an "all other outpatient" category based on those that haven't been broken out on their own. For example if I want to look at Office visits (8940) and Nursing facility stays (8863 & 8676) I'd also want to see all other outpatient services not included in either of those buckets.

from featureextraction.

schuemie commented on June 2, 2024

@gowthamrao : do you want to take a stab at implementing the proposed changes? I've just brought the develop branch up-to-date, we can land your pull-requests there and work on debugging the code.

from featureextraction.

schuemie commented on June 2, 2024

Hi @gowthamrao . Sorry, took me a while to make time to take a thorough look at your code. I have some questions and comments:

In general:

I thought the idea was to have counts per place_of_service_concept_id? Currently this code is just counting visits that have some (combination of) type of records associated with them.

Also, there are currently 128 new analyses which each generate exactly 1 covariate ID. Our analysis ID space is limited to 3 digits while our covariate ID space is virtually limitless. Maybe we could group some of these analyses?

More specific:

What is the difference between analysis 120 (VisitOccurrenceRecordCountsLongTerm) and analysis 917 (VisitCountLongTerm)?

I'm having trouble understand what the purpose of analysis 121 (VisitOccurrenceRecordCountsByPosLongTerm) is. What the code does is count the number of visits in the window that are associated with care sites that do not have care_site_id = 0. What is the significance of care_site_id = 0?

Similarly, what is the purpose of analysis 122 (VisitOccurrenceRecordCountsByCoLongTerm)? This counts the number of visits in the window that are associated with a condition_occurrence, where the visit_occurrence_id is not 0. visit_occurrence_id is an auto-increment number. What is the significance of visit_occurrence_id = 0?

Comments and questions specific to RecordCounts.sql:

line 141: please don't hardcode analysis IDs. What if someone want to use different analysis IDs? This is precisely what the sub_type field is for (which you didn't use).

line 142: @domain_table_id only works because @domain_table is replaced with the table name. This is a bit confusing, because it suggests there is a variable called @domain_table_id. Why not simply count the number of records using COUNT(*)?

line 240: why would @domain_table_id be 0? (It is an auto-increment number). What is the significance of @domain_table_id = 0? Do you mean @domain_concept_id?

line 246: Here's the only difference I see between analysis 120 and 917: analysis 917 includes visits where the end date is after the window start, even though the visit start might be before the window start. In contrast, analysis 120 only considered visits that start in the window. Is the distinction important? If it is, this is something to explicitly mention in the descriptions.

Line 369-382: Having just 2 different covariate names for 128 different covariates might not be very informative.

from featureextraction.

gowthamrao commented on June 2, 2024

Can you help us understand the concept of covariates?

from featureextraction.

schuemie commented on June 2, 2024

Ok, let me give it a shot:

First, note that I tend to use the words 'covariate', 'feature', and 'baseline characteristic' interchangeably.

An analysis is defined as a process that creates one or more similar covariates. For example, prespecified analysis 102 creates a separate covariate for each condition concept encountered in the long-term window (default is past 365 days).

A covariate is a variable that will have one value for each entry in the cohort table, and that value can be different for each cohort entry. One such covariate is whether the concept 'fever' (concept ID 437663) has been encountered in the long term window. In analysis 102, we construct a unique covariate ID by concatenating the concept ID to the analysis ID, so this covariate will have ID 437663102. In this case, this covariate is binary; it can only have the value 0 (concept not found in window) or 1 (concept found in window).

So if I think about your analysis 121 (VisitOccurrenceRecordCountsByPosLongTerm), I would expect it to generate one covariate per unique place_of_service_concept_id.

from featureextraction.

FeatureExtraction2 - Support place_of_service_concept_id about featureextraction HOT 11 CLOSED

Comments (11)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs