GithubHelp home page GithubHelp logo

Comments (11)

schuemie avatar schuemie commented on June 2, 2024

Hi @gowthamrao ,

Just trying to understand what the feature is that you'd like to construct. Do you want to know the unique list of places of service where a person has received care? Or do you want one list where they got their drugs, another where they were diagnosed? Or do you want a list for every drug-place combination, every visit-place combination, etc.?

from featureextraction.

gowthamrao avatar gowthamrao commented on June 2, 2024

@schuemie

Do you want to know the unique list of places of service where a person has received care?

we want count of visit records per place of service e.g. number of unique visits per person to doctor's office

Or do you want one list where they got their drugs, another where they were diagnosed?

yes - number of records in drug_occurrence at doctor's office. e.g. number of unique visits to doctors office per person where chemotherapy was administered
or
number of records in visit_occurrence where primary diagnosis was 'influenza'
or
number of unique visits to with inpatient place of service per person where primary diagnosis was 'chest pain' and procedure was 'cardiac catheterization'

Or do you want a list for every drug-place combination, every visit-place combination, etc.?

no

from featureextraction.

gowthamrao avatar gowthamrao commented on June 2, 2024

@schuemie - i was thinking of something like below, where @place_of_service will be a new optional parameter in R-function call of FeatureExtraction2.

ConceptCount.sql

SELECT subject_id,
	cohort_start_date,
{@temporal} ? {
    time_id,
}
	concept_count
INTO #raw_data
} : {
SELECT 1000 + @analysis_id AS covariate_id,
{@temporal} ? {
    time_id,
}
	row_id,
	concept_count AS covariate_value
INTO @covariate_table
}
FROM (
	SELECT
{@temporal} ? {
		time_id,
}
{@aggregated} ? {
		subject_id,
		cohort_start_date,
} : {
		cohort.@row_id_field AS row_id,
}
{@sub_type == 'distinct'} ? {
		COUNT(DISTINCT @domain_concept_id) AS concept_count
} : {
		COUNT(*) AS concept_count
}
	FROM @cohort_table cohort
	INNER JOIN @cdm_database_schema.@domain_table
		ON cohort.subject_id = @domain_table.person_id
{@place_of_service} ? {
	{@domain_table == 'visit_occurrence'} ? {
		INNER JOIN @cdm_database_schema.care_site care_site
			 ON @domain_table.care_site_id = care_site.care_site_id
	}
	{@domain_table %in% ('condition_occurrence', 'device_exposure', 'drug_exposure', 'measurement','observation','procedure_occurrence')} ? {
		INNER JOIN @cdm_database_schema.visit_occurrence visit_occurrence
			 ON @domain_table.visit_occurrence_id = visit_occurrence.visit_occurrence_id
		INNER JOIN @cdm_database_schema.care_site care_site
			 ON visit_occurrence.care_site_id = care_site.care_site_id
	}
}
{@temporal} ? {
	INNER JOIN #time_period time_period
		ON @domain_start_date <= DATEADD(DAY, time_period.end_day, cohort.cohort_start_date)
		AND @domain_end_date >= DATEADD(DAY, time_period.start_day, cohort.cohort_start_date)
	WHERE @domain_concept_id != 0
} : {
	WHERE @domain_start_date <= DATEADD(DAY, @end_day, cohort.cohort_start_date)
		AND @domain_end_date >= DATEADD(DAY, @start_day, cohort.cohort_start_date)
		AND @domain_concept_id != 0
}
{@excluded_concept_table != ''} ? {
		AND @domain_concept_id NOT IN (SELECT id FROM @excluded_concept_table)
		{@place_of_service} ? {
			AND care_site.place_of_service_concept_id NOT IN (SELECT id from @excluded_concept_table)
		}
}
{@included_concept_table != ''} ? {
		AND @domain_concept_id IN (SELECT id FROM @included_concept_table)
		{@place_of_service} ? {
			AND care_site.place_of_service_concept_id IN (SELECT id from @included_concept_table)
		}
}
{@cohort_definition_id != -1} ? {		AND cohort.cohort_definition_id = @cohort_definition_id}
	GROUP BY
{@temporal} ? {
		time_id,
}
{@aggregated} ? {
		subject_id,
		cohort_start_date
} : {
		cohort.@row_id_field
}
	) raw_data;

from featureextraction.

schuemie avatar schuemie commented on June 2, 2024

Just checking: do you really mean place_of_service_concept_id, or are you actually thinking of visit_concept_id? The first can have many values, like:

concept_id concept_name
8782 Urgent Care Facility
8809 Independent Laboratory
8827 Custodial Care Facility
8844 Other Place of Service
8850 Ambulance - Air or Water
8851 Group Home
8858 Mass Immunization Center

The second can have only five different values:

concept_id concept_name
9201 Inpatient Visit
42898160 Long Term Care Visit
9203 Emergency Room Visit
9202 Outpatient Visit
262 Emergency Room and Inpatient Visit

from featureextraction.

gowthamrao avatar gowthamrao commented on June 2, 2024

Definitely place_of_service_concept_id. visit_concept_id is a derived field that is calculated during the ETL process, most source data have place_of_service_concept_id. http://www.ohdsi.org/web/atlas/#/conceptset/1655609/details

e.g. visit_concept_id has 'Outpatient Visit', but it includes many place_of_service_concept_id's like

Concept ID Concept Code Concept Name Standard Concept
8883 24 Ambulatory Surgical Center S
8716 49 Independent Clinic S
8940 11 Office S
8977 71 Public Health Clinic S
8761 72 Rural Health Clinic S
8782 20 Urgent Care Facility S
38003620 17 Walk---in Retail Health Clinic S

using just visit_concept_id is lossy. e.g. we want to separately calculate 8883, and 8940, and combination of 8863 (Skilled Nursing Facility) & 8676 (Nursing Facility). We want to calculate at both concept_id level and combination of concept_id's

from featureextraction.

gowthamrao avatar gowthamrao commented on June 2, 2024

Use case -- if we are predicting admission to hospital outcome for a cohort of diabetics, we would like to use the feature - number of 8940 place of service visit occurrence in long-term period prior to cohort entry.

from featureextraction.

davekern avatar davekern commented on June 2, 2024

I'll second @gowthamrao suggestion. Definitely would be useful to have by specific place_of_service_concept_id, especially for the outpatient category in visit_concept_id.
And, if possible, to create an "all other outpatient" category based on those that haven't been broken out on their own. For example if I want to look at Office visits (8940) and Nursing facility stays (8863 & 8676) I'd also want to see all other outpatient services not included in either of those buckets.

from featureextraction.

schuemie avatar schuemie commented on June 2, 2024

@gowthamrao : do you want to take a stab at implementing the proposed changes? I've just brought the develop branch up-to-date, we can land your pull-requests there and work on debugging the code.

from featureextraction.

schuemie avatar schuemie commented on June 2, 2024

Hi @gowthamrao . Sorry, took me a while to make time to take a thorough look at your code. I have some questions and comments:

In general:

I thought the idea was to have counts per place_of_service_concept_id? Currently this code is just counting visits that have some (combination of) type of records associated with them.

Also, there are currently 128 new analyses which each generate exactly 1 covariate ID. Our analysis ID space is limited to 3 digits while our covariate ID space is virtually limitless. Maybe we could group some of these analyses?

More specific:

What is the difference between analysis 120 (VisitOccurrenceRecordCountsLongTerm) and analysis 917 (VisitCountLongTerm)?

I'm having trouble understand what the purpose of analysis 121 (VisitOccurrenceRecordCountsByPosLongTerm) is. What the code does is count the number of visits in the window that are associated with care sites that do not have care_site_id = 0. What is the significance of care_site_id = 0?

Similarly, what is the purpose of analysis 122 (VisitOccurrenceRecordCountsByCoLongTerm)? This counts the number of visits in the window that are associated with a condition_occurrence, where the visit_occurrence_id is not 0. visit_occurrence_id is an auto-increment number. What is the significance of visit_occurrence_id = 0?

Comments and questions specific to RecordCounts.sql:

line 141: please don't hardcode analysis IDs. What if someone want to use different analysis IDs? This is precisely what the sub_type field is for (which you didn't use).

line 142: @domain_table_id only works because @domain_table is replaced with the table name. This is a bit confusing, because it suggests there is a variable called @domain_table_id. Why not simply count the number of records using COUNT(*)?

line 240: why would @domain_table_id be 0? (It is an auto-increment number). What is the significance of @domain_table_id = 0? Do you mean @domain_concept_id?

line 246: Here's the only difference I see between analysis 120 and 917: analysis 917 includes visits where the end date is after the window start, even though the visit start might be before the window start. In contrast, analysis 120 only considered visits that start in the window. Is the distinction important? If it is, this is something to explicitly mention in the descriptions.

Line 369-382: Having just 2 different covariate names for 128 different covariates might not be very informative.

from featureextraction.

gowthamrao avatar gowthamrao commented on June 2, 2024

Can you help us understand the concept of covariates?

from featureextraction.

schuemie avatar schuemie commented on June 2, 2024

Ok, let me give it a shot:

First, note that I tend to use the words 'covariate', 'feature', and 'baseline characteristic' interchangeably.

An analysis is defined as a process that creates one or more similar covariates. For example, prespecified analysis 102 creates a separate covariate for each condition concept encountered in the long-term window (default is past 365 days).

A covariate is a variable that will have one value for each entry in the cohort table, and that value can be different for each cohort entry. One such covariate is whether the concept 'fever' (concept ID 437663) has been encountered in the long term window. In analysis 102, we construct a unique covariate ID by concatenating the concept ID to the analysis ID, so this covariate will have ID 437663102. In this case, this covariate is binary; it can only have the value 0 (concept not found in window) or 1 (concept found in window).

So if I think about your analysis 121 (VisitOccurrenceRecordCountsByPosLongTerm), I would expect it to generate one covariate per unique place_of_service_concept_id.

from featureextraction.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.