
faersdbstats's Introduction

faersdbstats

Standardize the drugs, indications, reactions and outcomes in the FDA LAERS and FAERS databases to OMOP Common Data Model V5 concepts, and generate unique case report counts and safety signal statistics.

See the documentation folder for more information.

faersdbstats's People

Contributors

leeevans, rsjaffe

faersdbstats's Issues

Updated relationship_id for Brand Names

Hello,

Thank you for posting so much source code for this project, it is an incredibly impressive project!

I noticed in the standardize_combined_drug_mapping.sql file that brand name matching is only available when relationship_id = 'Tradename of'. However, in the most recent version of the OMOP CDM (maybe a change in v6.0?), this relationship is often stored as 'Brand name of'. Please consider updating the SQL file to reflect this change. Otherwise, this is still an amazing project, thanks!
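As a minimal sketch of the requested change, the lookup could accept both relationship names. The example below uses Python's built-in `sqlite3` with a toy `concept_relationship` table; the concept IDs are made up, and the helper name `brand_to_ingredient` is hypothetical rather than anything from the repository's SQL.

```python
import sqlite3

# Relationship names mentioned in this issue; accept either one.
BRAND_RELATIONSHIPS = ("Tradename of", "Brand name of")


def brand_to_ingredient(conn):
    """Return (concept_id_1, concept_id_2) pairs for either relationship_id."""
    placeholders = ", ".join("?" for _ in BRAND_RELATIONSHIPS)
    sql = (
        "SELECT concept_id_1, concept_id_2 FROM concept_relationship "
        f"WHERE relationship_id IN ({placeholders}) "
        "ORDER BY concept_id_1"
    )
    return conn.execute(sql, BRAND_RELATIONSHIPS).fetchall()


# Toy in-memory table with made-up concept IDs (not real OMOP data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_relationship "
    "(concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT)"
)
conn.executemany(
    "INSERT INTO concept_relationship VALUES (?, ?, ?)",
    [
        (101, 1, "Tradename of"),
        (102, 2, "Brand name of"),
        (103, 3, "Maps to"),  # unrelated relationship, should be ignored
    ],
)

pairs = brand_to_ingredient(conn)
print(pairs)
```

In the actual SQL file the equivalent change would presumably be replacing the equality test with an `IN (...)` list, but that should be checked against the vocabulary version in use.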

TODO: case version analysis

This TODO is for Rich's team at Pitt --

There is an open question about how much case versions differ from one another and how much it matters which single version is chosen for an analysis. We will sample some percentage (e.g., 20%) of FAERS reports that have multiple case versions and then create a table that shows the frequency of fallout and changes across case versions (1, 2, 3, ... max) for each column of the case reports.
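One possible shape for this analysis is sketched below in Python. It is only an illustration under stated assumptions: reports are plain dicts with `caseid` and `caseversion` keys, "fallout" is interpreted as a value that was present in one version and missing in the next, and the function name `version_change_rates` is hypothetical. The counts are aggregated over consecutive version pairs; a per-version breakdown would need the version number added to the key.

```python
import random
from collections import defaultdict


def version_change_rates(reports, columns, sample_fraction=0.2, seed=0):
    """Count changes and fallout per column across consecutive case versions."""
    by_case = defaultdict(list)
    for report in reports:
        by_case[report["caseid"]].append(report)

    # Only cases with more than one version are eligible for sampling.
    multi = sorted(cid for cid, rows in by_case.items() if len(rows) > 1)
    if not multi:
        return {}
    k = min(len(multi), max(1, round(len(multi) * sample_fraction)))
    sampled = random.Random(seed).sample(multi, k)

    stats = {c: {"changed": 0, "fallout": 0, "pairs": 0} for c in columns}
    for cid in sampled:
        rows = sorted(by_case[cid], key=lambda r: r["caseversion"])
        for prev, curr in zip(rows, rows[1:]):
            for c in columns:
                stats[c]["pairs"] += 1
                if prev[c] != curr[c]:
                    stats[c]["changed"] += 1
                    # Assumed meaning of "fallout": value present, then missing.
                    if prev[c] is not None and curr[c] is None:
                        stats[c]["fallout"] += 1
    return stats


# Toy data: case A has three versions, case B has only one.
reports = [
    {"caseid": "A", "caseversion": 1, "age": 50, "sex": "F"},
    {"caseid": "A", "caseversion": 2, "age": 51, "sex": "F"},
    {"caseid": "A", "caseversion": 3, "age": None, "sex": "F"},
    {"caseid": "B", "caseversion": 1, "age": 30, "sex": "M"},
]
stats = version_change_rates(reports, ["age", "sex"], sample_fraction=1.0)
print(stats)
```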

SQL version in AEOLUS README.txt

Apologies if this question is completely off-base. The paper by Banda et al. in Scientific Data mentions PostgreSQL 9.3 as the SQL version used for AEOLUS. However, the code in the README.txt file gives an error when I try to run it on a PostgreSQL server, and it seems to use MySQL vocabulary and syntax (e.g., INT(11) in the CREATE TABLE statements and LOAD DATA INFILE in the statements used to populate the tables).

Should I switch to MySQL, or is the code on GitHub in PostgreSQL (or am I completely missing something else)? Thanks!

Extract Original Drug Indications

It seems that the "outcomes" of a particular drug include the "indications" that it was originally prescribed for. Is there a best practice for removing the prescribed indications of a drug from the list of side effects/outcomes for that drug?

For example, the top outcomes for the drug 'Imatinib' (drug_concept_id: 1304107) are: Death, Nausea, Diarrhoea, Vomiting, Malignant neoplasm progression, Anaemia, Pyrexia, Fatigue, Dyspnoea, Neoplasm malignant, Rash, Oedema peripheral, Chronic myeloid leukaemia. 'Chronic myeloid leukaemia' is an indication for which the drug Imatinib is often used, however it is often listed as an outcome (case_count: 616 in 'standard_drug_outcome_statistics' table).

I've tried filtering this using the 'standard_case_indication' table. However, I get the opposite problem: outcomes often show up as indications. For example, the top indications for 'Imatinib' include Nausea (187 counts), Anxiety (186 counts), Vomiting (98 counts), which look like outcomes to me.

I've included the SQL queries I'm using below. I'm hoping I'm simply querying the database wrong? Maybe I'm getting all symptoms/indications for which these patients were treated, not just indications specific for imatinib? Is there a way to get the drug's specific indications? Thanks in advance.

Extracting outcomes for imatinib

-- Outcome statistics for imatinib, with readable drug and outcome names.
SELECT s.*,
       drug_concept.vocabulary_id    AS drug_vocabulary,
       drug_concept.concept_code     AS drug_concept_code,
       drug_concept.concept_name     AS drug_name,
       outcome_concept.vocabulary_id AS outcome_vocabulary,
       outcome_concept.concept_code  AS outcome_concept_code,
       outcome_concept.concept_name  AS outcome_name
FROM aeolus.standard_drug_outcome_statistics AS s
    -- join on the table aliases, not on schema-qualified alias names
    LEFT JOIN aeolus.concept AS drug_concept ON s.drug_concept_id = drug_concept.concept_id
    LEFT JOIN aeolus.concept AS outcome_concept ON s.outcome_concept_id = outcome_concept.concept_id
WHERE s.drug_concept_id = 1304107
ORDER BY s.case_count DESC
LIMIT 10000;

Extracting indications for imatinib

-- Indications reported on cases that involve imatinib (joined per case, via primaryid only).
SELECT i.indi_pt, COUNT(i.indi_pt)
FROM aeolus.standard_case_indication AS i
    LEFT JOIN aeolus.standard_drug_outcome_drilldown AS d ON d.primaryid = i.primaryid
WHERE d.drug_concept_id = 1304107
GROUP BY i.indi_pt;
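One way to restrict indications to the drug itself, rather than to every drug on the case, might be to match the indication's drug sequence number to the drug's sequence number. The sketch below uses Python's `sqlite3` with toy data and assumes that `standard_case_drug` has `drug_seq` and `standard_concept_id` columns and that `standard_case_indication` has `indi_drug_seq`; these column names are assumptions to verify against the actual AEOLUS schema, and the co-medication concept ID 999 is made up. LAERS rows would presumably need the same join on `isr` instead of `primaryid`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE standard_case_drug (primaryid TEXT, drug_seq INTEGER, standard_concept_id INTEGER);
CREATE TABLE standard_case_indication (primaryid TEXT, indi_drug_seq INTEGER, indi_pt TEXT);
-- Toy case: drug 1 is imatinib (1304107), drug 2 is a made-up co-medication (999).
INSERT INTO standard_case_drug VALUES ('p1', 1, 1304107), ('p1', 2, 999);
INSERT INTO standard_case_indication VALUES
    ('p1', 1, 'Chronic myeloid leukaemia'),
    ('p1', 2, 'Nausea');
""")

DRUG_SPECIFIC_SQL = """
SELECT i.indi_pt, COUNT(DISTINCT i.primaryid) AS case_count
FROM standard_case_indication AS i
JOIN standard_case_drug AS d
  ON d.primaryid = i.primaryid
 AND d.drug_seq = i.indi_drug_seq   -- tie the indication to this drug only
WHERE d.standard_concept_id = ?
GROUP BY i.indi_pt
ORDER BY case_count DESC, i.indi_pt
"""

rows = conn.execute(DRUG_SPECIFIC_SQL, (1304107,)).fetchall()
print(rows)
```

On this toy case, the indication recorded for the co-medication is excluded because its sequence number does not match imatinib's.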

biothings/mychem.info#6

issue with sed in bash script

In create_legacy_all_demo_data_files_with_filename_column.sh, the original string in the sed command (line 59) doesn't exist in all_version_B_demo_legacy_data_with_filename.txt (based on the most recent version of DEMO12Q1.txt).

I think the line should be replaced with the following:

sed -i 's/8129732$8401177$I$$8129732-9$20120126$20120206$20120210$EXP$JP-CUBIST-$E2B0000000182$CUBIST PHARMACEUTICALS, INC.$85$YR$M$Y$$$20120210$$$$$JAPAN$DEMO12Q1.TXT/8129732$8401177$I$$8129732-9$20120126$20120206$20120210$EXP$JP-CUBIST-E2B0000000182$CUBIST PHARMACEUTICALS, INC.$85$YR$M$Y$$$20120210$$$$$JAPAN$DEMO12Q1.TXT/' all_version_B_demo_legacy_data_with_filename.txt
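The substitution above differs from its search string only by one stray `$` delimiter inside the manufacturer control number. As a quick sanity check, the same repair can be sketched in Python, where `str.replace` treats `$` literally; the names `BROKEN`, `FIXED`, and `repair_record` are hypothetical and exist only for this illustration.

```python
# Search and replacement strings copied from the sed command in this issue.
BROKEN = (
    "8129732$8401177$I$$8129732-9$20120126$20120206$20120210$EXP$"
    "JP-CUBIST-$E2B0000000182$CUBIST PHARMACEUTICALS, INC.$85$YR$M$Y$$$"
    "20120210$$$$$JAPAN$DEMO12Q1.TXT"
)
FIXED = (
    "8129732$8401177$I$$8129732-9$20120126$20120206$20120210$EXP$"
    "JP-CUBIST-E2B0000000182$CUBIST PHARMACEUTICALS, INC.$85$YR$M$Y$$$"
    "20120210$$$$$JAPAN$DEMO12Q1.TXT"
)


def repair_record(line):
    """Replace the broken record with the fixed one; leave other lines alone."""
    return line.replace(BROKEN, FIXED)


# The repaired record should have exactly one fewer '$'-delimited field.
extra_fields = len(BROKEN.split("$")) - len(FIXED.split("$"))
print(extra_fields)
```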

Manual mapping

Hi,

First of all, thank you for making this repo public. Second, I have a question about some of the steps in the instruction document:

[Screenshot from the instruction document]

What did you mean by "some %"? Are there specific medications that we should map manually while leaving the others unmapped? And what percentage should we map manually?

Thanks

Processing Instructions vs Bash Scripts

Hi, 2 questions.

  1. I'm confused about whether the instructions in documentation/FAERS_Processing_Instructions.docx and the various bash scripts are redundant. For example, doesn't the run-parts call in faersdbstats/load_data_files_from_website/download_all_and_create_files.sh mean we don't need to "Run the following current data shell scripts:", as described in the Word doc?

  2. How does one download the "OHDSI Usagi mapping tool"? It's not clear from googling.

Thanks!

Removal of duplicate cases - Same ISR but different caseid

In LAERS, a number of reports are duplicated: they have the same ISR number but different CASE ID numbers (i.e. two different CASE ID numbers have been assigned to the same report). This issue is not addressed by the script derive_unique_all_case.sql. This can be checked by running the following query:

SELECT * 
FROM unique_all_casedemo
WHERE isr IN (SELECT isr
               FROM unique_all_casedemo
               GROUP BY isr HAVING COUNT(*) > 1)
ORDER BY isr;

In the final table unique_all_case, currently there exist 943 rows that have an ISR number that also appears in another row of the table (with a different CASE ID).

A suggested modification to the existing code is the following (after the step that creates the table unique_all_casedemo):

-- remove duplicates with same isr and different CASE ID
drop table if exists unique_all_casedemo_2;
create table unique_all_casedemo_2 as
select database, caseid, isr, caseversion, i_f_code, event_dt, age, sex, reporter_country, primaryid, drugname_list, reac_pt_list, fda_dt
from (
select *, 
row_number() over(partition by isr order by filename desc, caseid desc) as row_num 
from unique_all_casedemo
where isr is not null
) a where a.row_num = 1
union
select database, caseid, isr, caseversion, i_f_code, event_dt, age, sex, reporter_country, primaryid, drugname_list, reac_pt_list, fda_dt
from unique_all_casedemo where isr is null;

-- remove any duplicates based on fully populated matching demographic key fields and exact match on list of drugs and list of outcomes (FAERS reactions)
-- NOTE. when using this table for subsequent joins in the ETL process, join to FAERS data using primaryid and join to LAERS data using isr
drop table if exists unique_all_case;   
create table unique_all_case as
select caseid, case when isr is not null then null else primaryid end as primaryid, isr 
from (
	select caseid, primaryid,isr, 
	row_number() over(partition by event_dt, age, sex, reporter_country, drugname_list, reac_pt_list order by primaryid desc, database desc, fda_dt desc, i_f_code, isr desc) as row_num 
	from unique_all_casedemo_2 
	where caseid is not null and event_dt is not null and age is not null and sex is not null and reporter_country is not null and drugname_list is not null and reac_pt_list is not null
) a where a.row_num = 1
union 
select caseid, case when isr is not null then null else primaryid end as primaryid, isr 
from unique_all_casedemo_2 
where caseid is null or event_dt is null or age is null or sex is null or reporter_country is null or drugname_list is null or reac_pt_list is null;
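The first step of the suggestion above (keep one row per non-null ISR, preferring the latest filename and then the highest CASE ID) follows the usual "row_number() ... = 1" pattern. A plain-Python sketch of that logic on toy rows is shown below; the function name `dedupe_by_isr` and the sample values are hypothetical, and the sketch assumes that `filename` and `caseid` are never null, which the SQL version handles differently.

```python
def dedupe_by_isr(rows):
    """Keep one row per non-null ISR; pass rows with a null ISR through unchanged."""
    best = {}
    passthrough = []
    for row in rows:
        isr = row["isr"]
        if isr is None:
            passthrough.append(row)
            continue
        # Mirrors "order by filename desc, caseid desc": the largest key wins.
        key = (row["filename"], row["caseid"])
        current = best.get(isr)
        if current is None or key > (current["filename"], current["caseid"]):
            best[isr] = row
    return list(best.values()) + passthrough


# Toy rows: ISR 500 was assigned two different CASE IDs.
rows = [
    {"caseid": "1001", "isr": "500", "filename": "DEMO08Q1.TXT"},
    {"caseid": "1002", "isr": "500", "filename": "DEMO08Q2.TXT"},
    {"caseid": "1003", "isr": "501", "filename": "DEMO08Q1.TXT"},
    {"caseid": "2001", "isr": None, "filename": "DEMO13Q1.txt"},
]
kept = [r["caseid"] for r in dedupe_by_isr(rows)]
print(kept)
```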

Typo in aeolus readme

Not sure if this is the right place for this, but:
In the README.txt file in the AEOLUS data available here: http://datadryad.org/resource/doi:10.5061/dryad.8q0s4

On line 48 there is a tab character in "ADD INDEX idx_standard_case_indication_indication_concept_id"

On line 55:
ALTER TABLE standard_unique_all_case INDEX idx_standard_unique_all_case_caseid (caseid(255)), ADD INDEX idx_standard_unique_all_case_primary_id (primaryid(255)), ADD INDEX idx_standard_unique_all_case_isr (isr(255));
should be
ALTER TABLE standard_unique_all_case ADD INDEX idx_standard_unique_all_case_caseid (caseid(255)), ADD INDEX idx_standard_unique_all_case_primary_id (primaryid(255)), ADD INDEX idx_standard_unique_all_case_isr (isr(255));

Thanks!
