
leobeeson / openfda-labels


Analysis of Ingredients on Medication Labels

Jupyter Notebook 91.92% R 8.08%

openfda-labels's Introduction

Analysis of Ingredients in the openFDA Drug Label Dataset

Instructions:

First, review the Jupyter notebook at 01_openFDA_API_Exploration/QueryOpenFDA-API.ipynb for the exploratory analysis of the openFDA API.
Then review the analysis of ingredients in the openFDA drug label dataset, available as an HTML file, an Rmd file, or a plain R script:

02_openFDA_Label_Ingredients_Analysis/openFDA_Label_Ingredients_Analysis.html
02_openFDA_Label_Ingredients_Analysis/openFDA_Label_Ingredients_Analysis.Rmd
02_openFDA_Label_Ingredients_Analysis/openFDA_Label_Ingredients_Analysis.R

Considerations:

The datasets are not stored along with the code.
If you want to run the 02_openFDA_Label_Ingredients_Analysis/openFDA_Label_Ingredients_Analysis.Rmd file, remove eval=FALSE from the cells that download the full dataset and from the cell that stores it in MongoDB.

Objectives for this task

Using the data from the openFDA API:
- Determine the average number of ingredients contained in AstraZeneca's (AZ) medicines per year.
- Determine the average number of ingredients across all manufacturers per year per route of administration.
- Use the field spl_product_data_elements for identifying a medicine's ingredients.
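The two averages above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the repository's R implementation: it assumes each label record carries `effective_time` (a `YYYYMMDD` string), `openfda.manufacturer_name`, and `openfda.route`, as returned by the openFDA drug label endpoint, and that the ingredient count per label has already been computed and stored under a hypothetical `n_ingredients` key. All helper names and sample records are made up for the example, and matching a manufacturer by substring is an assumption.

```python
from collections import defaultdict
from urllib.parse import urlencode

LABEL_ENDPOINT = "https://api.fda.gov/drug/label.json"


def build_label_query(manufacturer, limit=100, skip=0):
    """Build a drug label query URL filtered by manufacturer name."""
    params = {
        "search": f'openfda.manufacturer_name:"{manufacturer}"',
        "limit": limit,
        "skip": skip,
    }
    return f"{LABEL_ENDPOINT}?{urlencode(params)}"


def label_year(record):
    """Year of the label, taken from the YYYYMMDD effective_time string."""
    return record["effective_time"][:4]


def is_manufacturer(record, name):
    """Assumption: a case-insensitive substring match identifies the manufacturer."""
    names = record.get("openfda", {}).get("manufacturer_name", [])
    return any(name.lower() in n.lower() for n in names)


def by_year(record):
    return [label_year(record)]


def by_year_and_route(record):
    # A label may list several routes; it contributes to each of them.
    routes = record.get("openfda", {}).get("route", [])
    return [(label_year(record), route) for route in routes]


def average_by(records, key_fn):
    """Average the hypothetical n_ingredients field per group key."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum, count]
    for record in records:
        for key in key_fn(record):
            totals[key][0] += record["n_ingredients"]
            totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up records shaped like openFDA label results.
records = [
    {"effective_time": "20180115", "n_ingredients": 8,
     "openfda": {"manufacturer_name": ["AstraZeneca Pharmaceuticals LP"], "route": ["ORAL"]}},
    {"effective_time": "20180620", "n_ingredients": 12,
     "openfda": {"manufacturer_name": ["AstraZeneca Pharmaceuticals LP"], "route": ["ORAL"]}},
    {"effective_time": "20190301", "n_ingredients": 5,
     "openfda": {"manufacturer_name": ["Example Pharma Inc."], "route": ["TOPICAL"]}},
    {"effective_time": "20190710", "n_ingredients": 4,
     "openfda": {"manufacturer_name": ["AstraZeneca Pharmaceuticals LP"], "route": ["INTRAVENOUS"]}},
]

az_records = [r for r in records if is_manufacturer(r, "AstraZeneca")]
az_average_per_year = average_by(az_records, by_year)
average_per_year_and_route = average_by(records, by_year_and_route)

print(build_label_query("AstraZeneca"))
print(az_average_per_year)
print(average_per_year_and_route)
```

Counting a multi-route label once per route is a design choice made for this sketch; the source does not say how such labels should be handled.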

Problems

The field spl_product_data_elements contains no punctuation, which makes it difficult to identify the boundaries of multi-word ingredients.
Given the morphology of the pharmaceutical linguistic domain, we can assume that a significant portion of medications' ingredients are multi-word, i.e. n-grams.

Solution Proposal

Estimate ingredient-specific collocations (a.k.a. multi-word expressions, MWEs), leveraging the dataset's fields that contain properly punctuated lists of ingredients.
Use the learned ingredient-specific collocations to compound the unbounded multi-word ingredients in the spl_product_data_elements field.
Count the unique compounded multi-word ingredients plus single-word ingredients for each of AZ's medicines.
Count the unique compounded multi-word ingredients plus single-word ingredients for all manufacturers per year per route of administration.
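The compounding idea can be illustrated with a small Python sketch. It is a simplification, not the method used in the R analysis: instead of estimating collocations statistically, it builds a plain dictionary of multi-word phrases from comma-separated ingredient lists and applies greedy longest-match compounding to unpunctuated text. The function names and sample strings are hypothetical.

```python
import re


def learn_multiword_ingredients(punctuated_lists):
    """Collect multi-word ingredients from punctuated ingredient lists."""
    phrases = set()
    for text in punctuated_lists:
        for item in re.split(r"[,;]", text):
            tokens = item.strip(" .").lower().split()
            if len(tokens) > 1:
                phrases.add(tuple(tokens))
    return phrases


def compound(text, phrases):
    """Greedy longest-match: join known multi-word ingredients with '_'."""
    tokens = text.lower().split()
    max_len = max((len(p) for p in phrases), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrases:
                out.append("_".join(candidate))
                i += n
                break
        else:
            # No known phrase starts here; keep the single token.
            out.append(tokens[i])
            i += 1
    return out


# Made-up examples: punctuated fields to learn from, and an unpunctuated
# string shaped like spl_product_data_elements.
punctuated = [
    "microcrystalline cellulose, magnesium stearate, lactose monohydrate, povidone.",
    "titanium dioxide, magnesium stearate, talc",
]
unpunctuated = (
    "Magnesium Stearate Talc Microcrystalline Cellulose Povidone Magnesium Stearate"
)

phrases = learn_multiword_ingredients(punctuated)
ingredients = compound(unpunctuated, phrases)
unique_ingredients = set(ingredients)

print(ingredients)
print(len(unique_ingredients))                    # 4 unique ingredients
print(len(set(unpunctuated.lower().split())))     # 6 if split naively on whitespace
```

The gap between the two counts shows why a naive whitespace split overstates the number of ingredients. A dictionary lookup only recognises phrases seen in the punctuated fields, which is the limitation a learned collocation model is meant to soften.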

Open the full R Markdown document in HTML form
