aztec_pub_classifier's Issues
Pattern Detection of Sentences
Find the patterns of sentences that tell us that the tool is available.
Implement Classifier with Neural Networks
As a developer, I'd like to use a neural network to classify publications.
Document Work
Comments to code
High-level wiki on GitHub: how to set up and run.
Label sentences
Dead Links in publications
Old publications will have links that are broken.
Classify sentences: 99 nontools, 64 tools
Given a sentence, classify it as important or not important.
Important: says that the tool is accessible
Not important: everything else
Multiple tool publications
Some tools will have multiple publications, possibly for different versions.
Use Naive Bayes Classifier to calculate probabilities
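A minimal sketch of how such probabilities could be computed, using a multinomial Naive Bayes with add-one smoothing written against the standard library only. The tokenizer, the class name `NaiveBayes`, and the labels "tool" / "nontool" are assumptions made for illustration, not the project's actual implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict_proba(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            log_scores[label] = score
        # Turn log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exps = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(exps.values())
        return {label: v / norm for label, v in exps.items()}
```

Calling `NaiveBayes().fit(texts, labels).predict_proba(sentence)` returns a dictionary mapping each label to its estimated probability. Working in log space avoids underflow when a text contains many tokens.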
Applying LR, SVM, Decision Tree to Features
Extract Features: Pairs of (subject, verb), (object, verb)
Collect Data and Extract Features
Try adding new features to classifier, e.g.:
- presence of colon in title
- case changes in title
- MeSH terms
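One possible way to compute the features listed above is sketched here. It assumes that a "case change" means a lowercase letter followed directly by an uppercase letter inside a word (as in "deepTools"), and that MeSH terms arrive as a plain list of strings; both are illustrative assumptions, and the function names are hypothetical.

```python
def count_case_changes(word):
    """Count lowercase-to-uppercase transitions inside a single word."""
    return sum(1 for a, b in zip(word, word[1:]) if a.islower() and b.isupper())


def title_features(title, mesh_terms=()):
    """Build a feature dictionary from a publication title and MeSH terms."""
    features = {
        # Tool papers often use a "Name: description" title.
        "has_colon": ":" in title,
        # Mixed-case names such as "deepTools" hint at a tool name.
        "case_changes": sum(count_case_changes(w) for w in title.split()),
    }
    # One binary indicator per MeSH term (assumed input format: list of str).
    for term in mesh_terms:
        features["mesh=" + term.lower()] = True
    return features
```

The resulting dictionary can be fed to any classifier that accepts sparse named features.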
Look into examining syntactical structure and examining subject, object, and verb
Increase Training Set:
Classify Publications from Bioinformatics Journal
Using the Pubmed API, classify the publications based on the title and abstract of the publications.
Example usage of API:
Gets a list of publication IDs (returns 1000; set the retmax parameter accordingly):
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&retmax=1000&sort=relevance&term=%22Bioinformatics%22&field=journal
Gets information about a particular publication (change the id parameter accordingly):
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=27402908&rettype=xml
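The two requests above could be built from Python roughly as follows. This is a sketch using only the standard library: the helper names are hypothetical, the query parameters are copied from the example URLs, and `parse_id_list` assumes the JSON response keeps the ID list under `esearchresult.idlist`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(journal, retmax=1000):
    """URL that lists publication IDs for a journal, as in the example above."""
    params = {
        "db": "pubmed",
        "retmode": "json",
        "retmax": retmax,
        "sort": "relevance",
        "term": f'"{journal}"',
        "field": "journal",
    }
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmid):
    """URL that returns the XML record for one publication ID."""
    params = {"db": "pubmed", "id": pmid, "rettype": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def parse_id_list(payload):
    """Extract the list of IDs from an esearch JSON response body."""
    return json.loads(payload)["esearchresult"]["idlist"]


def fetch(url):
    """Download a URL and return its body as text (performs a network call)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

A typical flow would be `parse_id_list(fetch(esearch_url("Bioinformatics")))` followed by `fetch(efetch_url(pmid))` for each ID, with a pause between requests to stay within the service's rate limits.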
Publications that have an Availability section in the abstract are tools.
Extracting all the sentences with URLs
Include context as well
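A rough sketch of this extraction step, assuming that "context" means the neighbouring sentences on either side. The sentence splitter is deliberately naive (it breaks on ".", "!" or "?" followed by whitespace, so abbreviations such as "e.g." will cause extra splits), and the URL pattern and function names are illustrative.

```python
import re

# Matches http(s)/ftp URLs and bare "www." addresses up to the next whitespace.
URL_RE = re.compile(r"(?:https?|ftp)://\S+|www\.\S+")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentences_with_urls(text, context=1):
    """Return each URL-bearing sentence together with its neighbours."""
    sentences = split_sentences(text)
    hits = []
    for i, sentence in enumerate(sentences):
        if URL_RE.search(sentence):
            start = max(0, i - context)
            end = min(len(sentences), i + context + 1)
            hits.append({
                "sentence": sentence,
                # Sentences before and after the match, excluding the match.
                "context": sentences[start:i] + sentences[i + 1:end],
            })
    return hits
```

Dots inside a URL do not trigger a split because they are not followed by whitespace.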
Increase sentence training set
Learn About Neural Networks
As a developer, I would like to understand how to use neural networks to classify publications based on the text.
Tutorial: http://jrmeyer.github.io/tutorial/2016/02/01/TensorFlow-Tutorial.html
Identify keywords to use as features in classifier
- look at data to manually determine which words or phrases determine tool or not tool
- use list of programming languages to create feature that indicates presence of a programming language
- create classifier that only uses above features and presence of colon, capital letters in title, and not total word frequencies
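The programming-language feature could be sketched as below. The language list is a small hypothetical sample, not the list used in the project, and matching is case-sensitive so that the single-letter name "R" does not fire on ordinary words.

```python
import re

# Hypothetical sample list; a real list would be much longer.
LANGUAGES = ["Python", "Java", "C++", "R", "Perl", "MATLAB"]


def languages_mentioned(text, languages=LANGUAGES):
    """Return the languages that appear in the text as standalone names."""
    found = []
    for lang in languages:
        # Lookarounds stop "Java" matching inside "JavaScript" or "R" inside "RNA".
        pattern = r"(?<!\w)" + re.escape(lang) + r"(?!\w)"
        if re.search(pattern, text):
            found.append(lang)
    return found


def language_feature(text, languages=LANGUAGES):
    """Binary feature: does the text mention any known programming language?"""
    return bool(languages_mentioned(text, languages))
```

Adding plain "C" to the list would need extra care, since the pattern would also match the "C" in "C++".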