Comments (4)
This is still an active area of development for us, so we aren't able to release that code yet. It is also very specific to our infrastructure at LBL (database config, etc.) and wouldn't really be that useful to you, I think. However, I'm happy to give you an overview of how we went about it and point you to some libraries/resources that can help.
Our code uses the pybliometrics library to connect to the ScienceDirect API. We constructed a list of the journals we were interested in, and the years they were in service, from the spreadsheet Elsevier publishes every year (here is the list for journals on ScienceDirect). We then split this list into journal-year pairs and queried the ScienceDirect API for each pair. After that, we processed the entries to make sure everything had the same metadata (DOI, authors, etc.). We hand-labeled a number of abstracts for relevance (I think it was 1000) and then trained a relevance classifier, which I believe used a bag-of-words featurization.
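To make the overview above a bit more concrete, here is a rough sketch of the three data-shaping steps: expanding journals into journal-year pairs, normalizing entries to a common set of metadata fields, and building a bag-of-words representation of an abstract. This is not our actual code. The journal names, the field names (`doi`, `title`, `authors`, `abstract`), the `;` author separator, and all function names are assumptions made up for illustration, and the actual ScienceDirect query is deliberately left out.

```python
import re
from collections import Counter

# Hypothetical input: journal title -> (first_year, last_year) in service.
# In practice this would come from the Elsevier journal spreadsheet.
JOURNALS = {
    "Journal A": (2015, 2017),
    "Journal B": (2016, 2016),
}


def journal_year_pairs(journals):
    """Expand each journal's coverage range into (journal, year) pairs."""
    return [
        (title, year)
        for title, (first, last) in journals.items()
        for year in range(first, last + 1)
    ]


def normalize_entry(raw):
    """Map a raw record onto one fixed set of metadata fields.

    The field names and the ';' author separator are assumptions,
    not the real API schema.
    """
    authors = raw.get("authors") or []
    if isinstance(authors, str):
        authors = [a.strip() for a in authors.split(";") if a.strip()]
    return {
        "doi": (raw.get("doi") or "").strip().lower(),
        "title": (raw.get("title") or "").strip(),
        "authors": authors,
        "abstract": (raw.get("abstract") or "").strip(),
    }


def bag_of_words(text):
    """Count lowercase alphanumeric tokens; word order is discarded."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))
```

Each pair from `journal_year_pairs` would drive one API query, every returned record would pass through `normalize_entry`, and the `bag_of_words` counts of the hand-labeled abstracts would be the features fed to whatever relevance classifier you choose.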
from mat2vec.
Could you please share the code you used to query the APIs and filter the abstracts, as described in the Methods sections of the paper?
Regarding the hand labels, I see there is a dois.txt file and also a relevant_dois.json. Is it correct to assume that dois.txt is the complete set, and that relevant_dois.json contains the DOIs predicted as relevant by the classifier trained on the 1000 hand labels? If so, would it be possible to provide a table of those 1000 hand labels so that I could attempt to recreate the same classifier?