Comments (8)
Added logic for Phase 1 journal similarity.
If the average journal similarity is greater than 0.8, the current cosine similarity is multiplied by 1.5.
The average journal similarity is computed as
(sum of the journal similarity scores that exist in the table) / (number of comparisons made).
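A minimal sketch of the rule described above, for illustration only. The class and method names are hypothetical and are not taken from the ReCiter code base. It assumes that a comparison with no score in the table still counts toward the denominator, which is one reading of the formula.

```java
import java.util.List;

// Hypothetical helper; names are illustrative, not ReCiter's actual API.
final class JournalSimilarityBoost {
    static final double THRESHOLD = 0.8; // from the comment: average must exceed 0.8
    static final double BOOST = 1.5;     // from the comment: cosine similarity is multiplied by 1.5

    /**
     * Average = (sum of scores found in the table) / (number of comparisons made).
     * Assumption: comparisons without a stored score add nothing to the sum
     * but are still counted in comparisonsMade.
     */
    static double averageJournalSimilarity(List<Double> foundScores, int comparisonsMade) {
        if (comparisonsMade <= 0) {
            return 0.0; // nothing compared, so no evidence of similarity
        }
        double sum = 0.0;
        for (double score : foundScores) {
            sum += score;
        }
        return sum / comparisonsMade;
    }

    /** Applies the 1.5x boost only when the average exceeds the threshold. */
    static double adjustCosineSimilarity(double cosineSim, double avgJournalSim) {
        return avgJournalSim > THRESHOLD ? cosineSim * BOOST : cosineSim;
    }
}
```

Under this reading, two found scores of 0.9 across four comparisons average to 0.45, so no boost would be applied.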
from reciter.
Currently, the similarity scores are cached once they are retrieved, but fetching the data still takes a long time. We need to come up with a better retrieval method.
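For reference, the caching behavior described here could look roughly like the sketch below. This is not the actual ReCiter implementation: the class name, the `loader` function standing in for the slow table lookup, and the assumption that journal similarity is symmetric are all hypothetical. It only avoids repeated lookups of the same pair; it does not address the slow first fetch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical cache; "loader" stands in for whatever slow lookup fetches a score.
final class JournalSimilarityCache {
    private final Map<String, Double> cache = new ConcurrentHashMap<>();
    private final BiFunction<String, String, Double> loader;

    JournalSimilarityCache(BiFunction<String, String, Double> loader) {
        this.loader = loader;
    }

    // Assumption: similarity is symmetric, so (A, B) and (B, A) share one key.
    private static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "\t" + b : b + "\t" + a;
    }

    /** Returns the cached score, calling the loader only on the first request for a pair. */
    double get(String journalA, String journalB) {
        return cache.computeIfAbsent(key(journalA, journalB), k -> {
            Double score = loader.apply(journalA, journalB);
            // Assumption: a pair missing from the table is cached as 0.0.
            return score == null ? 0.0 : score;
        });
    }
}
```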
from reciter.
Paul: Match the current article being assigned to the original member of a cluster.
Journal similarity on a sliding scale: sim = (1 + f(A, B)) * sim
from reciter.
Added sim *= (1 + journalSimScore);
to ReCiterClusterer.java
as a sliding scale measurement.
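A small sketch of the sliding-scale update, wrapped in a standalone method so it can be tried in isolation. Only the expression `sim *= (1 + journalSimScore)` comes from the comment above; the class and method names are hypothetical, and the sketch assumes `journalSimScore` lies in [0, 1].

```java
// Hypothetical wrapper around the one-line update quoted in the comment.
final class SlidingScaleSimilarity {
    /**
     * Scales the similarity by (1 + journalSimScore).
     * Assumption: journalSimScore is in [0, 1], so the result ranges from
     * the unchanged similarity (score 0) to double the similarity (score 1).
     */
    static double apply(double sim, double journalSimScore) {
        sim *= (1 + journalSimScore);
        return sim;
    }
}
```

Unlike a fixed 1.5x boost above a 0.8 threshold, this form rewards every increment of journal similarity, so weakly related journals still contribute a small increase.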
from reciter.
I realize there are various competing scores involved, but we have a couple of cases where this logic should have worked but didn't.
- rmm2002 authored an article (18068236) in "Int J Cardiol", but it isn't mapped to the cluster that is almost purely cardiology, and as a result it is a false negative. What explains this?
- jww2001 should map to 25623219 based on journal similarity.
- rdgranst should map to 10951265 and 24763504
- cnathan by virtue of his other publications in immunology should map to the journals associated with these articles which are highly focused on immunology: "19286131 11135572 16192449 9698876 7553846 1725935 2040651 2715632 7909663 2510571"
- For ljgudas, these papers - 1939678, 2289970 - shouldn't be clustered with the large group of biochemistry/cancer papers. These two articles appear in J Dev Behav Pediatr, which is a psychiatry / behavioral science journal.
- For rjm2002, these papers should be clustered with the larger group of radiology papers:
  - 24065258
  - 23849389
  - 23245821
  - 21694949
  - 21266554
  - 20876891
  - 20813299
  - 19237054
  - 17901143
  - 9124127
- mroman studies cardiovascular disease, as evidenced by the true positives, but there is a group of false negatives, also in that field, that got left out of the cluster:
  - 2470462
  - 2437995
  - 24268115
- brs9035 does cardiothoracic surgery, as evidenced by the true positives, but there is a whole other group of articles that he didn't write, published in journals devoted to infectious disease and biochemistry:
  - 8726063
  - 1725235
  - 10391869
  - 19747126
  - 19254170
  - 9234818
  - 7890377
  - 8225606
  - 1343780
  - 1718309
  - 20949003
  - 9359860
  - 7945236
  - 2012611
  - 15609239
  - 9534976
  - 8212028
  - 1378233
  - 1862522
  - 1796476
  - 1724806
  - 9066029
from reciter.
rmm2002 authored an article (18068236) in "Int J Cardiol" but it isn't successfully mapped to the cluster that is almost pure Cardiology, and as a result is a false negative. What explains this?
The article is now assigned to the correct cluster after relaxing the affiliation-matching constraint from "weill cornell medical college" to "weill cornell". Another option would be to increase the weight of journal similarity further.
from reciter.
Good thinking, @jl987-Jie. "weill cornell" should work for post-2000. "cornell medical" will work for earlier papers.
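The relaxed affiliation check discussed in these two comments could be sketched as follows. The phrases "weill cornell" and "cornell medical" come from the thread; the class name, method name, and the choice of a case-insensitive substring match are assumptions for illustration, not the ReCiter implementation.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical matcher; only the two phrases are taken from the discussion.
final class AffiliationMatcher {
    // "weill cornell" for post-2000 papers, "cornell medical" for earlier ones.
    static final List<String> PHRASES = List.of("weill cornell", "cornell medical");

    /** Returns true if the affiliation string contains any phrase, ignoring case. */
    static boolean matches(String affiliation) {
        if (affiliation == null) {
            return false;
        }
        String normalized = affiliation.toLowerCase(Locale.ROOT);
        for (String phrase : PHRASES) {
            if (normalized.contains(phrase)) {
                return true;
            }
        }
        return false;
    }
}
```

A substring match this loose accepts more affiliation variants than the original full phrase, at the cost of possibly matching unrelated strings that happen to contain either phrase.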
from reciter.
Jie has informed me that coding on this issue is completed.
from reciter.