Comments (8)

jl987-Jie commented on May 31, 2024

Added logic for Phase 1 journal similarity.

If the average journal similarity exceeds 0.8, the current cosine similarity is multiplied by 1.5.

The average journal similarity is computed as (sum of the journal similarity scores that exist in the table) / (number of comparisons made).
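A minimal Java sketch of the Phase 1 rule, for illustration only. The class and method names are hypothetical and not ReCiter's actual code. It also assumes that a comparison with no score in the table still counts toward the number of comparisons, represented here as a null entry that adds nothing to the sum.

```java
import java.util.List;

/** Hypothetical sketch of the Phase 1 journal-similarity boost. */
class JournalSimilarityBoost {

    // Threshold and multiplier as described in the comment above.
    static final double THRESHOLD = 0.8;
    static final double MULTIPLIER = 1.5;

    /**
     * Sum of the scores found in the table divided by the number of comparisons.
     * A null entry (assumed: no score in the table) adds 0 to the sum but still
     * counts as a comparison.
     */
    static double averageJournalSimilarity(List<Double> scores) {
        if (scores.isEmpty()) {
            return 0.0; // no comparisons made
        }
        double sum = 0.0;
        for (Double s : scores) {
            if (s != null) {
                sum += s;
            }
        }
        return sum / scores.size();
    }

    /** Multiplies the cosine similarity by 1.5 only when the average exceeds 0.8. */
    static double applyBoost(double cosineSim, List<Double> scores) {
        return averageJournalSimilarity(scores) > THRESHOLD
                ? cosineSim * MULTIPLIER
                : cosineSim;
    }
}
```

Note that this is a step function: an average of 0.79 leaves the score untouched, while 0.81 raises it by 50%.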

from reciter.

jl987-Jie commented on May 31, 2024

Currently, the similarity scores are cached once they are retrieved, but fetching this data still takes a long time. A better retrieval method is needed.
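For reference, a per-pair cache of this kind might look like the following minimal Java sketch. JournalSimilarityCache and the loader function are hypothetical stand-ins for whatever actually fetches the scores. The sketch assumes similarity is symmetric, so (A, B) and (B, A) share one entry, and it caches missing scores as 0.0 (also an assumption) so they are not fetched again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical sketch: each journal pair is fetched from the backing store at most once. */
class JournalSimilarityCache {

    private final Map<String, Double> cache = new HashMap<>();
    private final BiFunction<String, String, Double> loader; // e.g. a table lookup (assumed)
    private int loads = 0;

    JournalSimilarityCache(BiFunction<String, String, Double> loader) {
        this.loader = loader;
    }

    /** Order-independent key; assumes "|" never appears in a journal title. */
    private static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    double get(String journalA, String journalB) {
        String k = key(journalA, journalB);
        Double cached = cache.get(k);
        if (cached != null) {
            return cached; // served from memory, no fetch
        }
        loads++;
        Double value = loader.apply(journalA, journalB);
        double v = (value == null) ? 0.0 : value; // missing score cached as 0.0
        cache.put(k, v);
        return v;
    }

    /** Number of times the loader was actually called. */
    int loadCount() {
        return loads;
    }
}
```

A cache like this only removes repeated fetches. The first lookup for each pair still pays the full retrieval cost, which is consistent with the slowness reported above.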

jl987-Jie commented on May 31, 2024

Paul: Match the current article being assigned to the original member of a cluster.
Journal similarity on a sliding scale: sim = (1 + f(A, B)) * sim
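Written out, with f(A, B) read as the journal similarity between the article A being assigned and the original cluster member B. The range [0, 1] for f is an assumption; the thread does not state it.

```latex
\mathrm{sim}' = \bigl(1 + f(A, B)\bigr)\,\mathrm{sim}, \qquad f(A, B) \in [0, 1]
```

For example, f(A, B) = 0.5 and sim = 0.4 give sim' = 1.5 × 0.4 = 0.6. Unlike the Phase 1 rule, which applies a fixed factor of 1.5 only when the average exceeds 0.8, this factor grows continuously from 1 (f = 0) to 2 (f = 1).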

jl987-Jie commented on May 31, 2024

Added the statement sim *= (1 + journalSimScore); to ReCiterClusterer.java as a sliding-scale measure.
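In context, the sliding-scale statement could sit in a helper like the one below. This is a hypothetical sketch, not the code in ReCiterClusterer.java: journals are plain title strings, the journal-similarity lookup is passed in as a function, and the cluster's original member is assumed to be stored first, following the suggestion to compare against the original member.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch of the sliding-scale journal adjustment. */
class SlidingScaleJournalSim {

    /**
     * Scales a candidate-to-cluster similarity by (1 + journalSimScore), where
     * journalSimScore compares the candidate's journal with the journal of the
     * cluster's original member (assumed to be the first element).
     */
    static double adjust(double sim, String candidateJournal, List<String> clusterJournals,
                         ToDoubleBiFunction<String, String> journalSim) {
        if (clusterJournals.isEmpty()) {
            return sim; // nothing to compare against; leave the score unchanged
        }
        String originalJournal = clusterJournals.get(0);
        double journalSimScore = journalSim.applyAsDouble(candidateJournal, originalJournal);
        sim *= (1 + journalSimScore);
        return sim;
    }
}
```

Because the factor is never below 1 when journalSimScore is non-negative, journal similarity can only raise a score here, never lower it.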

paulalbert1 commented on May 31, 2024

I realize there are various competing scores involved, but we have a couple of cases where this logic should have worked but didn't:

  • rmm2002 authored an article (18068236) in "Int J Cardiol", but it isn't mapped to the cluster that is almost purely cardiology, so it ends up as a false negative. What explains this?
  • jww2001 should map to 25623219 based on journal similarity.
  • rdgranst should map to 10951265 and 24763504
  • cnathan, by virtue of his other publications in immunology, should map to these articles, which appear in journals highly focused on immunology: 19286131, 11135572, 16192449, 9698876, 7553846, 1725935, 2040651, 2715632, 7909663, 2510571
  • For ljgudas, papers 1939678 and 2289970 shouldn't be clustered with the large group of biochemistry/cancer papers. Both appear in J Dev Behav Pediatr, which is a psychiatry/behavioral science journal.
  • For rjm2002, these papers should be clustered with the larger group of radiology papers:
    • 24065258
    • 23849389
    • 23245821
    • 21694949
    • 21266554
    • 20876891
    • 20813299
    • 19237054
    • 17901143
    • 9124127
  • mroman studies cardiovascular disease, as evidenced by the true positives, but a group of false negatives in that same field was left out of the cluster:
    • 2470462
    • 2437995
    • 24268115
  • brs9035 does cardiothoracic surgery, as evidenced by the true positives, but there is a whole other group of articles, which he didn't write, published in journals devoted to infectious disease and biochemistry:
    • 8726063
    • 1725235
    • 10391869
    • 19747126
    • 19254170
    • 9234818
    • 7890377
    • 8225606
    • 1343780
    • 1718309
    • 20949003
    • 9359860
    • 7945236
    • 2012611
    • 15609239
    • 9534976
    • 8212028
    • 1378233
    • 1862522
    • 1796476
    • 1724806
    • 9066029

jl987-Jie commented on May 31, 2024

rmm2002 authored an article (18068236) in "Int J Cardiol" but it isn't successfully mapped to the cluster that is almost pure Cardiology, and as a result is a false negative. What explains this?

The article is now assigned to the correct cluster after relaxing the affiliation-matching constraint from "weill cornell medical college" to "weill cornell". Another option would be to increase the journal similarity boost further.

paulalbert1 commented on May 31, 2024

Good thinking, @jl987-Jie. "weill cornell" should work for papers published after 2000, and "cornell medical" will work for earlier papers.
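The relaxed check might look like this minimal sketch. WcmcAffiliationMatcher and matches are hypothetical names. The case-insensitive substring test, and accepting either pattern regardless of publication year, are assumptions; checking both patterns avoids depending on an accurate publication year.

```java
import java.util.Locale;

/** Hypothetical sketch of a relaxed Weill Cornell affiliation check. */
class WcmcAffiliationMatcher {

    // "weill cornell" targets post-2000 papers; "cornell medical" targets earlier ones.
    private static final String[] PATTERNS = {"weill cornell", "cornell medical"};

    static boolean matches(String affiliation) {
        if (affiliation == null) {
            return false;
        }
        String normalized = affiliation.toLowerCase(Locale.ROOT);
        for (String pattern : PATTERNS) {
            if (normalized.contains(pattern)) {
                return true;
            }
        }
        return false;
    }
}
```

Because "weill cornell" is a substring of "weill cornell medical college", the relaxed check accepts everything the stricter one did, plus variants such as "Weill Cornell Medicine".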

michaelbales1 commented on May 31, 2024

Jie has informed me that coding on this issue is completed.
