Receiving the following error from Lucene:
Document contains at least one immense term in field="AFFILIATION_UNTOKENIZED" (whose UTF8 encoding is longer than the max length 32766)
when running ReCiter with yiwang as input.
Update:
Trimmed affiliation information to a maximum length of 32766 characters.
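Note that trimming to 32766 *characters* only guarantees the limit for ASCII text. As the error message says, Lucene measures the term by its UTF-8 encoding (the limit is exposed as `IndexWriter.MAX_TERM_LENGTH`), and accented or CJK characters take 2 to 4 bytes each. Below is a minimal sketch, assuming the affiliation is available as a plain Java `String` before indexing; the class and method names are hypothetical, not ReCiter's actual code. It truncates by UTF-8 byte length and never splits a surrogate pair.

```java
import java.nio.charset.StandardCharsets;

public class AffiliationTrimmer {
    // Lucene's per-term limit, measured in UTF-8 bytes (not chars).
    static final int MAX_TERM_BYTES = 32766;

    /** Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes, cutting only at code point boundaries. */
    static String truncateUtf8(String s, int maxBytes) {
        int bytes = 0;
        int end = 0; // char index just past the accepted prefix
        while (end < s.length()) {
            int cp = s.codePointAt(end);
            int len = utf8Length(cp);
            if (bytes + len > maxBytes) {
                break; // adding this code point would exceed the limit
            }
            bytes += len;
            end += Character.charCount(cp); // 2 for a surrogate pair, else 1
        }
        return s.substring(0, end);
    }

    /** UTF-8 byte length of a code point; an unpaired surrogate counts as 3, an upper bound for its replacement. */
    static int utf8Length(int cp) {
        if (cp < 0x80) return 1;
        if (cp < 0x800) return 2;
        if (cp < 0x10000) return 3;
        return 4;
    }

    public static void main(String[] args) {
        String ascii = "a".repeat(40000);          // 1 byte per char
        String accented = "\u00e9".repeat(20000);  // U+00E9 is 2 bytes per char
        String t1 = truncateUtf8(ascii, MAX_TERM_BYTES);
        String t2 = truncateUtf8(accented, MAX_TERM_BYTES);
        // prints "32766 chars, 32766 bytes"
        System.out.println(t1.length() + " chars, " + t1.getBytes(StandardCharsets.UTF_8).length + " bytes");
        // prints "16383 chars, 32766 bytes"
        System.out.println(t2.length() + " chars, " + t2.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

With an affiliation made of two-byte characters, a 32766-character cap would still produce a 65532-byte term, so the byte-based cut is the safer check.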
ReCiter starts slowing down after the first thousand or so articles, so clustering all 70,000+ articles could take a long time. (By the 3,000th article, ReCiter is taking ~1 second to assign each article.)
So, that means it would take ~18 hours to run for Yi Wang. That's too long. As they say, let's not let perfection be the enemy of the good. Don't forget about the other 98% of authors who have reasonably unique names!
Even if this figure could be improved by 100%, we would need some sort of fallback plan (search by full name). And if that means we don't get every last publication for such people, so be it. Can you predict how long it takes to compute publications for someone with 20 candidate articles vs. 100 vs. 1,000 vs. 10,000? I assume it is exponential?
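Whether the growth is exponential depends on how the cost of assigning one article scales. Below is a back-of-envelope sketch, assuming (this is not confirmed by the thread) that assigning the k-th article costs time proportional to k, i.e. one comparison per article already processed, calibrated so the ~3000th article takes ~1 second. Under that assumption the total is k/3000 summed over k = 1..n, or n(n+1)/6000 seconds: quadratic, not exponential, so doubling the candidate count roughly quadruples runtime. The class name and constants are illustrative only.

```java
public class ScalingEstimate {
    // Calibration taken from the observation above: ~1 s per article around the 3000th article.
    static final double CALIBRATION_INDEX = 3000.0;
    static final double CALIBRATION_SECONDS = 1.0;

    /** Assumed cost of assigning the k-th article: grows linearly with k. */
    static double secondsForArticle(long k) {
        return CALIBRATION_SECONDS * k / CALIBRATION_INDEX;
    }

    /** Total time for n articles: sum of secondsForArticle(k) for k = 1..n, in closed form n(n+1)/2 * c. */
    static double totalSeconds(long n) {
        return CALIBRATION_SECONDS * n * (n + 1) / (2.0 * CALIBRATION_INDEX);
    }

    public static void main(String[] args) {
        // The candidate-set sizes asked about in the comment above.
        long[] sizes = {20, 100, 1_000, 10_000};
        for (long n : sizes) {
            double s = totalSeconds(n);
            System.out.printf("%,7d articles: %,12.1f s (%.2f h)%n", n, s, s / 3600.0);
        }
    }
}
```

If measurements match this model, a cutoff on the number of candidate articles, above which ReCiter switches to the full-name search fallback, would put a hard ceiling on worst-case runtime.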
Related Issues (20)
- Update MeSHterm.json
- First name scoring does not properly match in cases where nameMatchFirstType should be "full-conflictingAllButInitials"
- Failure to score article first name in cases where institutional first name contains space or dash
- Feature Generator by Group API should accept input of an array of person IDs
- Fields parameter in Feature Generator not working as expected
- Feature Generator outputs in a single article suggestion pieces of two separate article records
- 500 Internal Server Error for _dar7342
- Application returns 500 error if "emails" field is null
- Refactor publication type assignment
- DOI is parsed incorrectly
- No documentation on how to use Reciter and the Reciter Pubmed retrieval tool
- Create new publication type, "Erratum"
- Add first name likelihood scoring strategy
- Investigate 404 errors
- Switch to using environmental variables
- App throws an error if firstName field is blank
- Output equalContrib as an author level attribute
- Update the way ReCiter handles books
- Downweight cases where org unit doesn't match
- Look up candidate records by names of collaborators