Comments (5)
Possible solution:
- if the DOI ends in one of those characters (to be specified), have refextract output both the naive DOI and, recursively, the same DOI with the next word appended, as candidate DOIs
- call bibrank on all of these candidate DOIs (each is a substring of the next), trying the longest first (to avoid the proceedings vs. conference paper issue), and put the first match in the right MARC field
- if there is no match, try to resolve the DOIs to find the right one
I don't know if this is compatible with the way refextract and bibrank talk to each other.
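A minimal Python sketch of the proposal, for illustration only. Everything in it is an assumption: `candidate_dois`, `choose_doi` and `BREAK_CHARS` are hypothetical names, not refextract or bibrank APIs; the character set is a placeholder for the "to be specified" list; the naive DOI is assumed to have its trailing break characters stripped; and the `is_known` / `resolves` callbacks merely stand in for the bibrank lookup and for DOI resolution.

```python
from typing import Callable, Iterable, List, Optional

# Placeholder: the set of trailing characters is "to be specified";
# these are illustrative guesses, not refextract behaviour.
BREAK_CHARS = "-./_"


def candidate_dois(raw_doi: str, next_words: List[str],
                   break_chars: str = BREAK_CHARS) -> List[str]:
    """Return the naive DOI followed by progressively longer candidates.

    ``raw_doi`` is the DOI text as it appears at the end of a line and
    ``next_words`` are the words that follow it in the extracted text
    (hypothetical inputs).
    """
    # Naive DOI (assumption): trailing break characters are treated as punctuation.
    candidates = [raw_doi.rstrip(break_chars)]
    current = raw_doi
    # Keep appending the next word while the candidate still ends in a
    # character after which TeX may have broken the line.
    for word in next_words:
        if not current or current[-1] not in break_chars:
            break
        current += word
        candidates.append(current)
    return candidates


def choose_doi(candidates: Iterable[str],
               is_known: Callable[[str], bool],
               resolves: Callable[[str], bool]) -> Optional[str]:
    """Pick a DOI, trying the longest candidate first."""
    # Deduplicate, then sort longest first so that a paper DOI wins over a
    # proceedings DOI that is a prefix of it.
    ordered = sorted(dict.fromkeys(candidates), key=len, reverse=True)
    for doi in ordered:
        if is_known(doi):  # hypothetical stand-in for the bibrank lookup
            return doi
    for doi in ordered:
        if resolves(doi):  # hypothetical stand-in for resolving the DOI
            return doi
    return None


if __name__ == "__main__":
    # Made-up DOI broken by TeX after the final "."
    cands = candidate_dois("10.1234/proc.2017.", ["042", "Phys."])
    print(cands)  # ['10.1234/proc.2017', '10.1234/proc.2017.042']
    known = {"10.1234/proc.2017", "10.1234/proc.2017.042"}
    print(choose_doi(cands, known.__contains__, lambda d: False))
    # 10.1234/proc.2017.042
```

Sorting longest first means that when both the proceedings DOI and the paper DOI are known, the paper DOI is chosen. The resolution step only runs if nothing matched.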
from refextract.
I stand corrected: TeX does insert line breaks in DOIs when necessary, in most circumstances.
@michamos, was the case we were looking at today on arXiv? If so, we can look up the sources and better understand the circumstances.
@kaplun yes, we were looking at an arXiv paper with this problem the other day. I don't really understand your remark, though.
Never mind: I misread @tsgit's "I stand corrected". That is, we all agree that LaTeX can indeed break DOIs across lines.
Related Issues (20)
- TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
- TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
- clean_pdf_file throws SystemError on MacOS with mmap: resizing not available
- extract_references_from_file returns inconsistent data
- refextract: recognize system identifiers
- refextract: month in pubnote
- Crash in TeXKeys extraction
- TypeError: coercing to Unicode
- Error in importing
- Syntax error in references/api.py line 96
- refextract: recognize DELPHI notes
- Import refextract fails
- don't split PTEP articleIDs at letter in the middle
- Year taken as page number when page number is 4 digits
- Error in extract_references_from_file(path) method
- Issue with non a-zA-Z author names
- TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
- Infinite loop on Debian
- Refextract fails to extract from two-column layout PDF
- mmap: resizing not available