Comments (8)
But the XME format is also publicly accessible and advertised, so deleted records should clearly be kept separate, no? 980__a:deleted isn't obvious to external users.
from inspire.
Is XME advertised? On the record's detailed view, clicking on MARCXML opens the XM format, not XME.
XME is advertised in the FAQ. However, there are two different XME dumps. The first is the publicly accessible one, e.g. the dump advertised in the FAQ or available via /export/xme; it already protects hidden fields today and can keep on not exporting deleted records.
The second is the full dump that we create via the bibtasklet. This one can be extended to also include content from deleted records.
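To make the split between the two dumps concrete, here is a rough sketch of how a consumer could separate live and deleted records in a MARCXML-style dump. It is only an illustration: the helper names (`is_deleted`, `split_dump`, `control_number`) and the sample data are made up for this comment, it assumes the dump uses the standard MARC21 slim namespace, and it matches field 980 regardless of indicators. It is not the code behind /export/xme or the bibtasklet.

```python
import xml.etree.ElementTree as ET

# Assumption: records are serialized in the standard MARC21 "slim" namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def is_deleted(record):
    """Return True if any 980 $a subfield of the record says 'deleted' (any case)."""
    path = "marc:datafield[@tag='980']/marc:subfield[@code='a']"
    for subfield in record.iterfind(path, MARC_NS):
        if (subfield.text or "").strip().lower() == "deleted":
            return True
    return False


def split_dump(xml_text):
    """Split a <collection> of records into (live, deleted) lists of elements."""
    root = ET.fromstring(xml_text)
    live, deleted = [], []
    for record in root.iterfind("marc:record", MARC_NS):
        (deleted if is_deleted(record) else live).append(record)
    return live, deleted


def control_number(record):
    """Return the record ID stored in controlfield 001, or None if absent."""
    field = record.find("marc:controlfield[@tag='001']", MARC_NS)
    return field.text if field is not None else None


# Hypothetical two-record sample, only for illustration.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">1</controlfield>
    <datafield tag="980" ind1=" " ind2=" "><subfield code="a">HEP</subfield></datafield>
  </record>
  <record>
    <controlfield tag="001">2</controlfield>
    <datafield tag="980" ind1=" " ind2=" "><subfield code="a">DELETED</subfield></datafield>
  </record>
</collection>"""

if __name__ == "__main__":
    live, deleted = split_dump(SAMPLE)
    print("live:", [control_number(r) for r in live])
    print("deleted:", [control_number(r) for r in deleted])
```

With something like this, the public dump would keep only the `live` list, while the full dump could ship both lists and leave the filtering to curation tools.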
This is currently causing the biggest error on Sentry: https://sentry.cern.ch/inspire-sentry/inspire-nightly/group/619214/.
While this data is clearly not useful for end users, it is useful for curation algorithms: for example, the deleted records include ones that were determined to be duplicates. So it would be cool to have this as soon as possible.
Yeah. The only issue is that this data will surely contain more outliers and crap because it won't have been curated (whoah what a tense is that) for a long time.
I've seen things you people wouldn't believe. Records with three first authors (all referring to the same guy). I watched exceptions bubble up the stack because records had two longitudes and no latitude.
Seriously, I don't expect this data to be any worse than what we've already found : )
You won't believe your eyes...
\o/
ETA for a new prodsync dump with the deleted records?
Answered in #277 (comment).