Comments (4)
Based on the 2020-03-27 date, are you trying to run this on the CORD-19 data from that date? The format changed on 2020-05-12, and this method only supports data dumps from that date onward:
https://ai2-semanticscholar-cord-19.s3-us-west-2.amazonaws.com/historical_releases.html
Unless you have a specific reason, I would use the latest dump from the link above.
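As a rough illustration of the cutoff above, here is a small shell sketch that checks whether a dump date falls on or after 2020-05-12. The function name is_supported_dump is hypothetical and not part of paperetl. It only compares YYYY-MM-DD strings and does not inspect the dump's actual file layout.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of paperetl): succeeds if a CORD-19 dump
# date (YYYY-MM-DD) is on or after 2020-05-12, when the format changed.
is_supported_dump() {
  local date="$1"
  # Reject anything that is not a zero-padded YYYY-MM-DD string.
  [[ "$date" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] || return 1
  # Drop the hyphens and compare as base-10 integers (YYYYMMDD);
  # the 10# prefix avoids octal parsing of leading zeros.
  (( 10#${date//-/} >= 20200512 ))
}

# Example usage:
# is_supported_dump 2020-03-27 || echo "2020-03-27 predates the format change"
```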
OK, I just want to try it on a small file.
The best thing to do would be to download the latest dump, extract it, and trim metadata.csv down to its first few hundred lines.
For example:
tar -xvzf cord-19_2020-08-12.tar.gz
cd 2020-08-12
mv metadata.csv metadata.csv.bkup
head -500 metadata.csv.bkup > metadata.csv
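The same steps can be wrapped in a small shell function. This is a sketch under assumptions: the name subset_metadata and its arguments are hypothetical and not part of paperetl. It differs from the commands above in one respect: it only moves metadata.csv aside when no backup exists yet, so running it a second time cannot overwrite the full backup with an already trimmed file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of paperetl): keep the header plus the
# first N data rows of metadata.csv in an extracted CORD-19 dump directory.
subset_metadata() {
  local dir="$1" rows="$2"
  local full="$dir/metadata.csv.bkup" sample="$dir/metadata.csv"

  # Move the original aside only once, so a rerun keeps the full file.
  if [[ ! -f "$full" ]]; then
    mv "$sample" "$full"
  fi

  # head counts lines, so add 1 to keep the header row as well.
  head -n "$((rows + 1))" "$full" > "$sample"
}

# Example usage, assuming the 2020-08-12 dump is already extracted:
# subset_metadata 2020-08-12 499
```

Calling it with 499 reproduces head -500 above (header plus 499 rows). Note that head counts physical lines rather than CSV records, so if a quoted field ever spans several lines, the cut could land in the middle of a record.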
It works.