elifesciences / bot-lax-adaptor
License: GNU General Public License v3.0
Getting this when attempting to re-generate a fixture for lax:
$ ./scrape-article.sh article-xml/articles/elife-12215-v1.xml
ERROR - 2017-09-27 17:18:52,266 - failed to render doc 'https://raw.githubusercontent.com/elifesciences/elife-article-xml/f0dd69e8d0976e68868ced40de0b7140c040e250/articles/elife-12215-v1.xml' with error: ('Author missing required competingInterests', {'version': 1, 'msid': u'12215'}) -- {}
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
    msg = self.format(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
    return fmt.format(record)
  File "/home/luke/dev/python/bot-lax-adaptor/src/conf.py", line 84, in format
    record.__dict__['extra'] = utils.json_dumps(unknown_fields)
  File "/home/luke/dev/python/bot-lax-adaptor/src/utils.py", line 121, in json_dumps
    return json.dumps(obj, default=datetime_handler, **kwargs)
  File "/usr/lib64/python2.7/json/__init__.py", line 251, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/lib64/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
  File "/home/luke/dev/python/bot-lax-adaptor/src/utils.py", line 120, in datetime_handler
    raise TypeError('Object of type %s with value of %s is not JSON serializable' % (type(obj), repr(obj)))
TypeError: Object of type <type 'file'> with value of <open file 'article-xml/articles/elife-12215-v1.xml', mode 'r' at 0x7f7c0dad9390> is not JSON serializable
Logged from file main.py, line 659
Traceback (most recent call last):
  File "src/main.py", line 671, in <module>
    print main(doc, args)
  File "src/main.py", line 650, in main
    article_json = render_single(doc, **ctx)
  File "src/main.py", line 612, in render_single
    article_data = postprocess(render(description, [soup], ctx)[0], ctx)
  File "src/main.py", line 486, in postprocess
    partial(manual_overrides, ctx),
  File "/home/luke/dev/python/bot-lax-adaptor/venv/lib/python2.7/site-packages/et3/render.py", line 21, in doall
    item = do(item, seg, ctx)
  File "/home/luke/dev/python/bot-lax-adaptor/venv/lib/python2.7/site-packages/et3/render.py", line 13, in do
    return segment(item)
  File "src/main.py", line 428, in check_authors
    raise ValueError("Author missing required competingInterests", context)
ValueError: ('Author missing required competingInterests', {'version': 1, 'msid': u'12215'})
This is using the latest asset-file-label-title-testing-validation branch and the latest article-xml.
I haven't investigated yet, but I remember @gnott mentioning doing a complete scrape and not finding any errors.
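Note that the first traceback above is a secondary failure: while logging the render error, the formatter in conf.py passes the extra log fields to utils.json_dumps, and datetime_handler raises a TypeError because one of those fields is an open file object. A minimal sketch of a more forgiving fallback follows. It assumes that falling back to repr() is acceptable for log output, and the name lenient_handler is hypothetical rather than taken from the repository:

```python
import json
from datetime import datetime


def lenient_handler(obj):
    """Fallback for json.dumps: never raise while formatting a log record."""
    # Datetimes become ISO 8601 strings.
    if isinstance(obj, datetime):
        return obj.isoformat()
    # Assumption: for anything else (open files, custom objects) a repr()
    # string is good enough in a log line.
    return repr(obj)


def json_dumps(obj, **kwargs):
    # Same shape as the call shown in the traceback, with the lenient fallback.
    return json.dumps(obj, default=lenient_handler, **kwargs)
```

With a handler like this, the original "Author missing required competingInterests" error would still be logged, just without the logging traceback in front of it.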
We need to extend the changes made in this commit to handle invalid string values as well as None values.
We have had a case in production where a long string value was given instead of a valid ISBN. This took some debugging, as all we initially saw was 'NoneType' object is not iterable. The resulting trace:
File: ...isbnlib/_ext.py", line 20, in mask
return msk(isbn, separator)
File: ...isbnlib/_msk.py", line 25, in msk
ib = canonical(isbn)
File: ...isbnlib/_core.py", line 143, in canonical
numb = [c for c in isbnlike if c in '0123456789Xx']
TypeError: 'NoneType' object is not iterable
Pinging @lsh-0
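One way to guard against both None and arbitrary strings is to validate the value before handing it to isbnlib at all. The sketch below is an illustration only: canonical_isbn is a hypothetical helper, not part of bot-lax-adaptor or isbnlib, and it assumes that an invalid ISBN should simply be treated as missing. It uses the standard ISBN-10 and ISBN-13 checksums and only the standard library:

```python
import re


def canonical_isbn(value):
    """Return the bare ISBN characters, or None if value is not a valid ISBN."""
    # None, numbers and other non-string values are rejected up front.
    if not isinstance(value, str):
        return None
    # Drop hyphens and whitespace, normalise a trailing 'x' check character.
    chars = re.sub(r"[\s-]", "", value).upper()
    if re.fullmatch(r"\d{9}[\dX]", chars):
        # ISBN-10: weights 10..1, 'X' counts as 10, total divisible by 11.
        total = sum((10 - i) * (10 if c == "X" else int(c))
                    for i, c in enumerate(chars))
        return chars if total % 11 == 0 else None
    if re.fullmatch(r"\d{13}", chars):
        # ISBN-13: alternating weights 1 and 3, total divisible by 10.
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(chars))
        return chars if total % 10 == 0 else None
    # Anything else, including long free-text strings, is not an ISBN.
    return None
```

A caller would then only mask the value when canonical_isbn returns something other than None, and skip or flag the field otherwise.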
Making a note while I merge conflicts:
I think the logic in to_volume() for PoA may need to be reviewed. Volume should be calculated from the PoA article's pub date year, not from today's year.
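A rough sketch of the suggested behaviour is shown here. The signature is hypothetical and does not reflect the actual to_volume() in the codebase, and the mapping of volume 1 to 2012 is an assumption to check against the journal's real numbering:

```python
from datetime import date

# Assumption: volume 1 corresponds to 2012. Verify before relying on this.
FIRST_VOLUME_YEAR = 2012


def to_volume(pub_date, first_year=FIRST_VOLUME_YEAR):
    """Derive the volume from the article's publication date, not today's date."""
    return pub_date.year - first_year + 1
```

Because the result depends only on the pub date passed in, re-generating a PoA article in a later year yields the same volume.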