elifesciences / bot-lax-adaptor

License: GNU General Public License v3.0

Shell 15.00% Python 85.00%
article article-json elife jats-xml publishing xml-to-json

bot-lax-adaptor's People

Contributors

dependabot[bot], elife-alfred-user, giorgiosironi, gnott, jenniferstrej, lsh-0, nuclearredeye, seanwiseman, thewilkybarkid

bot-lax-adaptor's Issues

article 12215 v1 scrape error

Getting this error while attempting to regenerate a fixture for lax:

$ ./scrape-article.sh article-xml/articles/elife-12215-v1.xml 
ERROR - 2017-09-27 17:18:52,266 - failed to render doc 'https://raw.githubusercontent.com/elifesciences/elife-article-xml/f0dd69e8d0976e68868ced40de0b7140c040e250/articles/elife-12215-v1.xml' with error: ('Author missing required competingInterests', {'version': 1, 'msid': u'12215'}) -- {}
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
    msg = self.format(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
    return fmt.format(record)
  File "/home/luke/dev/python/bot-lax-adaptor/src/conf.py", line 84, in format
    record.__dict__['extra'] = utils.json_dumps(unknown_fields)
  File "/home/luke/dev/python/bot-lax-adaptor/src/utils.py", line 121, in json_dumps
    return json.dumps(obj, default=datetime_handler, **kwargs)
  File "/usr/lib64/python2.7/json/__init__.py", line 251, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/lib64/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
  File "/home/luke/dev/python/bot-lax-adaptor/src/utils.py", line 120, in datetime_handler
    raise TypeError('Object of type %s with value of %s is not JSON serializable' % (type(obj), repr(obj)))
TypeError: Object of type <type 'file'> with value of <open file 'article-xml/articles/elife-12215-v1.xml', mode 'r' at 0x7f7c0dad9390> is not JSON serializable
Logged from file main.py, line 659
Traceback (most recent call last):
  File "src/main.py", line 671, in <module>
    print main(doc, args)
  File "src/main.py", line 650, in main
    article_json = render_single(doc, **ctx)
  File "src/main.py", line 612, in render_single
    article_data = postprocess(render(description, [soup], ctx)[0], ctx)
  File "src/main.py", line 486, in postprocess
    partial(manual_overrides, ctx),
  File "/home/luke/dev/python/bot-lax-adaptor/venv/lib/python2.7/site-packages/et3/render.py", line 21, in doall
    item = do(item, seg, ctx)
  File "/home/luke/dev/python/bot-lax-adaptor/venv/lib/python2.7/site-packages/et3/render.py", line 13, in do
    return segment(item)
  File "src/main.py", line 428, in check_authors
    raise ValueError("Author missing required competingInterests", context)
ValueError: ('Author missing required competingInterests', {'version': 1, 'msid': u'12215'})

This is using the latest asset-file-label-title-testing-validation branch and the latest article-xml.

I haven't investigated yet, but I remember @gnott mentioning doing a complete scrape and not finding any errors.
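
Note that the output above contains two separate failures: the ValueError raised by check_authors, and a TypeError raised while logging it, because an open file object reached json.dumps through the log record's extra fields. For the second problem, here is a minimal sketch of a more forgiving `default` handler, assuming a readable placeholder is acceptable in log output. `json_default` and `safe_json_dumps` are hypothetical names, not the project's `utils.json_dumps`, and the sketch is written for Python 3 rather than the 2.7 shown in the trace:

```python
import json
from datetime import datetime


def json_default(obj):
    """Fallback used by json.dumps for objects it cannot encode natively."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    # Assumption: in a log formatter, a repr() placeholder is preferable to
    # raising, since raising hides the error that was being logged.
    return repr(obj)


def safe_json_dumps(obj, **kwargs):
    """Serialise obj to JSON without failing on file handles and similar objects."""
    return json.dumps(obj, default=json_default, **kwargs)
```

With a handler like this, the formatter would emit the file object's repr in the `extra` field and the original "Author missing required competingInterests" message would still be logged cleanly.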

ISBN validation failure: `TypeError: 'NoneType' object is not iterable`

Need to extend the changes made in this commit to handle invalid string values as well as None values.

We have had a case in production where a long string value was given instead of a valid ISBN, which resulted in the trace below. This took some debugging, as all we initially saw was `'NoneType' object is not iterable`.

File: ...isbnlib/_ext.py", line 20, in mask
    return msk(isbn, separator)
File: ...isbnlib/_msk.py", line 25, in msk
    ib = canonical(isbn)
File: ...isbnlib/_core.py", line 143, in canonical
    numb = [c for c in isbnlike if c in '0123456789Xx']
TypeError: 'NoneType' object is not iterable

Pinging @lsh-0
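
One way to cover both the None case and the invalid-string case is to validate the value before it is handed to isbnlib at all. The following is a rough standard-library sketch of such a guard, not the project's actual fix. `clean_isbn` is a hypothetical helper, and it assumes Python 3 string types:

```python
DIGITS = "0123456789"


def clean_isbn(value):
    """Return the bare ISBN-10/13 characters, or None if value is not a valid ISBN."""
    if not isinstance(value, str):
        return None  # covers None and any other non-string input
    candidate = value.replace("-", "").replace(" ", "").upper()
    if len(candidate) == 10:
        body, check = candidate[:9], candidate[9]
        if not all(c in DIGITS for c in body) or check not in DIGITS + "X":
            return None
        # ISBN-10 checksum: weights 10..1, 'X' counts as 10, sum divisible by 11
        total = sum((10 - i) * (10 if c == "X" else int(c))
                    for i, c in enumerate(candidate))
        return candidate if total % 11 == 0 else None
    if len(candidate) == 13 and all(c in DIGITS for c in candidate):
        # ISBN-13 checksum: alternating weights 1 and 3, sum divisible by 10
        total = sum(int(c) * (1 if i % 2 == 0 else 3)
                    for i, c in enumerate(candidate))
        return candidate if total % 10 == 0 else None
    return None  # wrong length or stray characters, e.g. a long free-text string
```

The caller would then only format the ISBN when `clean_isbn` returns a value, and otherwise skip it or report a validation error that names the offending input.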

to_volume review for PoA

Making a note while I merge conflicts:

I think the logic on to_volume() for PoA may need to be reviewed. Volume should be calculated based on the PoA article's pub date year, and not from today's year.
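
A sketch of the suggested behaviour, deriving the volume from the publication date instead of from today's date. `volume_from_pub_date` and its `first_volume_year` parameter are hypothetical, and the default of 2012 as the year of volume 1 is an assumption that should be checked against the existing to_volume() logic:

```python
from datetime import date


def volume_from_pub_date(pub_date, first_volume_year=2012):
    """Volume number for an article, based on the year it was published.

    Assumption: one volume per calendar year, with volume 1 published in
    first_volume_year.
    """
    return pub_date.year - first_volume_year + 1
```

Because the function depends only on the article's own date, regenerating a PoA article in a later year would give the same volume as the original run.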
