This project forked from elifesciences/bot-lax-adaptor

License: GNU General Public License v3.0

bot-lax-adaptor

This application:

  1. listens for messages from the elife-bot
  2. downloads XML from S3 via HTTP
  3. converts XML to a mostly complete representation of our article-json schema
  4. sends article-json to Lax to be ingested
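The four steps above can be sketched as a single handler. This is only an illustration under stated assumptions, not the project's actual code: the message keys `id` and `location`, the `download` and `send_to_lax` callables, and the two-field output are all hypothetical, and the real conversion produces a far richer article-json document.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Dict


def handle_message(message: Dict[str, str],
                   download: Callable[[str], bytes],
                   send_to_lax: Callable[[str], None]) -> dict:
    """Process one already-received bot message (step 1 happens upstream).

    The message layout and both callables are assumptions for this sketch.
    """
    # Step 2: fetch the XML over HTTP (delegated to the injected callable).
    xml_bytes = download(message["location"])

    # Step 3: convert XML to a (deliberately tiny) JSON-like structure.
    root = ET.fromstring(xml_bytes)
    title = root.findtext(".//article-title", default="")
    article_json = {"id": message["id"], "title": title}

    # Step 4: hand the serialised result to Lax (again via a callable).
    send_to_lax(json.dumps(article_json))
    return article_json
```

Passing `download` and `send_to_lax` in as arguments keeps the sketch free of network access, so it can be exercised with an in-memory XML string and a list's `append` method.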

installation

$ ./install.sh

web interface

The bot-lax-adaptor comes with a simple web interface that allows uploading eLife JATS XML, generating article-json from it and then validating it.

$ ./web.sh

See example-upload-file-to-api.sh for an example of uploading a file to the API.

conversion

$ source venv/bin/activate
$ python src/main.py /path/to/a/jats.xml

Output at time of writing looks like this.

convert specific article

Thin wrapper around the above command:

$ ./scrape-article.sh ./article-xml/articles/elife-09560-v1.xml

convert random article

Converts a random article to article-json:

$ ./scrape-random-article.sh

convert all articles

Converts all articles in the ./article-xml/articles/ directory, writing the results to ./article-json/. This script makes use of all available cores:

$ ./generate-article-json.sh
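As a rough illustration of converting a directory of XML files across all cores, the sketch below uses Python's `concurrent.futures`. It is not how generate-article-json.sh is implemented; `convert_one` is a placeholder that writes a stub JSON file, and only the `.xml.json` output naming is taken from the examples in this README.

```python
import json
import os
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import List, Tuple


def output_path_for(xml_path: Path, out_dir: Path) -> Path:
    """Mirror the naming used in this README: foo.xml -> foo.xml.json."""
    return out_dir / (xml_path.name + ".json")


def convert_one(job: Tuple[str, str]) -> str:
    """Placeholder worker: a real one would build article-json from the XML."""
    xml_path, json_path = job
    Path(json_path).write_text(json.dumps({"source": Path(xml_path).name}))
    return json_path


def convert_all(src_dir: Path, out_dir: Path) -> List[str]:
    """Convert every *.xml file in src_dir, one worker process per core."""
    out_dir.mkdir(parents=True, exist_ok=True)
    jobs = [(str(p), str(output_path_for(p, out_dir)))
            for p in sorted(src_dir.glob("*.xml"))]
    # os.cpu_count() may return None, in which case the executor picks a default.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        return list(pool.map(convert_one, jobs))
```

When run as a script, `convert_all(Path("article-xml/articles"), Path("article-json"))` should be called under an `if __name__ == "__main__":` guard, which process pools require on platforms that spawn workers.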

validation

The article-json generated by this application is structured according to the eLife json-schema article specification.

Because the XML only contains a partial representation of an article, validation also involves filling in certain gaps that can only be provided by Lax.

$ source venv/bin/activate
$ python src/validate.py /path/to/an/article.json
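The idea of filling gaps before validating can be sketched as follows. Everything specific here is an assumption made for illustration: the placeholder field names and values, and the short list of required keys standing in for the real json-schema. The actual src/validate.py may work quite differently.

```python
import copy
from typing import Dict, List

# Hypothetical values that only Lax could supply for a real article.
PLACEHOLDERS: Dict[str, str] = {
    "published": "1970-01-01T00:00:00Z",
    "versionDate": "1970-01-01T00:00:00Z",
}

# Hypothetical stand-in for the required properties of the real schema.
REQUIRED: List[str] = ["id", "title", "published", "versionDate"]


def fill_gaps(article: dict) -> dict:
    """Return a copy of the article with placeholders for missing fields."""
    filled = copy.deepcopy(article)
    for key, value in PLACEHOLDERS.items():
        filled.setdefault(key, value)  # never overwrite a value that exists
    return filled


def missing_fields(article: dict) -> List[str]:
    """List required keys that are absent; an empty list means 'valid' here."""
    return [key for key in REQUIRED if key not in article]
```

Working on a copy means the placeholders never leak into the article-json that is later sent on, which is one plausible reason to keep gap filling separate from generation.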

validate specific article-json

Thin wrapper around the above command:

$ ./validate-json.sh ./article-json/elife-09560-v1.xml.json

validate all article-json

Validates all article-json in the ./article-json/ directory. This script makes use of all available cores:

$ ./validate-all-json.sh

backfill

populating a Lax installation

For each article in the article-xml repository, this generates article-json, validates it and then performs an ingest --force into Lax.

$ ./backfill.sh

The generation, validation and ingest actions happen in separate steps for greater parallelism.
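A minimal sketch of the staged approach: each stage runs over the whole batch before the next begins, so every stage can be parallelised on its own. The use of a thread pool, the worker count, and the `generate`, `validate` and `ingest` callables are assumptions for illustration, not a description of backfill.sh.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_stage(func: Callable, items: Iterable, workers: int = 4) -> list:
    """Apply func to every item concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


def backfill(xml_paths: List[str],
             generate: Callable[[str], dict],
             validate: Callable[[dict], bool],
             ingest: Callable[[dict], None]) -> List[dict]:
    """Run generate, validate and ingest as three separate batch stages."""
    generated = run_stage(generate, xml_paths)       # stage 1: all articles
    verdicts = run_stage(validate, generated)        # stage 2: all articles
    valid = [a for a, ok in zip(generated, verdicts) if ok]
    run_stage(ingest, valid)                         # stage 3: valid ones only
    return valid
```

In this sketch an article that fails validation is simply dropped before the ingest stage rather than aborting the run.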

updating a small subset of articles in Lax

This reads a list of article IDs from a file, then generates, validates and performs an ingest --force into Lax for each article in sequence. Because nothing runs in parallel, it can be quite slow for a large number of articles.

$ ./backfill-many.sh
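The sequential flow can be sketched as below. The file format (one ID per line, with blank lines and `#` comments ignored) is an assumption, as are the `generate`, `validate` and `ingest` callables; backfill-many.sh itself may expect something different.

```python
from typing import Callable, Iterable, List


def parse_article_ids(text: str) -> List[str]:
    """Extract IDs from file contents, skipping blanks and '#' comments."""
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


def backfill_many(ids: Iterable[str],
                  generate: Callable[[str], dict],
                  validate: Callable[[dict], None],
                  ingest: Callable[..., None]) -> List[str]:
    """Process one article at a time; validate is expected to raise on failure."""
    done = []
    for article_id in ids:
        article_json = generate(article_id)
        validate(article_json)
        ingest(article_json, force=True)  # mirrors the 'ingest --force' above
        done.append(article_id)
    return done
```

Each article finishes all three steps before the next one starts, which is why the total run time grows linearly with the length of the list.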

listening/sending

receiving messages from an AWS SQS queue

This is quite eLife-specific but can be modified easily if you're a developer:

$ ./bot-lax-listener.sh

testing

$ ./test.sh

Copyright & Licence

Copyright 2023 eLife Sciences. Licensed under the GPLv3.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Contributors

gnott, elife-alfred-user, giorgiosironi, lsh-0, seanwiseman, thewilkybarkid, jenniferstrej, dependabot[bot], nuclearredeye