GithubHelp home page GithubHelp logo

mindrones / nestauk-dap_dv_backends Goto Github PK

View Code? Open in Web Editor NEW
0.0 2.0 0.0 10.33 MB

Copy of https://github.com/nestauk/dap_dv_backends

License: MIT License

JavaScript 77.22% Cypher 0.53% Shell 22.00% Dockerfile 0.24%

nestauk-dap_dv_backends's Introduction

dap_dv_backends Repository

Overview

This system is designed to annotate text from documents using DbPedia Spotlight from Wikipedia in such a way that annotations (topics) are Wikipedia pages.

It does so by creating an infrastructure of EC2 instances to annotate batches of documents stored in either S3 or ElasticSearch, orchestrating the flow of information automatically. You'll find a more thorough description of the architecture in docs/architecture.md.md.

Requirements

This system relies on having an ElasticSearch index. There are two scenarios:

  • The data is in this index: In this case the index acts as a source and recipient of the annotations.
  • The data is in S3: In this case the ElasticSearch index acts as a recipient of the annotations.

Using a tool (e.g. Insomnia) select your domain URI pointing to the ES index domain (e.g. in our case https://es.annotations.dap-tools.uk/).

Then create the index:

PUT /my-index

Use that index name for data ingestion and annotation tasks.

Installations

Please refer to docs/installation.md.

Usage

To annotate data, clients send requests to the reverse proxy, which then routes these to the appropriate service based on the request path. This setup simplifies access control, monitoring, and management of the system's components.

The process is in 3 steps:

  • Make sure you have AWS credentials in ~/.aws
  • Navigate to the dap_dv_backends repo, then:
    • edit src/services/config.mjs
    • push the changes to the repo
    • npm install if needed
    • npm run deployInfra
  • To request a token, in your browser:
    • navigate to https://<API_SUBDOMAIN>.<BASE_DOMAIN>/static/index.html (for Nesta this would be https://api.dap-tools.uk/auth/static/index.html)
    • open the /request section
    • click on Try it out
    • insert your email, click on Execute: this should send you an email
  • Check your email for an email with the token then:
    • copy the token
    • click on the activation link
  • In your terminal:
    • export set NESTA_EMAIL=<email>
    • export set NESTA_TOKEN=<token>
  • Navigate to the nestauk/dsp_waifinder repo, then:
    • cd etl
    • npm install
    • npm install dap_dv_backends_utils (this installs from a git url so make sure you have the latest version, 0.0.16)
    • edit the npm script to point to the domains you have chosen, so for example in [1] check make sure the -e is correct
    • npm run annotateData
  • When the annotation has finished:
    • you should receive an email to notify you the annotation is done
    • navigate to your local dap_dv_backends repo
    • npm run deleteInfra

[1] "annotateData": "npx annotateEsIndex -d es.annotations.dap-tools.uk -e https://api.dap-tools.uk/annotation -i ai_map -f description"

nestauk-dap_dv_backends's People

Contributors

doogyb avatar saabi avatar mindrones avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.