divvun / divvun-api

Web server for language processing

Home Page: https://divvun.github.io/divvun-api

Rust 94.05% Gherkin 4.28% Dockerfile 0.66% Shell 1.00%
openapi language-technology proofing-tools minority-language indigenous-languages

Divvun API

Starts a web server for accessing the Divvun spellcheck API.

See https://divvun.github.io/divvun-api/index.html for installation and usage instructions.

OpenAPI

The OpenAPI documentation is generated with ReDoc and hosted at https://divvun.github.io/divvun-api/redoc-static.html

To refresh the documentation, install the redoc-cli NPM package (npm i -g redoc-cli) and run redoc-cli bundle openapi.yml. This will generate a redoc-static.html file that needs to be placed in the docs folder.

To refresh docs/index.html, cd docs/ and run asciidoctor index.adoc.

Testing

Tests use the files in tests/resources/data_files. The files need to be organized as follows before running cargo test:

tests
|--resources
   |--data_files
      |  se.zcheck
      |  se.zhfst
      |  smj.zcheck
      |  smj.zhfst
      |
      |--grammar
         |  se.zcheck
      |--hyphenation
         |  se.hfstol
      |--spelling
         |  se.zhfst

The base data_files folder must contain both the se and smj grammar (.zcheck) and checker (.zhfst) files, which are used to test the file watcher. The se files must also be present in the spelling, hyphenation, and grammar folders, which are used to test loading of files at startup.

  • run cargo test
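Before running cargo test, the layout above can be checked with a short script. The following is a minimal sketch in Python; the script and the name missing_test_files are illustrative and not part of the repository. The file list is copied from the tree above.

```python
from pathlib import Path

# Relative paths copied from the directory tree in this section.
EXPECTED_FILES = [
    "se.zcheck",
    "se.zhfst",
    "smj.zcheck",
    "smj.zhfst",
    "grammar/se.zcheck",
    "hyphenation/se.hfstol",
    "spelling/se.zhfst",
]


def missing_test_files(data_dir):
    """Return the expected files that are absent under data_dir, in listed order."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the repository root.
    missing = missing_test_files("tests/resources/data_files")
    if missing:
        print("Missing test data files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected test data files are present.")
```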

Deployment

Additional steps for deployment.

Requirements

  • Create a regular user with sudo privileges (default: ubuntu)
  • Create an API user with which the API will run (default: api)
  • Set up SSH access
  • Install Python

Set the admin_email variable to receive emails from Let's Encrypt when it's time to renew the HTTPS certificate.

Docker image

This project is built into a Docker image. The Docker image is deployed by https://github.com/divvun/divvun-api-deploy/

You need to build the Docker image on an x86 machine. An M1 Mac won't do.

docker build -t divvun/divvun-api:v2 .

Now, this image is not uploaded to a repository. It just sits on the machine that built it, so you'll have to pack it yourself.

Crimes

Basic litmus tests for spellers and grammar

$ curl -X POST -H 'Content-Type: application/json' 'https://api-giellalt.uit.no/speller/se' --data '{"text": "pahkat"}'
{"text":"pahkat","results":[{"word":"pahkat","is_correct":false,"suggestions":[{"value":"páhkat","weight":15.301758},{"value":"páhkkat","weight":21.3018},{"value":"dahkat","weight":33.012695},{"value":"háhkat","weight":34.89453},{"value":"ráhkat","weight":38.691406},{"value":"čáhkat","weight":38.79785},{"value":"hahkát","weight":39.896484},{"value":"báhkat","weight":39.89746},{"value":"Ráhkat","weight":40.05078},{"value":"páhka","weight":40.301758}]}]}
$ curl -X POST -H 'Content-Type: application/json' 'https://api-giellalt.uit.no/grammar/se' --data '{"text": "Danne lea politijuristtaide eanemus praktihkkalaččat vuogas dan dahkat Čáhcesullos."}'
{"text":"Danne lea politijuristtaide eanemus praktihkkalaččat vuogas dan dahkat Čáhcesullos.","errs":[{"error_text":"politijuristtaide","start_index":10,"end_index":27,"error_code":"typo","description":"Ii leat sátnelisttus","suggestions":["politiijajuristtaide"],"title":"Čállinmeattáhus"},{"error_text":"praktihkkalaččat","start_index":36,"end_index":52,"error_code":"typo","description":"Ii leat sátnelisttus","suggestions":["praktihkalaččat","praktihkalat","praktihkalet","praktihkalit","praktihkalut"],"title":"Čállinmeattáhus"}]}
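When eyeballing the grammar output, it helps to confirm that start_index and end_index really point at the flagged words. The sketch below is Python, with a sample abridged from the response above; the helper name flagged_spans is made up for illustration. In this sample the indices line up with Python string offsets and an exclusive end index. Whether the API counts code points in general is an assumption.

```python
# Abridged from the grammar response shown above (description and title dropped).
SAMPLE = {
    "text": "Danne lea politijuristtaide eanemus praktihkkalaččat vuogas dan dahkat Čáhcesullos.",
    "errs": [
        {
            "error_text": "politijuristtaide",
            "start_index": 10,
            "end_index": 27,
            "suggestions": ["politiijajuristtaide"],
        },
        {
            "error_text": "praktihkkalaččat",
            "start_index": 36,
            "end_index": 52,
            "suggestions": [
                "praktihkalaččat",
                "praktihkalat",
                "praktihkalet",
                "praktihkalit",
                "praktihkalut",
            ],
        },
    ],
}


def flagged_spans(response):
    """Return (flagged_text, suggestions) pairs, slicing the text by the reported indices."""
    text = response["text"]
    return [
        (text[err["start_index"]:err["end_index"]], err["suggestions"])
        for err in response["errs"]
    ]


if __name__ == "__main__":
    for flagged, suggestions in flagged_spans(SAMPLE):
        print(flagged, "->", ", ".join(suggestions))
```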

Pack your image:

docker save divvun/divvun-api:v2 | gzip > divvun-api-v2.tar.gz

Copy your image to the divvun-api server:

scp divvun-api-v2.tar.gz [email protected]:

Load your image:

% ssh [email protected]
$ gunzip --stdout divvun-api-v2.tar.gz | docker load
8553b91047da: Loading layer [==================================================>]  84.01MB/84.01MB
22050e545130: Loading layer [==================================================>]  22.38MB/22.38MB
16c60edfa394: Loading layer [==================================================>]    215kB/215kB
86ffb3b2b27a: Loading layer [==================================================>]  80.01MB/80.01MB
c642b882d7d8: Loading layer [==================================================>]  1.536kB/1.536kB
0b38c186a861: Loading layer [==================================================>]  16.72MB/16.72MB
75784fda3f36: Loading layer [==================================================>]   2.56kB/2.56kB
Loaded image: divvun/divvun-api:v2

The API runs as the api user by default. Switch to that user, go to the deploy directory, and change docker-compose.yml to use your new tag. Then restart docker-compose and start tailing the logs of the newly started divvun-api container.

su api -s /bin/bash
cd /home/api/dist
nano -w docker-compose.yml #change the image to use :v2
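docker-compose up -d #recreates the container from the new image; a plain "restart" does not pick up changes to docker-compose.yml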
docker logs -n10 -f dist_divvun_api_1

And then you rerun your litmus tests. Make sure to change the languages around so you're not testing something that would have worked anyway. And then you change your grammar packages around. And then you rerun the litmus tests.

Ultimately, you update the https://github.com/divvun/divvun-api-deploy/ repo and stop doing crimes.

Contributors

bbqsrc, fry, killercup, projektir, snomos, unhammer, zbrox

divvun-api's Issues

Speller support for Overleaf?

It would be nice to have speller support for our own languages in Overleaf. The Overleaf speller code is here.

The idea would be to add a simple layer that calls our speller API. It needs at least three functions:

  • get list of supported languages
  • check word
  • get suggestions for word
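The three functions could sit in a thin client along the following lines. This is only a sketch in Python: the class SpellerClient and its method names are hypothetical, the endpoints (/languages and /speller/<language>) and response shapes are taken from the curl examples elsewhere on this page, and treating an empty results list as "correct" is an assumption. It says nothing about how Overleaf itself expects a speller backend to look.

```python
import json
import urllib.request

API_ROOT = "https://api-giellalt.uit.no"


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def get_json(url):
    """GET a URL and return the decoded JSON response."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


class SpellerClient:
    """Hypothetical wrapper exposing the three functions listed above."""

    def __init__(self, post=post_json, get=get_json, root=API_ROOT):
        # The transport functions are injectable so the class can be tested offline.
        self._post = post
        self._get = get
        self._root = root

    def supported_languages(self):
        """Language codes listed under "speller" by the /languages endpoint."""
        data = self._get(self._root + "/languages")
        return sorted(data["available"]["speller"])

    def _lookup(self, language, word):
        data = self._post(f"{self._root}/speller/{language}", {"text": word})
        results = data["results"]
        return results[0] if results else None

    def check_word(self, language, word):
        """True if the speller accepts the word (assumed True when nothing is returned)."""
        result = self._lookup(language, word)
        return True if result is None else result["is_correct"]

    def suggestions(self, language, word):
        """Suggested replacements, in the order the API returns them."""
        result = self._lookup(language, word)
        return [] if result is None else [s["value"] for s in result["suggestions"]]
```

Each of check_word and suggestions issues its own request here; a real integration would probably cache the lookup, since one speller response carries both pieces of information.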

Need a function to list all available languages for a given service

For service discovery, there needs to be a way of getting a list of languages supported by a given service. An obvious candidate would be list, as in:

https://api-giellalt.uit.no/hyphenation/list

The function name list would not collide with any language codes, as these are always two or three letters long.

Without this function, there is no way external tools can utilise our services, unless language codes are hard-coded.

Also note that the list function needs to be on the level of tools, as the supported languages may vary from tool to tool.
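The routing idea can be sketched as follows. This is illustrative Python, not the server's actual (Rust) routing code: the LOADED registry, its contents, and the route function are hypothetical. It shows why list is safe as a reserved name: it has four letters, so it can never match the two-or-three-letter pattern used for language codes, and the listing is resolved per tool.

```python
import re

# Hypothetical registry of the languages each tool has loaded.
LOADED = {
    "speller": {"se": "davvisámegiella", "smj": "julevsámegiella"},
    "hyphenation": {"se": "davvisámegiella"},
}

# Language codes are always two or three letters long.
LANGUAGE_CODE = re.compile(r"[a-z]{2,3}")


def route(tool, segment):
    """Resolve /<tool>/<segment> to a per-tool listing or a language-specific request."""
    languages = LOADED.get(tool)
    if languages is None:
        return ("not_found", None)
    if segment == "list":
        return ("list", sorted(languages))
    if LANGUAGE_CODE.fullmatch(segment) and segment in languages:
        return ("process", segment)
    return ("not_found", None)
```

With this registry, route("hyphenation", "list") yields only the languages loaded for hyphenation, while route("speller", "smj") is dispatched to the speller as usual.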

Language options list returns invalid data for `hyphenation` field

The following command:

$ curl -s -X GET -k -H 'Content-Type: application/json' -i 'https://api-giellalt.uit.no/languages' | grep '{' | jq .

returns this data structure:

{
  "available": {
    "grammar": {
      "smj": "julevsámegiella",
      "sma": "Åarjelsaemien gïele",
      "sms": "nuõrttsääʹmǩiõll",
      "smn": "anarâškielâ",
      "se": "davvisámegiella"
    },
    "speller": {
      "se": "davvisámegiella",
      "smj": "julevsámegiella",
      "sma": "Åarjelsaemien gïele",
      "sms": "nuõrttsääʹmǩiõll",
      "smn": "anarâškielâ"
    },
    "hyphenation": {
      "hyphenator-gt-desc": "hyphenator-gt-desc"
    }
  }
}

The data returned for hyphenation is invalid. It should have the same shape as for speller and grammar: language codes as keys and language names as values.
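A check like the following would catch this kind of entry automatically. It is a sketch in Python that relies on the rule, stated in another issue on this page, that language codes are always two or three letters long; the function name invalid_language_keys is made up, and the response is abridged from the output above.

```python
import re

# Language codes are assumed to be two or three lowercase letters.
LANGUAGE_CODE = re.compile(r"[a-z]{2,3}")

# Abridged from the /languages response shown above.
RESPONSE = {
    "available": {
        "grammar": {"smj": "julevsámegiella", "se": "davvisámegiella"},
        "speller": {"se": "davvisámegiella", "smj": "julevsámegiella"},
        "hyphenation": {"hyphenator-gt-desc": "hyphenator-gt-desc"},
    }
}


def invalid_language_keys(languages_response):
    """Map each service to the keys that do not look like language codes."""
    problems = {}
    for service, entries in languages_response["available"].items():
        bad = [key for key in entries if not LANGUAGE_CODE.fullmatch(key)]
        if bad:
            problems[service] = bad
    return problems


if __name__ == "__main__":
    print(invalid_language_keys(RESPONSE))
    # prints {'hyphenation': ['hyphenator-gt-desc']}
```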

Make hyphenation server work properly

The hyphenation server is not working properly, cf #3. We have also received requests for hyphenation services, which we presently can't offer.

At least the following steps are needed:

  • install FST hyphenator for at least two languages
  • test that they are working properly
  • support hyphenation of both single words and whole texts (paragraphs?); the two must be differentiated in the API
  • offer the hyphenation service in a web interface (we already have the web interface, but we need to make it use our API server)

Documentation outdated

The GraphQL part is outdated: https://divvun.github.io/divvun-api/#_graphql

Running the documented example returns an error:

curl -X POST -H 'Content-Type: application/json' -i 'https://api-giellalt.uit.no/graphql' --data  '{ "query": "query { suggestions(text: \"pákhat\", language: \"se\") { speller { isCorrect }, grammar { errs { startIndex endIndex errorCode description suggestions title } } } }" }'
HTTP/2 200 
content-type: application/json
date: Tue, 14 May 2024 08:09:23 GMT
server: Caddy
content-length: 111

{"errors":[{"message":"Unknown field \"isCorrect\" on type \"Speller\"","locations":[{"line":1,"column":65}]}]}
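Note that the response above carries HTTP status 200 even though the query failed, so a client that only checks the status code would miss the error. A client can look for the errors array in the body instead. A minimal sketch in Python, using the body shown above; the helper name graphql_error_messages is made up.

```python
import json

# Response body copied from the output above.
BODY = r'{"errors":[{"message":"Unknown field \"isCorrect\" on type \"Speller\"","locations":[{"line":1,"column":65}]}]}'


def graphql_error_messages(body):
    """Return the error messages in a GraphQL response body, or [] if there are none."""
    payload = json.loads(body)
    return [error["message"] for error in payload.get("errors", [])]


if __name__ == "__main__":
    for message in graphql_error_messages(BODY):
        print("GraphQL error:", message)
```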
