cceh / kosh

Kosh - APIs for Lexical Data

Home Page: https://kosh.uni-koeln.de

License: MIT License

Python 99.66% Dockerfile 0.34%
api graphql rest

kosh's People

Contributors

fmondaca, lguenth, querela, schlusslicht, vocabulista

kosh's Issues

Support for python 3.8+?

I think you could relax the Python version requirement from 3.11 down to 3.8. Is there a reason not to support older Python versions, such as the use of new syntax? Locally, I could run kosh with 3.8, but I have not yet tested all possible code paths, just some random API requests.

Implement sensible "Not Found" message

Currently, when using the ids endpoint for some non-existent ID, the API returns an error instead of a sensible "not found" message, e.g.:

2019-07-04 12:49:39 [ERROR] <graphql.execution.executor> An error occurred while resolving field query.ids
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/graphql/execution/executor.py", line 450, in resolve_or_error
    return executor.execute(resolve_fn, source, info, **args)
  File "/usr/lib/python3.7/site-packages/graphql/execution/executors/sync.py", line 16, in execute
    return fn(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/kosh-0.0.1-py3.7.egg/kosh/api/graphql.py", line 50, in resolve_ids
    return search.ids(elex, ids)
  File "/usr/lib/python3.7/site-packages/kosh-0.0.1-py3.7.egg/kosh/elastic/search.py", line 29, in ids
    }) for item in find.mget(ids)]
  File "/usr/lib/python3.7/site-packages/kosh-0.0.1-py3.7.egg/kosh/elastic/search.py", line 29, in <listcomp>
    }) for item in find.mget(ids)]
AttributeError: 'NoneType' object has no attribute 'to_dict'

This results in a GraphQL error:

{
  "errors": [
    {
      "message": "RequestError(400, 'Required routing not provided for documents .')",
      "locations": [
        {
          "line": 2,
          "column": 3
        }
      ],
      "path": [
        "ids"
      ]
    }
  ],
  "data": {
    "ids": null
  }
}
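
A minimal sketch of one way to avoid the AttributeError, assuming (as the traceback suggests) that the lookup yields None for unknown IDs. FakeHit, collect_found and not_found_message are hypothetical names used for illustration only, not kosh's actual code:

```python
from typing import Any, Dict, Iterable, List, Optional


class FakeHit:
    """Hypothetical stand-in for a search hit that exposes to_dict()."""

    def __init__(self, **fields: Any) -> None:
        self._fields = fields

    def to_dict(self) -> Dict[str, Any]:
        return dict(self._fields)


def collect_found(items: Iterable[Optional[FakeHit]]) -> List[Dict[str, Any]]:
    """Convert hits to dicts, skipping the None entries returned for unknown IDs."""
    return [item.to_dict() for item in items if item is not None]


def not_found_message(ids: List[str], found: List[Dict[str, Any]]) -> Optional[str]:
    """Return a readable message naming the IDs without an entry, else None."""
    known = {entry["id"] for entry in found}
    missing = [i for i in ids if i not in known]
    if not missing:
        return None
    return f"No entry found for ID(s): {', '.join(missing)}"
```

The message could then be attached to the API response instead of letting the exception bubble up to the GraphQL executor.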

Empty response if one of requested IDs is invalid

You get an empty response if one of the requested IDs is invalid:

request

{
  ids(ids: ["lemma_agni_79", "invalidIDwhatever"]) {
    id
    headwordIso
  }
}

response

{
  "data": {
    "ids": []
  }
}

... should be either a response containing only the matched documents:

{
  "data": {
    "ids": [
      {
        "id": "lemma_agni_79",
        "headwordIso": "agni"
      }
    ]
  }
}

... or even a response with placeholders for the invalid IDs requested:

{
  "data": {
    "ids": [
      {
        "id": "lemma_agni_79",
        "headwordIso": "agni"
      },
      {
        "id": "invalidIDwhatever"
      }
    ]
  }
}
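
Both proposed behaviours can be sketched in one helper. This assumes the lookup returns one result per requested ID, in request order, with None for IDs that do not exist; merge_with_placeholders and its keep_missing flag are hypothetical names, not part of kosh:

```python
from typing import Any, Dict, List, Optional


def merge_with_placeholders(
    ids: List[str],
    docs: List[Optional[Dict[str, Any]]],
    keep_missing: bool = False,
) -> List[Dict[str, Any]]:
    """Pair each requested ID with its document.

    Missing documents are dropped by default; with keep_missing=True they
    are replaced by a placeholder that carries only the requested ID.
    """
    result: List[Dict[str, Any]] = []
    for requested_id, doc in zip(ids, docs):
        if doc is not None:
            result.append(doc)
        elif keep_missing:
            result.append({"id": requested_id})
    return result
```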

add authors

{
  "mappings": {
    "_meta": {
      "title": "My Nicename",
      "authors": ["Peter Pan", "Wendy"],   // <--
      "_xpaths": {
        "id": "./@id",
        "root": "//entry",
        "fields": {
          "lemma": "./form/orth",
          "[sense_def]": "./sense/def",
          "[sense_pos]": "./sense/gramGrp/pos/q",
          "[dicteg]": "./sense/dicteg/q"
        }
      }
    },
    "properties": {
      "lemma": {
        "type": "keyword"
      },
      "sense_def": {
        "type": "text"
      },
      "sense_pos": {
        "type": "text"
      },
      "dicteg": {
        "type": "text"
      }
    }
  }
}

Improve parser error handling

Currently, when the parsing process of any XML resource or .kosh definition fails, kosh breaks down completely. We should catch parsing errors, print a nice log message and ignore the respective (broken) data module.
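
A rough sketch of the proposed behaviour using only the standard library. The parse_modules helper, the logger name, and the idea of passing modules as name/XML-string pairs are assumptions for illustration; kosh's real loader is organised differently:

```python
import logging
import xml.etree.ElementTree as ET
from typing import Dict

logger = logging.getLogger("kosh.sketch")  # hypothetical logger name


def parse_modules(sources: Dict[str, str]) -> Dict[str, ET.Element]:
    """Parse each XML string; log and skip the ones that fail to parse."""
    parsed: Dict[str, ET.Element] = {}
    for name, xml_text in sources.items():
        try:
            parsed[name] = ET.fromstring(xml_text)
        except ET.ParseError as error:
            # A broken module is reported and ignored; the others still load.
            logger.error("Skipping data module %r: %s", name, error)
    return parsed
```

The same pattern would apply to the .kosh definition files, catching whatever exception their parser raises.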

Query aggregations

It should be possible to use aggregation queries via either REST or GraphQL (and any other, future API modules).
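
As an illustration of what such a request might send to Elasticsearch, here is a sketch that builds a terms aggregation body as a plain dict. The terms_aggregation helper and the by_<field> bucket name are made up for this example; how kosh would expose this through its APIs is left open:

```python
from typing import Any, Dict


def terms_aggregation(field: str, size: int = 10) -> Dict[str, Any]:
    """Build a search body that counts documents per distinct value of a field."""
    return {
        "size": 0,  # return only aggregation buckets, no search hits
        "aggs": {
            f"by_{field}": {"terms": {"field": field, "size": size}},
        },
    }
```

Note that terms aggregations are meant for keyword-like fields (such as lemma in the examples on this page), not for analysed text fields.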

data_sync parameter: Interval or Boolean?

Hi Phil @schlusslicht :)
While testing each CLI parameter again for #127, I noticed something odd with the data_sync setting. Using something like kosh --data_sync 20 raises a ValueError.

From the docstring and the default value set in kosh.ini, data_sync is supposed to be an integer, with a value of 0 meaning that the feature is disabled. This is also reflected in the watch function in kosh.py.

However, in the parse function for this parameter, the value is handled as a truth literal instead, using strtobool.

You introduced this parameter a long while ago but only changed the default value to an integer in your overhaul commit 4 months back, when you also added the watcher, I believe.
Did you mean to change the parser as well at some point or am I reading this wrong? 😅
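
If the integer reading is the intended one, the parser could look roughly like this. parse_data_sync is a hypothetical name and this is only a sketch of the idea, not the actual kosh parse function:

```python
def parse_data_sync(value: str) -> int:
    """Parse data_sync as a non-negative integer; 0 disables the feature."""
    try:
        interval = int(value)
    except ValueError:
        raise ValueError(f"data_sync must be an integer, got {value!r}") from None
    if interval < 0:
        raise ValueError(f"data_sync must be >= 0, got {interval}")
    return interval
```

With this, kosh --data_sync 20 would yield 20, while a truth literal such as "yes" would be rejected with a clear message.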

Add title and authors

We should implement an XPath property or a custom field within the JSON mappings to assign a human-readable name to any imported/served dictionary, along with information about its authors.

E.g.:

{
  "mappings": {
    "_meta": {
      "title": "My Nicename", // <--
      "authors": ["Peter Pan", "Wendy"], // <--
      "_xpaths": {
        "title": "//title", // <--
        "id": "./@xml:id",
        "root": "//tei:cit[@type='item']",
        "fields": {
          ...
        }
      }
    },
    "properties": {
      ...
    }
  }
}
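
Reading the static variant of these fields could be sketched as follows. read_meta and its fallback_title parameter are hypothetical; the sketch assumes the mapping has already been loaded into a dict and does not cover the XPath-based title:

```python
from typing import Any, Dict, List, Tuple


def read_meta(mapping: Dict[str, Any], fallback_title: str) -> Tuple[str, List[str]]:
    """Return (title, authors) from mappings._meta, with defaults if absent."""
    meta = mapping.get("mappings", {}).get("_meta", {})
    title = meta.get("title", fallback_title)
    authors = list(meta.get("authors", []))
    return title, authors
```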

pip(3) and python(3) calls in Makefile

When trying to make kosh within a Python venv, the pip and python binaries are not aliased by their version number (e.g. pip2/pip3 and python2/python3). Therefore, the python3 call in the Makefile fails.

Read XML node attribute values

Kosh currently only supports indexing the text content of XML nodes, but not their attributes. This needs to be implemented, as many datasets provide crucial information within XML node attributes rather than in the text content.
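
One possible shape for this, sketched with the standard library's ElementTree, which does not itself understand a trailing /@name step. The extract_values helper and the convention of ending a path in /@name are assumptions for illustration; namespaced attributes such as @xml:id are not handled here:

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_values(root: ET.Element, path: str) -> List[str]:
    """Return text content, or attribute values if the path ends in '/@name'."""
    if "/@" in path:
        # Split "./form/orth/@lang" into the element path and the attribute name.
        element_path, attribute = path.rsplit("/@", 1)
        values = [node.get(attribute) for node in root.findall(element_path)]
        return [value for value in values if value is not None]
    return [node.text or "" for node in root.findall(path)]
```

For example, with an entry such as <entry id="e1"><form><orth lang="sa">agni</orth></form></entry>, the path ./form/orth/@lang selects the lang attribute while ./form/orth still selects the text.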

Trace (possible) memory leak

When running kosh instances long-term, their memory usage grows significantly, which indicates a memory leak. Most probably this leak originates in the XML-parsing components of kosh, possibly because of the use of threads.

This needs to be traced by, e.g., using valgrind in combination with PYTHONMALLOC=malloc.
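
As a complementary first step (not a replacement for valgrind), Python's built-in tracemalloc module can show which Python source lines allocate the most memory between two points in time. The top_growth helper is a hypothetical sketch; note that tracemalloc only sees allocations made through Python's allocator, so leaks inside native XML libraries would still need valgrind:

```python
import tracemalloc
from typing import Callable, List


def top_growth(workload: Callable[[], None], limit: int = 5) -> List[str]:
    """Run workload and report the source lines whose allocations changed most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    workload()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to returns differences sorted with the largest changes first.
    stats = after.compare_to(before, "lineno")
    return [str(stat) for stat in stats[:limit]]
```

Running this around one re-index cycle, repeatedly, might indicate whether the growth comes from Python objects at all.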

Query multiple fields

It seems essential that one can query multiple fields. To implement this, we need to:

  • implement search logic to query multiple fields (e.g. encapsulate in a boolean query)
  • enable the Swagger UI/RESTful API to add key/value pairs with predefined keys
  • enable the GraphiQL/GraphQL API to add key/value pairs with predefined keys
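
The first point could be sketched as a function that wraps one match clause per field in an Elasticsearch bool query. multi_field_query is a hypothetical name, and using must (all fields have to match) rather than should is an assumption about the desired semantics:

```python
from typing import Any, Dict


def multi_field_query(criteria: Dict[str, str]) -> Dict[str, Any]:
    """Build a bool query that requires every field/value pair to match."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {field: value}}
                    for field, value in criteria.items()
                ]
            }
        }
    }
```

The key/value pairs collected by the REST or GraphQL layer would simply be passed in as the criteria dict.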

Id field in entries query broken

When using the entries query, searching within the id field is broken, i.e. it never returns any results. Either the id field should not be searchable from the entries query, or the search logic must be fixed.

Query nested objects

It should be possible to (1) index nested objects by specifying an object structure in the respective JSON definitions file. It should then also be possible to (2) query the nested objects via either REST or GraphQL (and any other, future API modules).
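
On the Elasticsearch side, both parts map to the nested field type and the nested query. The sketch below builds the two structures as plain dicts; nested_mapping, nested_query and the sense/def field names are illustrative assumptions, and how the structure would be declared in a .kosh definition is not settled here:

```python
from typing import Any, Dict


def nested_mapping(path: str, properties: Dict[str, Any]) -> Dict[str, Any]:
    """Declare a field whose objects are indexed as separate nested documents."""
    return {path: {"type": "nested", "properties": properties}}


def nested_query(path: str, field: str, value: str) -> Dict[str, Any]:
    """Match a value inside one nested object under the given path."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {f"{path}.{field}": value}},
            }
        }
    }
```

For instance, nested_mapping("sense", {"def": {"type": "text"}}) would go under properties in the mapping, and nested_query("sense", "def", "fire") would search within it.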
