cltk_api's Introduction

Join the chat at https://gitter.im/cltk/cltk_api

Notice

The Classics Archive application is currently under active development and is not ready for production.

About

A simple Flask app for accessing the CLTK corpora. Currently under development.

To run with gunicorn:

$ gunicorn -w 4 -b 0.0.0.0:5000 api_json:app

Development

To get started developing, you'll need Python 3.5 and MongoDB installed.

Create a virtual environment and activate it:

$ pyvenv venv
$ source venv/bin/activate

Install dependencies:

$ pip install -r requirements.txt

Finally, start the app with the following command:

$ python api_json.py

cltk_api's People

Contributors

achaitanyasai, gitter-badger, imran31, kylepjohnson, lukehollis, manu-chroma, sameeriitkgp, suheb

cltk_api's Issues

Add ability to identify and respond with a list of entities given an input string of a classical text

For a given input string, we need to identify and respond with a list of entities (perhaps also their positions in the string).

A working function:

  • Accepts an input string
  • Identifies named entities in the input string
  • Associates named entities to some external data sources (maybe wikipedia or VIAF?)
  • Returns a list of entities serialized as JSON
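
The four requirements above can be sketched as follows. This is a minimal illustration, not the code in metadata/entities: the gazetteer, the function names, and the JSON field names are all assumptions, and exact-match lookup stands in for real named-entity recognition.

```python
import json
import re

# Hypothetical gazetteer mapping a surface form to an external reference.
# A real implementation would draw on a proper data source (Wikipedia, VIAF).
GAZETTEER = {
    "Caesar": "https://en.wikipedia.org/wiki/Julius_Caesar",
    "Gallia": "https://en.wikipedia.org/wiki/Gaul",
}


def find_entities(text, gazetteer=GAZETTEER):
    """Return known entities in `text` with their character positions."""
    entities = []
    for match in re.finditer(r"\w+", text):
        token = match.group()
        if token in gazetteer:
            entities.append({
                "text": token,
                "start": match.start(),
                "end": match.end(),
                "source": gazetteer[token],
            })
    return entities


def entities_as_json(text):
    """Serialize the entity list as a JSON string."""
    return json.dumps({"entities": find_entities(text)}, ensure_ascii=False)
```

Exact matching misses inflected forms such as "Caesaris", so a usable version would need lemmatization or a trained tagger in front of the lookup.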

We have a little bit of work done here that I used on an earlier project: https://github.com/cltk/cltk_api/tree/master/metadata/entities We should keep only what is useful and delete the rest. I will go through the existing files, remove what's not applicable, and document what is.

Write Vagrant bootstrap.sh for CLTK and CLTK API

Building upon Manvendra's script to automate Nginx: https://gist.github.com/manu-chroma/4a6f3b6b27aa49683c67b9fb0b23d493

Let's add the basic setup, too:

Relevant Vagrant example here: https://www.vagrantup.com/docs/getting-started/provisioning.html

Please reach out if you get stuck on anything; we're here to help.

Stabilize and catalog all document formats (chapter, book-chapter, book-chapter-section, etc.)

To sync data from the text server to the Meteor application's database, we need to define the different document formats more precisely. What distinguishes a "chapter" document from a "book-chapter" document, and so on?

What really matters is that the API knows how many levels of nested content there are. We can build it to be flexible if each level of nested content above the actual string of text (book, chapter, poem, etc.) has a similar structure.
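
As a rough sketch of the "how many levels" idea, assuming each document is a nested dict whose leaves are the text strings (the actual cltk_json layout may differ, and `nesting_depth` is a hypothetical helper):

```python
def nesting_depth(node):
    """Count the levels of nested content above the text strings.

    Raises ValueError if sibling branches differ in depth or a level is empty,
    since the API could not treat such a document uniformly.
    """
    if isinstance(node, str):
        return 0
    depths = {nesting_depth(child) for child in node.values()}
    if len(depths) != 1:
        raise ValueError("levels are empty or unevenly nested")
    return 1 + depths.pop()


# Hypothetical documents: one level ("chapter") and two levels ("book-chapter").
chapter_doc = {"1": "text of chapter 1", "2": "text of chapter 2"}
book_chapter_doc = {"1": {"1": "book 1, chapter 1"}, "2": {"1": "book 2, chapter 1"}}
```

With a check like this, the format name ("chapter", "book-chapter", and so on) could be derived from the depth rather than maintained by hand.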

Add converter.py and converted cltk_json to all CLTK corpora _text_ repos

All CLTK corpora text repos need a converter.py and the resulting cltk_json directory of JSON files that converter.py produces.

One example of a converter.py is here: https://github.com/cltk/chinese_text_cbeta_01/blob/master/converter.py

A simple search on the cltk github page will yield all CLTK text corpora: https://github.com/cltk?utf8=%E2%9C%93&q=text&type=&language=

Here is a checklist of all CLTK text repos with their conversion status:

Converting TLG texts with TLGU issue

I have an issue trying to convert the TLG texts with TLGU. I use the code exactly as it is given in the instructions:

In [1]: from cltk.corpus.greek.tlgu import TLGU

In [2]: t = TLGU()

In [3]: t.convert_corpus(corpus='tlg')

I use Python 3.4.3 on Cygwin. When I enter [3], it does not do anything except start a newline waiting for me to enter another command. I checked and there is no file output.

By the way I cannot find PHI7 in my CLTK folders either.

Add route for accessing CLTK stemmer

To get started with adding routes for accessing the CLTK core modules, an easy first step seems to be a route that accepts a GET request with an input string of Latin words and responds with the stemmed string. I'll take a look at adding this unless anyone else is interested in working on it.
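
A minimal sketch of such a route, kept dependency-free as a plain WSGI callable (which gunicorn can serve) so it runs as written; in this project it would more naturally be a Flask route. The path /core/stem, the `string` query parameter, the response field, and `toy_stem` are all assumptions, and `toy_stem` is a placeholder that should be replaced by a call to the CLTK stemmer.

```python
import json
from urllib.parse import parse_qs

# Placeholder suffix list for illustration only; NOT the CLTK stemmer's rules.
_SUFFIXES = ("arum", "orum", "ibus", "us", "um", "ae", "is")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least a three-letter stem."""
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def app(environ, start_response):
    """Handle GET /core/stem?string=... and return the stemmed string as JSON."""
    is_get = environ.get("REQUEST_METHOD") == "GET"
    if not is_get or environ.get("PATH_INFO") != "/core/stem":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    params = parse_qs(environ.get("QUERY_STRING", ""))
    text = params.get("string", [""])[0]
    stemmed = " ".join(toy_stem(word) for word in text.split())
    body = json.dumps({"stemmed": stemmed}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Saved in a module of its own (say, a hypothetical stem_route.py), this could be served with gunicorn in the same way as api_json:app.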

Version API resources

This seems best handled through Accept headers, though many solutions would work here.
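
One way Accept-header versioning could look, as a sketch only: the vendor media type application/vnd.cltk.v<N>+json, the set of supported versions, and the default are assumptions, and q-values are ignored for brevity.

```python
import re

SUPPORTED_VERSIONS = {1, 2}  # hypothetical; the API has no published versions yet
DEFAULT_VERSION = 1
_VERSION_RE = re.compile(r"application/vnd\.cltk\.v(\d+)\+json")


def requested_version(accept_header):
    """Return the first supported API version named in an Accept header.

    Falls back to DEFAULT_VERSION when the header is missing or names
    no supported version.
    """
    if accept_header:
        for media_range in accept_header.split(","):
            # Drop parameters such as ";q=0.9" before matching.
            media_type = media_range.split(";")[0].strip().lower()
            match = _VERSION_RE.fullmatch(media_type)
            if match and int(match.group(1)) in SUPPORTED_VERSIONS:
                return int(match.group(1))
    return DEFAULT_VERSION
```

A resource handler would call `requested_version` on the incoming Accept header and dispatch to the matching serializer, leaving the URLs unchanged across versions.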

Renaming API name gives error with gunicorn command

After setting up cltk_api, I was able to run the API with python api.py but not with gunicorn -w 4 -b 0.0.0.0:5000 api:app (error in the screenshot).
I renamed the file back to its old name, api_json.py, ran gunicorn -w 4 -b 0.0.0.0:5000 api_json:app, and it worked.
I think the existing api/ folder in the same directory conflicts with a module named api.py when the gunicorn command is run.
Can someone else please verify the same error on their system?
P.S.: The API file was renamed two commits back (2bcc0da).
Terminal output:
[image]
OS: Ubuntu 15.10 (64-bit)
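
The suspected conflict can be reproduced in isolation. The sketch below assumes the api/ folder is a regular package (it contains an __init__.py), which the issue does not confirm; in that case Python's import system resolves the name api to the package rather than to api.py, so api:app would not find app, while python api.py still works because it runs the file directly.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_api_from(directory):
    """Import the name 'api' with `directory` placed first on sys.path."""
    sys.modules.pop("api", None)
    importlib.invalidate_caches()
    sys.path.insert(0, str(directory))
    try:
        return importlib.import_module("api")
    finally:
        sys.path.remove(str(directory))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # A module and a package with the same name, side by side.
    (root / "api.py").write_text("app = 'the WSGI app'\n")
    (root / "api").mkdir()
    (root / "api" / "__init__.py").write_text("")

    module = import_api_from(root)
    resolved_to_package = hasattr(module, "__path__")  # packages have __path__
    has_app = hasattr(module, "app")
    sys.modules.pop("api", None)  # leave no trace of the demo import

print("resolved to package:", resolved_to_package, "| has app:", has_app)
```

If this is indeed the cause, keeping the module name api_json.py or renaming the api/ folder would both avoid the clash.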

REST API Design

This issue is to brainstorm the design of the API endpoints and responses. I'll start with a couple of points on shorter URLs, HATEOAS and the folder structure.

Shorter URLs

I propose maintaining numeric IDs for each author, corpus, text, etc. and using those to construct the REST endpoints.

So, for example, endpoint GET /lang/latin/corpus/perseus/author/tacitus/text/germania becomes GET /lang/latin/corpus/1/author/6/text/8.

This keeps the URLs short while letting the names behind the IDs be as long as needed.

A problem with this (assuming an external API consumer) is figuring out the ID of a specific author/corpus/text.
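
The ID scheme above can be sketched with simple lookup tables. The table names and the flat (unscoped) ID spaces are assumptions; the IDs themselves come from the example endpoint in this section.

```python
# Hypothetical name-to-ID tables; in the real API these would live in the database.
CORPUS_IDS = {"perseus": 1}
AUTHOR_IDS = {"tacitus": 6}
TEXT_IDS = {"germania": 8}


def short_url(lang, corpus, author, text):
    """Build the ID-based endpoint for a text from its human-readable names."""
    return "/lang/{}/corpus/{}/author/{}/text/{}".format(
        lang, CORPUS_IDS[corpus], AUTHOR_IDS[author], TEXT_IDS[text]
    )


print(short_url("latin", "perseus", "tacitus", "germania"))
# prints /lang/latin/corpus/1/author/6/text/8
```

An external consumer has no access to these tables, which is the discovery problem just described.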

API Discoverability

The formal term for this is HATEOAS (Hypermedia as the Engine of Application State): a user should be able to browse and discover all the endpoints of the REST API through the API itself.

Towards this, we should define endpoints like GET /lang/latin/corpus/ that return a response like:

{"corpora": [ {"name": "perseus", "id": "1"}, ... ]}

This way, the user will be able to query for all the available corpora and figure out the ID.

Another example of this is from my POS tagger implementation. It is possible to view the list of languages and POS tagging methods they support via GET /core/pos, and perform the actual POS tagging for a string via POST /core/pos.

In general, adding a GET request handler to endpoints like /lang, /lang/<int:lang_id>/corpus, etc. should make the API discoverable.
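
A sketch of what such a listing handler might return, assuming an in-memory table of corpora. The `href` field is an addition that is not part of the response proposed above; it illustrates how each item can link to the next endpoint so that clients never have to construct URLs themselves.

```python
import json

# Hypothetical data; the name and ID come from the example response above.
CORPORA = {"latin": [{"name": "perseus", "id": "1"}]}


def list_corpora(lang):
    """Build the JSON body for GET /lang/<lang>/corpus/."""
    items = []
    for corpus in CORPORA.get(lang, []):
        href = "/lang/{}/corpus/{}".format(lang, corpus["id"])
        items.append(dict(corpus, href=href))
    return json.dumps({"corpora": items})
```

The same pattern would repeat one level down (authors within a corpus, texts within an author), which is what makes the whole tree browsable from /lang.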

Folder Structure

Right now all the resources are defined in a single file (api_json.py), and so are tests (tests.py). There is also no distinction between files containing utility functions and actual REST resources.

I briefly mentioned this in my #20 (comment).

An example of my proposed organisation is in #27. Inside the folder for a specific function (/pos), the resources will be in views.py, the database stuff (if any) in models.py, utility functions in utils.py and parameters in constants.py.

(It may be better to keep constants.py at the root of the API folder structure, so that it is easy to find and change.)
