
glavin001 / fasttext-serverless


This project forked from careervillage/fasttext-serverless


Serverless hashtag recommendations using fastText and Python with AWS Lambda

License: MIT License

Python 100.00%


fasttext-serverless


A simple HTTP POST endpoint that returns hashtag recommendations. This function requires a pre-trained fastText model. When you send a JSON body with a text field to this endpoint in a POST request, it replies with JSON containing up to 5 topic recommendations that it believes match that text. It also identifies and returns the hashtags that are already included in the submitted text (so you can handle collisions if you want to). While the internal function is named tagRecommendations, the HTTP endpoint is exposed as recommendations.
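To make the request/response contract concrete, here is a minimal Python sketch of a handler with this behavior. It is not the repository's actual code: the handler name tag_recommendations, the paths FASTTEXT_BIN and MODEL_PATH, and the hashtag regex are assumptions. It also assumes the Serverless default lambda-proxy integration, where the POST body arrives as a JSON string in event["body"]. Predictions come from the fastText CLI's predict-prob command, reading the text from stdin ("-") and asking for the top 5 labels.

```python
import json
import re
import subprocess

# Assumed locations inside the deployment package (hypothetical; adjust to your layout).
FASTTEXT_BIN = "./fasttext"
MODEL_PATH = "trained_models/model_standard.bin"
TOP_K = 5

# A hashtag is "#" followed by word characters or hyphens (an assumed definition).
HASHTAG_RE = re.compile(r"#[\w-]+")


def extract_hashtags(text):
    """Return the hashtags already present in the text, in order of appearance."""
    return HASHTAG_RE.findall(text)


def run_fasttext(text, k=TOP_K):
    """Return the fastText CLI's top-k "label probability" output for the text."""
    one_line = " ".join(text.split())  # predict-prob scores each input line separately
    proc = subprocess.run(
        [FASTTEXT_BIN, "predict-prob", MODEL_PATH, "-", str(k)],
        input=one_line + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout


def tag_recommendations(event, context, predict=run_fasttext):
    """Hypothetical lambda-proxy handler for POST /recommendations."""
    body = json.loads(event.get("body") or "{}")
    text = body.get("text", "")
    payload = {
        "hashtags_already_used": " ".join(extract_hashtags(text)),
        "hashtags_recommended": predict(text),
    }
    return {"statusCode": 200, "body": json.dumps(payload)}
```

Passing predict as a parameter keeps the handler testable without the fastText binary. Unlike the stringified (stdout, stderr) tuple shown in the Usage section, this sketch returns only fastText's stdout.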

Setup

Step 1: Clone this repo

$ git clone https://github.com/CareerVillage/fasttext-serverless/

Step 2: Install and configure Serverless
Refer to the Serverless docs [1, 2] for help.

$ npm install -g serverless
$ serverless config credentials --provider aws --key AKIAIOSFODNN7EXAMPLE --secret wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Step 3: Add your pre-trained classification model
Save your pre-trained model file in this project as trained_models/model_standard.bin.

$ mv model_standard.bin /path/to/fasttext-serverless/trained_models/model_standard.bin

If you have not yet trained a model, refer to the fastText docs for help. Use the fasttext supervised command to generate the model.

Step 4: Deploy to AWS
Assuming you have configured Serverless to access AWS, deploy the endpoint by running serverless deploy. You should see output like the following (I added the -v "verbose" flag to get more logging):

$ serverless deploy -v
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service .zip file to S3 (27.71 MB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
..............
Serverless: Stack update finished...
Service Information
service: lambda
stage: dev
region: us-east-1
stack: lambda-dev
api keys:
  None
endpoints:
  POST - https://{your-subdomain-here}.execute-api.{your-region-code-here}.amazonaws.com/dev/recommendations
functions:
  tagRecommendations: lambda-dev-tagRecommendations
Serverless: Removing old service versions...

Usage

You can now send an HTTP POST request directly to the endpoint. For example, using curl:

$ curl -X POST https://{your-subdomain-here}.execute-api.{your-region-code-here}.amazonaws.com/dev/recommendations --data '{ "text": "What should I do in the evenings and weekends during high school to become a pediatrician? I want to become a doctor after college so that I can help children recover from terrible diseases and illnesses. #doctor #medicine" }'
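The same request can be sent from Python using only the standard library. This is a minimal sketch: ENDPOINT is the placeholder URL from the deploy output and must be replaced with your own, and the Content-Type header is an assumption that should be harmless even if the endpoint ignores it.

```python
import json
import urllib.request

# Placeholder: substitute the endpoint printed by `serverless deploy`.
ENDPOINT = "https://{your-subdomain-here}.execute-api.{your-region-code-here}.amazonaws.com/dev/recommendations"


def build_request(text, endpoint=ENDPOINT):
    """Build a POST request whose JSON body has the single "text" field the endpoint expects."""
    data = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def get_recommendations(text, endpoint=ENDPOINT, timeout=10):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(text, endpoint), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```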

The expected result should be similar to:

{"hashtags_already_used": "#doctor #healthcare #medicine", "hashtags_recommended": "('__label__doctor 0.662109 __label__pediatrician 0.0585938 __label__medicine 0.015625 __label__pre-med 0.0136719 __label__surgeon 0.0136719\\n', '')"}

Success!
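Note that hashtags_recommended is fastText's raw predict-prob output (alternating __label__ names and probabilities) wrapped in a stringified Python tuple, so clients have to parse it. Below is a minimal parsing sketch, assuming responses always have the shape shown above; parse_recommendations and drop_collisions are hypothetical helper names. drop_collisions uses hashtags_already_used to filter out tags the author already wrote.

```python
import ast

LABEL_PREFIX = "__label__"


def parse_recommendations(raw):
    """Turn the raw hashtags_recommended string into ("#tag", probability) pairs."""
    # The example response is str((stdout, stderr)); literal_eval recovers the tuple.
    value = ast.literal_eval(raw) if raw.lstrip().startswith("(") else raw
    stdout = value[0] if isinstance(value, tuple) else value
    tokens = stdout.split()
    pairs = []
    # fastText prints alternating "label probability" tokens.
    for label, prob in zip(tokens[0::2], tokens[1::2]):
        tag = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        pairs.append(("#" + tag, float(prob)))
    return pairs


def drop_collisions(pairs, already_used):
    """Remove recommendations whose hashtag already appears in the submitted text."""
    used = set(already_used.split())
    return [(tag, prob) for tag, prob in pairs if tag not in used]
```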

Updating fastText assets

Updating the fastText binary

It's important to build the fastText binary in the same environment that your serverless function will run in. I followed the approach used here to set up an EC2 instance, install everything needed, and then build and download the binary. The fastText binary included in this project was built from fastText version 0.1.0 with:

$ wget https://github.com/facebookresearch/fastText/archive/v0.1.0.zip
$ unzip v0.1.0.zip
$ cd fastText-0.1.0
$ make

If you would like to update the fastText binary, follow a similar set of steps: SSH into a running EC2 instance that uses an Amazon Linux AMI, follow the instructions at https://github.com/facebookresearch/fastText to build the latest version of fastText, and then copy (scp) the resulting binary into this repo's folder.
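Before committing a rebuilt binary, a quick sanity check is to confirm it is a Linux ELF executable for the expected CPU architecture rather than, say, a macOS build. The sketch below reads the ELF header with the standard library. Expecting x86-64 is an assumption (Lambda's default; it also offers arm64), and the helper names are hypothetical. Passing this check does not prove the binary's shared-library dependencies match the Lambda runtime, which is why building on Amazon Linux still matters.

```python
import struct

ELF_MAGIC = b"\x7fELF"
EM_X86_64 = 0x3E   # e_machine for x86-64 (Lambda's default architecture)
EM_AARCH64 = 0xB7  # e_machine for 64-bit ARM (Lambda's arm64 option)


def elf_machine_from_header(header):
    """Return the e_machine field from a file's first 20 bytes, or None if not ELF."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return None  # e.g. a Mach-O binary produced by building on macOS
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine follows the 16-byte e_ident and the 2-byte e_type, so it sits at offset 18.
    (machine,) = struct.unpack(byte_order + "H", header[18:20])
    return machine


def elf_machine(path):
    with open(path, "rb") as f:
        return elf_machine_from_header(f.read(20))


def looks_lambda_ready(path, expected=EM_X86_64):
    """True if the file is an ELF binary built for the expected CPU architecture."""
    return elf_machine(path) == expected
```

For example, looks_lambda_ready("fasttext") should be True for a binary built on an x86-64 Amazon Linux instance.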

Updating the classification model file

To update the model_standard.bin file, you need training data formatted for fastText training (e.g., training_set.txt), cleaned the same way as the text on the machine doing prediction. For example, at CareerVillage we remove all punctuation, remove all HTML tags, and lowercase all characters. For optimal results, you should also have a local copy of the Wikipedia-based English word vectors file provided by fastText (wiki.en.vec). Training is run with the following parameters:

$ ./fasttext supervised -input ./data/questions_set_for_training.txt -output model -pretrainedVectors ./data/wiki.en.vec -verbose 2 -lr 1.0 -epoch 20 -dim 300 -wordNgrams 2 -neg 10 -bucket 10000

If you use the pretrained vectors, your model will almost certainly be too large for AWS Lambda, so you will need to use fastText's quantize command to reduce the file size. More information is available at https://github.com/facebookresearch/fastText#text-classification
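fastText's supervised mode expects one example per line, with each label prefixed by __label__ followed by the text. The sketch below applies the CareerVillage cleaning steps (strip HTML tags, remove punctuation, lowercase) and emits such a line. The function names and the exact regex are assumptions, string.punctuation covers ASCII punctuation only, and whatever cleaning you choose must also run on text before prediction.

```python
import re
import string

HTML_TAG_RE = re.compile(r"<[^>]+>")  # crude tag stripper; assumed good enough for this data
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Strip HTML tags, remove ASCII punctuation, lowercase, and collapse whitespace."""
    text = HTML_TAG_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.lower().split())


def training_line(text, tags):
    """Format one supervised training example: labels first, then the cleaned text."""
    labels = " ".join("__label__" + tag for tag in tags)
    return f"{labels} {clean_text(text)}"
```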

Scaling

By default, AWS Lambda limits the total concurrent executions across all functions within a given region to 100. This default is a safety limit that protects you from costs due to potential runaway or recursive functions during initial development and testing. To raise the limit, follow the steps in the AWS Lambda documentation under "To request a limit increase for concurrent executions."
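To judge whether you need an increase, a rough estimate of steady-state concurrency is request rate times average duration (Little's law). This is a minimal sketch with hypothetical helper names; DEFAULT_LIMIT uses the figure quoted above, so check your account's actual limit, and leave headroom because traffic bursts exceed the average.

```python
import math

DEFAULT_LIMIT = 100  # figure quoted in this README; verify your account's current limit


def required_concurrency(requests_per_second, avg_duration_seconds):
    """Little's law: average in-flight executions = arrival rate x time per request."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def needs_limit_increase(requests_per_second, avg_duration_seconds, limit=DEFAULT_LIMIT):
    return required_concurrency(requests_per_second, avg_duration_seconds) > limit
```

For example, 300 requests per second at 0.5 seconds each needs about 150 concurrent executions, above the 100 quoted here.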

References

License

Please refer to the LICENSE file for license information applying to everything in this project except for the fastText binary. The license for the fastText binary is in the LICENSE_FASTTEXT file.

Contributors

jchubber
