A simple web app that exposes Google's open source C++ robots.txt parser on the web.
Uses my branch of the Google open source robots.txt parser, which exposes the information I need as structured output and adds support for passing in multiple user agents: the parser applies the first user agent passed in unless no ruleset explicitly targets it, in which case it falls back to the second.
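The fallback behaviour described above can be sketched in Python (assumed semantics only; the real logic lives in the modified C++ parser, and the function name here is hypothetical):

```python
def pick_user_agent(user_agents, rulesets):
    """Return the first user agent that a ruleset explicitly targets,
    falling back to later ones; default to the first UA if none match.

    user_agents: list of UA strings in priority order.
    rulesets: mapping of UA name -> rules parsed from robots.txt.
    """
    for ua in user_agents:
        if ua in rulesets:
            return ua
    # No ruleset targets any UA: the wildcard (*) group applies,
    # so which UA we report makes no difference to the result.
    return user_agents[0]
```

For example, with rulesets for `googlebot` only, `pick_user_agent(["googlebot-image", "googlebot"], rulesets)` falls back to `googlebot`.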
The app calls a binary executable, which must be compiled for whatever OS it will run on.
To test locally, run env FLASK_APP=pyrobots.py flask run and then visit http://127.0.0.1:5000/.
Test the API with e.g.:
curl -X POST -H "Content-Type: application/json" -d '{
"robots": "User-agent: *\nDisallow: /foo/\n\nUser-agent: googlebot/1.2\nDisallow: /bar/",
"ua": "googlebot-image",
"url": "/bar/"
}' http://127.0.0.1:5000/api/parse/
curl -X POST -H "Content-Type: application/json" -d '{
"robots": "User-agent: *\nDisallow: /foo/\n\nUser-agent: googlebot/1.2\nDisallow: /bar/",
"ua": "adsbot-google",
"url": "/foo/"
}' http://127.0.0.1:5000/api/parse/
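A minimal sketch of how pyrobots.py might handle these requests by shelling out to the compiled parser binary. The binary name (./robots_main) and its argument/stdin interface are assumptions, as is the JSON response shape; the real app's structured output will differ:

```python
import subprocess
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/api/parse/", methods=["POST"])
def parse():
    payload = request.get_json()
    # Hypothetical invocation: user agent and URL as arguments,
    # robots.txt body on stdin. The actual binary's interface may differ.
    result = subprocess.run(
        ["./robots_main", payload["ua"], payload["url"]],
        input=payload["robots"].encode(),
        capture_output=True,
    )
    return jsonify({
        "allowed": result.returncode == 0,
        "output": result.stdout.decode(),
    })
```

The curl commands above would then POST to this route, with the response built from the binary's exit status and stdout.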
See the AWS instructions. Currently deployed on a Lightsail instance in my own AWS account.