PDF to image, PDF to text, and so on!
I created this repo out of frustration with the lack of documentation and general help for working with PDF documents on AWS Lambda.
These instructions will get you a copy of the project up and running on AWS Lambda.
Here is a step-by-step guide to setting it up with my sample script, which uses Poppler to create thumbnails of PDF files.
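As a rough sketch of the thumbnail step, the Python below renders the first page of a PDF to PNG by calling Poppler's pdftoppm through subprocess. It is not the code from index.py. The function names, the 200-pixel default size and the /opt/bin/pdftoppm fallback are assumptions: Lambda extracts layer contents under /opt, but the exact layout of poppler.zip may differ.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: the layer puts pdftoppm on PATH, or at /opt/bin (Lambda extracts
# layer contents under /opt). Adjust if poppler.zip is laid out differently.
FALLBACK_PDFTOPPM = "/opt/bin/pdftoppm"


def find_pdftoppm() -> str:
    """Return the pdftoppm executable to call."""
    return shutil.which("pdftoppm") or FALLBACK_PDFTOPPM


def build_thumbnail_cmd(pdf_path: str, out_prefix: str, size: int = 200,
                        binary: str = "pdftoppm") -> list[str]:
    """Build a pdftoppm command that renders page 1 of pdf_path as a PNG.

    -f/-l select the first and last page, -scale-to limits the longer side
    to `size` pixels, and -singlefile writes <out_prefix>.png with no page suffix.
    """
    return [
        binary,
        "-png",
        "-f", "1",
        "-l", "1",
        "-scale-to", str(size),
        "-singlefile",
        pdf_path,
        out_prefix,
    ]


def render_thumbnail(pdf_path: str, out_dir: str = "/tmp", size: int = 200) -> Path:
    """Render the first page of a PDF and return the path of the PNG."""
    out_prefix = str(Path(out_dir) / Path(pdf_path).stem)
    cmd = build_thumbnail_cmd(pdf_path, out_prefix, size, find_pdftoppm())
    # check=True raises CalledProcessError if pdftoppm fails.
    subprocess.run(cmd, check=True, capture_output=True)
    return Path(out_prefix + ".png")
```

The default output directory is /tmp because it is the only writable location in a Lambda function's filesystem.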
- Clone the project
git clone https://github.com/johanub/Lambda-poppler-precompiled
- Edit the index.py file to use your bucket
s3_bucket = s3.Bucket("<your-bucket>")
- Navigate into the project directory and zip the files using this command (it only works on Unix-based systems):
zip -r -X "app.zip" *
- Now go to the AWS Lambda console and open Layers.
- Create a new layer from the poppler.zip file. For the compatible runtimes, choose all the Python runtimes.
- Create a new Lambda function and upload the app.zip file.
- Add the layer we just created to the function.
- Set up a trigger on the S3 bucket where the PDFs will be uploaded.
- Go to the bucket and create a directory called previews.
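The S3 trigger hands the function a notification event, and previews is really just a key prefix (the console's "Create folder" makes an empty object named previews/). A handler therefore has to pull the bucket and key out of each record, decode the key (S3 URL-encodes it, so a space arrives as +), and skip its own output so a bucket-wide trigger doesn't fire again for every thumbnail it writes. Below is a minimal sketch of that bookkeeping; the helper names and the previews/<original path>.png naming scheme are hypothetical. Because the event carries the bucket name, a handler could also read it from there rather than from a hard-coded value. Adding a .pdf suffix filter to the trigger is another way to avoid the loop.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote_plus

# Hypothetical output prefix, matching the "previews" directory in the bucket.
PREVIEW_PREFIX = "previews/"


def pdf_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) for each new PDF in an S3 notification event."""
    found = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        # S3 URL-encodes object keys in notifications (a space arrives as '+').
        key = unquote_plus(s3_info.get("object", {}).get("key", ""))
        if not bucket or not key:
            continue
        # Skip our own thumbnails so a bucket-wide trigger doesn't loop.
        if key.startswith(PREVIEW_PREFIX):
            continue
        if key.lower().endswith(".pdf"):
            found.append((bucket, key))
    return found


def preview_key(pdf_key: str, ext: str = "png") -> str:
    """Map 'docs/report.pdf' to 'previews/docs/report.png' (hypothetical scheme)."""
    return PREVIEW_PREFIX + str(PurePosixPath(pdf_key).with_suffix("." + ext))
```

Keeping the original path under previews/ (instead of only the file name) avoids collisions between PDFs with the same name in different folders.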
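The zip command in the packaging step needs a Unix-like shell. On Windows, a short script using Python's standard zipfile module can produce an equivalent app.zip. This is a sketch that assumes it is pointed at the project directory, and zip_project is a hypothetical name. Like the shell glob *, it skips top-level dot entries such as .git, and it stores paths relative to the project directory so index.py sits at the root of the archive, which is where Lambda expects the handler module.

```python
import zipfile
from pathlib import Path


def zip_project(src_dir: str, archive_name: str = "app.zip") -> Path:
    """Zip the contents of src_dir (not the folder itself) into archive_name.

    Roughly mirrors `zip -r app.zip *`: top-level entries starting with '.'
    are skipped, as the shell glob would skip them, and archive paths are
    relative to src_dir.
    """
    src = Path(src_dir).resolve()
    archive = src / archive_name
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for top in sorted(src.iterdir()):
            # Skip dotfiles and the archive being written.
            if top.name.startswith(".") or top == archive:
                continue
            paths = [top] if top.is_file() else sorted(top.rglob("*"))
            for path in paths:
                if path.is_file():
                    zf.write(path, path.relative_to(src).as_posix())
    return archive
```

For example, zip_project(".") run from inside the project directory writes app.zip there, overwriting any previous copy.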
- Upload a PDF to your S3 bucket and see the magic.
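To re-run the function on a PDF that is already in the bucket, you can use a test event in the Lambda console instead of uploading again. The sketch below builds a trimmed-down ObjectCreated:Put notification for that purpose. Only the fields a typical handler reads are included, the helper name is hypothetical, and the object it names must actually exist in the bucket for the function's download to succeed.

```python
import json
from urllib.parse import quote_plus


def make_s3_put_event(bucket: str, key: str) -> dict:
    """Build a minimal S3 'ObjectCreated:Put' notification for testing.

    Real events carry more fields (eventTime, awsRegion, object size, ...).
    The key is URL-encoded approximately as S3 does in notifications:
    spaces become '+', slashes are kept.
    """
    return {
        "Records": [
            {
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": bucket},
                    "object": {"key": quote_plus(key, safe="/")},
                },
            }
        ]
    }


if __name__ == "__main__":
    # Paste the printed JSON into a test event in the Lambda console.
    print(json.dumps(make_s3_put_event("<your-bucket>", "uploads/sample file.pdf"), indent=2))
```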
- Pavinthan - Poppler for AWS Lambda