GithubHelp home page GithubHelp logo

florent-escribe / idsctppromptengineering Goto Github PK

View Code? Open in Web Editor NEW
0.0 1.0 0.0 8.52 MB

A training exercise on prompt engineering for the IDSC class of Mines Paris

Jupyter Notebook 80.95% Python 19.05%

idsctppromptengineering's Introduction

idscTpPromptEngineering

A training exercise on prompt engineering for the IDSC class of Mines Paris
Jan 2024

Context :

You are trying to automate the invoice processing service of a company.
In data/pdf_documents, there is a dataset of invoices and purchase orders.
In data/ocr_results, there is the output of Amazon Textract, an OCR system, on those documents.
Your goal is to determine for each pair of Invoice/PO whether :

  • The PO number is the same
  • The receiver name is the same
  • The vendor name is the same
  • The invoice total is lower than the PO number

Instructions :

  1. MODOP: Get your OpenAI API key

    • Go to https://openai.com/product
    • Create an account or login with google
    • Fill in personal information
    • Label data Prove you are human
    • Go to API keys on the left panel
    • Verify your phone number
    • A window automatically opens to create an API key
    • Give it a name
    • Save it to a local text file
    • Go to Billing on the left panel
      • Click Add payment details
      • Select Individual account
      • Fill in your payment details
      • Ask for the minimum 5$
      • Pay (6$ total with tax)
    • All set !
  2. Create a .env file at the root of the project

  3. Add

OPENAI_API_KEY="[your api key]"
PYTHONPATH="[the path to your projet]"
  1. In the terminal, run source .env. You can check that it has worked by looking at your environment variables.
  2. Create a python virtual environment in the project
  3. Install project requirements by running pip install -r requirements.txt
  4. In notebooks/ open prompt_examples.ipynb and run your first GPT prompts through OpenAI's API.
  5. In notebooks/ open extraction_example.ipynb, extract the PO#, receiver/vendor names and total for one document.
  6. In notebooks/extraction_on_dataset.ipynb, you will find code that extracts those fields on the whole dataset.
  7. In notebooks/ open matching_layout.ipynb, determine for each order if the PO fields match those of the invoice.
  8. In notebooks/matching_on_dataset.ipynb, you will find code that matches POs and invoices.
  9. Iterate on the prompts to get better results.

idsctppromptengineering's People

Contributors

florent-escribe avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.