Welcome to the Google Cloud Document AI sample repository.
The repository contains samples and Community Samples that demonstrate how to analyze, classify and search documents using Google Cloud Document AI.
- Document AI Warehouse Processing: This project demonstrates how to perform common actions on Document AI Warehouse through the API.
- BQ Connector: This project uses the Document AI API to process a document, format the result and save it into a BigQuery table.
- Filter HITL Language: This project uses the languages detected by Document AI (post-HITL) to sort the `Document.json` files into separate Cloud Storage buckets.
- Fraud Detection: This project uses the Document AI Invoice Parser with EKG and Google Maps to store document Entities in BigQuery.
- Language Extraction: This project uses the Document AI API to detect the languages in a multi-page document.
- Paper Summarization: This project uses the Document AI API to summarize scientific articles.
- PDF Splitter: This project uses the Document AI API to split PDF documents.
- SQL over Docs: This project shows how to run BigQuery SQL queries to extract information from documents.
- Tabular Data Extraction: This project uses the Document AI API to extract tabular data from a document.
- Tax Processing Pipeline: This project uses the Document AI API to classify, parse, and calculate a tax form using multiple document types.
- Web App Demo: This project is a full-stack application that uses Document AI to process different types of documents. This application currently supports Form, Invoice and OCR processors.
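Several of the samples above (Filter HITL Language, Language Extraction) work with the languages that Document AI reports for each page. As a rough illustration of that idea, and not the code used in those samples, the sketch below reads a `Document.json` that has already been loaded into a Python dictionary. It assumes the JSON uses the camelCase field names `pages`, `detectedLanguages`, `languageCode`, and `confidence`; the function names and the `"und"` fallback are hypothetical.

```python
from collections import Counter


def detected_languages(document: dict, min_confidence: float = 0.0) -> Counter:
    """Count how many pages report each language code.

    Assumes `document` is a parsed Document.json with camelCase keys.
    """
    counts = Counter()
    for page in document.get("pages", []):
        # Use a set so a language is counted at most once per page.
        codes = {
            lang["languageCode"]
            for lang in page.get("detectedLanguages", [])
            if "languageCode" in lang
            and lang.get("confidence", 0.0) >= min_confidence
        }
        counts.update(codes)
    return counts


def primary_language(document: dict, default: str = "und") -> str:
    """Return the language seen on the most pages (ties broken alphabetically)."""
    counts = detected_languages(document)
    if not counts:
        return default
    return min(counts, key=lambda code: (-counts[code], code))
```

A sorting step like the one in Filter HITL Language could then use the value returned by `primary_language` to pick a destination bucket for each file.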
If you need document files to run the samples, you can access them from this publicly accessible Google Cloud Storage bucket:
gs://cloud-samples-data/documentai/
The directory is organized by solution and document type. The folder structure is listed below.
documentai/
├── ContractDocAI
├── GeneralProcessors
│ ├── FormParser
│ ├── OCR
│ └── Quality
├── IdentityDocAI
│ ├── Driver's License (USA)
│ └── Passport (USA)
├── LendingDocAI
│ ├── 1040 Parser
│ ├── 1099-DIV Parser
│ ├── 1099-INT Parser
│ ├── 1099-MISC Parser
│ ├── 1099-NEC Parser
│ ├── 1099-R Parser
│ ├── Bank Statement Parser
│ ├── Lending Document Splitter & Classifier
│ └── Pay Slip Parser
├── ProcurementDocAI
│ ├── Expense Parser
│ ├── Invoice Parser
│ ├── Procurement Document Splitter & Classifier
│ └── Utility Parser
├── codelabs
├── form-parser
├── hitl
├── ocr
└── specialized-processors
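Because the bucket is publicly accessible, objects in it can also be fetched over HTTPS at `https://storage.googleapis.com/<bucket>/<object>`. The sketch below converts a `gs://` URI into that form using only the standard library. The helper names are hypothetical, and `sample.pdf` in the usage note is a placeholder, not a file known to exist in the bucket.

```python
from urllib.parse import quote


def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, name = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, name


def public_url(uri: str) -> str:
    """Build the HTTPS URL for a publicly readable object."""
    bucket, name = split_gcs_uri(uri)
    # Percent-encode spaces, apostrophes, and parentheses; keep '/' separators.
    return f"https://storage.googleapis.com/{bucket}/{quote(name)}"
```

For example, `public_url("gs://cloud-samples-data/documentai/codelabs/sample.pdf")` would produce a URL under `https://storage.googleapis.com/cloud-samples-data/documentai/codelabs/`. Folder names that contain spaces or punctuation, such as `Driver's License (USA)`, are percent-encoded.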
- Optical Character Recognition (OCR) with Document AI (Python)
- Form Parsing with Document AI (Python)
- Specialized Processors with Document AI (Python)
- Managing Document AI processors (Python)
Disclaimer: Community samples are not officially maintained by Google.
- PDF Annotator Sample: This project uses the Document AI API to annotate PDF documents.
Contributions welcome! See the Contributing Guide.
Please use the issues page to provide feedback or submit a bug report.
This is not an officially supported Google product. The code in this repository is for demonstrative purposes only.