Turn inbound paperwork into structured data
Orders, delivery notes, supplier invoices and signed forms read by AI, validated against your records and filed without retyping.
The problem
How it works by hand
Orders, delivery notes, supplier invoices and signed forms arrive as PDFs, email attachments and photos taken in a van. Then someone sits and retypes them into your accounts package or job system, line by line, and every retyped line is a chance to fat-finger a quantity or a price. It is dull, error-prone work, and it piles up precisely when the business is busiest.
A worked example
What a working version looks like
Documents arrive however they already arrive: a shared inbox, a drag-and-drop folder, a photo from a phone. AI extraction reads each one and pulls the fields you care about into structured data, whether the document is a clean PDF or a creased photo of a delivery note. The system then validates what it read against your own records: does the supplier exist, does the PO match, do the line totals add up. Clean documents file straight through into your accounts or job system with the original attached. Anything that fails a check lands in a short exceptions queue for a human, with the discrepancy highlighted, so people only ever look at the documents that need judgement.
The exact tools change per business. The shape does not.
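To make that shape concrete, here is a minimal sketch of the validate-and-route step in Python. Everything named in it is an assumption for illustration: the supplier names, the PO number, the field names and the `check_document` and `route` helpers are hypothetical, and a real build would look the records up in your accounts or job system rather than in hard-coded sets.

```python
from decimal import Decimal

# Hypothetical records; in practice these come from your accounts or job system.
KNOWN_SUPPLIERS = {"Acme Fixings Ltd", "Northside Timber"}
OPEN_POS = {"PO-1042": "Acme Fixings Ltd"}  # PO number -> supplier it was raised with


def check_document(doc):
    """Return a list of discrepancies; an empty list means the document is clean."""
    problems = []
    if doc["supplier"] not in KNOWN_SUPPLIERS:
        problems.append(f"unknown supplier: {doc['supplier']}")
    if OPEN_POS.get(doc["po_number"]) != doc["supplier"]:
        problems.append(f"PO {doc['po_number']} does not match supplier")
    # Decimal avoids float rounding when adding up money.
    lines_sum = sum(
        (Decimal(line["quantity"]) * Decimal(line["unit_price"]) for line in doc["lines"]),
        Decimal("0"),
    )
    if lines_sum != Decimal(doc["total"]):
        problems.append(f"lines add up to {lines_sum}, document says {doc['total']}")
    return problems


def route(doc):
    """Clean documents file straight through; the rest go to the exceptions queue."""
    problems = check_document(doc)
    return ("file", problems) if not problems else ("exceptions", problems)


clean = {
    "supplier": "Acme Fixings Ltd",
    "po_number": "PO-1042",
    "total": "25.00",
    "lines": [{"quantity": "10", "unit_price": "2.50"}],
}
misread = dict(clean, total="52.00")  # e.g. digits transposed on a creased photo
```

Here `route(clean)` returns `("file", [])`, while `route(misread)` returns `"exceptions"` along with the one discrepancy a human needs to see. Returning the list of problems, rather than a bare pass or fail, is what lets the queue highlight the discrepancy.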
What it needs
Honest inputs, nothing exotic
- 01 Examples of the documents you handle (a dozen real ones is enough to start)
- 02 The system the data ends up in (Xero, Sage, QuickBooks, a job system or a spreadsheet)
- 03 Your matching rules: supplier lists, PO formats, tolerance for price differences
- 04 An inbox or folder where documents already land
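Matching rules are easier to agree on once they are written down as data. The sketch below is one possible way to hold them in Python. The `MatchingRules` class, the `PO-` plus four digits format and the 2% tolerance are all made-up examples, not recommendations; your own values come from your own records.

```python
import re
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class MatchingRules:
    suppliers: frozenset          # names exactly as they appear in your records
    po_pattern: str               # regular expression for a valid PO number
    price_tolerance_pct: Decimal  # allowed gap between agreed and invoiced price


# Illustrative values only.
RULES = MatchingRules(
    suppliers=frozenset({"Acme Fixings Ltd", "Northside Timber"}),
    po_pattern=r"PO-\d{4}",
    price_tolerance_pct=Decimal("2"),
)


def po_format_ok(po_number, rules=RULES):
    """True if the PO number has the expected shape."""
    return re.fullmatch(rules.po_pattern, po_number) is not None


def price_within_tolerance(expected, actual, rules=RULES):
    """True if the invoiced price is within the tolerance of the agreed price."""
    expected, actual = Decimal(expected), Decimal(actual)
    allowed = expected * rules.price_tolerance_pct / Decimal("100")
    return abs(actual - expected) <= allowed
```

With these example rules, `price_within_tolerance("100.00", "101.50")` passes and `price_within_tolerance("100.00", "103.00")` does not, so the second invoice would go to a person rather than straight into the accounts.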
The payoff
What you get back
Retyping documents stops being a job. Data lands in your systems the day it arrives instead of when someone gets round to it, the error rate drops because the checks run on every single document, and your people only handle the genuine oddities.
Do it yourself
How you would build this yourself
No course, no upsell. This is the order we would build it in, with the tools named, and a prompt to start from.
- 1 Collect a dozen real documents of each type, including the creased photos and bad scans. The system has to handle your worst inputs, not the clean example on the supplier’s website.
- 2 Use a model that reads documents natively (Claude handles PDFs and photos directly) and test extraction on those real samples before building any pipeline around it.
- 3 Define the output as a strict schema: fields, types, validation rules. Extraction without validation is just retyping with extra steps.
- 4 Build the checks: does the supplier exist in your records, does the PO match, do the line totals add up. The checks are what catch errors, not the extraction.
- 5 Pipe clean documents into Xero, Sage or your job system via API with the original attached. Anything failing a check goes to a short human queue with the discrepancy highlighted.
- 6 The fiddly bit is the accounting-package APIs: auth, rate limits, draft versus posted states. Budget more time for that than for the AI part.
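As an illustration of step 3, a strict schema can be as small as a pair of typed records and one parsing function that refuses anything that does not fit. This is a sketch under assumptions: the field names, the `Invoice` and `InvoiceLine` classes and the idea that the model returns JSON are our choices for the example, not something your documents or tools dictate.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class InvoiceLine:
    description: str
    quantity: Decimal
    unit_price: Decimal


@dataclass(frozen=True)
class Invoice:
    supplier: str
    po_number: str
    invoice_date: date
    total: Decimal
    lines: tuple


class SchemaError(ValueError):
    """Raised when extracted data does not fit the schema."""


def parse_invoice(raw_json):
    """Turn the model's JSON output into a typed Invoice, or raise SchemaError."""
    try:
        data = json.loads(raw_json)
        lines = tuple(
            InvoiceLine(
                description=str(item["description"]),
                quantity=Decimal(str(item["quantity"])),
                unit_price=Decimal(str(item["unit_price"])),
            )
            for item in data["lines"]
        )
        invoice = Invoice(
            supplier=str(data["supplier"]),
            po_number=str(data["po_number"]),
            invoice_date=date.fromisoformat(data["invoice_date"]),
            total=Decimal(str(data["total"])),
            lines=lines,
        )
    except (KeyError, TypeError, ValueError, InvalidOperation) as exc:
        # Missing field, wrong type, bad date or non-numeric amount.
        raise SchemaError(f"extraction does not fit the schema: {exc!r}") from exc
    if not invoice.lines:
        raise SchemaError("an invoice needs at least one line")
    return invoice
```

A document that raises `SchemaError` never reaches the checks in step 4; it goes to the human queue with the error message attached. Amounts are parsed as `Decimal` rather than floats so that line totals can later be compared exactly.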
I receive [supplier invoices/delivery notes/order forms] as PDFs and phone photos, and someone retypes them into [Xero/Sage/QuickBooks/spreadsheet]. Build me an extraction pipeline: 1) define a strict schema from the field list I will give you, 2) extract from real sample documents including the rough ones, 3) validate against my supplier list and PO rules, 4) push clean items into [system] via its API with the original attached, queueing exceptions with the discrepancy highlighted. Start by asking me for a dozen real samples and the field list.
Copy the prompt into Claude Code, fill in the brackets, and it will plan the build with you before writing a line of code.
We would rather show you how than bill you. The whole ladder of free help (answers, guides and the weekly build-along) is on the do-it-yourself page.