How I make AI write cold outreach that gets replies

The exact method I use to make AI write one researched cold email per prospect, never a template, and the rule that decides when no email gets sent at all.

Last updated 11 June 2026. 7 min read.

This is the part of the open kitchen where I hand over the recipe. What follows is the exact method I use to build AI outbound systems for paying clients, every step, nothing held back. People ask why I would publish this. Simple: knowing the method has never been the moat. Doing the work is the moat. Most people who read this will agree with every word, nod along, and then go straight back to mail-merging one template into two thousand inboxes, because the proper version is harder and slower to build. That is fine by me. The few who build it properly will get replies, and they will have earned them.

The one rule that makes everything else work

Before the steps, the rule: if the AI cannot find a genuine reason to write to this person, it does not write.

No fallback template. No "hope you're doing well" filler for the prospects where research came up empty. The email exists because there is something true and specific to say to this one person, or it does not exist at all. Some rows in every batch produce no email. Good. That is the system working.

This is the opposite of how most "AI personalisation" tools behave. They treat personalisation as decoration: a template with a research-flavoured first line bolted on. I treat the research as the reason the email gets written. If the reason is missing, there is no email to decorate.

Why be this strict? Because generic mail-merge outbound is dead. Buyers can smell a {{first_name}} template from the subject line and delete it on sight, and enough of them hit "report spam" that your domain reputation pays for it. The traditional alternative, researching every prospect properly by hand, takes about twenty minutes per email, so nobody does it past the first ten. The whole point of this build is to do the twenty-minute version at a volume no human could sustain, without ever cheating on the twenty-minute part.

The method

1. Start with a list that deserves the effort

Every outbound project I have built starts with a messy spreadsheet. For Inform Holdings, a business rates advisory firm I built this for, the raw material was CSV exports of ratepayer cases: hundreds of rows of company names, addresses and case types, no contact details, no obvious owner inside the sales team.

The first stage is qualification, and it is boring on purpose. Parse the file, normalise the messy headers, clean up the lead-type values (trim them, uppercase them, match on contains so a stray suffix or a header quirk does not kill a good row), then qualify each row against the criteria that make a prospect worth contacting. Rows that fail never reach enrichment. You are about to spend real money and real model time per prospect, so dead rows have to die here, not three stages later.
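A minimal sketch of that qualification pass in Python, for illustration only and not the code from any client build. The column names (`company_name`, `lead_type`) and the qualifying values are assumptions; swap in whatever your own export and criteria use.

```python
import csv
import io

# Hypothetical qualifying values. Replace with the lead types you actually want.
QUALIFYING_LEAD_TYPES = ("APPEAL", "REVIEW")


def normalise_header(header: str) -> str:
    """Trim, lowercase and snake_case a header: ' Lead Type ' -> 'lead_type'."""
    return "_".join(header.strip().lower().split())


def qualify_rows(csv_text: str) -> list[dict[str, str]]:
    """Return only the rows worth spending enrichment money on."""
    reader = csv.DictReader(io.StringIO(csv_text))
    qualified = []
    for raw in reader:
        # Skip overflow columns (key None) and treat missing cells as empty.
        row = {normalise_header(k): (v or "").strip() for k, v in raw.items() if k}
        lead_type = row.get("lead_type", "").upper()
        # Match on "contains" so a stray suffix like "Appeal (2023)" still passes.
        matches = any(t in lead_type for t in QUALIFYING_LEAD_TYPES)
        if row.get("company_name") and matches:
            qualified.append(row)
    return qualified


sample = " Company Name ,Lead Type \nAcme Ltd, appeal (2023) \nDead Co,closed\n,APPEAL\n"
good_rows = qualify_rows(sample)
```

Only the Acme row survives here: the second row has the wrong lead type and the third has no company name. Contains matching is forgiving by design, so choose qualifying values that are not substrings of a disqualifying one.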

If you do not have a list, build one from sources your prospects genuinely leave traces in: public registries, contract award records, your own past enquiries. Bought lists are stale the day they arrive and every competitor with a credit card has the same spreadsheet.

2. Turn company rows into named people

A company name and an address is not a prospect. The next stage is enrichment: match each qualified company in an enrichment platform (I used Apollo for the Inform Holdings build) and pull in decision-maker contacts with verified email addresses. The output of this stage is the difference between "a row about a company" and "a reachable human with a name and a job title".

Verify the emails before anything sends. A bounce rate above a few percent does more damage to your sending domains than a month of good behaviour repairs.

3. Research each prospect, one at a time

For every contact, the system gathers real context: the company website, what they sell, the situation they are in, anything recent and concrete. The best builds have a signal baked into the data itself. For Inform Holdings it was a specific 2023 Rating List case with a closing window, which is about as genuine a reason to write to someone as exists. For Bidwell, my tender-writing product, it was public contract award data about the recipient: 3,416 individually researched touches across email and LinkedIn, every one grounded in something true about who they had won work from.

This is where the rule gets enforced. Research either produces specific, sourced facts about this prospect or it produces nothing. When it produces nothing, the pipeline stops for that person. No email.
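One way to make that gate structural rather than a matter of good intentions, sketched in Python. The `Fact` type and the `research` and `draft` callables are hypothetical stand-ins for your own stages; the only point is that drafting cannot be reached without at least one sourced fact.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Fact:
    claim: str   # one specific statement about this prospect
    source: str  # where it came from: a URL, a registry reference, a file row


def usable_facts(facts: list[Fact]) -> list[Fact]:
    """Keep only facts that carry both a claim and a source."""
    return [f for f in facts if f.claim.strip() and f.source.strip()]


def run_prospect(
    prospect: str,
    research: Callable[[str], list[Fact]],
    draft: Callable[[str, list[Fact]], str],
) -> Optional[str]:
    """Research first; draft only if research produced sourced facts."""
    facts = usable_facts(research(prospect))
    if not facts:
        return None  # the veto: no genuine reason to write, so no email
    return draft(prospect, facts)
```

A `None` result is a normal outcome, not an error. Count those rows if you like, but do not route them anywhere that produces an email.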

4. Draft one email per person, in the sender's voice

Every email is generated individually from that one prospect's research. No shared template. No token-swap personalisation. The draft references something true and specific about their business, because the research stage handed it exactly that and nothing else.

Voice takes less than people think: two or three of the sender's best past emails teach the model how that person writes. Then the guardrails, written down and enforced in the prompt: who we never contact, what we never claim, and no fake "I loved your recent post" filler. The model is not allowed to flatter, invent enthusiasm or manufacture urgency. It states the reason for writing and asks one clear question.
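A sketch of how such a prompt might be assembled, assuming facts arrive as (claim, source) pairs from the research stage. The wording of the rules and the function name `build_draft_prompt` are illustrative, not a tested production prompt, and no model is called here.

```python
# Illustrative guardrails. Write your own, and keep them in one place.
GUARDRAILS = (
    "Use only the facts listed below. Make no other claim about the recipient.",
    "No flattery, no invented enthusiasm, no manufactured urgency.",
    "State the reason for writing, then ask exactly one clear question.",
)


def build_draft_prompt(
    sender_name: str,
    voice_samples: list[str],
    facts: list[tuple[str, str]],
) -> str:
    """Assemble the drafting prompt from voice samples, rules and sourced facts."""
    if not facts:
        # Drafting should never be reached without research output.
        raise ValueError("no facts: this prospect should not reach drafting")
    samples = "\n\n".join(
        f"Example {i}:\n{s.strip()}" for i, s in enumerate(voice_samples, 1)
    )
    rules = "\n".join(f"- {g}" for g in GUARDRAILS)
    fact_lines = "\n".join(f"- {claim} (source: {source})" for claim, source in facts)
    return (
        f"Write one short cold email from {sender_name}.\n\n"
        f"Match the voice of these past emails:\n{samples}\n\n"
        f"Rules:\n{rules}\n\n"
        f"Facts you may use, and nothing else:\n{fact_lines}\n"
    )
```

Refusing to build a prompt without facts is a second lock on the same door: even if an earlier check is skipped, an empty research result cannot become an email.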

5. Send it from the person who owns the relationship

An individually researched email loses half its value if it arrives from outreach@yourcompany.com. The last routing stage maps each lead to the right salesperson. At Inform Holdings that meant sector specialists, specialists by incumbent rating agent, and caseload allocation, with each account manager tied to their own exact sending sequence, so the email lands from the mailbox of the person who will take the call.

One build decision that paid for itself repeatedly: routing rules live in configuration, not in code paths scattered through the app. People left, a sequence moved to a different sender, allocations changed. Every one of those was a config edit, not a rebuild.
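To show what "routing lives in configuration" can look like, here is a small sketch. The rule shape, the field names (`sector`, `incumbent_agent`) and every address and sequence name are made up for the example; the config is inlined as JSON text only to keep the block self-contained, and would normally sit in its own file.

```python
import json

# Hypothetical routing config. In practice this would be loaded from a file.
ROUTING_CONFIG = json.loads("""
{
  "rules": [
    {"match": {"sector": "retail"},
     "sender": "alex@example.com", "sequence": "retail-seq"},
    {"match": {"incumbent_agent": "Agent X"},
     "sender": "sam@example.com", "sequence": "agent-x-seq"}
  ],
  "default": {"sender": "jo@example.com", "sequence": "general-seq"}
}
""")


def route(lead: dict, config: dict) -> dict:
    """Return the sender and sequence for a lead. The first matching rule wins."""
    for rule in config["rules"]:
        if all(lead.get(key) == value for key, value in rule["match"].items()):
            return {"sender": rule["sender"], "sequence": rule["sequence"]}
    return dict(config["default"])
```

When someone leaves, their address changes in one place in the config and `route` stays untouched. Rule order matters, so put the most specific rules first.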

6. Review until you trust it, then get out of the way

Drafts queue for human approval at first. Read them. All of them. You are checking for two things: hallucinated facts and tone drift. After enough clean batches you let qualifying drafts send automatically, throttled and spread across warmed inboxes so deliverability holds.
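The throttling half can be sketched as a simple planner that spreads approved drafts round-robin across inboxes with a per-inbox daily cap. The function name and the cap are assumptions for illustration; the right cap depends on how warm each inbox is, and real sending tools will have their own scheduling.

```python
def plan_sends(
    draft_ids: list[str],
    inboxes: list[str],
    daily_cap_per_inbox: int,
) -> list[tuple[int, str, str]]:
    """Return (day_offset, inbox, draft_id) tuples, never exceeding the cap."""
    if not inboxes or daily_cap_per_inbox < 1:
        raise ValueError("need at least one inbox and a cap of 1 or more")
    per_day = len(inboxes) * daily_cap_per_inbox  # total sends allowed per day
    plan = []
    for i, draft_id in enumerate(draft_ids):
        day = i // per_day                 # roll over to the next day when full
        inbox = inboxes[i % len(inboxes)]  # round-robin across inboxes
        plan.append((day, inbox, draft_id))
    return plan


schedule = plan_sends(["d1", "d2", "d3", "d4", "d5"], ["a@example.com", "b@example.com"], 1)
```

With two inboxes and a cap of one each, five drafts take three days. Slow is the point: the cap protects the inboxes, and the batch simply takes as long as it takes.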

Then close the loop: replies route back into the system, classified, so a "can you call me this week" never drowns under the out-of-offices and unsubscribes. The salesperson's involvement starts where it should: when a prospect replies.
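How the classification is done is up to you. As a deliberately crude sketch, assuming simple keyword rules (the phrase lists below are invented examples, and a real build might use a model instead), the useful design choice is the default: anything not confidently identified as noise goes to a person.

```python
# Hypothetical phrase lists. Extend them from the replies you actually receive.
UNSUBSCRIBE_PHRASES = ("unsubscribe", "remove me", "stop emailing")
AUTO_REPLY_PHRASES = ("out of office", "automatic reply", "on annual leave")


def classify_reply(subject: str, body: str) -> str:
    """Return 'unsubscribe', 'auto_reply' or 'needs_human'."""
    text = f"{subject}\n{body}".lower()
    # Check opt-outs first so an auto-reply footer never hides one.
    if any(phrase in text for phrase in UNSUBSCRIBE_PHRASES):
        return "unsubscribe"
    if any(phrase in text for phrase in AUTO_REPLY_PHRASES):
        return "auto_reply"
    # Default to a human: a missed buying signal costs more than a false alarm.
    return "needs_human"
```

A message like "can you call me this week" matches neither list, so it lands in front of the salesperson instead of under the autoresponders.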

Where this goes wrong

I have watched every one of these happen, including to me.

The model invents the signal. Left unsupervised, an LLM will hallucinate a reason to write: "congratulations on your recent growth" to a company that has not grown. The fix is structural, not prompt-deep. Research and drafting are separate stages, and the drafting stage is only allowed to use facts the research stage produced. If research returned nothing, drafting never runs.

You cave on the veto. A third of your batch produced no email and the volume feels disappointing, so you add a "fallback template" for the empties. Congratulations, you are mail-merging again, and the spam complaints from the fallback emails drag down deliverability for the good ones too.

One inbox, full throttle. A perfect email that lands in spam was never sent. Warm the inboxes before the campaign, throttle the daily volume, spread the load across accounts. Deliverability is plumbing, and plumbing fails silently.

Routing is hardcoded. Salespeople leave, get reassigned, go on holiday. If their sequence mapping lives in code, every staffing change is a development task and the system rots. Config files and a five-minute edit, every time.

Nobody owns the replies. Teams build the sending half and forget the receiving half. At any real volume, the buying signal arrives buried in autoresponders, and by the time someone digs it out the moment has passed. The reply pipeline is half the system, not an afterthought.

What it actually takes

Honesty time. This is not a weekend with a no-code tool. You need to be comfortable wrangling filthy CSVs, working with enrichment APIs, writing and rewriting prompts until the drafts stop embarrassing you, and tending sending infrastructure that punishes neglect. You need the stamina to read hundreds of drafts during the trust-building phase, and the discipline to keep the veto rule when it costs you volume. The method is free, right here, all of it. The work is the price, and the work is why this gets results while the template-blasters wonder why nobody replies.

If you would rather the replies just appeared without building any of this, that is the other thing I do: book a consultation and I will build it for you.

Frequently asked questions

Why does AI-written cold outreach usually fail?

Because most tools treat personalisation as decoration: a template with a research-flavoured first line. Buyers delete templated email on sight and enough hit report spam that the sending domain pays. AI outreach works when each email is individually drafted from real research about that one prospect, and is simply not sent when no genuine reason to write exists.

How do you stop an AI inventing fake personalisation in cold emails?

Structurally, not with prompt tweaks. Run research and drafting as separate stages, where research must produce specific sourced facts about the prospect and drafting is only permitted to use those facts. If research returns nothing, the drafting stage never runs and no email is sent for that prospect.

Should AI cold emails be reviewed by a human before sending?

At first, yes, every one. Review drafts for hallucinated facts and tone drift until batches come back consistently clean, then allow qualifying drafts to send automatically, throttled and spread across warmed inboxes. The review phase is how you earn the right to automate the send.

What does it take to build a researched AI outbound system yourself?

Comfort with messy CSVs and enrichment APIs, prompt writing and rewriting, sending infrastructure that needs warming and throttling, and the stamina to read hundreds of drafts during the trust-building phase. The method is straightforward to learn. The work is the hard part, which is why so few people do it properly.