Most AI agencies are selling you a chatbot wrapper
Most AI agencies sell a chat window with your logo on it. Here is what a real AI system build involves, and the questions that expose a wrapper early.
An AI agency should build you a system that takes real work off real people: drafting the tender response, qualifying the lead, chasing the unpaid invoice, writing the first version of the email your marketing manager currently writes from scratch. If what you are being sold is a chat window with your logo on it, you are not buying a system. You are buying a wrapper around someone else's model, and you could have built it yourself in an afternoon.
That distinction matters because the market is full of wrappers dressed up as engineering. Gartner calls the practice "agent washing": vendors rebranding existing products as AI agents without any substantial agentic capability underneath. Of the thousands of companies selling "AI agents", Gartner estimates only about 130 are real. In the same announcement, Gartner predicts that over 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear business value or inadequate risk controls.
So before you sign anything, it is worth knowing what the job actually involves.
"If the demo works without your data, it will fail with it."
Dean Cookson, founder, Operosus
What does a real AI build look like?
Strip away the branding and a working AI system has five layers wrapped around the model. The model itself is the least interesting part.
Your data, made usable. The model needs your case studies, your price list, your tone of voice, your past proposals, your CRM history. Most of that lives in PDFs, inboxes and someone's head. A real build starts by extracting it, structuring it and keeping it current. This is unglamorous database and pipeline work, and it is where most of the value sits. When we built Bidwell, our tender-drafting product, the hard part was never "generate text". It was getting a company's evidence library into a shape where the right case study surfaces for the right question, every time.
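To make "the right case study surfaces for the right question" concrete, here is a minimal sketch of two of those unglamorous chores: deduplicating extracted snippets so stale copies do not compete with current ones, and picking the best match for a question. It is illustrative only, not Bidwell's code. EvidenceItem, build_library and best_match are hypothetical names, and simple word overlap stands in for the embedding search and reranking a production system would more likely use.

```python
from __future__ import annotations

import hashlib
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EvidenceItem:
    # Hypothetical record for one case study pulled out of a PDF, inbox or CRM note.
    title: str
    text: str
    updated: date


def _normalise(text: str) -> str:
    # Lower-case and collapse whitespace so copies that differ only in layout match.
    return re.sub(r"\s+", " ", text).strip().lower()


def build_library(items: list[EvidenceItem]) -> list[EvidenceItem]:
    """Deduplicate by normalised text, keeping the most recently updated copy."""
    latest: dict[str, EvidenceItem] = {}
    for item in items:
        key = hashlib.sha256(_normalise(item.text).encode("utf-8")).hexdigest()
        if key not in latest or item.updated > latest[key].updated:
            latest[key] = item
    return list(latest.values())


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_match(library: list[EvidenceItem], question: str) -> EvidenceItem | None:
    """Return the item sharing the most words with the question, or None.

    Word overlap is a crude stand-in for real retrieval (embeddings, reranking).
    """
    question_words = _words(question)
    best, best_score = None, 0
    for item in library:
        score = len(question_words & _words(f"{item.title} {item.text}"))
        if score > best_score:
            best, best_score = item, score
    return best
```

Returning None when nothing overlaps matters as much as the match itself: it gives the next layer something to escalate on instead of a guess.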
Integration with the tools you already use. A system that lives in a separate tab gets abandoned within a month. The output has to land where the work happens: the draft appears in the CRM, the email goes out through your sending domain, the lead gets tagged and routed before a human ever looks at it. Our client builds are mostly plumbing: webhooks, queues, triggers, sync jobs. Nobody puts plumbing in the sales deck, which is exactly why wrappers skip it.
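The plumbing can be sketched in a few dozen lines. This is a hedged illustration, assuming the CRM is reached through some client call (write_to_crm here is a hypothetical placeholder, not any vendor's API) and using an in-process queue where a real build would use a durable one. It shows two habits wrappers tend to skip: acknowledging the webhook quickly and doing the slow write later, and refusing to create a duplicate contact when the same webhook is delivered twice.

```python
from __future__ import annotations

import queue
import time
from typing import Callable

# Hypothetical signature: a real build wraps the CRM vendor's own client here.
CrmWrite = Callable[[dict], None]


class LeadSync:
    """Webhook -> queue -> CRM write, with retries and de-duplication (a sketch)."""

    def __init__(self, write_to_crm: CrmWrite, max_attempts: int = 3, backoff_s: float = 0.5):
        self.jobs: queue.Queue = queue.Queue()
        self.write_to_crm = write_to_crm
        self.max_attempts = max_attempts
        self.backoff_s = backoff_s
        self.processed_ids: set[str] = set()  # event ids already written to the CRM
        self.dead_letters: list[dict] = []    # events that ran out of retries

    def on_webhook(self, event: dict) -> None:
        # Acknowledge fast: just enqueue. The slow CRM call happens in the worker.
        self.jobs.put(event)

    def drain(self) -> None:
        """Process every queued event once (a real worker would loop forever)."""
        while not self.jobs.empty():
            event = self.jobs.get()
            if event["id"] in self.processed_ids:
                continue  # redelivered webhook: skip rather than create a duplicate contact
            for attempt in range(1, self.max_attempts + 1):
                try:
                    self.write_to_crm(event)
                    self.processed_ids.add(event["id"])
                    break
                except Exception:  # sketch only: catch the client's specific errors in practice
                    if attempt == self.max_attempts:
                        self.dead_letters.append(event)  # park it for a human to inspect
                    else:
                        time.sleep(self.backoff_s * 2 ** (attempt - 1))  # exponential backoff
```

The dead-letter list is the integration equivalent of a failure path: a write that keeps failing ends up somewhere a person will see it, rather than vanishing.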
Rules for when the AI is not allowed to act. Every system we ship has explicit failure paths. What happens when the model is unsure? When the data is stale? When the customer asks something out of scope? A wrapper guesses. A production system escalates to a human, logs the gap and fails politely. Designing those boundaries is judgement work, and it is most of the difference between a demo and something you can put in front of customers.
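Here is what "explicit failure paths" can look like in code. It is a minimal sketch under stated assumptions: the Draft fields, the in-scope topic list and both thresholds are hypothetical, and a real system has to decide how confidence is estimated in the first place. The point is the shape: every rule has to pass before the AI acts, and any failure is logged and handed to a person with a polite holding message.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass
from datetime import datetime, timedelta

log = logging.getLogger("escalation")

# Assumptions: a per-client allow-list and thresholds tuned per workflow.
IN_SCOPE = {"pricing", "delivery", "case_studies"}
MIN_CONFIDENCE = 0.75
MAX_DATA_AGE = timedelta(days=30)
HANDOFF = "Thanks, I've passed this to a colleague who will reply shortly."


@dataclass
class Draft:
    # Hypothetical fields: how "confidence" is estimated is itself a design decision.
    topic: str
    answer: str
    confidence: float         # 0.0 to 1.0
    source_updated: datetime  # when the underlying data was last refreshed


def decide(draft: Draft, now: datetime) -> tuple[str, str]:
    """Return (action, message). The AI only acts when every rule passes."""
    reasons = []
    if draft.topic not in IN_SCOPE:
        reasons.append("out_of_scope")
    if draft.confidence < MIN_CONFIDENCE:
        reasons.append("low_confidence")
    if now - draft.source_updated > MAX_DATA_AGE:
        reasons.append("stale_data")
    if reasons:
        # Log the gap so someone can fix the data or widen the scope later.
        log.warning("escalating topic=%s reasons=%s", draft.topic, ",".join(reasons))
        return "escalate", HANDOFF  # fail politely instead of guessing
    return "send", draft.answer
```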
Human review where the stakes demand it. AI-drafted, human-approved is the right pattern for anything that carries your name: tenders, client emails, published content. In Contentwell and Emailwell, the model produces the draft and a person signs it off before anything ships. The win is not removing people. It is removing the blank page, so the person spends ten minutes editing instead of two hours writing.
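The AI-drafted, human-approved pattern is essentially a small state machine. The sketch below is an illustration, not how Contentwell or Emailwell are built; Outgoing and Status are hypothetical names. It enforces two rules: nothing ships without a named approver, and any edit after approval sends the item back for sign-off.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    SENT = "sent"


@dataclass
class Outgoing:
    """A hypothetical AI-drafted item (tender answer, client email, post) awaiting sign-off."""
    body: str
    status: Status = Status.DRAFT
    approved_by: str | None = None
    history: list[str] = field(default_factory=list)

    def edit(self, new_body: str, editor: str) -> None:
        if self.status is Status.SENT:
            raise ValueError("already sent")
        # Any edit resets approval, so sign-off always covers the final text.
        self.body = new_body
        self.status = Status.DRAFT
        self.approved_by = None
        self.history.append(f"edited by {editor}")

    def approve(self, reviewer: str) -> None:
        if self.status is not Status.DRAFT:
            raise ValueError(f"cannot approve an item that is {self.status.value}")
        self.status = Status.APPROVED
        self.approved_by = reviewer
        self.history.append(f"approved by {reviewer}")

    def ship(self) -> None:
        # The gate: nothing carrying the company's name leaves without a named approver.
        if self.status is not Status.APPROVED:
            raise PermissionError("needs human approval before sending")
        self.status = Status.SENT
        self.history.append("sent")
```

The history list doubles as an audit trail: for any item that went out, you can see who approved which version.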
Measurement. If nobody can tell you what the system saved or won last month, it is a toy. Real builds track outputs against a baseline: drafts produced, hours displaced, replies generated, deals registered. You should be able to cancel the contract the moment the numbers stop justifying it, and a confident agency will hand you that dashboard on day one.
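A baseline comparison is simple arithmetic, which is exactly why there is no excuse for skipping it. The sketch below reuses the ten-minutes-editing-versus-two-hours-writing figure from the human-review layer; the draft volume, hourly cost and monthly fee are hypothetical. With 30 drafts a month each saving 110 minutes, the system displaces 55 hours; at an assumed £40 an hour against an assumed £2,000 monthly fee, that is £200 of net value for the month.

```python
from dataclasses import dataclass


@dataclass
class MonthlyReport:
    drafts_shipped: int
    baseline_minutes_per_draft: float  # measured before the system went live
    edit_minutes_per_draft: float      # measured now: human review time per draft

    def hours_displaced(self) -> float:
        saved_per_draft = self.baseline_minutes_per_draft - self.edit_minutes_per_draft
        return self.drafts_shipped * saved_per_draft / 60

    def net_value(self, hourly_cost: float, monthly_fee: float) -> float:
        # Positive: the system paid for itself this month. Negative: time to question the contract.
        return self.hours_displaced() * hourly_cost - monthly_fee


report = MonthlyReport(drafts_shipped=30, baseline_minutes_per_draft=120, edit_minutes_per_draft=10)
print(report.hours_displaced())                            # 55.0
print(report.net_value(hourly_cost=40, monthly_fee=2000))  # 200.0
```

The baseline has to be measured before go-live; reconstructing it afterwards from memory is how inflated savings figures get made.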
Why do so many AI projects fail?
Not because the models are weak. MIT's NANDA initiative reviewed more than 300 publicly disclosed AI deployments and found that 95% of generative AI pilots fail to deliver measurable impact on the bottom line. The researchers were blunt about the cause: not model quality, but flawed integration and a "learning gap" between generic tools and the specific workflows of the business.
Read that again from a buyer's seat. The technology you are being pitched is rarely the risk. The risk is the gap between a demo running on clean sample data and a system running on your messy, real, half-documented operation. The same MIT research found that buying from a specialist and building the partnership properly succeeded about 67% of the time, while purely internal builds succeeded at roughly half that rate. The lesson is not "never build". It is that integration expertise, not access to a model, is the thing you are actually paying for.
Which leads back to the one-line test at the top of this piece, one we would apply to anyone, ourselves included: if the demo works without your data, it will fail with it.
How do you spot a wrapper before you pay for it?
A few questions do most of the work. (The full ten-question version is in our guide to choosing an AI agency in the UK.)
"What happens to my data before the model sees it?" A real agency talks about extraction, structuring, deduplication and refresh cycles. A wrapper talks about "training the AI on your documents", which usually means uploading PDFs into a retrieval tool and hoping.
"Show me the system handling a question it cannot answer." Every demo handles the happy path. Ask to see the unhappy one. If there is no escalation route, no logging, no human fallback, you are looking at a prototype.
"Which of my existing tools does this write to?" Not read from. Write to. Creating a contact in your CRM, sending from your domain, updating your job board. Write access is where integration gets hard, and where wrappers quietly stop.
"What number will we review each month?" If the answer is vague, the agency does not expect the system to survive contact with your P&L. We build the measurement in because we expect to be judged by it, and you should expect the same from anyone you hire.
"What did you build before AI was fashionable?" Useful systems are mostly software engineering: data, integration, error handling, deployment. An agency that only learned to prompt in 2024 has none of that muscle.
So what should you actually buy?
Buy outcomes attached to workflows you already understand. Not "AI transformation", not a roadmap workshop, not a chatbot for a website nobody visits. One process, automated end to end, measured against what it cost you before. Tender drafting. Lead capture and routing. Content production with editorial review. Email follow-up that actually follows up.
That is the entire business we are in at Operosus: products like Bidwell, Contentwell and Emailwell where the workflow is common enough to productise, and bespoke systems where a client's process is the moat and deserves its own build. Our numbers are public on the proof page because measurement is layer five, and we eat our own cooking. In both cases the model is maybe a tenth of the work. The rest is making it safe, connected and accountable.
The honest answer to "what does an AI agency actually do" is this: the boring engineering that turns a clever model into a dependable employee. If the agency in front of you cannot describe that work in detail, they are not doing it. And given that Gartner expects over 40% of agentic AI projects to be cancelled by the end of 2027, the cheapest moment to find that out is before you sign.