How I built a 2,000-page SEO footprint from public data
I built over 2,100 indexed pages for Bidwell from open UK contract data, one person and a nightly pipeline. Here is the whole method, nothing held back.
I am going to tell you exactly how I built the SEO footprint behind Bidwell, my tender platform: over 2,100 pages submitted to Google Search Console, generated and refreshed by a pipeline rather than a content team, run by one person. Every step, including the bit where I broke it.
Why give it away? Because the method is not the moat. The work is the moat. I have explained this process to plenty of people and the response is always the same: nodding, a note in a notebook, and then nothing happens. The steps are easy to know and tedious to do, which is exactly the kind of work I like, because tedious-to-do is where the competition thins out. So here are the keys. People won't use them.
The idea in one paragraph
The UK government publishes every public contract award as open data. Buried in that data is genuinely useful information: which councils buy what, which sectors are spending, whether the winner was an SME. Almost nobody turns it into readable pages. So I ingest it daily, store it in Postgres, and generate a page for every buyer, sector, framework and industry, each one written from the real numbers underneath it. The same pipeline that powers the product powers the public site. Every new award notice makes both more useful.
You can do this with any public dataset that updates regularly and maps to things people search for. Contract awards, planning applications, food hygiene ratings, company filings. The contract data happened to be the one my product needed anyway, which is the best reason to pick a dataset: the pages are a by-product of work you were doing regardless.
The method, step by step
1. Find data that refreshes and maps to search demand
I use OCDS JSON from the Find a Tender REST API and the Contracts Finder bulk dataset, plus Public Contracts Scotland, Sell2Wales and eTendersNI. Five portals, all free, all public. Contracts Finder alone covers everything over £12,000. Award notices carry supplier Companies House numbers and an official AwardedToSme flag, so the data tells you not just what was published but who actually won it.
The test for any dataset: does it change often enough that fresh pages mean something, and does each row connect to a phrase a real person types into Google? "NHS trust contracts" and "construction tenders Manchester" are searches with intent behind them. If your dataset only maps to phrases nobody searches, you are building a museum.
2. Ingest daily into Postgres
n8n workflows and a TypeScript ingestion script pull the JSON every day into Supabase. Nothing exotic. The unglamorous part is normalisation: the same buyer appears as "Manchester City Council", "MANCHESTER CITY COUNCIL" and "Manchester City Council " with a trailing space, and if you do not merge those at ingestion you get three thin pages instead of one good one. As of 10 June 2026 there are 32,858 award records in that table, counted live, and the count goes up every morning without me touching it.
3. Choose your page dimensions
Each useful way of slicing the data becomes a page type. For Bidwell that is per-buyer pages, per-sector pages, framework pages and industry pages, cross-linked to each other so a buyer page points at the sectors it spends in and back again. The cross-linking matters: it is how Google finds the long tail and how a visitor who lands on one page ends up reading three.
Resist the urge to multiply dimensions for the sake of it. Buyer times sector times region times year sounds like a million pages of free traffic. It is actually a million thin pages and a manual action waiting to happen.
4. Write real narrative from the data, never templated filler
This is the step everyone skips and it is the whole game. A page that says "{Buyer} has published {N} contracts" with the blanks filled in is a doorway page, and Google has been burning those down since before some of my competitors were registered at Companies House.
Instead, the generation step reads the actual rows for that buyer and writes about what is actually there: what they buy, how the spend skews, whether SMEs win there or get flattened, what changed recently. Two buyer pages built from the same template should read like two different pages because the data underneath them is different. If your data is not rich enough to support genuinely different narratives per page, you do not have a programmatic SEO opportunity, you have a spreadsheet.
5. Prerender every single page
The Bidwell site is Next.js on Cloudflare Workers. Every programmatic page is rendered to static HTML at build time. All of them, not the popular ones. Static HTML on a CDN is fast, free to serve and cannot fall over when a crawler shows up hungry. I learned the importance of "all of them" the hard way, which gets its own section below.
6. Make the per-page query stupidly cheap
Each page needs data at build time, and a naive query run thousands of times will kill the build. My first version called a SQL view that grouped the whole awards table per page: about 1.4 seconds a call, twice per page across thousands of pages. The fix was to materialise the view, dedupe it by the slug key so the case-and-whitespace variants merge with summed counts, and put a functional index on the slug expression. Same query went from 1,408ms to 20ms. The other rule: build slugs with the exact same function the sitemap uses, so a URL in the sitemap can never 404.
7. Refresh nightly and let it compound
A pg_cron job refreshes the materialised view daily, the site rebuilds, the sitemap updates, and Google sees a site where the content moves every day. That freshness signal is something a static brochure site can never fake. From there it compounds on its own: the same award data also drives the outbound side of the business, 17,136 prospect companies derived from it, 6,110 named decision-makers enriched, 3,416 outreach drafts written by the system. One dataset feeding the product, the website and the pipeline at once.
Where this goes wrong
I can be specific because it went wrong for me, in May 2026, and the failure was nearly invisible.
I had capped prerendering at the top 200 pages of roughly 2,000, with the long tail rendering on demand. Seemed reasonable. But on this stack the runtime cache is read-only, so those long-tail pages re-rendered against the database on every request, forever. To a human visitor everything looked fine. To Googlebot crawling the cold long tail in bulk, the expensive queries piled up and the pages intermittently returned 5xx errors.
Here is the lesson worth the whole article: Google does not treat a 5xx as one bad page. Enough of them and it throttles crawling for your entire domain. My genuinely new pages, the ones I wanted indexed, sat at "URL unknown to Google" because a different set of pages was quietly choking the crawler. One sick page type can starve the whole site, and nothing on the surface looks broken. The symptom in Search Console is a page type that is slow or erroring while the homepage is fine, plus new content that never gets discovered.
The fix was steps 5 and 6 done properly: drop the cap, prerender the full set, and make the data source cheap enough that builds finish. The other classic failures are thin templated pages (see step 4), slugs generated differently in the sitemap and the page (instant 404s that erode trust), and dirty source data turning one entity into five weak pages.
What it actually takes
Honesty section. To do this you need to be comfortable with SQL, a scripting language, a modern web framework and reading Search Console like a doctor reads a chart. None of it is hard individually. What kills people is the combination and the stamina: the normalisation grind, the index tuning when builds time out, the week of watching GSC to work out why indexing stalled, the nightly jobs that need to keep running long after the novelty has gone. This is a system you operate, not a campaign you launch. That is precisely why giving away the method costs me nothing. The people who will actually do all of the above were always going to be competitors. Everyone else now knows exactly what they are not going to do.
If you would rather we just built it for you, book a consultation.
Frequently asked questions
- Will Google penalise programmatic SEO pages?
- Google penalises thin, templated pages where only the blanks change between URLs. Pages built from real data, where each one carries a genuinely different narrative because the numbers underneath are different, are just useful content at scale. The test: would two pages from the same template read differently to a human?
- What public data can a UK business use for this?
- UK contract awards published as OCDS JSON through Find a Tender and Contracts Finder are free and update daily, alongside regional portals like Public Contracts Scotland and Sell2Wales. Planning applications, food hygiene ratings and Companies House filings work on the same principle. Pick a dataset that refreshes regularly and maps to phrases people actually search.
- Why do server errors on a few pages hurt the whole site?
- Google treats repeated 5xx responses as a site health problem, not a page problem, and throttles crawling across the entire domain. On Bidwell a set of on-demand pages that erred only under crawler load left brand-new pages stuck at URL unknown to Google, while everything looked fine in a normal browser.
- How big is the Bidwell footprint and who maintains it?
- Over 2,100 pages indexed in Google Search Console, generated from 32,858 UK contract award records counted live on 10 June 2026. A nightly pipeline refreshes the data and rebuilds the pages. One person runs it.