The Problem with Smart Hammers: Should We Give AI the Keys to the Car?

Published 16 October 2025

When Anthropic's Jack Clark described AI breakthroughs, he used a brilliant analogy. Imagine a hammer that suddenly becomes self-aware. You keep using it to hit nails, but now it has its own thoughts about the job.

It sounds like a dark philosophical comedy. But it perfectly captures the shift we are seeing. We've become used to AI as a passive co-pilot, whispering suggestions in our ear. Now, it wants to be an autonomous agent, the self-directed doer that takes the keys and drives.

This change is moving AI from being a helper to being a doer. The convenience is huge. The risk is profound. As we hand over more control, we have to talk about who is responsible when the AI crashes the car.

When the Shopping Trolley Drives Itself

The clearest example of this agentic leap is the new trial at Walmart. They're calling it agentic commerce.

You don't ask ChatGPT for ideas for a Sunday roast. That's co-pilot stuff. Instead, you say: Buy everything for a family Sunday roast. The AI then executes the entire job. It selects the meat, picks the potatoes, swaps in chicken if the pork is out of stock, and places the order.

That's a huge, fundamental change. The AI isn't assisting your thinking. It's automating your action.

But what if it buys organic sprouts when you hate sprouts? What if it misses the free-range filter you forgot to mention? We get angry at a person for messing up our order. Who do we blame when the agent gets it wrong?

The Black Box Problem

Anthropic just gave Claude Skills. These let the model automatically pull in the specialised set of instructions best suited to a task. Ask it to write code, and it loads its coding skill. You don't prompt it. It just decides.

For a business, this is a dream. It moves IT from reactive to proactive. The system sees a problem and fixes it using its best tools without human input. It's faster. It's cheaper.

The problem? That level of autonomy removes visibility. If a human engineer makes a mistake, you can trace the decision-making process. If an AI agent running a business process makes a million tiny autonomous decisions, tracing the failure becomes almost impossible. We trade oversight for speed.
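One way to claw back some of that visibility is to make the agent write down every decision as it makes it. The sketch below is purely illustrative: DecisionLog, record and trace are hypothetical names invented for this example, not part of any Walmart, OpenAI or Anthropic API, and it assumes the agent can state a reason for each action.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    step: int        # order in which the decision was made
    action: str      # what the agent did
    reason: str      # why the agent says it did it
    timestamp: str   # when it happened (UTC, ISO 8601)


@dataclass
class DecisionLog:
    """Hypothetical append-only log of an agent's autonomous decisions."""
    records: list = field(default_factory=list)

    def record(self, action: str, reason: str) -> DecisionRecord:
        entry = DecisionRecord(
            step=len(self.records) + 1,
            action=action,
            reason=reason,
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self.records.append(entry)
        return entry

    def trace(self, keyword: str) -> list:
        # Return every decision whose action mentions the keyword.
        needle = keyword.lower()
        return [r for r in self.records if needle in r.action.lower()]

    def to_json(self) -> str:
        # Serialise the whole log so a human can review it later.
        return json.dumps([asdict(r) for r in self.records], indent=2)
```

A log like this doesn't make the agent any smarter. It just means that when the wrong thing lands in the basket, calling trace("substitute") gives you somewhere to start looking.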

Where We Stand

The agent is coming. We're going to use systems that don't just help us but do things for us. The convenience is too strong to resist.

But our job isn't just to enjoy the automation. It's to define the perimeter.

We need to treat these autonomous agents like a smart, highly motivated intern who is brilliant but prone to moments of pure, baffling madness. You let them handle the tedious work, but you never give them the final signature.
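In practice, keeping the final signature can be as simple as an approval gate between what the agent proposes and what actually gets bought. This is a minimal sketch, not how any real shopping agent works: ProposedOrder, place_order and the approve callback are hypothetical names made up for illustration, and the order itself is never sent anywhere.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedOrder:
    items: list    # what the agent wants to buy
    total: float   # what it expects the basket to cost


def place_order(order: ProposedOrder,
                approve: Callable[[ProposedOrder], bool]) -> str:
    """Let the agent prepare the order, but require a sign-off to submit it."""
    if not approve(order):
        # The human said no: nothing is submitted.
        return "rejected"
    # A real system would call the retailer here; this sketch stops short.
    return "placed"


# Stand-in for a human reviewer: no sprouts, and nothing over 60.
def cautious_reviewer(order: ProposedOrder) -> bool:
    return "sprouts" not in order.items and order.total <= 60
```

In a real setup the approve step would be a person tapping a button, not a function. The point of the design is that the agent can do all the legwork while the decision to spend money stays with you.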

AI News This Week

Walmart Tests Agentic Commerce with ChatGPT

Walmart's new agentic commerce system lets ChatGPT handle your weekly shop. You tell it to buy everything for a family Sunday roast, and it compiles the list, substituting items if needed.

Parody Site Skewers AI Hype and Danger

A parody website called Replacement.ai is mocking the AI hype cycle with silly, over-the-top examples.

Anthropic Launches Skills for Claude

Anthropic is giving its Claude model Skills, which lets it automatically choose the right prompt for a specialised task.

Anthropic Releases Claude Code Tool in the Browser

Anthropic also put its code-specific Claude tool directly into the browser for immediate use.

Anthropic's Haiku Model Gets an Update

Anthropic updated its smaller, faster Claude model, called Haiku. The new version is built to be cheaper and faster for large-scale enterprise tasks.

AI Breakthroughs are Hammers That Suddenly Become Self-Aware

Anthropic's Jack Clark described AI breakthroughs using a great analogy: imagine a hammer that suddenly becomes self-aware.

Few Companies Have a Solid AI Strategy

A new Cisco study found that only 13% of companies have a truly solid, well-defined AI strategy.

What it is

Claude Skills is a new feature for the Anthropic Claude model. It moves beyond standard prompting. Instead of the user writing the perfect instruction, the Claude model automatically determines the best skill or specialised set of instructions needed for the job.

Pros

Reduced Friction: It cuts down the need for prompt engineering. Claude just decides the best way to handle your request.

Consistency: Because Claude automatically applies specialised, pre-tested instructions, output quality is more consistent for business use cases.

Speed: Claude can switch context quickly, making multi-step tasks faster.

Cons

Black Box: You don't know which skill it used, so you lose visibility into, and control over, how the output was produced.

Potential Over-reliance: Users may stop thinking critically about the inputs needed.

Still Prone to Flaws: Even with the best skills, Claude is still an LLM and can occasionally output nonsense.

Verdict

Anthropic isn't trying to be just better than the competition. They're trying to be easier and more reliable for business. Claude Skills is a smart move that hides complexity behind convenience.

You can check out my video analysis here.
