
Mt. Stupid is in your launch metrics

Product Management · AI · Launch Strategy · Metrics · User Behaviour
A snow-covered Himalayan peak rises through the cloud line as a lone bird traces the air in front of it — the false summit climbers warn each other about, and the shape of every launch dashboard read too early.

The day a new feature ships, the dashboard usually looks better than the spec promised. Activation rate up. Time-to-first-value down. Week-one DAU through the roof. Slack lights up with congratulations. Two weeks later the same chart is upside down, and the loudest customer email in the inbox is some variant of "this is broken, fix it or we churn".

It's easy to shrug this off as novelty effect or the long tail of edge cases. That isn't wrong, but it isn't the most useful frame. The more useful frame is older than software. The Dunning-Kruger effect describes a gap between confidence and competence, and the gap is the place where launches go to die.

The shape of overconfidence

The Kruger and Dunning paper from 1999 is one of those studies whose conclusions everyone has heard and almost nobody has read. The headline finding was straightforward. People who were bad at the tasks in the original studies (grammar, logic and humour judgement) didn't just perform poorly. They were, as the paper's title put it, "unskilled and unaware of it". The metacognitive machinery they would have needed to see the gap was the same machinery they were missing.

The original chart has been picked apart in the years since. Gignac and Zajenkowski showed in 2020 that much of the dramatic curve everyone reposts on LinkedIn is partly a statistical artefact, and the relationship between actual ability and self-assessed ability is closer to linear than the iconic peak-and-valley shape suggests. So if you want to be precise about it, Dunning-Kruger is contested as a clean phenomenon in the literature.

But the metaphor describes something every PM has watched happen in real product data. There is a phase, immediately after a user encounters something new, where their confidence runs miles ahead of their understanding. The webcomic artist Zach Weinersmith called the spike "Mount Stupid" back in 2011 (not my phrase, but a useful one). It's the spot on the curve where a user has tried your feature twice and is confident enough to write you a strongly-worded email about how it should work.

Why your launch metrics lie to you

Here's the practical problem. The phase of the curve where confidence is high and understanding is low is exactly the phase where most launch metrics get taken.

Day-one activation rate, time-to-first-value, week-one DAU. All of these capture users at peak confidence, which is also peak fragility. They feel they understand the product. They haven't yet hit the boundary where their model of it breaks. So the data looks great. So leadership cheers. So the team ships the next thing.

The collapse comes later, in the metrics that actually matter to the business. Week-four retention. Renewal conversations. Support volume in month two. Churn surveys. By the time the trough shows up the team has moved on, and the lesson gets misfiled as "we shipped something users didn't really want". The truth is usually that you shipped something users were instantly confident about, and that confidence was poorly calibrated.

This is true for features. It is also true for almost anything with a non-obvious mental model: a new pricing structure, a redesigned permissions system, an automation tool, a smart inbox or a workflow change. The curve doesn't care what's under the hood.

AI is the same shape, only louder and faster

AI hasn't invented this dynamic. It has compressed and amplified it to the point where you can now watch the curve play out in public, in real time, with each new model release.

ChatGPT hit a hundred million users in two months. Within six months, lawyers in Mata v. Avianca were submitting fake citations to federal court because the model had produced them so confidently. Within fifteen months, a Canadian tribunal had ruled Air Canada liable for promises a chatbot had made about bereavement fares. The cycle from peak confidence to disillusionment, which used to take years for a whole category of technology, was running in months for individual users.

The most useful single piece of evidence for what this looks like at the user level is the METR study published in 2025. Sixteen experienced open-source developers worked on a set of real tasks with and without AI assistance. They were, on average, 19% slower with the AI. They believed they had been 20% faster. That 39-point gap between perception and reality is Mount Stupid distilled into a single chart, in the most AI-literate user population on the planet.

The "jagged frontier" study from BCG and Harvard tells the same story from the other direction. Consultants using GPT-4 did materially better on tasks inside the model's capability frontier and meaningfully worse on tasks outside it. Critically, they couldn't tell which side of the line they were on in advance. The hard problem of an AI-era product is not the model. It's that users can't see where the cliff is.

What makes this a uniquely 2020s problem is that the cycle now resets every model release. GPT-4 launched, Claude 3 launched, Apple Intelligence launched, agents launched. Each time, the curve restarted from the left-hand peak with a new set of inflated expectations and a new disappointment baked in. Gartner's 2025 hype cycle put AI agents at the Peak of Inflated Expectations while putting generative AI itself in the Trough of Disillusionment, and the two sit on the same chart at the same time. The cycle is no longer linear. It runs concurrent waves.

[Interactive chart: the launch curve, confidence plotted against time since launch. At the peak phase, what you'll see: activation up, NPS up, week-one DAU through the roof. Watch out for: confidence is high and understanding is low; users haven't yet hit the boundary where their mental model breaks. Do this: treat day-one metrics as confidence indicators, not value indicators, and pair every confidence metric with a competence metric scheduled for week three or later.]

What this means for the way you launch

Once you start reading product launches as confidence-vs-competence curves, four things change about how you actually do the job.

1. Treat day-one metrics as confidence indicators, not value indicators.

A great onboarding completion rate tells you users felt confident enough to finish the flow. It does not tell you they understood it. Pair every confidence metric with a competence metric. Activation rate with task success rate at week three. NPS with retention at sixty days. Demo bookings with second-call conversion. If you only watch the first column you will only see the front of the curve, and you will quietly reward the wrong behaviour.
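To make the pairing concrete, here is a minimal sketch of what a paired read-out could look like. Everything in it is an assumption for illustration: the DataFrame, the column names and the pairings themselves would come from your own analytics schema.

```python
import pandas as pd

# Hypothetical per-user launch data; every column name here is illustrative.
users = pd.DataFrame({
    "user_id":           [1, 2, 3, 4, 5],
    "activated_day_one": [True, True, True, True, False],
    "nps_day_one":       [9, 10, 8, 9, 6],
    "task_success_wk3":  [True, False, False, True, False],
    "retained_day_60":   [True, False, True, False, False],
})

# Confidence metrics: taken at the peak of the curve, cheap and flattering.
confidence = {
    "activation_rate": users["activated_day_one"].mean(),
    "avg_day_one_nps": users["nps_day_one"].mean() / 10,  # normalised to 0-1
}

# Competence metrics: the paired signals, scheduled for week three or later.
competence = {
    "task_success_wk3": users["task_success_wk3"].mean(),
    "retention_day_60": users["retained_day_60"].mean(),
}

# Print each pair side by side so the front of the curve is never read alone.
for (c_name, c_val), (k_name, k_val) in zip(confidence.items(), competence.items()):
    print(f"{c_name}: {c_val:.0%}  paired with  {k_name}: {k_val:.0%}")
```

The output format matters less than the constraint it encodes: no confidence number reaches a dashboard without its competence twin next to it.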

2. Engineer reality friction into the experience.

The kindest thing you can do for a user on the way up Mount Stupid is to nudge their mental model towards the boundary before they discover it the hard way. For AI features specifically that means visible confidence signals, hard limits made obvious and example failures shown alongside example successes. For non-AI features it means realistic empty states, sample data that exposes edge cases and onboarding flows that test understanding instead of clicks. A user who has seen the cliff before they reach it is much less likely to walk off it.
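To show what a visible confidence signal can look like in code, here is a toy sketch. The threshold, the function name and the copy are all hypothetical; the only point is that the caveat travels with the answer rather than arriving after the complaint.

```python
# Hypothetical confidence gate; 0.7 is an arbitrary illustrative threshold.
LOW_CONFIDENCE_THRESHOLD = 0.7

def render_answer(text: str, confidence: float) -> str:
    """Attach a visible caveat when the model's confidence is below threshold."""
    if confidence < LOW_CONFIDENCE_THRESHOLD:
        return (
            f"{text}\n\n"
            "Low confidence: this answer may be wrong. "
            "Check the linked source before acting on it."
        )
    return text

# A user who sees this caveat has seen the cliff before reaching it.
print(render_answer("Bereavement fares can be claimed retroactively.", 0.42))
```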

3. Don't go quiet during the trough.

The disillusionment phase is the most dangerous phase for the brand, because users feel they were misled. The instinct is to go quiet, ship a hot-fix and hope it blows over. Resist it. The damage is rarely from the rough launch itself. It's from the silence that follows. Plan for the trough at the same time you plan the launch. Have communication ready, fixes scoped and recovery metrics defined, not just release metrics.

4. Re-baseline expectations on every model release.

This one is specific to AI, but the principle generalises to anything where the underlying capability changes underneath your product. Every time the model changes, the user's mental model has to be rebuilt from scratch, and the curve resets to the left-hand peak. Treat each major version like a new launch internally, even if marketing treats it as an upgrade. Your support volume in the seven days following the change, and your week-three retention after it, will tell you whether you handled it well.
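A minimal sketch of what re-baselining can look like in the metrics layer, assuming a hypothetical table that tags every measurement with the model version it was collected under:

```python
import pandas as pd

# Hypothetical metrics table; the schema and the numbers are illustrative only.
metrics = pd.DataFrame({
    "model_version":   ["v1", "v1", "v2", "v2"],
    "week":            [1, 3, 1, 3],
    "retention":       [0.62, 0.41, 0.65, 0.38],
    "support_tickets": [120, 80, 210, 140],
})

# Pivot so each release is its own column: week one of v2 is read against
# week one of v1, never against week three of v1. Same-week against same-week
# is the whole discipline of re-baselining.
baseline = metrics.pivot(index="week", columns="model_version",
                         values=["retention", "support_tickets"])
print(baseline)
```

In this made-up data, the week-three retention drop from v1 to v2 would be the early warning, even while the week-one numbers improved.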


The real reason Dunning-Kruger matters for PMs isn't that users are foolish. It's that confidence is the easiest signal to read and the most misleading one to act on. The curve is a discipline against the part of the job that wants to celebrate the launch metric and move on. Read the whole curve. Plan for the whole curve. Especially now, when the curve is louder and faster, and it resets every time someone publishes a model card.

Notable Quotes
  • People who are unskilled in these domains suffer a dual burden: not only do these people reach erroneous conclusions and make unfortunate choices, but their incompetence robs them of the metacognitive ability to realise it.
    Justin Kruger and David Dunning, Journal of Personality and Social Psychology, 1999
  • Today's AI is the worst AI you will ever use.
    Ethan Mollick, Wharton, One Useful Thing, 2023
  • Early publicity produces a number of success stories — often accompanied by scores of failures. Expectations rise above the current reality of what can be achieved.
    Gartner, definition of the Peak of Inflated Expectations, 2024