What We Learned Building AIPath: Why Chat Failed and What Everyone Gets Wrong About ‘That MIT Report’
- David Isaac
- 7 hours ago
- 8 min read

When the MIT study landed with the headline number everyone remembers, ‘95 percent of enterprise AI pilots fail,’ we winced, then nodded. We had already seen (and lived) the pattern. Chat looked magical, but in real life it collapsed under the weight of multi-actor workflows, shifting priorities, missing data, newly arriving data, and handoffs that never stayed in sync. AI is great; chat isn’t enough of an end-to-end tool.
This is a founder’s account of what broke, what we changed, and what we now recommend to teams who want AI to produce measurable ROI. If you are wrestling with product and GTM decisions, prioritization, budgeting, and the messy reality between strategy and execution, this is for you.
“Chat is where ideas start, not where decisions finish.”
The confession: Our early use of chat in mission-critical work looked great… for a minute
We began with the same optimism many teams feel. Chat promised velocity in consulting, in strategy, in product management. One prompt could generate a competitive tear-down, a feature brief, or a first pass at ICP messaging. For early exploration, it truly helped.
Then the cracks appeared:
Context reset. Every serious decision required reconstructing background, constraints, and dependencies.
Unstable iteration. New data and insights became rabbit holes: outputs had to be recreated, and revised versions were better in some places and worse in others. Iteration killed the appetite to use AI in chat for anything beyond piecemeal work.
No visible learning, no certainty of update for complex outputs. Feedback disappeared into transcripts. Teams asked, “Did the system update its view of the world, or are we guessing again?”
Ownership ambiguity. A paragraph is not a decision object. Who edits it? Who approves it? Where does it live? Who tracks it becoming real work?
Zero write-back. Good answers did not become tickets, roadmaps, briefs, artifacts or enablement assets without manual rework.
We saw shadow AI bloom everywhere. Individuals used personal tools daily because they were flexible. But that flexibility didn’t shrink silos, data governance was an issue, and the impact didn’t scale across the company.
If your AI can’t learn, your people won’t trust it
The diagnosis: the problem was not the models, it was the modality
Chat is a fantastic scratchpad. It is a poor substrate for multi-week, multi-actor decisions that need to persist, evolve, and integrate. Strategy lives or dies on the quality of handoffs: product to design to engineering to marketing to sales to customer success. Those handoffs require structure, state, and clarity for execution to improve over time.
A System of Record for Growth Teams
We realized that if we wanted AI to truly help teams prioritize features, map GTM, and allocate budget, we needed a UI and workflow that could:
Persist state and memory across cycles
Represent decisions as first-class objects, not paragraphs
Prioritize which initiatives to validate, and remove friction in testing with customers
Show what changed and why when new learning arrives
Orchestrate actions into the tools people already use
Capture outcomes and learn from them
We found no tools that span strategy planning and execution; only communication tools used to present reactions to data. That is how UINUX was born.
Why we designed a UINUX philosophy: Unified Interface for Next-Gen UX AI Interaction
UINUX is a design principle, not a product. The idea is simple: make decisions structured, enable rapid testing, make learning visible, and make cross-team orchestration native.
Unified interface. One place where product, GTM, and strategy interact with AI through structured decision objects: features, segments, experiments, dependencies, capacity, budgets.
Next-gen UX. Stateful screens with clear inputs, explainable outputs, human-in-the-loop controls, and a diff of “what changed and why.”
Agentic orchestration. Decisions push into systems of record through APIs and modern protocols. Actions return with outcomes, and the system updates its model of the world.
Rapid deployment. Low spin-up effort and a short time to value.
A platform designed to make strategy visible and editable.
What had to change to build a learning system for strategy operations and testing
We re-wired the build/measure/learn philosophy and used AI natively to fill data gaps. AIPath was designed around three loops that had to complete for users to get to ROI more reliably.
Loop 1: From unstructured to structured
Every serious decision now has a schema. A feature, for example, carries evidence, target segment, outcome hypothesis, effort, risk, and dependency friction. A GTM asset carries ICP, promise, proof, and linkage to the features it depends on. An experiment carries a hypothesis, effect size, and acceptance criteria.
Once the work is structured:
We can simulate impact by ICP and scenario
We can rank options with explainable scores
We can write back to the roadmap, tickets, and enablement hubs
We can learn from outcomes and adjust the best next steps for product and marketing teams
A backlog without evidence is a wishlist with a schedule.
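To make the idea of a decision schema concrete, here is a minimal sketch of what a structured feature decision could look like. The field names and types are illustrative assumptions for this article, not AIPath’s actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative schema only: field names and types are assumptions, not AIPath's real model.
@dataclass
class Evidence:
    source: str        # e.g. "discovery call", "win-loss note", "product analytics"
    summary: str
    confidence: float  # 0.0-1.0: how much weight this signal should carry

@dataclass
class FeatureDecision:
    name: str
    target_segment: str         # the ICP this feature is meant to serve
    outcome_hypothesis: str     # the measurable change we expect if we ship it
    effort_weeks: float
    risk: float                 # 0.0-1.0 delivery/adoption risk
    dependency_friction: float  # 0.0-1.0, e.g. a PDI-style score of sequencing drag
    evidence: List[Evidence] = field(default_factory=list)

    def is_backed(self) -> bool:
        """Without attached evidence, this is a wishlist item, not a priority."""
        return len(self.evidence) > 0
```

Once a decision lives in a structure like this, it can be scored, simulated, written back, and diffed when new learning arrives.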
Loop 2: From opinions to simulations
Instead of arguing, we simulate tradeoffs. What happens to adoption if we ship Feature A for ICP-1 this quarter and defer Feature B for ICP-2 to next quarter? What happens to CAC and onboarding time if we focus product-marketing on Feature C instead of Feature D? Simulation and validation don’t replace judgment. They make judgment auditable and propose better options.
This is where our Prioritized Dependency Index (PDI) matters. PDI makes hidden friction explicit. A beautiful idea that blocks on three teams and two integrations is not a Q1 idea.
Simulation that aligns desirability, feasibility, and viability makes that clear through rapid testing, before the opportunity costs and lost revenue pile up.
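As a rough illustration of how a simulation-style comparison can stay auditable, here is a sketch that scores competing options with every term visible. The weights, the formula, and the way a PDI-style friction penalty enters are assumptions for illustration, not AIPath’s actual scoring model.

```python
# Illustrative only: weights and formula are assumptions, not AIPath's scoring model.
def explainable_score(name: str,
                      expected_adoption_lift: float,        # simulated impact for an ICP/scenario
                      effort_weeks: float,
                      risk: float,                          # 0.0-1.0
                      dependency_friction: float) -> dict:  # 0.0-1.0, PDI-style
    """Score one option and keep every term visible so the ranking is auditable."""
    impact = 1.0 * expected_adoption_lift
    effort_penalty = 0.3 * effort_weeks
    risk_penalty = 0.5 * risk
    friction_penalty = 0.7 * dependency_friction
    return {
        "option": name,
        "impact": impact,
        "effort_penalty": effort_penalty,
        "risk_penalty": risk_penalty,
        "friction_penalty": friction_penalty,
        "score": impact - effort_penalty - risk_penalty - friction_penalty,
    }

# "Ship Feature A for ICP-1 this quarter" vs "ship Feature B for ICP-2 this quarter"
options = [
    explainable_score("Feature A / ICP-1", expected_adoption_lift=4.0,
                      effort_weeks=6, risk=0.2, dependency_friction=0.1),
    explainable_score("Feature B / ICP-2", expected_adoption_lift=5.0,
                      effort_weeks=8, risk=0.4, dependency_friction=0.8),
]
for option in sorted(options, key=lambda o: o["score"], reverse=True):
    print(option["option"], round(option["score"], 2))
```

In this made-up comparison the “bigger” idea loses once effort, risk, and dependency friction are on the table, which is exactly the conversation the simulation is meant to force.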
Loop 3: From static plans to learning systems
New learning arrives constantly:
Customers say something changed in discovery calls
Competitors reposition and add features
Sales cycles lengthen or shorten
Teams gain new capabilities and retire old ones
In an AI-native system of record for product and GTM, every new signal recomputes the relevant priorities and refreshes the linked GTM assets.
A weekly digest shows leaders “what changed and why.”
Human reviewers approve or edit where needed, keeping the whole flow human-in-the-loop.
Learning is no longer hidden inside chats, or in stale decks that were rushed in the first place or built on opinions rather than rapid customer testing.
Simulation and user testing turn opinions into operating plans.
Hypotheses and outcomes are visible; the resulting decisions are editable and connected to the plan.
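Here is one possible shape for a “what changed and why” digest entry, kept as a sketch; the field names and the pending-review status are assumptions about how such a human-in-the-loop diff could be recorded.

```python
from datetime import datetime, timezone

# Hypothetical digest record: field names are illustrative, not AIPath's actual format.
def diff_entry(decision: str, field_name: str, old, new, signal: str) -> dict:
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "decision": decision,        # which decision object changed
        "field": field_name,         # what changed
        "old": old,
        "new": new,
        "why": signal,               # the new signal that triggered the recompute
        "status": "pending_review",  # a human reviewer approves or edits before it sticks
    }

# Example: a competitor repositioning lowers the rank of Feature B in this week's digest.
weekly_digest = [
    diff_entry("Feature B", "priority_rank", old=2, new=5,
               signal="Competitor X shipped an equivalent capability for ICP-2"),
]
```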
Where existing tools fit: Productboard and Aha! remain essential
Go from “We can do everything next year” to “We can only fund a few things.”
When trade-offs are made visible across product, sales, and leadership, the organization chooses with clarity instead of pretending capacity is unlimited. The real decision is not between doing everything next year and funding only a few things. It is about backing the right few with full resources so they actually ship.
Half-funded priorities are where strategy goes to die.
We did not set out to replace roadmapping tools. Productboard and Aha! are excellent canvases for intake, fields, and visualization. We complement them with a learning layer, with AI as the concierge, so that experimentation isn’t a dark art and trade-offs are visible to every team, from product to sales to leadership.
We designed AIPath to be always biased to test with real customers, and always ready to update the plan when the world changes.
Evidence enrichment. We attach structured research, market signals, and pipeline impact, and link each idea to the business model impact you need that quarter.
Competitive anticipation. Tracking competitor moves and weaknesses adds to your decision-making rubric.
Goal alignment. No strategy is complete without goals; simulations and tests support leadership in being clear on whether margin compression or churn is the bigger focus this quarter. Working backwards from business impact is the clearest guide for budgets and for product and GTM priorities.
Dependency-aware prioritization. PDI exposes sequencing risk and capacity constraints so priorities are feasible, not just desirable.
Simulation. We preview impact across ICPs and scenarios before committing.
Write-back and sync. Updated priorities, epics, briefs, and enablement assets are ready to push into the tools teams already live in.
Outcome learning. As results come in, the simulations update, and leaders adjust priorities and next steps with full visibility of the best available options. Everyone can see whether we are testing the right things.
There is a clear way to decide which idea wins.
From ideas to prioritized opportunities to execution:
AI chat is great, but the more you need from it, the more you’re overwhelmed by sheer walls of text.
A tool mapped to actual workflows, not conversation, should select the right component for the current decision, not flood you with copy.
Product management
Stop losing time re-prioritizing backlogs manually when inputs change
See dependency friction and capacity constraints before you commit
Compare simulations instead of debating feelings
Growth and marketing
Get ICP-specific value propositions, proof points, and objection handling linked to the roadmap
Watch GTM assets update when the roadmap changes, not months later
Instrument win-loss to refine messaging continuously
Executives and finance
Review weekly “what changed and why” diffs instead of chasing status updates
Tie budget lines to decisions and rapid user testing, not just market research reports
Make decisions with data because the reasoning and evidence are visible
False tradeoffs you should learn to reject
AI is useful or not useful. Use AI to ideate and test. Use stateful (AI-Native) UI to decide, communicate and ship. Use chat for edge cases, and pipe what matters back into the system so nothing gets lost or under-weighted.
Build vs buy. Favor partnerships for speed and outcome alignment. Build where your context is unique. Buy where learning systems are already working. Sometimes you can buy to learn, then build from a more informed base.
Front office vs back office. Start where ROI is measurable and external spend is high. Use those wins to fund bolder front-office bets.
Speed vs control. Structured decisions with guardrails and customer evidence are faster because they reduce backtracking and wasted investment.
Tips for Founders: Design AI around workflows, not prompts.
The messy middle: strategy direction vs execution updates
This is the hardest part to get right. Leadership sets direction quarterly. Frontline teams learn daily. Without a place to reconcile the two, plans drift or teams spin their wheels, and busywork is often the outcome.
Here is how we designed a workflow to manage the mess:
Direction lives as structured objectives with clear decision objects beneath them
Execution updates post to a weekly digest of diffs and justifications
Leaders approve exceptions that change direction or budget
Everyone sees the lineage from objective to feature to GTM asset to outcome
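A minimal sketch of that lineage, with made-up identifiers and fields, might look like the following; it only illustrates the objective-to-feature-to-GTM-asset-to-outcome thread, not AIPath’s data model.

```python
# Made-up lineage example: objective -> feature -> GTM asset -> outcome.
lineage = {
    "objective": "Reduce churn in ICP-2 by 15% this half",
    "features": [
        {
            "name": "Feature C",
            "status": "in_build",
            "gtm_assets": ["ICP-2 onboarding one-pager", "Churn-save battlecard"],
            "outcome": {"metric": "90-day retention", "baseline": 0.72, "current": None},
        },
    ],
}

def trace(plan: dict) -> None:
    """Walk the lineage so anyone can follow the thread from objective to outcome."""
    print(plan["objective"])
    for feature in plan["features"]:
        print(f"  {feature['name']} [{feature['status']}] -> {', '.join(feature['gtm_assets'])}")
        o = feature["outcome"]
        print(f"    outcome: {o['metric']} baseline={o['baseline']} current={o['current']}")

trace(lineage)
```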
This is what we mean when we say AIPath makes strategy visible and editable. It is not another deck. It is a living system that shows its work, with customer evidence at its core.
A practical 90-day playbook anyone can copy
Phase 1: Map and instrument
Define schemas for features, segments, experiments, dependencies, and budgets
Connect your roadmap tool, ticketing, CRM, analytics, and doc stack
Stand up review queues and change logs so learning is visible and safe
Phase 2: Enrich and simulate
Normalize research and feedback into structured evidence linked to ideas, and predict where the impact lands for your business model
Run the first simulations with key teams to expose sequencing and capacity risks
Propose roadmap variants with explainable trade-offs and use them for execution updates
Phase 3: Orchestrate and learn
Push approved changes into tools and capture outcomes
Publish weekly diffs of what changed and why for leadership and teams
Retire any initiatives that testing shows do not move the correct metrics forward
Frequently asked questions
Why does chat fail in AI pilots after strong demos?
Because pilots reward single-player exploration. Production demands multi-actor persistence, ownership, and write-back. Chat is a scratchpad, not a system of record.
What is UINUX in practice?
A unified, stateful interface that represents decisions as structured objects, shows diffs when learning arrives, and orchestrates actions across your stack. It is a concept we developed while building AIPath, to push the boundaries of human-LLM interaction beyond what was possible from GPT-3.5 to date.
How does AIPath connect product and GTM?
Every feature or experiment links to ICPs, value propositions, and enablement. Business needs drive the experiments we design and the hypotheses we test. When learnings change, the linked assets update. Sales sees it. Marketing sees it. No more shadow versions. Homepages stay as current as the product, sales materials are always in sync, and use cases are always personalized for every customer.
How does budgeting fit?
Capacity and budget live next to decisions, not in a separate spreadsheet. With PDI and scenario toggles, leaders can reallocate funds based on visible trade-offs in business impact, product engagement, and retention.
Do we need to replace Productboard or Aha!?
No. Keep product management or project management tools as the canvas. Leverage an AI-native learning layer that enriches, simulates, resolves dependencies, writes back and learns from outcomes.
Closing thought
If you want AI to deliver ROI, make learning visible and connected to your plan.
AI works best when useful new data is fed in and everyone can understand what changed, and why.