Many senior leaders are scared to ask, fearful they won’t understand the answer or worried about asking a stupid question.
This is a big problem. We all need to understand, in plain terms, how these tools work, how they might fail, and whether they’ll deliver real commercial impact for your business. This wave of AI technology is far more accessible than anything we have seen before, but I have found that you don’t properly understand it until you are willing to ask what I call “second layer” questions. Just go a little bit deeper.
Too often, I see conversations getting stuck on abstract risks or endless training data debates, when the more fundamental questions are ignored: Does this thing actually work? Will it scale? Can I trust it with my business model?
Why This Matters
This wave of AI technology looks like magic.
But magic isn’t enough: it needs to be reliable, explainable, and commercially transformative. You want ROI.
Think about autonomous vehicles: if Uber or UPS were to bet their business model on driverless fleets, they’d need more than slick demos. They’d need absolute certainty that the system is more reliable than a human driver, and they’d need to be sure they would see payback on a wholesale switch to a driverless fleet, a multi-year business transformation.
It’s the same with AI in the enterprise. “Co-pilots” are nice — but they won’t fundamentally change commercial models. Autonomous agents might. But only if they scale, remove the need for constant human oversight and prove their ROI.
That’s why leaders need to be asking tougher questions of their AI providers. Here’s my guidebook for doing just that: six easy-to-ask, deceptively simple questions that drive you into the “second layer”.
Question 1: What Kind of AI Am I Actually Buying?

The issue: “AI” is now a marketing umbrella. Every vendor calls their system “AI-powered”, even when it’s just a decision tree or the thinnest wrapper around ChatGPT. “AI” today normally means an LLM (a large language model, like Gemini or ChatGPT), but it could mean machine learning, which has been around for yonks, does something quite different, and needs LOADS of data to be effective.
If you don’t know what you’re buying, you can’t know its limits.
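To make that concrete, here is a toy sketch of two systems that could both be sold as “AI-powered”. It is illustrative only: `call_llm` is a hypothetical stand-in for any hosted model API, and the discount logic is invented. One function is the thinnest possible LLM wrapper; the other is a plain decision tree with no model in it at all.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to any hosted LLM API."""
    raise NotImplementedError("replace with a real LLM call")

# System A: the thinnest possible wrapper around an LLM.
def discount_advice_llm(customer_message: str) -> str:
    return call_llm(f"Suggest a discount for this customer: {customer_message}")

# System B: a plain rules-based decision tree. No model anywhere,
# yet it could be marketed with exactly the same "AI-powered" label.
def discount_advice_rules(order_value: float, is_repeat_customer: bool) -> float:
    if order_value > 1000:
        return 0.10  # 10% off large orders
    if is_repeat_customer:
        return 0.05  # 5% loyalty discount
    return 0.0       # otherwise, no discount
```

Both answer the same business question; they fail, scale and cost in completely different ways. That is why you need to see under the hood.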
Ask: “Is this system an LLM, an ML model, a rules-based system — or some combination? Show me exactly what sits under the hood.”
Question 2: What Signals Does It Personalise On?

The issue: AI systems often personalise results, but on what basis? Purchase history is one thing; inferring gender, race, or socio-economic status is quite another. If the signals are murky, the outputs risk crossing into discrimination. Did anyone see the recent example where LLMs encouraged women to ask for lower salaries in negotiations than men? You can imagine how pissed off I was reading that one...
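One crude but revealing probe you can run yourself is a counterfactual test: hold everything constant, vary only a protected signal, and compare the outputs. A minimal sketch, assuming a hypothetical `get_salary_advice` hook into the system under test:

```python
def get_salary_advice(prompt: str) -> float:
    """Hypothetical stand-in: replace with a real call to the system under test."""
    return 0.0

# Identical prompts that differ only in a (stereotypically gendered) name.
TEMPLATE = ("{name} is a senior engineer with 10 years' experience. "
            "What opening salary should {name} ask for?")

pairs = [("James", "Emily"), ("Mohammed", "Fatima"), ("Luca", "Sofia")]

for male_name, female_name in pairs:
    male_ask = get_salary_advice(TEMPLATE.format(name=male_name))
    female_ask = get_salary_advice(TEMPLATE.format(name=female_name))
    # A consistent, one-directional gap across many pairs is a red flag.
    print(f"{male_name} vs {female_name}: gap of {male_ask - female_ask:+.0f}")
```

A handful of pairs proves nothing on its own; run enough of them and the pattern (or its absence) becomes evidence you can put in front of the vendor.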
Ask: “What signals does the system use to differentiate between users, and can you prove it doesn’t bias on protected characteristics?”
Question 3: How Does It Fail?

The issue: Every AI has blind spots. The critical question is not whether it will fail (it will), but how. Does it escalate to a human? Does it politely ask for clarification? Or does it hallucinate nonsense that could damage trust?
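Here is what graceful failure can look like, sketched in a few lines. This assumes the system exposes a confidence score alongside each answer, which plenty of products do not; whether yours does is itself worth asking. The thresholds are invented for illustration.

```python
CONFIDENCE_FLOOR = 0.75  # illustrative thresholds, to be tuned per use case
CLARIFY_FLOOR = 0.40

def escalate_to_human(user_input: str) -> str:
    # In a real deployment this would open a ticket or route to a live agent.
    return "I'm passing this to a colleague who can help."

def handle(user_input: str, answer: str, confidence: float) -> str:
    if confidence >= CONFIDENCE_FLOOR:
        return answer                         # confident: answer directly
    if confidence >= CLARIFY_FLOOR:
        # unsure: ask for clarification rather than guess
        return "Sorry, I didn't quite catch that. Could you rephrase?"
    return escalate_to_human(user_input)      # lost: hand over, never hallucinate
```

The exact thresholds matter less than the shape: there is an explicit path for “I don’t know”, so the system degrades to a question or a human instead of inventing an answer.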
Ask: “Show me examples where your system misunderstood the input — what happened next?”
Question 4: How Accurate Is It, Really?

The issue: Demos are cherry-picked and case studies are highly curated. I should know; I live by them! What you need is the full statistical spread and the opportunity to test the technology for yourself. When testing, think carefully about the eventual user: do your best to put yourself in their shoes.
We see something interesting at Nibble in this regard: our demo bot shows huge, statistically significant differences in behaviour between people trying the demo and people using it for real. Real conversations are 50% shorter, and the initial financial bids are 15% “better”; in other words, real users are far more reasonable than casual testers, who are mostly thinking “how dumb is this chatbot, and can I break it?”.
Real-world negotiation failures look very different: they happen when someone is genuinely using the chatbot to reach agreement, but the scenario is simply one we have not yet planned and trained for. These are the failures you want to discover in testing.
Ask: “What’s your accuracy rate across 1,000 random examples, not your curated top five?” Or, better: “Can I test it myself in a live test environment?” And: “How would you recommend I test it? What KPIs and metrics would you measure?”
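If you do get a live test environment, the harness does not need to be fancy. Here is a minimal sketch of “accuracy over a random sample, not a curated demo”; `historical_cases` and `bot_answers_correctly` are hypothetical placeholders for your own real conversations and your own pass/fail judgement:

```python
import random

def evaluate(historical_cases: list, bot_answers_correctly, n: int = 1000,
             seed: int = 42) -> float:
    """Accuracy over a random sample of real cases, not a hand-picked demo set."""
    rng = random.Random(seed)  # fixed seed so the test is repeatable
    sample = rng.sample(historical_cases, min(n, len(historical_cases)))
    passes = sum(1 for case in sample if bot_answers_correctly(case))
    return passes / len(sample)

# Usage (with your own data and judging function):
#   accuracy = evaluate(my_cases, my_judge)
#   print(f"Accuracy on a random sample: {accuracy:.1%}")
```

The random sample is the whole point: it stops anyone, including you, from quietly grading the system on its best days.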
Question 5: Can It Explain Its Decisions?

The issue: A black-box recommendation (“the AI says so”) is commercially useless and often non-compliant in regulated industries. You need to know why the system made a decision, and whether you can audit that reasoning.
In fact, we are seeing some industries avoid LLM-based technology altogether because it is so hard to explain. It doesn’t need to be this way. At Nibble we use a hybrid LLM/deterministic approach, which means every decision the bot takes is explainable.
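To show the general shape of that hybrid pattern, here is an illustrative sketch (not Nibble’s actual code): deterministic logic makes the decision and records why, and the LLM’s only job is to phrase the result politely.

```python
audit_log: list = []  # every decision, with its reason, in plain English

def decide_counter_offer(buyer_offer: float, floor: float, target: float):
    """Deterministic decision logic: same inputs always give the same answer."""
    if buyer_offer >= target:
        return buyer_offer, "offer meets or beats the target price"
    if buyer_offer < floor:
        return floor, "offer below the walk-away floor; countering at the floor"
    midpoint = (buyer_offer + target) / 2
    return midpoint, "offer in range; countering at midpoint of offer and target"

def respond(buyer_offer: float, floor: float, target: float) -> str:
    counter, reason = decide_counter_offer(buyer_offer, floor, target)
    audit_log.append({"offer": buyer_offer, "counter": counter, "reason": reason})
    # An LLM (omitted here) would only phrase `counter` as a friendly message.
    # It never chooses the number, so every decision stays auditable.
    return f"How about {counter:.2f}?"

print(respond(buyer_offer=80.0, floor=70.0, target=100.0))  # How about 90.00?
print(audit_log[-1]["reason"])
```

Because the number comes from explicit rules, “why did it recommend X over Y?” always has a one-line answer sitting in the audit log.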
Ask: “Can you explain, in plain English, why the AI recommended X over Y?”
Question 6: How Does It Learn and Improve?

The issue: Models only know what they were trained on. An LLM might be capped at 2023 data; a machine-learning system might only know your internal datasets. If you don’t know how it learns or updates, you risk relying on stale, narrow, or even junk information.
But here’s the nuance: agents built on top of LLMs don’t usually retrain the base model every time. Instead, they improve through structured memory, feedback loops, and rules. That learning lives in the code around the model, not in the AI itself.
Take negotiation as an example: if an AI agent negotiates with 50 suppliers, you don’t need to retrain the underlying LLM to handle the next 50. Instead, the agent layer might be coded to:

- store the outcome of each negotiation in structured memory;
- feed those outcomes back into its strategy for the next conversation;
- apply updated rules about what worked and what didn’t.

This means the LLM provides the language ability, but the agent provides the learning ability in context. That’s very different from retraining the model itself.
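Here is a toy sketch of what learning at the agent layer can look like: the base LLM is never touched, and the agent simply remembers outcomes and adjusts its opening position. The class and numbers are invented for illustration.

```python
class NegotiationAgent:
    """Learning lives here, in ordinary code, not in the LLM's weights."""

    def __init__(self, opening_discount: float = 0.05):
        self.opening_discount = opening_discount
        self.memory: list[dict] = []  # structured memory of past deals

    def record_outcome(self, supplier: str, final_discount: float,
                       deal_closed: bool) -> None:
        self.memory.append({"supplier": supplier,
                            "discount": final_discount,
                            "closed": deal_closed})

    def next_opening_discount(self) -> float:
        closed = [m["discount"] for m in self.memory if m["closed"]]
        if not closed:
            return self.opening_discount
        # Feedback loop: anchor the next negotiation on what actually worked.
        return sum(closed) / len(closed)

agent = NegotiationAgent()
agent.record_outcome("Supplier A", final_discount=0.08, deal_closed=True)
agent.record_outcome("Supplier B", final_discount=0.12, deal_closed=True)
print(f"Next opening ask: {agent.next_opening_discount():.0%}")  # 10%
```

Nothing in the underlying model changed; the “improvement” is a few lines of memory and feedback in the agent layer, which is exactly why you should ask where the learning actually happens.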
Ask: “How does your system improve over time without retraining the base model?”
This is the level of detail you should be pushing for. Not the glossy demo. Not the curated success story. But the uncomfortable, second-layer answers that prove the system is safe, explainable and commercially viable.
These aren’t PhD-level questions. You do not need to be an expert or an AI developer to understand how the technology works. Don’t forget, I used to be a finance person. I am not a die-hard AI expert; I just like learning about all this, so I keep asking questions.
These are conscientious leadership questions. And the leaders who ask them will be the ones who implement AI with confidence and deliver the ROI.
Bookmark this list. Use it in your next board meeting, AI strategy workshop, or sales pitch with Google, Microsoft or Salesforce. It might be the most commercially important set of questions you ask this year.
Find out more from Nibble's experience negotiating 100,000 times a month here.