
The ‘Should We?’ Question: Evaluating AI Before You Commit

There is a common type of discussion happening at lots of companies right now. It may start when a competitor ships an AI feature, when a key stakeholder asks about the AI strategy, or when someone returns from a conference excited about what AI could do. The conclusion of “we need AI in our product” often follows quickly and builds momentum.

The problem isn’t that AI features are a bad idea. They often aren’t. We’ve built AI features to rapidly summarize large volumes of information, to automate predictable and repetitive tasks, and to find relevant answers to user queries from large data libraries. Those features have all been useful additions. The problem is when the push for AI features begins before you have a clear vision of what you hope they will accomplish and a way to know whether they’re working.

Evaluating whether AI makes sense for your product requires asking targeted questions at the right times. Teams often skip past those questions, not as a deliberate omission, but because by the time “should we?” surfaces, the “we’re doing it” decision has already been made.

Ask the question before you have momentum

The evaluation of “should we?” works best earlier in the process. When a product team is deep into solution design (debating architecture, scoping features, building a roadmap) the “should we be doing this at all?” question rarely comes up. By that point, there are both excited individuals and organizational momentum behind the effort. Reconsidering at that point feels disruptive, even if it’s warranted.

The most effective evaluation should happen when you’re still defining the problem and before the solution has taken shape. Beyond that, though, pay attention if you find yourself evaluating whether AI makes sense after the work is already underway. It doesn’t mean you can’t course-correct, but it does mean you’ll be asking the question with less objectivity than you’d like.

Define what you’re hoping to gain and how you’d know

To do a cost/benefit evaluation, you need a clear definition of the benefit you’re expecting. This may sound straightforward, but it’s where many initiatives hit a wall.

It’s common to be able to describe AI features in reasonable detail (what they would do, generally how they would work, etc.) without being able to describe the outcomes (how things would be better for your user, what measurable metrics would change, etc.).

On that metrics front, the important question is how you’ll know whether the feature is working once it ships. The bar isn’t whether the feature functions, it’s whether the feature is achieving the expected measurable outcomes. If you have an AI feature in mind without defined success criteria, keep working on the definition before moving forward.

Getting specific here doesn’t have to be an overly laborious process. It can start as simply as writing down a few sentences covering what you expect to improve, for whom, and how you’ll measure whether it worked.
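
As a purely illustrative sketch (the feature, metric, and numbers below are hypothetical), that write-up might look something like this:

```python
# A minimal sketch of success criteria written down before any build work starts.
# The feature, metric names, baseline, and target below are hypothetical placeholders.
success_criteria = {
    "feature": "AI-assisted ticket summarization",
    "who_benefits": "support agents triaging incoming tickets",
    "expected_improvement": "reduce time-to-first-response",
    "metric": "median time from ticket creation to first agent reply",
    "baseline": "18 minutes (trailing 90 days)",
    "target": "12 minutes within one quarter of launch",
}
```

The exact format matters far less than the fact that the outcome, the audience, and the measurement are written down before any build work starts.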

The cost/benefit evaluation

Once you have a clear definition of what you expect to gain, you can move into the cost/benefit evaluation. The benefits side of that comparison is where many teams spend the majority of their time.

The costs side deserves more attention, though, and in particular costs that may not show up in the initial estimate. While there are many types of costs to consider, let’s look at three potentially overlooked costs as examples.

Data availability and quality

A common refrain is that AI features are only as good as the data behind them. What this means in practice is often underestimated.

Your teams know they have data. The harder question is whether it’s the right data, in sufficient volume, and in a state that’s actually usable. There is often a meaningful gap between having data and having data that’s ready to use. Data collected for one purpose does not always translate neatly to AI features built for another purpose, and it may require significant cleaning and preparation before it’s usable.

[Image: AI features rest on foundations of data governance and data architecture.]

In some cases, the data may not exist yet and would need to be collected, which can push realistic timelines back considerably. One example we’ve seen: a team wanted a feature to automatically classify records into one of two buckets. The system had decades of historical data, but only the matching cases had been kept; the “negative” cases had been discarded. So while there was plenty of historical data, a key piece of what an AI classification system needs was missing, and that gap blocked the desired feature.

The gap between presumed and actual data readiness is one of the more common places we see AI initiatives hit a roadblock. Getting a clear assessment of where your data actually stands before committing to a build tends to reduce project-delaying surprises down the road.
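
As a rough illustration of what that assessment can start with (the column names and values below are made up), even a few lines of exploration will surface whether the outcomes you need are represented and how much cleanup lies ahead:

```python
# A minimal sketch of a data readiness check, using pandas and hypothetical
# column names: confirm both outcomes actually exist in volume, and see how
# much of the data is usable as-is, before committing to build.
import pandas as pd

# Stand-in for your historical records (in practice, loaded from a warehouse or export).
df = pd.DataFrame({
    "outcome":   ["match", "match", "match", None],   # note: no "no match" rows survived
    "amount":    [120.0, 87.5, None, 42.0],
    "submitted": ["2021-03-04", "2021-05-17", None, "2022-01-09"],
})

# Are both classes represented at all?
print(df["outcome"].value_counts(dropna=False))

# How much of each column is missing?
print(df.isna().mean().sort_values(ascending=False))
```

In the classification example above, a check like this would have surfaced immediately that the “negative” cases were missing.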

Model drift

Unlike many software features, AI features don’t stay static after they ship. The patterns built into a model during training reflect a particular point in time. As user behavior changes, context shifts, and edge cases accumulate, a model that performed well at launch can quietly degrade. Further, the underlying AI models are constantly being revised, and the responses you see today may change tomorrow when a model update rolls out.

This means that the cost of an AI feature isn’t a one-time build cost. Most software requires some amount of maintenance, but the work of maintaining AI features can be different. It includes a different type of monitoring, periodic evaluation, and re-testing as models are updated. Someone has to own that work on an ongoing basis, and that needs to be accounted for when looking at the overall maintenance effort for a product’s lifecycle.

What does the team responsible for this feature look like a year after launch?
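
As one sketch of what that ongoing ownership can include (the metric, threshold, and numbers below are placeholder assumptions, not a production monitoring setup), a recurring check might re-score a labeled evaluation set and compare it against the level recorded at launch:

```python
# A minimal drift-check sketch: periodically re-score a labeled evaluation set
# and flag the feature for review if quality has slipped past a tolerance.
# The threshold and metric are placeholder assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class DriftCheck:
    launch_accuracy: float          # accuracy measured on the eval set at launch
    alert_threshold: float = 0.05   # acceptable absolute drop before someone investigates

    def evaluate(self, predictions, labels) -> bool:
        """Return True if current accuracy has drifted past the threshold."""
        correct = sum(p == y for p, y in zip(predictions, labels))
        current_accuracy = correct / len(labels)
        return (self.launch_accuracy - current_accuracy) > self.alert_threshold

# Example: would flag a model whose accuracy fell from 0.91 to below 0.86.
check = DriftCheck(launch_accuracy=0.91)
needs_review = check.evaluate(predictions=[1, 0, 1, 1], labels=[1, 0, 0, 1])
print(needs_review)
```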

Explainability

When an AI feature produces a result (a recommendation, a decision, etc.), there will be situations where someone needs to understand not just what the output was, but how the system arrived at it.

In some industries this may be a compliance requirement. More broadly, though, it is a trust question. Users who receive an unexpected response or stakeholders reviewing an AI-driven decision will often want a meaningful explanation. “The model decided” is rarely a satisfying answer, and in some contexts may not be an acceptable one. Part of your evaluation should include an assessment of how much explainability your specific use case requires.

The extent to which your system needs to be able to explain to your users why a decision was made has a direct impact on the size of the required effort. Beyond the planned effort to build your core idea, have you also budgeted time and cost for supporting attributes like explainability?
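
As one illustration of a baseline investment here (the dataset and model below are stand-ins, and many use cases need richer, per-decision explanations than this), a global view of which inputs a model leans on most is a common starting point:

```python
# A minimal explainability sketch using permutation importance: how much does
# the model's score degrade when each input feature is shuffled?
# The dataset and model are stand-ins; real features often also need
# per-prediction explanations or plain-language reason codes.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
top_features = sorted(enumerate(result.importances_mean), key=lambda f: -f[1])[:5]
print(top_features)  # indices of the features the model leans on most heavily
```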

The question you should be able to answer

The goal of all this isn’t to create a barrier to building; it’s to make sure the decision to build AI features is a solid one.

At the end of the day, how will you know whether your AI features are working?

If you can answer that clearly and specifically (both outcomes and measurable metrics), that’s a good sign that enough analysis has been done. If the answer is vague or focuses on feature descriptions instead of outcomes, the definition work isn’t done yet. Going back to refine it before moving forward will serve you better than discovering the gap once the build is already underway.

