<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Jitin Kapila</title>
<link>https://jitinkapila.com/writing/</link>
<atom:link href="https://jitinkapila.com/writing/index.xml" rel="self" type="application/rss+xml"/>
<description>AI strategy consulting for COOs, CFOs, and operations leaders. Find the Logic Leak in your AI investment. $90M+ in portfolios. Independent.</description>
<generator>quarto-1.9.30</generator>
<lastBuildDate>Sat, 04 Apr 2026 18:30:00 GMT</lastBuildDate>
<item>
  <title>The AI Confusion Tax: Why Companies Burn Millions on Wrong AI</title>
  <dc:creator>Jitin Kapila</dc:creator>
  <link>https://jitinkapila.com/writing/ai-confusion-tax/</link>
  <description><![CDATA[ 





<p><a href="cover-image.png" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://jitinkapila.com/writing/ai-confusion-tax/cover-image.png" class="profile-image img-fluid" style="width: 75%; margin:auto;"></a></p>
<p>It’s Q2 2026 and companies still think AI means one thing: GenAI. Large language models. Chatbots. Content generation.</p>
<p>That’s not AI. That’s one tool in a 50-year-old toolkit.</p>
<p>Classification problems have classifiers. Forecasting problems have regression. Anomaly detection has isolation forests. Survival problems have survival models. Each one is cheaper, faster, and more accurate than GenAI for its specific job.</p>
<p>And yet, companies are paying 12x more for GenAI solutions to problems that $5,000 of classical ML handles better. Worse, the ROI on the expensive option is often nonexistent.</p>
<p>That’s the <strong>AI Confusion Tax</strong>. You’re paying it whether you see the invoice or not.</p>
<p><br></p>
<section id="the-case" class="level2">
<h2 class="anchored" data-anchor-id="the-case">The Case</h2>
<div class="grid">
<div class="g-col-12 g-col-md-7">
<p>This happened: a vendor quoted $62,000 a year for a supplier categorization problem. Implementation: $50K. Recurring API costs: $1K per month.</p>
<p>Doing the same job with topic modeling would cost $15–20K to build and $10 a month to run.</p>
<p>They almost signed. They had no idea another way existed. Let that sink in!</p>
</div>
<div class="g-col-12 g-col-md-5">
<p><a href="idea.png" class="lightbox" data-gallery="quarto-lightbox-gallery-2"><img src="https://jitinkapila.com/writing/ai-confusion-tax/idea.png" class="img-fluid"></a></p>
</div>
</div>
<p>This isn’t rare. RAND Corporation research puts the AI project failure rate at over 80%, twice the rate of non-AI technology projects. MIT’s 2025 report found 95% of generative AI pilots at enterprises are failing. And S&amp;P Global’s 2025 survey found 42% of companies abandoned most of their AI initiatives this year — up from 17% in 2024.</p>
<p>The numbers keep climbing. Not because AI doesn’t work. Because the wrong AI gets applied to the wrong problem, at the wrong cost, with nobody asking the right questions upfront.</p>
<p><br></p>
</section>
<section id="what-is-the-ai-confusion-tax" class="level2">
<h2 class="anchored" data-anchor-id="what-is-the-ai-confusion-tax">What is the AI Confusion Tax?</h2>
<p>The AI Confusion Tax is the hidden cost companies pay when they select an AI solution before defining the problem it needs to solve. It shows up as overspend on the wrong tools, mismatched model types, and months of data preparation that never connects to a business decision.</p>
<p>It’s structural, not malicious. Vendors pitch what they sell. Companies don’t know the alternatives exist. The gap between those two facts is where the tax lives.</p>
<p>Three patterns account for the bulk of it.</p>
<section id="wrong-ai-type-for-the-problem" class="level3">
<h3 class="anchored" data-anchor-id="wrong-ai-type-for-the-problem">1. Wrong AI type for the problem</h3>
<p>A glass manufacturer wanted to organize their vendor base — which vendors sell which SKUs, how to consolidate for better negotiating power. A vendor came in with a solution built on LLM APIs. Expensive, slow, hard to explain to a supply chain manager.</p>
<p>The problem was text categorization. Topic modeling — LDA, basic NLP, even keyword extraction — would have solved it at a fraction of the cost. These are solved problems. They’re auditable. They’re cheap.</p>
<p>Recent research backs this up: fine-tuned small language models outperform GPT-4 on 85% of classification tasks while costing 10-100x less per inference. The hidden operational costs of LLMs — token monitoring, latency tracking, security audits — add 20-40% on top of the sticker price.</p>
<p>The vendor sold LLM because they sell LLM. The company bought it because they didn’t know topic modeling existed.</p>
<p>Same result. 12x cost difference. That’s not a technology gap. That’s an information gap — generative AI is one tool in the toolbox, not the toolbox itself.</p>
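<p>To make the “even keyword extraction” point concrete, here is a minimal sketch of that cheap alternative using only the Python standard library. The categories and keyword lists are invented for illustration; a real build would use TF-IDF or LDA over the actual supplier descriptions.</p>

```python
# Minimal sketch: keyword-based supplier categorization, standard library only.
# Categories and keywords below are hypothetical, for illustration.
from collections import Counter
import re

CATEGORY_KEYWORDS = {
    "packaging": {"carton", "box", "film", "pallet", "label"},
    "raw_glass": {"silica", "sand", "cullet", "soda", "ash"},
    "logistics": {"freight", "haulage", "shipping", "transport"},
}

def categorize(description: str) -> str:
    """Assign the category whose keyword set overlaps the description most."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    scores = Counter({cat: len(tokens & kws) for cat, kws in CATEGORY_KEYWORDS.items()})
    best, hits = scores.most_common(1)[0]
    return best if hits else "uncategorized"

print(categorize("Bulk silica sand and soda ash, monthly delivery"))
```

<p>This is auditable in the way the post describes: a supply chain manager can read the keyword lists and see exactly why a supplier landed in a bucket.</p>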
</section>
<section id="wrong-problem-selected" class="level3">
<h3 class="anchored" data-anchor-id="wrong-problem-selected">2. Wrong problem selected</h3>
<p>This one costs the most over time.</p>
<p>A food company wanted to predict when their inventory would expire. They hired a team to build a classification model — will this item decay or not?</p>
<p>The model never produced useful outputs. Low confidence, poor predictions, no usable time window.</p>
<p>Nobody asked the right question.</p>
<p>This isn’t a classification problem. It’s a survival analysis problem. You don’t need to know <em>whether</em> the item will decay. You need to know <em>when</em> it will decay — and how much time you have to act before it does.</p>
<p>The difference matters. Classification gives you a yes/no on the day you check. Survival analysis gives you a time window. In inventory management, that window is everything — it’s the difference between selling at full margin and writing off the batch.</p>
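<p>The time-window idea can be sketched with a tiny Kaplan–Meier estimator; in practice you would use a library such as lifelines, and the durations below are illustrative, not from the case.</p>

```python
# Minimal Kaplan-Meier sketch: instead of a yes/no "will it decay?",
# survival analysis estimates the probability an item is still good at
# time t, which yields an action window. Data below is illustrative.
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each observed event time.
    durations: days until decay or end of observation.
    events: 1 if decay was observed, 0 if censored (still good)."""
    times = sorted({t for t, e in zip(durations, events) if e})
    surv, curve = 1.0, []
    for t in times:
        at_risk = sum(1 for d in durations if d >= t)                      # n_i
        died = sum(1 for d, e in zip(durations, events) if e and d == t)   # d_i
        surv *= 1 - died / at_risk
        curve.append((t, surv))
    return curve

durations = [5, 7, 7, 10, 12, 12, 15]
events    = [1, 1, 0, 1,  1,  0,  1]   # 0 = still sellable when last observed
for t, s in kaplan_meier(durations, events):
    print(f"day {t}: {s:.0%} of stock still good")
```

<p>The output is a curve, not a verdict: you can read off the day by which, say, half the batch will have turned, and act before it.</p>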
<p>The same mistake appears in churn prediction. Teams build classification models — will this customer churn? But customers don’t just switch overnight. They gradually reduce engagement, then spending, then leave. A classification model catches none of that progression. A survival model does.</p>
<p>The team blames the data. The data is fine. The model type was wrong from the start. The root cause is almost never data quality. It’s problem framing.</p>
</section>
<section id="data-before-decision" class="level3">
<h3 class="anchored" data-anchor-id="data-before-decision">3. Data before decision</h3>
<p>This one is expensive in a different way. It costs you in delay.</p>
<p>Every manufacturing and FMCG company I’ve worked with has started an AI project by running a data audit. Six months of cleaning, structuring, waiting for the right data.</p>
<p>$100K per quarter in lost optimization value.</p>
<p>The question nobody asks early enough: what decision are we trying to improve?</p>
<p>If you don’t know that, you don’t know which data matters. You end up preparing data that has nothing to do with your actual problem. The order matters. Decision first. Data second. Model third.</p>
<p>Vendors don’t push back. They scope the data work. The delay is yours.</p>
<p><br></p>
</section>
</section>
<section id="the-pattern-underneath" class="level2">
<h2 class="anchored" data-anchor-id="the-pattern-underneath">The pattern underneath</h2>
<p>Vendors pitch what they sell. Companies don’t know the alternatives. Nobody defines the problem before selecting the solution.</p>
<p>Every AI decision — vendor selection, project scoping, budget approval — has this tax embedded in it. Sometimes it’s $57K a year on the wrong tool. Sometimes it’s $100K a quarter in data prep delay. Sometimes it’s a model that works technically and delivers nothing commercially.</p>
<p>The sticker price on an AI vendor quote is only 40-55% of your actual total cost of ownership. The rest is hidden in integration, governance, and waste from solving the wrong problem.</p>
<p>Understanding which AI type fits your problem — whether it’s rules-based automation, analytics, predictive ML, or generative AI — is the single most important question you can answer before signing anything.</p>
</section>
<section id="before-you-sign-the-next-contract" class="level2">
<h2 class="anchored" data-anchor-id="before-you-sign-the-next-contract">Before you sign the next contract</h2>
<p>One question before any AI project gets approved: does the person scoping this understand what they’re scoping?</p>
<p>Not — is the model accurate? Not — is the vendor credible?</p>
<p>Can they tell you which AI type this problem needs? Classification, regression, clustering, survival analysis? And can they explain why that type is the right one for this specific business decision?</p>
<p>If they can’t answer that, they’re guessing. And you’re paying for the guess.</p>
<p>The confusion is solvable. You just need to know what questions to ask before anyone writes a contract.</p>
</section>
<section id="frequently-asked-questions" class="level2">
<h2 class="anchored" data-anchor-id="frequently-asked-questions">Frequently Asked Questions</h2>
<p><strong>What is the AI Confusion Tax?</strong></p>
<p>The AI Confusion Tax is the hidden cost companies pay when they select an AI solution — usually GenAI — before defining the problem it needs to solve. It shows up as 12x overspend on the wrong tools, mismatched model types, and months of data preparation that never connects to a business decision.</p>
<p><strong>Why do companies pay the AI Confusion Tax?</strong></p>
<p>Vendors pitch what they sell. Companies don’t know the alternatives exist. Every business problem has a specific ML model that’s cheaper, faster, and more accurate than GenAI for that job — but most leaders don’t know these alternatives exist.</p>
<p><strong>How much does the wrong AI cost?</strong></p>
<p>Companies routinely pay 12x more for GenAI solutions to problems that $5,000–$20,000 of classical ML handles better. A vendor quoted $62,000/year for a supplier categorization problem. The same job done with topic modeling: $15–20K to build, $10/month to run.</p>
<p><strong>How do you avoid the AI Confusion Tax?</strong></p>
<p>Before approving any AI project, ask: can the person scoping this tell you which AI type this problem needs — classification, regression, clustering, survival analysis — and why that type is the right one for this specific business decision? If they can’t answer, they’re guessing. And you’re paying for the guess.</p>
<hr>
<p><em>If your AI projects are running into walls you can’t explain, the AI Strategy Audit finds the confusion tax in your current initiatives — and tells you exactly what to fix. Or start with the AI Profit Quotient to find out where your company actually sits on the AI spectrum.</em></p>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "The AI Confusion Tax: Why Companies Burn Millions on Wrong AI Solutions",
  "description": "80% of AI projects fail — not from bad data, but wrong problem framing. Learn the 3 patterns that cost enterprises $100K+ per quarter and how to stop paying the AI Confusion Tax.",
  "author": {
    "@type": "Person",
    "name": "Jitin Kapila",
    "url": "https://jitinkapila.com/about"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Jitin Kapila",
    "url": "https://jitinkapila.com"
  },
  "datePublished": "2026-04-05",
  "dateModified": "2026-04-05",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://jitinkapila.com/blog/strategy/ai-confusion-tax/"
  },
  "keywords": ["AI confusion tax", "AI project failure", "wrong AI solution", "classical ML vs GenAI", "AI strategy"],
  "wordCount": 1100,
  "url": "https://jitinkapila.com/blog/strategy/ai-confusion-tax/"
}
</script>

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is the AI Confusion Tax?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The AI Confusion Tax is the hidden cost companies pay when they select an AI solution — usually GenAI — before defining the problem it needs to solve. It shows up as 12x overspend on the wrong tools, mismatched model types, and months of data preparation that never connects to a business decision."
      }
    },
    {
      "@type": "Question",
      "name": "Why do companies pay the AI Confusion Tax?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Vendors pitch what they sell. Companies don't know the alternatives exist. Every business problem has a specific ML model that's cheaper, faster, and more accurate than GenAI for that job — but most leaders don't know these alternatives exist."
      }
    },
    {
      "@type": "Question",
      "name": "How much does the wrong AI cost?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Companies routinely pay 12x more for GenAI solutions to problems that $5,000-$20,000 of classical ML handles better. A vendor quoted $62,000/year for a supplier categorization problem. The same job done with topic modeling: $15-20K to build, $10/month to run."
      }
    },
    {
      "@type": "Question",
      "name": "What are the three patterns of the AI Confusion Tax?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "1. Wrong AI type for the problem (using GenAI where classical ML fits). 2. Wrong problem selected (using classification where survival analysis fits). 3. Data before decision (spending months on data prep without knowing what decision the AI is supposed to improve)."
      }
    },
    {
      "@type": "Question",
      "name": "How to avoid the AI Confusion Tax?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Before approving any AI project, ask: can the person scoping this tell me which AI type this problem needs — classification, regression, clustering, survival analysis — and why that type is the right one for this specific business decision? If they can't answer, they're guessing. And you're paying for the guess."
      }
    }
  ]
}
</script>



</section>

]]></description>
  <category>strategy</category>
  <guid>https://jitinkapila.com/writing/ai-confusion-tax/</guid>
  <pubDate>Sat, 04 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://jitinkapila.com/writing/ai-confusion-tax/cover-image.png" medium="image" type="image/png" height="81" width="144"/>
</item>
<item>
  <title>AI Strategic Skills: Where Should a CEO Draw the Line?</title>
  <dc:creator>Jitin Kapila</dc:creator>
  <link>https://jitinkapila.com/writing/ai-strategic-skills/</link>
  <description><![CDATA[ 





<script type="application/ld+json">
[
  {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "AI Strategic Skills: Where Should a CEO Draw the Line?",
    "description": "The 5 essential AI strategic skills that leaders and teams need to thrive in an AI-driven world — and where the line between strategy and execution really sits.",
    "image": "https://jitinkapila.com/assets/img/pexels-car-dashboard.jpg",
    "author": {
      "@type": "Person",
      "name": "Jitin Kapila",
      "url": "https://jitinkapila.com/about",
      "jobTitle": "AI Strategy Consultant",
      "worksFor": {
        "@type": "Organization",
        "name": "Kriyalytics",
        "url": "https://kriyalytics.com"
      }
    },
    "publisher": {
      "@type": "Organization",
      "name": "Jitin Kapila",
      "logo": {
        "@type": "ImageObject",
        "url": "https://jitinkapila.com/assets/logo.png"
      }
    },
    "datePublished": "2026-01-29",
    "dateModified": "2026-01-29",
    "mainEntityOfPage": {
      "@type": "WebPage",
      "@id": "https://jitinkapila.com/blog/ai-strategic-skills"
    },
    "articleSection": "AI Strategy",
    "keywords": ["AI strategic skills", "executive AI", "AI governance", "problem-solution mapping", "architectural literacy", "data viability", "time-to-value", "prompt strategy", "CEO AI skills"],
    "wordCount": 1200,
    "about": [
      {
        "@type": "Thing",
        "name": "AI Strategic Skills",
        "description": "The 5 essential competencies that business leaders need to evaluate, manage, and extract value from AI investments"
      }
    ]
  },
  {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
      {
        "@type": "Question",
        "name": "What are the 5 AI strategic skills for executives?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "The 5 AI strategic skills are: (1) Problem-Solution Mapping — knowing which AI tool solves which business problem and avoiding the mistake of forcing one tool type onto every problem; (2) Architectural Literacy — understanding enough about AI system architecture to know when a vendor is building a dependency versus solving a problem; (3) Data Viability — knowing what data needs to be captured today to enable prediction six months from now; (4) Time-to-Value — defining payback milestones before starting an AI project, not after; and (5) Prompt Strategy — understanding how to structure AI interactions for business outcomes, not just generating outputs."
        }
      },
      {
        "@type": "Question",
        "name": "What's the difference between strategic AI skills and technical AI skills?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Technical AI skills (coding, model training, data engineering) belong to the technical team. Strategic AI skills belong to business leaders. The key distinction: you own the Pain, they own the Steps. Executives define the strategy, articulate the pain (Point A) and destination (Point B), and link model accuracy to dollars saved. Technical teams choose models, clean data, and build integration. The gap between these two worlds — where nobody translates technical output to business value — is where most AI projects fail."
        }
      },
      {
        "@type": "Question",
        "name": "Why do most AI projects fail due to translation error?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "The CEO says 'I want more revenue.' The data team hears 'I need a model with high accuracy score.' These are not the same thing. Revenue can come from lowering churn, but only if you focus on high-CLTV customers. A technically accurate model on low-value customers is meaningless. The translation error — executives not defining the business lever and technical teams not asking — produces a technically successful model that drives zero actual revenue."
        }
      },
      {
        "@type": "Question",
        "name": "How does the dashboard analogy explain AI literacy for executives?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Think of AI as car instrumentation: the Rearview Mirror (Exploratory AI) answers 'what just happened?' — traditional dashboards. The Windshield (Predictive AI) answers 'what's coming at us?' — forecasting and churn prediction. The GPS Map (Prescriptive AI) answers 'what's the best route?' — optimization and recommendations. Leaders need to know how to read all three, not build any of them. Each requires different AI tools: exploratory uses dashboards, predictive uses machine learning, prescriptive uses optimization."
        }
      },
      {
        "@type": "Question",
        "name": "What does 'you cannot delegate delivery of value' mean in AI?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "AI isn't like buying Microsoft Office where you install it and walk away. Leaders cannot delegate the small involvement (defining the North Star metric) or the large involvement (revamping business processes to actually use the new intelligence). You also cannot delegate the Data Viability Instinct — asking 'what data do we need to capture today to predict the future in six months?' If you delegate the business process change or data strategy, you don't own the value. You just own the bill."
        }
      }
    ]
  },
  {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "name": "Home",
        "item": "https://jitinkapila.com"
      },
      {
        "@type": "ListItem",
        "position": 2,
        "name": "Blog",
        "item": "https://jitinkapila.com/blog"
      },
      {
        "@type": "ListItem",
        "position": 3,
        "name": "AI Strategy",
        "item": "https://jitinkapila.com/blog/ai-strategy"
      },
      {
        "@type": "ListItem",
        "position": 4,
        "name": "AI Strategic Skills",
        "item": "https://jitinkapila.com/blog/strategy/ai-strategic-skills"
      }
    ]
  }
]
</script>

<p><br></p>
<div class="padded text-center justify-center page-columns page-full" style="margin:auto;">
<div class="quarto-figure quarto-figure-center page-columns page-full">
<figure class="figure page-columns page-full">
<p class="page-columns page-full"><a href="pexels-car-dashboard.jpg" class="lightbox page-columns page-full" data-gallery="quarto-lightbox-gallery-1" title="Driving your way with AI"><img src="https://jitinkapila.com/writing/ai-strategic-skills/pexels-car-dashboard.jpg" class="profile-image img-fluid figure-img column-body" style="width: 75%; margin:auto;" alt="Driving your way with AI"></a></p>
<figcaption>Driving your way with AI</figcaption>
</figure>
</div>
</div>
<p><br></p>
<p>We’ve talked about the <a href="https://aicrosscurrent.substack.com/p/top-5-ai-skills-for-executives">skills you need to survive</a> and the <a href="../../writing/why-ai-projects-fail/index.html">reasons most AI projects fail</a>. Now, let’s talk about the destination.</p>
<p>What does a healthy, AI-enabled enterprise actually look like?</p>
<p>It doesn’t look like a room full of glowing servers or a workforce in a state of “FOMO” panic. It looks like a driver sitting in a well-engineered car, hands on the wheel, confident in exactly where they are going.</p>
<p>In my <a href="../../ai-profit-os.html"><strong>“AI Profit OS”</strong></a> framework, the ultimate goal isn’t just “automation”—it’s <strong>Clarity</strong>. But to get there, you need to know exactly where your job ends and the technical team’s job begins. If you blur this line, you’re not just micromanaging; you’re likely to crash the car.</p>
<p>Here is the “Resolution”: The final rulebook for governing AI without getting lost in the code.</p>
<p><br></p>
<section id="the-line-in-the-sand-strategy-vs.-steps" class="level2">
<h2 class="anchored" data-anchor-id="the-line-in-the-sand-strategy-vs.-steps">1. The Line in the Sand: Strategy vs.&nbsp;Steps</h2>
<p>I get asked all the time: <em>“Jitin, how deep do I really need to go into the tech?”</em></p>
<p>The answer is simpler than you think: <strong>You own the Pain; they own the Steps.</strong></p>
<ul>
<li><strong>The CEO &amp; Senior Leaders:</strong> Your job is to define the <strong>Strategy and Value</strong>. You have to articulate the “Pain” (Point A) and the “Destination” (Point B). You’re the one who decides <em>what</em> success is actually worth to the P&amp;L.</li>
<li><strong>The Technical Team:</strong> They own the <strong>Execution</strong>. They choose the models, clean the data, manage the integration, and build the feedback loops.</li>
</ul>
<p><strong>The Crux:</strong> The magic (or the disaster) happens in the middle. This is where your <strong>“BS Detector”</strong> comes in. You don’t need to know how the algorithm works, but you <em>must</em> be able to translate their technical jargon (like “F1 scores” or “Accuracy”) into business outcomes. If you can’t link “model accuracy” to “dollars saved,” you don’t have a strategy—you have a science fair project.</p>
<p><br></p>
</section>
<section id="the-dashboard-analogy-driving-the-business" class="level2">
<h2 class="anchored" data-anchor-id="the-dashboard-analogy-driving-the-business">2. The Dashboard Analogy: Driving the Business</h2>
<p>Stop thinking of AI as a “magic box.” Think of it as the instrumentation of your car. To get from A to B effectively, you need three distinct views through the glass:</p>
<ol type="1">
<li><strong>The Rearview Mirror (Exploratory AI):</strong> “What just happened?” (Traditional data analysis and dashboards).</li>
<li><strong>The Windshield (Predictive AI):</strong> “What’s coming at us?” (Forecasting, Churn Prediction).</li>
<li><strong>The GPS Map (Prescriptive AI):</strong> “What’s the best route to take?” (Optimization, Recommendation Engines).</li>
</ol>
<p><strong>Your Job:</strong> You are the driver. You don’t need to be a mechanic to know how the fuel injection works. But you <em>must</em> know how to read the Windshield and the Map. Part of this is <strong>Problem-Solution Mapping</strong>. If you try to use a “GenAI” chatbot to solve a “Math” problem (like inventory optimization), you’re going to drive off a cliff. Knowing which tool to pull from the <a href="../../writing/ai-umbrella/index.html">AI Umbrella</a> is the definition of strategic competence.</p>
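<p>A one-line way to see why inventory optimization is a “Math” problem and not a language problem: the classic economic order quantity formula. The numbers here are illustrative, not from any client engagement.</p>

```python
# Inventory optimization is arithmetic, not language generation:
# the economic order quantity (EOQ) formula, sqrt(2DS/H).
# Demand and cost figures below are hypothetical.
from math import sqrt

def eoq(annual_demand, order_cost, holding_cost_per_unit):
    """Order size that minimizes combined ordering and holding cost."""
    return sqrt(2 * annual_demand * order_cost / holding_cost_per_unit)

print(round(eoq(annual_demand=12000, order_cost=100, holding_cost_per_unit=3)))
```

<p>No chatbot involved — a closed-form calculation a spreadsheet could do, which is exactly the point of Problem-Solution Mapping.</p>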
<p><br></p>
</section>
<section id="the-one-thing-you-cannot-delegate" class="level2">
<h2 class="anchored" data-anchor-id="the-one-thing-you-cannot-delegate">3. The One Thing You Cannot Delegate</h2>
<p>I’ve seen leaders try to delegate everything—including the outcome. This is fatal. <strong>You cannot delegate the “Delivery of Value.”</strong></p>
<p>AI isn’t like buying a copy of Microsoft Office where you “install it” and walk away.</p>
<ul>
<li><strong>Small Involvement:</strong> Defining the “North Star” Metric.</li>
<li><strong>Large Involvement:</strong> Revamping a business process to actually use the new intelligence.</li>
</ul>
<p>You also can’t delegate the <strong>Data Viability Instinct</strong>. You need to ask: <em>“What data do we need to start capturing TODAY so we can predict the future in six months?”</em> If you delegate the business process change or the data strategy, you don’t own the value—you just own the bill.</p>
<p><br></p>
</section>
<section id="the-promised-land-from-grunt-work-to-growth" class="level2">
<h2 class="anchored" data-anchor-id="the-promised-land-from-grunt-work-to-growth">4. The “Promised Land”: From Grunt Work to Growth</h2>
<p>What happens when you get this right? When you move from “Tier 1” (playing with ChatGPT) to “Tier 3” (The Strategist)?</p>
<p><strong>The Day-to-Day Shift:</strong></p>
<ul>
<li><strong>Before (Chaos):</strong> You’re playing defense. Worrying about overstocking, customer tickets, and hiring frantically just to keep up.</li>
<li><strong>After (Leverage):</strong> You’re playing offense. Your day is spent deciding <em>where to invest next</em>. Which market to enter? What feature to launch?</li>
</ul>
<p><strong>The Human Impact:</strong> I once worked with a Fortune 500 client who feared AI would demoralize their team. The reality? The team stopped doing the “robot work” (the grunt work on high-selling products). Instead, they focused on the creative, high-leverage stuff—like upselling new products. They didn’t shrink; they just became more human.</p>
<p><br></p>
</section>
<section id="the-monday-morning-report" class="level2">
<h2 class="anchored" data-anchor-id="the-monday-morning-report">5. The Monday Morning Report</h2>
<p>How do you sleep at night? You don’t need to check code commits. You need to check the <strong>Value Report</strong>.</p>
<p>Every Monday, you should be looking for one thing: <em>“Are the steps taken by the technical team moving us closer to the ROI we defined?”</em></p>
<p>This requires <strong>Architectural Literacy</strong>. You need to look past the “beautiful plating” of a fancy chatbot wrapper and ask if there’s an actual logic engine beneath it. If the answer is “Yes,” keep driving. If “No,” pull over and check the engine.</p>
<p><br></p>
</section>
<section id="lastly-your-license-to-operate" class="level2">
<h2 class="anchored" data-anchor-id="lastly-your-license-to-operate">Lastly, Your License to Operate</h2>
<p>You don’t need to be a mechanic to drive a Ferrari, but you do need to know if the brakes work and exactly where you’re headed. The “AI Profit OS” isn’t about replacing you—it’s about giving you the dashboard you’ve always wanted.</p>
<p><strong>Are you ready to take the wheel?</strong></p>
<p><a href="../../ai-profit-os.html"><strong>Check out the AI Profit OS Course</strong></a> | <a href="../../work-with-me.html"><strong>Book a Strategic Call</strong></a></p>
</section>

]]></description>
  <category>strategy</category>
  <category>career</category>
  <guid>https://jitinkapila.com/writing/ai-strategic-skills/</guid>
  <pubDate>Wed, 28 Jan 2026 18:30:00 GMT</pubDate>
  <media:content url="https://jitinkapila.com/writing/ai-strategic-skills/pexels-introspectivedsgn-23319107.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Why your GenAI strategy is just a pair of pliers in 2026</title>
  <dc:creator>Jitin Kapila</dc:creator>
  <link>https://jitinkapila.com/writing/genai-is-plier/</link>
  <description><![CDATA[ 






<script type="application/ld+json">
[
  {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "Why Your GenAI Strategy Is Just a Pair of Pliers in 2026",
    "description": "GenAI is one tool in a large AI toolbox — not a complete strategy. Using it everywhere is like trying to build a skyscraper with just pliers.",
    "image": "https://jitinkapila.com/assets/img/genai-plier-effect.png",
    "author": {
      "@type": "Person",
      "name": "Jitin Kapila",
      "url": "https://jitinkapila.com/about",
      "jobTitle": "AI Strategy Consultant",
      "worksFor": {
        "@type": "Organization",
        "name": "Kriyalytics",
        "url": "https://kriyalytics.com"
      }
    },
    "publisher": {
      "@type": "Organization",
      "name": "Jitin Kapila",
      "logo": {
        "@type": "ImageObject",
        "url": "https://jitinkapila.com/assets/logo.png"
      }
    },
    "datePublished": "2026-01-15",
    "dateModified": "2026-01-15",
    "mainEntityOfPage": {
      "@type": "WebPage",
      "@id": "https://jitinkapila.com/blog/genai-is-plier"
    },
    "articleSection": "AI Strategy",
    "keywords": ["GenAI", "AI strategy", "tool selection", "AI umbrella", "enterprise AI", "LLM", "machine learning", "ROI", "AI failure", "strategy"],
    "wordCount": 1200,
    "about": [
      {
        "@type": "Thing",
        "name": "GenAI Limitations",
        "description": "Generative AI excels at language tasks — summarizing, drafting, answering queries — but is the wrong tool for mathematical optimization, prediction, or relational analysis"
      }
    ]
  },
  {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
      {
        "@type": "Question",
        "name": "What does the plier metaphor mean for GenAI strategy?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "GenAI is a powerful tool, but like pliers — it's specialized for specific tasks (gripping, pulling, twisting). Using pliers to drive screws, hammer nails, or dig foundations fails or damages things. Similarly, GenAI (LLMs) is world-class at summarizing documents, drafting emails, and answering questions — but using it to predict inventory needs, detect fraud, or optimize routes is applying the wrong tool to the wrong problem. Companies saying 'we have a GenAI strategy' are often trying to hammer every nail with pliers."
        }
      },
      {
        "@type": "Question",
        "name": "What are the three dangers of 'plier-only' thinking in AI?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "The three dangers are: (1) The Hallucination Trap — using an LLM to 'predict' inventory needs when LLMs are built for language probability, not mathematical optimization; (2) The Thin-Wrapper Problem — building a chat interface on top of bad data where the plating looks nice but the food is terrible; (3) The ROI Gap — spending millions on GPU tokens to do work that simple linear regression or traditional ML can do faster, cheaper, and more accurately. These patterns are primary reasons why most AI projects fail."
        }
      },
      {
        "@type": "Question",
        "name": "How do you know which AI tool to use for which problem?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "The AI Umbrella framework gives the mapping: (1) Need to predict churn? Use Machine Learning (the Torque Wrench); (2) Need to find the best delivery route? Use Optimization & Planning (the GPS); (3) Need to find fraud patterns in a network? Use Graph Algorithms (the X-Ray); (4) Need to summarize a 50-page contract? Use GenAI (the Pliers). Your job as a leader isn't to build the tools — it's to know when to use each one."
        }
      },
      {
        "@type": "Question",
        "name": "Why is prompt engineering the least important of the 5 AI strategic skills?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Prompt engineering is the entry-level skill — it's about how you talk to the AI. Problem-solution mapping, architectural literacy, data viability, and time-to-value matter far more. You could write the perfect prompt and still fail because you chose the wrong problem to apply GenAI to, or because your data is missing, or because the output doesn't map to any business decision. Leaders who focus on prompting miss the strategic skills that actually drive AI value."
        }
      }
    ]
  },
  {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "name": "Home",
        "item": "https://jitinkapila.com"
      },
      {
        "@type": "ListItem",
        "position": 2,
        "name": "Blog",
        "item": "https://jitinkapila.com/blog"
      },
      {
        "@type": "ListItem",
        "position": 3,
        "name": "AI Strategy",
        "item": "https://jitinkapila.com/blog/ai-strategy"
      },
      {
        "@type": "ListItem",
        "position": 4,
        "name": "GenAI Is a Pair of Pliers",
        "item": "https://jitinkapila.com/blog/strategy/genai-is-plier"
      }
    ]
  }
]
</script>
<div class="padded text-center justify-center page-columns page-full" style="margin:auto;">
<div class="quarto-figure quarto-figure-center page-columns page-full">
<figure class="figure page-columns page-full">
<p class="page-columns page-full"><a href="genai-plier-effect.png" class="lightbox page-columns page-full" data-gallery="quarto-lightbox-gallery-1" title="You don’t build your dream home with just a pair of pliers."><img src="https://jitinkapila.com/writing/genai-is-plier/genai-plier-effect.png" class="profile-image img-fluid figure-img column-body" style="width: 75%; margin:auto;" alt="You don’t build your dream home with just a pair of pliers."></a></p>
<figcaption>You don’t build your dream home with just a pair of pliers.</figcaption>
</figure>
</div>
</div>
<p><br></p>
<p>Imagine walking onto a massive construction site where a crew is trying to build a skyscraper. You look at their toolbelts and see one thing: a pair of pliers.</p>
<p>They are trying to tighten bolts with them. They are trying to hammer nails with them. They are even trying to dig the foundation with them.</p>
<p>It sounds ridiculous, right? But in 2026, this is exactly what most “GenAI Strategies” look like. Companies are buying the “pliers” (LLMs like ChatGPT, Claude, or Gemini) and trying to force them to solve every problem in the enterprise—from inventory optimization to fraud detection.</p>
<p><strong>GenAI is a powerful tool. But it is just one tool in a very large toolbox.</strong></p>
<p><br></p>
<section id="the-plier-metaphor-what-genai-actually-does" class="level2">
<h2 class="anchored" data-anchor-id="the-plier-metaphor-what-genai-actually-does">1. The Plier Metaphor: What GenAI Actually Does</h2>
<p>In my <a href="../../writing/ai-umbrella/index.html">AI Umbrella framework</a>, I categorize GenAI as a tool for handling <strong>Cognitive Load</strong>.</p>
<p>Pliers are great for a specific set of tasks: gripping, pulling, and twisting. Similarly, GenAI is world-class at:</p>
<ul>
<li><strong>Summarizing</strong> long documents.</li>
<li><strong>Drafting</strong> emails or reports.</li>
<li><strong>Answering</strong> basic customer queries.</li>
<li><strong>Generating</strong> creative ideas.</li>
</ul>
<p>But you wouldn’t use pliers to drive a screw (you’d use a screwdriver) or to dig a trench (you’d use an excavator). If you try to use a “Plier” (GenAI) to solve a “Math” problem or a “Relationship” problem, the tool will fail—or worse, it will “hallucinate” a solution that looks right but is dangerously wrong.</p>
<p><br></p>
</section>
<section id="the-danger-of-plier-only-thinking" class="level2">
<h2 class="anchored" data-anchor-id="the-danger-of-plier-only-thinking">2. The Danger of “Plier-Only” Thinking</h2>
<p>When a CEO tells me, <em>“We have a GenAI strategy,”</em> I usually hear, <em>“We are trying to hammer every nail with pliers.”</em> This is a primary reason why <a href="../../writing/why-ai-projects-fail/index.html">most AI projects fail</a>.</p>
<p>Here is what happens when you use the wrong tool:</p>
<ul>
<li><strong>The Hallucination Trap:</strong> Using an LLM to “predict” inventory needs. LLMs are built for language probability, not mathematical optimization.</li>
<li><strong>The Thin-Wrapper Problem:</strong> Building a “chat interface” on top of bad data. The “plating” looks nice, but the “food” is still terrible (as I discussed in <a href="../../writing/ai-strategic-skills/index.html">AI Strategic Skills</a>).</li>
<li><strong>The ROI Gap:</strong> Spending millions on GPU tokens to do work that a simple linear regression (traditional ML) could do faster, cheaper, and more accurately.</li>
</ul>
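<p>To make the ROI Gap concrete, here is a minimal Python sketch. Every number in it is invented for illustration (the demand series, the per-call costs, the call volume), and it assumes NumPy and scikit-learn are available; the point is the shape of the comparison, not the figures:</p>

```python
# Illustrative only: invented demand data and invented unit costs.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
months = np.arange(24).reshape(-1, 1)             # two years of history
demand = 1000 + 50 * months.ravel() + rng.normal(0, 30, 24)

model = LinearRegression().fit(months, demand)    # trains in milliseconds
forecast = model.predict(np.array([[24]]))[0]     # next month's demand

# Hypothetical unit economics: a regression call is effectively free,
# while an LLM pipeline pays per token on every forecast it emits.
REGRESSION_COST = 0.00001     # invented $/call
LLM_COST = 0.03               # invented $/call
CALLS_PER_MONTH = 100_000
gap = (LLM_COST - REGRESSION_COST) * CALLS_PER_MONTH
print(f"Forecast: {forecast:.0f} units; monthly cost gap: ${gap:,.0f}")
```

<p>The particular numbers don’t matter; what matters is that a bounded numeric question deserves a bounded numeric tool.</p>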
<p><br></p>
</section>
<section id="building-a-full-toolbox" class="level2">
<h2 class="anchored" data-anchor-id="building-a-full-toolbox">3. Building a Full Toolbox</h2>
<p>To move from “Tier 1” (playing with tools) to “Tier 3” (The Strategist), you need to understand the rest of the toolbox.</p>
<p>As we explored in <a href="../../writing/decision-first-ai/index.html">Decision-First AI</a>, you must start with the <strong>Decision</strong> first, then pick the tool:</p>
<ol type="1">
<li><strong>Need to predict churn?</strong> Use <strong>Machine Learning</strong> (The Torque Wrench).</li>
<li><strong>Need to find the best delivery route?</strong> Use <strong>Optimization &amp; Planning</strong> (The GPS).</li>
<li><strong>Need to find fraud patterns in a network?</strong> Use <strong>Graph Algorithms</strong> (The X-Ray).</li>
<li><strong>Need to summarize a 50-page contract?</strong> <strong>Now</strong> you pull out the <strong>GenAI Pliers</strong>.</li>
</ol>
<p><br></p>
</section>
<section id="the-executives-job-choosing-the-tool" class="level2">
<h2 class="anchored" data-anchor-id="the-executives-job-choosing-the-tool">4. The Executive’s Job: Choosing the Tool</h2>
<p>Your job as a leader isn’t to know how to <em>build</em> the pliers. It’s to know <em>when</em> to use them.</p>
<p>In my <strong>“AI Profit OS”</strong> framework, we teach leaders to move past the hype and look at the “Umbrella.” If your technical team is suggesting a GenAI solution for a problem that requires structural data logic, you need to pull the emergency brake.</p>
<p><strong>Are you building a skyscraper, or just playing with pliers?</strong></p>
<p><br></p>
</section>
<section id="take-the-next-step" class="level2">
<h2 class="anchored" data-anchor-id="take-the-next-step">Take the Next Step</h2>
<p>Don’t let your strategy get stuck in “pilot purgatory” because you picked the wrong tool.</p>
<p><a href="../../ai-profit-os.html"><strong>Join the AI Profit OS Cohort</strong></a> | <a href="../../work-with-me.html"><strong>Book a Strategic Call</strong></a></p>
<p><em>If you need help identifying which tool in the AI Umbrella fits your specific business pain, let’s talk.</em></p>
</section>

<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>strategy</category>
  <category>business</category>
  <guid>https://jitinkapila.com/writing/genai-is-plier/</guid>
  <pubDate>Wed, 14 Jan 2026 18:30:00 GMT</pubDate>
</item>
<item>
  <title>Why Most AI Strategies Fail</title>
  <dc:creator>Jitin Kapila</dc:creator>
  <link>https://jitinkapila.com/writing/why-ai-projects-fail/</link>
  <description><![CDATA[ 







<div class="column-body padded text-center justify-center" style="margin:auto;">
<p><a href="business-team-thinking-about-possible-solution.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://jitinkapila.com/writing/why-ai-projects-fail/business-team-thinking-about-possible-solution.jpg" class="profile-image img-fluid" style="width: 75%; margin:auto;"></a></p>
</div>
<p><br></p>
<p><strong>Most AI projects don’t fail because the code is bad. They fail because they were born from the wrong emotions.</strong></p>
<p>In my experience advising Fortune 500s and startups on data science, I’ve seen hundreds of AI initiatives. The successful ones all start by identifying a specific business <strong>PAIN</strong>. The failed ones? They almost always begin with the “Triad of Bad Feelings”: <strong>Pressure, FOMO, or Anxiety.</strong></p>
<ul>
<li><strong>Pressure:</strong> The Board says, “We need to do something with AI.”</li>
<li><strong>FOMO (Fear Of Missing Out):</strong> You see a competitor launch a chatbot and feel the need to keep up.</li>
<li><strong>Anxiety:</strong> You read the headlines and fear your business is becoming irrelevant in a hype cycle.</li>
</ul>
<p>When you build from anxiety instead of pain, you get “Science Fair Projects”—expensive pilots that never touch the P&amp;L. Here is my diagnostic guide to why your last AI project might have stalled, and the specific “Technical Fluency” gaps that likely caused it.</p>
<p><br></p>
<section id="the-1-silent-killer-absence-of-pain" class="level2">
<h2 class="anchored" data-anchor-id="the-1-silent-killer-absence-of-pain">The #1 Silent Killer: Absence of Pain</h2>
<p>If you cannot name the specific business pain you are solving, your project is already dead.</p>
<p>I recently audited a company where the C-Suite wanted “AI for efficiency.” That is not a pain; that is a wish. Because they couldn’t quantify the pain (e.g., “We are losing $50,000 per month in manual data entry errors”), they couldn’t quantify the <strong>Value</strong> of a solution.</p>
<p>The result was a project with no “End in Mind.” The team drifted, timelines slipped, and the budget evaporated because no one knew what “success” actually looked like.</p>
<p><strong>The Fix:</strong> Before writing a single line of code, you must be able to fill in this blank:</p>
<blockquote class="blockquote">
<p><em>“This AI solves <strong>[Specific Pain]</strong> which is currently costing us <strong>[$$$$]</strong> per month.”</em></p>
</blockquote>
<p><br></p>
</section>
<section id="the-translation-error-ceo-vs.-data-scientist" class="level2">
<h2 class="anchored" data-anchor-id="the-translation-error-ceo-vs.-data-scientist">The “Translation Error”: CEO vs.&nbsp;Data Scientist</h2>
<p>This is the most common friction point I see in enterprise AI.</p>
<ul>
<li><strong>The CEO says:</strong> <em>“I want more revenue.”</em></li>
<li><strong>The Data Team hears:</strong> <em>“I need a model with a high accuracy score.”</em></li>
</ul>
<p>These are not the same thing. “More revenue” is a goal; high accuracy is just one possible tactic. For example, revenue can come from lowering churn, but only if you focus on the right customers. A 1% churn reduction on low-value customers is meaningless. You need to save the High-CLTV (Customer Lifetime Value) clients.</p>
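<p>A back-of-the-envelope calculation shows the gap. The segment sizes and CLTV figures below are invented purely for illustration:</p>

```python
# Invented numbers: the same 1% churn reduction, two very different payoffs.
low_value = {"customers": 50_000, "cltv": 200}      # many small accounts
high_value = {"customers": 2_000, "cltv": 25_000}   # few strategic accounts

def retained_value(segment, churn_reduction=0.01):
    """Lifetime value retained by cutting churn in this segment."""
    return segment["customers"] * churn_reduction * segment["cltv"]

print(retained_value(low_value))    # 500 customers saved -> $100,000
print(retained_value(high_value))   # 20 customers saved  -> $500,000
```

<p>Saving 20 of the right customers returns five times what saving 500 of the wrong ones does. That is the lever the model must be pointed at.</p>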
<p><a href="../../writing/decision-first-ai/index.html"><strong>The Fluency Gap:</strong></a> The executives didn’t define the business lever (CLTV), and the tech team didn’t ask. The result is a technically “successful” model that drives zero actual revenue.</p>
<p><br></p>
</section>
<section id="when-to-kill-an-ai-project-the-code-red-checklist" class="level2">
<h2 class="anchored" data-anchor-id="when-to-kill-an-ai-project-the-code-red-checklist">When to Kill an AI Project: The “Code Red” Checklist</h2>
<p>Part of my job as a strategist is telling clients when to stop spending money. An AI project should be put on hold or killed immediately if it meets these criteria:</p>
<ol type="1">
<li><strong>Invisible ROI:</strong> You are three months in and still cannot <a href="../../writing/ai-umbrella/index.html">map the model’s output</a> to a dollar value.</li>
<li><strong>Missing Ingredients:</strong> The core data is missing or inaccessible. You can’t build a churn model if you aren’t tracking customer complaints or service failures.</li>
<li><strong>The “30% Drift” Rule:</strong> 30% of the project timeline has passed, and the team still has no clear picture of what the final output looks like, or the project lacks stakeholder support.</li>
</ol>
<p>The only exception is when the potential ROI is massive and justifies a high-risk, “Agile” approach to de-risk the project in smaller steps.</p>
<p><br></p>
</section>
<section id="case-study-how-fixing-definitions-saved-a-fortune-500" class="level2">
<h2 class="anchored" data-anchor-id="case-study-how-fixing-definitions-saved-a-fortune-500">Case Study: How Fixing Definitions Saved a Fortune 500</h2>
<p>I worked with a <a href="../../../case-studies/retail/index.html">global Fortune 500 retailer</a> whose AI initiatives were failing.</p>
<p><strong>The Problem:</strong> Every country operated as a mini-company. They had different definitions for “Delivery Time,” “Stock in Hand,” and “Marketing Spend.”</p>
<p><strong>The Consequence:</strong> The global AI forecasting model was useless because the input data meant different things in different regions. This “Translation Error” was costing them millions in lost inventory.</p>
<p><strong>The Turnaround:</strong> We didn’t start with a better AI model. We started with <strong>Definitions</strong>. By standardizing the meaning of “ROI” and “COGS” across their SAPMENA region, the data became clean. Once the data was clean, the AI model could actually forecast.</p>
<p>The lesson is simple: You cannot layer AI on top of organizational chaos. You must fix the business definitions first.</p>
<p><br></p>
<section id="how-to-audit-your-ai-strategy" class="level3">
<h3 class="anchored" data-anchor-id="how-to-audit-your-ai-strategy">How to Audit Your AI Strategy</h3>
<p>If you are feeling the <strong>Pressure, FOMO, or Anxiety</strong> of AI right now, stop. Don’t build another “Solution looking for a Problem.” My framework helps leaders:</p>
<ol type="1">
<li><strong>Identify the PAIN</strong> (not the hype).</li>
<li><strong>Quantify the Value</strong> (put a dollar sign on it).</li>
<li><strong>Assign Accountability</strong> (who owns the ROI?).</li>
</ol>
<p>If you want to diagnose your current AI roadmap before you spend another dollar, <a href="https://cal.com/kriyalytics/strategy">let’s talk</a>.</p>
</section>
</section>

<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>business</category>
  <category>strategy</category>
  <guid>https://jitinkapila.com/writing/why-ai-projects-fail/</guid>
  <pubDate>Wed, 07 Jan 2026 18:30:00 GMT</pubDate>
  <media:content url="https://jitinkapila.com/writing/why-ai-projects-fail/business-team-thinking-about-possible-solution.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Decision-First AI: Why Data Should Follow, Not Lead</title>
  <dc:creator>Jitin Kapila</dc:creator>
  <link>https://jitinkapila.com/writing/decision-first-ai/</link>
  <description><![CDATA[ 





<script type="application/ld+json">
[
  {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "Decision-First AI: Why Data Should Follow, Not Lead",
    "description": "The dataset-first trap that kills AI projects — and the Decision Map framework that flips the order. Start with the decision, map it, then pick the simplest data and model.",
    "image": "https://jitinkapila.com/assets/img/panel.jpg",
    "author": {
      "@type": "Person",
      "name": "Jitin Kapila",
      "url": "https://jitinkapila.com/about",
      "jobTitle": "AI Strategy Consultant",
      "worksFor": {
        "@type": "Organization",
        "name": "Kriyalytics",
        "url": "https://kriyalytics.com"
      }
    },
    "publisher": {
      "@type": "Organization",
      "name": "Jitin Kapila",
      "logo": {
        "@type": "ImageObject",
        "url": "https://jitinkapila.com/assets/logo.png"
      }
    },
    "datePublished": "2025-10-09",
    "dateModified": "2025-10-09",
    "mainEntityOfPage": {
      "@type": "WebPage",
      "@id": "https://jitinkapila.com/blog/decision-first-ai"
    },
    "articleSection": "AI Strategy",
    "keywords": ["decision-first AI", "data strategy", "decision map", "AI project planning", "decision-focused learning", "predict then optimize", "AI failure prevention"],
    "wordCount": 1600,
    "about": [
      {
        "@type": "Thing",
        "name": "Decision-First AI",
        "description": "An AI project methodology that starts with defining the decision to be changed, then maps the minimal data and simplest model needed to improve that decision"
      },
      {
        "@type": "Thing",
        "name": "Decision Map",
        "description": "A one-page framework with six steps for scoping an AI project before touching data: name the decision, define the metric, map the process, identify the action trigger, list minimal signals, and plan the feedback loop"
      }
    ]
  },
  {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
      {
        "@type": "Question",
        "name": "What is the dataset-first trap in AI projects?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "The dataset-first trap is the pattern where an AI project starts with a dataset, builds dashboards and models, shows a demo with good accuracy — and then nobody can answer 'what decision does this support?' The project optimizes prediction metrics (error, F1, AUC, MAPE) without linking them to business decisions. The result: technically good models that drive zero business impact. The trap wastes time, money, and faith, and it gives AI a bad name."
        }
      },
      {
        "@type": "Question",
        "name": "What are the 6 steps of the Decision Map framework?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "The Decision Map has 6 steps: (1) Name the Decision — who decides, how often, what options; (2) Define the Decision Metric — primary and secondary measures of success; (3) Map the Current Process — where decisions are made, who is involved, where delays happen; (4) Identify the Minimal Action Trigger — what threshold or output causes a behavior change; (5) List Minimal Signals — the smallest set of data needed to reduce uncertainty for this decision; (6) Plan the Feedback Loop — how to measure the decision metric after deployment. Completing this map does 80% of the work most teams skip."
        }
      },
      {
        "@type": "Question",
        "name": "Why do models with better accuracy sometimes produce worse business outcomes?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Because accuracy is not the same as decision value. A model can be 95% accurate on the wrong question and produce confident wrong answers. If your data was collected for reporting purposes and your decision is made on a different variable, the model is accurate to the wrong question. Research in 'decision-focused learning' (smart predict-then-optimize) shows that training models directly for decision loss — not prediction loss — yields better business outcomes even with 'worse' prediction metrics."
        }
      },
      {
        "@type": "Question",
        "name": "How does decision-first prevent AI project failures?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Decision-first forces alignment before investment. If the decision owner cannot commit to a measurable metric, the project is paused before spending begins. This prevents the common failure of building technically impressive systems nobody uses. The telecom case study demonstrates this: a real-time anomaly detection system saved ~£80K/month because the decision was so clearly scoped — 'dispatch a technician for a suspected DSL fault' — that value could be measured before full scale. The clarity came from starting with the decision, not the data."
        }
      },
      {
        "@type": "Question",
        "name": "What does 'start with a rule baseline' mean in decision-first AI?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Before building any model, define a simple rule that represents the current baseline behavior — for example, 'if X > T, create a ticket.' This rule is your fail-safe and your benchmark. If a machine learning model cannot beat that rule in decision impact — not just prediction accuracy — it should be scrapped. Many projects spend months and significant budget building models that marginally improve accuracy metrics while matching or underperforming simple business rules in actual business outcomes."
        }
      }
    ]
  },
  {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "name": "Home",
        "item": "https://jitinkapila.com"
      },
      {
        "@type": "ListItem",
        "position": 2,
        "name": "Blog",
        "item": "https://jitinkapila.com/blog"
      },
      {
        "@type": "ListItem",
        "position": 3,
        "name": "AI Strategy",
        "item": "https://jitinkapila.com/blog/ai-strategy"
      },
      {
        "@type": "ListItem",
        "position": 4,
        "name": "Decision-First AI",
        "item": "https://jitinkapila.com/blog/strategy/decision-first-ai"
      }
    ]
  }
]
</script>

<p><strong>Start here:</strong> don’t open a notebook until you know the decision you want to change.</p>
<p>Sounds obvious. But most AI projects don’t start there. They start with a dataset, or with a “let’s try this model,” or with a platform demo that looks great in the cloud. And then weeks later the obvious question appears: “Okay — what decision does this support?” People shrug. The project stalls. The models are good. The business impact is vague.</p>
<p>This is the dataset-first trap. <a href="../../writing/why-ai-projects-fail/index.html">It wastes time, money, and faith.</a> It also gives AI a bad name.</p>
<p>I’ve seen the opposite work — a lot. Start with the decision. Map the decision. Then pick the simplest data and model that make the decision better. The result? Faster pilots, clearer ROI, and systems that actually get used.</p>
<p>That approach is not just a neat management idea. There’s a real body of research showing that aligning models to downstream decision goals yields better decisions than optimizing prediction accuracy alone. And there are real, practical wins — from telecom fault detection to inventory systems — <a href="../../work-with-me.html">when you flip the order.</a></p>
<p><br></p>
<section id="the-dataset-first-trap-what-it-looks-like-and-why-it-hurts" class="level2">
<h2 class="anchored" data-anchor-id="the-dataset-first-trap-what-it-looks-like-and-why-it-hurts">The “dataset-first” trap — what it looks like, and why it hurts</h2>
<p>Here’s the typical playbook I see in companies:</p>
<ul>
<li>Someone discovers a new dataset.</li>
<li>They build dashboards, then a model, then a finer model, then a fancier model.</li>
<li>They show a demo. The demo gets applause. Then the work hits integration, governance, and the messy reality of people who must make decisions every day. <a href="../../writing/ai-umbrella/index.html">The model’s outputs don’t map to a decision process.</a> So adoption fails.</li>
</ul>
<p>Why? Because the project optimized the wrong thing. It optimized prediction metrics — error, F1, AUC, MAPE. And those are useful. But they’re not the measure of business impact. A model with better accuracy can still be useless if it doesn’t change what someone does.</p>
<p>Harvard Business Review captured this idea well: decisions don’t start with data. They start with a problem, a role, a process, and a behavior. If your analytics don’t connect to that reality, you get slides and disappointment.</p>
<p>There are more subtle costs too. Dataset-first projects often create models that are brittle in production: they overfit to historical quirks, they require constant data wrangling, and they produce numbers no one trusts. That kills scale.</p>
<section id="decision-first-in-research-not-new-but-finally-practical" class="level3">
<h3 class="anchored" data-anchor-id="decision-first-in-research-not-new-but-finally-practical">Decision-first in research: not new, but finally practical</h3>
<p>There’s academic grounding for starting with decisions. In the machine-learning community this shows up as “decision-focused learning” or “smart predict-then-optimize.” The idea: train predictive models not for pure accuracy, but to minimize the loss that matters to the downstream optimization or decision task. When you optimize directly for the decision loss, you often get better business outcomes — even with “worse” prediction metrics.</p>
<p>Recent papers and reviews show both the theory and practical methods: surrogate losses that reflect decision outcomes, techniques to differentiate through optimization, and heuristics for discrete problems. The takeaway: the math supports the intuition. If you want a model to help choose inventory levels, price points, or routing, train it with that decision in mind — not just with RMSE.</p>
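<p>To make the contrast concrete, here is a toy sketch (my own illustration, not the SPO method itself; all numbers and costs are made up): two demand forecasts, one better on MSE, the other better on the stocking decision it feeds.</p>

```python
# Toy illustration: the forecast with better accuracy (lower MSE) can still
# be the worse forecast for the decision it feeds.
# We stock to the forecast; each unit of unmet demand costs 10, leftover costs 1.

def decision_cost(stock, demand, underage=10.0, overage=1.0):
    """Business cost of one stocking decision given realized demand."""
    if demand > stock:
        return (demand - stock) * underage  # lost sales hurt a lot
    return (stock - demand) * overage       # holding cost is mild

demands    = [100, 120,  90, 110, 105]
forecast_a = [100, 118,  92, 111, 104]  # accurate (low MSE) but sometimes short
forecast_b = [103, 121,  95, 113, 107]  # biased high (worse MSE), hedges stockouts

def mse(forecast, actual):
    return sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual)

def total_cost(forecast, actual):
    return sum(decision_cost(f, a) for f, a in zip(forecast, actual))

print(mse(forecast_a, demands), mse(forecast_b, demands))                # 2.0 9.6
print(total_cost(forecast_a, demands), total_cost(forecast_b, demands))  # 33.0 14.0
```

<p>Forecast A wins on the prediction metric; forecast B wins on the money. Decision-focused training bakes this asymmetry into the loss instead of discovering it after deployment.</p>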
<p>That doesn’t mean every model must be complex. Often the opposite. Framing the decision reduces model complexity because you only model what matters. This is your Occam’s Razor.</p>
<p><br></p>
</section>
</section>
<section id="the-decision-map-simple-framework-you-can-use-today" class="level2">
<h2 class="anchored" data-anchor-id="the-decision-map-simple-framework-you-can-use-today">The Decision Map: simple framework you can use today</h2>
<p>If you want to flip to decision-first, start with a small, disciplined tool I call a <strong>Decision Map</strong>. It’s a one-page artefact. Build it before you touch data.</p>
<p>Here’s the <strong>Decision Map</strong> — six steps. Do them in order.</p>
<ul>
<li><p><strong>Name the decision.</strong> Who decides, how often, and what options do they choose? Example: “Field-ops decides whether to dispatch a technician to a suspected DSL fault.” Be specific. Frequency matters — hourly, daily, weekly change what you can do.</p></li>
<li><p><strong>Define the decision metric(s).</strong> What counts as success? Lower cost? Faster response time? Increase in net revenue? Pick one primary metric and one secondary. If you can’t name it in a single measurable sentence, you don’t have a decision.</p></li>
<li><p><strong>Map the current process.</strong> Where is the decision made today? Which people and tools are involved? Where does data enter? Where do delays happen? This step exposes the friction you must remove.</p></li>
<li><p><strong>Identify the minimal action the model must trigger.</strong> The model doesn’t need to be perfect. It needs to change behavior. If the model’s output is a probability, what threshold triggers action? Who gets the alert? What’s the handoff?</p></li>
<li><p><strong>List the minimal signals (data) needed.</strong> Only include data that directly reduces uncertainty for the decision. You’ll be surprised how small this list often is. Think: signal → action. Not “all the data.”</p></li>
<li><p><strong>Plan the feedback loop.</strong> How will you measure the decision metric after deployment? How will you collect labels and iterate? Decide that upfront.</p></li>
</ul>
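<p>If it helps your team, the six steps above can be captured as a literal one-page artefact in code. A minimal sketch (the class and field names are my own invention, not a library):</p>

```python
from dataclasses import dataclass, field

@dataclass
class DecisionMap:
    """One-page Decision Map: fill every field before touching data."""
    decision: str            # who decides, how often, choosing among what options
    primary_metric: str      # the one measurable sentence that defines success
    secondary_metric: str
    current_process: str     # where the decision is made today, and by whom
    trigger_action: str      # the minimal action the model output must trigger
    minimal_signals: list = field(default_factory=list)  # signal -> action, not "all the data"
    feedback_loop: str = ""  # how the decision metric is measured post-deployment

    def is_ready_for_pilot(self) -> bool:
        """A pilot is justified only when every step is filled in."""
        return all([self.decision, self.primary_metric, self.current_process,
                    self.trigger_action, self.minimal_signals, self.feedback_loop])

dsl_map = DecisionMap(
    decision="Field-ops decides whether to dispatch a technician to a suspected DSL fault",
    primary_metric="Reduce customer-reported faults",
    secondary_metric="Monthly savings from fewer reactive truck rolls",
    current_process="Customer calls -> ticket -> dispatch, often days later",
    trigger_action="High-confidence anomaly creates a high-priority ticket",
    minimal_signals=["DSL line metrics", "error rates", "device telemetry"],
    feedback_loop="Compare flagged incidents to customer complaints monthly",
)
print(dsl_map.is_ready_for_pilot())  # True
```

<p>If <code>is_ready_for_pilot()</code> is false, you are missing one of the six steps; pause the project until the owner can fill the gap.</p>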
<p>If you complete this map, you’ll have done 80% of the work most teams skip. It forces alignment, and it reveals whether the project is worth doing.</p>
<p><a href="panel.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://jitinkapila.com/writing/decision-first-ai/panel.jpg" class="img-fluid"></a></p>
<p>P.S. - I am hosting a course where you’ll learn exactly how to <a href="../../ai-profit-os.html">implement this for your use case</a>.</p>
<p><br></p>
<section id="a-telecom-example-mapped-end-to-end-real-story" class="level3">
<h3 class="anchored" data-anchor-id="a-telecom-example-mapped-end-to-end-real-story">A telecom example, mapped end-to-end (real story)</h3>
<p>A brief, real example makes this concrete. I built a real-time anomaly detection system for a European telecom client early in my career. The project didn’t begin with “we have logs.” It began with this decision map:</p>
<ul>
<li><strong>Decision:</strong> Should operations dispatch a field technician proactively for a suspected DSL fault?</li>
<li><strong>Metric:</strong> Reduce customer-reported faults and improve Net Promoter Score (NPS) by reducing time-to-detect. Also: monthly cost savings from fewer reactive truck rolls and more planned ones.</li>
<li><strong>Process:</strong> Operations received the customer call, created a ticket, and dispatched if needed — often hours or days later. That was slow and expensive.
<ul>
<li><strong>Action:</strong> The system scores each line and assigns a tier:</li>
<li><strong>Red:</strong> very high probability of a fault in the next 48 hours.</li>
<li><strong>Amber:</strong> elevated probability of a fault in the next 2–15 days.</li>
<li><strong>Green:</strong> no fault expected in the next 15 days.</li>
<li>A red or amber flag creates a high-priority ticket and dispatches a remote check or a technician, planned region by region.</li>
</ul></li>
<li><strong>Signals:</strong> DSL line metrics, error rates, device telemetry, event logs — a handful of streams, not every log.</li>
<li><strong>Feedback:</strong> Compare flagged incidents to customer complaints and adjust thresholds.</li>
</ul>
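<p>The tiering logic above can be sketched in a few lines. The thresholds here are illustrative placeholders, not the client’s actual values:</p>

```python
from collections import defaultdict

def triage(p_fault_48h, p_fault_15d):
    """Map fault probabilities to the Red/Amber/Green tiers.
    Thresholds are illustrative; in practice they come from the feedback loop."""
    if p_fault_48h >= 0.8:
        return "red"    # very likely fault within 48 hours
    if p_fault_15d >= 0.5:
        return "amber"  # elevated risk over the next 2-15 days
    return "green"      # no visible risk in the next 15 days

# Group non-green lines by region so dispatch follows a planned route.
lines = [("line-1", "north", 0.90, 0.95),
         ("line-2", "north", 0.10, 0.60),
         ("line-3", "south", 0.05, 0.20)]
by_region = defaultdict(list)
for line, region, p48, p15 in lines:
    tier = triage(p48, p15)
    if tier != "green":
        by_region[region].append((line, tier))
print(dict(by_region))  # {'north': [('line-1', 'red'), ('line-2', 'amber')]}
```

<p>Note that the model output only matters at the point where it crosses a threshold and changes a dispatch plan; everything else is noise to operations.</p>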
<p>Because the decision was so clear, we could measure value before full scale. The pilot cut detection time from days to hours, raised customer satisfaction significantly (we could warn customers about possible signal disruption early), and saved roughly £80K per month (by preventing complete DSL breakdowns, such as heat-related failures, and by planning routes instead of dispatching everywhere). It wasn’t an exotic model (or maybe it was; technical details some other day); it was a tightly scoped system that informed a clear action.</p>
<p>Notice how this maps to the <strong>Decision-First steps</strong>. The model existed to change a single operational choice. That focus made deployment possible and measurable fast.</p>
</section>
<section id="another-quick-example-inventory-decisions" class="level3">
<h3 class="anchored" data-anchor-id="another-quick-example-inventory-decisions">Another quick example: inventory decisions</h3>
<p>Inventory forecasting is a classic area where decision-first matters. You can chase lower MAPE and never change stocking policy. Or you can ask: what decision do merchandisers make with this forecast? When you frame it as “which SKUs do we order for next week, and what reorder points trigger expedited shipments,” you design the forecast differently: shorter horizons, bias for understock on fast movers, and direct constraints on reorder costs.</p>
<p>I’ve led projects that delivered $11M in inventory optimization by building forecasts and decision rules that match merchant behavior and supply constraints. The trick was not better models — it was framing forecasts so the merchandisers could act with confidence.</p>
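<p>One concrete way the decision framing changes the forecast is the classic newsvendor critical ratio: when understock costs more than overstock, you stock at a high quantile of demand rather than at the mean. A standard textbook sketch (not the actual project code; all numbers are illustrative):</p>

```python
def order_quantile(understock_cost, overstock_cost):
    """Newsvendor critical ratio: the demand quantile to stock at."""
    return understock_cost / (understock_cost + overstock_cost)

def reorder_point(mean_demand, std_demand, q):
    """Reorder point under a normal demand approximation (illustrative).
    A tiny z-score lookup stands in for a full inverse-CDF."""
    z = {0.5: 0.0, 0.8: 0.84, 0.9: 1.28, 0.95: 1.64}[round(q, 2)]
    return mean_demand + z * std_demand

# A fast mover: a stockout costs 8 per unit, excess stock costs 2 per unit.
q = order_quantile(understock_cost=8.0, overstock_cost=2.0)
print(q)                                                   # 0.8
print(reorder_point(mean_demand=100, std_demand=20, q=q))  # ~116.8
```

<p>The forecast itself didn’t get smarter; the decision frame told us which side of the error to bias toward.</p>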
<p><br></p>
</section>
</section>
<section id="practical-tips-for-teams-do-this-in-week-one" class="level2">
<h2 class="anchored" data-anchor-id="practical-tips-for-teams-do-this-in-week-one">Practical tips for teams (do this in week one)</h2>
<ol type="1">
<li><strong>Run a one-hour Decision Map workshop.</strong> Invite the decision owner, one operator, one engineer, and one product owner. Build the one-page map. If the owner can’t commit to a metric, pause the project.</li>
<li><strong>Start with a simple rule baseline.</strong> Before modeling, define a rule that will be your baseline (e.g., “if X &gt; T, create ticket”). If the model can’t beat that rule in decision impact, scrap it.</li>
<li><strong>Measure decision impact, not model accuracy.</strong> Your dashboard should show business metric delta — not just RMSE. If you show the board a change in cost or conversion, you’ll get attention.</li>
<li><strong>Prioritize deployment constraints.</strong> Decide telemetry, latency, and handoff requirements first. Models that can’t meet latency or trust constraints are useless no matter how accurate.</li>
<li><strong>Iterate with real feedback.</strong> Don’t wait for “perfect.” Ship an MVP that can be measured, then refine. Real decisions provide labels and operational learning.</li>
</ol>
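<p>Tips 2 and 3 work together: score both the rule baseline and the candidate model on the business metric, not on accuracy. A hedged sketch with made-up costs and outcomes:</p>

```python
def dispatch_cost(decisions, faults, truck_roll=100.0, missed_fault=600.0):
    """Business cost of dispatch decisions: each dispatch costs a truck roll;
    each missed real fault costs far more in reactive repair."""
    cost = 0.0
    for dispatched, was_fault in zip(decisions, faults):
        if dispatched:
            cost += truck_roll
        elif was_fault:
            cost += missed_fault
    return cost

faults          = [True, False, True, False, False, True]
rule_decisions  = [True, True, False, True, False, True]   # "if X > T, ticket"
model_decisions = [True, False, True, False, False, True]  # candidate model

baseline = dispatch_cost(rule_decisions, faults)
model = dispatch_cost(model_decisions, faults)
print(baseline, model)  # 1000.0 300.0
# Ship the model only if it beats the rule on cost, not on accuracy.
print("ship" if model < baseline else "keep the rule")
```

<p>This is the dashboard your board actually wants: the cost delta against the rule, in currency, not an AUC curve.</p>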
<p><a href="../../ai-profit-os.html">Learn how to exactly do this here.</a></p>
<section id="common-objections-and-how-to-handle-them" class="level3">
<h3 class="anchored" data-anchor-id="common-objections-and-how-to-handle-them">Common objections and how to handle them</h3>
<ol type="1">
<li><p><strong><em>“But we don’t have a clear decision owner.”</em></strong> Then don’t build a model. Decisions live in roles. Pull the right owner in early, or you’ll build for nobody.</p></li>
<li><p><strong><em>“Our data is messy.”</em></strong> Fine. If you can define the minimal signals, you can often create a proxy or start with manual labels. Messy data is easier to handle when you only need a few signals for a specific decision.</p></li>
<li><p><strong><em>“We need predictions for many uses.”</em></strong> Build a simple decision-first pilot first. Use its success to fund broader platform work. Pilots create proof that unlocks investment.</p></li>
<li><p><strong><em>“Decision-focused methods are academic — too hard.”</em></strong> There’s truth and myth here. The academic techniques show big wins when the decision loss can be written down. But you don’t need complicated differentiable optimization to start. Use a decision map, simple thresholds, A/B tests, and iterative measurement. Graduate to decision-focused training once you have a stable objective. The research just tells us — unsurprisingly — that when you train with the decision in mind, outcomes improve (Optimization Online).</p></li>
</ol>
<p><br></p>
</section>
</section>
<section id="one-page-checklist-copy-this" class="level2">
<h2 class="anchored" data-anchor-id="one-page-checklist-copy-this">One-page checklist (copy this)</h2>
<ul>
<li>Decision name (the go-to problem statement): ____________________</li>
<li>Primary / secondary metric (can be used for ROI calculation later): _____</li>
<li>Decision owner (I don’t want to debate this): ________________________</li>
<li>Frequency of predictions: real-time / hourly / daily / weekly / monthly</li>
<li>Action / intervention the model can trigger: ____________________</li>
<li>Minimal signals required (the core data to begin with): ______________________</li>
<li>Baseline rule (your fail-safe if everything goes wrong): ____________________</li>
<li>Feedback source (might be tricky, but you should have one): __________________</li>
</ul>
<p>If you can fill this in, you’re set for a pilot.</p>
<p><br></p>
</section>
<section id="final-note-start-small-measure-fast-then-scale-with-discipline" class="level2">
<h2 class="anchored" data-anchor-id="final-note-start-small-measure-fast-then-scale-with-discipline">Final note — <br>start small, measure fast, then scale with discipline</h2>
<p>The decision-first approach is simple because business problems are simple when stated well. The hard part is discipline: saying no to shiny demos and yes to measurable change. Start with one decision that matters. Map it. Ship a small system that changes behavior. Measure the business metric. Iterate.</p>
<p>Research supports this: models trained with the decision in mind perform better on the actual outcomes you care about (arXiv). And in practice, teams that flip the order — decision first, data second — get to value faster.</p>
<p>If you want help mapping a decision in your company, send me one line describing the decision and the current process. I’ll reply with the Decision Map template you can use in a one-hour workshop. Or, if you prefer, we can connect directly.</p>
<p>If this post helped you, or you think it could help someone, please share it. Thanks!</p>
<section id="key-sources-further-reading" class="level3">
<h3 class="anchored" data-anchor-id="key-sources-further-reading">Key sources &amp; further reading</h3>
<ul>
<li>Elmachtoub, A. N., &amp; Grigas, P. — “Smart ‘Predict, then Optimize’” (foundational paper on decision-focused loss and SPO), arXiv.</li>
<li>Reviews and recent work on decision-focused learning (predict-and-optimize / decision-focused methods), arXiv.</li>
<li>Harvard Business Review — “Decisions Don’t Start with Data” — on why framing the decision matters.</li>
</ul>
<p><strong>(And — the telecom and inventory examples referenced above are from <a href="../../../case-studies/index.qmd">projects</a>; <a href="https://cal.com/kriyalytics/strategy">let’s talk, if you want more details !!</a> )</strong></p>
</section>
</section>

 ]]></description>
  <category>strategy</category>
  <category>business</category>
  <guid>https://jitinkapila.com/writing/decision-first-ai/</guid>
  <pubDate>Wed, 08 Oct 2025 18:30:00 GMT</pubDate>
  <media:content url="https://jitinkapila.com/writing/decision-first-ai/panel.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>The AI Umbrella: A Simple Guide to What Actually Matters</title>
  <dc:creator>Jitin Kapila</dc:creator>
  <link>https://jitinkapila.com/writing/ai-umbrella/</link>
  <description><![CDATA[ 





<script type="application/ld+json">
[
  {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "The AI Umbrella: A Simple Guide to What Actually Matters",
    "description": "The AI Umbrella framework — how to pick the right AI tool for your business problem and avoid the hype of forcing GenAI onto every use case.",
    "image": "https://jitinkapila.com/assets/img/main-umbrella.webp",
    "author": {
      "@type": "Person",
      "name": "Jitin Kapila",
      "url": "https://jitinkapila.com/about",
      "jobTitle": "AI Strategy Consultant",
      "worksFor": {
        "@type": "Organization",
        "name": "Kriyalytics",
        "url": "https://kriyalytics.com"
      }
    },
    "publisher": {
      "@type": "Organization",
      "name": "Jitin Kapila",
      "logo": {
        "@type": "ImageObject",
        "url": "https://jitinkapila.com/assets/logo.png"
      }
    },
    "datePublished": "2025-09-25",
    "dateModified": "2025-09-25",
    "mainEntityOfPage": {
      "@type": "WebPage",
      "@id": "https://jitinkapila.com/blog/ai-umbrella"
    },
    "articleSection": "AI Strategy",
    "keywords": ["AI Umbrella", "GenAI", "machine learning", "graph algorithms", "optimization", "AI framework", "AI strategy", "tool selection", "enterprise AI"],
    "wordCount": 1400,
    "about": [
      {
        "@type": "Thing",
        "name": "AI Umbrella Framework",
        "description": "A framework categorizing AI tools into 5 pillars — ML, Graph, Optimization, Human-Machine Interface, and Legacy — to help leaders select the right tool for each business problem"
      }
    ]
  },
  {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
      {
        "@type": "Question",
        "name": "What is the AI Umbrella framework?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "The AI Umbrella is a categorization framework that maps all AI tools into 5 pillars: (1) Machine Learning — pattern recognition for prediction and classification; (2) Graph Algorithms — mapping relationships and networks; (3) Optimization & Planning — finding the best route, allocation, or schedule; (4) Human-Machine Interface — robotics and collaborative systems; and (5) Legacy/Foundational — expert systems and rule-based approaches now embedded in modern methods. The framework helps leaders match business problems to the right AI tool, rather than defaulting everything to GenAI."
        }
      },
      {
        "@type": "Question",
        "name": "Why is GenAI only 8.6% of the AI market?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "The total AI market is ~$235 billion. GenAI (ChatGPT-class language models) is $20.2 billion — just 8.6%. Traditional machine learning dominates at 91.4% ($194.6 billion) because it quietly runs the highest-ROI business applications: fraud detection, recommendation engines, supply chain optimization. GenAI is visible because it's consumer-facing, but enterprise value flows from traditional ML, which delivers up to 30% ROI versus GenAI's ~12%."
        }
      },
      {
        "@type": "Question",
        "name": "What ROI do different AI types deliver?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Predictive maintenance delivers up to 400% ROI in 6 months. Fraud detection delivers 150% ROI within 9 months. Marketing automation using ML achieves up to 544% ROI annually. Traditional ML projects complete in 3-6 months. GenAI averages 12% ROI over the same period and takes 3-12 months with more tuning. The data is clear: for most business problems, traditional ML outperforms GenAI on ROI and speed."
        }
      },
      {
        "@type": "Question",
        "name": "Why do 95% of GenAI pilots fail to show real ROI?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "GenAI pilots fail because they apply the wrong tool to the wrong problem. LLMs are built for language probability — summarizing, drafting, answering questions. When applied to mathematical optimization, inventory prediction, or fraud detection, they hallucinate or underperform. The failure pattern is consistent: a team uses GenAI for a problem that requires ML or optimization, then blames the data or the implementation. The problem was the tool selection, not the execution."
        }
      },
      {
        "@type": "Question",
        "name": "How do you use the AI Umbrella to avoid wrong tool selection?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Before starting any AI project, ask two questions: 'What problem am I trying to solve?' and 'Which part of the AI Umbrella fits?' If the answer is 'predict what will happen,' use ML. If it's 'find the best route or allocation,' use optimization. If it's 'find relationships in a network,' use graph algorithms. If it's 'summarize or generate text,' use GenAI. The mistake most companies make is using the AI that made headlines instead of the one that solves the actual problem."
        }
      }
    ]
  },
  {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "name": "Home",
        "item": "https://jitinkapila.com"
      },
      {
        "@type": "ListItem",
        "position": 2,
        "name": "Blog",
        "item": "https://jitinkapila.com/blog"
      },
      {
        "@type": "ListItem",
        "position": 3,
        "name": "AI Strategy",
        "item": "https://jitinkapila.com/blog/ai-strategy"
      },
      {
        "@type": "ListItem",
        "position": 4,
        "name": "AI Umbrella",
        "item": "https://jitinkapila.com/blog/strategy/ai-umbrella"
      }
    ]
  }
]
</script>

<p>AI is everywhere now, but most folks still see it as just ChatGPT or image generators. The truth is, AI covers much more ground. It runs in software, business processes, search engines, robotics, and more. In this post, here’s a plain view of the <strong>“AI Umbrella”</strong> — a way to see what’s really out there and how to use it to solve real problems, not just chase hype.</p>
<section id="what-is-the-ai-umbrella" class="level2">
<h2 class="anchored" data-anchor-id="what-is-the-ai-umbrella">What Is the AI Umbrella?</h2>
<p>AI is a big term. It wraps up everything from crunching numbers in Excel to self-driving cars and chatbots/AI Agents. This umbrella covers many areas, and each one is changing fast. Some ideas fade, some thrive, and others turn into new things. AI is not a magic trick — and it’s not new. For over sixty years, it’s helped businesses do work better, from simple data entry to complex automation.</p>
<section id="hype-vs.-reality" class="level4">
<h4 class="anchored" data-anchor-id="hype-vs.-reality">Hype vs.&nbsp;Reality</h4>
<p>Each cycle brings promises: “AI will change everything.” But those promises rarely deliver overnight. Just like Excel automated bookkeeping but didn’t end jobs, GenAI (like ChatGPT, Claude, Gemini) does not replace human work — it just shifts what we do. Jobs evolve. AI lets us do work faster, spot patterns, and solve tricky problems. But to really get value, businesses need a clear plan, not just headlines.</p>
</section>
</section>
<section id="the-ai-umbrella-framework" class="level2">
<h2 class="anchored" data-anchor-id="the-ai-umbrella-framework">The AI Umbrella Framework</h2>
<p>So how do you make AI work for real business needs? Use the “AI Umbrella” to map your problem and pick the right tool. Here’s a simple view:</p>
<p><img src="https://jitinkapila.com/writing/ai-umbrella/img/main-umbrella.webp" class="img-fluid" alt="AI Umbrella"> <!-- {.profile-image .w-50} --></p>
<p>It covers everything from classic number crunching to the newest breakthroughs. Think of AI like a big toolbox, not just one gadget. We’ll take a deeper dive into each pillar, with one small disclaimer: this is not an exhaustive list.</p>
<section id="machine-learning-ml" class="level3">
<h3 class="anchored" data-anchor-id="machine-learning-ml">Machine Learning (ML)</h3>
<p>Machine learning drives most <a href="../../case-studies/index.html">real business</a> value in AI today. It focuses on finding patterns in data to predict outcomes, classify information, or spot trends that humans might miss. Unlike the flashy GenAI tools, ML quietly runs recommendation engines, fraud detection systems, and supply chain optimization across industries. Here is a non-exhaustive view of Machine Learning (ML):</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="img/ml-umbrella.webp" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="ML Umbrella"><img src="https://jitinkapila.com/writing/ai-umbrella/img/ml-umbrella.webp" class="profile-image img-fluid figure-img" alt="ML Umbrella"></a></p>
<figcaption>ML Umbrella</figcaption>
</figure>
</div>
<p>Notice that GenAI sits at the bottom, while above it is the vast body of research that makes up a far bigger chunk of AI. Here are some nuggets for your thoughts:</p>
<ul>
<li><p><strong>Supervised learning</strong> delivers 25-30% ROI on average, with applications like fraud detection showing 150% ROI within 9 months. Marketing automation using ML achieves up to 544% ROI annually. This is all about finding answers with labeled data.</p></li>
<li><p><strong>Unsupervised methods</strong> excel at finding hidden patterns - clustering helps retail companies segment customers for personalized campaigns, while anomaly detection catches equipment failures before they happen. And this is all about spotting patterns in unlabeled data.</p></li>
</ul>
<p><strong>ROI Examples:</strong></p>
<ul>
<li><p><strong>Predictive maintenance:</strong> 400% ROI in 6 months by preventing costly equipment breakdowns</p></li>
<li><p><strong>Customer segmentation:</strong> E-commerce businesses see 27% increase in purchase likelihood through ML-powered personalization</p></li>
</ul>
</section>
<section id="graph-algorithms" class="level3">
<h3 class="anchored" data-anchor-id="graph-algorithms">Graph Algorithms</h3>
<p>Graph algorithms map connections between data points, revealing relationships that traditional databases can’t capture. Think of how Google’s PageRank revolutionized search by understanding link relationships, or how LinkedIn suggests connections by analyzing your professional network.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="img/graph-umbrella.webp" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="Graph Umbrella"><img src="https://jitinkapila.com/writing/ai-umbrella/img/graph-umbrella.webp" class="profile-image img-fluid figure-img" alt="Graph Umbrella"></a></p>
<figcaption>Graph Umbrella</figcaption>
</figure>
</div>
<p>Graph algorithms are used in a variety of ways:</p>
<ul>
<li><p><strong>Neo4j customers</strong> report 417% ROI over three years, with 20% improvement in business results and 60% faster time-to-value. Graph databases excel at real-time pattern recognition and complex relationship analysis.</p></li>
<li><p><strong>Financial services</strong> use graph algorithms for fraud detection by mapping transaction networks - one e-commerce company reduced fraud risk through real-time pattern detection of suspicious shipping routes.</p></li>
<li><p><strong>Maps:</strong> Google Maps uses Dijkstra’s and A* algorithms to find optimal routes through road networks, processing millions of nodes with real-time traffic data and ML predictions for dynamic rerouting.</p></li>
<li><p><strong>Telecom:</strong> BGP uses Bellman-Ford for inter-domain routing between networks, while OSPF runs Dijkstra’s algorithm within networks for shortest-path calculations and load balancing.</p></li>
</ul>
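<p>Since Dijkstra’s algorithm keeps coming up in these examples, here is a compact version on a toy road network (the graph and weights are made up for illustration):</p>

```python
import heapq

def dijkstra(graph, start, goal):
    """Shortest path by cumulative edge weight (e.g. travel time in minutes)."""
    queue = [(0, start, [start])]  # (cost so far, node, path taken)
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in seen:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []  # goal unreachable

roads = {
    "depot": [("A", 4), ("B", 2)],
    "B":     [("A", 1), ("C", 7)],
    "A":     [("C", 3)],
    "C":     [],
}
print(dijkstra(roads, "depot", "C"))  # (6, ['depot', 'B', 'A', 'C'])
```

<p>Production routing engines add real-time traffic, bidirectional search, and precomputed hierarchies, but the core idea is this priority-queue expansion.</p>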
<p><strong>ROI Examples:</strong></p>
<ul>
<li><p><strong>Fraud detection:</strong> Financial institutions prevent losses through network analysis, with ROI justified by avoided fraud costs</p></li>
<li><p><strong>Recommendation systems:</strong> Social platforms and e-commerce sites drive higher engagement through connection-based suggestions</p></li>
</ul>
</section>
<section id="optimization-planning" class="level3">
<h3 class="anchored" data-anchor-id="optimization-planning">Optimization &amp; Planning</h3>
<p>Optimization tackles the “best way” problems - finding the most efficient routes, optimal inventory levels, or ideal resource allocation. This field delivers some of the highest ROI because it directly cuts waste and improves efficiency in existing operations.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="img/optim-umbrella.webp" class="lightbox" data-gallery="quarto-lightbox-gallery-3" title="Optimization Umbrella"><img src="https://jitinkapila.com/writing/ai-umbrella/img/optim-umbrella.webp" class="profile-image img-fluid figure-img" alt="Optimization Umbrella"></a></p>
<figcaption>Optimization Umbrella</figcaption>
</figure>
</div>
<p><strong>ROI Examples:</strong></p>
<ul>
<li><p><strong>Route optimization</strong> shows 15-30% reduction in travel costs, with companies like cold drink bottlers cutting fuel costs by 12% while increasing shop coverage by 18%. CPG brands report 19% savings on field visit costs through AI-based routing.</p></li>
<li><p><strong>Supply chain optimization</strong> delivers 80% ROI within 12 months by reducing inventory costs and improving delivery performance. Manufacturing scheduling optimization can achieve 172% ROI through better resource utilization.</p></li>
<li><p><strong>Inventory management:</strong> 30% reduction in holding costs while improving on-time deliveries by 25%</p></li>
<li><p><strong>Production scheduling:</strong> Manufacturers see 20-30% cost reduction through optimized workflows</p></li>
</ul>
</section>
<section id="human-machine-interface" class="level3">
<h3 class="anchored" data-anchor-id="human-machine-interface">Human-Machine Interface</h3>
<p>This pillar focuses on how humans and machines work together - from industrial robotics to neural interfaces to the phone in your pocket to whatever the future makes possible. It’s about extending human capabilities rather than replacing them, creating hybrid systems that combine human judgment with machine precision.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="img/hmi-umbrella.webp" class="lightbox" data-gallery="quarto-lightbox-gallery-4" title="Human-Machine Interface"><img src="https://jitinkapila.com/writing/ai-umbrella/img/hmi-umbrella.webp" class="profile-image img-fluid figure-img" alt="Human-Machine Interface"></a></p>
<figcaption>Human-Machine Interface</figcaption>
</figure>
</div>
<p><strong>ROI Examples:</strong></p>
<ul>
<li><p><strong>Industrial robotics</strong> typically delivers ROI within 3-5 years, with autonomous mobile robots showing quick implementation and measurable labor savings. BMW reported 25% reduction in production time and 30% cut in operational costs within two years.</p></li>
<li><p><strong>Warehouse automation</strong> ranges from $5-15 million for semi-automated systems, with companies seeing 24/7 operation capabilities and reduced human error rates.</p></li>
<li><p><strong>Factory automation:</strong> 20% average productivity increase with significant operational cost reductions</p></li>
<li><p><strong>Customer service automation:</strong> 30% reduction in support costs with faster query resolution</p></li>
</ul>
</section>
<section id="faded-but-foundational-ai" class="level3">
<h3 class="anchored" data-anchor-id="faded-but-foundational-ai">Faded but Foundational AI</h3>
<p>Early AI systems like expert systems and rule-based engines laid the foundation for today’s AI. While many faded due to rigidity and high maintenance costs, their core principles live on in modern decision support systems and automated workflows.</p>
<p><strong>Some thoughts:</strong></p>
<ul>
<li><p>Expert systems peaked in the 1980s but declined due to scalability issues and inability to handle ambiguity. However, their legacy continues in clinical decision support systems and business automation tools.</p></li>
<li><p>Clinical decision support: Healthcare systems still use rule-based approaches combined with ML for diagnosis assistance</p></li>
<li><p>Business automation: Modern workflow systems trace back to expert system principles, delivering consistent decision-making in structured environments</p></li>
</ul>
<p>Older methods like rule-based systems, expert systems, and fuzzy logic now live inside more advanced techniques like tree-based ML and probabilistic models. <strong>Legacy system integration</strong> matters too: 95% of GenAI pilots fail to show real ROI, often because they ignore lessons learned from early AI implementations.</p>
</section>
</section>
<section id="the-market-numbers" class="level2">
<h2 class="anchored" data-anchor-id="the-market-numbers">The Market Numbers</h2>
<p>Here’s what the data says about the real business side of AI:</p>
<ul>
<li>The whole AI market is about $235 billion.</li>
<li>GenAI (think ChatGPT) is just 8.6% ($20.2 billion). The rest — traditional ML — is 91.4% ($194.6 billion).</li>
<li>Traditional AI gets better results: up to 30% ROI, while GenAI averages 12%.</li>
<li>Predictive maintenance delivers up to 400% ROI in six months. Fraud detection? 150% in under a year. LLMs do much less: about 12% over the same time.</li>
</ul>
<section id="what-makes-ai-projects-work" class="level3">
<h3 class="anchored" data-anchor-id="what-makes-ai-projects-work">What Makes AI Projects Work?</h3>
<ul>
<li>Using external tools yields a 67% success rate.</li>
<li>Internal builds succeed only 33% of the time.</li>
<li>Many companies spend most of their AI budget on sales and marketing use cases, but the real savings come from automating back-office work.</li>
<li>Traditional machine learning projects finish in 3-6 months. GenAI takes 3-12 months and needs more tuning.</li>
</ul>
</section>
</section>
<section id="putting-it-all-together" class="level2">
<h2 class="anchored" data-anchor-id="putting-it-all-together">Putting It All Together</h2>
<p>Before starting any AI project, ask: <a href="../../writing/decision-first-ai/index.html">What problem am I trying to solve?</a> Use the AI Umbrella map to find what fits. Is it prediction, classification, pattern finding, or optimization? Do you need machine learning, deep learning, or just smarter software? This clarity saves money and time, and it is what produces actual results.</p>
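<p>As a rough illustration, that triage step can be written down as a lookup. The mapping below is a hypothetical sketch — the categories and tool picks are illustrative, not an exhaustive taxonomy:</p>

```python
# Hypothetical sketch: map a problem type to its usual tool family.
# The categories and picks below are illustrative, not exhaustive.
TOOL_MAP = {
    "prediction": "regression / gradient-boosted trees",
    "classification": "classifiers (logistic regression, random forests)",
    "pattern finding": "clustering / anomaly detection",
    "optimization": "linear or constraint programming",
    "content generation": "GenAI / large language models",
}

def triage(problem_type: str) -> str:
    """Return the usual tool family, or flag the problem for restating."""
    return TOOL_MAP.get(problem_type.strip().lower(),
                        "unclear: restate the problem first")

print(triage("Classification"))  # classifiers (logistic regression, random forests)
```

<p>The point is the lookup itself: if you cannot name the row your problem belongs to, you are not ready to buy a tool for it.</p>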
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="img/ai-flow.webp" class="lightbox" data-gallery="quarto-lightbox-gallery-5" title="AI Decision Flow"><img src="https://jitinkapila.com/writing/ai-umbrella/img/ai-flow.webp" class="profile-image img-fluid figure-img" alt="AI Decision Flow"></a></p>
<figcaption>AI Decision Flow</figcaption>
</figure>
</div>
<p>Next time someone pitches an AI solution, ask “Which part of the AI umbrella are we using?” and “How does it solve our actual problem?” (or <a href="../../ai-profit-os.html">learn to ask the specific questions for your use case</a>). Ask for clarity, not hype, and use the framework above as your map. <strong>AI isn’t a buzzword — it’s a toolbox.</strong> The right tool solves the right problem. And that’s how you make AI work for your business, today and tomorrow.</p>
<p>This is just the start. I’m building an Enterprise AI series that shows how to combine these AI pillars for real business impact.</p>
<p>Up next: how one manufacturing company used ML, optimization, and robotics together to cut costs by 40%. <a href="https://aicrosscurrent.substack.com/">Subscribe to see the full breakdown</a> or <a href="../../work-with-me.html">book a strategic call</a>.</p>
<!-- ::: {.column-screen .bg-navy}

<br>




::: {.column-page}

### Subscribe to Weekly AI Decision Brief 

::: {.grid .column-page-inset .padded}

::: {.g-col-12 .g-col-md-6 style="margin: auto;"}
One sharp insight on making AI work for your business — every week. Frameworks from actual deployments. Case studies with real numbers. The questions your AI vendor hopes you never ask.

No hype. No vendor pitch. Written for operations leaders, not technology teams.
:::

::: {.g-col-0 .g-col-md-1}
:::

::: {.g-col-12 .g-col-md-5}


```{=html}
<form
  method="post"                                                                                                                                         action="https://systeme.io/embedded/39549717/subscription"
> 

    <label for="bd-first-name">First Name</label>
    <input
    class="form-control"
    placeholder="Your first name"
    type="text" name="first_name" id="bd-first-name" />

    <label for="bd-email">Email Address</label>                                                                                                           <input
    class="form-control"
    placeholder="your@email.here" 
    type="email" name="email" id="bd-email" />

    <input 
    class="btn btn-primary"
    style="margin-top: 1em;"
    type="submit" value="Subscribe" />
</form>
```
:::

:::

<br>

<br>

:::



<!-- <form
  method="post" 
  action="https://systeme.io/embedded/36566252/subscription"
>
  <label for="bd-email">Email Address</label>
  <input 
  class="form-control"
  placeholder="your@email.here" 
  type="email" name="email" id="bd-email" />
  
  <input 
  class="btn btn-primary"
  style="margin-top: 1em;"
  type="submit" value="Subscribe" />
</form>
 -->
</section>

]]></description>
  <category>strategy</category>
  <category>business</category>
  <guid>https://jitinkapila.com/writing/ai-umbrella/</guid>
  <pubDate>Wed, 24 Sep 2025 18:30:00 GMT</pubDate>
  <media:content url="https://jitinkapila.com/writing/ai-umbrella/img/AI-umbrealla-v2.png" medium="image" type="image/png" height="97" width="144"/>
</item>
<item>
  <title>TrieKNN: Unleashing KNN’s Power on Mixed Data Types</title>
  <dc:creator>Jitin Kapila</dc:creator>
  <link>https://jitinkapila.com/writing/engineering/10_treeknn/</link>
  <description><![CDATA[ 





<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="pexels-gelgas-401213.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="Photo by Gelgas Airlangga"><img src="https://jitinkapila.com/writing/engineering/10_treeknn/pexels-gelgas-401213.jpg" class="img-fluid figure-img" alt="Photo by Gelgas Airlangga"></a></p>
<figcaption><a href="https://www.pexels.com/photo/shallow-focus-of-sprout-401213/">Photo by Gelgas Airlangga</a></figcaption>
</figure>
</div>
<!-- [Photo by Anna Tarazevich](https://www.pexels.com/photo/strawberry-plant-on-a-black-container-7299985/)

[Photo by Eva Bronzini](https://www.pexels.com/photo/succulent-plants-in-pot-shaped-soil-7127801/) -->
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>In This Post
</div>
</div>
<div class="callout-body-container callout-body">
<ul>
<li>We’ll dissect the limitations of traditional KNN when faced with mixed data types.</li>
<li>Introduce TrieKNN, a Trie-based approach that elegantly handles mixed data.</li>
<li>Walk through the implementation and training of a TrieKNN model.</li>
<li>Evaluate its performance and discuss its potential impact.</li>
</ul>
</div>
</div>
<section id="the-allure-and-limitation-of-knn" class="level2">
<h2 class="anchored" data-anchor-id="the-allure-and-limitation-of-knn">The Allure and Limitation of KNN</h2>
<p>In the realm of machine learning, the K-Nearest Neighbors (KNN) algorithm stands out for its intuitive nature and ease of implementation. Its principle is simple: classify a data point based on the majority class among its ‘k’ nearest neighbors in the feature space. This non-parametric approach makes no assumptions about the underlying data distribution, rendering it versatile for various applications. That simplicity has made <a href="https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm">KNN</a> very popular, but it comes with limitations.</p>
<p>However, KNN’s Achilles’ heel lies in its reliance on distance metrics, which are inherently designed for numerical data. Real-world datasets often contain a mix of numerical and categorical features, posing a significant challenge for KNN. How do you measure the distance between ‘red’ and ‘blue,’ or ‘large’ and ‘small’?</p>
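<p>To make the limitation concrete, here is a minimal, pure-Python sketch: a plain Euclidean metric handles numeric features fine but is simply undefined once a categorical string appears in the vector.</p>

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance -- defined only for numeric features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean([1.0, 2.0], [4.0, 6.0]))  # 5.0

try:
    euclidean([1.0, "red"], [4.0, "blue"])  # 'red' - 'blue' has no meaning
except TypeError as exc:
    print(f"mixed features break the metric: {exc}")
```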
<section id="prior-art" class="level3">
<h3 class="anchored" data-anchor-id="prior-art">Prior Art</h3>
<p>Several strategies have been proposed to adapt KNN for mixed data:</p>
<ul>
<li><strong>One-Hot Encoding:</strong> Converts categorical features into numerical vectors, but can lead to high dimensionality.</li>
<li><strong>Distance Functions for Mixed Data:</strong> Develops and apply custom distance metrics that can handle both numerical and categorical features such as <a href="https://conservancy.umn.edu/server/api/core/bitstreams/845f587d-079a-469b-97e9-411533fa666d/content">HEOM and many others</a>.</li>
<li><strong>Using mean/mode values</strong>: Replace the missing values with mean/mode.</li>
</ul>
<p>These methods often involve compromises, either distorting the data’s inherent structure or adding computational overhead.</p>
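<p>A sketch of the second strategy shows where the compromise lives. This is a simplified HEOM-style metric (assumptions: numeric features are pre-scaled to [0, 1], and categorical features contribute a 0/1 “overlap” term); the real HEOM definition also handles missing values and range normalization.</p>

```python
def heom(a, b, is_categorical):
    """Simplified HEOM-style distance: 0/1 overlap for categorical
    features, absolute difference for numerics (assumed scaled to [0, 1])."""
    total = 0.0
    for x, y, cat in zip(a, b, is_categorical):
        d = (0.0 if x == y else 1.0) if cat else abs(x - y)
        total += d ** 2
    return total ** 0.5

# Two records shaped as (color, size_scaled):
print(round(heom(("red", 0.2), ("red", 0.5), (True, False)), 3))   # 0.3
print(heom(("red", 0.2), ("blue", 0.2), (True, False)))            # 1.0
```

<p>The compromise is visible in the output: every categorical mismatch costs exactly 1.0, regardless of how “far apart” the two categories really are.</p>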
</section>
</section>
<section id="enter-trieknn-a-novel-approach" class="level2">
<h2 class="anchored" data-anchor-id="enter-trieknn-a-novel-approach">Enter TrieKNN: A Novel Approach</h2>
<p>What if we could cleverly sidestep the distance calculation problem for categorical features, while still leveraging KNN’s power? TrieKNN offers just that—a way to perform KNN on any mixed data!</p>
<p>TrieKNN combines the strengths of Trie data structures and KNN to handle mixed data types gracefully. Here’s the core idea:</p>
<ol type="1">
<li><strong>Trie-Based Categorical Encoding:</strong> A Trie is used to store the categorical features of the data. Each node in the Trie represents a category.</li>
<li><strong>Leaf-Node KNN Models:</strong> At the leaf nodes of the Trie, where specific combinations of categorical features are found, we fit individual KNN models using only the numerical features.</li>
<li><strong>Weighted Prediction:</strong> To classify a new data point, we traverse the Trie based on its categorical features. At each level, we calculate a weighted distance based on available data, ending in a probability score in each leaf node.</li>
</ol>
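<p>Those three steps can be sketched end to end. This is a deliberately minimal version (a dict keyed by the categorical tuple stands in for the Trie, and the leaf “model” is a plain majority-vote KNN); the full implementation below adds per-node counts and weighting.</p>

```python
from collections import Counter, defaultdict

def fit_trie_knn(rows, labels):
    """Group rows by their categorical tuple (the 'Trie path');
    each leaf keeps only the numeric parts plus the labels."""
    leaves = defaultdict(list)
    for (cats, nums), y in zip(rows, labels):
        leaves[cats].append((nums, y))
    return leaves

def predict(leaves, cats, nums, k=3):
    """Walk to the leaf for `cats`, then majority-vote over the
    k numerically nearest neighbors stored there."""
    leaf = leaves.get(cats)
    if leaf is None:
        return None  # unseen categorical combination
    nearest = sorted(leaf, key=lambda item: sum((a - b) ** 2
                                                for a, b in zip(item[0], nums)))
    return Counter(y for _, y in nearest[:k]).most_common(1)[0][0]

rows = [(("red",), (1.0,)), (("red",), (1.2,)),
        (("red",), (5.0,)), (("blue",), (1.1,))]
labels = ["A", "A", "B", "B"]
leaves = fit_trie_knn(rows, labels)
print(predict(leaves, ("red",), (1.1,), k=2))  # A
```

<p>Notice that the categorical features never enter a distance computation at all — they only select which leaf’s numeric neighbors get compared.</p>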
<section id="why-this-works" class="level3">
<h3 class="anchored" data-anchor-id="why-this-works">Why This Works</h3>
<ul>
<li><strong>No Direct Distance Calculation for Categorical Features:</strong> The Trie structure implicitly captures the relationships between categorical values.</li>
<li><strong>Localized KNN Models:</strong> By fitting KNN models at the leaf nodes, we ensure that distance calculations are performed only on relevant numerical features.</li>
<li><strong>Scalability:</strong> The Trie structure efficiently handles a large number of categorical features and values.</li>
</ul>
</section>
</section>
<section id="building-a-trieknn-model" class="level2">
<h2 class="anchored" data-anchor-id="building-a-trieknn-model">Building a TrieKNN Model</h2>
<p>Let’s dive into the implementation. We’ll start with the <code>TrieNode</code> and <code>Trie</code> classes, then move on to the KNN model and the training/prediction process.</p>
<section id="trie-implementation" class="level3">
<h3 class="anchored" data-anchor-id="trie-implementation">Trie Implementation</h3>
<div id="1ded8041" class="cell" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> collections <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Counter</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> TrieNode:</span>
<span id="cb1-5">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>):</span>
<span id="cb1-6">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.children <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Dictionary to store child nodes</span></span>
<span id="cb1-7">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.is_end_of_word <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># True if the node is the end of a word</span></span>
<span id="cb1-8">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Count of how many times a word has been inserted</span></span>
<span id="cb1-9">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.class_counts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Class counts</span></span>
<span id="cb1-10">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.class_weights <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb1-11">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Model at leaf nodes</span></span>
<span id="cb1-12">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.indexes <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Store data indexes belonging to this leaf</span></span>
<span id="cb1-13">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.labels <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Store data indexes belonging to this leaf</span></span>
<span id="cb1-14">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.node_weight <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span>
<span id="cb1-15"></span>
<span id="cb1-16"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> Trie:</span>
<span id="cb1-17">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>):</span>
<span id="cb1-18">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.root <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> TrieNode()  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Root node of the Trie</span></span>
<span id="cb1-19">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.data_index <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Initialize data index</span></span>
<span id="cb1-20"></span>
<span id="cb1-21">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> insert(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, word_val, model):</span>
<span id="cb1-22">        current_node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.root</span>
<span id="cb1-23">        word, val <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> word_val</span>
<span id="cb1-24">        current_node.count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb1-25"></span>
<span id="cb1-26">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Adding class counts</span></span>
<span id="cb1-27">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> val <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> current_node.class_counts:</span>
<span id="cb1-28">            current_node.class_counts[val] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb1-29">        current_node.class_counts[val] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb1-30"></span>
<span id="cb1-31">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> char <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> word:</span>
<span id="cb1-32">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># If the character is not in children, add a new TrieNode</span></span>
<span id="cb1-33">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> char <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> current_node.children:</span>
<span id="cb1-34">                current_node.children[char] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> TrieNode()</span>
<span id="cb1-35">            current_node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> current_node.children[char]</span>
<span id="cb1-36"></span>
<span id="cb1-37">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Adding count of instances</span></span>
<span id="cb1-38">            current_node.count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb1-39"></span>
<span id="cb1-40">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># adding class counts</span></span>
<span id="cb1-41">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> val <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> current_node.class_counts:</span>
<span id="cb1-42">                current_node.class_counts[val] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb1-43">            current_node.class_counts[val] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb1-44"></span>
<span id="cb1-45">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Mark the end of the word and increment count</span></span>
<span id="cb1-46">        current_node.is_end_of_word <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb1-47">        current_node.indexes.append(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.data_index)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Store the data index</span></span>
<span id="cb1-48">        current_node.labels.append(val)</span>
<span id="cb1-49">        current_node.model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> model</span>
<span id="cb1-50">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.data_index <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Increment data index</span></span>
<span id="cb1-51"></span>
<span id="cb1-52">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> search(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, word):</span>
<span id="cb1-53">        current_node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.root</span>
<span id="cb1-54">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> char <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> word:</span>
<span id="cb1-55">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># If the character doesn't exist in the children, the word doesn't exist</span></span>
<span id="cb1-56">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> char <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> current_node.children:</span>
<span id="cb1-57">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span></span>
<span id="cb1-58">            current_node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> current_node.children[char]</span>
<span id="cb1-59"></span>
<span id="cb1-60">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Return True if it's the end of a word and the word exists</span></span>
<span id="cb1-61">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> current_node.is_end_of_word</span>
<span id="cb1-62"></span>
<span id="cb1-63">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> count_word(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, word):</span>
<span id="cb1-64">        current_node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.root</span>
<span id="cb1-65">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> char <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> word:</span>
<span id="cb1-66">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># If the character doesn't exist, the word doesn't exist</span></span>
<span id="cb1-67">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> char <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> current_node.children:</span>
<span id="cb1-68">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, current_node.class_counts  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Correctly return class_counts</span></span>
<span id="cb1-69">            current_node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> current_node.children[char]</span>
<span id="cb1-70"></span>
<span id="cb1-71">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Return the count of the word</span></span>
<span id="cb1-72">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> current_node.count, current_node.class_counts</span>
<span id="cb1-73"></span>
<span id="cb1-74">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> display(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>):</span>
<span id="cb1-75">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Recursively display the tree</span></span>
<span id="cb1-76">        <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> _display(node, word):</span>
<span id="cb1-77">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> node.is_end_of_word:</span>
<span id="cb1-78">                <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Data: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>word<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">, Count: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>node<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>count<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">, Indexes: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(node.indexes)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> Classes :</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>node<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>class_counts<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> weights:</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(node.class_weights)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Display indexes too</span></span>
<span id="cb1-79">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> char, child <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> node.children.items():</span>
<span id="cb1-80">                _display(child, word <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> char)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># corrected the display</span></span>
<span id="cb1-81"></span>
<span id="cb1-82">        _display(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.root, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span>
<span id="cb1-83"></span>
<span id="cb1-84">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">apply</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, func):</span>
<span id="cb1-85">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb1-86"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">        Applies a function to all models in the leaf nodes.</span></span>
<span id="cb1-87"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">        """</span></span>
<span id="cb1-88">        <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> _apply(node):</span>
<span id="cb1-89">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> node.is_end_of_word <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> node.model <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb1-90">                func(node)</span>
<span id="cb1-91">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> child <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> node.children.values():</span>
<span id="cb1-92">                _apply(child)</span>
<span id="cb1-93"></span>
<span id="cb1-94">        _apply(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.root)</span>
<span id="cb1-95"></span>
<span id="cb1-96">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> apply_weight_to_indexes(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, weight):</span>
<span id="cb1-97">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb1-98"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">        Applies a weight to the indexes based on the percentage of data available.</span></span>
<span id="cb1-99"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">        """</span></span>
<span id="cb1-100">        <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> _apply_weight_to_indexes(node):</span>
<span id="cb1-101">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> node.is_end_of_word:</span>
<span id="cb1-102">                total_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.root.children[child].count <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> child <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.root.children)</span>
<span id="cb1-103">                percentage <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> node.count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> total_count <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> total_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb1-104">                weighted_indexes <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [(index, weight <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> percentage) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> index <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> node.indexes]</span>
<span id="cb1-105">                node.class_weights <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> weighted_indexes  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># store (index, weight) pairs on the node</span></span>
<span id="cb1-106">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> child <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> node.children.values():</span>
<span id="cb1-107">                _apply_weight_to_indexes(child)</span>
<span id="cb1-108"></span>
<span id="cb1-109">        _apply_weight_to_indexes(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.root)</span></code></pre></div></div>
</details>
</div>
</section>
<section id="knn-model" class="level3">
<h3 class="anchored" data-anchor-id="knn-model">KNN Model</h3>
<div id="a15022ff" class="cell" data-execution_count="3">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> KNNModel:</span>
<span id="cb2-2">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, k<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>):</span>
<span id="cb2-3">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span>
<span id="cb2-4">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.labels <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb2-5">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.k <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> k</span>
<span id="cb2-6"></span>
<span id="cb2-7">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> fit(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, data, indexes, labels):</span>
<span id="cb2-8">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># print("Fitting model with indexes:", len(indexes), "labels:", len(labels))</span></span>
<span id="cb2-9">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> data[indexes].astype(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>)</span>
<span id="cb2-10">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.labels <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array(labels).astype(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>)</span>
<span id="cb2-11"></span>
<span id="cb2-12">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> predict(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, data):</span>
<span id="cb2-13">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># print("Predicting with data:", data)</span></span>
<span id="cb2-14">        dist_ind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.sqrt(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>((<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> data) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, axis<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># euclidean distance</span></span>
<span id="cb2-15">        main_arr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.column_stack((<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.labels, dist_ind))  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># labels with distance</span></span>
<span id="cb2-16">        main <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> main_arr[main_arr[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>].argsort()]  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># sorting based on distance</span></span>
<span id="cb2-17">        count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Counter(main[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.k, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># counting labels</span></span>
<span id="cb2-18">        sums <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(count.values()))  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># getting counts</span></span>
<span id="cb2-19">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> sums <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(sums)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># prediction as probability</span></span></code></pre></div></div>
</details>
</div>
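<p>Stripped of the class scaffolding, the prediction path is just distance, sort, and vote. A minimal standalone walkthrough of those steps (toy numbers of my own, not from the notebook):</p>

```python
import numpy as np
from collections import Counter

# Toy numeric block and labels, mirroring the steps inside KNNModel.predict
train = np.array([[3.0, 5.0], [2.9, 5.1], [7.0, 1.0], [6.8, 0.9], [3.1, 4.9]])
labels = np.array([0, 0, 1, 1, 0])
query = np.array([3.0, 5.0])
k = 3

dist = np.sqrt(np.sum((train - query) ** 2, axis=1))  # euclidean distances
nearest = labels[dist.argsort()][:k]                  # labels of the k closest rows
votes = Counter(nearest)                              # class counts among neighbours
probs = {c: v / k for c, v in votes.items()}          # vote shares as probabilities
```

<p>Here the three nearest rows all carry label 0, so the vote share for class 0 is 1.0; note that because ranking by squared distance and by distance give the same order, the <code>np.sqrt</code> only matters if you report the distances themselves.</p>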
</section>
<section id="training-and-evaluation" class="level3">
<h3 class="anchored" data-anchor-id="training-and-evaluation">Training and Evaluation</h3>
<p>Here’s how we train and evaluate the TrieKNN model:</p>
<div id="ce2ea5e7" class="cell" data-execution_count="4">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sample data</span></span>
<span id="cb3-2">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span></span>
<span id="cb3-3">data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array((np.random.choice([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Anything '</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'By '</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Chance '</span>], p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>],size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>n),</span>
<span id="cb3-4">                 np.random.choice([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'can'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'go'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'here'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'lets'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'see'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"it"</span>], p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.4</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>], size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>n),</span>
<span id="cb3-5">                 np.random.normal(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>n),</span>
<span id="cb3-6">                 np.random.normal(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>n))).T</span>
<span id="cb3-7">y_label <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.choice([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>], size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>n)</span>
<span id="cb3-8"></span>
<span id="cb3-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Trie training</span></span>
<span id="cb3-10">trie <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Trie()</span>
<span id="cb3-11"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> X, y <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(data, y_label):</span>
<span id="cb3-12">    trie.insert((X[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>], y),<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>)</span>
<span id="cb3-13"></span>
<span id="cb3-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Apply weights to indexes</span></span>
<span id="cb3-15">trie.apply_weight_to_indexes(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb3-16"></span>
<span id="cb3-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit models of leaf nodes</span></span>
<span id="cb3-18"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> add_model(node, data):</span>
<span id="cb3-19">    node.model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> KNNModel()</span>
<span id="cb3-20">    node.model.fit(data, node.indexes, node.labels)</span>
<span id="cb3-21"></span>
<span id="cb3-22"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> traverse_and_add_model(node, data):</span>
<span id="cb3-23">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> node.is_end_of_word:</span>
<span id="cb3-24">        add_model(node, data)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add model to leaf node</span></span>
<span id="cb3-25">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> child <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> node.children.values():</span>
<span id="cb3-26">        traverse_and_add_model(child, data)</span>
<span id="cb3-27"></span>
<span id="cb3-28">traverse_and_add_model(trie.root, data[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>:])</span></code></pre></div></div>
</details>
</div>
</section>
<section id="explanation" class="level3">
<h3 class="anchored" data-anchor-id="explanation">Explanation</h3>
<ul>
<li>We create sample data with mixed categorical and numerical features.</li>
<li>We insert each data point into the Trie, using the categorical features as the path.</li>
<li>After the Trie is built, we traverse it and fit a KNN model to the data points stored at each leaf node.</li>
<li>Finally, we can predict the class of new data points by traversing the Trie and using the KNN model at the corresponding leaf node.</li>
</ul>
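<p>The whole flow above can be condensed into a self-contained sketch. This is a simplified stand-in, not the notebook's implementation: it keys leaves on whole categorical tuples instead of building a character-level Trie, and the name <code>TrieKNNSketch</code> is illustrative.</p>

```python
import numpy as np

class TrieKNNSketch:
    """Route rows by their categorical features, then run KNN on numeric ones."""

    def __init__(self, k=5):
        self.k = k
        self.leaves = {}   # categorical path (tuple) -> list of row indexes
        self.num = None
        self.labels = None

    def fit(self, cat, num, y):
        self.num = np.asarray(num, dtype=float)
        self.labels = np.asarray(y)
        for i, path in enumerate(map(tuple, cat)):
            self.leaves.setdefault(path, []).append(i)

    def predict_proba(self, cat_row, num_row, classes):
        idx = self.leaves.get(tuple(cat_row), [])
        if not idx:  # unseen categorical path: fall back to a uniform guess
            return np.full(len(classes), 1.0 / len(classes))
        d = np.sum((self.num[idx] - np.asarray(num_row, float)) ** 2, axis=1)
        top = self.labels[np.asarray(idx)[np.argsort(d)[: self.k]]]
        counts = np.array([(top == c).sum() for c in classes], dtype=float)
        return counts / counts.sum()
```

<p>Keying on whole tuples sidesteps prefix collisions between category strings, at the cost of the prefix-sharing (and level-wise weighting) the real Trie provides.</p>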
</section>
</section>
<section id="results-and-discussion" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="results-and-discussion">Results and Discussion</h2>
<p>Let us display the trie.</p>

<div class="no-row-height column-margin column-container"><div class="">
<div id="a01dcaab" class="cell" data-execution_count="5">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb4-1">trie.display()</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Data: Anything lets, Count: 1260, Indexes: 1260 Classes :{np.int64(0): 882, np.int64(1): 378} weights:1260
Data: Anything see, Count: 2364, Indexes: 2364 Classes :{np.int64(0): 1710, np.int64(1): 654} weights:2364
Data: Anything go, Count: 616, Indexes: 616 Classes :{np.int64(0): 425, np.int64(1): 191} weights:616
Data: Anything can, Count: 606, Indexes: 606 Classes :{np.int64(0): 401, np.int64(1): 205} weights:606
Data: Anything it, Count: 584, Indexes: 584 Classes :{np.int64(0): 416, np.int64(1): 168} weights:584
Data: Anything here, Count: 619, Indexes: 619 Classes :{np.int64(1): 170, np.int64(0): 449} weights:619
Data: Chance see, Count: 1170, Indexes: 1170 Classes :{np.int64(1): 376, np.int64(0): 794} weights:1170
Data: Chance here, Count: 334, Indexes: 334 Classes :{np.int64(0): 232, np.int64(1): 102} weights:334
Data: Chance lets, Count: 562, Indexes: 562 Classes :{np.int64(0): 387, np.int64(1): 175} weights:562
Data: Chance it, Count: 270, Indexes: 270 Classes :{np.int64(0): 193, np.int64(1): 77} weights:270
Data: Chance can, Count: 291, Indexes: 291 Classes :{np.int64(0): 195, np.int64(1): 96} weights:291
Data: Chance go, Count: 310, Indexes: 310 Classes :{np.int64(0): 229, np.int64(1): 81} weights:310
Data: By can, Count: 99, Indexes: 99 Classes :{np.int64(0): 64, np.int64(1): 35} weights:99
Data: By lets, Count: 210, Indexes: 210 Classes :{np.int64(0): 138, np.int64(1): 72} weights:210
Data: By see, Count: 385, Indexes: 385 Classes :{np.int64(1): 109, np.int64(0): 276} weights:385
Data: By go, Count: 115, Indexes: 115 Classes :{np.int64(1): 31, np.int64(0): 84} weights:115
Data: By here, Count: 104, Indexes: 104 Classes :{np.int64(0): 73, np.int64(1): 31} weights:104
Data: By it, Count: 101, Indexes: 101 Classes :{np.int64(0): 73, np.int64(1): 28} weights:101</code></pre>
</div>
</div>
</div></div><p>The model predicted the following values:</p>

<div class="no-row-height column-margin column-container"><div class="">
<div id="e3579960" class="cell" data-execution_count="6">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Prediction example</span></span>
<span id="cb6-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> predict_with_model(node):</span>
<span id="cb6-3">    predictions <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> node.model.predict(np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>]))</span>
<span id="cb6-4">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Predictions:"</span>, predictions)</span>
<span id="cb6-5"></span>
<span id="cb6-6">trie.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">apply</span>(predict_with_model)</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Predictions: [1.]
Predictions: [0.8 0.2]
Predictions: [1.]
Predictions: [0.6 0.4]
Predictions: [0.6 0.4]
Predictions: [0.6 0.4]
Predictions: [0.8 0.2]
Predictions: [0.6 0.4]
Predictions: [0.8 0.2]
Predictions: [0.2 0.8]
Predictions: [0.2 0.8]
Predictions: [0.2 0.8]
Predictions: [0.2 0.8]
Predictions: [0.8 0.2]
Predictions: [0.2 0.8]
Predictions: [1.]
Predictions: [0.8 0.2]
Predictions: [0.8 0.2]</code></pre>
</div>
</div>
</div></div><p>The predictions vary from run to run because the sample data is random. Even so, they show that KNN can handle mixed data types: the categorical features route each point to a leaf, and the numerical features drive a local KNN vote there.</p>
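<p>One caveat worth flagging: <code>Counter.values()</code> in <code>KNNModel.predict</code> yields counts in first-seen order, so a vector like <code>[0.8 0.2]</code> does not say which class received 0.8, and leaves where only one class appears among the top-k collapse to <code>[1.]</code>. A small fix (my suggestion, not part of the code above; the <code>classes</code> argument is assumed) pins probabilities to a fixed class order:</p>

```python
from collections import Counter
import numpy as np

def class_probabilities(top_k_labels, classes):
    """Return neighbour-vote probabilities aligned to a fixed class order."""
    counts = Counter(top_k_labels)
    probs = np.array([counts.get(c, 0) for c in classes], dtype=float)
    return probs / probs.sum()

# Example: top-3 neighbours vote [0, 0, 1] over classes (0, 1)
print(class_probabilities([0, 0, 1], (0, 1)))  # -> [0.66666667 0.33333333]
```

<p>With this, every leaf reports a same-length vector whose positions always mean the same classes.</p>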
</section>
<section id="conclusion-a-promising-path-forward" class="level2">
<h2 class="anchored" data-anchor-id="conclusion-a-promising-path-forward">Conclusion: A Promising Path Forward</h2>
<p>TrieKNN presents a compelling solution for extending the applicability of KNN to mixed data types. By leveraging the Trie data structure, it avoids direct distance calculations on categorical features, enabling the use of localized KNN models for numerical data.</p>
<p>Further research could explore:</p>
<ul>
<li>Optimizing the weighting scheme for combining predictions from different Trie levels.</li>
<li>Comparing TrieKNN’s performance against other mixed-data KNN approaches on benchmark datasets.</li>
<li>Extending TrieKNN to handle missing data and noisy categorical features.</li>
</ul>
<p>TrieKNN opens up new possibilities for applying KNN in domains where mixed data types are prevalent, such as healthcare, e-commerce, and social science.</p>
<p>Resources and further reads:<br>
1. <a href="https://cran.r-project.org/web/packages/nomclust/nomclust.pdf">Nomclust R package</a><br>
2. <a href="https://ieeexplore.ieee.org/abstract/document/8337394">An Improved kNN Based on Class Contribution and Feature Weighting</a><br>
3. <a href="https://ieeexplore.ieee.org/abstract/document/8780580">An Improved Weighted KNN Algorithm for Imbalanced Data Classification</a><br>
4. <a href="https://ieeexplore.ieee.org/abstract/document/6718270">A weighting approach for KNN classifier</a><br>
5. <a href="https://ieeexplore.ieee.org/abstract/document/9739702">Unsupervised Outlier Detection for Mixed-Valued Dataset Based on the Adaptive k-Nearest Neighbor Global Network</a><br>
6. <a href="https://journalskuwait.org/kjs/index.php/KJS/article/download/18331/1253">A hybrid approach based on k-nearest neighbors and decision tree for software fault prediction</a><br>
7. <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7173366/">Analysis of Decision Tree and K-Nearest Neighbor Algorithm in the Classification of Breast Cancer</a></p>


</section>

<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>technical</category>
  <category>ml</category>
  <guid>https://jitinkapila.com/writing/engineering/10_treeknn/</guid>
  <pubDate>Tue, 25 Feb 2025 18:30:00 GMT</pubDate>
  <media:content url="https://jitinkapila.com/writing/engineering/10_treeknn/pexels-anntarazevich-7299985.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>CrossTab Sparsity for Classification</title>
  <dc:creator>Jitin Kapila</dc:creator>
  <link>https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/</link>
  <description><![CDATA[ 





<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="crossroad-unsplash-thumb.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="Cross Roads where everyone meets!"><img src="https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/crossroad-unsplash-thumb.jpg" class="w-100 img-fluid figure-img" alt="Cross Roads where everyone meets!"></a></p>
<figcaption>Cross Roads where everyone meets!</figcaption>
</figure>
</div>
<section id="introduction-a-journey-into-data" class="level3">
<h3 class="anchored" data-anchor-id="introduction-a-journey-into-data">Introduction: A Journey into Data</h3>
<p>Picture this: you’re standing on the icy shores of Antarctica, the wind whipping around you as you watch a colony of Palmer Penguins waddling about, oblivious to the data detective work you’re about to embark on. As a data science architect, you’re not just an observer; you’re a sleuth armed with algorithms and insights, ready to unravel the mysteries hidden within data. Today, we’ll transform raw numbers into powerful narratives using CrossTab Sparsity as our guiding compass. This blog post will demonstrate how this metric can sharpen classification tasks, shedding light on several fascinating datasets: the charming Palmer Penguins alongside weightier obesity and credit-card data.</p>
</section>
<section id="the-power-of-crosstab-sparsity" class="level3">
<h3 class="anchored" data-anchor-id="the-power-of-crosstab-sparsity">The Power of CrossTab Sparsity</h3>
<section id="what-is-crosstab-sparsity" class="level4">
<h4 class="anchored" data-anchor-id="what-is-crosstab-sparsity">What is CrossTab Sparsity?</h4>
<p>CrossTab Sparsity isn’t just a fancy term that sounds good at dinner parties; it’s a statistical measure that helps us peer into the intricate relationships between categorical variables. Imagine it as a magnifying glass that reveals how different categories interact within a contingency table. Understanding these interactions is crucial in classification tasks, where the right features can make or break your model (and your day).</p>
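<p>The post defines the metric informally, but one concrete reading (an assumption on my part, not the author's formal definition) is the fraction of empty cells in the contingency table of a feature against the target:</p>

```python
import pandas as pd

def crosstab_sparsity(feature, target):
    """Fraction of zero cells in the feature-vs-target contingency table."""
    ct = pd.crosstab(pd.Series(feature), pd.Series(target))
    return float((ct.to_numpy() == 0).mean())

# 'y' never co-occurs with 'q': one empty cell out of four -> 0.25
print(crosstab_sparsity(['x', 'x', 'y', 'y'], ['p', 'q', 'p', 'p']))  # -> 0.25
```

<p>A high value means many category combinations never occur together, which is exactly the kind of signal (or red flag) you want surfaced before feature selection.</p>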
<p><strong>Why Does It Matter?</strong></p>
<p>In the world of data science, especially in classification, selecting relevant features is like picking the right ingredients for a gourmet meal—get it wrong, and you might end up with something unpalatable. CrossTab Sparsity helps us achieve this by:</p>
<ul>
<li>Highlighting Relationships: It’s like having a friend who always points out when two people are meant to be together—understanding how features interact with the target variable.</li>
<li>Streamlining Models: Reducing complexity by focusing on significant features means less time spent untangling spaghetti code.</li>
<li>Enhancing Interpretability: Making models easier to understand and explain to stakeholders is like translating tech jargon into plain English—everyone appreciates that!</li>
</ul>
</section>
</section>
<section id="data-overview-our-data-people-at-work-here" class="level3">
<h3 class="anchored" data-anchor-id="data-overview-our-data-people-at-work-here">Data Overview: Our Data at Work</h3>
<section id="the-datasets" class="level4">
<h4 class="anchored" data-anchor-id="the-datasets">The Datasets</h4>
<p>Data 1: <a href="https://archive.ics.uci.edu/dataset/544/estimation+of+obesity+levels+based+on+eating+habits+and+physical+condition">Estimation of Obesity Levels Based On Eating Habits and Physical Condition</a></p>
<p><em>A little about the data:</em> This dataset, shared on 8/26/2019, looks at obesity levels in people from Mexico, Peru, and Colombia based on their eating habits and physical health. It includes 2,111 records with 16 features, and classifies individuals into different obesity levels, from insufficient weight to obesity type III. Most of the records (77%) were generated synthetically with a tool, while the rest (23%) were collected directly from users online.</p>
<p>Data 2: <a href="https://archive.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success">Predict Students’ Dropout and Academic Success</a></p>
<p><em>A little about the data:</em> This dataset, shared on 12/12/2021, looks at factors like students’ backgrounds, academic paths, and socio-economic status to predict whether they’ll drop out or succeed in their studies. With 4,424 records across 36 features, it covers students from different undergrad programs. The goal is to use machine learning to spot at-risk students early so schools can offer support. The data has been cleaned and has no missing values. It’s a classification task with three outcomes: dropout, still enrolled, or graduated.</p>
<p><strong>Key Features</strong>:</p>
<ul>
<li>Multiclass: Both datasets pose multiclass problems, with <code>NObeyesdad</code> and <code>Target</code> as the respective target columns.</li>
<li>Mixed Data Types: A good mix of categorical and continuous variables is available for use.</li>
<li>Sizeable: More than 2,000 rows are available for testing.</li>
</ul>
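<p>These properties can be checked mechanically once a dataset is loaded. A minimal sketch on a toy frame (column names borrowed from the obesity data; the frame itself is illustrative, not the real download):</p>

```python
import pandas as pd

# Tiny stand-in for the real dataset, just to demonstrate the checks.
df = pd.DataFrame({
    "Age": [21.0, 23.0, 27.0],
    "Gender": ["Female", "Male", "Male"],
    "NObeyesdad": ["Normal_Weight", "Normal_Weight", "Overweight_Level_I"],
})
target = "NObeyesdad"

print("classes:", df[target].nunique())                                # 2
print("categorical:", len(df.select_dtypes(include="object").columns)) # 2
print("numeric:", len(df.select_dtypes(include="number").columns))     # 1
print("rows:", len(df))                                                # 3
```

<p>On the real data the same three lines confirm the multiclass target, the categorical/continuous mix, and the row count before any modeling starts.</p>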
</section>
</section>
<section id="exploratory-data-analysis-eda-setting-the-stage" class="level3 page-columns page-full">
<h3 class="anchored" data-anchor-id="exploratory-data-analysis-eda-setting-the-stage">Exploratory Data Analysis (EDA): Setting the Stage</h3>
<p>Before we dive into model creation, let’s explore our dataset through some quick EDA. Think of this as getting to know your non-obese friends before inviting them to a party.</p>
<section id="eda-for-obesity-data" class="level4 page-columns page-full">
<h4 class="anchored" data-anchor-id="eda-for-obesity-data">EDA for Obesity Data</h4>
<p>Here’s a brief code snippet to perform essential EDA on the Obesity dataset:</p>
<div id="1eff4574" class="cell" data-execution_count="3">
<details class="code-fold">
<summary>Loading data and generating basic descriptives</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load the Obesity data</span></span>
<span id="cb1-2">raw_df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ObesityDataSet_raw_and_data_sinthetic.csv'</span>)</span>
<span id="cb1-3">target <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'NObeyesdad'</span></span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load Students data</span></span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load Credit data</span></span>
<span id="cb1-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># raw_data = sm.datasets.get_rdataset("credit_data",'modeldata')</span></span>
<span id="cb1-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># raw_df = raw_data.data</span></span>
<span id="cb1-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># target = 'Status'</span></span>
<span id="cb1-11"></span>
<span id="cb1-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># # Load Palmer penguins data</span></span>
<span id="cb1-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># raw_data = sm.datasets.get_rdataset("penguins",'palmerpenguins')</span></span>
<span id="cb1-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># raw_df = raw_data.data</span></span>
<span id="cb1-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># target = 'species'</span></span>
<span id="cb1-16"></span>
<span id="cb1-17"></span>
<span id="cb1-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># # Load Credit data</span></span>
<span id="cb1-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># raw_data = sm.datasets.get_rdataset("CreditCard",'AER')</span></span>
<span id="cb1-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># raw_df = raw_data.data</span></span>
<span id="cb1-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># target = 'card'</span></span>
<span id="cb1-22"></span>
<span id="cb1-23"></span>
<span id="cb1-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># setting things up for all the next steps</span></span>
<span id="cb1-25">raw_df[target] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> raw_df[target].astype(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'category'</span>) </span>
<span id="cb1-26"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'No of data points available to work:'</span>,raw_df.shape)</span>
<span id="cb1-27">display(raw_df.head())</span>
<span id="cb1-28"></span>
<span id="cb1-29"></span>
<span id="cb1-30"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Summary statistics</span></span>
<span id="cb1-31">display(raw_df.describe())</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>No of data points available to work: (2111, 17)</code></pre>
</div>
<div class="cell-output cell-output-display">
<div>


<table class="dataframe table table-sm table-striped small" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">Gender</th>
<th data-quarto-table-cell-role="th">Age</th>
<th data-quarto-table-cell-role="th">Height</th>
<th data-quarto-table-cell-role="th">Weight</th>
<th data-quarto-table-cell-role="th">Famil_Hist_Owt</th>
<th data-quarto-table-cell-role="th">FAVC</th>
<th data-quarto-table-cell-role="th">FCVC</th>
<th data-quarto-table-cell-role="th">NCP</th>
<th data-quarto-table-cell-role="th">CAEC</th>
<th data-quarto-table-cell-role="th">SMOKE</th>
<th data-quarto-table-cell-role="th">CH2O</th>
<th data-quarto-table-cell-role="th">SCC</th>
<th data-quarto-table-cell-role="th">FAF</th>
<th data-quarto-table-cell-role="th">TUE</th>
<th data-quarto-table-cell-role="th">CALC</th>
<th data-quarto-table-cell-role="th">MTRANS</th>
<th data-quarto-table-cell-role="th">NObeyesdad</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<th data-quarto-table-cell-role="th">0</th>
<td>Female</td>
<td>21.0</td>
<td>1.62</td>
<td>64.0</td>
<td>yes</td>
<td>no</td>
<td>2.0</td>
<td>3.0</td>
<td>Sometimes</td>
<td>no</td>
<td>2.0</td>
<td>no</td>
<td>0.0</td>
<td>1.0</td>
<td>no</td>
<td>Public_Transportation</td>
<td>Normal_Weight</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">1</th>
<td>Female</td>
<td>21.0</td>
<td>1.52</td>
<td>56.0</td>
<td>yes</td>
<td>no</td>
<td>3.0</td>
<td>3.0</td>
<td>Sometimes</td>
<td>yes</td>
<td>3.0</td>
<td>yes</td>
<td>3.0</td>
<td>0.0</td>
<td>Sometimes</td>
<td>Public_Transportation</td>
<td>Normal_Weight</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">2</th>
<td>Male</td>
<td>23.0</td>
<td>1.80</td>
<td>77.0</td>
<td>yes</td>
<td>no</td>
<td>2.0</td>
<td>3.0</td>
<td>Sometimes</td>
<td>no</td>
<td>2.0</td>
<td>no</td>
<td>2.0</td>
<td>1.0</td>
<td>Frequently</td>
<td>Public_Transportation</td>
<td>Normal_Weight</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">3</th>
<td>Male</td>
<td>27.0</td>
<td>1.80</td>
<td>87.0</td>
<td>no</td>
<td>no</td>
<td>3.0</td>
<td>3.0</td>
<td>Sometimes</td>
<td>no</td>
<td>2.0</td>
<td>no</td>
<td>2.0</td>
<td>0.0</td>
<td>Frequently</td>
<td>Walking</td>
<td>Overweight_Level_I</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">4</th>
<td>Male</td>
<td>22.0</td>
<td>1.78</td>
<td>89.8</td>
<td>no</td>
<td>no</td>
<td>2.0</td>
<td>1.0</td>
<td>Sometimes</td>
<td>no</td>
<td>2.0</td>
<td>no</td>
<td>0.0</td>
<td>0.0</td>
<td>Sometimes</td>
<td>Public_Transportation</td>
<td>Overweight_Level_II</td>
</tr>
</tbody>
</table>

</div>
</div>
<div class="cell-output cell-output-display">
<div>


<table class="dataframe table table-sm table-striped small" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">Age</th>
<th data-quarto-table-cell-role="th">Height</th>
<th data-quarto-table-cell-role="th">Weight</th>
<th data-quarto-table-cell-role="th">FCVC</th>
<th data-quarto-table-cell-role="th">NCP</th>
<th data-quarto-table-cell-role="th">CH2O</th>
<th data-quarto-table-cell-role="th">FAF</th>
<th data-quarto-table-cell-role="th">TUE</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<th data-quarto-table-cell-role="th">count</th>
<td>2111.000000</td>
<td>2111.000000</td>
<td>2111.000000</td>
<td>2111.000000</td>
<td>2111.000000</td>
<td>2111.000000</td>
<td>2111.000000</td>
<td>2111.000000</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">mean</th>
<td>24.312600</td>
<td>1.701677</td>
<td>86.586058</td>
<td>2.419043</td>
<td>2.685628</td>
<td>2.008011</td>
<td>1.010298</td>
<td>0.657866</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">std</th>
<td>6.345968</td>
<td>0.093305</td>
<td>26.191172</td>
<td>0.533927</td>
<td>0.778039</td>
<td>0.612953</td>
<td>0.850592</td>
<td>0.608927</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">min</th>
<td>14.000000</td>
<td>1.450000</td>
<td>39.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>0.000000</td>
<td>0.000000</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">25%</th>
<td>19.947192</td>
<td>1.630000</td>
<td>65.473343</td>
<td>2.000000</td>
<td>2.658738</td>
<td>1.584812</td>
<td>0.124505</td>
<td>0.000000</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">50%</th>
<td>22.777890</td>
<td>1.700499</td>
<td>83.000000</td>
<td>2.385502</td>
<td>3.000000</td>
<td>2.000000</td>
<td>1.000000</td>
<td>0.625350</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">75%</th>
<td>26.000000</td>
<td>1.768464</td>
<td>107.430682</td>
<td>3.000000</td>
<td>3.000000</td>
<td>2.477420</td>
<td>1.666678</td>
<td>1.000000</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">max</th>
<td>61.000000</td>
<td>1.980000</td>
<td>173.000000</td>
<td>3.000000</td>
<td>4.000000</td>
<td>3.000000</td>
<td>3.000000</td>
<td>2.000000</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>

<div class="no-row-height column-margin column-container"><div class="">
<p><em>Target distribution</em></p>
<div id="0a7bee65" class="cell" data-execution_count="4">
<details class="code-fold">
<summary>Target and Correlation</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Visualize target data distribution</span></span>
<span id="cb3-2">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span>
<span id="cb3-3">sns.countplot(data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>raw_df, x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>target, hue<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>target, palette<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Set2'</span>,)</span>
<span id="cb3-4">plt.title(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'Distribution of </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>target<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> levels'</span>)</span>
<span id="cb3-5">plt.xticks(rotation<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span>)</span>
<span id="cb3-6">plt.show()</span>
<span id="cb3-7"></span>
<span id="cb3-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Heatmap to check for correlations between numeric variables</span></span>
<span id="cb3-9">corr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> raw_df.corr(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'kendall'</span>,numeric_only<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb3-10">sns.heatmap(corr, annot<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, cmap<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'coolwarm'</span>)</span>
<span id="cb3-11">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Kendall Correlation Heatmap'</span>)</span>
<span id="cb3-12">plt.show()</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><a href="index_files/figure-html/cell-4-output-1.png" class="lightbox" data-gallery="quarto-lightbox-gallery-2"><img src="https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/index_files/figure-html/cell-4-output-1.png" width="392" height="382" class="figure-img img-fluid"></a></p>
</figure>
</div>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><a href="index_files/figure-html/cell-4-output-2.png" class="lightbox" data-gallery="quarto-lightbox-gallery-3"><img src="https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/index_files/figure-html/cell-4-output-2.png" width="537" height="430" class="figure-img img-fluid"></a></p>
</figure>
</div>
</div>
</div>
<div class="callout callout-style-simple callout-none no-icon callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">None</span>Some More EDA for the data
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-1" class="callout-1-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<div id="594cf346" class="cell" data-execution_count="5">
<details class="code-fold">
<summary>EDA code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Visualize the distribution of numerical variables</span></span>
<span id="cb4-2">sns.pairplot(raw_df, hue<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>target, corner<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb4-3">plt.show()</span>
<span id="cb4-4"></span>
<span id="cb4-5"></span>
<span id="cb4-6"></span>
<span id="cb4-7"></span>
<span id="cb4-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Getting categorical data</span></span>
<span id="cb4-9">categorical_columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> raw_df.select_dtypes(include<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'object'</span>).columns</span>
<span id="cb4-10"></span>
<span id="cb4-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot categorical variables with respect to the target variable</span></span>
<span id="cb4-12"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> col <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> categorical_columns:</span>
<span id="cb4-13">    plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb4-14">    sns.countplot(data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>raw_df,x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>col, hue<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>target)</span>
<span id="cb4-15">    plt.title(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Countplot of </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>col<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> with respect to </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>target<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb4-16">    plt.show()</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><a href="index_files/figure-html/cell-5-output-1.png" class="lightbox" data-gallery="quarto-lightbox-gallery-4"><img src="https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/index_files/figure-html/cell-5-output-1.png" width="2075" height="1887" class="figure-img img-fluid"></a></p>
</figure>
</div>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><a href="index_files/figure-html/cell-5-output-2.png" class="lightbox" data-gallery="quarto-lightbox-gallery-5"><img src="https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/index_files/figure-html/cell-5-output-2.png" width="961" height="447" class="figure-img img-fluid"></a></p>
</figure>
</div>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><a href="index_files/figure-html/cell-5-output-3.png" class="lightbox" data-gallery="quarto-lightbox-gallery-6"><img src="https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/index_files/figure-html/cell-5-output-3.png" width="961" height="447" class="figure-img img-fluid"></a></p>
</figure>
</div>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><a href="index_files/figure-html/cell-5-output-4.png" class="lightbox" data-gallery="quarto-lightbox-gallery-7"><img src="https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/index_files/figure-html/cell-5-output-4.png" width="961" height="447" class="figure-img img-fluid"></a></p>
</figure>
</div>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><a href="index_files/figure-html/cell-5-output-5.png" class="lightbox" data-gallery="quarto-lightbox-gallery-8"><img src="https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/index_files/figure-html/cell-5-output-5.png" width="961" height="447" class="figure-img img-fluid"></a></p>
</figure>
</div>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><a href="index_files/figure-html/cell-5-output-6.png" class="lightbox" data-gallery="quarto-lightbox-gallery-9"><img src="https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/index_files/figure-html/cell-5-output-6.png" width="961" height="447" class="figure-img img-fluid"></a></p>
</figure>
</div>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><a href="index_files/figure-html/cell-5-output-7.png" class="lightbox" data-gallery="quarto-lightbox-gallery-10"><img src="https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/index_files/figure-html/cell-5-output-7.png" width="961" height="447" class="figure-img img-fluid"></a></p>
</figure>
</div>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><a href="index_files/figure-html/cell-5-output-8.png" class="lightbox" data-gallery="quarto-lightbox-gallery-11"><img src="https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/index_files/figure-html/cell-5-output-8.png" width="961" height="447" class="figure-img img-fluid"></a></p>
</figure>
</div>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><a href="index_files/figure-html/cell-5-output-9.png" class="lightbox" data-gallery="quarto-lightbox-gallery-12"><img src="https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/index_files/figure-html/cell-5-output-9.png" width="961" height="447" class="figure-img img-fluid"></a></p>
</figure>
</div>
</div>
</div>
</div>
</div>
</div>
</div></div></section>
</section>
<section id="model-creation-establishing-a-baseline" class="level3 page-columns page-full">
<h3 class="anchored" data-anchor-id="model-creation-establishing-a-baseline">Model Creation: Establishing a Baseline</h3>
<p>With our exploratory analysis complete, we’re ready to create our baseline model using logistic regression with Statsmodels. This initial model will serve as our reference point—like setting up a benchmark for your favorite video game.</p>
<div id="a6e480d4" class="cell" data-execution_count="6">
<details class="code-fold">
<summary>Splitting data and training a default Multinomial Logit model on our data</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb5-1">data_df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> raw_df.dropna().reset_index(drop<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb5-2">data_df[target] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> data_df[target].cat.codes</span>
<span id="cb5-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># X = data_df[['bill_length_mm','bill_depth_mm','flipper_length_mm','body_mass_g']] </span></span>
<span id="cb5-4"></span>
<span id="cb5-5">data_df_test <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> data_df.sample(frac<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>,random_state<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb5-6">data_df_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> data_df.drop(data_df_test.index)</span>
<span id="cb5-7"></span>
<span id="cb5-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit a multinomial logistic regression via the formula API</span></span>
<span id="cb5-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This essentially boils down to pairwise logistic regression</span></span>
<span id="cb5-10">logit_model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sm.MNLogit.from_formula(</span>
<span id="cb5-11">    <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>target<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> ~ </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">' + '</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>join([col <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> col <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> data_df_train.columns <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> col <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> target])<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>, </span>
<span id="cb5-12">    data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>data_df_train</span>
<span id="cb5-13">).fit_regularized()</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Optimization terminated successfully    (Exit mode 0)
            Current function value: 0.17057119619320013
            Iterations: 485
            Function evaluations: 639
            Gradient evaluations: 485</code></pre>
</div>
</div>

<div class="no-row-height column-margin column-container"><div class="">
<div class="callout callout-style-simple callout-none no-icon callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-2-contents" aria-controls="callout-2" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">None</span>Base model summary for geeks
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-2" class="callout-2-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<div id="7fc3dbfa" class="cell" data-execution_count="7">
<details class="code-fold">
<summary>Display summary</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb7-1">display(logit_model.summary())</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-display">
<table class="simpletable table table-sm table-striped small">
<caption>MNLogit Regression Results</caption>
<tbody>
<tr class="odd">
<th data-quarto-table-cell-role="th">Dep. Variable:</th>
<td>NObeyesdad</td>
<th data-quarto-table-cell-role="th">No. Observations:</th>
<td>1900</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Model:</th>
<td>MNLogit</td>
<th data-quarto-table-cell-role="th">Df Residuals:</th>
<td>1756</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Method:</th>
<td>MLE</td>
<th data-quarto-table-cell-role="th">Df Model:</th>
<td>138</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Date:</th>
<td>Sat, 09 May 2026</td>
<th data-quarto-table-cell-role="th">Pseudo R-squ.:</th>
<td>0.9122</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Time:</th>
<td>00:48:51</td>
<th data-quarto-table-cell-role="th">Log-Likelihood:</th>
<td>-324.09</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">converged:</th>
<td>True</td>
<th data-quarto-table-cell-role="th">LL-Null:</th>
<td>-3691.8</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Covariance Type:</th>
<td>nonrobust</td>
<th data-quarto-table-cell-role="th">LLR p-value:</th>
<td>0.000</td>
</tr>
</tbody>
</table>


<table class="simpletable table table-sm table-striped small">
<tbody>
<tr class="odd">
<th data-quarto-table-cell-role="th">NObeyesdad=1</th>
<th data-quarto-table-cell-role="th">coef</th>
<th data-quarto-table-cell-role="th">std err</th>
<th data-quarto-table-cell-role="th">z</th>
<th data-quarto-table-cell-role="th">P&gt;|z|</th>
<th data-quarto-table-cell-role="th">[0.025</th>
<th data-quarto-table-cell-role="th">0.975]</th>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Intercept</th>
<td>-11.2903</td>
<td>3.25e+05</td>
<td>-3.48e-05</td>
<td>1.000</td>
<td>-6.36e+05</td>
<td>6.36e+05</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Gender[T.Male]</th>
<td>-3.4851</td>
<td>0.817</td>
<td>-4.268</td>
<td>0.000</td>
<td>-5.085</td>
<td>-1.885</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Famil_Hist_Owt[T.yes]</th>
<td>-0.8162</td>
<td>0.655</td>
<td>-1.246</td>
<td>0.213</td>
<td>-2.100</td>
<td>0.468</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FAVC[T.yes]</th>
<td>0.2636</td>
<td>0.785</td>
<td>0.336</td>
<td>0.737</td>
<td>-1.275</td>
<td>1.802</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CAEC[T.Frequently]</th>
<td>-8.2402</td>
<td>2.312</td>
<td>-3.564</td>
<td>0.000</td>
<td>-12.771</td>
<td>-3.709</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CAEC[T.Sometimes]</th>
<td>-6.2226</td>
<td>2.232</td>
<td>-2.787</td>
<td>0.005</td>
<td>-10.598</td>
<td>-1.847</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CAEC[T.no]</th>
<td>-8.5977</td>
<td>2.889</td>
<td>-2.976</td>
<td>0.003</td>
<td>-14.260</td>
<td>-2.935</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">SMOKE[T.yes]</th>
<td>4.4919</td>
<td>3.115</td>
<td>1.442</td>
<td>0.149</td>
<td>-1.614</td>
<td>10.598</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">SCC[T.yes]</th>
<td>-0.7294</td>
<td>1.447</td>
<td>-0.504</td>
<td>0.614</td>
<td>-3.565</td>
<td>2.106</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CALC[T.Frequently]</th>
<td>-12.6192</td>
<td>3.25e+05</td>
<td>-3.89e-05</td>
<td>1.000</td>
<td>-6.36e+05</td>
<td>6.36e+05</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CALC[T.Sometimes]</th>
<td>-13.2985</td>
<td>3.25e+05</td>
<td>-4.1e-05</td>
<td>1.000</td>
<td>-6.36e+05</td>
<td>6.36e+05</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CALC[T.no]</th>
<td>-14.1585</td>
<td>3.25e+05</td>
<td>-4.36e-05</td>
<td>1.000</td>
<td>-6.36e+05</td>
<td>6.36e+05</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">MTRANS[T.Bike]</th>
<td>15.8909</td>
<td>2489.580</td>
<td>0.006</td>
<td>0.995</td>
<td>-4863.596</td>
<td>4895.378</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">MTRANS[T.Motorbike]</th>
<td>3.9944</td>
<td>47.659</td>
<td>0.084</td>
<td>0.933</td>
<td>-89.416</td>
<td>97.405</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">MTRANS[T.Public_Transportation]</th>
<td>4.4914</td>
<td>0.995</td>
<td>4.514</td>
<td>0.000</td>
<td>2.541</td>
<td>6.441</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">MTRANS[T.Walking]</th>
<td>4.3554</td>
<td>1.502</td>
<td>2.900</td>
<td>0.004</td>
<td>1.412</td>
<td>7.299</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Age</th>
<td>0.3721</td>
<td>0.097</td>
<td>3.833</td>
<td>0.000</td>
<td>0.182</td>
<td>0.562</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Height</th>
<td>-14.4208</td>
<td>4.118</td>
<td>-3.502</td>
<td>0.000</td>
<td>-22.492</td>
<td>-6.349</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Weight</th>
<td>1.0786</td>
<td>0.146</td>
<td>7.378</td>
<td>0.000</td>
<td>0.792</td>
<td>1.365</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FCVC</th>
<td>-0.7754</td>
<td>0.429</td>
<td>-1.806</td>
<td>0.071</td>
<td>-1.617</td>
<td>0.066</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">NCP</th>
<td>-1.7094</td>
<td>0.491</td>
<td>-3.480</td>
<td>0.001</td>
<td>-2.672</td>
<td>-0.747</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CH2O</th>
<td>-1.7291</td>
<td>0.578</td>
<td>-2.992</td>
<td>0.003</td>
<td>-2.862</td>
<td>-0.596</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">FAF</th>
<td>-0.1924</td>
<td>0.280</td>
<td>-0.688</td>
<td>0.491</td>
<td>-0.740</td>
<td>0.356</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">TUE</th>
<td>-0.9320</td>
<td>0.456</td>
<td>-2.043</td>
<td>0.041</td>
<td>-1.826</td>
<td>-0.038</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">NObeyesdad=2</th>
<th data-quarto-table-cell-role="th">coef</th>
<th data-quarto-table-cell-role="th">std err</th>
<th data-quarto-table-cell-role="th">z</th>
<th data-quarto-table-cell-role="th">P&gt;|z|</th>
<th data-quarto-table-cell-role="th">[0.025</th>
<th data-quarto-table-cell-role="th">0.975]</th>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Intercept</th>
<td>17.4309</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Gender[T.Male]</th>
<td>-14.0384</td>
<td>1.983</td>
<td>-7.079</td>
<td>0.000</td>
<td>-17.925</td>
<td>-10.151</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Famil_Hist_Owt[T.yes]</th>
<td>2.0527</td>
<td>1.717</td>
<td>1.195</td>
<td>0.232</td>
<td>-1.313</td>
<td>5.418</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">FAVC[T.yes]</th>
<td>0.9668</td>
<td>1.752</td>
<td>0.552</td>
<td>0.581</td>
<td>-2.468</td>
<td>4.401</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CAEC[T.Frequently]</th>
<td>-10.0052</td>
<td>4.352</td>
<td>-2.299</td>
<td>0.021</td>
<td>-18.534</td>
<td>-1.476</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CAEC[T.Sometimes]</th>
<td>-1.0074</td>
<td>3.427</td>
<td>-0.294</td>
<td>0.769</td>
<td>-7.724</td>
<td>5.709</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CAEC[T.no]</th>
<td>-0.4896</td>
<td>894.479</td>
<td>-0.001</td>
<td>1.000</td>
<td>-1753.637</td>
<td>1752.658</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">SMOKE[T.yes]</th>
<td>8.1410</td>
<td>4.013</td>
<td>2.029</td>
<td>0.042</td>
<td>0.277</td>
<td>16.005</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">SCC[T.yes]</th>
<td>-7.6940</td>
<td>152.983</td>
<td>-0.050</td>
<td>0.960</td>
<td>-307.535</td>
<td>292.147</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CALC[T.Frequently]</th>
<td>-2.4516</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CALC[T.Sometimes]</th>
<td>-7.5316</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CALC[T.no]</th>
<td>-7.2301</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">MTRANS[T.Bike]</th>
<td>-11.9350</td>
<td>8.09e+07</td>
<td>-1.47e-07</td>
<td>1.000</td>
<td>-1.59e+08</td>
<td>1.59e+08</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">MTRANS[T.Motorbike]</th>
<td>10.9226</td>
<td>48.493</td>
<td>0.225</td>
<td>0.822</td>
<td>-84.123</td>
<td>105.968</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">MTRANS[T.Public_Transportation]</th>
<td>11.1756</td>
<td>1.750</td>
<td>6.387</td>
<td>0.000</td>
<td>7.746</td>
<td>14.605</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">MTRANS[T.Walking]</th>
<td>1.7281</td>
<td>2.759</td>
<td>0.626</td>
<td>0.531</td>
<td>-3.679</td>
<td>7.135</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Age</th>
<td>0.8111</td>
<td>0.132</td>
<td>6.139</td>
<td>0.000</td>
<td>0.552</td>
<td>1.070</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Height</th>
<td>-184.0385</td>
<td>14.746</td>
<td>-12.481</td>
<td>0.000</td>
<td>-212.939</td>
<td>-155.138</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Weight</th>
<td>3.9438</td>
<td>0.288</td>
<td>13.688</td>
<td>0.000</td>
<td>3.379</td>
<td>4.508</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">FCVC</th>
<td>0.8915</td>
<td>1.014</td>
<td>0.879</td>
<td>0.379</td>
<td>-1.095</td>
<td>2.878</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">NCP</th>
<td>-1.1415</td>
<td>0.711</td>
<td>-1.605</td>
<td>0.109</td>
<td>-2.536</td>
<td>0.253</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CH2O</th>
<td>-1.5390</td>
<td>0.876</td>
<td>-1.756</td>
<td>0.079</td>
<td>-3.256</td>
<td>0.179</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FAF</th>
<td>-1.5295</td>
<td>0.591</td>
<td>-2.586</td>
<td>0.010</td>
<td>-2.689</td>
<td>-0.370</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">TUE</th>
<td>-0.5710</td>
<td>0.840</td>
<td>-0.680</td>
<td>0.497</td>
<td>-2.217</td>
<td>1.075</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">NObeyesdad=3</th>
<th data-quarto-table-cell-role="th">coef</th>
<th data-quarto-table-cell-role="th">std err</th>
<th data-quarto-table-cell-role="th">z</th>
<th data-quarto-table-cell-role="th">P&gt;|z|</th>
<th data-quarto-table-cell-role="th">[0.025</th>
<th data-quarto-table-cell-role="th">0.975]</th>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Intercept</th>
<td>-138.5068</td>
<td>1.47e+07</td>
<td>-9.41e-06</td>
<td>1.000</td>
<td>-2.89e+07</td>
<td>2.89e+07</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Gender[T.Male]</th>
<td>-16.6365</td>
<td>8.279</td>
<td>-2.010</td>
<td>0.044</td>
<td>-32.863</td>
<td>-0.410</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Famil_Hist_Owt[T.yes]</th>
<td>2.3538</td>
<td>11.601</td>
<td>0.203</td>
<td>0.839</td>
<td>-20.384</td>
<td>25.092</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FAVC[T.yes]</th>
<td>-8.7785</td>
<td>5.476</td>
<td>-1.603</td>
<td>0.109</td>
<td>-19.512</td>
<td>1.955</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CAEC[T.Frequently]</th>
<td>-71.7022</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CAEC[T.Sometimes]</th>
<td>-3.9034</td>
<td>4.734</td>
<td>-0.824</td>
<td>0.410</td>
<td>-13.183</td>
<td>5.376</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CAEC[T.no]</th>
<td>7.7265</td>
<td>895.063</td>
<td>0.009</td>
<td>0.993</td>
<td>-1746.566</td>
<td>1762.019</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">SMOKE[T.yes]</th>
<td>3.5306</td>
<td>19.342</td>
<td>0.183</td>
<td>0.855</td>
<td>-34.379</td>
<td>41.440</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">SCC[T.yes]</th>
<td>-19.4879</td>
<td>154.607</td>
<td>-0.126</td>
<td>0.900</td>
<td>-322.512</td>
<td>283.536</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CALC[T.Frequently]</th>
<td>-43.6020</td>
<td>1.48e+07</td>
<td>-2.95e-06</td>
<td>1.000</td>
<td>-2.9e+07</td>
<td>2.9e+07</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CALC[T.Sometimes]</th>
<td>-45.7496</td>
<td>1.47e+07</td>
<td>-3.11e-06</td>
<td>1.000</td>
<td>-2.88e+07</td>
<td>2.88e+07</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CALC[T.no]</th>
<td>-28.2183</td>
<td>1.43e+07</td>
<td>-1.97e-06</td>
<td>1.000</td>
<td>-2.81e+07</td>
<td>2.81e+07</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">MTRANS[T.Bike]</th>
<td>0.0376</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">MTRANS[T.Motorbike]</th>
<td>-2.3812</td>
<td>1.05e+11</td>
<td>-2.27e-11</td>
<td>1.000</td>
<td>-2.06e+11</td>
<td>2.06e+11</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">MTRANS[T.Public_Transportation]</th>
<td>22.5234</td>
<td>6.664</td>
<td>3.380</td>
<td>0.001</td>
<td>9.463</td>
<td>35.584</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">MTRANS[T.Walking]</th>
<td>-5.3334</td>
<td>33.279</td>
<td>-0.160</td>
<td>0.873</td>
<td>-70.560</td>
<td>59.893</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Age</th>
<td>2.5106</td>
<td>0.964</td>
<td>2.605</td>
<td>0.009</td>
<td>0.621</td>
<td>4.400</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Height</th>
<td>-278.9439</td>
<td>44.201</td>
<td>-6.311</td>
<td>0.000</td>
<td>-365.576</td>
<td>-192.312</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Weight</th>
<td>7.1539</td>
<td>1.394</td>
<td>5.132</td>
<td>0.000</td>
<td>4.422</td>
<td>9.886</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FCVC</th>
<td>4.1064</td>
<td>3.285</td>
<td>1.250</td>
<td>0.211</td>
<td>-2.333</td>
<td>10.546</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">NCP</th>
<td>-1.5637</td>
<td>2.424</td>
<td>-0.645</td>
<td>0.519</td>
<td>-6.315</td>
<td>3.187</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CH2O</th>
<td>-13.4088</td>
<td>5.560</td>
<td>-2.412</td>
<td>0.016</td>
<td>-24.306</td>
<td>-2.511</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">FAF</th>
<td>-9.8534</td>
<td>4.356</td>
<td>-2.262</td>
<td>0.024</td>
<td>-18.390</td>
<td>-1.316</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">TUE</th>
<td>-5.6951</td>
<td>3.292</td>
<td>-1.730</td>
<td>0.084</td>
<td>-12.147</td>
<td>0.757</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">NObeyesdad=4</th>
<th data-quarto-table-cell-role="th">coef</th>
<th data-quarto-table-cell-role="th">std err</th>
<th data-quarto-table-cell-role="th">z</th>
<th data-quarto-table-cell-role="th">P&gt;|z|</th>
<th data-quarto-table-cell-role="th">[0.025</th>
<th data-quarto-table-cell-role="th">0.975]</th>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Intercept</th>
<td>-87.3214</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Gender[T.Male]</th>
<td>-200.3037</td>
<td>5.41e+07</td>
<td>-3.7e-06</td>
<td>1.000</td>
<td>-1.06e+08</td>
<td>1.06e+08</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Famil_Hist_Owt[T.yes]</th>
<td>-30.9252</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">FAVC[T.yes]</th>
<td>-53.1818</td>
<td>3.98e+07</td>
<td>-1.34e-06</td>
<td>1.000</td>
<td>-7.8e+07</td>
<td>7.8e+07</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CAEC[T.Frequently]</th>
<td>-28.5483</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CAEC[T.Sometimes]</th>
<td>-21.5821</td>
<td>5.38e+07</td>
<td>-4.01e-07</td>
<td>1.000</td>
<td>-1.05e+08</td>
<td>1.05e+08</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CAEC[T.no]</th>
<td>-2.2000</td>
<td>4.62e+29</td>
<td>-4.76e-30</td>
<td>1.000</td>
<td>-9.06e+29</td>
<td>9.06e+29</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">SMOKE[T.yes]</th>
<td>-6.0944</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">SCC[T.yes]</th>
<td>-12.3054</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CALC[T.Frequently]</th>
<td>-6.2460</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CALC[T.Sometimes]</th>
<td>-37.2004</td>
<td>2.12e+08</td>
<td>-1.76e-07</td>
<td>1.000</td>
<td>-4.15e+08</td>
<td>4.15e+08</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CALC[T.no]</th>
<td>-64.5032</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">MTRANS[T.Bike]</th>
<td>-0.2989</td>
<td>1.92e+53</td>
<td>-1.56e-54</td>
<td>1.000</td>
<td>-3.76e+53</td>
<td>3.76e+53</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">MTRANS[T.Motorbike]</th>
<td>-0.2031</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">MTRANS[T.Public_Transportation]</th>
<td>-57.6854</td>
<td>7.04e+07</td>
<td>-8.2e-07</td>
<td>1.000</td>
<td>-1.38e+08</td>
<td>1.38e+08</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">MTRANS[T.Walking]</th>
<td>-7.4464</td>
<td>2.03e+15</td>
<td>-3.66e-15</td>
<td>1.000</td>
<td>-3.98e+15</td>
<td>3.98e+15</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Age</th>
<td>-9.3747</td>
<td>103.246</td>
<td>-0.091</td>
<td>0.928</td>
<td>-211.733</td>
<td>192.984</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Height</th>
<td>-174.4727</td>
<td>592.866</td>
<td>-0.294</td>
<td>0.769</td>
<td>-1336.469</td>
<td>987.523</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Weight</th>
<td>8.7405</td>
<td>35.222</td>
<td>0.248</td>
<td>0.804</td>
<td>-60.293</td>
<td>77.774</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">FCVC</th>
<td>49.0613</td>
<td>3.02e+04</td>
<td>0.002</td>
<td>0.999</td>
<td>-5.91e+04</td>
<td>5.92e+04</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">NCP</th>
<td>2.3650</td>
<td>4572.743</td>
<td>0.001</td>
<td>1.000</td>
<td>-8960.047</td>
<td>8964.777</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CH2O</th>
<td>-18.5809</td>
<td>34.347</td>
<td>-0.541</td>
<td>0.589</td>
<td>-85.900</td>
<td>48.738</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FAF</th>
<td>-65.1761</td>
<td>262.887</td>
<td>-0.248</td>
<td>0.804</td>
<td>-580.424</td>
<td>450.072</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">TUE</th>
<td>-44.3721</td>
<td>285.217</td>
<td>-0.156</td>
<td>0.876</td>
<td>-603.387</td>
<td>514.643</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">NObeyesdad=5</th>
<th data-quarto-table-cell-role="th">coef</th>
<th data-quarto-table-cell-role="th">std err</th>
<th data-quarto-table-cell-role="th">z</th>
<th data-quarto-table-cell-role="th">P&gt;|z|</th>
<th data-quarto-table-cell-role="th">[0.025</th>
<th data-quarto-table-cell-role="th">0.975]</th>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Intercept</th>
<td>-12.5683</td>
<td>3.25e+05</td>
<td>-3.87e-05</td>
<td>1.000</td>
<td>-6.36e+05</td>
<td>6.36e+05</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Gender[T.Male]</th>
<td>-6.8149</td>
<td>1.085</td>
<td>-6.282</td>
<td>0.000</td>
<td>-8.941</td>
<td>-4.689</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Famil_Hist_Owt[T.yes]</th>
<td>-0.5822</td>
<td>0.790</td>
<td>-0.737</td>
<td>0.461</td>
<td>-2.130</td>
<td>0.966</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FAVC[T.yes]</th>
<td>2.6008</td>
<td>0.978</td>
<td>2.660</td>
<td>0.008</td>
<td>0.684</td>
<td>4.517</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CAEC[T.Frequently]</th>
<td>-7.2298</td>
<td>2.507</td>
<td>-2.884</td>
<td>0.004</td>
<td>-12.143</td>
<td>-2.316</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CAEC[T.Sometimes]</th>
<td>-2.8197</td>
<td>2.413</td>
<td>-1.168</td>
<td>0.243</td>
<td>-7.550</td>
<td>1.910</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CAEC[T.no]</th>
<td>-3.8181</td>
<td>3.143</td>
<td>-1.215</td>
<td>0.224</td>
<td>-9.977</td>
<td>2.341</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">SMOKE[T.yes]</th>
<td>3.1451</td>
<td>3.296</td>
<td>0.954</td>
<td>0.340</td>
<td>-3.314</td>
<td>9.604</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">SCC[T.yes]</th>
<td>2.1647</td>
<td>1.617</td>
<td>1.339</td>
<td>0.181</td>
<td>-1.004</td>
<td>5.334</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CALC[T.Frequently]</th>
<td>-9.0315</td>
<td>3.25e+05</td>
<td>-2.78e-05</td>
<td>1.000</td>
<td>-6.36e+05</td>
<td>6.36e+05</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CALC[T.Sometimes]</th>
<td>-9.1446</td>
<td>3.25e+05</td>
<td>-2.82e-05</td>
<td>1.000</td>
<td>-6.36e+05</td>
<td>6.36e+05</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CALC[T.no]</th>
<td>-10.7708</td>
<td>3.25e+05</td>
<td>-3.32e-05</td>
<td>1.000</td>
<td>-6.36e+05</td>
<td>6.36e+05</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">MTRANS[T.Bike]</th>
<td>19.0425</td>
<td>2489.581</td>
<td>0.008</td>
<td>0.994</td>
<td>-4860.446</td>
<td>4898.531</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">MTRANS[T.Motorbike]</th>
<td>1.6235</td>
<td>47.716</td>
<td>0.034</td>
<td>0.973</td>
<td>-91.899</td>
<td>95.146</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">MTRANS[T.Public_Transportation]</th>
<td>5.9777</td>
<td>1.209</td>
<td>4.946</td>
<td>0.000</td>
<td>3.609</td>
<td>8.346</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">MTRANS[T.Walking]</th>
<td>4.3596</td>
<td>1.776</td>
<td>2.454</td>
<td>0.014</td>
<td>0.878</td>
<td>7.841</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Age</th>
<td>0.4878</td>
<td>0.106</td>
<td>4.597</td>
<td>0.000</td>
<td>0.280</td>
<td>0.696</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Height</th>
<td>-50.0157</td>
<td>6.721</td>
<td>-7.442</td>
<td>0.000</td>
<td>-63.188</td>
<td>-36.844</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Weight</th>
<td>1.7920</td>
<td>0.168</td>
<td>10.651</td>
<td>0.000</td>
<td>1.462</td>
<td>2.122</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FCVC</th>
<td>-0.8369</td>
<td>0.601</td>
<td>-1.393</td>
<td>0.164</td>
<td>-2.014</td>
<td>0.341</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">NCP</th>
<td>-1.4453</td>
<td>0.554</td>
<td>-2.608</td>
<td>0.009</td>
<td>-2.531</td>
<td>-0.359</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CH2O</th>
<td>-1.7648</td>
<td>0.679</td>
<td>-2.601</td>
<td>0.009</td>
<td>-3.095</td>
<td>-0.435</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">FAF</th>
<td>-0.5613</td>
<td>0.374</td>
<td>-1.499</td>
<td>0.134</td>
<td>-1.295</td>
<td>0.172</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">TUE</th>
<td>-0.7982</td>
<td>0.555</td>
<td>-1.439</td>
<td>0.150</td>
<td>-1.886</td>
<td>0.289</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">NObeyesdad=6</th>
<th data-quarto-table-cell-role="th">coef</th>
<th data-quarto-table-cell-role="th">std err</th>
<th data-quarto-table-cell-role="th">z</th>
<th data-quarto-table-cell-role="th">P&gt;|z|</th>
<th data-quarto-table-cell-role="th">[0.025</th>
<th data-quarto-table-cell-role="th">0.975]</th>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Intercept</th>
<td>-2.1693</td>
<td>6.28e+06</td>
<td>-3.45e-07</td>
<td>1.000</td>
<td>-1.23e+07</td>
<td>1.23e+07</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Gender[T.Male]</th>
<td>-6.6857</td>
<td>1.207</td>
<td>-5.537</td>
<td>0.000</td>
<td>-9.052</td>
<td>-4.319</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Famil_Hist_Owt[T.yes]</th>
<td>1.9296</td>
<td>1.076</td>
<td>1.793</td>
<td>0.073</td>
<td>-0.179</td>
<td>4.038</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">FAVC[T.yes]</th>
<td>-0.4617</td>
<td>1.141</td>
<td>-0.405</td>
<td>0.686</td>
<td>-2.698</td>
<td>1.775</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CAEC[T.Frequently]</th>
<td>-5.5324</td>
<td>3.264</td>
<td>-1.695</td>
<td>0.090</td>
<td>-11.930</td>
<td>0.866</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CAEC[T.Sometimes]</th>
<td>0.7854</td>
<td>3.044</td>
<td>0.258</td>
<td>0.796</td>
<td>-5.181</td>
<td>6.752</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CAEC[T.no]</th>
<td>1.7141</td>
<td>3.934</td>
<td>0.436</td>
<td>0.663</td>
<td>-5.997</td>
<td>9.426</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">SMOKE[T.yes]</th>
<td>7.0398</td>
<td>3.570</td>
<td>1.972</td>
<td>0.049</td>
<td>0.043</td>
<td>14.036</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">SCC[T.yes]</th>
<td>1.3664</td>
<td>2.012</td>
<td>0.679</td>
<td>0.497</td>
<td>-2.577</td>
<td>5.309</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CALC[T.Frequently]</th>
<td>-2.1001</td>
<td>6.28e+06</td>
<td>-3.34e-07</td>
<td>1.000</td>
<td>-1.23e+07</td>
<td>1.23e+07</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">CALC[T.Sometimes]</th>
<td>-4.6772</td>
<td>6.28e+06</td>
<td>-7.45e-07</td>
<td>1.000</td>
<td>-1.23e+07</td>
<td>1.23e+07</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CALC[T.no]</th>
<td>-4.1972</td>
<td>6.28e+06</td>
<td>-6.68e-07</td>
<td>1.000</td>
<td>-1.23e+07</td>
<td>1.23e+07</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">MTRANS[T.Bike]</th>
<td>-21.8420</td>
<td>6.54e+09</td>
<td>-3.34e-09</td>
<td>1.000</td>
<td>-1.28e+10</td>
<td>1.28e+10</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">MTRANS[T.Motorbike]</th>
<td>3.2252</td>
<td>47.781</td>
<td>0.068</td>
<td>0.946</td>
<td>-90.423</td>
<td>96.873</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">MTRANS[T.Public_Transportation]</th>
<td>8.8055</td>
<td>1.416</td>
<td>6.219</td>
<td>0.000</td>
<td>6.030</td>
<td>11.581</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">MTRANS[T.Walking]</th>
<td>1.2540</td>
<td>2.256</td>
<td>0.556</td>
<td>0.578</td>
<td>-3.168</td>
<td>5.676</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Age</th>
<td>0.7030</td>
<td>0.116</td>
<td>6.086</td>
<td>0.000</td>
<td>0.477</td>
<td>0.929</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Height</th>
<td>-104.6838</td>
<td>9.021</td>
<td>-11.605</td>
<td>0.000</td>
<td>-122.364</td>
<td>-87.003</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Weight</th>
<td>2.6259</td>
<td>0.190</td>
<td>13.819</td>
<td>0.000</td>
<td>2.253</td>
<td>2.998</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">FCVC</th>
<td>0.1776</td>
<td>0.764</td>
<td>0.232</td>
<td>0.816</td>
<td>-1.320</td>
<td>1.675</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">NCP</th>
<td>-1.8276</td>
<td>0.608</td>
<td>-3.007</td>
<td>0.003</td>
<td>-3.019</td>
<td>-0.636</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CH2O</th>
<td>-1.8930</td>
<td>0.757</td>
<td>-2.502</td>
<td>0.012</td>
<td>-3.376</td>
<td>-0.410</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FAF</th>
<td>-1.0280</td>
<td>0.438</td>
<td>-2.347</td>
<td>0.019</td>
<td>-1.887</td>
<td>-0.169</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">TUE</th>
<td>0.1282</td>
<td>0.670</td>
<td>0.191</td>
<td>0.848</td>
<td>-1.186</td>
<td>1.442</td>
</tr>
</tbody>
</table>
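<p>One thing worth noticing before moving on: several coefficients above carry standard errors like 3.25e+05 or outright nan. That is the classic fingerprint of (quasi-)complete separation — some category levels (CALC, MTRANS[T.Bike]) almost perfectly predict a class, so the MLE runs off to infinity. A tiny, hypothetical helper (not part of statsmodels) can flag such rows programmatically; the usual remedies are collapsing sparse levels or a regularized fit.</p>

```python
import math

def flag_unstable(names, std_errs, threshold=100.0):
    """Return coefficients whose standard error exploded or is undefined —
    the usual fingerprint of (quasi-)complete separation in a logit fit.
    `threshold` is an arbitrary cut-off chosen purely for illustration."""
    return [n for n, se in zip(names, std_errs)
            if math.isnan(se) or se > threshold]

# A few rows lifted from the NObeyesdad=2 block above:
names = ["Gender[T.Male]", "CAEC[T.no]", "CALC[T.no]", "Weight"]
std_errs = [1.983, 894.479, float("nan"), 0.288]
flag_unstable(names, std_errs)  # ['CAEC[T.no]', 'CALC[T.no]']
```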
</div>
</div>
</div>
</div>
</div>
</div></div><section id="evaluating-model-performance" class="level4">
<h4 class="anchored" data-anchor-id="evaluating-model-performance">Evaluating Model Performance</h4>
<p>To gauge our models’ effectiveness, we’ll use the standard metrics: accuracy, precision, recall, and F1-score. A confusion matrix helps visualize how well each model classifies the outcomes—think of it as a report card for your model!</p>
<div id="6bcea57d" class="cell" data-execution_count="8">
<details class="code-fold">
<summary>Evaluating the Logit model</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Predict on test data</span></span>
<span id="cb8-2">base_preds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> logit_model.predict(data_df_test).idxmax(axis<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb8-3">y_test <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> data_df_test[target]</span>
<span id="cb8-4"></span>
<span id="cb8-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Evaluate the model</span></span>
<span id="cb8-6">accuracy_orig <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> accuracy_score(y_test, base_preds)</span>
<span id="cb8-7">report_orig <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> classification_report(y_test, base_preds)</span>
<span id="cb8-8"></span>
<span id="cb8-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Accuracy:"</span>, accuracy_orig)</span>
<span id="cb8-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classification Report:"</span>)</span>
<span id="cb8-11"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(report_orig)</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Accuracy: 0.909952606635071
Classification Report:
              precision    recall  f1-score   support

           0       0.93      0.86      0.89        29
           1       0.86      0.83      0.84        29
           2       0.95      0.91      0.93        45
           3       0.94      0.97      0.95        31
           4       1.00      0.96      0.98        27
           5       0.83      0.90      0.86        21
           6       0.84      0.93      0.89        29

    accuracy                           0.91       211
   macro avg       0.91      0.91      0.91       211
weighted avg       0.91      0.91      0.91       211
</code></pre>
</div>
</div>
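<p>The cell above only prints the aggregate report. To see <em>which</em> classes get mistaken for which, you can build the confusion matrix itself — a minimal NumPy sketch of what <code>sklearn.metrics.confusion_matrix</code> computes, on toy labels rather than the obesity data:</p>

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are actual classes, columns are predicted classes;
    cell (i, j) counts samples of class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Tiny illustrative labels (not the obesity data):
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
# Diagonal cells are correct predictions; per-class recall is the
# diagonal divided by each row's total.
recall_per_class = cm.diagonal() / cm.sum(axis=1)  # [0.5, 1.0, 0.5]
```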
</section>
</section>
<section id="looking-for-some-improvments" class="level3 page-columns page-full">
<h3 class="anchored" data-anchor-id="looking-for-some-improvments">Looking for some <em>Improvements!</em></h3>
<section id="feature-selection-using-crosstab-sparsity" class="level4 page-columns page-full">
<h4 class="anchored" data-anchor-id="feature-selection-using-crosstab-sparsity">Feature Selection Using CrossTab Sparsity</h4>
<p>Now comes the exciting part—using CrossTab Sparsity to refine our feature selection process! It’s like cleaning up your closet and only keeping the clothes that spark joy (thank you, Marie Kondo). <sup>1</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;This is based on the paper <em>Unique Metric for Health Analysis with Optimization of Clustering Activity and Cross Comparison of Results from Different Approach</em>. <a href="https://arxiv.org/abs/1810.03419">Paper Link</a></p></div></div><p><a href="https://gist.github.com/jkapila/83bb8f6461ec91bfced437762f2c9220">Code is here!</a></p>
</section>
<section id="standared-steps-for-feature-selection" class="level4 page-columns page-full">
<h4 class="anchored" data-anchor-id="standared-steps-for-feature-selection">Standard Steps for Feature Selection</h4>
<ol type="1">
<li><strong>Calculate CrossTab Sparsity</strong>: For each feature against the target variable.</li>
<li><strong>Select Features</strong>: Based on sparsity scores that indicate significant interactions with the target variable.</li>
<li><strong>Recreate Models</strong>: Train new models using only the selected features—less is often more!</li>
</ol>
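<p>The real <code>crosstab_sparsity</code> implementation lives in the linked gist (numeric features are binned — <code>numeric_bin='decile'</code> in the call below). As a rough illustration of the idea only (my simplification, not the gist’s exact code), the score can be read as the share of empty cells in a feature-vs-target crosstab: the sparser the table, the more each feature level concentrates in a few target classes.</p>

```python
from collections import Counter

def crosstab_sparsity_score(feature, target):
    """Share of empty cells in the feature-vs-target crosstab.
    Illustrative simplification: a sparser crosstab means each feature
    level maps onto few target classes, i.e. it segregates the target."""
    counts = Counter(zip(feature, target))
    levels, classes = sorted(set(feature)), sorted(set(target))
    empty = sum(counts[(f, c)] == 0 for f in levels for c in classes)
    return empty / (len(levels) * len(classes))

feature = ["a"] * 5 + ["b"] * 5
target = [0] * 5 + [1] * 5
crosstab_sparsity_score(feature, target)  # 0.5: each level owns one class
```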
<p>Here we go!!!</p>
<div class="page-columns page-full">
<div id="c09c1400" class="cell page-columns page-full" data-execution_count="10">
<details class="code-fold">
<summary>Doing what needs to be done ;)</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb10-1">sns.set_style(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>)</span>
<span id="cb10-2">sns.set_context(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"paper"</span>)</span>
<span id="cb10-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculating CrossTab sparsity for each column</span></span>
<span id="cb10-4">results <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crosstab_sparsity(data_df_train.iloc[:,:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],data_df_train[target],numeric_bin<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'decile'</span>)</span>
<span id="cb10-5"></span>
<span id="cb10-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># presenting results for consumption</span></span>
<span id="cb10-7">df_long <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.melt(results[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'scores'</span>], id_vars<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Columns'</span>], value_vars<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'seggregation'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'explaination'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>],</span>
<span id="cb10-8">                  var_name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Metric'</span>, value_name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'values'</span>)</span>
<span id="cb10-9"></span>
<span id="cb10-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Adding jitter: small random noise to 'Columns' (x-axis)</span></span>
<span id="cb10-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># df_long['values_jittered'] = df_long['Value'] + np.random.uniform(-0.1, 0.1, size=len(df_long))</span></span>
<span id="cb10-12"></span>
<span id="cb10-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a seaborn scatter plot with jitter, more professional color palette, and transparency</span></span>
<span id="cb10-14">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb10-15">sns.scatterplot(x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Columns'</span>, y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'values'</span>, hue<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Metric'</span>, style<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Metric'</span>,</span>
<span id="cb10-16">        data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>df_long, s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, palette<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'deep'</span>)</span>
<span id="cb10-17"></span>
<span id="cb10-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Title and labels</span></span>
<span id="cb10-19">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Metrics by Columns'</span>, fontsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>)</span>
<span id="cb10-20">plt.xticks(rotation<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span>) </span>
<span id="cb10-21">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Columns'</span>, fontsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb10-22">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Value'</span>, fontsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb10-23"></span>
<span id="cb10-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Display legend outside the plot for better readability</span></span>
<span id="cb10-25">plt.legend(title<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Metric'</span>, loc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'upper right'</span>, fancybox<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, framealpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb10-26"></span>
<span id="cb10-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Show the plot</span></span>
<span id="cb10-28">plt.tight_layout()</span>
<span id="cb10-29">plt.show()</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>CSP calculated with decile for breaks!

Scores for 7 groups(s) is : 140.96057955229762</code></pre>
</div>
<div class="cell-output cell-output-display page-columns page-full">
<div class="page-columns page-full">
<figure class="figure page-columns page-full">
<p class="page-columns page-full"><a href="index_files/figure-html/cell-10-output-2.png" class="lightbox page-columns page-full" data-gallery="quarto-lightbox-gallery-13"><img src="https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/index_files/figure-html/cell-10-output-2.png" width="1143" height="472" class="figure-img column-page-right"></a></p>
</figure>
</div>
</div>
</div>
</div>
</section>
<section id="and-drum-rolls-pelase" class="level4 page-columns page-full">
<h4 class="anchored" data-anchor-id="and-drum-rolls-pelase">And Drum Rolls please!!!</h4>
<p>Using just the top 5 variables we get similar or even better overall accuracy. This greatly simplifies the model and makes it clear why some variables add little to the modeling.</p>
<div id="9b922400" class="cell" data-execution_count="11">
<details class="code-fold">
<summary>And finally training and evaluating with drum rolls</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb12-1">logit_model_rev <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sm.MNLogit.from_formula(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>target<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> ~ </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">' + '</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>join(results[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'scores'</span>].loc[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Columns'</span>].values)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>, </span>
<span id="cb12-2">    data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>data_df_train</span>
<span id="cb12-3">).fit_regularized()</span>
<span id="cb12-4"></span>
<span id="cb12-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Predict on test data</span></span>
<span id="cb12-6">challenger_preds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> logit_model_rev.predict(data_df_test).idxmax(axis<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb12-7">y_test <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> data_df_test[target]</span>
<span id="cb12-8"></span>
<span id="cb12-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Evaluate the model</span></span>
<span id="cb12-10">accuracy_new <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> accuracy_score(y_test, challenger_preds)</span>
<span id="cb12-11">report_new <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> classification_report(y_test, challenger_preds)</span>
<span id="cb12-12"></span>
<span id="cb12-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Accuracy:"</span>, accuracy_new)</span>
<span id="cb12-14"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classification Report:"</span>)</span>
<span id="cb12-15"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(report_new)</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Singular matrix E in LSQ subproblem    (Exit mode 5)
            Current function value: nan
            Iterations: 470
            Function evaluations: 1227
            Gradient evaluations: 470
Accuracy: 0.9383886255924171
Classification Report:
              precision    recall  f1-score   support

           0       0.93      0.97      0.95        29
           1       0.93      0.93      0.93        29
           2       0.96      1.00      0.98        45
           3       0.93      0.90      0.92        31
           4       0.93      0.93      0.93        27
           5       0.90      0.90      0.90        21
           6       0.96      0.90      0.93        29

    accuracy                           0.94       211
   macro avg       0.94      0.93      0.93       211
weighted avg       0.94      0.94      0.94       211
</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>/home/jitin/Documents/applications/perceptions/.venv/lib/python3.12/site-packages/statsmodels/base/model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "</code></pre>
</div>
</div>

<div class="no-row-height column-margin column-container"><div class="">
<div class="callout callout-style-simple callout-none no-icon callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-3-contents" aria-controls="callout-3" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">None</span>Summary of retrained model
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-3" class="callout-3-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<div id="956cce05" class="cell" data-execution_count="12">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb15-1">display(logit_model_rev.summary())</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-display">
<table class="simpletable table table-sm table-striped small">
<caption>MNLogit Regression Results</caption>
<tbody>
<tr class="odd">
<th data-quarto-table-cell-role="th">Dep. Variable:</th>
<td>NObeyesdad</td>
<th data-quarto-table-cell-role="th">No. Observations:</th>
<td>1900</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Model:</th>
<td>MNLogit</td>
<th data-quarto-table-cell-role="th">Df Residuals:</th>
<td>1858</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Method:</th>
<td>MLE</td>
<th data-quarto-table-cell-role="th">Df Model:</th>
<td>36</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Date:</th>
<td>Sat, 09 May 2026</td>
<th data-quarto-table-cell-role="th">Pseudo R-squ.:</th>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Time:</th>
<td>00:48:52</td>
<th data-quarto-table-cell-role="th">Log-Likelihood:</th>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">converged:</th>
<td>False</td>
<th data-quarto-table-cell-role="th">LL-Null:</th>
<td>-3691.8</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Covariance Type:</th>
<td>nonrobust</td>
<th data-quarto-table-cell-role="th">LLR p-value:</th>
<td>nan</td>
</tr>
</tbody>
</table>


<table class="simpletable table table-sm table-striped small">
<tbody>
<tr class="odd">
<th data-quarto-table-cell-role="th">NObeyesdad=1</th>
<th data-quarto-table-cell-role="th">coef</th>
<th data-quarto-table-cell-role="th">std err</th>
<th data-quarto-table-cell-role="th">z</th>
<th data-quarto-table-cell-role="th">P&gt;|z|</th>
<th data-quarto-table-cell-role="th">[0.025</th>
<th data-quarto-table-cell-role="th">0.975]</th>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Intercept</th>
<td>58.1248</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">TUE</th>
<td>0.1130</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CH2O</th>
<td>-0.8634</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FAF</th>
<td>0.1425</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Age</th>
<td>0.0579</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Height</th>
<td>-76.5735</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Weight</th>
<td>1.3337</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">NObeyesdad=2</th>
<th data-quarto-table-cell-role="th">coef</th>
<th data-quarto-table-cell-role="th">std err</th>
<th data-quarto-table-cell-role="th">z</th>
<th data-quarto-table-cell-role="th">P&gt;|z|</th>
<th data-quarto-table-cell-role="th">[0.025</th>
<th data-quarto-table-cell-role="th">0.975]</th>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Intercept</th>
<td>328.4616</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">TUE</th>
<td>2.2275</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CH2O</th>
<td>-1.4150</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FAF</th>
<td>-1.3585</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Age</th>
<td>0.1537</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Height</th>
<td>-426.3945</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Weight</th>
<td>5.3584</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">NObeyesdad=3</th>
<th data-quarto-table-cell-role="th">coef</th>
<th data-quarto-table-cell-role="th">std err</th>
<th data-quarto-table-cell-role="th">z</th>
<th data-quarto-table-cell-role="th">P&gt;|z|</th>
<th data-quarto-table-cell-role="th">[0.025</th>
<th data-quarto-table-cell-role="th">0.975]</th>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Intercept</th>
<td>306.6447</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">TUE</th>
<td>-7.8630</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CH2O</th>
<td>-21.0118</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FAF</th>
<td>-11.3624</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Age</th>
<td>2.4017</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Height</th>
<td>-710.3867</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Weight</th>
<td>10.1072</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">NObeyesdad=4</th>
<th data-quarto-table-cell-role="th">coef</th>
<th data-quarto-table-cell-role="th">std err</th>
<th data-quarto-table-cell-role="th">z</th>
<th data-quarto-table-cell-role="th">P&gt;|z|</th>
<th data-quarto-table-cell-role="th">[0.025</th>
<th data-quarto-table-cell-role="th">0.975]</th>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Intercept</th>
<td>352.4249</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">TUE</th>
<td>-9.2469</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CH2O</th>
<td>-20.6780</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FAF</th>
<td>-14.7525</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Age</th>
<td>2.1487</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Height</th>
<td>-758.2318</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Weight</th>
<td>10.5011</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">NObeyesdad=5</th>
<th data-quarto-table-cell-role="th">coef</th>
<th data-quarto-table-cell-role="th">std err</th>
<th data-quarto-table-cell-role="th">z</th>
<th data-quarto-table-cell-role="th">P&gt;|z|</th>
<th data-quarto-table-cell-role="th">[0.025</th>
<th data-quarto-table-cell-role="th">0.975]</th>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Intercept</th>
<td>126.2892</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">TUE</th>
<td>0.5832</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CH2O</th>
<td>-0.8764</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FAF</th>
<td>-0.1920</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Age</th>
<td>0.0719</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Height</th>
<td>-160.2982</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Weight</th>
<td>2.3663</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">NObeyesdad=6</th>
<th data-quarto-table-cell-role="th">coef</th>
<th data-quarto-table-cell-role="th">std err</th>
<th data-quarto-table-cell-role="th">z</th>
<th data-quarto-table-cell-role="th">P&gt;|z|</th>
<th data-quarto-table-cell-role="th">[0.025</th>
<th data-quarto-table-cell-role="th">0.975]</th>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Intercept</th>
<td>207.3760</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">TUE</th>
<td>1.6561</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">CH2O</th>
<td>-0.6583</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">FAF</th>
<td>-0.1243</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Age</th>
<td>0.1042</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">Height</th>
<td>-266.6050</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Weight</th>
<td>3.6160</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div></div></section>
</section>
<section id="impact-on-model-accuracy" class="level3">
<h3 class="anchored" data-anchor-id="impact-on-model-accuracy">Impact on Model Accuracy</h3>
<p>After applying feature selection based on CrossTab Sparsity, we’ll compare the accuracy of our new models against our baseline models. This comparison will reveal how effectively CrossTab Sparsity enhances classification performance.</p>
<section id="results-and-discussion-unveiling-insights" class="level4">
<h4 class="anchored" data-anchor-id="results-and-discussion-unveiling-insights">Results and Discussion: Unveiling Insights</h4>
<p><strong>Model Comparison Table</strong></p>
<p>After implementing CrossTab Sparsity in our feature selection process, let’s take a look at the results:</p>
<div id="96236f1e" class="cell" data-execution_count="13">
<details class="code-fold">
<summary>Comparison Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb16-1">metrics <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb16-2">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Metric"</span>: [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Accuracy"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Precision"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Recall"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"F1-Score"</span>],</span>
<span id="cb16-3">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Baseline Model with all Parameters"</span>: [</span>
<span id="cb16-4">        accuracy_score(y_test, base_preds),</span>
<span id="cb16-5">        precision_score(y_test, base_preds, average<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'weighted'</span>),</span>
<span id="cb16-6">        recall_score(y_test, base_preds, average<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'weighted'</span>),</span>
<span id="cb16-7">        f1_score(y_test, base_preds, average<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'weighted'</span>),</span>
<span id="cb16-8">    ],</span>
<span id="cb16-9">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Challenger Model with only 5 Variables"</span>: [</span>
<span id="cb16-10">        accuracy_score(y_test, challenger_preds),</span>
<span id="cb16-11">        precision_score(y_test, challenger_preds, average<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'weighted'</span>),</span>
<span id="cb16-12">        recall_score(y_test, challenger_preds, average<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'weighted'</span>),</span>
<span id="cb16-13">        f1_score(y_test, challenger_preds, average<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'weighted'</span>),</span>
<span id="cb16-14">    ]</span>
<span id="cb16-15">}</span>
<span id="cb16-16">display(pd.DataFrame(metrics).<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>).set_index(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Metric'</span>).T)</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-display">
<div>


<table class="dataframe table table-sm table-striped small" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">Metric</th>
<th data-quarto-table-cell-role="th">Accuracy</th>
<th data-quarto-table-cell-role="th">Precision</th>
<th data-quarto-table-cell-role="th">Recall</th>
<th data-quarto-table-cell-role="th">F1-Score</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<th data-quarto-table-cell-role="th">Baseline Model with all Parameters</th>
<td>0.9100</td>
<td>0.9123</td>
<td>0.9100</td>
<td>0.9103</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">Challenger Model with only 5 Variables</th>
<td>0.9384</td>
<td>0.9384</td>
<td>0.9384</td>
<td>0.9381</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
<p><strong>Insights Gained</strong></p>
<p>Through this analysis, several key insights emerge:</p>
<div id="379d63a1" class="cell" data-execution_count="14">
<div class="cell-output cell-output-stdout">
<pre><code>Reduction of similar accuracy from 16 to 5 i.e 68.75% reduction</code></pre>
</div>
</div>
<ol type="1">
<li><strong>Feature Interactions Matter</strong>: The selected features based on CrossTab Sparsity significantly improved model accuracy—like finding out which ingredients make your favorite dish even better!</li>
<li><strong>Simplicity is Key</strong>: By focusing on relevant features, we enhance accuracy while simplifying model interpretation—because nobody likes unnecessary complexity.</li>
<li><strong>Real-World Applications</strong>: These findings have practical implications in fields such as healthcare, where classification plays a critical role in supporting better decisions.</li>
</ol>
</section>
</section>
<section id="conclusion-the-road-ahead" class="level3">
<h3 class="anchored" data-anchor-id="conclusion-the-road-ahead">Conclusion: The Road Ahead</h3>
<p>In conclusion, this blog has illustrated how CrossTab Sparsity can be a game-changer in classification tasks using the Obesity dataset. By leveraging this metric for feature selection, we achieved notable improvements in model performance—proof that sometimes less really is more!</p>
<p><strong>Future Work: Expanding Horizons</strong></p>
<p>As we look ahead, there are exciting avenues to explore:</p>
<ul>
<li>Investigating regression problems using CrossTab Sparsity.</li>
<li>Comparing its effectiveness with other feature selection methods such as Recursive Feature Elimination (RFE).</li>
</ul>
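<p>The regression extension mentioned above might work by quantile-binning the continuous target as well, then reusing the same crosstab idea. This is a speculative sketch under that assumption; the function name <code>sparsity_score_regression</code> and the empty-cell scoring rule are hypothetical, not an established method.</p>

```python
import numpy as np
import pandas as pd

def sparsity_score_regression(feature, target, bins=10):
    # For a continuous target, bin both feature and target into
    # quantiles, then score the crosstab as in the classification case.
    x = pd.qcut(pd.Series(feature), q=bins, duplicates="drop")
    y = pd.qcut(pd.Series(target), q=bins, duplicates="drop")
    ct = pd.crosstab(x, y)
    return (ct.values == 0).mean()

rng = np.random.default_rng(1)
t = rng.normal(size=500)
related = sparsity_score_regression(t + 0.05 * rng.normal(size=500), t)
unrelated = sparsity_score_regression(rng.normal(size=500), t)
print(related > unrelated)
```

<p>A feature that tracks the target produces a near-diagonal crosstab (mostly empty off-diagonal cells), while an unrelated feature fills the table roughly uniformly.</p>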
<p>By continuing this journey into data science, we not only enhance our technical skills but also contribute valuable insights that can drive meaningful change in various industries.</p>


</section>


<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>classification</category>
  <category>technical</category>
  <guid>https://jitinkapila.com/writing/engineering/04_crosstab_sparsity_classification/</guid>
  <pubDate>Mon, 02 Jan 2023 18:30:00 GMT</pubDate>
</item>
<item>
  <title>CrossTab Sparsity</title>
  <dc:creator>Jitin Kapila</dc:creator>
  <link>https://jitinkapila.com/writing/engineering/03_crosstab_sparsity/</link>
  <description><![CDATA[ 





<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="img/pexels-n-voitkevich.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="Photo by Nataliya Vaitkevich"><img src="https://jitinkapila.com/writing/engineering/03_crosstab_sparsity/img/pexels-n-voitkevich.jpg" class="img-fluid figure-img" style="width: 600px; overflow: hidden; position: relative;" alt="Photo by Nataliya Vaitkevich"></a></p>
<figcaption>Photo by <a href="https://www.pexels.com/photo/empty-crossroads-in-hills-5712829/">Nataliya Vaitkevich</a></figcaption>
</figure>
</div>
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Cluster analysis has always fascinated me as a window into the hidden structures of data. During my collaboration with Kumarjit Pathak, we grappled with a persistent challenge in unsupervised learning: <strong>how to objectively evaluate clustering quality across different algorithms</strong>. Traditional metrics like the Silhouette Index or Bayesian Information Criterion felt restrictive—they were siloed within specific methodologies, making cross-algorithm comparisons unreliable.</p>
<p>This frustration led us to develop a <strong>universal cluster evaluation metric</strong>, detailed in our paper <em>“Cross Comparison of Results from Different Clustering Approaches”</em>. Our goal was to create a framework that transcends algorithmic biases, enabling:<br>
- Direct comparison of K-Means vs GMM vs DBSCAN vs PAM vs SOM vs Anything results<br>
- Identification of variables muddying cluster separation<br>
- Automated determination of optimal cluster counts</p>
<p>In this blog, I’ll walk you through our journey—from conceptualization to real-world validation—and share insights that didn’t make it into the final paper.</p>
</section>
<section id="the-birth-of-the-metric-a-first-person-perspective" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="the-birth-of-the-metric-a-first-person-perspective">The Birth of the Metric: A First-Person Perspective</h2>
<p><strong><em>Why Existing Methods Fell Short</em></strong><br>
Early in our research, we cataloged limitations of popular evaluation techniques:</p>
<ol type="1">
<li>Method Dependency<br>
</li>
</ol>
<ul>
<li>Silhouette scores worked beautifully for K-Means but faltered with Gaussian Mixture Models (GMM).<br>
</li>
<li>Probability-based metrics like BIC couldn’t handle distance-based clusters.</li>
</ul>
<ol start="2" type="1">
<li><p>Noise Blindness<br>
Noisy variables often contaminated clusters, but traditional methods required manual outlier detection.</p></li>
<li><p>Subjective Optimization<br>
Elbow plots and dendrograms left too much room for human interpretation.</p></li>
</ol>
<section id="our-aha-moment---crosstab-sparsity" class="level3 page-columns page-full">
<h3 class="anchored" data-anchor-id="our-aha-moment---crosstab-sparsity">Our “Aha!” Moment - Crosstab Sparsity</h3>
<div class="page-columns page-full">
<div class="quarto-figure quarto-figure-center page-columns page-full">
<figure class="figure page-columns page-full">
<p class="page-columns page-full"><a href="img/Picture 1.png" class="lightbox page-columns page-full" data-gallery="quarto-lightbox-gallery-2" title="Best Cluster for K-means Using Crosstab sparsity"><img src="https://jitinkapila.com/writing/engineering/03_crosstab_sparsity/img/Picture 1.png" class="img-fluid figure-img column-page-right" alt="Best Cluster for K-means Using Crosstab sparsity"></a></p>
<figcaption>Best Cluster for K-means Using Crosstab sparsity</figcaption>
</figure>
</div>
</div>
<p>While analyzing cross-tab matrices of variable distributions across clusters, we noticed a pattern: <strong>well-segregated clusters consistently showed higher frequencies along matrix diagonals</strong>. This inspired our two-part metric:</p>

<div class="no-row-height column-margin column-container"><div class="">
<ol type="1">
<li><p><strong>Segregation Factor</strong>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simplified calculation from our codebase  </span></span>
<span id="cb1-2">median <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.median(cross_tab)  </span>
<span id="cb1-3">N_vk <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(cross_tab <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> median)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Count "well-segregated" instances  </span></span></code></pre></div></div></li>
<li><p><strong>Explanation Factor</strong>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb2-1">explanation <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.log(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(data) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (bins <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> clusters))  </span></code></pre></div></div></li>
</ol>
</div></div><ol type="1">
<li><strong>Segregation Factor</strong>: Measures how distinctly clusters separate data points. We used the median (not mean) to avoid skew from outlier-dominated matrices.</li>
<li><strong>Explanation Factor</strong>: Quantifies how well clusters capture data variability. The logarithmic term penalizes overfitting—a critical insight from debugging early over-segmented clusters.</li>
</ol>
<p><strong>And the Final Formula</strong>:<br>
For variable <img src="https://latex.codecogs.com/png.latex?v"> with <img src="https://latex.codecogs.com/png.latex?k"> clusters:<br>
<img src="https://latex.codecogs.com/png.latex?%0AS_v%5Ek%20=%20%5Cunderbrace%7B%5Cfrac%7BN_v%5Ek%7D%7B%5Cmax(l,%20k)%7D%7D_%7B%5Ctext%7BSegregation%7D%7D%20%5Ctimes%20%5Cunderbrace%7B%5Cln%5Cleft(%5Cfrac%7BN_d%7D%7Bl%20%5Ctimes%20k%7D%5Cright)%7D_%7B%5Ctext%7BExplanation%7D%7D%0A"></p>
<p>where:<br>
- <img src="https://latex.codecogs.com/png.latex?N_v%5Ek">: Segregated instances (values above cross-tab matrix median)<br>
- <img src="https://latex.codecogs.com/png.latex?l">: Number of value intervals for variable <img src="https://latex.codecogs.com/png.latex?v"><br>
- <img src="https://latex.codecogs.com/png.latex?N_d">: Total observations</p>
<p>This formulation ensures <strong>algorithmic invariance</strong>, allowing comparison across methods like K-Means (distance-based) and GMM (probability-based). You can also see two scenarios directly from the formula: 1. If a variable’s crosstab is too dense, there is no separation between classes. 2. If it is too sparse, we lose explanatory power.</p>
<p>Hence the curve reaches a maximum and then falls, giving us the separability that the clustering can produce.</p>
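<p>A minimal sketch of the full score, assuming quantile binning via pandas. The helper name is mine, and it mirrors the formula rather than our production code:</p>

```python
import numpy as np
import pandas as pd

def crosstab_sparsity_score(values, labels, bins=10):
    """S_v^k for one variable: segregation x explanation.

    Illustrative helper mirroring the formula above; the name and the
    binning choice (pandas qcut) are mine, not the paper's code.
    """
    # l value intervals for the variable (quantile-based)
    binned = pd.qcut(values, q=bins, duplicates="drop")
    ct = pd.crosstab(binned, labels).to_numpy()
    l, k = ct.shape
    # Segregation factor: cells above the crosstab median, normalized
    n_vk = np.sum(ct > np.median(ct))
    segregation = n_vk / max(l, k)
    # Explanation factor: log penalty against over-segmentation
    explanation = np.log(len(values) / (l * k))
    return segregation * explanation

# Two well-separated blobs give a clearly positive score
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(0, 1, 100), rng.normal(8, 1, 100)])
labels = np.array([0] * 100 + [1] * 100)
print(crosstab_sparsity_score(values, labels))
```

<p>Sweeping this score over candidate cluster counts traces exactly the rise-then-fall curve described above.</p>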
</section>
</section>
<section id="case-study-vehicle-silhouettes-through-my-eyes" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="case-study-vehicle-silhouettes-through-my-eyes">Case Study: Vehicle Silhouettes (Through My Eyes)</h2>
<section id="the-dataset-that-almost-broke-us" class="level4 page-columns page-full">
<h4 class="anchored" data-anchor-id="the-dataset-that-almost-broke-us">The Dataset That Almost Broke Us</h4>
<p>We tested our metric on a vehicle silhouette dataset with 18 shape-related features (e.g., compactness, circularity). Initially, inconsistent results plagued us—until we realized our binning strategy for continuous variables was flawed.</p>
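<p>The difference between the two binning strategies is easy to demonstrate on skewed data (a toy pandas sketch, not the original preprocessing):</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# A right-skewed variable, like many of the shape features
skewed = pd.Series(rng.exponential(scale=2.0, size=1000))

# Equal-width bins: most observations pile into the first intervals
equal_width = pd.cut(skewed, bins=10)
# Quantile bins: roughly 100 observations per interval, even under skew
quantile = pd.qcut(skewed, q=10)

print(equal_width.value_counts().max())
print(quantile.value_counts().max())
```

<p>On skewed variables the equal-width crosstab is dominated by a few overloaded cells, which is exactly what distorted our early scores.</p>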

<div class="no-row-height column-margin column-container"><div class="">
<p><strong>Key Adjustments</strong>:<br>
- Switched from equal-width to <strong>quantile-based binning</strong> (10 bins per variable).<br>
- For categorical variables, retained native levels instead of coercing bins.</p>
</div></div></section>
<section id="the-breakthrough" class="level4 page-columns page-full">
<h4 class="anchored" data-anchor-id="the-breakthrough">The Breakthrough</h4>
<p>After refining the preprocessing:</p>
<blockquote class="blockquote">
<p>Optimal Clusters: Our metric plateaued at <img src="https://latex.codecogs.com/png.latex?k=6"> , aligning perfectly with known vehicle categories (sedans, trucks, etc.).<br>
Noise Detection: Variables like <em>Max.LWR</em> (length-width ratio) scored poorly, revealing inconsistent clustering. We later found this was due to manufacturers’ design variances.</p>
</blockquote>
<p>Finding the best cluster count for K-Means alone:</p>
<div class="page-columns page-full">
<div class="quarto-figure quarto-figure-center page-columns page-full">
<figure class="figure page-columns page-full">
<p class="page-columns page-full"><a href="img/Kmeans on Vehicle.png" class="lightbox page-columns page-full" data-gallery="quarto-lightbox-gallery-3" title="Best Cluster for K-Means Using Crosstab sparsity"><img src="https://jitinkapila.com/writing/engineering/03_crosstab_sparsity/img/Kmeans on Vehicle.png" class="img-fluid figure-img column-page-right" alt="Best Cluster for K-Means Using Crosstab sparsity"></a></p>
<figcaption>Best Cluster for K-Means Using Crosstab sparsity</figcaption>
</figure>
</div>
</div>
<p>Comparing all clustering methods to find the optimal one:</p>
<div class="page-columns page-full">
<div class="quarto-figure quarto-figure-center page-columns page-full">
<figure class="figure page-columns page-full">
<p class="page-columns page-full"><a href="img/Comparision across many methods.png" class="lightbox page-columns page-full" data-gallery="quarto-lightbox-gallery-4" title="Optimal Cluster for many methods"><img src="https://jitinkapila.com/writing/engineering/03_crosstab_sparsity/img/Comparision across many methods.png" class="img-fluid figure-img column-page-right" alt="Optimal Cluster for many methods"></a></p>
<figcaption>Optimal Cluster for many methods</figcaption>
</figure>
</div>
</div>
<p><strong>The chunkiest part</strong>: understanding each variable’s contribution to separateness. This gives direct insight into which variable in your data is the most critical separator.</p>
<div class="page-columns page-full">
<div class="quarto-figure quarto-figure-center page-columns page-full">
<figure class="figure page-columns page-full">
<p class="page-columns page-full"><a href="img/Variable segregation.png" class="lightbox page-columns page-full" data-gallery="quarto-lightbox-gallery-5" title="All kind of variable scored against Metrics"><img src="https://jitinkapila.com/writing/engineering/03_crosstab_sparsity/img/Variable segregation.png" class="img-fluid figure-img column-page-right" alt="All kind of variable scored against Metrics"></a></p>
<figcaption>All kind of variable scored against Metrics</figcaption>
</figure>
</div>
</div>
</section>
</section>
<section id="comparative-advantages-and-creativity-at-work" class="level2">
<h2 class="anchored" data-anchor-id="comparative-advantages-and-creativity-at-work">Comparative Advantages and Creativity at Work</h2>
<p><strong><em>Comparative Advantage Over Traditional Metrics</em></strong></p>
<table class="table">
<colgroup>
<col style="width: 24%">
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
</colgroup>
<thead>
<tr class="header">
<th><strong>Feature/Scenario</strong></th>
<th><strong>Silhouette Index</strong></th>
<th><strong>Davies-Bouldin</strong></th>
<th><strong>Crosstab Sparsity</strong></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Algorithm Agnostic</td>
<td>❌ (Distance-based only)</td>
<td>❌</td>
<td>✔️</td>
</tr>
<tr class="even">
<td>Handles Mixed Data</td>
<td>❌</td>
<td>❌</td>
<td>✔️</td>
</tr>
<tr class="odd">
<td>Identifies Noisy Vars</td>
<td>❌</td>
<td>❌</td>
<td>✔️</td>
</tr>
<tr class="even">
<td>Optimal Cluster Detection</td>
<td>Manual elbow analysis</td>
<td>Manual analysis</td>
<td>Automated plateau detection</td>
</tr>
<tr class="odd">
<td>Mixed Algorithms</td>
<td>Failed (GMM vs K-Means)</td>
<td>Failed (needs numerical data)</td>
<td>Achieved 92% consistency[1]</td>
</tr>
<tr class="even">
<td>Noisy Variables</td>
<td>Manual outlier removal</td>
<td>Manual outlier removal</td>
<td>Auto-detected (e.g., Max.LWR)</td>
</tr>
<tr class="odd">
<td>Optimal Cluster Detection</td>
<td>Subjective elbow plots</td>
<td>Subjective elbow/dendrogram analysis</td>
<td>Objective plateau detection</td>
</tr>
</tbody>
</table>
<p><br></p>
<p>Our creativity yielded boons. We wanted a simple metric to judge different kinds of clusters, but we got much more from our experiments and work on this metric:</p>
<ol type="1">
<li>Variable-Level Diagnostics: Low <img src="https://latex.codecogs.com/png.latex?S_v%5Ek"> scores pinpoint variables muddying cluster separation.<br>
</li>
<li>Cross-Method Benchmarking: Compare K-Means (distance) vs GMM (probability) vs hierarchical vs partition clustering fairly using a unified score.<br>
</li>
<li>Scale Invariance: Logarithmic term makes scores comparable across datasets of varying sizes.<br>
</li>
<li>Debug Cluster Quality: Identify and remove noisy variables preemptively<br>
</li>
<li>Automate Model Selection: Objectively choose between K-Means, GMM, PAM, Agglomerative.</li>
</ol>
</section>
<section id="lessons-learned-and-future-vision" class="level2">
<h2 class="anchored" data-anchor-id="lessons-learned-and-future-vision">Lessons Learned and Future Vision</h2>
<p><strong>A few takeaways from these experiments</strong><br>
1. Binning Sensitivity: Quantile-based binning was transformative. Equal-width bins distorted scores for skewed variables.<br>
2. Categorical Handling: Native levels for categorical outperformed frequency-based grouping.<br>
3. Non-Parametric Approach: This approach allowed us to make sense of data without being tied down by assumptions. We have seen how this metric can be a game-changer for statisticians, providing insights not just into cluster behavior but also into rare event modeling.</p>
<p>The plots from these experiments not only clarify how clusters behave but also offer valuable insights for identifying outliers. I believe there’s exciting potential to extend this metric into classification and value estimation modeling. Imagine using it as a loss function in both linear and non-linear methods to achieve better data segmentation! A topic for another blog someday!</p>
<section id="a-personal-reflection" class="level4">
<h4 class="anchored" data-anchor-id="a-personal-reflection">A Personal Reflection</h4>
<p>Developing this metric taught me that <strong>simplicity often masks depth</strong>. A two-component formula now underpins clustering decisions in industries we never imagined—from fraud detection to genomics. Yet, I’m most proud of how it democratizes cluster analysis: business analysts at our partner firms now optimize clusters without PhD-level stats.</p>
<p>You can find a Python implementation <a href="https://gist.github.com/jkapila/83bb8f6461ec91bfced437762f2c9220">here</a>.</p>
<p><em>This blog synthesizes findings from our original paper, available <a href="https://arxiv.org/abs/1810.03419">here</a>. For a deeper dive into the math, check Section 3 of the paper.</em></p>
<p><strong>To my readers</strong>: Have you tried implementing cross-algorithm clustering? Share your war stories in the comments—I’d love to troubleshoot together!</p>


</section>
</section>

<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>clustering</category>
  <category>technical</category>
  <guid>https://jitinkapila.com/writing/engineering/03_crosstab_sparsity/</guid>
  <pubDate>Mon, 02 May 2022 18:30:00 GMT</pubDate>
</item>
<item>
  <title>A flow to Test Your Hypothesis in Python</title>
  <dc:creator>Jitin Kapila</dc:creator>
  <link>https://jitinkapila.com/writing/engineering/02_hypothesis_test/</link>
  <description><![CDATA[ 





<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="pexels-tara-winstead-7722866-thumb.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="Hypothesis testing Photo by Tara Winstead"><img src="https://jitinkapila.com/writing/engineering/02_hypothesis_test/pexels-tara-winstead-7722866-thumb.jpg" class="w-100 img-fluid figure-img" alt="Hypothesis testing Photo by Tara Winstead"></a></p>
<figcaption>Hypothesis testing <a href="https://www.pexels.com/photo/text-7722866/">Photo by Tara Winstead</a></figcaption>
</figure>
</div>
<section id="overview" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="overview">Overview</h2>
<p>All practitioners of data science eventually hit one giant task with data, and you know it well: <em>EDA - Exploratory Data Analysis</em>. The term <em>EDA</em><sup>1</sup> was coined by Tukey himself in his seminal book published in 1983. But do you think that <em>EDA</em> didn’t exist before that?</p>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;Emerson, J. D., &amp; Hoaglin, D. C. (1983). Stem-and-leaf displays. In D. C. Hoaglin, F. Mosteller, &amp; J. W. Tukey (Eds.) Understanding Robust and Exploratory Data Analysis, pp.&nbsp;7–32. New York: Wiley. <a href="https://www.wiley.com/en-in/Understanding+Robust+and+Exploratory+Data+Analysis-p-9780471384915">Book is here.</a></p></div></div><p>Glad you thought about it. Before that, everyone was doing what is called <em><em>Hypothesis Testing</em></em>. Yes, back then the race was mostly to fit the data and produce the most unbiased and robust estimates. But remember one thing: <em>Hypothesis Testing</em> was, and largely remains, tied to <em>RCTs (Randomized Controlled Trials)</em>, a.k.a. Randomized Clinical Trials, the <em>Gold Standard</em> of data.</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>More on RCTs and ODs
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-1" class="callout-1-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>Let me not hijack the discussion into what <em>RCTs</em> and <em>Observational Data (ODs)</em> are, as that is more a question of <em>Philosophical Reasoning</em> than of data quality. Essentially, what we are trying to find is whether we can, using stats, identify <em>interesting patterns</em> in data.</p>
<p>The one thing that happens with RCT data is that we tend to believe these interesting patterns coincide with some sort of <em>‘Cause-Effect’</em> relationship. Due to the biased nature of ODs, we certainly can’t conclude this, and hence can only find <em>interesting</em> patterns.</p>
</div>
</div>
</div>
<p>Let’s move on. The big question is: for whatever reason you are doing <em>HT</em>, you are doing it to find <em>something interesting</em>. And that something interesting is usually found using <em><em>Post-Hoc Tests</em></em>. There is a variety of <em>Post-Hocs</em> available, but the best known, and hence the most readily available implementation, is <em>Tukey’s HSD</em>.</p>
<p>So let’s jump directly to how to follow this procedure. We’ll be using <code>bioinfokit</code> for this, as it is a much simpler wrapper around what’s implemented in <code>statsmodels</code>.</p>
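<p>Before the wrapper, it helps to see what the variance-homogeneity checks look like on their own. The helper below is a hypothetical standalone version (the function name and return shape are mine) using <code>scipy.stats</code> directly:</p>

```python
import pandas as pd
from scipy import stats

def check_variance_assumptions(df, res_var, group_var):
    """Levene's and Bartlett's tests for equal group variances.

    Hypothetical standalone version of the checks the wrapper prints;
    the function name and return shape are mine.
    """
    groups = [g[res_var].to_numpy() for _, g in df.groupby(group_var)]
    return {
        "levene": stats.levene(*groups),      # robust to non-normality
        "bartlett": stats.bartlett(*groups),  # assumes normal groups
    }
```

<p>A small p-value in either test warns that the equal-variance assumption behind classical ANOVA is shaky, which is exactly what the printed results later in this post show for the mpg data.</p>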
</section>
<section id="what-are-the-results" class="level2">
<h2 class="anchored" data-anchor-id="what-are-the-results">What are the results</h2>
<p>Pheww… That’s a lot of code, right? But it will save you a lot of time in real life, where you would write the code in the 3 steps below:</p>
<div id="7964598c" class="cell" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># import libraries</span></span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Getting car data from UCI</span></span>
<span id="cb1-5">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'</span>,</span>
<span id="cb1-6">                 sep<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">s+'</span>,header<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>,</span>
<span id="cb1-7">                 names<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mpg'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'cylinders'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'displacement'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'horsepower'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'weight'</span>,</span>
<span id="cb1-8">                 <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'acceleration'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'model_year'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'origin'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'car_name'</span>])</span>
<span id="cb1-9">df.head()</span>
<span id="cb1-10"></span>
<span id="cb1-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Syntax to do anove with validating the assumption, doing test and a post-hoc</span></span>
<span id="cb1-12">results <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> do_anova_test(df<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>df, res_var<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mpg'</span>,xfac_var<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'cylinders'</span>, </span>
<span id="cb1-13">                        anova_model<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mpg ~ C(cylinders)+C(origin)+C(cylinders):C(origin)'</span>,</span>
<span id="cb1-14">                        ss_typ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, result_full<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div></div>
</details>
</div>
<p>Results from the <code>do_anova_test</code>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource {markdown} number-lines code-with-copy"><code class="sourceCode"><span id="cb2-1">Levens Test Result:</span>
<span id="cb2-2">                 Parameter    Value</span>
<span id="cb2-3">0      Test statistics (W)  14.5856</span>
<span id="cb2-4">1  Degrees of freedom (Df)   4.0000</span>
<span id="cb2-5">2                  p value   0.0000</span>
<span id="cb2-6"></span>
<span id="cb2-7">Bartletts Test Result:</span>
<span id="cb2-8">                 Parameter    Value</span>
<span id="cb2-9">0      Test statistics (T)  61.2143</span>
<span id="cb2-10">1  Degrees of freedom (Df)   4.0000</span>
<span id="cb2-11">2                  p value   0.0000</span>
<span id="cb2-12"></span>
<span id="cb2-13">ANOVA\ANCOVA Test Result:</span>
<span id="cb2-14">                           df     sum_sq    mean_sq         F  PR(&gt;F)      n2</span>
<span id="cb2-15">Intercept                 1.0  6195.1701  6195.1701  296.3452  0.0000  0.2727</span>
<span id="cb2-16">C(cylinders)              4.0  7574.5864  1893.6466   90.5824  0.0000  0.3334</span>
<span id="cb2-17">C(origin)                 2.0   241.0703   120.5351    5.7658  0.0034  0.0106</span>
<span id="cb2-18">C(cylinders):C(origin)    8.0   577.4821    72.1853    3.4530  0.0046  0.0254</span>
<span id="cb2-19">Residual                389.0  8132.1404    20.9052       NaN     NaN     NaN</span>
<span id="cb2-20"></span>
<span id="cb2-21">Tukey HSD Result:</span>
<span id="cb2-22">   group1  group2     Diff    Lower    Upper  q-value  p-value</span>
<span id="cb2-23">0       8       4  14.3237  12.8090  15.8383  36.6527   0.0010</span>
<span id="cb2-24">1       8       6   5.0226   3.1804   6.8648  10.5671   0.0010</span>
<span id="cb2-25">2       8       3   5.5869  -0.7990  11.9728   3.3909   0.1183</span>
<span id="cb2-26">3       8       5  12.4036   5.0643  19.7428   6.5503   0.0010</span>
<span id="cb2-27">4       4       6   9.3011   7.6765  10.9256  22.1910   0.0010</span>
<span id="cb2-28">5       4       3   8.7368   2.4102  15.0633   5.3524   0.0017</span>
<span id="cb2-29">6       4       5   1.9201  -5.3676   9.2078   1.0212   0.9000</span>
<span id="cb2-30">7       6       3   0.5643  -5.8486   6.9772   0.3410   0.9000</span>
<span id="cb2-31">8       6       5   7.3810   0.0182  14.7437   3.8854   0.0491</span>
<span id="cb2-32">9       3       5   6.8167  -2.7539  16.3873   2.7606   0.2919</span></code></pre></div></div>
<p>Nice!!!</p>
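<p>As an aside, if you only need the pairwise post-hoc step, recent SciPy (1.8+) ships <code>scipy.stats.tukey_hsd</code> directly. A minimal sketch on synthetic cylinder groups (the numbers here are made up, not the auto-mpg data):</p>

```python
import numpy as np
from scipy.stats import tukey_hsd  # available in SciPy >= 1.8

rng = np.random.default_rng(0)
# Synthetic mpg samples for three cylinder groups (made-up numbers)
g4 = rng.normal(29, 4, 60)
g6 = rng.normal(20, 3, 50)
g8 = rng.normal(15, 3, 40)

res = tukey_hsd(g4, g6, g8)
# res.pvalue[i, j] holds the adjusted p-value for group i vs group j
print(res.pvalue.round(4))
```

<p>This skips the assumption checks and the ANOVA table, so it complements rather than replaces the full flow above.</p>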
<p><br></p>
<p>And plotting is even easier</p>
<div id="dc6233eb" class="cell" data-execution_count="3">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Numbers are clumsy for most. Making more interpretable plot on above results.</span></span>
<span id="cb3-2">plot_hsd(results.tukeyhsd.sort_values(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Diff'</span>), title<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Tukey HSD results Anova of MPG ~ Cylinder"</span>)</span></code></pre></div></div>
</details>
</div>
<p>Results from the <code>plot_hsd</code>:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="TukeyHSD.png" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="Tukey’s HSD comparison based on Anova Results"><img src="https://jitinkapila.com/writing/engineering/02_hypothesis_test/TukeyHSD.png" class="img-fluid figure-img" alt="Tukey’s HSD comparison based on Anova Results"></a></p>
<figcaption><strong>Tukey’s HSD comparison based on Anova Results</strong></figcaption>
</figure>
</div>
<p>Plots look good with ‘p-values’.</p>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p><span data-bg="lightgreen">Since we applied the above to a <em><em>non-RCT</em></em> dataset, we cannot conclude that the difference in mpg based on cylinders is huge, especially as the number of cylinders goes up. This statement may not be as explicit as the plot makes it appear. Unless you have a strong belief that the data follows the rules and assumptions of RCTs, we should only be seeking <em>interesting</em>, as in <em>associated</em>, results and not <em>cause-effect</em> results.</span></p>
</section>
<section id="give-me-the-code" class="level2">
<h2 class="anchored" data-anchor-id="give-me-the-code">Give me “The Code”</h2>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">Performing Anova</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Plotting Results</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div id="2c5deadc" class="cell" data-execution_count="4">
<details class="code-fold">
<summary>Anova Test <code>anova_test.py</code></summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> bioinfokit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> analys</span>
<span id="cb4-2"></span>
<span id="cb4-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb4-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> scipy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> stats</span>
<span id="cb4-5"></span>
<span id="cb4-6"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> KeyResults:</span>
<span id="cb4-7">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb4-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    A basic class to hold all the results</span></span>
<span id="cb4-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb4-10">    </span>
<span id="cb4-11">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>,result_full):</span>
<span id="cb4-12">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.keys <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb4-13">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.result_full <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> result_full</span>
<span id="cb4-14">    </span>
<span id="cb4-15">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> add_result(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>,name,result):</span>
<span id="cb4-16">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tukeyhsd'</span>:</span>
<span id="cb4-17">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.keys.append(name)</span>
<span id="cb4-18">            <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">setattr</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, name, result)</span>
<span id="cb4-19">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">elif</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.result_full:</span>
<span id="cb4-20">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.keys.append(name)</span>
<span id="cb4-21">            <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">setattr</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, name, result)</span>
<span id="cb4-22"></span>
<span id="cb4-23"></span>
<span id="cb4-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Anova test code</span></span>
<span id="cb4-25"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> do_anova_test(df, res_var, xfac_var, anova_model,ss_typ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,</span>
<span id="cb4-26">                  effectsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'n2'</span>,result_full<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,add_res<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>):</span>
<span id="cb4-27">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb4-28"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Run the sequential ANOVA test workflow</span></span>
<span id="cb4-29"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    </span></span>
<span id="cb4-30"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Step 1) Levene's / Bartlett's test to check whether variance is homogeneous</span></span>
<span id="cb4-31"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Step 2) Main ANOVA/ANCOVA test</span></span>
<span id="cb4-32"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Step 3) Tukey's HSD for individual combinations</span></span>
<span id="cb4-33"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    </span></span>
<span id="cb4-34"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    :param df: Pandas DataFrame holding all the columns</span></span>
<span id="cb4-35"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    :param res_var: Variable for which we are checking ANOVA</span></span>
<span id="cb4-36"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    :param xfac_var: Grouping Variables for which we want to do the comparisons</span></span>
<span id="cb4-37"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    :param anova_model: statsmodels formula for the model. This is a life saver for making everything work</span></span>
<span id="cb4-38"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    :param result_full: Whether to also keep the results of intermediate steps</span></span>
<span id="cb4-39"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    </span></span>
<span id="cb4-40"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb4-41"></span>
<span id="cb4-42">    results <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> KeyResults(result_full)</span>
<span id="cb4-43">    </span>
<span id="cb4-44">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># initialize stat method</span></span>
<span id="cb4-45">    res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> analys.stat()</span>
<span id="cb4-46">    </span>
<span id="cb4-47">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Levene's test</span></span>
<span id="cb4-48">    res.levene(df<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>df, res_var<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>res_var,xfac_var<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>xfac_var)</span>
<span id="cb4-49">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Leven</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\'</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">s Test Result:'</span>)</span>
<span id="cb4-50">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(res.levene_summary)</span>
<span id="cb4-51">    results.add_result(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'levene'</span>,res.levene_summary)</span>
<span id="cb4-52"></span>
<span id="cb4-53">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># doing bartlett test</span></span>
<span id="cb4-54">    res.bartlett(df<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>df, res_var<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>res_var,xfac_var<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>xfac_var)</span>
<span id="cb4-55">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Bartlett</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\'</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">s Test Result:'</span>)</span>
<span id="cb4-56">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(res.bartlett_summary)</span>
<span id="cb4-57">    results.add_result(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'bartlett'</span>,res.bartlett_summary)</span>
<span id="cb4-58">    </span>
<span id="cb4-59">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># doing anova / ancova</span></span>
<span id="cb4-60">    res.anova_stat(df<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>df, res_var<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>res_var, anova_model<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>anova_model,ss_typ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ss_typ)</span>
<span id="cb4-61">    aov_res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> res.anova_summary</span>
<span id="cb4-62">    </span>
<span id="cb4-63">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add effect sizes</span></span>
<span id="cb4-64">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> effectsize <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"n2"</span>:</span>
<span id="cb4-65">        all_effsize <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (aov_res[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sum_sq'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> aov_res[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sum_sq'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>()).to_numpy()</span>
<span id="cb4-66">        all_effsize[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nan</span>
<span id="cb4-67">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb4-68">        ss_resid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> aov_res[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sum_sq'</span>].iloc[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb4-69">        all_effsize <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> aov_res[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sum_sq'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">apply</span>(<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x: x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> ss_resid)).to_numpy()</span>
<span id="cb4-70">        all_effsize[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nan</span>
<span id="cb4-71">    aov_res[effectsize] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> all_effsize</span>
<span id="cb4-72">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#aov_res['bw_'] = res.anova_model_out.params.iloc[-1]</span></span>
<span id="cb4-73">    aov_res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> aov_res.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span>
<span id="cb4-74">    </span>
<span id="cb4-75">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># printing results</span></span>
<span id="cb4-76">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">ANOVA</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">/</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">ANCOVA Test Result:'</span>)</span>
<span id="cb4-77">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(aov_res)</span>
<span id="cb4-78">    results.add_result(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'anova'</span>,res.anova_summary.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb4-79">    results.add_result(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'anova_model'</span>,res.anova_model_out)</span>
<span id="cb4-80">    </span>
<span id="cb4-81">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Tukey's HSD to compare the groups</span></span>
<span id="cb4-82">    res.tukey_hsd(df<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>df, res_var<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>res_var,xfac_var<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>xfac_var, anova_model<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>anova_model,ss_typ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ss_typ)</span>
<span id="cb4-83">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Tukey HSD Result:'</span>)</span>
<span id="cb4-84">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(res.tukey_summary.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb4-85">    results.add_result(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tukeyhsd'</span>,res.tukey_summary.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb4-86">    </span>
<span id="cb4-87">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># add the full result object again if requested</span></span>
<span id="cb4-88">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> add_res:</span>
<span id="cb4-89">        results.add_result(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'allresult'</span>,res)</span>
<span id="cb4-90">    </span>
<span id="cb4-91">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> results</span></code></pre></div></div>
</details>
</div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div id="72816376" class="cell" data-execution_count="5">
<details class="code-fold">
<summary>Plotting results <code>plot_hsd.py</code></summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> seaborn <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> sns</span>
<span id="cb5-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb5-3"></span>
<span id="cb5-4">plt.style.use(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'seaborn-v0_8-bright'</span>)</span>
<span id="cb5-5"></span>
<span id="cb5-6"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> plot_hsd(hsdres,p_cutoff<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>,title<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>,ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>,figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>)):</span>
<span id="cb5-7">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb5-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Plot the Tukey HSD results with significance annotations</span></span>
<span id="cb5-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    </span></span>
<span id="cb5-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  </span></span>
<span id="cb5-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    :param hsdres: 'tukeyhsd' result from the do_anova_test function</span></span>
<span id="cb5-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    :param p_cutoff: Cutoff below which a combination is considered significant</span></span>
<span id="cb5-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    :param title: Title of the plot</span></span>
<span id="cb5-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    :param ax: Define or get the matplotlib axes</span></span>
<span id="cb5-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    :param figsize: Mention Figure size to draw</span></span>
<span id="cb5-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    </span></span>
<span id="cb5-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb5-18"></span>
<span id="cb5-19">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> ax <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">is</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb5-20">        fig,axp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.subplots(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>figsize)</span>
<span id="cb5-21">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb5-22">        axp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ax</span>
<span id="cb5-23">    </span>
<span id="cb5-24">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># helper func</span></span>
<span id="cb5-25">    p_ind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x : <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">''</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'+'</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'*'</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'**'</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.001</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'***'</span>)))</span>
<span id="cb5-26">    label_gen  <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x: <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"$</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> - </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> |</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> p:</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:0.2f}{</span>p_ind(x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>])<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:5s}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">$"</span></span>
<span id="cb5-27">    </span>
<span id="cb5-28">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#setting values</span></span>
<span id="cb5-29">    mask <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> hsdres[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'p-value'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> p_cutoff</span>
<span id="cb5-30">    yticklabs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> hsdres[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'group1'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'group2'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'p-value'</span>]].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">apply</span>(label_gen,axis<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>).values</span>
<span id="cb5-31">    ys <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.arange(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(hsdres))</span>
<span id="cb5-32">    </span>
<span id="cb5-33">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># adding plot to axes</span></span>
<span id="cb5-34">    axp.errorbar(hsdres[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>mask][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Diff'</span>],ys[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>mask],xerr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(hsdres[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>mask][[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Lower'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Upper"</span>]]).values.T,</span>
<span id="cb5-35">                fmt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'o'</span>, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'black'</span>, ecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'lightgray'</span>, elinewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, capsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb5-36">    axp.errorbar(hsdres[mask][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Diff'</span>],ys[mask],xerr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(hsdres[mask][[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Lower'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Upper"</span>]]).values.T,</span>
<span id="cb5-37">                fmt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'o'</span>, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>, ecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pink'</span>, elinewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, capsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb5-38">    axp.axvline(x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,linestyle<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'--'</span>,c<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'skyblue'</span>)</span>
<span id="cb5-39">    axp.set_yticks([])</span>
<span id="cb5-40">    (l,u) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> axp.get_xlim()</span>
<span id="cb5-41">    axp.set_xlim(l<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>l,u)</span>
<span id="cb5-42">    (l,u) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> axp.get_xlim()</span>
<span id="cb5-43">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> idx,labs <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(yticklabs):</span>
<span id="cb5-44">        axp.text(l<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>l,ys[idx],labs)</span>
<span id="cb5-45">    axp.set_yticklabels([])</span>
<span id="cb5-46">    </span>
<span id="cb5-47">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># finally doing what is needed</span></span>
<span id="cb5-48">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> ax <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">is</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb5-49">        plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">''</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> title <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">is</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> title,fontsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>)</span>
<span id="cb5-50">        plt.show()</span>
<span id="cb5-51">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb5-52">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> axp</span></code></pre></div></div>
</details>
</div>
</div>
</div>
</div>
<p>Hope this gives you a kickstart to find interesting patterns. Happy Learning!</p>


</section>


 ]]></description>
  <category>technical</category>
  <guid>https://jitinkapila.com/writing/engineering/02_hypothesis_test/</guid>
  <pubDate>Mon, 09 Aug 2021 18:30:00 GMT</pubDate>
</item>
<item>
  <title>Adaptive Regression</title>
  <dc:creator>Jitin Kapila</dc:creator>
  <link>https://jitinkapila.com/writing/engineering/01_adaptive_regression/</link>
  <description><![CDATA[ 





<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="img/pexels-zlfdmr23-20692065-thumb.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="Adapting path through mountains! Photo by Zülfü Demir📸"><img src="https://jitinkapila.com/writing/engineering/01_adaptive_regression/img/pexels-zlfdmr23-20692065-thumb.jpg" class="w-100 img-fluid figure-img" alt="Adapting path through mountains! Photo by Zülfü Demir📸"></a></p>
<figcaption>Adapting path through mountains! <a href="https://www.pexels.com/photo/view-of-empty-railway-and-a-hill-in-distance-20692065/">Photo by Zülfü Demir📸</a></figcaption>
</figure>
</div>
<section id="introduction" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Here I will try to lay out our logic for finding such observations. Let’s dive in.</p>
<p>There are different value estimation techniques, such as regression analysis and time-series analysis. Most of us have experimented with regression using OLS, MLE, Ridge, LASSO, Robust, etc., and have evaluated the results using RMSE (Root Mean/Median Square Error), MAD (Mean/Median Absolute Deviation), MAE (Mean/Median Absolute Error), MAPE (Mean/Median Absolute Percentage Error), and so on.</p>
<p>But all of these give a single point estimate of what the overall error looks like. Here is a different thought: can we really trust this single value of MAPE or MAE? How easy is it to infer that our trained model has fitted well across the whole distribution of the dependent variable?</p>

<div class="no-row-height column-margin column-container"><div class="">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="img/Anscombe_Data.png" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="Plot of Anscombe’s Quartet"><img src="https://jitinkapila.com/writing/engineering/01_adaptive_regression/img/Anscombe_Data.png" class="img-fluid figure-img" alt="Plot of Anscombe’s Quartet"></a></p>
<figcaption>Plot of Anscombe’s Quartet</figcaption>
</figure>
</div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="img/Anscombe_Stats.png" class="lightbox" data-gallery="quarto-lightbox-gallery-3" title="Some Descriptive Stats for Anscombe’s Quartet"><img src="https://jitinkapila.com/writing/engineering/01_adaptive_regression/img/Anscombe_Stats.png" class="img-fluid figure-img" alt="Some Descriptive Stats for Anscombe’s Quartet"></a></p>
<figcaption>Some Descriptive Stats for Anscombe’s Quartet</figcaption>
</figure>
</div>
</div></div><p>Let me give you a pretty small data-set to play with: “Anscombe’s quartet”, a famous data-set constructed by Francis Anscombe. Please refer to the plots below to understand the distributions of y1, y2, y3, and y4. Aren’t they different?</p>
<p>Would the measures of central tendency and dispersion be the same for this data? I am sure none of us would believe so, but to our utter surprise all the descriptive stats are nearly identical. Don’t believe me? Please see the results below (source: Wikipedia):</p>
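<p>If you want to check this yourself, here is a small sketch using the published quartet values (illustrative code, not part of the original analysis):</p>

```python
import numpy as np

# y-series of Anscombe's quartet (published values)
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

for name, y in [("y1", y1), ("y2", y2), ("y3", y3), ("y4", y4)]:
    # all four series agree to two decimals: mean ~ 7.50, sample variance ~ 4.12
    print(name, round(float(np.mean(y)), 2), round(float(np.var(y, ddof=1)), 2))
```

<p>All four series print essentially the same mean and sample variance, even though their plots look nothing alike.</p>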
</section>
<section id="so-what-we-do-now" class="level2">
<h2 class="anchored" data-anchor-id="so-what-we-do-now">So What Do We Do Now?</h2>
<p>Astonished? Don’t be. This is what has been hiding behind those numbers, and this is why we often cannot cross a certain performance level: short of changing some features or doing a lot of hyperparameter tuning, the results won’t vary much.</p>
<p>If you look at the average MAPE value in each decile, you will see an interesting pattern. Let me show you that pattern. One day, while working on a business problem that used regression, a discussion with Kumarjit led us to devise a different way of doing model diagnosis. We worked together to give it shape and build on it.</p>
<p><a href="img/Pre_Mape_Plot.png" class="lightbox" data-gallery="quarto-lightbox-gallery-4"><img src="https://jitinkapila.com/writing/engineering/01_adaptive_regression/img/Pre_Mape_Plot.png" class="img-fluid"></a></p>
<p>As you can see, it is absolutely evident that either end of the distribution of MAPE values is going wild, <strong><em>yet the overall MAPE still looks good (18%).</em></strong></p>
</section>
<section id="seeking-scope-of-improvement" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="seeking-scope-of-improvement">Seeking Scope of Improvement</h2>
<p>We worked together to build a framework that addresses such issues on the go and reduces the MAPE deterioration at the edges of the distribution.</p>
<p>This problem gives rise to a concept we named <strong>Distribution Assertive Regression (DAR).</strong></p>
<p>DAR is a framework that cancels the weakness of one-point summaries by using a classical concept from <strong><em>Reliability Engineering: the Bath Tub Curve.</em></strong></p>

<div class="no-row-height column-margin column-container"><div class="">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="img/Image_2.png" class="lightbox" data-gallery="quarto-lightbox-gallery-5" title="Plot for Classical Bath Tub Curve using a Hazard Function"><img src="https://jitinkapila.com/writing/engineering/01_adaptive_regression/img/Image_2.png" class="img-fluid figure-img" alt="Plot for Classical Bath Tub Curve using a Hazard Function"></a></p>
<figcaption>Plot for Classical Bath Tub Curve using a Hazard Function</figcaption>
</figure>
</div>
</div></div><p>The specialty of this curve is that it tells you in which areas failures are likely to occur at high rates. In our experiments, when we replace failure with the MAPE value and time with the sorted (ascending) values of the target / dependent variable, we observe the same phenomenon. This is likely because most regression techniques assume a Normal (Gaussian) distribution of the data and fit themselves towards the central tendency of that distribution.</p>
<p>Because of this tendency, any regression method tends to learn less about data that lies away from the central tendency of the target.</p>
<p>Let’s look at the BostonHousing data from the “mlbench” package in R.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="img/Plot_Bathtub.png" class="lightbox" data-gallery="quarto-lightbox-gallery-6" title="Plot for MAPE Bath Tub Curve for Decile Split “mdev” from Data"><img src="https://jitinkapila.com/writing/engineering/01_adaptive_regression/img/Plot_Bathtub.png" class="img-fluid figure-img" alt="Plot for MAPE Bath Tub Curve for Decile Split “mdev” from Data"></a></p>
<figcaption>Plot for MAPE Bath Tub Curve for Decile Split “mdev” from Data</figcaption>
</figure>
</div>
<p>Here the MAPE is calculated for each decile split of the ordered target variable. As you can observe, it follows the bath tub curve, which validates our hypothesis that the regression method is not able to learn much about the data at either end of the distribution.</p>
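<p>A minimal sketch of this decile diagnostic in Python (the data here is synthetic and the <code>decile_mape</code> helper is a hypothetical name, shown only to illustrate the computation):</p>

```python
import numpy as np
import pandas as pd

def decile_mape(y_true, y_pred):
    """Mean absolute percentage error within each decile of the sorted target."""
    df = pd.DataFrame({"y": y_true, "yhat": y_pred}).sort_values("y")
    df["decile"] = pd.qcut(df["y"], q=10, labels=False)
    ape = (df["y"] - df["yhat"]).abs() / df["y"].abs()
    return ape.groupby(df["decile"]).mean() * 100

# synthetic example: a single global linear fit on right-skewed data
rng = np.random.default_rng(0)
x = rng.lognormal(size=500)
y = 3 * x + 5 + rng.normal(scale=2, size=500)
coef = np.polyfit(x, y, 1)       # one OLS line for all the data
yhat = np.polyval(coef, x)
per_decile = decile_mape(y, yhat)
print(per_decile)                # error is far from uniform across deciles
```

<p>On real data, plotting <code>per_decile</code> against the decile index is what reveals the bath tub shape described above.</p>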
</section>
<section id="final-analysis" class="level2">
<h2 class="anchored" data-anchor-id="final-analysis">Final Analysis</h2>
<p>Now, the DAR framework essentially fixes this weakness of the regression method. It learns the behavior of the data in a way that is stable and can be tweaked for use in general practice.</p>
<p>Plot of the MAPE Bath Tub Curve after applying the DAR Framework, for the decile split of “mdev” from the data:</p>
<p><a href="img/Post_Mape_Plot.png" class="lightbox" data-gallery="quarto-lightbox-gallery-7"><img src="https://jitinkapila.com/writing/engineering/01_adaptive_regression/img/Post_Mape_Plot.png" class="img-fluid"></a></p>
<p>How did this framework, using the same regression method, reduce the MAPEs so much and make the model so much more stable? Well, here it is:</p>
<p>The DAR framework splits the data at either end of the ordered target variable and performs regression on each of these “splits” individually. This inherently reduces the so-called “noise” part of the data by treating each split as its own data-set.</p>
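<p>As a rough sketch of the splitting idea (illustrative code with arbitrary boundaries at the bottom and top deciles; <code>fit_segments</code> is a hypothetical name, not the paper’s exact implementation):</p>

```python
import numpy as np

def fit_segments(x, y, lo_q=0.1, hi_q=0.9):
    """Fit one simple linear model per segment of the ordered target."""
    lo, hi = np.quantile(y, [lo_q, hi_q])
    segments = {
        "low": y <= lo,               # left edge of the target distribution
        "mid": (y > lo) & (y < hi),   # bulk of the data
        "high": y >= hi,              # right edge
    }
    return {name: np.polyfit(x[m], y[m], 1)
            for name, m in segments.items() if m.sum() > 1}

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)
y = 2 * x + rng.normal(scale=1 + x, size=300)   # noise grows with x
models = fit_segments(x, y)
print({name: np.round(c, 2) for name, c in models.items()})
```

<p>Each segment now gets its own fit, so the edges of the distribution no longer have to share parameters with the bulk of the data.</p>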
</section>
<section id="scoring-on-new-data" class="level2">
<h2 class="anchored" data-anchor-id="scoring-on-new-data">Scoring on New Data</h2>
<p>Now you might be thinking: this sounds good when fitting the regression, but how will one score new data? To answer that, we used our most simple yet very effective friend, KNN (though any multiclass classifier can be used here). Scoring involves a two-step method:</p>
<ol type="1">
<li>Score the new value against the KNN / multiclass classifier model to find the closest part of the data.</li>
<li>Predict it with the regression model fitted on that part of the data.</li>
</ol>
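<p>The two steps above can be sketched as follows — a toy illustration using a 1-nearest-neighbour router in place of a full KNN classifier (function names such as <code>train_dar</code> and <code>score_dar</code> are hypothetical):</p>

```python
import numpy as np

def train_dar(x, y, lo_q=0.1, hi_q=0.9):
    """Label each training point with its segment; fit one linear model per segment."""
    lo, hi = np.quantile(y, [lo_q, hi_q])
    labels = np.where(y <= lo, 0, np.where(y >= hi, 2, 1))
    models = {s: np.polyfit(x[labels == s], y[labels == s], 1)
              for s in np.unique(labels)}
    return labels, models

def score_dar(x_new, x_train, labels, models):
    """Step 1: route each new point to the segment of its nearest training point.
    Step 2: predict with that segment's regression model."""
    preds = np.empty(len(x_new))
    for i, xi in enumerate(x_new):
        seg = labels[np.argmin(np.abs(x_train - xi))]
        preds[i] = np.polyval(models[seg], xi)
    return preds

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)
y = 2 * x + rng.normal(size=200)
labels, models = train_dar(x, y)
preds = score_dar(np.array([0.5, 5.0, 9.5]), x, labels, models)
print(preds)
```

<p>In practice a proper KNN (or any multiclass classifier) replaces the 1-NN lookup, but the routing-then-predicting structure stays the same.</p>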
<p>So now we know how to improve the predictive power of regression on such data.</p>
</section>
<section id="code-and-flowchart" class="level2">
<h2 class="anchored" data-anchor-id="code-and-flowchart">Code and Flowchart</h2>
<p>If things are simple, let’s keep them simple. Refer to the flowchart and code below for an implementation of this framework. <a href="https://arxiv.org/abs/1805.01618">Paper here!</a></p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">R code</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Python code</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-3" aria-controls="tabset-1-3" aria-selected="false" href="">Here is the Flow Chart</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Click to Expand
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-1" class="callout-1-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<script src="https://gist.github.com/jkapila/ccc3d0f05fce86ea3075dc7190f8c181.js"></script>
</div>
</div>
</div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-2-contents" aria-controls="callout-2" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Click to Expand
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-2" class="callout-2-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<script src="https://gist.github.com/jkapila/b97d881e2ae8b75141184ac0f7831601.js"></script>
</div>
</div>
</div>
</div>
<div id="tabset-1-3" class="tab-pane" aria-labelledby="tabset-1-3-tab">
<div class="cell" data-layout-align="center">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">graph TB
    
    subgraph Testing
        p1(Finding bucket of model to choose)
        p1 --&gt; p2([Making predictions &lt;br&gt; based on selected model for inference])
        p2 --&gt; p3(Consolidate final score of prediction)
    end

    subgraph Training
        md([Fitting a &lt;br&gt;Regression model])==&gt; di
        di{Binning Data via &lt;br/&gt; evaluating Distribution &lt;br/&gt; MAPE values }
        di --&gt; md2([Fitting a Bucketing model &lt;br/&gt; to Binned MAPE Buckets])
        md2 --&gt; md3([Fitting Regression &lt;br&gt; Models on Binned Data])
        md == Keeping main&lt;br/&gt;model ==&gt; ro        
        md3 ==&gt; ro(Final Models &lt;br&gt; Binning Data Models + &lt;br&gt; Set of Regression Models)
    end

    
    od([Data Input]) -- Training&lt;br&gt; Data--&gt; md
    od -- Testing&lt;br&gt; Data--&gt; p1
    ro -.-&gt; p1
    ro -.-&gt; p2

    classDef green fill:#9f6,stroke:#333,stroke-width:2px;
    classDef yellow fill:#ff6,stroke:#333,stroke-width:2px;
    classDef blue fill:#00f,stroke:#333,stroke-width:2px,color:#fff;
    classDef orange fill:#f96,stroke:#333,stroke-width:4px;
    class md,md2,md3 green
    class di orange
    class p1,p2 yellow
    class ro,p3 blue

</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
</div>
</div>
</div>


</section>

 ]]></description>
  <category>technical</category>
  <guid>https://jitinkapila.com/writing/engineering/01_adaptive_regression/</guid>
  <pubDate>Mon, 30 Apr 2018 18:30:00 GMT</pubDate>
</item>
</channel>
</rss>
