AI presents technologists with some of the most glamorous projects to work on. The promise of Artificial Intelligence (AI) to solve real problems through automation, amplification and simplification is achievable. Today, we are tempted to jump on the bandwagon. But all is not well in AI Land.

It is time for us to change that by engaging in a more methodical approach. We need to work on AI's prerequisites first, so that we derive the true value of AI in the future.

AI is important for our future and we need to get it right. Logic mandates that we take this time to focus on the foundational requirements of AI.

The need of the hour is to solve the rampant data problem. We first need to deliver high-quality, integrated and standardized data to the enterprise. When that is done, each enterprise's AI will have all the relevant data, exactly the way it needs it.

AI’s Biggest Problem

Feed AI bad training data (simulated, synthetic, real or otherwise) and you will be staring down the barrel of AI failure. Incorrect predictions, high error rates, biases, skews and a slew of other issues come to the fore. AI's data problem usually boils down to:

1)   Bad quality of data
2)   Lack of the right data
3)   Combination of #1 & #2
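As an illustration, the two failure modes above can be caught with a simple data-quality gate before records ever reach a model. This is a hypothetical sketch, not from the original article; the field names (`customer_id`, `age`, `country`) and the validity rules are assumptions made purely for demonstration.

```python
# Hypothetical sketch of a minimal data-quality gate for records feeding a model.
# Field names and validity rules are illustrative assumptions.

REQUIRED_FIELDS = {"customer_id", "age", "country"}

def quality_issues(record):
    """Return a list of issues found: missing fields (#2) or bad values (#1)."""
    issues = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"missing:{field}")   # lack of the right data (#2)
    age = record.get("age")
    if isinstance(age, (int, float)) and not (0 <= age <= 120):
        issues.append("bad_value:age")          # bad quality of data (#1)
    return issues

records = [
    {"customer_id": "C1", "age": 34, "country": "IN"},
    {"customer_id": "C2", "age": -5, "country": ""},   # fails both checks
]
clean = [r for r in records if not quality_issues(r)]  # only C1 survives
```

Gating records this way keeps the #1/#2 distinction explicit, so failures can be reported back to the data owners rather than silently degrading the model.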

The AI model and algorithm in question were incorrect in forecasting the spread of the virus. A relevant question comes to mind: how could the predictions for Day 45 possibly be off by more than five orders of magnitude? On examining the details, it is easy to conclude that the predictions were made with inadequate data (#2), and that the model failed to account for a set of continuously changing environmental factors and conditions.

Acting upon erroneous insights could trigger irrecoverable long-term consequences for the business. We need to take responsibility for the relevance and quality of the data we feed AI.

Bottom line – Every enterprise needs to feed AI the required data, with high levels of quality and consistent standards. Without high-quality, integrated and standardized data, we will be unable to practice ‘Responsible AI’. We all have a moral responsibility to create good behaviours and great outcomes from AI.

Run a 5K before an Ultra-Marathon!

Using a running analogy, AI is comparable to running the Ultra-Marathon (distances exceeding 42 km). It is unwise to start an ultra-marathon without any prior running experience. It is also irresponsible to start an AI initiative without data management maturity.

A novice runner needs to attain an optimum level of fitness and meticulously focus on her diet, training regimen, rest, healthy lifestyle habits and more.

That significantly increases the probability of a successful 5K run. Once that is achieved, she works towards repeating that success for a 10K run, then the half-marathon and so on.

As the runner gradually and methodically progresses towards the ultra-marathon, she gets fitter, stronger, and gains the experience and maturity required to run the longest race.

We need to adhere to a similar approach in data management to get an enterprise 'AI ready' – its ultra-marathon. We need to take a phased approach and progressively build data management maturity (data quality, business rules, compliance requirements and more).

Data Management Maturity – The Path to Successful AI

Here is a simplified progression to data management maturity:

1) Cleanse, Comply & Integrate business data from all relevant relational ‘system of record’ data sources to deliver high-quality Key Performance Indicators (KPIs) – 5K

2) Incorporate Customer Relationship Management (CRM) data for Analytics-I, where meaningful data journeys generate customer, product and other relevant insights – 10K

3) Integrate first set of non-relational data sources (documents, images) for Analytics-II – Half-Marathon

4) Integrate second set of non-relational data sources (web logs, social media, chatbot conversations, telemetry) for ML-I – Full-Marathon (Phase I)

5) Integrate third set of non-relational data sources (sound clips, videos) for ML-II – Full-Marathon (Phase II)

6) Your data is now ready for AI – Ultra-Marathon
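The progression above is strictly ordered: each "race" should be completed before the next is attempted. As a hypothetical sketch (the milestone names are taken from the list above; the gating function itself is an illustrative assumption), the phases could be tracked like this:

```python
# Hypothetical sketch of the phased maturity progression: each milestone
# must be completed before the next one is attempted.

MILESTONES = [
    ("5K", "Cleanse, comply and integrate system-of-record data for KPIs"),
    ("10K", "Incorporate CRM data for Analytics-I"),
    ("Half-Marathon", "Integrate documents and images for Analytics-II"),
    ("Full-Marathon (Phase I)", "Integrate web logs, social media, chatbot and telemetry data for ML-I"),
    ("Full-Marathon (Phase II)", "Integrate sound clips and videos for ML-II"),
    ("Ultra-Marathon", "Data is AI-ready"),
]

def next_milestone(completed):
    """Return the first milestone not yet completed, enforcing the order."""
    for name, description in MILESTONES:
        if name not in completed:
            return name, description
    return None  # every phase done: the enterprise is AI-ready
```

The point of encoding the order is to make skipping a phase impossible: an enterprise cannot ask for the ultra-marathon while the 5K is still outstanding.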

Skipping these steps puts a huge burden on the Data Science team, which should primarily be focusing on models and algorithms.

Without a centralized high-quality data source that feeds AI, each data science project engages in one-off data cleansing. And that is not good data management, any way you look at it.
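To make the contrast concrete, here is a hypothetical sketch of the centralized alternative: one standardized cleansing step populates a shared source, and every downstream project reads from it instead of re-cleansing raw data on its own. The field name `email` and the cleansing rules are illustrative assumptions.

```python
# Hypothetical sketch: one shared cleansing step feeding many consumers,
# instead of each data science project re-cleansing raw data one-off.

def cleanse(raw_rows):
    """Central, standardized cleansing applied once for everyone."""
    return [
        {**row, "email": row["email"].strip().lower()}  # normalize in one place
        for row in raw_rows
        if row.get("email")                             # drop unusable rows
    ]

raw = [{"email": "  Alice@Example.COM "}, {"email": None}]
hub = cleanse(raw)  # the single high-quality source

# Downstream projects read the same hub instead of redoing the work.
analytics_view = [r["email"] for r in hub]
ml_training_view = list(hub)
```

With one `cleanse` step, a fix to a cleansing rule propagates to every consumer at once; with per-project cleansing, each team fixes (or misses) it independently.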


The famous and much-coveted 360-degree view of the customer can be achieved only when data from all customer touchpoints is brought to bear.

A Data Integration Hub delivers value to the business, by enabling Analytics, ML and AI initiatives with high-quality, integrated and standardized data.
