When solving a task in a data-driven fashion, one can do worse than looking at the data set itself before attempting a solution in the latest and coolest framework. We do so to figure out if the data we have collected contains any useful signal that we can capture in order to, say, estimate a given variable or predict an outcome out of a finite number of predefined outcomes. Having cleaned our data and accounted for the irregularities we have been lucky enough to notice, we extract the parameters we’re interested in and start slicing and dicing our data to get an idea of those parameters’ sample distributions.
In the absence of technical error, we often find ourselves looking at a large number of observations whose parameter values are clumped together around the sample mean, and far fewer observations whose values lie closer to an empirical extreme. When plotted on a histogram, the more typical parameter values account for the “head”, whereas the less typical values form “tails” towards the two extremes. As an oversimplified intuition, one can say that the tail is where the uncertainty in the behaviour of the resulting data-driven system lies.
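To make the head and tails a little more concrete, here is a minimal sketch of how one might split a sample into the two. Everything in it is an assumption made for illustration: the synthetic, normally distributed sample, the helper name `split_head_and_tails`, and the choice of the 5th and 95th percentiles as cutoffs. A real data set will rarely be this tidy, and where the tail begins is ultimately a business decision rather than a statistical one.

```python
import random
import statistics


def split_head_and_tails(values, n_quantiles=20):
    """Split a sample into (lower tail, head, upper tail).

    With n_quantiles=20 the first and last cut points are the 5th and
    95th percentiles. These cutoffs are illustrative assumptions only.
    """
    cuts = statistics.quantiles(values, n=n_quantiles)
    low_cut, high_cut = cuts[0], cuts[-1]
    lower_tail = [v for v in values if v < low_cut]
    head = [v for v in values if low_cut <= v <= high_cut]
    upper_tail = [v for v in values if v > high_cut]
    return lower_tail, head, upper_tail


# Synthetic stand-in for a parameter extracted from a cleaned data set.
random.seed(0)
sample = [random.gauss(100.0, 15.0) for _ in range(10_000)]

lower_tail, head, upper_tail = split_head_and_tails(sample)

for name, part in [("lower tail", lower_tail),
                   ("head", head),
                   ("upper tail", upper_tail)]:
    share = len(part) / len(sample)
    print(f"{name}: {share:.1%} of observations, "
          f"values from {min(part):.1f} to {max(part):.1f}")
```

The tails hold only a small share of the observations, yet they span a much wider range of values than their share would suggest, which is one way to see why they deserve a closer look.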
In this blog I am primarily concerned with data-driven methods as applied to business systems. Business systems are designed to streamline processes and make them more efficient, or to automate simple repetitive tasks, ultimately saving time and resources. No matter how advanced our data science is, failure to achieve any of that is a failure of the data-driven method we’ve chosen to implement. I argue that, to avoid such failures, we need to gain a thorough understanding of the “tail” and of its significance to the business. Typically, this would mean diving deep into the client’s data set and customizing our approach to solving a seemingly generic task. Having gained such insight, we may ultimately reconsider building a business system on data science alone, or, more often than not, abandon data-driven approaches altogether.
In a series of articles I’d like to take a closer look at what may make an otherwise well thought-out and mathematically sound approach incompatible with concrete business needs when the specifics of the task are not well aligned with the data at hand. The blog title is my tongue-in-cheek way of encouraging businesses to be critical of any AI-informed system that may fail to account for what is really specific to them (basically, their tail). The goal is to ensure that each specific use case is considered in a systematic, rigorous manner, safe from the smoke and mirrors that often surround the marketing of such business systems.