Text Is Sexy

This is about written text stored in any digital form that has been or can be rendered in human-readable form on a screen, and that was originally composed by a person. Such text can be either directly input by the person themselves using any available means, or be inferred from a medium carrying the original message (such as printed text on a sheet of paper that is later scanned, or speech that is transcribed automatically to text).

But what is text, and does it really concern me?

It is generally accepted in linguistic and evolutionary theory that any known writing system is long predated by the speech it records. Once made, this realization, no matter how obvious it should have been, leaves a long trail of consequences for the way we reason about written communication. If we consider a phonetic, alphabetic writing system, in oversimplified terms any single character that can be mapped to a sound serves to represent an infinite number of variations of that sound as produced by any speaker of the language it belongs to, at any given time while the writing system has been in use. Sounds in isolation are rarely carriers of any message, but the morphemes they form are already mapped to meanings of various degrees of abstraction in the knowledge shared across all speakers of that language. Morphemes in turn are the building blocks of words, many of which map to very concrete ideas, objects or phenomena in the natural world. In many modern languages words correspond to the sequences of characters we refer to as tokens on the printed page or on the screen, separated by white space and/or punctuation. With words we build phrases that are then combined into clauses and sentences, which is what written text consists of. Each of the foregoing statements represents a naive generalization over a number of linguistic, behavioural and psychological theories and can be very easily challenged by any undergraduate student of linguistics (just ask them, should you chance upon one, what a word is, and you’ll see what I mean). Still, for the sake of argument, I’d like to treat written text in any given language as a single, conventional representation of an infinite number of possible variations in spoken production.

In software engineering we can rarely afford to raise our heads above the sea of complexity involved in building even a simple website that processes some sort of text input, to look beyond what goes literally from the keyboard to the database, and start philosophizing about language. Developers are typically not concerned with what a user actually writes or what a document transmitted as part of a transaction actually contains (unless they need to erect safeguards against, say, various types of injection attacks, or they are working on some smart content-mining feature, in which case they may already be in a frame of mind not unlike the one I’ve hinted at). In the general case, however, any feature directly dependent on the actual textual input will be pushed to the end of your project, when, as a rule, all time and money allocated will have long been spent. Of course, such features may have been discussed in broader terms at various stages of the project implementation, with the occasional mention of the great machine learning / natural language processing / artificial intelligence tech our company has developed and will kindly contribute to the project.

In truth, without access to the actual content to be processed by the system we’re building, there is little the implementation team can bring to the table other than a general-purpose data pipeline. Besides, our agile project management would hate to see us spend time building features that may never be needed, come the actual content. Therefore, if not central to the project, any “smart” feature half-promised in the hope of winning the contract is likely to be put together in haste and with little regard for the actual content it will depend on in production. And because of that it is hardly likely to perform to the client’s satisfaction. If there’s enough trust left to our credit at this stage we might get away with a change request after the go-live date, which might once again end up on the back burner since the client is already live and we may have moved on to other projects and new clients. This seemingly dead end can hardly be blamed on any of the stakeholders; it is just the way of modern software development. While there can be no recipe for success in our field, let alone in developing data-driven features, in the case of human language encoded as written text an abstract framework of attitudes and precautions can be adopted to mitigate the effect of textual data not receiving proper attention while actual software development takes place.

Your data, your rules

What’s common to any piece of text produced by a human is that it will bear traces of the producer’s own person, even in the strictest of contexts. Then, as long as the one analyzing the text by automated means is not the one producing the message, pretty soon it will emerge that any major assumption of the analyst’s breaks down at the n-th example, where n is not a very large number. Simple things such as the presence of HTML tags or any other formatting markup, spelling and punctuation, the use of auto-formatting, hyphenation, indentation and whitespace, 7-bit character encodings, to name just a few, may cause tangible disturbance to the delivery of a new business system operating on textual content unless special care has been taken to address most of the idiosyncrasies inherent in the client’s data. In a sense, the system configured to “read” the text may very well fail to understand what the author “meant” when typing a given sequence of characters on their device or applying some formatting we didn’t really expect.

One might argue that, in the age of self-driving cars and talking machines, having some variety in the textual content shouldn’t deter a modern business system from being smart about such text. Well, let’s pause and think about what it is that makes systems text-smart.

The dreaded rules

Unless we’re at a Google kind of place, or building the coolest startup from scratch, chances are that machine learning is not our management’s weapon of choice. For a business system provider there are a few good reasons for that, such as the ability to explain to the client how things work and the need to ensure deterministic outcomes that won’t aggravate someone who is just trying to do their job on the computer your client gave them. Rules seem to go down well with senior management, too, as even they may get to have a say in what would work in their organization. That is hardly the case when they are presented with the prospect of having to think of a document as a list of a million real numbers (most of them zero!) being fed to a black box that outputs just a handful of real numbers on the other end, telling them what the document is about or how angry the person writing it must have been.

Rules applied to text come in many different forms: from the simple if … then … else to dedicated pattern matching in the form of regular expressions or stacked finite-state automata, to name but a few. The idea of having a clean set of rules to do what we want, in a fashion similar to what an HTML parser does for a web page or a compiler does for a piece of code, is truly noble. However, while code is typically already written in a regular “language”, one that can only be produced by a finite set of rules and conversely lends itself to interpretation by applying a subset of those rules, natural language is a biological phenomenon long predating the idea of feeding it to a limited machine for parsing. For a parser of a programming language the choice is fairly easy: if no possible parse tree can be inferred from the text, the code does not compute, and a syntax error is thrown since no “meaning” (or instruction) can be inferred from it. But for a reader of human language any text can carry some meaning, thanks to our incredible ability to adapt to errors and fill in the gaps in communication. Still, when processing human language automatically we can’t do much better than to try to infer a highly likely parse tree from a sequence of tokens. Such a parse tree must be well-formed within the constraints of the grammar rules for the given language. Depending on the parsing technology, any input that is not strictly grammatical would either fail to produce a parse, or would result in some highly likely parse (according to our model) describing a grammatical sequence (again, with such grammaticality fully at our parser’s discretion) that may or may not correspond to the interpretation of a human reader. Given that we’re typically not guarding against ungrammatical input in a business system, any inherently “ungrammatical” input is likely to fly under the radar when we attempt to apply a rules engine to it.
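To make the brittleness concrete, here is a toy sketch (the rule and the phrasing are invented for illustration): a strict handcrafted pattern fires on well-formed input, but a trivially misspelled variant of the same request slips past it without any error at all.

```python
import re

# A strict handcrafted rule: detect a request for an invoice by order number.
INVOICE_RULE = re.compile(r"\bsend (?:me )?the invoice for order (\d+)\b",
                          re.IGNORECASE)

def match_invoice_request(text):
    """Return the order number if the strict rule fires, else None."""
    m = INVOICE_RULE.search(text)
    return m.group(1) if m else None

print(match_invoice_request("Could you send the invoice for order 4711?"))  # fires: '4711'
print(match_invoice_request("pls send teh invoce for order 4711"))          # silently fails: None
```

The second request is perfectly understandable to a human reader, yet from the rules engine’s point of view it never happened.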

One way to account for possible variations in a rules engine would be to capture the typical deviations from the strict rules and relax those a bit, so that they also fire when, say, the user types invoce instead of invoice. If we have an overview of the typical erroneous inputs, this should be doable. However, any errors we consider will be the idiosyncrasies of a limited number of people submitting such input to the business system. Any new user might introduce a totally new way of breaking the rule, at which point we would have to go back and revise the existing rules to accommodate their errors as well. As we go along we will be evolving the rules engine from one that allows no variation whatsoever to one allowing an ever increasing amount of uncertainty. To make this work in practice we can’t stick to a fully deterministic output either, as a growing number of rules will start firing at the same time on the same input, requiring a judge to determine the most appropriate interpretation out of many possible ones, which is where automation starts to break down. To avoid that we may decide to introduce some measure of confidence or likelihood for each interpretation, based on prior observations and context, to allow us to choose the best-scoring interpretation. When doing so we should strive not to assign unnecessary preference to any possible interpretation, since that would be pretty much the same as applying a strict rule. In formal terms that means reducing the bias introduced by hard-bounded rules and increasing the variance of the decision-making mechanism. Bias and variance are precisely the quantities one trades off when training a decision-making algorithm on observed data.
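A minimal sketch of such a relaxation, using Python’s standard difflib for fuzzy string matching (the vocabulary and the 0.8 cutoff are assumptions chosen for illustration):

```python
import difflib

# Hypothetical in-house vocabulary of terms the rules engine cares about.
VOCABULARY = ["invoice", "order", "refund"]

def fuzzy_lookup(token, cutoff=0.8):
    """Map a possibly misspelled token to a known term, or None if nothing is close enough."""
    matches = difflib.get_close_matches(token.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(fuzzy_lookup("invoice"))  # exact match: 'invoice'
print(fuzzy_lookup("invoce"))   # one letter dropped, still maps to 'invoice'
print(fuzzy_lookup("banana"))   # no close match: None
```

Lowering the cutoff admits more variation; in the bias/variance terms above, it trades bias for variance, so misfires such as mapping an unrelated word to a known term become more likely.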

Wait, but I thought rules were rules and machine learning was magic

Well, no. Anything a machine learns from some data that it can later perform on unseen data can be mapped to a pattern matching rule of arbitrary complexity. Just like, armed with enough patience, one can write a monster of a regular expression to detect every occurrence of an action performed by Anna Karenina on any of the 800+ pages of the book, a machine can be trained to do the same based on a sufficient number of varied examples of such patterns occurring in similar texts. What the machine will come up with will be its own set of rules, infinitely more complex than the one we designed by hand, but such that would still do pattern matching over sequences of strings. Machine-learning algorithms can be debugged to explain the decisions taken at every step before the final outcome, so we can gain insight into the rules applied (although this is hardly feasible for a deep-learning algorithm where the human-readable representation of the data is lost already at the input layer). If we do so we might be surprised to find that the rules the machine came up with make very little sense in human terms. Yet, the outcome from applying the rules is usually comparable with the outcome from applying any handcrafted rule.

With this realization in hand we may be able to convince management that it’s not such a bad idea to support the decisions in our business system in a data-driven fashion, and spare ourselves the labour of devising rules by hand that would never capture all the variation in any larger number of examples we have to work with. Assuming we have already amassed a database where human associates have identified the patterns we’re interested in in previous transactions, this should be a piece of cake. Lately I’ve been revisiting Philipp Koehn’s Statistical Machine Translation (an excellent study of the state of the art in phrase-based SMT and beyond on the eve of the deep-learning revolution in NLP). What strikes me time and time again is the clean-room nature of the examples used. The mathematics behind translating the German Er geht ja nicht nach Hause to He does not go home is mind-blowing, whether it’s computed within the erstwhile state-of-the-art framework of phrase-based statistical machine translation or that of large-scale discriminative models. Yet it can only serve as an abstraction for the underlying set of rules we’re taught in school as long as the input truly makes sense within that set of rules. Any single spelling mistake, omission or duplication in the input may throw the system into a completely new state, depending on the bias/variance balance of the model. Typos or unconventional use of punctuation would often mean the words affected are treated as out-of-vocabulary (OOV) words. Depending on the algorithms involved, the system would typically either fall back on some special treatment for those cases, or their distribution will already have been accounted for in training by only considering the n most frequent words in the training data and treating the rest as OOVs. Neither approach may be a good approximation for a spelling mistake in a really frequent vocabulary word, which is not that uncommon in business systems, in particular ones in use across borders and languages.
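The frequency-based treatment mentioned above can be sketched in a few lines (the corpus and cut-off are toy assumptions): only the n most frequent words survive, and a typo in even the most frequent word lands in the same placeholder bucket as genuinely rare words.

```python
from collections import Counter

def build_vocab(tokens, n):
    """Keep only the n most frequent words; everything else will map to '<unk>'."""
    return {word for word, _ in Counter(tokens).most_common(n)}

def encode(tokens, vocab):
    """Replace out-of-vocabulary tokens with the '<unk>' placeholder."""
    return [t if t in vocab else "<unk>" for t in tokens]

corpus = "the invoice for the order and the invoice".split()
vocab = build_vocab(corpus, n=2)                      # keeps 'the' and 'invoice'
print(encode("the invoce for order".split(), vocab))  # ['the', '<unk>', '<unk>', '<unk>']
```

Note how the misspelled invoce, a typo in the second most frequent word of the corpus, becomes indistinguishable from words the model has simply never seen.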

One does not need to excel at spelling and grammar to be able to do their job and communicate effectively with clients. This is especially true for organizations where a shared language is not native for a sizeable part of the workforce. Raw text input generated in such organizations can present real challenges for any automation relying on the presence of specific signals in the text. Such signals are either inferred by handcrafted rules corresponding to our own understanding of the laws of language (such understanding assumed to be correct) or learned from the presence of specific features in training examples, most of which follow those same laws. Ungrammatical examples in the training data are either a minority that can have little effect on the learned parameters, or, if a majority, would typically exhibit a wide range of errors from which hardly any useful signal can be picked up. Then, when presented with an unseen ungrammatical example, the system may lack the evidence necessary to fire any meaningful rule, since the new example may be “wrong” in a completely novel manner. This property of free text input derives directly from the definition of written text as a single abstraction over an infinite number of possible variations of a given message originating as either speech or thought. Unless the laws of such abstraction are strictly applied, encoding the same message in the same written form twice becomes elusive.

What can you do?

A case for standardization

I believe there is a need to ensure standardization across free-text inputs in the business system. Any approach to working with text will fall between two extremes: complete control over the input, either by removing free-text fields altogether or by allowing only input from a controlled subset of the language, or completely free input. The former ensures regularity and would allow you to apply your own handcrafted rules to trigger the desired outcomes, yet it will hardly go down well with your users in the year 2019 and counting. Any attempted automation based on the latter may break down completely. Think of the email reply suggestions you’re getting from the world’s data-richest service, Google Mail; anything more specific than a Congratulations, I’m interested and the like is simply too dangerous to suggest for fear of appearing silly. A mail service’s autoreply task is infinitely harder than that of a business system operating within the strict context of your line of business, one that may trigger a limited number of outcomes (such as escalating an issue, notifying the correct party or redirecting to the answer of a specific FAQ, to give some random examples). You may not even intend to use plain-text user input to trigger any hand-offs in your system automatically, and instead would like to keep it as data supporting human decisions after the fact. Even so, you would be better off running simple aggregate functions on the data than having to read through each message to gather insight. Either way you’ll be better served by text that is as close as possible to a standard form on the surface, and that includes the accepted terminology for your line of business. To ensure the best outcomes in such settings I’d like to propose something of a middle ground between fully controlled domain-specific regularized languages and uninhibited social-media-style input.

Motivating users to do the right thing

If we’re not to impose any hard restrictions on textual input, maybe we could consider what it is that would otherwise incentivize users to adhere to a standard norm of writing. When we learn to write, this generally happens in a classroom environment with a more or less defined system of rewards and penalties. Random deviations from the taught norm, if not penalized outright, are highlighted and corrected, while the lack of any teacher marks on your paper, with a “Well done!” token of encouragement at the bottom ensuring the work has actually been reviewed, is an unmistakable sign of success.

Think of how password strength control has evolved lately. The dry “the password must contain this and this, and be this long…” is but a bad memory of the 90s. Many modern services would instead colour your input accordingly so you get a true intuition of whether you’re doing okay, and would thus nudge you to choose a password that keeps you in the green. Given a good language model, which takes little more to train than the normalized free-text input one already has on record, a similar tool could be deployed to indicate whether new inputs conform to the model sufficiently, and can consequently be interpreted as actionable. Maybe a language model alone wouldn’t suffice, so the business system will have to be augmented with further data-driven checks, such as fuzzy terminology matches. This, combined with autocomplete wherever possible, should definitely pull any free-text input towards a standard form that lends itself more readily to data-driven analysis of any flavour.
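As a rough sketch of what such a conformance indicator could look like (the history, the add-one smoothing and the scoring scheme are all assumptions, not a production recipe), a tiny smoothed bigram model trained on historical normalized inputs will score familiar phrasing higher than free-style input:

```python
import math
from collections import Counter

def train_bigrams(sentences):
    """Count unigrams and word bigrams over historical, normalized inputs."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def conformance(sentence, unigrams, bigrams, alpha=1.0):
    """Average log-probability per token under an add-alpha smoothed bigram model."""
    tokens = ["<s>"] + sentence.lower().split()
    vocab_size = len(unigrams)
    logp = sum(math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab_size))
               for a, b in zip(tokens, tokens[1:]))
    return logp / (len(tokens) - 1)

history = ["please refund order 123", "please refund order 456", "refund order 99"]
uni, bi = train_bigrams(history)
# Familiar phrasing scores higher than free-style input:
print(conformance("please refund order 777", uni, bi) >
      conformance("plz giv me my money back", uni, bi))  # True
```

A score like this could drive exactly the red/green colouring described above: inputs below some threshold get flagged for rewording before they ever reach downstream automation.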

Who’s responsible

At present, all this might seem irrelevant to business systems where input is being collected in free form for some reason but no data-driven techniques have been required to date. This does not mean that next year’s digital transformation initiative won’t identify such a business system as ripe for optimization applying the latest in data-driven insights. It is not a matter of if but of when this should happen, and when it happens it is better to be prepared than to expect that the project team will be able to pull it off by some twist of data-science magic just given access to your stash of invaluable human input. Because actual text content likely won’t be the focus of any major software development effort, you, as the business-system owner, have to help yourself by minimizing the need for the implementation team to clean your text. Besides, you’re sitting closest to the originator of any such text, both in space and time, so it should be easiest for you to figure out what precautions to take in this respect. It is infinitely more difficult for a third party, not familiar with your line of business or your organization’s lingo, to standardize your textual data long after the fact.

Concluding remarks

The title of this post is credited to a former colleague for whom I have the greatest respect. Since I have not obtained their permission to use their name in a personal blog, I’d rather not, until they’ve asked me to. The context was automatic extraction of text from tables in printed/scanned PDF files. Anyone who’s ever peeked into the raw text extracted from a PDF file would understand immediately the complexity of this issue since what is really appealing to the human eye on a well-typeset white sheet of paper could be utterly confusing to a machine treating any unstructured input strictly as a linear sequence of characters. To me, the same goes for just any free-text input. Just like no one expected to have to make it easier for a printed table to be translated back to a map-like data structure, regular users wouldn’t be striving to give you the sexy kind of input, a well-formed paragraph with uniform, standard use of spacing and newline characters, punctuation and capitalization that would look as good in your database as when printed on a fast-fashion item of clothing. Yet, this is the kind of text “smart” algorithms are trained on, so unless you’ve taken care to bring such input closer to the deemed norm at the source, chances are the success rate of such algorithms on your data would suffer. Besides, investing in proper text normalization at the source, even without an immediate need to do so, would benefit not only the latest in machine learning fads but also any kind of initiative at collecting insight from your text data both today and 10 years from now.

Why Chatbots Make Poor UIs

Until not so long ago a conversational agent (a.k.a. chatbot) seemed to be the hallmark of any customer-facing AI-informed service. Not since the ELIZA of the 1960s had a script mimicking human interaction drawn so much attention, this time backed by significant piles of investor cash in addition to the press coverage. Chatbots were hailed as the low-cost replacement of human call/chat agents trapped in a never-ending rewind and repeat of the same answer to the same old question, or as an exciting new retail channel to serve the needs of the busy millennial with no time to research the nearest florist on their way to a dinner party, or those of another, not so busy one, ordering a pizza and preferring, for reasons unknown, to choose a topping by typing instead of pointing and clicking. Overnight, chatbot technology was pushed out of the obscure circles of Turing-test cheaters and into the C-suite.

Chitchat vs. UI

Apart from the undisputed attraction of the talking machine that has powered many a great work of fiction, little concern is given to the actual utility of a chatbot function in a business system. Chatbots seem to fall into one of two major categories (where one chatbot can combine functions of both): the chitchat flavour (of which ELIZA is still probably the best example even in the age of Alexa, Siri, and Cortana) and the one that is designed to perform a certain action in response to a user request (such as booking a seat on a flight, answering a specific question or sending a command to a system under the user’s control – think Harrison Ford’s character in Blade Runner ordering a CRT screen to zoom/enhance a photo of a crime scene). The former’s utility is difficult to quantify as its main purpose is to keep the conversation going. Surely you could plug a marketing or political message here and there, and sprinkle the browser with ads while the user is painstakingly trying to figure out new ways to elicit funny responses from the script. Still there will be no hard evidence of either a conversion or of any time saved – not least because it is against the nature of the chitchat function to spare time! Turning to the latter, its utility should lend itself more readily to formal analysis as the chatbot’s purpose is to serve as the medium to carry the user’s request to the underlying service. And we have a pretty good idea of what it should and should not do.

The medium translating a user’s actions into specific machine commands is also known as the user interface or UI. On the transmitting side (i.e. when the command goes from the user to the machine) this could be, to name a few, anything from a simple on/off button, a knob, a dashboard of controls, a terminal screen with a keyboard to type instructions in a “language” the machine can interpret, a GUI with touch/point-and-click function, a microphone, or a combination of any of the above. On the receiving side (i.e. when the user is notified of the result from their action) this could be anything from a simple light/acoustic signal, an analog or digital readout, a line on a text-based terminal screen, a GUI control element, speech played back through a speaker etc.

The Importance of Context

The trust level between humans and machines has never been particularly good. This goes both ways. On the user’s end it is highly important that the outcome of any specific command is deterministic: that, all things being equal, exactly the same action performed multiple times always results in exactly the same response. It is also vital that the response produced by the system under the user’s control is unambiguous. In other words, the user must rest assured that the computer is doing exactly as they told it to. On the machine’s end this translates to the need for each instruction to be executed in a very specific context: when saving a file the computer needs to be told where to put it; when processing a pizza order the computer needs to be told what size, dough, and toppings you want. Each comes from a predefined list of folders on the file system or options offered by the pizza place. From this point of view the computer must be certain to have gathered all necessary input from the user, and the user must understand that they are the one controlling such input. Formally put, a valid context template needs to exist in order for the computer to produce a deterministic result in response to the user’s action. The only way to fill out such a context template is by prompting the user for further action and then making sure that the system’s interpretation of the context is the same as the user’s (unless you’re running a surprise pizza delivery service where you’ve given the computer free rein, but even then the user must have told the service beforehand that a surprise is what they want to order!).
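The context-template idea can be sketched as a simple slot-filling check (the slot names and options here are hypothetical): the system keeps prompting until no required slot is missing, and only then can it act deterministically.

```python
# Hypothetical context template for a pizza order: every slot must be filled
# with one of the allowed options before the order can be executed.
REQUIRED_SLOTS = {
    "size": {"small", "medium", "large"},
    "dough": {"thin", "classic"},
    "topping": {"margherita", "pepperoni", "veggie"},
}

def missing_slots(order):
    """Return the slots still needed before the context template is complete."""
    return [slot for slot, options in REQUIRED_SLOTS.items()
            if order.get(slot) not in options]

order = {"size": "large", "topping": "pepperoni"}
print(missing_slots(order))  # the system must still prompt for: ['dough']
```

A GUI lays all three slots out on one screen; a chatbot has to walk the user through the same template one prompt at a time, which is exactly where the complexity discussed below comes from.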

We are considering only the case of acting in good faith, that is, we are introducing the chatbot pizza ordering service in order to help our callers choose faster, with less hassle, and, very importantly, in order for them to return to the chatbot instead of calling our phone line and tying up our human staff unnecessarily the next time they want pizza delivered. There are certainly cases in which adding confusion to the medium could even push sales upwards, and there are also cases where by protracting the conversation the chatbot could collect valuable insight into the user’s habits and preferences, but those are not our concern in a business system.

UI as Map

From a bird’s-eye view, any software controlled by the user is tasked with the mapping of user actions to outcomes, be it carrying out a search request in response to a query, processing a payment on the clicking of a checkout button, creating or modifying a resource when the user hits the upload/rename button etc. No matter how complex and configurable, business systems are not infinitely versatile. The set of outcomes a business system can produce is well-defined, and, more importantly, finite. In a traditional (non-natural-language-enabled) UI setting the finite set of outcomes is mapped to a finite number of input states, such as domain-specific formal language expressions (like shell commands or protocol instructions), valid sets of UI control states on a screen, or configurations entered via some dedicated hardware control panel. Truly free input, if any, is restricted to variables that do not affect the flow of control inconsistently. Because the number of input states is finite, those can be described in a user manual exhaustively, along with all possible outcomes. Delegating the UI function to a conversational agent on the other hand produces a system mapping a finite set of outcomes (because we’re still living in the present, where HAL 9000 has not been invented yet) to a virtually infinite set of input states enabled by free input. Both user-controlled variables and the flow of control are to be inferred from it. For such a system we will only be able to provide an exhaustive documentation of the possible outcomes, with the possible inputs remaining hidden from the user. And because the set of possible outcomes is not in any way expanded by the free input, in terms of the number and complexity of user-computer interactions necessary to attain any given output, at best we can hope to achieve parity between a traditional UI and one implemented by way of a conversational agent.

Chatbot vs. GUI

Irrespective of organization size and structure, introducing a new business system requires careful planning and consideration. In a typical scenario, a project sponsor would advocate for the innovation and be held accountable for its success by senior management. Such success will be measured in terms of cost savings, client satisfaction, or net promoter score, to name a few. Some organizations would go so far as to record the number of actions (clicks) the user takes in order to achieve a given outcome. Below I’d like to focus on a few concrete measures that could be useful in determining the utility of a chatbot in a business-system setting. Each measure remains relevant across the entire range of chatbot technology, from simple string-based pattern matching to full-fledged deep-learning models.

1. Time complexity of establishing context

For a traditional GUI the collecting of user input, such as a search query or the name of the file being uploaded, always takes place in the immediate context in which such input will be consumed, determined by activating the specific input field (for instance, you type the file name in the fileName input field mapping to the underlying variable). In formal terms, determining such context has a running time complexity of O(1). To achieve the same constant time complexity in a free, natural-language input setting, one needs to know exactly what to “say” to the system and exactly “how”. Any ambiguity in the user’s input will somehow need to be resolved through follow-up questions from the chatbot, effectively increasing the running time complexity.

2. Client satisfaction

This one is hard to reason about in quantitative terms, but we can think of some central trait that we know for sure will affect client satisfaction, such as the ability to accomplish a simple task in the system. Failure to do so with a reasonable amount of effort should definitely be a cause for concern.

For a simple outcome, such as posting on a blog, most likely the context template in a traditional GUI setting is more or less laid out before the user’s eyes on a single screen. More complex actions may require the activation of context menus or even switching between multiple tabs, yet all fields in the context template are perceived as points on a single flat surface. Abandoning GUI for a conversational agent, the mental picture of the context template becomes more akin to a graph structure that requires multiple partial paths along the graph to be persisted in the user’s own biological memory to allow for back-offs and retries in case the natural-language interaction results in a dead end or in an undesired potential outcome. A simple example would be booking a flight for a given stretch on a given date. If I am not satisfied with the offered fare I can easily modify my query by choosing different flight dates or another airline from the GUI controls on a single screen and take in all the resulting changes as a whole. Assuming that the alternative chatbot UI is able to correctly preserve the context of my initial query, I am still left with the burden of remembering the query parameters that are left unchanged along the partial path on the graph that would eventually lead to the alternative offer. Any visual aids the system may resort to in an attempt to help me out in this situation would at best be poorly disguised GUI controls serving directly the same needs as the ones the imaginary chatbot is serving by way of circumlocution.

It turns out that to accomplish a simple, straightforward task through the agency of a chatbot, one has either to be a true power user of the system who knows exactly how to talk to it in order to obtain the desired effect in the fewest attempts, or, if genuinely buying into the idea of a productive conversation with the machine, to be prepared to buffer in their own memory an arbitrary number of parameter values which would otherwise be laid out for them on a single screen. Neither option translates to a positive client experience for most of us.

3. Cost

Cost overruns in an IT project are as certain as the law of gravity. A complicating factor in data-driven projects is that they typically require the involvement of client resources and are run in an iterative fashion, where we’d be aiming to improve overall performance in small increments with each iteration. Regressions are to be expected, and in some use cases the finished system, trained and tuned on data that client staff have been pushed extra hard to collect, will have already become outdated by the time it’s ready to be put to productive use. Even if the risk is clearly highlighted, given the novelty of conversational assistants and the undisputed need for the client to have skin in the game, we can’t know whether we’re doing well until late in the implementation cycle. Unlike in more traditional projects, such as overhauling a web site, introducing a new feature or replacing a back-end database engine, where the client can draw on in-house IT resources, when implementing a chatbot they are entirely at the mercy of the provider. The provider is the expert, and the provider in turn can draw on a chatbot guru pool of limited size. In such a setting it is difficult to assess the fitness of any proposed approach before you can actually start posting lines to the bot. By then the client will have been required to collect training examples and to assign senior subject-matter experts to the project to support the implementation team. Having spent this much time and effort, the client is likely to fall victim to the sunk-cost fallacy should they find out that the advertised chatbot function does little to address their underlying needs.

One could argue that the same concerns apply equally to any other effort to improve customer engagement by technical means. One important difference is that a chatbot interface comes in addition to an existing UI. Even the Alexas of today are yet another entry point to the same established channels powered by a traditional GUI. Given that a chatbot interface can at best perform on a par with a GUI, not having the right GUI in place renders futile any subsequent attempt at diversifying the system of engagement. With the return on investment in rethinking a system of engagement so uncertain, it seems a safer bet to throw our effort at, say, rethinking the UX by shortening the path from a blank page to a desired outcome than at building a data-driven, text-based UI from scratch.

Conclusion

Conversational agents are fascinating applications, likely to be considered outside of the strictly technological domain due to their very purpose of mimicking sentient exchange. Yet, when incorporated into a business system, a chatbot’s function is strictly defined by its inputs and outputs, just like that of any other user interface. One should then discuss the chatbot in the same terms as the alternatives. In the light of such a discussion, it seems that a conversational agent adds little value to a business system mapping a finite set of possible user actions to a finite set of specific outcomes.

Outside the domain of business systems, a conversational agent can play an important role as a system of engagement that does not purport to improve conversion or to accomplish any given task efficiently. Shortly before the latest rise in commercial interest in chatbot applications, the undersigned developed a conversational agent of their own as part of a community online radio station. It was the only means for a visitor to ask for a specific tune to be played next. There was evidence that the chatbot, implemented in the powerful AIML framework, did indeed keep visitors engaged for some time while the radio played tunes by independent artists. I’d like to think that, crude and inefficient as it was, it still had a positive impact on retention and that it served to lengthen listening times and thus popularize the great work contributed by our patrons. However, as an interface to the underlying database it suffered from the deficiencies already discussed, so a drop-down menu of artists and tunes would’ve been much easier to implement and more efficient to use. I’d hate to see a chatbot standing between a willing customer and the checkout button.
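For the curious, the gist of AIML-style category matching, a pattern with a wildcard paired with a response template, can be approximated in a few lines of Python. The patterns and replies below are invented for illustration and are not the actual categories the radio bot used:

```python
import re

# Each pair mimics an AIML <category>: an uppercase pattern whose (.+)
# plays the role of the * wildcard, and a reply template that reuses the
# captured text the way <star/> would.
categories = [
    (r"PLAY (.+) NEXT", "Queuing {0} for the next slot."),
    (r"PLAY (.+)",      "Looking up {0} in the playlist."),
]

def respond(utterance):
    # AIML normalizes input to uppercase before matching; first category wins.
    text = utterance.upper().strip()
    for pattern, template in categories:
        m = re.fullmatch(pattern, text)
        if m:
            return template.format(m.group(1).title())
    return "Sorry, I did not catch that."

respond("play Stairway next")  # caught by the first, more specific category
```

Engaging as such exchanges can be, every category here is just a disguised query parameter, which is the deficiency the drop-down menu would have solved outright.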