Why Chatbots Make Poor UIs

Until not so long ago, a conversational agent (a.k.a. chatbot) seemed to be the hallmark of any customer-facing AI-informed service. Not since the ELIZA of the 1960s had a script mimicking human interaction drawn so much attention, this time backed by significant piles of investor cash in addition to the press coverage. It was hailed as the low-cost replacement for human call/chat agents trapped in a never-ending rewind and repeat of the same answer to the same old question. It was also hailed as an exciting new retail channel, serving the busy millennial with no time to research the nearest florist on their way to a dinner party, or another, not so busy one, ordering a pizza and preferring, for reasons unknown, to choose a topping by typing instead of pointing and clicking. Either way, chatbot technology was pushed out of the obscure circles of Turing-test cheaters and into the C-suite overnight.

Chitchat vs. UI

Apart from the undisputed attraction of the talking machine that has powered many a great work of fiction, little concern is given to the actual utility of a chatbot function in a business system. Chatbots seem to fall into one of two major categories (where one chatbot can combine functions of both): the chitchat flavour (of which ELIZA is still probably the best example even in the age of Alexa, Siri, and Cortana) and the one that is designed to perform a certain action in response to a user request (such as booking a seat on a flight, answering a specific question or sending a command to a system under the user’s control – think Harrison Ford’s character in Blade Runner ordering a CRT screen to zoom/enhance a photo of a crime scene). The former’s utility is difficult to quantify as its main purpose is to keep the conversation going. Surely you could plug a marketing or political message here and there, and sprinkle the browser with ads while the user is painstakingly trying to figure out new ways to elicit funny responses from the script. Still there will be no hard evidence of either a conversion or of any time saved – not least because it is against the nature of the chitchat function to spare time! Turning to the latter, its utility should lend itself more readily to formal analysis as the chatbot’s purpose is to serve as the medium to carry the user’s request to the underlying service. And we have a pretty good idea of what it should and should not do.

The medium translating a user’s actions into specific machine commands is also known as the user interface or UI. On the transmitting side (i.e. when the command goes from the user to the machine) this could be, to name a few, anything from a simple on/off button, a knob, a dashboard of controls, a terminal screen with a keyboard to type instructions in a “language” the machine can interpret, a GUI with touch/point-and-click function, a microphone, or a combination of any of the above. On the receiving side (i.e. when the user is notified of the result from their action) this could be anything from a simple light/acoustic signal, an analog or digital readout, a line on a text-based terminal screen, a GUI control element, speech played back through a speaker etc.

The Importance of Context

The trust level between humans and machines has never been particularly good, and this goes both ways. On the user’s end it is highly important that the outcome of any specific command is deterministic: all things being equal, exactly the same action performed multiple times always results in exactly the same response. It is also vital that the response produced by the system under the user’s control is unambiguous. In other words, the user must rest assured that the computer is doing exactly as they told it to. On the machine’s end this translates to the need for each instruction to be executed in a very specific context: when saving a file the computer needs to be told where to put it; when processing a pizza order the computer needs to be told what size, dough, and toppings you want. Each value comes from a predefined list, be it the folders on the file system or the options offered by the pizza place. From this point of view the computer must be certain that it has gathered all necessary input from the user, and the user must understand that they are the one controlling such input. Formally put, a valid context template needs to exist in order for the computer to produce a deterministic result in response to the user’s action. The only way to fill out such a context template is by prompting the user for further action and then making sure that the system’s interpretation of the context is the same as that of the user (unless you’re running a surprise pizza delivery service where you’ve given the computer free rein, but even then the user must have told the service beforehand that a surprise is what they want to order!).
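To make the idea of a context template a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the PizzaOrder class, the option lists, and the helper names are hypothetical and do not describe any real ordering service. It only shows that an order cannot be executed until every slot is filled, and that each slot accepts values from a predefined list only.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Hypothetical menu: these option lists are illustrative assumptions.
ALLOWED = {
    "size": ("small", "medium", "large"),
    "dough": ("thin", "thick"),
    "topping": ("margherita", "pepperoni", "mushroom"),
}


@dataclass
class PizzaOrder:
    """A context template: every slot must be filled before the order can run."""
    size: Optional[str] = None
    dough: Optional[str] = None
    topping: Optional[str] = None


def missing_slots(order: PizzaOrder) -> List[str]:
    """Return the names of the slots the user still has to be prompted for."""
    return [f.name for f in fields(order) if getattr(order, f.name) is None]


def fill_slot(order: PizzaOrder, slot: str, value: str) -> None:
    """Accept a value only if it comes from the predefined list for that slot."""
    if value not in ALLOWED[slot]:
        raise ValueError(f"{value!r} is not a valid {slot}")
    setattr(order, slot, value)


def is_complete(order: PizzaOrder) -> bool:
    """The order may be executed only when no slot is left empty."""
    return not missing_slots(order)
```

Whatever the front end looks like, something has to keep asking until missing_slots comes back empty. A form does that by showing the empty fields; a chatbot has to do it by asking.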

We are considering only the case of acting in good faith, that is, we are introducing the chatbot pizza ordering service in order to help our callers choose faster, with less hassle, and, very importantly, in order for them to return to the chatbot instead of calling our phone line and tying up our human staff unnecessarily the next time they want pizza delivered. There are certainly cases in which adding confusion to the medium could even push sales upwards, and there are also cases where by protracting the conversation the chatbot could collect valuable insight into the user’s habits and preferences, but those are not our concern in a business system.

UI as Map

From a bird’s-eye view, any software controlled by the user is tasked with the mapping of user actions to outcomes, be it carrying out a search request in response to a query, processing a payment on the clicking of a checkout button, creating or modifying a resource when the user hits the upload/rename button etc. No matter how complex and configurable, business systems are not infinitely versatile. The set of outcomes a business system can produce is well-defined, and, more importantly, finite. In a traditional (non-natural-language-enabled) UI setting the finite set of outcomes is mapped to a finite number of input states, such as domain-specific formal language expressions (like shell commands or protocol instructions), valid sets of UI control states on a screen, or configurations entered via some dedicated hardware control panel. Truly free input, if any, is restricted to variables that do not affect the flow of control inconsistently. Because the number of input states is finite, those can be described in a user manual exhaustively, along with all possible outcomes. Delegating the UI function to a conversational agent on the other hand produces a system mapping a finite set of outcomes (because we’re still living in the present, where HAL 9000 has not been invented yet) to a virtually infinite set of input states enabled by free input. Both user-controlled variables and the flow of control are to be inferred from it. For such a system we will only be able to provide an exhaustive documentation of the possible outcomes, with the possible inputs remaining hidden from the user. And because the set of possible outcomes is not in any way expanded by the free input, in terms of the number and complexity of user-computer interactions necessary to attain any given output, at best we can hope to achieve parity between a traditional UI and one implemented by way of a conversational agent.
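The finiteness argument can be illustrated with a small Python sketch. The control names and values below are hypothetical, chosen only for the example. With a fixed set of controls, every input state can be enumerated and paired with its outcome, which is exactly what an exhaustive user manual does.

```python
from itertools import product

# Hypothetical controls of a small file-management screen (illustrative only).
CONTROLS = {
    "action": ("upload", "rename", "delete"),
    "visibility": ("private", "public"),
}


def input_states(controls):
    """Enumerate every valid combination of control states."""
    names = list(controls)
    return [dict(zip(names, combo)) for combo in product(*controls.values())]


def outcome(state):
    """Map one input state to exactly one outcome, here just a label."""
    return f"{state['action']}:{state['visibility']}"


# The "user manual": an exhaustive table of input states and their outcomes.
STATES = input_states(CONTROLS)
MANUAL = {tuple(state.values()): outcome(state) for state in STATES}
```

Three actions times two visibility settings give six input states and six documented outcomes. No comparable table can be written for free text: the six outcomes would still be there, but the strings that lead to each of them cannot be listed.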

Chatbot vs. GUI

Irrespective of organization size and structure, introducing a new business system requires careful planning and consideration. In a typical scenario, a project sponsor would advocate for the innovation and be held accountable for its success by senior management. Such success will be measured in terms of cost savings, client satisfaction, net promoter score, to name a few. Some organizations would go so far as to record the number of actions (clicks) the user takes in order to achieve a given outcome. Below I’d like to focus on a couple of concrete measures that could be useful in determining the utility of a chatbot in a business-system setting. The relevance of each measure doesn’t seem affected across the entire range of chatbot technology, from simple string-based pattern-matching to full-fledged deep-learning models.

1. Time complexity of establishing context

For a traditional GUI, the collecting of user input, such as a search query or the name of the file being uploaded, always takes place in the immediate context in which such input will be consumed. That context is determined by activating the specific input field (for instance, you type the file name in the fileName input field mapping to the underlying variable). In formal terms, determining such context has a running time complexity of O(1). To achieve the same constant time complexity in a free, natural-language input setting, one needs to know exactly what to “say” to the system and exactly “how”. Any ambiguity in the user’s input will need to be resolved by follow-up questions from the chatbot, so the cost of establishing context grows with the number of clarifications required instead of staying constant.
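A toy model in Python may help to picture the difference. It rests on loud assumptions: the slot names are hypothetical, and the chatbot is assumed to ask exactly one follow-up question per slot it could not resolve from the opening message, which is the friendliest case imaginable. Real systems may need more turns than that.

```python
# Hypothetical slots of a flight search (illustrative only).
REQUIRED_SLOTS = ("origin", "destination", "date")


def bind_field(form, field, value):
    """GUI case: the active field names the context, so binding is one direct step."""
    form[field] = value
    return form


def chatbot_turns(resolved_slots):
    """Chatbot case: one opening message plus one follow-up per unresolved slot.

    resolved_slots holds the slots the bot managed to extract from the
    opening message (an assumption of this toy model).
    """
    unresolved = [slot for slot in REQUIRED_SLOTS if slot not in resolved_slots]
    return 1 + len(unresolved)
```

A power user who phrases everything the way the bot expects gets away with a single turn. A user whose opening message resolves nothing needs one turn per slot on top of it, so the number of turns grows with the size of the context template.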

2. Client satisfaction

This one is hard to reason about in quantitative terms, but we can think of some central trait that we know for sure will affect client satisfaction, such as the ability to accomplish a simple task in the system. Failure to do so with a reasonable amount of effort should definitely be a cause for concern.

For a simple outcome, such as posting on a blog, the context template in a traditional GUI setting is most likely laid out before the user’s eyes on a single screen. More complex actions may require the activation of context menus or even switching between multiple tabs, yet all fields in the context template are perceived as points on a single flat surface. Once the GUI is abandoned for a conversational agent, the mental picture of the context template becomes more akin to a graph structure: multiple partial paths along the graph have to be persisted in the user’s own biological memory to allow for back-offs and retries in case the natural-language interaction results in a dead end or in an undesired potential outcome. A simple example would be booking a flight for a given stretch on a given date. If I am not satisfied with the offered fare, I can easily modify my query by choosing different flight dates or another airline from the GUI controls on a single screen and take in all the resulting changes as a whole. Even assuming that the alternative chatbot UI is able to correctly preserve the context of my initial query, I am still left with the burden of remembering the query parameters that are left unchanged along the partial path on the graph that would eventually lead to the alternative offer. Any visual aids the system may resort to in an attempt to help me out in this situation would at best be poorly disguised GUI controls, serving directly the same needs that the imaginary chatbot is serving by way of circumlocution.

It turns out that to accomplish a simple, straightforward task through the agency of a chatbot, one has either to be a true power user of the system who knows exactly how to talk to it in order to obtain the desired effect in the fewest attempts, or, if genuinely buying into the idea of a productive conversation with the machine, to be prepared to buffer in one’s own memory an arbitrary number of parameter values which would otherwise be laid out on a single screen. Neither translates to a positive client experience for most of us.

3. Cost

Cost overruns in an IT project are as certain as the law of gravity. A complicating factor in data-driven projects is that they typically require the involvement of client resources and are run in an iterative fashion, where we aim to improve overall performance in small increments with each iteration. Regressions are to be expected, and in some use cases the finished system, trained and tuned on data which we’ve been pushing staff extra hard to collect, will have already become outdated by the time it’s ready to be put to productive use. Even if the risk is clearly highlighted, given the novelty of conversational assistants and the undisputed need for the client to have skin in the game, we can’t know if we’re doing well until late in the implementation cycle. In more traditional projects, such as overhauling a website, introducing some new feature or replacing a back-end database engine, the client can draw from in-house IT resources. When implementing a chatbot, by contrast, they are entirely at the mercy of the provider. The provider is the expert, and the provider in turn can only draw from a chatbot guru pool of limited size. In such settings it is difficult to assess the fitness of any proposed approach before you can actually start posting lines to the bot. By then the client will have been required to collect training examples and assign senior subject-matter experts to the project to provide the necessary support to the implementation team. Having spent this much time and effort, the client is likely to fall victim to the sunk cost fallacy should they find out the advertised chatbot function does little to address their underlying needs.

One could argue that the same concerns are equally valid for any other effort to improve customer engagement by technical means. One important difference is that a chatbot interface comes in addition to an existing UI. Even the Alexas of today are yet another entry point to the same established channels powered by a traditional GUI. Given that a chatbot interface can at best perform on a par with a GUI, not having the right GUI in place renders futile any subsequent attempts at diversifying the system of engagement. Given the uncertainty of the return on investing in rethinking a system of engagement, it seems a safer bet to put our effort into, say, rethinking UX by shortening the path from a blank page to a desired outcome than into building a data-driven text-based UI from scratch.

Conclusion

Conversational agents are fascinating applications, likely to be considered outside of the strict technological domain because their very purpose is to mimic sentient exchange. Yet, when incorporated into a business system, the chatbot’s function is strictly defined by its inputs and outputs just like any other user interface. One should then discuss the chatbot in the same terms as the alternatives. In the light of such discussion it seems that a conversational agent could add little value to a business system mapping from a finite set of possible user actions to a finite set of specific outcomes.

Outside of the domain of business systems a conversational agent can play an important role as a system of engagement that does not purport to improve conversion or accomplish any given task in an efficient manner. Shortly before the latest rise in commercial interest in chatbot applications, I developed a conversational agent of my own as part of a community online radio station. It was the only means for a visitor to ask for a specific tune to be played next. There was evidence that the chatbot, implemented in the powerful AIML framework, did indeed keep visitors engaged for some time while the radio was playing tunes by independent artists. I’d like to think that, crude and inefficient as it was, it still had a positive impact on retention and that it served to improve listen times and thus popularize the great work contributed by our patrons. However, as an interface to the underlying database it suffered from the deficiencies already discussed: a drop-down menu of artists and tunes would’ve been much easier to implement and more efficient to use. I’d hate to see a chatbot like it standing between a willing customer and the checkout button.