This thesis focuses on the one-to-many problem in interactive AI systems and the unexplored challenges it raises. In most machine learning tasks, there is enough uncertainty that a given input usually admits more than one plausible output. Failing to account for this explicitly can lead to models that are unable to represent the full range of possible outputs. Interactive AI systems are those in which the output is a function not only of a static input but also of some interaction from a user. Concretely, this includes dialogue systems, where a user interacts with a chatbot, and interactive image segmentation systems, where the final AI-produced segmentation is guided by a user-provided bounding box or click. In these cases, the user's intent is a key source of uncertainty. For instance, if a user clicks on a wheel in an image of a car, do they want the model to segment the entire car, all four wheels, or just that single wheel? It is precisely this user-induced uncertainty that forms the focal point of this thesis. The thesis is divided into two main parts. In Part I, we consider implicitly one-to-many tasks: tasks where there are multiple plausible outputs, but the final task still requires predicting a single output. In Part II, we consider explicitly one-to-many tasks: tasks in which the end goal is to produce multiple outputs. Within Part I, we first address what to do when there is a mismatch between the output the model wants to generate and the outputs available to it. We then handle the related problem of ensuring that the model is capable of learning the full range of outputs for a particular task. Next, we generalise our methods in two directions: to large language models, which perform strongly across a range of tasks without requiring bespoke training, and to other modalities such as images, showing that our techniques do not require wholesale redesign when the modality changes.
Lastly, in Part I, we show how our techniques can handle a user who provides new information through their behaviour over time. In Part II, we first demonstrate a simulation-based approach for smart reply systems which, in contrast to previous ad hoc methods, provides a principled solution to generating a set of relevant and diverse reply suggestions. Finally, we address model architectures that are inherently deterministic (e.g. those used for image segmentation), and we demonstrate a recurrent neural network approach that allows for multiple predictions without requiring any major alterations to the underlying base architecture. Overall, this thesis presents a variety of technical contributions that address a range of challenges pertaining to the one-to-many problem in interactive AI systems by combining techniques such as simulation, knowledge distillation, and bootstrapping.
Benjamin Towle studied this question.