Tristan Miller

Austrian Research Institute for Artificial Intelligence · Freyung 6/6 · 1010 Vienna · Austria
+43 1 5336112 12 · tristan@logological.org

I'm a computational linguist with research interests in lexical semantics, historical online corpora, and computational detection and interpretation of humour. I currently head the Computational Pun-derstanding: Computer-Assisted Translation of Humorous Wordplay project at the Austrian Research Institute for Artificial Intelligence (OFAI).



Publications

Waltraud Kolb and Tristan Miller.
Human–computer interaction in pun translation.
In James Hadley, Kristiina Taivalkoski-Shilov, Carlos S. C. Teixeira, and Antonio Toral, editors, Using Technologies for Creative-Text Translation. Routledge, 2022. To appear.
We present and evaluate PunCAT, an interactive electronic tool for the translation of puns. Following the strategies known to be applied in pun translation, PunCAT automatically translates each sense of the pun separately; it then allows the user to explore the semantic fields of these translations in order to help construct a plausible target-language solution that maximizes the semantic correspondence to the original. Our evaluation is based on an empirical pilot study in which the participants translated puns from a variety of published sources from English into German, with and without PunCAT. We aimed to answer the following questions: Does the tool support, improve, or constrain the translation process, and if so, in what ways? And what are the tool's main benefits and drawbacks as perceived and described by the participants? Our analysis of the translators' cognitive processes gives us insight into their decision-making strategies and how they interacted with the tool. We find clear evidence that PunCAT effectively supports the translation process in terms of stimulating brainstorming and broadening the translator's pool of solution candidates. We have also identified a number of directions in which the tool could be adapted to better suit translators' work processes.
@incollection{kolb2022human,
author       = {Waltraud Kolb and Tristan Miller},
editor       = {James Hadley and Kristiina Taivalkoski-Shilov and Carlos S. C. Teixeira and Antonio Toral},
title        = {Human--Computer Interaction in Pun Translation},
booktitle    = {Using Technologies for Creative-Text Translation},
year         = {2022},
publisher    = {Routledge},
note         = {To appear},
}
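The two-stage strategy described in the abstract — translate each sense of the pun separately, then explore the semantic fields of those translations for a target-language candidate preserving the double meaning — can be sketched in miniature. The lexicon, semantic fields, and function names below are toy illustrations, not PunCAT's actual data or implementation:

```python
# Hypothetical sketch of the pun-translation strategy described above.
# Each pun sense is translated separately; the semantic fields of the
# translations are then intersected to find target-language words that
# could carry both meanings. All entries here are toy data.

# Toy English->German sense lexicon: each (word, sense) pair maps to a
# translation plus a small "semantic field" of related target words.
# "Bank" is listed in both fields purely for illustration.
SENSE_LEXICON = {
    ("bank", "finance"): {"translation": "Bank", "field": {"Bank", "Geld", "Konto"}},
    ("bank", "river"):   {"translation": "Ufer", "field": {"Ufer", "Bank", "Fluss"}},
}

def candidate_solutions(senses):
    """Return target-language words that occur in the semantic field of
    every sense of the pun, i.e. candidates preserving the wordplay."""
    fields = [SENSE_LEXICON[s]["field"] for s in senses]
    return sorted(set.intersection(*fields))

print(candidate_solutions([("bank", "finance"), ("bank", "river")]))
```

In practice the semantic fields would come from lexical resources rather than a hand-built table, and the translator — not the tool — makes the final choice among candidates.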
Tristan Miller, Anthony Cohn, Tiansi Dong, Christian Hempelmann, Siba Mohsen, and Julia Rayz.
Can we diagram the understanding of humour?
Dagstuhl Reports, 11(8):33, 2022. ISSN 2192-5283.
Cartoons can be understood without language. That is, a suitably arranged scene of simple objects, with no accompanying text, is often enough to make us laugh – evidence that thinking (mental activity) happens before language. This raises the question of non-linguistic diagrammatic representation of spatial humour, along with the mechanism of neural computation. In particular, we raise the following questions: (1) How can we diagrammatically formalise spatial humour? (2) How can these diagrammatic formalisms be processed by neural networks? (3) How can this neural computation deliver high-level schema that are similar to the script-opposition semantic theory of humour? The spatial knowledge encoded in the scene can activate the necessary spatial and non-spatial knowledge. By what neural associative mechanism or process of reasoning do we put this all together to “get” the joke? During the seminar, we aimed to make some headway towards establishing (1) exactly what sort of scene-specific and common-sense knowledge is required to understand any given cartoon, (2) what part of this knowledge could in principle be acquired by existing machine learning (ML) techniques, and which could be acquired or encoded through symbolic structures, (3) what activation process acquires the rest of the knowledge required to interpret the humour, and (4) whether there is a unified representation that could represent this knowledge in a computer’s working memory.
@article{miller2022can,
author       = {Tristan Miller and Anthony Cohn and Tiansi Dong and Christian Hempelmann and Siba Mohsen and Julia Rayz},
title        = {Can We Diagram the Understanding of Humour?},
journal      = {Dagstuhl Reports},
volume       = {11},
number       = {8},
pages        = {33},
year         = {2022},
issn         = {2192-5283},
}
Liana Ermakova, Tristan Miller, Orlane Puchalski, Fabio Regattin, Élise Mathurin, Sílvia Araújo, Anne-Gwenn Bosser, Claudine Borg, Monika Bokiniec, Gaelle Le Corre, Benoît Jeanjean, Radia Hannachi, Ġorġ Mallia, Gordan Matas, and Mohamed Saki.
CLEF Workshop JOKER: Automatic wordplay and humour translation.
In Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty, editors, Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II, Lecture Notes in Computer Science, pages 355–363, Berlin, Heidelberg, April 2022. Springer. ISBN 978-3-030-99738-0. DOI: 10.1007/978-3-030-99739-7_45.
Humour remains one of the most difficult aspects of intercultural communication: understanding humour often requires understanding implicit cultural references and/or double meanings, and this raises the question of the (un)translatability of humour. Wordplay is a common source of humour in literature, journalism, and advertising due to its attention-getting, mnemonic, playful, and subversive character. The translation of humour and wordplay is therefore in high demand. Modern translation depends heavily on technological aids, yet few works have treated the automation of humour and wordplay translation and the creation of humour corpora. The goal of the JOKER workshop is to bring together translators and computer scientists to work on an evaluation framework for creative language, including data and metric development, and to foster work on automatic methods for wordplay translation. We propose three pilot tasks: (1) classify and explain instances of wordplay, (2) translate single words containing wordplay, and (3) translate entire phrases containing wordplay.
@inproceedings{ermakova2022clef,
author       = {Liana Ermakova and Tristan Miller and Orlane Puchalski and Fabio Regattin and Élise Mathurin and Sílvia Araújo and Anne-Gwenn Bosser and Claudine Borg and Monika Bokiniec and Gaelle Le Corre and Benoît Jeanjean and Radia Hannachi and Ġorġ Mallia and Gordan Matas and Mohamed Saki},
editor       = {Matthias Hagen and Suzan Verberne and Craig Macdonald and Christin Seifert and Krisztian Balog and Kjetil Nørvåg and Vinay Setty},
title        = {{CLEF} {Workshop} {JOKER}: Automatic Wordplay and Humour Translation},
booktitle    = {Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10--14, 2022, Proceedings, Part II},
pages        = {355--363},
series       = {Lecture Notes in Computer Science},
month        = apr,
year         = {2022},
publisher    = {Springer},
address      = {Berlin, Heidelberg},
isbn         = {978-3-030-99738-0},
issn         = {0302-9743},
doi          = {10.1007/978-3-030-99739-7_45},
}
Jörg Wöckener, Thomas Haider, Tristan Miller, The-Khang Nguyen, Thanh Tung Linh Nguyen, Minh Vu Pham, Jonas Belouadi, and Steffen Eger.
End-to-end style-conditioned poetry generation: What does it take to learn from examples alone?
In Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021), pages 57–66, November 2021.
In this work, we design an end-to-end model for poetry generation based on conditioned recurrent neural network (RNN) language models whose goal is to learn stylistic features (poem length, sentiment, alliteration, and rhyming) from examples alone. We show this model successfully learns the ‘meaning’ of length and sentiment, as we can control it to generate longer or shorter as well as more positive or more negative poems. However, the model does not grasp sound phenomena like alliteration and rhyming, but instead exploits low-level statistical cues. Possible reasons include the size of the training data, the relatively low frequency and difficulty of these sublexical phenomena as well as model biases. We show that more recent GPT-2 models also have problems learning sublexical phenomena such as rhyming from examples alone.
@inproceedings{woeckener2021end,
author       = {J{\"{o}}rg W{\"{o}}ckener and Thomas Haider and Tristan Miller and The-Khang Nguyen and Thanh Tung Linh Nguyen and Minh Vu Pham and Jonas Belouadi and Steffen Eger},
title        = {End-to-end Style-Conditioned Poetry Generation: {What} Does It Take to Learn from Examples Alone?},
booktitle    = {Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021)},
pages        = {57--66},
month        = nov,
year         = {2021},
}
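The conditioning setup described in the abstract can be illustrated with a minimal data-preparation sketch: discrete control codes for the target stylistic features are prepended to each training example, so a language model can learn to associate them with the text that follows. The token names and length thresholds here are illustrative assumptions, not the paper's exact scheme:

```python
# Illustrative sketch (not the paper's exact scheme): encode target style
# features as control tokens prepended to each poem, so a conditioned
# language model can learn to generate text matching those features.

def length_bucket(n_lines):
    """Bucket poem length into coarse control codes (toy thresholds)."""
    if n_lines <= 4:
        return "<LEN_SHORT>"
    if n_lines <= 10:
        return "<LEN_MED>"
    return "<LEN_LONG>"

def conditioned_example(poem_lines, sentiment):
    """Prepend control tokens for length and sentiment to the poem text."""
    codes = [length_bucket(len(poem_lines)), f"<SENT_{sentiment.upper()}>"]
    return " ".join(codes) + "\n" + "\n".join(poem_lines)

example = conditioned_example(["Roses are red,", "violets are blue."], "pos")
print(example.splitlines()[0])  # -> "<LEN_SHORT> <SENT_POS>"
```

At generation time, the same control tokens are supplied as the prompt; the abstract's finding is that this works for length and sentiment but not for sublexical features like rhyme.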
Alexandra Uma, Tommaso Fornaciari, Anca Dumitrache, Tristan Miller, Jon Chamberlain, Barbara Plank, Edwin Simpson, and Massimo Poesio.
SemEval-2021 Task 12: Learning with disagreements.
In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 338–347, August 2021. ISBN 978-1-954085-70-1. DOI: 10.18653/v1/2021.semeval-1.41.
Disagreement between coders is ubiquitous in virtually all datasets annotated with human judgements in both natural language processing and computer vision. However, most supervised machine learning methods assume that a single preferred interpretation exists for each item, which is at best an idealization. The aim of the SemEval-2021 shared task on Learning with Disagreements (Le-wi-Di) was to provide a unified testing framework for methods for learning from data containing multiple and possibly contradictory annotations covering the best-known datasets containing information about disagreements for interpreting language and classifying images. In this paper we describe the shared task and its results.
@inproceedings{uma2021semeval,
author       = {Alexandra Uma and Tommaso Fornaciari and Anca Dumitrache and Tristan Miller and Jon Chamberlain and Barbara Plank and Edwin Simpson and Massimo Poesio},
title        = {{SemEval}-2021 {Task}~12: Learning with Disagreements},
booktitle    = {Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)},
pages        = {338--347},
month        = aug,
year         = {2021},
isbn         = {978-1-954085-70-1},
doi          = {10.18653/v1/2021.semeval-1.41},
}
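One common way to learn from data containing multiple, possibly contradictory annotations — rather than forcing a single adjudicated label — is to train against the soft label distribution implied by the annotators' votes. The sketch below shows that idea in its simplest form; it is a generic illustration, not a description of any particular Le-wi-Di system:

```python
# Minimal sketch of soft-label training targets: instead of collapsing
# annotator disagreement into one "gold" label, the votes are turned into
# a probability distribution, and the model is scored against it.
import math
from collections import Counter

def soft_label(annotations, labels):
    """Convert raw annotator votes into a distribution over labels."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return [counts[label] / total for label in labels]

def soft_cross_entropy(pred_probs, target_probs):
    """Cross-entropy of model predictions against the soft target."""
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, pred_probs) if t > 0)

labels = ["funny", "not_funny"]
target = soft_label(["funny", "funny", "not_funny"], labels)  # [2/3, 1/3]
loss = soft_cross_entropy([0.7, 0.3], target)
```

A model that matches the annotators' split exactly minimizes this loss, whereas under a hard majority label it would be penalized for reflecting genuine disagreement.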

Projects

Funded research projects

Events & organizations

Software

Publishing & documentation


Miscellany

My interests in language, math, and computers were sparked and strengthened by exposure to the works of Willard R. Espy, Louis Phillips, Mike Keith, Dmitri Borgmann, Jim Butterfield, and others. These writers share a great talent for making technical or linguistic topics fun and accessible to a general audience. You can check out my own contributions to popular and recreational mathematics and linguistics, plus a few other odds and ends.

I also maintain an index of miscellaneous documents and websites I've produced which don't really fit into any other section.