Tristan Miller

Department of Computer Science · University of Manitoba
+1 204 474 8313 tristan@logological.org

I'm a computational linguist with research interests in lexical semantics, historical online corpora, and computational detection and interpretation of humour.



Publications

Tristan Miller.
Preparing Horizon Europe proposals in LaTeX with heria.
TUGboat: The Communications of the TeX Users Group, 45(1):59–64, 2024. ISSN 0896-3207. DOI: 10.47397/tb/45-1/tb139miller-horizon.
This article introduces heria, a LaTeX class to format funding proposals for the European Commission's Horizon Europe program. It provides a basic summary of the class's use; compares it to existing packages for funding proposals; discusses its motivations, design decisions, and limitations; and reports on its real-world use and plans for future development. Besides providing prospective Horizon Europe applicants with an overview of the class, this article may give prospective developers and users of classes for other proposal types some idea of the work involved and the potential pitfalls.
@article{miller2024preparing,
author       = {Tristan Miller},
title        = {Preparing {Horizon} {Europe} Proposals in {\LaTeX}{} with {heria}},
journal      = {{TUGboat}: The Communications of the {\TeX}{} {Users} {Group}},
volume       = {45},
number       = {1},
pages        = {59--64},
year         = {2024},
issn         = {0896-3207},
doi          = {10.47397/tb/45-1/tb139miller-horizon},
}
Clara Swaboda and Tristan Miller.
On the use of scale distortion for visual humour: A preliminary analysis.
European Journal of Humour Research, 12(2):206–211, June 2024. ISSN 2307-700X. DOI: 10.7592/EJHR.2024.12.2.904.
In contrast to verbal humour, visual humour remains a relatively underdeveloped area of research. In this exploratory study, we investigate whether scale incongruity – i.e., discrepancy between the expected and actual experience of the size of an object – can serve as a source of humour in the visual modality. We adapt a pre-existing visual data set of mundane scenes by altering the size of an individual object in each scene and collecting humorousness ratings from human annotators on the original and scale-distorted versions. Our analysis of these annotations reveals that scenes with distorted objects are perceived to be significantly funnier than the original images.
@article{swaboda2024use,
author       = {Clara Swaboda and Tristan Miller},
title        = {On the Use of Scale Distortion for Visual Humour: {A} Preliminary Analysis},
journal      = {European Journal of Humour Research},
volume       = {12},
number       = {2},
pages        = {206--211},
month        = jun,
year         = {2024},
issn         = {2307-700X},
doi          = {10.7592/EJHR.2024.12.2.904},
}
Liana Ermakova, Anne-Gwenn Bosser, Tristan Miller, Tremaine Thomas-Young, Victor Manuel Palma Preciado, Grigori Sidorov, and Adam Jatowt.
CLEF 2024 JOKER lab: Automatic humour analysis.
In Nazli Goharian, Nicola Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald, and Iadh Ounis, editors, Advances in Information Retrieval: 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24–28, Proceedings, Part VI, volume 14613 of Lecture Notes in Computer Science (ISSN 0302-9743), pages 36–43, Cham, March 2024. Springer. ISBN 978-3-031-56072-9. DOI: 10.1007/978-3-031-56072-9_5.
The JOKER Lab at the Conference and Labs of the Evaluation Forum (CLEF) aims to foster research on automated processing of verbal humour, including tasks such as retrieval, classification, interpretation, generation, and translation. While humour remains a cornerstone of human social interaction, despite the heady success of large language models for numerous natural language applications, humour and wordplay automatic processing are far from being a solved problem. JOKER brings together experts from the social and computational sciences and encourages them to collaborate on shared tasks with quality-controlled annotated datasets. In 2024, we will offer entirely new shared tasks on fine-grained sentiment analysis and classification of humour and humour-aware information retrieval. As in the past JOKER Labs, we will make our data available for an unshared task that solicits novel use cases. In this paper, we provide a brief retrospective on the JOKER Labs, with a focus on the results and lessons learnt from last year's iteration, and we preview the tasks to be held at JOKER 2024.
@inproceedings{ermakova2024clef,
author       = {Liana Ermakova and Anne-Gwenn Bosser and Tristan Miller and Tremaine Thomas-Young and Victor Manuel {Palma Preciado} and Grigori Sidorov and Adam Jatowt},
editor       = {Nazli Goharian and Nicola Tonellotto and Yulan He and Aldo Lipani and Graham McDonald and Craig Macdonald and Iadh Ounis},
title        = {{CLEF} 2024 {JOKER} Lab: Automatic Humour Analysis},
booktitle    = {Advances in Information Retrieval: 46th {European} {Conference} on {Information} {Retrieval}, {ECIR} 2024, {Glasgow}, {UK}, {March} 24--28, Proceedings, Part {VI}},
volume       = {14613},
pages        = {36--43},
series       = {Lecture Notes in Computer Science},
month        = mar,
year         = {2024},
publisher    = {Springer},
address      = {Cham},
isbn         = {978-3-031-56072-9},
issn         = {0302-9743},
doi          = {10.1007/978-3-031-56072-9_5},
}
Liana Ermakova, Anne-Gwenn Bosser, Adam Jatowt, and Tristan Miller.
The JOKER Corpus: English–French parallel data for multilingual wordplay recognition.
In SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2796–2806, New York, NY, July 2023. Association for Computing Machinery. ISBN 978-1-4503-9408-6. DOI: 10.1145/3539618.3591885.
Despite recent advances in information retrieval and natural language processing, rhetorical devices that exploit ambiguity or subvert linguistic rules remain a challenge for such systems. However, corpus-based analysis of wordplay has been a perennial topic of scholarship in the humanities, including literary criticism, language education, and translation studies. The immense data-gathering effort required for these studies points to the need for specialized text retrieval and classification technology, and consequently for appropriate test collections. In this paper, we introduce and analyze a new dataset for research and applications in the retrieval and processing of wordplay. Developed for the JOKER track at CLEF 2023, our annotated corpus extends and improves upon past English wordplay detection datasets in several ways. First, we introduce hundreds of additional positive examples; second, we provide French translations for the examples; and third, we provide negative examples with characteristics closely matching those of the positive examples. This last feature helps ensure that AI models learn to effectively distinguish wordplay from non-wordplay, and not simply texts differing in length, style, or vocabulary. Our test collection represents then a step towards wordplay-aware multilingual information retrieval.
@inproceedings{ermakova2023joker,
author       = {Liana Ermakova and Anne-Gwenn Bosser and Adam Jatowt and Tristan Miller},
title        = {The {JOKER} {Corpus}: {English}--{French} Parallel Data for Multilingual Wordplay Recognition},
booktitle    = {{SIGIR} '23: Proceedings of the 46th {International} {ACM} {SIGIR} {Conference} on {Research} and {Development} in {Information} {Retrieval}},
pages        = {2796--2806},
month        = jul,
year         = {2023},
publisher    = {Association for Computing Machinery},
address      = {New York, NY},
isbn         = {978-1-4503-9408-6},
doi          = {10.1145/3539618.3591885},
}

Projects

Funded research projects

Events & organizations

Software

Publishing & documentation


Miscellany

My interests in language, math, and computers were sparked and strengthened by exposure to the works of Willard R. Espy, Louis Phillips, Mike Keith, Dmitri Borgmann, Jim Butterfield, and others. These writers share a great talent for making technical or linguistic topics fun and accessible to a general audience. You can check out my own contributions to popular and recreational mathematics and linguistics, plus a few other odds and ends.

I also maintain an index of miscellaneous documents and websites I've produced which don't really fit into any other section.