Tristan Miller

Department of Computer Science · University of Manitoba
+1 204 474 8313

I'm a computational linguist with research interests in lexical semantics, historical online corpora, and computational detection and interpretation of humour.


Tristan Miller.
Preparing Horizon Europe proposals in LaTeX with heria.
TUGboat: The Communications of the TeX Users Group, 45(1):59–64, 2024. ISSN 0896-3207. DOI: 10.47397/tb/45-1/tb139miller-horizon.
This article introduces heria, a LaTeX class to format funding proposals for the European Commission's Horizon Europe program. It provides a basic summary of the class's use; compares it to existing packages for funding proposals; discusses its motivations, design decisions, and limitations; and reports on its real-world use and plans for future development. Besides providing prospective Horizon Europe applicants with an overview of the class, this article may give prospective developers and users of classes for other proposal types some idea of the work involved and the potential pitfalls.
Liana Ermakova, Anne-Gwenn Bosser, Tristan Miller, Tremaine Thomas-Young, Victor Manuel Palma Preciado, Grigori Sidorov, and Adam Jatowt.
CLEF 2024 JOKER lab: Automatic humour analysis.
In Nazli Goharian, Nicola Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald, and Iadh Ounis, editors, Advances in Information Retrieval: 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24–28, Proceedings, Part VI, volume 14613 of Lecture Notes in Computer Science (ISSN 0302-9743), pages 36–43, Cham, March 2024. Springer. ISBN 978-3-031-56072-9. DOI: 10.1007/978-3-031-56072-9_5.
The JOKER Lab at the Conference and Labs of the Evaluation Forum (CLEF) aims to foster research on automated processing of verbal humour, including tasks such as retrieval, classification, interpretation, generation, and translation. Humour remains a cornerstone of human social interaction, yet despite the heady success of large language models for numerous natural language applications, the automatic processing of humour and wordplay is far from being a solved problem. JOKER brings together experts from the social and computational sciences and encourages them to collaborate on shared tasks with quality-controlled annotated datasets. In 2024, we will offer entirely new shared tasks on fine-grained sentiment analysis and classification of humour and humour-aware information retrieval. As in past JOKER Labs, we will make our data available for an unshared task that solicits novel use cases. In this paper, we provide a brief retrospective on the JOKER Labs, with a focus on the results and lessons learnt from last year's iteration, and we preview the tasks to be held at JOKER 2024.
Liana Ermakova, Anne-Gwenn Bosser, Adam Jatowt, and Tristan Miller.
The JOKER Corpus: English–French parallel data for multilingual wordplay recognition.
In SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2796–2806, New York, NY, July 2023. Association for Computing Machinery. ISBN 978-1-4503-9408-6. DOI: 10.1145/3539618.3591885.
Despite recent advances in information retrieval and natural language processing, rhetorical devices that exploit ambiguity or subvert linguistic rules remain a challenge for such systems. However, corpus-based analysis of wordplay has been a perennial topic of scholarship in the humanities, including literary criticism, language education, and translation studies. The immense data-gathering effort required for these studies points to the need for specialized text retrieval and classification technology, and consequently for appropriate test collections. In this paper, we introduce and analyze a new dataset for research and applications in the retrieval and processing of wordplay. Developed for the JOKER track at CLEF 2023, our annotated corpus extends and improves upon past English wordplay detection datasets in several ways. First, we introduce hundreds of additional positive examples; second, we provide French translations for the examples; and third, we provide negative examples with characteristics closely matching those of the positive examples. This last feature helps ensure that AI models learn to effectively distinguish wordplay from non-wordplay, and not simply texts differing in length, style, or vocabulary. Our test collection thus represents a step towards wordplay-aware multilingual information retrieval.
Tristan Miller, Camille Paloque-Bergès, and Avery Dame-Griff.
Remembering Netizens: An interview with Ronda Hauben, co-author of Netizens: On the History and Impact of Usenet and the Internet (1997).
Internet Histories: Digital Technology, Culture and Society, 7(1):76–98, 2022. ISSN 2470-1483. DOI: 10.1080/24701475.2022.2123120.
Netizens, Michael and Ronda Hauben's foundational treatise on Usenet and the Internet, was first published in print 25 years ago. In this piece, we trace the history and impact of the book and of Usenet itself, contextualising them within the contemporary and modern-day scholarship on virtual communities, online culture, and Internet history. We discuss the Net as a tool of empowerment, and touch on the social, technical, and economic issues related to the maintenance of shared network infrastructures and to the preservation and commodification of Usenet archives. Our interview with Ronda Hauben offers a retrospective look at the development of online communities, their impact, and how they are studied. She recounts her own introduction to the online world, as well as the impetus and writing process for Netizens. She presents Michael Hauben's conception of “netizens” as contributory citizens of the Net (rather than mere users of it) and the “electronic commons” they built up, and argues that this collaborative and collectivist model has been overwhelmed and endangered by the privatisation and commercialisation of the Internet and its communities.


Funded research projects

Events & organizations


Publishing & documentation


My interests in language, math, and computers were sparked and strengthened by exposure to the works of Willard R. Espy, Louis Phillips, Mike Keith, Dmitri Borgmann, Jim Butterfield, and others. These writers share a great talent for making technical or linguistic topics fun and accessible to a general audience. You can check out my own contributions to popular and recreational mathematics and linguistics, plus a few other odds and ends.

I also maintain an index of miscellaneous documents and websites I've produced which don't really fit into any other section.