Publications

This is a list of my scientific and technical publications. (My publications in humour, poetry, and recreational linguistics are in a separate list.)

This list is also available as a BibTeX file.

Anthony G. Cohn, Tiansi Dong, Christian F. Hempelmann, Tristan Miller, Siba Mohsen, and Julia Taylor Rayz.
Can we diagram the understanding of visual humour?
Dagstuhl Reports, 12, 2022. ISSN 2192-5283. To appear.
Purely visual humour, such as static cartoons, can be understood without language. That is, a suitably arranged scene of simple objects, with no accompanying text, is often enough to make us laugh—evidence that thinking (mental activity) happens before language. This raises the question of non-linguistic diagrammatic representation of visual humour, along with the mechanism of neural computation. In particular, we raise the following questions: (1) How can we diagrammatically formalise visual humour? (2) How can these diagrammatic formalisms be processed by neural networks? (3) How can this neural computation deliver high-level schemata that are similar to the script-opposition semantic theory of humour?
@article{cohn2022can,
author       = {Anthony G. Cohn and Tiansi Dong and Christian F. Hempelmann and Tristan Miller and Siba Mohsen and Julia Taylor Rayz},
title        = {Can We Diagram the Understanding of Visual Humour?},
journal      = {Dagstuhl Reports},
volume       = {12},
year         = {2022},
issn         = {2192-5283},
note         = {To appear},
}
Waltraud Kolb and Tristan Miller.
Human–computer interaction in pun translation.
In James Hadley, Kristiina Taivalkoski-Shilov, Carlos S. C. Teixeira, and Antonio Toral, editors, Using Technologies for Creative-Text Translation. Routledge, 2022. To appear.
We present and evaluate PunCAT, an interactive electronic tool for the translation of puns. Following the strategies known to be applied in pun translation, PunCAT automatically translates each sense of the pun separately; it then allows the user to explore the semantic fields of these translations in order to help construct a plausible target-language solution that maximizes the semantic correspondence to the original. Our evaluation is based on an empirical pilot study in which the participants translated puns from a variety of published sources from English into German, with and without PunCAT. We aimed to answer the following questions: Does the tool support, improve, or constrain the translation process, and if so, in what ways? And what are the tool's main benefits and drawbacks as perceived and described by the participants? Our analysis of the translators' cognitive processes gives us insight into their decision-making strategies and how they interacted with the tool. We find clear evidence that PunCAT effectively supports the translation process in terms of stimulating brainstorming and broadening the translator's pool of solution candidates. We have also identified a number of directions in which the tool could be adapted to better suit translators' work processes.
@incollection{kolb2022human,
author       = {Waltraud Kolb and Tristan Miller},
editor       = {James Hadley and Kristiina Taivalkoski-Shilov and Carlos S. C. Teixeira and Antonio Toral},
title        = {Human--Computer Interaction in Pun Translation},
booktitle    = {Using Technologies for Creative-Text Translation},
year         = {2022},
publisher    = {Routledge},
note         = {To appear},
}
Liana Ermakova, Tristan Miller, Orlane Puchalski, Fabio Regattin, Élise Mathurin, Sílvia Araújo, Anne-Gwenn Bosser, Claudine Borg, Monika Bokiniec, Gaelle Le Corre, Benoît Jeanjean, Radia Hannachi, Ġorġ Mallia, Gordan Matas, and Mohamed Saki.
CLEF Workshop JOKER: Automatic wordplay and humour translation.
In Proceedings of the 44th European Conference on Information Retrieval (ECIR 2022), Lecture Notes in Computer Science, Cham, Switzerland, April 2022. Springer. To appear.
Humour remains one of the most difficult aspects of intercultural communication: understanding humour often requires understanding implicit cultural references and/or double meanings, and this raises the question of the (un)translatability of humour. Wordplay is a common source of humour in literature, journalism, and advertising due to its attention-getting, mnemonic, playful, and subversive character. The translation of humour and wordplay is therefore in high demand. Modern translation depends heavily on technological aids, yet few works have treated the automation of humour and wordplay translation and the creation of humour corpora. The goal of the JOKER workshop is to bring together translators and computer scientists to work on an evaluation framework for creative language, including data and metric development, and to foster work on automatic methods for wordplay translation. We propose three pilot tasks: (1) classify and explain instances of wordplay, (2) translate single words containing wordplay, and (3) translate entire phrases containing wordplay.
@inproceedings{ermakova2022clef,
author       = {Liana Ermakova and Tristan Miller and Orlane Puchalski and Fabio Regattin and Élise Mathurin and Sílvia Araújo and Anne-Gwenn Bosser and Claudine Borg and Monika Bokiniec and Gaelle Le Corre and Benoît Jeanjean and Radia Hannachi and Ġorġ Mallia and Gordan Matas and Mohamed Saki},
title        = {{CLEF} {Workshop} {JOKER}: Automatic Wordplay and Humour Translation},
booktitle    = {Proceedings of the 44th European Conference on Information Retrieval (ECIR 2022)},
series       = {Lecture Notes in Computer Science},
month        = apr,
year         = {2022},
publisher    = {Springer},
address      = {Cham, Switzerland},
issn         = {0302-9743},
note         = {To appear},
}
Jörg Wöckener, Thomas Haider, Tristan Miller, The-Khang Nguyen, Thanh Tung Linh Nguyen, Minh Vu Pham, Jonas Belouadi, and Steffen Eger.
End-to-end style-conditioned poetry generation: What does it take to learn from examples alone?
In Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021), pages 57–66, November 2021.
In this work, we design an end-to-end model for poetry generation based on conditioned recurrent neural network (RNN) language models whose goal is to learn stylistic features (poem length, sentiment, alliteration, and rhyming) from examples alone. We show this model successfully learns the ‘meaning’ of length and sentiment, as we can control it to generate longer or shorter as well as more positive or more negative poems. However, the model does not grasp sound phenomena like alliteration and rhyming, but instead exploits low-level statistical cues. Possible reasons include the size of the training data, the relatively low frequency and difficulty of these sublexical phenomena as well as model biases. We show that more recent GPT-2 models also have problems learning sublexical phenomena such as rhyming from examples alone.
@inproceedings{woeckener2021end,
author       = {J{\"{o}}rg W{\"{o}}ckener and Thomas Haider and Tristan Miller and The-Khang Nguyen and Thanh Tung Linh Nguyen and Minh Vu Pham and Jonas Belouadi and Steffen Eger},
title        = {End-to-end Style-Conditioned Poetry Generation: {What} Does It Take to Learn from Examples Alone?},
booktitle    = {Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021)},
pages        = {57--66},
month        = nov,
year         = {2021},
}
Alexandra Uma, Tommaso Fornaciari, Anca Dumitrache, Tristan Miller, Jon Chamberlain, Barbara Plank, Edwin Simpson, and Massimo Poesio.
SemEval-2021 Task 12: Learning with disagreements.
In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 338–347, August 2021. ISBN 978-1-954085-70-1. DOI: 10.18653/v1/2021.semeval-1.41.
Disagreement between coders is ubiquitous in virtually all datasets annotated with human judgements in both natural language processing and computer vision. However, most supervised machine learning methods assume that a single preferred interpretation exists for each item, which is at best an idealization. The aim of the SemEval-2021 shared task on Learning with Disagreements (Le-wi-Di) was to provide a unified testing framework for methods for learning from data containing multiple and possibly contradictory annotations covering the best-known datasets containing information about disagreements for interpreting language and classifying images. In this paper we describe the shared task and its results.
@inproceedings{uma2021semeval,
author       = {Alexandra Uma and Tommaso Fornaciari and Anca Dumitrache and Tristan Miller and Jon Chamberlain and Barbara Plank and Edwin Simpson and Massimo Poesio},
title        = {{SemEval}-2021 {Task}~12: Learning with Disagreements},
booktitle    = {Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)},
pages        = {338--347},
month        = aug,
year         = {2021},
isbn         = {978-1-954085-70-1},
doi          = {10.18653/v1/2021.semeval-1.41},
}
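As a concrete (and heavily simplified) illustration of the learning-with-disagreements setting described above, the sketch below trains a linear classifier against the soft distribution of annotator votes rather than a single aggregated label. It is not the organizers' framework or any participant's system; the data, feature values, and function names are invented for illustration.

```python
# Minimal sketch of "learning from disagreement" via soft labels:
# instead of training on a single majority-vote label, we train a
# linear softmax model against the empirical distribution of annotator
# votes.  Illustrative only -- not any Le-wi-Di participant's system.
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_soft(X, vote_counts, classes=2, lr=0.1, epochs=200):
    """X: (n, d) item features; vote_counts: (n, classes) raw annotator votes."""
    soft_labels = vote_counts / vote_counts.sum(axis=1, keepdims=True)
    W = np.zeros((X.shape[1], classes))
    for _ in range(epochs):
        probs = softmax(X @ W)                        # model predictions
        grad = X.T @ (probs - soft_labels) / len(X)   # cross-entropy gradient
        W -= lr * grad
    return W

# Toy example: 3 items, 2 features, 5 annotators each.
X = np.array([[1.0, 0.2], [0.1, 1.0], [0.5, 0.5]])
votes = np.array([[5, 0], [1, 4], [3, 2]])            # disagreement on item 3
W = train_soft(X, votes)
print(softmax(X @ W))                                 # predicted label distributions
```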
Tristan Miller.
Dmitri Borgmann's rotas square articles.
Notes and Queries, 67(3):431–432, September 2020. ISSN 0029-3970. DOI: 10.1093/notesj/gjaa113.
In 1979 and 1980, Word Ways: The Journal of Recreational Linguistics printed a series of articles on the early history, religious symbolism, and cultural significance of the rotas square, an ancient Latin-language palindromic word square. The articles were attributed to Dmitri A. Borgmann, the noted American writer on wordplay and former editor of Word Ways. While they attracted little attention at the time, some 35 years after their publication (and 29 years after Borgmann's death), questions began to be raised about their authorship. There is much internal and external evidence that, taken together, compellingly supports the notion that Borgmann did not write the articles himself. This paper surveys this evidence and solicits help in identifying the articles' original source.
@article{miller2020dmitri,
author       = {Tristan Miller},
title        = {{Dmitri Borgmann's} Rotas Square Articles},
journal      = {Notes and Queries},
volume       = {67},
number       = {3},
pages        = {431--432},
month        = sep,
year         = {2020},
issn         = {0029-3970},
doi          = {10.1093/notesj/gjaa113},
}
Tristan Miller and Denis Auroux.
GPP, the generic preprocessor.
Journal of Open Source Software, 5(51), July 2020. ISSN 2475-9066. DOI: 10.21105/joss.02400.
In computer science, a preprocessor (or macro processor) is a tool that programmatically alters its input, typically on the basis of inline annotations, to produce data that serves as input for another program. Preprocessors are used in software development and document processing workflows to translate or extend programming or markup languages, as well as for conditional or pattern-based generation of source code and text. Early preprocessors were relatively simple string replacement tools that were tied to specific programming languages and application domains, and while these have since given rise to more powerful, general-purpose tools, these often require the user to learn and use complex macro languages with their own syntactic conventions. In this paper, we present GPP, an extensible, general-purpose preprocessor whose principal advantage is that its syntax and behaviour can be customized to suit any given preprocessing task. This makes GPP of particular benefit to research applications, where it can be easily adapted for use with novel markup, programming, and control languages.
@article{miller2020gpp,
author       = {Tristan Miller and Denis Auroux},
title        = {{GPP}, the Generic Preprocessor},
journal      = {Journal of Open Source Software},
volume       = {5},
number       = {51},
month        = jul,
year         = {2020},
issn         = {2475-9066},
doi          = {10.21105/joss.02400},
}
Tristan Miller.
Don't shun the pun: On the requirements and constraints for preserving ambiguity in the (machine) translation of humour.
In Mehrdad Sabetzadeh, Andreas Vogelsang, Sallam Abualhaija, Markus Borg, Fabiano Dalpiaz, Maya Daneva, Nelly C. Fernández, Xavier Franch, Davide Fucci, Vincenzo Gervasi, Eduard Groen, Renata Guizzardi, Andrea Herrmann, Jennifer Horkoff, Luisa Mich, Anna Perini, and Angelo Susi, editors, Joint Proceedings of REFSQ-2020 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 26th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2020), volume 2584 of CEUR Workshop Proceedings (ISSN 1613-0073), March 2020.
How do we know when a translation is good? This seemingly simple question has long dogged human practitioners of translation, and has arguably taken on even greater importance in today’s world of fully automatic, end-to-end machine translation systems. Much of the difficulty in assessing translation quality is that different translations of the same text may be made for different purposes, each of which entails a unique set of requirements and constraints. This difficulty is compounded by ambiguities in the source text, which must be identified and then preserved or eliminated according to the needs of the translation and the (apparent) intent of the source text. In this talk, I survey the state of the art in linguistics, computational linguistics, translation, and machine translation as it relates to the notion of linguistic ambiguity in general, and intentional humorous ambiguity in particular. I describe the various constraints and requirements of different types of translations and provide examples of how various automatic and interactive techniques from natural language processing can be used to detect and then resolve or preserve linguistic ambiguities according to these constraints and requirements. In the vein of the “Translator’s Amanuensis” proposed by Martin Kay, I outline some specific proposals concerning how the hitherto disparate work in the aforementioned fields can be connected with a view to producing “machine-in-the-loop” computer-assisted translation (CAT) tools to assist human translators in selecting and implementing pun translation strategies in furtherance of the translation requirements. Throughout the talk, I will attempt to draw links with how this research relates to the requirements engineering community.
@inproceedings{miller2020dont,
author       = {Tristan Miller},
editor       = {Mehrdad Sabetzadeh and Andreas Vogelsang and Sallam Abualhaija and Markus Borg and Fabiano Dalpiaz and Maya Daneva and Nelly C. Fernández and Xavier Franch and Davide Fucci and Vincenzo Gervasi and Eduard Groen and Renata Guizzardi and Andrea Herrmann and Jennifer Horkoff and Luisa Mich and Anna Perini and Angelo Susi},
title        = {Don't Shun the Pun: {On} the Requirements and Constraints for Preserving Ambiguity in the (Machine) Translation of Humour},
booktitle    = {Joint Proceedings of REFSQ-2020 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 26th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2020)},
volume       = {2584},
series       = {CEUR Workshop Proceedings},
month        = mar,
year         = {2020},
issn         = {1613-0073},
}
Tristan Miller, Erik-Lân Do Dinh, Edwin Simpson, and Iryna Gurevych.
Predicting the humorousness of tweets using Gaussian process preference learning.
Procesamiento del Lenguaje Natural, 64:37–44, March 2020. ISSN 1135-5948. DOI: 10.26342/2020-64-4.
Most humour processing systems to date make at best discrete, coarse-grained distinctions between the comical and the conventional, yet such notions are better conceptualized as a broad spectrum. In this paper, we present a probabilistic approach, a variant of Gaussian process preference learning (GPPL), that learns to rank and rate the humorousness of short texts by exploiting human preference judgments and automatically sourced linguistic annotations. We apply our system, which is similar to one that had previously shown good performance on English-language one-liners annotated with pairwise humorousness annotations, to the Spanish-language data set of the HAHA@IberLEF2019 evaluation campaign. We report system performance for the campaign's two subtasks, humour detection and funniness score prediction, and discuss some issues arising from the conversion between the numeric scores used in the HAHA@IberLEF2019 data and the pairwise judgment annotations required for our method.
@article{miller2020predicting,
author       = {Tristan Miller and Do Dinh, Erik-L{\^{a}}n and Edwin Simpson and Iryna Gurevych},
title        = {Predicting the Humorousness of Tweets Using {Gaussian} Process Preference Learning},
journal      = {Procesamiento del Lenguaje Natural},
volume       = {64},
pages        = {37--44},
month        = mar,
year         = {2020},
issn         = {1135-5948},
doi          = {10.26342/2020-64-4},
}
Tristan Miller.
Reinhold Aman, 1936–2019.
Humor: International Journal of Humor Research, 32(1):1–5, February 2020. ISSN 0933-1719. DOI: 10.1515/humor-2019-0085.
@article{miller2020reinhold,
author       = {Tristan Miller},
title        = {Reinhold {Aman}, 1936--2019},
journal      = {Humor: International Journal of Humor Research},
volume       = {32},
number       = {1},
pages        = {1--5},
month        = feb,
year         = {2020},
issn         = {0933-1719},
doi          = {10.1515/humor-2019-0085},
}
Tristan Miller.
Reinhold Aman (1936–2019).
The LINGUIST List, 30.4729, December 2019.
@article{miller2019reinhold,
author       = {Tristan Miller},
title        = {Reinhold {Aman} (1936--2019)},
journal      = {The LINGUIST List},
volume       = {30.4729},
month        = dec,
year         = {2019},
}
Tristan Miller.
The Punster's Amanuensis: The proper place of humans and machines in the translation of wordplay.
In Proceedings of the Second Workshop on Human-Informed Translation and Interpreting Technology (HiT-IT 2019), pages 57–64, September 2019. ISSN 2683-0078. DOI: 10.26615/issn.2683-0078.2019_007.
The translation of wordplay is one of the most extensively researched problems in translation studies, but it has attracted little attention in the fields of natural language processing and machine translation. This is because today's language technologies treat anomalies and ambiguities in the input as things that must be resolved in favour of a single “correct” interpretation, rather than preserved and interpreted in their own right. But if computers cannot yet process such creative language on their own, can they at least provide specialized support to translation professionals? In this paper, I survey the state of the art relevant to computational processing of humorous wordplay and put forth a vision of how existing theories, resources, and technologies could be adapted and extended to support interactive, computer-assisted translation.
@inproceedings{miller2019punsters,
author       = {Tristan Miller},
title        = {The Punster's Amanuensis: {The} Proper Place of Humans and Machines in the Translation of Wordplay},
booktitle    = {Proceedings of the Second Workshop on Human-Informed Translation and Interpreting Technology (HiT-IT 2019)},
pages        = {57--64},
month        = sep,
year         = {2019},
issn         = {2683-0078},
doi          = {10.26615/issn.2683-0078.2019_007},
}
Tristan Miller, Erik-Lân Do Dinh, Edwin Simpson, and Iryna Gurevych.
OFAI–UKP at HAHA@IberLEF2019: Predicting the humorousness of tweets using Gaussian process preference learning.
In Miguel Ángel García Cumbreras, Julio Gonzalo, Eugenio Martínez Cámara, Raquel Martínez Unanue, Paolo Rosso, Jorge Carrillo de Albornoz, Soto Montalvo, Luis Chiruzzo, Sandra Collovini, Yoan Guitiérrez, Salud Jiménez Zafra, Martin Krallinger, Manuel Montes y Gómez, Reynier Ortega-Bueno, and Aiala Rosá, editors, Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), volume 2421 of CEUR Workshop Proceedings (ISSN 1613-0073), pages 180–190, August 2019.
Most humour processing systems to date make at best discrete, coarse-grained distinctions between the comical and the conventional, yet such notions are better conceptualized as a broad spectrum. In this paper, we present a probabilistic approach, a variant of Gaussian process preference learning (GPPL), that learns to rank and rate the humorousness of short texts by exploiting human preference judgments and automatically sourced linguistic annotations. We apply our system, which had previously shown good performance on English-language one-liners annotated with pairwise humorousness annotations, to the Spanish-language data set of the HAHA@IberLEF2019 evaluation campaign. We report system performance for the campaign's two subtasks, humour detection and funniness score prediction, and discuss some issues arising from the conversion between the numeric scores used in the HAHA@IberLEF2019 data and the pairwise judgment annotations required for our method.
@inproceedings{miller2019ofaiukp,
author       = {Tristan Miller and Do Dinh, Erik-L{\^{a}}n and Edwin Simpson and Iryna Gurevych},
editor       = {García Cumbreras, Miguel Ángel and Julio Gonzalo and Martínez Cámara, Eugenio and Martínez Unanue, Raquel and Paolo Rosso and Jorge Carrillo-de-Albornoz and Soto Montalvo and Luis Chiruzzo and Sandra Collovini and Yoan Guitiérrez and Jiménez Zafra, Salud and Martin Krallinger and Manuel Montes-y-Gómez and Reynier Ortega-Bueno and Aiala Rosá},
title        = {{OFAI}--{UKP} at {HAHA}@{IberLEF}2019: {Predicting} the Humorousness of Tweets Using {Gaussian} Process Preference Learning},
booktitle    = {Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)},
volume       = {2421},
pages        = {180--190},
series       = {CEUR Workshop Proceedings},
month        = aug,
year         = {2019},
issn         = {1613-0073},
}
Edwin Simpson, Erik-Lân Do Dinh, Tristan Miller, and Iryna Gurevych.
Predicting humorousness and metaphor novelty with Gaussian process preference learning.
In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), pages 5716–5728, July 2019. ISBN 978-1-950737-48-2. DOI: 10.18653/v1/P19-1572.
The inability to quantify key aspects of creative language is a frequent obstacle to natural language understanding. To address this, we introduce novel tasks for evaluating the creativeness of language—namely, scoring and ranking text by humorousness and metaphor novelty. To sidestep the difficulty of assigning discrete labels or numeric scores, we learn from pairwise comparisons between texts. We introduce a Bayesian approach for predicting humorousness and metaphor novelty using Gaussian process preference learning (GPPL), which achieves a Spearman's $\rho$ of 0.56 against gold using word embeddings and linguistic features. Our experiments show that given sparse, crowdsourced annotation data, ranking using GPPL outperforms best–worst scaling. We release a new dataset for evaluating humor containing 28,210 pairwise comparisons of 4,030 texts, and make our software freely available.
@inproceedings{simpson2019predicting,
author       = {Edwin Simpson and Do Dinh, Erik-L{\^{a}}n and Tristan Miller and Iryna Gurevych},
title        = {Predicting Humorousness and Metaphor Novelty with {Gaussian} Process Preference Learning},
booktitle    = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)},
pages        = {5716--5728},
month        = jul,
year         = {2019},
isbn         = {978-1-950737-48-2},
doi          = {10.18653/v1/P19-1572},
}
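The core idea of ranking creative language from pairwise judgments can be illustrated with a much simpler stand-in than GPPL: the sketch below fits a linear utility function with a Bradley–Terry-style logistic loss on feature differences, then ranks items by their learned scores. It is illustrative only and does not reproduce the paper's Gaussian process model; all data and names are invented.

```python
# Simplified stand-in for preference learning from pairwise comparisons:
# fit a linear utility f(x) = w.x so that sigmoid(f(a) - f(b)) matches
# the observed preference "a funnier than b" (Bradley-Terry / logistic).
# This is NOT the Gaussian process model (GPPL) used in the paper.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_preferences(X, pairs, lr=0.1, epochs=500):
    """X: (n, d) item features; pairs: list of (i, j) meaning item i preferred over j."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = np.zeros_like(w)
        for i, j in pairs:
            diff = X[i] - X[j]
            p = sigmoid(w @ diff)        # P(i preferred over j)
            grad += (p - 1.0) * diff     # gradient of -log p
        w -= lr * grad / len(pairs)
    return w

# Toy data: 4 texts with 2 features; annotators prefer 0>1, 1>2, 0>2, 2>3.
X = np.array([[0.9, 0.1], [0.6, 0.3], [0.4, 0.4], [0.1, 0.8]])
w = fit_preferences(X, [(0, 1), (1, 2), (0, 2), (2, 3)])
scores = X @ w
print(np.argsort(-scores))   # items ranked from funniest to least funny
```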
Tristan Miller, Maria Sukhareva, and Iryna Gurevych.
A streamlined method for sourcing discourse-level argumentation annotations from the crowd.
In Proceedings of the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), volume 1, pages 1790–1796, June 2019. ISBN 978-1-950737-13-0. DOI: 10.18653/v1/N19-1177.
The study of argumentation and the development of argument mining tools depend on the availability of annotated data, which is challenging to obtain in sufficient quantity and quality. We present a method that breaks down a popular but relatively complex discourse-level argument annotation scheme into a simpler, iterative procedure that can be applied even by untrained annotators. We apply this method in a crowdsourcing setup and report on the reliability of the annotations obtained. The source code for a tool implementing our annotation method, as well as the sample data we obtained (4909 gold-standard annotations across 982 documents), are freely released to the research community. These are intended to serve the needs of qualitative research into argumentation, as well as of data-driven approaches to argument mining.
@inproceedings{miller2019streamlined,
author       = {Tristan Miller and Maria Sukhareva and Iryna Gurevych},
title        = {A Streamlined Method for Sourcing Discourse-level Argumentation Annotations from the Crowd},
booktitle    = {Proceedings of the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)},
volume       = {1},
pages        = {1790--1796},
month        = jun,
year         = {2019},
isbn         = {978-1-950737-13-0},
doi          = {10.18653/v1/N19-1177},
}
Christian Stab, Tristan Miller, Benjamin Schiller, Pranav Rai, and Iryna Gurevych.
Cross-topic argument mining from heterogeneous sources.
In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), pages 3664–3674, October 2018. ISBN 978-1-948087-84-1. DOI: 10.18653/v1/D18-1402.
Argument mining is a core technology for automating argument search in large document collections. Despite its usefulness for this task, most current approaches are designed for use only with specific text types and fall short when applied to heterogeneous texts. In this paper, we propose a new sentential annotation scheme that is reliably applicable by crowd workers to arbitrary Web texts. We source annotations for over 25,000 instances covering eight controversial topics. We show that integrating topic information into bidirectional long short-term memory networks outperforms vanilla BiLSTMs by more than 3 percentage points in F$_1$ in two- and three-label cross-topic settings. We also show that these results can be further improved by leveraging additional data for topic relevance using multi-task learning.
@inproceedings{stab2018bcross-topic,
author       = {Christian Stab and Tristan Miller and Benjamin Schiller and Pranav Rai and Iryna Gurevych},
title        = {Cross-topic Argument Mining from Heterogeneous Sources},
booktitle    = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)},
pages        = {3664--3674},
month        = oct,
year         = {2018},
isbn         = {978-1-948087-84-1},
doi          = {10.18653/v1/D18-1402},
}
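One plausible way to integrate topic information into a bidirectional LSTM, as described in the abstract above, is to concatenate a learned topic embedding to every token embedding before the recurrent layer. The PyTorch sketch below shows that pattern; it is a minimal illustration under assumed dimensions and does not reproduce the paper's actual architecture or hyperparameters.

```python
# Minimal PyTorch sketch of feeding topic information into a BiLSTM
# sentence classifier by concatenating a learned topic embedding to
# every token embedding.  Illustrative only; the paper's architecture
# and hyperparameters may differ.
import torch
import torch.nn as nn

class TopicBiLSTM(nn.Module):
    def __init__(self, vocab_size, n_topics, n_labels,
                 emb_dim=100, topic_dim=20, hidden=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.topic_emb = nn.Embedding(n_topics, topic_dim)
        self.lstm = nn.LSTM(emb_dim + topic_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, tokens, topic):
        # tokens: (batch, seq_len); topic: (batch,)
        w = self.word_emb(tokens)                          # (B, T, emb)
        t = self.topic_emb(topic).unsqueeze(1)             # (B, 1, topic)
        t = t.expand(-1, tokens.size(1), -1)               # (B, T, topic)
        h, _ = self.lstm(torch.cat([w, t], dim=-1))        # (B, T, 2*hidden)
        return self.out(h.mean(dim=1))                     # (B, n_labels)

model = TopicBiLSTM(vocab_size=5000, n_topics=8, n_labels=3)
logits = model(torch.randint(0, 5000, (2, 12)), torch.tensor([3, 5]))
print(logits.shape)   # torch.Size([2, 3])
```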
Christian Stab, Johannes Daxenberger, Chris Stahlhut, Tristan Miller, Benjamin Schiller, Christopher Tauchmann, Steffen Eger, and Iryna Gurevych.
ArgumenText: Searching for arguments in heterogeneous sources.
In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations (NAACL-HLT 2018), pages 21–25, June 2018. ISBN 978-1-948087-28-5. DOI: 10.18653/v1/N18-5005.
Argument mining is a core technology for enabling argument search in large corpora. However, most current approaches fall short when applied to heterogeneous texts. In this paper, we present an argument retrieval system capable of retrieving sentential arguments for any given controversial topic. By analyzing the highest-ranked results extracted from Web sources, we found that our system covers 89% of arguments found in expert-curated lists of arguments from an online debate portal, and also identifies additional valid arguments.
@inproceedings{stab2018argumentext,
author       = {Christian Stab and Johannes Daxenberger and Chris Stahlhut and Tristan Miller and Benjamin Schiller and Christopher Tauchmann and Steffen Eger and Iryna Gurevych},
title        = {{ArgumenText}: Searching for Arguments in Heterogeneous Sources},
booktitle    = {Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations (NAACL-HLT 2018)},
pages        = {21--25},
month        = jun,
year         = {2018},
isbn         = {978-1-948087-28-5},
doi          = {10.18653/v1/N18-5005},
}
Christian Stab, Tristan Miller, and Iryna Gurevych.
Cross-topic argument mining from heterogeneous sources using attention-based neural networks.
ArXiv e-prints, 1802.05758, February 2018.
Argument mining is a core technology for automating argument search in large document collections. Despite its usefulness for this task, most current approaches to argument mining are designed for use only with specific text types and fall short when applied to heterogeneous texts. In this paper, we propose a new sentential annotation scheme that is reliably applicable by crowd workers to arbitrary Web texts. We source annotations for over 25,000 instances covering eight controversial topics. The results of cross-topic experiments show that our attention-based neural network generalizes best to unseen topics and outperforms vanilla BiLSTM models by 6% in accuracy and 11% in F-score.
@article{stab2018cross-topic,
author       = {Christian Stab and Tristan Miller and Iryna Gurevych},
title        = {Cross-topic Argument Mining from Heterogeneous Sources Using Attention-based Neural Networks},
journal      = {ArXiv e-prints},
volume       = {1802.05758},
month        = feb,
year         = {2018},
}
Tristan Miller, Christian F. Hempelmann, and Iryna Gurevych.
SemEval-2017 Task 7: Detection and interpretation of English puns.
In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 58–68, August 2017. ISBN 978-1-945626-55-5. DOI: 10.18653/v1/S17-2005.
A pun is a form of wordplay in which a word suggests two or more meanings by exploiting polysemy, homonymy, or phonological similarity to another word, for an intended humorous or rhetorical effect. Though a recurrent and expected feature in many discourse types, puns stymie traditional approaches to computational lexical semantics because they violate their one-sense-per-context assumption. This paper describes the first competitive evaluation for the automatic detection, location, and interpretation of puns. We describe the motivation for these tasks, the evaluation methods, and the manually annotated data set. Finally, we present an overview and discussion of the participating systems' methodologies, resources, and results.
@inproceedings{miller2017semeval,
author       = {Tristan Miller and Christian F. Hempelmann and Iryna Gurevych},
title        = {{SemEval}-2017 {Task}~7: {Detection} and Interpretation of {English} Puns},
booktitle    = {Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)},
pages        = {58--68},
month        = aug,
year         = {2017},
isbn         = {978-1-945626-55-5},
doi          = {10.18653/v1/S17-2005},
}
Sallam Abualhaija, Tristan Miller, Judith Eckle-Kohler, Iryna Gurevych, and Karl-Heinz Zimmermann.
Metaheuristic approaches to lexical substitution and simplification.
In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), volume 1, pages 870–879, April 2017. ISBN 978-1-945626-34-0.
In this paper, we propose using metaheuristics—in particular, simulated annealing and the new D-Bees algorithm—to solve word sense disambiguation as an optimization problem within a knowledge-based lexical substitution system. We are the first to perform such an extrinsic evaluation of metaheuristics, for which we use two standard lexical substitution datasets, one English and one German. We find that D-Bees has robust performance for both languages, and performs better than simulated annealing, though both achieve good results. Moreover, the D-Bees–based lexical substitution system outperforms state-of-the-art systems on several evaluation metrics. We also show that D-Bees achieves competitive performance in lexical simplification, a variant of lexical substitution.
@inproceedings{abualhaija2017metaheuristic,
author       = {Sallam Abualhaija and Tristan Miller and Judith Eckle-Kohler and Iryna Gurevych and Karl-Heinz Zimmermann},
title        = {Metaheuristic Approaches to Lexical Substitution and Simplification},
booktitle    = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)},
volume       = {1},
pages        = {870--879},
month        = apr,
year         = {2017},
isbn         = {978-1-945626-34-0},
}
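To make the optimization view of WSD concrete, here is a minimal simulated-annealing sketch over joint sense assignments. The coherence scorer is a hypothetical stand-in (it could, for instance, sum gloss overlaps between the chosen senses); the D-Bees algorithm evaluated in the paper is not reproduced here.

```python
# Sketch of word sense disambiguation cast as combinatorial optimization
# and solved with simulated annealing, in the metaheuristic framing of
# the paper.  The coherence() scorer is a hypothetical stand-in.
import math
import random

def anneal_wsd(candidate_senses, coherence, t0=1.0, cooling=0.995, steps=5000):
    """candidate_senses: one list of candidate senses per target word.
    coherence: maps a full sense assignment to a score (higher = better)."""
    assignment = [random.choice(senses) for senses in candidate_senses]
    score = coherence(assignment)
    t = t0
    for _ in range(steps):
        i = random.randrange(len(assignment))            # perturb one word's sense
        proposal = assignment[:]
        proposal[i] = random.choice(candidate_senses[i])
        new_score = coherence(proposal)
        # accept improvements always, worse moves with Boltzmann probability
        if new_score >= score or random.random() < math.exp((new_score - score) / t):
            assignment, score = proposal, new_score
        t *= cooling
    return assignment, score

# Toy run: three ambiguous words; coherence rewards picking sense "0" everywhere.
senses = [["a0", "a1"], ["b0", "b1", "b2"], ["c0", "c1"]]
best, s = anneal_wsd(senses, lambda a: sum(x.endswith("0") for x in a))
print(best, s)
```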
Christian F. Hempelmann and Tristan Miller.
Puns: Taxonomy and phonology.
In Salvatore Attardo, editor, The Routledge Handbook of Language and Humor, Routledge Handbooks in Linguistics, pages 95–108. Routledge, New York, NY, February 2017. ISBN 978-1-138-84306-6. DOI: 10.4324/9781315731162-8.
@incollection{hempelmann2017taxonomy,
author       = {Christian F. Hempelmann and Tristan Miller},
editor       = {Salvatore Attardo},
title        = {Puns: Taxonomy and Phonology},
booktitle    = {The Routledge Handbook of Language and Humor},
pages        = {95--108},
series       = {Routledge Handbooks in Linguistics},
month        = feb,
year         = {2017},
publisher    = {Routledge},
address      = {New York, NY},
isbn         = {978-1-138-84306-6},
doi          = {10.4324/9781315731162-8},
}
Chinnappa Guggilla, Tristan Miller, and Iryna Gurevych.
CNN- and LSTM-based claim classification in online user comments.
In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers (COLING 2016), pages 2740–2751, December 2016. ISBN 978-4-87974-702-0.
When processing arguments in online user interactive discourse, it is often necessary to determine their bases of support. In this paper, we describe a supervised approach, based on deep neural networks, for classifying the claims made in online arguments. We conduct experiments using convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) on two claim data sets compiled from online user comments. Using different types of distributional word embeddings, but without incorporating any rich, expensive set of features, we achieve a significant improvement over the state of the art for one data set (which categorizes arguments as factual vs. emotional), and performance comparable to the state of the art on the other data set (which categorizes claims according to their verifiability). Our approach has the advantages of using a generalized, simple, and effective methodology that works for claim categorization on different data sets and tasks.
@inproceedings{guggilla2016cnn,
author       = {Chinnappa Guggilla and Tristan Miller and Iryna Gurevych},
title        = {{CNN}- and {LSTM}-based Claim Classification in Online User Comments},
booktitle    = {Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers (COLING 2016)},
pages        = {2740--2751},
month        = dec,
year         = {2016},
isbn         = {978-4-87974-702-0},
}
Tristan Miller, Mohamed Khemakhem, Richard Eckart de Castilho, and Iryna Gurevych.
Sense-annotating a lexical substitution data set with Ubyline.
In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pages 828–835. European Language Resources Association, May 2016. ISBN 978-2-9517408-9-1.
We describe the construction of GLASS, a newly sense-annotated version of the German lexical substitution data set used at the GermEval 2015: LexSub shared task. Using the two annotation layers, we conduct the first known empirical study of the relationship between manually applied word senses and lexical substitutions. We find that synonymy and hypernymy/hyponymy are the only semantic relations directly linking targets to their substitutes, and that substitutes in the target's hypernymy/hyponymy taxonomy closely align with the synonyms of a single GermaNet synset. Despite this, these substitutes account for a minority of those provided by the annotators. The results of our analysis accord with those of a previous study on English-language data (albeit with automatically induced word senses), leading us to suspect that the sense–substitution relations we discovered may be of a universal nature. We also tentatively conclude that relatively cheap lexical substitution annotations can be used as a knowledge source for automatic WSD. Also introduced in this paper is Ubyline, the web application used to produce the sense annotations. Ubyline presents an intuitive user interface optimized for annotating lexical sample data, and is readily adaptable to sense inventories other than GermaNet.
@inproceedings{miller2016sense-annotating,
author       = {Tristan Miller and Mohamed Khemakhem and Eckart de Castilho, Richard and Iryna Gurevych},
editor       = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
title        = {Sense-annotating a Lexical Substitution Data Set with {Ubyline}},
booktitle    = {Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)},
pages        = {828--835},
month        = may,
year         = {2016},
publisher    = {European Language Resources Association},
isbn         = {978-2-9517408-9-1},
}
Tristan Miller. Adjusting sense representations for word sense disambiguation and automatic pun interpretation. Dr.-Ing. thesis, Department of Computer Science, Technische Universität Darmstadt, April 2016.
@phdthesis{miller2016adjusting,
author       = {Tristan Miller},
title        = {Adjusting Sense Representations for Word Sense Disambiguation and Automatic Pun Interpretation},
type         = {{Dr.-Ing.}\ thesis},
month        = apr,
year         = {2016},
school       = {Department of Computer Science, Technische Universit{\"{a}}t Darmstadt},
}
Tristan Miller and Mladen Turković.
Towards the automatic detection and identification of English puns.
European Journal of Humour Research, 4(1):59–75, January 2016. ISSN 2307-700X.
Lexical polysemy, a fundamental characteristic of all human languages, has long been regarded as a major challenge to machine translation, human–computer interaction, and other applications of computational natural language processing (NLP). Traditional approaches to automatic word sense disambiguation (WSD) rest on the assumption that there exists a single, unambiguous communicative intention underlying every word in a document. However, writers sometimes intend for a word to be interpreted as simultaneously carrying multiple distinct meanings. This deliberate use of lexical ambiguity — i.e., punning — is a particularly common source of humour, and therefore has important implications for how NLP systems process documents and interact with users. In this paper we make a case for research into computational methods for the detection of puns in running text and for the isolation of the intended meanings. We discuss the challenges involved in adapting principles and techniques from WSD to humorously ambiguous text, and outline our plans for evaluating WSD-inspired systems in a dedicated pun identification task. We describe the compilation of a large manually annotated corpus of puns and present an analysis of its properties. While our work is principally concerned with simple puns which are monolexemic and homographic (i.e., exploiting single words which have different meanings but are spelled identically), we touch on the challenges involved in processing other types.
@article{miller2016towards,
author       = {Tristan Miller and Mladen Turkovi{\'{c}}},
title        = {Towards the Automatic Detection and Identification of {English} Puns},
journal      = {European Journal of Humour Research},
volume       = {4},
number       = {1},
pages        = {59--75},
month        = jan,
year         = {2016},
issn         = {2307-700X},
}
Tristan Miller, Darina Benikova, and Sallam Abualhaija.
GermEval 2015: LexSub – A shared task for German-language lexical substitution.
In Proceedings of GermEval 2015: LexSub, pages 1–9, September 2015.
Lexical substitution is a task in which participants are given a word in a short context and asked to provide a list of synonyms appropriate for that context. This paper describes GermEval 2015: LexSub, the first shared task for automated lexical substitution on German-language text. We describe the motivation for this task, the evaluation methods, and the manually annotated data set used to train and test the participating systems. Finally, we present an overview and discussion of the participating systems' methodologies, resources, and results.
@inproceedings{miller2015germeval,
author       = {Miller, Tristan and Benikova, Darina and Abualhaija, Sallam},
title        = {{GermEval} 2015: {LexSub}~-- {A} Shared Task for {German}-language Lexical Substitution},
booktitle    = {Proceedings of GermEval 2015: LexSub},
pages        = {1--9},
month        = sep,
year         = {2015},
}
Tristan Miller and Iryna Gurevych.
Automatic disambiguation of English puns.
In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL–IJCNLP 2015), volume 1, pages 719–729, July 2015. ISBN 978-1-941643-72-3. DOI: 10.3115/v1/P15-1070.
Traditional approaches to word sense disambiguation (WSD) rest on the assumption that there exists a single, unambiguous communicative intention underlying every word in a document. However, writers sometimes intend for a word to be interpreted as simultaneously carrying multiple distinct meanings. This deliberate use of lexical ambiguity—i.e., punning—is a particularly common source of humour. In this paper we describe how traditional, language-agnostic WSD approaches can be adapted to “disambiguate” puns, or rather to identify their double meanings. We evaluate several such approaches on a manually sense-annotated corpus of English puns and observe performance exceeding that of some knowledge-based and supervised baselines.
@inproceedings{miller2015automatic,
author       = {Tristan Miller and Iryna Gurevych},
title        = {Automatic Disambiguation of {English} Puns},
booktitle    = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL--IJCNLP 2015)},
volume       = {1},
pages        = {719--729},
month        = jul,
year         = {2015},
isbn         = {978-1-941643-72-3},
doi          = {10.3115/v1/P15-1070},
}
Tristan Miller.
A255436: Number of distinct, connected, order-n subgraphs of the infinite knight's graph.
In The On-line Encyclopedia of Integer Sequences, February 2015.
We present an integer sequence $a(n)$ corresponding to the number of distinct graphs of order $n$ where the vertices can be mapped to different squares of a chessboard such that the connected pairs of vertices are a knight's move apart.
@incollection{A255436,
author       = {Tristan Miller},
title        = {A255436: Number of Distinct, Connected, Order-n Subgraphs of the Infinite Knight's Graph},
booktitle    = {The On-line Encyclopedia of Integer Sequences},
month        = feb,
year         = {2015},
}
Tristan Miller.
An analysis of ambiguity in English puns.
In International Humour Symposium [of the 4th Hungarian Interdisciplinary Humour Conference]: Programme and Abstracts, Komárno, Slovakia, November 2014. J. Selye University, Faculty of Education, Department of Modern Philology.
Punning is a common source of verbal humour in which a word is used to evoke two or more distinct meanings simultaneously. The present work describes and analyzes a large corpus of English homographic puns manually annotated with senses from WordNet. We discuss the challenges in developing and applying the annotation scheme, introduce our annotation support tools, and present an analysis of selected morphological, syntactic, and semantic properties of the annotated examples. Particular focus is placed on the implications for computational approaches to detection of puns and identification of their opposing meanings.
@inproceedings{miller2014analysis,
author       = {Tristan Miller},
title        = {An Analysis of Ambiguity in {English} Puns},
booktitle    = {International Humour Symposium [of the 4th Hungarian Interdisciplinary Humour Conference]: Programme and Abstracts},
month        = nov,
year         = {2014},
publisher    = {J. Selye University, Faculty of Education, Department of Modern Philology},
address      = {Kom{\'{a}}rno, Slovakia},
}
Michael Matuschek, Tristan Miller, and Iryna Gurevych.
A language-independent sense clustering approach for enhanced WSD.
In Josef Ruppenhofer and Gertrud Faaß, editors, Proceedings of the 12th Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2014), pages 11–21. Universitätsverlag Hildesheim, October 2014. ISBN 978-3-934105-46-1.
We present a method for clustering word senses of a lexical-semantic resource by mapping them to those of another sense inventory. This is a promising way of reducing polysemy in sense inventories and consequently improving word sense disambiguation performance. In contrast to previous approaches, we use Dijkstra-WSA, a parameterizable alignment algorithm which is largely resource- and language-agnostic. To demonstrate this, we apply our technique to GermaNet, the German equivalent to WordNet. The GermaNet sense clusterings we induce through alignments to various collaboratively constructed resources achieve a significant boost in accuracy, even though our method is far less complex and less dependent on language-specific knowledge than past approaches.
@inproceedings{matuschek2014language,
author       = {Michael Matuschek and Tristan Miller and Iryna Gurevych},
editor       = {Josef Ruppenhofer and Gertrud Faa{\ss}},
title        = {A Language-independent Sense Clustering Approach for Enhanced {WSD}},
booktitle    = {Proceedings of the 12th Konferenz zur Verarbeitung nat{\"{u}}rlicher Sprache (KONVENS 2014)},
pages        = {11--21},
month        = oct,
year         = {2014},
publisher    = {Universit{\"{a}}tsverlag Hildesheim},
isbn         = {978-3-934105-46-1},
}
Tristan Miller and Iryna Gurevych.
WordNet–Wikipedia–Wiktionary: Construction of a three-way alignment.
In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), pages 2094–2100. European Language Resources Association, May 2014. ISBN 978-2-9517408-8-4.
The coverage and quality of conceptual information contained in lexical semantic resources is crucial for many tasks in natural language processing. Automatic alignment of complementary resources is one way of improving this coverage and quality; however, past attempts have always been between pairs of specific resources. In this paper we establish some set-theoretic conventions for describing concepts and their alignments, and use them to describe a method for automatically constructing $n$-way alignments from arbitrary pairwise alignments. We apply this technique to the production of a three-way alignment from previously published WordNet–Wikipedia and WordNet–Wiktionary alignments. We then present a quantitative and informal qualitative analysis of the aligned resource. The three-way alignment was found to have greater coverage, an enriched sense representation, and coarser sense granularity than both the original resources and their pairwise alignments, though this came at the cost of accuracy. An evaluation of the induced word sense clusters in a word sense disambiguation task showed that they were no better than random clusters of equivalent granularity. However, use of the alignments to enrich a sense inventory with additional sense glosses did significantly improve the performance of a baseline knowledge-based WSD algorithm.
@inproceedings{miller2014wordnet,
author       = {Tristan Miller and Iryna Gurevych},
editor       = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
title        = {{WordNet}--{Wikipedia}--{Wiktionary}: Construction of a Three-way Alignment},
booktitle    = {Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014)},
pages        = {2094--2100},
month        = may,
year         = {2014},
publisher    = {European Language Resources Association},
isbn         = {978-2-9517408-8-4},
}
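The basic mechanics of composing pairwise alignments into an n-way alignment can be sketched as follows, pivoting on the resource shared by both alignments (here WordNet). This toy version uses invented identifiers and ignores the paper's full set-theoretic treatment of many-to-many alignments.

```python
# Minimal sketch of composing two pairwise sense alignments into a
# three-way alignment by pivoting on the shared resource (WordNet).
# Illustrative only; identifiers are invented.
from collections import defaultdict

def three_way(wn_wp_pairs, wn_wkt_pairs):
    """Each argument is an iterable of (wordnet_id, other_id) pairs.
    Returns triples (wordnet_id, wikipedia_id, wiktionary_id)."""
    wp = defaultdict(set)
    for wn_id, wp_id in wn_wp_pairs:
        wp[wn_id].add(wp_id)
    wkt = defaultdict(set)
    for wn_id, wkt_id in wn_wkt_pairs:
        wkt[wn_id].add(wkt_id)
    triples = []
    for wn_id in wp.keys() & wkt.keys():       # senses aligned in both resources
        for wp_id in wp[wn_id]:
            for wkt_id in wkt[wn_id]:
                triples.append((wn_id, wp_id, wkt_id))
    return triples

print(three_way([("bank%1", "Bank_(finance)"), ("bank%2", "Bank_(geography)")],
                [("bank%1", "bank:en:noun:1")]))
# [('bank%1', 'Bank_(finance)', 'bank:en:noun:1')]
```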
Tristan Miller, Nicolai Erbs, Hans-Peter Zorn, Torsten Zesch, and Iryna Gurevych.
DKPro WSD: A generalized UIMA-based framework for word sense disambiguation.
In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (System Demonstrations) (ACL 2013), pages 37–42, August 2013.
Implementations of word sense disambiguation (WSD) algorithms tend to be tied to a particular test corpus format and sense inventory. This makes it difficult to test their performance on new data sets, or to compare them against past algorithms implemented for different data sets. In this paper we present DKPro WSD, a freely licensed, general-purpose framework for WSD which is both modular and extensible. DKPro WSD abstracts the WSD process in such a way that test corpora, sense inventories, and algorithms can be freely swapped. Its UIMA-based architecture makes it easy to add support for new resources and algorithms. Related tasks such as word sense induction and entity linking are also supported.
@inproceedings{miller2013dkpro,
author       = {Tristan Miller and Nicolai Erbs and Hans-Peter Zorn and Torsten Zesch and Iryna Gurevych},
title        = {{DKPro} {WSD}: {A} Generalized {UIMA}-based Framework for Word Sense Disambiguation},
booktitle    = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (System Demonstrations) (ACL 2013)},
pages        = {37--42},
month        = aug,
year         = {2013},
}
Tristan Miller, Chris Biemann, Torsten Zesch, and Iryna Gurevych.
Using distributional similarity for lexical expansion in knowledge-based word sense disambiguation.
In Martin Kay and Christian Boitet, editors, Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 1781–1796, December 2012.
We explore the contribution of distributional information for purely knowledge-based word sense disambiguation. Specifically, we use a distributional thesaurus, computed from a large parsed corpus, for lexical expansion of context and sense information. This bridges the lexical gap that is seen as the major obstacle for word overlap–based approaches. We apply this mechanism to two traditional knowledge-based methods and show that distributional information significantly improves disambiguation results across several data sets. This improvement exceeds the state of the art for disambiguation without sense frequency information—a situation which is especially encountered with new domains or languages for which no sense-annotated corpus is available.
@inproceedings{miller2012using,
author       = {Tristan Miller and Chris Biemann and Torsten Zesch and Iryna Gurevych},
editor       = {Martin Kay and Christian Boitet},
title        = {Using Distributional Similarity for Lexical Expansion in Knowledge-based Word Sense Disambiguation},
booktitle    = {Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012)},
pages        = {1781--1796},
month        = dec,
year         = {2012},
}
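A toy rendering of the idea described above: simplified Lesk, but with both the context and each sense gloss expanded through a distributional thesaurus before the overlap is computed. The thesaurus here is a hand-made dictionary, not one computed from a parsed corpus as in the paper; the sketch assumes NLTK and its WordNet data are installed.

```python
# Toy sketch of knowledge-based WSD by gloss overlap (simplified Lesk),
# with both the context and the glosses expanded through a distributional
# thesaurus before the overlap is computed.  The thesaurus is a toy dict;
# the paper computes one from a large parsed corpus.
from nltk.corpus import wordnet as wn   # requires the NLTK WordNet data

THESAURUS = {"money": {"cash", "currency"}, "river": {"stream", "shore"}}

def expand(words, thesaurus):
    expanded = set(words)
    for w in words:
        expanded |= thesaurus.get(w, set())
    return expanded

def lesk_expanded(target, context_words, thesaurus=THESAURUS):
    context = expand(set(context_words), thesaurus)
    best, best_overlap = None, -1
    for synset in wn.synsets(target):
        gloss_words = expand(set(synset.definition().lower().split()), thesaurus)
        overlap = len(gloss_words & context)
        if overlap > best_overlap:
            best, best_overlap = synset, overlap
    return best

print(lesk_expanded("bank", ["deposit", "money", "account"]))
```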
Tristan Miller, Bertin Klein, and Elisabeth Wolf.
Exploiting latent semantic relations in highly linked hypertext for information retrieval in wikis.
In Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Nicolas Nicolov, and Nikolai Nikolov, editors, Proceedings of the 7th International Conference on Recent Advances in Natural Language Processing (RANLP 2009), pages 241–245. ACM Press, September 2009.
Good hypertext writing style mandates that link texts clearly indicate the nature of the link target. While this guideline is routinely ignored in HTML, the lightweight markup languages used by wikis encourage or even force hypertext authors to use semantically appropriate link texts. This property of wiki hypertext makes it an ideal candidate for processing with latent semantic analysis, a factor analysis technique for finding latent transitive relations among natural-language documents. In this study, we design, implement, and test an LSA-based information retrieval system for wikis. Instead of a full-text index, our system indexes only link texts and document titles. Nevertheless, its precision exceeds that of a popular full-text search engine, and is comparable to that of PageRank-based systems such as Google.
@inproceedings{miller2009exploiting,
author       = {Tristan Miller and Bertin Klein and Elisabeth Wolf},
editor       = {Galia Angelova and Kalina Bontcheva and Ruslan Mitkov and Nicolas Nicolov and Nikolai Nikolov},
title        = {Exploiting Latent Semantic Relations in Highly Linked Hypertext for Information Retrieval in Wikis},
booktitle    = {Proceedings of the 7th International Conference on Recent Advances in Natural Language Processing (RANLP 2009)},
pages        = {241--245},
month        = sep,
year         = {2009},
publisher    = {ACM Press},
}
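In the spirit of the paper above, the sketch below indexes only page titles and incoming link texts, factors the resulting term-document matrix with truncated SVD (LSA), and ranks pages by cosine similarity to a query. The corpus, dimensionality, and preprocessing are toy assumptions, not the system described in the paper.

```python
# Sketch of LSA-based retrieval over link texts: each wiki page is
# represented only by its title and the anchor texts of links pointing
# at it, then factored with truncated SVD.  Toy corpus and dimensions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

pages = {   # page title -> concatenated title + incoming link texts
    "Latent semantic analysis": "latent semantic analysis svd term document matrix",
    "PageRank": "pagerank link analysis google ranking",
    "Wiki": "wiki collaborative hypertext lightweight markup",
}

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(pages.values())        # term-document matrix
lsa = TruncatedSVD(n_components=2, random_state=0)  # low-rank latent space
doc_vecs = lsa.fit_transform(X)

def search(query):
    q = lsa.transform(vectorizer.transform([query]))
    sims = cosine_similarity(q, doc_vecs)[0]
    return sorted(zip(pages.keys(), sims), key=lambda p: -p[1])

print(search("svd factorization of term matrix"))
```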
Tristan Miller and Elisabeth Wolf.
Word completion with latent semantic analysis.
In Yuan Yan Tang, S. Patrick Wang, G. Lorette, Daniel So Yeung, and Hong Yan, editors, Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), volume 1, pages 1252–1255. IEEE Press, August 2006. ISBN 978-0-7695-2521-1. DOI: 10.1109/ICPR.2006.1191.
Current word completion tools rely mostly on statistical or syntactic knowledge. Can using semantic knowledge improve the completion task? We propose a language-independent word completion algorithm which uses latent semantic analysis (LSA) to model the semantic context of the word being typed. We find that a system using this algorithm alone achieves keystroke savings of 56% and a hit rate of 42%. This represents improvements of 4.3% and 12%, respectively, over existing approaches.
@inproceedings{miller2006word,
author       = {Tristan Miller and Elisabeth Wolf},
editor       = {Yuan Yan Tang and S. Patrick Wang and G. Lorette and Daniel So Yeung and Hong Yan},
title        = {Word Completion with Latent Semantic Analysis},
booktitle    = {Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006)},
volume       = {1},
pages        = {1252--1255},
month        = aug,
year         = {2006},
publisher    = {IEEE Press},
isbn         = {978-0-7695-2521-1},
issn         = {1051-4651},
doi          = {10.1109/ICPR.2006.1191},
}
Elisabeth Wolf, Shankar Vembu, and Tristan Miller.
On the use of topic models for word completion.
In Tapio Salakoski, Filip Ginter, Sampo Pyysalo, and Tapio Pahikkala, editors, Proceedings of the 5th International Conference on Natural Language Processing (FinTAL 2006), volume 4139 of Lecture Notes in Computer Science (ISSN 0302-9743), pages 500–511. Springer, August 2006. ISBN 978-3-540-37334-6. DOI: 10.1007/11816508_50.
We investigate the use of topic models, such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), for word completion tasks. The advantage of using these models for such an application is twofold. On the one hand, they allow us to exploit semantic or contextual information when predicting candidate words for completion. On the other hand, these probabilistic models have been found to outperform classical latent semantic analysis (LSA) for modeling text documents. We describe a word completion algorithm that takes into account the semantic context of the word being typed. We also present evaluation metrics to compare different models being used in our study. Our experiments validate our hypothesis of using probabilistic models for semantic analysis of text documents and their application in word completion tasks.
@inproceedings{wolf2006use,
author       = {Elisabeth Wolf and Shankar Vembu and Tristan Miller},
editor       = {Tapio Salakoski and Filip Ginter and Sampo Pyysalo and Tapio Pahikkala},
title        = {On the Use of Topic Models for Word Completion},
booktitle    = {Proceedings of the 5th International Conference on Natural Language Processing (FinTAL 2006)},
volume       = {4139},
pages        = {500--511},
series       = {Lecture Notes in Computer Science},
month        = aug,
year         = {2006},
publisher    = {Springer},
isbn         = {978-3-540-37334-6},
issn         = {0302-9743},
doi          = {10.1007/11816508_50},
}
Tristan Miller.
Creare splendide slide con LaTeX: Un'introduzione al pacchetto HA-prosper [Producing beautiful slides with LaTeX: An introduction to the HA-prosper package].
Pluto Journal, (47), May 2006. Translated by Gabriele Zucchetta.
This article presents HA-prosper, a LaTeX package for creating sophisticated slides. We describe its features and illustrate them with some examples of use. We also discuss the advantages of this approach, in keeping with the LaTeX philosophy, over the presentation programs typically found in today's office application suites.
@article{miller2006producing,
author       = {Tristan Miller},
title        = {Creare splendide slide con {\LaTeX}: Un'introduzione al pacchetto {HA}-prosper [{Producing} Beautiful Slides with {\LaTeX}: {An} Introduction to the {HA}-prosper Package]},
journal      = {Pluto Journal},
number       = {47},
month        = may,
year         = {2006},
note         = {Translated by Gabriele Zucchetta},
}
Peter Flom and Tristan Miller.
Impressions from PracTeX'05.
TUGboat: The Communications of the TeX Users Group, 26(1):31–32, 2005. ISSN 0896-3207.
@article{flom2005bimpressions,
author       = {Peter Flom and Tristan Miller},
title        = {Impressions from {Prac}{\TeX}'05},
journal      = {TUGboat: The Communications of the {\TeX}{} Users Group},
volume       = {26},
number       = {1},
pages        = {31--32},
year         = {2005},
issn         = {0896-3207},
}
Tristan Miller.
Biblet: A portable BibTeX bibliography style for generating highly customizable XHTML.
TUGboat: The Communications of the TeX Users Group, 26(1):85–96, 2005. ISSN 0896-3207.
We present Biblet, a set of BibTeX bibliography styles (bst) which generate XHTML from BibTeX databases. Unlike other BibTeX to XML/HTML converters, Biblet is written entirely in the native BibTeX style language and therefore works “out of the box” on any system that runs BibTeX. Features include automatic conversion of LaTeX symbols to HTML or Unicode entities; customizable graphical hyperlinks to PostScript, PDF, DVI, LaTeX, and HTML resources; support for nonstandard but common fields such as day, isbn, and abstract; hideable text blocks; and output of the original BibTeX entry for sharing citations. Biblet's highly structured XHTML output means that the appearance of the bibliography can be drastically altered simply by specifying a Cascading Style Sheet (CSS), or that the output can easily be postprocessed with third-party XML, HTML, or text processing tools. We compare and contrast Biblet to other common converters, describe basic usage of Biblet, give examples of how to produce custom-formatted bibliographies, and provide a basic overview of Biblet internals for those wishing to modify the style file itself.
@article{miller2005biblet,
author       = {Tristan Miller},
title        = {Biblet: {A} Portable {\BibTeX}\ Bibliography Style for Generating Highly Customizable {XHTML}},
journal      = {TUGboat: The Communications of the {\TeX}{} Users Group},
volume       = {26},
number       = {1},
pages        = {85--96},
year         = {2005},
issn         = {0896-3207},
}
Tristan Miller.
Using the RPM Package Manager for (La)TeX packages.
TUGboat: The Communications of the TeX Users Group, 26(1):17–28, 2005. ISSN 0896-3207.
RPM is a package management system which provides a uniform, automated way for users to install, upgrade, and uninstall programs. Because RPM is the default software distribution format for many operating systems (particularly GNU/Linux), users may find it useful to manage their library of TeX-related packages using RPM. This article explains how to produce RPM files for TeX software, either for personal use or for public distribution. We also explain how a (La)TeX user can find, install, and remove TeX-related RPM packages.
@article{miller2005using,
author       = {Tristan Miller},
title        = {Using the {RPM} {Package} {Manager} for ({La}){\TeX}{} Packages},
journal      = {TUGboat: The Communications of the {\TeX}{} Users Group},
volume       = {26},
number       = {1},
pages        = {17--28},
year         = {2005},
issn         = {0896-3207},
}
Tristan Miller and Stefan Agne.
Attention-based information retrieval using eye tracker data.
In Peter Clark and Guus Schreiber, editors, Proceedings of the 3rd International Conference on Knowledge Capture (K-CAP 2005), pages 209–210, New York, NY, September 2005. ACM. ISBN 978-1-59593-163-4. DOI: 10.1145/1088622.1088672.
We describe eFISK, an automated keyword extraction system which unobtrusively measures the user's attention in order to isolate and identify those areas of a written document the reader finds of greatest interest. Attention is measured by use of eye-tracking hardware consisting of a desk-mounted infrared camera which records various data about the user's eye. The keywords thus identified are subsequently used in the back end of an information retrieval system to help the user find other documents which contain information of interest to him. Unlike traditional IR techniques which compare documents simply on the basis of common terms withal, our system also accounts for the weights users implicitly attach to certain words or sections of the source document. We describe a task-based user study which compares the utility of standard relevance feedback techniques to the keywords and keyphrases discovered by our system in finding other relevant documents from a corpus.
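The toy sketch below illustrates only the general attention-weighted query idea; it is not eFISK. The simulated fixation data, the document collection, the gaze-time weighting, and the use of scikit-learn TF-IDF vectors are all assumptions made for the example.
# Toy sketch of attention-weighted retrieval; not the eFISK system.
from collections import defaultdict
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "solar panels convert sunlight into electricity",
    "wind turbines generate power from moving air",
    "battery storage smooths the output of solar and wind plants",
]

# Hypothetical eye-tracker output from reading a source document: (word, gaze time in ms)
fixations = [("solar", 420), ("electricity", 380), ("sunlight", 150), ("the", 60)]

vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(documents)

gaze = defaultdict(float)
for word, dur in fixations:
    gaze[word] += dur                                 # total gaze time per term

query = np.zeros(len(vectorizer.vocabulary_))
longest = max(gaze.values())
for word, dur in gaze.items():
    idx = vectorizer.vocabulary_.get(word)
    if idx is not None:
        query[idx] = dur / longest                    # attention weight in [0, 1]

scores = cosine_similarity(query.reshape(1, -1), doc_vecs)[0]
for score, doc in sorted(zip(scores, documents), reverse=True):
    print(f"{score:.2f}  {doc}")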
@inproceedings{miller2005attention-based,
author       = {Tristan Miller and Stefan Agne},
editor       = {Peter Clark and Guus Schreiber},
title        = {Attention-based Information Retrieval Using Eye Tracker Data},
booktitle    = {Proceedings of the 3rd International Conference on Knowledge Capture ({K-CAP} 2005)},
pages        = {209--210},
month        = sep,
year         = {2005},
publisher    = {ACM},
address      = {New York, NY},
isbn         = {978-1-59593-163-4},
doi          = {10.1145/1088622.1088672},
}
Peter Flom and Tristan Miller.
Impressions from PracTeX'05.
The PracTeX Journal, 2(3), July 2005. ISSN 1556-6994.
@article{flom2005impressions,
author       = {Peter Flom and Tristan Miller},
title        = {Impressions from {Prac}{\TeX}'05},
journal      = {The Prac{\TeX}{} Journal},
volume       = {2},
number       = {3},
month        = jul,
year         = {2005},
issn         = {1556-6994},
}
Bertin Klein, Tristan Miller, and Sandra Zilles.
Security issues for pervasive personalized communication systems.
In Dieter Hutter and Markus Ullmann, editors, Security in Pervasive Computing: Proceedings of the 2nd International Conference on Security in Pervasive Computing (SPC 2005), volume 3450 of Lecture Notes in Computer Science (ISSN 0302-9743), pages 56–62. Springer, April 2005. ISBN 3-540-25521-4.
Technological progress allows us to equip any mobile phone with new functionalities, such as storing personalized information about its owner and using the corresponding personal profile for enabling communication to persons whose mobile phones represent similar profiles. However, this raises very specific security issues, in particular relating to the use of Bluetooth technology. Herein we consider such scenarios and related problems in privacy and security matters. We analyze in which respect certain design approaches may fail or succeed at solving these problems. We concentrate on methods for designing the user-related part of the communication service appropriately in order to enhance confidentiality.
@inproceedings{klein2005security,
author       = {Bertin Klein and Tristan Miller and Sandra Zilles},
editor       = {Dieter Hutter and Markus Ullmann},
title        = {Security Issues for Pervasive Personalized Communication Systems},
booktitle    = {Security in Pervasive Computing: Proceedings of the 2nd International Conference on Security in Pervasive Computing (SPC 2005)},
volume       = {3450},
pages        = {56--62},
series       = {Lecture Notes in Computer Science},
month        = apr,
year         = {2005},
publisher    = {Springer},
isbn         = {3-540-25521-4},
issn         = {0302-9743},
}
Tristan Miller.
Producing beautiful slides with LaTeX: An introduction to the HA-prosper package.
The PracTeX Journal, 2(1), April 2005. ISSN 1556-6994.
In this paper, we present HA-prosper, a LaTeX package for creating overhead slides. We describe the features of the package and give examples of their use. We also discuss what advantages there are to producing slides with LaTeX versus the presentation software typically bundled with today's office suites.
@article{miller2005producing,
author       = {Tristan Miller},
title        = {Producing Beautiful Slides with {\LaTeX}: {An} Introduction to the {HA}-prosper Package},
journal      = {The Prac{\TeX}{} Journal},
volume       = {2},
number       = {1},
month        = apr,
year         = {2005},
issn         = {1556-6994},
}
Tristan Miller.
Latent semantic analysis and the construction of coherent extracts.
In Nicolas Nicolov, Kalina Bontcheva, Galia Angelova, and Ruslan Mitkov, editors, Recent Advances in Natural Language Processing III, volume 260 of Current Issues in Linguistic Theory (CILT) (ISSN 0304-0763), pages 277–286. John Benjamins, Amsterdam/Philadelphia, 2004. ISBN 1-58811-618-2. DOI: 10.1075/cilt.260.31mil.
We describe a language-neutral automatic summarization system which aims to produce coherent extracts. It builds an initial extract composed solely of topic sentences, and then recursively fills in the topical lacunae by providing linking material between semantically dissimilar sentences. While experiments with human judges did not prove a statistically significant increase in textual coherence with the use of a latent semantic analysis module, we found a strong positive correlation between coherence and overall summary quality.
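A much-simplified sketch of the gap-filling strategy follows. TF-IDF plus truncated SVD (scikit-learn) stands in for the paper's LSA component, and the similarity threshold, the choice of topic sentences, and the rule for picking a linking sentence are assumptions made for illustration, not the system's actual heuristics.
# Much-simplified sketch of gap-filled extraction; not the paper's summarizer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def lsa_vectors(sentences, k=2):
    """Embed sentences in a small latent space (TF-IDF + truncated SVD)."""
    X = TfidfVectorizer().fit_transform(sentences)
    return TruncatedSVD(n_components=k, random_state=0).fit_transform(X)

def build_extract(sentences, topic_idx, threshold=0.5):
    """Start from the given topic sentences; insert a linking sentence from the
    source text between semantically dissimilar neighbours until no gap can be filled."""
    vecs = lsa_vectors(sentences)
    sim = lambda i, j: float(cosine_similarity(vecs[i:i + 1], vecs[j:j + 1])[0, 0])
    extract = sorted(topic_idx)
    changed = True
    while changed:
        changed = False
        for a, b in zip(extract, extract[1:]):
            if sim(a, b) < threshold and b - a > 1:
                # choose the intervening sentence most similar to both neighbours
                fill = max(range(a + 1, b), key=lambda i: sim(i, a) + sim(i, b))
                extract.insert(extract.index(b), fill)
                changed = True
                break
    return [sentences[i] for i in extract]

sents = [
    "Solar power is growing quickly.",
    "Panel prices have fallen for a decade.",
    "Cheaper panels make rooftop installations attractive.",
    "Storage remains the main obstacle.",
    "Batteries are still expensive.",
    "Grid operators therefore plan around intermittency.",
]
print(build_extract(sents, topic_idx=[0, 3, 5]))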
@incollection{miller2004latent,
author       = {Tristan Miller},
editor       = {Nicolas Nicolov and Kalina Bontcheva and Galia Angelova and Ruslan Mitkov},
title        = {Latent Semantic Analysis and the Construction of Coherent Extracts},
booktitle    = {Recent Advances in Natural Language Processing {III}},
volume       = {260},
pages        = {277--286},
series       = {Current Issues in Linguistic Theory (CILT)},
year         = {2004},
publisher    = {John Benjamins},
address      = {Amsterdam/Philadelphia},
isbn         = {1-58811-618-2},
issn         = {0304-0763},
doi          = {10.1075/cilt.260.31mil},
}
Tristan Miller.
Essay assessment with latent semantic analysis.
Journal of Educational Computing Research, 29(4):495–512, December 2003. ISSN 0735-6331. DOI: 10.2190/W5AR-DYPW-40KX-FL99.
Latent semantic analysis (LSA) is an automated, statistical technique for comparing the semantic similarity of words or documents. In this paper, I examine the application of LSA to automated essay scoring. I compare LSA methods to earlier statistical methods for assessing essay quality, and critically review contemporary essay-scoring systems built on LSA, including the Intelligent Essay Assessor, Summary Street, State the Essence, Apex, and Select-a-Kibitzer. Finally, I discuss current avenues of research, including LSA's application to computer-measured readability assessment and to automatic summarization of student essays.
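To give a flavour of the family of methods the article reviews, here is a minimal sketch of the generic nearest-graded-essays scheme that LSA-based scorers build on. It is not a reconstruction of the Intelligent Essay Assessor or any other named system; the toy essays, the two-dimensional space, and the similarity-weighted average are assumptions.
# Minimal sketch of similarity-based essay scoring; not any of the systems named above.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

graded = [  # (essay text, human score) -- toy data
    ("photosynthesis lets plants turn sunlight water and carbon dioxide into sugar", 5),
    ("plants use light to make food from water and air", 4),
    ("plants are green and grow in soil", 2),
    ("i like plants because they look nice", 1),
]
new_essay = "using sunlight plants convert carbon dioxide and water into sugars"

texts = [t for t, _ in graded] + [new_essay]
X = TfidfVectorizer().fit_transform(texts)
vecs = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
sims = cosine_similarity(vecs[-1:], vecs[:-1])[0]     # similarity to each graded essay

k = 2
nearest = np.argsort(sims)[::-1][:k]                  # k most similar graded essays
weights = np.maximum(sims[nearest], 1e-6)
predicted = np.average([graded[i][1] for i in nearest], weights=weights)
print(f"predicted score: {predicted:.1f}")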
@article{miller2003essay,
author       = {Tristan Miller},
title        = {Essay Assessment with Latent Semantic Analysis},
journal      = {Journal of Educational Computing Research},
volume       = {29},
number       = {4},
pages        = {495--512},
month        = dec,
year         = {2003},
issn         = {0735-6331},
doi          = {10.2190/W5AR-DYPW-40KX-FL99},
}
Tristan Miller.
Latent semantic analysis and the construction of coherent extracts.
In Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Nicolas Nicolov, and Nikolai Nikolov, editors, Proceedings of the 4th International Conference on Recent Advances in Natural Language Processing (RANLP 2003), pages 270–277, September 2003. ISBN 954-90906-6-3.
We describe a language-neutral automatic summarization system which aims to produce coherent extracts. It builds an initial extract composed solely of topic sentences, and then recursively fills in the topical lacunae by providing linking material between semantically dissimilar sentences. While experiments with human judges did not prove a statistically significant increase in textual coherence with the use of a latent semantic analysis module, we found a strong positive correlation between coherence and overall summary quality.
@inproceedings{miller2003latent,
author       = {Tristan Miller},
editor       = {Galia Angelova and Kalina Bontcheva and Ruslan Mitkov and Nicolas Nicolov and Nikolai Nikolov},
title        = {Latent Semantic Analysis and the Construction of Coherent Extracts},
booktitle    = {Proceedings of the 4th International Conference on Recent Advances in Natural Language Processing (RANLP 2003)},
pages        = {270--277},
month        = sep,
year         = {2003},
isbn         = {954-90906-6-3},
}
Tristan Miller.
Generating coherent extracts of single documents using latent semantic analysis.
M.Sc. thesis, Department of Computer Science, University of Toronto, March 2003.
A major problem with automatically-produced summaries in general, and extracts in particular, is that the output text often lacks textual coherence. Our goal is to improve the textual coherence of automatically produced extracts. We developed and implemented an algorithm which builds an initial extract composed solely of topic sentences, and then recursively fills in the lacunae by providing linking material from the original text between semantically dissimilar sentences. Our summarizer differs in architecture from most others in that it measures semantic similarity with latent semantic analysis (LSA), a factor analysis technique based on the vector-space model of information retrieval. We believed that the deep semantic relations discovered by LSA would assist in the identification and correction of abrupt topic shifts in the summaries. However, our experiments did not show a statistically significant difference in the coherence of summaries produced by our system as compared with a non-LSA version.
@mastersthesis{miller2003generating,
author       = {Tristan Miller},
title        = {Generating Coherent Extracts of Single Documents Using Latent Semantic Analysis},
type         = {{M.Sc.}\ thesis},
month        = mar,
year         = {2003},
school       = {Department of Computer Science, University of Toronto},
}
Michael J. Maher, Allan Rock, Grigoris Antoniou, David Billington, and Tristan Miller.
Efficient defeasible reasoning systems.
International Journal on Artificial Intelligence Tools, 10(4):483–501, December 2001. ISSN 0218-2130. DOI: 10.1142/S0218213001000623.
For many years, the non-monotonic reasoning community has focussed on highly expressive logics. Such logics have turned out to be computationally expensive, and have given little support to the practical use of non-monotonic reasoning. In this work we discuss defeasible logic, a less-expressive but more efficient non-monotonic logic. We report on two new implemented systems for defeasible logic: a query answering system employing a backward-chaining approach, and a forward-chaining implementation that computes all conclusions. Our experimental evaluation demonstrates that the systems can deal with large theories (up to hundreds of thousands of rules). We show that defeasible logic has linear complexity, which contrasts markedly with most other non-monotonic logics and helps to explain the impressive experimental results. We believe that defeasible logic, with its efficiency and simplicity, is a good candidate to be used as a modelling language for practical applications, including modelling of regulations and business rules.
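For readers who have not met defeasible logic, the sketch below shows a naive, propositional reasoner for defeasible rules with a superiority relation. It deliberately ignores strict rules and defeaters, assumes an acyclic rule set, and is meant only to convey the flavour of the logic; it corresponds to neither of the implementations reported in the paper and makes no claim to their linear-time behaviour.
# Naive propositional sketch of defeasible rules with superiority; not the paper's systems.
facts = {"bird", "brokenWing"}
# rules: name -> (antecedents, consequent); "~x" denotes the negation of x
rules = {
    "r1": ({"bird"}, "flies"),
    "r2": ({"brokenWing"}, "~flies"),
}
superiority = {("r2", "r1")}    # r2 beats r1 when both are applicable

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def provable(lit, seen=frozenset()):
    """Defeasibly provable: some applicable rule for `lit` exists, and every
    applicable rule for its negation is beaten by a superior rule for `lit`."""
    if lit in facts:
        return True
    if lit in seen:                       # crude guard against cyclic rule sets
        return False
    applicable = lambda ants: all(provable(a, seen | {lit}) for a in ants)
    supporters = [r for r, (ants, head) in rules.items() if head == lit and applicable(ants)]
    attackers = [r for r, (ants, head) in rules.items() if head == negate(lit) and applicable(ants)]
    return bool(supporters) and all(
        any((s, a) in superiority for s in supporters) for a in attackers)

print(provable("flies"))     # False: the broken-wing rule overrides the bird rule
print(provable("~flies"))    # True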
@article{maher2001efficient,
author       = {Michael J. Maher and Allan Rock and Grigoris Antoniou and David Billington and Tristan Miller},
title        = {Efficient Defeasible Reasoning Systems},
journal      = {International Journal on Artificial Intelligence Tools},
volume       = {10},
number       = {4},
pages        = {483--501},
month        = dec,
year         = {2001},
issn         = {0218-2130},
doi          = {10.1142/S0218213001000623},
}
Tristan Miller.
Essay assessment with latent semantic analysis.
Technical Report CSRG-440, Department of Computer Science, University of Toronto, May 2001.
@techreport{miller2001essay,
author       = {Tristan Miller},
title        = {Essay Assessment with Latent Semantic Analysis},
number       = {{CSRG-440}},
type         = {Technical Report},
month        = may,
year         = {2001},
institution  = {Department of Computer Science, University of Toronto},
}
Michael J. Maher, Allan Rock, Grigoris Antoniou, David Billington, and Tristan Miller.
Efficient defeasible reasoning systems.
In Proceedings of the 12th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2000), pages 384–392. IEEE Press, November 2000. ISBN 0-7695-0909-6. DOI: 10.1109/TAI.2000.889898.
For many years, the non-monotonic reasoning community has focussed on highly expressive logics. Such logics have turned out to be computationally expensive, and have given little support to the practical use of non-monotonic reasoning. In this work we discuss defeasible logic, a less-expressive but more efficient non-monotonic logic. We report on two new implemented systems for defeasible logic: a query answering system employing a backward-chaining approach, and a forward-chaining implementation that computes all conclusions. Our experimental evaluation demonstrates that the systems can deal with large theories (up to hundreds of thousands of rules). We show that defeasible logic has linear complexity, which contrasts markedly with most other non-monotonic logics and helps to explain the impressive experimental results. We believe that defeasible logic, with its efficiency and simplicity, is a good candidate to be used as a modelling language for practical applications, including modelling of regulations and business rules.
@inproceedings{maher2000efficient,
author       = {Michael J. Maher and Allan Rock and Grigoris Antoniou and David Billington and Tristan Miller},
title        = {Efficient Defeasible Reasoning Systems},
booktitle    = {Proceedings of the 12th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2000)},
pages        = {384--392},
month        = nov,
year         = {2000},
publisher    = {IEEE Press},
isbn         = {0-7695-0909-6},
issn         = {1082-3409},
doi          = {10.1109/TAI.2000.889898},
}
Yang Xiang and Tristan Miller.
A well-behaved algorithm for simulating dependence structures of Bayesian networks.
International Journal of Applied Mathematics, 1(8):923–932, 1999. ISSN 1311-1728.
Automatic generation of Bayesian network (BN) structures (directed acyclic graphs) is an important step in experimental study of algorithms for inference in BNs and algorithms for learning BNs from data. Previously known simulation algorithms do not guarantee connectedness of generated structures or even successful generation according to a user specification. We propose a simple, efficient and well-behaved algorithm for automatic generation of BN structures. The performance of the algorithm is demonstrated experimentally.
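The sketch below shows one simple way of obtaining the guarantees the abstract mentions: acyclicity by orienting all edges along a random node ordering, and connectedness by first laying down a random spanning tree. It is not the algorithm proposed in the paper; the edge-probability parameter and the overall construction are assumptions made for illustration.
# Illustrative random connected-DAG generator; not the paper's algorithm.
import random

def random_connected_dag(n_nodes, extra_edge_prob=0.2, seed=None):
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)                          # hidden topological order guarantees acyclicity
    edges = set()
    # spanning tree: attach every node after the first to some earlier node,
    # so the undirected skeleton is connected
    for pos in range(1, n_nodes):
        parent = order[rng.randrange(pos)]
        edges.add((parent, order[pos]))
    # sprinkle in extra forward edges
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < extra_edge_prob:
                edges.add((order[i], order[j]))
    return sorted(edges)

print(random_connected_dag(6, seed=42))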
@article{xiang1999wellbehaved,
author       = {Yang Xiang and Tristan Miller},
title        = {A Well-behaved Algorithm for Simulating Dependence Structures of {Bayesian} Networks},
journal      = {International Journal of Applied Mathematics},
volume       = {1},
number       = {8},
pages        = {923--932},
year         = {1999},
issn         = {1311-1728},
}