# Publications

This is a list of my scientific and technical publications. (My publications in humour, poetry, and recreational linguistics are in a separate list.)

This list is also available as a BibTeX file.

1. Christian Stab, Tristan Miller, Benjamin Schiller, Pranav Rai, and Iryna Gurevych. Cross-topic Argument Mining from Heterogeneous Sources. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), pages 3664–3674, October 2018. ISBN 978-1-948087-84-1.

Argument mining is a core technology for automating argument search in large document collections. Despite its usefulness for this task, most current approaches are designed for use only with specific text types and fall short when applied to heterogeneous texts. In this paper, we propose a new sentential annotation scheme that is reliably applicable by crowd workers to arbitrary Web texts. We source annotations for over 25,000 instances covering eight controversial topics. We show that integrating topic information into bidirectional long short-term memory networks outperforms vanilla BiLSTMs by more than 3 percentage points in F$_1$ in two- and three-label cross-topic settings. We also show that these results can be further improved by leveraging additional data for topic relevance using multi-task learning.
@inproceedings{stab2018bcross-topic,
author       = {Christian Stab and Tristan Miller and Benjamin Schiller and Pranav Rai and Iryna Gurevych},
title        = {Cross-topic Argument Mining from Heterogeneous Sources},
booktitle    = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)},
pages        = {3664--3674},
month        = oct,
year         = {2018},
isbn         = {978-1-948087-84-1},
}
2. Christian Stab, Johannes Daxenberger, Chris Stahlhut, Tristan Miller, Benjamin Schiller, Christopher Tauchmann, Steffen Eger, and Iryna Gurevych. ArgumenText: Searching for Arguments in Heterogeneous Sources. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations (NAACL-HLT 2018), pages 21–25, June 2018. ISBN 978-1-948087-28-5. DOI: 10.18653/v1/N18-5005.

Argument mining is a core technology for enabling argument search in large corpora. However, most current approaches fall short when applied to heterogeneous texts. In this paper, we present an argument retrieval system capable of retrieving sentential arguments for any given controversial topic. By analyzing the highest-ranked results extracted from Web sources, we found that our system covers 89\% of arguments found in expert-curated lists of arguments from an online debate portal, and also identifies additional valid arguments.
@inproceedings{stab2018argumentext,
author       = {Christian Stab and Johannes Daxenberger and Chris Stahlhut and Tristan Miller and Benjamin Schiller and Christopher Tauchmann and Steffen Eger and Iryna Gurevych},
title        = {{ArgumenText}: Searching for Arguments in Heterogeneous Sources},
booktitle    = {Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations (NAACL-HLT 2018)},
pages        = {21--25},
month        = jun,
year         = {2018},
isbn         = {978-1-948087-28-5},
doi          = {10.18653/v1/N18-5005},
}
3. Christian Stab, Tristan Miller, and Iryna Gurevych. Cross-topic Argument Mining from Heterogeneous Sources Using Attention-based Neural Networks. ArXiv e-prints, 1802.05758, February 2018.

Argument mining is a core technology for automating argument search in large document collections. Despite its usefulness for this task, most current approaches to argument mining are designed for use only with specific text types and fall short when applied to heterogeneous texts. In this paper, we propose a new sentential annotation scheme that is reliably applicable by crowd workers to arbitrary Web texts. We source annotations for over 25,000 instances covering eight controversial topics. The results of cross-topic experiments show that our attention-based neural network generalizes best to unseen topics and outperforms vanilla BiLSTM models by 6\% in accuracy and 11\% in F-score.
@article{stab2018cross-topic,
author       = {Christian Stab and Tristan Miller and Iryna Gurevych},
title        = {Cross-topic Argument Mining from Heterogeneous Sources Using Attention-based Neural Networks},
journal      = {ArXiv e-prints},
volume       = {1802.05758},
month        = feb,
year         = {2018},
}
4. Tristan Miller, Christian F. Hempelmann, and Iryna Gurevych. SemEval-2017 Task 7: Detection and Interpretation of English Puns. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 58–68, August 2017. ISBN 978-1-945626-55-5.

A pun is a form of wordplay in which a word suggests two or more meanings by exploiting polysemy, homonymy, or phonological similarity to another word, for an intended humorous or rhetorical effect. Though a recurrent and expected feature in many discourse types, puns stymie traditional approaches to computational lexical semantics because they violate their one-sense-per-context assumption. This paper describes the first competitive evaluation for the automatic detection, location, and interpretation of puns. We describe the motivation for these tasks, the evaluation methods, and the manually annotated data set. Finally, we present an overview and discussion of the participating systems' methodologies, resources, and results.
@inproceedings{miller2017semeval,
author       = {Tristan Miller and Christian F. Hempelmann and Iryna Gurevych},
title        = {{SemEval}-2017 {Task}~7: {Detection} and Interpretation of {English} Puns},
booktitle    = {Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)},
pages        = {58--68},
month        = aug,
year         = {2017},
isbn         = {978-1-945626-55-5},
}
5. Sallam Abualhaija, Tristan Miller, Judith Eckle-Kohler, Iryna Gurevych, and Karl-Heinz Zimmermann. Metaheuristic Approaches to Lexical Substitution and Simplification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), volume 1, pages 870–879, April 2017. ISBN 978-1-945626-34-0.

In this paper, we propose using metaheuristics---in particular, simulated annealing and the new D-Bees algorithm---to solve word sense disambiguation as an optimization problem within a knowledge-based lexical substitution system. We are the first to perform such an extrinsic evaluation of metaheuristics, for which we use two standard lexical substitution datasets, one English and one German. We find that D-Bees has robust performance for both languages, and performs better than simulated annealing, though both achieve good results. Moreover, the D-Bees--based lexical substitution system outperforms state-of-the-art systems on several evaluation metrics. We also show that D-Bees achieves competitive performance in lexical simplification, a variant of lexical substitution.
@inproceedings{abualhaija2017metaheuristic,
author       = {Sallam Abualhaija and Tristan Miller and Judith Eckle-Kohler and Iryna Gurevych and Karl-Heinz Zimmermann},
title        = {Metaheuristic Approaches to Lexical Substitution and Simplification},
booktitle    = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)},
volume       = {1},
pages        = {870--879},
month        = apr,
year         = {2017},
isbn         = {978-1-945626-34-0},
}
6. Christian F. Hempelmann and Tristan Miller. Puns: Taxonomy and Phonology. In Salvatore Attardo, editor, The Routledge Handbook of Language and Humor, Routledge Handbooks in Linguistics, pages 95–108. Routledge, New York, NY, February 2017. ISBN 978-1-138-84306-6. DOI: 10.4324/9781315731162.

@incollection{hempelmann2017taxonomy,
author       = {Christian F. Hempelmann and Tristan Miller},
editor       = {Salvatore Attardo},
title        = {Puns: Taxonomy and Phonology},
booktitle    = {The Routledge Handbook of Language and Humor},
pages        = {95--108},
series       = {Routledge Handbooks in Linguistics},
month        = feb,
year         = {2017},
publisher    = {Routledge},
isbn         = {978-1-138-84306-6},
doi          = {10.4324/9781315731162},
}
7. Chinnappa Guggilla, Tristan Miller, and Iryna Gurevych. CNN- and LSTM-based Claim Classification in Online User Comments. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers (COLING 2016), pages 2740–2751, December 2016. ISBN 978-4-87974-702-0.

When processing arguments in online user interactive discourse, it is often necessary to determine their bases of support. In this paper, we describe a supervised approach, based on deep neural networks, for classifying the claims made in online arguments. We conduct experiments using convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) on two claim data sets compiled from online user comments. Using different types of distributional word embeddings, but without incorporating any rich, expensive set of features, we achieve a significant improvement over the state of the art for one data set (which categorizes arguments as factual vs.\ emotional), and performance comparable to the state of the art on the other data set (which categorizes claims according to their verifiability). Our approach has the advantages of using a generalized, simple, and effective methodology that works for claim categorization on different data sets and tasks.
@inproceedings{guggilla2016cnn,
author       = {Chinnappa Guggilla and Tristan Miller and Iryna Gurevych},
title        = {{CNN}- and {LSTM}-based Claim Classification in Online User Comments},
booktitle    = {Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers (COLING 2016)},
pages        = {2740--2751},
month        = dec,
year         = {2016},
isbn         = {978-4-87974-702-0},
}
8. Tristan Miller, Mohamed Khemakhem, Richard Eckart de Castilho, and Iryna Gurevych. Sense-annotating a Lexical Substitution Data Set with Ubyline. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pages 828–835. European Language Resources Association, May 2016. ISBN 978-2-9517408-9-1.

We describe the construction of GLASS, a newly sense-annotated version of the German lexical substitution data set used at the GermEval 2015:~LexSub shared task. Using the two annotation layers, we conduct the first known empirical study of the relationship between manually applied word senses and lexical substitutions. We find that synonymy and hypernymy{\slash}hyponymy are the only semantic relations directly linking targets to their substitutes, and that substitutes in the target's hypernymy{\slash}hyponymy taxonomy closely align with the synonyms of a single GermaNet synset. Despite this, these substitutes account for a minority of those provided by the annotators. The results of our analysis accord with those of a previous study on English-language data (albeit with automatically induced word senses), leading us to suspect that the sense--substitution relations we discovered may be of a universal nature. We also tentatively conclude that relatively cheap lexical substitution annotations can be used as a knowledge source for automatic WSD. Also introduced in this paper is Ubyline, the web application used to produce the sense annotations. Ubyline presents an intuitive user interface optimized for annotating lexical sample data, and is readily adaptable to sense inventories other than GermaNet.
@inproceedings{miller2016sense-annotating,
author       = {Tristan Miller and Mohamed Khemakhem and Eckart de Castilho, Richard and Iryna Gurevych},
editor       = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
title        = {Sense-annotating a Lexical Substitution Data Set with {Ubyline}},
booktitle    = {Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)},
pages        = {828--835},
month        = may,
year         = {2016},
publisher    = {European Language Resources Association},
isbn         = {978-2-9517408-9-1},
}
9. Tristan Miller. Adjusting Sense Representations for Word Sense Disambiguation and Automatic Pun Interpretation. Dr.-Ing. thesis, Department of Computer Science, Technische Universität Darmstadt, April 2016.

@phdthesis{miller2016adjusting,
author       = {Tristan Miller},
title        = {Adjusting Sense Representations for Word Sense Disambiguation and Automatic Pun Interpretation},
type         = {{Dr.-Ing.}\ thesis},
month        = apr,
year         = {2016},
school       = {Department of Computer Science, Technische Universit{\"{a}}t Darmstadt},
}
10. Tristan Miller and Mladen Turković. Towards the Automatic Detection and Identification of English Puns. European Journal of Humour Research, 4(1):59–75, January 2016. ISSN 2307-700X.

Lexical polysemy, a fundamental characteristic of all human languages, has long been regarded as a major challenge to machine translation, human–computer interaction, and other applications of computational natural language processing (NLP). Traditional approaches to automatic word sense disambiguation (WSD) rest on the assumption that there exists a single, unambiguous communicative intention underlying every word in a document. However, writers sometimes intend for a word to be interpreted as simultaneously carrying multiple distinct meanings. This deliberate use of lexical ambiguity~--- \emph{i.e.},\ punning~--- is a particularly common source of humour, and therefore has important implications for how NLP systems process documents and interact with users. In this paper we make a case for research into computational methods for the detection of puns in running text and for the isolation of the intended meanings. We discuss the challenges involved in adapting principles and techniques from WSD to humorously ambiguous text, and outline our plans for evaluating WSD-inspired systems in a dedicated pun identification task. We describe the compilation of a large manually annotated corpus of puns and present an analysis of its properties. While our work is principally concerned with simple puns which are monolexemic and homographic (\emph{i.e.},\ exploiting single words which have different meanings but are spelled identically), we touch on the challenges involved in processing other types.
@article{miller2016towards,
author       = {Tristan Miller and Mladen Turkovi{\'{c}}},
title        = {Towards the Automatic Detection and Identification of {English} Puns},
journal      = {European Journal of Humour Research},
volume       = {4},
number       = {1},
pages        = {59--75},
month        = jan,
year         = {2016},
issn         = {2307-700X},
}
11. Tristan Miller, Darina Benikova, and Sallam Abualhaija. GermEval 2015: LexSub – A Shared Task for German-language Lexical Substitution. In Proceedings of GermEval 2015: LexSub, pages 1–9, September 2015.

Lexical substitution is a task in which participants are given a word in a short context and asked to provide a list of synonyms appropriate for that context. This paper describes GermEval 2015: LexSub, the first shared task for automated lexical substitution on German-language text. We describe the motivation for this task, the evaluation methods, and the manually annotated data set used to train and test the participating systems. Finally, we present an overview and discussion of the participating systems' methodologies, resources, and results.
@inproceedings{miller2015germeval,
author       = {Miller, Tristan and Benikova, Darina and Abualhaija, Sallam},
title        = {{GermEval} 2015: {LexSub}~-- {A} Shared Task for {German}-language Lexical Substitution},
booktitle    = {Proceedings of GermEval 2015: LexSub},
pages        = {1--9},
month        = sep,
year         = {2015},
}
12. Tristan Miller and Iryna Gurevych. Automatic Disambiguation of English Puns. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL–IJCNLP 2015), volume 1, pages 719–729, July 2015. ISBN 978-1-941643-72-3.

Traditional approaches to word sense disambiguation (WSD) rest on the assumption that there exists a single, unambiguous communicative intention underlying every word in a document. However, writers sometimes intend for a word to be interpreted as simultaneously carrying multiple distinct meanings. This deliberate use of lexical ambiguity---\emph{i.e.}, punning---is a particularly common source of humour. In this paper we describe how traditional, language-agnostic WSD approaches can be adapted to ``disambiguate'' puns, or rather to identify their double meanings. We evaluate several such approaches on a manually sense-annotated corpus of English puns and observe performance exceeding that of some knowledge-based and supervised baselines.
@inproceedings{miller2015automatic,
author       = {Tristan Miller and Iryna Gurevych},
title        = {Automatic Disambiguation of {English} Puns},
booktitle    = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL--IJCNLP 2015)},
volume       = {1},
pages        = {719--729},
month        = jul,
year         = {2015},
isbn         = {978-1-941643-72-3},
}
13. Tristan Miller. A255436: Number of Distinct, Connected, Order-n Subgraphs of the Infinite Knight's Graph. In The On-line Encyclopedia of Integer Sequences, February 2015.

We present an integer sequence $a(n)$ corresponding to the number of distinct graphs of order $n$ whose vertices can be mapped to different squares of a chessboard such that each connected pair of vertices is a knight's move apart.
@incollection{A255436,
author       = {Tristan Miller},
title        = {A255436: Number of Distinct, Connected, Order-n Subgraphs of the Infinite Knight's Graph},
booktitle    = {The On-line Encyclopedia of Integer Sequences},
month        = feb,
year         = {2015},
}
14. Michael Matuschek, Tristan Miller, and Iryna Gurevych. A Language-independent Sense Clustering Approach for Enhanced WSD. In Josef Ruppenhofer and Gertrud Faaß, editors, Proceedings of the 12th Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2014), pages 11–21. Universitätsverlag Hildesheim, October 2014. ISBN 978-3-934105-46-1.

We present a method for clustering word senses of a lexical-semantic resource by mapping them to those of another sense inventory. This is a promising way of reducing polysemy in sense inventories and consequently improving word sense disambiguation performance. In contrast to previous approaches, we use Dijkstra-WSA, a parameterizable alignment algorithm which is largely resource- and language-agnostic. To demonstrate this, we apply our technique to GermaNet, the German equivalent to WordNet. The GermaNet sense clusterings we induce through alignments to various collaboratively constructed resources achieve a significant boost in accuracy, even though our method is far less complex and less dependent on language-specific knowledge than past approaches.
@inproceedings{matuschek2014language,
author       = {Michael Matuschek and Tristan Miller and Iryna Gurevych},
editor       = {Josef Ruppenhofer and Gertrud Faa{\ss}},
title        = {A Language-independent Sense Clustering Approach for Enhanced {WSD}},
booktitle    = {Proceedings of the 12th Konferenz zur Verarbeitung nat{\"{u}}rlicher Sprache (KONVENS 2014)},
pages        = {11--21},
month        = oct,
year         = {2014},
publisher    = {Universit{\"{a}}tsverlag Hildesheim},
isbn         = {978-3-934105-46-1},
}
15. Tristan Miller and Iryna Gurevych. WordNet–Wikipedia–Wiktionary: Construction of a Three-way Alignment. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), pages 2094–2100. European Language Resources Association, May 2014. ISBN 978-2-9517408-8-4.

The coverage and quality of conceptual information contained in lexical semantic resources is crucial for many tasks in natural language processing. Automatic alignment of complementary resources is one way of improving this coverage and quality; however, past attempts have always been between pairs of specific resources. In this paper we establish some set-theoretic conventions for describing concepts and their alignments, and use them to describe a method for automatically constructing $n$-way alignments from arbitrary pairwise alignments. We apply this technique to the production of a three-way alignment from previously published WordNet--Wikipedia and WordNet--Wiktionary alignments. We then present a quantitative and informal qualitative analysis of the aligned resource. The three-way alignment was found to have greater coverage, an enriched sense representation, and coarser sense granularity than both the original resources and their pairwise alignments, though this came at the cost of accuracy. An evaluation of the induced word sense clusters in a word sense disambiguation task showed that they were no better than random clusters of equivalent granularity. However, use of the alignments to enrich a sense inventory with additional sense glosses did significantly improve the performance of a baseline knowledge-based WSD algorithm.
@inproceedings{miller2014wordnet,
author       = {Tristan Miller and Iryna Gurevych},
editor       = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
title        = {{WordNet}--{Wikipedia}--{Wiktionary}: Construction of a Three-way Alignment},
booktitle    = {Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014)},
pages        = {2094--2100},
month        = may,
year         = {2014},
publisher    = {European Language Resources Association},
isbn         = {978-2-9517408-8-4},
}
16. Tristan Miller, Nicolai Erbs, Hans-Peter Zorn, Torsten Zesch, and Iryna Gurevych. DKPro WSD: A Generalized UIMA-based Framework for Word Sense Disambiguation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (System Demonstrations) (ACL 2013), pages 37–42, August 2013.

Implementations of word sense disambiguation (WSD) algorithms tend to be tied to a particular test corpus format and sense inventory. This makes it difficult to test their performance on new data sets, or to compare them against past algorithms implemented for different data sets. In this paper we present DKPro WSD, a freely licensed, general-purpose framework for WSD which is both modular and extensible. DKPro WSD abstracts the WSD process in such a way that test corpora, sense inventories, and algorithms can be freely swapped. Its UIMA-based architecture makes it easy to add support for new resources and algorithms. Related tasks such as word sense induction and entity linking are also supported.
@inproceedings{miller2013dkpro,
author       = {Tristan Miller and Nicolai Erbs and Hans-Peter Zorn and Torsten Zesch and Iryna Gurevych},
title        = {{DKPro} {WSD}: {A} Generalized {UIMA}-based Framework for Word Sense Disambiguation},
booktitle    = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (System Demonstrations) (ACL 2013)},
pages        = {37--42},
month        = aug,
year         = {2013},
}
17. Tristan Miller, Chris Biemann, Torsten Zesch, and Iryna Gurevych. Using Distributional Similarity for Lexical Expansion in Knowledge-based Word Sense Disambiguation. In Martin Kay and Christian Boitet, editors, Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 1781–1796, December 2012.

We explore the contribution of distributional information for purely knowledge-based word sense disambiguation. Specifically, we use a distributional thesaurus, computed from a large parsed corpus, for lexical expansion of context and sense information. This bridges the lexical gap that is seen as the major obstacle for word overlap–based approaches. We apply this mechanism to two traditional knowledge-based methods and show that distributional information significantly improves disambiguation results across several data sets. This improvement exceeds the state of the art for disambiguation without sense frequency information—a situation which is especially encountered with new domains or languages for which no sense-annotated corpus is available.
@inproceedings{miller2012using,
author       = {Tristan Miller and Chris Biemann and Torsten Zesch and Iryna Gurevych},
editor       = {Martin Kay and Christian Boitet},
title        = {Using Distributional Similarity for Lexical Expansion in Knowledge-based Word Sense Disambiguation},
booktitle    = {Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012)},
pages        = {1781--1796},
month        = dec,
year         = {2012},
}
18. Tristan Miller, Bertin Klein, and Elisabeth Wolf. Exploiting Latent Semantic Relations in Highly Linked Hypertext for Information Retrieval in Wikis. In Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Nicolas Nicolov, and Nikolai Nikolov, editors, Proceedings of the 7th International Conference on Recent Advances in Natural Language Processing (RANLP 2009), pages 241–245. ACM Press, September 2009.

Good hypertext writing style mandates that link texts clearly indicate the nature of the link target. While this guideline is routinely ignored in HTML, the lightweight markup languages used by wikis encourage or even force hypertext authors to use semantically appropriate link texts. This property of wiki hypertext makes it an ideal candidate for processing with latent semantic analysis, a factor analysis technique for finding latent transitive relations among natural-language documents. In this study, we design, implement, and test an LSA-based information retrieval system for wikis. Instead of a full-text index, our system indexes only link texts and document titles. Nevertheless, its precision exceeds that of a popular full-text search engine, and is comparable to that of PageRank-based systems such as Google.
@inproceedings{miller2009exploiting,
author       = {Tristan Miller and Bertin Klein and Elisabeth Wolf},
editor       = {Galia Angelova and Kalina Bontcheva and Ruslan Mitkov and Nicolas Nicolov and Nikolai Nikolov},
title        = {Exploiting Latent Semantic Relations in Highly Linked Hypertext for Information Retrieval in Wikis},
booktitle    = {Proceedings of the 7th International Conference on Recent Advances in Natural Language Processing (RANLP 2009)},
pages        = {241--245},
month        = sep,
year         = {2009},
publisher    = {ACM Press},
}
19. Tristan Miller and Elisabeth Wolf. Word Completion with Latent Semantic Analysis. In Yuan Yan Tang, S. Patrick Wang, G. Lorette, Daniel So Yeung, and Hong Yan, editors, Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), volume 1, pages 1252–1255. IEEE Press, August 2006. ISBN 978-0-7695-2521-1. DOI: 10.1109/ICPR.2006.1191.

Current word completion tools rely mostly on statistical or syntactic knowledge. Can using semantic knowledge improve the completion task? We propose a language-independent word completion algorithm which uses latent semantic analysis (LSA) to model the semantic context of the word being typed. We find that a system using this algorithm alone achieves keystroke savings of 56\% and a hit rate of 42\%. This represents improvements of 4.3\% and 12\%, respectively, over existing approaches.
@inproceedings{miller2006word,
author       = {Tristan Miller and Elisabeth Wolf},
editor       = {Yuan Yan Tang and S. Patrick Wang and G. Lorette and Daniel So Yeung and Hong Yan},
title        = {Word Completion with Latent Semantic Analysis},
booktitle    = {Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006)},
volume       = {1},
pages        = {1252--1255},
month        = aug,
year         = {2006},
publisher    = {IEEE Press},
isbn         = {978-0-7695-2521-1},
issn         = {1051-4651},
doi          = {10.1109/ICPR.2006.1191},
}
20. Elisabeth Wolf, Shankar Vembu, and Tristan Miller. On the Use of Topic Models for Word Completion. In Tapio Salakoski, Filip Ginter, Sampo Pyysalo, and Tapio Pahikkala, editors, Proceedings of the 5th International Conference on Natural Language Processing (FinTAL 2006), volume 4139 of Lecture Notes in Computer Science (ISSN 0302-9743), pages 500–511. Springer, August 2006. ISBN 978-3-540-37334-6. DOI: 10.1007/11816508_50.

We investigate the use of topic models, such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), for word completion tasks. The advantage of using these models for such an application is twofold. On the one hand, they allow us to exploit semantic or contextual information when predicting candidate words for completion. On the other hand, these probabilistic models have been found to outperform classical latent semantic analysis (LSA) for modeling text documents. We describe a word completion algorithm that takes into account the semantic context of the word being typed. We also present evaluation metrics to compare different models being used in our study. Our experiments validate our hypothesis of using probabilistic models for semantic analysis of text documents and their application in word completion tasks.
@inproceedings{wolf2006use,
author       = {Elisabeth Wolf and Shankar Vembu and Tristan Miller},
editor       = {Tapio Salakoski and Filip Ginter and Sampo Pyysalo and Tapio Pahikkala},
title        = {On the Use of Topic Models for Word Completion},
booktitle    = {Proceedings of the 5th International Conference on Natural Language Processing (FinTAL 2006)},
volume       = {4139},
pages        = {500--511},
series       = {Lecture Notes in Computer Science},
month        = aug,
year         = {2006},
publisher    = {Springer},
isbn         = {978-3-540-37334-6},
issn         = {0302-9743},
doi          = {10.1007/11816508_50},
}
21. Tristan Miller. Creare splendide slide con LaTeX: Un'introduzione al pacchetto HA-prosper [Producing Beautiful Slides with LaTeX: An Introduction to the HA-prosper Package]. Pluto Journal, (47), May 2006. Translated by Gabriele Zucchetta.

In this article we present HA-prosper, a LaTeX package for creating sophisticated slides. We describe its features and illustrate them with examples of use. We also discuss the advantages of the LaTeX-style approach compared with the other kinds of presentation programs commonly found in today's office application suites.
@article{miller2006producing,
author       = {Tristan Miller},
title        = {Creare splendide slide con {\LaTeX}: Un'introduzione al pacchetto {HA}-prosper [{Producing} Beautiful Slides with {\LaTeX}: {An} Introduction to the {HA}-prosper Package]},
journal      = {Pluto Journal},
number       = {47},
month        = may,
year         = {2006},
note         = {Translated by Gabriele Zucchetta},
}
22. Peter Flom and Tristan Miller. Impressions from PracTeX'05. TUGboat, 26(1):31–32, 2005. ISSN 0896-3207.

@article{flom2005bimpressions,
author       = {Peter Flom and Tristan Miller},
title        = {Impressions from {Prac}{\TeX}'05},
journal      = {TUGboat},
volume       = {26},
number       = {1},
pages        = {31--32},
year         = {2005},
issn         = {0896-3207},
}
23. Tristan Miller. Biblet: A Portable BibTeX Bibliography Style for Generating Highly Customizable XHTML. TUGboat, 26(1):85–96, 2005. ISSN 0896-3207.

We present Biblet, a set of BibTeX bibliography styles (bst) which generate XHTML from BibTeX databases. Unlike other BibTeX to XML{\slash}HTML converters, Biblet is written entirely in the native BibTeX style language and therefore works ``out of the box'' on any system that runs BibTeX. Features include automatic conversion of LaTeX symbols to HTML or Unicode entities; customizable graphical hyperlinks to PostScript, PDF, DVI, LaTeX, and HTML resources; support for nonstandard but common fields such as day, isbn, and abstract; hideable text blocks; and output of the original BibTeX entry for sharing citations. Biblet's highly structured XHTML output means that bibliography appearance can be drastically altered simply by specifying a Cascading Style Sheet (CSS), or easily postprocessed with third-party XML, HTML, or text processing tools. We compare and contrast Biblet to other common converters, describe basic usage of Biblet, give examples of how to produce custom-formatted bibliographies, and provide a basic overview of Biblet internals for those wishing to modify the style file itself.
@article{miller2005biblet,
author       = {Tristan Miller},
title        = {Biblet: {A} Portable {\BibTeX}{} Bibliography Style for Generating Highly Customizable {XHTML}},
journal      = {TUGboat},
volume       = {26},
number       = {1},
pages        = {85--96},
year         = {2005},
issn         = {0896-3207},
}
24. Tristan Miller. TUGboat, 26(1):17–28, 2005. ISSN 0896-3207.

RPM is a package management system which provides a uniform, automated way for users to install, upgrade, and uninstall programs. Because RPM is the default software distribution format for many operating systems (particularly GNU/Linux), users may find it useful to manage their library of TeX-related packages using RPM. This article explains how to produce RPM files for TeX software, either for personal use or for public distribution. We also explain how a (La)TeX user can find, install, and remove TeX-related RPM packages.
@article{miller2005using,
author       = {Tristan Miller},
title        = {Using the {RPM} {Package} {Manager} for {(La)\TeX}{} Packages},
journal      = {TUGboat},
volume       = {26},
number       = {1},
pages        = {17--28},
year         = {2005},
issn         = {0896-3207},
}
25. Tristan Miller and Stefan Agne. In Peter Clark and Guus Schreiber, editors, Proceedings of the 3rd International Conference on Knowledge Capture (K-CAP 2005), pages 209–210, New York, NY, September 2005. ACM. ISBN 978-1-59593-163-4. DOI: 10.1145/1088622.1088672.

We describe eFISK, an automated keyword extraction system which unobtrusively measures the user's attention in order to isolate and identify those areas of a written document the reader finds of greatest interest. Attention is measured by use of eye-tracking hardware consisting of a desk-mounted infrared camera which records various data about the user's eye. The keywords thus identified are subsequently used in the back end of an information retrieval system to help the user find other documents which contain information of interest to him. Unlike traditional IR techniques which compare documents simply on the basis of common terms alone, our system also accounts for the weights users implicitly attach to certain words or sections of the source document. We describe a task-based user study which compares the utility of standard relevance feedback techniques to the keywords and keyphrases discovered by our system in finding other relevant documents from a corpus.
@inproceedings{miller2005attention-based,
author       = {Tristan Miller and Stefan Agne},
editor       = {Peter Clark and Guus Schreiber},
title        = {Attention-based Information Retrieval Using Eye Tracker Data},
booktitle    = {Proceedings of the 3rd International Conference on Knowledge Capture ({K-CAP} 2005)},
pages        = {209--210},
month        = sep,
year         = {2005},
publisher    = {ACM},
isbn         = {978-1-59593-163-4},
doi          = {10.1145/1088622.1088672},
}
26. Peter Flom and Tristan Miller. The PracTeX Journal, 2(3), July 2005. ISSN 1556-6994.

@article{flom2005impressions,
author       = {Peter Flom and Tristan Miller},
title        = {Impressions from {Prac}{\TeX}'05},
journal      = {The Prac{\TeX}{} Journal},
volume       = {2},
number       = {3},
month        = jul,
year         = {2005},
issn         = {1556-6994},
}
27. Bertin Klein, Tristan Miller, and Sandra Zilles. In Dieter Hutter and Markus Ullmann, editors, Security in Pervasive Computing: Proceedings of the 2nd International Conference on Security in Pervasive Computing (SPC 2005), volume 3450 of Lecture Notes in Computer Science (ISSN 0302-9743), pages 56–62. Springer, April 2005. ISBN 3-540-25521-4.

Technological progress allows us to equip any mobile phone with new functionalities, such as storing personalized information about its owner and using the corresponding personal profile for enabling communication to persons whose mobile phones represent similar profiles. However, this raises very specific security issues, in particular relating to the use of Bluetooth technology. Herein we consider such scenarios and related problems in privacy and security matters. We analyze in which respect certain design approaches may fail or succeed at solving these problems. We concentrate on methods for designing the user-related part of the communication service appropriately in order to enhance confidentiality.
@inproceedings{klein2005security,
author       = {Bertin Klein and Tristan Miller and Sandra Zilles},
editor       = {Dieter Hutter and Markus Ullmann},
title        = {Security Issues for Pervasive Personalized Communication Systems},
booktitle    = {Security in Pervasive Computing: Proceedings of the 2nd International Conference on Security in Pervasive Computing (SPC 2005)},
volume       = {3450},
pages        = {56--62},
series       = {Lecture Notes in Computer Science},
month        = apr,
year         = {2005},
publisher    = {Springer},
isbn         = {3-540-25521-4},
issn         = {0302-9743},
}
28. Tristan Miller. The PracTeX Journal, 2(1), April 2005. ISSN 1556-6994.

In this paper, we present HA-prosper, a LaTeX package for creating overhead slides. We describe the features of the package and give examples of their use. We also discuss what advantages there are to producing slides with LaTeX versus the presentation software typically bundled with today's office suites.
@article{miller2005producing,
author       = {Tristan Miller},
title        = {Producing Beautiful Slides with {\LaTeX}: {An} Introduction to the {HA}-prosper Package},
journal      = {The Prac{\TeX}{} Journal},
volume       = {2},
number       = {1},
month        = apr,
year         = {2005},
issn         = {1556-6994},
}
29. Tristan Miller. In Nicolas Nicolov, Kalina Botcheva, Galia Angelova, and Ruslan Mitkov, editors, Recent Advances in Natural Language Processing III, volume 260 of Current Issues in Linguistic Theory (CILT) (ISSN 0304-0763), pages 277–286. John Benjamins, Amsterdam/Philadelphia, 2004. ISBN 1-58811-618-2. DOI: 10.1075/cilt.260.31mil.

We describe a language-neutral automatic summarization system which aims to produce coherent extracts. It builds an initial extract composed solely of topic sentences, and then recursively fills in the topical lacunae by providing linking material between semantically dissimilar sentences. While experiments with human judges did not prove a statistically significant increase in textual coherence with the use of a latent semantic analysis module, we found a strong positive correlation between coherence and overall summary quality.
@incollection{miller2004latent,
author       = {Tristan Miller},
editor       = {Nicolas Nicolov and Kalina Botcheva and Galia Angelova and Ruslan Mitkov},
title        = {Latent Semantic Analysis and the Construction of Coherent Extracts},
booktitle    = {Recent Advances in Natural Language Processing {III}},
volume       = {260},
pages        = {277--286},
series       = {Current Issues in Linguistic Theory (CILT)},
year         = {2004},
publisher    = {John Benjamins},
isbn         = {1-58811-618-2},
issn         = {0304-0763},
doi          = {10.1075/cilt.260.31mil},
}
30. Tristan Miller. Journal of Educational Computing Research, 29(4):495–512, December 2003. ISSN 0735-6331. DOI: 10.2190/W5AR-DYPW-40KX-FL99.

Latent semantic analysis (LSA) is an automated, statistical technique for comparing the semantic similarity of words or documents. In this paper, I examine the application of LSA to automated essay scoring. I compare LSA methods to earlier statistical methods for assessing essay quality, and critically review contemporary essay-scoring systems built on LSA, including the Intelligent Essay Assessor, Summary Street, State the Essence, Apex, and Select-a-Kibitzer. Finally, I discuss current avenues of research, including LSA's application to computer-measured readability assessment and to automatic summarization of student essays.
@article{miller2003essay,
author       = {Tristan Miller},
title        = {Essay Assessment with Latent Semantic Analysis},
journal      = {Journal of Educational Computing Research},
volume       = {29},
number       = {4},
pages        = {495--512},
month        = dec,
year         = {2003},
issn         = {0735-6331},
doi          = {10.2190/W5AR-DYPW-40KX-FL99},
}
31. Tristan Miller. In Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Nicolas Nicolov, and Nikolai Nikolov, editors, Proceedings of the 4th International Conference on Recent Advances in Natural Language Processing (RANLP 2003), pages 270–277, September 2003. ISBN 954-90906-6-3.

We describe a language-neutral automatic summarization system which aims to produce coherent extracts. It builds an initial extract composed solely of topic sentences, and then recursively fills in the topical lacunae by providing linking material between semantically dissimilar sentences. While experiments with human judges did not prove a statistically significant increase in textual coherence with the use of a latent semantic analysis module, we found a strong positive correlation between coherence and overall summary quality.
@inproceedings{miller2003latent,
author       = {Tristan Miller},
editor       = {Galia Angelova and Kalina Bontcheva and Ruslan Mitkov and Nicolas Nicolov and Nikolai Nikolov},
title        = {Latent Semantic Analysis and the Construction of Coherent Extracts},
booktitle    = {Proceedings of the 4th International Conference on Recent Advances in Natural Language Processing (RANLP 2003)},
pages        = {270--277},
month        = sep,
year         = {2003},
isbn         = {954-90906-6-3},
}
32. Tristan Miller. M.Sc. thesis, Department of Computer Science, University of Toronto, March 2003.

A major problem with automatically produced summaries in general, and extracts in particular, is that the output text often lacks textual coherence. Our goal is to improve the textual coherence of automatically produced extracts. We developed and implemented an algorithm which builds an initial extract composed solely of topic sentences, and then recursively fills in the lacunae by providing linking material from the original text between semantically dissimilar sentences. Our summarizer differs in architecture from most others in that it measures semantic similarity with latent semantic analysis (LSA), a factor analysis technique based on the vector-space model of information retrieval. We believed that the deep semantic relations discovered by LSA would assist in the identification and correction of abrupt topic shifts in the summaries. However, our experiments did not show a statistically significant difference in the coherence of summaries produced by our system as compared with a non-LSA version.
@mastersthesis{miller2003generating,
author       = {Tristan Miller},
title        = {Generating Coherent Extracts of Single Documents Using Latent Semantic Analysis},
type         = {{M.Sc.}\ thesis},
month        = mar,
year         = {2003},
school       = {Department of Computer Science, University of Toronto},
}
33. Michael J. Maher, Allan Rock, Grigoris Antoniou, David Billington, and Tristan Miller. International Journal on Artificial Intelligence Tools, 10(4):483–501, December 2001. ISSN 0218-2130. DOI: 10.1142/S0218213001000623.

For many years, the non-monotonic reasoning community has focussed on highly expressive logics. Such logics have turned out to be computationally expensive, and have given little support to the practical use of non-monotonic reasoning. In this work we discuss defeasible logic, a less-expressive but more efficient non-monotonic logic. We report on two new implemented systems for defeasible logic: a query answering system employing a backward-chaining approach, and a forward-chaining implementation that computes all conclusions. Our experimental evaluation demonstrates that the systems can deal with large theories (up to hundreds of thousands of rules). We show that defeasible logic has linear complexity, which contrasts markedly with most other non-monotonic logics and helps to explain the impressive experimental results. We believe that defeasible logic, with its efficiency and simplicity, is a good candidate to be used as a modelling language for practical applications, including modelling of regulations and business rules.
@article{maher2001efficient,
author       = {Michael J. Maher and Allan Rock and Grigoris Antoniou and David Billington and Tristan Miller},
title        = {Efficient Defeasible Reasoning Systems},
journal      = {International Journal on Artificial Intelligence Tools},
volume       = {10},
number       = {4},
pages        = {483--501},
month        = dec,
year         = {2001},
issn         = {0218-2130},
doi          = {10.1142/S0218213001000623},
}
34. Tristan Miller. Essay assessment with latent semantic analysis. Technical Report CSRG-440, Department of Computer Science, University of Toronto, May 2001.

@techreport{miller2001essay,
author       = {Tristan Miller},
title        = {Essay Assessment with Latent Semantic Analysis},
number       = {{CSRG-440}},
type         = {Technical Report},
month        = may,
year         = {2001},
institution  = {Department of Computer Science, University of Toronto},
}
35. Michael J. Maher, Allan Rock, Grigoris Antoniou, David Billington, and Tristan Miller. In Proceedings of the 12th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2000), pages 384–392. IEEE Press, November 2000. ISBN 0-7695-0909-6. DOI: 10.1109/TAI.2000.889898.

For many years, the non-monotonic reasoning community has focussed on highly expressive logics. Such logics have turned out to be computationally expensive, and have given little support to the practical use of non-monotonic reasoning. In this work we discuss defeasible logic, a less-expressive but more efficient non-monotonic logic. We report on two new implemented systems for defeasible logic: a query answering system employing a backward-chaining approach, and a forward-chaining implementation that computes all conclusions. Our experimental evaluation demonstrates that the systems can deal with large theories (up to hundreds of thousands of rules). We show that defeasible logic has linear complexity, which contrasts markedly with most other non-monotonic logics and helps to explain the impressive experimental results. We believe that defeasible logic, with its efficiency and simplicity, is a good candidate to be used as a modelling language for practical applications, including modelling of regulations and business rules.
@inproceedings{maher2000efficient,
author       = {Michael J. Maher and Allan Rock and Grigoris Antoniou and David Billington and Tristan Miller},
title        = {Efficient Defeasible Reasoning Systems},
booktitle    = {Proceedings of the 12th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2000)},
pages        = {384--392},
month        = nov,
year         = {2000},
publisher    = {IEEE Press},
isbn         = {0-7695-0909-6},
issn         = {1082-3409},
doi          = {10.1109/TAI.2000.889898},
}
36. Yang Xiang and Tristan Miller. International Journal of Applied Mathematics, 1(8):923–932, 1999. ISSN 1311-1728.

Automatic generation of Bayesian network (BN) structures (directed acyclic graphs) is an important step in experimental study of algorithms for inference in BNs and algorithms for learning BNs from data. Previously known simulation algorithms do not guarantee connectedness of generated structures or even successful generation according to a user specification. We propose a simple, efficient and well-behaved algorithm for automatic generation of BN structures. The performance of the algorithm is demonstrated experimentally.
@article{xiang1999wellbehaved,
author       = {Yang Xiang and Tristan Miller},
title        = {A Well-behaved Algorithm for Simulating Dependence Structures of {Bayesian} Networks},
journal      = {International Journal of Applied Mathematics},
volume       = {1},
number       = {8},
pages        = {923--932},
year         = {1999},
issn         = {1311-1728},
}