Word sense disambiguation (WSD), the task of identifying a word's meaning in context, has long been recognized as an important task in computational linguistics. Traditional approaches to WSD rest on the assumption that there is a single, unambiguous communicative intention underlying each word in the document. However, there exists a class of language constructs known as puns, in which lexical-semantic ambiguity is a deliberate effect of the communication act. That is, the speaker or writer intends for a certain word or other lexical item to be interpreted as simultaneously carrying two or more separate meanings. Though puns are a recurrent and expected feature in many discourse types, they have attracted relatively little attention in the fields of computational linguistics and natural language processing in general, or WSD in particular.
A pun is a form of wordplay in which one signifier (e.g., a word or phrase) suggests two or more meanings by exploiting polysemy, or phonological similarity to another signifier, for an intended humorous or rhetorical effect. For example, the first of the following two punning jokes exploits contrasting meanings of the word "interest", while the second exploits the sound similarity between the surface form "propane" and the latent target "profane":
I used to be a banker but I lost interest.
When the church bought gas for their annual barbecue, proceeds went from the sacred to the propane.
Puns where the two meanings share the same pronunciation are known as homophonic or perfect, while those relying on similar- but not identical-sounding signs are known as heterophonic or imperfect. Where the signs are considered as written rather than spoken sequences, a similar distinction can be made between homographic and heterographic puns.
Conscious or tacit linguistic knowledge – particularly of lexical semantics and phonology – is an essential prerequisite for the production and interpretation of puns. This has long made them an attractive subject of study in theoretical linguistics, and has led to a small but growing body of research into puns in computational linguistics. Most computational treatments of puns to date, however, have focused on generational algorithms or modelling their phonological properties.
Participants will be provided with two data sets:
For one or both data sets, participating systems will compete in any or all of three subtasks, evaluated in consecutive stages: