Computational Linguistics


The Department’s computational linguistics faculty are dedicated to the advancement of human language technology and the automatic production of richer and more accurate representations of utterances in English, Chinese, Hindi/Urdu, Arabic, Farsi and other languages.

Working within the Computational Language and EducAtion Research Center (CLEAR), these researchers and their affiliates apply cutting-edge techniques from computer science to challenging issues in the processing of natural language.

Prof. Alexis Palmer uses statistical, corpus and computational linguistic methods to improve automatic detection of offensive language for online platforms. Many previous studies have shown the detrimental effect of toxic content on the people who do content moderation for online platforms, and improving automated content moderation is important for reducing their exposure to this toxicity (as well as the toxicity experienced directly by users of such platforms). Most current systems for this task rely primarily on finding offensive keywords, but much hate speech (and other forms of offensive language) uses more complex forms of language. Dr. Palmer's team has developed COLD (Complex Offensive Language Dataset), a collection of tweets and other utterances containing several different types of complex offensive language. The dataset is intended for diagnosing the strengths and weaknesses of existing systems for automatically detecting offensive language, helping researchers understand how their models can be improved.

Emeritx Professor/Research Professor Martha Palmer's research involves the application of supervised machine learning to linguistically annotated data in order to train Natural Language Processing components, such as word sense disambiguation systems. These components form the building blocks for many different types of end-to-end systems with various applications, such as Information Retrieval, Information Extraction, Question Answering and Machine Translation. The linguistic annotation defines the depth and accuracy of the computer-generated representations, and the research offers a principled approach to developing new, increasingly rich layers of semantic and pragmatic annotation.

Computational linguistics is inherently interdisciplinary: it relies on the latest developments in linguistic theory, on new algorithms and machine-learning approaches from computer science, and on findings about human language processing from cognitive research.

The potential applications of human language technology are far-reaching: the technology can be applied to any field with information in the form of text or speech. With increasing digitization of resources and collections (books, document archives, various types of records, etc.), there is a growing interest in applying computational linguistics across all disciplines and in all languages to help facilitate information gathering, filtering and prioritizing.