Two forms of bias commonly associated with natural language processing (NLP) tasks are domain bias (an implicit bias towards documents from a particular domain, with lower performance on other document types) and social bias (an implicit bias towards documents authored by particular types of individuals, with lower performance on documents authored by others). In this talk, I will discuss why it is important to debias NLP models along both dimensions, and strategies for doing so. I will focus on the task of language identification (i.e., identifying the language or languages in which a written document is authored).


Tim Baldwin is a Professor in the Department of Computing and Information Systems, The University of Melbourne, and Associate Dean (Research Training) within the Melbourne School of Engineering. He has previously held visiting positions at Cambridge University, University of Washington, University of Tokyo, Saarland University, NTT Communication Science Laboratories, and National Institute of Informatics. His research interests include text mining of social media, computational lexical semantics, information extraction, and web mining.