Detecting abusive language in user-generated online content has become increasingly important in recent years. Cyberbullying and other forms of online harassment have driven many users to delete their accounts, and large Internet companies have struggled to identify and filter abusive posts and users. Most current commercial methods rely on blacklists and regular expressions; however, these measures fall short when faced with more subtle examples of hate speech. We present a machine learning based method for detecting hate speech in online user comments, and show that it outperforms a state-of-the-art deep learning approach. We also describe how we developed a corpus of user comments annotated for abusive language, the first of its kind.
Joel Tetreault is a Research Director at Grammarly. His research focuses on Natural Language Processing, with specific interests in anaphora, dialogue and discourse processing, machine learning, and applying these techniques to the analysis of English language learning. At Grammarly, he works on the research and development of NLP tools and components for the next generation of intelligent writing assistance systems. Before joining Grammarly, Joel was a Senior Research Scientist at Yahoo Labs and Senior Principal Manager of the Core Natural Language group at Nuance Communications, Inc. He has also co-organized the Building Educational Applications workshop series for 8 years as well as several shared tasks, and currently serves as NAACL Treasurer.