AUTOMATIC IDENTIFICATION OF HATE SPEECH IN ONLINE COMMENTS: A TOOL-BASED LINGUISTIC ANALYSIS OF ROMAN URDU

Authors

  • Asma Batool, MPhil Scholar, Department of English, NUML University, Faisalabad Campus
  • Dr. Aftab Akram, Lecturer, Department of English, NUML University, Faisalabad Campus

DOI:

https://doi.org/10.63878/jalt1989

Abstract

This paper explores the automatic detection of hate speech in online comments through a tool-based linguistic analysis of Pakistani Roman Urdu discourse on Twitter. Adopting an exploratory computational design and a mixed-method approach that combines automated rule-based detection with interpretative linguistic analysis, the study collected a total of 108 comments addressed to Pakistani political leaders and institutions. A three-level hate speech lexicon was built inductively from the corpus and applied using the WordList and Collocation tools of the AntConc corpus software to identify high-frequency hate markers and their collocational patterns. The comments were then classified as hate speech, borderline, or non-hate speech. Critical Discourse Analysis and Speech Act Theory provided the theoretical framework for the analysis. The results indicate that hate speech in this corpus is not random or idiosyncratic but a structured, ideologically consistent discursive practice that reproduces existing hierarchies through language, including those of gender, religion, ethnicity, and political affiliation. More importantly, the analysis shows that rule-based keyword detection, although productive as a first-pass tool, cannot serve as a standalone classifier for Roman Urdu: 38.9% of comments fall into an ambiguous category that requires pragmatic and contextual analysis to resolve. The paper contributes a replicable methodology, a curated and annotated Roman Urdu hate speech corpus, and a sociolinguistically grounded lexicon to an underrepresented area of digital discourse studies.
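
The rule-based first pass described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' procedure: the study used AntConc, and the abstract does not say how the three lexicon levels map onto the three categories. The sketch assumes, hypothetically, that levels are ordered by severity and that a comment takes the label of the most severe level it matches. All tokens are neutral placeholders rather than entries from the actual lexicon, and the names `LEXICON`, `tokenize`, `highest_level`, and `classify` are illustrative only.

```python
import re

# Hypothetical three-level lexicon. Placeholder tokens stand in for real
# entries; the level semantics below are assumptions, not the authors' scheme.
LEXICON = {
    3: {"slur_a", "slur_b"},      # assumed: explicit hate markers
    2: {"insult_a", "insult_b"},  # assumed: strong abuse
    1: {"marker_a", "marker_b"},  # assumed: context-dependent markers
}


def tokenize(comment):
    """Lowercase the comment and split it into word tokens."""
    return re.findall(r"\w+", comment.lower())


def highest_level(tokens):
    """Return the most severe lexicon level matched, or 0 if none match."""
    token_set = set(tokens)
    matched = [level for level, words in LEXICON.items() if words & token_set]
    return max(matched, default=0)


def classify(comment):
    """Assign one of the three categories named in the abstract.

    The thresholds are an assumed mapping for illustration only.
    """
    level = highest_level(tokenize(comment))
    if level >= 2:
        return "hate speech"
    if level == 1:
        return "borderline"
    return "non-hate speech"
```

A keyword matcher of this kind cannot tell whether a level-1 marker is used abusively, ironically, or descriptively, which is the limitation the abstract reports for the ambiguous category.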

Published

2026-03-31