SEMANTIC ANNOTATION OF SELECTED URDUIZED WORDS IN PAKISTANI ENGLISH: EVIDENCE FROM PAKLOCCORPUS

Authors

  • Fatima Tuz Zahra, PhD Scholar, Air University Islamabad; Lecturer, Minhaj University Lahore
  • Dr. Tehseen Zahra, Associate Professor, Bahria University Islamabad

DOI:

https://doi.org/10.63878/jalt1688

Abstract

Urduized words embedded in English discourse are a defining feature of Pakistani English. Despite their frequency and sociocultural salience, these words remain under-represented in corpus annotation frameworks and natural language processing (NLP) systems. This paper presents a corpus-driven semantic annotation framework for Urduized words, aligned with data from PakLocCorpus (2022). Using concordance evidence from the PakLocCorpus Urduized word list, the study conducts a discourse analysis of selected words following Gee (2011) and develops a context-sensitive semantic tag-set that captures the culturally grounded meaning domains of descriptive labels. The analysis demonstrates that Urduized words function as semantically dense cultural carriers rather than peripheral borrowings. The paper argues that systematic semantic annotation of Urduized words is essential for inclusive corpus linguistics and for reducing structural bias in English-focused NLP technologies. It further argues that descriptive labels have unique morpho-syntactic features that can inform the design of tagging tools and frameworks. The findings can also inform pedagogical strategies for English language teaching (ELT) and second language acquisition (SLA).
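
As a rough illustration of the workflow the abstract describes, the sketch below extracts keyword-in-context (concordance) lines for a word and attaches a semantic label from a small lookup table. It is a minimal sketch under stated assumptions, not the authors' method: the words, the tag names (FOOD_DRINK, CLOTHING, RELIGION_RITUAL), the sample sentence, and the function names are all hypothetical and are not drawn from PakLocCorpus or from the paper's tag-set.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical tag-set: these words and labels are illustrative only and are
# not taken from PakLocCorpus or from the paper's tag-set.
SEMANTIC_TAGS: Dict[str, str] = {
    "chai": "FOOD_DRINK",
    "dupatta": "CLOTHING",
    "nikah": "RELIGION_RITUAL",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def concordance(text: str, keyword: str, window: int = 4) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each hit of keyword."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def annotate(text: str) -> List[Tuple[str, str]]:
    """Pair every token with its semantic tag, or 'O' when it has none."""
    return [(tok, SEMANTIC_TAGS.get(tok, "O")) for tok in tokenize(text)]


# Invented sample sentence, used only to exercise the two functions.
sample = "She wore a red dupatta to the nikah and served chai to the guests."
print(concordance(sample, "chai", window=3))
print(annotate(sample))
```

The lookup here assigns one fixed label per word, so it does not model the context sensitivity the abstract emphasizes; in practice the concordance lines are what an annotator would inspect before choosing among competing labels.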

Published

2025-12-28