SEMANTIC ANNOTATION OF SELECTED URDUIZED WORDS IN PAKISTANI ENGLISH: EVIDENCE FROM PAKLOCCORPUS
DOI:
https://doi.org/10.63878/jalt1688
Abstract
Urduized words embedded in English discourse are a defining feature of Pakistani English. Despite their frequency and sociocultural salience, these words remain under-represented in corpus annotation frameworks and natural language processing (NLP) systems. This paper presents a corpus-driven semantic annotation framework for Urduized words, aligned with data from PakLocCorpus (2022). Using concordance evidence from the PakLocCorpus Urduized word list, the study conducts a discourse analysis of selected words, drawing on Gee (2011), and develops a context-sensitive semantic tag-set that captures the culturally grounded meaning domains of descriptive labels. The analysis demonstrates that Urduized words function as semantically dense cultural carriers rather than peripheral borrowings. The paper argues that systematic semantic annotation of Urduized words is essential for inclusive corpus linguistics and for reducing structural bias in English-focused NLP technologies. It further argues that descriptive labels have unique morpho-syntactic features that can be used in designing tagging tools and frameworks. The findings of the study can also inform pedagogical strategies for English language teaching (ELT) and second language acquisition (SLA).
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

