CODE-SWITCHING IN MULTILINGUAL DIGITAL TEXTS: AUTOMATED DETECTION AND LINGUISTIC PATTERNING THROUGH AI-BASED CORPUS ANALYSIS
DOI: https://doi.org/10.63878/jalt1197

Keywords: code-switching, multilingual NLP, corpus analysis, transformer models, syntactic boundaries, bilingual collocations, annotation reliability, AI-based language detection

Abstract
This study examines whether machine-learning-based corpus analysis can reliably detect and characterise code-switching in multilingual digital texts, and it tracks switching patterns across five language pairs (English-Hindi, English-Spanish, English-Arabic, English-Tagalog, and English-Malay). Using a large-scale annotated corpus and a transformer-based architecture fine-tuned for the multilingual setting, the study achieved token-level accuracies above 95 percent and macro F1 scores above 0.94 in both in-domain and out-of-domain evaluation. The analysis identified consistent part-of-speech triggers, with nouns, verbs, and discourse markers occurring most frequently at switch points, and showed that switches cluster at noun phrase-verb phrase boundaries. High rankings for bilingual collocations further indicated that formulaic expressions are robust predictors of switching. Annotation reliability was confirmed by Cohen's kappa values exceeding 0.86, and hyperparameter tuning showed that longer sequence lengths are required to capture long-distance switching dependencies. The results not only confirm that AI-based models can approach human-level code-switch detection but also contribute to theoretical understanding of the structural, pragmatic, and sociolinguistic dimensions of cross-linguistic contact in online communication.
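The abstract describes token-level code-switch detection with a fine-tuned multilingual transformer but does not name the exact architecture or label set. The sketch below is a minimal, illustrative setup using Hugging Face's token-classification API with an assumed XLM-RoBERTa backbone and a hypothetical per-language tag set; the fine-tuned weights from the annotated corpus would replace the base checkpoint in practice.

```python
# Minimal sketch of token-level code-switch detection.
# Assumptions (not from the paper): XLM-RoBERTa backbone, the LABELS tag set below.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "xlm-roberta-base"                                  # assumed backbone
LABELS = ["ENG", "HIN", "SPA", "ARA", "TGL", "MSA", "OTHER"]     # assumed tag set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)  # in practice, load weights fine-tuned on the annotated corpus

def tag_languages(sentence: str) -> list[tuple[str, str]]:
    """Assign a language label to every word-level token in a sentence."""
    words = sentence.split()
    enc = tokenizer(words, is_split_into_words=True,
                    return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits            # shape: (1, seq_len, num_labels)
    pred_ids = logits.argmax(dim=-1)[0].tolist()
    # Map subword predictions back to whole words (first subword wins).
    tags, seen = [], set()
    for idx, word_id in enumerate(enc.word_ids(batch_index=0)):
        if word_id is not None and word_id not in seen:
            seen.add(word_id)
            tags.append((words[word_id], LABELS[pred_ids[idx]]))
    return tags

print(tag_languages("I will call you kal subah after the meeting"))
```

Switch points then fall wherever the predicted label changes between adjacent words, which is how token-level predictions feed the boundary and trigger analyses described above.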
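The two quantitative checks reported in the abstract, macro F1 for detection quality and Cohen's kappa for inter-annotator agreement, can be reproduced with standard scikit-learn metrics. The tiny label arrays below are placeholder data for illustration only, not figures from the study.

```python
# Sketch of the reported evaluation measures, using placeholder label arrays.
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

# Gold labels vs. model predictions for a handful of tokens (assumed data).
gold  = ["ENG", "ENG", "HIN", "HIN", "ENG", "SPA", "ENG"]
preds = ["ENG", "ENG", "HIN", "ENG", "ENG", "SPA", "ENG"]

print("token accuracy:", accuracy_score(gold, preds))
print("macro F1:      ", f1_score(gold, preds, average="macro"))

# Two independent annotators labelling the same tokens (assumed data);
# the study reports kappa values above 0.86 for its annotation scheme.
annotator_a = ["ENG", "HIN", "HIN", "ENG", "SPA", "ENG"]
annotator_b = ["ENG", "HIN", "ENG", "ENG", "SPA", "ENG"]
print("Cohen's kappa: ", cohen_kappa_score(annotator_a, annotator_b))
```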
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.