"ROMAN URDU AND CODE-MIXED LANGUAGE PROCESSING FOR SOCIAL MEDIA ANALYTICS IN PAKISTAN."
DOI:
https://doi.org/10.63878/jalt2212
Abstract
This study examines Roman Urdu and Urdu-English code-mixing on social media in Pakistan, together with the problems that online Roman Urdu, code-mixing, and digital Urdu pose for language processing. It describes the linguistic structure of Roman Urdu and code-mixed social media posts, focusing on three topics: cricket, dramas, and politics. A manual, corpus-based approach, combined with other methods, was applied to a collection of 900 posts from YouTube, Facebook, and Twitter (now X). The collected posts showed extensive use of both Roman Urdu and code-mixing, and most of the code-mixing was intra-sentential rather than inter-sentential. In the sentiment analysis, the posts were split almost equally among positive, negative, and neutral sentiments. The analysis also identified several challenges for Natural Language Processing, such as the lack of a standard corpus, spelling variation, and linguistic uncertainty. The study also explores, for the first time, the need for organized and advanced Natural Language Processing technology for the multilingual digital space of Pakistan.
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

