We have built SinSpell, a comprehensive spelling checker for the Sinhala language.

Sinhala is a low-resource Indo-Aryan language used primarily by approximately 16 million people living in the island nation of Sri Lanka. It is morphologically rich and has a well-defined rule set pertaining to spelling. Due to the complexity of the language, currently available Sinhala spell correctors that are based on dictionary lookups and n-gram analysis provide suboptimal results. Further, there is no publicly available evaluation dataset to benchmark these spell correctors.

In this paper we present a well-curated evaluation dataset for Sinhala spell correctors. We also prepared a comprehensive list of Sinhala spelling errors that are commonly found in digitized text. Further, this paper presents the first implementation of neural spell correction models for Sinhala. Specifically, three neural models were implemented, and all three outperformed the currently available rule-based and dictionary-based Sinhala spell correctors. The best-performing neural model was used to spell-correct a training dataset for a Sinhala text classification task, which yielded improved performance over the training dataset with spelling errors. Our code, training dataset, and evaluation dataset are publicly released. Thus this paper sets a new baseline for Sinhala spell correction.