Script conversion systems

Detect the script conversion system used in existing text

Given a pair of texts, an original and its transliteration, one often needs to know exactly which script conversion system was used to produce the output.

This has always been a tricky problem given the sheer number of script conversion systems in existence, some of which are specific to a particular script and language. With Interscript's extensive set of script conversion systems, this is no longer a challenge.

Interscript's system detection feature outputs a list of candidate script conversion systems for a given pair, sorted by likelihood as measured by the Levenshtein distance: the smaller the distance between a system's output and the given transliteration, the more likely that system is the one used. This page returns at most 10 systems.
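A minimal Python sketch of this ranking idea, assuming each candidate system can be modelled as a plain function from source text to transliterated text. The names `levenshtein`, `detect_systems` and `make_system`, and the two toy systems, are hypothetical illustrations, not Interscript's API; Interscript's actual detection code may work differently.

```python
from typing import Callable, Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; iterate over the longer string
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def detect_systems(source: str, observed: str,
                   systems: Dict[str, Callable[[str], str]],
                   limit: int = 10) -> List[Tuple[str, int]]:
    """Run every candidate system on `source` and rank the systems by the
    Levenshtein distance between their output and `observed`."""
    scored = [(name, levenshtein(convert(source), observed))
              for name, convert in systems.items()]
    scored.sort(key=lambda item: (item[1], item[0]))  # closest first, ties by name
    return scored[:limit]


def make_system(table: Dict[str, str]) -> Callable[[str], str]:
    """Toy character-by-character transliterator (hypothetical, for illustration)."""
    return lambda text: "".join(table.get(ch, ch) for ch in text)


# Two hypothetical systems that differ only in how they render "щ".
systems = {
    "toy-a": make_system({"щ": "shch", "у": "u", "к": "k", "а": "a"}),
    "toy-b": make_system({"щ": "\u0161\u010d", "у": "u", "к": "k", "а": "a"}),  # "šč"
}
print(detect_systems("щука", "shchuka", systems))
# [('toy-a', 0), ('toy-b', 4)]
```

Sorting on `(distance, name)` keeps the order deterministic when several systems tie, and `limit=10` mirrors the maximum of 10 results returned on this page.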

One might ask why a text pair would match more than one system. The reason is that many script conversion systems for the same script and language are similar, and very often they share the same conversion rules. Note that even a match with zero distance does not guarantee which system was used, since multiple systems can produce the same output for a given input.
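To illustrate why a zero-distance match is not conclusive, here is a small Python sketch with two hypothetical systems (not real Interscript systems) that share every rule except the one for "щ". For an input that never triggers that rule, their outputs are identical, so both match the observed text exactly.

```python
from typing import Callable, Dict


def make_system(table: Dict[str, str]) -> Callable[[str], str]:
    """Toy character-by-character transliterator (hypothetical, for illustration)."""
    return lambda text: "".join(table.get(ch, ch) for ch in text)


# Two hypothetical systems that share every rule except the one for "щ".
shared = {"д": "d", "о": "o", "м": "m", "и": "i"}
systems = {
    "system-a": make_system({**shared, "щ": "shch"}),
    "system-b": make_system({**shared, "щ": "\u0161\u010d"}),  # "šč"
}

source, observed = "дом", "dom"  # "дом" contains no "щ"
outputs = {name: convert(source) for name, convert in systems.items()}
print(outputs)  # {'system-a': 'dom', 'system-b': 'dom'}

# Both outputs equal the observed text (Levenshtein distance 0), so this pair
# alone cannot reveal which of the two systems was actually used.
exact_matches = [name for name, out in outputs.items() if out == observed]
print(exact_matches)  # ['system-a', 'system-b']

# An input that exercises the "щ" rule does tell the systems apart.
print(systems["system-a"]("щи") == systems["system-b"]("щи"))  # False
```

In practice this means a longer or more varied text pair, one that exercises the rules where candidate systems differ, narrows the list of matches more effectively.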

    Rababa for Arabic diacriticization

    Diacritization is the task of adding the correct vocalization marks (diacritics) to Arabic text, something only advanced Arabic speakers manage reliably.
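    The short Python sketch below shows what "unpointed" text means at the Unicode level. It is not Rababa code: it only strips the Arabic vowel marks (the combining characters U+064B FATHATAN through U+0652 SUKUN) from three vocalized readings of the root k-t-b. All three collapse to the same three letters, which is why restoring the marks requires inferring the intended reading from context.

```python
import unicodedata

# Arabic harakat: the combining marks U+064B (FATHATAN) through U+0652 (SUKUN).
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_harakat(text: str) -> str:
    """Drop the vowel marks, leaving unpointed (undiacritized) Arabic."""
    return "".join(ch for ch in text if ch not in HARAKAT)


# Three vocalized readings of the same consonant skeleton k-t-b.
readings = {
    "kataba (he wrote)": "\u0643\u064e\u062a\u064e\u0628\u064e",
    "kutiba (it was written)": "\u0643\u064f\u062a\u0650\u0628\u064e",
    "kutub (books)": "\u0643\u064f\u062a\u064f\u0628",
}

for label, word in readings.items():
    marks = [unicodedata.name(ch) for ch in word if ch in HARAKAT]
    print(label, marks)

# Removing the marks collapses all three readings to the same three letters.
unpointed = {strip_harakat(word) for word in readings.values()}
print(unpointed == {"\u0643\u062a\u0628"})  # True
```

    Stripping is a simple many-to-one mapping; diacritization is its inverse, which is why it calls for a trained model rather than a lookup table.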

    Rababa is an open-source, openly licensed library, available in both Python and Ruby, that uses advanced neural network architectures for the diacriticization of abjad scripts such as Arabic and Hebrew. The trained models are also openly licensed, and the datasets they were trained on are open source as well.

    Trained Rababa models are available in both PyTorch and ONNX formats, which allows the library to run in a platform-independent way.

    More details about the development and origins of Rababa can be found in our Rababa blog post. Rababa is available on GitHub.

    Enter unpointed Arabic text (or edit the sample) and click "Add diacritics" to obtain diacriticized Arabic.