Smarting Up Thai Phones and Tablets
In the last few weeks, I have been spending time on a Thai language project: a software application that makes smartphones and tablets read Thai aloud and so enables visually impaired people to enjoy life more.
This project involves 3 major technologies: Thai OCR (optical character recognition); Thai word recognition (as Thai writing usually puts words together in a string of text); and Thai Text-To-Speech (reading each word out).
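The three stages above can be sketched as a simple chain. This is only a rough outline under assumptions: `read_aloud` and its `ocr`, `segment` and `speak` parameters are hypothetical names for whatever OCR, word recognition and text-to-speech components end up being used, not the API of any NECTEC software.

```python
from typing import Callable, List


def read_aloud(
    image: bytes,
    ocr: Callable[[bytes], str],            # hypothetical: image -> raw Thai text
    segment: Callable[[str], List[str]],    # hypothetical: text -> list of words
    speak: Callable[[str], None],           # hypothetical: say one word aloud
) -> List[str]:
    """Run the three stages in order and return the words that were spoken."""
    text = ocr(image)        # stage 1: optical character recognition
    words = segment(text)    # stage 2: find word boundaries in the string of text
    for word in words:       # stage 3: read each word out
        speak(word)
    return words


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without any real OCR or TTS engine.
    read_aloud(
        b"",
        ocr=lambda img: "ตากลม",
        segment=lambda text: ["ตา", "กลม"],
        speak=print,
    )
```

Keeping each stage behind a plain function makes it possible to swap one component (say, a different OCR engine) without touching the other two.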
ศอ.พว. NECTEC (ศูนย์เทคโนโลยีอิเล็กทรอนิกส์และคอมพิวเตอร์แห่งชาติ, National Electronics and Computer Technology Centre) has done similar but separate things, with อ่านไทย (ArnThai) for OCR, SWATH (Smart Word Analysis for THai) and Lexto (Thai Lexeme Tokenizer) for word recognition, and วาจา (Vaja) for text-to-speech.
There are currently many working screen readers (software that reads text on a computer screen aloud) available in English and some European languages, and NECTEC's component software has been available for some time now. Smartphones and tablets have a camera to scan with, a speaker to speak with, and enough computing power to perform the reading. This project is therefore quite feasible in its basic aim. With extensions and more work, the project may also support OCR for multiple alphabets and TTS for multiple languages (to cover, say, the languages of the AEC, the ASEAN Economic Community).
But before I go on, let me see how many readers are interested in pointing a mobile phone at a card, a sign or a page of a book and listening to the phone read it aloud. (We still want to keep 'read books for the blind' projects, because the point-and-read phone app will not be easy for the blind to use, nor will it have emotional renditions, for a long while yet ;) We will need support in many forms over the life of this (long?) project.
I invite young and old inventors and innovators, even off-the-road thinkers to share thoughts/comments. Let us see if we can make a difference to Thailand - with our bits combined. ...
For now I say this: 'We have learned a lot in our life, but we have not learned to collaborate and create something larger than life.'...
1) 25 Nov 2556 (2013 CE): I visited the NECTEC website and selected the "download" service. I was greeted with: Warning: mysql_num_fields(): supplied argument is not a valid MySQL result resource in /www/www2.nectec.or.th/services/download-db-2.php on line 50... Warning: mysql_fetch_object(): supplied argument is not a valid MySQL result resource in /www/www2.nectec.or.th/services/download-db-2.php on line 88...
2) Vaja (วาจา) for Windows is available (at a cost) from NECTEC. The page at http://vaja.nectec.or.th/ says: Vaja 6.0 Home Edition under the code name “jRaja” (เจรจา) can be downloaded free of charge for personal use and evaluation purpose. [90MB zip file].
3) Lexto is available on the NECTEC website.
4) SWATH was written by its original creator, Phaisarn Charoenpornsawat, and is now maintained and supported by Theppitak Karoonboonyanan. SWATH uses selectable algorithms to establish word boundaries and a trie structure of Thai words to verify words. SWATH is open source software and is used intensively in Thai-capable Linux distributions (libthai and thailatex).
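To give a feel for how a trie of known words can drive word-boundary detection, here is a minimal sketch of greedy longest matching, one commonly described dictionary-based approach. It is an illustration only: the class and function names (`Trie`, `segment`) and the tiny word list are made up for this example, and nothing here is taken from SWATH's source code.

```python
from typing import Dict, Iterable, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if the path from the root spells a known word


class Trie:
    def __init__(self, words: Iterable[str]) -> None:
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def longest_match(self, text: str, start: int) -> int:
        """Length of the longest known word beginning at text[start], or 0."""
        node = self.root
        best = 0
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                best = i - start + 1
        return best


def segment(text: str, trie: Trie) -> List[str]:
    """Greedy longest matching: always take the longest known word available."""
    tokens: List[str] = []
    pos = 0
    while pos < len(text):
        length = trie.longest_match(text, pos)
        if length == 0:
            length = 1  # unknown character: emit it as its own token
        tokens.append(text[pos:pos + length])
        pos += length
    return tokens


if __name__ == "__main__":
    # Hypothetical four-word dictionary, just enough to show an ambiguity.
    trie = Trie(["ตา", "กลม", "ตาก", "ลม"])
    print(segment("ตากลม", trie))  # ['ตาก', 'ลม']
```

The string ตากลม can be split as ตา|กลม or as ตาก|ลม. The greedy rule picks ตาก|ลม simply because ตาก is the longer first word, which is one reason a word segmenter may offer more than one algorithm to choose from.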
5) ArnThai for Windows is available (at a cost) from NECTEC. It is useful for some printed materials.
6)*** It is interesting to compare teaching primary school children to read Thai with teaching a phone/tablet to read Thai (and English and ...). Perhaps the basic processes are the same (scan text; recognize words in text; say the words). Some people may say that teaching children is a lot harder.