Can you remember how you learned to read and write Thai? I admit I have forgotten the process, and the pains my Thai language teachers went through to turn me from an illiterate into a self-motivated learner. That was many, many years ago, and I still owe those teachers my deepest gratitude. I still remember reading old newspapers while I folded and glued the pages into paper bags for the market stalls. Yes, I was a fast reader then. I read the headlines, the captions and, most times, the first paragraph, which summarized each story. I had to stack the newspaper pages the right way so that I could read them and fold them into bags as quickly as I could. I learned to multitask too.

How did you learn the alphabet ก (กอ) ข (ขอ) ค (คอ)...?
How did you learn แม่กา, แม่กก, แม่กด...?
Did you learn to spell กอ อะ - กะ, กอ อา - กา, ขอ อะ - ขะ, ขอ อา - ขา...?
Did you learn กอ กอ - กก (or ก ก - กก), or กอ โอะ กอ - โกกะ (or ก โ*ะ ก - กก),...?
How did you learn to read and write the invisible สระโอะ?

Then how did you learn to read กร - กอน, พร - พอน, ทร - ทอน,...?
When did you learn how to read and write the invisible vowels (สระโอะ and สระออ) in those cases?

Do you remember learning to write the names of the months? For many weeks I had trouble reading and writing มกราคม and กรกฎาคม. My monk teacher insisted that the correct way to read มกราคม is มะ กะ รา [ค โ*ะ ม] and that กรกฎาคม is read กะ ระ กะ ดา [ค โ*ะ ม].

How did we learn to see invisible ะ and invisible โ*ะ even when the two are in the same word? And how did you learn invisible อ (in ก - กอ and กร - กอน)?

*mynote* I am forced to spell out สระ โอะ as 'โ*ะ' and to arrange the spelling strictly in 'head consonant - vowel - tail consonant' (HVT) order to avoid confusion. Had I left it as คม, we could read it as คะ มะ. Had I written it as โคมะ, we might read it as โค มะ.

กรกฎาคม [กะระกะ-, กะรักกะ-] n. the name of the 7th month of the solar calendar that begins with January; it has 31 days. (Skt. กรฺกฏ = crab + อาคม = coming = the month in which the sun enters the sign of กรกฎ, Cancer); (obsolete) the name of the 4th month of the solar calendar that begins with April.

Strictly speaking, อาคม (romanized as 'aagama', อา กะ มะ) should be read อา คะ มะ in Thai.

*mynote* The months พฤษภาคม and พฤศจิกายน got me into trouble over the questions I put to my Thai language teacher. I asked what the differences between ส, ษ and ศ were. Why do we have three of them for สอ? Would just one สอ not do, for simplicity's sake? And why ซ, ทร, สร...? Answers like "I should go and live in Laos, then I would not have to deal with 'complexity'" shut me up for many, many years.

Let me show a snippet of my request to the Royal Society to revise RSTD:

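Only some 10,097 of the 43,181 words in RSTD have pronunciations. Many of the words without pronunciations are quite difficult to read, although the section 'การบอกคำอ่าน' (giving pronunciations) in คำชี้แจงหลักการจัดทำและวิธีใช้พจนานุกรม ฉบับราชบัณฑิตยสถาน พ.ศ. ๒๕๕๔ (the explanatory notes on the compilation and use of the Royal Institute Dictionary, B.E. 2554) states: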


1. Words whose final consonant is the standard letter of its final-consonant class (มาตรา), for example น in แม่กน or บ in แม่กบ, as in คน and พบ, are given no pronunciation.

2. Words with final ญ ณ ร ล ฬ are read like final น; words with final ข ค ฆ are read like final ก; words with final จ ช ฎ ฏ ฐ ฑ ฒ ต ถ ท ธ ศ ษ ส are read like final ด; words with final ป พ ฟ ภ are read like final บ. For all four kinds, a pronunciation is given where reading may be problematic...
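The four groups in rule 2 of the quoted passage amount to a small lookup table. The following is only a sketch of that table in Python; the names `FINAL_GROUPS`, `SPOKEN_FINAL` and `spoken_final` are my own for illustration and do not come from the dictionary.

```python
# Written final consonants grouped by the sound they take, as listed in rule 2.
# Key = the sound that is read, value = the letters that are read that way.
FINAL_GROUPS = {
    "น": "ญณรลฬ",
    "ก": "ขคฆ",
    "ด": "จชฎฏฐฑฒตถทธศษส",
    "บ": "ปพฟภ",
}

# Invert the table: written letter -> letter whose sound is read.
SPOKEN_FINAL = {
    letter: spoken
    for spoken, letters in FINAL_GROUPS.items()
    for letter in letters
}

# Rule 1: a letter that already matches its class is read as itself.
for spoken in FINAL_GROUPS:
    SPOKEN_FINAL[spoken] = spoken


def spoken_final(letter: str) -> str:
    """Return the final sound for a written final consonant.

    Letters outside the four groups (for example ม or ง) are returned
    unchanged, which is an assumption of this sketch.
    """
    return SPOKEN_FINAL.get(letter, letter)


if __name__ == "__main__":
    for written in "รศภฆน":
        print(written, "->", spoken_final(written))
```

The table only says how a final letter sounds once we already know it is a final letter. Deciding that is the hard part, and it is why the explicit 'how to read' entries matter.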

In the interest of correct reading, it is recommended that every word have a 'how to read' entry, or pronunciation, so that students of the Thai language can use RSTD to aid their reading practice and so improve their pronunciation when speaking Thai.

Another compelling reason is to advance text-to-speech (tts) technology for the Thai language. Tts has several uses for business, government and sight-impaired people. Building on tts, the development of speech-to-text (or speech recognition) technology can be accelerated. These technologies can enhance the social and industrial environment in this century.

Now we see how difficult learning to read and write Thai can be. Yet only some 10,097 of the 43,181 words in RSTD (about 23%) come with partial or full pronunciations. RSTD explains that only words with ambiguous or multiple pronunciations are given 'how to read' clues. For the rest, we have to assume that our Thai language teachers trained us well in school. In most cases this assumption appears valid. (We can read and write, can't we? Never mind the other 15% who can't read at grade 4.)

The difficulty of working out how Thai words are pronounced has delayed the development of machine text-to-speech (tts). Even now there are examples such as Vaja, a multi-platform application released by the National Electronics and Computer Technology Center (NECTEC) that claims to read out both Thai and English passages, and Google Translate, a web-based translator with tts capability (the speaker icon at the top right-hand corner of the two panels). Both fall short when reading daily newspapers for sight-impaired people. Explicit pronunciation for every word can remove the major difficulties machines face in translating 'graphics-to-phonetics' (G2P), and so allow tts development to focus on synthesizing natural, prosodic speech like we humans produce.

Another technological development of importance for this (21st) century is 'speech recognition' (SR). This is where a machine (computer) learns to translate 'speech-to-text' (stt) and then learns to 'understand' human speech. These feats of technological magic have great implications for social development.

By explicitly stating how to read words, we can remove many bottlenecks and time-consuming tasks and free up effort for core, productive and creative work. As a developing society, we know we can't waste time and resources again and again on problems that we can solve once and for all.

*mynote* I think that, with help from the public and the educational communities, we may be able to have pronunciations for all words in RSTD within 3-6 months. Maintaining the pronunciations after that would be easy. However, a convention for 'writing' pronunciations will be needed. This convention should be designed so that both humans and machines can read the words easily.

Tonal languages like Thai are more complex when translating graphics (text) to phonetics (speech). A simple linguistic model specifies that a word may have one syllable or many. A syllable is a phonetic unit of voice in an utterance; it consists of one vowel [อะ, อา, อิ,...], or one head consonant and a vowel [กะ, กา, มี, มือ,...], or one head consonant, one vowel and one tail consonant [กก, กบ, กรน, กล้วย], and so on. Words such as กะ, กา, กก, กิน and กล้วย are one-syllable words; กระทะ, กาละ, กนก, กติกา and กลศาสตร์ are many-syllable words.

Simple models fail in G2P approaches: rule-based approaches fail because of complexity and conflicting rules or exceptions; neural networks and learning algorithms fail to stabilize as the learning sample size increases; other approaches using statistical models, Markov chains and so on are promising but mathematically complex and still experimental. The number of words in RSTD suggests that a brute-force but cooperative approach may give a better result than the more 'elegant' approaches. This is what I suggest we do over social media networks.
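To make the brute-force idea concrete, here is a minimal sketch in Python of a pronunciation table that stores each word as a list of syllables in HVT order. The class `Syllable`, the table `PRONUNCIATIONS` and the function `how_to_read` are hypothetical names of mine, not an existing convention. The two entries are the readings of มกราคม and กรกฎาคม given earlier, and tones are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Syllable:
    """One syllable in head consonant - vowel - tail consonant (HVT) order."""
    head: str        # may be "" for a vowel-only syllable
    vowel: str       # written explicitly, e.g. "โ*ะ" for the invisible สระโอะ
    tail: str = ""   # may be "" when there is no tail consonant

    def hvt(self) -> str:
        # Join only the parts that are present, separated by spaces.
        return " ".join(p for p in (self.head, self.vowel, self.tail) if p)


# A hand-made table: one entry per word, contributed and checked by people.
PRONUNCIATIONS: Dict[str, List[Syllable]] = {
    "มกราคม": [
        Syllable("ม", "ะ"), Syllable("ก", "ะ"), Syllable("ร", "า"),
        Syllable("ค", "โ*ะ", "ม"),
    ],
    "กรกฎาคม": [
        Syllable("ก", "ะ"), Syllable("ร", "ะ"), Syllable("ก", "ะ"),
        Syllable("ด", "า"), Syllable("ค", "โ*ะ", "ม"),
    ],
}


def how_to_read(word: str) -> Optional[str]:
    """Look the word up; return its syllables in HVT form, or None if absent."""
    syllables = PRONUNCIATIONS.get(word)
    if syllables is None:
        return None
    return " | ".join(s.hvt() for s in syllables)


if __name__ == "__main__":
    for w in ("มกราคม", "กรกฎาคม", "คน"):
        print(w, "=>", how_to_read(w))
```

A lookup like this needs no rules and has no exceptions to fight with; its only cost is that someone has to fill in every entry once, which is exactly the cooperative work suggested above.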

Thai is a tonal language, and tone complexity is added in a number of ways:
= tonal consonants (sets of normal, high and low tones) : [ก ข ค, ส ซ ทร,...]
= use of วรรณยุกต์ (tonal marks)
= effect of short and long vowels on high or low consonants [ขะ ขา, คะ คา]
= effect of live or dead tail consonants [สน, สด, ลน, ลด]
= effect of the head consonant in some words [กนก, สนุก, ฉลาม]
= other preferences such as shortening [เพชร -> เพ็ด] or lengthening [น้ำ -> น้าม]
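The first four items in the list above interact in a regular way, and that part can be sketched in Python. The table below follows the tone rules as they are usually taught in school, which is my assumption and not something quoted from RSTD. The function `tone` and its parameters are hypothetical, and it expects a syllable that has already been analysed (head consonant, tone mark, live or dead, long or short vowel). It does not handle the last two items in the list: leading consonants as in สนุก, or shortening and lengthening.

```python
# Consonant classes: "normal" (mid), high, and low (everything else).
MID = set("กจฎฏดตบปอ")
HIGH = set("ขฃฉฐถผฝศษสห")

# Tone marks (วรรณยุกต์), written as escapes so they display reliably.
MAI_EK = "\u0e48"
MAI_THO = "\u0e49"
MAI_TRI = "\u0e4a"
MAI_CHATTAWA = "\u0e4b"

# (consonant class, tone mark) -> tone, for syllables that carry a mark.
MARKED = {
    ("mid", MAI_EK): "low",
    ("mid", MAI_THO): "falling",
    ("mid", MAI_TRI): "high",
    ("mid", MAI_CHATTAWA): "rising",
    ("high", MAI_EK): "low",
    ("high", MAI_THO): "falling",
    ("low", MAI_EK): "falling",
    ("low", MAI_THO): "high",
}


def consonant_class(head: str) -> str:
    if head in MID:
        return "mid"
    if head in HIGH:
        return "high"
    return "low"


def tone(head: str, mark: str = "", live: bool = True,
         long_vowel: bool = True) -> str:
    """Tone of one analysed syllable under the school rules assumed above."""
    cls = consonant_class(head)
    if mark:
        if (cls, mark) not in MARKED:
            raise ValueError("tone mark not used with this consonant class")
        return MARKED[(cls, mark)]
    if live:
        return "rising" if cls == "high" else "mid"
    if cls == "low":
        # Dead syllables with a low consonant depend on vowel length.
        return "falling" if long_vowel else "high"
    return "low"


if __name__ == "__main__":
    print("ขะ", tone("ข", live=False, long_vowel=False))
    print("ขา", tone("ข"))
    print("คะ", tone("ค", live=False, long_vowel=False))
    print("คา", tone("ค"))
```

Even this regular core needs four inputs per syllable, and none of them can be read off the text without first knowing where the syllables and the invisible vowels are.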

See also เรื่องของ "ฤ" (A story of "ฤ"), about the independent vowel ฤ (รึ, ru; adopted from Sanskrit; Unicode U+0E24).

These make machine reading of Thai text more difficult, on top of the difficulty a machine already has in separating words from strings of words written without gaps ([white] spaces) between them. Although gaps between words and phrases improve readability, how to use gaps is not taught in school. The Royal Society has offered a guideline (see หลักเกณฑ์การเว้นวรรค, the rules for spacing), but considerations for machine reading have not been included.

*mynote* SWATH, a software program for word analysis of Thai text on Linux/Unix systems, is very capable and over 90% accurate. SWATH works better with more gaps between words.
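To illustrate why gaps help a machine, here is a small sketch in Python of greedy longest-match word splitting against a word list. This is a textbook technique chosen for illustration; I am not claiming it is how SWATH works inside. The function `segment` and the four-word list `LEXICON` are hypothetical, and the test string ตากลม is my own example, not from RSTD.

```python
from typing import Iterable, List


def segment(text: str, lexicon: Iterable[str]) -> List[str]:
    """Split text into words by greedy longest match against the lexicon.

    Existing spaces are treated as hard word boundaries. A character that
    starts no known word is emitted on its own.
    """
    words = set(lexicon)
    max_len = max((len(w) for w in words), default=1)
    result: List[str] = []
    for chunk in text.split():
        i = 0
        while i < len(chunk):
            # Try the longest candidate first, then shorter ones.
            for size in range(min(max_len, len(chunk) - i), 0, -1):
                piece = chunk[i:i + size]
                if size == 1 or piece in words:
                    result.append(piece)
                    i += size
                    break
    return result


LEXICON = ["ตา", "กลม", "ตาก", "ลม"]

if __name__ == "__main__":
    print(segment("ตากลม", LEXICON))    # no gap: the machine must guess
    print(segment("ตา กลม", LEXICON))   # one gap: the writer has decided
```

Without a gap, the greedy match takes ตาก first and is left with ลม; with the gap, only ตา and กลม are possible. One space typed by the writer settles what the program would otherwise have to guess.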

We have more to see on pronunciation and its use. Let us carry on in the next episode.

