How to Listen and Learn

sr

Pronunciation 2: How to Listen and Learn


We have looked very briefly at how we learn to read and write Thai. Getting a computer (or machine) to read out Thai text is not a trivial exercise. Yet text-to-speech (TTS) technologies can accelerate social and economic development, and Thai TTS can increase productivity and learning opportunities. One obstacle would be removed if we had explicit pronunciations for all words in our best reference, the Royal Society Thai Dictionary [RSTD]. I will continue asking for pronunciations in RSTD.


We can think of TTS as technologies to translate 'text' (graphics) to speech (sound waves) for us to hear.


Let us now have a quick look at speech recognition (SR) technologies, which 'listen' to (obtain) sound waves and translate voices or speech into text for us to read, and hopefully one day into understanding and actions that serve us. In simple words, we ask: can we teach a machine (or computer) to listen, translate sound waves into (standard sound waves and then) words, and write out text representing those words?


Even when we have pronunciations for all words, not everyone will pronounce words in the same way. Sound waves from different readers reading the same text can be quite different. We humans have learned to listen and somehow normalize sound waves into meaningful words.


How did we learn to listen and to write down words from the voices we hear? [เขียนตามคำบอก/dictation]
How did we learn what each word we hear means? [นิยาม/definition]
How did we learn what arrangements of words mean? [ความเข้าใจความหมาย/comprehension]
How did we learn to arrange words to mean what we want in a context? [ไวยากรณ์/grammar]



*mynote* We can think of TTS and STT (speech-to-text, or SR) as data conversion technologies that deliver data (or information) to us in the form (or mode) we choose. We have seen 'charts', 'graphs', 'tables' and 'infographics'; they are different forms for delivering data. Extending this concept of forms of data, we can think of data conversion into 'speech', 'pictures', 'audio-visual', 'actions' or 'control', and so on.

*mynote* Many words in Thai have roots in Pali (via Theravāda Buddhists), Sanskrit (via Mahāyāna Buddhists), old Khmer, Mon, Malay, Chinese and other languages (via social interactions). The meanings of these borrowed words may differ from their root meanings.

ไวยากรณ์ (grammar): n. the study of language concerned with word forms and the rules for assembling word forms into sentences. (Pali veyyākaraṇa; Sanskrit vaiyākaraṇa, meaning 'student of grammar', and vyākaraṇa, meaning 'grammar textbook'). {RSTD}
veyyākaraṇa: explanation. (m.), one who knows grammar or how to explain. (nt.) {PED}


*mynote* The Royal Institute's system for transcribing Thai script into Roman script by sound (การถอดอักษรไทยเป็นอักษรโรมันแบบถ่ายเสียงของราชบัณฑิตยสถาน), as outlined and used officially by the Royal Society for romanizing Thai words and names, does not
= differentiate short vowels and long vowels [–ะ, –า -> a; –ัวะ, –ัว, –ว– -> ua]
= differentiate consonants that share a sound [ข, ค, ฆ -> kh; ฐ, ฑ, ฒ, ถ, ท, ธ -> th; จ, ฉ, ช -> ch]
= differentiate tonal levels [ข, ค -> kh; ส, ษ, ศ, ซ -> s; no วรรณยุกต์ (tone mark)]
= intuitively suggest how to pronounce [เ–ือะ, เ–ือ -> uea; เ–ือย -> ueai; –ะ, –ั, รร (followed by another letter), –า -> a]
= consistently apply phonetic rules [compare (–ิ, –ี -> i) with (เ–ย -> oei; เ–ือย -> ueai; –วย -> uai): i is used for (–ิ, –ี) and also in (ใ–, ไ–, –ัย, ไ–ย, –าย -> ai)]
= separate different vowels that receive the same spelling [โ–ะ, –, โ–, เ–าะ, –อ -> o; เ–อะ, เ–ิ, เ–อ -> oe]
= allow accurate reversion of romanized words to Thai words
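
The last point, that romanized words cannot be accurately reverted to Thai, can be illustrated with a small sketch. It uses only the consonant groups quoted in the list above, not the full official table, and the function names (`reverse_candidates`, `count_spellings`) are hypothetical names chosen for this illustration. It simply counts how many Thai consonant sequences share one romanized sequence.

```python
from collections import defaultdict

# Consonant mappings copied from the list above.
# This is an illustrative subset, not the complete official table.
ROMAN_OF = {
    "ข": "kh", "ค": "kh", "ฆ": "kh",
    "ฐ": "th", "ฑ": "th", "ฒ": "th", "ถ": "th", "ท": "th", "ธ": "th",
    "จ": "ch", "ฉ": "ch", "ช": "ch",
    "ส": "s", "ษ": "s", "ศ": "s", "ซ": "s",
}


def reverse_candidates(mapping):
    """Group Thai letters by the Roman spelling they are mapped to."""
    candidates = defaultdict(list)
    for thai, roman in mapping.items():
        candidates[roman].append(thai)
    return dict(candidates)


CANDIDATES = reverse_candidates(ROMAN_OF)


def count_spellings(roman_consonants):
    """Count the Thai consonant sequences that romanize to the given sequence."""
    total = 1
    for roman in roman_consonants:
        total *= len(CANDIDATES[roman])
    return total


for roman, thai_letters in CANDIDATES.items():
    print(roman, "<-", " ".join(thai_letters))

# 3 choices for kh, 6 for th, 4 for s
print(count_spellings(["kh", "th", "s"]))  # 72
```

Even before vowels and tones are considered, a romanized word with the consonants kh, th and s could come from 72 different Thai consonant sequences under this subset, so reverting it needs a dictionary or context rather than a rule.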

It is recognized that romanizing Thai words is very difficult: the Thai and Roman alphabets differ in size, tone is partly implicit and partly explicit in Thai spelling, and the two languages have different sets of sounds, so that extensions and normalizations natural in one language look foreign in the other. However, the shortcomings listed above should be minimized or designed out, to save time and resources and to improve conversion in both directions.


Guidelines for transliterating foreign text into Thai and for romanizing Thai words are available (in PDF) as free downloads from www.royin.go.th


Sounds (or voices) are recorded digitally by taking several thousand samples per second, each a numeric representation of the wave's amplitude at that instant; frequencies show up in how the samples vary over time. (Those interested can look at .wav files, or at compressed formats such as .ogg or .mp3.) So even a simple sound (a vowel) is encoded in several thousand bytes, compared with a few bytes for each letter of the alphabet. The recordings (encodings) of different people saying (or reading) the same thing are different, so simple matching of sound waves is not useful. More complex mathematics and processing are needed to convert a unit of sound into a recognizable 'word' and then into its text representation. It is enough to say that STT (speech-to-text; examples include Dragon NaturallySpeaking, Siri, Google Now, Cortana, Assistant.ai, Indigo and many more, see https://en.wikipedia.org/wiki/List_of_speech_recognition_software) is an emerging technology that currently draws a lot of effort. It is unfortunate that there is not yet a Thai-capable STT.
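
To make the size and matching points concrete, here is a minimal sketch. It does not use real speech: two pure sine tones stand in for the same vowel spoken by a lower-pitched and a higher-pitched voice. The sample rate (16,000 per second), the 16-bit sample size, the two pitches and the function names are all assumptions chosen for the illustration.

```python
import math

SAMPLE_RATE = 16000    # samples per second (assumed for this sketch)
BYTES_PER_SAMPLE = 2   # 16-bit samples (assumed for this sketch)


def synth_tone(freq_hz, seconds, amplitude=0.5):
    """Return a sine tone as a list of signed 16-bit sample values."""
    n = int(SAMPLE_RATE * seconds)
    return [
        round(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    ]


def fraction_equal(a, b):
    """Fraction of positions where two equal-length recordings hold the same value."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# A quarter of a second of "vowel" from two stand-in speakers.
low_voice = synth_tone(120, 0.25)
high_voice = synth_tone(220, 0.25)

size_bytes = len(low_voice) * BYTES_PER_SAMPLE
print(size_bytes)  # 8000 bytes for a quarter of a second

# Sample-by-sample comparison: only a small fraction of positions agree.
print(fraction_equal(low_voice, high_voice))
```

Under these assumptions a quarter-second sound already takes 8000 bytes, and the two recordings agree at only a small fraction of sample positions even though they are the simplest possible signals. Real voices differ in far more than pitch, which is why direct matching of sound waves does not work.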


Research in machine learning, as highlighted by Artificial Intelligence (AI), is also receiving a lot of attention, especially for applications in robots, the Internet of Things (IoT), augmented/virtual reality (in military operations and medical surgery) and the smart or educational toys/gadgets industry. A typical application scenario might be illustrated by HAL in 2001: A Space Odyssey, or by AUTO-PILOT and other robots in Wall-E. There, machines serve human masters in terms that humans can understand (because of the limited capacity of human sensory organs). Today some 12 languages (English, German, Spanish,... Japanese) are learned in robot classes. If anyone knows of machines or robots learning Thai, please let us know.


We will have a rest to think about this a little more before we continue.
###

This note was written at GotoKnow, in the blog พจนานุกรม ฉบับราชบัณฑิตยสถาน 2554 (Royal Institute Dictionary, B.E. 2554 edition)


