Man and Machine Interactions

(Pronunciation 3): Man and Machine Interactions

We have been talking about the (so-called) 21st century for a while without an inkling of what it would be like. In this series on Pronunciation, we come to see that (natural) language processing (or conversion) will be one of the pillars of the 21st century. Today, the future is bright for a dozen languages, but the Thai language remains in the wilderness, in the past. With one relatively simple effort we can help usher the Thai language into the 21st century. By explicitly giving the pronunciation of every Thai word in a form that both man and machine can use, we can experience and improve man-machine interactions. We can learn from this experience and design our Thai future in this 21st century. (Doing nothing now means Thais will live in a foreign 21st century and beyond.)

Let us start by asking the Royal Society to revise the RSTD so that every word has an explicit and full pronunciation. To fast-track the revision, we can offer help to verify and to specify pronunciations.
= How many teachers of the Thai language are willing to help?
= How many pensioners and other people with Thai language expertise are willing to help?
= How many IT professionals and Thai language researchers are willing to help?
= How can the Royal Society and NECTEC organize the 'project' and 'collect' the work?

Surely, with collaborative facilities provided on a (royin.go.th?) website, and some public relations, we could have all the pronunciations we want within 6 months. Questionable pronunciations can be flagged, noted and considered by the Royal Society. (Including the pronunciations in the RSTD takes only a few minutes by computer.) This would allow the development of other language technologies to proceed.

*mynote* I can provide a list of the Thai words in the RSTD (with and without pronunciation), or sublists by category, for those interested in helping. Just send me a note (here or by email). I have sent a request to the Royal Society and NECTEC on issues including this one.

The simple scheme (model) for specifying a syllable (an atomic unit of vocalization) that we (Thais) learn in school is (อักษร, letters) [ตัวนำ สระ ตัวสะกด วรรณยุกต์], that is, [initial consonant, vowel, final consonant, tone mark]. For example:
ก - กอ {ก *อ}
ก (สระ)อา - กา {ก *า}
ก (สระ)อา น - กาน {ก *า น}
ก (สระ)อา น ไม้โท - ก้าน {ก *า น ้}

This is simply the way we are taught to read and to spell words in school. It seems logical and easy to understand. Only when typing do we have to use a different order, such as ก ไม้โท (สระ)า น to get ก้าน.
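The difference between the spelling order and the typing order can be seen by listing the characters of a word as they are stored. The following is a minimal sketch using Python's standard `unicodedata` module; the function name `stored_order` is only an illustrative choice.

```python
import unicodedata


def stored_order(word: str) -> list[str]:
    """Return the Unicode name of each character in the order it is stored."""
    return [unicodedata.name(ch) for ch in word]


if __name__ == "__main__":
    # The word is spelled aloud as consonant, vowel, final, tone mark,
    # but the tone mark is typed (and stored) right after the first consonant.
    for name in stored_order("ก้าน"):
        print(name)
```

For ก้าน the tone mark (MAI THO) appears second in the stored sequence, before the vowel, which is the typing order described above.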

I offer a simple encoding (convention) for 'pronunciation', written in braces (following the word), with an asterisk * to indicate the position of the consonant with respect to the vowel (within the word). For example:
ก (สระ)เอา - เกา {ก เ*า}
กล (สระ)เอา ไม้โท - เกล้า {กล เ*า ้}
ส เอือ เสือ ไม้โท เสื้อ - น ำ นำ ไม้โท น้ำ - ง เออ น - เงิน - เสื้อน้ำเงิน {ส เ*ือ ้ - น *ำ ้ - ง เ*อ น}

We use a subset of the same alphabet, but we use a space ' ' to separate characters (for clarity) and a dash '-' to separate syllables. We do not substitute or omit any vowel. But การันต์ (a silent letter marked with ไม้ทัณฑฆาต ์) is not pronounced, so it is omitted from the encoding. Abbreviations such as ๆ, ฯ ล ฯ, คสช, ... are encoded as they are commonly read.
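To show that a machine can read this encoding, here is a minimal Python sketch of a parser. It assumes, based only on the examples above, that each syllable is written as: initial consonant(s), a vowel pattern containing '*', then an optional final consonant and an optional tone mark. The names `Syllable` and `parse_pronunciation` are illustrative, not part of any existing tool.

```python
from dataclasses import dataclass

# Thai tone marks: mai ek, mai tho, mai tri, mai chattawa (U+0E48 to U+0E4B)
TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}


@dataclass
class Syllable:
    initial: str     # leading consonant(s), e.g. "ก" or "กล"
    vowel: str       # vowel pattern; "*" marks where the consonant sits
    final: str = ""  # final consonant, if any
    tone: str = ""   # tone mark, if any


def parse_pronunciation(encoded: str) -> list[Syllable]:
    """Parse a braced pronunciation such as '{ก *า น ้}' into syllables."""
    body = encoded.strip().strip("{}")
    syllables = []
    for chunk in body.split("-"):      # a dash separates syllables
        tokens = chunk.split()         # spaces separate the parts
        if len(tokens) < 2 or "*" not in tokens[1]:
            raise ValueError(f"not a valid syllable: {chunk!r}")
        syllable = Syllable(initial=tokens[0], vowel=tokens[1])
        for token in tokens[2:]:
            if token in TONE_MARKS:
                syllable.tone = token
            else:
                syllable.final = token
        syllables.append(syllable)
    return syllables


if __name__ == "__main__":
    for syllable in parse_pronunciation("{ส เ*ือ ้ - น *ำ ้ - ง เ*อ น}"):
        print(syllable)
```

Abbreviations and silent letters are not handled here; by the convention above they would already be written out (or omitted) before the encoding reaches the parser.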

This encoding follows exactly the way we learn to read and should be simple to read after a little mental adjustment. The encoding can easily be converted to other phonetic systems and used to reconstruct conventional text. The encoding also offers a logical (at least more customary) order of words in the dictionary (so that สอ... สา... เสือ... เสื่อ... เสื้อ... are looked up in that order). The encoding can be (recoded and) fed into speech synthesizers (eSpeak, Festival/Flite,...) and we have text-to-speech (TTS). Some work is required, though, to separate phrases into words. SWATH already does this with high accuracy.
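The dictionary order mentioned above can be sketched as a sort key built from the encoding. This is only an illustration under one assumption of mine: each syllable is compared by initial consonant, then vowel pattern, then final consonant, then tone mark, using plain code-point order within each part. The names `sort_key` and `sort_entries` are hypothetical. On the five example words it gives สอ, สา, เสือ, เสื่อ, เสื้อ; a real dictionary order would need a carefully chosen vowel sequence.

```python
# Thai tone marks: mai ek, mai tho, mai tri, mai chattawa (U+0E48 to U+0E4B)
TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}


def sort_key(encoded: str) -> list[tuple[str, str, str, str]]:
    """Turn '{ส เ*ือ ้}' into [(initial, vowel, final, tone)] per syllable."""
    key = []
    for chunk in encoded.strip().strip("{}").split("-"):
        tokens = chunk.split()
        initial, vowel = tokens[0], tokens[1]
        final = tone = ""
        for token in tokens[2:]:
            if token in TONE_MARKS:
                tone = token
            else:
                final = token
        key.append((initial, vowel, final, tone))
    return key


def sort_entries(entries: dict[str, str]) -> list[str]:
    """Sort words by the key derived from their encoded pronunciation."""
    return sorted(entries, key=lambda word: sort_key(entries[word]))


if __name__ == "__main__":
    entries = {
        "เสื้อ": "{ส เ*ือ ้}",
        "สา": "{ส *า}",
        "เสือ": "{ส เ*ือ}",
        "สอ": "{ส *อ}",
        "เสื่อ": "{ส เ*ือ ่}",
    }
    print(sort_entries(entries))
```

Sorting the written words directly by code point would instead place every word that begins with a leading vowel such as เ after all words that begin with a consonant, which is why the key is taken from the encoding rather than from the spelling.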

*mynote* We can experiment with constructing speech using freely available software such as 'Audacity'. We can make recordings of 'ก' (say {ก เ*อะ}) and 'า' (say {อ *า}). We get 2 wave packages (waveforms). We can keep about 10 milliseconds of 'ก', append to that 40 ms of 'า', then play back the joined wave packages. We can hear the word 'กา'. That is the basis of speech synthesis. The rest is making the speech sound natural (humanlike).
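The same cut-and-join experiment can be sketched in Python with only the standard library. This is a rough illustration, not a synthesizer: the two sine tones are placeholders standing in for real recordings of 'ก' and 'า', and the sample rate, frequencies, function names and the output file name `ka.wav` are all assumptions made for the example.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # samples per second (assumed for this sketch)


def tone(freq_hz: float, duration_ms: int, amplitude: float = 0.5) -> list[float]:
    """Placeholder waveform; a real experiment would load a recording instead."""
    count = SAMPLE_RATE * duration_ms // 1000
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(count)]


def splice(first: list[float], second: list[float],
           first_ms: int, second_ms: int) -> list[float]:
    """Keep the first `first_ms` of one sound and append `second_ms` of another."""
    head = first[: SAMPLE_RATE * first_ms // 1000]
    tail = second[: SAMPLE_RATE * second_ms // 1000]
    return head + tail


def write_wav(path: str, samples: list[float]) -> None:
    """Write samples in the range -1.0..1.0 as 16-bit mono PCM."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)


if __name__ == "__main__":
    consonant = tone(200, 100)  # stand-in for a recording of 'ก'
    vowel = tone(440, 300)      # stand-in for a recording of 'า'
    write_wav("ka.wav", splice(consonant, vowel, 10, 40))
```

With real recordings in place of the placeholder tones, `splice(consonant, vowel, 10, 40)` performs the 10 ms plus 40 ms join described in the note; an abrupt join like this may click, which is part of the remaining work of making speech sound natural.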

Can we do the reverse, and break a word into 'sound units' (from กา to {ก เ*อะ}+{อ *า})?
If we can, then it is possible to convert sound units to letters (text), that is, to convert {ก เ*อะ} to ก and {อ *า} to า, and write out the result กา. If we can build a database of sound units (in a certain encoding of wave forms) and IF we can BREAK speech into sound units and MATCH them with our database,...

Consider HTML, a language that we use to write webpages. We use HTML mostly to make text "look" a certain way. We can think of a similar language for making sound units "sound" a certain way. (For example, to make {อ *า} longer than 50 ms and louder, so we get the effect of stressing the word กา.)

What if we have another language (say HyperPhonetic Markup Language - HPML) which we can use to make sounds "act" in certain ways? So we can say "Search for XYZ" to Cortana or Google Now or Siri or Assistant.ai and get a list of webpages. Or we can say "Read X document" or "Call a cab" or "Do my homework"...

Fast forward: we can be using speech recognition (SR) technologies in man-machine interactions in this (21st or 26th) century. We'll be talking to our phones more often than using them to talk to other people. We can ask our phones questions, learn about things, get transport, health and food advice, be warned about appointments and situations,... Our children will be growing up with their personal assistant (buddy) phones, learning from each other, bonding and working as a team. No-one will ever be alone.

แล้ววันนั้น ลูกหลานไทยจะพูด ภาษาไทย กับโทรศัพท์มือถือของเขา หรือ ภาษาอื่น?
นั่นแล้วแต่ว่า วันนี้ เรามีคำอ่านออกเสียงของคำทุกคำในพจนานุกรมหรือไม่
เรา ทำ ให้เป็น ภาษาไทย และ ภาษาอื่นได้

Will they be speaking to their phone in Thai OR in other languages?
That depends on whether our dictionary today gives the 'pronunciation' of every word.
We can change that to 'Thai AND other languages'!

Written on 2 Sep 2559 BE (2016 CE), 04:33.

Prof. Vicharn Panich has posted in ชีวิตที่พอเพียง : 2740. คุยกับหุ่นยนตร์ที่มีมรรยาท https://www.gotoknow.org/posts/613227

I think this gives an idea of where man-machine interaction now stands for the (US) English language.

Research in (in alphabetical order) Chinese, French, German, Japanese, Spanish,... is intense and well supported.

There is no news from NECTEC on progress in this area. The last time I looked at the NECTEC website, I found many broken (URL) links. They are possibly too busy to keep the public informed.
