Please forgive me for writing this in English.

I offer my view {or theory if you want to call it that ;-) }

Learning is, in a way, a process of "pattern recognition" on the way to becoming "an expert".

A body of signals or data is received by one or more of our senses (eye, ear, nose, tongue, skin and 'mind'); a pattern is formed and then recognized. Patterns received by the eyes are much easier to 'copy' than patterns in words (instructions which must be translated and visualized before being implemented in actions). When we 'actually do' things ourselves, all our senses can be involved in recognizing 'patterns', so many types of data reinforce our learning, and we learn better by doing.

Applying the observations above, we would 'learn' well if we used one of our 'physiological' senses directly. We would learn better if we used several physiological senses directly at the same time. Words are indirect data/sense. Words are symbols (not raw signals) that represent phenomena by cultural convention. Words are indirect patterns that we go to school to learn to recognize. Many people learn to use 'derived' senses (such as music, game-playing, cooking, trading and mathematics) for pattern recognition, and many become 'experts' in using those senses.

Note: There is plenty of literature on senses (I suggest "Buddhistic Senses"), pattern recognition (I suggest "Neural Networks"), learning theories and expert systems. My view above was developed privately.