Reliable voice technology, at last – Avallain supports individualised learning approaches

What could be the perfect user interface for interactions between humans and machines? The IT industry has been pondering this question ever since the advent of advanced computer systems. When it comes to digital language learning, however, the answer is obvious: the ability to input language verbally offers unique possibilities for learners, something that keyboard-driven written input simply cannot match. Unfortunately, early attempts at tapping the potential of this input method tended to fail because the technology available at the time was still underdeveloped.

How, then, do we at Avallain use voice technology in Avallain Author and Avallain Unity to individualise and improve learning experiences and results as decisively as we do today?

How voice technology supports the learning process

Traditional user interfaces based on visual feedback, such as keyboards and touch screens, have one noteworthy disadvantage when used for learning a language: users can only enter their words into the e-learning software by typing. Consequently, they are able to train their reading, listening and writing skills in their language of choice, but they cannot effectively train their speaking skills. This is where integrating voice technology into the language learning process opens up a whole new set of possibilities:

Users can actively train their speaking skills.
Learners can receive immediate individualised feedback.
Errors in pronunciation can be detected before they fossilise.
Language teaching approaches no longer need to be based solely on written language.

Avallain recognised early on the advantages that voice technology can offer. We have been following the developments in this field for many years, keeping an eye out for advancements which can be integrated into our products to improve them in a meaningful way. However, we do not blindly follow every new approach, as underdeveloped technology could seriously harm both the user experience and the ultimate goal of learning a new language.

The three milestones on the road to a mature voice technology

Only 15 years ago, no one would have believed that voice technology, which back then was rather limited and frequently unreliable, would mature into the advanced voice recognition behind assistants such as Siri and Alexa. Today, people all over the world can control these virtual assistants by speech, using many different languages and dialects. And when these assistants are asked questions, their ability to give fitting answers has become remarkably reliable.

Today, voice technology is advanced enough to meet our standards of quality, which means it can offer significant benefits to users of Avallain Author and Avallain Unity. Our products allow educators to create and publish interlocking e-learning solutions which would have been considered science fiction as recently as the nineties. In addition, research suggests that real breakthroughs in some of the most complex areas of voice technology have been made over the last couple of years. The path to the current state of the art has been very demanding, requiring us to pass three key milestones.

First milestone: Audio recording and playback

From today’s point of view, recording and playing back speech over the internet could be considered the easiest of the three milestones to reach. This technology constitutes the basis of all further developments in the field. Since it already met our standards of quality in 2002, we were able to integrate it into our Macmillan English Campus project at that time.

Today, our software supports a variety of recording and playback methods and applications. For example, Avallain Author allows user input and pre-recorded files to be merged into complete dialogue sequences, both online and offline. Thanks to Avallain Unity, the final recordings can then be sent directly to teachers for feedback, or used simply as a means of communicating verbally outside the fixed course locations.

Second milestone: Synthesised language

Compared to recording and playing back speech, creating synthetic voice audio is a much more complicated topic. For this reason, we waited until the technology had sufficiently matured in 2004 before integrating speech synthesis as a feature into our e-learning software. Back then, we entered into several partnerships with leading speech synthesis companies such as Acapela Group. These partnerships allowed us to offer exciting new features to our customers, all based on speech synthesis technology. Some of these features include:

The ability to convert pre-written text into synthetic audio (text-to-speech, TTS).
The ability to adjust specific features of recorded speech, e.g. changing dialects to fit specific markets.
Controlled and scalable production of audio recordings.
The ability for learners to convert their own written input directly into audio using TTS.

Our current voice technology is flexible enough to create and play back authentic speech using various languages and dialects. Even UNESCO has noted how useful this feature has been for literacy programmes all over the world, describing our successful efforts to provide Swahili education to the coastal population of Kenya on page 16 of their report on international education initiatives.

Third milestone: Speech recognition

Until recently, speech recognition and assessment have always required the attention of an educator, particularly because of the great variety of possible pronunciations. And even today's specialised software solutions, which can be valuable for automated pronunciation analysis, still only apply to very specific cases of language analysis.

Because of these limitations, our focus has been on general voice recognition features, meaning the ability of the software to recognise and accurately transcribe spoken language. In the early days, even technology developed by giants such as Microsoft, Apple and Google could fail, regardless of the quality of speech. The first companies to offer significant advances in this area were niche software providers such as Nuance, which would often go on to be market leaders in areas such as dictation and automated customer service. However, their solutions usually required users to be trained in using their specific software.

For us, this approach to voice recognition was not an ideal solution for the education sector, as the need to teach learners how to use learning software only creates additional obstacles on the way to educational success. For this reason, we initially concentrated on less technologically ambitious, more intuitive e-learning solutions. For example, when working on the learning platform iwdl.de, we deliberately limited voice input options in gamified exercises to individual words instead of entire sentences. Using this approach, iwdl.de has managed to become the first ever digital learning tool to be approved by the German Federal Office for Migration and Refugees for use in immigrant integration courses.

2017 – Visions of the future become a reality

Now that we have successfully passed these milestones, voice technology opens up a whole new world of exciting possibilities to learners and educators alike. In the summer of 2017, we will introduce, for the first time, an Activity Type which allows the software to give learners direct feedback on the quality of their pronunciation of texts within an Activity. After that, the next important step is to introduce the ability to use spoken language within any given Input Activity. This could be used for exercises in which learners have to enter elements verbally, or even for gamified exercises in which learners can simply speak the answers to specific questions. These are the near-future milestones for Avallain.

To achieve these goals in 2017, we are currently working primarily with Google’s Speech API; however, we will be using three of the key software solutions in this area in the near future.
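
For readers who want a feel for how transcript-based pronunciation feedback might work in principle, here is a minimal sketch in Python. It is an illustration only, not Avallain's implementation: it assumes a speech recogniser has already returned a plain-text transcript of what the learner said, and it simply flags which words of the target text were found in that transcript. The function names tokenize and word_feedback are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[\w']+", text.lower())


def word_feedback(target: str, transcript: str) -> list[tuple[str, bool]]:
    """Pair each word of the target text with a flag: was it recognised?

    Assumption: `transcript` is the text a speech recogniser produced for
    the learner's recording. A word counts as recognised if it falls in a
    run of words that lines up between the target and the transcript.
    """
    expected = tokenize(target)
    heard = tokenize(transcript)
    matcher = SequenceMatcher(a=expected, b=heard, autojunk=False)
    recognised = [False] * len(expected)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            recognised[i] = True
    return list(zip(expected, recognised))


# Example: the learner's "weather" was transcribed as "feather".
for word, ok in word_feedback("The weather is nice today",
                              "the feather is nice today"):
    print(f"{word}: {'ok' if ok else 'try again'}")
```

In this example, only "weather" is marked for another attempt. A real system would need far more than this, for instance confidence scores or phoneme-level analysis, but the sketch shows why reliable transcription is the foundation for automated feedback.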

What’s next for Avallain? That will be decided through constant communication between us and our customers. But one thing is certain – as always, our customers will be the first to be able to offer the latest technologies to their end-users in a comprehensive and fully reliable form. Together, we will make individual education more comprehensive, more efficient and more exciting.
