Wednesday, February 11, 2009

Text to Speech Audiobooks

The Wall Street Journal reports that some publishers and agents have expressed concern over a new, experimental feature on the K2 that enables it to read text aloud in a computer-generated voice. Text-to-speech software is already widely available on PCs today, so what is the fuss about?

Paul Aiken, executive director of the Authors Guild, is reported as saying, "They don't have the right to read a book out loud, that's an audio right, which is derivative under copyright law." In response, an Amazon spokesman noted that the text-reading feature depends on text-to-speech technology and that listeners won't confuse it with the audiobook experience.

So is this a storm in a teacup or a genuine threat to audiobooks as we know them today?

Firstly, we must separate the needs of the visually impaired from those of people who simply want an audiobook. The needs of the former are slightly different; a text-to-speech offering may meet them and must be supported in that effort. Ultimately, both groups require a finer audio specification than an automatically generated reading provides.

We have looked hard at generating audio from text using leading text-to-voice synthesis technologies. We found that there were three distinct phases to achieving a high-quality rendering. First, you need to create a stylesheet that is applicable to the text being rendered. This sets pitch and variation, and also creates timed pauses and other effects when certain punctuation is encountered. This work may well produce a number of templates, which will differ between genres of books. Secondly, there is the voice synthesis engine and its dictionary. These differ widely and are ever learning and growing, but they bring the human aspect to the rendering and can introduce dialect as well as gender. Finally, there is the need to tidy up any mistakes, be they dictionary or punctuation errors. We rendered a fiction title and an STM book and achieved a full rendering of each in very little time. The results were very good and, with effort, could be excellent. We found that the production process needed expert input, and although the tools improved productivity and reduced costs, they didn't match the quality of actors and today’s audiobooks.
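As a rough illustration of the first and third phases described above, here is a minimal Python sketch. It is not the tooling we used. It assumes the stylesheet can be expressed as per-genre pause lengths and pitch/rate hints, and that the output is SSML-style markup to be handed to whichever synthesis engine is chosen. The names STYLESHEETS, CORRECTIONS and to_ssml, and all of the values in them, are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical per-genre stylesheets: pause length in milliseconds for each
# punctuation mark, plus overall pitch and rate hints. Values are illustrative.
STYLESHEETS = {
    "fiction": {
        "pitch": "medium",
        "rate": "medium",
        "pauses": {",": 250, ";": 350, ":": 350, ".": 600, "?": 600, "!": 600},
    },
    "stm": {
        "pitch": "low",
        "rate": "slow",
        "pauses": {",": 300, ";": 450, ":": 450, ".": 800, "?": 800, "!": 800},
    },
}

# Hypothetical tidy-up dictionary: words the engine is assumed to mispronounce,
# mapped to a spelling that should be read correctly.
CORRECTIONS = {"STM": "S T M"}


def to_ssml(text, genre, corrections=CORRECTIONS):
    """Return SSML-style markup for `text` using the stylesheet for `genre`."""
    style = STYLESHEETS[genre]

    # Tidy-up phase: whole-word substitutions from the corrections dictionary.
    for word, spoken in corrections.items():
        text = re.sub(r"\b%s\b" % re.escape(word), spoken, text)

    # Stylesheet phase: split on punctuation (keeping the marks) and add a
    # timed pause after each one. This is naive: a decimal such as 3.5 would
    # also receive a pause, which is the kind of mistake the tidy-up phase
    # would have to catch.
    pieces = []
    for part in re.split(r"([,;:.?!])", text):
        pieces.append(escape(part))
        if part in style["pauses"]:
            pieces.append('<break time="%dms"/>' % style["pauses"][part])

    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis">'
        '<prosody pitch="%s" rate="%s">%s</prosody></speak>'
        % (style["pitch"], style["rate"], "".join(pieces))
    )


if __name__ == "__main__":
    print(to_ssml("It was late, and the rain had stopped.", "fiction"))
```

The second phase, the synthesis engine and its dictionary, is deliberately left out: the sketch only prepares the markup, and the choice of engine, voice, dialect and gender would sit behind whatever consumes that markup.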

So are instant text-to-voice renditions going to threaten audiobooks? We don’t think so today, but they will over time. Can text-to-speech synthesis change audiobook production? Yes, potentially; it’s almost there today. Can publishers take advantage of text to speech? Certainly, even if only to comply with disability demands. Will text to speech threaten territorial rights and offer instant translations? Well, try using Google Translate. It’s an excellent tool, but like most translators it can make hard work of grammar and of words that don’t easily translate. Still, it’s here today and it’s free.

Some may think that the K2 text-to-speech feature has another objective, and perhaps that objective lies in the educational market.
