Bristol Wearable Computing



Text to Speech


Because of the memory and hard disk constraints of the cyber jacket, we decided to use the Mbrola speech synthesis package. Mbrola cannot render text as speech directly: it generates audio from a phonetic transcription. Several other pieces of software are therefore required to complete the full text-to-speech process. They are:

FreePhone
A text to phoneme package by Alistair Conkie.

Sox
A small audio utility program.

PlayResponse( ), audio.c
A simple PCM audio file player.

The text-to-speech (TTS) function in audio.c takes as its argument the string to be rendered as speech. All of the above programs are invoked directly with system calls.
First, the TTS function writes the string to a file called /tmp/text.txt. It then makes a system call to FreePhone, which produces an output file called /tmp/temp.pho containing the phonetic representation of the original string. If the TTS function is called with the Lex flag, the FreePhone command line is issued with the lib/lexicon option. This forces FreePhone to use the English lexicon database lexicon.pag, found in the /lib directory, which significantly improves the quality of the phonetic transcription. When FreePhone uses lexicon.pag it creates a second file called lexicon.lib, an index to lexicon.pag that FreePhone uses subsequently.
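The first step can be sketched in C as follows. This is a minimal illustration, not the code from audio.c: the function names write_tts_text and build_freephone_cmd are hypothetical, and because the exact FreePhone command line is not reproduced on this page, the base command and the lexicon option are passed in as placeholder strings rather than spelled out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the text to be spoken to a file (this page uses /tmp/text.txt).
 * Hypothetical helper. Returns 0 on success, -1 on failure. */
int write_tts_text(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    size_t len = strlen(text);
    int ok = (fwrite(text, 1, len, fp) == len);
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Build the FreePhone command line. base_cmd and lexicon_arg are
 * placeholders supplied by the caller; the real option syntax must be
 * taken from the FreePhone documentation. The lexicon argument is
 * appended only when the Lex flag (use_lexicon) is set.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_freephone_cmd(char *buf, size_t size, const char *base_cmd,
                        const char *lexicon_arg, int use_lexicon)
{
    assert(buf != NULL && base_cmd != NULL && lexicon_arg != NULL);

    int n = use_lexicon
        ? snprintf(buf, size, "%s %s", base_cmd, lexicon_arg)
        : snprintf(buf, size, "%s", base_cmd);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

A caller would write the text with write_tts_text("/tmp/text.txt", text), build the command, and pass the resulting string to system(). Checking the snprintf result guards against a silently truncated command line.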
A system call to Mbrola then transforms /tmp/temp.pho into /tmp/temp.pcm. The Mbrola command line is issued with the en1 option, which instructs Mbrola to use the English diphone database en1 to generate the audio file. Note that the sampling rate of the generated audio file depends on the database used. The en1 database produces audio files sampled at 16000 samples/sec, whereas the audio playback software on the cyber jacket can only render audio files at 8000 samples/sec. Unfortunately, a database that generates 8000 samples/sec audio files is not yet available.
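The effect of the rate mismatch can be made concrete with a short C sketch. The helper names are hypothetical. The command form "mbrola <database> <pho file> <output file>" follows general Mbrola usage as we understand it and should be treated as an assumption, since the exact command line used in audio.c is not reproduced on this page.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build an Mbrola command line of the assumed form
 *   mbrola <database> <pho file> <output file>
 * Returns 0 on success, -1 on an empty argument or a too-small buffer. */
int build_mbrola_cmd(char *buf, size_t size, const char *database,
                     const char *pho_path, const char *out_path)
{
    assert(buf != NULL && database != NULL);
    assert(pho_path != NULL && out_path != NULL);

    if (strlen(database) == 0 || strlen(pho_path) == 0 ||
        strlen(out_path) == 0)
        return -1;

    int n = snprintf(buf, size, "mbrola %s %s %s",
                     database, pho_path, out_path);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Duration, in seconds, of sample_count samples when the player
 * consumes playback_rate samples per second. */
double playback_seconds(long sample_count, long playback_rate)
{
    assert(playback_rate > 0);
    return (double)sample_count / (double)playback_rate;
}
```

For example, 16000 samples last 1 second when played at 16000 samples/sec but 2 seconds when played at 8000 samples/sec, so unconverted Mbrola output would play at half speed on the cyber jacket.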
To convert the generated audio file we therefore issue a system call to Sox, a general-purpose audio utility program that can manipulate many audio file formats. We instruct Sox to convert the raw PCM audio file from 16000 to 8000 samples/sec, which generates a file called out.raw. This file can then be played with a call to the PlayResponse function.
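The conversion step, and the way the programs are chained with system calls, can be sketched as follows. This is an illustration under stated assumptions, not the code in audio.c: all function names are hypothetical, the Sox options use the syntax of current SoX releases (-t raw, -r, -e signed-integer, -b 16, -c 1), which may differ from the version originally used, and the audio is assumed to be 16-bit signed mono.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Signature shared by the real runner and the dry-run runner. */
typedef int (*step_runner)(const char *cmd);

/* Run one command through the shell; 0 means it reported success. */
int run_with_system(const char *cmd)
{
    assert(cmd != NULL);
    return system(cmd) == 0 ? 0 : -1;
}

/* Print the command instead of running it (useful when debugging). */
int dry_run(const char *cmd)
{
    assert(cmd != NULL);
    return puts(cmd) >= 0 ? 0 : -1;
}

/* Build a Sox command that resamples headerless PCM from in_rate to
 * out_rate. Assumes 16-bit signed mono data and modern SoX options.
 * Returns 0 on success, -1 on an empty path or a too-small buffer. */
int build_sox_cmd(char *buf, size_t size, const char *in_path,
                  long in_rate, const char *out_path, long out_rate)
{
    assert(buf != NULL && in_path != NULL && out_path != NULL);

    if (strlen(in_path) == 0 || strlen(out_path) == 0)
        return -1;

    int n = snprintf(buf, size,
                     "sox -t raw -r %ld -e signed-integer -b 16 -c 1 %s"
                     " -r %ld %s",
                     in_rate, in_path, out_rate, out_path);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Run the commands in order and stop at the first failure.
 * Returns the index of the failing step, or -1 if all succeeded. */
int run_steps(const char *const cmds[], int count, step_runner run)
{
    for (int i = 0; i < count; i++) {
        if (run(cmds[i]) != 0)
            return i;
    }
    return -1;
}
```

Passing the runner as a function pointer lets the same step list be printed with dry_run or executed with run_with_system. Stopping at the first failure avoids playing a stale out.raw left over from an earlier request.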

[Figure: tts.gif]



The material displayed is provided 'as is' and is subject to use restrictions.
For problems or questions regarding this web site, contact Cliff Randell.
Last updated: January 14, 2000.
© Copyright Hewlett-Packard 1997-2000.