Returning the power of natural sounding speech to
post-laryngectomy and other voice-loss patients using non-surgical,
non-invasive, non-intrusive embedded computational techniques.

For more information about post-laryngectomy life and options, please visit the National Association of Laryngectomy Clubs and the Cancer Laryngectomy Trust. I recommend that everyone associated with laryngectomies read the excellent book Laryngectomy is not a Tragedy by Sydney Norgate and his granddaughter Dr Nicola Oswald.

Principal Investigator

Dr Ian McLoughlin
School of Computing (Medway)
The University of Kent,
Chatham Maritime,
Kent, UK
(Note: when the project began, the PI was in the School of Computer Engineering, Nanyang Technological University, Singapore. However, most of the work described here was undertaken while the PI was a Professor at the University of Science and Technology of China, Hefei, China)

Dr Hamid Reza Sharifzadeh
Senior Lecturer in Computing and Information Technology
Unitec Institute of Technology
139 Carrington Road
Mount Albert
Auckland 1025
New Zealand
(Dr Sharifzadeh began work on this while working as a postdoctoral researcher, and before that a research student at Nanyang Technological University, Singapore)

The contributions of the following people to the Bionic Voice project are acknowledged with many thanks, particularly Dr Paul Mok and Dr Nelson Chee (who first coined the name).

Collaborators for original project

Dr Forest Tan (Co-PI on NMRC project 2006-2009)
BEng, PhD (Warwick), MIEEE
Associate Professor
Singapore Institute of Technology

Jingjie LI (Bionic Voice researcher 2012-2014)
The National Engineering Laboratory of Speech and Language Information Processing (NEL-SLIP)
The University of Science and Technology of China
443 Huangshan Lu
Hefei, China 230027

Farzaneh Ahmadi (Bionic Voice researcher 2007-2011)
Postdoctoral researcher
The University of Sydney

Dr Paul Mok (Collaborator & Advisor 2006-2011)
Senior Consultant
Head, Department of Otolaryngology, Head & Neck Surgery
Alexandra Hospital, Singapore.
Visiting Consultant
Tan Tock Seng Hospital and National University Hospital.
Clinical Tutor
National University of Singapore.

Dr Nelson Chee (Collaborator & Advisor 2006-2011)
Director & Senior Consultant
Chee Ear Nose Throat Surgery,
Mount Elizabeth Medical Centre.
Visiting Consultant
Tan Tock Seng Hospital and Alexandra Hospital.


The Bionic Voice project aims to provide post-laryngectomised patients with a natural sounding voice, preferably modelled after their original voice, using non-surgical and non-invasive computer engineering techniques that effectively create an external voice box.

Lay Abstract

The idea behind this project is to create technology able to recreate natural sounding speech from the hoarse whisper-like voice of those who have undergone a partial laryngectomy or who have a damaged or non-functional larynx, and eventually also for total laryngectomees who breathe through a stoma.
We envisage something that might be a small belt-mounted unit, or built inside a smartphone that could be connected to earphones or perhaps a neck-mounted microphone (just like Bluetooth earpieces for an iPhone).
The issues that we are working to overcome are (i) how to create natural sounding speech that is personalised to the user, (ii) how to make this 'say' what the user wants, (iii) how to do all of this automatically by computationally interpreting the whispery voice they currently have, and (iv) how to reliably capture (record) that voice, which is often very quiet, in the presence of louder background sounds.

Up to now we have invented, and published, quite a few new computational techniques for items (i) to (iii). Several tricky areas of in-depth research remain, but we are now at a stage where we can envisage building a prototype unit and are actively seeking funding to do this.

As a related issue, of growing urgency and importance, we are working on ways to help post-laryngectomised patients (and others with similar larynx-related voicing disabilities) access the wide variety of speech-based services that are being launched - from smartphone voice access to smart homes and cars. We are also seeking funding for this research.


At present, post-laryngectomised and other speech-impaired patients lose the functionality of their larynx in speech production, and therefore require surgical or prosthetic help to regain their speaking ability. Existing solutions for such patients, either pitchless (e.g. tracheo-oesophageal speech) or pitched (e.g. electrolarynx), have specific shortcomings, and result in unnatural-sounding speech. The tracheo-oesophageal puncture (TEP) is a surgical valve that is fitted between the oesophagus and the trachea, which resonates just like the glottis should, and provides 'voicing' in speech. This is an excellent system that has enabled many patients to regain speech, but it requires surgical intervention and needs a mechanical valve to be fitted to the patient's neck (with the associated need to keep it clean, prevent infection and replace it when it wears out).

Similarly, the electrolarynx has also been a wonderful device for many patients, but since it must be pressed manually to the neck, it occupies one hand during use. Even the few 'hands free' adaptors that have been invented are not great solutions (requiring, for example, a tube to be inserted into the mouth). Furthermore, the pitch or 'buzzing' sound that the electrolarynx uses to resonate the vocal tract is continuous. In normal speech, pitch is present only in voiced phonemes and is absent at other times (such as fricatives, breaths, glottal stops and gaps between words), whereas the electrolarynx output is continuous and can be quite monotonous.


This project has investigated, and progressed, three main technological methods to achieve the goals:

  1. To shift the pitch injection mechanism from the vocal tract (VT) input to its output (i.e. in or near the mouth). Since the VT is a linear time invariant (LTI) system, or at least is usually considered to be one, in theory it does not matter where the pitch injection happens: at the glottis, in the mouth, at the lips or elsewhere. This may sound easy, but in reality there are a number of difficulties to be overcome before this method can work reliably, time after time, for all speakers.

  2. To investigate computational methods of adding artificial pitch excitation to the whisper-like voice (or indeed actual whispers) of a post-laryngectomised patient. We have three main methods of doing this: (a) a CELP-based reconstruction system, (b) a parametric whisper-to-speech reconstruction system that follows a vocoder approach, and (c) a statistical voice conversion approach using Gaussian mixture models (GMM) and, more recently, deep neural networks (DNN).

  3. To use a low-frequency ultrasonic signal generated in front of the mouth to resonate inside the VT to map its shape. The received signal is then analysed to augment the actual whispered speech. This uses the vocal tract physiology in a similar way to the well-established electrolarynx, but the non-audible excitation conveys advantages (it is not heard as a continuous noise – in fact, it is not heard at all except by the prosthesis unit).
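
The LTI argument behind the first method can be illustrated with a small numerical sketch. It is not the project's implementation: the pulse train, the single toy resonance standing in for the vocal tract, the two-tap lip filter and all numeric values are assumptions chosen only for illustration. It shows that, for a cascade of LTI stages, applying the pitch excitation before or after the other stages gives the same output up to rounding error.

```python
import math

def convolve(x, h):
    """Direct linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def pulse_train(n_samples, period):
    """Idealised pitch excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def decaying_resonance(n_taps, freq_hz, fs_hz, decay):
    """Toy impulse response of a single resonance (hypothetical values)."""
    return [(decay ** n) * math.cos(2 * math.pi * freq_hz * n / fs_hz)
            for n in range(n_taps)]

fs = 8000                                        # assumed sample rate in Hz
excitation = pulse_train(200, period=80)         # 100 Hz pitch at 8 kHz
tract = decaying_resonance(64, 700.0, fs, 0.95)  # stand-in for the vocal tract
lips = [1.0, -0.9]                               # stand-in lip radiation filter

# Pitch applied first: excitation -> tract -> lips
at_input = convolve(convolve(excitation, tract), lips)
# Pitch applied last: tract -> lips -> excitation (same stages, reordered)
at_output = convolve(convolve(tract, lips), excitation)

max_diff = max(abs(a - b) for a, b in zip(at_input, at_output))
print(f"largest sample difference: {max_diff:.2e}")
```

The equality holds only to the extent that every stage really is linear and time invariant, which is one reason the practical difficulties mentioned above arise with a real, moving vocal tract.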

Each of these systems is non-surgical and requires contact with the skin only (so is non-invasive). We have already demonstrated that intelligible, natural-sounding speech can result, provided that the automatic phoneme mapping problems can be solved.

Progress to date

Methods (i) and (ii): so far, we have demonstrated in the laboratory the recreation of more natural-sounding speech from the whisper-like utterances of post-laryngectomised patients, have determined how to capture and handle this speech, and have worked on the vowel and formant mapping of these whispers (i.e. how whisper-like phonemes relate to naturally spoken phonemes, especially voiced phonemes, because voicing is completely absent in these patients). We have also achieved some good results on isolated phonemes, words and even sentences. Method (ii)(c) is derived from work by Associate Professor Tomoki Toda et al. at Nara Institute, Japan, and is the highest quality method that we have. Method (ii)(b) has the lowest complexity and achieves a slight improvement in quality over simple whispers. For phonemes and simple words, method (ii)(a) can achieve quality that lies between (b) and (c); however, when it is extended to sentences, it performs poorly.
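
As a rough illustration of what adding artificial pitch excitation means, the sketch below follows a generic source-filter (vocoder-like) recipe: estimate a spectral envelope from a noise-excited signal using linear prediction, then drive the same envelope with a periodic pulse train. It is a textbook LPC example under stated assumptions, not the project's CELP, parametric or statistical systems. The toy "whisper" is synthetic noise shaped by one known resonance, and the sample rate, pitch, LPC order and function names are all hypothetical.

```python
import math
import random

FS = 8000    # sample rate in Hz (assumed)
F0 = 100     # target pitch in Hz (assumed)
ORDER = 2    # LPC order for this toy; real speech needs a higher order

def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite sequence."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err) with A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def all_pole_filter(excitation, a):
    """Synthesis filter 1/A(z): y[n] = e[n] - sum_k a[k] * y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

# Toy "whisper": white noise shaped by one known resonance near 700 Hz.
random.seed(0)
radius, theta = 0.95, 2 * math.pi * 700 / FS
a_true = [1.0, -2 * radius * math.cos(theta), radius ** 2]
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
whisper = all_pole_filter(noise, a_true)

# Analysis: estimate the spectral envelope from the whisper alone.
a_est, _ = levinson_durbin(autocorrelation(whisper, ORDER), ORDER)

# Synthesis: replace the noise source with a pulse train at the target pitch.
period = FS // F0
pulses = [1.0 if n % period == 0 else 0.0 for n in range(len(whisper))]
voiced = all_pole_filter(pulses, a_est)

print("true envelope:     ", [round(c, 3) for c in a_true])
print("estimated envelope:", [round(c, 3) for c in a_est])
```

A sketch like this voices everything with a fixed pitch, so it says nothing about deciding which phonemes should be voiced or how formants shift between whispered and normal speech, which are the mapping problems described above.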

We usually have some samples of the latest output from these methods on the following webpage:

Method (iii): so far we have used this for voice activity detection (VAD) in noise, where it performs extremely well. However, extending it to a full whisper reconstruction system is extremely challenging, and so work continues. There is more detail on the VAD method on the following webpage:


Here are some of our publications from this work (or associated with it):


The PI gratefully acknowledges the initial financial support of the Singapore National Medical Research Council Exploratory Development Grant (EDG07MAY002), and a British Council Collaborative Development Award (2007), without which this project would not have been possible. The PI also wishes to express his thanks to the China Universities Central Research fund for grant no. KY2100060002, which funded his work on whisper reconstruction of Chinese. Thanks are also due to our collaborators, past, present and future, in Singapore's public hospitals, New Zealand hospitals, UK hospitals (particularly East Kent Hospitals University NHS Foundation Trust) and Dr Peter Nicholls of Kent Health. In particular, thanks are due for the co-operation and selfless support of the students, post-laryngectomised patients and caregivers who have volunteered for pre-clinical trials, recordings and consultations.