Returning the power of natural sounding speech to
post-laryngectomy and other voice-loss patients using non-surgical,
non-invasive, non-intrusive embedded computational techniques.

For more information about post-laryngectomy life and options, please visit the National Association of Laryngectomy Clubs and the Cancer Laryngectomy Trust. I recommend that everyone associated with laryngectomies read the excellent book Laryngectomy is not a Tragedy by Sydney Norgate and his granddaughter Dr Nicola Oswald.

Principal Investigator	Professor Ian McLoughlin Singapore Institute of Technology, Singapore (Note: when the project began, the PI was in the School of Computer Engineering, Nanyang Technological University, Singapore. However, much of the work described here was undertaken while the PI was a Professor in the University of Science and Technology of China, Hefei, China)
Co-Investigator	Dr Hamid Reza Sharifzadeh Senior Lecturer in Computing and Information Technology Unitec Institute of Technology 139 Carrington Road Mount Albert Auckland 1025 New Zealand (Dr Sharifzadeh began work on this while working as a postdoctoral researcher, and before that a research student at Nanyang Technological University, Singapore)
Co-Investigator2	Dr Olivier Perrotin CNRS Researcher GIPSA-Lab Dept. of Parole et Cognition Domaine Universitaire, BP 46 38402 Saint Martin d'Hères cedex France
The contributions of the following people to the Bionic Voice project are acknowledged with many thanks, particularly Dr Paul Mok and Dr Nelson Chee (who first coined the name).
Collaborators for original project	Dr Forest Tan (Co-PI on NMRC project 2006-2009) BEng, PhD (Warwick), MIEEE Associate Professor Singapore Institute of Technology Singapore
	Jingjie LI (Bionic Voice researcher 2012-2014) The National Engineering Laboratory of Speech and Language Information Processing (NEL-SLIP) The University of Science and Technology of China 443 Huangshan Lu Hefei, China 230027
	Farzaneh Ahmadi (Bionic Voice researcher 2007-2011) Postdoctoral researcher The University of Sydney Sydney Australia
	Dr Paul Mok (Collaborator & Advisor 2006-2011) Senior Consultant Head, Department of Otolaryngology, Head & Neck Surgery Alexandra Hospital, Singapore. Visiting Consultant Tan Tock Seng Hospital and National University Hospital. Clinical Tutor National University of Singapore.
	Dr Nelson Chee (Collaborator & Advisor 2006-2011) Director & Senior Consultant Chee Ear Nose Throat Surgery, Mount Elizabeth Medical Centre. Visiting Consultant Tan Tock Seng Hospital and Alexandra Hospital.

Aims

The Bionic Voice project aims to provide post-laryngectomised patients with a natural-sounding voice, preferably modelled after their original voice, using non-surgical and non-invasive computer engineering techniques that effectively create an external voice box.

Lay Abstract

The idea behind this project is to create technology able to recreate natural-sounding speech from the hoarse whisper-like voice of those who have undergone a partial laryngectomy or who have a damaged or non-functional larynx, and eventually also for total laryngectomees who breathe through a stoma.
We envisage something that might be a small belt-mounted unit, or built inside a smartphone that could be connected to earphones or perhaps a neck-mounted microphone (just like Bluetooth earpieces for an iPhone).
The issues that we are working to overcome are (i) how to create natural-sounding speech that is personalised to the user, (ii) how to make this 'say' what the user wants, (iii) how to do all of this automatically by computationally interpreting the whispery voice they currently have, (iv) how to reliably capture (record) that voice, which is often very quiet, in the presence of louder background sounds.

Up to now we have invented, and published, quite a few new computational techniques for items (i) to (iii). Several tricky areas of in-depth research remain, but we are now at a stage where we can envisage building a prototype unit and are actively seeking funding to do this.

As a related issue, of growing urgency and importance, we are working on ways to help post-laryngectomised patients (and others with similar larynx-related voicing disabilities) access the wide variety of speech-based services that are being launched - from smartphone voice access to smart homes and cars. We are also seeking funding for this research.

Background

At present, post-laryngectomised and speech-impaired patients lose the functionality of their larynx in speech production, and therefore require surgical or prosthetic help to regain their speaking ability. Existing solutions for such patients, either pitchless (e.g. tracheo-oesophageal speech) or pitched (e.g. electrolarynx), have specific shortcomings and result in unnatural-sounding speech. The tracheo-oesophageal puncture (TEP) is a surgical valve that is fitted between the oesophagus and the trachea, which resonates just like the glottis should, and provides 'voicing' in speech. This is an excellent system that has enabled many patients to regain speech, but it requires surgical intervention and needs a mechanical valve to be fitted to the patient's neck (with the associated need to keep it clean, prevent infection and replace it when it wears out).

Similarly, the electrolarynx has also been a wonderful device for many patients, but since it must be pressed manually to the neck, it requires one hand to be occupied during use. Even the few 'hands free' adaptors that have been invented are not great solutions (requiring, for example, a tube to be inserted into the mouth). Furthermore, the pitch or 'buzzing' sound that the electrolarynx uses to resonate the vocal tract is continuous. In normal speech, pitch is only used for voiced phonemes, and is absent at other times (such as fricatives, breaths, glottal stops and gaps between words), whereas the electrolarynx buzz never stops and can sound quite monotonous.
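The contrast between a continuous buzz and pitch that is present only during voiced segments can be sketched numerically. The short Python example below is purely illustrative: the function name, sample count, pitch period and voiced-segment boundaries are arbitrary assumptions chosen for the sketch, not values or code from the project.

```python
import numpy as np

def pitch_excitation(n_samples, period, voiced_mask=None):
    """Idealised pitch excitation: a unit impulse every `period` samples.

    If `voiced_mask` is given (1 where voiced, 0 elsewhere), pulses outside
    voiced regions are removed, as happens in normal speech.
    """
    x = np.zeros(n_samples)
    x[::period] = 1.0
    if voiced_mask is not None:
        x = x * voiced_mask
    return x

n_samples, period = 800, 80          # assumed values for illustration only

# Hypothetical voicing decision: only samples 160..479 are voiced.
voiced = np.zeros(n_samples)
voiced[160:480] = 1.0

continuous = pitch_excitation(n_samples, period)        # electrolarynx-like buzz
gated = pitch_excitation(n_samples, period, voiced)     # pitch on voiced parts only

print(int(continuous.sum()), "pulses when continuous")
print(int(gated.sum()), "pulses when gated by voicing")
```

In this toy case the continuous excitation contains a pulse in every pitch period, while the gated one is silent outside the assumed voiced segment. Deciding where that segment lies is the hard part in practice, and the sketch does not attempt it.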

Methods

This project has investigated, and progressed, three main technological methods to achieve the goals:

(i) To shift the pitch injection mechanism from the vocal tract (VT) input to its output (i.e. in or near the mouth). Since the VT is a linear time invariant (LTI) system, or at least is usually considered to be one, in theory it doesn't matter whether the pitch injection happens at the glottis, in the mouth, at the lips or elsewhere. This may sound easy, but in reality there are a number of difficulties to be overcome before this method can work reliably time after time for all speakers.
(ii) To investigate computational methods of adding artificial pitch excitation into the whisper-like voice (or indeed actual whispers) from a post-laryngectomised patient. We have three main methods of doing this: (a) a CELP-based reconstruction system, (b) a parametric whisper-to-speech reconstruction system that follows a vocoder approach and (c) a statistical voice conversion approach using Gaussian mixture models (GMM) and, more recently, deep neural networks (DNN).
(iii) To use a low-frequency ultrasonic signal generated in front of the mouth to resonate inside the VT to map its shape. The received signal is then analysed to augment the actual whispered speech. This uses the vocal tract physiology in a similar way to the well-established electrolarynx, but the non-audible excitation conveys advantages: it is not heard as a continuous noise (in fact, it is not heard at all except by the prosthesis unit).

Each of these systems is non-surgical and requires contact with the skin only (so is non-invasive). We have already demonstrated that intelligible, natural-sounding speech can result, provided that the automatic phoneme mapping problems can be solved.
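The LTI argument behind the first method above can be illustrated with a small numerical check: when every stage is linear and time invariant, the order in which the stages are applied does not change the result, because convolution is commutative and associative. The Python sketch below uses made-up stand-ins (a decaying random impulse response for the vocal tract and a two-tap filter for the lips); these are assumptions for illustration only and are not models used in the project. A real vocal tract is only approximately LTI, which is one source of the practical difficulties mentioned.

```python
import numpy as np

def pulse_train(n_samples, period):
    """Idealised pitch excitation: a unit impulse every `period` samples."""
    x = np.zeros(n_samples)
    x[::period] = 1.0
    return x

def toy_vocal_tract(n_taps=64, seed=0):
    """Hypothetical vocal tract impulse response (exponentially decaying noise)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n_taps) * np.exp(-np.arange(n_taps) / 10.0)

fs = 8000                                     # assumed sample rate in Hz
pitch = pulse_train(400, period=fs // 100)    # 100 Hz pulse train
vt = toy_vocal_tract()
lips = np.array([1.0, -0.95])                 # assumed simple two-tap output filter

# Pitch applied first, then vocal tract, then lips.
pitch_at_input = np.convolve(np.convolve(pitch, vt), lips)

# Same three stages with the pitch applied last instead.
pitch_at_output = np.convolve(np.convolve(lips, vt), pitch)

orders_match = np.allclose(pitch_at_input, pitch_at_output)
print("Same output regardless of order:", orders_match)
```

The check only confirms the mathematical property for ideal LTI stages; it says nothing about how to inject pitch physically near the mouth.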

Progress to date

Methods (i) and (ii): so far, we have demonstrated the recreation of more natural-sounding speech from the whisper-like utterances of the post-laryngectomised patient in the laboratory, have determined how to capture and handle this speech, and have worked on the vowel and formant mapping of these whispers (i.e. how whisper-like phonemes relate to naturally spoken phonemes, especially voiced phonemes, because voicing is completely absent in these patients). We have also achieved some good results on isolated phonemes, words and even sentences. Method (ii)(c) is derived from work by Associate Professor Tomoki Toda (http://isw3.naist.jp/~tomoki) et al. at Nara Institute of Science and Technology (NAIST), Japan, and is the highest quality method that we have. Method (ii)(b) is the lowest complexity, achieving a slight improvement in quality over simple whispers. For phonemes and simple words, method (ii)(a) can achieve quality that lies between (b) and (c). However, when it is extended to sentences, it performs poorly.

We usually have some samples of the latest output from these methods on the following webpage:

http://www.lintech.org/Reconstruction

Method (iii): so far we have used this for voice activity detection (VAD) in noise, where it performs extremely well. However, extending it to a full whisper reconstruction system is extremely challenging, and so work continues. There is more detail on the VAD method on the following webpage:

http://www.lintech.org/savad

Publications

Here are some of our publications from this work (or associated with it):

BOOK:

2016/07 McLoughlin I, Speech and Audio Processing: a Matlab-based approach, Cambridge University Press, ISBN 978-1107085466

2009/02 McLoughlin I, Applied Speech and Audio Processing, Cambridge University Press, ISBN 978-05215-1954-0

BOOK CHAPTERS:

2012/12 H.R.Sharifzadeh, McLoughlin I, “From Whispers to Normal Speech: Offering Natural Voice to Laryngectomees”, in Larynx: Surgical Procedures, Complications and Disorders, Nova Science Publishers, New York, pp. 105-122. ISBN 978-162-2570-96-6

2010/04 H.R.Sharifzadeh, McLoughlin I, F. Ahmadi, “Speech rehabilitation methods for laryngectomised patients”, Chapter 51, in Lecture Notes in Electrical Engineering, Vol. 60, Springer, April 2010. ISBN 978-90-481-8775-1

2009/03 H.R.Sharifzadeh, McLoughlin I, F. Ahmadi, “Regeneration of speech in voice-loss patients”, IFMBE Proceedings, Vol. 23, pp.1065-1068. ISBN 978-3-540-92840-9

2009/05 F. Ahmadi, McLoughlin I, “Ultrasonic mapping of human vocal tract for speech synthesis”, chapter accepted, May 2009, and will appear in the book Recent Advances in Signal processing, ISBN 978-95-376-1941-1

2008/11 McLoughlin I, H. R. Sharifzadeh, “Speech Recognition for Smart Homes”, in Speech Recognition, by ITech Book publishers, Vienna, Austria, ISBN 978-95-376-1929-9

JOURNALS:

2014/11 Li J.-J., McLoughlin I, Dai L.-R., Ling Z.-H., “Whisper-to-speech conversion using restricted Boltzmann machine arrays”, IET Electronics Letters, vol. 50, issue 24, Oct. 2014, pp. 1781-1782 [impact factor 0.97]

2014/09 McLoughlin I, “Super-audible Voice Activity Detection”, IEEE Transactions on Audio, Speech and Language Processing, vol. 22, no. 8, Sept. 2014, pp. 1424-1433 [impact factor 1.848]

2014/09 McLoughlin I, Song Y, “Mouth State Detection From Low-Frequency Ultrasonic Reflection”, Journal of Circuits, Systems & Signal Processing, vol. 34, issue 4, March 2015, pp. 1279-1304 [impact factor 1.26]

2014/11 McLoughlin I, Sharifzadeh HR, Tan SL, Li J.-J., Song Y, “Reconstruction of phonated speech from whispers using formant-derived plausible pitch modulation”, ACM Transactions on Accessible Computing, April 2015

2011/12 Ahmadi F, McLoughlin I, Chauhan S, Ter-Haar G, “Bio-effects and safety of low intensity, low frequency ultrasonic exposure”, Progress in Biophysics & Molecular Biology [impact factor 3.96]

2011/12 Sharifzadeh HR, McLoughlin I, Russell M, “A comprehensive vowel space for whispered speech”, J. Voice, DOI: 10.1016/j.jvoice.2010.12.002 [impact factor 1.58]

2010/10 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Reconstruction of Normal Sounding Speech for Laryngectomy Patients through a Modified CELP Codec”, IEEE Trans. Biomedical Engineering, Vol. 57, Issue 10, pp. 2448 – 2458 [impact factor 2.15]

2010/02/26 Ahmadi F, McLoughlin I, Sharifzadeh HR, “Ultrasonic propagation through the human vocal tract”, IET Electronics Letters, 18 March 2010 [impact factor 0.97]

2009/11 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Voiced speech from whispers for post-laryngectomised patients”, Journal of the Int. Assoc. of Engineers, vol. 26, no.4 Oct. 2009.

2010/01 McLoughlin I, “Vowel Intelligibility in Chinese”, IEEE Transactions on Audio, Speech and Language Processing, Vol.18, No.1, Jan 2010, pp.117-125 [impact factor 1.848]

2008/01 McLoughlin I, “Subjective intelligibility testing of Chinese speech”, IEEE Transactions on Audio, Speech and Language Processing, Vol. 16, Issue 1, pp.23-33 [impact factor 1.848]

2007/12 McLoughlin I, “A Review: Line Spectral Pairs”, Signal Processing Journal, doi:10.1016/j.sigpro.2007.09.003 [impact factor 1.256]

2003/02 McLoughlin I, Ding ZQ, Tan EC, “Extension of proposal of standard for intelligibility tests of Chinese Speech - CDRT-Tone”, IEE Proceedings - Vision, Image & Signal Processing, Vol. 150, Issue 1, Feb. 2003 [impact factor 0.762]

CONFERENCES:

2016/10 Zhipeng Xie, Jun Du, McLoughlin I, Yong Xu, Feng Ma, Haikun Wang, “Deep Neural Network for Robust Speech Recognition With Auxiliary Features From Laser-Doppler Vibrometer Sensor”, The 10th International Symposium on Chinese Spoken Language Processing, Tianjin, China, 17 October 2016.

2016/05 Allen YJ, Sharifzadeh HR, McLoughlin I, Sarrafzadeh A, Ardekani I, “Acoustic analysis and computerised reconstruction of speech in laryngectomised individuals”, 137th Annual Meeting of American Laryngological Association (ALA), Combined Otolaryngology Spring Meetings (COSM 2016).

2015/12 Hamid Sharifzadeh, Amir Rassouliha, McLoughlin I, Iman Ardekani and Jacqueline Allen, “Phonated Speech Reconstruction Using Twin Mapping Models”, 15th IEEE International Symposium on Signal Processing and Information Technology, Abu Dhabi, UAE, December 2015

2015/12 Hamid Sharifzadeh, Iman Ardekani and McLoughlin I, “Comparative Whisper Vowel Space for Singapore English and British English Accents”, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2015), Hong Kong, December 2015

2015/09 Li Jingjie, McLoughlin I, Lui Cong, Xue Shaofei, Si Wei, “Multi-Task Deep Neural Network Acoustic Models With Model Adaptation Using Discriminative Speaker Identity For Whisper Recognition”, IEEE Conf. On Acoustics, Speech and Signal Processing (ICASSP), Brisbane, April 2015

2014/09 Li Jingjie, McLoughlin I, Song Y, “Reconstruction of pitch for whisper-to-speech conversion of Chinese”, ISCSLP2014, Singapore, Sept. 2014

2014/09 McLoughlin I, “The Use of Low-Frequency Ultrasound for Voice Activity Detection”, Proc. Interspeech 2014, Singapore, Sept. 2014

2014/06 McLoughlin I, Xie ZP, “Speech Playback Geometry for Smart Homes”, IEEE Int. Symp. On Consumer Electronics, Jeju, Korea, 21 June 2014

2013/08 McLoughlin I, Li Jingjie, Song Yan, “Reconstruction of continuous voiced speech from whispers”, Interspeech 2013, Lyon, France, August 2013

2013/08 Ahmadi F, McLoughlin I, “Human Mouth State Detection Using Low Frequency Ultrasound”, Interspeech 2013, Lyon, France, August 2013

2013/06 Ahmadi F, McLoughlin I, “A new mechanical index for gauging the human bio-effects of low frequency ultrasound”, The 35th annual international conference of the IEEE Engineering in Medicine and Biology Society, Osaka, Japan, June 2013

2012/06 Sharifzadeh HR, McLoughlin I, “Whisper Vowel Diagrams for Singapore English”, 9th International Conference on Communications COMM 2012 Bucharest, Romania

2012/05 Ahmadi F, McLoughlin I, “Measuring resonances of the vocal tract using frequency sweeps at the lips”, Int. Symp. Communications, Control and Signal Processing, Rome, Italy

2012/06 Sharifzadeh HR, McLoughlin I, “Bionic voice for laryngectomees with an insight into whispered vowels”, Cutting Edge Laryngology 2012 Conference, Kuala Lumpur, Malaysia

2011/01 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Artificial Phonation for Patients Suffering Voice Box Lesions”, International Conference on Bioengineering, Singapore

2011/07 Sharifzadeh HR, McLoughlin I, “Reconstruction of normal sounding speech for laryngectomy patients through a modified CELP codec”, EPS International Forum on Rehabilitation Medicine, Nanjing, China

2010/09 Ahmadi F, McLoughlin I, Sharifzadeh HR, “Autoregressive modelling for linear prediction of ultrasonic speech”, INTERSPEECH 2010, Japan

2010/12 McLoughlin I, Sharifzadeh HR, “Toward a Comprehensive Vowel Space for Whispered Speech”, The 7th International Symposium on Chinese Spoken Language Processing, Tainan, Taiwan

2010/05 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Spectral Enhancement of Whispered Speech Based on Probability Mass Function”, accepted for The Sixth Advanced International Conference on Telecommunications, AICT2010, Barcelona, Spain.

2009/07 Sharifzadeh HR, F. Ahmadi, McLoughlin I, “Speech reconstruction in post laryngectomised patients by formant manipulation and pitch profile generation”, World Congress on Engineering, London, UK. Received best paper award

2008/12 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Regeneration of Speech in Voice-Loss Patients”, The 13th International Conference on Biomedical Engineering, Singapore.

2008/11 Ahmadi F, McLoughlin I, Sharifzadeh HR, “Analysis-by-Synthesis Method for Whisper-Speech Reconstruction”, 2008 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2008, Macau

2007/12 McLoughlin I, Sharifzadeh H. R., “Speech Recognition Engine Adaptions for Smart Home Dialogues”, 6th Int. Conf. on Information, Communications and Signal Processing, Singapore

2007/06 McLoughlin I, “Tone Discrimination in Mandarin Chinese”, 14th International Conference on systems, Signals and Image Processing IWSSIP 2007 and 6th EURASIP Conference Focused on Speech and Image Processing, Multimedia Communications and Services EC-SIPMCS 2007, June 2007, Maribor, Slovenia

2005/11 Chong FL, McLoughlin I, Pawlikowski K, “A methodology for improving PESQ accuracy for Chinese speech”, IEEE TENCON2005, Melbourne, Australia, Nov. 2005

PATENTS RELATED TO THE WORK:

2012/02 McLoughlin I, Ahmadi F, “Method and apparatus for mouth state determination using acoustic information”, Pat. Pending UK (1202662.1) [lapsed]

2009/08 McLoughlin I, Sharifzadeh H R, Ahmadi F, “Apparatus and Method for Whisper Voice Generation”, US Provisional patent application no. 61/236,680 [lapsed].

2006/09 McLoughlin I, Busch A, Churton P, Shyh-hao Kuo, Lendnal S, Mehrotra K, McConnell D, Scott T, Pow I, Spalding D (Tait Electronics Ltd), “Improvements relating to radio communications systems”, European Patent No. 03784706.8-2412-NZ0300176, filed 08/08/2003 [superseded]

2002/08 McLoughlin I, Busch A, Churton P, Shyh-hao Kuo, Lendnal S, Mehrotra K, McConnell D, Scott T, Pow I, Spalding D (Tait Electronics Ltd), “Improvements relating to radio communications systems”, NZ Patent 520650, granted January 2006 [superseded]

2002/08 McLoughlin I, Busch A, Churton P, Shyh-hao Kuo, Lendnal S, Mehrotra K, McConnell D, Scott T, Pow I, Spalding D (Tait Electronics Ltd), “Improvements relating to radio communications systems subsidiary patent”, NZ Patent 537902, granted January 2006 [superseded]

1998/07 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", PCT International Patent no. PCT\GB98\01936, 01 Jul 1998.

1998/06 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", South Africa patent no. 98-5607, 26 Jun 1998.

1998/06 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", Chile patent no. 1998-1471, 24 Jun 1998.

1998/06 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", Indian patent no. 1725/Del/98, 22 Jun 1998.

1998/04 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", USA patent no. 09/065239, 23 Apr 1998.

1998/04 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", Canadian patent no. 2235455, 21 Apr 1998.

1997/07 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", United Kingdom patent no. 9814279.7, 02 Jul 1997.

Acknowledgements

The PI gratefully acknowledges the initial financial support of the Singapore National Medical Research Council Exploratory Development Grant (EDG07MAY002), and a British Council Collaborative Development Award (2007), without which this project would not have been possible. The PI also wishes to express his thanks to the China Universities Central Research fund for grant no. KY2100060002, which funded his work on whisper reconstruction of Chinese. Thanks are also due to our collaborators, past, present and future, in Singapore's public hospitals, New Zealand hospitals, UK hospitals (particularly East Kent Hospitals University NHS Foundation Trust) and Dr Peter Nicholls of Kent Health. In particular, thanks are due for the co-operation and selfless support of the students, post-laryngectomised patients and caregivers who have volunteered for pre-clinical trials, recordings and consultations.