The Bionic Voice Project

Returning the power of natural-sounding speech to laryngectomy and other voice-loss patients
using non-surgical, non-invasive, non-intrusive electronics.

Principal Investigator

Dr Ian McLoughlin
Eur. Ing. BEng, PhD, CEng, MIET, SrMIEEE
Professor
School of Information Science and Technology
University of Science and Technology of China
443 Huangshan Lu
Hefei, China 230027
and
School of Computer Science
The University of Kent, Chatham Maritime, Kent, UK
(Note: when the project began, the PI was in the School of Computer Engineering, Nanyang Technological University, Singapore)



Co-Principal Investigator

Dr Hamid Reza Sharifzadeh
Lecturer in Computing and Information Technology
Unitec Institute of Technology
139 Carrington Road
Mount Albert
Auckland 1025
New Zealand
(Note: when the project began, Hamid was a PG researcher and then Postdoc, also in the School of Computer Engineering, Nanyang Technological University, Singapore)


Original Collaborator

Dr Forest Tan
BEng, PhD (Warwick), MIEEE
Associate Professor
Singapore Institute of Technology
Singapore

(Note: when the project began, Forest was also a faculty member of the School of Computer Engineering, Nanyang Technological University, Singapore)


Research Student

Jingjie LI
The National Engineering Laboratory of Speech and Language Information Processing (NEL-SLIP)
The University of Science and Technology of China

443 Huangshan Lu
Hefei, China 230027


Ex-employee & PhD student

Farzaneh Ahmadi
Postdoctoral researcher
The University of Sydney
Sydney
Australia
(Note: when the project began, Farzaneh was a PhD student and Research Associate in the School of Computer Engineering, Nanyang Technological University, Singapore)



Collaborator and advisor to the original project

Dr Paul Mok
Senior Consultant
Head, Department of Otolaryngology, Head & Neck Surgery
Alexandra Hospital, Singapore.
Visiting Consultant
Tan Tock Seng Hospital and National University Hospital.
Clinical Tutor
National University of Singapore.



Collaborator and advisor to the original project

Dr Nelson Chee Wang Cheng
Director & Senior Consultant
Chee Ear Nose Throat Surgery,
Mount Elizabeth Medical Centre.
Visiting Consultant
Tan Tock Seng Hospital and Alexandra Hospital.





Aims

The Bionic Voice project, which began in 2010, aims eventually to provide post-laryngectomised patients with a natural-sounding voice, preferably modelled on their original voice, using primarily non-surgical and non-invasive computer engineering techniques. The result could be described as an external [computational] voice box.


Motivation


The whole idea behind this project is to work towards an external (perhaps belt-mounted) electronic device that connects to a head- or neck-mounted sensor. This sensor could look just like (or even be the same as) a Bluetooth earpiece for a mobile phone. From this, we would like to create a low-cost but effective assistive device prototype. In fact, we have also explored the possibility of implementing the entire system within an existing mobile phone – although we are not actively progressing this option at present (there is already too much work and not enough people!).


Background


At present, post-laryngectomised and other speech-impaired patients lose the function of their larynx in speech production, and therefore require surgical or prosthetic help to regain their speaking ability. Existing solutions for such patients, either pitchless (e.g. tracheo-oesophageal speech) or pitched (e.g. electrolarynx), have specific shortcomings, and result in unnatural-sounding speech. The tracheo-oesophageal puncture (TEP) is a surgical valve that is fitted between the oesophagus and the trachea, which resonates just like the glottis should, and provides 'voicing' in speech. This is an excellent system that has enabled many patients to regain speech, but it requires surgical intervention and needs a mechanical valve to be fitted to the patient's neck, with the attendant need to keep the valve clean, prevent infection and replace it when it wears out.

Similarly, the electrolarynx has also been a wonderful device for many patients, but since it must be pressed manually to the neck, it requires one hand to be occupied during use. Even the few 'hands free' adaptors that have been invented are not great solutions. Furthermore, the pitch or 'buzzing' sound that the electrolarynx uses to resonate the vocal tract is continuous. In normal speech, pitch is only used for voiced phonemes, and is absent at other times (such as fricatives, breaths, glottal stops and gaps between words), but with the electrolarynx it is continuous – and can be quite monotonous.


Methods


This project has investigated, and progressed, three main technological methods to achieve the goals:


  1. To shift the pitch injection mechanism to the vocal tract (VT) output, i.e. in or near the mouth, instead of the input. Since the VT is a linear time invariant (LTI) system – or at least is usually considered to be one – then in theory it doesn't matter where the pitch injection happens. However, in reality there are a number of difficulties to be overcome before this method can work.

  2. To investigate computational methods of adding artificial pitch excitation into the whisper-like voice (or indeed actual whispers) from a post-laryngectomised patient. We have three main methods of doing this: (a) a CELP-based reconstruction system, (b) a parametric whisper-to-speech reconstruction system that follows a vocoder approach, and (c) a statistical voice conversion approach using Gaussian mixture models (GMM) and, more recently, deep neural networks (DNN).

  3. To use a low-frequency ultrasonic signal generated in front of the mouth to resonate inside the VT to map its shape. The received signal is then analysed to augment the actual whispered speech. This uses the vocal tract physiology in a similar way to the well-established electrolarynx, but the non-audible excitation conveys advantages (it is not heard as a continuous noise – in fact, it is not heard at all except by the prosthesis unit).
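
As a rough illustration of the idea behind method 2 (adding artificial pitch excitation to whisper-like speech), the following Python sketch uses a generic textbook source-filter model. It is not any of the project's actual systems (a), (b) or (c). It estimates an all-pole spectral envelope from a whisper-like frame by linear prediction, then drives that envelope with a periodic pulse train in place of noise. The function names, the 8 kHz sample rate, the 100 Hz pitch, and the synthetic stand-in for a recorded whisper are all assumptions made for illustration only.

```python
import math
import random


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion. Returns a[1..order] of the predictor
    x[n] ~ sum_k a[k] * x[n - k], i.e. an all-pole envelope model."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def all_pole_filter(excitation, coeffs):
    """Synthesis filter: y[n] = e[n] + sum_k coeffs[k-1] * y[n - k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def pulse_train(length, period):
    """Unit pulses every `period` samples: a crude artificial pitch source."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def make_whisper_like(length, coeffs, seed=0):
    """Hypothetical stand-in for a recorded whisper:
    white noise shaped by a fixed resonator."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
    return all_pole_filter(noise, coeffs)


FS = 8000           # assumed sample rate in Hz
PITCH_PERIOD = 80   # assumed pitch period: 100 Hz at 8 kHz
# Assumed single resonance near 700 Hz with pole radius 0.9.
TRUE_COEFFS = [2 * 0.9 * math.cos(2 * math.pi * 700 / FS), -0.81]

whisper = make_whisper_like(4000, TRUE_COEFFS)
envelope = lpc(whisper, order=2)                  # analysis: estimate the envelope
voiced = all_pole_filter(pulse_train(1600, PITCH_PERIOD), envelope)  # resynthesis
```

In this toy setting the resynthesised signal keeps the resonance estimated from the noise-excited input but gains a fixed pitch period. A practical system would also need frame-by-frame analysis, a voiced/unvoiced decision, and a plausible pitch contour, none of which are attempted here.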


Each of these systems is non-surgical and requires contact with the skin only (so is non-invasive). We have already demonstrated that intelligible and natural-sounding speech can result, provided that we can crack the automatic phoneme mapping problems.


Progress to date


Methods 1 and 2: so far, we have demonstrated in the laboratory the recreation of more natural-sounding speech from the whisper-like utterances of post-laryngectomised patients, determined how to capture and handle this speech, and worked on the vowel and formant mapping of these whispers (i.e. how whisper-like phonemes relate to naturally spoken phonemes, especially voiced phonemes, because voicing is completely absent in these patients). We have also achieved some good results on isolated phonemes, words and even sentences. Method 2(c) is derived from work by Associate Professor Tomoki Toda (http://isw3.naist.jp/~tomoki) et al. at Nara Institute of Science and Technology (NAIST), Japan, and is the highest-quality method that we have. Method 2(b) has the lowest complexity, and achieves a slight improvement in quality over simple whispers. Method 2(a), for phonemes and simple words, can achieve a quality that lies between (b) and (c). However, when it is extended to sentences, it performs poorly.


We usually have some samples of the latest output from these methods on the website:

http://www.lintech.org/Reconstruction



Method 3: so far, we have used this for voice activity detection (VAD) in noise, where it performs extremely well. However, extending it to a full whisper reconstruction system is extremely challenging, and so work continues. There is more detail on the VAD method on the website:

http://www.lintech.org/savad




Publications


Here are some of our publications from this work (or associated with it):

BOOK:

2009/02/01 McLoughlin I, Applied Speech and Audio Processing, Cambridge University Press, ISBN 978-05215-1954-0

BOOK CHAPTERS:

2012/12/01 H.R.Sharifzadeh, McLoughlin I, “From Whispers to Normal Speech: Offering Natural Voice to Laryngectomees”, in Larynx: Surgical Procedures, Complications and Disorders, Nova Science Publishers, New York, pp. 105-122. ISBN 978-162-2570-96-6

2010/04/01 H.R.Sharifzadeh, McLoughlin I, F. Ahmadi, “Speech rehabilitation methods for laryngectomised patients”, Chapter 51, in Lecture Notes in Electrical Engineering, Vol. 60, Springer, April 2010. ISBN 978-90-481-8775-1

2009/03/15 H.R.Sharifzadeh, McLoughlin I, F. Ahmadi, “Regeneration of speech in voice-loss patients”, IFMBE Proceedings, Vol. 23, pp.1065-1068. ISBN 978-3-540-92840-9

2009/05/10 F. Ahmadi, McLoughlin I, “Ultrasonic mapping of human vocal tract for speech synthesis”, chapter accepted, May 2009, and will appear in the book Recent Advances in Signal Processing, ISBN 978-95-376-1941-1

2008/11/12 McLoughlin I, H. R. Sharifzadeh, “Speech Recognition for Smart Homes”, in Speech Recognition, by ITech Book publishers, Vienna, Austria, ISBN 978-95-376-1929-9

JOURNALS:

2014/11/17 Li J.-J., McLoughlin I, Dai L.-R., Ling Z.-H., “Whisper-to-speech conversion using restricted Boltzmann machine arrays”, IET Electronics Letters, vol. 50, issue 24, Oct. 2014, pp. 1781-1782 [impact factor 0.97]

2014/09/?? McLoughlin I, “Super-audible Voice Activity Detection”, IEEE Transactions on Audio, Speech and Language Processing, vol. 22, no. 8, Sept. 2014, pp. 1424-1433 [impact factor 1.848]

2014/09/?? McLoughlin I, Song Y, “Mouth State Detection From Low-Frequency Ultrasonic Reflection”, Journal of Circuits, Systems & Signal Processing, vol. 34, issue 4, March 2015, pp. 1279-1304 [impact factor 1.26]

2014/11/?? McLoughlin I, Sharifzadeh HR, Tan SL, Li J.-J., Song Y, “Reconstruction of phonated speech from whispers using formant-derived plausible pitch modulation”, ACM Transactions on Accessible Computing, April 2015

2011/12/31 Ahmadi F, McLoughlin I, Chauhan S, Ter-Haar G, “Bio-effects and safety of low intensity, low frequency ultrasonic exposure”, Progress in Biophysics & Molecular Biology [impact factor 3.96]

2011/12/31 Sharifzadeh HR, McLoughlin I, Russell M, “A comprehensive vowel space for whispered speech”, J. Voice, DOI: 10.1016/j.jvoice.2010.12.002 [impact factor 1.58]

2010/10/01 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Reconstruction of Normal Sounding Speech for Laryngectomy Patients through a Modified CELP Codec”, IEEE Trans. Biomedical Engineering, Vol. 57, Issue 10, pp. 2448 – 2458 [impact factor 2.15]

2010/02/26 Ahmadi F, McLoughlin I, Sharifzadeh HR, “Ultrasonic propagation through the human vocal tract”, IET Electronics Letters, 18 March 2010 [impact factor 0.97]

2009/11/01 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Voiced speech from whispers for post-laryngectomised patients”, Journal of the Int. Assoc. of Engineers, vol. 26, no.4 Oct. 2009.

2010/01/01 McLoughlin I, “Vowel Intelligibility in Chinese”, IEEE Transactions on Audio, Speech and Language Processing, Vol.18, No.1, Jan 2010, pp.117-125 [impact factor 1.848]

2008/01/01 McLoughlin I, “Subjective intelligibility testing of Chinese speech”, IEEE Transactions on Audio, Speech and Language Processing, Vol. 16, Issue 1, pp.23-33 [impact factor 1.848]

2007/12/01 McLoughlin I, “A Review: Line Spectral Pairs”, Signal Processing Journal, doi:10.1016/j.sigpro.2007.09.003 [impact factor 1.256]

2003/02/08 McLoughlin I, Ding ZQ, Tan EC, “Extension of proposal of standard for intelligibility tests of Chinese Speech - CDRT-Tone”, IEE Proceedings - Vision, Image & Signal Processing, Vol. 150, Issue 1, Feb. 2003 [impact factor 0.762]

CONFERENCES:

2014/09/18 Li Jingjie, McLoughlin I, Liu Cong, Xue Shaofei, Si Wei, “Multi-Task Deep Neural Network Acoustic Models With Model Adaptation Using Discriminative Speaker Identity For Whisper Recognition”, IEEE Conf. On Acoustics, Speech and Signal Processing (ICASSP), Brisbane, April 2015

2014/09/18 Li Jingjie, McLoughlin I, Song Y, “Reconstruction of pitch for whisper-to-speech conversion of Chinese”, ISCSLP2014, Singapore, Sept. 2014

2014/09/18 McLoughlin I, “The Use of Low-Frequency Ultrasound for Voice Activity Detection”, Proc. Interspeech 2014, Singapore, Sept. 2014

2014/06/21 McLoughlin I, Xie ZP, “Speech Playback Geometry for Smart Homes”, IEEE Int. Symp. On Consumer Electronics, Jeju, Korea, 21 June 2014

2013/08/01 McLoughlin I, Li Jingjie, Song Yan, “Reconstruction of continuous voiced speech from whispers”, Interspeech 2013, Lyon, France, August 2013

2013/08/01 Ahmadi F, McLoughlin I, “Human Mouth State Detection Using Low Frequency Ultrasound”, Interspeech 2013, Lyon, France, August 2013

2013/06/07 Ahmadi F, McLoughlin I, “A new mechanical index for gauging the human bio-effects of low frequency ultrasound”, The 35th annual international conference of the IEEE Engineering in Medicine and Biology Society, Osaka, Japan, June 2013

2012/06/12 Sharifzadeh HR, McLoughlin I, “Whisper Vowel Diagrams for Singapore English”, 9th International Conference on Communications COMM 2012, Bucharest, Romania

2012/05/01 Ahmadi F, McLoughlin I, “Measuring resonances of the vocal tract using frequency sweeps at the lips”, Int. Symp. Communications, Control and Signal Processing, Rome, Italy

2012/06/01 Sharifzadeh HR, McLoughlin I, “Bionic voice for laryngectomees with an insight into whispered vowels”, Cutting Edge Laryngology 2012 Conference, Kuala Lumpur, Malaysia

2011/01/15 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Artificial Phonation for Patients Suffering Voice Box Lesions”, International Conference on Bioengineering, Singapore

2011/07/08 Sharifzadeh HR, McLoughlin I, “Reconstruction of normal sounding speech for laryngectomy patients through a modified CELP codec”, EPS International Forum on Rehabilitation Medicine, Nanjing, China

2010/09/29 Ahmadi F, McLoughlin I, Sharifzadeh HR, “Autoregressive modelling for linear prediction of ultrasonic speech”, INTERSPEECH 2010, Japan

2010/12/01 McLoughlin I, Sharifzadeh HR, “Toward a Comprehensive Vowel Space for Whispered Speech”, The 7th International Symposium on Chinese Spoken Language Processing, Tainan, Taiwan

2010/05/08 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Spectral Enhancement of Whispered Speech Based on Probability Mass Function”, accepted for The Sixth Advanced International Conference on Telecommunications, AICT2010, Barcelona, Spain.

2009/07/01 Sharifzadeh HR, F. Ahmadi, McLoughlin I, “Speech reconstruction in post laryngectomised patients by formant manipulation and pitch profile generation”, World Congress on Engineering, London, UK. Received best paper award

2008/12/06 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Regeneration of Speech in Voice-Loss Patients”, The 13th International Conference on Biomedical Engineering, Singapore.

2008/11/03 Ahmadi F, McLoughlin I, Sharifzadeh HR, “Analysis-by-Synthesis Method for Whisper-Speech Reconstruction”, 2008 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2008, Macau

2007/12/13 McLoughlin I, Sharifzadeh H. R., “Speech Recognition Engine Adaptions for Smart Home Dialogues”, 6th Int. Conf. on Information, Communications and Signal Processing, Singapore

2007/06/27 McLoughlin I, “Tone Discrimination in Mandarin Chinese”, 14th International Conference on systems, Signals and Image Processing IWSSIP 2007 and 6th EURASIP Conference Focused on Speech and Image Processing, Multimedia Communications and Services EC-SIPMCS 2007, June 2007, Maribor, Slovenia

2005/11/21 Chong FL, McLoughlin I, Pawlikowski K, “A methodology for improving PESQ accuracy for Chinese speech”, IEEE TENCON2005, Melbourne, Australia, Nov. 2005

PATENTS RELATED TO THE WORK:

2012/02/16 McLoughlin I, Ahmadi F, “Method and apparatus for mouth state determination using acoustic information”, Pat. Pending UK (1202662.1) [lapsed]

2009/08/25 McLoughlin I, Sharifzadeh H R, Ahmadi F, “Apparatus and Method for Whisper Voice Generation”, US Provisional patent application no. 61/236,680 [lapsed].

2006/09/06 McLoughlin I, Busch A, Churton P, Shyh-hao Kuo, Lendnal S, Mehrotra K, McConnell D, Scott T, Pow I, Spalding D (Tait Electronics Ltd), “Improvements relating to radio communications systems”, European Patent No. 03784706.8-2412-NZ0300176, filed 08/08/2003 [superseded]

2002/08/08 McLoughlin I, Busch A, Churton P, Shyh-hao Kuo, Lendnal S, Mehrotra K, McConnell D, Scott T, Pow I, Spalding D (Tait Electronics Ltd), “Improvements relating to radio communications systems”, NZ Patent 520650, granted January 2006 [superseded]

2002/08/01 McLoughlin I, Busch A, Churton P, Shyh-hao Kuo, Lendnal S, Mehrotra K, McConnell D, Scott T, Pow I, Spalding D (Tait Electronics Ltd), “Improvements relating to radio communications systems subsidiary patent”, NZ Patent 537902, granted January 2006 [superseded]

1998/07/01 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", PCT International Patent no. PCT/GB98/01936, 01 Jul 1998.

1998/06/26 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", South Africa patent no. 98-5607, 26 Jun 1998.

1998/06/24 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", Chile patent no. 1998-1471, 24 Jun 1998.

1998/06/22 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", Indian patent no. 1725/Del/98, 22 Jun 1998.

1998/04/23 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", USA patent no. 09/065239, 23 Apr 1998.

1998/04/21 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", Canadian patent no. 2235455, 21 Apr 1998.

1997/07/02 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", United Kingdom patent no. 9814279.7, 02 Jul 1997.





Acknowledgements

The PI gratefully acknowledges the initial support of the Singapore National Medical Research Council Exploratory Development Grant (EDG07MAY002), and a British Council Collaborative Development Award (2007), without which this project would not have been possible. The PI also wishes to express his thanks to the China Universities Central Research fund for grant no. KY2100060002, which funded his work on whisper reconstruction of Chinese. Thanks are also due to our collaborators, past, present and future, in Singapore's public hospitals and New Zealand hospitals, and to the students and post-laryngectomised patients who have volunteered for pre-clinical trials, for their co-operation and selfless support.