For more information about post-laryngectomy life and options, please visit National Association of Laryngectomy Clubs and Cancer Laryngectomy Trust. I recommend that everyone associated with laryngectomies read the excellent book Laryngectomy is not a Tragedy by Sydney Norgate and his granddaughter Dr Nicola Oswald. |
Principal Investigator | Professor Ian McLoughlin |
Co-Investigator | Dr Hamid Reza Sharifzadeh |
Co-Investigator | Dr Olivier Perrotin |
The contributions of the following people to the Bionic Voice project are acknowledged with many thanks, particularly Dr Paul Mok and Dr Nelson Chee (who first coined the name). |
Collaborators for original project |
Dr Forest Tan (Co-PI on NMRC project 2006-2009) |
Jingjie LI (Bionic Voice researcher 2012-2014) |
Farzaneh Ahmadi (Bionic Voice researcher 2007-2011) |
Dr Paul Mok (Collaborator & Advisor 2006-2011) |
Dr Nelson Chee (Collaborator & Advisor 2006-2011) |
|
Aims
The Bionic Voice project aims to provide post-laryngectomised patients with a natural-sounding voice, preferably modelled on their original voice, using non-surgical and non-invasive computer engineering techniques that effectively create an external voice box.
|
Lay Abstract
The idea behind this project is to create technology able to recreate natural-sounding
speech from the hoarse, whisper-like voice of those who have undergone a partial
laryngectomy or who have a damaged or non-functional larynx, and eventually also for
total laryngectomees who breathe through a stoma.
Up to now we have invented, and published, quite a few new computational techniques for items (i) to (iii). Several
tricky areas of in-depth research remain, but we are now at a stage where we can envisage building
a prototype unit, and we are actively seeking funding to do this.
As a related issue, of growing urgency and importance, we are working on ways to help post-laryngectomised patients
(and others with similar larynx-related voicing disabilities) access the wide variety of speech-based services
that are being launched - from smartphone voice access to smart homes and cars. We are also seeking funding for this
research.
|
Background
At present, post-laryngectomised and speech-impaired patients lose the functionality of their larynx in speech production, and therefore require surgical or prosthetic help to regain their speaking ability. Existing solutions for such patients, either pitchless (e.g. tracheo-oesophageal speech) or pitched (e.g. electrolarynx), have specific shortcomings and result in unnatural-sounding speech. The tracheo-oesophageal puncture (TEP) is a surgical valve that is fitted between the oesophagus and the trachea, which resonates just like the glottis should, and provides 'voicing' in speech. This is an excellent system that has enabled many patients to regain speech, but it requires surgical intervention and needs a mechanical valve to be fitted to the patient's neck (allied with the need to keep this clean, prevent infection and replace it when it wears out). Similarly, the electrolarynx has been a wonderful device for many patients, but since it must be pressed manually to the neck, it requires one hand to be occupied during use. Even the few 'hands-free' adaptors that have been invented are not great solutions (requiring, for example, a tube to be inserted into the mouth). Furthermore, the pitch or 'buzzing' sound that the electrolarynx uses to resonate the vocal tract is continuous. In normal speech, pitch is only used for voiced phonemes and is absent at other times (such as fricatives, breaths, glottal stops and gaps between words), whereas the electrolarynx buzz never switches off and can sound quite monotonous.
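The difference between a continuous electrolarynx buzz and pitch that is present only during voiced phonemes can be sketched in a few lines of Python. This is only an illustration and is not part of the Bionic Voice system: the function name `excitation`, the 120 Hz pitch, the 8 kHz sample rate, the 20 ms frame length and the six-frame voicing pattern are all assumptions chosen for the example, and unvoiced frames are simply left silent rather than filled with noise.

```python
def excitation(voiced_mask, pitch_hz=120.0, sample_rate=8000, frame_len=160):
    """Return an impulse-train excitation, one frame per mask entry.

    voiced_mask: sequence of booleans, True where a frame is voiced.
    All default values are illustrative assumptions, not measured settings.
    """
    period = int(round(sample_rate / pitch_hz))  # samples between pulses
    out = []
    n = 0  # running sample index keeps the pulse phase continuous
    for voiced in voiced_mask:
        for _ in range(frame_len):
            # Emit a pulse only on a pitch boundary inside a voiced frame.
            out.append(1.0 if (voiced and n % period == 0) else 0.0)
            n += 1
    return out


# Hypothetical utterance: two voiced frames, a fricative, a gap, two voiced frames.
natural_mask = [True, True, False, False, True, True]
# The electrolarynx case: the buzz is on in every frame.
electrolarynx_mask = [True] * len(natural_mask)

natural = excitation(natural_mask)
continuous = excitation(electrolarynx_mask)
```

Comparing `natural[320:640]` with `continuous[320:640]` shows the point made above: the gated signal carries no pitch pulses during the fricative and the gap, while the continuous one keeps buzzing through them.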
|
Methods
This project has investigated and advanced three main technological methods to achieve these goals:
|
Progress to date
Methods (i) and (ii): so far, we have demonstrated in the laboratory the recreation of more natural-sounding speech from the whisper-like utterances of post-laryngectomised patients, have determined how to capture and handle this speech, and have worked on the vowel and formant mapping of these whispers (i.e. how whisper-like phonemes relate to naturally spoken phonemes, especially voiced phonemes, because voicing is completely absent in these patients). We have also achieved some good results on isolated phonemes, words and even sentences. Method (ii)(c) is derived from work by Associate Professor Tomoki Toda (http://isw3.naist.jp/~tomoki) et al. at Nara Institute, Japan, and is the highest-quality method that we have. Method (ii)(b) has the lowest complexity and achieves a slight improvement in quality over simple whispers. Method (ii)(a), for phonemes and simple words, can achieve quality that lies between (b) and (c); however, when it is extended to sentences, it performs poorly. We usually have some samples of the latest output from these methods on the following webpage: http://www.lintech.org/Reconstruction
Method (iii): so far, we have used this for voice activity detection (VAD) in noise, where it performs extremely well. However, extending it to a full whisper reconstruction system is extremely challenging, and so work continues. There is more detail on the VAD method on the following webpage: |
|
Publications
Here are some of our publications from this work (or associated with it): |
BOOK: |
2016/07 McLoughlin I, Speech and Audio Processing: a Matlab-based approach, Cambridge University Press, ISBN 978-1107085466 |
2009/02 McLoughlin I, Applied Speech and Audio Processing, Cambridge University Press, ISBN 978-05215-1954-0 |
BOOK CHAPTERS: |
2012/12 H.R.Sharifzadeh, McLoughlin I, “From Whispers to Normal Speech: Offering Natural Voice to Laryngectomees”, in Larynx: Surgical Procedures, Complications and Disorders, Nova Science Publishers, New York, pp. 105-122. ISBN 978-162-2570-96-6 |
2010/04 H.R.Sharifzadeh, McLoughlin I, F. Ahmadi, “Speech rehabilitation methods for laryngectomised patients”, Chapter 51, in Lecture Notes in Electrical Engineering, Vol. 60, Springer, April 2010. ISBN 978-90-481-8775-1 |
2009/03 H.R.Sharifzadeh, McLoughlin I, F. Ahmadi, “Regeneration of speech in voice-loss patients”, IFMBE Proceedings, Vol. 23, pp.1065-1068. ISBN 978-3-540-92840-9 |
2009/05 F. Ahmadi, McLoughlin I, “Ultrasonic mapping of human vocal tract for speech synthesis”, chapter accepted, May 2009, and will appear in the book Recent Advances in Signal processing, ISBN 978-95-376-1941-1 |
2008/11 McLoughlin I, H. R. Sharifzadeh, “Speech Recognition for Smart Homes”, in Speech Recognition, by ITech Book publishers, Vienna, Austria, ISBN 978-95-376-1929-9 |
JOURNALS: |
2014/11 Li J.-J., McLoughlin I, Dai L.-R., Ling Z.-H., “Whisper-to-speech conversion using restricted Boltzmann machine arrays”, IET Electronics Letters, vol. 50, issue 24, Oct. 2014, pp. 1781-1782 [impact factor 0.97] |
2014/09 McLoughlin I, “Super-audible Voice Activity Detection”, IEEE Transactions on Audio, Speech and Language Processing, vol. 22, no. 8, Sept. 2014, pp. 1424-1433 [impact factor 1.848] |
2014/09 McLoughlin I, Song Y, “Mouth State Detection From Low-Frequency Ultrasonic Reflection”, Journal of Circuits, Systems & Signal Processing, vol. 34, issue 4, March 2015, pp. 1279-1304 [impact factor 1.26] |
2014/11 McLoughlin I, Sharifzadeh HR, Tan SL, Li J.-J., Song Y, “Reconstruction of phonated speech from whispers using formant-derived plausible pitch modulation”, ACM Transactions on Accessible Computing, April 2015 |
2011/12 Ahmadi F, McLoughlin I, Chauhan S, Ter-Haar G, “Bio-effects and safety of low intensity, low frequency ultrasonic exposure”, Progress in Biophysics & Molecular Biology [impact factor 3.96] |
2011/12 Sharifzadeh HR, McLoughlin I, Russell M, “A comprehensive vowel space for whispered speech”, J. Voice, DOI: 10.1016/j.jvoice.2010.12.002 [impact factor 1.58] |
2010/10 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Reconstruction of Normal Sounding Speech for Laryngectomy Patients through a Modified CELP Codec”, IEEE Trans. Biomedical Engineering, Vol. 57, Issue 10, pp. 2448 – 2458 [impact factor 2.15] |
2010/02/26 Ahmadi F, McLoughlin I, Sharifzadeh HR, “Ultrasonic propagation through the human vocal tract”, IET Electronics Letters, 18 March 2010 [impact factor 0.97] |
2009/11 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Voiced speech from whispers for post-laryngectomised patients”, Journal of the Int. Assoc. of Engineers, vol. 26, no.4 Oct. 2009. |
2010/01 McLoughlin I, “Vowel Intelligibility in Chinese”, IEEE Transactions on Audio, Speech and Language Processing, Vol.18, No.1, Jan 2010, pp.117-125 [impact factor 1.848] |
2008/01 McLoughlin I, “Subjective intelligibility testing of Chinese speech”, IEEE Transactions on Audio, Speech and Language Processing, Vol. 16, Issue 1, pp.23-33 [impact factor 1.848] |
2007/12 McLoughlin I, “A Review: Line Spectral Pairs”, Signal Processing Journal, doi:10.1016/j.sigpro.2007.09.003 [impact factor 1.256] |
2003/02 McLoughlin I, Ding ZQ, Tan EC, “Extension of proposal of standard for intelligibility tests of Chinese Speech - CDRT-Tone”, IEE Proceedings - Vision, Image & Signal Processing, Vol. 150, Issue 1, Feb. 2003 [impact factor 0.762] |
CONFERENCES: |
2016/10 Zhipeng Xie, Jun Du, McLoughlin I, Yong Xu, Feng Ma, Haikun Wang, “Deep Neural Network for Robust Speech Recognition With Auxiliary Features From Laser-Doppler Vibrometer Sensor”, The 10th International Symposium on Chinese Spoken Language Processing, Tianjin, China, 17 October 2016. |
2016/05 Allen YJ, Sharifzadeh HR, McLoughlin I, Sarrafzadeh A, Ardekani I, “Acoustic analysis and computerised reconstruction of speech in laryngectomised individuals”, 137th Annual Meeting of American Laryngological Association (ALA), Combined Otolaryngology Spring Meetings (COSM 2016). |
2015/12 Hamid Sharifzadeh, Amir Rassouliha, McLoughlin I, Iman Ardekani and Jacqueline Allen, “Phonated Speech Reconstruction Using Twin Mapping Models”, 15th IEEE International Symposium on Signal Processing and Information Technology, Abu Dhabi, UAE, December 2015 |
2015/12 Hamid Sharifzadeh, Iman Ardekani and McLoughlin I, “Comparative Whisper Vowel Space for Singapore English and British English Accents”, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2015), Hong Kong, December 2015 |
2015/09 Li Jingjie, McLoughlin I, Lui Cong, Xue Shaofei, Si Wei, “Multi-Task Deep Neural Network Acoustic Models With Model Adaptation Using Discriminative Speaker Identity For Whisper Recognition”, IEEE Conf. On Acoustics, Speech and Signal Processing (ICASSP), Brisbane, April 2015 |
2014/09 Li Jingjie, McLoughlin I, Song Y, “Reconstruction of pitch for whisper-to-speech conversion of Chinese”, ISCSLP2014, Singapore, Sept. 2014 |
2014/09 McLoughlin I, “The Use of Low-Frequency Ultrasound for Voice Activity Detection”, Proc. Interspeech 2014, Singapore, Sept. 2014 |
2014/06 McLoughlin I, Xie ZP, “Speech Playback Geometry for Smart Homes”, IEEE Int. Symp. On Consumer Electronics, Jeju, Korea, 21 June 2014 |
2013/08 McLoughlin I, Li Jingjie, Song Yan, “Reconstruction of continuous voiced speech from whispers”, Interspeech 2013, Lyon, France, August 2013 |
2013/08 Ahmadi F, McLoughlin I, “Human Mouth State Detection Using Low Frequency Ultrasound”, Interspeech 2013, Lyon, France, August 2013 |
2013/06 Ahmadi F, McLoughlin I, “A new mechanical index for gauging the human bio-effects of low frequency ultrasound”, The 35th annual international conference of the IEEE Engineering in Medicine and Biology Society, Osaka, Japan, June 2013 |
2012/06 Sharifzadeh HR, McLoughlin I, “Whisper Vowel Diagrams for Singapore English”, 9th International Conference on Communications COMM 2012, Bucharest, Romania |
2012/05 Ahmadi F, McLoughlin I, “Measuring resonances of the vocal tract using frequency sweeps at the lips”, Int. Symp. Communications, Control and Signal Processing, Rome, Italy |
2012/06 Sharifzadeh HR, McLoughlin I, “Bionic voice for laryngectomees with an insight into whispered vowels”, Cutting Edge Laryngology 2012 Conference, Kuala Lumpur, Malaysia |
2011/01 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Artificial Phonation for Patients Suffering Voice Box Lesions”, International Conference on Bioengineering, Singapore |
2011/07 Sharifzadeh HR, McLoughlin I, “Reconstruction of normal sounding speech for laryngectomy patients through a modified CELP codec”, EPS International Forum on Rehabilitation Medicine, Nanjing, China |
2010/09 Ahmadi F, McLoughlin I, Sharifzadeh HR, “Autoregressive modelling for linear prediction of ultrasonic speech”, INTERSPEECH 2010, Japan |
2010/12 McLoughlin I, Sharifzadeh HR, “Toward a Comprehensive Vowel Space for Whispered Speech”, The 7th International Symposium on Chinese Spoken Language Processing, Tainan, Taiwan |
2010/05 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Spectral Enhancement of Whispered Speech Based on Probability Mass Function”, accepted for The Sixth Advanced International Conference on Telecommunications, AICT2010, Barcelona, Spain. |
2009/07 Sharifzadeh HR, F. Ahmadi, McLoughlin I, “Speech reconstruction in post laryngectomised patients by formant manipulation and pitch profile generation”, World Congress on Engineering, London, UK. Received best paper award |
2008/12 Sharifzadeh HR, McLoughlin I, Ahmadi F, “Regeneration of Speech in Voice-Loss Patients”, The 13th International Conference on Biomedical Engineering, Singapore. |
2008/11 Ahmadi F, McLoughlin I, Sharifzadeh HR, “Analysis-by-Synthesis Method for Whisper-Speech Reconstruction”, 2008 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2008, Macau |
2007/12 McLoughlin I, Sharifzadeh H. R., “Speech Recognition Engine Adaptions for Smart Home Dialogues”, 6th Int. Conf. on Information, Communications and Signal Processing, Singapore |
2007/06 McLoughlin I, “Tone Discrimination in Mandarin Chinese”, 14th International Conference on systems, Signals and Image Processing IWSSIP 2007 and 6th EURASIP Conference Focused on Speech and Image Processing, Multimedia Communications and Services EC-SIPMCS 2007, June 2007, Maribor, Slovenia |
2005/11 Chong FL, McLoughlin I, Pawlikowski K, “A methodology for improving PESQ accuracy for Chinese speech”, IEEE TENCON2005, Melbourne, Australia, Nov. 2005 |
PATENTS RELATED TO THE WORK: |
2012/02 McLoughlin I, Ahmadi F, “Method and apparatus for mouth state determination using acoustic information”, Pat. Pending UK (1202662.1) [lapsed] |
2009/08 McLoughlin I, Sharifzadeh H R, Ahmadi F, “Apparatus and Method for Whisper Voice Generation”, US Provisional patent application no. 61/236,680 [lapsed]. |
2006/09 McLoughlin I, Busch A, Churton P, Shyh-hao Kuo, Lendnal S, Mehrotra K, McConnell D, Scott T, Pow I, Spalding D (Tait Electronics Ltd), “Improvements relating to radio communications systems”, European Patent No. 03784706.8-2412-NZ0300176, filed 08/08/2003 [superseded] |
2002/08 McLoughlin I, Busch A, Churton P, Shyh-hao Kuo, Lendnal S, Mehrotra K, McConnell D, Scott T, Pow I, Spalding D (Tait Electronics Ltd), “Improvements relating to radio communications systems”, NZ Patent 520650, granted January 2006 [superseded] |
2002/08 McLoughlin I, Busch A, Churton P, Shyh-hao Kuo, Lendnal S, Mehrotra K, McConnell D, Scott T, Pow I, Spalding D (Tait Electronics Ltd), “Improvements relating to radio communications systems subsidiary patent”, NZ Patent 537902, granted January 2006 [superseded] |
1998/07 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", PCT International Patent no. PCT\GB98\01936, 01 Jul 1998. |
1998/06 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", South Africa patent no. 98-5607, 26 Jun 1998. |
1998/06 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", Chile patent no. 1998-1471, 24 Jun 1998. |
1998/06 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", Indian patent no. 1725/Del/98, 22 Jun 1998. |
1998/04 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", USA patent no. 09/065239, 23 Apr 1998. |
1998/04 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", Canadian patent no. 2235455, 21 Apr 1998. |
1997/07 McLoughlin I, Chance R.J (Simoco Telecommunications Ltd), "Method and Apparatus for Speech Enhancement in a Speech Communications System", United Kingdom patent no. 9814279.7, 02 Jul 1997. |
Acknowledgements
The PI gratefully acknowledges the initial financial support of the Singapore
National Medical Research Council Exploratory Development
Grant (EDG07MAY002), and a British
Council Collaborative Development Award (2007), without which
this project would not have been possible.
The PI also wishes to express his thanks to the China Universities Central Research fund for grant no. KY2100060002 which funded his work on whisper reconstruction of Chinese.
Thanks are also due to our collaborators, past, present and future, in Singapore's public
hospitals, New Zealand hospitals, UK hospitals (particularly East Kent Hospitals University NHS Foundation Trust) and Dr Peter Nicholls of Kent Health. In particular, thanks are due for the co-operation and selfless support of the
students, post-laryngectomised patients and caregivers who have volunteered for
pre-clinical trials, recordings and consultations.
|