Applied Speech and Audio Processing With MATLAB© Examples Dr Ian McLoughlin
Visit the forum and discussion page for this book:
1 |
This is a MATLAB-based, one-stop resource that blends speech and hearing research in describing the key techniques of speech and audio processing. This practically oriented text provides MATLAB examples throughout to illustrate the concepts discussed and to give the reader hands-on experience of important techniques. It dispenses with dry and boring theory (but links to it so it can be found when necessary), and instead gets rapidly down to manipulating and processing speech and audio. With its hands-on nature and numerous MATLAB examples, this book is ideal for graduate students and practitioners working with speech or audio systems.
This is a valuable resource for those working with speech and audio systems, whether they are senior undergraduates, or postgraduates just embarking on a career or a research project in a speech or audio related field.
|
Useful links |
MATLAB tutorials from The MathWorks:
Introduction to computer programming with MATLAB from UCL's Speech Hearing and Phonics Research Department:
For help with signal processing topics in general, the online guide for 'Digital Signal Processing: A Practical Guide for Engineers and Scientists', by S. W. Smith, can be very useful: |
2 |
Basic Audio Processing
This chapter discusses the nature of sound and audio, and how it is captured by computer for processing. Learn how to load and record sounds in MATLAB, and replay them at different sample rates. We begin to cover the fundamentals of processing audio by computer: performing Fourier transforms to view frequency domain information, and segmenting, overlapping and windowing sound arrays. Simple and practical digital filtering is introduced, followed by a discussion of visualization: how to plot time and frequency domain signals (including by spectrogram, correlogram, cepstrum and so on), with an audio chirp created as an example. Finally, we conclude with a look at creating tones, frequencies and musical notes, mixing these together and replaying them. The main MATLAB functions presented in this chapter are:
freqgen.m
tonegen.m
xcorr.m
Here are some code segments from the subsections shown (to save you typing them all in):
2.4.3_continuous_filter.m
2.4.3_non_continuous_filter.m
2.6.2.2_Cepstrum.m
2.7.4_make_a_chord.m
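As a rough illustration of the tone creation, windowing and Fourier analysis summarised for this chapter, the following minimal sketch synthesises a tone and locates its spectral peak. It is not one of the book's listings: the sample rate, tone frequency and variable names are assumptions chosen for illustration, and the Hamming window is written out by hand so that no toolbox is needed.

```matlab
% Minimal sketch: synthesise a tone, window it, and find its spectral peak.
% All values below are illustrative assumptions, not taken from the book.
fs = 8000;                          % sample rate in Hz
f0 = 440;                           % tone frequency in Hz (the note A4)
N  = fs;                            % one second of samples
n  = 0:N-1;                         % sample indices
x  = 0.5 * sin(2*pi*f0*n/fs);       % the tone itself

% Hamming window written out explicitly, so no toolbox is required
w  = 0.54 - 0.46*cos(2*pi*n/(N-1));
X  = abs(fft(x .* w));              % magnitude spectrum of the windowed tone

% Search only the first half of the spectrum (0 Hz up to fs/2)
[peak_mag, k] = max(X(1:N/2));
f_peak = (k-1) * fs / N;            % convert the 1-based bin index to Hz

fprintf('Spectral peak at %g Hz\n', f_peak);
% soundsc(x, fs);                   % uncomment to listen on a machine with audio
```

With a one-second array the FFT bins are spaced 1 Hz apart, so the reported peak should fall on the tone frequency itself. Shorter arrays give coarser frequency resolution.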
|
Useful links |
The Wikipedia Cepstrum page has a nice explanation of this arcane analysis method:
The MathWorks help page for Cepstrum processing:
The Complete Idiot's Guide to Music Theory, 2nd Edition (on Google Books): |
3 |
Speech
In this chapter we look at a cutaway picture of the human vocal apparatus (usually called the 'human head') and discuss how speech is produced, and its basic characteristics. We will examine the frequency and amplitude characteristics of speech, and the different parts of speech. We cover articulation, basic phonetics, speech intelligibility and speech quality (plus how to measure these!). Some of the MATLAB examples from this chapter are given below for reference:
3.3.2_SD.m
3.3.2_mse.m
3.3.2_segsnr.m |
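To give a flavour of how objective speech quality measures of this kind can be computed, here is a minimal sketch of mean-squared error and segmental SNR between a reference and a degraded signal. It is not the book's mse or segsnr code: the signals are synthetic stand-ins, and the 20 ms frame length and all variable names are assumptions.

```matlab
% Minimal sketch of MSE and segmental SNR between a reference and a degraded signal.
% Signals and frame length are illustrative assumptions, not taken from the book.
fs  = 8000;                               % sample rate in Hz
n   = 0:fs-1;                             % one second of sample indices
ref = sin(2*pi*300*n/fs);                 % hypothetical "clean" signal
deg = ref + 0.1*cos(2*pi*1000*n/fs);      % hypothetical degraded copy

L  = 160;                                 % frame length: 20 ms at 8 kHz
nf = floor(numel(ref)/L);                 % number of whole frames
R  = reshape(ref(1:L*nf), L, nf);         % reference, one frame per column
E  = reshape(ref(1:L*nf) - deg(1:L*nf), L, nf);   % error signal, framed

mse_val   = mean((ref - deg).^2);                     % mean-squared error
frame_snr = 10*log10(sum(R.^2, 1) ./ sum(E.^2, 1));   % SNR of each frame in dB
segsnr    = mean(frame_snr);                          % segmental SNR in dB

fprintf('MSE = %g, SegSNR = %.2f dB\n', mse_val, segsnr);
```

Averaging the SNR frame by frame, rather than over the whole recording, stops loud passages from dominating the result. Practical implementations often also limit each frame's SNR to a fixed range so that silent frames do not skew the mean; that step is left out here.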
Useful links |
The Vocal Tract and Larynx page by Professor John Coleman, University of Oxford, has a good explanation, including a real human head cut-away diagram!
This page has more detail on the larynx and different types of voicing:
The Speech Disorders website has a nice explanation of speech terminology and background, particularly from a speech therapist's point of view:
Speech quality and evaluation page:
Speech intelligibility is explored in this reasonably comprehensive page: |
4 |
Hearing
Here, we study the ear, and some of the ear-brain interactions that lead to the term 'psychoacoustics'. Starting with a cut-away diagram of the ear, we examine the characteristics of hearing: equal loudness contours, cochlea echoes, phase locking, temporal integration, auditory fatigue, adaptation, different types of masking, tonal discrimination and pitch sensation. We then move the discussion to the reception of speech (since speech is probably handled by the brain in quite a different way to general musical sounds), and develop some analytical models of the human auditory system's response to sounds. This chapter considers psychoacoustics quite extensively, especially in terms of auditory scene analysis. Some neat examples are given toward the end of the chapter to illustrate the effect of these elements. The following MATLAB functions are used both in this chapter and elsewhere in the book:
Many short examples in this chapter analyse aspects of hearing. To save you typing, some of the longer examples are reproduced below:
4.2.1_equal_loudness.m
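As a loosely related illustration of how the ear's frequency-dependent sensitivity is handled in engineering practice, the sketch below evaluates the standard A-weighting curve, which was historically derived from a low-level equal loudness contour. This is an assumption-laden stand-in and not the book's equal loudness code: it computes a weighting gain in dB rather than the contours themselves, and the list of frequencies is simply a set of common octave-band centres.

```matlab
% Minimal sketch: A-weighting gain in dB, a standard engineering approximation
% related to (but not the same as) the equal loudness contours.
f  = [31.5 63 125 250 500 1000 2000 4000 8000 16000];   % octave-band centres, Hz
f2 = f.^2;
Ra = (12194^2 * f2.^2) ./ ...
     ((f2 + 20.6^2) .* sqrt((f2 + 107.7^2) .* (f2 + 737.9^2)) .* (f2 + 12194^2));
A  = 20*log10(Ra) + 2.00;          % offset normalises the gain to 0 dB at 1 kHz

for k = 1:numel(f)
    fprintf('%8.1f Hz : %+6.1f dB\n', f(k), A(k));
end
```

The strong attenuation at low frequencies mirrors the way quiet low-frequency sounds are perceived as much softer than mid-frequency sounds of the same physical level.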
|
Useful links |
Nobody has done more to define the hearing research that led to the field of psychoacoustics than Professor Brian C. J. Moore, at the University of Cambridge (Audio Perception Group, Department of Experimental Psychology). Although the group works from a psychology perspective rather than an engineering perspective, its homepage is nevertheless interesting, and contains a number of demos and experimental material:
Back to the Technical University of Kosice (Slovakia), Dept. of Electronics and Multimedia Communications for some excellent information on hearing: |
5 |
Speech Communications
We've already looked at speech and its characteristics in Chapter 3. What we do here is start to develop methods of capturing speech and conveying it over digital channels (especially ways of reducing the bandwidth of speech through compression). In fact this chapter goes into quite some detail on lossy compression of audio: waveform coders and parametric coders, including PCM, delta-mod, ADPCM, SB-ADPCM, GSM (RPE), CELP and so on. These are all explained clearly, with many diagrams to assist in understanding their operation. One particular emphasis in this chapter is on the central part played by Line Spectral Pairs (LSPs), also known as Line Spectral Frequencies (LSFs). Some nice MATLAB code for their extraction, analysis, manipulation and plotting is given. To get you started, an example set of LPC coefficients and LSP parameters (as used to plot many of the figures in the book):
Some useful functions:
lpc_lsp.m
Also, some of the larger examples are reproduced here:
5.2.1.1_magnitude_peak.m
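For readers who want to see the relationship between LPC coefficients and line spectral frequencies in a few lines, the following minimal sketch performs the conversion with base MATLAB functions only. It is not the book's lpc_lsp.m: the second-order predictor is a made-up example, the variable names are assumptions, and the handling of the trivial roots shown here applies to even predictor orders.

```matlab
% Minimal sketch: convert LPC coefficients to line spectral frequencies.
% The second-order predictor below is a made-up example, not data from the book.
r = 0.9;  theta = pi/4;                   % hypothetical pole radius and angle
a = [1, -2*r*cos(theta), r^2];            % A(z) = 1 + a1*z^-1 + a2*z^-2

% Symmetric and antisymmetric polynomials formed from A(z)
P = [a, 0] + [0, fliplr(a)];
Q = [a, 0] - [0, fliplr(a)];

% Their roots lie on the unit circle; keep one angle per conjugate pair and
% discard the trivial roots at z = +1 and z = -1
wP  = angle(roots(P));
wQ  = angle(roots(Q));
tol = 1e-6;
lsf = sort([wP(wP > tol & wP < pi - tol); wQ(wQ > tol & wQ < pi - tol)]);

disp(lsf.');                              % LSFs in radians, ascending order
```

For a stable predictor the two sets of angles interleave, and a pair of closely spaced LSFs indicates a spectral peak between them. Here the single resonance at pi/4 radians lies between the two values returned. Note that some texts swap the names P and Q.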
|
Useful links |
Carnegie Mellon University in the USA, as well as hosting an active and accomplished speech research community, also maintains a repository of speech coding algorithms:
A useful presentation of speech compression systems from a data compression perspective is a little dated, but nevertheless contains much useful information:
GSM is an LPC-based speech coder that has captured the world of telecommunications, as the European GSM standard has spread to dominate worldwide mobile telecommunications. This page explains, in detail, the GSM encoding format(s):
Some evaluation and classification of standard speech compression algorithms, particularly from a VoIP perspective: |
6 |
Audio Analysis
This chapter is pretty interesting because it ties together many of the strands explored earlier. Although the audio signal can be music, speech or some other sound, there is a set of methods used to analyse and handle audio irrespective of what it conveys. Many of these are explored in this chapter, where an audio 'toolbox' is introduced. Of course, these tools run on MATLAB! We round off the chapter with some non-speech sounds: an analysis of violin music, and the song of a blackbird. Some of our toolkit:
amdf.m
Examples of analysis:
6.1.4_spectral_measures.m
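One of the toolkit entries above is an average magnitude difference function (AMDF). As a minimal sketch of the general idea, and not a copy of the book's amdf.m, the code below estimates the pitch of a synthetic frame. The test signal, the frame length and the lag search range (roughly 114 Hz to 400 Hz) are all assumptions chosen for illustration.

```matlab
% Minimal sketch: pitch estimation with the average magnitude difference function.
% Test signal and search range are illustrative assumptions, not from the book.
fs = 8000;                                   % sample rate in Hz
f0 = 200;                                    % true pitch of the test signal, Hz
n  = 0:399;                                  % a 50 ms analysis frame
x  = sin(2*pi*f0*n/fs) + 0.5*sin(2*pi*2*f0*n/fs);   % synthetic "voiced" frame

lags = 20:70;                                % candidate periods: 400 Hz down to about 114 Hz
d    = zeros(size(lags));
for i = 1:numel(lags)
    k    = lags(i);
    d(i) = mean(abs(x(1+k:end) - x(1:end-k)));   % AMDF at lag k
end

[dmin, idx] = min(d);                        % deepest valley marks the pitch period
period = lags(idx);                          % in samples
f0_est = fs / period;                        % in Hz

fprintf('Estimated pitch: %.1f Hz\n', f0_est);
```

The search range is deliberately kept below two pitch periods. A periodic signal produces equally deep valleys at every multiple of its period, so a wider range needs extra logic to avoid picking a multiple of the true period.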
|
Useful links |
The Speech Signal Processing Toolkit (SPTK):
Introduction to Sound Processing by Davide Rocchesso, a nice online book about sound processing. Look in particular at Chapter 4: |
7 |
Advanced Topics
This final chapter collects many more interesting topics, from psychoacoustic modelling, through perceptual weighting, speech recognition, speaker recognition, language classification and so on. We also briefly consider three different types of text-to-speech system, then move on to stereo encoding and placement (including synthesis of stereo examples) and speech formant manipulation. Finally, two types of pitch adjustment are given (PSOLA and a novel LSP-based system).
lspnarrow.m
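The stereo placement mentioned in this chapter summary can be illustrated with a very simple technique, constant-power amplitude panning. This minimal sketch is only one of several possible approaches and is not necessarily the one taken in the book. The tone, the pan position and all variable names are assumptions.

```matlab
% Minimal sketch: place a mono tone in the stereo field with constant-power panning.
% All values are illustrative assumptions, not taken from the book.
fs   = 8000;                          % sample rate in Hz
n    = 0:fs-1;
mono = 0.5 * sin(2*pi*440*n/fs).';    % one second of tone as a column vector

pan   = 0.25;                         % hypothetical position: 0 = hard left, 1 = hard right
theta = pan * pi/2;
left  = cos(theta) * mono;
right = sin(theta) * mono;
stereo = [left, right];               % N-by-2 matrix: column 1 left, column 2 right

% cos^2 + sin^2 = 1, so total power is the same for every pan position
p_mono   = mean(mono.^2);
p_stereo = mean(left.^2) + mean(right.^2);
fprintf('Mono power %.4f, stereo power %.4f\n', p_mono, p_stereo);
% soundsc(stereo, fs);                % uncomment to listen over headphones
```

Using sine and cosine gains rather than a linear crossfade keeps the perceived loudness roughly steady as the source moves between the two channels.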
|
Useful links |
|
8 |
The End
For relative newcomers to the field, I would recommend spending some time getting familiar with such tools as MATLAB, SoX and front-ends such as Audacity. These are the day-to-day tools of audio and speech analysis. Try some of the experiments in the book - experiment with the experiments! The way to learn is by doing, and in this case, it means listening, analysing and processing.
For the experts too - have some fun! Rediscover the joy of making your voice squeaky or hearing yourself speaking like a Dalek. In my philosophy, you shouldn't get away from experimentation and learning. Take some time out and enjoy new things.
As the author, I get precious little time to 'play' with MATLAB and audio. Writing this book has been a great opportunity to revisit some of the techniques I had left behind years ago, and to learn a few new ones. Every now and then I boot up Octave or MATLAB and test out some new ideas that have been fermenting at the back of my mind. I do keep up my interest in speech, audio and hearing, and I have been running several audio projects over the past few years, but I've also broadened my interests into some diverse areas: building a satellite (launched in April 2011), building a fleet of electric bikes with Android controllers, working on UAVs, polarimetric synthetic aperture radar, earth observation technology, wireless communications, networking, embedded systems, automotive technology and so on.
For this book - well, nothing is in the pipeline yet (although I did publish another, unrelated book two years later on Computer Architecture) - but eventually I want to extend it in a number of directions, including speech and hearing therapy, the medical prosthesis side of speech and audio, and near-audible ultrasound and infrasound. Most importantly, I want to significantly expand the sections on speech recognition (plus natural language processing) and speech synthesis, possibly with a co-author.
Other ideas for improvement or expansion are most welcome!
|