Machine Hearing Research
From the research group of Professor I. V. McLoughlin
(with help from students Zhang Haomin and Xie Zhipeng)
This page contains full MATLAB code, along with details of the sound and background noise databases necessary to reproduce the isolated sound event detection classifiers using DNN and CNN. Please cite the following paper if you use the code or find this page useful. Note: code for the continuous recognition experiments will be placed here later.
1. Setup - obtain the required data and software
1.1
The tested sounds and noises used in all of the published evaluations were chosen to exactly follow the experimental conditions of Jonathan Dennis (see his PhD thesis).
1.2
Dennis chose 50 classes of sounds from RWCP, and used 80 files from each class: 50 for training and the remaining 30 for testing. Most of the experiments use mismatched conditions (i.e. training only with clean sounds, but testing with noise-corrupted sounds), but some used multi-condition training (i.e. training with both clean and noisy sounds). First we create the clean sounds database:
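The original listing is not reproduced here; the following is a minimal sketch of the clean-database step. The directory names, the class list (truncated here) and the raw-file format (16 kHz, 16-bit big-endian mono, which is how RWCP files are commonly distributed) are assumptions to adjust for your copy of the database.

```matlab
% Sketch only: build the clean-sound wav database from raw RWCP files.
classes = {'aircap', 'bank', 'bells5'};     % ...extend to all 50 classes
fs = 16000;                                 % assumed RWCP sample rate
for c = 1:numel(classes)
    srcdir = fullfile('rwcp', classes{c});              % assumed source layout
    dstdir = fullfile('sounds', 'clean', classes{c});   % assumed destination
    mkdir(dstdir);
    files = dir(fullfile(srcdir, '*.raw'));
    for f = 1:min(80, numel(files))         % keep 80 files per class
        fid = fopen(fullfile(srcdir, files(f).name), 'r', 'b');
        x = fread(fid, inf, 'int16');       % read big-endian 16-bit samples
        fclose(fid);
        x = x / 32768;                      % scale to the [-1, 1) wav range
        audiowrite(fullfile(dstdir, sprintf('%s_%03d.wav', classes{c}, f)), x, fs);
    end
end
```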
We then create noise-corrupted versions, in three further subdirectories (one for each SNR level). To create these, a random choice is made of the type of corrupting noise (from four choices), and a random noise starting point is selected (i.e. so the mix does not always start from the beginning of the noise file). Noise is added at 0, 10 and 20 dB SNR.
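The mixing step can be sketched as follows for a single file (a full run loops over every clean file and each SNR level). The noise file names and output paths are placeholders, not the actual files used in the published experiments.

```matlab
% Sketch only: corrupt one clean file with randomly chosen noise at a target SNR.
[x, fs]  = audioread(fullfile('sounds', 'clean', 'bells5', 'bells5_001.wav'));
noises   = {'noise1.wav', 'noise2.wav', 'noise3.wav', 'noise4.wav'};  % placeholders
[n, fsn] = audioread(noises{randi(numel(noises))});   % random noise type
n = n(:, 1);                                          % force mono
start = randi(numel(n) - numel(x));                   % random noise start point
n = n(start : start + numel(x) - 1);
snr = 10;                                             % dB; repeat for 0 and 20
% scale the noise so that 10*log10(Px/Pn) equals the target SNR
Px = mean(x.^2);
Pn = mean(n.^2);
n  = n * sqrt(Px / (Pn * 10^(snr/10)));
audiowrite(fullfile('sounds', 'snr10', 'bells5', 'bells5_001.wav'), x + n, fs);
```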
These sound directories are then used for subsequent training and testing. In practice, the code below uses files 1-5 of each consecutive group of eight for training and the remaining three for testing (i.e. 50 training and 30 test files per class). We can easily recreate the sound database and change the train/test mix to randomise the conditions. The 50 sound classes that we use for the published experiments are as follows:
aircap, bank, bells5, book1, bottle1, bowl, buzzer, candybwl, cap1, case1, cherry1, clap1, clock1, coffcan, coffmill, coin1, crumple, cup1, cymbals, dice1, doorlock, drum, dryer, file, horn, kara, magno1, maracas, mechbell, metal05, pan, particl1, phone1, pipong, pump, punch, ring, sandpp1, saw1, shaver, snap, spray, stapler, sticks, string, teak1, tear, trashbox, whistle1, wood1
So, for example, the directory structure would contain the clean class directories (e.g. a path such as sounds/clean/bells5/), and later one subdirectory tree per SNR level (e.g. sounds/snr0/, sounds/snr10/ and sounds/snr20/), each mirroring the same class layout. These directory names are illustrative; use whatever layout your scripts expect.
All of these subdirectories containing sounds mixed with noise are prepared in advance of any training/testing and are unchanged for each instance of tests, except when cross-verification over different noise types is performed, in which case several versions of the entire setup are created with different noise mixes.
1.3
All testing is performed using MATLAB (or Octave). If you use MATLAB, ensure that you have a copy of the Signal Processing Toolbox.
1.4
Although we have our own DNN implementation, the implementation does not affect the performance score (but it can affect the computation speed quite substantially). The easiest way to get set up for the experiments is to use the very convenient DeepLearnToolbox from Rasmus Berg Palm. If you use this, please ensure you also cite his paper. You can download the toolbox – which works in either MATLAB or Octave – from https://github.com/rasmusbergpalm/DeepLearnToolbox Download the software, unpack the toolbox and then add it to your MATLAB path as shown in the documentation. Note: you will need a reasonably fast computer, and at least 4GB of memory to do this. Training and testing for one condition (i.e. one noise mix) takes a couple of hours.
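Adding the toolbox to the path can be done in one line (the directory name below assumes you unpacked it into the current folder):

```matlab
% Add DeepLearnToolbox and all of its subfolders to the MATLAB/Octave path:
addpath(genpath('DeepLearnToolbox'));
```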
1.5
PASS 1: training a DNN
First we set up the system:
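A minimal sketch of the kind of setup used; all of the constants and directory names here are assumptions, and should be matched to the feature extraction described in the paper.

```matlab
% Sketch only: experiment constants for the training pass.
rand('state', 0);             % repeatable runs (the toolbox uses rand internally)
sounddir = 'sounds/clean';    % train on clean sounds (mismatched condition)
nclasses = 50;                % number of RWCP sound classes
fs       = 16000;             % sample rate of the database
nfft     = 512;               % spectrogram FFT length (assumption)
noverlap = 384;               % spectrogram frame overlap (assumption)
```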
The next step is to run through and bring all sounds into memory:
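The loader can be sketched as below. The feature here (a mean spectrum per file) is only a fixed-length placeholder; the actual feature vectors are constructed from the spectrogram representations as described in the paper.

```matlab
% Sketch only: load every training file and stack feature vectors.
train_data   = [];
train_labels = [];
classes = dir(sounddir);
classes = classes([classes.isdir]);
classes = classes(~ismember({classes.name}, {'.', '..'}));
for c = 1:numel(classes)
    files = dir(fullfile(sounddir, classes(c).name, '*.wav'));
    for f = 1:numel(files)
        if mod(f-1, 8) < 5       % files 1-5 of each group of 8: training
            [x, fs] = audioread(fullfile(sounddir, classes(c).name, files(f).name));
            S  = abs(spectrogram(x, hamming(nfft), noverlap, nfft, fs));
            fv = mean(S, 2)';    % placeholder fixed-length feature; see the paper
            train_data   = [train_data;   fv];
            train_labels = [train_labels; c];
        end
    end
end
```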
The array train_data now contains all of the training data feature vectors. Please refer to our paper to see how the feature vectors are constructed from the spectrogram representations. Before we continue, we must condition the data to ensure it is scaled appropriately.
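One way to do this conditioning is a per-dimension scaling into [0, 1], which is what the toolbox's sigmoid RBMs expect; the minima and maxima are kept so the same scaling can be reapplied to the test data. The one-of-N label matrix is also built here.

```matlab
% Scale each feature dimension into [0, 1] (keep data_min/data_max for testing):
data_min = min(train_data, [], 1);
data_max = max(train_data, [], 1);
train_data = (train_data - repmat(data_min, size(train_data,1), 1)) ./ ...
             repmat(data_max - data_min + eps, size(train_data,1), 1);
% One-of-N target matrix for the toolbox:
train_y = zeros(size(train_data,1), nclasses);
train_y(sub2ind(size(train_y), (1:size(train_data,1))', train_labels)) = 1;
```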
The next step is to set up the neural network parameters, using the settings recommended by Palm for the DeepLearn toolbox:
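These options follow Palm's documented example settings; the epoch count here is an assumption (his demo uses a single epoch), and the batch size must divide the number of training vectors exactly.

```matlab
% Training options, based on Palm's example settings:
opts.numepochs = 50;     % assumption: more epochs than the 1-epoch demo
opts.batchsize = 100;    % must divide the number of training vectors
opts.momentum  = 0;
opts.alpha     = 1;      % learning rate
```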
Now we start to create and stack RBM layers – as many as we want, to create a deep structure (but this example is not particularly deep, and was adopted from Palm's documentation):
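Following Palm's documentation, the stack is specified through `dbn.sizes` and trained greedily, one RBM at a time; a deeper structure just means more entries in `dbn.sizes`.

```matlab
% Two stacked RBM hidden layers of 100 units each, as in Palm's example:
dbn.sizes = [100 100];
dbn = dbnsetup(dbn, train_data, opts);   % initialise the RBM stack
dbn = dbntrain(dbn, train_data, opts);   % greedy layer-wise pre-training
```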
Now we treat this network as a NN, ready for fine-tuning using back-propagation:
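The toolbox unfolds the pre-trained stack into a feed-forward network with an added output layer (50-way here, one unit per class), which is then fine-tuned with back-propagation:

```matlab
% Unfold the RBM stack into a feed-forward net and fine-tune it:
nn = dbnunfoldtonn(dbn, nclasses);       % add a 50-way output layer
nn.activation_function = 'sigm';
nn = nntrain(nn, train_data, train_y, opts);   % back-propagation fine-tuning
```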
The outcome of this process is a fairly large array in MATLAB's memory called nn. This defines the DNN structure, and contains all weights and connections.
1.6
PASS 2: use the DNN for testing
The learned DNN (called nn) is now used for classification. Again, we first set up the system similarly to the way we did it before.
And again, read in the data for testing in the same way we did for training previously:
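The test loader mirrors the training loader, but keeps files 6-8 of each group of eight and reads from one of the noise-corrupted directories. Directory names and the placeholder feature are assumptions, as before.

```matlab
% Sketch only: load every test file from the chosen SNR condition.
testdir = 'sounds/snr10';    % pick the SNR condition under test (placeholder name)
test_data   = [];
test_labels = [];
classes = dir(testdir);
classes = classes([classes.isdir]);
classes = classes(~ismember({classes.name}, {'.', '..'}));
for c = 1:numel(classes)
    files = dir(fullfile(testdir, classes(c).name, '*.wav'));
    for f = 1:numel(files)
        if mod(f-1, 8) >= 5      % files 6-8 of each group of 8: testing
            [x, fs] = audioread(fullfile(testdir, classes(c).name, files(f).name));
            S = abs(spectrogram(x, hamming(nfft), noverlap, nfft, fs));
            test_data   = [test_data;   mean(S, 2)'];  % placeholder feature
            test_labels = [test_labels; c];
        end
    end
end
```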
Next – as before – we also condition the files and ensure they are scaled appropriately:
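Crucially, the test data must be scaled with the SAME minima and maxima that were saved during the training pass, not with statistics recomputed from the test set:

```matlab
% Apply the training-pass scaling (data_min/data_max from PASS 1) to the test data:
test_data = (test_data - repmat(data_min, size(test_data,1), 1)) ./ ...
            repmat(data_max - data_min + eps, size(test_data,1), 1);
```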
Now we execute the actual test:
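A sketch of the test loop: classify every test vector and compare against the true label.

```matlab
% Classify each test vector and score against the true labels:
labels  = nnpredict_p(nn, test_data);
correct = (labels == test_labels);
fprintf('%d of %d correct (%.1f%%)\n', ...
        sum(correct), numel(correct), 100 * mean(correct));
```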
This makes use of a function called nnpredict_p to do probability and energy scaling (one of the two output scoring options in our paper). Here is the nnpredict_p function:
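The original listing is missing from this page; the following is a hedged reconstruction based on the toolbox's own nnpredict, with a softmax over the output activations standing in for the probability and energy scaling described in the paper. The published function may well differ.

```matlab
% Reconstruction (not the original code): forward pass plus softmax scoring.
function labels = nnpredict_p(nn, x)
    nn.testing = 1;
    nn = nnff(nn, x, zeros(size(x,1), nn.size(end)));   % forward pass only
    nn.testing = 0;
    a = nn.a{end};                                      % output-layer activations
    p = exp(a) ./ repmat(sum(exp(a), 2), 1, size(a, 2)); % softmax class scores
    [~, labels] = max(p, [], 2);                        % most probable class
end
```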
1.7
The above code will output either a “correct” or “NOT correct” result for each of the 1500 files in the test set (50 classes x 30 test files). The performance score for that particular test condition is simply the proportion: number of correct / 1500.