This section gives a short overview of how the model is implemented: how audio files are preprocessed, which audio processing alternatives exist, how features are extracted, how the classification model is built and trained, and what the model's limitations are.
<br/>
<br/>
The human hearing system has a remarkable sense of its surroundings: it locates and distinguishes countless variations of sounds produced by living and non-living objects. Imagine building machines and software that listen to different sounds and determine what type of sound they are hearing. This idea has been implemented successfully in applications such as classification of music, speech and other sounds, or recognition of genre, gender or speaker. However, it requires careful processing of audio signals, which is a complex task. To address this challenge, machine listening has been progressing rapidly in processing audio signals, understanding the relations between different types of audio signals, and producing meaningful results.
<p>
<img src="phd_dnn_1.png">
</p>
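To make the pipeline steps listed above more concrete, the following is a minimal sketch of one possible preprocessing and feature-extraction step, assuming Python with librosa and MFCC features; the file name, sampling rate, and number of coefficients are illustrative assumptions, not necessarily the exact choices made in the thesis.

```python
import numpy as np
import librosa  # assumed audio library; the excerpt does not name the exact toolkit

# Hypothetical preprocessing + feature extraction: load one clip and compute
# MFCC features, a common input representation for audio classifiers.
y, sr = librosa.load("vehicle_sample.wav", sr=22050, mono=True)   # hypothetical file name
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)                # shape: (40, n_frames)
feature_vector = np.mean(mfcc, axis=1)                            # one fixed-length vector per file
```

Averaging the MFCCs over time is only one way to obtain a fixed-length vector per file; frame-wise features could equally be fed to the classifier.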
<p align="justify">
Any audio file conveys some sound that we can hear, be it a band playing a song, a vehicle passing by on a highway, a crying baby, and so on. Beyond the sound itself, further information can be gathered from an audio file. Machine listening is a discipline within machine learning that applies audio signal processing and learning algorithms to automatically retrieve, analyze, and classify audio recordings. It can be valuable wherever processing must be based on audio content rather than visual content, and it finds applications across many fields, from technology and security to healthcare.
<br/>
<br/>
The model cannot predict an audio file containing the sound of two or more vehicles passing by at the same time. If the dataset is unbalanced, i.e. it contains an unequal number of files per category, the model will not predict as expected when tested with unlabelled or unseen test data. During data augmentation, using a large noise factor (larger than 1.0) changes the semantics of the original audio file, and the model's test accuracy is compromised as a result; a minimal sketch of this noise-based augmentation is given below.
</p>
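As an illustration of the augmentation step described above, the following is a minimal sketch of noise injection, assuming Python with librosa and NumPy; the file name and the particular noise factor value are placeholders, not the exact setup used in the thesis.

```python
import numpy as np
import librosa  # assumed audio library

def add_noise(signal, noise_factor=0.05):
    """Add white Gaussian noise scaled by noise_factor to an audio signal."""
    noise = np.random.randn(len(signal))
    return signal + noise_factor * noise

# Example: augment one clip with a small noise factor; values above 1.0 would
# drown out the original signal and change its semantics, as noted above.
y, sr = librosa.load("vehicle_sample.wav", sr=None)  # hypothetical file name
y_augmented = add_noise(y, noise_factor=0.05)
```

Keeping the noise factor well below 1.0 preserves the character of the original signal while still diversifying the training data.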
...
...
<p align="justify">
In this thesis, we have described various approaches and alternatives and chosen a suitable one for obtaining the data, reading it, extracting features from it, and finally creating a model that achieves the goal of classifying vehicles into their respective categories. Demonstrating the behaviour of the model for different noise values also showed the extent to which the model can predict accurately. We also showcased the technique of data augmentation, which can be applied for data enrichment to obtain better results in case of data scarcity. Training time can be reduced to a large extent by using parallel computing nodes on the HPC cluster. Also, when trained on a balanced dataset, the model can predict reliably even on unlabelled test data. As a future extension, a clustering algorithm such as k-means can be used to separate skewed or noisy data from non-noisy data before starting the training process, as sketched below.
<br/>
<br/>
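The following is a minimal sketch of that future extension, assuming Python with scikit-learn; the feature file name, the number of clusters, and the mapping of clusters to "clean" and "noisy" data are illustrative assumptions rather than results from the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed implementation; the thesis only names k-means

# Hypothetical pre-filtering step: cluster per-file feature vectors into two
# groups so that skewed/noisy files can be separated before training.
features = np.load("train_features.npy")          # hypothetical array, shape (n_files, n_features)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
group_a = np.where(kmeans.labels_ == 0)[0]        # indices of files in cluster 0
group_b = np.where(kmeans.labels_ == 1)[0]        # indices of files in cluster 1
# Which cluster corresponds to the noisy data must be verified manually, e.g. by listening.
```

<br/>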
After the vehicle data has been classified into categories, city experts can determine which types of vehicles pass by, how many of each, and which types produce the highest noise levels. Using this extracted information, specialists from different areas can then take suitable measures to control noise pollution in the region. For example, architects can decide how a building should be structured, and other smart-city experts can take appropriate measures to reduce noise pollution levels in that region.