Abstract

Machine learning works very well for image recognition, but it can also be used to recognize audio patterns. Machine listening can identify the audio patterns of different entities such as car engines, human speech, and nature sounds. The aim of this thesis is to create a program that reads labelled audio files, extracts features from them, and feeds those features to a sequential model, which then classifies the audio files of vehicles based on their sounds and categorizes them as lightweight, medium-weight, heavyweight, rail-bound, or two-wheeled vehicles, using machine listening and deep learning in the field of acoustics. The program also classifies unlabelled test data files using a pre-trained model. Additionally, to increase the speed and performance of the software and the algorithm, the program can be executed on a High Performance Computing (HPC) system containing a cluster, which in turn has many compute servers, also called nodes, enabling faster and parallel computing. This thesis provides a base model for vehicle classification, discussing both its advantages and disadvantages along with possibilities for future extensions.

Use Case: Region - Stuttgart, Germany

One can easily be annoyed by the noise of traffic and vehicles, especially in bigger cities like Stuttgart. Moreover, the city of Stuttgart is situated in a valley, so sound is reflected back by the surrounding hills. Controlling the sound pollution levels in this region is therefore a major part of the research. This classification approach can help reduce the noise pollution level in Stuttgart.

Model Implementation

This section gives a short overview of how the model is implemented, covering audio file preprocessing, audio processing alternatives, feature extraction, the classification model and learning process, as well as the model's limitations. The human hearing system has a great sense of its surroundings with respect to location and the unlimited variations of sounds of living as well as non-living objects. Our hearing system is capable of distinguishing between many diverse sounds. Imagine building a machine and software to carry out the task of listening to different sounds and determining what type of sound it is hearing. This idea has already been implemented successfully in applications such as the classification of music, speech, and other sounds, the recognition of genre, gender, or speaker, and many more. However, it requires careful processing of audio signals, which is a complex task. To address this, machine listening has been progressing rapidly in processing audio signals, understanding relations between different types of audio signals, and producing meaningful results.
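
As a rough illustration of the preprocessing and feature extraction step, the sketch below loads an audio file with librosa and computes MFCC features averaged over time; the choice of MFCCs, the parameter values, and the helper name are illustrative assumptions rather than the exact configuration used in the thesis.

    import numpy as np
    import librosa

    def extract_features(path, n_mfcc=40):
        """Load an audio file and return a fixed-length MFCC feature vector.

        Assumption: MFCCs averaged over the time axis serve as the input
        features; the actual pipeline may use different features or values.
        """
        signal, sr = librosa.load(path, sr=None)   # keep the native sampling rate
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
        return np.mean(mfcc, axis=1)               # one feature vector per file

A vector like this can be computed for every labelled file and stacked into the training matrix for the classifier.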

Any audio file conveys a kind of sound that we can hear, be it a band playing a song, a vehicle passing by on a highway, a crying baby, etc. Besides conveying sound, there is certain information that can be gathered from an audio file. Machine listening is a discipline within machine learning that applies audio signal processing and machine learning to automatically retrieve, analyze, and classify audio recordings or files. Machine listening can be valuable in areas that require processing based on audio content rather than visual content, and it finds applications in almost all fields, from technology and security to healthcare. The implementation of the model can be found here.
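
As a hedged sketch of what the sequential classification model could look like for the five vehicle categories, a simple Keras model is shown below; the layer sizes, dropout rates, and training settings are assumptions for illustration, not the exact architecture of the thesis implementation.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Dropout

    def build_model(n_features=40, n_classes=5):
        """Feed-forward classifier for the five vehicle categories
        (lightweight, medium-weight, heavyweight, rail-bound, two-wheeled).
        Layer sizes and dropout rates are illustrative assumptions."""
        model = Sequential([
            Dense(256, activation="relu", input_shape=(n_features,)),
            Dropout(0.3),
            Dense(128, activation="relu"),
            Dropout(0.3),
            Dense(n_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])
        return model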

Limitation


The model cannot predict on an audio file containing the sounds of two or more vehicles passing by at the same time. If the data set is unbalanced, i.e. it contains an unequal number of files per category, the model will not predict in the expected manner when tested with unlabelled or unseen test data. During data augmentation, using a large value for the noise factor (larger than 1.0) changes the semantics of the original audio file, and thus the test accuracy of the model will also be compromised.
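
To make the noise-factor limitation concrete, the sketch below adds white noise scaled by a noise factor to a signal; the function name and scaling scheme are illustrative assumptions, but they show why a factor well above 1.0 lets the noise overpower the original vehicle sound and alters the meaning of the recording.

    import numpy as np

    def add_noise(signal, noise_factor=0.05):
        """Augment an audio signal with white noise (illustrative sketch).

        The noise is scaled relative to the signal's standard deviation:
        factors well below 1.0 keep the vehicle sound dominant, while
        factors above 1.0 drown it out and change the file's semantics."""
        noise = np.random.randn(len(signal)) * np.std(signal)
        return (signal + noise_factor * noise).astype(signal.dtype)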

Conclusion

In this thesis, we have described various approaches and alternatives and chosen a suitable one for obtaining the data, reading the data, extracting features from the data, and finally creating a model that achieves the goal of classifying vehicles into their respective categories. Demonstrating the behaviour of the model on different noise values also showed the extent to which our model can predict accurately. We also showcased the technique of data augmentation, which can be applied for data enrichment to get better results in case of data scarcity. Training time can be reduced to a large extent by using the parallel computing nodes on the HPC cluster. Also, when trained on a balanced dataset, the model can predict even on the unlabelled test data. As a future extension, a clustering algorithm like k-means clustering can be used to separate skewed or noisy data from non-noisy data before starting the training process.
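
As a rough sketch of this proposed extension, and assuming scikit-learn is available, the feature vectors extracted per audio file could be clustered with k-means into two groups, after which the noisier cluster could be inspected and discarded before training.

    import numpy as np
    from sklearn.cluster import KMeans

    def split_by_cluster(features, n_clusters=2):
        """Cluster per-file feature vectors into groups (illustrative sketch).

        The idea behind the proposed extension: noisy or skewed recordings
        may fall into a separate cluster that can be filtered out before
        the training process starts."""
        kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        labels = kmeans.fit_predict(np.asarray(features))
        return [np.where(labels == c)[0] for c in range(n_clusters)]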

After classifying the vehicle data into categories, city experts can determine which types of vehicles pass by, how many of them there are, and which of these types produce the highest noise levels. Using this extracted information, specialists from different areas can then take suitable measures to control noise pollution in the region. For example, building architects can derive how building structures should be designed, and other smart city experts can take appropriate measures to reduce the noise pollution levels in that region.