Abstract

Machine learning works very well for image recognition, but it can also be used to recognize audio patterns. Machine listening can identify the audio patterns of different sources, such as car engines, human speech, and nature sounds. The aim of this thesis is to create a program that reads labelled audio files, extracts features from them, and feeds these features to a sequential model. The model classifies the audio files of vehicles based on their sounds and further categorizes each vehicle as light weight, medium weight, heavy weight, rail-bound, or two-wheeled, applying machine listening and deep learning to the field of acoustics. The program also classifies unlabelled test data files using a pre-trained model. Additionally, to increase the speed and performance of the program, it can be executed on a High Performance Computing (HPC) system containing a cluster of many compute servers, also called nodes, which enable faster, parallel computing. This thesis provides a base model for vehicle classification, discusses its advantages and disadvantages, and outlines possibilities for future extensions.

Use Case: Region - Stuttgart, Germany

Traffic and vehicle noise can easily become a nuisance, especially in larger cities like Stuttgart. In addition, Stuttgart is situated in a valley, so sound can be reflected back by the surrounding hills. Controlling noise pollution levels in this region is therefore a major part of the research. The classification approach presented here can support efforts to reduce noise pollution in Stuttgart.

Model Implementation

Conclusion

In this thesis, we described several approaches and alternatives, and chose a suitable one, for each step: obtaining the data, reading it, extracting features from it, and finally building a model that classifies vehicles into their respective categories. Evaluating the model at different noise levels also showed the extent to which it can predict accurately. We also demonstrated data augmentation, which can enrich the dataset and improve results when data is scarce. Training time can be reduced considerably by using the parallel compute nodes of the HPC cluster. Moreover, when trained on a balanced dataset, the model can also classify unlabelled test data. As a future extension, a clustering algorithm such as k-means could be used to separate skewed or noisy data from clean data before training begins.
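The noise experiments and the data augmentation mentioned above can be illustrated with a minimal sketch. It assumes that augmentation means mixing white Gaussian noise into a clean recording at a chosen signal-to-noise ratio (SNR); the thesis does not specify this here, so the function names, the synthetic 110 Hz tone, and the SNR values are all hypothetical and stand in for real vehicle recordings.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Return a noisy copy of `signal` with white Gaussian noise at `snr_db` dB.

    Assumption: augmentation by additive white noise; other schemes
    (time shifting, pitch shifting, real background noise) are equally possible.
    """
    if not signal:
        raise ValueError("signal must not be empty")
    signal_power = sum(x * x for x in signal) / len(signal)
    if signal_power == 0.0:
        raise ValueError("signal is silent, SNR is undefined")
    # SNR(dB) = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR actually obtained, from the clean and the noisy signal."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical stand-in for a recording: one second of a 110 Hz tone at 8 kHz.
rng = random.Random(0)  # fixed seed so the augmentation is reproducible
sample_rate = 8000
clean = [math.sin(2 * math.pi * 110 * t / sample_rate) for t in range(sample_rate)]

# One augmented copy per noise level; each copy keeps the label of the original.
augmented = {snr: add_noise_at_snr(clean, snr, rng) for snr in (20, 10, 0)}
```

Each augmented copy can be added to the training set with the label of its source recording, or used on its own to check how prediction accuracy changes as the SNR decreases.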
Once the vehicle data has been classified into categories, city experts can determine which types of vehicles pass by and how many, which of these types produce the highest noise levels, and so on. Using this information, specialists from different fields can take suitable measures to control noise pollution in the region. For example, architects can derive how buildings should be designed, and other smart-city experts can take appropriate measures to reduce noise levels in that region.
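As a rough illustration of the summary described above, the following sketch counts classified vehicles per category and records the loudest level observed for each. It assumes each prediction comes paired with a measured sound level in dB, which is not stated in the thesis; the function name `summarise` and the sample values are hypothetical placeholders, not measurements.

```python
from collections import Counter


def summarise(predictions):
    """Count vehicles per category and track the peak level seen per category.

    `predictions` is an iterable of (category, level_db) pairs, where the
    category is the model output and level_db is an assumed measured level.
    """
    counts = Counter()
    peak_level = {}
    for category, level_db in predictions:
        counts[category] += 1
        peak_level[category] = max(level_db, peak_level.get(category, float("-inf")))
    return counts, peak_level


# Made-up example values, for illustration only.
observations = [
    ("light weight", 62.0),
    ("heavy weight", 81.5),
    ("two-wheeled", 78.0),
    ("heavy weight", 84.0),
    ("rail-bound", 79.5),
    ("light weight", 60.5),
]

counts, peak_level = summarise(observations)
loudest_category = max(peak_level, key=peak_level.get)
```

The peak level is used instead of a mean because decibel values are logarithmic and should not be averaged arithmetically.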