<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <meta name="generator" content="GitLab Pages">
  <title></title>
  <link rel="stylesheet" href="../assets/css/style.css">
  <link rel="stylesheet" href="../assets/css/mobile.css">
</head>

<body>
  <header id="header">
    <div class="row">
      <div class="logo" id="logo"></div>
      <div class="hamburger">
        <div class="line"></div>
        <div class="line"></div>
        <div class="line"></div>
      </div>
      <nav></nav>
    </div>
    <div id="projectlogo"></div>
    <div id="projectname"></div>
  </header>
  <div class="content">

    <h1>Abstract</h1>
    <p align="justify">
      Machine learning works very well for image recognition, but it can also be used to recognize audio patterns. Machine listening can identify the audio patterns of different sources, such as car engines, human speech and sounds of nature. The aim of this thesis is to create a program that reads labelled audio files, extracts features from them and feeds these features to a sequential model. The model classifies the audio files of vehicles based on their sounds and then categorizes each vehicle as light weight, medium weight, heavy weight, rail-bound or two-wheeled, applying machine listening and deep learning to the field of acoustics. The program also classifies unlabelled test data files using a pre-trained model. Additionally, to increase the speed and performance of the program, it can be executed on a High Performance Computing (HPC) system: a cluster of many compute servers, also called nodes, which enables faster, parallel computing. This thesis provides a base model for vehicle classification, discusses its advantages and disadvantages, and outlines possibilities for future extensions.
    </p>

    <p>
      <img src="aesc_nn_architecture.png">
    </p>

    <h1>Use Case: Region - Stuttgart, Germany</h1>

    <p align="justify">
    Traffic noise can easily become a nuisance, especially in bigger cities like Stuttgart. In addition, Stuttgart is situated in a valley, so sound can be reflected back by the surrounding hills. Controlling noise pollution levels in this region is therefore a major part of the research. By showing which types of vehicles contribute to the noise, this classification approach can help reduce the noise pollution level in Stuttgart.
    </p>

    <h1>Model Implementation </h1>

    <p align="justify">
      This section gives a short overview of how the model is implemented: audio file preprocessing, audio processing alternatives, feature extraction, the classification model and learning process, and the model's limitations. The human hearing system has a keen sense of its surroundings: it can locate sounds and distinguish between a practically unlimited variety of them, from living and non-living sources alike. Imagine a machine and software that could listen to different sounds and determine what type of sound they are hearing. This idea is already well established in applications such as the classification of music, speech and other sounds, and the recognition of genre, gender or speaker. However, it requires careful processing of audio signals, which is a complex task. To address this challenge, machine listening has been progressing rapidly in processing audio signals, understanding the relations between different types of audio signals and producing meaningful results.
      
       <p>
      <img src="phd_dnn_1.png">
    </p>
      
      An audio file conveys a sound that we can hear, be it a band playing a song, a vehicle passing by on a highway or a crying baby. Beyond the sound itself, further information can be gathered from an audio file. Machine listening is a discipline within machine learning that combines audio signal processing with machine learning to automatically retrieve, analyze and classify audio recordings. It is valuable in areas that require processing based on voice content rather than visual content, and it finds applications in many fields, from technology and security to healthcare.

<br/>
<br/>
    The model cannot classify an audio file that contains the sound of two or more vehicles passing by. If the data set is unbalanced, that is, if it contains unequal numbers of files per category, the model will not predict as expected when tested on unlabelled or unseen test data. During data augmentation, using a large noise factor (larger than 1.0) changes the semantics of the original audio file, and thus the model's test accuracy is compromised.
  
    </p>
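    <p align="justify">
    The effect of the noise factor on an augmented file can be illustrated with a minimal sketch. The thesis code is not shown on this page, so the sketch assumes that augmentation adds Gaussian white noise scaled by the noise factor; the function names add_noise and snr_db and the synthetic sine tone that stands in for a recording are illustrative assumptions, not the thesis implementation.
    </p>

```python
import math
import random


def add_noise(signal, noise_factor, seed=0):
    """Return a copy of `signal` with scaled Gaussian noise added.

    Assumption: augmentation is additive white noise, so that
    augmented[i] = signal[i] + noise_factor * n[i] with n[i] ~ N(0, 1).
    """
    rng = random.Random(seed)
    return [s + noise_factor * rng.gauss(0.0, 1.0) for s in signal]


def snr_db(clean, noisy):
    """Signal-to-noise ratio in decibels between a clean and a noisy signal."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


# A 440 Hz sine sampled at 8 kHz stands in for a real recording (hypothetical data).
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]

for factor in (0.01, 0.1, 1.0, 2.0):
    noisy = add_noise(clean, factor)
    print(f"noise_factor={factor}: SNR = {snr_db(clean, noisy):.1f} dB")
```

    <p align="justify">
    For a unit-amplitude signal like this one, a noise factor of 1.0 already puts more power into the noise than into the signal (a negative SNR), which is consistent with the limitation described above.
    </p>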

     <h1>Conclusion</h1>

    <p align="justify">
   In this thesis, we described various approaches and alternatives and chose a suitable one for obtaining the data, reading it, extracting features from it and finally creating a model that achieves the goal of classifying vehicles into their respective categories. Demonstrating the behaviour of the model at different noise values also showed us the extent to which the model can predict accurately. We also showcased data augmentation, which can be applied to enrich the data and obtain better results when data is scarce. Training time can be reduced to a large extent by using the parallel computing nodes of the HPC cluster. Moreover, when trained on a balanced dataset, the model can predict even on unlabelled test data. As a future extension, a clustering algorithm such as k-means could be used to separate skewed or noisy data from non-noisy data before the training process starts.
<br/><br/>
Once the vehicle data has been classified into categories, city experts can determine which types of vehicles pass by and how many, which of these types produce the highest noise levels, and so on. Using this information, specialists from different areas can take suitable measures to control noise pollution in the region. For example, building architects can derive how a building should be structured, and other smart city experts can take appropriate measures to reduce noise pollution levels in that region.
    </p>
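    <p align="justify">
    Because the conclusion depends on training with a balanced dataset, it can help to check the number of files per category before training. The following is a minimal sketch of such a check, not the thesis code: the function name class_balance, the file counts and the idea of summarizing balance as the ratio of the largest to the smallest category are all assumptions made for illustration.
    </p>

```python
from collections import Counter


def class_balance(labels):
    """Return per-category file counts and the largest/smallest count ratio."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio


# Hypothetical labels, one per training file; the counts are made up.
labels = (
    ["light weight"] * 120
    + ["medium weight"] * 115
    + ["heavy weight"] * 30
    + ["rail-bound"] * 110
    + ["two-wheeled"] * 125
)

counts, ratio = class_balance(labels)
for category, n in counts.most_common():
    print(f"{category}: {n} files")
print(f"imbalance ratio (largest / smallest): {ratio:.2f}")
```

    <p align="justify">
    A ratio close to 1 indicates a balanced dataset. In this made-up example the heavy weight category is clearly under-represented, so it would be a candidate for collecting more recordings or for data augmentation.
    </p>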


  </div>

  <div class="footer"></div>
  <div class="legal"></div>

  <script src="../settings.js"> </script>
  <script src="../main.js"> </script>
</body>

</html>