Introduction

This project is a group exercise undertaken as part of the University of Oxford - Artificial Intelligence: Cloud and Edge Implementations course, as a learning challenge from Microsoft and Elephant Listening Project. The objective is to devise solutions against illegal elephant hunting in tropical African forests by enabling sensors for instant prediction of gunshot events and thus mitigate poaching attempts.

Elephant Listening Project Challenge

As proposed by Dr. Peter Wrege, Director Elephant Listening Project, Center for Conservation Bioacoustics, Cornell Lab of Ornithology, gunshot detection is an issue in African National Parks and environmental audio data is being collected into a public database at Congo Soundscapes as part of the Elephant Listening Project.

The current model (for gunshot detection) is very inefficient - less than .2% of tagged signals are gunshots and we typically get 10K- 15K tagged signals in a four-month deployment at just one of the 50 recording sites. The issue is we have about 200 good gunshots annotated, but because poaching is way too high, gunshots are still extremely rare in the sounds and it is extremely time-consuming to create the “truth” logs where we can say that every gunshot in a 24hr file has been tagged. From our understanding, this makes developing a detector more difficult.

IoT Architecture - Microsoft Project 15 Open Platform

The Microsoft Project 15 Architecture is leveraged to follow the best practices in the deployment of scalable IoT solutions. One of the core goals of the Project 15 Open Platform is to boost innovation with a ready-made platform, allowing the scientific developer to expand into specific use cases.

Project 15 Architecture

Data Scientists can train the model using Azure Machine Learning cloud service and deploy the model to edge devices using IoT services. This process can be automated using Azure cloud services for upgrading the edge devices models at runtime through a strategic rollout.

Machine Learning Model Training

Data Collection, Analysis, and Feature extraction

At this stage, we convert the gunshot detection problem statement into a machine learning problem. The main issue as per the challenge is the lack of tagged data for model training and also mostly the gunshot audio data for such use cases is proprietary. So we collected free gunshot samples and other environmental audio samples from random internet sources. A few samples of them are available in the Sample audio folder. There was some basic audio cleaning done to remove noise and clip exact audio data points post converting all formats to .wav files into the dataset. The further audio analysis was carried out as per Analysis Notebooks folder. For brevity, only a part of the dataset is uploaded in the repo but all examples, notebooks, and the dataset are structured in a way to accommodate more data and scale the model training process as we move ahead.

Many features were analyzed across two different classes namely gunshot and environmental audio. The environmental audio contains audio data of elephant noises and other sounds of the tropical African forests. The features analyzed are as follows,

There could be more features relevant to this classification problem that can be analyzed to improve the model in the future and for the scope of this exercise the first thirteen Mel-Frequency Cepstral Coefficients suited best as input features for audio classification.

Classification Model Architecture

The input to the machine learning model will be a set of audio features and the model has to classify each input set into two classes, i.e gunshot or environmental audio. This is a binary classification problem and a suitable Dataset was created for training purposes. A feature extraction script was created to convert the raw audio .wav files into relevant feature data and corresponding true values for classification.

Azure Machine Learning cloud service has been leveraged for the classification model supervised training. In this case, two different model architectures were tested by running scripts from the Makefile and the following results were observed,

Neural Network Architecture Train Accuracy Test Accuracy
Multi-layer Perceptron 0.9756 0.9405
Convolutional Neural Network 0.9740 0.9236

Note: These accuracy metrics depend directly on the limited data used for model training and testing. Further improvements in contextual data collection, feature extraction and analysis, and experimentation with model architectures is needed to validate the reliability of these model metrics for practical applications.

IoT Solution Deployment

Event processing for Gunshot detection

Microsoft Project 15 Setup

architecture

Edge Device Simulation

The models can be deployed using NVIDIA Triton Inference Server and a deployment notebook demonstrates the best approach to a cloud-native inference setup using container technology and inference over network.

For demo purposes, we have created an edge device prototype that simulates predictions from audio data and sends telemetry data to IoT Hub. This is a nodejs based application that uses tensorflow as backend to make predictions using the model created during training phase.

The model created during the training phase needs conversion to a tensroflow js graph model format to be used in a node js application. So we convert the models to tfjs graph format for the client to predict using the make command as follows

make convert_model

Register an edge device named gunshot-detector-1 to IoT Hub and copy IoT device connection string from the micorsoft portal into .env file in the edge device source folder as follows

CONN_STR=<conn-string>
DETECTOR_ID=gunshot-detector-1

Now, run the edge device prototype to simulate gunshot predictions randomly and send telemetry data to IoT Hub

make run_device_simulation

Live Monitoring and Demo

Future Work

Gunshot detection is still an unsolved problem today and there is ongoing research on this front for both military and other purposes around the world. The machine learning classification model in this context can be improved with better data collection from the tropical forests in Africa. The lack of sufficient data remains the major issue with this challenge. Accurate gunshot audio should be recorded using the apposite set of hunting rifles for improved accuracy and further feature analysis can be conducted comparing gunshots with other surrounding sounds in the elephant habitat.

Further, many more model architectures can be tested and compared to understand what hyper-parameters work best for such remote environments. This can surely improve the model training process. Also, better deployment practices, telemetry, alert systems, and model upgrade processes can be explored with cloud-based IoT solutions to improve the overall efficiency of the solution.

Team

University of Oxford - Artificial Intelligence: Cloud and Edge Implementations - Team 3