• International Journal of Technology (IJTech)
  • Vol 13, No 6 (2022)

A Cow Crossing Detection Alert System

Yuan Qin Ong, Tee Connie, Michael Kah Ong Goh

Cite this article as:
Ong, Y.Q., Connie, T., Goh, M.K.O., 2022. A Cow Crossing Detection Alert System. International Journal of Technology. Volume 13(6), pp. 1202-1212

Yuan Qin Ong Faculty of Information Science & Technology, Multimedia University, 75450, Melaka, Malaysia
Tee Connie Faculty of Information Science & Technology, Multimedia University, 75450, Melaka, Malaysia
Michael Kah Ong Goh Faculty of Information Science & Technology, Multimedia University, 75450, Melaka, Malaysia

Artificial intelligence has grown rapidly in recent years and has given rise to several branches of study, such as object detection and sound recognition. Object detection is a computer vision technique that identifies and locates objects in an image or video. Sound recognition, on the other hand, is the ability of a machine or program to receive and interpret dictation or to understand and carry out direct commands. This paper presents a cow crossing detection alert system with object detection and sound recognition capabilities. The proposed system aims to protect drivers from animal-vehicle collisions. A data-driven deep learning approach is used for cow detection. The cow detection module is then integrated with a Raspberry Pi device to perform real-time monitoring. If a potential animal-vehicle collision is detected on the road, the proposed system sends an alert message to relevant units, such as the road transport department, so that they can take further action. Experimental results show that the proposed cow detection approach yields a mean average precision at an IoU threshold of 0.5 (mAP 0.5) of 99% in object detection and 100% accuracy in sound recognition, demonstrating the system's practical feasibility.

Cow crossing detection; IoT; Object detection; Sound recognition; YOLO


In Malaysia, collisions between cows and vehicles are a significant cause of road accidents, especially in rural areas. These collisions usually take place on village paths (kampong roads). Such paths typically have few or no streetlights, so drivers have poor visibility and not enough time to react when they encounter cows on the roadway. To address this problem, we propose an active roadway alert system that allows relevant parties to take preventive action when cows are present in the area.

The proposed alert system is equipped with real-time object detection and sound recognition modules to monitor roadway conditions. The alert system requires a small budget and little maintenance effort. When the system's camera or microphone detects a cow on the roadway, the system sends an alert or notification to relevant units such as the Road Transport Department Malaysia. The relevant units are thus kept informed about the condition of the roadways and can take further action, such as driving cows off the road. To reduce cost, Internet of Things (IoT) technology is deployed to automate the monitoring process (Jonny et al., 2021; Zahari et al., 2021; Munaf et al., 2020). IoT devices are physical devices embedded with sensors, microprocessors, software, and other technologies that connect to the Internet or other communications networks to exchange data with other devices and systems. Raspberry Pi is an affordable, small-sized computer that can host operating systems. Therefore, Raspberry Pi is chosen in this research as the IoT device to be installed on the roadway, where it collects environmental data for cow detection using a camera and a mini microphone.
Object detection is an essential component of the proposed system. Recently, YOLO has emerged as a popular object detection algorithm. It relies on neural networks to provide real-time object detection. The benefit of using YOLO is that it offers both high detection speed and high accuracy. In addition, sound recognition can compensate for the limitations of object detection in dark conditions. The trained sound recognition model can alert the relevant units when it detects cows on the roadway even if the camera sensor is out of service or the visual cue is degraded by bad weather. In this research, a custom YOLO model and a sound recognition model are trained and run on a Raspberry Pi. The custom YOLO model achieves an mAP 0.5 of 99% for cow detection, and the sound recognition model obtains 100% accuracy.

Experimental Methods

2.1.  Cow Crossing Detection Alert System

In this paper, an object detection model is trained with a custom dataset so that it can recognize cows from different angles, such as the front, side, and back. A sound processing module is also developed to analyze the sound signals. The cow crossing detection alert system is installed on roadsides to monitor road conditions. The system captures videos and sounds of the roadway at fixed intervals. It reads the frames of the captured video and runs the recognition process using the custom YOLO model, while performing sound recognition at the same time. If the detection result indicates the presence of a cow in the scene under observation, the system sends alert messages to subscribed devices. If no cow is detected on the roadway, the system waits for the next input from the camera and microphone and repeats the recognition process.

Some existing solutions use GPS trackers to track cows and ensure that they do not enter the roadway area. However, such practices have limitations. GPS trackers cannot always track a cow's position accurately. Malaysia is a tropical country, and many roadways are covered by dense forest, which causes GPS trackers to lose signal and fail to provide correct location data. The proposed system, in contrast, addresses this issue: it does not rely on the availability of a GPS signal to monitor the cows but is based on visual and audio cues instead.

Five core components are required in the proposed cow crossing detection alert system: Pushbullet, a Raspberry Pi camera module, a mini-USB microphone, a GPS module, and a Raspberry Pi. The Raspberry Pi camera module and mini-USB microphone capture the roadway condition for recognition. The Raspberry Pi executes the cow recognition process and also acts as storage for the input video, the recorded sound data, and other essential data. The latitude and longitude of the Raspberry Pi device are used to generate a Google Map link so that relevant units can easily locate the cow. Pushbullet plays a major role on the user side: if the system detects a cow on the roadway, it sends the alert message with the map link to the relevant units' devices, such as a computer or smartphone. Figure 1 illustrates a scenario of the proposed cow crossing detection system.
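The step of turning the device coordinates into an alert with a map link can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the message wording, and the `source` label are hypothetical, and the dictionary fields (`type`, `title`, `body`) are assumed to follow the layout of a Pushbullet "note" push. Sending the message over the network is deliberately left out.

```python
from urllib.parse import urlencode


def build_map_link(latitude: float, longitude: float) -> str:
    """Return a Google Maps search URL pointing at the given coordinates."""
    query = urlencode({"api": 1, "query": f"{latitude:.6f},{longitude:.6f}"})
    return f"https://www.google.com/maps/search/?{query}"


def build_alert(latitude: float, longitude: float, source: str) -> dict:
    """Assemble the alert content (hypothetical wording and field layout).

    `source` is an illustrative label such as "camera" or "microphone".
    """
    link = build_map_link(latitude, longitude)
    return {
        "type": "note",
        "title": "Cow detected on roadway",
        "body": f"Detected by: {source}\nLocation: {link}",
    }


if __name__ == "__main__":
    # Example coordinates only; a deployed device would read them from its GPS module.
    print(build_alert(2.2, 102.25, "camera"))
```

Keeping the message construction separate from the delivery step makes it possible to check the alert content without network access.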

Figure 1 The proposed cow crossing detection alert system

2.2.  YOLO: Real Time Object Detection

In this research, YOLOv3 is chosen and integrated with Raspberry Pi. YOLOv3 is an advanced real-time object detection algorithm that is faster than other detectors such as RetinaNet-50-500 and SSD321 (Redmon & Farhadi, 2018). YOLOv3 is based on the CNN architecture to detect objects in real time. Using CNNs, YOLOv3 interprets images as structured arrays of data and finds patterns in them. The YOLOv3 algorithm makes predictions using 1x1 convolutions in its convolutional layers, and the input image requires only one forward propagation pass through the network to produce high-accuracy predictions at high speed. YOLOv3 performs prediction by using a single neural network to process the whole image. Once the image has been divided into regions, the network calculates the probabilities associated with each region. The predicted probabilities are used to weight the bounding boxes produced. Another important technique used in YOLOv3 is non-max suppression. This technique ensures that each object is detected only once and discards duplicate detections before returning the identified objects and their bounding boxes.
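As an illustration of non-max suppression, the following minimal sketch keeps the highest-scoring box and discards any box that overlaps an already kept box by more than an IoU threshold. It is a generic greedy formulation rather than the exact YOLOv3 implementation; the function names, the (x1, y1, x2, y2) box layout, and the default threshold of 0.5 are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy non-max suppression."""
    # Visit boxes from the most to the least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(boxes, scores))
```

In the usage example, the second box overlaps the first with an IoU of about 0.68, so it is treated as a duplicate of the same object and dropped.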

2.3.  Sound Recognition Model

Apart from object detection, our system also includes a sound recognition model. Audio data analysis is essential to processing and comprehending audio signals obtained from digital devices. The spectrogram plays a vital role in audio analysis because crucial audio signal features can be extracted from it to train a model. The spectrogram also provides a valuable way to understand the audio signal by displaying it graphically. Every audio clip collected by the Raspberry Pi contains many important features that can be used to classify the sound. One of the valuable features that can be extracted from an audio signal is the spectral centroid, which allows us to quickly locate the centre of mass of a sound (Chauhan, 2020). Next, spectral rolloff is used to measure the shape of the signal. Spectral bandwidth is another helpful feature; it is defined as the width of the spectral band at one-half of its peak value. In this study, the zero-crossing rate is also used to determine the smoothness of a signal by counting the number of zero-crossings in a segment of the signal. All the features are fed as input to a neural network to perform classification. Figure 2 illustrates some examples of the spectrogram of the audio signals. Figure 2a shows the cow audio signals, while Figure 2b shows the audio signals without a cow.
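Two of the features described above, the zero-crossing rate and the spectral centroid, can be computed as in the following sketch. It uses only the Python standard library and a naive discrete Fourier transform, which is an assumption made to keep the example self-contained; it is suitable for short frames only and does not represent the feature extraction pipeline used in the study. The function names are illustrative.

```python
import cmath
import math


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency in Hz, computed with a naive DFT."""
    n = len(samples)
    half = n // 2 + 1  # non-negative frequency bins only
    magnitudes = []
    for k in range(half):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(acc))
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    frequencies = [k * sample_rate / n for k in range(half)]
    return sum(f * m for f, m in zip(frequencies, magnitudes)) / total


if __name__ == "__main__":
    rate = 800
    tone = [math.sin(2 * math.pi * 100 * t / rate) for t in range(64)]
    print(spectral_centroid(tone, rate))
    print(zero_crossing_rate([1, -1, 1, -1, 1]))
```

For a pure tone, the spectral centroid coincides with the tone's frequency, which gives a simple sanity check for the implementation.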

Figure 2 Spectrogram of audio signals: (a) cow, (b) without a cow

2.4.  Fusion of Images and Sound Signals

This study uses score-level fusion to consolidate the audio and visual signals. Score-level fusion combines the match scores produced by different matchers to reach a decision about an individual's identity (Ross & Nandakumar, 2009). The cow crossing alert system uses score-level fusion to determine the presence of cows from two modalities, image and sound, before sending alert messages. Figure 3 illustrates how score-level fusion is applied in the study. First, the visual and audio inputs are fed to their respective backbones/models, each of which yields a matching score. The matching scores from the two models are then combined into a single fused matching score. The system uses the SUM operation for score-level fusion. If the fused matching score (MF) is greater than the threshold score (T), the system sends an alert message. For example, suppose the value of T is set to 1. The YOLO model returns a score of 1 because a cow is detected, but the sound processing module returns a score of 0. In this case, the sum is 1, which is not greater than 1. Therefore, the system does not send an alert message because the condition is not met.
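The SUM rule and the threshold test described above can be written as a short sketch. The function names are illustrative; the scores and the threshold follow the worked example in the text (T = 1, visual score 1, audio score 0).

```python
def fuse_scores(visual_score, audio_score):
    """Score-level fusion with the SUM rule: MF = visual + audio."""
    return visual_score + audio_score


def should_alert(visual_score, audio_score, threshold):
    """Send an alert only when the fused score MF is greater than T."""
    return fuse_scores(visual_score, audio_score) > threshold


if __name__ == "__main__":
    # Worked example from the text: MF = 1 + 0 = 1 is not greater than T = 1.
    print(should_alert(1, 0, threshold=1))
    # Both modalities agree: MF = 2 exceeds T = 1.
    print(should_alert(1, 1, threshold=1))
```

Because the comparison is strict, the choice of T determines whether a single modality is enough to trigger an alert or whether both must agree.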

Figure 3 Score level fusion approach

2.5.  Platform to Receive an Alert Message

The system sends alert messages to subscribed devices, such as computers or smartphones, on which the Pushbullet application is installed. An alert message is sent when any one of the following conditions is met:

1.  A cow is detected in the video, but no cow sound is detected

2.  A cow sound is detected, but no cow is seen in the video

3.  Both the video and the sound indicate the presence of a cow

These conditions are set to provide more opportunities to detect cows, given the dynamic environment of the real world. The visual input is especially vulnerable to changes in illumination, and cows might not be visible at night or in bad weather. The audio signal therefore provides complementary information in the proposed cow detection system.
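The three alert conditions listed above can be expressed as a small decision function. This is a sketch under the assumption that each modality reports a simple yes/no detection; the function name and the returned labels are hypothetical. With binary 0/1 scores, these are exactly the cases in which the summed score is at least 1.

```python
from typing import Optional


def alert_reason(cow_in_video: bool, cow_in_audio: bool) -> Optional[str]:
    """Return which alert condition is met, or None when no alert is due."""
    if cow_in_video and cow_in_audio:
        return "video and sound"  # condition 3
    if cow_in_video:
        return "video only"       # condition 1
    if cow_in_audio:
        return "sound only"       # condition 2
    return None


if __name__ == "__main__":
    for video, audio in [(True, False), (False, True), (True, True), (False, False)]:
        print(video, audio, alert_reason(video, audio))
```

Reporting which modality triggered the alert could help the receiving unit judge how much to trust a detection made at night or in bad weather.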

Results and Discussion

3.1.  Dataset

3.1.1. Image Dataset

In this paper, about 400 cow images and 100 vehicle images are obtained from Kaggle. For clarity, this dataset is denoted as Dataset 1. The collected samples include various backgrounds and different views of cows and cars. The collected data are pre-processed by auto-orientation to prevent feeding the model with wrong information. After that, the images are resized to 416 x 416 pixels. The samples are stored in Pascal VOC format for further use. Data augmentation is then applied to the images to increase the dataset size. The operations performed include random Gaussian blur of between 0 and 3 pixels, and salt-and-pepper noise applied to 5 percent of the pixels. This results in a total of 1300 images, denoted as Dataset 2.
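The salt-and-pepper operation mentioned above can be sketched as follows. The example assumes a grayscale image stored as a list of rows with pixel values from 0 to 255, and it is not the tool used to build Dataset 2; the function name and the `seed` parameter are illustrative.

```python
import random


def salt_and_pepper(image, fraction=0.05, seed=None):
    """Return a copy of `image` with `fraction` of its pixels set to 0 or 255.

    `image` is assumed to be a list of rows of grayscale values (0 to 255).
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    noisy = [row[:] for row in image]  # leave the input image untouched
    n_noisy = round(fraction * height * width)
    # Pick distinct pixel positions so exactly n_noisy pixels are altered.
    for index in rng.sample(range(height * width), n_noisy):
        row, col = divmod(index, width)
        noisy[row][col] = 255 if rng.random() < 0.5 else 0
    return noisy


if __name__ == "__main__":
    grey = [[128] * 20 for _ in range(20)]
    noisy = salt_and_pepper(grey, fraction=0.05, seed=0)
    changed = sum(p != 128 for row in noisy for p in row)
    print(changed, "of", 20 * 20, "pixels altered")
```

Sampling positions without replacement keeps the noise level at the stated fraction, whereas drawing each pixel independently would only match it on average.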

Figure 4 Different views of the cow and car

3.1.2.   Sound Dataset

The sound datasets are collected from online public datasets such as UrbanSound. This study collected 150 cow and 150 non-cow sound signals. The non-cow sounds are ambient sounds commonly heard on the roadway, such as air conditioners, car horns, and engine idling. Figure 5 shows some sample sound signals acquired in this study. Features such as Mel-frequency cepstral coefficients (MFCC), spectral centroid, zero-crossing rate, chroma frequencies, and spectral roll-off are extracted.

Figure 5 Sample sound signals of: (a) cow, (b) air conditioner, (c) vehicle horn, (d) engine idling

3.2.  Experimental Setting

All the experiments were conducted in Google Colaboratory with the following specifications: GPU: 1x Tesla K80 (compute capability 3.7, 2496 CUDA cores, 12 GB GDDR5 VRAM); CPU: 1x single-core hyper-threaded Xeon processor @ 2.3 GHz (1 core, 2 threads); RAM: 12.6 GB; disk space: 33 GB.

The dataset is split into 80% for training and 20% for testing. For clarity, each YOLO model is named using the sequence of input size, YOLO quantize mode, and batch size. For example, 416_INT8_4 denotes a model with an input size of 416 x 416 pixels, an 8-bit quantized version of the model, and a batch size of 4. To differentiate the expanded dataset (after augmentation) from the original dataset, we refer to the original dataset and the augmented dataset as Dataset 1 and Dataset 2, respectively. The hyperparameters used in the experiments are as follows: Epoch: 100; Framework: TensorFlow; IOU Loss Threshold: 0.5; Input Size: 416x416; Initial Train Learning Rate: 0.0001; Ending Train Learning Rate: 0.00001; YOLO Quantize Mode: INT8, FP16, FP32; Batch Size: 2, 4, 8.
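The naming convention described above (input size, quantize mode, batch size) can be made concrete with a small parser. This sketch is only an aid for reading the model names in the following tables and figures; the `ModelConfig` type and the function names are hypothetical.

```python
from typing import NamedTuple

QUANTIZE_MODES = ("INT8", "FP16", "FP32")
BATCH_SIZES = (2, 4, 8)


class ModelConfig(NamedTuple):
    input_size: int      # images are input_size x input_size pixels
    quantize_mode: str   # one of QUANTIZE_MODES
    batch_size: int


def parse_model_name(name: str) -> ModelConfig:
    """Split a name such as '416_INT8_4' into its three components."""
    size, mode, batch = name.split("_")
    if mode not in QUANTIZE_MODES:
        raise ValueError(f"unknown quantize mode: {mode}")
    return ModelConfig(int(size), mode, int(batch))


def all_model_names(input_size: int = 416):
    """Enumerate every quantize mode and batch size combination."""
    return [f"{input_size}_{m}_{b}" for m in QUANTIZE_MODES for b in BATCH_SIZES]


if __name__ == "__main__":
    print(parse_model_name("416_INT8_4"))
    print(all_model_names())
```

Three quantize modes combined with three batch sizes give nine model variants at the single input size of 416.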

3.3.  Experiment Analysis

3.3.1.   Experiment Analysis For Object Detection Performance of Custom YOLO Model

We first assess the performance of the custom YOLO model. Dataset 2 is used for this purpose. The custom YOLO model is trained with different hyperparameters to obtain optimal performance. Figure 6a shows the training loss values of the different YOLO models. We observe that 416_INT8_2, 416_FP16_2, and 416_FP32_2 achieve lower training loss than the other models. Batch size 2 therefore appears to be the best batch size for training the YOLO model. Figure 6b depicts the validation loss values of the different YOLO models. We can see that 416_INT8_2, 416_FP16_2, and 416_FP32_2 again yield the lowest validation loss among the models.

Figure 6 (a) Training and (b) validation of different YOLO models

Table 1 shows the results for all the hyperparameter settings. We observe that the train total loss of 416_INT8_2 is much lower than its validation total loss, so 416_INT8_2 suffers from overfitting even though its train total loss is lower than that of the other settings. In contrast, 416_FP16_2 and 416_FP32_2 obtain good results in both train total loss and validation total loss and show no sign of overfitting.

Table 1 Loss of different hyperparameters

Columns: Train confidence loss; Train box loss; Train classification loss; Train total loss; Val confidence loss; Val box loss; Val classification loss; Val total loss

The results of using different YOLO models are summarized in Table 2. We observe that the 416_FP32_2 model achieves the best mAP0.5 and FPS among the nine models: an mAP0.5 of 1 at 13.51 FPS. Although the 416_FP32_4 model reaches 13.09 FPS, it is still slightly slower than 416_FP32_2. On the other hand, 416_FP16_8 performs worst, obtaining an mAP0.5 of only 0.0076 at 5.7 FPS.

Table 2 Performance of different YOLO models

3.3.2.   Performance Comparisons Between Models Trained by Dataset 1 and Dataset 2

This experiment aims to evaluate the performance of the proposed method using Dataset 1 and Dataset 2. The 416_INT8_4 model is applied in the investigation. The training total loss and validation total loss are used to evaluate the model. Figure 7a shows the training confidence loss, training box loss, training classification loss, and training total loss for Dataset 1. The validation confidence loss, validation box loss, validation classification loss, and validation total loss for Dataset 1 are depicted in Figure 7b. The validation total loss is around 5, but the training total loss is only approximately 0.78. This shows that the model may be overfitting and is unable to generalize well to new data. The same 416_INT8_4 model is then applied to Dataset 2. Figure 8a presents the training confidence loss, training box loss, training classification loss, and training total loss when the model is trained using Dataset 2. The validation confidence loss, validation box loss, validation classification loss, and validation total loss are provided in Figure 8b. We observe that the training total loss is around 2.117, higher than that of the model trained on Dataset 1. However, the validation total loss is only 2.038, which is much lower than that of the model trained on Dataset 1. Hence, Dataset 2 is used in the remaining experiments.
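The overfitting argument above rests on the gap between validation and training loss. The short sketch below computes that gap from the approximate total losses quoted in the text; the function name is illustrative, and the value 5.0 for Dataset 1 stands in for the "around 5" reported in the text.

```python
def generalization_gap(train_loss, val_loss):
    """Validation loss minus training loss; a large positive gap suggests overfitting."""
    return val_loss - train_loss


# Approximate total losses quoted in the text: (training, validation).
reported = {
    "Dataset 1": (0.78, 5.0),
    "Dataset 2": (2.117, 2.038),
}

if __name__ == "__main__":
    for name, (train, val) in reported.items():
        gap = generalization_gap(train, val)
        print(f"{name}: train={train:.3f} val={val:.3f} gap={gap:+.3f}")
```

For Dataset 1 the gap is roughly 4.2, while for Dataset 2 it is close to zero, which is consistent with the choice of Dataset 2 for the remaining experiments.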

Figure 7 Confidence loss, box loss, classification loss and total loss for training (first row) and validation (second row) on Dataset 1