Published at : 03 Nov 2022
Volume : IJtech
Vol 13, No 6 (2022)
DOI : https://doi.org/10.14716/ijtech.v13i6.5872
Iskandar Zulkarnain Jafriz | Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Malaysia |
Sarina Mansor | Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Malaysia |
In the era of Covid-19 infection, enforcing social distancing is essential in
confined areas such as retail shops and classrooms. Human staff are typically
used to ensure that safety rules are adhered to. However, a better way to
enforce social distancing regulations is an automated system that detects and
counts people and measures the distance between them. This work proposes an
innovative retail monitoring system based on the Intel Distribution of OpenVINO
toolkit. The system uses deep learning techniques and pre-trained models to
automatically count the number of individuals, count the persons entering and
exiting the premises, and measure the distance between each pair of persons to
ensure social distancing. Five experiments were conducted to evaluate the
efficiency and accuracy of the system.
Intel OpenVINO; Machine learning; Monitoring system; Smart retail; Social distancing
Several new practices and preventive measures were introduced during the
Covid-19 pandemic (Berawi et al., 2020; Baroroh & Agarwal, 2022;
Romadlon et al., 2022). One of them is social distancing, defined as
keeping a safe gap between persons who are not in the same bubble. An infected
person who coughs, sneezes, or speaks may infect the person closest to
them (Gupta, 2020).
2.1. Line Monitoring System
A line monitoring system is a type of crowd control system. It counts people
in a specific area, frame, or queue. Several applications can make use of line
monitoring. One popular application is queue monitoring in convenience stores.
Many retail customers abandon their purchases because of excessively long and
poorly managed payment queues. To solve this problem, an autonomous queue
monitoring system based on computer vision has been proposed
(Viriyavisuthisakul et al., 2017).
Another interesting application is monitoring vehicles at traffic signals,
which has been used to improve traffic signals (Yao et al., 2013). The system
keeps track of time by measuring the vehicle's length (Cai et al., 2010). With
the camera placed at the crossroads, the system detected vehicles in real time,
and the captured images were then processed using image processing methods.
Queue systems in banks have also been studied as a way to provide efficient
customer service, with two infrared sensors used for real-time detection of the
queue at the entrance and exit (Gimba et al., 2020).
2.2. People Counting System
A people counting system is designed to count the number of people who pass
through a designated area. Previously, the people in a particular area were
usually counted by hand. Today, various technologies make counting people
smoother, faster, and more accurate, including infrared sensors, cameras,
thermal sensors, and Wi-Fi sensors, among others (Brown, 2019; Hughes, 2021).
Every technology has its advantages and limitations, but the choice comes down
to functionality and usability. For example, Arief-Ang et al. (2018) developed
a method of counting and detecting people using carbon-dioxide sensors. The
cost of implementation is low, but the accuracy depends on the carbon-dioxide
concentration, which can fluctuate due to external factors.
3.1. System Architecture
Figure 1 Overall system architecture
The system takes video as input, either a recorded video file or live footage
from a webcam. The OpenVINO software processes the input with its models and
inference engine. To begin, the system ingests the video and processes it frame
by frame. Individuals in the frame are recognized using a pre-trained Deep
Neural Network (DNN) model (Intel, 2021). To track people, the system uses a
second pre-trained DNN to extract their features (Intel, 2020e). The deep
learning algorithm depends on the components to be implemented, such as
capacity limit, social distance, and line monitoring. The following subsections
describe the implementation of these features in detail.
3.1.1. Capacity Limit System
Figure 2 Block Diagram of Capacity Limit System (Intel, 2020a)
After person detection and tracking, the system determines from the
coordinates in the output frame whether each person has crossed any of the
predetermined virtual gates, and in which direction. The person counter's
output is then updated based on the entry and exit data. If the number of
people who have crossed the entry line exceeds the threshold, the system
triggers an alert that displays a warning on the output frame.
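The gate-crossing and capacity check described above can be illustrated with a minimal Python sketch. It is not the reference implementation: the `CapacityCounter` class, the helper names, and the convention that ending on the positive side of the gate counts as an entry are all assumptions made for illustration. Each tracked person is reduced to a single point, and a crossing is registered when the segment between that point's previous and current positions intersects the gate segment.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


def side_of_gate(gate_start: Point, gate_end: Point, point: Point) -> float:
    """Signed area: positive on one side of the gate line, negative on the other."""
    (x1, y1), (x2, y2) = gate_start, gate_end
    px, py = point
    return (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)


def segments_intersect(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segment a-b strictly crosses segment c-d."""
    return (side_of_gate(a, b, c) * side_of_gate(a, b, d) < 0
            and side_of_gate(c, d, a) * side_of_gate(c, d, b) < 0)


@dataclass
class CapacityCounter:
    """Hypothetical entry/exit counter for one virtual gate."""
    gate_start: Point
    gate_end: Point
    capacity: int
    entered: int = 0
    exited: int = 0

    def update(self, previous: Point, current: Point) -> None:
        """Register one tracked person's movement between two frames."""
        if not segments_intersect(previous, current,
                                  self.gate_start, self.gate_end):
            return  # the person did not cross the gate in this step
        # Assumption: ending on the positive side of the gate means "entering".
        if side_of_gate(self.gate_start, self.gate_end, current) > 0:
            self.entered += 1
        else:
            self.exited += 1

    @property
    def occupancy(self) -> int:
        return self.entered - self.exited

    @property
    def over_capacity(self) -> bool:
        """True when an alert should be shown on the output frame."""
        return self.occupancy > self.capacity
```

In use, `update` would be called once per tracked person per frame with that person's previous and current positions, and `over_capacity` would be checked before the output frame is drawn.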
3.1.2. Social Distancing System
Figure 3 Block Diagram of Social Distancing System (Intel, 2020b)
The system processes the video frame by frame until the stream is complete.
One DNN model detects people in the frame of interest, and another DNN model
extracts their features so that they can be tracked. The system then calculates
the distance between each pair of identified people based on their position,
size, and viewpoint to check whether the minimum social distance threshold has
been violated.
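As a rough illustration of the distance check, the Python sketch below estimates the distance between two detected people from their bounding boxes alone. It rests on simplifying assumptions that are not taken from the reference implementation: every person is assumed to be about 1.7 m tall, the bounding-box height gives the pixel-to-metre scale, and viewpoint correction is ignored. The function names and the default threshold are hypothetical.

```python
import math
from itertools import combinations

# Assumption: an average standing person is about 1.7 m tall.
ASSUMED_PERSON_HEIGHT_M = 1.7


def estimate_distance_m(box_a, box_b):
    """Estimate the distance in metres between two people.

    Boxes are (x_min, y_min, x_max, y_max) in pixels. The scale is derived
    from the mean box height, so the estimate is only reasonable when both
    people are at a similar depth in the scene.
    """
    def foot_point(box):
        return ((box[0] + box[2]) / 2.0, box[3])

    def height(box):
        return box[3] - box[1]

    mean_height_px = (height(box_a) + height(box_b)) / 2.0
    metres_per_pixel = ASSUMED_PERSON_HEIGHT_M / mean_height_px
    (ax, ay), (bx, by) = foot_point(box_a), foot_point(box_b)
    return math.hypot(ax - bx, ay - by) * metres_per_pixel


def find_violations(boxes, min_distance_m=1.0):
    """Return index pairs of people standing closer than min_distance_m."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(boxes), 2)
        if estimate_distance_m(a, b) < min_distance_m
    ]
```

A calibrated camera or a perspective transform to a top-down view would give more reliable distances; the box-height heuristic is used here only to keep the example short.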
3.1.3. Line Monitoring System
This reference implementation demonstrates a retail application that counts the number of people waiting in a retail store's queue. The program's algorithm estimates the number of persons in a line by performing an intersection test on the people who have been identified in the frame (Intel, 2020c).
Figure 4 Block Diagram of Line Monitoring System (Intel, 2020c)
Figure 4 shows the block diagram of the Line Monitoring System. This method
has been integrated with the Capacity Limit System to provide a more efficient
and user-friendly system for retailers. It detects individuals waiting outside
the retail establishment prior to entering.
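One possible reading of the intersection step is sketched below in Python. The text does not state what the detected people are intersected with, so the sketch assumes a user-defined rectangular queue region and counts a person as queuing when at least half of their bounding box lies inside it. The region, the overlap threshold, and the function names are all hypothetical.

```python
def intersection_area(box, region):
    """Overlap area of two axis-aligned rectangles (x_min, y_min, x_max, y_max)."""
    width = min(box[2], region[2]) - max(box[0], region[0])
    height = min(box[3], region[3]) - max(box[1], region[1])
    return max(0, width) * max(0, height)


def count_people_in_queue(person_boxes, queue_region, min_overlap=0.5):
    """Count detections whose box overlaps the queue region sufficiently.

    Assumption: a person is in the queue when at least `min_overlap` of the
    bounding-box area falls inside the queue region.
    """
    count = 0
    for box in person_boxes:
        box_area = (box[2] - box[0]) * (box[3] - box[1])
        if box_area <= 0:
            continue  # skip degenerate detections
        if intersection_area(box, queue_region) / box_area >= min_overlap:
            count += 1
    return count
```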
3.2. Pre-trained Models
3.2.1. Pre-trained Model 1: People-Detection-Retail-0013
The people-detection-retail-0013 model is a pre-trained model developed by
Intel for person-detection applications. It was obtained from Open Model Zoo
and can be downloaded using the Open Model Zoo model downloader. It has a
reported accuracy of 88.62 percent, uses the Caffe framework, and supports
occluded pedestrians, among other features. The model uses the FP32 format,
which is a single-precision floating-point format. It is built on a
MobileNetV2-like backbone with depth-wise convolutions to reduce the number of
calculations required for the 3x3 convolution block (Intel, 2021).
3.2.2. Pre-trained Model 2: People-Reidentification-Retail-0300
Intel also developed pre-trained models for reidentification purposes. In this
work, the people-reidentification-retail-0300 model is used. This model was
obtained from Open Model Zoo and can be downloaded using the Open Model Zoo
model downloader. It takes a full-body image as input and outputs an embedding
vector, so two images can be compared by the cosine distance between their
embeddings. The model is built on the OmniScaleNet backbone for rapid
inference. A single reidentification head from the 1/16 scale feature map
generates a 512-float embedding vector (Intel, 2020e).
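The cosine-distance comparison of two embedding vectors can be written as d(a, b) = 1 - (a · b) / (|a| |b|), where a smaller value means the two images are more likely to show the same person. The Python sketch below implements this formula together with a simple nearest-match lookup. The `match_identity` helper, the gallery structure, and the 0.3 threshold are assumptions for illustration and do not come from the model documentation.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def match_identity(query, gallery, max_distance=0.3):
    """Return the track ID whose stored embedding is closest to `query`.

    `gallery` maps a track ID to an embedding. Returns None when no stored
    embedding is closer than `max_distance` (a hypothetical threshold).
    """
    best_id, best_distance = None, max_distance
    for track_id, embedding in gallery.items():
        distance = cosine_distance(query, embedding)
        if distance < best_distance:
            best_id, best_distance = track_id, distance
    return best_id
```

In a tracker, a returned ID would be reused for the new detection, while `None` would start a new track.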
The system was assessed with five different experiments to evaluate its
efficiency. The experiments focus on analysing the accuracy of detecting
people, calculating social distance, and counting the number of people
entering and exiting particular premises. Five volunteers took part in the
experiments. All the findings are shown and discussed in Section 4.1.
Section 4.2 evaluates the overall system performance with six videos in
different environments.
4.1. Experimental Results
4.1.1. Experiment 1: Angle and Distance Test
Figure 5 The First Camera Setup (Close-Range, Body-Level)
Figure 6 The Second Camera Setup (Medium-Range, Bird’s-Eye View)