|Iskandar Zulkarnain Jafriz||Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Malaysia|
|Sarina Mansor||Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Malaysia|
In the era of Covid-19 infection, enforcing social distancing is essential in confined areas such as retail shops and classrooms. Human workers are typically deployed to ensure that safety rules are adhered to. However, a better way to enforce social distancing regulations is an automated system that detects and counts people and measures the distance between them. This work proposes an innovative retail monitoring system based on the Intel Distribution of OpenVINO toolkit. The system uses deep learning techniques and pre-trained models to automatically count the number of individuals present, count the persons entering and exiting the premises, and measure the distance between each pair of persons to ensure social distancing. Five experiments were conducted to evaluate the efficiency and accuracy of the system.
Intel OpenVINO; Machine learning; Monitoring system; Smart retail; Social distancing
Several new practices and preventive measures were introduced during the Covid-19 pandemic (Berawi et al., 2020; Baroroh & Agarwal, 2022; Romadlon et al., 2022). One of them is social distancing, defined as keeping a safe gap between persons who are not in the same bubble. An infected person coughing, sneezing, or speaking may infect the next person closest to them (Gupta, 2020).
2.1. Line Monitoring System
A line monitoring system is considered a crowd control system. It counts people in a specific area, frame, or queue. Several applications can make use of a line monitoring system. One popular application is queue monitoring at convenience stores. Many retail customers abandon their purchases because payment queues are excessively long and poorly managed. To solve this problem, an autonomous queue monitoring system based on computer vision has been proposed (Viriyavisuthisakul et al., 2017).
Another interesting application is monitoring vehicles at traffic signals, where it has been used to improve traffic signals (Yao et al., 2013). The system keeps track of time by measuring the vehicle's length (Cai et al., 2010). Because the camera was placed at the crossroads, the system detected vehicles in real time, and the captured images were then analysed using image processing methods. Queue systems have also been considered in banks to provide efficient customer service, with two infrared sensors used for real-time detection of the queue at the entrance and exit (Gimba et al., 2020).
2.2. People Counting System
A people counter system is designed to count the number of people that pass through a designated area. Previously, people in a particular area were usually counted by hand. Various technologies have since been created to make counting people smoother, faster, and more accurate. These include infrared sensors, cameras, thermal sensors, Wi-Fi sensors, and many more (Brown, 2019; Hughes, 2021). Every technology has its advantages and limitations, but the choice comes down to functionality and usability. For example, Arief-Ang et al. (2018) developed a method of counting and detecting people using carbon-dioxide sensors. The cost of implementation is low, but the accuracy depends on the carbon-dioxide concentration, which can fluctuate due to external factors.
3.1. System Architecture
Figure 1 Overall system architecture
The system accepts input footage, which can be either a recorded video or live camera footage from a webcam. The OpenVINO software processes the input with its models and inference engine. To begin, the system ingests video from a file and processes it frame by frame. Individuals in the frame are recognised using a pre-trained Deep Neural Network (DNN) model (Intel, 2021). To track people, the system uses a second pre-trained DNN to extract their features (Intel, 2020e). The subsequent processing depends on the components to be implemented, such as capacity limit, social distance, and line monitoring. The following subsections describe the implementation of these features in detail.
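The frame-by-frame flow described above can be outlined as a short sketch. It makes no OpenVINO calls: `detect_people`, `extract_features`, and the entries of `components` are hypothetical stand-ins for the two pre-trained models and for the capacity limit, social distance, and line monitoring logic. It illustrates only the order of operations, not the actual implementation.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def run_pipeline(
    frames: Iterable[Any],
    detect_people: Callable[[Any], List[Box]],
    extract_features: Callable[[Any, Box], Sequence[float]],
    components: Dict[str, Callable[[List[Box], List[Sequence[float]]], Any]],
) -> List[Dict[str, Any]]:
    """Process a stream frame by frame and collect each component's output.

    All callables are hypothetical placeholders supplied by the caller.
    """
    results = []
    for frame in frames:
        # Step 1: the first model locates people in the frame.
        boxes = detect_people(frame)
        # Step 2: the second model produces one feature vector per person.
        embeddings = [extract_features(frame, box) for box in boxes]
        # Step 3: each enabled component consumes the detections.
        results.append(
            {name: component(boxes, embeddings)
             for name, component in components.items()}
        )
    return results
```

Passing the models and components in as callables keeps the loop independent of any particular inference backend, which is a design choice of this sketch rather than something stated in the text.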
3.1.1. Capacity Limit System
Figure 2 Block Diagram of Capacity Limit System (Intel, 2020a)
After person detection and tracking, the system determines whether each person has crossed any of the predetermined virtual gates, based on the person's coordinates in the output frame, and in which direction the gate was crossed. The person counter's output is then updated based on the entry and exit data. If the number of people who have crossed the entry line exceeds the threshold, the system triggers an alert that displays a warning on the output frame.
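The gate-crossing logic can be illustrated with a minimal sketch. It assumes that a virtual gate is a straight line given by two points and that each tracked person is reduced to one point per frame; a crossing is registered when that point changes side of the line. The class and function names are hypothetical, and for brevity the sketch does not check whether the crossing falls within the gate's end points.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


def side_of_gate(gate_start: Point, gate_end: Point, p: Point) -> float:
    """Cross product sign: negative on one side of the gate line, positive on the other."""
    (x1, y1), (x2, y2) = gate_start, gate_end
    return (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)


@dataclass
class CapacityCounter:
    gate_start: Point
    gate_end: Point
    capacity: int  # assumed threshold; the paper does not give a value
    entered: int = 0
    exited: int = 0

    def update(self, prev: Point, curr: Point) -> None:
        """Update the counts from one person's position in two consecutive frames."""
        before = side_of_gate(self.gate_start, self.gate_end, prev)
        after = side_of_gate(self.gate_start, self.gate_end, curr)
        if before < 0 <= after:
            self.entered += 1  # negative side to positive side counts as an entry
        elif after < 0 <= before:
            self.exited += 1   # positive side to negative side counts as an exit

    @property
    def occupancy(self) -> int:
        return self.entered - self.exited

    @property
    def over_capacity(self) -> bool:
        """True when the alert described in the text should be raised."""
        return self.occupancy > self.capacity
```

Which side counts as "inside" is a convention of this sketch: swapping `gate_start` and `gate_end` reverses the direction that is treated as an entry.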
3.1.2. Social Distancing System
Figure 3 Block Diagram of Social Distancing System (Intel, 2020b)
The system processes the video frame by frame until the stream is complete. One DNN model detects people in the frame of interest, and another DNN model extracts features from them so that they can be tracked. The system then calculates the distance between each pair of identified people based on their position, size, and viewpoint to check whether the minimum social distance has been violated.
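One simple way to turn position and size into a distance estimate is sketched below. It is not the method of the reference implementation: it assumes every person is about 1.7 m tall, uses the bounding-box height to derive a metres-per-pixel scale, and ignores perspective. The function names, the assumed height, and the 1.0 m default threshold are all illustrative assumptions.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

ASSUMED_PERSON_HEIGHT_M = 1.7  # assumption used only to derive a pixel scale


def foot_point(box: Box) -> Tuple[float, float]:
    """Bottom-centre of the box, taken as the person's position on the floor."""
    x_min, _, x_max, y_max = box
    return ((x_min + x_max) / 2.0, y_max)


def estimated_distance_m(box_a: Box, box_b: Box) -> float:
    """Approximate distance in metres between two detected people."""
    ax, ay = foot_point(box_a)
    bx, by = foot_point(box_b)
    pixel_distance = math.hypot(ax - bx, ay - by)
    # Average box height gives a local scale for this pair of people.
    mean_height_px = ((box_a[3] - box_a[1]) + (box_b[3] - box_b[1])) / 2.0
    metres_per_pixel = ASSUMED_PERSON_HEIGHT_M / mean_height_px
    return pixel_distance * metres_per_pixel


def find_violations(boxes: Sequence[Box], min_distance_m: float = 1.0) -> List[Tuple[int, int]]:
    """Return index pairs of people standing closer than the minimum distance."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(boxes), 2)
        if estimated_distance_m(a, b) < min_distance_m
    ]
```

A camera-specific calibration, such as a homography to the floor plane, would replace the height-based scale in a deployed system; the pairwise comparison itself would stay the same.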
3.1.3. Line Monitoring System
This reference implementation demonstrates a retail application that counts the number of people waiting in a retail store's queue. The program's algorithm estimates the number of persons in the line by performing an intersection between the people who have been identified in the frame (Intel, 2020c).
Figure 4 Block Diagram of Line Monitoring System (Intel, 2020c)
Figure 4 shows the block diagram of the Line Monitoring System. This method has been integrated with the Capacity Limit System to provide a more efficient and user-friendly system for retail stores. The system detects individuals waiting outside the retail establishment prior to entering.
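The text does not spell out how the intersection is computed. The sketch below shows one plausible reading, in which a person is counted as queuing when enough of their bounding box overlaps a predefined rectangular queue region. The region, the 0.5 overlap fraction, and the function names are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two axis-aligned rectangles (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def count_in_queue(
    person_boxes: Sequence[Box],
    queue_region: Box,
    min_overlap: float = 0.5,  # assumed fraction of the person box inside the region
) -> int:
    """Count detected people whose box lies sufficiently inside the queue region."""
    count = 0
    for box in person_boxes:
        box_area = (box[2] - box[0]) * (box[3] - box[1])
        if box_area <= 0:
            continue  # skip degenerate detections
        if intersection_area(box, queue_region) / box_area >= min_overlap:
            count += 1
    return count
```

Normalising by the person's box area rather than by the region's area keeps the decision independent of how large the queue region is drawn.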
3.2. Pre-trained Models
3.2.1. Pre-trained Model 1: People-Detection-Retail-0013
The people-detection-retail-0013 is a pre-trained model developed by Intel for person detection applications. The model was obtained from the Open Model Zoo and can be downloaded using its model downloader. It has an accuracy of 88.62 percent, uses the Caffe framework, and supports occluded pedestrians, among other features. The model uses the FP32 format, which is a single-precision floating-point format. It uses a MobileNetV2-like backbone with depth-wise convolutions to reduce the number of calculations required for the 3x3 convolution block (Intel, 2021).
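The saving from depth-wise convolutions can be made concrete by counting multiply-accumulate operations. The sketch below assumes the usual depth-wise separable form, a per-channel 3x3 convolution followed by a 1x1 point-wise convolution. The layer dimensions used in the note that follows are illustrative and are not taken from the model.

```python
def standard_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int = 3) -> int:
    """Multiply-accumulates for a standard k x k convolution (stride 1, same padding)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h: int, w: int, c_in: int, c_out: int, k: int = 3) -> int:
    """Multiply-accumulates for a depth-wise k x k plus a point-wise 1 x 1 convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise
```

For a 32 x 32 feature map with 64 input and 128 output channels, the standard block needs 75,497,472 operations and the separable block 8,978,432, a reduction of roughly 8.4 times.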
3.2.2. Pre-trained Model 2: People-Reidentification-Retail-0300
Intel has also developed pre-trained models for reidentification purposes. In this work, the people-reidentification-retail-0300 model is used. The model was obtained from the Open Model Zoo and can be downloaded using its model downloader. It takes a full-body image as input and outputs an embedding vector, so two images can be compared by the cosine distance between their embeddings. The model is built on the OmniScaleNet backbone for rapid inference. A single reidentification head extracted from the 1/16 scale feature map generates a 512-float embedding vector (Intel, 2020e).
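Comparing two embedding vectors by cosine distance can be sketched as follows. The matching helper, the gallery structure, and the 0.3 distance threshold are hypothetical and serve only to show how the distance might be used to re-identify a person; short vectors can stand in for the 512-float embeddings.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def match_identity(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    max_distance: float = 0.3,  # assumed threshold, not taken from the paper
) -> Optional[str]:
    """Return the ID of the closest known embedding, or None if none is close enough."""
    best_id, best_distance = None, max_distance
    for person_id, embedding in gallery.items():
        distance = cosine_distance(query, embedding)
        if distance < best_distance:
            best_id, best_distance = person_id, distance
    return best_id
```

When `match_identity` returns `None`, a tracker would typically register the detection as a new person and add its embedding to the gallery.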
The system was assessed in five different experiments to evaluate its efficiency. The experiments focus on analysing the accuracy of people detection, the social distance calculation, and the counting of people entering and exiting particular premises. Five volunteers were involved in the experiments. All the findings are presented and discussed in Section 4.1. Section 4.2 evaluates the overall system performance with six videos recorded in different environments.
4.1. Experimental Results
4.1.1. Experiment 1: Angle and Distance Test
Figure 5 The First Camera Setup (Close-Range, Body-Level)
Figure 6 The Second Camera Setup (Medium-Range, Bird’s-Eye View)