• International Journal of Technology (IJTech)
  • Vol 13, No 6 (2022)

Smart Retail Monitoring System using Intel OpenVINO Toolkit

Iskandar Zulkarnain Jafriz, Sarina Mansor

Cite this article as:
Jafriz, I.Z., Mansor, S., 2022. Smart Retail Monitoring System using Intel OpenVINO Toolkit. International Journal of Technology. Volume 13(6), pp. 1241-1250

Iskandar Zulkarnain Jafriz Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Malaysia
Sarina Mansor Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Malaysia

In the era of Covid-19 infection, enforcing social distancing is essential in confined areas such as retail stores and classrooms. Human workers are currently used to ensure that safety rules are adhered to. However, a better way to enforce social distancing regulations is an automated system that detects and counts people and measures the distance between them. This work proposes an innovative retail monitoring system based on the Intel Distribution of OpenVINO toolkit. The system uses deep learning techniques and pre-trained models to automatically count the number of individuals, count the number of persons entering and exiting the premises, and measure the distance between each person to ensure social distancing. Five experiments were conducted to evaluate the efficiency and accuracy of the system.

Intel OpenVINO; Machine learning; Monitoring system; Smart retail; Social distancing


    Several new practices and preventive measures were introduced during the Covid-19 pandemic (Berawi et al., 2020; Baroroh & Agarwal, 2022; Romadlon et al., 2022). One of them is social distancing, defined as keeping a safe gap between persons who are not in the same bubble. An infected person coughing, sneezing, or speaking may infect the next person closest to them (Gupta, 2020).

    Currently, workers such as security guards and retail staff are tasked with checking customers' temperatures, counting the number of people on the premises, and allocating a time limit to every customer. Performing several of these tasks at once is exhausting, and it is difficult for a person to sustain the speed and precision they require. Using people to count customers and keep track of time limits is an imprecise and ineffective way to control occupancy, especially during peak times. An automated people counter system is a more efficient way to count and detect people and to ensure social distancing.
      A few products offer automated people counter systems for retail stores and shopping malls, such as "FootfallCam" (FootfallCam, 2022) and SensMax (2022). However, most existing solutions require high-end hardware and software. This paper presents an intelligent retail monitoring system developed using the Intel Distribution of OpenVINO toolkit. The system can run on a single-board computer (SBC) to reduce the cost of implementation. It provides line monitoring, capacity limit, and social distance detection features.

Related Works

2.1. Line Monitoring System

      A line monitoring system is considered a crowd control system. It counts people in a specific area, frame, or queue. Several applications can make use of a line monitoring system. One popular application is queue monitoring at convenience stores. Many retail customers abandon their purchases because of excessively long and poorly managed payment queues. To solve this problem, an autonomous queue monitoring system based on computer vision has been proposed (Viriyavisuthisakul et al., 2017).

     Another interesting application is monitoring vehicles at traffic signals. It has been used to improve traffic signals (Yao et al., 2013). The system keeps track of time by measuring the vehicle's length (Cai et al., 2010). With the camera placed at the crossroads, the system detected vehicles in real time, and the images were then processed using image processing methods. Queue systems in banks have also been studied to provide efficient customer service, where two infrared sensors are used for real-time detection of the queue at the entrance and exit (Gimba et al., 2020).

2.2. People Counting System

     A people counter system is designed to count the number of people who pass through a designated area. Previously, people in a particular area were usually counted by hand. Today, various technologies make counting people smoother, faster, and more accurate. These include infrared sensors, cameras, thermal sensors, Wi-Fi sensors, and many more (Brown, 2019; Hughes, 2021). Every technology has its advantages and limitations, but the choice comes down to functionality and usability. For example, Arief-Ang et al. (2018) developed a method of counting and detecting people using carbon-dioxide sensors. The cost of implementation is low, but the accuracy depends on the carbon-dioxide concentration, which can fluctuate due to external factors.

     Another work by Chang et al. (2018) detected people in a restricted area based on Wi-Fi signals. The information obtained from the Wi-Fi channel state was analysed using a Deep Neural Network to estimate the number of people. No additional setup was needed, but this method works for indoor environments only. The closest related work to the proposed system is the vision-based people detection system developed by Parthornratt et al. (2016). This system was deployed on a Raspberry Pi board and used a Pi camera to capture images. A face detection algorithm was used to detect and count the number of people passing by. The limitations of the system include the requirement that customers face the camera at a minimum distance of 3cm and wear no head covering, the long processing time, and the limited speed of the Raspberry Pi. Our project aims to overcome these limitations. The preliminary work for this project was presented in 2020 (Aslam, 2021), where a low-cost people detector was developed. That work compared the performance of two libraries, OpenCV and OpenVINO, and found that the system using OpenVINO is faster at performing inference and more suitable for real-time applications. Therefore, the proposed intelligent retail monitoring system uses the OpenVINO toolkit.

Experimental Methods

3.1. System Architecture

       Figure 1 shows the overall system architecture for the smart retail monitoring system. The system requires a PC, a monitor, and a webcam. The PC runs the Ubuntu 18.04 operating system with the Intel Distribution of OpenVINO toolkit installed. What distinguishes the proposed system from existing ones is its use of the Intel OpenVINO toolkit (Intel, 2020d). This toolkit provides a large number of pre-trained deep-learning models, which can speed up the inference stage. OpenVINO version 2020.3 was used in this project. The proposed system combines three individual projects developed by Intel: Capacity Limit (Intel, 2020a), Social Distance (Intel, 2020b), and Line Monitoring (Intel, 2020c).

Figure 1 Overall system architecture

        The system takes an input video, which can be a recorded file or live footage from the webcam. The OpenVINO software processes the input with its models and inference engine. First, the system ingests the video and processes it frame by frame. Individuals in the frame are recognized using a pre-trained Deep Neural Network (DNN) model (Intel, 2021). To track people, the system uses a second pre-trained DNN to extract their features (Intel, 2020e). The subsequent processing depends on the component being implemented: capacity limit, social distance, or line monitoring. The following subsections describe the implementation of these features in detail.

3.1.1.   Capacity Limit System

        This is Intel's retail capacity limit application which counts people entering and exiting the store. Virtual lines are drawn in entrance and exit areas to serve as ‘virtual gates’ (Intel, 2020a). Figure 2 shows the block diagram of Capacity Limit system.

Figure 2 Block Diagram of Capacity Limit System (Intel, 2020a)           

        After person detection and tracking, the system determines whether each person has crossed any of the predetermined virtual gates, based on the coordinates in the output frame, and in which direction the gate was crossed. The person counter's output is then updated based on the entry and exit data. If the number of people who have crossed the entry line exceeds the threshold, the system triggers an alert that appears as a warning on the output frame.
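The gate-crossing logic described above can be illustrated with a minimal sketch. It is not Intel's reference code: the class name `CapacityCounter`, the use of each tracked person's centroid, and the convention that the positive side of the gate is "inside" are all assumptions made for illustration. A crossing is detected when the segment between a person's previous and current centroid intersects the gate segment, and the sign of a cross product gives the direction.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


def _side(a: Point, b: Point, p: Point) -> float:
    """Cross product (b - a) x (p - a); its sign tells which side of line a-b point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 strictly crosses segment q1-q2 (touching endpoints are ignored)."""
    return (_side(q1, q2, p1) * _side(q1, q2, p2) < 0
            and _side(p1, p2, q1) * _side(p1, p2, q2) < 0)


@dataclass
class CapacityCounter:
    """Hypothetical occupancy counter for one virtual gate (illustrative names only)."""
    gate_start: Point
    gate_end: Point
    limit: int
    inside: int = 0

    def update(self, prev: Point, curr: Point) -> bool:
        """Process one tracked person's movement between two frames.

        Returns True when the occupancy exceeds the limit, i.e. an alert is due.
        """
        if _segments_intersect(prev, curr, self.gate_start, self.gate_end):
            # Assumed convention: ending on the positive side of the gate is an entry.
            if _side(self.gate_start, self.gate_end, curr) > 0:
                self.inside += 1
            else:
                self.inside = max(0, self.inside - 1)
        return self.inside > self.limit


if __name__ == "__main__":
    counter = CapacityCounter(gate_start=(0, 100), gate_end=(200, 100), limit=2)
    for _ in range(3):
        alert = counter.update(prev=(100, 50), curr=(100, 150))
    print(counter.inside, alert)
```

In a real deployment the previous and current centroids would come from the tracker, and separate gates could be configured for the entrance and the exit.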

3.1.2.   Social Distancing System

        This reference implementation demonstrates a retail social distance application that identifies and measures the distance between two persons in a retail setting. If the distance between the two points is less than a value previously specified by the user, an alarm is triggered (Intel, 2020b). Figure 3 shows the block diagram of the social distance system.

Figure 3 Block Diagram of Social Distancing System (Intel, 2020b)

        The system processes the video frame by frame until the stream is complete. A DNN model detects people in the frame of interest, and another DNN model extracts features from them so they can be tracked. The system then calculates the distance between each pair of identified people based on their position, size, and viewpoint to check whether it falls below the minimum social distance threshold.
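As a rough illustration of the distance check, the sketch below estimates the distance between two detected people from their bounding boxes. It is a simplification under stated assumptions, not the method of the reference implementation: it assumes every person is about 1.7 m tall (`ASSUMED_PERSON_HEIGHT_M`, a hypothetical constant), uses the box height to convert pixels to metres, and ignores the perspective (viewpoint) correction mentioned above.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Box = Sequence[float]  # (x_min, y_min, x_max, y_max) in pixels

# Assumption for illustration only: an average standing person is about 1.7 m tall.
ASSUMED_PERSON_HEIGHT_M = 1.7


def approx_distance_m(box_a: Box, box_b: Box) -> float:
    """Approximate ground distance between two people from their bounding boxes."""
    # Use the bottom-centre of each box as the person's position on the floor.
    ax, ay = (box_a[0] + box_a[2]) / 2, box_a[3]
    bx, by = (box_b[0] + box_b[2]) / 2, box_b[3]
    pixel_dist = math.hypot(ax - bx, ay - by)
    # Convert pixels to metres using the mean box height as a scale reference.
    mean_height_px = ((box_a[3] - box_a[1]) + (box_b[3] - box_b[1])) / 2
    return pixel_dist * ASSUMED_PERSON_HEIGHT_M / mean_height_px


def find_violations(boxes: List[Box], min_distance_m: float = 1.0) -> List[Tuple[int, int]]:
    """Return index pairs of people standing closer than min_distance_m."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(boxes), 2)
            if approx_distance_m(a, b) < min_distance_m]


if __name__ == "__main__":
    people = [(0, 0, 50, 170), (60, 0, 110, 170), (500, 0, 550, 170)]
    print(find_violations(people, min_distance_m=1.0))
```

The scale estimate degrades when people are partly occluded or far from the camera, which is one reason a viewpoint-aware calculation is preferable in practice.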

3.1.3.   Line Monitoring System

        This reference implementation demonstrates a retail application that counts the number of people waiting in a retail store's queue. The application estimates the number of persons in a line by performing an intersection between the people who have been identified in the frame (Intel, 2020c).

Figure 4 Block Diagram of Line Monitoring System (Intel, 2020c)

      Figure 4 shows the block diagram of the Line Monitoring system. This method has been integrated with the Capacity Limit System to provide a more efficient and user-friendly system for retailers. The system detects individuals waiting outside the retail establishment prior to entering.
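One plausible reading of the intersection step is sketched below. It assumes, as an illustration only, that each detected person's bounding box is intersected with a user-defined rectangular queue zone, and that a person counts as queuing when at least half of their box lies inside the zone. The function names and the `min_overlap` parameter are hypothetical.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x_min, y_min, x_max, y_max) in pixels


def intersection_area(a: Box, b: Box) -> float:
    """Area of overlap between two axis-aligned boxes (0 if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def count_in_queue(person_boxes: List[Box], queue_zone: Box, min_overlap: float = 0.5) -> int:
    """Count people whose box overlaps the queue zone by at least min_overlap of its area."""
    count = 0
    for box in person_boxes:
        area = (box[2] - box[0]) * (box[3] - box[1])
        if area > 0 and intersection_area(box, queue_zone) / area >= min_overlap:
            count += 1
    return count


if __name__ == "__main__":
    zone = (0, 0, 100, 100)
    people = [(10, 10, 30, 60), (90, 0, 130, 100), (200, 200, 220, 260)]
    print(count_in_queue(people, zone))
```

Lowering `min_overlap` makes the count more permissive for people standing at the edge of the zone.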

3.2. Pre-trained Models

3.2.1. Pre-trained Model 1: People-Detection-Retail-0013        

   The people-detection-retail-0013 is a pre-trained model developed by Intel for person detection applications. This model was obtained from Open Model Zoo and can be downloaded using its model downloader. It has 88.62 percent accuracy, uses the Caffe framework, and supports occluded pedestrians, among other features. The model is used in FP32 format, which is a single-precision floating-point format. It uses a MobileNetV2-like backbone with depth-wise convolutions to reduce the number of calculations required for the 3x3 convolution block (Intel, 2021).
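The detector's raw output has to be turned into pixel boxes before any counting or distance logic can run. The sketch below shows that step under an assumption not stated in this paper: that the output follows the common SSD-style layout of rows `[image_id, label, confidence, x_min, y_min, x_max, y_max]` with coordinates normalized to the range 0 to 1, and that a negative `image_id` marks the end of valid rows. The function name and the `min_conf` threshold are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

PixelBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def parse_detections(rows: Iterable[Sequence[float]],
                     frame_w: int,
                     frame_h: int,
                     min_conf: float = 0.5) -> List[PixelBox]:
    """Convert assumed SSD-style detection rows into pixel bounding boxes."""
    boxes: List[PixelBox] = []
    for image_id, _label, conf, x0, y0, x1, y1 in rows:
        if image_id < 0:
            # Assumed end-of-detections marker; the remaining rows are padding.
            break
        if conf < min_conf:
            continue  # discard low-confidence detections
        boxes.append((round(x0 * frame_w), round(y0 * frame_h),
                      round(x1 * frame_w), round(y1 * frame_h)))
    return boxes


if __name__ == "__main__":
    raw = [
        [0, 1, 0.90, 0.25, 0.25, 0.50, 0.75],
        [0, 1, 0.30, 0.10, 0.10, 0.20, 0.20],
        [-1, 0, 0.0, 0.0, 0.0, 0.0, 0.0],
    ]
    print(parse_detections(raw, frame_w=640, frame_h=480))
```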

3.2.2. Pre-trained Model 2: People-Reidentification-Retail-0300

   Intel also developed pre-trained models for reidentification purposes. In this work, the people-reidentification-retail-0300 model is used. This model was obtained from Open Model Zoo and can be downloaded using its model downloader. It takes a full-body image as input and outputs an embedding vector that can be used to compare two images using cosine distance. The model is built on the OmniScaleNet backbone for rapid inference. A single reidentification head extracted from the 1/16 scale feature map generates a 512-float embedding vector (Intel, 2020e).
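The cosine-distance comparison of embedding vectors can be sketched as follows. The matching function `match_track` and its `max_distance` threshold of 0.3 are hypothetical choices for illustration; the paper does not state how matches are decided. Short 3-element vectors stand in for the 512-float embeddings.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal.

    Assumes both vectors are non-zero.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def match_track(embedding: Sequence[float],
                known_tracks: Dict[int, Sequence[float]],
                max_distance: float = 0.3) -> Optional[int]:
    """Return the id of the closest known track, or None if none is close enough."""
    best_id, best_dist = None, max_distance
    for track_id, reference in known_tracks.items():
        dist = cosine_distance(embedding, reference)
        if dist < best_dist:
            best_id, best_dist = track_id, dist
    return best_id


if __name__ == "__main__":
    tracks = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0]}
    print(match_track([0.9, 0.1, 0.0], tracks))  # close to track 1
    print(match_track([0.0, 0.0, 1.0], tracks))  # no track is close enough
```

A detection that matches no existing track would typically be registered as a new person.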

Results and Discussion

    The system was assessed in five different experiments to evaluate its efficiency. The experiments focus on the accuracy of detecting people, the calculation of social distance, and the counting of people entering and exiting a particular premise. Five volunteers were involved in the experiments. The findings are presented and discussed in Section 4.1. Section 4.2 evaluates the overall system performance with 6 videos in different environments.

4.1. Experimental Results

4.1.1.   Experiment 1: Angle and Distance Test

      This test is designed to determine the optimal camera angle and distance for detecting people. The camera was tested in three different positions. The first camera setup is close-range, at the body level of the participants. The second camera setup is medium-range and provides a "bird's-eye" view of the entire specified area. In the third setup, the camera was positioned at the top of the area, pointing downward, providing an overhead perspective of the people in that location. This experiment was conducted using the Social Distancing System.

Figure 5 The First Camera Setup (Close-Range, Body-Level)

     The output frame of the first camera setup is shown in Figure 5. As illustrated in the figure, all people are identified except one. A single green bounding box is drawn around two people, as indicated by the arrow in Figure 5. This demonstrates that the system cannot detect an individual who is overlapped by another individual in front of them.

Figure 6 The Second Camera Setup (Medium-Range, Bird's-Eye View)

     The output frame of the second camera setup is shown in Figure 6. The camera is positioned above the volunteers' heads and a short distance from the first setup. As can be seen, all the volunteers are visible from the camera's perspective, were detected by the system, and were enclosed in red and yellow bounding boxes.