• International Journal of Technology (IJTech)
  • Vol 10, No 5 (2019)

A Novel Approach in Low-cost Motion Capture System using Color Descriptor and Stereo Webcam

Muhammad Imanullah, Eko Mulyanto Yuniarno, Adri Gabriel Sooai

Cite this article as:
Imanullah, M., Yuniarno, E.M., Sooai, A.G., 2019. A Novel Approach in Low-cost Motion Capture System using Color Descriptor and Stereo Webcam. International Journal of Technology. Volume 10(5), pp. 942-952

Muhammad Imanullah, Department of Electrical Engineering, Institut Teknologi Sepuluh Nopember, Sukolilo, Surabaya 60111, Indonesia
Eko Mulyanto Yuniarno, Department of Computer Engineering, Institut Teknologi Sepuluh Nopember, Sukolilo, Surabaya 60111, Indonesia
Adri Gabriel Sooai, Department of Computer Science, Universitas Katolik Widya Mandira, Kupang City 85225, Indonesia

Animated motion has become crucial in electronic entertainment products such as games, animated movies, and simulations. Producing such motion typically involves high cost and demanding setup requirements. This research proposes a low-cost motion capture system that uses a stereo webcam (two identical webcams placed side by side) and common household items. The system test was designed specifically for shadow puppet theater. The stereo arrangement allows the depth of a marker to be obtained by stereo camera triangulation. Image processing is needed to improve object detection and feature matching: once images are captured from the two webcams, they are inverted and color-filtered to detect the markers, which then serve as the features to be matched. Each feature carries a unique descriptor based on its color composition. Features in the two images are compared to find their corresponding matches. Once matched, their depth and position can be calculated and recorded as a 3D representation that is ready to be processed as motion data. The proposed system is reasonably efficient, achieving an average accuracy of 83.5% using webcams that cost around $7.69.

Feature matching; Marker; Motion capture; Stereo camera


Electronic entertainment products, such as games, animated movies, and simulations, see new innovations every year. Basic assets such as 3D models, 3D environments, and animated motion have become crucial in their development process. Acquiring these assets, especially animated motion, may require motion capture devices, which vary widely in cost and accuracy but are mostly expensive. Many attempts, such as those by Huang et al. (2018) and Zecca et al. (2013), have focused on the main issue of capturing motion without any cost-related consideration. They use small IMU sensors attached to human body joints to acquire their orientation and position displacement, which facilitates motion capturing. Other attempts, by Budiman et al. (2005), Chao et al. (2009), and Guarisa et al. (2016), have addressed the problem of cost. Although these attempts were constrained by their cost requirements, unlike the latest-technology approaches of Huang et al. and Zecca et al., they provide alternatives in terms of project scale, method, and affordability.

Budiman et al. (2005) used a mean-shift algorithm to track detected objects, with a black curtain as the background and white circular markers placed on the lower body to make them easier to detect. They also used a camera calibration step to obtain the extrinsic camera matrices needed to find the global coordinates of each marker from two webcams. Chao et al. (2009) used a dynamic background subtraction technique to ease the segmentation of the human silhouette needed in 3D motion data reconstruction. Unlike Budiman et al., Chao et al. used four cameras and a color-marker-based spatial calibration technique for faster and easier camera calibration, obtaining the fundamental matrices and calculating the relative coordinate system. As for the number of cameras, Guarisa et al. (2016) used only one webcam, in keeping with their aim of low-cost motion capture. They developed their motion capture system specifically to recognize facial pose, using markers in contrasting colors other than black and white. They successfully developed a low-cost, open-source facial motion capture system, although the lack of depth values is unavoidable because only one camera is used.

Following the examples of the studies above, we decided to use two low-cost webcams (costing below $5 each) and to employ stereo camera triangulation to estimate depth, instead of calibrating intrinsic and extrinsic camera matrices, since the cameras will not be set in a fixed position. The markers are made in various distinct colors to improve the feature matching process against a white background. We chose not to use a black background as Budiman et al. (2005) did because most rooms in typical houses are painted white. We also propose an image processing method to improve the detection of markers, and color descriptors to match the markers detected in both cameras.


The use of stereo camera triangulation to find depth in a low-cost motion capture setup is reasonably effective, since we obtain an accuracy of 83.5% using webcams that cost around $7.69. We consider this an acceptable result for obtaining 3D point reconstruction data, with the calculated errors shown in Table 1. Adjusting parameters such as the baseline (b) and focal length (f) is crucial to getting better output.
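To illustrate why the baseline (b) and focal length (f) matter, the following is a minimal sketch of depth recovery from disparity. It assumes an ideal, rectified, parallel stereo pair under the pinhole model, where Z = f·b/d; the function name, the principal point parameters, and the sample values are hypothetical and are not taken from the experiments.

```python
def triangulate(x_left, x_right, y, f, b, cx=0.0, cy=0.0):
    """Return (X, Y, Z) of a marker seen by an ideal parallel stereo pair.

    x_left, x_right: pixel column of the same marker in the left/right image
    y: pixel row of the marker (assumed equal in both images)
    f: focal length in pixels; b: baseline, in the unit wanted for the output
    cx, cy: principal point in pixels (assumed identical for both webcams)
    """
    d = x_left - x_right  # disparity in pixels
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = f * b / d              # depth grows as disparity shrinks
    x = (x_left - cx) * z / f  # back-project the pixel to metric X
    y3 = (y - cy) * z / f      # back-project the pixel to metric Y
    return x, y3, z


# Hypothetical values: f = 700 px, b = 6 cm, marker at columns 380 and 340.
point = triangulate(380, 340, 240, f=700, b=6.0, cx=320, cy=240)
print(point)  # (9.0, 0.0, 105.0), i.e. the marker is 105 cm from the cameras
```

Because Z is inversely proportional to d, a one-pixel matching error has a larger effect on distant markers, which is one reason a wider baseline or longer focal length can improve the output.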

The combination of a white background, colored markers, and image processing methods such as inversion and color filtering effectively supports feature extraction (as shown in Figure 8, where all markers can be extracted), although shadows and white balance issues worsen the result, as can be seen in Figure 10b. These image processing methods can be considered suitable for distinguishing colored markers from a white background. With them, object detection within the AForge.NET framework works more accurately.
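The effect of inversion followed by color filtering can be sketched in a few lines. This is an illustrative pure-Python example on a tiny hand-made image, not the AForge.NET pipeline used in the experiments; the pixel values and the channel ranges are assumptions chosen for the demonstration.

```python
def invert(image):
    """Invert an 8-bit RGB image given as rows of (r, g, b) tuples."""
    return [[(255 - r, 255 - g, 255 - b) for (r, g, b) in row] for row in image]


def color_filter(image, r_range, g_range, b_range):
    """Keep pixels whose channels all lie in the inclusive ranges; black out the rest."""
    ranges = (r_range, g_range, b_range)

    def keep(pixel):
        return all(lo <= c <= hi for c, (lo, hi) in zip(pixel, ranges))

    return [[p if keep(p) else (0, 0, 0) for p in row] for row in image]


# Hypothetical one-row frame: white wall, red marker, light gray shadow.
WHITE, RED_MARKER, SHADOW = (255, 255, 255), (220, 30, 30), (200, 200, 200)
frame = [[WHITE, RED_MARKER, SHADOW]]

inverted = invert(frame)
# White -> (0, 0, 0), red marker -> (35, 225, 225), shadow -> (55, 55, 55)

# Assumed ranges that select the inverted red marker only.
mask = color_filter(inverted, (0, 80), (180, 255), (180, 255))
print(mask)  # [[(0, 0, 0), (35, 225, 225), (0, 0, 0)]]
```

After inversion the white background becomes black, so only the marker remains bright in the filtered result. A darker shadow or a color cast from poor white balance would shift the inverted values and could fall inside or outside the chosen ranges, which is consistent with the degradation noted above.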

Together with the variously colored markers, the color descriptor effectively helps distinguish each marker and improves the feature matching process, since all markers in Figures 9 and 10 are successfully distinguished. Feature matching is a crucial part of this low-cost motion capture setup because it locates the corresponding features needed to calculate the disparity (d). We hope our proposed methods will be useful for improving the effectiveness or performance of motion capture-related projects.
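As a rough illustration of how a color-based descriptor can drive matching and yield the disparity (d), consider the sketch below. The descriptor used here (the normalized mean color of a marker's pixels) and the greedy nearest-neighbor matching are assumptions made for the example; the exact descriptor and matching rule of the proposed system may differ, and all names and values are hypothetical.

```python
import math


def color_descriptor(pixels):
    """Mean color of a marker's pixels, normalized so the channels sum to 1."""
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    total = sum(mean) or 1.0  # avoid division by zero for an all-black marker
    return tuple(m / total for m in mean)


def match_markers(left, right):
    """Greedily pair each left marker with the closest unused right marker.

    left, right: lists of dicts with keys "x" (pixel column) and "desc".
    Returns a list of (left_index, right_index, disparity).
    """
    pairs, used = [], set()
    for i, lm in enumerate(left):
        candidates = [
            (math.dist(lm["desc"], rm["desc"]), j)
            for j, rm in enumerate(right)
            if j not in used
        ]
        if not candidates:
            break  # more left markers than right markers
        _, j = min(candidates)
        used.add(j)
        pairs.append((i, j, lm["x"] - right[j]["x"]))
    return pairs


# Hypothetical detections: a red and a blue marker, listed in different order.
left = [
    {"x": 380, "desc": color_descriptor([(220, 30, 30), (210, 40, 35)])},
    {"x": 150, "desc": color_descriptor([(30, 40, 220)])},
]
right = [
    {"x": 112, "desc": color_descriptor([(35, 45, 210)])},
    {"x": 340, "desc": color_descriptor([(215, 35, 40)])},
]
print(match_markers(left, right))  # [(0, 1, 40), (1, 0, 38)]
```

Normalizing the mean color makes the descriptor less sensitive to overall brightness differences between the two webcams, although it does not compensate for white balance differences.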


AForge.NET, 2012. AForge.NET Framework. Available Online at http://aforgenet.com/framework/

Bindu, S., Prudhvi, S., Hemalatha, G., Sekhar, N.R., Nanchariah, V., 2014. Object Detection from Complex Background Image using Circular Hough Transform. International Journal of Engineering Research and Applications, Volume 4(4), pp. 23-28

Budiman, R., Bennamoun, M., Huynh, D., 2005. Low Cost Motion Capture. In: B. McCane (Ed.), Proceedings of IVCNZ 05 (Dunedin N.Z. Edition, Volume 1). Dunedin: I & VCNZ.

Chao, S.-P., Chen, Y.-Y., Chen, W.-C., 2009. The Cost-effective Method to Develop a Real-time Motion Capture System. In: Fourth International Conference on Computer Sciences and Convergence Information Technology: IEEE

Cunha, A., 2009. A Brief Introduction to Image Processing. Center for Advanced Computing Research California Institute of Technology

Das, D., Saharia, S., 2014. Implementation and Performance Evaluation of Background Subtraction Algorithms. International Journal on Computational Sciences & Applications (IJCSA), Volume 4(2), pp. 49-55

Guarisa, G.P., Angonese, A.T., Judice, S.F.P.P., 2016. Low-cost and Open Source Optical Capture System of Facial Performance. In: SBC – Proceedings of SBGames, ISSN: 2179-2259, Brazil

Heale, R., Forbes, D., 2013. Understanding Triangulation in Research. BMJ Journals Evidence-Based Nurs, Volume 16(4), pp. 98

Huang, Y., Kaufmann, M., Aksan, E., Black, M.J., Hilliges, O., Pons-Moll, G., 2018. Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time. ACM Transactions on Graphics, Volume 37(6), pp. 1-15

Kamencay, P., Breznan, M., Jarina, R., Lukac, P., Zachariasova, M., 2012. Improved Depth Map Estimation from Stereo Images based on Hybrid Method. Radioengineering, Volume 21(1), pp. 70-79

Kangas, V., 2011. A Comparison of Local Feature Detectors and Descriptors for Visual Object Categorization. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), Tsukuba, Japan

Ko, J., Ho, Y.-S., 2016. Stereo Matching using Census Transform of Adaptive Window Sizes with Gradient Images. In: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Wangju Institute of Science and Technology, Republic of Korea

Lee, J., Jun, D., Eem, C., Hong, H., 2016. Improved Census Transform for Noise Robust Stereo Matching. Optical Engineering, Volume 55(6), pp. 1-10

Li, S., Lihong, H., 2014. Research of Background Segmentation Method in Sports Video. TELKOMNIKA Indonesian Journal of Electrical Engineering, Volume 12(6), pp. 4274-4282

Miettinen, J.O., 2018. Shadow and Puppet Theatre. Asian Traditional Theatre and Dance, ISBN 978-952-7218-23-5, Theatre Academy of the University of the Arts Helsinki

Muhlmann, K., Maier, D., Hesser, J., Manner, R., 2002. Calculating Dense Disparity Maps from Color Stereo Images, an Efficient Implementation. In: Proceedings IEEE Workshop on Stereo and Multi-baseline Vision (SMBV 2001), August 2002, Kauai, HI, USA

Shen, Y., Peng, P., Gao, W., 2012. 3D Reconstruction from a Single Family Camera. In: IEEE Fifth International Conference on Advanced Computational Intelligence (ICACI), October 2012, Nanjing, Jiangsu, China

Zecca, M., Saito, K., Sessa, S., Bartolomeo, L., Lin, Z., Cosentino, S., Ishii, H., Ikai, T., Takanishi, A., 2013. Use of an Ultra-miniaturized IMU-based Motion Capture System for Objective Evaluation and Assessment of Walking Skills. In: The 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), September 2013, Osaka, Japan