Published at : 30 Oct 2019
Volume : IJtech
Vol 10, No 5 (2019)
DOI : https://doi.org/10.14716/ijtech.v10i5.2789
Muhammad Imanullah | Department of Electrical Engineering, Institut Teknologi Sepuluh Nopember, Sukolilo, Surabaya 60111, Indonesia
Eko Mulyanto Yuniarno | Department of Computer Engineering, Institut Teknologi Sepuluh Nopember, Sukolilo, Surabaya 60111, Indonesia
Adri Gabriel Sooai | Department of Computer Science, Universitas Katolik Widya Mandira, Kupang City 85225, Indonesia
Animated motion has become
crucial in electronic entertainment products such as games,
animated movies, and simulations. Producing such motion usually involves
high cost and demanding setup requirements. This research proposes a low-cost
motion capture system that uses a stereo webcam (two identical webcams placed
side by side) and everyday household tools. The system test was specifically
designed for shadow puppet theaters. The two webcams acquire the depth of a
marker by stereo camera triangulation. Image processing is needed to improve
object detection and feature matching. Once images are captured from the two
webcams, they are inverted and color-filtered to detect the markers, which
then serve as the features to be matched. Each feature is given a unique
descriptor based on its color composition. Features in both images are then
compared to find their corresponding matches. Once matched, their depth and
position can be calculated and recorded as a 3D representation that is ready
to be processed as motion data. The proposed system is reasonably efficient:
it achieves an average accuracy of 83.5% using webcams that cost around $7.69.
Feature matching; Marker; Motion capture; Stereo camera
Electronic entertainment products, such as games, animated movies, and simulations, see innovations every year. Basic assets such as 3D models, 3D environments, and animated motion have become crucial in their development process. Acquiring these assets, especially animated motion, may require motion capture devices, which vary widely in cost and accuracy, though most are expensive. Many attempts, such as those by Huang et al. (2018) and Zecca et al. (2013), have focused on the main issue of capturing motion without any cost-related consideration. They use small IMU sensors attached to human body joints to acquire their orientation and position displacement, which facilitates motion capture. Other attempts, by Budiman et al. (2005), Chao et al. (2009), and Guarisa et al. (2016), have addressed the problem of cost. Although working under cost constraints kept these attempts from using the latest technology, as Huang et al. and Zecca et al. did, they provide alternatives in terms of project scale, method, and affordability.
Budiman
et al. (2005) used a
mean-shift algorithm to track detected objects and used a black curtain as the
background, with white circular markers placed on the lower body to make
them easier to detect. They also used a camera calibration step to get the
extrinsic matrix of each camera, which is needed to find the global coordinates
of each marker from two webcams. Chao et al. (2009) used a dynamic background
subtraction technique to ease the segmentation of the human silhouette needed
in 3D motion data reconstruction. Unlike Budiman et al., Chao et al. used four
cameras and a color-marker-based spatial calibration technique for faster and
easier camera calibration to get the fundamental matrices and calculate the
relative coordinate system. As for the number of cameras used, Guarisa et al.
(2016) used only one webcam, in keeping with their aim of low-cost motion
capture. They developed their motion capture specifically to recognize
facial pose, with markers in contrasting colors other than black and white. They
successfully developed a low-cost, open-source facial motion capture system,
although the lack of depth values is unavoidable since only one camera is used.
Following
the examples of the studies above, we decided to use two low-cost webcams
(cost below $5 each) and employ stereo camera triangulation to estimate
the depth instead of calibrating intrinsic and extrinsic camera matrices, since
the cameras will not be set in a fixed position. The markers are made in
clearly different colors to improve the feature matching process against a white
background. We chose not to use a black background as Budiman et al. (2005) did because most rooms
in typical houses are painted white. We also propose an image processing
method to improve the detection of markers, and color descriptors to match
the markers detected in both cameras.
The
use of stereo camera triangulation to find depth in a low-cost motion capture
setup is reasonably effective since we can get an accuracy of 83.5% using
webcams that cost around $7.69. We consider this an acceptable result for
obtaining 3D point reconstruction data, with the calculated error shown in
Table 1. Adjusting parameters such as the baseline (b) and focal length (f) is
crucial to getting better output.
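
To illustrate why the baseline (b) and focal length (f) matter, the sketch below applies the standard rectified-stereo relation Z = f · b / d to one matched marker. It is a minimal illustration, not the implementation used in this work: it assumes two parallel, identical cameras, a focal length expressed in pixels, and a hypothetical principal point (cx, cy); the function and parameter names are ours.

```python
def triangulate(x_left, x_right, y, focal_px, baseline, cx=0.0, cy=0.0):
    """Estimate a 3D point from one marker seen by two parallel cameras.

    Assumptions (not taken from the paper): the images are rectified,
    focal_px is the focal length in pixels, baseline is the distance
    between the two camera centers, and (cx, cy) is the principal point.
    The result is expressed in the same unit as `baseline`.
    """
    disparity = x_left - x_right  # horizontal shift d between the two views
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")

    z = focal_px * baseline / disparity   # depth: Z = f * b / d
    x = (x_left - cx) * z / focal_px      # back-project the pixel column
    y3d = (y - cy) * z / focal_px         # back-project the pixel row
    return (x, y3d, z)


# Example: f = 800 px, b = 6 cm, marker at column 340 (left) and 300 (right).
point = triangulate(340, 300, 240, focal_px=800, baseline=6.0, cx=320, cy=240)
print(point)
```

Because depth is inversely proportional to the disparity, a one-pixel matching error has a larger effect on distant markers, which is one reason b and f need to be tuned to the working distance.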
The
combination of a white background, colored markers, and image processing methods
such as inversion and color filtering effectively supports feature
extraction (as shown in Figure 8, where all markers are extracted),
even though shadows and white balance issues worsen
the result, as can be seen in Figure 10b. These image processing methods can be
considered suitable for distinguishing colored markers from a white background.
With these methods, object detection within the Aforge.NET
framework works more accurately.
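
The effect of inversion on a white background can be sketched in a few lines. The example below is a simplified illustration in plain Python, not the Aforge.NET pipeline used in this work: the image is a hypothetical nested list of RGB tuples, and the filter is an assumed brightness threshold applied after inversion.

```python
def invert(image):
    """Invert every RGB pixel so that a white background becomes black."""
    return [[(255 - r, 255 - g, 255 - b) for (r, g, b) in row] for row in image]


def marker_mask(image, threshold=80):
    """Return a boolean mask that is True where a colored marker is likely.

    After inversion, white background pixels are close to (0, 0, 0), while
    saturated marker colors keep at least one strong channel. The threshold
    value is an assumption chosen for this example.
    """
    inverted = invert(image)
    return [[max(pixel) >= threshold for pixel in row] for row in inverted]


# 2x2 test image: white background, red marker, light shadow, blue marker.
image = [
    [(255, 255, 255), (255, 0, 0)],
    [(200, 200, 200), (0, 0, 255)],
]
print(marker_mask(image))
```

In this sketch a light shadow stays below the threshold, but a darker shadow would pass it and be mistaken for a marker, which is consistent with the degradation noted for Figure 10b.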
Along
with the differently colored markers, the color descriptor effectively helps us
distinguish each marker and improves the feature matching process: all markers in
Figures 9 and 10 are successfully distinguished. Feature matching is a
crucial part of this low-cost motion capture setup since it lets us locate the
corresponding features needed to calculate the disparity (d). We hope our
proposed methods will be useful in motion capture-related projects to improve
effectiveness or performance.
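
Color-based matching and the resulting disparity (d) can be sketched as follows. This is a hypothetical illustration rather than the descriptor defined in this work: it assumes each detected marker is summarized by its center and mean RGB color, and it pairs markers greedily by smallest Euclidean color distance.

```python
import math


def match_markers(left, right, max_distance=60.0):
    """Pair markers from the left and right images by color similarity.

    Each marker is a dict with a "center" (x, y) and a mean "color" (r, g, b);
    this layout and the max_distance cutoff are assumptions for the example.
    Returns a list of (left_index, right_index, disparity) tuples.
    """
    matches = []
    used = set()  # right-image markers that are already paired
    for i, left_marker in enumerate(left):
        best_j, best_dist = None, max_distance
        for j, right_marker in enumerate(right):
            if j in used:
                continue
            dist = math.dist(left_marker["color"], right_marker["color"])
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is not None:
            used.add(best_j)
            # Disparity d: horizontal offset of the same marker in both views.
            disparity = left_marker["center"][0] - right[best_j]["center"][0]
            matches.append((i, best_j, disparity))
    return matches


left = [
    {"center": (340, 240), "color": (250, 10, 10)},   # red marker
    {"center": (200, 100), "color": (10, 10, 250)},   # blue marker
]
right = [
    {"center": (160, 100), "color": (20, 15, 240)},   # blue marker
    {"center": (300, 240), "color": (240, 20, 5)},    # red marker
]
print(match_markers(left, right))
```

Giving every marker a clearly different color keeps the color distances between non-corresponding markers large, so a simple nearest-color rule like this one is less likely to confuse them.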
Aforge.NET, 2012. Aforge.NET Framework. Available Online at http://aforgenet.com/framework/
Bindu, S., Prudhvi, S., Hemalatha, G., Sekhar, N.R., Nanchariah, V., 2014. Object Detection from Complex Background Image using Circular Hough Transform. International Journal of Engineering Research and Applications, Volume 4(4), pp. 23–28
Budiman, R., Bennamoun, M., Huynh, D., 2005. Low Cost Motion Capture. In: B. McCane (Ed.), Proceedings of IVCNZ 05 (Dunedin N.Z. Edition, Volume 1). Dunedin: I & VCNZ.
Chao, S.-P., Chen, Y.-Y., Chen, W.-C., 2009. The Cost-effective Method to Develop a Real-time Motion Capture System. In: Fourth International Conference on Computer Sciences and Convergence Information Technology: IEEE
Cunha, A., 2009. A Brief Introduction to Image Processing. Center for Advanced Computing Research California Institute of Technology
Das, D., Saharia, S., 2014. Implementation and Performance Evaluation of Background Subtraction Algorithms. International Journal on Computational Sciences & Applications (IJCSA), Volume 4(2), pp. 49–55
Guarisa, G.P., Angonese, A.T., Judice, S.F.P.P., 2016. Low-cost and Open Source Optical Capture System of Facial Performance. In: SBC – Proceedings of SBGames, ISSN: 2179-2259, Brazil
Heale, R., Forbes, D., 2013. Understanding Triangulation in Research. BMJ Journals Evidence-Based Nurs, Volume 16(4), pp. 98
Huang, Y., Kaufmann, M., Aksan, E., Black, M.J., Hilliges, O., Pons-Moll, G., 2018. Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time. ACM Transactions on Graphics, Volume 37(6), pp. 1–15
Kamencay, P., Breznan, M., Jarina, R., Lukac, P., Zachariasova, M., 2012. Improved Depth Map Estimation from Stereo Images based on Hybrid Method. Radioengineering, Volume 21(1), pp. 70–79
Kangas, V., 2011. A Comparison of Local Feature Detectors and Descriptors for Visual Object Categorization. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), Tsukuba, Japan
Ko, J., Ho, Y.-S., 2016. Stereo Matching using Census Transform of Adaptive Window Sizes with Gradient Images. In: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Wangju Institute of Science and Technology, Republic of Korea
Lee, J., Jun, D., Eem, C., Hong, H., 2016. Improved Census Transform for Noise Robust Stereo Matching. Optical Engineering, Volume 55(6), pp. 1–10
Li, S., Lihong, H., 2014. Research of Background Segmentation Method in Sports Video. TELKOMNIKA Indonesian Journal of Electrical Engineering, Volume 12(6), pp. 4274–4282
Miettinen, J.O., 2018. Shadow and Puppet Theatre. Asian Traditional Theatre and Dance, ISBN 978-952-7218-23-5, Theatre Academy of the University of the Arts Helsinki
Muhlmann, K., Maier, D., Hesser, J., Manner, R., 2002. Calculating Dense Disparity Maps from Color Stereo Images, an Efficient Implementation. In: Proceedings IEEE Workshop on Stereo and Multi-baseline Vision (SMBV 2001), August 2002, Kauai, HI, USA
Shen, Y., Peng, P., Gao, W., 2012. 3D Reconstruction from a Single Family Camera. In: IEEE Fifth International Conference on Advanced Computational Intelligence (ICACI), October 2012, Nanjing, Jiangsu, China
Zecca, M., Saito, K., Sessa, S., Bartolomeo, L., Lin, Z., Cosentino, S., Ishii, H., Ikai, T., Takanishi, A., 2013. Use of an Ultra-miniaturized IMU-based Motion Capture System for Objective Evaluation and Assessment of Walking Skills. In: The 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), September 2013, Osaka, Japan