Development of Eye Fixation Points Prediction Model from Eye Tracking Data using Neural Network

Boy Nurtjahyo Moch., Komarudin Komarudin, Maulana Senjaya Susilo

Published: 27 Dec 2017
https://doi.org/10.14716/ijtech.v8i6.717

Boy Nurtjahyo Moch. - Department of Industrial Engineering, Faculty of Engineering, Universitas Indonesia
Komarudin Komarudin - Department of Industrial Engineering, Faculty of Engineering, Universitas Indonesia
Maulana Senjaya Susilo - Department of Industrial Engineering, Faculty of Engineering, Universitas Indonesia
Fixation points, the locations at which eye movements stop, can be extracted to generate valuable information about a picture or an object. This information identifies the area or part of the picture that attracts people's attention, which can then inform later decisions, for example in marketing. For this reason, a Neural Network (NN) model was developed in this study to predict the fixation points of a picture. Specifically, the authors experimented with various transfer and training functions in the NN to determine which combination produces the smallest error. The results show that the method is applicable in practice, since it produces a MAPE (Mean Absolute Percentage Error) of around 13–15% and an MSE (Mean Squared Error) of 0.9–1.1%.

Eye tracking; Fixation points; Neural network; MAPE; MSE


Several conclusions can be drawn from the analyses performed. In terms of the accuracy of the prediction model, as measured by the smallest error value, the best combination is the purelin-purelin transfer functions with the trainscg training function. This combination ranks first in the MSE calculations and second in the MAPE calculations. In terms of the computing performance of the prediction model, as measured by the smallest number of iterations and the shortest training duration, the best combination is the purelin-purelin transfer functions with the trainbfg training function. Moreover, the trainscg training function produces a smaller range of MAPE values than traingdx or trainbfg. Lastly, the trainbfg training function requires a shorter training duration and fewer iterations than traingdx or trainscg.

Several directions for future research can be proposed. Respondents may be more attracted to human faces than to other picture content. Therefore, future work could use pictures that include a human face as the research object. In this situation, the testing would be more accurate if additional tools were employed, for example the Viola-Jones face detector and the Felzenszwalb person detector. In addition, the training and testing carried out in this research are limited to combinations of inter-layer transfer functions and training functions. Therefore, more research should be performed to investigate different combinations of numbers of layers and network types.
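The two error measures used above to rank the function combinations, MAPE and MSE, can be illustrated with a minimal Python sketch. It uses the standard textbook definitions; the function names and the sample values are hypothetical and are not taken from the study, and any scaling or normalization the authors applied to their data before computing MSE is not stated here, so the numbers are not comparable to the reported results.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # The percentage error is undefined when the actual value is zero.
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error, in squared units of the input data."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return total / len(actual)


# Hypothetical coordinates (for illustration only, not data from the study).
actual_x = [100.0, 200.0, 400.0]
predicted_x = [110.0, 180.0, 400.0]

print(f"MAPE: {mape(actual_x, predicted_x):.2f}%")
print(f"MSE:  {mse(actual_x, predicted_x):.2f}")
```

Because MAPE is relative to the actual value while MSE is not, the two measures can rank the same set of models differently, which is consistent with one combination placing first on MSE and second on MAPE.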


This study was financially supported by Hibah PITTA 2017 from the Directorate of Research and Community Engagement, Universitas Indonesia.


