Histogram Equalization Implementation in the Preprocessing Phase on Optical Character Recognition

Authors

Peter Pangestu, Dennis Gunawan, Seng Hansun

Corresponding email: peterpangestu@live.com

Published: 31 Oct 2017
Volume: IJTech Vol 8, No 5 (2017)
DOI: https://doi.org/10.14716/ijtech.v8i5.877

Cite this article as:

Pangestu, P., Gunawan, D., Hansun, S., 2017. Histogram Equalization Implementation in the Preprocessing Phase on Optical Character Recognition. International Journal of Technology. Volume 8(5), pp. 947-956

Peter Pangestu: Computer Science Study Program, Faculty of Engineering and Informatics, Universitas Multimedia Nusantara, Jl. Scientia Boulevard, Gading Serpong, Tangerang, Banten 15811, Indonesia
Dennis Gunawan: Computer Science Study Program, Faculty of Engineering and Informatics, Universitas Multimedia Nusantara, Jl. Scientia Boulevard, Gading Serpong, Tangerang, Banten 15811, Indonesia
Seng Hansun: Computer Science Study Program, Faculty of Engineering and Informatics, Universitas Multimedia Nusantara, Jl. Scientia Boulevard, Gading Serpong, Tangerang, Banten 15811, Indonesia

Abstract

A 2014 report from Digital Marketing Philippines stated that the number of web applications with visual content as their main product has increased significantly. Image processing technology has also grown significantly. One example is optical character recognition (OCR), which converts the text in an image to plain text. However, a problem occurs when the image has low contrast and low exposure, which can leave information hidden in the image. To address this problem, histogram equalization is used to enhance the image’s contrast so that the hidden information becomes visible. Similar to X-ray scanning used in the medical field, histogram equalization processes scanned images that have low brightness and low contrast. In this study, histogram equalization was successfully implemented in the OCR preprocessing phase. Testing used a dataset of dark-background images containing low-light text; with histogram equalization, 74.95% of the information hidden in the images could be shown.

Keywords

Contrast enhancement; Histogram equalization; Image processing; Information hiding; Optical character recognition

Introduction

In 2014, Digital Marketing Philippines, a digital consultant, issued an analysis of a survey conducted on visuals used in web applications. The analysis explained that the use of visual content in web applications has developed very rapidly (Digitalmarketingphilippines.com, 2014). This suggests that visual message delivery methods, such as infographics and charts, can provide users with more information and draw greater attention. These developments have also been followed by the development of image processing technologies, such as face detection, object detection, and optical character recognition (OCR) (MathWorks.com, 2016).

Using OCR, the text in an image can be extracted through a process that includes a series of arrangements, background separation, and character matching (Sánchez et al., 2012). OCR can be achieved using machines that are currently popular. An OCR reader is not versatile; it must be supported by good conditions, and the image must meet suitable matching criteria. Light-dark settings and resolution also affect the performance of an OCR reader. The success of machine translation is also affected by the engine as well as the techniques used in the OCR (Abbyy-developers.eu, 2015). Therefore, OCR cannot run optimally if the input image does not meet the required conditions. Thus, preprocessing is very important in order to create an image that is ready to be processed. One important preprocessing step is improving the light and dark tones of the image.

No OCR reader can read all the conditions of an image perfectly (Abbyy-developers.eu, 2015). Various tests have been conducted with a variety of datasets (iapr-tc11.org, 2015). However, the majority of the images contained in the dataset collection have had fairly good image conditions (quite bright and good contrast), and they have supported the delivery of clear information from the image (no hidden information). Therefore, to ensure that OCR can be used for a variety of image conditions, improvements in the color, light, and dark elements used in an image (color adjustment) are needed. Some of the methods commonly used to improve the condition of an image include histogram equalization, Wiener filtering, median filtering, decorrelation stretch, and unsharp mask filters (MathWorks.com, 2016).

In the present study, histogram equalization was selected as the image enhancement method because it is similar to the X-ray scanning used in the medical field to scan organs. Histogram equalization was used to clarify the difference between the background and object regions and to reveal information hidden in the images because of low light and low contrast (Akhlis & Sugiyanto, 2011). In addition, this method is considered fairly common and easy to apply to an image (Alginahi, 2010).
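For reference, the standard discrete form of the histogram equalization transformation, as given in textbooks such as Gonzalez & Woods (2008), can be written as follows. The symbols are introduced here only for illustration and are not taken from the present study: L is the number of gray levels (256 for an 8-bit image), M × N is the image size in pixels, n_j is the number of pixels with input gray level r_j, and s_k is the output gray level assigned to r_k.

\[
s_k = T(r_k) = (L - 1) \sum_{j=0}^{k} \frac{n_j}{MN}, \qquad k = 0, 1, \dots, L - 1
\]

Because s_k depends on the cumulative pixel count, gray levels that hold many pixels are pushed further apart in the output, which is the effect relied on to separate low-light text from a dark background.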

Experimental Methods

The research methodology used in the present study was implemented in the following stages.

3.1. Learning and Consulting

A literature review was conducted to identify previous research on this topic. This phase was done by collecting supporting data associated with the present research study. The data collection was done by reviewing various types of scientific work, such as books, journals, and articles. This phase was carried out so that the research study could be conducted in accordance with the provisions presented in previous studies in order to produce a valid conclusion.

3.2. Designing the Application and Identifying the Analysis Requirements

After collecting the supporting data, we conducted a needs analysis to determine the standard to be used in the research. In addition to the analysis, we designed the application that would serve as the medium for the study. This design produced several documents, namely flowcharts and the structure of the storage table. Figure 6 shows the procedure we followed to implement the histogram equalization method used in this research.

Figure 6 Flowchart of the histogram equalization method used in the study
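Figure 6 is not reproduced in this text, so the exact steps of the authors' implementation are not shown here. As a minimal sketch only, the textbook cumulative-histogram mapping (Gonzalez & Woods, 2008) might look like the Python below. The function name `equalize_gray`, the list-of-rows image representation, and the 8-bit grayscale input are assumptions made for illustration; they are not details from the study, whose program was written in PHP.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit gray levels (assumed representation)


def equalize_gray(image: Image, levels: int = 256) -> Image:
    """Histogram-equalize a grayscale image (illustrative sketch)."""
    total = sum(len(row) for row in image)
    if total == 0:
        return []

    # Step 1: count how many pixels fall on each gray level.
    histogram = [0] * levels
    for row in image:
        for value in row:
            histogram[value] += 1

    # Step 2: map each level to round((levels - 1) * cumulative / total),
    # computed with integer arithmetic to avoid floating-point rounding.
    mapping = [0] * levels
    cumulative = 0
    for level, count in enumerate(histogram):
        cumulative += count
        mapping[level] = ((levels - 1) * cumulative + total // 2) // total

    # Step 3: replace every pixel by its mapped level.
    return [[mapping[value] for value in row] for row in image]


# Hypothetical 4x4 image: background 0x00 with four "text" pixels at 0x10.
dark = [
    [0, 0, 0, 0],
    [0, 16, 16, 0],
    [0, 16, 16, 0],
    [0, 0, 0, 0],
]
bright = equalize_gray(dark)
print(bright[0][0], bright[1][1])  # prints "191 255"
```

In this toy case the gap between text and background grows from 16 gray levels to 64 (191 versus 255), although the background itself also becomes lighter.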

3.3. Programming

After establishing the design and determining the analysis requirements, we programmed the application. The application served as the interface between the users and the systems used for implementation. It was designed so that users can enter their own samples and obtain a rapid analysis of them.

3.4. Testing and Debugging

After completing the programming stage, the application was tested. The tests were conducted using all the functions that were made in the programming stage (Stage 3). In addition to testing all the functions, we also performed tests on all the possibilities that could occur in the application when it is used by a user.

3.5. Collecting Samples

After the testing was completed, the application was deemed ready for real-time use. We then collected the samples by searching for random samples that met the needs identified when designing the application and identifying the analysis requirements (Stage 2).

3.6. Analysis of the Test Results

After obtaining a variety of samples from users, we analyzed the data to determine the impact of applying the histogram equalization method used in the study and to draw conclusions.

3.7. Report and Documentation

To complete the research activities, we wrote a report to document the procedures and findings of the research study.

Results and Discussion

After designing and implementing the program, further testing was done using the collected samples. The samples, gathered by searching for dark-background images, tended to be predominantly black, low in contrast, and uncluttered. The analysis aimed to determine whether these image conditions suit histogram equalization as it is generally applied. The tests were performed offline using a sample dataset of images with a dark, predominantly black background (RGB #000000). Text was then added to each image. The text color was chosen by adding the hexadecimal value 10 to each of the red, green, and blue (RGB) components of the background; for example, a background of RGB #000000 gives text of RGB #101010. The test results are presented in Table 1.
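The color rule described above can be sketched as follows. The function name `text_color_for` and the clamping of each component at FF are assumptions added for illustration; the source gives only the #000000 to #101010 example.

```python
def text_color_for(background: str, step: int = 0x10) -> str:
    """Return the background color with `step` added to each RGB component.

    Components are clamped at 0xFF (an assumption; the source does not say
    what happens when a component would overflow).
    """
    value = background.strip().lstrip("#").strip()
    if len(value) != 6:
        raise ValueError("expected a color in the form #RRGGBB")

    # Split the hex string into its red, green, and blue components.
    channels = [int(value[i:i + 2], 16) for i in (0, 2, 4)]

    # Add the offset to every component, never exceeding 0xFF.
    shifted = [min(channel + step, 0xFF) for channel in channels]
    return "#" + "".join(f"{channel:02X}" for channel in shifted)


print(text_color_for("#000000"))  # prints "#101010"
```

With a step this small, the text differs from the background by only 16 of 256 possible levels per channel, which is why it is hard to see before contrast enhancement.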

Table 1 The testing results

No.	Title	Text before histogram equalization	Text after histogram equalization	Target	Success before histogram equalization (%)	Success after histogram equalization (%)
1	sugar.png	-	lele jumbo	lele jumbo	0	100
2	manus.png	-	I will kill you	I will kill you	0	100
3	phantom-assassin-dota-2-dota-2740(1).png	-	assasin	assasin	0	100
4	cool_dark_wallpaper.png	-	cool	cool	0	100
5	12June2012-Low-light-focusing-lrg	-	LION KING	LION KING	0	100
6	images_(5).jpg	-	the dragon	the dragon	0	100
7	images.jpg	-	S N THE DARKEST PLACES	THERE IS LIGHT EVEN IN THE DARKEST PLACES	0	48.6486486
8	Normal21.bmp	POLITICS H appuintment he	-	POLITICS H Appointment he general manage	63.1578947	0
9	Normal22.bmp	TAIW	-	TAIW Taipei	40	0
10	Normal23.bmp	Not to	-	not to	100	0
11	Normal25.bmp	Chand Said	-	e arguments change said	47.3684211	0
12	Normal26.bmp	the World	-	the world	100	0
13	Normal27.bmp	mm mm mbigs she	-	news feature abies lishe	20.8333333	0
14	Normal28.bmp	atlc exerci of detention tlso reneged	-	atlc exerci of detention lse reneged	97.14286	0
15	Normal29.bmp	-	-	in	0	0
16	Shadow1.bmp	THURSDAY	zrea ysterda	THURSDAY	100	0
17	Shadow2.bmp	massacre and mothers tell it like it	tell it like	a massacre and mothers tell it like it is	93.93939	30.3030303
18	Shadow4.bmp

Conclusion

Histogram equalization was successfully implemented during the OCR preprocessing phase by using a web-based program (PHP). The research results demonstrate that implementing histogram equalization during OCR preprocessing improved the OCR performance. For the dataset that contained a collection of predominantly black and dark images with dark text inserted, the percentage of success increased to approximately 74.95%. In the future, additional studies can be conducted using other advanced methods that improve the image contrast without causing a lot of noise, such as adaptive histogram equalization and contrast-limited adaptive histogram equalization. We believe that more advanced methods could result in better image contrast than is possible using a normal histogram equalization method.

References

Abbyy-developers.eu, 2015. Image Processing and Binarisation for Camera OCR. Available online at https://abbyy.technology/en:features:ocr:cameraocr-preprocessing-binarisation
Ahmad, N., Hadinegoro, A., 2012. Metode Histogram Equalization untuk Perbaikan Citra Digital. In: Proceedings of Seminar Nasional Teknologi Informasi & Komunikasi Terapan 2012, Semarang: Indonesia, INFRM, pp. 439–445
Akhlis, I., Sugiyanto, 2011. Implementasi Metode Histogram Equalization untuk Meningkatkan Kualitas Citra Digital. Jurnal Fisika, Volume 1(2), pp. 70–74
Alginahi, Y., 2010. Preprocessing Techniques in Character Recognition, Character Recognition, Minoru Mori (Ed.), ISBN: 978-953-307-105-3, InTech. Available online at http://cdn.intechopen.com/pdfs-wm/11405.pdf
Digitalmarketingphilippines.com, 2014. Amazing Facts and Statistics about Visual Web. Available online at http://digitalmarketingphilippines.com/wp-content/uploads/2014/01/Amazing-Facts-and-Statistics-about-Visual-Web.jpg
Gonzalez, R., Woods, R., 2008. Digital Image Processing (3rd ed.). New Jersey: Prentice-Hall
Holley, R., 2009. How Good Can It Get? Analysing and Improving OCR Accuracy in Large Scale Historic Newspaper Digitisation Programs. Available online at http://www.dlib.org/dlib/march09/holley/03holley.html
iapr-tc11.org, 2015. Datasets List - TC11. Available online at http://www.iapr-tc11.org/mediawiki/index.php/Datasets_List
Krutsch, R., Tenorio, D., 2011. Histogram Equalization. Guadalajara: Freescale Semiconductor Application Note Number AN4318, Rev 0
MathWorks.com, 2016. Image Processing and Computer Vision Examples. Available online at http://www.mathworks.com/examples/product-group/matlab-image-processing-and-computer-vision
Mithe, R., Indalkar, S., Divekar, N., 2013. Optical Character Recognition. International Journal of Recent Technology and Engineering (IJRTE), Volume 2(1), pp. 72–75
Rachman, E.M.B.P., 2014. Histogram Equalisation. Available online at http://ilmukomputer.org/wp-content/uploads/2014/02/Histogram-Equalisation-Pengolahan-Citra-Digital.odt
Rice, S.V., Jenkins, F.R., Nartker, T.A., 1995. The Fourth Annual Test of OCR Accuracy. Available online at http://www.expervision.com/wp-content/uploads/2012/12/1995.The_Fourth_Annual_Test_of_OCR_Accuracy.pdf
Sánchez, J., Perronnin, F., de Campos, T., 2012. Modeling the Spatial Layout of Images Beyond Spatial Pyramids. Pattern Recognition Letters, Volume 33(16), pp. 2216–2223
Xcitex, Inc., 2010. Image Processing: Brightness, Contrast, Gamma, and Exponential/Logarithmic Settings in ProAnalyst. Available online at http://www.xcitex.com/Resource%20Center/ProAnalyst/Application%20Notes/App%20Note%20151%20-%20Image%20Processing%20Brightness,%20Contrast,%20Gamma%20and%20Exponential.pdf
Zybert, C., 2014. How does Optical Character Recognition Work. Available online at http://nedocs.com/how-does-optical-character-recognition-work/

Copyright © 2017 Faculty of Engineering International Journal of Technology