Study of using hybrid deep neural networks in character extraction from images containing text

P Preethi; HR Mamatha; Hrishikesh Viswanath

doi:10.17352/tcsit.000039

Trends in Computer Science and Information Technology, Volume 6, Issue 2

Submitted: July 26, 2021

Published: August 4, 2021

Keywords:

Feed forward neural network, Convolutional neural network, Support vector machine, Epigraphical scripts, Segmentation, Sliding window, CNN, SVM

P Preethi

Department of Computer Science and Engineering, People’s Education Society University, Bangalore, India

HR Mamatha

Department of Computer Science and Engineering, People’s Education Society University, Bangalore, India

Hrishikesh Viswanath*

Department of Computer Science and Engineering, People’s Education Society University, Bangalore, India

Abstract

Character segmentation from epigraphical images helps an optical character recognizer (OCR) in the training and recognition of old regional scripts. The scripts or characters present in these images are often illegible and may have a complex, noisy background texture. In this paper, we present an automated way of segmenting and extracting characters from digitized inscriptions. To achieve this, machine learning models are employed to distinguish correctly segmented characters from partially segmented ones. The proposed method first recursively crops the document by sliding a window across the image from top to bottom and extracting the content within the window. This yields a number of small images for classification. The segments are then classified into character and non-character classes based on the features within them. The model was tested on a wide range of input images containing irregular, inconsistently spaced, handwritten, and inscribed characters.
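The window-and-classify idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the window size, the stride, the function names, and the synthetic test image are all assumptions made for illustration, and the ink-ratio rule in `looks_like_character` is a hypothetical stand-in for the trained models named in the keywords (feed forward network, CNN, SVM). The sketch also makes a single pass over the image rather than cropping recursively.

```python
import numpy as np

def sliding_window_crops(image, window=(32, 32), stride=(16, 16)):
    """Yield (top, left, crop) for each window position, scanning top to bottom."""
    win_h, win_w = window      # assumed window size, not taken from the paper
    step_y, step_x = stride    # assumed stride, not taken from the paper
    height, width = image.shape[:2]
    for top in range(0, height - win_h + 1, step_y):
        for left in range(0, width - win_w + 1, step_x):
            yield top, left, image[top:top + win_h, left:left + win_w]

def looks_like_character(crop, min_ink=0.05, max_ink=0.6):
    """Hypothetical placeholder for a trained classifier.

    Accepts a crop when the fraction of dark pixels falls in a moderate range.
    """
    ink = float(np.mean(crop < 128))
    return min_ink <= ink <= max_ink

# Synthetic 64x64 white "page" with one dark 20x20 blob standing in for a character.
page = np.full((64, 64), 255, dtype=np.uint8)
page[10:30, 10:30] = 0

crops = list(sliding_window_crops(page))
candidates = [(top, left) for top, left, crop in crops if looks_like_character(crop)]
```

In a real pipeline the placeholder rule would be replaced by a model trained on labeled character and non-character crops, and overlapping candidate windows would still need to be merged or filtered.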

How to Cite

Preethi, P., Mamatha, H., & Viswanath, H. (2021). Study of using hybrid deep neural networks in character extraction from images containing text. Trends in Computer Science and Information Technology, 6(2), 045–052. https://doi.org/10.17352/tcsit.000039

Issue

Vol. 6 No. 2 (2021)

Section

Research Articles

Copyright & License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
