Semantic Document Image Classification Based on Valuable Text Pattern
Subject Areas : Renewable energy
Hossein Pourghassem 1 * , Mohammad Sadegh Helforoush 2 , Sabalan Daneshvar 3
1 - Assistant Professor/Islamic Azad University - Najafabad Branch
2 - Assistant Professor/ Shiraz University of Technology
3 - Assistant Professor/Sahand University of Technology – Tabriz
Keywords: semantic classification, document and non-document images, valuable information
Abstract :
Knowledge extraction from detected document images is a complex problem in the field of information technology. The problem becomes more intricate given that only a negligible percentage of the detected document images are valuable. In this paper, a segmentation-based classification algorithm is used to analyze the document image. In this algorithm, regions of the image are detected using a two-stage segmentation approach and then classified into document and non-document (pure region) regions by a hierarchical classification. A novel definition of value is also proposed to classify document images into valuable and invaluable categories. The proposed algorithm is evaluated on a database of document and non-document images collected from the Internet. Experimental results show the efficiency of the proposed algorithm in semantic document image classification. The proposed algorithm achieves an accuracy rate of 98.8% on the valuable and invaluable document image classification problem.