Page Layout Analysis of the Document Image Based on the Region Classification in a Decision Hierarchical Structure
Subject Areas : Renewable energy
1 - Assistant Professor/Islamic Azad University, Najaf Abad Branch
Keywords: document image, Page layout analysis, Hierarchical structure, texture features,
Abstract :
The conversion of document image to its electronic version is a very important problem in the saving, searching and retrieval application in the official automation system. For this purpose, analysis of the document image is necessary. In this paper, a hierarchical classification structure based on a two-stage segmentation algorithm is proposed. In this structure, image is segmented using the proposed two-stage segmentation algorithm. Then, the type of the image regions such as document and non-document image is determined using multiple classifiers in the hierarchical classification structure. The proposed segmentation algorithm uses two algorithms based on wavelet transform and thresholding. Texture features such as correlation, homogeneity and entropy that extracted from co-occurrenc matrix and also two new features based on wavelet transform are used to classifiy and lable the regions of the image. The hierarchical classifier is consisted of two Multilayer Perceptron (MLP) classifiers and a Support Vector Machine (SVM) classifier. The proposed algorithm is evaluated on a database consisting of document and non-document images that provides from Internet. The experimental results show the efficiency of the proposed approach in the region segmentation and classification. The proposed algorithm provides accuracy rate of 97.5% on classification of the regions.
_||_