Semantic Document Image Classification Based on Valuable Text Pattern
Subject Areas : Renewable energy
                                                    
Hossein Pourghassem 1*, Mohammad Sadegh Helforoush 2, Sabalan Daneshvar 3
1 - Assistant Professor, Islamic Azad University, Najafabad Branch
2 - Assistant Professor, Shiraz University of Technology
3 - Assistant Professor, Sahand University of Technology, Tabriz
Keywords: semantic classification, document and non-document images, valuable information
Abstract :
Knowledge extraction from detected document images is a complex problem in the field of information technology. The problem becomes more intricate given that only a negligible percentage of the detected document images are valuable. In this paper, a segmentation-based classification algorithm is used to analyze the document image. Using a two-stage segmentation approach, the algorithm detects the regions of the image and then classifies them into document regions and non-document (pure) regions in a hierarchical classification. A novel definition of value is proposed to classify document images into valuable and invaluable categories. The proposed algorithm is evaluated on a database of document and non-document images collected from the Internet. Experimental results show the efficiency of the proposed algorithm in semantic document image classification. The proposed algorithm achieves an accuracy of 98.8% on the valuable versus invaluable document image classification problem.