Adaptive Approximate Record Matching

Rahnamoun, Ramin

Manuscript ID: 510148 Pages: 23-27

20.1001.1.22519246.2014.03.01.4.4

Article Type: Original Research


Subject Areas : International Journal of Smart Electrical Engineering

Ramin Rahnamoun ¹

1 - Computer Engineering Department, Azad University-Tehran Central Branch, Tehran, Iran.

Received: 2015-04-21 Accepted : 2015-04-21 Published : 2014-10-01

Keywords: genetic algorithms, record matching, edit distance, data cleaning

Abstract :

Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of Approximate Record Matching is to find the multiple records that belong to one entity. In this paper, an algorithm for Approximate Record Matching is proposed that adapts automatically to the error patterns in its input. In the field matching phase, the edit distance method is used, customized for problems specific to the Persian language, such as the similarity of certain Persian characters and common Persian typographical errors. In the record matching phase, the importance of each field is set by a coefficient assigned to that field. Because typographical error patterns change over time, these coefficients must change dynamically as well. For this reason, a Genetic Algorithm (GA) is used for supervised learning of the coefficient values. The simulation results show that this algorithm performs well compared with other methods, such as decision trees.
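The three stages named in the abstract (a customized edit distance for fields, a coefficient-weighted record score, and GA-based learning of the coefficients) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's method: the `SIMILAR_GROUPS` table, the 0.5 cost for similar characters, the 0.8 match threshold, the normalization by field length, and the toy GA operators (elitism, uniform crossover, Gaussian mutation) are all hypothetical choices, since the abstract does not specify them.

```python
import random

# Hypothetical groups of easily confused Persian characters; the actual
# similarity table used in the paper is not given in the abstract.
SIMILAR_GROUPS = [set("سصث"), set("زذضظ"), set("تط"), set("هح"), set("قغ")]


def substitution_cost(a, b, similar_cost=0.5):
    """Return 0 for equal characters, a reduced cost for similar ones, else 1."""
    if a == b:
        return 0.0
    if any(a in g and b in g for g in SIMILAR_GROUPS):
        return similar_cost
    return 1.0


def weighted_edit_distance(s, t):
    """Levenshtein distance with a character-dependent substitution cost."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1.0,       # delete a
                            curr[j - 1] + 1.0,   # insert b
                            prev[j - 1] + substitution_cost(a, b)))
        prev = curr
    return prev[-1]


def field_similarity(s, t):
    """Map the distance to [0, 1]; 1 means the fields are identical."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - weighted_edit_distance(s, t) / longest


def record_similarity(r1, r2, weights):
    """Weighted average of per-field similarities (weights = coefficients)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * field_similarity(a, b)
               for w, a, b in zip(weights, r1, r2)) / total


def accuracy(weights, pairs, threshold=0.8):
    """Fraction of labelled pairs classified correctly at a fixed threshold."""
    hits = sum((record_similarity(a, b, weights) >= threshold) == label
               for a, b, label in pairs)
    return hits / len(pairs)


def learn_weights(pairs, n_fields, generations=30, pop_size=20, seed=0):
    """Toy GA over coefficient vectors, using accuracy as the fitness."""
    rng = random.Random(seed)
    # Start from uniform weights plus random candidates.
    pop = [[1.0] * n_fields] + [[rng.random() for _ in range(n_fields)]
                                for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda w: accuracy(w, pairs), reverse=True)
        parents = pop[: pop_size // 2]           # elitism: keep the top half
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(p, q)]  # crossover
            k = rng.randrange(n_fields)          # mutate one coefficient
            child[k] = min(1.0, max(0.0, child[k] + rng.gauss(0, 0.2)))
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: accuracy(w, pairs))
```

In this sketch, `pairs` is a list of `(record_a, record_b, is_match)` tuples with boolean labels, which plays the role of the supervised training data. Re-running `learn_weights` on freshly labelled pairs is one way the coefficients could follow a shift in error patterns.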

