A Review of Outliers: Towards a Novel Fuzzy Method for Outlier Detection
Subject areas: Electrical Engineering
Arash Mazidi 1, Fahimeh Roshanfar 2, Vahid Parvin Darabad 3
1 - Department of Computer Engineering, Faculty of Engineering, Golestan University, Gorgan, Iran.
2 - Department of Electrical Engineering, Faculty of Engineering, Golestan University, Gorgan, Iran.
3 - Department of Electrical Engineering, Faculty of Engineering, Golestan University, Gorgan, Iran.
Keywords: Fuzzy, Outliers, Outlier Detection, Outlier Detection Applications
Abstract:
Outliers and outlier detection are among the most important concepts in data processing across many applications. Although many outlier detection methods exist, each detection problem should be solved with the method best suited to its particular characteristics. This paper first classifies the outlier detection methods used in different fields and applications to give a clearer picture of the area, and then presents a new fuzzy method for outlier detection. The proposed method uses fuzzy logic and local density to assign a score to each data instance, and then decides whether an instance is normal or an outlier based on the value of the resulting membership function. Evaluation of the proposed algorithm on synthetic datasets demonstrates good accuracy; moreover, evaluation on real datasets shows that the proposed method outperforms the k-means and k-NN algorithms.
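The abstract does not give the membership function or the density estimate, so the following is only an illustrative sketch of the general idea (local density mapped to a fuzzy membership, then thresholded), not the authors' algorithm. The function names, the use of the mean k-nearest-neighbour distance as an inverse density proxy, the bell-shaped membership, and the threshold value are all assumptions made for this example.

```python
import math


def knn_mean_distance(points, k):
    """Mean Euclidean distance from each point to its k nearest neighbours."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(sum(dists[:k]) / k)
    return result


def fuzzy_normal_membership(points, k=3):
    """Membership of each point in the fuzzy set 'normal', in (0, 1].

    Assumption: a large mean k-NN distance means low local density, and
    membership falls off as a bell-shaped function of that distance
    measured relative to the median over the dataset.
    """
    mean_d = knn_mean_distance(points, k)
    ref = max(sorted(mean_d)[len(mean_d) // 2], 1e-12)  # median, guarded against 0
    return [1.0 / (1.0 + (d / ref) ** 2) for d in mean_d]


def detect_outliers(points, k=3, threshold=0.2):
    """Indices of points whose 'normal' membership is below the threshold."""
    memberships = fuzzy_normal_membership(points, k)
    return [i for i, m in enumerate(memberships) if m < threshold]


# Hypothetical toy data: a tight cluster plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
memberships = fuzzy_normal_membership(points)
outliers = detect_outliers(points)
```

On this toy data the distant point at index 5 receives a membership close to zero and is the only one flagged. In practice `k` and `threshold` would need tuning per dataset, which is in line with the abstract's remark that each detection problem calls for a method suited to its characteristics.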