A Hybrid Geospatial Data Clustering Method for Hotspot Analysis
Subject Areas : Journal of Computer & Robotics
Mohammad Reza Keyvanpour 1, Mostafa Javideh 2, Mohammad Reza Ebrahimi 3
1 - Department of Computer Engineering, Alzahra University, Tehran, Iran
2 - Shamsipoor Technical College, Tehran, Iran
3 - Islamic Azad University, Qazvin Branch, Qazvin, Iran
Keywords: Geospatial data, Geographical knowledge discovery, Hotspot analysis, Hierarchical clustering, Partitional clustering, Hybrid clustering approach, Earthquake hotspots, Crime mapping
Abstract :
Traditional statistical methods impose a high computational burden when applied to today’s large volumes of spatial data. To address this deficiency, data mining techniques have recently been applied to various spatial analysis tasks, with the aim of extracting knowledge autonomously from high-volume spatial data. Geospatial data is well suited to such techniques. This paper presents a hybrid geospatial data clustering mechanism intended to yield a high-performance hotspot analysis method. The method operates on the 2- or 3-dimensional geographic coordinates of various natural and man-made phenomena. It relies on the systematic cooperation of two popular clustering algorithms: AGNES (AGglomerative NESting), a hierarchical clustering method, and κ-means, a partitional clustering method. The hybrid method is claimed to inherit the low time complexity of κ-means and the relative independence of AGNES from the user’s prior knowledge. It is therefore expected to be faster than AGNES and more accurate than κ-means. The method was evaluated against two popular cluster quality criteria: the first is adapted from Fisher’s separability criterion, and the second is the minimum total distance measure. The evaluation shows that the proposed hybrid method performs acceptably: it has a desirable time complexity and achieves higher cluster quality than its parent algorithms (AGNES and κ-means). Because real-time processing of hotspots requires an efficient approach with low time complexity, time complexity was taken into account in designing the proposed approach.
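The abstract does not spell out how the two algorithms cooperate, so the following Python sketch shows only one plausible arrangement and should not be read as the authors' method. It assumes AGNES (here with centroid linkage) is run on a small random sample to discover the number of clusters and their seed centroids, and κ-means then refines those seeds over the full data set. The names `agnes`, `kmeans`, `hybrid_hotspots`, `sample_size` and `merge_threshold` are hypothetical. The `total_distance` function illustrates the minimum total distance measure as the sum of point-to-centroid distances, which is also an assumption about its exact form.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a non-empty list of 2-D or 3-D points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def agnes(points, merge_threshold):
    """Agglomerative clustering with centroid linkage (an assumed linkage).

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than merge_threshold, so the number of clusters
    does not have to be supplied by the user.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > merge_threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means iterations starting from the given centers."""
    centers = list(centers)
    groups = []
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Keep the old center if a cluster happens to become empty.
        updated = [centroid(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers, groups


def hybrid_hotspots(points, sample_size, merge_threshold, seed=0):
    """AGNES on a sample chooses the seeds; k-means refines on all points."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    seeds = [centroid(c) for c in agnes(sample, merge_threshold)]
    return kmeans(points, seeds)


def total_distance(centers, groups):
    """Sum of distances from every point to its cluster center."""
    return sum(math.dist(p, c)
               for c, g in zip(centers, groups) for p in g)
```

Under these assumptions, the quadratic-cost hierarchical step touches only the sample, while the pass over the full data set is left to κ-means. A call such as `hybrid_hotspots(coords, sample_size=200, merge_threshold=0.5)` returns the hotspot centers together with the points assigned to each; both parameter values are illustrative and would need tuning to the coordinate units in use.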
Journal of Computer and Robotics 1 (2010) 53-67