Publishing Health Information without Distortion While Balancing Desired Privacy-Preserving and Utility
Subject Areas : Renewable energy
Abbas Karimi Rizi 1 , Mohammad Naderi Dehkordi 2 , Naser Nematbakhsh 3
1 - Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran
2 - Big Data Research Center, Najafabad Branch, Islamic Azad University, Najafabad, Iran
3 - Department of Engineering, Shahid Ashrafi Esfahani University, Esfahan, Iran
Keywords: Taxonomy, disease diagnostic code, membership analysis
Abstract :
In the age of health information analysis, a patient's disease diagnostic code is treated as private information. Analysts need access to this code, while patients need it anonymized when health information is published. Disease diagnostic codes are usually drawn from international classifications and organized as a taxonomy. In practice, patients allow only the category of the disease diagnostic code to be disclosed, not the original code. Conventional privacy-preserving models often distort this category. Preserving privacy while retaining data utility has always been a critical issue in the dissemination of health information. This study presents a new anonymization method that publishes all attributes of health information without distortion, so that the utility of the data is maintained. The published information protects patient privacy while meeting both the expectations of experts and the utility requirements of analysts. The proposed method disseminates health information such that the maximum probability of disclosing the disease diagnostic code is always less than or equal to a threat threshold defined by the expert, while the membership analysis error is reduced. The method is scalable under certain conditions. A practical evaluation on patient data obtained from a hospital in Isfahan supports the effectiveness of the proposed method.
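The threshold condition stated in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it only checks, for an already published grouping, whether the largest within-group relative frequency of any single diagnostic code stays at or below an expert-defined threshold. The function names, the grouping structure, and the sample codes are all assumptions made for illustration.

```python
from collections import Counter


def max_disclosure_probability(groups):
    """Return the largest within-group relative frequency of any single code.

    `groups` is a list of lists of diagnostic codes (hypothetical structure).
    Empty groups are skipped.
    """
    worst = 0.0
    for codes in groups:
        if not codes:
            continue
        # Count of the most frequent code in this group.
        top_count = Counter(codes).most_common(1)[0][1]
        worst = max(worst, top_count / len(codes))
    return worst


def satisfies_threshold(groups, threshold):
    """True if no code can be inferred with probability above `threshold`."""
    return max_disclosure_probability(groups) <= threshold


# Hypothetical published groups of diagnostic codes (illustrative values only).
groups = [
    ["E10", "E11", "I10", "J45"],
    ["E11", "E11", "I10", "J45"],
]
```

With this sample data, the second group contains the same code twice among four records, so the check passes for a threshold of 0.5 and fails for a threshold of 0.4.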