A Comprehensive Review of Semantic Text Similarity for Developing Intelligent Judicial Recommender Systems
Topics: Intelligent Multimedia Processing and Communication Systems
Nasim Gerilinia (نسیم گریلی نیا) 1, Neda Abdolvand (ندا عبدالوند) 2, Zahra Rezaei (زهرا رضایی) 3, Farzad Movahedi Sobhani (فرزاد موحدی سبحانی) 4
1 - PhD student in Industrial Engineering, Faculty of Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
2 - Associate Professor, Department of Management, Faculty of Social Sciences and Economics, Alzahra University, Tehran, Iran
3 - Assistant Professor, Department of Futures Studies, Judiciary Research Institute, Tehran, Iran
4 - Assistant Professor, Department of Industrial Engineering, Faculty of Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
Keywords: natural language processing, deep learning, judicial texts, semantic similarity
Abstract:
Intelligent judicial recommender systems, which aim to improve the efficiency and consistency of judicial decision-making, rely on accurately measuring the semantic similarity between legal texts. This study reviews the methods for assessing semantic similarity that can be used in developing judicial recommender systems. The analysis covers a broad range of text similarity techniques, from lexical approaches to advanced natural language processing and machine learning models. Taking into account the specific characteristics of legal language and the challenges of this domain, the review analyzes the strengths and weaknesses of each method. By providing an overview of the state-of-the-art methods for measuring semantic similarity in judicial texts, the study lays the groundwork for designing and developing robust and effective intelligent decision-support tools for the judicial system. Such tools would substantially help judges and lawyers and would also play an important role in promoting uniformity of practice across the judicial system.