Development of a Data-Driven Model Based on Machine Learning for Screening Urban Infrastructure Projects Considering Sustainability and Agility Dimensions
Mahyar Abasian 1 (School of Industrial Engineering, College of Engineering, University of Tehran, Tehran, Iran)
Seyed Mohammadreza Vatankhah Ghamsari 2 (Islamic Azad University, Shahriar Branch, Tehran, Iran)
Seyed Mohammad Matin Ziaei 3 (School of Management, College of Financial Management, University of Tehran, Tehran, Iran)
Pooneh Hamrahi 4 (School of Industrial Engineering, College of Engineering, University of Tehran, Tehran, Iran)
Keywords: Project Selection, Data-Driven Model, Machine Learning, Sustainability, Agility
Abstract:
This study aims to improve the evaluation and screening of urban infrastructure projects by designing a data-driven classification model based on the CatBoost machine learning algorithm. The proposed model classifies projects into three decision categories (Selected, Reserved, and Rejected) using 13 indicators derived from two integrated dimensions: sustainability (with its economic, social, and environmental sub-dimensions) and agility. The dataset comprises 380 real-world projects, each annotated with its indicator values and an expert-labeled final decision. A statistical analysis was conducted to confirm data quality, class balance, and the absence of multicollinearity. The CatBoost model was then trained and optimized through hyperparameter tuning and cross-validation, and its classification performance was benchmarked against Support Vector Machine (SVM) and Artificial Neural Network (ANN) models. CatBoost achieved an accuracy of 91.21% and an F1-score of 90.32%, outperforming both alternative models on all key metrics, including precision and recall. Confusion matrix analysis further showed that it correctly identifies projects in each of the three categories. The study demonstrates that advanced machine learning models, particularly those suited to mixed-type and nonlinear datasets, can improve multi-criteria decision-making in urban project management. The model's ability to integrate sustainability and agility perspectives offers a novel way to address the complexities of modern infrastructure planning, especially in dynamic and data-rich environments. From a practical perspective, the proposed model supports urban policymakers, planners, and evaluators in selecting high-impact, future-ready projects. Theoretically, the research helps bridge the gap between strategic planning paradigms and intelligent computational tools.
Future developments may explore integration with GIS data, real-time analytics, and adaptive learning features to extend the model's applicability across urban development domains.
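The evaluation metrics named in the abstract (accuracy, precision, recall, F1-score) can all be read off a three-class confusion matrix. The following is a minimal sketch of that computation, not the authors' code. The counts in `EXAMPLE_CM` are hypothetical and are not the paper's results, the function names are illustrative, and macro-averaging of the F1-score is an assumption, since the abstract does not state which averaging was used.

```python
from typing import Dict, List

# The three decision categories named in the abstract.
LABELS = ["Selected", "Reserved", "Rejected"]


def per_class_metrics(cm: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 per class.

    cm[i][j] is the number of projects whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    n = len(labels)
    metrics: Dict[str, Dict[str, float]] = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def accuracy(cm: List[List[int]]) -> float:
    """Share of projects on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def macro_f1(cm: List[List[int]], labels: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_metrics(cm, labels)
    return sum(m["f1"] for m in scores.values()) / len(scores)


# Hypothetical counts for illustration only (rows: true class, columns: predicted).
EXAMPLE_CM = [
    [8, 1, 1],  # true Selected
    [1, 8, 1],  # true Reserved
    [0, 2, 8],  # true Rejected
]

if __name__ == "__main__":
    print("accuracy:", round(accuracy(EXAMPLE_CM), 4))
    print("macro F1:", round(macro_f1(EXAMPLE_CM, LABELS), 4))
    for name, m in per_class_metrics(EXAMPLE_CM, LABELS).items():
        print(name, {key: round(value, 4) for key, value in m.items()})
```

Reporting per-class values alongside the aggregate scores makes it visible when one category, for example Reserved, is systematically confused with its neighbours, which a single accuracy figure can hide.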