Computing the Efficiency of Bank Branches with Financial Indexes, an Application of Data Envelopment Analysis (DEA) and Big Data
Subject Areas: Statistical Methods in Financial Management
Fahimeh Jabbari-Moghadam 1, Farhad Hosseinzadeh Lotfi 2, Mohsen Rostamy-Malkhalifeh 3, Masoud Sanei 4, Bijan Rahmani-Parchkolaei 5
1 - Department of Mathematics, Science and Research Branch, Islamic Azad University, Tehran, Iran
2 - Department of Mathematics, Science and Research Branch, Islamic Azad University, Tehran, Iran
3 - Department of Mathematics, Faculty of Science, Science and Research Branch, Islamic Azad University, Tehran, Iran
4 - Department of Mathematics, Central Tehran Branch, Islamic Azad University, Tehran, Iran
5 - Department of Mathematics, Nour Branch, Islamic Azad University, Nour, Iran
Keywords: Data Envelopment Analysis (DEA), Clustering, Data Mining, Big Data, Efficiency (Performance)
Abstract:
In traditional Data Envelopment Analysis (DEA) techniques, a specific, individual DEA model is formulated and solved to calculate the efficiency (performance) score of each decision-making unit (DMU). When the number of DMUs is immense, the resulting computational complexity means that conventional methods for computing efficiency, ranking, and related measures may no longer be economical. The key objective of the proposed algorithm is to separate the efficient units from the other units. To achieve this, effective indexes were constructed with the aid of DEA concepts and the characteristics of the business under study, and the relatively effective indexes were identified. Subsequently, with the help of a clustering technique and the concept of dominance, the efficient units were separated from the inefficient ones, and a DEA model was built from the set of efficient units. By eliminating the inefficient units, the number of units involved in constructing the DEA model decreases, so the efficiency scores of the units are computed faster. The proposed algorithm was applied to measure the branches of one of the commercial banks of Iran using financial indexes; the results show that the algorithm can be extended to big data.
References
[1] Sherman, H.D., & Gold, F., Bank Branch Operating Efficiency: Evaluation with Data Envelopment Analysis, Journal of Banking and Finance, 1985; 9, 297-315. Doi: 10.1016/0378-4266(85)90025-1
[2] Charnes, A., Cooper, W. W., & Rhodes, E., Measuring the efficiency of decision making units, European Journal of Operational Research, 1978; 2(6), 429-444. Doi: 10.1016/0377-2217(78)90138-8
[3] Banker, R. D., Charnes, A., & Cooper, W. W., Some models for estimating technical and scale inefficiencies in data envelopment analysis, Management Science, 1984; 30(9), 1078-1092. Doi: 10.1287/mnsc.30.9.1078
[4] Cook, W. D., & Seiford, L. M., Data envelopment analysis (DEA) – Thirty years on, European Journal of Operational Research, 2009; 192(1), 1-17. Doi: 10.1016/j.ejor.2008.01.032
[5] Dulá, J. H., & Helgason, R. V., A new procedure for identifying the frame of the convex hull of a finite collection of points in multidimensional space, European Journal of Operational Research, 1996; 92(2), 352-367. Doi: 10.1016/0377-2217(94)00366-1
[6] Dulá, J. H., Helgason, R. V., & Venugopal, N., An algorithm for identifying the frame of a pointed finite conical hull, INFORMS Journal on Computing, 1998; 10(3), 323-330. Doi: 10.1287/ijoc.10.3.323
[7] Barr, R. S., & Durchholz, M. L., Parallel and hierarchical decomposition approaches for solving large-scale data envelopment analysis models, Annals of Operations Research, 1997; 73, 339-372. Doi: 10.1023/a:1018941531019
[8] Sueyoshi, T., & Chang, Y. L., Efficient algorithm for additive and multiplicative models in data envelopment analysis, Operations Research Letters, 1989; 8(4), 205-213. Doi: 10.1016/0167-6377(89)90062-x
[9] Dulá, J. H., & Thrall, R. M., A computational framework for accelerating DEA, Journal of Productivity Analysis, 2001; 16(1), 63-78. Doi: 10.1023/a:1011103303616
[10] Chen, W. C., & Cho, W. J., A procedure for large-scale DEA computations, Computers & Operations Research, 2009; 36(6), 1813-1824. Doi: 10.1016/j.cor.2008.05.006
[11] Dulá, J. H., & López, F. J., DEA with streaming data, Omega, 2013; 41(1), 41-47. Doi: 10.1016/j.omega.2011.07.010
[12] Chen, W. C., & Lai, S. Y., Determining radial efficiency with a large data set by solving small-size linear programs, Annals of Operations Research, 2017; 250(1), 147-166. Doi: 10.1007/s10479-015-1968-4
[13] Zhu, Q., Wu, J., & Song, M., Efficiency evaluation based on data envelopment analysis in the big data context, Computers & Operations Research, 2018; 98, 291-300. Doi: 10.1016/j.cor.2017.06.017
[14] Khezrimotlagh, D., Zhu, J., Cook, W. D., & Toloo, M., Data envelopment analysis and big data, European Journal of Operational Research, 2019; 274(3), 1047-1054. Doi: 10.1016/j.ejor.2018.10.044
[15] Dellnitz, A., Big data efficiency analysis: Improved algorithms for data envelopment analysis involving large datasets, Computers & Operations Research, 2022; 137, 105553. Doi: 10.1016/j.cor.2021.105553
[16] Vijayarani, S., & Nithya, S., An efficient clustering algorithm for outlier detection, International Journal of Computer Applications, 2011; 32(7), 22-27.
[17] Vijayarani, S., & Nithya, S., Sensitive outlier protection in privacy preserving data mining, International Journal of Computer Applications, 2011; 33(3).
Advances in Mathematical Finance & Applications, 2024, 9(3), P. 817-839
www.amfa.iau-arak.ac.ir, Print ISSN: 2538-5569, Online ISSN: 2645-4610, Doi: 10.22034/amfa.2023.1959465.1755
Article history: Received 2022-05-28, Accepted 2023-01-01
1 Introduction
Today, computing efficiency in various organizations and industries is one of the essential procedures for comparing competitiveness in the domestic and foreign arenas of a country, and banks are no exception. Hence, calculating the efficiency of banks and identifying the effective factors is extremely important. Abundant research has been performed on calculating the efficiency or performance of banks with DEA. DEA was first applied to computing the efficiency of banks by Sherman and Gold [1]. Charnes et al. [2] proposed a model to evaluate a set of DMUs with multiple inputs and outputs. DEA classifies units into two groups, namely efficient and inefficient units, and measures the efficiency score of each unit [3]. Cook and Seiford [4] discussed how the conventional DEA models are solved with linear programming approaches. The complexity of real-life problems has given rise to DEA problems with larger databases. When DEA meets big data, two issues occur: (1) the number of inputs/outputs is high, and (2) the number of DMUs is large. Efficiency can be computed with diverse models which, based on the measured efficiency, determine the efficient and inefficient DMUs. The envelopment CCR model has m + s constraints and n + 1 variables, whereas the multiplier CCR model has n + 1 constraints and m + s variables [2]. An increase in the number of units or the number of variables therefore increases the computational complexity, which raises the question of how these models can be implemented when m, s and n are exceedingly large. When the number of units increases, a significant number of linear programming models must be solved.
Processing big data with traditional methods is difficult or infeasible. The term big data emerged in the 1980s; it refers to volumes of data so large that they are difficult to analyze, process and store with conventional database technologies. The term is associated with data that are extremely voluminous, fast or complex, and big data is currently a focus of modern scientific and commercial centers. In the DEA sphere, big data has created numerous problems for researchers; for example, when the DMUs are extremely numerous, the increase in computational complexity puts the calculation of their efficiency beyond the capacity of conventional methods. Given the application of DEA in varied managerial and industrial arenas, and the growing acceptance of this technique for evaluating the units of big organizations, methods capable of reducing the computational burden, and consequently the calculation time, are essential and beneficial.
Numerous DEA studies have addressed large numbers of units with the objective of reducing the computational time needed to solve the linear programming models [5-6]. Similarly, in 1997, Barr and Durchholz [7] proposed a hierarchical decomposition approach to reduce the solution time of large-sized DEA problems. Korhonen and Siitari presented a lexicographic parametric programming method, likewise to decrease the computational cost of identifying the efficient units. Other studies in this domain include [8-15].
The key idea of these methods is to divide the units into smaller sets with the objective of finding the set of efficient units; subsequently, the efficiency of the inefficient units is calculated.
Zhu et al. [13] and Dellnitz [15] identified the efficient units by partitioning the n decision-making units (DMUs) into k + 1 subsets and calculating the efficiency within each subset; Dellnitz [15] further developed the algorithm presented by Zhu et al. [13]. In this paper, the same approach used by Zhu et al. for dividing the data into smaller sets is followed. In order to find the efficient units, Khezrimotlagh et al. [14] specified criteria based on the units with the minimum input and the maximum output values.
To decrease the computational complexity, a new method for separating the efficient units from the inefficient ones is presented in this paper. With the assistance of DEA concepts, and with due attention to the type of business (in the current paper, a bank), relatively effective indexes are constructed. Next, using a (two-step) clustering method, the units are divided into an efficient and an inefficient class, and the DEA model is constructed from the efficient class. To ensure that all efficient units are identified, the definition of dominance is used: any remaining efficient units are detected and added to the efficient class, and the model is updated. The efficiency of the inefficient units is then calculated. In effect, by separating the efficient units from the inefficient ones, the number of units from which the DEA model is constructed is reduced, which decreases the computational complexity.
To evaluate the validity of the proposed algorithm and determine the appropriate number of clusters, samples ranging in size from 100 to 9,000 units were examined. The results show that, as the amount of data increases, the running time of the proposed algorithm becomes shorter than the time taken by the CCR model on the same number of data; when the amount of data is small, the execution time of the CCR model is lower, or shows little or no difference from that of the proposed model. In other words, as the number of units grows, the proposed algorithm needs a shorter time to calculate the efficiencies. GAMS 23.4 was used to calculate the efficiency of the units and to apply the definition of dominance, IBM SPSS Modeler 18 was used for clustering, and the computations were run on an Intel(R) Core(TM) i7-2670QM CPU @ 2.20 GHz with 8.00 GB of RAM.
In brief, the steps of the proposed method are:
1. Classifying the data into two sets, efficient and inefficient, by means of the constructed indexes and the two-stage clustering method (a minimal sketch of this screening follows the list).
2. Identifying all remaining efficient units with the help of the concept of dominance and adding them to the efficient set.
3. Calculating the efficiency of all the units by solving models with lower computational complexity than the classical models.
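The following sketch illustrates steps 1 and 2 under simplifying assumptions: each unit is summarized by illustrative output-to-input ratio indexes, an ordinary k-means step stands in for the two-step clustering described in Section 2.1, and the helper name `candidate_efficient_set` is ours; this is not the paper's implementation (which used IBM SPSS Modeler and GAMS).

```python
import numpy as np
from sklearn.cluster import KMeans

def candidate_efficient_set(X, Y, n_clusters=2, random_state=0):
    """Steps 1-2 of the screening: cluster units on simple output/input
    ratio indexes, keep the 'best' cluster as candidate efficient units,
    then use dominance to recover any unit placed in the wrong group."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n = len(X)

    # Step 1: illustrative indexes -- each output divided by total input.
    indexes = Y / X.sum(axis=1, keepdims=True)            # shape (n, s)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(indexes)
    # Cluster with the highest mean index is the candidate efficient group.
    best = max(range(n_clusters), key=lambda c: indexes[labels == c].mean())
    efficient = set(np.where(labels == best)[0])

    # Step 2: a dominated unit is certainly inefficient, and an efficient unit
    # can never be dominated, so adding every non-dominated leftover unit
    # guarantees that no efficient unit is lost.
    def dominated(j):
        for k in range(n):
            if k != j and np.all(X[k] <= X[j]) and np.all(Y[k] >= Y[j]) \
                    and (np.any(X[k] < X[j]) or np.any(Y[k] > Y[j])):
                return True
        return False

    efficient |= {j for j in range(n) if j not in efficient and not dominated(j)}
    return sorted(int(j) for j in efficient)
```

The returned index set is then used as the reference set of the reduced DEA model in Section 2.2, so that step 3 solves much smaller linear programs.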
This paper is organized as follows:
In Section 2: A synopsis of the Subject Literature
In Section 3: The Proposed Method
In Section 4: An Empirical Example
In Section 5: Conclusion and suggestions for future research.
2 Subject Literature
2.1 Clustering and Statistical Concepts
Two-Stage Clustering Method: In general, clustering is a statistical method for grouping observations into sub-groups whose members are similar to each other with respect to one or several characteristics. In the present paper, a two-step clustering method has been employed. Clustering comprises a large family of methods and algorithms, and the two-step method is widely used, for example in media and marketing studies, where the key purpose is to segment the samples under investigation and obtain an overall profile of each segment. The two-stage method is applicable to big data and to the combined use of categorical and quantitative data, which distinguishes it from most other clustering methods; it is one of the few algorithms that can handle both quantitative and qualitative variables. In the first step, the observations are grouped into primary pre-clusters, and each pre-cluster is subsequently treated as a single observation (nucleus). In the second step, a hierarchical method is applied to these nuclei, the output of the first step, so that similar observations end up in the same final cluster.
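SPSS Modeler's TwoStep algorithm itself is proprietary; the sketch below only approximates the same two-stage idea with scikit-learn: a single-pass BIRCH pre-clustering builds compact sub-clusters (the "nuclei"), which are then merged hierarchically into the final groups. The handling of categorical variables and the automatic choice of the number of clusters, both part of the real TwoStep procedure, are omitted here.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy numeric features standing in for the constructed branch indexes.
data = rng.normal(size=(1000, 4))

two_step = Birch(
    threshold=0.5,          # controls how coarse the stage-1 sub-clusters are
    branching_factor=50,
    # Stage 2: hierarchically merge the sub-clusters into 2 final groups.
    n_clusters=AgglomerativeClustering(n_clusters=2),
)
labels = two_step.fit_predict(StandardScaler().fit_transform(data))
print(np.bincount(labels))  # number of observations assigned to each cluster
```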
Outliers and extreme data: The identification and elimination of outlier data is a process that removes such records and thereby enhances data quality. Effective data mining rests on an efficacious analysis of the set of observations with due attention to their behavior [16-17].
The analysis of outlier or aberrant data has been studied in the statistical community since the beginning of the 19th century. "Outlier" refers to data that lie out of range. At times outlier data are merely a nuisance, and sometimes the problem of interest is precisely the detection of the outliers. Outliers are not necessarily erroneous data; they are simply records that lie far from the bulk of the distribution. One of the approaches for detecting outlier data is the distribution-based method. In such methods, a statistical model is fitted to the data set, and a statistical test is then applied to determine whether a record conforms to this model or not; records with a low probability of belonging to the statistical model are declared outliers. This approach is divided into two classes, parametric and non-parametric. Parametric methods presume that a specific distribution, such as the normal distribution, is followed, whereas non-parametric methods make no such assumption. In this paper, a parametric, univariate method has been used to identify the outlier data. In the univariate approach, a single variable is examined at a time: a normal distribution is assumed for the data, and outliers are the observations that lie at a great distance from the mean, measured in units of the standard deviation. This rests on the properties of the normal distribution, in which 99.7 percent of the data lie within $\mu \pm 3\sigma$. A larger threshold is used to label extreme data; in this paper, the outlier and extreme data were identified with the IBM SPSS Modeler 18 software.
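A minimal univariate sketch of this parametric screening, assuming normality; the 3- and 5-standard-deviation cut-offs are only illustrative defaults, and the threshold values used by SPSS Modeler are configurable by the analyst.

```python
import numpy as np

def flag_outliers(values, outlier_sd=3.0, extreme_sd=5.0):
    """Label each record 'normal', 'outlier' (beyond outlier_sd standard
    deviations from the mean) or 'extreme' (beyond extreme_sd)."""
    values = np.asarray(values, dtype=float)
    z = np.abs(values - values.mean()) / values.std(ddof=1)
    return np.where(z > extreme_sd, "extreme",
                    np.where(z > outlier_sd, "outlier", "normal"))

# Example: 200 well-behaved records plus two suspicious ones.
sample = np.append(np.random.default_rng(1).normal(50, 5, size=200), [78.0, 130.0])
labels = flag_outliers(sample)
print(dict(zip(*np.unique(labels, return_counts=True))))
```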
2.2 Data Envelopment Analysis (DEA) Concepts
- The CCR Model: Assume that there are n DMUs. Each $DMU_j$ $(j=1,\ldots,n)$ has the input vector $X_j=(x_{1j},\ldots,x_{mj})\geq 0$ and uses it to produce the output vector $Y_j=(y_{1j},\ldots,y_{sj})\geq 0$. The relative efficiency of $DMU_o$ is obtained from the model below:
$$
\begin{aligned}
\theta_o^{*}=\min\;& \theta \\
\text{s.t.}\;& \sum_{j=1}^{n}\lambda_j x_{ij}\le \theta x_{io}, && i=1,\ldots,m,\\
& \sum_{j=1}^{n}\lambda_j y_{rj}\ge y_{ro}, && r=1,\ldots,s,\\
& \lambda_j\ge 0, && j=1,\ldots,n.
\end{aligned}
\qquad (1)
$$
Under the axioms of constant returns to scale (CRS), convexity, possibility (feasibility), and minimal extrapolation, the production possibility set (PPS) is as given below [2]:
$$
T_c=\Big\{(X,Y)\;\Big|\;X\ge \sum_{j=1}^{n}\lambda_j X_j,\; Y\le \sum_{j=1}^{n}\lambda_j Y_j,\; \lambda_j\ge 0,\; j=1,\ldots,n\Big\}.
$$
$DMU_o$ is efficient if no better point exists in the constructed PPS; $DMU_o$ is inefficient if and only if a better point is found.
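As a concrete illustration (not taken from the paper, which used GAMS), the following sketch solves the input-oriented envelopment form of model (1) for a single DMU with `scipy.optimize.linprog`; the `ref` argument anticipates the reduced model (2) below by allowing the reference set to be a subset of the units.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o, ref=None):
    """Input-oriented CCR efficiency of unit `o` (model (1)), evaluated
    against the reference units in `ref` (all units by default).
    X: (n, m) array of inputs, Y: (n, s) array of outputs, all positive."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n, m = X.shape
    s = Y.shape[1]
    ref = list(range(n)) if ref is None else list(ref)
    k = len(ref)

    # Decision variables: [theta, lambda_1, ..., lambda_k]
    c = np.r_[1.0, np.zeros(k)]
    # sum_j lambda_j x_ij - theta * x_io <= 0     (one row per input)
    A_in = np.hstack([-X[o].reshape(m, 1), X[ref].T])
    b_in = np.zeros(m)
    # -sum_j lambda_j y_rj <= -y_ro               (one row per output)
    A_out = np.hstack([np.zeros((s, 1)), -Y[ref].T])
    b_out = -Y[o]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]), b_ub=np.r_[b_in, b_out],
                  bounds=(0, None), method="highs")
    return res.fun  # optimal theta; 1.0 means CCR-efficient

# Tiny example: unit 2 uses more input for the same output, so it is inefficient.
X = np.array([[2.0], [4.0], [5.0]])
Y = np.array([[2.0], [4.0], [4.0]])
print([round(ccr_efficiency(X, Y, o), 3) for o in range(3)])  # -> [1.0, 1.0, 0.8]
```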
Theorem 1: If $DMU_t$ is inefficient, then in every optimal solution of model (1), $\lambda_t^{*}=0$. Consequently, model (1) can be constructed without taking any inefficient unit $t$ into consideration:
$$
\begin{aligned}
\theta_o^{*}=\min\;& \theta \\
\text{s.t.}\;& \sum_{j\in E}\lambda_j x_{ij}\le \theta x_{io}, && i=1,\ldots,m,\\
& \sum_{j\in E}\lambda_j y_{rj}\ge y_{ro}, && r=1,\ldots,s,\\
& \lambda_j\ge 0, && j\in E,
\end{aligned}
\qquad (2)
$$
where $E$ denotes the index set of the units that remain after the inefficient units have been removed.
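Continuing the two illustrative sketches above (our own helper names, not the paper's code, and assuming model (2) is model (1) restricted to the screened reference set as reconstructed here), model (2) amounts to calling the CCR routine with the reduced reference set, so each LP has |E| + 1 variables instead of n + 1:

```python
# E: candidate efficient units from the screening sketch in the Introduction.
E = candidate_efficient_set(X, Y)
# Model (2): every unit is evaluated against E alone.
scores = [ccr_efficiency(X, Y, o, ref=E) for o in range(len(X))]
```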
- Definition of dominance
Definition 1: $DMU_1$ dominates $DMU_2$ if and only if:
- $DMU_1$ is not worse than $DMU_2$ in all components, and
- $DMU_1$ is strictly better than $DMU_2$ in at least one component. That is: