A Hybrid Intelligent Model to Increase the Accuracy of COCOMO
1 - Kerman Branch, Islamic Azad University
Keywords: Software cost estimation, Accuracy, Invasive Weed Optimization (IWO), meta-heuristic
Abstract:
Effort estimation has become one of the key concerns of software project managers. Accurately estimating the effort required to develop and improve a software product is a vital factor in the success or failure of a project. The unsatisfactory accuracy and limited flexibility of existing estimation models have attracted researchers' attention to this area in the last few years. One existing effort estimation method is COCOMO (Constructive Cost Model), which is widely regarded as an appropriate method for software projects. Although COCOMO was introduced some years ago, it is still used for effort estimation in software projects. Many researchers have attempted to improve its estimation ability, but despite these efforts COCOMO's results are not yet satisfactory. In this research, a new hybrid method is presented to increase the estimation accuracy of COCOMO. The proposed method combines invasive weed optimization with the COCOMO estimation method to obtain better coefficients than those of basic COCOMO, and the model performs best when the best coefficients are used. A real data set is used to evaluate the method, and its performance is compared with that of other models. The estimation results confirm an improvement in the performance parameters.
1- Introduction
In the modern world, the share and use of software in daily life grows every day, so that most of us rely on one or more software products, directly or indirectly, throughout the day. This growth has been accompanied by an increase in the amount of data that software must handle and in software complexity, which in turn has raised a series of challenges and difficulties in the software production process. That is why no model has achieved impressive success in estimating the cost of a software project. Estimating cost with high accuracy in a software project is intrinsically challenging [1-2]. Although there have been many efforts in this regard in recent decades, there is no single best method for estimating costs. The main challenges are: 1- The relations between software components are complex and non-linear. 2- Measuring software size is complicated and often impossible. 3- Taking into account all the aspects involved in the production cost of software, including numeric project data and professional knowledge, is difficult. Many methods have been proposed for estimating the cost of software production, and managers choose among them based on their own project type. One common mathematical method is COCOMO. This model was first presented by Barry W. Boehm in 1981; it uses fixed parameters obtained by regression analysis of statistical data from 63 software projects of different types [3]. Although COCOMO is one of the most common and popular methods, it is less accurate on modern projects because of the limitations of its coefficients; in other words, the model in its original form is of limited use in modern projects [4]. A remedy is to tune the COCOMO parameters accurately according to the project type, which is the subject of this paper.
This paper has six sections. Section 2 reviews related work and Section 3 describes COCOMO. Section 4 explains Invasive Weed Optimization (IWO), Section 5 presents the proposed method and its results, and Section 6 concludes.
2- Related Work
In the last few years, artificial intelligence techniques have been widely used to optimize problems in many fields of science, especially computer science. Because each of these techniques has its own advantages, recent studies commonly combine methods to benefit from several of them. There are also new methods for estimating software costs that are based on heuristic and meta-heuristic algorithms. In such methods, the algorithm is normally first trained on a set of training samples. In this stage the method sets the values of the optimization variables automatically, and the training process ends when certain conditions, known as stopping conditions, are met. The test data are then fed to the algorithm so that it can estimate the required values for them; this part of the algorithm is called the test phase. The hybrid methods GA-LR and GA-NLR have been presented to estimate software costs and were applied to data sets such as NASA60, NASA93 and COCOMO 81. The MMRE of GA-LR on NASA93 is 0.43 in training and 0.48 in testing, while for GA-NLR these values are 0.2 and 0.42 [5]. On the COCOMO 81 data set, the MMRE values of GA-LR and GA-NLR are 0.46, 0.35, 0.44 and 0.37, respectively, in the training and testing processes. In terms of MMRE, the results show that the combined models are less accurate than GA-LR and GA-NLR. The hybrid models PSO-FCM and PSO-LM have been proposed to estimate software costs and were applied to the NASA60 data set. Cost estimation was also carried out with PSO on the KEMERER data set, which contains records of 15 projects [6]. The reported MMRE is 56.57 in this case, compared with 245.39 for COCOMO. A hybrid of GA and Scatter Search (SS) was also applied to the NASA60 and NASA93 data sets [7].
The MMRE of the hybrid model on NASA60 and NASA93 is 7.56 and 23.85, respectively, while for GA alone it is 36.51 and 19.63, and for SS alone it is 15.21 and 29.15. According to the results, relative to COCOMO the MMRE of the hybrid model on NASA60 and NASA93 is reduced to 3.92 and 2.96 [8]. A hybrid of GA and Artificial Immune System (AIS) has also been proposed and applied to NASA60 [9]. In this case, the MMRE values for the GA and AIS model are 12.44, 18.20, 15 and 14. A hybrid of GA and Firefly Algorithm (FA) has been presented to estimate cost on the NASA93 data set. In this model, the smallest error value is returned as the cost. The results show that MMRE is 58.80 for COCOMO, 38.31 for GA, 30.34 for FA and 22.53 for the hybrid. The comparison shows that the hybrid model increases estimation accuracy by about 2.88% compared with COCOMO. The methods LR, ANN, SVR and KNN have also been proposed for cost estimation [9]; their estimation accuracies are 60%, 95%, 80% and 60%, respectively.
3- COCOMO
COCOMO is an algorithmic software cost estimation model first proposed by Barry W. Boehm in 1981 [10]. The model is derived from data on real-world projects and takes the form of a parametric formula. It uses a three-level hierarchy [10]. The first level, Basic COCOMO, is fast but rough: because it does not consider the characteristics of the software project, its accuracy is low and it is used only for quick cost estimates. Intermediate COCOMO does take these characteristics into account.
Detailed COCOMO additionally considers the effect of each project phase [11]. Basic COCOMO treats software development cost as a function of program size [12]. To categorize system complexity, COCOMO distinguishes three project types: Organic, Semi-detached and Embedded. A concise definition of these types is given in Table 1.
Table 1: Three software project types in COCOMO
Software projects |
Organic |
Semi-detached |
Embedded |
The Basic COCOMO equations yield three quantities: the effort, the development time and the number of people required. In Intermediate COCOMO, the required effort is a function of program size and a set of cost drivers; the product of these cost drivers is known as the effort adjustment factor (EAF). COCOMO uses Equation 1:
$$Effort = a \times (size)^{b} \times \prod_{i} EM_i \qquad (1)$$
where a and b are two coefficients that must be set according to the type of software project (Table 2), and EM_i are the effort multipliers (cost drivers) shown in Table 3.
Table 2: Types of projects in COCOMO model
Software project | a | b |
Organic | 3.2 | 1.05 |
Semi-detached | 3.0 | 1.12 |
Embedded | 2.8 | 1.20 |
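As an illustration of how the coefficients in Table 2 enter the effort formula, here is a minimal Python sketch. The function name `cocomo_effort`, the dictionary `COEFFICIENTS`, and the convention that size is given in thousands of lines of code with effort in person-months are assumptions made for this example; the paper does not state units. An empty list of multipliers is treated as "all cost drivers nominal", so that EAF equals 1.

```python
from math import prod

# (a, b) pairs copied from Table 2, keyed by project type.
COEFFICIENTS = {
    "organic": (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}


def cocomo_effort(size, mode, effort_multipliers=()):
    """Return a * size**b * EAF, with EAF the product of the multipliers.

    size is assumed to be in KLOC (an assumption of this sketch).
    """
    a, b = COEFFICIENTS[mode]
    eaf = prod(effort_multipliers)  # empty product is 1: all drivers nominal
    return a * size ** b * eaf


# Usage: a hypothetical 10 KLOC organic project with high required
# reliability (1.15) and high analyst capability (0.86) from Table 3.
effort = cocomo_effort(10.0, "organic", [1.15, 0.86])
```

Keeping the multipliers as a plain list makes it easy to pass only the drivers that deviate from the nominal rating of 1.00.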
Table 3: COCOMO cost drivers
Cost Drivers | Very Low | Low | Nominal | High | Very High | Extra High |
Product attributes | | | | | | |
Required software reliability | 0.75 | 0.88 | 1.00 | 1.15 | 1.40 | |
Size of application database | | 0.94 | 1.00 | 1.08 | 1.16 | |
Complexity of the product | 0.70 | 0.85 | 1.00 | 1.15 | 1.30 | 1.65 |
Hardware attributes | | | | | | |
Run-time performance constraints | | | 1.00 | 1.11 | 1.30 | 1.66 |
Memory constraints | | | 1.00 | 1.06 | 1.21 | 1.56 |
Volatility of the virtual machine environment | | 0.87 | 1.00 | 1.15 | 1.30 | |
Required turnabout time | | 0.87 | 1.00 | 1.07 | 1.15 | |
Personnel attributes | | | | | | |
Analyst capability | 1.46 | 1.19 | 1.00 | 0.86 | 0.71 | |
Applications experience | 1.29 | 1.13 | 1.00 | 0.91 | 0.82 | |
Software engineer capability | 1.42 | 1.17 | 1.00 | 0.86 | 0.70 | |
Virtual machine experience | 1.21 | 1.10 | 1.00 | 0.90 | | |
Programming language experience | 1.14 | 1.07 | 1.00 | 0.95 | | |
Project attributes | | | | | | |
Application of software engineering methods | 1.24 | 1.10 | 1.00 | 0.91 | 0.82 | |
Use of software tools | 1.24 | 1.10 | 1.00 | 0.91 | 0.83 | |
Required development schedule | 1.23 | 1.08 | 1.00 | 1.04 | 1.10 | |
4- Invasive Weed Optimization (IWO)
Invasive Weed Optimization (IWO) is one of the prominent algorithms for solving optimization problems. With suitable adaptations it can be applied to discrete, continuous and binary problems, and it has produced very good results. IWO is inspired by the behaviour of weeds in nature. The algorithm was presented by Mehrabian and Lucas in 2006 [13]. In nature, weeds grow aggressively, and this growth is a serious threat to useful plants. One of the most important traits of weeds is their high robustness and adaptability, and this trait is the basis of optimization in IWO.
Fig. 1: IWO phases [13]
The phases of the IWO meta-heuristic are as follows:
· Initialization of the population
In this stage, a limited number of weed seeds are spread over the search space.
· Reproduction
Each seed grows into a flowering plant. Each plant then produces seeds in proportion to its fitness.
· Spatial dispersal
The produced seeds are dispersed near their parent using a normal distribution with zero mean and a varying variance, and every seed grows into a new plant.
· Competitive exclusion
Competitive exclusion is applied among the weeds to keep their number below a maximum: weeds with low fitness are eliminated. Reproduction continues until the maximum number of plants is reached; from then on, only the plants with higher fitness survive and produce seeds, and the rest are removed.
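The four phases above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this paper: the function name `iwo_minimize`, all parameter names and all default values are assumptions for this example. The linear mapping from fitness to seed count and the nonlinear shrinking of the standard deviation follow the formulation commonly attributed to Mehrabian and Lucas [13].

```python
import random


def iwo_minimize(cost, bounds, iterations=100, init_pop=5, max_pop=15,
                 seeds_min=1, seeds_max=5, sigma_init=1.0, sigma_final=0.01,
                 modulation=3, rng=None):
    """Minimize cost(x) over box bounds; returns (best_x, best_cost)."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # Initialization: spread a few seeds uniformly over the search space.
    plants = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(init_pop)]
    scored = sorted(((cost(p), p) for p in plants), key=lambda t: t[0])

    for it in range(iterations):
        # The dispersal standard deviation shrinks nonlinearly over time.
        frac = ((iterations - it) / iterations) ** modulation
        sigma = frac * (sigma_init - sigma_final) + sigma_final

        best, worst = scored[0][0], scored[-1][0]
        offspring = []
        for c, plant in scored:
            # Reproduction: lower cost (higher fitness) yields more seeds.
            ratio = 1.0 if worst == best else (worst - c) / (worst - best)
            n_seeds = seeds_min + round(ratio * (seeds_max - seeds_min))
            for _ in range(n_seeds):
                # Spatial dispersal: zero-mean normal step around the parent.
                seed = clip([v + rng.gauss(0.0, sigma) for v in plant])
                offspring.append((cost(seed), seed))

        # Competitive exclusion: keep only the max_pop fittest plants.
        scored = sorted(scored + offspring, key=lambda t: t[0])[:max_pop]

    return scored[0][1], scored[0][0]
```

Because parents compete together with their offspring, the best solution found so far is never lost between iterations.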
5- Proposed model
Software cost estimation is one of the most important and essential aspects of managing and planning software projects, and different algorithms have been presented for it. In this study, the Invasive Weed Optimization meta-heuristic is used to optimize the software effort estimation process as far as possible. The COCOMO 81 data set, which contains complete information on 63 real software projects, is used; for each project, seventeen attributes are stored. Equation 1 shows that the effort required for software development depends on the software size and the other 15 attributes of the project. Note that these projects are classified into three types: Embedded, Semi-detached and Organic.
5-1- Optimization process
In the training process of this model, new coefficients are proposed for COCOMO using Invasive Weed Optimization, and the proposed model is built on them. The COCOMO 81 data set, containing information on 63 software projects, is the input to the algorithm. In this data set the dependent variable is effort, which is the seventeenth attribute; fifteen other attributes act as independent variables, and the sixteenth attribute represents software size. At the beginning of this phase, the projects are randomly divided into training and testing sets. Invasive Weed Optimization then proposes values for the optimization variables within a defined range. Using Equation 1, the estimated effort is calculated for each project. The relative error of each estimate and the average over all estimates are then computed and returned to Invasive Weed Optimization. Since Invasive Weed Optimization is an optimization algorithm whose aim is to minimize its cost function, it tries to minimize the returned value: based on the previous values of the relative error, it proposes new values for the optimization variables. The parameters of the proposed model are set in this step. The parameters are the coefficients a and b of COCOMO; that is, the aim is to produce optimized coefficients. Following the COCOMO classification, the training data are divided into three categories: Embedded, Semi-detached and Organic. The projects are grouped in the proposed model because COCOMO projects are highly heterogeneous. In effect, the model tries to improve the estimation accuracy locally, within each category.
In the next stage, one of the categories (Embedded, Organic or Semi-detached) is selected. Using Invasive Weed Optimization and the proposed coefficients, the effort of the projects in the selected category is estimated. Once Invasive Weed Optimization has determined the coefficients a and b, the COCOMO effort is calculated with Equation 2, in which a and b are the values proposed by Invasive Weed Optimization.
$$MM = a \times (size)^{b} \times EAF \qquad (2)$$
The effort of all projects in the category is estimated using the proposed coefficients. When the estimation for a category is finished, the performance parameters are measured for that category. The performance parameters used here are MMRE (Mean Magnitude of Relative Error) and PRED (Prediction). To calculate MMRE and PRED, the MRE (Magnitude of Relative Error) must first be calculated with Equation 4. After the MRE of all projects has been calculated, MMRE and PRED are obtained from Equations 5 and 6. MMRE is the average of the MRE values in the category under consideration. PRED is the fraction of projects whose MRE is less than or equal to X.
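The training objective described in this section can be sketched as follows. This is a hedged illustration only: the record layout (dictionaries with the keys `mode`, `size`, `eaf` and `effort`) and the function names are assumptions for this example, and the data in the usage lines are made up rather than taken from COCOMO 81. The function `training_cost` is the quantity an optimizer such as IWO would minimize for one category.

```python
from collections import defaultdict


def estimate(a, b, size, eaf):
    """Equation 2: MM = a * size**b * EAF."""
    return a * size ** b * eaf


def training_cost(factors, projects):
    """Mean relative error of the estimates for one category of projects."""
    a, b = factors
    errors = [abs(estimate(a, b, p["size"], p["eaf"]) - p["effort"]) / p["effort"]
              for p in projects]
    return sum(errors) / len(errors)


def group_by_mode(projects):
    """Split projects into Organic, Semi-detached and Embedded groups."""
    groups = defaultdict(list)
    for p in projects:
        groups[p["mode"]].append(p)
    return dict(groups)


# Usage with made-up records (not COCOMO 81 data): efforts are generated
# from known factors, so the cost is zero exactly at those factors.
demo = [
    {"mode": "organic", "size": 10.0, "eaf": 1.0, "effort": 3.2 * 10.0 ** 1.05},
    {"mode": "organic", "size": 25.0, "eaf": 0.9, "effort": 3.2 * 25.0 ** 1.05 * 0.9},
]
cost_at_truth = training_cost((3.2, 1.05), group_by_mode(demo)["organic"])
```

Fitting one (a, b) pair per category, rather than one pair for all projects, mirrors the grouping step described above.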
5-2- Efficiency measuring parameters
There are different performance criteria for evaluating the accuracy of an estimation model, but only two common ones are used here:
1- Mean Magnitude of Relative Error (MMRE) and
2- Prediction percentage (PRED), which are calculated through Equations 3 to 6.
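$$RE = \frac{Estimated - Actual}{Actual} \qquad (3)$$
$$MRE = \left|\frac{Estimated - Actual}{Actual}\right| \times 100 \qquad (4)$$
$$MMRE = \frac{1}{N}\sum_{i=1}^{N} MRE_i \qquad (5)$$
$$PRED(x) = \frac{A}{N} \qquad (6)$$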
1 - Mean Magnitude of Relative Error (MMRE)
2 - Prediction (PRED)
3 - Magnitude of Relative Error (MRE)
In the above equations, A is the number of estimated projects whose prediction error (MRE) is less than or equal to X, and N is the total number of estimated projects. X is usually set to 0.25 in software effort estimation studies, and proposed models are compared with each other on this basis; prior studies have treated this value as a convention. We also take X equal to 0.25 and calculate PRED for it. MMRE should be minimized, while PRED should be maximized. In the testing stage, the results of the training stage are used to evaluate the proposed model. The data used in this stage are the testing projects, and the optimized coefficients are used for estimation. A testing project is first selected and its COCOMO category (Embedded, Semi-detached or Organic) is determined. The coefficients for that category, which were saved in the training phase, are then retrieved.
The effort estimated for the project is saved, a new project is selected from the testing set, and the same steps are repeated to estimate its effort. This process is repeated for all projects in the testing set. Then the MRE of each testing project is calculated; there are as many MRE values as estimated testing projects, and their average gives MMRE. Finally, the overall MMRE and PRED are calculated in the evaluation step. The whole process is illustrated in Figure 2. As Figure 2 shows, the project categorization of the training phase is reused in the testing stage, so that each project is estimated with coefficients suited to its own nature. In addition, each project uses only the cost drivers that are relevant to it.
Fig. 2: Process of proposed model
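The evaluation criteria of Equations 3 to 6 can be sketched in Python as shown below. The function names are assumptions for this example. MRE is kept as a fraction rather than multiplied by the factor of 100 in Equation 4, so that the threshold X = 0.25 can be applied directly; this is a choice made for the sketch. The median variant `mdmre` is included because MdMRE is reported in the results, and it is assumed here to be the median of the MRE values.

```python
from statistics import median


def mre(actual, estimated):
    """Magnitude of relative error for one project, as a fraction."""
    return abs(estimated - actual) / actual


def mmre(actuals, estimates):
    """Mean of the MRE values over all projects (Equation 5)."""
    values = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(values) / len(values)


def mdmre(actuals, estimates):
    """Median of the MRE values over all projects."""
    return median(mre(a, e) for a, e in zip(actuals, estimates))


def pred(actuals, estimates, x=0.25):
    """Fraction of projects whose MRE is at most x (Equation 6, A / N)."""
    hits = sum(1 for a, e in zip(actuals, estimates) if mre(a, e) <= x)
    return hits / len(actuals)


# Usage with made-up efforts (not taken from COCOMO 81).
actual = [100.0, 200.0, 400.0, 50.0]
estimated = [110.0, 150.0, 400.0, 100.0]
summary = (mmre(actual, estimated), pred(actual, estimated), mdmre(actual, estimated))
```

For these four made-up projects the MRE values are 0.1, 0.25, 0 and 1.0, so three of the four fall within the 0.25 threshold.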
5-3- Results
The COCOMO 81 data set is used to measure the accuracy of the proposed model combined with Invasive Weed Optimization. MATLAB 2015 was used as the modeling environment. The results are presented in terms of the MMRE, PRED(0.25) and MdMRE criteria. Table 4 compares basic COCOMO and the proposed model on these evaluation criteria. As shown, the proposed model performs better on all three criteria. Note that the reported results are the average of the results for the Organic, Semi-detached and Embedded projects.
Table 4: Primary amount-giving to IWO
Model | MMRE | PRED(0.25) | MdMRE |
COCOMO | 0.318 | 0.3492 | 0.351 |
COCOMO+IWO | 0.2931 | 0.3722 | 0.332 |
In the proposed model, MMRE-PRED is taken as the output of the cost function. Thus the aim of the cost function is to minimize MMRE and maximize PRED. Fig. 6 compares MMRE for COCOMO and the proposed model; the proposed model achieves the lower (better) value on this criterion, in line with Table 4.
Fig. 6: Comparing COCOMO and proposed model based on MMRE
PRED is compared for COCOMO and the proposed model in Fig. 7. The figure shows that PRED is higher for the proposed model, which means that more of its estimates fall within the accepted error threshold.
Fig. 7: Comparing COCOMO and proposed model based on PRED
MdMRE is compared in Fig. 8. The value of this criterion is lower for the proposed model, i.e. its estimation accuracy is higher than that of COCOMO.
Fig. 8: Comparing COCOMO and proposed model based on MdMRE
6- Conclusion
Software cost estimation, and in particular estimation of the required effort, is one of the most critical activities in software project management. Among the existing methods, COCOMO receives a great deal of attention and is widely used for effort estimation.
This study attempted to improve the performance of COCOMO and modify its structure. For this purpose, Invasive Weed Optimization was used to optimize COCOMO, and new coefficients were defined for it. In effect, a new version of COCOMO was presented using Invasive Weed Optimization. The performance of the proposed hybrid model was measured on real data, and the results showed that the optimization process improved all three performance criteria (Table 4). As future work, we plan to apply other optimization algorithms to the proposed model.
References
[1] B. W. Boehm, et al. (2000). "Software Cost Estimation with COCOMO II", Prentice Hall.
[2] K. D. Maxwell, P. Forselius (2000). "Benchmarking software development productivity", IEEE Software, 17(1): 80–88.
[3] Boehm, B. W. (1981). Software engineering economics. Englewood Cliffs, NJ: Prentice Hall.
[4] A. Iman and H. O. Siew (2010). Soft Computing Approach for Software Cost Estimation, Int. J. of Software Engineering, IJSE, 3(1): 1-10.
[5] Heiat, A. (2002). Comparison of artificial neural network and regression models for estimating software development effort. Information and Software Technology 44(15): 911-922.
[6] Gharehchopogh, Soleimanian, F; et al. (2014). A Novel PSO based Approach with Hybrid of Fuzzy C-Means and Learning Automata in Software Cost Estimation. Indian Journal of Science and Technology 7(6): 795-803.
[7] Maleki, I. Gharehchopogh, Ayat, F. S, Ebrahimi, L. (2014). A Novel Hybrid Model of Scatter Search and Genetic Algorithms for Software Cost Estimation. MAGNT Research Report, 2(6): 359-371.
[8] Leung, Hareton, Zhang, F. (2002). "Software cost estimation." Handbook of Software Engineering, Hong Kong Polytechnic University.
[9] Gharehchopogh, Soleimanian, F; et al. (2014). A Novel Hybrid Artificial Immune System with Genetic Algorithm for Software Cost Estimation. MAGNT Research Report, 2(6): 506-517.
[10] Atashpaz, G. E. et al. (2008). Colonial competitive algorithm: a novel approach for PID controller design in MIMO distillation column process. International Journal of Intelligent Computing and Cybernetics 1(3): 337-355.
[11] Bardsiri, V. K; et al. (2013). A PSO-based model to increase the accuracy of software development effort estimation. Software Quality Journal 21(3): 501-526.
[12] Catal, C., Mehmet, S. A. (2011). A Composite Project Effort Estimation Approach in an Enterprise Software Development Project. International Conference on Software Engineering & Knowledge Engineering.
[13] A. R. Mehrabian and C. Lucas (2006). "A novel numerical optimization algorithm inspired from weed colonization," Ecological Informatics, 1(4): 355–366.