Optimization of weighting-based approach to predict and deal with cold start of web recommender systems using cuckoo algorithm
Subject Areas: Data Mining
1 - Department of Computer Engineering, Dezful Branch, Islamic Azad University, Dezful, Iran
Journal of Advances in Computer Engineering and Technology
First A. Author, Second B. Author, and Third C. Author
Received (Day Month Year)
Revised (Day Month Year)
Accepted (Day Month Year)
Abstract— Recommender systems take limited information from a user, such as what the user has searched for in the past and which products he or she has rated, and use it to identify the user correctly and offer the desired items; these items are suggested to the user through the user's profile. In this research, a new method is presented for recommending a user's interests in the form of a personalized user profile. The idea is to use the search information of other users, stored in a database, to make recommendations to new users. We first collect a log file of the items searched by users, then pre-process this log file to take the data out of its raw state and clean it. Next, using data weighting and a score function, we extract the items users searched for most in the past and present them to the user through a recommender system based on collaborative filtering. Finally, we optimize the results using the cuckoo algorithm so that the recommended information matches the user's interests. The results of this study showed 99% accuracy and 97% recall, which means the method can to a large extent correctly predict the user's favorite items and pages and confront the cold-start problem faced by most recommender systems.
1. Introduction
Knowledge discovery and data mining provide an array of solutions for real-world problems. When facing business requirements, the ultimate goal of knowledge discovery is not the knowledge itself but rather making the gained knowledge practical. Consequently, the models and patterns found by mining methods often require post-processing. To this end, actionable knowledge discovery has been introduced, which extracts actionable knowledge from data. Its output is a set of actions that help the domain expert reach the desired outcome; the process of extracting such a set of actions is called action extraction [1].
One of the best ways to put extracted information to use is through recommender systems. Recommender systems take limited information from the user, such as what the user has searched for in the past and which products he or she has rated, and use it to make appropriate suggestions. A recommender system that can gather information about users' tastes, interests, and preferences, and then categorize and interpret it, lets users reach the information they want with less time and energy. Recommender systems are a subset of information filtering systems that seek to predict a user's rating of, or preference for, an item. In recent years they have become very common and are used in many fields; popular applications include music, web pages, news, books and articles, search, and social media. The recommender system is one of the tools that can guide users in electronic environments to the information, services, and items they want. By discovering users' interests and predicting their preferences, recommender systems filter the items likely to interest the user out of large volumes of data, saving time. By storing and analyzing the user's past behavior, these systems can also infer services and information the user has not yet seen but may be interested in, and present those results as well. In fact, recommender systems are one of the main tools for overcoming information overload and, with their power to analyze user behavior, are an intelligent complement to information retrieval and filtering.
One of the main problems of recommender systems is cold start. This problem occurs when new users log in or new items are added to the catalog. In such cases, neither the new user's tastes can be predicted nor have the new items yet been rated or purchased by users, which leads to inappropriate and less accurate suggestions. The cold-start problem can be addressed in several ways, including: (a) first asking the user to rate some items, (b) asking the user in general terms to state his or her tastes explicitly, and (c) suggesting items to the new user based on collected demographic information. Demographic information such as location and zip code is collected during the new user's interactions with the system and is used to suggest items rated by similar users, that is, users whose demographic information resembles the new user's. A recommender system is thus an information retrieval technology that improves access and dynamically suggests relevant items to users, taking into account the preferences and behaviors users express explicitly.
A recommender system is, in fact, one of the main ways to deal with information overload in the field of information retrieval, and it does so by offering relevant and appropriate items to users. Today, several recommender systems have been developed for different domains; however, these systems are not accurate enough to meet users' information needs, so higher-quality recommender systems are required [18]. In this research, a new method is proposed to improve the efficiency and accuracy of recommender systems and to solve the cold-start problem that affects most of them. The method combines data weighting with a score function inside a recommender system based on collaborative filtering. According to the experiments reported below, this recommender system can be considerably more efficient than previous systems.
2. Related work
In their 2020 article, Maazouzi et al. presented a new way to improve the recommendation of web pages to users. They proposed an effective suggestion system based on grouping users with similar characteristics and then advising each group effectively. The authors applied the Pearson correlation coefficient method to users' TED discussions, grouped the users with the K-Means clustering method, and then suggested pages to the target user. Their evaluation, performed using RMSE, obtained better accuracy and recall than other available methods [2].
In their 2020 paper, Riyahi and Sohrabi presented a way to improve web page recommendation using a hybrid recommender system and data tagging. The researchers extracted the meaning of tags using the WordNet lexical database and then organized the tags into a hierarchical structure based on their semantic significance. This hierarchical structure was used to search for relevant tags in the content-based filtering component, while user requests were expanded with semantically related tags and scored in the collaborative filtering component. Combining these two components produced a hybrid suggestion system that could recommend pages to users [3].
In their paper, Pabitha et al. presented a way to improve recommender systems with a weight-based approach. The researchers described a collaborative filtering-based approach to advising users, in which clustering is used to identify the neighborhood of the current user and suggest relevant resources. A weight-based method is used to calculate resource rankings; it was adopted to make the system robust against data sparsity. The system is a client-side web application that provides recommendations by building user-resource graphs and ranking resources with a newly designed method similar to search algorithms. The results indicate improved recommendations in these systems [4].
In his 2018 article, Liu introduced a method based on weighting and tagging to improve recommendations for users. To address the problem of expressing personal interest and recommending relevant web resources, a tag-based weighting framework is proposed for bookmarking websites, in which user profiles, tags, and resources are defined in a single unified form, and "interest pursuit" based on social network analysis is used to calculate the impact of social relationships on individual interests. The method is compared with several common filtering-based recommendation methods using data sets collected from two social bookmarking websites. The results show that it improves the performance of resource recommendation and outperforms the baseline methods [5].
In their 2019 article, Waq and Patil use web mining techniques to personalize the web and recommend web pages. These techniques find the relationships between web pages and feed the clustering and classification stages of data mining and data analysis. The two researchers model new measures of the relationship between pages, such as a distance matrix, an occurrence-frequency matrix, and a relationship matrix. From the relationship matrix, a virtual graph over the web pages is created. Using an advanced search algorithm, they divide the virtual graph into different clusters, i.e., navigation patterns; the method is a graph-based algorithm. The active user is assigned to one of the clusters using the LCS algorithm, and finally a threshold value is used to suggest only the best pages to the user [6].
In their 2020 article, Kalanat and Khanjari presented a way to extract and discover knowledge on social media. Considering the problems of previous extraction methods, these researchers proposed a method whose main focus is extracting actions in social networks based on the attributes of nodes. Their work showed that there are optimal changes to node attributes that lead to the desired label for the targeted users. The method extends attribute-based action extraction so that it naturally combines network structure information with node attributes: the goal is to learn a function that changes the values of node attributes, which in turn affects the weights of the edges in the network, so that the targeted users obtain the desired label at minimal modification cost. Experiments confirm that the proposed approach works better than the most advanced current method in action mining [19].
2.1. Recommender systems
Recommender systems have become very important in recent years. The goal of any such system is to help consumers find new goods or services, such as movies, articles, web pages, books, music, restaurants, or even people, based on information about the consumer or the recommended items [7, 8]. A recommender system recommends items to a group of users according to their preferences [9]. Recommender systems help users find items of interest in situations of information overload: the user's preferences are estimated from behavior observed in the past, and the user is given a ranked list of suggestions [10].
2.2. Collaborative filtering recommender systems
Collaborative filtering makes suggestions by matching users with other users who have similar interests. This type of filtering collects user feedback in the form of ratings given to items and finds overlaps in rating behavior to identify a group of users with similar preferences [11]. Here, a user profile represents the user's preferences, which the user has provided explicitly or implicitly [12]. In a typical collaborative filtering setting, there is a list of $m$ users and $n$ items, denoted $U = \{u_1, \dots, u_m\}$ and $I = \{i_1, \dots, i_n\}$, respectively. Each user has a list of items that he or she has rated explicitly or implicitly. This produces an $m \times n$ user-item matrix $R$ that records the users' preferences for items. Various methods are used to predict the unknown ratings, such as finding the "nearest neighbors," so that items offered to a new user are based on the ratings provided by that user's nearest neighbors.
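To make the user-item matrix and nearest-neighbor idea concrete, here is a minimal Python sketch (our illustration, not the system described in this paper): it builds a toy rating matrix $R$ and predicts an unrated entry from the k most similar users; all ratings and parameter values are invented for the example.

```python
import numpy as np

# Toy user-item rating matrix R (0 = unrated). Rows: users, columns: items.
# The data here is invented purely for illustration.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity over the items both users have rated."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated the item."""
    sims = [(cosine_sim(R[user], R[v]), v)
            for v in range(R.shape[0])
            if v != user and R[v, item] > 0]
    top = sorted(sims, reverse=True)[:k]
    num = sum(s * R[v, item] for s, v in top)
    den = sum(abs(s) for s, _ in top)
    return num / den if den else 0.0

print(predict(R, user=0, item=2))  # estimated rating of user 0 for item 2
```

The prediction is simply a similarity-weighted average of the neighbors' ratings, which is the "nearest neighbor" idea described above.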
3. Proposed method
The proposed approach tries to suggest learning resources to the learner based on his or her preferences and previous search history, a history extracted from log files. The approach combines learning styles and collaborative filtering techniques to improve the quality of suggestions. To deal with the cold-start problem, the proposed system provides an initial training strategy at the start of a session; this covers the cases where a lack of data about learners and their preferences would otherwise make suggestions impossible. We then use the collaborative filtering approach to refine these initial suggestions. The idea is to predict the learner's preferences from the preferences of other people who are similar to the active learner: by looking at the learners' profiles, a group of learners is identified whose preferences match the current learner's. The learner's profile is compared with the different groups of learners that have already been formed, and the most appropriate neighborhood is identified. In addition, to update the profile, we define a new score function for weighting learning resources, which extracts the learner's preferences from the log files. This function also strengthens performance and normalizes the rankings to prevent the problem of data sparsity. The architecture for proposing resources using this method is shown in Fig. 1.
Fig 1. An overview of the proposed method
Database
To implement the proposed system, we must first collect a log file of the requests of different users. We use the NASA log file, which can be obtained at http://ita.ee.lbl.gov/html/contrib/NASAHTTP. This log file contains 978,000 records of user requests over a specific period of time, from which we randomly selected a database of 20,000 records for this research. We then convert the log file into user sessions using the 30-minute rule, and the weight of each page viewed by the user is calculated within its session. This continues until no session remains. At the end of this step, a file containing 1,650 sessions was extracted from the initial log file, which after cleaning was reduced to 1,431 sessions.
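As a sketch of the 30-minute sessionization rule (the record shape below is an assumption for illustration; the NASA log itself stores raw request lines), the following Python function splits one user's time-ordered requests into sessions whenever two consecutive requests are more than 30 minutes apart:

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)

def sessionize(requests):
    """Split one user's time-ordered (timestamp, url) requests into sessions.

    A new session starts whenever the gap between consecutive
    requests exceeds 30 minutes (the classic sessionization rule).
    """
    sessions, current = [], []
    last_time = None
    for ts, url in requests:
        if last_time is not None and ts - last_time > SESSION_GAP:
            sessions.append(current)
            current = []
        current.append((ts, url))
        last_time = ts
    if current:
        sessions.append(current)
    return sessions

# Invented example data in a NASA-log-like shape (timestamp, requested page).
reqs = [
    (datetime(1995, 7, 1, 0, 0), "/shuttle/countdown/"),
    (datetime(1995, 7, 1, 0, 10), "/shuttle/missions/"),
    (datetime(1995, 7, 1, 1, 5), "/images/launch.gif"),  # >30 min gap: new session
]
print(len(sessionize(reqs)))  # -> 2 sessions
```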
Step 1. Data preprocessing
In the first step of the proposed method, we perform data preprocessing, because raw data usually cannot be fed directly into data mining algorithms. To prepare the data, it must be taken out of its original form and transformed into a form suitable for the algorithm. The available data also usually contain extraneous entries that may confuse the algorithm, so we need to remove the extra data that do not help the problem or the algorithm. Data preprocessing is usually performed before the main data mining step, facilitates and assists the algorithms, and is an important step toward successful data mining.
Data cleansing
At this point we need to clean the existing data. Data cleansing is the process of eliminating errors and inconsistencies in the data, and it is in fact the quality-control stage that precedes data analysis. Data from real-world sources are often inaccurate, incomplete, and inconsistent due to operational errors and the way systems are implemented; such data must be cleaned first.
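For illustration, the sketch below applies cleaning rules that are standard in web-usage mining, such as dropping failed requests and embedded resources like images and stylesheets; the exact rules and the record schema here are assumptions, since the paper does not list them:

```python
import re

# Typical cleaning rules for web server logs (assumed, not quoted from the paper):
# keep successful page requests, drop embedded resources and error responses.
NON_PAGE = re.compile(r"\.(gif|jpg|jpeg|png|css|js|ico)$", re.IGNORECASE)

def clean(entries):
    """entries: list of dicts with 'url', 'status', 'method' keys (assumed schema)."""
    kept = []
    for e in entries:
        if e["method"] != "GET":            # only page views
            continue
        if not (200 <= e["status"] < 300):  # drop errors and redirects
            continue
        if NON_PAGE.search(e["url"]):       # drop images, stylesheets, scripts
            continue
        kept.append(e)
    return kept
```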
Data normalization
At this point we need to normalize the data. Normalization transforms the data so that it is mapped into a small, definite range such as the interval [-1, 1]. The goal of normalization is to eliminate data redundancy and maintain the dependencies between related data. This process often results in more tables, but it reduces the size of the database and improves performance.
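One standard way to realize this mapping is min-max scaling; the paper does not state its exact formula, so the following is only a minimal sketch:

```python
import numpy as np

def minmax_scale(x, lo=-1.0, hi=1.0):
    """Map the values of x linearly onto [lo, hi] (here [-1, 1])."""
    x = np.asarray(x, dtype=float)
    xmin, xmax = x.min(), x.max()
    if xmax == xmin:                 # constant column: map to midpoint
        return np.full_like(x, (lo + hi) / 2)
    return lo + (x - xmin) * (hi - lo) / (xmax - xmin)

print(minmax_scale([2, 4, 6, 10]))  # -> [-1.  -0.5  0.   1. ]
```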
Step 2. Data clustering with SW-DBSCAN algorithm
Next we need to cluster the data to find similarities between them. The clustering method used in this research is SW-DBSCAN, an improved version of the DBSCAN clustering algorithm. SW-DBSCAN is a grid-based clustering method that reduces the time complexity and increases the efficiency of DBSCAN in order to obtain more accurate clusters, and we cluster our data through this algorithm. First, we divide the space into a grid of cells and run the DBSCAN algorithm inside each cell according to the number of points in it. Then we shift the coordinates of the grid; this gives a second grid, to which we apply the same procedure. With two grids in hand, the next step is to merge the clusters obtained in the first grid according to the second grid: we build a matrix that tells us which clusters should be merged, and finally we merge the first grid's clusters based on this matrix. The result is clusters accurate enough that the data can be used to identify a user's interest in a particular set of items [13]. Clustering the 1,431 active sessions produced 118 clusters.
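The following simplified Python sketch shows only the grid-partitioning stage of this idea: points are assigned to cells and DBSCAN is run inside each cell, which is what reduces the cost on large data. The merge step across the two shifted grids is omitted, and all parameter values are assumptions rather than the settings of [13]:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def grid_dbscan(points, cell_size=1.0, eps=0.3, min_samples=5):
    """Run DBSCAN independently inside each grid cell.

    This sketches only the partitioning stage of the grid-based
    acceleration: clusters found in different cells still need the
    merge step (via the second, shifted grid) described in the text.
    """
    points = np.asarray(points, dtype=float)
    cells = {}
    for idx, p in enumerate(points):
        key = tuple((p // cell_size).astype(int))   # cell coordinates
        cells.setdefault(key, []).append(idx)

    labels = np.full(len(points), -1)               # -1 = noise
    next_label = 0
    for idx_list in cells.values():
        sub = points[idx_list]
        if len(sub) < min_samples:                  # too few points: all noise
            continue
        local = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(sub)
        for i, lab in zip(idx_list, local):
            if lab != -1:
                labels[i] = next_label + lab        # make labels globally unique
        if local.max() >= 0:
            next_label += local.max() + 1
    return labels
```

The merge step is needed because a true cluster can straddle a cell boundary; detecting exactly those cases is the purpose of the second, shifted grid.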
Step 3. Weighting learning resources
After cleaning and pre-processing the web logs, the data are converted and merged into forms appropriate for the proposed method. For this purpose, we define the ranking weight of each learning activity using the following score function:
[Equations (1)-(12), which define the score function, are not legible in the source document.]
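Because the original equations are not recoverable, the sketch below only illustrates the kind of score function the text describes: a page weight that grows with how often and how long a page was viewed in a session, normalized to a fixed range. Every symbol and constant in it is our assumption, not the authors' definition.

```python
def score(frequency, duration, max_frequency, max_duration, alpha=0.5):
    """Illustrative page weight: a convex combination of normalized
    visit frequency and normalized viewing time, mapped to [0, 1].

    This is NOT the paper's equations (1)-(12), which are not legible
    in the source; it only sketches the kind of weighting described.
    """
    f = frequency / max_frequency if max_frequency else 0.0
    d = duration / max_duration if max_duration else 0.0
    return alpha * f + (1 - alpha) * d

# A session where a page was visited 3 times for 120 s in total,
# against session maxima of 5 visits and 300 s:
print(score(3, 120, 5, 300))  # -> 0.5 * 0.6 + 0.5 * 0.4 = 0.5
```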