Optimization of weighting-based approach to predict and deal with cold start of web recommender systems using cuckoo algorithm
Subject Areas: Data Mining
1 - Department of Computer Engineering, Dezful Branch, Islamic Azad University, Dezful, Iran
Journal of Advances in Computer Engineering and Technology
First A. Author, Second B. Author, and Third C. Author
Received (Day Month Year)
Revised (Day Month Year)
Accepted (Day Month Year)
Abstract— Recommender systems take limited information from a user, such as what the user has searched for in the past and which products he or she has rated, and use it to identify the user correctly and offer the desired items; these items are suggested to the user through the user's profile. In this research, a new method is presented for recommending a user's interests in the form of a personalized user profile. The idea is to use the search information of other users, stored in a database, to make recommendations to new users. We first collect a log file of the items searched by users, then pre-process this log file to take the data out of its raw state and clean it. Next, using data weighting and a score function, we extract the items users searched for most in the past and present them to the user through a recommender system based on collaborative filtering. Finally, we optimize the results using the cuckoo algorithm so that the recommended information matches the user's interests. The results of this study showed 99% accuracy and 97% recall, which means the method can to a large extent correctly predict the user's favorite items and pages and confront the cold-start problem faced by most recommender systems.
1. Introduction
Knowledge discovery and data mining provide an array of solutions for real-world problems. When facing business requirements, the ultimate goal of knowledge discovery is not the knowledge itself but rather making the gained knowledge practical. Consequently, the models and patterns found by mining methods often require post-processing. To this end, actionable knowledge discovery has been introduced, which extracts actionable knowledge from data. Its output is a set of actions that help the domain expert reach the desired outcome; the process of extracting such a set of actions is called action extraction [1].
One of the best ways to put extracted information to use is through recommender systems. Recommender systems take limited information from the user, such as what the user has searched for in the past and which products he or she has rated, and use it to make appropriate suggestions. A recommender system that can gather information about users' tastes, interests, and preferences, and then categorize and interpret it, lets users reach the information they want with less time and energy. Recommender systems are a subset of information filtering systems that seek to predict a user's rating of, or preference for, an item. In recent years they have become very common and are used in many fields; popular applications include music, web pages, news, books and articles, search, and social media. The recommender system is one of the tools that can guide users in electronic environments to the information, services, and items they want. By discovering users' interests and predicting their preferences, recommender systems filter the items likely to interest the user out of large volumes of data, saving time. By storing and analyzing the user's past behavior, these systems can also infer services and information the user has not yet seen but may be interested in, and present those results as well. In fact, recommender systems are one of the main tools for overcoming information overload and, with their power to analyze user behavior, are an intelligent complement to information retrieval and filtering.
One of the main problems of recommender systems is cold start. This problem occurs when new users log in or new items are added to the catalog. In such cases, neither the new user's tastes can be predicted nor have the new items yet been rated or purchased by users, which leads to inappropriate and less accurate suggestions. The cold-start problem can be addressed in several ways, including: (a) first asking the user to rate some items, (b) asking the user in general terms to state his or her tastes explicitly, and (c) suggesting items to the new user based on collected demographic information. Demographic information such as location and zip code is collected during the new user's interactions with the system and is used to suggest items rated by similar users, that is, users whose demographic information resembles the new user's. A recommender system is thus an information retrieval technology that improves access and dynamically suggests relevant items to users, taking into account the preferences and behaviors users express explicitly.
A recommender system is, in fact, one of the main ways to deal with information overload in the field of information retrieval, and it does so by offering relevant and appropriate items to users. Today, several recommender systems have been developed for different domains; however, these systems are not accurate enough to meet users' information needs, so higher-quality recommender systems are required [18]. In this research, a new method is proposed to improve the efficiency and accuracy of recommender systems and to solve the cold-start problem that affects most of them. The method combines data weighting with a score function inside a recommender system based on collaborative filtering. According to the experiments reported below, this recommender system can be considerably more efficient than previous systems.
2. Related work
In their 2020 article, Maazouzi et al. presented a new way to improve the recommendation of web pages to users. They proposed an effective suggestion system based on grouping users with similar characteristics and then advising each group effectively. The authors applied the Pearson correlation coefficient method to users' TED discussions, grouped the users with the K-Means clustering method, and then suggested pages to the target user. Their evaluation, performed using RMSE, obtained better accuracy and recall than other available methods [2].
In their 2020 paper, Riyahi and Sohrabi presented a way to improve web page recommendation using a hybrid recommender system and data tagging. The researchers extracted the meaning of tags using the WordNet lexical database and then organized the tags into a hierarchical structure based on their semantic significance. This hierarchical structure was used to search for relevant tags in the content-based filtering component, while user requests were expanded with semantically related tags and scored in the collaborative filtering component. Combining these two components produced a hybrid suggestion system that could recommend pages to users [3].
In their paper, Pabitha et al. presented a way to improve recommender systems with a weight-based approach. The researchers described a collaborative filtering-based approach to advising users, in which clustering is used to identify the neighborhood of the current user and suggest relevant resources. A weight-based method is used to calculate resource rankings; it was adopted to make the system robust against data sparsity. The system is a client-side web application that provides recommendations by building user-resource graphs and ranking resources with a newly designed method similar to search algorithms. The results indicate improved recommendations in these systems [4].
In his 2018 article, Liu introduced a method based on weighting and tagging to improve recommendations for users. To address the problem of expressing personal interest and recommending relevant web resources, a tag-based weighting framework is proposed for bookmarking websites, in which user profiles, tags, and resources are defined in a single unified form, and "interest pursuit" based on social network analysis is used to calculate the impact of social relationships on individual interests. The method is compared with several common filtering-based recommendation methods using data sets collected from two social bookmarking websites. The results show that it improves the performance of resource recommendation and outperforms the baseline methods [5].
In their 2019 article, Waq and Patil use web mining techniques to personalize the web and recommend web pages. These techniques find the relationships between web pages and feed the clustering and classification stages of data mining and data analysis. The two researchers model new measures of the relationship between pages, such as a distance matrix, an occurrence-frequency matrix, and a relationship matrix. From the relationship matrix, a virtual graph over the web pages is created. Using an advanced search algorithm, they divide the virtual graph into different clusters, i.e., navigation patterns; the method is a graph-based algorithm. The active user is assigned to one of the clusters using the LCS algorithm, and finally a threshold value is used to suggest only the best pages to the user [6].
In their 2020 article, Kalanat and Khanjari presented a way to extract and discover knowledge on social media. Considering the problems of previous extraction methods, these researchers proposed a method whose main focus is extracting actions in social networks based on the attributes of nodes. Their work showed that there are optimal changes to node attributes that lead to the desired label for the targeted users. The method extends attribute-based action extraction so that it naturally combines network structure information with node attributes: the goal is to learn a function that changes the values of node attributes, which in turn affects the weights of the edges in the network, so that the targeted users obtain the desired label at minimal modification cost. Experiments confirm that the proposed approach works better than the most advanced current method in action mining [19].
2.1. Recommender systems
Recommender systems have become very important in recent years. The goal of any such system is to help consumers find new goods or services, such as movies, articles, web pages, books, music, restaurants, or even people, based on information about the consumer or the recommended items [7, 8]. A recommender system recommends items to a group of users according to their preferences [9]. Recommender systems help users find items of interest in situations of information overload: the user's preferences are estimated from behavior observed in the past, and the user is given a ranked list of suggestions [10].
2.2. Collaborative filtering recommender systems
Collaborative filtering makes suggestions by matching users with other users who have similar interests. This type of filtering collects user feedback in the form of ratings given to items and finds overlaps in rating behavior to identify a group of users with similar preferences [11]. Here, a user profile represents the user's preferences, which the user has provided explicitly or implicitly [12]. In a typical collaborative filtering setting, there is a list of $m$ users and $n$ items, denoted $U = \{u_1, \dots, u_m\}$ and $I = \{i_1, \dots, i_n\}$, respectively. Each user has a list of items that he or she has rated explicitly or implicitly. This produces an $m \times n$ user-item matrix $R$ that records the users' preferences for items. Various methods are used to predict the unknown ratings, such as finding the "nearest neighbors," so that items offered to a new user are based on the ratings provided by that user's nearest neighbors.
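To make the user-item matrix and nearest-neighbor idea concrete, here is a minimal Python sketch (our illustration, not the system described in this paper): it builds a toy rating matrix $R$ and predicts an unrated entry from the k most similar users; all ratings and parameter values are invented for the example.

```python
import numpy as np

# Toy user-item rating matrix R (0 = unrated). Rows: users, columns: items.
# The data here is invented purely for illustration.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity over the items both users have rated."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated the item."""
    sims = [(cosine_sim(R[user], R[v]), v)
            for v in range(R.shape[0])
            if v != user and R[v, item] > 0]
    top = sorted(sims, reverse=True)[:k]
    num = sum(s * R[v, item] for s, v in top)
    den = sum(abs(s) for s, _ in top)
    return num / den if den else 0.0

print(predict(R, user=0, item=2))  # estimated rating of user 0 for item 2
```

The prediction is simply a similarity-weighted average of the neighbors' ratings, which is the "nearest neighbor" idea described above.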
3. Proposed method
The proposed approach tries to suggest learning resources to the learner based on his or her preferences and previous search history, a history extracted from log files. The approach combines learning styles and collaborative filtering techniques to improve the quality of suggestions. To deal with the cold-start problem, the proposed system provides an initial training strategy at the start of a session; this covers the cases where a lack of data about learners and their preferences would otherwise make suggestions impossible. We then use the collaborative filtering approach to refine these initial suggestions. The idea is to predict the learner's preferences from the preferences of other people who are similar to the active learner: by looking at the learners' profiles, a group of learners is identified whose preferences match the current learner's. The learner's profile is compared with the different groups of learners that have already been formed, and the most appropriate neighborhood is identified. In addition, to update the profile, we define a new score function for weighting learning resources, which extracts the learner's preferences from the log files. This function also strengthens performance and normalizes the rankings to prevent the problem of data sparsity. The architecture for proposing resources using this method is shown in Fig. 1.
Fig 1. An overview of the proposed method
Database
To implement the proposed system, we must first collect a log file of the requests of different users. We use the NASA log file, which can be obtained at http://ita.ee.lbl.gov/html/contrib/NASAHTTP. This log file contains 978,000 records of user requests over a specific period of time, from which we randomly selected a database of 20,000 records for this research. We then convert the log file into user sessions using the 30-minute rule, and the weight of each page viewed by the user is calculated within its session. This continues until no session remains. At the end of this step, a file containing 1,650 sessions was extracted from the initial log file, which after cleaning was reduced to 1,431 sessions.
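As a sketch of the 30-minute sessionization rule (the record shape below is an assumption for illustration; the NASA log itself stores raw request lines), the following Python function splits one user's time-ordered requests into sessions whenever two consecutive requests are more than 30 minutes apart:

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)

def sessionize(requests):
    """Split one user's time-ordered (timestamp, url) requests into sessions.

    A new session starts whenever the gap between consecutive
    requests exceeds 30 minutes (the classic sessionization rule).
    """
    sessions, current = [], []
    last_time = None
    for ts, url in requests:
        if last_time is not None and ts - last_time > SESSION_GAP:
            sessions.append(current)
            current = []
        current.append((ts, url))
        last_time = ts
    if current:
        sessions.append(current)
    return sessions

# Invented example data in a NASA-log-like shape (timestamp, requested page).
reqs = [
    (datetime(1995, 7, 1, 0, 0), "/shuttle/countdown/"),
    (datetime(1995, 7, 1, 0, 10), "/shuttle/missions/"),
    (datetime(1995, 7, 1, 1, 5), "/images/launch.gif"),  # >30 min gap: new session
]
print(len(sessionize(reqs)))  # -> 2 sessions
```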
Step 1. Data preprocessing
In the first step of the proposed method, we perform data preprocessing, because raw data usually cannot be fed directly into data mining algorithms. To prepare the data, it must be taken out of its original form and transformed into a form suitable for the algorithm. The available data also usually contain extraneous entries that may confuse the algorithm, so we need to remove the extra data that do not help the problem or the algorithm. Data preprocessing is usually performed before the main data mining step, facilitates and assists the algorithms, and is an important step toward successful data mining.
Data cleansing
At this point we need to clean the existing data. Data cleansing is the process of eliminating errors and inconsistencies in the data, and it is in fact the quality-control stage that precedes data analysis. Data from real-world sources are often inaccurate, incomplete, and inconsistent due to operational errors and the way systems are implemented; such data must be cleaned first.
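For illustration, the sketch below applies cleaning rules that are standard in web-usage mining, such as dropping failed requests and embedded resources like images and stylesheets; the exact rules and the record schema here are assumptions, since the paper does not list them:

```python
import re

# Typical cleaning rules for web server logs (assumed, not quoted from the paper):
# keep successful page requests, drop embedded resources and error responses.
NON_PAGE = re.compile(r"\.(gif|jpg|jpeg|png|css|js|ico)$", re.IGNORECASE)

def clean(entries):
    """entries: list of dicts with 'url', 'status', 'method' keys (assumed schema)."""
    kept = []
    for e in entries:
        if e["method"] != "GET":            # only page views
            continue
        if not (200 <= e["status"] < 300):  # drop errors and redirects
            continue
        if NON_PAGE.search(e["url"]):       # drop images, stylesheets, scripts
            continue
        kept.append(e)
    return kept
```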
Data normalization
At this point we need to normalize the data. Normalization transforms the data so that it is mapped into a small, definite range such as the interval [-1, 1]. The goal of normalization is to eliminate data redundancy and maintain the dependencies between related data. This process often results in more tables, but it reduces the size of the database and improves performance.
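One standard way to realize this mapping is min-max scaling; the paper does not state its exact formula, so the following is only a minimal sketch:

```python
import numpy as np

def minmax_scale(x, lo=-1.0, hi=1.0):
    """Map the values of x linearly onto [lo, hi] (here [-1, 1])."""
    x = np.asarray(x, dtype=float)
    xmin, xmax = x.min(), x.max()
    if xmax == xmin:                 # constant column: map to midpoint
        return np.full_like(x, (lo + hi) / 2)
    return lo + (x - xmin) * (hi - lo) / (xmax - xmin)

print(minmax_scale([2, 4, 6, 10]))  # -> [-1.  -0.5  0.   1. ]
```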
Step 2. Data clustering with SW-DBSCAN algorithm
Next we need to cluster the data to find similarities between them. The clustering method used in this research is SW-DBSCAN, an improved version of the DBSCAN clustering algorithm. SW-DBSCAN is a grid-based clustering method that reduces the time complexity and increases the efficiency of DBSCAN in order to obtain more accurate clusters, and we cluster our data through this algorithm. First, we divide the space into a grid of cells and run the DBSCAN algorithm inside each cell according to the number of points in it. Then we shift the coordinates of the grid; this gives a second grid, to which we apply the same procedure. With two grids in hand, the next step is to merge the clusters obtained in the first grid according to the second grid: we build a matrix that tells us which clusters should be merged, and finally we merge the first grid's clusters based on this matrix. The result is clusters accurate enough that the data can be used to identify a user's interest in a particular set of items [13]. Clustering the 1,431 active sessions produced 118 clusters.
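The following simplified Python sketch shows only the grid-partitioning stage of this idea: points are assigned to cells and DBSCAN is run inside each cell, which is what reduces the cost on large data. The merge step across the two shifted grids is omitted, and all parameter values are assumptions rather than the settings of [13]:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def grid_dbscan(points, cell_size=1.0, eps=0.3, min_samples=5):
    """Run DBSCAN independently inside each grid cell.

    This sketches only the partitioning stage of the grid-based
    acceleration: clusters found in different cells still need the
    merge step (via the second, shifted grid) described in the text.
    """
    points = np.asarray(points, dtype=float)
    cells = {}
    for idx, p in enumerate(points):
        key = tuple((p // cell_size).astype(int))   # cell coordinates
        cells.setdefault(key, []).append(idx)

    labels = np.full(len(points), -1)               # -1 = noise
    next_label = 0
    for idx_list in cells.values():
        sub = points[idx_list]
        if len(sub) < min_samples:                  # too few points: all noise
            continue
        local = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(sub)
        for i, lab in zip(idx_list, local):
            if lab != -1:
                labels[i] = next_label + lab        # make labels globally unique
        if local.max() >= 0:
            next_label += local.max() + 1
    return labels
```

The merge step is needed because a true cluster can straddle a cell boundary; detecting exactly those cases is the purpose of the second, shifted grid.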
Step 3. Weighting learning resources
After cleaning and pre-processing the web logs, the data are converted and merged into forms appropriate for the proposed method. For this purpose, we define the ranking weight of each learning activity using the following score function:
[Equations (1)-(12), which define the score function, are not legible in the source document.]
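Because the original equations are not recoverable, the sketch below only illustrates the kind of score function the text describes: a page weight that grows with how often and how long a page was viewed in a session, normalized to a fixed range. Every symbol and constant in it is our assumption, not the authors' definition.

```python
def score(frequency, duration, max_frequency, max_duration, alpha=0.5):
    """Illustrative page weight: a convex combination of normalized
    visit frequency and normalized viewing time, mapped to [0, 1].

    This is NOT the paper's equations (1)-(12), which are not legible
    in the source; it only sketches the kind of weighting described.
    """
    f = frequency / max_frequency if max_frequency else 0.0
    d = duration / max_duration if max_duration else 0.0
    return alpha * f + (1 - alpha) * d

# A session where a page was visited 3 times for 120 s in total,
# against session maxima of 5 visits and 300 s:
print(score(3, 120, 5, 300))  # -> 0.5 * 0.6 + 0.5 * 0.4 = 0.5
```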