A Novel Clustering Algorithm Based upon Learning Automata for Collaborative Filtering
Subject Areas : International Journal of Smart Electrical EngineeringSara Taghipour 1 , Javad Akbari Torkestani 2 * , Sara Nazari 3
1 - Department of Computer Engineering, Qom Branch, Islamic Azad University, Qom, Iran.
2 - Department of Computer Engineering, Arak Branch, Islamic Azad University, Arak, Iran.
3 - Department of Computer Engineering, Arak Branch, Islamic Azad University, Arak, Iran.
Keywords: Learning Automata, collaborative filtering, Clustering, Recommender Systems,
Abstract :
Collaborative Filtering (CF) is one of the principal techniques applied in Recommender Systems, which uses ratings from similar users to predict interest items to a particular user. The scalability issue is a widespread problem of CF. The clustering technique is a successful approach to address the scalability issue in CF. However, some classic clustering methods cannot find appropriate clusters, which leads to low prediction accuracy. This paper suggests a new clustering algorithm based on the Learning Automata (LA) framework to group users for the CF technique. In this algorithm, a learning automaton is assigned to each user to detect the cluster membership of that user. Learning automatons improve their selection based on the reinforcement signal is received from intra-cluster distances and inter-cluster distances in previous iterations.Experimental results on standard and real datasets show that the proposed algorithm outperforms other compared methods in various evaluation metrics. This approach enhances the prediction accuracy and effectively deals with the scalability problem.
3 International Journal of Smart Electrical Engineering, Vol.10, No.1, Winter 2021 ISSN: 2251-9246
EISSN: 2345-6221
pp. 1:6 |
A Novel Clustering Algorithm Based upon Learning Automata for Collaborative Filtering
Sara Taghipour 1 , Javad Akbari Torkestni 2, Sara Nazari 3
1Department of Computer Engineering, Qom Branch, Islamic Azad University, Qom, Iran, sa.taghipour_stu@qom-iau.ac.ir
2Department of Computer Engineering, Arak Branch, Islamic Azad University, Arak, Iran, j-akbari@iau-arak.ac.ir
3Department of Computer Engineering, Arak Branch, Islamic Azad University, Arak, Iran, s-nazari@iau-arak.ac.ir
Abstract
Collaborative Filtering (CF) is one of the main approaches in Recommender Systems, which utilizes ratings from similar users to predict favorite items to a specific user. The scalability issue is a widespread problem of CF. The clustering technique is a successful approach to address the scalability issue in CF. However, some classic clustering methods cannot find appropriate clusters that result in low prediction accuracy. This paper suggests a new clustering algorithm based on the Learning Automata (LA) framework to group users for the CF technique. In this algorithm, each user is equipped with a learning automaton; the mission of each automaton is to find the proper cluster for the user. Learning automatons scatter the users randomly into clusters then, they gradually improve their selection by the feedback received from the intra-cluster distances and inter-cluster distances. Experiments on standard and popular datasets prove that the proposed algorithm achieves better results compared to other methods in different evaluation metrics. This approach enhances the accuracy of prediction and effectively deals with the scalability issue.
Keywords: Recommender Systems, Collaborative Filtering, Clustering, Learning Automata
Article history: Received 20-Apr-2021; Revised 01-May-2021; Accepted 15-May-2021.
© 2021 IAUCTB-IJSEE Science. All rights reserved
1. Introduction
The Recommender Systems (RS) uses the overloading information to attract customers. RS generates meaningful recommendations to users for items that are attractive to them [4]. Amazon.com Recommender System[1], Facebook Friend Recommendations, Netflix Recommendation [2], Google News Personalization System[3] are examples of RS in the real world.
Generally, the RS is categorized into three principal methods: Collaborative Filtering [5], Content-based Filtering [6], and Hybrid method [7]. Collaborative Filtering makes predictions to a specific user based on the ratings of their similar neighbors. Content-based Filtering methods utilize the properties of users and items in addition to explicit ratings. These two methods are often combined in Hybrid methods [8, 9].
Collaborative Filtering (CF) is successful recommendation method in practice [10, 11]. In CF, for a specific user, the missing rating on an item is estimated by ratings that were previously given by similar users. The scalability issue is one of the challenges in the CF that has negatively affected the prediction accuracy of CF [4, 5, 12, 13].
The clustering techniques are used to cope with the scalability and sparsity issues in the CF method. Generally, a cluster-based CF includes three steps. First, user clusters have been constructed by a clustering algorithm. Second, similar users are identified by the similarity measure in each cluster. Finally, the ratings of neighbors are applied to estimate the missing ratings. Clustering users can effectively partition the rating database thus, alleviate the scalability and sparsity issues; To identify similar users for a specific user, smaller clusters instead of the entire database are investigated [5, 11, 12, 14, 15].
Recent works [16-18] show that by applying more advanced clustering methods to alleviate the scalability issue, the prediction accuracy of recommendations can be preserved even outperform the other CF approaches. Koohi tries to alleviate the sparsity issue by clustering users with a fuzzy method and applying it in user-based CF. It shows that fuzzy clustering works well in comparison with KMeans clustering [1]. A novel evolutionary clustering algorithm has been presented by Chen to catch the similar interest between users in CF [16]. In [19], improving the scalability of CF has been achieved by using genetic algorithms to cluster users. K-means clustering algorithms in CF have been investigated by Zahra. The work investigates different centroid selection strategies in the k-means clustering method for CF [20].
Learning Automata (LA) is a reinforcement learning approach that learns by rewarding desired actions and punishing undesired actions. On the other hand, it learns based on feedback received from the system feedback [21]. However, LA has been applied in recommender systems [22, 23], social networks [24, 25], retrieval information [26]; it has not been proposed for clustering in CF methods.
In this study, we have presented a novel clustering algorithm that employed the learning automata method to group users in CF; it has called LAC. The proposed method intends to enhance the quality of clusters by learning automata and thereby increase the accuracy of the prediction. Here, each user is equipped with a learning automaton; the mission of each automaton is to find the proper cluster for the assigned user. The process of the proposed clustering algorithm has been accomplished by the repetition of the following steps. First, each of the learning automatons randomly chooses a cluster based on their action probability vector; a cluster configuration has appeared. Next, to check the current cluster configuration, the reinforcement signal has been determined by the Silhouette value (intra-cluster distances and inter-cluster distances) for each user. Finally, the automaton rewards or punishes the latest action based on the reinforcement signal. In the presented algorithm, the reinforcement signal is calculated based on the Silhouette value that is a measure of how similar a user is to the assigned cluster (cohesion) compared to other clusters (separation).
The efficiency of our method is assessed through simulation experiments on two real datasets, MovieLens 100K (denoted as Movie 100K) and MovieLens 1M (denotes as Movie 1M). Our method is compared with frequently used CF methods: user-based CF, KMeans CF, and FCM CF (Fuzzy C-means CF). The obtained results show that it is higher than others in accuracy metrics.
The rest of this paper is structured as follows. In section 2, we present the basic concepts of CF, the introduction of learning automata, and related work. In Section 3, our clustering algorithm (LAC) is described in detail. Collaborative Filtering based on the LAC algorithm (LAC-CF) is explained in Section 4. The experimental evaluation results are disclosed in Section 5. Section 6 is the conclusion.
2. Literature Review
In this part, a concise presentation to the CF is provided. Next, since the proposed technique is based upon Learning Automata to form clusters of users, we will introduce a brief theory of this technique. Finally, related work will be explained.
A. Collaborative Filtering Recommender Systems:
Collaborative Filtering (CF) is the successful method in RS. for making recommendations; the CF method has been employed in successful online businesses such as Netflix, Amazon, and Lastfm to recommend services and items to their customers [10, 18]. Ratings of n users on m items that are collected in a rating matrix are input data for CF; the rating matrix is sparse because of numerous missing ratings. The CF method could be categorized into model-based methods and memory-based. The principle of model-based CF is to produce a model from the database; it uses the learned model to predict missing ratings. The main property of the memory-based CF is that the rating matrix is explicitly applied to predict missing ratings. Two categories of memory-based CF are user-based CF and item-based CF. The basis of user-based CF is that a user likes those items that its similar users like. For finding similar users, the similarity value between users in the database is calculated by similarity measures then the most similar N users to user u are selected as neighbors of user u. Finally, predicting missing rating for user u is concluded of average ratings of neighbors. Three popular similarity measures for CF are Cosine-based, Adjusted cosine-based, and Pearson correlation-based [2-5] that described below. In Table 1, the symbols used in the following formula are reported. The formula for Cosine-based similarity is as equation (1):
| )1( |
Meaning | Symbol | |||||
Set of items rated by user u |
| |||||
Set of items rated by user v |
| |||||
User u’s rating for item i |
| |||||
User v’s rating for item i |
| |||||
The average rating by user u |
| |||||
The average rating by user v |
| |||||
The average rating for item i |
| |||||
User u’s predicted rating on item i |
| |||||
Neighbors of user u |
| |||||
|
|
| (2) |
| (3) |
| (4) |
| (5) | |||||
| (6) |
| (7) |
Algorithm1. Pseudocode of clustering based on Learning Automata in RS (LAC) | ||||||
Input: user-item rating matrix n, K //numbers of users, numbers of clusters Output: clusters of users () Assumptions: Let Ai be the learning automaton corresponding to user i Begin = //initial LAs Probability vector =0 = 0 =0 WHILE (t < tmax) //All LAs select an action based on their probability vector For each user, i, = random cluster based on End // Reinforcement signal () is computed for each LA For each user i = find an appropriate cluster End For each user i If == = 0 else = 1 End if End for // Update probability vector under reinforcement signal For each user i If = 0 Update by equation (5) Else if =1 Update by equation (6) End if End for t = t + 1 END WHILE End Fig. 2. The pseudocode of the LAC algorithm
|