Big data course K17 - Spark's collaborative filtering method

Email of the author of the article: [email protected] Address: Huizhou, Guangdong

▲ Chapter outline

⚪ Understand the concept of collaborative filtering in Spark;

1. The concept of collaborative filtering

1. Concept

Collaborative filtering is a way to leverage the wisdom of the crowd. It uses a large body of existing user preferences to estimate a user's preference for items the user has not yet encountered. The underlying idea rests on a definition of similarity.

1. User-based collaborative filtering concept

In user-based methods, if two users show similar preferences (i.e., roughly the same preferences for the same items), their interests are assumed to be similar. To recommend an unknown item to one of these users, select a number of similar users, compute a combined score for each item from those users' preferences, and then use the score to recommend items. The overall logic is that if similar users also prefer certain items, those items are likely to be recommended.
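The user-based idea above can be sketched in a few lines of plain Scala. This is only an illustrative sketch, not the article's code: the object name `UserBasedSketch`, the toy ratings in `sample`, the choice of cosine similarity, and the neighbourhood size `k` are all assumptions made for the example.

```scala
object UserBasedSketch {
  // user -> (item -> rating). Hypothetical toy data, not from the article.
  type Ratings = Map[String, Map[String, Double]]

  val sample: Ratings = Map(
    "alice" -> Map("A" -> 5.0, "B" -> 1.0),
    "bob"   -> Map("A" -> 5.0, "B" -> 1.0, "C" -> 5.0, "D" -> 2.0),
    "carol" -> Map("A" -> 1.0, "B" -> 5.0, "C" -> 1.0, "D" -> 5.0)
  )

  // Cosine similarity between two sparse rating vectors
  // (an item missing from a map counts as 0).
  def cosine(a: Map[String, Double], b: Map[String, Double]): Double = {
    val dot = (a.keySet intersect b.keySet).toSeq.map(i => a(i) * b(i)).sum
    val normA = math.sqrt(a.values.map(v => v * v).sum)
    val normB = math.sqrt(b.values.map(v => v * v).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Score every item the target has not rated with the similarity-weighted
  // average rating of the k most similar users, best score first.
  def recommend(ratings: Ratings, target: String, k: Int): Seq[(String, Double)] = {
    val mine = ratings(target)
    val neighbours = ratings.toSeq
      .collect { case (user, r) if user != target => (cosine(mine, r), r) }
      .filter(_._1 > 0.0)
      .sortBy(-_._1)
      .take(k)
    val candidates = neighbours.flatMap(_._2.keys).distinct.filterNot(mine.contains)
    candidates.map { item =>
      val votes = neighbours.collect { case (sim, r) if r.contains(item) => (sim, r(item)) }
      val score = votes.map { case (sim, rating) => sim * rating }.sum / votes.map(_._1).sum
      (item, score)
    }.sortBy(-_._2)
  }
}
```

With the toy data, `UserBasedSketch.recommend(UserBasedSketch.sample, "alice", 2)` ranks item C ahead of item D, because bob, whose ratings are closest to alice's, rated C highly.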

2. The concept of item-based collaborative filtering

Item-based methods can also be used to make recommendations. This approach usually computes a degree of similarity between items from existing users' preferences or ratings of those items. Items that receive the same ratings from similar users are considered more similar. Once the similarity between items is known, a user can be represented by the items they have already interacted with; the method then finds items similar to these known items and recommends them to the user. Likewise, the items similar to the known items are used to produce a combined score that evaluates the similarity of unknown items.
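The item-based variant can be sketched the same way. Again this is a minimal sketch under stated assumptions rather than the article's implementation: `ItemBasedSketch`, the toy `sample` ratings, and the use of cosine similarity between item rating vectors are all chosen for illustration.

```scala
object ItemBasedSketch {
  // user -> (item -> rating). Hypothetical toy data, not from the article.
  type Ratings = Map[String, Map[String, Double]]

  val sample: Ratings = Map(
    "alice" -> Map("A" -> 5.0, "B" -> 1.0),
    "bob"   -> Map("A" -> 5.0, "B" -> 1.0, "C" -> 5.0, "D" -> 2.0),
    "carol" -> Map("A" -> 1.0, "B" -> 5.0, "C" -> 1.0, "D" -> 5.0)
  )

  // Invert the data to item -> (user -> rating) so items can be compared.
  def byItem(ratings: Ratings): Map[String, Map[String, Double]] =
    ratings.toSeq
      .flatMap { case (user, rs) => rs.toSeq.map { case (item, r) => (item, user, r) } }
      .groupBy(_._1)
      .map { case (item, rows) => item -> rows.map { case (_, user, r) => user -> r }.toMap }

  // Cosine similarity between two sparse vectors (missing entries count as 0).
  def cosine(a: Map[String, Double], b: Map[String, Double]): Double = {
    val dot = (a.keySet intersect b.keySet).toSeq.map(k => a(k) * b(k)).sum
    val normA = math.sqrt(a.values.map(v => v * v).sum)
    val normB = math.sqrt(b.values.map(v => v * v).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Score each item the target has not rated by how similar it is to the
  // items the target already rated, weighted by those ratings.
  def recommend(ratings: Ratings, target: String): Seq[(String, Double)] = {
    val items = byItem(ratings)
    val known = ratings(target)
    val unknown = items.keySet -- known.keySet
    unknown.toSeq.map { candidate =>
      val sims = known.toSeq.map { case (item, r) => (cosine(items(candidate), items(item)), r) }
      val weight = sims.map(_._1).sum
      val score = if (weight == 0.0) 0.0 else sims.map { case (s, r) => s * r }.sum / weight
      (candidate, score)
    }.sortBy(-_._2)
  }
}
```

Here the user is represented only by the items already rated (A and B for alice), and each unknown item is scored by its similarity to those known items, which is the step the paragraph above describes.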

2. Recommendation methods for collaborative filtering

1. User-based recommendations

Recommendations based on user similarity can be summed up in one phrase: "like-minded people". And this holds in practice.

For example, suppose you want to watch a movie but don’t know whether it suits your taste. What should you do? Reading introductions and watching short trailers online is certainly a good idea, but it does not tell you in detail whether the movie will actually match your preferences. The best approach may be something like this:

Xiao Wang: Hey, I want to go see this movie. You’ve seen it, right? How was it?

Xiao Zhang: Not great. I went to watch it with my girlfriend. She loved it, but I watched half of it and then played on my phone.

Xiao Wang: Are there any good movies out lately?

Xiao Zhang: Go watch "Thunder XX". I enjoyed it, and I think you will like it too.

Xiao Wang: Okay.

This is a conversation that often occurs in daily life and is the basis of user-based collaborative filtering algorithms.

Xiao Wang and Xiao Zhang are good friends, and good friends tend to share the same interests. On that basis, it is reasonable for them to recommend the things they like to each other, and there is reason to believe that the person receiving the recommendation will also enjoy the recommended item.

The figure below illustrates how the user-based collaborative filtering algorithm works.

As the figure shows, if we want to recommend an item to user 3, the question is which item to choose. In the existing data, user 3 has selected item 1 and item 5, user 2 tends to select item 2 and item 4, and user 1 has selected item 1, item 4 and item 5.

Even without detailed analysis, it is clear that user 1 and user 3 have more similar selection preferences, since both have chosen items 1 and 5. It is therefore reasonable to recommend item 4, the remaining item chosen by user 1, to user 3.

This is recommendation based on the user-based collaborative filtering algorithm. It scans the existing users who have items in common with the target user, computes each one's similarity to the target using a chosen similarity measure, selects the user with the highest score, and returns recommendations to the target user based on that user's existing data. The algorithm is relatively simple, its results are easy to interpret, and it has high practical value.
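The three-user example can be worked through in code. The sketch below uses the selections listed in the text (the figure itself is not reproduced here); the object name `FigureExample` and the choice of Jaccard similarity as the measure are assumptions made for illustration, since the text does not name a specific measure.

```scala
object FigureExample {
  // user -> items selected, as listed in the text.
  val selections: Map[Int, Set[Int]] = Map(
    1 -> Set(1, 4, 5),
    2 -> Set(2, 4),
    3 -> Set(1, 5)
  )

  // Jaccard similarity: size of the intersection divided by size of the union.
  def jaccard(a: Set[Int], b: Set[Int]): Double =
    if (a.isEmpty && b.isEmpty) 0.0
    else (a intersect b).size.toDouble / (a union b).size

  // The other user whose selections overlap most with the target's.
  def mostSimilarUser(target: Int): Int =
    (selections - target)
      .maxBy { case (_, items) => jaccard(selections(target), items) }
      ._1

  // Items the most similar user selected that the target has not.
  def recommendFor(target: Int): Set[Int] =
    selections(mostSimilarUser(target)) -- selections(target)
}
```

For user 3, the similarity to user 1 is 2/3 (items 1 and 5 shared out of items 1, 4 and 5 in total) and the similarity to user 2 is 0, so `FigureExample.mostSimilarUser(3)` returns 1 and `FigureExample.recommendFor(3)` returns `Set(4)`.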

2. Item-based recommendations

The principle of the item-based recommendation algorithm can also be summed up in one phrase: "birds of a feather flock together".

This time Xiao Zhang wanted to buy a gift for his girlfriend.

Xiao Zhang: Valentine’s Day is coming soon. I want to buy a gift for my girlfriend, but I don’t know what to buy. She almost scolded me to death for buying a racing car model last time.

Xiao Wang: Oh? Then you really need to buy something she likes. What does she usually like?

Xiao Zhang: She usually likes to watch cartoons, especially "Doraemon". She watches a few episodes when she has nothing to do.

Xiao Wang: Then I suggest you buy her a Doraemon model set. She will definitely like it.

Xiao Zhang: Good idea, I’ll try it.

In this conversation, Xiao Zhang wants to buy a gift for his girlfriend and asks Xiao Wang for advice. Xiao Wang knows nothing about her except what she already likes, so he suggests an item similar to it.

For a user we know little about, when no specific user information is available, it is reasonable to recommend an unknown item based on that user's existing preference data. This is the item-based recommendation algorithm.

2. Case - User and Movie Recommendation

1. Simplified version of the code

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import scala.collection.mutable.Map

object Driver1 {
  def main(args: Array[String]): Unit = {
    // the body of main is not included in the source listing
  }
}


Origin blog.csdn.net/u013955758/article/details/132567538