In the first installment of this two-part series on machine learning with Apache Mahout, I explained how to use one of the Mahout recommender engines with just a few lines of code. In this article, I explore more-advanced machine learning algorithms included in Apache Mahout and discuss their refinement process.
Making Changes to the User-Based Recommender
The recommender example introduced in the previous article generated three recommendations for user 1001. The following lines show a new version of this example. In this case, the code retrieves all the user IDs from the data model and iterates over the long primitives using the org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator. The code requests a list of up to five recommendations for each user and displays each recommended item with its strength of preference value.
package com.first;

import java.io.*;
import java.util.*;

import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;

public class GenericUserBasedRecommender1 {
    public static void main(String[] args) throws Exception {
        // Create a data source from the CSV file
        File userPreferencesFile = new File("data/dataset1.csv");
        DataModel dataModel = new FileDataModel(userPreferencesFile);

        UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel);
        UserNeighborhood userNeighborhood = new NearestNUserNeighborhood(2, userSimilarity, dataModel);

        // Create a generic user based recommender with the dataModel, the userNeighborhood and the userSimilarity
        Recommender genericRecommender = new GenericUserBasedRecommender(dataModel, userNeighborhood, userSimilarity);

        // Recommend 5 items for each user
        for (LongPrimitiveIterator iterator = dataModel.getUserIDs(); iterator.hasNext();) {
            long userId = iterator.nextLong();

            // Generate a list of 5 recommendations for the user
            List<RecommendedItem> itemRecommendations = genericRecommender.recommend(userId, 5);

            System.out.format("User Id: %d%n", userId);

            if (itemRecommendations.isEmpty()) {
                System.out.println("No recommendations for this user.");
            } else {
                // Display the list of recommendations
                for (RecommendedItem recommendedItem : itemRecommendations) {
                    System.out.format("Recommened Item Id %d. Strength of the preference: %f%n", recommendedItem.getItemID(), recommendedItem.getValue());
                }
            }
        }
    }
}
To rebuild the project, execute the mvn compile Maven command explained in the previous article.
Then, to run the built project:
mvn exec:java -Dexec.mainClass="com.first.GenericUserBasedRecommender1"
The following lines show the last part of the output generated by the execution. Notice that there are just four recommendations for user 1001 and no recommendations for the rest of the users.
User Id: 1001
Recommened Item Id 9010. Strength of the preference: 9.500863
Recommened Item Id 9011. Strength of the preference: 9.499137
Recommened Item Id 9012. Strength of the preference: 8.499137
Recommened Item Id 9004. Strength of the preference: 2.001726
User Id: 1002
No recommendations for this user.
User Id: 1003
No recommendations for this user.
User Id: 1004
No recommendations for this user.
User Id: 1005
No recommendations for this user.
User Id: 1006
No recommendations for this user.
The GenericUserBasedRecommender user-based recommender works with the following components:
- A data model: dataModel.
- A user similarity that encapsulates some notion of similarity among users: userSimilarity. In this case, userSimilarity is an instance of PearsonCorrelationSimilarity.
- A user neighborhood that encapsulates some notion of a group of users with the most similar tastes. In this case, userNeighborhood is an instance of NearestNUserNeighborhood and the code defines the neighborhood as the two most similar users.
- A recommender engine: genericRecommender. In this case, genericRecommender is an instance of GenericUserBasedRecommender.
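To make the interplay of these components more concrete, the following minimal sketch shows one common way a user-based recommender can combine them: it estimates a user's preference for an item as the similarity-weighted average of the neighbors' ratings. The sketch is plain Java and does not use Mahout. The class name PreferenceEstimateSketch, the estimate method, and all similarity and rating values are assumptions made up for illustration, and Mahout's actual implementation may differ in its details.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration only; this is not Mahout's implementation.
final class PreferenceEstimateSketch {

    /**
     * Estimates a preference as the similarity-weighted average of the
     * neighbors' ratings for a single item.
     *
     * @param neighborSimilarity hypothetical similarity of each neighbor to the target user
     *                           (assumed positive in this sketch)
     * @param neighborRating     hypothetical rating each neighbor gave the item; a neighbor
     *                           without a rating simply has no entry
     * @return the weighted average, or NaN when no neighbor rated the item
     */
    static double estimate(Map<Long, Double> neighborSimilarity,
                           Map<Long, Double> neighborRating) {
        double weightedSum = 0.0;
        double totalSimilarity = 0.0;
        for (Map.Entry<Long, Double> entry : neighborSimilarity.entrySet()) {
            Double rating = neighborRating.get(entry.getKey());
            if (rating == null) {
                continue; // this neighbor has not rated the item
            }
            weightedSum += entry.getValue() * rating;
            totalSimilarity += entry.getValue();
        }
        return totalSimilarity == 0.0 ? Double.NaN : weightedSum / totalSimilarity;
    }

    public static void main(String[] args) {
        // Hypothetical similarities between the target user and two neighbors
        Map<Long, Double> similarity = new LinkedHashMap<>();
        similarity.put(1002L, 0.9);
        similarity.put(1003L, 0.6);

        // Hypothetical ratings those neighbors gave to one item
        Map<Long, Double> ratingsForItem = new LinkedHashMap<>();
        ratingsForItem.put(1002L, 9.0);
        ratingsForItem.put(1003L, 8.0);

        System.out.printf("Estimated preference: %.2f%n", estimate(similarity, ratingsForItem));
    }
}
```

With these made-up numbers, the estimate lands between the two neighbors' ratings and closer to the rating of the more similar neighbor. This is the intuition behind changing the neighborhood or the similarity metric: either change alters which ratings contribute and how much weight each one carries.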
You can change both the way in which the user-based recommender determines the similarities among users and the way in which it defines a neighborhood. For example, if you change the line that creates the instance of NearestNUserNeighborhood to:
UserNeighborhood userNeighborhood = new NearestNUserNeighborhood(10, userSimilarity, dataModel);
the neighborhood will be composed of up to the 10 most similar users, instead of just two.
You can use the previously shown Maven commands to rebuild and execute the modified project. The following lines show the last part of the output generated by the execution. Notice that with the bigger neighborhood, users 1002 and 1003 now receive recommendations. In addition, the recommended items for user 1001 are the same as in the previous execution, but their order and their strength of preference values are different.
User Id: 1001
Recommened Item Id 9010. Strength of the preference: 8.699270
Recommened Item Id 9012. Strength of the preference: 8.659677
Recommened Item Id 9011. Strength of the preference: 8.377571
Recommened Item Id 9004. Strength of the preference: 1.000000
User Id: 1002
Recommened Item Id 9012. Strength of the preference: 8.721395
Recommened Item Id 9010. Strength of the preference: 8.523443
Recommened Item Id 9011. Strength of the preference: 8.211071
User Id: 1003
Recommened Item Id 9012. Strength of the preference: 8.692321
Recommened Item Id 9010. Strength of the preference: 8.613442
Recommened Item Id 9011. Strength of the preference: 8.303847
User Id: 1004
No recommendations for this user.
User Id: 1005
No recommendations for this user.
User Id: 1006
No recommendations for this user.
It is also possible to define a neighborhood with users that have a similarity above a certain value instead of picking a fixed number of closest neighbors. For example, if you replace the line that creates the instance of NearestNUserNeighborhood with this:
UserNeighborhood userNeighborhood = new ThresholdUserNeighborhood(0.75, userSimilarity, dataModel);
the neighborhood will be composed only of users that have a Pearson correlation of 0.75 or above. This way, the code uses a threshold-based definition of the user neighborhood, rather than a fixed-size neighborhood: an instance of the ThresholdUserNeighborhood class replaces the instance of the NearestNUserNeighborhood class.
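The difference between the two neighborhood definitions can be sketched in a few lines of plain Java. The following example does not use Mahout; the class name NeighborhoodSketch, its two methods, and the similarity values are hypothetical and only illustrate the idea of a fixed-size neighborhood versus a threshold-based one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration only; this is not Mahout's implementation.
final class NeighborhoodSketch {

    /** Returns up to n user IDs with the highest similarity, most similar first. */
    static List<Long> nearestN(Map<Long, Double> similarityByUser, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(similarityByUser.entrySet());
        // Sort by similarity in descending order
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<Long> neighbors = new ArrayList<>();
        for (Map.Entry<Long, Double> entry : entries.subList(0, Math.min(n, entries.size()))) {
            neighbors.add(entry.getKey());
        }
        return neighbors;
    }

    /** Returns every user ID whose similarity is at or above the threshold. */
    static List<Long> aboveThreshold(Map<Long, Double> similarityByUser, double threshold) {
        List<Long> neighbors = new ArrayList<>();
        for (Map.Entry<Long, Double> entry : similarityByUser.entrySet()) {
            if (entry.getValue() >= threshold) {
                neighbors.add(entry.getKey());
            }
        }
        return neighbors;
    }

    public static void main(String[] args) {
        // Hypothetical similarities between one target user and four other users
        Map<Long, Double> similarityToTarget = new LinkedHashMap<>();
        similarityToTarget.put(1002L, 0.95);
        similarityToTarget.put(1003L, 0.80);
        similarityToTarget.put(1004L, 0.40);
        similarityToTarget.put(1005L, -0.10);

        System.out.println("Nearest 3: " + nearestN(similarityToTarget, 3));
        System.out.println("Threshold 0.75: " + aboveThreshold(similarityToTarget, 0.75));
    }
}
```

With these made-up values, the fixed-size neighborhood of three includes user 1004 even though its similarity is only 0.40, while the 0.75 threshold keeps only users 1002 and 1003. A threshold can therefore produce a smaller neighborhood, or an empty one, when few users are similar enough.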
The following lines show the last part of the output generated by the execution of this change. The recommended items for user 1001 are the same as in the previous execution, and only the strength of preference value for item 9004 is different. Users 1002 and 1003 no longer receive recommendations.
User Id: 1001
Recommened Item Id 9010. Strength of the preference: 8.699270
Recommened Item Id 9012. Strength of the preference: 8.659677
Recommened Item Id 9011. Strength of the preference: 8.377571
Recommened Item Id 9004. Strength of the preference: 1.680646
User Id: 1002
No recommendations for this user.
User Id: 1003
No recommendations for this user.
User Id: 1004
No recommendations for this user.
User Id: 1005
No recommendations for this user.
User Id: 1006
No recommendations for this user.
The user similarity determines which users are similar to others. In the previous examples, the PearsonCorrelationSimilarity class provided a similarity metric based on the Pearson correlation. The pure Pearson correlation does not reflect the number of items over which it is computed, so the PearsonCorrelationSimilarity class offers a weighting option that takes the number of items into account. This way, a correlation that is based on more information is treated as more reliable.
You simply need to specify an additional parameter to the PearsonCorrelationSimilarity constructor in order to use weighting:
UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel, org.apache.mahout.cf.taste.common.Weighting.WEIGHTED);
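To see why the pure Pearson correlation does not reflect the number of items, consider the following minimal sketch in plain Java. It does not use Mahout and does not reproduce Mahout's weighting formula. The class name PearsonSketch, the pearson method, and the ratings are assumptions made up for illustration.

```java
// Simplified illustration only; this is not Mahout's implementation.
final class PearsonSketch {

    /**
     * Computes the unweighted Pearson correlation between two users' ratings
     * for the same items; x[i] and y[i] are the two ratings for item i.
     * The result is NaN when either user gave the same rating to every item.
     */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("Both users need ratings for the same, non-empty set of items");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double covariance = 0.0;
        double varianceX = 0.0;
        double varianceY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }
        return covariance / Math.sqrt(varianceX * varianceY);
    }

    public static void main(String[] args) {
        // Two users who share only two rated items (hypothetical ratings)
        double[] twoItemsA = {2.0, 9.0};
        double[] twoItemsB = {5.0, 6.0};

        // Two users who share five rated items (hypothetical ratings)
        double[] fiveItemsA = {1.0, 3.0, 5.0, 7.0, 9.0};
        double[] fiveItemsB = {2.0, 4.0, 6.0, 8.0, 10.0};

        System.out.println("Two shared items:  " + pearson(twoItemsA, twoItemsB));
        System.out.println("Five shared items: " + pearson(fiveItemsA, fiveItemsB));
    }
}
```

Both pairs reach the maximum correlation of 1.0, although the first pair overlaps on just two items and their ratings are far apart. The unweighted metric cannot distinguish the two cases, which is the gap that the weighting option is meant to address.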