
Machine Learning with Apache Mahout: Refining the Recommender


In the first installment of this two-part series on machine learning with Apache Mahout, I explained how to use one of the Mahout recommender engines with just a few lines of code. In this article, I explore more-advanced machine learning algorithms included in Apache Mahout and discuss their refinement process.

Making Changes to the User-Based Recommender

The recommender example introduced in the previous article generated three recommendations for user 1001. The following lines show a new version of this example. In this case, the code retrieves all the user IDs from the data model and iterates over the long primitives using the org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator. The code requests a list of up to five recommendations for each user and displays each recommended item together with its strength-of-preference value.

package com.first;

import java.io.*;
import java.util.*;

import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;

public class GenericUserBasedRecommender1 {

  public static void main(String[] args) throws Exception {
	  // Create a data source from the CSV file
	  File userPreferencesFile = new File("data/dataset1.csv");
	  DataModel dataModel = new FileDataModel(userPreferencesFile);
	 
	  UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel);
	  UserNeighborhood userNeighborhood = new NearestNUserNeighborhood(2, userSimilarity, dataModel);

	  // Create a generic user based recommender with the dataModel, the userNeighborhood and the userSimilarity
	  Recommender genericRecommender =  new GenericUserBasedRecommender(dataModel, userNeighborhood, userSimilarity);

	  // Recommend 5 items for each user
	  for (LongPrimitiveIterator iterator = dataModel.getUserIDs(); iterator.hasNext();)
	  {
		  long userId = iterator.nextLong();

		  // Generate a list of 5 recommendations for the user
		  List<RecommendedItem> itemRecommendations = genericRecommender.recommend(userId, 5);

		  System.out.format("User Id: %d%n", userId);

		  if (itemRecommendations.isEmpty())
		  {
			  System.out.println("No recommendations for this user.");
		  }
		  else
		  {
			  // Display the list of recommendations
			  for (RecommendedItem recommendedItem : itemRecommendations)
			  {
				  System.out.format("Recommened Item Id %d. Strength of the preference: %f%n", recommendedItem.getItemID(), recommendedItem.getValue());
			  }
		  }
	  }
  }
}

To rebuild the project, execute the mvn compile Maven command explained in the previous article.

Then, to run the built project:

mvn exec:java -Dexec.mainClass="com.first.GenericUserBasedRecommender1"

The following lines show the last part of the output generated by the execution. Notice that, although the code requests five recommendations per user, there are just four recommendations for user 1001 and none for the rest of the users.

User Id: 1001
Recommened Item Id 9010. Strength of the preference: 9.500863
Recommened Item Id 9011. Strength of the preference: 9.499137
Recommened Item Id 9012. Strength of the preference: 8.499137
Recommened Item Id 9004. Strength of the preference: 2.001726
User Id: 1002
No recommendations for this user.
User Id: 1003
No recommendations for this user.
User Id: 1004
No recommendations for this user.
User Id: 1005
No recommendations for this user.
User Id: 1006
No recommendations for this user.

The GenericUserBasedRecommender works with the following components:

  • A data model: dataModel.
  • A user similarity that encapsulates some notion of similarity among users: userSimilarity. In this case, userSimilarity is an instance of PearsonCorrelationSimilarity.
  • A user neighborhood that encapsulates some notion of a group of users with the most similar tastes. In this case, userNeighborhood is an instance of NearestNUserNeighborhood and the code defines the neighborhood as the two most similar users.
  • A recommender engine: genericRecommender. In this case, genericRecommender is an instance of GenericUserBasedRecommender.
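To make the interplay of these components more concrete, here is a minimal plain-Java sketch of one common way a user-based recommender can estimate a preference: as the similarity-weighted average of the neighbors' preferences for an item. It does not use Mahout and is not taken from Mahout's source. The class name WeightedEstimateSketch, the method estimate, and the sample numbers are all hypothetical.

```java
final class WeightedEstimateSketch {

    /**
     * Estimates a preference for one item as the similarity-weighted average
     * of the neighbors' preferences for that item.
     *
     * similarities[i] is the similarity between the target user and neighbor i.
     * preferences[i] is neighbor i's preference for the item, or NaN if that
     * neighbor has not rated it.
     *
     * Returns NaN when no neighbor rated the item or the weights sum to zero.
     */
    static double estimate(double[] similarities, double[] preferences) {
        if (similarities.length != preferences.length) {
            throw new IllegalArgumentException("Arrays must have the same length");
        }
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            if (Double.isNaN(preferences[i]) || Double.isNaN(similarities[i])) {
                continue; // this neighbor contributes nothing for this item
            }
            weightedSum += similarities[i] * preferences[i];
            totalWeight += similarities[i];
        }
        return totalWeight == 0.0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // Hypothetical neighborhood of three users; the third has not rated the item.
        double[] similarities = {0.9, 0.6, 0.8};
        double[] preferences = {9.0, 6.0, Double.NaN};
        // (0.9 * 9.0 + 0.6 * 6.0) / (0.9 + 0.6), which is about 7.8
        System.out.println(estimate(similarities, preferences));
    }
}
```

Under this scheme, both the similarity metric and the neighborhood definition influence the estimate, which is consistent with the strength values changing when either one is changed.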

You can make changes to both the way in which the user-based recommender determines the similarities among users and the ways to define a neighborhood. For example, if you change the line that creates the instance of NearestNUserNeighborhood to:

UserNeighborhood userNeighborhood = new NearestNUserNeighborhood(10, userSimilarity, dataModel);

the neighborhood will be composed of up to the 10 most similar users, instead of just two.
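The idea behind a fixed-size neighborhood can be sketched in plain Java, independently of Mahout: rank the candidate users by similarity and keep at most N of them. The class NearestNSketch, the method nearestN, and the sample similarities are hypothetical and only illustrate the selection step; this is not how NearestNUserNeighborhood is implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NearestNSketch {

    /**
     * Returns the indices of up to n candidates with the highest similarity,
     * most similar first. Candidates whose similarity is NaN (undefined)
     * are skipped, so the result can hold fewer than n entries.
     */
    static List<Integer> nearestN(double[] similarities, int n) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < similarities.length; i++) {
            if (!Double.isNaN(similarities[i])) {
                candidates.add(i);
            }
        }
        // Sort by similarity, highest first
        candidates.sort(Comparator.comparingDouble((Integer i) -> similarities[i]).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    public static void main(String[] args) {
        // Hypothetical similarities between one user and four other users
        double[] similarities = {0.2, Double.NaN, 0.9, 0.5};
        System.out.println(nearestN(similarities, 2));  // the two most similar candidates
        System.out.println(nearestN(similarities, 10)); // only three have a defined similarity
    }
}
```

The second call shows why the neighborhood holds "up to" N users: when fewer than N candidates have a defined similarity, the neighborhood is simply smaller.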

You can use the previously shown Maven commands to rebuild and execute the modified project. The following lines show the last part of the output generated by the execution. Notice that with the bigger neighborhood, more users receive recommendations: users 1002 and 1003 now get three each. In addition, user 1001 receives the same four items as in the previous execution, but the values for the strength of the preference are different, and items 9011 and 9012 have swapped positions.

User Id: 1001
Recommened Item Id 9010. Strength of the preference: 8.699270
Recommened Item Id 9012. Strength of the preference: 8.659677
Recommened Item Id 9011. Strength of the preference: 8.377571
Recommened Item Id 9004. Strength of the preference: 1.000000
User Id: 1002
Recommened Item Id 9012. Strength of the preference: 8.721395
Recommened Item Id 9010. Strength of the preference: 8.523443
Recommened Item Id 9011. Strength of the preference: 8.211071
User Id: 1003
Recommened Item Id 9012. Strength of the preference: 8.692321
Recommened Item Id 9010. Strength of the preference: 8.613442
Recommened Item Id 9011. Strength of the preference: 8.303847
User Id: 1004
No recommendations for this user.
User Id: 1005
No recommendations for this user.
User Id: 1006
No recommendations for this user.

It is also possible to define a neighborhood with users that have a similarity above a certain value instead of picking a fixed number of closest neighbors. For example, if you replace the line that creates the instance of NearestNUserNeighborhood with this:

UserNeighborhood userNeighborhood = new ThresholdUserNeighborhood(0.75, userSimilarity, dataModel);

the neighborhood will be composed only of users that have a Pearson correlation of 0.75 or above. This way, the code uses a threshold-based definition of the user neighborhood, provided by the ThresholdUserNeighborhood class, rather than the fixed-size neighborhood provided by NearestNUserNeighborhood.
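A threshold-based neighborhood can be sketched in a similar way in plain Java: keep every candidate whose similarity reaches the threshold, however many or few that turns out to be. The class ThresholdSketch, the method atOrAbove, and the sample similarities are hypothetical; the sketch only illustrates the selection rule described above and is not Mahout's implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class ThresholdSketch {

    /**
     * Returns the indices of all candidates whose similarity is at or above
     * the threshold. Candidates with an undefined (NaN) similarity are
     * excluded, because any comparison with NaN is false.
     */
    static List<Integer> atOrAbove(double[] similarities, double threshold) {
        List<Integer> neighborhood = new ArrayList<>();
        for (int i = 0; i < similarities.length; i++) {
            if (similarities[i] >= threshold) {
                neighborhood.add(i);
            }
        }
        return neighborhood;
    }

    public static void main(String[] args) {
        // Hypothetical similarities between one user and four other users
        double[] similarities = {0.2, Double.NaN, 0.9, 0.75};
        System.out.println(atOrAbove(similarities, 0.75)); // two candidates qualify
        System.out.println(atOrAbove(similarities, 0.95)); // nobody qualifies
    }
}
```

Unlike a fixed-size neighborhood, a threshold-based one can be empty for a user with no sufficiently similar peers, in which case there is nothing to base recommendations on.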

The following lines show the last part of the output generated by the execution of this change. User 1001 receives the same recommendations as in the previous execution, in the same order; only the strength of the preference for item 9004 is different. Users 1002 and 1003 no longer receive any recommendations.

User Id: 1001
Recommened Item Id 9010. Strength of the preference: 8.699270
Recommened Item Id 9012. Strength of the preference: 8.659677
Recommened Item Id 9011. Strength of the preference: 8.377571
Recommened Item Id 9004. Strength of the preference: 1.680646
User Id: 1002
No recommendations for this user.
User Id: 1003
No recommendations for this user.
User Id: 1004
No recommendations for this user.
User Id: 1005
No recommendations for this user.
User Id: 1006
No recommendations for this user.

The user similarity determines which users are similar to others. In the previous examples, the PearsonCorrelationSimilarity class provided a similarity metric based on the Pearson correlation. The pure Pearson correlation does not reflect the number of items over which it is computed, so PearsonCorrelationSimilarity offers a weighting option that takes the number of items into account. This way, a correlation based on more information counts as more reliable.
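The following plain-Java sketch computes the textbook Pearson correlation over the items two users have both rated, to illustrate why the unweighted value says nothing about how many items it is based on. It does not use Mahout; the class PearsonSketch, the method pearson, and the sample ratings are hypothetical.

```java
final class PearsonSketch {

    /**
     * Computes the Pearson correlation between two users' preferences for
     * the same items: x[i] and y[i] are the two users' values for item i.
     * Returns NaN when the correlation is undefined (no items, or one of
     * the users gave every item the same value).
     */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Arrays must have the same length");
        }
        int n = x.length;
        if (n == 0) {
            return Double.NaN;
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sumXY = 0.0; // sum of products of deviations from the means
        double sumXX = 0.0; // sum of squared deviations of x
        double sumYY = 0.0; // sum of squared deviations of y
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }
        double denominator = Math.sqrt(sumXX * sumYY);
        return denominator == 0.0 ? Double.NaN : sumXY / denominator;
    }

    public static void main(String[] args) {
        // Two users who agree on the ordering of only two shared items
        System.out.println(pearson(new double[]{1, 2}, new double[]{5, 9}));
        // Two users whose ratings rise together across five shared items
        System.out.println(pearson(new double[]{1, 2, 3, 4, 5}, new double[]{2, 4, 6, 8, 10}));
    }
}
```

Both calls yield a perfect correlation, even though the first rests on only two shared items and the second on five. Any two users with exactly two shared items and differing ratings come out as perfectly correlated or perfectly anti-correlated, which is the kind of result that taking the number of items into account is meant to temper.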

To use weighting, pass an additional argument to the PearsonCorrelationSimilarity constructor:

UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel, org.apache.mahout.cf.taste.common.Weighting.WEIGHTED);

