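- Notes
- Day 54: Recommendation based on M-distance
  - 1. Understanding M-distance
  - 2. Understanding the code
    - 2.1 Variables in the code
    - 2.2 Leave-one-out test
    - 2.3 Computing MAE (mean absolute error)
    - 2.4 Computing RMSE (root mean square error)
- Day 55: Recommendation based on M-distance (continued)
  - 1. User-based and item-based recommendation
  - 2. Approach for the user-based recommendation code
    - 2.1 Abstracting the text content
    - 2.2 Constructor: initializing the MBR object (with the Stream API introduced in JDK 1.8)
    - 2.3 Leave-one-out test
    - 2.4 Results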
Day 54: Recommendation based on M-distance
1. Understanding M-distance
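In the KNN of days 51-53, we predicted an object's class by finding the k most similar neighbors, measured by the distance between the test sample and the training samples, and then letting those k neighbors vote on the class of the test sample. M-distance instead measures the distance between two users (or items) by their average ratings. The distance is the absolute difference of the two averages, and objects whose distance is less than a radius (delta) count as neighbors.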
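Here is a minimal sketch of the neighbor test just described. The class and method names are hypothetical, and the averages are made up. The M-distance between two items reduces to the absolute difference of their average ratings:

```java
// Hypothetical illustration: M-distance compares objects only through their average ratings.
class MDistanceDemo {
    /** M-distance between two items (or users): absolute difference of their average ratings. */
    static double mDistance(double averageA, double averageB) {
        return Math.abs(averageA - averageB);
    }

    /** Two objects are neighbors when their M-distance is below the radius (delta). */
    static boolean isNeighbor(double averageA, double averageB, double radius) {
        return mDistance(averageA, averageB) < radius;
    }

    public static void main(String[] args) {
        // Made-up averages: 3.6 vs 3.8 are about 0.2 apart, 3.6 vs 4.1 about 0.5 apart.
        System.out.println(isNeighbor(3.6, 3.8, 0.3)); // true
        System.out.println(isNeighbor(3.6, 4.1, 0.3)); // false
    }
}
```

Unlike the KNN distance, no feature vector is involved here. Each object is summarized by a single number, so each neighbor test costs one subtraction.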
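2. Understanding the code
2.1 Variables in the code
- Totals (number of users, number of items, number of ratings)
private int numItems;
private int numUsers;
private int numRatings;
- compressedRatingMatrix (the compressed rating matrix; in practice, the data file read into memory as one user-item-rating triple per row)
private int[][] compressedRatingMatrix;
- userDegrees (the number of items each user has rated)
private int[] userDegrees;
- userStartingIndices (the index of each user's first rating in compressedRatingMatrix; for example, user 1 starts at index 272)
private int[] userStartingIndices;
- userAverageRatings (each user's average rating over the items he or she has rated)
private double[] userAverageRatings;
- itemDegrees (the number of times each item has been rated, i.e., how many users rated it)
private int[] itemDegrees;
- itemAverageRatings (each item's average rating)
private double[] itemAverageRatings;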
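The fields above can all be derived from the user-item-rating triples in one pass. The sketch below uses a hypothetical class and toy data. The original constructor detects where a new user begins while it reads the file. This sketch takes a different route and computes userStartingIndices as prefix sums of userDegrees. Both give the same result when the rows are sorted by user id and every user has ratings. The prefix-sum version also leaves an empty range for a user with no ratings.

```java
import java.util.Arrays;

// Hypothetical demo class; toy triples stand in for the MovieLens file.
class CompressedStatsDemo {
    int[] userDegrees;
    int[] itemDegrees;
    int[] userStartingIndices;
    double[] userAverageRatings;
    double[] itemAverageRatings;

    /** Derive MBR-style fields from (user, item, rating) rows sorted by user id. */
    CompressedStatsDemo(int[][] triples, int numUsers, int numItems) {
        userDegrees = new int[numUsers];
        itemDegrees = new int[numItems];
        double[] userTotal = new double[numUsers];
        double[] itemTotal = new double[numItems];
        for (int[] t : triples) {
            userDegrees[t[0]]++;
            itemDegrees[t[1]]++;
            userTotal[t[0]] += t[2];
            itemTotal[t[1]] += t[2];
        }
        // Prefix sums of the degrees: user u's ratings occupy rows
        // [userStartingIndices[u], userStartingIndices[u + 1]) of the sorted triples.
        userStartingIndices = new int[numUsers + 1];
        userAverageRatings = new double[numUsers];
        for (int u = 0; u < numUsers; u++) {
            userStartingIndices[u + 1] = userStartingIndices[u] + userDegrees[u];
            userAverageRatings[u] = userDegrees[u] > 0 ? userTotal[u] / userDegrees[u] : 0;
        }
        itemAverageRatings = new double[numItems];
        for (int i = 0; i < numItems; i++) {
            itemAverageRatings[i] = itemDegrees[i] > 0 ? itemTotal[i] / itemDegrees[i] : 0;
        }
    }

    public static void main(String[] args) {
        int[][] triples = {
            {0, 0, 5}, {0, 1, 3}, {0, 2, 4},
            {1, 0, 4}, {1, 2, 2},
            {2, 1, 1}
        };
        CompressedStatsDemo stats = new CompressedStatsDemo(triples, 3, 3);
        System.out.println(Arrays.toString(stats.userStartingIndices)); // [0, 3, 5, 6]
        System.out.println(Arrays.toString(stats.itemAverageRatings));  // [4.5, 2.0, 3.0]
    }
}
```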
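2.2 Leave-one-out test
- First, remove user 0's rating of item 0 and recompute the average rating of item 0 without it.
- Next, find the neighbors. This is item-based prediction: among the other items the user has rated, an item counts as a neighbor if its average rating differs from the current item's recomputed average by less than the radius. The user's rating of each neighbor is added to a running total. For example, user 0 has rated 272 movies. Excluding item 0, we look for neighbors among the remaining 271 movies to predict the rating of item 0.
- If neighbors are found, the prediction is the average of their ratings. Otherwise, the default rating (3.0) is used. For example, user 0 has 94 neighbors with a total score of 387, so the predicted rating of user 0 for item 0 is 387 / 94 = 4.117021276595745.
- Full code: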
public void leaveOneOutPrediction() {
    double tempItemAverageRating;
    // Make each line of the code shorter.
    int tempUser, tempItem, tempRating;
    System.out.println("\r\nLeaveOneOutPrediction for radius " + radius);
    numNonNeighbors = 0;
    for (int i = 0; i < numRatings; i++) {
        tempUser = compressedRatingMatrix[i][0];
        tempItem = compressedRatingMatrix[i][1];
        tempRating = compressedRatingMatrix[i][2];
        // Step 1. Recompute the average rating of the current item without this rating.
        // (If the item has only this one rating, the result is NaN and no neighbor can match.)
        tempItemAverageRating = (itemAverageRatings[tempItem] * itemDegrees[tempItem] - tempRating)
                / (itemDegrees[tempItem] - 1);
        // Step 2. Recompute neighbors, at the same time obtain the ratings
        // of neighbors.
        int tempNeighbors = 0;
        double tempTotal = 0;
        int tempComparedItem;
        for (int j = userStartingIndices[tempUser]; j < userStartingIndices[tempUser + 1]; j++) {
            tempComparedItem = compressedRatingMatrix[j][1];
            if (tempItem == tempComparedItem) {
                continue; // Ignore itself.
            } // Of if
            if (Math.abs(tempItemAverageRating - itemAverageRatings[tempComparedItem]) < radius) {
                tempTotal += compressedRatingMatrix[j][2];
                tempNeighbors++;
            } // Of if
        } // Of for j
        // Step 3. Predict as the average value of neighbors.
        if (tempNeighbors > 0) {
            predictions[i] = tempTotal / tempNeighbors;
        } else {
            predictions[i] = DEFAULT_RATING;
            numNonNeighbors++;
        } // Of if
    } // Of for i
} // Of leaveOneOutPrediction
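The arithmetic of steps 1 and 3 can be checked in isolation. In this sketch the class and helper names are hypothetical. averageWithout removes one rating from an item's average, as Step 1 does. predict averages the neighbors' ratings or falls back to the default, as Step 3 does. The second call uses the 94 neighbors and the total of 387 from the example above.

```java
// Hypothetical helpers mirroring Steps 1 and 3 of leaveOneOutPrediction.
class LeaveOneOutArithmetic {
    /** Step 1: an item's average after removing one of its `degree` ratings. */
    static double averageWithout(double average, int degree, int removedRating) {
        return (average * degree - removedRating) / (degree - 1);
    }

    /** Step 3: mean of the neighbors' ratings, or the default rating if there is no neighbor. */
    static double predict(double neighborTotal, int numNeighbors, double defaultRating) {
        return numNeighbors > 0 ? neighborTotal / numNeighbors : defaultRating;
    }

    public static void main(String[] args) {
        // An item rated 5, 4 and 3 has average 4.0; removing the 5 leaves (4 + 3) / 2.
        System.out.println(averageWithout(4.0, 3, 5)); // 3.5
        // The example from the text: 94 neighbors whose ratings sum to 387.
        System.out.println(predict(387, 94, 3.0));     // about 4.117
    }
}
```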
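2.3 Computing MAE (mean absolute error)
MAE measures the average absolute deviation between the predicted and the actual ratings. The smaller the MAE, the closer the predictions are to the actual values, and the more accurate the prediction model.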
public double computeMAE() throws Exception {
    double tempTotalError = 0;
    for (int i = 0; i < predictions.length; i++) {
        tempTotalError += Math.abs(predictions[i] - compressedRatingMatrix[i][2]);
    } // Of for i
    return tempTotalError / predictions.length;
} // Of computeMAE
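As a formula, write N = predictions.length, p_i = predictions[i], and r_i = compressedRatingMatrix[i][2] (the actual rating). computeMAE then evaluates:

```latex
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| p_i - r_i \right|
```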
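2.4 Computing RMSE (root mean square error)
RMSE is the square root of the mean squared deviation between the predicted and the actual ratings. The smaller the RMSE, the smaller the squared errors, and the more accurate the prediction model.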
public double computeRSME() throws Exception {
    double tempTotalError = 0;
    for (int i = 0; i < predictions.length; i++) {
        tempTotalError += (predictions[i] - compressedRatingMatrix[i][2])
                * (predictions[i] - compressedRatingMatrix[i][2]);
    } // Of for i
    double tempAverage = tempTotalError / predictions.length;
    return Math.sqrt(tempAverage);
} // Of computeRSME
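RMSE squares the errors before averaging, so it is never smaller than MAE and it grows when a few predictions are far off. The toy comparison below uses a hypothetical class and made-up ratings. Both prediction sets have the same total absolute error (2.0). When that error is spread evenly, MAE and RMSE are equal. When it is concentrated in a single large error, RMSE rises.

```java
// Hypothetical demo comparing MAE and RMSE on made-up ratings.
class ErrorMetricsDemo {
    static double mae(double[] predicted, int[] actual) {
        double total = 0;
        for (int i = 0; i < predicted.length; i++) {
            total += Math.abs(predicted[i] - actual[i]);
        }
        return total / predicted.length;
    }

    static double rmse(double[] predicted, int[] actual) {
        double total = 0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            total += diff * diff;
        }
        return Math.sqrt(total / predicted.length);
    }

    public static void main(String[] args) {
        int[] actual = {4, 3, 5, 1};
        double[] evenErrors = {4.5, 3.5, 4.5, 1.5};  // four errors of 0.5
        double[] oneBigError = {4, 3, 5, 3};          // one error of 2
        System.out.println(mae(evenErrors, actual) + " " + rmse(evenErrors, actual));   // 0.5 0.5
        System.out.println(mae(oneBigError, actual) + " " + rmse(oneBigError, actual)); // 0.5 1.0
    }
}
```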
Day 55: Recommendation based on M-distance (continued)
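1. User-based and item-based recommendation
- Day 54: item-based recommendation. The neighbors of the target item are the other items the user has rated whose average ratings lie within the radius of the target item's average.
- User-based recommendation. The neighbors of the target user are the other users who have rated the item and whose average ratings lie within the radius of the target user's average.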
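Both variants apply the same neighbor-averaging rule. They differ only in which averages are compared and whose ratings are collected. The sketch below uses hypothetical names. Each Candidate holds an average rating and the rating it would contribute. In the item-based variant, the candidates are the other items the user rated: each pairs the item's average with the user's rating of that item. In the user-based variant, the candidates are the other users who rated the item: each pairs that user's average with his or her rating of the item.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the neighbor-averaging rule shared by both variants.
class MDistancePredictor {
    /** A candidate neighbor: its average rating and the rating it contributes. */
    static final class Candidate {
        final double average;
        final int rating;

        Candidate(double average, int rating) {
            this.average = average;
            this.rating = rating;
        }
    }

    /** Mean rating of candidates within `radius` of targetAverage, else defaultRating. */
    static double predict(double targetAverage, List<Candidate> candidates, double radius, double defaultRating) {
        double total = 0;
        int neighbors = 0;
        for (Candidate c : candidates) {
            if (Math.abs(targetAverage - c.average) < radius) {
                total += c.rating;
                neighbors++;
            }
        }
        return neighbors > 0 ? total / neighbors : defaultRating;
    }

    public static void main(String[] args) {
        // Item-based reading: (item average, the user's rating of that item).
        List<Candidate> candidates = Arrays.asList(
                new Candidate(3.9, 4), new Candidate(4.1, 5), new Candidate(2.5, 2));
        System.out.println(predict(4.0, candidates, 0.3, 3.0)); // 4.5
    }
}
```

A user-based call would pass the user's leave-one-out average as targetAverage. Its candidates would be (userAverageRatings[v], v's rating of the item) for each other user v who rated the item.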
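2. Approach for the user-based recommendation code
2.1 Abstracting the text content (each line of the file becomes one Text object)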
/** One line of the data file: user number, item (movie) number and score. */
class Text {
    private Integer userNum;
    private Integer itemNum;
    private Integer score;
    public Integer getUserNum() {
        return userNum;
    }
    public void setUserNum(Integer userNum) {
        this.userNum = userNum;
    }
    public Integer getItemNum() {
        return itemNum;
    }
    public void setItemNum(Integer itemNum) {
        this.itemNum = itemNum;
    }
    public Integer getScore() {
        return score;
    }
    public void setScore(Integer score) {
        this.score = score;
    }
    public Text(Integer userNum, Integer itemNum, Integer score) {
        this.userNum = userNum;
        this.itemNum = itemNum;
        this.score = score;
    }
}
2.2 Constructor: initializing the MBR object (with the Stream API introduced in JDK 1.8)
// Group the ratings by movie (item) number
textGroupByItem = textList.stream().collect(Collectors.groupingBy(Text::getItemNum));
// Group the ratings by user number
textGroupByUser = textList.stream().collect(Collectors.groupingBy(Text::getUserNum));
// Sum the scores in one user's list
tempUserTotalScore[i] = textsByUser.stream().mapToDouble(Text::getScore).sum();
public MBR(String paraFileName, int paraNumUsers, int paraNumItems, int paraNumRatings, boolean basedUser) throws Exception {
    if (basedUser) {
        // Step 1. Initialize these arrays.
        numItems = paraNumItems;
        numUsers = paraNumUsers;
        numRatings = paraNumRatings;
        userDegrees = new int[numUsers];
        userAverageRatings = new double[numUsers];
        itemDegrees = new int[numItems];
        itemAverageRatings = new double[numItems];
        predictions = new double[numRatings];
        System.out.println("Reading " + paraFileName);
        // Step 2. Read the data file: one "user,item,score" triple per line.
        File tempFile = new File(paraFileName);
        if (!tempFile.exists()) {
            System.out.println("File " + paraFileName + " does not exist.");
            System.exit(0);
        } // Of if
        BufferedReader tempBufReader = new BufferedReader(new FileReader(tempFile));
        String tempString;
        String[] tempStrArray;
        while ((tempString = tempBufReader.readLine()) != null) {
            // Each line has three values.
            tempStrArray = tempString.split(",");
            Text text = new Text(Integer.parseInt(tempStrArray[0]), Integer.parseInt(tempStrArray[1]), Integer.parseInt(tempStrArray[2]));
            textList.add(text);
        } // Of while
        tempBufReader.close();
        // Step 3. Group the ratings by item and by user.
        textGroupByItem = textList.stream().collect(Collectors.groupingBy(Text::getItemNum));
        textGroupByUser = textList.stream().collect(Collectors.groupingBy(Text::getUserNum));
        // Step 4. Compute the degrees and average ratings.
        double[] tempUserTotalScore = new double[numUsers];
        double[] tempItemTotalScore = new double[numItems];
        for (int i = 0; i < numUsers; i++) {
            // Total score of the user.
            List<Text> textsByUser = textGroupByUser.get(i);
            tempUserTotalScore[i] = textsByUser.stream().mapToDouble(Text::getScore).sum();
            userDegrees[i] = textsByUser.size();
            userAverageRatings[i] = tempUserTotalScore[i] / userDegrees[i];
        } // Of for i
        for (int i = 0; i < numItems; i++) {
            // Total score of the movie. An item nobody rated has no group (get returns null),
            // so skip it and leave its degree and average at 0.
            List<Text> textsByItem = textGroupByItem.get(i);
            if (textsByItem == null) {
                continue;
            } // Of if
            tempItemTotalScore[i] = textsByItem.stream().mapToDouble(Text::getScore).sum();
            itemDegrees[i] = textsByItem.size();
            itemAverageRatings[i] = tempItemTotalScore[i] / itemDegrees[i];
        } // Of for i
    } // Of if
} // Of the second constructor
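The following standalone sketch exercises the Stream calls used in the constructor above. Here int[] rows {user, item, score} stand in for Text, and the class and method names are hypothetical. Collectors.groupingBy creates keys only for values that actually occur, so get(i) returns null for an item nobody rated. Map.getOrDefault (available since Java 8) supplies an empty list instead, and OptionalDouble.orElse turns the empty average into 0.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo: int[] {user, item, score} rows stand in for the Text class.
class GroupingDemo {
    /** Total score of one user, grouping rows by user number as textGroupByUser does. */
    static double userTotal(List<int[]> rows, int user) {
        Map<Integer, List<int[]>> byUser = rows.stream()
                .collect(Collectors.groupingBy((int[] r) -> r[0]));
        return byUser.getOrDefault(user, Collections.emptyList()).stream()
                .mapToDouble(r -> r[2]).sum();
    }

    /** Average score of items 0..numItems-1; an item nobody rated gets 0. */
    static double[] itemAverages(List<int[]> rows, int numItems) {
        Map<Integer, List<int[]>> byItem = rows.stream()
                .collect(Collectors.groupingBy((int[] r) -> r[1]));
        double[] averages = new double[numItems];
        for (int i = 0; i < numItems; i++) {
            // groupingBy made no key for an unrated item, so a plain get(i) would be null.
            List<int[]> ratings = byItem.getOrDefault(i, Collections.emptyList());
            averages[i] = ratings.stream().mapToDouble(r -> r[2]).average().orElse(0);
        }
        return averages;
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(
                new int[] {0, 0, 5}, new int[] {0, 2, 3}, new int[] {1, 0, 4});
        System.out.println(userTotal(rows, 0));                     // 8.0
        System.out.println(Arrays.toString(itemAverages(rows, 3))); // [4.5, 0.0, 3.0]
    }
}
```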
2.3 Leave-one-out test
// Filter the list: drop the rating of the item being left out
textsByUser = textsByUser.stream().filter(e -> !e.getItemNum().equals(outItem)).collect(Collectors.toList());
public void leaveOneOutPredictionByUser() {
    double tempUserAverageRating;
    // Make each line of the code shorter.
    int tempUser, tempItem;
    System.out.println("\r\nLeaveOneOutPredictionUser for radius " + radius);
    numNonNeighbors = 0;
    for (int i = 0; i < numRatings; i++) {
        Text text = textList.get(i);
        tempUser = text.getUserNum();
        tempItem = text.getItemNum();
        // Step 1. Recompute the average rating of the current user without this rating.
        List<Text> textsByUser = textGroupByUser.get(tempUser);
        // The lambda needs an effectively final variable, so copy tempItem first.
        Integer outItem = tempItem;
        textsByUser = textsByUser.stream().filter(e -> !e.getItemNum().equals(outItem)).collect(Collectors.toList());
        tempUserAverageRating = textsByUser.stream().mapToDouble(Text::getScore).sum() / textsByUser.size();
        // Step 2. Recompute neighbors: the other users who rated this item and whose
        // average rating is within the radius; accumulate their ratings of the item.
        int tempNeighbors = 0;
        double tempTotal = 0;
        List<Text> texts = textGroupByItem.get(tempItem);
        for (int j = 0; j < texts.size(); j++) {
            Text userText = texts.get(j);
            if (userText.getUserNum() == tempUser) {
                continue; // Ignore the user himself (compare user ids, not list positions).
            } // Of if
            if (Math.abs(tempUserAverageRating - userAverageRatings[userText.getUserNum()]) < radius) {
                tempTotal += userText.getScore();
                tempNeighbors++;
            } // Of if
        } // Of for j
        // Step 3. Predict as the average value of neighbors.
        if (tempNeighbors > 0) {
            predictions[i] = tempTotal / tempNeighbors;
        } else {
            predictions[i] = DEFAULT_RATING;
            numNonNeighbors++;
        } // Of if
    } // Of for i
} // Of leaveOneOutPredictionByUser
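Step 1 of the user-based version recomputes the user's average by filtering out the left-out item and averaging what remains. The sketch below uses hypothetical names and toy data. Suppose a user rated items 0, 1 and 2 with scores 5, 3 and 4 (average 4.0). Leaving out item 0 gives (3 + 4) / 2 = 3.5, the same value the day-54 formula gives: (4.0 × 3 − 5) / (3 − 1). The filter makes one pass over the user's list for every rating. The formula, by contrast, takes constant time once the average and degree are stored.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo: int[] {user, item, score} rows stand in for the Text class.
class LeaveOneOutAverageDemo {
    /** Average score of one user's rows after dropping the rating of outItem. */
    static double averageWithout(List<int[]> userRows, int outItem) {
        List<int[]> kept = userRows.stream()
                .filter(r -> r[1] != outItem)
                .collect(Collectors.toList());
        // If the user rated only outItem, kept is empty and the result is NaN (0.0 / 0).
        return kept.stream().mapToDouble(r -> r[2]).sum() / kept.size();
    }

    public static void main(String[] args) {
        List<int[]> user0 = Arrays.asList(new int[] {0, 0, 5}, new int[] {0, 1, 3}, new int[] {0, 2, 4});
        System.out.println(averageWithout(user0, 0)); // 3.5
    }
}
```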
2.4 Results
- Day 54-55 code
package machinelearing.knn;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public class MBR {
    /** Default rating for 1-5 points. */
    public static final double DEFAULT_RATING = 3.0;
    /** The total number of users (users who took part in rating). */
    private int numUsers;
    /** The total number of items (items that can be rated). */
    private int numItems;
    /** The total number of ratings (non-zero values). */
    private int numRatings;
    /** The predictions, one per rating. */
    private double[] predictions;
    /** Compressed rating matrix: user-item-rating triples, one row per rating. */
    private int[][] compressedRatingMatrix;
    /** The degree of users (how many items each user has rated). */
    private int[] userDegrees;
    /** The average rating of each user. */
    private double[] userAverageRatings;
    /** The degree of items (how many times each item has been rated). */
    private int[] itemDegrees;
    /** The average rating of each item. */
    private double[] itemAverageRatings;
    /**
     * The starting index of each user's ratings in compressedRatingMatrix. The first user starts
     * from 0; if the first user has x ratings, the second user starts from x, and so on.
     */
    private int[] userStartingIndices;
    /** Number of predictions for which no neighbor was found within the radius (the default rating is used). */
    private int numNonNeighbors;
    /** The radius (delta) for determining the neighborhood: objects within this radius are neighbors. */
    private double radius;
    /** All ratings as Text records (used by the user-based version). */
    List<Text> textList = new ArrayList<>();
    /** Ratings grouped by item number. */
    private Map<Integer, List<Text>> textGroupByItem = new HashMap<>();
    /** Ratings grouped by user number. */
    private Map<Integer, List<Text>> textGroupByUser = new HashMap<>();
    /** One line of the data file: user number, item (movie) number and score. */
    class Text {
        private Integer userNum;
        private Integer itemNum;
        private Integer score;
        public Integer getUserNum() {
            return userNum;
        }
        public void setUserNum(Integer userNum) {
            this.userNum = userNum;
        }
        public Integer getItemNum() {
            return itemNum;
        }
        public void setItemNum(Integer itemNum) {
            this.itemNum = itemNum;
        }
        public Integer getScore() {
            return score;
        }
        public void setScore(Integer score) {
            this.score = score;
        }
        public Text(Integer userNum, Integer itemNum, Integer score) {
            this.userNum = userNum;
            this.itemNum = itemNum;
            this.score = score;
        }
    } // Of class Text
    /** The first constructor (item-based): read the ratings into compressedRatingMatrix. */
    public MBR(String paraFileName, int paraNumUsers, int paraNumItems, int paraNumRatings) throws Exception {
        // Step 1. Initialize these arrays.
        numItems = paraNumItems;
        numUsers = paraNumUsers;
        numRatings = paraNumRatings;
        userDegrees = new int[numUsers];
        userStartingIndices = new int[numUsers + 1];
        userAverageRatings = new double[numUsers];
        itemDegrees = new int[numItems];
        compressedRatingMatrix = new int[numRatings][3];
        itemAverageRatings = new double[numItems];
        predictions = new double[numRatings];
        System.out.println("Reading " + paraFileName);
        // Step 2. Read the data file (ratings are assumed to be grouped by user, as in the file).
        File tempFile = new File(paraFileName);
        if (!tempFile.exists()) {
            System.out.println("File " + paraFileName + " does not exist.");
            System.exit(0);
        } // Of if
        BufferedReader tempBufReader = new BufferedReader(new FileReader(tempFile));
        String tempString;
        String[] tempStrArray;
        int tempIndex = 0;
        userStartingIndices[0] = 0;
        userStartingIndices[numUsers] = numRatings;
        while ((tempString = tempBufReader.readLine()) != null) {
            // Each line has three values.
            tempStrArray = tempString.split(",");
            compressedRatingMatrix[tempIndex][0] = Integer.parseInt(tempStrArray[0]);
            compressedRatingMatrix[tempIndex][1] = Integer.parseInt(tempStrArray[1]);
            compressedRatingMatrix[tempIndex][2] = Integer.parseInt(tempStrArray[2]);
            userDegrees[compressedRatingMatrix[tempIndex][0]]++;
            itemDegrees[compressedRatingMatrix[tempIndex][1]]++;
            if (tempIndex > 0) {
                // Starting to read the data of a new user.
                if (compressedRatingMatrix[tempIndex][0] != compressedRatingMatrix[tempIndex - 1][0]) {
                    userStartingIndices[compressedRatingMatrix[tempIndex][0]] = tempIndex;
                } // Of if
            } // Of if
            tempIndex++;
        } // Of while
        tempBufReader.close();
        // Step 3. Compute the average rating of each user and each item.
        double[] tempUserTotalScore = new double[numUsers];
        double[] tempItemTotalScore = new double[numItems];
        for (int i = 0; i < numRatings; i++) {
            tempUserTotalScore[compressedRatingMatrix[i][0]] += compressedRatingMatrix[i][2];
            tempItemTotalScore[compressedRatingMatrix[i][1]] += compressedRatingMatrix[i][2];
        } // Of for i
        for (int i = 0; i < numUsers; i++) {
            userAverageRatings[i] = tempUserTotalScore[i] / userDegrees[i];
        } // Of for i
        for (int i = 0; i < numItems; i++) {
            itemAverageRatings[i] = tempItemTotalScore[i] / itemDegrees[i];
        } // Of for i
    } // Of the first constructor
    /** The second constructor (user-based): read the ratings into textList and group them. */
    public MBR(String paraFileName, int paraNumUsers, int paraNumItems, int paraNumRatings, boolean basedUser) throws Exception {
        if (basedUser) {
            // Step 1. Initialize these arrays.
            numItems = paraNumItems;
            numUsers = paraNumUsers;
            numRatings = paraNumRatings;
            userDegrees = new int[numUsers];
            userAverageRatings = new double[numUsers];
            itemDegrees = new int[numItems];
            itemAverageRatings = new double[numItems];
            predictions = new double[numRatings];
            System.out.println("Reading " + paraFileName);
            // Step 2. Read the data file: one "user,item,score" triple per line.
            File tempFile = new File(paraFileName);
            if (!tempFile.exists()) {
                System.out.println("File " + paraFileName + " does not exist.");
                System.exit(0);
            } // Of if
            BufferedReader tempBufReader = new BufferedReader(new FileReader(tempFile));
            String tempString;
            String[] tempStrArray;
            while ((tempString = tempBufReader.readLine()) != null) {
                // Each line has three values.
                tempStrArray = tempString.split(",");
                Text text = new Text(Integer.parseInt(tempStrArray[0]), Integer.parseInt(tempStrArray[1]), Integer.parseInt(tempStrArray[2]));
                textList.add(text);
            } // Of while
            tempBufReader.close();
            // Step 3. Group the ratings by item and by user.
            textGroupByItem = textList.stream().collect(Collectors.groupingBy(Text::getItemNum));
            textGroupByUser = textList.stream().collect(Collectors.groupingBy(Text::getUserNum));
            // Step 4. Compute the degrees and average ratings.
            double[] tempUserTotalScore = new double[numUsers];
            double[] tempItemTotalScore = new double[numItems];
            for (int i = 0; i < numUsers; i++) {
                // Total score of the user.
                List<Text> textsByUser = textGroupByUser.get(i);
                tempUserTotalScore[i] = textsByUser.stream().mapToDouble(Text::getScore).sum();
                userDegrees[i] = textsByUser.size();
                userAverageRatings[i] = tempUserTotalScore[i] / userDegrees[i];
            } // Of for i
            for (int i = 0; i < numItems; i++) {
                // Total score of the movie. An item nobody rated has no group (get returns null),
                // so skip it and leave its degree and average at 0.
                List<Text> textsByItem = textGroupByItem.get(i);
                if (textsByItem == null) {
                    continue;
                } // Of if
                tempItemTotalScore[i] = textsByItem.stream().mapToDouble(Text::getScore).sum();
                itemDegrees[i] = textsByItem.size();
                itemAverageRatings[i] = tempItemTotalScore[i] / itemDegrees[i];
            } // Of for i
        } // Of if
    } // Of the second constructor
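    /** Set the radius of the neighborhood; non-positive values fall back to 0.1. */
    public void setRadius(double paraRadius) {
        if (paraRadius > 0) {
            radius = paraRadius;
        } else {
            radius = 0.1;
        } // Of if
    } // Of setRadius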
    /** Item-based leave-one-out prediction for every rating. */
    public void leaveOneOutPrediction() {
        double tempItemAverageRating;
        // Make each line of the code shorter.
        int tempUser, tempItem, tempRating;
        // System.out.println("\r\nLeaveOneOutPrediction for radius " + radius);
        numNonNeighbors = 0;
        for (int i = 0; i < numRatings; i++) {
            tempUser = compressedRatingMatrix[i][0];
            tempItem = compressedRatingMatrix[i][1];
            tempRating = compressedRatingMatrix[i][2];
            // Step 1. Recompute the average rating of the current item without this rating.
            tempItemAverageRating = (itemAverageRatings[tempItem] * itemDegrees[tempItem] - tempRating)
                    / (itemDegrees[tempItem] - 1);
            // Step 2. Recompute neighbors, at the same time obtain the ratings
            // of neighbors.
            int tempNeighbors = 0;
            double tempTotal = 0;
            int tempComparedItem;
            for (int j = userStartingIndices[tempUser]; j < userStartingIndices[tempUser + 1]; j++) {
                tempComparedItem = compressedRatingMatrix[j][1];
                if (tempItem == tempComparedItem) {
                    continue; // Ignore itself.
                } // Of if
                if (Math.abs(tempItemAverageRating - itemAverageRatings[tempComparedItem]) < radius) {
                    tempTotal += compressedRatingMatrix[j][2];
                    tempNeighbors++;
                } // Of if
            } // Of for j
            // Step 3. Predict as the average value of neighbors.
            if (tempNeighbors > 0) {
                predictions[i] = tempTotal / tempNeighbors;
            } else {
                predictions[i] = DEFAULT_RATING;
                numNonNeighbors++;
            } // Of if
        } // Of for i
    } // Of leaveOneOutPrediction
    /** User-based leave-one-out prediction for every rating. */
    public void leaveOneOutPredictionByUser() {
        double tempUserAverageRating;
        // Make each line of the code shorter.
        int tempUser, tempItem;
        // System.out.println("\r\nLeaveOneOutPredictionUser for radius " + radius);
        numNonNeighbors = 0;
        for (int i = 0; i < numRatings; i++) {
            Text text = textList.get(i);
            tempUser = text.getUserNum();
            tempItem = text.getItemNum();
            // Step 1. Recompute the average rating of the current user without this rating.
            List<Text> textsByUser = textGroupByUser.get(tempUser);
            // The lambda needs an effectively final variable, so copy tempItem first.
            Integer outItem = tempItem;
            textsByUser = textsByUser.stream().filter(e -> !e.getItemNum().equals(outItem)).collect(Collectors.toList());
            tempUserAverageRating = textsByUser.stream().mapToDouble(Text::getScore).sum() / textsByUser.size();
            // Step 2. Recompute neighbors: the other users who rated this item and whose
            // average rating is within the radius; accumulate their ratings of the item.
            int tempNeighbors = 0;
            double tempTotal = 0;
            List<Text> texts = textGroupByItem.get(tempItem);
            for (int j = 0; j < texts.size(); j++) {
                Text userText = texts.get(j);
                if (userText.getUserNum() == tempUser) {
                    continue; // Ignore the user himself (compare user ids, not list positions).
                } // Of if
                if (Math.abs(tempUserAverageRating - userAverageRatings[userText.getUserNum()]) < radius) {
                    tempTotal += userText.getScore();
                    tempNeighbors++;
                } // Of if
            } // Of for j
            // Step 3. Predict as the average value of neighbors.
            if (tempNeighbors > 0) {
                predictions[i] = tempTotal / tempNeighbors;
            } else {
                predictions[i] = DEFAULT_RATING;
                numNonNeighbors++;
            } // Of if
        } // Of for i
    } // Of leaveOneOutPredictionByUser
    /** Compute the MAE of the item-based predictions. */
    public double computeMAE() throws Exception {
        double tempTotalError = 0;
        for (int i = 0; i < predictions.length; i++) {
            tempTotalError += Math.abs(predictions[i] - compressedRatingMatrix[i][2]);
        } // Of for i
        return tempTotalError / predictions.length;
    } // Of computeMAE
    /** Compute the MAE of the user-based predictions (actual ratings come from textList). */
    public double computeMAE_User() throws Exception {
        double tempTotalError = 0;
        for (int i = 0; i < predictions.length; i++) {
            tempTotalError += Math.abs(predictions[i] - textList.get(i).getScore());
        } // Of for i
        return tempTotalError / predictions.length;
    } // Of computeMAE_User
    /** Compute the RMSE of the item-based predictions. */
    public double computeRSME() throws Exception {
        double tempTotalError = 0;
        for (int i = 0; i < predictions.length; i++) {
            tempTotalError += (predictions[i] - compressedRatingMatrix[i][2])
                    * (predictions[i] - compressedRatingMatrix[i][2]);
        } // Of for i
        double tempAverage = tempTotalError / predictions.length;
        return Math.sqrt(tempAverage);
    } // Of computeRSME
    /** Compute the RMSE of the user-based predictions (actual ratings come from textList). */
    public double computeRSME_User() throws Exception {
        double tempTotalError = 0;
        for (int i = 0; i < predictions.length; i++) {
            tempTotalError += (predictions[i] - textList.get(i).getScore())
                    * (predictions[i] - textList.get(i).getScore());
        } // Of for i
        double tempAverage = tempTotalError / predictions.length;
        return Math.sqrt(tempAverage);
    } // Of computeRSME_User
    public static void main(String[] args) {
        try {
            MBR tempRecommender = new MBR("C:/Users/Desktop/sampledata/movielens-943u1682m.txt", 943, 1682, 100000);
            MBR tempRecommender1 = new MBR("C:/Users/Desktop/sampledata/movielens-943u1682m.txt", 943, 1682, 100000, true);
            for (double tempRadius = 0.2; tempRadius < 0.6; tempRadius += 0.1) {
                tempRecommender.setRadius(tempRadius);
                tempRecommender1.setRadius(tempRadius);
                // Run the item-based and the user-based leave-one-out predictions.
                tempRecommender.leaveOneOutPrediction();
                tempRecommender1.leaveOneOutPredictionByUser();
                double tempMAE = tempRecommender.computeMAE();
                double tempRSME = tempRecommender.computeRSME();
                double tempMAE1 = tempRecommender1.computeMAE_User();
                double tempRSME1 = tempRecommender1.computeRSME_User();
                System.out.println("Radius_item = " + tempRadius + ", MAE_item = " + tempMAE + ", RSME_item = " + tempRSME
                        + ", numNonNeighbors_item = " + tempRecommender.numNonNeighbors);
                System.out.println("Radius_user = " + tempRadius + ", MAE_user = " + tempMAE1 + ", RSME_user = " + tempRSME1
                        + ", numNonNeighbors_user = " + tempRecommender1.numNonNeighbors);
            } // Of for tempRadius
        } catch (Exception ee) {
            System.out.println(ee);
        } // Of try
    } // Of main
} // Of class MBR
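In main, the radius is advanced with tempRadius += 0.1. Because 0.1 has no exact binary representation, the printed radius can carry rounding noise (0.2 + 0.1 evaluates to 0.30000000000000004). A minimal sketch of deriving each radius from an integer counter instead (the class name is hypothetical):

```java
import java.util.Arrays;

// Hypothetical sketch: derive each radius from an integer step instead of accumulating 0.1.
class RadiusLoopDemo {
    /** Radii 0.2, 0.3, 0.4 and 0.5, the values the loop in main is meant to visit. */
    static double[] radii() {
        double[] result = new double[4];
        for (int k = 2; k < 6; k++) {
            result[k - 2] = k / 10.0; // one rounding per value, no accumulated error
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(0.2 + 0.1);                // 0.30000000000000004
        System.out.println(Arrays.toString(radii())); // [0.2, 0.3, 0.4, 0.5]
    }
}
```

Inside main, the loop header would become for (int k = 2; k < 6; k++), with tempRadius = k / 10.0 as its first statement.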