《斯坦福数据挖掘教程·第三版》读书笔记(英文版) Chapter 9 Recommendation Systems

news2024/9/30 7:23:07

来源:《斯坦福数据挖掘教程·第三版》对应的公开英文书和PPT

Chapter 9 Recommendation SystemsRecommendation systems use a number of different technologies. We can classify these systems into two broad groups.

  • Content-based systems examine properties of the items recommended. For instance, if a Netflix user has watched many cowboy movies, then recommend a movie classified in the database as having the “cowboy” genre.
  • Collaborative filtering systems recommend items based on similarity measures between users and/or items. The items recommended to a user are those preferred by similar users. This sort of recommendation system can use the groundwork laid in Chapter 3 on similarity search and Chapter 7 on clustering. However, these technologies by themselves are not sufficient, and there are some new algorithms that have proven effective for recommendation systems.

In a recommendation-system application there are two classes of entities, which we shall refer to as users and items. Users have preferences for certain items, and these preferences must be teased out of the data. The data itself is represented as a utility matrix, giving for each user-item pair, a value that represents what is known about the degree of preference of that user for that item. Values come from an ordered set, e.g., integers 1–5 representing the number of stars
that the user gave as a rating for that item. We assume that the matrix is sparse, meaning that most entries are “unknown.” An unknown rating implies that we have no explicit information about the user’s preference for the item.

The distinction between the physical and on-line worlds has been called the long tail phenomenon, and it is suggested in Fig. 9.2. The vertical axis represents popularity (the number of times an item is chosen). The items are ordered on the horizontal axis according to their popularity. Physical institutions provide only the most popular items to the left of the vertical line, while the corresponding on-line institutions provide the entire range of items: the tail as well as the popular items.
在这里插入图片描述

Without a utility matrix, it is almost impossible to recommend items. However, acquiring data from which to build a utility matrix is often difficult. There are two general approaches to discovering the value users place on items.

  1. We can ask users to rate items. Movie ratings are generally obtained this way, and some on-line stores try to obtain ratings from their purchasers. Sites providing content, such as some news sites or YouTube also ask users to rate items. This approach is limited in its effectiveness, since generally users are unwilling to provide responses, and the information from those who do may be biased by the very fact that it comes from people willing to provide ratings.
  2. We can make inferences from users’ behavior. Most obviously, if a user buys a product at Amazon, watches a movie on YouTube, or reads a news article, then the user can be said to “like” this item. Note that this sort of rating system really has only one value: 1 means that the user likes the item. Often, we find a utility matrix with this kind of data shown with 0’s rather than blanks where the user has not purchased or viewed the item. However, in this case 0 is not a lower rating than 1; it is no rating at all. More generally, one can infer interest from behavior other than purchasing. For example, if an Amazon customer views information about an item, we can infer that they are interested in the item, even if they don’t buy it.

Consider movies as a case in point. Most users respond to a small number of features; they like certain genres, they may have certain famous actors or actresses that they like, and perhaps there are a few directors with a significant following. If we start with the utility matrix M, with n rows and m columns (i.e., there are n users and m items), then we might be able to find a matrix
U with n rows and d columns and a matrix V with d rows and m columns, such that UV closely approximates M in those entries where M is nonblank. If so, then we have established that there are d dimensions that allow us to characterize both users and items closely. We can then use the entry in the product UV to estimate the corresponding blank entry in utility matrix M. This process is called UV-decomposition of M.

在这里插入图片描述
While we can pick among several measures of how close the product UV is to M, the typical choice is the root-mean-square error (RMSE), where we

  1. Sum, over all nonblank entries in M the square of the difference between that entry and the corresponding entry in the product UV .
  2. Take the mean (average) of these squares by dividing by the number of terms in the sum (i.e., the number of nonblank entries in M).
  3. Take the square root of the mean.

Building a Complete UV-Decomposition Algorithm

Preprocessing

Because the differences in the quality of items and the rating scales of users are such important factors in determining the missing elements of the matrix M, it is often useful to remove these influences before doing anything else.

We can subtract from each nonblank element m i j m_{ij} mij the average rating of user i. Then, the resulting matrix can be modified by subtracting the average rating (in the modified matrix) of item j. It is also possible to first subtract the average rating of item j and then subtract the
average rating of user i in the modified matrix. The results one obtains from doing things in these two different orders need not be the same, but will tend to be close. A third option is to normalize by subtracting from m i j m_{ij} mij the average of the average rating of user i and item j, that is, subtracting one half the sum of the user average and the item average. If we choose to normalize M, then when we make predictions, we need to undo the normalization. That is, if whatever prediction method we use results in estimate e for an element m i j m_{ij} mij of the normalized matrix, then the value we predict for m i j m_{ij} mij in the true utility matrix is e plus whatever amount was subtracted from row i and from column j during the normalization process.

Initialization

As we mentioned, it is essential that there be some randomness in the way we seek an optimum solution, because the existence of many local minima justifies our running many different optimizations in the hope of reaching the global minimum on at least one run. We can vary the initial values of U and V , or we can vary the way we seek the optimum (to be discussed next), or both. A simple starting point for U and V is to give each element the same value, and a good choice for this value is that which gives the elements of the product UV the average of the nonblank elements of M. Note that if we have normalized M, then this value will necessarily be 0. If we have chosen d as the lengths of the short sides of U and V , and a is the average nonblank element of M, then the elements of U and V should be a / d \sqrt {a/d} a/d .
If we want many starting points for U and V , then we can perturb the value a / d \sqrt {a/d} a/d randomly and independently for each of the elements. There are many options for how we do the perturbation. We have a choice regarding the distribution of the difference. For example we could add to each element a normally distributed value with mean 0 and some chosen standard deviation. Or we could add a value uniformly chosen from the range − c −c c to + c +c +c for some c.

Performing the Optimization

In order to reach a local minimum from a given starting value of U and V , we need to pick an order in which we visit the elements of U and V . The simplest thing to do is pick an order, e.g., row-by-row, for the elements of U and V , and visit them in round-robin fashion. Note that just because we optimized an element once does not mean we cannot find a better value for that element after other elements have been adjusted. Thus, we need to visit elements repeatedly, until we have reason to believe that no further improvements are possible.
Alternatively, we can follow many different optimization paths from a single starting value by randomly picking the element to optimize. To make sure that every element is considered in each round, we could instead choose a permutation of the elements and follow that order for every round.

Converging to a Minimum

Ideally, at some point the RMSE becomes 0, and we know we cannot do better. In practice, since there are normally many more nonblank elements in M than there are elements in U and V together, we have no right to expect that we can reduce the RMSE to 0. Thus, we have to detect when there is little benefit to be had in revisiting elements of U and/or V . We can track the amount of improvement in the RMSE obtained in one round of the optimization, and stop when that improvement falls below a threshold. A small variation is to observe the improvements resulting from the optimization of individual elements, and stop when the maximum improvement during a round is below a threshold.

Avoiding Overfitting

One problem that often arises when performing a UV-decomposition is that we arrive at one of the many local minima that conform well to the given data, but picks up values in the data that don’t reflect well the underlying process that gives rise to the data. That is, although the RMSE may be small on the given data, it doesn’t do well predicting future data. There are several things that can be done to cope with this problem, which is called overfitting by statisticians.

  1. Avoid favoring the first components to be optimized by only moving the value of a component a fraction of the way, say half way, from its current value toward its optimized value.
  2. Stop revisiting elements of U and V well before the process has converged.
  3. Take several different UV decompositions, and when predicting a new entry in the matrix M, take the average of the results of using each decomposition.

Summary of Chapter 9

  • Utility Matrices: Recommendation systems deal with users and items. A utility matrix offers known information about the degree to which a user likes an item. Normally, most entries are unknown, and the essential problem of recommending items to users is predicting the values of the unknown entries based on the values of the known entries.
  • Two Classes of Recommendation Systems: These systems attempt to predict a user’s response to an item by discovering similar items and the response of the user to those. One class of recommendation system is content-based; it measures similarity by looking for common features of the items. A second class of recommendation system uses collaborative filtering; these measure similarity of users by their item preferences and/or measure similarity of items by the users who like them.
  • Item Profiles: These consist of features of items. Different kinds of items have different features on which content-based similarity can be based. Features of documents are typically important or unusual words. Products have attributes such as screen size for a television. Media such as movies have a genre and details such as actor or performer. Tags can also be used as features if they can be acquired from interested users.
  • User Profiles: A content-based collaborative filtering system can construct profiles for users by measuring the frequency with which features appear in the items the user likes. We can then estimate the degree to which a user will like an item by the closeness of the item’s profile to the user’s profile.
  • Classification of Items: An alternative to constructing a user profile is to build a classifier for each user, e.g., a decision tree. The row of the utility matrix for that user becomes the training data, and the classifier must predict the response of the user to all items, whether or not the row had an entry for that item.
  • Similarity of Rows and Columns of the Utility Matrix : Collaborative filtering algorithms must measure the similarity of rows and/or columns of the utility matrix. Jaccard distance is appropriate when the matrix consists only of 1’s and blanks (for “not rated”). Cosine distance works for more general values in the utility matrix. It is often useful to normalize the utility matrix by subtracting the average value (either by row, by column, or both) before measuring the cosine distance.
  • Clustering Users and Items: Since the utility matrix tends to be mostly blanks, distance measures such as Jaccard or cosine often have too little data with which to compare two rows or two columns. A preliminary step or steps, in which similarity is used to cluster users and/or items into small groups with strong similarity, can help provide more common components with which to compare rows or columns.
  • UV-Decomposition: One way of predicting the blank values in a utility matrix is to find two long, thin matrices U and V , whose product is an approximation to the given utility matrix. Since the matrix product UV gives values for all user-item pairs, that value can be used to predict the value of a blank in the utility matrix. The intuitive reason this method makes sense is that often there are a relatively small number of issues (that
    number is the “thin” dimension of U and V ) that determine whether or not a user likes an item.
  • Root-Mean-Square Error : A good measure of how close the product UV is to the given utility matrix is the RMSE (root-mean-square error). The RMSE is computed by averaging the square of the differences between UV and the utility matrix, in those elements where the utility matrix is nonblank. The square root of this average is the RMSE.
  • Computing U and V : One way of finding a good choice for U and V in a UV decomposition is to start with arbitrary matrices U and V . Repeatedly adjust one of the elements of U or V to minimize the RMSE between the product UV and the given utility matrix. The process converges to a local optimum, although to have a good chance of obtaining a global optimum we must either repeat the process from many starting matrices, or search from the starting point in many different ways.

END

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/585802.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

STL 函数对象,即仿函数

目录 1.函数对象 1.1函数对象的概念 1.2函数对象的使用 2.谓词 2.1谓词概念 2.2一元谓词 2.3二元谓词 3.内建函数对象 3.1内建函数对象的意义 3.2算术仿函数 3.3关系仿函数 3.4逻辑仿函数 1.函数对象 1.1函数对象的概念 1.2函数对象的使用 #include<iostream>…

文心一言 VS 讯飞星火 VS chatgpt (25)-- 算法导论4.2 7题

七、设计算法&#xff0c;仅使用三次实数乘法即可完成复数 abi和cdi 相乘。算法需接收a、b、c和d 为输入&#xff0c;分别生成实部 ac-bd 和虚部adbc。 文心一言&#xff1a; 可以使用如下算法来计算复数 abi 和 cdi 的积&#xff0c;且只需进行三次实数乘法&#xff1a; 1.…

Slower使用教程完整版本【2023年更新】

Slower软件的版本&#xff0c;目前市面上有多种。 如果你指的是Slower器加速软件的话&#xff0c;可以看下面的教程&#xff1a; Slower是一款很不错的安全国际互联网工具&#xff0c;广泛用于外贸与留学生行业&#xff0c;设计师行业与科研行业。但是&#xff0c;因为使用过…

注意:腾讯云轻量应用服务器十大限制说明

腾讯云轻量应用服务器相对于云服务器CVM是有一些限制的&#xff0c;比如轻量服务器不支持更换内网IP地址&#xff0c;不支持自定义私有网络VPC&#xff0c;内网连通性方面也有限制&#xff0c;轻量不支持CPU内存、带宽或系统盘单独升级&#xff0c;只能整个套餐整体升级&#x…

IO系统(计算机组成原理)

IO系统这一章主要讲的就是IO的四种控制方式&#xff0c;首先对这四种方式进行一个简单介绍&#xff0c;下面再对着四种方式分别进行介绍。 程序查询方式&#xff1a;由cpu通过程序不断查询IO设备是否已经做好准备&#xff0c;从而控制IO设备于主机进行信息交换 程序中断方式&am…

python自动化测试面试题,25K入职字节测试岗

问&#xff1a; http 和 https的区别   答&#xff1a; https需要申请ssl证书&#xff0c;https是超文本传输协议&#xff0c;信息是明文传输&#xff0c;https则是具有安全性的ssl加密传输协议http和https使用的是不同的链接方式&#xff0c;用的端口也不一样&#xff0c;前…

《深入理解计算机系统》读书笔记1

1.1信息就是位上下文 只由ASCLL字符构成的文件称为文本文件&#xff0c;所有其他文件都称为二进制文件。 系统中的所有的信息都由一串比特表示。区分不同数据对象的唯一方法是读到这些数据对象时的上下文。 1.2程序被其他程序翻译成不同的格式 预编译&#xff0c;编译&#xf…

【C++】类的访问权限

欢迎来到博主 Apeiron 的博客&#xff0c;祝您旅程愉快 &#xff01;时止则止&#xff0c;时行则行。动静不失其时&#xff0c;其道光明。 目录 1、缘起 2、示例代码 3、总结 1、缘起 在 C 中&#xff0c;类在设计时&#xff0c;可以把 属性 和 行为 放在不同的权限下加以…

智警杯赛前学习1.1---excel基本操作

修改默认设置 步骤一&#xff1a;打开“Excel选项”窗口&#xff0c;打开“文件”菜单&#xff0c;选择“选项”标签 步骤二&#xff1a;在“Excel选项”窗口中&#xff0c;选择“常规与保存”标签&#xff0c;在“常规与保存”标签中&#xff0c;可以修改录入数据时的默认字体…

【群智能算法改进】一种改进的沙丘猫群优化算法 改进沙丘猫群算法 改进SCSO[1]【Matlab代码#34】

文章目录 【获取资源请见文章第5节&#xff1a;资源获取】1. 原始沙丘猫群优化算法2. 改进沙丘猫群算法2.1 Logistic混沌映射种群初始化2.2 透镜成像折射反向学习策略2.3 动态因子2.4 黄金正弦策略 3. 部分代码展示4. 仿真结果展示5. 资源获取 【获取资源请见文章第5节&#xf…

国际标准 ISO 11898 解读

从 1993 第一个版 CAN 国际标准&#xff08;ISO 11898:1993 和 ISO 11519-2&#xff09;发布至今&#xff0c;ISO 11898 逐渐被分割整合成了相互独立的 6 个部分。分别以 Part 1 ~ Part 6 来标识。在旧版本&#xff08;2003年之前&#xff09;中 ISO 11898 是通信速度为 5kbps…

二维笛卡尔坐标系下的角的概念

文章目录 参考环境笛卡尔坐标系二维笛卡尔坐标系三维笛卡尔坐标系 任意角角的静态定义角的动态定义二维笛卡尔坐标系下角的概念方向正角、负角及零角 象限角象限象限角 终边相同角圆心角终边相同角 参考 项目描述搜索引擎Google 、Bing百度百科首页韩庆波正负角佟大大还是ETT【…

前端学习---Vue(6)路由

一、前端路由的概念和原理 Hash地址与组件的对应关系。 Hash:url中#之后的都是Hash地址 location.hash 1.1 前端路由的工作方式 ① 用户点击了页面上的路由链接 ② 导致了 URL 地址栏中的 Hash 值发生了变化 ③ 前端路由监听了到 Hash 地址的变化 ④ 前端路由把当前 Hash…

doris分区、join

动态分区和临时分区 动态分区 旨在对表级别的分区实现生命周期管理(TTL)&#xff0c;减少用户的使用负担。 目前实现了动态添加分区及动态删除分区的功能。只支持 Range 分区。原理 在某些使用场景下&#xff0c;用户会将表按照天进行分区划分&#xff0c;每天定时执行例行任…

tidyverse中filter行筛选时缺失值存在的一个坑

大家好&#xff0c;我是邓飞&#xff0c;好久没有更新博客了&#xff0c;是因为好久没有进步了。 之前我认为鲁迅说的对&#xff0c;他在《野草》中写道&#xff1a;“当我沉默着的时候&#xff0c;我觉得充实&#xff1b;我将开口&#xff0c;同时感到空虚”。现在确切的情况…

msvcr90.dll丢失的解决方法

在使用计算机的过程中&#xff0c;我们时常会遇到一些问题&#xff0c;比如应用程序无法正常启动&#xff0c;提示msvcr90.dll文件丢失&#xff0c;这个问题困扰了许多计算机用户。那么&#xff0c;怎么才能解决这个问题呢&#xff1f; 首先&#xff0c;让我们先了解一下msvcr…

c语言编程练习题:7-65 字符串替换

#include <stdio.h>int main() {char c;while (scanf("%c", &c) 1 && c ! \n) {if (c > A && c < Z) {c Z - (c - A);}printf("%c", c);}return 0; }代码来自&#xff1a;https://yunjinqi.top/article/190

Spring:spring-web中DeferredResult执行过程分析

对于HTTP请求的处理&#xff0c;有时处理请求的时间较长&#xff0c;可能会采用异步处理方式来处理。一般常用的异步处理方式是采用DeferredResult&#xff0c;本文会简单分析一下spring-web的整个处理过程。 首先&#xff0c;提供一个简单的DeferredResult例子&#xff1a; R…

C++map和set

目录&#xff1a; 什么是关联式容器&#xff1f;键值对树形结构的关联式容器 set的概念multiset的使用pair和make_pair map的概念用“[]”实现统计水果的次数 multimap的使用 什么是关联式容器&#xff1f; 在初阶阶段&#xff0c;我们已经接触过STL中的部分容器&#xff0c;比…

Centos7 Failed to start login service 问题

最近发现Centos7有个问题&#xff0c;用普通用户登录的时候&#xff0c;打开命令窗口无法进行操作一直卡在那里&#xff0c;但切换到root用户后命令输入又正常。因为我需要从 window 上的 SecureCRT 去连接 Centos7&#xff0c;每次都需要用户登录&#xff0c;然后把防火墙关闭…