2.5 Unsupervised Learning (Part 2)
In the last video, you saw what unsupervised learning is, and one type of unsupervised learning called clustering. Let’s give a slightly more formal definition of unsupervised learning and take a quick look at some types of unsupervised learning other than clustering. Whereas in supervised learning the data comes with both inputs x and output labels y, in unsupervised learning the data comes only with inputs x but not output labels y, and the algorithm has to find some structure, some pattern, or something interesting in the data.
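To make the difference between the two kinds of data concrete, here is a minimal sketch in Python. The numbers, the variable names, and the helper `two_means_1d` are all made up for illustration; the tiny two-group k-means is just one simple way an algorithm could find structure in unlabeled inputs, not necessarily the method taught later in the course.

```python
# Supervised data: every input x comes paired with an output label y.
# (Made-up numbers: tumor size in cm -> 0 = benign, 1 = malignant.)
supervised_data = [(1.0, 0), (1.5, 0), (4.0, 1), (4.5, 1)]

# Unsupervised data: inputs x only, with no labels at all.
unsupervised_data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]


def two_means_1d(xs, iterations=10):
    """Split 1-D points into two groups with a tiny k-means (k = 2)."""
    # Start the two centers at the smallest and largest value.
    centers = [min(xs), max(xs)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            # Assign each point to its nearest center.
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# No labels were given, yet the points fall into a "small" and a "large" group.
groups = two_means_1d(unsupervised_data)
print(groups)
```

Notice that the algorithm was never told what the groups mean. It only discovers that the inputs fall into two clumps.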
We’ve seen just one example of unsupervised learning, called a clustering algorithm, which groups similar data points together. In this specialization, you’ll learn about clustering as well as two other types of unsupervised learning. One is called anomaly detection, which is used to detect unusual events. This turns out to be really important for fraud detection in the financial system, where unusual events, such as unusual transactions, could be signs of fraud, and for many other applications.
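As a rough illustration of the anomaly detection idea, the sketch below flags transaction amounts that sit far from the rest of the data. The amounts, the function name `flag_unusual`, and the threshold of 2 standard deviations are assumptions chosen for this example. This is one simple rule of thumb, not the specific anomaly detection algorithm covered later in the specialization.

```python
import statistics


def flag_unusual(amounts, threshold=2.0):
    """Return the amounts that lie more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(amounts)
    spread = statistics.pstdev(amounts)  # population standard deviation
    if spread == 0:
        # All amounts are identical, so nothing stands out.
        return []
    return [a for a in amounts if abs(a - mean) / spread > threshold]


# Made-up transaction amounts: seven ordinary purchases and one very large one.
transactions = [20, 25, 22, 19, 24, 21, 23, 950]
unusual = flag_unusual(transactions)
print(unusual)
```

Here too there are no labels saying which transactions are fraudulent. The algorithm only notices that one input looks very different from the others.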
You’ll also learn about dimensionality reduction. This lets you take a big dataset and almost magically compress it to a much smaller dataset while losing as little information as possible. In case anomaly detection and dimensionality reduction don’t seem to make too much sense to you yet, don’t worry about it. We’ll get to this later in the specialization.
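If it helps to see the compression idea in miniature, here is a hedged sketch that squeezes 2-D points down to a single number each by projecting them onto the direction along which they vary most (a two-dimensional case of principal component analysis). The data and the function names `compress_2d_to_1d` and `reconstruct` are hypothetical, and the course may present dimensionality reduction differently.

```python
import math


def compress_2d_to_1d(points):
    """Project 2-D points onto their direction of greatest variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Angle of the principal axis of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(angle), math.sin(angle))
    # One number per point: its position along the principal axis.
    codes = [(x - mean_x) * direction[0] + (y - mean_y) * direction[1]
             for x, y in points]
    return codes, (mean_x, mean_y), direction


def reconstruct(codes, mean, direction):
    """Map the 1-D codes back to approximate 2-D points."""
    return [(mean[0] + c * direction[0], mean[1] + c * direction[1])
            for c in codes]


# Made-up points that lie exactly on the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
codes, mean, direction = compress_2d_to_1d(points)
restored = reconstruct(codes, mean, direction)
print(codes)
print(restored)
```

Because these points lie exactly on a line, one number per point is enough to recover them. Real data is rarely this tidy, so some information is usually lost.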
Now, I’d like to ask you another question to help you check your understanding, and no pressure: if you don’t get it right on the first try, it’s totally fine. Please select any of the following that you think are examples of unsupervised learning. Two are unsupervised learning examples and two are supervised learning examples. Please take a look.
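[Question] Which of the following four options use an unsupervised learning algorithm to solve the problem?
(A) Given emails labeled as spam or not spam, learn to build a spam filter.
(B) Given a set of news articles found on the web, group them into sets of articles on the same topic.
(C) Given a database of customer data, automatically discover market segments and group customers into different market segments. (Market segmentation divides consumers into distinct groups according to their differing needs and characteristics.)
(D) Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.
[Answer] B, C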
Maybe you remember the spam filtering problem. If you have data labeled as spam or non-spam email, you can treat this as a supervised learning problem. The second example, the news story example, is exactly the Google News example that you saw in the last video. You can approach that using a clustering algorithm to group news articles together, so that one uses unsupervised learning.
The market segmentation example, which I talked about a little bit earlier, you can treat as an unsupervised learning problem as well, because you can give your algorithm some data and ask it to discover market segments automatically. The final example is on diagnosing diabetes. Well, actually that’s a lot like our breast cancer example from the supervised learning videos. Only instead of benign or malignant tumors, we have diabetes or not diabetes. You can approach this as a supervised learning problem, just like we did for the breast tumor classification problem.
Even though in the last video we talked mainly about clustering, in later videos in this specialization we’ll dive much more deeply into anomaly detection and dimensionality reduction as well. That’s unsupervised learning. Before we wrap up this section, I want to share with you something that I find really exciting and useful, which is the use of Jupyter Notebooks in machine learning. Let’s take a look at that in the next video.