Lecture 3: N-gram Language Models


Contents

      • Probabilities: Joint to Conditional
      • The Markov Assumption
      • Maximum Likelihood Estimation
      • Book-ending Sequences
      • Problems with N-gram Models
      • Smoothing
      • In Practice
      • Generation
      • How to Select the Next Word

Language Models

  • One application of NLP is explaining language: why some sentences are more fluent than others
  • E.g. in speech recognition: "recognize speech" > "wreck a nice beach"
  • Measure "goodness" using probabilities estimated by language models
  • Language models can also be used for generation
  • Language models are useful for:
    • Query completion
    • Optical character recognition
    • And other generation tasks:
      • Machine translation
      • Summarization
      • Dialogue systems
  • Nowadays pretrained language models are the backbone of modern NLP systems

N-gram Language Model

Probabilities: Joint to Conditional

  • The goal of a language model is to assign a probability to an arbitrary sequence of m words:

    P(w1, w2, …, wm)

  • The first step is to apply the chain rule to convert the joint probability into conditional ones:

    P(w1, w2, …, wm) = P(w1) P(w2|w1) P(w3|w1, w2) … P(wm|w1, …, wm-1)

The Markov Assumption

  • P(wm|w1, …, wm-1) is still intractable, so make a simplifying assumption:

    P(wm|w1, …, wm-1) ≈ P(wm|wm-n+1, …, wm-1)

  • For some small n:

    • When n = 1, it is a unigram model:

      P(w1, w2, …, wm) = ∏i P(wi)

    • When n = 2, it is a bigram model:

      P(w1, w2, …, wm) = ∏i P(wi|wi-1)

    • When n = 3, it is a trigram model:

      P(w1, w2, …, wm) = ∏i P(wi|wi-2, wi-1)

Maximum Likelihood Estimation

  • Estimate the probabilities based on counts in the corpus (M = total number of tokens):

    • For unigram models:

      P(wi) = C(wi) / M

    • For bigram models:

      P(wi|wi-1) = C(wi-1, wi) / C(wi-1)

    • For n-gram models generally:

      P(wi|wi-n+1, …, wi-1) = C(wi-n+1, …, wi) / C(wi-n+1, …, wi-1)
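As a sketch, the MLE counts above can be computed from a toy corpus; the corpus and variable names here are invented purely for illustration:

```python
from collections import Counter

# Invented two-sentence corpus, already book-ended with <s> and </s>.
corpus = [["<s>", "a", "cat", "sat", "</s>"],
          ["<s>", "a", "dog", "sat", "</s>"]]

unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter((sent[i], sent[i + 1])
                        for sent in corpus for i in range(len(sent) - 1))

def p_bigram(w, prev):
    """MLE estimate P(w | prev) = C(prev, w) / C(prev)."""
    return bigram_counts[(prev, w)] / unigram_counts[prev]

print(p_bigram("cat", "a"))  # C(a, cat)/C(a) = 1/2 = 0.5
```

Note that `p_bigram` raises on a context with zero count and gives probability 0 to unseen bigrams; those are exactly the problems the later sections address.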

Book-ending Sequences

  • Special tags are used to denote the start and end of a sequence:
    • <s> = sentence start
    • </s> = sentence end

Problems with N-gram Models

  • Language has long-distance effects, so a large n is required.

    The lecture/s that took place last week was/were on preprocessing

    • The choice of "was/were" here depends on "lecture/s", six words earlier, so an n-gram model with a large n (at least a 7-gram) would be needed to capture it
  • The resulting probabilities are often very small

    • Possible solution: use log probabilities to avoid numerical underflow
  • Unseen words

    • Use a special symbol to represent them, e.g. <UNK>
  • Unseen n-grams: because the operation is multiplication, if one term in the product is 0 then the whole probability is 0

    • Need to smooth the n-gram language model
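To illustrate the underflow problem and the log-probability fix, here is a small sketch; the per-word probabilities are made up and identical only to keep the arithmetic obvious:

```python
import math

# Hypothetical per-word probabilities for a 200-word sequence.
probs = [0.01] * 200

product = 1.0
for p in probs:
    product *= p              # 10^-400 is below float range -> underflows to 0.0

# Summing log probabilities stays comfortably representable.
log_sum = sum(math.log(p) for p in probs)

print(product)   # 0.0 (underflow)
print(log_sum)   # about -921.03
```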

Smoothing

Smoothing

  • Basic idea: give events you have never seen before some probability
  • It must still be the case that the probabilities sum to one: ∑w P(w|context) = 1
  • Many different kinds of smoothing:
    • Laplacian (add-one) smoothing
    • Add-k smoothing
    • Absolute discounting
    • Katz Backoff
    • Kneser-Ney
    • Interpolation
    • Interpolated Kneser-Ney Smoothing

Laplacian (add-one) smoothing

  • Simple idea: pretend we have seen each n-gram once more than we actually did.

  • For unigram models (M = total tokens, |V| = vocabulary size):

    Padd1(wi) = (C(wi) + 1) / (M + |V|)

  • For bigram models:

    Padd1(wi|wi-1) = (C(wi-1, wi) + 1) / (C(wi-1) + |V|)
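A minimal sketch of add-one smoothing for bigrams, assuming a toy one-sentence corpus and treating its observed word types as the whole vocabulary:

```python
from collections import Counter

# Invented corpus for illustration only.
tokens = "the cat sat on the mat".split()
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)
V = len(set(tokens))  # vocabulary size |V| = 5 here

def p_add1(w, prev):
    """(C(prev, w) + 1) / (C(prev) + |V|)"""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

print(p_add1("cat", "the"))  # seen bigram:   (1 + 1) / (2 + 5) = 2/7
print(p_add1("dog", "the"))  # unseen bigram: (0 + 1) / (2 + 5) = 1/7
```

The unseen bigram now gets non-zero probability, at the cost of taking a fairly large amount of mass away from the seen events.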

Add-k smoothing

  • Adding one is often too much. Instead, add a fraction k:

    Paddk(wi|wi-1) = (C(wi-1, wi) + k) / (C(wi-1) + k|V|)

  • Also called Lidstone smoothing

  • Have to choose the value of k
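The same toy setup can illustrate add-k; the value k = 0.05 below is an arbitrary choice for the example, not a recommended setting:

```python
from collections import Counter

# Same invented corpus as before.
tokens = "the cat sat on the mat".split()
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)
V = len(set(tokens))

def p_addk(w, prev, k=0.05):
    """(C(prev, w) + k) / (C(prev) + k|V|)"""
    return (bigrams[(prev, w)] + k) / (unigrams[prev] + k * V)

# A small k moves much less mass to unseen events than add-one does.
print(p_addk("dog", "the"))       # (0 + 0.05) / (2 + 0.25) ~ 0.022
print(p_addk("dog", "the", k=1))  # add-one as a special case: 1/7
```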

Absolute Discounting

  • Borrows a fixed probability mass (a discount d) from observed n-gram counts
  • Redistributes it to unseen n-grams
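A hedged sketch of the idea: each seen bigram gives up an arbitrary discount d = 0.1, and the collected mass is shared equally among unseen continuations of the context (the corpus is invented):

```python
from collections import Counter

tokens = "the cat sat on the mat".split()
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)
vocab = sorted(set(tokens))
d = 0.1  # fixed discount borrowed from every observed bigram

def p_abs(w, prev):
    seen = [v for v in vocab if bigrams[(prev, v)] > 0]
    unseen = [v for v in vocab if bigrams[(prev, v)] == 0]
    if bigrams[(prev, w)] > 0:
        return (bigrams[(prev, w)] - d) / unigrams[prev]
    reserved = d * len(seen) / unigrams[prev]  # total borrowed mass
    return reserved / len(unseen)              # shared equally

# The distribution over the vocabulary still sums to 1:
print(sum(p_abs(v, "the") for v in vocab))  # ~ 1.0
```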

Katz Backoff

  • Absolute discounting redistributes the probability mass equally across all unseen n-grams

  • Katz backoff instead redistributes the mass based on a lower-order model (e.g. the unigram model):

    Pkatz(wi|wi-1) = (C(wi-1, wi) − d) / C(wi-1)                 if C(wi-1, wi) > 0
                   = α(wi-1) × P(wi) / ∑wj: C(wi-1, wj)=0 P(wj)   otherwise

    where α(wi-1) is the probability mass discounted from context wi-1

  • Problem: it prefers high-frequency words over truly related words.

    • E.g. I can't see without my reading _
      • C(reading, glasses) = C(reading, Francisco) = 0
      • C(Francisco) > C(glasses)
      • Katz backoff will give the higher probability to Francisco

Kneser-Ney Smoothing

  • Redistributes probability mass based on the versatility of the lower-order n-gram
  • The resulting estimate is also called the continuation probability
  • Versatility:
    • High versatility: co-occurs with many unique words
      • E.g. glasses: men's glasses, black glasses, buy glasses
    • Low versatility: co-occurs with few unique words
      • E.g. Francisco: San Francisco

    Pcont(wi) = |{wi-1 : C(wi-1, wi) > 0}| / ∑wj |{wj-1 : C(wj-1, wj) > 0}|

  • Intuitively, the numerator of Pcont counts the number of unique contexts wi-1 that co-occur with wi
  • So glasses gets a high continuation count and Francisco a low one
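The continuation count can be sketched by counting distinct left contexts per word; the bigram list below is invented to mirror the glasses/Francisco example:

```python
from collections import defaultdict

# Invented observed bigrams (prev, word).
observed = [("reading", "glasses"), ("men's", "glasses"),
            ("black", "glasses"), ("San", "Francisco")]

contexts = defaultdict(set)
for prev, w in observed:
    contexts[w].add(prev)  # unique left contexts of each word

total = sum(len(s) for s in contexts.values())

def p_cont(w):
    """|{prev : C(prev, w) > 0}| / total continuation count"""
    return len(contexts[w]) / total

print(p_cont("glasses"))    # 3/4: versatile, many unique contexts
print(p_cont("Francisco"))  # 1/4: only ever follows "San"
```

Even though Francisco may be far more frequent overall, its continuation probability stays low because it only ever appears after one context.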

Interpolation

  • A better way to combine n-gram models of different orders
  • Weighted sum of probabilities across progressively shorter contexts
  • E.g. an interpolated trigram model:

    PIN(wi|wi-2, wi-1) = λ3 P3(wi|wi-2, wi-1) + λ2 P2(wi|wi-1) + λ1 P1(wi)
    where λ3 + λ2 + λ1 = 1
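The interpolated trigram formula can be sketched directly; the component probabilities and λ weights below are placeholders for illustration, not trained values:

```python
# Interpolated trigram probability: weighted sum of trigram,
# bigram, and unigram estimates (hypothetical values throughout).
def p_interp(p3, p2, p1, lambdas=(0.6, 0.3, 0.1)):
    l3, l2, l1 = lambdas
    assert abs(l3 + l2 + l1 - 1.0) < 1e-9  # weights must sum to 1
    return l3 * p3 + l2 * p2 + l1 * p1

# Even with an unseen trigram (p3 = 0), the bigram and unigram
# components keep the overall probability non-zero:
print(p_interp(0.0, 0.2, 0.05))  # ~ 0.065
```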

Interpolated Kneser-Ney Smoothing

  • Uses interpolation instead of back-off:

    PIKN(wi|wi-1) = max(C(wi-1, wi) − d, 0) / C(wi-1) + β(wi-1) Pcont(wi)

    where β(wi-1) is the probability mass discounted from context wi-1

In Practice

  • Commonly used Kneser-Ney language models use 5-grams as the maximum order
  • They use different discount values for each n-gram order

Generating Language

Generation

  • Given an initial word, draw the next word according to the probability distribution produced by the language model.

  • Include n − 1 <s> tokens for an n-gram model to provide context for generating the first word

    • Never generate <s>
    • Generating </s> terminates the sequence
  • E.g. a bigram model starts from <s>, samples each word conditioned on the previous one, and stops once </s> is drawn
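A minimal generation sketch under an invented bigram distribution table:

```python
import random

# Hypothetical bigram model: P(next | current) as nested dicts.
model = {
    "<s>": {"a": 0.5, "the": 0.5},
    "a": {"cat": 1.0},
    "the": {"dog": 1.0},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def generate():
    word, out = "<s>", []        # start from the <s> context
    while True:
        nxt = random.choices(list(model[word]),
                             weights=model[word].values())[0]
        if nxt == "</s>":        # end tag terminates the sequence
            return out
        out.append(nxt)
        word = nxt

print(generate())  # either ['a', 'cat'] or ['the', 'dog']
```

<s> never appears in the output because no context assigns it probability, and </s> is consumed as the stopping signal rather than emitted.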

How to Select the Next Word

  • Argmax: take the highest-probability word at each turn

    • Also known as greedy search
  • Beam search decoding:

    • Keeps track of the top-N highest-probability partial sequences at each turn
    • Selects the sequence of words with the best overall sentence probability
  • Sampling: randomly draw the next word from the distribution
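A small sketch contrasting argmax and random sampling over one hypothetical distribution (beam search is omitted here, since it requires tracking multiple partial sequences):

```python
import random

# Invented next-word distribution for a single context.
dist = {"glasses": 0.5, "book": 0.3, "Francisco": 0.2}

# Argmax / greedy: always pick the single most probable word.
greedy = max(dist, key=dist.get)

# Sampling: any word can appear, in proportion to its probability.
sampled = random.choices(list(dist), weights=dist.values())[0]

print(greedy)   # glasses
print(sampled)  # varies from run to run
```

Greedy decoding is deterministic and can be repetitive; sampling trades some fluency for variety.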
