Understanding LSTM Networks

news2024/9/19 11:04:35

文章目录

    • Recurrent Neural Networks
    • The Problem of Long-Term Dependencies
    • LSTM Networks
    • The Core Idea Behind LSTMs.
    • Step-by-Step LSTM Walk Through

本篇文章记述了自己对“Understanding LSTM Networks”的理解

Recurrent Neural Networks

Humans don’t start their thinking from scratch every second. As you read this essay, you understand each word based on your understanding of previous words. You don’t throw everything away and start thinking from scratch again. Your thoughts have persistence.

Traditional neural networks can’t do this, and it seems like a major shortcoming. For example, imagine you want to classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could use its reasoning about previous events in the film to inform later ones.

Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist.

在这里插入图片描述

Recurrent Neural Networks have loops.

In the above diagram, a chunk of neural network, A A A, looks at some input x t x_t xt and outputs a value h t h_t ht. A loop allows information to be passed from one step of the network to the next.

These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren’t all that different than a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop:

在这里插入图片描述

An unrolled recurrent neural network.

This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They’re the natural architecture of neural network to use for such data.

And they certainly are used! In the last few years, there have been incredible success applying RNNs to a variety of problems: speech recognition, language modeling, translation, image captioning… The list goes on. I’ll leave discussion of the amazing feats one can achieve with RNNs to Andrej Karpathy’s excellent blog post, The Unreasonable Effectiveness of Recurrent Neural Networks. But they really are pretty amazing.

Essential to these successes is the use of “LSTMs,” a very special kind of recurrent neural network which works, for many tasks, much much better than the standard version. Almost all exciting results based on recurrent neural networks are achieved with them. It’s these LSTMs that this essay will explore.

The Problem of Long-Term Dependencies

One of the appeals of RNNs is the idea that they might be able to connect previous information to the present task, such as using previous video frames might inform the understanding of the present frame. If RNNs could do this, they’d be extremely useful. But can they? It depends.

Sometimes, we only need to look at recent information to perform the present task. For example, consider a language model trying to predict the next word based on the previous ones. If we are trying to predict the last word in “the clouds are in the sky,” we don’t need any further context – it’s pretty obvious the next word is going to be sky. In such cases, where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use the past information.

在这里插入图片描述
But there are also cases where we need more context. Consider trying to predict the last word in the text “I grew up in France… I speak fluent French.” Recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France, from further back. It’s entirely possible for the gap between the relevant information and the point where it is needed to become very large.

Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.

在这里插入图片描述

LSTM Networks

Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997), and were refined and popularized by many people in following work.1 They work tremendously well on a large variety of problems, and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!

All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer.

在这里插入图片描述

The repeating module in a standard RNN contains a single layer.

LSTMs also have this chain like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way.

在这里插入图片描述

The repeating module in an LSTM contains four interacting layers.

Don’t worry about the details of what’s going on. We’ll walk through the LSTM diagram step by step later. For now, let’s just try to get comfortable with the notation we’ll be using.

在这里插入图片描述
In the above diagram, each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent pointwise operations, like vector addition, while the yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denote its content being copied and the copies going to different locations

The Core Idea Behind LSTMs.

The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.

The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged.

在这里插入图片描述
The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates.

Gates are a way to optionally let information through. They are composed out of a sigmoid neural net layer and a pointwise multiplication operation.

在这里插入图片描述
The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means “let nothing through,” while a value of one means “let everything through!”

An LSTM has three of these gates, to protect and control the cell state.

Step-by-Step LSTM Walk Through

The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This decision is made by a sigmoid layer called the “forget gate layer.” It looks at ht−1 and xt, and outputs a number between 0 and 1 for each number in the cell state Ct−1. A 1 represents “completely keep this” while a 0 represents “completely get rid of this.”

Let’s go back to our example of a language model trying to predict the next word based on all the previous ones. In such a problem, the cell state might include the gender of the present subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old subject.

在这里插入图片描述
The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of new candidate values, C t ~ \tilde{C_t} Ct~ , that could be added to the state. In the next step, we’ll combine these two to create an update to the state.

In the example of our language model, we’d want to add the gender of the new subject to the cell state, to replace the old one we’re forgetting.

在这里插入图片描述
It’s now time to update the old cell state, C t − 1 C_{t-1} Ct1, into the new cell state C t C_t Ct. The previous steps already decided what to do, we just need to actually do it.

We multiply the old state by f t f_t ft, forgetting the things we decided to forget earlier. Then we add i t ∗ C t ~ i_t*\tilde{C_t} itCt~. This is the new candidate values, scaled by how much we decided to update each state value.

In the case of the language model, this is where we’d actually drop the information about the old subject’s gender and add the new information, as we decided in the previous steps.

在这里插入图片描述
Finally, we need to decide what we’re going to output. This output will be based on our cell state, but will be a filtered version. First, we run a sigmoid layer which decides what parts of the cell state we’re going to output. Then, we put the cell state through t a n h tanh tanh (to push the values to be between − 1 −1 1 and 1 1 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.

For the language model example, since it just saw a subject, it might want to output information relevant to a verb, in case that’s what is coming next. For example, it might output whether the subject is singular or plural, so that we know what form a verb should be conjugated into if that’s what follows next.

在这里插入图片描述

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/172778.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

springmvc统一日志打印request和response内容

在web项目中,有不少场景需要统一处理一些和实际业务基本不相关的逻辑,比如rest接口的监控、出入参日志、操作记录、统一异常处理(避免将错误堆栈等信息直接打到web端)。如果你觉得日志打印rest接口出入参非常简单,直接getParameter()就好了&a…

Redis6学习笔记【part4】Jedis-API与手机验证码功能实现

1.连接 Jedis 第一步&#xff0c;修改 redis 的配置&#xff0c;以允许外网 ip 访问 redis。 在 redis.conf 中注释掉 bind 127.0.0.1 &#xff0c;并修改 protected-mode no 。 第二步&#xff0c;导入依赖。 <dependency><groupId>redis.clients</groupId…

Unity 进阶 之 资源文件夹下资源名的重名检查,并简单生产资源表的方法整理

Unity 进阶 之 资源文件夹下资源名的重名检查,并简单生产资源表的方法整理 目录 Unity 进阶 之 资源文件夹下资源名的重名检查,并简单生产资源表的方法整理 一、简单介绍 二、简单实现过程 三、关键代码 一、简单介绍 Unity中的一些知识点整理。 本节简单介绍在Unity开发…

python使用sentinelsat库下载sentinel影像数据

GIS遥感不分家&#xff0c;最近开始找一些影像的下载脚本了&#xff0c;这两天搞定了哨兵和modis的&#xff0c;分别贴一下 鉴于《Python中使用sentinelsat包自动下载Sentinel系列数据》这篇文章已经写得非常全乎&#xff0c;这里就简单补充一下&#xff0c;放个最简单的下载脚…

Vue CLI(Vue.js 开发的标准工具)

Vue CLI&#xff08;Vue.js 开发的标准工具&#xff09;参考描述Vue CLI获取检测项目创建项目Please pick a presetCheck the features needed for your projectChoose a version of Vue.jsPrefer placing configSave this as a preset for future projects?Save preset asFin…

[MRCTF2020]Ezaudit(随机数的安全)

目录 信息收集 代码审计 相关函数 前提知识 思路分析 补充知识 信息收集 查看源代码没有发现有用信息&#xff0c;尝试dirmap扫下目录 python3 dirmap.py -i 网址 -lcf 扫描时发现一个www.zip目录 下载到一份index.php文件&#xff0c;找到一个login.html <?php h…

docker安装pg数据库及pg数据库基本操作

一、首先准备pg数据库的docker镜像二、先创建一个文件作为pg数据库数据文件、配置文件等的外部挂载文件三、创建镜像docker run -it -d --name postgres14 --restartalways --privilegedtrue -p 5432:5432 -e POSTGRES_PASSWORDpostgres -v /home/fengyang/pg_data:/var/lib/po…

SpringBoot+VUE前后端分离项目学习笔记 - 【25 SpringBoot实现1对1、1对多、多对多关联查询】

新增课程Course页面&#xff0c;实现学生选课功能、课程教授老师选择等功能 1. 课程与授课老师是一对一关系 因为course表仅记录了teacherid&#xff0c;而页面需要的是老师的名字 select course.*,sys_user.id from course left join sys_user **on ** course.teacher_id sys…

第六章SpringFramework之声明事务

文章目录JdbcTemplate准备工作导入依赖创建jdbc.properties配置Spring的配置文件配置测试类的环境实例声明式事务概念先看看对应的编程式事务声明式事务通过一个案例了解声明式事务前提准备三层架构的构建模拟场景的情况添加事务Spring声明式事务的属性事务注解标识的位置事务属…

手把手教你学51单片机-点亮你的LED

单片机内部资源 Flash——程序存储空间。对于单片机来说 Flash 最大的意义是断电后数据 不丢失。 RAM——数据存储空间。RAM 是单片机的数据存储空间,用来存储程序运行过程中产生的和需要的数据,关电后数据丢失 SFR——特殊功能寄存器。通过对特殊工程寄存器读写操作,可以…

循环语句(循环结构)——“C”

各位CSDN的uu们好呀&#xff0c;我又来啦&#xff0c;今天&#xff0c;小雅兰给大家介绍的是一个知识点&#xff0c;就是循环语句啦&#xff0c;包括while循环、do-while循环、for循环&#xff0c;话不多说&#xff0c;让我们一起进入循环结构的世界吧 首先&#xff0c;我们先来…

利用Python暴力破解邻居家WiFi密码

如觉得博主文章写的不错或对你有所帮助的话&#xff0c;还望大家多多支持呀&#xff01;关注、点赞、收藏、评论。 文章目录一、编写代码二、展示测试结果三、测试四、生成密码本&#xff08;建议自己找一个密码本&#xff09;一、编写代码 在桌面新建一个文件 如果你新建的文…

如何实现everything的http外网访问

Everything是voidtools开发的一款文件搜索工具&#xff0c;官网描述为“基于名称快速定位文件和文件夹。”可以实现快速文件索引、快速搜索、最小资源使用、便于文件分享等功能。 everything部署本地后&#xff0c;可以开启配置Http访问功能&#xff0c;这样在局域网内就可以直…

【自用】Git日常开发教程

因为经常容易忘记指令&#xff08;年纪大了&#xff09;&#xff0c;所以打算记录一下将一堆文件从vscode上传到GitHub仓库 目录软件下载初始化状态过程可能出现的错误其他操作参考资料软件下载 https://gitforwindows.org/ https://code.visualstudio.com/ 初始化状态 过程 …

上海亚商投顾:两市缩量微涨,数字经济概念全线走强

上海亚商投顾前言&#xff1a;无惧大盘涨跌&#xff0c;解密龙虎榜资金&#xff0c;跟踪一线游资和机构资金动向&#xff0c;识别短期热点和强势个股。市场情绪三大指数今日缩量震荡&#xff0c;黄白二线有所分化&#xff0c;题材概念表现活跃。数字经济概念全线走强&#xff0…

8、MariaDB11数据库安装初始化密码

MariaDB11安装 安装前准备 下载安装包 点我去MariaDB官网下载安装包 查看相关文档 Mariadb Server官方文档 使用zip安装 解压缩zip 将下载到的zip解压缩到想安装的位置。 生成data目录 打开cmd并进入到刚才解压后的bin目录&#xff0c; 执行mysql_install_db.exe程序生…

Flowable进阶学习(二)流程部署的深入解析

文章目录一、流程部署涉及表及其结构1. 部署流程代码示例&#xff1a;2. 流程部署所涉及表&#xff1a;3. 流程部署涉及表的结构、字段解析二、流程部署中数据的存储的过程一、流程部署涉及表及其结构 1. 部署流程代码示例&#xff1a; 设计俩个流程&#xff0c;并压缩成zip包…

项目引入多类数据源依赖,MyBatisPlus 是如何确定使用哪种数据源的?

背景 壬寅年腊月廿八&#xff0c;坚守在工作岗位。看了一下项目的 pom.xml 依赖&#xff0c;发现了好几个数据库连接相关的包&#xff0c;有 commons-dbcp2、c3p0、hikaricp、druid-spring-boot-starter&#xff0c;这可是四种不同的数据库连接池呢&#xff0c;一个项目中引入…

Github访问办法

GitHub GitHub是为开发者提供Git仓库的托管服务。这是一个让开发者与朋友、同事、同学及陌生人共享代码的完美场所。 GitHub除提供Git仓库的托管服务外&#xff0c;还为开发者或团队提供了一系列功能&#xff0c;帮助其高效率、高品质地进行代码编写。 GitHub的创始人之一Chr…

使用GPIO模拟I2C的驱动程序分析

使用GPIO模拟I2C的驱动程序分析 文章目录使用GPIO模拟I2C的驱动程序分析参考资料&#xff1a;一、回顾I2C协议1.1 硬件连接1.3 协议细节二、 使用GPIO模拟I2C的要点三、 驱动程序分析3.1 平台总线设备驱动模型3.2 设备树3.3 驱动程序分析1. I2C-GPIO驱动层次2. 传输函数分析四、…