cs231n assignment3 Extra Credit: Image Captioning with LSTMs

news2024/10/7 16:18:06

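Table of Contents

  • In a hurry? Skip straight to the code
  • Extra Credit: Image Captioning with LSTMs
    • lstm_step_forward
      • Problem statement
      • Analysis
      • Code
      • Output
    • lstm_step_backward
      • Problem statement
      • Analysis
      • Code
      • Output
    • lstm_forward
      • Problem statement
      • Analysis
      • Code
      • Output
    • lstm_backward
      • Problem statement
      • Analysis
      • Code
      • Output
    • CaptioningRNN.loss
      • Analysis
      • Code
      • Output
    • Final output
    • Closing remarks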

In a hurry? Skip straight to the code

Extra Credit: Image Captioning with LSTMs

lstm_step_forward

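Problem statement

[Images: problem statement from the assignment notebook]
Based on the lecture and the explanation above, this part asks us to implement the forward pass for a single LSTM timestep. The exact procedure is already spelled out in the problem statement.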

Analysis

See the comments in the code.
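
For reference, the single-step update implemented by lstm_step_forward can be written out as follows. This is a sketch in the row-vector convention of the function's docstring (x of shape (N, D), W_x of shape (D, 4H)); the gate order i, f, o, g is read off the np.split call in the code, and the symbol names h_{t-1}, c_{t-1} for prev_h, prev_c are my own shorthand.

```latex
\begin{aligned}
a &= x\,W_x + h_{t-1}\,W_h + b, \qquad a = [\,a_i,\ a_f,\ a_o,\ a_g\,] \\
i &= \sigma(a_i), \quad f = \sigma(a_f), \quad o = \sigma(a_o), \quad g = \tanh(a_g) \\
c_t &= f \odot c_{t-1} + i \odot g \\
h_t &= o \odot \tanh(c_t)
\end{aligned}
```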

Code

def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b):
    """Forward pass for a single timestep of an LSTM.

    The input data has dimension D, the hidden state has dimension H, and we use
    a minibatch size of N.

    Note that a sigmoid() function has already been provided for you in this file.

    Inputs:
    - x: Input data, of shape (N, D)
    - prev_h: Previous hidden state, of shape (N, H)
    - prev_c: previous cell state, of shape (N, H)
    - Wx: Input-to-hidden weights, of shape (D, 4H)
    - Wh: Hidden-to-hidden weights, of shape (H, 4H)
    - b: Biases, of shape (4H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - next_c: Next cell state, of shape (N, H)
    - cache: Tuple of values needed for backward pass.
    """
    next_h, next_c, cache = None, None, None
    #############################################################################
    # TODO: Implement the forward pass for a single timestep of an LSTM.        #
    # You may want to use the numerically stable sigmoid implementation above.  #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

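    # Compute the pre-activations of all four gates at once, shape (N, 4H)
    a = x.dot(Wx) + prev_h.dot(Wh) + b
    # Split into the four gate pre-activations, each of shape (N, H)
    ai, af, ao, ag = np.split(a, 4, axis=1)
    # Input, forget, and output gates use sigmoid; the block input g uses tanh
    i = sigmoid(ai)
    f = sigmoid(af)
    o = sigmoid(ao)
    g = np.tanh(ag)

    # Next cell state
    next_c = f * prev_c + i * g
    # Next hidden state
    next_h = o * np.tanh(next_c)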

    cache = (x, prev_h, prev_c, Wx, Wh, b, a, i, f, o, g, next_c, next_h)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################

    return next_h, next_c, cache
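
The function above relies on the sigmoid() helper that the assignment file already provides. For readers without that file at hand, here is a minimal sketch of one numerically stable way to write it. The name stable_sigmoid is my own, and this is not claimed to be the assignment's exact implementation.

```python
import numpy as np


def stable_sigmoid(x):
    """Sigmoid that never exponentiates a large positive number."""
    x = np.asarray(x, dtype=float)
    # exp(-|x|) always lies in (0, 1], so it cannot overflow.
    z = np.exp(-np.abs(x))
    # For x >= 0 use 1 / (1 + exp(-x)); for x < 0 use exp(x) / (1 + exp(x)).
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))


values = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
print(stable_sigmoid(values))
```

Both branches are algebraically the same function; choosing the branch by sign only keeps the exponent non-positive.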

Output

[Image: notebook output for lstm_step_forward]

lstm_step_backward

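Problem statement

[Image: problem statement from the assignment notebook]
Implement the backward pass for a single LSTM timestep.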

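Analysis

Derivative of sigmoid: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))

Derivative of tanh: tanh'(x) = 1 - tanh(x)^2

For a walkthrough of backpropagation, see this (the link did not survive in this copy).

Then read the code comments with the chain rule in mind.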
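
Written out, the chain rule for one timestep gives the expressions below. This is a sketch that follows the variable names in the code (the prefix d marks the gradient of the loss with respect to that quantity); the symbol dc for the total gradient reaching the cell state is my own shorthand.

```latex
\begin{aligned}
\mathrm{d}c &= \mathrm{d}c_t + \mathrm{d}h_t \odot o \odot \bigl(1 - \tanh^2(c_t)\bigr) \\
\mathrm{d}c_{t-1} &= \mathrm{d}c \odot f \\
\mathrm{d}a_i &= \mathrm{d}c \odot g \odot i \odot (1 - i) \\
\mathrm{d}a_f &= \mathrm{d}c \odot c_{t-1} \odot f \odot (1 - f) \\
\mathrm{d}a_o &= \mathrm{d}h_t \odot \tanh(c_t) \odot o \odot (1 - o) \\
\mathrm{d}a_g &= \mathrm{d}c \odot i \odot (1 - g^2) \\
\mathrm{d}x &= \mathrm{d}a\,W_x^{\top}, \quad
\mathrm{d}h_{t-1} = \mathrm{d}a\,W_h^{\top}, \quad
\mathrm{d}W_x = x^{\top}\mathrm{d}a, \quad
\mathrm{d}W_h = h_{t-1}^{\top}\mathrm{d}a, \quad
\mathrm{d}b = \textstyle\sum_{n} \mathrm{d}a_{n}
\end{aligned}
```

Here da is the concatenation of da_i, da_f, da_o, da_g along the feature axis, in the same order the forward pass splits a.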

Code

def lstm_step_backward(dnext_h, dnext_c, cache):
    """Backward pass for a single timestep of an LSTM.

    Inputs:
    - dnext_h: Gradients of next hidden state, of shape (N, H)
    - dnext_c: Gradients of next cell state, of shape (N, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data, of shape (N, D)
    - dprev_h: Gradient of previous hidden state, of shape (N, H)
    - dprev_c: Gradient of previous cell state, of shape (N, H)
    - dWx: Gradient of input-to-hidden weights, of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weights, of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    dx, dprev_h, dprev_c, dWx, dWh, db = None, None, None, None, None, None
    #############################################################################
    # TODO: Implement the backward pass for a single timestep of an LSTM.       #
    #                                                                           #
    # HINT: For sigmoid and tanh you can compute local derivatives in terms of  #
    # the output value from the nonlinearity.                                   #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    (x, prev_h, prev_c, Wx, Wh, b, a, i, f, o, g, next_c, next_h) = cache

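    # Reuse tanh(next_c); it appears in two of the gradients below
    tanh_next_c = np.tanh(next_c)

    # Total gradient on next_c: the upstream dnext_c plus the path through
    # next_h = o * tanh(next_c). A new array is created instead of using +=,
    # so the caller's dnext_c is not modified in place.
    dc = dnext_c + dnext_h * o * (1 - tanh_next_c ** 2)
    # Gradient of the previous cell state (next_c = f * prev_c + i * g)
    dprev_c = dc * f

    # Gradients of the four gate pre-activations
    dai = dc * g * i * (1 - i)
    daf = dc * prev_c * f * (1 - f)
    dao = dnext_h * tanh_next_c * o * (1 - o)
    dag = dc * i * (1 - g ** 2)
    # Concatenate in the same order used by np.split in the forward pass
    da = np.concatenate((dai, daf, dao, dag), axis=1)

    # Gradient of the input
    dx = da.dot(Wx.T)
    # Gradient of the previous hidden state
    dprev_h = da.dot(Wh.T)
    # Gradient of the input-to-hidden weights
    dWx = x.T.dot(da)
    # Gradient of the hidden-to-hidden weights
    dWh = prev_h.T.dot(da)
    # Gradient of the biases, summed over the minibatch
    db = np.sum(da, axis=0)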

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################

    return dx, dprev_h, dprev_c, dWx, dWh, db
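
A quick way to gain confidence in a backward pass like this is to compare it with central differences. The block below is a self-contained sketch: step_forward and step_backward are compact restatements of the same math under my own names, numerical_grad is a hypothetical helper (the assignment ships its own gradient-check utilities), and the shapes are arbitrary small values.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def step_forward(x, prev_h, prev_c, Wx, Wh, b):
    a = x @ Wx + prev_h @ Wh + b
    ai, af, ao, ag = np.split(a, 4, axis=1)
    i, f, o, g = sigmoid(ai), sigmoid(af), sigmoid(ao), np.tanh(ag)
    next_c = f * prev_c + i * g
    next_h = o * np.tanh(next_c)
    return next_h, next_c, (x, prev_h, prev_c, Wx, Wh, i, f, o, g, next_c)


def step_backward(dnext_h, dnext_c, cache):
    x, prev_h, prev_c, Wx, Wh, i, f, o, g, next_c = cache
    t = np.tanh(next_c)
    dc = dnext_c + dnext_h * o * (1 - t ** 2)
    da = np.concatenate((dc * g * i * (1 - i),
                         dc * prev_c * f * (1 - f),
                         dnext_h * t * o * (1 - o),
                         dc * i * (1 - g ** 2)), axis=1)
    # Same order as the inputs: x, prev_h, prev_c, Wx, Wh, b
    return da @ Wx.T, da @ Wh.T, dc * f, x.T @ da, prev_h.T @ da, da.sum(axis=0)


def numerical_grad(fn, arr, dnext_h, dnext_c, eps=1e-5):
    """Central-difference gradient of sum(next_h * dnext_h + next_c * dnext_c)."""
    grad = np.zeros_like(arr)
    for idx in np.ndindex(*arr.shape):
        old = arr[idx]
        arr[idx] = old + eps
        h_plus, c_plus, _ = fn()
        arr[idx] = old - eps
        h_minus, c_minus, _ = fn()
        arr[idx] = old  # restore the entry before moving on
        grad[idx] = np.sum((h_plus - h_minus) * dnext_h
                           + (c_plus - c_minus) * dnext_c) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
N, D, H = 3, 4, 5
x = rng.standard_normal((N, D))
prev_h = rng.standard_normal((N, H))
prev_c = rng.standard_normal((N, H))
Wx = rng.standard_normal((D, 4 * H))
Wh = rng.standard_normal((H, 4 * H))
b = rng.standard_normal(4 * H)
dnext_h = rng.standard_normal((N, H))
dnext_c = rng.standard_normal((N, H))


def fn():
    return step_forward(x, prev_h, prev_c, Wx, Wh, b)


analytic = step_backward(dnext_h, dnext_c, fn()[2])
params = [x, prev_h, prev_c, Wx, Wh, b]
max_err = max(np.max(np.abs(a - numerical_grad(fn, p, dnext_h, dnext_c)))
              for a, p in zip(analytic, params))
print("max abs difference:", max_err)
```

If the analytic and numerical gradients agree to many decimal places for all six inputs, the per-gate formulas and the concatenation order are very likely right.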

Output

[Image: notebook output for lstm_step_backward]

lstm_forward

Problem statement

[Images: problem statement from the assignment notebook]
Implement the forward pass of the LSTM over an entire sequence.

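Analysis

Nothing much to add: call lstm_step_forward once per timestep and carry the hidden and cell states forward.

Code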

def lstm_forward(x, h0, Wx, Wh, b):
    """Forward pass for an LSTM over an entire sequence of data.
    
    We assume an input sequence composed of T vectors, each of dimension D. The LSTM uses a hidden
    size of H, and we work over a minibatch containing N sequences. After running the LSTM forward,
    we return the hidden states for all timesteps.

    Note that the initial hidden state is passed as input, but the initial cell state is set to zero.
    Also note that the cell state is not returned; it is an internal variable to the LSTM and is not
    accessed from outside.

    Inputs:
    - x: Input data of shape (N, T, D)
    - h0: Initial hidden state of shape (N, H)
    - Wx: Weights for input-to-hidden connections, of shape (D, 4H)
    - Wh: Weights for hidden-to-hidden connections, of shape (H, 4H)
    - b: Biases of shape (4H,)

    Returns a tuple of:
    - h: Hidden states for all timesteps of all sequences, of shape (N, T, H)
    - cache: Values needed for the backward pass.
    """
    h, cache = None, None
    #############################################################################
    # TODO: Implement the forward pass for an LSTM over an entire timeseries.   #
    # You should use the lstm_step_forward function that you just defined.      #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    (N, T, D) = x.shape
    (N, H) = h0.shape

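    # The initial cell state c0 is all zeros
    c = np.zeros((N, H))
    # Hidden states for every timestep
    h = np.zeros((N, T, H))
    # One cache entry per timestep, used by the backward pass
    cache = []

    prev_h = h0
    prev_c = c

    # Step through the sequence
    for t in range(T):
        # Compute the next hidden state and cell state
        next_h, next_c, cache_t = lstm_step_forward(x[:, t, :], prev_h, prev_c, Wx, Wh, b)
        # Carry the states over to the next timestep
        prev_h = next_h
        prev_c = next_c
        # Store the hidden state
        h[:, t, :] = next_h
        # Store the cache
        cache.append(cache_t)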

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################

    return h, cache

Output

[Image: notebook output for lstm_forward]

lstm_backward

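Problem statement

[Image: problem statement from the assignment notebook]

Analysis

If you understood the code above, this one should be straightforward.

Code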

def lstm_backward(dh, cache):
    """Backward pass for an LSTM over an entire sequence of data.

    Inputs:
    - dh: Upstream gradients of hidden states, of shape (N, T, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data of shape (N, T, D)
    - dh0: Gradient of initial hidden state of shape (N, H)
    - dWx: Gradient of input-to-hidden weight matrix of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weight matrix of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    dx, dh0, dWx, dWh, db = None, None, None, None, None
    #############################################################################
    # TODO: Implement the backward pass for an LSTM over an entire timeseries.  #
    # You should use the lstm_step_backward function that you just defined.     #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    (x, prev_h, prev_c, Wx, Wh, b, a, i, f, o, g, next_c, next_h) = cache[0]
    (N, T, H) = dh.shape
    (N, D) = x.shape

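    # Initialize the gradients
    dx = np.zeros((N, T, D))
    dnext_c = np.zeros((N, H))
    dnext_h = np.zeros((N, H))
    dWx = np.zeros((D, 4 * H))
    dWh = np.zeros((H, 4 * H))
    db = np.zeros(4 * H)

    # Backpropagate through time, from the last timestep to the first
    for t in reversed(range(T)):
        # The gradient on h_t is the upstream dh[:, t] plus the gradient
        # flowing back from timestep t + 1
        dnext_h += dh[:, t, :]
        dx[:, t, :], dnext_h, dnext_c, dWx_t, dWh_t, db_t = lstm_step_backward(dnext_h, dnext_c, cache[t])
        # Weights are shared across timesteps, so their gradients accumulate
        dWx += dWx_t
        dWh += dWh_t
        db += db_t

    # After the first timestep, what remains is the gradient of h0
    dh0 = dnext_h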

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################

    return dx, dh0, dWx, dWh, db
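
The loop in lstm_backward does two kinds of accumulation: it adds dh[:, t, :] to the gradient carried back from timestep t + 1, and it sums the per-step weight gradients. Below is a minimal sketch of that same pattern on a toy scalar recurrence (a hypothetical tanh RNN with one shared weight w, not the LSTM itself), checked against a central difference. All names here are my own.

```python
import numpy as np


def forward(x, h0, w):
    """h_t = tanh(w * h_{t-1} + x_t); returns [h0, h_1, ..., h_T]."""
    h = [h0]
    for x_t in x:
        h.append(np.tanh(w * h[-1] + x_t))
    return np.array(h)


def backward(dh, h, w):
    """dh[t] is the upstream gradient on the state produced from x[t]."""
    dnext_h, dw = 0.0, 0.0
    for t in reversed(range(len(dh))):
        # Upstream gradient at this step plus the gradient from the future
        dtotal = dh[t] + dnext_h
        da = dtotal * (1 - h[t + 1] ** 2)  # back through tanh
        dw += da * h[t]                    # shared weight: accumulate
        dnext_h = da * w                   # gradient passed to the previous step
    return dw, dnext_h                     # dnext_h is now the gradient of h0


rng = np.random.default_rng(1)
T = 6
x = rng.standard_normal(T)
dh = rng.standard_normal(T)
h0, w = 0.3, 0.7

h = forward(x, h0, w)
dw, dh0 = backward(dh, h, w)


def loss(w_value, h0_value):
    # Scalar whose gradient with respect to h_t is exactly dh[t]
    return np.sum(dh * forward(x, h0_value, w_value)[1:])


eps = 1e-6
num_dw = (loss(w + eps, h0) - loss(w - eps, h0)) / (2 * eps)
num_dh0 = (loss(w, h0 + eps) - loss(w, h0 - eps)) / (2 * eps)
print(dw, num_dw)
print(dh0, num_dh0)
```

Dropping the `dh[t] +` term or replacing `dw +=` with `dw =` makes the two numbers disagree, which is an easy way to see why both accumulations are needed.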

Output

[Image: notebook output for lstm_backward]

CaptioningRNN.loss

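Analysis

I had already written this part earlier, so I am simply pasting the code here. If you have implemented the vanilla RNN version before, the steps should be easy to follow.

Code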

    def loss(self, features, captions):
        """
        Compute training-time loss for the RNN. We input image features and
        ground-truth captions for those images, and use an RNN (or LSTM) to compute
        loss and gradients on all parameters.

        Inputs:
        - features: Input image features, of shape (N, D)
        - captions: Ground-truth captions; an integer array of shape (N, T + 1) where
          each element is in the range 0 <= y[i, t] < V

        Returns a tuple of:
        - loss: Scalar loss
        - grads: Dictionary of gradients parallel to self.params
        """
        # Cut captions into two pieces: captions_in has everything but the last word
        # and will be input to the RNN; captions_out has everything but the first
        # word and this is what we will expect the RNN to generate. These are offset
        # by one relative to each other because the RNN should produce word (t+1)
        # after receiving word t. The first element of captions_in will be the START
        # token, and the first element of captions_out will be the first word.
        captions_in = captions[:, :-1]
        captions_out = captions[:, 1:]

        # You'll need this
        mask = captions_out != self._null

        # Weight and bias for the affine transform from image features to initial
        # hidden state
        W_proj, b_proj = self.params["W_proj"], self.params["b_proj"]

        # Word embedding matrix
        W_embed = self.params["W_embed"]

        # Input-to-hidden, hidden-to-hidden, and biases for the RNN
        Wx, Wh, b = self.params["Wx"], self.params["Wh"], self.params["b"]

        # Weight and bias for the hidden-to-vocab transformation.
        W_vocab, b_vocab = self.params["W_vocab"], self.params["b_vocab"]

        loss, grads = 0.0, {}
        ############################################################################
        # TODO: Implement the forward and backward passes for the CaptioningRNN.   #
        # In the forward pass you will need to do the following:                   #
        # (1) Use an affine transformation to compute the initial hidden state     #
        #     from the image features. This should produce an array of shape (N, H)#
        # (2) Use a word embedding layer to transform the words in captions_in     #
        #     from indices to vectors, giving an array of shape (N, T, W).         #
        # (3) Use either a vanilla RNN or LSTM (depending on self.cell_type) to    #
        #     process the sequence of input word vectors and produce hidden state  #
        #     vectors for all timesteps, producing an array of shape (N, T, H).    #
        # (4) Use a (temporal) affine transformation to compute scores over the    #
        #     vocabulary at every timestep using the hidden states, giving an      #
        #     array of shape (N, T, V).                                            #
        # (5) Use (temporal) softmax to compute loss using captions_out, ignoring  #
        #     the points where the output word is <NULL> using the mask above.     #
        #                                                                          #
        #                                                                          #
        # Do not worry about regularizing the weights or their gradients!          #
        #                                                                          #
        # In the backward pass you will need to compute the gradient of the loss   #
        # with respect to all model parameters. Use the loss and grads variables   #
        # defined above to store loss and gradients; grads[k] should give the      #
        # gradients for self.params[k].                                            #
        #                                                                          #
        # Note also that you are allowed to make use of functions from layers.py   #
        # in your implementation, if needed.                                       #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # Step 1: affine layer turns the image features into the initial hidden state
        h0, cache_h0 = affine_forward(features, W_proj, b_proj)
        # Step 2: word embedding layer turns the input word indices into word vectors
        word_vector, cache_word_vector = word_embedding_forward(captions_in, W_embed)
        # Step 3: RNN or LSTM turns the word vector sequence into a hidden state sequence
        if self.cell_type == "rnn":
            h, cache_h = rnn_forward(word_vector, h0, Wx, Wh, b)
        elif self.cell_type == "lstm":
            h, cache_h = lstm_forward(word_vector, h0, Wx, Wh, b)
        # Step 4: temporal affine layer turns the hidden states into vocabulary scores
        scores, cache_scores = temporal_affine_forward(h, W_vocab, b_vocab)
        # Step 5: temporal softmax computes the loss, ignoring <NULL> targets via the mask
        loss, dscores = temporal_softmax_loss(scores, captions_out, mask)

        # Backward pass, in reverse order
        # Step 4: backward through the temporal affine layer
        dh, dW_vocab, db_vocab = temporal_affine_backward(dscores, cache_scores)
        # Step 3: backward through the RNN or LSTM
        if self.cell_type == "rnn":
            dword_vector, dh0, dWx, dWh, db = rnn_backward(dh, cache_h)
        elif self.cell_type == "lstm":
            dword_vector, dh0, dWx, dWh, db = lstm_backward(dh, cache_h)
        # Step 2: backward through the word embedding layer
        dW_embed = word_embedding_backward(dword_vector, cache_word_vector)
        # Step 1: backward through the affine layer
        dfeatures, dW_proj, db_proj = affine_backward(dh0, cache_h0)

        # Store the gradients in grads
        grads["W_proj"] = dW_proj
        grads["b_proj"] = db_proj
        grads["W_embed"] = dW_embed
        grads["Wx"] = dWx
        grads["Wh"] = dWh
        grads["b"] = db
        grads["W_vocab"] = dW_vocab
        grads["b_vocab"] = db_vocab

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        return loss, grads
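
Step 5 is where the mask matters: positions whose target is the <NULL> token should contribute neither to the loss nor to the gradient. The block below is a minimal sketch of a masked per-timestep softmax loss that illustrates this. The function name masked_softmax_loss and the choice to sum over time and average over the minibatch are my assumptions, so treat it as an illustration rather than the assignment's temporal_softmax_loss.

```python
import numpy as np


def masked_softmax_loss(scores, targets, mask):
    """scores: (N, T, V) floats; targets: (N, T) ints; mask: (N, T) bools.

    Returns the loss (summed over unmasked positions, divided by N)
    and the gradient with respect to scores.
    """
    N, T, V = scores.shape
    flat = scores.reshape(N * T, V)
    y = targets.reshape(N * T)
    m = mask.reshape(N * T).astype(float)

    # Numerically stable softmax over the vocabulary axis
    shifted = flat - flat.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    rows = np.arange(N * T)
    loss = -np.sum(m * np.log(probs[rows, y])) / N

    dflat = probs.copy()
    dflat[rows, y] -= 1
    dflat *= m[:, None] / N  # masked positions get zero gradient
    return loss, dflat.reshape(N, T, V)


rng = np.random.default_rng(0)
scores = rng.standard_normal((2, 3, 5))
targets = rng.integers(0, 5, size=(2, 3))
mask = np.array([[True, True, False],
                 [True, False, False]])

loss, dscores = masked_softmax_loss(scores, targets, mask)

# Changing the scores at masked positions should leave the loss unchanged
perturbed = scores.copy()
perturbed[~mask] = rng.standard_normal((3, 5))
loss_perturbed, _ = masked_softmax_loss(perturbed, targets, mask)
print(loss, loss_perturbed)
```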

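Output

[Image: notebook output for CaptioningRNN.loss]

Final output

[Images: final notebook output]

Closing remarks

Working through cs231n gave us a basic understanding of deep learning as a whole, though the course is still an introductory treatment, and going further will take continued study. The assignments were all fun to do. I now have a first impression of how RNNs work, but some parts are still fuzzy and I have not fully digested them.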
