Derivatives and Differentiation

2025/4/5 7:44:32


  • 1. Derivatives and Differentiation
    • 1.1. Visualization Utilities
  • 2. Chain Rule
  • 3. Discussion
  • References

For a long time, how to calculate the area of a circle remained a mystery. Then, in Ancient Greece, the mathematician Archimedes came up with the clever idea to inscribe a series of polygons with increasing numbers of vertices on the inside of a circle.

inscribe /ɪnˈskraɪb/ vt. to write an inscription; to dedicate; to fix in memory; to carve

For a polygon with $n$ vertices, we obtain $n$ triangles. The height of each triangle approaches the radius $r$ as we partition the circle more finely. At the same time, its base approaches $2 \pi r/n$, since the ratio between arc and secant approaches 1 for a large number of vertices. Thus, the area of the polygon approaches $\frac{1}{2} \cdot (2 \pi r/n) \cdot r \cdot n = \pi r^2$.

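arc /ɑːk/ n. arc (of a curve); arc-shaped object; vault of the sky; electric arc. adj. of a circular arc; of inverse trigonometric functions. vt. to move in an arc; to form an electric arc
secant /'siːk(ə)nt/ adj. cutting; intersecting. n. secant line; secant (trigonometric function)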

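The ancient Greeks computed the area of a polygon by dividing it into triangles and summing their areas. To find the area of a circle, they inscribed polygons in the circle. The more sides of equal length an inscribed polygon has, the more closely it approximates the circle. This process is also known as the method of exhaustion.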

Fig. 1 Finding the area of a circle as a limit procedure.
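The limit described above can be watched numerically. The following is a minimal sketch, not part of the original notes: it assumes regular inscribed polygons and uses the exact triangle area $\frac{1}{2} r^2 \sin(2\pi/n)$ rather than the base-times-height approximation. The function name `inscribed_polygon_area` is hypothetical.

```python
import math


def inscribed_polygon_area(n, r=1.0):
    """Area of a regular n-gon inscribed in a circle of radius r.

    The polygon splits into n isosceles triangles whose two equal sides
    have length r and enclose the central angle 2*pi/n, so each triangle
    has area (1/2) * r**2 * sin(2*pi/n).
    """
    return 0.5 * n * r ** 2 * math.sin(2 * math.pi / n)


# Quadruple the number of vertices each time and compare with pi * r^2.
for n in (6, 24, 96, 384, 1536):
    area = inscribed_polygon_area(n)
    print(f'n={n:5d}, polygon area={area:.6f}, gap to pi={math.pi - area:.2e}')
```

For the unit circle, the polygon area stays below $\pi$ and the gap shrinks as $n$ grows, which is the behavior the limiting argument predicts.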

This limiting procedure is at the root of both differential calculus and integral calculus.
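Differentiation and integration are the two branches of calculus; differentiation can be applied to optimization problems in deep learning.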

calculus /'kælkjʊləs/ n. calculus (mathematics); calculus (a stone formed in the body)
integral calculus: the study of integration
differential calculus: the study of differentiation

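In deep learning, we train models and update them successively so that they get better and better as they see more and more data. Usually, getting better means minimizing a loss function, a score that answers the question "how bad is our model?" What we really care about is producing a model that performs well on data that it has never seen before. But we can only fit the model to data that we can actually see. Thus, we can decompose the task of fitting models into two key concerns:

  • Optimization: the process of fitting our models to observed data.
  • Generalization: producing models whose validity extends beyond the dataset used to train them.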

1. Derivatives and Differentiation

Put simply, a derivative is the rate of change in a function with respect to changes in its arguments. Derivatives can tell us how rapidly a loss function would increase or decrease were we to increase or decrease each parameter by an infinitesimally small amount.
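In deep learning, we typically choose loss functions that are differentiable with respect to the model's parameters, so that this rate of change is available for every parameter.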

Formally, for functions $f: \mathbb{R} \rightarrow \mathbb{R}$ that map from scalars to scalars, the derivative of $f$ at a point $x$ is defined as
$$f'(x) = \lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{h}.$$

The term on the right-hand side is called a limit, and it tells us what happens to the value of an expression as a specified variable approaches a particular value. Here, the limit tells us what the ratio of the change in the function value $f(x + h) - f(x)$ to the perturbation $h$ converges to as we shrink the perturbation to zero.

perturbation /ˌpɜːtə'beɪʃ(ə)n/ n. anxiety; unease; distress; (physics, mathematics) a small disturbance or variation

When $f'(x)$ exists, $f$ is said to be differentiable at $x$; and when $f'(x)$ exists for all $x$ on a set, e.g., the interval $[a,b]$, we say that $f$ is differentiable on this set.
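To see what it means for the limit not to exist, consider the absolute value function at $x = 0$. This example is not from the original notes; it is a small sketch in which the helper name `difference_quotient` is hypothetical. The quotient settles at different values depending on whether $h$ approaches $0$ from the right or from the left, so no single limit exists.

```python
def difference_quotient(f, x, h):
    """Return (f(x + h) - f(x)) / h for a nonzero step h."""
    return (f(x + h) - f(x)) / h


# Approach 0 from both sides: the two one-sided quotients never agree.
for h in (0.1, 0.001, 0.00001):
    right = difference_quotient(abs, 0.0, h)   # step h > 0
    left = difference_quotient(abs, 0.0, -h)   # step h < 0
    print(f'h={h:.5f}, from the right={right:.1f}, from the left={left:.1f}')
```

Because the one-sided values disagree, $|x|$ is not differentiable at $0$, although it is differentiable at every other point.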

Not all functions are differentiable, and this includes many that we wish to optimize, such as accuracy and the area under the receiver operating characteristic curve (AUC). However, because computing the derivative of the loss is a crucial step in nearly all algorithms for training deep neural networks, we often optimize a differentiable surrogate instead.
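The following sketch illustrates why a surrogate helps. Everything in it is an assumption made for illustration: the toy data, the threshold classifier "predict 1 when x > t", and the choice of mean cross-entropy on `sigmoid(x - t)` as the surrogate. Accuracy is piecewise constant in the threshold `t`, so its difference quotient is zero almost everywhere, whereas the surrogate has a nonzero slope that indicates which way to move `t`.

```python
import math

# Hypothetical toy data: 1-D inputs with binary labels.
xs = [-2.0, -1.0, -0.5, 0.5, 1.5, 3.0]
ys = [0, 0, 1, 0, 1, 1]


def accuracy(t):
    """Fraction of points classified correctly by the rule 'x > t'."""
    return sum((x > t) == bool(y) for x, y in zip(xs, ys)) / len(xs)


def log_loss(t):
    """Mean cross-entropy of the probabilities sigmoid(x - t)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(x - t)))
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def numerical_lim(f, x, h):
    """Difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


h = 1e-4
print(f'accuracy slope at t=0: {numerical_lim(accuracy, 0.0, h):.5f}')
print(f'log-loss slope at t=0: {numerical_lim(log_loss, 0.0, h):.5f}')
```

The accuracy slope is exactly zero here because no data point lies between `t` and `t + h`, so a gradient-based method would receive no signal from it.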

surrogate /ˈsʌrəɡət/ adj. substitute; acting as a proxy. n. deputy; bishop's deputy; probate judge. v. to replace, to substitute for; to appoint (someone) as one's deputy

We can interpret the derivative $f'(x)$ as the instantaneous rate of change of $f(x)$ with respect to $x$. This instantaneous rate of change is based on the change $h$ in $x$ as $h$ approaches $0$.

Let's develop some intuition with an example. Define $u = f(x) = 3x^2-4x$.

Setting $x=1$, we see that $\frac{f(x+h) - f(x)}{h}$ approaches $2$ as $h$ approaches $0$. While this experiment lacks the rigor of a mathematical proof, we can quickly see that indeed $u' = f'(1) = 2$.

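rigor /ˈrɪɡə/ n. strictness, severity; thoroughness, exactness; harshness; hardship; (medicine) chills preceding a fever; rigidity of the body (caused by shock or poisoning)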
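#!/usr/bin/env python
# coding=utf-8

def f(x):
    """Example function u = f(x) = 3x^2 - 4x."""
    return 3 * (x ** 2) - 4 * x


def numerical_lim(f, x, h):
    """Difference quotient (f(x + h) - f(x)) / h, which approximates f'(x)."""
    return (f(x + h) - f(x)) / h


# Shrink h by a factor of 10 on each pass and watch the quotient approach 2.
h = 0.1
for i in range(5):
    print(f'h={h:.5f}, numerical limit={numerical_lim(f, 1, h):.5f}')
    h *= 0.1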

h=0.10000, numerical limit=2.30000
h=0.01000, numerical limit=2.03000
h=0.00100, numerical limit=2.00300
h=0.00010, numerical limit=2.00030
h=0.00001, numerical limit=2.00003

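The pattern in the printed values can be explained by expanding the difference quotient for $f(x) = 3x^2 - 4x$ at $x = 1$ by hand (a short worked derivation, added here for illustration):

$$\frac{f(1+h) - f(1)}{h} = \frac{3(1+h)^2 - 4(1+h) - (3 - 4)}{h} = \frac{2h + 3h^2}{h} = 2 + 3h.$$

The quotient therefore exceeds $2$ by exactly $3h$, which matches the outputs $2.3$, $2.03$, $2.003$, and so on, and it tends to $2$ as $h$ approaches $0$.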

There are several equivalent notational conventions for derivatives. Given $y = f(x)$, the following expressions are equivalent:

$$f'(x) = y' = \frac{dy}{dx} = \frac{df}{dx} = \frac{d}{dx} f(x) = Df(x) = D_x f(x),$$

where the symbols $\frac{d}{dx}$ and $D$ are differentiation operators.

Below, we present the derivatives of some common functions:

$$\begin{aligned} \frac{d}{dx} C & = 0 && \textrm{for any constant $C$} \\ \frac{d}{dx} x^n & = n x^{n-1} && \textrm{for } n \neq 0 \\ \frac{d}{dx} e^x & = e^x \\ \frac{d}{dx} \ln x & = x^{-1}. \end{aligned}$$

  • $DC = 0$ ($C$ is a constant)
  • $Dx^n = nx^{n-1}$ ($n$ is any real number with $n \neq 0$)
  • $De^x = e^x$
  • $D\ln(x) = 1/x$
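The formulas above can be sanity-checked numerically. The sketch below is an addition for illustration: it uses a symmetric (central) difference quotient, which is a variation on the one-sided quotient used earlier, and the evaluation point `x = 1.5` and the helper name `central_difference` are arbitrary choices.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 1.5
# Each entry: (label, function, derivative predicted by the formula at x).
checks = [
    ('C (C = 7)', lambda t: 7.0, 0.0),
    ('x^3', lambda t: t ** 3, 3 * x ** 2),
    ('e^x', math.exp, math.exp(x)),
    ('ln x', math.log, 1 / x),
]
for name, func, expected in checks:
    approx = central_difference(func, x)
    print(f'{name:10s} numerical={approx:.6f}, formula={expected:.6f}')
```

Agreement to several decimal places is evidence that the formulas were transcribed correctly, though it is not a proof.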

Functions composed from differentiable functions are often themselves differentiable. The following rules come in handy for working with compositions of any differentiable functions $f$ and $g$, and constant $C$.

$$\begin{aligned} \frac{d}{dx} [C f(x)] & = C \frac{d}{dx} f(x) && \textrm{Constant multiple rule} \\ \frac{d}{dx} [f(x) + g(x)] & = \frac{d}{dx} f(x) + \frac{d}{dx} g(x) && \textrm{Sum rule} \\ \frac{d}{dx} [f(x) g(x)] & = f(x) \frac{d}{dx} g(x) + g(x) \frac{d}{dx} f(x) && \textrm{Product rule} \\ \frac{d}{dx} \frac{f(x)}{g(x)} & = \frac{g(x) \frac{d}{dx} f(x) - f(x) \frac{d}{dx} g(x)}{g^2(x)} && \textrm{Quotient rule} \end{aligned}$$
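As a quick check of the product and quotient rules, the sketch below compares them with a numerical derivative. The functions $f(x) = x^2$ and $g(x) = e^x$, the point `x = 0.7`, and the helper `central_difference` are illustrative choices, not taken from the original notes.

```python
import math


def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient (func(x + h) - func(x - h)) / (2h)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Illustrative choices with known derivatives: f(x) = x^2 and g(x) = e^x.
f, df = (lambda x: x ** 2), (lambda x: 2 * x)
g, dg = math.exp, math.exp

x = 0.7
# Right-hand sides of the product rule and the quotient rule.
product_rule = f(x) * dg(x) + g(x) * df(x)
quotient_rule = (g(x) * df(x) - f(x) * dg(x)) / g(x) ** 2

# Numerical derivatives of the product f*g and the quotient f/g.
product_numeric = central_difference(lambda t: f(t) * g(t), x)
quotient_numeric = central_difference(lambda t: f(t) / g(t), x)

print(f'product:  rule={product_rule:.6f}, numerical={product_numeric:.6f}')
print(f'quotient: rule={quotient_rule:.6f}, numerical={quotient_numeric:.6f}')
```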

Using these rules, we can find the derivative of $3x^2 - 4x$ via
$$\frac{d}{dx} [3 x^2 - 4x] = 3 \frac{d}{dx} x^2 - 4 \frac{d}{dx} x = 6x - 4.$$

Plugging in $x = 1$ shows that, indeed, the derivative equals $2$ at this location, in agreement with the numerical experiment above. Note that derivatives tell us the slope of a function at a particular location: at $x = 1$, this derivative is the slope of the tangent line to the curve $u = f(x)$.

1.1. Visualization Utilities

We can visualize the slopes of functions using the matplotlib library.

#!/usr/bin/env python
# coding=utf-8

import matplotlib
import numpy as np
from matplotlib import pyplot as plt

print(matplotlib.__version__)


def f(x):
    return 3 * (x ** 2) - 4 * x


def set_figsize(figsize=(3.5, 2.5)):
    """Set the figure size for matplotlib."""
    plt.rcParams['figure.figsize'] = figsize


def set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend):
    """Set the axes for matplotlib."""
    axes.set_xlabel(xlabel), axes.set_ylabel(ylabel)
    axes.set_xscale(xscale), axes.set_yscale(yscale)
    axes.set_xlim(xlim), axes.set_ylim(ylim)
    if legend:
        axes.legend(legend)
    axes.grid()


def plot(X, Y=None, xlabel=None, ylabel=None, legend=None, xlim=None,
         ylim=None, xscale='linear', yscale='linear',
         fmts=('-', 'm--', 'g-.', 'r:'), figsize=(3.5, 2.5), axes=None):
    """Plot data points."""

    def has_one_axis(X):  # True if X (tensor or list) has 1 axis
        return (hasattr(X, "ndim") and X.ndim == 1 or isinstance(X, list)
                and not hasattr(X[0], "__len__"))

    if has_one_axis(X): X = [X]
    if Y is None:
        X, Y = [[]] * len(X), X
    elif has_one_axis(Y):
        Y = [Y]
    if len(X) != len(Y):
        X = X * len(Y)

    set_figsize(figsize)
    if axes is None:
        axes = plt.gca()
    axes.cla()
    for x, y, fmt in zip(X, Y, fmts):
        if len(x):
            axes.plot(x, y, fmt)
        else:
            axes.plot(y, fmt)
    set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend)
    plt.show()


x = np.arange(0, 3, 0.1)
plot(x, [f(x), 2 * x - 3], 'x', 'f(x)', legend=['f(x)', 'Tangent line (x=1)'])

Conveniently, we can set figure sizes with set_figsize.

The set_axes function sets the properties of the matplotlib axes, including labels, ranges, and scales.

With these two functions, we can define a plot function to overlay multiple curves concisely. Much of the code here is just ensuring that the sizes and shapes of inputs match.

Now we can plot the function $u = f(x)$ and its tangent line $y = 2x - 3$ at $x=1$, where the coefficient $2$ is the slope of the tangent line.
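The tangent line $y = 2x - 3$ follows from the point-slope form $y = f(x_0) + f'(x_0)(x - x_0)$ with $x_0 = 1$. A minimal sketch of that computation is shown here; the helper names `df` and `tangent_line` are hypothetical and not part of the plotting code.

```python
def f(x):
    """The running example u = f(x) = 3x^2 - 4x."""
    return 3 * x ** 2 - 4 * x


def df(x):
    """Its derivative f'(x) = 6x - 4, obtained from the rules above."""
    return 6 * x - 4


def tangent_line(x0):
    """Return (slope, intercept) of the tangent to f at x0.

    From y = f(x0) + f'(x0) * (x - x0), the slope is f'(x0) and the
    intercept is f(x0) - f'(x0) * x0.
    """
    slope = df(x0)
    intercept = f(x0) - slope * x0
    return slope, intercept


slope, intercept = tangent_line(1)
print(f'slope={slope}, intercept={intercept}')
```

At $x_0 = 1$ this gives slope $2$ and intercept $-3$, which is the line plotted alongside $f(x)$.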


2. Chain Rule

In deep learning, the gradients of concern are often difficult to calculate because we are working with deeply nested functions (of functions (of functions...)). Such functions are composite, so the rules above are hard to apply to them directly. Fortunately, the chain rule lets us differentiate composite functions.

Returning to functions of a single variable, suppose that $y = f(g(x))$ and that the underlying functions $y=f(u)$ and $u=g(x)$ are both differentiable.

The chain rule states that
$$\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}.$$
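A small numerical check of the single-variable chain rule is sketched below. The inner function reuses $u = 3x^2 - 4x$ from the earlier example, while the outer function $y = u^3$ and the point `x = 2.0` are assumptions chosen for illustration.

```python
def g(x):
    """Inner function u = g(x) = 3x^2 - 4x."""
    return 3 * x ** 2 - 4 * x


def f(u):
    """Outer function y = f(u) = u^3 (illustrative choice)."""
    return u ** 3


def chain_rule_derivative(x):
    """dy/dx computed as (dy/du) * (du/dx)."""
    u = g(x)
    dy_du = 3 * u ** 2    # derivative of u^3 with respect to u
    du_dx = 6 * x - 4     # derivative of 3x^2 - 4x with respect to x
    return dy_du * du_dx


def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient (func(x + h) - func(x - h)) / (2h)."""
    return (func(x + h) - func(x - h)) / (2 * h)


x = 2.0
analytic = chain_rule_derivative(x)
numeric = central_difference(lambda t: f(g(t)), x)
print(f'chain rule={analytic:.6f}, numerical={numeric:.6f}')
```

At $x = 2$ we have $u = 4$, so the chain rule gives $3 \cdot 4^2 \cdot 8 = 384$.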

Turning back to multivariate functions, suppose that $y = f(\mathbf{u})$ has variables $u_1, u_2, \ldots, u_m$, where each $u_i = g_i(\mathbf{x})$ has variables $x_1, x_2, \ldots, x_n$, i.e., $\mathbf{u} = g(\mathbf{x})$. Here $y$ and each $u_i$ are assumed to be differentiable, and $y$ is thereby a function of $x_1, x_2, \ldots, x_n$.

Then the chain rule states that

$$\frac{\partial y}{\partial x_{i}} = \frac{\partial y}{\partial u_{1}} \frac{\partial u_{1}}{\partial x_{i}} + \frac{\partial y}{\partial u_{2}} \frac{\partial u_{2}}{\partial x_{i}} + \ldots + \frac{\partial y}{\partial u_{m}} \frac{\partial u_{m}}{\partial x_{i}} \ \textrm{ and so } \ \nabla_{\mathbf{x}} y = \mathbf{A} \nabla_{\mathbf{u}} y,$$

where $\mathbf{A} \in \mathbb{R}^{n \times m}$ is a matrix that contains the derivative of vector $\mathbf{u}$ with respect to vector $\mathbf{x}$, with entries $A_{ij} = \partial u_j / \partial x_i$. Thus, evaluating the gradient requires computing a matrix-vector product. This is one of the key reasons why linear algebra is such an integral building block of deep learning systems.
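The multivariate chain rule can be made concrete with a tiny example in which $n = 2$ and $m = 3$. The functions below are hypothetical and chosen only for illustration; the matrix follows the convention $A_{ij} = \partial u_j / \partial x_i$, so that $\nabla_{\mathbf{x}} y = \mathbf{A} \nabla_{\mathbf{u}} y$, and the result is compared against a numerical gradient.

```python
def g(x):
    """Inner map u = g(x) from R^2 to R^3 (illustrative choice)."""
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]


def f(u):
    """Outer scalar function y = f(u) = u1 * u2 + u3 (illustrative choice)."""
    u1, u2, u3 = u
    return u1 * u2 + u3


def grad_u(u):
    """Gradient of y with respect to u: [u2, u1, 1]."""
    u1, u2, u3 = u
    return [u2, u1, 1.0]


def matrix_a(x):
    """A[i][j] = d u_j / d x_i, an n x m = 2 x 3 matrix."""
    x1, x2 = x
    return [[x2, 1.0, 2 * x1],
            [x1, 1.0, 0.0]]


def grad_x(x):
    """Chain rule: grad_x y = A grad_u y (a matrix-vector product)."""
    a, gu = matrix_a(x), grad_u(g(x))
    return [sum(a_ij * gu_j for a_ij, gu_j in zip(row, gu)) for row in a]


def numerical_grad(x, h=1e-6):
    """Central-difference approximation of the gradient of f(g(x))."""
    grads = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        grads.append((f(g(plus)) - f(g(minus))) / (2 * h))
    return grads


x = [1.0, 2.0]
print('chain rule:', grad_x(x))
print('numerical: ', numerical_grad(x))
```

Each component of `grad_x` is the sum over $j$ of $\frac{\partial y}{\partial u_j} \frac{\partial u_j}{\partial x_i}$, which is exactly the summation written out in the formula above.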

3. Discussion

First, the composition rules for differentiation can be applied routinely, enabling us to compute gradients automatically. This task requires no creativity and thus we can focus our cognitive powers elsewhere.

Second, computing the derivatives of vector-valued functions requires us to multiply matrices as we trace the dependency graph of variables from output to input. In particular, this graph is traversed in a forward direction when we evaluate a function and in a backwards direction when we compute gradients. Later chapters will formally introduce backpropagation, a computational procedure for applying the chain rule.

From the viewpoint of optimization, gradients allow us to determine how to move the parameters of a model in order to lower the loss, and each step of the optimization algorithms used throughout this book will require calculating the gradient.
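As one concrete instance of using gradients to lower a loss, the sketch below runs plain gradient descent on the running example $f(x) = 3x^2 - 4x$, treating it as a stand-in for a loss function. The update rule, the learning rate of `0.1`, the starting point, and the step count are all assumptions made for illustration; the text above does not prescribe a particular algorithm.

```python
def f(x):
    """Stand-in loss: f(x) = 3x^2 - 4x, minimized at x = 2/3."""
    return 3 * x ** 2 - 4 * x


def df(x):
    """Gradient of the stand-in loss: f'(x) = 6x - 4."""
    return 6 * x - 4


def gradient_descent(x, learning_rate=0.1, steps=30):
    """Repeatedly move x a small step against the gradient."""
    for _ in range(steps):
        x -= learning_rate * df(x)
    return x


x_start = 3.0
x_min = gradient_descent(x_start)
print(f'start: x={x_start:.6f}, f(x)={f(x_start):.6f}')
print(f'end:   x={x_min:.6f}, f(x)={f(x_min):.6f}')
```

Each step moves `x` in the direction in which the function decreases, so the iterate approaches the minimizer $x = 2/3$, where $f'(x) = 0$.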

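  • The derivative can be interpreted as the instantaneous rate of change of a function with respect to its variable; it is also the slope of the tangent line to the function's curve.
  • The gradient is a vector whose components are the partial derivatives of a multivariate function with respect to all of its variables.
  • The chain rule can be used to differentiate composite functions.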

References

[1] Yongqiang Cheng, https://yongqiang.blog.csdn.net/
