Reinforcement Learning with Code 【Chapter 9. Policy Gradient Methods】

news2025/1/24 2:11:47

Reinforcement Learning with Code

This note records how the author began to learn RL. Both theoretical understanding and code practice are presented. Much of the material is drawn from references such as Zhao Shiyu's Mathematical Foundations of Reinforcement Learning.

Table of contents

  • Reinforcement Learning with Code
    • Chapter 9. Policy Gradient Methods
      • 9.1 Basic idea of policy gradient
      • 9.2 Metrics to define optimal policies
      • 9.3 Gradients of the metrics
      • 9.4 Policy gradient by Monte Carlo estimation: REINFORCE
    • Reference

Chapter 9. Policy Gradient Methods

The idea of function approximation can be applied to represent not only state/action values but also policies. Up to now in this book, policies have been represented by tables: the action probabilities of all states are stored in a table $\pi(a|s)$, each entry of which is indexed by a state and an action. In this chapter, we show that policies can be represented by parameterized functions denoted as $\pi(a|s,\theta)$, where $\theta\in\mathbb{R}^m$ is a parameter vector. The function representation is also sometimes written as $\textcolor{blue}{\pi(a,s,\theta)}$, $\textcolor{blue}{\pi_\theta(a|s)}$, or $\textcolor{blue}{\pi_\theta(a,s)}$.
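As a concrete illustration of a parameterized policy, here is a minimal sketch that assumes a tabular softmax parameterization, with one preference `theta[s][a]` per state-action pair. The softmax form, the function name `softmax_policy`, and the example numbers are assumptions made for illustration; the text above does not fix any particular function structure.

```python
import math

def softmax_policy(theta, s):
    """Return [pi(a|s, theta) for every action a] for tabular preferences theta[s][a]."""
    prefs = theta[s]
    m = max(prefs)                            # subtract the max for numerical stability
    exps = [math.exp(h - m) for h in prefs]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical example: 2 states, 3 actions, one parameter per (state, action).
theta = [[0.0, 0.0, 0.0],
         [1.0, 0.0, -1.0]]

probs_s0 = softmax_policy(theta, 0)   # equal preferences give a uniform distribution
probs_s1 = softmax_policy(theta, 1)   # larger preference gives larger probability
print(probs_s0, probs_s1)
```

Changing an entry of `theta` changes the action probabilities smoothly, which is what makes gradient-based updates of the policy possible.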

When policies are represented as functions, optimal policies can be found by optimizing certain scalar metrics. Methods of this kind are called policy gradient methods.

9.1 Basic idea of policy gradient

How to define optimal policies? When represented as a table, a policy $\pi$ is defined as optimal if it maximizes every state value. When represented by a function, a policy $\pi$ is fully determined by $\theta$ together with the function structure. The policy is defined as optimal if it maximizes certain scalar metrics, which we will introduce later.

How to update policies? When represented as a table, a policy $\pi$ can be updated by directly changing the entries in the table. However, when represented by a parameterized function, a policy $\pi$ can no longer be updated in this way. Instead, it can only be improved by updating the parameter $\theta$. We can use gradient-based methods to optimize some metric and thereby update $\theta$.

9.2 Metrics to define optimal policies

The first metric is the average state value, or simply the average value. Let

$$
\begin{aligned}
v_\pi &= [\cdots, v_\pi(s), \cdots]^T \in \mathbb{R}^{|\mathcal{S}|} \\
d_\pi &= [\cdots, d_\pi(s), \cdots]^T \in \mathbb{R}^{|\mathcal{S}|}
\end{aligned}
$$

be the vector of state values and a vector of state weights (a probability distribution over the states), respectively. Here, $d_\pi(s)\ge 0$ is the weight for state $s$ and satisfies $\sum_s d_\pi(s)=1$. The metric of average value is defined as

$$
\begin{aligned}
\textcolor{red}{\bar{v}_\pi} & \textcolor{red}{\triangleq d_\pi^T v_\pi} \\
& \textcolor{red}{= \sum_s d_\pi(s)v_\pi(s)} \\
& \textcolor{red}{= \mathbb{E}[v_\pi(S)]}
\end{aligned}
$$

where $S \sim d_\pi$. As its name suggests, $\bar{v}_\pi$ is simply a weighted average of the state values. Here $d_\pi$ is the stationary distribution under $\pi$, which is obtained by solving the equation

$$
d^T_\pi P_\pi = d^T_\pi
$$

where $P_\pi$ is the state transition probability matrix.
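The stationary distribution and the average value can be computed numerically for a small example. The sketch below assumes a hypothetical two-state chain (`P_pi`, `r_pi`, and `gamma` are made-up numbers) and assumes the chain is ergodic, so that repeatedly applying $d^T \leftarrow d^T P_\pi$ converges to $d_\pi$; the state values come from iterating the Bellman equation $v_\pi = r_\pi + \gamma P_\pi v_\pi$.

```python
def vec_mat(d, P):
    """Row vector times matrix: returns d^T P."""
    n = len(P)
    return [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary_distribution(P, iters=500):
    """Power iteration for d^T P = d^T (assumes an ergodic chain)."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        d = vec_mat(d, P)
    return d

def state_values(P, r, gamma, iters=2000):
    """Iterate v <- r + gamma * P v until (numerical) convergence."""
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[i] + gamma * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v

# Hypothetical transition matrix and one-step rewards under some fixed policy.
P_pi = [[0.9, 0.1],
        [0.5, 0.5]]
r_pi = [1.0, 0.0]
gamma = 0.9

d_pi = stationary_distribution(P_pi)
v_pi = state_values(P_pi, r_pi, gamma)
v_bar = sum(d * v for d, v in zip(d_pi, v_pi))   # v_bar = d_pi^T v_pi
print(d_pi, v_pi, v_bar)
```

For this particular chain the balance equation $0.1\,d_\pi(s_1) = 0.5\,d_\pi(s_2)$ gives $d_\pi = (5/6, 1/6)$, which the iteration reproduces.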

The second metric is the average one-step reward, or simply the average reward. Let

$$
r_\pi = [\cdots, r_\pi(s),\cdots]^T \in \mathbb{R}^{|\mathcal{S}|}
$$

be the vector of one-step immediate rewards. Here

$$
r_\pi(s) = \sum_a \pi(a|s)r(s,a)
$$

is the average one-step immediate reward obtained starting from state $s$, and $r(s,a)=\mathbb{E}[R|s,a]=\sum_r r\,p(r|s,a)$ is the average one-step immediate reward obtained after taking action $a$ at state $s$. Then the metric is defined as

$$
\begin{aligned}
\textcolor{red}{\bar{r}_\pi} & \textcolor{red}{\triangleq d_\pi^T r_\pi} \\
& \textcolor{red}{= \sum_s d_\pi(s)\sum_a \pi(a|s) \sum_r r\,p(r|s,a)} \\
& \textcolor{red}{= \sum_s d_\pi(s)\sum_a \pi(a|s)r(s,a)} \\
& \textcolor{red}{= \sum_s d_\pi(s)r_\pi(s)} \\
& \textcolor{red}{= \mathbb{E}[r_\pi(S)]}
\end{aligned}
$$

where $S\sim d_\pi$. As its name suggests, $\bar{r}_\pi$ is simply a weighted average of the one-step immediate rewards.

The third metric is the state value of a specific starting state, $v_\pi(s_0)$. For some tasks, we can only start from a specific state $s_0$. In this case, we only care about the long-term return starting from $s_0$. This metric can also be viewed as a weighted average of the state values:

$$
\textcolor{red}{v_\pi(s_0) = \sum_{s\in\mathcal{S}} d_0(s) v_\pi(s)}
$$

where $d_0(s=s_0)=1$ and $d_0(s\ne s_0)=0$.

We aim to search over values of the parameter $\theta$ to maximize these metrics.

9.3 Gradients of the metrics

Theorem 9.1 (Policy gradient theorem). The gradient of the average reward metric $\bar{r}_\pi$ is

$$
\textcolor{blue}{\nabla_\theta \bar{r}_\pi(\theta) \simeq \sum_s d_\pi(s)\sum_a \nabla_\theta \pi(a|s,\theta) q_\pi(s,a)}
$$

where $\nabla_\theta \pi$ is the gradient of $\pi$ with respect to $\theta$. Here $\simeq$ refers to either strict equality or approximate equality. In particular, it is a strict equality in the undiscounted case where $\gamma=1$ and an approximate equality in the discounted case where $0<\gamma<1$. In the discounted case, the approximation is more accurate when $\gamma$ is closer to $1$. Moreover, the equation has a more compact and useful form expressed in terms of an expectation:

$$
\textcolor{red}{\nabla_\theta \bar{r}_\pi(\theta) \simeq \mathbb{E} [\nabla_\theta \ln \pi(A|S,\theta)q_\pi(S,A)]}
$$

where $\ln$ is the natural logarithm and $S\sim d_\pi$, $A\sim \pi(S,\theta)$.

Why are the two equations above equivalent? The derivation is as follows.

$$
\begin{aligned}
\nabla_\theta \bar{r}_\pi(\theta) & \simeq \sum_s d_\pi(s)\sum_a \nabla_\theta \pi(a|s,\theta) q_\pi(s,a) \\
& = \mathbb{E}\Big[ \sum_a \nabla_\theta \pi(a|S,\theta) q_\pi(S,a) \Big]
\end{aligned}
$$

where $S \sim d_\pi$. Furthermore, consider the function $\ln\pi$, where $\ln$ is the natural logarithm:

$$
\begin{aligned}
\nabla_\theta \ln \pi (a|s,\theta) & = \frac{\nabla_\theta \pi(a|s,\theta)}{\pi(a|s,\theta)} \\
\to \nabla_\theta \pi(a|s,\theta) &= \pi(a|s,\theta) \nabla_\theta \ln \pi (a|s,\theta)
\end{aligned}
$$

Substituting this into the expectation gives

$$
\begin{aligned}
\nabla_\theta \bar{r}_\pi(\theta) & \simeq \mathbb{E}\Big[ \sum_a \nabla_\theta \pi(a|S,\theta) q_\pi(S,a) \Big] \\
& = \mathbb{E}\Big[ \sum_a \pi(a|S,\theta) \nabla_\theta \ln \pi (a|S,\theta) q_\pi(S,a) \Big] \\
& = \mathbb{E}\Big[ \nabla_\theta \ln \pi (A|S,\theta) q_\pi(S,A) \Big]
\end{aligned}
$$

where $A \sim \pi(S,\theta)$. The last step rewrites the $\pi$-weighted sum over actions as an expectation over the random action $A$.
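The equivalence of the sum form and the expectation form can be checked numerically at a single state. The following sketch assumes a softmax policy over three actions with preferences `h` and made-up action values `q` (both hypothetical). It compares $\sum_a \nabla\pi(a)\,q(a)$ with $\sum_a \pi(a)\,\nabla\ln\pi(a)\,q(a)$, and then with a sample average over actions drawn from $\pi$.

```python
import math
import random

def softmax(h):
    m = max(h)
    e = [math.exp(x - m) for x in h]
    z = sum(e)
    return [x / z for x in e]

def grad_pi(h, a):
    """Gradient of pi(a) w.r.t. the preferences h: pi(a) * (1{a=b} - pi(b))."""
    p = softmax(h)
    return [p[a] * ((1.0 if b == a else 0.0) - p[b]) for b in range(len(h))]

def grad_log_pi(h, a):
    """Gradient of ln pi(a) w.r.t. the preferences h: 1{a=b} - pi(b)."""
    p = softmax(h)
    return [(1.0 if b == a else 0.0) - p[b] for b in range(len(h))]

h = [0.5, -0.2, 1.0]     # hypothetical preferences (the parameters at one state)
q = [1.0, 2.0, 0.5]      # hypothetical action values q(s, a)
n = len(h)
pi = softmax(h)

# Sum form and expectation form, both computed exactly.
sum_form = [sum(grad_pi(h, a)[b] * q[a] for a in range(n)) for b in range(n)]
exp_form = [sum(pi[a] * grad_log_pi(h, a)[b] * q[a] for a in range(n)) for b in range(n)]

# Sample average of grad ln pi(A) * q(A) with A drawn from pi.
rng = random.Random(0)
num_samples = 20000
mc = [0.0] * n
for _ in range(num_samples):
    a = rng.choices(range(n), weights=pi)[0]
    g = grad_log_pi(h, a)
    for b in range(n):
        mc[b] += g[b] * q[a] / num_samples

print(sum_form, exp_form, mc)
```

The first two vectors agree up to rounding, while the sample average only approaches them as the number of samples grows; this is the property that the stochastic update in REINFORCE relies on.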

Next we show that the average one-step reward $\bar{r}_\pi$ and the average state value $\bar{v}_\pi$ are equivalent metrics. When the discount rate $\gamma\in[0,1)$ is given, it holds that

$$
\textcolor{blue}{\bar{r}_\pi = (1-\gamma)\bar{v}_\pi}
$$

Proof. Note that $\bar{v}_\pi(\theta)=d^T_\pi v_\pi$ and $\bar{r}_\pi(\theta)=d^T_\pi r_\pi$, where $v_\pi$ and $r_\pi$ satisfy the Bellman equation $v_\pi=r_\pi + \gamma P_\pi v_\pi$. Multiplying both sides of the Bellman equation on the left by $d_\pi^T$ gives

$$
\bar{v}_\pi = \bar{r}_\pi + \gamma d^T_\pi P_\pi v_\pi = \bar{r}_\pi + \gamma d^T_\pi v_\pi = \bar{r}_\pi + \gamma \bar{v}_\pi
$$

where the second equality uses the stationarity condition $d^T_\pi P_\pi = d^T_\pi$. This implies $\bar{r}_\pi = (1-\gamma)\bar{v}_\pi$.
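The identity $\bar{r}_\pi = (1-\gamma)\bar{v}_\pi$ can be verified numerically. The sketch below assumes a hypothetical 2-state, 2-action MDP: `P[s][a]` is the next-state distribution, `R[s][a]` stands for $r(s,a)$, and `PI[s][a]` is a fixed policy (all numbers are made up). It builds $P_\pi$ and $r_\pi$ from the policy, then computes $d_\pi$ by power iteration and $v_\pi$ by iterating the Bellman equation.

```python
GAMMA = 0.9

# Hypothetical model: P[s][a][s'] = p(s'|s,a), R[s][a] = r(s,a), PI[s][a] = pi(a|s).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [-1.0, 2.0]]
PI = [[0.7, 0.3],
      [0.4, 0.6]]
n_states, n_actions = 2, 2

# Policy-averaged transition matrix and one-step reward vector.
P_pi = [[sum(PI[s][a] * P[s][a][t] for a in range(n_actions)) for t in range(n_states)]
        for s in range(n_states)]
r_pi = [sum(PI[s][a] * R[s][a] for a in range(n_actions)) for s in range(n_states)]

# Stationary distribution: iterate d^T <- d^T P_pi (assumes an ergodic chain).
d = [1.0 / n_states] * n_states
for _ in range(500):
    d = [sum(d[s] * P_pi[s][t] for s in range(n_states)) for t in range(n_states)]

# State values: iterate v <- r_pi + gamma * P_pi v.
v = [0.0] * n_states
for _ in range(2000):
    v = [r_pi[s] + GAMMA * sum(P_pi[s][t] * v[t] for t in range(n_states))
         for s in range(n_states)]

r_bar = sum(d[s] * r_pi[s] for s in range(n_states))
v_bar = sum(d[s] * v[s] for s in range(n_states))
print(r_bar, (1 - GAMMA) * v_bar)   # the two numbers should agree up to rounding
```

Note that the identity depends on `d` being stationary for `P_pi`; with an arbitrary weight vector the two quantities would generally differ.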

Theorem 9.2 (Gradient of $v_\pi(s_0)$ in the discounted case). In the discounted case where $\gamma \in [0,1)$, the gradient of $v_\pi(s_0)$ is

$$
\nabla_\theta v_\pi(s_0) = \mathbb{E}[\nabla_\theta \ln \pi(A|S, \theta)q_\pi(S,A)]
$$

where $S \sim \rho_\pi$ and $A \sim \pi(S,\theta)$. Here, the state distribution $\rho_\pi$ is

$$
\rho_\pi(s) = \Pr\nolimits_\pi (s|s_0) = \sum_{k=0}^{\infty} \gamma^k \Pr (s_0\to s, k, \pi) = [(I_n - \gamma P_\pi)^{-1}]_{s_0,s}
$$

which is the discounted total probability of transitioning from $s_0$ to $s$ under policy $\pi$.
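Theorem 9.2 can be sanity-checked against finite differences on a small model. The sketch below assumes a hypothetical 2-state, 2-action MDP and a tabular softmax policy (the model numbers and all function names are illustrative assumptions). For this parameterization, $\sum_{a'} \nabla_{\theta_{s,a}}\pi(a'|s)\,q_\pi(s,a')$ simplifies to $\pi(a|s)\,(q_\pi(s,a)-v_\pi(s))$, so the theorem gives $\partial v_\pi(s_0)/\partial\theta_{s,a} = \rho_\pi(s)\,\pi(a|s)\,(q_\pi(s,a)-v_\pi(s))$. The weights $\rho_\pi$ are accumulated from the series $\sum_k \gamma^k \Pr(s_0\to s,k,\pi)$ rather than from a matrix inverse.

```python
import math

GAMMA = 0.9
# Hypothetical model: P[s][a][s'] = p(s'|s,a), R[s][a] = r(s,a).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [-1.0, 2.0]]

def policy(theta):
    """Tabular softmax: pi[s][a] is proportional to exp(theta[s][a])."""
    pi = []
    for prefs in theta:
        m = max(prefs)
        e = [math.exp(h - m) for h in prefs]
        z = sum(e)
        pi.append([x / z for x in e])
    return pi

def evaluate(theta, iters=2000):
    """Return (pi, v, q) for the policy induced by theta, via Bellman iteration."""
    pi = policy(theta)
    v = [0.0, 0.0]
    for _ in range(iters):
        q = [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(2))
              for a in range(2)] for s in range(2)]
        v = [sum(pi[s][a] * q[s][a] for a in range(2)) for s in range(2)]
    return pi, v, q

def discounted_visitation(pi, s0, iters=2000):
    """rho[s] = sum_k gamma^k Pr(s0 -> s in k steps under pi)."""
    P_pi = [[sum(pi[s][a] * P[s][a][t] for a in range(2)) for t in range(2)]
            for s in range(2)]
    x = [1.0 if s == s0 else 0.0 for s in range(2)]   # gamma^k e_{s0}^T P_pi^k
    rho = [0.0, 0.0]
    for _ in range(iters):
        rho = [rho[s] + x[s] for s in range(2)]
        x = [GAMMA * sum(x[s] * P_pi[s][t] for s in range(2)) for t in range(2)]
    return rho

def policy_gradient(theta, s0):
    """Gradient of v_pi(s0) from Theorem 9.2, specialized to tabular softmax."""
    pi, v, q = evaluate(theta)
    rho = discounted_visitation(pi, s0)
    return [[rho[s] * pi[s][a] * (q[s][a] - v[s]) for a in range(2)] for s in range(2)]

def finite_difference(theta, s0, eps=1e-5):
    """Central differences of v_pi(s0) with respect to each theta[s][a]."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for s in range(2):
        for a in range(2):
            up = [row[:] for row in theta]
            dn = [row[:] for row in theta]
            up[s][a] += eps
            dn[s][a] -= eps
            grad[s][a] = (evaluate(up)[1][s0] - evaluate(dn)[1][s0]) / (2 * eps)
    return grad

theta = [[0.3, -0.1], [0.0, 0.5]]        # hypothetical parameter values
g_thm = policy_gradient(theta, s0=0)
g_fd = finite_difference(theta, s0=0)
print(g_thm, g_fd)
```

One detail worth noticing: because each row of $P_\pi$ sums to one, the entries of $\rho_\pi$ sum to $1/(1-\gamma)$ rather than to $1$, so the expectation in the theorem is taken with respect to an unnormalized weighting of the states.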

Theorem 9.3 (Gradients of $\bar{v}_\pi$ and $\bar{r}_\pi$ in the discounted case). In the discounted case where $\gamma \in [0,1)$, the gradients of $\bar{v}_\pi$ and $\bar{r}_\pi$ are, respectively,

$$
\begin{aligned}
\nabla_\theta \bar{v}_\pi & \approx \frac{1}{1-\gamma} \sum_s d_\pi(s) \sum_a \nabla_\theta \pi(a|s,\theta) q_\pi(s,a) \\
\nabla_\theta \bar{r}_\pi & \approx \sum_s d_\pi(s) \sum_a \nabla_\theta \pi(a|s,\theta) q_\pi(s,a)
\end{aligned}
$$

where the approximations are more accurate when $\gamma$ is closer to $1$.

9.4 Policy gradient by Monte Carlo estimation: REINFORCE

Consider $J(\theta) = \bar{r}_\pi(\theta)$ or $v_\pi(s_0)$. The gradient-ascent algorithm maximizing $J(\theta)$ is

$$
\begin{aligned}
\theta_{t+1} & = \theta_t + \alpha \nabla_\theta J(\theta_t) \\
& = \theta_t + \alpha \mathbb{E}[\nabla_\theta \ln\pi(A|S,\theta_t) q_\pi(S,A)]
\end{aligned}
$$

where $\alpha>0$ is a constant learning rate. Since the expected value on the right-hand side is unknown, we can replace it with a sample (the idea of stochastic gradient). Then we have

$$
\theta_{t+1} = \theta_t + \alpha \nabla_\theta \ln\pi(a_t|s_t,\theta_t) q_\pi(s_t,a_t)
$$

However, this cannot be implemented because the true action value $q_\pi(s_t,a_t)$ is unknown. Hence, we use an estimate $q_t(s_t,a_t)$ in place of $q_\pi(s_t,a_t)$:

$$
\theta_{t+1} = \theta_t + \alpha \nabla_\theta \ln\pi(a_t|s_t,\theta_t) q_t(s_t,a_t)
$$

If $q_\pi(s_t,a_t)$ is approximated by Monte Carlo estimation,

$$
\begin{aligned}
q_\pi(s_t,a_t) & \triangleq \mathbb{E}[G_t|S_t=s_t, A_t=a_t] \\
& \textcolor{blue}{\approx \frac{1}{n} \sum_{i=1}^n g^{(i)}(s_t,a_t)}
\end{aligned}
$$

where $g^{(i)}(s_t,a_t)$ is the return of the $i$-th episode starting from $(s_t,a_t)$. With stochastic approximation, we do not need to collect $n$ episodes starting from $(s_t,a_t)$ to approximate $q_\pi(s_t,a_t)$; a single discounted return starting from $(s_t,a_t)$ suffices:

$$
q_\pi(s_t,a_t) \approx q_t(s_t,a_t) = \sum_{k=t+1}^T \gamma^{k-t-1}r_k
$$

The resulting algorithm is called REINFORCE.
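The update rule above can be turned into a short program. What follows is a minimal sketch, not the original pseudocode: it assumes a tabular softmax policy and a deliberately tiny hypothetical environment (a three-state chain in which `RIGHT` advances, `QUIT` ends the episode with reward 0, and advancing past the last state ends the episode with reward 1). The environment, the function names, and the hyperparameters are all assumptions chosen for illustration.

```python
import math
import random

GAMMA = 0.9
N_STATES = 3            # non-terminal states 0, 1, 2 of a hypothetical chain
QUIT, RIGHT = 0, 1

def step(s, a):
    """Hypothetical environment. Returns (next_state, reward); next_state None = terminal."""
    if a == QUIT:
        return None, 0.0
    if s + 1 == N_STATES:
        return None, 1.0
    return s + 1, 0.0

def pi(theta, s):
    """Tabular softmax policy pi(.|s, theta)."""
    m = max(theta[s])
    e = [math.exp(h - m) for h in theta[s]]
    z = sum(e)
    return [x / z for x in e]

def run_episode(theta, rng):
    """Generate one episode from state 0; rewards[t] holds r_{t+1}."""
    s, states, actions, rewards = 0, [], [], []
    while s is not None:
        a = rng.choices([QUIT, RIGHT], weights=pi(theta, s))[0]
        s_next, r = step(s, a)
        states.append(s)
        actions.append(a)
        rewards.append(r)
        s = s_next
    return states, actions, rewards

def reinforce(episodes=2000, alpha=0.5, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        states, actions, rewards = run_episode(theta, rng)
        # q_t = sum_{k=t+1}^{T} gamma^(k-t-1) r_k, accumulated backwards in time.
        g, returns = 0.0, [0.0] * len(rewards)
        for t in reversed(range(len(rewards))):
            g = rewards[t] + GAMMA * g
            returns[t] = g
        # theta <- theta + alpha * grad ln pi(a_t|s_t, theta) * q_t, for every step t.
        for s, a, q in zip(states, actions, returns):
            probs = pi(theta, s)
            for b in (QUIT, RIGHT):
                # For softmax, d ln pi(a|s) / d theta[s][b] = 1{a=b} - pi(b|s).
                theta[s][b] += alpha * ((1.0 if b == a else 0.0) - probs[b]) * q
    return theta

theta = reinforce()
learned = [pi(theta, s)[RIGHT] for s in range(N_STATES)]
print(learned)   # probability of RIGHT in each state after training
```

In this toy environment only episodes that reach the end of the chain have nonzero returns, so every effective update raises the probability of `RIGHT`. Practical tasks are noisier, and the single-return estimate $q_t$ can have high variance.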

Pseudocode: [image not available in this copy]

Reference

Prof. Zhao Shiyu's course
