线性规划
方法有单纯形法(简单,非多项式),椭圆法(复杂,多项式,仅有理论价值),内点法(非多项式,实际效率高)。
以例子说明,目标函数
min
z
=
0
x
1
+
0.1
x
2
+
0.2
x
3
+
0.3
x
4
+
0.8
x
5
\min z = 0 x_1 + 0.1 x_2 + 0.2 x_3 + 0.3 x_4 + 0.8 x_5
minz=0x1+0.1x2+0.2x3+0.3x4+0.8x5
约束条件
s
.
t
.
{
x
1
+
2
x
2
+
x
4
=
100
2
x
3
+
2
x
4
+
x
5
=
100
3
x
1
+
x
2
+
2
x
3
+
3
x
5
=
100
x
1
,
x
2
,
x
3
,
x
4
,
x
5
≥
0
s.t. \begin{cases} x_1 + 2 x_2 + x_4 = 100 \\ 2 x_3 + 2 x_4 + x_5 = 100 \\ 3 x_1 + x_2 + 2 x_3 + 3 x_5 = 100 \\ x_1, x_2, x_3, x_4, x_5 ≥ 0 \end{cases}
s.t.⎩
⎨
⎧x1+2x2+x4=1002x3+2x4+x5=1003x1+x2+2x3+3x5=100x1,x2,x3,x4,x5≥0
代码:
using JuMP, GLPK
m = Model(GLPK.Optimizer)
@variable(m, x[1:5] >= 0, integer = true)
@objective(m, Min, 0.1x[2] + 0.2x[3] + 0.3x[4] + 0.8x[5])
@constraint(m, constraint1, x[1] + 2x[2] + x[4] == 100)
@constraint(m, constraint2, 2x[3] + 2x[4] + x[5] == 100)
@constraint(m, constraint3, 3x[1] + x[2] + 2x[3] + 3x[5] == 100)
#print(m)
JuMP.optimize!(m)
#println("Optimal Soutions:")
for i in 1:5
println("x$i = ", JuMP.value(x[i]))
end
println("z = ", JuMP.objective_value(m))
结果:
x1 = 30.0
x2 = 10.0
x3 = 0.0
x4 = 50.0
x5 = 0.0
z = 16.0
一维搜索
黄金分割和 Fibonacci 法
取
α
<
λ
<
μ
<
β
\alpha < \lambda < \mu < \beta
α<λ<μ<β 和
x
∈
[
α
,
β
]
x \in [\alpha, \beta]
x∈[α,β] 上单谷函数
Φ
(
x
)
\Phi(x)
Φ(x),求其最值
x
∗
x^*
x∗,则有:
一,若
Φ
(
λ
)
≥
Φ
(
μ
)
\Phi(\lambda) ≥ \Phi(\mu)
Φ(λ)≥Φ(μ),则
x
∗
∈
[
λ
,
β
]
x^* \in [\lambda, \beta]
x∗∈[λ,β];
二,若
Φ
(
λ
)
<
Φ
(
μ
)
\Phi(\lambda) < \Phi(\mu)
Φ(λ)<Φ(μ),则
x
∗
∈
[
α
,
μ
]
x^* \in [\alpha, \mu]
x∗∈[α,μ];
如此迭代缩减区间长度,进行精确一维搜索。
对称性推导
通过选择不同的 λ , μ \lambda, \mu λ,μ 有不同策略,而一个好的策略保证对称性 λ − α = β − μ \lambda - \alpha = \beta - \mu λ−α=β−μ,下面推导之。
设归一化区间参数
p
=
λ
−
α
β
−
α
,
q
=
β
−
μ
β
−
α
p = \frac{\lambda - \alpha}{\beta - \alpha}, q = \frac{\beta - \mu}{\beta - \alpha}
p=β−αλ−α,q=β−αβ−μ,则任一个策略即可表示为点对
(
p
,
q
)
(p, q)
(p,q),且有
p
,
q
∈
(
0
,
1
)
p, q \in (0, 1)
p,q∈(0,1) 和
p
+
q
<
1
p + q < 1
p+q<1,不妨设
x
∗
x^*
x∗ 在
[
α
,
β
]
[\alpha, \beta]
[α,β] 上均匀分布,则分以下三种情况:
一,
x
∗
∈
[
α
,
λ
]
x^* \in [\alpha, \lambda]
x∗∈[α,λ],概率为
p
p
p,此时必然有
Φ
(
λ
)
<
Φ
(
μ
)
\Phi(\lambda) < \Phi(\mu)
Φ(λ)<Φ(μ),下轮区间缩减为
[
α
,
μ
]
[\alpha, \mu]
[α,μ];
二,
x
∗
∈
[
μ
,
β
]
x^* \in [\mu, \beta]
x∗∈[μ,β],概率为
q
q
q,此时必然有
Φ
(
λ
)
>
Φ
(
μ
)
\Phi(\lambda) > \Phi(\mu)
Φ(λ)>Φ(μ),下轮区间缩减为
[
λ
,
β
]
[\lambda, \beta]
[λ,β];
三,
x
∗
∈
[
λ
,
μ
]
x^* \in [\lambda, \mu]
x∗∈[λ,μ],概率为
1
−
p
−
q
1 - p - q
1−p−q,此时不妨设
Φ
(
λ
)
<
Φ
(
μ
)
\Phi(\lambda) < \Phi(\mu)
Φ(λ)<Φ(μ) 和
Φ
(
λ
)
>
Φ
(
μ
)
\Phi(\lambda) > \Phi(\mu)
Φ(λ)>Φ(μ) 发生的概率各占一半。
用缩减比
t
(
p
,
q
)
t(p, q)
t(p,q) 来表示策略的好坏,可由下轮区间长度与本轮区间长度的比值得到。综上,可分为两种情况:
情况一,下轮区间缩减为
[
α
,
μ
]
[\alpha, \mu]
[α,μ] 的概率为
1
+
p
−
q
2
\frac{1 + p - q}{2}
21+p−q,缩减比为
1
−
q
1 - q
1−q;
情况二,下轮区间缩减为
[
λ
,
β
]
[\lambda, \beta]
[λ,β] 的概率为
1
+
q
−
p
2
\frac{1 + q - p}{2}
21+q−p,缩减比为
1
−
p
1 - p
1−p;
则有
t
(
p
,
q
)
=
(
1
−
q
)
1
+
p
−
q
2
+
(
1
−
p
)
1
+
q
−
p
2
=
2
−
(
p
+
q
)
+
(
p
−
q
)
2
2
t(p, q) = (1 - q) \frac{1 + p - q}{2} + (1 - p) \frac{1 + q - p}{2} = \frac{2 - (p + q) + (p - q)^2}{2}
t(p,q)=(1−q)21+p−q+(1−p)21+q−p=22−(p+q)+(p−q)2
而好的策略意味着低的缩减比。
可得在 p + q p+q p+q 一定的条件下,好的策略保证 p = q p = q p=q,即对称性 λ − α = β − μ \lambda - \alpha = \beta - \mu λ−α=β−μ。
黄金缩减比推导
经过以上讨论,若只考虑缩减比最优,那 p = q = 1 2 − ϵ p = q = \frac{1}{2} - \epsilon p=q=21−ϵ 无疑是最优策略。可实际上,计算 Φ ( λ ) \Phi(\lambda) Φ(λ) 和 Φ ( μ ) \Phi(\mu) Φ(μ) 在大多数情况下都是该算法流程中的主要时间消耗,而区间缩减为 [ α , μ ] [\alpha, \mu] [α,μ] 的情况下 Φ ( λ ) \Phi(\lambda) Φ(λ) 是可以作为下轮计算的已知量的,同理区间缩减为 [ λ , β ] [\lambda, \beta] [λ,β] 的情况下 Φ ( μ ) \Phi(\mu) Φ(μ) 也是可以作为下轮计算的已知量。
换言之,结果区间精度与初始区间精度的比值为 r r r,在最优缩减比 t ∗ = 1 2 + ϵ t^* = \frac{1}{2} + \epsilon t∗=21+ϵ 下每轮计算两次 Φ ( x ) \Phi(x) Φ(x),时间消耗为 T ( t ∗ ) = 2 log t ∗ r T(t^*) = 2 \log_{t^*} r T(t∗)=2logt∗r,若存在缩减比 t ′ t' t′ 下每轮计算一次 Φ ( x ) \Phi(x) Φ(x) 的策略,其时间消耗为 T ( t ′ ) = log t ′ r T(t') = \log_{t'} r T(t′)=logt′r,只需要满足条件 t ′ < t ∗ = 1 2 + ϵ t' < \sqrt{t^*} = \frac{1}{\sqrt{2}} + \epsilon t′<t∗=21+ϵ,即 t ′ ≤ 1 2 ≈ 0.707 t' ≤ \frac{1}{\sqrt{2}} \approx 0.707 t′≤21≈0.707 即可保证该策略比最优缩减比策略更好。
对于本轮区间参数
(
α
,
λ
,
μ
,
β
)
(\alpha, \lambda, \mu, \beta)
(α,λ,μ,β) 和下轮区间参数
(
α
′
,
λ
′
,
μ
′
,
β
′
)
(\alpha', \lambda', \mu', \beta')
(α′,λ′,μ′,β′),保证缩减比
t
=
μ
−
α
β
−
α
=
β
−
λ
β
−
α
=
μ
′
−
α
′
β
′
−
α
′
=
β
′
−
λ
′
β
′
−
α
′
t = \frac{\mu - \alpha}{\beta - \alpha} = \frac{\beta - \lambda}{\beta - \alpha} = \frac{\mu' - \alpha'}{\beta' - \alpha'} = \frac{\beta' - \lambda'}{\beta' - \alpha'}
t=β−αμ−α=β−αβ−λ=β′−α′μ′−α′=β′−α′β′−λ′
的情况下,希望有如下性质:
当
α
′
=
α
,
β
′
=
μ
\alpha' = \alpha, \beta' = \mu
α′=α,β′=μ 时,有
μ
′
=
λ
\mu' = \lambda
μ′=λ;(
λ
′
=
λ
\lambda' = \lambda
λ′=λ 是无法满足的)
当
α
′
=
λ
,
β
′
=
β
\alpha' = \lambda, \beta' = \beta
α′=λ,β′=β 时,有
λ
′
=
μ
\lambda' = \mu
λ′=μ;(
μ
′
=
μ
\mu' = \mu
μ′=μ 是无法满足的)
由于左右两种情况对称,故以
α
′
=
α
,
μ
′
=
λ
,
β
′
=
μ
\alpha' = \alpha, \mu' = \lambda, \beta' = \mu
α′=α,μ′=λ,β′=μ 为例,设此时缩减比为
t
ϕ
t_\phi
tϕ,则有
β
−
λ
=
μ
−
α
=
t
ϕ
(
β
−
α
)
λ
−
α
=
μ
′
−
α
′
=
t
ϕ
(
β
′
−
α
′
)
=
t
ϕ
(
μ
−
α
)
=
t
ϕ
2
(
β
−
α
)
\beta - \lambda = \mu - \alpha = t_\phi(\beta - \alpha) \\ \lambda - \alpha = \mu' - \alpha' = t_\phi(\beta' - \alpha') = t_\phi(\mu - \alpha) = t_\phi^2(\beta - \alpha)
β−λ=μ−α=tϕ(β−α)λ−α=μ′−α′=tϕ(β′−α′)=tϕ(μ−α)=tϕ2(β−α)
得方程
β
−
α
=
(
β
−
λ
)
+
(
λ
−
α
)
=
t
ϕ
(
β
−
α
)
+
t
ϕ
2
(
β
−
α
)
\beta - \alpha = (\beta - \lambda) + (\lambda - \alpha) = t_\phi(\beta - \alpha) + t_\phi^2(\beta - \alpha)
β−α=(β−λ)+(λ−α)=tϕ(β−α)+tϕ2(β−α)
化简之,即
t
ϕ
2
+
t
ϕ
−
1
=
0
t_\phi^2 + t_\phi - 1 = 0
tϕ2+tϕ−1=0
由
t
ϕ
>
0
t_\phi > 0
tϕ>0 舍去负根,得
t
ϕ
=
5
−
1
2
≈
0.618
t_\phi = \frac{\sqrt{5} - 1}{2} \approx 0.618
tϕ=25−1≈0.618 恰为黄金分割比,满足上文条件为最优策略。此即一维搜索的黄金分割法。
Fibonacci 法
设 Fibonacci 数列 F 0 = 0 , F 1 = 1 , F n = F n − 1 + F n − 2 F_0 = 0, F_1 = 1, F_n = F_{n - 1} + F_{n - 2} F0=0,F1=1,Fn=Fn−1+Fn−2,由于 lim n → ∞ F n F n + 1 = 5 − 1 2 \lim_{n \to \infty} \frac{F_n}{F_{n + 1}} = \frac{\sqrt{5} - 1}{2} limn→∞Fn+1Fn=25−1,故可用反向 Fibonacci 数列指导每轮缩减,是为 Fibonacci 法,代码如下:
function FibonacciSection(f, l, u, tol)
fib = [1, 1]
while fib[end] * tol < u - l
push!(fib, fib[end] + fib[end-1])
end
m, n = (fib[end-1] * l + fib[end-2] * u) / fib[end], (fib[end-2] * l + fib[end-1] * u) / fib[end]
for i = length(fib)-1 : -1 : 3
if f(m) > f(n)
l, m = m, n
n = (fib[i-2] * l + fib[i-1] * u) / fib[i]
else
u, n = n, m
m = (fib[i-1] * l + fib[i-2] * u) / fib[i]
end
end
return m
end
二分法
取当前区间
[
α
,
β
]
[\alpha, \beta]
[α,β] 中点
λ
=
α
+
β
2
\lambda = \frac{\alpha + \beta}{2}
λ=2α+β,求导数值
Φ
′
(
λ
)
\Phi'(\lambda)
Φ′(λ),并判断:
若
Φ
′
(
λ
)
>
0
\Phi'(\lambda) > 0
Φ′(λ)>0,区间更新为
[
α
,
λ
]
[\alpha, \lambda]
[α,λ];
若
Φ
′
(
λ
)
<
0
\Phi'(\lambda) < 0
Φ′(λ)<0,区间更新为
[
λ
,
β
]
[\lambda, \beta]
[λ,β];
若
Φ
′
(
λ
)
=
0
\Phi'(\lambda) = 0
Φ′(λ)=0,
λ
\lambda
λ 为最小点,算法结束。
代码:
function BinarySection(f, grad_f, l, u, tol)
m = (l + u) / 2
while u - l > tol
g = grad_f(m)
if g > 0.
u = m
elseif g < 0.
l = m
else
break
end
m = (l + u) / 2
end
return m
end
Shubert-Piyavskii 法
若有 ∣ Φ ( λ ) − Φ ( μ ) ∣ ≤ l ∣ λ − μ ∣ , ∀ λ , μ ∈ [ a , b ] \left| \Phi(\lambda) - \Phi(\mu) \right| ≤ l \left| \lambda - \mu \right|, \forall \lambda, \mu \in [a, b] ∣Φ(λ)−Φ(μ)∣≤l∣λ−μ∣,∀λ,μ∈[a,b],则称函数 Φ ( x ) \Phi(x) Φ(x) 在 [ a , b ] [a, b] [a,b] 上是 l l l-Lipschitz 连续的,即导数的幅度存在上界。
Shubert-Piyavskii 法用斜率为 ± l \pm l ±l 的线段,迭代构造 Φ ( x ) \Phi(x) Φ(x) 的锯齿状下界,代码:
function Shubert_Piyavskii(f, L, l, u, tol)
intersection(p, q) = ((p[1] + q[1]) / 2 + (p[2] - q[2]) / (2 * L), (p[1] - q[1]) * L / 2 + (p[2] + q[2]) / 2)
m = (l + u) / 2
Up = [(l, f(l)), (m, f(m)), (u, f(u))]
Lo = [intersection(Up[1], Up[2]), intersection(Up[2], Up[3])]
f_min = minimum(map(last, Up))
while true
m = argmin(map(last, Lo))
y = f(Lo[m][1])
if f_min > y
f_min = y
if (f_min - Lo[m][2]) * 2 < L * tol
break
end
end
Up = vcat(Up[1 : m], [(Lo[m][1], y)], Up[m+1 : end])
Lo = vcat(Lo[1 : m-1], [intersection(Up[m], Up[m+1]), intersection(Up[m+1], Up[m+2])], Lo[m+1 : end])
end
return Lo[m][1]
end
不精确搜索
考虑从 x ( k ) x^{(k)} x(k) 出发,沿方向 d ( k ) d^{(k)} d(k) 寻找新迭代点 x ( k + 1 ) = x ( k ) + s ( k ) = x ( k ) + λ k d ( k ) x^{(k + 1)} = x^{(k)} + s^{(k)} = x^{(k)} + \lambda_k d^{(k)} x(k+1)=x(k)+s(k)=x(k)+λkd(k) 使得 Φ ( x ( k + 1 ) ) \Phi(x^{(k + 1)}) Φ(x(k+1)) 尽可能小。选取步长增大系数 α > 1 \alpha > 1 α>1 和步长缩短系数 β < 1 \beta < 1 β<1 迭代求步长 λ k \lambda_k λk,直到满足给定规则。
代码:
macro LineSearch(cond2)
quote
alpha, beta, lambda = 1.5, 0.5, 1.
while true
g, s = grad_f(x), lambda * d
if f(x + s) - f(x) > rho * g' * s
lambda = beta * lambda
elseif $cond2
lambda = alpha * lambda
else
break
end
end
return lambda
end |> esc
end
Goldstein(f, grad_f, x, d, rho) = @LineSearch(f(x + s) - f(x) < (1. - rho) * g' * s)
Armijo(f, grad_f, x, d, rho, mu) = @LineSearch(f(x + s) - f(x) < (mu * rho) * g' * s)
Goldstein2(f, grad_f, x, d, rho, sigma) = @LineSearch(f(x + s) - f(x) < sigma * g' * s)
WP(f, grad_f, x, d, rho, sigma) = @LineSearch(grad_f(x + s)' * s < sigma * g' * s)
WP2(f, grad_f, x, d, rho, eta) = @LineSearch(abs(grad_f(x + s)' * d) > -eta * g' * d)
Goldstein 规则
(1)
Φ
(
x
(
k
+
1
)
)
−
Φ
(
x
(
k
)
)
≤
ρ
∇
Φ
T
(
x
(
k
)
)
s
(
k
)
\Phi(x^{(k + 1)}) - \Phi(x^{(k)}) ≤ \rho \nabla \Phi^T(x^{(k)}) s^{(k)}
Φ(x(k+1))−Φ(x(k))≤ρ∇ΦT(x(k))s(k)
(2)
Φ
(
x
(
k
+
1
)
)
−
Φ
(
x
(
k
)
)
≥
(
1
−
ρ
)
∇
Φ
T
(
x
(
k
)
)
s
(
k
)
\Phi(x^{(k + 1)}) - \Phi(x^{(k)}) ≥ (1 - \rho) \nabla \Phi^T(x^{(k)}) s^{(k)}
Φ(x(k+1))−Φ(x(k))≥(1−ρ)∇ΦT(x(k))s(k)
其中
ρ
∈
(
0
,
1
2
)
\rho \in (0, \frac{1}{2})
ρ∈(0,21),实际取 0.1 或更小。
Armijo 规则
(1) 同 Goldstein(1)
(2)
Φ
(
x
(
k
+
1
)
)
−
Φ
(
x
(
k
)
)
≥
μ
∇
Φ
T
(
x
(
k
)
)
s
(
k
)
\Phi(x^{(k + 1)}) - \Phi(x^{(k)}) ≥ \mu \nabla \Phi^T(x^{(k)}) s^{(k)}
Φ(x(k+1))−Φ(x(k))≥μ∇ΦT(x(k))s(k)
其中
μ
∈
(
5
,
10
)
\mu \in (5, 10)
μ∈(5,10)。
Goldstein 改进规则
(1) 同 Goldstein(1)
(2)
Φ
(
x
(
k
+
1
)
)
−
Φ
(
x
(
k
)
)
≥
σ
∇
Φ
T
(
x
(
k
)
)
s
(
k
)
\Phi(x^{(k + 1)}) - \Phi(x^{(k)}) ≥ \sigma \nabla \Phi^T(x^{(k)}) s^{(k)}
Φ(x(k+1))−Φ(x(k))≥σ∇ΦT(x(k))s(k)
其中
σ
∈
(
ρ
,
1
)
\sigma \in (\rho, 1)
σ∈(ρ,1)。
Wolfe-Powell 规则
(1) 同 Goldstein(1)
(2)
∇
Φ
T
(
x
(
k
+
1
)
)
s
(
k
)
≥
σ
∇
Φ
T
(
x
(
k
)
)
s
(
k
)
\nabla \Phi^T(x^{(k + 1)}) s^{(k)} ≥ \sigma \nabla \Phi^T(x^{(k)}) s^{(k)}
∇ΦT(x(k+1))s(k)≥σ∇ΦT(x(k))s(k)
其中
σ
∈
(
ρ
,
1
)
\sigma \in (\rho, 1)
σ∈(ρ,1)。
Wolfe-Powell 改进规则
(1) 同 Goldstein(1)
(2)
∣
∇
Φ
T
(
x
(
k
+
1
)
)
d
(
k
)
∣
≤
−
η
∇
Φ
T
(
x
(
k
)
)
d
(
k
)
\left| \nabla \Phi^T(x^{(k + 1)}) d^{(k)} \right| ≤ - \eta \nabla \Phi^T(x^{(k)}) d^{(k)}
∇ΦT(x(k+1))d(k)
≤−η∇ΦT(x(k))d(k)
其中
η
∈
(
0
,
1
)
\eta \in (0, 1)
η∈(0,1),当
η
=
0
\eta = 0
η=0 即为精确一维搜索,当
η
=
0.1
\eta = 0.1
η=0.1 可看作近似的精确一维搜索。
无约束最优化
无约束最优化问题 min f ( x ) , f : R n → R \min f(x), f: \mathbb{R}^n \to \mathbb{R} minf(x),f:Rn→R 往往采用迭代下降的方法求解,即 x ( k + 1 ) = x ( k ) + λ k d ( k ) x^{(k + 1)} = x^{(k)} + \lambda_k d^{(k)} x(k+1)=x(k)+λkd(k),不同算法间的区别在于求得 d ( k ) d^{(k)} d(k) 的方式和得出的值不同。
当
f
(
x
)
f(x)
f(x) 为二次函数时,写成二次型
f
(
x
)
=
1
2
x
T
A
x
+
b
T
x
+
c
,
∇
f
(
x
)
=
A
x
+
b
f(x) = \frac{1}{2} x^T A x + b^T x + c, \nabla f(x) = Ax + b
f(x)=21xTAx+bTx+c,∇f(x)=Ax+b 带入方程
∇
f
T
(
x
+
λ
d
)
d
=
0
\nabla f^T(x + \lambda d) d = 0
∇fT(x+λd)d=0
得
(
A
(
x
+
λ
d
)
+
b
)
T
d
=
0
(A(x + \lambda d) + b)^T d = 0
(A(x+λd)+b)Td=0
化简得最佳步长
λ
\lambda
λ 为
λ
=
−
d
T
(
A
x
+
b
)
d
T
A
d
\lambda = -\frac{d^T (Ax + b)}{d^T A d}
λ=−dTAddT(Ax+b)
若
f
(
x
)
f(x)
f(x) 非二次函数,则需要线搜索得到最佳步长。
有约束最优化问题可通过罚因子方法或增广 Lagrange 方法转化为无约束最优化问题求解。
测试选用 Rosenbrock Banana 函数
f
(
x
)
=
(
a
−
x
1
)
2
+
b
(
x
2
−
x
1
2
)
2
f(x) = (a - x_1)^2 + b (x_2 - x_1^2)^2
f(x)=(a−x1)2+b(x2−x12)2
其有全局极小点
(
a
,
a
2
)
(a, a^2)
(a,a2),取
a
=
1
,
b
=
5
a = 1, b = 5
a=1,b=5,初始解
x
0
=
(
1
,
−
1
)
T
x_0 = (1, -1)^T
x0=(1,−1)T。代码:
using LinearAlgebra, Optim
using Plots
macro iterDescent(part0, part1, part2)
quote
x, r = x0, [x0]
$part0
for i = 1 : max_iter
$part1
if norm(d) < tol
break
end
a = Optim.minimizer(optimize(a -> f.f(x + a * d), -1., 1.))
s = a * d
if norm(s) < tol
break
end
x = x + s
$part2
push!(r, x)
end
return r
end |> esc
end
function draw(p)
for i = 2 : length(p)
plot!([p[i - 1][1], p[i][1]], [p[i - 1][2], p[i][2]], color = :black, legend = false, w = 1)
end
end
function RosenbrockBanana(a, b)
(
f = x::Vector{Float64} -> (a - x[1])^2 + b * (x[2] - x[1]^2)^2,
g = x::Vector{Float64} -> [-2 * (a - x[1]) - 4 * b * x[1] * (x[2] - x[1]^2); 2 * b * (x[2] - x[1]^2)],
h = x::Vector{Float64} -> [(2 - 4 * b * (x[2] - 3 * x[1]^2)) (-4 * b * x[1]); (-4 * b * x[1]) (2 * b)]
)
end
rb = RosenbrockBanana(1, 5)
contour(range(-2, 2, length = 1000), range(-2, 2, length = 1000), (x, y) -> rb.f([x, y]),
aspect_ratio = 1, levels = 100, color=:turbo, xlims = (-2, 2), ylims = (-2, 2))
x0 = [1., -1.]
坐标轮换法
迭代方向为坐标轴,即单位基向量,每 n n n 轮一循环。
代码:
CyclicCoordinate(f, x0, tol = 1e-6, max_iter = 1000) = @iterDescent(
n = length(x0),
begin
d = zeros(n); d[(i - 1) % n + 1] = 1
end,
nothing
)
draw(CyclicCoordinate(rb, x0))
结果:
Hooke-Jeeves 法
在坐标轮换法基础上加上第 n + 1 n + 1 n+1 轮的加速步,方向为 d ( k ) = x ( k ) − x ( k − n − 1 ) d^{(k)} = x^{(k)} - x^{(k - n - 1)} d(k)=x(k)−x(k−n−1),每 n + 1 n + 1 n+1 轮一循环。
代码:
HookeJeeves(f, x0, tol = 1e-6, max_iter = 1000) = @iterDescent(
begin
n = length(x0); l = zeros(n)
end,
begin
if i % (n + 1) > 0
d = zeros(n); d[i % (n + 1)] = 1
else
d = l
end
end,
begin
if i % (n + 1) > 0
l[i % (n + 1)] += a
else
l = s
end
end
)
draw(HookeJeeves(rb, x0))
结果:
Rosenbrock 法
第
1
1
1 到
n
n
n 轮同坐标轮换法,然后将前
n
n
n 轮的位移
d
(
k
)
=
x
(
k
)
−
x
(
k
−
n
)
d^{(k)} = x^{(k)} - x^{(k - n)}
d(k)=x(k)−x(k−n) 作为第
n
+
1
n + 1
n+1 轮的方向,并进行 Gram-Schmidt 正交化形成新的基,每
n
n
n 轮一循环。
a
j
=
{
d
j
,
λ
j
=
0
Σ
i
=
j
n
λ
i
d
i
,
λ
j
≠
0
,
b
j
=
{
a
j
,
j
=
1
a
j
−
Σ
i
=
1
j
−
1
(
a
j
T
d
‾
i
)
d
‾
i
,
j
≥
2
,
d
‾
j
=
b
j
∥
b
j
∥
a_j = \begin{cases} d_j, & \lambda_j = 0 \\ \Sigma_{i = j}^n \lambda_i d_i, & \lambda_j ≠ 0 \end{cases}, b_j = \begin{cases} a_j, & j = 1 \\ a_j - \Sigma_{i = 1}^{j - 1} (a_j^T \overline{d}_i) \overline{d}_i, & j ≥ 2 \end{cases}, \overline{d}_j = \frac{b_j}{\Vert b_j \Vert}
aj={dj,Σi=jnλidi,λj=0λj=0,bj={aj,aj−Σi=1j−1(ajTdi)di,j=1j≥2,dj=∥bj∥bj
代码:
Rosenbrock(f, x0, tol = 1e-6, max_iter = 1000) = @iterDescent(
begin
n = length(x0); l = zeros(n)
b = []
for i = 1 : n
push!(b, zeros(n))
b[end][i] = 1
end
end,
d = b[(i - 1) % n + 1],
begin
l[(i - 1) % n + 1] = a
if i % n == 0
t = zeros(n)
for j = n : -1 : 1
if l[j] != 0
t += l[j] * b[j]
b[j] = t
end
end
for j = 1 : n
t = zeros(n)
for k = 1 : j - 1
t += (b[j]' * b[k]) * b[k]
end
b[j] -= t
b[j] /= norm(b[j])
end
l = zeros(n)
end
end
)
draw(Rosenbrock(rb, x0))
结果:
梯度下降法
迭代方向为梯度下降方向。(二维和坐标轮换法类似,更高维情况有不同)
代码:
GradientDescent(f, x0, tol = 1e-6, max_iter = 1000) = @iterDescent nothing d = -f.g(x) nothing
draw(GradientDescent(rb, x0))
结果:
Newton 法
代码:
Newton(f, x0, tol = 1e-6, max_iter = 1000) = @iterDescent nothing d = -f.h(x) \ f.g(x) nothing
draw(Newton(rb, x0))
结果:
DFP 法
H k + 1 = H k − H k y k ( H k y k ) T ( y k ) T H k y k + s k ( s k ) T ( y k ) T s k H^{k+1} = H^k - \frac{H^k y^k (H^k y^k)^T}{(y^k)^T H^k y^k} + \frac{s^k (s^k)^T}{(y^k)^T s^k} Hk+1=Hk−(yk)THkykHkyk(Hkyk)T+(yk)Tsksk(sk)T
代码:
DFP(f, x0, tol = 1e-6, max_iter = 1000) = @iterDescent(
begin
g, H = f.g(x0), Matrix{Float64}(I, length(x0), length(x0))
end,
d = -H * g,
begin
g_new = f.g(x)
y, g = g_new - g, g_new
H = H - ((H * y) * (H * y)') / (y' * H * y) + (s * s') / (y' * s)
end
)
draw(DFP(rb, x0))
结果:
BFGS 法
H
k
+
1
=
(
I
−
ρ
k
y
k
(
s
k
)
T
)
T
H
k
(
I
−
ρ
k
y
k
(
s
k
)
T
)
+
ρ
k
s
k
(
s
k
)
T
H^{k+1} = (I - \rho_k y^k (s^k)^T)^T H^k (I - \rho_k y^k (s^k)^T) + \rho_k s^k (s^k)^T
Hk+1=(I−ρkyk(sk)T)THk(I−ρkyk(sk)T)+ρksk(sk)T
其中
ρ
k
=
1
(
s
k
)
T
y
k
\rho_k = \frac{1}{(s^k)^T y^k}
ρk=(sk)Tyk1
代码:
BFGS(f, x0, tol = 1e-6, max_iter = 1000) = @iterDescent(
begin
g, H, E = f.g(x0), Matrix{Float64}(I, length(x0), length(x0)), Matrix{Float64}(I, length(x0), length(x0))
end,
d = -H * g,
begin
g_new = f.g(x)
y, g = g_new - g, g_new
rho = 1 / (s' * y)
H = (E - rho * y * s')' * H * (E - rho * y * s') + rho * s * s'
end
)
draw(BFGS(rb, x0))
结果:
Fletcher-Reeves 共轭梯度法
代码:
ConjugateGradientFR(f, x0, tol = 1e-6, max_iter = 1000) = @iterDescent(
begin
g = f.g(x0); d = -g
end,
nothing,
begin
g_new = f.g(x)
b, g = (g_new' * g_new) / (g' * g), g_new
d = - g_new + b * d
end
)
draw(ConjugateGradientFR(rb, x0))
结果:
References
北大李东风 Julia语言入门 教材:https://www.math.pku.edu.cn/teachers/lidf/docs/Julia/html/_book/index.html
快速入门: 一份简单而粗略的语言概览 :https://cheatsheet.juliadocs.org/zh-cn/
最优化:建模、算法与理论 (刘浩洋等)