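- 1. The fixed-point theorem and verifying its conditions
- 2. Order of convergence, convergence detection, and convergence acceleration
- 2.1 Estimating the order of convergence of the fixed-point iteration $x_{k+1}=g(x_k)$
- 2.2 Predicting the number of iterations needed for a given accuracy
- 2.3 Accelerating convergence
- 2.4 Stopping criterion for fixed-point iteration
- 2.5 Two drawbacks of fixed-point iteration
- 3. Application: solving the nonlinear equation $f(x)=0$
- 3.1 Bisection method (Bisection Method of Bolzano)
- 3.2 False position method
- 3.3 Newton-Raphson method
- 3.4 Secant method
- 3.5 Aitken acceleration
- 3.6 Muller's method
- 4. Other issues
- 4.1 Finding initial values
- 4.2 Convergence criteria
- 4.3 Comparison of convergence rates
- 4.4 Choosing an algorithm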
1. The fixed-point theorem and verifying its conditions
Definition of a fixed point: $P=g(P)$
Fixed-point iteration: $x_{k+1}=g(x_k)$
Theorem: if $g$ is continuous and the sequence $\{x_k\}$ converges, then $\{x_k\}$ converges to a solution of the equation $x=g(x)$:
$$x^{*}=g\left(x^{*}\right) \text{ and } x_{k}\to x^{*}$$
Theorem: assume
(1) $g, g' \in C[a,b]$ (both continuous on $[a,b]$)
(2) $K$ is a positive constant
(3) $p_0\in(a,b)$
(4) $g(x)\in[a,b]$ for all $x\in[a,b]$
Then
(a) if $|g'(x)| \le K<1$ for all $x\in[a,b]$, the iteration $x_{k+1}=g(x_k)$ converges.
(b) if $|g'(x)|>1$ for all $x\in[a,b]$, the iteration $x_{k+1}=g(x_k)$ does not converge (unless an iterate lands exactly on the fixed point).
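If the slope $k$ of the tangent to the curve satisfies $k\in(-1,1)$, the iterates gradually converge:
[Figure: convergent iteration, tangent slope in $(-1,1)$]
If the slope satisfies $k\in(-\infty,-1)\cup(1,\infty)$, the iterates do not converge:
[Figure: divergent iteration, tangent slope outside $[-1,1]$]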
In summary, the two most important conditions for fixed-point iteration are:
$$
\begin{aligned}
&(1)\ |g'(x)| \le K<1,\ \forall x\in[a,b] && \text{[bound on } g'(x)\text{]}\\
&(2)\ g(x)\in[a,b],\ \forall x\in[a,b],\ \text{that is, } g([a,b])\subset[a,b] && \text{[bound on } g(x)\text{]}
\end{aligned}
$$
The range condition is checked differently for monotone and non-monotone $g$: for a monotone $g$ it is enough to look at the values at the endpoints, whereas for a non-monotone $g$ the interior extrema must be checked as well.
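As a rough illustration of checking the two conditions, the sketch below samples $[a,b]$ on a grid and then runs the iteration. The example $g(x)=\cos x$ on $[0,1]$, the helper names `check_conditions` and `fixed_point`, and the grid size are assumptions chosen for illustration; a grid check is only a numerical hint, not a proof.

```python
import math

def check_conditions(g, dg, a, b, samples=1001):
    """Sample [a, b]; return (K, maps_into) with K = max |g'(x)| on the grid."""
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    K = max(abs(dg(x)) for x in xs)
    maps_into = all(a <= g(x) <= b for x in xs)
    return K, maps_into

def fixed_point(g, x0, tol=1e-10, max_iter=1000):
    """Iterate x_{k+1} = g(x_k) until two successive iterates differ by < tol."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical example: g(x) = cos(x) on [0, 1]
K, ok = check_conditions(math.cos, lambda x: -math.sin(x), 0.0, 1.0)
root, iters = fixed_point(math.cos, 0.5)
print(K, ok, root, iters)
```

Here `K` stays below 1 and every sampled value of `g` lies in `[0, 1]`, so both conditions appear to hold and the iteration settles on the fixed point of $\cos x$.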
2. Order of convergence, convergence detection, and convergence acceleration
Definition: if
$$\left|x_{k+1}-x^{*}\right| \le C\left|x_{k}-x^{*}\right|^{p},\quad k>M,\ \text{for } C>0,\ p>0$$
or
$$\lim_{k\to\infty}\frac{\left|x_{k+1}-x^{*}\right|}{\left|x_{k}-x^{*}\right|^{p}}=C$$
then the sequence is said to converge with order $p$.
In particular:
$p=1$: linear convergence (this requires $C<1$)
$1<p<2$: superlinear convergence
$p=2$: quadratic convergence
2.1 Estimating the order of convergence of the fixed-point iteration $x_{k+1}=g(x_k)$
Theorem: let $x^*$ be the fixed point. If $g'(x^*)=g''(x^*)=\ldots=g^{(p-1)}(x^*)=0$ and $g^{(p)}(x^*)\neq 0$, then $x_{k+1}=g(x_k)$ converges with order $p$.
Proof:
$$
\begin{aligned}
x_{k+1}&=g\left(x_{k}\right)=g\left(x^{*}\right)+g'\left(x^{*}\right)\left(x_{k}-x^{*}\right)+\ldots\\
&\quad+\frac{g^{(p-1)}\left(x^{*}\right)\left(x_{k}-x^{*}\right)^{p-1}}{(p-1)!}+\frac{g^{(p)}(\xi)\left(x_{k}-x^{*}\right)^{p}}{p!},\quad \xi \text{ between } x_k \text{ and } x^{*}\\
\Rightarrow\ & x_{k+1}=x^{*}+\frac{g^{(p)}(\xi)\left(x_{k}-x^{*}\right)^{p}}{p!}\\
\Rightarrow\ & \frac{x_{k+1}-x^{*}}{\left(x_{k}-x^{*}\right)^{p}}=\frac{g^{(p)}(\xi)}{p!}\to\frac{g^{(p)}\left(x^{*}\right)}{p!}
\end{aligned}
$$
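The order can also be estimated numerically from the errors $e_k=|x_k-x^*|$ via $p\approx\ln(e_{k+1}/e_k)/\ln(e_k/e_{k-1})$. The sketch below is an illustration under assumptions: the two example maps, the reference fixed points, and the helper names `iterate` and `estimate_order` are not from the text above.

```python
import math

def iterate(g, x0, n):
    """Return the list [x_0, x_1, ..., x_n] with x_{k+1} = g(x_k)."""
    xs = [x0]
    for _ in range(n):
        xs.append(g(xs[-1]))
    return xs

def estimate_order(xs, x_star):
    """Estimate p from the last three errors (they must all be nonzero)."""
    e = [abs(x - x_star) for x in xs]
    return math.log(e[-1] / e[-2]) / math.log(e[-2] / e[-3])

# g(x) = cos(x): g'(x*) != 0, so the theorem predicts p = 1
p_linear = estimate_order(iterate(math.cos, 0.5, 20), 0.7390851332151607)

# g(x) = (x + 2/x) / 2: g'(sqrt(2)) = 0 and g''(sqrt(2)) != 0, so p = 2
p_newton = estimate_order(iterate(lambda x: 0.5 * (x + 2.0 / x), 1.0, 4),
                          math.sqrt(2.0))
print(p_linear, p_newton)
```

Only a few iterates are used for the second map because a quadratically convergent sequence reaches machine precision quickly, after which the error ratios are meaningless.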
2.2 Predicting the number of iterations needed for a given accuracy
Define $L=\max_{x\in[a,b]}\left|g'(x)\right|<1$.
The number of iterations $k$ needed to reach accuracy $\varepsilon$ satisfies: $k\ge\ln\left(\varepsilon(1-L)/\left|x_1-x_0\right|\right)/\ln L$
Proof:
$$
\begin{aligned}
&x_{k+1}=g\left(x_{k}\right)=g\left(x^{*}\right)+g'(\xi)\left(x_{k}-x^{*}\right)=x^{*}+g'(\xi)\left(x_{k}-x^{*}\right)\\
\Rightarrow\ &\left|x_{k+1}-x^{*}\right|\le\left|g'(\xi)\right|\left|x_{k}-x^{*}\right|\le L\left|x_{k}-x^{*}\right|\le L^{k}\left|x_{1}-x^{*}\right|
\end{aligned}
$$
Moreover,
$$\left|x_{k+1}-x_{k}\right|=\left|g\left(x_{k}\right)-g\left(x_{k-1}\right)\right|\le L\left|x_{k}-x_{k-1}\right|\le L^{k}\left|x_{1}-x_{0}\right|$$
Hence:
$$
\begin{aligned}
\left|x_{k+q}-x_{k}\right|&\le\left|x_{k+q}-x_{k+q-1}\right|+\left|x_{k+q-1}-x_{k+q-2}\right|+\ldots+\left|x_{k+1}-x_{k}\right|\\
&\le\left(L^{q-1}+L^{q-2}+\ldots+1\right)\left|x_{k+1}-x_{k}\right|\\
&\le\left(1+L+L^{2}+\ldots+L^{q-1}+\ldots\right)\left|x_{k+1}-x_{k}\right|\\
&=\frac{1}{1-L}\left|x_{k+1}-x_{k}\right|\\
&\le\frac{L^{k}}{1-L}\left|x_{1}-x_{0}\right|
\end{aligned}
$$
Letting $q\to\infty$ gives
$$\left|x^{*}-x_{k}\right|\le\frac{1}{1-L}\left|x_{k+1}-x_{k}\right|\le\frac{L^{k}}{1-L}\left|x_{1}-x_{0}\right|$$
Therefore (note that $\ln L<0$, so dividing by it reverses the inequality):
$$\frac{L^{k}}{1-L}\left|x_{1}-x_{0}\right|\le\varepsilon\ \Rightarrow\ k\ge\ln\left(\varepsilon(1-L)/\left|x_{1}-x_{0}\right|\right)/\ln L$$
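To see how the bound behaves in practice, the following sketch evaluates it for an assumed example, $g(x)=\cos x$ on $[0,1]$ with $L=\sin 1$, and compares it with the number of iterations actually needed. The function names and the reference value of the fixed point are illustrative assumptions.

```python
import math

def predicted_iterations(L, x0, x1, eps):
    """Smallest integer k with L**k / (1 - L) * |x1 - x0| <= eps, for 0 < L < 1."""
    k = math.log(eps * (1.0 - L) / abs(x1 - x0)) / math.log(L)
    return max(0, math.ceil(k))

def actual_iterations(g, x0, x_star, eps, max_iter=10_000):
    """Count iterations until |x_k - x_star| <= eps (x_star is a known reference)."""
    x, k = x0, 0
    while abs(x - x_star) > eps:
        x = g(x)
        k += 1
        if k > max_iter:
            raise RuntimeError("iteration did not reach the requested accuracy")
    return k

x0, eps = 0.5, 1e-6
L = math.sin(1.0)                      # max |g'(x)| = max |sin x| on [0, 1]
predicted = predicted_iterations(L, x0, math.cos(x0), eps)
actual = actual_iterations(math.cos, x0, 0.7390851332151607, eps)
print(predicted, actual)
```

The prediction is an upper bound: it uses the worst-case contraction factor over the whole interval, so the observed count is usually noticeably smaller.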
2.3 Accelerating convergence
Near the fixed point, the error of a linearly convergent iteration shrinks by an approximately constant factor $L$ per step, and this factor can be eliminated:
$$
\begin{aligned}
&x_{k+1}-x^{*}\approx L\left(x_{k}-x^{*}\right)\\
&x_{k+2}-x^{*}\approx L\left(x_{k+1}-x^{*}\right)\\
&\frac{x_{k+1}-x^{*}}{x_{k+2}-x^{*}}\approx\frac{x_{k}-x^{*}}{x_{k+1}-x^{*}}\ \Rightarrow\ x^{*}\approx x_{k}-\frac{\left(x_{k+1}-x_{k}\right)^{2}}{x_{k+2}-2x_{k+1}+x_{k}}=x^{\Delta}
\end{aligned}
$$
Following this idea, each step can be organized as:
$$
\begin{aligned}
\text{Iteration:}&\quad\bar{x}_{k+1}=g\left(x_{k}\right)\\
\text{One more:}&\quad\hat{x}_{k+1}=g\left(\bar{x}_{k+1}\right)\\
\text{To speed up:}&\quad x_{k+1}=x_{k}-\frac{\left(\bar{x}_{k+1}-x_{k}\right)^{2}}{\hat{x}_{k+1}-2\bar{x}_{k+1}+x_{k}}
\end{aligned}
$$
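The three-step scheme above can be sketched as follows. The function name `steffensen`, the tolerance, the zero-denominator guard, and the test map $g(x)=\cos x$ are assumptions made for illustration.

```python
import math

def steffensen(g, x0, tol=1e-12, max_iter=50):
    """Accelerated fixed-point iteration: two g-evaluations, then one extrapolation."""
    x = x0
    for k in range(1, max_iter + 1):
        x_bar = g(x)                       # Iteration
        x_hat = g(x_bar)                   # One more
        denom = x_hat - 2.0 * x_bar + x
        if denom == 0.0:
            # The iterates have stopped changing; x_hat is the best available value.
            return x_hat, k
        x_new = x - (x_bar - x) ** 2 / denom   # To speed up
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("accelerated iteration did not converge")

root, iters = steffensen(math.cos, 0.5)
print(root, iters)
```

For this example the accelerated scheme needs only a handful of steps, whereas the plain iteration $x_{k+1}=\cos x_k$ needs dozens to reach a comparable accuracy.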
2.4 Stopping criterion for fixed-point iteration
When $L=\max_{x\in[a,b]}\left|g'(x)\right|<1$, the following criterion can be used:
$$\left|x_{k+1}-x_{k}\right|<\mathrm{eps}$$
By the bound $\left|x^{*}-x_{k}\right|\le\frac{1}{1-L}\left|x_{k+1}-x_{k}\right|$ from 2.2, this guarantees $\left|x^{*}-x_{k}\right|<\mathrm{eps}/(1-L)$.
2.5 Two drawbacks of fixed-point iteration
- It is hard to estimate $L=\max_{x\in[a,b]}\left|g'(x)\right|$.
- When $L\ge 1$, convergence is not guaranteed.
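3. Application: solving the nonlinear equation $f(x)=0$
3.1 Bisection method (Bisection Method of Bolzano)
Steps of the algorithm:
- Start from an interval that brackets a root.
- Split the interval at its midpoint.
- Keep the subinterval that still brackets the root as the new interval.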
$$
\begin{aligned}
&a=x_{0},\quad b=x_{0}+h\\
&c=\frac{a+b}{2}\\
&f(a)f(b)<0
\end{aligned}
$$
This gives:
$$
\begin{aligned}
&[a,b]\to\left[a_{1},b_{1}\right]\to\left[a_{2},b_{2}\right]\to\ldots\to\left[a_{n},b_{n}\right]\\
&a=a_{0}\le a_{1}\le\cdots\le a_{n}\le\cdots\le r\le\cdots\le b_{n}\le\cdots\le b_{1}\le b_{0}=b
\end{aligned}
$$
Here $r$ denotes the exact root.
$$
\begin{aligned}
&\left|r-c_{n}\right|\le\frac{b-a}{2^{n+1}},\quad\text{for } n=0,1,2,\ldots\\
&c_{n}=\frac{a_{n}+b_{n}}{2}
\end{aligned}
$$
Number of iterations $N$:
$$
\begin{aligned}
&\left|r-c_{n}\right|\le\frac{b-a}{2^{n+1}}<\delta\\
&2^{n+1}>\frac{b-a}{\delta}\\
&(n+1)\ln 2>\ln(b-a)-\ln\delta\\
&n+1>\frac{\ln(b-a)-\ln\delta}{\ln 2}\\
&N=\operatorname{int}\left(\frac{\ln(b-a)-\ln\delta}{\ln 2}\right)
\end{aligned}
$$
The same sign test gives a simple way to decide whether an interval contains a zero: look for a change of sign on the interval (for example between the maximum and the minimum value).
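A minimal sketch of the procedure, using the iteration count $N$ derived above instead of a convergence test. The function name `bisection` and the example $f(x)=x^2-2$ on $[1,2]$ are assumptions for illustration.

```python
import math

def bisection(f, a, b, delta=1e-8):
    """Return the midpoint c_N of the N-th bracket, so that |r - c_N| < delta."""
    fa, fb = f(a), f(b)
    if fa == 0.0:
        return a
    if fb == 0.0:
        return b
    if fa * fb > 0.0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    # N = int((ln(b - a) - ln(delta)) / ln 2), as derived above
    n_steps = max(0, int((math.log(b - a) - math.log(delta)) / math.log(2.0)))
    for _ in range(n_steps):
        c = 0.5 * (a + b)
        fc = f(c)
        if fc == 0.0:
            return c
        if fa * fc < 0.0:      # the root lies in [a, c]
            b, fb = c, fc
        else:                  # the root lies in [c, b]
            a, fa = c, fc
    return 0.5 * (a + b)

root = bisection(lambda x: x * x - 2.0, 1.0, 2.0, delta=1e-8)
print(root)
```

Because the number of halvings is fixed in advance, the cost depends only on the bracket width and the tolerance, not on the shape of $f$.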
3.2 False position method
Steps of the algorithm:
- Start from an interval that brackets a root.
- Split the interval at the point where the secant line crosses the x-axis. (Throughout, the endpoints keep opposite signs, so the interval always contains a zero.)
- Keep the subinterval that still brackets the root as the new interval.
$$
\begin{aligned}
&c=b-\frac{f(b)(b-a)}{f(b)-f(a)}\\
&c_{1}\to c_{2}\to\ldots\to r\\
&\left[a_{n},b_{n}\right]\to[a,c]\ \text{or}\ [c,b]:=\left[a_{n+1},b_{n+1}\right]
\end{aligned}
$$
Drawback: for a convex or concave function one endpoint never moves, so the bracket width does not shrink to zero and convergence can be slow.
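The sketch below implements the update for $c$ and also returns the final bracket, which makes the stuck-endpoint behaviour visible. The function name, the residual-based stopping rule $|f(c)|<\varepsilon$, and the convex example $f(x)=x^2-2$ are assumptions for illustration.

```python
def false_position(f, a, b, eps=1e-10, max_iter=200):
    """Return (c, iterations, final bracket) with |f(c)| < eps."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0.0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for k in range(1, max_iter + 1):
        c = b - fb * (b - a) / (fb - fa)   # secant line crosses the x-axis here
        fc = f(c)
        if abs(fc) < eps:
            return c, k, (a, b)
        if fa * fc < 0.0:                  # sign change in [a, c]
            b, fb = c, fc
        else:                              # sign change in [c, b]
            a, fa = c, fc
    raise RuntimeError("false position did not converge")

root, iters, bracket = false_position(lambda x: x * x - 2.0, 1.0, 2.0)
print(root, iters, bracket)
```

In this convex example the right endpoint of the bracket is still `2.0` at the end: only the left endpoint moves toward the root.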
3.3 Newton-Raphson method
Can the fixed-point iteration be used to solve a nonlinear equation?
Use the Taylor expansion, with $d=x_{k+1}-x_{k}$:
$$f\left(x_{k+1}\right)=f\left(x_{k}\right)+f'\left(x_{k}\right)\left(x_{k+1}-x_{k}\right)+O\left(|d|^{2}\right)=0$$
Dropping the second-order term, we require
$$f\left(x_{k}\right)+f'\left(x_{k}\right)\left(x_{k+1}-x_{k}\right)=0$$
which gives:
$$x_{k+1}=x_{k}-f\left(x_{k}\right)/f'\left(x_{k}\right)=g\left(x_{k}\right)$$
In summary, the Newton-Raphson method is:
$$
\begin{array}{l}
f(x)=0\\
x=g(x)=x-\dfrac{f(x)}{f'(x)}\\
x_{k+1}=g\left(x_{k}\right)=x_{k}-\dfrac{f\left(x_{k}\right)}{f'\left(x_{k}\right)}
\end{array}
$$
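A minimal sketch of the update rule, assuming the derivative is available as a separate function `df`. The function name, the step-size stopping rule, and the example $f(x)=x^2-2$ are illustrative assumptions.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson iteration x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for k in range(1, max_iter + 1):
        d = df(x)
        if d == 0.0:
            raise ZeroDivisionError("f'(x_k) is zero; the Newton step is undefined")
        x_new = x - f(x) / d
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("Newton-Raphson did not converge")

root, iters = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
print(root, iters)
```

The explicit check on `d` and the iteration cap are there because the method can hit a zero derivative or fail to settle when started far from the root.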
We can prove that the Newton-Raphson method converges in a neighborhood of the solution.
Proof:
$$g(x)=x-f(x)/f'(x)$$
$$g'(x)=1-\frac{f'(x)f'(x)-f(x)f''(x)}{\left[f'(x)\right]^{2}}=\frac{f(x)f''(x)}{\left[f'(x)\right]^{2}}$$
Recall the fixed-point conditions $\left|g'(x)\right|\le K<1$ and $g([a,b])\subset[a,b]$. Since $f(x^*)=0$, we have $g'(x^*)=0$ (provided $f'(x^*)\neq 0$), so by continuity $\left|g'(x)\right|\le K<1$ in a sufficiently small neighborhood of $x^*$; on such a neighborhood the condition $g([a,b])\subset[a,b]$ is satisfied as well.
Derivations under various conditions (optional; read on if you are interested)
- Case $f'(x^*)>0$ and $f''(x^*)<0$; goal: $g\left(\left[x^*-\delta,x^*+\delta\right]\right)\subset\left[x^*-\delta,x^*+\delta\right]$
$$
\begin{aligned}
&x^{*}-\delta<g\left(x^{*}-\delta\right)=\left(x^{*}-\delta\right)-\frac{f\left(x^{*}-\delta\right)}{f'\left(x^{*}-\delta\right)}\\
\Leftrightarrow\ &0<-\frac{f\left(x^{*}-\delta\right)}{f'\left(x^{*}-\delta\right)}\\
\Leftrightarrow\ &\frac{f\left(x^{*}-\delta\right)}{f'\left(x^{*}-\delta\right)}<0\\
\Leftrightarrow\ &f\left(x^{*}-\delta\right)<0\\
\Leftrightarrow\ &f\left(x^{*}\right)-f'(\xi)\delta<0,\quad\xi\in\left[x^{*}-\delta,x^{*}\right]\\
\Leftrightarrow\ &-f'(\xi)\delta<0\\
\Leftarrow\ &\exists\,\delta_{1}>0:\ f'(\xi)>0\ \text{ for } x^{*}-\xi<\delta_{1}
\end{aligned}
$$
Moreover,
$$
\begin{aligned}
&f''\left(x^{*}\right)<0\\
\Rightarrow\ &\exists\,\delta_{2}>0:\ f''(x)<0\quad\text{[sign preservation]}\\
\Rightarrow\ &g'(x)=\frac{f(x)f''(x)}{\left[f'(x)\right]^{2}}>0\ \text{ for } 0<x^{*}-x<\delta_{2}\\
&\text{[for } \delta_{2} \text{ small enough: } f'(x)>0,\ x<x^{*},\ f\left(x^{*}\right)=0 \text{ give } f(x)<0\text{]}
\end{aligned}
$$
For $\delta<\min\{\delta_1,\delta_2\}$ we have:
$$x^{*}-\delta<g\left(x^{*}-\delta\right)<g(x),\quad\text{for } 0<x^{*}-x<\delta$$
- Case $f'(x^*)>0$ and $f''(x^*)>0$; goal: $g\left(\left[x^*-\delta,x^*+\delta\right]\right)\subset\left[x^*-\delta,x^*+\delta\right]$
The chain of equivalences for the left endpoint $x^*-\delta$ is identical to the first case and again yields a $\delta_1>0$ with $f'(\xi)>0$ for $x^*-\xi<\delta_1$.
Moreover,
$$
\begin{aligned}
&f''\left(x^{*}\right)>0\\
\Rightarrow\ &\exists\,\delta_{2}>0:\ f''(x)>0\quad\text{[sign preservation]}\\
\Rightarrow\ &g'(x)=\frac{f(x)f''(x)}{\left[f'(x)\right]^{2}}<0\ \text{ for } 0<x^{*}-x<\delta_{2}\\
&\text{[for } \delta_{2} \text{ small enough: } f'(x)>0,\ x<x^{*},\ f\left(x^{*}\right)=0 \text{ give } f(x)<0\text{]}
\end{aligned}
$$
For $\delta<\min\{\delta_1,\delta_2\}$ we have:
$$x^{*}-\delta<x^{*}=g\left(x^{*}\right)<g(x),\quad\text{for } 0<x^{*}-x<\delta$$
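Note that for a simple root $p$ the Newton-Raphson method converges quadratically (second order), where $E_n$ is the error of the $n$-th iterate $p_n$:
$$\left|E_{n+1}\right|\approx\frac{\left|f''(p)\right|}{2\left|f'(p)\right|}\left|E_{n}\right|^{2},\quad n\to\infty$$
Proof (sketch): expanding $0=f(p)=f(p_n)+f'(p_n)(p-p_n)+\frac{1}{2}f''(\xi)(p-p_n)^2$ and dividing by $f'(p_n)$ gives $p-p_{n+1}=-\frac{f''(\xi)}{2f'(p_n)}(p-p_n)^2$.
For a root of multiplicity $M>1$, convergence is only linear (first order), so the method slows down:
$$\left|E_{n+1}\right|\approx\frac{M-1}{M}\left|E_{n}\right|,\quad n\to\infty$$
Proof (sketch): write $f(x)=(x-p)^M h(x)$ with $h(p)\neq 0$. Then $f(x)/f'(x)=(x-p)h(x)/\left(Mh(x)+(x-p)h'(x)\right)$, so $g(x)=x-f(x)/f'(x)$ has $g'(p)=1-\frac{1}{M}=\frac{M-1}{M}\neq 0$.
At a multiple root $p^*$ we have $f'(p^*)=0$, so the denominator in the Newton-Raphson formula tends to 0. In general, however, the numerator $f(p_k)$ tends to 0 faster than the denominator $f'(p_k)$, so the method usually remains usable.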
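Problems with the Newton-Raphson method:
1. The denominator may be 0, and division by zero is not allowed.
2. It may converge to a different root, or diverge.
3. It may produce a cyclic sequence.
4. It may produce a divergent oscillating sequence.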
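Because convergence at a multiple root is only linear, the Newton-Raphson iteration can be accelerated for a root of multiplicity $M$:
$$p_{k}=p_{k-1}-\frac{Mf\left(p_{k-1}\right)}{f'\left(p_{k-1}\right)},\quad M>1$$
Proof (sketch): with $f(x)=(x-p)^M h(x)$ and $h(p)\neq 0$, the map $g(x)=x-Mf(x)/f'(x)$ satisfies $g'(p)=1-M\cdot\frac{1}{M}=0$, so by the theorem in 2.1 the convergence is at least quadratic.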
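The effect of the factor $M$ can be seen on a small example. In this sketch the function $f(x)=(x-1)^2(x+2)$ with a double root at $x=1$, the helper name `newton_multiplicity`, and the tolerances are assumptions chosen for illustration; `M=1` gives the plain method.

```python
def newton_multiplicity(f, df, x0, M=1, tol=1e-10, max_iter=100):
    """Iterate p_k = p_{k-1} - M * f(p_{k-1}) / f'(p_{k-1})."""
    x = x0
    for k in range(1, max_iter + 1):
        fx = f(x)
        if fx == 0.0:            # landed exactly on a root
            return x, k - 1
        d = df(x)
        if d == 0.0:
            raise ZeroDivisionError("f'(p) is zero away from a root")
        x_new = x - M * fx / d
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("iteration did not converge")

# f has a double root at x = 1 (M = 2) and a simple root at x = -2
f = lambda x: (x - 1.0) ** 2 * (x + 2.0)
df = lambda x: (x - 1.0) * (3.0 * x + 3.0)

root_plain, it_plain = newton_multiplicity(f, df, 2.0, M=1)
root_fast, it_fast = newton_multiplicity(f, df, 2.0, M=2)
print(it_plain, it_fast)
```

The plain method roughly halves the error at each step, consistent with the factor $(M-1)/M=1/2$, while the version with `M=2` needs far fewer steps. The factored form of `f` and `df` is used to limit cancellation error near the root.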
3.4 Secant method
When the derivative in the Newton-Raphson method is hard to express explicitly, it can be approximated by the slope of the line through the two most recent points.
This gives:
$$x_{k+2}=g\left(x_{k},x_{k+1}\right)=x_{k+1}-\frac{f\left(x_{k+1}\right)\left(x_{k+1}-x_{k}\right)}{f\left(x_{k+1}\right)-f\left(x_{k}\right)}$$
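A short sketch of the two-point recurrence; the function name `secant`, the stopping rule, and the example are assumptions for illustration.

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant iteration: Newton's method with f' replaced by a difference quotient."""
    f0, f1 = f(x0), f(x1)
    for k in range(1, max_iter + 1):
        if f1 == f0:
            raise ZeroDivisionError("f(x_{k+1}) == f(x_k); the secant is horizontal")
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2, k
        x0, f0 = x1, f1          # shift the two-point window
        x1, f1 = x2, f(x2)
    raise RuntimeError("secant method did not converge")

root, iters = secant(lambda x: x * x - 2.0, 1.0, 2.0)
print(root, iters)
```

Each step costs one new evaluation of `f` and none of its derivative, which is the main practical appeal compared with Newton-Raphson.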
3.5 Aitken acceleration
When applied to a fixed-point iteration, Aitken's process is also known as Steffensen's acceleration. Note that it is only effective for first-order (linearly convergent) methods.
$$\lim_{n\to\infty}\frac{p-p_{n+1}}{p-p_{n}}=A,\quad p\approx\frac{p_{n+2}p_{n}-p_{n+1}^{2}}{p_{n+2}-2p_{n+1}+p_{n}}=q_{n}$$
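The formula for $q_n$ can be applied to any stored, linearly convergent sequence. The sketch below does this for the iterates of $p_{n+1}=\cos p_n$; the function name `aitken`, the example sequence, and the reference limit are assumptions for illustration.

```python
import math

def aitken(p):
    """Return the Aitken sequence q_n built from consecutive triples of p."""
    q = []
    for n in range(len(p) - 2):
        denom = p[n + 2] - 2.0 * p[n + 1] + p[n]
        if denom == 0.0:         # the sequence has stopped changing
            break
        q.append((p[n + 2] * p[n] - p[n + 1] ** 2) / denom)
    return q

# Linearly convergent sequence p_{n+1} = cos(p_n), p_0 = 0.5
p = [0.5]
for _ in range(10):
    p.append(math.cos(p[-1]))
q = aitken(p)

limit = 0.7390851332151607       # reference value of the fixed point of cos
print(abs(p[-1] - limit), abs(q[-1] - limit))
```

The last accelerated value is much closer to the limit than the last plain iterate, even though both use the same data. In floating point, the algebraically equivalent form $p_n-(p_{n+1}-p_n)^2/(p_{n+2}-2p_{n+1}+p_n)$ is often preferred because it suffers less from cancellation.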
3.6 Muller's method
Given three initial points
$$\left(p_{0},f\left(p_{0}\right)\right),\ \left(p_{1},f\left(p_{1}\right)\right),\ \left(p_{2},f\left(p_{2}\right)\right)$$
let
$$
\begin{aligned}
&t=x-p_{2}\\
&h_{0}=p_{0}-p_{2},\quad h_{1}=p_{1}-p_{2}
\end{aligned}
$$
We use a quadratic through the three points to compute the next point:
$$y=at^{2}+bt+c$$
With $f_i=f(p_i)$, this gives:
$$
\begin{aligned}
t=h_{0}:\ ah_{0}^{2}+bh_{0}+c=f_{0}&\ \Rightarrow\ ah_{0}^{2}+bh_{0}=f_{0}-c=e_{0}\\
t=h_{1}:\ ah_{1}^{2}+bh_{1}+c=f_{1}&\ \Rightarrow\ ah_{1}^{2}+bh_{1}=f_{1}-c=e_{1}\\
t=0:\ a\cdot 0^{2}+b\cdot 0+c=f_{2}&\ \Rightarrow\ c=f_{2}
\end{aligned}
$$
Solving for $a$ and $b$:
$$a=\frac{e_{0}h_{1}-e_{1}h_{0}}{h_{1}h_{0}^{2}-h_{0}h_{1}^{2}},\quad b=\frac{e_{1}h_{0}^{2}-e_{0}h_{1}^{2}}{h_{1}h_{0}^{2}-h_{0}h_{1}^{2}}$$
We then obtain:
$$
\begin{aligned}
&at^{2}+bt+c=0:\quad t=z_{1},z_{2}\ \Rightarrow\ z_{i}=\frac{-2c}{b\pm\sqrt{b^{2}-4ac}}\\
&z=\arg\min\left\{\left|z_{i}\right|\right\}\quad\text{[if the value is complex, only its real part is kept in the computation]}
\end{aligned}
$$
$$p_{3}=p_{2}+z$$
The iteration continues with $\left(\bar{p}_{1},\bar{p}_{2},p_{3}\right)$, where $\bar{p}_{1},\bar{p}_{2}$ are the two of the previous points closest to $p_3$.
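The steps above can be sketched as follows. The function name `muller`, the stopping rule $|z|<\text{tol}$, the use of `cmath.sqrt` for a possibly negative discriminant, and the test function $f(x)=x^3-x-1$ are assumptions for illustration.

```python
import cmath

def muller(f, p0, p1, p2, tol=1e-12, max_iter=50):
    """Muller's method on real data, keeping only the real part of each step z."""
    for k in range(1, max_iter + 1):
        h0, h1 = p0 - p2, p1 - p2
        c = f(p2)
        e0, e1 = f(p0) - c, f(p1) - c
        det = h1 * h0 ** 2 - h0 * h1 ** 2
        if det == 0.0:
            raise ZeroDivisionError("the three points must be distinct")
        a = (e0 * h1 - e1 * h0) / det
        b = (e1 * h0 ** 2 - e0 * h1 ** 2) / det
        disc = cmath.sqrt(b * b - 4.0 * a * c)
        # The denominator of larger magnitude yields the root z of smaller magnitude.
        denom = b + disc if abs(b + disc) >= abs(b - disc) else b - disc
        if denom == 0:
            raise ZeroDivisionError("degenerate parabola")
        z = (-2.0 * c / denom).real
        p3 = p2 + z
        if abs(z) < tol:
            return p3, k
        # Keep the two previous points closest to p3, then append p3.
        p0, p1 = sorted((p0, p1, p2), key=lambda p: abs(p - p3))[:2]
        p2 = p3
    raise RuntimeError("Muller's method did not converge")

f = lambda x: x ** 3 - x - 1.0
root, iters = muller(f, 1.0, 1.5, 2.0)
print(root, iters)
```

Picking the denominator of larger magnitude is the same choice as $z=\arg\min\{|z_i|\}$, and it also avoids cancellation between $b$ and the square root.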
4. Other issues
4.1 Finding initial values
For example:
[Figure: a curve with roots $r_1$, $r_2$ and $r_3$]
Two tests can be used while stepping through sample points $x_k$:
- [for $r_1$ and $r_2$] a change of sign between neighboring samples:
$$f\left(x_{k-1}\right)f\left(x_{k}\right)<0,\quad[a,b]=\left[x_{k-1},x_{k}\right]$$
- [for $r_3$] a small function value together with a change in the sign of the slope:
$$\left|f\left(x_{k}\right)\right|<\varepsilon\ \text{ and }\ \left(f\left(x_{k}\right)-f\left(x_{k-1}\right)\right)\left(f\left(x_{k+1}\right)-f\left(x_{k}\right)\right)<0,\quad[a,b]=\left[x_{k-1},x_{k+1}\right]$$
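Both tests can be applied while scanning a uniform grid, as in the sketch below. The function name `find_brackets`, the grid spacing, and the two example functions are assumptions for illustration.

```python
def find_brackets(f, x_start, x_end, n=100, eps=1e-6):
    """Scan n equal steps and return candidate intervals [a, b] around roots."""
    h = (x_end - x_start) / n
    xs = [x_start + i * h for i in range(n + 1)]
    ys = [f(x) for x in xs]
    brackets = []
    for k in range(1, n + 1):
        if ys[k - 1] * ys[k] < 0.0:
            # Test 1: sign change between neighboring samples
            brackets.append((xs[k - 1], xs[k]))
        elif (k < n and abs(ys[k]) < eps
              and (ys[k] - ys[k - 1]) * (ys[k + 1] - ys[k]) < 0.0):
            # Test 2: |f| is tiny and the slope changes sign (root touching the axis)
            brackets.append((xs[k - 1], xs[k + 1]))
    return brackets

crossing = find_brackets(lambda x: x * x - 2.0, -2.0, 2.0, n=40)
touching = find_brackets(lambda x: (x - 1.0) ** 2, 0.0, 2.0, n=20)
print(crossing, touching)
```

The second test only fires if a sample happens to land within $\varepsilon$ of zero in function value, so in practice it needs a fine grid or a generous $\varepsilon$.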
4.2 Convergence criteria
Two stopping criteria can be used:
1. Based on the ordinate (function value):
$$\left|f\left(x_{k}\right)\right|<\varepsilon$$
The resulting error is: $\text{Error}_{x}=\left|x_{k}-r\right|$
2. Based on the abscissa:
$$\left|x_{k}-x_{k-1}\right|<\delta$$
It comes from replacing the unknown root $r$ in the ideal criterion by the previous iterate:
$$\left|x_{k}-r\right|<\delta\ \Rightarrow\ \left|x_{k}-x_{k-1}\right|<\delta$$
The resulting error is:
$$\text{Error}_{f}=\max\{|f(r-\delta)|,|f(r+\delta)|\}$$
3. The two criteria can also be combined:
$$\left|f\left(x_{k}\right)\right|<\varepsilon\ \text{ and }\ \left|x_{k}-r\right|<\delta$$
- For the Newton-Raphson method in particular, the following conditions also apply:
① $f'(r)\neq 0$
② $x_{0}\in[r-\delta,r+\delta]$, with $\delta$ sufficiently small.
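4.3 Comparison of convergence rates
4.4 Choosing an algorithm
Simple root:
Newton-Raphson method
Double root (the method can fail when the denominator becomes 0):
Newton-Raphson method
Steffensen's method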