(6) iLQR Regularization and Line Search
1. iLQR Regularization
In iLQR, we typically linearize the system dynamics and take a second-order approximation of the objective function. In the backward pass we need to invert $Q_{uu}$, the second-derivative matrix of the objective with respect to the control, in order to compute the control increment. In some cases, however, $Q_{uu}$ can be singular or nearly singular, which makes this inversion numerically unstable. To avoid this problem, we can add a regularization term so that $Q_{uu}$ becomes better conditioned.
Method 1: Gradient-descent-based regularization
A common regularization approach is to add a diagonal term to $Q_{uu}$, which is analogous to adjusting the learning rate in gradient descent.
Let the regularization term be $\mu I$, where $I$ is the identity matrix and $\mu$ is the regularization coefficient. We then adjust $Q_{uu}$ to
$$\tilde{Q}_{uu} = Q_{uu} + \mu I.$$
In this way, even when $Q_{uu}$ is not positive definite or is close to singular, we can still invert $\tilde{Q}_{uu}$ to compute the control update.
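A minimal sketch of this idea in Python (the function name and the coefficient `mu` are illustrative assumptions, not part of any particular library):

import numpy as np

def regularized_control_increment(Q_uu, Q_u, mu=1e-3):
    # Add a diagonal term mu*I so the Hessian stays invertible and well conditioned
    Q_uu_reg = Q_uu + mu * np.eye(Q_uu.shape[0])
    # Solve (Q_uu + mu*I) du = -Q_u instead of forming the inverse explicitly
    return -np.linalg.solve(Q_uu_reg, Q_u)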
Method 2: Trust-Region Regularization
Another regularization method is based on the idea of trust-region optimization: when optimizing the control increment, we limit its size. Specifically, the magnitude of the control increment is restricted to a fixed range:
$$\|\delta u_k\| \le \Delta,$$
where $\Delta$ is the trust-region radius. By limiting the update step in this way, no single update can become too large, which improves the robustness of the optimization.
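A minimal sketch for a scalar control, where the bound `max_step` plays the role of the trust-region radius (names are illustrative assumptions):

import numpy as np

def trust_region_control_increment(Q_u, Q_uu, max_step=1.0):
    du = -Q_u / Q_uu                         # unconstrained Newton-like increment
    return np.clip(du, -max_step, max_step)  # clamp the increment to the trust region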
Effects of regularization on the algorithm
Convergence: appropriate regularization improves the stability of DDP convergence, but an overly large regularization parameter can slow the algorithm down.
Step-size control: in some algorithms the regularization coefficient $\mu$ is adjusted dynamically based on how well the current iteration performs: if the iteration improves the cost, $\mu$ is decreased to speed up convergence; if it performs poorly, $\mu$ is increased to keep the iteration stable, as sketched below.
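One possible schedule, sketched in Python (the scaling factor and bounds are illustrative assumptions, not values from a specific reference):

def update_regularization(mu, improved, factor=2.0, mu_min=1e-6, mu_max=1e6):
    # Shrink mu after an improving iteration, grow it after a failed one
    if improved:
        return max(mu / factor, mu_min)   # trust the quadratic model more
    return min(mu * factor, mu_max)       # fall back toward a more conservative, gradient-like step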
2. Line Search
In iLQR, line search is a common technique used to further improve numerical stability and convergence when updating the control sequence. In the backward pass, iLQR computes a new control increment from a locally linearized system model and a quadratic approximation of the objective. Because these approximations are not exact, applying the resulting control update directly can increase the objective value, especially for highly nonlinear systems. To prevent the objective from getting worse, line search controls the step size so that the new control inputs actually improve performance.
Basic steps of line search
The core idea of line search is to shrink the step size along a given direction until a step is found that improves the objective. The specific steps are as follows (a compact sketch of the resulting backtracking loop is given after the list):
- Initial control increment: the backward pass produces a control increment $\delta u_k$; the control update is $u_k^{\text{new}} = u_k + \alpha\,\delta u_k$, where $\alpha$ is the step-size coefficient (initially set to 1).
- Improvement check: for each candidate $\alpha$, run a forward pass to compute the new objective value $J_{\text{new}}$ and compare it with the value before the update, $J_{\text{old}}$. If the objective decreases, i.e. $J_{\text{new}} < J_{\text{old}}$, accept the current step size and update the control sequence.
- Shrink the step: if the new control increment makes the objective worse, i.e. $J_{\text{new}} \ge J_{\text{old}}$, reduce the step size, for example to half its previous value (or by some other ratio), $\alpha \leftarrow \beta\alpha$ with $\beta = 0.5$, and re-evaluate the objective until a suitable step size is found.
- Repeat: repeat the forward simulation and objective comparison until a suitable $\alpha$ is found or the step size shrinks below some threshold.
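These steps can be written as a compact backtracking loop. The sketch below assumes hypothetical helpers `rollout(x0, u)` (forward simulation of the dynamics) and `total_cost(x, u)` (trajectory cost); they are placeholders, not functions from the demo code that follows:

def backtracking_line_search(u, du, x0, rollout, total_cost,
                             alpha=1.0, beta=0.5, alpha_min=1e-4):
    # rollout(x0, u) forward-simulates the dynamics; total_cost(x, u) evaluates J
    J_old = total_cost(rollout(x0, u), u)
    while alpha > alpha_min:
        u_new = u + alpha * du                # candidate control sequence
        x_new = rollout(x0, u_new)            # forward pass with the candidate controls
        if total_cost(x_new, u_new) < J_old:  # accept the first improving step
            return u_new, x_new, alpha
        alpha *= beta                         # otherwise shrink the step size
    return u, rollout(x0, u), 0.0             # no improving step found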
3. Python Code Demo
import numpy as np

# System dynamics (nonlinear)
def system_dynamics(x, u):
    return x + np.sin(u)

# Cost function for a single step
def cost_function(x, u):
    return 0.5 * (x**2 + u**2)

# Derivative of the cost function w.r.t. control input u (l_u)
def cost_u(u):
    return u

# Derivative of the cost function w.r.t. state x (l_x)
def cost_x(x):
    return x

# Second derivative of the cost function w.r.t. control input u (l_uu)
def cost_uu():
    return 1

# Second derivative of the cost function w.r.t. state x (l_xx)
def cost_xx():
    return 1

# Terminal cost for the final state
def terminal_cost(x):
    return 0.5 * x**2

# Derivative of terminal cost w.r.t state (l_x at final step)
def terminal_cost_x(x):
    return x

# Function to calculate the initial state trajectory based on control sequence
def compute_initial_trajectory(x0, u):
    x = np.zeros(N+1)
    x[0] = x0
    for k in range(N):
        x[k+1] = system_dynamics(x[k], u[k])
    return x
# iLQR algorithm with gradient descent, trust-region regularization, and line search
def ilqr_with_regularization_and_linesearch(x0, u, iterations, epsilon_u, epsilon_J, epsilon_x, max_step_size, line_search_steps=10):
    x = compute_initial_trajectory(x0, u)  # Compute initial trajectory
    prev_cost = np.inf
    regularization = 1e-3  # Regularization added to Q_uu so it stays well conditioned and invertible
    for i in range(iterations):
        # Backward pass
        V_x = np.zeros(N+1)
        V_xx = np.zeros(N+1)
        V_x[-1] = terminal_cost_x(x[-1])  # Terminal value for V_x
        V_xx[-1] = 1  # Terminal value for V_xx (quadratic cost on terminal state)
        du = np.zeros(N)  # Control updates
        # Backward pass: compute Q function and control update
        for k in range(N-1, -1, -1):
            # Compute Q-function terms
            f_u = np.cos(u[k])  # Derivative of system dynamics w.r.t. u
            Q_u = cost_u(u[k]) + f_u * V_x[k+1]  # Q_u = l_u + f_u^T * V_x(k+1)
            Q_uu = cost_uu() + f_u**2 * V_xx[k+1] + regularization  # Add regularization to avoid singularity
            Q_x = cost_x(x[k]) + V_x[k+1]  # Q_x = l_x + f_x^T * V_x(k+1), with f_x = 1 here
            Q_xx = cost_xx() + V_xx[k+1]  # Q_xx = l_xx + f_x^T * V_xx(k+1) * f_x
            # Update control input with trust-region regularization
            step = -Q_u / Q_uu  # Newton-like control increment du = -Q_u / Q_uu
            step = np.clip(step, -max_step_size, max_step_size)  # Apply trust-region constraint (limit update magnitude)
            du[k] = step
            V_x[k] = Q_x + Q_uu * du[k]  # Simplified value-gradient update (full iLQR uses Q_ux = f_u * V_xx[k+1] here)
            V_xx[k] = Q_xx  # Simplified Hessian update (full iLQR: V_xx = Q_xx - Q_ux**2 / Q_uu)
        # Forward pass: update trajectory using the new control inputs and line search
        x_new = np.zeros(N+1)
        u_new = np.zeros(N)
        x_new[0] = x0
        alpha = 1.0  # Starting step size for line search
        for search_step in range(line_search_steps):
            for k in range(N):
                u_new[k] = u[k] + alpha * du[k]  # Update control with line search step
                x_new[k+1] = system_dynamics(x_new[k], u_new[k])  # Update state
            # Compute the total cost for the current trajectory
            current_cost = np.sum([cost_function(x_new[k], u_new[k]) for k in range(N)]) + terminal_cost(x_new[-1])
            # Check if this step size improves the cost
            if current_cost < prev_cost:
                break
            alpha *= 0.5  # Reduce step size if no improvement
        # 1. Stop based on control input change
        if np.max(np.abs(du)) < epsilon_u:
            print(f"Stopped due to control input convergence at iteration {i}")
            break
        # 2. Stop based on cost function change (relative); skip on the first iteration while prev_cost is inf
        if np.isfinite(prev_cost) and np.abs(prev_cost - current_cost) / np.abs(prev_cost) < epsilon_J:
            print(f"Stopped due to cost function convergence at iteration {i}")
            break
        # 3. Stop based on state trajectory change
        if np.max(np.abs(x_new - x)) < epsilon_x:
            print(f"Stopped due to state trajectory convergence at iteration {i}")
            break
        # Update for next iteration
        x = x_new
        u = u_new
        prev_cost = current_cost
    else:
        # Runs only if the loop finished without triggering a stopping condition
        print("Reached maximum number of iterations")
    return x, u, i
if __name__ == "__main__":
    # iLQR parameters
    N = 3  # Number of time steps
    x0 = 1  # Initial state
    iterations = 50  # Maximum number of iterations
    epsilon_u = 1e-3  # Tolerance for control input changes
    epsilon_J = 1e-4  # Tolerance for cost function change (relative)
    epsilon_x = 1e-4  # Tolerance for state trajectory change
    max_step_size = 1.0  # Trust region step size for control updates
    # Initialize control sequence and state trajectory
    u = np.zeros(N)  # Initial control sequence
    x = np.zeros(N+1)  # State trajectory
    x[0] = x0
    # Compute initial trajectory
    x_initial = compute_initial_trajectory(x0, u)
    # Run iLQR with stopping conditions
    x_final, u_final, num_iterations = ilqr_with_regularization_and_linesearch(x0, u, iterations, epsilon_u, epsilon_J, epsilon_x, max_step_size)
    # Output the final results and number of iterations
    print("Final state trajectory: ", x_final)
    print("Final control inputs: ", u_final)
    print("Total iterations: ", num_iterations)
The output is as follows:
Stopped due to cost function convergence at iteration 8
Final state trajectory: [1. 0.44370578 0.17737126 0.0834065 ]
Final control inputs: [-0.58991961 -0.26958819 -0.09410359]
Total iterations: 8