✨Machine Learning Notes (3): Multiple Linear Regression, Feature Scaling, Scikit-Learn (to be continued)

news2026/2/15 21:15:49

Course1-Week2:
https://github.com/kaieye/2022-Machine-Learning-Specialization/tree/main/Supervised%20Machine%20Learning%20Regression%20and%20Classification/week2

Machine Learning Notes (3)

1️⃣Multiple Linear Regression and Vectorization
2️⃣Feature Scaling
3️⃣Learning Rate
4️⃣Feature Engineering
5️⃣Polynomial Regression
6️⃣Scikit-Learn

1️⃣Multiple Linear Regression and Vectorization

Multiple linear regression

🎈In univariate linear regression, we used only Size as the input for predicting the house price.


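🎈In practice, a house price depends on far more than one factor, so we introduce multiple features. Here four features (house size, number of bedrooms, number of floors, and age of the house) are used to predict the price. The dataset and its notation are shown in the figure below: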

[Figure: the housing dataset with its four features and the notation used]

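The four features are denoted $x_1,x_2,x_3,x_4$. For convenience we collect them in a vector $\vec x$, so the $i$-th training example is $\vec x^{(i)}=({x_1}^{(i)},{x_2}^{(i)},{x_3}^{(i)},{x_4}^{(i)})$. The linear regression model can then be written as:
✨ $f_{w,b}(x)=w_1x_1+w_2x_2+w_3x_3+w_4x_4+b$ , where the weights can likewise be collected in a vector $\vec w=({w_1},{w_2},{w_3},{w_4})$ .

✨ Extending this to $n$ features gives the multiple linear regression model: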
$\begin{aligned} f_{\vec w,b}({\vec x}) &= \vec w \cdot \vec x + b \\ &= w_1x_1+w_2x_2+\cdots+w_nx_n+b \end{aligned}$
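
To make the model concrete, here is a minimal sketch of a single prediction computed both as a dot product and term by term. The function name `predict` and all numeric values are made up for illustration; they are not taken from the course dataset.

```python
import numpy as np

def predict(x, w, b):
    """Return f_{w,b}(x) = w . x + b for one example x of shape (n,)."""
    return np.dot(w, x) + b

# Hypothetical example: (size, bedrooms, floors, age) and made-up parameters
x = np.array([2104.0, 5.0, 1.0, 45.0])
w = np.array([0.1, 4.0, 10.0, -2.0])
b = 80.0

f_vec = predict(x, w, b)

# The same value written out as w_1*x_1 + ... + w_4*x_4 + b
f_sum = w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + w[3] * x[3] + b

print(f_vec, f_sum)
```

Both expressions give the same number; the dot-product form simply does not change when the number of features $n$ grows.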

Gradient descent for multiple linear regression
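
As a minimal sketch of gradient descent for this model, assume the usual squared-error cost $J(\vec w,b)=\frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec w,b}(\vec x^{(i)})-y^{(i)}\right)^2$. Each step then updates $w_j := w_j-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec w,b}(\vec x^{(i)})-y^{(i)}\right)x_j^{(i)}$ for every $j$ and $b := b-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec w,b}(\vec x^{(i)})-y^{(i)}\right)$ simultaneously. The function names, toy data, learning rate, and iteration count below are illustrative assumptions.

```python
import numpy as np

def compute_gradient(X, y, w, b):
    """Gradients of J(w, b) = 1/(2m) * sum((X @ w + b - y)**2)."""
    m = X.shape[0]
    err = X @ w + b - y        # shape (m,): prediction error per example
    dj_dw = X.T @ err / m      # shape (n,): dJ/dw_j for every feature j
    dj_db = err.sum() / m      # scalar: dJ/db
    return dj_dw, dj_db

def gradient_descent(X, y, w, b, alpha, num_iters):
    """Apply the simultaneous update of w and b for num_iters steps."""
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(X, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b

# Toy data generated from y = 1*x1 + 2*x2 + 3 (made up for illustration)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [0.0, 1.0]])
y = X @ np.array([1.0, 2.0]) + 3.0

w_fit, b_fit = gradient_descent(X, y, np.zeros(2), 0.0, alpha=0.05, num_iters=5000)
print(w_fit, b_fit)
```

Computing the gradient with `X.T @ err` instead of a double loop over examples and features is the same vectorization idea applied to the whole training set.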

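Vectorization

🎈Vectorization means replacing explicit loops with vector or matrix operations in numerical computing and machine learning. It lets the code exploit the parallel hardware of modern processors and can greatly speed up execution, especially on large datasets. In practice it usually means using NumPy or a similar library in place of explicit for-loops.

🎈NumPy's dot function makes good use of the processor's parallel hardware. The code below compares the performance of an explicit loop with the vectorized version.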

# Import libraries
import numpy as np
import time

Define a function that computes the dot product with an explicit loop

def my_dot(a, b):
    """
    Compute the dot product of two vectors

    Args:
      a (ndarray (n,)):  input vector
      b (ndarray (n,)):  input vector with same dimension as a

    Returns:
      x (scalar): dot product of a and b
    """
    x = 0
    for i in range(a.shape[0]):
        x = x + a[i] * b[i]
    return x

Use two large arrays $a, b$ to time both versions

np.random.seed(1)
a = np.random.rand(10000000)  # very large arrays
b = np.random.rand(10000000)

tic = time.time()  # capture start time
c = np.dot(a, b)
toc = time.time()  # capture end time

print(f"np.dot(a, b) =  {c:.4f}")
print(f"Vectorized version duration: {1000*(toc-tic):.4f} ms ")

tic = time.time()  # capture start time
c = my_dot(a,b)
toc = time.time()  # capture end time

print(f"my_dot(a, b) =  {c:.4f}")
print(f"loop version duration: {1000*(toc-tic):.4f} ms ")

del a, b  # remove these big arrays from memory

The run shows a clear performance gap between the two approaches.

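2️⃣Feature Scaling

🎈Different features can have very different ranges. Take house size and number of bedrooms: size values are far larger than bedroom counts, so if the two weights $w$ are of similar magnitude, the bedrooms feature has almost no effect on the cost. To make their influence comparable, one feature would need a large $w$ and the other a small $w$ to balance their effect on the cost $J$.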

🎈Ideally, we would like both features to be treated equally and to have a comparable influence on the cost.

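✨Feature scaling does exactly this. In essence, each feature is divided by a user-chosen value (such as its maximum) so that the feature values fall roughly in the range -1 to 1.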

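A minimal sketch of this simplest form of scaling, assuming the user-chosen value is each feature's largest absolute value. The feature matrix is made up for illustration.

```python
import numpy as np

# Hypothetical feature matrix: columns are size (sq ft) and bedrooms
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [ 852.0, 2.0]])

# Divide each column by its largest absolute value,
# so every scaled value lies between -1 and 1
X_scaled = X / np.abs(X).max(axis=0)

print(X_scaled)
```

After scaling, both columns peak at 1, so weights of similar magnitude now have a similar influence on the prediction.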

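🎊Two other methods achieve the same goal:

✨Mean normalization:
$x_i= \frac {x_i-\mu_i} {\max(x_i)-\min(x_i)}$, where $\mu_i$ is the mean of feature $x_i$ over all training examples, and $\max(x_i)$ and $\min(x_i)$ are its largest and smallest values.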

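The formula above can be sketched column by column with NumPy. The helper name `mean_normalize` and the sample data are hypothetical, and the sketch assumes no feature is constant (otherwise the range is zero).

```python
import numpy as np

def mean_normalize(X):
    """Mean-normalize each column (feature) of X, shape (m, n)."""
    mu = X.mean(axis=0)                          # per-feature mean
    value_range = X.max(axis=0) - X.min(axis=0)  # per-feature max - min
    return (X - mu) / value_range

# Hypothetical feature matrix: columns are size (sq ft) and bedrooms
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [ 852.0, 2.0]])

X_norm = mean_normalize(X)
print(X_norm)
```

Each normalized column has mean 0 and spans a total width of 1, so all values lie between -1 and 1.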

✨Z-score normalization:

To implement z-score normalization, adjust your input values as shown in this formula:
${x_j}^{(i)}= \frac {{x_j}^{(i)}-\mu_j} {\sigma_j}$
where $j$ selects a feature or a column in the X matrix. $\mu_j$ is the mean of all the values for feature $j$ and $\sigma_j$ is the standard deviation of feature $j$.
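
A minimal sketch of this formula, assuming the population standard deviation (NumPy's default, `ddof=0`). The function name, the sample data, and the new example are made up for illustration.

```python
import numpy as np

def zscore_normalize(X):
    """Z-score normalize each column of X; also return mu and sigma."""
    mu = X.mean(axis=0)     # per-feature mean
    sigma = X.std(axis=0)   # per-feature population standard deviation
    return (X - mu) / sigma, mu, sigma

# Hypothetical feature matrix: columns are size (sq ft) and bedrooms
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [ 852.0, 2.0]])

X_norm, mu, sigma = zscore_normalize(X)

# A new example must be scaled with the training mu and sigma
x_new = np.array([1200.0, 3.0])
x_new_norm = (x_new - mu) / sigma

print(X_norm)
print(x_new_norm)
```

Returning `mu` and `sigma` matters because inputs seen at prediction time have to be normalized with the same statistics as the training data.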