1. Introduction to PCA

Source: video https://www.bilibili.com/video/BV1E5411E71z/

Notes: https://www.bilibili.com/read/cv23587690?spm_id_from=333.999.0.0&jump_opus=1

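PCA looks for a new coordinate system in which the data lose as little information as possible when only one dimension is kept.

Goal: when only one axis is kept (reducing 2D to 1D), retain as much information as possible.

What makes an axis the best choice:

Find the direction along which the data are most spread out (largest variance) and use it as the principal component (the new coordinate axis).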

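Center the data (move the coordinate origin to the center of the data)
Find the coordinate system (find the direction of maximum variance)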
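
The two steps above can be sketched with NumPy alone. This is only a brute-force illustration, not the method from the video: the data set `points` is made up, and the direction is found by scanning candidate angles and keeping the one whose projection has the largest variance.

```python
import numpy as np

# Hypothetical 2D data set, chosen only for illustration.
points = np.array([[2.0, 1.9], [0.5, 0.7], [-1.0, -1.2], [3.0, 3.1], [1.5, 1.2]])

# Step 1: center the data so the origin sits at the data mean.
centered = points - points.mean(axis=0)

# Step 2: scan directions over [0, pi) and measure the variance of each projection.
angles = np.linspace(0.0, np.pi, 1800, endpoint=False)
directions = np.column_stack([np.cos(angles), np.sin(angles)])  # unit vectors
projections = centered @ directions.T         # shape (n_points, n_angles)
variances = projections.var(axis=0, ddof=1)   # sample variance per direction

best = np.argmax(variances)
best_direction = directions[best]
print("angle (degrees):", np.degrees(angles[best]))
print("direction of maximum variance:", best_direction)
print("variance along it:", variances[best])
```

Scanning angles only works in 2D and is slow; it is shown here because it makes the "largest variance" criterion concrete.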

The question is: how do we find the direction of maximum variance?

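White data (zero mean, uncorrelated coordinates, unit variance in every direction)

The data at hand can be viewed as white data that has been stretched and then rotated.

Stretching determines whether the direction of maximum variance lies along the horizontal or the vertical axis.

Rotation determines the angle of the direction of maximum variance.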
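
To make the stretch and rotation picture concrete, the sketch below builds data from white data by hand and then recovers the rotation angle from the covariance matrix. The stretch factors, the 30 degree angle and the names `D`, `S`, `R` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# White data: 2 x n matrix, each column is one sample.
D = rng.standard_normal((2, 5000))

# Stretch: scale x by 3 and y by 1, so the spread is largest along the horizontal axis.
S = np.diag([3.0, 1.0])

# Rotate by 30 degrees: this turns the direction of maximum variance to that angle.
theta = np.radians(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

D_new = R @ S @ D

# The covariance of the transformed data is approximately R S S^T R^T.
C = np.cov(D_new)
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
top = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
angle = np.degrees(np.arctan2(top[1], top[0])) % 180.0

print("largest eigenvalue (close to 3**2 = 9):", eigvals[-1])
print("recovered angle in degrees (close to 30):", angle)
```

The eigenvectors of the covariance matrix give back the rotation, and the eigenvalues give back the squared stretch factors, which is why finding the new axes reduces to an eigenvalue problem.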

How to solve PCA
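
A common way to solve PCA is an eigendecomposition of the covariance matrix. The helper `pca_eig` and the data `sample` below are hypothetical names introduced for this sketch, which assumes each row is one sample.

```python
import numpy as np

def pca_eig(data, n_components):
    """PCA via eigendecomposition of the covariance matrix (illustrative helper)."""
    mean = data.mean(axis=0)
    centered = data - mean                            # 1) center the data
    cov = centered.T @ centered / (len(data) - 1)     # 2) covariance matrix C
    eigvals, eigvecs = np.linalg.eigh(cov)            # 3) eigendecomposition (ascending)
    order = np.argsort(eigvals)[::-1]                 # 4) sort by variance, largest first
    components = eigvecs[:, order[:n_components]].T   # rows are the principal axes
    projected = centered @ components.T               # 5) coordinates on the new axes
    return components, eigvals[order], projected

# Hypothetical sample data, for illustration only.
sample = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]])
components, variances, projected = pca_eig(sample, n_components=1)
print("first principal axis:", components[0])
print("variance along each axis:", variances)
print("1D coordinates:", projected.ravel())
```

The sign of each axis is arbitrary, so a different library may return the same axis pointing the opposite way.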

Drawback of PCA: it is strongly affected by outliers.
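
The effect of a single outlier can be seen on a toy example. The data and the helpers `first_axis` and `angle_deg` below are made up for illustration.

```python
import numpy as np

def first_axis(data):
    """Return the unit vector of maximum variance (its sign is arbitrary)."""
    centered = data - data.mean(axis=0)
    _, eigvecs = np.linalg.eigh(np.cov(centered.T))
    return eigvecs[:, -1]

def angle_deg(v):
    """Angle of a direction in degrees, folded into [0, 180)."""
    return np.degrees(np.arctan2(v[1], v[0])) % 180.0

# Hypothetical data lying close to the line y = x.
clean = np.array([[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9], [4.0, 4.0]])
# The same data plus one far-away point.
with_outlier = np.vstack([clean, [[0.0, 20.0]]])

angle_clean = angle_deg(first_axis(clean))
angle_outlier = angle_deg(first_axis(with_outlier))
print("angle without outlier:", angle_clean)
print("angle with outlier:", angle_outlier)
```

Without the outlier the first axis sits near 45 degrees; with it, the axis swings to nearly vertical, because variance is a squared quantity and one distant point dominates it.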

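PCA and SVD (Singular Value Decomposition)

The right singular matrix V from the SVD of the centered data matrix (one sample per row) contains the principal components of PCA.

Solving PCA through eigendecomposition requires computing the covariance matrix first, which can be computationally expensive.

SVD has two advantages:

1) Some SVD implementations can compute the right singular matrix V without ever forming the covariance matrix C.

2) PCA uses only the right singular matrix V of the SVD and never the left singular matrix U. So what is U for?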
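
The relation between the two routes can be checked numerically. The sketch below uses randomly generated data (an assumption for illustration) with one sample per row, and compares the eigendecomposition of C with the SVD of the centered data. It also shows one thing U provides: scaled by the singular values, it equals the projected coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples (rows) with 3 correlated features (columns).
data = rng.standard_normal((200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                                  [0.0, 1.0, 0.3],
                                                  [0.0, 0.0, 0.2]])
centered = data - data.mean(axis=0)
n = len(centered)

# Route 1: eigendecomposition of the covariance matrix C.
C = centered.T @ centered / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # largest variance first

# Route 2: SVD of the centered data itself; C is never formed.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Singular values relate to the eigenvalues of C by lambda = s**2 / (n - 1).
print(np.allclose(s**2 / (n - 1), eigvals))
# Rows of Vt match the eigenvectors of C up to sign.
print(np.allclose(np.abs(Vt), np.abs(eigvecs.T), atol=1e-6))
# U scaled by the singular values equals the projected coordinates X V.
print(np.allclose(U * s, centered @ Vt.T))
```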

2. Reducing 2D data to 1D with PCA

References: https://blog.csdn.net/Shiraka/article/details/122354007

https://blog.csdn.net/weixin_42010722/article/details/123826197

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from sklearn.decomposition import PCA

# Four 2D sample points, one per row
X = np.array([[6, -4], [-3, 5], [-2, 6], [7, -3]])
plt.scatter(X[:, 0], X[:, 1])


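pca = PCA(n_components=2)  # keep both components to inspect the new axes
pca.fit(X)

print(pca.explained_variance_)  # variance of the data along each new axis
print("New axis vectors:")
print(pca.components_)  # one principal axis per row
print("Share of the variance explained by each axis:")
print(pca.explained_variance_ratio_)
print("Projection of each point onto the new axes:")
print(pca.transform(X))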


def draw_vector(v0, v1, ax=None):
    """Draw an arrow from point v0 to point v1 on the given (or current) axes."""
    ax = ax or plt.gca()
    arrowprops=dict(arrowstyle='->',
                    linewidth=2,
                    shrinkA=0, shrinkB=0)
    ax.annotate('', v1, v0, arrowprops=arrowprops)

# plot data
plt.scatter(X[:, 0], X[:, 1], alpha=0.2)
# draw each principal axis from the data mean, scaled to 3 standard deviations
for length, vector in zip(pca.explained_variance_, pca.components_):
    v = vector * 3 * np.sqrt(length)
    draw_vector(pca.mean_, pca.mean_ + v)
plt.axis('equal');


pca = PCA(n_components=1)  # reduce to one dimension
pca.fit(X)
print("New axis vector:")
print(pca.components_)
print("Share of the variance explained by the axis:")
print(pca.explained_variance_ratio_)
print("Projection of each point onto the new axis:")
print(pca.transform(X))
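
The numbers printed above can be cross-checked by hand with NumPy. The sketch below repeats the calculation on the same `X` using an eigendecomposition of the sample covariance matrix (divisor n - 1); the values in the comments were worked out by hand for this data set. The sign of the axis, and therefore of the projections, is arbitrary and may differ from scikit-learn's output.

```python
import numpy as np

X = np.array([[6, -4], [-3, 5], [-2, 6], [7, -3]], dtype=float)

mean = X.mean(axis=0)                  # [2, 1]
centered = X - mean
C = np.cov(centered.T)                 # [[82/3, -80/3], [-80/3, 82/3]]

eigvals, eigvecs = np.linalg.eigh(C)   # ascending order: [2/3, 54]
axis = eigvecs[:, -1]                  # (1, -1) / sqrt(2), up to sign
ratio = eigvals[-1] / eigvals.sum()    # 54 / (54 + 2/3), about 0.9878
projection = centered @ axis           # every entry is +9/sqrt(2) or -9/sqrt(2), about 6.364 in size

print("mean:", mean)
print("covariance matrix:")
print(C)
print("eigenvalues:", eigvals)
print("first principal axis:", axis)
print("explained variance ratio:", ratio)
print("1D projection:", projection)
```

Because the four points lie almost exactly on a line with slope -1, a single axis keeps about 98.8% of the variance.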