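Fixing mismatched z-score standardization results between MATLAB and Python

Summary:

MATLAB's zscore and NumPy's np.std use different default formulas for the standard deviation, and that is what makes the z-score results differ.
To get identical results, use the following code: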

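td = rand(50,15,39); % MATLAB: td is a 3-D array to be z-scored (use the same data on both sides)
# Python code:
tdzInPy = (td - np.mean(td, axis=1, keepdims=True)) / np.std(td, ddof=1, axis=1, keepdims=True)
# z-score td along its second dimension (axis=1), using the sample standard deviation.

% MATLAB code:
tdzInMatlab = zscore(td,0,2); % z-score td along its second dimension, using the sample standard deviation.
or
tdzInMatlab = zscore(td,[],2); % [] falls back to the default flag (0), so this is also the sample standard deviation.

In this code, tdzInPy and tdzInMatlab agree (up to floating-point rounding) as long as td holds the same data on both sides. Both use the sample standard deviation, with n - 1 in the denominator.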
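One way to sanity-check the Python side without MATLAB at hand is to look at the statistics of the result itself. The sketch below is only an illustration: the random array is a stand-in for td (it is not the data MATLAB's rand would produce), and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary fixed seed so the run is repeatable
td = rng.random((50, 15, 39))       # stand-in for the 50x15x39 array td

# z-score along the second dimension (axis=1) with the sample standard deviation
tdz = (td - td.mean(axis=1, keepdims=True)) / td.std(axis=1, ddof=1, keepdims=True)

n = td.shape[1]                     # number of elements along the z-scored dimension
mean_is_zero = np.allclose(tdz.mean(axis=1), 0.0)
sample_std_is_one = np.allclose(tdz.std(axis=1, ddof=1), 1.0)
# Measured with the population formula, the same result has std sqrt((n - 1) / n), not 1
population_std_matches = np.allclose(tdz.std(axis=1, ddof=0), np.sqrt((n - 1) / n))

print(mean_is_zero, sample_std_is_one, population_std_matches)
```

If the last check is the one that comes out as 1 instead, the data was standardized with the population formula.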

By the same reasoning, to use the population standard deviation instead:

td = rand(50,15,39,100); % MATLAB: a 4-D array this time
# Python code:
tdzInPy = (td - np.mean(td, axis=1, keepdims=True)) / np.std(td, ddof=0, axis=1, keepdims=True)
# z-score td along its second dimension (axis=1), using the population standard deviation.

% MATLAB code:
tdzInMatlab = zscore(td,1,2); % z-score td along its second dimension, using the population standard deviation.

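In short: in Python (NumPy), ddof=0 gives the population standard deviation (this is the default when ddof is not specified) and ddof=1 gives the sample standard deviation. In MATLAB, flag=0 gives the sample standard deviation (this is the default when flag is not specified) and flag=1 gives the population standard deviation.
The two defaults are opposite, which is why a z-score computed with default settings differs between MATLAB and Python.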
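The flag-to-ddof mapping can be wrapped in a small helper. This is a minimal sketch, not a drop-in replacement for MATLAB's zscore: the name zscore_matlab_style and its default dim=1 are assumptions made for illustration, and it does not try to reproduce other MATLAB behavior such as automatic dimension selection or special handling of constant inputs.

```python
import numpy as np

def zscore_matlab_style(x, flag=0, dim=1):
    """Hypothetical helper: z-score with MATLAB-like (flag, dim) arguments.

    flag=0 -> sample standard deviation (divisor n - 1)
    flag=1 -> population standard deviation (divisor n)
    dim is 1-based as in MATLAB, so dim=2 corresponds to NumPy axis=1.
    """
    if flag not in (0, 1):
        raise ValueError("flag must be 0 or 1")
    x = np.asarray(x, dtype=float)
    axis = dim - 1          # 1-based dimension -> 0-based axis
    ddof = 1 - flag         # flag 0 -> ddof 1, flag 1 -> ddof 0
    mu = x.mean(axis=axis, keepdims=True)
    sigma = x.std(axis=axis, ddof=ddof, keepdims=True)
    return (x - mu) / sigma

row = np.array([[1.0, 2.0, 3.0]])
z_sample = zscore_matlab_style(row, flag=0, dim=2)      # sample std of [1, 2, 3] is 1
z_population = zscore_matlab_style(row, flag=1, dim=2)  # population std is sqrt(2/3)
print(z_sample)
print(z_population)
```

With this mapping, zscore_matlab_style(td, 0, 2) corresponds to the ddof=1, axis=1 expression above, and zscore_matlab_style(td, 1, 2) to the ddof=0 one.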

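How it works:

The official MATLAB documentation describes both options, the sample standard deviation and the population standard deviation: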

Z = zscore(X, flag, dim) scales X using the standard deviation indicated by flag.

If flag is 0 (default), then zscore scales X using the sample standard deviation, with n - 1 in the denominator of the standard deviation formula. zscore(X,0) is the same as zscore(X).

If flag is 1, then zscore scales X using the population standard deviation, with n in the denominator of standard deviation formula.

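[Figure: formulas for the sample standard deviation and the population standard deviation]

The two formulas differ only in the denominator: n - 1 for the sample standard deviation, n for the population standard deviation. The gap is small when n is large, but it is visible for short dimensions and is enough to make the two z-score results disagree.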
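For reference, the two standard formulas can be written out as follows, with x̄ the mean of the n values. The symbols s and σ are notation chosen here, not taken from the MATLAB page.

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2},
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2},
\qquad
\frac{\sigma}{s} = \sqrt{\frac{n-1}{n}}
```

For the second dimension of td, n = 15, so σ/s = sqrt(14/15) ≈ 0.966. The z-scores from the population formula are therefore about 3.5% larger in magnitude than those from the sample formula.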

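In Python:

You can use the NumPy library to compute both the sample and the population standard deviation. Here is how.

First, import NumPy:

import numpy as np

Then, assuming you have a dataset data, compute the two standard deviations as follows.

Sample standard deviation:

sample_std_deviation = np.std(data, ddof=1)

The ddof parameter ("delta degrees of freedom") is subtracted from n in the divisor. Set it to 1 for the sample standard deviation. If your dataset is the whole population rather than a sample, set ddof to 0.

Population standard deviation:

population_std_deviation = np.std(data, ddof=0)

Both calls use the same function, np.std, and differ only in ddof. Choose the value that matches your data and the tool you are comparing against.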
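A small worked case makes the difference concrete. The eight numbers below are made up for illustration; they are chosen so that the population standard deviation comes out to exactly 2.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # illustrative values
n = data.size

population_std = np.std(data)        # ddof is left at its default of 0
sample_std = np.std(data, ddof=1)    # divisor n - 1

print(population_std)                # 2.0
print(sample_std)                    # larger than the population value
# The two are related by a factor of sqrt(n / (n - 1))
print(np.isclose(sample_std, population_std * np.sqrt(n / (n - 1))))
```

Calling np.std(data) with no ddof is the population formula, which is the opposite of what MATLAB's zscore does by default.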
From the official documentation:
numpy.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>, *, where=<no value>)
The average squared deviation is typically calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
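The N - ddof divisor described in that passage can be reproduced by hand. The function std_with_ddof below is a hypothetical name used only for this sketch; it handles 1-D input and exists to make the divisor explicit, not to replace np.std.

```python
import numpy as np

def std_with_ddof(a, ddof=0):
    """Illustrative re-implementation: sqrt(sum(|a - mean|^2) / (N - ddof))."""
    a = np.asarray(a, dtype=float)
    squared_deviation = np.abs(a - a.mean()) ** 2
    return np.sqrt(squared_deviation.sum() / (a.size - ddof))

a = np.array([1.0, 2.0, 4.0, 8.0])   # illustrative values
for ddof in (0, 1):
    # Each line prints the ddof value and whether the manual result matches np.std
    print(ddof, np.isclose(std_with_ddof(a, ddof), np.std(a, ddof=ddof)))
```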

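Conclusion

In Python (NumPy), ddof=0 (the default) gives the population standard deviation and ddof=1 gives the sample standard deviation. In MATLAB, flag=0 (the default) gives the sample standard deviation and flag=1 gives the population standard deviation. Set the two to the same formula and the z-scores match.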

