商务与经济统计：中英文转换

数据分析基础中英文转换

Chap 1 数据与统计资料 Data and Statistics

1.2 数据 Data

数据集 Data Sets

个体 Element 变量 Variables 观测值 Observations

测量尺度 Scales of Measurement：

名义尺度 Nominal Scale 顺序尺度 Ordinal Scale 区间尺度 Interval Scale

比率尺度 Ratio Scale

分类型数据 Categorical Data 数量型数据 Quantitative Data

分类变量 Categorical Variable 数量变量 Quantitative Variable

截面数据 Cross-Sectional Data 时间序列数据 Time Series Data

1.3 数据来源 Data Sources

现有来源 Existing Sources

观测性研究 observational (nonexperimental) studies

实验 experimental studies

时间需求 Time Requirement 获取成本 Cost of Acquisition

数据采集误差 Data Errors

1.4 描述统计 Descriptive Statistics （图形或数值形式汇总的统计方法被称为描述统计）

1.5 统计推断 Statistical Inference

普查 census （搜集总体全部数据的调查过程）

抽样调查 sample survey （搜集样本数据的调查过程）

1.6 逻辑分析方法 Analytics

描述性分析 Descriptive analytics 预测性分析 Predictive analytics

规范性分析 Prescriptive analytics （产生一个最佳行动过程的分析技术集合）

1.7 大数据和数据挖掘 Big Data and Data Mining

1.9 统计实践的道德准则 Ethical Guidelines for Statistical Practice

Chap 2 描述统计学（一）：表格法和图形法

Descriptive Statistics: Tabular and Graphical Displays

品质型数据 qualitative data

数量型数据 quantitative data

2.1 汇总分类型变量的数据 Summarizing Data for a Categorical Variable

频数分布 Frequency Distribution

相对频数分布 Relative Frequency Distribution

相对频数（频率） Relative Frequency

百分数频数分布 Percent Frequency Distribution

条形图 Bar Chart 饼图 Pie Chart

2.2 汇总数量型变量的数据 Summarizing Data for a Quantitative Variable

打点图 Dot Plot 直方图 Histogram

偏度 Skewness

对称的 Symmetric

适度右偏 Moderately Skewed Right 严重右偏 Highly Skewed Right

累积曲线 ogive 累积频数分布 Cumulative Frequency Distribution

累积相对频数分布 Cumulative Relative Frequency Distribution

累积百分数频数分布 Cumulative Percent Frequency Distribution

茎叶显示 Stem-and-Leaf Display 叶单位 Leaf Unit

2.3 用表格方法汇总两个变量的数据 Summarizing Data for Two Variables Using Tables

交叉分组表 Crosstabulation

辛普森悖论 Simpson’s Paradox

2.4 用图形显示方法汇总两个变量的数据 Summarizing Data for Two Variables Using Graphical Displays

散点图 Scatter Diagram 趋势线 Trendline

复合条形图 Side-by-Side Bar Chart

结构条形图 Stacked Bar Chart

2.5 数据可视化：创建有效图形显示的最佳实践

Data Visualization: Best Practices in Creating Effective Graphical Displays

数据仪表盘 Data Dashboards

Chap 3 描述统计学（二）：数值方法

Descriptive Statistics: Numerical Measures

样本统计量 sample statistics

总体参数 population parameters

点估计量 point estimator

样本统计量是相应总体参数的点估计量

3.1 位置的度量 Measures of Location

平均数 mean

样本平均数 Sample Mean x̄

总体平均数 Population Mean μ

加权平均数 Weighted Mean 中位数 Median

几何平均数 Geometric Mean 众数 Mode

百分位数 Percentile 四分位数 Quartiles

第三四分位数 Third Quartile (75th Percentile)

切尾均值 Trimmed Mean（去掉两端的极端值后所计算的算术平均数）

3.2 变异程度的度量 Measures of Variability

极差 Range 四分位数间距 Interquartile Range=IQR

方差 Variance 标准差 Standard Deviation

标准差系数 Coefficient of Variation

3.3 分布形态、相对位置的度量以及异常值的检测

Measures of Distribution Shape, Relative Location, and Detecting Outliers

分布形态：偏度 Distribution Shape：Skewness

注：

01右偏，偏度为正，平均数＞中位数

02左偏，偏度为负，平均数＜中位数

z-分数 z-Scores （标准化变量）

切比雪夫定理 Chebyshev’s Theorem：

与平均数的距离在z个标准差之内的数据项所占比例至少为 1- 1/z²
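A minimal Python sketch of the bound stated by Chebyshev's theorem. The function name `chebyshev_lower_bound` is hypothetical, and the restriction to z > 1 is the usual condition under which the bound is informative.

```python
def chebyshev_lower_bound(z: float) -> float:
    """Minimum proportion of data within z standard deviations of the mean."""
    if z <= 1:
        # For z <= 1 the expression 1 - 1/z^2 is zero or negative,
        # so the theorem says nothing useful.
        raise ValueError("Chebyshev's theorem is informative only for z > 1")
    return 1 - 1 / z**2


# Illustrative values of z (not taken from the textbook).
for z in (2, 3, 4):
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.4f} of the data")
```

Unlike the empirical rule, this bound holds for any distribution shape, which is why it is weaker (at least 75% for z = 2, versus roughly 95% for bell-shaped data).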

经验法则 Empirical Rule 异常值的检测 Detecting Outliers

3.4 探索型数据分析 Exploratory Data Analysis

五数概括法 Five-Number Summaries 箱线图 Box Plots

3.5 两变量间关系的度量 Measures of Association Between Two Variables

协方差 Covariance

Chap 7 抽样和抽样分布 Sampling and Sampling Distributions

抽样总体 sampled population

抽样框 frame

7.1 Electronics Associates 公司的抽样问题 The Electronics Associates Sampling Problem

参数 parameters （总体的数字特征）

7.2 抽样 Sampling

从有限总体中抽样 Sampling from a Finite Population

从无限总体中抽样 Sampling from an Infinite Population

简单随机样本 A simple random sample

无放回抽样 sampling without replacement

有放回抽样 sampling with replacement

7.3 点估计 Point Estimation

样本统计量 sample statistic

点估计量 point estimator

点估计值 point estimate

样本均值是总体均值的点估计，样本标准差是总体标准差的点估计，样本比例是总体比例的点估计→

x̄ as Point Estimator of μ

s as Point Estimator of σ

p̄ as Point Estimator of p

7.4 抽样分布 Sampling Distribution

7.5 样本均值的抽样分布 Sampling Distribution of x̄

x̄的数学期望 Expected Value of x̄: E(x̄) = μ

无偏估计 unbiased estimator

x̄的标准差 the standard deviation of x̄: σ/√n

样本大小 the sample size n

总体大小 the population size N

有限总体的修正系数 finite population correction factor

样本均值的标准差 the standard deviation of x̄

均值的标准误差 standard error of the mean
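As a rough illustration of the standard error of the mean, the sketch below computes σ/√n and optionally applies the finite population correction factor √((N−n)/(N−1)). The function name and the numeric inputs are hypothetical, chosen only to show the arithmetic.

```python
import math


def standard_error_mean(sigma, n, N=None):
    """Standard deviation of the sample mean.

    sigma: population standard deviation
    n:     sample size
    N:     population size; when given, the finite population
           correction factor sqrt((N - n) / (N - 1)) is applied
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se


# Illustrative numbers only.
print(standard_error_mean(sigma=10, n=25))         # infinite population
print(standard_error_mean(sigma=10, n=25, N=100))  # finite population, N = 100
```

The correction factor is always at most 1, so sampling from a finite population never increases the standard error; it matters little when n is a small fraction of N.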

总体服从正态分布 the population has a normal distribution

中心极限定理 Central Limit Theorem：

In selecting random samples of size n from a population, the sampling distribution of the sample mean x̄ can be approximated by a normal distribution as the sample size becomes large.

书上案例 EAI问题中的x̄抽样分布 Sampling Distribution of x̄ for SAT Scores

→使用标准化变量+查表（概率论与数理统计当中的中心极限定理）

样本容量与x̄的抽样分布的关系 Relationship Between the Sample Size and the Sampling Distribution of x̄

随着样本容量的增加，均值的标准误差在减少。样本容量越大，样本均值落在总体均值附近某一特定范围内概率也越大。

7.6 样本比率的抽样分布 Sampling Distribution of p̄

p̄的数学期望 Expected Value of p̄: E(p̄) = p

p̄的标准差 the standard deviation of p̄: √(p(1−p)/n)

结论：当np≥5并且n(1-p)≥5时，p̄的抽样分布可以用正态分布近似。

The sampling distribution of p̄ can be approximated by a normal distribution whenever the sample size is large enough to satisfy the two conditions:

np≥5 and n(1-p)≥5
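The rule above can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, the values p = 0.6 and n = 30 are made up, and `statistics.NormalDist` from the standard library (Python 3.8+) stands in for a normal probability table.

```python
import math
from statistics import NormalDist


def p_bar_sampling_distribution(p, n):
    """Normal approximation to the sampling distribution of the sample proportion.

    Raises ValueError when the large-sample conditions np >= 5 and
    n(1 - p) >= 5 are not both met.
    """
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation not justified: need np >= 5 and n(1-p) >= 5")
    return NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n))


# Illustrative values: population proportion 0.6, sample size 30.
dist = p_bar_sampling_distribution(0.6, 30)
# Probability that the sample proportion falls within 0.05 of p.
prob = dist.cdf(0.65) - dist.cdf(0.55)
print(f"P(0.55 <= p_bar <= 0.65) is approximately {prob:.4f}")
```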

7.7 点估计的性质 Properties of Point Estimators

无偏性 unbiased

有效性 efficiency

一致性 consistency

7.8 其他抽样方法 Other Sampling Methods

分层抽样 Stratified Random Sampling

整群抽样 Cluster Sampling

系统抽样 Systematic Sampling

方便抽样 Convenience Sampling

判断抽样 Judgment Sampling

概率抽样 probability sampling 非概率抽样 non-probability sampling

Chap 14 简单线性回归 Simple Linear Regression

因变量 dependent variable

自变量 independent variable

14.1 简单线性回归模型 Simple Linear Regression Model

简单线性回归模型 simple linear regression model

简单线性回归方程 simple linear regression equation

估计的简单线性回归方程 the estimated simple linear regression equation

14.2 最小二乘法 Least Squares Method

估计回归方程的截距b0 intercept for the estimated regression equation

估计回归方程的斜率b1 slope for the estimated regression equation

14.3 判定系数 coefficient of determination R-sq

总的平方和 SST total sum of squares

误差平方和 SSE sum of squares due to error

回归平方和 SSR sum of squares due to regression

样本相关系数 sample correlation coefficient
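To connect the terms in 14.2 and 14.3, here is a small pure-Python sketch that computes the least squares estimates b0 and b1, the three sums of squares, the coefficient of determination, and the sample correlation coefficient. The function name and the five data points are invented for illustration.

```python
import math


def fit_simple_regression(x, y):
    """Least squares fit of y = b0 + b1*x plus the sums of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # intercept
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squares due to error
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # sum of squares due to regression
    r_sq = ssr / sst
    # The sample correlation coefficient takes the sign of the slope.
    r = math.copysign(math.sqrt(r_sq), b1)
    return {"b0": b0, "b1": b1, "SST": sst, "SSE": sse, "SSR": ssr, "R-sq": r_sq, "r": r}


# Made-up data, not from the textbook.
result = fit_simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The identity SST = SSR + SSE holds for any least squares fit with an intercept, so R-sq = SSR/SST can be read as the share of variation in y explained by the estimated regression equation.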

14.4 模型的假定（误差项的假定） Assumptions About the Error Term ε

01 ε是一个均值为0的随机变量。

02 ε的方差σ²，对于所有x值都是相同的。

03 ε的取值是相互独立的。

04 ε是一个正态分布的随机变量。

01 The error ε is a random variable with mean of zero.

02 The variance of ε, denoted by σ², is the same for all values of the independent variable.

03 The values of ε are independent.

04 The error ε is a normally distributed random variable.

14.5 显著性检验 Testing for Significance

误差项ε的方差σ²的估计 an estimate of σ², the variance of the error term ε: s² = MSE (the mean square error) = SSE/(n−p−1)；简单线性回归中 p=1，即 SSE/(n−2)

T检验 T test

F检验 F test

假设 Hypotheses 原假设 H0 备择假设 Ha

检验统计量 Test Statistic 拒绝规则（拒绝域） Rejection Rule

具体步骤：

01 Determine the hypotheses 确定原假设和备择假设

02 Specify the level of significance 确定显著性水平

03 Select the test statistic 选择检验统计量

04 State the rejection rule 确定拒绝域

05 Compute the value of the test statistic 计算检验统计量的值

06 Determine whether to reject H0 确定是否拒绝原假设

β1的置信区间 Confidence Interval for β1

14.6 应用估计的回归方程进行估计和预测 Using the Estimated Regression Equation for Estimation and Prediction

置信区间 A confidence interval is an interval estimate of the mean value of y for a given value of x.

预测区间 A prediction interval is used whenever we want to predict an individual value of y for a new observation corresponding to a given value of x.

置信区间的长度要比预测区间小。

14.7 计算机解法 Computer Solution

14.8 残差分析 Residual Analysis

→为了确定误差项的假定是否成立 to determine whether the assumptions about the error term ε hold

第i次观察的残差 residual for observation i

关于x的残差图 residual plot against x

标准化残差图 standardized residual plot

正态概率图 normal probability plot

正态分数 normal scores

14.9 异常值和有影响的观测值 Outliers and Influential Observations

检测异常值 detecting outliers

杠杆率 leverage

高杠杆率点 high leverage points

Chap 15 多元线性回归 Multiple Regression

15.1 多元回归模型 Multiple Regression Model

多元回归模型 Multiple Regression Model

多元回归方程 Multiple Regression Equation

估计的多元回归方程 Estimated Multiple Regression Equation

15.2 多元回归最小二乘法 Least Squares Method

15.3 多元判定系数 Multiple Coefficient of Determination

多元判定系数 Multiple Coefficient of Determination

修正多元判定系数 Adjusted Multiple Coefficient of Determination

15.4 模型的假定 Model Assumptions

多元回归中关于误差项ε的假定 assumptions about the error term ε in multiple regression

15.5 显著性检验 Testing for Significance

F检验 the test for overall significance

T检验 a test for individual significance

方差来源 source 平方和 sum of squares

自由度 degrees of freedom 均方 mean square

多重共线性 multicollinearity→the correlation among the independent variables

15.6 利用估计的回归方程进行估计和预测 Using the Estimated Regression Equation for Estimation and Prediction

Confidence Interval 置信区间 Prediction Interval 预测区间

Lower Limit 下限 Upper Limit 上限

15.7 定性自变量 Categorical Independent Variable

虚拟变量 Dummy Variable

解释参数 Interpreting the Parameters

更复杂的定性变量 More Complex Categorical Variables

→If a categorical variable has k levels, k-1 dummy variables are required, with each dummy variable being coded as 0 or 1.
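The k−1 rule can be illustrated with a short sketch. The helper `dummy_code` and the region labels are hypothetical; the level left out (the reference level) is the one coded as all zeros.

```python
def dummy_code(values, reference):
    """Code a categorical variable with k levels as k-1 dummy (0/1) variables.

    The reference level gets 0 on every dummy variable.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} does not occur in the data")
    others = [lv for lv in levels if lv != reference]
    return [{f"is_{lv}": int(v == lv) for lv in others} for v in values]


# Hypothetical variable with k = 3 levels, so 2 dummy variables are created.
regions = ["A", "B", "C", "A"]
for region, row in zip(regions, dummy_code(regions, reference="A")):
    print(region, row)
```

Using k dummy variables together with an intercept would make the columns perfectly collinear, which is why one level is dropped.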

15.8 残差分析 Residual Analysis

第i次观测的残差的标准差 standard deviation of residual i

检测异常值 detecting outliers

→残差很大的点

学生化删除残差（删除第i次观测值得到的标准化残差）studentized deleted residuals

有影响的观测值 Influential Observations

→对模型有较大影响的点，如果删除该点能改变拟合回归方程

库克距离 Cook’s Distance

库克距离测度 Using Cook’s Distance Measure

检测方法：

01用标准化残差直接检测异常值

02用学生化删除残差检测异常值

03 用杠杆率检测有影响的观测值 h_i > 3(p+1)/n（杠杆率 leverage）

04 用库克距离检测有影响的观测值 D_i > 1
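A sketch of check 03 for the special case of a single independent variable (p = 1), where the leverage of observation i is h_i = 1/n + (x_i − x̄)²/Σ(x_j − x̄)². The function names and x values are illustrative assumptions; with several independent variables the leverages come from the hat matrix, which statistical software reports directly.

```python
def leverages(x):
    """Leverage of each observation in a regression with one independent variable."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def high_leverage_indices(h, p):
    """Indices of observations whose leverage exceeds 3(p + 1)/n."""
    n = len(h)
    cutoff = 3 * (p + 1) / n
    return [i for i, hi in enumerate(h) if hi > cutoff]


# Illustrative x values; the last one lies far from the others.
x = [10, 10, 15, 20, 20, 25, 70]
h = leverages(x)
print([round(hi, 3) for hi in h])
print(high_leverage_indices(h, p=1))
```

High leverage depends only on the x values, so a flagged point is a candidate influential observation, not automatically an outlier in y.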

15.9 Logistic回归 Logistic Regression

Logistic回归方程 Logistic Regression Equation

Logistic回归中的E(y) 被解释为概率 interpretation of E(y) as a probability in logistic regression

估计的Logistic回归方程 The estimated logistic regression equation

有利于一个事件发生的机会比（事情将要发生的概率与事情将不发生的概率之比）

The odds in favor of an event occurring

机会比率 odds ratio = odds1/odds0

odds1: 当一组自变量中的一个自变量增加1个单位时，y=1的机会比

odds0: 该组自变量的值都没有变化，y=1的机会比

对于Logistic回归方程的解释 Interpreting the Logistic Regression Equation

→odds ratio=e^β1
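The relationship odds ratio = e^β1 can be checked numerically. The sketch below assumes a logistic regression equation with one independent variable, E(y) = e^(β0+β1x)/(1+e^(β0+β1x)); the coefficient values are hypothetical.

```python
import math


def logistic_probability(b0, b1, x):
    """E(y) = P(y = 1 | x) under a one-variable logistic regression equation."""
    z = b0 + b1 * x
    return 1 / (1 + math.exp(-z))  # algebraically equal to e^z / (1 + e^z)


def odds(p):
    """Odds in favor of an event that has probability p."""
    return p / (1 - p)


# Hypothetical coefficients, for illustration only.
b0, b1 = -1.0, 0.5
odds0 = odds(logistic_probability(b0, b1, 2))  # x unchanged
odds1 = odds(logistic_probability(b0, b1, 3))  # x increased by one unit
print(odds1 / odds0, math.exp(b1))             # the two values agree
```

Because the odds equal e^(β0+β1x), increasing x by one unit multiplies the odds by e^β1 regardless of the starting value of x.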

对数机会比变换 Logit Transformation

估计的对数机会比 estimated logit

Chap 16 回归分析：建立模型 Regression Analysis: Model Building

16.1 一般线性模型 General Linear Model

模拟曲线关系 Modeling Curvilinear Relationships

标准化残差图 Standardized Residual Plot

具有一个变量的二阶模型 second-order model with one predictor variable

交互作用 Interaction

涉及因变量的变换 Transformations Involving the Dependent Variable

对数变换 logarithmic transformation 倒数变换 reciprocal transformation

内蕴线性的非线性模型 Nonlinear Models That Are Intrinsically Linear

指数模型 The exponential model

16.2 确定什么时候增加或者删除变量 Determining When to Add or Delete Variables

16.3 大型问题的分析 Analysis of a Larger Problem

16.4 变量选择方法 Variable Selection Procedures

逐步回归 Stepwise Regression 前向选择 Forward Selection

后向消元 Backward Elimination 最佳子集回归 Best-Subsets Regression

16.5 实验设计的多元回归方法 Multiple Regression Approach to Experimental Design

16.6 自相关性和杜宾-瓦特森检验 Autocorrelation and the Durbin-Watson Test

自相关性 Autocorrelation 杜宾-瓦特森检验 Durbin-Watson Test

杜宾-瓦特森检验统计量 Durbin-Watson Test Statistic 序列相关 serial correlation
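For reference, the Durbin-Watson test statistic is d = Σ(e_t − e_{t−1})² / Σe_t², computed from the regression residuals in time order. A minimal sketch, with a hypothetical function name and made-up residuals:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals that alternate in sign (negative autocorrelation).
print(durbin_watson([1, -1, 1, -1]))
# Made-up residuals that stay close together (positive autocorrelation).
print(durbin_watson([1.0, 0.9, 1.1, 1.0]))
```

The statistic ranges from 0 to 4: values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. The formal decision uses tabulated lower and upper bounds.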

Chap 17 时间序列及预测 Time Series Analysis and Forecasting

因果预测方法 causal method 时间序列方法 time series method

截面回归 Cross-sectional regression 时间序列回归 Time-series regression

17.1 时间序列的模式 Time Series Patterns

水平模式 Horizontal Pattern 平稳时间序列 stationary time series

趋势模式 Trend Pattern 季节模式 Seasonal Pattern

季节与趋势模式 Trend and Seasonal Pattern

循环模式 Cyclical Pattern

17.2 预测精度 Forecast Accuracy

预测误差 forecast error

平均绝对误差 Mean Absolute Error (MAE) 预测误差的绝对值的平均数

均方误差 Mean Squared Error (MSE) 预测误差平方的平均数（预测误差平方和/自由度）

平均绝对百分数误差 Mean Absolute Percentage Error (MAPE) 百分数预测误差的平均数

朴素预测法 Naive forecast
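The three accuracy measures can be sketched together using the naive forecast, which takes the most recent observed value as the forecast for the next period. The function name and the short series are invented for illustration; the averages here are taken over the number of forecasts made.

```python
def naive_forecast_accuracy(series):
    """MAE, MSE and MAPE of the naive forecast for periods 2..n."""
    actual = series[1:]
    forecast = series[:-1]  # forecast for period t is the value in period t-1
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n * 100
    return {"MAE": mae, "MSE": mse, "MAPE": mape}


# Made-up time series values.
measures = naive_forecast_accuracy([17, 21, 19, 23, 18])
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

MAPE is a percentage, so it can be compared across series measured in different units, whereas MAE and MSE depend on the scale of the data.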

17.3 移动平均法和指数平滑法 Moving Averages and Exponential Smoothing

移动平均法 moving averages method 加权移动平均法 Weighted Moving Averages

指数平滑预测 Exponential Smoothing Forecast 平滑常数α smoothing constant
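A short sketch of the exponential smoothing forecast, F_{t+1} = αY_t + (1 − α)F_t. Starting the recursion with F_2 = Y_1 is a common convention and is an assumption here, as are the function name, the series, and the value α = 0.2.

```python
def exponential_smoothing(series, alpha):
    """Exponential smoothing forecasts for periods 2 through n+1.

    Assumes the forecast for period 2 equals the actual value of period 1.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("the smoothing constant alpha must lie between 0 and 1")
    forecasts = [series[0]]  # F_2 = Y_1
    for y in series[1:]:
        # New forecast = alpha * latest actual + (1 - alpha) * latest forecast
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


# Made-up series; the last element returned is the forecast for the next period.
print(exponential_smoothing([17, 21, 19], alpha=0.2))
```

A larger α puts more weight on the most recent observation and reacts faster to change; a smaller α smooths out more of the random variation.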

17.4 趋势推测法 Trend Projection

趋势推测法 Trend Projection 线性趋势回归 Linear Trend Regression

非线性趋势回归 Nonlinear Trend Regression

二次趋势方程 quadratic trend equation 指数趋势方程 exponential trend equation

17.5 季节性和趋势 Seasonality and Trend

没有趋势的季节性 Seasonality without Trend 季节性和趋势 Seasonality and Trend

17.6 时间序列分解法 Time Series Decomposition

加法分解模型 Additive Decomposition Model

乘法分解模型 Multiplicative Decomposition Model

时间序列分解法步骤（书上案例）

第一步计算季节指数 Calculate seasonal index

第二步消除季节影响的时间序列 Deseasonalized Time Series

第三步利用消除季节影响的时间序列确定趋势 Using the Deseasonalized Time Series to Identify Trend

第四步季节调整 Seasonal Adjustments