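I. Cumulative statistics functions
Function | Purpose |
---|---|
cumsum | Running sum of the first 1, 2, 3, …, n values |
cummax | Running maximum of the first 1, 2, 3, …, n values |
cummin | Running minimum of the first 1, 2, 3, …, n values |
cumprod | Running product of the first 1, 2, 3, …, n values |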
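As a quick illustration of the table, here is a minimal sketch on a small hand-written Series. The values and the names `s`, `running_sum`, and so on are chosen for this example and are not part of the original script.

```python
import pandas as pd

# Hypothetical sample values chosen for illustration
s = pd.Series([3, 1, 4, 2])

running_sum = s.cumsum()    # 3, 4, 8, 10
running_max = s.cummax()    # 3, 3, 4, 4
running_min = s.cummin()    # 3, 1, 1, 1
running_prod = s.cumprod()  # 3, 3, 12, 24

# Put the results side by side for comparison
print(pd.DataFrame({
    "value": s,
    "cumsum": running_sum,
    "cummax": running_max,
    "cummin": running_min,
    "cumprod": running_prod,
}))
```

Each result has the same length and index as the input: position i holds the statistic of the values from position 0 through i.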
import numpy as np
import pandas as pd
# np.nan: missing value (NaN)
df = pd.DataFrame({'key1': np.arange(10),
'key2': np.random.rand(10) * 10})
print("df = \n", df)
print('-' * 200)
key1_cumsum = df['key1'].cumsum()
key2_cumsum = df['key2'].cumsum()
print("key1_cumsum = \n{0} \ntype(key1_cumsum) = {1}".format(key1_cumsum, type(key1_cumsum)))
print('-' * 50)
print("key2_cumsum = \n{0} \ntype(key2_cumsum) = {1}".format(key2_cumsum, type(key2_cumsum)))
print('-' * 50)
df['key1_cumsum'] = df['key1'].cumsum()
df['key2_cumsum'] = df['key2'].cumsum()
print("After adding the cumulative-sum (cumsum) columns: df = \n", df)
print('-' * 200)
key1_cumprod = df['key1'].cumprod()
key2_cumprod = df['key2'].cumprod()
print("key1_cumprod = \n{0} \ntype(key1_cumprod) = {1}".format(key1_cumprod, type(key1_cumprod)))
print('-' * 50)
print("key2_cumprod = \n{0} \ntype(key2_cumprod) = {1}".format(key2_cumprod, type(key2_cumprod)))
print('-' * 50)
df['key1_cumprod'] = key1_cumprod
df['key2_cumprod'] = key2_cumprod
print("After adding the cumulative-product (cumprod) columns: df = \n", df)
print('-' * 200)
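# cummax and cummin compute the running maximum and running minimum of every column
# (key1, key2, and the columns added above) and return a new object; df is not modified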
df1 = df.cummax()
df2 = df.cummin()
print("df = \n", df)
print('-' * 50)
print("df1 = df.cummax() = \n", df1)
print('-' * 50)
print("df2 = df.cummin() = \n", df2)
print('-' * 200)
Output:
df =
key1 key2
0 0 5.946567
1 1 6.500338
2 2 0.517269
3 3 6.888832
4 4 0.029891
5 5 6.908777
6 6 4.522801
7 7 6.755125
8 8 6.676930
9 9 3.002233
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
key1_cumsum =
0 0
1 1
2 3
3 6
4 10
5 15
6 21
7 28
8 36
9 45
Name: key1, dtype: int32
type(key1_cumsum) = <class 'pandas.core.series.Series'>
--------------------------------------------------
key2_cumsum =
0 5.946567
1 12.446905
2 12.964174
3 19.853006
4 19.882897
5 26.791673
6 31.314474
7 38.069599
8 44.746529
9 47.748762
Name: key2, dtype: float64
type(key2_cumsum) = <class 'pandas.core.series.Series'>
--------------------------------------------------
After adding the cumulative-sum (cumsum) columns: df =
key1 key2 key1_cumsum key2_cumsum
0 0 5.946567 0 5.946567
1 1 6.500338 1 12.446905
2 2 0.517269 3 12.964174
3 3 6.888832 6 19.853006
4 4 0.029891 10 19.882897
5 5 6.908777 15 26.791673
6 6 4.522801 21 31.314474
7 7 6.755125 28 38.069599
8 8 6.676930 36 44.746529
9 9 3.002233 45 47.748762
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
key1_cumprod =
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
Name: key1, dtype: int32
type(key1_cumprod) = <class 'pandas.core.series.Series'>
--------------------------------------------------
key2_cumprod =
0 5.946567
1 38.654696
2 19.994865
3 137.741271
4 4.117176
5 28.444652
6 128.649488
7 869.043329
8 5802.541623
9 17420.580379
Name: key2, dtype: float64
type(key2_cumprod) = <class 'pandas.core.series.Series'>
--------------------------------------------------
After adding the cumulative-product (cumprod) columns: df =
key1 key2 key1_cumsum key2_cumsum key1_cumprod key2_cumprod
0 0 5.946567 0 5.946567 0 5.946567
1 1 6.500338 1 12.446905 0 38.654696
2 2 0.517269 3 12.964174 0 19.994865
3 3 6.888832 6 19.853006 0 137.741271
4 4 0.029891 10 19.882897 0 4.117176
5 5 6.908777 15 26.791673 0 28.444652
6 6 4.522801 21 31.314474 0 128.649488
7 7 6.755125 28 38.069599 0 869.043329
8 8 6.676930 36 44.746529 0 5802.541623
9 9 3.002233 45 47.748762 0 17420.580379
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
df =
key1 key2 key1_cumsum key2_cumsum key1_cumprod key2_cumprod
0 0 5.946567 0 5.946567 0 5.946567
1 1 6.500338 1 12.446905 0 38.654696
2 2 0.517269 3 12.964174 0 19.994865
3 3 6.888832 6 19.853006 0 137.741271
4 4 0.029891 10 19.882897 0 4.117176
5 5 6.908777 15 26.791673 0 28.444652
6 6 4.522801 21 31.314474 0 128.649488
7 7 6.755125 28 38.069599 0 869.043329
8 8 6.676930 36 44.746529 0 5802.541623
9 9 3.002233 45 47.748762 0 17420.580379
--------------------------------------------------
df1 = df.cummax() =
key1 key2 key1_cumsum key2_cumsum key1_cumprod key2_cumprod
0 0 5.946567 0 5.946567 0 5.946567
1 1 6.500338 1 12.446905 0 38.654696
2 2 6.500338 3 12.964174 0 38.654696
3 3 6.888832 6 19.853006 0 137.741271
4 4 6.888832 10 19.882897 0 137.741271
5 5 6.908777 15 26.791673 0 137.741271
6 6 6.908777 21 31.314474 0 137.741271
7 7 6.908777 28 38.069599 0 869.043329
8 8 6.908777 36 44.746529 0 5802.541623
9 9 6.908777 45 47.748762 0 17420.580379
--------------------------------------------------
df2 = df.cummin() =
key1 key2 key1_cumsum key2_cumsum key1_cumprod key2_cumprod
0 0 5.946567 0 5.946567 0 5.946567
1 0 5.946567 0 5.946567 0 5.946567
2 0 0.517269 0 5.946567 0 5.946567
3 0 0.517269 0 5.946567 0 5.946567
4 0 0.029891 0 5.946567 0 4.117176
5 0 0.029891 0 5.946567 0 4.117176
6 0 0.029891 0 5.946567 0 4.117176
7 0 0.029891 0 5.946567 0 4.117176
8 0 0.029891 0 5.946567 0 4.117176
9 0 0.029891 0 5.946567 0 4.117176
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
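Two details of the output are worth spelling out. key1_cumprod is all zeros because key1 starts at 0, so every later product includes that zero. The script's comment also mentions np.nan without using it. The sketch below, with hypothetical values, shows how the cumulative functions treat missing values: by default (skipna=True) a NaN stays NaN at its own position while the accumulation continues over the remaining values.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the leading 0 mirrors key1 = np.arange(10)
with_zero = pd.Series([0, 1, 2, 3])
print(with_zero.cumprod().tolist())   # [0, 0, 0, 0]

# Starting at 1 instead gives running factorials
from_one = pd.Series([1, 2, 3, 4])
print(from_one.cumprod().tolist())    # [1, 2, 6, 24]

# Missing values: skipna=True (the default) leaves NaN in place
# but keeps accumulating over the values that follow
with_nan = pd.Series([1.0, np.nan, 2.0])
default_sum = with_nan.cumsum()              # 1.0, NaN, 3.0
strict_sum = with_nan.cumsum(skipna=False)   # 1.0, NaN, NaN
print(default_sum.tolist())
print(strict_sum.tolist())
```

Passing skipna=False is useful when a missing value should invalidate everything that comes after it.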
II. How to use the cumulative statistics functions
All of the functions above work on both Series and DataFrame objects.
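A minimal sketch of the difference, assuming a small hypothetical frame named `frame`: on a DataFrame the accumulation runs down each column by default (axis=0), and axis=1 accumulates across the columns of each row instead.

```python
import pandas as pd

# Hypothetical two-column frame for illustration
frame = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_wise = frame.cumsum()        # down each column (axis=0, the default)
row_wise = frame.cumsum(axis=1)  # across the columns of each row
single = frame["a"].cumsum()     # a Series in, a Series out

print(col_wise)
print(row_wise)
print(type(single))
```

In both cases the result has the same shape as the input, and the original object is left unchanged.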
Here we accumulate in chronological order, from the earliest date to the latest.
- Sort
# Sort by index (date) before computing the cumulative sum
data = data.sort_index()
- Compute the cumulative sum of p_change
stock_rise = data['p_change']
# The plot method bundles the histogram, bar, pie, and line charts covered earlier
stock_rise.cumsum()
2015-03-02 2.62
2015-03-03 4.06
2015-03-04 5.63
2015-03-05 7.65
2015-03-06 16.16
2015-03-09 16.37
2015-03-10 18.75
2015-03-11 16.36
2015-03-12 15.03
2015-03-13 17.58
2015-03-16 20.34
2015-03-17 22.42
2015-03-18 23.28
2015-03-19 23.74
2015-03-20 23.48
2015-03-23 23.74
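The snippet relies on a `data` frame loaded elsewhere. As a self-contained sketch of the same two steps, the example below builds a hypothetical stand-in: the four daily changes are back-computed from the first four cumulative values listed above, and the rows are deliberately placed out of date order to show why the sort comes first.

```python
import pandas as pd

# Hypothetical stand-in for `data`: daily changes back-computed from the
# first four cumulative values in the listing, in shuffled date order
data = pd.DataFrame(
    {"p_change": [2.02, 1.44, 2.62, 1.57]},
    index=pd.to_datetime(
        ["2015-03-05", "2015-03-03", "2015-03-02", "2015-03-04"]
    ),
)

data = data.sort_index()             # oldest date first
running = data["p_change"].cumsum()  # running total in date order
print(running.round(2))
```

Without sort_index the cumulative sum would follow row order rather than date order, so the running total would not be chronological.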
Plot the cumulative sum with matplotlib:
To use the plot function, matplotlib needs to be imported.
import matplotlib.pyplot as plt
# plot draws the figure
stock_rise.cumsum().plot()
# show() must be called for the figure to be displayed
plt.show()