Grouping data
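Splitting: split the data into groups based on some criteria
Applying: apply a function to each group independently
Combining: combine the per-group results into a final result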
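The three steps can be spelled out by hand before relying on the one-line groupby call. The following is a minimal sketch, assuming a small hypothetical frame with a `key` and a `value` column (neither name comes from this article):

```python
import pandas as pd

# Hypothetical data, used only to illustrate the three steps
df = pd.DataFrame(
    {
        "key": ["a", "b", "a", "b"],
        "value": [1, 2, 3, 4],
    }
)

# Splitting: one sub-frame per distinct key
pieces = {key: group for key, group in df.groupby("key")}

# Applying: run a function on each piece independently
sums = {key: group["value"].sum() for key, group in pieces.items()}

# Combining: gather the per-group results into a single Series
combined = pd.Series(sums, name="value")
combined.index.name = "key"

# The usual one-liner performs all three steps at once
shortcut = df.groupby("key")["value"].sum()
```

Both `combined` and `shortcut` hold the same per-key totals; the explicit version only exists to make the individual steps visible.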
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
        "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
        "C": np.random.randn(8),  # random values, so your numbers will differ
        "D": np.random.randn(8),
    }
)
df
     A      B         C         D
0  foo    one -0.738005 -2.019732
1  bar    one  0.887627  0.015670
2  foo    two -0.108933 -0.077614
3  bar  three  0.076641  1.675694
4  foo    two -0.787585  0.466678
5  bar    two  0.193921 -0.345819
6  foo    one  0.846988 -1.513333
7  foo  three  1.110915  0.189766
Group and apply sum()
Group the rows, then sum the values within each group.
                  C         D
A   B
bar one    0.887627  0.015670
    three  0.076641  1.675694
    two    0.193921 -0.345819
foo one    0.108983 -3.533064
    three  1.110915  0.189766
    two   -0.896518  0.389064
In the table above, the rows are grouped by A first, then by B.
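The call behind this result is not shown in the text. A plausible sketch, assuming `groupby` with the keys in the order A, B, and using the C and D values printed in the first table in place of random numbers:

```python
import pandas as pd

# Same A/B keys as the article; C/D are copied from its first printed table
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
        "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
        "C": [-0.738005, 0.887627, -0.108933, 0.076641,
              -0.787585, 0.193921, 0.846988, 1.110915],
        "D": [-2.019732, 0.015670, -0.077614, 1.675694,
              0.466678, -0.345819, -1.513333, 0.189766],
    }
)

# Group by A first, then B, and sum C and D within each (A, B) pair
by_a_then_b = df.groupby(["A", "B"])[["C", "D"]].sum()
print(by_a_then_b)
```

The order of the keys passed to `groupby` determines the order of the index levels in the result.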
                  C         D
B     A
one   bar  0.887627  0.015670
      foo  0.108983 -3.533064
three bar  0.076641  1.675694
      foo  1.110915  0.189766
two   bar  0.193921 -0.345819
      foo -0.896518  0.389064
In the table above, the rows are grouped by B first, then by A.
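Reversing the key order changes only how the index is nested, not the sums themselves. A small sketch of that, using illustrative values rather than the article's data:

```python
import pandas as pd

# Illustrative values, not the article's random data
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "B": ["one", "one", "two", "one"],
        "C": [1.0, 2.0, 3.0, 4.0],
    }
)

by_a_then_b = df.groupby(["A", "B"])[["C"]].sum()  # index levels: A, B
by_b_then_a = df.groupby(["B", "A"])[["C"]].sum()  # index levels: B, A

# Swapping the levels back and re-sorting recovers the A-then-B layout
roundtrip = by_b_then_a.swaplevel().sort_index()
```

Here `roundtrip` carries the same labels and values as `by_a_then_b`, which suggests the two groupings differ only in presentation.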
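Note: to operate on several columns, select them with [["C", "D"]]. To operate on a single column, use ["C"], which yields a Series; [["C"]] also works and yields a one-column DataFrame.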
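The difference between the selection forms shows up in the type of the result. A minimal sketch with a small hypothetical frame:

```python
import pandas as pd

# Hypothetical data with one key column and two numeric columns
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo"],
        "C": [1.0, 2.0, 3.0],
        "D": [10.0, 20.0, 30.0],
    }
)
grouped = df.groupby("A")

both = grouped[["C", "D"]].sum()    # list of columns -> DataFrame
single_series = grouped["C"].sum()  # single label -> Series
single_frame = grouped[["C"]].sum() # one-element list -> one-column DataFrame
```

Choosing `["C"]` or `[["C"]]` is mostly a question of whether the next step expects a Series or a DataFrame.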
Reshaping
Stack
tuples = list(
    zip(
        ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
        ["one", "two", "one", "two", "one", "two", "one", "two"],
    )
)
# tuples
# Build a two-level MultiIndex from the tuples
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
df = pd.DataFrame(np.random.randn(8, 3), columns=["C1", "C2", "C3"], index=index)
df2 = df[:5]  # keep the first five rows
df2
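stack() compresses the columns of the DataFrame into a single column: the column labels become a new innermost index level.
In the example above, df2 has shape (5, 3).
stacked, the result of df2.stack(), has shape (15,).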
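The shapes quoted above can be checked with a short sketch. It assumes `stacked` is the result of `df2.stack()` and replaces the random values with a deterministic range so the output is repeatable:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two"), ("foo", "one")],
    names=["first", "second"],
)
# Deterministic stand-in for np.random.randn(5, 3)
df2 = pd.DataFrame(
    np.arange(15.0).reshape(5, 3), columns=["C1", "C2", "C3"], index=index
)

# Move the column labels into a new innermost index level
stacked = df2.stack()

print(df2.shape)      # (5, 3)
print(stacked.shape)  # (15,)
```

The result is a Series whose index has three levels: `first`, `second`, and an unnamed level holding the former column labels.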
Pivot
Create a spreadsheet-style pivot table as a DataFrame.
Function signature: pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)
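To see how a few of these parameters interact, here is a small sketch on hypothetical sales data (the column names `region`, `product`, and `amount` are invented for illustration). It uses `aggfunc="mean"`, the default listed in the signature, and `fill_value=0` to replace missing cells:

```python
import pandas as pd

# Hypothetical data: two rows share the (north, x) cell, (north, y) is missing
sales = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south"],
        "product": ["x", "x", "x", "y"],
        "amount": [1.0, 3.0, 5.0, 7.0],
    }
)

table = pd.pivot_table(
    sales,
    values="amount",    # column that supplies the cell values
    index="region",     # row labels
    columns="product",  # column labels
    aggfunc="mean",     # how rows that fall in the same cell are combined
    fill_value=0,       # value used where a cell has no data
)
print(table)
```

The two `north`/`x` rows are averaged into one cell, and the empty `north`/`y` cell shows 0 instead of NaN.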
df = pd.DataFrame(
    {
        "C1": ["one", "one", "two", "three"] * 3,
        "C2": ["A", "B", "C"] * 4,
        "C3": ["foo", "foo", "foo", "bar", "bar", "bar"] * 2,
        "C4": np.random.randn(12),
        "C5": np.random.randn(12),
    }
)
df
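Use the values of column C1 as the new column labels.
Use the values of columns C2 and C3 as the index.
Use the values of column C5 as the cell values; cells with no data are filled with NaN.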
C1           one     three       two
C2 C3
A  bar  0.225416 -1.335228       NaN
   foo -0.049645       NaN -1.054699
B  bar  0.594608       NaN -1.495795
   foo -2.182207 -0.359334       NaN
C  bar -0.873641  1.551327       NaN
   foo -1.594076       NaN -0.669410
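The call that produces a table with this layout is not shown in the text. One plausible sketch is below; it assumes `pd.pivot_table` with C5 as the values, C2 and C3 as the index, and C1 as the columns, and it uses a seeded generator (an assumption, so the numbers will not match the table above):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded for repeatability; not in the article
df = pd.DataFrame(
    {
        "C1": ["one", "one", "two", "three"] * 3,
        "C2": ["A", "B", "C"] * 4,
        "C3": ["foo", "foo", "foo", "bar", "bar", "bar"] * 2,
        "C4": rng.standard_normal(12),
        "C5": rng.standard_normal(12),
    }
)

# C1 values become column labels, (C2, C3) pairs become the row index,
# and C5 supplies the cell values; empty combinations stay NaN
table = pd.pivot_table(df, values="C5", index=["C2", "C3"], columns=["C1"])
print(table)
```

Each (C2, C3, C1) combination occurs at most once in this frame, so the default mean aggregation simply passes the single C5 value through.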