What kind of data does pandas handle?

pandas data table representation


Suppose we want to store data about some Titanic passengers, and for each passenger we know the name, age, and sex:

import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)
df

                       Name  Age     Sex
0   Braund, Mr. Owen Harris   22    male
1  Allen, Mr. William Henry   35    male
2  Bonnell, Miss. Elizabeth   58  female

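To manually store data in a table, create a DataFrame. When you pass a Python dictionary of lists, the dictionary keys are used as the column headers and the values in each list become the columns of the DataFrame.
A DataFrame is a two-dimensional data structure that can store data of different types in its columns (including strings, integers, floating-point values, categorical data, and more). It is similar to a spreadsheet, a SQL table, or a data.frame in R.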
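To make the point about mixed column types concrete, here is a minimal sketch. The `passengers` variable and the "Fare" column are assumptions added only for illustration; they are not part of the example above.

```python
import pandas as pd

# Hypothetical sample: the "Fare" column is added only to show a float column.
passengers = pd.DataFrame(
    {
        "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry"],
        "Age": [22, 35],
        "Fare": [7.25, 8.05],
    }
)

# The dictionary keys become the column headers, in insertion order.
column_names = list(passengers.columns)

# Each column keeps its own data type (text, integer, float).
print(column_names)
print(passengers.dtypes)
```

Printing `passengers.dtypes` is a quick way to check how pandas interpreted each column after building a table by hand.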

Each column in a DataFrame is a Series


To work with just the data in the Age column:

df["Age"]

0    22
1    35
2    58
Name: Age, dtype: int64

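Selecting a single column of a DataFrame returns a Series. To select a column, put its name between square brackets [].
You can also create a Series from scratch.
A Series has no column labels, because it is just a single column of a DataFrame. It does have row labels.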

ages = pd.Series([22, 35, 58], name="Age")
ages

0    22
1    35
2    58
Name: Age, dtype: int64
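The row labels mentioned above can be inspected through the index of the Series. The sketch below assumes the same three ages; the custom labels "a", "b", "c" are hypothetical and chosen only for illustration.

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

# Without an explicit index, the row labels are the integers 0, 1, 2.
default_labels = list(ages.index)

# Hypothetical custom row labels, passed through the index argument.
labeled_ages = pd.Series([22, 35, 58], index=["a", "b", "c"], name="Age")
age_b = labeled_ages["b"]  # look up a value by its row label

# A column selected from a DataFrame is the same kind of object.
df = pd.DataFrame({"Age": [22, 35, 58]})
is_series = isinstance(df["Age"], pd.Series)

print(default_labels, age_b, is_series)
```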

Do something with a DataFrame or Series

To find the maximum age of the passengers:

df["Age"].max()
# 58

pandas provides a large number of methods that can be applied to a DataFrame or a Series. Because methods are functions, do not forget the parentheses ().
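A small sketch of why the parentheses matter, using a reduced version of the table from earlier. The variable names `oldest`, `not_called`, and `column_maxima` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Age": [22, 35, 58], "Sex": ["male", "male", "female"]})

# With parentheses the method runs and returns the value.
oldest = df["Age"].max()

# Without parentheses nothing is computed; this is only a reference to the method.
not_called = df["Age"].max

# The same method applied to a DataFrame works column by column
# and returns a Series with one entry per column.
column_maxima = df[["Age"]].max()

print(oldest)
print(callable(not_called))
print(column_maxima)
```

Forgetting the parentheses does not raise an error, so it is easy to miss: the result is a method object rather than a number.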

describe

Basic descriptive statistics of the numerical data:

df.describe()

             Age
count   3.000000
mean   38.333333
std    18.230012
min    22.000000
25%    28.500000
50%    35.000000
75%    46.500000
max    58.000000

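The describe() method gives a quick overview of the numerical data in a DataFrame. Because the Name and Sex columns hold text, describe() does not include them by default.
Many pandas operations return a DataFrame or a Series. The describe() method is one example: called on a DataFrame it returns a DataFrame, and called on a Series it returns a Series.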
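The return types described above can be checked directly. The sketch below rebuilds the same table; the variable names are assumptions for illustration. Passing include="all" to describe() is one way to have the text columns summarized as well.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

# On a DataFrame, describe() returns a DataFrame (numeric columns only by default).
frame_summary = df.describe()

# On a Series, describe() returns a Series.
series_summary = df["Age"].describe()

# include="all" also summarizes the text columns (count, unique, top, freq).
full_summary = df.describe(include="all")

print(type(frame_summary).__name__, type(series_summary).__name__)
print(full_summary)
```

In the full summary, statistics that do not apply to a column (for example the mean of Name) appear as NaN.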