Python Cool Library Tour: Third-Party Library Pandas (119)

I. Usage Details

526. pandas.DataFrame.head method

526-1. Syntax

# 526. pandas.DataFrame.head method
pandas.DataFrame.head(n=5)
Return the first n rows.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

For negative values of n, this function returns all rows except the last |n| rows, equivalent to df[:n].

If n is larger than the number of rows, this function returns all rows.

Parameters:
n
int, default 5
Number of rows to select.

Returns:
same type as caller
The first n rows of the caller object.

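526-2. Parameters

526-2-1. n (optional, default 5): an integer giving the number of rows to return. With the default n=5, the first 5 rows are returned. A positive n returns the first n rows; a negative n returns every row except the last |n| rows.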
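The three cases for n described above (positive, negative, and larger than the row count) can be checked with a short sketch. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': range(10), 'B': range(10, 20)})

first_three = df.head(3)         # rows 0..2
all_but_last_two = df.head(-2)   # rows 0..7, the same rows as df[:-2]
everything = df.head(100)        # n exceeds the row count, so all 10 rows come back

print(len(first_three), len(all_but_last_two), len(everything))
# 3 8 10
```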

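526-3. Function

Returns the first few rows of a DataFrame. It is typically used to take a quick look at the top of the data and check its structure, column names, and sample values.

526-4. Return Value

Returns a pandas.DataFrame containing the first n rows of the original DataFrame. If the DataFrame has fewer than n rows, the whole DataFrame is returned.

526-5. Notes

None

526-6. Usage

526-6-1. Data Preparation

None

526-6-2. Code Example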

# 526. pandas.DataFrame.head method
import pandas as pd
data = {'A': range(10), 'B': range(10, 20)}
df = pd.DataFrame(data)
# Return the first 5 rows
print(df.head())

526-6-3. Output

# 526. pandas.DataFrame.head method
#    A   B
# 0  0  10
# 1  1  11
# 2  2  12
# 3  3  13
# 4  4  14

527. pandas.DataFrame.idxmax method

527-1. Syntax

# 527. pandas.DataFrame.idxmax method
pandas.DataFrame.idxmax(axis=0, skipna=True, numeric_only=False)
Return index of first occurrence of maximum over requested axis.

NA/null values are excluded.

Parameters:
axis
{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

skipna
bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

numeric_only
bool, default False
Include only float, int or boolean data.

New in version 1.5.0.

Returns:
Series
Indexes of maxima along the specified axis.

Raises:
ValueError
If the row/column is empty.

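527-2. Parameters

527-2-1. axis (optional, default 0): {0 or 'index', 1 or 'columns'}. With 0 or 'index', the method scans down each column and returns the row label of its maximum; with 1 or 'columns', it scans across each row and returns the column label of its maximum.

527-2-2. skipna (optional, default True): a boolean. If True, NaN values are ignored. If False, a column (or row) that contains NaN yields NaN; recent pandas versions deprecate this case and are moving toward raising ValueError instead.

527-2-3. numeric_only (optional, default False): a boolean. If True, only float, int, and boolean columns are considered. If False, all columns are included, which may raise an error when a column holds non-numeric data.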
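As a minimal sketch of the axis and numeric_only parameters described above, the snippet below searches row-wise and then restricts the search to numeric columns. The name column is a hypothetical text column added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 3, 5],
    'B': [4, 2, 8],
    'C': [7, None, 6],
})

# axis=1: for each row, the column label that holds that row's maximum
row_max_cols = df.idxmax(axis=1)
print(row_max_cols.tolist())
# ['C', 'A', 'B']

# Hypothetical text column; numeric_only=True leaves it out of the result
with_text = df.assign(name=['x', 'y', 'z'])
numeric_max = with_text.idxmax(numeric_only=True)
print(list(numeric_max.index))
# ['A', 'B', 'C']
```

In the second row, C is NaN and is skipped (skipna=True by default), so the maximum is found in column A.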

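527-3. Function

Locates the maximum values in a DataFrame: the row label of each column's maximum (column-wise search) or the column label of each row's maximum (row-wise search).

527-4. Return Value

Returns a Series. Its index is the column index of the original DataFrame (column-wise search) or the row index (row-wise search), and its values are the index labels of the maximum in each column (or row).

527-5. Notes

None

527-6. Usage

527-6-1. Data Preparation

None

527-6-2. Code Example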

# 527. pandas.DataFrame.idxmax method
import pandas as pd
# Create a sample DataFrame
data = {
    'A': [1, 3, 5],
    'B': [4, 2, 8],
    'C': [7, None, 6]
}
df = pd.DataFrame(data)
# Find the index label of the maximum in each column
max_indices = df.idxmax(axis=0)
print(max_indices)

527-6-3. Output

# 527. pandas.DataFrame.idxmax method
# A    2
# B    2
# C    0
# dtype: int64

528. pandas.DataFrame.idxmin method

528-1. Syntax

# 528. pandas.DataFrame.idxmin method
pandas.DataFrame.idxmin(axis=0, skipna=True, numeric_only=False)
Return index of first occurrence of minimum over requested axis.

NA/null values are excluded.

Parameters:
axis
{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

skipna
bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

numeric_only
bool, default False
Include only float, int or boolean data.

New in version 1.5.0.

Returns:
Series
Indexes of minima along the specified axis.

Raises:
ValueError
If the row/column is empty.

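528-2. Parameters

528-2-1. axis (optional, default 0): {0 or 'index', 1 or 'columns'}. With 0 or 'index', the method scans down each column and returns the row label of its minimum; with 1 or 'columns', it scans across each row and returns the column label of its minimum.

528-2-2. skipna (optional, default True): a boolean. If True, NaN values are ignored. If False, a column (or row) that contains NaN yields NaN; recent pandas versions deprecate this case and are moving toward raising ValueError instead.

528-2-3. numeric_only (optional, default False): a boolean. If True, only float, int, and boolean columns are considered. If False, all columns are included, which may raise an error when a column holds non-numeric data.

528-3. Function

Locates the minimum values in a DataFrame: the row label of each column's minimum (column-wise search) or the column label of each row's minimum (row-wise search).

528-4. Return Value

Returns a Series. Its index is the column index of the original DataFrame (column-wise search) or the row index (row-wise search), and its values are the index labels of the minimum in each column (or row).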
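With the default integer index, labels and positions look identical, so it is easy to miss that the result holds labels. The sketch below uses a string index (the labels 'x', 'y', 'z' are assumptions for illustration) to make the difference visible.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 3, 5], 'B': [4, 2, 8], 'C': [7, None, 6]},
    index=['x', 'y', 'z'],
)

# Column-wise: the row label of each column's minimum
col_min_labels = df.idxmin()
print(col_min_labels.tolist())
# ['x', 'y', 'z']

# Row-wise: the column label of each row's minimum
row_min_cols = df.idxmin(axis=1)
print(row_min_cols.tolist())
# ['A', 'B', 'A']
```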

528-5. Notes

None

528-6. Usage

528-6-1. Data Preparation

None

528-6-2. Code Example

# 528. pandas.DataFrame.idxmin method
import pandas as pd
# Create a sample DataFrame
data = {
    'A': [1, 3, 5],
    'B': [4, 2, 8],
    'C': [7, None, 6]
}
df = pd.DataFrame(data)
# Find the index label of the minimum in each column
min_indices = df.idxmin(axis=0)
print(min_indices)

528-6-3. Output

# 528. pandas.DataFrame.idxmin method
# A    0
# B    1
# C    2
# dtype: int64

529. pandas.DataFrame.last method

529-1. Syntax

# 529. pandas.DataFrame.last method
pandas.DataFrame.last(offset)
Select final periods of time series data based on a date offset.

Deprecated since version 2.1: last() is deprecated and will be removed in a future version. Please create a mask and filter using .loc instead.

For a DataFrame with a sorted DatetimeIndex, this function selects the last few rows based on a date offset.

Parameters:
offset
str, DateOffset, dateutil.relativedelta
The offset length of the data that will be selected. For instance, ‘3D’ will display all the rows having their index within the last 3 days.

Returns:
Series or DataFrame
A subset of the caller.

Raises:
TypeError
If the index is not a DatetimeIndex.

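529-2. Parameters

529-2-1. offset (required): a string, DateOffset, or dateutil.relativedelta giving the length of the time window, counted back from the last index value. Offset strings can use different time units, such as '1D' (1 day), '2H' (2 hours), or '30T' (30 minutes); pandas 2.2 and later prefer the lowercase aliases '2h' and '30min'.

529-3. Function

Returns the final stretch of a DataFrame's data for the given time offset. It is meant for time series data, that is, a DataFrame with a sorted DatetimeIndex.

529-4. Return Value

Returns a new DataFrame holding the rows that fall inside the given time window counted back from the end. If no rows qualify, an empty DataFrame is returned.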
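Because last() is deprecated since pandas 2.1, the documentation quoted above recommends building a mask and filtering with .loc. The following is a minimal sketch of that approach for a fixed 3-day window; the cutoff variable is an illustrative name, and the strict comparison is an assumption chosen to reproduce the rows that last('3D') selects on this daily data.

```python
import pandas as pd

date_rng = pd.date_range(start='2024-08-01', end='2024-08-07', freq='D')
df = pd.DataFrame({'A': range(1, 8)}, index=date_rng)

# Everything strictly later than (last timestamp - 3 days)
cutoff = df.index.max() - pd.Timedelta(days=3)
last_3_days = df.loc[df.index > cutoff]

print(last_3_days['A'].tolist())
# [5, 6, 7]
```

For calendar-based offsets such as month ends, the window boundaries may not match last() exactly, so the mask should be written to state the intended range explicitly.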

529-5. Notes

None

529-6. Usage

529-6-1. Data Preparation

None

529-6-2. Code Example

# 529. pandas.DataFrame.last method
# Note: last() is deprecated since pandas 2.1 and emits a FutureWarning there
import pandas as pd
# Create a date range (limited to 7 days to match the length of the data)
date_rng = pd.date_range(start='2024-08-01', end='2024-08-07', freq='D')
# Create a dataset that matches the dates
data = {
    'A': [1, 2, 3, 4, 5, 6, 7],
    'B': [7, 6, 5, 4, 3, 2, 1]
}
# Create the DataFrame with the date range as its index
df = pd.DataFrame(data, index=date_rng)
# Print the whole DataFrame
print("Full DataFrame:")
print(df)
# Use last() to get the data for the last 3 days
last_3_days = df.last('3D')
print("\nData for the last 3 days:")
print(last_3_days)
# Use last() to get the data for the last 5 days
last_5_days = df.last('5D')
print("\nData for the last 5 days:")
print(last_5_days)
# Use last() to get the data for the last 10 days
last_10_days = df.last('10D')
print("\nData for the last 10 days (beyond the actual data range):")
print(last_10_days)

529-6-3. Output

# 529. pandas.DataFrame.last method
# Full DataFrame:
#             A  B
# 2024-08-01  1  7
# 2024-08-02  2  6
# 2024-08-03  3  5
# 2024-08-04  4  4
# 2024-08-05  5  3
# 2024-08-06  6  2
# 2024-08-07  7  1
# 
# Data for the last 3 days:
#             A  B
# 2024-08-05  5  3
# 2024-08-06  6  2
# 2024-08-07  7  1
# 
# Data for the last 5 days:
#             A  B
# 2024-08-03  3  5
# 2024-08-04  4  4
# 2024-08-05  5  3
# 2024-08-06  6  2
# 2024-08-07  7  1
# 
# Data for the last 10 days (beyond the actual data range):
#             A  B
# 2024-08-01  1  7
# 2024-08-02  2  6
# 2024-08-03  3  5
# 2024-08-04  4  4
# 2024-08-05  5  3
# 2024-08-06  6  2
# 2024-08-07  7  1

530. pandas.DataFrame.reindex method

530-1. Syntax

# 530. pandas.DataFrame.reindex method
pandas.DataFrame.reindex(labels=None, *, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=nan, limit=None, tolerance=None)
Conform DataFrame to new index with optional filling logic.

Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters:
labels
array-like, optional
New labels / index to conform the axis specified by ‘axis’ to.

index
array-like, optional
New labels for the index. Preferably an Index object to avoid duplicating data.

columns
array-like, optional
New labels for the columns. Preferably an Index object to avoid duplicating data.

axis
int or str, optional
Axis to target. Can be either the axis name (‘index’, ‘columns’) or number (0, 1).

method
{None, ‘backfill’/‘bfill’, ‘pad’/‘ffill’, ‘nearest’}
Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

None (default): don’t fill gaps

pad / ffill: Propagate last valid observation forward to next valid.

backfill / bfill: Use next valid observation to fill gap.

nearest: Use nearest valid observations to fill gap.

copy
bool, default True
Return a new object, even if the passed indexes are the same.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

level
int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
scalar, default np.nan
Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

limit
int, default None
Maximum number of consecutive elements to forward or backward fill.

tolerance
optional
Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

Returns:
DataFrame with changed index.

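530-2. Parameters

530-2-1. labels (optional, default None): the new labels to conform to, applied to the axis named by axis. It is rarely passed directly; index or columns is usually used instead.

530-2-2. index (optional, default None): the new row labels. When given, the rows are rearranged to match it; any label that does not exist in the original DataFrame gets a row filled with NaN (or with fill_value, if specified).

530-2-3. columns (optional, default None): the new column labels. As with index, the columns are rearranged to match it; any column label that does not exist in the original DataFrame is also filled with NaN.

530-2-4. axis (optional, default None): {0 or 'index', 1 or 'columns'}, the axis that labels applies to. With 1 or 'columns', the column labels are reindexed.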
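The labels-plus-axis form and the keyword form are two spellings of the same operation. A short sketch, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# Positional labels with an explicit axis ...
cols_by_axis = df.reindex(['B', 'A'], axis='columns')
# ... give the same result as the columns keyword
cols_by_keyword = df.reindex(columns=['B', 'A'])

rows_by_axis = df.reindex(['c', 'a'], axis='index')
rows_by_keyword = df.reindex(index=['c', 'a'])

print(cols_by_axis.equals(cols_by_keyword), rows_by_axis.equals(rows_by_keyword))
# True True
```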

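530-2-5. method (optional, default None): {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, the fill method for labels that are missing from the original index. It only applies when the index is monotonically increasing or decreasing. Options:

'backfill' or 'bfill': fill with the next valid observation (backward fill)
'pad' or 'ffill': propagate the last valid observation forward (forward fill)
'nearest': fill with the nearest valid observation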
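The direction of 'ffill' and 'bfill' is easy to confuse, so here is a minimal sketch on a small monotonic integer index. The value column and the index values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'value': [10, 20, 30]}, index=[0, 2, 4])
new_index = [0, 1, 2, 3, 4, 5]

# ffill: each new label takes the value of the closest earlier label
forward = df.reindex(new_index, method='ffill')
print(forward['value'].tolist())
# [10, 10, 20, 20, 30, 30]

# bfill: each new label takes the value of the closest later label;
# label 5 has no later label, so it stays NaN
backward = df.reindex(new_index, method='bfill')
print(backward['value'].isna().tolist())
# [False, False, False, False, False, True]
```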

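530-2-6. copy (optional, default None): a boolean. If False, pandas tries to avoid copying the data. A new object is produced unless the new index is equivalent to the current one and copy=False.

530-2-7. level (optional, default None): an integer or a name. It applies only to a DataFrame with a MultiIndex and selects the level to match against.

530-2-8. fill_value (optional, default nan): a scalar used to fill the values that are missing after reindexing.

530-2-9. limit (optional, default None): an integer, the maximum number of consecutive elements to forward or backward fill. It is used together with method.

530-2-10. tolerance (optional, default None): the maximum distance allowed between an original label and a new label for an inexact match. It is used together with method ('pad'/'ffill', 'backfill'/'bfill', or 'nearest').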
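As a sketch of how tolerance interacts with method, the example below reindexes the same data with 'nearest' and with 'ffill', both limited to a distance of 2. The index values and target labels are assumptions chosen so the two methods give different results.

```python
import pandas as pd

df = pd.DataFrame({'value': [10, 20, 30]}, index=[0, 10, 20])
target = [1, 9, 12, 20]

# nearest: 1 -> 0, 9 -> 10, 12 -> 10, 20 -> 20, all within distance 2
nearest = df.reindex(target, method='nearest', tolerance=2)
print(nearest['value'].tolist())
# [10, 20, 20, 30]

# ffill only looks backward: for label 9 the previous label is 0,
# which is 9 away and exceeds the tolerance, so that row is NaN
padded = df.reindex(target, method='ffill', tolerance=2)
print(padded['value'].isna().tolist())
# [False, True, False, False]
```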

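530-3. Function

Conforms the rows and columns of a DataFrame to the given index and columns arguments. Labels that do not exist in the original DataFrame are filled with NaN or with the specified fill_value.

530-4. Return Value

Returns a new DataFrame arranged according to the specified index and columns. It may contain fill values (NaN or the user-supplied fill_value) at positions whose labels do not exist in the original DataFrame.

530-5. Notes

None

530-6. Usage

530-6-1. Data Preparation

None

530-6-2. Code Example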

# 530. pandas.DataFrame.reindex method
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}, index=['a', 'b', 'c'])
print("Original DataFrame:")
print(df)
# Reindex the DataFrame with new row and column labels
new_index = ['a', 'c', 'd']
new_columns = ['A', 'C']
df_reindexed = df.reindex(index=new_index, columns=new_columns, fill_value=0)
print("\nReindexed DataFrame:")
print(df_reindexed)

530-6-3. Output

# 530. pandas.DataFrame.reindex method
# Original DataFrame:
#    A  B
# a  1  4
# b  2  5
# c  3  6
# 
# Reindexed DataFrame:
#    A  C
# a  1  0
# c  3  0
# d  0  0