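Python Cool Libraries Journey - Third-Party Library Pandas (117)

I. Usage in Detail

516. pandas.DataFrame.add_suffix method

516-1. Syntax

516-2. Parameters

516-3. Functionality

516-4. Return value

516-5. Notes

516-6. Usage

516-6-1. Data preparation

516-6-2. Code example

516-6-3. Output

517. pandas.DataFrame.align method

517-1. Syntax

517-2. Parameters

517-3. Functionality

517-4. Return value

517-5. Notes

517-6. Usage

517-6-1. Data preparation

517-6-2. Code example

517-6-3. Output

518. pandas.DataFrame.at_time method

518-1. Syntax

518-2. Parameters

518-3. Functionality

518-4. Return value

518-5. Notes

518-6. Usage

518-6-1. Data preparation

518-6-2. Code example

518-6-3. Output

519. pandas.DataFrame.between_time method

519-1. Syntax

519-2. Parameters

519-3. Functionality

519-4. Return value

519-5. Notes

519-6. Usage

519-6-1. Data preparation

519-6-2. Code example

519-6-3. Output

520. pandas.DataFrame.drop method

520-1. Syntax

520-2. Parameters

520-3. Functionality

520-4. Return value

520-5. Notes

520-6. Usage

520-6-1. Data preparation

520-6-2. Code example

520-6-3. Output

II. Recommended Reading

1. Python Foundation Journey

2. Python Functions Journey

3. Python Algorithms Journey

4. Python Magic Journey

5. Blog homepage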

I. Usage in Detail

516. pandas.DataFrame.add_suffix method

516-1. Syntax

# 516. pandas.DataFrame.add_suffix method
pandas.DataFrame.add_suffix(suffix, axis=None)
Suffix labels with string suffix.

For Series, the row labels are suffixed. For DataFrame, the column labels are suffixed.

Parameters:
suffix : str
The string to add after each label.

axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to add suffix on

New in version 2.0.0.

Returns:
Series or DataFrame
New Series or DataFrame with updated labels.

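516-2. Parameters

516-2-1. suffix (required): a string; the suffix to append after each column or row label.

516-2-2. axis (optional, default None): the axis whose labels receive the suffix:

None (default): suffix the column labels (for a DataFrame this is equivalent to axis=1).
axis=0 or 'index': suffix the row labels (the index).
axis=1 or 'columns': suffix the column labels.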
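The sketch below shows how axis picks the labels that receive the suffix. It assumes pandas 2.0 or later (where axis was added) and uses the string aliases 'index' and 'columns' in place of 0 and 1; the Series at the end illustrates the documented rule that a Series has its row labels suffixed. The data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'y'])

# axis=None (default) and axis='columns' both suffix the column labels
by_default = df.add_suffix('_c')
by_columns = df.add_suffix('_c', axis='columns')

# axis='index' suffixes the row labels instead; columns stay as they are
by_index = df.add_suffix('_r', axis='index')

# For a Series, the row labels (index) are suffixed
s = pd.Series([10, 20], index=['x', 'y'])
s_suffixed = s.add_suffix('_r')

print(list(by_default.columns))  # ['A_c', 'B_c']
print(list(by_index.index))      # ['x_r', 'y_r']
print(list(s_suffixed.index))    # ['x_r', 'y_r']
```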

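516-3. Functionality

Adds a suffix to the row or column labels of a DataFrame.

516-4. Return value

Returns a new DataFrame holding the same data as the original; only the labels on the chosen axis have the suffix appended. The original DataFrame is left unchanged.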
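A quick check of the return value: the original DataFrame keeps its labels, and the result matches an equivalent rename with a callable. The rename comparison only illustrates the relabeling; it is not meant as a description of how pandas implements add_suffix internally.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

result = df.add_suffix('_col')

# add_suffix returns a new object; the original labels are untouched
print(list(df.columns))      # ['A', 'B']
print(list(result.columns))  # ['A_col', 'B_col']

# The same relabeling expressed with rename and a callable
manual = df.rename(columns=lambda c: f"{c}_col")
print(result.equals(manual))  # True
```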

516-5. Notes

None.

516-6. Usage

516-6-1. Data preparation

None.

516-6-2. Code example

# 516. pandas.DataFrame.add_suffix method
import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}, index=['x', 'y', 'z'])
print(df, '\n')
# Add a suffix to the column labels
df_with_suffix = df.add_suffix('_col')
print(df_with_suffix, '\n')
# Add a suffix to the row labels (index)
df_with_suffix = df.add_suffix('_row', axis=0)
print(df_with_suffix)

516-6-3. Output

# 516. pandas.DataFrame.add_suffix method
#    A  B
# x  1  4
# y  2  5
# z  3  6 
# 
#    A_col  B_col
# x      1      4
# y      2      5
# z      3      6 
# 
#        A  B
# x_row  1  4
# y_row  2  5
# z_row  3  6

517. pandas.DataFrame.align method

517-1. Syntax

# 517. pandas.DataFrame.align method
pandas.DataFrame.align(other, join='outer', axis=None, level=None, copy=None, fill_value=None, method=_NoDefault.no_default, limit=_NoDefault.no_default, fill_axis=_NoDefault.no_default, broadcast_axis=_NoDefault.no_default)
Align two objects on their axes with the specified join method.

Join method is specified for each axis Index.

Parameters:
other : DataFrame or Series
join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
Type of alignment to be performed.

left: use only keys from left frame, preserve key order.

right: use only keys from right frame, preserve key order.

outer: use union of keys from both frames, sort keys lexicographically.

inner: use intersection of keys from both frames, preserve the order of the left keys.

axis : allowed axis of the other object, default None
Align on index (0), columns (1), or both (None).

level : int or level name, default None
Broadcast across a level, matching Index values on the passed MultiIndex level.

copy : bool, default True
Always returns new objects. If copy=False and no reindexing is required then original objects are returned.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

fill_value : scalar, default np.nan
Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series:

pad / ffill: propagate last valid observation forward to next valid.

backfill / bfill: use NEXT valid observation to fill gap.

Deprecated since version 2.1.

limit : int, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

Deprecated since version 2.1.

fill_axis : {0 or ‘index’} for Series, {0 or ‘index’, 1 or ‘columns’} for DataFrame, default 0
Filling axis, method and limit.

Deprecated since version 2.1.

broadcast_axis : {0 or ‘index’} for Series, {0 or ‘index’, 1 or ‘columns’} for DataFrame, default None
Broadcast values along this axis, if aligning two objects of different dimensions.

Deprecated since version 2.1.

Returns:
tuple of (Series/DataFrame, type of other)
Aligned objects.

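517-2. Parameters

517-2-1. other (required): DataFrame or Series; the object to align with the calling DataFrame.

517-2-2. join (optional, default 'outer'): string; the join method. Options:

'outer': use the union of the labels.
'inner': use the intersection of the labels.
'left': use the labels of the calling DataFrame.
'right': use the labels of other.

517-2-3. axis (optional, default None): integer or string; the axis to align on. Options:

None: align both the row index and the column labels.
0 or 'index': align the row index only.
1 or 'columns': align the column labels only.

517-2-4. level (optional, default None): integer or level name; for a MultiIndex, the level whose values are matched against the other object's labels (values are broadcast across that level).

517-2-5. copy (optional, default None): boolean; if True, always copy the data, even when the labels are already aligned; if False, avoid unnecessary copies where possible. Per the pandas documentation, this keyword is ignored once Copy-on-Write becomes the default in pandas 3.0.

517-2-6. fill_value (optional, default None): scalar; the value used for missing entries introduced by the alignment. None means NaN.

517-2-7. method (optional): string; the fill method (deprecated since pandas 2.1). Options:

'ffill': forward fill.
'bfill': backward fill.

517-2-8. limit (optional): integer; the maximum number of consecutive NaN values to forward- or backward-fill (deprecated since pandas 2.1).

517-2-9. fill_axis (optional): integer or string; the axis along which method and limit fill (deprecated since pandas 2.1). Options:

0 or 'index': fill along the row axis.
1 or 'columns': fill along the column axis.

517-2-10. broadcast_axis (optional): integer or string; the axis along which values are broadcast when aligning objects of different dimensions (deprecated since pandas 2.1).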
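Because method, limit and fill_axis are deprecated since pandas 2.1, the sketch below assumes a newer pandas and gets forward-filled results by aligning first and then calling ffill on each returned object. The frames are illustrative data; filling after the join is one possible replacement, and it may not reproduce the deprecated keywords in every edge case.

```python
import pandas as pd

left = pd.DataFrame({'v': [1.0, 2.0, 3.0]}, index=[1, 3, 5])
right = pd.DataFrame({'w': [10.0, 20.0]}, index=[2, 4])

# Align on the row index only, without the deprecated method= keyword
left_aligned, right_aligned = left.align(right, join='outer', axis=0)

# Fill the holes introduced by the alignment explicitly
left_filled = left_aligned.ffill()
right_filled = right_aligned.ffill()

print(left_filled['v'].tolist())   # [1.0, 1.0, 2.0, 2.0, 3.0]
print(right_filled['w'].tolist())  # [nan, 10.0, 10.0, 20.0, 20.0]
```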

517-3. Functionality

Aligns two objects on their axes so that, after the chosen join, they share the same row index and/or column labels.
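To make the join options concrete, the following sketch compares the row labels each option produces, then shows that axis=0 leaves each frame's columns alone while the default axis=None aligns both axes. The data is made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])
right = pd.DataFrame({'B': [10, 20, 30]}, index=['b', 'c', 'd'])

# Row labels produced by each join option (rows only, via axis=0)
row_labels = {}
for how in ['outer', 'inner', 'left', 'right']:
    l_al, r_al = left.align(right, join=how, axis=0)
    row_labels[how] = list(l_al.index)
    print(how, row_labels[how])
# outer ['a', 'b', 'c', 'd']
# inner ['b', 'c']
# left ['a', 'b', 'c']
# right ['b', 'c', 'd']

# axis=0: each frame keeps its own columns
l_al, r_al = left.align(right, join='outer', axis=0)
rows_only_cols = (list(l_al.columns), list(r_al.columns))
print(rows_only_cols)  # (['A'], ['B'])

# axis=None: the columns are aligned as well
l_al, r_al = left.align(right, join='outer')
both_axes_cols = (list(l_al.columns), list(r_al.columns))
print(both_axes_cols)  # (['A', 'B'], ['A', 'B'])
```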

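517-4. Return value

Returns a tuple of two aligned objects: the first comes from the calling DataFrame, the second from other (a DataFrame or a Series, matching the type of other). Along the aligned axes, both have the same labels.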
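When other is a Series, the returned tuple holds a DataFrame and a Series. This minimal sketch assumes the Series index should be matched against the row labels, so it passes axis=0 explicitly; pandas requires an axis when aligning a DataFrame with a Series. The data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
s = pd.Series([100, 200], index=['y', 'w'])

# Match the Series index against the DataFrame's row labels, keep common labels
left, right = df.align(s, join='inner', axis=0)

print(type(left).__name__, type(right).__name__)  # DataFrame Series
print(list(left.index), list(right.index))        # ['y'] ['y']
```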

517-5. Notes

None.

517-6. Usage

517-6-1. Data preparation

None.

517-6-2. Code example

# 517. pandas.DataFrame.align method
import pandas as pd
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}, index=['x', 'y', 'z'])
df2 = pd.DataFrame({
    'B': [7, 8],
    'C': [9, 10]
}, index=['y', 'z'])
df1_aligned, df2_aligned = df1.align(df2, join='outer', fill_value=0)
print(df1_aligned)
print(df2_aligned)

517-6-3. Output

# 517. pandas.DataFrame.align method
#    A  B  C
# x  1  4  0
# y  2  5  0
# z  3  6  0
#    A  B   C
# x  0  0   0
# y  0  7   9
# z  0  8  10

518. pandas.DataFrame.at_time method

518-1. Syntax

# 518. pandas.DataFrame.at_time method
pandas.DataFrame.at_time(time, asof=False, axis=None)
Select values at particular time of day (e.g., 9:30AM).

Parameters:
time
datetime.time or str
The values to select.

axis
{0 or ‘index’, 1 or ‘columns’}, default 0
For Series this parameter is unused and defaults to 0.

Returns:
Series or DataFrame
Raises:
TypeError
If the index is not a DatetimeIndex

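518-2. Parameters

518-2-1. time (required): string or datetime.time; the time of day to select (for example, '14:00').

518-2-2. asof (optional, default False): boolean. pandas does not currently support this parameter (it is absent from the documented parameters above), and passing asof=True raises NotImplementedError, so leave it at the default.

518-2-3. axis (optional, default None): the axis to select along (0 for rows, 1 for columns); None is treated as 0. The labels on that axis must form a DatetimeIndex, otherwise TypeError is raised.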
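The sketch below checks two details from the parameter list and the documented Raises section: a string and a datetime.time object select the same rows, and a DataFrame without a DatetimeIndex raises TypeError. The data is illustrative.

```python
import datetime

import pandas as pd

idx = pd.date_range('2024-01-01 09:00', periods=3, freq='30min')
df = pd.DataFrame({'value': [1, 2, 3]}, index=idx)

# A string and a datetime.time object select the same rows
by_str = df.at_time('09:30')
by_obj = df.at_time(datetime.time(9, 30))
print(by_str.equals(by_obj))  # True

# A DataFrame without a DatetimeIndex (here a default RangeIndex) is rejected
raised = False
try:
    pd.DataFrame({'value': [1, 2]}).at_time('09:30')
except TypeError:
    raised = True
print(raised)  # True
```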

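518-3. Functionality

Selects the rows of a DataFrame with a DatetimeIndex whose time of day equals the given time, on every date in the index. It is mainly used in time-series analysis and processing.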
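Since at_time compares only the time of day, it picks matching rows from every date in the index. A minimal sketch with two days of 30-minute data (illustrative values):

```python
import pandas as pd

# Every 30 minutes from 09:00 on Jan 1 to 10:00 on Jan 2
idx = pd.date_range('2024-01-01 09:00', '2024-01-02 10:00', freq='30min')
df = pd.DataFrame({'value': range(len(idx))}, index=idx)

# at_time matches 09:30 on both dates, not a single timestamp
result = df.at_time('09:30')
print(result)
```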

518-4. Return value

Returns a new DataFrame containing only the rows whose time of day matches the specified time; if no rows match, the result is empty.

518-5. Notes

None.

518-6. Usage

518-6-1. Data preparation

None.

518-6-2. Code example

# 518. pandas.DataFrame.at_time method
import pandas as pd
data = {
    'value': [1, 2, 3, 4],
}
index = pd.date_range('2024-01-01 10:00', periods=4, freq='h')
df = pd.DataFrame(data, index=index)
result = df.at_time('11:00')
print(result)

518-6-3. Output

# 518. pandas.DataFrame.at_time method
#                      value
# 2024-01-01 11:00:00      2

519. pandas.DataFrame.between_time method

519-1. Syntax

# 519. pandas.DataFrame.between_time method
pandas.DataFrame.between_time(start_time, end_time, inclusive='both', axis=None)
Select values between particular times of the day (e.g., 9:00-9:30 AM).

By setting start_time to be later than end_time, you can get the times that are not between the two times.

Parameters:
start_time
datetime.time or str
Initial time as a time filter limit.

end_time
datetime.time or str
End time as a time filter limit.

inclusive
{“both”, “neither”, “left”, “right”}, default “both”
Include boundaries; whether to set each bound as closed or open.

axis
{0 or ‘index’, 1 or ‘columns’}, default 0
Determine range time on index or columns value. For Series this parameter is unused and defaults to 0.

Returns:
Series or DataFrame
Data from the original object filtered to the specified dates range.

Raises:
TypeError
If the index is not a DatetimeIndex.

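519-2. Parameters

519-2-1. start_time (required): string or datetime.time; the start of the time-of-day window, compared with the time component of the DataFrame's DatetimeIndex.

519-2-2. end_time (required): string or datetime.time; the end of the time-of-day window, compared the same way.

519-2-3. inclusive (optional, default 'both'): {'both', 'neither', 'left', 'right'}. Options:

'both': include rows at both the start and the end time.
'neither': exclude rows at both boundary times.
'left': include the start time but exclude the end time.
'right': exclude the start time but include the end time.

519-2-4. axis (optional, default None): {0 or 'index', 1 or 'columns'}; the axis whose DatetimeIndex is filtered. None is treated as 0 (rows), which is what most DataFrame use cases need.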
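A sketch comparing the four inclusive options on hourly data from 08:00 to 12:00, with the window set to 09:00-11:00. It assumes a pandas version recent enough to accept the inclusive keyword; the data is illustrative.

```python
import pandas as pd

idx = pd.date_range('2024-01-01 08:00', periods=5, freq='h')
df = pd.DataFrame({'value': [1, 2, 3, 4, 5]}, index=idx)

# Hours kept by each inclusive option for the 09:00-11:00 window
kept = {}
for option in ['both', 'neither', 'left', 'right']:
    kept[option] = df.between_time('09:00', '11:00', inclusive=option).index.hour.tolist()
    print(option, kept[option])
# both [9, 10, 11]
# neither [10]
# left [9, 10]
# right [10, 11]
```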

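519-3. Functionality

Filters the rows of a DataFrame whose time of day falls between a start time and an end time. The inclusive parameter controls whether rows exactly at the boundary times are kept.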
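As the documentation notes, setting start_time later than end_time selects the times outside that gap, which in practice gives a window that wraps past midnight. A minimal sketch with one day of hourly data (illustrative values):

```python
import pandas as pd

idx = pd.date_range('2024-01-01 00:00', periods=24, freq='h')
df = pd.DataFrame({'value': range(24)}, index=idx)

# start_time > end_time: keep 22:00-23:59 and 00:00-02:00, in index order
night = df.between_time('22:00', '02:00')
print(night.index.hour.tolist())  # [0, 1, 2, 22, 23]
```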

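519-4. Return value

Returns a new DataFrame containing only the rows inside the time window. With the default inclusive='both' this is the closed interval [start_time, end_time]; if no rows qualify, an empty DataFrame is returned.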

519-5. Notes

None.

519-6. Usage

519-6-1. Data preparation

None.

519-6-2. Code example

# 519. pandas.DataFrame.between_time method
import pandas as pd
data = {
    'value': [1, 2, 3, 4, 5],
}
index = pd.date_range('2024-01-01 08:00', periods=5, freq='h')
df = pd.DataFrame(data, index=index)
result = df.between_time('09:00', '11:00')
print(result)

519-6-3. Output

# 519. pandas.DataFrame.between_time method
#                      value
# 2024-01-01 09:00:00      2
# 2024-01-01 10:00:00      3
# 2024-01-01 11:00:00      4

520. pandas.DataFrame.drop method

520-1. Syntax

# 520. pandas.DataFrame.drop method
pandas.DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by directly specifying index or column names. When using a multi-index, labels on different levels can be removed by specifying the level. See the user guide for more information about the now unused levels.

Parameters:
labels
single label or list-like
Index or column labels to drop. A tuple will be used as a single label and not treated as a list-like.

axis
{0 or ‘index’, 1 or ‘columns’}, default 0
Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).

index
single label or list-like
Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).

columns
single label or list-like
Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).

level
int or level name, optional
For MultiIndex, level from which the labels will be removed.

inplace
bool, default False
If False, return a copy. Otherwise, do operation in place and return None.

errors
{‘ignore’, ‘raise’}, default ‘raise’
If ‘ignore’, suppress error and only existing labels are dropped.

Returns:
DataFrame or None
Returns DataFrame or None DataFrame with the specified index or column labels removed or None if inplace=True.

Raises:
KeyError
If any of the labels is not found in the selected axis.

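520-2. Parameters

520-2-1. labels (optional, default None): a single label or a list of labels to drop from the rows or columns. It is used together with axis and cannot be combined with index or columns.

520-2-2. axis (optional, default 0): {0 or 'index', 1 or 'columns'}; whether labels are dropped from the rows (0 or 'index') or the columns (1 or 'columns'). It only applies to labels; if omitted, labels are dropped from the rows.

520-2-3. index (optional, default None): a single label or a list of labels; the row labels to drop. Mutually exclusive with labels.

520-2-4. columns (optional, default None): a single label or a list of labels; the column labels to drop. Mutually exclusive with labels.

520-2-5. level (optional, default None): integer or level name; with a MultiIndex, the level from which the labels are removed.

520-2-6. inplace (optional, default False): boolean; if True, drop directly from the original DataFrame and return None instead of a new DataFrame.

520-2-7. errors (optional, default 'raise'): {'raise', 'ignore'}; how to handle labels that do not exist. With 'raise', dropping a missing label raises KeyError; with 'ignore', missing labels are skipped and only existing labels are dropped.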
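A sketch of the parameter rules above, using illustrative data: labels with axis and the columns keyword are two spellings of the same drop, index and columns can be combined in one call, labels cannot be mixed with index or columns (pandas raises ValueError), and a missing label raises KeyError under the default errors='raise'.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

# labels + axis and columns= are two spellings of the same column drop
via_axis = df.drop(labels='B', axis=1)
via_columns = df.drop(columns='B')
print(via_axis.equals(via_columns))  # True

# index= and columns= can be combined to drop rows and columns in one call
both = df.drop(index='x', columns='A')
print(both)

# labels cannot be mixed with index=/columns=
mixed_raised = False
try:
    df.drop(labels='x', index='y')
except ValueError:
    mixed_raised = True

# With the default errors='raise', a missing label raises KeyError
missing_raised = False
try:
    df.drop(index='missing')
except KeyError:
    missing_raised = True

print(mixed_raised, missing_raised)  # True True
```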

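520-3. Functionality

Drops specified rows or columns. Its flexible parameters let you choose what to drop: by labels plus axis, or directly through index and columns.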
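One case the flexible parameters cover is a MultiIndex: with level, a label is dropped from one level wherever it appears. A minimal sketch with hypothetical 'outer'/'inner' level names and illustrative values:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([['a', 'b'], ['x', 'y']], names=['outer', 'inner'])
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=idx)

# Drop every row whose 'inner' level equals 'y', regardless of the outer level
result = df.drop(index='y', level='inner')
print(result)
```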

520-4. Return value

Returns a new DataFrame with the specified rows or columns removed. If inplace=True, the return value is None and the original DataFrame is modified.

520-5. Notes

None.

520-6. Usage

520-6-1. Data preparation

None.

520-6-2. Code example

# 520. pandas.DataFrame.drop method
import pandas as pd
# Create a sample DataFrame
data = {
    'Name': ['Myelsa', 'Bryce', 'Jimmy', 'Lucy'],
    'Age': [43, 6, 15, 43],
    'City': ['New York', 'Los Angeles', 'Chicago', 'San Francisco']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Drop one row (e.g., Bryce's data)
df_dropped_row = df.drop(labels=1)  # 1 is Bryce's index label
print("\nDataFrame after dropping one row:")
print(df_dropped_row)
# Drop one column (e.g., the Age column)
df_dropped_column = df.drop(columns='Age')
print("\nDataFrame after dropping one column:")
print(df_dropped_column)
# Drop in place on the original DataFrame, removing Jimmy's data
df.drop(labels=2, inplace=True)  # 2 is Jimmy's index label
print("\nResult after dropping one row in place:")
print(df)
# Try to drop a nonexistent label; errors='ignore' prevents a KeyError
df_dropped_nonexistent = df.drop(labels=10, errors='ignore')  # label 10 does not exist
print("\nDataFrame after trying to drop a nonexistent row:")
print(df_dropped_nonexistent)

520-6-3. Output

# 520. pandas.DataFrame.drop method
# Original DataFrame:
#      Name  Age           City
# 0  Myelsa   43       New York
# 1   Bryce    6    Los Angeles
# 2   Jimmy   15        Chicago
# 3    Lucy   43  San Francisco
# 
# DataFrame after dropping one row:
#      Name  Age           City
# 0  Myelsa   43       New York
# 2   Jimmy   15        Chicago
# 3    Lucy   43  San Francisco
# 
# DataFrame after dropping one column:
#      Name           City
# 0  Myelsa       New York
# 1   Bryce    Los Angeles
# 2   Jimmy        Chicago
# 3    Lucy  San Francisco
# 
# Result after dropping one row in place:
#      Name  Age           City
# 0  Myelsa   43       New York
# 1   Bryce    6    Los Angeles
# 3    Lucy   43  San Francisco
# 
# DataFrame after trying to drop a nonexistent row:
#      Name  Age           City
# 0  Myelsa   43       New York
# 1   Bryce    6    Los Angeles
# 3    Lucy   43  San Francisco