A Tour of Cool Python Libraries: The Third-Party Library Pandas (120)

I. Usage in Detail

531. The pandas.DataFrame.reindex_like method

531-1. Syntax

# 531. The pandas.DataFrame.reindex_like method
pandas.DataFrame.reindex_like(other, method=None, copy=None, limit=None, tolerance=None)
Return an object with matching indices as other object.

Conform the object to the same index on all axes. Optional filling logic, placing NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters:
other : Object of the same data type
Its row and column indices are used to define the new indices of this object.

method : {None, ‘backfill’/‘bfill’, ‘pad’/‘ffill’, ‘nearest’}
Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

None (default): don’t fill gaps

pad / ffill: propagate last valid observation forward to next valid

backfill / bfill: use next valid observation to fill gap

nearest: use nearest valid observations to fill gap.

copy : bool, default True
Return a new object, even if the passed indexes are the same.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements by enabling Copy-on-Write: pd.options.mode.copy_on_write = True

limit : int, default None
Maximum number of consecutive labels to fill for inexact matches.

tolerance : optional
Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

Returns:
Series or DataFrame
Same type as caller, but with changed indices on each axis.

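531-2. Parameters

531-2-1. other (required): another object of the same type (here, a DataFrame) whose row index and column labels define the new index and columns of the calling DataFrame.

531-2-2. method (optional, default None): string naming the fill method for the holes (NaN) that reindexing creates; one of None, 'backfill'/'bfill', 'pad'/'ffill', or 'nearest'. It applies only when the index is monotonically increasing or decreasing.

531-2-3. copy (optional, default None): boolean. By default a new object is returned even when the new index equals the old one; with copy=False and an equivalent index, the original object may be returned. Under Copy-on-Write (the default from pandas 3.0) the keyword is ignored.

531-2-4. limit (optional, default None): integer, the maximum number of consecutive labels to fill for inexact matches, which prevents over-filling.

531-2-5. tolerance (optional, default None): scalar or list-like, the maximum distance allowed between an original label and a new label for an inexact match. A list-like must have the same size as the index and a dtype that matches it.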
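
To make the method parameter concrete, here is a minimal sketch that forward-fills while reindexing onto a target with a sorted integer index. The frames src and target and their values are invented for this illustration and do not come from the original article.

```python
import pandas as pd

# Hypothetical data: src has a sorted (monotonic) integer index.
src = pd.DataFrame({"v": [10.0, 20.0, 30.0]}, index=[0, 5, 10])
# target only supplies the index and columns to conform to.
target = pd.DataFrame({"v": [0.0, 0.0, 0.0, 0.0]}, index=[0, 2, 6, 12])

# Default (method=None): labels missing from src become NaN.
plain = src.reindex_like(target)

# method="ffill": each new label takes the value of the last src label
# at or before it (2 -> label 0, 6 -> label 5, 12 -> label 10).
filled = src.reindex_like(target, method="ffill")

print(plain)
print(filled)
```

Only label 0 exists in both frames, so plain keeps a single real value, while filled has no gaps because every target label has an earlier source label to copy from.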

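531-3. Purpose

Makes the row index and column labels of one DataFrame match those of another, which simplifies later alignment, merging, and comparison.

531-4. Return value

Returns a DataFrame (the same type as the caller) whose row index and columns match those of other. Labels that do not exist in the original DataFrame are filled with NaN, or with values propagated according to method.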
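
One way to read the return value is that reindex_like behaves like reindex called with the other object's index and columns. The sketch below checks that equivalence on small made-up frames; df1 and df2 here are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"C": [7, 8, 9], "A": [13, 14, 15]}, index=["b", "c", "d"])

# Conform df1 to df2 in one call ...
via_like = df1.reindex_like(df2)
# ... or spell out the index and columns explicitly.
via_reindex = df1.reindex(index=df2.index, columns=df2.columns)

# equals() treats NaN values in the same positions as equal.
same = via_like.equals(via_reindex)
print(same)
```

Column B disappears because df2 has no such column, and column C is all NaN because df1 never had it.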

531-5. Notes

None

531-6. Usage

531-6-1. Data preparation

None

531-6-2. Code example

# 531. The pandas.DataFrame.reindex_like method
import pandas as pd
# Create two example DataFrames
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({
    'C': [7, 8, 9],
    'D': [10, 11, 12],
    'A': [13, 14, 15]
}, index=['b', 'c', 'd'])
# Conform df1 to the index and columns of df2
df_reindexed = df1.reindex_like(df2)
print("Original DataFrame df1:")
print(df1)
print("\nTarget DataFrame df2:")
print(df2)
print("\nReindexed DataFrame:")
print(df_reindexed)

531-6-3. Output

# 531. The pandas.DataFrame.reindex_like method
# Original DataFrame df1:
#    A  B
# a  1  4
# b  2  5
# c  3  6
#
# Target DataFrame df2:
#    C   D   A
# b  7  10  13
# c  8  11  14
# d  9  12  15
#
# Reindexed DataFrame:
#     C   D    A
# b NaN NaN  2.0
# c NaN NaN  3.0
# d NaN NaN  NaN

532. The pandas.DataFrame.rename method

532-1. Syntax

# 532. The pandas.DataFrame.rename method
pandas.DataFrame.rename(mapper=None, *, index=None, columns=None, axis=None, copy=None, inplace=False, level=None, errors='ignore')
Rename columns or index labels.

Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.

See the user guide for more.

Parameters:
mapper : dict-like or function
Dict-like or function transformations to apply to that axis’ values. Use either mapper and axis to specify the axis to target with mapper, or index and columns.

index : dict-like or function
Alternative to specifying axis (mapper, axis=0 is equivalent to index=mapper).

columns : dict-like or function
Alternative to specifying axis (mapper, axis=1 is equivalent to columns=mapper).

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Axis to target with mapper. Can be either the axis name (‘index’, ‘columns’) or number (0, 1). The default is ‘index’.

copy : bool, default True
Also copy underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements by enabling Copy-on-Write: pd.options.mode.copy_on_write = True

inplace : bool, default False
Whether to modify the DataFrame rather than creating a new one. If True then value of copy is ignored.

level : int or level name, default None
In case of a MultiIndex, only rename labels in the specified level.

errors : {‘ignore’, ‘raise’}, default ‘ignore’
If ‘raise’, raise a KeyError when a dict-like mapper, index, or columns contains labels that are not present in the Index being transformed. If ‘ignore’, existing keys will be renamed and extra keys will be ignored.

Returns:
DataFrame or None
DataFrame with the renamed axis labels or None if inplace=True.

Raises:
KeyError
If any of the labels is not found in the selected axis and errors=‘raise’.

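532-2. Parameters

532-2-1. mapper (optional, default None): dict-like or function that maps existing row labels or column names to new ones. With a dict, the keys are the old names and the values are the new names; a function is applied to every label. Combine it with axis to choose the target axis.

532-2-2. index (optional, default None): dict-like or function, like mapper but applied only to the row index (equivalent to mapper with axis=0).

532-2-3. columns (optional, default None): dict-like or function, like index but applied to the column names (equivalent to mapper with axis=1).

532-2-4. axis (optional, default None): integer or string selecting the axis that mapper targets: 0 or 'index' renames row labels, 1 or 'columns' renames column names. When mapper is given without axis, the row index is targeted.

532-2-5. copy (optional, default None): boolean controlling whether the underlying data is also copied. Its value is ignored when inplace=True, and under Copy-on-Write (the default from pandas 3.0) the keyword is ignored altogether.

532-2-6. inplace (optional, default False): boolean. If True, the original DataFrame is modified and None is returned instead of a new object.

532-2-7. level (optional, default None): integer or level name. For a MultiIndex on rows or columns, only labels in the specified level are renamed.

532-2-8. errors (optional, default 'ignore'): string controlling what happens when a dict-like mapper, index, or columns contains labels that are not present on the axis. With 'ignore', existing keys are renamed and extra keys are skipped; with 'raise', a KeyError is raised.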
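
The following sketch exercises a function mapper, the errors parameter, and the level parameter described above. The frames df and wide and the unknown key "Z" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# A function mapper is applied to every label on the chosen axis.
lowered = df.rename(str.lower, axis="columns")

# Default errors="ignore": the unknown key "Z" is skipped silently.
ignored = df.rename(columns={"A": "a", "Z": "z"})

# errors="raise": the same unknown key triggers a KeyError.
try:
    df.rename(columns={"A": "a", "Z": "z"}, errors="raise")
    raised = False
except KeyError:
    raised = True

# level: on MultiIndex columns, rename labels in one level only.
wide = pd.DataFrame(
    [[1, 2]],
    columns=pd.MultiIndex.from_tuples([("x", "a"), ("x", "b")]),
)
level_renamed = wide.rename(columns={"a": "A"}, level=1)

print(list(lowered.columns), list(ignored.columns), raised)
print(list(level_renamed.columns))
```

Using errors='raise' is a cheap guard against typos in a rename dict, since the default silently leaves misspelled keys unapplied.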

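532-3. Purpose

Changes row index labels or column labels. Its main capabilities are:

Renaming through a dict or function mapping.
Renaming row labels and column labels independently.
Returning a new object or modifying the original DataFrame in place.

532-4. Return value

With inplace=False, returns a new DataFrame whose index or columns have been renamed. With inplace=True, the original DataFrame is modified directly and None is returned.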
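
A short sketch of the two return behaviors, using a throwaway single-column frame that is not part of the original article:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2]})

# inplace=False (default): a new DataFrame comes back, df is untouched.
new_df = df.rename(columns={"A": "alpha"})
columns_after_copy = list(df.columns)

# inplace=True: df itself is modified and the call returns None.
result = df.rename(columns={"A": "beta"}, inplace=True)

print(list(new_df.columns), columns_after_copy, result, list(df.columns))
```

Because the in-place call returns None, writing df = df.rename(..., inplace=True) would discard the DataFrame.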

532-5. Notes

None

532-6. Usage

532-6-1. Data preparation

None

532-6-2. Code example

# 532. The pandas.DataFrame.rename method
import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
# Rename columns with a dict
df_renamed = df.rename(columns={'A': 'alpha', 'B': 'beta'})
print("Original DataFrame:")
print(df)
print("\nDataFrame after renaming columns:")
print(df_renamed)
# Rename row index labels with a dict
df_renamed_rows = df.rename(index={0: 'first', 1: 'second', 2: 'third'})
print("\nDataFrame after renaming the row index:")
print(df_renamed_rows)

532-6-3. Output

# 532. The pandas.DataFrame.rename method
# Original DataFrame:
#    A  B  C
# 0  1  4  7
# 1  2  5  8
# 2  3  6  9
#
# DataFrame after renaming columns:
#    alpha  beta  C
# 0      1     4  7
# 1      2     5  8
# 2      3     6  9
#
# DataFrame after renaming the row index:
#         A  B  C
# first   1  4  7
# second  2  5  8
# third   3  6  9

533. The pandas.DataFrame.rename_axis method

533-1. Syntax

# 533. The pandas.DataFrame.rename_axis method
pandas.DataFrame.rename_axis(mapper=_NoDefault.no_default, *, index=_NoDefault.no_default, columns=_NoDefault.no_default, axis=0, copy=None, inplace=False)
Set the name of the axis for the index or columns.

Parameters:
mapper : scalar, list-like, optional
Value to set the axis name attribute.

index, columns : scalar, list-like, dict-like or function, optional
A scalar, list-like, dict-like or function transformation to apply to that axis’ values. Note that the columns parameter is not allowed if the object is a Series; it applies only to DataFrame objects.

Use either mapper and axis to specify the axis to target with mapper, or index and/or columns.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The axis to rename. For Series this parameter is unused and defaults to 0.

copy : bool, default None
Also copy underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements by enabling Copy-on-Write: pd.options.mode.copy_on_write = True

inplace : bool, default False
Modifies the object directly, instead of creating a new Series or DataFrame.

Returns:
Series, DataFrame, or None
The same type as the caller or None if inplace=True.

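533-2. Parameters

533-2-1. mapper (optional): scalar or list-like, the value to set as the name attribute of the axis selected by axis. Use either mapper together with axis, or index and/or columns.

533-2-2. index (optional): scalar, list-like, dict-like, or function used to set or transform the name(s) of the row index. A scalar or list sets the names directly; a dict or function maps the existing names to new ones, which is mainly useful for the level names of a MultiIndex.

533-2-3. columns (optional): same as index, but applied to the name(s) of the column axis. It is not allowed when the object is a Series.

533-2-4. axis (optional, default 0): integer or string, the axis whose name mapper sets: 0 or 'index' for rows, 1 or 'columns' for columns. It is only relevant together with mapper.

533-2-5. copy (optional, default None): boolean controlling whether the underlying data is also copied. It does not switch the call to in-place modification (that is what inplace does), and under Copy-on-Write (the default from pandas 3.0) the keyword is ignored.

533-2-6. inplace (optional, default False): boolean. If True, the original object is modified directly and None is returned; if False, a new object with the renamed axis is returned.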
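
The sketch below illustrates how the index and columns parameters set axis names rather than labels, and how a dict or function transforms existing MultiIndex level names. The frames and the names "rows", "cols", "letter", "number", and "char" are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Name both axes at once; the labels A/B and 0/1 are unchanged.
named = df.rename_axis(index="rows", columns="cols")

# Hypothetical MultiIndex with named levels.
mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["letter", "number"])
multi = pd.DataFrame({"v": [10, 20]}, index=mi)

# A dict maps existing level names to new ones; unmapped names are kept.
by_dict = multi.rename_axis(index={"letter": "char"})
# A function is applied to every level name.
by_func = multi.rename_axis(index=str.upper)

print(named.index.name, named.columns.name, list(named.columns))
print(list(by_dict.index.names), list(by_func.index.names))
```

To change the labels themselves (for example A to X), rename is the method to reach for, not rename_axis.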

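533-3. Purpose

Sets the name of the row index or of the column axis (index.name, columns.name, or the level names of a MultiIndex). It does not change the row or column labels themselves; use rename for that. Naming the axes helps when preparing data for display or analysis, for example so that reset_index produces a meaningfully named column.

533-4. Return value

With inplace=False (the default), returns a new object of the same type as the caller with the axis name set. With inplace=True, the original DataFrame is modified directly and None is returned.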

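533-5. Notes

None

533-6. Usage

533-6-1. Data preparation

None

533-6-2. Code example

# 533. The pandas.DataFrame.rename_axis method
import pandas as pd
# Create a simple DataFrame
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
# Set the name of the row index to 'rows'
df_renamed_index = df.rename_axis('rows')
print(df_renamed_index, end='\n\n')
# rename_axis(None, axis='columns') sets the column axis name to None (already the case here);
# the column labels themselves are changed by rename, not by rename_axis
df_renamed_columns = df.rename_axis(None, axis='columns').rename(columns={'A': 'X', 'B': 'Y'})
print(df_renamed_columns)

533-6-3. Output

# 533. The pandas.DataFrame.rename_axis method
#       A  B
# rows      
# 0     1  3
# 1     2  4
#
#    X  Y
# 0  1  3
# 1  2  4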

534. The pandas.DataFrame.reset_index method

534-1. Syntax

# 534. The pandas.DataFrame.reset_index method
pandas.DataFrame.reset_index(level=None, *, drop=False, inplace=False, col_level=0, col_fill='', allow_duplicates=_NoDefault.no_default, names=None)
Reset the index, or a level of it.

Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.

Parameters:
level : int, str, tuple, or list, default None
Only remove the given levels from the index. Removes all levels by default.

drop : bool, default False
Do not try to insert index into dataframe columns. This resets the index to the default integer index.

inplace : bool, default False
Whether to modify the DataFrame rather than creating a new one.

col_level : int or str, default 0
If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.

col_fill : object, default ‘’
If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.

allow_duplicates : bool, optional, default lib.no_default
Allow duplicate column labels to be created.

New in version 1.5.0.

names : int, str or 1-dimensional list, default None
Using the given string, rename the DataFrame column which contains the index data. If the DataFrame has a MultiIndex, this has to be a list or tuple with length equal to the number of levels.

New in version 1.5.0.

Returns:
DataFrame or None
DataFrame with the new index or None if inplace=True.

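534-2. Parameters

534-2-1. level (optional, default None): int, str, tuple, or list naming the index level(s) to remove from the index. None removes all levels.

534-2-2. drop (optional, default False): boolean. If True, the old index values are discarded instead of being inserted into the DataFrame as columns.

534-2-3. inplace (optional, default False): boolean. If True, the original DataFrame is modified instead of a new object being returned.

534-2-4. col_level (optional, default 0): when the columns are a MultiIndex, the column level into which the index labels are inserted. The first level is used by default.

534-2-5. col_fill (optional, default ''): when the columns are a MultiIndex, how the other column levels are named for the inserted columns. If None, the index name is repeated.

534-2-6. allow_duplicates (optional, new in version 1.5.0): boolean, whether duplicate column labels may be created when the index is inserted as columns. By default, inserting a column whose label already exists raises an error.

534-2-7. names (optional, default None, new in version 1.5.0): int, str, or one-dimensional list giving the name(s) of the column(s) that hold the index data. For a MultiIndex it must be a list or tuple with one entry per level. With None, the existing index names are used (or index, level_0, level_1, ... when the index is unnamed).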
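
As a sketch of the level and names parameters, the example below resets one level of a named MultiIndex and then resets all levels under new column names. The index names "group" and "num" and the replacement names "g" and "n" are hypothetical, and names requires pandas 1.5.0 or later.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("B", 1)], names=["group", "num"]
)
df = pd.DataFrame({"value": [10, 20, 30]}, index=index)

# level="num": only that level becomes a column; "group" stays as the index.
only_num = df.reset_index(level="num")

# names=[...]: reset every level and choose the resulting column names.
renamed = df.reset_index(names=["g", "n"])

print(only_num)
print(renamed)
```

Resetting a single level is handy when one part of a MultiIndex is needed as ordinary data while the rest should keep identifying the rows.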

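534-3. Purpose

Replaces the current index (single-level or multi-level) with the default integer index, either keeping the old index as columns of the DataFrame or discarding it. When level is given, only those levels are removed and the remaining levels stay in the index.

534-4. Return value

With inplace=False (the default), returns a new DataFrame with the new index, where the old index has been added as columns or dropped. With inplace=True, the original DataFrame is modified and None is returned.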

534-5. Notes

None

534-6. Usage

534-6-1. Data preparation

None

534-6-2. Code example

# 534. The pandas.DataFrame.reset_index method
import pandas as pd
# Create a DataFrame with a MultiIndex (levels are unnamed)
index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1), ('B', 2)])
df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)
# Reset the index and keep it as columns (named level_0, level_1 because the levels are unnamed)
df_reset = df.reset_index()
print(df_reset, end='\n\n')
# Discard the index instead of keeping it as columns
df_dropped = df.reset_index(drop=True)
print(df_dropped)

534-6-3. Output

# 534. The pandas.DataFrame.reset_index method
#   level_0  level_1  value
# 0       A        1     10
# 1       A        2     20
# 2       B        1     30
# 3       B        2     40
#
#    value
# 0     10
# 1     20
# 2     30
# 3     40

535. The pandas.DataFrame.sample method

535-1. Syntax

# 535. The pandas.DataFrame.sample method
pandas.DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)
Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters:
n : int, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

frac : float, optional
Fraction of axis items to return. Cannot be used with n.

replace : bool, default False
Allow or disallow sampling of the same row more than once.

weights : str or ndarray-like, optional
Default ‘None’ results in equal probability weighting. If passed a Series, will align with target object on index. Index values in weights not found in sampled object will be ignored and index values in sampled object not in weights will be assigned weights of zero. If called on a DataFrame, will accept the name of a column when axis = 0. Unless weights are a Series, weights must be same length as axis being sampled. If weights do not sum to 1, they will be normalized to sum to 1. Missing values in the weights column will be treated as zero. Infinite values not allowed.

random_state : int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.

Changed in version 1.4.0: np.random.Generator objects now accepted

axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type. For Series this parameter is unused and defaults to None.

ignore_index : bool, default False
If True, the resulting index will be labeled 0, 1, …, n - 1.

New in version 1.3.0.

Returns:
Series or DataFrame
A new object of same type as caller containing n items randomly sampled from the caller object.

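535-2. Parameters

535-2-1. n (optional, default None): integer, the number of rows or columns (depending on axis) to draw. It cannot be combined with frac; if neither n nor frac is given, one item is drawn.

535-2-2. frac (optional, default None): float, the fraction of the rows or columns on the axis to draw. It cannot be combined with n.

535-2-3. replace (optional, default False): boolean. True samples with replacement, so the same row or column can be drawn more than once.

535-2-4. weights (optional, default None): string or array-like giving the selection probability of each row (or column). None means equal probability. A column name is accepted when axis=0; a Series is aligned with the object on its index, and labels missing from the weights get weight zero. Weights that do not sum to 1 are normalized, missing values are treated as zero, and infinite values are not allowed.

535-2-5. random_state (optional, default None): controls the random number generator so that sampling is reproducible. An int, array-like, or BitGenerator is used as a seed; a np.random.RandomState or np.random.Generator is used as given. With a fixed integer seed, every call returns the same sample.

535-2-6. axis (optional, default None): the axis to sample. 0 or 'index' samples rows; 1 or 'columns' samples columns. None uses the default axis for the type, which is rows for a DataFrame.

535-2-7. ignore_index (optional, default False, new in version 1.3.0): boolean. If True, the original index is dropped and the result is labeled 0, 1, ..., n - 1; otherwise the original index labels are kept.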
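
The parameters above can be combined as in the following sketch, which focuses on properties that hold for any seed rather than on specific sampled rows. The column "w" and the seed values are arbitrary choices made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "w": [0, 0, 1, 0, 0]})

# The same integer seed reproduces the same sample.
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)
reproducible = first.equals(second)

# frac=0.4 of 5 rows gives 2 rows; ignore_index relabels them 0 and 1.
relabeled = df.sample(frac=0.4, random_state=0, ignore_index=True)

# Only row 2 has a non-zero weight in column "w", so it is always drawn.
weighted = df.sample(n=1, weights="w", random_state=0)

print(reproducible, list(relabeled.index), list(weighted.index))
```

Passing a column name as weights is convenient when the sampling probability is already stored in the data, as with the "w" column here.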

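535-3. Purpose

Draws rows or columns at random from a DataFrame. The sample size or fraction, sampling with or without replacement, and per-item probabilities can all be specified, which covers fairly complex sampling tasks.

535-4. Return value

Returns a new object of the same type as the caller (DataFrame or Series) containing the randomly drawn items. With axis=0 the result contains the sampled rows; with axis=1 it contains the sampled columns.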

535-5. Notes

None

535-6. Usage

535-6-1. Data preparation

None

535-6-2. Code example

# 535. The pandas.DataFrame.sample method
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})
# Draw 3 rows at random
sampled_df = df.sample(n=3)
print(sampled_df, end='\n\n')
# Draw 50% of the rows
sampled_frac_df = df.sample(frac=0.5)
print(sampled_frac_df, end='\n\n')
# Draw 2 columns at random
sampled_columns = df.sample(n=2, axis=1)
print(sampled_columns, end='\n\n')
# Draw 4 rows at random with replacement
sampled_with_replacement = df.sample(n=4, replace=True)
print(sampled_with_replacement, end='\n\n')
# Draw 3 rows using the values in column 'A' as weights
sampled_with_weights = df.sample(n=3, weights='A')
print(sampled_with_weights)

535-6-3. Output

# 535. The pandas.DataFrame.sample method
# (Example output; no random_state is set, so the result differs from run to run)
#    A   B    C
# 3  4  40  400
# 4  5  50  500
# 2  3  30  300
#
#    A   B    C
# 4  5  50  500
# 1  2  20  200
#
#      C   B
# 0  100  10
# 1  200  20
# 2  300  30
# 3  400  40
# 4  500  50
#
#    A   B    C
# 2  3  30  300
# 4  5  50  500
# 4  5  50  500
# 2  3  30  300
#
#    A   B    C
# 4  5  50  500
# 0  1  10  100
# 1  2  20  200