A Tour of Cool Python Libraries: the Third-Party Library Pandas (122)

I. Usage in Detail

541. pandas.DataFrame.take method

541-1. Syntax

# 541. pandas.DataFrame.take method
pandas.DataFrame.take(indices, axis=0, **kwargs)
Return the elements in the given positional indices along an axis.

This means that we are not indexing according to actual values in the index attribute of the object. We are indexing according to the actual position of the element in the object.

Parameters:
indices : array-like
An array of ints indicating which positions to take.

axis : {0 or 'index', 1 or 'columns', None}, default 0
The axis on which to select elements. 0 means that we are selecting rows, 1 means that we are selecting columns. For Series this parameter is unused and defaults to 0.

**kwargs
For compatibility with numpy.take(). Has no effect on the output.

Returns:
same type as caller
An array-like containing the elements taken from the object.

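541-2. Parameters

541-2-1. indices (required): an array-like of integers giving the positions of the rows or columns to take. The indices are position-based rather than label-based, and negative values count from the end.

541-2-2. axis (optional, default 0): an integer or string naming the axis to select along; 0 or 'index' selects rows, 1 or 'columns' selects columns.

541-2-3. **kwargs (optional): accepted only for compatibility with numpy.take(); it has no effect on the output.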
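
To make the difference between positions and labels concrete, here is a minimal sketch. The frame and its index labels (10, 20, 30) are made up for illustration; the point is that take ignores the labels and that negative positions count from the end.

```python
import pandas as pd

# Hypothetical frame whose index labels (10, 20, 30) differ from the positions (0, 1, 2)
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[10, 20, 30])

# Positions 0 and -1 select the first and last rows, whatever their labels are
first_and_last = df.take([0, -1])
print(first_and_last)

# take is positional, so it matches iloc called with the same list of positions
same_as_iloc = first_and_last.equals(df.iloc[[0, -1]])
print(same_as_iloc)

# With axis=1 the same rule applies to columns: -1 is the last column, here 'B'
last_column = df.take([-1], axis=1)
print(list(last_column.columns))
```

The selected rows keep their original labels (10 and 30), which is a quick way to confirm that the lookup was by position.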

541-3. Function

Selects specific rows or columns from a DataFrame by position: given a list of integer positions, it returns the data found at those positions.

541-4. Return value

A new DataFrame containing the rows or columns selected by indices.

541-5. Notes

None.

541-6. Usage

541-6-1. Data preparation

None.

541-6-2. Code example

# 541. pandas.DataFrame.take method
import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
print(df, end='\n\n')
# Take the rows at positions 0 and 2
result1 = df.take([0, 2], axis=0)
print(result1, end='\n\n')
# Take the columns at positions 0 and 2
result2 = df.take([0, 2], axis=1)
print(result2)

541-6-3. Output

# 541. pandas.DataFrame.take method
#    A  B  C
# 0  1  4  7
# 1  2  5  8
# 2  3  6  9
# 
#    A  B  C
# 0  1  4  7
# 2  3  6  9
# 
#    A  C
# 0  1  7
# 1  2  8
# 2  3  9

542. pandas.DataFrame.truncate method

542-1. Syntax

# 542. pandas.DataFrame.truncate method
pandas.DataFrame.truncate(before=None, after=None, axis=None, copy=None)
Truncate a Series or DataFrame before and after some index value.

This is a useful shorthand for boolean indexing based on index values above or below certain thresholds.

Parameters:
before : date, str, int
Truncate all rows before this index value.

after : date, str, int
Truncate all rows after this index value.

axis : {0 or 'index', 1 or 'columns'}, optional
Axis to truncate. Truncates the index (rows) by default. For Series this parameter is unused and defaults to 0.

copy : bool, default is True
Return a copy of the truncated section.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write: pd.options.mode.copy_on_write = True

Returns:
type of caller
The truncated Series or DataFrame.

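542-2. Parameters

542-2-1. before (optional, default None): a date, string, or integer index label. Everything before this label is dropped and the label itself is kept; with a datetime index, for example, the result starts at this timestamp, inclusive.

542-2-2. after (optional, default None): a date, string, or integer index label. Everything after this label is dropped and the label itself is kept; with a datetime index, for example, the result runs up to this timestamp, inclusive.

542-2-3. axis (optional, default None): an integer or string naming the axis to truncate; 0 or 'index' truncates rows, 1 or 'columns' truncates columns. When left as None, rows are truncated.

542-2-4. copy (optional, default None): a boolean controlling whether a copy of the truncated section is returned. According to the documentation note quoted above, the keyword is ignored once Copy-on-Write is enabled and is slated for removal in a future pandas version.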
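
The inclusive behavior of before and after is easy to check on a small integer index. The sketch below uses a made-up five-row frame and also shows that either bound can be given on its own.

```python
import pandas as pd

# Hypothetical frame with a sorted integer index 1..5
df = pd.DataFrame({'x': [10, 20, 30, 40, 50]}, index=[1, 2, 3, 4, 5])

# Both bounds: labels 2, 3 and 4 survive (both endpoints are kept)
middle = df.truncate(before=2, after=4)
print(middle)

# Only a lower bound: everything before label 4 is dropped
tail = df.truncate(before=4)
print(tail)

# Only an upper bound: everything after label 2 is dropped
head = df.truncate(after=2)
print(head)
```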

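542-3. Function

Trims a DataFrame to the span between two index labels, discarding whatever lies before the before label and after the after label. This is convenient for time series and for any other data that is cut by index value.

542-4. Return value

A new DataFrame containing the data from before through after, both endpoints included.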
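
One caveat worth illustrating: truncate expects the labels on the chosen axis to be sorted, and raises a ValueError otherwise. The sketch below uses a made-up frame with a shuffled index and shows one possible workaround, sorting the index first with sort_index.

```python
import pandas as pd

# Hypothetical frame whose index is deliberately out of order
unsorted = pd.DataFrame({'x': [1, 2, 3]}, index=[3, 1, 2])

try:
    unsorted.truncate(before=2)
    raised = False
except ValueError as exc:
    # pandas refuses to truncate an unsorted axis
    raised = True
    print(exc)

# Sorting the index first makes the call valid; labels 2 and 3 remain
fixed = unsorted.sort_index().truncate(before=2)
print(fixed)
```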

542-5. Notes

None.

542-6. Usage

542-6-1. Data preparation

None.

542-6-2. Code example

# 542. pandas.DataFrame.truncate method
import pandas as pd
import numpy as np
# Build sample data (unseeded random values, so the numbers differ on every run)
date_range = pd.date_range(start='2024-01-01', periods=10, freq='D')
df = pd.DataFrame(data=np.random.randn(10, 2), index=date_range, columns=['A', 'B'])
# Example: truncate the date index, keeping 2024-01-03 through 2024-01-07
truncated_df = df.truncate(before='2024-01-03', after='2024-01-07')
print("DataFrame after truncating the date index:")
print(truncated_df, end='\n\n')
# Example: truncate by column, keeping columns 'B' through 'C'
df2 = pd.DataFrame(data=np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
truncated_df2 = df2.truncate(before='B', after='C', axis=1)  # same columns as df2.iloc[:, 1:3]
print("DataFrame after truncating by column:")
print(truncated_df2)

542-6-3. Output

# 542. pandas.DataFrame.truncate method
# DataFrame after truncating the date index:
#                    A         B
# 2024-01-03  1.171253 -0.292728
# 2024-01-04  1.514965  0.563961
# 2024-01-05 -0.725748  0.301818
# 2024-01-06 -0.180755 -0.317646
# 2024-01-07  0.785789  0.139568
# 
# DataFrame after truncating by column:
#           B         C
# 0 -0.814442 -0.345845
# 1 -0.330464 -0.695036
# 2  1.960896  0.097870
# 3 -0.341738 -0.775819
# 4  0.218615  1.818791
# 5 -0.742757 -2.160435
# 6 -0.761504  0.057824
# 7  1.335177 -0.331876
# 8  1.363113  0.215259
# 9 -0.095859  0.453446

543. pandas.DataFrame.backfill method

543-1. Syntax

# 543. pandas.DataFrame.backfill method
pandas.DataFrame.backfill(*, axis=None, inplace=False, limit=None, downcast=_NoDefault.no_default)
Fill NA/NaN values by using the next valid observation to fill the gap.

Deprecated since version 2.0: Series/DataFrame.backfill is deprecated. Use Series/DataFrame.bfill instead.

Returns:
Series/DataFrame or None
Object with missing values filled or None if inplace=True.

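543-2. Parameters

543-2-1. axis (optional, default None): {0 or 'index', 1 or 'columns'}, the axis along which to fill. 0 or 'index' fills down each column, so a gap takes the value from the next row; 1 or 'columns' fills across each row, so a gap takes the value from the next column.

543-2-2. inplace (optional, default False): a boolean. If True, the fill is applied to the original data and None is returned; if False, a new object is returned and the original data is left unchanged.

543-2-3. limit (optional, default None): an integer giving the maximum number of consecutive NaN values to fill. A gap longer than limit is only partially filled, starting with the NaNs closest to the next valid value.

543-2-4. downcast (optional): a dict of item -> dtype describing what to downcast if possible, or the string 'infer'. The bfill documentation marks this keyword as deprecated since pandas 2.2.0.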
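
The effect of limit is easiest to see on a gap that is longer than the limit. Because backfill is deprecated in favor of bfill (see the syntax notes above), this sketch calls bfill; the single-column frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a gap of three NaNs followed by a gap of one
df = pd.DataFrame({'A': [np.nan, np.nan, np.nan, 4.0, np.nan, 6.0]})

# With limit=1, at most one NaN per gap is filled: the one next to the valid value
limited = df.bfill(limit=1)
print(limited)
```

The first two rows stay NaN because only the NaN directly above 4.0 is within the limit; the single-NaN gap before 6.0 is filled completely.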

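543-3. Function

Backward-fills missing data (NaN): each missing value is replaced by the first valid value that follows it along the chosen axis. This is particularly useful for time series and other ordered data.

543-4. Return value

A new DataFrame with the missing values filled (if inplace=False), or None (if inplace=True).

543-5. Notes

backfill has been deprecated since pandas 2.0 in favor of bfill, which behaves the same way; new code should call bfill.

543-6. Usage

543-6-1. Data preparation

None.

543-6-2. Code example

# 543. pandas.DataFrame.backfill method
import pandas as pd
import numpy as np
# Build a sample DataFrame
df = pd.DataFrame({
    'A': [1, np.nan, 3, np.nan],
    'B': [np.nan, 2, np.nan, 4]
})
# Fill missing values with backfill (deprecated; df.bfill() is the replacement)
filled_df = df.backfill()
print(filled_df)

543-6-3. Output

# 543. pandas.DataFrame.backfill method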
#      A    B
# 0  1.0  2.0
# 1  3.0  2.0
# 2  3.0  4.0
# 3  NaN  4.0

544. pandas.DataFrame.bfill method

544-1. Syntax

# 544. pandas.DataFrame.bfill method
pandas.DataFrame.bfill(*, axis=None, inplace=False, limit=None, limit_area=None, downcast=_NoDefault.no_default)
Fill NA/NaN values by using the next valid observation to fill the gap.

Parameters:
axis : {0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame
Axis along which to fill missing values. For Series this parameter is unused and defaults to 0.

inplace : bool, default False
If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).

limit : int, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

limit_area : {None, 'inside', 'outside'}, default None
If limit is specified, consecutive NaNs will be filled with this restriction.

None: No fill restriction.

'inside': Only fill NaNs surrounded by valid values (interpolate).

'outside': Only fill NaNs outside valid values (extrapolate).

New in version 2.2.0.

downcast : dict, default is None
A dict of item->dtype of what to downcast if possible, or the string 'infer' which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).

Deprecated since version 2.2.0.

Returns:
Series/DataFrame or None
Object with missing values filled or None if inplace=True.

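544-2. Parameters

544-2-1. axis (optional, default None): {0 or 'index', 1 or 'columns'}, the axis along which to fill. 0 or 'index' fills down each column, 1 or 'columns' fills across each row; None behaves like 0.

544-2-2. inplace (optional, default False): a boolean. If True, the fill is applied to the original DataFrame and None is returned; if False, a new DataFrame is returned and the original data is left unchanged.

544-2-3. limit (optional, default None): an integer giving the maximum number of consecutive NaN values to fill, so a gap longer than limit is only partially filled. It must be greater than 0 if given.

544-2-4. limit_area (optional, default None): one of None, 'inside', or 'outside'. It restricts which NaNs are filled: None applies no restriction, 'inside' fills only NaNs surrounded by valid values, and 'outside' fills only NaNs that lie outside the valid values. New in pandas 2.2.0.

544-2-5. downcast (optional): a dict of item -> dtype describing what to downcast if possible, or the string 'infer', which tries to downcast to an appropriate equal type (for example float64 to int64). Deprecated since pandas 2.2.0.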
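
The two axis settings are easy to mix up, so here is a small sketch contrasting them on a made-up 3x3 frame. With the default axis each gap looks at the rows below it; with axis=1 it looks at the columns to its right.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in columns A and B
df = pd.DataFrame({
    'A': [np.nan, 2.0, np.nan],
    'B': [np.nan, np.nan, 6.0],
    'C': [7.0, 8.0, 9.0],
})

# Default axis (0): each NaN takes the next valid value below it in the same column
down = df.bfill()
print(down)

# axis=1: each NaN takes the next valid value to its right in the same row
across = df.bfill(axis=1)
print(across)
```

In the column-wise result the last value of A stays NaN, because nothing follows it; in the row-wise result every gap is filled, because column C has no missing values.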

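544-3. Function

Fills each missing value with the first valid value that follows it. This is especially useful for ordered data such as time series, where a missing observation can reasonably borrow the next one.

544-4. Return value

A new DataFrame (if inplace=False) or None (if inplace=True). In the returned DataFrame, NaN values have been replaced by the next valid value where one exists.

544-5. Notes

None.

544-6. Usage

544-6-1. Data preparation

None.

544-6-2. Code example

# 544. pandas.DataFrame.bfill method
import pandas as pd
import numpy as np
# Build a sample DataFrame
df = pd.DataFrame({
    'A': [1, np.nan, 3, np.nan],
    'B': [np.nan, 2, np.nan, 4]
})
# Fill missing values with bfill
filled_df = df.bfill()
print(filled_df)

544-6-3. Output

# 544. pandas.DataFrame.bfill method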
#      A    B
# 0  1.0  2.0
# 1  3.0  2.0
# 2  3.0  4.0
# 3  NaN  4.0

545. pandas.DataFrame.dropna method

545-1. Syntax

# 545. pandas.DataFrame.dropna method
pandas.DataFrame.dropna(*, axis=0, how=_NoDefault.no_default, thresh=_NoDefault.no_default, subset=None, inplace=False, ignore_index=False)
Remove missing values.

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters:
axis : {0 or 'index', 1 or 'columns'}, default 0
Determine if rows or columns which contain missing values are removed.

0, or 'index' : Drop rows which contain missing values.

1, or 'columns' : Drop columns which contain missing value.

Only a single axis is allowed.

how : {'any', 'all'}, default 'any'
Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

'any' : If any NA values are present, drop that row or column.

'all' : If all values are NA, drop that row or column.

thresh : int, optional
Require that many non-NA values. Cannot be combined with how.

subset : column label or sequence of labels, optional
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

inplace : bool, default False
Whether to modify the DataFrame rather than creating a new one.

ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.

New in version 2.0.0.

Returns:
DataFrame or None
DataFrame with NA entries dropped from it or None if inplace=True.

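545-2. Parameters

545-2-1. axis (optional, default 0): {0 or 'index', 1 or 'columns'}, the direction of removal; 0 or 'index' drops rows, 1 or 'columns' drops columns.

545-2-2. how (optional, default 'any'): {'any', 'all'}, the condition for removal:

'any': drop the row or column if it contains any NaN value.
'all': drop the row or column only if all of its values are NaN.

545-2-3. thresh (optional): an integer giving the minimum number of non-NaN values a row or column must have to be kept; rows or columns with fewer are dropped. It cannot be combined with how.

545-2-4. subset (optional, default None): a label or sequence of labels along the other axis that limits where NaN values are looked for. When dropping rows, for example, it is the list of columns to check. If omitted, the whole DataFrame is checked.

545-2-5. inplace (optional, default False): a boolean. If True, rows or columns are removed from the original DataFrame and None is returned; if False, a new DataFrame is returned and the original data is left unchanged.

545-2-6. ignore_index (optional, default False): a boolean. If True, the resulting axis is relabeled 0, 1, …, n - 1; if False, the original labels are kept. New in pandas 2.0.0.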
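
The parameters above interact in ways that are easier to see side by side. The following sketch runs several variants on one made-up frame; it assumes pandas 2.0 or later, since ignore_index is not available before that version.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is entirely NaN, rows 0 and 2 are partly NaN, row 3 is complete
df = pd.DataFrame({
    'A': [1.0, np.nan, np.nan, 4.0],
    'B': [np.nan, np.nan, 3.0, 4.0],
    'C': [1.0, np.nan, np.nan, 4.0],
})

# how='all': only the fully empty row 1 is dropped
all_nan_dropped = df.dropna(how='all')

# thresh=2: keep rows with at least two non-NaN values (rows 0 and 3)
at_least_two = df.dropna(thresh=2)

# subset=['B']: only column B is checked, so rows 2 and 3 survive
subset_only = df.dropna(subset=['B'])

# ignore_index=True (assumes pandas >= 2.0): the surviving rows are relabeled 0, 1
renumbered = df.dropna(subset=['B'], ignore_index=True)

for name, frame in [('how=all', all_nan_dropped), ('thresh=2', at_least_two),
                    ('subset=B', subset_only), ('ignore_index', renumbered)]:
    print(name, list(frame.index))
```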

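545-3. Function

Provides a simple way to remove missing values so that a data set is better suited to analysis or modeling, which matters for real-world data that often contains gaps.

545-4. Return value

A new DataFrame (if inplace=False) or None (if inplace=True). The returned DataFrame holds the data that remains after the rows or columns with missing values have been dropped; unless inplace=True is given, the original data is left unchanged.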
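
A common slip is to assign the result of an inplace=True call and end up with None. This short sketch, on a made-up two-row frame, contrasts the two calling styles.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one incomplete row
df = pd.DataFrame({'A': [1.0, np.nan], 'B': [3.0, 4.0]})

# Default call: a new frame comes back and df itself still has both rows
copy_result = df.dropna()
rows_before = len(df)

# inplace=True: df is modified and the call itself returns None
inplace_result = df.dropna(inplace=True)
rows_after = len(df)

print(len(copy_result), rows_before, inplace_result, rows_after)
```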

545-5. Notes

None.

545-6. Usage

545-6-1. Data preparation

None.

545-6-2. Code example

# 545. pandas.DataFrame.dropna method
import pandas as pd
import numpy as np
# Build a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': [1, np.nan, np.nan, 4]
})
# Drop every row that contains at least one NaN value
cleaned_df = df.dropna(how='any')
print(cleaned_df)

545-6-3. Output

# 545. pandas.DataFrame.dropna method
#      A    B    C
# 3  4.0  4.0  4.0