A Tour of Cool Python Libraries: Third-Party Library Pandas (121)

I. Usage in Detail

536. The pandas.DataFrame.set_axis Method

536-1. Syntax

# 536. The pandas.DataFrame.set_axis method
pandas.DataFrame.set_axis(labels, *, axis=0, copy=None)
Assign desired index to given axis.

Indexes for column or row labels can be changed by assigning a list-like or Index.

Parameters:
labels
list-like, Index
The values for the new index.

axis
{0 or 'index', 1 or 'columns'}, default 0
The axis to update. The value 0 identifies the rows. For Series this parameter is unused and defaults to 0.

copy
bool, default True
Whether to make a copy of the underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:
DataFrame
An object of type DataFrame.

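536-2. Parameters

536-2-1. labels (required): a list-like object or Index holding the new labels for the chosen axis. The number of labels must equal the number of elements on that axis; otherwise a ValueError is raised.

536-2-2. axis (optional, default 0): integer or string selecting the axis whose labels are set.

0 or 'index': the rows of the DataFrame.
1 or 'columns': the columns of the DataFrame.

536-2-3. copy (optional, default None): boolean controlling whether the underlying data is copied into the returned object. set_axis returns a new object whatever this value is. Per the documentation quoted above, the effective default is to copy.

With copy=True, the returned object holds its own copy of the data.
With copy=False, the returned object may share data with the original; this avoids the copy but does not change the labels of the original.
With Copy-on-Write enabled (the default from pandas 3.0), the keyword is ignored and the copy is deferred until it is actually needed.

536-3. Purpose

Changes the row or column labels of a DataFrame. This is useful in data preprocessing and cleaning, for example when renaming all columns at once or replacing the index.

536-4. Return value

Returns a new DataFrame (or a Series, for Series.set_axis) whose chosen axis carries the new labels. The original object is left unchanged: the signature shown above has no inplace parameter, so assign the result to keep it.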
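The following is a minimal sketch of the two points above: set_axis returns a new object and leaves the original untouched, and a label list of the wrong length is rejected. The variable names (renamed, length_error, and so on) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["a", "b"])

# set_axis returns a new object; the original keeps its labels.
renamed = df.set_axis(["x", "y"], axis="columns")
original_columns = list(df.columns)
renamed_columns = list(renamed.columns)

# A label list whose length does not match the axis is rejected.
try:
    df.set_axis(["only_one"], axis="columns")
    length_error = None
except ValueError as exc:
    length_error = exc

print("original:", original_columns)
print("renamed:", renamed_columns)
print("length mismatch raised:", type(length_error).__name__)
```

Because the original is never modified, the usual pattern is to assign the result back, for example df = df.set_axis(...).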

536-5. Notes

None.

536-6. Usage

536-6-1. Data preparation

None.

536-6-2. Code example

# 536. The pandas.DataFrame.set_axis method
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}, index=['a', 'b', 'c'])
print("Original DataFrame:\n", df)
# Set new column labels
new_labels = ['alpha', 'beta', 'gamma']
df1 = df.set_axis(new_labels, axis=1)
print("\nNew column labels:\n", df1)
# Set new row labels
new_index = ['one', 'two', 'three']
df2 = df.set_axis(new_index, axis=0)
print("\nNew row labels:\n", df2)
# copy=False only avoids copying the underlying data; set_axis still returns a new object.
# The result is discarded here, so df keeps its original row labels.
df.set_axis(new_index, axis=0, copy=False)
print("\nOriginal DataFrame after an unassigned set_axis call (unchanged):\n", df)

536-6-3. Output

# 536. The pandas.DataFrame.set_axis method
# Original DataFrame:
#     A  B  C
# a  1  4  7
# b  2  5  8
# c  3  6  9
# 
# New column labels:
#     alpha  beta  gamma
# a      1     4      7
# b      2     5      8
# c      3     6      9
# 
# New row labels:
#         A  B  C
# one    1  4  7
# two    2  5  8
# three  3  6  9
# 
# Original DataFrame after an unassigned set_axis call (unchanged):
#     A  B  C
# a  1  4  7
# b  2  5  8
# c  3  6  9

537. The pandas.DataFrame.set_index Method

537-1. Syntax

# 537. The pandas.DataFrame.set_index method
pandas.DataFrame.set_index(keys, *, drop=True, append=False, inplace=False, verify_integrity=False)
Set the DataFrame index using existing columns.

Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on it.

Parameters:
keys
label or array-like or list of labels/arrays
This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, “array” encompasses Series, Index, np.ndarray, and instances of Iterator.

drop
bool, default True
Delete columns to be used as the new index.

append
bool, default False
Whether to append columns to existing index.

inplace
bool, default False
Whether to modify the DataFrame rather than creating a new one.

verify_integrity
bool, default False
Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method.

Returns:
DataFrame or None
Changed row labels or None if inplace=True.

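537-2. Parameters

537-2-1. keys (required): a single column label, an array of the same length as the DataFrame, or a list combining column labels and arrays. The selected columns or arrays become the new index.

537-2-2. drop (optional, default True): boolean. If True, the columns used for the new index are removed from the DataFrame's columns; if False, they are kept as columns as well.

537-2-3. append (optional, default False): boolean. If True, the new index levels are appended to the existing index instead of replacing it.

537-2-4. inplace (optional, default False): boolean. If True, the original DataFrame is modified instead of a new DataFrame being returned.

537-2-5. verify_integrity (optional, default False): boolean. If True, the new index is checked for duplicates and a ValueError is raised if any are found.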
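As an illustration of the drop, append, and verify_integrity parameters described above, here is a small sketch. The sample data and the variable names (kept, multi, appended, duplicate_error) are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo"],
    "year": [2020, 2020, 2021],
    "sales": [10, 20, 30],
})

# drop=False keeps 'city' as a regular column in addition to the index.
kept = df.set_index("city", drop=False)

# A list of column labels produces a MultiIndex.
multi = df.set_index(["city", "year"])

# append=True adds 'city' as an extra level after the existing index.
appended = df.set_index("city", append=True)

# verify_integrity=True rejects an index with duplicate entries
# ('Oslo' appears twice in this sample).
try:
    df.set_index("city", verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

print(kept)
print(multi)
print(appended)
print("duplicates rejected:", isinstance(duplicate_error, ValueError))
```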

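537-3. Purpose

Sets one or more existing columns (or arrays) as the index of a DataFrame. A meaningful index makes label-based selection with loc, alignment, and joins more convenient, and lookups on a unique index can be faster than filtering on a column. A well-chosen index can also make analysis and plots easier to read.

537-4. Return value

If inplace is False, returns a new DataFrame that uses the specified columns as its index; if inplace is True, returns None and the original DataFrame is modified.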
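The difference between the two return modes can be sketched as follows; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})

# Default (inplace=False): a new DataFrame is returned, df is unchanged.
new_df = df.set_index("A")
columns_before = list(df.columns)

# inplace=True: df itself is modified and the call returns None.
returned = df.set_index("A", inplace=True)
columns_after = list(df.columns)

print(columns_before, columns_after, returned)
```

A common pitfall is writing df = df.set_index("A", inplace=True), which rebinds df to None.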

537-5. Notes

None.

537-6. Usage

537-6-1. Data preparation

None.

537-6-2. Code example

# 537. The pandas.DataFrame.set_index method
import pandas as pd
# Create a sample DataFrame
data = {
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c'],
    'C': [4, 5, 6]
}
df = pd.DataFrame(data)
# Use set_index to make column 'A' the index
new_df = df.set_index('A')
print(new_df)

537-6-3. Output

# 537. The pandas.DataFrame.set_index method
#    B  C
# A
# 1  a  4
# 2  b  5
# 3  c  6

538. The pandas.DataFrame.tail Method

538-1. Syntax

# 538. The pandas.DataFrame.tail method
pandas.DataFrame.tail(n=5)
Return the last n rows.

This function returns last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.

For negative values of n, this function returns all rows except the first |n| rows, equivalent to df[|n|:].

If n is larger than the number of rows, this function returns all rows.

Parameters:
n
int, default 5
Number of rows to select.

Returns:
type of caller
The last n rows of the caller object.

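538-2. Parameters

538-2-1. n (optional, default 5): integer number of trailing rows to return. If n is negative, all rows except the first |n| rows are returned; if n is 0, an empty DataFrame is returned; if n exceeds the number of rows, the whole DataFrame is returned.

538-3. Purpose

Extracts the last rows of a DataFrame by position. It is commonly used to inspect the end of a dataset quickly, for example after sorting or appending rows.

538-4. Return value

Returns a new DataFrame containing the last n rows of the original. If n is larger than the number of rows, the whole DataFrame is returned; if n is negative, all rows except the first |n| are returned.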
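The edge cases for n described above can be checked with a short sketch; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6]})

last_two = df.tail(2)            # the last two rows
all_but_first_two = df.tail(-2)  # negative n: skip the first |n| rows
everything = df.tail(100)        # n larger than the row count: all rows
nothing = df.tail(0)             # n == 0: an empty DataFrame

print(last_two["A"].tolist())
print(all_but_first_two["A"].tolist())
print(len(everything), len(nothing))
```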

538-5. Notes

None.

538-6. Usage

538-6-1. Data preparation

None.

538-6-2. Code example

# 538. The pandas.DataFrame.tail method
import pandas as pd
# Create a sample DataFrame
data = {
    'A': [1, 2, 3, 4, 5, 6],
    'B': ['a', 'b', 'c', 'd', 'e', 'f'],
    'C': [7, 8, 9, 10, 11, 12]
}
df = pd.DataFrame(data)
# Use tail to get the last 3 rows
last_rows = df.tail(3)
print(last_rows)

538-6-3. Output

# 538. The pandas.DataFrame.tail method
#    A  B   C
# 3  4  d  10
# 4  5  e  11
# 5  6  f  12

539. The pandas.DataFrame.xs Method

539-1. Syntax

# 539. The pandas.DataFrame.xs method
pandas.DataFrame.xs(key, axis=0, level=None, drop_level=True)
Return cross-section from the Series/DataFrame.

This method takes a key argument to select data at a particular level of a MultiIndex.

Parameters:
key
label or tuple of label
Label contained in the index, or partially in a MultiIndex.

axis
{0 or ‘index’, 1 or ‘columns’}, default 0
Axis to retrieve cross-section on.

level
object, defaults to first n levels (n=1 or len(key))
In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.

drop_level
bool, default True
If False, returns object with same levels as self.

Returns:
Series or DataFrame
Cross-section from the original Series or DataFrame corresponding to the selected index levels.

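539-2. Parameters

539-2-1. key (required): a label, or a tuple of labels, contained in the index. For a MultiIndex the key may cover only some of the levels.

539-2-2. axis (optional, default 0): 0 or 'index' takes the cross-section from the rows; 1 or 'columns' takes it from the columns.

539-2-3. level (optional, default None): for a MultiIndex, indicates which levels the key refers to, by level name or integer position. By default the key is matched against the first levels.

539-2-4. drop_level (optional, default True): boolean. If True, the levels matched by the key are removed from the result; if False, the result keeps the same levels as the original.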
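To make the level, drop_level, and axis parameters concrete, here is a small sketch. The column 'other' and the variable names are assumptions added for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], ["one", "two", "one", "two"]],
    names=("first", "second"),
)
df = pd.DataFrame({"value": [1, 2, 3, 4], "other": [10, 20, 30, 40]}, index=index)

# Select on the inner level by name; the matched level is dropped by default.
by_second = df.xs("one", level="second")

# drop_level=False keeps both index levels in the result.
kept_levels = df.xs("one", level="second", drop_level=False)

# axis=1 takes the cross-section from the columns instead of the rows.
column = df.xs("value", axis=1)

print(by_second)
print(kept_levels)
print(column)
```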

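539-3. Purpose

Extracts a cross-section from a DataFrame, most often one with a MultiIndex. It is especially useful for hierarchical data, where the key together with level locates the required rows or columns quickly.

539-4. Return value

Returns a Series or a DataFrame, depending on the selection. For a DataFrame, a key that identifies exactly one row (or one column with axis=1) yields a Series, while a key that matches several rows, such as a partial MultiIndex key, yields a DataFrame.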
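The dependence of the return type on the key can be sketched like this, reusing the same two-level index as the section's own example; the variable names are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], ["one", "two", "one", "two"]],
    names=("first", "second"),
)
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=index)

# A partial key matches several rows, so the result is a DataFrame.
partial = df.xs("A")

# A full key identifies a single row, so the result is a Series.
exact = df.xs(("A", "one"))

print(type(partial).__name__, type(exact).__name__)
```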

539-5. Notes

None.

539-6. Usage

539-6-1. Data preparation

None.

539-6-2. Code example

# 539. The pandas.DataFrame.xs method
import pandas as pd
# Create a sample DataFrame with a MultiIndex
arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
# The number of values in the 'value' column must match the number of index entries
data = {'value': [1, 2, 3, 4]}
df = pd.DataFrame(data, index=index)
# Use xs to select all rows of group 'A'
result = df.xs('A', level='first')
print(result)

539-6-3. Output

# 539. The pandas.DataFrame.xs method
#         value
# second
# one         1
# two         2

540. The pandas.DataFrame.get Method

540-1. Syntax

# 540. The pandas.DataFrame.get method
pandas.DataFrame.get(key, default=None)
Get item from object for given key (ex: DataFrame column).

Returns default value if not found.

Parameters:
key
object
Returns:
same type as items contained in object.

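540-2. Parameters

540-2-1. key (required): the key to look up, usually a column label.

540-2-2. default (optional, default None): the value returned when the key is not found.

540-3. Purpose

Retrieves a column from a DataFrame by key. If the column exists, it is returned; if it does not, the default value is returned instead of a KeyError being raised.

540-4. Return value

If the DataFrame has the specified column, returns it as a pandas.Series; otherwise returns the value of default, which is None when not specified.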
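One practical use of default is to supply a fallback for an optional column. The sketch below assumes a hypothetical optional column named 'weight'; it also shows that a list of labels is looked up as a whole, so one missing label returns the default.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# 'weight' is a hypothetical optional column; fall back to a weight of 1.0 per row.
weights = df.get("weight", default=pd.Series(1.0, index=df.index))
weighted_a = df["A"] * weights

# A list of existing labels returns a DataFrame with those columns.
subset = df.get(["A", "B"])

# If any label in the list is missing, the default is returned instead.
missing_one = df.get(["A", "Z"], default="missing")

print(weighted_a.tolist())
print(list(subset.columns))
print(missing_one)
```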

540-5. Notes

None.

540-6. Usage

540-6-1. Data preparation

None.

540-6-2. Code example

# 540. The pandas.DataFrame.get method
import pandas as pd
# Create a DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Get the existing column 'A'
result_a = df.get('A')
print(result_a, end='\n\n')
# Get the missing column 'C'; returns the default None
result_c = df.get('C')
print(result_c, end='\n\n')
# Get the missing column 'C' with a custom default 'Column not found'
result_c_default = df.get('C', default='Column not found')
print(result_c_default)

540-6-3. Output

# 540. The pandas.DataFrame.get method
# 0    1
# 1    2
# 2    3
# Name: A, dtype: int64
# 
# None
# 
# Column not found