Python Cool Library Tour - Third-Party Library Pandas (053)


I. Usage in Detail

196. pandas.Series.first method

196-1. Syntax

# 196. pandas.Series.first method
pandas.Series.first(offset)
Select initial periods of time series data based on a date offset.

Deprecated since version 2.1: first() is deprecated and will be removed in a future version. Please create a mask and filter using .loc instead.

For a DataFrame with a sorted DatetimeIndex, this function can select the first few rows based on a date offset.

Parameters:
offset
str, DateOffset or dateutil.relativedelta
The offset length of the data that will be selected. For instance, ‘1ME’ will display all the rows having their index within the first month.

Returns:
Series or DataFrame
A subset of the caller.

Raises:
TypeError
If the index is not a DatetimeIndex

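196-2. Parameters

196-2-1. offset (required): a str, DateOffset or dateutil.relativedelta giving the length of the leading time window to select, e.g. '3D' (three days) or '1ME' (up to the end of the first month).

196-3. Purpose

Selects the initial period of time series data: every row whose index falls inside the window that starts at the first index label and extends by offset, not just a single record. The index must be a DatetimeIndex (otherwise a TypeError is raised) and should be sorted. The method is deprecated since pandas 2.1; the recommended replacement is a boolean mask combined with .loc.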
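For a fixed-length offset such as '3D', the window is effectively half-open: rows at exactly first timestamp + offset are not included. The following is a minimal sketch of the mask-based equivalent of ts.first('3D'), assuming ten days of daily data; the variable names are illustrative and first() itself is deliberately not called, since it is deprecated.

```python
import pandas as pd

# Daily data for ten days starting 2024-01-01
idx = pd.date_range(start="2024-01-01", periods=10, freq="D")
ts = pd.Series(range(10), index=idx)

# Mask-based equivalent of ts.first('3D') for a fixed-length offset:
# keep rows strictly before the first timestamp + 3 days
end = ts.index[0] + pd.Timedelta(days=3)
first_three_days = ts.loc[ts.index < end]
print(first_three_days)  # rows for 2024-01-01, 2024-01-02 and 2024-01-03
```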

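196-4. Return value

A subset of the caller of the same type (a Series here) containing every row inside the leading window, rather than a single record. If the index is not a DatetimeIndex, a TypeError is raised.

196-5. Notes

Common offset values:

196-5-1. 'D' (day): rows before the first timestamp + 1 day. For daily data, first('D') returns only the first row.

196-5-2. 'W' (week): rows up to the end of the first week; by default weeks are anchored on Sunday ('W-SUN').

196-5-3. 'ME' (month end; 'M' before pandas 2.2): rows up to the end of the first calendar month.

196-5-4. 'QE' (quarter end; 'Q' before pandas 2.2): rows up to the end of the first calendar quarter.

196-5-5. 'YE' (year end; 'A' or 'Y' before pandas 2.2): rows up to the end of the first calendar year. Since pandas 2.2 the old aliases 'M', 'Q', 'A' and 'Y' still work as offsets but emit a FutureWarning.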

196-6. Usage

196-6-1. Data preparation

None

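196-6-2. Code example

# 196. pandas.Series.first method
import pandas as pd
# Sample data: 12 month-end timestamps starting at 2024-01-31
# ('ME' is the month-end alias since pandas 2.2; older versions use 'M')
date_range = pd.date_range(start='2024-01-01', periods=12, freq='ME')
ts = pd.Series(range(len(date_range)), index=date_range)
# Select the rows that fall within the first month
# (first() is deprecated since pandas 2.1 and emits a FutureWarning)
first_month = ts.first('ME')
print(first_month)

196-6-3. Output

# 196. pandas.Series.first method
# 2024-01-31    0
# Freq: ME, dtype: int64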

197. pandas.Series.head method

197-1. Syntax

# 197. pandas.Series.head method
pandas.Series.head(n=5)
Return the first n rows.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

For negative values of n, this function returns all rows except the last |n| rows, equivalent to df[:n].

If n is larger than the number of rows, this function returns all rows.

Parameters:
n
int, default 5
Number of rows to select.

Returns:
same type as caller
The first n rows of the caller object.

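197-2. Parameters

197-2-1. n (optional, default 5): an int giving the number of rows to return. A positive n returns the first n rows (all rows if n exceeds the length); a negative n returns all rows except the last |n| rows.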
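A short sketch of the two edge cases described above, negative n and n larger than the length, using the same sample values as the code example below:

```python
import pandas as pd

data = pd.Series([10, 20, 30, 40, 50, 60, 70])

all_but_last_two = data.head(-2)   # negative n: drop the last 2 rows
more_than_length = data.head(10)   # n > len(data): all 7 rows are returned

print(all_but_last_two.tolist())   # [10, 20, 30, 40, 50]
print(len(more_than_length))       # 7
```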

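197-3. Purpose

Returns the first n rows of a Series by position. It works on any Series, not only time series, and is handy for a quick look at the data.

197-4. Return value

An object of the same type as the caller (a Series here) containing the selected rows, with their original index labels preserved.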
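Since head() selects by position, its result matches positional slicing with iloc and keeps the original labels. A minimal sketch, assuming a hypothetical daily DatetimeIndex so that the preserved labels are easy to see:

```python
import pandas as pd

# Hypothetical daily index, used only to make the labels visible
idx = pd.date_range(start="2024-01-01", periods=7, freq="D")
data = pd.Series([10, 20, 30, 40, 50, 60, 70], index=idx)

top3 = data.head(3)
print(top3)                        # labels 2024-01-01 .. 2024-01-03 are kept
print(top3.equals(data.iloc[:3]))  # True: head(n) behaves like iloc[:n]
```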

197-5. Notes

None

197-6. Usage

197-6-1. Data preparation

None

197-6-2. Code example

# 197. pandas.Series.head method
import pandas as pd
# Create a sample Series
data = pd.Series([10, 20, 30, 40, 50, 60, 70])
# Get the first 5 rows
print(data.head(5))

197-6-3. Output

# 197. pandas.Series.head method
# 0    10
# 1    20
# 2    30
# 3    40
# 4    50
# dtype: int64

198. pandas.Series.idxmax method

198-1. Syntax

# 198. pandas.Series.idxmax method
pandas.Series.idxmax(axis=0, skipna=True, *args, **kwargs)
Return the row label of the maximum value.

If multiple values equal the maximum, the first row label with that value is returned.

Parameters:
axis
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

skipna
bool, default True
Exclude NA/null values. If the entire Series is NA, the result will be NA.

*args, **kwargs
Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.

Returns:
Index
Label of the maximum value.

Raises:
ValueError
If the Series is empty.

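198-2. Parameters

198-2-1. axis (optional, default 0): the axis to operate on. A Series has only one axis, so this parameter is unused and exists only for compatibility with DataFrame.

198-2-2. skipna (optional, default True): whether to exclude NA/NaN values when looking for the maximum. If the entire Series is NA, the result is NA; this behavior is deprecated since pandas 2.1 in favor of raising a ValueError.

198-2-3. *args (optional): extra positional arguments; they have no effect and are accepted only for compatibility with NumPy.

198-2-4. **kwargs (optional): extra keyword arguments; likewise they have no effect and are accepted only for compatibility with NumPy.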
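With the default skipna=True, NaN values are simply ignored. A minimal sketch with a hypothetical labeled Series; passing skipna=False with NaN present is avoided here because that combination is deprecated and its behavior differs across pandas versions.

```python
import numpy as np
import pandas as pd

# The NaN at label 'b' is skipped by default (skipna=True)
s = pd.Series([1.0, np.nan, 3.0, 2.0], index=["a", "b", "c", "d"])

label = s.idxmax()
print(label)                   # c
print(s.dropna().idxmax())     # c: same result as dropping NaN first
```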

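198-3. Purpose

Finds the maximum value in a Series and returns its index label. If the maximum occurs more than once, the first matching label is returned.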
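A small sketch of the tie rule from the documentation above: when several elements share the maximum, the label of the first one is returned.

```python
import pandas as pd

# The maximum 5 appears at labels 'x' and 'y'
s = pd.Series([3, 5, 5, 1], index=["w", "x", "y", "z"])

tie_label = s.idxmax()
print(tie_label)  # x (first of the tied maxima)
```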

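198-4. Return value

The index label of the maximum value, i.e. an element of the Series index rather than an integer position. Calling it on an empty Series raises a ValueError.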
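The difference between the label and the position becomes visible with a non-default integer index. A minimal sketch comparing idxmax() with argmax(), which returns the integer position instead:

```python
import pandas as pd

s = pd.Series([4, 9, 2], index=[100, 200, 300])

label = s.idxmax()     # 200: the index label of the maximum
position = s.argmax()  # 1: its integer position
print(label, position)
```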

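198-5. Notes

Typical use cases:

198-5-1. Data exploration: knowing where the maximum sits helps identify key data points. For example, when analyzing sales data, finding the date with the highest sales reveals sales peaks.

198-5-2. Performance analysis: locating the time of the highest performance metric (such as peak CPU usage or peak throughput) helps tune the system or identify bottlenecks.

198-5-3. Anomaly detection: the position of the maximum helps flag outliers or extreme cases. For example, when monitoring sensor data, the time of the highest temperature reading may deserve a closer look.

198-5-4. Decision support: knowing which item or time point has the largest value supports better decisions. For example, when evaluating marketing campaigns, finding the campaign with the highest return on investment can guide future plans.

198-5-5. Sorting and filtering: the position of the maximum is a starting point for further operations, such as finding the highest-rated user or the product with the largest revenue.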

198-6. Usage

198-6-1. Data preparation

None

198-6-2. Code example

# 198. pandas.Series.idxmax method
# 198-1. Data exploration
import pandas as pd
# Sample data
data = {
    'date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04'],
    'sales': [200, 300, 150, 400]
}
df = pd.DataFrame(data)
# Find the date with the highest sales
max_sales_date = df['sales'].idxmax()
print(f"Date with the highest sales: {df.loc[max_sales_date, 'date']}")

# 198-2. Performance analysis
import pandas as pd
# Sample data
data = {
    'time': ['00:00', '01:00', '02:00', '03:00'],
    'cpu_usage': [55, 75, 60, 80]
}
df = pd.DataFrame(data)
# Find the time with the highest CPU usage
max_cpu_time = df['cpu_usage'].idxmax()
print(f"Time with the highest CPU usage: {df.loc[max_cpu_time, 'time']}")

# 198-3. Anomaly detection
import pandas as pd
# Sample data
data = {
    'time': ['2024-07-01', '2024-07-02', '2024-07-03', '2024-07-04'],
    'temperature': [23.5, 29.0, 25.0, 30.2]
}
df = pd.DataFrame(data)
# Find the time of the highest temperature reading
max_temp_date = df['temperature'].idxmax()
print(f"Time of the highest temperature: {df.loc[max_temp_date, 'time']}")

# 198-4. Decision support
import pandas as pd
# Sample data
data = {
    'campaign': ['Campaign A', 'Campaign B', 'Campaign C', 'Campaign D'],
    'roi': [3.5, 4.2, 2.8, 5.1]
}
df = pd.DataFrame(data)
# Find the ad campaign with the highest return on investment
max_roi_index = df['roi'].idxmax()
print(f"Campaign with the highest ROI: {df.loc[max_roi_index, 'campaign']}")

# 198-5. Sorting and filtering
import pandas as pd
# Sample data
data = {
    'product': ['Product X', 'Product Y', 'Product Z'],
    'sales': [1000, 1500, 1200]
}
df = pd.DataFrame(data)
# Find the product with the highest sales
max_sales_product_index = df['sales'].idxmax()
print(f"Product with the highest sales: {df.loc[max_sales_product_index, 'product']}")

198-6-3. Output

# 198. pandas.Series.idxmax method
# 198-1. Data exploration
# Date with the highest sales: 2024-01-04

# 198-2. Performance analysis
# Time with the highest CPU usage: 03:00

# 198-3. Anomaly detection
# Time of the highest temperature: 2024-07-04

# 198-4. Decision support
# Campaign with the highest ROI: Campaign D

# 198-5. Sorting and filtering
# Product with the highest sales: Product Y
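In the examples above, idxmax() returns a row number that is then looked up with df.loc. An alternative idiom is to make the column of interest the index first, so idxmax() returns the date directly. A minimal sketch, assuming the data-exploration sample with dates parsed via pd.to_datetime (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
    "sales": [200, 300, 150, 400],
})

# Use the dates as the index so the returned label is the date itself
sales = df.set_index("date")["sales"]
peak_date = sales.idxmax()
print(peak_date.date())  # 2024-01-04
```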

199. pandas.Series.idxmin method

199-1. Syntax

# 199. pandas.Series.idxmin method
pandas.Series.idxmin(axis=0, skipna=True, *args, **kwargs)
Return the row label of the minimum value.

If multiple values equal the minimum, the first row label with that value is returned.

Parameters:
axis
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

skipna
bool, default True
Exclude NA/null values. If the entire Series is NA, the result will be NA.

*args, **kwargs
Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.

Returns:
Index
Label of the minimum value.

Raises:
ValueError
If the Series is empty.

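199-2. Parameters

199-2-1. axis (optional, default 0): the axis to operate on. A Series has only one axis, so this parameter is unused and exists only for compatibility with DataFrame.

199-2-2. skipna (optional, default True): whether to exclude NA/NaN values when looking for the minimum. If the entire Series is NA, the result is NA; this behavior is deprecated since pandas 2.1 in favor of raising a ValueError.

199-2-3. *args (optional): extra positional arguments; they have no effect and are accepted only for compatibility with NumPy.

199-2-4. **kwargs (optional): extra keyword arguments; likewise they have no effect and are accepted only for compatibility with NumPy.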

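199-3. Purpose

Returns the index label of the minimum value in a Series, which makes it easy to locate the smallest entry. If the minimum occurs more than once, the first matching label is returned.

199-4. Return value

The index label of the minimum value. With the default RangeIndex (0, 1, 2, ...) labels coincide with integer positions, so the result looks like a position; with any other index (custom labels or a non-default integer index) it is the corresponding label, not the position.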
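A minimal sketch of the case where label and position differ: with a non-default integer index, idxmin() returns the label while argmin() returns the integer position.

```python
import pandas as pd

s = pd.Series([5, 2, 8], index=[10, 20, 30])

label = s.idxmin()     # 20: the index label of the minimum
position = s.argmin()  # 1: its integer position
print(label, position)
```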

199-5. Notes

None

199-6. Usage

199-6-1. Data preparation

None

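199-6-2. Code example

# 199. pandas.Series.idxmin method
# 199-1. Basic usage
import pandas as pd
# Sample data: the minimum 1 appears at labels 1 and 3; the first label wins
data = pd.Series([3, 1, 4, 1, 5])
# Find the index of the minimum value
min_index = data.idxmin()
print(f"Index of the minimum value: {min_index}")

# 199-2. Skipping NaN values
import pandas as pd
import numpy as np
# Sample data
data = pd.Series([3, np.nan, 4, 1, 5])
# Find the index of the minimum value; NaN is skipped by default
min_index = data.idxmin()
print(f"Index of the minimum value: {min_index}")

# 199-3. No valid data
import pandas as pd
import numpy as np
# Sample data: every value is NaN
data = pd.Series([np.nan, np.nan, np.nan])
# Try to find the index of the minimum value; print only if the call succeeds,
# so a stale min_index from the previous example is never reported
try:
    min_index = data.idxmin()
    print(f"Index of the minimum value: {min_index}")
except ValueError as e:
    print(f"Error: {e}")

199-6-3. Output

# 199. pandas.Series.idxmin method
# 199-1. Basic usage
# Index of the minimum value: 1

# 199-2. Skipping NaN values
# Index of the minimum value: 3

# 199-3. No valid data
# Index of the minimum value: nan (note: returning NaN here is deprecated since pandas 2.1; newer versions raise a ValueError, in which case the except branch prints "Error: ...")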
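Because the all-NaN case behaves differently across pandas versions, one option is to check for valid values before calling idxmin(). A minimal sketch; safe_idxmin is a hypothetical helper, not part of pandas, and returning None for "no valid value" is a design choice of this sketch.

```python
from collections.abc import Hashable
from typing import Optional

import numpy as np
import pandas as pd


def safe_idxmin(s: pd.Series) -> Optional[Hashable]:
    """Hypothetical helper: label of the minimum, or None if there is no valid value."""
    valid = s.dropna()
    if valid.empty:
        # Covers both an empty Series (ValueError in idxmin) and an all-NaN
        # Series (NaN in older pandas, ValueError in newer versions)
        return None
    return valid.idxmin()


print(safe_idxmin(pd.Series([3, np.nan, 4, 1, 5])))      # 3
print(safe_idxmin(pd.Series([np.nan, np.nan, np.nan])))  # None
```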

200. pandas.Series.isin method

200-1. Syntax

# 200. pandas.Series.isin method
pandas.Series.isin(values)
Whether elements in Series are contained in values.

Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.

Parameters:
values
set or list-like
The sequence of values to test. Passing in a single string will raise a TypeError. Instead, turn a single string into a list of one element.

Returns:
Series
Series of booleans indicating if each element is in values.

Raises:
TypeError
If values is a string

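200-2. Parameters

200-2-1. values (required): a set or list-like object (list, tuple, Series, etc.) containing the values to test against. Passing a single string raises a TypeError; wrap it in a one-element list instead.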
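A minimal sketch of the string pitfall described above: a bare string is rejected with a TypeError, while the same value wrapped in a list (or a set) works.

```python
import pandas as pd

data = pd.Series(["apple", "banana", "cherry", "date"])

string_rejected = False
try:
    data.isin("banana")          # a bare string is not list-like
except TypeError as e:
    string_rejected = True
    print(f"TypeError: {e}")

mask = data.isin(["banana"])     # wrap the single value in a list
print(mask.tolist())             # [False, True, False, False]
print(data.isin({"banana", "date"}).tolist())  # a set also works
```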

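200-3. Purpose

Checks whether each element of the Series exactly matches one of the given values and returns a boolean Series.

200-4. Return value

A boolean Series aligned with the original one: each entry is True if the corresponding element is contained in values and False otherwise.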
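The returned boolean Series is typically used as a mask. A short sketch, assuming the same fruit data as the code example below: indexing with the mask keeps the matching elements, and ~ inverts it to keep the rest.

```python
import pandas as pd

data = pd.Series(["apple", "banana", "cherry", "date"])
mask = data.isin(["banana", "date"])

selected = data[mask]   # elements contained in the list
rest = data[~mask]      # ~ inverts the mask: elements not in the list

print(selected.tolist())  # ['banana', 'date']
print(rest.tolist())      # ['apple', 'cherry']
```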

200-5. Notes

None

200-6. Usage

200-6-1. Data preparation

None

200-6-2. Code example

# 200. pandas.Series.isin method
import pandas as pd
# Sample data
data = pd.Series(['apple', 'banana', 'cherry', 'date'])
# Check which elements appear in the given list
result = data.isin(['banana', 'date'])
print(result)

200-6-3. Output

# 200. pandas.Series.isin method
# 0    False
# 1     True
# 2    False
# 3     True
# dtype: bool