Python Cool Library Tour: Third-Party Library Pandas (048)

I. Usage in Detail

171. pandas.Series.nlargest method

171-1. Syntax

# 171. pandas.Series.nlargest method
pandas.Series.nlargest(n=5, keep='first')
Return the largest n elements.

Parameters:
n
int, default 5
Return this many descending sorted values.

keep
{‘first’, ‘last’, ‘all’}, default ‘first’
When there are duplicate values that cannot all fit in a Series of n elements:

first : return the first n occurrences in order of appearance.

last : return the last n occurrences in reverse order of appearance.

all : keep all occurrences. This can result in a Series of size larger than n.

Returns:
Series
The n largest values in the Series, sorted in decreasing order.

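171-2. Parameters

171-2-1. n (optional, default 5): an integer giving the number of largest elements to select.

171-2-2. keep (optional, default 'first'): one of {'first', 'last', 'all'}; controls which elements are kept when duplicate values tie for the last of the n places:

'first': keep the tied elements that appear earliest.
'last': keep the tied elements that appear latest.
'all': keep every element tied with the n-th largest value, so the result may contain more than n elements.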
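The effect of keep is easiest to see on a Series with ties. The following is a minimal sketch using made-up values (not taken from the examples in this article) in which three elements share the largest value and only the top two are requested:

```python
import pandas as pd

# Illustrative data: the value 30 appears at positions 1, 2 and 4.
s = pd.Series([10, 30, 30, 20, 30])

first = s.nlargest(n=2, keep='first')   # ties resolved in favour of earlier positions
last = s.nlargest(n=2, keep='last')     # ties resolved in favour of later positions
all_ties = s.nlargest(n=2, keep='all')  # every element tied with the 2nd largest is kept

print(sorted(first.index.tolist()))  # [1, 2]
print(sorted(last.index.tolist()))   # [2, 4]
print(len(all_ties))                 # 3
```

With keep='all' the result has three rows even though n=2, because all three occurrences of 30 are tied.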

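171-3. Functionality

171-3-1. Selecting the largest values: selects the n largest values from the Series.

171-3-2. Sorting: the result is sorted by value in descending order.

171-3-3. Handling duplicates: the keep parameter controls how duplicate values are handled.

171-4. Return value

171-4-1. Return type: pandas.Series.

171-4-2. Contents: the n largest values, in descending order, each keeping its original index label.

171-5. Notes

171-5-1. Performance: nlargest does not need to sort the whole Series. According to the pandas documentation, it is faster than sort_values(ascending=False).head(n) when n is small relative to the size of the Series.

171-5-2. Handling duplicates: the keep parameter decides whether the first occurrences, the last occurrences, or all occurrences of tied values are kept.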
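As a rough check of what nlargest computes, the sketch below compares it with sorting the whole Series and taking the head. The data is the score list used in the examples of this section; the variable names are only illustrative.

```python
import pandas as pd

scores = pd.Series([85, 90, 78, 92, 88, 76, 95, 89])

top_via_nlargest = scores.nlargest(n=3)
top_via_sort = scores.sort_values(ascending=False).head(3)

# With no ties, both approaches select the same rows in the same order.
print(top_via_nlargest.equals(top_via_sort))  # True
```

When the data contains ties, the two approaches may order or pick the tied rows differently, so this equivalence is only claimed here for data without duplicates.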

171-6. Usage

171-6-1. Data preparation

None

171-6-2. Code examples

# 171. pandas.Series.nlargest method
# 171-1. Data exploration and analysis
import pandas as pd
# Example: student scores
scores = pd.Series([85, 90, 78, 92, 88, 76, 95, 89])
# Get the 3 highest scores
top_scores = scores.nlargest(n=3)
print(top_scores, end='\n\n')

# 171-2. Anomaly detection
# Example: sensor readings
sensor_data = pd.Series([100, 150, 200, 250, 300, 350, 400, 450, 500, 1000])
# Get the 2 largest readings
top_readings = sensor_data.nlargest(n=2)
print(top_readings, end='\n\n')

# 171-3. Performance evaluation
# Example: sales figures, one entry per sales representative
sales = pd.Series([12000, 15000, 18000, 20000, 22000, 25000, 27000, 30000])
# Get the 3 highest sales figures
top_sales = sales.nlargest(n=3)
print(top_sales, end='\n\n')

# 171-4. Resource allocation
# Example: customer complaint severity scores
complaints = pd.Series([5, 15, 25, 35, 45, 55, 65, 75, 85, 95])
# Get the 3 most severe complaints
top_complaints = complaints.nlargest(n=3)
print(top_complaints, end='\n\n')

# 171-5. Investment decisions
# Example: stock returns
returns = pd.Series([0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50])
# Get the 3 highest returns
top_returns = returns.nlargest(n=3)
print(top_returns)

171-6-3. Output

# 171. pandas.Series.nlargest method
# 171-1. Data exploration and analysis
# 6    95
# 3    92
# 1    90
# dtype: int64

# 171-2. Anomaly detection
# 9    1000
# 8     500
# dtype: int64

# 171-3. Performance evaluation
# 7    30000
# 6    27000
# 5    25000
# dtype: int64

# 171-4. Resource allocation
# 9    95
# 8    85
# 7    75
# dtype: int64

# 171-5. Investment decisions
# 9    0.50
# 8    0.45
# 7    0.40
# dtype: float64

172. pandas.Series.nsmallest method

172-1. Syntax

# 172. pandas.Series.nsmallest method
pandas.Series.nsmallest(n=5, keep='first')
Return the smallest n elements.

Parameters:
n
int, default 5
Return this many ascending sorted values.

keep
{‘first’, ‘last’, ‘all’}, default ‘first’
When there are duplicate values that cannot all fit in a Series of n elements:

first : return the first n occurrences in order of appearance.

last : return the last n occurrences in reverse order of appearance.

all : keep all occurrences. This can result in a Series of size larger than n.

Returns:
Series
The n smallest values in the Series, sorted in increasing order.

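172-2. Parameters

172-2-1. n (optional, default 5): the number of smallest values to return. If n is greater than or equal to the length of the Series, the whole Series is returned, sorted in ascending order.

172-2-2. keep (optional, default 'first'): one of {'first', 'last', 'all'}; controls which elements are kept when duplicate values tie for the last of the n places:

'first': keep the tied elements that appear earliest.
'last': keep the tied elements that appear latest.
'all': keep every element tied with the n-th smallest value, so the result may contain more than n elements.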
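The behaviour of n when it exceeds the length of the Series can be checked with a small sketch; the three values below are made up for illustration:

```python
import pandas as pd

s = pd.Series([4, 1, 3])

# n exceeds the length of the Series, so all three elements come back, sorted ascending.
result = s.nsmallest(n=10)
print(result.tolist())        # [1, 3, 4]
print(result.index.tolist())  # [1, 2, 0]
```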

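172-3. Functionality

172-3-1. Extracting the smallest values: extracts the requested number of smallest values from the Series.

172-3-2. Handling duplicates: the keep parameter specifies how duplicate values are handled.

172-4. Return value

Returns a new Series containing the requested number of smallest values. The returned Series keeps the index labels of the original Series, which makes it easy to trace where those values sit in the original data.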
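To illustrate how the preserved index can be used, here is a small sketch with a labelled Series. The server names and latency values are hypothetical and serve only as an example:

```python
import pandas as pd

# Hypothetical response times in milliseconds, labelled by server name.
latency = pd.Series([120, 45, 80, 60], index=['alpha', 'beta', 'gamma', 'delta'])

fastest = latency.nsmallest(n=2)
print(fastest.index.tolist())  # ['beta', 'delta']

# The labels can be used to look the rows up again in the original Series.
print(latency.loc[fastest.index].tolist())  # [45, 60]
```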

172-5. Notes

None

172-6. Usage

172-6-1. Data preparation

None

172-6-2. Code examples

# 172. pandas.Series.nsmallest method
# 172-1. Default usage
import pandas as pd
# The value 10 appears twice (index 4 and 5) so that keep has a visible effect
data = pd.Series([3, 5, 6, 8, 10, 10, 11, 24])
result = data.nsmallest()
print(result, end='\n\n')

# 172-2. Specifying n
result = data.nsmallest(n=3)
print(result, end='\n\n')

# 172-3. Using keep='last'
result = data.nsmallest(n=5, keep='last')
print(result, end='\n\n')

# 172-4. Using keep='all'
result = data.nsmallest(n=5, keep='all')
print(result)

172-6-3. Output

# 172. pandas.Series.nsmallest method
# 172-1. Default usage
# 0     3
# 1     5
# 2     6
# 3     8
# 4    10
# dtype: int64

# 172-2. Specifying n
# 0    3
# 1    5
# 2    6
# dtype: int64

# 172-3. Using keep='last'
# 0     3
# 1     5
# 2     6
# 3     8
# 5    10
# dtype: int64

# 172-4. Using keep='all'
# 0     3
# 1     5
# 2     6
# 3     8
# 4    10
# 5    10
# dtype: int64

173. pandas.Series.pct_change method

173-1. Syntax

# 173. pandas.Series.pct_change method
pandas.Series.pct_change(periods=1, fill_method=_NoDefault.no_default, limit=_NoDefault.no_default, freq=None, **kwargs)
Fractional change between the current and a prior element.

Computes the fractional change from the immediately previous row by default. This is useful in comparing the fraction of change in a time series of elements.

Note

Despite the name of this method, it calculates fractional change (also known as per unit change or relative change) and not percentage change. If you need the percentage change, multiply these values by 100.

Parameters:
periods
int, default 1
Periods to shift for forming percent change.

fill_method
{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default ‘pad’
How to handle NAs before computing percent changes.

Deprecated since version 2.1: All options of fill_method are deprecated except fill_method=None.

limit
int, default None
The number of consecutive NAs to fill before stopping.

Deprecated since version 2.1.

freq
DateOffset, timedelta, or str, optional
Increment to use from time series API (e.g. ‘ME’ or BDay()).

**kwargs
Additional keyword arguments are passed into DataFrame.shift or Series.shift.

Returns:
Series or DataFrame
The same type as the calling object.

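173-2. Parameters

173-2-1. periods (optional, default 1): an integer giving the number of periods to shift by when computing the change. For example, periods=1 compares each element with the one immediately before it, and periods=2 compares each element with the one two positions before it.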
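One way to read periods is as the argument of a shift. The sketch below, which reuses the data from the examples in this section, computes the change by hand with Series.shift and compares it with pct_change(periods=2):

```python
import pandas as pd

data = pd.Series([100, 120, 130, 90, 160])

# Fractional change relative to the element two positions earlier.
via_method = data.pct_change(periods=2)
via_shift = data / data.shift(2) - 1

# Raises an AssertionError if the two results differ.
pd.testing.assert_series_equal(via_method, via_shift)
print(via_method.round(6).tolist())
```

The first two entries are NaN because there is no element two positions before them.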

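173-2-2. fill_method (optional): one of {'backfill', 'bfill', 'pad', 'ffill', None}; how missing values are filled before the change is computed. Since pandas 2.1, every option except fill_method=None is deprecated:

173-2-2-1. 'backfill' or 'bfill': fill NaN with the next valid value.

173-2-2-2. 'pad' or 'ffill': fill NaN with the previous valid value.

173-2-3. limit (optional): an integer, the maximum number of consecutive NaN values to fill. Deprecated since pandas 2.1.

173-2-4. freq (optional, default None): a DateOffset, timedelta, or string giving the time-series increment to shift by, for example freq='ME' for month end.

173-2-5. **kwargs (optional): additional keyword arguments passed on to Series.shift or DataFrame.shift.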

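173-3. Functionality

Computes the fractional change between the current element and a prior element (the immediately preceding one by default). Despite the method's name, the result is a fraction rather than a percentage; multiply by 100 to get a percentage. The method is very useful in time series analysis because it helps identify trends and fluctuations quickly.

173-4. Return value

Returns a Series of fractional changes. With fill_method=None, if the current element or the prior element is NaN, the corresponding change is NaN as well.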
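The following minimal sketch illustrates both points: the result is a fraction that has to be multiplied by 100 to obtain a percentage, and a missing value propagates to the result when filling is disabled. The price values are made up for illustration.

```python
import pandas as pd

prices = pd.Series([100.0, 120.0, None, 90.0])

# fill_method=None disables forward filling, so NaN inputs propagate to the result.
fractional = prices.pct_change(fill_method=None)
percentage = fractional * 100

print(fractional.tolist())
print(percentage.tolist())
```

Only the second entry is a number here (about 0.2, that is, roughly 20 percent); the first entry has no predecessor, and the last two involve the missing value.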

173-5. Notes

None

173-6. Usage

173-6-1. Data preparation

None

173-6-2. Code examples

# 173. pandas.Series.pct_change method
# 173-1. Default usage
import pandas as pd
data = pd.Series([100, 120, 130, 90, 160])
result = data.pct_change()
print(result, end='\n\n')

# 173-2. Specifying periods
result = data.pct_change(periods=2)
print(result, end='\n\n')

# 173-3. Forward-filling missing values before computing the change
# fill_method='ffill' is deprecated since pandas 2.1, so fill explicitly with ffill()
data_with_nan = pd.Series([100, None, 130, 90, 160])
result = data_with_nan.ffill().pct_change(fill_method=None)
print(result, end='\n\n')

# 173-4. Forward-filling with a limit on consecutive NaN values
# The limit argument of pct_change is deprecated too, so pass it to ffill() instead
result = data_with_nan.ffill(limit=1).pct_change(fill_method=None)
print(result)

173-6-3. Output

# 173. pandas.Series.pct_change method
# 173-1. Default usage
# 0         NaN
# 1    0.200000
# 2    0.083333
# 3   -0.307692
# 4    0.777778
# dtype: float64

# 173-2. Specifying periods
# 0         NaN
# 1         NaN
# 2    0.300000
# 3   -0.250000
# 4    0.230769
# dtype: float64

# 173-3. Forward-filling missing values before computing the change
# 0         NaN
# 1    0.000000
# 2    0.300000
# 3   -0.307692
# 4    0.777778
# dtype: float64

# 173-4. Forward-filling with a limit on consecutive NaN values
# 0         NaN
# 1    0.000000
# 2    0.300000
# 3   -0.307692
# 4    0.777778
# dtype: float64

174. pandas.Series.prod method

174-1. Syntax

# 174. pandas.Series.prod method
pandas.Series.prod(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs)
Return the product of the values over the requested axis.

Parameters:
axis
{index (0)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.prod with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

New in version 2.0.0.

skipna
bool, default True
Exclude NA/null values when computing the result.

numeric_only
bool, default False
Include only float, int, boolean columns. Not implemented for Series.

min_count
int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

**kwargs
Additional keyword arguments to be passed to the function.

Returns:
scalar

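174-2. Parameters

174-2-1. axis (optional, default None): only meaningful for DataFrame; for a Series it is unused and defaults to 0.

174-2-2. skipna (optional, default True): whether to skip NaN values. If True, NaN values are ignored and only non-missing values are multiplied; if False, the result is NaN whenever the Series contains a NaN.

174-2-3. numeric_only (optional, default False): whether to include only numeric data. According to the pandas documentation, it is not implemented for Series.

174-2-4. min_count (optional, default 0): the minimum number of valid (non-NA) values required to perform the operation. If fewer than min_count non-NA values are present, the result is NaN. For example, with min_count=1 and a Series that contains no non-NA values, NaN is returned.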
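The role of min_count shows up most clearly when no valid values are present at all. A minimal sketch with an all-NaN Series (made-up data):

```python
import numpy as np
import pandas as pd

all_missing = pd.Series([np.nan, np.nan])

# With the default min_count=0, the product over zero valid values is the empty product, 1.0.
print(all_missing.prod())             # 1.0
# Requiring at least one valid value turns the result into NaN instead.
print(all_missing.prod(min_count=1))  # nan
```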

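174-2-5. **kwargs (optional): additional keyword arguments to be passed to the function.

174-3. Functionality

Computes the product of the elements of a Series and returns a scalar; by default this is the product of all non-NaN elements.

174-4. Return value

174-4-1. Numeric type: the return value is a float or an integer, depending on the dtype of the Series and the elements involved in the calculation.

174-4-2. Missing values: if skipna=False and the Series contains a NaN, NaN is returned; if the number of valid values is smaller than min_count, NaN is returned as well.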
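How the result type follows the dtype of the Series can be seen in a short sketch; the two Series below are made up and contain no missing values:

```python
import numpy as np
import pandas as pd

int_result = pd.Series([1, 2, 3, 4]).prod()
float_result = pd.Series([1.0, 2.0, 3.0, 4.0]).prod()

print(int_result)    # 24
print(float_result)  # 24.0

# The integer Series yields an integer scalar, the float Series a float scalar.
print(isinstance(int_result, (int, np.integer)))  # True
print(isinstance(float_result, float))            # True
```

A Series that contains NaN is stored with a float dtype, which is why the example in 174-6 prints 24.0 rather than 24.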

174-5. Notes

None

174-6. Usage

174-6-1. Data preparation

None

174-6-2. Code examples

# 174. pandas.Series.prod method
import pandas as pd
import numpy as np
# Create a Series with one missing value (4 valid values in total)
s = pd.Series([1, 2, 3, np.nan, 4])
# Product of all non-NaN elements
result = s.prod()
print(result)

# Do not skip NaN values
result_skipna_false = s.prod(skipna=False)
print(result_skipna_false)

# Require at least 2 valid values, otherwise return NaN
result_min_count = s.prod(min_count=2)
print(result_min_count)

# Require at least 5 valid values; only 4 are present, so the result is NaN
result_min_count_high = s.prod(min_count=5)
print(result_min_count_high)

174-6-3. Output

# 174. pandas.Series.prod method
# 24.0
# nan
# 24.0
# nan

175. pandas.Series.quantile method

175-1. Syntax

# 175. pandas.Series.quantile method
pandas.Series.quantile(q=0.5, interpolation='linear')
Return value at the given quantile.

Parameters:
q
float or array-like, default 0.5 (50% quantile)
The quantile(s) to compute, which can lie in range: 0 <= q <= 1.

interpolation
{‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}
This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j:

linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

lower: i.

higher: j.

nearest: i or j whichever is nearest.

midpoint: (i + j) / 2.

Returns:
float or Series
If q is an array, a Series will be returned where the index is q and the values are the quantiles, otherwise a float will be returned.

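175-2. Parameters

175-2-1. q (optional, default 0.5): the quantile to compute, a float in the range [0, 1] or an array-like of such floats. For example, 0.5 gives the median.

175-2-2. interpolation (optional, default 'linear'): the interpolation method to use when the desired quantile lies between two data points; the other options are 'lower', 'higher', 'nearest' and 'midpoint'.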
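To compare the interpolation methods side by side, the sketch below evaluates one quantile that falls strictly between two data points. The data and the choice q=0.4 are illustrative assumptions, picked so that 'nearest' is unambiguous:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 5])

# q=0.4 falls at fractional position 0.4 * (4 - 1) = 1.2, between the values 2 and 3.
results = {
    method: s.quantile(q=0.4, interpolation=method)
    for method in ['linear', 'lower', 'higher', 'nearest', 'midpoint']
}

for method, value in results.items():
    print(method, value)
```

Under these assumptions, 'linear' should give approximately 2 + 0.2 * (3 - 2) = 2.2, 'lower' gives 2, 'higher' gives 3, 'nearest' gives 2, and 'midpoint' gives 2.5.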

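175-3. Functionality

175-3-1. Computing quantiles: computes the quantile specified by the q parameter.

175-3-2. Interpolation: the interpolation parameter specifies the interpolation method used when computing the quantile.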
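A quick way to sanity-check quantile is to compare the 0.5 quantile with Series.median. The sketch below uses the same data as the examples in this section and also shows that the NaN entry does not take part in the calculation:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5])

# NaN is ignored, so the quantile is computed over the values 1, 2, 3 and 5.
print(s.quantile(q=0.5))                # 2.5
print(s.quantile(q=0.5) == s.median())  # True
```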

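175-4. Return value

175-4-1. Value: returns the value at the specified quantile.

175-4-2. Return type: for a scalar q the result is a scalar, typically a float even when the Series holds integers; for an array-like q the result is a Series whose index is q and whose values are the quantiles.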
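When q is array-like, several quantiles can be computed in one call. A minimal sketch with made-up integer data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Passing a list of quantiles returns a Series indexed by those quantiles.
quartiles = s.quantile(q=[0.25, 0.5, 0.75])
print(quartiles.index.tolist())  # [0.25, 0.5, 0.75]
print(quartiles.tolist())        # [2.0, 3.0, 4.0]
```

The values come back as floats although the input Series holds integers.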

175-5. Notes

None

175-6. Usage

175-6-1. Data preparation

None

175-6-2. Code examples

# 175. pandas.Series.quantile method
# 175-1. Computing the median (0.5 quantile)
import pandas as pd
import numpy as np
# Create a Series; the NaN entry is ignored by quantile
s = pd.Series([1, 2, 3, np.nan, 5])
median = s.quantile(q=0.5)
print(median, end='\n\n')

# 175-2. Computing the 0.5 quantile with 'lower' interpolation
median_lower = s.quantile(q=0.5, interpolation='lower')
print(median_lower, end='\n\n')

# 175-3. Computing the 0.5 quantile with 'higher' interpolation
median_higher = s.quantile(q=0.5, interpolation='higher')
print(median_higher, end='\n\n')

# 175-4. Computing the 0.25 quantile
first_quartile = s.quantile(q=0.25)
print(first_quartile, end='\n\n')

# 175-5. Computing the 0.75 quantile
third_quartile = s.quantile(q=0.75)
print(third_quartile)

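175-6-3. Output

# 175. pandas.Series.quantile method
# 175-1. Computing the median (0.5 quantile)
# 2.5

# 175-2. Computing the 0.5 quantile with 'lower' interpolation
# 2.0

# 175-3. Computing the 0.5 quantile with 'higher' interpolation
# 3.0

# 175-4. Computing the 0.25 quantile
# 1.75

# 175-5. Computing the 0.75 quantile
# 3.5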