A Tour of Cool Python Libraries - Third-Party Library Pandas (081)

I. Usage in Detail

336. pandas.Series.str.rpartition method

336-1. Syntax

# 336. pandas.Series.str.rpartition method
pandas.Series.str.rpartition(sep=' ', expand=True)
Split the string at the last occurrence of sep.

This method splits the string at the last occurrence of sep, and returns 3 elements containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return 3 elements containing two empty strings, followed by the string itself.

Parameters:
sep
str, default whitespace
String to split on.

expand
bool, default True
If True, return DataFrame/MultiIndex expanding dimensionality. If False, return Series/Index.

Returns:
DataFrame/MultiIndex or Series/Index of objects.

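336-2. Parameters

336-2-1. sep (optional, default ' '): a string used as the separator. Any string may be passed as the basis for the split.

336-2-2. expand (optional, default True): a boolean. If True, a DataFrame is returned whose three columns hold the part before the separator, the separator itself, and the part after the separator. If False, a Series is returned in which each element is a tuple of those three parts.

336-3. Function

Searches each string from the right for the specified separator and splits it at the last occurrence into three parts (if the separator does not occur, the first two parts are empty strings and the third is the whole string):

The part before the separator
The separator itself
The part after the separator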
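
The three-part split, including the case where the separator is missing, can be sketched as follows. The sample strings are made up for illustration and do not come from the examples later in this section.

```python
import pandas as pd

# Hypothetical sample data; the last element contains no "-" at all.
s = pd.Series(["a-b-c", "x-y", "plain"])

parts = s.str.rpartition("-")  # DataFrame with columns 0, 1, 2

# Row 0 is split at the LAST "-", so the first part still contains a "-".
first_row = tuple(parts.iloc[0])
# Row 2 has no separator: two empty strings, then the original string.
last_row = tuple(parts.iloc[2])

print(first_row)
print(last_row)
```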

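336-4. Return value

If expand=True, a DataFrame with three columns is returned: the part before the separator, the separator, and the part after the separator. If expand=False, a Series is returned in which each element is a tuple of those three parts. When the method is called on an Index, the result is a MultiIndex or an Index respectively.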
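
A minimal sketch of the two return shapes, using made-up date strings as sample data:

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series(["2024-01-15", "2023-12-31"])

# expand=True (the default): one column per part.
as_frame = s.str.rpartition("-")
# expand=False: a single Series whose elements are 3-tuples.
as_tuples = s.str.rpartition("-", expand=False)

print(as_frame)
print(as_tuples)
```

The DataFrame form is usually easier to work with when a single part is needed, because it can be selected by column label (0, 1 or 2).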

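336-5. Notes

Typical use cases:

336-5-1. Data cleaning and processing: data cleaning often requires pulling specific information out of strings, such as the file name or the file extension from a file path. rpartition can locate the last slash and split the path into the directory and the file name.

336-5-2. Text analysis: text analysis may require extracting a particular word or segment from a sentence, for example the last word and everything before it. This is especially handy for user-generated content such as reviews and feedback.

336-5-3. Splitting compound data: some strings hold compound data separated by a specific character (such as "key:value" pairs), and rpartition splits them into key and value. Because the split happens at the last separator, this works as intended only when the value itself does not contain the separator; otherwise str.partition, which splits at the first occurrence, is the better fit.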
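
The difference between splitting at the last and at the first separator matters for "key:value" data whose values contain the separator. A small sketch, with invented sample pairs, comparing rpartition with its left-to-right counterpart str.partition:

```python
import pandas as pd

# Hypothetical sample data; the second value itself contains a ":".
pairs = pd.Series(["name:Alice", "time:12:30"])

from_right = pairs.str.rpartition(":")  # splits at the last ":"
from_left = pairs.str.partition(":")    # splits at the first ":"

# Key and value as each method sees them for "time:12:30".
right_key, right_value = from_right.loc[1, 0], from_right.loc[1, 2]
left_key, left_value = from_left.loc[1, 0], from_left.loc[1, 2]

print(right_key, right_value)
print(left_key, left_value)
```

For the first element both methods agree, since it contains only one separator.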

336-6. Usage

336-6-1. Data preparation

None.

336-6-2. Code example

# 336. pandas.Series.str.rpartition method
# 336-1. Data cleaning (extracting file names)
import pandas as pd
# Sample data
file_paths = pd.Series([
    '/home/user/documents/report.pdf',
    '/var/www/html/index.html',
    '/tmp/example.txt'
])
# Split at the last '/' and keep column 2, the part after the separator
file_names = file_paths.str.rpartition('/')[2]
print("Extracted file names:")
print(file_names, end='\n\n')

# 336-2. Text analysis (extracting the last word)
# Sample data
sentences = pd.Series([
    'The quick brown fox',
    'jumps over the lazy dog',
    'Hello world'
])
# Split at the last space and keep the part after it
last_words = sentences.str.rpartition(' ')[2]
print("Extracted last words:")
print(last_words, end='\n\n')

# 336-3. Splitting compound data (extracting keys and values)
# Sample data
key_value_pairs = pd.Series([
    'name:Alice',
    'age:30',
    'city:New York'
])
# Partition once, then read the key (column 0) and the value (column 2)
parts = key_value_pairs.str.rpartition(':')
keys = parts[0]
values = parts[2]
print("Extracted keys:")
print(keys)
print("Extracted values:")
print(values)

336-6-3. Output

# 336. pandas.Series.str.rpartition method
# 336-1. Data cleaning (extracting file names)
# Extracted file names:
# 0     report.pdf
# 1     index.html
# 2    example.txt
# Name: 2, dtype: object

# 336-2. Text analysis (extracting the last word)
# Extracted last words:
# 0      fox
# 1      dog
# 2    world
# Name: 2, dtype: object

# 336-3. Splitting compound data (extracting keys and values)
# Extracted keys:
# 0    name
# 1     age
# 2    city
# Name: 0, dtype: object
# Extracted values:
# 0       Alice
# 1          30
# 2    New York
# Name: 2, dtype: object

337. pandas.Series.str.slice method

337-1. Syntax

# 337. pandas.Series.str.slice method
pandas.Series.str.slice(start=None, stop=None, step=None)
Slice substrings from each element in the Series or Index.

Parameters:
start
int, optional
Start position for slice operation.

stop
int, optional
Stop position for slice operation.

step
int, optional
Step size for slice operation.

Returns:
Series or Index of object
Series or Index from sliced substring from original string object.

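337-2. Parameters

337-2-1. start (optional, default None): an integer or None; the index at which the slice begins. Indexing starts at 0. If omitted or None, the slice begins at the start of the string.

337-2-2. stop (optional, default None): an integer or None; the index at which the slice ends. The character at this position is not included. If omitted or None, the slice runs to the end of the string.

337-2-3. step (optional, default None): an integer or None; the step size. None is equivalent to a step of 1. A negative step slices in reverse.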
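
The three parameters follow ordinary Python slice semantics, so negative indices count from the end of each string. A brief sketch with made-up sample data; the comparison with the `.str[...]` indexing shorthand is included only to show the equivalence:

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series(["apple", "banana", "cherry"])

# Last three characters of each string, like text[-3:] in plain Python.
last_three = s.str.slice(start=-3)
# Everything except the final character, like text[:-1].
without_last = s.str.slice(stop=-1)
# The same slice written with the indexing shorthand.
shorthand = s.str[-3:]

print(last_three.tolist())
print(without_last.tolist())
```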

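337-3. Function

Extracts a substring from each string in a Series according to the given index range. The start position, stop position and step give precise control over which part is extracted.

337-4. Return value

Returns a Series or an Index, depending on the type of the input:

The result is a new Series of the same length as the original, holding the substrings selected by start, stop and step.
If start lies at or beyond the end of a string, the result for that element is an empty string.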
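
The out-of-range case can be checked with a short sketch. The sample data is invented, and the missing value is added only to show that missing entries stay missing rather than raising an error:

```python
import pandas as pd

# Hypothetical sample data: a short string, a longer one, and a missing value.
s = pd.Series(["hi", "hello", None])

# start=3 is past the end of "hi", so that element becomes an empty string.
result = s.str.slice(start=3)

print(result)
```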

337-5. Notes

None.

337-6. Usage

337-6-1. Data preparation

None.

337-6-2. Code example

# 337. pandas.Series.str.slice method
import pandas as pd
# Sample data
s = pd.Series(['apple', 'banana', 'cherry', 'date'])
# Slice from index 1 up to, but not including, index 4
result_slice = s.str.slice(start=1, stop=4)
# Specify only a step of 2; the slice runs from beginning to end
result_step = s.str.slice(step=2)
# Reverse slice with a step of -1, from index 4 down to index 1 (index 0 is excluded)
result_reverse = s.str.slice(start=4, stop=0, step=-1)
print("Slice from index 1 to 4:")
print(result_slice)
print("\nEvery second character:")
print(result_step)
print("\nReverse slice:")
print(result_reverse)

337-6-3. Output

# 337. pandas.Series.str.slice method
# Slice from index 1 to 4:
# 0    ppl
# 1    ana
# 2    her
# 3    ate
# dtype: object
#
# Every second character:
# 0    ape
# 1    bnn
# 2    cer
# 3     dt
# dtype: object
#
# Reverse slice:
# 0    elpp
# 1    nana
# 2    rreh
# 3     eta
# dtype: object

338. pandas.Series.str.slice_replace method

338-1. Syntax

# 338. pandas.Series.str.slice_replace method
pandas.Series.str.slice_replace(start=None, stop=None, repl=None)
Replace a positional slice of a string with another value.

Parameters:
start
int, optional
Left index position to use for the slice. If not specified (None), the slice is unbounded on the left, i.e. slice from the start of the string.

stop
int, optional
Right index position to use for the slice. If not specified (None), the slice is unbounded on the right, i.e. slice until the end of the string.

repl
str, optional
String for replacement. If not specified (None), the sliced region is replaced with an empty string.

Returns:
Series or Index
Same type as the original object.

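338-2. Parameters

338-2-1. start (optional, default None): an integer or None; the index at which the slice begins. Indexing starts at 0. If None, the slice begins at the start of the string.

338-2-2. stop (optional, default None): an integer or None; the index at which the slice ends (not included). If None, the slice runs to the end of the string.

338-2-3. repl (optional, default None): a string or None; the string that replaces the content between start and stop. If None, the sliced region is replaced with an empty string, which removes it.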
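
How the defaults behave when one of the parameters is left out can be sketched like this. The sample data and replacement strings are arbitrary choices for illustration:

```python
import pandas as pd

# Hypothetical sample data.
data = pd.Series(["abcdefg", "hijklmn"])

# repl omitted: the slice [2:5] is replaced by "", i.e. removed.
removed = data.str.slice_replace(start=2, stop=5)
# stop omitted: everything from index 4 to the end is replaced.
new_tail = data.str.slice_replace(start=4, repl="-END")
# start omitted: everything before index 3 is replaced.
new_head = data.str.slice_replace(stop=3, repl="#")

print(removed.tolist())
print(new_tail.tolist())
print(new_head.tolist())
```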

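338-3. Function

Replaces part of each string in a Series. You specify where the slice starts (start) and where it ends (stop), and the string (repl) that takes the place of that region. The method does not modify the original Series; it returns a new one containing the replaced strings.

338-4. Return value

Returns a pandas.Series (or an Index when called on an Index) containing the processed strings. Each string in the original is replaced according to the given start, stop and repl parameters.

338-5. Notes

None.

338-6. Usage

338-6-1. Data preparation

None.

338-6-2. Code example

# 338. pandas.Series.str.slice_replace method
import pandas as pd
# Sample data
data = pd.Series(['abcdefg', 'hijklmn', 'opqrstu'])
# Replace the characters at indices 2 to 4 with 'XYZ'
result = data.str.slice_replace(start=2, stop=5, repl='XYZ')
print(result)

338-6-3. Output

# 338. pandas.Series.str.slice_replace method
# 0    abXYZfg
# 1    hiXYZmn
# 2    opXYZtu
# dtype: object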

339. pandas.Series.str.split method

339-1. Syntax

# 339. pandas.Series.str.split method
pandas.Series.str.split(pat=None, *, n=-1, expand=False, regex=None)
Split strings around given separator/delimiter.

Splits the string in the Series/Index from the beginning, at the specified delimiter string.

Parameters:
pat
str or compiled regex, optional
String or regular expression to split on. If not specified, split on whitespace.

n
int, default -1 (all)
Limit number of splits in output. None, 0 and -1 will be interpreted as return all splits.

expand
bool, default False
Expand the split strings into separate columns.

If True, return DataFrame/MultiIndex expanding dimensionality.

If False, return Series/Index, containing lists of strings.

regex
bool, default None
Determines if the passed-in pattern is a regular expression:

If True, assumes the passed-in pattern is a regular expression

If False, treats the pattern as a literal string.

If None and pat length is 1, treats pat as a literal string.

If None and pat length is not 1, treats pat as a regular expression.

Cannot be set to False if pat is a compiled regex

New in version 1.4.0.

Returns:
Series, Index, DataFrame or MultiIndex
Type matches caller unless expand=True (see Notes).

Raises:
ValueError
if regex is False and pat is a compiled regex.

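339-2. Parameters

339-2-1. pat (optional, default None): a string, a compiled regular expression, or None; the separator to split on. If None, strings are split on whitespace (spaces, tabs and so on). If regex=True, pat is interpreted as a regular expression.

339-2-2. n (optional, default -1): an integer or None; the maximum number of splits. -1 (the default), 0 and None all mean no limit, so the string is split at every occurrence of the separator.

339-2-3. expand (optional, default False): a boolean; whether to expand the split result into a DataFrame. If True, a DataFrame is returned with one column per piece. If False (the default), a Series is returned in which each element is a list of the pieces.

339-2-4. regex (optional, default None): a boolean or None; whether pat is treated as a regular expression. If True, pat is always treated as a regular expression; if False, it is treated as a literal string. If None (the default), a pat of length 1 is treated as a literal string and any other pat as a regular expression. regex=False cannot be combined with a compiled regular expression; doing so raises ValueError.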
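
The effect of the regex setting can be sketched with patterns that contain regular-expression metacharacters. The sample strings and patterns below are invented for illustration:

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series(["a.b.c", "1+2=3"])

# regex=None with a single-character pattern: "." is taken literally.
literal_dot = s.str.split(".")
# regex=True: the character class matches either "+" or "=".
by_regex = s.str.split(r"[+=]", regex=True)
# regex=False: a multi-character pattern is forced to be literal.
literal_multi = pd.Series(["a.*b.*c"]).str.split(".*", regex=False)

print(literal_dot.tolist())
print(by_regex.tolist())
print(literal_multi.tolist())
```

Passing regex explicitly is the safer choice whenever the pattern is longer than one character and contains characters such as ".", "*" or "+".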

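339-3. Function

Splits each string into several pieces at the specified separator. You can limit the number of splits and choose whether the result is expanded into multiple columns.

339-4. Return value

If expand=False (the default), a pandas.Series is returned in which each element is a list of the pieces. If expand=True, a pandas.DataFrame is returned in which each column holds one piece.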
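
When the strings split into different numbers of pieces, the expanded DataFrame is as wide as the longest result and the shorter rows are padded with missing values. A small sketch with made-up data, which also shows how n caps the number of columns:

```python
import pandas as pd

# Hypothetical sample data with 3, 2 and 1 comma-separated pieces.
s = pd.Series(["a,b,c", "d,e", "f"])

# One column per piece; short rows are padded with missing values.
wide = s.str.split(",", expand=True)
# At most one split per string, hence at most two columns.
limited = s.str.split(",", n=1, expand=True)

print(wide)
print(limited)
```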

339-5. Notes

None.

339-6. Usage

339-6-1. Data preparation

None.

339-6-2. Code example

# 339. pandas.Series.str.split method
import pandas as pd
# Sample data
data = pd.Series(['a,b,c', 'd,e,f', 'g,h,i'])
# Keep the result as lists and split only once
result1 = data.str.split(",", n=1, expand=False)
# Expand the result into multiple columns
result2 = data.str.split(",", expand=True)
print("Result with expand=False:")
print(result1)
print("\nResult with expand=True:")
print(result2)

339-6-3. Output

# 339. pandas.Series.str.split method
# Result with expand=False:
# 0    [a, b,c]
# 1    [d, e,f]
# 2    [g, h,i]
# dtype: object
#
# Result with expand=True:
#    0  1  2
# 0  a  b  c
# 1  d  e  f
# 2  g  h  i

340. pandas.Series.str.rsplit method

340-1. Syntax

# 340. pandas.Series.str.rsplit method
pandas.Series.str.rsplit(pat=None, *, n=-1, expand=False)
Split strings around given separator/delimiter.

Splits the string in the Series/Index from the end, at the specified delimiter string.

Parameters:
pat
str, optional
String to split on. If not specified, split on whitespace.

n
int, default -1 (all)
Limit number of splits in output. None, 0 and -1 will be interpreted as return all splits.

expand
bool, default False
Expand the split strings into separate columns.

If True, return DataFrame/MultiIndex expanding dimensionality.

If False, return Series/Index, containing lists of strings.

Returns:
Series, Index, DataFrame or MultiIndex
Type matches caller unless expand=True (see Notes).

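340-2. Parameters

340-2-1. pat (optional, default None): a string or None; the separator to split on. If None, strings are split on whitespace (spaces, tabs and so on). Unlike split, rsplit has no regex parameter, and pat is treated as a literal string.

340-2-2. n (optional, default -1): an integer or None; the maximum number of splits, counted from the right. -1 (the default), 0 and None all mean no limit, so the string is split at every occurrence of the separator.

340-2-3. expand (optional, default False): a boolean; whether to expand the split result into a DataFrame. If True, a DataFrame is returned with one column per piece. If False (the default), a Series is returned in which each element is a list of the pieces.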
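
Since rsplit takes no regex argument, a multi-character separator is matched as plain text. A minimal sketch on default object-dtype strings, with an invented separator and sample data:

```python
import pandas as pd

# Hypothetical sample data using "--" as a two-character separator.
s = pd.Series(["1--2--3", "x--y"])

# Split once from the right on the literal text "--".
last_split = s.str.rsplit("--", n=1)
# The same split expanded into two columns.
as_columns = s.str.rsplit("--", n=1, expand=True)

print(last_split.tolist())
print(as_columns)
```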

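340-3. Function

Splits each string into several pieces at the specified separator, working from the right. You can limit the number of splits and choose whether the result is expanded into multiple columns. Unlike split(), rsplit() works from right to left, so the difference shows up mainly when n limits the number of splits. This is especially useful when the part of interest sits at the end of the string in a fixed format.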
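
The contrast with split() is easiest to see when n=1. The sketch below uses made-up file names and pulls out the final extension:

```python
import pandas as pd

# Hypothetical sample data: file names containing more than one dot.
names = pd.Series(["archive.tar.gz", "report.final.pdf"])

# One split from the left: the first dot is the cut point.
from_left = names.str.split(".", n=1)
# One split from the right: the last dot is the cut point.
from_right = names.str.rsplit(".", n=1)
# The last piece of the right-hand split is the final extension.
extensions = from_right.str[-1]

print(from_left.tolist())
print(from_right.tolist())
print(extensions.tolist())
```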

340-4. Return value

If expand=False (the default), a pandas.Series is returned in which each element is a list of the pieces produced by splitting from the right. If expand=True, a pandas.DataFrame is returned in which each column holds one of those pieces.

340-5. Notes

None.

340-6. Usage

340-6-1. Data preparation

None.

340-6-2. Code example

# 340. pandas.Series.str.rsplit method
import pandas as pd
# Sample data
data = pd.Series(['a,b,c', 'd,e,f', 'g,h,i'])
# Keep the result as lists and split only once, starting from the right
result1 = data.str.rsplit(",", n=1, expand=False)
# Expand into multiple columns (with no limit on n, the pieces match those of split here)
result2 = data.str.rsplit(",", expand=True)
print("Result with expand=False:")
print(result1)
print("\nResult with expand=True:")
print(result2)

340-6-3. Output

# 340. pandas.Series.str.rsplit method
# Result with expand=False:
# 0    [a,b, c]
# 1    [d,e, f]
# 2    [g,h, i]
# dtype: object
#
# Result with expand=True:
#    0  1  2
# 0  a  b  c
# 1  d  e  f
# 2  g  h  i