pandas|判断是否包含|contains|isin

news2025/4/9 4:46:13

文章目录

1. 方法简介
- 1.1 pandas.Series.str.contains
- 1.2 pandas.DataFrame.isin
2. 示例1
3. 示例2
4. 相关文章
- (1) pandas分组聚合|agg|transform|apply
- (2) 缺省值判断 pd.isnull, pd.isna, pd.notna, pd.notnull, np.isnan, math.isnan 区别
- (3) pandas中DataFrame字典互转
- (4) pandas.concat实现DataFrame竖着拼接、横着拼接
- (5) pandas|找出某列最大值的所在的行
- (6) DataFrame——指定位置增加删除一行一列
- (7) AttributeError: module ‘pandas‘ has no attribute ‘isna‘
- (8) pandas--Series.str--字符串处理
- (9) list、ndarry、Series、DataFrame的创建、索引和选取
- (10) Series和DataFrame复合索引的创建和取值
- (11) pd.notnull
- (12) Pandas|DataFrame| 处理DataFrame中的inf值
- (13) 由字典dictionary或列表list创建dataframe
- (14) pandas|DataFrame排序及分组排序
- (15) Pandas|DataFrame| DataFrame中的nan值处理
- (16) pandas|判断是否包含|contains|isin

1. 方法简介

1.1 pandas.Series.str.contains

Series.str.contains（pat，case = True，flags = 0，na = nan，regex = True)

函数作用
测试pattern或regex是否包含在Series或Index的字符串中。
返回布尔值系列或索引，具体取决于给定模式或正则表达式是否包含在系列或索引的字符串中。
pat ： str类型
字符序列或正则表达式。
case ： bool，默认为True
如果为True，区分大小写。
flags ： int，默认为0（无标志）
标志传递到re模块，例如re.IGNORECASE。
na ：默认NaN
填写缺失值的值。
na = True 就表示把有NAN的转换为布尔值True
na = False 就表示把有NAN的转换为布尔值False
regex ： bool，默认为True
如果为True，则假定pat是正则表达式。
如果为False，则将pat视为文字字符串。
返回：
布尔值的系列或索引
布尔值的Series或Index，指示给定模式是否包含在Series或Index的每个元素的字符串中。

1.2 pandas.DataFrame.isin

pd.isin

函数作用
Pandas isin()方法用于过滤数据帧。isin()方法有助于选择在特定列中具有特定(或多个)值的行。
用法：DataFrame.isin(values)
参数：
values:Iterable，Series，List，Tuple，DataFrame或字典以检入调用方的Series /Data Frame。
返回类型：维度布尔值的DataFrame。

2. 示例1

# Returning a Series of booleans using only a literal pattern.
s1 = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.nan])
s1.str.contains('og', regex=False)

0    False
1     True
2    False
3    False
4      NaN
dtype: object

#Returning an Index of booleans using only a literal pattern.
ind = pd.Index(['Mouse', 'dog', 'house and parrot', '23.0', np.nan])
ind.str.contains('23', regex=False)

Index([False, False, False, True, nan], dtype='object')

# case区分大小写，regex使用正则
s1.str.contains('oG', case=True, regex=True)

0    False
1    False
2    False
3    False
4      NaN
dtype: object

# nan值判定为False
s1.str.contains('og', na=False, regex=True)

0    False
1     True
2    False
3    False
4    False
dtype: bool

# 使用正则
s1.str.contains('house|dog', regex=True)

0    False
1     True
2     True
3    False
4      NaN
dtype: object

# Ignoring case sensitivity using flags with regex.
import re
s1.str.contains('PARROT', flags=re.IGNORECASE, regex=True)

0    False
1    False
2     True
3    False
4      NaN
dtype: object

3. 示例2

df = pd.DataFrame({'name': ['悟空','唐僧','八戒','沙僧','白龙马'], 
                   'age': [1000,33,800,700,500],
                  '兵器': ['金箍棒',np.nan, '九齿钉耙','降妖宝杖','龙泉宝剑'],
                  '力量':[100.0, 10.0, 60.0, 50.0, np.nan]}, 
                  index=['a', 'b', 'c', 'd','e'])
df

	name	age	兵器	力量
a	悟空	1000	金箍棒	100.0
b	唐僧	33	NaN	10.0
c	八戒	800	九齿钉耙	60.0
d	沙僧	700	降妖宝杖	50.0
e	白龙马	500	龙泉宝剑	NaN

# 当值为列表时，检查DataFrame中的每个值是否都存在于列表中
df.isin([1000, 33])

	name	age	兵器	力量
a	False	True	False	False
b	False	True	False	False
c	False	False	False	False
d	False	False	False	False
e	False	False	False	False

~df.isin([1000, 33])

	name	age	兵器	力量
a	True	False	True	True
b	True	False	True	True
c	True	True	True	True
d	True	True	True	True
e	True	True	True	True

# 当值是dict时，我们可以分别传递值来检查每列
df.isin({'兵器': ['金箍棒', '九齿钉耙']})

	name	age	兵器	力量
a	False	False	True	False
b	False	False	False	False
c	False	False	True	False
d	False	False	False	False
e	False	False	False	False

# 当值为Series或DataFrame时，索引和列必须匹配。
other = pd.DataFrame({'name': ['悟空', '-'], 'age': [1000, 33], '兵器': ['金箍棒', '-'], '力量': [100.0, 10.0]}, index=['a', 'b'])
df.isin(other)

	name	age	兵器	力量
a	True	True	True	True
b	False	True	False	True
c	False	False	False	False
d	False	False	False	False
e	False	False	False	False

names = ['悟空', '八戒', '沙僧']
names = '|'.join(names)
names

'悟空|八戒|沙僧'

df['name'].str.contains(names, regex=True)

a     True
b    False
c     True
d     True
e    False
Name: name, dtype: bool

df[df['name'].str.contains(names, regex=True)]

	name	age	兵器	力量
a	悟空	1000	金箍棒	100.0
c	八戒	800	九齿钉耙	60.0
d	沙僧	700	降妖宝杖	50.0

4. 相关文章

(1) pandas分组聚合|agg|transform|apply

pandas分组聚合|agg|transform|apply

(2) 缺省值判断 pd.isnull, pd.isna, pd.notna, pd.notnull, np.isnan, math.isnan 区别

缺省值判断 pd.isnull, pd.isna, pd.notna, pd.notnull, np.isnan, math.isnan 区别

(3) pandas中DataFrame字典互转

pandas中DataFrame字典互转

(4) pandas.concat实现DataFrame竖着拼接、横着拼接

pandas.concat实现DataFrame竖着拼接、横着拼接

(5) pandas|找出某列最大值的所在的行

pandas|找出某列最大值的所在的行

(6) DataFrame——指定位置增加删除一行一列

DataFrame——指定位置增加删除一行一列

(7) AttributeError: module ‘pandas‘ has no attribute ‘isna‘

AttributeError: module ‘pandas‘ has no attribute ‘isna‘

(8) pandas–Series.str–字符串处理

pandas–Series.str–字符串处理

(9) list、ndarry、Series、DataFrame的创建、索引和选取

list、ndarry、Series、DataFrame的创建、索引和选取

(10) Series和DataFrame复合索引的创建和取值

Series和DataFrame复合索引的创建和取值

(11) pd.notnull

pd.notnull

(12) Pandas|DataFrame| 处理DataFrame中的inf值

Pandas|DataFrame| 处理DataFrame中的inf值

(13) 由字典dictionary或列表list创建dataframe

由字典dictionary或列表list创建dataframe

(14) pandas|DataFrame排序及分组排序

pandas|DataFrame排序及分组排序

(15) Pandas|DataFrame| DataFrame中的nan值处理

Pandas|DataFrame| DataFrame中的nan值处理

(16) pandas|判断是否包含|contains|isin

pandas|判断是否包含|contains|isin

本文来自互联网用户投稿，该文观点仅代表作者本人，不代表本站立场。本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如若转载，请注明出处：http://www.coloradmin.cn/o/958246.html

如若内容造成侵权/违法违规/事实不符，请联系多彩编程网进行投诉反馈，一经查实，立即删除！