100 Days to Master Python (Data Analysis Series), Day 69: Common Pandas Data Filtering Methods (between, isin, loc, iloc)

news2026/3/13 1:23:23


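Table of Contents

Preface
I. Boolean indexing
II. between()
III. isin()
- 1. Filtering on a single column
- 2. Filtering on multiple columns
- 3. Passing multiple conditions as a dict
- 4. Deleting rows that contain unwanted values
- 5. Implementing "not in"
IV. loc and iloc (important)
- 0. Creating the DataFrame
- 1. Extracting a row
- 2. Extracting a column
- 3. Extracting multiple columns
- 4. Extracting specified rows and columns
- 5. Extracting all data
- 6. Extracting rows that match a condition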

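Preface

🏆🏆About the author: Quality Creator in the Python field, Huawei Cloud Sharing Expert, Alibaba Cloud Expert Blogger, 2021 CSDN Blog Rising Star Top 6

🔥🔥This article is part of the Python full-stack column "100 Days to Master Python: From Beginner to Employment".
📝📝The column is a complete course written for absolute Python beginners. It progresses step by step from 0 to 100, and each topic builds on the previous ones.
🎉🎉Subscribers can read all 100 articles in the series and can message the author to join a Python full-stack group of about two hundred members (hands-on teaching and Q&A). Group members receive 80 GB of Python full-stack tutorial videos plus 300 computer science books covering fundamentals, web development, web scraping, data analysis, visualization, machine learning, deep learning, AI, algorithms, interview questions, and more.
🚀🚀Join me in learning and improving: alone you can go fast, but together we can go farther!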

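When cleaning data for analysis, you often need to filter out or delete certain rows of a DataFrame. This article introduces the most commonly used filtering methods.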

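I. Boolean indexing

Boolean indexing serves two purposes: a comparison such as df['A'] > 1 produces a boolean Series (the test), and passing that Series inside df[...] keeps only the rows where it is True (the filter).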

>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Random data: your numbers will differ from the output below
>>> df = pd.DataFrame(np.random.randn(3, 3), columns=['A', 'B', 'C'])
>>> print(df)
          A         B         C
0 -0.595510 -1.349175 -0.313918
1  1.130604 -2.094348 -0.449182
2  1.745407 -0.136642 -0.943479
>>>
>>> # Boolean test: is each value in column A greater than 1?
>>> print(df['A'] > 1)
0    False
1     True
2     True
Name: A, dtype: bool
>>>
>>> # Boolean filtering: keep the rows where A is greater than 1
>>> print(df[df['A'] > 1])
          A         B         C
1  1.130604 -2.094348 -0.449182
2  1.745407 -0.136642 -0.943479
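Several conditions can be combined into one mask. The sketch below uses small fixed values (hypothetical, chosen only so the result is reproducible) instead of random numbers, and joins two comparisons with &. The parentheses are required because & and | bind more tightly than > and <.

```python
import pandas as pd

# Hypothetical fixed values standing in for the random data above
df = pd.DataFrame({'A': [-0.6, 1.1, 1.7], 'B': [-1.3, -2.1, -0.1]})

# Wrap each comparison in parentheses before joining them with & (AND)
mask = (df['A'] > 1) & (df['B'] > -1)
result = df[mask]
print(result)
```

Only the last row satisfies both conditions; use | instead of & to keep rows that satisfy either one.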

二、between()

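between(left, right) returns a boolean Series that is True where the value lies in the interval from left to right. Both endpoints are included by default. Put it inside df[...] to select the matching rows.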

>>> import pandas as pd
>>>
>>> data = {'name': ['小红', '小明', '小白', '小黑'], 'age': [10, 20, 30, 25]}
>>> df = pd.DataFrame(data)
>>> print(df)
  name  age
0   小红   10
1   小明   20
2   小白   30
3   小黑   25
>>>
>>> # Is each age between 20 and 30 (inclusive)?
>>> print(df['age'].between(20, 30))
0    False
1     True
2     True
3     True
Name: age, dtype: bool
>>> # Select the rows whose age is between 20 and 30
>>> print(df[df['age'].between(20, 30)])
  name  age
1   小明   20
2   小白   30
3   小黑   25
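The endpoint behaviour can be changed with the inclusive parameter. This sketch assumes pandas 1.3 or later, where inclusive accepts the strings 'both', 'neither', 'left' and 'right' (older versions took True/False). It also shows that between() is equivalent to two comparisons joined with &.

```python
import pandas as pd

df = pd.DataFrame({'name': ['小红', '小明', '小白', '小黑'], 'age': [10, 20, 30, 25]})

# Default (inclusive='both'): 20 <= age <= 30
both = df[df['age'].between(20, 30)]
# Exclude both endpoints: 20 < age < 30
neither = df[df['age'].between(20, 30, inclusive='neither')]
# Include only the left endpoint: 20 <= age < 30
left = df[df['age'].between(20, 30, inclusive='left')]

# between(20, 30) is shorthand for two comparisons joined with &
manual = df[(df['age'] >= 20) & (df['age'] <= 30)]
print(neither)
```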

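III. isin()

isin() takes a collection of values (usually a list) and checks each element against all of them at once: the result is True if the element equals any of the values, and False otherwise.

Create the DataFrame: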

>>> import pandas as pd
>>> import numpy as np
>>>
>>> data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
...         ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
...         ['bar', 'two', 'large', 50]]
>>> df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])
>>> print(df)
     A    B      C   D
0  foo  one  small   1
1  foo  one  large   5
2  bar  one  small  10
3  bar  two  samll  10
4  bar  two  large  50

1. Filtering on a single column

df[df[column].isin([values])]

>>> # 1. One value: is each value in column A equal to 'foo'?
>>> df['A'].isin(['foo'])
0     True
1     True
2    False
3    False
4    False
Name: A, dtype: bool
>>>
>>> # 2. Several values: is each value in column A either 'foo' or 'bar'?
>>> df['A'].isin(['foo','bar'])
0    True
1    True
2    True
3    True
4    True
Name: A, dtype: bool
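isin() is a compact way of writing several == comparisons joined with |. The sketch below rebuilds only columns A and D of the sample data, passes a set instead of a list (the value 'baz' is hypothetical and simply matches nothing), and applies isin() to a numeric column.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar', 'bar'],
                   'D': [1, 5, 10, 10, 50]})

# A set works as the collection of values; 'baz' matches no row
by_isin = df['A'].isin({'foo', 'baz'})
# The same mask written as == comparisons joined with |
by_eq = (df['A'] == 'foo') | (df['A'] == 'baz')

# isin() is not limited to strings: numeric columns work the same way
numeric = df[df['D'].isin([5, 50])]
print(numeric)
```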

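2. Filtering on multiple columns

Join conditions with & when all of them must hold, and with | when at least one must hold. If a condition is a comparison such as df['D'] > 5, wrap it in parentheses, because & and | bind more tightly than comparison operators.

Rows where every listed column matches: df[df[col1].isin([values]) & df[col2].isin([values])]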

>>> # Rows where A equals 'bar' AND B equals 'one'
>>> df[df['A'].isin(['bar']) & df['B'].isin(['one'])]
     A    B      C   D
2  bar  one  small  10

Rows where at least one listed column matches: df[df[col1].isin([values]) | df[col2].isin([values])]

>>> # Rows where A equals 'bar' OR B equals 'one'
>>> df[df['A'].isin(['bar']) | df['B'].isin(['one'])]
     A    B      C   D
0  foo  one  small   1
1  foo  one  large   5
2  bar  one  small  10
3  bar  two  samll  10
4  bar  two  large  50
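The same AND/OR filters can also be written as a query string with DataFrame.query(), which supports and, or, == and in. This is an alternative to the mask style above, not something the mask style requires; the sketch rebuilds the sample DataFrame so it runs on its own.

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# Equivalent to df[df['A'].isin(['bar']) & df['B'].isin(['one'])]
and_rows = df.query("A == 'bar' and B == 'one'")
# Equivalent to df[df['A'].isin(['bar']) | df['B'].isin(['one'])]
or_rows = df.query("A == 'bar' or B == 'one'")
# 'in' inside a query string plays the role of isin()
in_rows = df.query("A in ['bar'] and B in ['one', 'two']")
print(in_rows)
```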

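3. Passing multiple conditions as a dict

{'column': [values], 'column': [values], ...}

Called on the whole DataFrame, df.isin(dict) returns a boolean DataFrame of the same shape (columns missing from the dict are all False). Indexing with it keeps the original value where the mask is True and shows NaN in every position that does not match: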

>>> df[df.isin({'A':['bar'],'C':['small']})]
     A    B      C   D
0  NaN  NaN  small NaN
1  NaN  NaN    NaN NaN
2  bar  NaN  small NaN
3  bar  NaN    NaN NaN
4  bar  NaN    NaN NaN
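To turn the dict form into a row filter rather than a NaN-masked table, the boolean DataFrame can be reduced per row with .all(axis=1) or .any(axis=1). A minimal sketch, assuming we only want to test the columns named in the dict (the variable name conditions is illustrative):

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

conditions = {'A': ['bar'], 'C': ['small']}
# Restrict to the dict's columns first: columns missing from the dict are all False,
# which would make .all(axis=1) False for every row
matched = df[list(conditions)].isin(conditions)

rows_all = df[matched.all(axis=1)]   # every listed column matches
rows_any = df[matched.any(axis=1)]   # at least one listed column matches
print(rows_all)
```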

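4. Deleting rows that contain unwanted values

Called on a single column, isin() returns a boolean Series: True for values in the list, False for all others. To drop the matching rows, invert that mask. One way is to XOR it with True, since True ^ True is False and True ^ False is True.

>>> # Delete the rows where column A is 'foo'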
>>> df[True^df['A'].isin(['foo'])]
     A    B      C   D
2  bar  one  small  10
3  bar  two  samll  10
4  bar  two  large  50
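Two other ways to get the same result are sketched below: inverting the mask with ~ (the usual idiom), or looking up the index labels of the matching rows and passing them to DataFrame.drop(). The drop() route is handy when you want a new DataFrame without those labels rather than a filtered view of the old one.

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

foo_mask = df['A'].isin(['foo'])

# Keep every row whose mask value is False
kept_by_mask = df[~foo_mask]
# Drop the rows by their index labels instead
kept_by_drop = df.drop(index=df[foo_mask].index)
print(kept_by_drop)
```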

5. Implementing "not in"

pandas has no isnotin() method. Instead, put ~ (logical NOT) in front of the isin() mask.

>>> # Delete the rows where column A is 'foo'
>>> df[~df['A'].isin(['foo'])]
     A    B      C   D
2  bar  one  small  10
3  bar  two  samll  10
4  bar  two  large  50
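A practical use of "not in" is spotting values outside an allowed set. In the sample data, row 3 of column C holds 'samll' rather than 'small', so checking C against a list of allowed values (a hypothetical whitelist chosen for this sketch) isolates exactly that row.

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# Hypothetical list of valid values for column C
allowed = ['small', 'large']
# ~ inverts the mask: keep rows whose C value is NOT in the allowed list
invalid = df[~df['C'].isin(allowed)]
print(invalid)
```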

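IV. loc and iloc (important)

loc and iloc are both indexers, used with square brackets rather than called like functions. The difference between them:

loc selects data by label (index names and column names); a slice such as 'a':'c' includes the end label.
iloc selects data by integer position (0-based row and column numbers); a slice such as 0:3 excludes the end position, as with Python lists.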
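The slicing difference is easiest to see side by side. This sketch uses a small one-column DataFrame (the data is only for illustration) and also shows the common pitfall with the default 0..n-1 integer index, where labels look like positions but loc still includes the end label.

```python
import pandas as pd

df = pd.DataFrame({'D': [1, 5, 10, 10, 50]}, index=['a', 'b', 'c', 'd', 'e'])

by_label = df.loc['a':'c']   # label slice: the end label 'c' is included -> 3 rows
by_pos = df.iloc[0:2]        # position slice: position 2 is excluded -> 2 rows

# With the default integer index, labels coincide with positions,
# yet loc[0:2] still includes label 2 while iloc[0:2] stops before position 2
num = df.reset_index(drop=True)
num_loc = num.loc[0:2]       # labels 0, 1, 2
num_iloc = num.iloc[0:2]     # positions 0, 1
print(len(num_loc), len(num_iloc))
```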

0. Creating the DataFrame

>>> import pandas as pd
>>>
>>> data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
...         ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
...         ['bar', 'two', 'large', 50]]
>>> df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd', 'e'])
>>> print(df)
     A    B      C   D
a  foo  one  small   1
b  foo  one  large   5
c  bar  one  small  10
d  bar  two  samll  10
e  bar  two  large  50

1. Extracting a row

>>> # loc: the row whose index label is 'a' (the first row)
>>> df.loc['a']
A      foo
B      one
C    small
D        1
Name: a, dtype: object
>>>
>>> # iloc: the row at position 0 (the first row, label 'a')
>>> df.iloc[0]
A      foo
B      one
C    small
D        1
Name: a, dtype: object
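Adding a column to the row selector narrows the result to a single value. Besides loc and iloc, pandas provides .at (by label) and .iat (by position), which only accept a single row and a single column. The sketch rebuilds the labelled DataFrame and reads the value in row 'a', column 'D' four ways.

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd', 'e'])

v_loc = df.loc['a', 'D']   # row label 'a', column label 'D'
v_iloc = df.iloc[0, 3]     # row position 0, column position 3
v_at = df.at['a', 'D']     # label-based, single value only
v_iat = df.iat[0, 3]       # position-based, single value only
print(v_loc, v_iloc, v_at, v_iat)
```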

2. Extracting a column

>>> # loc: column A, all rows
>>> df.loc[:, ['A']]
     A
a  foo
b  foo
c  bar
d  bar
e  bar
>>>
>>> # iloc: column position 0 (A), all rows
>>> df.iloc[:, [0]]
     A
a  foo
b  foo
c  bar
d  bar
e  bar
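Note that the examples above pass a list (['A'] and [0]), which returns a one-column DataFrame. Passing a single label instead returns a Series, and df['A'] is the usual shorthand for that. The sketch below checks the resulting types.

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd', 'e'])

col_series = df.loc[:, 'A']    # single label -> Series
col_frame = df.loc[:, ['A']]   # list with one label -> one-column DataFrame
col_short = df['A']            # shorthand for a single column (a Series)
print(type(col_series).__name__, type(col_frame).__name__)
```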

3. Extracting multiple columns

(1) Consecutive columns:

>>> # loc: columns A, B and C, all rows
>>> df.loc[:, ['A', 'B', 'C']]
     A    B      C
a  foo  one  small
b  foo  one  large
c  bar  one  small
d  bar  two  samll
e  bar  two  large
>>>
>>> # iloc: column positions 0 to 2 (A, B, C), all rows; end position 3 is excluded
>>> df.iloc[:, 0:3]
     A    B      C
a  foo  one  small
b  foo  one  large
c  bar  one  small
d  bar  two  samll
e  bar  two  large
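For consecutive columns, loc can also take a label slice instead of listing every name. As described at the start of this section, a label slice includes its end label, so 'A':'C' matches iloc's 0:3.

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd', 'e'])

by_label = df.loc[:, 'A':'C']   # label slice: end label 'C' included
by_pos = df.iloc[:, 0:3]        # position slice: position 3 excluded
print(by_label)
```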

(2) Non-consecutive columns

>>> # loc: columns A and D, all rows
>>> df.loc[:, ['A', 'D']]
     A   D
a  foo   1
b  foo   5
c  bar  10
d  bar  10
e  bar  50
>>>
>>> # iloc: column positions 0 and 3 (A and D), all rows
>>> df.iloc[:, [0, 3]]
     A   D
a  foo   1
b  foo   5
c  bar  10
d  bar  10
e  bar  50
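When you know the column names but need positions for iloc (for example, to mix them with positional row selection), Index.get_indexer() converts a list of names into their integer positions. A short sketch:

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd', 'e'])

# Convert column names to integer positions, then select by position
positions = df.columns.get_indexer(['A', 'D'])
subset = df.iloc[:, positions]
print(positions)
```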

4. Extracting specified rows and columns

>>> # loc: rows labelled 'a' and 'd', columns labelled 'A' and 'D'
>>> df.loc[['a', 'd'], ['A', 'D']]
     A   D
a  foo   1
d  bar  10
>>>
>>> # iloc: row positions 0 and 3, column positions 0 and 3
>>> df.iloc[[0, 3], [0, 3]]
     A   D
a  foo   1
d  bar  10
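Both axes also accept slices at the same time. The sketch below selects the block of rows b to d and columns B to C, once by label (end labels included) and once by position (end positions excluded), and checks that the two agree.

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd', 'e'])

by_label = df.loc['b':'d', 'B':'C']   # rows b..d and columns B..C, both ends included
by_pos = df.iloc[1:4, 1:3]            # rows 1..3 and columns 1..2, end positions excluded
print(by_label)
```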

5. Extracting all data

>>> # loc: all rows and all columns
>>> df.loc[:,:]
     A    B      C   D
a  foo  one  small   1
b  foo  one  large   5
c  bar  one  small  10
d  bar  two  samll  10
e  bar  two  large  50
>>>
>>> # iloc: all rows and all columns
>>> df.iloc[:,:]
     A    B      C   D
a  foo  one  small   1
b  foo  one  large   5
c  bar  one  small  10
d  bar  two  samll  10
e  bar  two  large  50

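6. Extracting rows that match a condition

loc also accepts a boolean Series as the row selector, so it can filter rows by value:

>>> # loc: rows where column A is 'foo'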
>>> df.loc[df['A'] == 'foo']
     A    B      C  D
a  foo  one  small  1
b  foo  one  large  5
>>>
>>> # loc: rows where D is greater than or equal to 10
>>> df.loc[df['D'] >= 10]
     A    B      C   D
c  bar  one  small  10
d  bar  two  samll  10
e  bar  two  large  50
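Because loc takes a row selector and a column selector together, a boolean mask can be combined with a column list in one step, and the same form works for assignment. iloc is positional, so this sketch converts the mask to a plain NumPy boolean array with to_numpy() before passing it. The copy named updated is only there so the original df stays unchanged.

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd', 'e'])

mask = df['D'] >= 10
# Filter rows and pick columns in one call
subset_loc = df.loc[mask, ['A', 'D']]
# iloc version: positional columns, mask passed as a plain boolean array
subset_iloc = df.iloc[mask.to_numpy(), [0, 3]]

# Conditional assignment: set D to 0 in the rows where A is 'foo'
updated = df.copy()
updated.loc[updated['A'] == 'foo', 'D'] = 0
print(updated)
```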