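Python Cool Library Tour: Third-Party Library Pandas (095)

I. Usage in Detail

406. The pandas.DataFrame.index attribute

406-1. Syntax

406-2. Parameters

406-3. Purpose

406-4. Return value

406-5. Notes

406-6. Usage

406-6-1. Data preparation

406-6-2. Code example

406-6-3. Output

407. The pandas.DataFrame.columns attribute

407-1. Syntax

407-2. Parameters

407-3. Purpose

407-4. Return value

407-5. Notes

407-6. Usage

407-6-1. Data preparation

407-6-2. Code example

407-6-3. Output

408. The pandas.DataFrame.dtypes attribute

408-1. Syntax

408-2. Parameters

408-3. Purpose

408-4. Return value

408-5. Notes

408-6. Usage

408-6-1. Data preparation

408-6-2. Code example

408-6-3. Output

409. The pandas.DataFrame.info method

409-1. Syntax

409-2. Parameters

409-3. Purpose

409-4. Return value

409-5. Notes

409-6. Usage

409-6-1. Data preparation

409-6-2. Code example

409-6-3. Output

410. The pandas.DataFrame.select_dtypes method

410-1. Syntax

410-2. Parameters

410-3. Purpose

410-4. Return value

410-5. Notes

410-6. Usage

410-6-1. Data preparation

410-6-2. Code example

410-6-3. Output

II. Recommended Reading

1. Python Foundations Tour

2. Python Functions Tour

3. Python Algorithms Tour

4. Python Magic Tour

5. Blog home page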

I. Usage in Detail

406. The pandas.DataFrame.index attribute

406-1. Syntax

# 406. The pandas.DataFrame.index attribute
pandas.DataFrame.index
The index (row labels) of the DataFrame.

The index of a DataFrame is a series of labels that identify each row. The labels can be integers, strings, or any other hashable type. The index is used for label-based access and alignment, and can be accessed or modified using this attribute.

Returns:
pandas.Index
The index labels of the DataFrame.

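406-2. Parameters

None

406-3. Purpose

Gets or sets the index of a DataFrame. The index defines the row labels, which pandas uses for label-based lookup and for aligning data.

406-4. Return value

Returns an Index object holding the DataFrame's row labels. The labels can be of various types, such as integers, strings, or timestamps.

406-5. Notes

None

406-6. Usage

406-6-1. Data preparation

None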

406-6-2. Code example

# 406. The pandas.DataFrame.index attribute
import pandas as pd
# Create a DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Access the index
index = df.index
print("Index:", index)
# Make column 'A' the index
df.set_index('A', inplace=True)
print("New index:", df.index)
# Reset to the default integer index
df.reset_index(inplace=True)
print("Index after reset:", df.index)

406-6-3. Output

# 406. The pandas.DataFrame.index attribute
# Index: RangeIndex(start=0, stop=3, step=1)
# New index: Index([1, 2, 3], dtype='int64', name='A')
# Index after reset: RangeIndex(start=0, stop=3, step=1)
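
The index can also be assigned directly, and it drives both label-based access and alignment. The following is a minimal sketch of those two behaviors; the labels 'x', 'y', 'z' and the Series named other are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Assign row labels directly; the list length must match the number of rows.
df.index = ['x', 'y', 'z']

# Label-based access looks rows up by index label.
row_y = df.loc['y']
print(row_y['B'])

# Arithmetic aligns on index labels, not on position.
other = pd.Series([10, 20, 30], index=['z', 'y', 'x'])
total = df['A'] + other
print(total['x'], total['z'])
```

Because the addition matches labels rather than positions, the row labelled 'x' is paired with the last element of other, not the first.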

407. The pandas.DataFrame.columns attribute

407-1. Syntax

# 407. The pandas.DataFrame.columns attribute
pandas.DataFrame.columns
The column labels of the DataFrame.

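407-2. Parameters

None

407-3. Purpose

Gets or sets the column labels (column names) of a DataFrame.

407-4. Return value

Returns an Index object holding the DataFrame's column labels. The labels can be of various types, such as strings or dates.

407-5. Notes

None

407-6. Usage

407-6-1. Data preparation

None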

407-6-2. Code example

# 407. The pandas.DataFrame.columns attribute
import pandas as pd
# Create a DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Access the column names
columns = df.columns
print("Column names:", columns)
# Assign new column names
df.columns = ['Column1', 'Column2']
print("New column names:", df.columns)

407-6-3. Output

# 407. The pandas.DataFrame.columns attribute
# Column names: Index(['A', 'B'], dtype='object')
# New column names: Index(['Column1', 'Column2'], dtype='object')
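
Assigning to columns replaces every label at once, so the new list must have one entry per column. As a rough sketch of the alternatives, the example below renames a single column with DataFrame.rename and shows that a list of the wrong length is rejected; the names 'alpha' and 'only_one' are placeholders.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# rename() maps old labels to new ones and returns a new DataFrame.
renamed = df.rename(columns={'A': 'alpha'})
print(list(renamed.columns))
print(list(df.columns))  # the original labels are untouched

# Assigning a list with the wrong number of labels raises ValueError.
raised = False
try:
    df.columns = ['only_one']
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```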

408. The pandas.DataFrame.dtypes attribute

408-1. Syntax

# 408. The pandas.DataFrame.dtypes attribute
pandas.DataFrame.dtypes
Return the dtypes in the DataFrame.

This returns a Series with the data type of each column. The result’s index is the original DataFrame’s columns. Columns with mixed types are stored with the object dtype. See the User Guide for more.

Returns:
pandas.Series
The data type of each column.

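408-2. Parameters

None

408-3. Purpose

Gets the data type of every column in a DataFrame. The attribute is a Series whose index is the column names and whose values are the corresponding column dtypes.

408-4. Return value

Returns a Series indexed by column name. Each value is that column's dtype, for example int64, float64, object, or datetime64[ns].

408-5. Notes

None

408-6. Usage

408-6-1. Data preparation

None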

408-6-2. Code example

# 408. The pandas.DataFrame.dtypes attribute
import pandas as pd
# Create a DataFrame
data = {
    'A': [1, 2, 3],
    'B': [4.5, 5.5, 6.5],
    'C': ['foo', 'bar', 'baz'],
    'D': pd.date_range('20240101', periods=3)
}
df = pd.DataFrame(data)
# Inspect the data type of each column
dtypes = df.dtypes
print("Data type of each column:")
print(dtypes)

408-6-3. Output

# 408. The pandas.DataFrame.dtypes attribute
# Data type of each column:
# A             int64
# B           float64
# C            object
# D    datetime64[ns]
# dtype: object
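
The quoted documentation notes that columns with mixed types are stored with the object dtype. A small sketch of that case follows, together with astype, which returns a converted copy; the column names 'mixed' and 'ints' are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'mixed': [1, 'two', 3.0],  # int, str and float values in one column
    'ints': [1, 2, 3],
})

# A column holding values of several Python types falls back to object.
print(df.dtypes['mixed'])

# astype() returns a converted copy; df itself keeps its dtypes.
converted = df.astype({'ints': 'float64'})
print(converted.dtypes['ints'])
print(df.dtypes['ints'])
```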

409. The pandas.DataFrame.info method

409-1. Syntax

# 409. The pandas.DataFrame.info method
pandas.DataFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None)
Print a concise summary of a DataFrame.

This method prints information about a DataFrame including the index dtype and columns, non-null values and memory usage.

Parameters:
verbose : bool, optional
Whether to print the full summary. By default, the setting in pandas.options.display.max_info_columns is followed.

buf : writable buffer, defaults to sys.stdout
Where to send the output. By default, the output is printed to sys.stdout. Pass a writable buffer if you need to further process the output.

max_cols : int, optional
When to switch from the verbose to the truncated output. If the DataFrame has more than max_cols columns, the truncated output is used. By default, the setting in pandas.options.display.max_info_columns is used.

memory_usage : bool, str, optional
Specifies whether total memory usage of the DataFrame elements (including the index) should be displayed. By default, this follows the pandas.options.display.memory_usage setting.

True always shows memory usage. False never shows memory usage. A value of ‘deep’ is equivalent to “True with deep introspection”. Memory usage is shown in human-readable units (base-2 representation). Without deep introspection, a memory estimate is made based on column dtype and number of rows, assuming values consume the same amount of memory for corresponding dtypes. With deep memory introspection, a real memory usage calculation is performed at the cost of computational resources. See the Frequently Asked Questions for more details.

show_counts : bool, optional
Whether to show the non-null counts. By default, this is shown only if the DataFrame is smaller than pandas.options.display.max_info_rows and pandas.options.display.max_info_columns. A value of True always shows the counts, and False never shows the counts.

Returns:
None
This method prints a summary of a DataFrame and returns None.

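409-2. Parameters

409-2-1. verbose (optional, default None): bool. Whether to print the full summary. True prints one line per column; False prints a truncated summary without the per-column table; None follows the pandas.options.display.max_info_columns setting.

409-2-2. buf (optional, default None): a writable buffer. If given, the output is written to it instead of sys.stdout, which is useful for redirecting the summary to a file or processing it further.

409-2-3. max_cols (optional, default None): int. The column count at which the output switches from verbose to truncated: if the DataFrame has more than max_cols columns, the truncated output is used. None falls back to pandas.options.display.max_info_columns.

409-2-4. memory_usage (optional, default None): bool or 'deep'. True always shows memory usage and False never does. Without deep introspection the figure is an estimate based on column dtypes and the number of rows; 'deep' computes the real usage, including the contents of object-dtype columns such as strings, at extra computational cost. None follows the pandas.options.display.memory_usage setting.

409-2-5. show_counts (optional, default None): bool. True always shows the non-null counts and False never does. With None, the counts are shown only if the DataFrame is smaller than pandas.options.display.max_info_rows and pandas.options.display.max_info_columns.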
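
To see how verbose and max_cols change the summary, the sketch below captures the output as text and compares it. The helper info_text is hypothetical, written only for this illustration; it relies on the documented buf parameter.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None],
    'B': [4.5, None, 6.5],
    'C': ['foo', 'bar', 'baz'],
})

def info_text(frame, **kwargs):
    """Hypothetical helper: run frame.info() and return its output as a string."""
    buf = io.StringIO()
    frame.info(buf=buf, **kwargs)
    return buf.getvalue()

full = info_text(df, verbose=True)     # per-column table
short = info_text(df, verbose=False)   # truncated summary
capped = info_text(df, max_cols=2)     # 3 columns > max_cols, so truncated too

print('Non-Null Count' in full)
print('Non-Null Count' in short)
print('Non-Null Count' in capped)
```

The truncated form replaces the per-column table with a single line summarizing the column range.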

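409-3. Purpose

Gives a quick overview of a DataFrame, including the index, column dtypes, non-null counts, and memory usage, which helps with initial data exploration.

409-4. Return value

Returns None. The summary is printed to sys.stdout, or written to buf if one is supplied.

409-5. Notes

None

409-6. Usage

409-6-1. Data preparation

None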

409-6-2. Code example

# 409. The pandas.DataFrame.info method
import pandas as pd
# Create a DataFrame with some missing values
data = {
    'A': [1, 2, None],
    'B': [4.5, None, 6.5],
    'C': ['foo', 'bar', 'baz']
}
df = pd.DataFrame(data)
# Print a summary of the DataFrame with info()
df.info(verbose=True, memory_usage=True, show_counts=True)

409-6-3. Output

# 409. The pandas.DataFrame.info method
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 3 entries, 0 to 2
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   A       2 non-null      float64
#  1   B       2 non-null      float64
#  2   C       3 non-null      object
# dtypes: float64(2), object(1)
# memory usage: 204.0+ bytes
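
Since info() prints rather than returns its summary, the text has to be captured through buf if it is needed later. The following sketch assumes an in-memory io.StringIO buffer and also requests memory_usage='deep'; the variable names are illustrative only.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None],
    'B': [4.5, None, 6.5],
    'C': ['foo', 'bar', 'baz'],
})

# Send the summary to an in-memory buffer instead of sys.stdout.
buffer = io.StringIO()
result = df.info(buf=buffer, memory_usage='deep')
summary = buffer.getvalue()

print(result)  # info() itself returns None

# The last line of the summary reports memory usage.
memory_line = summary.splitlines()[-1]
print(memory_line)
```

With 'deep', the reported figure is a real measurement, so the line carries no trailing "+" qualifier.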

410. The pandas.DataFrame.select_dtypes method

410-1. Syntax

# 410. The pandas.DataFrame.select_dtypes method
pandas.DataFrame.select_dtypes(include=None, exclude=None)
Return a subset of the DataFrame’s columns based on the column dtypes.

Parameters:
include, exclude : scalar or list-like
A selection of dtypes or strings to be included/excluded. At least one of these parameters must be supplied.

Returns:
DataFrame
The subset of the frame including the dtypes in include and excluding the dtypes in exclude.

Raises:
ValueError
If both of include and exclude are empty

If include and exclude have overlapping elements

If any kind of string dtype is passed in.

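410-2. Parameters

410-2-1. include (optional, default None): a scalar or list-like of dtypes or dtype strings to include. Common values are:

float: floating-point columns
int: integer columns
object: object-dtype columns (usually strings)
string: columns stored with the pandas string extension dtype (available since pandas 1.0); strings held in object-dtype columns are not matched
bool: Boolean columns
category: categorical columns
A list can include several types at once, for example ['float', 'int'].

410-2-2. exclude (optional, default None): a scalar or list-like of dtypes or dtype strings to exclude. It takes the same values as include and can exclude one or several dtypes.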
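
Besides the individual names listed above, pandas documents 'number' (or np.number) as a way to select all numeric dtypes in one go. Here is a short sketch of that, and of combining include with exclude; the frame mirrors the one used in this section.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.5, 5.5, 6.5],
    'C': ['foo', 'bar', 'baz'],
    'D': [True, False, True],
})

# 'number' covers integer and floating-point columns, but not bool.
numeric = df.select_dtypes(include='number')
print(list(numeric.columns))

# include and exclude can be combined as long as they do not overlap.
combo = df.select_dtypes(include=['number', 'bool'], exclude='float')
print(list(combo.columns))
```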

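410-3. Purpose

Selects the columns of a DataFrame by dtype, which makes it easy to work on data of one kind at a time. include and exclude can be combined, provided they do not overlap, to control precisely which columns are returned.

410-4. Return value

Returns a new DataFrame containing only the columns that satisfy the conditions. The original DataFrame is not modified.

410-5. Notes

None

410-6. Usage

410-6-1. Data preparation

None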

410-6-2. Code example

# 410. The pandas.DataFrame.select_dtypes method
import pandas as pd
# Create a DataFrame
data = {
    'A': [1, 2, 3],
    'B': [4.5, 5.5, 6.5],
    'C': ['foo', 'bar', 'baz'],
    'D': [True, False, True]
}
df = pd.DataFrame(data)
# Select all floating-point and integer columns
numeric_df = df.select_dtypes(include=['float', 'int'])
# Select object-dtype columns (here, the string column)
string_df = df.select_dtypes(include='object')
# Exclude Boolean columns
non_bool_df = df.select_dtypes(exclude='bool')
print("Numeric DataFrame:")
print(numeric_df)
print("\nString DataFrame:")
print(string_df)
print("\nNon-Boolean DataFrame:")
print(non_bool_df)

410-6-3. Output

# 410. The pandas.DataFrame.select_dtypes method
# Numeric DataFrame:
#    A    B
# 0  1  4.5
# 1  2  5.5
# 2  3  6.5
#
# String DataFrame:
#      C
# 0  foo
# 1  bar
# 2  baz
#
# Non-Boolean DataFrame:
#    A    B    C
# 0  1  4.5  foo
# 1  2  5.5  bar
# 2  3  6.5  baz
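
The Raises section quoted earlier lists two easy mistakes: passing neither argument, and naming the same dtype in both. A minimal sketch that triggers each and collects the messages (the errors list is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.5, 5.5, 6.5]})

errors = []

# Neither include nor exclude is supplied.
try:
    df.select_dtypes()
except ValueError as exc:
    errors.append(str(exc))

# The same dtype appears in both include and exclude.
try:
    df.select_dtypes(include='int', exclude='int')
except ValueError as exc:
    errors.append(str(exc))

for message in errors:
    print("ValueError:", message)
```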