A Tour of Cool Python Libraries: the Third-Party Library pandas (139)

I. Usage in Detail

626. pandas.plotting.scatter_matrix function

626-1. Syntax

# 626. pandas.plotting.scatter_matrix function
pandas.plotting.scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal='hist', marker='.', density_kwds=None, hist_kwds=None, range_padding=0.05, **kwargs)
Draw a matrix of scatter plots.

Parameters:
frame : DataFrame
alpha : float, optional
    Amount of transparency applied.
figsize : (float, float), optional
    A tuple (width, height) in inches.
ax : Matplotlib axis object, optional
grid : bool, optional
    Setting this to True will show the grid.
diagonal : {'hist', 'kde'}
    Pick between 'kde' and 'hist' for either Kernel Density Estimation or Histogram plot in the diagonal.
marker : str, optional
    Matplotlib marker type, default '.'.
density_kwds : keywords
    Keyword arguments to be passed to kernel density estimate plot.
hist_kwds : keywords
    Keyword arguments to be passed to hist function.
range_padding : float, default 0.05
    Relative extension of axis range in x and y with respect to (x_max - x_min) or (y_max - y_min).
**kwargs
    Keyword arguments to be passed to scatter function.

Returns:
numpy.ndarray
    A matrix of scatter plots.

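626-2. Parameters

626-2-1. frame (required): DataFrame. The data source to plot.

626-2-2. alpha (optional, default 0.5): float. Transparency of the scatter points, in the range [0, 1]; the lower the value, the more transparent the points.

626-2-3. figsize (optional, default None): (float, float). Size of the figure in inches, given as (width, height). For example, (8, 8) means 8 inches wide and 8 inches high. If omitted, Matplotlib's default figure size is used.

626-2-4. ax (optional, default None): Matplotlib axis object. An existing Matplotlib axes object. If provided, the plot is drawn in the figure that axes object belongs to instead of in a new figure.

626-2-5. grid (optional, default False): bool. Whether to show grid lines. If True, grid lines are drawn on the plots.

626-2-6. diagonal (optional, default 'hist'): {'hist', 'kde'}. What to draw on the diagonal: 'hist' draws a histogram and 'kde' draws a kernel density estimate.

626-2-7. marker (optional, default '.'): str. Shape of the scatter points. Any marker style supported by Matplotlib can be used, such as '.', 'o', or 'x'.

626-2-8. density_kwds (optional, default None): dict. Keyword arguments passed to the kernel density estimate plot; used only when diagonal='kde'.

626-2-9. hist_kwds (optional, default None): dict. Keyword arguments passed to the histogram function; used only when diagonal='hist'.

626-2-10. range_padding (optional, default 0.05): float. Padding added to the axis range, relative to (x_max - x_min) or (y_max - y_min). A value of 0.05 extends the overall axis range by 5%.

626-2-11. **kwargs (optional): additional keyword arguments passed to Matplotlib's scatter function. For example, c='red' draws the scatter points in red.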
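
As a rough illustration of how these parameters combine, the sketch below passes several of them in one call. The random data, the column names, and the specific values (15 bins, 10% padding, red 'o' markers) are made up for the example, and the Agg backend is selected only so that the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in an interactive session
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical sample data: 50 rows, 3 numeric columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((50, 3)), columns=["A", "B", "C"])

axes = scatter_matrix(
    df,
    alpha=0.8,               # fairly opaque points
    figsize=(6, 6),          # 6 x 6 inches
    diagonal="hist",         # histograms on the diagonal
    marker="o",              # circle markers instead of the default '.'
    hist_kwds={"bins": 15},  # forwarded to the histogram on the diagonal
    range_padding=0.1,       # widen each axis range by 10% overall
    c="red",                 # forwarded to the scatter function via **kwargs
)
print(axes.shape)  # one axes object per pair of columns
```

In an interactive session, calling matplotlib.pyplot.show() afterwards displays the figure.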

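626-3. Purpose

Creates a matrix of plots in which every pair of columns is drawn as a scatter plot, with a histogram or density plot of each column on the diagonal. The chart is useful in data analysis, especially when exploring relationships between variables.

626-4. Return value

A NumPy n-dimensional array containing the axes objects of all subplots, which can be used to adjust each subplot further.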
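
The returned array can be iterated like any other NumPy array. The following is a minimal sketch, assuming hypothetical random data and a headless Agg backend, that rotates the axis labels of every subplot and adds a figure title; the label angles and the title text are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in an interactive session
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical sample data
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((40, 3)), columns=["A", "B", "C"])

axes = scatter_matrix(df, figsize=(6, 6), diagonal="hist")

# axes is a 2-D array with one Matplotlib axes object per subplot
for ax in axes.ravel():
    ax.xaxis.label.set_rotation(45)  # tilt the x-axis labels
    ax.yaxis.label.set_rotation(0)   # keep the y-axis labels horizontal
    ax.yaxis.label.set_ha("right")   # right-align them so they stay clear of the ticks

# All subplots share one figure, reachable from any of the axes
axes[0, 0].figure.suptitle("Pairwise scatter plots")
```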

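626-5. Notes

None.

626-6. Usage

626-6-1. Data preparation

None.

626-6-2. Code example

# 626. pandas.plotting.scatter_matrix function
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
# Create sample data
df = pd.DataFrame(np.random.rand(100, 4), columns=['A', 'B', 'C', 'D'])
# Draw the scatter matrix (diagonal='kde' requires SciPy to be installed)
scatter_matrix(df, alpha=0.5, figsize=(10, 10), diagonal='kde')
# Show the figure
plt.show()

626-6-3. Output

# 626. pandas.plotting.scatter_matrix function
See Figure 1.

Figure 1: Scatter matrix of columns A, B, C, and D, with kernel density estimates on the diagonal.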

627. pandas.plotting.table function

627-1. Syntax

# 627. pandas.plotting.table function
pandas.plotting.table(ax, data, **kwargs)
Helper function to convert DataFrame and Series to matplotlib.table.

Parameters:
ax : Matplotlib axes object
data : DataFrame or Series
    Data for table contents.
**kwargs
    Keyword arguments to be passed to matplotlib.table.table. If rowLabels or colLabels is not specified, data index or column name will be used.

Returns:
matplotlib table object.

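627-2. Parameters

627-2-1. ax (required): Matplotlib axes object. The axes on which the table is drawn. Passing an existing Matplotlib axes lets you place the table in an existing figure.

627-2-2. data (required): DataFrame or Series. The data to show in the table, either a two-dimensional DataFrame or a one-dimensional Series.

627-2-3. **kwargs (optional): additional keyword arguments passed to the underlying Matplotlib table function to customize the table's appearance. They include:

cellColours: background color of each cell.
cellLoc: alignment of cell contents ('left', 'center', 'right').
colWidths: width of each column.
rowLabels: row labels set by hand.
colLabels: column labels set by hand.
loc: position of the table within the axes (such as 'bottom' or 'center').
bbox: bounding box of the table, which can be used to set its position and size.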

627-3. Purpose

Creates a table on the given Matplotlib axes (ax) showing the DataFrame or Series passed in. It is suited to displaying the data values directly beside a chart.
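
One way to show values beside a chart is to give the table its own axes next to the plot. The sketch below assumes a small made-up DataFrame, a 1 x 2 subplot layout, and a headless Agg backend; none of these choices come from pandas itself.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import table

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 6, 5]}, index=["x", "y", "z"])

# Left axes for the line chart, right axes for the table
fig, (ax_plot, ax_table) = plt.subplots(1, 2, figsize=(9, 3))
df.plot(ax=ax_plot, marker="o")

# Hide the second axes' frame and ticks so only the table is visible
ax_table.axis("off")
tbl = table(ax_table, df, loc="center", cellLoc="center")
```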

627-4. Return value

A Matplotlib table object. It can be used to customize the table further, for example to adjust its style.
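
As a small sketch of such customization, the example below adjusts the font size and row height through the returned object. The sample data, the column width of 0.2, and the scale factors are arbitrary values chosen for illustration, and the Agg backend is assumed only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import table

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

fig, ax = plt.subplots(figsize=(5, 3))
ax.axis("off")

# colWidths needs one entry per column; 0.2 is a fraction of the axes width
tbl = table(ax, df, loc="center", cellLoc="center",
            colWidths=[0.2] * len(df.columns))

tbl.auto_set_font_size(False)  # turn off automatic font sizing
tbl.set_fontsize(12)           # then set the size explicitly
tbl.scale(1.0, 1.5)            # keep the width, make rows 1.5 times taller
```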

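627-5. Notes

None.

627-6. Usage

627-6-1. Data preparation

None.

627-6-2. Code example

# 627. pandas.plotting.table function
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import table
# Create sample data
data = {'A': [1, 2, 3, 4],
        'B': [5, 6, 7, 8],
        'C': [9, 10, 11, 12]}
df = pd.DataFrame(data)
# Create the figure and axes
fig, ax = plt.subplots(figsize=(6, 4))
# Turn the axes off so that only the table is shown
ax.axis('off')
# Draw the table
tbl = table(ax, df, loc='center', cellLoc='center')
# Show the figure
plt.show()

627-6-3. Output

# 627. pandas.plotting.table function
See Figure 2.

Figure 2: The DataFrame rendered as a centered table on an otherwise empty figure.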

628. pandas.array function

628-1. Syntax

# 628. pandas.array function
pandas.array(data, dtype=None, copy=True)
Create an array.

Parameters:
data : Sequence of objects
    The scalars inside data should be instances of the scalar type for dtype. It's expected that data represents a 1-dimensional array of data.

    When data is an Index or Series, the underlying array will be extracted from data.

dtype : str, np.dtype, or ExtensionDtype, optional
    The dtype to use for the array. This may be a NumPy dtype or an extension type registered with pandas using pandas.api.extensions.register_extension_dtype().

    If not specified, there are two possibilities:

    When data is a Series, Index, or ExtensionArray, the dtype will be taken from the data.

    Otherwise, pandas will attempt to infer the dtype from the data.

    Note that when data is a NumPy array, data.dtype is not used for inferring the array type. This is because NumPy cannot represent all the types of data that can be held in extension arrays.

    Currently, pandas will infer an extension dtype for sequences of:

    Scalar Type           Array Type
    pandas.Interval       pandas.arrays.IntervalArray
    pandas.Period         pandas.arrays.PeriodArray
    datetime.datetime     pandas.arrays.DatetimeArray
    datetime.timedelta    pandas.arrays.TimedeltaArray
    int                   pandas.arrays.IntegerArray
    float                 pandas.arrays.FloatingArray
    str                   pandas.arrays.StringArray or pandas.arrays.ArrowStringArray
    bool                  pandas.arrays.BooleanArray

    The ExtensionArray created when the scalar type is str is determined by pd.options.mode.string_storage if the dtype is not explicitly given.

    For all other cases, NumPy's usual inference rules will be used.

copy : bool, default True
    Whether to copy the data, even if not necessary. Depending on the type of data, creating the new array may require copying data, even if copy=False.

Returns:
ExtensionArray
    The newly created array.

Raises:
ValueError
    When data is not 1-dimensional.
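
The inference rules in the table above can be checked with a short script. This is only a sketch with made-up values; it covers a few rows of the table and prints the array class and dtype that pandas picks when no dtype is given.

```python
import pandas as pd

# No dtype is passed, so pandas infers the extension type from the scalars
ints = pd.array([1, 2, None])             # int scalars
floats = pd.array([0.5, 1.5, None])       # float scalars
bools = pd.array([True, False, None])     # bool scalars
intervals = pd.array([pd.Interval(0, 1), pd.Interval(1, 2)])
periods = pd.array([pd.Period("2024-01", freq="M"),
                    pd.Period("2024-02", freq="M")])

for arr in (ints, floats, bools, intervals, periods):
    print(type(arr).__name__, arr.dtype)
```

The None entries become missing values in the nullable integer, floating, and boolean arrays.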

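628-2. Parameters

628-2-1. data (required): array-like. The input data, expected to be one-dimensional. It can be a list, a NumPy array, a pandas Series or Index, or another array-like structure, and is used to create the ExtensionArray.

628-2-2. dtype (optional, default None): str, np.dtype, or ExtensionDtype. The data type of the array to create. pandas supports many extension dtypes, such as Int64, string, and boolean. Specifying dtype controls the array's behavior and the operations it supports.

628-2-3. copy (optional, default True): bool. Whether to copy the data. If True, the input is copied, so later changes to the source do not affect the new array. Setting it to False avoids the copy where possible, which can improve performance, although depending on the type of data a copy may still be made. Use copy=False only when you are sure the source data will not be modified unexpectedly.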

628-3. Purpose

Creates a pandas ExtensionArray, which offers more flexibility and functionality for data operations than the basic data types supported by NumPy alone.
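
One concrete case of this flexibility is missing values in integer data. The sketch below, with made-up values, contrasts the nullable Int64 extension dtype with the default NumPy-backed behavior, where a missing value forces the column to float64.

```python
import pandas as pd

# Nullable integer extension array: the missing value is stored as pd.NA
nullable = pd.array([1, None, 3], dtype="Int64")

# Default NumPy-backed Series: the missing value becomes NaN and the dtype is float64
plain = pd.Series([1, None, 3])

print(nullable.dtype, nullable[1])  # Int64 <NA>
print(plain.dtype)                  # float64
print(nullable.sum())               # 4, missing values are skipped by default
```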

628-4. Return value

A pandas ExtensionArray created from the input data and the specified dtype.

628-5. Notes

None.

628-6. Usage

628-6-1. Data preparation

None.

628-6-2. Code example

# 628. pandas.array function
import pandas as pd
# Sample data
data = [1, 2, 3, 4, 5]
# Create an integer extension array
int_array = pd.array(data, dtype="Int64")
print(int_array, end='\n\n')
# Create a string extension array
str_array = pd.array(['a', 'b', 'c'], dtype="string")
print(str_array)

628-6-3. Output

# 628. pandas.array function
# <IntegerArray>
# [1, 2, 3, 4, 5]
# Length: 5, dtype: Int64
# 
# <StringArray>
# ['a', 'b', 'c']
# Length: 3, dtype: string

629. pandas.arrays.ArrowExtensionArray class

629-1. Syntax

# 629. pandas.arrays.ArrowExtensionArray class
class pandas.arrays.ArrowExtensionArray(values)
Pandas ExtensionArray backed by a PyArrow ChunkedArray.

Warning

ArrowExtensionArray is considered experimental. The implementation and parts of the API may change without warning.

Parameters:
values : pyarrow.Array or pyarrow.ChunkedArray

Returns:
ArrowExtensionArray

Notes

Most methods are implemented using pyarrow compute functions. Some methods may either raise an exception or raise a PerformanceWarning if an associated compute function is not available based on the installed version of PyArrow.

Please install the latest version of PyArrow to enable the best functionality and avoid potential bugs in prior versions of PyArrow.

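629-2. Parameters

629-2-1. values (required): the data to store in the array, given as a pyarrow.Array or pyarrow.ChunkedArray. Storing the data with Apache Arrow supports efficient serialization and deserialization.

629-3. Purpose

Wraps a pyarrow.Array or pyarrow.ChunkedArray as a pandas extension array so that it can be used in pandas DataFrame and Series objects. Most operations are carried out by PyArrow compute functions, which can speed up data operations, particularly on large datasets.

629-4. Return value

An ArrowExtensionArray object. It can be used by pandas DataFrame and Series like any other pandas extension array, so you can operate on it as you would on ordinary pandas types while relying on Apache Arrow for the underlying data processing.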

629-5. Notes

None.

629-6. Usage

629-6-1. Data preparation

None.

629-6-2. Code example

# 629. pandas.arrays.ArrowExtensionArray class
import pandas as pd
import pyarrow as pa
# Create an Arrow array with pyarrow
arrow_array = pa.array([1, 2, 3, 4, 5])
# Wrap the Arrow array with pandas.arrays.ArrowExtensionArray
arrow_extension_array = pd.arrays.ArrowExtensionArray(arrow_array)
# Use the ArrowExtensionArray in a pandas Series
series = pd.Series(arrow_extension_array)
print(series)

629-6-3. Output

# 629. pandas.arrays.ArrowExtensionArray class
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# dtype: int64[pyarrow]

630. pandas.ArrowDtype class

630-1. Syntax

# 630. pandas.ArrowDtype class
class pandas.ArrowDtype(pyarrow_dtype)
An ExtensionDtype for PyArrow data types.

Warning

ArrowDtype is considered experimental. The implementation and parts of the API may change without warning.

While most dtype arguments can accept the “string” constructor, e.g. "int64[pyarrow]", ArrowDtype is useful if the data type contains parameters like pyarrow.timestamp.

Parameters:
pyarrow_dtype : pa.DataType
    An instance of a pyarrow.DataType.

Returns:
ArrowDtype

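630-2. Parameters

630-2-1. pyarrow_dtype (required): a pyarrow data type object (such as pyarrow.int64() or pyarrow.string()) that specifies the data type to use within the Arrow type system.

630-3. Purpose

Acts as a bridge that lets pandas use Apache Arrow data types and related functionality. This is useful for large datasets and high-performance computing, because Arrow is a cross-language in-memory format that supports efficient serialization and deserialization.

630-4. Return value

Creating an ArrowDtype instance returns an object representing the specified Arrow data type. You can use this data type in a pandas DataFrame or Series and so take advantage of Arrow's performance characteristics.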

630-5. Notes

None.

630-6. Usage

630-6-1. Data preparation

None.

630-6-2. Code example

# 630. pandas.ArrowDtype class
import pandas as pd
import pyarrow as pa
# Create an ArrowDtype instance
arrow_dtype = pd.ArrowDtype(pa.int64())
# Create a pandas Series that uses the ArrowDtype
s = pd.Series([1, 2, 3], dtype=arrow_dtype)
print(s)
print(type(s.dtype))

630-6-3. Output

# 630. pandas.ArrowDtype class
# 0    1
# 1    2
# 2    3
# dtype: int64[pyarrow]
# <class 'pandas.core.dtypes.dtypes.ArrowDtype'>