Python Cool Libraries Tour: Third-Party Library Pandas (011)

I. Usage in Detail

25. The pandas.HDFStore.get function

25-1. Syntax

# 25. The pandas.HDFStore.get function
HDFStore.get(key)
Retrieve pandas object stored in file.

Parameters:
key : str

Returns:
object
    Same type as object stored in file.

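25-2. Parameters

25-2-1. key (required): a string naming the object to retrieve from the HDF5 file. It is normally the same name or path that was used when the data was written to the file.

25-3. Functionality

Retrieves a stored pandas object from an HDF5 file.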

25-4. Return value

The function returns the pandas object stored under key, with the same type as the object that was written, typically a DataFrame or a Series.

Specifically:

25-4-1. DataFrame: if the object stored under key is a DataFrame, get returns a DataFrame. DataFrame is the main pandas structure for tabular data organized in rows and columns.

25-4-2. Series: if the object stored under key is a Series (one-dimensional indexed data, such as a time series or a single column), get returns a Series.

25-4-3. Other pandas objects: older versions of pandas could also store Panel objects. Panel was deprecated in pandas 0.20.0 and removed in pandas 0.25.0, so current versions return a DataFrame or a Series in practice.

25-4-4. Missing key: HDFStore.get takes no default-value argument (unlike DataFrame.get or dict.get). If key does not exist in the file, it raises a KeyError. Either test membership first (key in store) or wrap the call in a try-except block that catches KeyError.
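The return-value rules above can be checked with a small round trip. This is a minimal sketch, not a definitive recipe: the file name get_demo.h5 and the keys frame, series, and missing are made up for illustration, the file is written to a temporary directory, and PyTables is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file name and keys, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "get_demo.h5")

frame = pd.DataFrame({"A": [1, 2, 3], "B": [0.1, 0.2, 0.3]})
series = pd.Series([10.0, 20.0, 30.0], name="C")

# Write one DataFrame and one Series under separate keys.
with pd.HDFStore(path, mode="w") as store:
    store.put("frame", frame)
    store.put("series", series)

with pd.HDFStore(path, mode="r") as store:
    frame_back = store.get("frame")    # comes back as a DataFrame
    series_back = store.get("series")  # comes back as a Series
    try:
        store.get("missing")           # no such key in the file
        missing_found = True
    except KeyError:
        missing_found = False

print(type(frame_back).__name__)
print(type(series_back).__name__)
print(missing_found)
```

Catching KeyError keeps the lookup and the fallback in one place; checking "missing" in store before calling get is an equally reasonable alternative.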

25-5. Notes

None

25-6. Usage

25-6-1. Data preparation

None

25-6-2. Code example

# 25. The pandas.HDFStore.get function
import pandas as pd
# Create a sample DataFrame
data = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': ['foo', 'bar', 'foo', 'bar'],
    'C': [0.1, 0.2, 0.3, 0.4]
})
# Save the data to an HDF5 file (mode='w' overwrites any existing file)
filename = 'example.h5'
key = 'data'
data.to_hdf(filename, key=key, format='table', mode='w')
# Read the data back from the HDF5 file
with pd.HDFStore(filename, mode='r') as store:
    df_from_hdf = store.get(key)
# Print the data that was read
print("Data read from HDF5:")
print(df_from_hdf)

25-6-3. Output

# 25. The pandas.HDFStore.get function
# Data read from HDF5:
#    A    B    C
# 0  1  foo  0.1
# 1  2  bar  0.2
# 2  3  foo  0.3
# 3  4  bar  0.4

26. The pandas.HDFStore.select function

26-1. Syntax

# 26. The pandas.HDFStore.select function
HDFStore.select(key, where=None, start=None, stop=None, columns=None, iterator=False, chunksize=None, auto_close=False)
Retrieve pandas object stored in file, optionally based on where criteria.

Warning: Pandas uses PyTables for reading and writing HDF5 files, which allows serializing object-dtype data with pickle when using the "fixed" format. Loading pickled data received from untrusted sources can be unsafe.
See: https://docs.python.org/3/library/pickle.html for more.

Parameters:
key : str
    Object being retrieved from file.
where : list or None
    List of Term (or convertible) objects, optional.
start : int or None
    Row number to start selection.
stop : int, default None
    Row number to stop selection.
columns : list or None
    A list of columns that, if not None, will limit the returned columns.
iterator : bool, default False
    Returns an iterator.
chunksize : int or None
    Number of rows to include in an iteration; returns an iterator.
auto_close : bool, default False
    Should automatically close the store when finished.

Returns:
object
    Retrieved object from file.

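26-2. Parameters

26-2-1. key (required): the key (or path) of the object to retrieve from the HDF5 file, normally the name or path given when the data was written.

26-2-2. where (optional, default None): the filter condition, given as a string expression (for example "index < 5" or "A > 0") or as a list of Term objects (or values convertible to Term). Conditions can reference the index and any columns that were declared as data_columns when the object was written, and they only apply to objects stored in table format. The syntax resembles the strings used with DataFrame.query(), but the expression is evaluated by PyTables against the data on disk.

26-2-3. start/stop (optional, default None): the first and last row numbers (0-based positions, not index labels) of the selection. When both are given, rows from start up to but not including stop are retrieved.

26-2-4. columns (optional, default None): a list of column names. When given, only those columns are returned.

26-2-5. iterator (optional, default False): if True, select returns an iterator that yields the data chunk by chunk instead of loading the whole dataset into memory at once, which is useful for large datasets. This requires table format.

26-2-6. chunksize (optional, default None): the number of rows in each chunk. Passing chunksize on its own is enough to make select return an iterator. If iterator=True is passed without chunksize, pandas falls back to a default chunk size (100000 rows in current versions).

26-2-7. auto_close (optional, default False): if True, the store is closed automatically when the selection, or the iteration over it, is finished. Leave it at False if you intend to keep using the HDFStore object afterwards.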
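The where and columns parameters are easiest to understand on a tiny table. The sketch below is illustrative only: the file name where_demo.h5 and the sample values are assumptions, and the columns are declared as data_columns at write time so that they can be referenced in where.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical file name, created in a temporary directory for this illustration.
path = os.path.join(tempfile.mkdtemp(), "where_demo.h5")

df = pd.DataFrame({
    "A": np.arange(10, dtype="float64"),        # 0.0 .. 9.0
    "D": np.array([0, 1] * 5, dtype="int64"),   # 0, 1, 0, 1, ...
})

with pd.HDFStore(path, mode="w") as store:
    # data_columns makes "A" and "D" queryable on disk; table format is required.
    store.put("data", df, format="table", data_columns=["A", "D"])

with pd.HDFStore(path, mode="r") as store:
    # Filter on two data columns and return only column "A".
    subset = store.select("data", where="(A > 5) & (D == 1)", columns=["A"])
    # The index can be queried without being declared as a data column.
    first_rows = store.select("data", where="index < 3")

print(subset)
print(first_rows)
```

Because the filter runs inside PyTables, only the matching rows are read into memory, which is the main reason to prefer where over selecting everything and filtering in pandas.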

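26-3. Functionality

Retrieves the pandas object (such as a DataFrame or Series) stored under a given key in an HDF5 file, and lets the caller filter rows, limit columns, or read the data in chunks through the parameters above.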
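One detail worth seeing in practice is that start and stop count row positions in the stored table, not index labels. The following sketch assumes a hypothetical file rows_demo.h5 and a DataFrame whose index deliberately starts at 100.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file name, created in a temporary directory for this illustration.
path = os.path.join(tempfile.mkdtemp(), "rows_demo.h5")

# The index labels start at 100 so they cannot be confused with row positions.
df = pd.DataFrame(
    {"A": [10.0, 11.0, 12.0, 13.0, 14.0]},
    index=[100, 101, 102, 103, 104],
)

with pd.HDFStore(path, mode="w") as store:
    store.put("data", df, format="table")
    # Rows at positions 1 and 2 (stop is exclusive), whatever their labels are.
    window = store.select("data", start=1, stop=3)

print(window)
```

The selected rows carry the labels 101 and 102, which shows that the row numbers and the index are independent of each other.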

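26-4. Return value

The return value depends on the type of the object stored under key and on the arguments passed. It is normally a pandas object:

26-4-1. DataFrame: returned when the stored object is a DataFrame, including when the result is narrowed with where, start/stop, or columns.

26-4-2. Series: returned when the stored object is a Series. Limiting a stored DataFrame to a single column with the columns parameter still yields a one-column DataFrame, not a Series.

26-4-3. Other cases: when iterator=True or chunksize is given, select returns an iterator that yields one pandas object per chunk. Outside of that, DataFrame and Series are the objects you will meet in practice.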
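The three cases can be compared side by side. This is a rough sketch under stated assumptions: the file name types_demo.h5 and the keys frame and series are invented for the example, and both objects are written in table format so that chunked reading is allowed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file name, created in a temporary directory for this illustration.
path = os.path.join(tempfile.mkdtemp(), "types_demo.h5")

frame = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0],
                      "B": [6.0, 7.0, 8.0, 9.0, 10.0]})
series = pd.Series([1.0, 2.0, 3.0], name="A")

with pd.HDFStore(path, mode="w") as store:
    store.put("frame", frame, format="table")
    store.put("series", series, format="table")

    one_column = store.select("frame", columns=["A"])  # still a DataFrame
    whole_series = store.select("series")              # a Series
    # With chunksize, select returns an iterator of DataFrames.
    chunk_lengths = [len(chunk) for chunk in store.select("frame", chunksize=2)]

print(type(one_column).__name__)
print(type(whole_series).__name__)
print(chunk_lengths)
```

If a Series is needed from a stored DataFrame, selecting the single column and then indexing it (for example one_column["A"]) is a simple way to get one.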

26-5. Notes

None

26-6. Usage

26-6-1. Data preparation

None

26-6-2. Code example

# 26. The pandas.HDFStore.select function
import pandas as pd
import numpy as np
# Create a sample DataFrame
np.random.seed(0)  # Fix the random seed so the results are reproducible
data = pd.DataFrame({
    'A': np.random.randn(100),
    'B': np.random.randn(100),
    'C': np.random.randn(100),
    'D': np.random.randint(0, 2, 100)
})
# Save the DataFrame to an HDF5 file
with pd.HDFStore('example.h5') as store:
    store.put('data', data, format='table')
# Examples of retrieving data from the HDF5 file
with pd.HDFStore('example.h5') as store:
    # Select all data
    print("\nAll data:")
    all_data = store.select('data')
    print(all_data.head())  # Print only the first rows to save space
    # Select specific columns
    print("\nSpecific columns (A, B):")
    specific_columns = store.select('data', columns=['A', 'B'])
    print(specific_columns.head())
    # Select a range of rows (start/stop are 0-based row positions; stop is exclusive)
    print("\nPartial data (rows 10 to 19):")
    partial_data = store.select('data', start=10, stop=20)
    print(partial_data)
    # Use chunksize to read the data chunk by chunk
    print("\nData read in chunks:")
    chunks = store.select('data', chunksize=10)
    for i, chunk in enumerate(chunks):
        print(f"Chunk {i + 1}:")
        print(chunk.head())  # Print only the first rows of each chunk

26-6-3. Output

# 26. The pandas.HDFStore.select function
# All data:
#           A         B         C  D
# 0  1.764052  1.883151 -0.369182  0
# 1  0.400157 -1.347759 -0.239379  0
# 2  0.978738 -1.270485  1.099660  1
# 3  2.240893  0.969397  0.655264  1
# 4  1.867558 -1.173123  0.640132  0
# 
# Specific columns (A, B):
#           A         B
# 0  1.764052  1.883151
# 1  0.400157 -1.347759
# 2  0.978738 -1.270485
# 3  2.240893  0.969397
# 4  1.867558 -1.173123
# 
# Partial data (rows 10 to 19):
#            A         B         C  D
# 10  0.144044  1.867559  0.910179  0
# 11  1.454274  0.906045  0.317218  0
# 12  0.761038 -0.861226  0.786328  1
# 13  0.121675  1.910065 -0.466419  0
# 14  0.443863 -0.268003 -0.944446  0
# 15  0.333674  0.802456 -0.410050  0
# 16  1.494079  0.947252 -0.017020  1
# 17 -0.205158 -0.155010  0.379152  1
# 18  0.313068  0.614079  2.259309  0
# 19 -0.854096  0.922207 -0.042257  0
# 
# Data read in chunks:
# Chunk 1:
#           A         B         C  D
# 0  1.764052  1.883151 -0.369182  0
# 1  0.400157 -1.347759 -0.239379  0
# 2  0.978738 -1.270485  1.099660  1
# 3  2.240893  0.969397  0.655264  1
# 4  1.867558 -1.173123  0.640132  0
# Chunk 2:
#            A         B         C  D
# 10  0.144044  1.867559  0.910179  0
# 11  1.454274  0.906045  0.317218  0
# 12  0.761038 -0.861226  0.786328  1
# 13  0.121675  1.910065 -0.466419  0
# 14  0.443863 -0.268003 -0.944446  0
# Chunk 3:
#            A         B         C  D
# 20 -2.552990  0.376426 -0.955945  0
# 21  0.653619 -1.099401 -0.345982  1
# 22  0.864436  0.298238 -0.463596  0
# 23 -0.742165  1.326386  0.481481  0
# 24  2.269755 -0.694568 -1.540797  1
# Chunk 4:
#            A         B         C  D
# 30  0.154947 -0.769916 -1.424061  1
# 31  0.378163  0.539249 -0.493320  0
# 32 -0.887786 -0.674333 -0.542861  0
# 33 -1.980796  0.031831  0.416050  1
# 34 -0.347912 -0.635846 -1.156182  1
# Chunk 5:
#            A         B         C  D
# 40 -1.048553 -1.491258 -0.637437  0
# 41 -1.420018  0.439392 -0.397272  1
# 42 -1.706270  0.166673 -0.132881  0
# 43  1.950775  0.635031 -0.297791  0
# 44 -0.509652  2.383145 -0.309013  0
# Chunk 6:
#            A         B         C  D
# 50 -0.895467 -0.068242  0.521065  1
# 51  0.386902  1.713343 -0.575788  1
# 52 -0.510805 -0.744755  0.141953  0
# 53 -1.180632 -0.826439 -0.319328  0
# 54 -0.028182 -0.098453  0.691539  1
# Chunk 7:
#            A         B         C  D
# 60 -0.672460 -0.498032 -1.188859  1
# 61 -0.359553  1.929532 -0.506816  1
# 62 -0.813146  0.949421 -0.596314  0
# 63 -1.726283  0.087551 -0.052567  0
# 64  0.177426 -1.225436 -1.936280  0
# Chunk 8:
#            A         B         C  D
# 70  0.729091  0.920859  0.399046  0
# 71  0.128983  0.318728 -2.772593  1
# 72  1.139401  0.856831  1.955912  0
# 73 -1.234826 -0.651026  0.390093  1
# 74  0.402342 -1.034243 -0.652409  1
# Chunk 9:
#            A         B         C  D
# 80 -1.165150 -0.353994 -0.110541  0
# 81  0.900826 -1.374951  1.020173  0
# 82  0.465662 -0.643618 -0.692050  1
# 83 -1.536244 -2.223403  1.536377  0
# 84  1.488252  0.625231  0.286344  0
# Chunk 10:
#            A         B         C  D
# 90 -0.403177 -1.292857 -0.628088  1
# 91  1.222445  0.267051 -0.481027  1
# 92  0.208275 -0.039283  2.303917  0
# 93  0.976639 -1.168093 -1.060016  1
# 94  0.356366  0.523277 -0.135950  0

27. The pandas.HDFStore.info function

27-1. Syntax

# 27. The pandas.HDFStore.info function
HDFStore.info()
Print detailed information on the store.

Returns:
str

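27-2. Parameters

None

27-3. Functionality

Provides a summary of the store: the file path and, for each key (also called a node) in the HDF5 file, its storage type and basic shape information.

27-4. Return value

A string (str) containing the summary. The method does not print anything itself; pass the result to print() to display it, or rely on an interactive session echoing the value.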
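A quick way to see what info() hands back is to capture its result in a variable. The sketch below uses a hypothetical file info_demo.h5 in a temporary directory; the exact layout of the summary text may differ between pandas versions, so only its type and a couple of substrings are relied on here.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file name, created in a temporary directory for this illustration.
path = os.path.join(tempfile.mkdtemp(), "info_demo.h5")

with pd.HDFStore(path, mode="w") as store:
    store.put("data", pd.DataFrame({"A": [1.0, 2.0, 3.0]}), format="table")
    summary = store.info()  # captured as a value; nothing is printed by this call

print(type(summary).__name__)
print(summary)
```

Keeping the summary as a string makes it easy to log it or to search it for a key name, instead of only reading it on the console.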

27-5. Notes

None

27-6. Usage

27-6-1. Data preparation

None

27-6-2. Code example

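# 27. The pandas.HDFStore.info function
import pandas as pd
import numpy as np
# Create a DataFrame of random numbers (no seed is set, so the values differ on every run)
data = pd.DataFrame({
    'A': np.random.randn(100),
    'B': np.random.randn(100),
    'C': np.random.randn(100),
    'D': np.random.randint(0, 2, 100)
})
# Write the data to an HDF5 file
with pd.HDFStore('example.h5') as store:
    store.put('data', data, format='table')
# Use HDFStore.info() to inspect the HDF5 file
with pd.HDFStore('example.h5') as store:
    # info() returns a string, so print it explicitly
    print(store.info())
    # Read the data back to confirm
    all_data = store.select('data')
    print("\nAll data (first 5 rows):")
    print(all_data.head())

27-6-3. Output

# 27. The pandas.HDFStore.info function
# (the layout of the summary may vary slightly between pandas versions, and the random values differ on every run)
# <class 'pandas.io.pytables.HDFStore'>
# File path: example.h5
# /data            frame_table  (typ->appendable,nrows->100,ncols->4,indexers->[index])
#
# All data (first 5 rows):
#           A         B         C  D
# 0 -1.186803 -0.983345  0.661022  1
# 1  0.549244 -0.429500 -0.022329  1
# 2  1.408989  0.779268  0.079574  1
# 3 -1.178696  0.918125  0.174332  0
# 4 -0.538677 -0.124535 -1.165208  1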