Random data sampling methods in Python

news2025/4/28 11:52:56

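Ways to randomly select data in Python:

1. Using the sample method
The sample method in pandas is the usual way to randomly select rows from a DataFrame. Set the frac parameter to specify the fraction of rows to select (or n for a fixed number of rows).
Code: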

import pandas as pd

# Create an example DataFrame with 100 rows
data = {
    'A': range(1, 101),
    'B': range(101, 201)
}
df = pd.DataFrame(data)

# Randomly select 10% of the rows (random_state fixes the seed)
sampled_df = df.sample(frac=0.1, random_state=1)
print(sampled_df)

pandas.DataFrame.sample: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html

DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)

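random_state: int, numpy.random.RandomState or numpy.random.Generator, optional. If set to a particular integer, sample returns the same rows on every call. In other words, random_state sets the seed of the random number generator; with a fixed seed, the same rows (or columns, when axis=1) are always returned.
If replace is set to True, the same row/column may be selected more than once. The default is False.
See also: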
https://blog.csdn.net/qq_40433737/article/details/107048681
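
The parameters in the signature above can be tried out with a short sketch like the following. It rebuilds the same example DataFrame; the seed value and the use of column 'A' as a weight are illustrative assumptions only, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 101), 'B': range(101, 201)})

# n: select a fixed number of rows instead of a fraction
five_rows = df.sample(n=5, random_state=1)

# Same seed, same rows: the two samples are identical
same_again = df.sample(n=5, random_state=1)
print(five_rows.equals(same_again))

# replace=True allows n to exceed the number of rows,
# so some rows must appear more than once here
with_dupes = df.sample(n=150, replace=True, random_state=1)
print(with_dupes.index.is_unique)

# weights: rows with a larger weight are more likely to be drawn
# (weighting by column 'A' is purely for illustration)
weighted = df.sample(n=5, weights='A', random_state=1)

# ignore_index=True relabels the result 0..n-1
reset = df.sample(n=5, random_state=1, ignore_index=True)
print(list(reset.index))
```

Note that n and frac cannot be passed together; choose one of them.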

Image source: https://www.w3schools.com/python/pandas/ref_df_sample.asp

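2. Using NumPy random choice (generate random index labels, then select the corresponding rows)
Code:

import numpy as np

# Assumes df is the example DataFrame created in section 1
# Compute the number of rows to select
num_samples = int(len(df) * 0.1)

# Randomly choose row index labels without repetition
random_indices = np.random.choice(df.index, size=num_samples, replace=False)

# Select the rows by label
sampled_df = df.loc[random_indices]
print(sampled_df)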
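
The snippet above draws from NumPy's global random state, so the result changes on every run. A reproducible variant, sketched here under the assumption that a seeded numpy.random.default_rng generator is acceptable (the seed value 1 is arbitrary), could look like this. It draws row positions rather than labels, so it also works when the index is not 0..n-1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(1, 101), 'B': range(101, 201)})

# Seeded generator: the same seed yields the same positions on every run
rng = np.random.default_rng(seed=1)

num_samples = int(len(df) * 0.1)

# Draw distinct row positions from 0..len(df)-1
positions = rng.choice(len(df), size=num_samples, replace=False)

# iloc selects by position, independent of the index labels
sampled_df = df.iloc[positions]
print(sampled_df)
```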

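3. Using train_test_split from sklearn

from sklearn.model_selection import train_test_split

# Assumes df is the example DataFrame created in section 1
# Randomly select 10% of the rows: 90% goes to the discarded "test" part
sampled_df, _ = train_test_split(df, test_size=0.9, random_state=1)
print(sampled_df)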
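
train_test_split returns both parts of the split, which is handy when the remaining rows are needed as well. The following is a minimal sketch on the same example data; passing train_size=0.1 instead of test_size=0.9 is just an alternative way of writing the same split, and the name rest_df is made up for this illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({'A': range(1, 101), 'B': range(101, 201)})

# Keep both parts: the 10% sample and the remaining rows
sampled_df, rest_df = train_test_split(df, train_size=0.1, random_state=1)

print(len(sampled_df), len(rest_df))

# The two parts never share a row
overlap = sampled_df.index.intersection(rest_df.index)
print(overlap.empty)
```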

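4. Using the random module

import random

# Assumes df is the example DataFrame created in section 1
# Compute the number of rows to select
num_samples = int(len(df) * 0.1)

# Randomly choose row positions without repetition
random_indices = random.sample(range(len(df)), num_samples)

# Select the rows by position
sampled_df = df.iloc[random_indices]
print(sampled_df)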
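
As with NumPy, the random module gives a different sample on each run unless it is seeded. One possible way to make it reproducible, assuming a dedicated random.Random instance is preferred over seeding the module-level generator (the seed value 1 is arbitrary):

```python
import random
import pandas as pd

df = pd.DataFrame({'A': range(1, 101), 'B': range(101, 201)})

# A private, seeded generator leaves the global random state untouched
rng = random.Random(1)

num_samples = int(len(df) * 0.1)

# sample() returns distinct positions, so no row is picked twice
positions = rng.sample(range(len(df)), num_samples)

sampled_df = df.iloc[positions]
print(sampled_df)
```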