目录
写字前面:
结果展示
获取数据
行业数据
控制人数据
十大流通股东数据
开始合并
1 从行业数据中提取证券股的行业数据
2 合并控制人数据
3 合并十大流通股东
4 把三个结果按列合并
写字前面:
在分析数据的时候,常常需要的字段分散在多个文档里,这时候就需要把这些字段合并到一起方便查看和分析。本文以证券板块的股票为例,整合行业、控制人、十大流通股东这三项数据。
结果展示
获取数据
行业数据
数据来源优矿【具体优矿的使用过程看“python_寻找底部股票”博文】
下载的文件存储为industry_list.csv
控制人数据
巨潮-》数据-》专题统计-》股本股东-》实际控制人持股变动
可以直接下载
注意:好像非交易日没法打开,那我这就不截网站的图了。
下图是下载后的数据:
十大流通股东数据
通过akshare获取
akshare文档中 Welcome to AKShare’s Online Documentation! — AKShare 1.10.57 文档
有股票基本面数据的接口
路径:数据字典-》股票数据-》基本面数据-》流通股东
注意:最好使用python3.8或python3.8以上版本安装akshare包执行接口代码
import akshare as ak
pre_dir = r'D:/temp001/'
code_list = ['000166','000686','000712','000728','000750','000776','000783','002500','002670','002673','002736','002797','002926','002939','002945','300059','600030','600061','600095','600109','600155','600369','600621','600837','600864','600906','600909','600918','600858','600999','601059','601066','601099','601108','601136','601162','601198','601211','601236','601375','601377','601456','601555','601688','601696','601788','601878','601881','601901','601990','601995']
for code in code_list:
# 流通股东
try:
df = ak.stock_circulate_stock_holder(symbol=code)
df.to_excel(f'{pre_dir}流通股东_{code}.xlsx', engine='openpyxl')
except Exception as e:
print(str(e))
print(f'{code} 失败')
pass
time.sleep(5)
开始合并
已知:证券股代码列表
1 从行业数据中提取证券股的行业数据
def method_001():
res_dir = r'D:/temp000/'
def pd_000(x):
sec_str = x['secID']
res_str = sec_str.split('.')[0]
return res_str
industry_path = r'D:/temp000/industry_list.csv'
code_list = ['000166','000686','000712','000728','000750','000776','000783','002500','002670','002673','002736','002797','002926','002939','002945','300059','600030','600061','600095','600109','600155','600369','600621','600837','600864','600906','600909','600918','600858','600999','601059','601066','601099','601108','601136','601162','601198','601211','601236','601375','601377','601456','601555','601688','601696','601788','601878','601881','601901','601990','601995']
df = pd.read_csv(industry_path,encoding='utf-8')
df = df.loc[df['isNew']==1].copy()
df['ticker00'] = df.apply(pd_000,axis=1)
df00 = df.loc[df['ticker00'].isin(code_list)].copy()
df00 = df00.loc[:,['ticker00','secShortName','industryName1','industryName2','industryName3']]
df00.to_excel(res_dir+'industry_000.xlsx',engine='openpyxl')
pass
执行后结果
2 合并控制人数据
def method_002():
res_dir = r'D:/temp003/'
code_list = ['000166', '000686', '000712', '000728', '000750', '000776', '000783', '002500', '002670', '002673',
'002736', '002797', '002926', '002939', '002945', '300059', '600030', '600061', '600095', '600109',
'600155', '600369', '600621', '600837', '600864', '600906', '600909', '600918', '600858', '600999',
'601059', '601066', '601099', '601108', '601136', '601162', '601198', '601211', '601236', '601375',
'601377', '601456', '601555', '601688', '601696', '601788', '601878', '601881', '601901', '601990',
'601995']
control_path = r'D:/temp000/专题统计_实际控制人持股变动.xlsx'
df = pd.read_excel(control_path,engine='openpyxl')
df = df.loc[:,['证券代码','实际控制人名称','控股数量(万股)','控股比例(%)','直接控制人名称','控制类型']].copy()
df = df.loc[df['证券代码'].isin(code_list)].copy()
df.to_excel(res_dir+'control_000.xlsx',engine='openpyxl')
pass
执行后结果
3 合并十大流通股东
先看下每个股票十大流通股东的数据结构
它是每季度十行数据,只取最新一季度的十大流通股东,把十行的股东名称转成一个字符串,彼此之间用分号隔开
def method_003():
code_list = ['000166', '000686', '000712', '000728', '000750', '000776', '000783', '002500', '002670', '002673',
'002736', '002797', '002926', '002939', '002945', '300059', '600030', '600061', '600095', '600109',
'600155', '600369', '600621', '600837', '600864', '600906', '600909', '600918', '600858', '600999',
'601059', '601066', '601099', '601108', '601136', '601162', '601198', '601211', '601236', '601375',
'601377', '601456', '601555', '601688', '601696', '601788', '601878', '601881', '601901', '601990',
'601995']
# pre_dir 十大流通股东文件存储的文件夹
pre_dir = r'D:/temp000/000/'
results_list = []
for code in code_list:
print(code)
file_name = f"流通股东_{code}.xlsx"
file_path = pre_dir + file_name
df = pd.read_excel(file_path,engine='openpyxl')
# 取最新的一次十大流通股东
df = df.iloc[0:10].copy()
holders = df['股东名称'].values.tolist()
holders_str = ';'.join(holders)
date = df.iloc[0]['截止日期']
results_list.append(
{
'ticker':code,
'date':date,
'circulation_holders':holders_str
}
)
pass
res_df = pd.DataFrame(results_list)
res_df.to_excel('D:/temp003/holders_000.xlsx',engine='openpyxl')
pass
执行结果
4 把三个结果按列合并
def method_004():
one_path = r'D:/temp003/industry_000.xlsx'
two_path = r'D:/temp003/control_000.xlsx'
three_path = r'D:/temp003/holders_000.xlsx'
df_one = pd.read_excel(one_path,engine='openpyxl',converters={'ticker00':str})
df_two = pd.read_excel(two_path,engine='openpyxl',converters={'证券代码':str})
df_three = pd.read_excel(three_path,engine='openpyxl',converters={'ticker':str})
df00 = pd.merge(df_one,df_two,how='inner',left_on='ticker00',right_on='证券代码')
df01 = pd.merge(df00,df_three,how='inner',left_on='ticker00',right_on='ticker')
df01.to_excel(r'D:/temp003/res_000.xlsx',engine='openpyxl')
pass
最终结果
把数据都合并到一个表后,就可以进行下一步的分析了。