Original article No. 240, focused on "personal growth and financial freedom, the logic of how the world works, and investing".
Today we build a learning-to-rank strategy for ETF sector rotation. The GBDT framework we use is LightGBM, whose main advantages are speed and solid out-of-the-box results.
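As background: LightGBM's ranker (`LGBMRanker`) is typically evaluated with NDCG@k, which is what the `eval_at` parameter in the training code below refers to. As a minimal pure-Python sketch (not LightGBM's internal implementation), NDCG@k compares the labels in predicted-score order against the ideal descending-label order:

```python
import math

def dcg_at_k(rels, k):
    # discounted cumulative gain over the top-k positions
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(scores, labels, k):
    # rank items by predicted score, then compare the resulting label order
    # against the ideal order (labels sorted descending)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    ideal = sorted(labels, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / idcg if idcg > 0 else 0.0

# a perfect ranking scores 1.0; a reversed ranking scores below 1.0
print(ndcg_at_k([0.9, 0.5, 0.1], [2, 1, 0], 3))  # 1.0
print(ndcg_at_k([0.1, 0.5, 0.9], [2, 1, 0], 3))  # < 1.0
```

The key point for the strategy: the model is only rewarded for getting the *order* of the candidates right each day, not their exact future returns.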
Our candidate set is 29 sector ETFs:
etfs = [
    '159870.SZ', '512400.SH', '515220.SH', '515210.SH', '516950.SH',
    '562800.SH', '515170.SH', '512690.SH', '159996.SZ', '159865.SZ',
    '159766.SZ', '515950.SH', '159992.SZ', '159839.SZ', '512170.SH',
    '159883.SZ', '512980.SH', '159869.SZ', '515050.SH', '515000.SH',
    '515880.SH', '512480.SH', '515230.SH', '512670.SH', '515790.SH',
    '159757.SZ', '516110.SH', '512800.SH', '512200.SH',
]
The alpha dataset we use:
The factor list here can be extended further: it could be replaced with qlib's Alpha158, or supplemented with more technical indicators.
class Alpha:
    def __init__(self):
        pass

    def get_feature_config(self):
        return self.parse_config_to_fields()

    def get_label_config(self):
        return ["shift(close, -5)/shift(open, -1) - 1",
                "qcut(shift(close, -5)/shift(open, -1) - 1, 20)"], \
               ["label_c", "label"]

    @staticmethod
    def parse_config_to_fields():
        # ['CORD30', 'STD30', 'CORR5', 'RESI10', 'CORD60', 'STD5', 'LOW0',
        #  'WVMA30', 'RESI5', 'ROC5', 'KSFT', 'STD20', 'RSV5', 'STD60', 'KLEN']
        fields = []
        names = []
        windows = [5, 10, 20, 30, 60]
        fields += ["corr(close/shift(close,1), log(volume/shift(volume, 1)+1), %d)" % d
                   for d in windows]
        names += ["CORD%d" % d for d in windows]
        fields += ["close/shift(close,20)-1"]
        names += ["roc_20"]
        return fields, names
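The label is worth unpacking: `shift(close, -5)/shift(open, -1) - 1` is the 5-day forward return entered at the next bar's open, and `qcut(..., 20)` buckets it into 20 quantile ranks, which is the ordinal label a ranker expects. A toy pure-Python sketch with hypothetical prices (using 5 quantiles since we only have 5 samples):

```python
# Hypothetical toy price series to illustrate the label construction above.
closes = [10.0, 10.2, 10.1, 10.5, 10.4, 10.8, 11.0, 10.9, 11.2, 11.1]
opens  = [ 9.9, 10.1, 10.2, 10.3, 10.5, 10.6, 10.9, 11.0, 11.0, 11.2]

# shift(close, -5) -> closes[i + 5]; shift(open, -1) -> opens[i + 1]
rets = [closes[i + 5] / opens[i + 1] - 1 for i in range(len(closes) - 5)]

def quantile_rank(values, q):
    # assign each value the quantile bucket of its position in sorted order,
    # a simplified stand-in for pandas-style qcut
    idx = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for pos, i in enumerate(idx):
        ranks[i] = pos * q // len(values)
    return ranks

labels = quantile_rank(rets, q=5)
print(labels)  # higher bucket = larger 5-day forward return
```

Bucketing into quantiles rather than using raw returns makes the label scale-free across regimes, which suits the pairwise/listwise objectives of a ranker.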
Below is the gradient-boosted tree code:
import os

import joblib
import lightgbm as lgb

from quant.datafeed.dataset import DataSet


class LGBModel:
    def __init__(self, load_model=False, feature_cols=None):
        self.feature_cols = feature_cols
        if load_model:
            path = os.path.dirname(__file__)
            self.ranker = joblib.load(path + '/lgb.pkl')

    def _prepare_groups(self, df):
        # one group per trading day: all candidates on the same day
        # form one "query" that the ranker orders jointly
        df['day'] = df.index
        group = df.groupby('day')['day'].count()
        return group

    def predict(self, data):
        data = data.copy(deep=True)
        if self.feature_cols:
            data = data[self.feature_cols]
        pred = self.ranker.predict(data)
        return pred

    def train(self, ds: DataSet):
        X_train, X_test, y_train, y_test = ds.get_split_data()
        X_train_data = X_train.drop('symbol', axis=1)
        X_test_data = X_test.drop('symbol', axis=1)

        query_train = self._prepare_groups(X_train.copy(deep=True)).values
        query_val = self._prepare_groups(X_test.copy(deep=True)).values

        gbm = lgb.LGBMRanker()
        gbm.fit(X_train_data, y_train, group=query_train,
                eval_set=[(X_test_data, y_test)], eval_group=[query_val],
                eval_at=[5, 10, 20], early_stopping_rounds=50)
        print(gbm.feature_importances_)
        joblib.dump(gbm, 'lgb.pkl')
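The easiest thing to get wrong with `LGBMRanker` is the `group` argument: it is a list of consecutive run lengths that must follow the row order of the training matrix, and its sum must equal the number of rows. A small standalone sketch of the same per-day grouping logic, using hypothetical dates:

```python
# Hypothetical (date-indexed) rows: on each date, all candidate ETFs
# form one query group that the ranker orders jointly.
days = ['2023-01-03'] * 3 + ['2023-01-04'] * 3 + ['2023-01-05'] * 2

def prepare_groups(days):
    # count rows per day, preserving first-seen day order to match the
    # row order of the feature matrix (dict.fromkeys deduplicates in order)
    counts = {}
    for d in days:
        counts[d] = counts.get(d, 0) + 1
    return [counts[d] for d in dict.fromkeys(days)]

group = prepare_groups(days)
print(group)  # [3, 3, 2]
assert sum(group) == len(days)  # sanity check LightGBM also enforces
```

If the rows are not sorted by day before computing the groups, the ranker silently learns on scrambled queries, so it is worth asserting `sum(group) == len(X)` and keeping the data day-sorted.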
Once the model is trained, we load it in the algo pipeline's machine-learning module and generate predictions:
from quant.context import ExecContext


class ModelPredict:
    def __init__(self, model):
        self.model = model

    def __call__(self, context: ExecContext):
        context.bar_df['pred_score'] = self.model.predict(context.bar_df)
        return False  # False means: continue with the downstream algos
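The `return False` follows the algo-stack convention (familiar from frameworks like bt): each algo is a callable taking the context, and returning True halts the chain for the current bar while False lets the next algo run. A minimal sketch with hypothetical algo names:

```python
class Context:
    # stand-in for ExecContext; just records which algos ran
    def __init__(self):
        self.log = []

def run_algos(context, algos):
    # run algos in order; an algo returning True stops this bar's processing
    for algo in algos:
        if algo(context):
            break

def score(ctx):
    ctx.log.append('score')
    return False  # keep going, like ModelPredict above

def skip_rest(ctx):
    ctx.log.append('skip')
    return True   # halt, like a scheduler algo on a non-rebalance day

ctx = Context()
run_algos(ctx, [score, skip_rest, score])
print(ctx.log)  # ['score', 'skip'] (the third algo never runs)
```

This is why `RunWeekly()` sits first in the stack below: on non-rebalance days it returns True and the prediction and selection algos are skipped entirely.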
The integration code remains concise:
from quant.context import ExecContext
from quant.algo.algos import *
from quant.algo.algo_model import ModelPredict
from quant.models.gbdt_l2r import LGBModel

env = Env(ds)
model = LGBModel(load_model=True, feature_cols=ds.features)
env.set_algos([
    RunWeekly(),
    ModelPredict(model=model),
    SelectTopK(K=2, order_by='pred_score', b_ascending=False),
    WeightEqually()
])
env.backtest_loop()
env.show_results()
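For readers without the framework, the selection and weighting steps are simple to sketch. The following is an assumed reading of what `SelectTopK` and `WeightEqually` do each rebalance (the function names here are hypothetical, not the framework's API): keep the K symbols with the highest `pred_score` and give each a weight of 1/K.

```python
def select_top_k(scores, k, ascending=False):
    # rank symbols by score and keep the top k
    ranked = sorted(scores, key=scores.get, reverse=not ascending)
    return ranked[:k]

def weight_equally(symbols):
    # equal-weight the selected symbols
    return {s: 1.0 / len(symbols) for s in symbols}

scores = {'512480.SH': 0.8, '515790.SH': 0.5, '512690.SH': 0.9, '159865.SZ': 0.1}
picks = select_top_k(scores, k=2)
print(picks)                  # ['512690.SH', '512480.SH']
print(weight_equally(picks))  # {'512690.SH': 0.5, '512480.SH': 0.5}
```

Note that flipping `ascending` to True gives the "buy the worst-ranked" baseline used as a sanity check below.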
Annualized return: 33.8%, Sharpe ratio: 1.22. (The integrated backtest framework code, data, and strategy have all been uploaded to the Knowledge Planet; head over there to download my open-source project and join.)
As a comparison, if we instead buy the ETFs with the *worst* pred_score, the result looks like the following, which indirectly confirms that our ranking is working as intended.
Tomorrow I will expand the alpha factor set and add walk-forward (rolling) backtesting to see how it performs.
Sitting idly in a quiet courtyard facing idle flowers, gently simmering time and slowly brewing tea; asking nothing of the world's bustle, letting the years frost my hair as they will.
A life of financial freedom is worth looking forward to.