【机器学习大杀器】Stacking堆叠模型-English

news2024/11/28 4:38:11

1. Introduction

        The stacking model is very common in Kaglle competitions. Why? 


【机器学习大杀器】Stacking堆叠模型(English)

1. Introduction

2. Model 3: Stacking model

 2.1 description of the algorithms:

2.2 interpretation of the estimated models:

  3. Extend

 3.1 code sample

3.2 the coefficient of last model


2. Model 3: Stacking model

===============================================

Include three model:  

=================

  1. Lasso;
  2. Random Forest;
  3. Gradient Boost.

==============================================================


 2.1 description of the algorithms:


        When we get three models with equal robustness but different structure, which model should we choose at this time? I think this is a very difficult thing, but through the integrated learning method, we can easily synthesize these three models into a better model that absorbs the advantages of each model. Stacking is just such a strategy for merging models.

        In this experiment, we just used such a stacking strategy, which greatly improved the effect of our model. In the first layer of the model, there are three stacked models: {1.Lasso, 2.Random Forest, 3.Gradient Boost}, and used {Logistics Region} as the second layer of the model.


2.2 interpretation of the estimated models:


Stacking is generally composed of two layers. Level 1: basic models with excellent performance (there can be multiple models); The second layer: take the output of the models in the first layer as the model obtained from the training set. The second layer model is also called "meta model". The key role is to integrate the results of all models in the first layer and output them. That is, the second layer model trains the output of the first layer model as a feature. The overall steps of stacking are as follows:

  1. The original data is divided into two parts: training set D-train and test set D-test. The training set is used to train the overall Stacking integration model, and the test set is used to test the integration model.
  2. This step includes training and testing:
    1. Training     Stacking is based on cross validation. Taking the 50 fold cross validation as an example, the training set D-train is divided into five parts, trained five times, and each time an unselected part is selected as the validation set, as shown in the following figure. Wherein, Learn corresponds to Training folds, which is used to train primary learners; The Predict in the following figure corresponds to Validation fold, which is used to obtain five prediction results Predictions_Train={p1,p2,p3,p4,p5} through the primary trainer. These prediction results will be combined into a new training set, which will be used to train the secondary learner Model2.
    2. Testing       Five tests are also required for five times of training. After training the model once each time, the data of the test set is also tested. Finally, five prediction test set results are obtained. The average of the five prediction results is taken as the final prediction result Predictions_Test={p1,p2,p3,p4,p5} of the model of this layer. The model of the next layer will be tested on the new test set Predictions_Train.

      3. If there are n models stacked in the first layer, perform the operation in step 3 for each model, and finally get n Predictions_Train and Predictions_Test, put n redictions_Train as the characteristic value into the second layer model for training. After training, use the second layer model to predict the test set, and finally take the combined result of n+1 Predictions_Test as the final prediction result. 

  3. Extend


 3.1 code sample


from sklearn.ensemble import StackingRegressor
models = [('Lasso', lo), ('Random Forest', rf), ('Gradient Boost', gb_boost)]
 
stack = StackingRegressor(models, final_estimator=LinearRegression(positive=True), cv=5, n_jobs=4)
stack.fit(x_train, y_train)
stack_log_pred = stack.predict(x_test)
print(stack_log_pred)
stack_pred = np.exp(stack_log_pred)
 
plt.barh(np.arange(len(models)), stack.final_estimator_.coef_)
plt.yticks(np.arange(len(models)), ['Linear Regression', 'Random Forest', 'Gradient Boost']);
plt.xlabel('Model coefficient')
plt.title('Figure 4. Model coefficients for our stacked model');

3.2 the coefficient of last model


In order to better understand what Stacking has done, we printed the coefficient of the last layer of the Stacking model, i.e. the second layer of the Linear Regression model, after we completed the training with skleran, as shown in the figure:

Coefficients of high and positive linear regression models coef_ It means a stronger relationship. From the above figure, we can see that the proportion of the three models we combined in Stacking is 8:2:0. It can be seen that Gradient Boost has the greatest impact on the overall prediction results of the model. 

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/3785.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

浅谈Vue中 ref、reactive、toRef、toRefs、$refs 的用法

💭💭 ✨: 浅谈ref、reactive、toRef、toRefs、$refs   💟:东非不开森的主页   💜: 技术需沉淀,不要浮躁💜💜   🌸: 如有错误或不足之处,希望可…

Redhat(3)-Bash-Shell-正则表达式

1.bash脚本 2.bash变量、别名、算术扩展 3.控制语句 4.正则表达式 1.bash脚本 #!/bin/bash#this is basic bash script<< BLOCK This is the basic bash script BLOKC: This is the basic bash script echo "hello world!" 双引号、单引号只有在变量时才有区…

健身房信息管理系统/健身房管理系统

21世纪的今天&#xff0c;随着社会的不断发展与进步&#xff0c;人们对于信息科学化的认识&#xff0c;已由低层次向高层次发展&#xff0c;由原来的感性认识向理性认识提高&#xff0c;管理工作的重要性已逐渐被人们所认识&#xff0c;科学化的管理&#xff0c;使信息存储达到…

VCS 工具学习笔记(1)

目录 引言 平台说明 关于VCS 能力 Verilog 仿真事件队列 准备 VCS工作介绍 工作步骤 支持 工作机理 编译命令格式 编译选项 示例 仿真命令格式 仿真选项 示例 库调用 -y 总结 实践 设计文件 仿真文件 编译 仿真 关于增量编译 日志文件记录 编译仿真接续进…

链接脚本和可执行文件

几个重要的概念 摘取自知乎内容&#xff1a; 链接器与链接脚本 - 知乎 linker 链接器 链接器(linker) 是一个程序&#xff0c;这个程序主要的作用就是将目标文件(包括用到的标准库函数目标文件)的代码段、数据段以及符号表等内容搜集起来并按照 ELF或者EXE 等格式组合成一个…

【C++学习】string的使用

&#x1f431;作者&#xff1a;一只大喵咪1201 &#x1f431;专栏&#xff1a;《C学习》 &#x1f525;格言&#xff1a;你只管努力&#xff0c;剩下的交给时间&#xff01; string的使用&#x1f640;模板&#x1f639;函数模板&#x1f639;类模板&#x1f640;string模板简…

【菜鸡读论文】Former-DFER: Dynamic Facial Expression Recognition Transformer

Former-DFER: Dynamic Facial Expression Recognition Transformer 哈喽&#xff0c;大家好呀&#xff01;本菜鸡又来读论文啦&#xff01;先来个酷炫小叮当作为我们的开场&#xff01; 粉红爱心泡泡有没有击中你的少女心&#xff01;看到这么可爱的小叮当陪我们一起读论文&am…

有了PySnooper,不用print、不用debug轻松查找问题所在!

PySnooper是一个非常方便的调试器&#xff0c;它是通过python注解的方式来对函数的执行过程进行监督的。 应用起来比较简单&#xff0c;不用一步一步的去走debug来查找问题所在&#xff0c;并且将运行过程中函数的变量值打印出来结果一目了然&#xff0c;相当于替代了print函数…

Boundary Loss 原理与代码解析

paper&#xff1a;Boundary loss for highly unbalanced segmentation Introduction 在医学图像分割中任务中通常存在严重的类别不平衡问题&#xff0c;目标前景区域的大小常常比背景区域小几个数量级&#xff0c;比如下图中前景区域比背景区域小500倍以上。 分割通常采用的交…

SpringBoot实践(三十三):Maven使用及POM详解

文章目录maven是什么maven怎么装settings.xml本地仓库地址&#xff1a;localRepository远程镜像&#xff1a;mirrorsJDK 版本&#xff1a;profile私服配置POM.xml中的常用标签projectmodelVersiongroupIdartifactIdversionpropertiesdependenciesbuild和pluginsresourcesdepend…

【学生管理系统】用户登录三种验证方式—图片验证、短信验证、邮件验证

目录 一、页面需求展示 二、验证方式—按钮组件 三、手机短信验证 四、邮件验证 五、图片验证邮件验证 &#x1f49f; 创作不易&#xff0c;不妨点赞&#x1f49a;评论❤️收藏&#x1f499;一下 一、页面需求展示 二、验证方式—按钮组件 2.1前端 <el-form-item labe…

【Linux】第十章 进程间通信(管道+system V共享内存)

&#x1f3c6;个人主页&#xff1a;企鹅不叫的博客 ​ &#x1f308;专栏 C语言初阶和进阶C项目Leetcode刷题初阶数据结构与算法C初阶和进阶《深入理解计算机操作系统》《高质量C/C编程》Linux ⭐️ 博主码云gitee链接&#xff1a;代码仓库地址 ⚡若有帮助可以【关注点赞收藏】…

工作流的例子

工作流的例子目录概述需求&#xff1a;设计思路实现思路分析1.配置bean2.examples3.no bean4.activiti-api-basic-process-example5.taskspringweb参考资料和推荐阅读Survive by day and develop by night. talk for import biz , show your perfect code,full busy&#xff0c…

C++ 多态类型

多态 C在面向对象中&#xff0c;多态就是不同对象收到相同消息&#xff0c;执行不同的操作。在程序设计中&#xff0c;多态性是名字相同的函数&#xff0c;这些函数执行不同或相似的操作&#xff0c;这样就可以用同一个函数名调用不同内容的函数。简而言之“一个接口&#xff…

2022 国赛postgresql

安装postgresql配置postgresql [root@linux3 ~]# postgresql-setup --initdb //初始化数据库Initializing database in ‘/var/lib/pgsql/data’Initialized, logs are in /var/lib/pgsql/initdb_postgresql.log[root@linux3 ~]# systemctl enable postgresql.service Created …

澳洲最热门职业,护士排第一,医生竟然不如程序员?

2022澳洲最新的职业紧缺名单出炉了&#xff0c;令人惊讶的是护士竟然排行第一名&#xff0c;可见澳洲的医疗人力资源紧缺的问题。 既然人力资源紧缺&#xff0c;那么首当其冲的医生作为高学历且同属医疗行业的代表理应收到重视&#xff0c;然而令人意外的是&#xff0c;通过榜单…

Linux一篇入门(以Ubuntu为例)

一、Linux与Windows区别 Linux&#xff1a;无盘符&#xff0c;只有一个根目录&#xff08;/&#xff09; Windows&#xff1a;有盘符 二、目录相关常见命令 Linux命令格式&#xff1a; cmd -option parameter cdm命令&#xff0c;就是一个操作 parameter一般是要做的对象…

韩国程序员面试考什么?

大家好&#xff0c;我是老三&#xff0c;在G站闲逛的时候&#xff0c;从每日热门上&#xff0c;看到一个韩国的技术面试项目&#xff0c;感觉有点好奇&#xff0c;忍不住点进去看看。 韩国的面试都考什么&#xff1f;有没有国内的卷呢&#xff1f; 可以看到&#xff0c;有8.…

抽象类和接口

文章目录 前言 一、今日回顾 1.《高等数学》 2.阅读&#xff1a; 3.英语&#xff1a; 二、编程的那些事 1.引入库 2.读入数据 总结 前言 一、今日回顾 1.《高等数学》 2.阅读&#xff1a; 3.英语&#xff1a; 二、编程的那些事 1.抽象类的描述 在java中&#xff0…

一次函数与二次函数的联系

首先&#xff0c;无论是一次函数还是二次函数&#xff0c;都是函数&#xff0c;所以便可以从表达式&#xff0c;图像&#xff0c;函数的四个性质&#xff08;即有界性&#xff0c;单调性&#xff0c;奇偶性&#xff0c;周期性&#xff09;去看他们之间的联系 一次函数与二次函…