R语言ggplot2 | 绘制随机森林重要性+相关性热图

news2025/1/15 6:54:00

📋文章目录

  • 原图
  • 复现
    • 准备数据集及数据处理
    • 构建不同分类随机森林模型的并行计算
    • 绘制随机森林变量重要性柱状图
    • 计算数据集的相关性
    • 热图可视化
    • 合并随机森林重要性和热图
  • 附上所有代码

   在文献中,我们经常遇到随机森林和相关性热图的组合图片(下图),它由一幅叠加变量重要性圆圈的相关性热图和一幅说明因变量被解释程度的条形图组成。今天,我们将试着用自己的数据在R里面去复现这类图。

原图

   先看看所需复现的随机森林变量重要性+相关性热图:

在这里插入图片描述
这类图片由两部分组成。其中容易理解的就是下面的相关性热图,它是核心微生物(y轴)与土壤养分变量(x轴)的相关性分析。剩下的圆圈和上面的条形图都可以归类到随机森林分析部分。它将核心微生物作为自变量,分别将各个土壤养分变量作为因变量,进行随机森林分析。每一次随机森林分析得到的变量重要性指数则映射为热图里的圆圈大小(只收集p<0.05的变量的重要性),而模型对目标土壤养分变量的解释程度则作为上面条形图柱子的高度。接下来,我们来模仿复现这张图。

复现

准备数据集及数据处理

rm(list = ls())
# 在R中绘制随机森林模型中变量重要性和相关性的热图
library(vegan)
library(randomForest)
library(rfUtilities)
library(rfPermute)
library(ggplot2)
library(tidyverse)
library(patchwork)
library(export)
library(reshape2)

Micro <- openxlsx::read.xlsx("OTU1.xlsx", 1)
ENV <- openxlsx::read.xlsx("ENV1.xlsx", 1);head(ENV)
# Sample Ecosystem       SOC       pH       CEC        TP       TK       AK        TN        NO3        NH4      DOC
# 1     A1 Grassland  27.05671 7.904906 15.323426 0.9808826 14.75433 180.5585 2.1198671 125.335557  0.3171696 118.5123
# 2     A2 Grassland  25.76547 8.573666 18.104304 0.8527650 14.69093 166.8692 2.4470941 107.296478  2.4062394 124.6120
# 3     A3    Forest  12.33270 8.277955  8.297831 0.4298910 12.94322 121.5713 0.7090047   6.961546  2.1797286 129.7396
# 4     A4    Forest  16.63684 7.150481 15.632651 0.3711634 13.53909 194.0584 1.0868895  23.967231 29.2321159 129.3390
# 5     A5  Cropland 131.72950 7.548969 30.291618 0.6691908 14.19519 144.4154 5.4975148  38.032274 17.6463368 106.5602
# 6     A6  Cropland 123.75436 7.840254 34.357811 0.5828902 14.51699 387.8284 7.9689639  59.413622 12.7478292 105.8368
# MAT       SD       Pl
# 1 2.0162632 1.855115 274.0981
# 2 2.1286678 2.370189 369.1070
# 3 3.5848455 2.037255 263.5001
# 4 2.6846039 2.026378 346.7367
# 5 0.5787488 2.380261 333.7921
# 6 0.6756787 1.590068 240.1557
# 计算香浓指数,并把它合并在环境变量数据中
ENV$shannon <- diversity(t(Micro), index = "shannon")

# 生成一个总的与先前的不同生态系统合并成一个新的数据集
ENV_total <- ENV
ENV_total$Ecosystem <- "Whole"

ENV_new <- rbind(ENV_total, ENV)

# 确定不同生态系统类型的数量
unique(ENV_new$Ecosystem)
# [1] "Whole"     "Grassland" "Forest"    "Cropland"  "Wetland"   "Desert" 

自变量数据是一组虚拟的土壤理化指标数据,它来自5个生态系统:草地、森林、农田、湿地、荒漠,和一个总体的。随后将采样点微生物的香浓指数合并到环境变量中。我们计划将数据按生态系统包括总体共6类,计算每一类生态系统中土壤环境变量对微生物香浓指数的潜在贡献。这里的思路是通过并行运算每类的随机森林模型,提高运行速度。

构建不同分类随机森林模型的并行计算

# 构建自定义函数进行并行运算
library(foreach)
library(doParallel)

# 设置核心数
n_cores <- detectCores() - 1
cl <- makeCluster(n_cores)
registerDoParallel(cl)

# 获取生态系统列表
ecosystems <- unique(ENV_new$Ecosystem)

# 设置输出结果名称
options <- list()
options$results <- ecosystems

# 定义一个函数,对每个生态系统运行随机森林分析
run_single_rf <- function(data, eco) {
  if (!"Ecosystem" %in% names(data)) {
    stop("Error: 'Ecosystem' variable not found in data.")
  }
  if (!(eco %in% unique(data$Ecosystem))) {
    stop("Error: Invalid Ecosystem value.")
  }
  # 对当前生态系统的数据进行子集
  eco_data <- subset(data, Ecosystem == eco)[,-1:-2]
  # 设置种子
  set.seed(1)
  # 进行随机森林分析和置换检验
  RF <- randomForest(shannon~., eco_data, importance = T)
  RF_r2 <- rf.significance(RF, eco_data, nperm=99, ntree=501)$Rsquare
  RF_permuted <- rfPermute(shannon~., eco_data, nperm=99, ntree=501)
  # 返回结果
  list(ecosystem = eco, RF = RF, RF_r2 = RF_r2, RF_permuted = RF_permuted)
}

results <- foreach(i = ecosystems, .combine = rbind) %dopar% {
  library(randomForest)
  library(rfUtilities)
  library(rfPermute)
  # 运行单个随机森林分析
  run_single_rf(data = ENV_new, eco = i)
}

# 停止并行处理
stopCluster(cl)
registerDoSEQ()

# 确认结果
head(results)
# ecosystem   RF                      RF_r2     RF_permuted
# result.1 "Whole"     randomForest.formula,18 0.5722844 rfPermute,6
# result.2 "Grassland" randomForest.formula,18 0.6213995 rfPermute,6
# result.3 "Forest"    randomForest.formula,18 0.4347413 rfPermute,6
# result.4 "Cropland"  randomForest.formula,18 0.4695682 rfPermute,6
# result.5 "Wetland"   randomForest.formula,18 0.1822965 rfPermute,6
# result.6 "Desert"    randomForest.formula,18 0.5186782 rfPermute,6

results[, "RF_permuted"][[1]]
# An rfPermute model
# 
# Type of random forest: regression 
# Number of trees: 501 
# No. of variables tried at each split: 4 
# No. of permutation replicates: 100 
# Start time: 2023-04-18 22:56:38 
# End time: 2023-04-18 22:57:05 
# Run time: 26.8 secs 

这个results结果看起来想data.frame,实际上是一个Large list。
使用随机森林计算整体数据中环境生物因子对微生物香浓指数的重要性。

  1. 这里的%Var explained就是我们画柱状图时柱子的高度,大家可以抄录下来。
  2. importance函数调取了每个变量对shannon指数的重要性,用它们定义热图中空心圆的大小,但我们只需要调取p值小于0.05的变量。(将在后文统一调用)

绘制随机森林变量重要性柱状图

# 绘制柱状图
names <- c("ecosystem", "RF_r2")
bar <- as.data.frame(results[,names])
bar$ecosystem <- unlist(bar$ecosystem)
bar$RF_r2 <- unlist(bar$RF_r2)
str(bar)
# 'data.frame':	6 obs. of  2 variables:
# $ ecosystem: chr  "Whole" "Grassland" "Forest" "Cropland" ...
# $ RF_r2    : num  0.572 0.621 0.435 0.47 0.182 ...
colnames(bar) <- c("variable", "Exp")
bar$variable<- factor(bar$variable)
barplot<- ggplot(bar, aes(variable, Exp))+
  geom_bar(stat = "identity", fill = "steelblue")+
  scale_y_continuous(expand = c(0,0))+
  theme_classic(base_line_size = 0.75)+
  theme(panel.background = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_text(color = "black", size = 12))+
  labs(title = "Explained variation (%)", size = 12)
barplot

在这里插入图片描述

六组随机森林分析的解释量条形图已经完成,我们可以计算并绘制相关性热图部分。

计算数据集的相关性

使用cor()函数直接计算矩阵的相关性,相关矩阵的第14行,1到13列正是我们需要的香浓指数与13个环境因子的相关性分析结果。把它整理在一个表格内,并整理成绘图用的tidy数据,并对环境因子进行排序

注: cor()默认使用Pearson算法,如果对数据正态性不自信的话,建议修改为Spearman算法。

# 打包6个分类的随机森林模型结果
rf_list <- results[, "RF_permuted"]
# 修改6个分类模型的名称,改为对应的分类
names(rf_list) <- ecosystems
# 生成变量重要性的值,如果p值小于0.05则表示对应变量解释量
important_vars <- do.call(rbind, lapply(seq_along(results[, "RF_permuted"]), function(i) data.frame(ENV = row.names(importance(rf_list[[i]])), 
                                                            Important = ifelse(importance(rf_list[[i]])[, 2] < 0.05, 
                                                                               importance(rf_list[[i]])[, 1], NA),
                                                            Eco = rep(names(rf_list[i]), each = nrow(importance(rf_list[[i]]))))))
# 合并数据
circle <- data.frame(ENV = colnames(ENV)[3:15]) %>%
  left_join(important_vars) %>% 
  melt(id = c("ENV", "Eco"), value.name = "Importance"); circle
# Joining, by = "ENV"
# ENV       Eco  variable Importance
# 1  SOC     Whole Important         NA
# 2  SOC Grassland Important         NA
# 3  SOC    Forest Important  13.227866
# 4  SOC  Cropland Important         NA
# 5  SOC   Wetland Important         NA
# 6  SOC    Desert Important         NA
# 7   pH     Whole Important         NA
# 8   pH Grassland Important         NA
# 9   pH    Forest Important         NA
# 10  pH  Cropland Important         NA
# 11  pH   Wetland Important         NA
# 12  pH    Desert Important         NA
# 13 CEC     Whole Important         NA
# 14 CEC Grassland Important         NA
# 15 CEC    Forest Important         NA
# 16 CEC  Cropland Important   6.204274
# 17 CEC   Wetland Important         NA
# 18 CEC    Desert Important         NA
# 19  TP     Whole Important         NA
# 20  TP Grassland Important         NA
# 21  TP    Forest Important         NA
# 22  TP  Cropland Important         NA
# 23  TP   Wetland Important         NA
# 24  TP    Desert Important         NA
# 25  TK     Whole Important         NA
# 26  TK Grassland Important         NA
# 27  TK    Forest Important         NA
# 28  TK  Cropland Important   6.969350
# 29  TK   Wetland Important         NA
# 30  TK    Desert Important   6.783699
# 31  AK     Whole Important         NA
# 32  AK Grassland Important         NA
# 33  AK    Forest Important         NA
# 34  AK  Cropland Important   7.491691
# 35  AK   Wetland Important   7.208659
# 36  AK    Desert Important   6.601446
# 37  TN     Whole Important         NA
# 38  TN Grassland Important         NA
# 39  TN    Forest Important   6.953556
# 40  TN  Cropland Important         NA
# 41  TN   Wetland Important         NA
# 42  TN    Desert Important         NA
# 43 NO3     Whole Important   5.075570
# 44 NO3 Grassland Important         NA
# 45 NO3    Forest Important         NA
# 46 NO3  Cropland Important         NA
# 47 NO3   Wetland Important         NA
# 48 NO3    Desert Important         NA
# 49 NH4     Whole Important   4.594003
# 50 NH4 Grassland Important         NA
# 51 NH4    Forest Important   6.911031
# 52 NH4  Cropland Important         NA
# 53 NH4   Wetland Important         NA
# 54 NH4    Desert Important         NA
# 55 DOC     Whole Important  14.319739
# 56 DOC Grassland Important  18.475196
# 57 DOC    Forest Important  13.514009
# 58 DOC  Cropland Important         NA
# 59 DOC   Wetland Important         NA
# 60 DOC    Desert Important   5.634245
# 61 MAT     Whole Important         NA
# 62 MAT Grassland Important         NA
# 63 MAT    Forest Important         NA
# 64 MAT  Cropland Important         NA
# 65 MAT   Wetland Important         NA
# 66 MAT    Desert Important         NA
# 67  SD     Whole Important         NA
# 68  SD Grassland Important         NA
# 69  SD    Forest Important   6.683399
# 70  SD  Cropland Important   5.815041
# 71  SD   Wetland Important         NA
# 72  SD    Desert Important         NA
# 73  Pl     Whole Important  26.440467
# 74  Pl Grassland Important  12.635259
# 75  Pl    Forest Important         NA
# 76  Pl  Cropland Important  12.873442
# 77  Pl   Wetland Important   7.492399
# 78  Pl    Desert Important  15.139708

circle$ENV<- factor(circle$ENV, rev(colnames(ENV)[3:15]))

# 划分不同生态系统的数据集
df_eco <- split(ENV[, -1:-2], ENV$Ecosystem)
names(df_eco)

# 计算不同生态系统的相关性
r <- data.frame(ENV = colnames(ENV)[3:15],
               Whole = (cor(ENV[, -1:-2]))[14, 1:13],
               Grassland = (cor(df_eco$Grassland))[14, 1:13],
               Forest = (cor(df_eco$Forest))[14, 1:13],
               Cropland = (cor(df_eco$Cropland))[14, 1:13],
               Wetland = (cor(df_eco$Wetland))[14, 1:13],
               Desert = (cor(df_eco$Desert))[14, 1:13])%>%
  melt(id = "ENV", value.name = "Correlation");r
# ENV  variable  Correlation
# 1  SOC     Whole  0.190880826
# 2   pH     Whole -0.047988531
# 3  CEC     Whole  0.375623831
# 4   TP     Whole  0.204011032
# 5   TK     Whole  0.282682786
# 6   AK     Whole -0.238701400
# 7   TN     Whole  0.205713695
# 8  NO3     Whole -0.213326858
# 9  NH4     Whole -0.076868083
# 10 DOC     Whole  0.417261347
# 11 MAT     Whole -0.336919922
# 12  SD     Whole -0.272103426
# 13  Pl     Whole  0.590955861
# 14 SOC Grassland  0.389609962
# 15  pH Grassland -0.292803759
# 16 CEC Grassland  0.466440818
# 17  TP Grassland  0.240020091
# 18  TK Grassland  0.250093957
# 19  AK Grassland  0.295479703
# 20  TN Grassland  0.286305242
# 21 NO3 Grassland  0.060433644
# 22 NH4 Grassland -0.179338677
# 23 DOC Grassland  0.449023819
# 24 MAT Grassland -0.445949001
# 25  SD Grassland -0.511629899
# 26  Pl Grassland  0.657737041
# 27 SOC    Forest  0.419911131
# 28  pH    Forest -0.259555477
# 29 CEC    Forest  0.441782003
# 30  TP    Forest -0.043325378
# 31  TK    Forest  0.184701577
# 32  AK    Forest  0.314240260
# 33  TN    Forest  0.394795298
# 34 NO3    Forest  0.191051827
# 35 NH4    Forest -0.062053889
# 36 DOC    Forest  0.472024615
# 37 MAT    Forest -0.246051709
# 38  SD    Forest -0.427179574
# 39  Pl    Forest  0.445715297
# 40 SOC  Cropland  0.190504600
# 41  pH  Cropland  0.079739140
# 42 CEC  Cropland  0.372432772
# 43  TP  Cropland  0.173902720
# 44  TK  Cropland  0.339013659
# 45  AK  Cropland -0.567700216
# 46  TN  Cropland  0.179234867
# 47 NO3  Cropland -0.426516867
# 48 NH4  Cropland  0.047660104
# 49 DOC  Cropland  0.431862312
# 50 MAT  Cropland -0.388846551
# 51  SD  Cropland -0.129670246
# 52  Pl  Cropland  0.693855669
# 53 SOC   Wetland  0.062759256
# 54  pH   Wetland -0.230794196
# 55 CEC   Wetland  0.292344133
# 56  TP   Wetland  0.170356678
# 57  TK   Wetland  0.265791237
# 58  AK   Wetland  0.323268430
# 59  TN   Wetland -0.004741411
# 60 NO3   Wetland  0.172574329
# 61 NH4   Wetland -0.160782185
# 62 DOC   Wetland  0.436793024
# 63 MAT   Wetland -0.386445255
# 64  SD   Wetland -0.189241202
# 65  Pl   Wetland  0.343211982
# 66 SOC    Desert  0.008834840
# 67  pH    Desert  0.298885289
# 68 CEC    Desert  0.155105965
# 69  TP    Desert  0.253336024
# 70  TK    Desert  0.485088124
# 71  AK    Desert -0.554681227
# 72  TN    Desert  0.173607814
# 73 NO3    Desert -0.256845001
# 74 NH4    Desert -0.144874726
# 75 DOC    Desert  0.443678302
# 76 MAT    Desert -0.263961749
# 77  SD    Desert -0.160875488
# 78  Pl    Desert  0.648853903

r$ENV <- factor(r$ENV, levels = rev(colnames(ENV)[3:15]))

# 合并相关性与随机森林重要性的结果
r <- r %>% left_join(circle[c("ENV", "Importance")], by = "ENV");head(r)
# ENV variable Correlation Importance
# 1 SOC    Whole   0.1908808         NA
# 2 SOC    Whole   0.1908808         NA
# 3 SOC    Whole   0.1908808   13.22787
# 4 SOC    Whole   0.1908808         NA
# 5 SOC    Whole   0.1908808         NA
# 6 SOC    Whole   0.1908808         NA

热图可视化

heatmap <- ggplot()+
  geom_tile(data = r, aes(x = variable, y = ENV, fill = Correlation))+
  scale_fill_gradientn(colors = c('#2D6DB1', 'white', '#DC1623'),
                       limit = c(-1, 1))+
  geom_point(data = circle, aes(x = Eco, y = ENV,
                                size = Importance), shape = 21)+
  theme_bw()+
  theme(panel.background = element_blank(),
        panel.grid = element_blank(),
        axis.text.x = element_text(angle = 45, color = "black",
                                   size = 12, vjust = 0.6),
        axis.text.y = element_text(color = 'black', size = 12),
        legend.title = element_text(size = 10))+
  labs(y = '', x = '')
heatmap

在这里插入图片描述

合并随机森林重要性和热图

# barplot + heatmap + 
#   plot_layout(ncol = 1, heights = c(1, 6))
barplot %>%
  aplot::insert_bottom(heatmap, height = 6)
# graph2ppt(file = "ppt.ppt", height = 8, width = 5, append = T)

在这里插入图片描述
从结果来看,可以说是完美复刻!

附上所有代码

rm(list = ls())
# 在R中绘制随机森林模型中变量重要性和相关性的热图

library(vegan)
library(randomForest)
library(rfUtilities)
library(rfPermute)
library(ggplot2)
library(tidyverse)
library(patchwork)
library(export)
library(reshape2)

Micro <- openxlsx::read.xlsx("OTU1.xlsx", 1)
ENV <- openxlsx::read.xlsx("ENV1.xlsx", 1);head(ENV)
# Sample Ecosystem       SOC       pH       CEC        TP       TK       AK        TN        NO3        NH4      DOC
# 1     A1 Grassland  27.05671 7.904906 15.323426 0.9808826 14.75433 180.5585 2.1198671 125.335557  0.3171696 118.5123
# 2     A2 Grassland  25.76547 8.573666 18.104304 0.8527650 14.69093 166.8692 2.4470941 107.296478  2.4062394 124.6120
# 3     A3    Forest  12.33270 8.277955  8.297831 0.4298910 12.94322 121.5713 0.7090047   6.961546  2.1797286 129.7396
# 4     A4    Forest  16.63684 7.150481 15.632651 0.3711634 13.53909 194.0584 1.0868895  23.967231 29.2321159 129.3390
# 5     A5  Cropland 131.72950 7.548969 30.291618 0.6691908 14.19519 144.4154 5.4975148  38.032274 17.6463368 106.5602
# 6     A6  Cropland 123.75436 7.840254 34.357811 0.5828902 14.51699 387.8284 7.9689639  59.413622 12.7478292 105.8368
# MAT       SD       Pl
# 1 2.0162632 1.855115 274.0981
# 2 2.1286678 2.370189 369.1070
# 3 3.5848455 2.037255 263.5001
# 4 2.6846039 2.026378 346.7367
# 5 0.5787488 2.380261 333.7921
# 6 0.6756787 1.590068 240.1557
# 计算香浓指数,并把它合并在环境变量数据中
ENV$shannon <- diversity(t(Micro), index = "shannon")

# 生成一个总的与先前的不同生态系统合并成一个新的数据集
ENV_total <- ENV
ENV_total$Ecosystem <- "Whole"

ENV_new <- rbind(ENV_total, ENV)

# 确定不同生态系统类型的数量
unique(ENV_new$Ecosystem)
# [1] "Whole"     "Grassland" "Forest"    "Cropland"  "Wetland"   "Desert" 
# # 使用随机森林计算各生态系统中环境生物因子对微生物香浓指数的重要性
# set.seed(1)#设置种子,确保结果一致
# RF_total <- randomForest(shannon ~ . , ENV[, -1:-2], importance = T)
# RFs_total <- rfPermute(shannon ~ . , ENV[, -1:-2])
# RF_total;importance(RFs_total)[,1:2]
# # 这里的%Var explained就是我们画柱状图时柱子的高度,大家可以整理下来
# # importance函数调取了每个变量对shannon指数的重要性,用它们定义热图中空心圆的大小

# 构建自定义函数进行并行运算
library(foreach)
library(doParallel)

# 设置核心数
n_cores <- detectCores() - 1
cl <- makeCluster(n_cores)
registerDoParallel(cl)

# 获取生态系统列表
ecosystems <- unique(ENV_new$Ecosystem)

# 设置输出结果名称
options <- list()
options$results <- ecosystems

# 定义一个函数,对每个生态系统运行随机森林分析
run_single_rf <- function(data, eco) {
  if (!"Ecosystem" %in% names(data)) {
    stop("Error: 'Ecosystem' variable not found in data.")
  }
  if (!(eco %in% unique(data$Ecosystem))) {
    stop("Error: Invalid Ecosystem value.")
  }
  # 对当前生态系统的数据进行子集
  eco_data <- subset(data, Ecosystem == eco)[,-1:-2]
  
  # 设置种子
  set.seed(1)
  
  # 进行随机森林分析和置换检验
  RF <- randomForest(shannon~., eco_data, importance = T)
  RF_r2 <- rf.significance(RF, eco_data, nperm=99, ntree=501)$Rsquare
  RF_permuted <- rfPermute(shannon~., eco_data, nperm=99, ntree=501)
  
  # 返回结果
  list(ecosystem = eco, RF = RF, RF_r2 = RF_r2, RF_permuted = RF_permuted)
}

results <- foreach(i = ecosystems, .combine = rbind) %dopar% {
  
  library(randomForest)
  library(rfUtilities)
  library(rfPermute)
  
  # 运行单个随机森林分析
  run_single_rf(data = ENV_new, eco = i)
  
}

# 停止并行处理
stopCluster(cl)
registerDoSEQ()

# 确认结果
head(results)
# ecosystem   RF                      RF_r2     RF_permuted
# result.1 "Whole"     randomForest.formula,18 0.5722844 rfPermute,6
# result.2 "Grassland" randomForest.formula,18 0.6213995 rfPermute,6
# result.3 "Forest"    randomForest.formula,18 0.4347413 rfPermute,6
# result.4 "Cropland"  randomForest.formula,18 0.4695682 rfPermute,6
# result.5 "Wetland"   randomForest.formula,18 0.1822965 rfPermute,6
# result.6 "Desert"    randomForest.formula,18 0.5186782 rfPermute,6

results[, "RF_permuted"][[1]]
# An rfPermute model
# 
# Type of random forest: regression 
# Number of trees: 501 
# No. of variables tried at each split: 4 
# No. of permutation replicates: 100 
# Start time: 2023-04-18 22:56:38 
# End time: 2023-04-18 22:57:05 
# Run time: 26.8 secs 

# 绘制柱状图
names <- c("ecosystem", "RF_r2")
bar <- as.data.frame(results[,names])
bar$ecosystem <- unlist(bar$ecosystem)
bar$RF_r2 <- unlist(bar$RF_r2)
str(bar)
# 'data.frame':	6 obs. of  2 variables:
# $ ecosystem: chr  "Whole" "Grassland" "Forest" "Cropland" ...
# $ RF_r2    : num  0.572 0.621 0.435 0.47 0.182 ...
colnames(bar) <- c("variable", "Exp")
bar$variable<- factor(bar$variable)
barplot<- ggplot(bar, aes(variable, Exp))+
  geom_bar(stat = "identity", fill = "steelblue")+
  scale_y_continuous(expand = c(0,0))+
  theme_classic(base_line_size = 0.75)+
  theme(panel.background = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_text(color = "black", size = 12))+
  labs(title = "Explained variation (%)", size = 12)
barplot

# 打包6个分类的随机森林模型结果
rf_list <- results[, "RF_permuted"]
# 修改6个分类模型的名称,改为对应的分类
names(rf_list) <- ecosystems
# 生成变量重要性的值,如果p值小于0.05则表示对应变量解释量
important_vars <- do.call(rbind, lapply(seq_along(results[, "RF_permuted"]), function(i) data.frame(ENV = row.names(importance(rf_list[[i]])), 
                                                            Important = ifelse(importance(rf_list[[i]])[, 2] < 0.05, 
                                                                               importance(rf_list[[i]])[, 1], NA),
                                                            Eco = rep(names(rf_list[i]), each = nrow(importance(rf_list[[i]]))))))
# 合并数据
circle <- data.frame(ENV = colnames(ENV)[3:15]) %>%
  left_join(important_vars) %>% 
  melt(id = c("ENV", "Eco"), value.name = "Importance"); circle
# Joining, by = "ENV"
# ENV       Eco  variable Importance
# 1  SOC     Whole Important         NA
# 2  SOC Grassland Important         NA
# 3  SOC    Forest Important  13.227866
# 4  SOC  Cropland Important         NA
# 5  SOC   Wetland Important         NA
# 6  SOC    Desert Important         NA
# 7   pH     Whole Important         NA
# 8   pH Grassland Important         NA
# 9   pH    Forest Important         NA
# 10  pH  Cropland Important         NA
# 11  pH   Wetland Important         NA
# 12  pH    Desert Important         NA
# 13 CEC     Whole Important         NA
# 14 CEC Grassland Important         NA
# 15 CEC    Forest Important         NA
# 16 CEC  Cropland Important   6.204274
# 17 CEC   Wetland Important         NA
# 18 CEC    Desert Important         NA
# 19  TP     Whole Important         NA
# 20  TP Grassland Important         NA
# 21  TP    Forest Important         NA
# 22  TP  Cropland Important         NA
# 23  TP   Wetland Important         NA
# 24  TP    Desert Important         NA
# 25  TK     Whole Important         NA
# 26  TK Grassland Important         NA
# 27  TK    Forest Important         NA
# 28  TK  Cropland Important   6.969350
# 29  TK   Wetland Important         NA
# 30  TK    Desert Important   6.783699
# 31  AK     Whole Important         NA
# 32  AK Grassland Important         NA
# 33  AK    Forest Important         NA
# 34  AK  Cropland Important   7.491691
# 35  AK   Wetland Important   7.208659
# 36  AK    Desert Important   6.601446
# 37  TN     Whole Important         NA
# 38  TN Grassland Important         NA
# 39  TN    Forest Important   6.953556
# 40  TN  Cropland Important         NA
# 41  TN   Wetland Important         NA
# 42  TN    Desert Important         NA
# 43 NO3     Whole Important   5.075570
# 44 NO3 Grassland Important         NA
# 45 NO3    Forest Important         NA
# 46 NO3  Cropland Important         NA
# 47 NO3   Wetland Important         NA
# 48 NO3    Desert Important         NA
# 49 NH4     Whole Important   4.594003
# 50 NH4 Grassland Important         NA
# 51 NH4    Forest Important   6.911031
# 52 NH4  Cropland Important         NA
# 53 NH4   Wetland Important         NA
# 54 NH4    Desert Important         NA
# 55 DOC     Whole Important  14.319739
# 56 DOC Grassland Important  18.475196
# 57 DOC    Forest Important  13.514009
# 58 DOC  Cropland Important         NA
# 59 DOC   Wetland Important         NA
# 60 DOC    Desert Important   5.634245
# 61 MAT     Whole Important         NA
# 62 MAT Grassland Important         NA
# 63 MAT    Forest Important         NA
# 64 MAT  Cropland Important         NA
# 65 MAT   Wetland Important         NA
# 66 MAT    Desert Important         NA
# 67  SD     Whole Important         NA
# 68  SD Grassland Important         NA
# 69  SD    Forest Important   6.683399
# 70  SD  Cropland Important   5.815041
# 71  SD   Wetland Important         NA
# 72  SD    Desert Important         NA
# 73  Pl     Whole Important  26.440467
# 74  Pl Grassland Important  12.635259
# 75  Pl    Forest Important         NA
# 76  Pl  Cropland Important  12.873442
# 77  Pl   Wetland Important   7.492399
# 78  Pl    Desert Important  15.139708

circle$ENV<- factor(circle$ENV, rev(colnames(ENV)[3:15]))

# 划分不同生态系统的数据集
df_eco <- split(ENV[, -1:-2], ENV$Ecosystem)
names(df_eco)

# 计算不同生态系统的相关性
r <- NULL
r <- data.frame(ENV = colnames(ENV)[3:15],
               Whole = (cor(ENV[, -1:-2]))[14, 1:13],
               Grassland = (cor(df_eco$Grassland))[14, 1:13],
               Forest = (cor(df_eco$Forest))[14, 1:13],
               Cropland = (cor(df_eco$Cropland))[14, 1:13],
               Wetland = (cor(df_eco$Wetland))[14, 1:13],
               Desert = (cor(df_eco$Desert))[14, 1:13])%>%
  melt(id = "ENV", value.name = "Correlation");r
# ENV  variable  Correlation
# 1  SOC     Whole  0.190880826
# 2   pH     Whole -0.047988531
# 3  CEC     Whole  0.375623831
# 4   TP     Whole  0.204011032
# 5   TK     Whole  0.282682786
# 6   AK     Whole -0.238701400
# 7   TN     Whole  0.205713695
# 8  NO3     Whole -0.213326858
# 9  NH4     Whole -0.076868083
# 10 DOC     Whole  0.417261347
# 11 MAT     Whole -0.336919922
# 12  SD     Whole -0.272103426
# 13  Pl     Whole  0.590955861
# 14 SOC Grassland  0.389609962
# 15  pH Grassland -0.292803759
# 16 CEC Grassland  0.466440818
# 17  TP Grassland  0.240020091
# 18  TK Grassland  0.250093957
# 19  AK Grassland  0.295479703
# 20  TN Grassland  0.286305242
# 21 NO3 Grassland  0.060433644
# 22 NH4 Grassland -0.179338677
# 23 DOC Grassland  0.449023819
# 24 MAT Grassland -0.445949001
# 25  SD Grassland -0.511629899
# 26  Pl Grassland  0.657737041
# 27 SOC    Forest  0.419911131
# 28  pH    Forest -0.259555477
# 29 CEC    Forest  0.441782003
# 30  TP    Forest -0.043325378
# 31  TK    Forest  0.184701577
# 32  AK    Forest  0.314240260
# 33  TN    Forest  0.394795298
# 34 NO3    Forest  0.191051827
# 35 NH4    Forest -0.062053889
# 36 DOC    Forest  0.472024615
# 37 MAT    Forest -0.246051709
# 38  SD    Forest -0.427179574
# 39  Pl    Forest  0.445715297
# 40 SOC  Cropland  0.190504600
# 41  pH  Cropland  0.079739140
# 42 CEC  Cropland  0.372432772
# 43  TP  Cropland  0.173902720
# 44  TK  Cropland  0.339013659
# 45  AK  Cropland -0.567700216
# 46  TN  Cropland  0.179234867
# 47 NO3  Cropland -0.426516867
# 48 NH4  Cropland  0.047660104
# 49 DOC  Cropland  0.431862312
# 50 MAT  Cropland -0.388846551
# 51  SD  Cropland -0.129670246
# 52  Pl  Cropland  0.693855669
# 53 SOC   Wetland  0.062759256
# 54  pH   Wetland -0.230794196
# 55 CEC   Wetland  0.292344133
# 56  TP   Wetland  0.170356678
# 57  TK   Wetland  0.265791237
# 58  AK   Wetland  0.323268430
# 59  TN   Wetland -0.004741411
# 60 NO3   Wetland  0.172574329
# 61 NH4   Wetland -0.160782185
# 62 DOC   Wetland  0.436793024
# 63 MAT   Wetland -0.386445255
# 64  SD   Wetland -0.189241202
# 65  Pl   Wetland  0.343211982
# 66 SOC    Desert  0.008834840
# 67  pH    Desert  0.298885289
# 68 CEC    Desert  0.155105965
# 69  TP    Desert  0.253336024
# 70  TK    Desert  0.485088124
# 71  AK    Desert -0.554681227
# 72  TN    Desert  0.173607814
# 73 NO3    Desert -0.256845001
# 74 NH4    Desert -0.144874726
# 75 DOC    Desert  0.443678302
# 76 MAT    Desert -0.263961749
# 77  SD    Desert -0.160875488
# 78  Pl    Desert  0.648853903
r$ENV <- factor(r$ENV, levels = rev(colnames(ENV)[3:15]))

# 合并相关性与随机森林重要性的结果
r <- r %>% left_join(circle[c("ENV", "Importance")], by = "ENV");head(r)
# ENV variable Correlation Importance
# 1 SOC    Whole   0.1908808         NA
# 2 SOC    Whole   0.1908808         NA
# 3 SOC    Whole   0.1908808   13.22787
# 4 SOC    Whole   0.1908808         NA
# 5 SOC    Whole   0.1908808         NA
# 6 SOC    Whole   0.1908808         NA

heatmap <- ggplot()+
  geom_tile(data = r, aes(x = variable, y = ENV, fill = Correlation))+
  scale_fill_gradientn(colors = c('#2D6DB1', 'white', '#DC1623'),
                       limit = c(-1, 1))+
  geom_point(data = circle, aes(x = Eco, y = ENV,
                                size = Importance), shape = 21)+
  theme_bw()+
  theme(panel.background = element_blank(),
        panel.grid = element_blank(),
        axis.text.x = element_text(angle = 45, color = "black",
                                   size = 12, vjust = 0.6),
        axis.text.y = element_text(color = 'black', size = 12),
        legend.title = element_text(size = 10))+
  labs(y = '', x = '')
heatmap

# barplot + heatmap +
#   plot_layout(ncol = 1, heights = c(1, 6))
# graph2ppt(file = "ppt.ppt", height = 8, width = 5, append = T)


# barplot + heatmap + 
#   plot_layout(ncol = 1, heights = c(1, 6))
barplot %>%
  aplot::insert_bottom(heatmap, height = 6)
graph2ppt(file = "ppt.ppt", height = 8, width = 5, append = T)

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/435784.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

LeetCode热题HOT100:76. 最小覆盖子串,84.柱状图中最大的矩形、96. 不同的二叉搜索树

LeetCode 热题 HOT 100 76. 最小覆盖子串 题目&#xff1a;给你一个字符串 s 、一个字符串 t 。返回 s 中涵盖 t 所有字符的最小子串。如果 s 中不存在涵盖 t 所有字符的子串&#xff0c;则返回空字符串 “” 。 注意&#xff1a; 对于 t 中重复字符&#xff0c;我们寻找的子字…

Ubuntu18.04获取root权限并用root用户登录

写在前面&#xff1a;以下步骤中需要在终端输入命令&#xff0c;电脑端查看博客的朋友可以直接复制粘贴到终端&#xff0c;手机端查看的朋友请注意命令里面的空格是必须的&#xff0c;否则运行会出错。 1.为root设置初始密码 &#xff08;1&#xff09;登录系统&#xff0c;打…

【unity实战】随机地下城生成1

先看看最终效果 导入素材 导入房间图片素材,配置图片信息信息 点击sprite Editor,开始切割图片 随机创建基本房间 已一个白底图片模拟房间预设体 思路:建立一个空的 GameObject 用来做创建房间的点,设置坐标(0,0,0)。每创建1个房间之后,随机在上、下、右判断是否有…

python以及PyCharm工具的环境安装与配置

这里以Windows为例 Python的安装 当然是到Python官网下载咯&#xff0c;https://www.python.org/downloads/点我直达&#xff0c;如图&#xff1a; 可以下载最新版本&#xff0c;可以下拉找到之前特定的版本安装&#xff0c;如图&#xff1a; 这里先择的是最新版的进行安装…

leetcode每日一题:链表专题篇第一期(1/2)

&#x1f61a;一个不甘平凡的普通人&#xff0c;日更算法学习和打卡&#xff0c;期待您的关注和认可&#xff0c;陪您一起学习打卡&#xff01;&#xff01;&#xff01;&#x1f618;&#x1f618;&#x1f618; &#x1f917;专栏&#xff1a;每日算法学习 &#x1f4ac;个人…

现在备考2023年5月软考网络工程师时间够吗?

距离2023年5月软考还有1个多月的时间&#xff0c;备考网络工程师的时间是够的&#xff0c;以下是一些备考方法&#xff1a; 1.了解考试内容 在你开始学习考试之前&#xff0c;了解考试的形式和内容是很重要的。这将帮助你把注意力集中在最有可能被测试的领域。你应该复习考试…

Gartner Magic Quadrant for SD-WAN 2022 (Gartner 魔力象限:软件定义广域网 2022)

Gartner 魔力象限&#xff1a;SD-WAN 2022 请访问原文链接&#xff1a;https://sysin.org/blog/gartner-magic-quadrant-sd-wan-2022/&#xff0c;查看最新版。原创作品&#xff0c;转载请保留出处。 作者主页&#xff1a;sysin.org Gartner 魔力象限&#xff1a;SD-WAN 2022…

ChatGPT最强竞争对手,无需魔法,直接使用

无需科学上网&#xff0c;无需加入Waitlist&#xff0c;免费使用&#xff0c;没有高峰限制&#xff0c;而且效果媲美ChatGPT&#xff01; ChatGPT 的最强竞争对手Claude!!! Claude 是由Anthropic这家人工智能公司开发出来的&#xff0c;其联合创始人Dario Amodei曾经担任OpenAI…

K8S使用持久化卷存储到NFS(NAS盘)

参考文章&#xff1a;K8S-v1.20中使用PVC持久卷 - 知乎 Persistent Volumes&#xff1a;PV是持久化卷&#xff0c;系统管理员设置的存储&#xff0c;它是群集的一部分&#xff0c;是一种资源&#xff0c;所以它有独立于Pod的生命周期 Persistent Volume Claim&#xff1a;PVC…

抽象同步队列AbstractQueuedSynchronizer(AQS)简要理解

抽象同步队列AbstractQueuedSynchronizer AQS 简要理解 1 什么是AQS2 AQS结构2.1 同步状态2.2 CLH队列2.3 Node 3 AQS流程 https://zhuanlan.zhihu.com/p/370501087 1 什么是AQS AQS&#xff08;AbstractQueuedSynchronizer&#xff09;是 Java 中实现锁和同步器的基础设施&am…

el-input-number中添加单位(css版)

优点: 通过css添加,非常便捷和简单 例如此处,需要添加一个分钟单位 <el-input-number class"input-number" v-model"value" :step"5" :min"30" ></el-input-number> //css <style lang"scss"> .input-nu…

node项目(一) koa脚手架的搭建

一、koa 安装 // 安装koa npm install -g koa-generator // 创建项目 koa2 项目名称 当出现这个框的时候安装完毕 之后就是进入目录文件&#xff0c;根据package.json执行即可 二、出现问题 汇总 问题一&#xff1a;koa-generator安装失败 没有出现koa-generator安装成功 …

分类预测 | MATLAB实现BO-CNN-GRU贝叶斯优化卷积门控循环单元多输入分类预测

分类预测 | MATLAB实现BO-CNN-GRU贝叶斯优化卷积门控循环单元多输入分类预测 目录 分类预测 | MATLAB实现BO-CNN-GRU贝叶斯优化卷积门控循环单元多输入分类预测效果一览基本介绍模型描述程序设计参考资料 效果一览 基本介绍 基于贝叶斯(bayes)优化卷积神经网络-门控循环单元(CN…

4月第2周榜单丨飞瓜数据B站UP主排行榜(哔哩哔哩平台)发布!

飞瓜轻数发布2023年4月10日-4月16日飞瓜数据UP主排行榜&#xff08;B站平台&#xff09;&#xff0c;通过充电数、涨粉数、成长指数三个维度来体现UP主账号成长的情况&#xff0c;为用户提供B站号综合价值的数据参考&#xff0c;根据UP主成长情况用户能够快速找到运营能力强的B…

Linux进程概念——其一

目录 冯诺依曼体系结构 操作系统(Operator System) 概念 设计OS的目的 定位 如何理解 "管理" 总结 系统调用和库函数概念 进程 基本概念 描述进程-PCB task_struct-PCB的一种 task_ struct内容分类 组织进程 查看进程 通过系统调用获取进程标示符 通…

Docker Desktop 占用过多C盘存储空间的一种解决办法——在其他磁盘分区添加访问路径

一、问题背景 Docker Desktop默认是安装到C盘中的。但随着Docker的使用&#xff0c;其占用的空间也越来越大&#xff0c;Docker占用C盘空间过大成了个令人头疼的问题。恰好最近腾出了一个空的磁盘分区&#xff0c;因此可以使用“在其他磁盘分区添加访问路径”的方式&#xff0c…

03-Mybatis的基本使用-注解配置文件+xml配置文件

目录 1、环境准备 2、注解配置文件 基础操作01-通过ID删除数据 基础操作02-插入数据 基础操作03-更新数据 基础操作04-根据ID查询数据 基础操作05-条件查询数据 3、xml配置文件 1、环境准备 1. 创建数据库数据表 -- 部门管理 create table dept(id int unsigned prim…

【数据篇】SpringBoot 整合 MyBatis 组合 Redis 作为数据源缓存

写在最前 MyBatis 是常见的 Java 数据库访问层框架。在日常工作中&#xff0c;开发人员多数情况下是使用 MyBatis 的默认缓存配置&#xff0c;但是 MyBatis 缓存机制有一些不足之处&#xff0c;在使用中容易引起脏数据&#xff0c;形成一些潜在的隐患。 本文介绍的是 Redis 组…

版本控制工具之git安装

作为软件开发者的必备工具——版本控制工具&#xff0c;git无疑深受欢迎。 业界常用的版本控制工具主要有两种&#xff1a;SVN和Git SVN 传统的版本控制工具&#xff0c;特点为集中式分布。 使用一台专用的服务器存储所有资料。 缺点是所有的动作都必须依赖于中央服务器&#x…

FPGA配置方式的基本知识?

FPGA配置粗略可以分为主动和被动两种。主动加载是指由FPGA控制配置流程&#xff0c;被动加载是指FPGA仅仅被动接收配置数据。 最常见的被动配置模式就是JTAG下载bit文件。此模式下&#xff0c;主动发起操作的设备是计算机&#xff0c;数据通路是JTAG&#xff0c;FPGA会被动接收…