Methylation Series 4. Finding Simple CpG Methylation Signatures from Array-Based Methylation Data (CimpleG)

news2024/10/6 8:37:42



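Methylation analysis tutorial series

The Kyohogene (桓峰基因) WeChat public account has published a series of methylation analysis tutorials, listed below:

Methylation Series 1. The past and present of methylation (Methylation)

Methylation Series 2. Introduction to methylation array data and how to download it (GEO)

Methylation Series 3. Complete methylation array data analysis (ChAMP)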



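Introduction

DNA methylation (DNAm) signatures are usually built with multivariate approaches that need hundreds of CpG sites to make a prediction. CimpleG is a computational framework for detecting small CpG methylation signatures that can be used for cell-type classification and deconvolution. CimpleG is time-efficient and performs as well as the best-performing methods for classifying blood cells and other somatic cell types, even though its prediction rests on a single DNA methylation site per cell type. Altogether, CimpleG provides a complete computational framework for delineating DNAm signatures and for cellular deconvolution.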

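Figure: (A) Overview of CimpleG and of the statistics used for feature selection and downstream applications. (B, C) Principal component analysis of the DNAm datasets for somatic cells (B) and leukocytes (C). Cell types highlighted in bold have both training samples (≥10 positive samples) and test data, and were used as target classes. Cell types that are absent from the test data were used only as negative examples (non-target cells).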

Package installation

# Install directly from github:
devtools::install_github("costalab/CimpleG")

# Alternatively, downloading from our release page and installing it from a local source:
#  - ie navigating through your system
install.packages(file.choose(), repos = NULL, type = "source")
#  - ie given a path to a local source
install.packages("~/Downloads/CimpleG_0.0.5.XXXX.tar.gz", repos = NULL, type = "source")
# or
devtools::install_local("~/Downloads/CimpleG_0.0.5.XXXX.tar.gz")

Loading the data

library("CimpleG")

data(train_data)
train_data[1:5,1:5]
##            cg10456121 cg01954746 cg07580588 cg11857210 cg02674789
## GSM2759107  0.5357843  0.9270407  0.9571794  0.9016588  0.9556690
## GSM2759108  0.5516279  0.9351165  0.9500683  0.8665362  0.9421682
## GSM2759109  0.5154959  0.9054998  0.9538037  0.9327675  0.8824507
## GSM2759110  0.5887189  0.9108159  0.9551378  0.9156878  0.9249404
## GSM2759111  0.4917143  0.9397178  0.9519388  0.8655511  0.8840323
data(train_targets)
head(train_targets)
##          gsm         cell_type adipocytes astrocytes blood_cells
## 1 GSM2759107 endothelial_cells          0          0           0
## 2 GSM2759108 endothelial_cells          0          0           0
## 3 GSM2759109 endothelial_cells          0          0           0
## 4 GSM2759110 endothelial_cells          0          0           0
## 5 GSM2759111 endothelial_cells          0          0           0
## 6 GSM2759112 endothelial_cells          0          0           0
##   endothelial_cells epidermal_cells epithelial_cells fibroblasts glia
## 1                 1               0                0           0    0
## 2                 1               0                0           0    0
## 3                 1               0                0           0    0
## 4                 1               0                0           0    0
## 5                 1               0                0           0    0
## 6                 1               0                0           0    0
##   hepatocytes ips_cells msc muscle_cells neurons muscle_sc group_data
## 1           0         0   0            0       0         0      train
## 2           0         0   0            0       0         0      train
## 3           0         0   0            0       0         0      train
## 4           0         0   0            0       0         0      train
## 5           0         0   0            0       0         0      train
## 6           0         0   0            0       0         0      train
##         description
## 1 ENDOTHELIAL.CELLS
## 2 ENDOTHELIAL.CELLS
## 3 ENDOTHELIAL.CELLS
## 4 ENDOTHELIAL.CELLS
## 5 ENDOTHELIAL.CELLS
## 6 ENDOTHELIAL.CELLS
colnames(train_targets)
##  [1] "gsm"               "cell_type"         "adipocytes"       
##  [4] "astrocytes"        "blood_cells"       "endothelial_cells"
##  [7] "epidermal_cells"   "epithelial_cells"  "fibroblasts"      
## [10] "glia"              "hepatocytes"       "ips_cells"        
## [13] "msc"               "muscle_cells"      "neurons"          
## [16] "muscle_sc"         "group_data"        "description"
data(test_data)
test_data[1:5,1:5]
##            cg10456121 cg01954746 cg07580588 cg11857210 cg02674789
## GSM1289142 0.06305196  0.8088967  0.7291303  0.8387920  0.7728405
## GSM1289143 0.09940250  0.8411065  0.7906977  0.8273305  0.7540370
## GSM1289146 0.10658357  0.8103309  0.7638934  0.8142458  0.7384098
## GSM1289144 0.24916585  0.8149448  0.8164471  0.8223350  0.7968497
## GSM1289145 0.11871330  0.8754153  0.8049503  0.7617512  0.7903106
data(test_targets)

# check the train_targets table to see
# what other columns can be used as targets
# colnames(train_targets)
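
As the `head(train_targets)` output above shows, the targets table carries one 0/1 column per cell type next to the `cell_type` label. If you want to prepare your own annotation in the same shape, the following is a minimal base-R sketch. The sample IDs and cell types are made up for illustration, and which other columns CimpleG expects is not covered here.

```r
# Toy sample annotation; the sample IDs and cell types below are made up.
targets <- data.frame(
  gsm = c("S1", "S2", "S3", "S4"),
  cell_type = c("neurons", "glia", "neurons", "fibroblasts"),
  stringsAsFactors = FALSE
)

# Add one 0/1 indicator column per cell type.
for (ct in unique(targets$cell_type)) {
  targets[[ct]] <- as.integer(targets$cell_type == ct)
}

print(targets)
```

Each indicator column can then be named in `target_columns`, in the same way `"neurons"` or `"glia"` are used below.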

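Worked example

CimpleG tries to find the CpGs that best classify the cell types in a given training dataset. It can also perform cell-type deconvolution in a few simple steps, and it accepts either beta values or M-values. This section shows how easily signatures can be generated.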
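
Beta values and M-values are related by the standard logit transform, M = log2(beta / (1 - beta)). The sketch below converts between the two in base R. The helper names `beta_to_m` and `m_to_beta` and the clamping constant `eps` are illustrative choices, not part of CimpleG.

```r
# Convert beta values (between 0 and 1) to M-values and back.
beta_to_m <- function(beta, eps = 1e-6) {
  # Clamp to avoid infinite values at exactly 0 or 1 (eps is an arbitrary choice).
  beta <- pmin(pmax(beta, eps), 1 - eps)
  log2(beta / (1 - beta))
}
m_to_beta <- function(m) 2^m / (2^m + 1)

beta <- c(0.1, 0.5, 0.9)
m <- beta_to_m(beta)
print(m)
print(m_to_beta(m))
```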

Running CimpleG

Running CimpleG is straightforward: call the CimpleG function with a handful of arguments.

# mini example with just 4 target signatures
set.seed(42)
cimpleg_result <- CimpleG(
  train_data = train_data,
  train_targets = train_targets,
  test_data = test_data,
  test_targets = test_targets,
  method = "CimpleG",
  target_columns = c(
    "neurons",
    "glia",
    "blood_cells",
    "fibroblasts"
  )
)
## Training for target 'neurons' with 'CimpleG' has finished.: 1.63 sec elapsed
## Training for target 'glia' with 'CimpleG' has finished.: 0.5 sec elapsed
## Training for target 'blood_cells' with 'CimpleG' has finished.: 0.37 sec elapsed
## Training for target 'fibroblasts' with 'CimpleG' has finished.: 0.36 sec elapsed

cimpleg_result$results
## $neurons
## $neurons$train_res
## $neurons$train_res$fold_id
## # A tibble: 4,090 × 3
##      Row Data     Fold  
##    <int> <chr>    <chr> 
##  1     1 Analysis Fold02
##  2     1 Analysis Fold03
##  3     1 Analysis Fold04
##  4     1 Analysis Fold05
##  5     1 Analysis Fold06
##  6     1 Analysis Fold07
##  7     1 Analysis Fold08
##  8     1 Analysis Fold09
##  9     1 Analysis Fold10
## 10     2 Analysis Fold01
## # ℹ 4,080 more rows
## 
## $neurons$train_res$train_summary
##             id     stat_origin mean_aupr mean_var_a fold_presence
##  1: cg02124957      train_aupr 1.0000000 0.03121190            10
##  2: cg02124957 validation_aupr 1.0000000 0.03121190            10
##  3: cg13700051      train_aupr 1.0000000 0.03278101            10
##  4: cg13700051 validation_aupr 1.0000000 0.03278101            10
##  5: cg14356362      train_aupr 0.9787985 0.02450627            10
##  6: cg14356362 validation_aupr 0.9791667 0.02450627            10
##  7: cg21637776      train_aupr 0.9663700 0.04718167             8
##  8: cg21637776 validation_aupr 1.0000000 0.04718167             8
##  9: cg24548498      train_aupr 1.0000000 0.02497980            10
## 10: cg24548498 validation_aupr 1.0000000 0.02497980            10
## 
## $neurons$train_res$dt_dmsv
##            id diff_means sum_variance pred_type
## 1: cg02124957 -0.5272931  0.008677887     FALSE
## 2: cg13700051 -0.5790852  0.010987231     FALSE
## 3: cg14356362 -0.6419124  0.010098182     FALSE
## 4: cg21637776 -0.3788893  0.006784796     FALSE
## 5: cg24548498  0.5781948  0.008344877      TRUE
## 
## $neurons$train_res$train_results
##            id fold_presence diff_means sum_variance pred_type mean_var_a
## 1: cg24548498            10  0.5781948  0.008344877      TRUE 0.02497980
## 2: cg14356362            10 -0.6419124  0.010098182     FALSE 0.02450627
## 3: cg02124957            10 -0.5272931  0.008677887     FALSE 0.03121190
## 4: cg13700051            10 -0.5790852  0.010987231     FALSE 0.03278101
## 5: cg21637776             8 -0.3788893  0.006784796     FALSE 0.04718167
##    train_aupr validation_aupr   cpg_score train_rank
## 1:  1.0000000       1.0000000 0.001998384          1
## 2:  0.9787985       0.9791667 0.002380850          2
## 3:  1.0000000       1.0000000 0.002496952          3
## 4:  1.0000000       1.0000000 0.002622481          4
## 5:  0.9663700       1.0000000 0.005138542          5
## 
## 
## $neurons$test_perf
##            id fold_presence diff_means sum_variance pred_type mean_var_a
## 1: cg24548498            10  0.5781948  0.008344877      TRUE 0.02497980
## 2: cg14356362            10 -0.6419124  0.010098182     FALSE 0.02450627
## 3: cg02124957            10 -0.5272931  0.008677887     FALSE 0.03121190
## 4: cg13700051            10 -0.5790852  0.010987231     FALSE 0.03278101
## 5: cg21637776             8 -0.3788893  0.006784796     FALSE 0.04718167
##    train_mean_aupr validation_mean_aupr   cpg_score train_rank test_aupr
## 1:       1.0000000            1.0000000 0.001998384          1 0.9215653
## 2:       0.9787985            0.9791667 0.002380850          2 1.0000000
## 3:       1.0000000            1.0000000 0.002496952          3 0.8796205
## 4:       1.0000000            1.0000000 0.002622481          4 0.4853427
## 5:       0.9663700            1.0000000 0.005138542          5 0.6187609
## 
## $neurons$elapsed_time
## Time difference of 1.687599 secs
## 
## 
## $glia
## $glia$train_res
## $glia$train_res$fold_id
## # A tibble: 4,090 × 3
##      Row Data     Fold  
##    <int> <chr>    <chr> 
##  1     1 Analysis Fold01
##  2     1 Analysis Fold02
##  3     1 Analysis Fold03
##  4     1 Analysis Fold05
##  5     1 Analysis Fold06
##  6     1 Analysis Fold07
##  7     1 Analysis Fold08
##  8     1 Analysis Fold09
##  9     1 Analysis Fold10
## 10     2 Analysis Fold01
## # ℹ 4,080 more rows
## 
## $glia$train_res$train_summary
##             id     stat_origin mean_aupr mean_var_a fold_presence
##  1: cg02011981      train_aupr 1.0000000 0.03369357            10
##  2: cg02011981 validation_aupr 1.0000000 0.03369357            10
##  3: cg07644184      train_aupr 0.5982708 0.03479852            10
##  4: cg07644184 validation_aupr 0.7750000 0.03479852            10
##  5: cg11150667      train_aupr 0.7621032 0.04904385            10
##  6: cg11150667 validation_aupr 0.9250000 0.04904385            10
##  7: cg14501977      train_aupr 1.0000000 0.01745930            10
##  8: cg14501977 validation_aupr 1.0000000 0.01745930            10
##  9: cg25737283      train_aupr 0.9556481 0.04227059            10
## 10: cg25737283 validation_aupr 1.0000000 0.04227059            10
## 
## $glia$train_res$dt_dmsv
##            id diff_means sum_variance pred_type
## 1: cg02011981 -0.5970260  0.012028332     FALSE
## 2: cg07644184 -0.4876997  0.008267879     FALSE
## 3: cg11150667 -0.2643837  0.003424957     FALSE
## 4: cg14501977 -0.6653107  0.007728417     FALSE
## 5: cg25737283 -0.4803594  0.009757117     FALSE
## 
## $glia$train_res$train_results
##            id fold_presence diff_means sum_variance pred_type mean_var_a
## 1: cg14501977            10 -0.6653107  0.007728417     FALSE 0.01745930
## 2: cg02011981            10 -0.5970260  0.012028332     FALSE 0.03369357
## 3: cg25737283            10 -0.4803594  0.009757117     FALSE 0.04227059
## 4: cg11150667            10 -0.2643837  0.003424957     FALSE 0.04904385
## 5: cg07644184            10 -0.4876997  0.008267879     FALSE 0.03479852
##    train_aupr validation_aupr   cpg_score train_rank
## 1:  1.0000000           1.000 0.001396744          1
## 2:  1.0000000           1.000 0.002695486          2
## 3:  0.9556481           1.000 0.003825165          3
## 4:  0.7621032           0.925 0.007052477          4
## 5:  0.5982708           0.775 0.009051174          5
## 
## 
## $glia$test_perf
##            id fold_presence diff_means sum_variance pred_type mean_var_a
## 1: cg14501977            10 -0.6653107  0.007728417     FALSE 0.01745930
## 2: cg02011981            10 -0.5970260  0.012028332     FALSE 0.03369357
## 3: cg25737283            10 -0.4803594  0.009757117     FALSE 0.04227059
## 4: cg11150667            10 -0.2643837  0.003424957     FALSE 0.04904385
## 5: cg07644184            10 -0.4876997  0.008267879     FALSE 0.03479852
##    train_mean_aupr validation_mean_aupr   cpg_score train_rank test_aupr
## 1:       1.0000000                1.000 0.001396744          1 0.9808673
## 2:       1.0000000                1.000 0.002695486          2 0.8743197
## 3:       0.9556481                1.000 0.003825165          3 1.0000000
## 4:       0.7621032                0.925 0.007052477          4 1.0000000
## 5:       0.5982708                0.775 0.009051174          5 0.6837868
## 
## $glia$elapsed_time
## Time difference of 0.5351162 secs
## 
## 
## $blood_cells
## $blood_cells$train_res
## $blood_cells$train_res$fold_id
## # A tibble: 4,090 × 3
##      Row Data     Fold  
##    <int> <chr>    <chr> 
##  1     1 Analysis Fold01
##  2     1 Analysis Fold02
##  3     1 Analysis Fold03
##  4     1 Analysis Fold04
##  5     1 Analysis Fold05
##  6     1 Analysis Fold06
##  7     1 Analysis Fold08
##  8     1 Analysis Fold09
##  9     1 Analysis Fold10
## 10     2 Analysis Fold01
## # ℹ 4,080 more rows
## 
## $blood_cells$train_res$train_summary
##             id     stat_origin mean_aupr mean_var_a fold_presence
##  1: cg02522196      train_aupr 0.9989550 0.05494539            10
##  2: cg02522196 validation_aupr 0.9994294 0.05494539            10
##  3: cg04785083      train_aupr 0.9996123 0.02253060            10
##  4: cg04785083 validation_aupr 0.9993818 0.02253060            10
##  5: cg05051606      train_aupr 0.9705120 0.07597889            10
##  6: cg05051606 validation_aupr 0.9819607 0.07597889            10
##  7: cg14286208      train_aupr 0.9985731 0.06884537            10
##  8: cg14286208 validation_aupr 0.9979712 0.06884537            10
##  9: cg18993949      train_aupr 0.9982355 0.07246407            10
## 10: cg18993949 validation_aupr 0.9983854 0.07246407            10
## 
## $blood_cells$train_res$dt_dmsv
##            id diff_means sum_variance pred_type
## 1: cg02522196 -0.7380026   0.02991550     FALSE
## 2: cg04785083 -0.8168825   0.01502010     FALSE
## 3: cg05051606  0.6092552   0.02819070      TRUE
## 4: cg14286208  0.6353021   0.02777695      TRUE
## 5: cg18993949  0.4876390   0.01722916      TRUE
## 
## $blood_cells$train_res$train_results
##            id fold_presence diff_means sum_variance pred_type mean_var_a
## 1: cg04785083            10 -0.8168825   0.01502010     FALSE 0.02253060
## 2: cg02522196            10 -0.7380026   0.02991550     FALSE 0.05494539
## 3: cg14286208            10  0.6353021   0.02777695      TRUE 0.06884537
## 4: cg18993949            10  0.4876390   0.01722916      TRUE 0.07246407
## 5: cg05051606            10  0.6092552   0.02819070      TRUE 0.07597889
##    train_aupr validation_aupr   cpg_score train_rank
## 1:  0.9996123       0.9993818 0.001812508          1
## 2:  0.9989550       0.9994294 0.004411787          2
## 3:  0.9985731       0.9979712 0.005542186          3
## 4:  0.9982355       0.9983854 0.005830916          4
## 5:  0.9705120       0.9819607 0.006553584          5
## 
## 
## $blood_cells$test_perf
##            id fold_presence diff_means sum_variance pred_type mean_var_a
## 1: cg04785083            10 -0.8168825   0.01502010     FALSE 0.02253060
## 2: cg02522196            10 -0.7380026   0.02991550     FALSE 0.05494539
## 3: cg14286208            10  0.6353021   0.02777695      TRUE 0.06884537
## 4: cg18993949            10  0.4876390   0.01722916      TRUE 0.07246407
## 5: cg05051606            10  0.6092552   0.02819070      TRUE 0.07597889
##    train_mean_aupr validation_mean_aupr   cpg_score train_rank test_aupr
## 1:       0.9996123            0.9993818 0.001812508          1 0.9839162
## 2:       0.9989550            0.9994294 0.004411787          2 0.9596942
## 3:       0.9985731            0.9979712 0.005542186          3 0.7353430
## 4:       0.9982355            0.9983854 0.005830916          4 0.7801676
## 5:       0.9705120            0.9819607 0.006553584          5 0.8978199
## 
## $blood_cells$elapsed_time
## Time difference of 0.412195 secs
## 
## 
## $fibroblasts
## $fibroblasts$train_res
## $fibroblasts$train_res$fold_id
## # A tibble: 4,090 × 3
##      Row Data     Fold  
##    <int> <chr>    <chr> 
##  1     1 Analysis Fold01
##  2     1 Analysis Fold02
##  3     1 Analysis Fold04
##  4     1 Analysis Fold05
##  5     1 Analysis Fold06
##  6     1 Analysis Fold07
##  7     1 Analysis Fold08
##  8     1 Analysis Fold09
##  9     1 Analysis Fold10
## 10     2 Analysis Fold01
## # ℹ 4,080 more rows
## 
## $fibroblasts$train_res$train_summary
##             id     stat_origin mean_aupr mean_var_a fold_presence
##  1: cg02837162      train_aupr 0.6718360  0.3429565             5
##  2: cg02837162 validation_aupr 0.6244753  0.3429565             5
##  3: cg02907837      train_aupr 0.6822593  0.2570682            10
##  4: cg02907837 validation_aupr 0.7013197  0.2570682            10
##  5: cg03369247      train_aupr 0.8324525  0.2781751            10
##  6: cg03369247 validation_aupr 0.8561180  0.2781751            10
##  7: cg03509193      train_aupr 0.6585173  0.3324427             9
##  8: cg03509193 validation_aupr 0.7008693  0.3324427             9
##  9: cg26165286      train_aupr 0.6508256  0.3180929             6
## 10: cg26165286 validation_aupr 0.6143620  0.3180929             6
## 
## $fibroblasts$train_res$dt_dmsv
##            id diff_means sum_variance pred_type
## 1: cg02837162 -0.3433228   0.04093078     FALSE
## 2: cg02907837 -0.4095691   0.04309653     FALSE
## 3: cg03369247 -0.5998845   0.10002811     FALSE
## 4: cg03509193 -0.3343467   0.03717846     FALSE
## 5: cg26165286 -0.5503741   0.10177788     FALSE
## 
## $fibroblasts$train_res$train_results
##            id fold_presence diff_means sum_variance pred_type mean_var_a
## 1: cg03369247            10 -0.5998845   0.10002811     FALSE  0.2781751
## 2: cg02907837            10 -0.4095691   0.04309653     FALSE  0.2570682
## 3: cg03509193             9 -0.3343467   0.03717846     FALSE  0.3324427
## 4: cg26165286             6 -0.5503741   0.10177788     FALSE  0.3180929
## 5: cg02837162             5 -0.3433228   0.04093078     FALSE  0.3429565
##    train_aupr validation_aupr  cpg_score train_rank
## 1:  0.8324525       0.8561180 0.02536830          1
## 2:  0.6822593       0.7013197 0.02672967          2
## 3:  0.6585173       0.7008693 0.03666839          3
## 4:  0.6508256       0.6143620 0.05465925          4
## 5:  0.6718360       0.6244753 0.06894682          5
## 
## 
## $fibroblasts$test_perf
##            id fold_presence diff_means sum_variance pred_type mean_var_a
## 1: cg03369247            10 -0.5998845   0.10002811     FALSE  0.2781751
## 2: cg02907837            10 -0.4095691   0.04309653     FALSE  0.2570682
## 3: cg03509193             9 -0.3343467   0.03717846     FALSE  0.3324427
## 4: cg26165286             6 -0.5503741   0.10177788     FALSE  0.3180929
## 5: cg02837162             5 -0.3433228   0.04093078     FALSE  0.3429565
##    train_mean_aupr validation_mean_aupr  cpg_score train_rank test_aupr
## 1:       0.8324525            0.8561180 0.02536830          1 0.7548379
## 2:       0.6822593            0.7013197 0.02672967          2 0.7377733
## 3:       0.6585173            0.7008693 0.03666839          3 0.6241444
## 4:       0.6508256            0.6143620 0.05465925          4 0.9547866
## 5:       0.6718360            0.6244753 0.06894682          5 0.4816497
## 
## $fibroblasts$elapsed_time
## Time difference of 0.4029391 secs
# adjust target names to match signature names
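
The result tables above rank each candidate CpG by the area under the precision-recall curve (AUPR) on the training, validation, and test data. To make the metric concrete, here is a small base-R sketch that computes average precision, one common estimate of AUPR, for a single CpG on made-up beta values. The function `average_precision` is hypothetical and CimpleG's own AUPR computation may differ in detail. In the output above, `pred_type` is FALSE for CpGs with a negative `diff_means`, so the sketch flips the score for that case.

```r
# Average precision as a simple estimate of the area under the
# precision-recall curve for one score vector.
average_precision <- function(score, is_target) {
  ord <- order(score, decreasing = TRUE)
  hits <- is_target[ord]
  precision_at_k <- cumsum(hits) / seq_along(hits)
  mean(precision_at_k[hits])
}

# Made-up beta values for one CpG: low in target samples (hypomethylated).
beta <- c(0.10, 0.15, 0.20, 0.80, 0.85, 0.90, 0.30)
is_target <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

# For a hypomethylated marker, a lower beta should mean "more likely target",
# so the score is 1 - beta. Using beta directly ranks the samples backwards.
ap_hypo <- average_precision(1 - beta, is_target)
ap_hyper <- average_precision(beta, is_target)
print(c(hypo = ap_hypo, hyper = ap_hyper))
```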

Plotting the CimpleG CpG signatures

# check generated signatures
plt <- signature_plot(
  cimpleg_result,
  train_data,
  train_targets,
  sample_id_column = "gsm",
  true_label_column = "cell_type"
)
print(plt$plot)

Difference of means vs. sum of variances (dmsv) plots

Basic plot

plt <- diffmeans_sumvariance_plot(
  data = train_data,
  target_vector = train_targets$neurons == 1
)
print(plt)
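
To see what the two axes of this plot measure, the sketch below computes both statistics by hand on a simulated beta matrix. It assumes that `diff_means` is the mean of the target samples minus the mean of all other samples, and that `sum_variance` adds the two group variances. This matches the sign pattern in the results above, but it is not taken from the package source.

```r
set.seed(1)
# Toy beta matrix: 6 samples (rows) x 3 CpGs (columns); all values are simulated.
toy <- cbind(
  cg_marker = c(runif(3, 0.80, 0.95), runif(3, 0.05, 0.20)),
  cg_noisy  = runif(6, 0.05, 0.95),
  cg_flat   = runif(6, 0.45, 0.55)
)
is_target <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Assumed definitions: target mean minus non-target mean, and the sum of
# the per-group sample variances.
diff_means <- colMeans(toy[is_target, ]) - colMeans(toy[!is_target, ])
sum_variance <- apply(toy[is_target, ], 2, var) + apply(toy[!is_target, ], 2, var)

print(data.frame(id = colnames(toy), diff_means, sum_variance))
```

A good single-CpG marker sits far from zero on the x-axis (large absolute difference of means) and low on the y-axis (small summed variance), like `cg_marker` here.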

Adding color and highlighting the selected features

df_dmeansvar <- compute_diffmeans_sumvar(
  data = train_data,
  target_vector = train_targets$neurons == 1
)

parab_param <- .7

df_dmeansvar$is_selected <- select_features(
  x = df_dmeansvar$diff_means,
  y = df_dmeansvar$sum_variance,
  a = parab_param
)

plt <- diffmeans_sumvariance_plot(
  data = df_dmeansvar,
  label_var1 = "Neurons",
  color_all_points = "purple",
  threshold_func = function(x, a) (a * x) ^ 2,
  is_feature_selected_col = "is_selected",
  func_factor = parab_param
)
print(plt)
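
The `threshold_func` passed to the plot draws a parabola, `(a * x)^2`, in the dmsv plane. A plausible reading, stated here as an assumption and not checked against the internals of `select_features`, is that a CpG is selected when its summed variance falls below that parabola at its difference of means. The sketch below applies that rule to made-up values.

```r
# Made-up (diff_means, sum_variance) pairs for five CpGs.
dmsv <- data.frame(
  id = paste0("cg", 1:5),
  diff_means   = c(-0.60, 0.55, 0.10, -0.30, 0.02),
  sum_variance = c(0.010, 0.020, 0.015, 0.080, 0.001)
)

threshold_func <- function(x, a) (a * x)^2  # same form as in the plot call
a <- 0.7

# Assumed rule: keep a CpG when its summed variance lies below the parabola.
dmsv$threshold <- threshold_func(dmsv$diff_means, a)
dmsv$is_selected <- dmsv$sum_variance < dmsv$threshold
print(dmsv)
```

Under this rule a larger `a` widens the selected region, so more CpGs pass. Only the first two toy CpGs pass at `a = 0.7`.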

Labeling the signature CpGs

plt <- diffmeans_sumvariance_plot(
  data = df_dmeansvar,
  feats_to_highlight = cimpleg_result$signatures
)
print(plt)

Deconvolution plots

A minimal example with just 4 signatures

deconv_result <- run_deconvolution(
  cpg_obj = cimpleg_result,
  new_data = test_data
)

plt <- deconvolution_barplot(
  deconvoluted_data = deconv_result,
  meta_data = test_targets,
  sample_id = "gsm",
  true_label = "cell_type"
)
print(plt$plot)

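A more advanced example

This example is a little more advanced. First, to have additional deconvolution results to compare against, we build two more models with CimpleG: one that uses only hypermethylated signatures, and one that uses 3 CpGs per signature instead of one.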

set.seed(42)
cimpleg_hyper <- CimpleG(
  train_data = train_data,
  train_targets = train_targets,
  test_data = test_data,
  test_targets = test_targets,
  method = "CimpleG",
  pred_type = "hyper",
  target_columns = c(
    "neurons",
    "glia",
    "blood_cells",
    "fibroblasts"
  )
)
## Training for target 'neurons' with 'CimpleG' has finished.: 0.48 sec elapsed
## Training for target 'glia' with 'CimpleG' has finished.: 0.28 sec elapsed
## Training for target 'blood_cells' with 'CimpleG' has finished.: 0.33 sec elapsed
## Training for target 'fibroblasts' with 'CimpleG' has finished.: 0.28 sec elapsed


deconv_hyper <- run_deconvolution(
  cpg_obj = cimpleg_hyper,
  new_data = test_data
)


set.seed(42)
cimpleg_3sigs <- CimpleG(
  train_data = train_data,
  train_targets = train_targets,
  test_data = test_data,
  test_targets = test_targets,
  method = "CimpleG",
  n_sigs = 3,
  target_columns = c(
    "neurons",
    "glia",
    "blood_cells",
    "fibroblasts"
  )
)
## Training for target 'neurons' with 'CimpleG' has finished.: 0.38 sec elapsed
## Training for target 'glia' with 'CimpleG' has finished.: 0.36 sec elapsed
## Training for target 'blood_cells' with 'CimpleG' has finished.: 0.37 sec elapsed
## Training for target 'fibroblasts' with 'CimpleG' has finished.: 0.5 sec elapsed

deconv_3sigs <- run_deconvolution(
  cpg_obj = cimpleg_3sigs,
  new_data = test_data
)

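Let's also create some fake ground-truth values so that we can compare all the results. Remember that this is only an example; the results themselves are meaningless!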

deconv_3sigs$prop_3sigs <- deconv_3sigs$proportion
deconv_hyper$prop_hyper <- deconv_hyper$proportion
deconv_result$prop_cimpleg <- deconv_result$proportion

dummy_deconvolution_data <-
  deconv_result |> 
  dplyr::mutate(true_vals = proportion + runif(nrow(deconv_result), min=-0.1,max=0.1)) |>
  dplyr::select(cell_type,sample_id,prop_cimpleg,true_vals) |>
  dplyr::left_join(deconv_hyper |> dplyr::select(-proportion), by=c("sample_id","cell_type")) |>
  dplyr::left_join(deconv_3sigs |> dplyr::select(-proportion), by=c("sample_id","cell_type")) |>
  dplyr::mutate_if(is.numeric, function(x){ifelse(x<0,0,x)}) |>
  dplyr::mutate_if(is.numeric, function(x){ifelse(x>1,1,x)}) |> 
  tibble::as_tibble()

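Now let's use some of the plotting functions for comparing deconvolution results. First, we can check how the predicted values compare with the true values.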

install.packages("broom")
## package 'broom' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
## 	C:\Users\Lenovo\AppData\Local\Temp\RtmpOonYSV\downloaded_packages
scatter_plts <- CimpleG:::deconv_pred_obs_plot(
  deconv_df = dummy_deconvolution_data,
  true_values_col = "true_vals",
  predicted_cols = c("prop_cimpleg","prop_hyper","prop_3sigs"),
  sample_id_col = "sample_id",
  group_col= "cell_type"
)
scatter_panel <- scatter_plts |> patchwork::wrap_plots(ncol=1)

print(scatter_panel)

More interestingly, we can look in detail at one of the metrics used to evaluate the deconvolution results and rank the methods by it.

rank_plts <- CimpleG:::deconv_ranking_plot(
  deconv_df = dummy_deconvolution_data,
  true_values_col = "true_vals",
  predicted_cols = c("prop_cimpleg","prop_hyper","prop_3sigs"),
  sample_id_col = "sample_id",
  group_col= "cell_type",
  metrics = "rmse"
)
rank_panel <- list(rank_plts$perf_boxplt[[1]],rank_plts$nemenyi_plt[[1]]) |> patchwork::wrap_plots()

print(rank_panel)
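
The ranking above uses `metrics = "rmse"`. As a reminder of what that metric measures, the following base-R sketch computes the root-mean-square error between true and predicted proportions for two hypothetical methods. The column names and values are made up, and `deconv_ranking_plot` may aggregate the error differently (for example per sample or per cell type).

```r
# Made-up proportions for four sample/cell-type pairs.
toy <- data.frame(
  true_vals    = c(0.50, 0.20, 0.20, 0.10),
  prop_method1 = c(0.45, 0.25, 0.20, 0.10),
  prop_method2 = c(0.30, 0.40, 0.10, 0.20)
)

rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# One RMSE per prediction column; lower is better.
scores <- sapply(toy[c("prop_method1", "prop_method2")], rmse, obs = toy$true_vals)
print(sort(scores))
```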


Reference

  1. Maié T, Schmidt M, Erz M, Wagner W, G Costa I. CimpleG: finding simple CpG methylation signatures. Genome Biol. 2023 Jul 10;24(1):161. doi: 10.1186/s13059-023-03000-0. PMID: 37430364; PMCID: PMC10332104.
