【科研绘图系列】R语言绘制SCI论文图合集

禁止商业或二改转载，仅供自学使用，侵权必究，如需截取部分内容请后台联系作者!

在这里插入图片描述

文章目录

- 介绍
- 加载R包
- 数据下载
- Load data
- Figure 1
- - Fig 1B: functional assays adhension
  - FIG 1C: Functional assays OPK
- Figure 2
- - Fig 2C: Settings and function for heatmap
  - Fig 2C (continued): automate comparisons for IgG (1 month)
  - Fig 2C (continued): Automate comparisons for IgA (1week)
  - Related to Fig 2C - Supp Figure 1: heatmaps for IgG 1wk and IgA 1 mo.
  - Fig 2D and E: Column graphs of the #antigens each individual responded to for serum IgG
  - Figure 2D and E (continued): Column graph of the #antigens each individual responded to for saliva IgA
  - Fig 2F: graphing significant antigens raw values
  - Related to Fig 2F - Supplementary Fig S3 and S4: Graph each antigen seperately - raw titres
  - Fig 2G : Heatmap of correlation coefficients between fold change in IgA and IgG.
- Fig 3: Luminex data over 3 months
- Fig 4: luminex compare responses to pharyngitis children
- Figure 5: Luminex: Correlation of pre-challenge titre with 1 month fold change:
- Fig 6 and 7: multivariate analysis - data preparation
- Fig 6: Multidimensional scaling
- Fig 6 (continued) : compare MDS1
- Fig 7: Correlation with outcome variables
- Figure 7 (continued) : sPLS-DA
- 系统信息

介绍

【科研绘图系列】R语言绘制SCI论文图合集

加载R包

library(ggplot2)
library(tidyverse)
library(reshape2)
library(ggbeeswarm)
library(pheatmap)
library(RColorBrewer)
library(viridis)
library(vegan)
library(ggpubr)
library(gghalves)
library(Polychrome)
library(uwot)
library(mixOmics)
library(cowplot)

数据下载

所需要的数据的下载链接：

百度网盘链接: 从百度网盘下载
提取码: t2hs

在这里插入图片描述

Load data

load("RawData.RData")

Figure 1

The following code can be used to generate figures for the saliva adhesion assay and opsonic phagocytosis assay (OPK)

Fig 1B: functional assays adhension

#split out timepoints
Adhesion.pre <- data[,c("id","Pharyngitis","Adehesion_percent_Pre_saliva")]
Adhesion.post <- data[,c("id","Pharyngitis","Adehesion_percent_1_week_saliva")]
#define new timepoint variable
Adhesion.pre$time_point<- "Pre.challenge"
Adhesion.post$time_point<- "1week"
#rename vars
names(Adhesion.pre) <- c("id"  ,"Pharyngitis","Adhension Percent", "time_point")     
names(Adhesion.post) <- c("id"  ,"Pharyngitis","Adhension Percent", "time_point")     
#bind back together
Adhesion <- rbind(Adhesion.pre, Adhesion.post)
Adhesion$time_point <- factor(Adhesion$time_point, levels = c("Pre.challenge","1week"))
#make new grouping variable for use with graphing. 
Adhesion$Group <- factor(paste(Adhesion$Pharyngitis,Adhesion$time_point, sep = "_"), levels = c("no pharyngitis_Pre.challenge", "no pharyngitis_1week", "pharyngitis_Pre.challenge", "pharyngitis_1week"))

#plot the data
ggplot(Adhesion, aes(Group,Adhesion$`Adhension Percent`,fill = Pharyngitis))+
geom_violin(aes(alpha = time_point), draw_quantiles = c(0.5), scale= "width", position = "dodge")+
geom_beeswarm(size = 3, alpha = 0.75, cex =1, shape = 21)+
geom_line(aes(group = id, color = Pharyngitis), size = 0.35, alpha = 0.8)+
scale_fill_manual(values = c("blue", "red"))+
scale_alpha_manual(values = c(0.15, 0.5))+
scale_color_manual(values = c("blue", "red"))+
ylab("Percent Adhesion")+
  xlab("")+
  scale_y_continuous(expand = c(0, 0))+
  coord_cartesian(ylim = c(0,200))+
  custom_theme()
  
#Save the plot to a PDF file
 ggsave(file = "Adhesion.pdf", width = 4, height = 4 )

# calculate p-value for pharyngitis group, pre and post, paired Wilcoxon-signed rank test
wilcox.test(Adhesion[Adhesion$Pharyngitis == "pharyngitis" & Adhesion$time_point == "Pre.challenge",]$`Adhension Percent`, Adhesion[Adhesion$Pharyngitis == "pharyngitis"& Adhesion$time_point == "1week",]$`Adhension Percent`, paired = T)  
# calculate p-value for no pharyngitis group, pre and post, paired Wilcoxon-signed rank test
wilcox.test(Adhesion[Adhesion$Pharyngitis == "no pharyngitis" & Adhesion$time_point == "Pre.challenge",]$`Adhension Percent`, Adhesion[Adhesion$Pharyngitis == "no pharyngitis"& Adhesion$time_point == "1week",]$`Adhension Percent`, paired = T)
# calculate p-value for pre challenge, pharyngitis vs no pharyngitis, Mann-Whitney test 
wilcox.test(Adhesion[Adhesion$Pharyngitis == "pharyngitis" & Adhesion$time_point == "Pre.challenge",]$`Adhension Percent`, Adhesion[Adhesion$Pharyngitis == "no pharyngitis"& Adhesion$time_point == "Pre.challenge",]$`Adhension Percent`, paired = F)
# calculate p-value for 1 month, pharyngitis vs no pharyngitis, Mann-Whitney test
wilcox.test(Adhesion[Adhesion$Pharyngitis == "no pharyngitis" & Adhesion$time_point == "1week",]$`Adhension Percent`, Adhesion[Adhesion$Pharyngitis == "pharyngitis"& Adhesion$time_point == "1week",]$`Adhension Percent`, paired = F)

#remove dataframes that wont be used again
rm(Adhesion, Adhesion.post, Adhesion.pre)

FIG 1C: Functional assays OPK

#Split out time-points into seperate dataframe
OPK.pre <- data[,c("id","Pharyngitis","Opsonic_index_pre")] 
OPK.post <- data[,c("id","Pharyngitis","Opsonic_index_1mo")]

# Add time pont variable
OPK.pre$time_point<- "Pre.challenge" 
OPK.post$time_point<- "1month"

#rename columns for consistency
names(OPK.pre) <- c("id"  ,"Pharyngitis","Opsonic_index", "time_point")     
names(OPK.post) <- c("id"  ,"Pharyngitis","Opsonic_index", "time_point")     

#bind the the pre and post timepoints back together
OPK <- rbind(OPK.pre, OPK.post)

# Make time_point a factor with specified levels
OPK$time_point <- factor(OPK$time_point, levels = c("Pre.challenge","1month"))

# Create a binary variable to distinguish two individuals with OPK response
OPK$responders <- ifelse(OPK$id %in% c("SN010", "SN013"), "Y", "N")

# Create a new grouping variable that distinguishes based on responders and pharyngitis variables
OPK$Group <- factor(paste(OPK$responders, OPK$Pharyngitis, OPK$time_point, sep = "_"), 
                    levels = c("Y_no pharyngitis_Pre.challenge", "Y_no pharyngitis_1month", 
                               "Y_pharyngitis_Pre.challenge", "Y_pharyngitis_1month",
                               "N_no pharyngitis_Pre.challenge", "N_no pharyngitis_1month", 
                               "N_pharyngitis_Pre.challenge", "N_pharyngitis_1month"))

#plot data
ggplot(OPK, aes(Group, Opsonic_index, fill = Pharyngitis))+
geom_col(stat="identity",position = "dodge", aes(alpha = time_point))+
geom_beeswarm(size = 3, alpha = 0.75, cex =0.5, shape = 21)+
scale_fill_manual(values = c("blue", "red"))+
scale_alpha_manual(values = c(0.15, 0.5))+
ylab("OPK index")+
  xlab("")+
  scale_y_continuous(expand = c(0, 0))+
  coord_cartesian(ylim = c(0,100))+
  custom_theme()+
  theme( axis.text.x = element_text(colour = "black", size = 17, angle = 90)) # change orientation of X-axis text

# Save the plot to a PDF file
ggsave(file = "OPK.pdf", width = 8, height = 7 )


#remove datasets that dont get used again
rm(OPK, OPK.pre, OPK.post)

Figure 2

The following code can be used to generate figures for the the analysis of ELISA data shown in figure 2 and linked supplementary figures

#Fig 2B: Heatmap Saliva IgA and Serum IgG and Saliva IgA
Note: Column order in heatmap was manually altered in Adobe Illustrator

#use this for both pheatmap plotting objects
breaksList <- seq(from =-2.5, to = 2.5, by = 0.1)

#Generate new dataframe subsetting into pre-challenge and 1 month data for IgA
plotting <- data_long[data_long$timepoint %in% c("Pre.challenge") & data_long$Type == "IgA",]
plotting_1 <- data_long[data_long$timepoint %in% c("1week") & data_long$Type == "IgA",]

#add other variables back in
plotting <- left_join(plotting, outcome, by = "id")
plotting_1 <- left_join(plotting_1, outcome, by = "id")

#bind them back together
plotting <- rbind(plotting, plotting_1)
plotting <-(plotting[complete.cases(plotting[,colnames(plotting) %in% Antigen.Order]),])

#transpose 
plotting_t <- t(plotting[,c(colnames(plotting) %in% Antigen.Order)])
#add column names
colnames(plotting_t) <- paste(plotting$id, plotting$timepoint, sep = "_")

#generate annotation variables for heatmap
mat_col <- data.frame(outcome=as.factor(ifelse(plotting$id %in% pharyngitis, "pharyngitis", "no pharyngitis"))) 
rownames(mat_col) <- colnames(plotting_t) 
mat_colors <- list(outcome = c("blue","red"))  
names(mat_colors$outcome) <- levels(mat_col$outcome) 

#Plot data
pdf(file = "heatmap_raw_titres_IgA.pdf", height = 4, width = 6)
pheatmap(plotting_t, 
         breaks = breaksList,
         color = viridis(50),
         cellwidth = 2.5, cellheight =9, 
         border_color = "black",
         scale = "row",
         cluster_rows = F,
         cluster_cols = F, 
         clustering_distance_cols = "correlation", 
         clustering_method = "ward.D2", 
         annotation_col = mat_col, annotation_colors = mat_colors, 
         annotation_legend = TRUE,
         show_rownames = T, show_colnames = F, 
         fontsize = 8, display_numbers = F, number_format = "%.2f")
dev.off()

# Now lets plot IgG
#Generate new dataframe subsetting into pre-challenge and 1 month data
plotting <- data_long[data_long$timepoint %in% c("Pre.challenge") & data_long$Type == "IgG",]
plotting_1 <- data_long[data_long$timepoint %in% c("1month") & data_long$Type == "IgG",]
#add other variables back in
plotting <- left_join(plotting, outcome, by = "id")
plotting_1 <- left_join(plotting_1, outcome, by = "id")

#bind then back together
plotting <- rbind(plotting, plotting_1)
#remove the two individuals for IgG without 1 month data
plotting <-(plotting[complete.cases(plotting[,colnames(plotting) %in% Antigen.Order]),])

#transform the data
plotting_t <- t(plotting[,c(colnames(plotting) %in% Antigen.Order)])
colnames(plotting_t) <- paste(plotting$id, plotting$timepoint, sep = "_")

colnames(plotting_t) <- paste(plotting$id, plotting$timepoint, sep = "_")

#generate annotation variables for heatmap
mat_col <- data.frame(outcome=as.factor(ifelse(plotting$id %in% pharyngitis, "pharyngitis", "no pharyngitis"))) 
rownames(mat_col) <- colnames(plotting_t) 
mat_colors <- list(outcome = c("blue","red"))  
names(mat_colors$outcome) <- levels(mat_col$outcome) 

pdf(file = "heatmap_raw_titres_IgG.pdf", height = 4, width = 6)
pheatmap(plotting_t, 
         breaks = breaksList,
         color = viridis(length(breaksList)),,
         cellwidth = 2.5, cellheight =9, 
         border_color = "black",
         scale = "row",
         cluster_rows = F,
         cluster_cols = F, 
         clustering_distance_cols = "correlation", 
         clustering_method = "ward.D2", 
         annotation_col = mat_col, annotation_colors = mat_colors, 
         annotation_legend = TRUE,
         show_rownames = T, show_colnames = F, 
         fontsize = 8, display_numbers = F, number_format = "%.2f")
dev.off()

#remove dataframes/variables no longer needed
rm(mat_col, mat_colors, plotting, plotting_1, plotting_t, breaksList)

Fig 2C: Settings and function for heatmap

# define colour scheme
breaksList = seq(from = 0.5, to = 1.5, by = 0.005)
heatpal <- c("seagreen","white", "mediumpurple3")

# heatmap function:
heatIT <- function(heated) {
pheatmap(heated, 
         kmeans_k = NA, 
         breaks = breaksList, 
         color = colorRampPalette(heatpal)(length(breaksList)),
         border_color = "black",
         cellwidth = 11, cellheight =11, 
         # display_numbers = test_labels,
         number_format = "%",
         scale = "none", 
         cluster_rows = F,
         cluster_cols = F, 
         number_color = "grey",
         show_rownames = T, show_colnames = F, 
         fontsize = 8, display_numbers = F,
         filename = NA, silent = FALSE)
}

Fig 2C (continued): automate comparisons for IgG (1 month)

#Select IgG variable names
Ig.Names <- names(data[,32:150])
#Select within Ig.names for these variables at the following time-points
PreChallenge <- Ig.Names[grepl("_Pre.challenge", Ig.Names)]
OneMonth <- Ig.Names[grepl("1month", Ig.Names)]
OneMonth <- OneMonth[!grepl("1monthFC", OneMonth)]
FoldChange <-  Ig.Names[grepl("1monthFC", Ig.Names)]

#subset data into each timepoint
PreChallenge <- data[,PreChallenge] 
OneMonth <- data[,OneMonth] 
FoldChange <- data[,FoldChange] 

#tidy variable names
names(PreChallenge) <- gsub("IgG_", "", names(PreChallenge))
names(PreChallenge) <- gsub("_Pre.challenge", "", names(PreChallenge))
names(OneMonth) <- gsub("IgG_", "", names(OneMonth))
names(OneMonth) <- gsub("_1month", "", names(OneMonth))
names(FoldChange) <- gsub("IgG_", "", names(FoldChange))
names(FoldChange) <- gsub("_1monthFC", "", names(FoldChange))

#add outcome variable
PreChallenge$Pharyngitis <- data$Pharyngitis
OneMonth$Pharyngitis <- data$Pharyngitis
FoldChange$Pharyngitis <- data$Pharyngitis

Postchallenge = matrix(data = 0, nrow = 17, ncol = 11) #initialize matrix
for(i in 1:17){
  Postchallenge[i,1] <- median(FoldChange[c(1:19),i],na.rm = T) #median FC pharyngitis participants
  Postchallenge[i,2] <- median(FoldChange[c(20:25),i],na.rm = T) #median FC non pharyngitis participants
  Postchallenge[i,3] <- wilcox.test(FoldChange[c(1:19),i],FoldChange[c(20:25),i], paired = F, alternative = "two.sided", na.rm = T)$p.value
  Postchallenge[i,4] <- wilcox.test(PreChallenge[c(1:19),i],OneMonth[c(1:19),i], paired = T, alternative = "two.sided", na.rm = T)$p.value 
  Postchallenge[i,5] <- wilcox.test(PreChallenge[c(20:25),i],OneMonth[c(20:25),i], paired = T, alternative = "two.sided", na.rm = T)$p.value
  Postchallenge[i,6] <- wilcox.test(PreChallenge[c(1:19),i],PreChallenge[c(20:25),i], paired = F, alternative = "two.sided", na.rm = T)$p.value
  Postchallenge[i,7] <- wilcox.test(OneMonth[c(1:19),i],OneMonth[c(20:25),i], paired = F, alternative = "two.sided", na.rm = T)$p.value
}
for(i in 1:17){
Postchallenge[i,c(8:9)] <- p.adjust(Postchallenge[i,c(4:5)], method = "fdr" )
Postchallenge[i,c(10:11)] <- p.adjust(Postchallenge[i,c(6:7)], method = "fdr" )
}

#make into a data-frame
Postchallenge<- as.data.frame(Postchallenge)
#define column names 
names(Postchallenge) <- c("FC_Pharyn", "FC_NoPharyn","p_FcvsFC","p_0vs1_pharyn","p_0vs1_nopharyn","p_0vs0", "p_1vs1","p_0vs1_pharyn.adj","p_0vs1_nopharyn.adj","p_0vs0.adj", "p_1vs1.adj")

#name rows with antigen names
rownames(Postchallenge) <- names(FoldChange)[1:17] 
#name rows with antigen names
Postchallenge <- Postchallenge[c(Antigen.Order,additional),]#reorder 

# export table of FC and p-values to use to annotate figures. 
write.csv(file = "Postchallenge.IgG.csv", Postchallenge)

# Make heatmap - Main antigens 
heated <- (Postchallenge[Antigen.Order,c(2,1)])
pdf(file = "IgG_FcHeatmap.pdf", width = 3, height = c((nrow(heated)/12)+1))
heatIT(heated)
dev.off()
# Make heatmap - additional antigens 
heated <- (Postchallenge[additional,c(2,1)])
pdf(file = "IgG_FcHeatmap.additional.pdf", width = 3, height = c((nrow(heated)/12)+1))
heatIT(heated)
dev.off()

Fig 2C (continued): Automate comparisons for IgA (1week)

# Select IgA variables
Ig.Names <- names(data[,151:201])

#subset variable names for each timepoint
PreChallenge <- Ig.Names[grepl("_Pre.challenge", Ig.Names)] # extract variable names
OneWeek <- Ig.Names[grepl("1week", Ig.Names)] # extract variable names
OneWeek <- OneWeek[!grepl("1weekFC", OneWeek)] # extract variable names
FoldChange <-  Ig.Names[grepl("1weekFC", Ig.Names)] # extract variable names 

#subset data by names
PreChallenge <- data[,PreChallenge] #extract prechallenge data
OneWeek <- data[,OneWeek]  #extract fold change data
FoldChange <- data[,FoldChange] #extract fold change data

#tidy variable names
names(PreChallenge) <- gsub("IgA_", "", names(PreChallenge))
names(PreChallenge) <- gsub("_Pre.challenge", "", names(PreChallenge))
names(OneWeek) <- gsub("IgA_", "", names(OneWeek))
names(OneWeek) <- gsub("_1week", "", names(OneWeek))
names(FoldChange) <- gsub("IgA_", "", names(FoldChange))
names(FoldChange) <- gsub("_1weekFC", "", names(FoldChange))

#ADD pharyngitis variable
PreChallenge$Pharyngitis <- data$Pharyngitis
OneWeek$Pharyngitis <- data$Pharyngitis
FoldChange$Pharyngitis <- data$Pharyngitis

#double check that column names are identical
names(FoldChange) == names(PreChallenge)  
names(PreChallenge) == names(OneWeek)  

#generate fold change and p-value matrix
Postchallenge.IgA = matrix(data = 0, nrow = 17, ncol = 11) #initialize matrix
for(i in 1:17){
  Postchallenge.IgA[i,1] <- median(FoldChange[c(1:19),i],na.rm = T) #median FC pharyngitis participants
  Postchallenge.IgA[i,2] <- median(FoldChange[c(20:25),i],na.rm = T) #median FC non pharyngitis participants
  Postchallenge.IgA[i,3] <- wilcox.test(FoldChange[c(1:19),i],FoldChange[c(20:25),i], paired = F, alternative = "two.sided", na.rm = T)$p.value
  Postchallenge.IgA[i,4] <- wilcox.test(PreChallenge[c(1:19),i],OneWeek[c(1:19),i], paired = T, alternative = "two.sided", na.rm = T)$p.value 
  Postchallenge.IgA[i,5] <- wilcox.test(PreChallenge[c(20:25),i],OneWeek[c(20:25),i], paired = T, alternative = "two.sided", na.rm = T)$p.value
  Postchallenge.IgA[i,6] <- wilcox.test(PreChallenge[c(1:19),i],PreChallenge[c(20:25),i], paired = F, alternative = "two.sided", na.rm = T)$p.value
  Postchallenge.IgA[i,7] <- wilcox.test(OneWeek[c(1:19),i],OneWeek[c(20:25),i], paired = F, alternative = "two.sided", na.rm = T)$p.value
 }
for(i in 1:17){
Postchallenge.IgA[i,c(8:9)] <- p.adjust(Postchallenge.IgA[i,c(4:5)], method = "fdr" )
Postchallenge.IgA[i,c(10:11)] <- p.adjust(Postchallenge.IgA[i,c(6:7)], method = "fdr" )
}

#make into a dataframe
Postchallenge.IgA<- as.data.frame(Postchallenge.IgA)
#name columns
names(Postchallenge.IgA) <- c("FC_Pharyn", "FC_NoPharyn","p_FcvsFC","p_0vs1_pharyn","p_0vs1_nopharyn","p_0vs0", "p_1vs1","p_0vs1_pharyn.adj","p_0vs1_nopharyn.adj","p_0vs0.adj", "p_1vs1.adj")
#name rows
rownames(Postchallenge.IgA) <- names(OneWeek)[1:17]

#export p-values for annoating graph. 
write.csv(file = "Postchallenge.IgA.csv", Postchallenge.IgA)

#generate heatmap for IgA main antigens
heated <- (Postchallenge.IgA[Antigen.Order,c(2,1)])
pdf(file = "IgA_FcHeatmap.pdf", width = 3, height = c((nrow(heated)/12)+1))
heatIT(heated)
dev.off()

#generate heatmap for IgA additional antigens
heated <- (Postchallenge.IgA[additional,c(2,1)])
pdf(file = "IgA_FcHeatmap.additional.pdf", width = 3, height = c((nrow(heated)/12)+1))
heatIT(heated)
dev.off()

Related to Fig 2C - Supp Figure 1: heatmaps for IgG 1wk and IgA 1 mo.

#IgA 1 month data. used only for Supp fig x: 
FigS1_IgA$Mrp24 <- FigS1_IgA$Mrp24 +1
FigS1_IgA$P145 <-FigS1_IgA$P145 +1
pre <- FigS1_IgA[FigS1_IgA$X %in% "pre",]
mo1 <- FigS1_IgA[FigS1_IgA$X %in% "1mo",]
Foldchange_1mo <- mo1[,c(3:19)]/ pre[,c(3:19)]

mo1$Saliva.IgA.AU <- pre$Saliva.IgA.AU # add participant ID
Foldchange_1mo$Saliva.IgA.AU <- pre$Saliva.IgA.AU 

#keep only the 18 pharyngitis participants 
pre <- pre[pre$Saliva.IgA.AU %in% pharyngitis,]
mo1 <- mo1[mo1$Saliva.IgA.AU %in% pharyngitis,]
Foldchange_1mo <- Foldchange_1mo[pre$Saliva.IgA.AU %in% pharyngitis,]

#generate fold change and p-value matrix
Postchallenge.IgA.1month = matrix(data = 0, nrow = 17, ncol = 3) #initialize matrix
for(i in 1:17){
Postchallenge.IgA.1month[i,1] <- median(Foldchange_1mo[,i],na.rm = T) #median FC pharyngitis
Postchallenge.IgA.1month[i,2] <- wilcox.test(pre[,i+2],mo1[,i+2], paired = T, alternative = "two.sided", na.rm = T)$p.value
Postchallenge.IgA.1month[i,3] <- p.adjust(Postchallenge.IgA.1month[i,2], method = "fdr" )
}

#make into a dataframe
Postchallenge.IgA.1month<- as.data.frame(Postchallenge.IgA.1month)
#name columns
names(Postchallenge.IgA.1month) <- c("FC_Pharyn", "p_0vs1_pharyn","p_0vs1_nopharyn.adj")
#name rows
rownames(Postchallenge.IgA.1month) <- names(Foldchange_1mo[,1:17])

#export p-values for annoating graph. 
write.csv(file = "Postchallenge.IgA.1month.csv", Postchallenge.IgA.1month)

#generate heatmap for IgA main antigens
heated <- (Postchallenge.IgA.1month[Antigen.Order,])
pdf(file = "IgA_FcHeatmap_1mo.pdf", width = 3, height = c((nrow(heated)/12)+1))
heatIT(heated)
dev.off()

#generate heatmap for IgA additional antigens
heated <- (Postchallenge.IgA.1month[additional,])
pdf(file = "IgA_FcHeatmap.additional_1mo.pdf", width = 3, height = c((nrow(heated)/12)+1))
heatIT(heated[,])
dev.off()


# Supp Fig 1 (continued) IgG 1 week. 
Ig.Names <- names(data[,32:150])
PreChallenge <- Ig.Names[grepl("_Pre.challenge", Ig.Names)]
OneWeek <- Ig.Names[grepl("1week", Ig.Names)]
OneWeek <- OneWeek[!grepl("1weekFC", OneWeek)]
FoldChange <-  Ig.Names[grepl("1weekFC", Ig.Names)]

PreChallenge <- data[,PreChallenge] 
OneWeek <- data[,OneWeek] 
FoldChange <- data[,FoldChange] 

names(PreChallenge) <- gsub("IgG_", "", names(PreChallenge))
names(PreChallenge) <- gsub("_Pre.challenge", "", names(PreChallenge))
PreChallenge$Pharyngitis <- data$Pharyngitis

names(OneWeek) <- gsub("IgG_", "", names(OneWeek))
names(OneWeek) <- gsub("_1week", "", names(OneWeek))
OneWeek$Pharyngitis <- data$Pharyngitis

names(FoldChange) <- gsub("IgG_", "", names(FoldChange))
names(FoldChange) <- gsub("_1weekFC", "", names(FoldChange))
FoldChange$Pharyngitis <- data$Pharyngitis

names(FoldChange) == names(PreChallenge) # double check
names(PreChallenge) == names(OneWeek) # double check

Postchallenge_1wk = matrix(data = 0, nrow = 17, ncol = 11) #initialize matrix
for(i in 1:17){
  Postchallenge_1wk[i,1] <- median(FoldChange[c(1:19),i],na.rm = T) #median FC pharyngitis participants
  Postchallenge_1wk[i,2] <- median(FoldChange[c(20:25),i],na.rm = T) #median FC non pharyngitis participants
  Postchallenge_1wk[i,3] <- wilcox.test(FoldChange[c(1:19),i],FoldChange[c(20:25),i], paired = F, alternative = "two.sided", na.rm = T)$p.value
  Postchallenge_1wk[i,4] <- wilcox.test(PreChallenge[c(1:19),i],OneWeek[c(1:19),i], paired = T, alternative = "two.sided", na.rm = T)$p.value 
  Postchallenge_1wk[i,5] <- wilcox.test(PreChallenge[c(20:25),i],OneWeek[c(20:25),i], paired = T, alternative = "two.sided", na.rm = T)$p.value
  Postchallenge_1wk[i,6] <- wilcox.test(PreChallenge[c(1:19),i],PreChallenge[c(20:25),i], paired = F, alternative = "two.sided", na.rm = T)$p.value
  Postchallenge_1wk[i,7] <- wilcox.test(OneWeek[c(1:19),i],OneWeek[c(20:25),i], paired = F, alternative = "two.sided", na.rm = T)$p.value
}
for(i in 1:17){
Postchallenge_1wk[i,c(8:9)] <- p.adjust(Postchallenge_1wk[i,c(4:5)], method = "fdr" )
Postchallenge_1wk[i,c(10:11)] <- p.adjust(Postchallenge_1wk[i,c(6:7)], method = "fdr" )
}

#make into a data-frame
Postchallenge_1wk<- as.data.frame(Postchallenge_1wk)
#define column names 
names(Postchallenge_1wk) <- c("FC_Pharyn", "FC_NoPharyn","p_FcvsFC","p_0vs1_pharyn","p_0vs1_nopharyn","p_0vs0", "p_1vs1","p_0vs1_pharyn.adj","p_0vs1_nopharyn.adj","p_0vs0.adj", "p_1vs1.adj")

rownames(Postchallenge_1wk) <- names(FoldChange)[1:17] #name rows with antigen names
Postchallenge_1wk <- Postchallenge_1wk[c(Antigen.Order,additional),]#reorder 

# export table of FC and p-values to use to annotate figures. 
write.csv(file = "Postchallenge.IgG_1wk.csv", Postchallenge_1wk)

# Make heatmap - Main antigens
heated <- (Postchallenge_1wk[Antigen.Order,c(2,1)])
pdf(file = "IgG_FcHeatmap.pdf_1wk.pdf", width = 3, height = c((nrow(heated)/12)+1))
heatIT(heated)
dev.off()
# Make heatmap - additional antigens 
heated <- (Postchallenge_1wk[additional,c(2,1)])
pdf(file = "IgG_FcHeatmap.additional_1wk.pdf", width = 3, height = c((nrow(heated)/12)+1))
heatIT(heated)
dev.off()

Fig 2D and E: Column graphs of the #antigens each individual responded to for serum IgG

# Specifying the antigens of interest
Filtered_Ags <- c("IgG_M75_1monthFC", "IgG_GAC_1monthFC", "IgG_SLO_1monthFC", "IgG_ScpA_1monthFC", 
                  "IgG_SPYAD_1monthFC", "IgG_SPYCEP_1monthFC", "IgG_ADI_1monthFC", "IgG_Enn336_1monthFC", 
                  "IgG_Mrp24_1monthFC", "IgG_TF_1monthFC", "IgG_T25_1monthFC")

#remove the two individuals without 1 month data
FC.vars <- data[!data$id %in% c("SN017","SN075"),]

# Initializing a binary matrix to store response data
binary <- matrix(nrow = 23, ncol = 11)

# Populating the binary matrix based on a 25% increase threshold
for(i in 1:11){
  binary[,i] <- if_else(FC.vars[, Filtered_Ags[i]] > 1.25, 1, 0)
}

# Converting the binary matrix to a data frame
binary <- as.data.frame(binary)
names(binary) <- Filtered_Ags

# Counting the number of positive responses (value == 1) for each row
binary$count.1 <- apply(binary, 1, function(x) length(which(x == 1)))

# Setting row names to match the IDs
row.names(binary) <- FC.vars$id

# Adding the Pharyngitis column from the original dataset
binary$Pharyngitis <- FC.vars$Pharyngitis

# Adding a new column to the original dataset to tally IgG responses
data$Tally_IgG <- ifelse(data$id %in% rownames(binary), binary$count.1, NA) 

# Creating a subset for specific responders to use in MDS analysis (Fig 6)
four.responders <- binary[,c("IgG_SPYCEP_1monthFC","IgG_GAC_1monthFC" ,"IgG_SLO_1monthFC","IgG_ScpA_1monthFC")]
four.responders$count <- apply(four.responders, 1, function(x) length(which(x==1)))
four.responders$count_2 <- ifelse(four.responders$count <2, "no", "yes")
four.responders <- rownames(four.responders[four.responders$count >2,])

# plot results (Fig2E IgG)
ggplot(binary, aes(Pharyngitis, count.1, fill = Pharyngitis))+
  geom_violin(draw_quantiles = c(0.5), scale= "width", alpha = 0.25)+
  geom_beeswarm(size = 3, alpha = 0.75, cex =5, shape = 21)+
  scale_fill_manual(values = c("blue", "red"))+
  ylab("Antigen response (#)")+
  xlab("")+
  stat_compare_means(method = "wilcox.test", tip.length = 0.005, bracket.size = 0.5, vjust = -1, paired = F,
                     aes(label = paste0("p = ", ..p.format..)))+
  coord_cartesian( ylim = c(0, 11))+
  scale_y_continuous(breaks = seq(0, 11, len = 12))+
  custom_theme()

#Saving the plot to a PDF file
ggsave(file = "Antigen_response_25pc_IgG_Tally.pdf", width = 3.25, height = 4 )

# Calculate the count of positive responses for each antigen in the "pharyngitis" group
Antigen.Tally <- apply(binary[binary$Pharyngitis == "pharyngitis",1:11], 2, function(x) length(which(x==1)))
# Define and clean antigen names
Antigen.Tally.Names <-Filtered_Ags
Antigen.Tally.Names <- gsub("_1monthFC", "", Antigen.Tally.Names)
Antigen.Tally.Names <- gsub("IgG_", "", Antigen.Tally.Names)

# Create a data frame for antigen tally
Antigen.Tally <- data.frame(Antigen = Antigen.Tally.Names, Count = Antigen.Tally)
Antigen.Tally$Antigen <- factor(Antigen.Tally$Antigen, levels = Antigen.Order)
Antigen.Tally$Percent <- (Antigen.Tally$Count/nrow(binary[binary$Pharyngitis == "pharyngitis",]))*100
Antigen.Tally$Pharyngitis <- "pharyngitis"

# Repeat the same process for the "no pharyngitis" group
Antigen.Tally.np <- apply(binary[binary$Pharyngitis == "no pharyngitis",1:11], 2, function(x) length(which(x==1)))
# Create a data frame for antigen tally in the "no pharyngitis" group
Antigen.Tally.np <- data.frame(Antigen = Antigen.Tally.Names, Count = Antigen.Tally.np)
Antigen.Tally.np$Antigen <- factor(Antigen.Tally.np$Antigen, levels = Antigen.Order)
Antigen.Tally.np$Percent <- (Antigen.Tally.np$Count/nrow(binary[binary$Pharyngitis == "no pharyngitis",]))*100
Antigen.Tally.np$Pharyngitis <- "no pharyngitis"

# Combine both groups into a single data frame
Antigen.Tally<- rbind(Antigen.Tally, Antigen.Tally.np)
Antigen.Tally$Pharyngitis <- factor(Antigen.Tally$Pharyngitis, levels = c("no pharyngitis","pharyngitis"))
# Reorder antigens by count for visualization
Antigen.Tally.Order<- Antigen.Tally.Names[rev(order(Antigen.Tally$Count))]
Antigen.Tally$Antigen <- factor(Antigen.Tally$Antigen, levels = Antigen.Order)

# Graph % and faceted by outcome (Fig2D IgG)
ggplot(Antigen.Tally, aes(x = Antigen, y = Percent, fill = Pharyngitis)) +
  geom_col(alpha = 0.75, color= "black") +
  geom_text(aes(label=Count), nudge_y = 3)+
scale_fill_manual(values = c("blue","red"))+
  xlab("")+
  ylab("Responders (%)")+
   scale_y_continuous(expand = c(0, 0))+
  coord_cartesian( ylim = c(0, 100))+
  facet_grid(. ~ Pharyngitis)+
  custom_theme()+ 
  theme(axis.text.x=element_text(angle = 90, vjust = 0.5, hjust=1))
#Save the percentage plot to a PDF file
ggsave(file = "Percent.responders.core.pdf", width = 5, height = 4)

rm(binary)
rm(FC.vars)

Figure 2D and E (continued): Column graph of the #antigens each individual responded to for saliva IgA

# Specifying the antigens of interest
Filtered_Ags <- c("IgA_M75_1weekFC", "IgA_GAC_1weekFC", "IgA_SLO_1weekFC", "IgA_ScpA_1weekFC", 
                  "IgA_SPYAD_1weekFC", "IgA_SPYCEP_1weekFC", "IgA_ADI_1weekFC", "IgA_Enn336_1weekFC", 
                  "IgA_Mrp24_1weekFC", "IgA_TF_1weekFC", "IgA_T25_1weekFC")

# build FC.vars dataframe
FC.vars <- data
# Initializing a binary matrix to store response data
binary <- matrix(nrow = 25, ncol = 11)

# Populating the binary matrix based on a 25% increase threshold
for(i in 1:11){
  binary[,i] <- if_else(FC.vars[, Filtered_Ags[i]] > 1.25, 1, 0)
}

# Converting the binary matrix to a data frame
binary <- as.data.frame(binary)
names(binary) <- Filtered_Ags

# Counting the number of positive responses (value == 1) for each row
binary$count.1 <- apply(binary, 1, function(x) length(which(x == 1)))

# Setting row names to match the IDs
row.names(binary) <- FC.vars$id

# Adding the Pharyngitis column from the original dataset
binary$Pharyngitis <- FC.vars$Pharyngitis

# Adding a new column to the original dataset to tally IgA responses
data$Tally_IgA <- ifelse(data$id %in% rownames(binary), binary$count.1, NA) 

# plot results (Fig2E IgA)
ggplot(binary, aes(Pharyngitis, count.1, fill = Pharyngitis))+
  geom_violin(draw_quantiles = c(0.5), scale= "width", alpha = 0.25)+
  geom_beeswarm(size = 3, alpha = 0.75, cex =5, shape = 21)+
  scale_fill_manual(values = c("blue", "red"))+
  ylab("Antigen response (#)")+
  xlab("")+
  stat_compare_means(method = "wilcox.test", tip.length = 0.005, bracket.size = 0.5, vjust = -1, paired = F,
                     aes(label = paste0("p = ", ..p.format..)))+
  coord_cartesian( ylim = c(0, 11))+
  scale_y_continuous(breaks = seq(0, 11, len = 12))+
  custom_theme()

#Saving the plot to a PDF file
ggsave(file = "Antigen_response_25pc_IgA_Tally.pdf", width = 3.25, height = 4 )

# Calculate the count of positive responses for each antigen in the "pharyngitis" group
Antigen.Tally <- apply(binary[binary$Pharyngitis == "pharyngitis",1:11], 2, function(x) length(which(x==1)))
# Define and clean antigen names
Antigen.Tally.Names <-Filtered_Ags
Antigen.Tally.Names <- gsub("_1weekFC", "", Antigen.Tally.Names)
Antigen.Tally.Names <- gsub("IgA_", "", Antigen.Tally.Names)

# Create a data frame for antigen tally
Antigen.Tally <- data.frame(Antigen = Antigen.Tally.Names, Count = Antigen.Tally)
Antigen.Tally$Antigen <- factor(Antigen.Tally$Antigen, levels = Antigen.Order)
Antigen.Tally$Percent <- (Antigen.Tally$Count/nrow(binary[binary$Pharyngitis == "pharyngitis",]))*100
Antigen.Tally$Pharyngitis <- "pharyngitis"

# Repeat the same process for the "no pharyngitis" group
Antigen.Tally.np <- apply(binary[binary$Pharyngitis == "no pharyngitis",1:11], 2, function(x) length(which(x==1)))
# Create a data frame for antigen tally in the "no pharyngitis" group
Antigen.Tally.np <- data.frame(Antigen = Antigen.Tally.Names, Count = Antigen.Tally.np)
Antigen.Tally.np$Antigen <- factor(Antigen.Tally.np$Antigen, levels = Antigen.Order)
Antigen.Tally.np$Percent <- (Antigen.Tally.np$Count/nrow(binary[binary$Pharyngitis == "no pharyngitis",]))*100
Antigen.Tally.np$Pharyngitis <- "no pharyngitis"

# Combine both groups into a single data frame
Antigen.Tally<- rbind(Antigen.Tally, Antigen.Tally.np)
Antigen.Tally$Pharyngitis <- factor(Antigen.Tally$Pharyngitis, levels = c("no pharyngitis","pharyngitis"))
# Reorder antigens by count for visualization
Antigen.Tally.Order<- Antigen.Tally.Names[rev(order(Antigen.Tally$Count))]
Antigen.Tally$Antigen <- factor(Antigen.Tally$Antigen, levels = Antigen.Order)

# Graph % and faceted by outcome (Fig2D IgA)
ggplot(Antigen.Tally, aes(x = Antigen, y = Percent, fill = Pharyngitis)) +
  geom_col(alpha = 0.75, color= "black") +
  geom_text(aes(label=Count), nudge_y = 3)+
scale_fill_manual(values = c("blue","red"))+
  xlab("")+
  ylab("Responders (%)")+
   scale_y_continuous(expand = c(0, 0))+
  coord_cartesian( ylim = c(0, 100))+
  facet_grid(. ~ Pharyngitis)+
  custom_theme()+ 
  theme(axis.text.x=element_text(angle = 90, vjust = 0.5, hjust=1))
#Save the percentage plot to a PDF file
ggsave(file = "Percent.responders.core.pdf", width = 5, height = 4)

#remove objects and dataframes that wont be used again
rm(binary)
rm(FC.vars)
rm(Filtered_Ags)

Fig 2F: graphing significant antigens raw values

# Define a function for plotting
plot_antigen <- function(data, antigen, antibody_type, timepoints, file_suffix) {
  # Filter data
  plotting <- data %>%
    filter(Antigen == antigen, timepoint %in% timepoints, Type == antibody_type)
# Define group variable
  plotting$Group <- factor(paste(plotting$Pharyngitis, plotting$timepoint, sep= "_"), 
                           levels = c(paste("no pharyngitis", timepoints, sep="_"), 
                                      paste("pharyngitis", timepoints, sep="_")))
  plotting$id <- factor(plotting$id)
  
  # Plot the graph
  p <- ggplot(plotting, aes(Group, value, fill = Pharyngitis)) +
    geom_violin(aes(alpha = timepoint), draw_quantiles = c(0.5), scale= "width", position = "dodge") +
    geom_beeswarm(size = 3, alpha = 0.75, cex =2, shape = 21) +
    geom_line(aes(group = id, color = Pharyngitis), size = 0.35, alpha = 0.8) +
    scale_fill_manual(values = c("blue", "red")) +
    scale_alpha_manual(values = c(0.15, 0.5)) +
    scale_color_manual(values = c("blue", "red")) +
    ylab(paste(antigen, "(ELISA AU)")) +
    xlab("") +
    coord_cartesian(ylim = c(0, max(plotting$value, na.rm = TRUE) * 1.25)) +
    custom_theme()
  
  # Save the plot
  ggsave(file = paste(antigen, file_suffix, ".pdf", sep = "_"), plot = p, width = 4, height = 4)
}

# Prepare data for IgG
rep.plotting_igg <- data_melt %>%
  filter(timepoint %in% c("Pre.challenge", "1month"), Type == "IgG")

# Antigens to plot for IgG
antigens_igg <- c("SPYCEP", "SLO", "ScpA", "GAC", "TF")

# Plot IgG antigens
for (antigen in antigens_igg) {
  plot_antigen(rep.plotting_igg, antigen, "IgG", c("Pre.challenge", "1month"), "rawtitres")
}

# Prepare IgA data
rep.plotting_iga <- data_melt %>%
  filter(timepoint %in% c("Pre.challenge", "1week"), Type == "IgA")

# Antigens to plot for IgA
antigens_iga <- c("GAC", "M75.HVR")

# Plot IgA antigens
for (antigen in antigens_iga) {
  plot_antigen(rep.plotting_iga, antigen, "IgA", c("Pre.challenge", "1week"), "IgA.rawtitres")
}

#remove variables and dataframes not used again
rm(rep.plotting_iga, rep.plotting_igg, antigens_iga, antigens_igg)

Related to Fig 2F - Supplementary Fig S3 and S4: Graph each antigen seperately - raw titres

#make a group variable with short names for the graph
rep.plotting <- data_melt[data_melt$timepoint %in% c("Pre.challenge","1month")& data_melt$Type == "IgG",]

rep.plotting$Group <- factor(paste(rep.plotting$Pharyngitis, rep.plotting$timepoint, sep= "_"), 
                         levels = c("no pharyngitis_Pre.challenge", "no pharyngitis_1month", 
                                    "pharyngitis_Pre.challenge", "pharyngitis_1month"))
rep.plotting$Group <- ifelse(rep.plotting$Group == "no pharyngitis_Pre.challenge", "NP.Pre", ifelse(rep.plotting$Group == "no pharyngitis_1month", "NP.1mo",ifelse(rep.plotting$Group == "pharyngitis_Pre.challenge", "P.Pre", "P.1mo")))

rep.plotting$Group <- factor(rep.plotting$Group, 
                         levels = c("NP.Pre","NP.1mo","P.Pre",  "P.1mo" ))

rep.plotting$id <- factor(rep.plotting$id)

#make the list
Antigen.order.all <- c(Antigen.Order, additional)
rep.plotting$Antigen <- factor(rep.plotting$Antigen , levels = Antigen.order.all )
antigen_levels <- unique(levels(rep.plotting$Antigen))

removeAgs <- c("GAC", "SLO", "ScpA", "SPYCEP") # These are already plotted
is_in_removeAgs <- antigen_levels %in% removeAgs
antigen_levels <- antigen_levels[!is_in_removeAgs]

#IgG data first
plot_list1 <- lapply(antigen_levels, function(antigen) {
  # Subset the data for the current level of 'Antigen'
  subset_df <- rep.plotting[rep.plotting$Antigen == antigen, ]
    ggplot(subset_df, aes(Group, value, fill = Pharyngitis))+
geom_violin(aes(alpha = timepoint), draw_quantiles = c(0.5), scale= "width", position = "dodge")+
geom_point(size = 3, alpha = 0.75, shape = 21)+
geom_line(aes(group = id, color = Pharyngitis), size = 0.35, alpha = 0.8)+
scale_fill_manual(values = c("blue", "red"))+
scale_alpha_manual(values = c(0.15, 0.5))+
scale_color_manual(values = c("blue", "red"))+
ylab(paste(antigen, "(ELISA AU)"))+
xlab("")+
coord_cartesian( ylim = c(0, max(subset_df$value, na.rm = T)*1.25))+
theme_bw()+
theme(legend.position = "none", axis.text.x=element_text(colour="black", size = 9), axis.text.y=element_text(colour="black", size = 9), panel.grid.major = element_blank(), 
        panel.grid.minor  = element_blank(), axis.title.y = element_text(size = 12))
})

#IgA data next
rep.plotting <- data_melt[data_melt$timepoint %in% c("Pre.challenge","1week")& data_melt$Type == "IgA",]
rep.plotting$Group <- factor(paste(rep.plotting$Pharyngitis, rep.plotting$timepoint, sep= "_"), 
                         levels = c("no pharyngitis_Pre.challenge", "no pharyngitis_1week", 
                                    "pharyngitis_Pre.challenge", "pharyngitis_1week"))
rep.plotting$Group <- ifelse(rep.plotting$Group == "no pharyngitis_Pre.challenge", "NP.Pre", ifelse(rep.plotting$Group == "no pharyngitis_1week", "NP.1wk",ifelse(rep.plotting$Group == "pharyngitis_Pre.challenge", "P.Pre", "P.1wk")))
rep.plotting$Group <- factor(rep.plotting$Group, 
                         levels = c("NP.Pre","NP.1wk","P.Pre",  "P.1wk" ))
rep.plotting$id <- factor(rep.plotting$id)
rep.plotting$Antigen <- factor(rep.plotting$Antigen , levels = Antigen.order.all )
antigen_levels <- unique(levels(rep.plotting$Antigen))

removeAgs <- c("GAC", "M75.HVR") # These are already plotted
is_in_removeAgs <- antigen_levels %in% removeAgs
antigen_levels <- antigen_levels[!is_in_removeAgs]


plot_list2 <- lapply(antigen_levels, function(antigen) {
  # Subset the data for the current level of 'Antigen'
  subset_df <- rep.plotting[rep.plotting$Antigen == antigen, ]
    ggplot(subset_df, aes(Group, value, fill = Pharyngitis))+
geom_violin(aes(alpha = timepoint), draw_quantiles = c(0.5), scale= "width", position = "dodge")+
geom_point(size = 3, alpha = 0.75, shape = 21)+
geom_line(aes(group = id, color = Pharyngitis), size = 0.35, alpha = 0.8)+
scale_fill_manual(values = c("blue", "red"))+
scale_alpha_manual(values = c(0.15, 0.5))+
scale_color_manual(values = c("blue", "red"))+
ylab(paste(antigen, "(ELISA AU)"))+
xlab("")+
coord_cartesian( ylim = c(0, max(subset_df$value, na.rm = T)*1.25))+
theme_bw()+
theme(legend.position = "none", axis.text.x=element_text(colour="black", size = 9), axis.text.y=element_text(colour="black", size = 9), panel.grid.major = element_blank(), 
        panel.grid.minor  = element_blank(), axis.title.y = element_text(size = 12))
})

# Step 2: Use lapply() to generate a list of ggplots for each 'Antigen' level
combined_plot1<- plot_grid(plotlist= plot_list1[1:9], ncol = 3, align = 'h')
combined_plot2<- plot_grid(plotlist = plot_list1[10:17], ncol = 3, align = 'h')
combined_plot3<- plot_grid(plotlist = plot_list2[1:9], ncol = 3, align = 'h')
combined_plot4<- plot_grid(plotlist = plot_list2[10:17], ncol = 3, align = 'h')

# Now, use ggsave to save the combined plot to a file
ggsave("combined_plots1_IgG.pdf", plot = combined_plot1, width = 8, height = 8, dpi = 300)
ggsave("combined_plots2_IgG.pdf", plot = combined_plot2, width = 8, height = 8, dpi = 300)
ggsave("combined_plots3_IgA.pdf", plot = combined_plot3, width = 8, height = 8, dpi = 300)
ggsave("combined_plots4_IgA.pdf", plot = combined_plot4, width = 8, height = 8, dpi = 300)

rm(plot_list1, plot_list2, combined_plot1, combined_plot2, combined_plot3, combined_plot4)

Fig 2G : Heatmap of correlation coefficients between fold change in IgA and IgG.

# heatmap function:
breaksList = c(-1.0, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1) # a ramp of vales for log2 FC 
heatpal <- c("#2166AC" ,"#3F8DC0", "#80B9D8" ,"#BBDAEA" ,"#F7F7F7" ,"#F4A582", "#D6604D", "#B2182B", "#67001F")

heatIT <- function(heated) {pheatmap(heated, 
         kmeans_k = NA, 
         breaks = breaksList, 
         color = colorRampPalette(heatpal)(11),
         border_color = "black",
         cellwidth = 11, cellheight =11, 
         # display_numbers = test_labels,
         number_format = "%",
         scale = "none", 
         cluster_rows = F,
         cluster_cols = F, 
         number_color = "grey",
         show_rownames = T, show_colnames = F, 
         fontsize = 8, display_numbers = F,
         filename = NA, silent = FALSE)
}

#extract variables
FC.Names <- names(data[,grepl("FC", names(data))])
#further subset variable names
IgA <- FC.Names[grepl("IgA", FC.Names)]
OneWeek <- FC.Names[grepl("week", FC.Names) ]
OneWeek <- OneWeek[!grepl("IgA", OneWeek) ]
OneMonth <- FC.Names[grepl("1month", FC.Names)]

#subset data by timepoint
IgA <- data[,IgA] 
OneWeek <- data[,OneWeek] 
OneMonth <- data[,OneMonth] 

#tidy variable names
names(IgA) <- gsub("IgA_", "", names(IgA))
names(IgA) <- gsub("_1weekFC", "", names(IgA))
names(OneWeek) <- gsub("IgG_", "", names(OneWeek))
names(OneWeek) <- gsub("_1weekFC", "", names(OneWeek))
names(OneMonth) <- gsub("IgG_", "", names(OneMonth))
names(OneMonth) <- gsub("_1monthFC", "", names(OneMonth))

# keep main antigens only
IgA <- IgA[,Antigen.Order]
OneWeek <- OneWeek[,Antigen.Order]
OneMonth <- OneMonth[,Antigen.Order]

#initiate matrix
Postchallenge = matrix(data = 0, nrow = 11, ncol = 4) #initialize matrix
for(i in 1:11){
  Postchallenge[i,1] <- cor.test(IgA[,i], OneWeek[,i], method = "spearman")$estimate #median FC pharyngitis participants
  Postchallenge[i,2] <- cor.test(IgA[,i], OneMonth[,i], method = "spearman")$estimate  #median FC non pharyngitis participants
  Postchallenge[i,3] <- cor.test(IgA[,i], OneWeek[,i], method = "spearman")$p.value 
  Postchallenge[i,4] <- cor.test(IgA[,i], OneMonth[,i], method = "spearman")$p.value
}
#make into a dataframe and name columns and rows
Postchallenge<- as.data.frame(Postchallenge)
names(Postchallenge) <- c("Rho.1week", "Rho.1month", "pvalOneweek", "pvalOneMonth")
row.names(Postchallenge) <- names(IgA)

#graph it
heated <- (Postchallenge[Antigen.Order,c(1,2)])
pdf(file = "IgA.correlationFcHeatmap.pdf", width = 3, height = c((nrow(heated)/12)+1))
heatIT(heated)
dev.off()

#remove dataframes and variables no longer needed 
rm(mo1, heated, Postchallenge, Postchallenge.IgA, Postchallenge_1wk, Postchallenge.IgA.1month, rep.plotting_iga, breaksList, is_in_removeAgs,heatpal)

Fig 3: Luminex data over 3 months

#define plotting function
PlotIT <-function(d){ggplot(data=d, aes(y=y)) +
  geom_line(aes(x=xj, group=id), color= "black", alpha = .8) +
  geom_point(data = d %>% filter(x=="1"), aes(x=xj), shape = 21, fill = 'darkgrey', size = 2.5,
              alpha = .7) +
   geom_point(data = d %>% filter(x=="2"), aes(x=xj), shape = 21, fill= '#F0E442', size = 2.5,
              alpha = .7) +
   geom_point(data = d %>% filter(x=="3"), aes(x=xj), shape = 21, fill= '#1FAF9D', size = 2.5,
              alpha = .7) +
  geom_point(data = d %>% filter(x=="4"), aes(x=xj), shape = 21, fill = '#521ADA', size = 2.5,
              alpha = .7) +
  geom_half_violin(
     data = d %>% filter(x=="1"),aes(x = x, y = y), draw_quantiles = c(0.5),position = position_nudge(x = 3.7), 
     side = "l", fill = 'darkgrey') +
 
   geom_half_violin(
     data = d %>% filter(x=="2"),aes(x = x, y = y), draw_quantiles = c(0.5),position = position_nudge(x = 2.9), 
     side = "l", fill = "#F0E442") +
  
   geom_half_violin(
     data = d %>% filter(x=="3"),aes(x = x, y = y), draw_quantiles = c(0.5),position = position_nudge(x = 2.1), 
     side = "l", fill = '#1FAF9D') +
 
   geom_half_violin(
     data = d %>% filter(x=="4"),aes(x = x, y = y), draw_quantiles = c(0.5),position = position_nudge(x = 1.3), 
     side = "l", fill = '#521ADA')+ 
  
   #Define additional settings
    scale_x_continuous(breaks=c(1,2,3,4), labels=c("Pre", "1wk","1mo", "3mo"), limits=c(0.75, 5.3)) +
   xlab("") + ylab(paste(var, "log10 ug/mL")) +
  coord_cartesian(ylim= ylimits)+
   ggtitle(paste(group, "over.time")) +
   custom_theme()+
    theme(axis.text.x=element_text(angle = 90))
}

# Function to adjust p-values
p_values <- function(d, e) {
  p <- c(
    wilcox.test(d[d$x == 1,]$y, d[d$x == 2,]$y, paired = TRUE)$p.value,
    wilcox.test(d[d$x == 1,]$y, d[d$x == 3,]$y, paired = TRUE)$p.value,
    wilcox.test(d[d$x == 1,]$y, d[d$x == 4,]$y, paired = TRUE)$p.value
  )
  p.adj.paired <- p.adjust(p, method = "fdr", n = length(p))
  
  p <- c(
    wilcox.test(d[d$x == 1,]$y, e[e$x == 1,]$y, paired = FALSE)$p.value,
    wilcox.test(d[d$x == 2,]$y, e[e$x == 2,]$y, paired = FALSE)$p.value,
    wilcox.test(d[d$x == 3,]$y, e[e$x == 3,]$y, paired = FALSE)$p.value,
    wilcox.test(d[d$x == 4,]$y, e[e$x == 4,]$y, paired = FALSE)$p.value
  )
  p.adj.groups <- p.adjust(p, method = "fdr", n = length(p))
  
  return(list(paired = p.adj.paired, groups = p.adj.groups))
}


#luminex data split out timepoints
OneMonth <-luminex[luminex$timepoint == "1month",]
PreChallenge <-luminex[luminex$timepoint == "Pre.challenge",]

#remove the 2 individuals with missing one month datapoints
PreChallenge <-PreChallenge[PreChallenge$id %in% OneMonth$id,]

#Create normalised dataframe
OneMonthFC <-OneMonth # to get the other vars
OneMonthFC[,c(6:11)] <- OneMonth[,c(6:11)] / PreChallenge[,c(6:11)]

# Keep data for participants with samples present at all time-points
keep.Ids <- luminex %>% group_by(id) %>% tally() %>% filter(n == 4) %>% pull(id)

#SpyCEP responders
var <- "SPYCEP"
#Split into group based on a 'non-response' of < 1.25 fold change
SpyCEP.non.responders <- OneMonthFC[which(OneMonthFC$SpyCEP < 1.25),]$id
#use this to generate new variable
luminex$SPYCEP.Resp <- if_else(luminex$id %in% SpyCEP.non.responders, "no","yes")
#define graph ylimits
ylimits <- c(-0.25, 2.25)

#graph SpyCEPresponders
group <- "responders"
d <- luminex[luminex$SPYCEP.Resp == "yes" & luminex$id %in% keep.Ids ,]
d$x <-as.numeric(d$timepoint)
d$xj <-jitter(d$x, amount = .09) 
d$y <-log(d$SpyCEP,10)
PlotIT(d)
ggsave(file = paste(var, group, "lumi_over_time.pdf", sep = "_"), width = 2.5, height = 3.0)

#graph SpyCEP non-responders
group <- "non-responders"
e <- luminex[luminex$SPYCEP.Resp == "no"& luminex$id %in% keep.Ids ,]
e$x <-as.numeric(e$timepoint)
e$xj <-jitter(e$x, amount = .09) 
e$y <-log(e$SpyCEP,10)
PlotIT(e)
ggsave(file = paste(var, group, "lumi_over_time.pdf", sep = "_"), width = 2.5, height = 3.0)

#write out p-values
print(p_values(d,e))

#SLO
var <- "SLO"
#Split into group based on a 'non-response' of < 1.25 fold change
SLO.non.responders <- OneMonthFC[which(OneMonthFC$SLO < 1.25),]$id
luminex$SLO.Resp <- if_else(luminex$id %in% SLO.non.responders, "no","yes")
#set graph limits
ylimits <- c(0.5, 2.5)

#graph responders
group <- "responders"
d <- luminex[luminex$SLO.Resp == "yes"& luminex$id %in% keep.Ids,]
d$x <-as.numeric(d$timepoint)
d$xj <-jitter(d$x, amount = .09) 
d$y <-log(d$SLO,10)
PlotIT(d)
ggsave(file = paste(var, group, "lumi_over_time.pdf", sep = "_"), width = 2.5, height = 3.0)

group <- "non-responders"
e <- luminex[luminex$SLO.Resp == "no"& luminex$id %in% keep.Ids ,]
e$x <-as.numeric(e$timepoint)
e$xj <-jitter(e$x, amount = .09) 
e$y <-log(e$SLO,10)
PlotIT(e)
ggsave(file = paste(var, group, "lumi_over_time.pdf", sep = "_"), width = 2.5, height = 3.0)

#write out p-values
print(p_values(d,e))

#SCPA 
var <- "SCPA"
ylimits <- c(1, 2.5)

SCPA.non.responders <- OneMonthFC[which(OneMonthFC$SCPA < 1.25),]$id
luminex$SCPA.Resp <- if_else(luminex$id %in% SCPA.non.responders, "no","yes")
luminex %>% filter(timepoint == "Pre.challenge" & id %in% keep.Ids) %>% count(SCPA.Resp)

group <- "responders"
d <- luminex[luminex$SCPA.Resp == "yes" & luminex$id %in% keep.Ids,]
d$x <-as.numeric(d$timepoint)
d$xj <-jitter(d$x, amount = .09) 
d$y <-log(d$SCPA,10)
PlotIT(d)
ggsave(file = paste(var, group, "lumi_over_time.pdf", sep = "_"), width = 2.5, height = 3.0)

group <- "non-responders"
e <- luminex[luminex$SCPA.Resp == "no"& luminex$id %in% keep.Ids ,]
e$x <-as.numeric(e$timepoint)
e$xj <-jitter(e$x, amount = .09) 
e$y <-log(e$SCPA,10)
PlotIT(e)
ggsave(file = paste(var, group, "lumi_over_time.pdf", sep = "_"), width = 2.5, height = 3.0)

#write out p-values
print(p_values(d,e))

##GAC 
var <- "GAC"
ylimits <- c(-1.5,1)

GAC.non.responders <- OneMonthFC[which(OneMonthFC$GAC < 1.25),]$id
luminex$GAC.Resp <- if_else(luminex$id %in% GAC.non.responders, "no","yes")
luminex %>% filter(timepoint == "Pre.challenge"& id %in% keep.Ids) %>% count(GAC.Resp)

group <- "responders"
d <- luminex[luminex$GAC.Resp == "yes" & luminex$id %in% keep.Ids,]
d$x <-as.numeric(d$timepoint)
d$xj <-jitter(d$x, amount = .09) 
d$y <-log(d$GAC,10)
PlotIT(d)
ggsave(file = paste(var, group, "lumi_over_time.pdf", sep = "_"), width = 2.5, height = 3.0)

group <- "non-responders"
e <- luminex[luminex$GAC.Resp == "no" & luminex$id %in% keep.Ids,]
e$x <-as.numeric(e$timepoint)
e$xj <-jitter(e$x, amount = .09) 
e$y <-log(e$GAC,10)
PlotIT(e)
ggsave(file = paste(var, group, "lumi_over_time.pdf", sep = "_"), width = 2.5, height = 3.0)

#write out p-values
print(p_values(d,e))

#DNAseB
var <- "DnaseB"
DnaseB.non.responders <- OneMonthFC[which(OneMonthFC$DnaseB < 1.25),]$id
luminex$Dnase.Resp <- if_else(luminex$id %in% DnaseB.non.responders, "no","yes")
luminex %>% filter(timepoint == "Pre.challenge"& id %in% keep.Ids) %>% count(Dnase.Resp)
ylimits <- c(-1,2.5)
group <- "responders"
d <- luminex[luminex$Dnase.Resp == "yes" & luminex$id %in% keep.Ids,]
d$x <-as.numeric(d$timepoint)
d$xj <-jitter(d$x, amount = .09) 
d$y <-log(d$DnaseB,10)
PlotIT(d)
ggsave(file = paste(var, group, "lumi_over_time.pdf", sep = "_"), width = 2.5, height = 3.0)

group <- "non-responders"
e <- luminex[luminex$Dnase.Resp == "no" & luminex$id %in% keep.Ids,]
e$x <-as.numeric(e$timepoint)
e$xj <-jitter(e$x, amount = .09) 
e$y <-log(e$DnaseB,10)
PlotIT(e)
ggsave(file = paste(var, group, "lumi_over_time.pdf", sep = "_"), width = 2.5, height = 3.0)

#write out p-values
print(p_values(d,e))

##SPNA
var <- "SpnA"
SpnA.non.responders <- OneMonthFC[which(OneMonthFC$SpnA < 1.25),]$id
luminex$SpnA.Resp <- if_else(luminex$id %in% SpnA.non.responders, "no","yes")
luminex %>% filter(timepoint == "Pre.challenge"& id %in% keep.Ids) %>% count(SpnA.Resp)
ylimits <- c(-1,2)
group <- "responders"
d <- luminex[luminex$SpnA.Resp == "yes" & luminex$id %in% keep.Ids,]
d$x <-as.numeric(d$timepoint)
d$xj <-jitter(d$x, amount = .09) 
d$y <-log(d$SpnA,10)
PlotIT(d)
ggsave(file = paste(var, group, "lumi_over_time.pdf", sep = "_"), width = 2.5, height = 3.0)

group <- "non-responders"
e <- luminex[luminex$SpnA.Resp == "no" & luminex$id %in% keep.Ids,]
e$x <-as.numeric(e$timepoint)
e$xj <-jitter(e$x, amount = .09) 
e$y <-log(e$SpnA,10)
PlotIT(e)
ggsave(file = paste(var, group, "lumi_over_time.pdf", sep = "_"), width = 2.5, height = 3.0)

#write out p-values
print(p_values(d,e))

rm(d, e, group, var, DnaseB.non.responders, GAC.non.responders, SCPA.non.responders, SLO.non.responders, SpnA.non.responders, SpyCEP.non.responders)

Fig 4: luminex compare responses to pharyngitis children

#join children dataset to pharyngitis participants from Challenge data 
luminex.keep <- luminex[
  luminex$timepoint %in% c("Pre.challenge", "1month") & 
  luminex$pharyngitis == "pharyngitis" & 
  !luminex$id %in% c("SN017", "SN075"), 
  c("id", "timepoint", "SCPA", "SpyCEP", "GAC", "SLO", "DnaseB", "SpnA")
]

names(luminex.children) <- names(luminex.keep)
luminex.keep <- rbind(luminex.keep, luminex.children)
luminex.keep$timepoint <- factor(luminex.keep$timepoint, levels = c("Pre.challenge", "1month", "healthy_child", "pharyngitis_child"))

# List of antigens
antigens <- c("SpyCEP", "SLO", "SCPA", "GAC", "DnaseB", "SpnA")

# Function to plot data for each antigen
plot_antigen_children <- function(data, antigen) {
  d <- data
  d$x <- as.numeric(d$timepoint)
  d$xj <- jitter(d$x, amount = .09)
  d$y <- log(d[[antigen]], 10)
  
  p <- ggplot(d, aes(timepoint, y, fill = timepoint)) +
    geom_violin(aes(alpha = timepoint), draw_quantiles = c(0.5), scale = "width", position = "dodge") +
    geom_beeswarm(size = 3, alpha = 0.65, cex = 2, shape = 21) +
    scale_fill_manual(values = c("red", "red", "#FFC107", "darkorange3")) +
    scale_alpha_manual(values = c(0.15, 0.5, 0.5, 0.5)) +
    scale_color_manual(values = c("red", "red", "#FFC107", "darkorange3")) +
    ylab(antigen) +
    xlab("") +
    coord_cartesian(ylim = c(min(d$y, na.rm = TRUE) * 1.25, max(d$y, na.rm = TRUE) * 1.25)) +
    custom_theme()
  
  ggsave(file = paste(antigen, "children.pdf", sep = "_"), plot = p, width = 4, height = 5)
}

# Iterate through each antigen and plot
for (antigen in antigens) {
  plot_antigen_children(luminex.keep, antigen)
}

#annotated p-value luminex healthy
luminex_pvals = matrix(data = 0, nrow = 6, ncol = 12) #initialize matrix
for(i in 1:6){
  luminex_pvals[i,1] <- wilcox.test(luminex.keep[c(1:17),c(i+2)],luminex.keep[c(18:34),c(i+2)], paired = T, alternative = "two.sided", na.rm = T)$p.value
  luminex_pvals[i,2] <- wilcox.test(luminex.keep[c(1:17),c(i+2)],luminex.keep[c(35:49),c(i+2)], paired = F, alternative = "two.sided", na.rm = T)$p.value
  luminex_pvals[i,3] <- wilcox.test(luminex.keep[c(1:17),c(i+2)],luminex.keep[c(50:73),c(i+2)], paired = F, alternative = "two.sided", na.rm = T)$p.value
    luminex_pvals[i,4] <- wilcox.test(luminex.keep[c(18:34),c(i+2)],luminex.keep[c(35:49),c(i+2)], paired = F, alternative = "two.sided", na.rm = T)$p.value
  luminex_pvals[i,5] <- wilcox.test(luminex.keep[c(18:34),c(i+2)],luminex.keep[c(50:73),c(i+2)], paired = F, alternative = "two.sided", na.rm = T)$p.value
    luminex_pvals[i,6] <- wilcox.test(luminex.keep[c(35:49),c(i+2)],luminex.keep[c(50:73),c(i+2)], paired = F, alternative = "two.sided", na.rm = T)$p.value
}
for(i in 1:6){
luminex_pvals[i,c(7:12)] <- p.adjust(luminex_pvals[i,c(1:6)], method = "fdr" )
}
luminex_pvals<- as.data.frame(luminex_pvals)
names(luminex_pvals) <- c("prevspost","pre_vs_healthy","pre_vs_pharyn","post_vs_healthy","post_vs_pharyn", "healthy_vs_pharyn",
                          "prevspost.adj","pre_vs_healthy.adj","pre_vs_pharyn.adj","post_vs_healthy.adj","post_vs_pharyn.adj",
                          "healthy_vs_pharyn.adj")

rownames(luminex_pvals) <- names(luminex.keep)[3:8]
write.csv(file = "children.luminex.csv", luminex_pvals)

rm(luminex_pvals, antigens, luminex.keep)

Figure 5: Luminex: Correlation of pre-challenge titre with 1 month fold change:

#define plotting function
plot_and_save <- function(data, fc_data, antigen, file_name, xlim, ylim) {
  p <- ggplot(data, aes(y = log(data[[antigen]], 10), x = log(fc_data[[antigen]], 10))) +
    geom_point(shape = 21, cex = 3, alpha = 0.75, aes(fill = pharyngitis)) +
    scale_fill_manual(values = c("blue", "red")) +
    ylab("log10 Pre challenge") +
    xlab("log10 FC IgG 1month") +
    geom_smooth(method = 'lm', se = FALSE, colour = "black") +
    coord_cartesian(ylim = ylim, xlim = xlim) +
    custom_theme()
  
  ggsave(file = file_name, plot = p, width = 3, height = 3)
  
  # Spearman Correlation p-value
  cor_result <- cor.test(data[[antigen]], fc_data[[antigen]], method = "spearman")
  print(cor_result)
  return(cor_result)
}

antigens <- c("SpyCEP", "SLO", "SCPA", "GAC", "SpnA", "DnaseB")
xlims <- list(c(-0.5, 2.5), c(-0.1, 0.8), c(-0.1, 0.8), c(-0.5, 1.5), c(-0.1, 1.5), c(-0.5, 2.5))
ylims <- list(c(-0.5, 2.5), c(0.5, 2.5), c(0.5, 2.5), c(-1.5, 1.5), c(-1.5, 1.65), c(-1, 2.5))

# Loop through each antigen and plot
for (i in seq_along(antigens)) {
  antigen <- antigens[i]
  xlim <- xlims[[i]]
  ylim <- ylims[[i]]
  file_name <- paste("PrechallengevsFC_", antigen, "_luminex.pdf", sep = "")
  
  plot_and_save(PreChallenge, OneMonthFC, antigen, file_name, xlim, ylim)
}

# now repeat for M75 IgG and for ScpA IgA, bot ELISA data
ggplot(data, aes(y = log(IgA_ScpA_Pre.challenge,10), x = log(data$IgA_ScpA_1weekFC,10)))+
  geom_point(shape = 21, cex = 3, alpha = 0.75, aes(fill =Pharyngitis))+
scale_fill_manual(values = c("blue", "red"))+
ylab("log10 Pre challenge")+
xlab("log 10 FC IgG 1month")+
  geom_smooth(method='lm', se = FALSE, colour = "black")+
coord_cartesian(ylim = c(0.5,2.5), xlim= c(-0.5, 0.5))+
custom_theme()
ggsave(file = "PrechallengevsFC_IgA_Scpa.pdf", width = 3, height = 3 )

ggplot(data, aes(y = log(IgG_M75_Pre.challenge,10), x = log(data$IgG_M75_1weekFC,10)))+
  geom_point(shape = 21, cex = 3, alpha = 0.75, aes(fill =Pharyngitis))+
scale_fill_manual(values = c("blue", "red"))+
ylab("log10 Pre challenge")+
xlab("log10 FC IgG 1month")+
  geom_smooth(method='lm', se = FALSE, colour = "black")+
coord_cartesian(ylim = c(1,4), xlim= c(-0.25, 0.5))+
custom_theme()
ggsave(file = "PrechallengevsFC_M75.pdf", width = 3, height = 3 )

rm(PreChallenge, OneMonth, OneMonthFC)

Fig 6 and 7: multivariate analysis - data preparation

#build a big dataframe of pre-challenge data variables, and substitute luminex for ELISA where antigens are in common. 
#keep baseline only. Substitute luminex for ELISA
PreChallenge <- data[,grepl("Pre.challenge",names(data))]
# add in clinical variables to include in Figure 7. 
PreChallenge <-cbind(PreChallenge, data[,c(204,1,2,208:ncol(data))]) 
#reorder variables
PreChallenge <- PreChallenge[,c(36:ncol(PreChallenge), 1:35)]
#create new dataframe for removing non-IgG variables
PreChallenge.lumi  <-luminex[luminex$timepoint == "Pre.challenge",c(1, 6:11)]
PreChallenge <- PreChallenge[,!names(PreChallenge) %in% c("IgG_SPYCEP_Pre.challenge","IgG_GAC_Pre.challenge","IgG_ScpA_Pre.challenge","IgG_SLO_Pre.challenge")]
PreChallenge <- left_join(PreChallenge, PreChallenge.lumi, by = "id")

Fig 6: Multidimensional scaling

set.seed(36) # ensures same plots each time for mds

#remove anything thats not IgG
PreChallenge_trim <- PreChallenge[,!grepl("IgA_", names(PreChallenge))]
PreChallenge_trim <- PreChallenge_trim[,!grepl("Adehesion_percent_Pre_saliva", names(PreChallenge_trim))]
# remove clinical variables and antigen response tally. Create the Prechallenge.z dataframe for use later
PreChallenge.z <- PreChallenge_trim[,c(1,14,17:ncol(PreChallenge_trim))] 
#tidy names
names(PreChallenge.z) <- gsub("IgG_", "", names(PreChallenge.z))
names(PreChallenge.z) <- gsub("_Pre.challenge", "", names(PreChallenge.z))

#Select one month data
OneMonth <- data[,grepl("1month",names(data))]
OneMonth <- OneMonth[,!grepl("FC",names(OneMonth))]
OneMonth <-cbind(OneMonth, data[,c(1,219)])
#reorder
OneMonth <- OneMonth[,c(18:ncol(OneMonth), 1:17)]
OneMonth.lumi  <-luminex[luminex$timepoint == "1month", c(1, 6:11)]
#remove ELISA vars
OneMonth <- OneMonth[,!names(OneMonth) %in% c("IgG_SPYCEP_1month","IgG_GAC_1month","IgG_ScpA_1month","IgG_SLO_1month")]
OneMonth <- left_join(OneMonth, OneMonth.lumi, by = "id")
#Create the OneMonth.z dataframe for use later
One.Month.z <- OneMonth 
#Tidy names
names(One.Month.z) <- gsub("IgG_", "", names(One.Month.z))
names(One.Month.z) <- gsub("_1month", "", names(One.Month.z))

#select 3 months data
ThreeMonths <- data[,grepl("3months",names(data))]
ThreeMonths <- ThreeMonths[,!grepl("FC",names(ThreeMonths))]
ThreeMonths <-cbind(ThreeMonths, data[,c(1,219)])
#reorder
ThreeMonths <- ThreeMonths[,c(18:ncol(ThreeMonths), 1:17)]
ThreeMonths.lumi  <-luminex[luminex$timepoint == "3months", c(1, 6:11)]
#remove ELISA vars
ThreeMonths <- ThreeMonths[,!names(ThreeMonths) %in% c("IgG_SPYCEP_3months","IgG_GAC_3months","IgG_ScpA_3months","IgG_SLO_3months")]
ThreeMonths <- left_join(ThreeMonths, ThreeMonths.lumi, by = "id")
#Create the Threemonths.z dataframe for use later
ThreeMonths.z <- ThreeMonths
names(ThreeMonths.z) <- gsub("IgG_", "", names(ThreeMonths.z))
names(ThreeMonths.z) <- gsub("_3months", "", names(ThreeMonths.z))

#double check names are consistent 
names(PreChallenge.z) == names(One.Month.z)
names(ThreeMonths.z) == names(One.Month.z)

#add timepoint variable
PreChallenge.z$timepoint <- "Pre"
One.Month.z$timepoint <- "1mo"
ThreeMonths.z$timepoint <- "3mo"

#bind together
data.z <- rbind(PreChallenge.z, One.Month.z, ThreeMonths.z)
data.z$timepoint <- factor(data.z$timepoint, levels =c("Pre","1mo", "3mo"))
#reorder
data.z <- data.z[,c(1, 2, 22, 3:21)]

#Zscore transform data
norm.study <- as.data.frame(data.z[,c(4:22)])

# define variables with which to normalise data using the 5-95% intervals of the data
conf.Int <- matrix(nrow = ncol(norm.study), ncol =4)
for(i in 1:19){
conf.Int[i,1] <- quantile(norm.study[,i], probs=c(0.05), na.rm= T) # 5th percentile
conf.Int[i,2] <- quantile(norm.study[,i], probs=c(0.95), na.rm= T) # 95th Percentile
conf.Int[i,3] <-  mean(norm.study[,i], na.rm= T) # determine mean of data within 90th percentile 
conf.Int[i,4] <-  sd(norm.study[,i], na.rm= T) # determine st dev of data within 90th percentile 
}
#normalise the data
for(i in 1:19){
  norm.study[,i] <- (norm.study[,i] - conf.Int[i,3])/conf.Int[i,4] #normalise data using this mean and stdev
}

#replace transformed data
data.z[,c(4:22)] <- norm.study


#filter data for graphing
plotting <- data.z
plotting <- data.z[complete.cases(data.z[,c(4:22)]),] # 2 individuals with missing 1 month and 3 month samples are removed
plotting <- plotting[plotting$timepoint %in% c("Pre", "1mo"),]

Y <- plotting$Pharyngitis #outcome variable
X <- plotting[,c(4:22)] #only numerical data
Z <- plotting$timepoint

#add in response variable (see line 562 )
plotting$four_responders <- if_else(plotting$id %in% four.responders, "Y", "N") 

#MDS
data.cor <- cor(t(X), method = "spearman")
# cluster
d <- dist(data.cor) # determine distance matrix
xcMDS <- monoMDS(d, k =2) # reduces dimensionality to 2 dimensions. 

#plot Multidimensional scaled data 
ggplot(data = plotting, aes(x = xcMDS$points[,1], y = xcMDS$points[,2]))+
  theme_classic() +
  geom_point(show.legend = F, cex = 4, alpha = 0.75, aes(color = Y, fill = Y, shape = Z))+
  scale_fill_manual(values = c("blue", "red"))+
  scale_color_manual(values = c("blue", "red"))+
  scale_shape_manual(values = c(21, 22, 23, 24, 25))+
  guides(fill = FALSE, color = FALSE)+
  ylab("MDS dim 2")+
  xlab("MDS dim 1")
ggsave("MDS.color.by.outcome.seed36.pdf", width = 3, height = 2.5)

#plot MDS plot coloured by ID with connecting lines
#define colour palette
P25 = createPalette(25,  c("#ff0000", "#00ff00", "#0000ff"))
names(P25) <- NULL

ggplot(data = plotting, aes(x = xcMDS$points[,1], y = xcMDS$points[,2]))+
  theme_classic() +
    geom_line(aes(color = id), size =2)+
geom_point(show.legend = F, cex = 4, alpha = 0.75, aes( fill = id, shape = Z)) +
  scale_fill_manual(values = P25)+
  scale_color_manual(values = P25)+
  scale_shape_manual(values = c(21, 22, 23, 24, 25))+
  guides(fill = FALSE, color = FALSE)+
  ylab("MDS dim 2")+
  xlab("MDS dim 1")
ggsave("MDS.color.by.id.seed36.pdf", width = 3, height = 2.5)

#plot MDS data coloured by four_responders
ggplot(data = plotting, aes(x = xcMDS$points[,1], y = xcMDS$points[,2]))+
  theme_classic() +
  geom_point(show.legend = F, cex = 4, alpha = 0.75, aes(fill = plotting$four_responders, shape = timepoint)) +
  scale_fill_manual(values = c("white", "dark green"))+
  scale_shape_manual(values = c(21, 22, 23))+
  # guides(fill = F)+
  ylab("MDS dim 2")+
  xlab("MDS dim 1")
ggsave("MDS.color.by.count.pdf", width = 3, height = 2.5)

#remove datasets not used again
rm(PreChallenge.lumi, PreChallenge_trim, OneMonth.lumi, ThreeMonths.lumi, ThreeMonths, OneMonth, PreChallenge.z, One.Month.z, ThreeMonths.z, norm.study, X, Y, Z, data.cor, d, P25)

Fig 6 (continued) : compare MDS1

# add new variable to plotting dataframe
plotting$mds1 <- xcMDS$points[,1]

#subset plotting data into each timepoint
mds1.pre <- plotting[plotting$timepoint %in% "Pre",]
mds1.post <- plotting[plotting$timepoint %in% "1mo",]
#create a post-pre MDS1 variable
mds1.post$mds1 <- mds1.post$mds1-mds1.pre[mds1.pre$id %in% mds1.post$id,]$mds1

#plots for 7B 
ggplot(mds1.pre, aes(Pharyngitis, mds1, fill = Pharyngitis))+
  geom_violin(draw_quantiles = c(0.5), scale= "width", alpha = 0.25)+
  geom_beeswarm(size = 3, alpha = 0.75, cex =5, shape = 21)+
  scale_fill_manual(values = c("blue", "red"))+
  ylab("NMDS dim 1")+
  xlab("")+
  stat_compare_means(method = "wilcox.test", tip.length = 0.005, bracket.size = 0.5, vjust = -1, paired = F,
                     aes(label = paste0("p = ", ..p.format..)))+
  coord_cartesian( ylim = c(-1.50, 1.5))+
  theme_bw()+
  theme(legend.position = "none", axis.text.x=element_text(colour="black", size = 14), axis.text.y=element_text(colour="black", size = 10), panel.grid.major = element_blank(),
        panel.grid.minor  = element_blank(), axis.title.y = element_text(size = 14))
ggsave(file = "NMDS_pre.pdf", width = 2.25, height = 3 )

ggplot(mds1.post, aes(Pharyngitis, mds1, fill = Pharyngitis))+
  geom_violin(draw_quantiles = c(0.5), scale= "width", alpha = 0.25)+
  geom_beeswarm(size = 3, alpha = 0.75, cex =5, shape = 21)+
  scale_fill_manual(values = c("blue", "red"))+
  ylab("dim 1 change (1mo-Pre)")+
  xlab("")+
  stat_compare_means(method = "wilcox.test", tip.length = 0.005, bracket.size = 0.5, vjust = -1, paired = F,
                     aes(label = paste0("p = ", ..p.format..)))+
  coord_cartesian( ylim = c(-0.50, 1.5))+
  theme_bw()+
  theme(legend.position = "none", axis.text.x=element_text(colour="black", size = 14), axis.text.y=element_text(colour="black", size = 10), panel.grid.major = element_blank(),
        panel.grid.minor  = element_blank(), axis.title.y = element_text(size = 14))
ggsave(file = "NMDS_post.pdf", width = 2.25, height = 3)

# Fig 6C: 
mds.Corr = matrix(data = 0, nrow =19 , ncol = 2) #initialize matrix
for(i in 1:19){
  mds.Corr[i,1] <- cor.test(plotting[,c(i+3)], plotting$mds1, method = "spearman")$estimate
  mds.Corr[i,2] <- cor.test(plotting[,c(i+3)], plotting$mds1, method = "spearman")$p.value
}
rownames(mds.Corr) <- names(plotting[,c(4:22)])
mds.Corr <- as.data.frame(mds.Corr)
mds.Corr$adj <- p.adjust(mds.Corr[,2], method = "fdr")

mds.Corr <- mds.Corr[mds.Corr$adj <0.05,]
mds.Corr <- mds.Corr[order(mds.Corr$V1),]
heatpal <- c("seagreen","white", "mediumpurple3")
#plot a heatmap or Rho values, but ignore columns 2 and 3. 
pdf(file = "Correlation.with.mds.pdf")
pheatmap(mds.Corr[,], 
         kmeans_k = NA, 
         # breaks = breaksList, 
         color = colorRampPalette(heatpal)(1000),
         border_color = "black",
         cellwidth = 11, cellheight =11, 
         # display_numbers = test_labels,
         number_format = "%",
         scale = "none", 
         cluster_rows = F,
         cluster_cols = F, 
         number_color = "grey",
         show_rownames = T, show_colnames = T, 
         fontsize = 8, display_numbers = F,
         filename = NA, silent = FALSE)
dev.off()

#remove variables no longer needed 
rm(plotting, mds1.pre, mds1.post, mds.Corr)

Fig 7: Correlation with outcome variables

#automate comparisons between pre-challenge immune variables and outcome vars
Outcome.Corr = matrix(data = 0, nrow =37 , ncol = 16) #initialize matrix
for(i in 1:37){
  Outcome.Corr[i,1] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Time_to_diagnosis, method = "spearman")$estimate
  Outcome.Corr[i,2] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Peak_Ct, method = "spearman")$estimate
  Outcome.Corr[i,3] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Tonsil, method = "spearman")$estimate
  Outcome.Corr[i,4] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Adenopathy, method = "spearman")$estimate
  Outcome.Corr[i,5] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Temp_max, method = "spearman")$estimate
  Outcome.Corr[i,6] <-   cor.test(PreChallenge[,c(i+16)], PreChallenge$CRP, method = "spearman")$estimate
  Outcome.Corr[i,7] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Pain, method = "spearman")$estimate
  Outcome.Corr[i,8] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Headache, method = "spearman")$estimate

  Outcome.Corr[i,9] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Time_to_diagnosis, method = "spearman")$p.value
  Outcome.Corr[i,10] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Peak_Ct, method = "spearman")$p.value
  Outcome.Corr[i,11] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Tonsil, method = "spearman")$p.value
  Outcome.Corr[i,12] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Adenopathy, method = "spearman")$p.value
  Outcome.Corr[i,13] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Temp_max, method = "spearman")$p.value
  Outcome.Corr[i,14] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$CRP, method = "spearman")$p.value
  Outcome.Corr[i,15] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Pain, method = "spearman")$p.value
  Outcome.Corr[i,16] <- cor.test(PreChallenge[,c(i+16)], PreChallenge$Headache, method = "spearman")$p.value
}

#make into a dataframe and rename columns and rows
Outcome.Corr <- as.data.frame(Outcome.Corr)
names(Outcome.Corr) <- c("Rho_time","Rho_CT","Rho_Tonsil","Rho_Adeno","Rho_Temp","Rho_CRP","Rho_Pain","Rho_Head",
                        "p_time","p_CT","p_Tonsil", "p_Adeno", "p_Temp","p_CRP","p_Pain","p_Head")
rownames(Outcome.Corr) <- names(PreChallenge[,c(17:ncol(PreChallenge))])
# save a csv of the p-values
write.csv(Outcome.Corr, file = "outcomeCorr2.csv")

#Keep any variables with a p<0.05 for any correlation 
plotting <- Outcome.Corr[c("IgG_ADI_Pre.challenge","IgG_TF_Pre.challenge","IgG_J8_Pre.challenge","IgG_K4S2_Pre.challenge","DnaseB","IgG_SPYAD_Pre.challenge","IgA_P145_Pre.challenge","SpyCEP","SpnA","IgG_T25_Pre.challenge"),1:8]
#simplify names
rownames(plotting) <-c("ADI","TF","J8","K4S2","DnaseB","SpyAD","P145_IgA","SpyCEP","SpnA","T25")

#reorder by correlation coefficient for 'Time to diagnosis'
names(plotting) <- gsub("Rho_", "",names(plotting) )
# graph it
heatpal <- c("seagreen","white", "mediumpurple3")
pdf(file = "Correlation.with.outcome.pdf")
pheatmap(plotting, 
         kmeans_k = NA, 
         color = colorRampPalette(heatpal)(1000),
         border_color = "black",
         cellwidth = 11, cellheight =11, 
         # display_numbers = test_labels,
         number_format = "%",
         scale = "none", 
         cluster_rows = F,
         cluster_cols = F, 
         number_color = "grey",
         show_rownames = T, show_colnames = T, 
         fontsize = 8, display_numbers = F,
         filename = NA, silent = FALSE)
dev.off()

#remove variables not used elsewhere: 
rm(plotting, Outcome.Corr, heatpal)

Figure 7 (continued) : sPLS-DA

set.seed(36)
Y <- PreChallenge$Pharyngitis #outcome variable
X <- PreChallenge[,c(17:ncol(PreChallenge))] #only numerical data 

#initial model
srbct.splsda <- splsda(X, Y, ncomp = 10)  # set ncomp to 10 for performance assessment later

# plot the samples projected onto the first two components of the PLS-DA subspace
plotIndiv(srbct.splsda , comp = 1:2, 
          group = PreChallenge$Pharyngitis, ind.names = FALSE,  # colour points by class
          ellipse = TRUE, # include 95% confidence ellipse for each class
          legend = TRUE, title = 'sPLS-DA untuned')

# undergo performance evaluation in order to tune the number of components to use
perf.splsda.srbct <- perf(srbct.splsda, validation = "Mfold", folds = 3, 
                progressBar = F, auc = TRUE, nrepeat = 100)

perf.splsda.srbct$choice.ncomp # what is the optimal value of components according to perf()

# grid of possible keepX values that will be tested for each component
list.keepX <- c(1:37)

# undergo the tuning process to determine the optimal number of variables
tune.splsda.srbct <- tune.splsda(X, Y, ncomp = 2, validation = 'Mfold', folds = 3,
                   progressBar = F, dist = 'mahalanobis.dist', measure = "BER",
                   test.keepX = list.keepX, nrepeat = 200, cpus=3) # allow for parallelisation to decrease runtime
#run final model
final.splsda <- splsda(X, Y, 
                       ncomp = 2, 
                       keepX = 5)

pdf(file = "sPLSDA.2dim.pdf", width = 3, height = 3)
plotIndiv(final.splsda , comp = 1:2, 
          group = PreChallenge$Pharyngitis, ind.names = FALSE,  # colour points by class
          ellipse = TRUE, ellipse.level = 0.95, # include 95% confidence ellipse for each class
          legend = F, title = 'sPLS-DA')
dev.off()

pdf(file = "sPLSDA.loadings.pdf", width = 4.5, height = 3)
ContribComp1 <- plotLoadings(final.splsda, comp = 1, 
             contrib = 'max', method = 'mean', xlim = c(-0.4, 0.8))
dev.off()

selectVar(final.splsda, comp=1)$name[which(ContribComp1$GroupContrib == "no pharyngitis")]
selectVar(final.splsda, comp=1)$name[which(ContribComp1$GroupContrib == "pharyngitis")]

# Graph up raw data of results
Select.var <- "IgG_TF_Pre.challenge"
# Select.var <- "IgG_ADI_Pre.challenge"
# Select.var <- "IgA_P145_Pre.challenge"
# Select.var <- "SpnA"
# Select.var <- "IgG_T25_Pre.challenge"

ggplot(PreChallenge, aes(Pharyngitis, log(PreChallenge[,Select.var],10), fill = Pharyngitis))+
  geom_violin(draw_quantiles = c(0.5), scale= "width", alpha = 0.25)+
  geom_beeswarm(size = 3, alpha = 0.75, cex =5, shape = 21)+
  scale_fill_manual(values = c("blue", "red"))+
  ylab("")+
  xlab("")+
  stat_compare_means(method = "wilcox.test", tip.length = 0.005, bracket.size = 0.5, vjust = -1, paired = F,
                     aes(label = paste0("p = ", ..p.format..)))+
  coord_cartesian( ylim = c(0, 3))+
  theme_bw()+
  theme(legend.position = "none", axis.text.x=element_text(colour="black", size = 14), axis.text.y=element_text(colour="black", size = 10), panel.grid.major = element_blank(),
        panel.grid.minor  = element_blank(), axis.title.y = element_text(size = 14))
  ggsave(file = paste(Select.var, "Pre.challenge.pdf", sep = ""), width = 2, height = 4)

系统信息

R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Shanghai
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_1.1.3      mixOmics_6.26.0    MASS_7.3-60.0.1    uwot_0.1.16        Matrix_1.6-5       Polychrome_1.5.1   gghalves_0.1.4    
 [8] ggpubr_0.6.0       vegan_2.6-4        lattice_0.22-6     permute_0.9-7      viridis_0.6.5      viridisLite_0.4.2  RColorBrewer_1.1-3
[15] pheatmap_1.0.12    ggbeeswarm_0.7.2   reshape2_1.4.4     lubridate_1.9.3    forcats_1.0.0      stringr_1.5.1      dplyr_1.1.4       
[22] purrr_1.0.2        readr_2.1.5        tidyr_1.3.1        tibble_3.2.1       tidyverse_2.0.0    ggplot2_3.5.1     

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1     vipor_0.4.7          fastmap_1.1.1        digest_0.6.35        timechange_0.3.0     lifecycle_1.0.4     
 [7] cluster_2.1.6        magrittr_2.0.3       compiler_4.3.3       rlang_1.1.4          tools_4.3.3          igraph_2.0.3        
[13] utf8_1.2.4           yaml_2.3.8           knitr_1.49           ggsignif_0.6.4       rARPACK_0.11-0       scatterplot3d_0.3-44
[19] plyr_1.8.9           abind_1.4-5          BiocParallel_1.36.0  withr_3.0.2          grid_4.3.3           fansi_1.0.6         
[25] colorspace_2.1-0     scales_1.3.0         cli_3.6.3            ellipse_0.5.0        rmarkdown_2.26       generics_0.1.3      
[31] RSpectra_0.16-1      rstudioapi_0.16.0    tzdb_0.4.0           splines_4.3.3        parallel_4.3.3       matrixStats_1.2.0   
[37] vctrs_0.6.5          carData_3.0-5        car_3.1-2            hms_1.1.3            rstatix_0.7.2        ggrepel_0.9.6       
[43] beeswarm_0.4.0       glue_1.8.0           codetools_0.2-19     stringi_1.8.3        gtable_0.3.4         munsell_0.5.0       
[49] pillar_1.9.0         htmltools_0.5.8      R6_2.5.1             evaluate_0.23        backports_1.4.1      broom_1.0.5         
[55] corpcor_1.6.10       Rcpp_1.0.12          gridExtra_2.3        nlme_3.1-164         mgcv_1.9-1           xfun_0.50           
[61] pkgconfig_2.0.3