Ring : Pipeline for the analysis of multiplex immunofluorescence stainings.

If you use this pipeline in an article, please cite the original paper: Huyghe, N., Benidovskaya, E., Beyaert, S., Daumerie, A., Maestre Osorio, F., Aboubakar Nana, F., Bouzin, C., Van den Eynde, M. Multiplex Immunofluorescence Combined with Spatial Image Analysis for the Clinical and Biological Assessment of the Tumor Microenvironment. J. Vis. Exp. (196), e65220, doi:10.3791/65220 (2023).

Code written by: Benidovskaya Elena, Beyaert Simon and Huyghe Nicolas (22.02.2023)

Packages to load for this script

library(tidyverse)
library(ggplot2)
library(tibble)
library(plyr)
library(spatial)
library(spatstat)
library(contoureR)
library(sp)
library(concaveman)
library(maptools)
library(sf)
library(ggpubr)
library(fpc)
library(dbscan)
library(tmap)
library(ComplexHeatmap) ##install from Bioconductor
library(reshape)
library(SummarizedExperiment) ##install from Bioconductor
library(patchwork)
library(BBmisc)
library(readxl)
library(dplyr)
library(Rcpp)
library(circlize)
library(DescTools)

Multiplex data transformation from HALO to R

Setting the inputs and outputs

output_data <- "~/project/tables/" ## set directory to output data
output_graphs <- "~/project/graphs/" ## set directory to output graphs

wd_data <- "~/project/data/"
wd_tables <- "~/project/tables/"

x <- list.files("~/project/data", pattern = ".csv") ## creates a vector with all file names that ends with .csv in your folder

y <- "df" ## creates a variable where you store the name of the final data frame summarizing the densities of cells of interest for each sample you have

Creating a function to transform your data automatically

When you export your data from an image analysis software, patient by patient, you end up with a .csv file that you are going to analyze in R. This file contains the location of each cell and the markers (Alexa Fluors, for example) for which the cell is positive.

For the location, the software gives an X/Y min and an X/Y max for the borders of each cell. To simplify the analysis, we calculate the midpoint of each cell, i.e. the mean of the min and max values along each axis.

Then we select only the columns of the data frame that are of interest (for which Alexa Fluor is the cell positive?) and rename them for easier use. Here, for example, we used Alexa Fluor 647 for the CD8 T cell receptor, ...

After that, we can define the cell types we want to study, one rule at a time. Cells are defined by the markers we use: for example, CD8+ T cells should be positive for the CD8 and CD3 markers, and epithelial tumor cells should be positive for the hPanCK marker. If a cell is positive for hPanCK and for CD3/CD8, it is probably an immune cell lost in the tumor tissue, so it should be counted as an immune cell. While defining the different cells, don't forget to state which markers should be positive (== 1) and which should be negative (== 0). If you have problems with overlapping markers, you can try to handle them in these rules. The result of the rules is stored in a new column called flag, and the new tables are saved in the folder you created earlier. Finally, the function is applied to every file listed in the variable x.

setwd(wd_data) ##set working directory (from the R project)

main_table <- function(a){
  Main_table <- read.csv(a)
  Main_table <- mutate(Main_table, X = (Main_table$XMin + Main_table$XMax)/2)
  Main_table <- mutate(Main_table, Y = (Main_table$YMin + Main_table$YMax)/2)
  Main_table <- select(Main_table, -c(XMin, XMax, YMin, YMax))
  Main_table <- select(Main_table, c("Alexa.Fluor.647.Positive.Classification","Alexa.Fluor.555.Positive.Classification","Alexa.Fluor.488.Positive.Classification", "Classifier.Label" , "X", "Y" ))
  colnames(Main_table) <- c("CD8", "CD3", "hPanCK", "Classifier Label","X","Y")
  
  ## Assign a cell type to every cell. Each assignment below acts as a loop
  ## over all the cells that meet the condition. The rules are applied in
  ## order and later ones overwrite earlier ones, so the immune markers
  ## take priority over hPanCK.
  output <- character(nrow(Main_table))
  
  output[Main_table$CD8 == 1 & Main_table$CD3 == 1] <- "CD8+ CD3+"
  output[Main_table$hPanCK == 1] <- "Tumor cells"
  ## hPanCK+ cells that are also CD8+ and CD3+ are counted as immune cells
  output[Main_table$hPanCK == 1 & Main_table$CD8 == 1 & Main_table$CD3 == 1] <- "CD8+ CD3+"
  output[Main_table$hPanCK == 0 & Main_table$CD8 == 0 & Main_table$CD3 == 0] <- "Stromal cells"
  output[Main_table$CD8 == 0 & Main_table$CD3 == 1] <- "CD3+"
  output[Main_table$CD8 == 1 & Main_table$CD3 == 0] <- "CD8+ CD3-"
  
  Main_table[,"flag"] <- output
  
  output_table <- paste(output_data, str_replace(a, ".csv", ""),".csv", sep="")
  write_csv(Main_table, file=output_table, col_names = TRUE)
  
}

lapply(FUN = main_table, x)
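As a quick sanity check of the marker rules described above, the sketch below applies the same priorities to a toy table with one cell per marker combination. It is a minimal illustration in base R, not the pipeline code itself: `classify_cells()` and the toy data frame are hypothetical names introduced here.

```r
## Hypothetical helper: label cells from 0/1 marker columns.
## Later assignments overwrite earlier ones, so the immune markers
## take priority over hPanCK.
classify_cells <- function(tab) {
  flag <- character(nrow(tab))
  flag[tab$hPanCK == 0 & tab$CD8 == 0 & tab$CD3 == 0] <- "Stromal cells"
  flag[tab$hPanCK == 1] <- "Tumor cells"
  flag[tab$CD8 == 0 & tab$CD3 == 1] <- "CD3+"
  flag[tab$CD8 == 1 & tab$CD3 == 0] <- "CD8+ CD3-"
  flag[tab$CD8 == 1 & tab$CD3 == 1] <- "CD8+ CD3+"
  flag
}

## Toy data: one row per possible marker combination
toy <- expand.grid(CD8 = 0:1, CD3 = 0:1, hPanCK = 0:1)
toy$flag <- classify_cells(toy)
print(toy)
```

Printing the table makes it easy to confirm that a cell positive for hPanCK and for an immune marker ends up with an immune label, and that only hPanCK-only cells are labelled as tumor cells.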

Transforming your data in mm

Check the resolution of your scanner and transform your data into mm. Here, as an example, the resolution of the scanner is 0.325 µm per pixel, so you need to multiply your X/Y values by the resolution (which gives µm) and then convert them into mm (x 0.001).

setwd(wd_tables)

main <- function(a){
  
  Main_table <- read.csv(a)
  Main_table <- Main_table %>%
    mutate(Xadj = X * 0.325 * 0.001) %>%
    mutate(Yadj = Y * 0.325 * 0.001)
  
  output_table <- paste(output_data, str_replace(a, ".csv", ""),".csv", sep="")
  write_csv(Main_table, file=output_table, col_names = TRUE)
}

lapply(FUN = main, x)
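The conversion itself is a single multiplication. The sketch below isolates it in a small helper so that the resolution is stated in one place; `px_to_mm()` and its `um_per_px` argument are hypothetical names, and the default of 0.325 is only the example value used above.

```r
## Hypothetical helper: convert pixel coordinates to millimetres.
## um_per_px is the scanner resolution, assumed to be in micrometres per pixel.
px_to_mm <- function(px, um_per_px = 0.325) {
  px * um_per_px * 0.001  # pixels -> micrometres -> millimetres
}

## Toy coordinates in pixels
px_to_mm(c(0, 1000, 20000))
```

With this resolution, a coordinate of 20000 pixels corresponds to 6.5 mm. If your scanner has a different resolution, only the `um_per_px` value needs to change.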

Visualisation of your biopsies

You can then create a function that takes the coordinates and creates a visualisation of your biopsies (the axes are in mm and you can assign a color to each cell type you defined earlier).

setwd(wd_tables)

histo_plot <- function(a){
  
  Main_table <- read.csv(a)
  
  plotly <- ggplot(Main_table, aes(x = Xadj, y = Yadj, col = flag)) +
    geom_point(size = 0.1) + ## size of the points on the graph
    scale_color_manual(values = c('#FFFF33','#CC0033','#FF9900','#3333FF','#99FF33'),limits=c("CD3+","CD8+ CD3-", "CD8+ CD3+", "Stromal cells","Tumor cells"),drop=TRUE) + ## define the color you want to use for each cell type
    ggtitle(a) + ## add a title (a for the name of each biopsy)
    coord_fixed(ratio = 1) + ## keeps the same scale on both axes (1 mm on X = 1 mm on Y)
    theme_bw() + 
    theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank()) + ## removes the grid (= white background)
    theme(plot.title = element_text(hjust = 0.5)) + ## sets the title to the center of the graph
    xlab("X (mm)") +
    ylab("Y (mm)") + 
    theme(text = element_text(size = 15)) ## allows you to set the size of the text (legends and titles)
  
  complete_plot_path_name <- paste(output_graphs, str_replace(a, ".csv", ""), ".png", sep = "")
  ggsave(plotly, file = complete_plot_path_name, dpi = 320, units = "mm") ## save each graph
  
}

lapply(FUN = histo_plot, x)

Clusterisation of the biopsy

Now, let's create a function that determines the number of tumor clusters you have on each slide. To do that, we transform each cluster of the biopsy into a polygon and then calculate the surface area of the latter. To visualize what R is doing, we create a graph at each step.

setwd(wd_tables)

area_plot <- function(a){
  
  Main_table <- read.csv(a)
  
  df <- Main_table[,c("Xadj","Yadj")]
  db <- dbscan(df, eps=0.5, minPts=50) ## function that clusters our cells
  plot(db, df, main = a, frame = FALSE)
  
  Main_table1 <- st_as_sf(df, coords=c("Xadj","Yadj"))
  Main_table1$cluster <- db$cluster
  Main_table1$flag <- Main_table$flag
  Main_table1$x <- Main_table$Xadj
  Main_table1$y <- Main_table$Yadj
  try(Main_table2 <- Main_table1 %>% filter(cluster!=0))
  polygons <- concaveman(Main_table1, concavity = 1, length_threshold = 0) ## creates polygons for each cluster determined by dbscan
  plot(st_geometry(Main_table1))
  plot(polygons)
  
  polygons2 <- map(unique(Main_table2$cluster), ~ concaveman(Main_table2[Main_table2$cluster %in% .,])) %>%
    map2(unique(Main_table2$cluster), ~ mutate(.x, cluster = .y)) %>%
    reduce(rbind)
  
  area <- st_area(polygons2) ## calculate the area of each cluster
  area1 <- as.data.frame(area)
  
  output_table <- paste(output_data, str_replace(a, ".csv", "_area"),".csv", sep="")
  write_csv(area1, file=output_table, col_names = TRUE)
  
  try(nicemap <- ## create a graph with the outline of each sample
        ggplot() + ## set up the framework
        geom_sf(data = polygons2, color="gray") + ## add the outline using geom_sf
        geom_point(data= Main_table2, aes(x=x, y=y, col = flag), size = 0.1) +
        scale_color_manual(values = c('#FFFF33','#CC0033','#FF9900','#3333FF','#99FF33'),limits=c("CD3+","CD8+ CD3-", "CD8+ CD3+", "Stromal cells","Tumor cells"),drop=TRUE) +
        theme_bw() +
        theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank()) + 
        ggtitle(a) +
        theme(plot.title = element_text(hjust = 0.5)) + 
        theme(text = element_text(size = 15))) ## try() is closed last so that this theme is part of nicemap
  
  try(complete_plot_path_name <- paste(output_graphs,  str_replace(a, ".csv", "_polygon"), ".png", sep = ""))
  try(ggsave(nicemap, file = complete_plot_path_name))
}

lapply(FUN = area_plot, x)
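To make the "points to polygon to area" idea concrete, here is a minimal base R sketch on toy coordinates. It deliberately swaps the concave hull from concaveman() for a convex hull (`chull()`), so it only illustrates the principle and will overestimate the area of irregular clusters; `polygon_area()` is a hypothetical helper.

```r
## Shoelace formula: area of a polygon given its ordered vertices
polygon_area <- function(x, y) {
  n <- length(x)
  nxt <- c(2:n, 1)  # index of the next vertex, wrapping around to the first
  abs(sum(x * y[nxt] - x[nxt] * y)) / 2
}

## Toy cluster: the corners of a 2 mm x 2 mm square plus two interior cells
cells <- data.frame(x = c(0, 2, 2, 0, 1, 0.5),
                    y = c(0, 0, 2, 2, 1, 1.5))

hull <- chull(cells$x, cells$y)  # convex hull, a stand-in for concaveman()
area_mm2 <- polygon_area(cells$x[hull], cells$y[hull])
area_mm2
```

Because the coordinates are in mm, the area comes out in mm², which is the unit needed later to express cell densities.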

Create a summary table with all your samples

Now, let's create a final table with one row per sample, in which each column holds the count of one of the cell types you are studying (the counts are converted into densities afterwards).


setwd(wd_tables)

removePat <- ".csv"
x2 <- gsub(removePat, "", x) ## removes the .csv in the names of your files

df1 <- data.frame(x2) ## new data frame created with a sample per row
df1[,c(2:6)] <- 0 ## fill columns from 2 to 6 with zeros (create a column per cell type)
colnames(df1) <- c("biopsy", "area", "CD4","CD8", "true_CD8", "CK") ## rename those new columns
output_df1 <- paste(output_data, y, ".csv", sep="")
write_csv(df1, output_df1)

table_df <- function(a){ ## create a function that extracts every important information from your tables
  
  df1 <- read_csv(output_df1)
  
  df3 <- paste(a,"_area.csv", sep="") ## takes the area of each sample
  t <- read.csv(df3)
  area <- sum(t$area)
  
  df1[df1$biopsy==a, 2] <- area
  
  df4 <- paste(a,".csv", sep="") ## takes the data frames with the cells
  
  Main_table <- read.csv(df4)
  
  ## For each cell for which you need to calculate its density
  ## Filter the data frame according to the flag you want and create df
  ## Add this df as a column in your final df
  
  dfcd3 <-  Main_table %>% filter(flag=="CD3+") 
  CD3 <- sum(dfcd3$CD3) ##the $CD3 is the name of the column in the Main_table, not the future name of the column
  
  
  dfcd8 <-  Main_table %>% filter(flag=="CD8+ CD3-") 
  CD8 <- sum(dfcd8$CD8)
  
  
  dftruecd8 <- Main_table %>% filter(flag=="CD8+ CD3+") 
  true_CD8 <- sum(dftruecd8$CD8) ## Here both markers are positive, so summing $CD8 or $CD3 gives the same count
  ## We take CD8 here, but it doesn't change anything
  
  
  dftumor <- Main_table %>% filter(flag=="Tumor cells")
  CK <- sum(dftumor$hPanCK)
  
  df1[df1$biopsy==a, 3] <- CD3 ## cells flagged "CD3+" (CD3+ CD8-) are stored in the "CD4" column
  df1[df1$biopsy==a, 4] <- CD8
  df1[df1$biopsy==a, 5] <- true_CD8
  df1[df1$biopsy==a, 6] <- CK
  
  write_csv(df1, output_df1)
  
}

lapply(FUN = table_df, x2)

In this new table, you have the information about the area of each sample and a count of your cells of interest. Now you can transform it into densities for each row (= sample).

Please note that we defined true CD8+ T cells as cells positive for both the CD3 and CD8 markers. In multiplex stainings, however, a cell that has bound one antibody sometimes has no room left to bind a second one, which means that a cell stained for CD8 may not be stained for CD3. Because we still want to take every cell into account, we create both a CD8 column and a true CD8 column.


setwd(wd_tables)

y2 <- paste(y,".csv", sep="") 
df1 <- read.csv(y2)

df1 <- df1 %>%
  mutate(CD3 = CD4 + CD8 + true_CD8)

df2 <- df1 %>%
  mutate(dCD3 = CD3/area) %>%
  mutate(dCD4 = CD4/area) %>%
  mutate(dCD8 = CD8/area) %>%
  mutate(dtrueCD8 = true_CD8/area) %>%
  mutate(dallCD8 = (true_CD8 + CD8)/area) %>%
  mutate(dCK = CK/area)


output_df2 <- paste(output_data, y, ".csv", sep="")
write_csv(df2, output_df2)
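As a worked example of the count-to-density step, the sketch below repeats the same arithmetic on a made-up summary table in base R. The biopsy names and all numbers are invented for illustration; only the column names follow the table built above.

```r
## Toy summary table: cell counts per biopsy and tissue area in mm^2
counts <- data.frame(
  biopsy   = c("P01_before", "P01_after"),
  area     = c(2.0, 4.0),
  CD4      = c(100, 300),
  CD8      = c(10, 20),
  true_CD8 = c(50, 180)
)

## All CD3-lineage cells, then densities in cells per mm^2
counts$CD3     <- counts$CD4 + counts$CD8 + counts$true_CD8
counts$dCD3    <- counts$CD3 / counts$area
counts$dallCD8 <- (counts$CD8 + counts$true_CD8) / counts$area

counts[, c("biopsy", "dCD3", "dallCD8")]
```

The second biopsy contains many more cells but also covers twice the area, which is why densities rather than raw counts are compared between samples.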

Please note that this script is split into several chunks, so the working directory has to be set in each one of them, otherwise the code is not going to run. If you are working in a plain R script, you don't need to reset it each time.

Now that you have created a data frame summarizing everything, you can start your statistical analysis.

Statistical analysis

Setting the inputs and outputs

output_data <- "~/project/tables/" ## set directory to output data
output_graphs <- "~/project/graphs/" ## set directory to output graphs

wd <- "~/project/tables/"
setwd(wd)

df <- read.csv("df.csv")
c_data <- read_excel("clinical_data.xlsx") 

df2 <- full_join(df, c_data, by = "patient") ##be sure you can merge with a column in common

Generating the graphs

Now, depending on the question you want to answer you may want to generate specific graphs.

In this section, we are going to generate graphs and add statistical values on them. To do this, we are going to need to load the ggpubr package. It works on the same principle as ggplot2 but it helps you to add the results of statistical tests on your graphs.

What is the difference of infiltration of CD3 cells before and after treatment?

To answer this question, the best representation is a box plot:

Comparison between two groups

Let's say you have a column with the timepoint (before/after treatment) and a column with the densities of CD3 (number of CD3+ cells in your tumor divided by the tumor area). Here we use a Wilcoxon test, a non-parametric, rank-based test: it compares the distributions of CD3 density before and after treatment rather than their means. Note also that we use a paired test (the Wilcoxon signed-rank test), because the tissues before and after treatment come from the same patient. Pay attention: you need to have the same patients and the same number of tissues before and after; otherwise, remove the patients for whom tissues are missing.

If you want to compare the type of response between patients (complete, partial, ...), you compare different patients so don't forget to set paired to FALSE!

In the stat_compare_means() function, we can also add the argument label where we define if we want to know the exact p value (= "p.format") or if we want to know if it is significant or not (= "p.signif").

a <- ggboxplot(df2, x="timepoint", y="dCD3", add="jitter")+
 stat_compare_means(method="wilcox.test", paired = TRUE, aes(group=timepoint),label.y.npc = 0.9, label="p.format")
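Before running a paired test, it can help to check that every patient really has both timepoints. The sketch below does this in base R on invented data and then calls `wilcox.test()` directly, which matches the two vectors by position. The patient IDs and densities are made up, and the data frame `toy` is a hypothetical stand-in for df2.

```r
## Toy data: patient P03 has no "after" biopsy and must be dropped
toy <- data.frame(
  patient   = c("P01", "P01", "P02", "P02", "P03", "P04", "P04"),
  timepoint = c("before", "after", "after", "before", "before", "before", "after"),
  dCD3      = c(120, 310, 260, 90, 150, 200, 420)
)

## Keep only the patients who have both timepoints
n_tp <- tapply(toy$timepoint, toy$patient, function(tp) length(unique(tp)))
complete <- names(n_tp)[n_tp == 2]
paired <- toy[toy$patient %in% complete, ]

## Sort so that the i-th "before" and the i-th "after" value belong to the same patient
paired <- paired[order(paired$timepoint, paired$patient), ]
before <- paired$dCD3[paired$timepoint == "before"]
after  <- paired$dCD3[paired$timepoint == "after"]

res <- wilcox.test(before, after, paired = TRUE)
res$p.value
```

With only three complete pairs the test has very little power, so this example is about the data preparation, not about the p-value.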

Comparison between three or more groups

If you have more than two groups (for example, you compare biopsies collected at timepoint 0, 2 and 15), you need to specify the tests you want to make.

First, create your graph (same as before, defining x and y and adding points on your box plot with jitter). Then create a variable with the comparisons you want to make and then add it in the function stat_compare_means().

b <- ggboxplot(df2, x = "timepoint", y = "dCD3", add="jitter")

my_comparisons <- list( c("0", "2"), c("2", "15"), c("0", "15") )

b <- b +
  stat_compare_means(method="wilcox.test", paired = TRUE, comparisons = my_comparisons, aes(group = timepoint), label.y.npc = 0.9, label="p.format")+ 
  scale_x_discrete(breaks=c("0","2","15"),labels=c("Baseline","Week 2", "Week 15"))+
  xlab("timepoint") + ylab("Density of CD3+ cells (cells/mm²)") + 
  ggtitle("Density of CD3+ cells in a biopsy (cells/mm²)", subtitle =  "at week 0, 2 and 15") +
  theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5)) + 
  theme(text = element_text(size = 15))

Is there a difference in the proliferation of CD3+ cells between patients who respond or not to the treatment?

Now, let's imagine you have a column with the densities of CD3+ RORC+ cells and a second column with the densities of CD3+ RORC- cells, and you want to put them in one graph. You first need to gather them into a single column (long format).

df_long <- df2 %>%
    pivot_longer(names_to = "cell_type", ##CD3+RORC+ or CD3+RORC-
                 values_to = "density", ##numeric values
                 cols = starts_with("dCD3")) ##define the columns you want to take

##OR

df_long <- df2 %>%
    pivot_longer(cols=c('dCD3_rorc', 'dCD3_nororc'), ##other way to define the columns you want
                 names_to = "cell_type", ##CD3+RORC+ or CD3+RORC-
                 values_to = "density") ##numeric values

Then this can be presented in a graph:

c <- ggboxplot(df_long, x="cell_type", y="density", add="jitter")+
 stat_compare_means(method="wilcox.test", paired = FALSE, aes(group=cell_type),label.y.npc = 0.9, label="p.format") + 
  facet_wrap(~ response) ## duplicate your graph to see the difference in the density of CD3 RORC between patients who respond or not to the treatment

What is the difference in the proportions of scoring before and after treatment?

To answer this question, you will need to create a pie chart rather than a box plot.

First step: create a new data frame for each condition (before/after treatment) by counting the number of biopsies with each "immune score".

df_bt <- data.frame(
  immune_score = c("low","intermediate","high"),
  BeforeT = c(nrow(filter(df2, timepoint=="before") %>% filter(immune_score=="low")),
               nrow(filter(df2, timepoint=="before") %>%filter(immune_score=="intermediate")),
               nrow(filter(df2, timepoint=="before") %>%filter(immune_score=="high")))
)


df_at <- data.frame(
  immune_score = c("low","intermediate","high"),
  AfterT = c(nrow(filter(df2, timepoint=="after") %>% filter(immune_score=="low")),
             nrow(filter(df2, timepoint=="after") %>%filter(immune_score=="intermediate")),
             nrow(filter(df2, timepoint=="after") %>%filter(immune_score=="high")))
)

Second step: create the labels (percentage of each "immune score"), select a color palette, define the layout of the graph (with base R graphics, not ggplot2) and then draw the two pies (before/after treatment) with your 3 categories (here immune_score).

# Calculate the percentage of  sections and put it in the label
alabels <- round(100*df_bt$BeforeT/sum(df_bt$BeforeT), 1)
alabels <- paste(alabels, "%", sep="")

# Calculate the percentage of  sections and put it in the label
blabels <- round(100*df_at$AfterT/sum(df_at$AfterT), 1)
blabels <- paste(blabels, "%", sep="")

palette <- c("#CCCCFF" , "#FFE5CC", "#C4F5D7") ##select a color per category

layout(matrix(c(1,2,3,3), ncol=2, byrow=TRUE), heights=c(1, 1)) ##two pies = two columns
## matrix : first row two pies thus 1 and 2, second row, legend : one legend so 3, 3 (two times cause on the upper level you have 1, 2).

par(mai=rep(0.2, 4))
pie(df_bt$BeforeT, main="Before treatment",col = palette, labels=alabels, clockwise = TRUE, cex = 2, cex.main = 2)
pie(df_at$AfterT, main="After treatment", col = palette, labels=blabels, clockwise = TRUE, cex = 2, cex.main = 2)

par(mai=c(0,0,0,0))
plot.new()

legend("center", legend = df_bt$immune_score, fill = palette, ncol=3, cex=1.5, title="Immune score proportions before and after treatment") ## add legend (one entry per immune score, not per count)

Third step: calculate the p-value.

Here we use Fisher's exact test because we want to know whether the distribution of the different "immune scores" differs between the timepoints (before and after treatment). You could use a chi-square test, but when you don't have a lot of values its approximation becomes unreliable, and Fisher's exact test is preferable because it gives an exact p-value (and not an estimation).

df<-df2 %>% select(immune_score, timepoint) %>% filter(timepoint=="before" | timepoint=="after")
df <- mutate(df, biopsies=1)
df$immune_score <- as.factor(df$immune_score)
class(df$immune_score)
df <- ddply(df, .(timepoint, immune_score), summarize, biopsies=sum(biopsies))
df <- xtabs(biopsies~timepoint+immune_score, data=df)
df

chisq.test(df)$expected ##Check these expected counts: if at least one of them is lower than 5, use Fisher's exact test

fisher.result <- fisher.test(df)
print(fisher.result$p.value)
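The decision between the two tests can be sketched on a made-up contingency table. The code below uses only base R; the counts are invented, and the "expected count below 5" threshold is the rule of thumb stated above, not a strict requirement.

```r
## Toy contingency table: number of biopsies per timepoint and immune score
tab <- matrix(c(6, 3, 1,
                2, 3, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(timepoint = c("before", "after"),
                              immune_score = c("low", "intermediate", "high")))

## Expected counts under independence; chisq.test() warns about the
## approximation for small counts, which is exactly what we are checking
expected <- suppressWarnings(chisq.test(tab)$expected)

## Rule of thumb: any expected count below 5 -> Fisher's exact test
use_fisher <- any(expected < 5)
p_value <- if (use_fisher) fisher.test(tab)$p.value else chisq.test(tab)$p.value

use_fisher
p_value
```

Here every expected count is below 5, so the sketch falls back on Fisher's exact test.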

Repeat this test if you have more categories (timepoints 0, 2 and 15 for example).

What is the general overview of the densities of all the cell types in my multiplex analysis?

For this question, you can create a heatmap. If you have several multiplex panels, you can merge the tables of the different panels with the full_join() function, then join in the clinical data. Let's continue with our first panel, with CD3 and CD8 staining, which we joined with the clinical data into the df2 variable.

For your heatmap:

select only the columns with the cell densities you want to study and the name of the biopsy
transform the biopsy column into rownames: you need to have an assay with only numeric values
normalize your data, here we show one way of normalization but it may not be the best one (please pay attention that the results are going to be different if you normalize on the columns or on the rows).
draw your heatmap.
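To see why the direction of the normalization matters, the sketch below standardizes a toy density matrix once per column (per marker) and once per row (per biopsy). It uses base R's `scale()` as a stand-in and makes no claim about how a particular normalization package treats rows and columns; the matrix values are made up.

```r
## Toy density matrix: biopsies in rows, markers in columns
dens <- matrix(c(10, 20, 30,
                 100, 400, 700),
               nrow = 3,
               dimnames = list(c("b1", "b2", "b3"), c("dCD3", "dCK")))

## z-score each marker across biopsies (column-wise)
by_col <- scale(dens)

## z-score each biopsy across markers (row-wise): transpose, scale, transpose back
by_row <- t(scale(t(dens)))

round(by_col, 2)
round(by_row, 2)
```

Column-wise scaling puts markers with very different abundance on a comparable scale, whereas row-wise scaling expresses each marker relative to the other markers of the same biopsy, so the two heatmaps answer different questions.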

A note on SummarizedExperiment objects: when working with heatmaps, you need a matrix containing only numeric values, but you will sometimes want to add annotations (clinical data, etc.). You therefore need to link the annotations to the numeric data without putting the annotation columns inside the matrix. You can do this easily with a SummarizedExperiment object (https://bioconductor.org/packages/release/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html), which is usually used with sequencing data.

df <- df2[,c("biopsy", "dCD3", "dCD4", "dCD8", "dCK")]

df <- column_to_rownames(df, "biopsy")
boxplot(df)

df1 <- normalize(df, method = "standardize")
boxplot(df1) ##just to see the overall distribution

ra <- as.matrix(df1) ##transform you assay into a matrix
rb <- na.omit(ra) ##remove all NA values

#Basic heatmap

ggp1 <- heatmap(rb) 
ggp1

rf <- t(rb) ##transpose the table

ggp2 <- Heatmap(rf, border = TRUE) ##heatmap the other way round
ggp2 ##best to have the cell's densities in rows and patients/biopsies in columns

# Complex heatmap

##create a data frame with the clinical data etc
colData <- df2 %>%
  select(timepoint, patient)

##Create a Summarized experiment (link a matrix and coldata)
se <- SummarizedExperiment(assays = rf, colData = colData)

ggpa <- Heatmap(assay(se), column_km = 3, border = TRUE) ##independant clustering
##column_km = 3 creates 3 clusters
ggpa 

colnames(assay(se)) ##check colnames: should be the same + same length as those in annotation df!

##create a variable with annotations (ha1 for treatment, ha2 for ...)
ha1 <- HeatmapAnnotation(treatment = df2$treatment, 
                         col = list(treatment = c("immunotherapy" = "#010000", "standard_of_care" = "#ffffff")))

ggpb <- Heatmap(assay(se), name = "density", column_split = colData(se)[1], border = TRUE, top_annotation = c(ha1))
                  
ggpb ##is there a difference according to the timepoint ? cause we split according to timepoint: colData(se)[1]

ggpc <- Heatmap(assay(se), column_split = colData(se)[2], border = TRUE, top_annotation = c(ha1))
ggpc ##split the graph according to patients: colData(se)[2]

ggpd <- Heatmap(assay(se), border = TRUE,top_annotation = c(ha1))
ggpd ##unsupervised clustering

ggpe <- Heatmap(assay(se), column_km = 3, border = TRUE, top_annotation = c(ha1))
ggpe

You have now seen the different graphs you can generate with your data. You can apply the same principle to answer all of your questions regarding the variation of the densities of your cells according to the timepoint, the response to the treatment, the type of treatment received, ...

Distances-based analysis using G-cross function

Compare the probability for an i cell (e.g. a tumor cell) to meet a j cell (e.g. a CD3+ T cell) inside the tumor biopsy

For this next section, if you want more details on the G-cross function, you can check the following articles:

Parra ER. Methods to Determine and Analyze the Cellular Spatial Distribution Extracted From Multiplex Immunofluorescence Data to Understand the Tumor Microenvironment. Front Mol Biosci. 2021;8:668340. Published 2021 Jun 14. doi:10.3389/fmolb.2021.668340. https://pubmed.ncbi.nlm.nih.gov/34179080/.
Barua S, Fang P, Sharma A, Fujimoto J, Wistuba I, Rao AUK, Lin SH. Spatial interaction of tumor cells and regulatory T cells correlates with survival in non-small cell lung cancer. Lung Cancer. 2018 Mar;117:73-79. doi: 10.1016/j.lungcan.2018.01.022. https://pubmed.ncbi.nlm.nih.gov/29409671/.
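For intuition, the G-cross function at radius r is the share of i cells that have at least one j cell within distance r. The base R sketch below computes this raw, uncorrected version on toy coordinates. It is not what spatstat's Gcross() returns, since it ignores edge correction; `nearest_dist()` and `g_cross_raw()` are hypothetical helpers.

```r
## Distance from each row of `from` to its nearest row of `to`
## (both are two-column coordinate matrices, here in micrometres)
nearest_dist <- function(from, to) {
  apply(from, 1, function(p) {
    min(sqrt((to[, 1] - p[1])^2 + (to[, 2] - p[2])^2))
  })
}

## Uncorrected empirical G-cross: share of i cells with a j cell within r
g_cross_raw <- function(from, to, r) {
  d <- nearest_dist(from, to)
  vapply(r, function(ri) mean(d <= ri), numeric(1))
}

tumor <- cbind(x = c(0, 10, 50), y = c(0, 0, 50))  # toy tumor cells (i)
cd3   <- cbind(x = c(3, 10),     y = c(4, 30))     # toy CD3+ cells (j)

g_values <- g_cross_raw(tumor, cd3, r = c(6, 20, 100))
g_values
```

In this toy example one of the three tumor cells has a CD3+ cell within 6 µm, two have one within 20 µm, and all of them have one within 100 µm, so the curve rises from 1/3 to 1.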

setwd("~/project/tables/")

x <- list.files ("~/project/data", pattern=".csv")

AUC_plot <- function(a){
  
Main_table <- read.table(a, sep=",",header=TRUE)
  
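Main_table <- Main_table %>%
  mutate(Xadj = Xadj*1000) %>% ## convert the coordinates from mm to µm
  mutate(Yadj = Yadj*1000) %>%
  mutate(flag1 = ifelse(str_detect(Main_table$flag, "CD3+|CD8+ CD3+"), "CD3+", Main_table$flag)) ## relabel the T cells as CD3+ cells; "+" is a regex quantifier here, so the pattern matches every flag containing "CD3", including "CD8+ CD3-"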
  
p.sf <- st_as_sf(Main_table, coords = c("Xadj", "Yadj")) ## change the coordinates into the good format for the spatstat package
s.sp <- as.ppp(X=st_coordinates(p.sf$geometry), W=p.sf$geometry) 



###################CD3+################

marks(s.sp) <- factor(p.sf$flag1, levels=c("CD3+", "Stromal cells", "Tumor cells"))
  
valueG <- Gcross(s.sp, i="Tumor cells", j="CD3+", correction="rs") ## use the G-cross function to estimate the probability for a tumor cell to meet a CD3+ T cell, using the border correction (please see the description of the Gcross arguments in R)
plot(valueG, main = str_replace(a, ".csv", ""))
  
  try(complete_plot_path_name <- paste(output_graphs,  str_replace(a, ".csv", "_Gcrossfunction_CD3"), ".png", sep = ""))
dev.copy(png,complete_plot_path_name )
 dev.off()
  
 rs <- as.data.frame(valueG$rs)
 r <- as.data.frame(valueG$r)
 
rs_r <- cbind(r,rs)
 
AUC <- AUC(valueG$r, valueG$rs, from = min(valueG$r, na.rm = TRUE), to = 20, 
    method = "trapezoid", 
    absolutearea = FALSE, subdivisions = 100, na.rm = FALSE) ## area under the G-cross curve for a radius from 0 to 20 µm, for example, as described in Barua et al. 2018

plot1 <- ggplot(rs_r, aes(x=valueG$r, y=valueG$rs)) + geom_point()+geom_area(fill = "light green")+ annotate("text",  label= "AUC (r<20µm) = ", x=14, y=0.9, size=4.5, color="light green")+ annotate("text",  label= round(AUC[1],5), x=45, y=0.9, size=4.5, color="light green")+theme_bw()+labs(title=str_replace(a, ".csv","CD3"), y="Border corrected estimate of G", x="Distance (µm)")+theme(plot.title = element_text(hjust = 0.5))

complete_plot_path_name <- paste(output_graphs, str_replace(a, ".csv", "_AUC_plot_CD3"), ".png", sep = "")
ggsave(plot1, file = complete_plot_path_name, dpi = 320, units = "mm")
}

lapply(FUN = AUC_plot, x)
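The area under the curve up to 20 µm is a trapezoid-rule integral, which can be sketched in a few lines of base R. This is an illustration of the principle on a made-up curve, not the DescTools implementation; `trapezoid_auc()` is a hypothetical helper.

```r
## Trapezoid rule on the part of the curve with x <= to
trapezoid_auc <- function(x, y, to) {
  keep <- x <= to
  x <- x[keep]
  y <- y[keep]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

## Toy G-cross curve: rises linearly and reaches 1 at r = 25 µm
r <- seq(0, 50, by = 1)
g <- pmin(r / 25, 1)

auc20 <- trapezoid_auc(r, g, to = 20)
auc20
```

A curve that rises quickly (j cells close to the i cells) gives a larger area for the same radius, which is why the AUC can be used as a single infiltration score per biopsy.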

Quadrat count

Another way of studying tumor-infiltrating lymphocytes is the quadrat count: the biopsy is divided into squares and the lymphocyte infiltration is studied in each square.
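The principle can be sketched in base R without any spatial package: cut both axes into squares, count the cells per square and keep the highest counts. The coordinates below are made up, the 0.25 mm square side mirrors the value used in the code of this section, and the sketch ignores the tissue outline that the real analysis takes into account.

```r
## Toy lymphocyte coordinates in mm, inside a 1 mm x 1 mm field
pts <- data.frame(
  x = c(0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95),
  y = c(0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95)
)

side <- 0.25                     # side of one quadrat in mm
breaks <- seq(0, 1, by = side)   # quadrat borders along each axis

## Assign each cell to a quadrat along X and Y, then count cells per quadrat
qx <- cut(pts$x, breaks, include.lowest = TRUE)
qy <- cut(pts$y, breaks, include.lowest = TRUE)
counts <- table(qx, qy)

## The three highest counts per quadrat
top3 <- sort(as.vector(counts), decreasing = TRUE)[1:3]
counts
top3
```

Every cell falls into exactly one quadrat, so the counts always sum to the number of cells, which is a convenient check when you change the quadrat size.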

3 HS computation


#input_data <- "project/data/"

CD4HS<-c(1,2,3)
CD4HS <- as.data.frame(CD4HS)
CD4HS <- t(CD4HS)
CD8HS<-c(1,2,3)
CD8HS <- as.data.frame(CD8HS)
CD8HS <- t(CD8HS)

name_to_store_CD4 <- paste(output_data,"CD4HS.txt",sep = "")
write.table(CD4HS, file = name_to_store_CD4)

name_to_store_CD8 <- paste(output_data,"CD8HS.txt",sep = "")
write.table(CD8HS, file = name_to_store_CD8)

input_vector_x <- list.files("~/project/data", pattern = ".csv")

quadrat_computation <- function(x){
  
  nameOftable <- x
  input_path <- paste(output_data,nameOftable, sep = "")
  Main_table <- read.csv(input_path)
  
  CD4HS <- read.table(name_to_store_CD4, header=TRUE, sep=" ", stringsAsFactors=FALSE)
  CD8HS <- read.table(name_to_store_CD8, header=TRUE, sep=" ", stringsAsFactors=FALSE)
  as.data.frame(CD4HS)
  as.data.frame(CD8HS)
  
  
  CD8<- Main_table[which(Main_table$flag=="CD8+ CD3+" | Main_table$flag=="CD8+ CD3-"), ] 
  CD4<- Main_table[which(Main_table$flag=="CD3+"), ]
  CD8<-CD8[c('Xadj', 'Yadj')]
  CD4<-CD4[c('Xadj', 'Yadj')]
  
  
  
  Main_table <- sf::st_as_sf(Main_table, coords=c('Xadj', 'Yadj'))
  polygons <- concaveman(Main_table, concavity = 1, length_threshold = 0)
  Main_table <- read.csv(input_path)
  coordinates(Main_table)<- c('Xadj', 'Yadj')
  summary(Main_table)
  plot(Main_table, pch = 20, col = "steelblue")
  plot(Main_table)
  plot(polygons)
  poly_total <- as_Spatial(polygons)
  poly_total <- as.owin.SpatialPolygons(poly_total)
  box <- boundingbox(poly_total)
  plot.owin(box)
  box <- as.data.frame.owin(box)
  box
  x1<-box[1,1]
  x2<-box[2,1]
  x2
  x_lenght <- x2-x1
  x_lenght
  y1 <- box[1,2]
  y2 <- box[3,2]
  y_lenght<-y2-y1
  y_lenght
  
  
  test.ppp<-ppp(x=CD8$Xadj, y=CD8$Yadj, poly_total)
  
  
  a <- tryCatch({plot(quadratcount(test.ppp, nx = (x_lenght/0.25) , ny = (y_lenght/0.25)))
    area(poly_total)
    Qcount<-data.frame(quadratcount(test.ppp, nx= (x_lenght/0.25) , ny = (y_lenght/0.25)))
    QCountTable <- data.frame(table(Qcount$Freq))
    QCountTable <- QCountTable[order(QCountTable[,1], decreasing = TRUE),]
    QCountTable <- QCountTable[,1]
    a <- QCountTable[c(1:3)]
    a <- as.data.frame(a)
    a <- t(a)},
    error = function(e){a <- data.frame(V1=0,V2=0,V3=0)
    return(a)})
  
  
  row.names(a) <- nameOftable
  CD8HS <- rbind(CD8HS, a)
  name_to_store <- paste(output_data,"CD8HS.txt",sep = "")
  write.table(CD8HS, file = name_to_store)
  rm(QCountTable,Qcount,a)
  
  
  test.ppp<-ppp(x=CD4$Xadj, y=CD4$Yadj, poly_total)
  
  #L <- sqrt((((area(poly_total)))/10))
  #L
  b <- tryCatch({plot(quadratcount(test.ppp, nx = (x_lenght/0.25) , ny = (y_lenght/0.25)))
    area(poly_total)
    Qcount<-data.frame(quadratcount(test.ppp, nx= (x_lenght/0.25) , ny = (y_lenght/0.25)))
    QCountTable <- data.frame(table(Qcount$Freq))
    QCountTable <- QCountTable[order(QCountTable[,1], decreasing = TRUE),]
    QCountTable <- QCountTable[,1]
    b<- QCountTable[c(1:3)]
    b <- as.data.frame(b)
    b <- t(b)},
    error = function(e){b <- data.frame(V1=0,V2=0,V3=0)
    return(b)})
  
  row.names(b) <- nameOftable
  CD4HS <- rbind(CD4HS, b)
  name_to_store <- paste(output_data,"CD4HS.txt",sep = "")
  write.table(CD4HS, file = name_to_store)
  rm(QCountTable,Qcount,b)
  
}
lapply(FUN=quadrat_computation, input_vector_x)
