Introduction


Once you have generated summary data (e.g. number of T cells per sample, etc) in Spectre, or other programs such as FlowJo, it can be laborious to manually enter these values into a program to generate graphs for statistical analysis. Here we present a quick workflow script to rapidly generate graphs and heatmaps for quantitative, differential, and statistical analysis. Note, this can be used on summary data generated by any program, including FlowJo.

image2021-2-28_11-49-51.png



Citation

If you use Spectre in your work, please consider citing Ashhurst TM, Marsh-Wakefield F, Putri GH et al. (2020). bioRxiv. 2020.10.22.349563To continue providing open-source tools such as Spectre, it helps us if we can demonstrate that our efforts are contributing to analysis efforts in the community. Please also consider citing the authors of the individual packages or tools (e.g. CytoNorm, FlowSOM, tSNE, UMAP, etc) that are critical elements of your analysis work. We have provided some generic text that you can use for your methods section with each protocol and on the 'about' page.


Software and R script preparation


Note

  • If you haven't installed Spectre, please visit our Spectre installation page.
  • If you aren't familiar with using RStudio or Spectre, please see our RStudio and Spectre basic tutorials.


You can find the 'Spectre stats' script (and some demo data) here:

Spectre stats ZIP.zip

Copy the script into a folder containing a CSV with your summary data.

Your summary data (in this example, sum.dat.csv) should be saved as a CSV File. You will need at minimum one column that denotes the sample name, and one that denotes the group each sample belongs to. You can have as many additional 'annotation' columns as you like (e.g. batch, timepoint, treatment, etc). Each other column should represent some feature of the samples (CD4 T cells per sample, expression of Ly6C on monocytes in each sample, etc). Tables like this are generated using the 'Table Editor' from FlowJo.


1. Read in summary data


##########################################################################################################
#### 1. Load packages, and set working directory
##########################################################################################################

Load the packages.

    ### Load libraries

        library(Spectre)
        Spectre::package.check()    # Check that all required packages are installed
        Spectre::package.load()     # Load required packages

Save the folder where this script is located as 'PrimaryDirectory'.

    ### Set PrimaryDirectory
        
        dirname(rstudioapi::getActiveDocumentContext()$path)            # Finds the directory where this script is located
        setwd(dirname(rstudioapi::getActiveDocumentContext()$path))     # Sets the working directory to where the script is located
        getwd()
        PrimaryDirectory <- getwd()

Create an output directory for the plots. This will be a subfolder of PrimaryDirectory.

    ### Create output directory
        
        dir.create("Output_Spectre_stats", showWarnings = FALSE)
        setwd("Output_Spectre_stats")
        OutputDirectory <- getwd()
        setwd(PrimaryDirectory)



2. Import summary data


##########################################################################################################
#### 2. Import summary data
##########################################################################################################

Import the summary data into R. Note, the summary data needs to be saved as a '.csv' file (if using excel, try 'Save As' and select CSV).

    ### Import data
        
        setwd(PrimaryDirectory)
        sum.dat <- fread("sum.dat.csv")

Preview the data.

    ### Columns for analysis
        
        sum.dat

Examine the column names.

        as.matrix(names(sum.dat))
     [,1]                                              
 [1,] "Sample"                                          
 [2,] "Group"                                           
 [3,] "Batch"                                           
 [4,] "Cells per sample -- CD4 T cells"                 
 [5,] "Cells per sample -- CD8 T cells"                 
 [6,] "Cells per sample -- Infil Macrophages"           
 [7,] "Cells per sample -- Microglia"                   
 [8,] "Cells per sample -- Neutrophils"                 
 [9,] "Cells per sample -- NK cells"                    
[10,] "Percent Ly6C_asinh positive -- CD4 T cells"      
[11,] "Percent Ly6C_asinh positive -- CD8 T cells"      
[12,] "Percent Ly6C_asinh positive -- Infil Macrophages"
[13,] "Percent Ly6C_asinh positive -- Microglia"        
[14,] "Percent Ly6C_asinh positive -- Neutrophils"      
[15,] "Percent Ly6C_asinh positive -- NK cells"         

Define the columns that denote: sample names, group names, and then any other annotation columns. In this example:

  • Column 1 ('Sample') is the sample column
  • Column 2 ('Group') is the group column
  • Columns 2 ('Group') and 3 ('Batch') are both going to be annotation columns


        sample.col <- names(sum.dat)[c(1)] 
        group.col <- names(sum.dat)[c(2)]
        annot.cols <- names(sum.dat)[c(2:3)]

We can also specify which columns are sample features we wish to plot (in the demo, columns 4 to 15).

plot.cols <- names(sum.dat)[c(4:15)]
as.matrix(plot.cols)
     [,1]                                              
 [1,] "Cells per sample -- CD4 T cells"                 
 [2,] "Cells per sample -- CD8 T cells"                 
 [3,] "Cells per sample -- Infil Macrophages"           
 [4,] "Cells per sample -- Microglia"                   
 [5,] "Cells per sample -- Neutrophils"                 
 [6,] "Cells per sample -- NK cells"                    
 [7,] "Percent Ly6C_asinh positive -- CD4 T cells"      
 [8,] "Percent Ly6C_asinh positive -- CD8 T cells"      
 [9,] "Percent Ly6C_asinh positive -- Infil Macrophages"
[10,] "Percent Ly6C_asinh positive -- Microglia"        
[11,] "Percent Ly6C_asinh positive -- Neutrophils"      
[12,] "Percent Ly6C_asinh positive -- NK cells"              

Here we also need to specify the order in which we want the groups

    ### Experimental groups                  as.matrix(unique(sum.dat[[group.col]]))
     [,1]      
[1,] "Mock"    
[2,] "WNV"     
[3,] "WNV + Rx"

If we want to change the order, we can do it here (in the demo, the groups are already in the correct order – if they weren't, we could change it: c("WNV", "Mock", "WNV + Rx") etc).

        grp.order <- c("Mock", "WNV", "WNV + Rx")         
		as.matrix(grp.order)
     [,1]      
[1,] "Mock"    
[2,] "WNV"     
[3,] "WNV + Rx"

We also need to define which paired comparisons we want to make for statistics. In the demo we have three: Mock vs WNV, WNV vs WNV + Rx, and Mock vs WNV + Rx.

        comparisons <- list(c("Mock", "WNV"),
                            c("WNV", "WNV + Rx"),
                            c("Mock", "WNV + Rx")
                            )

Check the entries

        comparisons
[[1]]
[1] "Mock" "WNV" 

[[2]]
[1] "WNV"      "WNV + Rx"

[[3]]
[1] "Mock"     "WNV + Rx"

Define the statistical tests to use.

    ### Setup
    
        variance.test <- 'kruskal.test'
        pairwise.test <- "wilcox.test"

Reorder the rows

    ### Reorder summary data and SAVE
        
        sum.dat <- do.reorder(sum.dat, group.col, grp.order)
        
        sum.dat
        sum.dat[,c(1:5)]



3. Output analysis


##########################################################################################################
#### 3. Output analysis
##########################################################################################################

Save the data.

    setwd(OutputDirectory)
    fwrite(sum.dat, 'sum.dat.csv')
##########################################################################################################
#### 3. Output analysis
##########################################################################################################

Loop to create one plot per feature.

    ### Autographs

        for(i in plot.cols){

            make.autograph(sum.dat,
                           x.axis = group.col,
                           y.axis = i,
                           y.axis.label = i,
                           
                           grp.order = grp.order,
                           my_comparisons = comparisons,
                           
                           Variance_test = variance.test,
                           Pairwise_test = pairwise.test,
                           
                           title = i,
                           filename = paste0(i, '.pdf'))
            
        }

Z-score transformation of data.

    ### Create a fold change heatmap
        
        ## Z-score calculation
        
        sum.dat.z <- do.zscore(sum.dat, plot.cols, replace = TRUE)
        
        ## Group 
        
        t.first <- match(grp.order, sum.dat.z[[group.col]])
        t.first <- t.first -1
        t.first

Make a z-score heatmap.

        ## Make heatmap
        make.pheatmap(sum.dat.z, 
                      sample.col = sample.col, 
                      plot.cols = plot.cols, 
                      is.fold = TRUE, 
                      plot.title = 'Z-score',
                      annot.cols = annot.cols,
                      dendrograms = 'column',
                      row.sep = t.first,
                      cutree_cols = 3)



4. Output session info


##########################################################################################################
#### Output session info
##########################################################################################################
        setwd(OutputDirectory)
        dir.create("Output - info", showWarnings = FALSE)
        setwd("Output - info")

        sink(file = "session_info.txt", append=TRUE, split=FALSE, type = c("output", "message"))
        session_info()
        sink()


  • No labels