Introduction
Once you have generated summary data (e.g. number of T cells per sample, etc) in Spectre, or other programs such as FlowJo, it can be laborious to manually enter these values into a program to generate graphs for statistical analysis. Here we present a quick workflow script to rapidly generate graphs and heatmaps for quantitative, differential, and statistical analysis. Note, this can be used on summary data generated by any program, including FlowJo.
Citation
If you use Spectre in your work, please consider citing Ashhurst TM, Marsh-Wakefield F, Putri GH et al. (2020). bioRxiv. 2020.10.22.349563. To continue providing open-source tools such as Spectre, it helps us if we can demonstrate that our efforts are contributing to analysis efforts in the community. Please also consider citing the authors of the individual packages or tools (e.g. CytoNorm, FlowSOM, tSNE, UMAP, etc) that are critical elements of your analysis work. We have provided some generic text that you can use for your methods section with each protocol and on the 'about' page.
Software and R script preparation
Note
- If you haven't installed Spectre, please visit our Spectre installation page.
- If you aren't familiar with using RStudio or Spectre, please see our RStudio and Spectre basic tutorials.
You can find the 'Spectre stats' script (and some demo data) here:
Copy the script into a folder containing a CSV with your summary data.
Your summary data (in this example, sum.dat.csv) should be saved as a CSV File. You will need at minimum one column that denotes the sample name, and one that denotes the group each sample belongs to. You can have as many additional 'annotation' columns as you like (e.g. batch, timepoint, treatment, etc). Each other column should represent some feature of the samples (CD4 T cells per sample, expression of Ly6C on monocytes in each sample, etc). Tables like this are generated using the 'Table Editor' from FlowJo.
1. Read in summary data
########################################################################################################## #### 1. Load packages, and set working directory ##########################################################################################################
Load the packages.
### Load libraries library(Spectre) Spectre::package.check() # Check that all required packages are installed Spectre::package.load() # Load required packages
Save the folder where this script is located as 'PrimaryDirectory'.
### Set PrimaryDirectory dirname(rstudioapi::getActiveDocumentContext()$path) # Finds the directory where this script is located setwd(dirname(rstudioapi::getActiveDocumentContext()$path)) # Sets the working directory to where the script is located getwd() PrimaryDirectory <- getwd()
Create an output directory for the plots. This will be a subfolder of PrimaryDirectory.
### Create output directory dir.create("Output_Spectre_stats", showWarnings = FALSE) setwd("Output_Spectre_stats") OutputDirectory <- getwd() setwd(PrimaryDirectory)
2. Import summary data
########################################################################################################## #### 2. Import summary data ##########################################################################################################
Import the summary data into R. Note, the summary data needs to be saved as a '.csv' file (if using excel, try 'Save As' and select CSV).
### Import data setwd(PrimaryDirectory) sum.dat <- fread("sum.dat.csv")
Preview the data.
### Columns for analysis sum.dat
Examine the column names.
as.matrix(names(sum.dat))
[,1] [1,] "Sample" [2,] "Group" [3,] "Batch" [4,] "Cells per sample -- CD4 T cells" [5,] "Cells per sample -- CD8 T cells" [6,] "Cells per sample -- Infil Macrophages" [7,] "Cells per sample -- Microglia" [8,] "Cells per sample -- Neutrophils" [9,] "Cells per sample -- NK cells" [10,] "Percent Ly6C_asinh positive -- CD4 T cells" [11,] "Percent Ly6C_asinh positive -- CD8 T cells" [12,] "Percent Ly6C_asinh positive -- Infil Macrophages" [13,] "Percent Ly6C_asinh positive -- Microglia" [14,] "Percent Ly6C_asinh positive -- Neutrophils" [15,] "Percent Ly6C_asinh positive -- NK cells"
Define the columns that denote: sample names, group names, and then any other annotation columns. In this example:
- Column 1 ('Sample') is the sample column
- Column 2 ('Group') is the group column
- Columns 2 ('Group') and 3 ('Batch') are both going to be annotation columns
sample.col <- names(sum.dat)[c(1)] group.col <- names(sum.dat)[c(2)] annot.cols <- names(sum.dat)[c(2:3)]
We can also specify which columns are sample features we wish to plot (in the demo, columns 4 to 15).
plot.cols <- names(sum.dat)[c(4:15)]
as.matrix(plot.cols)
[,1] [1,] "Cells per sample -- CD4 T cells" [2,] "Cells per sample -- CD8 T cells" [3,] "Cells per sample -- Infil Macrophages" [4,] "Cells per sample -- Microglia" [5,] "Cells per sample -- Neutrophils" [6,] "Cells per sample -- NK cells" [7,] "Percent Ly6C_asinh positive -- CD4 T cells" [8,] "Percent Ly6C_asinh positive -- CD8 T cells" [9,] "Percent Ly6C_asinh positive -- Infil Macrophages" [10,] "Percent Ly6C_asinh positive -- Microglia" [11,] "Percent Ly6C_asinh positive -- Neutrophils" [12,] "Percent Ly6C_asinh positive -- NK cells"
Here we also need to specify the order in which we want the groups
### Experimental groups as.matrix(unique(sum.dat[[group.col]]))
[,1] [1,] "Mock" [2,] "WNV" [3,] "WNV + Rx"
If we want to change the order, we can do it here (in the demo, the groups are already in the correct order – if they weren't, we could change it: c("WNV", "Mock", "WNV + Rx") etc).
grp.order <- c("Mock", "WNV", "WNV + Rx") as.matrix(grp.order)
[,1] [1,] "Mock" [2,] "WNV" [3,] "WNV + Rx"
We also need to define which paired comparisons we want to make for statistics. In the demo we have three: Mock vs WNV, WNV vs WNV + Rx, and Mock vs WNV + Rx.
comparisons <- list(c("Mock", "WNV"), c("WNV", "WNV + Rx"), c("Mock", "WNV + Rx") )
Check the entries
comparisons
[[1]] [1] "Mock" "WNV" [[2]] [1] "WNV" "WNV + Rx" [[3]] [1] "Mock" "WNV + Rx"
Define the statistical tests to use.
### Setup variance.test <- 'kruskal.test' pairwise.test <- "wilcox.test"
Reorder the rows
### Reorder summary data and SAVE sum.dat <- do.reorder(sum.dat, group.col, grp.order) sum.dat sum.dat[,c(1:5)]
3. Output analysis
########################################################################################################## #### 3. Output analysis ##########################################################################################################
Save the data.
setwd(OutputDirectory) fwrite(sum.dat, 'sum.dat.csv')
########################################################################################################## #### 3. Output analysis ##########################################################################################################
Loop to create one plot per feature.
### Autographs for(i in plot.cols){ make.autograph(sum.dat, x.axis = group.col, y.axis = i, y.axis.label = i, grp.order = grp.order, my_comparisons = comparisons, Variance_test = variance.test, Pairwise_test = pairwise.test, title = i, filename = paste0(i, '.pdf')) }
Z-score transformation of data.
### Create a fold change heatmap ## Z-score calculation sum.dat.z <- do.zscore(sum.dat, plot.cols, replace = TRUE) ## Group t.first <- match(grp.order, sum.dat.z[[group.col]]) t.first <- t.first -1 t.first
Make a z-score heatmap.
## Make heatmap make.pheatmap(sum.dat.z, sample.col = sample.col, plot.cols = plot.cols, is.fold = TRUE, plot.title = 'Z-score', annot.cols = annot.cols, dendrograms = 'column', row.sep = t.first, cutree_cols = 3)
4. Output session info
########################################################################################################## #### Output session info ##########################################################################################################
setwd(OutputDirectory) dir.create("Output - info", showWarnings = FALSE) setwd("Output - info") sink(file = "session_info.txt", append=TRUE, split=FALSE, type = c("output", "message")) session_info() sink()