This pipeline takes either 10X cellranger outputs for two tissues (always needed to generate metadata) or an existing Seurat object stored in an .RDS file, finds matching clones based on TCR chain sequences between these samples, either creates or loads a Seurat object, creates a custom metadata table, then write this modified Seurat object out to an .rds file. Below are some example commands to run the pipeline.
After running this pipeline, the modified metadata in the outputted Seurat v3
object accessible at [email protected]
will have the following
format (with definitions of what the data is rather than actual data):
Cell Barcode | orig.ident | nCount_RNA | nFeature_RNA | RNA_snn_res.1.2 | seurat_clusters | matching | u1 | u1 | freq | Frequency | TCR |
---|---|---|---|---|---|---|---|---|---|---|---|
10X barcode for this individual cell | not-used,artifact from processing | number of reads in this cell | number of genes detected in this cell | Seurat cluster (duplicate) | Seurat Cluster | whether the TCR of this cell was detected in paired tissue | UMAP 1 coordinate | UMAP 2 coordinate | Frequency of TCR (number of total cells indentical TCR detected in) with NAs added to generate color scales for expansion plots in webapp | Frequency (same as freq without NAs) | TCR sequence in the format of TRA|TRB |
Run mount_smbfs //[email protected]/singerlab ./dfci_share
.
Run python clone_pipeline.py /Users/jacobluber/dfci_share/jacob/data/K409/K409/K409LNVDJ/filtered_contig_annotations.csv /Users/jacobluber/dfci_share/jacob/data/K409/K409/K409bloodVDJ/filtered_contig_annotations.csv /Users/jacobluber/dfci_share/jacob/data/K409/K409/K409LNGEXouts/outs/analysis/tsne/2_components/projection.csv /Users/jacobluber/dfci_share/jacob/data/K409/K409/K409bloodGEXouts/outs/analysis/tsne/2_components/projection.csv /Users/jacobluber/dfci_share/jacob/data/K409/K409/K409LNVDJ/clonotypes.csv /Users/jacobluber/dfci_share/jacob/data/K409/K409/K409bloodVDJ/clonotypes.csv k409_ln-blood.csv k409_blood-ln.csv
. Note that clone_pipeline.py
will generate "matching" data for both blood and tumor.
Below is a table explaining the arguments to clone_pipeline.py
. In the example
the first tissue is tumor and the second tissue is blood. Arguments 3 and 4 are
deprecated (UMAP coordinates generated at later step) but still need to be included currently. All inputs are generated using either cellranger count
or cellranger vdj
commands.
clone_pipeline.py argument # | file details |
---|---|
1 | first tissue TCR contig annotations |
2 | second tissue TCR contig annotations |
3 | first tissue GEX t-SNE coordinates |
4 | second tissue GEX t-SNE coordinates |
5 | first tissue TCR clonotype info |
6 | second tissue TCR clonotype info |
7 | output file name for TCR metadata file for first tissue |
8 | output file name for TCR metadata file for second tissue |
Run conda activate R
followed by Rscript append_metadata_to_seurat_object.R /Users/jacobluber/browser/k409.ln-blood.rds k409_ln-blood.csv ~/singer_repos/blood-tcr-pipeline/ k409_test
where the first argument is the existing .rds file, the second argument is the output from generating the TCR metadata above, the third argument is the output path, and the fourth argument is the name of the output file excluding .rds at the end.
Run conda activate R
followed by Rscript make_seurat_object_from_scratch.R /Users/jacobluber/dfci_share/jacob/data/K409/K409/K409LNGEXouts/outs/filtered_gene_bc_matrices/GRCh38/ k409_ln-blood.csv ~/singer_repos/blood-tcr-pipeline/ k409_ln
where the first argument is the full path to the directory containing the cellranger GEX sparse expression matrix, and the remaining arguments are the same as the previous instructions for appending metadata.
This script needs to be run twice: once for blood and once for tumor.
Run seurat_object_name <- readRDS("k409_ln.rds")
.