hemberg-lab / scrna.seq.course

This project forked from rstudio/bookdown-demo

Analysis of single cell RNA-seq data course

Home Page: https://www.singlecellcourse.org

License: GNU General Public License v3.0

TeX 50.07% CSS 6.30% HTML 3.59% Python 6.10% Perl 13.25% R 6.82% Dockerfile 10.91% Nextflow 2.38% Shell 0.58%

bookdown course r scrna-seq scrna-seq-analysis sequencing

scrna.seq.course's Issues

PDF file of the course

Add texlive-full to the Docker image. It takes more than 2 hours to build, so we won't be able to build it on DockerHub; we should probably move Docker image building to Quay. Then we can put everything on https://dockstore.org/

Add Slingshot to Pseudotime

"best" according to: https://www.biorxiv.org/content/early/2018/03/05/276907

obscure normalization results

Hi,
I am normalizing a data set that originally came from 10X Genomics, which I manage as a SingleCellExperiment object using the scater package.
After filtering on MT, total counts, and total features, I normalized the data as described in the course. However, when I plot RLE, I get strange results: the raw data seem more 'normalized' than the data normalized with any of the methods, as shown below:
scranNorm.pdf
TMMnorm.pdf
UQnorm.pdf
RLEnorm.pdf
CPMnorm.pdf

To me, TMM looks good, but the others do not, especially scran.
Is there any explanation for this?
Thanks!

Unable to download PDF version of course material

Hi,

I am unable to download the PDF version of the course material found on this website page. The page error I'm getting is the following.

Any way this issue can be fixed?

Best,
Leon

Remove SNN-cliq

We should consider removing SNN-cliq and replacing it with a more recent clustering method.

Add correct gene length normalization (future)

Add paragraph & link to this paper in Normalization section about FPKMs, TPMs
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5167349/
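As a starting point for that paragraph, here is a minimal base R sketch of how TPM and FPKM differ for read counts from a full-length protocol. The toy matrix, the gene lengths, and the helper names counts_to_tpm and counts_to_fpkm are all made up for illustration and are not part of the course code.

```r
# Toy read counts: rows are genes, columns are cells (values are made up).
counts <- matrix(c(10, 20, 30,
                    0, 40, 60),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2")))

# Hypothetical effective gene lengths in base pairs, one per row of counts.
gene_length_bp <- c(geneA = 1000, geneB = 2000, geneC = 3000)

# TPM: divide by gene length (in kb) first, then rescale each cell to sum to one million.
counts_to_tpm <- function(counts, length_bp) {
  rate <- counts / (length_bp / 1e3)   # reads per kilobase, gene by gene
  t(t(rate) / colSums(rate)) * 1e6
}

# FPKM: rescale by library size (in millions), then by gene length (in kb).
counts_to_fpkm <- function(counts, length_bp) {
  per_million <- t(t(counts) / (colSums(counts) / 1e6))
  per_million / (length_bp / 1e3)
}

tpm <- counts_to_tpm(counts, gene_length_bp)
fpkm <- counts_to_fpkm(counts, gene_length_bp)
```

In this sketch every column of tpm sums to one million, while the columns of fpkm do not share a common total, which is one reason TPM values are easier to compare across cells.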

docker image link not correct

Just had an issue trying to connect to the docker image listed in section 2.2--it doesn't match the (seemingly correct) link in section 1.4

add introduction to ggplot2 and pheatmap

Add to visualization

diffusion maps
graph-based

can not open "file:///home/jovyan/.local/share/jupyter/runtime/nbserver-6-open.html"

I pulled the docker image v4.11. Now, I tried to open "file:///home/jovyan/.local/share/jupyter/runtime/nbserver-6-open.html" in chrome, but failed. Could you please help me out? Thank you so much!

Jeff

Could you please provide some intermediate files so people like me can go through the whole classes?

I was wondering whether you could provide the intermediate files used as examples in this amazing tutorial.

I downloaded some files, such as "ERR522959_2.fastq" and "ERR522959_1.fastq", to follow along with the class, but couldn't find files like "data/10cells_read1.fq, data/10cells_read2.fq", "data/droplet_id_example_per_barcode.txt.gz", "data/droplet_id_example_truth.gz".

I have experience in analyzing bulk RNAseq data, and am not currently working on ScRNAseq data. For some reasons, it is impractical for me to download all the original fastq files to our server to generate all the intermediate files. What I hope to do is follow along with your classes on my personal Mac computer to get familiar with all the tools and the workflow of ScRNAseq analysis.

So I would greatly appreciate your kindness to provide a link (either here or some other sources) to download the intermediate files required to work through the examples in this class.

Best,

Jeff

Molecules.txt from tung data

I am trying to follow along with this workshop, but I cannot find the tung data (molecules.txt) file anywhere. Could someone please upload this data?

total number of genes for PCA page incorrect

only ~10,000 not 14,000

Add real data to DE-1

Add an exercise in which students fit distributions to real data.

CPM issue

A reply from one user:

I’ve been following the videos and materials for the scRNA-seq course that you ran last year, and have found them very helpful, thanks to you and the rest of the organisers for providing this resource, and hope to be able to attend in person in future!

Just one potential bug that I have come across so far that I thought you might like to have a look at - in 15.2.1 RUVg (and repeated in 15.2.2) - I think this code is missing "* 1000000", as it seems to produce counts per "one" rather than counts per million.

set_exprs(umi.qc, "ruvg2_logcpm") <- log2(t(t(ruvg$normalizedCounts) / 
                                           colSums(ruvg$normalizedCounts)) + 1)

I only noticed this as I tried to run a t-SNE of the resulting ruvg2_logcpm expression data, and it produced a 'perplexity too high' error, which was not correctable by manually adjusting the perplexity. This led me to look at the expression values and notice that they were all too low by a factor of 10^6. I'm not sure why this produced the error that it did (as the matrix dimensions were not affected?), but it clearly didn't like the low expression values, as the t-SNE ran without error after I multiplied by 1000000 to produce what look like true CPM. plotPCA ran happily with the original very low expression values.

Perhaps this has already been noted, or is only an issue for my data, but thought worth pointing out in any case!
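The difference described in this report can be reproduced with a small base R sketch. The matrix below is a made-up stand-in for ruvg$normalizedCounts, and the variable names are illustrative only; whether the course code should store the result under the same "ruvg2_logcpm" name is left to the maintainers.

```r
# Toy stand-in for ruvg$normalizedCounts: genes in rows, cells in columns.
normalized_counts <- matrix(c(5, 15, 80,
                              0, 50, 50),
                            nrow = 3)

# As reported: each column is scaled to sum to 1 before the log,
# so every value lies between 0 and 1.
log_fraction <- log2(t(t(normalized_counts) / colSums(normalized_counts)) + 1)

# With the missing factor: scale each column to one million, then take log2(x + 1).
cpm <- t(t(normalized_counts) / colSums(normalized_counts)) * 1e6
log_cpm <- log2(cpm + 1)
```

Comparing range(log_fraction) with range(log_cpm) on real data is a quick way to check which of the two scales an expression matrix is on.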

Check links to blischak data in chapter 6.2

At the moment there are no files at the link locations.

Introduce the blischak dataset before the exercises in chapter 04

Remove plots from 07-exprs-overview-reads.Rmd

Remove the same plots from 07-exprs-overview-reads.Rmd as in pull request #5.

Change Monocle to Monocle2

It was already released in spring 2016.

Add more to pseudotime chapter using this review paper

http://onlinelibrary.wiley.com/doi/10.1002/eji.201646347/full

Sort out normalisation chapters

In the latest version of scater, normaliseExprs no longer writes to the norm_counts slot but to the exprs slot instead. The normalisation chapter will have to be updated.

Update CCA for Seurat v3: everything is different.

Add "truth" diagram to Pseudotime chapter.

MAGIC in R

https://github.com/pkathail/magic/blob/develop/R/run_magic.R

Change tSNE parameters

based on changes in #17

Add a chapter discussing this very important paper

http://www.cell.com/cell-systems/fulltext/S2405-4712(16)30331-3

"trim_galore" command could not be found

I am trying to follow the tutorials in RStudio. In "process-raw-QC.Rmd", when I run the following command, it says that trim_galore could not be found. I redownloaded the image, but still have the problem. Could you please let me know how to solve this? Should I try an earlier version of this class? Thank you!

trim_galore -h

/tmp/RtmpgcO0S6/chunk-code-2e2786f67e.txt: line 1: trim_galore: command not found

Jeff

Missing files. The latest added part "Processing RAW scRNA-seq data" has no raw data in git.

Hello, dear professors, thanks for your great course! You seem to have recently updated the course and added an important part on processing the raw data. However, there is no shared data folder in this git repository. I want to follow the course tasks, and I need those files. It would be more convenient for those of us studying this course to use the data directly instead of tracking them down in the paper. By the way, the paper you cited, "Kolodziejczyk, Aleksandra A., Jong Kyoung Kim, Valentine Svensson, John C. Marioni, and Sarah A. Teichmann. 2015. “The Technology and Biology of Single-Cell RNA Sequencing.” Molecular Cell 58 (4). Elsevier BV: 610–20. doi:10.1016/j.molcel.2015.04.005", does not seem to provide any data. The data might come from another paper, by Deng. However, the SRA database holds so much raw data for that paper that we cannot find the data used in your course, which are supposed to be saved in a "share" folder.

Simulated vs Real data vs Integrated RNA seq data analysis

#1 What are the changes required if using sc-RNA seq count data simulated using popular methods like splatter compared to real data?
#2 Methods to integrate different sc-RNA seq data sets?
#3 Changes in analysis workflow when integrating sc-RNA seq data sets with each other and with other sc-omics data like ATAC-seq data?

Add SCnorm & scone to normalization

FPKM and TPM note

Add a note that FPKM and TPM should not be used with UMIs!

add a note about Bioconductor

Users have to have the latest version installed on their machines

Add SIMLR to the course

http://bioconductor.org/packages/release/bioc/html/SIMLR.html

detect outlier in plotPCA

Hi, I copied the code from http://hemberg-lab.github.io/scRNA.seq.course/cleaning-the-expression-matrix.html#exprs-qc and tried plotPCA for automatic cell filtering. I put the code "assay(umi, "logcounts") <- log2(counts(umi) + 1)" before running plotPCA, and got the following warnings:

Warning messages:
1: In .disambiguate_args(...) :
non-plotting arguments like 'pca_data_input' should go in 'run_args'
2: 'return_SCE=TRUE' is deprecated, use 'runPCA' instead

Also, the dots in the PCA plot were not colored according to whether or not they were outliers.
My R version is 3.5 and my scater package version is 1.8.0.

Thanks!

Include this paper to the course

http://www.sciencedirect.com/science/article/pii/S1097276517300497

Jenkins stuff to remember

If you need to check the build console output or start a new build from outside Sanger.

Use VPN and ssh to the instance with Jenkins, then

Last build console output

/var/lib/jenkins/jobs/PROJECT_NAME/builds/lastSuccessfulBuild/log

Start a new build

First create a crumb

CRUMB=$(curl -s 'http://USERNAME:PASSWORD@localhost:8080/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,":",//crumb)')

Then start a new build using the crumb

curl -X POST -u USERNAME -H "$CRUMB" localhost:8080/job/sc-course/build

DE-1 Pois-Beta equation is not correct.

lambdas should be multiplied by g
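A minimal base R simulation of what this correction implies, assuming the intended model is a Poisson whose rate is g times a Beta(a, b) draw. The parameter values are arbitrary, and the moment formulas follow from the laws of total expectation and total variance rather than from the course text.

```r
set.seed(1)

# Hypothetical parameters: g scales the rate, a and b shape the Beta distribution.
g <- 100
a <- 0.8
b <- 0.8
n <- 1e5

# Draw a Beta-distributed proportion per observation, scale it by g,
# then sample Poisson counts with that rate.
p <- rbeta(n, a, b)
x <- rpois(n, lambda = g * p)

# Theoretical moments of the resulting counts.
expected_mean <- g * a / (a + b)
expected_var <- expected_mean + g^2 * a * b / ((a + b)^2 * (a + b + 1))
```

Comparing mean(x) and var(x) with expected_mean and expected_var gives a quick check that the scaling by g has been applied to the rate and not dropped.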

How to run browser in Docker?

Hi,

I am totally unfamiliar with Docker.

After installing the Docker tool (under Win8), I ran:
docker run -d -p 8787:8787 quay.io/hemberg-group/scrna-seq-course-rstudio, and it works.

How can I then visit localhost:8787 in the browser? I opened the browser and typed localhost:8787, but it did not show the login interface. Would you mind making this step clearer (especially for a Win8 environment)?

Could I do alignment under this Docker image in the future using just a laptop?

Sorry for these silly questions, I have just begun learning to preprocess raw fastq data.

Thanks a lot!

Wenhao

Clustering and Marker Gene Identification

≤ 5000 cells : SC3

> 5000 cells : Seurat

I haven't seen that elsewhere. Is there a specific reason for the difference?

New things

1. Imputation chapter
2. Separate chapter for Seurat
3. Add more and better batch correction methods
4. (optional) Cross-dataset comparison (scmap, metaneighbour, mnnCorrect, CCA-part of Seurat)

Notes from course 31 Oct 2017

Make script for figure 3.2 (unique map, multi-map, unmapped) & include discussion of what if too many cells to visualize? -> fit linear or exponential distribution & find outliers
Discuss Droplets-> size proportional to wait time = exponential distribution -> ~exponential distribution of library sizes.

spend more time on last 3 days and less on processing raw reads
differential isoform analysis?
minimum # reads per cell
UMI counting with kallisto: how to aggregate UMIs from transcripts to genes
Practicals to add: slalom, RNA velocity, dropEST, UMI-tools, BEARscc, Libinorm, Cell-cycle analysis
Drop-seq pipeline
imputation: when is zero really a zero?
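The note above about fitting an exponential distribution to library sizes and finding outliers could be sketched in base R as follows. The library sizes here are simulated, and the 0.999 quantile cut-off is an arbitrary illustrative choice, not a recommendation from the course.

```r
set.seed(42)

# Simulated library sizes (total counts per cell); with real data these
# would be the column sums of the count matrix.
library_size <- rexp(2000, rate = 1 / 5000)

# The maximum-likelihood estimate of the exponential rate is 1 / mean.
rate_hat <- 1 / mean(library_size)

# Flag cells in the extreme upper tail of the fitted distribution.
upper_cutoff <- qexp(0.999, rate = rate_hat)
is_outlier <- library_size > upper_cutoff
```

A histogram of library_size with the fitted density overlaid is a compact alternative to per-cell plots when there are too many cells to visualize individually.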
