
tssports-master's Introduction

tssports: integrates tsRNA feature extraction, differential analysis, and extensive visualization in one platform

https://github.com/Xia-Youmei/tssports-master

tssports can perform small RNA classification, differential gene analysis, and principal component analysis (PCA), and can visualize the related functions and mechanisms of tsRNA through pie charts, volcano plots, heat maps, MA plots, etc.

Installation

You can install the development version of tssports like so:

if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

devtools::install_github('Xia-Youmei/tssports-master')

Alternatively, you can download the latest version from GitHub (https://github.com/Xia-Youmei/tssports-master), install it locally, and then use the R package on your own data.

if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

remotes::install_local("file_address/tssports-master-main.zip", upgrade = FALSE, dependencies = TRUE)

Note: For most use cases it is not necessary to install the tssports package locally. If you have a poor Internet connection, you can choose this method. Because the sample data is large, installation may take a few minutes, so please be patient.

To start working with tssports, load it in the R environment with the following command:

library('tssports')

Example

The example data was downloaded from Gene Expression Omnibus under the accession code GSE144666. It contains 6 data sets: SRR11004011-13 (RNAs sequenced with standard protocols) and SRR11004020-22 (RNAs treated with T4PNK, then AlkB, before cDNA library construction). The organism is Mus musculus and the tissue is brain.

First, set the working directory to the folder that contains the dataset:

setwd("./examples")

After sportsV1.1 has processed the data and written its output files into one folder, set the working directory to that folder. The function below automatically processes the entries whose annotation contains the keywords "-miR-|-mir-|-let-": it comments those entries out and appends the summed reads of the same genes at the end of the file. The result is a file whose name ends in “_output_collapse_miRNA.txt”.

collapse_mature_mirna_reads()
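
As a rough illustration of the keyword matching described above (not the package's actual implementation), the following base R sketch flags annotations that match the pattern and sums their reads per annotation. The data frame, its column names (`Annotation`, `Reads`), and its values are hypothetical.

```r
# Hypothetical toy input; column names and values are assumptions for illustration.
reads <- data.frame(
  Annotation = c("mmu-miR-9-3p", "mmu-miR-9-3p", "mmu-let-7a-5p", "28S-rRNA"),
  Reads      = c(10, 5, 7, 100),
  stringsAsFactors = FALSE
)

# Keyword pattern quoted in the text above.
mirna_pattern <- "-miR-|-mir-|-let-"
is_mirna <- grepl(mirna_pattern, reads$Annotation)

# Sum the reads of rows that share the same miRNA annotation.
collapsed <- aggregate(Reads ~ Annotation, data = reads[is_mirna, ], FUN = sum)
print(collapsed)
```

Rows that do not match the pattern (here `28S-rRNA`) are left out of the collapsed table in this sketch.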

This function produces 3 unified files: 1. sports_combined_sample_fragments_counts_matrix_all.txt; 2. sports_combined_sample_fragments_counts_matrix_0.5.txt; 3. sports_combined_sample_fragments_annotation.txt. It reads the entire folder and automatically identifies and processes the files ending in “_miRNA.txt”. First, it collects and sorts all genes to form the first file, which contains the gene sequence, the sequence length, whether the sequence matches the genome, and the annotated gene. Next, all identical gene sequences in the input files are merged into one file, keeping only the fragments whose proportion of non-zero counts across the samples is greater than 0.5, to form the second file. The third file merges all identical gene sequences in the input files into one file without any screening condition, which leaves users free to run their own follow-up analyses.

combine_read_counts()

1.sports_combined_sample_fragments_counts_matrix_all.txt
Fragment	Length	Match_Genome	Annotation
TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT	44	Yes	28S-rRNA
TCACAGTGAACCGGTCTCTTTAA	23	NO	piRNA
CGCGACCTCAGATCAGACGT	20	NO	28S-rRNA
TCGGATCCGTCTGAGCTTGGCTTT	24	NO	piRNA
TCACAGTGAACCGGTCTCTTAA	22	NO	piRNA
TCTTTGGTTATCTAGCTGTATGTT	24	NO	piRNA
GGCTGGTCCGAAGGTAGTGAGTTATCTCAATT	32	Yes	RNY1-YRNA
TCAGTCGGTCCTGAG	15	Yes	28S-rRNA
TGGGCTGTAGTGCGCTATGC	20	Yes	misc_RNA
CTGGGCTGTAGTGCGCTATGC	21	Yes	misc_RNA
CGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT	43	Yes	28S-rRNA
CGCGACCTCAGATCAGAC	18	NO	28S-rRNA
CATTGATCATCGACACTTCGAACGCACTTGCGGCCCCGGGT	41	NO	5.8S-rRNA

2.sports_combined_sample_fragments_counts_matrix_0.5.txt
Sequence	SRR11004011	SRR11004012	SRR11004013	SRR11004020	SRR11004021	SRR11004022
TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT	269161	143901	82697	49357	96480	132035
TCACAGTGAACCGGTCTCTTTAA	204627	136474	72396	2679	5694	38834
CGCGACCTCAGATCAGACGT	93017	119623	66666	30488	33792	143835
TCGGATCCGTCTGAGCTTGGCTTT	66940	50898	49442	25380	38967	73429
TCACAGTGAACCGGTCTCTTAA	45249	32876	14415	503	997	8267
TCTTTGGTTATCTAGCTGTATGTT	35142	28972	27240	1199	3243	9601
GGCTGGTCCGAAGGTAGTGAGTTATCTCAATT	32900	20960	14730	52832	34279	44606
mmu-miR-3960	0	0	0	12	8	2
mmu-miR-3473g	0	0	0	2	1	1
mmu-miR-6987-5p	0	0	0	2	3	1
mmu-miR-1951	0	0	0	1	1	3

3.sports_combined_sample_fragments_annotation.txt
Sequence	SRR11004011	SRR11004012	SRR11004013	SRR11004020	SRR11004021	SRR11004022
TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT	269161	143901	82697	49357	96480	132035
TCACAGTGAACCGGTCTCTTTAA	204627	136474	72396	2679	5694	38834
CGCGACCTCAGATCAGACGT	93017	119623	66666	30488	33792	143835
TCGGATCCGTCTGAGCTTGGCTTT	66940	50898	49442	25380	38967	73429
TCACAGTGAACCGGTCTCTTAA	45249	32876	14415	503	997	8267
TCTTTGGTTATCTAGCTGTATGTT	35142	28972	27240	1199	3243	9601
GGCTGGTCCGAAGGTAGTGAGTTATCTCAATT	32900	20960	14730	52832	34279	44606
mmu-miR-511-5p	0	0	0	0	0	1
mmu-miR-505-3p	0	0	0	0	0	1
mmu-miR-7074-5p	0	0	0	0	0	1
mmu-miR-7229-5p	0	0	0	0	0	1
mmu-miR-6984-3p	0	0	0	0	0	1
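
The screening condition described above can be sketched in base R as follows. This is only an illustration on three rows copied from the example tables, not the package's code. The text states "greater than 0.5", while mmu-miR-3960 (non-zero in 3 of 6 samples) is listed in the 0.5 file, so the sketch uses `>=`; that choice is an assumption.

```r
# Toy count table; rows copied from the example tables above.
counts <- data.frame(
  Sequence    = c("TCACAGTGAACCGGTCTCTTAA", "mmu-miR-3960", "mmu-miR-511-5p"),
  SRR11004011 = c(45249, 0, 0),
  SRR11004012 = c(32876, 0, 0),
  SRR11004013 = c(14415, 0, 0),
  SRR11004020 = c(503, 12, 0),
  SRR11004021 = c(997, 8, 0),
  SRR11004022 = c(8267, 2, 1),
  stringsAsFactors = FALSE
)

# Fraction of samples in which each fragment has a non-zero count.
nonzero_fraction <- rowMeans(counts[, -1] > 0)

# Assumption: ">=" is used here; the package may compare strictly.
threshold <- 0.5
filtered <- counts[nonzero_fraction >= threshold, ]
print(filtered$Sequence)
```

In this toy table, mmu-miR-511-5p is non-zero in only 1 of 6 samples and is therefore dropped.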

This function takes as input the set of the last two digits of the SRR file names of the experimental group, which distinguishes the experimental group from the control group. For example, 20:22 is required for the differential analysis of the example data. The function automatically recognizes the sports_combined_sample_fragments_counts_matrix_0.5.txt file output by the previous function. After processing, you will get 4 files:

1. sports_counts_all.txt: all fragment annotations matched to genes.
2. sports_deg_fDR005_2fc_all.txt: all annotations matched to genes and processed with the DESeq2 R package to obtain the differentially expressed genes.
3. sports_cpm_fdr005_2fc_all.txt: all fragments annotated as genes, the differentially expressed genes obtained with DESeq2, and the reads normalized to CPM (counts per million) values.
4. sports_DEG_all.txt: all the differential results calculated by DESeq2, without filtering on log2FoldChange or padj.

getdegs(20:22)

1.sports_counts_all.txt
	SRR11004011	SRR11004012	SRR11004013	SRR11004020	SRR11004021	SRR11004022
28S-rRNA	269161	143901	82697	49357	96480	132035
piRNA	204627	136474	72396	2679	5694	38834
RNY1-YRNA	32900	20960	14730	52832	34279	44606
misc_RNA	28044	35480	18645	13025	21257	15968
5.8S-rRNA	22734	13872	2363	21703	16378	11993
45S-rRNA	12263	12825	15348	12838	27474	53615
mature-tRNA-Gly-GCC_5_end;mature-tRNA-Gly-CCC_5_end	10286	8054	6349	15440	24174	38846
5S-rRNA	8157	19663	8814	4571	8266	13170

2.sports_deg_fDR005_2fc_all.txt
ID	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
mature-tRNA-His-GTG	30035.3716708793	9.18307904145374	0.580202283242111	15.8273748771543	2.01452256749788e-56	2.96940626449187e-53
pre-tRNA-Leu-CAA	21805.5636328603	9.85974407854832	0.658354292658678	14.9763496471346	1.04818154091913e-50	7.72509795657397e-48
mature-tRNA-Arg-CCT;mature-tRNA-Arg-CCG	468.48506156553	7.40567460160257	0.655365046544287	11.3000756458593	1.31105017035912e-29	6.44162650369779e-27
mature-tRNA-Arg-ACG_CCA_end	43397.7312599756	7.76515581481599	0.692681783635114	11.2102786564782	3.630486126182e-29	1.33783413749807e-26
pre-tRNA-Tyr-GTA	505.057106284187	9.2287311054882	0.828001518415102	11.1457900743384	7.50750007057211e-29	2.21321102080466e-26
mature-tRNA-Pro-TGG_5_end;mature-tRNA-Pro-CGG_5_end;mature-tRNA-Pro-AGG_5_end	1946.81081659445	9.928241334204	1.04537621880354	9.49729021535149	2.15423010298028e-21	5.29222528632156e-19
mature-tRNA-Arg-ACG	1850.07541775039	6.35805962609316	0.67452163520523	9.42602771245233	4.25913920030078e-21	8.96853025891908e-19

3.sports_cpm_fdr005_2fc_all.txt
ID	SRR11004011	SRR11004012	SRR11004013	SRR11004020	SRR11004021	SRR11004022
28S-rRNA	14.3200235328116	13.5965848351574	13.0443714276675	14.6978424726186	14.9935127374893	14.6639604791408
piRNA	13.9245729094712	13.5201406814239	12.8524707211966	10.4952997919691	10.9115022019083	12.89856566266
RNY1-YRNA	11.2882200871046	10.8179018026627	10.5560816430719	14.7959962196167	13.5006881015406	13.0984586794382
misc_RNA	11.057924529273	11.5769419854079	10.8959117870999	12.776022553392	12.8113773297272	11.6166995041221
5.8S-rRNA	10.7552419878891	10.2228471945342	7.92102213175833	13.5125514135644	12.4352583087336	11.2038607265397
45S-rRNA	9.86541494133145	10.109728768677	10.6153362976346	12.7551626430638	13.1814608784966	13.36382994117
mature-tRNA-Gly-GCC_5_end;mature-tRNA-Gly-CCC_5_end	9.61208227726735	9.43932057807877	9.34318987544876	13.0213797611129	12.9968713182784	12.8990113391204
5S-rRNA	9.27799186112964	10.7257992535941	9.81583605097291	11.2657012714733	11.4490159975246	11.3388688232466

4.sports_DEG_all.txt
ID	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
28S-rRNA	117369.725044431	0.382472823614208	0.644227936830634	0.593691769245269	0.552718289120833	0.673819151856151
piRNA	50767.1086731965	-2.40366212934164	0.765431138235381	-3.14027220643652	0.00168790918441722	0.00846251067289452
RNY1-YRNA	45719.3346323336	2.44151119683615	0.707386312647438	3.45145382824646	0.000557575147822217	0.00336997078222502
misc_RNA	22395.6619710466	0.67498992210017	0.6596268775384	1.02329050723206	0.306170510703025	0.448158225199861
5.8S-rRNA	19601.5728991033	1.97867701752085	0.882338784732432	2.24253659904668	0.0249267126520048	0.0724693776115484
45S-rRNA	24872.0434625502	2.26075136993462	0.569544671009807	3.96940132180727	7.20534381777695e-05	0.000689654336844366
mature-tRNA-Gly-GCC_5_end;mature-tRNA-Gly-CCC_5_end	21344.3984569	2.86332800867512	0.563247054211885	5.08360938111179	3.70329152216592e-07	9.00411658654179e-06
5S-rRNA	9981.49076365101	0.648247200530487	0.639794072975421	1.01321226299543	0.310958784289966	0.450329597790607
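
To make the role of the `20:22` argument more concrete, here is a minimal base R sketch of how the last two digits of the sample names could be mapped to groups, and how a DESeq2-style result table could then be filtered. It is not the package's implementation. The thresholds (padj < 0.05 and |log2FoldChange| > 1) are inferred from the "fdr005_2fc" part of the file name and are an assumption. The group labels `control` and `treated` are hypothetical.

```r
samples <- c("SRR11004011", "SRR11004012", "SRR11004013",
             "SRR11004020", "SRR11004021", "SRR11004022")
treated_suffix <- 20:22

# Compare the last two digits of each sample name with the supplied set.
last_two <- as.integer(substr(samples, nchar(samples) - 1, nchar(samples)))
group <- factor(ifelse(last_two %in% treated_suffix, "treated", "control"),
                levels = c("control", "treated"))
print(table(group))

# A few rows copied from the sports_DEG_all.txt example (values rounded).
deg <- data.frame(
  ID             = c("28S-rRNA", "piRNA", "RNY1-YRNA", "5.8S-rRNA"),
  log2FoldChange = c(0.382, -2.404, 2.442, 1.979),
  padj           = c(0.674, 0.00846, 0.00337, 0.0725),
  stringsAsFactors = FALSE
)

# Assumed thresholds: FDR below 0.05 and at least a 2-fold change.
significant <- deg[deg$padj < 0.05 & abs(deg$log2FoldChange) > 1, ]
print(significant$ID)
```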

This function automatically identifies the sports_DEG_fdr005_2fc_all.txt file in the folder and classifies its entries by keyword: entries containing "-miR-|-let-" go to the miRNA_diff.txt file, entries containing "tRNA" go to the tsRNA_diff.txt file, entries containing "rRNA" go to the rsRNA_diff.txt file, and entries containing "YRNA" go to the ysRNA_diff.txt file.

tsRNA_ann_classify()

1.miRNA_diff.txt
ID	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
mmu-miR-2137	87.8349839348083	5.9563172036757	0.799484930810467	7.45019321081833	9.3203633809477e-14	7.63234201306495e-12
mmu-miR-153-3p	316.296855176786	-5.03376515234459	0.750489390856289	-6.70731020807795	1.98244544430187e-11	1.12389407111575e-09
mmu-miR-340-5p	1374.52373372071	-4.43397551622456	0.754895118332545	-5.87363119530906	4.26351320188798e-09	1.39653743546286e-07
mmu-miR-101a-3p	13763.2199746233	-4.86630273572994	0.87811726649203	-5.5417458708792	2.9947070804607e-08	8.82839647319815e-07
mmu-miR-690	1186.05912973371	3.84780460164948	0.713714629912904	5.39123683385759	6.99743788822742e-08	1.983504509086e-06
mmu-miR-9-3p	2296.27478279842	-4.20391814855742	0.810478351339246	-5.18695920947278	2.13755526571688e-07	5.62635082440478e-06
mmu-miR-136-3p	124.245832937843	-5.32869206253836	1.04845251393674	-5.08243529554823	3.72626263079409e-07	9.00411658654179e-06

2.tsRNA_diff.txt
ID	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
mature-tRNA-His-GTG	30035.3716708793	9.18307904145374	0.580202283242111	15.8273748771543	2.01452256749788e-56	2.96940626449187e-53
pre-tRNA-Leu-CAA	21805.5636328603	9.85974407854832	0.658354292658678	14.9763496471346	1.04818154091913e-50	7.72509795657397e-48
mature-tRNA-Arg-CCT;mature-tRNA-Arg-CCG	468.48506156553	7.40567460160257	0.655365046544287	11.3000756458593	1.31105017035912e-29	6.44162650369779e-27
mature-tRNA-Arg-ACG_CCA_end	43397.7312599756	7.76515581481599	0.692681783635114	11.2102786564782	3.630486126182e-29	1.33783413749807e-26
pre-tRNA-Tyr-GTA	505.057106284187	9.2287311054882	0.828001518415102	11.1457900743384	7.50750007057211e-29	2.21321102080466e-26
mature-tRNA-Pro-TGG_5_end;mature-tRNA-Pro-CGG_5_end;mature-tRNA-Pro-AGG_5_end	1946.81081659445	9.928241334204	1.04537621880354	9.49729021535149	2.15423010298028e-21	5.29222528632156e-19
mature-tRNA-Arg-ACG	1850.07541775039	6.35805962609316	0.67452163520523	9.42602771245233	4.25913920030078e-21	8.96853025891908e-19

3.rsRNA_diff.txt
ID	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
45S-rRNA	24872.0434625502	2.26075136993462	0.569544671009807	3.96940132180727	7.20534381777695e-05	0.000689654336844366
rRNA	12.4284548829842	2.37423147013067	0.937010532600679	2.53383648051529	0.0112821365345229	0.0402660272442777

4.ysRNA_diff.txt
ID	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
RNY3-YRNA	1275.60735378423	2.23955706124081	0.603348544285474	3.71187944754726	0.000205725970976646	0.00156309320216277
RNY1-YRNA	45719.3346323336	2.44151119683615	0.707386312647438	3.45145382824646	0.000557575147822217	0.00336997078222502
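
The keyword-based classification described above can be sketched in base R as follows. The helper `classify_id` is hypothetical and the order in which the keywords are checked is an assumption; only the keywords themselves come from the text.

```r
# IDs taken from the example tables above; "piRNA" matches none of the keywords.
ids <- c("mmu-miR-2137", "mature-tRNA-His-GTG", "45S-rRNA", "RNY1-YRNA", "piRNA")

# Hypothetical helper: return the class of a single ID based on the keywords.
classify_id <- function(id) {
  if (grepl("-miR-|-let-", id)) {
    "miRNA"
  } else if (grepl("tRNA", id)) {
    "tsRNA"
  } else if (grepl("rRNA", id)) {
    "rsRNA"
  } else if (grepl("YRNA", id)) {
    "ysRNA"
  } else {
    "other"
  }
}

classes <- vapply(ids, classify_id, character(1), USE.NAMES = FALSE)
print(split(ids, classes))
```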

This function automatically identifies the sports_counts_all.txt file in the folder, selects the top 1000 genes by fold change, and generates a PDF file of the principal component analysis using the ggplot2 R package.

pca(20:22)

pca
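
For readers who want to see the underlying computation, the sketch below runs a PCA on four rows copied from the sports_counts_all.txt example using `prcomp` from base R. It is not the package's code: the log2 transform with a pseudocount is an assumption, and the top-1000 gene selection and the ggplot2 plotting step are left out.

```r
# Rows copied from the sports_counts_all.txt example above.
counts <- matrix(
  c(269161, 143901, 82697, 49357, 96480, 132035,
    204627, 136474, 72396,  2679,  5694,  38834,
     32900,  20960, 14730, 52832, 34279,  44606,
     28044,  35480, 18645, 13025, 21257,  15968),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("28S-rRNA", "piRNA", "RNY1-YRNA", "misc_RNA"),
    c("SRR11004011", "SRR11004012", "SRR11004013",
      "SRR11004020", "SRR11004021", "SRR11004022")
  )
)

# log2 transform with a pseudocount (an assumption, not confirmed by the source).
log_counts <- log2(counts + 1)

# prcomp expects samples in rows, so transpose the gene-by-sample matrix.
pca_fit <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)

# Coordinates of each sample on the first two principal components.
scores <- pca_fit$x[, 1:2]
print(round(scores, 2))
```

The `scores` matrix has one row per sample and can be passed to any plotting function to draw the PC1 versus PC2 scatter plot.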

Visualization functions

pie_plot_tsRNA_aa.pdf: This function automatically identifies the tsRNA_diff.txt file in the folder and uses the ggplot2 R package to draw a pie chart of the amino acid classes of the tsRNAs, including Glu, Gly, Val, and Ser, generating the pie_plot_tsRNA_aa.pdf file.

pie_plot_tsRNA_end.pdf: This function automatically identifies the tsRNA_diff.txt file in the folder and uses the ggplot2 R package to draw a pie chart of the tsRNA end categories, including 5’ end, 3’ end, and CCA end, generating the pie_plot_tsRNA_end.pdf file.

maplot.pdf: This function automatically identifies the sports_deg_all.txt file in the folder and uses the ggplot2 R package to draw an MA plot of the differentially expressed genes that sports outputs, generating the maplot.pdf file.

heatmap_plot.pdf: This function automatically identifies the sports_cpm_fdr005_2fc_all.txt and sports_DEG_fdr005_2fc_all.txt files in the folder and uses the ggplot2 R package to draw a heat map of the differentially expressed genes that sports outputs, generating the heatmap_plot.pdf file.

volcano_plot.pdf: This function automatically identifies the sports_DEG_all.txt file in the folder and uses the ggplot2 R package to draw a volcano plot of the differentially expressed genes that sports outputs (log2FC>1, p≤0.05), generating the volcano_plot.pdf file.

visualization()
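
The two pie charts group tsRNAs by amino acid and by end category. The base R sketch below shows one way such categories could be derived from IDs like those in tsRNA_diff.txt. It is only an illustration under stated assumptions: using only the first annotation when several are joined by ";" is an assumption, and the "_3_end" suffix is assumed by analogy with "_5_end" and "_CCA_end", since it does not appear in the example tables.

```r
# IDs taken from (or shortened from) the tsRNA_diff.txt example above.
ids <- c("mature-tRNA-His-GTG",
         "pre-tRNA-Leu-CAA",
         "mature-tRNA-Arg-ACG_CCA_end",
         "mature-tRNA-Pro-TGG_5_end;mature-tRNA-Pro-CGG_5_end",
         "mature-tRNA-Arg-ACG")

# Use only the first annotation when several are joined by ";" (an assumption).
first_id <- sub(";.*$", "", ids)

# Amino acid: the token that follows "tRNA-".
amino_acid <- sub("^.*tRNA-([A-Za-z]+)-.*$", "\\1", first_id)

# End category from the ID suffix; "_3_end" is an assumed suffix.
end_type <- ifelse(grepl("_5_end$", first_id), "5' end",
            ifelse(grepl("_3_end$", first_id), "3' end",
            ifelse(grepl("_CCA_end$", first_id), "CCA end", "other")))

# Category counts of the kind a pie chart would display.
print(table(amino_acid))
print(table(end_type))
```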

The figures below show the output for the sample data; their layout was adjusted in Adobe Illustrator.

pie_plot_tsRNA_aa_end

maplot

heatmap_plot

volcano_plot
