bioScripts

splitContigs.py

Usage: splitContigs.py -i <inputfile.fasta> -o <outputprefix>

For example: 
$ cat data/contigs.fasta
>contig1
ACGTA
>contig2
GGGATAGTCA
>contig3
GACTACTTTT
>contig4
ACGTA
>contig5
GGGATAGTCA
>contig6
GACTACTTTT

# To run it, you need python3 and biopython. Here are the commands to install them:
$ module load miniconda3/4.10.3
$ conda create -n biopython biopython python=3.7.4
$ source activate biopython

# To run the software
$ bin/splitContigs.py -i data/contigs.fasta -o out
$ for i in out*; do echo "Content of $i"; cat "$i"; done | less
Content of out0.fasta
>contig2
GGGATAGTCA
Content of out1.fasta
>contig3
GACTACTTTT
Content of out2.fasta
>contig5
GGGATAGTCA
Content of out3.fasta
>contig6
GACTACTTTT
Content of out4.fasta
>contig1
ACGTA
>contig4
ACGTA
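
The README does not state how contigs are assigned to output files. One rule that reproduces the example output above is first-fit decreasing packing, with the capacity of each file set to the length of the longest contig. The sketch below is a guess inferred from the example only, not code taken from splitContigs.py; the names `read_fasta` and `pack_contigs` are hypothetical, and it uses plain Python instead of biopython to stay self-contained.

```python
EXAMPLE_FASTA = """>contig1
ACGTA
>contig2
GGGATAGTCA
>contig3
GACTACTTTT
>contig4
ACGTA
>contig5
GGGATAGTCA
>contig6
GACTACTTTT
"""


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def pack_contigs(records):
    """Assumed rule: first-fit decreasing, capacity = longest contig length."""
    if not records:
        return []
    capacity = max(len(seq) for _, seq in records)
    bins = []  # each bin is [total_length, [contig names]]
    # sorted() is stable, so contigs of equal length keep their input order.
    for name, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        for b in bins:
            if b[0] + len(seq) <= capacity:
                b[0] += len(seq)
                b[1].append(name)
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([len(seq), [name]])
    return [names for _, names in bins]


for index, names in enumerate(pack_contigs(read_fasta(EXAMPLE_FASTA))):
    print(f"out{index}.fasta: {', '.join(names)}")
```

With the example input, this prints contig2, contig3, contig5 and contig6 in out0.fasta to out3.fasta, and contig1 together with contig4 in out4.fasta, matching the listing above. Other rules could produce the same grouping, so check the script itself before relying on this.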

insertGenome.sh

Usage: insertGenome.sh <normalGenome.fa, required> <insert.fa, required>

For example: 
$ cat data/genome.fa
>chr1
1234567891
>chr2
1234567892
>chr3
1234567893
>chr4
123

$ cat data/insert.fa
>chr2 + 4 # '+' means the insert is forward strand
abc
>chr3 - 5  # '-' means the insert is reverse strand
cat

$ bin/insertGenome.sh data/genome.fa data/insert.fa
$ cat genomeWithInsert.fa
>chr1
1234567891
>chr2
123
abc
4567892
>chr3
1234
atg
567893
>chr4
123
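
The insertion rule implied by this example can be sketched in Python. The sketch assumes the header fields are chromosome, strand and a 1-based position, that the insert starts at that position, and that a '-' insert is reverse-complemented first (which is how `cat` becomes `atg` above). The function names are hypothetical and this is not the code of the shell script.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string, preserving case."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def apply_insert(chrom_seq, insert_seq, strand, position):
    """Place insert_seq so that it starts at 1-based `position` in chrom_seq."""
    if strand == "-":
        insert_seq = reverse_complement(insert_seq)
    cut = position - 1  # convert 1-based position to a 0-based slice index
    return chrom_seq[:cut] + insert_seq + chrom_seq[cut:]


print(apply_insert("1234567892", "abc", "+", 4))  # 123abc4567892
print(apply_insert("1234567893", "cat", "-", 5))  # 1234atg567893
```

The two printed strings correspond to the chr2 and chr3 records in genomeWithInsert.fa, written there across three lines each.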

modifyGTF.sh

Usage: modifyGTF.sh <normal.gtf, required> <insert.fa, required>

For example: 
$ cat data/sample.gtf
chr1	unknown	exon	3214482	3216968	.	-	.	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1	unknown	stop_codon	3216022	3216024	.	-	.	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1	unknown	CDS	3216025	3216968	.	-	2	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr2	unknown	CDS	3	9	.	-	1	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr2	unknown	exon	1	4	.	-	.	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr2	unknown	CDS	7	10	.	-	0	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr3	unknown	exon	7	9	.	-	.	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr3	unknown	start_codon	5	13	.	-	.	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1	unknown	exon	4290846	4293012	.	-	.	gene_id "Rp1"; gene_name "Rp1"; p_id "P17361"; transcript_id "NM_001195662"; tss_id "TSS6138";
chr1	unknown	stop_codon	4292981	4292983	.	-	.	gene_id "Rp1"; gene_name "Rp1"; p_id "P17361"; transcript_id "NM_001195662"; tss_id "TSS6138";
    
$ cat data/insert.fa
>chr2 4
abc
>chr3 - 5
ddd

$ bin/modifyGTF.sh data/sample.gtf data/insert.fa
$ cat gtfWithInsert.gtf
chr1	unknown	exon	3214482	3216968	.	-	.	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1	unknown	stop_codon	3216022	3216024	.	-	.	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1	unknown	CDS	3216025	3216968	.	-	2	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr2	unknown	CDS	3	12	.	-	1	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr2	unknown	exon	1	7	.	-	.	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr2	unknown	CDS	10	13	.	-	0	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr3	unknown	exon	10	12	.	-	.	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr3	unknown	start_codon	8	16	.	-	.	gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1	unknown	exon	4290846	4293012	.	-	.	gene_id "Rp1"; gene_name "Rp1"; p_id "P17361"; transcript_id "NM_001195662"; tss_id "TSS6138";
chr1	unknown	stop_codon	4292981	4292983	.	-	.	gene_id "Rp1"; gene_name "Rp1"; p_id "P17361"; transcript_id "NM_001195662"; tss_id "TSS6138";
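
The coordinate shift visible in this example can be sketched as follows: on a chromosome that receives an insert, every start or end coordinate at or after the insert position moves right by the insert length, and all other lines are unchanged. This is a minimal Python sketch inferred from the example output; `shift_gtf_line` and the `inserts` mapping are hypothetical names, and the real script may treat edge cases differently.

```python
def shift_gtf_line(line, inserts):
    """Shift GTF start/end (columns 4 and 5) past any upstream inserts.

    `inserts` maps chromosome name -> list of (position, length) pairs,
    with positions given in the original, pre-insert coordinates.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[3]), int(fields[4])
    new_start, new_end = start, end
    for position, length in inserts.get(chrom, []):
        if start >= position:
            new_start += length
        if end >= position:
            new_end += length
    fields[3], fields[4] = str(new_start), str(new_end)
    return "\t".join(fields)


# Inserts from data/insert.fa: 3 bases at chr2 position 4 and chr3 position 5.
inserts = {"chr2": [(4, 3)], "chr3": [(5, 3)]}

line = 'chr2\tunknown\tCDS\t3\t9\t.\t-\t1\tgene_id "Xkr4";'
print(shift_gtf_line(line, inserts).split("\t")[3:5])  # ['3', '12']
```

The strand of the insert does not affect the shift, since only the insert length changes downstream coordinates.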

Contributors

ld32
