tadkeys / tabsat

Targeted Amplicon Bisulfite Sequencing Analysis Tool

Shell 5.15% Python 1.06% Dockerfile 0.02% Perl 47.33% HTML 10.56% R 0.09% Makefile 1.39% Batchfile 0.07% C++ 25.26% C 0.14% Smarty 8.44% CSS 0.09% GAP 0.38%
genome fastq dockerfile methylation bisulfite bisulfite-sequencing targeted-sequencing

tabsat's Issues

Install not quite correct

Hi. I've been trying to get the pipeline installed and feel like I put all the right pieces in place, but it seems a few things may not be right still, and I'm stumped. I have attached output streams from both test_tabsat_miseq.sh and test_tabsat_mouse.sh here. In the miseq test, it seems it runs fine but the .csv files produced in the latter stages are all empty except for the header line which (I think) messes up the final stages. In the mouse example it seems to have an error at the lollipop stage. In both cases, it seems to be missing some expected output files at the end, I'm presuming due to the earlier issues.

Also, when running either example, it sometimes throws a samtools error at the sort stage even though the target file has been generated. This doesn't always happen and seems weird to me, but I have observed this outcome a few times now. If the sort fails then obviously downstream processes get compromised. Fortunately, this doesn't seem to happen frequently.

Can you please look over the attached files and see if you can get any hint of what might not be right with my configuration? And if you have any insight into the samtools error that would be great too.

Finally, if it's of any use for you to know, I'm running in a RedHat 7 environment with 24G of RAM, but I am doing the work on an external hard drive due to lack of space on the hard drives of the machine.

Thanks,
John Martinson

miseq.test.txt
mouse.test.txt
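
One quick way to confirm which result files contain only the header line is to count the lines in each file. This is a minimal sketch, assuming the output directory is passed as an argument; the function name list_header_only_csv is hypothetical and not part of tabsat.

```shell
# Print every .csv file under a directory that has at most one line
# (only the header, or nothing at all).
list_header_only_csv() {
    dir=$1
    find "$dir" -type f -name '*.csv' | while IFS= read -r csv; do
        # wc -l counts newline characters; the arithmetic expansion
        # strips any padding that some wc implementations add.
        lines=$(($(wc -l < "$csv")))
        if [ "$lines" -le 1 ]; then
            printf '%s\n' "$csv"
        fi
    done
}
```

For example, list_header_only_csv tabsat_test_output_miseq would list the suspicious files, which may help narrow down the first stage that produced no data.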

error running test

I get the following error when running the test for miseq (the final mv line below). It might come from the path (...)-zz_test/SRR3296596_1.fastq, which should not contain the '-' before 'zz_test'. I have tried to figure out why this character was inserted, but I haven't found the cause so far.

  • CMD subpopulations: /home/gcristofari/tools/tabsat/tools/MethylSubpop/subpopulations.sh -i /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations -p 0.7 -t /home/gcristofari/tools/tabsat/tools/zz_test/target_list_miseq.csv
    Starting with methylation pattern analysis
    Output will be saved in /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/Output
    Whole Target for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Intermediate Positions for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Paste intermediate Positions for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Intermediate Subpops for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Final Positions for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Paste final Positions for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Final Subpops for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Comparision of first and last methylation positions in all samples
    Finding methylation subpopulations
    Done with workflow
    mv: cannot stat '/home/gcristofari/tools/tabsat/tools/-zz_test/SRR3296596_1.fastq /home/gcristofari/tools/tabsat/tools/zz_test/SRR3296596_2.fastq': No such file or directory

  • CMD patternmap: /home/gcristofari/tools/tabsat/tools/Patternmap/patternmap.sh -i /home/gcristofari/tools/tabsat/tabsat_test_output_miseq -s /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/copied_inputs -t /home/gcristofari/tools/tabsat/tools/zz_test/target_list_miseq.csv
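
The quoted name in the mv message spans two paths separated by a space, which suggests that both paths reached mv as a single argument. The sketch below reproduces only that general shell behaviour with hypothetical file names; it is not taken from the tabsat scripts and does not explain where the extra '-' comes from.

```shell
# Report how many arguments a command would receive.
count_args() { echo "$#"; }

files="sample_1.fastq sample_2.fastq"   # hypothetical space-separated list

count_args "$files"   # quoted: the whole list is passed as ONE argument
count_args $files     # unquoted: word splitting yields TWO arguments
```

If a list of files has to be moved, passing each path as its own argument (or looping over the list) avoids the "cannot stat" failure on the combined name.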

Unresponsive script

While running it on the test data, the script reaches a point where no more STDERR or STDOUT is produced. These are the last lines of the combined STDERR and STDOUT:

---- Crete target lists in patternmap
-Patternmap:` removing *.log, *.target, *.jsons
.../tabsat/tabsat_test_output_miseq/Patternmap
.../tabsat/tools/Patternmap/patternmap.sh: line 181: $i: ambiguous redirect
.../tabsat/tools/Patternmap/patternmap.sh: line 182: $i: ambiguous redirect

The script keeps running for hours without any new output. The top command shows a tr process running.
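
Bash reports "ambiguous redirect" when an unquoted variable used as a redirection target expands to nothing or to several words. The following is a minimal sketch of a defensive pattern, assuming $i is meant to hold a single output path; write_line is a hypothetical helper, not code from patternmap.sh.

```shell
# Write one line of text to a file, refusing an empty target path and
# quoting the path so that spaces in it cannot split the redirection.
write_line() {
    out_file=$1
    text=$2
    if [ -z "$out_file" ]; then
        echo "write_line: empty output path" >&2
        return 1
    fi
    printf '%s\n' "$text" > "$out_file"
}
```

Quoting the target (> "$i" instead of > $i) and checking that it is non-empty turns a silent downstream problem into an immediate, readable error.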

Documentation update

The documentation does not indicate in which folder the reference genome should be saved once downloaded. In tabsat/reference?

bugs in samtools command lines (check_quality.sh script)

In the check_quality.sh script, the following lines (l. 63 & l. 66) give an error:

samtools sort -o ${BAM_FILE} aa | ${INTERSECTBED} -v -a - -b ${QUALITY_DIR}/target_list.bed > ${NON_INTERSECT_BAM}

samtools sort -o ${BAM_FILE} aa | ${INTERSECTBED} -a - -b ${QUALITY_DIR}/target_list.bed > ${INTERSECT_BAM}

If I understand correctly, they should be replaced by:

samtools sort ${BAM_FILE} | ${INTERSECTBED} -v -a - -b ${QUALITY_DIR}/target_list.bed > ${NON_INTERSECT_BAM}

samtools sort ${BAM_FILE} | ${INTERSECTBED} -a - -b ${QUALITY_DIR}/target_list.bed > ${INTERSECT_BAM}
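
For context: in the legacy samtools 0.1.x command line, -o told sort to write to standard output and the trailing word (aa here) was a dummy output prefix, whereas samtools 1.3 and later treat -o as an output file name and write to standard output by default. A minimal sketch of a wrapper for the newer syntax follows; the SAMTOOLS variable and the function name are assumptions made for illustration.

```shell
# SAMTOOLS can be overridden to point at a specific installation.
SAMTOOLS=${SAMTOOLS:-samtools}

# Sort a BAM file and write the result to standard output
# (samtools >= 1.3 syntax: no dummy prefix and no -o).
sort_bam_to_stdout() {
    "$SAMTOOLS" sort "$1"
}
```

The output of sort_bam_to_stdout "${BAM_FILE}" could then be piped into ${INTERSECTBED} exactly as in the corrected lines above.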

Hard coding of "NONDIR" in patternmap.sh

Stephan,

I think I found something that should be modified in the patternmap.sh script. The following line appears in there:
SAMPLE_C="${INDIR}/COVERAGE_NONDIR_${ALIGNER}/MethylSubpopulations/Output/SampleComparison.txt"

I was doing a "DIR" run so my outputs had gone to "COVERAGE_DIR_${ALIGNER}", therefore the following error message appeared at the pattern map stage (I assume due to the hard coding of "NONDIR" in the line above):

cp /media/MyBook/progs/tabsat-master/fhm_test_output_dir_all2/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/Output/SampleComparison.txt /media/MyBook/progs/tabsat-master/fhm_test_output_dir_all2/Patternmap/All_targets.txt
cp: cannot stat ‘/media/MyBook/progs/tabsat-master/fhm_test_output_dir_all2/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/Output/SampleComparison.txt’: No such file or directory

Thought you would like to know.

John
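
One possible way to avoid the hard-coded "NONDIR" is to build the path from the library mode as well as the aligner. This is only a sketch: the function name and the idea of passing the mode ("DIR" or "NONDIR") as a parameter are assumptions, not the actual patternmap.sh code.

```shell
# Build the path to the sample comparison file from the input directory,
# the library mode ("DIR" or "NONDIR") and the aligner name.
sample_comparison_path() {
    indir=$1
    lib_mode=$2
    aligner=$3
    printf '%s\n' "${indir}/COVERAGE_${lib_mode}_${aligner}/MethylSubpopulations/Output/SampleComparison.txt"
}
```

With such a helper, SAMPLE_C=$(sample_comparison_path "${INDIR}" DIR "${ALIGNER}") would point at the COVERAGE_DIR_bowtie2 folder for a "DIR" run.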

Permission denied while accessing Docker.

Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/tabsat/json: dial unix /var/run/docker.sock: connect: permission denied
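
This message usually means that the current user is not allowed to open the Docker daemon socket. A minimal sketch of a pre-flight check is shown below; the function name is hypothetical, and the suggested remedy (sudo or membership of the docker group) is the common one rather than something confirmed for this setup.

```shell
# Return 0 if the current user can read and write the given socket path.
can_use_docker_socket() {
    sock=${1:-/var/run/docker.sock}
    [ -r "$sock" ] && [ -w "$sock" ]
}

if ! can_use_docker_socket; then
    echo "No access to the Docker socket; run docker via sudo," >&2
    echo "or add your user to the 'docker' group and log in again:" >&2
    echo '  sudo usermod -aG docker "$USER"' >&2
fi
```

Note that members of the docker group effectively have root-level access to the host, so sudo may be the better choice on shared servers.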

unable to install in custom folder

The folder paths are hard-coded in multiple scripts, which makes it impossible to install tabsat anywhere other than in /home without manually editing all the .sh scripts. I would suggest creating a single configuration file that loads all the relevant paths and variables. Using the export command might be useful.
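
The suggestion could look roughly like the sketch below. Everything named here (the TABSAT_CONF, TABSAT_HOME and TABSAT_TOOLS variables, the default file location and the function itself) is hypothetical and only illustrates the idea of one sourced configuration file with exported paths.

```shell
# Load path settings from one configuration file instead of hard-coding
# them in every script, then export them for child scripts.
load_tabsat_config() {
    conf=${1:-${TABSAT_CONF:-$HOME/.tabsat.conf}}
    if [ -f "$conf" ]; then
        # The file is plain shell, e.g. a line such as TABSAT_HOME=/opt/tabsat
        . "$conf"
    fi
    # Fall back to defaults for anything the file did not set.
    TABSAT_HOME=${TABSAT_HOME:-$HOME/tabsat}
    TABSAT_TOOLS=${TABSAT_TOOLS:-$TABSAT_HOME/tools}
    export TABSAT_HOME TABSAT_TOOLS
}
```

Each script would call load_tabsat_config once at the top and then refer to "$TABSAT_TOOLS/..." rather than to an absolute path.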

bugs in samtools command lines (main tabsat script)

In the main tabsat script, the samtools sort command lines (l. 347 & 366) should be corrected:

samtools sort "${current_sam}_removed_cov_one.bam" "${current_sam}_removed_cov_one_sorted"

into:

samtools sort "${current_sam}_removed_cov_one.bam" > "${current_sam}_removed_cov_one_sorted.bam"

and,

samtools sort "${current_sam}_removed_cov_one.bam" "${current_sam}_removed_cov_one_sorted"

into

samtools sort "${current_sam}_removed_cov_one.bam" > "${current_sam}_removed_cov_one_sorted.bam"

I don't know if this comes from changes in the samtools syntax, but currently it does not work with samtools 1.3.1, which is installed on our server.
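
This does match a syntax change: samtools 1.3 removed the legacy "samtools sort in.bam out.prefix" form. Besides redirecting standard output, the -o option can name the output file directly. A minimal sketch, in which the SAMTOOLS variable and the function name are assumptions for illustration:

```shell
# SAMTOOLS can be overridden to point at a specific installation.
SAMTOOLS=${SAMTOOLS:-samtools}

# Sort <prefix>.bam into <prefix>_sorted.bam using the -o option
# accepted by samtools 1.3 and later.
sort_bam_to_file() {
    prefix=$1
    "$SAMTOOLS" sort -o "${prefix}_sorted.bam" "${prefix}.bam"
}
```

Here sort_bam_to_file "${current_sam}_removed_cov_one" would produce the same file as the redirect-based correction above.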

update documentation

Please provide a list of pre-requisites including:

  • Cairo.pm and Switch.pm perl modules
  • need to compile bedtools in the bedtools folder
  • other?
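
Until such a list exists, the two Perl modules named above can be checked before a run. This is a minimal sketch; have_perl_module is a hypothetical helper, and the module list covers only what is mentioned in this issue.

```shell
# Return 0 if the named Perl module can be loaded by the perl on PATH.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report each required module that is missing.
for module in Cairo Switch; do
    if ! have_perl_module "$module"; then
        echo "Missing Perl module: $module" >&2
    fi
done
```

A missing module is reported on standard error, which makes it easy to see what still has to be installed before starting the pipeline.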

test script generates error

When starting the test script ./test_tabsat_tmap.sh

I've got the following error (note that I've got the same error running tabsat manually on the same dataset or using the script for MiSeq):

#################################

Welcome to TABSAT!

github.com/tadKeys/tabsat

#################################

  • Library is SE
  • Using aligner: tmap
  • MIN_READ_QUAL: 21
  • MIN_READ_LEN: 9
  • Maximum read length not specified. Setting it to 100000
  • MAX_READ_LEN: 100000
  • PERCENT_TARGET: 0.7
  • READ_CUTOFF: 4
  • Sort list not specified. Setting it to ''
  • SORT_LIST:
    Traceback (most recent call last):
    File "/Users/gcristof/tabsat/tools/ait/check_target_list.py", line 4, in <module>
    import create_final_table
    File "/Users/gcristof/tabsat/tools/ait/create_final_table.py", line 150
    print "Prefilling strand information buffer with " + str(len(key_list)) + " items"
    ^
    SyntaxError: invalid syntax
    Target list is not in the correct format
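
A print statement without parentheses is Python 2 syntax, so this SyntaxError is what a Python 3 interpreter reports when it runs a Python 2 script. The sketch below shows one way to detect the mismatch before the pipeline starts; python_major is a hypothetical helper, and the assumption that the scripts are invoked through the python command is not confirmed by the log.

```shell
# Extract the major version from a string such as "Python 2.7.18".
python_major() {
    version_string=$1
    version=${version_string#Python }   # drop the leading "Python "
    printf '%s\n' "${version%%.*}"      # keep the part before the first dot
}

# Python 2 prints its version on stderr and Python 3 on stdout, so merge both.
detected=$(python --version 2>&1) || detected=""
if [ "$(python_major "$detected")" != "2" ]; then
    echo "Warning: 'python' is not Python 2 (found: ${detected:-none})" >&2
fi
```

If the warning appears, pointing the pipeline at a Python 2 interpreter (or porting the print statements) would be the next thing to try.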

bug in patternmap.sh script

In the patternmap.sh script, the variable SAMPLE_C contains the COVERAGE_NONDIR_tmap path, which does not exist if bowtie2 is used as the aligner: SAMPLE_C="${INDIR}/COVERAGE_NONDIR_tmap/MethylSubpopulations/Output/SampleComparision.txt"

This makes the test_tabsat_miseq.sh script crash. An ALIGNER variable should be defined (see also this issue: #4).
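
As an alternative to defining the variable, the file could be located by searching the existing COVERAGE_* folders. This is a rough sketch with a hypothetical function name; the wildcard in the file name is deliberate, because the reports on this page spell it both SampleComparison.txt and SampleComparision.txt.

```shell
# Print the first existing sample comparison file under INDIR, whatever
# aligner or DIR/NONDIR mode produced it. Returns 1 if none exists.
find_sample_comparison() {
    indir=$1
    for candidate in "$indir"/COVERAGE_*/MethylSubpopulations/Output/SampleCompar*.txt; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

This trades explicitness for robustness: if several COVERAGE_* folders exist in one output directory, an explicit ALIGNER variable remains the safer choice.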

Script fails when not installed in $HOME

The script fails to call a few perl scripts from .../tabsat/tools/MethylSubpop/subpopulations.sh if it is not installed in $HOME.
The reason is that it expects the installation path to be $HOME/tabsat/...
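
A common way to remove this kind of dependency is to resolve helper scripts relative to the calling script's own location. A minimal sketch, assuming the perl scripts sit next to subpopulations.sh; script_dir is a hypothetical helper.

```shell
# Print the absolute directory that contains the given script path, so
# sibling scripts can be called without assuming $HOME/tabsat/...
script_dir() {
    ( cd "$(dirname "$1")" && pwd )
}
```

Inside a script, SCRIPT_DIR=$(script_dir "$0") followed by calls such as perl "$SCRIPT_DIR/some_script.pl" (a placeholder name) would then work from any installation folder.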

analysis paired end using tmap

Hi,
I recently got paired-end AmpliSeq data,
so I want to run bismark with tmap to align it to the reference.
But I think the bismark script in tools/bismark_tmap does not support paired-end data when tmap is used.

So I would like to modify that bismark script.
Would you recommend anything?
Or is there any way to run tmap to align paired-end reads?

Best Regards.
Jeongmin
