w-l / deviate

Python tool for the analysis and visualization of mobile genetic elements

License: GNU General Public License v3.0

Python 63.87% R 15.43% Perl 18.89% Shell 1.80%

deviate's Issues

IndexError: list index out of range

Hi,

I got the following error when trying to use deviaTE_analyse:

Starting analysis of: Sequence in: Sample.fastq.fused.sort.bam
no annotaions found for: Sequence
Traceback (most recent call last):
File "/var/lib/miniconda3/envs/deviaTE_env/bin/deviaTE_analyse", line 59, in
sample.perform_pileup(hq_threshold=args.hq_threshold)
File "/var/lib/miniconda3/envs/deviaTE_env/lib/python3.6/site-packages/deviaTE/deviaTE_pileup.py", line 84, in perform_pileup
pr.count_nucleotide(sample_sites=self.sites)
File "/var/lib/miniconda3/envs/deviaTE_env/lib/python3.6/site-packages/deviaTE/deviaTE_pileup.py", line 356, in count_nucleotide
site = sample_sites[self.column_pos]
IndexError: list index out of range

Here is the command line:
deviaTE_analyse --input $bam --family $Fam --library $Library --single_copy_genes 1,2,3

I am using Ubuntu, samtools 1.7 (using htslib 1.7-2), and bwa version 0.7.17-r1188.

Can you help me?

Vincent

Won't identify TEs that are created with EDTA for non-model organism

Hello,

I am trying to run this for a set of raw sequences for Anopheles gambiae. I used EDTA to create a TE library from the agamP4 genome assembly and then used my raw sequences as input for this pipeline to identify which TEs are present in which samples we have. Following trimming/mapping, the pipeline attempts to identify TEs but I get the following error for every TE identified by EDTA.

Starting analysis of [TE] in [RAW DATA]-final.fastq.fused.sort.bam..

No annotaions found for: [TE]

Traceback (most recent call last):
File "/home/ch943/bin/miniconda/envs/deviaTE_env/bin/deviaTE_analyse", line 100, in
sample.write_frame(out=args.output + '.raw', insertions=ihat, command=comm, t=timestamp, norm='raw')
File "/home/ch943/bin/miniconda/envs/deviaTE_env/lib/python3.6/site-packages/deviaTE/deviaTE_pileup.py", line 204, in write_frame
with open(out, 'w') as outfile:
FileNotFoundError: [Errno 2] No such file or directory: '[RAW DATA]-final.fastq.[TE].raw'

Any guidance would be appreciated.
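
One possible cause, not confirmed in this thread, is that the TE names in the library contain characters such as '/' or '#'. The output file name is built from the TE name, so a '/' in the name would point the path into a directory that does not exist. The following is a minimal sketch, under that assumption, of cleaning the FASTA headers of a library before running the pipeline. The function name `sanitize_fasta_headers` and the set of allowed characters are hypothetical choices for illustration, not part of deviaTE.

```python
import re


def sanitize_fasta_headers(lines):
    """Yield FASTA lines with path-unsafe characters in headers replaced by '_'.

    Assumption: the failure comes from characters like '/' in sequence names.
    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            name = line[1:].rstrip("\n")
            # Keep letters, digits, '.', '_' and '-'; replace everything else,
            # including whitespace, so the whole header becomes one safe token.
            safe = re.sub(r"[^A-Za-z0-9._-]", "_", name)
            yield f">{safe}\n"
        else:
            yield line


# Example use (file names are placeholders):
# with open("library.fa") as src, open("library.clean.fa", "w") as dst:
#     dst.writelines(sanitize_fasta_headers(src))
```

Two distinct names can collapse to the same cleaned name, so it is worth checking the cleaned library for duplicate headers before use.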

--input_fq_dir contains .fq files

I ran the program with the parameter --input_fq_dir /path/to/dirname, but the directory contained .fq files instead of .fastq files, so the program exited with an error. Allowing .fq as well would probably be a quick fix.
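
A minimal sketch of the kind of change suggested here: collecting input files by either extension. The function name `find_fastq_files` and the suffix tuple are hypothetical. This is not the deviaTE implementation, only an illustration of accepting both spellings.

```python
from pathlib import Path

# Extensions treated as FASTQ input; this tuple is an assumption for illustration.
FASTQ_SUFFIXES = (".fastq", ".fq")


def find_fastq_files(directory):
    """Return sorted paths in `directory` whose names end in .fastq or .fq."""
    return sorted(
        p
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)
    )
```

Until something like this is supported, renaming or symlinking the .fq files to .fastq should have the same effect.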

Idea for walkthrough: additional information on how to process paired reads

Hi Lukas,

I have a minor comment:

I use DeviaTE to estimate TE copy numbers on already mapped bam files. These bam files consist of paired reads that have been mapped in single read mode, but the input for the mapping algorithm was a concatenated .fq file with both read pairs (read1 and read2). Thus, the same read name occurs twice in these files.

For some of these files, DeviaTE works just fine, while for others I get an error message similar to what has been reported here in the issues section:

... line 70, in
fam_strand = seg.reference_name + '+'
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

I realized that renaming the reads before mapping (read_1, read_2, ..., read_n) resolves the issue.
Maybe this would be good to point out in the walkthrough? I can imagine that some users have paired read data (e.g., for polymorphism scans) that are also used for TE analysis with DeviaTE.

Best,
Anna
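
The renaming step described above can be sketched as follows, assuming standard four-line FASTQ records (no wrapped sequence lines). The function name `rename_fastq_reads` and the `read_` prefix are illustrative choices; the original read names are discarded, so keep the source file if they are needed later.

```python
def rename_fastq_reads(lines, prefix="read_"):
    """Yield FASTQ lines with each header replaced by a unique sequential name.

    Assumes four lines per record: header, sequence, '+', qualities.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            # Header line: replace with read_1, read_2, ..., read_n.
            yield f"@{prefix}{i // 4 + 1}\n"
        else:
            # Sequence, separator and quality lines are kept as they are.
            yield line if line.endswith("\n") else line + "\n"


# Example use (file names are placeholders):
# with open("concatenated.fq") as src, open("renamed.fq", "w") as dst:
#     dst.writelines(rename_fastq_reads(src))
```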

various errors after conda installation

Hi,

I installed via conda as instructed. But I had a few questions.

  1. deviaTE --input_bam_dir ...: how do you specify a folder of BAM files? When I pass --input_bam_dir ./bam_files, I get: deviaTE: error: unrecognized arguments: ./bam_files

  2. When I provide a single file ... deviaTE --input_bam ./bam_files/F1_R1.aln.bam

Alignment provided, skipping step 1
Detecting internal deletions..

samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
[E::hts_open_format] Failed to open file ./bam_files/F1_R1.aln.bam.tmp.sam
Traceback (most recent call last):
  File "/home/rm786/.conda/envs/deviaTE_env/bin/deviaTE_fuse", line 23, in <module>
  3. What's your recommendation for paired-end (PE) information: align separately and combine into a single BAM?

Thanks so much!

more than one primary alignment & unsupported operand type(s) for +: 'NoneType' and 'str'

Hello,

I recently got the following error messages running deviaTE on Drosophila samples with custom libraries:

[bam_sort_core] merging from 8 files and 1 in-memory blocks...
Traceback (most recent call last):
File "/var/lib/miniconda3/envs/deviaTE_env/bin/deviaTE_fuse", line 56, in
raise ValueError('more than one primary alignment')
ValueError: more than one primary alignment

Traceback (most recent call last):
File "/var/lib/miniconda3/envs/deviaTE_env/bin/deviaTE_fuse", line 70, in
fam_strand = seg.reference_name + '+'
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Here is the command used:
deviaTE_prep --input /mnt/Samples/$fastq --library /home/ubuntu/Library.fa --quality_encoding phred+33

I am using a conda environment on Ubuntu
(conda create deviaTE==0.3.7 -c r -c defaults -c conda-forge -c bioconda -c w-l -n deviaTE_env)

Can you help with this?

Thank you,
Vincent.
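
To narrow down which reads trigger the first error, one option is to scan the alignment for read names that carry more than one primary record. The sketch below reads SAM text and treats a record as primary when neither the secondary (0x100) nor the supplementary (0x800) flag bit is set. That this matches the check inside deviaTE_fuse is an assumption, and the function name is hypothetical.

```python
from collections import Counter

SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def reads_with_multiple_primaries(sam_lines):
    """Return sorted read names that have more than one primary SAM record."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        flag = int(fields[1])
        if not flag & (SECONDARY | SUPPLEMENTARY):
            counts[fields[0]] += 1
    return sorted(name for name, n in counts.items() if n > 1)
```

The input can come from a SAM file or from the text output of samtools view. A non-empty result would be consistent with read names occurring more than once in the input FASTQ.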

Diversity Estimates

Hi same person:

I am wondering about your diversity estimates. How exactly did you calculate these? Are they just averages across the four nucleotides from the table output of deviaTE? It isn't stated in the supplemental materials as far as I can find.

Copy number I understand, as it is normalized to a single gene. But for divergence, do you suggest isolating consensus sequences from the alignments to the library and then building a tree from these?

Thank you for getting back to me. I really appreciate your help.

Christopher

conda create problems with samtools

Hello!
I have a problem with samtools after creating the conda environment:

conda create deviaTE -c r -c defaults -c conda-forge -c bioconda -c w-l -n deviaTE_env
#and
#like you suggested in another ticket
conda create deviaTE==0.3.7 -c r -c defaults -c conda-forge -c bioconda -c w-l -n deviaTE_env 

If I run samtools in deviaTE_env, I get an error:
samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory

I found out that this is a known conda bug and tried to update samtools:

conda install -c bioconda samtools=1.9 --force-reinstall
#or 
conda config --add channels bioconda
conda config --add channels conda-forge
conda install samtools==1.11

But nothing helped. Maybe you have some insight? I was really looking forward to using this program!
Kind regards,
Lisa

Genome Size

Hello,

I am hoping to analyse TE family radiation within the human genome. Is this possible using this tool? The D. melanogaster genome is 132 million base pairs, compared to the human genome at 3.2 billion. Is there some other way I should be going about this?

I have whole genome sequences, but deviaTE does not seem to progress past the point of trimming the reads, even when run on an HPC and given 6 days. Am I doing something incorrect?

Error during deviaTE_prep step: ValueError: more than one primary alignment

I ran the following command with WGS Illumina sequencing reads and a RepeatModeler library as input:
deviaTE_prep --input Arthir.fastq --library Arthir-families.fa --threads 64 --quality_encoding phred+33

I get this output on the screen:
Trimming reads.

Reads processed: 474519376
Reads passed filtering: 474519375
5p poly-N sequences trimmed: 12048
3p poly-N sequences trimmed: 0
Reads discarded during 'remaining N filtering': 0
Reads discarded during length filtering: 1
Reads trimmed during quality filtering: 38317741

Mapping reads.
Detecting internal deletions.

[bam_sort_core] merging from 214 files and 1 in-memory blocks...
Traceback (most recent call last):
File "/home/morpheus/.local/bin/deviaTE_fuse", line 56, in
raise ValueError('more than one primary alignment')
ValueError: more than one primary alignment

Can you please help me through this error? I am not able to figure out what is causing this error.

Thank you.
Ajinkya

Error in install.packages("data.table", repos = rep) : unable to install packages

Hello,

I was running deviaTE and got the following error. I checked, and the library is writable.

This was my script:

deviaTE --input_fq vieillardiiBT023_cat_uniq.fq --families ALL --library $data_folder/impolita-families_mcclintok7.fa

This is the error:

During startup - Warning message:
Setting LC_CTYPE failed, using "C"
Warning in install.packages("data.table", repos = rep) :
'lib = "/home/apps/conda/miniconda3/envs/devia-te-0.3.8/lib/R/library"' is not writable
Error in install.packages("data.table", repos = rep) :
unable to install packages
Execution halted

End of deviaTE run

Best wishes,
Mayra
