Comments (7)

W-L commented on August 24, 2024

Hello Vincent,
from the command line you used I assume you want to analyze a .bam file that contains reads that were already mapped to consensus sequences of TEs?

--input $bam

If that is the case, you have to specify your input file with --input_bam $bam instead.
Please let me know if this solves your issue. Otherwise it would be great if you could share what your bash variables evaluate to.
Best wishes

from deviate.

vmerel commented on August 24, 2024

Thank you for your answer.

If I try with --input_bam:

usage: deviaTE_analyse [-h] --input INPUT --family FAMILY [--library LIBRARY]
[--output OUTPUT] [--sample_id SAMPLE_ID]
[--annotation ANNOTATION] [--no_freq_corr]
[--hq_threshold HQ_THRESHOLD]
[--rpm | --single_copy_genes SINGLE_COPY_GENES]
deviaTE_analyse: error: the following arguments are required: --input

W-L commented on August 24, 2024

Ah, my mistake. I overlooked that you are using deviaTE_analyse instead of the wrapper script deviaTE. In that case it is hard to tell what is going on. A guess would be that a read was aligned to coordinates that are larger than the length of the reference sequence, which should not happen. Or maybe the detection of internal deletions has messed up and declared a deletion outside of the reference range. Would you mind sharing your --input and --library file or a sample thereof that recreates the error, as well as your argument to --family? Then I can investigate further.

vmerel commented on August 24, 2024

Here you can find a fastq, a bam, and a subset of the library (in case it helps: the last sequence produced an output, but the others did not): https://filesender.renater.fr/?s=download&token=c58ff9ac-e0a7-73f2-7337-62b489eb4b73

W-L commented on August 24, 2024

Thank you for providing the files! There seem to be two issues here: one technical and one related to your library file.

  1. technical problem
    Did you install deviaTE with the conda environment? It seems that some functionality in conda has changed and it installs an older version of the tool. I am trying to figure out why that is at the moment.
    Could you run conda list | grep 'deviate' from within the environment and report the version number that comes up? If it is anything other than 0.3.7, then you will have to set up the conda environment again with an exact specification of the version number. Like this:
    conda create deviaTE==0.3.7 -c r -c defaults -c conda-forge -c bioconda -c w-l -n deviaTE_env
    Sorry about that!

  2. The first fasta sequence in your library contains the symbol / in the header, which causes a problem when creating the output files, since / is the separator for file paths. After replacing / with another symbol, e.g. _, the reads will have to be remapped. I tested it on your data by running:
    deviaTE --library D_Tak.short_replaced.fa --families ALL --input_fq D_Tak_R1.fastq
    (the keyword ALL with the argument --families automatically runs the analysis for all families in the library file)
    I will put an automatic replacement of / into the next version of the tool.
    Alternatively, you could also replace the symbol in the library as well as in the name of the reference sequence within your mapped bam file (e.g. with sed). That way the reads would not have to be remapped.
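
As a rough illustration of the header replacement described above, here is a minimal Python sketch. The function name sanitize_fasta_headers and the example header names are hypothetical and not part of deviaTE; it only rewrites lines that start with ">" and leaves the sequence lines alone.

```python
def sanitize_fasta_headers(lines, bad="/", replacement="_"):
    """Replace `bad` with `replacement` in FASTA header lines only.

    Hypothetical helper for illustration; not part of deviaTE.
    """
    cleaned = []
    for line in lines:
        # Header lines start with ">"; sequence lines are left untouched.
        if line.startswith(">"):
            line = line.replace(bad, replacement)
        cleaned.append(line)
    return cleaned


# Made-up library content, used only to show the behaviour.
example = [">Gypsy/LTR_1", "ACGTACGT", ">roo", "TTGA"]
print(sanitize_fasta_headers(example))
```

If the reads are not remapped, the same renaming would also have to be applied to the reference names in the bam file, as noted above.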

I hope this helps, please let me know

vmerel commented on August 24, 2024

Thank you for your answer.

  1. Yes I installed deviaTE with the conda environment.
    conda list | grep 'deviate'
    deviate 0.2.1.1 py36_2 w-l

OK, so I started again using v0.3.7 and replaced "/" in my library, and everything seems to work fine! Thank you!

I just have a quick question, for some sequences I got this:
Reference sequence contains ambiguous nucleotide: W
Reference sequence contains ambiguous nucleotide: K
Reference sequence contains ambiguous nucleotide: W
Reference sequence contains ambiguous nucleotide: M

Do you have any advice on how to deal with this, knowing that for the moment I am more interested in comparing abundance between samples than in sequence divergence? I thought about replacing these with "N", but I don't know whether that is advisable and/or necessary.

Vincent.

W-L commented on August 24, 2024

Great to hear that, thanks for the reply!

Concerning the warning about ambiguous nucleotides:
This means that at certain positions in the reference/consensus sequence of the TE, the reference nucleotide is one of the letters that represent multiple, ambiguous nucleotides. All nucleotides that map to this position count towards coverage normally, like at any other position in the sequence. If you are only interested in the abundance, then this is not an issue at all and you do not have to replace the ambiguous nucleotides.
Essentially it only means that this site cannot be identified as a reference SNP (a SNP where the reference nucleotide has been completely replaced by another one). But it will still be identified as a polymorphic SNP with the same conditions independent of the reference nucleotide (min. 10% of the total counts at that position and a minimum of 10% frequency).
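
To see which reference sequences would trigger this warning, one could scan the library for IUPAC ambiguity codes. The following is only a sketch: the function name ambiguous_counts and the example sequences are hypothetical, and counting N as ambiguous is an assumption, since it is not stated here which letters deviaTE reports.

```python
from collections import Counter

# IUPAC codes that stand for more than one nucleotide.
# Including N is an assumption made for this sketch.
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")


def ambiguous_counts(fasta_lines):
    """Count ambiguity codes per sequence in FASTA-formatted lines.

    Hypothetical helper for illustration; not part of deviaTE.
    """
    counts = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the full header (without ">") as the sequence name.
            name = line[1:].strip()
            counts[name] = Counter()
        elif name is not None:
            counts[name].update(c for c in line.upper() if c in IUPAC_AMBIGUOUS)
    return counts


# Made-up sequences, used only to show the behaviour.
example = [">TE1", "ACGWTK", "WWM", ">TE2", "ACGT"]
result = ambiguous_counts(example)
for te_name, codes in result.items():
    print(te_name, dict(codes))
```

Sequences with an empty count contain only A, C, G and T and should not produce the warning.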
