Comments (21)

armintoepfer commented on September 27, 2024

Please have a look if those work for you
#549 (comment)

from pbbioconda.

jamestwebber commented on September 27, 2024

Weirdly, I am having a new issue: lima is having this issue, but it seems like pigeon and isoseq3 are not (I think).

I created a new VM with Ubuntu 20.04 because that is listed as supported. I can install from bioconda and run pigeon --version and isoseq3 --version (which I couldn't do before, in #549), but lima --isoseq segfaults, as does lima --log-level TRACE.

At this point my suspicion is that these tools are relying on system libraries that are not being properly checked/versioned for compatibility? lima is surprising because it hasn't received an update since I started using it, so it hasn't changed but something else must have.

jamestwebber commented on September 27, 2024

I think I'm tracking down my issue as something else...maybe bad data or bad output that is causing lima to fail (unfortunately without a useful error message). When I know more I'll open a separate issue.

gushiro commented on September 27, 2024

@armintoepfer thank you. This works for me. However, I noticed a minor difference in this new version of isoseq3, particularly for isoseq3 collapse. I now get a warning message: '>|> 20230106 03:24:09.947 -|- WARN -|- Run -|- 0x7f88b4d36d40|| -|- Transcripts do not contain quality values, will not output Id.fastq'

I should mention that the warning did not appear in the previous version of isoseq3 using the same data.

gushiro commented on September 27, 2024

@armintoepfer also, I just noticed that pigeon classify splits scaffolds into _classification and _junction, but the program seems stuck on scaffold 45 for hours. My reference genome has 49 scaffolds.

armintoepfer commented on September 27, 2024

Let me ping @derekwbarnett @jmattick for support

derekwbarnett commented on September 27, 2024

Hi @gushiro This sounds like it might be a multithreading issue. Can you re-run and see if it hangs at the same point in the data? Let me know what you see. Also, you should know the answer long before 2 days of runtime.

gushiro commented on September 27, 2024

@derekwbarnett thank you for your reply. Yes, it still hangs at the same point.

armintoepfer commented on September 27, 2024

In that case, is it possible to provide a minimal working dataset to reproduce the issue? https://github.com/PacificBiosciences/pbbioconda#file-sharing

gushiro commented on September 27, 2024

@armintoepfer yes, where can I send them to you?

armintoepfer commented on September 27, 2024

Please click on the link; it explains how to upload data to us.

armintoepfer commented on September 27, 2024

@gushiro We've received the data and will triage. Once we have more information, we will update here.

astulaaa commented on September 27, 2024

I am getting the same warning in isoseq3 collapse. Interestingly, after pigeon classify not a single transcript is classified as 'coding'. Any ideas why that would happen? Is it only the column 30 annotations that are incorrect, or all of them? There are many among them annotated as 'full-splice_match' and 'reference_match' and 'canonical' but none of them as 'coding'.

jamestwebber commented on September 27, 2024

There are many among them annotated as 'full-splice_match' and 'reference_match' and 'canonical' but none of them as 'coding'.

'coding' is not one of the options

astulaaa commented on September 27, 2024

I'm sorry, I was unclear. I mean column 30, "coding". Every isoform is annotated as "non_coding", even though column 6 (which you linked the options for) has many "full-splice_match" isoforms. I am confused about how it is possible for every transcript to be "non_coding".
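
For anyone wanting to confirm the same observation on their own output, a minimal sketch for tallying the values in one column of the classification table is below. It assumes the file is tab-delimited with a header row and that the column is literally named "coding", as described above; the function name `tally_column` is hypothetical and not part of pigeon.

```python
import csv
from collections import Counter
from typing import Iterable


def tally_column(lines: Iterable[str], column: str = "coding") -> Counter:
    """Count how often each value occurs in one column of a tab-delimited table.

    Assumption: the first line is a header row containing `column`.
    A missing column raises KeyError.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row[column] for row in reader)


# Usage sketch (the file name is a placeholder, not a real path):
# with open("sample_classification.txt", newline="") as handle:
#     print(tally_column(handle))
```

If the tally shows only "non_coding", the whole column is affected rather than a subset of isoforms.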

gushiro commented on September 27, 2024

@armintoepfer thanks. To add a little:
I think for non-model species with long UTRs, it might be good for pigeon to count a read toward a gene if the read falls XXXX bp downstream of the last exon. I can see in my data a lot of short cDNAs and cDNAs in UTRs that would be classified as intergenic (and therefore not producing a count?). I can see the importance of classifying transcripts beforehand (removing antisense reads, for example), but I wonder what would happen during the gene count matrix generation if the transcript falls only in the UTR region.

derekwbarnett commented on September 27, 2024

@gushiro I have identified the problem. In your genome annotations file, the following record is out-of-order:

chr46   rRNA    gene    15556622        15556733        .       +       .       gene_id "LSU_316"; gene_id "LSU_316"; gene_id "LSU_316"; gene_id "LSU_316"; gene_id "LSU_316";
chr46   rRNA    exon    15556622        15556733        .       +       .       gene_id "SSU_316"; transcript_id "LSU_316.1";
chr46   rRNA    transcript      15556622        15556733        .       +       .       gene_id "SSU_316"; transcript_id "LSU_316.1"; exon_number "1";

The exon entry appears before the transcript. pigeon is expecting exon records to be "children" of the transcript. The rest of the file follows this convention. It's just this one record that is different.

Re-ordering to this allows pigeon to run to completion:

chr46   rRNA    gene    15556622        15556733        .       +       .       gene_id "LSU_316"; gene_id "LSU_316"; gene_id "LSU_316"; gene_id "LSU_316"; gene_id "LSU_316";
chr46   rRNA    transcript      15556622        15556733        .       +       .       gene_id "SSU_316"; transcript_id "LSU_316.1"; exon_number "1";
chr46   rRNA    exon    15556622        15556733        .       +       .       gene_id "SSU_316"; transcript_id "LSU_316.1";

This situation triggered an internal error state that was not properly handled and suspended processing. I will include a fix in the next release here.
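
For readers who want to check their own annotation file for the same ordering problem before running pigeon, here is a minimal sketch. It is not pigeon's own validation logic. It assumes a tab-delimited, GTF-style file with nine columns and `transcript_id "..."` attributes, as in the records above, and the function name `find_exons_before_transcript` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches GTF-style attributes such as: transcript_id "LSU_316.1";
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def find_exons_before_transcript(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, transcript_id) for every exon record that
    appears before the transcript record it belongs to."""
    seen_transcripts = set()
    problems = []
    for line_number, line in enumerate(lines, start=1):
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete nine-column record
        feature, attributes = fields[2], fields[8]
        match = _TRANSCRIPT_ID.search(attributes)
        if match is None:
            continue  # e.g. gene records without a transcript_id
        transcript_id = match.group(1)
        if feature == "transcript":
            seen_transcripts.add(transcript_id)
        elif feature == "exon" and transcript_id not in seen_transcripts:
            problems.append((line_number, transcript_id))
    return problems


# Usage sketch (the file name is a placeholder, not a real path):
# with open("annotations.gtf") as handle:
#     for line_number, transcript_id in find_exons_before_transcript(handle):
#         print(f"line {line_number}: exon precedes transcript {transcript_id}")
```

An empty result means every exon record follows its transcript record; it says nothing about other possible problems in the file.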

gushiro commented on September 27, 2024

@derekwbarnett thanks, it works perfectly now.
I still have many "intergenic" transcripts. When I check my data, I can see that each gene has many of these small intergenic transcripts downstream of the gene, spanning 1000 - 3000 bp. Would it be possible for pigeon make-seurat, or any step before it, to count these short transcripts as part of the gene if the gene is "1000 bp" upstream, for example?
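
To make the request concrete, here is a rough sketch of the kind of rule meant: assign a transcript to the nearest gene on the same strand whose 3' end lies within a fixed window upstream of it. This is only an illustration of the idea, not something pigeon does; the `Interval` type, the function names, and the 1000 bp default are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Interval:
    """A named genomic interval (hypothetical helper type)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def downstream_gap(gene: Interval, transcript: Interval) -> Optional[int]:
    """Distance from the gene's 3' end to the transcript, or None if the
    transcript is not strictly downstream on the same chromosome and strand."""
    if gene.chrom != transcript.chrom or gene.strand != transcript.strand:
        return None
    if gene.strand == "+":
        gap = transcript.start - gene.end
    else:
        gap = gene.start - transcript.end
    return gap if gap > 0 else None


def assign_to_upstream_gene(
    transcript: Interval, genes: Iterable[Interval], window: int = 1000
) -> Optional[str]:
    """Return the name of the closest gene whose 3' end is at most
    `window` bp upstream of the transcript, or None if there is none."""
    best_name, best_gap = None, None
    for gene in genes:
        gap = downstream_gap(gene, transcript)
        if gap is None or gap > window:
            continue
        if best_gap is None or gap < best_gap:
            best_name, best_gap = gene.name, gap
    return best_name
```

The strand check matters here: "downstream" means higher coordinates for a + strand gene and lower coordinates for a - strand gene.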

armintoepfer commented on September 27, 2024

I have uploaded a new version. If that still segfaults, but not the github binary itself, then conda is corrupting the binary.

jamestwebber commented on September 27, 2024

Just tested and it looks like they work now, thanks for fixing the issue

armintoepfer commented on September 27, 2024

Okay great. Conda is weird sometimes.
