
Comments (3)

gconcepcion commented on May 27, 2024

What repeat option did you add? You have pa_REPmask_code=0,300;0,300;0,300, but that is the no-op "don't filter repeats" setting.

If you want to filter out repeats effectively, you should set the group size and coverage according to Gene Myers's blog post here:
https://dazzlerblog.wordpress.com/2016/04/01/detecting-and-soft-masking-repeats/
pa_REPmask_code=group_size,coverage;group_size,coverage;group_size,coverage

What option did you use for the run that worked? Are the plants similar in biology (e.g. closely related species), or only in genome size? I'm wondering why one result would be so different from the other.

from pbbioconda.

jayceejiao commented on May 27, 2024

Hi,

Thanks for your help. I read the link you provided and changed the options to "pa_REPmask_code=1,1300;2,195;3,20" and "pa_DBsplit_option=-x1000 -s200". In my case, with -s200, one block represents about 0.5X coverage, so the three rounds of repeat masking will mask regions representing 2500X, 187X, and 13X, respectively. With -x1000, I think reads shorter than 1 kb will also be removed. The three REPmask rounds have now finished, but I still have 1049 jobs under the folder 0-rawreads/daligner-split/daligner-scripts/. Does that mean the reads were not filtered by the repeat-masking step and the 1 kb cutoff?
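For readers following the arithmetic, the figures above can be reproduced with a short Python sketch. It follows the interpretation used in this comment (threshold divided by the coverage of the group being compared against), which is not necessarily how REPmask itself defines anything. The function names are hypothetical, and the per-block coverage of 0.52X is an assumption chosen because it reproduces the rounded 2500X / 187X / 13X figures; the comment only says "about 0.5X".

```python
def parse_repmask_code(code):
    """Split 'group_size,coverage;...' into a list of (group_size, coverage) pairs."""
    rounds = []
    for part in code.split(";"):
        group_size, coverage = part.split(",")
        rounds.append((int(group_size), int(coverage)))
    return rounds


def implied_repeat_threshold(group_size, coverage, block_coverage):
    """Genome-wide depth implied by one masking round, per the comment's reasoning.

    A region must be covered `coverage` times within a group of `group_size`
    blocks, each holding `block_coverage`X of the genome.
    """
    return coverage / (group_size * block_coverage)


BLOCK_COVERAGE = 0.52  # assumption: roughly 0.5X per block, as stated in the comment

for group_size, coverage in parse_repmask_code("1,1300;2,195;3,20"):
    depth = implied_repeat_threshold(group_size, coverage, BLOCK_COVERAGE)
    print(f"group_size={group_size} coverage={coverage} -> about {depth:.1f}X")
```

With a block coverage of exactly 0.5X the same formula gives 2600X, 195X, and about 13.3X, so the quoted numbers depend on the precise per-block coverage of the dataset.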


pb-cdunn commented on May 27, 2024

The number of jobs is completely unrelated to repeat masking. What REPmask (and TANmask and DBdust) does is pretty simple: it produces a "mask-track", which daligner uses to avoid generating k-mers for some regions. The result is lower memory use and fewer comparisons.

But the daligner jobs are partitioned to do all-pairs comparisons of the blocks in your DB, and the number of blocks is determined solely by data-size and DBsplit parameters.

So this is all as expected.

I'm glad you found useful settings for pa_REPmask_code.
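To illustrate why masking cannot change the job count, here is a rough Python sketch of the block arithmetic described above. It assumes the DBsplit `-s` value is a block size in megabases and that an all-pairs comparison covers every unordered pair of blocks, including each block against itself. The function names and the 100 Gbp dataset size are hypothetical, and how FALCON groups block pairs into job scripts is not modeled, so this does not predict the 1049 figure.

```python
import math


def block_count(total_bases, block_size_mb):
    """Approximate number of DB blocks for a dataset split into blocks of block_size_mb Mbp."""
    return math.ceil(total_bases / (block_size_mb * 1_000_000))


def block_pairs(n_blocks):
    """Number of unordered block pairs, counting each block against itself."""
    return n_blocks * (n_blocks + 1) // 2


# Hypothetical dataset: 100 Gbp of raw reads remaining after the length cutoff.
n = block_count(100_000_000_000, 200)
print(n, "blocks,", block_pairs(n), "block pairs to compare")
```

Neither function takes a mask as input: the mask track changes how much work each comparison does, not how many comparisons are scheduled.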
