Comments (3)
What repeat option did you add? You have pa_REPmask_code=0,300;0,300;0,300,
but that's the no-op "don't filter repeats" option.
You should set the group size and coverage according to Gene Myers's blog post here if you want to filter out repeats effectively:
https://dazzlerblog.wordpress.com/2016/04/01/detecting-and-soft-masking-repeats/
pa_REPmask_code=group_size,coverage;group_size,coverage;group_size,coverage
What option did you have for the one that worked? Are the plants similar in biology (e.g. closely related species), or only in genome size? I'm wondering why one result would be so different.
from pbbioconda.
Hi,
Thanks for your help. I read the link you provided and changed the options to "pa_REPmask_code=1,1300;2,195;3,20" and "pa_DBsplit_option=-x1000 -s200". In my case, with -s200, one block represents about 0.5X coverage, so the three rounds of repeat masking should mask regions representing 2500X, 187X and 13X, respectively. With -x1000, I think reads shorter than 1 kb will also be removed. The three REPmask rounds have now finished, but I still have 1049 jobs under the folder 0-rawreads/daligner-split/daligner-scripts/. Does that mean the reads were not filtered by the repeat-masking step and the 1 kb option?
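The arithmetic behind those three figures can be sketched as follows. This is only an illustration, assuming that a REPmask pass with group size g and coverage c flags regions whose repeat level is roughly c / (g × per-block coverage); the function name `implied_repeat_thresholds` is hypothetical and not part of FALCON or the Dazzler tools. The quoted 2500X, 187X and 13X match a block of about 0.52X, which rounds to the 0.5X mentioned above.

```python
def implied_repeat_thresholds(repmask_code: str, block_coverage: float) -> list[float]:
    """Estimate the repeat level each REPmask pass targets.

    Assumption (not taken from the tool's source): a pass "g,c" compares each
    block against a group of g blocks and masks regions covered at least c
    times, so the implied threshold is c / (g * block_coverage).
    """
    thresholds = []
    for pass_spec in repmask_code.split(";"):
        group_size, coverage = (int(value) for value in pass_spec.split(","))
        thresholds.append(coverage / (group_size * block_coverage))
    return thresholds


# One block of ~0.52X genome coverage (an assumed value, rounded to 0.5X in the thread).
levels = implied_repeat_thresholds("1,1300;2,195;3,20", 0.52)
print(" ".join(f"{level:.1f}" for level in levels))  # prints: 2500.0 187.5 12.8
```

With exactly 0.5X per block, the same formula gives 2600, 195 and about 13.3, so the settings are not very sensitive to the rounding.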
from pbbioconda.
The number of jobs is completely unrelated to repeat-masking. What REPmask (and TANmask and DBdust) does is pretty simple: it produces a "mask-track", which daligner uses to avoid generating k-mers for some regions. The result is lower memory use and fewer comparisons.
But the daligner jobs are partitioned to do all-pairs comparisons of the blocks in your DB, and the number of blocks is determined solely by data size and the DBsplit parameters.
So this is all as expected.
I'm glad you found useful settings for pa_REPmask_code.
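To illustrate why masking does not change the job count, here is a rough sketch. It assumes the number of blocks is approximately the retained bases divided by the -s block size in Mbp, and that all-pairs comparison covers every unordered pair of blocks plus each block against itself. The function names and the 100 Gbp input are hypothetical, and the number of scripts actually written under daligner-scripts/ depends on how the workflow groups these comparisons, so this is not expected to reproduce 1049.

```python
import math


def approx_block_count(total_bases: int, block_size_mbp: int) -> int:
    """Approximate number of DB blocks for a given DBsplit -s value (in Mbp).

    Only data size and the split size enter here; no repeat mask is involved.
    """
    return math.ceil(total_bases / (block_size_mbp * 1_000_000))


def all_pairs_comparisons(blocks: int) -> int:
    """Unordered block pairs, including each block compared against itself."""
    return blocks * (blocks + 1) // 2


# Hypothetical dataset: 100 Gbp of reads retained after the -x length cutoff.
blocks = approx_block_count(100_000_000_000, 200)
print(blocks, all_pairs_comparisons(blocks))  # prints: 500 125250
```

Changing pa_REPmask_code leaves both numbers untouched; only a different -s value or a different amount of retained data would move them.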
from pbbioconda.