milaboratory / mixcr
MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
Home Page: https://mixcr.com
License: Other
Like
rna = -OvParameters.geneFeatureToAlign=VTranscript
full-length = ...
short-full-length = full_length - FR4
dont-cluster = ....
Possible usage:
mixcr align -:rna -:dont-cluster ...
Here dont-cluster will be skipped, as it affects only the assembly stage. Permitting such options lets the same set of parameters be passed to all stages, which in the end will simplify the implementation of #14 into the following form:
mixcr analyse -r report.txt -:compress-intermediate -:rna -:dont-cluster my_name_R1.fastq.gz my_name_R2.fastq.gz
which will produce the following set of files:
my_name.vdjca.gz
my_name.clns.gz
my_name.txt
report.txt
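The skipping behaviour described above can be sketched roughly as follows. This is only an illustration of the idea, not MiXCR code: the `PRESETS` table, the stage names, and the `expand_presets` helper are hypothetical, and the option behind dont-cluster is left as a placeholder because the proposal does not spell it out.

```python
# Hypothetical preset table: maps a preset name to the options it contributes
# to each stage. Only the "rna" entry comes from the proposal text; the
# "dont-cluster" value is a placeholder, since the proposal elides it.
PRESETS = {
    "rna": {"align": ["-OvParameters.geneFeatureToAlign=VTranscript"]},
    "dont-cluster": {"assemble": ["<placeholder: clustering-off option>"]},
}


def expand_presets(stage, args):
    """Replace every -:name token with the options that preset defines for `stage`."""
    expanded = []
    for arg in args:
        if arg.startswith("-:"):
            name = arg[2:]
            if name not in PRESETS:
                raise ValueError(f"unknown preset: {name}")
            # A preset that defines nothing for this stage is silently skipped.
            expanded.extend(PRESETS[name].get(stage, []))
        else:
            expanded.append(arg)
    return expanded


print(expand_presets("align", ["-:rna", "-:dont-cluster", "in.fastq"]))
```

With a table like this, the same list of `-:name` tokens can be handed to every stage, and each stage picks up only the options that apply to it.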
--cdr3-contains
--read-contains
--verbose
I am testing the MiXCR program (v1.3) and have found an unusual situation when running 'exportAlignments'. The order in which sequences are provided in a FASTA or FASTQ file affects the number of sequences that are successfully aligned.
In the example(s) I provide below, I made a FASTA file containing 7 sequences in total. There are only 4 unique NGS reads in the FASTA file; that is, I repeated one sequence 3x and a second sequence 2x. The remaining two sequences should not return strong hits.
If I run MiXCR using the 7 test sequences (test1.fasta), the MiXCR log file says that 2/7 sequences (rather than 5/7) returned results. This is problematic in that not all 5 are found, but even more problematic is that if I simply change the order of the sequences in the file (test2.fasta), the MiXCR log file says 4/7 (rather than 5/7) returned results.
The fact that I do not see 5/7 sequences successfully returned seems to be a bug. Also, I would not expect the output of exportAlignments to be sensitive to the order of the sequences in a file. Is this true? If so, is it a known problem?
If it's not a bug, then how can I set the parameters so that all 5 successful sequences are returned when using 'exportAlignments'?
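To check order sensitivity more systematically, reordered copies of a test file can be generated and each one run through the same pipeline. The sketch below only handles the file shuffling; the helper names are made up for illustration, and it assumes plain FASTA with `>` header lines.

```python
import random


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(records):
    """Serialize (header, sequence) tuples back to FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


def shuffled(records, seed):
    """Return a reproducibly shuffled copy of the records."""
    rng = random.Random(seed)
    copy = list(records)
    rng.shuffle(copy)
    return copy


example = ">r1\nACGT\n>r2\nTTGA\nCC\n>r3\nGGGG\n"
records = read_fasta(example)
print(write_fasta(shuffled(records, seed=1)))
```

If the number of aligned reads differs between shuffled copies of the same input, the order dependence is reproduced independently of any particular hand-made file.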
Make a special command-line option for all export... actions to convert column names to names without spaces. E.g.:
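A minimal sketch of one possible conversion, assuming camelCase as the target style. The function name and the sample column names are hypothetical and are not taken from MiXCR's actual export headers.

```python
import re


def to_camel_case(column_name):
    """Turn a header such as 'Clone count' into a space-free name such as 'cloneCount'."""
    words = re.findall(r"[A-Za-z0-9]+", column_name)
    if not words:
        return ""
    first, rest = words[0], words[1:]
    # Lower-case only the first letter so that acronyms later in the name survive.
    return first[0].lower() + first[1:] + "".join(w[0].upper() + w[1:] for w in rest)


for name in ["Clone count", "Best V hit"]:
    print(name, "->", to_camel_case(name))
```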
Check all export actions to output to stdout.
> mixcr info file.vdjca
Created on ...
MiXCR version ..
...
E.g. if we assemble clones using CDR3, assemble consensus sequences for all other sequence parts covered by reads.
Individual coverage values for each letter in consensus sequences.
To prevent possible artefacts on the right side of J gene alignment.
Instead, such low-quality endings (with nearly random sequence) can lead to bad alignments with j rightFloatingBound = false
?
-v option
versionInfo action
mergeAlignments action
My tests showed that the rna-seq parameters perform slightly better on real highly enriched datasets, although I expected the opposite effect. MiXCR with these parameters has a nearly zero false-positive rate, and sensitivity is also very high (it detects nearly all V(D)J events even in short 75+75 RNA-Seq datasets).
So, why don't we use these parameters as the default?
Additional testing on a broader spectrum of real enriched datasets is required.
By comparing the number of extracted TCR/IG sequences with the number of alignments with the corresponding C genes.
CXX...XX[WF] mask for CDR3
* or _ in CDR3
V segment is not a pseudogene
Limit the possible set of D genes only to loci of V and J genes.
To exclude combinations like:
TRBV -- IGHD -- TRBJ
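The restriction can be sketched as a simple filter. This assumes that the locus is given by the first three letters of the gene name (as in TRBV12-3 or IGHD3-10), which is a naming-convention assumption rather than how MiXCR stores loci; the function names are hypothetical.

```python
def locus_of(gene_name):
    """Assumed convention: the first three letters of a gene name are its locus."""
    return gene_name[:3]


def allowed_d_genes(v_gene, j_gene, d_candidates):
    """Keep only D candidates whose locus matches the V or the J gene."""
    loci = {locus_of(v_gene), locus_of(j_gene)}
    return [d for d in d_candidates if locus_of(d) in loci]


# IGHD3-10 is dropped because neither the V nor the J gene belongs to IGH.
print(allowed_d_genes("TRBV12-3", "TRBJ2-1", ["IGHD3-10", "TRBD1"]))
```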
High priority for clones, lower priority for alignments (vdjca files).
From this letter:
Oh well. One more question about the mutations: these are interpretable as SHM, right? Assuming there is not sequencing/pcr error, so NGS being completely error-free, then these mutations would be SHM, and, not as it is now the case a mixture of SHM and NGS-related errors, right? And a suggestion: it would be nice to have them also on the amino acid levels (similar to IMGT).
Calculate percent of aligned letters and output it in the export tab-delimited files.
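One way to define this value, assuming alignments are available as half-open (start, end) ranges on the read and that overlapping ranges should be counted once. Both the representation and the function name are assumptions made for illustration.

```python
def percent_aligned(read_length, aligned_ranges):
    """Percentage of read letters covered by at least one aligned range."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    covered = 0
    last_end = 0
    for start, end in sorted(aligned_ranges):
        # Clip to the read and skip the part already counted by earlier ranges.
        start = max(start, last_end)
        end = min(end, read_length)
        if end > start:
            covered += end - start
            last_end = end
    return 100.0 * covered / read_length


print(percent_aligned(100, [(0, 40), (30, 60)]))
```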
I have a set of FASTA files with reference sequences of V, D, J or C genes.
Each file is padded with . symbols or something similar to align anchor points, so each anchor point has the same position in all sequences (exactly like IMGT gaps).
There is a file or command-line argument with the positions of all anchor points. Something like this:
V=108:117:125:148:157: etc...
I can create a new loci library from this information or append it to an already existing one:
mixcr addReferenceGenes --taxonId 9615 --speciesCommonName dog,canis --locus TRB --geneType V --anchorPoints 108:117:125:148:157 --geneNamePattern '...' input.fasta myLL.ll
This will create the myLL.ll file, or add locus information to it if it already exists.
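Because the anchor points refer to padded coordinates, importing a gene involves mapping them back to positions in the unpadded sequence. A rough sketch follows, assuming 0-based positions counted between letters and `.` as the padding symbol; the helper names are hypothetical and this is not the addReferenceGenes implementation.

```python
def parse_anchor_points(spec):
    """Parse a colon-separated list such as '108:117:125' into integers."""
    return [int(part) for part in spec.split(":") if part]


def unpadded_positions(padded_sequence, anchor_points, pad="."):
    """Translate positions in the padded sequence to positions in the raw sequence."""
    positions = []
    for point in anchor_points:
        if point > len(padded_sequence):
            raise ValueError(f"anchor point {point} lies outside the sequence")
        # Every padding symbol before the anchor shifts it left by one.
        positions.append(point - padded_sequence[:point].count(pad))
    return positions


print(parse_anchor_points("108:117:125:148:157"))
print(unpadded_positions("AC..GT.A", [2, 4, 7]))
```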
taxonId, locus and geneName.
I have a big FASTA file with the genomic sequence of a chromosome or a particular locus.
There is another file with a tab-delimited list of reference genes. Example segments.txt:
GeneName | Locus | GeneType | AnchorPoints |
---|---|---|---|
TRBV12-3 | TRB | V | 123341:123356:123387:123456 |
... | ... | ... | ... |
I can create a new loci library from this information or append it to an already existing one:
mixcr addReferenceLocus --taxonId 9615 --speciesCommonName dog,canis input.fasta segments.txt myLL.ll
Like this:
v1:::0:::56:93:102:::
where v1 is a version of reference points.
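A string in this form can be split into a version tag and a list of slots. The sketch below assumes that an empty field means the corresponding reference point is undefined; which slot corresponds to which anchor point is not stated here, so the code does not name them.

```python
def parse_reference_points(encoded):
    """Split 'v1:::0:::56:93:102:::' into ('v1', [None, None, 0, ...])."""
    version, *fields = encoded.split(":")
    # Assumption: an empty field marks an undefined reference point.
    points = [int(field) if field else None for field in fields]
    return version, points


version, points = parse_reference_points("v1:::0:::56:93:102:::")
print(version, points)
```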
Requires
Make Hotfix!
mixcr -Xmx2g -Xms1g align ....
This issue is connected to #42
PATH_TO_MIXCR_SCRIPT/reference/ for system-wide installation
~/.mixcr/reference/ for user-local installation
or ./reference
If it is installed as described above:
mixcr align --lociLibrary myLL ....
mixcr assemble ...
mixcr exportClones ...
I don't have to specify the loci library a second time in assemble, as the *.vdjca file already contains this information.
If I just have a file somewhere in the file system:
mixcr align --lociLibrary /path/to/myLL ....
mixcr assemble --lociLibrary /path/to/myLL ...
mixcr exportClones --lociLibrary /path/to/myLL ...
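The two usage modes (a bare library name looked up in known directories versus an explicit path) could be resolved along these lines. This is a sketch under assumptions: the `.ll` extension is appended when missing, a name containing a path separator is treated as an explicit path, and the search order is whatever the caller passes in; none of this is confirmed MiXCR behaviour.

```python
import os
from pathlib import Path


def resolve_loci_library(name, search_dirs):
    """Return the path of the loci library file, or None if it cannot be found."""
    file_name = name if name.endswith(".ll") else name + ".ll"
    if os.sep in name:
        # Explicit path: do not consult the search directories.
        candidate = Path(file_name)
        return candidate if candidate.is_file() else None
    for directory in search_dirs:
        candidate = Path(directory) / file_name
        if candidate.is_file():
            return candidate
    return None


# Hypothetical search order; the directories mirror the locations listed above.
dirs = [Path("./reference"), Path.home() / ".mixcr" / "reference"]
print(resolve_loci_library("myLL", dirs))
```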
default.ll will be used only if the user specified it on the align step; the internal mi.ll will be used if the --lociLibrary option is not used.
default.ll will be used by default.
CXX...XX[WF] mask for CDR3
* or _ in CDR3
V segment is not a pseudogene
Like:
CDR3(-3,+3)
as a shortcut for
{CDR3Begin(-3):CDR3End(+3)}
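The shortcut amounts to a textual rewrite, which can be sketched with a regular expression. Generalizing from CDR3 to any feature name that has Begin/End reference points is an assumption of this sketch, as is the function name.

```python
import re

# Matches FEATURE(left,right), e.g. CDR3(-3,+3).
_SHORTCUT = re.compile(r"^(\w+)\(([+-]?\d+),([+-]?\d+)\)$")


def expand_feature_shortcut(text):
    """Rewrite 'CDR3(-3,+3)' as '{CDR3Begin(-3):CDR3End(+3)}'; leave other text as-is."""
    match = _SHORTCUT.match(text.strip())
    if not match:
        return text
    feature, left, right = match.groups()
    return "{%sBegin(%s):%sEnd(%s)}" % (feature, left, feature, right)


print(expand_feature_shortcut("CDR3(-3,+3)"))
```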
Like this
--species {name}
instead of
--species
In a small percentage of cases in randomly sheared libraries, an alignment could gain additional total score from a J or C gene in the right part of a paired-end sequence.
--filter-out-of-frames and --filter-stops to export
-presetFile with preset-file in export
-listFields with list-fields in export
--save-reads in align
--index in assemble
mixcr -Xmx2g align ...