alexdobin / star



RNA-seq aligner

License: MIT License

TeX 3.20% Awk 0.65% C++ 32.34% C 62.77% Makefile 0.68% Roff 0.30% MATLAB 0.03% Shell 0.01% Dockerfile 0.02%

star's People

earonesty ndaniel bambooie q-kim godotgildor parolfe gvessere distue mmterpstra bosmont eyay biobp ryoshimo kbullaugheyga goohongzi lingrui thaocad cyang-2014 kellyquek twolfe emmaggie srw6v hmanoj felixschlesinger blellochr kisinghuagn abdul59 rsankowski mb84 wenliangz mint1234 boratonaj cinquin alexf101 idiv-biodiversity alejoba ddf555 nathanhaigh sung jasvinderkaur ctcncgr alexfinkel shengqh kkcheungus jcotignola oxphos changebio konrad barnowl1 yixuanashleyguo imalsoft9 xiuying thomsoncd arahuja justinjeyakani jdimeng wiesnert cooleel smoe glastonburyc ken0-1n vd4mmind coralzhang esterf reiverjohn ecwheele roeil wososa genomicsnx hbai521 alaindomissy dbolotin alenzhao roebidebruijn deepbiomed9 lingdudefeiteng chizhou-siti gsc0107 inambioinfo jchenpku dinvlad miachol sergeypry nml5566 abaghela bioxiao jiaolongsun dayedepps russellxie mxdoss tankmermaid noahpieta beneills zsl1024 mlebeur yixf-self guanminxiao gepellet warrenmcg tommycarstensen

star's Issues

Failure generating a very small index - core dumped

Hi,

I seem to have a problem that others have reported before, but in my case it happens with a very small file.
You can find the logs and working files here: https://gist.github.com/JRudewicz/1bca5f8e99dcb9f8afcb
Please let me know if you need anything else.

Thanks

Usage info needed when calling with --help

It would be great if STAR could provide some usage information when calling

STAR --help

The current behavior appears undefined. If a proper fix is too complicated, STAR could at least print a URL to the latest manual.

Seg fault in version 2.4.1d

I'm getting the following stack trace after STAR enters the mapping phase:

No	Address	Function	Text Location
#0	0x00000000004fe8a2	compareSeqToGenome	SuffixArrayFuns.cpp:27
#1	0x0000000003c3b9af	FailureSignalHandler	base/process_state.cc:1433
#2	0x00007fb15bf7933f		.../nptl/sysdeps/pthread/funlockfile.c:30
#3	0x00000000004fef64	maxMappableLength	SuffixArrayFuns.cpp:138
#4	0x00000000005171f2	ReadAlign::maxMappableLength2strands	ReadAlign_maxMappableLength2strands.cpp:82
#5	0x000000000050d8a4	ReadAlign::mapOneRead	ReadAlign_mapOneRead.cpp:60
#6	0x00000000005071d6	ReadAlign::oneRead	ReadAlign_oneRead.cpp:70
#7	0x0000000000505b50	ReadAlignChunk::mapChunk	ReadAlignChunk_mapChunk.cpp:25
#8	0x00000000004f77cc	ReadAlignChunk::processChunks	ReadAlignChunk_processChunks.cpp:144
#9	0x00000000004e580c	mapThreadsSpawn	mapThreadsSpawn.cpp:19
#10	0x00000000004a3625	main	STAR.cpp:256

uint compareSeqToGenome(char** s2, uint S, uint N, uint L, char* g, PackedArray& SA, uint iSA, bool dirR, bool& compRes, Parameters* P) {
    // compare s to g, find the maximum identity length
    // s2[0] read sequence; s2[1] complementary sequence
    // S position to start search from in s2[0],s2[1]
    // dirR forward or reverse direction search on read sequence

    register int64 ii;

    uint SAstr=SA[iSA];
    bool dirG = (SAstr>>P->GstrandBit) == 0; //forward or reverse strand of the genome
    SAstr &= P->GstrandMask;

    if (dirR && dirG) {//forward on read, forward on genome
        char* s  = s2[0] + S + L;
        g += SAstr + L;
        for (ii=0;(uint) ii < N-L; ii++)
        {
            if (s[ii]!=g[ii])   // <-- seg fault occurs here
            // ... (rest of the function not included in the report)

This code is kind of hard to follow - is it possible that ii could be out of bounds for s or g?

The command run was:

rna-star --genomeDir=Homo_sapiens.GRCh38.dna.primary_assembly.index.2 --outFileNamePrefix=output/ --outTmpDir=./_STARtmp --readFilesIn=fastq_reads/SRR062634.filt.fastq --runThreadN=1

(I was just testing it out on publicly available data).

There are a few local modifications in my source related to file handling, but I don't think any of them are relevant. I may also have deleted some commented-out code here and there, so the line numbers may not match exactly.

STAR-2.4.0d does not build on OS X

make STARforMac

ends with:

bamRemoveDuplicates.cpp:23:25: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
        for (int ii=0;ii<la;ii++) {
                      ~~^~~
bamRemoveDuplicates.cpp:74:41: error: arithmetic on a pointer to void
    uint32* pa2=(uint32*) *(uint32**) (a+8);
                                       ~^
bamRemoveDuplicates.cpp:77:41: error: arithmetic on a pointer to void
    uint32* pb2=(uint32*) *(uint32**) (b+8);
                                       ~^
bamRemoveDuplicates.cpp:106:18: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
        for (; ii<pa2[5]; ii+=2) {
               ~~^~~~~~~
2 warnings and 2 errors generated.
make: *** [bamRemoveDuplicates.o] Error 1

Makefile wrongly assumes presence of vim

Makefile:71: Depend.list: No such file or directory
xxd -i parametersDefault > parametersDefault.xxd
/bin/sh: xxd: command not found

Is this use of xxd necessary? What exactly is it doing?

STAR 2pass fails if directory not present

Hi Alex,

I'm quite impressed with STAR 2-pass: on one sample it found ~25% more splice sites than regular STAR.

One potential bug: 2-pass mode fails if the output directory has not been created in advance (even when --outFileNamePrefix is set).

It starts fine if I create the directory first (see step 2 below).

Version: STAR_2.4.0h

1

STAR --genomeDir MA/star/ --twopass1readsN 500000000 --sjdbOverhang 99 --outFileNamePrefix M-P --alignIntronMax 150000 --runThreadN 12 --outFilterIntronMotifs RemoveNoncanonical --readFilesIn ../../M-P_comb_R1.fastq ../../M-P_comb_R2.fastq

EXITING because of fatal ERROR: could not make pass1 directory: M-P/_STARpass1/
SOLUTION: please check the path and writing permissions

Dec 22 14:34:08 ...... FATAL ERROR, exiting

2

mkdir M-P

SyntaxError: Non-ASCII character '\xdb' in file

Never mind. It was a typo.

Failure generating hg19 index - core dumped

I seem to have a problem that others have reported before, but I could not fix it by lowering --genomeSAindexNbases and allowing 64 GB of my 98 GB of RAM.
Am I limited by my RAM here?

I also tried the h1 release, with no improvement.

Thanks

STAR --runThreadN 8 --genomeSAindexNbases 2 --limitGenomeGenerateRAM 64000000000 --runMode genomeGenerate --genomeDir /opt/biodata/reference/star --genomeFastaFiles /opt/biodata/reference/human/GRCh37.73.fa --sjdbGTFfile /opt/biodata/reference/human/GRCh37.73.gtf --sjdbOverhang 99
Dec 24 13:58:13 ..... Started STAR run
Dec 24 13:58:13 ... Starting to generate Genome files
Dec 24 13:58:26 ... Starting GTF processing
Segmentation fault (core dumped)

$ head -4 /opt/biodata/reference/human/GRCh37.73.fa

1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 REF
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

$ head -4 /opt/biodata/reference/human/GRCh37.73.gtf

1 processed_transcript exon 11869 12227 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; gene_name "DDX11L1"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; exon_id "ENSE00002234944";
1 processed_transcript exon 12613 12721 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; gene_name "DDX11L1"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; exon_id "ENSE00003582793";
1 processed_transcript exon 13221 14409 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; gene_name "DDX11L1"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; exon_id "ENSE00002312635";
1 unprocessed_pseudogene exon 11872 12227 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000515242"; exon_number "1"; gene_name "DDX11L1"; gene_biotype "pseudogene"; transcript_name "DDX11L1-201"; exon_id "ENSE00002234632";

genomeGenerate aborted with std::bad_alloc

Hi Alex,

I am setting up STAR and getting this failure during genomeGenerate. I've tried twice, the second time adding --limitGenomeGenerateRAM 30000000000, but this doesn't seem to help. Both runs failed after writing ~47 GB to the genomeDir, and STAR seemed to use ~19 GB of RAM most of the time (though I'm not sure exactly how much it used just before the crash).

ubuntu@master:~/bcbio_datadir/genomes/Hsapiens/GRCh37/star$
ubuntu@master:~/bcbio_datadir/genomes/Hsapiens/GRCh37/star$ STAR --genomeDir /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star --genomeFastaFiles /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa --runMode genomeGenerate --sjdbOverhang 99 --sjdbGTFfile /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../rnaseq/ref-transcripts.gtf --genomeSAindexNbases 14 --runThreadN 16 --limitGenomeGenerateRAM 30000000000
Mar 20 16:39:32 ..... Started STAR run
Mar 20 16:39:32 ... Starting to generate Genome files
Mar 20 16:40:58 ... finished processing splice junctions database ...
Mar 20 16:41:17 ... starting to sort  Suffix Array. This may take a long time...
Mar 20 16:41:37 ... sorting Suffix Array chunks and saving them to disk...
Mar 20 16:54:25 ... loading chunks from disk, packing SA...
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)
ubuntu@master:~/bcbio_datadir/genomes/Hsapiens/GRCh37/star$ echo `STAR --version`
STAR_2.4.0e
ubuntu@master:~/bcbio_datadir/genomes/Hsapiens/GRCh37/star$

Any help resolving this would be much appreciated!

Thanks,
Roy

Some extracts from the log:

STAR version=STAR_2.4.0e
STAR compilation time,server,dir=Fri Oct 24 10:43:53 EDT 2014 verona.cshl.edu:/sonas-hs/gingeras/nlsas_norepl/user/dobin/STAR/STAR.sandbox/source
##### Command Line:
STAR --genomeDir /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star --genomeFastaFiles /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa --runMode genomeGenerate --sjdbOverhang 99 --sjdbGTFfile /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../rnaseq/ref-transcripts.gtf --genomeSAindexNbases 14 --runThreadN 16 --limitGenomeGenerateRAM 30000000000
##### Initial USER parameters from Command Line:
###### All USER parameters from Command Line:
genomeDir                     /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star     ~RE-DEFINED
genomeFastaFiles              /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa        ~RE-DEFINED
runMode                       genomeGenerate     ~RE-DEFINED
sjdbOverhang                  99     ~RE-DEFINED
sjdbGTFfile                   /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../rnaseq/ref-transcripts.gtf     ~RE-DEFINED
genomeSAindexNbases           14     ~RE-DEFINED
runThreadN                    16     ~RE-DEFINED
limitGenomeGenerateRAM        30000000000     ~RE-DEFINED
##### Finished reading parameters from all sources

##### Final user re-defined parameters-----------------:
runMode                           genomeGenerate
runThreadN                        16
genomeDir                         /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star
genomeFastaFiles                  /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa   
genomeSAindexNbases               14
limitGenomeGenerateRAM            30000000000
sjdbGTFfile                       /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../rnaseq/ref-transcripts.gtf
sjdbOverhang                      99

-------------------------------
##### Final effective command line:
STAR   --runMode genomeGenerate   --runThreadN 16   --genomeDir /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star   --genomeFastaFiles /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa      --genomeSAindexNbases 14   --limitGenomeGenerateRAM 30000000000   --sjdbGTFfile /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../rnaseq/ref-transcripts.gtf   --sjdbOverhang 99
Finished loading and checking parameters
Mar 20 16:39:32 ... Starting to generate Genome files
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 0  "1" chrStart: 0
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 1  "2" chrStart: 249298944
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 2  "3" chrStart: 492568576
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 3  "4" chrStart: 690749440
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 4  "5" chrStart: 882114560
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 5  "6" chrStart: 1063256064
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 6  "7" chrStart: 1234436096
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 7  "8" chrStart: 1393819648
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 8  "9" chrStart: 1540358144
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 9  "10" chrStart: 1681653760
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 10  "11" chrStart: 1817444352
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 11  "12" chrStart: 1952710656
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 12  "13" chrStart: 2086666240
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 13  "14" chrStart: 2202009600
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 14  "15" chrStart: 2309488640
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 15  "16" chrStart: 2412249088
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 16  "17" chrStart: 2502688768
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 17  "18" chrStart: 2583953408
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 18  "19" chrStart: 2662072320
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 19  "20" chrStart: 2721316864
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 20  "21" chrStart: 2784493568
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 21  "22" chrStart: 2832728064
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 22  "X" chrStart: 2884108288
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 23  "Y" chrStart: 3039559680
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 24  "MT" chrStart: 3099066368
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 25  "GL000207.1" chrStart: 3099328512
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 26  "GL000226.1" chrStart: 3099590656
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 27  "GL000229.1" chrStart: 3099852800
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 28  "GL000231.1" chrStart: 3100114944
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 29  "GL000210.1" chrStart: 3100377088
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 30  "GL000239.1" chrStart: 3100639232
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 31  "GL000235.1" chrStart: 3100901376
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 32  "GL000201.1" chrStart: 3101163520
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 33  "GL000247.1" chrStart: 3101425664
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 34  "GL000245.1" chrStart: 3101687808
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 35  "GL000197.1" chrStart: 3101949952
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 36  "GL000203.1" chrStart: 3102212096
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 37  "GL000246.1" chrStart: 3102474240
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 38  "GL000249.1" chrStart: 3102736384
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 39  "GL000196.1" chrStart: 3102998528
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 40  "GL000248.1" chrStart: 3103260672
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 41  "GL000244.1" chrStart: 3103522816
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 42  "GL000238.1" chrStart: 3103784960
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 43  "GL000202.1" chrStart: 3104047104
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 44  "GL000234.1" chrStart: 3104309248
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 45  "GL000232.1" chrStart: 3104571392
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 46  "GL000206.1" chrStart: 3104833536
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 47  "GL000240.1" chrStart: 3105095680
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 48  "GL000236.1" chrStart: 3105357824
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 49  "GL000241.1" chrStart: 3105619968
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 50  "GL000243.1" chrStart: 3105882112
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 51  "GL000242.1" chrStart: 3106144256
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 52  "GL000230.1" chrStart: 3106406400
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 53  "GL000237.1" chrStart: 3106668544
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 54  "GL000233.1" chrStart: 3106930688
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 55  "GL000204.1" chrStart: 3107192832
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 56  "GL000198.1" chrStart: 3107454976
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 57  "GL000208.1" chrStart: 3107717120
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 58  "GL000191.1" chrStart: 3107979264
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 59  "GL000227.1" chrStart: 3108241408
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 60  "GL000228.1" chrStart: 3108503552
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 61  "GL000214.1" chrStart: 3108765696
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 62  "GL000221.1" chrStart: 3109027840
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 63  "GL000209.1" chrStart: 3109289984
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 64  "GL000218.1" chrStart: 3109552128
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 65  "GL000220.1" chrStart: 3109814272
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 66  "GL000213.1" chrStart: 3110076416
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 67  "GL000211.1" chrStart: 3110338560
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 68  "GL000199.1" chrStart: 3110600704
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 69  "GL000217.1" chrStart: 3110862848
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 70  "GL000216.1" chrStart: 3111124992
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 71  "GL000215.1" chrStart: 3111387136
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 72  "GL000205.1" chrStart: 3111649280
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 73  "GL000219.1" chrStart: 3111911424
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 74  "GL000224.1" chrStart: 3112173568
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 75  "GL000223.1" chrStart: 3112435712
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 76  "GL000195.1" chrStart: 3112697856
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 77  "GL000212.1" chrStart: 3112960000
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 78  "GL000222.1" chrStart: 3113222144
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 79  "GL000200.1" chrStart: 3113484288
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 80  "GL000193.1" chrStart: 3113746432
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 81  "GL000194.1" chrStart: 3114008576
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 82  "GL000225.1" chrStart: 3114270720
/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/GRCh37.fa : chr # 83  "GL000192.1" chrStart: 3114532864
Processing sjdbGTFfile=/home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../rnaseq/ref-transcripts.gtf, found:
                196501 transcripts
                1195764 exons (non-collapsed)
                344569 collapsed junctions
Mar 20 16:40:58 ... finished processing splice junctions database ...
Writing genome to disk...Writing 3183874000 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/Genome ; empty space on disk = 166107160576 bytes ... done
 done.
Number of SA indices: 5865990696
SA size in bytes: 24197211622
Mar 20 16:41:17 ... starting to sort  Suffix Array. This may take a long time...
Number of chunks: 62;   chunks size limit: 886207608 bytes
Mar 20 16:41:37 ... sorting Suffix Array chunks and saving them to disk...
Writing 599357856 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_1 ; empty space on disk = 162923352064 bytes ... done
Writing 577153280 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_13 ; empty space on disk = 162322223104 bytes ...Writing 619007584 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_4 ; empty space on disk = 162035646464 bytes ... done
 done
Writing 634268672 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_14 ; empty space on disk = 161122529280 bytes ...Writing 716890648 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_0 ; empty space on disk = 160770732032 bytes ... done
 done
Writing 714483232 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_3 ; empty space on disk = 159772667904 bytes ... done
Writing 694834640 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_11 ; empty space on disk = 159056072704 bytes ... done
Writing 752341344 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_2 ; empty space on disk = 158359187456 bytes ...Writing 722691600 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_10 ; empty space on disk = 157626068992 bytes ... done
Writing 717814040 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_15 ; empty space on disk = 157309399040 bytes ...Writing 773328688 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_9 ; empty space on disk = 157141749760 bytes ...Writing 794598600 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_6 ; empty space on disk = 156506316800 bytes ...Writing 750742752 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_12 ; empty space on disk = 155990577152 bytes ... done
Writing 810151080 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_5 ; empty space on disk = 155494641664 bytes ...Writing 829488920 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_8 ; empty space on disk = 155056332800 bytes ...Writing 858391408 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_7 ; empty space on disk = 154779631616 bytes ... done
 done
 done
 done
 done
 done
 done
Writing 741898600 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_17 ; empty space on disk = 151357706240 bytes ... done
Writing 667114832 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_18 ; empty space on disk = 150613618688 bytes ... done
Writing 706991272 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_20 ; empty space on disk = 149946707968 bytes ... done
Writing 840425976 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_16 ; empty space on disk = 149237633024 bytes ... done
Writing 886122488 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_19 ; empty space on disk = 148401233920 bytes ... done
Writing 793602056 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_28 ; empty space on disk = 147515113472 bytes ... done
Writing 723365672 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_25 ; empty space on disk = 146719170560 bytes ...Writing 674447488 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_23 ; empty space on disk = 146449977344 bytes ... done
 done
Writing 711395888 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_29 ; empty space on disk = 145317228544 bytes ...Writing 700092856 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_27 ; empty space on disk = 144836620288 bytes ... done
Writing 751083424 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_21 ; empty space on disk = 144228253696 bytes ... done
 done
Writing 792413336 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_22 ; empty space on disk = 143148273664 bytes ... done
Writing 861232896 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_26 ; empty space on disk = 142359969792 bytes ... done
Writing 826416264 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_31 ; empty space on disk = 141496197120 bytes ...Writing 844898024 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_30 ; empty space on disk = 140956876800 bytes ...Writing 863024328 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_24 ; empty space on disk = 140389912576 bytes ... done
 done
 done
Writing 745171728 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_32 ; empty space on disk = 138973077504 bytes ... done
Writing 772097912 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_33 ; empty space on disk = 138225704960 bytes ... done
Writing 789894584 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_34 ; empty space on disk = 137455796224 bytes ... done
Writing 757973088 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_36 ; empty space on disk = 136665890816 bytes ... done
Writing 547743480 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_43 ; empty space on disk = 135907909632 bytes ... done
Writing 762137728 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_38 ; empty space on disk = 135358545920 bytes ... done
Writing 682276800 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_47 ; empty space on disk = 134594158592 bytes ...Writing 870757576 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_35 ; empty space on disk = 134004498432 bytes ... done
Writing 737698816 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_37 ; empty space on disk = 133276700672 bytes ...Writing 688834280 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_44 ; empty space on disk = 132982648832 bytes ... done
 done
 done
Writing 812803624 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_39 ; empty space on disk = 131607408640 bytes ... done
Writing 727867312 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_41 ; empty space on disk = 130794450944 bytes ... done
Writing 804211744 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_45 ; empty space on disk = 130066444288 bytes ...Writing 796289520 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_40 ; empty space on disk = 129299873792 bytes ... done
Writing 842436704 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_42 ; empty space on disk = 128590712832 bytes ... done
 done
Writing 872813384 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_46 ; empty space on disk = 127623057408 bytes ... done
Writing 674809448 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_48 ; empty space on disk = 126761959424 bytes ... done
Writing 711654952 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_49 ; empty space on disk = 126085156864 bytes ... done
Writing 823461280 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_50 ; empty space on disk = 125375483904 bytes ... done
Writing 558710384 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_57 ; empty space on disk = 124549591040 bytes ... done
Writing 718695328 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_61 ; empty space on disk = 123993296896 bytes ... done
Writing 841699200 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_54 ; empty space on disk = 123272478720 bytes ... done
Writing 857675400 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_52 ; empty space on disk = 122428297216 bytes ... done
Writing 686570256 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_60 ; empty space on disk = 121568079872 bytes ... done
Writing 655735720 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_56 ; empty space on disk = 120879484928 bytes ... done
Writing 869190152 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_53 ; empty space on disk = 120226410496 bytes ... done
Writing 855636920 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_55 ; empty space on disk = 119357186048 bytes ... done
Writing 785017520 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_58 ; empty space on disk = 118499024896 bytes ...Writing 879050080 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_51 ; empty space on disk = 117840498688 bytes ... done
Writing 848940904 bytes into /home/ubuntu/bcbio_datadir/genomes/Hsapiens/GRCh37/seq/../star/SA_59 ; empty space on disk = 116834627584 bytes ... done
 done
Mar 20 16:54:25 ... loading chunks from disk, packing SA...

Trimming of fastq/fasta names after "/"

I noticed that FASTQ/FASTA read names containing a "/" character are trimmed. For example, Name/1/2/3 is trimmed to Name. Is there a reason for this?
I have a FASTQ file in which each read's identifier comes after a "/" in the name, so after STAR alignment all alignments have the same name.
For example, these input read names:

Name/1
Name/2
Name/3
...

After alignment, the SAM file contains the following QNAMEs:

Name
Name
Name
...

STAR 2.4.0h OSX version reports wrong version number

For STAR 2.4.0h, the reported version ends in "f" rather than "h". Is it actually f? :)

Segmentation fault running STAR 2.4.1d Linux_x86_64

LSF Options:

bsub -n 12 -q normal -R span[hosts=1] -R rusage[mem=40000] -o lsf.out -e lsf.err

Command Line:

STAR --runThreadN 12 --outSAMtype BAM Unsorted SortedByCoordinate --quantMode TranscriptomeSAM GeneCounts --twopassMode Basic --genomeDir GRCh38.79.chrom1/ --readFilesIn fastq/SRR1039508_1.fastq fastq/SRR1039508_2.fastq

lsf.err File:

/gpfs/gpfs1/home/cs/.lsbatch/1432850412.956154: line 8: 14626 Segmentation fault (core dumped) STAR --runThreadN 12 --outSAMtype BAM Unsorted SortedByCoordinate --quantMode TranscriptomeSAM GeneCounts --twopassMode Basic --genomeDir GRCh38.79.chrom1/ --readFilesIn fastq/SRR1039508_1.fastq fastq/SRR1039508_2.fastq

lsf.out File:

Exited with exit code 139.

Resource usage summary:

CPU time :               22061.85 sec.
Max Memory :             8534 MB
Average Memory :         5596.50 MB
Total Requested Memory : 40000.00 MB
Delta Memory :           31466.00 MB
(Delta: the difference between total requested memory and actual max usage.)
Max Swap :               11843 MB

Max Processes :          3
Max Threads :            26

The output (if any) follows:

May 29 03:02:16 ..... Started STAR run
May 29 03:02:17 ..... Loading genome
May 29 03:02:50 ..... Started 1st pass mapping
May 29 03:35:28 ..... Finished 1st pass mapping
May 29 03:35:29 ..... Inserting junctions into the genome indices
May 29 03:35:49 ..... Started mapping

Alignment problem with STARlong

There is an alignment error using STARlong (STAR runs fine) on the following sequences. I initially tried version 2.4.1c, but the problem persists with the latest commit.

@A
ATGGCTACCTCTCGATATGAGCCAGTGGCTGAAATTGGTGTCGGTGCCTATGGGACAGTGTACAAGGCCCGTGATCCCCACAGTGGCCACTTTGTGGCCCTCAAGAGTGTGAGAGTCCTCCTCCATCTTTCTACAGAGATTACTTTGCTGCCTTAATGACATTCCCCTCCCACCTCTCCTTTTGAGGCTTCTCCTTCTCCTTCCCATTTCTCTACACTAAGGGGTATGTTCCCTCTTGTCCCTTTCCCTACCTTTATATTTGGGGTCCTTTTTTATACAGGAAAAAAAAAAAAAG
+
KKKKKKKKKKKKKKKKKKKKKKKK$%KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK))KK&$KKK@KKKK&KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK)KKKKKKKKKKKKKKKKK?KKKKKKKKKKKKKKKKKKKKKKGKKKKKKKKKKGKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
@B
ATGGCTACCTCTCGATATGAGCCAGTGGCTGAAATTGGTGTCGGTGCCTATGGGACAGTGTACAAGGCCCGTGATCCCCACAGTGGCCACTTTGTGGCCCTCAAGAGTGTGAGAGTCCTCCTCCATCTTTCTACAGAGATTACTTTGCTGCCTTAATGACATTCCCCTCCCACCTCTCCTTTTGAGGCTTCTCCTTCTCCTTCCCATTTCTCTACACTAAGGGGTATGTTCCCTCTTGTCCCTTTCCCTACCTTTATATTTGGGGTCCTTTTTTATACAGGAAAAAAAAAAAAA
+
KKKKKKKKKKKKKKKKKKKKKKKK$%KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK))KK&$KKK@KKKK&KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK)KKKKKKKKKKKKKKKKK?KKKKKKKKKKKKKKKKKKKKKKGKKKKKKKKKKGKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK

Note that I originally found the problem with sequence A. While trying to pin it down, I manually created sequence B, which is exactly the same sequence except that it lacks the last nucleotide. Also note that both sequences align to the (-) strand of the hg19 assembly. The index was annotated with junctions from a GTF file.

The problem is that the CIGAR string for sequence A has an S at the beginning and at the end, while the CIGAR for sequence B looks fine.
A: 1S176M3175N117M1S
B: 176M3175N118M

The S at the beginning is probably correct, but the S at the end is wrong. Because both reads align to the minus strand (FLAG 16), the CIGAR and SEQ are reported in reference orientation: the leading 1S is the extra final G of sequence A, while the trailing 1S is the adenine at the beginning of the read (ATG: start codon), which is part of the reference genome.

Here is the command I used and the output I get:

STARlong --outFilterMultimapScoreRange 0 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0.66 --outFilterMismatchNmax 1000 --winAnchorMultimapNmax 200 --seedSearchLmax 30 --seedSearchStartLmax 12 --seedPerReadNmax 100000 --seedPerWindowNmax 100 --alignTranscriptsPerReadNmax 100000 --alignTranscriptsPerWindowNmax 10000 --alignSJDBoverhangMin 1 --genomeDir ../../../data/hg19/index/sparsed2/ --readFilesIn foo.fastq --runThreadN 1 --outSAMattributes All

A       16      chr12   58142032        255     1S176M3175N117M1S       *       0       0       CTTTTTTTTTTTTTCCTGTATAAAAAAGGACCCCAAATATAAAGGTAGGGAAAGGGACAAGAGGGAACATACCCCTTAGTGTAGAGAAATGGGAAGGAGAAGGAGAAGCCTCAAAAGGAGAGGTGGGAGGGGAATGTCATTAAGGCAGCAAAGTAATCTCTGTAGAAAGATGGAGGAGGACTCTCACACTCTTGAGGGCCACAAAGTGGCCACTGTGGGGATCACGGGCCTTGTACACTGTCCCATAGGCACCGACACCAATTTCAGCCACTGGCTCATATCGAGAGGTAGCCAT KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKGKKKKKKKKKKGKKKKKKKKKKKKKKKKKKKKKK?KKKKKKKKKKKKKKKKK)KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK&KKKK@KKK$&KK))KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK%$KKKKKKKKKKKKKKKKKKKKKKKK NH:i:1  HI:i:1  AS:i:278        nM:i:2  NM:i:2  MD:Z:2G4G285    jM:B:c,0        jI:B:i,58142208,58145382
B       16      chr12   58142032        255     176M3175N118M   *       0       0       TTTTTTTTTTTTTCCTGTATAAAAAAGGACCCCAAATATAAAGGTAGGGAAAGGGACAAGAGGGAACATACCCCTTAGTGTAGAGAAATGGGAAGGAGAAGGAGAAGCCTCAAAAGGAGAGGTGGGAGGGGAATGTCATTAAGGCAGCAAAGTAATCTCTGTAGAAAGATGGAGGAGGACTCTCACACTCTTGAGGGCCACAAAGTGGCCACTGTGGGGATCACGGGCCTTGTACACTGTCCCATAGGCACCGACACCAATTTCAGCCACTGGCTCATATCGAGAGGTAGCCAT  KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKGKKKKKKKKKKGKKKKKKKKKKKKKKKKKKKKKK?KKKKKKKKKKKKKKKKK)KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK&KKKK@KKK$&KK))KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK%$KKKKKKKKKKKKKKKKKKKKKKKK  NH:i:1  HI:i:1  AS:i:279        nM:i:2  NM:i:2  MD:Z:2G4G286    jM:B:c,0        jI:B:i,58142208,58145382
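To make the comparison above concrete, here is a small Python sketch that tallies a CIGAR string using the SAM specification's rules for which operations consume read bases (M, I, S, =, X) and which consume reference bases (M, D, N, =, X). The function names are illustrative and not part of STAR.

```python
import re

# One CIGAR operation: a run length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Per the SAM specification: operations that consume read bases / reference bases.
CONSUMES_QUERY = frozenset("MIS=X")
CONSUMES_REF = frozenset("MDN=X")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, op) pairs, rejecting malformed input."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def query_length(cigar: str) -> int:
    """Read bases covered by the CIGAR, soft clips included."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_QUERY)


def reference_span(cigar: str) -> int:
    """Reference bases from POS to the last aligned base, introns (N) included."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REF)


def soft_clips(cigar: str) -> tuple[int, int]:
    """Soft-clipped bases at the left and right ends (assumes no hard clips)."""
    ops = parse_cigar(cigar)
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if ops and ops[-1][1] == "S" else 0
    return left, right


if __name__ == "__main__":
    for name, cigar in [("A", "1S176M3175N117M1S"), ("B", "176M3175N118M")]:
        print(name, query_length(cigar), reference_span(cigar), soft_clips(cigar))
```

For A this gives 295 read bases and a 3468-base reference span with one soft-clipped base at each end; for B, 294 read bases and a 3469-base span with no clipping. Both alignments start at the same POS, so A leaves unaligned the rightmost reference base that B covers, which is the read's first base because the read maps to the minus strand.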

Let me know if you need more info.

"Segmentation fault (core dumped)" error when performing multi-sample 2-pass mapping

I'm trying to perform 2-pass mapping on some canine RNA-seq libraries (STAR v2.4.1d). When I tested with one sample, everything worked fine. However, now that I'm extending to the entire cohort (29 samples), I'm running into "Segmentation fault (core dumped)" errors during the second pass. Here's an example command:

/extscratch/morinlab/projects/canine_lymphoma/software/star-2.4.1d/bin/Linux_x86_64_static/STAR \
    --readFilesCommand zcat \
    --genomeDir /extscratch/morinlab/reference/igenomes/Canis_familiaris/UCSC/canFam3/Sequence/STARIndex-2.4.1d/ \
    --outFileNamePrefix star_second_pass/ccb030084-0100_ \
    --sjdbFileChrStartEnd /extscratch/morinlab/projects/canine_lymphoma/results/star_pipeline-1.0.0/morin_cohort/2-concatenate_splice_junctions/all_samples_SJ.out.sort.tab \
    --outSAMtype BAM SortedByCoordinate \
    --runThreadN 5 \
    --sjdbOverhang 75 \
    --readFilesIn /extscratch/morinlab/projects/canine_lymphoma/data/fastq_rnaseq/morin_cohort/ccb030084-0100_R1.fastq.gz /extscratch/morinlab/projects/canine_lymphoma/data/fastq_rnaseq/morin_cohort/ccb030084-0100_R2.fastq.gz

At first, I thought it was caused by the presence of more than one SJ.out.tab file in the command, despite the documentation saying that this is supported. When I re-ran the command passing only the SJ.out.tab file from one sample, it worked.

Strangely though, when I concatenated the SJ.out.tab files from all samples (using the command below to ensure no duplicates), it started segfaulting again. I even ensured that the order of the splice junctions matched the order of the chromosomes in the reference genome (using a separate command), but that didn't help.

cat ../2015-06-09_05-23-27/*_star_pipeline/outputs/star_first_pass/*_SJ.out.tab | cut -f1-4 | sort -k1,1 -k2,2n -k3,3n | uniq

Do you have any ideas as to why passing a SJ.out.tab file from one sample works, but passing a concatenated SJ.out.tab file from multiple samples ends up failing? Thanks!
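For reference, here is a Python sketch of the same merge the shell pipeline performs: take the first four SJ.out.tab columns (chromosome, intron start, intron end, strand), drop duplicates, and sort by chromosome name and then numerically by start and end. The start-versus-end check is an added, hypothetical sanity test, not a known cause of the crash, and the helper names are illustrative.

```python
import csv
import io
from typing import Iterable, TextIO

Junction = tuple[str, int, int, str]


def merge_sj_tabs(handles: Iterable[TextIO]) -> list[Junction]:
    """Union of junctions (first 4 columns) across several SJ.out.tab files, sorted."""
    junctions: set[Junction] = set()
    for handle in handles:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # tolerate blank lines
            chrom, start, end, strand = row[0], int(row[1]), int(row[2]), row[3]
            if start > end:
                # Hypothetical sanity check: an intron cannot end before it starts.
                raise ValueError(f"malformed junction: {row[:4]}")
            junctions.add((chrom, start, end, strand))
    # Chromosome by string order, then start and end numerically, strand breaking ties;
    # for ASCII names this matches `sort -k1,1 -k2,2n -k3,3n` in the C locale.
    return sorted(junctions)


def write_sjdb(junctions: Iterable[Junction], out: TextIO) -> None:
    """Write junctions as 4 tab-separated columns, one per line."""
    for chrom, start, end, strand in junctions:
        out.write(f"{chrom}\t{start}\t{end}\t{strand}\n")


if __name__ == "__main__":
    # Two tiny in-memory SJ.out.tab files that share one junction.
    a = io.StringIO("chr1\t100\t200\t1\t1\t1\t5\t0\t30\n")
    b = io.StringIO("chr1\t100\t200\t1\t1\t1\t7\t1\t40\nchr1\t10\t20\t2\t2\t0\t1\t0\t10\n")
    out = io.StringIO()
    write_sjdb(merge_sj_tabs([a, b]), out)
    print(out.getvalue(), end="")  # the 10-20 junction first, then 100-200
```

Python compares chromosome names by code point, which can differ from `sort` under a non-C locale and from the chromosome order of the genome.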

STAR speed slowing down during run

Hi, I am trying to run a STAR job, but it is taking an unusually long time. The mapping speed has declined drastically over time (50M unstranded paired-end reads, mouse genome mm10; it has been running for 3 days). I am using 7 threads on a desktop computer with 32 GB RAM running Xubuntu, so not really a dedicated server.
I see in the closed issues that there was a similar case, but there the error came from an external memory leak. Here, I launch STAR inside a bash shell script function, so no third-party software is causing a leak.
I have been reading the manual for the previous version of STAR (2.3.0.1): it has a paragraph on the genomeLoad parameter that is not in the new manual. I was thinking of trying to load the genome into shared memory to see how it goes, after setting the kernel shmmax and shmall parameters.
I have pasted the log files below. The genome index was built successfully.
Thanks a lot for your help!
Cyril Cros

http://pastebin.com/dfDpt8wR <- Log file
http://pastebin.com/2LzzPCid <- Log progress file

EDIT1: I am running several jobs with genomeLoad=LoadAndKeep and unsorted BAM output (to avoid reserving shared memory for BAM sorting). The first job finished successfully in only 15 min; all went well. I suggest adding the paragraph on the genomeLoad option to the new version of the manual.
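Before loading a genome into shared memory, it can help to check the two kernel limits mentioned above. On Linux, kernel.shmmax is the largest single segment in bytes, while kernel.shmall is the system-wide total in pages, so the two use different units. This Python sketch assumes a 4096-byte page and that no other shared memory segments are in use; `segment_bytes` must be your own estimate of the genome's in-memory size, which the code does not compute. For example, the settings kernel.shmmax=32000000000 and kernel.shmall=8000000 used in another issue on this page allow a single 30 GB segment, since 8000000 pages of 4096 bytes is 32768000000 bytes.

```python
import os


def shm_limits_ok(segment_bytes: int, shmmax: int, shmall_pages: int,
                  page_size: int = 4096) -> bool:
    """True if one segment of segment_bytes fits under both kernel limits.

    shmmax: maximum size of a single segment, in bytes (kernel.shmmax).
    shmall_pages: system-wide shared memory total, in pages (kernel.shmall).
    Assumes no other shared memory segments already use the shmall budget.
    """
    pages_needed = -(-segment_bytes // page_size)  # ceiling division
    return segment_bytes <= shmmax and pages_needed <= shmall_pages


def read_linux_limits() -> tuple[int, int]:
    """Read (shmmax, shmall) from /proc on Linux."""
    with open("/proc/sys/kernel/shmmax") as f:
        shmmax = int(f.read().split()[0])
    with open("/proc/sys/kernel/shmall") as f:
        shmall = int(f.read().split()[0])
    return shmmax, shmall


if __name__ == "__main__" and os.path.exists("/proc/sys/kernel/shmmax"):
    shmmax, shmall = read_linux_limits()
    page = os.sysconf("SC_PAGE_SIZE")
    # 30e9 bytes is a placeholder; substitute your own genome size estimate.
    print(shm_limits_ok(30_000_000_000, shmmax, shmall, page))
```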

Manual: Recommends conflicting advice (do not include Haps/Patches; Do use Gencode)

Briefly, the manual recommends in section 2.2.1 that patches and alternative haplotypes not be included. However, in section 2.2.2, GENCODE is recommended for mouse and human. While GENCODE is a superb annotation, the manual further indicates that GENCODE is advantageous because it includes a matching GTF and FASTA. However, the GENCODE FASTA file also includes patches and haplotypes. Thus, these pieces of advice are in conflict.

output log files only option

Hi - for performance reasons I am running STAR from a specific disk, with the output going to that disk as well (by default, per the manual, output goes to the directory the program was launched from, i.e., the CWD). Is it possible to redirect only the log files to a central location? This would help immensely in a clustered configuration. Apologies if there is an easy answer to this; maybe a symlink...

STAR-STAR_2.4.0f1 crashes with zero length reads

EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=@M01750:160:000000000-ABY8Y:1:1103:6900:8107
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=500

Nov 14 21:03:24 ...... FATAL ERROR, exiting

The read is actually 0 bases long. STAR_2.3.0e handles this gracefully; it would be nice if future versions of STAR did as well.
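Until such reads are handled, one workaround is to drop empty records before alignment. A minimal Python sketch, assuming plain four-line FASTQ records (no wrapped sequence lines); `drop_short_reads` is a hypothetical helper, and its `min_len` threshold defaults to 1 so that only zero-length reads are removed.

```python
import io
from typing import Iterator, TextIO

FastqRecord = tuple[str, str, str, str]


def fastq_records(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield (header, sequence, separator, quality) from 4-line FASTQ input."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, sep, qual


def drop_short_reads(handle: TextIO, out: TextIO, min_len: int = 1) -> int:
    """Copy records whose sequence has at least min_len bases; return the number dropped."""
    dropped = 0
    for header, seq, sep, qual in fastq_records(handle):
        if len(seq) < min_len:
            dropped += 1
            continue
        out.write(f"{header}\n{seq}\n{sep}\n{qual}\n")
    return dropped


if __name__ == "__main__":
    src = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\n\n+\n\n")
    dst = io.StringIO()
    print(drop_short_reads(src, dst))  # 1: the empty record @r2 is dropped
```

For paired-end data, filtering each file independently would desynchronize the mates, so both files would need to be filtered together, dropping a pair whenever either mate is empty.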

segmentation fault mapping against Danio rerio Zv9

I've created a STAR index of the zebrafish reference genome (Danio rerio Zv9), available at ftp://ftp.ensembl.org/pub/release-75/fasta/danio_rerio/dna/Danio_rerio.Zv9.75.dna_sm.toplevel.fa.gz . When mapping this FASTA sequence against it with STAR, I get a segmentation fault. When I remove the last or the first line of the sequence in the FASTA file, STAR finishes successfully. It's also possible to remove a line in the middle such that the segfault is not triggered, so I guess it's not the sequence itself that leads to the problem.

I observed the problem originally with STAR 2.3.1o and could reproduce it with the version from git master. I used this command to map the sequence:

./STAR --genomeDir reference/Danio_rerio.Zv9.75.star/ --readFilesIn segfault.fasta

Use library information when outputting XS tag

Currently, STAR only uses intron motif information to determine the strand output in the XS tag. Other aligners (TopHat, HISAT) allow library information to be explicitly specified. When aligning reads from a stranded library (fr-firststrand), the XS tags are highly discordant between STAR and HISAT. This is not an issue when using Cufflinks for assembly (since it allows the library type to be specified), but the new StringTie assembler does not have a way to specify the library type, and relies only on the strand reported in the XS tag.
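For reference, the transcript strand implied by a stranded library can be derived from the SAM FLAG alone. This Python sketch is a hypothetical post-processing helper, not a STAR option. It follows the TopHat library-type names mentioned above: in fr-firststrand libraries read 1 is the reverse complement of the transcript, and in fr-secondstrand libraries read 1 matches the transcript.

```python
# SAM FLAG bits (SAM specification).
PAIRED = 0x1
REVERSE = 0x10
READ2 = 0x80


def xs_from_library(flag: int, library: str) -> str:
    """Transcript strand ('+' or '-') implied by alignment strand and library type."""
    if library not in ("fr-firststrand", "fr-secondstrand"):
        raise ValueError(f"unsupported library type: {library}")
    on_reverse = bool(flag & REVERSE)
    # Mate 2 comes from the opposite strand to mate 1, so flip it to
    # get the strand a read-1 alignment would have had.
    if flag & PAIRED and flag & READ2:
        on_reverse = not on_reverse
    if library == "fr-firststrand":
        # Read 1 is antisense: reverse-strand read 1 means a '+' transcript.
        return "+" if on_reverse else "-"
    # fr-secondstrand: read 1 is sense.
    return "-" if on_reverse else "+"


if __name__ == "__main__":
    # FLAG 81 = paired, reverse, read 1; FLAG 161 = paired, mate reverse, read 2.
    print(xs_from_library(81, "fr-firststrand"), xs_from_library(161, "fr-firststrand"))
```

A tag computed this way reflects the library protocol rather than the intron motif, so for spliced reads it could be cross-checked against STAR's motif-based XS.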

Clarification needed on reporting of chimeric junctions.

Dear Alex,

I recently came across this post (https://www.biostars.org/p/121737/#121779), and it appears that I am also confused about which junctions end up in the Chimeric.out.junction file.
The documentation says that chimeric junctions are defined as:

[...] the segments belong to different chromosomes, or different strands, or are far from each other

So the question is basically which option/parameter tells STAR whether segments are far from each other on the same chromosome. Correct me if I'm wrong, but it appears that the --outSJfilterIntronMaxVsReadN option defines a threshold that filters junctions out of the splice junctions file, but doesn't automatically transfer them to the chimeric junction file.

Thanks in advance,
Mike
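The three criteria quoted from the documentation can be written as a small predicate. In this Python sketch, `max_gap` is a hypothetical stand-in for whatever distance STAR uses to decide that two segments on the same chromosome and strand are "far from each other"; which STAR parameter controls that distance is exactly the open question in this issue, so no parameter name is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chrom: str
    strand: str  # "+" or "-"
    start: int   # leftmost reference coordinate
    end: int     # rightmost reference coordinate


def is_chimeric(a: Segment, b: Segment, max_gap: int) -> bool:
    """Apply the documented criteria; max_gap is a hypothetical threshold."""
    if a.chrom != b.chrom:
        return True  # different chromosomes
    if a.strand != b.strand:
        return True  # different strands
    # Distance between the segments (negative if they overlap).
    gap = max(a.start, b.start) - min(a.end, b.end)
    return gap > max_gap  # "far from each other"


if __name__ == "__main__":
    s1 = Segment("chr1", "+", 100, 200)
    print(is_chimeric(s1, Segment("chr1", "+", 5000, 5100), max_gap=1000))  # True
```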

Segfault when running 2.4.1c with a blank GTF file

I ran across an unhandled exception while running a test case.

STAR --runMode genomeGenerate --genomeFastaFiles ${genomeRefFa} ${spikeInFa} --sjdbOverhang 100 --sjdbGTFfile ${combinedGtf} --runThreadN 12 --genomeDir ${starRefIndexDir} --outFileNamePrefix ${starRefIndexDir

Results in

May 08 03:43:03 ..... Processing annotations GTF
8 Segmentation fault      (core dumped)

This happens when ${combinedGtf} points to an existing but empty file.
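Until STAR rejects such input itself, a pre-flight check can catch an empty annotation file before genomeGenerate runs. A minimal Python sketch: it counts GTF records whose feature column is `exon`, the default value of STAR's --sjdbGTFfeatureExon, and exits if there are none. The script and its command-line usage are hypothetical.

```python
import sys
from typing import Iterable


def count_features(lines: Iterable[str], feature: str = "exon") -> int:
    """Count GTF records whose 3rd column equals `feature`, skipping comments and blank lines."""
    n = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == feature:
            n += 1
    return n


if __name__ == "__main__":
    # Usage (hypothetical): python check_gtf.py annotations.gtf
    gtf_path = sys.argv[1]
    with open(gtf_path) as f:
        if count_features(f) == 0:
            sys.exit(f"{gtf_path}: no 'exon' records; not building the index")
    print(f"{gtf_path}: OK")
```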

Supporting GRCh38/hg38

I was wondering how STAR development will continue with regard to the new GRCh38/hg38 assembly and the existence of ALT contigs (Reference 1 Reference 2). To quote Heng Li:

GRCh38 ALT contigs are totaled 109Mb in length, spanning 60Mbp of the primary assembly. However, sequences that are highly diverged from the primary assembly only contribute a few million bp. Most subsequences of ALT contigs are nearly identical to the primary assembly. If we align sequence reads to GRCh38+ALT blindly, we will get many additional reads with zero mapping quality and miss variants on them. It is crucial to make mappers aware of ALTs. Source

Heng developed bwakit as an approach to ALT-aware mapping for DNA-seq, but as far as I know no similar implementation for RNA-seq/spliced mapping has been proposed yet. Since all relevant genome annotations for GRCh38/hg38 have been published, what is the recommended STAR workflow for the new assembly?

Obviously, one can just remove the ALT contigs and map against hg38-noalt, but that would disregard the interesting advantages of the ALTs. Also, future assemblies will continue to capture known variation (until we use the string graph directly, I imagine), so I suggest it would be fruitful to think about the consequences of this new assembly structure for spliced read mapping.

Segmentation fault - for phytozome_v9.0 genome - STAR genomeGenerate method

Hi Alex,

I am using STAR on phytozome_v9.0 genomes, specifically Brassica rapa, Zea mays, and Glycine max. I am getting a Segmentation fault error at the genome index generation step.

Genome sizes are less than 2 GB, and I am running these experiments with 20 GB of RAM.

Here are the last few lines of the valgrind output; I didn't see any obvious reason for the failure.
valgrind -v STAR --runMode genomeGenerate --genomeDir STARgenome/ --genomeFastaFiles Zmays_181.fa --runThreadN 1 --sjdbGTFfile Zmays_181_gene.gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 100

==32016== 3332911 errors in context 242 of 242:
==32016== Conditional jump or move depends on uninitialised value(s)
==32016== at 0x552D48: _int_malloc (in /software/STAR/2.4.0f1/bin/Linux_x86_64/STAR)
==32016== by 0x55437D: malloc (in /software/STAR/2.4.0f1/bin/Linux_x86_64/STAR)
==32016== by 0x4FFDCC: operator new(unsigned long) (in /software/STAR/2.4.0f1/bin/Linux_x86_64/STAR)
==32016== by 0x4E2488: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /software/STAR/2.4.0f1/bin/Linux_x86_64/STAR)
==32016== by 0x4E2EF9: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (in /software/STAR/2.4.0f1/bin/Linux_x86_64/STAR)
==32016== by 0x4E3032: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /software/STAR/2.4.0f1/bin/Linux_x86_64/STAR)
==32016== by 0x411D2B: Parameters::Parameters() (in /software/STAR/2.4.0f1/bin/Linux_x86_64/STAR)
==32016== by 0x4016A6: main (in /software/STAR/2.4.0f1/bin/Linux_x86_64/STAR)
==32016==
==32016== ERROR SUMMARY: 10000000 errors from 242 contexts (suppressed: 0 from 0)
Segmentation fault

release notes

This isn't really a software issue, but I was wondering if it was possible to add some release notes. STAR is updated relatively frequently, so it would be nice to know how important each update is.

--genomeGenerate fails starting from the .g release on some files

Hi Alex,

STAR versions .g and .h segfault when building the genome for these two files: https://gist.github.com/roryk/3ddf5590a97114272702

The .f release works okay, though.

MANUAL: Incorrect flag description

Hi Alex, found a minor error in the manual.

In section 3.1, the manual repeats the description of --runThreadN from section 2.1.

It states that the option defines the number of threads used for genome generation, as opposed to the number of threads used for mapping.

Hope this is the correct place to report such minor issues!

make STARlongStatic doesn't work

Hi,

The compilation of STAR 2.4.0h using "make clean STARlongStatic" does not work (there is an error in the linker step). However, when I use "make clean STARlong", the compilation succeeds.

To do the compilation, I use the following Dockerfile:

############################################################
# Dockerfile to build star container images
# Based on Ubuntu
############################################################

# Set the base image to Ubuntu

FROM ubuntu:12.04

# File Author / Maintainer
LABEL maintainer="Sophie Lemoine <[email protected]>"

# Refresh the package lists and install the compiler toolchain, zlib headers, and utilities
# in a single layer, so a cached package list is never paired with a newer install step
RUN apt-get update && apt-get install --yes build-essential gcc-multilib apt-utils zlib1g-dev vim git

# Install STAR
WORKDIR /tmp
RUN git clone https://github.com/alexdobin/STAR.git
WORKDIR /tmp/STAR/source
RUN git checkout STAR_2.4.0h
RUN make clean STARlongStatic

Best regards,
Laurent.

aligning to a very small genome or a single gene

I have used STAR to align to small genomes. I ran into some issues, so I had to adjust the --genomeSAindexNbases parameter.

I now tried to create a separate genome for a single gene (<1 kb). The index is generated fine (setting --genomeSAindexNbases to 3 or 2). However, when I run the alignment, STAR just hangs. It does not exit and there are no errors. The latest progress.out message is "Started 1st pass mapping". Do you know how to resolve this?
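Later versions of the STAR manual give a rule of thumb for small genomes: set --genomeSAindexNbases to min(14, log2(GenomeLength)/2 - 1). A minimal Python sketch of that formula, rounding down to an integer; for a 1,000-base sequence it gives 3, in line with the values tried above. The function name is illustrative.

```python
import math


def sa_index_nbases(genome_length: int) -> int:
    """min(14, log2(GenomeLength)/2 - 1), rounded down to an integer."""
    if genome_length < 2:
        raise ValueError("genome length must be at least 2 bases")
    return min(14, int(math.log2(genome_length) / 2 - 1))


if __name__ == "__main__":
    for length in (1_000, 1_000_000, 3_100_000_000):
        print(length, sa_index_nbases(length))  # 3, 8, 14
```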

Multiple input files (--readFilesIn)

Hi,
A very minor comment on the manual (page 19): I thought it would be helpful to show how multiple input files can be listed in the '--readFilesIn' parameter.
For example:
--readFilesIn Read1A.fq,Read1B.fq Read2A.fq,Read2B.fq

Thanks,
Sung
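The comma-separated form shown above can be generated programmatically. A Python sketch with a hypothetical helper name: files for the same mate are joined with commas (no spaces), and the read-1 and read-2 lists must be in matching order so that each read-2 file pairs with the read-1 file at the same position.

```python
from typing import Optional


def read_files_in_args(read1: list[str], read2: Optional[list[str]] = None) -> list[str]:
    """Build the --readFilesIn arguments for one or more files per mate."""
    if not read1:
        raise ValueError("need at least one read-1 file")
    args = ["--readFilesIn", ",".join(read1)]
    if read2:
        if len(read2) != len(read1):
            raise ValueError("read-1 and read-2 lists must have the same length")
        args.append(",".join(read2))
    return args


if __name__ == "__main__":
    # Prints: --readFilesIn Read1A.fq,Read1B.fq Read2A.fq,Read2B.fq
    print(" ".join(read_files_in_args(["Read1A.fq", "Read1B.fq"], ["Read2A.fq", "Read2B.fq"])))
```

The returned list can be appended to the rest of a STAR command line, for example when launching it with `subprocess.run`.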

Executables are Outdated

I downloaded the static version of 2.4.0j and found some binaries in a folder, but they were from 2.3.1z. Also, could the output be more helpful when STAR is run without any parameters? Other applications print a list of parameters in this situation.

Command line help menu

This is a feature request. When I run the STAR command with a typo in the parameters it automatically creates an empty Aligned.out.sam file along with several Log* files. It would be helpful if a help menu was generated instead of various empty output files.

STAR-Fusion directory empty in release

Hi Alex,

I'm wondering if something needs to be updated in how the submodule is being specified.... I just downloaded the latest release, and it has a STAR-Fusion/ directory but it's entirely empty.

Perhaps I need to make an official release (as opposed to a pre-release) for it to work?

best,

~brian

invalid arguments are silently ignored

Hi,

STAR usually expects parameters to be prefixed by two dashes (e.g., --sjdbGTFfile). If one accidentally forgets one dash (e.g., -sjdbGTFfile), then the parameter is silently ignored. No error is thrown. This is dangerous for optional arguments, because it can lead to parameters being ignored without the user being aware of it. For the given example (sjdbGTFfile) this leads to genome indices being created without making use of the annotation. If you do not carefully study the log file and make sure that the effective command-line is identical to the original command-line, you will not know.

Another issue is that parameters are silently ignored when used in invalid contexts. For example, STAR does not throw an error when the parameter sjdbGTFfile is used in runMode alignReads. Users coming from other RNA-Seq aligners (such as TopHat) could mistakenly expect STAR to make use of the specified GTF file, because TopHat automatically generates a new index when a GTF file is specified for which there exists no index yet. I am not expecting STAR to do the same, but at least it should throw an error when parameters are used in the wrong context.

Regards,
Sebastian
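Until STAR rejects such arguments, a thin wrapper can flag the single-dash case before launching a run. A minimal Python sketch; the wrapper is hypothetical and only checks spelling, not whether an option is valid in the chosen --runMode. Values such as "-1" pass because only a dash followed directly by a letter is flagged.

```python
import re
import sys

# A dash followed directly by a letter, e.g. "-sjdbGTFfile" (one dash too few).
SINGLE_DASH = re.compile(r"^-[A-Za-z]")


def suspicious_args(argv: list[str]) -> list[str]:
    """Return arguments that look like options missing one leading dash."""
    return [arg for arg in argv if SINGLE_DASH.match(arg)]


if __name__ == "__main__":
    bad = suspicious_args(sys.argv[1:])
    if bad:
        sys.exit("options need two dashes: " + " ".join(bad))
    print("arguments look OK")
```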

Multi-sample 2-pass mapping, gzip: stdout: Broken pipe Segmentation fault

Hi Alex,
First, I just want to say thank you for all of your great work.

I am having trouble with the 2nd pass step, while including annotations on the fly, for Multi-sample mapping.
I believe another user had a similar issue, but I'm not sure if they ever got back to you with the output logs.
The problem seems to be somewhat intermittent. Using a loop script, I started 3 samples, and they failed with this error.
I also tried starting 96 at once, and 89 of those failed with the same error.

Here is my ErrLog of the run

Jul 20 12:49:08 ..... Started STAR run
Jul 20 12:49:08 ..... Loading genome
Jul 20 12:52:56 ..... Processing annotations GTF
Jul 20 12:53:54 ..... Inserting junctions into the genome indices

gzip: stdout: Broken pipe
/cm/local/apps/torque/var/spool/mom_priv/jobs/2054630.pnap-mgt1.cm.cluster.SC: line 47: 12656 Segmentation fault      (core dumped) /home/achristofferson/tools/STAR/bin/Linux_x86_64_static/STAR --runThreadN 16 --runMode alignReads --genomeDir /scratch/achristofferson/ref_genome --readFilesCommand zcat --sjdbGTFfile ${GTF_file} --sjdbOverhang ${read_length} --outFilterType BySJout --outFilterMultimapNmax 10 --outFilterMismatchNmax 10 --outSAMmapqUnique 60 --outSAMunmapped Within --limitOutSAMoneReadBytes 90000000 --outSAMtype SAM --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --seedSearchStartLmax 30 --chimSegmentMin 15 --chimJunctionOverhangMin 15 --outReadsUnmapped Fastx --genomeLoad NoSharedMemory --outSAMattrRGline ${RGT} --outSAMmode Full --sjdbFileChrStartEnd ${SJ_out_tab} --limitSjdbInsertNsj 3000000 --readFilesIn ${FASTL1} ${FASTL2}

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe

Also here is the tail end of the Log.out file

Processing sjdbGTFfile=/home/achristofferson/ensembl_ref/Homo_sapiens.GRCh37.75.gtf, found:
                196501 transcripts
                1195764 exons (non-collapsed)
                344569 collapsed junctions
Jul 20 12:53:20 ..... Finished GTF processing
Jul 20 12:53:20   Loaded database junctions from the GTF file: /home/achristofferson/ensembl_ref/Homo_sapiens.GRCh37.75.gtf: 32248208 total junctions

Jul 20 12:53:54   Finished preparing junctions
Jul 20 12:53:54 ..... Inserting junctions into the genome indices
Jul 20 12:59:25   Finished SA search: number of new junctions=2812730, old junctions=0

This may be a separate issue, but is it normal to see so many WARNING lines like this one in Log.out?
chromosome 'HG1472_PATCH' not found in Genome fasta files for line:

HG1472_PATCH was just an example; not all of them say PATCH, but a lot do.

Thanks again Alex
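To see which annotation chromosomes are behind these warnings, one can compare the GTF's first column with the FASTA sequence names. A Python sketch, assuming that a FASTA sequence name is the header text up to the first whitespace; listing the missing names makes it easy to confirm that only patch or haplotype scaffolds deliberately left out of the FASTA are affected. The script name is hypothetical.

```python
import sys
from typing import Iterable


def fasta_names(lines: Iterable[str]) -> set[str]:
    """Sequence names: header text after '>' up to the first whitespace (assumed rule)."""
    names = set()
    for line in lines:
        if line.startswith(">") and line[1:].split():
            names.add(line[1:].split()[0])
    return names


def gtf_chroms(lines: Iterable[str]) -> set[str]:
    """Distinct values of the first GTF column, skipping comments and blank lines."""
    return {line.split("\t", 1)[0] for line in lines
            if line.strip() and not line.startswith("#")}


def missing_chroms(fasta_lines: Iterable[str], gtf_lines: Iterable[str]) -> list[str]:
    """GTF chromosome names with no matching FASTA sequence, sorted."""
    return sorted(gtf_chroms(gtf_lines) - fasta_names(fasta_lines))


if __name__ == "__main__":
    # Usage (hypothetical): python missing_chroms.py genome.fa annotations.gtf
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as gtf:
        for name in missing_chroms(fa, gtf):
            print(name)
```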

help menu?

Hi Alex,

Great to see you on github. :)

In running the binary, I noticed it just errors out if you don't give it parameters. Usually -h | --help (or no parameters) will invoke some sort of usage info for the tool. Would this be an option for STAR?

I should mention that the manual (pdf) is very helpful.

many thanks,

~brian

Default behavior - save unmapped reads to SAM/BAM output

I would like to request that the option --outSAMunmapped default to saving unmapped reads. Many users of STAR will be unaware of this setting and will end up distributing SAM/BAM files without unmapped reads, which will hinder future reanalysis of these datasets.

The Cancer Genome Hub here at UCSC makes it policy to always store unmapped reads in alignment files.

Are the indices version specific?

We are running several different versions of STAR and used to keep a central folder with STAR indices for commonly used species. However, we now experience problems when loading the indices:

Shared memory error: 4, errno: Invalid argument(22) EXITING because
of FATAL ERROR: problems with shared memory: error from shmget() or
shm_open(). SOLUTION: check shared memory settings as explained in
STAR manual, OR run STAR with --genomeLoad NoSharedMemory to avoid
using shared memory

Could this issue be related to different STAR versions using the same index? Follow-up question: does the STARlong compile option affect the indices created?

Thank you very much,
Tobias

Segfault with a particular read against a small genome

Up to 2.4.0j I was suffering from issue #17. Now I'm able to build my genome, but I've identified a single particular read that causes a segfault when aligning. Test data here: https://gist.github.com/tdido/55c94c003c21b4f3620d. Run parameters are as follows:

STAR --runMode genomeGenerate --genomeDir genome/ --genomeFastaFiles genome.fasta
STAR --genomeDir genome/ --readFilesIn test_single_fail.fastq

Additionally, if you change one of the first 'A's in the poly-A stretch of the read to any other nucleotide, the alignment completes with no problems.

Issue compiling on Mac OS X 10.9.5

Hi there,

I've tried to compile STAR several times now with different compilers, including the recommended Homebrew g++:

make STARforMac CXX=/usr/local/Cellar/gcc/4.9.2_1/bin/g++-4.9

However, each time I seem to be hitting the same (or deceptively similar) error:

g++-4.9: error: htslib/libhts.a: No such file or directory
make: *** [STARforMac] Error 1

Any help you could provide would be much appreciated!

Supplying a FASTA directory instead of a .fa file causes STAR genomeGenerate to hang

When running STAR --runMode genomeGenerate, if you supply --genomeFastaFiles with the folder your FASTA file is in (vs. the actual FASTA file), STAR chugs away forever on a single core. It should return an error message instantly.
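A wrapper could perform the requested check before starting genomeGenerate. A minimal Python sketch; `check_fasta_paths` is a hypothetical helper that reports directories and missing paths given as --genomeFastaFiles values.

```python
import os
import sys


def check_fasta_paths(paths: list[str]) -> list[str]:
    """Return one error message per path that is not an existing regular file."""
    errors = []
    for path in paths:
        if os.path.isdir(path):
            errors.append(f"{path}: is a directory; pass the FASTA file(s) inside it")
        elif not os.path.isfile(path):
            errors.append(f"{path}: no such file")
    return errors


if __name__ == "__main__":
    problems = check_fasta_paths(sys.argv[1:])
    if problems:
        sys.exit("\n".join(problems))
```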

Output BAM file read pairing information incorrectly flagged?

I have not figured out what specifically is wrong, but when trying to extract paired reads from a STAR-generated BAM I ran into the following issue.

Samtools flagstat on a paired alignment output from STAR shows all reads properly paired and mapped.

ku Fri May 22 11:22 brain $samtools flagstat ERR033000Aligned.sortedByCoord.out.bam
67014464 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
67014464 + 0 mapped (100.00%:-nan%)
67014464 + 0 paired in sequencing
33507232 + 0 read1
33507232 + 0 read2
67014464 + 0 properly paired (100.00%:-nan%)
67014464 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Attempting to extract these reads with samtools bam2fq, writing singletons to a separate file, yields 97.5% of reads extracted as singletons.

kolossus Fri May 22 11:23 brain $samtools view -b ERR033000Aligned.sortedByCoord.out.bam | samtools bam2fq -O -s single.fq - > paired.fq
[M::main_bam2fq] discarded 60937988 singletons
[M::main_bam2fq] processed 62480188 reads

Sorting this BAM by name and taking alternate lines recovers 67014464 properly paired reads.
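One plausible reading of these numbers, consistent with name-sorting recovering every pair, is that this samtools bam2fq version treats a read as a singleton unless its mate is the very next record, which is rarely true in a coordinate-sorted BAM. This Python sketch contrasts adjacency-based pairing with pairing by name on records reduced to (QNAME, FLAG); it illustrates the hypothesis and is not samtools' code. Name-sorting first (e.g. with `samtools sort -n`) makes mates adjacent.

```python
from collections import defaultdict

Record = tuple[str, int]  # (QNAME, FLAG)
READ1, READ2 = 0x40, 0x80  # SAM FLAG bits for first / second in pair


def pairs_if_adjacent(records: list[Record]) -> tuple[int, int]:
    """Pair only consecutive records sharing a QNAME; return (pairs, singletons)."""
    pairs = singletons = 0
    i = 0
    while i < len(records):
        if i + 1 < len(records) and records[i][0] == records[i + 1][0]:
            pairs += 1
            i += 2
        else:
            singletons += 1
            i += 1
    return pairs, singletons


def pairs_by_name(records: list[Record]) -> tuple[int, int]:
    """Group records by QNAME regardless of order; return (pairs, singletons).

    Assumes one primary record per mate (no secondary alignments).
    """
    mates: dict[str, set[int]] = defaultdict(set)
    for qname, flag in records:
        mates[qname].add(flag & (READ1 | READ2))
    pairs = sum(1 for bits in mates.values() if bits == {READ1, READ2})
    singletons = sum(len(bits) for bits in mates.values() if bits != {READ1, READ2})
    return pairs, singletons


if __name__ == "__main__":
    # Coordinate order interleaves two pairs: a1 b1 a2 b2.
    coord = [("a", 99), ("b", 99), ("a", 147), ("b", 147)]
    print(pairs_if_adjacent(coord), pairs_by_name(coord), pairs_if_adjacent(sorted(coord)))
```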

STAR aborts at the very end when running with shared genome and --quantMode TranscriptomeSAM

STAR version STAR_2.4.1d

After setting

sysctl -w kernel.shmall=8000000
sysctl -w kernel.shmmax=32000000000

running

$ STAR --genomeDir genome_dir --genomeLoad LoadAndExit

succeeds; then

$ STAR --genomeDir genome_dir --readFilesIn Test.fastq --genomeLoad LoadAndKeep --quantMode TranscriptomeSAM
Aug 26 14:39:47 ..... Started STAR run
Aug 26 14:39:47 ..... Loading genome
Aug 26 14:39:47 ..... Started mapping
Aug 26 14:39:54 ..... Finished successfully
*** Error in `STAR': double free or corruption (!prev): 0x00000000011a3270 ***
Aborted (core dumped)

but

$ STAR --genomeDir genome_dir --readFilesIn Test.fastq --genomeLoad LoadAndKeep

finishes without error message and also

$ STAR --genomeDir genome_dir --genomeLoad Remove
$ STAR --genomeDir genome_dir --readFilesIn Test.fastq --genomeLoad LoadAndKeep --quantMode TranscriptomeSAM

does not throw any error. Only when I repeat the command from above (STAR --genomeDir genome_dir --readFilesIn Test.fastq --genomeLoad LoadAndKeep --quantMode TranscriptomeSAM)
do I get the same error message.

Thanks,
Kajetan

star-fusion error

I am getting the following error running star-fusion.

-parsing
/tutorial_data//refdata/gencode.v19.annotation.gtf
-building interval tree for fast searching of gene overlaps
-mapping junction reads to genes
Intervals must have positive width at /tutorial_data//packages/STAR/STAR-Fusion/STAR-Fusion line 682, <$__ANONIO__> line 134490.

--quantMode TranscriptomeSAM segfaults

Hi Alex,

STAR 2.4.0e segfaults adding --quantMode TranscriptomeSAM:

/usr/local/bin/STAR --genomeDir /v-data/bcbio-nextgen/tests/data/genomes/mm9/star/ --readFilesIn /v-data/bcbio-nextgen/tests/test_automated_output/trimmed/1_110907_ERP000591_1_fastq.trimmed.txt.gz /v-data/bcbio-nextgen/tests/test_automated_output/trimmed/1_110907_ERP000591_2_fastq.trimmed.txt.gz --runThreadN 1 --outFileNamePrefix /v-data/bcbio-nextgen/tests/test_automated_output/align/Test1/1_110907_ERP000591 --outReadsUnmapped Fastx --outFilterMultimapNmax 10 --outStd SAM --quantMode TranscriptomeSAM --outSAMunmapped Within --outSAMattributes NH HI NM MD AS --readFilesCommand zcat --outSAMattrRGline ID:1 PL:illumina PU:1_110907_ERP000591 SM:Test1 --outSAMstrandField intronMotif

Without TranscriptomeSAM it works fine.
Here is a gist of the log file:

https://gist.github.com/roryk/3aade4cc66af6db93496

2.4.1a --sjdbOverhang=100 is not equal to the value at the genome generation step =50

The new on-the-fly sjdb options conflict with premade genomes including a GTF.

In particular, my genome was generated with --sjdbOverhang 50; however, the default value sjdbOverhang=100 at the mapping stage conflicts with this, and the run is aborted:

$ STAR --genomeDir ~/data/ref/indices/STAR/GRCh37.NCBI.STAR50 --readFilesIn fastqgz/R1.fastq.gz fastqgz/R2.fastq.gz --readFilesCommand zcat --outFileNamePrefix SID/

Apr 20 15:25:14 ..... Started STAR run
Apr 20 15:25:14 ..... Loading genome

EXITING because of fatal PARAMETERS error: present --sjdbOverhang=100 is not equal to the value at the genome generation step =50
SOLUTION:

Apr 20 15:25:14 ...... FATAL ERROR, exiting
$

This can be mitigated by including --sjdbOverhang N at the mapping step, where N is the value used when the genome was generated.

The solution should be to disable that check when no GTF file is included on the fly at the mapping step.
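Until that check changes, the mitigation above can be automated by reading the overhang back from the index. This Python sketch assumes that the genome directory contains a tab-separated key/value file named genomeParameters.txt with an sjdbOverhang line; that file name and layout are an assumption to verify against your own index, and the helper names are hypothetical. If no value is found, no option is added.

```python
import os
from typing import Iterable, Optional


def read_key_values(lines: Iterable[str]) -> dict[str, str]:
    """Parse 'key<TAB>value' lines, skipping '#' comments and blank lines."""
    params = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        key, _, value = line.rstrip("\n").partition("\t")
        params[key] = value.strip()
    return params


def sjdb_overhang_args(genome_dir: str) -> list[str]:
    """Return ['--sjdbOverhang', N] matching the index, or [] if not recorded."""
    path = os.path.join(genome_dir, "genomeParameters.txt")  # assumed file name
    if not os.path.isfile(path):
        return []
    with open(path) as f:
        value: Optional[str] = read_key_values(f).get("sjdbOverhang")
    return ["--sjdbOverhang", value] if value else []


if __name__ == "__main__":
    print(sjdb_overhang_args(os.path.expanduser("~/data/ref/indices/STAR/GRCh37.NCBI.STAR50")))
```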

short reads cause memory to blow up

Hi Alex,

I'm mapping to a genome that is relatively small, and STAR uses huge amounts of memory trying to map short reads, over 60 GB before I run out of memory. The reads have a minimum size of 20 bp. I posted the head and tail of the progress log file:

Feb 03 10:05:35    141.3     2473523      190    99.0%    189.8     0.2%     0.0%     0.0%     0.0%     1.0%     0.0%
Feb 03 10:06:36    158.0     5442105      190    99.0%    190.1     0.2%     0.1%     0.0%     0.0%     0.9%     0.0%
Feb 03 10:07:36    145.1     7414834      190    99.0%    190.4     0.2%     0.1%     0.0%     0.0%     0.9%     0.0%
Feb 03 10:08:38    151.8    10370092      191    98.9%    190.8     0.2%     0.2%     0.0%     0.0%     0.9%     0.0%
Feb 03 10:09:38    153.9    13078382      191    98.9%    190.8     0.2%     0.2%     0.0%     0.0%     0.9%     0.0%
Feb 03 10:10:38    163.8    16649732      191    98.9%    190.7     0.2%     0.2%     0.0%     0.0%     1.0%     0.0%
Feb 03 10:11:39    165.3    19602262      190    98.9%    190.5     0.2%     0.2%     0.0%     0.0%     0.9%     0.0%

Feb 03 18:45:08     13.0   112499554      191    77.2%    190.6     0.2%     0.3%     0.0%     0.0%    22.6%     0.0%
Feb 03 18:47:49     12.9   112743799      191    77.0%    190.6     0.2%     0.3%     0.0%     0.0%    22.7%     0.0%
Feb 03 18:48:51     12.9   113108446      191    76.8%    190.6     0.2%     0.3%     0.0%     0.0%    22.9%     0.0%
Feb 03 18:50:45     12.9   113230231      191    76.7%    190.6     0.2%     0.3%     0.0%     0.0%    23.0%     0.0%
Feb 03 18:56:46     12.8   113351490      191    76.7%    190.6     0.2%     0.3%     0.0%     0.0%    23.1%     0.0%
Feb 03 19:02:55     12.6   113472737      191    76.6%    190.6     0.2%     0.3%     0.0%     0.0%    23.1%     0.0%
Feb 03 19:05:50     12.6   113594358      191    76.5%    190.6     0.2%     0.3%     0.0%     0.0%    23.2%     0.0%
Feb 03 19:07:04     12.6   113715626      191    76.5%    190.6     0.2%     0.3%     0.0%     0.0%    23.3%     0.0%
Feb 03 19:09:31     12.5   113958408      191    76.3%    190.6     0.2%     0.3%     0.0%     0.0%    23.4%     0.0%
Feb 03 19:11:39     12.5   114079534      191    76.3%    190.6     0.2%     0.3%     0.0%     0.0%    23.5%     0.0%
Feb 03 19:13:58     12.5   114322705      191    76.1%    190.6     0.2%     0.3%     0.0%     0.0%    23.6%     0.0%
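The rows above are easier to compare when parsed. This C++ sketch assumes the usual progress-log column order: time stamp, speed (million reads per hour), reads processed, mean read length, % mapped unique, mean mapped length, mismatch rate, % multi-mapped, % mapped to too many loci, then % unmapped for too many mismatches, too short, and other. That column order is an assumption, since the excerpt above has no header; the struct, function and index constant are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One data row of the progress log excerpt (column meanings assumed).
struct ProgressRow {
    std::string time;               // e.g. "Feb 03 10:05:35"
    double speed = 0;               // assumed: million reads per hour
    long long reads = 0;            // reads processed so far
    std::vector<double> percents;   // every column ending in '%', in order
};

std::optional<ProgressRow> parseProgressRow(const std::string& line) {
    std::istringstream in(line);
    std::string month, day, clock;
    ProgressRow row;
    if (!(in >> month >> day >> clock >> row.speed >> row.reads)) {
        return std::nullopt;  // not a data row
    }
    row.time = month + " " + day + " " + clock;
    std::string tok;
    while (in >> tok) {
        // Keep only percentage columns; skip read length and mapped length.
        if (!tok.empty() && tok.back() == '%') {
            row.percents.push_back(std::stod(tok.substr(0, tok.size() - 1)));
        }
    }
    return row;
}

// Assumed position of "% unmapped: too short" among the seven percentages.
constexpr std::size_t kUnmappedShortIdx = 5;
```

Applied to the first and last rows above, this column goes from 1.0% to 23.6% while the speed drops from 141.3 to 12.5.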

Eventually it dies when the machine runs out of memory.

I put up a gist of the log file here: https://gist.github.com/roryk/3763737c82f5d534cfa8

Do you have any suggestions? Is there a minimum read size that would not trigger this?
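One way to test whether a minimum length avoids the problem is to drop short reads before mapping and rerun. This C++ sketch filters a single-end, 4-line-per-record FASTQ stream; the function name and the cut-off are hypothetical, this report does not establish a safe minimum, and paired-end files would need both mates filtered together to stay in sync.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies 4-line FASTQ records from `in` to `out`, dropping any record whose
// sequence line is shorter than minLen. Returns the number of records kept.
std::size_t filterFastqByLength(std::istream& in, std::ostream& out,
                                std::size_t minLen) {
    std::string header, seq, plus, qual;
    std::size_t kept = 0;
    while (std::getline(in, header) && std::getline(in, seq) &&
           std::getline(in, plus) && std::getline(in, qual)) {
        if (seq.size() >= minLen) {
            out << header << '\n' << seq << '\n' << plus << '\n' << qual << '\n';
            ++kept;
        }
    }
    return kept;
}
```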

Duplicated "ID:" element in @RG tag of BAM

Hi,

Using the latest version (2.4.0e), I consistently get a duplicated ID: element in the @RG header line:

Excerpt from the header, including the command used to generate it:

@PG     ID:STAR PN:STAR VN:STAR_2.4.0e  CL:tools/bin/STAR   --runThreadN 8   --genomeDir /proj/b2010040/private/nobackup/autoseqer-genome/star   --genomeLoad LoadAndKeep   --readFilesIn /proj/b2010040/private/nobackup/clinseq/CLINICAL/JR61/DataReport_JR61T/rnaseq/JR61T_rnaseq.trimmed_1.fastq.gz   /proj/b2010040/private/nobackup/clinseq/CLINICAL/JR61/DataReport_JR61T/rnaseq/JR61T_rnaseq.trimmed_2.fastq.gz      --readFilesCommand zcat      --outFileNamePrefix /tmp/2904a6e6-9795-40c3-a0d4-0f5522089215//   --outSAMattributes NH   HI   AS   nM   NM   MD   XS      --outSAMmapqUnique 50   --outSAMattrRGline ID:JR61T_rnaseq   SM:JR61T_rnaseq   LB:JR61T_rnaseq   PL:ILLUMINA   
@RG     ID:JR61T_rnaseq ID:JR61T_rnaseq SM:JR61T_rnaseq LB:JR61T_rnaseq PL:ILLUMINA
@CO     user command line: tools/bin/STAR --genomeLoad LoadAndKeep --outSAMattributes NH HI AS nM NM MD XS --outSAMmapqUnique 50 --readFilesCommand zcat --outSAMattrRGline ID:JR61T_rnaseq SM:JR61T_rnaseq LB:JR61T_rnaseq PL:ILLUMINA --genomeDir /proj/b2010040/private/nobackup/autoseqer-genome/star --runThreadN 8 --readFilesIn /proj/b2010040/private/nobackup/clinseq/CLINICAL/JR61/DataReport_JR61T/rnaseq/JR61T_rnaseq.trimmed_1.fastq.gz /proj/b2010040/private/nobackup/clinseq/CLINICAL/JR61/DataReport_JR61T/rnaseq/JR61T_rnaseq.trimmed_2.fastq.gz --outFileNamePrefix /tmp/2904a6e6-9795-40c3-a0d4-0f5522089215//

The key part is @RG ID:JR61T_rnaseq ID:JR61T_rnaseq.
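The expected header can be derived directly from the --outSAMattrRGline value, since the user already supplies the ID: field. This hypothetical C++ sketch (not STAR's code) turns a single space-separated read-group value into a tab-separated SAM @RG line, copying each field exactly once instead of prefixing another ID:.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sketch: build "@RG\tID:...\tSM:..." from the space-separated
// value given to --outSAMattrRGline (single read group only).
std::string buildRGHeader(const std::string& attrRGline) {
    std::istringstream fields(attrRGline);
    std::string field;
    std::string header = "@RG";
    while (fields >> field) {
        header += '\t';    // SAM header fields are tab-separated
        header += field;   // each user-supplied field is emitted once
    }
    return header;
}
```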

This does not occur with, for example, STAR_2.3.1z8_r447:

@PG     ID:STAR PN:STAR VN:STAR_2.3.1z8_r447    CL:/home/daniel.klevebring/bin/autoseqer1/tools/STAR   --runThreadN 8   --genomeDir /proj/b2010040/private/nobackup/autoseqer-genome/star   --genomeLoad LoadAndKeep   --readFilesIn /proj/b2010040/private/nobackup/clinseq/CLINICAL/JR61/DataReport_JR61T/rnaseq/JR61T_rnaseq.trimmed_1.fastq.gz   /proj/b2010040/private/nobackup/clinseq/CLINICAL/JR61/DataReport_JR61T/rnaseq/JR61T_rnaseq.trimmed_2.fastq.gz      --readFilesCommand zcat      --outFileNamePrefix /tmp/2904a6e6-9795-40c3-a0d4-0f5522089215//   --outSAMattributes NH   HI   AS   nM   NM   MD   XS      --outSAMunmapped Within   --outSAMmapqUnique 50   --outSAMattrRGline ID:JR61T_rnaseq   SM:JR61T_rnaseq   LB:JR61T_rnaseq   PL:ILLUMINA      --chimSegmentMin 20
@RG     ID:JR61T_rnaseq SM:JR61T_rnaseq LB:JR61T_rnaseq PL:ILLUMINA
@CO     user command line: /home/daniel.klevebring/bin/autoseqer1/tools/STAR --genomeLoad LoadAndKeep --outSAMattributes NH HI AS nM NM MD XS --outSAMmapqUnique 50 --chimSegmentMin 20 --readFilesCommand zcat --outSAMunmapped Within --outSAMattrRGline ID:JR61T_rnaseq SM:JR61T_rnaseq LB:JR61T_rnaseq PL:ILLUMINA --genomeDir /proj/b2010040/private/nobackup/autoseqer-genome/star --runThreadN 8 --readFilesIn /proj/b2010040/private/nobackup/clinseq/CLINICAL/JR61/DataReport_JR61T/rnaseq/JR61T_rnaseq.trimmed_1.fastq.gz /proj/b2010040/private/nobackup/clinseq/CLINICAL/JR61/DataReport_JR61T/rnaseq/JR61T_rnaseq.trimmed_2.fastq.gz --outFileNamePrefix /tmp/2904a6e6-9795-40c3-a0d4-0f5522089215//
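A regression like this can be caught automatically by checking headers for repeated tags. This C++ sketch (the function name is hypothetical) reports whether any tag such as ID appears more than once in a header line. It splits on any whitespace so it also works on the space-separated copies pasted above, which means it would mis-split a value that itself contains spaces, such as a DS: description.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Returns true if any tag (the text before ':') occurs more than once in a
// SAM header line such as "@RG\tID:x\tSM:y".
bool hasDuplicateTag(const std::string& headerLine) {
    std::istringstream fields(headerLine);
    std::string field;
    fields >> field;  // skip the record type, e.g. "@RG"
    std::set<std::string> seen;
    while (fields >> field) {
        std::string tag = field.substr(0, field.find(':'));
        if (!seen.insert(tag).second) {
            return true;  // tag already seen on this line
        }
    }
    return false;
}
```

On the two headers above, it flags the 2.4.0e line and accepts the 2.3.1z8_r447 line.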

STAR_2.4.0f1 does not compile on Linux (Fedora 18) g++ 4.8.3

$ make STAR
g++ -c -O3 -pipe -std=c++0x -Wall -Wextra -fopenmp -D'COMPILATION_TIME_PLACE="Fri Nov 14 17:58:46 EST 2014 :/home/newtong/Downloads/STAR-STAR_2.4.0f1/source"' Parameters.cpp
Parameters.cpp: In member function ‘void Parameters::inputParameters(int, char**)’:
Parameters.cpp:209:37: error: ‘parametersDefault’ was not declared in this scope
string parString( (const char*) parametersDefault,parametersDefault_len);
^
Parameters.cpp:209:55: error: ‘parametersDefault_len’ was not declared in this scope
string parString( (const char*) parametersDefault,parametersDefault_len);
^
Parameters.cpp:697:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int ii=0;ii<vAttr1.size();ii++) {
^
make: *** [Parameters.o] Error 1
$

$ g++ -v
Target: x86_64-unknown-linux-gnu
gcc version 4.8.3 (GCC)

On Fedora 18
$ uname -a
Linux onottr624241 3.11.10-100.fc18.x86_64 #1 SMP Mon Dec 2 20:28:38 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$
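The two undeclared names match what xxd -i parametersDefault generates: an unsigned char array named after the input file plus a matching _len variable. That suggests, though this report does not confirm it, that the embedded-defaults header was missing or empty when Parameters.cpp was compiled, for example because xxd is not installed. This C++ sketch shows the shape of such a generated header, using invented contents, together with a function equivalent to the statement at Parameters.cpp:209 that consumes it.

```cpp
#include <cassert>
#include <string>

// Illustrative stand-in for the output of `xxd -i parametersDefault`.
// The bytes are invented ("ab\n"), not STAR's real default parameters.
unsigned char parametersDefault[] = {0x61, 0x62, 0x0a};
unsigned int parametersDefault_len = 3;

// Equivalent to the failing statement: wrap the raw bytes in a std::string.
std::string loadDefaults() {
    return std::string(reinterpret_cast<const char*>(parametersDefault),
                       parametersDefault_len);
}
```

If the generated header exists and declares both names before this statement, the two "was not declared in this scope" errors go away.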

recommend use of build system like autotools or cmake

For instance, users may already have htslib installed, which the build could detect and link against automatically instead of bundling it with the source.

It would also make it easier to detect compilers, etc.
