These steps need to be done once for each new genome:
GENOMENAME=H37RvCO
mkNucFile.sh ${GENOMENAME}.genome ${GENOMENAME}.fasta
where ${GENOMENAME}.genome is the BEDTOOLS format genome file (tab-separated sequence name and length). The output of this command will be the base profile of the genome. Convert this to the format the R code wants with:
Rscript --no-save mkRDA.R \
${GENOMENAME}_1000by100_TileBin.nuc.gz
which will create the file ${GENOMENAME}gcpct.rda.
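If a BEDTOOLS genome file is not already at hand, it can be derived from the FASTA, since it only lists each sequence name and its length. The following is a minimal sketch using plain awk; the function name fasta_to_genome and the toy file names are hypothetical and are not part of this pipeline.

```shell
# Hypothetical helper: print "name<TAB>length" for every record in a FASTA file.
fasta_to_genome() {
    awk '
        /^>/ {                                   # header line starts a new record
            if (name != "") print name "\t" len  # flush the previous record
            name = substr($1, 2)                 # drop ">" and any description
            len = 0
            next
        }
        { len += length($0) }                    # sequence line: add its length
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Toy input (assumption, for illustration only).
printf '>chr1 toy record\nACGTACGT\nAC\n>chr2\nGGCC\n' > toy.fasta
fasta_to_genome toy.fasta > toy.genome
cat toy.genome
```

For a real genome the call would be along the lines of `fasta_to_genome ${GENOMENAME}.fasta > ${GENOMENAME}.genome`.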
Use bedtools to split the genome into fixed-width windows:
WINDOW=100
bedtools makewindows \
-g ${GENOMENAME}.genome \
-w ${WINDOW} \
>${GENOMENAME}.genome.Windows_${WINDOW}.bed
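As a quick sanity check on the windows file: with -w and no step option, a sequence of length L should yield ceil(L / WINDOW) windows, the last one truncated at the sequence end. The sketch below computes that expected total from a genome file; the toy file name and lengths are assumptions for illustration.

```shell
WINDOW=100

# Toy genome file (assumption): two sequences of length 250 and 100.
printf 'chr1\t250\nchr2\t100\n' > toy.genome

# Sum ceil(length / WINDOW) over all sequences using integer arithmetic.
expected=$(awk -v w="$WINDOW" '{ n += int(($2 + w - 1) / w) } END { print n }' toy.genome)
echo "expected windows: ${expected}"
```

On a real run, the result can be compared with the line count of the windows BED file, for example with `wc -l`.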
Make a list of the BAMs to be processed in a file bamList2, and then count the number of reads per bin per sample with:
bedtools multicov \
-bed ${GENOMENAME}.genome.Windows_${WINDOW}.bed \
-bams $(cat bamList2) \
>counts.txt
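The multicov output has no header: each row holds the BED columns followed by one count per BAM, in the order the BAMs were given. For inspecting the table by eye, a labelled copy can be built from the BAM list. This is only a sketch on toy stand-in files (all file names below are hypothetical); whether seqDNAcopyGCCorrection.R expects a header is not stated here, so keep the original counts.txt unchanged.

```shell
# Toy stand-ins (assumptions) for bamList2 and counts.txt.
printf 'data/sampleA.bam\ndata/sampleB.bam\n' > bamList2.toy
printf 'chr1\t0\t100\t12\t7\nchr1\t100\t200\t9\t11\n' > counts.toy.txt

{
    # Three BED columns, then one column per BAM in list order.
    printf 'chrom\tstart\tend'
    for bam in $(cat bamList2.toy); do
        printf '\t%s' "$(basename "$bam" .bam)"   # strip directory and .bam suffix
    done
    printf '\n'
    cat counts.toy.txt
} > counts.toy.withHeader.txt

head -n 1 counts.toy.withHeader.txt
```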
Use the R script seqDNAcopyGCCorrection.R to normalize the data, segment it with DNAcopy, plot the segmentation profile, and write the segments file. You will need to edit the lines at the top of the script to match your project files.