hse22_hw1

Required part

  1. Create symbolic links to the input data to avoid copying it.
ln -s /usr/share/data-minor-bioinf/assembly/oil_R1.fastq
ln -s /usr/share/data-minor-bioinf/assembly/oil_R2.fastq
ln -s /usr/share/data-minor-bioinf/assembly/oilMP_S4_L001_R2_001.fastq
ln -s /usr/share/data-minor-bioinf/assembly/oilMP_S4_L001_R1_001.fastq
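
Every later step reads the data through these links, so it can be worth checking that none of them is dangling. The following is a minimal sketch, not part of the original workflow: `check_links` is a hypothetical helper, and the demo files are throwaway stand-ins for the real FASTQ inputs.

```shell
# Sketch: confirm that each path is a symlink that resolves to a readable file.
# check_links is a hypothetical helper introduced for illustration only.
check_links() {
    rc=0
    for f in "$@"; do
        if [ -L "$f" ] && [ -r "$f" ]; then
            echo "ok      $f -> $(readlink "$f")"
        else
            echo "BROKEN  $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Demo on temporary files (stand-ins for the real inputs)
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' > "$demo/source.fastq"
ln -s "$demo/source.fastq" "$demo/link.fastq"        # valid link
ln -s "$demo/missing.fastq" "$demo/dangling.fastq"   # target does not exist
check_links "$demo/link.fastq"
check_links "$demo/dangling.fastq" || echo "dangling link detected"
```

In the working directory the same check would be `check_links oil_R1.fastq oil_R2.fastq oilMP_S4_L001_R1_001.fastq oilMP_S4_L001_R2_001.fastq`.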
  1. Set the seed (721 in my case) and use seqtk to randomly sample 5 million paired-end reads and 1.5 million mate-pair reads.
seqtk sample -s721 oil_R1.fastq 5000000 > sub1.fastq
seqtk sample -s721 oil_R2.fastq 5000000 > sub2.fastq
seqtk sample -s721 oilMP_S4_L001_R1_001.fastq 1500000 > matepairs.fastq
seqtk sample -s721 oilMP_S4_L001_R2_001.fastq 1500000 > matepairs2.fastq
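
seqtk keeps the two files of a pair in step only when both are sampled with the same seed, so it can help to confirm that the R1 and R2 subsamples list the same reads in the same order. The sketch below is an illustration under stated assumptions, not part of the original workflow: `read_ids` and `pairs_in_sync` are hypothetical helpers, records are assumed to span exactly four lines, and read names are compared up to the first space with a trailing `/1` or `/2` removed.

```shell
# Sketch: check that two FASTQ files contain the same reads in the same order.
# read_ids and pairs_in_sync are hypothetical helpers for illustration.

# Print one read name per record (header line of each 4-line record).
read_ids() {
    awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }' "$1"
}

# Succeed only if both files yield identical lists of read names.
pairs_in_sync() {
    a=$(mktemp); b=$(mktemp)
    read_ids "$1" > "$a"
    read_ids "$2" > "$b"
    if cmp -s "$a" "$b"; then rc=0; else rc=1; fi
    rm -f "$a" "$b"
    return "$rc"
}

# Demo on tiny made-up files
demo=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n' > "$demo/r1.fastq"
printf '@r1/2\nACGT\n+\nIIII\n@r2/2\nTCAA\n+\nIIII\n' > "$demo/r2.fastq"
printf '@r2/2\nTCAA\n+\nIIII\n@r1/2\nACGT\n+\nIIII\n' > "$demo/r2_shuffled.fastq"
pairs_in_sync "$demo/r1.fastq" "$demo/r2.fastq" && echo "in sync"
pairs_in_sync "$demo/r1.fastq" "$demo/r2_shuffled.fastq" || echo "out of sync"
```

On the real data the calls would be `pairs_in_sync sub1.fastq sub2.fastq` and `pairs_in_sync matepairs.fastq matepairs2.fastq`.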
  1. Use FastQC to assess the quality of the raw reads and MultiQC to produce a summary report for them.
mkdir fastqc
ls sub* matepairs* | xargs -tI{} fastqc -o fastqc {}
mkdir multiqc
multiqc -o multiqc fastqc

Screenshot 1 Screenshot 2

  1. Trim the reads by quality with platanus_trim and platanus_internal_trim.
platanus_trim sub*
platanus_internal_trim matepair*
  1. Delete the untrimmed source files.
rm sub1.fastq
rm sub2.fastq
rm matepairs.fastq 
rm matepairs2.fastq
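
Deleting the subsamples is only safe once trimming has produced its outputs. The sketch below shows one way to guard the removal; it is an assumption-laden illustration rather than the original procedure. `remove_if_trimmed` is a hypothetical helper, and it relies on the `.trimmed` and `.int_trimmed` suffixes that appear in the assembly commands of this README.

```shell
# Sketch: remove an input file only if its trimmed counterpart exists and is non-empty.
# remove_if_trimmed is a hypothetical helper; usage: remove_if_trimmed SUFFIX FILE...
remove_if_trimmed() {
    suffix=$1; shift
    for f in "$@"; do
        if [ -s "$f$suffix" ]; then
            rm -- "$f"
        else
            echo "keeping $f: $f$suffix is missing or empty" >&2
        fi
    done
}

# Demo on temporary files (stand-ins for the real subsamples)
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' > "$demo/sub1.fastq"
cp "$demo/sub1.fastq" "$demo/sub1.fastq.trimmed"     # pretend trimming succeeded
printf '@r1\nACGT\n+\nIIII\n' > "$demo/sub2.fastq"   # no trimmed output for this one
remove_if_trimmed .trimmed "$demo/sub1.fastq" "$demo/sub2.fastq"
ls "$demo"   # sub1.fastq is gone, sub2.fastq is kept
```

For this workflow the calls would be `remove_if_trimmed .trimmed sub1.fastq sub2.fastq` and `remove_if_trimmed .int_trimmed matepairs.fastq matepairs2.fastq`.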
  1. Use FastQC to assess the quality of the trimmed reads and MultiQC to produce a summary report for them.
mkdir fastqc_trim
ls sub* matepairs*| xargs -tI{} fastqc -o fastqc_trim {}
mkdir multqctrim
multiqc -o multqctrim fastqc_trim

Screenshot 3 Screenshot 4

  1. Assemble contigs with “platanus assemble”.
time platanus assemble -o Poil -f sub1.fastq.trimmed sub2.fastq.trimmed 2> assemble.log
  1. Build scaffolds from the contigs with “platanus scaffold”.
time platanus scaffold -o Poil -c Poil_contig.fa -IP1 sub1.fastq.trimmed sub2.fastq.trimmed -OP2 matepairs.fastq.int_trimmed matepairs2.fastq.int_trimmed 2> scaffold.log
  1. Close gaps in the scaffolds with “platanus gap_close”.
time platanus gap_close -o Poil -c Poil_scaffold.fa -IP1 sub1.fastq.trimmed sub2.fastq.trimmed -OP2 matepairs.fastq.int_trimmed  matepairs2.fastq.int_trimmed 2> gapclose.log
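
To compare the assembly stages, basic statistics such as sequence count, total length, longest sequence and N50 can be computed directly from a FASTA file. The sketch below is one possible way to do that with standard tools; it is not taken from the linked notebook, and `fasta_stats` is a hypothetical helper. N50 is taken here as the length of the sequence at which the running total, going from longest to shortest, first reaches half of the total length.

```shell
# Sketch: print sequence count, total length, longest sequence and N50 of a FASTA file.
# fasta_stats is a hypothetical helper introduced for illustration.
fasta_stats() {
    # First awk: emit one length per sequence (multi-line sequences are summed).
    awk '
        /^>/ { if (len) print len; len = 0; next }
        { len += length($0) }
        END { if (len) print len }
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            if (NR == 0) { print "no sequences"; exit 1 }
            half = total / 2; run = 0
            # Walk from longest to shortest until half the total is covered.
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= half) { n50 = lens[i]; break }
            }
            printf "sequences=%d total=%d longest=%d N50=%d\n", NR, total, lens[1], n50
        }'
}

# Demo on a tiny made-up FASTA with sequence lengths 10, 6, 4 and 2
demo=$(mktemp -d)
printf '>a\nACGTA\nCGTAC\n>b\nACGTAC\n>c\nACGT\n>d\nAC\n' > "$demo/demo.fa"
fasta_stats "$demo/demo.fa"   # prints: sequences=4 total=22 longest=10 N50=6
```

Applied to this workflow, `fasta_stats Poil_contig.fa` and `fasta_stats Poil_scaffold.fa` would summarize the contigs and the scaffolds.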

Link to the notebook: https://github.com/dpaleyev/hse22_hw1/blob/master/src/hw.ipynb
