
Clair3-Trio: variant calling in trio using Nanopore long-reads

License: BSD 3-Clause "New" or "Revised" License

bioinformatics nanopore deep-learning computational-biology genomics long-reads ont-models variant-calling trio-variant-calling

clair3-trio's Introduction

Clair3-Trio: variant calling in trio using Nanopore long-reads


Contact: Ruibang Luo, Junhao Su
Email: [email protected], [email protected]

Clair3-Trio is archived. Please visit its successor, Clair3-Nova, which extends Clair3-Trio to support accurate de novo variant calling.


Introduction

Accurate identification of genetic variants from family child-mother-father trio sequencing data is important in genomics. However, state-of-the-art approaches treat variant calling from trios as three independent tasks, which limits their calling accuracy for Nanopore long-read sequencing data. For better trio variant calling, we introduce Clair3-Trio, the first variant caller tailored for family trio data from Nanopore long-reads. Clair3-Trio employs a Trio-to-Trio deep neural network model, which allows it to take the trio's sequencing information as input and output all of the trio's predicted variants within a single model, improving variant calling. We also present MCVLoss, a novel loss function tailor-made for variant calling in trios that leverages an explicit encoding of Mendelian inheritance. Clair3-Trio showed comprehensive improvements in our experiments and predicted far fewer variants violating Mendelian inheritance than current state-of-the-art methods.

Detailed descriptions of the methodology and results for Clair3-Trio are available in this paper.


Contents


Latest Updates

v0.7.1 (Oct 17, 2023): Fix memory error issue.

v0.7 (July 9, 2023): Added the source/version/command tags to the VCF header. Fixed a bug in AF for 1/2 genotypes. Added AD to the VCF output. Added documentation for the output files. Added a page on the method for merging VCFs.

v0.6 (April 25, 2023): Bumped Python from 3.6 to 3.9 and WhatsHap from v1.0 to v1.7 (Clair3 #193). Fixed a gVCF format mistake (#3). Added the options "--enable_phasing", "--enable_output_phasing", and "--enable_output_haplotagging" (#4). Added Singularity support.

v0.5 (April 10, 2023): Added support for gVCF output. Use --gvcf to enable gVCF output.

v0.4 (March 22, 2023): Added a model for R10.4 pore with the Kit 14 chemistry (Q20+). Check this page for more information about the model.

v0.3 (June 20, 2022): Optimized Clair3-Trio's speed; the runtime of Clair3-Trio for calling variants on the whole genome is now about 2.4 times that of calling a single sample with Clair3 (v0.1-r10).

v0.2 (May 15, 2022): A Guppy5 model for Clair3-Trio is now available. Check this page for more information about the Guppy5 model.

v0.1 (April 22, 2022): Initial release.


What's New in Clair3-Trio

  • New Architecture. Clair3-Trio employs a Trio-to-Trio deep neural network model that allows it to take three samples as input and output the variants of all three samples in one go.
  • Mendelian-violation aware. Clair3-Trio uses MCVLoss to improve variant calling in trios by penalizing Mendelian violations.
  • Improved Performance. Using only 10x ONT data from HG002, HG003 and HG004, Clair3-Trio achieved a 97.30% SNP F1-score and a 56.48% Indel F1-score. Compared to Clair3, Clair3-Trio reduced SNP errors by ~78% and Indel errors by ~22%. Clair3-Trio significantly reduced Mendelian violations from 48,345 to 7,072.

Pre-trained Models

Download models from here or click on the links below.

| Model name | Platform | Training samples | Date | Basecaller | File | Link |
|---|---|---|---|---|---|---|
| c3t_hg002_dna_r1041_e82_400bps_sup | ONT r10.4.1 | HG002,3,4 | 20230322 | Dorado v4.0.0 SUP | c3t_hg002_dna_r1041_e82_400bps_sup.tar.gz | Download |
| c3t_hg002_r941_prom_sup_g5014 | ONT r9.4.1 | HG002,3,4 | 20220514 | Guppy5 sup | c3t_hg002_r941_prom_sup_g5014.tar.gz | Download |
| c3t_hg002_g422 | ONT r9.4.1 | HG002,3,4 | 20220422 | Guppy4 hac | c3t_hg002_g422.tar.gz | Download |

Clair3's Pre-trained Models

When using a Clair3-Trio model, please use the corresponding Clair3 model for pileup calling. Check here or here for more information about Clair3's pre-trained models.

| Model name | Platform | Training samples | Date | Basecaller | File | Link |
|---|---|---|---|---|---|---|
| r1041_e82_400bps_sup_v400 | ONT r10.4.1 | HG002,4,5 | - | Dorado v4.0.0 SUP | r1041_e82_400bps_sup_v400.tar.gz | Download |
| r941_prom_sup_g5014 | ONT r9.4.1 | HG002,4,5 (Guppy5_sup) | 20220112 | Guppy5 sup | r941_prom_sup_g5014.tar.gz | Download |
| r941_prom_hac_g360+g422 | ONT r9.4.1 | HG001,2,4,5 | 20210517 | Guppy3,4 hac | r941_prom_hac_g360+g422.tar.gz | Download |

Quick Demo


Installation

Option 1. Docker pre-built image

A pre-built docker image is available here. With it you can run Clair3-Trio using a single command.
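If you prefer to fetch the image ahead of time, you can pull it explicitly first (optional; `docker run` will also pull the image automatically when it is not present locally):

docker pull hkubal/clair3-trio:latest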

Caution: Absolute path is needed for both INPUT_DIR and OUTPUT_DIR.

INPUT_DIR="[YOUR_INPUT_FOLDER]"            # e.g. /input
REF=${_INPUT_DIR}/ref.fa                   # change your reference file name here
OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]"          # e.g. /output
THREADS="[MAXIMUM_THREADS]"                # e.g. 8
MODEL_C3="[Clair3 MODEL NAME]"         	   # e.g. Clair3 model, r941_prom_hac_g360+g422 for Guppy4 data, r941_prom_sup_g5014 for Guppy5 data
MODEL_C3T="[Clair3-Trio MODEL NAME]"       # e.g. Clair3-Trio model, c3t_hg002_g422 for Guppy4 data, c3t_hg002_r941_prom_sup_g5014 for Guppy5 data


docker run -it \
  -v ${INPUT_DIR}:${INPUT_DIR} \
  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
  hkubal/clair3-trio:latest \
  /opt/bin/run_clair3_trio.sh \
  --ref_fn=${INPUT_DIR}/ref.fa \                  ## change your reference file name here
  --bam_fn_c=${INPUT_DIR}/child_input.bam \       ## change your child's bam file name here 
  --bam_fn_p1=${INPUT_DIR}/parent1_input.bam \    ## change your parent-1's bam file name here     
  --bam_fn_p2=${INPUT_DIR}/parent2_input.bam \    ## change your parent-2's bam file name here   
  --sample_name_c=${SAMPLE_C} \                   ## change your child's name here
  --sample_name_p1=${SAMPLE_P1} \                 ## change your parent-1's name here
  --sample_name_p2=${SAMPLE_P2} \                 ## change your parent-2's name here
  --threads=${THREADS} \                          ## maximum threads to be used
  --model_path_clair3="/opt/models/clair3_models/${MODEL_C3}" \
  --model_path_clair3_trio="/opt/models/clair3_trio_models/${MODEL_C3T}" \
  --output=${OUTPUT_DIR}                          ## absolute output path prefix 

Option 2. Singularity

Caution: Absolute path is needed for both INPUT_DIR and OUTPUT_DIR.

INPUT_DIR="[YOUR_INPUT_FOLDER]"            # e.g. /input
REF=${_INPUT_DIR}/ref.fa                   # change your reference file name here
OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]"          # e.g. /output
THREADS="[MAXIMUM_THREADS]"                # e.g. 8
MODEL_C3="[Clair3 MODEL NAME]"         	   # e.g. Clair3 model, r941_prom_hac_g360+g422 for Guppy4 data, r941_prom_sup_g5014 for Guppy5 data
MODEL_C3T="[Clair3-Trio MODEL NAME]"       # e.g. Clair3-Trio model, c3t_hg002_g422 for Guppy4 data, c3t_hg002_r941_prom_sup_g5014 for Guppy5 data

conda config --add channels defaults
conda create -n singularity-env -c conda-forge singularity -y
conda activate singularity-env

# singularity pull docker pre-built image
singularity pull docker://hkubal/clair3-trio:latest

singularity exec \
-B ${INPUT_DIR},${OUTPUT_DIR} \
clair3-trio_latest.sif \
/opt/bin/run_clair3_trio.sh \
--ref_fn=${INPUT_DIR}/ref.fa \                  ## change your reference file name here
--bam_fn_c=${INPUT_DIR}/child_input.bam \       ## change your child's bam file name here 
--bam_fn_p1=${INPUT_DIR}/parent1_input.bam \    ## change your parent-1's bam file name here     
--bam_fn_p2=${INPUT_DIR}/parent2_input.bam \    ## change your parent-2's bam file name here   
--sample_name_c=${SAMPLE_C} \                   ## change your child's name here
--sample_name_p1=${SAMPLE_P1} \                 ## change your parent-1's name here
--sample_name_p2=${SAMPLE_P2} \                 ## change your parent-2's name here
--threads=${THREADS} \                          ## maximum threads to be used
--model_path_clair3="/opt/models/clair3_models/${MODEL_C3}" \
--model_path_clair3_trio="/opt/models/clair3_trio_models/${MODEL_C3T}" \
--output=${OUTPUT_DIR}                          ## absolute output path prefix 

Option 3. Build an anaconda virtual environment

Anaconda install:

Please install anaconda using the official guide or using the commands below:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x ./Miniconda3-latest-Linux-x86_64.sh 
./Miniconda3-latest-Linux-x86_64.sh

Install Clair3 env and Clair3-Trio using anaconda step by step:

# create and activate an environment named clair3
conda create -n clair3 python=3.9.0 -y
source activate clair3

# install pypy and packages in the environment
conda install -c conda-forge pypy3.6 -y
pypy3 -m ensurepip
pypy3 -m pip install mpmath==1.2.1

# install python packages in environment
conda install -c conda-forge tensorflow==2.8.0 -y
conda install -c conda-forge pytables -y
conda install -c anaconda pigz cffi==1.14.4 -y
conda install -c conda-forge parallel=20191122 zstd -y
conda install -c conda-forge -c bioconda samtools=1.15.1 -y
conda install -c conda-forge -c bioconda whatshap=1.7 -y
conda install -c conda-forge xz zlib bzip2 automake curl -y
# tensorflow-addons is required in training
pip install tensorflow-addons

# clone Clair3-Trio
git clone https://github.com/HKU-BAL/Clair3-Trio.git
cd Clair3-Trio

# download Clair3's pre-trained models
mkdir -p models/clair3_models
wget http://www.bio8.cs.hku.hk/clair3_trio/clair3_models/clair3_models.tar.gz 
tar -zxvf clair3_models.tar.gz -C ./models/clair3_models


# download Clair3-Trio's pre-trained models
mkdir -p models/clair3_trio_models
wget http://www.bio8.cs.hku.hk/clair3_trio/clair3_trio_models/clair3_trio_models.tar.gz 
tar -zxvf clair3_trio_models.tar.gz -C ./models/clair3_trio_models
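
# optional: verify that the two model archives above extracted into the expected folders
ls models/clair3_models models/clair3_trio_models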


# run clair3-trio
_INPUT_DIR="[YOUR_INPUT_FOLDER]"            # e.g. ./input
_BAM_C=${_INPUT_DIR}/input_child.bam        # change your child's bam file name here
_BAM_P1=${_INPUT_DIR}/input_parent1.bam     # change your parent-1's bam file name here
_BAM_P2=${_INPUT_DIR}/input_parent2.bam     # change your parent-2's bam file name here
_SAMPLE_C="[Child sample ID]"               # child sample ID, e.g. HG002
_SAMPLE_P1="[Parent1 sample ID]"            # parent1 sample ID, e.g. HG003
_SAMPLE_P2="[Parent2 sample ID]"            # parent2 sample ID, e.g. HG004
_REF=${_INPUT_DIR}/ref.fa                   # change your reference file name here
_OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]"          # e.g. ./output
_THREADS="[MAXIMUM_THREADS]"                # e.g. 8
_MODEL_DIR_C3="[Clair3 MODEL NAME]"         # e.g. ./models/clair3_models/ont
_MODEL_DIR_C3T="[Clair3-Trio MODEL NAME]"   # e.g. ./models/clair3_trio_models/c3t_hg002_g422

./run_clair3_trio.sh \
  --bam_fn_c=${_BAM_C} \    
  --bam_fn_p1=${_BAM_P1} \
  --bam_fn_p2=${_BAM_P2} \
  --output=${_OUTPUT_DIR} \
  --ref_fn=${_REF} \
  --threads=${_THREADS} \
  --model_path_clair3="${_MODEL_DIR_C3}" \
  --model_path_clair3_trio="${_MODEL_DIR_C3T}" \
  --sample_name_c=${_SAMPLE_C} \
  --sample_name_p1=${_SAMPLE_P1} \
  --sample_name_p2=${_SAMPLE_P2}

Option 4. Bioconda

# make sure channels are added in conda
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

# create conda environment named "clair3-trio"
conda create -n clair3-trio -c bioconda clair3-trio python=3.9.0 -y
conda activate clair3-trio

# run clair3-trio like this afterward
_INPUT_DIR="[YOUR_INPUT_FOLDER]"            # e.g. ./input
_BAM_C=${_INPUT_DIR}/input_child.bam        # change your child's bam file name here
_BAM_P1=${_INPUT_DIR}/input_parent1.bam     # change your parent-1's bam file name here
_BAM_P2=${_INPUT_DIR}/input_parent2.bam     # change your parent-2's bam file name here
_SAMPLE_C="[Child sample ID]"               # child sample ID, e.g. HG002
_SAMPLE_P1="[Parent1 sample ID]"            # parent1 sample ID, e.g. HG003
_SAMPLE_P2="[Parent2 sample ID]"            # parent2 sample ID, e.g. HG004
_REF=${_INPUT_DIR}/ref.fa                   # change your reference file name here
_OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]"          # e.g. ./output
_THREADS="[MAXIMUM_THREADS]"                # e.g. 8
_MODEL_DIR_C3="[Clair3 MODEL NAME]"         # e.g. r941_prom_sup_g5014
_MODEL_DIR_C3T="[Clair3-Trio MODEL NAME]"   # e.g. c3t_hg002_r941_prom_sup_g5014

run_clair3_trio.sh \
  --bam_fn_c=${_BAM_C} \    
  --bam_fn_p1=${_BAM_P1} \
  --bam_fn_p2=${_BAM_P2} \
  --output=${_OUTPUT_DIR} \
  --ref_fn=${_REF} \
  --threads=${_THREADS} \
  --model_path_clair3="${CONDA_PREFIX}/bin/models/${_MODEL_DIR_C3}" \ 
  --model_path_clair3_trio="${CONDA_PREFIX}/bin/models/${_MODEL_DIR_C3T}" \ 
  --sample_name_c=${_SAMPLE_C} \
  --sample_name_p1=${_SAMPLE_P1} \
  --sample_name_p2=${_SAMPLE_P2}

Check Usage for more options. Pre-trained models are already included in the bioconda package.
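As a quick check, you can list the models bundled with the bioconda package (this assumes the clair3-trio conda environment created above is active; the directory is the same one referenced by the --model_path_clair3* arguments above):

ls ${CONDA_PREFIX}/bin/models/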

Option 5. Docker Dockerfile

Building a docker image.

# clone Clair3-Trio
git clone https://github.com/hku-bal/Clair3-Trio.git
cd Clair3-Trio

# build a docker image named hkubal/clair3-trio:latest
# might require docker authentication to build docker image 
docker build -f ./Dockerfile -t hkubal/clair3-trio:latest .

# run clair3-trio docker image like 
docker run -it hkubal/clair3-trio:latest /opt/bin/run_clair3_trio.sh --help

Output Files

Clair3-Trio outputs files in VCF/GVCF format for the trio's genotypes. The output files for a trio ([C ], [P1], [P2]) include:

.
├── run_clair3_trio.log		        # Clair3-Trio running log
├── [C ].vcf.gz				# Called variants in vcf format for [C ]
├── [P1].vcf.gz				# Called variants in vcf format for [P1]
├── [P2].vcf.gz				# Called variants in vcf format for [P2]
├── [C ].gvcf.gz			# Called variants in gvcf format for [C ] (when `--gvcf` is enabled)
├── [P1].gvcf.gz			# Called variants in gvcf format for [P1] (when `--gvcf` is enabled)
├── [P2].gvcf.gz			# Called variants in gvcf format for [P2] (when `--gvcf` is enabled)
├── phased_[C ].vcf.gz			# Called phased variants for [C ] (when `--enable_output_phasing` is enabled)
├── phased_[P1].vcf.gz			# Called phased variants for [P1] (when `--enable_output_phasing` is enabled)
├── phased_[P2].vcf.gz			# Called phased variants for [P2] (when `--enable_output_phasing` is enabled)
├── phased_[C ].bam			# Alignments tagged with phased variant info for [C ] (when `--enable_output_haplotagging` is enabled)
├── phased_[P1].bam			# Alignments tagged with phased variant info for [P1] (when `--enable_output_haplotagging` is enabled)
├── phased_[P2].bam			# Alignments tagged with phased variant info for [P2] (when `--enable_output_haplotagging` is enabled)
├── [C ]_c3t.vcf.gz			# Raw variants from Clair3-Trio's trio model for [C ]
├── [P1]_c3t.vcf.gz			# Raw variants from Clair3-Trio's trio model for [P1]
├── [P2]_c3t.vcf.gz			# Raw variants from Clair3-Trio's trio model for [P2]
├── /log				# folder for detailed running log
└── /tmp				# folder for all running temporary files 
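As a quick sanity check after a run, you can count the PASS records in each per-sample VCF. The snippet below is a minimal sketch; it only assumes standard zcat/awk and the output directory used in the examples above:

for vcf in ${_OUTPUT_DIR}/*.vcf.gz; do
  echo -n "${vcf}: "
  zcat "${vcf}" | awk '!/^#/ && $7 == "PASS"' | wc -l
done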

Usage

General Usage

Caution: Use =value for optional parameters, e.g. --bed_fn=fn.bed instead of --bed_fn fn.bed.

./run_clair3_trio.sh \
  --bam_fn_c=${_BAM_C} \
  --bam_fn_p1=${_BAM_P1} \
  --bam_fn_p2=${_BAM_P2} \
  --output=${_OUTPUT_DIR} \
  --ref_fn=${_REF} \
  --threads=${_THREADS} \
  --model_path_clair3="${_MODEL_DIR_C3}" \
  --model_path_clair3_trio="${_MODEL_DIR_C3T}" \
  --bed_fn=${_INPUT_DIR}/quick_demo.bed \
  --sample_name_c=${_SAMPLE_C} \
  --sample_name_p1=${_SAMPLE_P1} \
  --sample_name_p2=${_SAMPLE_P2}

Options

Required parameters:

  --bam_fn_c=FILE             	 Child's BAM file input. The input file must be samtools indexed.
  --bam_fn_p1=FILE             	 Parent1's BAM file input (Parent1 can be father or mother). The input file must be samtools indexed.
  --bam_fn_p2=FILE             	 Parent2's BAM file input (Parent2 can be father or mother). The input file must be samtools indexed.
  -f, --ref_fn=FILE              FASTA reference file input. The input file must be samtools indexed.
  --model_path_clair3=STR        The folder path containing a Clair3 model (six files are required in the folder: pileup.data-00000-of-00002, pileup.data-00001-of-00002, pileup.index, full_alignment.data-00000-of-00002, full_alignment.data-00001-of-00002 and full_alignment.index).
  --model_path_clair3_trio=STR   The folder path containing a Clair3-Trio model.
  -t, --threads=INT              Max threads to be used. The full genome will be divided into small chunks for parallel processing. Each chunk uses 4 threads, so the number of chunks processed simultaneously is ceil($threads/4)*3, where 3 is the overloading factor (see the worked example after this list).
  -o, --output=PATH              VCF output directory.
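For example, a quick check of the concurrency formula above, assuming --threads=8:

THREADS=8
# ceil(THREADS/4) * 3 chunks are processed simultaneously
echo $(( ((THREADS + 3) / 4) * 3 ))    # prints 6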

Other parameters:

Caution: Use =value for optional parameters, e.g., --bed_fn=fn.bed instead of --bed_fn fn.bed

  --sample_name_c=STR       Define the sample name for the child to be shown in the VCF file, default: [Child].
  --sample_name_p1=STR      Define the sample name for Parent1 to be shown in the VCF file, default: [Parent1].
  --sample_name_p2=STR      Define the sample name for Parent2 to be shown in the VCF file, default: [Parent2].
  --bed_fn=FILE             Call variants only in the provided bed regions.
  --vcf_fn=FILE             Candidate sites VCF file input, variants will only be called at the sites in the VCF file if provided.
  --ctg_name=STR            The name of the sequence to be processed.
  --qual=INT                If set, variants with a quality score >$qual will be marked PASS, or LowQual otherwise.
  --samtools=STR            Path of samtools, samtools version >= 1.10 is required.
  --python=STR              Path of python, python3 >= 3.6 is required.
  --pypy=STR                Path of pypy3, pypy3 >= 3.6 is required.
  --parallel=STR            Path of parallel, parallel >= 20191122 is required.
  --whatshap=STR            Path of whatshap, whatshap >= 1.0 is required.
  --chunk_size=INT          The size of each chunk for parallel processing, default: 5Mbp.
  --gvcf                    Enable GVCF output, default: disable.
  --print_ref_calls         Show reference calls (0/0) in vcf file, default: disable.
  --include_all_ctgs        Call variants on all contigs, otherwise call in chr{1..22,X,Y} and {1..22,X,Y}, default: disable.
  --snp_min_af=FLOAT        Minimum SNP AF required for a candidate variant. Lowering the value might slightly increase sensitivity at the cost of speed and accuracy, default: ont:0.08.
  --indel_min_af=FLOAT      Minimum INDEL AF required for a candidate variant. Lowering the value might slightly increase sensitivity at the cost of speed and accuracy, default: ont:0.15.

  --pileup_model_prefix=STR EXPERIMENTAL: Model prefix in pileup calling, including $prefix.data-00000-of-00002, $prefix.data-00001-of-00002 and $prefix.index, default: pileup.
  --fa_model_prefix=STR     EXPERIMENTAL: Model prefix in full-alignment calling, including $prefix.data-00000-of-00002, $prefix.data-00001-of-00002 and $prefix.index, default: full_alignment.
  --trio_model_prefix=STR   EXPERIMENTAL: Model prefix in trio calling, including $prefix.data-00000-of-00002, $prefix.data-00001-of-00002 and $prefix.index, default: trio.
  --var_pct_full=FLOAT      EXPERIMENTAL: Specify an expected percentage of low quality 0/1 and 1/1 variants called in the pileup mode for full-alignment mode calling, default: 0.3.
  --ref_pct_full=FLOAT      EXPERIMENTAL: Specify an expected percentage of low quality 0/0 variants called in the pileup mode for full-alignment mode calling, default: 0.3 for ilmn and hifi, 0.1 for ont.
  --var_pct_phasing=FLOAT   EXPERIMENTAL: Specify an expected percentage of high quality 0/1 variants used in Clair3 WhatsHap phasing, default: 0.8 for ont guppy5 and 0.7 for other platforms.
  --enable_output_phasing        Output phased variants using whatshap, default: disable.
  --enable_output_haplotagging   Output haplotagged BAM files using whatshap, default: disable.
  --enable_phasing               Same as `--enable_output_phasing`; retained for backward compatibility.

Folder Structure and Submodule Descriptions

Clair3-Trio shares the same folder structure as Clair3, except for an additional trio folder. For descriptions of the Clair3 folders, please check Clair3's Descriptions for more information.

Submodules in clair3/ are for variant calling and model training. Submodules in preprocess are for data preparation.

For all the submodules listed below, you can use -h or --help to see the available options (see the example after the list).

trio/: submodules under this folder are pypy-incompatible; please run them with python.

CheckEnvs_Trio: Check the environment and the validity of the input variables, and preprocess the BED input if necessary; --chunk_size sets the chunk size to be processed per parallel job.
SelectCandidates_Trio: Select trio candidates for Clair3-Trio calling.
CallVarBam_Trio: Call variants using a trained model and three BAM files.
SortVcf_Trio: Sort the trio's VCF files.
MergeTenorsBam_Trio: Create and merge three tensors into the trio's tensors.
CallVariants_Trio: Call variants using a trained model and the merged tensors of candidate variants.
model: Defines the Clair3-Trio model.

Training:
SelectHetSnp_Trio: Select heterozygous SNP candidates from the pileup model and the truth set.
Merge_Tenors_Trio: Merge three tensors into the trio's tensors.
Tensor2Bin_Trio: Convert the trio's tensors into a binary file for training.
Train_Trio: Train a trio model using the RectifiedAdam optimizer. We also use the Lookahead optimizer to adjust the RectifiedAdam parameters dynamically. The initial learning rate is 1e-3 with 0.1 learning-rate warm-up. The input is a binary file containing tensors created by Tensor2Bin_Trio.

Evaluation:
Check_de_novo: Benchmark calling results in terms of de novo variants.
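For example, to print the available options of one of the trio submodules listed above (a sketch; it assumes you are in the cloned Clair3-Trio directory, and uses python rather than pypy because the trio submodules are pypy-incompatible):

python3 clair3.py SortVcf_Trio --help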

Training Data

Clair3-Trio trained its trio models using three GIAB samples (HG002, HG003 and HG004). All models were trained with chr20 excluded (i.e., using only chr1-19, 21 and 22). All data links can be found on this page.

| Platform | Reference | Aligner | Training samples |
|---|---|---|---|
| ONT | GRCh38_no_alt | minimap2 | HG002,3,4 |

VCF/GVCF Output Formats

Clair3-Trio supports both VCF and GVCF output formats and follows the VCF version 4.2 specification. Specifically, Clair3-Trio adds a P INFO tag to results called by the pileup model, and a T INFO tag to results called by the trio model.
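For example, to split a per-sample VCF by calling model using these tags (a minimal sketch; HG002.vcf.gz is a placeholder file name, and the patterns assume the P/T tag sits in the INFO column as a standalone flag):

# records called by the trio model (T INFO tag)
zcat HG002.vcf.gz | awk '!/^#/ && $8 ~ /(^|;)T(;|$)/' > HG002_trio_model_calls.txt
# records called by the pileup model (P INFO tag)
zcat HG002.vcf.gz | awk '!/^#/ && $8 ~ /(^|;)P(;|$)/' > HG002_pileup_model_calls.txt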

Clair3-Trio outputs a GATK-compatible GVCF format that passes GATK's ValidateVariants module. Unlike DeepVariant, which uses <*> to represent any possible alternative allele, Clair3-Trio uses <NON_REF>, the same as GATK.

Clair3-Trio GVCF files can be merged with GLNexus. A caller-based GLNexus configuration file is available for download.
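A merging command might look like the sketch below; clair3_trio.yml stands for the downloaded configuration file (a placeholder name), the gVCF file names are hypothetical, and bcftools is only used to convert the BCF stream into a compressed VCF:

glnexus_cli --config clair3_trio.yml \
    HG002.gvcf.gz HG003.gvcf.gz HG004.gvcf.gz \
    | bcftools view -Oz -o trio_merged.vcf.gz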

Note that reference calls in the VCF are made by the model, while Refcall records in the GVCF may be inferred from allele depth.

We left some comments for merging multiple VCF/GVCF here.
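If you only need a single multi-sample VCF of the trio (rather than a joint GVCF merge), one common option is bcftools merge. The sketch below assumes the per-sample VCFs listed above are bgzipped and tabix-indexed; the file names are placeholders:

tabix -p vcf HG002.vcf.gz && tabix -p vcf HG003.vcf.gz && tabix -p vcf HG004.vcf.gz
bcftools merge HG002.vcf.gz HG003.vcf.gz HG004.vcf.gz -Oz -o trio.vcf.gz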


Publication

  • Zheng Z, Li S, Su J, Leung AW, Lam TW, Luo R. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nature Computational Science. 2022 Dec;2(12):797-803.
  • Su J, Zheng Z, Ahmed SS, Lam TW, Luo R. Clair3-trio: high-performance Nanopore long-read variant calling in family trios with trio-to-trio deep neural networks. Briefings in Bioinformatics. 2022 Sep;23(5):bbac301.

clair3-trio's People

Contributors

amblina, aquaskyline, ftostevin-ont, shuminbal, sujunhao, zhengzhenxian


clair3-trio's Issues

chr1 data

Where can I download the training data (chr1) you used in Figure 5? Thank you very much!

Figure 5. Comparison of different architectures and shapes for calling variants from trios. All results were trained on chr1 and tested on chr20.

Exceeding walltime and no output

Hi,
I am trying to run a Clair3-Trio analysis on a trio using the script run_clair3_trio.sh with the following options:
run_clair3_trio.sh
--bam_fn_c=child.bam
--bam_fn_p1=parent1.bam
--bam_fn_p2=parent2.bam
--output=output_dir/
--ref_fn=ref.fasta
--threads=12
--model_path_clair3=/path/to/Clair3-Trio/models/clair3_models/r941_prom_hac_g360+g422
--model_path_clair3_trio=/path/to/Clair3-Trio/models/clair3_trio_models/c3t_hg002_g422
--whatshap=/path/to/clair3_env/bin/whatshap
--include_all_ctgs
--enable_output_phasing
--enable_output_haplotagging

where the *.bam files were produced by aligning the raw reads against my reference genome using minimap2.
However, it seems that the runtime exceeds 50 hours (which is my walltime limit). In addition, no output files have been produced.

Has anyone had the same problem? Is there any option I should change to speed up the analysis, or something I am missing?

Thank you for your help!

About the creating tensors for model training part

Dear author:
In the "creating tensors for model training" part, I see:
"each tensor is generated from a combination of trio samples, e.g. in the following example, we generate different combinations from the [10x, 30x, 60x, 80x] data of the HG002 trio into combinations of (Child, Parent1, Parent2): (10x, 10x, 10x), (30x, 10x, 10x)"

I trained with this example and got results similar to the paper on the 10x dataset, but it did not work well on datasets of 20x and above. I would like to ask what method you used to generate the tensors in your final training.
Thank you!

Q20+ data (FASTQs and BAMs) needed

Hello! I want to obtain the Q20+ data (in FASTQ and BAM format) for experimental purposes. However, I found on the main page that almost all of them need AWS to download; is there any other way or website? Since I am a student without a VISA card, I cannot register an AWS account.
Hope to get your help!

Update of trio models

Hi,

I would like to know whether there is any plan to upload more trio models, like Clair3 (rerio), in the near future? Also, are the pre-trained trio models trained with the Ashkenazim family samples (HG002-4) only?

Thank you.
Jimmy

Memory error while adding --gvcf parameter

Hi, I am running Clair3-Trio on HTCondor and I am experiencing issues while "merging variants and non-variants to GVCF" for each chromosome, with both MemoryError and OSError: [Errno 12] Cannot allocate memory being raised.
It just passes to the next chromosome, prompts the same error, and goes on until the last chromosome, then moves on to the next tasks. At the end of the process, the .vcf and .bam files are created correctly (GBs in size), while the .gvcf files are basically empty (only a few MBs).

Apart from the huge walltime I did not run into issues while running the same script without the --gvcf flag, with these condor settings:
request_cpus = 16
request_memory = 128 GB

I encounter this issue with --gvcf even after boosting the settings to:
request_cpus = 64
request_memory = 512 GB
So I reckon it's not a matter of lack of memory from my side.

This is an example of the error prompted:

[INFO] Pileup variants processed in chr1: 12927629
[INFO] Trio variants processed in chr1: 1722491
[INFO] Merge variants and non-variants to GVCF
Traceback (most recent call last):
  File "/home/Clair3-Trio/trio/../clair3.py", line 112, in <module>
    main()
  File "/home/Clair3-Trio/trio/../clair3.py", line 106, in main
    submodule.main()
  File "/home/Clair3-Trio/trio/MergeVcf_Trio.py", line 259, in main
    mergeNonVariant(args)
  File "/home/Clair3-Trio/trio/MergeVcf_Trio.py", line 180, in mergeNonVariant
    args.ctgEnd)
  File "/home/Clair3-Trio/preprocess/utils.py", line 221, in mergeCalls
    self.writeNonVarBlock(curNonVarStart, curNonVarEnd, curNonVarPos, curNonVarCall, save_writer)
  File "/home/Clair3-Trio/preprocess/utils.py", line 181, in writeNonVarBlock
    self._writeRightBlock(start, end, curNonVarCall, save_writer)
  File "/home/Clair3-Trio/preprocess/utils.py", line 164, in _writeRightBlock
    new_ref = self.readReferenceBaseAtPos(pos_cmd)
  File "/home/Clair3-Trio/preprocess/utils.py", line 152, in readReferenceBaseAtPos
    reader = os.popen(cmd)
  File "/home/miniconda3/envs/clair3/lib-python/3/os.py", line 980, in popen
    bufsize=buffering)
  File "/home/miniconda3/envs/clair3/lib-python/3/subprocess.py", line 744, in __init__
    restore_signals, start_new_session)
  File "/home/miniconda3/envs/clair3/lib-python/3/subprocess.py", line 1323, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
[INFO] Pileup variants processed in chr4: 15631830
...

And this is the script I am running; I emptied the arguments on purpose:

/home/Clair3-Trio/run_clair3_trio.sh \
  --bam_fn_c=/home/new_hg38/minimap_S2_sorted.bam \
  --bam_fn_p1=/home/new_hg38/minimap_S3_sorted.bam \
  --bam_fn_p2=/home/new_hg38/minimap_S4_sorted.bam \
  --output=/home/new_trio_variants \
  --ref_fn=/home/new_hg38/hg38.fa \
  --threads=12 \
  --model_path_clair3=/home/Clair3-Trio/models/clair3_models/r941_prom_hac_g360+g422 \
  --model_path_clair3_trio=/home/Clair3-Trio/models/clair3_trio_models/c3t_hg002_g422 \
  --whatshap=/home/miniconda3/envs/clair3/bin/whatshap \
  --gvcf \
  --include_all_ctgs \
  --enable_output_phasing \
  --enable_output_haplotagging

Has anyone experienced the same issue?

Presumable Installation Problem: Wall of Errors

Hi,
I have been trying for the last five hours to get this tool installed properly. After finally being able to run it without an immediate error, I thought the code was finally working. But I was wrong. It ran for about half an hour before finishing, but when I checked the output it was a very long list of errors (with lots of repetition). Here's the top:

[TRIO INFO] * Clir3-Trio pipeline start
[TRIO INFO] * 0 Check environment variables
[INFO] --include_all_ctgs not enabled, use chr{1..22,X,Y} and {1..22,X,Y} by default
[INFO] Call variant in contigs: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY
[INFO] Chunk number for each contig: 50 49 40 39 37 35 32 30 28 27 28 27 23 22 21 19 17 17 12 13 10 11 32 12
[TRIO INFO] * 1 call variants using pileup model
trio sample SAMPLE_C SAMPLE_P1 SAMPLE_P2
trio bam /rds/general/project/cebolalab_clinical_genomics/live/PRDM5_LSEC_ERG_KO/NanoPoreData/alignments/filtered_barcode01_mapped.sorted.bam /rds/general/project/cebolalab_clinical_genomics/live/PRDM5_LSEC_ERG_KO/NanoPoreData/alignments/filtered_barcode02_mapped.sorted.bam /rds/general/project/cebolalab_clinical_genomics/live/PRDM5_LSEC_ERG_KO/NanoPoreData/alignments/filtered_barcode03_mapped.sorted.bam
pileup threads 1
running pileup model
[INFO] Check environment variables
[INFO] --include_all_ctgs not enabled, use chr{1..22,X,Y} and {1..22,X,Y} by default
[INFO] Call variant in contigs: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY
[INFO] Chunk number for each contig: 50 49 40 39 37 35 32 30 28 27 28 27 23 22 21 19 17 17 12 13 10 11 32 12
[INFO] 1/7 Call variants using pileup model
[INFO] Delay 0 seconds before starting variant calling ...
[E::hts_parse_region] Coordinates must be > 0
[E::mpileup] fail to parse region 'chr1:-33-4979162' with /rds/general/project/cebolalab_clinical_genomics/live/PRDM5_LSEC_ERG_KO/NanoPoreData/alignments/filtered_barcode02_mapped.sorted.bam
Traceback (most recent call last):
  File "/rds/general/user/dpphill1/home/anaconda3/envs/clair3-trio/bin/clair3/../clair3.py", line 112, in <module>
    main()
  File "/rds/general/user/dpphill1/home/anaconda3/envs/clair3-trio/bin/clair3/../clair3.py", line 99, in main
    submodule = import_module("%s.%s" % (directory, submodule_name))
  File "/rds/general/user/dpphill1/home/anaconda3/envs/clair3-trio/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/rds/general/user/dpphill1/home/anaconda3/envs/clair3-trio/bin/clair3/CallVariants.py", line 5, in <module>
    import tensorflow as tf
  File "/rds/general/user/dpphill1/home/anaconda3/envs/clair3-trio/lib/python3.6/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/rds/general/user/dpphill1/home/anaconda3/envs/clair3-trio/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 64, in <module>
    from tensorflow.python.framework.framework_lib import *  # pylint: disable=redefined-builtin
  File "/rds/general/user/dpphill1/home/anaconda3/envs/clair3-trio/lib/python3.6/site-packages/tensorflow/python/framework/framework_lib.py", line 25, in <module>
    from tensorflow.python.framework.ops import Graph
  File "/rds/general/user/dpphill1/home/anaconda3/envs/clair3-trio/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 47, in <module>
    from tensorflow.python.eager import context
  File "/rds/general/user/dpphill1/home/anaconda3/envs/clair3-trio/lib/python3.6/site-packages/tensorflow/python/eager/context.py", line 28, in <module>
    from absl import logging
  File "/rds/general/user/dpphill1/home/anaconda3/envs/clair3-trio/lib/python3.6/site-packages/absl/logging/__init__.py", line 97, in <module>
    from absl import flags
  File "/rds/general/user/dpphill1/home/anaconda3/envs/clair3-trio/lib/python3.6/site-packages/absl/flags/__init__.py", line 35, in <module>
    from absl.flags import _argument_parser
  File "/rds/general/user/dpphill1/home/anaconda3/envs/clair3-trio/lib/python3.6/site-packages/absl/flags/_argument_parser.py", line 82, in <module>
    class ArgumentParser(Generic[_T], metaclass=_ArgumentParserCache):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
[INFO] Delay 4 seconds before starting variant calling ...
Traceback (most recent call last):

The package (for this attempt) was installed using the bioconda method. Besides the models not being preinstalled in /bin/ and having to supply them myself, I thought it looked OK. The script looks like this:

run_clair3_trio.sh \
  --bam_fn_c=<longpath>/filtered_barcode01_mapped.sorted.bam \
  --bam_fn_p1=<longpath>/filtered_barcode02_mapped.sorted.bam \
  --bam_fn_p2=<longpath>/filtered_barcode03_mapped.sorted.bam \
  --output=<longpath>/variantCalls/clair3 \
  --ref_fn=<longpath>/GRCh38.p14.genome.fa \
  --threads=8 \
  --model_path_clair3=<longpath>/r1041_e82_400bps_sup_v400 \
  --model_path_clair3_trio=<longpath>/c3t_hg002_dna_r1041_e82_400bps_sup 

Anything obvious? I'd really like to get this to work rather than switching to a different tool.
Many thanks!
Dan

non-human Trio?

Hi there,

could this be useful for non-human trio data?

Many thanks!
CW

gVCF output option not working

Clair3-Trio won't output gVCF format when "--gvcf" is enabled.

There is no printed error other than:

cat: 'test/tmp/gvcf_tmp_output/*.tmp.g.vcf': No such file or directory
FileNotFoundError: [Errno 2] No such file or directory: 'SortVcf'

Let me know if there is any other info you need. Thanks.

Illumina models

Hi,
Thank you very much for the tool. Are there any plans to also add a Clair3-Trio-Illumina for Illumina short reads, and a pre-trained model to use Clair3-Trio with Illumina long reads?

3-generation trio analysis possible?

Hi, I am toying with the idea of collecting samples for a family with a condition of interest, and there are 3 generations available: the grandparents do not have the condition, while 1 parent and 2 children have the condition. Can this kind of family structure be used effectively to identify variants? Also, I have only identified 1 family so far; does it work with 1 family, or are more families required?

Question about variant calling conditions

Hello, I was wondering how decisions are made for a position in the child to be phased or not. I read your paper, but found it quite difficult to understand as I'm quite new to the field of bioinformatics.

For example: I was wondering why the following 50 positions are not called in the child; which parameters are considered?

Total positions in a specific region in my multi-VCF: 185
-> the multi-VCF is a merge of the child, father and mother VCFs, which were generated using Clair3-Trio

50 positions not called in the child:
['chr15', '23633794', '.', 'T', 'TC', '13.67', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', '0/1:13:17:12,5:0.2941', './.:.:.:.:.']
['chr15', '23635280', '.', 'G', 'GA', '3.23', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', '1/1:3:18:9,3:0.1667', './.:.:.:.:.']
['chr15', '23635700', '.', 'C', 'CA', '3.88', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', '0/1:3:16:3,4:0.25', './.:.:.:.:.']
['chr15', '23637449', '.', 'A', 'T', '15.58', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:15:23:20,3:0.1304']
['chr15', '23640201', '.', 'G', 'A', '22.48', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:22:26:19,7:0.2692']
['chr15', '23646664', '.', 'G', 'A', '24.05', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:24:18:10,6:0.3333']
['chr15', '23666327', '.', 'CT', 'C', '6.03', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', '0/1:6:30:11,18:0.6', './.:.:.:.:.']
['chr15', '23667222', '.', 'A', 'AC', '6.32', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:6:16:12,2:0.125']
['chr15', '23667223', '.', 'A', 'C', '9.42', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:9:16:12,4:0.25']
['chr15', '23667225', '.', 'AAC', 'A', '1.66', 'LowQual', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:1:16:13,3:0.1875']
['chr15', '23675929', '.', 'G', 'GTA', '21.11', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:21:19:10,8:0.4211']
['chr15', '23675953', '.', 'C', 'CAT', '9.42', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:9:19:14,5:0.2632']
['chr15', '23675962', '.', 'G', 'A', '13.41', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:13:19:13,5:0.2632']
['chr15', '23676022', '.', 'A', 'G', '2.79', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:2:19:8,6:0.3158']
['chr15', '23676053', '.', 'A', 'G', '7.6', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:7:19:12,4:0.2105']
['chr15', '23676063', '.', 'A', 'ACACACATAT', '5.79', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:5:19:8,1:0.0526']
['chr15', '23676077', '.', 'C', 'T', '7.85', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:7:19:9,10:0.5263']
['chr15', '23676083', '.', 'A', 'T', '7.13', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:7:19:11,8:0.4211']
['chr15', '23676084', '.', 'A', 'T', '3.88', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:3:19:8,9:0.4737']
['chr15', '23676090', '.', 'C', 'CAT', '9.11', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:9:19:13,4:0.2105']
['chr15', '23676096', '.', 'T', 'C', '10.26', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:10:19:13,4:0.2105']
['chr15', '23676127', '.', 'T', 'C', '5.9', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:5:20:16,4:0.2']
['chr15', '23676134', '.', 'A', 'G', '7.08', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:7:20:12,4:0.2']
['chr15', '23676160', '.', 'C', 'T', '5.73', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:5:20:9,9:0.45']
['chr15', '23676163', '.', 'A', 'ATATG', '6.29', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:6:20:15,2:0.1']
['chr15', '23676171', '.', 'A', 'AT', '9.76', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:9:20:14,4:0.2']
['chr15', '23676195', '.', 'A', 'G', '7.24', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:7:20:14,3:0.15']
['chr15', '23676248', '.', 'T', 'C', '10.17', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:10:20:13,7:0.35']
['chr15', '23676270', '.', 'CAT', 'C', '11.49', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:11:20:11,9:0.45']
['chr15', '23676288', '.', 'A', 'G', '11.53', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:11:20:12,8:0.4']
['chr15', '23676359', '.', 'T', 'TATATGTG', '0', 'LowQual', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:0:20:10,6:0.3']
['chr15', '23679018', '.', 'C', 'G', '24.09', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', '0/1:24:28:12,16:0.5714', './.:.:.:.:.']
['chr15', '23679021', '.', 'G', 'A', '24.28', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:24:21:13,8:0.381']
['chr15', '23679039', '.', 'T', 'G', '24.08', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:24:20:12,8:0.4']
['chr15', '23679139', '.', 'A', 'G', '24.46', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:24:20:12,8:0.4']
['chr15', '23679561', '.', 'A', 'G', '22.81', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:22:20:13,6:0.3']
['chr15', '23679909', '.', 'T', 'C', '24.1', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:24:19:11,8:0.4211']
['chr15', '23680153', '.', 'C', 'T', '24.16', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:24:18:11,7:0.3889']
['chr15', '23680262', '.', 'G', 'A', '7.56', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', '0/1:5:29:23,5:0.1724', '0/1:7:19:16,3:0.1579']
['chr15', '23680525', '.', 'C', 'CA', '5.07', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', '0/1:5:29:22,4:0.1379', './.:.:.:.:.']
['chr15', '23680747', '.', 'A', 'G', '24.53', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:24:19:12,6:0.3158']
['chr15', '23681185', '.', 'T', 'C', '23.84', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:23:19:12,7:0.3684']
['chr15', '23681236', '.', 'G', 'T', '24.92', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:24:19:13,6:0.3158']
['chr15', '23681341', '.', 'G', 'GT', '6.85', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:6:19:13,6:0.3158']
['chr15', '23681491', '.', 'G', 'GAA', '4.81', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:4:20:10,2:0.1']
['chr15', '23681768', '.', 'A', 'AG', '12.36', 'LowQual', 'P', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:12:21:7,11:0.5238']
['chr15', '23682288', '.', 'G', 'T', '24.9', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:24:21:13,8:0.381']
['chr15', '23683947', '.', 'G', 'T', '25.63', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:25:22:13,8:0.3636']
['chr15', '23683960', '.', 'G', 'A', '23.99', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:23:22:12,9:0.4091']
['chr15', '23683973', '.', 'T', 'A', '24.3', 'PASS', 'T', 'GT:GQ:DP:AD:AF', './.:.:.:.:.', './.:.:.:.:.', '0/1:24:22:13,9:0.4091']

Kind regards

Sanché Verbeeck
