keoughkath / alleleanalyzer

A software tool for personalized and allele-specific CRISPR editing.

Home Page: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1783-3

License: MIT License

Jupyter Notebook 38.75% Python 13.84% Shell 0.81% HTML 5.69% Makefile 0.48% C 29.35% C++ 4.39% Perl 2.68% Java 1.54% M4 1.09% Roff 0.46% JavaScript 0.64% R 0.22% SWIG 0.05%
crispr crispr-cas9 sgrna allele-specific-sgrnas

alleleanalyzer's People

Contributors

allgenesconsidered, keoughkath, slyalina


alleleanalyzer's Issues

naming inconsistencies

The name gen_targ_dfs is confusing: the script is described as annotating variants, so its name should reflect that description.

Output naming consistency (whether to include a descriptive suffix or not).

There is some inconsistency: some scripts append a suffix to the end of output files (such as gen_sgRNAs.py and get_gens_df.py), while others don't append anything (e.g., annot_variants.py).

If hard-coding some sort of annotation is desired, perhaps we could replace the h5/hdf5 extensions with something more descriptive, since these files are only meant to work with ExcisionFinder. E.g.:

OUT_gens.h5 -> OUT.gens
OUT_annotation.hdf5 -> OUT.annot
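The proposed mapping could live in one shared helper so that every script derives output names the same way; a minimal sketch (the function name and extension table are hypothetical, not from the repo):

```python
# Hypothetical central table of descriptive extensions, following the
# proposal above (OUT_gens.h5 -> OUT.gens, OUT_annotation.hdf5 -> OUT.annot).
EXTENSIONS = {"gens": ".gens", "annots": ".annot"}

def out_path(prefix, kind):
    """Derive a descriptive output path from a shared output prefix."""
    return prefix + EXTENSIONS[kind]
```

Each script would then call `out_path(out_prefix, "gens")` instead of hard-coding its own suffix.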

annot_variants.py unable to parse multi-locus gens files.

It seems that annot_variants.py does not check every line to confirm the chromosome names match, which currently makes it incompatible with multi-locus gens files.

It also fails to produce an error when a multi-locus gens file is present without all necessary fasta or .npy files (e.g., when analyzing variants for SpCas9 on chr1 and chr11, it does not check that all of the following are present: chr1_SpCas9_pam_sites_for.npy, chr1_SpCas9_pam_sites_rev.npy, chr11_SpCas9_pam_sites_for.npy, and chr11_SpCas9_pam_sites_rev.npy).
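A fail-fast check could run before annotation starts; a sketch assuming the file-name pattern from the examples above (the function name itself is hypothetical):

```python
from pathlib import Path

def check_pam_files(pams_dir, chroms, cas="SpCas9"):
    """Verify that both strand .npy PAM files exist for every
    chromosome named in the gens file before any work is done."""
    missing = [
        f"{chrom}_{cas}_pam_sites_{strand}.npy"
        for chrom in chroms
        for strand in ("for", "rev")
        if not (Path(pams_dir) / f"{chrom}_{cas}_pam_sites_{strand}.npy").exists()
    ]
    if missing:
        raise FileNotFoundError("missing PAM files: " + ", ".join(missing))
```

Called with the set of chromosomes parsed from the gens file, this would surface the chr11 case above immediately instead of failing mid-run.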

CRISPOR with non-spCas9

Check that CRISPOR uses the scoring for the correct type of Cas, or is correctly annotated otherwise.

Suggestion: Add metadata to hdf5 files.

I've noticed that some ExcisionFinder functions repeat arguments, such as locus information, Cas enzymes, guide lengths, etc. It might be worth saving that metadata to the annotation hdf5 file, so users won't need to retype arguments between scripts.

We could also allow users to override previous metadata if, for example, they ran annot_variants on several Cas enzymes but only wanted to output guides for a select few. Adding metadata would also let users recall the parameters used to produce their data if they come back to it at a later time.

One way to implement this would be here:
https://stackoverflow.com/questions/29129095/save-additional-attributes-in-pandas-dataframe/29130146#29130146
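Following the HDFStore-attributes pattern from that answer, a save/load round trip could look like this (function names are hypothetical; requires pandas with PyTables installed):

```python
import pandas as pd

def save_with_metadata(df, path, key, metadata):
    # Store the DataFrame and attach a metadata dict (locus, Cas
    # enzymes, guide length, ...) to the same HDF5 node.
    with pd.HDFStore(path) as store:
        store.put(key, df)
        store.get_storer(key).attrs.metadata = metadata

def load_with_metadata(path, key):
    # Downstream scripts can read the parameters back instead of
    # requiring the user to retype them.
    with pd.HDFStore(path) as store:
        return store[key], store.get_storer(key).attrs.metadata
```

An override flag in each script could simply rewrite `attrs.metadata` before the next stage runs.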

Error designing allele-specific guides

Hello,

I am trying to reproduce the results from the code you provide in the tutorial. However, I end up with an error that I am unable to resolve, at the step of designing all possible allele-specific guides. Using the exact same input files you used, no guides meet the chosen criteria (whereas the output you have linked shows 5 possible guides). Would you be able to look into it?

Thanks,
-Reuben

% python3 ./AlleleAnalyzer_bcft_RT/scripts/gen_sgRNAs.py wtc_phased_hg19.bcf \
    mfn2_hg19_annots.h5 1:12040238-12073572 hg19_pams hg19_pams/chr1.fa \
    test_sgrnas SpCas9 20 -v

[2023-01-02 01:00:51,990 root:INFO ]bcftools version 1.16 running
[2023-01-02 01:00:51,991 root:INFO ]{'--bed': False,
'--cas-list': False,
'--crispor': None,
'--help': False,
'--hom': False,
'--max_indel': '5',
'--min_score': '0',
'--ref_guides': False,
'--sim': False,
'--strict': False,
'-c': False,
'-d': False,
'-r': False,
'-v': True,
'<annots_file>': 'mfn2_hg19_annots.h5',
'': 'wtc_phased_hg19.bcf',
'<cas_types>': 'SpCas9',
'<gene_vars>': None,
'<guide_length>': '20',
'': '1:12040238-12073572',
'': 'test_sgrnas',
'<pams_dir>': 'hg19_pams',
'<ref_fasta>': 'hg19_pams/chr1.fa'}
[2023-01-02 01:00:51,991 root:INFO ]Finding allele-specific guides.
[2023-01-02 01:00:52,183 root:INFO ]There are 3 heterozygous variants in this locus in this genome.
[2023-01-02 01:00:56,691 root:INFO ]Currently evaluating SpCas9.
[2023-01-02 01:00:56,709 root:INFO ]No sgRNAs meet the criteria for this locus, exiting.
Traceback (most recent call last):
File "./AlleleAnalyzer_bcft_RT/scripts/gen_sgRNAs.py", line 1715, in <module>
main(arguments)
File "./AlleleAnalyzer_bcft_RT/scripts/gen_sgRNAs.py", line 1678, in main
out = get_allele_spec_guides(args).query('variant_position_in_guide > -1')
AttributeError: 'NoneType' object has no attribute 'query'
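The traceback suggests that get_allele_spec_guides returns None when no guides pass, and the caller then calls .query on that None. A defensive pattern for the call site (the wrapper name is hypothetical; the query string mirrors the traceback):

```python
import sys

def query_specific_guides(guides):
    # get_allele_spec_guides appears to return None when no sgRNAs
    # meet the criteria; exit cleanly instead of raising
    # AttributeError on NoneType.
    if guides is None:
        print("No sgRNAs meet the criteria for this locus, exiting.")
        sys.exit(0)
    return guides.query("variant_position_in_guide > -1")
```

This would not explain why zero guides are found where the linked output has 5, but it would turn the crash into the intended clean exit.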

Issues with conda environment in gen_sgRNAs.py

Ran the following commands to generate sgRNAs from targ and gen files:

cd ../ExcisionFinder/scripts/
current_dir=`pwd`
conda env create -f $current_dir/conda.yml

python3 gen_sgRNAs.py ../../ef_dat/MFN2_SNP_all/MN_P1_chr.hdf5\
 ../../ef_dat/MFN2_SNP_all/MN_P1_targ.hdf5 1:11977191-12018651\
 ../../ef_dat/GATK_ref/hg38_GATK_pams/ ../../ef_dat/dat/chr1.fa\
 ../../ef_dat/MFN2_SNP_all/MN_P1 SpCas9 20 ../../ef_dat/MFN2_SNP_all/chr1_1kgp_gendat_hg38.h5\
 ../../crispor/genomes/hg38 --crispor -c

Got the following error output (note: I manually stopped the code at line 227 to truncate the error message around the first issue):

{'--crispor': True,
 '--help': False,
 '-c': True,
 '<cas_types>': 'SpCas9',
 '<gene_vars>': '../../ef_dat/MFN2_SNP_all/chr1_1kgp_gendat_hg38.h5',
 '<gens_file>': '../../ef_dat/MFN2_SNP_all/MN_P1_chr.hdf5',
 '<guide_length>': '20',
 '<locus>': '1:11977191-12018651',
 '<out_dir>': '../../ef_dat/MFN2_SNP_all/MN_P1',
 '<pams_dir>': '../../ef_dat/GATK_ref/hg38_GATK_pams/',
 '<ref_fasta>': '../../ef_dat/dat/chr1.fa',
 '<ref_gen>': '../../crispor/genomes/hg38',
 '<targ_file>': '../../ef_dat/MFN2_SNP_all/MN_P1_targ.hdf5'}
1:11977191-12018651
There are 51 het variants in this locus in this genome.
Currently evaluating SpCas9.
Running crispor.

Using Anaconda API: https://api.anaconda.org

SpecNotFound: Can't process without a name

Could not find conda environment: crispor
You can list all discoverable environments with `conda info --envs`.

/bin/sh: ../../ef_dat/MFN2_SNP_all/MN_P1/crispor_error.txt: No such file or directory
/bin/sh: ../../ef_dat/MFN2_SNP_all/MN_P1/crispor_error.txt: No such file or directory

I believe the issue is that the conda environment is not being established or is not recognized. None of the output files from get_crispor_scores() are generated, except for the two .fa files (no scores.tsv or error.txt files).

The .yml file has been checked for a name, and the direct path was added to the .yml file based on suggestions from this post.

$conda -V
conda 4.5.0
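One way to surface this failure earlier is to parse `conda env list --json` and verify the crispor environment is discoverable before shelling out to it; a sketch (function names are hypothetical):

```python
import json

def env_names(conda_json):
    # Parse the JSON output of `conda env list --json` into bare env names.
    return [p.rstrip("/").split("/")[-1] for p in json.loads(conda_json)["envs"]]

def check_crispor_env(conda_json, name="crispor"):
    # Fail with an actionable message up front, instead of the late
    # "Could not find conda environment: crispor" from the subshell.
    if name not in env_names(conda_json):
        raise RuntimeError(
            f"conda env '{name}' not found; create it with "
            "`conda env create -f conda.yml` and verify with `conda info --envs`"
        )
```

The output directory for crispor_error.txt could be created with `os.makedirs(out_dir, exist_ok=True)` at the same point, which would address the `/bin/sh: ... No such file or directory` lines as well.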

sample bcf files

SaCas9-KKH variant PAM

It is currently NNNRT but would be better as NNNRRT, to align with CRISPOR and the original Kleinstiver et al. 2015 paper (thanks Katie for pointing this out).
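The difference between the two patterns is easy to check mechanically by expanding IUPAC codes (R = A or G) into a regex; a small sketch, not the repo's actual PAM-matching machinery:

```python
import re

# IUPAC nucleotide codes needed for these PAMs: R is a purine (A or G),
# N is any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}

def pam_regex(pam):
    """Compile an IUPAC PAM string (e.g. 'NNNRRT') into a regex."""
    return re.compile("".join(IUPAC[base] for base in pam))

# NNNRRT requires purines at positions 4 and 5, so it rejects sites
# that the looser NNNRT pattern would accept:
assert pam_regex("NNNRRT").fullmatch("ACTAGT")
assert not pam_regex("NNNRRT").fullmatch("ACTCGT")  # C at position 4
```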
