keoughkath / alleleanalyzer

A software tool for personalized and allele-specific CRISPR editing.

Home Page: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1783-3

License: MIT License

Jupyter Notebook 38.75% Python 13.84% Shell 0.81% HTML 5.69% Makefile 0.48% C 29.35% C++ 4.39% Perl 2.68% Java 1.54% M4 1.09% Roff 0.46% JavaScript 0.64% R 0.22% SWIG 0.05%
crispr crispr-cas9 sgrna allele-specific-sgrnas

alleleanalyzer's People

Contributors

allgenesconsidered, keoughkath, slyalina


alleleanalyzer's Issues

naming inconsistencies

The name gen_targ_dfs is confusing: the script is described as annotating variants, so its name should reflect that description.

Output naming consistency (whether to include a descriptive suffix or not).

There is some inconsistency: some scripts append a suffix to the end of output files (such as gen_sgRNAs.py and get_gens_df.py), while others don't append anything (e.g., annot_variants.py).

If hard-coding some sort of annotation is desired, perhaps we could replace the h5/hdf5 extensions with something more descriptive, since these files are only meant to work with ExcisionFinder. E.g.:

OUT_gens.h5 -> OUT.gens
OUT_annotation.hdf5 -> OUT.annot
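The proposed mapping could live in one shared helper so that every script derives output names the same way; a minimal sketch (the function name and extension table are hypothetical, not from the repo):

```python
# Hypothetical central table of descriptive extensions, following the
# proposal above (OUT_gens.h5 -> OUT.gens, OUT_annotation.hdf5 -> OUT.annot).
EXTENSIONS = {"gens": ".gens", "annots": ".annot"}

def out_path(prefix, kind):
    """Derive a descriptive output path from a shared output prefix."""
    return prefix + EXTENSIONS[kind]
```

Each script would then call `out_path(out_prefix, "gens")` instead of hard-coding its own suffix.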

annot_variants.py unable to parse multi-locus gens files.

It seems that annot_variants.py does not check every line to confirm the chromosome names match, which currently makes it incompatible with multi-locus gens files.

It also fails to produce an error when a multi-locus gens file is present without all necessary fasta or .npy files (e.g., when analyzing variants for SpCas9 on chr1 and chr11, it does not check that all of the following are present: chr1_SpCas9_pam_sites_for.npy, chr1_SpCas9_pam_sites_rev.npy, chr11_SpCas9_pam_sites_for.npy, and chr11_SpCas9_pam_sites_rev.npy).
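A fail-fast check could run before annotation starts; a sketch assuming the file-name pattern from the examples above (the function name itself is hypothetical):

```python
from pathlib import Path

def check_pam_files(pams_dir, chroms, cas="SpCas9"):
    """Verify that both strand .npy PAM files exist for every
    chromosome named in the gens file before any work is done."""
    missing = [
        f"{chrom}_{cas}_pam_sites_{strand}.npy"
        for chrom in chroms
        for strand in ("for", "rev")
        if not (Path(pams_dir) / f"{chrom}_{cas}_pam_sites_{strand}.npy").exists()
    ]
    if missing:
        raise FileNotFoundError("missing PAM files: " + ", ".join(missing))
```

Called with the set of chromosomes parsed from the gens file, this would surface the chr11 case above immediately instead of failing mid-run.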

CRISPOR with non-spCas9

Check that CRISPOR uses the scoring for the correct type of Cas, or is correctly annotated otherwise.

Suggestion: Add metadata to hdf5 files.

I've noticed that some ExcisionFinder functions repeat arguments, such as locus information, Cas enzymes, guide lengths, etc. It might be worth saving that metadata to the annotation hdf5 file, so users won't need to retype arguments between scripts.

We could also allow users to override previous metadata if, for example, they ran annot_variants on several Cas enzymes but only wanted to output guides for a select few. Adding metadata would also let users recall the parameters used to produce their data if they come back to it at a later time.

One way to implement this would be here:
https://stackoverflow.com/questions/29129095/save-additional-attributes-in-pandas-dataframe/29130146#29130146
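Following the HDFStore-attributes pattern from that answer, a save/load round trip could look like this (function names are hypothetical; requires pandas with PyTables installed):

```python
import pandas as pd

def save_with_metadata(df, path, key, metadata):
    # Store the DataFrame and attach a metadata dict (locus, Cas
    # enzymes, guide length, ...) to the same HDF5 node.
    with pd.HDFStore(path) as store:
        store.put(key, df)
        store.get_storer(key).attrs.metadata = metadata

def load_with_metadata(path, key):
    # Downstream scripts can read the parameters back instead of
    # requiring the user to retype them.
    with pd.HDFStore(path) as store:
        return store[key], store.get_storer(key).attrs.metadata
```

An override flag in each script could simply rewrite `attrs.metadata` before the next stage runs.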

Error designing allele-specific guides

Hello,

I am trying to reproduce the results from the code you provide in the tutorial. However, I end up with an error that I am unable to resolve, at the step of designing all possible allele-specific guides. Using the exact same input files you used, no guides meet the chosen criteria (whereas the output you have linked shows 5 possible guides). Would you be able to look into it?

Thanks,
-Reuben

% python3 ./AlleleAnalyzer_bcft_RT/scripts/gen_sgRNAs.py wtc_phased_hg19.bcf \
    mfn2_hg19_annots.h5 1:12040238-12073572 hg19_pams hg19_pams/chr1.fa \
    test_sgrnas SpCas9 20 -v

[2023-01-02 01:00:51,990 root:INFO ]bcftools version 1.16 running
[2023-01-02 01:00:51,991 root:INFO ]{'--bed': False,
'--cas-list': False,
'--crispor': None,
'--help': False,
'--hom': False,
'--max_indel': '5',
'--min_score': '0',
'--ref_guides': False,
'--sim': False,
'--strict': False,
'-c': False,
'-d': False,
'-r': False,
'-v': True,
'<annots_file>': 'mfn2_hg19_annots.h5',
'': 'wtc_phased_hg19.bcf',
'<cas_types>': 'SpCas9',
'<gene_vars>': None,
'<guide_length>': '20',
'': '1:12040238-12073572',
'': 'test_sgrnas',
'<pams_dir>': 'hg19_pams',
'<ref_fasta>': 'hg19_pams/chr1.fa'}
[2023-01-02 01:00:51,991 root:INFO ]Finding allele-specific guides.
[2023-01-02 01:00:52,183 root:INFO ]There are 3 heterozygous variants in this locus in this genome.
[2023-01-02 01:00:56,691 root:INFO ]Currently evaluating SpCas9.
[2023-01-02 01:00:56,709 root:INFO ]No sgRNAs meet the criteria for this locus, exiting.
Traceback (most recent call last):
File "./AlleleAnalyzer_bcft_RT/scripts/gen_sgRNAs.py", line 1715, in <module>
main(arguments)
File "./AlleleAnalyzer_bcft_RT/scripts/gen_sgRNAs.py", line 1678, in main
out = get_allele_spec_guides(args).query('variant_position_in_guide > -1')
AttributeError: 'NoneType' object has no attribute 'query'
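The traceback suggests that get_allele_spec_guides returns None when no guides pass, and the caller then calls .query on that None. A defensive pattern for the call site (the wrapper name is hypothetical; the query string mirrors the traceback):

```python
import sys

def query_specific_guides(guides):
    # get_allele_spec_guides appears to return None when no sgRNAs
    # meet the criteria; exit cleanly instead of raising
    # AttributeError on NoneType.
    if guides is None:
        print("No sgRNAs meet the criteria for this locus, exiting.")
        sys.exit(0)
    return guides.query("variant_position_in_guide > -1")
```

This would not explain why zero guides are found where the linked output has 5, but it would turn the crash into the intended clean exit.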

Issues with conda environment in gen_sgRNAs.py

Ran the following commands to generate sgRNAs from targ and gen files:

cd ../ExcisionFinder/scripts/
current_dir=`pwd`
conda env create -f $current_dir/conda.yml

python3 gen_sgRNAs.py ../../ef_dat/MFN2_SNP_all/MN_P1_chr.hdf5\
 ../../ef_dat/MFN2_SNP_all/MN_P1_targ.hdf5 1:11977191-12018651\
 ../../ef_dat/GATK_ref/hg38_GATK_pams/ ../../ef_dat/dat/chr1.fa\
 ../../ef_dat/MFN2_SNP_all/MN_P1 SpCas9 20 ../../ef_dat/MFN2_SNP_all/chr1_1kgp_gendat_hg38.h5\
 ../../crispor/genomes/hg38 --crispor -c

Got the following error output (note: I manually stopped the code at line 227 to truncate the error message around the first issue):

{'--crispor': True,
 '--help': False,
 '-c': True,
 '<cas_types>': 'SpCas9',
 '<gene_vars>': '../../ef_dat/MFN2_SNP_all/chr1_1kgp_gendat_hg38.h5',
 '<gens_file>': '../../ef_dat/MFN2_SNP_all/MN_P1_chr.hdf5',
 '<guide_length>': '20',
 '<locus>': '1:11977191-12018651',
 '<out_dir>': '../../ef_dat/MFN2_SNP_all/MN_P1',
 '<pams_dir>': '../../ef_dat/GATK_ref/hg38_GATK_pams/',
 '<ref_fasta>': '../../ef_dat/dat/chr1.fa',
 '<ref_gen>': '../../crispor/genomes/hg38',
 '<targ_file>': '../../ef_dat/MFN2_SNP_all/MN_P1_targ.hdf5'}
1:11977191-12018651
There are 51 het variants in this locus in this genome.
Currently evaluating SpCas9.
Running crispor.

Using Anaconda API: https://api.anaconda.org

SpecNotFound: Can't process without a name

Could not find conda environment: crispor
You can list all discoverable environments with `conda info --envs`.

/bin/sh: ../../ef_dat/MFN2_SNP_all/MN_P1/crispor_error.txt: No such file or directory
/bin/sh: ../../ef_dat/MFN2_SNP_all/MN_P1/crispor_error.txt: No such file or directory

I believe the issue is that the conda environment is not being established or is not recognized. None of the output files from get_crispor_scores() are generated, except for the two .fa files (no scores.tsv or error.txt files).

The .yml file has been checked for a name, and the direct path was added to the .yml file based on suggestions from this post.

$conda -V
conda 4.5.0
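One way to surface this failure earlier is to parse `conda env list --json` and verify the crispor environment is discoverable before shelling out to it; a sketch (function names are hypothetical):

```python
import json

def env_names(conda_json):
    # Parse the JSON output of `conda env list --json` into bare env names.
    return [p.rstrip("/").split("/")[-1] for p in json.loads(conda_json)["envs"]]

def check_crispor_env(conda_json, name="crispor"):
    # Fail with an actionable message up front, instead of the late
    # "Could not find conda environment: crispor" from the subshell.
    if name not in env_names(conda_json):
        raise RuntimeError(
            f"conda env '{name}' not found; create it with "
            "`conda env create -f conda.yml` and verify with `conda info --envs`"
        )
```

The output directory for crispor_error.txt could be created with `os.makedirs(out_dir, exist_ok=True)` at the same point, which would address the `/bin/sh: ... No such file or directory` lines as well.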

sample bcf files

SaCas9-KKH variant PAM

It is currently NNNRT but would be better as NNNRRT, to align with CRISPOR and the original Kleinstiver et al. 2015 paper (thanks Katie for pointing this out).
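The difference between the two patterns is easy to check mechanically by expanding IUPAC codes (R = A or G) into a regex; a small sketch, not the repo's actual PAM-matching machinery:

```python
import re

# IUPAC nucleotide codes needed for these PAMs: R is a purine (A or G),
# N is any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}

def pam_regex(pam):
    """Compile an IUPAC PAM string (e.g. 'NNNRRT') into a regex."""
    return re.compile("".join(IUPAC[base] for base in pam))

# NNNRRT requires purines at positions 4 and 5, so it rejects sites
# that the looser NNNRT pattern would accept:
assert pam_regex("NNNRRT").fullmatch("ACTAGT")
assert not pam_regex("NNNRRT").fullmatch("ACTCGT")  # C at position 4
```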
