crisprverse / crisprdesign

Comprehensive design of CRISPR gRNAs for nucleases and base editors

License: MIT License

R 98.55% Shell 0.01% TeX 1.44%

bioconductor bioconductor-package crispr crispr-cas9 crispr-design crispr-target genomics-analysis grna grna-sequence grna-sequences sgrna sgrna-design

crisprdesign's Introduction

crisprVerse: ecosystem of R packages for CRISPR gRNA design

Installation and getting started
Components
Reproducibility

Authors: Jean-Philippe Fortin, Luke Hoberecht

Date: July 25, 2022

Installation and getting started

The crisprVerse is a collection of packages for CRISPR guide RNA (gRNA) design that can easily be installed with the crisprVerse package. This provides a convenient way of downloading and installing all crisprVerse packages with a single R command.

The package can be installed from the Bioconductor devel branch using the following commands in an R session:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(version="devel")
BiocManager::install("crisprVerse")

The core crisprVerse includes the packages commonly used for gRNA design; they are attached when you attach the crisprVerse package:

library(crisprVerse)

## Warning: multiple methods tables found for 'aperm'

## Warning: replacing previous import 'BiocGenerics::aperm' by
## 'DelayedArray::aperm' when loading 'SummarizedExperiment'

You can check that all crisprVerse packages are up-to-date with crisprVerse_update():

crisprVerse_update()

## The following packages are out of date:
## 
## • crisprDesign (0.99.176 -> 0.99.177)
## • crisprScore  (1.1.15 -> 1.1.16)
## 
## Start a clean R session then run:
## BiocManager::install(c("crisprDesign", "crisprScore"))

The complete documentation for the package can be found here.

Components

The following packages are installed and loaded with the crisprVerse package:

crisprBase to specify and manipulate CRISPR nucleases.
crisprBowtie to perform gRNA spacer sequence alignment with Bowtie.
crisprScore to annotate gRNAs with on-target and off-target scores.
crisprDesign to design and manipulate gRNAs with GuideSet objects.
crisprViz to visualize gRNAs.
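
To see which of the components listed above are available in the current R library, a minimal base R sketch is shown below. The helper name `checkComponents` is hypothetical (it is not part of crisprVerse); it only reports installation status and versions, and does not load or update anything.

```r
# Component packages named in the list above.
components <- c("crisprBase", "crisprBowtie", "crisprScore",
                "crisprDesign", "crisprViz")

# Hypothetical helper (not part of crisprVerse): for each package name,
# report whether it is installed and, if so, which version.
checkComponents <- function(pkgs) {
    installed <- vapply(pkgs, function(p) {
        nzchar(system.file(package = p))
    }, logical(1))
    version <- vapply(pkgs, function(p) {
        if (nzchar(system.file(package = p))) {
            as.character(utils::packageVersion(p))
        } else {
            NA_character_
        }
    }, character(1))
    data.frame(package = pkgs, installed = installed, version = version,
               row.names = NULL, stringsAsFactors = FALSE)
}

checkComponents(components)
```

Packages reported as not installed can then be installed with BiocManager::install(), as in the installation section.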

Reproducibility

sessionInfo()

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] crisprViz_0.99.22     crisprDesign_0.99.176 crisprScore_1.1.15   
##  [4] crisprScoreData_1.1.3 ExperimentHub_2.5.0   AnnotationHub_3.5.1  
##  [7] BiocFileCache_2.5.0   dbplyr_2.2.1          BiocGenerics_0.43.4  
## [10] crisprBowtie_1.1.1    crisprBase_1.1.8      crisprVerse_0.99.9   
## [13] BiocStyle_2.25.0     
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.4.1               Hmisc_4.7-1                  
##   [3] lazyeval_0.2.2                splines_4.2.1                
##   [5] BiocParallel_1.31.12          GenomeInfoDb_1.33.7          
##   [7] ggplot2_3.3.6                 digest_0.6.29                
##   [9] ensembldb_2.21.4              htmltools_0.5.3              
##  [11] fansi_1.0.3                   checkmate_2.1.0              
##  [13] magrittr_2.0.3                memoise_2.0.1                
##  [15] BSgenome_1.65.2               cluster_2.1.4                
##  [17] tzdb_0.3.0                    Biostrings_2.65.3            
##  [19] readr_2.1.2                   matrixStats_0.62.0           
##  [21] prettyunits_1.1.1             jpeg_0.1-9                   
##  [23] colorspace_2.0-3              blob_1.2.3                   
##  [25] rappdirs_0.3.3                xfun_0.32                    
##  [27] dplyr_1.0.10                  crayon_1.5.1                 
##  [29] RCurl_1.98-1.8                jsonlite_1.8.0               
##  [31] survival_3.4-0                VariantAnnotation_1.43.3     
##  [33] glue_1.6.2                    gtable_0.3.1                 
##  [35] zlibbioc_1.43.0               XVector_0.37.1               
##  [37] DelayedArray_0.23.1           scales_1.2.1                 
##  [39] DBI_1.1.3                     Rcpp_1.0.9                   
##  [41] htmlTable_2.4.1               xtable_1.8-4                 
##  [43] progress_1.2.2                reticulate_1.26              
##  [45] foreign_0.8-82                bit_4.0.4                    
##  [47] Formula_1.2-4                 stats4_4.2.1                 
##  [49] htmlwidgets_1.5.4             httr_1.4.4                   
##  [51] dir.expiry_1.5.1              RColorBrewer_1.1-3           
##  [53] ellipsis_0.3.2                pkgconfig_2.0.3              
##  [55] XML_3.99-0.10                 nnet_7.3-17                  
##  [57] Gviz_1.41.1                   deldir_1.0-6                 
##  [59] utf8_1.2.2                    tidyselect_1.1.2             
##  [61] rlang_1.0.5                   later_1.3.0                  
##  [63] AnnotationDbi_1.59.1          munsell_0.5.0                
##  [65] BiocVersion_3.16.0            tools_4.2.1                  
##  [67] cachem_1.0.6                  cli_3.4.0                    
##  [69] generics_0.1.3                RSQLite_2.2.16               
##  [71] evaluate_0.16                 stringr_1.4.1                
##  [73] fastmap_1.1.0                 yaml_2.3.5                   
##  [75] knitr_1.40                    bit64_4.0.5                  
##  [77] purrr_0.3.4                   randomForest_4.7-1.1         
##  [79] AnnotationFilter_1.21.0       KEGGREST_1.37.3              
##  [81] Rbowtie_1.37.0                mime_0.12                    
##  [83] xml2_1.3.3                    biomaRt_2.53.2               
##  [85] compiler_4.2.1                rstudioapi_0.14              
##  [87] filelock_1.0.2                curl_4.3.2                   
##  [89] png_0.1-7                     interactiveDisplayBase_1.35.0
##  [91] tibble_3.1.8                  stringi_1.7.8                
##  [93] basilisk.utils_1.9.3          GenomicFeatures_1.49.6       
##  [95] lattice_0.20-45               ProtGenerics_1.29.0          
##  [97] Matrix_1.4-1                  vctrs_0.4.1                  
##  [99] pillar_1.8.1                  lifecycle_1.0.1              
## [101] BiocManager_1.30.18           data.table_1.14.2            
## [103] bitops_1.0-7                  httpuv_1.6.5                 
## [105] rtracklayer_1.57.0            GenomicRanges_1.49.1         
## [107] R6_2.5.1                      BiocIO_1.7.1                 
## [109] latticeExtra_0.6-30           promises_1.2.0.1             
## [111] gridExtra_2.3                 IRanges_2.31.2               
## [113] codetools_0.2-18              dichromat_2.0-0.1            
## [115] assertthat_0.2.1              SummarizedExperiment_1.27.2  
## [117] rjson_0.2.21                  GenomicAlignments_1.33.1     
## [119] Rsamtools_2.13.4              S4Vectors_0.35.3             
## [121] GenomeInfoDbData_1.2.8        parallel_4.2.1               
## [123] hms_1.1.2                     rpart_4.1.16                 
## [125] grid_4.2.1                    basilisk_1.9.6               
## [127] rmarkdown_2.16                MatrixGenerics_1.9.1         
## [129] biovizBase_1.45.0             Biobase_2.57.1               
## [131] shiny_1.7.2                   base64enc_0.1-3              
## [133] interp_1.1-3                  restfulr_0.0.15

crisprdesign's Issues

error when constructing a tssObject using yeast gff file

Hi there,

I'm trying to construct a tssObject from a GFF file; the genome I use is this one.

I followed the steps in the tutorial:

library(crisprDesign)
txdb <- getTxDb(file="saccharomyces_cerevisiae_R64-3-1_20210421.gff", organism="Saccharomyces cerevisiae")
grList <- TxDb2GRangesList(txdb)
GenomeInfoDb::genome(grList) <- "s288c"

and then

tssObject <- getTssObjectFromTxObject(grList)

This step returns an error:

Error in `[[<-`(`*tmp*`, name, value = "_") :
  1 elements in value to replace 0 elements

I checked grList$fiveUTRs, which showed:

GRanges object with 0 ranges and 14 metadata columns:
   seqnames    ranges strand |       tx_id     gene_id  protein_id     tx_type
      <Rle> <IRanges>  <Rle> | <character> <character> <character> <character>
   gene_symbol     exon_id exon_rank cds_start   cds_end  tx_start    tx_end
   <character> <character> <integer> <integer> <integer> <integer> <integer>
     cds_len exon_start  exon_end
   <integer>  <integer> <integer>
  -------
  seqinfo: 16 sequences from s288c genome; no seqlengths

I guess this means it is empty. Is that why I can't construct a tssObject?
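
One way to see which annotation elements are empty before calling getTssObjectFromTxObject() is to tabulate the number of entries in each element of grList. The sketch below is only an illustration: `countRanges` is a hypothetical helper, and it is demonstrated on a plain named list standing in for grList (element names other than fiveUTRs are assumptions). Whether the empty fiveUTRs element is the actual cause of the error is not confirmed here.

```r
# Hypothetical helper: number of entries in each named element of a
# list-like object. It relies only on names(), `[[` and length(), so it
# is intended to work on a GRangesList as well as on a plain list.
countRanges <- function(x) {
    vapply(names(x), function(n) length(x[[n]]), integer(1))
}

# Stand-in for grList (plain list; values are placeholders, not real data).
mockGrList <- list(transcripts = 1:3, exons = 1:5, fiveUTRs = integer(0))

counts <- countRanges(mockGrList)
counts

# Names of the empty elements.
names(counts)[counts == 0L]
```

With the real object, the call would be countRanges(grList).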

Session info:

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS/LAPACK: /home/u1133824/.conda/envs/crisprVerse/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] crisprDesign_1.0.0 crisprBase_1.2.0

loaded via a namespace (and not attached):
  [1] bitops_1.0-7                  matrixStats_0.63.0
  [3] bit64_4.0.5                   filelock_1.0.2
  [5] progress_1.2.2                httr_1.4.5
  [7] GenomeInfoDb_1.34.9           tools_4.2.2
  [9] utf8_1.2.3                    R6_2.5.1
 [11] DBI_1.1.3                     BiocGenerics_0.44.0
 [13] withr_2.5.0                   tidyselect_1.2.0
 [15] prettyunits_1.1.1             bit_4.0.5
 [17] curl_4.3.3                    compiler_4.2.2
 [19] crisprBowtie_1.2.0            cli_3.6.0
 [21] Biobase_2.58.0                basilisk.utils_1.10.0
 [23] crisprScoreData_1.3.0         xml2_1.3.3
 [25] DelayedArray_0.24.0           rtracklayer_1.58.0
 [27] randomForest_4.7-1.1          readr_2.1.4
 [29] rappdirs_0.3.3                stringr_1.5.0
 [31] digest_0.6.31                 Rsamtools_2.14.0
 [33] crisprScore_1.3.1             basilisk_1.10.2
 [35] XVector_0.38.0                pkgconfig_2.0.3
 [37] htmltools_0.5.4               MatrixGenerics_1.10.0
 [39] dbplyr_2.3.1                  fastmap_1.1.1
 [41] BSgenome_1.66.3               rlang_1.0.6
 [43] RSQLite_2.3.0                 shiny_1.7.4
 [45] BiocIO_1.8.0                  generics_0.1.3
 [47] jsonlite_1.8.4                BiocParallel_1.32.5
 [49] dplyr_1.1.0                   VariantAnnotation_1.44.1
 [51] RCurl_1.98-1.10               magrittr_2.0.3
 [53] GenomeInfoDbData_1.2.9        Matrix_1.5-3
 [55] Rcpp_1.0.10                   S4Vectors_0.36.2
 [57] fansi_1.0.4                   reticulate_1.28
 [59] Rbowtie_1.38.0                lifecycle_1.0.3
 [61] stringi_1.7.12                yaml_2.3.7
 [63] SummarizedExperiment_1.28.0   zlibbioc_1.44.0
 [65] BiocFileCache_2.6.1           AnnotationHub_3.6.0
 [67] grid_4.2.2                    blob_1.2.3
 [69] parallel_4.2.2                promises_1.2.0.1
 [71] ExperimentHub_2.6.0           crayon_1.5.2
 [73] dir.expiry_1.6.0              lattice_0.20-45
 [75] Biostrings_2.66.0             GenomicFeatures_1.50.4
 [77] hms_1.1.2                     KEGGREST_1.38.0
 [79] pillar_1.8.1                  GenomicRanges_1.50.2
 [81] rjson_0.2.21                  codetools_0.2-19
 [83] biomaRt_2.54.0                stats4_4.2.2
 [85] XML_3.99-0.13                 glue_1.6.2
 [87] BiocVersion_3.16.0            BiocManager_1.30.20
 [89] png_0.1-8                     vctrs_0.5.2
 [91] tzdb_0.3.0                    httpuv_1.6.9
 [93] purrr_1.0.1                   cachem_1.0.7
 [95] mime_0.12                     xtable_1.8-4
 [97] restfulr_0.0.15               later_1.3.0
 [99] tibble_3.2.0                  GenomicAlignments_1.34.0
[101] AnnotationDbi_1.60.0          memoise_2.0.1
[103] IRanges_2.32.0                ellipsis_0.3.2
[105] interactiveDisplayBase_1.36.0

Alignments error while using `PairedGuideSet` object as input

Is it possible to run addSpacerAlignmentsIterative on a PairedGuideSet object? I get an error when running this code:

pairsGuideSet <- addSpacerAlignmentsIterative(
    pairsGuideSet,
    txObject=txdb,
    aligner_index=bowtie_index,
    bsgenome=bsgenome,
    n_mismatches=2
)

R[write to console]: Error in METHOD(x, i) : 
  Subsetting operation on CompressedGRangesList object 'x' produces a
  result that is too big to be represented as a CompressedList object.
  Please try to coerce 'x' to a SimpleList object first (with 'as(x,
  "SimpleList")').


This is the size of my pairsGuideSet:

PairedGuideSet object with 110196 pairs and 4 metadata columns:
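
If the failure is related to the size of the object (the error message mentions a result that is too big), one thing to try is processing the pairs in smaller batches. The sketch below only shows how to compute batch indices in base R. `makeBatches` and the batch size of 10000 are hypothetical, and whether addSpacerAlignmentsIterative() accepts a PairedGuideSet at all is the open question of this issue, so this is not a confirmed workaround.

```r
# Hypothetical helper: split the indices 1..n into consecutive batches of
# at most `size` elements.
makeBatches <- function(n, size) {
    idx <- seq_len(n)
    split(idx, ceiling(idx / size))
}

# 110196 is the number of pairs reported above; 10000 is an arbitrary size.
batches <- makeBatches(110196, 10000)
length(batches)          # number of batches
lengths(batches)[1:2]    # sizes of the first two batches
```

Each batch could then be used to subset the object (for example pairsGuideSet[batches[[1]]], assuming index subsetting is supported) before running the alignment step on it.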

`addEditedAlleles()` fails when the guide sequence is not editable

Hi JP and Team, really enjoying this set of tools.

I've run into an error from the addEditingWeights() function that occurs when the targeted sequence doesn't have any nucleotides with values greater than zero in the base editor weight matrix.

I'm not sure how you'd want to address circumstances like this in terms of the functionality of the code, so I thought I'd raise it as an issue instead of a PR. I think you can resolve it on line 220 of addEditedAlleles.R by checking length(nNucs) > 0.
For example:

> crisprDesign:::.getExtendedSequences(gs, start = editingWindow[1], 
+                                      end = editingWindow[2])
spacer_1160 
   "GGGTGGTTGGGGT" 
> addEditedAlleles(gs, baseEditor = abe, txTable = txtable, editingWindow = win)
[addEditedAlleles] Obtaining edited alleles at each gRNA target site.
Error in -out$score : invalid argument to unary operator
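
The message above is what R reports when unary minus is applied to NULL, which would be consistent with an empty result for this guide (an inference, not verified against the package source). The base R sketch below reproduces the message and shows a simple pre-check. It assumes, for illustration only, that the base editor modifies adenines, as an adenine base editor would; `hasEditableBase` is a hypothetical helper and not part of crisprDesign.

```r
# Unary minus on a missing list element (NULL) raises the same message.
out <- list()
msg <- tryCatch(-out$score, error = function(e) conditionMessage(e))
msg

# Hypothetical pre-check: does the sequence contain any base that the
# editor can modify? `editableBases` is an assumption ("A" for an ABE).
hasEditableBase <- function(sequence, editableBases = "A") {
    bases <- strsplit(toupper(sequence), "")[[1]]
    any(bases %in% editableBases)
}

# Sequence taken from the example above.
hasEditableBase("GGGTGGTTGGGGT")
```

A guard of this kind, or the length(nNucs) > 0 check suggested above, would let such guides be skipped or returned with an empty result instead of raising an error.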

AddOnTargetScores unable to run due to error with creating conda environment

Hi developers,
I am having trouble running the addOnTargetScores command: it produces an error while creating the conda environment. The error seems to be due to incompatible packages, as shown below.

> testSpacer <- addOnTargetScores(testSpacer, methods = c("deephf", "deepspcas9"))
[addOnTargetScores] Adding deephf scores. 

/nemo/project/home/tanb/.cache/R/ExperimentHub
  does not exist, create directory? (yes/no): yes
  |======================================================================| 100%

snapshotDate(): 2022-10-31
see ?crisprScoreData and browseVignettes('crisprScoreData') for documentation
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
trying URL 'https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-Linux-x86_64.sh'
Content type 'application/x-sh' length 76120962 bytes (72.6 MB)
==================================================
downloaded 72.6 MB

PREFIX=/nemo/project/home/tanb/.cache/R/basilisk/1.10.2/0
Unpacking payload ...
Collecting package metadata (current_repodata.json): done                                              
Solving environment: done

## Package Plan ##

  environment location: /nemo/project/home/tanb/.cache/R/basilisk/1.10.2/0

  added / updated specs:
    - _libgcc_mutex==0.1=main
    - _openmp_mutex==4.5=1_gnu
    - brotlipy==0.7.0=py38h27cfd23_1003
    - ca-certificates==2022.3.29=h06a4308_1
    - certifi==2021.10.8=py38h06a4308_2
    - cffi==1.15.0=py38hd667e15_1
    - charset-normalizer==2.0.4=pyhd3eb1b0_0
    - colorama==0.4.4=pyhd3eb1b0_0
    - conda-content-trust==0.1.1=pyhd3eb1b0_0
    - conda-package-handling==1.8.1=py38h7f8727e_0
    - conda==4.12.0=py38h06a4308_0
    - cryptography==36.0.0=py38h9ce1e76_0
    - idna==3.3=pyhd3eb1b0_0
    - ld_impl_linux-64==2.35.1=h7274673_9
    - libffi==3.3=he6710b0_2
    - libgcc-ng==9.3.0=h5101ec6_17
    - libgomp==9.3.0=h5101ec6_17
    - libstdcxx-ng==9.3.0=hd4cf53a_17
    - ncurses==6.3=h7f8727e_2
    - openssl==1.1.1n=h7f8727e_0
    - pip==21.2.4=py38h06a4308_0
    - pycosat==0.6.3=py38h7b6447c_1
    - pycparser==2.21=pyhd3eb1b0_0
    - pyopenssl==22.0.0=pyhd3eb1b0_0
    - pysocks==1.7.1=py38h06a4308_0
    - python==3.8.13=h12debd9_0
    - readline==8.1.2=h7f8727e_1
    - requests==2.27.1=pyhd3eb1b0_0
    - ruamel_yaml==0.15.100=py38h27cfd23_0
    - setuptools==61.2.0=py38h06a4308_0
    - six==1.16.0=pyhd3eb1b0_1
    - sqlite==3.38.2=hc218d9a_0
    - tk==8.6.11=h1ccaba5_0
    - tqdm==4.63.0=pyhd3eb1b0_0
    - urllib3==1.26.8=pyhd3eb1b0_0
    - wheel==0.37.1=pyhd3eb1b0_0
    - xz==5.2.5=h7b6447c_0
    - yaml==0.2.5=h7b6447c_0
    - zlib==1.2.12=h7f8727e_1


The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
  _openmp_mutex      pkgs/main/linux-64::_openmp_mutex-4.5-1_gnu
  brotlipy           pkgs/main/linux-64::brotlipy-0.7.0-py38h27cfd23_1003
  ca-certificates    pkgs/main/linux-64::ca-certificates-2022.3.29-h06a4308_1
  certifi            pkgs/main/linux-64::certifi-2021.10.8-py38h06a4308_2
  cffi               pkgs/main/linux-64::cffi-1.15.0-py38hd667e15_1
  charset-normalizer pkgs/main/noarch::charset-normalizer-2.0.4-pyhd3eb1b0_0
  colorama           pkgs/main/noarch::colorama-0.4.4-pyhd3eb1b0_0
  conda              pkgs/main/linux-64::conda-4.12.0-py38h06a4308_0
  conda-content-tru~ pkgs/main/noarch::conda-content-trust-0.1.1-pyhd3eb1b0_0
  conda-package-han~ pkgs/main/linux-64::conda-package-handling-1.8.1-py38h7f8727e_0
  cryptography       pkgs/main/linux-64::cryptography-36.0.0-py38h9ce1e76_0
  idna               pkgs/main/noarch::idna-3.3-pyhd3eb1b0_0
  ld_impl_linux-64   pkgs/main/linux-64::ld_impl_linux-64-2.35.1-h7274673_9
  libffi             pkgs/main/linux-64::libffi-3.3-he6710b0_2
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-9.3.0-h5101ec6_17
  libgomp            pkgs/main/linux-64::libgomp-9.3.0-h5101ec6_17
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-9.3.0-hd4cf53a_17
  ncurses            pkgs/main/linux-64::ncurses-6.3-h7f8727e_2
  openssl            pkgs/main/linux-64::openssl-1.1.1n-h7f8727e_0
  pip                pkgs/main/linux-64::pip-21.2.4-py38h06a4308_0
  pycosat            pkgs/main/linux-64::pycosat-0.6.3-py38h7b6447c_1
  pycparser          pkgs/main/noarch::pycparser-2.21-pyhd3eb1b0_0
  pyopenssl          pkgs/main/noarch::pyopenssl-22.0.0-pyhd3eb1b0_0
  pysocks            pkgs/main/linux-64::pysocks-1.7.1-py38h06a4308_0
  python             pkgs/main/linux-64::python-3.8.13-h12debd9_0
  readline           pkgs/main/linux-64::readline-8.1.2-h7f8727e_1
  requests           pkgs/main/noarch::requests-2.27.1-pyhd3eb1b0_0
  ruamel_yaml        pkgs/main/linux-64::ruamel_yaml-0.15.100-py38h27cfd23_0
  setuptools         pkgs/main/linux-64::setuptools-61.2.0-py38h06a4308_0
  six                pkgs/main/noarch::six-1.16.0-pyhd3eb1b0_1
  sqlite             pkgs/main/linux-64::sqlite-3.38.2-hc218d9a_0
  tk                 pkgs/main/linux-64::tk-8.6.11-h1ccaba5_0
  tqdm               pkgs/main/noarch::tqdm-4.63.0-pyhd3eb1b0_0
  urllib3            pkgs/main/noarch::urllib3-1.26.8-pyhd3eb1b0_0
  wheel              pkgs/main/noarch::wheel-0.37.1-pyhd3eb1b0_0
  xz                 pkgs/main/linux-64::xz-5.2.5-h7b6447c_0
  yaml               pkgs/main/linux-64::yaml-0.2.5-h7b6447c_0
  zlib               pkgs/main/linux-64::zlib-1.2.12-h7f8727e_1


Preparing transaction: done
Executing transaction: done
installation finished.
WARNING:
    You currently have a PYTHONPATH environment variable set. This may cause
    unexpected behavior when running the Python interpreter in Miniconda3.
    For best results, please verify that your PYTHONPATH only points to
    directories of packages that are compatible with the Python interpreter
    in Miniconda3: /nemo/project/home/tanb/.cache/R/basilisk/1.10.2/0
+ /nemo/project/home/tanb/.cache/R/basilisk/1.10.2/0/bin/conda 'create' '--yes' '--prefix' '/nemo/project/home/tanb/.cache/R/basilisk/1.10.2/crisprScore/1.2.0/deephf_basilisk' 'python=3.6' '--quiet' '-c' 'conda-forge' '-c' 'bioconda'
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /nemo/project/home/tanb/.cache/R/basilisk/1.10.2/crisprScore/1.2.0/deephf_basilisk

  added / updated specs:
    - python=3.6


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2016.9.26          |           py36_0         217 KB  conda-forge
    openssl-1.1.1v             |       hd590300_0         1.9 MB  conda-forge
    pip-20.0.2                 |           py36_1         1.9 MB  conda-forge
    python-3.6.15              |hb7a2778_0_cpython        38.4 MB  conda-forge
    setuptools-49.6.0          |   py36h5fab9bb_3         936 KB  conda-forge
    wheel-0.36.2               |     pyhd3deb0d_0          31 KB  conda-forge
    ------------------------------------------------------------
                                           Total:        43.3 MB

The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu
  ca-certificates    conda-forge/linux-64::ca-certificates-2023.7.22-hbcca054_0
  certifi            conda-forge/linux-64::certifi-2016.9.26-py36_0
  ld_impl_linux-64   conda-forge/linux-64::ld_impl_linux-64-2.40-h41732ed_0
  libffi             conda-forge/linux-64::libffi-3.4.2-h7f98852_5
  libgcc-ng          conda-forge/linux-64::libgcc-ng-13.1.0-he5830b7_0
  libgomp            conda-forge/linux-64::libgomp-13.1.0-he5830b7_0
  libnsl             conda-forge/linux-64::libnsl-2.0.0-h7f98852_0
  libsqlite          conda-forge/linux-64::libsqlite-3.42.0-h2797004_0
  libstdcxx-ng       conda-forge/linux-64::libstdcxx-ng-13.1.0-hfd8a6a1_0
  libzlib            conda-forge/linux-64::libzlib-1.2.13-hd590300_5
  ncurses            conda-forge/linux-64::ncurses-6.4-hcb278e6_0
  openssl            conda-forge/linux-64::openssl-1.1.1v-hd590300_0
  pip                conda-forge/linux-64::pip-20.0.2-py36_1
  python             conda-forge/linux-64::python-3.6.15-hb7a2778_0_cpython
  python_abi         conda-forge/linux-64::python_abi-3.6-2_cp36m
  readline           conda-forge/linux-64::readline-8.2-h8228510_1
  setuptools         conda-forge/linux-64::setuptools-49.6.0-py36h5fab9bb_3
  sqlite             conda-forge/linux-64::sqlite-3.42.0-h2c6b66d_0
  tk                 conda-forge/linux-64::tk-8.6.12-h27826a3_0
  wheel              conda-forge/noarch::wheel-0.36.2-pyhd3deb0d_0
  xz                 conda-forge/linux-64::xz-5.2.6-h166bdaf_0


Preparing transaction: ...working... 
done
Verifying transaction: ...working... done
Executing transaction: ...working... done
+ /nemo/project/home/tanb/.cache/R/basilisk/1.10.2/0/bin/conda 'install' '--yes' '--prefix' '/nemo/project/home/tanb/.cache/R/basilisk/1.10.2/crisprScore/1.2.0/deephf_basilisk' 'python=3.6'
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.12.0
  latest version: 23.7.2

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

+ /nemo/project/home/tanb/.cache/R/basilisk/1.10.2/0/bin/conda 'install' '--yes' '--prefix' '/nemo/project/home/tanb/.cache/R/basilisk/1.10.2/crisprScore/1.2.0/deephf_basilisk' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.6' 'python=3.6' 'viennarna=2.4.5' 'absl-py=0.12.0' 'astor=0.8.1' 'biopython=1.71' 'bleach=1.5.0' 'certifi=2020.12.5' 'cycler=0.10.0' 'decorator=4.4.2' 'dotmap=1.2.20' 'gast=0.4.0' 'GPy=1.9.8' 'GPyOpt=1.2.6' 'grpcio=1.36.1' 'h5py=2.9.0' 'html5lib=0.9999999' 'importlib-metadata=3.7.2' 'Keras=2.1.6' 'kiwisolver=1.3.1' 'Mako=1.1.4' 'Markdown=3.3.4' 'MarkupSafe=1.1.1' 'matplotlib=3.1.1' 'numpy=1.14.0' 'pandas=0.25.3' 'paramz=0.9.5' 'pip=21.0.1' 'protobuf=3.15.5' 'pygpu=0.7.6' 'pyparsing=2.4.7' 'python-dateutil=2.8.1' 'pytz=2021.1' 'PyYAML=5.4.1' 'scikit-learn=0.19.1' 'scipy=1.1.0' 'setuptools=49.6.0' 'six=1.15.0' 'tensorboard=1.8.0' 'tensorflow=1.8.0' 'termcolor=1.1.0' 'Theano=1.0.5' 'tornado=6.1' 'typing-extensions=3.7.4.3' 'webencodings=0.5.1' 'Werkzeug=1.0.1' 'wheel=0.36.2' 'zipp=3.4.1'
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: - 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (python):

That output is followed by a long list of incompatible packages, which ends with the following:

Package libedit conflicts for:
sqlite -> libedit[version='>=3.1.20170329,<3.2.0a0|>=3.1.20181209,<3.2.0a0|>=3.1.20191231,<3.2.0a0']
python[version='3.6.*,3.6.*'] -> sqlite[version='>=3.33.0,<4.0a0'] -> libedit[version='>=3.1.20170329,<3.2.0a0|>=3.1.20181209,<3.2.0a0|>=3.1.20191231,<3.2.0a0']

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.17=0
  - feature:|@/linux-64::__glibc==2.17=0
  - biopython=1.71 -> libgcc-ng[version='>=7.2.0'] -> __glibc[version='>=2.17']
  - gpy=1.9.8 -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - grpcio=1.36.1 -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']
  - h5py=2.9.0 -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - keras=2.1.6 -> tensorflow -> __cuda
  - keras=2.1.6 -> tensorflow -> __glibc[version='>=2.17']
  - kiwisolver=1.3.1 -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']
  - libffi -> libgcc-ng[version='>=9.4.0'] -> __glibc[version='>=2.17']
  - libnsl -> libgcc-ng[version='>=9.4.0'] -> __glibc[version='>=2.17']
  - libstdcxx-ng -> __glibc[version='>=2.17']
  - libzlib -> libgcc-ng[version='>=10.3.0'] -> __glibc[version='>=2.17']
  - markupsafe=1.1.1 -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']
  - matplotlib=3.1.1 -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - ncurses -> libgcc-ng[version='>=10.3.0'] -> __glibc[version='>=2.17']
  - numpy=1.14.0 -> libgcc-ng[version='>=7.2.0'] -> __glibc[version='>=2.17']
  - openssl -> libgcc-ng[version='>=10.3.0'] -> __glibc[version='>=2.17']
  - pandas=0.25.3 -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - protobuf=3.15.5 -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']
  - pygpu=0.7.6 -> libgcc-ng[version='>=9.4.0'] -> __glibc[version='>=2.17']
  - python[version='3.6.*,3.6.*'] -> libgcc-ng[version='>=9.4.0'] -> __glibc[version='>=2.17']
  - pyyaml=5.4.1 -> libgcc-ng[version='>=9.4.0'] -> __glibc[version='>=2.17']
  - readline -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']
  - scikit-learn=0.19.1 -> libgcc-ng[version='>=7.2.0'] -> __glibc[version='>=2.17']
  - scipy=1.1.0 -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - sqlite -> libgcc-ng[version='>=10.3.0'] -> __glibc[version='>=2.17']
  - tensorboard=1.8.0 -> libgcc-ng[version='>=7.2.0'] -> __glibc[version='>=2.17']
  - tk -> libgcc-ng[version='>=9.4.0'] -> __glibc[version='>=2.17']
  - tornado=6.1 -> libgcc-ng[version='>=10.3.0'] -> __glibc[version='>=2.17']
  - viennarna=2.4.5 -> libgcc-ng[version='>=4.9'] -> __glibc[version='>=2.17']
  - xz -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.17

Note that strict channel priority may have removed packages required for satisfiability.

Error: one or more Python packages failed to install [error code 1]

Could you please advise how to fix this issue? Thank you.

Change default to FALSE for standard_chr_only

addGeneAnnotation module can not annotate the genes on the minus strand

Hi authors,

The organism I study is Schizosaccharomyces pombe. The latest GFF annotation file comes from the PomBase website, not from Ensembl, so I constructed the TxDb object from the GFF file. I also built the BSgenome myself using the BSgenome package. I want to find spacer sequences targeting a specific gene, following the Design_CRISPRko_Cas9 web page.

Strangely, the program works very well for all genes on the plus strand, but it produces an error for genes on the minus strand ('start' or 'end' cannot contain NAs).

The detailed procedure is shown below.

> pombe_gff_file <- "./DY47073.gff3"
> pombe_txdb <- getTxDb(pombe_gff_file, organism=NA)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> pombe_grList <- TxDb2GRangesList(pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> library(BSgenome.Spombe.DY47073.4CrisprVerse)
> mybsgenome <- BSgenome.Spombe.DY47073.4CrisprVerse



> # gene on the plus strand
> pombe_ade6_gr <- queryTxObject(txObject=pombe_txdb,
+                                featureType="cds",
+                                queryColumn="gene_id",
+                                queryValue="SPCC1322.13")
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> pombe_ade6_guideSet <- findSpacers(pombe_ade6_gr,
+                              bsgenome=mybsgenome,
+                              crisprNuclease=SpCas9)
> 
> pombe_ade6_guideSet <- addGeneAnnotation(pombe_ade6_guideSet,
+                                    txObject=pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns


> # gene on the minus strand
> pombe_vtc4_gr <- queryTxObject(txObject=pombe_txdb,
+                                featureType="cds",
+                                queryColumn="gene_id",
+                                queryValue="SPCC1322.14c")
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> pombe_vtc4_guideSet <- findSpacers(pombe_vtc4_gr,
+                                    bsgenome=mybsgenome,
+                                    crisprNuclease=SpCas9)
> 
> pombe_vtc4_guideSet <- addGeneAnnotation(pombe_vtc4_guideSet,
+                                          txObject=pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
Error in .new_IRanges_from_start_end(start, end) : 
  'start' or 'end' cannot contain NAs

Below are the entries for the two genes from the GFF file.

chrIII	Liftoff	gene	1363196	1364854	.	+	.	ID=SPCC1322.13;Name=ade6;so_term_name=protein_coding_gene;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=SPCC1322.13_0
chrIII	Liftoff	mRNA	1363196	1364854	.	+	.	ID=SPCC1322.13.1;Parent=SPCC1322.13;product=phosphoribosylaminoimidazole carboxylase Ade6;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
chrIII	Liftoff	exon	1363196	1364854	.	+	.	ID=SPCC1322.13.1:exon:1;Parent=SPCC1322.13.1;product=phosphoribosylaminoimidazole carboxylase Ade6;extra_copy_number=0
chrIII	Liftoff	CDS	1363196	1364854	.	+	0	ID=SPCC1322.13.1:CDS:1;Parent=SPCC1322.13.1;product=phosphoribosylaminoimidazole carboxylase Ade6;extra_copy_number=0

chrIII	Liftoff	gene	1365284	1367449	.	-	.	ID=SPCC1322.14c;Name=vtc4;so_term_name=protein_coding_gene;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=SPCC1322.14c_0
chrIII	Liftoff	mRNA	1365284	1367449	.	-	.	ID=SPCC1322.14c.1;Parent=SPCC1322.14c;product=vacuolar transporter chaperone (VTC) complex subunit;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
chrIII	Liftoff	exon	1365284	1367449	.	-	.	ID=SPCC1322.14c.1:exon:1;Parent=SPCC1322.14c.1;product=vacuolar transporter chaperone (VTC) complex subunit;extra_copy_number=0
chrIII	Liftoff	CDS	1365284	1367449	.	-	0	ID=SPCC1322.14c.1:CDS:1;Parent=SPCC1322.14c.1;product=vacuolar transporter chaperone (VTC) complex subunit;extra_copy_number=0

I don't know how to debug this. I am looking forward to your reply.

crisprVerse problem: export a GuideSet to a file

Shortcut to my problem:
I need to find unique protospacers. I forged a BSgenome R package from a newly released genome of A. thaliana, so there is no TxDb object for it. Without a geneModel, I cannot visualize my best protospacers from the GuideSet with crisprViz. I need to visualize the GuideSet together with its off-target information.

Longer explanation:

I created a BSgenome package, and it seems to work.
I defined a GRanges object covering three regions in the centromere of Chr2 in which to search for protospacers:
gr_only_Chr2 <- GRanges(seqnames = c("Chr2"), strand = c("*"), ranges = IRanges(start = c(5430214,6216078,6584087), end = c(5439350,6250009,6645423)))
I successfully created a GuideSet with findSpacers(); it returned a GuideSet object with 5152 ranges:
guideSet <- findSpacers(gr_only_Chr2, bsgenome=BSgenome.AtNaish2021.NCBI.Col, crisprNuclease=LbCas12a)
I built a Bowtie index to search for off-target matches of the designed protospacers across all chromosomes. I also tried single chromosomes to reduce the computing demand in later steps:
fastaFile <- "C:/folder/sequence.fasta"
outdirAll <- tempdir()
Rbowtie::bowtie_build(fastaFile, outdir=outdirAll, force=TRUE, prefix="Arabidopsis")
index <- file.path(outdirAll, "Arabidopsis")
With addSpacerAlignments(), I aligned the protospacers against my newly forged BSgenome package:
guideSet <- addSpacerAlignments(guideSet, aligner = "bowtie", colname = "alignments", aligner_index=index, bsgenome=bsgenome, n_mismatches=1, n_max_alignments = 3)

I am now struggling to visualize my data. I do not have annotations, so plotGuideSet() from crisprViz cannot be used. I am currently trying to build a correctly formatted GRangesList from a TxDb to use as the geneModel, which will take time.

My questions are:
Is there a simple way to export the GuideSet as a data frame to a text file, to speed up my analysis?
Should I invest time in creating "geneModel-friendly" annotations to visualize my results with plotGuideSet()? If so, do you have suggestions on how to do it?

Thanks for your kind assistance.
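
One possible way to approach the export question is to flatten the object to a plain data frame and write it with base R. The sketch below is an illustration under stated assumptions, not a confirmed crisprDesign workflow: write_flat_table() is a hypothetical helper, the toy data frame (with made-up values) stands in for whatever as.data.frame(guideSet) might return, and list-like columns such as nested alignments are simply dropped because they do not fit in a flat text file. Whether as.data.frame() works on a GuideSet that carries nested alignment columns has not been verified here. GuideSet2DataFrames, which is mentioned in another issue on this page, may be a more direct route.

```r
# Hypothetical helper: write the flat (non-list) columns of a data frame
# to a tab-separated text file and return the file path invisibly.
write_flat_table <- function(df, path) {
  # Keep only atomic columns; list columns cannot be written by write.table()
  is_flat <- vapply(df, is.atomic, logical(1))
  utils::write.table(df[, is_flat, drop = FALSE],
                     file = path,
                     sep = "\t",
                     quote = FALSE,
                     row.names = FALSE)
  invisible(path)
}

# Toy stand-in for as.data.frame(guideSet); all values are made up
toy <- data.frame(seqnames = c("Chr2", "Chr2"),
                  start = c(5430220L, 5430301L),
                  strand = c("+", "-"),
                  stringsAsFactors = FALSE)
toy$alignments <- list(1:3, 4:5)  # list column, dropped on export

out_file <- write_flat_table(toy, tempfile(fileext = ".txt"))
exported <- utils::read.delim(out_file, stringsAsFactors = FALSE)
```

Dropping list columns keeps the file readable in a spreadsheet, at the cost of losing the per-guide alignment details; those would need to be exported separately.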

guideSetExampleFullAnnotation lacking exon_id in geneAnnotation.

addGeneAnnotation now includes exon_id, which is necessary for addExonTable function; need to regenerate guideSetExampleFullAnnotation object.

What is the meaning of "alignments" in a GuideSet object?

For example, if I run the following code:

library(crisprDesign)
data(guideSetExampleWithAlignments, package="crisprDesign")
test<-guideSetExampleWithAlignments
unlist(mcols(test)["spacer_10",]$alignments)

I get:

GRanges object with 28 ranges and 15 metadata columns:
                         seqnames    ranges strand |               spacer          protospacer
                            <Rle> <IRanges>  <Rle> |       <DNAStringSet>       <DNAStringSet>
  spacer_524.spacer_1019    chr12 101842137      + | TGCTTTCCAGATTACTCTCA TGCTTTCCAGAATACTGTCA
   spacer_524.spacer_505    chr19  51689401      + | CAGAGCCGTGGAGGAGGAGA CAGAGCCGGGGAGGAGGTGA
   spacer_524.spacer_505    chr11  48066239      - | CAGAGCCGTGGAGGAGGAGA CAGAGCACTGGAGGAGGAGT
   spacer_524.spacer_505    chr12  78501961      - | CAGAGCCGTGGAGGAGGAGA CAGAGACAGGGAGGAGGAGA
   spacer_524.spacer_505     chr1 227493027      - | CAGAGCCGTGGAGGAGGAGA CACAGCCGAGGAGGAGGAGG
                     ...      ...       ...    ... .                  ...                  ...
    spacer_524.spacer_10    chr14 105159879      - | GGGTGTGGATGAGGCTCTGC GGGTGTGGATGGAGCTCTGG
    spacer_524.spacer_10    chr16  49657876      - | GGGTGTGGATGAGGCTCTGC GGCTGTGGAGGAGGCTATGC
   spacer_524.spacer_965     chrX 103979982      - | GGCTGCAGCACACCAGGCGG GGCTGCAGCAGCCCAGGCTG
   spacer_524.spacer_965    chr17  41047092      + | GGCTGCAGCACACCAGGCGG GGTCGCAGCACACCGGGCGG
   spacer_524.spacer_503    chr12    138847      - | AGGAGGAGACGGATATGTTC AGGAGGAGACGGATATGTTC
...

Why do the alignments for spacer_10 include many results from other spacers?

Validating Existing gRNA Libraries - Error in Off-Target Characterization (addSpacerAlignments)

Hi, I really appreciate the tools provided by the crisprVerse team. I tried to score different sgRNA libraries following the Validating Existing gRNA Libraries tutorial. First, I used the Avana library (70018 rows) and successfully generated the on- and off-target scores. However, when I used the Cellecta library (150076 rows), an error occurred in the addSpacerAlignments function (off-target characterization).

[runCrisprBowtie] Using BSgenome.Hsapiens.UCSC.hg38 
[runCrisprBowtie] Searching for SpCas9 protospacers 

reads processed: 149545
reads with at least one alignment: 149545 (100.00%)
reads that failed to align: 0 (0.00%)
Reported 6177820 alignments

Error in METHOD(x, i) : 
  Subsetting operation on CompressedGRangesList object 'x'
  produces a result that is too big to be represented as a
  CompressedList object. Please try to coerce 'x' to a SimpleList
  object first (with 'as(x, "SimpleList")').

The alignment step generates a large data set (614520 rows). After the subsequent filtering and construction of the GuideSet as described in the tutorial, the resulting GuideSet has 231660 rows. I also noticed that this error is similar to ones reported for other packages in #312 and #328. Any suggestions or advice would be appreciated.
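
The error message suggests that a single list-like object grew beyond what a CompressedList can hold. One workaround that might be worth trying, untested here, is to process the guides in smaller batches and combine the results afterwards. The sketch below shows only the batching logic in base R: chunk_indices() is a hypothetical helper, the chunk size of 50000 is an arbitrary choice, and whether per-chunk GuideSet results can be concatenated with c() is an assumption.

```r
# Hypothetical helper: split the indices 1..n into consecutive chunks
# of at most chunk_size elements each.
chunk_indices <- function(n, chunk_size) {
  stopifnot(n >= 0, chunk_size >= 1)
  idx <- seq_len(n)
  unname(split(idx, ceiling(idx / chunk_size)))
}

# 231660 is the GuideSet size reported above
chunks <- chunk_indices(231660, 50000)
lengths(chunks)

# Assumed usage (not run here; arguments elided):
# results <- lapply(chunks, function(i) addSpacerAlignments(guideSet[i], ...))
# guideSet <- do.call(c, results)
```

Smaller chunks lower the peak size of any intermediate object but add per-call overhead, so the chunk size would need tuning for the library at hand.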

This is my session info

R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_IE.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_IE.UTF-8        LC_COLLATE=en_IE.UTF-8    
 [5] LC_MONETARY=en_IE.UTF-8    LC_MESSAGES=en_IE.UTF-8   
 [7] LC_PAPER=en_IE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_IE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reshape_0.8.9                     ggfortify_0.4.16                 
 [3] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.66.3                  
 [5] Biostrings_2.66.0                 XVector_0.38.0                   
 [7] crisprDesignData_0.99.28          crisprViz_1.0.0                  
 [9] crisprDesign_1.0.0                crisprScore_1.2.0                
[11] crisprScoreData_1.2.0             ExperimentHub_2.6.0              
[13] AnnotationHub_3.6.0               BiocFileCache_2.6.1              
[15] dbplyr_2.3.2                      crisprBowtie_1.2.0               
[17] crisprBase_1.2.0                  crisprVerse_1.0.0                
[19] splitstackshape_1.4.8             rtracklayer_1.58.0               
[21] GenomicRanges_1.50.2              GenomeInfoDb_1.34.9              
[23] IRanges_2.32.0                    S4Vectors_0.36.2                 
[25] BiocGenerics_0.44.0               geno2proteo_0.0.6                
[27] patchwork_1.1.2                   hgnc_0.1.2                       
[29] data.table_1.14.8                 lubridate_1.9.2                  
[31] forcats_1.0.0                     stringr_1.5.0                    
[33] dplyr_1.1.1                       purrr_1.0.1                      
[35] readr_2.1.4                       tidyr_1.3.0                      
[37] tibble_3.2.1                      ggplot2_3.4.2                    
[39] tidyverse_2.0.0                   UniprotR_2.2.2                   

loaded via a namespace (and not attached):
  [1] utf8_1.2.3                    reticulate_1.28              
  [3] R.utils_2.12.2                RUnit_0.4.32                 
  [5] tidyselect_1.2.0              RSQLite_2.3.1                
  [7] AnnotationDbi_1.60.2          htmlwidgets_1.6.2            
  [9] grid_4.2.3                    BiocParallel_1.32.6          
 [11] airr_1.4.1                    munsell_0.5.0                
 [13] codetools_0.2-19              interp_1.1-4                 
 [15] withr_2.5.0                   colorspace_2.1-0             
 [17] Biobase_2.58.0                filelock_1.0.2               
 [19] knitr_1.42                    rstudioapi_0.14              
 [21] ggsignif_0.6.4                MatrixGenerics_1.10.0        
 [23] GenomeInfoDbData_1.2.9        bit64_4.0.5                  
 [25] basilisk_1.10.2               vctrs_0.6.1                  
 [27] generics_0.1.3                xfun_0.38                    
 [29] biovizBase_1.46.0             timechange_0.2.0             
 [31] randomForest_4.7-1.1          R6_2.5.1                     
 [33] AnnotationFilter_1.22.0       bitops_1.0-7                 
 [35] cachem_1.0.7                  DelayedArray_0.24.0          
 [37] vroom_1.6.1                   promises_1.2.0.1             
 [39] BiocIO_1.8.0                  networkD3_0.4                
 [41] scales_1.2.1                  nnet_7.3-18                  
 [43] gtable_0.3.3                  ensembldb_2.22.0             
 [45] rlang_1.1.0                   rstatix_0.7.2                
 [47] lazyeval_0.2.2                dichromat_2.0-0.1            
 [49] checkmate_2.1.0               broom_1.0.4                  
 [51] BiocManager_1.30.20           yaml_2.3.7                   
 [53] abind_1.4-5                   GenomicFeatures_1.50.4       
 [55] backports_1.4.1               httpuv_1.6.9                 
 [57] Hmisc_5.0-1                   tools_4.2.3                  
 [59] ellipsis_0.3.2                RColorBrewer_1.1-3           
 [61] Rcpp_1.0.10                   plyr_1.8.8                   
 [63] base64enc_0.1-3               progress_1.2.2               
 [65] zlibbioc_1.44.0               RCurl_1.98-1.12              
 [67] basilisk.utils_1.10.0         prettyunits_1.1.1            
 [69] deldir_1.0-6                  rpart_4.1.19                 
 [71] ggpubr_0.6.0                  cluster_2.1.4                
 [73] SummarizedExperiment_1.28.0   magrittr_2.0.3               
 [75] magick_2.7.4                  alakazam_1.2.1               
 [77] ProtGenerics_1.30.0           matrixStats_0.63.0           
 [79] evaluate_0.20                 hms_1.1.3                    
 [81] mime_0.12                     xtable_1.8-4                 
 [83] XML_3.99-0.14                 jpeg_0.1-10                  
 [85] gridExtra_2.3                 compiler_4.2.3               
 [87] biomaRt_2.54.1                crayon_1.5.2                 
 [89] R.oo_1.25.0                   htmltools_0.5.5              
 [91] later_1.3.0                   tzdb_0.3.0                   
 [93] Formula_1.2-5                 qdapRegex_0.7.5              
 [95] Rbowtie_1.38.0                DBI_1.1.3                    
 [97] gprofiler2_0.2.1              MASS_7.3-58.2                
 [99] rappdirs_0.3.3                data.tree_1.0.0              
[101] Matrix_1.5-3                  ade4_1.7-22                  
[103] car_3.1-2                     cli_3.6.1                    
[105] R.methodsS3_1.8.2             parallel_4.2.3               
[107] Gviz_1.42.1                   igraph_1.4.2                 
[109] pkgconfig_2.0.3               GenomicAlignments_1.34.1     
[111] dir.expiry_1.6.0              foreign_0.8-84               
[113] plotly_4.10.1                 xml2_1.3.3                   
[115] VariantAnnotation_1.44.1      digest_0.6.31                
[117] rmarkdown_2.21                htmlTable_2.4.1              
[119] restfulr_0.0.15               curl_5.0.0                   
[121] shiny_1.7.4                   Rsamtools_2.14.0             
[123] rjson_0.2.21                  lifecycle_1.0.3              
[125] nlme_3.1-162                  jsonlite_1.8.4               
[127] carData_3.0-5                 seqinr_4.2-30                
[129] viridisLite_0.4.1             fansi_1.0.4                  
[131] pillar_1.9.0                  ggsci_3.0.0                  
[133] lattice_0.20-45               KEGGREST_1.38.0              
[135] fastmap_1.1.1                 httr_1.4.5                   
[137] interactiveDisplayBase_1.36.0 glue_1.6.2                   
[139] png_0.1-8                     BiocVersion_3.16.0           
[141] bit_4.0.5                     stringi_1.7.12               
[143] blob_1.2.4                    latticeExtra_0.6-30          
[145] memoise_2.0.1                 ape_5.7-1

The style specified by 'NCBI' does not have a compatible entry for the species Danio rerio

When I use TxDb2GRangesList, I get the error "The style specified by 'NCBI' does not have a compatible entry for the species Danio rerio". Do you know why? I could not find any code that contains extractSeqlevels.

Increase barcode length for Levenshtein distance

For n cycles, also search in the n+min_dist_edit-length space

`addSpacerAlignmentsIterative` – Error in `curl::curl_fetch_memory`

When I run the alignment several times (while troubleshooting my code), I get this error:

R[write to console]: Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: [www.ensembl.org:443] Operation timed out after 10002 milliseconds with 514839 bytes received

`max_mm` vs `n_mismatches`

You can specify the number of mismatches for addSpacerAlignments using n_mismatches and you can do the same for addOffTargetScores using max_mm. Should there ever be a difference between the value for max_mm and for n_mismatches?

Error building gene annotation object with GFF for CHM13v2.0

Hi developers,
I am having some issues making a gene annotation object for CHM13 using the GFF file obtained here: https://github.com/marbl/CHM13

Here's the error message I got:

> txdb <- getTxDb(organism = "Homo sapiens", file = "./ref_genome/chm13v2.0_RefSeq_Liftoff_v5.1.gff3")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning messages:
1: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
  some transcripts have no "transcript_id" attribute ==> their name ("tx_name" column
  in the TxDb object) was set to NA
2: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
  the transcript names ("tx_name" column in the TxDb object) imported from the
  "transcript_id" attribute are not unique
3: In .find_exon_cds(exons, cds) :
  The following transcripts have exons that contain more than one CDS (only the first
  CDS was kept for each exon): NM_001134939.1, NM_001172437.2, NM_001184961.1,
  NM_001301020.1, NM_001301302.1, NM_001301371.1, NM_002537.3, NM_004152.3,
  NM_015068.3, NM_016178.2
> grList <- TxDb2GRangesList(txdb)
'select()' returned many:many mapping between keys and columns
Error in `rownames<-`(`*tmp*`, value = names(x)) : 
  missing values not allowed in rownames
In addition: Warning message:
In .set_group_names(ans, use.names, txdb, "tx") :
  some group names are NAs or duplicated

Could you please advise on how to proceed? I don't see any duplicated row names in the TxDb object, so I am unsure what the error message means.

Additionally, do the warning messages affect the generation of the GRanges object? There are some NA and duplicated tx_name values in the TxDb object.
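
The "missing values not allowed in rownames" error is at least consistent with the NA transcript names reported in the warnings, so a first diagnostic step could be to count missing and duplicated names. The sketch below is a base R illustration only: summarize_names() is a hypothetical helper and the toy names are made up. With a real TxDb, the names could presumably be obtained with GenomicFeatures::transcripts(txdb)$tx_name.

```r
# Hypothetical helper: report how many names are missing and which
# non-missing names occur more than once.
summarize_names <- function(x) {
  dup <- x[!is.na(x) & duplicated(x)]
  list(n_total = length(x),
       n_missing = sum(is.na(x)),
       duplicated = unique(dup))
}

# Toy transcript names (made up); with a real TxDb one might instead use
# tx_names <- GenomicFeatures::transcripts(txdb)$tx_name
tx_names <- c("tx_A", NA, "tx_B", "tx_A", NA)
report <- summarize_names(tx_names)
report
```

If the report shows missing or repeated names, it may help to look at the corresponding GFF records before building the TxDb.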

Thank you!

DNA and RNA bulge off target DSB

I want to account for the off-target risk from DNA and gRNA bulges when designing my protospacers. Is there a way in crisprVerse to do this with my custom genome packages, or should I rely on other programs, e.g. Cas-OFFinder?

Thanks for your work.

Unexpected PAM truncation

Hi JP and team, I'm trying to make a new CrisprNuclease object based on an enzyme that has been shown to have a more permissive PAM sequence, which I initially tried to encode by specifying more PAMs and weights. When I did this, I found that the PAMs appear to be internally capped at 4:

> pams
 [1] "(3/3)ACC" "(3/3)CCC" "(3/3)TCC" "(3/3)GCC" "(3/3)ACA" "(3/3)CCA" "(3/3)TCA" "(3/3)GCA" "(3/3)ACG" "(3/3)CCG" "(3/3)TCG" "(3/3)GCG"
[13] "(3/3)ACT" "(3/3)CCT" "(3/3)TCT" "(3/3)GCT"
> pw
 [1] 0.40 0.40 0.40 0.40 0.43 0.43 0.43 0.43 0.32 0.32 0.32 0.32 0.30 0.30 0.30 0.30
> 
> eNme2c <- CrisprNuclease("eNme2c",
+                          targetType="DNA",
+                          pams=pams,
+                          weights=pw,
+                          metadata=list(description="eNme2c nuclease, Cas9 variant from Neisseria meningitidis"),
+                          pam_side="3prime",
+                          spacer_length=20)
> 
> pams(eNme2c)
DNAStringSet object of length 4:
    width seq                                                                                                            names               
[1]     3 ACA                                                                                                            ACA
[2]     3 CCA                                                                                                            CCA
[3]     3 TCA                                                                                                            TCA
[4]     3 GCA                                                                                                            GCA

This does not happen when I make a simple Nuclease object, but it is introduced when I turn that into a CrisprNuclease:

> flarg <- Nuclease('Flarg', 'DNA', motifs = pams, weights = pw)
> motifs(flarg)
DNAStringSet object of length 16:
     width seq
 [1]     3 ACC
 [2]     3 CCC
 [3]     3 TCC
 [4]     3 GCC
 [5]     3 ACA
 ...   ... ...
[12]     3 GCG
[13]     3 ACT
[14]     3 CCT
[15]     3 TCT
[16]     3 GCT
> flarg.cn <- new("CrisprNuclease", flarg, pam_side="3prime", spacer_length = as.integer(20))
> pams(flarg.cn)
DNAStringSet object of length 4:
    width seq                                                                                                            names               
[1]     3 ACA                                                                                                            ACA
[2]     3 CCA                                                                                                            CCA
[3]     3 TCA                                                                                                            TCA
[4]     3 GCA                                                                                                            GCA

I personally have a workaround for this use case, but I thought I would raise it in case this isn't the functionality you want.

Thanks again for this awesome toolset!

I don't know why, but the off-targets found by addSpacerAlignments were absolutely wrong

Design spacer for CRISPR-Csm transcript targeting systems [feature request]

Are you adding any features for CRISPR-Csm transcript targeting systems?

https://www.nature.com/articles/s41587-022-01649-9

`addSpacerAlignments` fails to find off-targets with bowtie and custom genomes

Thanks for maintaining this suite of packages.
I just ran into a series of problems when using addSpacerAlignments with bowtie instead of biostrings as aligner.
Basically I'm working with bacterial genomes that are well known and annotated in NCBI but often have no BSgenome package available. I therefore need to build those custom-wise. This works only until the point I'm trying to annotate off targets with addSpacerAlignments. The following example illustrates this:

library(tidyverse)
library(Biostrings)
library(GenomicRanges)
library(GenomicFeatures)
library(crisprBase)
library(crisprDesign)

# import genome annotation from GFF
txdb <- makeTxDbFromGFF("results/get_genome/genome.gff")

# import sequence from FASTA
genome_dna <- readDNAStringSet("results/get_genome/genome.fasta")

# find spacers
data(list = c("SpCas9"), package = "crisprBase")
list_pred_guides <- findSpacers(
  x = genome_dna,
  spacer_len = 20,
  crisprNuclease = SpCas9
)

# add off targets using bowtie
library(BSgenomeSalmonellaenterica)
list_pred_guides <- addSpacerAlignments(
  list_pred_guides[1:1000],
  aligner = "bowtie",
  aligner_index = "results/bowtie_index/index",
  bsgenome = BSgenomeSalmonellaenterica,
  addSummary = TRUE,
  n_mismatches = 3
)

The output I get indicates that reads were aligned but they don't show up in the guide set:

Loading required namespace: crisprBwa
[runCrisprBowtie] Using BSgenomeSalmonellaenterica 
[runCrisprBowtie] Searching for SpCas9 protospacers 
# reads processed: 1000
# reads with at least one alignment: 1000 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 2408 alignments

> # but no alignments were added to guide set
> all(list_pred_guides$n0 == 0)
[1] TRUE

I then tried the same using the txdb object as input. This fails with either a timeout error or a biomaRt error related to the genome being non-standard.

addSpacerAlignments(
  list_pred_guides[1:1000],
  aligner = "bowtie",
  aligner_index = "results/bowtie_index/index",
  bsgenome = BSgenomeSalmonellaenterica,
  addSummary = TRUE,
  n_mismatches = 3,
  txObject = txdb
)

Error in .getBiomartData(txdb, organism) :                                                                         
  Organism "NA" not recognized in biomaRt. You can use",
                "organism=NULL as a solution.

Using the conversion function TxDb2GRangesList(txdb) as a fallback throws the same error.
I attached files with an example genome, the BSgenome library and the bowtie index --> input.zip

Any ideas for this problem are highly appreciated.

Thanks for your help!

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS/LAPACK: /home/michael/micromamba/envs/snakemake-crispr-guides/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] BSgenomeSalmonellaenterica_1.0.0 BSgenome_1.66.3                 
 [3] rtracklayer_1.58.0               crisprDesign_1.0.0              
 [5] crisprBase_1.2.0                 GenomicFeatures_1.50.4          
 [7] AnnotationDbi_1.60.2             Biobase_2.58.0                  
 [9] GenomicRanges_1.50.2             Biostrings_2.66.0               
[11] GenomeInfoDb_1.34.9              XVector_0.38.0                  
[13] IRanges_2.32.0                   S4Vectors_0.36.2                
[15] BiocGenerics_0.44.0              lubridate_1.9.2                 
[17] forcats_1.0.0                    stringr_1.5.0                   
[19] dplyr_1.1.1                      purrr_1.0.1                     
[21] readr_2.1.4                      tidyr_1.3.0                     
[23] tibble_3.2.1                     ggplot2_3.4.2                   
[25] tidyverse_2.0.0                 

loaded via a namespace (and not attached):
 [1] bitops_1.0-7                  matrixStats_0.63.0           
 [3] bit64_4.0.5                   filelock_1.0.2               
 [5] progress_1.2.2                httr_1.4.5                   
 [7] tools_4.2.2                   utf8_1.2.3                   
 [9] R6_2.5.1                      DBI_1.1.3                    
[11] colorspace_2.1-0              withr_2.5.0                  
[13] tidyselect_1.2.0              prettyunits_1.1.1            
[15] bit_4.0.5                     curl_4.3.3                   
[17] compiler_4.2.2                crisprBowtie_1.2.0           
[19] cli_3.6.1                     basilisk.utils_1.10.0        
[21] crisprScoreData_1.2.0         xml2_1.3.3                   
[23] DelayedArray_0.24.0           scales_1.2.1                 
[25] randomForest_4.7-1.1          rappdirs_0.3.3               
[27] digest_0.6.31                 Rsamtools_2.14.0             
[29] crisprScore_1.2.0             basilisk_1.10.2              
[31] htmltools_0.5.5               pkgconfig_2.0.3              
[33] MatrixGenerics_1.10.0         dbplyr_2.3.2                 
[35] fastmap_1.1.1                 rlang_1.1.0                  
[37] RSQLite_2.3.1                 shiny_1.7.4                  
[39] BiocIO_1.8.0                  generics_0.1.3               
[41] jsonlite_1.8.4                vroom_1.6.1                  
[43] BiocParallel_1.32.6           VariantAnnotation_1.44.1     
[45] RCurl_1.98-1.12               magrittr_2.0.3               
[47] GenomeInfoDbData_1.2.9        Matrix_1.5-4                 
[49] Rcpp_1.0.10                   munsell_0.5.0                
[51] fansi_1.0.4                   reticulate_1.28              
[53] Rbowtie_1.38.0                lifecycle_1.0.3              
[55] stringi_1.7.12                yaml_2.3.7                   
[57] SummarizedExperiment_1.28.0   zlibbioc_1.44.0              
[59] BiocFileCache_2.6.1           AnnotationHub_3.6.0          
[61] grid_4.2.2                    blob_1.2.4                   
[63] promises_1.2.0.1              parallel_4.2.2               
[65] ExperimentHub_2.6.0           crayon_1.5.2                 
[67] crisprBwa_1.2.0               dir.expiry_1.6.0             
[69] lattice_0.21-8                hms_1.1.3                    
[71] KEGGREST_1.38.0               pillar_1.9.0                 
[73] rjson_0.2.21                  codetools_0.2-19             
[75] biomaRt_2.54.1                XML_3.99-0.14                
[77] glue_1.6.2                    BiocVersion_3.16.0           
[79] BiocManager_1.30.20           httpuv_1.6.9                 
[81] png_0.1-8                     vctrs_0.6.1                  
[83] tzdb_0.3.0                    gtable_0.3.3                 
[85] cachem_1.0.7                  mime_0.12                    
[87] Rbwa_1.2.0                    xtable_1.8-4                 
[89] restfulr_0.0.15               later_1.3.0                  
[91] GenomicAlignments_1.34.1      memoise_2.0.1                
[93] timechange_0.2.0              ellipsis_0.3.2               
[95] interactiveDisplayBase_1.36.0

"downstreamATG" colname misspelled

"downstreamATG" colname misspelled in output.

onTargets for a GuideSet object always returns nothing

crisprViz: Error in TxDb2GRangesList (.getBiomartData(txdb, organism) : Organism "NA" not recognized in biomaRt)

I really appreciate the crisprVerse team for this robust and versatile tool to visualize and annotate sgRNAs. I tried crisprViz with the example datasets provided (gpr21GuideSet and gpr21GeneModel), and it works perfectly, just as in the tutorial.

Now I am interested in visualizing my sgRNAs targeting a particular gene of interest. First to build a gene model, the subset of txdb_human (GRangesList) was retrieved from crisprDesignData (as mentioned in the documentation on how to build the gpr21GeneModel). This is the step-by-step of what I have tried:

1. Import txdb_human (GRangesList) from crisprDesignData
2. Unlist the GRangesList object
3. Take a subset of txdb_human by selecting only one gene and its canonical transcript (using the subset function)
4. Create a 'type' column in the metadata to match the input format required by makeTxDbFromGRanges
5. Create the TxDb object with the makeTxDbFromGRanges function (this succeeded)
6. Convert the TxDb object from step 5 into a GRangesList (the format required by plotGuideSet in crisprViz) using TxDb2GRangesList

My plan is to run the plotGuideSet function directly once the GRangesList object has been created (I already have the sgRNA GuideSet object). However, in step 6, an error occurred:

> granges_list_gene_model <- TxDb2GRangesList(granges_gene_model_txdb, 
+                                             standardChromOnly = TRUE,
+                                             genome = 'hg38',
+                                             seqlevelsStyle = 'UCSC')
Error in .getBiomartData(txdb, organism) : 
Organism "NA" not recognized in biomaRt. You can use",
"organism=NULL as a solution.

I checked the genome information in the GRanges object of my gene model and compared it with gpr21GeneModel. Both indicate the same organism: Homo sapiens. I also noticed that TxDb2GRangesList has no parameter for specifying the organism.

Looking at the source code, it turns out this function is linked to another function, getTxDb, which does allow the user to specify the organism (default: Homo sapiens). Because the organism is not a parameter of TxDb2GRangesList, the user has no direct control over it, even though the error message suggests using 'organism = NULL' as a solution.

Could anyone on the team help with this error, either with the steps I have taken so far or with another way to resolve the issue? I look forward to hearing from you.

`addSpacerAlignments` maximum `n_mismatches` query

Our team has detected some off-target editing in vitro in regions with 4 or 5 mismatches when aligned to the gRNA, and we would like to design guides with this in mind. I understand that increasing the number of allowed mismatches would sharply increase the computational demand, but I think it would be useful to attempt addSpacerAlignments with n_mismatches > 3. Could this be implemented, please?
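
To give a rough sense of how quickly the search space grows, the number of distinct DNA sequences within k mismatches of a spacer of length L is the sum over i = 0..k of choose(L, i) * 3^i. The sketch below evaluates this for a 20-nt spacer. It counts candidate sequences only and says nothing about how any particular aligner is implemented; n_neighbors() is a hypothetical helper.

```r
# Hypothetical helper: number of DNA sequences within max_mismatches
# substitutions of a given sequence of length spacer_len.
n_neighbors <- function(spacer_len, max_mismatches) {
  i <- 0:max_mismatches
  # choose(spacer_len, i) positions, 3 alternative bases at each position
  sum(choose(spacer_len, i) * 3^i)
}

counts <- vapply(3:5, function(k) n_neighbors(20, k), numeric(1))
names(counts) <- paste0("k=", 3:5)
counts
# k=3: 32551, k=4: 424996, k=5: 4192468
```

Each additional allowed mismatch multiplies the number of candidate sequences by roughly 10 to 13 in this range, which is one way to read the concern about computational demand raised above.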

GuideSet2DataFrames: improvements

Change to handle NTCs
BE tables are currently being ignored.

Reported by Maggie

Support guide efficiency prediction for DjCas13d [feature request]

It would be nice if crisprVerse can support DjCas13d!

Wei, J., Lotfy, P., Faizi, K., Baungaard, S., Gibson, E., Wang, E., Slabodkin, H., Kinnaman, E., Chandrasekaran, S., Kitano, H., Durrant, M. G., Duffy, C. V., Pawluk, A., Hsu, P. D., & Konermann, S. (2023). Deep learning and CRISPR-Cas13d ortholog discovery for optimized RNA targeting. Cell systems, 14(12), 1087–1102.e13. https://doi.org/10.1016/j.cels.2023.11.006

also see https://github.com/ArcInstitute/RNAtargeting_web_custom

cc @nick-youngblut, @jingyi7777

addSpacerAlignments Error in .new_IRanges_from_start_width(start, width) : 'start' or 'width' cannot contain NAs

From this call:

addSpacerAlignments(guideSet,
           aligner="bowtie",
           aligner_index=canFam3,
           bsgenome=BSgenome.Cfamiliaris.UCSC.canFam3,
           n_mismatches=3,
           txObject=txdb_canine)

I get the following output, where IRanges complains:

[runCrisprBowtie] Using BSgenome.Cfamiliaris.UCSC.canFam3
[runCrisprBowtie] Searching for SpCas9 protospacers
# reads processed: 10
# reads with at least one alignment: 10 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 1808 alignments

 Error in .new_IRanges_from_start_width(start, width) : 'start' or 'width' cannot contain NAs

The error is not very informative. Could somebody kindly help troubleshoot this? It works when using the human data in the tutorial.