neurogenomics / hpoexplorer

Functions for working with the Human Phenotype Ontology data

Home Page: https://neurogenomics.github.io/HPOExplorer/

R 100.00%

ontologies rare-disease genetics bioinformatics r-package human-phenotype-ontology clinical-genomics phenome

hpoexplorer's Introduction

Authors: Brian Schilder, Robert Gordon-Smith, Nathan Skene

Most recent update: Mar-08-2024

Intro

About HPO

The Human Phenotype Ontology (HPO) is a controlled vocabulary of phenotypic abnormalities encountered in human disease. It currently contains over 18,000 hierarchically organised terms. Each term in the HPO describes a phenotypic abnormality, ranging from very broad phenotypes (e.g. “Abnormality of the nervous system”) down to extremely specific phenotypes (e.g. “Decreased CSF 5-hydroxyindolacetic acid concentration”).

The HPO is currently being used in thousands of exome and genome sequencing projects around the world to aid in the interpretation of human variation, in clinical practice to support differential diagnosis and to annotate patient information, and in research to understand the role of rare variants in human health and disease. The HPO was developed by the Monarch Initiative in collaboration with The Jackson Laboratory.

About `HPOExplorer`

HPOExplorer is an R package with extensive functions for easily importing, annotating, filtering, and visualising the Human Phenotype Ontology (HPO) at the disease, phenotype, and gene levels. By pulling fresh data directly from official resources like HPO, Monarch and GenCC, it ensures tightly controlled version coordination with the most up-to-date data available at any given time (with the option to use caching to boost speed). Furthermore, it can efficiently reorganise gene annotations into sparse matrices for usage within downstream statistical and machine learning analysis.
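As a rough illustration of the sparse-matrix idea mentioned above, here is a minimal sketch that turns a long-format phenotype-gene table into a gene-by-phenotype sparse matrix using the Matrix package. The table, its column names (`hpo_id`, `gene_symbol`) and the IDs are toy assumptions, and this is not necessarily how HPOExplorer does it internally.

```r
library(Matrix)  # recommended package shipped with standard R installations

# Toy long-format annotations (illustrative rows, not real HPO data)
annot <- data.frame(
  hpo_id = c("HP:TOY001", "HP:TOY001", "HP:TOY002"),
  gene_symbol = c("GENE_A", "GENE_B", "GENE_A"),
  stringsAsFactors = FALSE
)
annot <- unique(annot)  # sparseMatrix() sums duplicated (i, j) pairs

# Factors give each gene/phenotype a stable integer index
genes <- factor(annot$gene_symbol)
phenos <- factor(annot$hpo_id)

# 1 = gene is annotated to the phenotype; absent pairs are implicit zeros
gene_by_pheno <- Matrix::sparseMatrix(
  i = as.integer(genes),
  j = as.integer(phenos),
  x = rep(1, nrow(annot)),
  dims = c(nlevels(genes), nlevels(phenos)),
  dimnames = list(levels(genes), levels(phenos))
)
print(gene_by_pheno)
```

Only the non-zero entries are stored, which matters once the table spans thousands of phenotypes and genes.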

HPOExplorer was developed by the Neurogenomics Lab at Imperial College London, along with valuable feedback provided by the HPO team. This package is still actively evolving and growing. Community engagement is welcome and any suggestions can be submitted as an Issue or Pull Request.

Installation

Within R:

if(!require("BiocManager")) install.packages("BiocManager")

BiocManager::install("neurogenomics/HPOExplorer")
library(HPOExplorer)

Documentation website

Getting started

A quick tutorial on how to get started with HPOExplorer.

Docker

HPOExplorer is also available via DockerHub. The documentation website has instructions on how to create a Docker or Singularity container with HPOExplorer and RStudio pre-installed.

Citation

If you use HPOExplorer, please cite:

Kitty B. Murphy, Robert Gordon-Smith, Jai Chapman, Momoko Otani, Brian M. Schilder, Nathan G. Skene (2023) Identification of cell type-specific gene targets underlying thousands of rare diseases and subtraits. medRxiv, https://doi.org/10.1101/2023.02.13.23285820

Session Info

utils::sessionInfo()

## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.3.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/London
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.4        jsonlite_1.8.8      renv_1.0.3         
##  [4] dplyr_1.1.4         compiler_4.3.1      BiocManager_1.30.22
##  [7] tidyselect_1.2.0    rvcheck_0.2.1       scales_1.3.0       
## [10] yaml_2.3.8          fastmap_1.1.1       here_1.0.1         
## [13] ggplot2_3.4.4       R6_2.5.1            generics_0.1.3     
## [16] knitr_1.45          yulab.utils_0.1.4   tibble_3.2.1       
## [19] desc_1.4.3          dlstats_0.1.7       rprojroot_2.0.4    
## [22] munsell_0.5.0       pillar_1.9.0        RColorBrewer_1.1-3 
## [25] rlang_1.1.3         utf8_1.2.4          cachem_1.0.8       
## [28] badger_0.2.3        xfun_0.42           fs_1.6.3           
## [31] memoise_2.0.1.9000  cli_3.6.2           magrittr_2.0.3     
## [34] rworkflows_1.0.1    digest_0.6.34       grid_4.3.1         
## [37] rstudioapi_0.15.0   lifecycle_1.0.4     vctrs_0.6.5        
## [40] data.table_1.15.0   evaluate_0.23       glue_1.7.0         
## [43] fansi_1.0.6         colorspace_2.1-0    rmarkdown_2.25     
## [46] tools_4.3.1         pkgconfig_2.0.3     htmltools_0.5.7

hpoexplorer's Issues

Visualise disease-phenotype overlap

Currently doing this via visNetwork networks, but this gets messy when you try to scale it up to all phenotypes.

Instead, might try to summarise this with a circos plot, where line thickness is the number of genes (or proportion of genes, or Pearson correlation) shared between a particular phenotype and each associated disease.
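The link weights for such a plot could be computed along these lines. This is only a sketch with toy gene sets (all names are made up), showing the shared-gene count and the proportion variants in base R; the plotting step itself is left out.

```r
# Toy gene sets (hypothetical names, not real annotations)
pheno_genes <- list(
  PhenoA = c("G1", "G2", "G3"),
  PhenoB = c("G2", "G4")
)
disease_genes <- list(
  DiseaseX = c("G1", "G2"),
  DiseaseY = c("G3", "G4", "G5")
)

# Rows = phenotypes, columns = diseases, values = number of shared genes
overlap <- sapply(disease_genes, function(d) {
  sapply(pheno_genes, function(p) length(intersect(p, d)))
})

# Proportion of each phenotype's genes shared with each disease
# (the vector is recycled down the rows, so row i is divided by its own size)
overlap_prop <- overlap / lengths(pheno_genes)

print(overlap)
print(overlap_prop)
```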

Might also want to cite this paper at some point in the rare diseases manuscript:
https://twitter.com/MiriForbes/status/1638308495979352064?s=20

add ontoPlot heatmap function

Need to add the functions to build the non-interactive version of the network plot with heat mapped colour for nodes.

It's just a modification of the onto_plot function that comes with the ontologyIndex package to allow mapping of the results to the colour of the nodes.
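Mapping results to node colours can be sketched independently of the plotting function. The snippet below bins toy per-term values onto a white-to-red gradient; the term IDs and values are invented, and how the resulting vector gets passed to the plot is left open, since that depends on the plotting function's own arguments.

```r
# Toy per-term results (hypothetical IDs and values)
values <- c("HP:TOY001" = 0, "HP:TOY002" = 0.5, "HP:TOY003" = 1)

# 100-step gradient from white (low) to red (high)
gradient <- grDevices::colorRampPalette(c("white", "red"))(100)

# Assign every value to one of 100 equal-width bins across the observed range
bins <- cut(
  values,
  breaks = seq(min(values), max(values), length.out = 101),
  include.lowest = TRUE,
  labels = FALSE
)

# Named vector of hex colours, one per term
node_colours <- stats::setNames(gradient[bins], names(values))
print(node_colours)
```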

Start version at 0.99.0

This conforms to Bioconductor standards

Improve HPO data version control

One of the most important things for consistency with the rare disease analyses is knowing exactly which version of the HPO data we're using.

The caching feature within HPOExplorer is helpful in that it speeds things up, but it also means it's unclear which version of the data was cached.

The HPO ontology obj: `get_hpo`

As I now get the ontology from the official HPO GH releases, it already includes metadata about the precise release version.
This is accessible via attr(hpo,"version") or simply by printing the hpo object.
I've also added a new internal function make_hpo which constructs a new ontologyIndex object from the OBO file provided on the HPO GH Releases page.

> attr(hpo,"version")
 [1] "format-version: 1.2"                                                                                                                                                                                                             
 [2] "data-version: hp/releases/2023-10-09/hp-base.owl"                                                                                                                                                                                
 [3] "subsetdef: hposlim_core \"Core clinical terminology\""                                                                                                                                                                           
 [4] "subsetdef: secondary_consequence \"Consequence of a disorder in another organ system.\""                                                                                                                                         
 [5] "synonymtypedef: abbreviation \"abbreviation\""                                                                                                                                                                                   
 [6] "synonymtypedef: HP:0034334 \"allelic_requirement\""                                                                                                                                                                              
 [7] "synonymtypedef: layperson \"layperson term\""                                                                                                                                                                                    
 [8] "synonymtypedef: obsolete_synonym \"discarded/obsoleted synonym\""                                                                                                                                                                
 [9] "synonymtypedef: plural_form \"plural form\""                                                                                                                                                                                     
[10] "synonymtypedef: uk_spelling \"UK spelling\""                                                                                                                                                                                     
[11] "default-namespace: human_phenotype"                                                                                                                                                                                              
[12] "remark: Please see license of HPO at http://www.human-phenotype-ontology.org"                                                                                                                                                    
[13] "ontology: hp/hp-base"                                                                                                                                                                                                            
[14] "property_value: http://purl.org/dc/elements/1.1/creator \"Human Phenotype Ontology Consortium\" xsd:string"                                                                                                                      
[15] "property_value: http://purl.org/dc/elements/1.1/creator \"Monarch Initiative\" xsd:string"                                                                                                                                       
[16] "property_value: http://purl.org/dc/elements/1.1/creator \"Peter Robinson\" xsd:string"                                                                                                                                           
[17] "property_value: http://purl.org/dc/elements/1.1/creator \"Sebastian Köhler\" xsd:string"
[18] "property_value: http://purl.org/dc/elements/1.1/description \"The Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities and clinical features encountered in human disease.\" xsd:string"
[19] "property_value: http://purl.org/dc/elements/1.1/rights \"Peter Robinson, Sebastian Koehler, The Human Phenotype Ontology Consortium, and The Monarch Initiative\" xsd:string"                                                    
[20] "property_value: http://purl.org/dc/elements/1.1/subject \"Phenotypic abnormalities encountered in human disease\" xsd:string"                                                                                                    
[21] "property_value: http://purl.org/dc/elements/1.1/title \"Human Phenotype Ontology\" xsd:string"                                                                                                                                   
[22] "property_value: http://purl.org/dc/elements/1.1/type IAO:8000001"                                                                                                                                                                
[23] "property_value: http://purl.org/dc/terms/license https://hpo.jax.org/app/license"                                                                                                                                                
[24] "property_value: IAO:0000700 HP:0000001"                                                                                                                                                                                          
[25] "property_value: owl:versionInfo \"2023-10-09\" xsd:string"                                                                                                                                                                       
[26] "logical-definition-view-relation: has_part"
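If only the release date is needed, it can be pulled out of header lines like these with a regular expression. A small sketch, using a shortened copy of the header above rather than a live `hpo` object:

```r
# A few header lines copied from the output above (shortened)
header <- c(
  "format-version: 1.2",
  "data-version: hp/releases/2023-10-09/hp-base.owl",
  "ontology: hp/hp-base"
)

# Keep the data-version line, then extract the YYYY-MM-DD release date
version_line <- grep("^data-version:", header, value = TRUE)
release_date <- regmatches(
  version_line,
  regexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", version_line)
)
print(release_date)
```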

The HPO gene lists: `load_phenotype_to_genes`

This one was a little trickier. Since the data is provided as a CSV, it can't store any attributes added to the R object after importing with data.table::fread. I could add the version to the cached file name, or make another file logging the versions, or add an extra column where every row repeats the "version" string. But none of these seemed like great options.
Instead, I've added an extra step to load_phenotype_to_genes that imports the table, adds the "version" attribute, and then caches the object as an RDS file. The next time the function is run, it will use the stored RDS file by default, which has the version accessible as attr(x,"version").
Also, I added some code so that every time one of these objects is loaded into R, the version is printed to the console. This way, the user always knows exactly which version of the data they're currently using.

> gene_data <- HPOExplorer::load_phenotype_to_genes()
Reading cached RDS file: phenotype_to_genes.txt
+ Version: v2023-10-09
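The difference between the two cache formats can be shown in a few lines of base R. This is a sketch with a toy table and a hard-coded version string, not the actual code inside load_phenotype_to_genes:

```r
# Toy table with a version attribute attached (values are illustrative)
annot <- data.frame(hpo_id = c("HP:TOY001", "HP:TOY002"),
                    gene_symbol = c("GENE_A", "GENE_B"))
attr(annot, "version") <- "v2023-10-09"

# Round trip through CSV: custom attributes are not written to the file
csv_path <- tempfile(fileext = ".csv")
utils::write.csv(annot, csv_path, row.names = FALSE)
from_csv <- utils::read.csv(csv_path)
print(attr(from_csv, "version"))  # NULL

# Round trip through RDS: the whole R object, attributes included, is kept
rds_path <- tempfile(fileext = ".rds")
saveRDS(annot, rds_path)
from_rds <- readRDS(rds_path)
message("+ Version: ", attr(from_rds, "version"))
```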

Next steps

I plan to extend this approach to MultiEWCE for the functions that distribute results and prioritise gene therapy targets.

GHA: `pkgdown` site

pkgdown site has trouble building on GHA:
https://github.com/neurogenomics/HPOExplorer/actions/runs/4085388887/jobs/7043296532

11 Cleaning files from old site 111111111111111111111111111111111111111111111111
== Building pkgdown site =======================================================
Reading from: '/__w/HPOExplorer/HPOExplorer'
Writing to:   '/__w/HPOExplorer/HPOExplorer/docs'
-- Initialising site -----------------------------------------------------------
Copying '../../_temp/Library/pkgdown/BS3/assets/bootstrap-toc.css' to 'bootstrap-toc.css'
Copying '../../_temp/Library/pkgdown/BS3/assets/bootstrap-toc.js' to 'bootstrap-toc.js'
Copying '../../_temp/Library/pkgdown/BS3/assets/docsearch.css' to 'docsearch.css'
Copying '../../_temp/Library/pkgdown/BS3/assets/docsearch.js' to 'docsearch.js'
Copying '../../_temp/Library/pkgdown/BS3/assets/link.svg' to 'link.svg'
Copying '../../_temp/Library/pkgdown/BS3/assets/pkgdown.css' to 'pkgdown.css'
Copying '../../_temp/Library/pkgdown/BS3/assets/pkgdown.js' to 'pkgdown.js'
-- Building home ---------------------------------------------------------------
Writing 'authors.html'
Writing '404.html'
-- Building function reference -------------------------------------------------
Writing 'reference/index.html'
Reading 'man/add_ancestor.Rd'
Error in `purrr::map()`:
ℹ In index: 1.
ℹ With name: add_ancestor.Rd.
Caused by error in `.f()`:
! Failed to parse Rd in add_ancestor.Rd
ℹ Graphics API version mismatch
Caused by error in `ragg::agg_png()`:
! Graphics API version mismatch
Backtrace:
     ▆
  1. ├─pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
  2. │ └─pkgdown::build_site(...)
  3. │   └─pkgdown:::build_site_local(...)
  4. │     └─pkgdown::build_reference(...)
  5. │       └─purrr::map(...)
  6. │         └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
  7. │           ├─purrr:::with_indexed_errors(...)
  8. │           │ └─base::withCallingHandlers(...)
  9. │           ├─purrr:::call_with_cleanup(...)
 10. │           └─pkgdown (local) .f(.x[[i]], ...)
 11. │             ├─base::withCallingHandlers(...)
 12. │             └─pkgdown:::data_reference_topic(...)
 13. │               └─pkgdown:::run_examples(...)
 14. │                 └─pkgdown:::highlight_examples(code, topic, env = env)
 15. │                   └─downlit::evaluate_and_highlight(...)
 16. │                     └─evaluate::evaluate(code, child_env(env), new_device = TRUE, output_handler = output_handler)
 17. │                       └─grDevices::dev.new()
 18. │                         ├─base::do.call(dev, a)
 19. │                         └─pkgdown (local) `<fn>`()
 20. │                           └─ragg::agg_png(..., bg = bg)
 21. └─base::.handleSimpleError(...)
 22.   └─pkgdown (local) h(simpleError(msg, call))
 23.     └─rlang::abort(msg, parent = err)
Execution halted
Error: Process completed with exit code 1.
Run actions/upload-artifact@v3
/usr/bin/docker exec  0b6462d5cd17703709d2d551cbb07538c7b6c452d78903639f8224cf7d72029e sh -c "cat /etc/*release | grep ^ID"
With the provided path, there will be 149 files uploaded
Starting artifact upload
For more detailed logs during the artifact upload process, enable step-debugging: https://docs.github.com/actions/monitoring-and-troubleshooting-workflows/enabling-debug-logging#enabling-step-debug-logging
Artifact name is valid!
Container for artifact "Linux-biocversion-devel-r-auto-results" successfully created. Starting upload of file(s)
Total file count: 149 ---- Processed file #127 (85.2%)
Total size of all the files uploaded is 4505985 bytes
File upload process has finished. Finalizing the artifact upload
Artifact has been finalized. All files have been successfully uploaded!

The raw size of all the files that were specified for upload is 6371321 bytes
The size of all the files that were uploaded is 4505985 bytes. This takes into account any gzip compression used to reduce the upload size, time and storage

Note: The size of downloaded zips can differ significantly from the reported size. For more information see: https://github.com/actions/upload-artifact#zipped-artifact-downloads 

Artifact Linux-biocversion-devel-r-auto-results has been successfully uploaded!

Add links to functions in roxygen notes

Instead of:

#' It may be possible to use a hash table for ggnetwork

use:

#' It may be possible to use a hash table for \link[ggnetwork]{ggnetwork},

Add Session Info at the bottom of all README/Vignettes

Make sure you always include a Session Info report at the bottom of your README/vignettes so users will know exactly what OS/versions you ran everything on.

Explore `DOSE` package

Just came across this package. Seems relevant.
http://www.bioconductor.org/packages/release/bioc/html/DOSE.html

HPO gene annotations changed

So I just recently started getting an error:

phenotype_to_genes <- HPOExplorer::load_phenotype_to_genes()

Error in setnames(ans, col.names) :
Can't assign 7 names to a 4 column data.table

Turns out it's because HPO totally changed the format of their gene annotation files, which HPOExplorer imports directly from here. I did this instead of using a static file so that HPOExplorer could always be fully up-to-date.

They used to include disease-specific info:

Now they just look like this...

This is not even the same kind of data. I may have to reach out to the HPO team to figure out what happened...
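One way to make this kind of upstream format change fail with a clearer message is to check the column count before assigning names. A minimal sketch follows; `set_expected_names` is a hypothetical helper and the column names are placeholders, not the real HPO headers.

```r
# Hypothetical helper: only rename columns when the count matches expectations
set_expected_names <- function(dat, expected) {
  if (ncol(dat) != length(expected)) {
    stop(sprintf(
      "Expected %d columns but the file has %d; the upstream format may have changed.",
      length(expected), ncol(dat)
    ), call. = FALSE)
  }
  names(dat) <- expected
  dat
}

# Toy 4-column table standing in for the newly formatted file
new_format <- data.frame(a = 1, b = 2, c = 3, d = 4)
expected_cols <- paste0("col", 1:7)  # placeholder names for the old 7-column layout

msg <- tryCatch(
  set_expected_names(new_format, expected_cols),
  error = function(e) conditionMessage(e)
)
print(msg)
```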

Add biocViews to DESCRIPTION

Requirement for Bioconductor.
Approved terms can be found here.

I've added these:

biocViews:
    Genetics, Preprocessing, GeneTarget, SystemsBiology,
    GraphAndNetwork, Annotation, Network, Homo_sapiens

Let examples run

Exported functions should have running examples. Will need to remove \dontrun{}

Move developer notes into Issues

Notes like the following should be removed from the roxygen notes and documented as issues, with links to the relevant source code:

#' It may be possible to use a hash table for \link[ggnetwork]{ggnetwork},
#'  which may be more efficient than the matrix in a shiny app ?

`hpo` from `ontologyIndex` is outdated

The issue

@NathanSkene
The hpo object from ontologyIndex is outdated.
This presents issues for getting some of the metadata (eg ontology levels) and translating HPO IDs, which currently relies on the ontologyIndex object.

In the example below, I made the hpo_meta object from the CSV that comes from here, which was last updated in January:
https://bioportal.bioontology.org/ontologies/HP

library(ontologyIndex)
data("hpo")
length(hpo$id)
# 11939
length(unique(HPOExplorer::hpo_meta$HPO_ID))
# 16810

This means we end up losing 521 phenotypes in our analyses involving things like ontology levels, or MultiEWCE::prioritise_targets, which partly relies on ontologyIndex.

results <- MultiEWCE::load_example_results()
missing_phenos <- unique(results[!HPO_ID %in% hpo$id, ]$Phenotype)
length(missing_phenos)
# 521
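The same check can be written in base R without data.table syntax. A small sketch with toy IDs (none of these are real HPO terms), where `old_ids` stands in for the IDs available in the older ontology object:

```r
# IDs known to the older ontology object (toy values)
old_ids <- c("HP:TOY001", "HP:TOY002")

# Toy results table; one phenotype appears twice
results <- data.frame(
  HPO_ID = c("HP:TOY001", "HP:TOY003", "HP:TOY004", "HP:TOY003"),
  Phenotype = c("Pheno A", "Pheno C", "Pheno D", "Pheno C"),
  stringsAsFactors = FALSE
)

# IDs present in the results but absent from the older ontology
missing_ids <- setdiff(results$HPO_ID, old_ids)

# Corresponding phenotype names, de-duplicated
missing_phenos <- unique(results$Phenotype[!results$HPO_ID %in% old_ids])

print(missing_ids)
print(length(missing_phenos))
```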

Moving forward

ontologyIndex does not have a GitHub repo besides the read-only mirror maintained by CRAN (not the developer).
But the DESCRIPTION gives his contact details, so I'll see if he might be willing to make an update to it:
https://github.com/cran/ontologyIndex/blob/master/DESCRIPTION

In the meantime, I can look for some workaround solutions (perhaps creating my own ontology object?)

Thanks for noting the discrepancy in the number of HPO IDs @KittyMurphy

Avoid going over 80 characters in code width

This is a Bioc standard. Not essential but it makes the code easier to read.
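A quick way to find offending lines is to count characters per line. The helper below is a hypothetical sketch (not part of BiocCheck) and is demonstrated on a temporary file:

```r
# Hypothetical helper: return the line numbers that exceed max_width characters
find_long_lines <- function(path, max_width = 80L) {
  lines <- readLines(path, warn = FALSE)
  which(nchar(lines, type = "chars") > max_width)
}

# Demo on a temporary file with one 100-character line
demo_file <- tempfile(fileext = ".R")
writeLines(
  c("x <- 1", strrep("a", 100), "y <- 2"),
  demo_file
)
long <- find_long_lines(demo_file)
print(long)
```

In a package, the same function could be applied to every file returned by `list.files("R", full.names = TRUE)`.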

Switch license setup

I find it easier to simply include a link to the license using a badge rather than including the file itself. This causes fewer problems with "non-standard files" in CRAN/Bioc checks.

Also, unless there's a particular reason you want an MIT license, I find GPL-3 a bit more preferable since it has slightly more protections (though I'm no expert on this topic).

Don't actually run install functions in vignettes/README/code

While you definitely want to include documentation on how to install your package, actually running this code within your vignettes/README/code can cause problems.

Instead of:

```{r install, message=FALSE, warning=FALSE}
if (!require("remotes")) install.packages("remotes")
if (!require("HPOExplorer")) remotes::install_github("ovrhuman/HPOExplorer")
```

use:

```{r install, eval = FALSE}
if (!require("remotes")) install.packages("remotes")
if (!require("HPOExplorer")) remotes::install_github("ovrhuman/HPOExplorer")
```

Add descriptions of each disease

Can't find disease descriptions beyond the name of the disease itself in any of the HPO annotation files. But this info should be in a table somewhere. Perhaps in the bulk downloads provided on each disease database: OMIM/DECIPHER/Orphanet
neurogenomics/RareDiseasePrioritisation#26

Add `@importFrom`

Add @importFrom in the roxygen notes whenever a function is called.

For example load_phenotype_to_genes uses utils::read.delim

#' @importFrom utils read.delim

If the package the function comes from is a Suggest rather than a dependency, call this inside the function itself instead of using @importFrom:

requireNamespace("utils")
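Put together, the pattern for a suggested package usually looks something like the sketch below. `read_table_checked` is a hypothetical function name, and `utils` is used only because it is always installed; in practice the check is for a package listed under Suggests.

```r
# Hypothetical function: check the package is available before calling into it
read_table_checked <- function(path) {
  if (!requireNamespace("utils", quietly = TRUE)) {
    stop("Package 'utils' is required for this function. Please install it.",
         call. = FALSE)
  }
  # Always use the pkg::fun form so no import is needed in NAMESPACE
  utils::read.delim(path)
}

# Demo: write a small tab-delimited file and read it back
demo_tsv <- tempfile(fileext = ".tsv")
writeLines(c("id\tvalue", "a\t1", "b\t2"), demo_tsv)
dat <- read_table_checked(demo_tsv)
print(dat)
```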

Modify .gitignore and .Rbuildignore to be more comprehensive

Print where files are being saved to

I always find it helpful to print a message to the user exactly where a file is being saved (and what it's named) whenever applicable.

Pass all Bioc checks

Let's kick it up a notch: pass all Bioconductor checks in BiocCheck::BiocCheck()

Error, no applicable method for '@' applied to an object of class "ontology_index"

Hello,
Thank you so much for your work on this package - I have found it very useful. I noticed recently, however, that some of the functions seem to no longer be working as expected (make_phenos_dataframe and add_ont_lvl).
So, for the vignette example

hpo <- get_hpo()
ancestor <- "Neurodevelopmental delay"
phenos <- make_phenos_dataframe(hpo = hpo, 
                                ancestor = ancestor)

Returns the error in hpo@terms: no applicable method for '@' applied to an object of class "ontology_index"

The add_pheno_frequency(), add_gpt_annotations(), and load_phenotype_to_genes() all seem to work fine. Please let me know if there's any other info I can provide to help, and thank you for your time!
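For context on the error message itself: `@` accesses slots of S4 objects, while S3 objects built on lists are accessed with `$`. The sketch below reproduces the general failure mode with toy classes. It does not diagnose which object class the affected functions expect, and the class names are made up.

```r
# Toy S3 object: a list with a class attribute
toy_s3 <- structure(
  list(id = c("HP:TOY001", "HP:TOY002")),
  class = "toy_ontology_s3"
)
print(isS4(toy_s3))  # FALSE
print(toy_s3$id)     # list elements are accessed with `$`

# Using `@` on the S3 object raises an error (exact wording varies by R version)
slot_error <- tryCatch(toy_s3@terms, error = function(e) conditionMessage(e))
print(slot_error)

# Toy S4 object: slots are accessed with `@`
methods::setClass("ToyOntologyS4", representation(terms = "character"))
toy_s4 <- methods::new("ToyOntologyS4", terms = c("HP:TOY001", "HP:TOY002"))
print(toy_s4@terms)
```

A check such as `isS4(hpo)` or `class(hpo)` before calling the failing function shows which kind of object is actually being passed in.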

Session info:

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ontologyIndex_2.11 data.table_1.14.10 MultiEWCE_1.0.0    HPOExplorer_1.0.0  viridis_0.6.4      viridisLite_0.4.2 
 [7] lubridate_1.9.3    forcats_1.0.0      stringr_1.5.1      dplyr_1.1.4        purrr_1.0.2        readr_2.1.5       
[13] tidyr_1.3.1        tibble_3.2.1       ggplot2_3.4.4      tidyverse_2.0.0   

loaded via a namespace (and not attached):
  [1] later_1.3.2                   bitops_1.0-7                  ggplotify_0.1.2               GeneOverlap_1.38.0           
  [5] filelock_1.0.3                lifecycle_1.0.4               KGExplorer_0.99.0             rstatix_0.7.2                
  [9] doParallel_1.0.17             lattice_0.21-8                pals_1.8                      backports_1.4.1              
 [13] magrittr_2.0.3                limma_3.58.1                  plotly_4.10.4                 yaml_2.3.8                   
 [17] remotes_2.4.2.1               httpuv_1.6.14                 HGNChelper_0.8.1              mapproj_1.2.11               
 [21] DBI_1.2.1                     RColorBrewer_1.1-3            maps_3.4.2                    abind_1.4-5                  
 [25] zlibbioc_1.48.0               GenomicRanges_1.54.1          rvest_1.0.3                   BiocGenerics_0.48.1          
 [29] RCurl_1.98-1.14               yulab.utils_0.1.4             rappdirs_0.3.3                circlize_0.4.15              
 [33] GenomeInfoDbData_1.2.11       IRanges_2.36.0                S4Vectors_0.40.2              tidytree_0.4.6               
 [37] piggyback_0.1.5               codetools_0.2-19              DelayedArray_0.28.0           xml2_1.3.5                   
 [41] tidyselect_1.2.0              shape_1.4.6                   aplot_0.2.2                   matrixStats_1.2.0            
 [45] stats4_4.3.1                  BiocFileCache_2.10.1          jsonlite_1.8.8                GetoptLong_1.0.5             
 [49] ellipsis_0.3.2                tidygraph_1.3.0               iterators_1.0.14              foreach_1.5.2                
 [53] tools_4.3.1                   treeio_1.26.0                 Rcpp_1.0.12                   glue_1.7.0                   
 [57] SparseArray_1.2.3             gridExtra_2.3                 MatrixGenerics_1.14.0         GenomeInfoDb_1.38.5          
 [61] RNOmni_1.0.1.2                withr_3.0.0                   BiocManager_1.30.22           fastmap_1.1.1                
 [65] fansi_1.0.6                   caTools_1.18.2                digest_0.6.34                 timechange_0.3.0             
 [69] R6_2.5.1                      mime_0.12                     gridGraphics_0.5-1            colorspace_2.1-0             
 [73] gtools_3.9.5                  dichromat_2.0-0.1             RSQLite_2.3.5                 utf8_1.2.4                   
 [77] generics_0.1.3                httr_1.4.7                    htmlwidgets_1.6.4             S4Arrays_1.2.0               
 [81] scatterplot3d_0.3-44          pkgconfig_2.0.3               gtable_0.3.4                  blob_1.2.4                   
 [85] ComplexHeatmap_2.16.0         SingleCellExperiment_1.24.0   XVector_0.42.0                htmltools_0.5.7              
 [89] carData_3.0-5                 clue_0.3-65                   scales_1.3.0                  Biobase_2.62.0               
 [93] png_0.1-8                     ggfun_0.1.4                   rstudioapi_0.15.0             reshape2_1.4.4               
 [97] tzdb_0.4.0                    rjson_0.2.21                  nlme_3.1-163                  curl_5.2.0                   
[101] cachem_1.0.8                  GlobalOptions_0.1.2           Polychrome_1.5.1              BiocVersion_3.18.1           
[105] KernSmooth_2.23-22            parallel_4.3.1                AnnotationDbi_1.64.1          pillar_1.9.0                 
[109] grid_4.3.1                    vctrs_0.6.5                   gplots_3.1.3                  promises_1.2.1               
[113] ggpubr_0.6.0                  car_3.1-2                     dbplyr_2.4.0                  xtable_1.8-4                 
[117] cluster_2.1.4                 orthogene_1.8.0               cli_3.6.2                     compiler_4.3.1               
[121] rlang_1.1.3                   crayon_1.5.2                  grr_0.9.5                     simona_1.0.6                 
[125] ggsignif_0.6.4                gprofiler2_0.2.2              plyr_1.8.9                    EWCE_1.11.3                  
[129] fs_1.6.3                      stringi_1.8.3                 ewceData_1.10.0               babelgene_22.9               
[133] munsell_0.5.0                 Biostrings_2.70.2             lazyeval_0.2.2                homologene_1.4.68.19.3.27    
[137] Matrix_1.6-1                  ExperimentHub_2.10.0          hms_1.1.3                     patchwork_1.2.0              
[141] bit64_4.0.5                   statmod_1.5.0                 KEGGREST_1.42.0               shiny_1.8.0                  
[145] SummarizedExperiment_1.32.0   interactiveDisplayBase_1.40.0 AnnotationHub_3.10.0          igraph_2.0.1                 
[149] broom_1.0.5                   memoise_2.0.1                 ggtree_3.10.0                 bit_4.0.5                    
[153] ape_5.7-1

Make wrapper function for phenos_dataframe step

The phenos_dataframe step in the main vignette is a bit messy and not super user friendly:

phenos = data.frame()
for (p in unique(Neuro_delay_descendants$Phenotype)) {
  id <- get_hpo_termID(phenotype = p, 
                       phenotype_to_genes = phenotype_to_genes)
  ontLvl_geneCount_ratio <- (get_ont_level(hpo = hpo,
                                           term_ids = p) + 1)/length(get_gene_list(p,phenotype_to_genes))
  description <- get_term_definition(ontologyId = id, 
                                     line_length = 10)
  phenos <- rbind(phenos,
                  data.frame("Phenotype"=p,
                             "HPO_Id"=id,
                             "ontLvl_geneCount_ratio"=ontLvl_geneCount_ratio,
                             "description"=description))
}

Would be nice to add a wrapper function that does this whole step in one clean function.
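One possible shape for such a wrapper is sketched below. It is a generic illustration rather than the package's implementation: `build_phenos_table` is a hypothetical name, and the three lookup functions are passed in as arguments so the example can run with toy stand-ins instead of the real HPOExplorer functions.

```r
# Hypothetical wrapper: build one row per phenotype, then bind once at the end
build_phenos_table <- function(phenotypes, get_id, get_ratio, get_description) {
  rows <- lapply(unique(phenotypes), function(p) {
    id <- get_id(p)
    data.frame(
      Phenotype = p,
      HPO_Id = id,
      ontLvl_geneCount_ratio = get_ratio(p),
      description = get_description(id),
      stringsAsFactors = FALSE
    )
  })
  # A single rbind avoids repeatedly copying a growing data.frame in a loop
  do.call(rbind, rows)
}

# Toy stand-ins for the real lookup functions
toy_ids <- c("Pheno A" = "HP:TOY001", "Pheno B" = "HP:TOY002")
phenos <- build_phenos_table(
  phenotypes = c("Pheno A", "Pheno B", "Pheno A"),
  get_id = function(p) unname(toy_ids[p]),
  get_ratio = function(p) 0.5,
  get_description = function(id) paste("Toy description for", id)
)
print(phenos)
```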

save files to `tempdir()` by default

Whenever you have a default location to store files, it's a good idea to make that default a tempdir(). This is required by CRAN/Bioc checks, but also helps you avoid accidentally including temp files in your package build.
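A minimal sketch of that pattern, which also reports where the file ends up. The function name, file name and message format are hypothetical:

```r
# Hypothetical saver: defaults to tempdir() and tells the user where the file went
save_table <- function(dat, file_name = "toy_table.csv", save_dir = tempdir()) {
  dir.create(save_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(save_dir, file_name)
  utils::write.csv(dat, path, row.names = FALSE)
  message("Saving file ==> ", path)
  invisible(path)  # return the path so callers can reuse it
}

path <- save_table(data.frame(x = 1:3))
print(file.exists(path))
```

Users who want a persistent copy can pass their own `save_dir`, while checks and examples never write outside the session's temporary directory.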

Export `ontologyIndex` rather than having user do this

Instead of having the user import ontologyIndex, just make it an import so it's automatically loaded when HPOExplorer is loaded.

Somewhere in the roxygen notes add:

@import ontologyIndex

Pass all CRAN checks

Make sure the package can successfully pass all CRAN build checks ("Check" under the "Build" tab in the upper right of Rstudio).

Reduce vignette size

This is a tricky one that I still haven't figured out a perfect solution to. But when you go to build your package, it should be under 5Mb total in order to pass CRAN/Bioc checks.

One issue that comes up a lot is that everything seems fine, until you go to build your vignettes. The build process makes a folder called "doc" and you end up with a note that says:

N  checking installed package size ...
     installed size is  5.1Mb
     sub-directories of 1Mb or more:
       doc   5.0Mb

One way to improve this is to change the html output format in the Rmd yaml header.
You're currently using the default:

output: rmarkdown::html_vignette

but this format tends to produce smaller file sizes:

output:
  BiocStyle::html_document:

Note this means you'll have to add BiocStyle as a Suggest (I've done this for you).

Fix all Roxygen errors

I've gone through and fixed all Roxygen errors that come up when you run
devtools::document()

Remove `LazyData: true`

CRAN doesn't like this (gives you a Note but still passes), but Bioc doesn't allow this (gives you an error). Best to set to false or remove (I've done the latter).

The downside is you won't be able to access datasets via the HPOExplorer::dataset syntax, but there are other ways to do this I can share if it comes up.

Map disease IDs to names

Some disease IDs (the DatabaseID/LinkID column, depending on the HPO file) do not include definitions. I'll need to find a comprehensive mapping of these disease IDs for OMIM, DECIPHER and Orphanet.

Add Docker vignette

Copy and paste the vignette found here (with some minor edits).

https://github.com/neurogenomics/orthogene/blob/main/vignettes/docker.Rmd

` SSL certificate problem: unable to get local issuer certificate`

Might be an issue specific to my Linux-based Docker container, but when I run this code in the main HPOExplorer vignette:

phenos = data.frame()
for (p in unique(Neuro_delay_descendants$Phenotype)) {
  id <- get_hpo_termID(p, phenotype_to_genes)
  ontLvl_geneCount_ratio <- (get_ont_level(hpo,p) + 1)/length(get_gene_list(p,phenotype_to_genes))
  description <- get_term_definition(id, line_length = 10)
  phenos <- rbind(phenos,
                  data.frame("Phenotype"=p,
                             "HPO_Id"=id,
                             "ontLvl_geneCount_ratio"=ontLvl_geneCount_ratio,
                             "description"=description))
}

I get the error:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  SSL certificate problem: unable to get local issuer certificate

Escape html characters in roxygen notes

Instead of:

#' @param as_dataframe can return matrix or df <bool>

use:

#' @param as_dataframe can return matrix or df \<bool\>

 hierarchy <- get_relative_ont_level_multiple(adjacency, hpo)

 hierarchy <- get_relative_ont_level_multiple(phenoAdj = adjacency,
                                                 hpo = hpo)

` error:0A000152:SSL routines::unsafe legacy renegotiation disabled`

1. Bug description

Package: HPOExplorer

Error produced when downloading medium-sized (65Mb) resource files from a remote site.
Unclear why this causes an error.

error:0A000152:SSL routines::unsafe legacy renegotiation disabled

Console output

https://github.com/neurogenomics/HPOExplorer/actions/runs/3893687320/jobs/6646626855#step:4:7338