rfam / rfam-website
Rfam website source code
Home Page: https://rfam.org
License: Apache License 2.0
Currently Rfam families are associated with GO terms but no additional metadata is captured. It would be useful to provide the following for all GO terms:
Documentation: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/README
If implemented, the information can be propagated to RNAcentral.
I tried to download a FASTA alignment of a motif and got an empty file in Safari. For example, the FASTA alignment at http://rfam.xfam.org/motif/RM00004#tabview=tab1 gives an empty file. The Stockholm file looks correct.
Nagios needs a page with a standard string of characters that will only be present if the site is up and the database is available.
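A minimal sketch of such a check, assuming a Python handler and a DB-API style connection. The marker strings, the function name `status_page`, and the `SELECT 1` probe are hypothetical, and `sqlite3` stands in for the production database only so that the example runs on its own.

```python
import sqlite3

# Hypothetical marker strings; the monitoring check would search for STATUS_OK.
STATUS_OK = "RFAM_STATUS_OK"
STATUS_FAIL = "RFAM_STATUS_DB_UNAVAILABLE"


def status_page(connect):
    """Return the OK marker only if a trivial database query succeeds.

    `connect` is any zero-argument callable returning a DB-API connection.
    """
    try:
        conn = connect()
        try:
            cur = conn.cursor()
            cur.execute("SELECT 1")
            row = cur.fetchone()
        finally:
            conn.close()
    except Exception:
        # Any connection or query failure means the marker must not appear.
        return STATUS_FAIL
    return STATUS_OK if row and row[0] == 1 else STATUS_FAIL


def broken_connect():
    """Simulate an unreachable database."""
    raise ConnectionError("database is down")


print(status_page(lambda: sqlite3.connect(":memory:")))
print(status_page(broken_connect))
```

Because the OK string is produced only after a successful query, a plain content match in the monitoring tool covers both "site is up" and "database is available".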
We need to add Genome as a fourth entity type in Rfam search in addition to Family, Clan, and Motif.
The Genome object should contain the following fields:
uniprot_reference_proteome_id - empty if UPID not available
rfam_genome_id - empty if UPID is available
gca_accession - empty if GCA not available
description
length - in nucleotides
taxonomy_lineage - string like Bacteria; Firmicutes; etc
ncbi_taxonomy_id
cross reference - see comment below
num_rfam_hits - number of significant Rfam family hits
num_rfam_families - number of distinct Rfam families with significant hits
The flavivirus families have an error on the Summary web page: the wiki info is not loading.
These families have an auto_wiki identifier of 2759 which is Flavivirus_3_UTR - however, https://en.wikipedia.org/wiki/Flavivirus_3_UTR is not a valid entry.
It seems these families should be 2754 Flavivirus_3'_UTR
Will change these families to Flavivirus_3'_UTR by updating their auto_wiki ID in the database.
RF03536
RF03537
RF03538
RF03539
RF03540
RF03541
RF03542
RF03544
RF03545
RF03547
has_pseudoknot to find families with pseudoknots
0 sequences, 0 structure, 0 species
brown panel at the top of the search results page
It seems that the file containing the mapping of Rfam families to PDB codes is incomplete in Rfam 14.0 and 14.1.
In 14.0 (ftp://ftp.ebi.ac.uk/pub/databases/Rfam/14.0/Rfam.tar.gz) the PDB code is missing:
pdb_id chain pdb_start pdb_end bit_score evalue_score cm_start cm_end hex_colour
RF00001 C 3 118 77.20 1.6e-20 1 119 0064f4
RF00001 9 1 121 77.40 1.4e-20 1 119 ebeb30
RF00001 9 1 121 77.40 1.4e-20 1 119 93c090
RF00001 B 1 121 77.40 1.4e-20 1 119 c008ae
RF00001 B 1 121 77.40 1.4e-20 1 119 8484c0
In 14.1 (ftp://ftp.ebi.ac.uk/pub/databases/Rfam/14.1/Rfam.pdb.gz) the header is missing and there are no Rfam IDs:
3hjw D 1 57 48.50 1.3e-11 1 70 1fc01f
3lwo D 1 57 48.50 1.3e-11 1 70 ff87a4
3lwp D 1 57 48.50 1.3e-11 1 70 ebeb30
3lwq D 1 57 48.50 1.3e-11 1 70 f29242
3lwr D 1 57 48.50 1.3e-11 1 70 8585e6
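A rough sketch of a sanity check that would flag both problems before a release. It assumes that a complete row carries an Rfam accession in addition to the nine columns named in the 14.0 header; the function name and the patterns used to recognise accessions and PDB identifiers are illustrative, not part of any Rfam tooling.

```python
import re

# Assumption: Rfam accessions look like RF00001 and PDB identifiers are four
# characters starting with a digit (e.g. 3hjw).
RFAM_ACC = re.compile(r"^RF\d{5}$")
PDB_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def check_pdb_mapping(lines):
    """Return a list of problems found in Rfam-to-PDB mapping lines."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return ["file is empty"]
    problems = []
    # A header is recognised by a column name in the first field.
    has_header = rows[0][0] in ("rfam_acc", "pdb_id")
    if not has_header:
        problems.append("header line is missing")
    data = rows[1:] if has_header else rows
    if not any(RFAM_ACC.match(field) for row in data for field in row):
        problems.append("no Rfam accessions found")
    # Only the first two fields are inspected, so hex colours are not
    # mistaken for PDB identifiers.
    if not any(PDB_ID.match(field) for row in data for field in row[:2]):
        problems.append("no PDB identifiers found")
    return problems


release_14_0 = [
    "pdb_id chain pdb_start pdb_end bit_score evalue_score cm_start cm_end hex_colour",
    "RF00001 C 3 118 77.20 1.6e-20 1 119 0064f4",
]
release_14_1 = [
    "3hjw D 1 57 48.50 1.3e-11 1 70 1fc01f",
]
print(check_pdb_mapping(release_14_0))
print(check_pdb_mapping(release_14_1))
```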
Request to bring back the average sequence length field, which would allow sorting families by increasing length. User feedback: it was very useful to have an overview of sequence lengths in Rfam families when selecting a data set.
Add a page that lets the users browse GO and SO terms associated with Rfam families.
It looks like some PDB chains do not match the right family.
For example, chains 1A and 2A from 5DOX should match the Bacterial LSU but they match the 23S pseudoknot (RF01118).
Trying to list all the Rfam families that have pseudoknots and 3D structures, but at the moment it is not possible to sort by the number of 3D structures, which would be very useful.
Blame @kalvari
There is a new version of R-chie - we should try to install it and switch to it on the Rfam website soon:
https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkaa708/5911741
indicate how many Rscape-significant base pairs are not shown in the displayed structure. This would be both a quality control hint (if an Rfam curated structure was messed up and missing a bunch of base pairs), plus an indication of whether a pseudoknot might be present.
include a note of how many Rfam-annotated base pairs are not shown in the Rfam structure because they're an annotated pseudoknot.
display pseudoknots. Perhaps draw a bracket over a contiguous string of nucleotides that are one side of the PK stem and lettering the two halves A and a, B and b.
The left sidebar links are broken when the site is loaded over https.
NDB is being replaced by NAKB (https://nakb.org/). The links in Rfam that point to NDB should be replaced by links to NAKB.
For example, GNRA Rfam motif links to Tetraloop on Wikipedia but the wiki content does not appear on the Rfam page.
I used web-search to search for accession AP011496.1 (RF02547 : mtPerm-5S) and the sequence matched all the families in the LSU clan apart from the 5S RNA family I took the sequence from.
The families are listed below:
RF02543 - LSU_rRNA_eukarya
RF02541 - LSU_rRNA_bacteria
RF02540 - LSU_rRNA_archaea
The website had a bug where CMs could not be downloaded. It isn't clear why but this should be fixed. We should check it post release 14.9.
num_full field in the family table is not updated
seed in the Sequences tab
sRNA type in RF01684 so that it's just Gene
0 species http://rfamlive.xfam.org/search?q=GO:0039703
select * from family where number_of_species = 0; should not return any rows
Pseudomonas syringae pv. tomato str. DC3000 is not in the sunburst) http://rfamlive.xfam.org/search?q=RF02749
Each author name should appear on a separate line in DESC files, followed by the ORCID iD:
AU Author AB 0000-0000-0000-0000
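A small sketch of how such lines could be validated, assuming the layout shown above (the `AU` tag, the author name, then a 16-character ORCID iD). The optional semicolon after the name and the function name `parse_au_line` are assumptions made for illustration.

```python
import re

# ORCID iDs are four hyphen-separated groups of four characters; the last
# character is a checksum that may be a digit or X.
AU_LINE = re.compile(
    r"^AU\s+(?P<name>.+?);?\s+(?P<orcid>\d{4}-\d{4}-\d{4}-\d{3}[\dX])\s*$"
)


def parse_au_line(line):
    """Return (author_name, orcid) for a conforming AU line, else None."""
    match = AU_LINE.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("orcid")


print(parse_au_line("AU Author AB 0000-0000-0000-0000"))
print(parse_au_line("AU Author AB"))
```

Lines for which the function returns `None` either lack an ORCID iD or list more than one author, so they can be reported to the curator.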
It seems not all HTML alignments can be viewed. One which cannot is:
https://rfam.org/family/RF04222#tabview=tab2
When trying to view the HTML alignment nothing shows up.
Rfam Curator needs to be able to add or delete a genome using UPID or GCA accession.
The web search currently does not apply clan competition to hits, which results in spurious matches.
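One way to picture clan competition is as a greedy filter: among overlapping hits from families in the same clan, only the highest-scoring hit survives. The sketch below assumes hits with normalised coordinates (start <= end) and treats any overlap as a conflict; the `Hit` class, the clan label, the coordinates, and the scores are hypothetical, and a production pipeline may use a different overlap threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hit:
    family: str
    clan: Optional[str]  # None when the family belongs to no clan
    start: int
    end: int
    bit_score: float


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one position."""
    return a.start <= b.end and b.start <= a.end


def clan_compete(hits: List[Hit]) -> List[Hit]:
    """Keep the best-scoring hit among overlapping hits of the same clan."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.bit_score, reverse=True):
        loses = any(
            hit.clan is not None and hit.clan == k.clan and overlaps(hit, k)
            for k in kept
        )
        if not loses:
            kept.append(hit)
    return kept


# Hypothetical scores and coordinates for illustration only.
hits = [
    Hit("RF02541", "LSU clan", 1, 100, 50.0),
    Hit("RF02543", "LSU clan", 10, 90, 80.0),
    Hit("RF00005", None, 20, 60, 30.0),
]
print([h.family for h in clan_compete(hits)])
```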
Use the is_significant field to filter the entries.
Fold the sequence using the CM, show the structure in Rfam and export it to RNAcentral too.
Several users pointed out that it's difficult to find where to download various family-specific files, so I suggest creating a new Downloads tab on the left-hand side of Rfam family pages. It would aggregate all the downloadable items that are spread across all the other tabs, such as:
There may be other downloadable items that I missed so please click around the family pages to double check.
We can keep the existing download links as they are or drop them in favour of the new tab (this can be decided on a case by case basis).
RNA Thermometer - SO:0002168
See The-Sequence-Ontology/SO-Ontologies#404 for details.
Hi,
This fails with Ubuntu 20.04:
curl https://rfam.xfam.org
#curl: (35) error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure
but works fine with Ubuntu < 20.04 and on Windows and macOS Mojave.
This seems to happen with some websites because of a combination of three reasons: server misconfiguration, increased TLS security level in Ubuntu 20.04 by default, and a bug in OpenSSL 1.1.1. See Ensembl/ensembl-rest#427 for a similar issue with the Ensembl server.
FWIW this breaks Bioconductor package rfaRm: https://bioconductor.org/checkResults/3.12/bioc-LATEST/rfaRm/nebbiolo1-install.html
Internally the package tries to access rfam.xfam.org with the following R code:
> library(xml2)
> read_xml("https://rfam.xfam.org/clans")
Error in open.connection(x, "rb") :
error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure
> sessionInfo()
R version 4.0.2 Patched (2020-08-04 r78971)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS
Matrix products: default
BLAS: /home/hpages/R/R-4.0.r78971/lib/libRblas.so
LAPACK: /home/hpages/R/R-4.0.r78971/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xml2_1.3.2
loaded via a namespace (and not attached):
[1] compiler_4.0.2 curl_4.3
Thanks!
H.
Example
Species sunburst for Clostridia in RF01315 shows that there are 64 sequences:
An example SQL query confirming the number of sequences:
SELECT CONCAT(t1.rfamseq_acc, '/', seq_start, '-', `seq_end`)
FROM full_region t1, rfamseq t2, taxonomy t3
WHERE t1.rfam_acc = 'RF01315'
AND t1.rfamseq_acc = t2.rfamseq_acc
AND t2.ncbi_id = t3.ncbi_id
AND t3.tax_string LIKE '%Clostridia;%'
AND is_significant = 1
GROUP BY rfamseq_acc;
64 rows (like in sunburst UI) - note the GROUP BY clause
However, there are many more annotated regions:
SELECT CONCAT(t1.rfamseq_acc, '/', seq_start, '-', `seq_end`)
FROM full_region t1, rfamseq t2, taxonomy t3
WHERE t1.rfam_acc = 'RF01315'
AND t1.rfamseq_acc = t2.rfamseq_acc
AND t2.ncbi_id = t3.ncbi_id
AND t3.tax_string LIKE '%Clostridia;%'
AND is_significant = 1;
6222 rows - no GROUP BY clause
So the number of entries in the resulting FASTA file is inconsistent with sunburst UI.
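The gap between the two numbers comes from counting distinct sequence accessions in one case and individual annotated regions in the other. A toy illustration with made-up accessions and coordinates (none of them from Rfam):

```python
from collections import Counter

# Hypothetical significant hits: (rfamseq_acc, seq_start, seq_end).
regions = [
    ("ACC0001.1", 10, 80),
    ("ACC0001.1", 500, 570),
    ("ACC0002.1", 33, 101),
]


def count_sequences_and_regions(regions):
    """Return (distinct accessions, total annotated regions)."""
    per_accession = Counter(acc for acc, _, _ in regions)
    return len(per_accession), sum(per_accession.values())


n_sequences, n_regions = count_sequences_and_regions(regions)
print(n_sequences, n_regions)  # 2 3
```

Grouping by accession corresponds to the first number, while the FASTA export contains one entry per region, which corresponds to the second.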
When a user is on an Rfam page and clicks on one of the tabs in the left panel, they are redirected to the Rfam home page instead of being shown the requested data.
This issue only occurs when accessing Rfam using the old URL http://rfam.xfam.org. Website functionality through the new URL http://rfam.org seems normal.
Retire old URL?
Sequence download is not working correctly. The sequence extracted isn't the correct one. For example, there is an inconsistency between the SEED sequence of family RF00005
AM286415.1/4382210-4382145
GAAGGCACGACAUUGCUCACAUUGCUUCCAGUGUUUACUU-AGCCAGC---CGGG-UGCUGGCUUUUUUUU
and the sequence downloaded directly from the website:
AM286415.1/4382210-4382145
CTTCCGTGCTGTAACGAGTGTAACGAAGGTCACAAATGAATCGGTCGGCCCACGACCGAAAAAAAA
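Since the region coordinates run from high to low (a reverse-strand hit), a quick diagnostic is to test how the downloaded sequence relates to the SEED sequence. The sketch below strips alignment gaps, converts U to T, and reports the relation; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalise(seq):
    """Strip alignment gaps and convert RNA to upper-case DNA letters."""
    return seq.replace("-", "").replace(".", "").upper().replace("U", "T")


def relation(reference, candidate):
    """Describe how `candidate` relates to `reference`."""
    ref = normalise(reference)
    cand = normalise(candidate)
    comp = ref.translate(COMPLEMENT)
    if cand == ref:
        return "identical"
    if cand == comp[::-1]:
        return "reverse complement"
    if cand == comp:
        return "complement"
    if cand == ref[::-1]:
        return "reverse"
    return "unrelated"


seed = "GAAGGCACGACAUUGCUCACAUUGCUUCCAGUGUUUACUU-AGCCAGC---CGGG-UGCUGGCUUUUUUUU"
downloaded = "CTTCCGTGCTGTAACGAGTGTAACGAAGGTCACAAATGAATCGGTCGGCCCACGACCGAAAAAAAA"
print(relation(seed, downloaded))
```

Any result other than "identical" suggests that the strand handling in the download code path differs from the one used to build the SEED alignment.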
When a user submits a job using the Job Dispatcher APIs, a job identifier (e.g. iprscan5-R20170926-112045-0446-74958378-es) is returned. This identifier may now also end in "-p1m" or "-p2m", in addition to the current values "-es", "-pg" and "-oy". The latter three will eventually disappear, and another announcement will follow with proposed dates. These characters denote independent job clusters, which we use to provide resilience and failover.
If your code relies on the characters after the last dash, please take note of the change.
Examples of existing job identifiers:
iprscan5-R20170926-112045-0446-74958378-es
ncbiblast-R20170926-112324-0724-35376033-pg
emboss_needle-I20170926-112309-0557-5534481-oy
In addition to the above you should expect:
emboss_needle-I20170926-112309-0557-5534481-p2m
ncbiblast-R20170926-112324-0724-35376033-p1m
emboss_needle-I20170926-112309-0557-5534481-p2m
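Client code that inspects the suffix could treat it as an opaque token taken from the text after the last dash, as in this minimal sketch. The function name `split_job_id` and the set of known suffixes are assumptions based only on the values listed above.

```python
# Suffixes mentioned in the announcement; the first three are due to be retired.
KNOWN_CLUSTERS = {"es", "pg", "oy", "p1m", "p2m"}


def split_job_id(job_id):
    """Split a job identifier into (base, cluster_suffix).

    Returns (job_id, None) when the trailing token is not a known cluster.
    """
    base, sep, suffix = job_id.rpartition("-")
    if not sep or suffix not in KNOWN_CLUSTERS:
        return job_id, None
    return base, suffix


print(split_job_id("iprscan5-R20170926-112045-0446-74958378-es"))
print(split_job_id("ncbiblast-R20170926-112324-0724-35376033-p1m"))
```

Splitting on the last dash, rather than assuming a two-character suffix, keeps the parser working for both the old and the new cluster names.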