import_qiime(): Add Greengenes and other alternative reference sequence database options. Greengenes in particular is very popular and should be supported alongside the RDP reference that QIIME uses by default.
For example, a large jagged table of OTU IDs and their associated taxonomic assignments is available at:
http://greengenes.lbl.gov/Download/OTUs/gg_otus_6oct2010/taxonomies/otu_id_to_greengenes.txt
Here is an example line from that file:
300253 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Oscillospira;s__
The only white space appears to be the separator between the OTU ID and the taxonomy. The taxonomy is semicolon-delimited, and each entry has a three-character prefix (a rank letter followed by two underscores, e.g. "k__") indicating its taxonomic rank.
Currently, this file appears to be properly read by import_qiime(), but the following things would improve the behavior:
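As a rough illustration of that layout, the example line can be split in base R as follows. This is only a sketch of the file structure described above, not the code import_qiime() uses; the variable names are made up for the example.

```r
# Example line copied from the Greengenes file described above.
line <- "300253 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Oscillospira;s__"

# The only white space separates the OTU ID from the taxonomy string.
fields <- strsplit(line, "[[:space:]]+")[[1]]
otu_id <- fields[1]
taxonomy <- fields[2]

# The taxonomy is semicolon-delimited; each entry still carries its rank prefix.
ranks <- strsplit(taxonomy, ";", fixed = TRUE)[[1]]
print(otu_id)
print(ranks)
```

Note that the last entry is the bare prefix "s__", with no label after it.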
(1) Prefixes should be used in filling the taxonomyTable to make sure assignments go in the correct column. This is useful to enforce consistency of taxonomic rank labels.
(2) The prefixes should be removed from the label after they are used. The rank is already stored as the column header.
(3) The Greengenes taxonomy leaves a bare prefix (e.g. "s__" in the example line above) when no information is included for a particular rank. This should be an NA in the taxonomyTable in R; otherwise missing information might be treated inconsistently.
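The three points above could be sketched in base R roughly as follows. The function name parse_greengenes_taxonomy() and the prefix-to-column mapping (k = Kingdom through s = Species) are assumptions made for illustration; neither is existing phyloseq API.

```r
# Hypothetical helper (not part of phyloseq): parse one Greengenes taxonomy string
# into a named character vector with one element per rank.
parse_greengenes_taxonomy <- function(taxonomy) {
  # Assumed mapping from prefix letter to taxonomyTable column name.
  rank_names <- c(k = "Kingdom", p = "Phylum", c = "Class", o = "Order",
                  f = "Family", g = "Genus", s = "Species")
  entries <- trimws(strsplit(taxonomy, ";", fixed = TRUE)[[1]])
  # Start with every rank missing, so unassigned ranks stay NA.
  out <- setNames(rep(NA_character_, length(rank_names)), unname(rank_names))
  for (entry in entries) {
    # (1) Use the prefix letter to choose the column; skip unrecognized entries.
    prefix <- substr(entry, 1, 1)
    if (!grepl("^[a-z]__", entry) || !(prefix %in% names(rank_names))) next
    # (2) Remove the prefix from the label.
    label <- sub("^[a-z]__", "", entry)
    # (3) A bare prefix carries no information, so leave that rank as NA.
    if (nzchar(label)) out[[rank_names[[prefix]]]] <- label
  }
  out
}

tax <- parse_greengenes_taxonomy(
  "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Oscillospira;s__"
)
print(tax)
```

Applying such a helper to every line and combining the results with do.call(rbind, ...) would give a character matrix with one row per OTU and one column per rank.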
IMPLEMENTATION:
Ideally, import_qiime() would gain one additional option that is passed along to the internal OTU/tax importer. The default could remain the RDP file structure.