Checking for ratio values (qurro issue, CLOSED)

pavlo888 commented on August 16, 2024
Checking for ratio values

from qurro.

Comments (18)

gibsramen commented on August 16, 2024

Hi @pavlo888

Thanks for using Qurro! To clarify, you are interested in the numerator and denominator values of a given log-ratio, correct? In this case, extracting these values may depend on how you've selected the features. If you searched by taxonomy then you should be able to use the qarcoal command which returns the numerator and denominator sums.

If you are selecting features a different way (e.g. autoselection, manual, etc.), to my knowledge there is no way to extract these sums directly from Qurro. What you could do is download the selected features using the "Export Selected Features" option and then calculate the numerator and denominator yourself.

The formula Qurro uses to calculate log-ratios is:

log-ratio = ln(sum of numerator features) - ln(sum of denominator features)

so you can just calculate the sum of numerator features as well as the sum of denominator features.
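A minimal sketch of that manual calculation is below. It assumes a hypothetical feature table (features as rows, samples as columns) and hypothetical lists of numerator and denominator feature IDs. None of these names or values come from Qurro itself, and treating a zero sum as an undefined log-ratio is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical feature table: rows are features, columns are samples.
table = pd.DataFrame(
    {"S1": [10, 5, 3, 2], "S2": [0, 0, 4, 4], "S3": [6, 6, 1, 2]},
    index=["F1", "F2", "F3", "F4"],
)

# Hypothetical selections, e.g. feature IDs copied from an exported list.
numerator_ids = ["F1", "F2"]
denominator_ids = ["F3", "F4"]

# Sum the selected features within each sample.
num_sum = table.loc[numerator_ids].sum(axis=0)
denom_sum = table.loc[denominator_ids].sum(axis=0)

# ln(N) - ln(D). A zero sum on either side makes the log-ratio undefined,
# so those samples are masked to NaN before taking logarithms.
valid = (num_sum > 0) & (denom_sum > 0)
log_ratio = np.log(num_sum.where(valid)) - np.log(denom_sum.where(valid))

result = pd.DataFrame(
    {"Num_Sum": num_sum, "Denom_Sum": denom_sum, "log_ratio": log_ratio}
)
```

The column names here simply mirror the Num_Sum and Denom_Sum names mentioned elsewhere in this thread for Qarcoal output.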

When @fedarko gets to this he may also have some advice 😄 .

fedarko commented on August 16, 2024

There are a couple of ways of doing this.

Option 1. All you want to know is the "raw" ratios of two genera (and you don't care about the individual numerator or denominator values)

You can just use Qurro to select the log-ratio normally, and then export the selected log-ratios using the Export current sample plot data button. This will give you a TSV file containing the selected log-ratio for each sample. You can load this in Python as a Pandas DataFrame or something (see here for an example of using pd.read_csv() on this), and then you can create a new column, raw_ratio, as follows:

import math
# You may need to filter out samples with a NaN or null log-ratio first
sample_plot_data["raw_ratio"] = math.e**(sample_plot_data["Current_Natural_Log_Ratio"])

This is possible because Qurro computes log-ratios, as @gibsramen mentioned above, by just taking ln(N) - ln(D) (or equivalently ln(N/D)). Taking e**(ln(N/D)) leaves you with just N/D.
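As a rough end-to-end sketch of Option 1, the snippet below reads a small in-memory stand-in for the exported TSV, drops samples with a missing log-ratio, and exponentiates the rest. The "Sample ID" column name, the sample values, and the use of the literal text "null" for missing log-ratios are assumptions for illustration. Only the Current_Natural_Log_Ratio column name comes from this thread.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the exported sample plot data (contents are made up).
tsv_text = (
    "Sample ID\tCurrent_Natural_Log_Ratio\n"
    "S1\t0.0\n"
    "S2\tnull\n"
    "S3\t1.0\n"
)

# pandas treats "null" as a missing value by default when parsing.
sample_plot_data = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col=0)

# Drop samples whose log-ratio is missing before exponentiating.
sample_plot_data = sample_plot_data.dropna(
    subset=["Current_Natural_Log_Ratio"]
).copy()

# e**(ln(N/D)) = N/D, computed element-wise for every remaining sample.
sample_plot_data["raw_ratio"] = np.exp(
    sample_plot_data["Current_Natural_Log_Ratio"]
)
```

Using np.exp instead of math.e** is just a stylistic choice here; both operate element-wise on a pandas column.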

Option 2. You want to know the actual numerator and denominator values for each sample (i.e. each "half" of the ratio)

You can use Qarcoal for this. You can run Qarcoal with the --p-num-string and --p-denom-string parameters containing text unique to the genera you want to select -- e.g. something like --p-num-string "g__Bacteroides;" as shown here. This will give you a QZA which contains the summed numerator and denominator abundances for each sample -- it's basically a fancy version of the TSV file we worked with in "Option 1" above.

You can load this QZA into a pandas DataFrame in Python as shown here, and at this point you'll already have the numerator and denominator sum information in the Num_Sum and Denom_Sum columns respectively -- to make a column of the raw ratios for each sample you can use either of the following code snippets:

Option 2.1
qarcoal_log_ratios_df["raw_ratio"] = qarcoal_log_ratios_df["Num_Sum"] / qarcoal_log_ratios_df["Denom_Sum"]
Option 2.2

Alternatively, you can also do the following (this is the way we did this with the Qurro sample plot data TSV in "Option 1"):

import math
qarcoal_log_ratios_df["raw_ratio"] = math.e**(qarcoal_log_ratios_df["log_ratio"])

In closing

All three of these approaches (Option 1, Option 2.1, and Option 2.2) should give you the same "raw ratios". You may want to try them out to verify this for yourself (there may be slight precision differences, but I doubt they'll be big enough to matter). Hope this helps.
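One way to run that check is sketched below on a made-up stand-in for the Qarcoal DataFrame. The values are invented, and the Num_Sum, Denom_Sum, and log_ratio column names are taken from the snippets above. np.allclose is used so that tiny floating-point differences do not count as a mismatch.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the DataFrame loaded from a Qarcoal QZA.
qarcoal_log_ratios_df = pd.DataFrame(
    {"Num_Sum": [15.0, 12.0, 2.0], "Denom_Sum": [5.0, 3.0, 8.0]},
    index=["S1", "S2", "S3"],
)
qarcoal_log_ratios_df["log_ratio"] = np.log(
    qarcoal_log_ratios_df["Num_Sum"] / qarcoal_log_ratios_df["Denom_Sum"]
)

# Option 2.1: divide the sums directly.
ratio_from_sums = (
    qarcoal_log_ratios_df["Num_Sum"] / qarcoal_log_ratios_df["Denom_Sum"]
)

# Option 2.2: exponentiate the natural log-ratio.
ratio_from_logs = np.exp(qarcoal_log_ratios_df["log_ratio"])

# True when both routes agree up to floating-point tolerance.
methods_agree = np.allclose(ratio_from_sums, ratio_from_logs)
```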

fedarko commented on August 16, 2024

Glad that worked!

Yes, the raw_ratio column (for sample 3-8B-rep1) is really just saying that the ratio of (the sum of the numerator features) to (the sum of the denominator features) is ~5.38 for that sample. What these counts "are" depends entirely on how you produced your BIOM table initially (if your data is from 16S rRNA sequencing, this was probably by denoising or OTU clustering; if your data is from shotgun metagenomic sequencing, the BIOM table's relative abundances might have been estimated by something like MetaPhlAn2; etc.). I don't know what more can be said about the raw ratios besides that, and I would caution against over-interpreting them.

The reason many compositional data analysis techniques generally use log-ratios instead of just raw ratios is that logarithms symmetrize things between the numerator and denominator around 0:

3/4 = 0.75
4/3 = 1.33...

but

log10(3/4) = -0.12...
log10(4/3) = +0.12...

More generally, log(a/b) = -log(b/a) (assuming a > 0 and b > 0).
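A quick way to see this numerically is sketched below, using only Python's standard library. The symmetry holds in any base: with natural logs the two values are about -0.288 and +0.288, and with base-10 logs about -0.125 and +0.125.

```python
import math

a, b = 3, 4

# Raw ratios are not symmetric around 1.
raw_forward = a / b   # 0.75
raw_reverse = b / a   # 1.333...

# Log-ratios are symmetric around 0, whatever the base.
ln_forward = math.log(a / b)
ln_reverse = math.log(b / a)
log10_forward = math.log10(a / b)
log10_reverse = math.log10(b / a)

# Swapping numerator and denominator only flips the sign.
symmetric_ln = math.isclose(ln_forward, -ln_reverse)
symmetric_log10 = math.isclose(log10_forward, -log10_reverse)
```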

So, using the log-ratio (rather than just the raw ratio) gives equal weight to the numerator and denominator, making it easier to compare samples (and enabling the use of ordinary statistical tools, e.g. t-tests). To quote this paper, emphasis mine:

The starting point for any compositional analyses is a ratio transformation of the data. Ratio transformations capture the relationships between the features in the dataset and these ratios are the same whether the data are counts or proportions. Taking the logarithm of these ratios, thus log-ratios, makes the data symmetric and linearly related, and places the data in a log-ratio coordinate space (Pawlowsky-Glahn et al., 2015). Thus, we can obtain information about the log-ratio abundances of features relative to other features in the dataset, and this information is directly relatable to the environment. We cannot get information about the absolute abundances since this information is lost during the sequencing process as explained in Figure 1. However, log-ratios have the nice mathematical property that their sample space is real numbers, and this represents a major advantage for the application of standard statistical methods that have been developed for real random variables.

pavlo888 commented on August 16, 2024

Hi @gibsramen

That's exactly what I did! I searched based on taxonomy. I see the qarcoal command is available as a python script but is it also available in the qiime2 plug-in of Qurro?

Cheers,
Pablo

gibsramen commented on August 16, 2024

Yes, there is a QIIME 2 implementation of the qarcoal command as well. An example usage can be seen in the qarcoal example notebook, and I've reproduced the example QIIME 2 command below.

qiime qurro qarcoal \
    --i-table output/qiita_10422_table.biom.qza \
    --i-taxonomy ../DEICODE_sleep_apnea/input/taxonomy.qza \
    --p-num-string g__Allobaculum \
    --p-denom-string g__Coprococcus \
    --o-qarcoal-log-ratios output/allobaculum_coprococcus_log_ratios.qza

fedarko commented on August 16, 2024

Hi @pavlo888,

Thanks for the kind comments! @gibsramen is correct -- aside from using Qarcoal to replicate a taxonomy-based selection, there isn't an "easy" way currently to extract the selected summed numerator and denominator values for each sample.

We have an open issue to allow plotting the "raw" ratios instead of the log-ratios in #178, but it sounds like what you would like are the actual numerator and denominator summed values (analogous to what Qarcoal gives you).

Would a solution where we add this information to the "Export current sample plot data" output file (say, by making each sample have a Num_Sum and Denom_Sum column, to be consistent with Qarcoal) be good for you? I don't think implementing this change should take a large amount of time, but there will be a few inconveniences due to the way Qurro's JavaScript code stores log-ratio information (and things are a bit busy now) so I can't guarantee this would be ready any time soon.

pavlo888 commented on August 16, 2024

Hi @fedarko,

Indeed I am looking for the raw ratios. I am interested in knowing the actual ratios of two specific genera.

Can that be done with Qarcoal? I have already run it but I am not sure how to interpret the output.

Could you please help me out with this?

Cheers,
Pablo

pavlo888 commented on August 16, 2024

Hi @fedarko

Thank you for your reply. I think that the raw ratios obtained with Option 2.1 are the output I am looking for. However, I am not very familiar with Python. I have tried running these commands (https://nbviewer.jupyter.org/github/biocore/qurro/blob/master/example_notebooks/qarcoal/qarcoal_example.ipynb#1.B.-Run-Qarcoal!) in Spyder but I get some errors.

Could you point me to a platform where I can easily run the suggested commands?

Cheers,
Pablo

fedarko commented on August 16, 2024

I haven't used Spyder, but the necessary code to extract the raw ratios should be runnable through any Python interface (python, ipython, Jupyter Notebooks, etc.). For small tasks like this I normally just use ipython from the terminal, but I'm sure it's possible to run this code through an IDE like Spyder also (as long as it's hooked up to your QIIME 2 conda environment, so you can do stuff like from qiime2 import Artifact without errors).

First off, what sort of error(s) are you getting? If you wouldn't mind copying them here, this would help us figure out where things are going wrong (and it'll help people coming here from Google or whatever who might have the same problem).

Here's what I think the code to get "raw" ratios from Qarcoal output would look like, in some more detail. This should be run from within a QIIME 2 conda environment.

import pandas as pd
from qiime2 import Artifact

# Load the output QIIME 2 artifact from Qarcoal
qarcoal_log_ratios = Artifact.load("your_qarcoal_output.qza")

# Convert the artifact to a pandas DataFrame
qarcoal_log_ratios_df = qarcoal_log_ratios.view(pd.DataFrame)

# Make a new column in the DataFrame, "raw_ratio"
qarcoal_log_ratios_df["raw_ratio"] = qarcoal_log_ratios_df["Num_Sum"] / qarcoal_log_ratios_df["Denom_Sum"]

# Save the Qarcoal output (including the raw_ratio column we just added) to a TSV file
qarcoal_log_ratios_df.to_csv("raw_ratio_info.tsv", sep="\t")

This should accomplish what you want, I think. Let us know if this works!

pavlo888 commented on August 16, 2024

Hi @fedarko,

It worked perfectly!!!! I opened ipython on the terminal while having the qiime2 Conda environment active and I followed your script and it worked great! Thanks a lot!

Now, for the interpretation I just wanna make sure I am doing it right.
Is the raw ratio column telling me that there are about 5 times more counts (based on reads?) in the Num_Sum than in the Denom_Sum?

Or what is the correct wording for this type of output?

Thank you in advance for your amazing support!

Cheers,
Pablo
[Image: Screenshot 2020-06-21 at 23 47 19]

pavlo888 commented on August 16, 2024

Hi @fedarko

Thanks a lot for the great insight and explanation. From what you have mentioned above, I assume it would be more advisable to discuss this kind of result in terms of the log-ratio, right?

Cheers,
Pablo

fedarko commented on August 16, 2024

In general, yes, I would suggest discussing log-ratios rather than just raw ratios. It might seem less intuitive at first, but (in my opinion) the advantages outweigh the disadvantages.

If you'd like further background on log vs. non-log ratios, you might want to check out this issue thread and/or Modeling and Analysis of Compositional Data (link), page 14.

fedarko commented on August 16, 2024

I'm going to close this for now, but please feel free to open a new issue if you have any other questions.

Best,
Marcus

pavlo888 commented on August 16, 2024

Hi @fedarko,

I was wondering if you have any experience with plotting the results from Qurro in a PCA plot? I have a dataframe that looks like this:
[Image: Screenshot from 2020-09-26 22-08-21]

Any idea? I tried using ggbiplot in R but I cannot make it work.

Thank you in advance.

Cheers,
Pablo

fedarko commented on August 16, 2024

Not super sure what you mean; what's your goal with this analysis? I am unclear on how you'd go from Qurro's results (log-ratios) to PCA. The more common use case would be going the opposite way, i.e. using the feature loadings in a PCA biplot (which I think is what was shown in the rank plot in your first post in this thread?) as input to Qurro to guide the selection of log-ratios that differ across sample types.

I guess you could do something like select two log-ratios and then use those as the axes in a scatterplot, which would probably look kinda like a PCA, but I'm not sure that would be more meaningful than just showing two box (or jitter/violin) plots of the different log-ratios.

pavlo888 commented on August 16, 2024

Hi @fedarko,

Yes indeed. That is exactly what I got. Apologies for not explaining myself clearly. I took the axes from two log-ratios and put them in a scatterplot in order to obtain a PCA. My main goal is to show, more clearly and in a summarized way, the families that are coupled with the differential log-ratios. Do you think this would be a good approach?

Cheers,
Pablo

fedarko commented on August 16, 2024

I guess that could be useful -- I remember similar scatterplots of two log-ratios were talked about by @mortonjt in the context of microbe-metabolite datasets (biocore/mmvec#76) a while back. Although I don't think that a scatterplot between two log-ratios can be called a PCA -- it would just be a normal scatterplot (although it would be interpretable, kind of, as a basic form of "dimensionality reduction").

It should be possible to plot the scatterplot in pretty much any plotting software (ggplot / matplotlib / etc.), I think?

Looking at the data you posted earlier, I am a bit confused: it seems like you have coordinates defined for features, not for samples. If you want to make a scatterplot in the way described above, I think the way to do that (assuming you want to use Qurro to select the log-ratios) is to select one log-ratio in Qurro, export it using the Export current sample plot data button, then select another log-ratio and export it again in the same way. (You'll probably need to rename the Current_Natural_Log_Ratio column within each file to distinguish the two log-ratios.) Once that's done, you should be able to merge the files and load them into R / Python / Excel / etc. for visualization. Does that make sense?
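A rough sketch of the rename-and-merge step in Python is below. The two in-memory TSV strings stand in for the two exported files. The "Sample ID" column name, the sample values, and the log_ratio_A / log_ratio_B names are all hypothetical. Only Current_Natural_Log_Ratio comes from this thread.

```python
import io

import pandas as pd

# Stand-ins for two separate exports (contents are made up).
export_a = (
    "Sample ID\tCurrent_Natural_Log_Ratio\n"
    "S1\t0.5\nS2\t-1.0\nS3\t2.0\n"
)
export_b = (
    "Sample ID\tCurrent_Natural_Log_Ratio\n"
    "S1\t1.5\nS2\t0.25\nS4\t-0.75\n"
)


def load_log_ratio(tsv_text, new_name):
    """Read one export and rename its log-ratio column to new_name."""
    df = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col=0)
    return df[["Current_Natural_Log_Ratio"]].rename(
        columns={"Current_Natural_Log_Ratio": new_name}
    )


ratio_a = load_log_ratio(export_a, "log_ratio_A")
ratio_b = load_log_ratio(export_b, "log_ratio_B")

# Keep only samples present in both exports, one row per sample.
merged = ratio_a.join(ratio_b, how="inner")
```

The merged DataFrame then has one row per sample and one column per log-ratio, which is the shape most plotting tools expect for a two-axis scatterplot. With real files, passing the file paths to pd.read_csv in place of the io.StringIO objects should work the same way.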

pavlo888 commented on August 16, 2024

Hi @fedarko,

I am a bit confused by the last thing you suggested, but I have built the "PCA" and it looks like this:
[Image: qurro_PCA]

Personally, for me this would work.

Thanks a lot for your help!
