hildebra / rarefaction
Rarefaction scripts
License: GNU General Public License v2.0
Hi!
Great implementation, it's very fast. But I only need to rarefy, not calculate diversity and richness measures. Is it possible to ONLY rarefy and return just the rarefied count matrix in R?
I tried to fiddle a bit with the R code, but it seems I have to go into the C++ code to be able to perform rarefaction only, and I foresee I would spend a lot of time on that as I'm not a big expert in C++.
Thanks in advance
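To make the request concrete, here is a minimal sketch of "rarefy only": subsample each sample to a fixed depth without replacement and return nothing but the count matrix. It is written in Python purely for illustration and is not rtk's implementation; `rarefy_sample`, `rarefy_matrix` and the example table are hypothetical names and data.

```python
import random

def rarefy_sample(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's counts."""
    total = sum(counts)
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample total {total}")
    # Expand the counts into one entry per read, labelled by feature index.
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    rarefied = [0] * len(counts)
    for i in rng.sample(reads, depth):
        rarefied[i] += 1
    return rarefied

def rarefy_matrix(matrix, depth, seed=None):
    """Rarefy every row (one sample per row) to `depth`; return counts only."""
    rng = random.Random(seed)
    return [rarefy_sample(row, depth, rng) for row in matrix]

# Hypothetical samples-by-features table (not taken from a real dataset).
table = [[51, 1, 1, 110], [2574, 3626, 10, 3217]]
rarefied = rarefy_matrix(table, depth=100, seed=42)
```

Expanding every read into a list is simple but memory-hungry for deep samples; it is chosen here for readability, not speed.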
Currently, rarefying to `min(rowSums(data))` using vegan and rarefaction produces no warning in vegan, meaning rarefaction works for all samples, but a column is lost in the rarefaction package when using the same value.
Here the minimal rarefaction size currently seems to be `min(rowSums(data)) - 1`, at least for one test I ran.
This behaviour seems to be a bug.
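One way such an off-by-one could arise is a strict comparison when deciding which samples are deep enough. That is an assumption for illustration only, not something confirmed from the package source. The sketch below (Python, hypothetical names and totals) shows how the two comparisons differ when the depth equals the smallest sample total.

```python
def keep_samples(totals, depth, inclusive=True):
    """Return the names of samples considered deep enough for `depth`.

    inclusive=True  keeps samples with total >= depth (expected behaviour).
    inclusive=False keeps samples with total >  depth (hypothetical bug).
    """
    if inclusive:
        return [name for name, total in totals.items() if total >= depth]
    return [name for name, total in totals.items() if total > depth]

# Hypothetical per-sample read totals.
totals = {"A": 120, "B": 95, "C": 300}
depth = min(totals.values())  # rarefy to the smallest sample

kept_inclusive = keep_samples(totals, depth, inclusive=True)
kept_strict = keep_samples(totals, depth, inclusive=False)
```

With the inclusive comparison all three samples survive; with the strict one the smallest sample is lost, which matches the symptom described above.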
Hi,
I am able to run rtk with 16S rRNA OTU abundance as follows:
/Users/sen/git_repos/rtk/rtk rarefaction \
-i ~/exp/ynp/results/plt_CRAnoxy/Anoxy.s3 \
-o ~/exp/ynp/results/plt_CRAnoxy/rtk_CR.tbl.1000 \
-r 1000 -w 1 -t 8;
The above works as intended and I get the output tables for alpha diversity measurements.
I want to plot the alpha diversity measures in R. So instead, I thought of using the rtk library provided for R and I used the same CSV input file.
library(rtk)
d <- rtk(input='../results/plt_CRAnoxy/Anoxy.s3', ReturnMatrix=1)
And `d` shows:
> d
$divvs
list()
$raremat
$raremat[[1]]
<0 x 0 matrix>
$ICE
$ICE[[1]]
numeric(0)
$ACE
$ACE[[1]]
numeric(0)
$chao2
$chao2[[1]]
numeric(0)
$skipped
character(0)
$div.median
$div.median$median.richness
NULL
$div.median$median.shannon
NULL
$div.median$median.simpson
NULL
$div.median$median.invsimpson
NULL
$div.median$median.chao1
NULL
$div.median$median.eveness
NULL
$depths
[1] 0
$repeats
[1] 10
attr(,"class")
The data in the CSV file looks like this:
628D 629B 629F 629H 630B 630C 721C 723ABG 723AZ 724A 724B 725AB 725AO
Otu00001 51 2574 2955 7176 1573 22 11 1 0 23 971 1496 18047
Otu00002 1 3626 4971 3175 2044 19 2231 0 0 13 2 472 1119
Otu00003 1 10 3 3 4242 69 7968 0 0 56 1 3140 256
Otu00004 110 3217 307 613 2965 81 8978 2 0 119 61 4212 1050
Otu00005 1 4 3 1 2732 20 6631 1 0 81 0 3459 218
Otu00006 0 1178 13 7 67 2 139 2 0 151 0 3769 12176
Any pointers on how to import the tabular data generated by the C++ binary into R as an rtk object for plotting the results would be appreciated.
Thank you in advance,
Sen
Before releasing the package a sample dataset could be included to allow the user to quickly test the setup.
Greets Devs,
Thanks as ever for rtk. If my samples vary around a median depth of 50,000 and I rarefy along seq(1,000, 50,000, 10,000), RTK drops, at every rarefaction depth, all samples that cannot be rarefied to the largest value, even if a sample has enough reads to be rarefied at the lower depths. In the above case of rarefying to the median, it would drop half of all samples from every step of the rarefaction.
Would it make more sense to drop samples only where necessary? For example, a sample of 45K reads would be dropped only at the final rarefaction step of 50,000, and kept at 11K, 21K, 31K and 41K, rather than dropped at all of them as in the current implementation.
I think this might even be the intent, as a warning is issued at each rarefaction depth to notify the user, rather than just once.
Interested to hear if this is by design etc.
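To illustrate the difference between the two policies, here is a small sketch in Python. It is not rtk code; the function names, sample names and read totals are hypothetical, and the depth list simply mirrors the steps mentioned above.

```python
def samples_all_or_nothing(totals, depths):
    """Behaviour described above: keep only samples that reach the largest depth."""
    largest = max(depths)
    kept = [name for name, total in totals.items() if total >= largest]
    return {depth: list(kept) for depth in depths}

def samples_per_depth(totals, depths):
    """Proposed behaviour: at each depth, keep every sample that is deep enough."""
    return {
        depth: [name for name, total in totals.items() if total >= depth]
        for depth in depths
    }

# Hypothetical read totals around a median of roughly 50,000.
totals = {"S1": 45000, "S2": 60000, "S3": 52000}
depths = [1000, 11000, 21000, 31000, 41000, 50000]

current = samples_all_or_nothing(totals, depths)
proposed = samples_per_depth(totals, depths)
```

Under the proposed policy `S1` is present at every depth up to 41,000 and missing only at 50,000, whereas the all-or-nothing policy removes it everywhere.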
As of commit 3e4dd9b, the matrix is transposed by default, producing output similar to vegan's.
Input:
OTU | Reihe 1 | Reihe 2 | Reihe 3 |
---|---|---|---|
Zeile 1 | 100 | 2 | 3 |
Zeile 2 | 4 | 100 | 6 |
Zeile 3 | 70 | 2 | 500 |
Zeile 4 | 10 | 100 | 5 |
Vegan produces:
| | Reihe.1 | Reihe.2 | Reihe.3 |
---|---|---|---|
Zeile 1 | 20 | 0 | 0 |
Zeile 2 | 0 | 18 | 2 |
Zeile 3 | 3 | 0 | 17 |
Zeile 4 | 0 | 18 | 2 |
Rarefaction used to produce (and still does with margin = 1 as option):
| | Zeile 1 | Zeile 2 | Zeile 3 | Zeile 4 |
---|---|---|---|---|
Reihe 1 | 11 | 0 | 8 | 1 |
Reihe 2 | 0 | 7 | 0 | 13 |
Reihe 3 | 0 | 0 | 20 | 0 |
Now it has default margin=2:
| | Reihe 1 | Reihe 2 | Reihe 3 |
---|---|---|---|
Zeile 1 | 19 | 1 | 0 |
Zeile 2 | 0 | 19 | 1 |
Zeile 3 | 4 | 0 | 16 |
Zeile 4 | 3 | 16 | 1 |
It is thus similar to vegan in this respect.
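Reading the tables above, `margin = 2` draws from each input row (Zeile) and keeps the input orientation, while `margin = 1` draws from each input column (Reihe) and returns those as rows. The sketch below reproduces only that orientation logic in Python. It is not rtk code: `samples_in_rows` is a hypothetical parameter, and the depth of 20 is inferred from the row sums of the outputs shown.

```python
import random

def rarefy_row(counts, depth, rng):
    """Subsample `depth` reads without replacement from one vector of counts."""
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    out = [0] * len(counts)
    for i in rng.sample(reads, depth):
        out[i] += 1
    return out

def rarefy(matrix, depth, samples_in_rows=True, seed=None):
    """Rarefy along rows, or along columns (result has one row per column)."""
    rng = random.Random(seed)
    if not samples_in_rows:
        # Transpose so that each original column becomes a row to rarefy.
        matrix = [list(col) for col in zip(*matrix)]
    return [rarefy_row(row, depth, rng) for row in matrix]

# Input table from above: rows are Zeile 1-4, columns are Reihe 1-3.
table = [[100, 2, 3], [4, 100, 6], [70, 2, 500], [10, 100, 5]]

by_row = rarefy(table, 20, samples_in_rows=True, seed=1)   # 4 x 3, like margin = 2
by_col = rarefy(table, 20, samples_in_rows=False, seed=1)  # 3 x 4, like margin = 1
```

The individual counts will differ from the tables above because the draws are random; only the shapes and the per-row totals of 20 are comparable.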
Hi, replicability is important and it would be great if rtk users could input their own pseudo-randomisation seed values. Doing so would allow different collaborators or different runs of the same pipeline to produce exactly the same rarefaction results, eliminating one source of variability.
I have not tried it yet, but I assume that setting a literal value on line 552 of IO.cpp would do the trick:
// IO.cpp:552
unsigned long long seed = (unsigned long long)chrono::high_resolution_clock::now().time_since_epoch().count();
A command line option such as `--seed INT` would be ideal.
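As a rough illustration of the requested behaviour, the following Python sketch uses an explicit seed when one is given and falls back to a clock-based seed otherwise, mirroring the clock-based line quoted above. It is not rtk code: the `--seed` option is the proposal from this issue rather than an existing flag, and `parse_args` and `make_rng` are hypothetical helpers.

```python
import argparse
import random
import time

def parse_args(argv):
    """Parse a hypothetical --seed option; None means 'no seed supplied'."""
    parser = argparse.ArgumentParser(description="seed handling sketch")
    parser.add_argument("--seed", type=int, default=None,
                        help="integer seed for reproducible rarefaction")
    return parser.parse_args(argv)

def make_rng(seed=None):
    """Return (generator, seed used); fall back to the clock if no seed given."""
    if seed is None:
        seed = time.time_ns()
    return random.Random(seed), seed

args = parse_args(["--seed", "1234"])
rng_a, seed_used = make_rng(args.seed)
rng_b, _ = make_rng(args.seed)

# Two generators built from the same seed produce identical draws.
draws_a = [rng_a.randrange(100) for _ in range(5)]
draws_b = [rng_b.randrange(100) for _ in range(5)]
```

Returning the seed that was actually used makes it possible to log it, so that even an unseeded run can be repeated later.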