hildebra / rarefaction

Rarefaction scripts

License: GNU General Public License v2.0

C++ 79.56% Makefile 0.49% R 19.69% C 0.27%

rarefaction's Issues

Rarefy only

Hi!

Great implementation, it's very fast. But I only need to rarefy, not calculate diversity and richness measures. Is it possible to ONLY rarefy and return just the rarefied count matrix in R?
I tried to fiddle a bit with the R code, but it seems I have to go into the C++ code to be able to perform rarefaction only, and I foresee I would spend a lot of time on that as I'm not a big expert in C++.

Thanks in advance
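
For illustration, a rarefy-only step can be sketched in a few lines of C++. This is a minimal sketch, not rtk's code: `rarefy_counts` is a hypothetical helper that subsamples one sample's counts to a fixed depth without replacement and computes no diversity measures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Subsample one sample's counts to `depth` reads without replacement.
// Hypothetical helper for illustration; not part of rtk.
std::vector<unsigned> rarefy_counts(const std::vector<unsigned>& counts,
                                    std::size_t depth,
                                    std::mt19937_64& rng) {
    const std::size_t total =
        std::accumulate(counts.begin(), counts.end(), std::size_t{0});
    assert(depth <= total && "sample has fewer reads than the requested depth");

    // Expand the counts into one entry per read, labelled by feature index.
    std::vector<std::size_t> reads;
    reads.reserve(total);
    for (std::size_t f = 0; f < counts.size(); ++f)
        for (unsigned k = 0; k < counts[f]; ++k)
            reads.push_back(f);

    // Shuffle the reads and keep the first `depth` of them.
    std::shuffle(reads.begin(), reads.end(), rng);

    std::vector<unsigned> out(counts.size(), 0);
    for (std::size_t i = 0; i < depth; ++i)
        ++out[reads[i]];
    return out;
}
```

Expanding every read costs memory proportional to the sample's total count, so this only suits small tables. Calling the function once per sample with a shared `std::mt19937_64` would give a rarefied count matrix and nothing else.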

Rarefaction to same Depth, different behaviour between vegan and rarefaction

Rarefying to min(rowSums(data)) produces no warning in vegan, meaning rarefaction works for all samples, but the rarefaction package loses a column when given the same value.

The largest depth that keeps all samples currently seems to be min(rowSums(data)) - 1, at least in the one test I ran.

This behaviour seems to be a bug.
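
The symptom looks like an off-by-one in the check that decides whether a sample has enough reads. Where rtk makes that comparison is not shown here, so the following is only a sketch of the expected behaviour, with `can_rarefy` as a hypothetical name: a sample whose total equals the requested depth should be kept.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// A sample can be rarefied to `depth` when it holds at least `depth` reads.
// Using `>` instead of `>=` would drop the smallest sample when
// depth == min(rowSums(data)), which matches the symptom described above.
// Hypothetical helper for illustration; not part of rtk.
bool can_rarefy(const std::vector<unsigned>& counts, std::size_t depth) {
    const std::size_t total =
        std::accumulate(counts.begin(), counts.end(), std::size_t{0});
    return total >= depth;
}
```

With this check, a sample of 10 reads is kept at depth 10 and dropped only at depth 11.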

C++ binary works with input TSV but R library does not generate desired output

Hi,

I am able to run rtk with 16S rRNA OTU abundance as follows:

/Users/sen/git_repos/rtk/rtk rarefaction \
 -i ~/exp/ynp/results/plt_CRAnoxy/Anoxy.s3 \
 -o ~/exp/ynp/results/plt_CRAnoxy/rtk_CR.tbl.1000 \
 -r 1000 -w 1 -t 8;

The above works as intended and I get the output tables for alpha diversity measurements.

I want to plot the alpha diversity measures in R. So instead, I thought of using the rtk library provided for R and I used the same CSV input file.

library(rtk)
d <- rtk(input='../results/plt_CRAnoxy/Anoxy.s3', ReturnMatrix=1)

And d shows:

> d
$divvs
list()

$raremat
$raremat[[1]]
<0 x 0 matrix>


$ICE
$ICE[[1]]
numeric(0)


$ACE
$ACE[[1]]
numeric(0)


$chao2
$chao2[[1]]
numeric(0)


$skipped
character(0)

$div.median
$div.median$median.richness
NULL

$div.median$median.shannon
NULL

$div.median$median.simpson
NULL

$div.median$median.invsimpson
NULL

$div.median$median.chao1
NULL

$div.median$median.eveness
NULL


$depths
[1] 0

$repeats
[1] 10

attr(,"class")

The data in the CSV file looks like this:

           628D 629B 629F 629H 630B 630C 721C 723ABG 723AZ 724A 724B 725AB 725AO
Otu00001   51 2574 2955 7176 1573   22   11      1     0   23  971  1496 18047
Otu00002    1 3626 4971 3175 2044   19 2231      0     0   13    2   472  1119
Otu00003    1   10    3    3 4242   69 7968      0     0   56    1  3140   256
Otu00004  110 3217  307  613 2965   81 8978      2     0  119   61  4212  1050
Otu00005    1    4    3    1 2732   20 6631      1     0   81    0  3459   218
Otu00006    0 1178   13    7   67    2  139      2     0  151    0  3769 12176

Any pointers on how to import the tabular data generated by the C++ binary in R as rtk object for plotting the results are appreciated.

Thank you in advance,
Sen

Supply example dataset

Before releasing the package, a sample dataset could be included so that users can quickly test their setup.

Skip samples only as necessary

Greets Devs,

Thanks as ever for the tk. If my samples vary around a median depth of 50,000 and I rarefy along seq(1000, 50000, 10000), RTK drops every sample that cannot be rarefied to the largest value from all rarefaction depths, even if the sample has enough reads for the lower depths. When rarefying up to the median, as above, that drops half of all samples from every step of the rarefaction.

Would it make more sense to drop samples only as necessary? For example, in the case above, a sample of 45K reads would be dropped only at the final rarefaction step of 50K, and not at 11K, 21K, 31K, and 41K as in the current implementation.

I think this might even be the intent, as a warning is issued at each rarefaction depth to notify the user, rather than just once.

Interested to hear if this is by design etc.
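
The requested behaviour can be sketched as a per-depth filter. This is a minimal sketch under assumptions, not rtk's implementation: `samples_per_depth` and its argument names are hypothetical, and sample totals are assumed to be known up front.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// For each requested depth, list the samples that have enough reads for it.
// `totals` maps a sample name to its read count.
// Hypothetical helper for illustration; not part of rtk.
std::map<std::size_t, std::vector<std::string>> samples_per_depth(
    const std::map<std::string, std::size_t>& totals,
    const std::vector<std::size_t>& depths) {
    std::map<std::size_t, std::vector<std::string>> kept;
    for (std::size_t depth : depths) {
        // operator[] creates an empty list for depths no sample can reach.
        std::vector<std::string>& names = kept[depth];
        for (const auto& entry : totals)
            if (entry.second >= depth) names.push_back(entry.first);
    }
    return kept;
}
```

With totals of 45,000 and 60,000 reads, both samples would be listed at depths 1,000 and 41,000, and only the larger one at 50,000.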

Transpose matrix

As of commit 3e4dd9b, the matrix is transposed by default, producing output similar to vegan's.

Input:

OUT	Reihe 1	Reihe 2	Reihe 3
Zeile 1	100	2	3
Zeile 2	4	100	6
Zeile 3	70	2	500
Zeile 4	10	100	5

Vegan produces:

	Reihe.1	Reihe.2	Reihe.3
Zeile 1	20	0	0
Zeile 2	0	18	2
Zeile 3	3	0	17
Zeile 4	0	18	2

Rarefaction used to produce (and still does with margin = 1 as option):

	Zeile 1	Zeile 2	Zeile 3	Zeile 4
Reihe 1	11	0	8	1
Reihe 2	0	7	0	13
Reihe 3	0	0	20	0

Now it has default margin=2:

	Reihe 1	Reihe 2	Reihe 3
Zeile 1	19	1	0
Zeile 2	0	19	1
Zeile 3	4	0	16
Zeile 4	3	16	1

The output is thus similar to vegan's in this respect.

allow users to select their own seed value

Hi, replicability is important, and it would be great if rtk users could supply their own pseudo-randomisation seed values. Doing so would allow different collaborators, or different runs of the same pipeline, to produce exactly the same rarefaction results, eliminating one source of variability.

I have not tried it yet, but I assume that setting a literal value on line 552 of IO.cpp would do the trick:

// IO.cpp:552
unsigned long long seed = (unsigned long long)chrono::high_resolution_clock::now().time_since_epoch().count();

A command line option such as --seed INT would be ideal.
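
One way such an option could work is sketched below. This is an assumption-laden sketch, not rtk's argument parser: the `--seed` flag does not exist in rtk as far as this issue shows, and `pick_seed` is a hypothetical helper that falls back to the clock-based seed when the flag is absent.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <random>
#include <string>

// Return the user-supplied seed if "--seed INT" is present, otherwise fall
// back to a clock-based seed like the one quoted above.
// The "--seed" flag and this helper are hypothetical; they are not part of rtk.
unsigned long long pick_seed(int argc, const char* const argv[]) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::string(argv[i]) == "--seed")
            return std::strtoull(argv[i + 1], nullptr, 10);
    }
    return static_cast<unsigned long long>(
        std::chrono::high_resolution_clock::now().time_since_epoch().count());
}
```

A generator constructed as `std::mt19937_64 rng(pick_seed(argc, argv));` would then produce the same sequence on every run that passes the same seed, provided the rest of the program draws random numbers in the same order.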
