
rarefaction's People

Contributors

apduncan, hifa, hildebra, openpaul, telatin, vmikk, yazgoo

rarefaction's Issues

Rarefy only

Hi!

Great implementation; it's very fast. However, I only need to rarefy, not to calculate diversity and richness measures. Is it possible to only rarefy and return just the rarefied count matrix in R?
I tried to fiddle a bit with the R code, but it seems I would have to go into the C++ code to perform rarefaction only, and I expect that would take me a lot of time, as I'm not an expert in C++.

Thanks in advance
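The step requested here amounts to drawing a fixed number of reads at random, without replacement, from each sample's reads, with no diversity estimates computed afterwards. The following is a minimal C++ sketch of that step for a single sample. rarefy_counts is a hypothetical helper, not a function exported by rtk or its R package, and the use of std::mt19937_64 is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of rtk: rarefy one sample's OTU counts to
// `depth` reads by drawing reads uniformly at random without replacement.
std::vector<std::uint64_t> rarefy_counts(const std::vector<std::uint64_t>& counts,
                                         std::uint64_t depth,
                                         std::mt19937_64& rng) {
    const std::uint64_t total =
        std::accumulate(counts.begin(), counts.end(), std::uint64_t{0});
    if (depth > total) {
        throw std::invalid_argument("depth exceeds the sample's read count");
    }
    std::vector<std::uint64_t> remaining = counts;  // reads not yet drawn, per OTU
    std::vector<std::uint64_t> rarefied(counts.size(), 0);
    std::uint64_t left = total;
    for (std::uint64_t draw = 0; draw < depth; ++draw) {
        // Pick one of the `left` undrawn reads, then find the OTU it belongs to.
        std::uniform_int_distribution<std::uint64_t> pick(0, left - 1);
        std::uint64_t r = pick(rng);
        std::size_t otu = 0;
        while (r >= remaining[otu]) {
            r -= remaining[otu];
            ++otu;
        }
        --remaining[otu];
        ++rarefied[otu];
        --left;
    }
    return rarefied;
}
```

Calling this once per sample with the same depth yields a rarefied count matrix. Rarefying a sample to exactly its total read count returns its original counts.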

Rarefaction to the same depth: different behaviour between vegan and rarefaction

Currently, rarefying to min(rowSums(data)) produces no warning in vegan, meaning that rarefaction succeeds for all samples, whereas the rarefaction package loses a column when given the same value.

Here, the minimal rarefaction size currently seems to be min(rowSums(data)) - 1, at least in one test I ran.

This behaviour seems to be a bug.
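The report implies that a sample whose total read count equals the requested depth should still be rarefiable, because drawing all of its reads simply returns its original counts. The hypothetical check below illustrates that inclusive condition in C++. It is not taken from rtk's source, and a strict total > depth comparison is only one possible explanation of the off-by-one behaviour described.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical check: a sample with `total` reads can be rarefied to `depth`
// whenever total >= depth. At depth == total the rarefied sample equals the
// original, so it should be kept; a strict `total > depth` would drop it.
bool can_rarefy(std::uint64_t total, std::uint64_t depth) {
    return total >= depth;
}

// Names of the samples that would be dropped when rarefying to `depth`.
std::vector<std::string> dropped_samples(const std::vector<std::string>& names,
                                         const std::vector<std::uint64_t>& totals,
                                         std::uint64_t depth) {
    std::vector<std::string> dropped;
    for (std::size_t i = 0; i < names.size() && i < totals.size(); ++i) {
        if (!can_rarefy(totals[i], depth)) {
            dropped.push_back(names[i]);
        }
    }
    return dropped;
}
```

For example, with sample totals of 105, 110, 572 and 115, a depth of 105 (the minimum) drops nothing, while a depth of 106 drops only the first sample.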

C++ binary works with the input TSV, but the R library does not generate the desired output

Hi,

I am able to run rtk on a 16S rRNA OTU abundance table as follows:

/Users/sen/git_repos/rtk/rtk rarefaction \
 -i ~/exp/ynp/results/plt_CRAnoxy/Anoxy.s3 \
 -o ~/exp/ynp/results/plt_CRAnoxy/rtk_CR.tbl.1000 \
 -r 1000 -w 1 -t 8;

The above works as intended and I get the output tables for alpha diversity measurements.

I want to plot the alpha diversity measures in R, so I tried the rtk package for R with the same CSV input file.

library(rtk)
d <- rtk(input='../results/plt_CRAnoxy/Anoxy.s3', ReturnMatrix=1)

And d shows:

> d
$divvs
list()

$raremat
$raremat[[1]]
<0 x 0 matrix>


$ICE
$ICE[[1]]
numeric(0)


$ACE
$ACE[[1]]
numeric(0)


$chao2
$chao2[[1]]
numeric(0)


$skipped
character(0)

$div.median
$div.median$median.richness
NULL

$div.median$median.shannon
NULL

$div.median$median.simpson
NULL

$div.median$median.invsimpson
NULL

$div.median$median.chao1
NULL

$div.median$median.eveness
NULL


$depths
[1] 0

$repeats
[1] 10

attr(,"class")

The data in the CSV file looks like this:

           628D 629B 629F 629H 630B 630C 721C 723ABG 723AZ 724A 724B 725AB 725AO
Otu00001   51 2574 2955 7176 1573   22   11      1     0   23  971  1496 18047
Otu00002    1 3626 4971 3175 2044   19 2231      0     0   13    2   472  1119
Otu00003    1   10    3    3 4242   69 7968      0     0   56    1  3140   256
Otu00004  110 3217  307  613 2965   81 8978      2     0  119   61  4212  1050
Otu00005    1    4    3    1 2732   20 6631      1     0   81    0  3459   218
Otu00006    0 1178   13    7   67    2  139      2     0  151    0  3769 12176
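Before comparing the two interfaces, one quick sanity check is to confirm that the table parses into the expected sample names and that each sample's total read count reaches the requested depth (-r 1000 in the command above). The following is a minimal C++ sketch of such a check. It assumes the layout shown: a header line containing only sample names (without whitespace), then one line per OTU with its id followed by one count per sample, separated by tabs or spaces. SampleTotals and column_totals are hypothetical names, and this is not how rtk itself reads its input.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Sample names and per-sample read totals (hypothetical result type).
struct SampleTotals {
    std::vector<std::string> samples;
    std::vector<std::uint64_t> totals;
};

// Sum each sample's counts; the total is the largest depth it can be rarefied to.
SampleTotals column_totals(std::istream& in) {
    SampleTotals result;
    std::string line;
    if (!std::getline(in, line)) {
        throw std::runtime_error("empty table");
    }
    std::istringstream header(line);
    std::string name;
    while (header >> name) {  // header holds sample names only
        result.samples.push_back(name);
    }
    result.totals.assign(result.samples.size(), 0);
    while (std::getline(in, line)) {
        std::istringstream row(line);
        std::string otu;
        if (!(row >> otu)) {
            continue;  // skip blank lines
        }
        for (std::size_t j = 0; j < result.samples.size(); ++j) {
            std::uint64_t count = 0;
            if (!(row >> count)) {
                throw std::runtime_error("row " + otu + " has too few counts");
            }
            result.totals[j] += count;
        }
    }
    return result;
}
```

If the header also carries a label for the OTU column, the first header token would have to be skipped. That variant is not handled here.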

Any pointers on how to import the tabular data generated by the C++ binary into R as an rtk object for plotting the results would be appreciated.

Thank you in advance,
Sen

Supply example dataset

Before the package is released, a sample dataset could be included so that users can quickly test their setup.

Skip samples only as necessary

Greetings, devs,

Thanks as ever for the toolkit. My samples vary around a median depth of 50,000 reads. If I rarefy along seq(1000, 50000, 10000), RTK drops, at every rarefaction depth, all samples that cannot be rarefied to the largest depth, even if they have enough reads to be rarefied at the lower depths. In this case of rarefying up to the median, half of all samples would be dropped from every step of the rarefaction.

Would it make more sense to drop samples only where necessary? In the example above, a sample with 45K reads would then be dropped only at the final depth of 50,000, and not at 11K, 21K, 31K and 41K as in the current implementation.

I think this might even be the intent, as a warning is issued at each rarefaction depth to notify the user, rather than just once.

I'd be interested to hear whether this is by design.
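The proposed behaviour can be expressed as a per-depth filter: at each depth, keep exactly those samples whose total read count reaches that depth. The C++ sketch below is a hypothetical illustration of the proposal, not rtk's current logic. The function name samples_per_depth and the index-based output are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical sketch of the proposal, not rtk's current logic: for each
// depth, keep the indices of exactly those samples whose total read count
// reaches that depth, instead of dropping every sample that misses the largest depth.
std::map<std::uint64_t, std::vector<std::size_t>> samples_per_depth(
    const std::vector<std::uint64_t>& totals,
    const std::vector<std::uint64_t>& depths) {
    std::map<std::uint64_t, std::vector<std::size_t>> kept;
    for (std::uint64_t depth : depths) {
        if (kept.count(depth) != 0) {
            continue;  // ignore repeated depths
        }
        std::vector<std::size_t>& ids = kept[depth];
        for (std::size_t i = 0; i < totals.size(); ++i) {
            if (totals[i] >= depth) {
                ids.push_back(i);
            }
        }
    }
    return kept;
}
```

Consider totals of 45,000, 60,000 and 30,000 reads and depths from 1,000 up to 50,000. The first sample is kept at every depth except 50,000, the second at every depth, and the third up to 21,000.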

Transpose matrix

As of commit 3e4dd9b, the matrix is transposed by default, producing output similar to vegan's.

Input:

OUT      Reihe 1  Reihe 2  Reihe 3
Zeile 1      100        2        3
Zeile 2        4      100        6
Zeile 3       70        2      500
Zeile 4       10      100        5

Vegan produces:

         Reihe.1  Reihe.2  Reihe.3
Zeile 1       20        0        0
Zeile 2        0       18        2
Zeile 3        3        0       17
Zeile 4        0       18        2

Previously, rarefaction produced the following (and still does with the option margin = 1):

         Zeile 1  Zeile 2  Zeile 3  Zeile 4
Reihe 1       11        0        8        1
Reihe 2        0        7        0       13
Reihe 3        0        0       20        0

Now the default is margin = 2:

         Reihe 1  Reihe 2  Reihe 3
Zeile 1       19        1        0
Zeile 2        0       19        1
Zeile 3        4        0       16
Zeile 4        3       16        1

It is thus consistent with vegan in this respect.
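The outputs differ only in which margin is rarefied. Each row of the vegan-style output sums to 20, whereas each row of the old output (that is, each column of the input) sums to 20. The C++ sketch below illustrates both, assuming a depth of 20 as in the example. rarefy_rows and transpose are hypothetical helpers, and columns are rarefied by transposing, rarefying rows and transposing back. Because the draws are random, the sketch will not reproduce the exact numbers shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Hypothetical helper: rarefy each row of `m` to `depth` reads (rows are
// samples, as in vegan's orientation). Each read is expanded into its column
// index, the reads are shuffled, and the first `depth` are kept, which is
// sampling without replacement.
Matrix rarefy_rows(const Matrix& m, int depth, std::mt19937& rng) {
    Matrix out;
    for (const std::vector<int>& row : m) {
        std::vector<std::size_t> reads;
        for (std::size_t j = 0; j < row.size(); ++j) {
            for (int c = 0; c < row[j]; ++c) {
                reads.push_back(j);
            }
        }
        if (static_cast<int>(reads.size()) < depth) {
            throw std::invalid_argument("row has fewer reads than depth");
        }
        std::shuffle(reads.begin(), reads.end(), rng);
        std::vector<int> rarefied(row.size(), 0);
        for (std::size_t k = 0; k < static_cast<std::size_t>(depth); ++k) {
            ++rarefied[reads[k]];
        }
        out.push_back(rarefied);
    }
    return out;
}

// Swap rows and columns; rarefying columns = transpose, rarefy rows, transpose back.
Matrix transpose(const Matrix& m) {
    if (m.empty()) {
        return {};
    }
    Matrix t(m[0].size(), std::vector<int>(m.size(), 0));
    for (std::size_t i = 0; i < m.size(); ++i) {
        for (std::size_t j = 0; j < m[i].size(); ++j) {
            t[j][i] = m[i][j];
        }
    }
    return t;
}
```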

Allow users to select their own seed value

Hi, replicability is important, and it would be great if rtk users could supply their own pseudo-random seed values. This would allow different collaborators, or different runs of the same pipeline, to produce exactly the same rarefaction results, eliminating one source of variability.

I haven't tried it yet, but I assume that setting a literal value on line 552 of IO.cpp would do the trick:

// IO.cpp:552
unsigned long long seed = (unsigned long long)chrono::high_resolution_clock::now().time_since_epoch().count();

A command-line option such as --seed INT would be ideal.
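A --seed option could fall back to the current clock-based seed when no value is given. The C++ sketch below is a hypothetical illustration. choose_seed and first_draws are not rtk functions, std::mt19937_64 is an assumed engine (the generator rtk actually feeds this seed into is not shown here), and std::optional requires C++17. The sketch also demonstrates the property the request relies on: two engines built from the same seed produce the same sequence.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>
#include <vector>

// Hypothetical --seed handling: use the user's seed if given, otherwise fall
// back to a clock-based seed like the one currently computed in IO.cpp.
unsigned long long choose_seed(std::optional<unsigned long long> user_seed) {
    if (user_seed.has_value()) {
        return *user_seed;
    }
    return static_cast<unsigned long long>(
        std::chrono::high_resolution_clock::now().time_since_epoch().count());
}

// First `n` outputs of an engine seeded with `seed` (engine choice assumed).
std::vector<std::uint64_t> first_draws(unsigned long long seed, std::size_t n) {
    std::mt19937_64 rng(seed);
    std::vector<std::uint64_t> out;
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(rng());
    }
    return out;
}
```

Passing std::nullopt keeps the current non-reproducible behaviour, so existing runs would be unaffected unless a seed is supplied.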
