
Comments (3)

msberends commented on May 28, 2024

Good question. The classes that the AMR package provides are, like all custom R classes in any package, only understood by R. This means that exporting the data to any non-R format will lose the class attributes. Fortunately, this is not a big problem. Since the data were converted by the AMR package in the first place, rerunning functions like as.mo() and as.disk() will only take milliseconds. The functions are built so that they first check whether all values to be transformed already look like valid values.

As an example:

microbenchmark::microbenchmark(x <- as.mo(c("E. coli test",
                                            "Staph aureus kind of",
                                            "Some Str. pyogenes")),
                               times = 5)
#> Unit: seconds
#> expr       min       lq     mean   median       uq      max neval
#>    x  7.516604 7.569028 7.584643 7.576983 7.579677 7.680924     5

# `x` now consists of valid MO values:
x
#> Class <mo>
#> [1] B_ESCHR_COLI      B_STPHY_AURS_ANRB B_STRPT_PYGN     

# transform to character, as it would be in e.g. a CSV
x <- as.character(x)
x
#> [1] "B_ESCHR_COLI"      "B_STPHY_AURS_ANRB" "B_STRPT_PYGN"     

# now run again:
microbenchmark::microbenchmark(x <- as.mo(x), times = 5, unit = "s")
#> Unit: seconds
#>           expr         min          lq        mean     median          uq         max neval
#>  x <- as.mo(x) 0.004543719 0.004590081 0.004932438 0.00467622 0.004956594 0.005895575     5

The median time for this small test went from about 7.6 seconds to about 0.005 seconds. The same holds for as.rsi(), as.disk() and as.mic(): if you used any of those functions before, running them again on data where the attributes were lost will be very fast.
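To picture why the second run is so much faster, here is a minimal base R sketch of a "check first, convert only if needed" function. It is not the AMR package's actual implementation: `lookup`, `as_toy_code()` and the `toy_code` class are hypothetical names made up for this illustration, and the slow path is just a simple name lookup standing in for the real matching work.

```r
# Hypothetical lookup table: free-text names -> codes (not AMR's real data)
lookup <- c("e. coli" = "B_ESCHR_COLI", "s. aureus" = "B_STPHY_AURS")

# Hypothetical converter with a fast path for input that is already valid
as_toy_code <- function(x) {
  x <- as.character(x)
  if (all(x %in% lookup | is.na(x))) {
    # Fast path: every value is already a valid code, so only restore the class
    return(structure(x, class = c("toy_code", "character")))
  }
  # Slow path: normalise the free text and translate it; unknown input becomes NA
  cleaned <- tolower(trimws(x))
  out <- ifelse(x %in% lookup, x, unname(lookup[cleaned]))
  structure(out, class = c("toy_code", "character"))
}

first <- as_toy_code(c(" E. coli", "S. aureus"))  # takes the slow path
again <- as_toy_code(as.character(first))         # takes the fast path
identical(first, again)
#> [1] TRUE
```

The point of the design is that a class lost in a CSV round trip can be put back cheaply, because the stored values are already the valid codes.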

Of course it's better to avoid having to transform variables again, which is only possible with RDS. You already say that you use this, so I don't understand why attributes are lost on your system. As an example test (example_isolates is an example data set from the AMR package that contains an <mo> column and numerous <rsi> columns):

saveRDS(example_isolates, "test.rds")
test <- readRDS("test.rds")
identical(example_isolates, test)
#> [1] TRUE

Since identical() also checks attributes, this means that the newly imported test data set is identical to the example data set, including all attributes. This is exactly what one would expect from exporting to RDS and importing again. So I'm not sure what happened in your script?
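For contrast, the difference between a text format and RDS can be checked with base R alone. The sketch below uses a made-up S3 class, `toy_mo`, as a stand-in for a package class such as <mo>; the class name and the data are assumptions for illustration only.

```r
# Small data frame with one column carrying a custom (hypothetical) S3 class
df <- data.frame(id = 1:3)
df$code <- structure(c("B_ESCHR_COLI", "B_STPHY_AURS", "B_STRPT_PYGN"),
                     class = c("toy_mo", "character"))

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

# Round trip through CSV (plain text) and through RDS (R's own serialisation)
write.csv(df, csv_file, row.names = FALSE)
saveRDS(df, rds_file)
from_csv <- read.csv(csv_file, stringsAsFactors = FALSE)
from_rds <- readRDS(rds_file)

class(from_csv$code)     # the custom class is gone after CSV
class(from_rds$code)     # the custom class survives RDS
identical(df, from_rds)  # attributes included
```

Any format that stores only the values, not R attributes, should behave like the CSV case here.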


jukkiebah commented on May 28, 2024

Thanks, I am still learning. I had switched to feather because of its increased speed with the large files that I am handling (>2 GB). I hadn't realised that RDS is R's own format. I thought I would also lose the attributes. This is not the case. I switched back to RDS now. Thanks!


msberends commented on May 28, 2024

No problem at all! We’re all still learning, that’s what’s great about new methods 😄

