ropensci / fastmatmr

R bindings to fast_matrix_market for reading and writing .mtx files

Home Page: https://docs.ropensci.org/fastMatMR/

License: Other

R 25.77% C++ 74.21% C 0.02%
cpp17 matrix-market matrix-market-format r-package r-cpp11

fastmatmr's Introduction

fastMatMR

About

fastMatMR provides R bindings for reading and writing Matrix Market files using the high-performance fast_matrix_market C++ library (version 1.7.4).

Why?

Matrix Market files are crucial to much of the data-science ecosystem. The fastMatMR package focuses on high-performance read and write operations for Matrix Market files, serving as a key tool for data extraction in computational and data science pipelines.

The target audience consists primarily of data scientists and researchers developing numerical methods who may wish to test against the standard NIST (National Institute of Standards and Technology) Matrix Market collection, which supports:

comparative studies of algorithms for numerical linear algebra, featuring nearly 500 sparse matrices from a variety of applications, as well as matrix generation tools and services.

Additionally, support for the Matrix Market file format makes it easier to interface R analyses with those in Python (e.g. SciPy uses the same underlying C++ library). These files can also be used with the Tensor Algebra Compiler (TACO).

Features

  • Extended Support: fastMatMR supports standard R vectors and matrices, as well as Matrix sparse objects.

  • Performance: The package is a thin wrapper around one of the fastest C++ libraries for reading and writing .mtx files.

  • Correctness: Round-tripping data with NA and NaN values works by coercing both to NaN, whereas Matrix produces arbitrarily high numbers.

We have vignettes for both read and write operations to demonstrate the performance claims.

Alternatives and statement of need

  • The Matrix package allows reading and writing sparse matrices in the .mtx (matrix market) format.
    • However, for .mtx files, it can only handle sparse matrices for writing and reading.
    • Round-tripping (writing and subsequently reading) data with NA and NaN values produces arbitrarily high numbers instead of preserving NaN or handling NA.

Installation

CRAN

For the latest CRAN version:

install.packages("fastMatMR")

R-Universe

For the latest development version of fastMatMR:

install.packages("fastMatMR",
                 repos = "https://ropensci.r-universe.dev")

Development Git

For the latest commit, one can use:

# install.packages("devtools")
devtools::install_github("ropensci/fastMatMR")

Quick Example

library(fastMatMR)
spmat <- Matrix::Matrix(c(1, 0, 3, 2), nrow = 2, sparse = TRUE)
write_fmm(spmat, "sparse.mtx")
fmm_to_sparse_Matrix("sparse.mtx")

The resulting .mtx file is language agnostic and can be read back in Python, for example:

pip install fast_matrix_market
python -c 'import fast_matrix_market as fmm; print(fmm.read_array_or_coo("sparse.mtx"))'
((array([1., 3., 2.]), (array([0, 0, 1], dtype=int32), array([0, 1, 1], dtype=int32))), (2, 2))
python -c 'import fast_matrix_market as fmm; print(fmm.read_array("sparse.mtx"))'
array([[1., 3.],
       [0., 2.]])
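
For readers who want to see what is inside the file, the coordinate layout can also be parsed with the Python standard library alone. The sketch below assumes the file holds a `real general` coordinate matrix with 1-based indices, which is consistent with the values printed above. The text in `example_mtx` is hand-written for illustration and is not captured output from fastMatMR, and `parse_coordinate_mtx` is a hypothetical helper that belongs to neither library.

```python
def parse_coordinate_mtx(text):
    """Parse a 'matrix coordinate real general' Matrix Market string.

    Returns (shape, entries), where entries maps zero-based (row, col)
    tuples to floats. Only this one flavour of the format is handled.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].lower().split()
    if header != ["%%matrixmarket", "matrix", "coordinate", "real", "general"]:
        raise ValueError(f"unsupported header: {lines[0]!r}")

    # Comment lines start with '%'; the first remaining line is the size line.
    body = [line for line in lines[1:] if not line.startswith("%")]
    n_rows, n_cols, n_entries = (int(tok) for tok in body[0].split())

    entries = {}
    for line in body[1:]:
        row, col, value = line.split()
        # Matrix Market indices are 1-based; shift them to 0-based.
        entries[(int(row) - 1, int(col) - 1)] = float(value)

    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return (n_rows, n_cols), entries


# Hand-written text (an assumption) describing the 2 x 2 example matrix.
example_mtx = """%%MatrixMarket matrix coordinate real general
2 2 3
1 1 1
1 2 3
2 2 2
"""

shape, entries = parse_coordinate_mtx(example_mtx)
# Fill in the implicit zeros to obtain a dense list of rows.
dense = [[entries.get((i, j), 0.0) for j in range(shape[1])]
         for i in range(shape[0])]
print(dense)
```

Running it prints `[[1.0, 3.0], [0.0, 2.0]]`, which agrees with the `read_array` output above.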

Similarly, fastMatMR supports writing and reading from other R objects (e.g. standard R vectors and matrices), as seen in the getting started vignette.

Contributing

Contributions are very welcome. Please see the Contribution Guide and our Code of Conduct.

License

This project is licensed under the MIT License.

Logo

The logo was generated via a non-commercial use prompt on hotpot.ai, in both blue and green, as a riff on the NIST Matrix Market logo. The text was added in presentation software (WPS Presentation). Hexagonal cropping was accomplished in a hexb compatible design using hexsticker.

fastmatmr's People

Contributors

czeildi, haozeke

fastmatmr's Issues

TST: Refactor unit tests

The readability of the read unit tests could be improved if all tests were self-contained and independent of each other, e.g. you can run any test_that block by itself and it is a working test (ideally with no setup code before the test_that blocks).

I think there is no need to add library(testthat) to test files.

Consistency of indentation could be improved; consider running styler::style_pkg. See for example https://github.com/HaoZeke/fastMatMR/blob/main/tests/testthat/test-write_fmm.R#L34

First noted here: ropensci/software-review#606 (comment).

DOC: Reword `readme`

readme: "Unlike the Matrix package": maybe rephrase? It is a bit off-putting for me to start the whole readme with a negative comparison, although I understand that a core part of the motivation for this package is that Matrix does not support everything and is not fast enough. Even changing the order of the two parts of the sentence would feel better to me, but this is really personal.

First noted here: ropensci/software-review#606 (comment).

ENH: Better named functions

Consider increasing the consistency of function names, e.g. fmm_to_sparseMatrix vs sparse_to_fmm. sparse_to_fmm was a bit confusing to me in that, if I understand correctly, the package is 'fast', not the target format itself (?). matrix and vector are abbreviated to spare a few characters, while sparse_matrix is not. A vague suggestion: possibly follow the pattern of fmm_write_vector, fmm_read_to_vector for all formats?

First noted here: ropensci/software-review#606 (comment).

This is a breaking change, so I'm not too sure, but then again, better now than any other time. I'm OK with fmm myself, since it's the package name, but yeah, consistency is a good target.

ENH: Handle `~` or be more explicit

It seems the library cannot handle paths with ~ (?). If this cannot be made to work easily, an error message would be very helpful; I think it would need to be added here: https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L26 The corresponding read code returns an error message, 'cannot open file', which is more helpful.
(I am guessing the issue is related to ~ in the path because write_fmm(vec, "~/vector.mtx") returned false for me while write_fmm(vec, "/home/ildi_home/vector.mtx") worked.)

First noted here: ropensci/software-review#606 (comment).

Generally, since R needs to work on Windows as well, supporting ~ will be hard, but a better error message would be nicer.
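
One way to address both points is to expand the path before it reaches the native writer and to fail with an explicit message when the target directory does not exist. The sketch below shows the idea in Python with `os.path.expanduser`; R offers `path.expand()` for the same purpose. The helper name `resolve_output_path` is hypothetical and is not part of fastMatMR.

```python
import os


def resolve_output_path(path):
    """Expand a leading '~' and check that the parent directory exists.

    Raises FileNotFoundError with an explicit message instead of letting a
    lower-level writer fail silently.
    """
    expanded = os.path.abspath(os.path.expanduser(path))
    parent = os.path.dirname(expanded)
    if not os.path.isdir(parent):
        raise FileNotFoundError(
            f"cannot write {path!r}: directory {parent!r} does not exist"
        )
    return expanded


# Demonstrate the error message with a directory that is assumed not to exist.
try:
    resolve_output_path("~/no_such_directory_for_demo/vector.mtx")
except FileNotFoundError as err:
    print(err)
```

The caller then receives either an absolute path that is safe to hand to the writer or an error that names the missing directory.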

ENH: Use Ryu

Or another floating point library. As is done upstream.

Stalling when reading sparse matrices W/O loading Matrix

Thank you for a fantastic tool! It has really sped up my day-to-day work when combining R and Python.

If I do

library(fastMatMR)
fmm_to_sparse_Matrix("/path/to/my/sparse/matrix.mtx")

it stalls. It seems to read in the matrix (activity on multiple CPUs, memory increases), but when it tries to write it as sparse in memory nothing happens (one core at 100%, no change in memory).

But, if I do

library(Matrix)
library(fastMatMR)
fmm_to_sparse_Matrix("/path/to/my/sparse/matrix.mtx")

It finishes in seconds, as expected.

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] fastMatMR_1.0.0.0 Matrix_1.6-1.1   

loaded via a namespace (and not attached):
[1] compiler_4.3.1 tools_4.3.1    grid_4.3.1     lattice_0.21-8

MAINT: Get `R CMD check` to give less notes

I see that R CMD check seems to give a note about pixi files; it seems pixi.* might not be the correct syntax in .Rbuildignore? (I do not know what the correct syntax would be, but the check gives a note both in GitHub Actions and locally, it seems. If all else fails, adding the files one by one instead of a wildcard syntax definitely works.)

First noted here: ropensci/software-review#606 (comment).

pkgcheck results - main

Checks for fastMatMR (v0.0.1.0)

git hash: 7c7b96af

  • ✔️ Package name is available
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 90.5%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Package License: MIT + file LICENSE

MAINT: More scoped `nolint` rules

This is really overly pedantic, but you could consider specifying which rule to ignore in the #nolint comments instead of ignoring every rule (it is generally considered best practice, but has no real impact here). This is not needed if you would have to ignore several rules.

First noted here: ropensci/software-review#606 (comment).

CI: Rename workflow

The pre-commit GitHub workflow name is a bit misleading to me. I understand you could add the same functionality as a pre-commit hook, but a GitHub Action is a post-commit/push thing; maybe title it after what it does, e.g. style checking?

First noted here: ropensci/software-review#606 (comment).

DOC: Make `fmm_to*` runnable

I understand that the fmm_to_* functions need an mtx file to work, and I guess that is why the examples with these functions are marked with \dontrun{}. However, wouldn't it be possible to define a character vector in R that contains the required input, and then feed this to fmm_to_* in order to have an example that can actually be run?

First noted here: ropensci/software-review#606 (comment)
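
The suggestion amounts to keeping the file contents inline and writing them to a temporary file before reading. A minimal sketch of that pattern follows, in Python like the interoperability example in the README; in R the same steps would use `tempfile()` and `writeLines()`. The array-format text and the `read_size_line` helper are illustrative assumptions, not fastMatMR code.

```python
import os
import tempfile

# Inline fixture: a dense 2 x 2 matrix in Matrix Market array format,
# with values listed in column-major order (hand-written for illustration).
fixture_lines = [
    "%%MatrixMarket matrix array real general",
    "2 2",
    "1", "0", "3", "2",
]


def read_size_line(path):
    """Return (rows, cols) from the first non-comment line of an .mtx file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip() and not line.startswith("%"):
                rows, cols = line.split()[:2]
                return int(rows), int(cols)
    raise ValueError("no size line found")


# Write the fixture to a temporary file, read it, and let the directory
# be cleaned up automatically afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    mtx_path = os.path.join(tmp_dir, "example.mtx")
    with open(mtx_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(fixture_lines) + "\n")
    size = read_size_line(mtx_path)

print(size)
```

Because the fixture lives in the example itself, nothing has to be shipped alongside the documentation and the example leaves no files behind.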

DOC: Function help details are incorrect

In the documentation of several functions you write "This function has no return value." This is technically not true: all functions return NULL if they have no other return value. Rather, these functions are called not for their return value but for their side effect. Maybe see how other packages handle write functions?

First noted here: ropensci/software-review#606 (comment).

MAINT: Acknowledge reviewers

Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Make these changes before final changes!

TST: Add some

Both round tripping through R and also the python integration tests.
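
A round-trip test writes a value, reads it back, and compares the two with NaN treated as equal to NaN. The sketch below illustrates the shape of such a test using only the Python standard library and the dense array format. `write_array_mtx`, `read_array_mtx`, and `same_values` are hypothetical helpers written for this illustration, and the `nan` token relies on Python's own float formatting rather than on any guarantee about what fastMatMR or fast_matrix_market emit.

```python
import math


def write_array_mtx(matrix):
    """Serialise a list-of-rows matrix as Matrix Market array-format text."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    lines = ["%%MatrixMarket matrix array real general", f"{n_rows} {n_cols}"]
    # The array format lists values column by column.
    for j in range(n_cols):
        for i in range(n_rows):
            lines.append(repr(float(matrix[i][j])))
    return "\n".join(lines) + "\n"


def read_array_mtx(text):
    """Parse array-format text back into a list-of-rows matrix."""
    body = [ln for ln in text.splitlines()
            if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols = (int(tok) for tok in body[0].split())
    values = [float(tok) for tok in body[1:]]
    # Undo the column-major ordering used by the writer.
    return [[values[j * n_rows + i] for j in range(n_cols)]
            for i in range(n_rows)]


def same_values(a, b):
    """Element-wise equality that treats NaN as equal to NaN."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        return False
    return all(
        (math.isnan(x) and math.isnan(y)) or x == y
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


original = [[1.0, float("nan")], [0.5, 2.0]]
restored = read_array_mtx(write_array_mtx(original))
assert same_values(original, restored)
```

The NaN-aware comparison matters because `float("nan") == float("nan")` is false, so a plain equality check would report a failure even when the round trip preserved the value.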
