bigmemory's People

Contributors

adamryczkowski, cdeterman, eddelbuettel, jameslamb, jeblundell, kaneplusplus, privefl, rtobar

bigmemory's Issues

Session crash when using two versions of bigmemory.

I have a package that depends on bigmemory. If I build it with one version of bigmemory, it is OK.
Then if I rebuild another version of bigmemory (without rebuilding my package), my package makes the session crash when I use a function.
Rebuilding the package with the current version of bigmemory solves the problem.
If someone can tell why this happens, I am interested.

TL;DR: Each time you build another version of bigmemory, you may need to rebuild the packages that depend on bigmemory if they make your session crash.

Migrate util.h and util.cpp to Rcpp

We are currently resorting to undefining length in util.h on the Mac (and Windows?) because we are mixing R libraries with Rcpp. It would be nice if this file were migrated to use Rcpp only.

sub.big.matrix of a non-shared big.matrix?

a <- big.matrix(5, 5, shared = FALSE)
a.sub <- sub.big.matrix(a, 1, 3)
Error in DescribeBigMatrix(x) : 
  you can't describe a non-shared big.matrix.

Is it normal? Why?

Non-contiguous sub.big.matrix?

Currently we have sub.big.matrix but it explicitly states that non-contiguous subsets are not possible. It would be great if we could find a way to get this implemented before the next release.

Bigmemory gives an error in R CMD CHECK environment

From @cdeterman. The primary problem appears to be in the big.matrix.Rd file with the example

x <- big.matrix(10, 2, type='integer', init = -5)

Strangely this returns the error

Error: memory could not be allocated for instance of type big.matrix
where somehow CreateSharedMatrix is returning a null address. This is very odd as it works without a problem when run in the R console even when multiple big.matrix objects are created consecutively. The only way I have been able to reproduce the error is by rerunning R CMD CHECK or R --vanilla < bigmemory-Ex.R where bigmemory-Ex.R was previously produced by R CMD CHECK.

rbind-like function?

Hi,
I am wondering whether there is an rbind-like function, as it seems that rbind() cannot handle big.matrix objects.
Thanks
Vincent

Problem is SetElements

Fix this and add a test.

N <- 1000
K <- 26
x <- big.matrix(N, K, type="char", backingfile="toy.bin", descriptorfile="toy.desc")
dim(x)
# [1] 1000   26
# # Some interesting points, here...
options(bigmemory.typecast.warning=FALSE)
options(bigmemory.allow.dimnames=TRUE)
# # Then do some basic things with the natural R syntax:
x[1:2, 1:2] <- 1:4 # Simple assignment
# Error in SetElements.bm(x, i, j, value) : 
#   number of items to replace is not a multiple of replacement length

Question about multiplyr fix

@privefl Thanks very much for the multiplyr pull request. @jeblundell has committed it and the new version of bigmemory should be on CRAN soon.

I was just reviewing the change here. Can you tell me what the problem was? I'm having trouble isolating it with a simple example:

library(bigmemory)
x <- big.matrix(5, 2, type="integer", init=0,
                dimnames=list(NULL, c("alpha", "beta")))

# I thought the following would reproduce the problem.
x[x[,1] < 3, 2]

Current state on Windows unclear

Hi Michael,

I see that a lot of work has been done since this question was last asked (see #2), and I had no problems installing the package from CRAN on a virtual machine running Windows, so I am wondering whether there are any remaining issues with Windows support. http://bigmemory.org still mentions that "updating bigmemory with restored support for Windows" is a work in progress.

Thanks,
Alex

Questions on dev

I'm starting to put my nose in the code of this package to

  • understand it better,
  • try to solve some problems,
  • try to add some features.

I have some questions on some choices of implementation. It will be mostly C/C++ questions as I'm more comfortable with R. They may sound "stupid" to you.

I will use this conversation to ask my questions and hope for your answers, so that I can contribute more to this package.

Ordering Columns?

We currently have mpermute, which reorders rows. It seems to me we should have the option to reorder columns as well. Thoughts?

Resource leak when shared big.matrix used with multicore

library(bigmemory)
library(doMC)
registerDoMC(cores=2)
a = big.matrix(nrow=3, ncol=3, shared=TRUE)
desc = describe(a)
foreach (i=1:3) %dopar% {
  m = attach.big.matrix(desc)
  m[i,1] = i
  NULL
}

causes a shared memory resource leak. The following does not.

library(bigmemory)
library(doMC)
registerDoMC(cores=2)
a = big.matrix(nrow=3, ncol=3, shared=TRUE)
desc = describe(a)
foreach (i=1:3) %dopar% {
  m = attach.big.matrix(desc)
  m[i,1] = i
  rm(m)
  gc()
  NULL
}

What are the current limitations for windows?

I am well aware that bigmemory is not currently installable on Windows. What are the current reasons for this limitation? What about Windows prevents the package from being platform independent?

big.matrix alters R's RNG seeds

If I run the following sequence of function calls multiple times, the outputs are always the same:

set.seed(123)
x <- rbinom(25, 1, 0.2)
m <- matrix(1, nrow = 5, ncol = 5)
rbinom(25, 1, 0.2)

but if I use big.matrix, then the outputs are different each time:

set.seed(123)
x <- rbinom(25, 1, 0.2)
m <- big.matrix(nrow = 5, ncol = 5, init = 1)
rbinom(25, 1, 0.2)

Any idea why this is happening? I suspected somewhere in the C++ code a 'random' function is called but I couldn't find anything to that effect.
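
Until the cause is identified, one possible workaround is to snapshot .Random.seed before the call and restore it afterwards. The following is a minimal base-R sketch: with_preserved_seed is a hypothetical helper name, not part of bigmemory, and runif(10) merely stands in for any call that disturbs the RNG state.

```r
# Hypothetical helper: evaluate `expr`, then put the RNG state back to
# what it was before the call.
with_preserved_seed <- function(expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) {
    old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
    # Restore the saved state when this function exits.
    on.exit(assign(".Random.seed", old_seed, envir = globalenv()), add = TRUE)
  }
  force(expr)
}

# Reference run: nothing touches the RNG between the two draws.
set.seed(123)
x <- rbinom(25, 1, 0.2)
reference <- rbinom(25, 1, 0.2)

# Same sequence, but with an RNG-consuming call wrapped in the helper.
set.seed(123)
x <- rbinom(25, 1, 0.2)
invisible(with_preserved_seed(runif(10)))  # stand-in for big.matrix(...)
restored <- rbinom(25, 1, 0.2)

identical(reference, restored)
```

In the scenario above, the call to wrap would be big.matrix(nrow = 5, ncol = 5, init = 1). This only hides the symptom; it does not explain why the state changes.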

Windows won't create consecutive filebacked.big.matrix objects

For some strange reason, Windows builds but errors out when you try to build a filebacked.big.matrix object a second time. This results in problems when trying to run the testing framework. The following reproduces the problem:

z <- filebacked.big.matrix(3, 3, type='integer', init=123,
                           backingfile="example.bin",
                           descriptorfile="example.desc",
                           dimnames=list(c('a','b','c'), c('d', 'e', 'f')))

z <- filebacked.big.matrix(3, 3, type='integer', init=123,
                           backingfile="example.bin",
                           descriptorfile="example.desc",
                           dimnames=list(c('a','b','c'), c('d', 'e', 'f')))

Error in CreateFileBackedBigMatrix(as.character(backingfile), as.character(backingpath),  : 
  Problem creating filebacked matrix.

So this is clearly having issues with the create function on line 2558 of bigmemory.cpp. I have no idea at the moment why this is happening, as it is not reproducible on Linux.

allow.duplicates not working?

From an email to bigmemoryauthors:

I've been trying to use the "bigmemory" R package but I'm having some issues when trying to remove all duplicated rows (across all columns) from a big.matrix. According to the manual I should be able to use mpermute with "allow.duplicates = FALSE", but I don't think it is working. For example:

m = matrix(as.double(as.matrix(iris)), nrow=nrow(iris))
x=m[,c(1,2)]
mpermute(x, cols=c(1:ncol(x)), allow.duplicates=FALSE)

I still get duplicated lines (e.g. 134, 135 and 136).
Is it a bug? Or am I doing something wrong?
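
For a matrix that fits in ordinary R memory, as x does in the example above, duplicated rows can be dropped with base R alone. The sketch below does not settle whether allow.duplicates is a bug, and unique_rows is a hypothetical helper name.

```r
# Hypothetical helper: keep the first occurrence of every distinct row.
unique_rows <- function(m) {
  m[!duplicated(m), , drop = FALSE]
}

x <- as.matrix(iris[, 1:2])   # the two numeric columns used above
u <- unique_rows(x)

nrow(x)            # number of rows before
nrow(u)            # number of rows after de-duplication
anyDuplicated(u)   # 0 means no duplicated rows remain
```

This works on an ordinary matrix only; applying it to a big.matrix would first require pulling the data into memory.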

`UndefinedBehaviorSanitizer`: object of type `SharedMemoryBigMatrix` is not `BigMatrix`

Hi @kaneplusplus ,

I use library(bigmemory) in my package bigKRLS (with @rbshaffer), and it has been highly effective, increasing what a typical laptop can do fivefold. bigKRLS also uses Rcpp and RcppArmadillo. bigKRLS uses library(parallel) for the marginal effects. (bigKRLS regresses y on some matrix X. If ncol(X) = p, bigKRLS uses p or Ncores - 2, whichever is less.)

When I submit to CRAN, I get the following error:

> N <- 500  # proceed with caution above N = 5,000 for system with 8 gigs made available to R
> P <- 4
> X <- matrix(rnorm(N*P), ncol=P)
> b <- 1:P 
> y <- sin(X[,1]) + X %*% b + rnorm(N)
> out <- bigKRLS(y, X, Ncores=1)
gauss_kernel.cpp:38:40: runtime error: member call on address 0x6120001ac8c0 which does not point to an object of type 'BigMatrix'
0x6120001ac8c0: note: object is of type 'SharedMemoryBigMatrix'
 25 00 80 47  38 80 92 67 87 7f 00 00  04 00 00 00 00 00 00 00  f4 01 00 00 00 00 00 00  f4 01 00 00

Similar errors repeat for the other function calls.

This means that X is coerced to a big.matrix by as.big.matrix() without issue, but by the time the Rcpp function is called there is apparently some slippage between SharedMemoryBigMatrix and BigMatrix. Any thoughts or suggestions on how to address this issue? (Also, please let me know if more detail re: bigKRLS would be helpful.)

object of type 'S4' is not subsettable when using dim names for subsetting

Hi,

The newest version of the bigmemory package on CRAN (4.5.28) introduced an issue. big.matrix used to support subsetting by dimnames, i.e. row and column names. Now this leads to the error: object of type 'S4' is not subsettable

Example:

test_big_matrix <- big.matrix(nrow = 5, ncol = 5, 
                              dimnames = list(
                                  c("A", "B", "C", "D", "E"), 
                                  c("F", "G", "H", "I", "J")))
test_big_matrix[,] <- 0 #works
test_big_matrix[c("A", "D"), c("H", "J")] <- 1 #used to work

I also checked whether setting this option helps, but it did not:

$bigmemory.allow.dimnames
[1] TRUE

Missing GetIndivMatrixElements in RcppExports

When building I'm getting the following note:

GetIndivElements.bm: no visible global function definition for
  ‘GetIndivMatrixElements’

It looks like this is happening because GetIndivElements.bm in bigmemory.R is calling GetIndivMatrixElements, which isn't defined.

I think there should be a GetIndivMatrixElements in RcppExports.R that calls GetIndivMatrixElements in bigmemory.cpp. However, I'm a little bit confused by the signature.

How can I create a shared big.matrix without using the backing file?

How can I create a shared big.matrix without using the backing file so as to force the matrix to always be used in memory? The reason I wish to do this is it seems that the big.matrix will use a swap file when it thinks it needs to, drastically slowing performance, so I would like to make the swap file usage off limits.

attaching a file backed big matrix fails when using '~' in backingpath on OS X

Currently I am on R 3.3.2, with bigmemory 4.5.19, OS X 10.11.6.

I noticed that when I use the tilde ('~') in a backing path of a file backed big matrix, attaching the matrix fails. It's a minor issue, I can work around it, but I think it's worth sharing with the community since it took me quite some time to find out what went wrong.

Here is the code that reproduces the error:

library(bigmemory)

options(bigmemory.typecast.warning=FALSE)

# create a simple filebacked matrix
BM = filebacked.big.matrix(10, 10, type = "integer", init = 5,
                            backingfile = 'M.bin',
                            backingpath = '~/Documents', descriptorfile='M.desc')

BMdescription <- describe(BM)

# attach it from the backing file
y <- attach.big.matrix(BMdescription, backingpath='~/Documents')

and the error:

Error in attach.resource(obj, path = list(...)[["backingpath"]], ...) : 
  Fatal error in attach: big.matrix could not be attached.
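
One workaround, assuming the failure comes from the tilde reaching the lower-level code unexpanded (the report does not confirm this), is to expand the path in R first with base R's path.expand(). A minimal sketch:

```r
# Expand a leading "~" to the user's home directory before the path is
# handed to bigmemory. Paths without a tilde pass through unchanged.
backing_dir <- path.expand("~/Documents")
backing_dir

unchanged <- path.expand("/tmp/data")
unchanged

# The expanded path would then replace the literal string, e.g.:
# y <- attach.big.matrix(BMdescription, backingpath = backing_dir)
```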

Memory leak in bigmemory?

Hi bigmemory authors,

I am resubmitting to CRAN my R package biglasso, which depends on bigmemory, and noticed that the Memtest with clang-UBSAN reveals some strange runtime errors related to BigMatrix object, though I didn't encounter any errors with R CMD check --as-cran, nor with win-builder.

I have attached some output below. Detailed output can be found here.

Could you clarify how to fix this? In addition, my package currently depends on bigmemory (>= 4.0.0). Should I change the dependency to a later version, say >= 4.5.0?

> cleanEx()
> nameEx("biglasso")
> ### * biglasso
> 
> flush(stderr()); flush(stdout())
> 
> ### Name: biglasso
> ### Title: Fit lasso penalized regression path for big data
> ### Aliases: biglasso
> 
> ### ** Examples
> 
> ## Linear regression
> data(colon)
> X <- colon$X
> y <- colon$y
> X.bm <- as.big.matrix(X)
> # lasso, default
> par(mfrow=c(1,2))
> fit.lasso <- biglasso(X.bm, y, family = 'gaussian')
gaussian_hsr.cpp:87:17: runtime error: member call on address 0x6120001bd540 which does not point to an object of type 'BigMatrix'
0x6120001bd540: note: object is of type 'SharedMemoryBigMatrix'
 36 00 80 0f  38 c1 ae 75 ad 7f 00 00  d0 07 00 00 00 00 00 00  3e 00 00 00 00 00 00 00  3e 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'SharedMemoryBigMatrix'
SUMMARY: AddressSanitizer: undefined-behavior gaussian_hsr.cpp:87:17 in 
/data/gannet/ripley/R/test-clang/bigmemory/include/bigmemory/BigMatrix.h:44:37: runtime error: member access within address 0x6120001bd540 which does not point to an object of type 'const BigMatrix'
0x6120001bd540: note: object is of type 'SharedMemoryBigMatrix'
 36 00 80 0f  38 c1 ae 75 ad 7f 00 00  d0 07 00 00 00 00 00 00  3e 00 00 00 00 00 00 00  3e 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'SharedMemoryBigMatrix'
SUMMARY: AddressSanitizer: undefined-behavior /data/gannet/ripley/R/test-clang/bigmemory/include/bigmemory/BigMatrix.h:44:37 in 
/data/gannet/ripley/R/test-clang/bigmemory/include/bigmemory/MatrixAccessor.hpp:37:39: runtime error: member call on address 0x6120001bd540 which does not point to an object of type 'BigMatrix'
0x6120001bd540: note: object is of type 'SharedMemoryBigMatrix'
 36 00 80 0f  38 c1 ae 75 ad 7f 00 00  d0 07 00 00 00 00 00 00  3e 00 00 00 00 00 00 00  3e 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'SharedMemoryBigMatrix'
SUMMARY: AddressSanitizer: undefined-behavior /data/gannet/ripley/R/test-clang/bigmemory/include/bigmemory/MatrixAccessor.hpp:37:39 in 
/data/gannet/ripley/R/test-clang/bigmemory/include/bigmemory/BigMatrix.h:41:28: runtime error: member access within address 0x6120001bd540 which does not point to an object of type 'BigMatrix'
0x6120001bd540: note: object is of type 'SharedMemoryBigMatrix'
 36 00 80 0f  38 c1 ae 75 ad 7f 00 00  d0 07 00 00 00 00 00 00  3e 00 00 00 00 00 00 00  3e 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'SharedMemoryBigMatrix'
SUMMARY: AddressSanitizer: undefined-behavior /data/gannet/ripley/R/test-clang/bigmemory/include/bigmemory/BigMatrix.h:41:28 in 
/data/gannet/ripley/R/test-clang/bigmemory/include/bigmemory/MatrixAccessor.hpp:38:23: runtime error: member call on address 0x6120001bd540 which does not point to an object of type 'BigMatrix'
0x6120001bd540: note: object is of type 'SharedMemoryBigMatrix'
 36 00 80 0f  38 c1 ae 75 ad 7f 00 00  d0 07 00 00 00 00 00 00  3e 00 00 00 00 00 00 00  3e 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'SharedMemoryBigMatrix'

read from gzfile

Hi,

Thank you for this useful package. I usually read my matrices from text files using read.big.matrix. I was wondering whether it would be possible to support input from gzfiles?

Best regards,

Marc

Binary backing file not consistent

Here is the code:

# test big.matrix
x = matrix(4, 2, 2)
y = as.big.matrix(backingfile = "test.bin", descriptorfile="test.desc", backingpath = "/tmp", binarydescriptor=TRUE, x = x)
y[, ]

On Ubuntu 15.04:

kaiyin@tron 18:58:11 | /tmp =>
        xxd -b test.bin
0000000: 00000000 00000000 00000000 00000000 00000000 00000000  ......
0000006: 00010000 01000000 00000000 00000000 00000000 00000000  .@....
000000c: 00000000 00000000 00010000 01000000 00000000 00000000  ...@..
0000012: 00000000 00000000 00000000 00000000 00010000 01000000  .....@
0000018: 00000000 00000000 00000000 00000000 00000000 00000000  ......
000001e: 00010000 01000000

On Mac OS 10.10:

kaiyin@kaiyins-mbp 18:59:48 | /tmp =>
        xxd -b test.bin
0000000: 00000000 00000000 00000000 00000000 00000000 00000000  ......
0000006: 00010000 01000000 00000000 00000000 00000000 00000000  .@....
000000c: 00000000 00000000 00010000 01000000 00000000 00000000  ...@..
0000012: 00000000 00000000 00000000 00000000 00010000 01000000  .....@
0000018: 00000000 00000000 00000000 00000000 00000000 00000000  ......
000001e: 00010000 01000000 00000000

As you can see, there is an extra null byte on mac.
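
As a point of reference, a 2 x 2 matrix of doubles should occupy 4 * 8 = 32 bytes. That matches the Ubuntu dump (offset 0x1e plus two bytes), while the Mac dump is one byte longer. The base-R sketch below writes the same four values to a temporary file with writeBin(). It assumes the backing file is a plain little-endian dump of the doubles, which the dumps suggest but the report does not state.

```r
values <- rep(4, 4)                  # the four doubles of matrix(4, 2, 2)
tmp <- tempfile(fileext = ".bin")
writeBin(values, tmp, size = 8, endian = "little")

n_bytes <- file.size(tmp)            # total bytes written
first_double <- readBin(tmp, what = "raw", n = 8)
unlink(tmp)

n_bytes
first_double                         # byte pattern of one double equal to 4
```

The last two bytes of first_double are 0x10 and 0x40, the same pattern that appears as 00010000 01000000 in the xxd -b output.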

float type matrices?

Given that the focus of this package is to use the minimal amount of memory as efficiently as possible, I believe it should include big.matrix objects of type float. There may be situations in which single precision is sufficient and would therefore need only about half the space of a double matrix. I know R does not have a single-precision data type, but since all the 'heavy lifting' is done in C++, this seems approachable. However, I would like additional opinions on the following points before I begin writing a bunch of code.

  1. Naturally, do you agree that float type matrices should be a part of this package?
  2. The current structure has matrix_type representing the byte size of each type. Continuing this with float types would lead to a conflict in various case statements. The only solution I could come up with off the top of my head was to add another field to the big.matrix object that is specific to the data type rather than the byte size, e.g. pMat->matrix_data_type() would return a string (i.e. 'int', 'float', 'double', etc.). This would require updating other code, such as the Rcpp Gallery posts, unless a more elegant solution can be conceived.
  3. Approaches would likely involve the use of typeid from <typeinfo>, unless we also want to begin moving towards the C++11 standard, where we could use the newer decltype specifier. That is possibly a moot point here, but it is worth beginning to think about C++11.

Any thoughts are appreciated :)
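
The "half the space" estimate follows directly from element size: 4 bytes per single-precision value against 8 per double. A small base-R sketch of the arithmetic, where matrix_bytes is a hypothetical helper and the dimensions are arbitrary:

```r
# Hypothetical helper: raw storage needed for the matrix elements alone
# (no descriptor or dimnames overhead).
matrix_bytes <- function(nrow, ncol, bytes_per_element) {
  as.numeric(nrow) * ncol * bytes_per_element
}

n <- 1e6
k <- 100

double_bytes <- matrix_bytes(n, k, 8)   # 8-byte doubles
float_bytes  <- matrix_bytes(n, k, 4)   # 4-byte floats
int_bytes    <- matrix_bytes(n, k, 4)   # 4-byte integers

double_bytes / 2^30          # size in GiB as doubles
float_bytes / double_bytes   # ratio of float to double storage
float_bytes == int_bytes     # same element size as integer
```

The last line illustrates the conflict raised in point 2: a float and an integer share the same byte size, so byte size alone cannot tell them apart.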

Is it possible to access a memory-backed bigmemory deterministically?

I want to use a big.matrix as a means of communication between two unrelated R processes. The big.matrix can be shared using its descriptor file, but one needs to send the descriptor file to the other process, so there is a chicken-and-egg problem.

One way of solving the problem is to save the descriptor file in a mutually agreed location in the file system.

Is there any way to get access to the big.matrix without using the file system? Something like a "named big.matrix" that works similarly to named mutexes?

Winbuilder notes to fix in CRAN submission.

* using log directory 'd:/RCompile/CRANguest/R-devel/bigmemory.Rcheck'
* using R Under development (unstable) (2017-03-24 r72390)
* using platform: x86_64-w64-mingw32 (64-bit)
* using session charset: ISO8859-1
* checking for file 'bigmemory/DESCRIPTION' ... OK
* this is package 'bigmemory' version '4.5.22'
* checking CRAN incoming feasibility ... Note_to_CRAN_maintainers
Maintainer: 'Michael J. Kane <[email protected]>'
* checking package namespace information ... OK
* checking package dependencies ... NOTE
Package which this enhances but not available for checking: 'synchronicity'
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking whether package 'bigmemory' can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking 'build' directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* loading checks for arch 'i386'
** checking whether the package can be loaded ... OK
** checking whether the package can be loaded with stated dependencies ... OK
** checking whether the package can be unloaded cleanly ... OK
** checking whether the namespace can be loaded with stated dependencies ... OK
** checking whether the namespace can be unloaded cleanly ... OK
** checking loading without being on the library search path ... OK
** checking use of S3 registration ... OK
* loading checks for arch 'x64'
** checking whether the package can be loaded ... OK
** checking whether the package can be loaded with stated dependencies ... OK
** checking whether the package can be unloaded cleanly ... OK
** checking whether the namespace can be loaded with stated dependencies ... OK
** checking whether the namespace can be unloaded cleanly ... OK
** checking loading without being on the library search path ... OK
** checking use of S3 registration ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking compiled code ... NOTE
File 'bigmemory/libs/i386/bigmemory.dll':
  Found no calls to: 'R_registerRoutines', 'R_useDynamicSymbols'
File 'bigmemory/libs/x64/bigmemory.dll':
  Found no calls to: 'R_registerRoutines', 'R_useDynamicSymbols'

It is good practice to register native routines and to disable symbol
search.

See 'Writing portable packages' in the 'Writing R Extensions' manual.
* checking sizes of PDF files under 'inst/doc' ... OK
* checking installed files from 'inst/doc' ... OK
* checking files in 'vignettes' ... OK
* checking examples ...
** running examples for arch 'i386' ... [2s] OK
** running examples for arch 'x64' ... [2s] OK
* checking for unstated dependencies in 'tests' ... OK
* checking tests ...
** running tests for arch 'i386' ... [4s] OK
  Running 'testthat.R' [4s]
** running tests for arch 'x64' ... [7s] OK
  Running 'testthat.R' [6s]
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in 'inst/doc' ... OK
* checking re-building of vignette outputs ... [6s] OK
* checking PDF version of manual ... OK
* DONE
Status: 2 NOTEs

Support for n-dimensional arrays?

I was wondering if you would consider supporting n-dimensional arrays in your package. It seems it would be relatively straightforward to do so. ff does this, but it doesn't provide a C++-level interface that would allow me to work with ff objects from Rcpp, as bigmemory does.

as.big.matrix() with type "raw"

I tried:

> x <- matrix(as.raw(sample(0:255, 100)), 10, 10)
> class(x)
[1] "matrix"
> typeof(x)
[1] "raw"
> as.big.matrix(x, type = "raw")
Error in SetMatrixElements(x@address, as.double(j), as.double(i), value) : 
  RAW() can only be applied to a 'raw', not a 'double'
In addition: Warning messages:
1: In as.big.matrix(x, type = "raw") : Casting to numeric type
2: In SetElements.bm(x, i, j, value) :
  Assignment will down cast from double to raw
Hint: To remove this warning type:  options(bigmemory.typecast.warning=FALSE)

Am I doing something wrong or is it a missing implementation?

@adamryczkowski

object of type 'S4' is not subsettable

a <- big.matrix(3, 3)
m <- matrix(1:6, 2, 3)
a[1:2, ] <- m

now gives

Error in a[1:2, ] <- m : object of type 'S4' is not subsettable

This seems to be a problem only when accessing a subset of rows.
It worked in earlier versions (don't know which though).

I now use a[1:2, ] <- as.numeric(m), which is less straightforward but works.

How to use big.matrix with raw bytes?

It seems that a big.matrix of type char is... char, i.e. a signed byte. In R a byte is unsigned (there is no signed byte in R).

How to store a byte in a big.matrix?

Or: how to make this test pass?

test_that("Reading and writing byte>=128 on memory-backed file", {
  m <- big.matrix(10, 1, type = 'char')
  m[4, 1] <- as.raw(130)
  expect_equal(m[4, 1], 130)
})
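
One way to work around this, sketched below in base R, is to map each unsigned byte (0 to 255) onto the signed range (-128 to 127) before storing it and to reverse the mapping after reading. to_signed and to_unsigned are hypothetical helper names, and the sketch does not check whether bigmemory reserves any signed value (for example as a missing-value marker).

```r
# Hypothetical helper: unsigned byte (raw or 0..255) -> signed value.
to_signed <- function(b) {
  v <- as.integer(b)
  ifelse(v > 127L, v - 256L, v)
}

# Hypothetical helper: signed value (-128..127) -> raw byte.
to_unsigned <- function(s) {
  s <- as.integer(s)
  as.raw(ifelse(s < 0L, s + 256L, s))
}

stored <- to_signed(as.raw(130))   # value that fits a signed char
stored
to_unsigned(stored)                # back to the original byte

# Round trip over every possible byte value.
all(to_unsigned(to_signed(as.raw(0:255))) == as.raw(0:255))
```

With these helpers, the assignment in the test would become m[4, 1] <- to_signed(as.raw(130)), and the value read back would be passed through to_unsigned().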

Define Location of 'backing counter' files?

When I create a given big.matrix object, a series of associated files (counter, counter_mutex, etc.) are created in a system-defined temporary directory. For example, I have seen /dev/shm and /run/shm. Is there a way to define an alternate directory where these files could go? If not, perhaps an option similar to the one available in bigalgebra, options(bigalgebra.tempdir), would be preferable. This way a user can place these files in a defined location and deal with them separately from other temp files if they so wish.

deepcopy appears to drop dimnames when `cols` argument used

Even with options(bigmemory.allow.dimnames = TRUE) set, it appears that deepcopy drops dimnames (notably column names) when you subset the columns.

For example:

df <- data.frame(A = seq(2), B = rnorm(2))
bm <- as.big.matrix(df)
dm <- deepcopy(bm, cols = 1)

colnames(bm)
[1] "A" "B"

colnames(dm)
NULL

This is not the case though if the cols argument is not used:

dm <- deepcopy(bm)

colnames(bm)
[1] "A" "B"

convert a sparse matrix to a big.matrix.

When I use biglasso, my data is a sparse matrix class from the Matrix package. biglasso seems to support only a big.matrix, and I cannot convert a sparse matrix to a big.matrix. Any suggestions? Thanks.

Test coverage is low

@kaneplusplus The bigmemory package has low test coverage, so it is easy to add a feature that breaks something else. (I will add the quoted m[1:2,1] == c(NA, -5) to the tests.)

When I was browsing the code, I found a lot of repetition, which can easily introduce errors as the package evolves. I am willing to put some time and effort into writing the unit tests and refactoring the code to eliminate the redundancies and possible bugs. But since it is your project, I would like to first discuss your vision of this package with you, so I don't waste my time doing things that would ultimately get rejected.

Can we talk via VoIP (e.g. Skype) sometime?

timeline for big.sparse.matrix support?

Thanks for this great package. I have my package that depends on bigmemory, and now I wish to add the support for sparse matrix to my package. I noticed that you put big.sparse.matrix on your wish list. Just wondering whether you have any plan to implement and any timeline expected for that?

Of course I can use sparse matrix from Eigen or Armadillo libraries. But those don't support memory mapping, which is the key feature I need.

Do big.matrices know where they are filebacked?

Is there a way to know where a filebacked big.matrix is stored on disk (the directory)?
If not, I think it should be easy to add one slot to the big.matrix object with its stored backingpath so that we can directly attach or sub a big.matrix without asking the user to specify the directory (backingpath). Or maybe add it to the description object instead?

Do you want to do it? I think I can do it if you want to.
If not, I will have to make an object that extends a big.matrix.
