
An R package of utilities for benchmarking and optimization

Home Page: https://shinra-dev.github.io/

License: Other


memuse's Introduction

memuse

Version: 4.2-3
License: BSD 2-Clause
Project home: https://github.com/shinra-dev/memuse
Bug reports: https://github.com/shinra-dev/memuse/issues

memuse is an R package for memory estimation. It has tools for estimating the size of a matrix that doesn't exist yet and for showing the size of an existing object in a nicer way than object.size(). It also has tools for showing how much memory the current R process is consuming, how much RAM is available on the system, and more.

Originally, this package was an over-engineered solution to a mostly non-existent problem, as a sort of love letter to other needlessly complex programs like the Enterprise Fizzbuzz. However, as of version 2.0-0, I'm sad to report that the package is actually becoming quite useful.

The package has been exhaustively tested on Linux, FreeBSD, Windows, Mac, and "other"-NIX. That list is also roughly in descending order of support for the various operations. If you have a problem installing or using the package, please open an issue on the project's GitHub repository.

Installation

To install the R package, run:

install.packages("memuse")

The development version is maintained on GitHub:

remotes::install_github("shinra-dev/memuse")

The C internals, found in memuse/src/meminfo/, are completely separated from the R wrapper code. So if you prefer, you can easily build this as a standalone C shared library.

Package Utilities

The package comes with several classes of utilities. I find all of them very useful during the course of benchmarking, but some are certainly more useful than others.

Memory Lookups

With this package you can get some information about how much memory is physically available on the host machine:

Sys.meminfo()
# Totalram:  15.656 GiB 
# Freeram:   10.504 GiB 

Sys.meminfo(compact.free=FALSE) ### Linux and FreeBSD only
# Totalram:   15.656 GiB 
# Freeram:     1.067 GiB 
# Bufferram:   1.332 GiB 
# Cachedram:   8.207 GiB 

Sys.swapinfo() ## same as Sys.pageinfo()
# Totalswap:    32.596 GiB 
# Freeswap:     32.595 GiB 
# Cachedswap:  444.000 KiB

You can find the RAM usage of the current R process:

Sys.procmem()
# Size:  258.426 MiB 
# Peak:  258.426 MiB 

x <- rnorm(1e8)
memuse(x)
# 762.939 MiB

rm(x);invisible(gc())

Sys.procmem()
# Size:   258.426 MiB 
# Peak:  1021.363 MiB
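
The numbers in this example agree with simple arithmetic: a double takes 8 bytes, so 1e8 of them need about 762.939 MiB, and the peak sits roughly that far above the baseline. The following is a minimal base-R check that does not use memuse; the variable names are illustrative, and the small per-object header R adds is ignored.

```r
# Payload of a double vector: 8 bytes per element, converted to MiB.
n <- 1e8
payload_mib <- n * 8 / 2^20
round(payload_mib, 3)
# 762.939

# Values copied from the Sys.procmem() output above.
baseline_mib <- 258.426   # "Size" before and after the allocation
peak_mib     <- 1021.363  # "Peak" after the allocation

# The gap between peak and baseline is close to the vector's payload.
peak_mib - baseline_mib
```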

Also, if you're working close to the metal, you may be interested in seeing how large the CPU caches are and/or how big the cache linesize is:

Sys.cachesize()
# L1I:   32.000 KiB 
# L1D:   32.000 KiB 
# L2:   256.000 KiB 
# L3:     6.000 MiB 

Sys.cachelinesize()
# Linesize:  64 B

Estimating Memory Usage

You can estimate memory storage requirements of a matrix without having to divide by some annoying power of 2:

howbig(10000, 500)
# 38.147 MiB

howbig(10000, 500, type="int")
# 19.073 MiB

howbig(10000, 500, representation="sparse", sparsity=.05)
# 1.907 MiB
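
For reference, the division that howbig() saves you from can be sketched in base R. The helper dense_mib below is hypothetical and not part of memuse. The last line only shows that the sparse figure above equals 5% of the dense size; whether memuse computes it exactly that way is an assumption.

```r
# Size in MiB of a dense nrow x ncol matrix with a given element width.
dense_mib <- function(nrow, ncol, bytes_per_element = 8) {
  nrow * ncol * bytes_per_element / 2^20
}

round(dense_mib(10000, 500), 3)                         # doubles, 8 bytes each
# 38.147
round(dense_mib(10000, 500, bytes_per_element = 4), 3)  # 4-byte integers
# 19.073
round(0.05 * dense_mib(10000, 500), 3)                  # 5% of the dense size
# 1.907
```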

Alternatively, given a (memory) size, you can also find the dimensions of such a matrix:

howmany(mu(800, "mib"))
# [1] 10240 10240
howmany(mu(800, "mib"), ncol=500)
# [1] 209715    500
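
The inverse calculation is the same arithmetic run backwards. This is a minimal sketch assuming a dense matrix of 8-byte doubles; dims_for_mib is a hypothetical helper, not a memuse function.

```r
# Largest dense double matrix that fits in a given number of MiB.
dims_for_mib <- function(mib, ncol = NULL, bytes_per_element = 8) {
  elements <- mib * 2^20 / bytes_per_element
  if (is.null(ncol)) {
    side <- floor(sqrt(elements))    # square matrix
    c(side, side)
  } else {
    c(floor(elements / ncol), ncol)  # fixed column count
  }
}

dims_for_mib(800)
# [1] 10240 10240
dims_for_mib(800, ncol = 500)
# [1] 209715    500
```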

For more information, see the package vignette.

Misc

The package also has some miscellaneous helpful utilities:

approx.size(12345)
# 12.3 Thousand
 
approx.size(123456789)
# 123.5 Million
 
approx.size(123456789, unit.names="short")
# 123.5m
 
approx.size(123456789, unit.names="comma")
# 123,456,789

Authors

memuse is authored and maintained by:

Drew Schmidt

With additional contributions from:

Christian Heckendorf (FreeBSD improvements to meminfo)
Wei-Chen Chen (Windows build fixes)
Dan Burgess (donation of a Mac for development and testing)

memuse's Issues

installation on alpine

Hi,

I'm trying to install this on Alpine Linux. I get the following error message. Any clues on how to fix it? (FYI, memuse is installed as a dependency of the R package vcfR).

Thanks!

meminfo/src/cacheinfo.c: In function 'meminfo_cachesize':
meminfo/src/cacheinfo.c:67:12: error: '_SC_LEVEL1_ICACHE_SIZE' undeclared (first use in this function)
name = _SC_LEVEL1_ICACHE_SIZE;
^~~~~~~~~~~~~~~~~~~~~~
meminfo/src/cacheinfo.c:67:12: note: each undeclared identifier is reported only once for each function it appears in
meminfo/src/cacheinfo.c:69:12: error: '_SC_LEVEL1_DCACHE_SIZE' undeclared (first use in this function)
name = _SC_LEVEL1_DCACHE_SIZE;
^~~~~~~~~~~~~~~~~~~~~~
meminfo/src/cacheinfo.c:71:12: error: '_SC_LEVEL2_CACHE_SIZE' undeclared (first use in this function)
name = _SC_LEVEL2_CACHE_SIZE;
^~~~~~~~~~~~~~~~~~~~~
meminfo/src/cacheinfo.c:73:12: error: '_SC_LEVEL3_CACHE_SIZE' undeclared (first use in this function)
name = _SC_LEVEL3_CACHE_SIZE;
^~~~~~~~~~~~~~~~~~~~~
meminfo/src/cacheinfo.c: In function 'meminfo_cachelinesize':
meminfo/src/cacheinfo.c:182:50: error: '_SC_LEVEL1_DCACHE_LINESIZE' undeclared (first use in this function)
cachesize_t cache_size = (cachesize_t) sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
^~~~~~~~~~~~~~~~~~~~~~~~~~
make: *** [/usr/lib/R/etc/Makeconf:167: meminfo/src/cacheinfo.o] Error 1
ERROR: compilation failed for package ‘memuse’

removing ‘/usr/lib/R/library/memuse’

version 3.0 building error under OSX ElCapitan (macport uptodate)

any idea what I could do to fix this?
Thanks
Stephane

>sudo R CMD install memuse_3.0-0.tar.gz 

* installing to library ‘/opt/R_LIBS’
* installing *source* package ‘memuse’ ...
** package ‘memuse’ successfully unpacked and MD5 sums checked
** libs
clang -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I../inst/RNACI -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include    -fPIC  -Wall -mtune=core2 -g -O2  -c meminfo/src/cacheinfo.c -o meminfo/src/cacheinfo.o
meminfo/src/cacheinfo.c:63:5: error: use of undeclared identifier 'ret'
    ret = sysctlbyname("hw.l1icachesize", &cache_size, &size, NULL, 0);
    ^
meminfo/src/cacheinfo.c:65:5: error: use of undeclared identifier 'ret'
    ret = sysctlbyname("hw.l1dcachesize", &cache_size, &size, NULL, 0);
    ^
meminfo/src/cacheinfo.c:67:5: error: use of undeclared identifier 'ret'
    ret = sysctlbyname("hw.l2cachesize", &cache_size, &size, NULL, 0);
    ^
meminfo/src/cacheinfo.c:69:5: error: use of undeclared identifier 'ret'
    ret = sysctlbyname("hw.l3cachesize", &cache_size, &size, NULL, 0);
    ^
meminfo/src/cacheinfo.c:71:10: error: use of undeclared identifier 'ret'
  chkret(ret, CACHE_ERROR);
         ^
meminfo/src/meminfo.h:14:28: note: expanded from macro 'chkret'
#define chkret(ret,val) if(ret)return(val)
                           ^
5 errors generated.
make: *** [meminfo/src/cacheinfo.o] Error 1
ERROR: compilation failed for package ‘memuse’
* removing ‘/opt/R_LIBS/memuse’
* restoring previous ‘/opt/R_LIBS/memuse’

Resident set size instead of virtual memory size

When determining the memory usage of the current process under Linux, I personally find the fields VmRSS and VmHWM closer to what I'd intuitively expect from the documented behavior:

Sys.procmem() returns the total memory usage of the current R process and (if supported), the maximum memory usage as well.

From the proc manpages:

VmPeak: Peak virtual memory size.
VmSize: Virtual memory size.
VmHWM: Peak resident set size ("high water mark").
VmRSS: Resident set size. Note that the value here is the sum of RssAnon, RssFile, and RssShmem.

The reason why I'm reporting this is that when using memory-mapped files and multiple processes, the values reported by Sys.procmem() are massively inflated under Linux compared to macOS.

Currently, under Linux, mapping a 30 GB file into 8 processes yields 200 GB of reported memory use, while the figure stays in the tens of GB on macOS. The 200 GB reported under Linux has nothing to do with what I understand as memory usage; it corresponds to the available/mapped "address space" (I'm a bit unsure of the precise terminology here).

After making the suggested change, the two platforms report much more similar memory usage.

I'm opening a PR in case you agree with my logic.
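
To compare the four fields quoted from the proc manpage side by side, they can be read from /proc/self/status on Linux. This is a minimal base-R sketch, not memuse's implementation; parse_vm_fields is a hypothetical helper, and it assumes the kernel reports the values in kB as described in the manpage.

```r
# Extract selected Vm* fields from /proc/<pid>/status lines and return MiB.
parse_vm_fields <- function(lines,
                            fields = c("VmPeak", "VmSize", "VmHWM", "VmRSS")) {
  pattern <- paste0("^(", paste(fields, collapse = "|"),
                    "):[[:space:]]+([0-9]+)[[:space:]]+kB")
  hits <- regmatches(lines, regexec(pattern, lines))
  hits <- hits[lengths(hits) == 3]   # keep only lines that matched
  kb <- vapply(hits, function(h) as.numeric(h[3]), numeric(1))
  names(kb) <- vapply(hits, function(h) h[2], character(1))
  kb / 1024
}

status_file <- "/proc/self/status"   # Linux only; "self" is the R process
if (file.exists(status_file)) {
  print(parse_vm_fields(readLines(status_file)))
}
```

With a large memory-mapped file, VmSize and VmPeak include the whole mapping, while VmRSS and VmHWM count only the pages that are actually resident.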

Reset peak memory as reported by Sys.procmem()

I use Sys.procmem() to report on memory usage and I really like the possibility to track peak memory usage.

Sometimes it would be nice to be able to reset the peak value to 0, such that we can also have a 'peak per section' statistic.
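
One possible Linux-only workaround, sketched here as an assumption and not as a memuse feature: according to the proc manpage, writing 5 to /proc/[pid]/clear_refs (Linux 4.0 and later) resets the peak resident set size (VmHWM) to the current resident set size. It does not reset VmPeak. The helper reset_peak_rss is hypothetical.

```r
# Ask the kernel to reset this process's VmHWM; returns TRUE on success.
reset_peak_rss <- function(path = "/proc/self/clear_refs") {
  if (!file.exists(path)) {
    return(invisible(FALSE))  # not Linux, or procfs unavailable
  }
  ok <- tryCatch({
    writeLines("5", path)     # "5" = reset peak RSS (Linux >= 4.0)
    TRUE
  },
  warning = function(w) FALSE,
  error = function(e) FALSE)
  invisible(ok)
}

reset_peak_rss()
```

Calling this at the start of each section would make the next peak reading specific to that section, provided the tool reading the peak reports VmHWM.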

Suggestions to measure peak memory usage during parallel processing

Hi Drew,

When used correctly, parallel processing over multiple cores brings down the processing time. But the memory usage can be difficult to measure. (It's easy for a single R session with peakRAM, which essentially calls gc() to get the maximum memory used.)

Unless I missed something in memuse, it is hard to do inside R. I was thinking of scraping the output of the top or free Unix commands. Any ideas?
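
One way to approach the scraping idea is sketched below, using ps in place of top or free because its output is easier to parse. This is an assumption-laden sketch and not a memuse feature: parse_ps_rss and rss_of_pids are hypothetical helpers, the ps on the system is assumed to accept the POSIX -o and -p options, and rss is assumed to be reported in KiB. It gives a snapshot of resident memory, so a peak would require polling while the workers run.

```r
# Turn "pid rss" lines from ps into a named vector of MiB.
parse_ps_rss <- function(ps_lines) {
  fields <- strsplit(trimws(ps_lines), "[[:space:]]+")
  fields <- fields[lengths(fields) == 2]   # drop empty or malformed lines
  rss_kib <- vapply(fields, function(f) as.numeric(f[2]), numeric(1))
  names(rss_kib) <- vapply(fields, function(f) f[1], character(1))
  rss_kib / 1024
}

# Query ps for the resident set size of the given process IDs.
rss_of_pids <- function(pids) {
  out <- system2("ps",
                 c("-o", "pid=", "-o", "rss=", "-p", paste(pids, collapse = ",")),
                 stdout = TRUE)
  parse_ps_rss(out)
}

if (nzchar(Sys.which("ps"))) {
  print(rss_of_pids(Sys.getpid()))
}
```

For a cluster created with the parallel package, the worker PIDs could be collected with unlist(parallel::clusterCall(cl, Sys.getpid)) and passed to rss_of_pids().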