nathan-russell / hashmap
Faster hash maps in R
License: Other
I would find it useful to be able to pass hashmap objects from R to my C++ routines, because Rcpp::List only allows strings as "keys", whereas I need integers. Is this currently possible/supported?
As you explained in http://stackoverflow.com/questions/42261822/c-interface-with-rcppinterfaces-not-working-for-a-function-returning-stdpa (again, many thanks for that), I tried // [[Rcpp::depends(hashmap)]] and #include <hashmap.h>, as well as adding hashmap to the LinkingTo field in DESCRIPTION. I then tried
using namespace hashmap; typedef HashTemplate<int, Rcpp::IntegerVector> hmap;
which, however, doesn't work when I use const hmap& foo in a function signature.
I also tried #include <hashmap/HashTemplate.hpp>, but this didn't help.
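If the R-level hashmap object cannot be passed to C++ directly, one workaround (an assumption here, not something the package documents) is to pass the keys and values as ordinary vectors and build the integer-keyed table on the C++ side. The sketch uses std::vector<int> in place of Rcpp::IntegerVector so it compiles without R; IntVecMap, build_map and length_for_key are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Integer keys mapped to integer vectors: the shape that
// HashTemplate<int, Rcpp::IntegerVector> is meant to express (hypothetical alias).
using IntVecMap = std::unordered_map<int, std::vector<int>>;

// Build the lookup table from parallel key/value sequences, e.g. the
// keys and values that would otherwise live in an R hashmap.
IntVecMap build_map(const std::vector<int>& keys,
                    const std::vector<std::vector<int>>& values) {
    if (keys.size() != values.size())
        throw std::invalid_argument("keys and values must have equal length");
    IntVecMap m;
    m.reserve(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i)
        m[keys[i]] = values[i];  // a repeated key overwrites the earlier value
    return m;
}

// Take the map by const reference, as in the const hmap& foo signature.
std::size_t length_for_key(const IntVecMap& m, int key) {
    auto it = m.find(key);
    return it == m.end() ? 0 : it->second.size();
}
```

Passing by const reference avoids copying the whole table on each call; the map itself is built once per call site that owns it.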
Most likely needs to happen before coercion.
library(hashmap)
x <- Sys.Date() + 1:5
y <- Sys.Date() + 1:5
h <- hashmap(x, y)
h$data()
# 16911 16910 16912 16909 16908
# "2016-04-20" "2016-04-19" "2016-04-21" "2016-04-18" "2016-04-17"
h$cache_keys()
h$cache_values()
h$data()
# 2016-04-20 2016-04-19 2016-04-21 2016-04-18 2016-04-17
# "2016-04-20" "2016-04-19" "2016-04-21" "2016-04-18" "2016-04-17"
##
##
x <- Sys.time() + 3600 * (1:5)
y <- Sys.time() + 3600 * (1:5)
h <- hashmap(x, y)
h$data()
# 1460841642.22724 1460834442.22724 1460838042.22724
# "2016-04-16 21:20:52 UTC" "2016-04-16 19:20:52 UTC" "2016-04-16 20:20:52 UTC"
# 1460830842.22724 1460827242.22724
# "2016-04-16 18:20:52 UTC" "2016-04-16 17:20:52 UTC"
h$cache_keys()
h$cache_values()
h$data()
# 2016-04-16 21:20:42 2016-04-16 19:20:42 2016-04-16 20:20:42
# "2016-04-16 21:20:52 UTC" "2016-04-16 19:20:52 UTC" "2016-04-16 20:20:52 UTC"
# 2016-04-16 18:20:42 2016-04-16 17:20:42
# "2016-04-16 18:20:52 UTC" "2016-04-16 17:20:52 UTC"
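The report suggests the conversion to text should happen before the keys are coerced to plain doubles. As a sketch of that conversion for POSIXct values (seconds since 1970-01-01, possibly fractional), assuming UTC and ignoring any tzone attribute, the standard C library can do the formatting; format_posixct_utc is a hypothetical helper, not part of hashmap.

```cpp
#include <cmath>
#include <ctime>
#include <string>

// Format POSIXct-style seconds since 1970-01-01 UTC as "YYYY-MM-DD HH:MM:SS".
// The fractional part is dropped (floor). UTC only: a real implementation
// would honour the vector's tzone attribute.
std::string format_posixct_utc(double secs) {
    std::time_t t = static_cast<std::time_t>(std::floor(secs));
    // std::gmtime returns a pointer to shared static storage: not thread-safe.
    std::tm* tm = std::gmtime(&t);
    if (tm == nullptr) return std::string();
    char buf[32];
    std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", tm);
    return std::string(buf);
}
```

Applied to the first key above, 1460841642.22724 becomes "2016-04-16 21:20:42", matching what h$data() shows once the keys are cached.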
Hi, I am doing a pretty basic assignment where I read in a csv file, but now I can't query anything. Any thoughts?
setwd("/users/monicahealy/desktop/assignment4")
library("RSQLite")
db<-dbConnect(SQLite(), dbname="booksDB.sqlite")
dbWriteTable(conn = db, name = "List_Books", value = "books.csv", row.names = FALSE, header = TRUE)
[1] TRUE
dbGetQuery(con, "SELECT author FROM List_Books")
Error in rsqlite_send_query(conn@ptr, statement) :
external pointer is not valid
I don't know how difficult this is to implement, but the ability to nest hashmaps, allowing hierarchical dictionaries, would make hashmaps a lot more powerful. As hashmap works at the moment, it only goes one layer deep, but in real-world usage you often want dictionaries of dictionaries (1st key, 2nd key, ...).
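A dictionary of dictionaries can be sketched with nested maps: the outer key selects an inner map and the inner key selects the value. This is plain std::unordered_map, not a hashmap API; nested_put and nested_get are hypothetical helpers.

```cpp
#include <string>
#include <unordered_map>

// Outer key -> inner map; inner key -> value.
using Inner = std::unordered_map<std::string, double>;
using Nested = std::unordered_map<std::string, Inner>;

// operator[] default-constructs the inner map on first use,
// so a new outer key needs no separate setup step.
void nested_put(Nested& n, const std::string& k1, const std::string& k2, double v) {
    n[k1][k2] = v;
}

// Two-level lookup that never inserts; returns false if either level is missing.
bool nested_get(const Nested& n, const std::string& k1, const std::string& k2,
                double& out) {
    auto outer = n.find(k1);
    if (outer == n.end()) return false;
    auto inner = outer->second.find(k2);
    if (inner == outer->second.end()) return false;
    out = inner->second;
    return true;
}
```

A single-level alternative is a composite key (both keys joined with a separator that cannot occur in either). That keeps one table but makes it harder to list every entry under a given first key.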
devtools::install_github("nathan-russell/hashmap")
Downloading GitHub repo nathan-russell/hashmap@master
✓ checking for file ‘/private/var/folders/w8/r2f28p5153d8xcq_kb2__cxh0000gn/T/RtmpHy3cNA/remotes2602cab058e/nathan-russell-hashmap-39d547d/DESCRIPTION’ ...
─ preparing ‘hashmap’:
✓ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘hashmap_0.2.2.tar.gz’
* installing *source* package ‘hashmap’ ...
** using staged installation
** libs
clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -DBOOST_NO_INT64_T -DBOOST_NO_INTEGRAL_INT64_T -DBOOST_NO_LONG_LONG -DNO_SPP_LONG_LONG -I../inst/include/hashmap -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include" -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -fPIC -Wall -g -O2 -c HashMapClass.cpp -o HashMapClass.o
In file included from HashMapClass.cpp:21:
In file included from ./../inst/include/hashmap/HashTemplate.hpp:25:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/unordered_map.hpp:12:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config.hpp:57:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config/platform/macos.hpp:28:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config/detail/posix_features.hpp:18:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:655:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/gethostuuid.h:39:17: error: C++ requires a type specifier for all declarations
int gethostuuid(uuid_t, const struct timespec *) __OSX_AVAILABLE_STARTING(__MAC_10_5, __IPHONE_NA);
^
In file included from HashMapClass.cpp:21:
In file included from ./../inst/include/hashmap/HashTemplate.hpp:25:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/unordered_map.hpp:12:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config.hpp:57:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config/platform/macos.hpp:28:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config/detail/posix_features.hpp:18:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:662:27: error: unknown type name 'uuid_t'; did you mean 'uid_t'?
int getsgroups_np(int *, uuid_t);
^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types/_uid_t.h:31:31: note: 'uid_t' declared here
typedef __darwin_uid_t uid_t;
^
In file included from HashMapClass.cpp:21:
In file included from ./../inst/include/hashmap/HashTemplate.hpp:25:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/unordered_map.hpp:12:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config.hpp:57:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config/platform/macos.hpp:28:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config/detail/posix_features.hpp:18:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:664:27: error: unknown type name 'uuid_t'; did you mean 'uid_t'?
int getwgroups_np(int *, uuid_t);
^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types/_uid_t.h:31:31: note: 'uid_t' declared here
typedef __darwin_uid_t uid_t;
^
In file included from HashMapClass.cpp:21:
In file included from ./../inst/include/hashmap/HashTemplate.hpp:25:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/unordered_map.hpp:12:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config.hpp:57:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config/platform/macos.hpp:28:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config/detail/posix_features.hpp:18:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:727:31: error: unknown type name 'uuid_t'; did you mean 'uid_t'?
int setsgroups_np(int, const uuid_t);
^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types/_uid_t.h:31:31: note: 'uid_t' declared here
typedef __darwin_uid_t uid_t;
^
In file included from HashMapClass.cpp:21:
In file included from ./../inst/include/hashmap/HashTemplate.hpp:25:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/unordered_map.hpp:12:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config.hpp:57:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config/platform/macos.hpp:28:
In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include/boost/config/detail/posix_features.hpp:18:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:729:31: error: unknown type name 'uuid_t'; did you mean 'uid_t'?
int setwgroups_np(int, const uuid_t);
^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types/_uid_t.h:31:31: note: 'uid_t' declared here
typedef __darwin_uid_t uid_t;
^
5 errors generated.
make: *** [HashMapClass.o] Error 1
ERROR: compilation failed for package ‘hashmap’
* removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/hashmap’
Error: Failed to install 'hashmap' from GitHub:
(converted from warning) installation of package ‘/var/folders/w8/r2f28p5153d8xcq_kb2__cxh0000gn/T//RtmpHy3cNA/file260ff9c48b/hashmap_0.2.2.tar.gz’ had non-zero exit status
boost::lexical_cast on Vector<REALSXP> should be equivalent to as.character, so that Date and POSIXct vectors are converted correctly: https://github.com/nathan-russell/hashmap/blob/master/inst/include/hashmap/tools.hpp
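For a Date vector the stored double is a day count since 1970-01-01, so a plain numeric conversion yields "16911" instead of "2016-04-20". A minimal sketch of a date-aware conversion, using Howard Hinnant's public-domain civil_from_days algorithm (proleptic Gregorian calendar); format_r_date is a hypothetical name, and the sketch ignores NA and non-finite values.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Convert an R Date value (days since 1970-01-01) to "YYYY-MM-DD"
// following Hinnant's civil_from_days.
std::string format_r_date(double days) {
    const long z = static_cast<long>(std::floor(days)) + 719468;  // shift epoch to 0000-03-01
    const long era = (z >= 0 ? z : z - 146096) / 146097;          // 400-year era
    const long doe = z - era * 146097;                                    // day of era [0, 146096]
    const long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; // year of era [0, 399]
    long y = yoe + era * 400;
    const long doy = doe - (365 * yoe + yoe / 4 - yoe / 100);  // day of year, March-based [0, 365]
    const long mp = (5 * doy + 2) / 153;                       // month, March-based [0, 11]
    const long d = doy - (153 * mp + 2) / 5 + 1;               // day of month [1, 31]
    const long m = mp < 10 ? mp + 3 : mp - 9;                  // calendar month [1, 12]
    if (m <= 2) ++y;                                           // Jan/Feb belong to the next year
    char buf[16];
    std::snprintf(buf, sizeof buf, "%04ld-%02ld-%02ld", y, m, d);
    return std::string(buf);
}
```

POSIXct values would need a similar branch on the class attribute, formatting seconds rather than days.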
Dear Nathan Russell,
Is there a way to create an empty hashmap? If not, would it be handy to allow hashmap::hashmap to be called with keys and values left empty? Or to provide some syntactic sugar?
I end up doing this (which isn't too bad):
H = hashmap::hashmap("temp" = 1)
H$clear()
Hi Nathan,
What would be the ideal way to delete a hashmap and, if possible, return its memory?
Regards,
Srikanth
I'm getting compilation errors when trying to install hashmap; see the gist here: https://gist.github.com/SaintZeno/fbaa4c9dc53e33b93d39d896d93fd078
system specs:
Ubuntu 18.04.2
g++ 7.4.0
gcc 7.4.0
Please have a look at the below code and output:
Code:
H3 <- hashmap((as.Date("10/9/2009", "%m/%d/%Y")), 1)
H3$data()
Output
14526
1
What is the correct way to insert a date field? It's giving the wrong output.
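The 14526 is not arbitrary: R stores a Date as the number of days since 1970-01-01, and 2009-10-09 is day 14526, so the key appears to have lost its Date class rather than been corrupted. The inverse calculation, following Hinnant's days_from_civil algorithm (the function name is taken from that algorithm, not from hashmap), makes the arithmetic explicit:

```cpp
// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days_from_civil). This is the number R stores in a Date.
long days_from_civil(long y, unsigned m, unsigned d) {
    y -= m <= 2 ? 1 : 0;                                            // treat Jan/Feb as months 11/12 of the previous year
    const long era = (y >= 0 ? y : y - 399) / 400;                  // 400-year era
    const long yoe = y - era * 400;                                 // year of era [0, 399]
    const long doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // day of year, March-based [0, 365]
    const long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;         // day of era [0, 146096]
    return era * 146097 + doe - 719468;                             // shift epoch to 1970-01-01
}
```

On the R side, as.Date(14526, origin = "1970-01-01") turns the number back into the date.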
Regards
Ashwin
(h <- hashmap(1:5, c(letters[1:4], NA)))
## [5] => [NA]
## [4] => [d]
## [3] => [c]
## [2] => [b]
## [1] => [a]
## [...] => [...]
anyNA(h$all_values())
#[1] FALSE
class(h[[5]])
#[1] "character"
https://github.com/nathan-russell/hashmap/blob/master/inst/include/hashmap/traits.hpp#L32
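The example suggests that NA_character_ is turned into the ordinary string "NA" when stored, so the missing-value marker is lost. One way to keep it, sketched with a hypothetical RString type rather than hashmap's actual traits, is to store an explicit missing flag next to the text:

```cpp
#include <string>
#include <unordered_map>

// A character value that can be missing, kept distinct from the literal
// string "NA" (hypothetical type, not hashmap's internal representation).
struct RString {
    bool is_na;         // true for a missing value (NA_character_ in R)
    std::string value;  // only meaningful when is_na is false
};

inline RString na_string() { return RString{true, std::string()}; }
inline RString r_string(const std::string& s) { return RString{false, s}; }

using IntToRString = std::unordered_map<int, RString>;

// The analogue of what anyNA(h$all_values()) would be expected to report.
bool any_na(const IntToRString& m) {
    for (const auto& kv : m)
        if (kv.second.is_na) return true;
    return false;
}
```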
(empty) hashmap creation: ~ 80x slower
library(microbenchmark)
library(hashmap)
microbenchmark(new.env(), hashmap(numeric(0), numeric(0)))
# Unit: nanoseconds
# expr min lq mean median uq max neval cld
# new.env() 640 803.5 1245.52 1182.5 1310.5 14147 100 a
# hashmap(numeric(0), numeric(0)) 93833 95027.0 101540.09 95987.5 98018.5 258528 100 b
inserting one element: ~ 10x slower
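env <- new.env()  # assumed setup, not shown in the original report
hm <- hashmap(character(0), logical(0))  # assumed setup, not shown in the original report
microbenchmark(assign("toto", TRUE, envir = env), hm$insert("toto", TRUE))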
# Unit: nanoseconds
# expr min lq mean median uq max neval cld
# assign("toto", TRUE, envir = env) 514 622.5 743.01 713.0 800.5 2406 100 a
# hm$insert("toto", TRUE) 7512 7765.0 8881.30 7860.5 8039.5 93803 100 b
I understand from the benchmarks that you are focusing on bulk operations, but given the title "The Faster Hash Map", I was still quite disappointed.
When an accented UTF-8 key and/or value is passed to hashmap() on Windows, the contents get garbled, implying an encoding problem; see below. Here's an example on my Windows 10 machine (sessionInfo at the end):
library(tidyverse)
library(hashmap)
> j_l1 <- "joão"
> j_l1 %>% Encoding
[1] "latin1"
# this works
> hashmap(j_l1,j_l1)
## (character) => (character)
## [joão] => [joão]
# however, hashmap does not like UTF-8's on windows.
> j_u8 <- iconv(j_l1,"latin1","UTF-8")
> j_u8 %>% Encoding
[1] "UTF-8"
# the console still displays this correctly!
> j_u8
[1] "joão"
# this is where it breaks!
> hashmap(j_u8,j_u8)
## (character) => (character)
## [joão] => [joão]
# note: somehow all tidyverse functions handle strings beautifully on windows.
# note: the reverse problem happens on linux: hashmap "likes" utf-8's but garbles latin1 strings!
>sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 stringi_1.2.4 furrr_0.1.0.9001 future_1.9.0
[5] tictoc_1.0 data.table_1.11.4 foreach_1.4.4 jsonlite_1.5
[9] glue_1.3.0 pipeR_0.6.1.3 rlist_0.4.6.1 lubridate_1.7.4
[13] forcats_0.3.0 stringr_1.3.1 dplyr_0.7.6 purrr_0.2.5
[17] readr_1.1.1 tidyr_0.8.1 tibble_1.4.2 ggplot2_3.0.0
[21] tidyverse_1.2.1 hashmap_0.2.2
loaded via a namespace (and not attached):
[1] tidyselect_0.2.4 listenv_0.7.0 haven_1.1.2 lattice_0.20-35 colorspace_1.3-2
[6] yaml_2.2.0 rlang_0.2.2 pillar_1.3.0 withr_2.1.2 modelr_0.1.2
[11] readxl_1.1.0 bindr_0.1.1 plyr_1.8.4 munsell_0.5.0 gtable_0.2.0
[16] cellranger_1.1.0 rvest_0.3.2 codetools_0.2-15 knitr_1.20 parallel_3.5.1
[21] broom_0.5.0 Rcpp_0.12.18 scales_1.0.0 backports_1.1.2 hms_0.4.2
[26] digest_0.6.15 grid_3.5.1 cli_1.0.0 tools_3.5.1 magrittr_1.5
[31] lazyeval_0.2.1 crayon_1.3.4 pkgconfig_2.0.2 xml2_1.2.0 assertthat_0.2.0
[36] httr_1.3.1 rstudioapi_0.7 iterators_1.0.10 globals_0.12.1 R6_2.2.2
[41] nlme_3.1-137 compiler_3.5.1
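The garbled "joão" is exactly what results when the UTF-8 bytes of "ã" (C3 A3) are read as Latin-1 or Windows-1252 and then encoded to UTF-8 a second time; the two code pages agree on both bytes. This is one plausible mechanism, not a confirmed diagnosis of hashmap's code. The hypothetical helper below reproduces it:

```cpp
#include <string>

// Treat every byte as a Latin-1 code point and encode it as UTF-8.
// Correct for genuine Latin-1 input; applied to text that is already
// UTF-8 it produces double-encoded output such as "ã" -> "Ã£".
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII passes through
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

If this is the cause, a likely fix is to convert strings to one encoding (for example with enc2utf8() in R) before hashing and to mark returned strings as UTF-8.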
Feature request: support for saving a hashmap object with saveRDS and restoring it with readRDS. Thanks.
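A hashmap's table lives in C++ memory behind an R external pointer, and external pointers are not preserved by saveRDS/readRDS (they come back as null pointers). A common workaround is therefore to save the contents and rebuild the table on load. The sketch shows that round trip in C++ with a simple tab-separated format; save_map and load_map are hypothetical names, and the format assumes keys contain no tab or newline.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

using StrIntMap = std::unordered_map<std::string, int>;

// One "key<TAB>value" line per entry. Keys must not contain tabs or
// newlines; a robust format would escape them or use length prefixes.
void save_map(const StrIntMap& m, std::ostream& os) {
    for (const auto& kv : m) os << kv.first << '\t' << kv.second << '\n';
}

// Rebuild the table from the saved lines.
StrIntMap load_map(std::istream& is) {
    StrIntMap m;
    std::string line;
    while (std::getline(is, line)) {
        const auto tab = line.find('\t');
        if (tab == std::string::npos) continue;  // skip malformed lines
        m[line.substr(0, tab)] = std::stoi(line.substr(tab + 1));
    }
    return m;
}
```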
FYI, sparsepp has been on CRAN for quite some time: https://cran.r-project.org/web/packages/sparsepp/
If there is something you think prevents using it here, let's discuss; I will be happy to fix it.
E.g. in spp_traits.h, either remove it completely or use a better check for C++98 mode. It is currently still triggering -Wc++11-long-long on r-devel-debian-clang and r-devel-fedora-clang.
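A minimal sketch of the kind of guard the report asks for, using a hypothetical trait rather than the actual contents of spp_traits.h: the long long specializations are compiled only when __cplusplus indicates C++11 or later, so a C++98 build never parses the tokens that -Wc++11-long-long flags.

```cpp
// Hypothetical trait listing integer types accepted as hash keys.
template <typename T>
struct is_hashable_integer { static const bool value = false; };

template <> struct is_hashable_integer<int>  { static const bool value = true; };
template <> struct is_hashable_integer<long> { static const bool value = true; };

// Only visible to C++11 and later: code removed by the preprocessor is never
// parsed, so C++98 builds see no long long tokens at all.
#if defined(__cplusplus) && __cplusplus >= 201103L
template <> struct is_hashable_integer<long long> { static const bool value = true; };
template <> struct is_hashable_integer<unsigned long long> { static const bool value = true; };
#endif
```

MSVC reports __cplusplus as 199711L unless /Zc:__cplusplus is set, so a portable check may also need to consult _MSVC_LANG.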
I used hashmap successfully in an R script. I later decided to put some of my functions in a package, as described here: https://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/
I load my package using the library command.
When I call the package-exported function that internally uses a hashmap (created in the same package file but outside the function), I get the error "Error: external pointer is not valid", exactly when the hashmap is used, according to my debugging prints.
I'm not yet very familiar with R, so I might be doing things wrong. Any idea what typically causes such an error?
Thanks in advance
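A common cause of this error (a likely explanation, not a confirmed diagnosis) is that a hashmap created at the top level of a package file is built once when the package is installed and then saved with the package; its external pointer does not survive that save and comes back invalid. The usual remedy is to create the object in .onLoad() or on first use. The C++ sketch models the on-first-use pattern with a hypothetical Handle whose pointer starts out null, as a reloaded external pointer would:

```cpp
#include <memory>
#include <string>
#include <unordered_map>

using Lookup = std::unordered_map<std::string, int>;

// Hypothetical stand-in for an R external pointer: after a save/reload
// cycle the pointer is null and must not be dereferenced.
struct Handle {
    std::shared_ptr<Lookup> ptr;
};

// Hypothetical builder for the table's contents.
std::shared_ptr<Lookup> build_lookup() {
    return std::make_shared<Lookup>(Lookup{{"alpha", 1}, {"beta", 2}});
}

// Check the pointer on every use and rebuild it when it is null,
// instead of trusting an object created once at build time.
Lookup& ensure(Handle& h) {
    if (!h.ptr) h.ptr = build_lookup();
    return *h.ptr;
}
```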
I imagine since you already have the ability to hash character vectors, it may not be too difficult to add the ability to hash serialized versions of other objects.
serialize(list(),connection=NULL)
[1] 41 0a 32 0a 31 39 37 33 37 37 0a 31 33 31 38 34 30 0a 31 39 0a 30 0a
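The idea can be sketched in C++: a std::string can hold arbitrary bytes, including embedded zero bytes, so serialized output can serve directly as a key. This assumes equal objects always serialize to identical bytes, which is not guaranteed across serialization formats or R versions (the header of the output above includes the format version and the writing R version). bytes_key and BytesMap are hypothetical names.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Map keyed by raw serialized bytes.
using BytesMap = std::unordered_map<std::string, int>;

// Copy the bytes into a std::string; the iterator-range constructor keeps
// embedded '\0' bytes, unlike construction from a C string.
std::string bytes_key(const std::vector<unsigned char>& raw) {
    return std::string(raw.begin(), raw.end());
}
```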
Is there a way to determine how much RAM a hashmap object is taking up in memory, similar to R's object.size() function? The size() method gives the number of key-value pairs, but not the memory footprint, and object.size() doesn't seem to capture all the memory allocated (perhaps missing C++ structures?).
And semi-relatedly, if you delete all the key-values from the hashmap using the clear() method, does the memory previously needed get released back to the system?
Thanks!
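On the C++ side, one way to measure the memory a standard container holds is a counting allocator. The sketch applies one to std::unordered_map, not to the boost::unordered_map that HashTemplate.hpp includes, so details may differ there; CountingAllocator, g_live_bytes and CountedMap are hypothetical names. With common implementations, clear() frees the elements but keeps the bucket array, and swapping with an empty map releases the rest. Whether freed memory then returns to the operating system is up to the underlying allocator.

```cpp
#include <cstddef>
#include <functional>
#include <new>
#include <unordered_map>
#include <utility>

// Running total of bytes currently allocated through CountingAllocator.
// Not thread-safe; adequate for a single-threaded sketch.
static std::size_t g_live_bytes = 0;

// Minimal C++11 allocator: forwards to operator new/delete and keeps a
// tally of live bytes so a container's footprint can be read directly.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}  // rebinding (nodes, buckets)

    T* allocate(std::size_t n) {
        g_live_bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        g_live_bytes -= n * sizeof(T);
        ::operator delete(p);
    }
};

// All instances share the same global tally, so they compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using CountedMap = std::unordered_map<int, int, std::hash<int>, std::equal_to<int>,
                                      CountingAllocator<std::pair<const int, int>>>;
```

Usage: fill a CountedMap, read g_live_bytes, call clear() and read it again, then swap with an empty CountedMap() to release the remaining bucket storage.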