fastmatch's Issues

NA match

Hello.

The reason I use base's %in% or match is that they make it easy to match values regardless of whether they are numbers, characters or even NAs.

match(c(1,0,NA),NA)  gives  NA, NA, 1
c(1,0,NA) %in% NA  gives  FALSE, FALSE, TRUE

I especially like the latter one.

But
fmatch(c(1,0,NA),NA) gives NA, NA, NA (useless)

If fastmatch is supposed to improve on base::match, I think it should mimic its behaviour with NA.

fmatch fails (crashes) when table is NULL

As the title says: fmatch('somevalue', c()) gives me:

Error in fmatch("somevalue", c()) :
  uable to allocate 67108864.00Mb for a hash table

(I've also seen it with 33554432.00Mb, but either way it's trying to allocate a nearly infinite amount of memory.)

It does work fine with regular length-0 vectors (e.g. giving character(0) instead of c()).
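
Until the NULL case is handled in the package, one possible workaround is a thin guard around the call. The sketch below is an assumption, not part of fastmatch: safe_match and its matcher argument are hypothetical names, and the default matcher is base match so that the snippet runs without fastmatch installed.

```r
# Hypothetical guard: treat a NULL table like an empty one, as base::match does.
safe_match <- function(x, table, matcher = match, nomatch = NA_integer_) {
  if (is.null(table)) {
    # Nothing can match an empty table, so return nomatch for every element.
    return(rep(as.integer(nomatch), length(x)))
  }
  matcher(x, table, nomatch = nomatch)
}

safe_match("somevalue", c())           # same result as match("somevalue", c())
safe_match(c("a", "b"), c("b", "c"))   # non-NULL tables go to the matcher
```

Passing matcher = fastmatch::fmatch would route non-NULL tables to fmatch while keeping the NULL case away from it.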

On the latest R-4.2.1, is there a conflicting type with R_xlen_t?

Hi Simon
While compiling against R-4.2.1 by hand (it came out just 6 days ago), I ran into what looks like an issue with R_xlen_t that makes the package fail to build, i.e.

In file included from fasthash.c:19:
common.h:13:17: error: conflicting types for ‘R_xlen_t’; have ‘R_len_t’ {aka ‘int’}
   13 | typedef R_len_t R_xlen_t;
      |                 ^~~~~~~~
In file included from common.h:8,
                 from fasthash.c:19:
/opt/R-4.2.1/lib/R/include/Rinternals.h:72:23: note: previous declaration of ‘R_xlen_t’ with type ‘R_xlen_t’ {aka ‘long int’}
   72 |     typedef ptrdiff_t R_xlen_t;

Add %in% analogue

A question on Stack Overflow asked about a faster version of %in%, analogous to what fastmatch::fmatch is for base::match. Since %in% is basically a simple wrapper around match, it would be an easy addition... unless I'm overlooking something.

For example:

`%fin%` <- function(x, table) {
  fmatch(x, table, nomatch = 0L) > 0L
}

fmatch.hash inconsistent for length-0 input

I used fmatch.hash to attach a hash-table to my objects, but this gets problematic when my table is of length 0 (but not NULL, see issue #8): the returned value is NA_integer_ (regardless of the value of x and the class of table).

This is a problem, both because the length of the result is now different and because it now matches NAs.
I think the following shows the problems:

subtable <- data.frame(value=fmatch.hash("myval", bigtable[somecondition]),
                       bigtablerows=which(somecondition))

This works fine if some value of somecondition is TRUE, but not when they are all FALSE.

And:

> match(NA, vector[c(F,F,F,F)])
[1] NA
> fmatch(NA, vector[c(F,F,F,F)])
[1] NA
> temp <- fmatch.hash('someval', vector[c(F,F,F,F)])
> fmatch(NA, temp)
[1] 1

I'd expect the final result to always be the same as when calling (f)match directly.

Details about my setup:
Reproduced with both fastmatch 1.1-0 and 1.1-1
R 3.5.3 and 3.6.1
Under Rstudio 1.1.463 and 1.2.1335, and command prompt with R --vanilla
Windows 10

-0 is never matched

These two calls give different outputs:

xs <- ceiling(log(c(114.0916, 114.9999)/115)/log(1+1E-6))
match(xs, xs); fastmatch::fmatch(xs, xs)

After converting to integer, they give the same output:

ys <- as.integer(xs)
match(ys, ys); fastmatch::fmatch(ys, ys)
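
One plausible reading of the example above, in line with the issue title, is that the second element of xs is negative zero: the value passed to ceiling lies between -1 and 0, and the ceiling of such a value is -0 under IEEE 754 arithmetic. The base-R sketch below only shows how to detect that. It does not call fastmatch, and is_neg_zero is a hypothetical helper name.

```r
xs <- ceiling(log(c(114.0916, 114.9999) / 115) / log(1 + 1E-6))

# Hypothetical helper: -0 compares equal to 0, but its reciprocal is -Inf.
is_neg_zero <- function(v) v == 0 & 1 / v < 0

is_neg_zero(xs)   # flags the elements that are negative zero
xs[2] == 0        # == does not distinguish -0 from 0
xs + 0            # adding 0 turns -0 into +0 and leaves other values alone
```

Integers have no negative zero, which would explain why the as.integer version behaves the same under both functions. Normalising with xs + 0 before matching might serve as a stopgap, though that is untested against fmatch here.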

Release 1.1 or a 1.0 bugfix release, please?

Hi

I'm getting bitten by "bugfix: fix crash when a newly unserialized hash table is used (since the table hash is not stored during serialization)."

Would it be possible to release a version of fastmatch that doesn't suffer from this problem please?

rcnst issue with reverse dependency (TeXCheckR)

Thank you very much for fastmatch. I'm using it in a package but I've been notified by CRAN of an rcnst issue that seems to be associated with fastmatch. I posted in the R-package-dev mailing list seeking advice but haven't received a reply there. The guidelines in the CRAN documentation for rcnst issues suggest getting in contact with the maintainer of an imported package if that package might be the cause; hence this issue. Apologies if I'm mistaken. Any help you could provide would be appreciated.

The package is TeXCheckR and the issue page for the rcnst issue is https://github.com/kalibera/cran-checks/tree/master/rcnst/results/TeXCheckR . The modified constant is c(".", "?"). TeXCheckR does not require compilation; it contains no C/C++ code.

The offending test has the following log:

Space inserted before \footnote
✖ 11: \footnote{\gls{HELP} lending, tuition funding, and most other higher education programs are special appropriations from consolidated government revenue.

ERROR: modification of compiler constant of type character, length 2
ERROR: the modified value of the constant is:
[1] "." "?"
attr(,".match.hash")
<hash table>
ERROR: the original value of the constant is:
[1] "." "?"
ERROR: the modified constant is at index 1
ERROR: the modified constant is in this function body:
c(".", "?")
Fatal error: compiler constants were modified!

I'm using the following line:

if (split_line_after_footnote[footnote_closes_at - 1] %notin% c(".", "?")){

where %notin% is imported from the hutils package and is defined as:

`%notin%` <- function(x, y){
  if (is.null(y)) {
    rep_len(TRUE, length(x))
  } else {
    is.na(fmatch(x, y))
  }
}

I note that in the documentation for fmatch, you remark:

fmatch modifies the table by attaching an attribute to it. It is expected that the values will not change unless that attribute is dropped. Under normal circumstances this should not have any effect from user's point of view, but there is a theoretical chance of the cache being out of sync with the table in case the table is modified directly (e.g. by some C code) without removing attributes.

TeXCheckR alone does not modify table directly (or at least it has no C code with that intent), so I'm not sure if this part of the documentation is applicable.

Thank you again.
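
Given the log above, where the constant c(".", "?") ends up carrying a .match.hash attribute, one possible workaround on the caller's side is to keep small tables away from fmatch altogether. The following is only a sketch under assumptions: notin_safe and the threshold value are hypothetical, this is not how hutils or fastmatch handle the problem, and the function falls back to base match when fastmatch is not installed.

```r
# Hypothetical variant of %notin% that only hashes large tables.
notin_safe <- function(x, y, threshold = 1000L) {
  if (is.null(y)) {
    return(rep_len(TRUE, length(x)))
  }
  use_base <- length(y) < threshold ||
    !requireNamespace("fastmatch", quietly = TRUE)
  if (use_base) {
    # base::match does not attach a hash attribute to its table
    is.na(match(x, y))
  } else {
    is.na(fastmatch::fmatch(x, y))
  }
}

notin_safe(",", c(".", "?"))
```

For a two-element table like this one, hashing is unlikely to bring any speed-up, so routing it to base match should cost little while leaving the literal constant untouched.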
