michaeldorman / nngeo

k-Nearest Neighbor Join for Spatial Data

License: Other

R 22.98% C 77.02%

nngeo's People

rushgeo ihough xiaguoxin domrussel ranghetti mrecos a-benini fjellrev matthieustigler shehackedyou latot

nngeo's Issues

Distance polygon to point: rule for ordering within polygon, and speed?

Very nice package, thanks! I was using distance from point to polygons, and have a few questions:

I see that points within a polygon have a distance of 0. Does this mean the ordering among those polygons is arbitrary?
Speed: point-to-polygon is much slower than point-to-centroid. Is this because st_distance() is run on the full dataset? Interestingly, st_contains() seems much faster. Could it be used to narrow the search (use st_contains() to get the inner matches and, if n_inner < k, compute distances only for the outer ones)?

Thanks!

Distance vector to variable/column not available anymore (v0.3.0)

Hi,

I used the function st_nn to get the distance vector (class: units) to a variable/column, as follows.

listings$distance_subway <- (st_nn(listings, subway_nyc, returnDist = TRUE))$dist

I noticed that the above call stopped working in nngeo v0.3.0.

I tried the following but it stored the vector as a list instead of units:

distance_subway <- st_nn(listings, subway_nyc, returnDist = TRUE)
listings$distance_subway <- distance_subway[[2]]

How to proceed?
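
One possible workaround, sketched under the assumption that the `dist` element is now a list holding one numeric vector per feature of `x` (which is what the behaviour described above suggests): take the first distance from each element. The helper name `first_dist` and the mock `nn` object are made up for illustration and are not part of nngeo.

```r
# Hypothetical helper: first (nearest) distance per feature, NA when there is no neighbour
first_dist <- function(dist_list) {
  vapply(
    dist_list,
    function(d) if (length(d) > 0) as.numeric(d[1]) else NA_real_,
    numeric(1)
  )
}

# Mock object shaped like st_nn(..., returnDist = TRUE) output (assumed structure)
nn <- list(
  nn   = list(3L, 2L, integer(0)),
  dist = list(22833.09, 1372.235, numeric(0))
)

distance_subway <- first_dist(nn$dist)
distance_subway
```

With real data this would presumably become `listings$distance_subway <- first_dist(distance_subway$dist)`. The result is a plain numeric vector; if a units class is needed, it could be re-attached afterwards with `units::set_units()`, provided the unit of the returned distances is known.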

st_connect use improper for unprojected objects

The warnings on CRAN are coming from sp (or sf if st_connect uses st_sample) about *sample being used on unprojected data:

The warnings seem to appear in nngeo::st_connect() in the lapply() local function starting at line 97 in nngeo/R/st_connect.R, and come from sp:::sample.SpatialLines, sp/R/spsample.R line 177:

        if (isTRUE(!is.projected(x)))
                warning("working under the assumption of projected data!")

In fact the warning should be issued in all cases as far as I can see, nngeo is sampling from a line on the ellipsoid but assuming planar geometry. They might also switch from sp to sf for the spatial sampling:

#      x_sp = as(x[i], "Spatial")
#      start_pool = sp::spsample(x_sp, type = "regular", n = n_x[i])
#      start_pool = st_as_sfc(start_pool)
       start_pool = st_sample(x[i], type = "regular", size = n_x[i])

and

#        y_sp = as(y[j], "Spatial")  
#        end_pool = sp::spsample(y_sp, type = "regular", n = n_y[j])
#        end_pool = st_as_sfc(end_pool)
        end_pool = st_sample(y[j], type = "regular", size = n_y[j])

in nngeo/R/st_connect.R: with sf 0.8-1, the vignette gets three:

#> although coordinates are longitude/latitude, st_sample assumes that they are planar

messages.

So the underlying problem is the misuse of sp::spsample or sf::st_sample on unprojected data.

large dataset with st_nn

I have a spatial data frame of 12 million GPS locations, and I'm trying to find the 4 nearest neighbours of each event among 2480 line segments.

Running this crashes my machine after about 15 minutes. This is the code:

nngeo::st_nn(events, segments, k = k, returnDist = TRUE, maxdist = 30, parallel = 10)

Are there limitations on st_nn, or am I doing something wrong?
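
One thing that could be tried is processing the events in chunks, so that only part of the result is being computed at any time. The sketch below shows just the chunking logic in base R; `by_chunks` and `fake_nn` are hypothetical names, and whether chunking actually avoids the crash here has not been established.

```r
# Hypothetical helper: apply `fun` to successive chunks of row indices and concatenate the results
by_chunks <- function(x_rows, chunk_size, fun) {
  idx <- seq_len(x_rows)
  chunks <- split(idx, ceiling(idx / chunk_size))
  out <- lapply(chunks, fun)
  unlist(out, recursive = FALSE, use.names = FALSE)
}

# Stand-in for something like:
#   function(i) nngeo::st_nn(events[i, ], segments, k = 4, maxdist = 30, progress = FALSE)
# It returns one list element per row index, as st_nn() does per feature.
fake_nn <- function(i) lapply(i, function(row) row)

res <- by_chunks(x_rows = 10, chunk_size = 4, fun = fake_nn)
length(res)
```

With `returnDist = TRUE` each chunk would return a list with two components rather than a flat list, so the neighbour ids and the distances would have to be concatenated separately.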

A few typos/missing words in `st_nn`

Hi,

Thanks for making a great package! I'm using version 0.3.4.

The sparse argument in st_nn has an unfinished sentence in the help file.

The progress argument is missing a bracket and maybe more words - not sure.

Only very minor things, but thought I would point them out.

st_remove_holes Feature Renaming

Hi! I have been using the st_remove_holes function and absolutely loving it but I did notice a small modification that it was doing on the side: when I used this function it was renaming the "geometry" feature as "geom".

This is easily caught and fixed but I figured I could reach out and see if the feature renaming could be made optional (perhaps by setting the default to 'true' in case later nngeo functions anticipate a "geom" feature by name).

This example is not reproducible because I can't share the shapefile I used as "data1" but I hope it illustrates what I mean nonetheless.

Regardless, thanks for making a really useful function!

Library call

library(tidyverse); library(nngeo)
#> Loading required package: sf
#> Linking to GEOS 3.10.1, GDAL 3.4.0, PROJ 8.2.0; sf_use_s2() is TRUE

Check data structure

str(data1)
#> Classes ‘sf’ and 'data.frame': 15 obs. of 2 variables:
#> $ uniqueID: chr "UMR_I080.2M" "UMR_SG16.2C" "UMR_CH00.1M" "UMR_CN00.1M" ...
#> $ geometry:sfc_GEOMETRY of length 15; first list element: List of 1
#> ..$ : num [1:1907, 1:2] -90.5 -90.5 -90.5 -90.5 -90.5 ...
#> ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
#> - attr(*, "sf_column")= chr "geometry"
#> - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA
#> ..- attr(*, "names")= chr "uniqueID"

Use nngeo::st_remove_holes()

data2 <- data1 %>%
  group_by(uniqueID) %>%
  nngeo::st_remove_holes()

Re-check structure

str(data2)
#> Classes ‘sf’ and 'data.frame': 15 obs. of 2 variables:
#> $ uniqueID: chr "UMR_I080.2M" "UMR_SG16.2C" "UMR_CH00.1M" "UMR_CN00.1M" ...
#> $ geom :sfc_GEOMETRY of length 15; first list element: List of 1
#> ..$ : num [1:1907, 1:2] -90.5 -90.5 -90.5 -90.5 -90.5 ...
#> ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
#> - attr(*, "sf_column")= chr "geom"
#> - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA
#> ..- attr(*, "names")= chr "uniqueID"

Unary st_nn() -- eliminate self-neighboring

sf has several functions with mixed argument behavior. For example, st_union and st_intersection accept either one or two spatial inputs.

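I think a unary st_nn(x), which looks for neighbors within a single layer and does not return each feature as its own neighbor, would be of great benefit.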

# Example
library(nngeo)
data(towns)

head(st_nn(towns, k = 2, maxdist = 10e3))
# [[1]]
# [1]  93  5
# 
# [[2]]
# [1]  49
# 
# [[3]]
# [1]  42  8
# 
# [[4]]
# integer(0)
# 
# [[5]]
# [1]  12 93
# 
# [[6]]
# [1]  20 13
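
Until something like this exists, a similar result can probably be obtained by requesting k + 1 neighbours with the binary form and dropping each feature's own index afterwards. The sketch below shows only the post-processing step on a mock neighbour list; `drop_self` is a hypothetical helper, not an nngeo function.

```r
# Hypothetical helper: remove index i from the i-th neighbour set, keep at most k others
drop_self <- function(nn, k) {
  Map(function(ids, i) head(ids[ids != i], k), nn, seq_along(nn))
}

# Mock output shaped like st_nn(towns, towns, k = 3), assuming each feature finds itself first
nn_with_self <- list(c(1L, 93L, 5L), c(2L, 49L, 7L), c(3L, 42L, 8L))
nn_no_self <- drop_self(nn_with_self, k = 2)
nn_no_self
```

Filtering by index rather than simply dropping the first element also covers the case where a duplicate point at the same location is ranked before the feature itself.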

question: what replaces the raster_* function?

The latest versions have removed functions such as raster_extract. What is the best practice for replacing them?

Great-circle distance

Hi,

Is there a way to choose the distance calculation method, e.g. "Great Circle" or "Euclidean", similar to the which argument of sf::st_distance?
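
For reference, this is what a great-circle distance amounts to, written as a base R sketch of the haversine formula. It is only an illustration of the concept, not a description of what st_nn computes internally; the function name and the radius (a mean Earth radius for a spherical model) are assumptions made for the example.

```r
# Great-circle distance (haversine formula) in metres between lon/lat points given in degrees.
# `radius` is an assumed mean Earth radius for a spherical model.
haversine <- function(lon1, lat1, lon2, lat2, radius = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# One degree of longitude along the equator
haversine(0, 0, 1, 0)
```

A great-circle distance treats the Earth as a sphere, so it differs slightly from a geodesic distance computed on an ellipsoid.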

Performance of st_nn when parallel > 1

Hi, I've been using st_nn for sf points and I noticed that it is much slower using parallel processing.

I had a look at the source code and I think the way you structure the function is causing a lot of unnecessary copying of data.

For example with 10,000 points I get:

Single Core:

> system.time(r1 <- nngeo::st_nn(x, y, k = 50, parallel = 1))
   user  system elapsed 
  23.33    0.41   24.25

4 Core: Much slower

> system.time(r1 <- nngeo::st_nn(x, y, k = 50, parallel = 4))
   user  system elapsed 
   6.34    0.27   65.10

Tweaked code: Better but not 4x faster

system.time({
  cl = parallel::makeCluster(4)
  x_split = split(x, 1:4)
  parallel::clusterExport(
    cl = cl,
    varlist = c("y"),
    envir = environment()
  )
  result = parallel::parLapply(
    cl,
    x_split,
    function(i) nngeo:::.st_nn_pnt_proj(i, y, k = 50, maxdist = Inf, progress = FALSE)
  )
  parallel::stopCluster(cl)
  ids = lapply(result, `[[`, 1)
  ids = unlist(ids, recursive = FALSE, use.names = FALSE)
  r2 = ids
})
   user  system elapsed 
   0.55    0.06   10.86

> identical(r1, r2)
[1] TRUE

This might be a Windows-only issue, as parallelisation works very differently on Linux.

st_nn() unit error for projection in US Survey feet/ftUS

Seems like st_nn() may have an issue handling projections in US Survey feet/ftUS. When trying to use it for an sf object in EPSG:2248 (NAD83/Maryland), I get this error:

x cannot convert ft into us
Did you try to supply a value in a context where a bare expression was expected?

It works fine when I transform to EPSG:4326 (WGS 84 longitude/latitude).

New to R and to submitting issue reports, so let me know if you need more details, but it seems similar to this issue: r-spatial/sf#504

Clean output when progress = FALSE

Hi, I noticed that if we use lon-lat data and disable the progress bar, we still get output:

ret <- nngeo::st_nn(
      points,
      nodes,
      returnDist = TRUE,
      k = k,
      progress = FALSE,
      parallel = cores
  )
lon-lat points

I think the best solution would be a way to disable the "lon-lat points" message, maybe through a new parameter. Alternatively, since disabling progress probably means we want quiet output, the message could be suppressed whenever progress is FALSE.

Thx!

nngeo is failing with some CRS due to lack of units

Hi, I found that st_nn fails in the example below; this is also related to r-spatial/sf#2299.

points <- sf::st_sfc(
  sf::st_point(c(0, 1)),
  sf::st_point(c(0, 2)),
  crs = 'LOCAL_CS["planar", UNIT["METER",1]]'
)

points2 <- sf::st_sfc(
  sf::st_point(c(0, 1)),
  sf::st_point(c(0, 2)),
  crs = 'LOCAL_CS["planar", UNIT["METER",1]]'
)

nngeo::st_nn(points, points2)
projected points
Error in if (!is.na(crs_units) & crs_units != "m") { : 
  argument is of length zero

Here is the line:

nngeo/R/st_nn_pnt_proj.R

Line 61 in fcfde8e

if(!is.na(crs_units) & crs_units != "m") {

For some reason, sf is not setting the units for this CRS; maybe it only sets them for CRSs that can be given by a numeric shortcut, like 4326 for WGS84.

Thx.
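
The error message itself follows from base R semantics: when `crs_units` has length zero, `!is.na(crs_units) & crs_units != "m"` evaluates to `logical(0)`, which `if` cannot handle. A more defensive form of the check might look like the sketch below; `needs_unit_conversion` is a hypothetical name, and this is not necessarily how the package should (or does) fix it.

```r
# Hypothetical guard: TRUE only when a single, non-missing unit other than metres is reported
needs_unit_conversion <- function(crs_units) {
  length(crs_units) == 1 && !is.na(crs_units) && crs_units != "m"
}

needs_unit_conversion(NULL)      # FALSE: zero-length input no longer raises an error
needs_unit_conversion("m")       # FALSE
needs_unit_conversion("us-ft")   # TRUE
needs_unit_conversion(NA)        # FALSE
```

Because `&&` short-circuits, the later comparisons are never evaluated for zero-length or missing input. Whether a CRS without reported units should be treated as metres is a separate design decision.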

Distance result when parallel parameter is larger than 1

Thank you very much for the amazing package!

I just have a quick question on the distance result of st_nn after setting the parallel parameter larger than 1. It seems that the result in $dist is the same as $nn.

I first tried st_nn with the example data (cities and water) to calculate distances without parallel processing. Below is the code.

library(nngeo)
library(parallel)
nn = st_nn(cities, water, returnDist = TRUE, progress = TRUE)
nn$nn

[[1]]
[1] 3

[[2]]
[1] 2

[[3]]
[1] 2

nn$dist

[[1]]
[1] 22833.09

[[2]]
[1] 1372.235

[[3]]
[1] 2777.558

However, the distance result changes when I add parallel = 4:

nn_p2 = st_nn(cities, water, returnDist = TRUE, progress = TRUE, parallel = 4)
nn_p2$nn

[[1]]
[1] 3

[[2]]
[1] 2

[[3]]
[1] 2

nn_p2$dist

[[1]]
[1] 3

[[2]]
[1] 2

[[3]]
[1] 2

I hope I am not asking a question that has already been covered. Could you let me know if I missed anything here?

issue with nngeo data sets

Hi Michael
This package would be ideal for a problem I am trying to solve.
I hope you progress it and add additional functionality/articulation of methods. For example, I would like to include more fields from the original data set, and have the columns labelled by their origin.
My issue is that I get an error when running your code example. Running > nn = st_nn(cities, towns, progress = TRUE) produces this error message:
projected points
Error in if (!is.na(crs_units) & crs_units != "m") { :
argument is of length zero

I'd welcome your advice because having your example and its variants working would provide much needed insight. The proj string for those two data sets does not have a units comment.

Finding no of neighbours within a certain distance

Thank you once again for a great package!

This might be beyond the scope of the package and not of interest to you, but I thought I should mention it anyway. I think it would be valuable to be able to find how many neighbours there are within a specified distance. I assume this could be achieved by setting a very high k and then counting the neighbours returned for each observation, and there are of course other ways to calculate it. But it would be very convenient to have an additional function in nngeo that adds the number of neighbours within distance d as a new variable in the data frame. It might be just my field, but this type of calculation is very common there.

But there might be problems with this idea that I don't grasp.

All the best, Richard
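
In the meantime, one of the "other ways" can be sketched with sf alone, using `st_is_within_distance()`. The toy points, the CRS and the column name `n_within` below are made up for illustration.

```r
library(sf)

# Three points on a line, in a projected CRS with metre units (UTM zone 36N, chosen arbitrarily)
pts <- st_sf(
  id = 1:3,
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(0, 10)), st_point(c(0, 100)), crs = 32636)
)

d <- 20  # search radius in CRS units (metres here)

# st_is_within_distance() returns, per feature, the indices of all features within `d`;
# subtract 1 because each point is also within distance `d` of itself
pts$n_within <- lengths(st_is_within_distance(pts, pts, dist = d)) - 1L
pts$n_within
```

The approach mentioned in the issue would presumably look like `lengths(st_nn(x, y, k = nrow(y), maxdist = d))`, with the same self-match adjustment when `x` and `y` are the same layer.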

Performance of st_nn with lines or polygons

Hi,

Thanks for a great package. I've noticed that st_nn is very slow when getting the nearest lines to a point, and looking at the code it seems that it just defaults to using sf::st_distance from each point to all the lines.

I've been working on my own function to get around this problem, by measuring the nearest 10 centroids of the lines (which is fast), and then performing the st_nn on just those 10 lines rather than the whole dataset.

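# Find the nearest line to each point, pre-filtering candidates by line centroid.
# Assumes `point` and `lines` are sf objects in the same CRS.
# Approximate: the true nearest line is missed if its centroid is not among the 10 nearest.
nn_line <- function(point, lines) {
  # Cheap first pass: the 10 lines whose centroids are nearest to each point
  cents <- sf::st_centroid(lines)
  nn <- nngeo::st_nn(point, cents, k = 10)

  # st_geometry() works whatever the geometry column is called
  point_geom <- sf::st_geometry(point)
  lines_geom <- sf::st_geometry(lines)

  res <- vector("list", nrow(point))
  for (i in seq_len(nrow(point))) {
    nnsub <- nn[[i]]
    # Exact second pass: nearest line among the candidates only
    sub <- unlist(nngeo::st_nn(point_geom[i], lines_geom[nnsub], progress = FALSE))
    # Map the position within the candidate set back to a row index of `lines`
    res[[i]] <- nnsub[sub]
  }
  res
}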

This function is about 143x faster on my data, so I thought it might be useful to share more generally.

segfault 'memory not mapped' prevents installation

I am unable to install nngeo_0.4.7 (either from CRAN or GitHub) in R 4.3.0 on Ubuntu 22.04, getting the following error:

 *** caught segfault ***
address 0x55582a37ddb6, cause 'memory not mapped'
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)

I have installed Rcpp and RcppEigen via apt install (rather than install.packages() in R) from the c2d4u repository. Perhaps this is causing a conflict?

Is it possible to set an option to run on multiple CPUs or GPUs?

For large amounts of data, many of this package's functions are quite slow. Is it possible to pass a flag to the underlying C++ to split the task up and utilize all available CPU cores? Right now it runs single threaded.

st_connect returns lines with no crs

The lines that st_connect returns do not have a CRS:

library(nngeo) # version 0.1.8

x <- data.frame(x = runif(1), y = runif(1))
x <- st_as_sf(x, coords = c("x", "y"), crs = 4326)

y <- data.frame(x = runif(5), y = runif(5))
y <- st_as_sf(y, coords = c("x", "y"), crs = 4326)

lines <- st_connect(x, y, st_nn(x, y))

print(st_crs(lines))
# Coordinate Reference System: NA

stopifnot(st_crs(lines) == st_crs(x))
# Error: st_crs(lines) == st_crs(x) is not TRUE

How to speed up while using st_nn_pnt_geo.R

I have some longitude/latitude point data and found that st_nn() is very slow on this kind of data. Is there something I can do to make it more efficient?
I know I cannot use RANN::nn2() directly on this kind of data, but is there a way to convert EPSG:4326 to a projection that is suitable for RANN::nn2()?
Thanks.
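
One common option, when the data cover a limited area, is to reproject to the UTM zone that contains them so that planar distances are in metres. The sketch below only computes the EPSG code of the WGS 84 / UTM zone for a given location; `utm_epsg` is a hypothetical helper, it ignores the special zones around Norway and Svalbard, and distortion grows for data that extend well beyond a single zone.

```r
# Hypothetical helper: EPSG code of the WGS 84 / UTM zone containing a lon/lat location
utm_epsg <- function(lon, lat) {
  zone <- min(floor((lon + 180) / 6) + 1, 60)  # clamp lon = 180 into zone 60
  base <- if (lat >= 0) 32600 else 32700       # northern vs southern hemisphere
  base + zone
}

utm_epsg(35, 31)

# With sf loaded, the points could then be reprojected before a planar search, e.g.:
# pts_utm <- sf::st_transform(pts, utm_epsg(35, 31))
```

A representative location such as the centre of the data's bounding box could be passed as `lon` and `lat`.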
