Comments (2)

dbetebenner commented on August 26, 2024

Thank you for the comment.

I will think about how to add better messaging for the situation you describe.

If you are interested in getting a longer list of unique first.last name combinations, you can set sample.with.replacement = TRUE and then keep only the unique combinations that occur.

The error you report occurs because the internal data probably doesn't contain enough female or male first names to satisfy the request without replacement. Because the package builds combinations of first and last names, though, the number of possible combinations is probably in the millions.
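As a toy illustration of that point, here is a minimal base R sketch. The pools `first_pool` and `last_pool` are made up for the example and are not the package's internal data, and the package's own sampling logic may differ. It only shows that drawing without replacement fails once the request exceeds the pool, while first/last combinations drawn with replacement offer many more distinct values.

```r
# Hypothetical toy pools standing in for a package's internal name data.
first_pool <- c("Ana", "Ben", "Cal", "Dee", "Eli")
last_pool  <- c("Ng", "Cruz", "Holt", "Park")

# Asking for more first names than the pool holds fails without replacement;
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(
  sample(first_pool, 10, replace = FALSE),
  error = function(e) conditionMessage(e)
)
print(res)

# The number of distinct first/last combinations is the product of pool sizes.
n_combos <- length(first_pool) * length(last_pool)

# Sampling each part with replacement and de-duplicating never errors,
# and can yield up to n_combos distinct names.
set.seed(1)
combos <- paste(
  sample(first_pool, 50, replace = TRUE),
  sample(last_pool, 50, replace = TRUE)
)
n_unique <- length(unique(combos))
```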

To get 25,000 first/last name combinations you could do the following:

gender <- rep(c("M", "F"), 15000)
names <- randomNames::randomNames(
  which.names = "both",
  name.sep = " ",
  name.order = "first.last",
  gender = gender,
  sample.with.replacement = TRUE
)

unique_names <- head(unique(names), 25000)

I asked for 30,000 names (15,000 of each gender) to begin with so that at least 25,000 of them would be unique.

I've considered how to add this little trick for creating LONG lists of names, but haven't quite figured out how to put this into the package well.

joshwlambert commented on August 26, 2024

Thanks for the response. I hadn't realised that sample.with.replacement = TRUE had a higher capacity for unique names. The suggestion of oversampling and then subsetting out the unique names worked well for my case. Here is a function I put together for the {simulist} package, which uses {randomNames}. Feel free to use some of this code if it would be useful for {randomNames}.

#' Sample names using [randomNames::randomNames()]
#'
#' @description
#' Sample names for specified genders by sampling with replacement, which
#' avoids exhausting the pool of names as can happen when
#' `sample.with.replacement = FALSE`. Duplicated names produced during
#' sampling are removed so that each individual has a unique name. To end up
#' with enough unique names, more names than required are sampled from
#' [randomNames()], and the level of oversampling is set by the
#' `buffer_factor` argument. If `buffer_factor` is too high, more names are
#' sampled than needed, which takes longer; if it is too low, too few unique
#' names are sampled and `.sample_names()` has to resample until it has
#' enough unique names.
#'
#' @inheritParams .add_date
#' @param buffer_factor A single `numeric` determining the level of
#' oversampling (or buffer) when creating a vector of unique names from
#' [randomNames()].
#'
#' @return A `character` vector.
#' @keywords internal
.sample_names <- function(.data,
                          buffer_factor = 1.5) {
  m_idx <- .data$gender == "m"
  f_idx <- .data$gender == "f"
  num_m <- sum(m_idx)
  num_f <- sum(f_idx)
  num_sample_m <- ceiling(num_m * buffer_factor)
  num_sample_f <- ceiling(num_f * buffer_factor)

  # create sample of names so there are no duplicates
  names_m <- character(0)
  while (length(names_m) < num_m) {
    names_m <- unique(
      randomNames::randomNames(
        which.names = "both",
        name.sep = " ",
        name.order = "first.last",
        gender = rep("M", num_sample_m),
        sample.with.replacement = TRUE
      )
    )
  }

  names_f <- character(0)
  while (length(names_f) < num_f) {
    names_f <- unique(
      randomNames::randomNames(
        which.names = "both",
        name.sep = " ",
        name.order = "first.last",
        gender = rep("F", num_sample_f),
        sample.with.replacement = TRUE
      )
    )
  }

  # subset to the required number of names; seq_len() also handles a count
  # of zero, where 1:0 would index position 1
  names_m <- names_m[seq_len(num_m)]
  names_f <- names_f[seq_len(num_f)]

  # order names with gender codes from .data
  names_mf <- vector(mode = "character", length = nrow(.data))
  names_mf[m_idx] <- names_m
  names_mf[f_idx] <- names_f

  # return vector of names
  names_mf
}
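
For a rough sense of how large `buffer_factor` needs to be, the sketch below estimates the expected number of distinct values when drawing with replacement. It assumes every first/last combination is equally likely, which the package does not guarantee, so treat the result as an optimistic estimate. The helpers `expected_unique()` and `suggest_buffer_factor()` and the `pool_size` argument are hypothetical names introduced for this example; they are not part of {randomNames} or {simulist}, and the true pool size is not known here.

```r
# Expected number of distinct values when drawing n_draws items uniformly
# with replacement from pool_size equally likely items.
expected_unique <- function(n_draws, pool_size) {
  pool_size * (1 - (1 - 1 / pool_size)^n_draws)
}

# Smallest buffer factor on a grid for which the expected number of unique
# draws covers n_needed; returns NA if no value on the grid is large enough.
suggest_buffer_factor <- function(n_needed, pool_size,
                                  grid = seq(1, 5, by = 0.05)) {
  n_draws <- ceiling(n_needed * grid)
  ok <- expected_unique(n_draws, pool_size) >= n_needed
  if (!any(ok)) {
    return(NA_real_)
  }
  grid[which(ok)[1]]
}

# Example with an assumed (not measured) pool of one million combinations.
expected_unique(n_draws = 30000, pool_size = 1e6)
suggest_buffer_factor(n_needed = 25000, pool_size = 1e6)
```

Because the loop in `.sample_names()` resamples whenever too few unique names come back, an estimate like this only affects how often the loop has to repeat, not whether the result is correct.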
