Comments (2)
Thank you for the comment.
I will think about how to add better messaging for the situation you describe.
If you want a longer list of unique first.last name combinations, you can set sample.with.replacement = TRUE and then keep only the unique combinations that occur.
The error you are seeing probably occurs because the internal data doesn't contain enough distinct female or male first names to draw that many without replacement. Since the package pairs first and last names, the number of possible combinations is much larger, probably in the millions.
To get 25,000 unique first/last name combinations, you could do the following:
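# request 30,000 names (alternating male/female) so there is room to drop duplicates
gender <- rep(c("M", "F"), 15000)
names <- randomNames::randomNames(
  which.names = "both",
  name.sep = " ",
  name.order = "first.last",
  gender = gender,
  sample.with.replacement = TRUE
)
# keep the first 25,000 unique first/last combinations
unique_names <- head(unique(names), 25000)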
I asked for 30,000 names to begin with to make sure I had 25,000 uniques.
I've considered adding this little trick for creating LONG lists of names to the package, but haven't yet figured out a good way to do it.
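The oversample-then-deduplicate trick can be sketched in base R without the package. This is a minimal sketch under stated assumptions: `first_pool` and `last_pool` are small hypothetical name pools standing in for the package's internal data, and `sample_unique_names()` is a hypothetical helper, not part of {randomNames}. Unlike a single oversampled call, it keeps the names already found and tops up until it has enough, and it stops with an error when more unique names are requested than the pools can form.

```r
# Hypothetical toy name pools; randomNames() draws from its own internal data.
first_pool <- c("Ana", "Ben", "Chloe", "Dev", "Eli", "Fay", "Gus", "Hana")
last_pool  <- c("Ito", "Jones", "Khan", "Lopez", "Moore", "Ng")

# Hypothetical helper: oversample with replacement, drop duplicates, repeat.
sample_unique_names <- function(n, first_pool, last_pool, buffer_factor = 1.5) {
  capacity <- length(first_pool) * length(last_pool)
  if (n > capacity) {
    stop("Requested ", n, " unique names but only ", capacity,
         " combinations exist.")
  }
  out <- character(0)
  while (length(out) < n) {
    # oversample only the shortfall, scaled by the buffer
    k <- ceiling((n - length(out)) * buffer_factor)
    draws <- paste(
      sample(first_pool, k, replace = TRUE),
      sample(last_pool, k, replace = TRUE)
    )
    # keep names found so far and add any new unique combinations
    out <- unique(c(out, draws))
  }
  head(out, n)
}

set.seed(1)
x <- sample_unique_names(40, first_pool, last_pool)
length(x)        # 40
anyDuplicated(x) # 0
```

Accumulating across iterations means a low `buffer_factor` costs extra loops but never discards unique names that were already drawn.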
from randomnames.
Thanks for the response. I hadn't realised that sample.with.replacement = TRUE had a higher capacity for unique names. The suggestion of oversampling and then subsetting out the unique names worked well for my case. Here is a function I put together for the {simulist} package, which uses {randomNames}. Feel free to use some of this code if it would be useful for {randomNames}.
#' Sample names using [randomNames::randomNames()]
#'
#' @description
#' Sample names for the specified genders by sampling with replacement, to
#' avoid exhausting the pool of names as happens with
#' `sample.with.replacement = FALSE`. Duplicated names produced during
#' sampling need to be removed to ensure each individual has a unique name.
#' To have enough unique names, more names than required are sampled from
#' [randomNames()], and the level of oversampling is determined by the
#' `buffer_factor` argument. If `buffer_factor` is too high, more names are
#' sampled than needed, which takes longer; if it is too low, not enough
#' unique names are sampled and `.sample_names()` needs to loop until it has
#' enough unique names.
#'
#' @inheritParams .add_date
#' @param buffer_factor A single `numeric` determining the level of
#' oversampling (or buffer) when creating a vector of unique names from
#' [randomNames()].
#'
#' @return A `character` vector.
#' @keywords internal
.sample_names <- function(.data,
                          buffer_factor = 1.5) {
  m_idx <- .data$gender == "m"
  f_idx <- .data$gender == "f"
  num_m <- sum(m_idx)
  num_f <- sum(f_idx)
  num_sample_m <- ceiling(num_m * buffer_factor)
  num_sample_f <- ceiling(num_f * buffer_factor)
  # resample until there are at least as many unique names as required
  names_m <- character(0)
  while (length(names_m) < num_m) {
    names_m <- unique(
      randomNames::randomNames(
        which.names = "both",
        name.sep = " ",
        name.order = "first.last",
        gender = rep("M", num_sample_m),
        sample.with.replacement = TRUE
      )
    )
  }
  names_f <- character(0)
  while (length(names_f) < num_f) {
    names_f <- unique(
      randomNames::randomNames(
        which.names = "both",
        name.sep = " ",
        name.order = "first.last",
        gender = rep("F", num_sample_f),
        sample.with.replacement = TRUE
      )
    )
  }
  # subset to the required number of names (seq_len() is safe when a count is 0)
  names_m <- names_m[seq_len(num_m)]
  names_f <- names_f[seq_len(num_f)]
  # place names in the same order as the gender codes in .data
  names_mf <- vector(mode = "character", length = nrow(.data))
  names_mf[m_idx] <- names_m
  names_mf[f_idx] <- names_f
  # return vector of names
  names_mf
}
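A rough estimate can help when picking `buffer_factor`. Assuming all `n_comb` first/last combinations are equally likely, the expected number of distinct names after `k` draws with replacement is `n_comb * (1 - (1 - 1/n_comb)^k)`. The helpers below (`expected_unique()` and `draws_needed()`, both hypothetical) compute this and the smallest `k` whose expected count reaches a target. The value of `n_comb` is an assumption you would have to supply, and if names are drawn with unequal probabilities, collisions become more common, so this estimate likely understates the buffer needed.

```r
# Hypothetical helpers for choosing an oversampling buffer.
# Assumes every one of `n_comb` first/last combinations is equally likely.

# Expected number of distinct names after `k` draws with replacement.
expected_unique <- function(k, n_comb) {
  # n_comb * (1 - (1 - 1/n_comb)^k), computed with log1p/expm1 for accuracy
  n_comb * -expm1(k * log1p(-1 / n_comb))
}

# Smallest number of draws whose expected distinct count reaches `needed`.
draws_needed <- function(needed, n_comb) {
  if (n_comb <= 1 || needed >= n_comb) {
    stop("`needed` must be smaller than `n_comb`, and `n_comb` must exceed 1.")
  }
  ceiling(log1p(-needed / n_comb) / log1p(-1 / n_comb))
}

# Example with an assumed pool of one million combinations
k <- draws_needed(25000, 1e6)
buffer_factor <- k / 25000
```

With a pool that large the buffer comes out only slightly above 1; as the target approaches the number of available combinations, the required number of draws grows quickly.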
Related Issues
- randomNames(0) returns more than 'n' random first and/or last names.
- offer option to sample without replacement
- `sample.with.replacement = FALSE` across ethnicities/ genders
- Handle NAs in randomNames()'s gender argument
- firstnames weighted by birth year?
- Error message populating out even though the package runs fine
- Random Error with large samples without replacement
- data source?
- Leading whitespaces in some names
- set.seed() only works for the first row of a dataframe.
- Please add argument `initial.letter =`
- Are the first and second names independent or associated
- Partial argument matching in `rep()`
- Plans to expand database of names?
- Anonymize name across multiple records
- New package version & CRAN release?