rjake / headliner

Home Page: https://rjake.github.io/headliner

License: Other

R 100.00%

headliner's Introduction

👋 Hi there, I am Jake!

I am a clinical data analyst at the Children's Hospital of Philadelphia. My projects are focused on improving the workflow of other analysts and supporting our central analytics architecture team.

I specialize in:

📊 Data visualization, dashboarding and reports with shiny/flexdashboards
🌍 GIS and 📓 text analysis
💻 SQL / dbt and 🤖 CI
📦 Package development

I spend a lot of time in R


headliner's People

Contributors

Stargazers

Watchers

Forkers

davisvaughan

headliner's Issues

`demo_data()` has NA if n > 26

should use

        group = letters[2:(n + 1) %/% 2 %% 26]

instead of

headliner/R/demo-data.R

Line 11 in c6f29e9

group = letters[2:(n + 1) %/% 2],

example

letters[2:(60 + 1) %/% 2]

 [1] "a" "a" "b" "b" "c" "c" "d" "d" "e" "e" "f" "f" "g" "g" "h" "h" "i" "i" "j" "j"
[21] "k" "k" "l" "l" "m" "m" "n" "n" "o" "o" "p" "p" "q" "q" "r" "r" "s" "s" "t" "t"
[41] "u" "u" "v" "v" "w" "w" "x" "x" "y" "y" "z" "z" NA  NA  NA  NA  NA  NA  NA  NA 


letters[2:(60 + 1) %/% 2 %% 26]

 [1] "a" "a" "b" "b" "c" "c" "d" "d" "e" "e" "f" "f" "g" "g" "h" "h" "i" "i" "j" "j"
[21] "k" "k" "l" "l" "m" "m" "n" "n" "o" "o" "p" "p" "q" "q" "r" "r" "s" "s" "t" "t"
[41] "u" "u" "v" "v" "w" "w" "x" "x" "y" "y" "a" "a" "b" "b" "c" "c" "d" "d"
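
Note that the `%% 26` version above returns only 58 values for n = 60 and never reaches "z": index 26 becomes 0, and R silently drops zero indices. A minimal sketch of a wraparound that keeps all 26 letters, assuming the goal is simply to recycle the alphabet in pairs (`group_index()` is a hypothetical helper, not part of headliner):

```r
# Hypothetical helper: pairs of indices 1, 1, 2, 2, ... that wrap back to 1 after 26
group_index <- function(n) {
  pairs <- 2:(n + 1) %/% 2      # 1, 1, 2, 2, 3, 3, ...
  (pairs - 1) %% 26 + 1         # shift to 0-based, wrap, shift back to 1-based
}

groups <- letters[group_index(60)]
length(groups)    # 60, nothing dropped
anyNA(groups)     # FALSE
tail(groups, 10)  # "z" "z" "a" "a" "b" "b" "c" "c" "d" "d"
```

Shifting to a 0-based index before taking the modulus is what avoids the zero index.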

Add augment_articles & default values

Add argument to allow users to add and drop article matches

a = unique
an = hour

augment_articles(
  a_patterns = c("^uni", "Europ"), 
  an_patterns = "hour|honor"
)

Can this be a global setting like ggplot2::theme_set()?

Allow '...' in add_headline_column()

The ability to pass ... is missing from add_headline_column()

Rethink compare_conditions output

Currently returns a list. Would be nice to allow with group_by()

Test case:

flights_jfk |>
  group_by(year) |>
  compare_conditions(
    compare = (hour > 12),
    reference = (hour <= 12),
    dep_delay
  )

Result should allow other metrics like # of observations to happen at same time.

Should the output be wide:

x_2012  x_2013  n_2012  n_2013
12.3    45.6    100     200

or long:

year  x     n
2012  12.3  100
2013  45.6  200

Increase test coverage

Only check rounding once

Right now, check_rounding() is called every time compare_values() is called. This can cause a lot of warnings to pop up.

Suggestion:

add check_decimals argument to compare_values()
find max decimal length in add_headline_column() and check rounding there once
look into warning message about no difference

flights_jfk |> 
  select(-hour) |> 
  group_by(year) |> 
  summarise(across(where(is.numeric), lst(mean, sd))) |> 
  ungroup() |> 
  mutate(across(where(is.numeric), round, 1)) |> 
  pivot_longer(-year) |> 
  pivot_wider(
    names_from = year, 
    values_from = value, 
    names_prefix = "y"
  ) |> 
  add_headline_column(
    y2012, y2013,
    return_cols = delta
  ) |> 
  arrange(desc(delta))

Fix headline method documentation

Put at top of file. Mimic
raw - https://github.com/tidyverse/dplyr/blob/master/R/pull.R
rendered - https://dplyr.tidyverse.org/reference/pull.html

Add custom phrasing for single/multiple

headline(
  compare = 10, 
  reference = 8,
  add_phrases = list(
    were = plural_phrasing(single = "was", multi = "were"),
    people = plural_phrasing(single = "person", multi = "people")
  ),
  headline = "there {were} {delta} {people}"
)

#> there were 2 people
#> there was 1 person
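
One way to read the request: each named phrase picks its singular or plural form based on the size of the difference. A rough base R sketch of that selection logic, where `pick_phrase()` and `describe()` are hypothetical stand-ins and not headliner's implementation:

```r
# Hypothetical stand-in for the single/multi selection; not headliner's code
pick_phrase <- function(delta, single, multi) {
  if (abs(delta) == 1) single else multi
}

# Build the sentence from the issue using the selected forms
describe <- function(compare, reference) {
  delta <- abs(compare - reference)
  paste(
    "there",
    pick_phrase(delta, single = "was", multi = "were"),
    delta,
    pick_phrase(delta, single = "person", multi = "people")
  )
}

describe(10, 8) # "there were 2 people"
describe(9, 8)  # "there was 1 person"
```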

Add favicons

Missing article_trend

have to use "{add_article(trend)}"

Doesn't play well with mutate()

See thread/issue here:
https://rfordatascience.slack.com/archives/C010GJ3VAE5/p1603679643222200

Use 2 month for demo_data

The README expects to find month == -12, which doesn't always appear with the -60 day logic

Add function for adding a headline column

Using mutate( x = map2_chr(...)) is likely to frustrate some ppl. Create a function like

df %>%
  add_headline_column(
    compare = col_a, 
    reference = col_b
  )

Arguments not listed for headline()

Want easy way to pass multiple "trend phrases"

currently trend_phrasing only takes 1 argument

headline(
  ...,
  trend_phrases = trend_terms(more = "increase", less = "decrease")
)

while plural_phrasing can take a list

headline(
  30, 20,
  ...,
  trend_phrases = trend_terms(more = "more", less = "less"),
  plural_phrases = list(
    employees = plural_phrasing(single = "employee", multi = "employees"),
    applicants = plural_phrasing(single = "applicant", multi = "applicants")
  )
)

you can then call the headline

"We hired {delta} new {employees} ({delta_p}% {trend} {applicants} than last year)"
#> We hired 10 new employees (50% more applicants than last year)

How should multiple trend terms be incorporated?

You can technically pass it vectors like this with brackets ({trend[1]}) but not sure it's intuitive and it's more of a side-effect rather than a feature

headline(
  30, 20,
  headline =
    "We had {article_trend[1]} of {delta} {people[1]} ({delta_p}% {trend[2]} {people[2]} than last year)",
  trend_phrases =
    trend_terms(
      more = c("increase", "more"),
      less = c("decrease", "less")
    ),
  plural_phrases =
    list(
      people = plural_phrasing(
        single = c("employee", "applicant"),
        multi = c("employees", "applicants")
      )
    )
)

#> We had an increase of 5 people.
#> That is 5 more employees than the same time last year (35 vs. 30).

should it be a list like plural_phrases?

headline(
  ...,
  trend_phrases = list(
    increase = trend_terms(more = "increase", less = "decrease"),
    more = trend_terms(more = "more", less = "less")
  ),
  plural_phrases = list(
    people = plural_phrasing(single = "person", multi = "people"),
    employees = plural_phrasing(single = "employee", multi = "employees")
  )
)

For the most part, the name of the object is the same as one of the arguments. Should it be reduced to pairs with the word on the left being the name to call?

headline(
  ...,
  trend_phrases = trend_terms(
    c("increase", "decrease"),
    c("more", "less")
  ),
  plural_phrases = plural_phrasing(
    c("person", "people"),
    c("employee", "employees")
  )
)

Or even shorter, a named vector, but not sure that makes any more sense:

headline(
  ...,
  trend_phrases = trend_terms(
    increase = "decrease",
    more = "less"
  ),
  plural_phrases = plural_phrasing(
    person = "people",
    employee = "employees"
  )
)

Add n values to compare_values() output

Want access to n_comp and n_ref

add_date_columns not using right calculation for # of quarters

library(headliner)
demo_data(by = "-1 month") %>% 
  add_date_columns(date)

# correct, 1 quarter ago
headliner:::calc_distance(
  from = as.Date("2020-11-20"), 
  unit = "month", 
  n = 3, 
  to = as.Date("2020-07-01")
)

# incorrect, 2 quarters ago
headliner:::calc_distance(
  from = as.Date("2020-11-20"), 
  unit = "month", 
  n = 3, 
  to = as.Date("2020-06-01")
)
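
The two calls can give different answers depending on how "quarters ago" is defined. A sketch in base R comparing two plausible definitions: whole elapsed months divided by three, versus the difference in calendar quarters. All helper names here are hypothetical, and the issue does not say which definition `calc_distance()` is meant to follow.

```r
# Hypothetical helpers; neither is headliner's calc_distance()
month_index <- function(d) {
  lt <- as.POSIXlt(d)
  (lt$year + 1900) * 12 + lt$mon          # lt$mon is 0-based (Jan = 0)
}

# Definition 1: elapsed calendar months (day of month ignored), in blocks of 3
quarters_elapsed <- function(from, to) {
  (month_index(from) - month_index(to)) %/% 3
}

# Definition 2: difference between calendar quarters (Q1 = Jan-Mar, ...)
quarters_calendar <- function(from, to) {
  month_index(from) %/% 3 - month_index(to) %/% 3
}

from <- as.Date("2020-11-20")

quarters_elapsed(from, as.Date("2020-07-01"))   # 1
quarters_calendar(from, as.Date("2020-07-01"))  # 1

quarters_elapsed(from, as.Date("2020-06-01"))   # 1
quarters_calendar(from, as.Date("2020-06-01"))  # 2
```

The definitions agree on the July date and disagree on the June date, which is the case the issue flags.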

Allow `add_headline_column()` to accept cols named `x` or `y`

Right now add_headline_column() appends indexes to the column names of data frames that already contain x or y, because of this line:

headliner/R/add-headline-column.R

Line 114 in 4b4d79d

full_data <- bind_cols(df, new_cols)

bind_cols(tibble(x = 1, y = 2), tibble(x = 1, y = 3))

New names:
* x -> x...1
* y -> y...2
* x -> x...3
* y -> y...4
# A tibble: 1 x 4
  x...1 y...2 x...3 y...4        # <---- problem
  <dbl> <dbl> <dbl> <dbl>
1     1     2     1     3

The user should be able to pass

tibble(a = 1, b = 2) |> add_headline_column(a, b) # currently ok
tibble(x = 1, y = 2) |> add_headline_column(x, y)
tibble(x = 1, y = 2) |> add_headline_column(y, x)
tibble(x = 1, z = 2) |> add_headline_column(x, z)

Use headline() in README

Add vignette for complex phrases

Add headlines together (give credit to glue)

headliner::headline(1, 2) + headliner::headline(3,4)
# decrease of 1 (1 vs. 2) decrease of 1 (3 vs. 4)
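
A minimal sketch of how `+` could be made to join two headlines with an S3 method, similar in spirit to how glue lets strings be added together. The class name `headline_sketch` and the constructor are hypothetical; headliner's actual return type may differ.

```r
# Hypothetical class; stands in for whatever headline() returns
new_headline <- function(text) structure(text, class = "headline_sketch")

# S3 method for `+`: join two headlines with a space
"+.headline_sketch" <- function(e1, e2) {
  new_headline(paste(unclass(e1), unclass(e2)))
}

# Print only the text, without the class attribute
print.headline_sketch <- function(x, ...) {
  cat(unclass(x), "\n", sep = "")
  invisible(x)
}

a <- new_headline("decrease of 1 (1 vs. 2)")
b <- new_headline("decrease of 1 (3 vs. 4)")
a + b
# decrease of 1 (1 vs. 2) decrease of 1 (3 vs. 4)
```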

Multiple trend_terms() and plural_phrasing()

headline(
  35, 30, 
  headline = 
    "We had {article_trend[1]} {trend[1]} of {delta} {people[1]}.
    That is {delta} {trend[2]} {people[2]} \\
    than the same time last year ({orig_values}).",
  trend_phrases = 
    trend_terms(
      more = c("increase", "more"),
      less = c("decrease", "less")
    ),
  plural_phrases = 
    list(
      people = plural_phrasing(
        single = c("person", "employee"), 
        multi = c("people", "employees")
      )
    )
)

#> We had an increase of 5 people.
#> That is 5 more employees than the same time last year (35 vs. 30).

Create list of headlines

headline_counts <- function(...) {
  headline(
    ..., 
    headline = "{delta} {trend} {people}",
    trend_phrases = trend_terms("more", "less"),
    plural_phrases = list(people = plural_phrasing("person", "people"))
  )
}

headline_percents <- function(...) {
  headline(
    ..., 
    headline = "{delta_p}% {trend}",
    trend_phrases = trend_terms("higher", "lower")
  )
}

headline(30, 40)
# decrease of 10 (30 vs. 40)

headline_counts(30, 40)
# 10 less people

headline_percents(30, 40)
# 25% lower


headline_types <- 
  list(
    simple = headline, 
    n = headline_counts, 
    pct = headline_percents
  )


make_headlines <- function(compare, reference, headline_methods) {
  map(
    .x = headline_methods,
    .f = ~(.x(compare, reference))
  )
}

headline_employees <- make_headlines(30, 35, headline_types)

headline_employees$simple
# decrease of 5 (30 vs. 35)

headline_employees$n
# 5 less people 

headline_employees$pct
# 14.3% lower

Capitalize words

headline(
  x = 12,
  y = 8,
  headline = "{cap(art( trend ))} of {delta_p}%",
  cap = stringr::str_to_sentence,
  art = add_article
)
# "An increase of 50%"

Vectorize `headline()`

Right now headline() won't work on vectors. See logic in add_headline_column() for ideas

headliner/R/add-headline-column.R

Lines 94 to 100 in 1a8c784

map2(
  .x = {{x}},
  .y = {{y}},
  .f =
    ~compare_values(
      .x,
      .y,
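
The same element-wise idea can be sketched in base R with `mapply()`. Here `describe_one()` is a hypothetical scalar function standing in for a non-vectorized `headline()`; it only imitates the default phrasing shown elsewhere in these issues.

```r
# Hypothetical scalar stand-in for headline(); works on one pair of values
describe_one <- function(x, y) {
  delta <- abs(x - y)
  trend <- if (x > y) "increase" else if (x < y) "decrease" else "difference"
  paste0(trend, " of ", delta, " (", x, " vs. ", y, ")")
}

# Vectorized wrapper: apply the scalar function to each pair in turn
describe_many <- function(x, y) {
  mapply(describe_one, x, y, USE.NAMES = FALSE)
}

describe_many(c(1, 3, 5), c(5, 3, 1))
# "decrease of 4 (1 vs. 5)" "difference of 0 (3 vs. 3)" "increase of 4 (5 vs. 1)"
```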

named parameters don't work when return_data = TRUE

example:

headline(
  5, 4, 
  "{delta} on {date}", 
  date = Sys.Date(), # <------
  return_data = TRUE
)

Change needs to happen here

headliner/R/headline.R

Line 124 in 632212a

res <- append(res, list(headline = glue_data(res, headline)))

should become

res <- append(res, list(headline = glue_data(res, headline, ...)))

Restructure files

one function per file

Use same naming convention for plural phrasing and trend terms

name of params and name of functions should have same naming convention

trend_phrases = & plural_phrases =
trend_terms() & plural_terms()

headline(
  ...,
  trend_phrasing = trend_terms(...),
  plural_phrases = list(x = plural_phrasing(...))
)

Create article outputs

delta = 5.4
article_delta = "a"
trend = "increase"
article_trend = "an"

glue("there was {article_delta} ${delta}K {trend}")
#> there was a $5.4K increase

glue("there was {article_trend} {trend} of ${delta}K")
#> there was an increase of $5.4K

Add info about using headliner within a function

library(headliner)

# base R
func <- function(add = 123) {
  fn_env <- new.env()             # create new environment fn_env
  fn_env$x <- add                 # assign to fn_env
  
  headline(
    1, 2, headline = "{x}",
    .envir = fn_env               # use fn_env
  )
}

func()


# use rlang::current_env()
func <- function(add = 123) {
  x <- add                        # assign as normal
  
  headline(
    1, 2, headline = "{x}",
    .envir = rlang::current_env() # use function environment
  )
}

func()

# use rlang::env()
func <- function(add = 123) {
  fn_env <- rlang::env(x = add)   # create environment & assign at same time
  
  headline(
    1, 2, headline = "{x}",
    .envir = fn_env               # use fn_env
  )
}

func()

Resolve open questions before CRAN

{article_delta} returns article only or article + delta (ex. "an" vs "an 8")
- "an 8"

function for capitalization?

no, use

headliner::headline(10, 12) |> 
  stringr::str_to_sentence()

#75 - if so, to which? {delta_p}? {orig_values}?
- do in later release
#74
- ~~headline(compare, reference)~~
- headline(value, reference)
- headline(x, compare)
- headline(x, reference)
- headline(x, base)
- headline(x, y)

Use first 2 columns/list items if piped in from a data frame/list

Works for both df & list methods

if (missing(compare) && missing(reference) && length(x) == 2) {
  compare <- x[[1]][1]
  reference <- x[[2]][1]
}

Pass multiple terms to trend_terms

Having trouble passing multiple trend_terms(); may need to use map(). See issue #39

Should functions return data frames instead of lists?

should be able to pass multiple lines to compare_columns(), compare_conditions(), and headline()

flights_jfk %>% 
  group_by(carrier) %>% 
  compare_columns(c(arr_delay, dep_delay))

Also need to fix readme

flights_jfk %>%
   compare_conditions(
     compare = (carrier == "AA"),
     reference = complete.cases(.),
     c(dep_delay, arr_delay),
     calc = list(mean = mean, sd = sd)
   ) %>%
   view_list()

Error: Problem with `summarise()` input `..1`.
i `..1 = across(..., calc, .names = "{.fn}_{.col}{name}")`.
x Can't subset columns that don't exist.
x Column `..1` doesn't exist.

Fix article when value is negative

{article_*} should always be "a" when value is negative:

a -8
a -6
a -0.2
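
A deliberately simplified sketch of the rule, assuming "an" is only needed for positive numbers that start with an 8 or equal 11 or 18. `article_for()` is hypothetical and is not headliner's `add_article()`; it ignores cases such as 11.5 or 11,000.

```r
# Hypothetical, simplified rule; not headliner's add_article()
article_for <- function(x) {
  if (x < 0) return("a")                          # negatives always take "a"
  digits <- format(abs(x), scientific = FALSE)    # e.g. 8 -> "8", 0.2 -> "0.2"
  if (startsWith(digits, "8") || x %in% c(11, 18)) "an" else "a"
}

paste(article_for(-8), -8)      # "a -8"
paste(article_for(8), 8)        # "an 8"
paste(article_for(-0.2), -0.2)  # "a -0.2"
```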

> checking examples ... ERROR
  Running examples in 'headliner-Ex.R' failed
  The error most likely occurred in:
  
  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: compare_conditions
  > ### Title: Compare two conditions within a data frame
  > ### Aliases: compare_conditions
  > 
  > ### ** Examples
  > 
  > flights_jfk %>%
  +   compare_conditions(
  +     compare = (carrier == "AA"),
  +     reference = complete.cases(.),
  +     c(dep_delay, arr_delay),
  +     calc = list(mean = mean, sd = sd)
  +   ) %>%
  +   view_components()
  Error in view_components(.) : could not find function "view_components"
  Calls: %>% ... eval -> _fseq -> freduce -> withVisible -> <Anonymous>
  Execution halted

> checking Rd cross-references ... WARNING
  Missing link or links in documentation object 'compare_values.Rd':
    'headline'
  
  See section 'Cross-references' in the 'Writing R Extensions' manual.

> checking for missing documentation entries ... WARNING
  Undocumented code objects:
    'headline'
  All user-level objects in a package should have documentation entries.
  See chapter 'Writing R documentation files' in the 'Writing R
  Extensions' manual.

> checking Rd \usage sections ... WARNING
  Documented arguments not in \usage in documentation object 'compare_values':
    'calc'
  
  Undocumented arguments in documentation object 'headline.default'
    '...' 'return_data'
  
  Functions with \usage entries need to have the appropriate \alias
  entries, and all their arguments documented.
  The \usage entries must correspond to syntactically valid R code.
  See chapter 'Writing R documentation files' in the 'Writing R
  Extensions' manual.

> checking for future file timestamps ... NOTE
  unable to verify current time

> checking top-level files ... NOTE
  Non-standard file/directory found at top level:
    'docs'

headliner missing if_match option for add_headline_column

tibble(
  x = 1:5,
  y = 5:1
) |> 
  add_headline_column(x, y)

#   x     y  headline                 
#
#   1     5  decrease of 4 (1 vs. 5)  
#   2     4  decrease of 2 (2 vs. 4)  
#   3     3  difference of 0 (3 vs. 3)  <---- should say "there was no difference"
#   4     2  increase of 2 (4 vs. 2)  
#   5     1  increase of 4 (5 vs. 1)

duplicated field names get odd new names

df <-
    demo_data() %>% 
    dplyr::mutate(day = weekdays(date))

add_date_columns(df, date)

headline(20.4, 20.5, "A {trend} of {delta} ({delta_p}%)")
# A decrease of 0.1 (0.5%)

desired, uses percentage change

headline(20.4, 20.5, "A {trend} of {delta} ({delta_p}%)", threshold = 0.03)
# A small decrease of 0.1 (0.5%)
# 0.5% is actually 0.005, well below 0.03

Allow add_headline_column() to reference other values

Currently can't reference other columns when writing headline.

Also, allow return_data = TRUE

mtcars %>%
  rownames_to_column(var = "car") %>% 
  head(8) %>% 
  select(car, cyl, am, gear, carb) %>% 
  mutate(
    comp_values = # returns a list per row
      map2(gear, carb, ~as.data.frame(compare_values(.x, .y)))
  ) %>% 
  unnest(comp_values) %>% 
  transmute(
    headline = 
      glue("The {car} has {article_trend} of {delta} ({delta_p}%, {orig_values})")
  )

#> [1] The Duster 360 has a decrease of 1 (25%, 3 vs. 4)       
#> [2] The Hornet Sportabout has an increase of 1 (50%, 3 vs. 2)
#> [3] The Hornet 4 Drive has an increase of 2 (200%, 3 vs. 1)  
#> [4] The Valiant has an increase of 2 (200%, 3 vs. 1)         
#> [5] The Merc 240D has an increase of 2 (100%, 4 vs. 2)

Move intro to vignette

Append article to `trend_terms()` defaults

Use different defaults

trend_terms <- function(more = "an increase",
                        less = "a decrease",
                        same = "no difference") {

  ...

}

instead of

headliner/R/trend-terms.R

Lines 11 to 13 in c6f29e9

trend_terms <- function(more = "increase",
                        less = "decrease",
                        same = "difference") {

Remove "within mutate" section of vignette

or point at add_headline_column()

scale doesn't work with values that are already percents

would help to add an argument raw_percents = TRUE so you can pass 2 percentages

> compare_values(0.4, 0.7, scale = 100)$delta
# [1] 0.3

Add multiplier to compare_conditions/compare columns

when using 0/1 indicators, include option to multiply the result by 100

Need to pick names for headliner arguments

compare vs reference is too confusing... can there be a different pairing?

Examples:

trend_terms <- function(more = "increase",
                        less = "decrease",
                        same = "difference") {