
headliner's Introduction

👋 Hi there, I am Jake!

I am a clinical data analyst at the Children's Hospital of Philadelphia. My projects are focused on improving the workflow of other analysts and supporting our central analytics architecture team.

I specialize in:

  • 📊 Data visualization, dashboarding and reports with shiny/flexdashboards
  • 🌎 GIS and 📓 text analysis
  • 💻 SQL / dbt and 🤖 CI
  • 📦 Package development

I spend a lot of time in R

I use these:

I maintain these:

whereiation

Want to know more?

Resume LinkedIn Portfolio

headliner's People

Contributors

davisvaughan, rjake

Forkers

davisvaughan

headliner's Issues

`demo_data()` has NA if n > 26

should use

        group = letters[2:(n + 1) %/% 2 %% 26]

instead of

group = letters[2:(n + 1) %/% 2],

example

letters[2:(60 + 1) %/% 2]

 [1] "a" "a" "b" "b" "c" "c" "d" "d" "e" "e" "f" "f" "g" "g" "h" "h" "i" "i" "j" "j"
[21] "k" "k" "l" "l" "m" "m" "n" "n" "o" "o" "p" "p" "q" "q" "r" "r" "s" "s" "t" "t"
[41] "u" "u" "v" "v" "w" "w" "x" "x" "y" "y" "z" "z" NA  NA  NA  NA  NA  NA  NA  NA 


letters[2:(60 + 1) %/% 2 %% 26]

 [1] "a" "a" "b" "b" "c" "c" "d" "d" "e" "e" "f" "f" "g" "g" "h" "h" "i" "i" "j" "j"
[21] "k" "k" "l" "l" "m" "m" "n" "n" "o" "o" "p" "p" "q" "q" "r" "r" "s" "s" "t" "t"
[41] "u" "u" "v" "v" "w" "w" "x" "x" "y" "y" "a" "a" "b" "b" "c" "c" "d" "d"
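
One caveat with the modulo version: 26 %% 26 is 0, and R silently drops zero indices, which is why the second output above has 58 values instead of 60 and skips "z". A minimal sketch of a wrap-around that keeps all n values, assuming the goal is simply to recycle the alphabet (demo_groups() is a hypothetical name, not a headliner function):

```r
# Hypothetical helper (not part of headliner): recycle group labels past "z"
demo_groups <- function(n) {
  idx <- 2:(n + 1) %/% 2          # 1, 1, 2, 2, ... (each index appears twice)
  letters[(idx - 1) %% 26 + 1]    # shift to 0-based, wrap at 26, shift back
}

groups <- demo_groups(60)
length(groups)   # 60
anyNA(groups)    # FALSE
groups[51:54]    # "z" "z" "a" "a"
```

Shifting to a 0-based index before taking the modulus means the result is never 0, so no element is dropped.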

Add augment_articles & default values

Add argument to allow users to add and drop article matches

  • a = unique
  • an = hour
augment_articles(
  a_patterns = c("^uni", "Europ"), 
  an_patterns = "hour|honor"
)

Can this be a global setting like ggplot2::theme_set()?
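
To make the proposal concrete, here is a rough sketch of how user-supplied patterns could override a default vowel rule. pick_article() and its default rule are assumptions for illustration only; this is not how headliner chooses articles internally.

```r
# Hypothetical sketch, not headliner's implementation
pick_article <- function(x, a_patterns = character(), an_patterns = character()) {
  # default rule: "an" before a leading vowel, otherwise "a"
  article <- ifelse(grepl("^[aeiou]", x, ignore.case = TRUE), "an", "a")
  # user-supplied patterns take precedence over the default rule
  for (p in an_patterns) article[grepl(p, x, ignore.case = TRUE)] <- "an"
  for (p in a_patterns) article[grepl(p, x, ignore.case = TRUE)] <- "a"
  paste(article, x)
}

pick_article(
  c("unique", "hour", "increase", "decrease", "European"),
  a_patterns = c("^uni", "Europ"),
  an_patterns = "hour|honor"
)
# "a unique" "an hour" "an increase" "a decrease" "a European"
```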

Rethink compare_conditions output

Currently returns a list. Would be nice to allow with group_by()

Test case:

flights_jfk |>
  group_by(year) |> 
  compare_conditions(
    compare = (hour > 12),
    reference = (hour <= 12),
    dep_delay
  )

The result should allow other metrics, such as the number of observations, to be returned at the same time.

Should the output be:

x_2012  x_2013  n_2012  n_2013
  12.3    45.6     100     200

or

year     x    n
2012  12.3  100
2013  45.6  200
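
For comparison, the long layout (one row per group) is the shape a grouped summary naturally produces. The base R sketch below uses a small invented data frame in place of flights_jfk, and compare_by_group() is a hypothetical helper, not part of headliner:

```r
# Toy stand-in for flights_jfk (values invented for illustration only)
flights <- data.frame(
  year      = c(2012, 2012, 2012, 2013, 2013, 2013),
  hour      = c(8, 14, 16, 9, 13, 18),
  dep_delay = c(5, 10, 20, 0, 30, 40)
)

# Hypothetical helper: one row per group with both means and the row count
compare_by_group <- function(df, group) {
  pieces <- lapply(split(df, df[[group]]), function(d) {
    is_compare <- d$hour > 12                           # compare = (hour > 12)
    data.frame(
      group          = d[[group]][1],
      mean_compare   = mean(d$dep_delay[is_compare]),
      mean_reference = mean(d$dep_delay[!is_compare]),  # reference = (hour <= 12)
      n              = nrow(d)
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  names(out)[1] <- group   # restore the grouping column's own name
  out
}

compare_by_group(flights, "year")
```

Extra metrics such as n become additional columns here, whereas the wide layout needs a new set of suffixed columns for each metric.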

Only check rounding once

Right now, check_rounding() is called every time compare_values() is called. This can cause a lot of warnings to pop up.

Suggestion:

  • add check_decimals argument to compare_values()
  • find max decimal length in add_headline_column() and check rounding there once
  • look into warning message about no difference
flights_jfk |> 
  select(-hour) |> 
  group_by(year) |> 
  summarise(across(where(is.numeric), lst(mean, sd))) |> 
  ungroup() |> 
  mutate(across(where(is.numeric), \(x) round(x, 1))) |> 
  pivot_longer(-year) |> 
  pivot_wider(
    names_from = year, 
    values_from = value, 
    names_prefix = "y"
  ) |> 
  add_headline_column(
    y2012, y2013,
    return_cols = delta
  ) |> 
  arrange(desc(delta))
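
A sketch of the "check once" idea from the suggestion list: find the longest decimal part across both columns and warn a single time. count_decimals() and check_rounding_once() are hypothetical names, and the warning text is illustrative, not what headliner's check_rounding() emits.

```r
# Hypothetical helpers, not part of headliner
count_decimals <- function(x) {
  # text form of each value, then count the digits after the decimal point
  txt <- vapply(x, format, character(1), scientific = FALSE)
  nchar(sub("^-?[0-9]*\\.?", "", txt))
}

check_rounding_once <- function(x, y, digits = 1) {
  max_decimals <- max(count_decimals(c(x, y)))
  if (max_decimals > digits) {
    # one warning for the whole column pair instead of one per row
    warning(
      "values have up to ", max_decimals,
      " decimals but will be rounded to ", digits,
      call. = FALSE
    )
  }
  invisible(max_decimals)
}

count_decimals(c(12.3, 45, 0.125))   # 1 0 3
```

A column-level function could call this once and then pass something like the proposed check_decimals = FALSE to each row-level compare_values() call.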

Add custom phrasing for single/multiple

headline(
  compare = 10, 
  reference = 8,
  add_phrases = list(
    were = plural_phrasing(single = "was", multi = "were"),
    people = plural_phrasing(single = "person", multi = "people")
  ),
  headline = "there {were} {delta} {people}"
)

#> there were 2 people
#> there was 1 person
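
The two outputs above come from different inputs: a delta of 2 gives the plural form and a delta of 1 gives the singular. A minimal base R sketch of that selection rule, using hypothetical helpers (pick_phrase() and describe() are not headliner functions):

```r
# Hypothetical helpers, not part of headliner
pick_phrase <- function(delta, single, multi) {
  ifelse(abs(delta) == 1, single, multi)
}

describe <- function(compare, reference) {
  delta <- abs(compare - reference)
  paste(
    "there", pick_phrase(delta, "was", "were"),
    delta, pick_phrase(delta, "person", "people")
  )
}

describe(10, 8)   # "there were 2 people"
describe(9, 8)    # "there was 1 person"
```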

Use 2 month for demo_data

The README expects to find month == -12, but that value doesn't always appear with the -60 day logic.

Want easy way to pass multiple "trend phrases"

Currently trend_phrases only takes a single trend_terms() call

headline(
  ...,
  trend_phrases = trend_terms(more = "increase", less = "decrease")
)

while plural_phrasing can take a list

headline(
  30, 20,
  ...,
  trend_phrases = trend_terms(more = "more", less = "less"),
  plural_phrases = list(
    employees = plural_phrasing(single = "employee", multi = "employees"),
    applicants = plural_phrasing(single = "applicant", multi = "applicants")
  )
)

you can then call the headline

"We hired {delta} new {employees} ({delta_p}% {trend} {applicants} than last year)"
#> We hired 10 new employees (50% more applicants than last year)

How should multiple trend terms be incorporated?

You can technically pass vectors and index them with brackets ({trend[1]}), but I'm not sure it's intuitive, and it's more of a side effect than a feature.

headline(
  30, 20,
  headline =
    "We had {article_trend[1]} of {delta} {people[1]} ({delta_p}% {trend[2]} {people[2]} than last year)",
  trend_phrases =
    trend_terms(
      more = c("increase", "more"),
      less = c("decrease", "less")
    ),
  plural_phrases =
    list(
      people = plural_phrasing(
        single = c("employee", "applicant"),
        multi = c("employees", "applicants")
      )
    )
)

#> We had an increase of 10 employees (50% more applicants than last year)

should it be a list like plural_phrases?

headline(
  ...,
  trend_phrases = list(
    increase = trend_terms(more = "increase", less = "decrease"),
    more = trend_terms(more = "more", less = "less")
  ),
  plural_phrases = list(
    people = plural_phrasing(single = "person", multi = "people"),
    employees = plural_phrasing(single = "employee", multi = "employees")
  )
)

For the most part, the name of the object is the same as one of the arguments. Should it be reduced to pairs with the word on the left being the name to call?

headline(
  ...,
  trend_phrases = trend_terms(
    c("increase", "decrease"),
    c("more", "less")
  ),
  plural_phrases = plural_phrasing(
    c("person", "people"),
    c("employee", "employees")
  )
)

Or even shorter, a named vector, but not sure that makes any more sense:

headline(
  ...,
  trend_phrases = trend_terms(
    increase = "decrease",
    more = "less"
  ),
  plural_phrases = plural_phrasing(
    person = "people",
    employee = "employees"
  )
)

add_date_columns not using right calculation for # of quarters

library(headliner)
demo_data(by = "-1 month") %>% 
  add_date_columns(date)


# correct, 1 quarter ago
headliner:::calc_distance(
  from = as.Date("2020-11-20"), 
  unit = "month", 
  n = 3, 
  to = as.Date("2020-07-01")
)

# incorrect, 2 quarters ago
headliner:::calc_distance(
  from = as.Date("2020-11-20"), 
  unit = "month", 
  n = 3, 
  to = as.Date("2020-06-01")
)
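
If the intent is to count calendar quarters (an assumption, since the issue does not spell out the expected values), one way to get the distance is to give every date a running quarter index and subtract. quarter_index() and quarters_between() are hypothetical helpers, not headliner internals:

```r
# Hypothetical helpers, not part of headliner
quarter_index <- function(date) {
  lt <- as.POSIXlt(date)
  # running count of quarters: 4 per year plus 0-3 within the year
  (lt$year + 1900) * 4 + lt$mon %/% 3
}

quarters_between <- function(from, to) {
  quarter_index(to) - quarter_index(from)
}

quarters_between(as.Date("2020-11-20"), as.Date("2020-07-01"))   # -1
quarters_between(as.Date("2020-11-20"), as.Date("2020-06-01"))   # -2
```

Because the index depends only on the year and month, the day of the month in either date cannot shift the result.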

Allow `add_headline_column()` to accept cols named `x` or `y`

Right now add_headline_column() appends indexes to the column names of data frames that already contain x or y, because of this line:

full_data <- bind_cols(df, new_cols)

bind_cols(tibble(x = 1, y = 2), tibble(x = 1, y = 3))

New names:
* x -> x...1
* y -> y...2
* x -> x...3
* y -> y...4
# A tibble: 1 x 4
  x...1 y...2 x...3 y...4        # <---- problem
  <dbl> <dbl> <dbl> <dbl>
1     1     2     1     3

The user should be able to pass

tibble(a = 1, b = 2) |> add_headline_column(a, b) # currently ok
tibble(x = 1, y = 2) |> add_headline_column(x, y)
tibble(x = 1, y = 2) |> add_headline_column(y, x)
tibble(x = 1, z = 2) |> add_headline_column(x, z)

Add vignette for complex phrases

Add headlines together (give credit to glue)

headliner::headline(1, 2) + headliner::headline(3,4)
# decrease of 1 (1 vs. 2) decrease of 1 (3 vs. 4)
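
glue supports this through a `+` method on its own class. A minimal sketch of the same S3 technique on a stand-in class; new_headline() and the "headline" class name are assumptions for illustration, not necessarily what headliner returns:

```r
# Hypothetical constructor for a classed string (stand-in for a headline object)
new_headline <- function(x) structure(x, class = c("headline", "character"))

# S3 method for `+`: join two headlines with a space and keep the class
`+.headline` <- function(e1, e2) new_headline(paste(e1, e2))

a <- new_headline("decrease of 1 (1 vs. 2)")
b <- new_headline("decrease of 1 (3 vs. 4)")

as.character(a + b)
# "decrease of 1 (1 vs. 2) decrease of 1 (3 vs. 4)"
```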

Multiple trend_terms() and plural_phrasing()

headline(
  35, 30, 
  headline = 
    "We had {article_trend[1]} {trend[1]} of {delta} {people[1]}.
    That is {delta} {trend[2]} {people[2]} \\
    than the same time last year ({orig_values}).",
  trend_phrases = 
    trend_terms(
      more = c("increase", "more"),
      less = c("decrease", "less")
    ),
  plural_phrases = 
    list(
      people = plural_phrasing(
        single = c("person", "employee"), 
        multi = c("people", "employees")
      )
    )
)

#> We had an increase of 5 people.
#> That is 5 more employees than the same time last year (35 vs. 30).

Create list of headlines

headline_counts <- function(...) {
  headline(
    ..., 
    headline = "{delta} {trend} {people}",
    trend_phrases = trend_terms("more", "less"),
    plural_phrases = list(people = plural_phrasing("person", "people"))
  )
}

headline_percents <- function(...) {
  headline(
    ..., 
    headline = "{delta_p}% {trend}",
    trend_phrases = trend_terms("higher", "lower")
  )
}

headline(30, 40)
# decrease of 10 (30 vs. 40)

headline_counts(30, 40)
# 10 less people

headline_percents(30, 40)
# 25% lower


headline_types <- 
  list(
    simple = headline, 
    n = headline_counts, 
    pct = headline_percents
  )


make_headlines <- function(compare, reference, headline_methods) {
  purrr::map(
    .x = headline_methods,
    .f = ~(.x(compare, reference))
  )
}

headline_employees <- make_headlines(30, 35, headline_types)

headline_employees$simple
# decrease of 5 (30 vs. 35)

headline_employees$n
# 5 less people 

headline_employees$pct
# 14.3% lower

Capitalize words

headline(
  x = 12,
  y = 8,
  headline = "{cap(art( trend ))} of {delta_p}%",
  cap = stringr::str_to_sentence,
  art = add_article
)
# "An increase of 50%"

Create article outputs

delta = 5.4
article_delta = "a"
trend = "increase"
article_trend = "an"

glue("there was {article_delta} ${delta}K {trend}")
#> there was a $5.4K increase

glue("there was {article_trend} {trend} of ${delta}K")
#> there was an increase of $5.4K

Add info about using headliner within a function

library(headliner)

# base R
func <- function(add = 123) {
  fn_env <- new.env()             # create new environment fn_env
  fn_env$x <- add                 # assign to fn_env
  
  headline(
    1, 2, headline = "{x}",
    .envir = fn_env               # use fn_env
  )
}

func()


# use rlang::current_env()
func <- function(add = 123) {
  x <- add                        # assign as normal
  
  headline(
    1, 2, headline = "{x}",
    .envir = rlang::current_env() # use function environment
  )
}

func()

# use rlang::env()
func <- function(add = 123) {
  fn_env <- rlang::env(x = add)   # create environment & assign at same time
  
  headline(
    1, 2, headline = "{x}",
    .envir = fn_env               # use fn_env
  )
}

func()

Resolve open questions before CRAN

  • {article_delta} returns article only or article + delta (ex. "an" vs "an 8")
    • "an 8"
  • function for capitalization?
    • no, use
      headliner::headline(10, 12) |> 
        stringr::str_to_sentence()
  • #75 - if so, to which? {delta_p}? {orig_values}?
    • do in later release
  • #74
    • headline(compare, reference)
    • headline(value, reference)
    • headline(x, compare)
    • headline(x, reference)
    • headline(x, base)
    • headline(x, y)

aggregate_groups() not working with R 4.1.0 upgrade

not sure if issue is 4.1.0 or change to packages when re-installing

flights_jfk %>%
   compare_conditions(
     compare = (carrier == "AA"),
     reference = complete.cases(.),
     c(dep_delay, arr_delay),
     calc = list(mean = mean, sd = sd)
   ) %>%
   view_list()

Error: Problem with `summarise()` input `..1`.
i `..1 = across(..., calc, .names = "{.fn}_{.col}{name}")`.
x Can't subset columns that don't exist.
x Column `..1` doesn't exist.

Fix checks issues

> checking examples ... ERROR
  Running examples in 'headliner-Ex.R' failed
  The error most likely occurred in:
  
  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: compare_conditions
  > ### Title: Compare two conditions within a data frame
  > ### Aliases: compare_conditions
  > 
  > ### ** Examples
  > 
  > flights_jfk %>%
  +   compare_conditions(
  +     compare = (carrier == "AA"),
  +     reference = complete.cases(.),
  +     c(dep_delay, arr_delay),
  +     calc = list(mean = mean, sd = sd)
  +   ) %>%
  +   view_components()
  Error in view_components(.) : could not find function "view_components"
  Calls: %>% ... eval -> _fseq -> freduce -> withVisible -> <Anonymous>
  Execution halted

> checking Rd cross-references ... WARNING
  Missing link or links in documentation object 'compare_values.Rd':
    'headline'
  
  See section 'Cross-references' in the 'Writing R Extensions' manual.

> checking for missing documentation entries ... WARNING
  Undocumented code objects:
    'headline'
  All user-level objects in a package should have documentation entries.
  See chapter 'Writing R documentation files' in the 'Writing R
  Extensions' manual.

> checking Rd \usage sections ... WARNING
  Documented arguments not in \usage in documentation object 'compare_values':
    'calc'
  
  Undocumented arguments in documentation object 'headline.default'
    '...' 'return_data'
  
  Functions with \usage entries need to have the appropriate \alias
  entries, and all their arguments documented.
  The \usage entries must correspond to syntactically valid R code.
  See chapter 'Writing R documentation files' in the 'Writing R
  Extensions' manual.

> checking for future file timestamps ... NOTE
  unable to verify current time

> checking top-level files ... NOTE
  Non-standard file/directory found at top level:
    'docs'

headliner missing if_match option for add_headline_column

tibble(
  x = 1:5,
  y = 5:1
) |> 
  add_headline_column(x, y)

#   x     y  headline                 
#
#   1     5  decrease of 4 (1 vs. 5)  
#   2     4  decrease of 2 (2 vs. 4)  
#   3     3  difference of 0 (3 vs. 3)  <---- should say "there was no difference"
#   4     2  increase of 2 (4 vs. 2)  
#   5     1  increase of 4 (5 vs. 1)  

Add threshold for "same"

Could adjust trend_terms() to add a modifier for small differences

headline(20.4, 20.5, "A {trend} of {delta} ({delta_p}%)")
# A decrease of 0.1 (0.5%)

Desired behavior, based on the percentage change:

headline(20.4, 20.5, "A {trend} of {delta} ({delta_p}%)", threshold = 0.03) 
# A small decrease of 0.1 (0.5%) 
# 0.5% is actually 0.005, well below 0.03
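
A base R sketch of the threshold logic, assuming the proportional change is measured against the reference value. describe_trend() is a hypothetical stand-in and does not use headliner:

```r
# Hypothetical stand-in for the proposed behavior, not part of headliner
describe_trend <- function(compare, reference, threshold = 0) {
  delta <- compare - reference
  delta_p <- abs(delta) / abs(reference)   # proportional change, e.g. 0.005
  trend <- if (delta > 0) "increase" else if (delta < 0) "decrease" else "difference"
  # add the modifier only for non-zero changes below the threshold
  if (delta != 0 && delta_p < threshold) trend <- paste("small", trend)
  sprintf(
    "A %s of %s (%s%%)",
    trend,
    format(round(abs(delta), 1)),
    format(round(delta_p * 100, 1))
  )
}

describe_trend(20.4, 20.5)                     # "A decrease of 0.1 (0.5%)"
describe_trend(20.4, 20.5, threshold = 0.03)   # "A small decrease of 0.1 (0.5%)"
```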

Allow add_headline_column() to reference other values

Currently can't reference other columns when writing headline.

Also, allow return_data = TRUE

mtcars %>%
  rownames_to_column(var = "car") %>% 
  head(8) %>% 
  select(car, cyl, am, gear, carb) %>% 
  mutate(
    comp_values = # returns a list per row
      map2(gear, carb, ~as.data.frame(compare_values(.x, .y)))
  ) %>% 
  unnest(comp_values) %>% 
  transmute(
    headline = 
      glue("The {car} has {article_trend} of {delta} ({delta_p}%, {orig_values})")
  )
#> [1] The Duster 360 has a decrease of 1 (25%, 3 vs. 4)       
#> [2] The Hornet Sportabout has an increase of 1 (50%, 3 vs. 2)
#> [3] The Hornet 4 Drive has an increase of 2 (200%, 3 vs. 1)  
#> [4] The Valiant has an increase of 2 (200%, 3 vs. 1)         
#> [5] The Merc 240D has an increase of 2 (100%, 4 vs. 2)       

Need to pick names for headliner arguments

compare vs reference is too confusing... can there be a different pairing?

  • headline(compare, reference) - current
  • headline(x, compare)
  • headline(x, reference)
  • headline(x, base)
  • headline(x, y)

Examples:
