r-lib / pillar

Format columns with colour

Home Page: https://pillar.r-lib.org/

License: Other

R 99.70% Mermaid 0.30%

r colour

pillar's Issues

Use utf8_width(utf8 = TRUE)

Requires utf8 >= 1.1.0.

d00c11f#commitcomment-25872137

Rename new_column()

and the classes it uses.

Also export it, and describe in #42.

Fix formatting on Windows

especially R-devel. Known output tests differ. Ideally, we'd be able to recreate the same output on Windows and Linux with LC_CTYPE=C or LC_CTYPE=latin1.

setting string max length?

hello everyone, thanks for your great work!

Just wondering if there are any plans to implement that? I think it's pretty useful. For instance, in Pandas one could simply do:

In [43]: df = pd.DataFrame(np.array([['foo', 'bar', 'bim', 'uncomfortably long string'],
   ....:                             ['horse', 'cow', 'banana', 'apple']]))
   ....: 

In [44]: pd.set_option('max_colwidth',40)

In [45]: df
Out[45]: 
       0    1       2                          3
0    foo  bar     bim  uncomfortably long string
1  horse  cow  banana                      apple

In [46]: pd.set_option('max_colwidth', 6)

In [47]: df
Out[47]: 
       0    1      2      3
0    foo  bar    bim  un...
1  horse  cow  ba...  apple

In [48]: pd.reset_option('max_colwidth')

which is really helpful when one prints a tibble that contains both text and numeric values.
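For illustration, a minimal base R sketch of the requested behaviour. The helper name `truncate_strings()` and its `max_width` argument are hypothetical and not part of pillar or tibble; the sketch counts characters rather than display width, so it is only approximate for wide Unicode characters.

```r
# Hypothetical helper (not a pillar function): shorten strings longer than
# `max_width` characters and mark the cut with "...".
truncate_strings <- function(x, max_width = 40L) {
  stopifnot(max_width > 3L)
  # NA entries are left untouched
  too_long <- !is.na(x) & nchar(x, type = "chars") > max_width
  x[too_long] <- paste0(substr(x[too_long], 1L, max_width - 3L), "...")
  x
}

truncate_strings(c("foo", "banana", "uncomfortably long string"), max_width = 6L)
```

The result keeps every string at or below `max_width` characters, including the ellipsis.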

Data type for extra columns must be formatted in subtle style

for consistency. Column names are already formatted in bold.

Engineering format for large numbers

https://en.wikipedia.org/wiki/Engineering_notation

CC @christophsax
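A rough sketch of what engineering notation means in code, assuming a hypothetical helper `format_engineering()` (not an existing pillar function): the exponent is forced to a multiple of three, and the mantissa is rounded to a fixed number of significant digits. Missing and infinite values are not handled here.

```r
# Hypothetical helper: format numbers with an exponent that is a multiple of 3.
format_engineering <- function(x, digits = 3L) {
  # Zero has no meaningful exponent, so use 0 for it
  exponent <- ifelse(x == 0, 0, 3 * floor(log10(abs(x)) / 3))
  mantissa <- signif(x / 10^exponent, digits)
  paste0(mantissa, "e", exponent)
}

format_engineering(c(12345, 0.00042, 0))
```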

Handle more than one column

so that printing the body part of a tibble becomes entirely the responsibility of this package.

New constructor: multicolformat() or colformats().

@hadley: Do you have a strong opinion against this?

Turn on Travis CI

@hadley: Still seems to be off: https://travis-ci.org/hadley/colformat.

control over < > for print.tbl_df method

The standard way to print units of measure is between square brackets, as in [km/h]. When I do this in a tibble header, I get <[km/h]>, which looks odd (example below) and takes unnecessary space. Is there a way to get rid of the < and >? Should I raise this issue with tibble?

suppressPackageStartupMessages(library(units))
mt = mtcars
mt$mpg = set_units(mt$mpg, km/h)
library(tibble)
(m <- as.tibble(mt))
# # A tibble: 32 x 11
#         mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#  * <[km/h]> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#  1     21.0  6.00   160 110    3.90  2.62  16.5  0     1.00  4.00  4.00
#  2     21.0  6.00   160 110    3.90  2.88  17.0  0     1.00  4.00  4.00
#  3     22.8  4.00   108  93.0  3.85  2.32  18.6  1.00  1.00  4.00  1.00
#  4     21.4  6.00   258 110    3.08  3.22  19.4  1.00  0     3.00  1.00
#  5     18.7  8.00   360 175    3.15  3.44  17.0  0     0     3.00  2.00
#  6     18.1  6.00   225 105    2.76  3.46  20.2  1.00  0     3.00  1.00
#  7     14.3  8.00   360 245    3.21  3.57  15.8  0     0     3.00  4.00
#  8     24.4  4.00   147  62.0  3.69  3.19  20.0  1.00  0     4.00  2.00
#  9     22.8  4.00   141  95.0  3.92  3.15  22.9  1.00  0     4.00  2.00
# 10     19.2  6.00   168 123    3.92  3.44  18.3  1.00  0     4.00  4.00
# # ... with 22 more rows

Consistent naming scheme

#27 (review)

Change default styling of NA

to work better on dark consoles. #14 (review)

API changes

type_sum() -> pillar_capital()
frieze

Column header and footer

Generalise title, and make it easier to use in other packages

Break into multiple chunks if width > getOption("width")

as in tibble < 1.3.

Control significant figures with an option

For instance, I see that

Digits after the first three are dimmed to emphasise the important components.

But you'll have to explain to me how the digits 201 are the important components of the year column in the example 😛 . Being able to change the default behavior somehow to give me 4 digits would be nice in instances like my class where students end up reading in a lot of examples where year is a column. I want them to like tibbles and not get frustrated by this unintuitive behavior.

Format for list cols

Maybe obj_sum plus colour?

Vignette that describes how to extend

e.g. tidyverse/hms#43

Custom column headers

Just as reminder related to your suggestion here:
strengejacke/sjlabelled#3 (comment)

Trailing insignificant digits not printed?

colformat::colformat(c(1000.34, 0.34567))
#>    <dbl>
#> 1000    
#>    0.346

@hadley: Is this intended?

Inline boxplot

From precis

Rename .Rproj

To match the new package name.

Last observation carried forward

It would be handy to provide a helper class that only produced a value when it changed

ie.

x <- c("a", "a", "a", "b", "b")
format(locf(x))
#> a . . b .
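One way such a helper class could look, as a sketch only: `locf()` and its `format()` method are hypothetical here, and the `placeholder` argument is an assumption added for illustration.

```r
# Hypothetical class: wrap a vector so that formatting hides repeated values.
locf <- function(x) {
  structure(x, class = c("locf", class(x)))
}

format.locf <- function(x, ..., placeholder = ".") {
  values <- as.character(unclass(x))
  # TRUE where an element equals its predecessor; the first element never does
  repeated <- c(FALSE, values[-1] == values[-length(values)])
  # which() drops NA comparisons, so missing values are never replaced
  values[which(repeated)] <- placeholder
  values
}

x <- c("a", "a", "a", "b", "b")
format(locf(x))
```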

Only print quotes for strings if whitespace on either end

I don't think they're needed if the whitespace is in the middle

No colours in Linux or Windows

I'm not seeing any colour in the output on Linux or Windows. Is there something else I need to install? What is a console that supports colour? That would be helpful to know for using this package.

Here's my RStudio console in Windows: [screenshot not preserved]

Here's my RStudio console in Linux: [screenshot not preserved]

Consider helpers API

And need to think about sparkline and sparkbar (which are useful helpers for lists of numeric vectors)

Should they have a common prefix?

I think this should be the focus of the second release of pillar, and for now we probably just want to un-export spark_line() and spark_bar()

Negative numbers in scientific notation miss the minus sign

#7 (comment)

How to use with a data frame?

I'm looking forward to using this to present tables via data frames. But I can't seem to see a data frame method in the pkg so far, and this doesn't work:

colformat.data.frame <- function(x, ...){
  x[] <- lapply(x, colformat)
  x
}

xx <- colformat(antibiotic[, 2:4])
str(xx)

So I'm curious to know how we can format columns in a dataframe with these methods. Thanks!

0 displayed as NA

print(tibble::tibble(x = c(0, 1e-30)), width = 20)
#> # A tibble: 2 x 1
#>                   x
#>               <dbl>
#> 1      NANANANA    
#> 2          1.00e⁻³⁰

Change prefix col_data() et al.

col_ is taken by crayon :-\

Likert helper

likert(c(1, 3, 5), max = 5)
#> +----
#> --+--
#> ----+
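A possible implementation of the output above, as a sketch: `likert()` does not exist in pillar, and this version assumes `x` holds whole numbers between 1 and `max`.

```r
# Hypothetical helper: draw each value as a "+" on a scale of `max` dashes.
likert <- function(x, max) {
  vapply(x, function(value) {
    marks <- rep("-", max)
    marks[value] <- "+"
    paste(marks, collapse = "")
  }, character(1))
}

likert(c(1, 3, 5), max = 5)
```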

Format for factors

Same output as character.

Flexible widths

colformat should return an object that can be rendered at multiple widths, and should include some metadata about possible widths, as well as the optimal width.

Idea on datetimes/timezones

Following this idea from tidyverse/tibble#173 - guessing that this is the place now.

The idea in the original issue is to do something like:

#> # A tibble: 1 × 3
#>                 time1               time2               time3
#>                <dttm>           <dttm-02>           <dttm+11>
#> 1 2015-06-01 01:00:00 2015-06-01 01:00:00 2015-06-01 01:00:00

The non-DST offset is shown in the column header.

Given that you are now using color and font-weight as signifiers, could this be applied to datetimes?

The column header could be something like <dttm+11+12> then the +11, +12, and values could be colored/weighted accordingly.

Eliminate underline from frieze

Highlight significant characters for factors?

We know all levels, for iris$Species the characters "s", "ve" and "vi" would be highlighted.
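To make the idea concrete, a sketch that computes the shortest prefix identifying each level. The function name `shortest_unique_prefix()` is hypothetical; a level that is itself a prefix of another level falls back to the full string.

```r
# Hypothetical helper: for each level, find the shortest prefix
# that no other level starts with.
shortest_unique_prefix <- function(levels) {
  vapply(levels, function(level) {
    for (n in seq_len(nchar(level))) {
      prefix <- substr(level, 1L, n)
      if (sum(startsWith(levels, prefix)) == 1L) return(prefix)
    }
    level
  }, character(1), USE.NAMES = FALSE)
}

shortest_unique_prefix(c("setosa", "versicolor", "virginica"))
```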

Use NA for missing values

Instead of ?

Show significant but constant digits in a different color

Example: lat-lon of a local area, years, years and months in dates varying by days, strings with a common prefix, ...

Should simply look at the decimal/textual representation and check:

If sign different, abort
If leftmost digit different anywhere, abort
Else, highlight and proceed with the next digit

How should the significant-but-constant digits be formatted?

Because this heuristic seems to be valid not only for numbers, but also dates, hms, strings, ..., maybe we need a generic helper.

Reference: tidyverse/tibble#305 (comment)
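The check described above could be sketched as a generic helper on the textual representation. `common_prefix_length()` is a hypothetical name; it assumes the strings are already formatted to a common layout (for example via `format()`), so that a differing sign shows up as a differing first character.

```r
# Hypothetical helper: number of leading characters shared by all strings.
common_prefix_length <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)
  shortest <- min(lengths(chars))
  n <- 0L
  for (i in seq_len(shortest)) {
    ith <- vapply(chars, `[`, character(1), i)
    # Stop at the first position where the strings disagree
    if (length(unique(ith)) > 1L) break
    n <- i
  }
  n
}

common_prefix_length(c("2011", "2012", "2015"))
common_prefix_length(format(c(48.2101, 48.2155, 48.2190)))
```

The returned length marks how many leading characters are constant and could be styled differently from the varying remainder.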

Can't use superscript characters with LaTeX output

https://github.com/tidyverse/tibble/blob/e17970597fe63132c42229c22a6c06d653637351/revdep/problems.md#newly-broken-4

@yihui: Is there a way to detect that we're running in knitr session that targets LaTeX? We're using superscripted Unicode characters (like in 1.00e⁻⁵) that seem to confuse the LaTeX inputenc package.

Strange formatting of AsIs list columns

Maybe due to as.character.AsIs() ?

library(magrittr)
list(a = 1:3, b = list(1, 1:2, 1:3)) %>% pillar::colonnade()
#>       a b        
#>   <int> <list>   
#> 1     1 <dbl [1]>
#> 2     2 <int [2]>
#> 3     3 <int [3]>
list(a = 1:3, b = I(list(1, 1:2, 1:3))) %>% pillar::colonnade()
#>       a b         
#>   <int> <S3: AsIs>
#> 1     1 1         
#> 2     2 1:2       
#> 3     3 1:3

@hadley: How do we deal with this?

Related: tidyverse/tibble#304

Possible update of reference to earliest usage of technique

This looks like a great package. I noticed the reference to sparklines being first used in 2009. Edward Tufte actually wrote about this technique in The Visual Display of Quantitative Information back in 1983. A good discussion of the history of his usage and prior art can be found here: https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=000AIr

Perhaps this link can be included in the readme? It certainly is interesting!

Best
Leon

Too narrow list columns break formatting

Example:

colformat::multicolformat(list(a = "12345678901234567890", b = list(1)), width = 20)

Short type name for complex vectors

We've been using cpl in rlang, should type_sum() return that as well?
Also fun versus fn.

Don't print decimals if no value in vector has decimals.

When importing data from other software packages into R (e.g. from Stata, SAS or SPSS, using haven), vectors are of type double, even if they hold only integer values.

Would you mind checking whether a vector actually has "floating point" values or only "integer-doubles", and omitting the decimals in the latter case? (Something like is.numeric(x) && !all(x %% 1 == 0, na.rm = TRUE) would detect fractional values.)

Current output:

library(tibble)
tibble(a = c(1, 2, 3), b = c(1L, 2L, 3L))
#> # A tibble: 3 x 2
#>       a     b
#>   <dbl> <int>
#> 1  1.00     1
#> 2  2.00     2
#> 3  3.00     3

Since all values in a are "integers", the desired output would look like column b. The problem is that this is a guess: the column may really be a double, or it may have been intended as an integer. But I can imagine (new) R users being confused when they see their values as "integers" in the SPSS data sheet and as doubles in the R console.
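The proposed check, spelled out as a small sketch. The name `all_whole_doubles()` is hypothetical, and treating missing values as compatible with "whole" is an assumption of this sketch.

```r
# Hypothetical helper: TRUE if `x` is a double vector whose non-missing
# values are all whole numbers.
all_whole_doubles <- function(x) {
  is.double(x) && all(x %% 1 == 0, na.rm = TRUE)
}

all_whole_doubles(c(1, 2, 3))
all_whole_doubles(c(1, 2.5, NA))
```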

NA styling

I'm not in love with the current treatment. Did we try yellow or red foreground colour?

Can pillar_shaft.POSIXt() respect the digits.secs option?

I think it would be nice if pillar_shaft.POSIXt() printed POSIXct objects in the same way as base R, respecting getOption("digits.secs"). This is mainly useful for printing fractional seconds, which the current implementation never shows.

The following seems to work well enough for me:

pillar_shaft.POSIXt <- function(x, ...) {
  
  # Get the value of the option; it is NULL when unset, so fall back to 0
  fractional       <- getOption("digits.secs", 0)
  # The width needs to be adjusted. If we are printing fractional seconds, adjust
  # by adding the number of fractional digits to print +1 for the decimal, otherwise do nothing.
  fractional_width <- if (fractional > 0) fractional + 1 else 0
  
  date <- format(x, format = "%Y-%m-%d")

  # Use the "%OS" format to print. When we don't use any fractional seconds, I believe
  # "%OS0" is equivalent to "%S"
  time <- format(x, format = paste0("%H:%M:%OS", fractional))
  
  datetime <- paste0(date, " " , style_subtle(time))
  datetime[is.na(x)] <- NA
  
  # Add to the width
  new_pillar_shaft_simple(datetime, width = 19 + fractional_width, align = "left")
}

Using this gives:

from <- as.POSIXct("14:03:55", format="%H:%M:%OS",tz="UTC")
to   <- as.POSIXct("14:04:00", format="%H:%M:%OS", tz="UTC")

ex <- tibble::tibble(datetime = seq(from, to, by = 0.01))

ex
#> # A tibble: 501 x 1
#>    datetime           
#>    <dttm>             
#>  1 2018-01-04 14:03:55
#>  2 2018-01-04 14:03:55
#>  3 2018-01-04 14:03:55
#>  4 2018-01-04 14:03:55
#>  5 2018-01-04 14:03:55
#>  6 2018-01-04 14:03:55
#>  7 2018-01-04 14:03:55
#>  8 2018-01-04 14:03:55
#>  9 2018-01-04 14:03:55
#> 10 2018-01-04 14:03:55
#> # ... with 491 more rows

options(digits.secs = 4)

ex
#> # A tibble: 501 x 1
#>    datetime                
#>    <dttm>                  
#>  1 2018-01-04 14:03:55.0000
#>  2 2018-01-04 14:03:55.0099
#>  3 2018-01-04 14:03:55.0199
#>  4 2018-01-04 14:03:55.0299
#>  5 2018-01-04 14:03:55.0399
#>  6 2018-01-04 14:03:55.0499
#>  7 2018-01-04 14:03:55.0599
#>  8 2018-01-04 14:03:55.0699
#>  9 2018-01-04 14:03:55.0799
#> 10 2018-01-04 14:03:55.0899
#> # ... with 491 more rows

The results from using options(digits.secs = 4) are a bit strange, but I think this has been confirmed as the intended output by R core.

Use colour for types

I think blue or green maybe? And let's try italics too.

Better display for list columns if all elements have the same type

e.g., nested data frames

From tidyverse/tibble#117.
