co-analysis / a11ytables

R package: generate best-practice stats spreadsheets for publication

Home Page: https://co-analysis.github.io/a11ytables/

License: Other

rstats rstats-package accessibility spreadsheet openxlsx reproducible-analytical-pipeline hacktoberfest uk-gov-data-science

a11ytables's Introduction

{a11ytables}

Purpose

An R package to help automatically create reproducible spreadsheets that adhere to the guidance on releasing statistics in spreadsheets from the UK government’s Analysis Function, with a focus on accessibility (‘a11y’).

Visit the {a11ytables} website for documentation.

Accessibility

This package is not yet capable of creating perfectly accessible spreadsheets but will help with the bulk of the work needed. Users of this package should refer back to the main spreadsheet guidance or the spreadsheet accessibility checklist after using it to make sure nothing has been missed. Please email [email protected] if you use the package so they can monitor use and the outputs produced.

Contribute

The package is under (opinionated) active development. Please see the NEWS file for the latest changes.

To contribute, please add an issue or a pull request after reading the code of conduct and contributing guidance.

Install

Install the package from GitHub using {remotes}.

install.packages("remotes")  # if not already installed

remotes::install_github(
  repo = "co-analysis/a11ytables",  # GitHub user/repository
  dependencies = TRUE,              # install required/suggested packages
  build_vignettes = TRUE            # generate vignette documentation
)

library(a11ytables)  # attach package

If you need an earlier version of the package, you should replace the repo argument call with, for example, "co-analysis/[email protected]" for the version 0.1.0 release.

The package depends on {openxlsx} and {pillar}, which are also installed with {a11ytables}.

Use

To create a spreadsheet:

Use create_a11ytable()
Pass the output to generate_workbook()
Pass the output to openxlsx::saveWorkbook()

Run ?function_name or visit the package website for function documentation. For long-form documentation, visit the package website or run browseVignettes("a11ytables") to read the:

introductory vignette to get started
accessibility checklist vignette to see how the package complies with best-practice guidance
terminology vignette to understand the nomenclature of spreadsheet terms as used in this package
package structure vignette to see how the package works under the hood

This package also includes an RStudio Addin that inserts pre-filled demo skeletons of the {a11ytables} workflow.

Related projects

The ONS’s Analysis Standards and Pipelines team has released a Python package called ‘gptables’. {a11ytables} is an independent effort that offers a native R solution that is very similar to gptables in its outputs, though there are some differences in implementation. You can always use gptables in R via the {reticulate} package if you prefer.

{a11ytables} can help you fulfil a Reproducible Analytical Pipeline by automating the generation of compliant spreadsheets for publication.

Code of Conduct

Please note that the {a11ytables} project is released with a Contributor Code of Conduct.

Copyright and Licensing

This work is Crown Copyright. The source code for the software is released under the MIT licence as per the UK Government Licensing Framework and the GDS Way licensing guidance. The documentation for the software is released under the Open Government Licence.

a11ytables's Issues

Add warnings to guide the user

Consider warnings that relate to the functioning of the package, user error, or the good practice principles. Erroring is too restrictive, especially because the user may wish to do things like tinker with their Workbook-class object after its production.

You have a notes sheet, but no notes in the tables (or vice versa)
Max note value doesn't match between notes sheet and notes in tables (inc the exact note numbers)
You didn't provide a source for the tables-type sheets
~~Mismatch between cover table content and actual table names (see #38)~~ Mismatch between rows in contents table and the expected number of rows (i.e. there should be a row for each 'tables' and 'notes' sheet_type)
You have blank cells in stats tables (and by extension, you have empty columns and/or rows)
There should only be one each of sheet_type of cover, contents and notes (this should be caught in validation rather than flagged as a warning)
~~You don't have a cover or contents~~ (already caught in validation)
Table names should be unique, lowercase and underscore-separated (this will become irrelevant with #61)
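
As a rough illustration of the first two warnings in the list above, a check along the following lines could compare the note numbers listed in the notes sheet with the note markers found in table text. This is only a sketch: check_notes() is a hypothetical helper, not a function in the package, and the marker pattern (either '[1]' or '[note 1]') is an assumption.

```r
# Hypothetical helper (not part of {a11ytables}): warn, rather than error,
# when the notes sheet and the note markers in the tables disagree.
# Assumes markers look like "[1]" or "[note 1]".
check_notes <- function(notes_sheet_numbers, table_text) {
  pattern <- "\\[(note )?[0-9]+\\]"
  # Pull every marker out of the supplied text (headers, cells, etc.)
  markers <- unlist(regmatches(table_text, gregexpr(pattern, table_text)))
  used <- sort(unique(as.integer(gsub("[^0-9]", "", markers))))
  listed <- sort(unique(as.integer(notes_sheet_numbers)))

  missing_from_sheet <- setdiff(used, listed)
  unused_in_tables <- setdiff(listed, used)

  if (length(missing_from_sheet) > 0) {
    warning(
      "Notes used in tables but absent from the notes sheet: ",
      paste(missing_from_sheet, collapse = ", "),
      call. = FALSE
    )
  }
  if (length(unused_in_tables) > 0) {
    warning(
      "Notes listed in the notes sheet but not used in any table: ",
      paste(unused_in_tables, collapse = ", "),
      call. = FALSE
    )
  }

  # Return the mismatches invisibly so callers can inspect them
  invisible(list(
    missing_from_sheet = missing_from_sheet,
    unused_in_tables = unused_in_tables
  ))
}

# Two warnings are raised here; they are suppressed only to keep the demo quiet
res <- suppressWarnings(
  check_notes(c(1, 2, 3), c("Miles per gallon [note 1]", "Weight [note 4]"))
)
```

Because the function only warns, the user can still go on to build and adjust their Workbook-class object.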

Shift focus for end-user

Shift the focus of user functions to the construction of the data.frame object that {a11ytables} can turn into a publication. Change the add_*() functions so that they add those elements to an a11y_tbls-format object, which something like create_a11y_workbook() would then turn into a publication-ready object.

YAML for table specification

Consider using YAML files to guide table construction.

title: The 'mtcars' Demo Dataset
description: >
  Aspects of automobile design and performance.
properties: >
  Suppressed values are replaced with the value ['c'].
  
  Blank cells in the 'Notes' column indicate the absence of a note.
contact: >
    The mtcars Team, telephone: 012 3456 789

tables:
  - name: Table 1
    title: Car Road Tests 1
    source: Motor Trend (1974)
    file: table1.csv
  - name: Table 2
    title: Car Road Tests 2
    source: Motor Trend (1974)
    file: table2.csv

notes:
  - number: 1
    description: US gallons
  - number: 2
    description: >
      Retained to enable comparisons with previous analyses.

Processing steps:

The cover page can be generated from the head elements, title, description, properties, contact.
The contents can be generated from the information in the tables list (using the name and title components) and the existence (or otherwise) of the notes list.
The notes list can be coerced into a table.
The file property of each entry in the tables list gives the file that the table comes from. If preferable, a function could read these files and check that they conform, potentially allowing only specific formats (e.g. only CSV/RDS).

yaml::read_yaml() will read a YAML file and return an R list.
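
The processing steps above might look something like the sketch below. To avoid a dependency, the list is written out by hand in the shape that yaml::read_yaml() would be expected to return for the example file (only the parts needed here). The helper names notes_to_df() and contents_from_spec() are hypothetical, as are the output column names.

```r
# A hand-written list shaped like the parsed YAML specification above
spec <- list(
  title = "The 'mtcars' Demo Dataset",
  tables = list(
    list(name = "Table 1", title = "Car Road Tests 1",
         source = "Motor Trend (1974)", file = "table1.csv"),
    list(name = "Table 2", title = "Car Road Tests 2",
         source = "Motor Trend (1974)", file = "table2.csv")
  ),
  notes = list(
    list(number = 1L, description = "US gallons"),
    # Folded YAML scalars ('>') keep a trailing newline
    list(number = 2L,
         description = "Retained to enable comparisons with previous analyses.\n")
  )
)

# Hypothetical helper: coerce the notes list into a table
notes_to_df <- function(notes) {
  data.frame(
    number = vapply(notes, function(n) as.integer(n$number), integer(1)),
    description = trimws(vapply(notes, function(n) n$description, character(1))),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: build the contents table from the tables list,
# adding a row for the notes sheet only if notes exist
contents_from_spec <- function(spec) {
  sheet_name <- vapply(spec$tables, function(t) t$name, character(1))
  sheet_title <- vapply(spec$tables, function(t) t$title, character(1))
  if (length(spec$notes) > 0) {
    sheet_name <- c("Notes", sheet_name)
    sheet_title <- c("Notes", sheet_title)
  }
  data.frame(
    sheet_name = sheet_name,
    sheet_title = sheet_title,
    stringsAsFactors = FALSE
  )
}

notes_df <- notes_to_df(spec$notes)
contents_df <- contents_from_spec(spec)
```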

Add functionality for multi-table sheets

Consider a sheet with two tables. They should each have their own subtable title and be separated by one empty column.

Note that the guidance suggests you don't have more than one table per sheet, so this is low priority. It's not a dealbreaker: users will likely be able to create separate sheets, e.g. 'Table 1a' and 'Table 1b', in these instances.

Running list of features

It would be good to put this into the README or a vignette perhaps, but for now this is a place to note some useful features of the package:

warnings if you fail to provide information (e.g. you have notes that aren't in the notes sheet, you have blank cells but haven't provided a reason for them)
informative errors
dynamic insertion of provided sheet titles, presence of notes, blank cell explanations and data sources above tables
automatic table markup (i.e. recognised by spreadsheet software as a named table)
automatic formatting (e.g. large bold font for titles, taller row height on cover's section headers)
automatic widening if a column contains a long header
automatic guessing of the type of column content (text, numeric), which allows for e.g. right-alignment in cells even if the column is cast as text because of e.g. '[c]' if a value is suppressed
built-in RStudio addin to insert template scripts that produce a11ytables
simple interface with only two major functions (and the user is free to make adjustments to the a11ytables- or Workbook-class objects if they want to)
nothing more complicated for input by the user than simple, familiar data.frames and vectors
a documentation website

Ability to specify decimal places

Would be useful to be able to customise the number of visible decimal places but retain the full precision within the underlying figure in the cell.

At the moment, you either need to round in R which means losing precision, or export the full number with all decimal places which clutters the final table.

This is confirmed as being compatible with screen readers in the guidance: "Consider leaving the underlying figures unrounded. A screen reader user will be able to access both the rounded figure and the unrounded one."
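
One possible route is a cell number format, which changes what is displayed without changing the stored value. The sketch below only builds the format code; make_num_fmt() is a hypothetical helper, and whether a given format is honoured will depend on how styles are applied to the workbook.

```r
# Hypothetical helper: build a spreadsheet number-format code that shows a
# fixed number of decimal places (and optionally a thousands separator)
# while the cell keeps its full-precision value
make_num_fmt <- function(decimal_places = 0L, thousands_sep = TRUE) {
  stopifnot(
    is.numeric(decimal_places),
    length(decimal_places) == 1L,
    decimal_places >= 0,
    decimal_places == round(decimal_places)
  )
  integer_part <- if (thousands_sep) "#,##0" else "0"
  if (decimal_places == 0) {
    return(integer_part)
  }
  paste0(integer_part, ".", strrep("0", decimal_places))
}

make_num_fmt(2)                         # "#,##0.00"
make_num_fmt(1, thousands_sep = FALSE)  # "0.0"

# The result could then be tried as the numFmt argument of
# openxlsx::createStyle(), e.g. createStyle(numFmt = make_num_fmt(2))
```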

Add styling utils

Create styles that can be applied as needed to each sheet element. Consider:

all-workbook styles, e.g. font, font size
element styles, e.g. title, table header row, subtable titles

Create table_name automatically?

Is it really necessary for the user to add a table_name if it can be generated from tab_title? By creating them automatically, it'll mean users have to provide one less thing to new_a11ytable() and won't have to know/remember the rules for creating them.

Suggest tolower() and gsub(), etc.

This will be a breaking change.
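
A minimal sketch of the tolower() and gsub() idea, assuming the rules are 'unique, lowercase and underscore-separated'. The function name make_table_name() is hypothetical.

```r
# Hypothetical helper: derive table names from tab titles
make_table_name <- function(tab_title) {
  x <- tolower(trimws(tab_title))
  x <- gsub("[^a-z0-9]+", "_", x)  # runs of other characters become one underscore
  x <- gsub("^_+|_+$", "", x)      # drop leading and trailing underscores
  make.unique(x, sep = "_")        # de-duplicate, e.g. "notes", "notes_1"
}

make_table_name(c("Table 1", "Table 1a", "Notes"))
# "table_1" "table_1a" "notes"
```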

Correct links to the Analysis Function pages

For example, the Releasing Statistics in Spreadsheets guidance is now here: https://analysisfunction.civilservice.gov.uk/policy-store/releasing-statistics-in-spreadsheets/

Change back-end to use {openxlsx2}

{openxlsx2} will replace {openxlsx} at some point, presumably? Investigate its potential, but don't integrate until CRAN release and some stability.

Reconsider package name

'a11ytables' is awkward because of:

pronunciation ('accessibilitytables', 'allytables', 'ay-eleven-why-tables')
misinterpretation (those 1s could be Ls)
typing (numbers and letters, ugh)

Worst of all, I think it provides 'false emphasis'. The package is more about automation for creating consistent, compliant spreadsheets, where accessibility is baked in.

And obviously 'gptables' (good practice tables) is already taken ;)

A note on blank cells should be provided on a sheet by sheet basis

We should probably insert a line above each statistical table that explains the meaning of blank cells for that table.

At the moment it's expected that the user will explain on the cover page what blank cells mean, but it's acceptable for blank cells to have a different meaning on a sheet by sheet basis. Users may also forget to do this, though new_a11ytable() warns when there's an empty cell in a statistical table.

It'll need to be an argument to new_a11ytable(), like blank_cells. Should the user provide a full sentence, or just the completion of the phrase 'Blank cells in this table indicate...'? Probably the former, at first.

Add defensive stops

Write utils functions that:

perform a generic if () stop() when reading the arguments wb, content and table_name (where appropriate)
make sure the 'content' input is as expected

Autogenerate contents table if user doesn't supply one

new_a11ytable() captures all the information required to make the contents page without the need for the user to create it themselves. It's just tab_title and sheet_title from the a11ytables-class object in a table format.

Would users want to add anything more to the contents table? Might be beneficial to be restrictive and stop this (better consistency, fewer inputs for end user, less chance for contents/tabs mismatch errors as per #34). They can always add things after the a11ytable has been created.

So for this demo dataset:

> mtcars_df
# # A tibble: 4 × 6
#   tab_title sheet_type sheet_title                                          source             table_name        table        
#   <chr>     <chr>      <chr>                                                <chr>              <chr>             <list>       
# 1 Cover     cover      The mtcars demo datset: 'Motor Trend Car Road Tests' NA                 cover_sheet       <df [2 × 2]> 
# 2 Contents  contents   Table of contents                                    NA                 table_of_contents <df [2 × 2]> 
# 3 Notes     notes      Notes                                                NA                 notes_table       <df [6 × 2]> 
# 4 Table 1   tables     Motor Trend Car Road Tests                           Motor Trend (1974) car_scores        <df [32 × 6]>

You can grab the tab and sheet titles and put them in a data.frame:

x <- mtcars_df[mtcars_df$sheet_type %in% c("notes", "tables"),  ]

contents_df <- data.frame(
  `Sheet name` = x$tab_title,
  `Sheet title` = x$sheet_title,
  check.names = FALSE  # keep the spaces in the column names
)

Or whatever.

Add a vignette for developers

Explain the design of the backend, like the insert and update functions.

Should make it easier for people to understand how it works so they can contribute, or suggest better approaches.

Simplify API

There probably doesn't need to be a separate add_*() function for the three meta sheets plus one for tables.

Could have a more generic add_sheet() to which you pass an argument like table_name. Maybe it defaults to inserting a sheet of tables, but you can set the argument as "cover", "contents" or "notes".

Remove tab title from sheet title

Tab title 'Table 1' and sheet title 'benchmark scores' get pasted into 'Table 1: Benchmark scores' and inserted into cell A1.

Remove this functionality. If your tab is called 'Benchmarks' you'll end up with 'Benchmarks: Benchmark scores', which isn't helpful.

Let user know about preferred input styles

So:

provide table tab titles like 'Table 1', 'Table 2', etc.
prefer notes like '[1]', not '[note 1]', etc
notes should not be in the data themselves, only in the column header or in a dedicated notes column called 'Notes'

Choose better function and argument names

Need to be more descriptive and helpful to the user. These will be breaking. Suggest changing them for v0.1.0.

Functions

new_a11ytable() to create_a11ytable() (verb)
create_a11y_wb() to convert_to_wb() (the real purpose is to convert from a11ytable-class to Workbook-class), or simply create_workbook() to mirror create_a11ytable()
Greater consistency in the .insert_*() functions: .insert_notes_statement() and .insert_blanks_message() should have a consistent suffix (this could have a knock-on effect for other functions that handle the message for blank cells, like .has_blanks_message() and .get_start_row_blanks_message())?

Arguments

Change content to a11ytable in create_a11y_wb() (you're passing an a11ytable so call it that, also 'content' is too close to 'contents' as in the sheet type)
Change x in S3 functions (I think I used x by convention, but not helpful for user)?

Consider how to check sheet types

I think you are obliged to name the meta sheets as cover, content and notes (exactly in that format). I think that's too restrictive. Consider using the sheet_type column of an a11ytable-class object to be 'cover', 'contents', 'notes' and 'tables', rather than 'meta' and 'tables'.

Consider tibble printing, conditional on whether {tibble} is attached

As it stands, a11ytable-class objects are data.frames. It would be nice to view them as tibbles if the user has {tibble} installed/loaded. {a11ytables} seeks to be low-dependency—besides {openxlsx}—so doesn't import the tidyverse.

See this PR to {palmerpenguins} for how this could be done.

Might reduce the need for writing a specific print method (#23)?
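
A minimal sketch of the idea, without importing {tibble}. The helper as_printable() is hypothetical, and this version checks whether {tibble} is installed; checking whether it is attached could instead use "package:tibble" %in% search().

```r
# Hypothetical helper: return a tibble for display when {tibble} is
# installed, otherwise fall back to the plain data.frame
as_printable <- function(x) {
  stopifnot(is.data.frame(x))
  if (requireNamespace("tibble", quietly = TRUE)) {
    tibble::as_tibble(x)
  } else {
    x
  }
}

# Small stand-in for an a11ytable-like data.frame with a list-column
demo <- data.frame(
  tab_title = c("Cover", "Table 1"),
  sheet_type = c("cover", "tables"),
  stringsAsFactors = FALSE
)
demo$table <- list(data.frame(a = 1), mtcars[1:3, 1:2])

print(as_printable(demo))
```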

Add basic tests

Test the exported functions at least, but may require some testing of unexported utils.

Make the cover table requirement more user friendly

We should alter the way that users provide the cover table to make it more user-friendly and intuitive. This will be a breaking change.

Currently, the information on the cover page should be arranged so there's a title, then subtitle-body pairs down the sheet. This is because the styling functions are naive and apply alternating title style then body style.

Here's what the table currently looks like for the built-in lfs_tables object:

> filter(lfs_tables, sheet_type == "cover") |> pull(table)
[[1]]
# A tibble: 17 × 1
   A                                                                                                                                 
   <chr>                                                                                                                             
 1 "Labour market overview, UK: December 2020"
 2 "Publication dates"
 3 "The next publication was published at 7:00am 26 January 2021."
 4 "Note on weighting methodology"
 5 "More information about the impact of COVID19 on the Labour Force Survey"
 6 "Dataset identifier codes"
 7 "The four-character identification codes appearing in the tables are the ONS' references for the data series." 
...

Rows 2 and 3 are a subtitle/body pair, so are rows 4 and 5, etc.

In other words, the user needs to supply a single-column dataframe (where the column name doesn't matter) for the cover sheet when constructing their a11ytable object. I think this is pretty unintuitive.

Better:

supply the information in two columns
column 1 is subtitle and column 2 is sub_body, or something
each subtitle-body pair is therefore a row of information

For example:

> tibble::tribble(
+     ~subtitle, ~sub_body,
+     "Publication dates", "The next publication was published at 7:00am 26 January 2021.",
+     "Note on weighting methodology", "More information about the impact of COVID19 on the Labour Force Survey",
+     "Dataset identifier codes", "The four-character identification codes appearing in the tables are the ONS' references for the data series."
+ )
# A tibble: 3 × 2
  subtitle                      sub_body                                                                                             
  <chr>                         <chr>                                                                                                
1 Publication dates             The next publication was published at 7:00am 26 January 2021.                                        
2 Note on weighting methodology More information about the impact of COVID19 on the Labour Force Survey                              
3 Dataset identifier codes      The four-character identification codes appearing in the tables are the ONS' references for the data…

The logic of create_a11y_wb() can then handle the translation of this format into the single column required during workbook construction.
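
That translation could be as simple as interleaving the two columns. The sketch below assumes the column names subtitle and sub_body from the example and uses a hypothetical helper, flatten_cover(). It does not deal with the sheet title that currently sits in row 1.

```r
# Two-column cover input, one subtitle-body pair per row
cover_df <- data.frame(
  subtitle = c("Publication dates", "Dataset identifier codes"),
  sub_body = c(
    "The next publication was published at 7:00am 26 January 2021.",
    "The four-character identification codes appearing in the tables are the ONS' references for the data series."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert to the single-column layout
flatten_cover <- function(cover_df) {
  stopifnot(
    is.data.frame(cover_df),
    all(c("subtitle", "sub_body") %in% names(cover_df))
  )
  # rbind() stacks the two vectors as rows; c() then reads the matrix
  # column by column, giving subtitle 1, body 1, subtitle 2, body 2, ...
  data.frame(
    A = c(rbind(cover_df$subtitle, cover_df$sub_body)),
    stringsAsFactors = FALSE
  )
}

flat <- flatten_cover(cover_df)
```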

Output to ODS

ODS files are the ideal output for publication, as indicated in the GOV.UK and GSS guidance.

Workbook-class objects created with {openxlsx} can't be written directly to ODS (?); you have to save with saveWorkbook() in the first instance.

Perhaps provide guidance rather than functionality? I think all the styling and markup are lost if you try to read in an Excel file, then save out to ODS.

Update references to gptables throughout

gptables has been updated to fall in line with the latest releasing statistics in spreadsheets guidance.

Data in Government blog: https://dataingovernment.blog.gov.uk/2022/06/24/automatically-produce-best-practice-spreadsheets/
Docs: https://gptables.readthedocs.io/en/latest/index.html
Source on GitHub: https://github.com/best-practice-and-impact/gptables

This should be referenced in the README and anywhere else that it's mentioned in e.g. vignettes.

Advise people that it can be used from within R via the {reticulate} package.

Perhaps want to point out some of the major differences?

Make wrapping default and column-width-sizing smarter

Cells with a large number of characters either wrap awkwardly or aren't wrapped. We need a sensible solution to decide things like cell widths.

Wrapping should probably be default in cells anyway. Probably not in pre-table elements (title, table count, note presence, data source), otherwise the title could get wrapped (usually not ideal) or the first column could become very wide, which would look odd if the contents have few characters.

In an example notes table that contains two columns (e.g. a lookup of question codes in column A, to full question labels in column B), the labels might get into the tens to hundreds of characters long. Default wrapping would help for the shorter ones, but it would help to detect maximum character length and adjust the column width as necessary. Don't make the column wide enough to fit the whole text on one line, but choose a sensible maximum width where the wrapping can then take over if there's overspill.

Ultimately, the success of these approaches is dependent on case-by-case requirements and it will be left to the user to tweak the output as they see fit.

Current way of setting column widths in 'tables' sheet types:

  openxlsx::setColWidths(
    wb = wb,
    sheet = tab_title,
    cols = seq(table_width),
    widths = 16  # <--- make this dynamic
  )
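
One way to make the width dynamic is to base it on the longest text in each column, bounded by a minimum and a maximum so that wrapping can take over for very long text. This is a sketch only: guess_col_widths() is a hypothetical helper and the bounds of 16 and 50 are arbitrary assumptions.

```r
# Hypothetical helper: choose a width per column from the longest cell or
# header, clamped between min_width and max_width
guess_col_widths <- function(tbl, min_width = 16, max_width = 50) {
  stopifnot(is.data.frame(tbl))
  longest <- vapply(
    seq_along(tbl),
    function(i) {
      cell_text <- c(names(tbl)[i], as.character(tbl[[i]]))
      max(nchar(cell_text), na.rm = TRUE)
    },
    numeric(1)
  )
  pmin(pmax(longest, min_width), max_width)
}

# A lookup-style notes table: short codes, long labels
lookup <- data.frame(
  code = c("Q1", "Q2"),
  label = c(strrep("x", 30), strrep("y", 200)),
  stringsAsFactors = FALSE
)

widths <- guess_col_widths(lookup)  # one width per column
```

The resulting vector could then be supplied in place of the fixed 16 in the widths argument shown above.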

Is a print method necessary for a11ytables?

Should there be a print method for a11ytable-class objects and what should it be?

To be honest, the tibble view of the object is compact, concise and reinforces that you can interact with the class as though it's a data.frame. However, tibble is not imported, while the output from a data.frame can be a bit excessive (particularly with list-columns). It might be nice for people to see what's in each table itself, rather than just a numeric dim()-like description.

For now there's a print.a11ytables(), which is just a placeholder really. Maybe it could present on a sheet-by-sheet basis?

> lfs_tables |> as_a11ytable() |> print()
# An a11ytable with 5 sheets
 * Tab titles: 
  - cover
  - contents
  - notes
  - 1
  - 2 
 * Sheet types: 
  - cover
  - contents
  - notes
  - tables
  - tables 
 * Sheet titles: 
  - Labour market overview data tables, UK, December 2020 (accessibility example)
  - Table of contents
  - Notes
  - Number and percentage of population aged 16 and over in each labour market activity group, UK, seasonally adjusted
  - Number and percentage of population in each labour market activity group by age band, UK, seasonally adjusted 
 * Table names: 
  - Cover_content
  - Table_of_contents
  - Notes_table
  - Labour_market_summary_for_16_and_over
  - Labour_market_activity_groups_16_and_over 
 * Table sizes: 
  - 17 x 1
  - 2 x 5
  - 11 x 2
  - 6 x 10
  - 6 x 9

Example of tibble view of lfs_tables:

> tibble::as_tibble(lfs_tables)
# A tibble: 5 × 8
  tab_title sheet_type sheet_title                                   source      subtable_num subtable_title table_name          table   
  <chr>     <chr>      <chr>                                         <chr>       <chr>        <chr>          <chr>               <list>  
1 cover     cover      Labour market overview data tables, UK, Dece… NA          NA           NA             Cover_content       <tibble…
2 contents  contents   Table of contents                             NA          NA           NA             Table_of_contents   <tibble…
3 notes     notes      Notes                                         NA          NA           NA             Notes_table         <tibble…
4 1         tables     Number and percentage of population aged 16 … Labour For… NA           NA             Labour_market_summ… <tibble…
5 2         tables     Number and percentage of population in each … Labour For… NA           NA             Labour_market_acti… <tibble…

Add GitHub Actions

Check
Tests
{pkgdown}

Does a contents sheet actually need to be supplied by the user?

Currently the user must add a sheet with sheet_type of "contents". Why? Surely this can be entirely auto-generated; there's nothing extra a user should put in this sheet that isn't already provided (i.e. tab_title and sheet_title).

Apply proper header styles

The guidance says that sheet titles should be header H1. I can't see how to do this in {openxlsx}.

Also not sure if Excel can actually distinguish between simply adding H1 markup vs H1 style.

I think you apply H1 by applying it as a style, as per the Home > Cell styles menu:

It's possible that marking-up the cell will invoke the 'H1 cell style' that's default in Excel, which we don't want.

(Thanks for the reminder of this, @phil-hall-moj)

Allow user to set document properties programmatically

Is there a programmatic way to adjust document 'properties', i.e. the meta information associated with a spreadsheet file that lists the title, subject, author, category, etc?

This can be found and adjusted manually via File > Properties in spreadsheet programs.

Create an RStudio Addin

Could be used to quickly generate a skeleton for new_a11ytable(), which can be otherwise quite tedious to put together.

It could put in a skeleton for the cover, contents and notes tables as well.

Transfer repo to co-analysis

It should have been there in the first place 😬

Guidance: https://docs.github.com/en/github/administering-a-repository/managing-repository-settings/transferring-a-repository

Add an accessibility statement

Although not on GOV.UK, it might be beneficial to have an accessibility statement for the documentation site.

Ideally as an arbitrary file, not as a vignette, since it's not actually useful to users of the package itself.

Holding page for now, as per govuk-hugo-demo?

Implement thousands comma-separators by default

There has been a request to:

...implement comma separated thousands format in the workbook, using openxlsx::createStyle(numFmt = "COMMA") as per the guidance.

(Also, this is a reminder to look over the guidance in detail to extract all these kinds of cell-level features.)

Formalise the user-input 'content' object

The 'content' data.frame object that's provided by the user has some specific, strict requirements. For example:

it must have a certain number of columns and at least three rows (meta plus one tables sheet)
as a bare minimum it must contain rows with data describing the cover, contents and one table
the tables column must be a listcol containing data.frames
all other inputs must be length 1 character vectors

(See .stop_bad_content() for defensive actions at the time of writing.)

Consider what the best approach is for this object. A bespoke class? Would be ideal to provide the user with a tool for creating such an object and proper documentation, including a vignette.
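
To make the requirements above concrete, a validation function might look like the sketch below. It is not the package's .stop_bad_content(): check_content() is hypothetical, and the column names (table, sheet_type) and sheet types (cover, contents, tables) are assumptions based on examples elsewhere in these issues.

```r
# Hypothetical validator for the user-supplied 'content' data.frame
check_content <- function(content) {
  if (!is.data.frame(content)) {
    stop("'content' must be a data.frame.", call. = FALSE)
  }
  if (nrow(content) < 3) {
    stop("'content' needs at least three rows (cover, contents and one table).",
         call. = FALSE)
  }
  # The 'table' column must be a list-column holding data.frames
  if (!"table" %in% names(content) || !is.list(content$table) ||
      !all(vapply(content$table, is.data.frame, logical(1)))) {
    stop("'content$table' must be a list-column of data.frames.", call. = FALSE)
  }
  # Every other column must be character
  other_cols <- setdiff(names(content), "table")
  is_chr <- vapply(content[other_cols], is.character, logical(1))
  if (!all(is_chr)) {
    stop("These columns must be character: ",
         paste(other_cols[!is_chr], collapse = ", "), call. = FALSE)
  }
  # Bare minimum of sheet types
  missing_types <- setdiff(c("cover", "contents", "tables"), content$sheet_type)
  if (length(missing_types) > 0) {
    stop("Missing sheet types: ", paste(missing_types, collapse = ", "),
         call. = FALSE)
  }
  invisible(content)
}

# A minimal input that passes the checks
good <- data.frame(
  tab_title = c("Cover", "Contents", "Table 1"),
  sheet_type = c("cover", "contents", "tables"),
  sheet_title = c("Demo", "Table of contents", "Car road tests"),
  stringsAsFactors = FALSE
)
good$table <- list(data.frame(x = 1), data.frame(x = 1), data.frame(x = 1))

check_content(good)
```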

Handle 'unreadable content' when opening spreadsheet

On opening a spreadsheet made with v0.0.9001:

Alert
We found a problem with some content in 'test.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.

[Yes]

Excel was able to open the file by repairing or removing the unreadable content.

[View]

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><logFileName>Repair Result to test1.xml</logFileName><summary>Errors were detected in file '/Users/matt.dray/Desktop/test.xlsx'</summary><repairedRecords summary="Following is a list of repairs:"><repairedRecord>Repaired Records: Table from /xl/tables/table3.xml part (Table)</repairedRecord><repairedRecord>Repaired Records: Table from /xl/tables/table4.xml part (Table)</repairedRecord><repairedRecord>Repaired Records: Table from /xl/tables/table5.xml part (Table)</repairedRecord><repairedRecord>Repaired Records: Table from /xl/tables/table6.xml part (Table)</repairedRecord><repairedRecord>Repaired Records: Table from /xl/tables/table7.xml part (Table)</repairedRecord></repairedRecords></recoveryLog>

Allow arbitrary pre-table meta information

Should users be able to provide arbitrary content above the table, i.e. via an argument(s) to new_a11ytable()?

At the moment, users of the package can only provide information about very specific things: the title, table count, warning of notes, explanation of blanks (upcoming v0.1) and data source (arguably there should be more defined possibilities, like an explanation of shorthand).

I think it's positive to be restrictive; the whole point is that we want people to adhere to the guidance and not write anything they like above each table. However, this isn't great for situations where there's a genuine user need for specific additional information.

Improve `lfs_tables` as a demo dataset

Fix grammar
Make sure at least one tab contains notes (so that the 'this table contains notes' notice gets displayed)
Remove final NA-filled row in table 2
Notes tab should refer to table '1', not '1a'

Or just use something else/make something up? It might be useful though for people to compare the output from this process to the ODS that's on the GSS guidance page.

Ensure conditional alignment in columns

The alignment of columns should depend on their content. Currently, for dev purposes, the first column is left-aligned and the rest right-aligned.

Regardless of index, text should be left aligned, numeric should be right aligned. Slight complication: values that are suppressed or not applicable, etc, are text in the form [c] and [z], forcing the column to be text.

Could get user to specify the column types so that the formatting is correct, but don't want to burden the user with more input to new_a11ytable() and/or as_a11ytable().

Guess what contents should be based on majority of column? Doesn't work if most of the column has had to be removed.

Also not desirable to let the user 'fix it in post' by using {openxlsx} functions on the Workbook-class object that pops out of create_a11y_wb().
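
An alternative to a majority vote is to ignore shorthand values entirely and test what remains. The sketch below assumes shorthand is a lowercase code in square brackets such as [c] or [z]; guess_alignment() is a hypothetical helper. A column made up only of shorthand falls back to left alignment because there is nothing to judge it by.

```r
# Hypothetical helper: decide alignment from column content, ignoring
# shorthand such as "[c]" that forces an otherwise numeric column to text
guess_alignment <- function(x, shorthand = "^\\[[a-z]+\\]$") {
  values <- trimws(as.character(x))
  values <- values[!is.na(values) & values != ""]
  values <- values[!grepl(shorthand, values)]
  if (length(values) == 0) {
    return("left")  # nothing left to judge by
  }
  values <- gsub(",", "", values, fixed = TRUE)  # allow thousands separators
  is_num <- !is.na(suppressWarnings(as.numeric(values)))
  if (all(is_num)) "right" else "left"
}

guess_alignment(c("1.5", "[c]", "2,300"))  # "right"
guess_alignment(c("UK", "[z]"))            # "left"
```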

Create vignettes

I think:

The a11ytable class (contents, construction, restrictions)
Accessibility in a spreadsheet sense, referring to the GSS guidance, pointing out how things are addressed (or not) by a11ytables
A full, end-to-end, example
Design philosophy

Maintain trailing zeroes in decimal positions

There's been a request to:

...have percentage estimates be displayed to a chosen number of decimal places consistently throughout a sheet/tab.

So 1.200, 1.230 and 1.234 are presented as-is, without Excel truncating those trailing zeroes to produce the visually-inconsistent 1.2, 1.23 and 1.234.

I would have thought about a sprintf() solution, but looks like applying styles with {openxlsx} negates this:

I have attempted implementing this using base::sprintf() on the numbers but then whenever I applied a style from openxlsx e.g. to right align etc. it just defaulted to the situation I described above. openxlsx::createStyle(numFmt = "0.00") also didn't work for me.

Consider structure and guidance for 'contents' objects

The user-input 'contents' object in add_*() requires specific columns and data. Consider if:

there's an easier way to supply the information, e.g. a list, or with information being provided in more than one object/argument
such an object should be its own class of object
good documentation can help people, e.g. in @description, in a vignette, or with reference to in-built lfs_tables example

Insert pre-table elements dynamically

On a tables-type sheet, there could be the following sheet elements:

Title (mandatory)
Table-count statement (mandatory)
Data-source statement (non-mandatory)
Notes statement (non-mandatory)

So the insertion of the table of data could be row 3, 4 or 5, depending on whether a source was provided and whether there are notes in the sheet. But the code is currently hard-coded to insert in one of these locations every time.

This affects styling too, I think.

Will likely need detection for has_notes and has_source in a few places. Likely want these to be standalone utils functions.
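
The row arithmetic could be isolated in a small util. This is a sketch under the assumptions stated above (two mandatory rows, up to two optional rows, table immediately below); has_source() and get_table_start_row() are hypothetical names.

```r
# Hypothetical util: TRUE if a non-empty, non-missing source was supplied
has_source <- function(source) {
  length(source) == 1L && !is.na(source) && nzchar(source)
}

# Hypothetical util: row in which the table should start
get_table_start_row <- function(has_source = FALSE, has_notes = FALSE) {
  stopifnot(is.logical(has_source), is.logical(has_notes))
  mandatory_rows <- 2L                        # title, table-count statement
  optional_rows <- sum(has_source, has_notes) # source and notes statements
  mandatory_rows + optional_rows + 1L
}

get_table_start_row()                                        # 3
get_table_start_row(has_source = has_source("Motor Trend"))  # 4
get_table_start_row(has_source = TRUE, has_notes = TRUE)     # 5
```

The same value could also drive styling, since the header row of the table moves with it.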

Perform user research

I've been trying relatively quickly to get the minimum viable product. I don't—and you shouldn't—trust my design decisions and opinions.

We have two major user groups to test to make sure the package is fit for purpose:

citizens/users of the output spreadsheet, i.e. does it meet the accessibility guidance and if not, why not?
government analysts (maybe wider R users) who use the package itself, i.e. does the API make sense and can the documentation be followed?

This may well be the job of the ONS Best Practice team if/when this package, or aspects of it, are passed over to them for further development.

Bottom line: actual humans need to try this.

Rewrite tests given API changes

The tests currently fail because they target the add_*() functions, which are deprecated as user-facing functions.

Make guidance friendlier

Provide beginner-friendly documentation in the first instance. More in-depth information can be put in a relevant vignette.

Remove examples from README and use a vignette for a proper 'quickstart' guide with greatly reduced technical material.

Consider a 'Get started' tab on the site, as per {tidyxl}.

Separate out the functions in the reference page

(As discussed with @phil-hall-moj.)

Expand test suite for v0.1.0 release

Increase test quality and handling of edge cases. Particular focus on errors and warnings so that users don't build a11ytables that can't be converted to a workbook.

Error in `setColWidths()` when running `create_a11y_wb()`

Via DH. Error in setColWidths() when create_a11y_wb() is run:

> my_wb2 <- create_a11y_wb(as_a11ytable(a11ytables::mtcars_df))
Error in openxlsx::setColWidths(wb = wb, sheet = tab_title, cols = wide_cols_index,  : 
  More widths than columns supplied.
In addition: Warning messages:
1: There are 3 tables but only 2 in the contents sheet. 
2: You have blank cells in these tables: Table 1.

Session info

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rscs_0.0.0.9000        a11ytables_0.0.0.90015 forcats_0.5.1          stringr_1.4.0         
 [5] dplyr_1.0.4            purrr_0.3.4            readr_1.4.0            tidyr_1.1.2           
 [9] tibble_3.0.6           ggplot2_3.3.3          tidyverse_1.3.0       

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0 janitor_2.1.0    remotes_2.3.0    haven_2.3.1      snakecase_0.11.0
 [6] colorspace_2.0-0 vctrs_0.3.6      generics_0.1.0   utf8_1.1.4       rlang_0.4.10    
[11] pillar_1.5.1     glue_1.4.2       withr_2.4.1      DBI_1.1.1        dbplyr_2.1.0    
[16] modelr_0.1.8     readxl_1.3.1     lifecycle_1.0.0  plyr_1.8.6       munsell_0.5.0   
[21] gtable_0.3.0     cellranger_1.1.0 zip_2.1.1        rvest_1.0.0      curl_4.3        
[26] fansi_0.4.2      broom_0.7.5      Rcpp_1.0.6       scales_1.1.1     backports_1.2.1 
[31] jsonlite_1.7.2   fs_1.5.0         hms_1.0.0        stringi_1.5.3    openxlsx_4.2.3  
[36] grid_4.0.2       cli_2.5.0        tools_4.0.2      magrittr_2.0.1   crayon_1.4.1    
[41] pkgconfig_2.0.3  ellipsis_0.3.1   xml2_1.3.2       reprex_1.0.0     lubridate_1.7.10
[46] assertthat_0.2.1 httr_1.4.2       rstudioapi_0.13  R6_2.5.0         compiler_4.0.2

Looks like it might be related to the wide_cols_index, specifically. This handles when column headers have a lot of characters and a decision is made to widen the column.

I checked if it might be an R <v4.1 base-pipe issue, but no base pipes are in the underlying code. Somehow still related to R version?

I don't see why, but maybe a Windows issue? Runs okay on at least two macOS machines (M1 Pro, Monterey 12.3.1, R v4.1.1).
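One possibility, offered as an untested assumption rather than a diagnosis: as far as I know, openxlsx::setColWidths() raises "More widths than columns supplied" when widths is longer than cols. That would happen if wide_cols_index were empty while a single width was still passed. Below is a minimal sketch of a guard. The helpers find_wide_cols() and set_wide_col_widths(), the 20-character threshold and the width of 30 are all hypothetical, not the package's actual code.

```r
# Hypothetical helper: indices of columns whose header exceeds a character limit
find_wide_cols <- function(df, max_chars = 20L) {
  which(nchar(names(df)) > max_chars)
}

# Hypothetical wrapper: only set widths when there is at least one wide column,
# and always pass exactly one width per column index
set_wide_col_widths <- function(wb, sheet, df, width = 30, max_chars = 20L) {
  wide_cols_index <- find_wide_cols(df, max_chars)
  if (length(wide_cols_index) > 0) {
    openxlsx::setColWidths(
      wb = wb,
      sheet = sheet,
      cols = wide_cols_index,
      widths = rep(width, length(wide_cols_index))
    )
  }
  invisible(wide_cols_index)
}

# The built-in mtcars data has only short headers, so the index is empty
find_wide_cols(mtcars)
```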

Handle hyperlinks, present them properly

The guidance says that hyperlinks should be:

1. In their own cell (because the whole cell becomes a hyperlink in Excel)
2. Formatted properly (i.e. do not use bare URLs)

{a11ytables} doesn't (currently) support (1), since subsections on the cover come in pairs: one subheading, one body. You would currently need a separate subheading/body pair to put a hyperlink in its own cell.

Re (2), {openxlsx} can insert formulas into the output spreadsheet from a string provided by the user.
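A minimal sketch of how that could look, assuming an Excel HYPERLINK() formula is written with openxlsx::writeFormula(). The helper make_hyperlink_formula() and the "Cover" sheet name are hypothetical, used only for illustration.

```r
# Hypothetical helper: build an Excel HYPERLINK() formula with descriptive text
make_hyperlink_formula <- function(url, text) {
  # Excel escapes a double quote inside a string by doubling it
  esc <- function(x) gsub('"', '""', x, fixed = TRUE)
  sprintf('=HYPERLINK("%s", "%s")', esc(url), esc(text))
}

link <- make_hyperlink_formula(
  url = "https://co-analysis.github.io/a11ytables/",
  text = "a11ytables website"
)

# Write the formula to its own cell, if {openxlsx} is available
if (requireNamespace("openxlsx", quietly = TRUE)) {
  wb <- openxlsx::createWorkbook()
  openxlsx::addWorksheet(wb, "Cover")
  openxlsx::writeFormula(wb, sheet = "Cover", x = link, startCol = 1, startRow = 1)
}
```

Building the formula from separate url and text inputs means the user never types a bare URL into the visible cell text.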