crunch-io / crunchtabs
Report generation for Crunch and generic datasets in R
Home Page: https://crunch-io.github.io/crunchtabs/articles/Overview.html
License: GNU Lesser General Public License v3.0
Currently, tables/pages are created by a myriad of functions that do not interact well with each other. There is a significant amount of "pasting" together of conditional statements, with little to no easy way to make programmatic edits after preparation. Converting this pasted TeX into a list structure prior to printing could provide a useful final mechanism for string replacements or adjustments at the end of the process.
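A minimal sketch of that idea, assuming each table's TeX can be held as a character vector until print time. The list layout and the helpers `tex_replace()` and `tex_lines()` are hypothetical, not part of crunchtabs; base R's `rapply()` applies a replacement to every character leaf while keeping the list structure.

```r
# Hypothetical document structure: named TeX fragments instead of one pasted string.
tex_doc <- list(
  preamble = c("\\documentclass{article}", "\\begin{document}"),
  tables = list(q1 = c("\\begin{longtable}{ll}", "Yes & 40\\% \\\\", "\\end{longtable}")),
  end = "\\end{document}"
)

# Apply one replacement to every fragment, preserving the nested list structure.
tex_replace <- function(doc, pattern, replacement, ...) {
  rapply(doc, function(s) gsub(pattern, replacement, s, ...),
         classes = "character", how = "replace")
}

# Flatten to lines only at the very end, when the .tex file is written.
tex_lines <- function(doc) unlist(doc, use.names = FALSE)
```

For example, `tex_lines(tex_replace(tex_doc, "longtable", "tabular", fixed = TRUE))` swaps the environment in every table in one final pass.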
Hello,
This looks to be a small, new error introduced from today's changes. Currently I get
\fancyhead[L]{{\fontsize{16}{24}\textbf{Title}}}
logo.png\newcolumntype{d}{D{.}{.}{3.2}}
which is wrong, but it is fixed by changing it to this:
\fancyhead[L]{{\fontsize{16}{24}\textbf{Title}}\fontsize{12}{18}\textbf{}}
\fancyhead[R]{\includegraphics[scale=.4]{logo.png}}
in the .tex file. If I can help by giving more information, please let me know.
Brian
Find everywhere you have "YouGov" in the code and consider how it can be made an option, perhaps as function parameters, probably with a default of getOption("something"), so that you can set those values/paths in your .Rprofile and not have to pass them in every time.
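A minimal sketch of the getOption() pattern, reusing the fancyhdr lines quoted earlier in this page. The function `report_header()` and the option names `crunchtabs.org_name` and `crunchtabs.logo` are hypothetical placeholders, not existing crunchtabs options.

```r
# Hypothetical header builder: the organisation name and logo default to R options,
# so users can set them once in .Rprofile instead of hard-coding "YouGov".
report_header <- function(org = getOption("crunchtabs.org_name", default = ""),
                          logo = getOption("crunchtabs.logo", default = NULL)) {
  lines <- sprintf("\\fancyhead[L]{%s}", org)
  if (!is.null(logo)) {
    # Only emit the logo line when a logo path has been configured.
    lines <- c(lines, sprintf("\\fancyhead[R]{\\includegraphics[scale=.4]{%s}}", logo))
  }
  lines
}
```

Usage: put `options(crunchtabs.org_name = "YouGov", crunchtabs.logo = "logo.png")` in `.Rprofile`, and `report_header()` then needs no arguments.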
To avoid complaints from the new endpoint in rcrunch
In writeExcel, when "name" is not included in show_information, array subvariable names do not appear.
Rows to columns > columns to rows
Request to include rows at the bottom of the table that show the mean, median, and standard deviation, ideally with options for which of these to include.
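A minimal sketch of the requested summary rows, computed unweighted with base R. `summary_rows()` and its `include` argument are hypothetical; a real implementation inside crosstabs would also need to respect survey weights.

```r
# Hypothetical helper: compute the selected summary statistics for one numeric
# variable, returned as a named vector ready to append as extra table rows.
summary_rows <- function(x, include = c("mean", "median", "sd")) {
  include <- match.arg(include, several.ok = TRUE)  # lets users pick any subset
  fns <- list(mean = mean, median = stats::median, sd = stats::sd)
  vapply(include, function(s) fns[[s]](x, na.rm = TRUE), numeric(1))
}
```

For example, `summary_rows(x, include = c("mean", "sd"))` drops the median row.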
Client at Harvard "tabs export in Word would be huge".
Brainstorming: is it possible to take the PDF and then force it to "open" in Word (as an image?)? If not, then I'm dubious about Word's built-in "tables". Another alternative would be to export to Google Docs, which might be easier to work with; users could then export to Word from there if they really need to.
Thank you.
In the case of grid-type questions, where we have one statement per row and the potential responses as columns:
| | Good | Bad |
|---|---|---|
| Statement 1 | x% | y% |
| Statement 2 | x% | y% |
The column containing the statement could span a larger share of the page width to accommodate a longer statement and avoid text wrapping. Instead, tables for questions of this nature tend to be centered horizontally on the page with equal column widths, forcing the statement to wrap where there would otherwise be enough space if the numeric response columns were "squished".
Providing a setting or automatic adjustment based on the length of the statement text + response headers would be useful here.
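A minimal sketch of an automatic adjustment, assuming the table uses fixed-width `p{}` columns like the longtable example below and a 6.5in text width (the width of the `\parbox` in that example). `grid_col_spec()` and its inches-per-character estimate are hypothetical: each response column is sized to its header (or a "100%" cell), and the statement column gets the rest.

```r
# Hypothetical: build a LaTeX column spec that "squishes" the response columns
# to their content and gives all remaining width to the statement column.
grid_col_spec <- function(headers, total_width = 6.5, char_in = 0.08, min_resp = 0.4) {
  resp_chars <- pmax(nchar(headers), nchar("100%"))  # widest of header or a percentage
  resp_width <- pmax(min_resp, resp_chars * char_in)  # rough inches-per-character estimate
  stmt_width <- total_width - sum(resp_width)
  if (stmt_width <= 0) stop("response columns do not fit in total_width")
  paste0(sprintf("p{%.2fin}", stmt_width),
         paste(sprintf("p{%.2fin}", resp_width), collapse = ""))
}
```

For the Good/Bad example above this yields a 5.70in statement column and two 0.40in response columns.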
Hello,
TargetSmart is testing this and found that if there is a newline character inside a variable name or description, the LaTeX compiler crashes. I have not had a chance to build a minimal reproducible example yet (and will try to today), but I wanted to file this now in case you can see the issue right away.
Here is an example from the .tex file; sorry, I don't have their dataset yet.
\begin{center}
\begin{longtable}{p{0.3in}p{5.5in}}
\addcontentsline{lot}{table}{ 8. Q. 9 Now, I'd like to rate your feelings toward some people, with one hundred meaning a VERY WARM, FAVORABLE feeling; zero meaning a VERY COLD, UNFAVORABLE feeling;
and fifty meaning not particularly warm or cold. You can use any number from zero to one hundred, the higher the number the more favorable your feelings are toward that person. If you have no opinion or have never heard of that person, please say so. (IF DON'T KNOW) Would you say you are unable to give an opinion of (READ BELOW), or have you never heard of (READ BELOW)?
(NO OPINION/DK = 101) (NEVER HEARD = 102)
(RANDOMIZE)}
\hangindent=0em \parbox{6.5in}{
\formatvardescription{8. Q. 9 Now, I'd like to rate your feelings toward some people, with one hundred meaning a VERY WARM, FAVORABLE feeling; zero meaning a VERY COLD, UNFAVORABLE feeling; and fifty meaning not particularly warm or cold. You can use any number from zero to one hundred, the higher the number the more favorable your feelings are toward that person. If you have no opinion or have never heard of that person, please say so.
(IF DON'T KNOW) Would you say you are unable to give an opinion of (READ BELOW), or have you never heard of (READ BELOW)?
(NO OPINION/DK = 101) (NEVER HEARD = 102)
(RANDOMIZE)}} \\longtablesep
& 0-10 \hspace*{0.15em} \dotfill 8%\
& 10-20 \hspace*{0.15em} \dotfill 1%\
& 20-30 \hspace*{0.15em} \dotfill 1%\
& 30-40 \hspace*{0.15em} \dotfill 2%\
& 40-50 \hspace*{0.15em} \dotfill 2%\
& 50-60 \hspace*{0.15em} \dotfill 12%\
& 60-70 \hspace*{0.15em} \dotfill 3%\
& 70-80 \hspace*{0.15em} \dotfill 6%\
& 80-90 \hspace*{0.15em} \dotfill 3%\
& 90-100 \hspace*{0.15em} \dotfill 2%\
& 100-110 \hspace*{0.15em} \dotfill 60% \
& Totals \hspace*{0.15em} \dotfill 100% \
& Unweighted N \hspace*{0.15em} \dotfill 350 \
\end{longtable}
\end{center}
I googled around and found a page that seems to have the answer: manually remove the newlines from the .tex file. (I didn't follow the "insert [zz]" suggestion on that page; it didn't seem to work.) So if you had a QC step that removes those newlines, the generated .tex file should be fine.
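A minimal sketch of such a QC step, assuming variable names and descriptions are plain character vectors before they are pasted into the TeX; `strip_newlines()` is a hypothetical helper name.

```r
# Hypothetical QC helper: collapse line breaks in labels/descriptions so they
# never reach moving arguments such as \addcontentsline.
strip_newlines <- function(x) {
  x <- gsub("[\r\n]+", " ", x)  # any run of CR/LF characters becomes one space
  x <- gsub(" {2,}", " ", x)    # tidy up double spaces left behind
  trimws(x)                     # drop leading/trailing spaces
}
```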
Add more tests, especially around multiple-response (MR) variables.
I got the error below when I ran toplines. After I excluded variables of type multiple-response, crosstabs() worked. FYI, the category names of those multiple-response variables are "selected" and "not selected". I also tried "Selected" and "not selected", but I got the same error.
[crunch] > toplines <- crosstabs(ds, weight = 'weight')
Error in ret$proportions[, "Selected"] : incorrect number of dimensions
In addition: Warning message:
In crosstabs(ds, weight = "weight") :
Variables of types: text, datetime are not supported and have been skipped
[crunch] > self(ds)
[1] "https://app.crunch.io/api/datasets/76683330d4c24ee9b84a2b211b9f996b/"
In Gryphon, we often use basic markdown formatting in question text. However, this formatting is not typically visible in toplines or crosstab documents created by crunchtabs. We need to explore a method for "copying over" this injected formatting.
In a perfect world, any formatting in the question that the respondent saw would also be replicated in the tabs:
- **text** should be converted to \textbf{text}
- <u></u> should be converted to \underline{} (character entities such as &lt; and &gt; need handling too)
- <br> or <br /> should be converted to a newline (a carriage return)
The XML export from Gryphon appears to provide all of the information that we need to copy over formatting elements like underline, bold and new lines.
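A minimal regex sketch of the three conversions listed above; `md_to_latex()` is a hypothetical name and covers only these cases. It assumes any escaping of other LaTeX special characters happens before this step, since a later escaping pass would also escape the backslashes it inserts.

```r
# Hypothetical converter for the formatting listed above.
md_to_latex <- function(x) {
  x <- gsub("&lt;", "<", x, fixed = TRUE)  # decode entities first so tags match
  x <- gsub("&gt;", ">", x, fixed = TRUE)
  x <- gsub("\\*\\*(.+?)\\*\\*", "\\\\textbf{\\1}", x, perl = TRUE)  # **text**
  x <- gsub("<u>(.*?)</u>", "\\\\underline{\\1}", x, perl = TRUE)    # <u>text</u>
  x <- gsub("<br\\s*/?>", "\\\\newline ", x, perl = TRUE)            # <br>, <br />
  x
}
```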
A themeNew object is typically created with a path to a logo that is relative to the filename. However, sometimes our working directory makes the file inaccessible via the specified path. It would be good to check whether the logo file exists and, if not, point the user to their working directory or logo file path before continuing, because otherwise execution fails with an uninformative error.
Current:
Error in pdflatex(filename, open) :
PDF file does not exist. Check that there are no errors in the LaTeX file.
Planned:
Error: The logo specified in themeNew does not appear to exist. Please check your current working directory to verify the path to the file
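A minimal sketch of the planned check using base R's `file.exists()`. `check_logo()` is a hypothetical helper; where exactly themeNew stores the logo path is not shown here, so it would be called wherever that path is read.

```r
# Hypothetical early check: fail with an informative message instead of the
# generic pdflatex error when the logo path does not resolve.
check_logo <- function(logo) {
  if (!is.null(logo) && !file.exists(logo)) {
    stop("The logo specified in themeNew does not appear to exist: ", logo,
         "\nPlease check your current working directory (", getwd(),
         ") to verify the path to the file.", call. = FALSE)
  }
  invisible(logo)
}
```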
LaTeX Warning: Font shape declaration has incorrect series value `mc'.
It should not contain an `m'! Please correct it.
Found on input line 20.
Package Fancyhdr Warning: \headheight is too small (12.0pt):
Make it at least 14.68837pt.
We now make it that large for the rest of the document.
This may cause the page layout to be inconsistent, however.
I was trying to export reports from Crunch using the crunchtabs package. I could export a PDF report correctly, but I got an error when I exported an Excel report.
Without the option proportions = TRUE in writeExcel(), there is no error, but I want to see proportions rather than counts in the Excel report.
With proportions = TRUE in writeExcel(), I get the following error:
Error in names(object) <- nm :
'names' attribute [4] must be the same length as the vector [1]
This is a post-processing action that should happen after the call to crosstabs but before the call to writeLatex or writeExcel. We expect many exceptional cases. It would be simple to apply a single sort crosstab-wide; however, groups of variables will likely need to be sorted in the same way. This suggests that a passthrough function that takes a list of variables would be the clearest method for the user. Although themeNew already has infrastructure for this type of formatting, we feel that it is already too complex; additional functionality that manages the structure of the data, in addition to its visual presentation, would be confusing for the user.
Add two functions:
sort_alpha(ct, vars, descending, pin_to_top, pin_to_bottom)
sort_numeric(ct, vars, descending, pin_to_top, pin_to_bottom)
Usage example:
ct = crosstabs(ds)
ct = sort_numeric(ct, vars)
writeLatex(ct)
The function acts as a passthrough that can be applied to a group of variables or to a single variable, allowing us to create exceptions.
ct = crosstabs(ds)
ct = sort_numeric(ct, vars=c("a", "b"))
ct = sort_numeric(ct, vars=c("c"), pin_to_bottom = "Don't know")
writeLatex(ct)
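The internal structure of a crosstabs result isn't shown here, so this is only a sketch of the per-variable core that a passthrough like the proposed sort_numeric() might call, assuming each variable's results can be handled as a named numeric vector. `sort_categories()` is a hypothetical helper.

```r
# Hypothetical core of sort_numeric(): reorder category results by value,
# keeping pinned categories at the top/bottom in the order given.
sort_categories <- function(x, descending = TRUE, pin_to_top = NULL, pin_to_bottom = NULL) {
  top <- intersect(pin_to_top, names(x))        # ignore pins not present in x
  bottom <- intersect(pin_to_bottom, names(x))
  middle <- setdiff(names(x), c(top, bottom))
  middle <- middle[order(x[middle], decreasing = descending)]
  x[c(top, middle, bottom)]
}
```

A sort_alpha() counterpart would differ only in ordering `middle` by `order(middle)` instead of by value.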
Joe Williams reports in an email to [email protected] (emphasis added):
Data export from Gryphon to Crunch with dyngrid-check / grid-check problems is problematic. Variables are not bound together as would be expected for a grid-type question. Instead, the grid-check subvariables are treated as standalone variables in a separate folder with the label of the grid. The new standalone variables take the description of the grid-check variable, and their labels are the subvariable labels from the original grid-check question. Unexpected treatment, but not catastrophic. However, when creating a PDF using crunchtabs, the toplines do not identify the subvariable being asked about. Also, the crosstabs for some reason count the "No" response as part of the unweighted N, and rather than presenting an N, the crosstabs give percentages.
Sounds like an issue/request with this package, so I'm redirecting here. If you find that there's something for us to look into in crunch with multiple response handling, please let us know.
R CMD check fails with an error on a Travis worker instance:
Error: package 'testthat' was installed by an R version with different internals; it needs to be reinstalled for use with this R version
Generate a codebook that includes data like: https://electionstudies.org/wp-content/uploads/2018/12/anes_timeseries_2016_userguidecodebook.pdf
We also want to be able to include basic summary information (almost like a topline, but for unweighted data)
Summaries for:
LaTeX header objects for:
Banner recodes should support combines.
See https://travis-ci.org/Crunch-io/crunchtabs/jobs/209846431 for a recent build on Travis. Lots of man/namespace issues. Among the ones I see are:
The method documentation can be a little tricky. What I've done for S4 is to define the documentation separately from the methods and give it a @name, like https://github.com/Crunch-io/rcrunch/blob/master/R/dataset-catalog.R#L75. Then you link all the methods you want to go with it to that documentation, like this: https://github.com/Crunch-io/rcrunch/blob/master/R/dataset-catalog.R#L78-L79.
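A minimal sketch of that pattern with roxygen2, using a hypothetical generic `summarize()`. The documentation block sits on `NULL` with its own `@name`, and each method points at it with `@rdname`; that is one common way to do the linking, and the linked rcrunch lines may differ in detail.

```r
library(methods)

#' Summarize an object
#'
#' @name summarize-methods
#' @param x An object to summarize.
#' @param ... Further arguments (unused).
#' @return A character summary.
NULL

setGeneric("summarize", function(x, ...) standardGeneric("summarize"))

#' @rdname summarize-methods
#' @export
setMethod("summarize", "numeric", function(x, ...) paste("mean:", mean(x)))
```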
Failure looks like
-- 1. Failure: Write Latex crosstab (@test-write-latex.R#31) ------------------
`tex` not identical to `ref`.
4/723 mismatches
x[178]: "\\addcontentsline{lot}{table}{ 3A. Name the kinds of pets you have at the
x[178]: se locations. � Home}"
y[178]: "\\addcontentsline{lot}{table}{ 3A. Name the kinds of pets you have at the
y[178]: se locations. �\200� Home}"
That's from this line:
Line 361 in 67cd8ce
I'm not sure whether this suggests a general Windows failure, something specific to Appveyor's configuration, or the test environment. It may not matter much, since there are alternatives within LaTeX. We could probably just use --- instead, but some googling suggests that \textemdash might work best when using other fonts. This should be a quick fix, slowed only by needing to check that the PDFs are right and then updating the fixture .tex files.
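A minimal sketch of the \textemdash option, assuming labels reach the TeX writer as character vectors. `tex_dashes()` is a hypothetical helper; replacing the Unicode dash with a TeX command keeps the generated .tex file pure ASCII, which should sidestep platform-dependent encoding of that character.

```r
# Hypothetical: replace Unicode dashes in labels with TeX commands before writing.
tex_dashes <- function(x) {
  x <- gsub("\u2014", "\\\\textemdash{}", x)  # em dash
  gsub("\u2013", "\\\\textendash{}", x)       # en dash
}
```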
Goal: 80%
crunchtabs Coverage: 68.95%
R/crunchtabs.R: 0.00%
R/banner.R: 1.15%
R/crosstabs.R: 1.43%
R/getters.R: 15.00%
R/tabBooks.R: 24.48%
R/utils.R: 26.53%
R/themes-built-in.R: 59.42%
R/forNowTransforms.R: 64.44%
R/writeExcel.R: 79.07%
R/tex.R: 83.64%
R/tex-table.R: 88.31%
R/theme.R: 90.58%
R/reformatResults.R: 91.08%
R/writeLatex.R: 95.57%
In some situations where we have a subtable inside a crosstab, if the number of responses is large, the table can span multiple pages. This is undesirable. Major clients prefer to see a complete table per page: if a table would overrun, the subtable should be shown on the next page.
Internal Note: See Battleground Tracker Dem Presidential Primary Vote by Party/Ideology
test-write-latex.R:67: failure: Write Latex toplines
`tex` not identical to `ref`.
Lengths differ: 367 is not 368
test-write-latex.R:31: failure: Write Latex crosstab
`tex` not identical to `ref`.
Lengths differ: 754 is not 723
test-write-latex.R:40: failure: Write Latex crosstab
tex[90] not equal to "Sample & Adults \\\\ ".
1/1 mismatches
x[1]: "\\setlength{\\LTright}{
x[1]: \\fill}"
y[1]: "Sample & Adults \\\\
y[1]: "
Focusing on writelatex for toplines here.
Currently, we use longtable to extend a table that would run past the bottom of a page. A client would rather longtable not intervene: if a new table would have to be spread across multiple pages, it should instead skip to a new page and start at the top of that page.
I asked them about a scenario where a table is so long that it would run off the page even if it started on a new page. They said that in those cases it's OK to use longtable. So what they want might be fairly complicated. They also said that it "almost never" happens in their work that a table given a whole new page would run off, but that's specific to them.
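The client's rule reduces to a three-way decision per table. A minimal sketch, assuming table height and page capacity can be estimated in rows; `place_table()` and the row-based measure are hypothetical, and real code would need to estimate heights from the TeX.

```r
# Hypothetical placement rule for one topline table, measured in table rows.
place_table <- function(n_rows, rows_left, rows_per_page) {
  if (n_rows <= rows_left) {
    "here"       # fits in the space remaining on the current page
  } else if (n_rows <= rows_per_page) {
    "newpage"    # fits on a fresh page: break first, then start the table
  } else {
    "longtable"  # too long for any single page: let longtable span pages
  }
}
```

In the generated TeX, the "newpage" case could be emitted as a page break before the table (the needspace package's \needspace is another option), leaving longtable only for the third case.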
There is an option to round percentages down when the rounded values sum to more than 100. For example, if the percentages were 20.6, 20.4, 29.5, and 29.5, they sum to 100, but when you round to no decimals you get numbers that sum to 101. So the function finds the value where rounding down introduces the lowest error and takes the floor of that number instead.
You'd get something like 21, 20, 29, 30 in this case.
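A minimal base-R sketch of the rule as described above, not the package's actual code; `round_to_100()` is a hypothetical name. It rounds, then, while the total exceeds 100, floors the rounded-up entry whose floor error is smallest. Note that R's `round()` rounds exact halves to even, so 29.5 becomes 30.

```r
# Hypothetical sketch: round percentages, then floor rounded-up entries one at a
# time (smallest floor error first) until the total is no more than 100.
round_to_100 <- function(p) {
  r <- round(p)
  while (sum(r) > 100) {
    up <- which(r > p)                # entries that were rounded up
    if (length(up) == 0) break
    err <- p[up] - floor(p[up])       # error introduced by flooring instead
    i <- up[which.min(err)]           # first entry wins ties
    r[i] <- floor(p[i])
  }
  r
}
```

With the example above, `round_to_100(c(20.6, 20.4, 29.5, 29.5))` floors the first 29.5 and returns 21, 20, 29, 30.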
This is a long standing request from a client because for most questions, they can't display graphics on television that sum to > 100. This option is in the theme: latex_round_percentages = TRUE
A recent change request from a client was to ignore this default for certain questions (e.g., for vote questions, they'd rather have the percentages sum to more than 100 than report a candidate's percentage that is not properly rounded).
So an option was added to the theme to ignore the default: latex_round_percentages_exception = c('alias1', 'alias2')
This works in that it doesn't round down, but it doesn't work in that the crosstabs for these specific questions report a base size of 100 regardless. See questions 3 and 5 in the attached crosstabs.
If there is a table of contents in an .xls set of crosstabs, add a hyperlink at the bottom of each table that jumps back to the Table of Contents page.
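A minimal sketch that builds the link as an Excel HYPERLINK formula, which can jump within the workbook using the `#'Sheet'!Cell` form. `toc_link_formula()` and the sheet name "Table of Contents" are hypothetical. If writeExcel builds its workbook with openxlsx (an assumption), a formula like this could be written below each table with `openxlsx::writeFormula()`.

```r
# Hypothetical: formula text for an in-workbook link back to the TOC sheet.
toc_link_formula <- function(toc_sheet = "Table of Contents", cell = "A1",
                             text = "Back to Table of Contents") {
  sheet <- gsub("'", "''", toc_sheet, fixed = TRUE)  # Excel doubles ' inside quoted sheet names
  label <- gsub('"', '""', text, fixed = TRUE)       # and doubles " inside string literals
  sprintf("HYPERLINK(\"#'%s'!%s\", \"%s\")", sheet, cell, label)
}
```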
There aren't any, and there need to be. Start at the highest level with an integration test or two, see what kind of coverage that gets you. Then you can work to refactor and extend the code with greater confidence.
Clients ask questions that involve quotes and use double quotation marks in the wording. However, crunchtabs throws an error if it encounters double quotation marks. The current workaround is to replace double quotation marks with single quotation marks, but that seems less than ideal. Is there a way for crunchtabs to identify double quotation marks and handle them without error?
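The source of the error isn't shown here, so this is only a sketch of one way to keep double quotes instead of swapping them for single ones: convert paired straight quotes to TeX-style opening and closing quotes before writing. `tex_quotes()` is a hypothetical helper, and an unpaired quote is left unchanged.

```r
# Hypothetical: pair straight double quotes left to right as TeX quotes,
# i.e. "text" becomes ``text''.
tex_quotes <- function(x) {
  gsub('"([^"]*)"', "``\\1''", x)
}
```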
theme$format_label_column$col_width
Testing each category against the other categories for that variable. This is what clients have gotten traditionally and is what they expect.
May already have this functionality
I noticed while doing this that duration, which is numeric, is dropped. So we will need to get crunchtabs to handle that.
Duration should be included as part of the sample description if available.
see title. Let's get it functional so that we can actually 'continuously' 'integrate'
investigate if they can be made? Errors currently with basic examples from pet dataset.