awalker89 / openxlsx
R package for .xlsx file reading and writing.
License: Other
! LaTeX Error: File `tableStyles' not found.
As per title
The test is
library(openxlsx)
getwd()
## Absolute path
dir.create("absolute_path")
wb <- createWorkbook()
my.file <- paste(getwd(),
"absolute_path/test_path.xlsx",
sep = "/")
my.file
unlink(my.file)
addWorksheet(wb = wb, sheetName = "data.frame")
writeData(wb = wb, sheet = "data.frame", x = Indometh)
saveWorkbook(wb, my.file, overwrite = TRUE)
list.files(path = "absolute_path",
pattern = glob2rx("*.xlsx"))
## Relative path
dir.create("relative_path")
wb <- createWorkbook()
my.file <- "relative_path/test_path.xlsx"
my.file
unlink(my.file)
addWorksheet(wb = wb, sheetName = "data.frame")
writeData(wb = wb, sheet = "data.frame", x = Indometh)
saveWorkbook(wb, my.file, overwrite = TRUE)
list.files(path = "relative_path",
pattern = glob2rx("*.xlsx"))
sessionInfo()
What I get is
list.files(path = "absolute_path",
pattern = glob2rx("*.xlsx"))
[1] "test_path.xlsx"
list.files(path = "relative_path",
pattern = glob2rx("*.xlsx"))
character(0)
That is, saveWorkbook seems to fail silently when a relative path is given to file.
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=it_IT.UTF-8 LC_NUMERIC=C
[3] LC_TIME=it_IT.UTF-8 LC_COLLATE=it_IT.UTF-8
[5] LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=it_IT.UTF-8
[7] LC_PAPER=it_IT.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] datasets grDevices graphics utils stats methods base
other attached packages:
[1] openxlsx_1.0.4
loaded via a namespace (and not attached):
[1] Rcpp_0.11.1
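Until the relative-path case is fixed, one possible workaround is to resolve the path to an absolute one before saving. The following is only a sketch in base R: `to_absolute` is a hypothetical helper (not part of openxlsx), and it assumes the target directory already exists.

```r
# Sketch: turn a relative output path into an absolute one before saving.
# `to_absolute` is a hypothetical helper, not part of openxlsx.
to_absolute <- function(path) {
  # The target directory must already exist so normalizePath() can resolve it.
  dir_abs <- normalizePath(dirname(path), winslash = "/", mustWork = TRUE)
  file.path(dir_abs, basename(path))
}

old_wd <- setwd(tempdir())                 # work in a scratch directory
dir.create("relative_path", showWarnings = FALSE)
my.file <- to_absolute("relative_path/test_path.xlsx")
setwd(old_wd)
print(my.file)
# my.file could then be passed as the `file` argument of saveWorkbook()
```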
This would be helpful as a new class option.
It should be possible to use the package without attaching it via library. The following happens with 39b3873 on my machine:
$ Rscript --vanilla -e "names(openxlsx::loadWorkbook('a.xlsx'))"
Error in Workbook$new : could not find function "loadMethod"
Calls: <Anonymous> -> createWorkbook -> $
Execution halted
$ Rscript --vanilla -e "library(openxlsx); names(loadWorkbook('a.xlsx'))"
Error in Workbook$new : could not find function "loadMethod"
Calls: loadWorkbook -> createWorkbook -> $
Execution halted
$ Rscript --vanilla -e "library(methods); names(openxlsx::loadWorkbook('a.xlsx'))"
NULL
$ Rscript --vanilla -e "library(methods); library(openxlsx); names(loadWorkbook('a.xlsx'))"
<works>
$ Rscript --vanilla -e "library(openxlsx); library(methods); names(loadWorkbook('a.xlsx'))"
<works>
From what I remember, you have to (at least) declare a dependency on the methods package and import it. But the third example suggests that there's more to it.
A weird bug that appears when reading a limited group of columns starting from a row > 1. If there is content above the data, colnames and data are messed up.
library(openxlsx)
wb <- createWorkbook()
# adding 4 sheets (would be nice if this could be vectorized)
addWorksheet(wb, "first")
addWorksheet(wb, "third0")
addWorksheet(wb, "third1")
addWorksheet(wb, "third2")
## Need data on worksheet to see all headers and footers
writeData(wb, sheet = 1, CO2)
writeData(wb, sheet = 2, startRow = 3, CO2)
writeData(wb, sheet = 3, startRow = 3, CO2)
writeData(wb, sheet = 4, startRow = 3, CO2)
writeData(wb, sheet = 3, startRow = 1, "lineoftext")
writeData(wb, sheet = 4, startRow = 1, c("lineoftext", "anotherlineoftext"))
## Save workbook
saveWorkbook(wb, "min.xlsx", overwrite = TRUE)
The file min.xlsx contains four versions of identical data:
Reading data from sheet 'third1' and 'third2' leads to wrong colnames and data in the following calls:
read.xlsx("min.xlsx", sheet = "third1", startRow = 3, cols = 1:2)
# Plant 1
#1 Qn1 6
#2 Qn1 6
read.xlsx("min.xlsx", sheet = "third2", startRow = 3, cols = 1:2)
# Error in stringInds[[1]] : subscript out of bounds
However, reading all columns works as expected
read.xlsx("min.xlsx", sheet = "third1", startRow = 3, cols = 1:5)
read.xlsx("min.xlsx", sheet = "third2", startRow = 3, cols = 1:5)
# Plant Type Treatment conc uptake
#1 Qn1 Quebec nonchilled 95 16.0
#2 Qn1 Quebec nonchilled 175 30.4
Also, reading from the first sheet works fine
read.xlsx("min.xlsx", sheet = "first", startRow = 1, cols = 1:2)
# Plant Type
#1 Qn1 Quebec
#2 Qn1 Quebec
Some of the files I work with have excessive formatting, like
X1<font=font1>1, which really is just X11,
as in what I get from sharedStrings.xml in the following image:
This breaks the cell value: such an "X11" becomes "X" in the data frame loaded via openxlsx.
I wonder if it's possible for openxlsx to accept user-defined regex rules for data cleaning,
or to provide an option to just dump these XML fragments into the data frame for the user to process further.
E.g. for the above image, I loaded the file in Python and replaced the following patterns:
<rPr>.*?</rPr><
<phoneticPr [^<]+<
</t></r><r><t>([^<])
After that, openxlsx works just fine.
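The same kind of cleanup can be sketched in R. This is an illustration only: `flatten_si` is a hypothetical helper, the sample entry is made up, and regex-based XML handling is fragile (XML entities such as `&amp;` are not decoded here).

```r
# Sketch: collapse a rich-text shared string into plain text.
# `flatten_si` is a hypothetical helper; regex-based XML handling is fragile.
flatten_si <- function(si) {
  # Drop phonetic runs first so their <t> text is not mixed in.
  si <- gsub("<rPh[ >].*?</rPh>", "", si, perl = TRUE)
  # Pull out every <t>...</t> element, then strip the tags themselves.
  runs <- regmatches(si, gregexpr("<t( [^>]*)?>[^<]*</t>", si))[[1]]
  paste0(gsub("<[^>]+>", "", runs), collapse = "")
}

# Made-up example entry in the style of sharedStrings.xml
si <- paste0('<si><r><t>X1</t></r>',
             '<r><rPr><sz val="11"/><rFont val="Calibri"/></rPr><t>1</t></r></si>')
cleaned <- flatten_si(si)
print(cleaned)
```

Concatenating the text of all runs keeps the full cell value ("X1" followed by "1") instead of only the first run.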
Hey, thanks for the update on stackable styles, it works great. One thing I noticed is that text decorations stopped working. Could you look into this?
E.g. the following results in unformatted text:
wb = createWorkbook()
sBoldText = createStyle(textDecoration="bold") ## Replace bold with anything here (eg. underline, strikeout)
ws = addWorksheet(wb, sheetName = "1", gridLines = T)
writeData(wb, sheet=ws, startCol=1, startRow=1, "test")
addStyle(wb, ws, rows=1, cols=1, style=sBoldText)
saveWorkbook(wb, "textstyle.xlsx", overwrite = TRUE)
Hey,
Thanks for the quick fixes and response to feedback! I really like conditional formatting, and I hope you could consider the following improvements over time:
Thanks!
Floris
Thanks for the package.
On my system, I can read a file and write it, but Excel reports the result as corrupted. This happens even with a simple file such as:
write.xlsx(iris, file = "writeXLSX1.xlsx", colNames = TRUE, borders = "rows")
I am sending by email the corresponding saved file
I am writing crosstables to Excel, and to add a caption for the table, I use one row with merged cells above the table.
However, this clashes with the nicely working auto option for the column widths, as the column where the caption starts is expanded to accommodate the entire caption string, which is not needed.
Is it possible to disregard the contents of merged cells when calculating the width of the single columns that these merged cells span?
If not (or if it is too specific a feature), is there a workaround?
When opening an existing workbook, adding sheets to it and saving it, the sheets are not necessarily being added after the last existing sheet (intended), but after the first existing sheet. When calling worksheetOrder(wb) it does spit out the intended order, but writes it differently anyway.
Btw.: Thank you so much for your effort! I've posted a few issues now and I'm really glad that you're fixing them in almost no time! Keep up the good work!
Hi, thanks for this neat package, which makes working with Excel files really easy.
It would be nice to be able to specify the table name with writeDataTable.
BTW openxlsx removes pivot tables in saved new files.
Not a big deal though since the source sheets can be saved in a separate file.
...are converted to spaces. Is this by design?
read.xlsx("c:/book1.xlsx",sheet=1)
Error in tolower(sharedStrings) : invalid multibyte string 4
Original code in read.xlsx function:
z <- tolower(sharedStrings)
sharedStrings[z == "true"] <- "TRUE"
sharedStrings[z == "false"] <- "FALSE"
Can be revised as follows:
z <- tolower(sharedStrings)
Encoding(z) <- "UTF-8"
sharedStrings[z == "true"] <- "TRUE"
sharedStrings[z == "false"] <- "FALSE"
Hi
thanks for the great package.
The link on your Git Page http://awalker89.github.io/openxlsx/ does not work.
Christof
Hi! I'm working with data including Chinese multibyte string.
When using read.xlsx, if there is any Chinese character anywhere in the file (even in a sheet that is not being read), it causes an error like the ones below:
# Error in tolower(sharedStrings) : invalid multibyte string 414
or
# Error in tolower(sharedStrings) : invalid multibyte string 2
Hi Alex
There seems to be an issue with local characters.
This produces a broken xlsx file. Notice the "ø" in "Iriø". The file can't be repaired.
require(openxlsx)
wb <- createWorkbook()
addWorksheet(wb, sheetName = "Iriø")
saveWorkbook(wb=wb, file="basics1.xlsx", overwrite=TRUE)
This also produces a broken xlsx file, but the file can be repaired.
require(openxlsx)
wb <- createWorkbook()
addWorksheet(wb, sheetName = "Iris")
saveWorkbook(wb=wb, file="basics2.xlsx", overwrite=TRUE)
My session info.
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252
[3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
[5] LC_TIME=Danish_Denmark.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_0.9.3.1 openxlsx_1.0.4 devtools_1.5
loaded via a namespace (and not attached):
[1] colorspace_1.2-4 digest_0.6.4 evaluate_0.5.3 grid_3.1.0 gtable_0.1.2
[6] httr_0.3 MASS_7.3-31 memoise_0.2.1 munsell_0.4.2 parallel_3.1.0
[11] plyr_1.8.1 proto_0.3-10 Rcpp_0.11.1 RCurl_1.95-4.1 reshape2_1.4
[16] scales_0.2.4 stringr_0.6.2 tools_3.1.0 whisker_0.3-2
This should be the current dev version here from github.
Not sure what causes this one.
The first time I run my code the output is corrupt. The second time it works.
Cells cannot be selected; then some error messages appear and Excel crashes,
i.e.
"cannot make changes to table or xml mapping when multiple sheets are selected"
"cannot modify the contents of a table total row"
Selecting several tabs before interacting with the document resolves the problem.
The code worked before an update from git on sept 22.
Example xlsx
https://db.tt/2zekiE8T
I have a numerical variable with one value.
mean_a <- mean(df$a)
Now, I would like to use this value for conditional formatting on all cells in column number 2.
conditionalFormat(df_wb, 1, cols = 2, rows=2:(nrow(df)+1), rule='>=mean_a', style = posyellowflagStyle)
However, the conditionalFormat line above does not work.
Can one use a variable's value to set the rule in the conditionalFormat function?
Thank you for your help!
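One likely explanation is that the rule is passed on as plain text, so the name mean_a never gets replaced by its value. A minimal sketch in base R of building the rule text from the variable (the data frame here is made up):

```r
# Sketch: the rule is plain text, so the number has to be pasted into it.
df <- data.frame(a = c(1, 2, 3, 6))   # made-up data
mean_a <- mean(df$a)

rule <- paste0(">=", mean_a)          # builds the text ">=3"
print(rule)
# `rule` would then replace the literal '>=mean_a' in the conditionalFormat() call
```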
Calling removeWorksheet throws the following error if the last remaining sheet is removed:
Error in *tmp*[[i]] : subscript out of bounds
The sheet is removed anyway, though.
Shiny writes output to a temp file with a name like "file818612a4ca0" when the client tries to download it. This name is passed to the write.xlsx() function, which throws the error "File name must end with '.xlsx'".
Code:
output$downloadOverall <- downloadHandler(
filename = function() { "name.xlsx" },
content = function(file) {
#"data" is a dataframe
write.xlsx(data,file)
}
)
I suggest making the extension check a parameter of write.xlsx() that is passed on to saveWorkbook().
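In the meantime, a possible workaround is to write to a temporary path that does end in ".xlsx" and copy the result to the path Shiny supplies. The sketch below uses only base R; `write_via_xlsx_tempfile` and `fake_writer` are hypothetical names, and the stand-in writer only mimics the extension check.

```r
# Sketch: write to a temporary path that ends in ".xlsx", then copy it to the
# extension-less path a download handler supplies. `write_fun` stands in for
# a writer such as write.xlsx(); both helper names are hypothetical.
write_via_xlsx_tempfile <- function(write_fun, target) {
  tmp <- tempfile(fileext = ".xlsx")
  on.exit(unlink(tmp))
  write_fun(tmp)
  file.copy(tmp, target, overwrite = TRUE)
}

# Stand-in writer so the sketch runs without openxlsx installed
fake_writer <- function(path) {
  stopifnot(grepl("\\.xlsx$", path))   # mimics the extension check
  writeLines("placeholder", path)
}

target <- tempfile(pattern = "file")   # no extension, like Shiny's temp file
ok <- write_via_xlsx_tempfile(fake_writer, target)
print(ok)
```

Inside the content function of downloadHandler this would presumably be called as write_via_xlsx_tempfile(function(p) write.xlsx(data, p), file).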
In the current build, saveWorkbook does not work anymore:
Error: expecting a string
Called from: .self$writeSheetDataXML(xldrawingsDir, xldrawingsRelsDir, xlworksheetsDir,
xlworksheetsRelsDir)
The loadWorkbook() function allows creating a Workbook object from an xlsx file. I didn't find any function to transform a sheet from this object into a data.frame. I tried read.xlsx and readWorkbook, but both read from a file, not from a Workbook object.
A readSheet function or, better, a read.xlsx method for Workbook would therefore be useful.
If this function already exists, maybe add its name to the documentation page of the loadWorkbook() function.
And thank you again for this very useful package.
I would like to be able to use fileInput from the shiny package--it creates a temporary file that does not end in "xlsx", which causes read.xlsx to fail because of the following check.
if (!grepl("xlsx$", xlsxFile))
stop("File must have extension .xlsx!")
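As a stopgap, the uploaded file can be copied to a path that passes the check before it is read. This is a sketch in base R only; `with_xlsx_extension` is a hypothetical helper, and the upload path is simulated with a temp file.

```r
# Sketch: give an uploaded temp file a ".xlsx" name before reading it.
# `with_xlsx_extension` is a hypothetical helper, not part of openxlsx or shiny.
with_xlsx_extension <- function(path) {
  if (grepl("xlsx$", path)) return(path)        # same test as the check above
  new_path <- paste0(path, ".xlsx")
  if (!file.copy(path, new_path, overwrite = TRUE)) {
    stop("could not copy ", path)
  }
  new_path
}

upload <- tempfile(pattern = "upload")          # stand-in for an upload path
writeLines("placeholder", upload)
xlsx_path <- with_xlsx_extension(upload)
print(grepl("xlsx$", xlsx_path))
```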
Love the package, thanks!
The use case that I find particularly helpful is being able to load preformatted xlsx templates for writing to.
Can the package handle writing formulas? If I try to write hyperlinks as below, the xlsx is corrupted. I saw that you can assign a hyperlink style, but some of the URLs are quite long, so alt text is necessary.
=HYPERLINK("http://www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=gene&ln=TNXB","TNXB")
R 3.1.1 on Linux, was OK to install via install_github a week ago to verify inline formatting cleaning.
since commit ad2574b
$ R CMD build openxlsx-master
The read.xlsx function grepls for "true" or "false" in the shared strings xml and converts those to the R boolean types. This is a problem if there are character cells being read in that contain the words "true" or "false" but are not booleans in their own right (this has come up as I am importing records of alarm systems that contain categories like "examined, false alarm").
Perhaps this would be better looking for an exact match rather than searching for "true" or "false" anywhere within the string.
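The difference between the two approaches can be shown on a few made-up shared strings. This is a base R sketch of the idea, not the package's actual code:

```r
# Sketch: exact matching versus substring matching on made-up shared strings.
sharedStrings <- c("TRUE", "false", "examined, false alarm", "True story")

z <- tolower(sharedStrings)
substring_hit <- grepl("true|false", z)       # also flags ordinary text
exact_hit     <- z %in% c("true", "false")    # only whole-cell booleans

print(data.frame(sharedStrings, substring_hit, exact_hit))
```

With exact matching, "examined, false alarm" stays a character value while the standalone "TRUE" and "false" cells are still recognised.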
Hi,
I wonder why I cannot define e.g.
s <- createStyle(numFmt = "0.00")
while the manual/vignette says so.
Regards,
Willi
Hi awalker89!
It seems write.xlsx exports my data, but some columns contain NA cells. I think it's an encoding problem.
You have an example data source (data_patho_2012) in my github (GabSaniR repository).
Do you have any solution? Thanks.
When I create a new Excel file with openxlsx and write a data.frame to it, the text encoding is fine. But when I load the Excel file afterwards, add a sheet and data and save it again with openxlsx, the encoding of the existing sheet is broken (weird characters instead of German umlauts).
It seems Microsoft Excel 2003 (with Microsoft Office Compatibility Pack version 12.0.4518.1018) can't open a file written by openxlsx.
However, the same file is opened by OpenOffice/Libreoffice without troubles.
Not a high priority issue IMHO.
cheers,
Luca
Working out how github works.
It would be great if there was a function like autoSizeColumn (xlsx Package), that automatically adjusts the width of each column based on the column's content. An awesome addition would be if I could set a maximum width.
Hi,
I noticed that adding another style on top of textDecoration with stack=T does not stack, whereas adding textDecoration on top of another style with stack=T works fine.
Thanks!
Floris
It would be very helpful if this function provided a parameter like colClasses="character" instead of guessing the class of each variable, which might not be a robust way of reading data.
I constantly run into an "Error: index out of bounds" issue, and I'm not sure what I'm doing wrong.
wb<-createWorkbook()
addWorksheet(wb, sheetName = "Hard_Benefits")
writeDataTable(wb,sheet = "Hard_Benefits",Hard_Benefits, rowNames = F)
Error: index out of bounds
addWorksheet(wb, sheetName = "Soft_Benefits")
writeDataTable(wb,sheet = "Soft_Benefits",Soft_Benefits, rowNames = F)
Error: index out of bounds
addWorksheet(wb, sheetName = "Other_Costs")
writeDataTable(wb, sheet = "Other_Costs", Other_Costs, rowNames = F)
Error: index out of bounds
addWorksheet(wb, sheetName = "Raw_Data")
writeDataTable(wb, sheet = "Raw_Data",x = data, rowNames = F)
saveWorkbook(wb,"Output.xlsx",overwrite = T)
Below are the data tables I'm trying to write into excel 2010: (crossed out data are irrelevant)
This is a really killer effort. I wonder if you would consider adding the option to read.xls to indicate which rows and columns you want to import. I frequently get data sourced that has stuff pasted all over in various cells and I would love to drop xlsx as a dependency so I don't have to use rJava.
A few styling requests to add to the queue. They may be available now but I wasn't sure.
Vertical text. It would be helpful for long headers.
Significant Figures / Rounding. Often the exported floats are unreasonably large in terms of precision.
A way to map logical values in R to excel for conditional formatting. Often I have row/column/cell logicals and it's a bit tricky to use in excel. I started a function but thought you'd have something better.
Daryl
openxlsx does not seem to work with *.xlsm files containing macros. What I'm trying to do is open an xlsm file, write some data and save it. The file extension xlsx is added automatically, resulting in a .xlsm.xlsx file with all macros removed.
The package xlsx on the other hand does not have any issues with xlsm.
For the convertToDateTime function, can the behavior for dealing with a NaN or NA in a list of Excel date time numeric values be to give back an NA instead of halting the entire conversion? I currently get this as an error if there are NaN values in a list of dates:
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
Thanks!
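The requested behaviour can be sketched in base R. `excel_to_datetime` is a hypothetical helper, not the package's convertToDateTime, and the origin "1899-12-30" assumes the workbook uses the 1900 date system.

```r
# Sketch: NA-tolerant conversion of Excel serial date-times (1900 date system).
# `excel_to_datetime` is a hypothetical helper, not the package's
# convertToDateTime(); the origin "1899-12-30" assumes the 1900 date system.
excel_to_datetime <- function(x, tz = "UTC") {
  secs <- rep(NA_real_, length(x))
  ok <- is.finite(x)                    # FALSE for NA, NaN and Inf
  secs[ok] <- x[ok] * 86400             # days -> seconds
  as.POSIXct(secs, origin = "1899-12-30", tz = tz)
}

serials <- c(25569.5, NaN, NA, 25570)   # 25569 is 1970-01-01
converted <- excel_to_datetime(serials)
print(converted)
```

Missing inputs come back as NA in the same positions, so the result stays aligned with the original column.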
It would be great to define a separate header and footer for each worksheet and/or to use the Excel built-in functions (e.g. &[Page] of &[Pages] or &[Tab], which end up in a corrupt file at the moment).
Once published, this package could be very helpful for externalizing common "complex" (not matrix-like) objects management for writeData.
Daily usage of openxlsx should be something like
writeData(x = tidy(lm.model), ......)
instead of current
writeData(x = lm.model, ......)
No need to declare it as a dependency/import too ...
What do you think? Do you prefer to have the package's own methods?
best,
Luca
When you open files created by openxlsx in Excel and try to add a sheet using Excel you get the error message "That command cannot be used on multiple selection". Something seems to be corrupted there.
Hi,
First off, thanks for developing the best R-excel package available. This package has been a great aid in my work.
I'd like to request that formatting become stackable, i.e. when you add a "bold" style to a cell you can subsequently add an "orange" fill to the same cell. Currently, if you were to add the orange fill to the bold cell, it would overwrite the bold style. Perhaps such a mechanism would be possible using a getStyle(cellIndices) function?
Regards,
Floris
Currently read.xlsx in openxlsx returns dates as numbers, presumably the number of days from the relevant origin.
But given the inconsistency (also here) of the origin in Excel between platforms, this date format is not straightforward to parse correctly. It would be more useful (and less error-prone) to instead return a string with the date as formatted in the relevant cell.
This would also make it consistent with the treatment of dates by the xlsx package.
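To illustrate why the origin matters, here is a base R sketch that converts serial numbers under both Excel date systems. `excel_to_date` and its `date1904` argument are hypothetical names; the origins used are the conventional ones for the 1900 and 1904 systems.

```r
# Sketch: the same calendar day under the two Excel date systems.
# The origins are the commonly used ones: "1899-12-30" for the 1900 system
# (valid for serials after February 1900) and "1904-01-01" for the 1904 system.
excel_to_date <- function(serial, date1904 = FALSE) {
  origin <- if (date1904) "1904-01-01" else "1899-12-30"
  as.Date(serial, origin = origin)
}

d1900 <- excel_to_date(25569)                    # 1900 system
d1904 <- excel_to_date(24107, date1904 = TRUE)   # 1904 system
print(format(c(d1900, d1904)))
```

The two serials differ by 1462 days yet denote the same date, which is why a bare number is ambiguous without knowing the workbook's date system.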
This is a test, sorry
> getOption("scipen")
[1] 0
>
> wb <- createWorkbook()
> addWorksheet(wb, "Cars")
>
> x <- mtcars[1:6,]
> writeData(wb, "Cars", x, startCol = 2, startRow = 3, rowNames = TRUE)
>
> getOption("scipen")
$scipen
[1] 0
Instead of options("scipen"), you should use getOption("scipen") within the following files:
- wrappers.R
- writeData.R
- writeDataTable.R
e.g. exSciPen <- getOption("scipen") instead of exSciPen <- options("scipen")
Your present code returns a list of length one instead of a numeric. With each call of one of the above functions, this list is growing and openxlsx crashes at some point. Thanks for your package - it's really great!
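The difference between the two calls, and one common way to save and restore an option, can be sketched in base R. The temporary value 50 is arbitrary and chosen only for illustration.

```r
# Sketch: options() returns a list, getOption() returns the value itself.
as_list  <- options("scipen")      # list of length one, named "scipen"
as_value <- getOption("scipen")    # the bare numeric setting

print(class(as_list))
print(class(as_value))

# A common save-and-restore idiom: options(name = value) returns the old
# settings, which can be handed straight back to options().
old <- options(scipen = 50)
# ... code that needs the temporary setting ...
options(old)
restored <- getOption("scipen")
```

Handing the saved list back to options() restores the original value without wrapping it in another list.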