awalker89 / openxlsx

R package for .xlsx file reading and writing.

License: Other

R 90.13% C++ 9.30% C 0.58%

openxlsx's Issues

R CMD build openxlsx-master fails with ERROR

! LaTeX Error: File `tableStyles' not found.

Implement auto-detection of date columns in read.xlsx

As per title

saveWorkbook and relative paths

The test is

library(openxlsx)
getwd()

## Absolute path 
dir.create("absolute_path")
wb <- createWorkbook()
my.file <- paste(getwd(),
                 "absolute_path/test_path.xlsx",
                 sep = "/")
my.file
unlink(my.file)
addWorksheet(wb = wb, sheetName = "data.frame")
writeData(wb = wb, sheet = "data.frame", x = Indometh)
saveWorkbook(wb, my.file,  overwrite = TRUE)
list.files(path = "absolute_path",
           pattern = glob2rx("*.xlsx"))

## Relative path
dir.create("relative_path")
wb <- createWorkbook()
my.file <- "relative_path/test_path.xlsx"
my.file
unlink(my.file)
addWorksheet(wb = wb, sheetName = "data.frame")
writeData(wb = wb, sheet = "data.frame", x = Indometh)
saveWorkbook(wb, my.file,  overwrite = TRUE)
list.files(path = "relative_path",
           pattern = glob2rx("*.xlsx"))

sessionInfo()

What I get is

list.files(path = "absolute_path",
pattern = glob2rx("*.xlsx"))
[1] "test_path.xlsx"

list.files(path = "relative_path",
pattern = glob2rx("*.xlsx"))
character(0)

That is, saveWorkbook seems to fail silently when a relative path is given as file.

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=it_IT.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=it_IT.UTF-8        LC_COLLATE=it_IT.UTF-8    
 [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=it_IT.UTF-8   
 [7] LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] datasets  grDevices graphics  utils     stats     methods   base     

other attached packages:
[1] openxlsx_1.0.4

loaded via a namespace (and not attached):
[1] Rcpp_0.11.1
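
One way to sidestep the relative-path failure reported above might be to resolve the path against the working directory before handing it to saveWorkbook. The following is a minimal sketch in base R; the helper name to_absolute is hypothetical (it is not part of openxlsx), and it assumes the target directory already exists.

```r
# Hypothetical helper (not part of openxlsx): turn a possibly relative
# file path into an absolute one. Only the directory is normalized,
# because the file itself may not exist yet.
to_absolute <- function(path) {
  file.path(normalizePath(dirname(path), mustWork = TRUE), basename(path))
}

# Usage, assuming the "relative_path" directory from the test exists:
# saveWorkbook(wb, to_absolute("relative_path/test_path.xlsx"), overwrite = TRUE)
```

Normalizing only the directory is a deliberate choice here: on some platforms normalizePath leaves a path to a file that does not exist yet unchanged.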

scientific notation formatting (feature request)

This would be helpful as a new class option.

Using the package without attaching it

It should be possible to use the package without attaching it via library. The following happens with 39b3873 on my machine:

$ Rscript --vanilla -e "names(openxlsx::loadWorkbook('a.xlsx'))"
Error in Workbook$new : could not find function "loadMethod"
Calls: <Anonymous> -> createWorkbook -> $
Execution halted
$ Rscript --vanilla -e "library(openxlsx); names(loadWorkbook('a.xlsx'))"
Error in Workbook$new : could not find function "loadMethod"
Calls: loadWorkbook -> createWorkbook -> $
Execution halted
$ Rscript --vanilla -e "library(methods); names(openxlsx::loadWorkbook('a.xlsx'))"
NULL
$ Rscript --vanilla -e "library(methods); library(openxlsx); names(loadWorkbook('a.xlsx'))"
<works>
$ Rscript --vanilla -e "library(openxlsx); library(methods); names(loadWorkbook('a.xlsx'))"
<works>

From what I remember, you have to (at least) declare a dependency on the methods package and import it. But the third example suggests that there's more to it.
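
Until the package declares and imports methods itself, a script can guard against the failure by attaching methods alongside openxlsx, as the last two examples do. A minimal sketch, assuming an R version in which Rscript does not attach methods by default:

```r
# Attach 'methods' explicitly before using a package that relies on
# reference classes; this is harmless if it is already attached.
if (!"package:methods" %in% search()) {
  library(methods)
}

# The calls from the report would then follow:
# library(openxlsx)
# names(loadWorkbook("a.xlsx"))
```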

Reading error if there is content above the data

A weird bug that appears when reading a limited group of columns starting from a row > 1. If there is content above the data, colnames and data are messed up.

Creating a minimal example

library(openxlsx)
wb <- createWorkbook()

# adding 4 sheets (would be nice if this could be vectorized)
addWorksheet(wb, "first")
addWorksheet(wb, "third0")
addWorksheet(wb, "third1")
addWorksheet(wb, "third2")


## Need data on worksheet to see all headers and footers
writeData(wb, sheet = 1, CO2)
writeData(wb, sheet = 2, startRow = 3, CO2)
writeData(wb, sheet = 3, startRow = 3, CO2)
writeData(wb, sheet = 4, startRow = 3, CO2)

writeData(wb, sheet = 3, startRow = 1, "lineoftext")
writeData(wb, sheet = 4, startRow = 1, c("lineoftext", "anotherlineoftext"))

## Save workbook
saveWorkbook(wb, "min.xlsx", overwrite = TRUE)

The file min.xlsx contains four versions of identical data:

in sheet 'first', the data starts on the first row
in sheet 'third0', the data starts on the third row, with 0 lines of content above the data
in sheet 'third1', the data starts on the third row, with 1 line of text in the first row
in sheet 'third2', the data starts on the third row, with 2 lines of text in the first and second rows

Replicating the bug

Reading data from sheet 'third1' and 'third2' leads to wrong colnames and data in the following calls:

read.xlsx("min.xlsx", sheet = "third1", startRow = 3, cols = 1:2)
#    Plant  1
#1    Qn1  6
#2    Qn1  6

read.xlsx("min.xlsx", sheet = "third2", startRow = 3, cols = 1:2)
# Error in stringInds[[1]] : subscript out of bounds

However, reading all columns works as expected

read.xlsx("min.xlsx", sheet = "third1", startRow = 3, cols = 1:5)
read.xlsx("min.xlsx", sheet = "third2", startRow = 3, cols = 1:5)
#    Plant        Type  Treatment conc uptake
#1    Qn1      Quebec nonchilled   95   16.0
#2    Qn1      Quebec nonchilled  175   30.4

Also, reading from the first sheet works fine

read.xlsx("min.xlsx", sheet = "first", startRow = 1, cols = 1:2)
#    Plant        Type
#1    Qn1      Quebec
#2    Qn1      Quebec

excessive styling breaks cell data

Some of the files I work with have excessive inline formatting, such as
X1<font=font1>1, which is really just X11,
like what I get from sharedStrings.xml in the following image:

This breaks the value in the cell: "X11" becomes "X" in the data frame loaded via openxlsx.
I wonder if it is possible for openxlsx to accept user-defined regex rules for data cleaning,
or to provide an option to dump these XML fragments into the data frame for the user to process further.
E.g. for the image above, I loaded the file in Python and replaced the following patterns:

<rPr>.*?</rPr><
<phoneticPr [^<]+<
</t></r><r><t>([^<])

after that openxlsx works just fine.
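
The same kind of cleanup can be sketched in base R. This is only an illustration of the idea, not how openxlsx parses shared strings: the function name flatten_rich_text and the sample XML entry are made up for the example, and the patterns assume each shared-string entry is a single-line string.

```r
# Hypothetical helper: collapse a rich-text shared-string entry to its
# plain text by dropping run properties and then all remaining tags.
flatten_rich_text <- function(si) {
  si <- gsub("<rPr>.*?</rPr>", "", si, perl = TRUE)  # per-run formatting
  si <- gsub("<phoneticPr[^>]*/>", "", si)           # phonetic settings
  gsub("<[^>]+>", "", si)                            # remaining tags
}

# Made-up entry in which "X11" is split into two formatted runs.
si <- paste0(
  "<si><r><t>X1</t></r>",
  "<r><rPr><sz val=\"11\"/><rFont val=\"font1\"/></rPr><t>1</t></r></si>"
)
flatten_rich_text(si)
```

Stripping tags with regular expressions is fragile compared with a real XML parser, so this is best treated as a quick pre-processing step for inspecting such files.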

textDecoration is broken

Hey, thanks for the update on stackable styles, it works great. One thing I noticed is that text decorations stopped working. Could you look into this?

Eg. the following results in unformatted text

Test

wb = createWorkbook()
sBoldText = createStyle(textDecoration="bold") ## Replace bold with anything here (eg. underline, strikeout)
ws = addWorksheet(wb, sheetName = "1", gridLines = T)
writeData(wb, sheet=ws, startCol=1, startRow=1, "test")
addStyle(wb, ws, rows=1, cols=1, style=sBoldText)
saveWorkbook(wb, "textstyle.xlsx", overwrite = TRUE)

Feature request: more complex conditional formatting

Hey,

Thanks for the quick fixes and response to feedback! I really like conditional formatting, and I hope you could consider the following improvements over time:

- Conditional formatting based on cells containing a certain text
- Conditional formatting based on duplicate or unique values
- Conditional formatting based on empty cells
- Allow setting of intervals for 2- or 3-color scales, e.g. by adding two parameters, type and value. The type could be set to integer, with an integer value then given in the value parameter.

Thanks!

Floris

Corrupt file

Thanks for the package.

On my system, I can read a file and write it, but Excel reports the result as corrupt. This happens even with a simple file such as:
write.xlsx(iris, file = "writeXLSX1.xlsx", colNames = TRUE, borders = "rows")

I am sending the corresponding saved file by email.

Disregard merged columns when calculating automatic column width

I am writing crosstables to Excel, and to add a caption for the table, I use one row with merged cells above the table.

However, this clashes with the nicely working auto option for the column widths, as the column where the caption starts is expanded to accommodate the entire caption string, which is not needed.

Is it possible to disregard the contents of merged cells when calculating the width of the individual columns that these merged cells span?

If not (or if it is too specific a feature), is there a workaround?

Worksheet order incorrect after writing

When opening an existing workbook, adding sheets to it and saving it, the sheets are not necessarily being added after the last existing sheet (intended), but after the first existing sheet. When calling worksheetOrder(wb) it does spit out the intended order, but writes it differently anyway.

Btw.: Thank you so much for your effort! I've posted a few issues now and I'm really glad that you're fixing them in almost no time! Keep up the good work!

define table name

Hi, thanks for this neat package, which makes working with Excel files really easy.
It would be nice to be able to specify the table name with writeDataTable.

BTW openxlsx removes pivot tables in saved new files.
Not a big deal though since the source sheets can be saved in a separate file.

Embedded newlines in cell values

...are converted to spaces. Is this by design?

Encoding issue in read.xlsx

read.xlsx("c:/book1.xlsx",sheet=1)
Error in tolower(sharedStrings) : invalid multibyte string 4

Original code in read.xlsx function:
z <- tolower(sharedStrings)
sharedStrings[z == "true"] <- "TRUE"
sharedStrings[z == "false"] <- "FALSE"

Can be revised as follows:
z <- tolower(sharedStrings)
Encoding(z) <- "UTF-8"
sharedStrings[z == "true"] <- "TRUE"
sharedStrings[z == "false"] <- "FALSE"
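
Because the error is raised by tolower() itself, the encoding would presumably have to be declared on sharedStrings before that call rather than on z afterwards. A minimal sketch of that ordering, assuming the strings really are UTF-8 bytes (the sample vector is made up):

```r
# Made-up sample: two boolean-like strings and one multibyte string.
sharedStrings <- c("True", "false", "\u4e2d\u6587")

# Declare the encoding first, so tolower() knows how to read the bytes.
Encoding(sharedStrings) <- "UTF-8"

z <- tolower(sharedStrings)
sharedStrings[z == "true"] <- "TRUE"
sharedStrings[z == "false"] <- "FALSE"
```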

openxlsx Homepage links to "404 There isn't a GitHub Page here."

thanks for the great package.

The link on your Git Page http://awalker89.github.io/openxlsx/ does not work.

Christof

Read xlsx file containing Chinese character using `read.xlsx` shows error message

Hi! I'm working with data including Chinese multibyte string.
When using read.xlsx, if there is any Chinese character anywhere in the file (even in a sheet that is not being read), it causes an error like the ones below:

# Error in tolower(sharedStrings) : invalid multibyte string 414

 # Error in tolower(sharedStrings) : invalid multibyte string 2

Special characters not valid in sheet name

Hi Alex

There seems to be an issue with local characters.

This produces a broken xlsx file. Notice the "ø" in "Iriø". The file can't be repaired.

require(openxlsx)
wb <- createWorkbook()
addWorksheet(wb, sheetName = "Iriø")
saveWorkbook(wb=wb, file="basics1.xlsx", overwrite=TRUE)

This also produces a broken xlsx file, but the file can be repaired.

require(openxlsx)
wb <- createWorkbook()
addWorksheet(wb, sheetName = "Iris")
saveWorkbook(wb=wb, file="basics2.xlsx", overwrite=TRUE)

My session info.

R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252   
[3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C                   
[5] LC_TIME=Danish_Denmark.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_0.9.3.1 openxlsx_1.0.4  devtools_1.5   

loaded via a namespace (and not attached):
 [1] colorspace_1.2-4 digest_0.6.4     evaluate_0.5.3   grid_3.1.0       gtable_0.1.2    
 [6] httr_0.3         MASS_7.3-31      memoise_0.2.1    munsell_0.4.2    parallel_3.1.0  
[11] plyr_1.8.1       proto_0.3-10     Rcpp_0.11.1      RCurl_1.95-4.1   reshape2_1.4    
[16] scales_0.2.4     stringr_0.6.2    tools_3.1.0      whisker_0.3-2

This should be the current dev version here from github.

corrupt xlsx

Not sure what causes this one.

The first time I run my code the output is corrupt. The second time it works.

corrupt_example.xlsx

can't copy cells then excel crashes

It is not possible to select cells; Excel shows some error messages and then crashes.

i.e.
"cannot make changes to table or xml mapping when multiple sheets are selected"
"cannot modify the contents of a table total row"

Selecting several tabs before interacting with the document resolves the problem.

The code worked before an update from git on Sept 22.

Example xlsx
https://db.tt/2zekiE8T

conditionalFormat with a specific value from a variable

I have a numerical variable with one value.

mean_a <- mean(df$a)

Now, I would like to use this value for conditional formatting on all cells in column number 2.

conditionalFormat(df_wb, 1, cols = 2, rows=2:(nrow(df)+1), rule='>=mean_a', style = posyellowflagStyle)

However, the conditionalFormat line above does not work.

Can one use a variable's value to set the rule in the conditionalFormat function?

Thank you for your help!
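
Since the rule is passed as a character string, the name mean_a inside the quotes is never evaluated by R. One likely fix is to build the string from the value. A minimal sketch with made-up data; the conditionalFormat call is left as a comment because df_wb and posyellowflagStyle come from the question and are not defined here:

```r
# Made-up data standing in for the asker's df.
df <- data.frame(a = c(1, 2, 3), b = c(0.5, 2.5, 4))
mean_a <- mean(df$a)

# Paste the numeric value into the rule instead of the variable name.
rule <- paste0(">=", mean_a)

# conditionalFormat(df_wb, 1, cols = 2, rows = 2:(nrow(df) + 1),
#                   rule = rule, style = posyellowflagStyle)
```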

removeWorksheet throws error

Calling removeWorksheet throws the following error if the last remaining sheet is removed:
Error in *tmp*[[i]] : subscript out of bounds

The sheet is removed anyway, though.

Mandatory ".xlsx" extension in name with Shiny framework

Shiny writes output to a temp file with a name like "file818612a4ca0" when the client tries to download it. This name is passed to the write.xlsx() function, which throws the error "File name must end with '.xlsx'".

Code:

  output$downloadOverall <- downloadHandler(
    filename = function() { "name.xlsx" },
    content = function(file) {
      #"data" is a dataframe
      write.xlsx(data,file)
    }
  )

I suggest making the extension check optional via a parameter of write.xlsx() that is passed on to saveWorkbook().
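
Until such a parameter exists, a possible workaround is to write to a temporary file that does end in ".xlsx" and then copy it over the path Shiny provides. A minimal sketch; write_via_xlsx_tempfile is a hypothetical helper, not an openxlsx or Shiny function:

```r
# Hypothetical helper: let `writer` write to a temp file with an .xlsx
# extension, then copy the result to `target` (whatever its name is).
write_via_xlsx_tempfile <- function(target, writer) {
  tmp <- tempfile(fileext = ".xlsx")
  on.exit(unlink(tmp), add = TRUE)
  writer(tmp)
  file.copy(tmp, target, overwrite = TRUE)
}

# Inside the download handler from the report, this might look like:
# content = function(file) {
#   write_via_xlsx_tempfile(file, function(path) write.xlsx(data, path))
# }
```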

Error: expecting a string on saveWorkbook

In the current build, saveWorkbook no longer works:

Error: expecting a string
Called from: .self$writeSheetDataXML(xldrawingsDir, xldrawingsRelsDir, xlworksheetsDir,
xlworksheetsRelsDir)

Feature request : read a sheet from a Workbook object

The loadWorkbook() function allows creating a Workbook object from an xlsx file. I didn't find any function to transform a sheet from this object into a data.frame. I tried read.xlsx and readWorkbook, but both read from a file, not from a Workbook object.

So a readSheet function or, better, a read.xlsx method for Workbook would be useful.

If this function already exists, maybe add its name to the documentation page of the loadWorkbook() function.

And thank you again for this very useful package.

read.xlsx cannot read from a shiny selected file

I would like to be able to use fileInput from the shiny package--it creates a temporary file that does not end in "xlsx", which causes read.xlsx to fail because of the following check.

 if (!grepl("xlsx$", xlsxFile)) 
        stop("File must have extension .xlsx!")

Corrupted xlsx when writing formulas

Love the package, thanks!

The use case that I find particularly helpful is being able to load preformatted xlsx templates for writing to.

Can the package handle writing formulas? If I try and write hyperlinks as below, the xlsx is corrupted. I saw that you can assign a hyperlink style but some of the urls are quite long so alt text is necessary.

=HYPERLINK("http://www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=gene&ln=TNXB","TNXB")

build failed when creating vignettes.

R 3.1.1 on Linux. Installing via install_github worked a week ago, when I verified the inline formatting cleaning.
The build has failed since commit ad2574b.

$ R CMD build openxlsx-master

checking for file ‘openxlsx-master/DESCRIPTION’ ... OK
preparing ‘openxlsx’:
checking DESCRIPTION meta-information ... OK
cleaning src
installing the package to build vignettes
creating vignettes ... ERROR
Warning in remind_sweave(if (in.file) input, sweave_lines) :
It seems you are using the Sweave-specific syntax in line(s) 23; you may need Sweave2knitr("Introduction.Rnw") to convert it to knitr
Warning: running command 'kpsewhich framed.sty' had status 1
Warning in test_latex_pkg("framed", system.file("misc", "framed.sty", package = "knitr")) :
unable to find LaTeX package 'framed'; will use a copy from knitr
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'Introduction.tex' failed.
LaTeX errors:
! Undefined control sequence.
l.72 \SweaveOpts
{concordance=TRUE}
The control sequence at the end of the top line
of your error message was never \def'ed. If you have
Calls: -> texi2pdf -> texi2dvi
Execution halted

text to boolean on read

The read.xlsx function grepls for "true" or "false" in the shared strings xml and converts those to the R boolean types. This is a problem if there are character cells being read in that contain the words "true" or "false" but are not booleans in their own right (this has come up as I am importing records of alarm systems that contain categories like "examined, false alarm").

Perhaps this would be better looking for an exact match rather than searching for "true" or "false" anywhere within the string.
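
The difference between the two approaches can be seen in a small base R comparison. The sample strings are made up, with the last one modeled on the alarm category mentioned above:

```r
x <- c("true", "FALSE", "examined, false alarm")

# Substring search flags every element, including the free-text one.
substring_hit <- grepl("true|false", tolower(x))

# Exact matching flags only the cells that are entirely "true"/"false".
exact_hit <- tolower(x) %in% c("true", "false")
```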

error: invalid numFmt

Hi,

I wonder why I cannot define e.g.

s <- createStyle(numFmt = "0.00")

while the manual/vignette says so.

Regards,
Willi

Output cell NA

Hi awalker89!
It seems that write.xlsx exports my data, but some columns contain NA cells. I think it is an encoding problem.

There is an example data source (data_patho_2012) on my GitHub (GabSaniR repository).

Do you have a solution? Thanks.

Broken text encoding after loadWorkbook and saveWorkbook

When I create a new excel file with openxlsx and write a data.frame to it, the text encoding is fine. But when I load the excel file afterwards, add a sheet and data and save it again with openxlsx, the encoding of the existing sheet is broken (weird characters instead of german umlauts).

Excel 2003 and openxlsx

It seems Microsoft Excel 2003 (with Microsoft Office Compatibility Pack version 12.0.4518.1018) can't open a file written by openxlsx.
However, the same file is opened by OpenOffice/Libreoffice without troubles.
Not a high priority issue IMHO.
cheers,
Luca

How does this issue stuff work

Working out how github works.

Feature request: Auto column width

It would be great if there was a function like autoSizeColumn (xlsx Package), that automatically adjusts the width of each column based on the column's content. An awesome addition would be if I could set a maximum width.
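
The requested behaviour can be approximated by hand in base R by measuring the longest entry per column and capping the result. A rough sketch; the function estimate_col_widths and its max_width and padding parameters are hypothetical, and character counts are only a proxy for rendered width in Excel:

```r
# Hypothetical helper: one width per column, based on the longest
# header or cell value (in characters), plus padding, capped at max_width.
estimate_col_widths <- function(df, max_width = 50, padding = 2) {
  widths <- vapply(seq_along(df), function(i) {
    max(nchar(c(names(df)[i], as.character(df[[i]]))), na.rm = TRUE)
  }, integer(1))
  setNames(pmin(widths + padding, max_width), names(df))
}

df <- data.frame(id = 1:3,
                 name = c("a", "bb", "a much longer value here"),
                 stringsAsFactors = FALSE)
widths <- estimate_col_widths(df, max_width = 10)
```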

textDecoration non stackable

Hi,

I noticed that adding another style on top of textDecoration with stack=T does not stack, whereas adding textDecoration on top of another style with stack=T works fine.

Thanks!
Floris

Will `read.xlsx` support selecting specific class type of variables?

It would be very helpful if this function provided a parameter like colClasses="character" instead of guessing the class of each variable, which may not be a robust way of reading data.

Error: Index out of bound

I constantly run into an "Error: index out of bounds" and am not sure what I am doing wrong.

wb<-createWorkbook()
addWorksheet(wb, sheetName = "Hard_Benefits")
writeDataTable(wb,sheet = "Hard_Benefits",Hard_Benefits, rowNames = F)
Error: index out of bounds

addWorksheet(wb, sheetName = "Soft_Benefits")
writeDataTable(wb,sheet = "Soft_Benefits",Soft_Benefits, rowNames = F)
Error: index out of bounds

addWorksheet(wb, sheetName = "Other_Costs")
writeDataTable(wb, sheet = "Other_Costs", Other_Costs, rowNames = F)
Error: index out of bounds

addWorksheet(wb, sheetName = "Raw_Data")
writeDataTable(wb, sheet = "Raw_Data",x = data, rowNames = F)

saveWorkbook(wb,"Output.xlsx",overwrite = T)

Below are the data tables I'm trying to write into excel 2010: (crossed out data are irrelevant)

Request: colIndex and rowIndex

This is a really killer effort. I wonder if you would consider adding an option to read.xlsx to indicate which rows and columns you want to import. I frequently get data sources that have stuff pasted all over in various cells, and I would love to drop xlsx as a dependency so I don't have to use rJava.

Formatting (Feature Request)

A few styling requests to add to the queue. They may be available now but I wasn't sure.

- Vertical text. It would be helpful for long headers.
- Significant Figures / Rounding. Often the exported floats are unreasonably large in terms of precision.
- A way to map logical values in R to excel for conditional formatting. Often I have row/column/cell logicals and it's a bit tricky to use in excel. I started a function but thought you'd have something better.

Daryl

Opening and writing in xlsm files (with macros preserved)

openxlsx does not seem to work with *.xlsm files containing macros. What I'm trying to do is open an xlsm file, write some data and save it. The file extension xlsx is added automatically, resulting in a .xlsm.xlsx file with all macros removed.

The package xlsx on the other hand does not have any issues with xlsm.

convertToDateTime

For the convertToDateTime function, can the behavior for dealing with a NaN or NA in a list of Excel date time numeric values be to give back an NA instead of halting the entire conversion? I currently get this as an error if there are NaN values in a list of dates:

Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format

Thanks!
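
The requested behaviour can be sketched in base R. This is an illustration and not the package's convertToDateTime: the function name excel_to_datetime is hypothetical, and it assumes the 1900 date system (origin 1899-12-30) and UTC.

```r
# Hypothetical NA-tolerant conversion of Excel serial date-times.
# Non-finite inputs (NA, NaN, Inf) become NA instead of raising an error.
excel_to_datetime <- function(x, origin = "1899-12-30", tz = "UTC") {
  x <- as.numeric(x)
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  ok <- is.finite(x)
  # Serial values count days, so convert to seconds before adding.
  out[ok] <- as.POSIXct(x[ok] * 86400, origin = origin, tz = tz)
  out
}

res <- excel_to_datetime(c(41883.5, NA, NaN))
```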

Define header and footer for each worksheet (Feature Request)

It would be great to be able to define a separate header and footer for each worksheet and/or to use Excel's built-in functions (e.g. &[Page] of &[Pages] or &[Tab], which currently end up producing a corrupt file).

read.xlsx can read .xlsm files, but error checking prevents it

package broom

Once published, this package could be very helpful for delegating the handling of common "complex" (not matrix-like) objects passed to writeData.
Daily usage of openxlsx would then be something like

writeData(x = tidy(lm.model), ......)

instead of current

writeData(x = lm.model, ......)

No need to declare it as a dependency/import too ...

What do you think? Do you prefer to have the package's own methods?

best,
Luca

Cannot add sheets to created workbook in Excel

When you open files created by openxlsx in Excel and try to add a sheet using Excel you get the error message "That command cannot be used on multiple selection". Something seems to be corrupted there.

Feature request: stackable formatting

Hi,

First off, thanks for developing the best R-excel package available. This package has been a great aid in my work.

I'd like to request that formatting become stackable, i.e. when you add a "bold" style to a cell, you can subsequently add an "orange" fill to the same cell. Currently, adding the orange fill overwrites the bold style. Perhaps such a mechanism would be possible using a getStyle(cellIndices) function?

Regards,
Floris

Read dates as text

Currently read.xlsx in openxlsx returns dates as numbers, presumably the number of days from the relevant origin.

But given the inconsistency of the origin in Excel between platforms, this date format is not straightforward to parse correctly. It would be more useful (and less error-prone) to instead return a string with the date as formatted in the relevant cell.

This would also make it consistent with the treatment of dates by the xlsx package.

Test

This is a test, sorry

restoring of initial option scipen fails

> getOption("scipen")
[1] 0
> 
> wb <- createWorkbook()
> addWorksheet(wb, "Cars")
> 
> x <- mtcars[1:6,]
> writeData(wb, "Cars", x, startCol = 2, startRow = 3, rowNames = TRUE)
> 
> getOption("scipen")
$scipen
[1] 0

Instead of options("scipen"), you should use getOption("scipen") in the following files:

- wrappers.R
- writeData.R
- writeDataTable.R

e.g. exSciPen <- getOption("scipen") instead of exSciPen <- options("scipen")

Your present code returns a list of length one instead of a numeric. With each call of one of the above functions, this list is growing and openxlsx crashes at some point. Thanks for your package - it's really great!
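
The difference between the two calls, and one common way to restore an option safely, can be sketched in base R. The wrapper with_scipen is a hypothetical name used only for illustration and is not part of openxlsx:

```r
# options("scipen") returns a named list; getOption("scipen") returns
# the bare value.
as_list  <- options("scipen")
as_value <- getOption("scipen")

# Hypothetical wrapper: set scipen temporarily and always restore it.
# options(scipen = value) returns the old setting as a list, which can
# be passed straight back to options() on exit.
with_scipen <- function(value, code) {
  old <- options(scipen = value)
  on.exit(options(old), add = TRUE)
  force(code)
}

before <- getOption("scipen")
formatted <- with_scipen(100, format(1e10))
after <- getOption("scipen")
```

Passing the list returned by options() back to options() restores the previous value without growing any structure, which avoids the accumulation described above.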
