
ropensci / readods


Read ODS (OpenDocument Spreadsheet) files into R as data frames. Also supports writing data frames into ODS files.

Home Page: https://docs.ropensci.org/readODS/

License: Other

R 21.20% C++ 78.79% Makefile 0.01%

readods's Introduction

readODS


The only goal of readODS is to enable R to read and write OpenDocument Spreadsheet (ODS) files. This package supports both ordinary ODS and “Flat ODS” (fods).

Installation

Install the latest stable version from CRAN:

install.packages("readODS")

Or from R-universe:

install.packages("readODS", repos = "https://ropensci.r-universe.dev")

Or install the development version from GitHub:

remotes::install_github("ropensci/readODS")

Usage

In almost all use cases, you only need two functions: read_ods and write_ods. Simple.

Reading

library(readODS)
read_ods("starwars.ods")
#> # A tibble: 10 × 3
#>    Name               homeworld species
#>    <chr>              <chr>     <chr>  
#>  1 Luke Skywalker     Tatooine  Human  
#>  2 C-3PO              Tatooine  Human  
#>  3 R2-D2              Alderaan  Human  
#>  4 Darth Vader        Tatooine  Human  
#>  5 Leia Organa        Tatooine  Human  
#>  6 Owen Lars          Tatooine  Human  
#>  7 Beru Whitesun lars Stewjon   Human  
#>  8 R5-D4              Tatooine  Human  
#>  9 Biggs Darklighter  Kashyyyk  Wookiee
#> 10 Obi-Wan Kenobi     Corellia  Human

Reading from the 2nd sheet

read_ods("starwars.ods", sheet = 2)
#> # A tibble: 10 × 8
#>    Name           height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>           <dbl> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
#>  1 Luke Skywalker    172    77 blond      fair       blue            19   male  
#>  2 C-3PO             202   136 none       white      yellow          41.9 male  
#>  3 R2-D2             150    49 brown      light      brown           19   female
#>  4 Darth Vader       178   120 brown, gr… light      blue            52   male  
#>  5 Leia Organa       165    75 brown      light      blue            47   female
#>  6 Owen Lars         183    84 black      light      brown           24   male  
#>  7 Beru Whitesun…    182    77 auburn, w… fair       blue-gray       57   male  
#>  8 R5-D4             188    84 blond      fair       blue            41.9 male  
#>  9 Biggs Darklig…    228   112 brown      unknown    blue           200   male  
#> 10 Obi-Wan Kenobi    180    80 brown      fair       brown           29   male

Reading from a specific range

read_ods("starwars.ods", sheet = 2, range = "A1:C11")
#> # A tibble: 10 × 3
#>    Name               height  mass
#>    <chr>               <dbl> <dbl>
#>  1 Luke Skywalker        172    77
#>  2 C-3PO                 202   136
#>  3 R2-D2                 150    49
#>  4 Darth Vader           178   120
#>  5 Leia Organa           165    75
#>  6 Owen Lars             183    84
#>  7 Beru Whitesun lars    182    77
#>  8 R5-D4                 188    84
#>  9 Biggs Darklighter     228   112
#> 10 Obi-Wan Kenobi        180    80

Reading as a data frame

read_ods("starwars.ods", range="Sheet1!A2:C11", as_tibble = FALSE)
#>       Luke.Skywalker Tatooine   Human
#> 1              C-3PO Tatooine   Human
#> 2              R2-D2 Alderaan   Human
#> 3        Darth Vader Tatooine   Human
#> 4        Leia Organa Tatooine   Human
#> 5          Owen Lars Tatooine   Human
#> 6 Beru Whitesun lars  Stewjon   Human
#> 7              R5-D4 Tatooine   Human
#> 8  Biggs Darklighter Kashyyyk Wookiee
#> 9     Obi-Wan Kenobi Corellia   Human

Writing

## preserve the row names
write_ods(mtcars, "mtcars.ods", row_names = TRUE)

Appending a sheet

write_ods(PlantGrowth, "mtcars.ods", append = TRUE, sheet = "plant")
## Default: First sheet
read_ods("mtcars.ods")
#> New names:
#> • `` -> `...1`
#> # A tibble: 32 × 12
#>    ...1          mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Mazda RX4    21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2 Mazda RX4 …  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3 Datsun 710   22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4 Hornet 4 D…  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5 Hornet Spo…  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6 Valiant      18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7 Duster 360   14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8 Merc 240D    24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9 Merc 230     22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10 Merc 280     19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ 22 more rows
read_ods("mtcars.ods", sheet = "plant", range = "A1:B10")
#> # A tibble: 9 × 2
#>   weight group
#>    <dbl> <chr>
#> 1   4.17 ctrl 
#> 2   5.58 ctrl 
#> 3   5.18 ctrl 
#> 4   6.11 ctrl 
#> 5   4.5  ctrl 
#> 6   4.61 ctrl 
#> 7   5.17 ctrl 
#> 8   4.53 ctrl 
#> 9   5.33 ctrl

Maximum Sheet Size

Reading: The maximum size of sheet you can read is determined by your machine’s RAM.

Writing: You can theoretically write sheets up to 16 384 columns by 1 048 576 rows (the current maximum sheet size in Excel and LibreOffice Calc). While larger ODS files than this are valid, they are not well supported. However, older versions of LibreOffice (<=7.3) and Excel (<=2003) have significantly smaller maximum sheet sizes, so this should be considered when writing files for distribution.
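If you want to check a data frame against the limits quoted above before writing it, a minimal sketch could look like this. The helper `fits_in_sheet` is hypothetical and not part of readODS; it assumes one extra row for the header and, optionally, one extra column for row names.

```r
# Limits quoted in the text above (columns by rows).
max_cols <- 16384
max_rows <- 1048576

# Hypothetical helper, not part of readODS: TRUE if `x` fits within the limits,
# counting one extra row for the header and one extra column for row names.
fits_in_sheet <- function(x, col_names = TRUE, row_names = FALSE) {
  n_rows <- nrow(x) + as.integer(col_names)
  n_cols <- ncol(x) + as.integer(row_names)
  n_rows <= max_rows && n_cols <= max_cols
}

fits_in_sheet(mtcars, row_names = TRUE)
```

Files meant for older LibreOffice or Excel versions would need smaller limits than the two constants used here.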

Misc

The logo of readODS is a remix of LibreOffice Calc v6.1 icon created by the Document Foundation. The original LibreOffice logo is licensed under the Creative Commons Attribution Share-Alike 3.0 Unported License. readODS is not a product of the Document Foundation. The logo of readODS is licensed under the Creative Commons Attribution Share-Alike 3.0 Unported License.

The creator of this package is Gerrit-Jan Schutten. The current maintainer is Chung-hong Chan. This package benefits from contributions by Peter Brohan, Thomas J. Leeper, John Foster, Sergio Oller, Jim Hester, Stephen Watts, Arthur Katossky, Stas Malavin, Duncan Garmonsway, Mehrad Mahmoudian, Matt Kerlogue, Detlef Steuer, Michal Lauer, and Till Straube.

This package emulates the behaviours of readxl::read_xlsx, writexl::write_xlsx and xlsx::write.xlsx.

This package should be a silent member of rio, so that you don’t need to care about file format any more.

License

GPL3

Contributing

Contributions in the form of feedback, comments, code, and bug reports are welcome.

Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

readods's People

Contributors

chainsawriot, phonixor, pbrohan, drjxf, mattkerlogue, michallauer, katossky, nacnudus, jimhester, mmahmoudian, zeehio, stas-malavin, leeper, ktiu, olivroy, scwatts


readods's Issues

"Modernize" the code

Also prepare this package for future ropensci submission

  • Using namespace operator
  • Use a CI Suite
  • covr

Memory error when reading medium sized file

Hi @chainsawriot, thanks for a great package! It's been working well for me until I tried this:

download_and_read_ods_file = function(x) {
  file_name = basename(x)
  message("Downloading file ", file_name, " from ", x)
  download.file(x, file_name)
  readODS::read.ods(file_name)
}
x = "https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/748222/jts0509.ods"
download_and_read_ods_file(x)

As reported by @joeytalbot this also seems to fail on Windows, with the following message:

Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html,  : 
  Memory allocation failed : growing input buffer [2]

It consumed 20+ GB RAM!

Cheers.

Increase Test coverage

The following have no test case.

  • Range parameter
  • na parameter
  • col_types parameter
  • Unicode Characters
  • ods files available online (any open dataset?)
  • ods files generated by MS Excel

broken XML

Cells containing a ">" are stored incorrectly:

<text:p>></text:p>

need to test if > is in the middle of a sentence... still FUCK .ods's fugly invalid xml format!

Skip rows argument

Be able to specify how many rows to skip at the top of a sheet, like read.table skip=.

write_ods: user interface for v1.7

Thanks to the pull request from @drjxf , this package is significantly improved.

@drjxf proposes to have a separate write_ods_2 which can update and append a new sheet to an existing ods file, which is very neat. It also makes #56 possible. However, there is no reason to split it into write_ods and write_ods_2. So, my proposal is to join them back together, so that we have only write_ods but it can append and update, like @drjxf 's write_ods_2.

@drjxf 's write_ods_2 uses combinations of overwrite_file and overwrite_sheet to govern its behaviors. Although the manual prepared by @drjxf has a truth table describing these behaviors, it is not very intuitive.

File exists  Sheet exists  overwrite_file  overwrite_sheet  Result
F            F             X               X                New file with named sheet
T            X             T               X                Overwrite file with named sheet
T            F             F               X                New sheet added to file
T            T             F               F                Exception thrown
T            T             F               T                Sheet replaced in file

After looking at how other packages (e.g. xlsx) deal with this, I think the parameters append and update are more intuitive.

File exists  Sheet exists  append  update  Result
F            /             T or F  T or F  New file with named sheet
T            T or F        F       F       Overwrite the file with named sheet
T            F             T       T or F  New sheet appended to file
T            T             T       F       Error (duplicated sheet name), say there is the update parameter
T            T             T or F  T       Update content of the sheet in file
T            F             T or F  T       Error (No sheet)
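To make the proposed append/update behaviour easier to reason about, here is a rough sketch of the table as a decision function. `resolve_write_action` is a hypothetical helper, not code from the proposedv1.7 branch. The table overlaps when the file exists, the sheet does not, and both append and update are TRUE; this sketch assumes update is checked first, so that case is an error.

```r
# Hypothetical helper, not part of readODS: maps the conditions in the
# append/update table to the described result.
resolve_write_action <- function(file_exists, sheet_exists,
                                 append = FALSE, update = FALSE) {
  if (!file_exists) {
    return("new file")
  }
  if (update) {
    # Assumption: `update` takes precedence over `append`.
    if (!sheet_exists) stop("No sheet to update")
    return("update sheet")
  }
  if (append) {
    if (sheet_exists) stop("Duplicated sheet name; consider update = TRUE")
    return("append sheet")
  }
  "overwrite file"
}

resolve_write_action(file_exists = TRUE, sheet_exists = FALSE, append = TRUE)
```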

However, things that are intuitive to me, are not always intuitive to users. Therefore, may I ask all of you to try out

https://github.com/chainsawriot/readODS/tree/proposedv1.7

Or in R

devtools::install_github("chainsawriot/readODS", ref = "proposedv1.7")

and give me feedback? If many of you think that this interface makes sense, then I will merge it into the master.

This message is targeted to both @drjxf and @leeper (original author of the function write_ods) , of course, I appreciate feedback from other users as well.

Thank you very much!

when a column is a col_skip() type, then it should skip the column

When I specify some columns as col_skip(), read_ods() complains there are insufficient col_types.

For example, for a spreadsheet with 14 columns, when I set 3 of them to col_skip() I get the following output:

Insufficient `col_types`. Guessing 3 columns. 

If I replace col_skip() with something else, like col_character(), it does not complain. But that defeats what I am trying to accomplish.

appearance of multiple warning messages

Have been trying out your read.ods function and have a slight problem - not sure if it's me or the function itself.

I have an ODS file with 93 identical sheets. When I call the read.ods function for, for example, sheet 18, the function appears to read all sheets and sends the following pre-scripted warning message:

[1] "maybe make me a warning... but found no value for defined cell at sheet: 93 row: 33 col: 18"

It gives this warning for every empty cell in column 18 of every sheet, and takes 20-30 seconds to run through all these warnings. I don't know why it only focuses on column 18, as there are numerous columns with empty cells.

Oddly, when I set formulaAsFormula = T, the warning message does not appear, but it is not the return that I want (I want the data values, not the formulae behind them).

minor discrepancy in "write_ods" help file

The "Description" on this help page says it's a "Function to write a single data.frame to an ods file and return a data frame." Actually, it returns "the value of path invisibly.", as it says in the "Value" section. I suggest you shorten the Description to just, "Function to write a single data.frame to an ods file."

Thanks for your work on this. Spencer Graves

calcext:value-type

i should respect this!
and maybe even remove the functionality from smartSheet

need to investigate if this also helps with date/time bug

though it's hard... as data.frames don't allow different types per column....
so... mmmh... no clue

Multicolumn col_types throws warnings and not respected

From the PR #40 , @basilrabi reported an issue with col_types. His solution can fix the issue but can potentially introduce problems in some edge cases. e.g. col_types = c(NA, 1, 1, 2).

A more elegant solution is to test whether col_types is of the class col_spec first. If it is, use it; otherwise test for NULL, then test for NA.
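The order of checks suggested above could be sketched as follows. `classify_col_types` is a hypothetical helper, not code from readODS, and it only labels the input rather than converting anything. The NA test is restricted to length-one input, so a vector such as c(NA, 1, 1, 2) never reaches `if` as a multi-element condition.

```r
# Hypothetical helper: classify `col_types` in the order suggested above.
classify_col_types <- function(col_types) {
  if (inherits(col_types, "col_spec")) {
    "col_spec"   # a readr column specification: use it as given
  } else if (is.null(col_types)) {
    "null"
  } else if (length(col_types) == 1L && is.na(col_types)) {
    "na"
  } else {
    "other"      # e.g. c(NA, 1, 1, 2), the edge case mentioned above
  }
}

classify_col_types(NULL)
classify_col_types(NA)
classify_col_types(c(NA, 1, 1, 2))
```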

na= parameter is vectorized (this is not documented and maybe not the desired behaviour)

Writing read_ods(sheet='myfile', na=c('', '?', 'missing')) is interpreted as:

  • use '' for NA for the first variable
  • use ? for NA for the second variable
  • use missing for the third
  • start over again with '' for the fourth, ? for the fifth and missing for the sixth variables.

This is most probably not the desired behaviour. I expected that all the specified values are replaced by NA for all the variables (table-wise), and I guess most people would expect the same. Recycling is, however, what one would never expect.

Arguably, one could say that if na= is a vector of the same size as the number of column, then it should be interpreted as the NA-string to be valid in each corresponding column. But I think it is a bad idea, since it would be a programmatic hassle: in a table with 2 columns, who can tell the difference between 1 NA-string by column and 2 NA-strings to be valid for the whole table.

I suggest that if na= is a vector, then it must be interpreted as valid for the whole table. And that if it is a named list, or a list with as many elements as the table has columns, then it is interpreted as to be valid column-wise. There is then no way to specify both a table-wise and a column-specific NA string, but eh, nothing is perfect.
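A minimal sketch of the suggested semantics, using base R only. `replace_na_strings` is a hypothetical helper, not the readODS implementation. It covers the character-vector case (table-wise) and the named-list case (column-wise); the unnamed list with one element per column is left out.

```r
# Hypothetical helper, not the readODS implementation.
# A character vector `na` applies to every column; a named list `na` applies
# each element to the column of the same name.
replace_na_strings <- function(df, na = "") {
  for (col in names(df)) {
    na_strings <- if (is.list(na)) na[[col]] else na
    if (is.character(df[[col]]) && length(na_strings) > 0) {
      df[[col]][df[[col]] %in% na_strings] <- NA_character_
    }
  }
  df
}

df <- data.frame(a = c("1", "?", ""), b = c("missing", "x", "?"),
                 stringsAsFactors = FALSE)
replace_na_strings(df, na = c("", "?", "missing"))                 # table-wise
replace_na_strings(df, na = list(a = c("", "?"), b = "missing"))   # column-wise
```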

I can implement the change as soon as I get (hopefully positive) feedback.

speed

It is a great package and a big help, but with large files read_ods() is very slow, at least on my computer/usecase.

Optimization with RSQLite

Currently, parse_rows uses an expanding data.frame (i.e. continuously rbinding) to hold the values in the ods sheet.

I have created a branch that uses RSQLite to hold the values in an in-memory DB.

The benchmark is here. Original:

microbenchmark::microbenchmark({read_ods('./tests/testdata/datasets.ods', 4)}, times = 10)
Unit: seconds
                                                 expr      min       lq    mean
 {     read_ods("./tests/testdata/datasets.ods", 4) } 14.94037 15.01843 15.4036
  median       uq      max neval
 15.2089 15.60137 16.52064    10

RSQLite.

Unit: seconds
                                                 expr     min       lq     mean
 {     read_ods("./tests/testdata/datasets.ods", 4) } 6.48596 7.434865 8.406698
   median       uq      max neval
 8.030323 8.929039 11.75338    10

This small hack can reduce the parsing time. But on the other hand, we need to make this package depend on RSQLite and DBI. So I put this as a separate branch.

https://github.com/chainsawriot/readODS/tree/sqlite
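For context, the growth pattern described above can be illustrated in base R. This is not the code of parse_rows or of the sqlite branch; `rows` is made-up data standing in for parsed cell values, and binding once at the end is shown only as another common way to avoid repeated copying, without any claim about how it compares to the RSQLite approach.

```r
# Illustrative only: `rows` stands in for parsed cell values.
rows <- lapply(1:200, function(i) {
  data.frame(row = i, col = 1L, value = as.character(i))
})

# The pattern described above: the data frame is copied on every iteration.
grow_by_rbind <- function(rows) {
  out <- rows[[1]][0, ]
  for (r in rows) out <- rbind(out, r)
  out
}

# Alternative: keep the pieces in a list and bind them once at the end.
bind_once <- function(rows) do.call(rbind, rows)

grown <- grow_by_rbind(rows)
bound <- bind_once(rows)
```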

[FeatureRequest] Writing multiple sheets into the same file

It would be nice to also write into multiple sheets of the same .ods file. The function can:

  1. Accept a list of dataframes
  2. In case the list has names, create sheets with names otherwise follow the convention of LibreOffice/OpenOffice naming
  3. write the dataframes into the corresponding sheets
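The three steps above can be sketched with the `sheet` and `append` arguments of write_ods that the README shows. `write_ods_list` and its `writer` argument are hypothetical names introduced here for illustration, and Sheet1, Sheet2, ... is assumed as the LibreOffice-style fallback naming.

```r
# Hypothetical helper. `writer` defaults to readODS::write_ods and is called
# with the `sheet` and `append` arguments shown in the README.
write_ods_list <- function(dfs, path, writer = readODS::write_ods) {
  sheet_names <- names(dfs)
  if (is.null(sheet_names)) {
    sheet_names <- rep("", length(dfs))
  }
  # Unnamed elements fall back to Sheet1, Sheet2, ... (assumed convention).
  unnamed <- is.na(sheet_names) | sheet_names == ""
  sheet_names[unnamed] <- paste0("Sheet", seq_along(dfs))[unnamed]
  for (i in seq_along(dfs)) {
    # The first data frame creates the file; later ones are appended as sheets.
    writer(dfs[[i]], path, sheet = sheet_names[i], append = i > 1)
  }
  invisible(path)
}

# Usage (requires readODS):
# write_ods_list(list(cars = mtcars, plants = PlantGrowth), "datasets.ods")
```

Passing the writer as an argument keeps the sketch easy to try without touching the file system, for example with a function that only records its arguments.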

Tasks for v2.0

Deprecation

  • read.ods
  • overwrite argument for write_ods
  • getNrOfSheetsInODS
  • get_num_sheets_in_ods
  • ods_sheets
  • getOption("write_ods_na", default = FALSE) #79
  • verbose argument for write_ods
  • check_names #115

License

  • readxl zip (MIT)
  • rapidxml (MIT)

Functionalities

  • as_tibble
  • as_tibble default for v2

Revdep

  • check
  • submit PRs

misc

  • R>=3.5 (as readr needs 3.5)

read.ods() : add a "header" argument

Feature request :

It would be very useful to be able to select a line as the header instead of using the column names in the .ods file. This option exists in the packages xlsx and openxlsx.

col_types not working

The option col_types is not working if set e.g. col_types = "numeric". Looking at readr documentation, it seems to me that col_types should be specified in a different manner. Should readODS be updated to reflect this?

Test case:
test.ods

first
1

non-working code

read_ods("test.ods", col_types = "numeric")

Add na.string parameter

If the ODF file has empty cells, they are read as "" (empty strings). This is usually not what is wanted. They should be loaded as NA. This should also be the default. However, it would be wise to keep a parameter for this in case there is more than one possible string for missing values, as is often the case in survey datasets.

Because this isn't there, one has to do replacing after loading, which cancels the benefits of using this loader as opposed to saving the ODS files as .csv and just using read.csv().

cellranger reboot

Hi @chainsawriot,

cellranger is getting a major reboot, because I need it to do more than it currently does. And I have to update it, to get compatible with the new testthat.

Do I have this right? The only place you use it is here:

https://github.com/chainsawriot/readODS/blob/ee96772d4de374b17d533fa5861feeb55130d1ef/R/readODS.R#L200

and the only function you use is cellranger::as.cell_limits()? You use it just to parse a single range that might look like A5:D12?

I'm not sure if the cell_limits object will survive, but obviously something equivalent would take its place. It just might be a little bit different. I will let you know...

read.ods() : only last values of each line are imported

How to reproduce the bug ?

Create an ods file with this simple table

gender visit1 visit2 visit3
m 4 6 8
f 8 9 4
m 8 2 1

then import it

library(readODS)
read.ods(file="table.ods", sheet=1)

Tested on MacOS X and Windows XP with R 3.1 and readODS 1.2

What happens?

I obtain a data.frame with only the last value of each line and the name of the column

a b c d
visit3
8
4
1

What is expected

A full data.frame like the original table and maybe the name of the column as the first line of my original table (maybe add an option header)

Maybe it's related with #3

It would be really nice if this package worked. Nice initiative!

Does not overwrite when writing

When writing a data.frame to ods, if a file with the same filename already exists, it will not be overwritten, so in effect, nothing is written. Any other write function I know will overwrite existing files, so I think that is expected.

Column specification message lying...

Not sure if this is the same as #27, but I got the following message when reading in a file:

# Parsed with column specification:
# cols(
#   Name = col_character(),
#   PS1 = col_double(),
#   PS2 = col_double(),
#   PS3 = col_double(),
#   PS4 = col_double(),
#   PS5 = col_double(),
#   PS6 = col_double(),
#   PS7 = col_double(),
#   Midterm = col_double(),
#   Quiz = col_double()
# )

But when I do sapply(data, class) thereafter, all columns are character?

Reading and writing non-ASCII characters

cell values containing "" turn into â€�â€, this seems to happen on windows only (tested on ubuntu and win7)

its probably an encoding issue...
but the unz and open commands seem to ignore the encoding parameters...

Prepare a 1.6 version for CRAN submission

  • Fix as many issues as possible, or at least note it in the documentation
    • #17
    • #15 Probably cannot fix this in short time, add a note in the documentation
    • #13 Infer the type from value-type à la read.csv?
    • #2 ditto #15
  • use jennybc/cellranger
  • emulate the API of readxl::read_excel
    • header versus col_names
  • Data import is actually damn slow, probably because of numerous rbindings of data.frame in parse_rows. Need to optimize that.
    • Use sqlite?

read_ods reading one more row than required

I'm using read_ods() from the readODS package to get a table from a LibreOffice spreadsheet into R. It works, but it seems to read one more row than expected:

        > read_ods(data_dir %+% "OpenDocument Spreadsheet.ods", sheet = "Sheet1", range = "A1:B4")
    Parsed with column specification:
    cols(
      A = col_character(),
      B = col_character()
    )
              A         B
    1         1         4
    2         2         5
    3         3         6
    4 DoNotRead DoNotRead

The table has 4 rows including titles, but read_ods gets 5 rows (1 titles row + 4). If I set the range argument with one less row (which is wrong) I get the expected result:

  > read_ods(data_dir %+% "OpenDocument Spreadsheet.ods", sheet = "Sheet1", range = "A1:B3")
   Parsed with column specification:
   cols(
     A = col_character(),
     B = col_character()
   )
     A B
   1 1 4
   2 2 5
   3 3 6

Thanks,

P.S.: sorry for any mistakes, this is my first post on GitHub.

Feature request: adding a possibility to read comments

The comment field is a great last-resort way to put an extra dimension of information into a matrix (e.g. a matrix of analyses to do).

At the moment comments are concatenated for string fields, and ignored in numeric fields, because they are stored in the text:p XML tag just as string contents.

I am thinking of a flag for read.ods (or a separate function) that, instead of reading the field itself, parses only the office:annotation XML tag and expects to find only a string there.

I don't think it would be too much of a problem. And I will gladly help you, if you tell me what to do. I know how to write R packages, but I have never programmed XML parsing before (although I know JSON and YAML).

No documentation in v1.5

Typing help(package="readODS") returns no documentation. Maybe an error with roxygen2?
Or maybe because I download it directly from GitHub and you put man/ directory in gitignore? I believe that's a good practice to keep man/ under version control, even if you use roxygen2? ( Hadley Wickham does it, example : https://github.com/hadley/ggplot2 )

Argument `check.names`

Adding check.names, as in data.frame(..., check.names = TRUE), as an additional argument to read_ODS would be great, as it would give the user more freedom as to how column names are created.

Need a better readme file

also

  • a hex logo
  • Design principle for read_ods: emulate readxl::read_xlsx
  • Design principle for write_ods: emulate either writexl::write_xlsx or xlsx::write.xlsx
  • Overall design goal: be a silent member of rio so that you don't need to care about file format.

multi-line .ods files

Hi,
It seems readODS does not support multi-line .ods files (e.g. one cell with multiple lines, using Ctrl+Enter to write every line). I have checked the code and it seems pretty easy to add a workaround:

You just have to slightly modify the main for loop of the "to_data_frame" function like this:

for (i in seq_len(nrow(cell_values))) {
    my_row <- cell_values[i, 1]
    my_col <- cell_values[i, 2]
    previous_value <- res[my_row, my_col]
    if (previous_value == "") {
        # first text fragment found for this cell
        res[my_row, my_col] <- cell_values[i, 3]
    } else {
        # support for multi-line cells: append further fragments as new lines
        res[my_row, my_col] <- paste(previous_value, cell_values[i, 3], sep = "\n")
    }
}

I hope this helps (I attach an example of .ods file with a multiline cell).
Best regards,
Hector
multiline_cells.zip

Bad UTF-8 Encoding in ODS sheet names

I found this bug looking for a sheet named with a é in it. Did not work even if I double-checked the spelling 10 times. When debugging, I found out that the select_sheet function has issues with encodings. In:

if (which_sheet %in% sheet_names)

... which_sheet is the string you enter in R, while sheet_names is taken from the ODS file. %in% does not match accented letters when the two strings are encoded differently. In my case, the é was encoded as c3 a9 in R (legitimate encoding) while it is encoded as 65 cc 81 in the ODS file, i.e. e+´ (which should be avoided, as far as I know).

Do you know a way to convert all known combinations of letter+diacritic to their proper UTF-8 code? Just tell me, I'll do the changes and pull request.

PS: maybe the issue comes from xml2 package (but I doubt it), which is used to read the ODS-file with:

sheet_names <- sapply(sheets, function(x) xml_attr(x, "table:name", ods_ns))
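The two encodings of é described above can be reproduced in base R, as in the rough sketch below. The last step assumes the stringi package is installed; its stri_trans_nfc() function converts strings to the composed (NFC) form, which is one possible way to make the two spellings compare equal. Whether readODS should normalize sheet names this way is a separate question.

```r
composed   <- "\u00e9"    # single code point, UTF-8 bytes c3 a9
decomposed <- "e\u0301"   # "e" plus a combining acute accent, bytes 65 cc 81

charToRaw(composed)
charToRaw(decomposed)

# The two strings render the same but do not match.
composed %in% decomposed

# Assumption: stringi is installed. After NFC normalization both spellings
# use the composed form.
if (requireNamespace("stringi", quietly = TRUE)) {
  stringi::stri_trans_nfc(composed) == stringi::stri_trans_nfc(decomposed)
}
```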

Error reading worksheets with external data sources

I get the following error if I try to read a worksheet containing data linked from separate worksheet:

Error in if (!is.na(xmlAttrs(row)["number-rows-repeated"])) { : 
  argument is of length zero
In addition: Warning message:
In is.na(xmlAttrs(row)["number-rows-repeated"]) :
  is.na() applied to non-(list or vector) of type 'NULL'

Workaround is to break links in the affected file before reading but it's not ideal. Is this something that can be fixed?

Example files here: https://github.com/ndoylend/Testing/

write_ods creates corrupted file

No problem with the CRAN version (v1.6.7)

The current version (6ac61e1) passes all the tests, but the ods files it generates cannot be opened in LibreOffice without repairing and cannot be imported into Google Sheets.

add date/time support

i am unsure how it is handled now... on both sides.
but i should at least investigate...

and decide if it should be part of smartSheet or read.ods

read.ods() : error ".ods file is broken beyond repair" with a good .ods

How to reproduce the bug

Create a simple .ods with libreoffice 4.2.1.1 and try to import it with

read.ods("simpletable.ods")

What happens?

An error :

[1] "added office:document-content"
[1] "emptyElement"
[1] "empty office:scripts"
[1] "added office:font-face-decls"
[1] "emptyElement"
[1] "empty style:font-face"
[1] "emptyElement"
[1] "empty style:font-face"
[1] "emptyElement"
[1] "empty style:font-face"
[1] "emptyElement"
[1] "empty style:font-face"
[1] "emptyElement"
[1] "empty style:font-face"
[1] "stopElement"
[1] "removed office:font-face-decls"
[1] "added office:automatic-styles"
[1] "added style:style"
[1] "emptyElement"
[1] "empty style:table-column-properties"
[1] "stopElement"
[1] "removed style:style"
[1] "added style:style"
[1] "emptyElement"
[1] "empty style:table-row-properties"
[1] "stopElement"
[1] "removed style:style"
[1] "added style:style"
[1] "emptyElement"
[1] "empty style:table-properties"
[1] "stopElement"
[1] "removed style:style"
[1] "stopElement"
[1] "removed office:automatic-styles"
[1] "added office:body"
[1] "added office:spreadsheet"
[1] "added table:table"
[1] "emptyElement"
[1] "empty table:table-column"
[1] "added table:table-row"
[1] "added table:table-cell"
[1] "added text:p"
[1] "stopElement"
Error in odsPreParser(file) : 
  this .ods file is broken beyond repair..., it's element are inconsistent expected: 'text:p' but given: 'table:table-cell'

What is expected

A data.frame.

Version used

  • readODS GitHub version downloaded with devtools the 2014-07-02
  • R 3.1
  • Windows XP

Quiet reading

Hi,

I would like to disable the messages when reading ods files :

Parsed with column specification:
cols(
...
)

I don't see how it is possible now (except surrounding the call with suppressMessages), so maybe it could be implemented, via a quiet or verbose argument?

Regards

Decimal with comma still buggy

This is related to https://github.com/chainsawriot/readODS/pull/35 and https://github.com/chainsawriot/readODS/issues/30

On my machine (Archlinux 64Bit, R 3.4.1), comma will not be interpreted as decimal point properly: either it is just ignored or the related number is just imported as character, see attached test.ods.zip which results in

> readODS::read_ods("/tmp/test.ods")
Parsed with column specification:
cols(
  `1,00` = col_character(),
  `0,00` = col_number(),
  `0,50` = col_character()
)
  1,00 0,00 0,50
1 0,50  100 0,75

Expected:

1,00 0,00 0,50
0,50 1,00 0,75

Also, the test file https://github.com/chainsawriot/readODS/blob/master/tests/testdata/decimal_comma.ods is not imported correctly:

> read_ods('~/Downloads/decimal_comma.ods')
Parsed with column specification:
cols(
  A = col_number(),
  B = col_character(),
  C = col_character()
)
   A      B     C
1 34 2,30 € 3,00%

Maybe related to system-specific localization settings - I have LANG=en_DK.utf8 and LANGUAGE=en_DK.utf8

Prevent reading comments as cell data

Hello,

Thanks for writing readODS, I've found it very helpful in my analyses! I ran into an issue where comments in my ods files were read as actual cell data and I don't think this is intended behaviour. Comments seem to be represented within the <office:annotation /> tag under <table:table-cell /> and along side the actual cell data:

          <table:table-cell office:value-type="string" calcext:value-type="string">
            <office:annotation office:display="true" draw:style-name="gr1" draw:text-style-name="P2" svg:width="28.99mm" svg:height="17.99mm" svg:x="208.99mm" svg:y="9mm" draw:caption-point-x="-1.5mm" draw:caption-point-y="4.59mm">
              <dc:date>2019-05-21T00:00:00</dc:date>
              <text:p text:style-name="P1">
                <text:span text:style-name="T1">Comment regarding my cell data</text:span>
              </text:p>
            </office:annotation>
            <text:p>my_cell_data_string</text:p>
          </table:table-cell>

Currently a call to xml2::xml_find_all uses an XPath query that collects all <text:p /> tags within each cell, including those inside comments (readODS.R#L90). A fix for this would be to restrict the XPath query to only retrieve <text:p /> tags that are direct children of each cell.
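The difference between the two queries can be seen on a cut-down version of the cell above. This is only a sketch using xml2: the XPath strings are illustrative and not necessarily the ones at readODS.R#L90, and the namespace declarations are added so that the fragment parses on its own.

```r
library(xml2)

cell <- read_xml('<table:table-cell
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:annotation>
    <text:p>Comment regarding my cell data</text:p>
  </office:annotation>
  <text:p>my_cell_data_string</text:p>
</table:table-cell>')
ns <- xml_ns(cell)

# Descendant axis: also picks up the paragraph inside the annotation.
all_p <- xml_text(xml_find_all(cell, ".//text:p", ns))

# Child axis: only paragraphs that are direct children of the cell.
direct_p <- xml_text(xml_find_all(cell, "./text:p", ns))

all_p
direct_p
```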

I've applied this fix over at scwatts@aa3ea55 and also checked that it doesn't break multiline cell data (https://github.com/chainsawriot/readODS/issues/23). If this is something you'd like to see merged, I'd be very happy to make a pull request.
Thanks!

repeats can be something different than an empty line...

<table:table-cell table:number-columns-repeated="12" office:value-type="string" calcext:value-type="string"><text:p>sample</text:p></table:table-cell>

fugly XML... the behaviour makes sense, but it is not always applied... this one was converted from an xls file...

i really don't wanna see the libre office code 😠

Speed of writing?

I have a dataframe test of ~80,000 obs of 9 variables.

I tried to run write_ods(test, "whatever.ods")

After 30 minutes the action still hadn't completed.

Smaller data frames work. For example write_ods(iris, "iris.ods") completes nearly instantly.

Am I doing something wrong? Is there a filesize limitation?

read_ods function defaults incorrectly to parsed_df

Hi,

After encountering some problems using the read_ods function - I was only able to import data from ods files as text - I have checked the code and think I have found a solution.

Within the read_ods function, after creating the parsed_df, some checks are done. The second check will always result in a true value, and will never allow the following else statement. See below.

if (is.na(NA)) { raw_sheet <- parsed_df } else ....

As stated above the code of the function, col_types can be NULL, a character vector or NA. The code below should provide the correct behaviour. Allowing the else statement to call the readr package using a character vector of col_types.

if (is.na(col_types)) { raw_sheet <- parsed_df } else ....

read from url

Is there any possibility of developing this to read directly from a url as per read.csv?

At the moment it says when submitting a url as argument:

Error in parse_ods_file(file) : file does not exist

I would prefer to not include any side-effects where possible in my code, so ideally would prefer to not have to download data initially.
