r-lib / archive

R bindings to libarchive, supporting a large variety of archive formats

Home Page: https://archive.r-lib.org/

License: Other

archive's Introduction

archive

R bindings to libarchive (http://www.libarchive.org). Supports many archive formats, including tar, ZIP, 7-zip, RAR, and CAB. Also supports many filters, such as gzip, bzip2, compress, lzma, xz, and uuencoded files, among others.

archive provides interfaces for reading and writing connections into archives, as well as for efficiently reading and writing archives directly on disk.

Installation

You can install archive from CRAN with:

# install.packages("archive")

Example

Single file archives

Use archive_read() and archive_write() to read and write single files to an archive. These functions return connections, which can be passed to any R interface that accepts a connection. Most base R file system functions use connections, as do some packages such as readr.

library(archive) # archive_read(), archive_write()
library(readr) # read_csv(), write_csv(), cols()

# Write a single dataset to zip
write_csv(mtcars, archive_write("mtcars.zip", "mtcars.csv"))

# Read the data back, by default the first file is read from the archive.
read_csv(archive_read("mtcars.zip"), col_types = cols())
#> # A tibble: 32 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> # ℹ 28 more rows

# Also supports things like archiving and compression together
# Write a single dataset to (gzip compressed) tar
write_csv(mtcars, archive_write("mtcars.tar.gz", "mtcars.csv", options = "compression-level=9"))

# Read the data back
read_csv(archive_read("mtcars.tar.gz"), col_types = cols())
#> # A tibble: 32 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> # ℹ 28 more rows

# Archive file sizes
file.size(c("mtcars.zip", "mtcars.tar.gz"))
#> [1] 742 808

Multi file archives

archive_write_files() is used to create a new archive from multiple files on disk.

# Write a few files to the temp directory
write_csv(iris, "iris.csv")
write_csv(mtcars, "mtcars.csv")
write_csv(airquality, "airquality.csv")

# Add them to a new archive
archive_write_files("data.tar.xz", c("iris.csv", "mtcars.csv", "airquality.csv"))

# View archive contents
a <- archive("data.tar.xz")
a
#> # A tibble: 3 × 3
#>   path            size date               
#>   <chr>          <int> <dttm>             
#> 1 iris.csv        3716 2023-12-11 12:18:04
#> 2 mtcars.csv      1281 2023-12-11 12:18:04
#> 3 airquality.csv  2890 2023-12-11 12:18:04

# By default `archive_read()` will read the first file from a multi-file archive.
read_csv(archive_read("data.tar.xz"), col_types = cols())
#> # A tibble: 150 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1          5.1         3.5          1.4         0.2 setosa 
#> 2          4.9         3            1.4         0.2 setosa 
#> 3          4.7         3.2          1.3         0.2 setosa 
#> 4          4.6         3.1          1.5         0.2 setosa 
#> # ℹ 146 more rows

# Use a number to read a different file
read_csv(archive_read("data.tar.xz", file = 2), col_types = cols())
#> # A tibble: 32 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> # ℹ 28 more rows

# Or a filename to read a specific file
read_csv(archive_read("data.tar.xz", file = "mtcars.csv"), col_types = cols())
#> # A tibble: 32 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> # ℹ 28 more rows

Regular files (with compression)

file_write() returns a connection filtered by one or more compressions or encodings. file_read() reads a compressed file, automatically detecting the compression used.

# Write bzip2, uuencoded data
write_csv(mtcars, file_write("mtcars.bz2", filter = c("uuencode", "bzip2")))

# Read it back, the formats are automatically detected
read_csv(file_read("mtcars.bz2"), col_types = cols())
#> # A tibble: 32 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> # ℹ 28 more rows

archive's People

Contributors

ajdamico, allenluce, arisp99, barracuda156, cielavenir, coolbutuseless, gaborcsardi, jeroen, jimhester, salim-b

archive's Issues

Move `master` branch to `main`

The master branch of this repository will soon be renamed to main, as part of a coordinated change across several GitHub organizations (including, but not limited to: tidyverse, r-lib, tidymodels, and sol-eng). We anticipate this will happen by the end of September 2021.

That will be preceded by a release of the usethis package, which will gain some functionality around detecting and adapting to a renamed default branch. There will also be a blog post at the time of this master --> main change.

The purpose of this issue is to:

  • Help us firm up the list of targeted repositories
  • Make sure all maintainers are aware of what's coming
  • Give us an issue to close when the job is done
  • Give us a place to put advice for collaborators re: how to adapt

message id: euphoric_snowdog

UBSAN failure

archive_write_files.cpp:46:45: runtime error: nan is outside the range of representable values of type 'int'
    #0 0x7f7007879af7 in archive_write_files_(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, cpp11::r_vector<cpp11::r_string>, int, cpp11::r_vector<int>, cpp11::r_vector<cpp11::r_string>, unsigned long) /data/gannet/ripley/R/packages/tests-clang-SAN/archive/src/archive_write_files.cpp:46:45
    #1 0x7f700787d399 in _archive_archive_write_files_ /data/gannet/ripley/R/packages/tests-clang-SAN/archive/src/cpp11.cpp:33:27
    #2 0x6dcbe1 in R_doDotCall /data/gannet/ripley/R/svn/R-devel/src/main/dotcode.c:617:17
    #3 0x8397be in bcEval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:7684:21
    #4 0x81d3ae in Rf_eval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:740:8
    #5 0x8861b7 in R_execClosure /data/gannet/ripley/R/svn/R-devel/src/main/eval.c
    #6 0x881b1f in Rf_applyClosure /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:1836:16
    #7 0x841dff in bcEval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:7096:12
    #8 0x81d3ae in Rf_eval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:740:8
    #9 0x8861b7 in R_execClosure /data/gannet/ripley/R/svn/R-devel/src/main/eval.c
    #10 0x881b1f in Rf_applyClosure /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:1836:16
    #11 0x81dde8 in Rf_eval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:863:12
    #12 0x892236 in do_set /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:2982:8
    #13 0x81d798 in Rf_eval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:815:12
    #14 0x8910d2 in do_begin /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:2530:10
    #15 0x81d798 in Rf_eval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:815:12
    #16 0x81d798 in Rf_eval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:815:12
    #17 0x94d266 in Rf_ReplIteration /data/gannet/ripley/R/svn/R-devel/src/main/main.c:264:2
    #18 0x9507b0 in R_ReplConsole /data/gannet/ripley/R/svn/R-devel/src/main/main.c:316:11
    #19 0x9505b9 in run_Rmainloop /data/gannet/ripley/R/svn/R-devel/src/main/main.c:1129:5
    #20 0x4e247a in main /data/gannet/ripley/R/svn/R-devel/src/main/Rmain.c:29:5
    #21 0x7f7016993081 in __libc_start_main (/lib64/libc.so.6+0x27081)
    #22 0x43129d in _start (/data/gannet/ripley/R/R-clang-SAN/bin/exec/R+0x43129d)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior archive_write_files.cpp:46:45 in 

Please correct before 2021-10-29 to safely retain your package on CRAN.

archive::archive_extract confuses line endings on rar files

hi, here are two files created with winrar. when decompressed with winrar versus archive_extract, both give the same file.size() but the R.utils::countLines() result for archive_extract is too few lines. for some reason, archive_extract is missing line endings on winrar files and ends up cramming them together.. hope the diagnostics below are helpful. thanks!

[1] "currently working on ftp://ftp.cdc.gov/pub/health_statistics/nchs/datasets/dvs/natality/nat2009us.zip"
[1] "archive::archive_extract extracts 1 lines"
[1] "archive::archive_extract file.size 3215098572"
[1] "winrar extracts 4137836 lines"
[1] "winrar file.size 3215098572"

[1] "currently working on http://download.inep.gov.br/microdados/microdados_enem2009.rar"
[1] "archive::archive_extract extracts 80937 lines"
[1] "archive::archive_extract file.size 4078192743"
[1] "winrar extracts 4148721 lines"
[1] "winrar file.size 4078192743"

loop that reproduces the problem:

# install.packages("devtools")
# devtools::install_github("ajdamico/lodown")
# devtools::install_github("jimhester/archive")

# path to winrar on local machine
path_to_winrar <- normalizePath( "C:/Program Files/winrar/winrar.exe" )
tf <- tempfile()

for( this_file in c( 'ftp://ftp.cdc.gov/pub/health_statistics/nchs/datasets/dvs/natality/nat2009us.zip' , 'http://download.inep.gov.br/microdados/microdados_enem2009.rar' ) ){

	print( paste( "currently working on" , this_file ) )

	# archive fails for both
	lodown::cachaca( this_file , tf , mode = 'wb' )
	archive::archive_extract( tf , dir = tempdir() )
	windows_unzip <- grep( "Nat2009|DADOS_ENEM" , list.files( tempdir() , recursive = TRUE , full.names = TRUE ) , value = TRUE )
	print( paste( "archive::archive_extract extracts" , R.utils::countLines( windows_unzip ) , "lines" ) )
	print( paste( "archive::archive_extract file.size" , file.size( windows_unzip ) ) )
	file.remove( windows_unzip )

	# winrar succeeds for both
	lodown::cachaca( this_file , tf , mode = 'wb' )
	sys.command <- paste0( '"' , path_to_winrar , '" x ' , tf , ' "' , tempdir() , '"' )
	system( sys.command )
	windows_unzip <- grep( "Nat2009|DADOS_ENEM" , list.files( tempdir() , recursive = TRUE , full.names = TRUE ) , value = TRUE )
	print( paste( "winrar extracts" , R.utils::countLines( windows_unzip ) , "lines" ) )
	print( paste( "winrar file.size" , file.size( windows_unzip ) ) )
	file.remove( windows_unzip )

}

sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 15063)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.4.1     tibble_1.3.3       Rcpp_0.12.11       R.methodsS3_1.7.1  digest_0.6.12     
 [6] lodown_0.1.0       R.utils_2.5.0      rlang_0.1.1        R.oo_1.21.0        archive_0.0.0.9000

Release archive 1.1.1

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • rhub::check(platform = 'ubuntu-rchk')
  • rhub::check_with_sanitizers()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

Error: package or namespace load failed for ‘archive’ in dyn.load(file, DLLpath = DLLpath, ...):

I'm trying to install on Ubuntu 16.04 and get the following error:

*** installing help indices
** building package indices
** testing if installed package can be loaded
Error: package or namespace load failed for ‘archive’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/austin/lib/R/library/archive/libs/archive.so':
/home/austin/lib/R/library/archive/libs/archive.so: undefined symbol: archive_write_set_format_raw
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/home/austin/lib/R/library/archive’

I've tried installing with devtools directly and also cloning the archive. Any suggestions?

Release archive 1.1.0

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • rhub::check(platform = 'ubuntu-rchk')
  • rhub::check_with_sanitizers()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

silent failure using symlinked paths

An empty archive is created, with no error or warning, when dir is a symlinked location.

A workaround is to call normalizePath() on the path first, but it might be worth having the package normalize the path automatically for the user.
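
The workaround can be sketched as follows. This is a minimal illustration, assuming a Unix-like system where file.symlink() is available. The directory names are hypothetical, and the archive_write_dir() call is left commented out because it only marks where the resolved path would be passed.

```r
# Create a real directory with one file, plus a symlink pointing at it.
# (Names are hypothetical; file.symlink() may not work on Windows.)
real_dir <- tempfile("real-data-")
dir.create(real_dir)
writeLines("x,y\n1,2", file.path(real_dir, "example.csv"))

link_dir <- tempfile("linked-data-")
file.symlink(real_dir, link_dir)

# Resolve the symlink to the underlying directory before archiving.
resolved_dir <- normalizePath(link_dir, mustWork = TRUE)

# The resolved path is what would be handed to the package, e.g.:
# archive::archive_write_dir("data.zip", resolved_dir)
list.files(resolved_dir)
```

Resolving the path up front keeps the workaround in user code. Whether the package should do this itself is the open question in this issue.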

Locale problem

I'm using archive v1.1.2 running on Garuda Linux, an Arch derivative. As a rolling release, it is completely up-to-date, including R 4.1.1.

When I use archive to generate a list of files in a .7z archive, it works, but generates a warning that is odd to me. It says:

Setting UTF-8 locale failed

I have never changed locales in any way in R or Linux, so I'm confused. Using the locale command in the terminal gives me all UTF-8, with the exception of LC_ALL:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

Is this something I should be worried about?
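
To compare the shell's view with what the R session itself reports, a few base R calls can help. This is only a diagnostic sketch and does not explain why the warning is raised. It uses base R functions only and makes no assumption about how archive sets the locale internally.

```r
# Locale categories as seen by the running R session.
Sys.getlocale("LC_CTYPE")
Sys.getlocale("LC_COLLATE")

# l10n_info() reports whether R considers the current locale to be UTF-8.
info <- l10n_info()
info[["UTF-8"]]

# Environment variables that influence locale selection (empty if unset).
Sys.getenv(c("LANG", "LC_ALL", "LC_CTYPE"))
```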

Release archive 1.1.2

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • rhub::check(platform = 'ubuntu-rchk')
  • rhub::check_with_sanitizers()
  • revdepcheck::cloud_check()
  • Update cran-comments.md

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

Release archive 1.0.0

First release:

Prepare for release:

  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • rhub::check(platform = 'ubuntu-rchk')
  • rhub::check_with_sanitizers()
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('major')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Update install instructions in README
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Is it possible to unzip .csv.7z files?

I ran the following reproducible code

library(archive)
tf <- tempfile()
# Named file_url to avoid masking base::file.path()
file_url <- "https://www.kaggle.com/c/favorita-grocery-sales-forecasting/download/holidays_events.csv.7z"
download.file(url = file_url, destfile = tf, mode = "wb")
archive(tf)

I get the error

Error in archive_metadata(path) : Unrecognized archive format

I was wondering if unzipping .csv.7z files was supported.
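
One way to narrow down an "Unrecognized archive format" error is to check whether the downloaded file actually begins with the six-byte 7z signature (37 7A BC AF 27 1C). The sketch below is a diagnostic aid only: looks_like_7z() is a hypothetical helper, not part of archive, and the demonstration uses two synthetic files rather than the real download.

```r
# Hypothetical helper (not part of the archive package): TRUE if the file
# begins with the six-byte 7z signature.
looks_like_7z <- function(path) {
  signature <- as.raw(c(0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C))
  header <- readBin(path, what = "raw", n = length(signature))
  identical(header, signature)
}

# Demonstration with two synthetic files rather than a real download.
fake_7z <- tempfile(fileext = ".7z")
writeBin(as.raw(c(0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C, 0x00, 0x04)), fake_7z)

not_7z <- tempfile(fileext = ".7z")
writeLines("<html>not an archive</html>", not_7z)

looks_like_7z(fake_7z)
looks_like_7z(not_7z)
```

If the helper returns FALSE for a downloaded file, the file on disk is not a 7z archive regardless of its extension. That is one possible explanation for the error, not a confirmed diagnosis of this report.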

Test failing in R 4.0

See also the CI

Last 13 lines of output:
  == Failed tests ================================================================
  -- Failure (test-archive.R:258:5): archive_write_files: can write a zip file ---
  read.csv(unz("data.zip", files[["iris"]]), row.names = 1) not equal to `iris`.
  Component "Species": Modes: character, numeric
  Component "Species": Attributes: < target is NULL, current is list >
  Component "Species": target is character, current is factor
  -- Failure (test-archive.R:280:5): archive_write_dir: can write a zip file -----
  read.csv(unz("data.zip", files[["iris"]]), row.names = 1) not equal to `iris`.
  Component "Species": Modes: character, numeric
  Component "Species": Attributes: < target is NULL, current is list >
  Component "Species": target is character, current is factor
  
  [ FAIL 2 | WARN 0 | SKIP 0 | PASS 87 ]
  Error: Test failures
  Execution halted
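
The mismatch reported above (character versus factor for Species) is consistent with R 4.0.0 changing the default of stringsAsFactors from TRUE to FALSE. The sketch below reproduces the difference with base R only. It uses a temporary CSV instead of the data.zip from the test suite, so it illustrates the likely cause and is not the fix applied to the tests.

```r
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path)

# Explicitly request character columns (the default from R 4.0.0 onward).
as_character <- read.csv(csv_path, row.names = 1, stringsAsFactors = FALSE)
class(as_character$Species)

# Explicitly request factors, matching the column type in `iris`.
as_factor <- read.csv(csv_path, row.names = 1, stringsAsFactors = TRUE)
class(as_factor$Species)

# Compare the factor-based result against the original data set.
all.equal(as_factor, iris)
```

Passing stringsAsFactors explicitly makes such a comparison independent of the R version running the test.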

Feature request: list files in an archive

AFAICT there is no functionality equivalent to utils::unzip(..., list = TRUE) for listing the files in an archive. This can be quite useful for examining the contents of an archive and then selecting a subset of files to extract. This could be an argument to archive_extract(), or maybe a function like archive_ls().

Extract specific file without subdirectories

Hi,

Is it possible to extract a specific file to a specific directory without recreating the directories stored in the archive?

Example
Create a new zip file with one subdirectory:

a <- archive(system.file(package = "archive", "extdata", "data.zip"))
d <- tempfile()
archive_extract(a, d, c("iris.csv", "airquality.csv"))
# file.path() builds the paths portably instead of hard-coding "\\"
dir.create(file.path(d, "TEST"))
file.rename(file.path(d, "iris.csv"), file.path(d, "TEST", "iris.csv"))
z <- tempfile(fileext = ".zip")
archive_write_dir(z,d)
archive(z)

Result:

> archive(z)
# A tibble: 2 x 3
  path            size date               
  <chr>          <dbl> <dttm>             
1 airquality.csv   142 2017-04-28 21:55:29
2 TEST/iris.csv    192 2017-04-28 21:55:29

Extract file iris.csv

td <- tempfile()
archive_extract(z,td,"TEST/iris.csv")
list.files(td,recursive = T)

Result:

> list.files(td,recursive = T)
[1] "TEST/iris.csv"

But I only want iris.csv in my temporary directory, like this:

> list.files(td,recursive = T)
[1] "iris.csv"

Thx
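
Until something like this exists in the package, one possible workaround is to extract into a staging directory and then copy the files into the target directory by base name. The sketch below is base R only. flatten_files() is a hypothetical helper, not an archive function, and the staging directory is built by hand to stand in for the output of archive_extract(z, staging, "TEST/iris.csv").

```r
# Hypothetical helper (not part of the archive package): copy every file
# found under `from` into `to`, dropping intermediate directories.
flatten_files <- function(from, to) {
  dir.create(to, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(from, recursive = TRUE, full.names = TRUE)
  targets <- file.path(to, basename(files))
  if (anyDuplicated(targets)) {
    stop("Flattening would overwrite files that share a name.")
  }
  file.copy(files, targets)
  invisible(targets)
}

# Stand-in for the result of archive_extract(z, staging, "TEST/iris.csv").
staging <- tempfile()
dir.create(file.path(staging, "TEST"), recursive = TRUE)
write.csv(iris, file.path(staging, "TEST", "iris.csv"))

td <- tempfile()
flatten_files(staging, td)
list.files(td, recursive = TRUE)
```

The helper stops rather than overwriting when two entries share a file name, since flattening would otherwise lose data silently.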

Add compression configuration

Feature request: Expose the individual format/filter configuration options to the R interface i.e. archive_compressor_[name]_options and archive_filter_[name]_options

For example, zstd defaults to compression level 3 and there's no way to configure this from R.

installation trouble

It seems that archive is having trouble dealing with my system libarchive installation. I am on Ubuntu 16.04 and I initially tried to apt-get libarchive-dev (version 3.1.2). The archive installation failed almost immediately with the message:

archive.cpp: In function ‘Rcpp::IntegerVector archive_filters()’:
archive.cpp:61:26: error: ‘ARCHIVE_FILTER_LZ4’ was not declared in this scope
, Rcpp::_["lz4"] = ARCHIVE_FILTER_LZ4

Then, I built libarchive from source (version 3.3.1). Now the installation proceeds further but ends with this error message:

Error: package or namespace load failed for ‘archive’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '~/R/x86_64-pc-linux-gnu-library/3.4/archive/libs/archive.so':
~/R/x86_64-pc-linux-gnu-library/3.4/archive/libs/archive.so: undefined symbol: archive_write_set_format_raw
Error: loading failed

Any ideas?

Segfault for repeated "close(con)" for non-existent format

Specifying a non-existent format when creating an archive_write connection will still create the connection, but will cause a segfault.

In the following minimal reprex, I'm using a double close(con) to cause a segfault. The first close(con) throws an error we expect (i.e. No such format). The second close(con) causes the segfault.

Not sure if the bad format should be caught early to stop this from happening at all, or if the segfault is indicative of a sneaky memory error elsewhere.

con = archive::archive_write(archive = "archive.something", file="Robject", format='bad_and_wrong')
open(con)
write.csv(mtcars, con)
close(con)
Error in close.connection(con) : No such format
close(con)
 *** caught segfault ***
address 0x0, cause 'unknown'

Traceback:
 1: close.connection(con)
 2: close(con)

Running latest archive from github.

> devtools::session_info()
Session info ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.2.907)           
 language (EN)                        
 collate  en_AU.UTF-8                 
 date     2018-09-28                  

Packages -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 package    * version    date       source                         
 archive    * 1.0.0      2018-09-28 local                          
 base       * 3.5.1      2018-07-05 local                          
 codetools    0.2-15     2016-10-05 CRAN (R 3.5.1)                 
 compiler     3.5.1      2018-07-05 local                          
 crayon       1.3.4      2017-09-16 CRAN (R 3.5.0)                 
 datasets   * 3.5.1      2018-07-05 local                          
 devtools     1.13.6     2018-06-27 CRAN (R 3.5.0)                 
 digest       0.6.15     2018-01-28 CRAN (R 3.5.0)                 
 glue         1.3.0      2018-09-28 Github (tidyverse/glue@4e74901)
 graphics   * 3.5.1      2018-07-05 local                          
 grDevices  * 3.5.1      2018-07-05 local                          
 magrittr     1.5        2014-11-22 CRAN (R 3.5.0)                 
 memoise      1.1.0      2017-04-21 CRAN (R 3.5.0)                 
 methods    * 3.5.1      2018-07-05 local                          
 packrat      0.4.9-3    2018-06-01 CRAN (R 3.5.0)                 
 pillar       1.3.0      2018-07-14 CRAN (R 3.5.0)                 
 pryr         0.1.4      2018-02-18 CRAN (R 3.5.0)                 
 Rcpp         0.12.18    2018-07-23 cran (@0.12.18)                
 rlang        0.2.1.9000 2018-08-10 Github (r-lib/rlang@8dc87a9)   
 rstudioapi   0.7        2017-09-07 CRAN (R 3.5.0)                 
 stats      * 3.5.1      2018-07-05 local                          
 stringi      1.2.4      2018-07-20 cran (@1.2.4)                  
 stringr      1.3.1      2018-05-10 CRAN (R 3.5.0)                 
 tibble       1.4.2      2018-01-22 CRAN (R 3.5.0)                 
 tools        3.5.1      2018-07-05 local                          
 utils      * 3.5.1      2018-07-05 local                          
 withr        2.1.2      2018-03-15 CRAN (R 3.5.0)   

[question]: How are multiple files read with the same name?

The bindings only support reading one file, so how could many files with the same name across many subdirectories be read?

I'm new to R but made a script that can at least read an archive supplied as a parameter:

#!/usr/bin/env Rscript
library(archive)
options(max.print=1000000)

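# Take the archive path from the first user-supplied argument
args <- commandArgs(trailingOnly = TRUE)
fname <- args[1]
# Named con so the variable does not mask archive::archive()
con <- archive_read(fname)
lines <- readLines(con = con)

close(con)
cat(lines, sep = "\n")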
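
One possible approach, sketched below, is to list the archive with archive() and then open each matching entry by its full path with archive_read(file = ...). This assumes that archive_read() accepts a path containing subdirectories, in the same way it accepts a plain file name. read_entries_named() is a hypothetical helper and the archive and entry names in the usage comment are placeholders.

```r
# Hypothetical helper built on archive() and archive_read(); returns a named
# list with one character vector of lines per matching entry.
read_entries_named <- function(archive_path, name) {
  entries <- archive::archive(archive_path)
  matches <- entries$path[basename(entries$path) == name]
  lapply(stats::setNames(matches, matches), function(entry) {
    con <- archive::archive_read(archive_path, file = entry)
    on.exit(close(con), add = TRUE)
    readLines(con)
  })
}

# Usage (the archive name and entry name are placeholders):
# contents <- read_entries_named("logs.tar.gz", "output.log")
# names(contents)
```

Each entry is opened through its own connection, so the archive is scanned once per match. That is simple but may be slow for large archives with many matches.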

malloc/memory error

Hi Jim,

Not sure if this is related to 'archive' or 'readRDS' or an interaction between the two.

Using saveRDS/readRDS with format = 'tar' or 'cpio' eventually crashes with a memory error.

This crash only seems to occur with the tar and cpio formats. zip works fine.

# Using save/readRDS with format = 'tar' or 'cpio' will crash
saveRDS(mtcars, archive_write(archive = "archive.file", file = 'first', format = 'tar'))
for (i in 1:100) {
  zz <- readRDS(archive_read(archive = "archive.file", format='tar'))
}
R(9561,0x7fffac92c380) malloc: *** error for object 0x10a6fd200: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6

This does not happen with write.csv/read.csv

If saveRDS/readRDS is replaced by write.csv/read.csv, it works fine. No crash.

# Using write.csv/read.csv works
write.csv(mtcars, archive_write(archive = "archive.file", file = 'first', format = 'tar'))
for (i in 1:100) {
  zz <- read.csv(archive_read(archive = "archive.file", format='tar'))
}

'libarchive' v3.3.3 installed via 'brew'

# * installing *source* package ‘archive’ ...
# PKG_CFLAGS=-I/usr/local/Cellar/libarchive/3.3.3/include -I/usr/local/Cellar/xz/5.2.4/include
# PKG_LIBS=-L/usr/local/Cellar/libarchive/3.3.3/lib -L/usr/local/Cellar/xz/5.2.4/lib -larchive -lexpat -llzma -lzstd -llz4 -lbz2 -lz -llzma -D_THREAD_SAFE -pthread

Session info

> devtools::session_info()
Session info ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.2.907)           
 language (EN)                        
 collate  en_AU.UTF-8                
 date     2018-09-26                  

Packages ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 package    * version    date       source                            
 archive    * 1.0.0      2018-09-25 Github (jimhester/archive@11e65d7)
 base       * 3.5.1      2018-07-05 local                             
 compiler     3.5.1      2018-07-05 local                             
 crayon       1.3.4      2017-09-16 CRAN (R 3.5.0)                    
 datasets   * 3.5.1      2018-07-05 local                             
 devtools     1.13.6     2018-06-27 CRAN (R 3.5.0)                    
 digest       0.6.15     2018-01-28 CRAN (R 3.5.0)                    
 glue         1.3.0      2018-09-25 Github (tidyverse/glue@4e74901)   
 graphics   * 3.5.1      2018-07-05 local                             
 grDevices  * 3.5.1      2018-07-05 local                             
 memoise      1.1.0      2017-04-21 CRAN (R 3.5.0)                    
 methods    * 3.5.1      2018-07-05 local                             
 packrat      0.4.9-3    2018-06-01 CRAN (R 3.5.0)                    
 pillar       1.3.0      2018-07-14 CRAN (R 3.5.0)                    
 Rcpp         0.12.18    2018-07-23 cran (@0.12.18)                   
 rlang        0.2.1.9000 2018-08-10 Github (r-lib/rlang@8dc87a9)      
 rstudioapi   0.7        2017-09-07 CRAN (R 3.5.0)                    
 stats      * 3.5.1      2018-07-05 local                             
 tibble       1.4.2      2018-01-22 CRAN (R 3.5.0)                    
 tools        3.5.1      2018-07-05 local                             
 utils      * 3.5.1      2018-07-05 local                             
 withr        2.1.2      2018-03-15 CRAN (R 3.5.0)  

Possibly handle zip files differently

Apparently zip is one of the few formats that doesn't require knowing the file size up front, so we could stream into zip directly from the connection rather than using the scratch file.

Not sure if this is worth the effort...

Installation fails with R 4.0.2 on Windows with Rtools40

I've successfully installed archive in R 3.6.x (Windows) with the previous Rtools package. Now archive installation fails with R 4.0.2 and the new Rtools40. Perhaps something is missing in the new toolchain?

The error is as follows:

   ** using staged installation
   ** libs
   "C:/Programs/R/R-40~1.2/bin/x64/Rscript.exe" "../tools/winlibs.R"
   "C:/Programs/rtools40/mingw64/bin/"g++  -std=gnu++11 -I"C:/Programs/R/R-40~1.2/include" -DNDEBUG -I../windows/libarchive-3.2.2/include -I. -I'C:/Programs/R/R-4.0.2/library/Rcpp/include'        -O3 -march=native -c RcppExports.cpp -o RcppExports.o
   "C:/Programs/rtools40/mingw64/bin/"g++  -std=gnu++11 -I"C:/Programs/R/R-40~1.2/include" -DNDEBUG -I../windows/libarchive-3.2.2/include -I. -I'C:/Programs/R/R-4.0.2/library/Rcpp/include'        -O3 -march=native -c archive.cpp -o archive.o
   "C:/Programs/rtools40/mingw64/bin/"g++  -std=gnu++11 -I"C:/Programs/R/R-40~1.2/include" -DNDEBUG -I../windows/libarchive-3.2.2/include -I. -I'C:/Programs/R/R-4.0.2/library/Rcpp/include'        -O3 -march=native -c extract.cpp -o extract.o
   "C:/Programs/rtools40/mingw64/bin/"g++  -std=gnu++11 -I"C:/Programs/R/R-40~1.2/include" -DNDEBUG -I../windows/libarchive-3.2.2/include -I. -I'C:/Programs/R/R-4.0.2/library/Rcpp/include'        -O3 -march=native -c r_archive.cpp -o r_archive.o
   "C:/Programs/rtools40/mingw64/bin/"g++  -std=gnu++11 -I"C:/Programs/R/R-40~1.2/include" -DNDEBUG -I../windows/libarchive-3.2.2/include -I. -I'C:/Programs/R/R-4.0.2/library/Rcpp/include'        -O3 -march=native -c read.cpp -o read.o
   "C:/Programs/rtools40/mingw64/bin/"g++  -std=gnu++11 -I"C:/Programs/R/R-40~1.2/include" -DNDEBUG -I../windows/libarchive-3.2.2/include -I. -I'C:/Programs/R/R-4.0.2/library/Rcpp/include'        -O3 -march=native -c read_file.cpp -o read_file.o
   "C:/Programs/rtools40/mingw64/bin/"g++  -std=gnu++11 -I"C:/Programs/R/R-40~1.2/include" -DNDEBUG -I../windows/libarchive-3.2.2/include -I. -I'C:/Programs/R/R-4.0.2/library/Rcpp/include'        -O3 -march=native -c write.cpp -o write.o
   "C:/Programs/rtools40/mingw64/bin/"g++  -std=gnu++11 -I"C:/Programs/R/R-40~1.2/include" -DNDEBUG -I../windows/libarchive-3.2.2/include -I. -I'C:/Programs/R/R-4.0.2/library/Rcpp/include'        -O3 -march=native -c write_file.cpp -o write_file.o
   C:/Programs/rtools40/mingw64/bin/g++ -shared -s -static-libgcc -o archive.dll tmp.def RcppExports.o archive.o extract.o r_archive.o read.o read_file.o write.o write_file.o -L../windows/libarchive-3.2.2/lib/x64 -larchive -lcrypto -lnettle -lregex -lexpat -llzo2 -llzma -llz4 -lbz2 -lz -LC:/Programs/R/R-40~1.2/bin/x64 -lR
   C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchive.a(archive_string.o):(.text+0x2e1): undefined reference to `locale_charset'
   C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchive.a(archive_string.o):(.text+0x366): undefined reference to `libiconv_close'
   C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchive.a(archive_string.o):(.text+0x375): undefined reference to `libiconv_close'
   C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchive.a(archive_string.o):(.text+0x1318): undefined reference to `libiconv_open'
   C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchive.a(archive_string.o):(.text+0x133f): undefined reference to `libiconv_open'
   C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchive.a(archive_string.o):(.text+0x1382): undefined reference to `libiconv_open'
   C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchive.a(archive_string.o):(.text+0x157d): undefined reference to `libiconv_open'
   C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchive.a(archive_string.o):(.text+0x159b): undefined reference to `libiconv_open'
   C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchive.a(archive_string.o):(.text+0x4990): undefined reference to `libiconv'
   C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchive.a(archive_string.o):(.text+0x317): undefined reference to `locale_charset'
   collect2.exe: error: ld returned 1 exit status
   no DLL was created
   ERROR: compilation failed for package 'archive'
-  removing 'C:/Users/Boris/AppData/Local/Temp/Rtmpct45WA/Rinst8db46030695b/archive'
         -----------------------------------
   ERROR: package installation failed
Error: Failed to install 'archive' from GitHub:
  System command 'Rcmd.exe' failed, exit status: 1, stdout + stderr (last 10 lines):
E> C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchive.a(archive_string.o):(.text+0x157d): undefined reference to `libiconv_open'
E> C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchive.a(archive_string.o):(.text+0x159b): undefined reference to `libiconv_open'
E> C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchive.a(archive_string.o):(.text+0x4990): undefined reference to `libiconv'
E> C:/Programs/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../windows/libarchive-3.2.2/lib/x64/libarchiv

progress bars for really large archives?

Hi @jimhester, this may not be practical given the constraints of the library, but I wanted to ask anyway, since I've always appreciated the very precise progress bars in vroom and similar packages. Would it be possible for functions like archive::archive_extract() to display progress in some way?
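One possible workaround, sketched here under assumptions rather than taken from the package: extract entries one at a time and advance a base R text progress bar after each one. The helper name `extract_with_progress` and its arguments are hypothetical. The sketch assumes `archive::archive()` returns a table with a `path` column and that `archive::archive_extract()` accepts a `files` argument; older versions may name it differently. The bar advances per entry, not per byte, so a single huge file still shows no intermediate progress.

```r
# Hypothetical helper, not part of the archive package.
# `extract_one` is injectable so the loop can be exercised without a real archive.
extract_with_progress <- function(archive_path, dir = ".",
                                  entries = archive::archive(archive_path)$path,
                                  extract_one = function(entry) {
                                    # Assumed signature: archive_extract(archive, dir, files)
                                    archive::archive_extract(archive_path, dir = dir, files = entry)
                                  }) {
  n <- length(entries)
  if (n == 0L) {
    return(invisible(character()))
  }
  pb <- utils::txtProgressBar(min = 0, max = n, style = 3)
  on.exit(close(pb), add = TRUE)
  for (i in seq_len(n)) {
    extract_one(entries[[i]])        # extract a single entry
    utils::setTxtProgressBar(pb, i)  # advance the bar by one entry
  }
  invisible(entries)
}
```

Calling `extract_with_progress("big.zip", dir = tempdir())` would reopen the archive once per entry, so this is likely slower than a single `archive_extract()` call; it trades speed for visible feedback.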

Release archive 1.2.0

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • rhub::check(platform = 'ubuntu-rchk')
  • rhub::check_with_sanitizers()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Support for .lz (lzip) files

Currently, .lz (lzip) files are reported as an unrecognized format. Would it be possible to support them?

download.file(url = "https://parltrack.org/dumps/ep_votes.json.lz",
              destfile = "ep_votes.json.lz",
              mode = "wb")
archive("ep_votes.json.lz")
# Error: archive.cpp:37 archive_read_open1(): Unrecognized archive format

Error unzipping file with Ubuntu: "Invalid central directory signature"

Hi, I know this is likely an issue with unix libarchive, but if you have any insights that would be great. Here is a minimal reproducible example. Any thoughts? Thanks! Jeff

# debug extract
archive:::libarchive_version()
tf <- tempfile()
download.file( 'https://www2.census.gov/programs-surveys/acs/data/pums/2014/5-Year/csv_pus.zip' , tf , mode = 'wb' )
archive::archive_extract( tf , dir = tempdir() )
> # debug extract
> archive:::libarchive_version()
[1] ‘3.1.2’
> tf <- tempfile()
> download.file( 'https://www2.census.gov/programs-surveys/acs/data/pums/2014/5-Year/csv_pus.zip' , tf , mode = 'wb' )
trying URL 'https://www2.census.gov/programs-surveys/acs/data/pums/2014/5-Year/csv_pus.zip'
Content type 'application/zip' length 2512044838 bytes (2395.7 MB)
==================================================
downloaded 2395.7 MB

> archive::archive_extract( tf , dir = tempdir() )
Error in archive_extract_(attr(archive, "path"), file) : 
  archive_read_next_header(): Invalid central directory signature

I am using lodown (ajdamico). Using jar on the identical downloaded file, I was able to extract the files without any problem.

Error in archive_extract_(attr(archive, "path"), file) : 
  archive_read_next_header(): Invalid central directory signature
   year time_period                                                         base_folder db_tablename
23 2014      5-Year https://www2.census.gov/programs-surveys/acs/data/pums/2014/5-Year/  acs2014_5yr
                               dbfolder                              output_filename include_puerto_rico case_count
23 /media/jeff/jeff/ACS2015_5yr/MonetDB /media/jeff/jeff/ACS2015_5yr/acs2014_5yr.rds                TRUE         NA

> tf
[1] "/tmp/RtmprKBUz9/file16976313e25b"
> 
jeff@jeff-Precision-7520:/tmp/RtmprKBUz9$ jar xvf file16976313e25b
  inflated: ss14pusa.csv
  inflated: ss14pusb.csv
  inflated: ss14pusc.csv
  inflated: ss14pusd.csv
  inflated: ACS2010-2014_PUMS_README.pdf
jeff@jeff-Precision-7520:/tmp/RtmprKBUz9$

Here is Ubuntu info:

jeff@jeff-Precision-7520:/tmp/RtmprKBUz9$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.3 LTS
Release:	16.04
Codename:	xenial
jeff@jeff-Precision-7520:/tmp/RtmprKBUz9$ uname -a
Linux jeff-Precision-7520 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
jeff@jeff-Precision-7520:/tmp/RtmprKBUz9$ 

Can't install with pak

> pak::pkg_install("jimhester/archive")
ℹ Checking for package metadata updates
✔ All 12 metadata files are current.                                                       
✔ Loading session disk cached package metadata                                             
✔ Using cached package metadata                                                            
                                                                                           
→ Will install 1 packages:
  jimhester/archive
  
→ Will update 1 packages:
  tidyverse/glue
  
→ Will not update 15 packages.

! Package(s) `glue` are already loaded, installing them may cause
  problems. Use `pkgload::unload()` to unload them.

→ Will download 2 packages with unknown size.

? Do you want to continue (Y/n) Y
Error: callr subprocess failed: Failed to download archive from `https://api.github.com/repos/jimhester/archive/zipball/09754896e63a96f928aaacf9528589caba7d6128`.

Release archive 1.0.1

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • rhub::check(platform = 'ubuntu-rchk')
  • rhub::check_with_sanitizers()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

error in installation

Looks like a great package, but I am currently facing this when trying to install:
ERROR: configuration failed for package ‘archive’
fatal error: 'archive.h' file not found
Are there any other dependencies needed for archive?

archive_extract on 7-zip file fails on unix

hi, thanks for the awesome software. here's a minimal reproducible example that walks through a file that fails with archive_extract but not 7-zip software on unix. the problem does not occur on windows

## warning: large download


# example from the PISA 2015 website to show there's no download or site-specific problem
fn <- "http://vs-web-fs-1.oecd.org/pisa/PUF_SAS_COMBINED_CMB_SCH_QQQ.zip"

tf <- tempfile()
tf2 <- tempfile()

download.file( fn , tf , mode = 'wb' )

file.info( tf )
file.copy( tf , tf2 )

# works fine
system(paste0("7za e -o'" , tempdir() , "' '" , tf2 , "'"))
R.utils::countLines( file.path( tempdir() , "cy6_ms_cmb_sch_qqq.sas7bdat" ) )

file.remove( file.path( tempdir() , "cy6_ms_cmb_sch_qqq.sas7bdat" ) )

# works fine
archive::archive_extract( tf , dir = tempdir() )
R.utils::countLines( file.path( tempdir() , "cy6_ms_cmb_sch_qqq.sas7bdat" ) )

file.remove( tf )
file.remove( tf2 )


# much larger file from the PISA 2015
fn <- "http://vs-web-fs-1.oecd.org/pisa/PUF_SAS_COMBINED_CMB_STU_QQQ.zip"

download.file( fn , tf , mode = 'wb' )

file.info( tf )
file.copy( tf , tf2 )

# works fine
system(paste0("7za e -o'" , tempdir() , "' '" , tf2 , "'"))
R.utils::countLines( file.path( tempdir() , "cy6_ms_cmb_stu_qq2.sas7bdat" ) )

file.remove( file.path( tempdir() , "cy6_ms_cmb_stu_qq2.sas7bdat" ) )

# FAILS on unix WORKS on windows
archive::archive_extract( tf , dir = tempdir() )
# archive_extract() appears to have worked, but the extracted files are all blank
R.utils::countLines( file.path( tempdir() , "cy6_ms_cmb_stu_qq2.sas7bdat" ) )

archive_extract failure on both unix and windows

hi, archive_extract fails on both unix and windows, but R's default unzip() function succeeds. thank you

tf <- tempfile()
download.file( "http://download.inep.gov.br/microdados/micro_enem1998.zip" , tf , mode = 'wb' )

first_zipped_file <- unzip( tf , exdir = tempdir() )

second_zipped_file <- grep( "\\.zip$" , first_zipped_file , value = TRUE )

# works
windows_unzip <- unzip( second_zipped_file , exdir = tempdir() )
R.utils::countLines( windows_unzip )
file.remove( windows_unzip )


# works
system(paste0("7za e -o'" , tempdir() , "' '" , second_zipped_file , "'"))
R.utils::countLines( windows_unzip )
file.remove( windows_unzip )


# fails
archive::archive_extract( second_zipped_file , dir = tempdir() )
R.utils::countLines( windows_unzip )

Installation

Do we need to install libarchive-dev before installing this R package? Is there a way for all the dependencies to be built as part of installing this R package?

Currently, when I try to install the package from github, I get the following message

devtools::install_github("jimhester/archive")
Note: no visible binding for global variable '.Data'
Downloading GitHub repo jimhester/archive@master
from URL https://api.github.com/repos/jimhester/archive/zipball/master
Installing archive
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL
'/tmp/RtmpyKaB6I/devtools4a93024acee/jimhester-archive-a203312'
--library='/home/rstudio/R/x86_64-pc-linux-gnu-library/3.2' --install-tests

* installing *source* package ‘archive’ ...
PKG_CFLAGS=
PKG_LIBS=-larchive
:1:21: fatal error: archive.h: No such file or directory
compilation terminated.
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libarchive was not found. Try installing:
 * deb: libarchive-dev (Debian, Ubuntu, etc)
 * rpm: libarchive-devel (Fedora, CentOS, RHEL)
 * csw: libarchive_dev (Solaris)
 * brew: libarchive (Mac OSX)
If libarchive is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a libarchive.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'

ERROR: configuration failed for package ‘archive’
* removing ‘/home/rstudio/R/x86_64-pc-linux-gnu-library/3.2/archive’
Error: Command failed (1)
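The configure message above points at pkg-config, so a quick check from R before reinstalling may help. The following is a minimal sketch using base R only; the helper name `libarchive_visible` is hypothetical. It asks pkg-config whether a `libarchive.pc` file is discoverable, which is the condition the configure script describes. A `FALSE` result suggests installing the development package listed in the message, or adjusting `PKG_CONFIG_PATH`.

```r
# Hypothetical helper: TRUE if pkg-config is on the PATH and can find libarchive.
libarchive_visible <- function(pkg_config = unname(Sys.which("pkg-config"))) {
  if (!nzchar(pkg_config)) {
    return(FALSE)  # pkg-config itself was not found
  }
  # `pkg-config --exists <module>` exits with status 0 when the module is found
  status <- suppressWarnings(
    system2(pkg_config, c("--exists", "libarchive"), stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

libarchive_visible()
```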

R has to be restarted

I have a WinRAR file containing 7 csv files. With the code below, reading one file at a time works, but when I use a for loop over the set of files, R hangs and has to be restarted.

Code:

library(archive)
library(readr)
setwd("C:/Users/debsush/Documents/IEOD/")

EOD <- data.frame(matrix(NA, nrow = 1, ncol = 11),stringsAsFactors = FALSE)
colnames(EOD) <- c("TICKER","NAME","PRODUCT","EXCHANGE","DATE","OPEN","HIGH","LOW","CLOSE","VOLUME","IO")
#Remove all NA rows from GStocksFinancials
EOD<- EOD[which(!is.na(EOD$TICKER)), ]
#Process Data
doclist = list.files()
archive <- archive(doclist[[1]])
arcFile=NULL
for (counter in 1:6){
  fileName = toupper(archive$path[counter])
  if (length(grep(".CSV",fileName))>0 && grep(".CSV",fileName)==1) {
    
    if (length(grep("BSE",fileName))>0 && grep("BSE",fileName)==1){
      exchange="BSE"
    } else {
      exchange ="NSE"
    }
    
    if (length(grep("INDICES",fileName))>0 && grep("INDICES",fileName)==1){
      product="EQINDEX"
    } else if (length(grep("OPTIONS",fileName))>0 && grep("OPTIONS",fileName)==1){
      product="OPTIONS"
    } else if (length(grep("FOREX",fileName))>0 && grep("FOREX",fileName)==1){
      product ="FOREXFUT"
    } else if (length(grep("FUT",fileName))>0 && grep("FUT",fileName)==1){
      product ="EQFUT"
    } else {
      product ="EQCASH"
    }
    arcFile = archive_read(archive,archive$path[counter])
    file=read_csv(arcFile,col_type=cols())
    arcFile=NULL
    cat(counter)
    file$exchange=exchange
    file$product=product
    colnames(file)=c("TICKER","NAME","DATE","OPEN","HIGH","LOW","CLOSE","VOLUME","IO","EXCHANGE","PRODUCT")
    file=file[,c("TICKER","NAME","PRODUCT","EXCHANGE","DATE","OPEN","HIGH","LOW","CLOSE","VOLUME","IO")]
    EOD = rbind(EOD,file)
  }
}

I am not sure if it is an issue with archive or readr. The behavior is very random and unpredictable.

PS: I can share the winrar file if that helps. Github does not allow attaching winrar files so I would have to email you.

Regards
SD
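Independently of the hang, the file-name classification in the loop above could be written more compactly. This is only a tidy-up sketch, not a fix for the crash, and the helper name `classify_entry` is hypothetical. It uses `grepl(fixed = TRUE)` so that the dot in ".CSV" is matched literally instead of as a regular-expression wildcard, and it keeps the same order of checks as the original code (FOREX before FUT).

```r
# Hypothetical helper: derive exchange and product from an archive entry name.
classify_entry <- function(path) {
  name <- toupper(path)
  has <- function(token) grepl(token, name, fixed = TRUE)  # literal substring match

  exchange <- if (has("BSE")) "BSE" else "NSE"

  product <-
    if (has("INDICES")) {
      "EQINDEX"
    } else if (has("OPTIONS")) {
      "OPTIONS"
    } else if (has("FOREX")) {
      "FOREXFUT"
    } else if (has("FUT")) {
      "EQFUT"
    } else {
      "EQCASH"
    }

  list(is_csv = has(".CSV"), exchange = exchange, product = product)
}

classify_entry("nse_fut_2017.csv")
```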

Put on CRAN?

Hi, this package works wonders for 7z products in R. Thank you!

Are there any plans for it to return to CRAN? It would be useful as a dependency for packages aiming for a CRAN submission.

Mike

Support appending to archives?

Currently archive_write always creates a new archive. It would be useful to add new files to an existing archive as well as appending to a file within the archive.

zipped file works on windows, fails on unix

hi, here's a minimal reproducible example.. failed on two different unix machines. not sure if this is unsupported? thank you

tf <- tempfile()
download.file( 'https://nlsinfo.org/cohort-data/nlsy97_all_1997-2013.zip' , tf , mode = 'wb' )
archive::archive_extract( tf , dir = tempdir() )



> archive::archive_extract( tf , dir = tempdir() )
Error in archive_extract_(attr(archive, "path"), file) :
  archive_write_data_block(): Write failed
>
> traceback()
3: stop(list(message = "archive_write_data_block(): Write failed",
	   call = archive_extract_(attr(archive, "path"), file), cppstack = list(
		   file = "", line = -1L, stack = c("/export/scratch1/home/damico/R/x86_64-redhat-linux-gnu-library/3.3/archive/libs/archive.so(Rcpp::exception::exception(char const*, bool)+0x84) [0x7f6a10b583b4]",
		   "/export/scratch1/home/damico/R/x86_64-redhat-linux-gnu-library/3.3/archive/libs/archive.so(void Rcpp::stop<char const*>(char const*, char const*&&)+0x4f) [0x7f6a10b662cf]",
		   "/export/scratch1/home/damico/R/x86_64-redhat-linux-gnu-library/3.3/archive/libs/archive.so(archive_extract_(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Rcpp::Vector<16, Rcpp::PreserveStorage>, unsigned long)+0x2c7) [0x7f6a10b661e7]",
		   "/export/scratch1/home/damico/R/x86_64-redhat-linux-gnu-library/3.3/archive/libs/archive.so(_archive_archive_extract_+0x191) [0x7f6a10b560f1]",
		   "/usr/lib64/R/lib/libR.so(+0x10406a) [0x7f6a1ff7106a]",
		   "/usr/lib64/R/lib/libR.so(Rf_eval+0x180) [0x7f6a1ff788b0]",
		   "/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x51d) [0x7f6a1ff7a51d]",
		   "/usr/lib64/R/lib/libR.so(+0x103ad6) [0x7f6a1ff70ad6]",
		   "/usr/lib64/R/lib/libR.so(Rf_eval+0x180) [0x7f6a1ff788b0]",
		   "/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x51d) [0x7f6a1ff7a51d]",
		   "/usr/lib64/R/lib/libR.so(Rf_eval+0x30d) [0x7f6a1ff78a3d]",
		   "/usr/lib64/R/lib/libR.so(Rf_ReplIteration+0x1ba) [0x7f6a1ffa03aa]",
		   "/usr/lib64/R/lib/libR.so(+0x1337b1) [0x7f6a1ffa07b1]",
		   "/usr/lib64/R/lib/libR.so(run_Rmainloop+0x48) [0x7f6a1ffa0868]",
		   "/usr/lib64/R/bin/exec/R(main+0x1b) [0x55bcf9ac78cb]",
		   "/lib64/libc.so.6(__libc_start_main+0xf1) [0x7f6a1d4eb731]",
		   "/usr/lib64/R/bin/exec/R(_start+0x29) [0x55bcf9ac7909]"
		   ))))
2: archive_extract_(attr(archive, "path"), file)
1: archive::archive_extract(tf, dir = tempdir())
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 24 (Twenty Four)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] httr_1.2.1      R6_2.2.2        tools_3.3.3     withr_1.0.2
 [5] tibble_1.3.3    curl_2.3        Rcpp_0.12.12    memoise_1.0.0
 [9] git2r_0.18.0    digest_0.6.12   rlang_0.1.1     devtools_1.13.2
[13] archive_1.0.0
>

R Session Aborted

When I try to use archive, my R session crashes.

See the reproducible example below:

library(RCurl) 
library(archive)
library(readr)




# year() is not defined by the packages loaded above; format() yields the current year
caged_url <- paste0("ftp://ftp.mtps.gov.br/pdet/microdados/CAGED/", format(Sys.Date(), "%Y"), "/")

caged_arquivos <- getURL(url = caged_url,
                         verbose=TRUE,ftp.use.epsv=TRUE, dirlistonly = TRUE)

caged_arquivos <- unlist(strsplit(caged_arquivos, "\r\n"))
caged_arquivos <- sort(caged_arquivos, decreasing = FALSE)

# Set the name of the last file to be downloaded
ultimo_arquivo <- tail(caged_arquivos,n=1)

endereco_ultimo_arquivo <- paste0(caged_url,ultimo_arquivo)

bin <- getBinaryURL(endereco_ultimo_arquivo,ssl.verifypeer=FALSE)

con <- file(ultimo_arquivo, open = "wb")
writeBin(bin, con)
close(con)

caged <- archive("CAGEDEST_092017.7z")

caged <- read_csv(file =  archive_read(caged),col_types = cols()) 

Many thanks!

Packing to RAR files

Hi,

I tried to pack some files into a ".rar" archive with the archive R package, but I was not able to. In my view, it would be great if this feature could be added to the archive package.

Best regards,

Emmanuel

Parsing filters is unsupported

archive returns the following error when extracting a .rar file:

> archive_extract("frota_por_municipio_e_tipo_1-2015.rar")
Error in archive_extract_(attr(archive, "path"), file) :
  archive_read_data_block(): Parsing filters is unsupported.

The file is extracted, but its size is 0 bytes.

The extraction is done correctly with:

unrar e frota_por_municipio_e_tipo_1-2015.rar

R version 4.0.4
libarchive 3.5.1-1
unrar 1:6.0.3-1

Error in archive$path[[file]] : subscript out of bounds

Hello. First of all, thank you for your work and the generosity of making it publicly available.

I'm trying to open a .rar.001 file and I get this: Error in archive$path[[file]] : subscript out of bounds. I guess it's because of the extension. I'm not used to debugging in R, so the following might not be helpful. However, I'm attaching the code just in case.

In case multi-part rar files are not supported, it'd be helpful to add a brief error message explaining that.

Thanks!

#I've set the wd appropriately in the following example
> arch <- "compressed_file_name.rar.001"
> debug(archive_read)
> read_csv(archive_read(arch), col_types = cols())
debugging in: archive_read(arch)
debug: {
    archive <- as_archive(archive)
    if (is_number(file)) {
        file <- archive$path[[file]]
    }
    assert("`file` must be a length one character vector or numeric", 
        length(file) == 1 && (is.character(file) || is.numeric(file)))
    assert(paste0("`file` {file} not found in `archive` {archive}"), 
        file %in% archive$path)
    read_connection(attr(archive, "path"), mode = mode, 
        file, archive_formats()[format], archive_filters()[filter])
}
Browse[2]> 
debug: archive <- as_archive(archive)
Browse[2]> 
debug: if (is_number(file)) {
    file <- archive$path[[file]]
}
Browse[2]> 
debug: file <- archive$path[[file]]
Browse[2]> 
Error in archive$path[[file]] : subscript out of bounds
> debug(as_archive)
Error in debug(as_archive) : object 'as_archive' not found
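The trace above shows the failure at `archive$path[[file]]`, which is what base R reports when a numeric index exceeds the number of listed entries (for example, when no entries are listed at all). A guard along the following lines could turn that into a clearer message. This is a sketch under assumptions: the helper name `resolve_entry` and the message wording are hypothetical, and it does not reflect how the package actually handles multi-part archives.

```r
# Hypothetical helper mirroring the `archive$path[[file]]` step with clearer errors.
# `paths` stands in for the `path` column of the archive listing.
resolve_entry <- function(paths, file = 1L) {
  stopifnot(length(file) == 1L, is.character(file) || is.numeric(file))
  if (is.numeric(file)) {
    if (length(paths) == 0L) {
      stop("archive lists no entries; multi-part or unsupported archives may end up here",
           call. = FALSE)
    }
    if (file < 1 || file > length(paths)) {
      stop(sprintf("`file` index %d is out of range (archive has %d entries)",
                   as.integer(file), length(paths)), call. = FALSE)
    }
    return(paths[[as.integer(file)]])
  }
  if (!file %in% paths) {
    stop(sprintf("`file` '%s' not found in archive", file), call. = FALSE)
  }
  file
}

resolve_entry(c("a.csv", "b.csv"), 2)
```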

Fails to install with R 4.1.0

I am running Ubuntu 20.04, and had successfully installed archive with devtools in the previous 4.0.x versions of R. But now I'm getting a strange installation failure. I definitely have libarchive-dev installed.

sudo apt install libarchive-dev
Reading package lists... Done
Building dependency tree       
Reading state information... Done
libarchive-dev is already the newest version (3.4.0-2ubuntu1).

Any ideas what I can do?

Running `R CMD build`...
* checking for file ‘/tmp/RtmpSXhw5i/remotes4a59f2fd4cfb7/jimhester-archive-8ce0ba7/DESCRIPTION’ ... OK
* preparing ‘archive’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* running ‘cleanup’
* installing the package to process help pages
      -----------------------------------
* installing *source* package ‘archive’ ...
** using staged installation
Found pkg-config cflags and libs!
'config' variable 'CXXCPP' is defunct
PKG_CFLAGS=
PKG_LIBS=-larchive
./configure: line 53: -g: command not found
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libarchive was not found. Try installing:
 * deb: libarchive-dev (Debian, Ubuntu, etc)
 * rpm: libarchive-devel (Fedora, CentOS, RHEL)
 * csw: libarchive_dev (Solaris)
 * brew: libarchive (Mac OSX)
If libarchive is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a libarchive.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘archive’
* removing ‘/tmp/RtmpTk4Goh/Rinst4a8833af25591/archive’
      -----------------------------------
ERROR: package installation failed
STDOUT:
* checking for file ‘/tmp/RtmpSXhw5i/remotes4a59f2fd4cfb7/jimhester-archive-8ce0ba7/DESCRIPTION’ ... OK
* preparing ‘archive’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* running ‘cleanup’
* installing the package to process help pages
      -----------------------------------
* installing *source* package ‘archive’ ...
** using staged installation
Found pkg-config cflags and libs!
'config' variable 'CXXCPP' is defunct
PKG_CFLAGS=
PKG_LIBS=-larchive
./configure: line 53: -g: command not found
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libarchive was not found. Try installing:
 * deb: libarchive-dev (Debian, Ubuntu, etc)
 * rpm: libarchive-devel (Fedora, CentOS, RHEL)
 * csw: libarchive_dev (Solaris)
 * brew: libarchive (Mac OSX)
If libarchive is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a libarchive.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘archive’
* removing ‘/tmp/RtmpTk4Goh/Rinst4a8833af25591/archive’
      -----------------------------------
ERROR: package installation failed
STDERR:

Error: Failed to install 'archive' from GitHub:
  Failed to `R CMD build` package, try `build = FALSE`.

Is the Remote specification on glue needed?

I have an appveyor build failing that I don't quite understand: https://ci.appveyor.com/project/raymondben/bowerbird. The failure is related to archive: it appears that glue is being installed OK from CRAN but then because the github version is newer, and because archive's DESCRIPTION file includes Remotes: tidyverse/glue, it then tries to install from github and fails.
Is the Remotes spec needed? (Bearing in mind that I am not sure that this is even the problem here. I've tried to discourage appveyor from building glue from source, but that doesn't seem to help)

Unused/unclosed connections with saveRDS

This may be related to #17

  • When using saveRDS() with archive, it eventually results in a warning about unclosed connections.
  • This does not happen with write.csv().
  • I get this warning with archive format set to tar, cpio and zip.
for (i in 1:100) {
  saveRDS(mtcars, archive::archive_write(archive = "archive.file", file = 'first', format = 'tar'))
}
head(warnings())
Warning messages:
1: In .Internal(textConnection(nm, object, open, env, type)) :
  closing unused connection 127 (input)
2: In .Internal(textConnection(nm, object, open, env, type)) :
  closing unused connection 126 (input)
3: In .Internal(textConnection(nm, object, open, env, type)) :
  closing unused connection 125 (input)
4: In .Internal(textConnection(nm, object, open, env, type)) :
  closing unused connection 124 (input)
5: In .Internal(textConnection(nm, object, open, env, type)) :
  closing unused connection 123 (input)
6: In .Internal(textConnection(nm, object, open, env, type)) :
  closing unused connection 122 (input)

'libarchive' v3.3.3 installed via 'brew'

# * installing *source* package ‘archive’ ...
# PKG_CFLAGS=-I/usr/local/Cellar/libarchive/3.3.3/include -I/usr/local/Cellar/xz/5.2.4/include
# PKG_LIBS=-L/usr/local/Cellar/libarchive/3.3.3/lib -L/usr/local/Cellar/xz/5.2.4/lib -larchive -lexpat -llzma -lzstd -llz4 -lbz2 -lz -llzma -D_THREAD_SAFE -pthread

Session info

> devtools::session_info()
Session info ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.2.907)           
 language (EN)                        
 collate  en_AU.UTF-8                
 date     2018-09-26                  

Packages ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 package    * version    date       source                            
 archive    * 1.0.0      2018-09-25 Github (jimhester/archive@11e65d7)
 base       * 3.5.1      2018-07-05 local                             
 compiler     3.5.1      2018-07-05 local                             
 crayon       1.3.4      2017-09-16 CRAN (R 3.5.0)                    
 datasets   * 3.5.1      2018-07-05 local                             
 devtools     1.13.6     2018-06-27 CRAN (R 3.5.0)                    
 digest       0.6.15     2018-01-28 CRAN (R 3.5.0)                    
 glue         1.3.0      2018-09-25 Github (tidyverse/glue@4e74901)   
 graphics   * 3.5.1      2018-07-05 local                             
 grDevices  * 3.5.1      2018-07-05 local                             
 memoise      1.1.0      2017-04-21 CRAN (R 3.5.0)                    
 methods    * 3.5.1      2018-07-05 local                             
 packrat      0.4.9-3    2018-06-01 CRAN (R 3.5.0)                    
 pillar       1.3.0      2018-07-14 CRAN (R 3.5.0)                    
 Rcpp         0.12.18    2018-07-23 cran (@0.12.18)                   
 rlang        0.2.1.9000 2018-08-10 Github (r-lib/rlang@8dc87a9)      
 rstudioapi   0.7        2017-09-07 CRAN (R 3.5.0)                    
 stats      * 3.5.1      2018-07-05 local                             
 tibble       1.4.2      2018-01-22 CRAN (R 3.5.0)                    
 tools        3.5.1      2018-07-05 local                             
 utils      * 3.5.1      2018-07-05 local                             
 withr        2.1.2      2018-03-15 CRAN (R 3.5.0)  
