mholt / archiver
Easily create & extract archives, and compress & decompress files of various formats
Home Page: https://pkg.go.dev/github.com/mholt/archiver/v4
License: MIT License
I am trying to extract a zip file on Windows that appears to have been created on Linux.
I think it fails because the paths inside the archive use /, and in sanitizeExtractPath, filepath.Join converts them to \, which causes a mismatch in the strings.HasPrefix check.
I have created a simple piece of code that reproduces the issue (only on Windows):
package main

import (
	"io"
	"log"
	"net/http"
	"os"

	"github.com/mholt/archiver"
)

var url = "https://dl.google.com/go/go1.10.4.windows-386.zip"

func getTestZip(fileName string) {
	log.Print("Start download")
	out, err := os.Create(fileName)
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	_, err = io.Copy(out, resp.Body)
	if err != nil {
		log.Fatal(err)
	}
}

func main() {
	tmpZip := "test.zip"
	if _, err := os.Stat(tmpZip); os.IsNotExist(err) {
		getTestZip(tmpZip)
	}
	log.Print("Start extract")
	err := archiver.Zip.Open(tmpZip, "tmp/")
	if err != nil {
		log.Fatal(err)
	}
	log.Print("Finished")
}
which will output:
2018/09/02 15:04:05 Start download
2018/09/02 15:04:19 Start extract
2018/09/02 15:04:19 go/: illegal file path
I used Go 1.11 on Windows 10.
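For reference, a portable version of this kind of check can normalize both sides with the standard library before comparing prefixes. This is only a sketch of the idea, not archiver's actual sanitizeExtractPath:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// withinDir reports whether joining name under dest stays inside dest.
// filepath.Join calls filepath.Clean, so "/" separators in archive entry
// names are normalized to the platform separator before the prefix check,
// which avoids the "/" vs "\" mismatch on Windows.
func withinDir(dest, name string) bool {
	target := filepath.Join(dest, name)
	cleanDest := filepath.Clean(dest)
	return target == cleanDest ||
		strings.HasPrefix(target, cleanDest+string(os.PathSeparator))
}

func main() {
	fmt.Println(withinDir("tmp", "go/"))           // entry stays inside dest
	fmt.Println(withinDir("tmp", "../etc/passwd")) // traversal attempt is rejected
}
```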
Compress two directories (/usr/local/bin, /usr/local/etc) to example.tar.gz:
$ tar -ztf example.tar.gz
bin/
bin/aria2c
bin/awk
...
bin/zipdetails
etc/
etc/GeoIP.conf
...
etc/nginx/
etc/nginx/fastcgi.conf
Extract example.tar.gz to /tmp/.
Current directory structure.
.
├── bin
└── etc
Expected directory structure.
.
└── usr
└── local
├── bin
└── etc
In targz.go:

// Create opens txz for writing a compressed
// tar archive to out.
func (tgz *TarGz) Create(out io.Writer) error {
	tgz.wrapWriter()
	return tgz.Create(out)
}

This is a recursive call, which won't work. It should be:

// Create opens txz for writing a compressed
// tar archive to out.
func (tgz *TarGz) Create(out io.Writer) error {
	tgz.wrapWriter()
	return tgz.Tar.Create(out)
}
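The underlying Go pitfall is easy to reproduce with a toy embedded type: a method on the outer type must qualify the embedded type's method explicitly, or it calls itself. A minimal sketch (the types are illustrative, not archiver's real ones):

```go
package main

import "fmt"

// Tar stands in for archiver's tar implementation.
type Tar struct{}

func (t *Tar) Create() string { return "tar stream" }

// TarGz embeds Tar, mirroring the structure in targz.go.
type TarGz struct {
	Tar
}

// Create must delegate to the embedded type explicitly: calling
// tgz.Create() here would invoke this same method again and recurse
// until the stack overflows.
func (tgz *TarGz) Create() string {
	return "gzip(" + tgz.Tar.Create() + ")"
}

func main() {
	tgz := &TarGz{}
	fmt.Println(tgz.Create())
}
```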
Hello!
I've become a frequent user of archiver, since it's totally generalist and efficient. Opening huge archives, or several archives at once, is a bit painful, which led me to think about these improvements.

Regex matching:
Archiving or reading archive files using a simple regex match:
archiver open -r *.zip mydestination

Multiple targets (for regex expressions and simple file names):
For files:
archiver open file1.zip file2.tar mydestination
For regex:
archiver open -r *.zip *.tar mydestination

Distinct destinations:
# simple
archiver open file1.zip -d mydest1 file2.zip -d mydest2
# regex
archiver open -r *.zip -d mydest1 *.tar -d mydest2

Parallel opening:
archiver --parallel --cores=2 open -r *.zip *.tar mydestination

All the examples above are for opening multiple files. For making archives, parallel support is not too useful, but regex matching might be interesting:
archiver make -r myarchive.zip *.png

Specifications:
Currently archiver is built on a simple command-line parser; implementing all these features might require Cobra or another command-line library.
I would be glad to hear what you think and to help implement these features!
First of all, big thanks for this great little helper. It makes writing replacements for shell scripts a real joy :)
One thing I ran into, though, is that the current implementation of, for instance, the Zip extractor seems to leak some file descriptors. Especially for larger archives, this can lead to too many file descriptors being open at the same time. In that situation I received an error like:
creating new file: open path/to/a/file: too many open files in system
I haven't found the dangling descriptors yet :(
I'd like to make a PR to exclude certain directories when archiving.
So we just need to pass down a slice of excluded paths to filepath.Walk, then return filepath.SkipDir if the path matches any of the excluded paths?
What do you think?
Hi, I think it could be possible using https://godoc.org/xi2.org/x/xz:

package main

import (
	"archive/tar"
	"fmt"
	"io"
	"log"
	"os"

	"xi2.org/x/xz"
)

func main() {
	// Open the file
	f, err := os.Open("myfile.tar.xz")
	if err != nil {
		log.Fatal(err)
	}
	// Create an xz Reader
	r, err := xz.NewReader(f, 0)
	if err != nil {
		log.Fatal(err)
	}
	// Create a tar Reader
	tr := tar.NewReader(r)
	// Iterate through the files in the archive.
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			// end of tar archive
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		switch hdr.Typeflag {
		case tar.TypeDir:
			// create a directory
			fmt.Println("creating: " + hdr.Name)
			err = os.MkdirAll(hdr.Name, 0777)
			if err != nil {
				log.Fatal(err)
			}
		case tar.TypeReg, tar.TypeRegA:
			// write a file
			fmt.Println("extracting: " + hdr.Name)
			w, err := os.Create(hdr.Name)
			if err != nil {
				log.Fatal(err)
			}
			_, err = io.Copy(w, tr)
			if err != nil {
				log.Fatal(err)
			}
			w.Close()
		}
	}
	f.Close()
}

Can you check this? Thanks!
I'm attempting to extract rar files that are multi-part (.rar, .r01, .r02, etc.). Using the unrar tools available out there, I just pass them a single file name (the .rar file) and they extract all the subsequent volumes. I'm using archiver as a library, calling archiver.Rar.Open(file, path), and then receive this error: writing file: rardecode: archive continues in next volume
Am I missing something, or is this support missing? Thanks!
Go packaging tools that work with versions (e.g. dep) are picking up the release version v2.0.0 from 2016. The latest code needs to be packaged in a release version above 2.0.0 so that it can work with these tools. Is this possible to do ASAP?
With your last change, the interface changed, and I am trying to move from archiver.Tar.Make(...) to the new interface, but I'm not having any luck. I'm a bit of a Go novice, so please bear with me.

"github.com/mholt/archiver"
...
err := archiver.Archiver.Archive([]string{
	"src",
	"test",
	"help",
}, "/tmp/some.tar")

gives me:

./main.go:32:34: not enough arguments in call to method expression archiver.Archiver.Archive
	have ([]string, string)
	want (archiver.Archiver, []string, string)
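The error arises because archiver.Archiver.Archive is a method expression: naming a method on a type (or interface) turns the receiver into an explicit first parameter, which is why the compiler "wants" an extra archiver.Archiver argument. The fix is to call Archive on a concrete archiver value rather than on the interface type. The mechanics can be shown with a toy type (names are illustrative):

```go
package main

import "fmt"

type Greeter struct{ name string }

func (g Greeter) Greet(msg string) string { return g.name + ": " + msg }

func main() {
	// A method expression names the method on the TYPE, so the receiver
	// must be passed as an explicit first argument — exactly what the
	// "want (archiver.Archiver, []string, string)" error describes.
	f := Greeter.Greet
	fmt.Println(f(Greeter{"tar"}, "hello"))

	// Calling the method on a VALUE supplies the receiver implicitly,
	// which is the usual (and intended) form.
	g := Greeter{"tar"}
	fmt.Println(g.Greet("hello"))
}
```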
When packaging into a gzip archive, an overly long directory path causes an error. Archives packaged on Windows also have problems when decompressed on Linux.
Hi,
Please consider providing exported functions that take a Reader / Writer as an argument instead of file names, too (where possible). It would make the library more flexible by enabling streaming, buffering, and in-memory use cases. Thanks.
I've been using archiver for a while for a tool that downloads releases from github (and other places) and automatically unpacks them to make their commands available for running. Archiver has worked out very well so far.
Today, I wanted to add restic to my tool, and its downloads are only bzip2-compressed, as opposed to being tar'd first. This is not supported by archiver.
It looks like it would be possible to add support for archives of one file that are only compressed (gz or bzip2). I am wondering, would this support be in line with the goals of the project and likely be accepted? I want to know before investing more effort.
Thanks for making such a useful library.
Having been using this package for a few years now, I've encountered a number of issues that lead me to want to redesign this package entirely: burn it down and start over, copying only the fundamental parts of the code, and not worrying about backwards compatibility.
Some specific issues I've experienced:
Too much magic. Recently I spent a day debugging a problem where a .zip file could not be reliably extracted with archiver. Sometimes it would work, sometimes it wouldn't. I eventually discovered that this is because archiver determines which extractor to use, based on the extension and the file header, while iterating through a map of formats (which is not ordered). If the Zip format came first, it matched by extension but failed to extract; if the TarGz format came first, it matched by file header (because the file was actually a tar.gz), and extraction succeeded.
Weak API. Apparently I was able to accidentally create a .tar.gz file with the Zip archiver, because the name I built for the file was not attached to which archiver format I was using. I can do archiver.TarGz.Make("file.zip") without errors, which is bad. Here's the code that led to my bug in the first place (notice the missing . in "zip"):

var a archiver.Archiver = archiver.TarGz
if filepath.Ext(outputFile) == "zip" {
	a = archiver.Zip
}

^ Bad package design.
Not enough customizability. Namely: compression level; whether to include a top-level named folder vs. just its contents (similar to how rsync works based on the presence of a trailing slash); and whether to overwrite existing files when outputting.
Lack of native streaming capabilities. From a library perspective, I should be able to stream in a zip file and spit out individual files, or stream in individual files (or a list of filenames?) and spit out a single zip file.
There is no true cross-platform native solution to zip-slip (yet). I had to disable the "security feature" that prevented me from extracting a perfectly safe archive. Even "SecureJoin" solutions don't cut it (read the linked thread, and its linked threads). For now, these "mitigations" only get in the way.
Not enough power to inspect archives or cherry-pick files. It would be helpful to be able to work with archives' contents without performing an extraction, such as getting listings, or filtering which files are extracted, etc.
General solutions:
When possible (almost always), match only by file header and ignore the file extension. If the file contents are not (yet) available, then use the extension, but only after a warning or explicit opt-in. Or (maybe "Also"), require that the file extension, where present, matches the format when creating an archive.
Be verbose in the error messages; if doing any magic, report it, or make the magic explicitly opt-in, either with a configuration parameter or a special high-level API that is documented as being magical and wraps the underlying, concrete, explicit functions.
Couple the file extension to the archiver. For example: don't allow the Zip archiver to make a .tar.gz file. The buggy code above could have been avoided with something more like archiver.Make(outputFile, files...), which uses the extension of outputFile to force a matching format.
Expand the API so that an archiver is created for a specific format before being used, rather than having hard-coded globals like archiver.Zip as we do now. This will allow more customization too. Imagine zipArch := archiver.Zip{CompressionLevel: 10} or something similar.
Be explicit about our threat model, which is being adjusted, to state that the files are expected to be trusted, i.e. don't download files you don't trust. Maybe it is possible to inspect a file before extracting it to know whether it could be malicious (e.g. look for zip-slip patterns in file names), but I am not sure about that yet.
Moar interfaces. We have one, Archiver, but we might need more, to accommodate an expanded design with more features. Small interfaces are the best.
Rename the package to archive. (Decided to keep it the same.)
This issue is to track the discussion about the new design; work will hopefully begin soon, as I can find the time.
It seems possible to use Create() and Archive(), but Archive() requires a file path to be specified, does some checks, and creates a directory, none of which is necessary when streaming.
How can I add a file to an existing zip file?
Travis build log (shortened to show relevant information):
go version go1.7.1 linux/amd64
...
gostuff/util.go:26: cannot call non-function archiver.Zip (type archiver.zipFormat)
gostuff/util.go:36: undefined: archiver.Unzip
Line 26 in util.go:
err := archiver.Zip(destination, source)
Line 36 in util.go:
err := archiver.Unzip(source, destination)
The travis build was successful last night, just pushed a commit tonight and noticed the travis build failing. I tested on my development and production environments and I do not encounter this error. I did not modify any code recently in my util.go file so I can only guess that this archiver package made an update within the past 24 hours that was not backward compatible.
The link to the travis build can be seen here https://travis-ci.org/jonpchin/GoChess/builds/166621466
I was looking at the tarFile function and saw a hardcoded "/" in it:
https://github.com/mholt/archiver/blob/master/tar.go#L146
Shouldn't we use os.PathSeparator instead of "/"? If necessary I can send a PR.
Ignore/close this issue if there's no problem using this character as a path separator in Windows environments. I'm also a UNIX user. 😄
I want to compress all the files in the current directory, with output to the current directory; the result is that when I untar the tar file, it contains a tar file with the same filename (the archive included itself).
While it's not particularly hard to loop over all SupportedFormats calling Match with a given filename, it's something that users of the library will probably end up needing often, so I think it would be nice to have it here.
./archiver_linux_amd64 -h returns a non-zero exit status. It should be zero.
Opening an archive containing hard-linked files results in: unknown type flag: 1
It would be cool if there was a statically compiled option on the releases page.
Having used this library to zip files for use in AWS CodeDeploy, I noticed a strange bug:
All .png files extracted by CodeDeploy have size 0 bytes. I suspect this is the same for all compressed formats. I am using the latest version of both tools.
I fixed it by removing all extensions from the compressedFormats map in zip.go.
There could be some sort of incompatibility between this archiver and the one used by CodeDeploy. I couldn't investigate this any further due to lack of time, so I apologize for the brief explanation.
Hope this helps.
I find that this package has a reference to lzma.
Windows 10 issue
The zip.Make comment states that it handles filePaths being a directory. However, in use, os.Stat (line 57 of zip.go) fails for such paths.
Hi. Can you please add tags more often? It's for dependency management. The last one was over a year ago, and if I want to use the latest features I have to use commit hashes in my Gopkg.toml.
Here is an example of the file:
test.zip (and test.rar)
└── testfolder/
    ├── file1.jpg
    └── file2.jpg
When using archiver.Unzip, it behaves as expected -- it creates the subfolders as they are in the zip archive:
err := archiver.Unzip("c:\\tmp\\test.zip", "c:\\tmp\\test\\")
However, when running the same command to unrar, it appears it's not able to Mkdir the subfolders.
err := archiver.Unrar("c:\\tmp\\test.rar", "c:\\tmp\\test\\")
Error:
c:\tmp\test\testfolder\file1.jpg: creating new file: open c:\tmp\test\testfolder\file1.jpg: The system cannot find the path specified.
If it helps, line 41 of rar.go checks header.IsDir, but it seems to evaluate as false, since technically the path includes both the folder and the filename (I'm assuming).
Hi @mholt, this is a great tool/library!
I've found it useful to avoid overwriting files if they already exist when opening an archive to a folder. I'm wondering if you'd like to incorporate this as an option to your library? Here's my change: schollz@8b912ca. Of course, it would need to be amended so it can be toggled as optional.
At the moment, my relative symlinks are converted to absolute symlinks when using TarGz; this breaks the code I'm archiving.
https://github.com/ulikunitz/xz is very slow at extracting.
https://github.com/xi2/xz is 10x faster for me. Consider using it for extraction. It doesn't support compression, so it won't be a full replacement.
As a Docker drop-in:
ISO_URL=
A filter to discard the ISO, get the squashfs, and unpack it at / would be nice to have. So far I have a slower Python script nearly done, but a library to do this could make re-baking live ISOs in Docker faster; Go, compiled right, tends to be much faster.
The ability to dump an ISO and a squash file would also be nice to have.
I have problems unpacking a large rar archive: the archive itself is about 6GB, and it unpacks to 25GB. The archive contains XML files with a database of Russian addresses.
You can download it from the official site http://fias.nalog.ru or from the direct link http://fias.nalog.ru/Public/Downloads/Actual/fias_xml.rar
The full error text
ERROR: 2017/10/12 00:37:10 unpack.go:12: Error unpack archive: data\AS_HOUSE_20171008_bed24a8e-4646-448d-acb8-8de765818389.XML: writing file: rardecode: decoder expected more data than is in packed file
The function I'm using is
package unpack

import (
	"os"

	"github.com/mholt/archiver"

	"../loger"
)

func Unpack(dest string, archive string) {
	err := archiver.Rar.Open(archive, dest)
	if err != nil {
		loger.Error.Printf("Error unpack archive: %v\n", err)
		os.Exit(1)
	}
}

I'm new to Go; can you tell me what the problem is?
This issue seems related to #61; however, I decided to create a new issue in case my hunch was incorrect.
Relevant Info
The archive was created with PowerShell's Compress-Archive and the integrated system.io.compression.filesystem ZipFile class (which PowerShell also uses under the hood).

Observed High Level Issue
C:\Users\lg\temp\testfoldera\0\1\a: making directory for file: mkdir C:\Users\lg\temp\testfoldera\0: The system cannot find the path specified.
In the directory structure that should have been created, there are "blank files" instead of some directories (i.e. files with no extension that in the actual zip are directories).

Current Theory
zip.go needs to be updated to support Windows-style path endings: https://github.com/mholt/archiver/blob/master/zip.go#L195
Directory entries can use \ endings (depending on the utility; a very quick test with 7zip seemed to force everything to Unix style), so this heuristic fails. Compress-Archive makes some really weird zip structures that do have folder entries strewn about, which makes the issue much more prevalent. Not 100% sure, but it feels like whenever you have a folder that contains only folders, it will exhibit this behavior.

Minimum Test Case
Zip a folder that contains only folders with Compress-Archive, extract it with archiver open, and notice the inner folder is now a file.

Thoughts
I would have expected the isDir() function listed here to work: https://golang.org/pkg/os/#FileMode.IsDir, which seems like the obvious solution if it did work. Short of isDir, perhaps something like os.PathSeparator could get halfway there. Checking
if strings.HasSuffix(zf.Name, "/") || strings.HasSuffix(zf.Name, "\\") {
did solve my use case.

(A related fragment for resolving symlinks before writing tar headers:)
link, e := os.Readlink(path)
if e == nil {
	path = link
}
header, err := tar.FileInfoHeader(info, path)
This may be a non-issue, but I was expecting that when I passed a directory location that did not yet exist, the directory would be created and the zip would still be output to that path.
err := archiver.Zip.Make(outputPath, files)
Instead, it returns an error:
error creating [outputPath]: open [outputPath]: The system cannot find the path specified.
I created the output directory structure and the previous code worked fine.
When zipping using a desktop application, you can provide a nonexistent output path and the directories will be created for you. So, who should be the owner of creating the directories?
I'm on Win10 x64
Archiver is not careful enough when unpacking tar archives that contain symlinks. It will happily write over a symlink it previously created, which could cause directory traversal.
Proof of concept:
$ wget -q https://github.com/jwilk/path-traversal-samples/releases/download/0/symlink.tar -O traversal.tar
$ tar -tvvf traversal.tar
lrwxrwxrwx root/root 0 2018-06-05 16:55 moo -> /tmp/moo
-rw-r--r-- root/root 4 2018-06-05 16:55 moo
$ pwd
/home/jwilk
$ ls /tmp/moo
ls: cannot access '/tmp/moo': No such file or directory
$ archiver open traversal.tar
$ ls /tmp/moo
/tmp/moo
Tested with git master (e4ef56d).
See subject
Hey @mholt
There seem to be some issues with one dependency of this package: github.com/pierrec/lz4.
github.com/pierrec/lz4 (download) package github.com/pierrec/lz4/v2/internal/xxh32: cannot find package "github.com/pierrec/lz4/v2/internal/xxh32" in any of: /usr/lib/go-1.10/src/github.com/pierrec/lz4/v2/internal/xxh32 (from $GOROOT) /home/user001/go;/home/user001/git/filebrowser.github/src/github.com/pierrec/lz4/v2/internal/xxh32 (from $GOPATH)
cmd/server.go:14:4:error: could not import github.com/filebrowser/filebrowser/lib/http (type-checking package "github.com/filebrowser/filebrowser/lib/http" failed (/go/src/github.com/filebrowser/filebrowser/lib/http/download.go:12:2: could not import github.com/mholt/archiver (type-checking package "github.com/mholt/archiver" failed (/go/src/github.com/mholt/archiver/tarlz4.go:9:2: could not import github.com/pierrec/lz4 (type-checking package "github.com/pierrec/lz4" failed (/go/src/github.com/pierrec/lz4/reader.go:9:2: could not import github.com/pierrec/lz4/v2/internal/xxh32 (cannot find package "github.com/pierrec/lz4/v2/internal/xxh32" in any of: (gotype)
The directory v2 of their package seems to be missing.
./main_darwin_amd64.go:80:24: invalid method expression archiver.TarGz.Open (needs pointer receiver: (*archiver.TarGz).Open)
The following snippet was used on Darwin (macOS):
archiver.TarGz.Open(input, installDirPath())
The zip equivalent works on Windows:
archiver.Zip.Open(input, installDirPath())
On line 78 of zip.go, path.Join is used. The path package does not support Windows paths; please change it to filepath.Join.
While researching xz as a storage medium I came across this article.
https://www.nongnu.org/lzip/xz_inadequate.html
Is what they've outlined on this page a valid concern, meaning we shouldn't be using xz?
When compressing an entire folder with
archiver.TarGz.Make("myfolder.tar.gz", []string{"./myfolder"})
if I open that archive on Windows it looks great, and I can go through the directory structure. However, when I extract it on Linux, it creates filenames like this:
"picons\picons\france\1_0_1_EB2_AAFF_7EB6_0_0_0_0.png"
and empty folders named like this:
"picons\picons\france"
Those are not paths; that is the actual name of the file.
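The likely cause is that backslash-separated paths from filepath.Walk on Windows are written into the tar headers verbatim; tar entry names must use forward slashes. On Windows, filepath.ToSlash performs the conversion; the helper below mimics it so the behavior is visible on any platform:

```go
package main

import (
	"fmt"
	"strings"
)

// toSlash mimics what filepath.ToSlash does on Windows; on Linux
// filepath.ToSlash is a no-op because the separator is already '/'.
func toSlash(p string) string {
	return strings.ReplaceAll(p, `\`, "/")
}

func main() {
	// If a Windows archiver writes backslash-separated names into tar
	// headers, Linux treats the whole string, backslashes included, as
	// a single file name — exactly the symptom reported above.
	fmt.Println(toSlash(`picons\france\1_0_1_EB2.png`))
}
```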
Hi there.
I have a gzipped tarball generated from git (specifically using the github API).
When I attempt to Open it using archiver.TarGz.Open, I get this error: "pax_global_header: unknown type flag: g"
I searched for "pax_global_header" and learned it seems to be an extended header that holds the git commit ID. I'm running OS X 10.12.
How should I go about investigating further? Is this known? Has anyone else had an issue?
Thanks for the nice utility.
Cheers!
Could you please add a tag v2.0.0 so that it is semantic-version compatible?
Hi,
I need to extract only one file (no more than 10KB, which I made sure is first in the archive) from a very big .tar archive (more than 1GB).
Could you add a way to provide a list of files that we want to extract, without extracting every file?
Thanks.
Hello, I have just tried to run go get -u github.com/mholt/archiver/cmd/archiver
and it comes back with...
package github.com/mholt/archiver/cmd/archiver: cannot find package "github.com/mholt/archiver/cmd/archiver" in any of:
/usr/local/go/src/github.com/mholt/archiver/cmd/archiver (from $GOROOT)
/Users/carlca/go/src/github.com/mholt/archiver/cmd/archiver (from $GOPATH)
I have Modules set to GO111MODULE=auto.
Any idea what the problem is?