go-getter's Introduction

go-getter

go-getter is a library for Go (golang) for downloading files or directories from various sources using a URL as the primary form of input.

The strength of this library is its flexibility: it can download from a number of different sources (file paths, Git, HTTP, Mercurial, etc.) using a single string as input. This relieves the implementer of having to know how to download from each kind of source.

The concept of a detector automatically turns invalid URLs into proper URLs. For example: "github.com/hashicorp/go-getter" would turn into a Git URL. Or "./foo" would turn into a file URL. These are extensible.

This library is used by Terraform for downloading modules and Nomad for downloading binaries.

Installation and Usage

Package documentation can be found on GoDoc.

Installation can be done with a normal go get:

$ go get github.com/hashicorp/go-getter

go-getter also has a command you can use to test URL strings:

$ go install github.com/hashicorp/go-getter/cmd/go-getter
...

$ go-getter github.com/foo/bar ./foo
...

The command is useful for verifying URL structures.

Security

Fetching resources from user-supplied URLs is an inherently dangerous operation and may leave your application vulnerable to server side request forgery, path traversal, denial of service or other security flaws.

go-getter contains mitigations for some of these security issues, but should still be used with caution in security-critical contexts. See the available security options that can be configured to mitigate some of these risks.

go-getter may return values that contain caller-provided query parameters that can contain sensitive data. Context around what parameters are and are not sensitive is known only by the caller of go-getter, and specific to each use case. We recommend the caller ensure that go-getter's return values (e.g., error messages) are properly handled and sanitized to ensure sensitive data is not persisted to logs.

URL Format

go-getter uses a single string URL as input to download from a variety of protocols. go-getter has various "tricks" with this URL to do certain things. This section documents the URL format.

Supported Protocols and Detectors

Protocols are used to download files/directories using a specific mechanism. Example protocols are Git and HTTP.

Detectors are used to transform a valid or invalid URL into another URL if it matches a certain pattern. Example: "github.com/user/repo" is automatically transformed into a fully valid Git URL. This allows go-getter to be very user friendly.

go-getter supports the following protocols out of the box. Additional protocols can be added at runtime by implementing the Getter interface.

  • Local files
  • Git
  • Mercurial
  • HTTP
  • Amazon S3
  • Google Cloud Storage (GCS)

In addition to the above protocols, go-getter has what are called "detectors." These take a URL and attempt to automatically choose the best protocol for it, which might even involve changing the protocol. The following detection is built in by default:

  • File paths such as "./foo" are automatically changed to absolute file URLs.
  • GitHub URLs, such as "github.com/mitchellh/vagrant" are automatically changed to Git protocol over HTTP.
  • GitLab URLs, such as "gitlab.com/inkscape/inkscape" are automatically changed to Git protocol over HTTP.
  • BitBucket URLs, such as "bitbucket.org/mitchellh/vagrant" are automatically changed to a Git or Mercurial protocol using the BitBucket API.
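
To make the idea of a detector concrete, here is a minimal sketch of a GitHub-style rewrite using only the standard library. The function name detectGitHub is hypothetical and the logic is a simplification for illustration, not go-getter's actual detector code.

```go
package main

import (
	"fmt"
	"strings"
)

// detectGitHub is a hypothetical, simplified detector. It rewrites
// "github.com/<user>/<repo>[/subdir][?query]" into a forced-Git HTTPS URL
// and reports false when the input does not look like a GitHub shorthand.
func detectGitHub(src string) (string, bool) {
	if !strings.HasPrefix(src, "github.com/") {
		return "", false
	}
	// Set the query string aside so it is not mistaken for part of the repo name.
	query := ""
	if i := strings.Index(src, "?"); i >= 0 {
		src, query = src[:i], src[i:]
	}
	parts := strings.Split(src, "/")
	if len(parts) < 3 || parts[1] == "" || parts[2] == "" {
		return "", false
	}
	repo := strings.Join(parts[:3], "/")
	if !strings.HasSuffix(repo, ".git") {
		repo += ".git"
	}
	out := "git::https://" + repo
	// Anything after the repository name is treated as a subdirectory.
	if len(parts) > 3 {
		out += "//" + strings.Join(parts[3:], "/")
	}
	return out + query, true
}

func main() {
	for _, src := range []string{"github.com/mitchellh/vagrant", "./foo"} {
		out, ok := detectGitHub(src)
		fmt.Printf("%q -> %q (matched: %v)\n", src, out, ok)
	}
}
```

Inputs that do not match are returned untouched by the caller, so the next detector in a chain can try them.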

Forced Protocol

In some cases, the protocol to use is ambiguous depending on the source URL. For example, "http://github.com/mitchellh/vagrant.git" could reference an HTTP URL or a Git URL. Forced protocol syntax is used to disambiguate this URL.

Forced protocol can be done by prefixing the URL with the protocol followed by double colons. For example: git::http://github.com/mitchellh/vagrant.git would download the given HTTP URL using the Git protocol.

Forced protocols will also override any detectors.

In the absence of a forced protocol, detectors may be run on the URL, transforming the protocol anyways. The above example would've used the Git protocol either way since the Git detector would've detected it was a GitHub URL.
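
The forced protocol syntax can be illustrated with a small parser. This is a sketch under the assumption that the prefix consists of letters and digits followed by a double colon; splitForced is a hypothetical name, not part of go-getter's public API.

```go
package main

import (
	"fmt"
	"regexp"
)

// forcedRe matches "<protocol>::<rest>", e.g. "git::http://host/repo.git".
// A plain "http://..." URL does not match because it has only one colon.
var forcedRe = regexp.MustCompile(`^([A-Za-z0-9]+)::(.+)$`)

// splitForced returns the forced protocol (empty if none) and the remaining URL.
func splitForced(src string) (forced, rest string) {
	if m := forcedRe.FindStringSubmatch(src); m != nil {
		return m[1], m[2]
	}
	return "", src
}

func main() {
	for _, src := range []string{
		"git::http://github.com/mitchellh/vagrant.git",
		"http://github.com/mitchellh/vagrant.git",
	} {
		forced, rest := splitForced(src)
		fmt.Printf("forced=%q rest=%q\n", forced, rest)
	}
}
```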

Protocol-Specific Options

Each protocol can support protocol-specific options to configure that protocol. For example, the git protocol supports specifying a ref query parameter that tells it what ref to checkout for that Git repository.

The options are specified as query parameters on the URL (or URL-like string) given to go-getter. Using the Git example above, the URL below is a valid input to go-getter:

github.com/hashicorp/go-getter?ref=abcd1234

The protocol-specific options are documented after the URL format section. Because they are part of the URL, they are mentioned here so you know they exist.

Subdirectories

If you want to download only a specific subdirectory from a downloaded directory, you can specify a subdirectory after a double-slash //. go-getter will first download the URL specified before the double-slash (as if you didn't specify a double-slash), but will then copy the path after the double slash into the target directory.

For example, if you're downloading this GitHub repository, but you only want to download the testdata directory, you can do the following:

https://github.com/hashicorp/go-getter.git//testdata

If you downloaded this to the /tmp directory, then the file /tmp/archive.gz would exist. Notice that this file is in the testdata directory in this repository, but because we specified a subdirectory, go-getter automatically copied only that directory's contents.

Subdirectory paths may also use filesystem glob patterns. The path must match exactly one entry or go-getter will return an error. This is useful if you're not sure of the exact directory name but it follows a predictable naming structure.

For example, the following URL would also work:

https://github.com/hashicorp/go-getter.git//test-*
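
As an illustration of the double-slash convention, the sketch below separates the download URL from the subdirectory. splitSubdir is a hypothetical helper written for this example, and it deliberately ignores edge cases such as Windows paths.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSubdir separates "<url>//<subdir>" into its two halves. The "//" that
// follows a scheme (as in "https://") is not treated as a subdirectory marker.
func splitSubdir(src string) (base, subdir string) {
	offset := 0
	if i := strings.Index(src, "://"); i >= 0 {
		offset = i + 3
	}
	idx := strings.Index(src[offset:], "//")
	if idx < 0 {
		return src, ""
	}
	idx += offset
	base, subdir = src[:idx], src[idx+2:]
	// Query parameters belong to the download URL, not to the subdirectory.
	if q := strings.Index(subdir, "?"); q >= 0 {
		base += subdir[q:]
		subdir = subdir[:q]
	}
	return base, subdir
}

func main() {
	base, subdir := splitSubdir("https://github.com/hashicorp/go-getter.git//testdata")
	fmt.Printf("download %q, then copy out %q\n", base, subdir)
}
```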

Checksumming

For file downloads over any protocol, go-getter can automatically verify a checksum for you. Note that checksumming only works when downloading files, not directories.

To checksum a file, append a checksum query parameter to the URL. go-getter will parse out this query parameter automatically and use it to verify the checksum. The parameter value can be in the format of type:value or just value, where type is "md5", "sha1", "sha256", "sha512" or "file". The "value" should be the actual checksum value, or the download URL for "file". When the type part is omitted, the type will be guessed based on the length of the checksum string. Examples:

./foo.txt?checksum=md5:b7d96c89d09d9e204f5fedc4d5d55b21
./foo.txt?checksum=b7d96c89d09d9e204f5fedc4d5d55b21
./foo.txt?checksum=file:./foo.txt.sha256sum
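
The type:value format and the length-based guess can be sketched as follows. parseChecksum is a hypothetical helper for illustration; the lengths are the hex-encoded digest sizes of the four listed hash functions.

```go
package main

import (
	"fmt"
	"strings"
)

// parseChecksum splits a checksum parameter value into its type and value.
// When no "type:" prefix is given, the type is guessed from the hex length.
func parseChecksum(v string) (typ, value string, err error) {
	if i := strings.Index(v, ":"); i >= 0 {
		return v[:i], v[i+1:], nil
	}
	switch len(v) {
	case 32:
		return "md5", v, nil
	case 40:
		return "sha1", v, nil
	case 64:
		return "sha256", v, nil
	case 128:
		return "sha512", v, nil
	}
	return "", "", fmt.Errorf("cannot guess checksum type for a value of length %d", len(v))
}

func main() {
	for _, v := range []string{
		"md5:b7d96c89d09d9e204f5fedc4d5d55b21",
		"b7d96c89d09d9e204f5fedc4d5d55b21",
		"file:./foo.txt.sha256sum",
	} {
		typ, value, err := parseChecksum(v)
		fmt.Println(typ, value, err)
	}
}
```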

When checksumming from a file (for example, with checksum=file:url), go-getter will fetch the file linked in the URL after file: using the same configuration. For example, in file:http://releases.ubuntu.com/cosmic/MD5SUMS go-getter will download the checksum file at that URL using the http protocol. All protocols supported by go-getter can be used. The checksum file is downloaded to a temporary file and then parsed. The location of the temporary file can be changed by setting system-specific environment variables: TMPDIR on Unix; TMP, TEMP or USERPROFILE on Windows. Read the godoc of os.TempDir for more information on how the temporary directory is selected. The contents of checksum files are expected to be in BSD or GNU style. Once go-getter is done with the checksum file, it is deleted.
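
For readers unfamiliar with the two checksum file styles, the sketch below parses one line of each. It assumes the common layouts "<hash>  <name>" (GNU) and "ALGO (<name>) = <hash>" (BSD), and does not handle file names containing spaces in GNU style. parseChecksumLine is a hypothetical helper, not go-getter's parser.

```go
package main

import (
	"fmt"
	"strings"
)

// parseChecksumLine extracts the hash and file name from one line of a
// checksum file. ok is false when the line matches neither style.
func parseChecksumLine(line string) (hash, name string, ok bool) {
	line = strings.TrimSpace(line)
	// BSD style: "SHA256 (foo.txt) = <hex>"
	if open := strings.Index(line, " ("); open >= 0 {
		if end := strings.LastIndex(line, ") = "); end > open {
			return line[end+4:], line[open+2 : end], true
		}
	}
	// GNU style: "<hex>  foo.txt" (a leading "*" on the name marks binary mode)
	fields := strings.Fields(line)
	if len(fields) == 2 {
		return fields[0], strings.TrimPrefix(fields[1], "*"), true
	}
	return "", "", false
}

func main() {
	for _, line := range []string{
		"b7d96c89d09d9e204f5fedc4d5d55b21  foo.txt",
		"MD5 (foo.txt) = b7d96c89d09d9e204f5fedc4d5d55b21",
	} {
		hash, name, ok := parseChecksumLine(line)
		fmt.Println(hash, name, ok)
	}
}
```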

The checksum query parameter is never sent to the backend protocol implementation. It is used at a higher level by go-getter itself.

If the destination file already exists and its checksum matches, the download is skipped.

Unarchiving

go-getter will automatically unarchive files into a file or directory based on the extension of the file being requested (over any protocol). This works for both file and directory downloads.

go-getter looks for an archive query parameter to specify the format of the archive. If this isn't specified, go-getter will use the extension of the path to see if it appears archived. Unarchiving can be explicitly disabled by setting the archive query parameter to false.

The following archive formats are supported:

  • tar.gz and tgz
  • tar.bz2 and tbz2
  • tar.xz and txz
  • zip
  • gz
  • bz2
  • xz

An example URL is shown below:

./foo.zip

This will automatically be inferred to be a ZIP file and will be extracted. You can also be explicit about the archive type:

./some/other/path?archive=zip

And finally, you can disable unarchiving completely:

./some/path?archive=false

You can combine unarchiving with the other features of go-getter such as checksumming. The special archive query parameter will be removed from the URL before going to the final protocol downloader.
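
The selection rules above (explicit archive parameter first, then file extension) can be sketched like this. archiveFormat is a hypothetical helper; choosing the longest matching extension is an assumption made here so that "tar.gz" is preferred over "gz".

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// archiveExts lists the supported formats from this section.
var archiveExts = []string{"tar.gz", "tgz", "tar.bz2", "tbz2", "tar.xz", "txz", "zip", "gz", "bz2", "xz"}

// archiveFormat decides which format to unarchive with. It returns ""
// when the download should be left as-is.
func archiveFormat(src string) (string, error) {
	u, err := url.Parse(src)
	if err != nil {
		return "", err
	}
	if v := u.Query().Get("archive"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return v, nil // an explicit format such as "zip"
		}
		if !b {
			return "", nil // archive=false disables unarchiving
		}
	}
	// Fall back to the path's extension, preferring the longest match.
	best := ""
	for _, ext := range archiveExts {
		if strings.HasSuffix(u.Path, "."+ext) && len(ext) > len(best) {
			best = ext
		}
	}
	return best, nil
}

func main() {
	for _, src := range []string{"./foo.zip", "./some/other/path?archive=zip", "./some/path?archive=false"} {
		format, err := archiveFormat(src)
		fmt.Printf("%-32s -> %q (err: %v)\n", src, format, err)
	}
}
```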

Protocol-Specific Options

This section documents the protocol-specific options that can be specified for go-getter. These options should be appended to the input as normal query parameters (HTTP headers are an exception to this, however). Depending on the usage of go-getter, applications may provide alternate ways of inputting options. For example, Nomad provides a nice options block for specifying options rather than in the URL.

General (All Protocols)

The options below are available to all protocols:

  • archive - The archive format to use to unarchive this file, or false to disable unarchiving. For more details, see the complete section on archive support above.

  • checksum - Checksum to verify the downloaded file or archive. See the entire section on checksumming above for format and more details.

  • filename - When in file download mode, allows specifying the name of the downloaded file on disk. Has no effect in directory mode.

Local Files (file)

None

Git (git)

  • ref - The Git ref to checkout. This is a ref, so it can point to a commit SHA, a branch name, etc. If it is a named ref such as a branch name, go-getter will update it to the latest on each get.

  • sshkey - An SSH private key to use during clones. The provided key must be a base64-encoded string. For example, to generate a suitable sshkey from a private key file on disk, you would run base64 -w0 <file>.

    Note: Git 2.3+ is required to use this feature.

  • depth - The Git clone depth. The provided number specifies the last n revisions to clone from the repository.
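
When building such URLs programmatically, it helps to let net/url do the escaping, since base64 output can contain "+", "/" and "=". The sketch below is one way to assemble the query string; gitSource is a hypothetical helper and the key bytes are a stand-in for a real private key file.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/url"
	"strconv"
)

// gitSource appends the ref, depth and sshkey options to a Git source string.
// Empty or zero values are omitted.
func gitSource(repo, ref string, depth int, privateKey []byte) string {
	q := url.Values{}
	if ref != "" {
		q.Set("ref", ref)
	}
	if depth > 0 {
		q.Set("depth", strconv.Itoa(depth))
	}
	if len(privateKey) > 0 {
		// Equivalent to `base64 -w0 <file>`; Encode() below escapes the result.
		q.Set("sshkey", base64.StdEncoding.EncodeToString(privateKey))
	}
	if len(q) == 0 {
		return repo
	}
	return repo + "?" + q.Encode()
}

func main() {
	// "key" is placeholder content, not a usable SSH key.
	fmt.Println(gitSource("github.com/hashicorp/go-getter", "v1.2.0", 1, []byte("key")))
}
```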

The git getter accepts both URL-style SSH addresses like git::ssh://git@example.com/foo/bar, and "scp-style" addresses like git::git@example.com/foo/bar. In the latter case, omitting the git:: force prefix is allowed if the username prefix is exactly git@.

The "scp-style" addresses cannot be used in conjunction with the ssh:// scheme prefix, because in that case the colon is used to mark an optional port number to connect on, rather than to delimit the path from the host.

Mercurial (hg)

  • rev - The Mercurial revision to checkout.

HTTP (http)

Basic Authentication

To use HTTP basic authentication with go-getter, simply prepend username:password@ to the hostname in the URL, such as https://Aladdin:OpenSesame@www.example.com/index.html. All special characters in the username and password must be URL encoded.

Headers

Optional request headers can be added by supplying them in a custom HttpGetter (not as query parameters like most other options). These headers will be sent out on every request the getter in question makes.

S3 (s3)

S3 takes various access configurations in the URL. Note that it will also read these from standard AWS environment variables if they're set; if the query parameters are present, they take priority. S3-compatible servers like Minio are also supported.

  • aws_access_key_id - AWS access key.
  • aws_access_key_secret - AWS access key secret.
  • aws_access_token - AWS access token if this is being used.
  • aws_profile - Use this profile from local ~/.aws/ config. Takes priority over the other three.

Using IAM Instance Profiles with S3

If you use go-getter and want to use an EC2 IAM Instance Profile to avoid supplying credentials, just omit these parameters and the profile, if available, will be used automatically.

Using S3 with Minio

If you use go-getter with Minio, the following parameters apply:

  • aws_access_key_id (required) - Minio access key.
  • aws_access_key_secret (required) - Minio access key secret.
  • region (optional - defaults to us-east-1) - Region identifier to use.
  • version (optional - defaults to Minio default) - Configuration file format.

S3 Bucket Examples

S3 has several addressing schemes used to reference your bucket. These are listed here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html

Some examples for these addressing schemes:

GCS (gcs)

GCS Authentication

To access GCS, authentication credentials must be provided. More information can be found here

GCS Bucket Examples

GCS Testing

The tests for get_gcs.go require you to have GCP credentials set in your environment. These credentials can have any level of permissions to any project; they just need to exist. This means setting GOOGLE_APPLICATION_CREDENTIALS="~/path/to/credentials.json" or GOOGLE_CREDENTIALS="{stringified-credentials-json}". Because of this requirement, get_gcs_test.go will fail for external contributors in CircleCI.

Security Options

Disable Symlinks

In your getter client config, we recommend using the DisableSymlinks option, which prevents writing through or copying from symlinks (which may point outside the directory).

client := getter.Client{
    // This will prevent copying or writing files through symlinks
    DisableSymlinks: true,
}

Disable or Limit X-Terraform-Get

Go-Getter supports arbitrary redirects via the X-Terraform-Get header. This functionality exists to support Terraform use cases, but is likely not needed in most applications.

For code that uses the HttpGetter, add the following configuration options:

var httpGetter = &getter.HttpGetter{
    // Most clients should disable X-Terraform-Get
    // See the note below
    XTerraformGetDisabled: true,
    // Your software probably doesn't rely on X-Terraform-Get, but
    // if it does, you should set the above field to false, plus
    // set XTerraformGetLimit to prevent endless redirects
    // XTerraformGetLimit: 10,
}

Enforce Timeouts

The HttpGetter supports timeouts and other resource-constraining configuration options. The GitGetter and HgGetter only support timeouts.

Configuration for the HttpGetter:

var httpGetter = &getter.HttpGetter{
    // Disable pre-fetch HEAD requests
    DoNotCheckHeadFirst: true,
    
    // As an alternative to the above setting, you can
    // set a reasonable timeout for HEAD requests
    // HeadFirstTimeout: 10 * time.Second,

    // Read timeout for HTTP operations
    ReadTimeout: 30 * time.Second,

    // Set the maximum number of bytes
    // that can be read by the getter
    MaxBytes: 500000000, // 500 MB
}

For code that uses the GitGetter or HgGetter, set the Timeout option:

var gitGetter = &getter.GitGetter{
    // Set a reasonable timeout for git operations
    Timeout: 5 * time.Minute,
}
var hgGetter = &getter.HgGetter{
    // Set a reasonable timeout for hg operations
    Timeout: 5 * time.Minute,
}

go-getter's People

Contributors

alexandrecarlton, apparentlymart, azr, claire-labry, cotarg, dadgar, dancannon, dpowley, eastebry, eculver, hashicorp-tsccr[bot], jbardin, jen20, kmoe, kpenfound, lfarnell, lifangmoler, mdeggies, mitchellh, nodyhub, picatz, preetapan, radeksimko, ryanuber, schmichael, scottsuarez, swampdragons, troyready, vancluever, zachwhaley

go-getter's Issues

support for git shallow clone

It would be great to support (or even default to) shallow git clones.

Especially in terraform where you can get a checkout for each usage of a module, reducing both the storage and bandwidth necessary to get files would be great.

Support for symlink when untarring an archive

Operating System:
Ubuntu 16.04

Issue:
When a tarball contains a symlink, go-getter turns the symlink into a zero-length file. Supporting symlinks would be especially useful for Nomad's artifact stanza (https://www.nomadproject.io/docs/job-specification/artifact.html), since some tarballs contain symlinks.

Repro:

  1. Create a test_dir.tar.gz with the following contents (note: test_ls.symlink)
vagrant@nomad-server01:/tmp$ tree test_dir/
test_dir/
├── a.txt
└── test_ls.symlink -> /bin/ls
  2. In the same directory, execute: python -m SimpleHTTPServer 8000
  3. Run go-getter http://127.0.0.1:8000/test_dir.tar.gz ~/
  4. Check what go-getter untarred (note: test_ls.symlink is 0 bytes)
vagrant@nomad-server01:~$ ls -l ~/test_dir/
total 4
-rw-r--r-- 1 vagrant vagrant 4 May  2 01:19 a.txt
-rwxr-xr-x 1 vagrant vagrant 0 May  2 01:19 test_ls.symlink

how to get a specific file from a zip file

Thanks for this wonderful tool.

I can now successfully download the whole zip file and have it extracted locally, but what if I only want a specific file within the zip?

For example, below is the file structure within zip:

-+
  |- dir1
  |- dir2
        |---- foo.txt
        |---- bar.txt

My goal is to get the bar.txt only

Retrieving a single file from a git repo?

Is go-getter able to retrieve a single file from a git repo? It appears to work fine when the source is a directory, but not when the source is a file.

It does create the target directory, but does not write the file.

$ ./go-getter github.com/hashicorp/go-getter//helper/url/url.go url
2018/01/29 16:12:51 Success!
$ cat url/url.go
cat: url/url.go: No such file or directory
$ ls -al url/url.go
ls: cannot access url/url.go: No such file or directory
$ ls -al url
total 0
drwxr-xr-x 2 ec2-user ec2-user   6 Jan 29 16:12 .
drwx------ 5 ec2-user ec2-user 124 Jan 29 16:12 ..

Also tried specifying the destination as a filename, but that just creates a directory named for the file:

$ ./go-getter github.com/hashicorp/go-getter//helper/url/url.go url/url.go
2018/01/29 16:21:36 Success!
$ ls -al url
total 0
drwxr-xr-x 3 ec2-user ec2-user  20 Jan 29 16:21 .
drwx------ 5 ec2-user ec2-user 124 Jan 29 16:12 ..
drwxr-xr-x 2 ec2-user ec2-user   6 Jan 29 16:21 url.go
$ ls -al url/url.go
total 0
drwxr-xr-x 2 ec2-user ec2-user  6 Jan 29 16:21 .
drwxr-xr-x 3 ec2-user ec2-user 20 Jan 29 16:21 ..

Release of go-getter cli in releases.hashicorp.com tree

I like the go-getter CLI for defining dependencies in automation projects. Thanks for this simple and great tool.

Are there official binary releases of the go-getter cli tool? I would really appreciate if the binaries could be made available in the releases.hashicorp.com tree.

Support for custom headers

I want to download build artifacts from a private GitLab instance using Nomad's artifact stanza. I can download the build artifacts with curl if I add a Private-Token header, but there doesn't seem to be a way to do this with go-getter right now.

Add Support for Manta URL structure

I want to make sure that go-getter can be used to download files from Manta. To do that, I need to add, at the very least, some tests to check that Manta object URLs work as expected.

add support for client cert auth

Creating an issue to track go-getter supporting client cert auth. This could work similarly to curl where you specify your certificate and private key at invocation time, or could read environment variables for the locations of the files.

If go-getter goes the "specify an option" route, it should ideally be exposed as a parameter in the nomad artifact stanza from the get-go as well. If we go the "use an environment variable" route, it can just be set when starting the nomad service, much like setting https_proxy before starting nomad as a service "just works" and go-getter respects that variable.

Broken package state?

go-getter git::https://github.com/yoctocloud/packer-centos.git ./tests
2016/04/29 18:39:01 Error downloading: /usr/host/bin/git exited with 128: Cloning into '/tmp/getter-git175513477'...
fatal: repository 'https://github.com/yoctocloud/' not found

Why does the command strip the last path component?
I'm trying to use the go-getter library in my app, and it seems to strip the last component there too.

Alignment with Subresource Integrity

This is a proposal to allow another separator in checksum. As the Subresource Integrity specification states:

An integrity value begins with at least one string, with each string including a prefix indicating a particular hash algorithm (currently the allowed prefixes are sha256, sha384, and sha512), followed by a dash, and ending with the actual base64-encoded hash.

I suggest adding a dash as an alternative separator for checksum detection, alongside another checksum option, sha384 (go doc).

The case I have in mind is something similar to what was discussed at ry/deno#200, specifically for FaaS context: import modules (maybe using dynamic import()) with canonical URLs that can be checked by the runtime before execution.

Proposal: Use a pure Go library for GitGetter

Right now, GitGetter requires that git be installed and in the path. This is ok, not too hard, maybe a bit of a put-off on Windows.

I noticed there is now what looks to be a pretty decent implementation of git in pure Go, go-git. Far as I can tell from skimming get_git.go, the library implements enough git commands to be a viable replacement for the git binary currently required by GitGetter.

Thought it might be worth investigating.

Git protocol fails when using sshkey parameter on Windows clients

When using the git protocol with an sshkey on Windows clients, go-getter fails due to its version checking code, as git-for-windows doesn't use semver versioning.

For example, running git version will return git version 2.17.0.windows.1, which would need to use a hyphen after the patch version to be valid semver.

Because of this non-semver input, checkGitVersion returns an error. This is preventing me from using artifacts from a private git repository in a Nomad job running on Windows.

I'm not sure of the best fix for this... Perhaps Git 2.3 was released long enough ago that the version check could be removed?
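
One possible approach, sketched below, is to compare only the leading major and minor numbers and ignore whatever follows. parseGitVersion and atLeast are hypothetical helpers written for this illustration and are not the checkGitVersion function mentioned above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGitVersion extracts the major and minor numbers from `git version`
// output, ignoring vendor suffixes such as ".windows.1".
func parseGitVersion(out string) (major, minor int, err error) {
	fields := strings.Fields(out)
	if len(fields) < 3 || fields[0] != "git" || fields[1] != "version" {
		return 0, 0, fmt.Errorf("unexpected git version output: %q", out)
	}
	parts := strings.Split(fields[2], ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected version string: %q", fields[2])
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// atLeast reports whether major.minor is wantMajor.wantMinor or newer.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	major, minor, err := parseGitVersion("git version 2.17.0.windows.1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("supports sshkey (Git 2.3+):", atLeast(major, minor, 2, 3))
}
```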

go-getter doesn't implicitly create directories

The common gnu and bsd tar implementations will implicitly create directories for nested files, even if the directory entry has not been previously seen in the archive.

For example, go-getter will fail if it encounters an archive only containing subdir/file.txt, while the common tar implementations will create subdir as required.

hashicorp/terraform#15732

AppVeyor tests are failing

It looks like the AppVeyor tests (tests on Windows) started failing with commit 039c4e2 and haven't passed since. The error seems consistent:

=== RUN   TestParse
--- FAIL: TestParse (0.00s)
	url_test.go:85: test 2: expected url.String() = "C:/", got "./C:/"

I don't have a working Go dev environment on a Windows system to try this right now, but this smells like absolute source paths might be broken on Windows.

Question: Using checksum to avoid download

Hi!

I was wondering if it is possible (or even in scope of go-getter) to skip a download in case a checksum is provided and a file at the destination can be found that matches this checksum.

My use case is that I have to download rather large files and I'd like to avoid redundant downloads. Of course I can check the destination myself, but as I said, I was wondering whether this is considered in scope for this package.

HTTP Basic auth support?

Hi,

Any plans on adding parameters for http basic auth? I'm using Nomad and would like to download artifacts from a secured JFrog Artifactory.

thanks in advance!

Go-getter does not Preserve Permissions when Untarring Archives

Issue:
When go-getter untars an archive, the original permissions and ownership inside the tarball are not preserved.

Repro:

  1. Create a test_dir.tar.gz with the following contents (note: files are owned by vagrant)
$ ls -l test_dir
total 4
-rw-r--r-- 1 vagrant vagrant 4 May  2 01:18 a.txt
-rwxr-xr-x 1 vagrant vagrant 0 May  2 01:18 test_ls.symlink
  2. In the same directory, execute: python -m SimpleHTTPServer 8000
  3. Run sudo go-getter http://127.0.0.1:8000/test_dir.tar.gz ~/
  4. Check what go-getter untarred (note: files are now owned by root)
$ ls -l test_dir
total 4
-rw-r--r-- 1 root root 4 May  2 14:15 a.txt
-rwxr-xr-x 1 root root 0 May  2 14:15 test_ls.symlink

Support FTP

I would like to see this library support [S]FTP. Submitting this issue to track and maybe hang a PR off of.

go-getter seems to corrupt zip files

Trying to download zip files through the below pseudo code and it looks like zip files are getting corrupted

import "github.com/hashicorp/go-getter"

var my getter.Client
my.Src = source
my.Dst = target
my.Dir = true
err := my.Get()

S3: Support for getting credentials from EC2 meta-data service when using IAM roles.

I can see go-getter using the EC2RoleProvider but I couldn't find where it was calling creds.Get() to download the actual credentials, from AWS metadata service. I ran across this issue while trying to use Nomad with a job using artifacts located in S3.

I also manually verified that retrieving credentials using CURL was possible in order to rule out IAM configuration issues.

GitGetter update assumes a master branch is available

When calling update in a detached head repo (e.g. a Terraform module), git checkout master is always executed. This causes a pathspec 'master' did not match any file(s) known to git error to be raised during terraform get if a master branch is not available.

The // Not a branch, switch to master... logic should be updated to check for the default branch instead of assuming master is available. Something akin to git branch -r --points-at refs/remotes/origin/HEAD | grep '\->' | cut -d' ' -f5 | cut -d/ -f2 (sourced from SO) seems like a reliable option.

(This option seemed like it would be much cleaner, but unfortunately doesn't seem to work in detached HEAD)

feature: Support for maven type urls

Hi,

A feature that would be useful for us is URLs that use the groupId, artifactId and versionId of Maven artifacts. For example, the URL would look something like this:
maven::<groupId>/<artifactId>/<versionId> where versionId is optional. Query params could include, localRepo::Boolean, mavenRepoUrls::List, etc.

I can work on it and create a PR if you think it would be worth having.

Thanks

Zip files that contain directories that don't exist on dest fail

If a zip file contains: new_folder/new_file and I tell go-getter to download it to dest/ which doesn't contain new_folder I get the following error:

open /tmp/NomadClient446359139/1491681f-974e-9e16-3276-7d3ac7aa5b1a/redis/local/bin/cp: no such file or directory

Add support for getting files from S3

We are looking into using Nomad and store our binaries in S3. To fetch these as part of the job startup it would be helpful for go-getter to support S3.

I have started this here but I am not sure about my approach. I would be interested to hear your thoughts.

Download progress information

It would be nice if the library can return download progress information. For example, how many bytes have been downloaded, the total file size if known, the current speed and estimated time remaining. Perhaps this could be returned over a channel.

The use-case for this is the ability to use the information to draw a progress bar. Other use-cases include collecting this data for later analysis.

git URLs fail when ref includes a '+' character

In the Git getter, the 'ref' value used to determine which Git reference to check out is url-query-decoded, meaning that '+' signs in the ref are replaced with spaces (' ').

https://github.com/hashicorp/go-getter/blob/master/get_git.go#L35

You could argue that folks should escape '+' characters in the ref query value (with '%2B'). However, there are two very good reasons to pass through '+' to Git:

  1. '+' is the official semver mechanism for appending version metadata, and semver versions (prefixed with v) are widely used as GitHub tag names for releases.
  2. ' ' (space) is illegal in git refs anyway

so, it would be a shame to force people to encode '+' in git refs.

Using git always gives error 128

All of these give the same error:

go-getter "git://github.com/kelseyhightower/hashiconf-eu-2016.git" "dest"
2018/09/08 21:16:38 Error downloading: error downloading 'git://github.com/kelseyhightower/hashiconf-eu-2016.git': /usr/bin/git exited with 128: fatal: Not a git repository (or any of the parent directories): .git

go-getter git://github.com/kelseyhightower/hashiconf-eu-2016.git "dest"
2018/09/08 21:16:49 Error downloading: error downloading 'git://github.com/kelseyhightower/hashiconf-eu-2016.git': /usr/bin/git exited with 128: fatal: Not a git repository (or any of the parent directories): .git

go-getter "git://github.com/kelseyhightower/hashiconf-eu-2016.git" "dest"
2018/09/08 21:16:53 Error downloading: error downloading 'git://github.com/kelseyhightower/hashiconf-eu-2016.git': /usr/bin/git exited with 128: fatal: Not a git repository (or any of the parent directories): .git

go-getter "git://github.com/kelseyhightower/hashiconf-eu-2016.git" "dest"
2018/09/08 21:16:57 Error downloading: error downloading 'git://github.com/kelseyhightower/hashiconf-eu-2016.git': /usr/bin/git exited with 128: fatal: Not a git repository (or any of the parent directories): .git

go-getter "git::http://github.com/kelseyhightower/hashiconf-eu-2016.git" "dest"
2018/09/08 21:17:09 Error downloading: error downloading 'http://github.com/kelseyhightower/hashiconf-eu-2016.git': /usr/bin/git exited with 128: fatal: Not a git repository (or any of the parent directories): .git

go-getter "git::https://github.com/kelseyhightower/hashiconf-eu-2016.git" "dest"
2018/09/08 21:17:14 Error downloading: error downloading 'https://github.com/kelseyhightower/hashiconf-eu-2016.git': /usr/bin/git exited with 128: fatal: Not a git repository (or any of the parent directories): .git

go-getter git@github.com:kelseyhightower/hashiconf-eu-2016.git "dest"
2018/09/08 21:17:53 Error downloading: error downloading 'ssh://git@github.com/kelseyhightower/hashiconf-eu-2016.git': /usr/bin/git exited with 128: fatal: Not a git repository (or any of the parent directories): .git

This also occurs in Nomad. What is the problem? I'm using the latest version.

Proxy support

Hi,

While using Nomad, I ran into a problem that might be related to go-getter.
In the Nomad job I specify an artifact to download, and the job fails because the download times out:

GET error: Get https://raw.githubusercontent.com/FRosner/nomad-docker-wrapper/master/nomad-docker-wrapper: dial tcp 23.235.43.133:443: i/o timeout

The machine I am using is behind a proxy. Could this be causing the problem? The http_proxy and https_proxy variables are set, but is it possible that go-getter is not picking them up?

Original Nomad issue: https://groups.google.com/forum/#!topic/nomad-tool/rwLe3rIE2ZI

Thank you in advance!
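
For reference, here is a minimal sketch of how a Go HTTP client can be made to honor proxy settings. It is an illustration under assumptions, not a statement about how go-getter builds its client: newClient and proxyFor are hypothetical helpers. A transport only consults the environment when its Proxy field is set to http.ProxyFromEnvironment (http.DefaultTransport does this, but a hand-built http.Transport with a nil Proxy does not).

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
	"time"
)

// newClient builds an HTTP client. With a non-empty proxyURL every request
// goes through that proxy; otherwise the transport consults the usual
// HTTP_PROXY / HTTPS_PROXY / NO_PROXY variables (or their lowercase forms).
func newClient(proxyURL string) (*http.Client, error) {
	proxy := http.ProxyFromEnvironment
	if proxyURL != "" {
		u, err := url.Parse(proxyURL)
		if err != nil {
			return nil, err
		}
		proxy = http.ProxyURL(u)
	}
	return &http.Client{
		Transport: &http.Transport{Proxy: proxy},
		Timeout:   30 * time.Second,
	}, nil
}

// proxyFor reports which proxy the client would use for target,
// or "" if the request would be sent directly.
func proxyFor(c *http.Client, target string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return "", err
	}
	tr, ok := c.Transport.(*http.Transport)
	if !ok || tr.Proxy == nil {
		return "", nil
	}
	u, err := tr.Proxy(req)
	if err != nil || u == nil {
		return "", err
	}
	return u.String(), nil
}

func main() {
	// Lowercase https_proxy mirrors the variable named in the report.
	c, err := newClient(os.Getenv("https_proxy"))
	if err != nil {
		fmt.Println("bad proxy URL:", err)
		return
	}
	p, err := proxyFor(c, "https://raw.githubusercontent.com/")
	fmt.Printf("proxy=%q err=%v\n", p, err)
}
```

Running this on the affected machine would show whether the variables are visible to a Go process at all; an empty proxy value would point at the environment of the Nomad client rather than at the HTTP code.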

Local transfer requires special privileges on Windows

Windows 7 and later handle symlinks poorly: creating one requires a UAC prompt even if you are an administrator (see http://answers.perforce.com/articles/KB/3472 for more details).
Since symlinking is the default behavior for local transfers, this creates odd situations where downloading from the internet works without special permissions but fetching a local file does not.
I think go-getter should handle this case better by default.

This issue is related to nomad issue 1714 (hashicorp/nomad#1714).
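
One possible default is to fall back to a copy when the symlink cannot be created. The sketch below shows the idea for a single file only; linkOrCopy and demo are hypothetical names, and this is not how go-getter currently behaves. A directory source would need a recursive copy instead.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// linkOrCopy tries to symlink src at dst and falls back to copying the file
// when the symlink cannot be created (for example, when the process lacks
// the symlink privilege on Windows).
func linkOrCopy(src, dst string) error {
	if err := os.Symlink(src, dst); err == nil {
		return nil
	}
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// demo exercises linkOrCopy in a temporary directory and returns the
// content read back through dst.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "linkorcopy")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "src.txt")
	if err := os.WriteFile(src, []byte("hello"), 0o644); err != nil {
		return "", err
	}
	dst := filepath.Join(dir, "dst.txt")
	if err := linkOrCopy(src, dst); err != nil {
		return "", err
	}
	b, err := os.ReadFile(dst)
	return string(b), err
}

func main() {
	content, err := demo()
	fmt.Println(content, err)
}
```

The trade-off is that a copy does not track later changes to the source, so a caller that relies on the link semantics would need to know which path was taken.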

Add support for S3 buckets in AWS China Regions

As of 2018-11, there are two AWS regions in China:

  • Beijing (cn-north-1)
  • Ningxia (cn-northwest-1)

Unfortunately, the S3 endpoint format in both China regions differs from that of the global regions, as described in AWS Regions and Endpoints.

This is causing problems when running Nomad in those regions.

For example, I am getting a "URL is not a valid S3 URL" error with the following code in the job file:

artifact {
  source = "https://s3.cn-north-1.amazonaws.com.cn/my-bucket-example/my_app.tar.gz"
}

I think I can temporarily work around this issue by allowing the VPC to do s3:Get* actions via S3 bucket policy, but it would be nice to get the native support as well.
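
As a rough illustration of what native support would have to recognize, the sketch below parses path-style S3 URLs whose host is s3.<region>.amazonaws.com with an optional .cn suffix. The type s3Location and the function parsePathStyleS3 are hypothetical, this is not go-getter's parser, and other host layouts (virtual-hosted buckets, dash-separated regions) are deliberately left out.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// s3Location is a hypothetical result type for this sketch.
type s3Location struct {
	Region, Bucket, Key string
}

// parsePathStyleS3 handles only path-style URLs of the form
// https://s3.<region>.amazonaws.com[.cn]/<bucket>/<key>.
func parsePathStyleS3(raw string) (s3Location, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return s3Location{}, err
	}

	// The China partition appends ".cn" to the usual domain.
	host := strings.TrimSuffix(u.Hostname(), ".cn")
	if !strings.HasPrefix(host, "s3.") || !strings.HasSuffix(host, ".amazonaws.com") {
		return s3Location{}, fmt.Errorf("not a recognized S3 host: %s", u.Hostname())
	}
	region := strings.TrimSuffix(strings.TrimPrefix(host, "s3."), ".amazonaws.com")
	if region == "" || strings.Contains(region, ".") {
		return s3Location{}, fmt.Errorf("no region in S3 host: %s", u.Hostname())
	}

	// The first path segment is the bucket, the rest is the object key.
	parts := strings.SplitN(strings.TrimPrefix(u.Path, "/"), "/", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return s3Location{}, fmt.Errorf("expected /<bucket>/<key> in path: %s", u.Path)
	}
	return s3Location{Region: region, Bucket: parts[0], Key: parts[1]}, nil
}

func main() {
	loc, err := parsePathStyleS3("https://s3.cn-north-1.amazonaws.com.cn/my-bucket-example/my_app.tar.gz")
	fmt.Printf("%+v %v\n", loc, err)
}
```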

SRV Support Feature Request

I often download artifacts for Nomad from my internal file servers, which are operated by Consul/Nomad. I use Consul's DNS support and it's awesome. However, I am forced to put my file servers behind a load balancer (even though Consul provides this for free!) with a static port that becomes a magic number in my job config files. It would be great if go-getter supported SRV DNS lookups (and the rest of the HashiCorp stack could inherit it by extension). Old habits die hard, and communities are opposed to making the standard "http://" scheme do anything other than "<some_url>:80", which is regrettable given the power of SRV DNS. Luckily, go-getter has the ability to execute "forced protocols": a new protocol could be added to perform the SRV DNS lookup while otherwise maintaining complete backward compatibility.

Example:
service_discovery::http://internal.file.server.consul/my-files

Cheers
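
A rough sketch of what such a forced protocol could do once the service_discovery:: prefix has been stripped: look up the SRV records for the URL's host and rewrite the URL to the first target and port. The names withSRVTarget and resolveSRV, and the node name in main, are hypothetical; nothing like this is claimed to exist in go-getter.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
	"strings"
)

// withSRVTarget replaces the host and port of rawURL with an SRV target.
func withSRVTarget(rawURL, target string, port uint16) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// SRV targets are fully qualified and end with a dot.
	host := strings.TrimSuffix(target, ".")
	u.Host = net.JoinHostPort(host, strconv.Itoa(int(port)))
	return u.String(), nil
}

// resolveSRV looks up SRV records for the URL's host and rewrites the URL
// to point at the first record returned.
func resolveSRV(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// Empty service and proto make LookupSRV query the name directly.
	_, records, err := net.LookupSRV("", "", u.Hostname())
	if err != nil {
		return "", err
	}
	if len(records) == 0 {
		return "", fmt.Errorf("no SRV records for %s", u.Hostname())
	}
	// Records arrive sorted by priority and randomized by weight.
	return withSRVTarget(rawURL, records[0].Target, records[0].Port)
}

func main() {
	// Hypothetical SRV answer, to show the rewrite without a DNS server.
	out, err := withSRVTarget("http://internal.file.server.consul/my-files", "node1.node.dc1.consul.", 8080)
	fmt.Println(out, err)
}
```

The rewritten URL can then be handed to the ordinary HTTP getter, which is what keeps the plain "http://" behavior untouched.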

File getter does not work with relative paths from a relative symlinked directory

Getting a file with a local path (say ./filename) returns an error if you are running go-getter or terraform from a symlinked directory.

Reproducer:

# this works
cd src/github.com/hashicorp/go-getter
go-getter -mode file ./module_test.go ./foo
2016/07/13 10:40:46 Success!

# this does not work
cd ..
ln -sf go-getter go-getter2
cd go-getter2/
go-getter -mode file ./module_test.go ./foo
2016/07/13 10:41:31 Error downloading: source path error: stat /go-getter/module_test.go: no such file or directory

Update: works as expected if the symlink is a full, non-symlinked file.

Use relative symlink when destination is already relative

Hey there. I use terraform on two machines and sync the code between them, including the .terraform directory (the same thing happens on a single machine when using docker with the --volume option to mount the local folder). I started having problems: every time I switched machines I had to remove .terraform and run terraform init again, because I had modules in my tree that needed to be initialized.

When running terraform init inside docker this would be the resulting tree:

/code # ls -al .terraform/modules/
total 4
drwxr-xr-x    5 root     root           160 Oct 29 13:53 .
drwxr-xr-x    4 root     root           128 Oct 29 13:53 ..
lrwxrwxrwx    1 root     root            17 Oct 29 13:53 7ba959c0b347b795b64cc07f92ddb021 -> /code/modules/vpn
lrwxrwxrwx    1 root     root            17 Oct 29 13:53 c8b6dc561a820f2e652d98808f4b1baa -> /code/modules/vpn
-rw-r--r--    1 root     root           308 Oct 29 13:53 modules.json

Here /code does not exist on my host (a macOS machine), which forces me to remove .terraform and initialize again.
I wrote a patch and would like to know whether you would be interested in a pull request. Here is a draft of it:

diff --git a/vendor/github.com/hashicorp/go-getter/get_file_unix.go b/vendor/github.com/hashicorp/go-getter/get_file_unix.go
index c89a2d5a4..735d8399d 100644
--- a/vendor/github.com/hashicorp/go-getter/get_file_unix.go
+++ b/vendor/github.com/hashicorp/go-getter/get_file_unix.go
@@ -7,17 +7,33 @@ import (
 	"io"
 	"net/url"
 	"os"
+	"path"
 	"path/filepath"
+	"strings"
 )
 
 func (g *FileGetter) Get(dst string, u *url.URL) error {
-	path := u.Path
+	isDestinationRelative := path.IsAbs(dst) == false
+	sourcePath := u.Path
+	cwd, err := os.Getwd()
+	if err != nil {
+		return err
+	}
 	if u.RawPath != "" {
-		path = u.RawPath
+		sourcePath = u.RawPath
+	}
+
+	// If destination is relative to the current path, we can just create a
+	// relative link thus allowing the destination folder to be moved freely
+	sourcePathNormalized := sourcePath
+	if isDestinationRelative {
+		basePath := fmt.Sprintf("%s%c", cwd, os.PathSeparator)
+		sourcePath = strings.Replace(sourcePath, basePath, "", -1)
+		sourcePathNormalized = path.Join("..", "..", sourcePath)
 	}
 
 	// The source path must exist and be a directory to be usable.
-	if fi, err := os.Stat(path); err != nil {
+	if fi, err := os.Stat(sourcePath); err != nil {
 		return fmt.Errorf("source path error: %s", err)
 	} else if !fi.IsDir() {
 		return fmt.Errorf("source path must be a directory")
@@ -46,7 +62,7 @@ func (g *FileGetter) Get(dst string, u *url.URL) error {
 		return err
 	}
 
-	return os.Symlink(path, dst)
+	return os.Symlink(sourcePathNormalized, dst)
 }
 
 func (g *FileGetter) GetFile(dst string, u *url.URL) error {

and then here is the result of terraform init with it:

/code # ls -al .terraform/modules/
total 4
drwxr-xr-x    5 root     root           160 Oct 29 13:54 .
drwxr-xr-x    4 root     root           128 Oct 29 13:54 ..
lrwxrwxrwx    1 root     root            17 Oct 29 13:54 7ba959c0b347b795b64cc07f92ddb021 -> ../../modules/vpn
lrwxrwxrwx    1 root     root            17 Oct 29 13:54 c8b6dc561a820f2e652d98808f4b1baa -> ../../modules/vpn
-rw-r--r--    1 root     root           308 Oct 29 13:54 modules.json

By using relative paths I can move the whole tree freely, because the symlinks always resolve relative to their own location.
Is it worth a PR?
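
As a possible refinement of the draft, the link target could be computed with filepath.Rel instead of a fixed "../.." prefix, so the result does not depend on how deep the destination sits. This is only a sketch: relativeTarget is a hypothetical helper, the working directory is passed in explicitly to keep it easy to test, and the paths in main are taken from the listing above.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// relativeTarget returns the symlink target to use for a link at dst that
// should point to src. An absolute dst keeps src unchanged; a relative dst
// gets a target expressed relative to the directory containing the link.
func relativeTarget(cwd, dst, src string) (string, error) {
	if filepath.IsAbs(dst) {
		return src, nil
	}
	if !filepath.IsAbs(src) {
		src = filepath.Join(cwd, src)
	}
	// A relative symlink is resolved from the directory that holds it.
	linkDir := filepath.Dir(filepath.Join(cwd, dst))
	return filepath.Rel(linkDir, src)
}

func main() {
	target, err := relativeTarget(
		"/code",
		".terraform/modules/7ba959c0b347b795b64cc07f92ddb021",
		"/code/modules/vpn",
	)
	fmt.Println(target, err)
}
```

With a destination nested one level deeper, the same call would yield one more ".." segment, which a hard-coded prefix would get wrong.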

Add Azure Blob Storage Support

Allow go-getter to download artifacts from Azure Blob Storage private containers.

S3 is already supported; this is a feature request for the same functionality with Azure Blob Storage.

S3: Vulnerable to confused deputy attack?

go-getter uses AWS instance metadata to automatically try and acquire IAM role data so as to be able to access data in S3.

Unless I'm mistaken, which I may be, this allows go-getter to be used by an attacker to request access to data stored in S3 to which only the EC2 instance in question should have access.

Use of ambient credentials to access data stored in S3 should be made the subject of an explicit opt-in by the calling code.

Example:

  • Instance Z has an IAM role permitting it to access private S3 bucket A.
  • Instance Z runs two daemons: Daemon Z1 which makes use of sensitive data in bucket A, and Daemon Z2 which uses go-getter to retrieve and process publicly accessible data at the behest of users.
  • User X requests Z2 running on Instance Z to retrieve data from private S3 bucket A.
  • Z2 makes use of the IAM role provided for the purposes of Z1 to retrieve data from private S3 bucket A.
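
The explicit opt-in could look roughly like the check below, run by the calling code before an S3 URL is handed to go-getter. It is a sketch under assumptions: checkS3Credentials and the allowAmbient flag are hypothetical, and the query parameter names aws_access_key_id and aws_access_key_secret are assumed to be the ones go-getter reads for explicit S3 credentials.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

var errAmbient = errors.New("s3: ambient credentials not permitted for this request")

// checkS3Credentials rejects an S3 URL that carries no explicit credentials
// unless the caller has opted in to ambient (instance role) credentials.
func checkS3Credentials(rawURL string, allowAmbient bool) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return err
	}
	q := u.Query()
	// Assumed parameter names; adjust to whatever the getter actually reads.
	explicit := q.Get("aws_access_key_id") != "" && q.Get("aws_access_key_secret") != ""
	if explicit || allowAmbient {
		return nil
	}
	return errAmbient
}

func main() {
	// Daemon Z2 from the example: serves users, so it does not opt in.
	err := checkS3Credentials("https://s3.amazonaws.com/bucket-a/secret.txt", false)
	fmt.Println(err)
}
```

In the scenario above, Z2 would call this with allowAmbient set to false and refuse User X's request, while Z1 could opt in for its own fixed set of URLs.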

S3: Custom url

Is it possible to use a custom URL? I'm looking to use MinIO with Nomad artifacts.

XZ support

Nice to have. XZ is generally overkill, but some systems have silly small size quotas.

Auto yes when cloning git repos

When using go-getter programmatically and cloning repositories using git over ssh, it would be good to have an AutoYes config parameter to move past this prompt:

The authenticity of host 'github.com (192.30.253.113)' can't be established.
RSA key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8.
Are you sure you want to continue connecting (yes/no)?
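
Until something like that exists, a caller that shells out to git itself can get similar behavior through the environment. The sketch below is an assumption-laden illustration, not go-getter code: gitCloneCmd is a hypothetical helper, it relies on git honoring GIT_SSH_COMMAND and on an OpenSSH version that supports StrictHostKeyChecking=accept-new (7.6 or later). Note that this trusts a host on first contact, so it should be a deliberate choice.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// gitCloneCmd builds a `git clone` command. When acceptNewHosts is true,
// ssh adds unknown host keys to known_hosts without prompting, while still
// refusing to connect if a known host's key has changed.
func gitCloneCmd(repo, dst string, acceptNewHosts bool) *exec.Cmd {
	cmd := exec.Command("git", "clone", "--", repo, dst)
	cmd.Env = os.Environ()
	if acceptNewHosts {
		cmd.Env = append(cmd.Env, "GIT_SSH_COMMAND=ssh -o StrictHostKeyChecking=accept-new")
	}
	return cmd
}

func main() {
	cmd := gitCloneCmd("git@github.com:hashicorp/go-getter.git", "dest", true)
	// Print the command instead of running it; call cmd.Run() to clone.
	fmt.Println(cmd.Args)
}
```

A stricter alternative is to pre-populate known_hosts with the published host keys and leave host key checking at its default.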
