spotify / gcs-tools

71 stars · 16 forks · 14 issues · 160 KB

GCS support for avro-tools, parquet-tools and protobuf

License: Apache License 2.0

Scala 47.64% Java 41.28% Shell 11.09%
gcs google-storage avro protobuf gcs-connector gcp parquet

gcs-tools's People

Contributors

ajitgogul · clairemcginty · luster · nevillelyh · regadas · rustedbones · scala-steward · syodage

gcs-tools's Issues

no JSON input found: gcloud credentials

On gcs-avro-tools 0.1.7 from Homebrew, there appear to be issues loading the application default credentials.

Exception in thread "main" java.lang.IllegalArgumentException: no JSON input found
	at com.google.api.client.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
	at com.google.api.client.util.Preconditions.checkArgument(Preconditions.java:49)
	at com.google.api.client.json.JsonParser.startParsing(JsonParser.java:222)
	at com.google.api.client.json.JsonParser.parse(JsonParser.java:379)
	at com.google.api.client.json.JsonParser.parse(JsonParser.java:335)
	at com.google.api.client.json.JsonParser.parseAndClose(JsonParser.java:165)
	at com.google.api.client.json.JsonParser.parseAndClose(JsonParser.java:147)
	at com.google.api.client.json.JsonFactory.fromInputStream(JsonFactory.java:206)
	at com.google.api.client.extensions.java6.auth.oauth2.FileCredentialStore.loadCredentials(FileCredentialStore.java:154)
	at com.google.api.client.extensions.java6.auth.oauth2.FileCredentialStore.<init>(FileCredentialStore.java:86)
	at com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromFileCredentialStoreForInstalledApp(CredentialFactory.java:301)

....

When I run gcloud auth application-default login, it saves my credentials to /Users/cchow/.config/gcloud/application_default_credentials.json. Did the expected path change?
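For reference, a quick way to check which ADC file the tools should be able to find. (This sketch assumes gcloud's current well-known path, which matches the path in the report above; the GOOGLE_APPLICATION_CREDENTIALS environment variable, if set, takes precedence.)

```shell
# Locate the Application Default Credentials (ADC) file.
# GOOGLE_APPLICATION_CREDENTIALS overrides gcloud's well-known path.
ADC="${GOOGLE_APPLICATION_CREDENTIALS:-$HOME/.config/gcloud/application_default_credentials.json}"
if [ -f "$ADC" ]; then
  echo "ADC present: $ADC"
else
  echo "No ADC at $ADC; run: gcloud auth application-default login"
fi
```

If the file exists at that path and the tool still fails, the bundled gcs-connector is likely looking at an older credential-store location rather than the ADC path.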

New release with parquet-tools 1.10.1?

Hello there 👋

I noticed that parquet-tools was updated on master (allowing usage of rowcount 🙏) a fair amount of time ago, but there has been no release with it.

Do you have any idea if/when you would be able to make a new release?

We rely heavily on Parquet hosted on GCS, so this is sorely missed!

Nonetheless, thanks for this awesome tool 👍

All latest tools fail to authenticate to GCS

STR:

1a. Install all latest (v0.2.2 on Aug 29) tools, or
1b. build latest master into parquet-cli-1.12.3.jar, proto-tools-3.21.1.jar, avro-tools-1.11.0.jar, magnolify-tools-0.4.8.jar

2. Run each of them with a basic read command like <TOOL> tojson <GCS_PATH>

Actual:
The tool launches a browser that shows a page (screenshot, Aug 29 2022) with the message:

The version of the app you're using doesn't include the latest security features to keep you protected. Please make sure to download from a trusted source and update to the latest, most secure version.

Expected:
The tool reads the file according to spec

NoSuchMethodError when running parquet-tools locally

Hello there 👋

Following the README, I tried to build the project and use it locally, but parquet-tools fails with a NoSuchMethodError:

The command:

% java -jar parquet-tools/target/scala-2.12/parquet-tools-1.10.1.jar rowcount --debug gs://path/to/parquet/file

The error:

java.lang.NoSuchMethodError: com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase$ParentTimestampUpdateIncludePredicate.create(GoogleHadoopFileSystemBase.java:790)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.createOptionsBuilderFromConfig(GoogleHadoopFileSystemBase.java:2140)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1832)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:1013)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:976)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.parquet.tools.command.RowCountCommand.execute(RowCountCommand.java:83)
        at org.apache.parquet.tools.Main.main(Main.java:223)
java.lang.NoSuchMethodError: com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;
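This error pattern usually means an older Guava wins on the classpath: Splitter.splitToList(CharSequence) was only added in Guava 15. A rough diagnostic, assuming the jar path from the command above and that the shaded Guava keeps its Maven pom.properties (both are assumptions about this particular build):

```shell
# Check which Guava version is bundled in the assembled fat jar.
# Splitter.splitToList(CharSequence) requires Guava 15+.
JAR=parquet-tools/target/scala-2.12/parquet-tools-1.10.1.jar
if [ -f "$JAR" ]; then
  unzip -p "$JAR" META-INF/maven/com.google.guava/guava/pom.properties 2>/dev/null \
    | grep '^version=' || echo "no Guava pom.properties found in $JAR"
else
  echo "jar not found at $JAR; assemble it first"
fi
```

If the reported version is below 15, a dependency on the classpath (typically an old Hadoop distribution) is pinning Guava back.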

error running proto-tools tojson

Running proto-tools tojson throws this error:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.protobuf.CodedInputStream.newInstance(Ljava/nio/ByteBuffer;)Lcom/google/protobuf/CodedInputStream;
at me.lyh.protobuf.generic.GenericReader.read(GenericReader.scala:21)
at org.apache.avro.tool.ProtobufReader.toJson(ProtobufReader.scala:9)
at org.apache.avro.tool.ProtoToJsonTool.run(ProtoToJsonTool.java:59)
at org.apache.avro.tool.ProtoMain.run(ProtoMain.java:64)
at org.apache.avro.tool.ProtoMain.main(ProtoMain.java:53)

add `proto-tools fromPb` method?

An example use case is inspecting the pipelineUrl file that Dataflow stages (a .pb file representing org.apache.beam.model.pipeline.v1.Pipeline) to verify coders and transforms. It would essentially be a wrapper around protoc's --decode or --decode_raw modes, maybe with built-in support for common protobuf messages like Pipeline (although we'd have to handle different schema versions for different Beam versions).
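For context, a minimal sketch of what such a wrapper would decode. The three bytes below are a hand-built protobuf message (field 1, varint value 150), purely for illustration; the Pipeline invocation in the trailing comment is the hypothetical target use.

```shell
# Build a 3-byte protobuf wire-format message: tag 0x08 (field 1, varint)
# followed by varint 0x96 0x01 (= 150). This is the kind of schemaless
# payload a `proto-tools fromPb` wrapper would hand to protoc.
printf '\010\226\001' > /tmp/msg.pb
wc -c < /tmp/msg.pb   # 3 bytes
# With protoc installed (not run here):
#   protoc --decode_raw < /tmp/msg.pb      # prints field numbers and values
#   protoc --decode=org.apache.beam.model.pipeline.v1.Pipeline beam_runner_api.proto < pipeline.pb
```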

proto-tools NoSuchMethod error

When calling proto-tools with either tojson or getschema, the following error is thrown:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase$ParentTimestampUpdateIncludePredicate.create(GoogleHadoopFileSystemBase.java:641)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.createOptionsBuilderFromConfig(GoogleHadoopFileSystemBase.java:1978)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1675)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:862)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:825)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at org.apache.avro.tool.Util.openFromFS(Util.java:88)
	at org.apache.avro.tool.Util.fileOrStdin(Util.java:60)
	at org.apache.avro.tool.ProtoToJsonTool.run(ProtoToJsonTool.java:48)
	at org.apache.avro.tool.ProtoMain.run(ProtoMain.java:64)
	at org.apache.avro.tool.ProtoMain.main(ProtoMain.java:53)

No valid credential configuration discovered

Hello,
Coming from Scio's documentation, I ended up installing proto-tools through Homebrew (spotify/public/gcs-proto-tools stable 0.2.4). However, when I run it I get the following error:

$ proto-tools getschema gs://bucket/data.protobuf.avro
Exception in thread "main" java.lang.IllegalArgumentException: No valid credential configuration discovered:  [CredentialOptions{serviceAccountEnabled=false, serviceAccountPrivateKeyId=null, serviceAccountPrivateKey=null, serviceAccountEmail=null, serviceAccountKeyFile=null, serviceAccountJsonKeyFile=null, nullCredentialEnabled=false, transportType=JAVA_NET, tokenServerUrl=https://oauth2.googleapis.com/token, proxyAddress=null, proxyUsername=null, proxyPassword=null, authClientId=32555940559.apps.googleusercontent.com, authClientSecret=<redacted>, authRefreshToken=null}]
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:220)
	at com.google.cloud.hadoop.util.CredentialOptions$Builder.build(CredentialOptions.java:171)
	at com.google.cloud.hadoop.util.HadoopCredentialConfiguration.getCredentialFactory(HadoopCredentialConfiguration.java:227)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getCredential(GoogleHadoopFileSystemBase.java:1343)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.createGcsFs(GoogleHadoopFileSystemBase.java:1501)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1483)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:470)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3572)
	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3673)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3624)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:557)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
	at org.apache.avro.mapred.FsInput.<init>(FsInput.java:38)
	at org.apache.avro.tool.ProtoGetSchemaTool.run(ProtoGetSchemaTool.java:33)
	at org.apache.avro.tool.ProtoMain.run(ProtoMain.java:64)
	at org.apache.avro.tool.ProtoMain.main(ProtoMain.java:53)

I'm logged into my GCP project with gcloud.

The README seems to suggest that something needs to be done with the GCS connector, but I can't figure out what exactly:

  • Is it something to be installed separately? How?
  • How can I edit a core-site.xml file within a Homebrew installation? Or can the configuration be passed on the command line?
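Not an answer from the maintainers, but as a sketch of the second option: the GCS connector reads its auth settings from a core-site.xml found on the classpath. The property names below follow older gcs-connector releases (newer ones use fs.gs.auth.* equivalents), the keyfile path is a placeholder, and the classpath invocation reuses the ProtoMain entry point visible in the stack trace above; verify all of this against the gcs-connector docs for the bundled version.

```shell
# Write a minimal core-site.xml enabling service-account auth for the
# GCS connector (property names assume an older connector version).
mkdir -p /tmp/hadoop-conf
cat > /tmp/hadoop-conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>google.cloud.auth.service.account.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>google.cloud.auth.service.account.json.keyfile</name>
    <value>/path/to/keyfile.json</value>
  </property>
</configuration>
EOF
# Put the config directory ahead of the jar on the classpath so Hadoop finds it:
#   java -cp /tmp/hadoop-conf:proto-tools.jar org.apache.avro.tool.ProtoMain \
#     getschema gs://bucket/data.protobuf.avro
```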
