spotify / gcs-tools
GCS support for avro-tools, parquet-tools and protobuf
License: Apache License 2.0
Related to #12
On gcs-avro-tools 0.1.7 from Homebrew, there appear to be issues loading the application default credentials.
Exception in thread "main" java.lang.IllegalArgumentException: no JSON input found
at com.google.api.client.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
at com.google.api.client.util.Preconditions.checkArgument(Preconditions.java:49)
at com.google.api.client.json.JsonParser.startParsing(JsonParser.java:222)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:379)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:335)
at com.google.api.client.json.JsonParser.parseAndClose(JsonParser.java:165)
at com.google.api.client.json.JsonParser.parseAndClose(JsonParser.java:147)
at com.google.api.client.json.JsonFactory.fromInputStream(JsonFactory.java:206)
at com.google.api.client.extensions.java6.auth.oauth2.FileCredentialStore.loadCredentials(FileCredentialStore.java:154)
at com.google.api.client.extensions.java6.auth.oauth2.FileCredentialStore.<init>(FileCredentialStore.java:86)
at com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromFileCredentialStoreForInstalledApp(CredentialFactory.java:301)
....
When I run gcloud auth application-default login, it saves my credentials to /Users/cchow/.config/gcloud/application_default_credentials.json. Did the expected path change?
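A workaround sketch, offered as an assumption rather than a verified fix for 0.1.7: newer gcloud writes application default credentials to the path above, while older GCS connector builds may look elsewhere. Pointing the standard GOOGLE_APPLICATION_CREDENTIALS variable at the gcloud-written file is one way to bridge the two:

```shell
# Assumption: the tool honors GOOGLE_APPLICATION_CREDENTIALS; the path below
# is where current gcloud writes application default credentials.
ADC="$HOME/.config/gcloud/application_default_credentials.json"
export GOOGLE_APPLICATION_CREDENTIALS="$ADC"
echo "$GOOGLE_APPLICATION_CREDENTIALS"
```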
Hello there!
I noticed that you updated the version of parquet-tools on master (allowing usage of rowcount) a fair amount of time ago, but there has been no release with it.
Do you have any idea if/when you would be able to make a new release?
Relying heavily on Parquet hosted on GCS, this is sorely missed!
Nonetheless, thanks for this awesome tool!
STR:
1a. Install all latest (v0.2.2 on Aug 29) tools
1b. Or build latest master to parquet-cli-1.12.3.jar, proto-tools-3.21.1.jar, avro-tools-1.11.0.jar, magnolify-tools-0.4.8.jar
2. Run <TOOL> tojson <GCS_PATH>
Actual:
Tool launches a browser that shows a page with the message:
The version of the app you're using doesn't include the latest security features to keep you protected. Please make sure to download from a trusted source and update to the latest, most secure version.
Expected:
Tool reads a file according to spec
Hello there!
Following the README, I tried to build the project and use it locally, but parquet-tools fails with a NoSuchMethodError:
The command:
% java -jar parquet-tools/target/scala-2.12/parquet-tools-1.10.1.jar rowcount --debug gs://path/to/parquet/file
The error:
java.lang.NoSuchMethodError: com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase$ParentTimestampUpdateIncludePredicate.create(GoogleHadoopFileSystemBase.java:790)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.createOptionsBuilderFromConfig(GoogleHadoopFileSystemBase.java:2140)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1832)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:1013)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:976)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.parquet.tools.command.RowCountCommand.execute(RowCountCommand.java:83)
at org.apache.parquet.tools.Main.main(Main.java:223)
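A possible mitigation, under the assumption that this NoSuchMethodError comes from an old Guava shadowing a newer one on the classpath: Splitter.splitToList has existed since Guava 15.0, so placing a recent Guava jar ahead of the fat jar can resolve it. The jar paths and Guava version here are illustrative, not taken from the repo:

```shell
# Hypothetical classpath fix: put a modern Guava before the tool jar so its
# Splitter.splitToList(CharSequence) is found first. The command is echoed
# rather than run, since the jars are not present in this sketch.
GUAVA=guava-31.1-jre.jar
JAR=parquet-tools/target/scala-2.12/parquet-tools-1.10.1.jar
echo java -cp "$GUAVA:$JAR" org.apache.parquet.tools.Main rowcount gs://path/to/parquet/file
```

The main class org.apache.parquet.tools.Main matches the bottom frame of the stack trace above.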
Running proto-tools tojson throws this error:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.protobuf.CodedInputStream.newInstance(Ljava/nio/ByteBuffer;)Lcom/google/protobuf/CodedInputStream;
at me.lyh.protobuf.generic.GenericReader.read(GenericReader.scala:21)
at org.apache.avro.tool.ProtobufReader.toJson(ProtobufReader.scala:9)
at org.apache.avro.tool.ProtoToJsonTool.run(ProtoToJsonTool.java:59)
at org.apache.avro.tool.ProtoMain.run(ProtoMain.java:64)
at org.apache.avro.tool.ProtoMain.main(ProtoMain.java:53)
parquet-tools is deprecated: https://github.com/apache/parquet-mr/tree/master/parquet-tools-deprecated
An example use case is inspecting the pipelineUrl file that Dataflow stages (which is a .pb file representing org.apache.beam.model.pipeline.v1.Pipeline) to verify coders and transforms. It would just be a wrapper around protoc's decode or decode_raw method, maybe with built-in support for common Protobuf messages like Pipeline (although we'd have to handle different schema versions for different Beam versions).
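A sketch of what such a wrapper would shell out to, assuming protoc is installed. The .pb file here is a hand-built stand-in (a single field-1 varint in protobuf wire format), not a real Dataflow pipeline:

```shell
# Field 1, varint 150, in protobuf wire format: tag byte 0x08, then 0x96 0x01.
printf '\010\226\001' > pipeline.pb
# protoc --decode_raw prints the untyped field structure from stdin.
if command -v protoc >/dev/null 2>&1; then
  protoc --decode_raw < pipeline.pb
else
  echo "protoc not installed"
fi
```

With a real .proto available, protoc --decode org.apache.beam.model.pipeline.v1.Pipeline would give typed output instead.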
When calling proto-tools with either tojson or getschema, the following error is thrown:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase$ParentTimestampUpdateIncludePredicate.create(GoogleHadoopFileSystemBase.java:641)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.createOptionsBuilderFromConfig(GoogleHadoopFileSystemBase.java:1978)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1675)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:862)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:825)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.avro.tool.Util.openFromFS(Util.java:88)
at org.apache.avro.tool.Util.fileOrStdin(Util.java:60)
at org.apache.avro.tool.ProtoToJsonTool.run(ProtoToJsonTool.java:48)
at org.apache.avro.tool.ProtoMain.run(ProtoMain.java:64)
at org.apache.avro.tool.ProtoMain.main(ProtoMain.java:53)
Wondering if we could update avro-tools so we can do avro-tools tojson --head=10 <avrofile>?
Currently missing this, which seems to be part of a later version of Apache Avro.
Hello,
Coming from Scio's documentation, I ended up installing proto-tools through Homebrew (spotify/public/gcs-proto-tools stable 0.2.4). However, when I run it I get the following error:
$ proto-tools getschema gs://bucket/data.protobuf.avro
Exception in thread "main" java.lang.IllegalArgumentException: No valid credential configuration discovered: [CredentialOptions{serviceAccountEnabled=false, serviceAccountPrivateKeyId=null, serviceAccountPrivateKey=null, serviceAccountEmail=null, serviceAccountKeyFile=null, serviceAccountJsonKeyFile=null, nullCredentialEnabled=false, transportType=JAVA_NET, tokenServerUrl=https://oauth2.googleapis.com/token, proxyAddress=null, proxyUsername=null, proxyPassword=null, authClientId=32555940559.apps.googleusercontent.com, authClientSecret=<redacted>, authRefreshToken=null}]
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:220)
at com.google.cloud.hadoop.util.CredentialOptions$Builder.build(CredentialOptions.java:171)
at com.google.cloud.hadoop.util.HadoopCredentialConfiguration.getCredentialFactory(HadoopCredentialConfiguration.java:227)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getCredential(GoogleHadoopFileSystemBase.java:1343)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.createGcsFs(GoogleHadoopFileSystemBase.java:1501)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1483)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:470)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3572)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3673)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3624)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:557)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.avro.mapred.FsInput.<init>(FsInput.java:38)
at org.apache.avro.tool.ProtoGetSchemaTool.run(ProtoGetSchemaTool.java:33)
at org.apache.avro.tool.ProtoMain.run(ProtoMain.java:64)
at org.apache.avro.tool.ProtoMain.main(ProtoMain.java:53)
I'm logged into my GCP project with gcloud.
The README seems to suggest that something needs to be done with the GCS connector, but I can't figure out what exactly: does a core-site.xml file go inside the Homebrew installation? Or can it be passed on the command line?
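One possible answer, offered as an assumption rather than a verified recipe for the Homebrew build: the bundled GCS connector reads Hadoop configuration, so a core-site.xml placed on HADOOP_CONF_DIR with service-account settings may be picked up. The property names below are the GCS connector's keys; the keyfile path is a placeholder:

```shell
# Write a minimal core-site.xml enabling service-account auth for the GCS
# connector, then point HADOOP_CONF_DIR at it (whether the standalone jar
# honors HADOOP_CONF_DIR is an assumption).
mkdir -p conf
cat > conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>google.cloud.auth.service.account.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>google.cloud.auth.service.account.json.keyfile</name>
    <value>/path/to/keyfile.json</value>
  </property>
</configuration>
EOF
export HADOOP_CONF_DIR="$PWD/conf"
```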