embulk-input-athena's People

Contributors

giwa, gms-hi-oda, hi1280, sakama, shinji19

embulk-input-athena's Issues

SQL NULL value becomes zero

@shinji19 Hello. I'd like to report an issue. What do you think about this?

phenomenon

When I use the long, double, or boolean type,
a PrestoSQL (Athena) NULL value becomes zero in Embulk.

I expect a PrestoSQL NULL value to become null.

cause

AthenaInputPlugin uses java.sql.ResultSet#getLong (and likewise getDouble and getBoolean).

https://github.com/shinji19/embulk-input-athena/blob/develop/src/main/java/org/embulk/input/athena/AthenaInputPlugin.java#L158

The getLong method returns 0 if the value is SQL NULL (getDouble likewise returns 0, and getBoolean returns false).

Returns:
the column value; if the value is SQL NULL, the value returned is 0

https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSet.html#getLong-int-

suggestion

I suggest that AthenaInputPlugin check for SQL NULL with the wasNull() method.
When wasNull() returns true, the plugin should set NULL in Embulk.
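
The suggestion above could be sketched roughly as follows. This is only an illustration, not the plugin's actual code: the class NullSafeGetters, its helper methods, and stubRow (a tiny in-memory stand-in for a one-row ResultSet, so the example runs without an Athena connection) are all hypothetical names. Only ResultSet#getLong, getDouble, getBoolean and wasNull are real JDBC calls.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper class; not part of embulk-input-athena.
final class NullSafeGetters {
    // Read a long column, returning null (not 0) when the column held SQL NULL.
    static Long getNullableLong(ResultSet rs, int columnIndex) throws SQLException {
        long value = rs.getLong(columnIndex);
        // wasNull() reports on the column read by the most recent getter call.
        return rs.wasNull() ? null : Long.valueOf(value);
    }

    // Same pattern for double columns.
    static Double getNullableDouble(ResultSet rs, int columnIndex) throws SQLException {
        double value = rs.getDouble(columnIndex);
        return rs.wasNull() ? null : Double.valueOf(value);
    }

    // Same pattern for boolean columns (getBoolean yields false for SQL NULL).
    static Boolean getNullableBoolean(ResultSet rs, int columnIndex) throws SQLException {
        boolean value = rs.getBoolean(columnIndex);
        return rs.wasNull() ? null : Boolean.valueOf(value);
    }

    // Assumption for demonstration only: a one-row ResultSet stand-in that
    // mimics the JDBC contract (0 / 0.0 / false for NULL, then wasNull() == true).
    static ResultSet stubRow(final Object... columns) {
        final boolean[] lastWasNull = {false};
        return (ResultSet) Proxy.newProxyInstance(
                NullSafeGetters.class.getClassLoader(),
                new Class<?>[] {ResultSet.class},
                (proxy, method, args) -> {
                    String name = method.getName();
                    if (name.equals("wasNull")) {
                        return lastWasNull[0];
                    }
                    if (args == null || args.length != 1 || !(args[0] instanceof Integer)) {
                        throw new UnsupportedOperationException(name);
                    }
                    int index = (Integer) args[0];      // JDBC columns are 1-based
                    Object value = columns[index - 1];
                    switch (name) {
                        case "getLong":
                            lastWasNull[0] = (value == null);
                            return value == null ? 0L : (Long) value;
                        case "getDouble":
                            lastWasNull[0] = (value == null);
                            return value == null ? 0.0 : (Double) value;
                        case "getBoolean":
                            lastWasNull[0] = (value == null);
                            return value == null ? false : (Boolean) value;
                        default:
                            throw new UnsupportedOperationException(name);
                    }
                });
    }

    public static void main(String[] args) throws SQLException {
        ResultSet rs = stubRow(null, 42L, null, true);
        System.out.println("raw getLong on NULL:  " + rs.getLong(1));
        System.out.println("null-safe on NULL:    " + getNullableLong(rs, 1));
        System.out.println("null-safe on value:   " + getNullableLong(rs, 2));
        System.out.println("null-safe double:     " + getNullableDouble(rs, 3));
        System.out.println("null-safe boolean:    " + getNullableBoolean(rs, 4));
    }
}
```

In the plugin itself, a null result would presumably be passed on through Embulk's PageBuilder (for example its setNull method) instead of setLong, setDouble, or setBoolean; that wiring is an assumption and is not shown here.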

SQL Exception does not cause abnormal exit

I got an Athena error with this plugin, but it didn't result in an abnormal exit.
Below are some of the logs from that time.

2020-09-04 09:05:58.886 +0000 [INFO] (0015:task-0000): Loading 30,690 rows
2020-09-04 09:06:03.408 +0000 [INFO] (0015:task-0000): > 4.52 seconds (loaded 215,165 rows in total)
java.sql.SQLException: Error fetching results
 at com.amazonaws.athena.jdbc.AthenaResultSet.next(AthenaResultSet.java:184)
 at org.embulk.input.athena.AthenaInputPlugin.run(AthenaInputPlugin.java:133)
 at org.embulk.spi.util.Executors.process(Executors.java:62)
 at org.embulk.spi.util.Executors.process(Executors.java:38)
 at org.embulk.exec.LocalExecutorPlugin$DirectExecutor$1.call(LocalExecutorPlugin.java:170)
 at org.embulk.exec.LocalExecutorPlugin$DirectExecutor$1.call(LocalExecutorPlugin.java:167)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.athena.jdbc.shaded.com.amazonaws.services.athena.model.AmazonAthenaException: Rate exceeded (Service: AmazonAthena; Status Code: 400; Error Code: ThrottlingException; Request ID: 09759443-002f-49a4-b411-971607667f03)
 at com.amazonaws.athena.jdbc.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588)
 at com.amazonaws.athena.jdbc.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1258)
 at com.amazonaws.athena.jdbc.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
 at com.amazonaws.athena.jdbc.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742)
 at com.amazonaws.athena.jdbc.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
 at com.amazonaws.athena.jdbc.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
 at com.amazonaws.athena.jdbc.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
 at com.amazonaws.athena.jdbc.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
 at com.amazonaws.athena.jdbc.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
 at com.amazonaws.athena.jdbc.shaded.com.amazonaws.services.athena.AmazonAthenaClient.doInvoke(AmazonAthenaClient.java:1549)
 at com.amazonaws.athena.jdbc.shaded.com.amazonaws.services.athena.AmazonAthenaClient.invoke(AmazonAthenaClient.java:1525)
 at com.amazonaws.athena.jdbc.shaded.com.amazonaws.services.athena.AmazonAthenaClient.executeGetQueryResults(AmazonAthenaClient.java:1064)
 at com.amazonaws.athena.jdbc.shaded.com.amazonaws.services.athena.AmazonAthenaClient.getQueryResults(AmazonAthenaClient.java:1040)
 at com.amazonaws.athena.jdbc.AthenaServiceClient.fetchQueryResult(AthenaServiceClient.java:205)
 at com.amazonaws.athena.jdbc.AthenaStatementClient.paginateQueryResult(AthenaStatementClient.java:264)
 at com.amazonaws.athena.jdbc.AthenaStatementClient.nextQueryResult(AthenaStatementClient.java:292)
 at com.amazonaws.athena.jdbc.AthenaResultSet.fetchRows(AthenaResultSet.java:133)
 at com.amazonaws.athena.jdbc.AthenaResultSet.next(AthenaResultSet.java:164)
 ... 9 more
2020-09-04 09:06:20.364 +0000 [INFO] (0001:transaction): {done:  1 / 1, running: 0}

I'd like to build a workflow that detects an abnormal exit of Embulk and reruns it.
Is there any reason the plugin does not exit abnormally in this case?
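
For reference, the behavior asked for here might look something like the sketch below. It assumes, based only on the log above, that the SQLException is caught and printed inside the plugin rather than propagated; the actual plugin code may differ. The class PropagateSqlErrors and the FetchLoop interface are hypothetical stand-ins for the row-fetching loop.

```java
import java.sql.SQLException;

// Hypothetical sketch; not the plugin's actual code.
final class PropagateSqlErrors {
    // Stand-in for the loop that calls ResultSet#next and reads rows.
    interface FetchLoop {
        void fetchAll() throws SQLException;
    }

    // Rethrow the SQLException as an unchecked exception so the caller
    // (here, the task runner) sees a failure instead of a normal completion.
    static void runPropagating(FetchLoop loop) {
        try {
            loop.fetchAll();
        } catch (SQLException e) {
            // Keep the original exception as the cause so the stack trace survives.
            throw new RuntimeException("Athena query failed", e);
        }
    }

    public static void main(String[] args) {
        try {
            runPropagating(() -> {
                throw new SQLException("Error fetching results");
            });
            System.out.println("finished normally");
        } catch (RuntimeException e) {
            System.out.println("task failed: " + e.getCause().getMessage());
        }
    }
}
```

With the exception propagated, the process running the task has a chance to end with a non-zero status, which a rerun workflow can then detect.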

Add driver_path option

I tried to test this cool plugin, but I ran into the problem shown below.
The plugin needs a driver_path option, as the other JDBC plugins have.

As a workaround, I downloaded the driver JAR from the AWS page and installed it with the following command on OS X.
sudo cp AthenaJDBC41-1.1.0.jar /Library/Java/Extensions/

java.lang.ClassNotFoundException: com.amazonaws.athena.jdbc.AthenaDriver
	at org.embulk.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:257)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at org.embulk.input.athena.AthenaInputPlugin.getAthenaConnection(AthenaInputPlugin.java:193)
	at org.embulk.input.athena.AthenaInputPlugin.run(AthenaInputPlugin.java:101)
	at org.embulk.exec.PreviewExecutor$2$1.run(PreviewExecutor.java:128)
	at org.embulk.spi.util.Filters$RecursiveControl.transaction(Filters.java:84)
	at org.embulk.spi.util.Filters.transaction(Filters.java:42)
	at org.embulk.exec.PreviewExecutor$2.run(PreviewExecutor.java:118)
	at org.embulk.input.athena.AthenaInputPlugin.resume(AthenaInputPlugin.java:82)
	at org.embulk.input.athena.AthenaInputPlugin.transaction(AthenaInputPlugin.java:77)
	at org.embulk.exec.PreviewExecutor.doPreview(PreviewExecutor.java:116)
	at org.embulk.exec.PreviewExecutor.doPreview(PreviewExecutor.java:104)
	at org.embulk.exec.PreviewExecutor.access$000(PreviewExecutor.java:30)
	at org.embulk.exec.PreviewExecutor$1.run(PreviewExecutor.java:74)
	at org.embulk.exec.PreviewExecutor$1.run(PreviewExecutor.java:71)
	at org.embulk.spi.Exec.doWith(Exec.java:22)
	at org.embulk.exec.PreviewExecutor.preview(PreviewExecutor.java:71)
	at org.embulk.EmbulkEmbed.preview(EmbulkEmbed.java:152)
	at org.embulk.EmbulkRunner.previewInternal(EmbulkRunner.java:215)
	at org.embulk.EmbulkRunner.preview(EmbulkRunner.java:107)
	at org.embulk.cli.EmbulkRun.runSubcommand(EmbulkRun.java:434)
	at org.embulk.cli.EmbulkRun.run(EmbulkRun.java:92)
	at org.embulk.cli.Main.main(Main.java:26)
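
One way a driver_path option could work is to load the driver class from the given JAR with a URLClassLoader instead of relying on the JVM extension directory. The sketch below is an assumption about how that might be done, not the plugin's implementation: DriverPathLoader and loadDriver are hypothetical names. The driver class name is the one from the stack trace above.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.sql.Driver;

// Hypothetical sketch of a driver_path style loader.
final class DriverPathLoader {
    // Load and instantiate a JDBC driver class from the JAR at driverPath.
    static Driver loadDriver(String driverPath, String driverClassName)
            throws ReflectiveOperationException, MalformedURLException {
        URL jarUrl = Paths.get(driverPath).toUri().toURL();
        // The loader is deliberately left open: the driver keeps loading
        // classes from the JAR for as long as it is in use.
        URLClassLoader loader =
                new URLClassLoader(new URL[] {jarUrl}, DriverPathLoader.class.getClassLoader());
        Class<?> driverClass = Class.forName(driverClassName, true, loader);
        return driverClass.asSubclass(Driver.class).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: java DriverPathLoader <path-to-driver-jar>");
            return;
        }
        Driver driver = loadDriver(args[0], "com.amazonaws.athena.jdbc.AthenaDriver");
        System.out.println("loaded driver: " + driver.getClass().getName());
    }
}
```

The returned Driver can be used directly through Driver#connect(url, properties), which avoids DriverManager's restriction on drivers loaded by a different class loader.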
