
hive-json-serde's People

Contributors

andykram, dblock, dennyriadi, dependabot[bot], georgekankava, guyrt, mhandwerker, peterdm, powerrr, ptrstpp950, rcongiu, reecexlm, rjainqb, saaldjormike, songyunseop, wangxianbin1987

hive-json-serde's Issues

handle \n linebreaks

Hi,
Your SerDe does a great job thanks!
The option to ignore malformed json is also pretty useful.
Most of the time it works fine, but in some cases the content has \n characters inside strings, and Hive doesn't handle that well at all, creating null rows every time it sees a \n.
Is there an existing property to fix this?

Thanks!

Querying a hive table with nested struct type over JSON data results in errors

The static field static List values = new ArrayList(); in JsonStructObjectInspector.java is causing the issue in Hive. I see no reason why this field should be static.
Because of it, querying a Hive table with a nested struct type over JSON data results in errors such as java.lang.IndexOutOfBoundsException, or in corrupt data.

See the Hive code in org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.java, method:

public static void serialize(ByteStream.Output out, Object obj,
    ObjectInspector objInspector, byte[] separators, int level,
    Text nullSequence, boolean escaped, byte escapeChar, boolean[] needsEscape)
    throws IOException, SerDeException {

It can be called recursively for nested structs, and the list built by the outer call can be overwritten by the inner call, since both reference the same static ArrayList values in JsonStructObjectInspector.
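
The hazard can be shown in miniature with a Python stand-in for the reported Java class: a single shared list reused by recursive calls corrupts the outer call's data, while a fresh list per call (the equivalent of making the field non-static) stays correct. The function and variable names below are illustrative only.

```python
shared = []  # analogue of the static List in JsonStructObjectInspector

def fields_shared(struct):
    """Collect field values using one shared list (the bug)."""
    shared.clear()
    for v in struct.values():
        if isinstance(v, dict):
            fields_shared(v)  # inner call clears and refills `shared`
        shared.append(v)
    return list(shared)

def fields_local(struct):
    """Collect field values using a fresh list per call (the fix)."""
    out = []
    for v in struct.values():
        if isinstance(v, dict):
            fields_local(v)
        out.append(v)
    return out

nested = {"a": 1, "b": {"c": 2}, "d": 3}
print(fields_shared(nested))  # outer call has lost values to the inner call
print(fields_local(nested))   # correct
```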

Manually specifying JSON path names

Hey Robert,

I have some JSON path names with periods in them.

Is it possible to manually specify the JSON path names for the columns? I saw this in Karmasphere's Analyst software:

ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ('columns.types'='string:string', 'serialization.format'='1', 'columns'='ipAddress,userAgent')

Thanks for your help,
Ariel

Failed tests: testTimestampDeSerializeNumericTimestampWithNanoseconds(org.openx.data.jsonserde.JsonSerDeTimeStampTest)

Using the latest master, 1f925db:

$ mvn package
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for org.openx.data:json-serde:jar:1.1.7
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-compiler-plugin is missing. @ line 37, column 21
[WARNING] 
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING] 
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING] 
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building openx-json-serde 1.1.7
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ json-serde ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/brett/Development/src-mirror/Hive-JSON-Serde/src/main/resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ json-serde ---
[INFO] Compiling 23 source files to /home/brett/Development/src-mirror/Hive-JSON-Serde/target/classes
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ json-serde ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ json-serde ---
[INFO] Compiling 3 source files to /home/brett/Development/src-mirror/Hive-JSON-Serde/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ json-serde ---
[INFO] Surefire report directory: /home/brett/Development/src-mirror/Hive-JSON-Serde/target/surefire-reports

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.openx.data.jsonserde.JsonSerDeTest
initialize
initialize
initialize
initialize
initialize
getSerializedClass
testSerializeWithMapping
testMapping
serialize
Output object {"timestamp":7898,"two":43.2,"one":true,"three":[],"four":"value1"}
testMapping
testMapping
initialize
deserialize
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.177 sec
Running org.openx.data.jsonserde.JsonSerDeTimeStampTest
Tests run: 4, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec <<< FAILURE!
testTimestampDeSerializeNumericTimestampWithNanoseconds(org.openx.data.jsonserde.JsonSerDeTimeStampTest)  Time elapsed: 0.001 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2013-05-05 19:58:45.0> but was:<2013-05-05 17:58:45.0>
  at org.junit.Assert.fail(Assert.java:93)
  at org.junit.Assert.failNotEquals(Assert.java:647)
  at org.junit.Assert.assertEquals(Assert.java:128)
  at org.junit.Assert.assertEquals(Assert.java:147)
  at org.openx.data.jsonserde.JsonSerDeTimeStampTest.testTimestampDeSerializeNumericTimestampWithNanoseconds(JsonSerDeTimeStampTest.java:73)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
  at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
  at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
  at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
  at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
  at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
  at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
  at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
  at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

testTimestampDeSerializeNumericTimestamp(org.openx.data.jsonserde.JsonSerDeTimeStampTest)  Time elapsed: 0 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2013-05-05 19:58:45.0> but was:<2013-05-05 17:58:45.0>
  at org.junit.Assert.fail(Assert.java:93)
  at org.junit.Assert.failNotEquals(Assert.java:647)
  at org.junit.Assert.assertEquals(Assert.java:128)
  at org.junit.Assert.assertEquals(Assert.java:147)
  at org.openx.data.jsonserde.JsonSerDeTimeStampTest.testTimestampDeSerializeNumericTimestamp(JsonSerDeTimeStampTest.java:64)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
  at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
  at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
  at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
  at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
  at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
  at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
  at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
  at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)


Results :

Failed tests:   testTimestampDeSerializeNumericTimestampWithNanoseconds(org.openx.data.jsonserde.JsonSerDeTimeStampTest): expected:<2013-05-05 19:58:45.0> but was:<2013-05-05 17:58:45.0>
  testTimestampDeSerializeNumericTimestamp(org.openx.data.jsonserde.JsonSerDeTimeStampTest): expected:<2013-05-05 19:58:45.0> but was:<2013-05-05 17:58:45.0>

Tests run: 11, Failures: 2, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.558s
[INFO] Finished at: Tue Oct 15 11:02:07 CDT 2013
[INFO] Final Memory: 18M/299M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project json-serde: There are test failures.
[ERROR] 
[ERROR] Please refer to /home/brett/Development/src-mirror/Hive-JSON-Serde/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
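
The exact two-hour gap between the expected 19:58:45 and the actual 17:58:45 is the usual signature of a test whose expected string was computed in the author's local timezone: rendering an epoch timestamp as a wall-clock string depends on the zone it is rendered in. A sketch of the effect, using fixed UTC offsets chosen only to reproduce the reported gap (UTC-5 matches the CDT machine in the build log):

```python
from datetime import datetime, timezone, timedelta

epoch = 1367794725  # 2013-05-05 22:58:45 UTC

tz_minus3 = timezone(timedelta(hours=-3))  # a zone two hours ahead of CDT
tz_minus5 = timezone(timedelta(hours=-5))  # CDT, as in the build log

# Same instant, two different wall-clock strings:
print(datetime.fromtimestamp(epoch, tz_minus3).strftime("%Y-%m-%d %H:%M:%S"))
print(datetime.fromtimestamp(epoch, tz_minus5).strftime("%Y-%m-%d %H:%M:%S"))
```

Pinning the test's expected value to a fixed zone (or running the JVM with a fixed user.timezone) would make it pass everywhere.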

java.lang.StackOverflowError Exception

I get a java.lang.StackOverflowError while using your SerDe. Could it be caused by these infinitely recursive calls in JSONObjectMapAdapter?

@Override
public Object get(Object key) {
    return get(key);
}

@Override
public Object put(Object key, Object value) {
    return put(key,value);
}

All other methods in this file delegate to the cache.

I'm basically reading in a highly nested JSON document. One of the more complicated fields looks like the following in Hive:
parameters array<struct<decisions: array<map<string,string>>>>
Your SerDe seems to work really well in most cases, but it sometimes gives me that stack overflow exception. Do you know why?

My hadoop error log -

2012-07-04 00:53:39,726 INFO io.HiveIgnoreKeyTextOutputFormat (HiveIgnoreKeyTextOutputFormat.java:getHiveRecordWriter(82)) - HiveIgnoreKeyTextOutputFormat progressor=org.apache.hadoop.mapred.Task$TaskReporter@6c91e321
2012-07-04 00:53:39,970 INFO exec.MapOperator (Operator.java:forward(718)) - 10 forwarding 10000 rows
2012-07-04 00:53:39,970 INFO exec.TableScanOperator (Operator.java:forward(718)) - 0 forwarding 10000 rows
2012-07-04 00:53:39,971 INFO ExecMapper (ExecMapper.java:map(148)) - ExecMapper: processing 10000 rows: used memory = 39525040
2012-07-04 00:53:40,047 FATAL ExecMapper (ExecMapper.java:map(160)) - java.lang.StackOverflowError
at org.openx.data.jsonserde.objectinspector.JSONObjectMapAdapter.get(JSONObjectMapAdapter.java:97)
at org.openx.data.jsonserde.objectinspector.JSONObjectMapAdapter.get(JSONObjectMapAdapter.java:97)
at org.openx.data.jsonserde.objectinspector.JSONObjectMapAdapter.get(JSONObjectMapAdapter.java:97)
at org.openx.data.jsonserde.objectinspector.JSONObjectMapAdapter.get(JSONObjectMapAdapter.java:97)
at org.openx.data.jsonserde.objectinspector.JSONObjectMapAdapter.get(JSONObjectMapAdapter.java:97)
at org.openx.data.jsonserde.objectinspector.JSONObjectMapAdapter.get(JSONObjectMapAdapter.java:97)

How to read logs formatted as JSON Array?

I've a log file in hdfs (/saggar/hdfs_out/devices.json) containing logs in the format as shown below:

[{"dev_id":"aa","reg_time":1372638753,"os_version":"7.1"},{"dev_id":"ab","reg_time":1372638753,"os_version":"7.1.0"}]
[{"dev_id":"ba","reg_time":1372638753,"os_version":"7.1"},{"dev_id":"bb","reg_time":1372638753,"os_version":"7.1.0"}]

i.e. a file in which each line is a JSON array of objects containing key/value pairs of string datatype.

Now if I specify ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' and load the above HDFS data into my table, any select query results in the following error:

Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: A JSONObject text must begin with '{' at 1 [character 2 line 1]

Does the code support only JSON objects and not JSON arrays? TIA

saggar
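
If preprocessing the files is an option, one workaround sketch (the function name is illustrative) is to explode each array line into one JSON object per line, which is the shape the SerDe expects:

```python
import json

def explode_array_lines(lines):
    """Turn each line holding a JSON array of objects into
    one JSON object per line."""
    for line in lines:
        for obj in json.loads(line):
            yield json.dumps(obj)

rows = ['[{"dev_id":"aa","os_version":"7.1"},{"dev_id":"ab","os_version":"7.1.0"}]']
for out in explode_array_lines(rows):
    print(out)
```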

Keywords as JSON properties

I am unable to figure out how to read "timestamp" from my JSON, as Hive fails with:

FAILED: ParseException line 3:2 mismatched input 'timestamp' expecting Identifier near ',' in column specification

Is there an "as ts" syntax?

Issue With Alter Table Not Registering New Data

Hi All,

I'm having some problems with alter table statements using the Serde.

After issuing the following command to my table:

alter table adx_event ADD columns (adx_id string, app_name string, app_id string);

The new columns are always NULL, even though the data that backs them is not.

I've reloaded the data into partitions, which doesn't seem to change matters. The only way around this is to completely recreate the table (not ideal!)

Any ideas?

EDIT: I wonder if it is related to #28; however, I'm getting NULLs due to applying the NULL-support patches from https://github.com/elreydetodo/Hive-JSON-Serde

Doug

Wrong datatype causes crash, ignore.malformed.json does not help

A value with the wrong datatype causes the generated MR job to crash; ignore.malformed.json does not seem to fix it.

Here is the sample data, mixed2.json

{"f1":"hello", "f2":7}
{"f1":"goodbye", "f2":8}
{"f1":"this", "f2":9}
{"f1":"that", "f2":"ten"}

Here is the sample Hive script, mixed2.hive. The first query (on f1) works. The other queries (on * and f2) crash. It would be nice to see NULL or something else. The get_json_object() function actually returns the bad string, so it prints "ten"!

drop table mixed2;

create table mixed2 (f1 string, f2 int)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties ("ignore.malformed.json" = "true")
stored as textfile;

load data inpath '/tmp/mixed2.json' overwrite into table mixed2;

select f1 from mixed2;

select f2 from mixed2;

select * from mixed2;
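
A pre-load scrub along these lines would produce the NULL behaviour the report asks for: null out any field whose JSON type does not match the Hive column type. This is a hypothetical workaround sketch, not part of the SerDe; the schema mapping and function name are illustrative.

```python
import json

SCHEMA = {"f1": str, "f2": int}  # mirrors `f1 string, f2 int`

def scrub(line):
    """Replace type-mismatched values with null so Hive sees NULL."""
    rec = json.loads(line)
    out = {}
    for col, typ in SCHEMA.items():
        v = rec.get(col)
        # bool is a subclass of int in Python; exclude it for int columns
        out[col] = v if isinstance(v, typ) and not isinstance(v, bool) else None
    return json.dumps(out)

print(scrub('{"f1":"that", "f2":"ten"}'))   # the bad row becomes {"f1": "that", "f2": null}
print(scrub('{"f1":"hello", "f2":7}'))      # good rows pass through unchanged
```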

Do you support alternate row delimiters?

Suppose a JSON object spans multiple rows in the input file; does your code support an alternate row delimiter, such as a vertical bar "|" or something else?

The best way to do this might be with the "SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)" clause.

Thanks very much,
Chuck

Seems to be installed correctly, but crashing on use

I create a table using this SerDe. That works fine. I load a data file with valid JSON on each line. That works fine. I type "select * from table1". That works fine and shows the parsed fields, indicating that the SerDe is being used correctly.

Then I type "select field2 from table1". The MR job starts, but crashes. Log files below. Any clue what is going on? I would really like to use this SerDe for a big JSON/Hive project.

Thanks very much,
Chuck

++++++++++++++++++

hive.log

2012-09-06 09:20:24,264 WARN parse.SemanticAnalyzer (SemanticAnalyzer.java:genBodyPlan(5821)) - Common Gby keys:null
2012-09-06 09:20:31,244 WARN parse.SemanticAnalyzer (SemanticAnalyzer.java:genBodyPlan(5821)) - Common Gby keys:null
2012-09-06 09:20:31,420 WARN mapred.JobClient (JobClient.java:copyAndConfigureFiles(660)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2012-09-06 09:20:34,627 WARN mapreduce.Counters (AbstractCounters.java:getGroup(224)) - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2012-09-06 09:20:56,751 WARN mapreduce.Counters (AbstractCounters.java:getGroup(224)) - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2012-09-06 09:20:56,755 WARN mapreduce.Counters (AbstractCounters.java:getGroup(224)) - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2012-09-06 09:20:56,756 ERROR exec.Task (SessionState.java:printError(380)) - Ended Job = job_201209041602_0009 with errors
2012-09-06 09:20:56,757 ERROR exec.Task (SessionState.java:printError(380)) - Error during job, obtaining debugging information...
2012-09-06 09:20:56,759 ERROR exec.Task (SessionState.java:printError(380)) - Examining task ID: task_201209041602_0009_m_000002 (and more) from job job_201209041602_0009
2012-09-06 09:20:56,760 ERROR exec.Task (SessionState.java:printError(380)) - null
2012-09-06 09:20:56,769 ERROR ql.Driver (SessionState.java:printError(380)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

+++++++++++++++++++++++++++++++

hive_job_log_guest_NNNN_NNNNN.txt

SessionStart SESSION_ID="guest_201209051439" TIME="1346870375744"
QueryStart QUERY_STRING="create table t2 (field1 string, field2 string, field3 string) row format serde 'org.openx.data.jsonserde.JsonSerDe' stored as textfile" QUERY_ID="guest_20120905144040_e91da01b-fc7a-4f10-b335-7d35351c62d0" TIME="1346870426783"
Counters plan="{"queryId":"guest_20120905144040_e91da01b-fc7a-4f10-b335-7d35351c62d0","queryType":null,"queryAttributes":{"queryString":"create table t2 (field1 string, field2 string, field3 string) row format serde 'org.openx.data.jsonserde.JsonSerDe' stored as textfile"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-0","stageType":"DDL","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-0_OTHER","taskType":"OTHER","taskAttributes":"null","taskCounters":"null","operatorGraph":"null","operatorList":"]","done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"true"}" TIME="1346870426790"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.DDLTask" TASK_ID="Stage-0" QUERY_ID="guest_20120905144040_e91da01b-fc7a-4f10-b335-7d35351c62d0" TIME="1346870426793"
Counters plan="{"queryId":"guest_20120905144040_e91da01b-fc7a-4f10-b335-7d35351c62d0","queryType":null,"queryAttributes":{"queryString":"create table t2 (field1 string, field2 string, field3 string) row format serde 'org.openx.data.jsonserde.JsonSerDe' stored as textfile"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-0","stageType":"DDL","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-0_OTHER","taskType":"OTHER","taskAttributes":"null","taskCounters":"null","operatorGraph":"null","operatorList":"]","done":"false","started":"true"}],"done":"false","started":"true"}],"done":"false","started":"true"}" TIME="1346870426795"
Counters plan="{"queryId":"guest_20120905144040_e91da01b-fc7a-4f10-b335-7d35351c62d0","queryType":null,"queryAttributes":{"queryString":"create table t2 (field1 string, field2 string, field3 string) row format serde 'org.openx.data.jsonserde.JsonSerDe' stored as textfile"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-0","stageType":"DDL","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-0_OTHER","taskType":"OTHER","taskAttributes":"null","taskCounters":"null","operatorGraph":"null","operatorList":"]","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"false","started":"true"}" TIME="1346870430239"
TaskEnd TASK_RET_CODE="0" TASK_NAME="org.apache.hadoop.hive.ql.exec.DDLTask" TASK_ID="Stage-0" QUERY_ID="guest_20120905144040_e91da01b-fc7a-4f10-b335-7d35351c62d0" TIME="1346870430239"
QueryEnd QUERY_STRING="create table t2 (field1 string, field2 string, field3 string) row format serde 'org.openx.data.jsonserde.JsonSerDe' stored as textfile" QUERY_ID="guest_20120905144040_e91da01b-fc7a-4f10-b335-7d35351c62d0" QUERY_RET_CODE="0" QUERY_NUM_TASKS="0" TIME="1346870430239"
Counters plan="{"queryId":"guest_20120905144040_e91da01b-fc7a-4f10-b335-7d35351c62d0","queryType":null,"queryAttributes":{"queryString":"create table t2 (field1 string, field2 string, field3 string) row format serde 'org.openx.data.jsonserde.JsonSerDe' stored as textfile"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-0","stageType":"DDL","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-0_OTHER","taskType":"OTHER","taskAttributes":"null","taskCounters":"null","operatorGraph":"null","operatorList":"]","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"true","started":"true"}" TIME="1346870430239"
QueryStart QUERY_STRING="load data inpath '/tmp/simple1.json' into table t2" QUERY_ID="guest_20120905144040_e8316ce6-a3c0-4a8a-a2d4-83b8797d9f99" TIME="1346870445811"
Counters plan="{"queryId":"guest_20120905144040_e8316ce6-a3c0-4a8a-a2d4-83b8797d9f99","queryType":null,"queryAttributes":{"queryString":"load data inpath '/tmp/simple1.json' into table t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-0","stageType":"MOVE","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-0_OTHER","taskType":"OTHER","taskAttributes":"null","taskCounters":"null","operatorGraph":"null","operatorList":"]","done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"true"}" TIME="1346870445811"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0" QUERY_ID="guest_20120905144040_e8316ce6-a3c0-4a8a-a2d4-83b8797d9f99" TIME="1346870445811"
Counters plan="{"queryId":"guest_20120905144040_e8316ce6-a3c0-4a8a-a2d4-83b8797d9f99","queryType":null,"queryAttributes":{"queryString":"load data inpath '/tmp/simple1.json' into table t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-0","stageType":"MOVE","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-0_OTHER","taskType":"OTHER","taskAttributes":"null","taskCounters":"null","operatorGraph":"null","operatorList":"]","done":"false","started":"true"}],"done":"false","started":"true"}],"done":"false","started":"true"}" TIME="1346870445812"
Counters plan="{"queryId":"guest_20120905144040_e8316ce6-a3c0-4a8a-a2d4-83b8797d9f99","queryType":null,"queryAttributes":{"queryString":"load data inpath '/tmp/simple1.json' into table t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-0","stageType":"MOVE","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-0_OTHER","taskType":"OTHER","taskAttributes":"null","taskCounters":"null","operatorGraph":"null","operatorList":"]","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"false","started":"true"}" TIME="1346870445989"
TaskEnd TASK_RET_CODE="0" TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0" QUERY_ID="guest_20120905144040_e8316ce6-a3c0-4a8a-a2d4-83b8797d9f99" TIME="1346870445989"
QueryEnd QUERY_STRING="load data inpath '/tmp/simple1.json' into table t2" QUERY_ID="guest_20120905144040_e8316ce6-a3c0-4a8a-a2d4-83b8797d9f99" QUERY_RET_CODE="0" QUERY_NUM_TASKS="0" TIME="1346870445990"
Counters plan="{"queryId":"guest_20120905144040_e8316ce6-a3c0-4a8a-a2d4-83b8797d9f99","queryType":null,"queryAttributes":{"queryString":"load data inpath '/tmp/simple1.json' into table t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-0","stageType":"MOVE","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-0_OTHER","taskType":"OTHER","taskAttributes":"null","taskCounters":"null","operatorGraph":"null","operatorList":"]","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"true","started":"true"}" TIME="1346870445990"
QueryStart QUERY_STRING="select * from t2" QUERY_ID="guest_20120905144040_6e652430-2650-435f-9585-0998386cfcf9" TIME="1346870455696"
Counters plan="{"queryId":"guest_20120905144040_6e652430-2650-435f-9585-0998386cfcf9","queryType":null,"queryAttributes":{"queryString":"select * from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":"]","done":"false","started":"true"}" TIME="1346870455696"
QueryEnd QUERY_STRING="select * from t2" QUERY_ID="guest_20120905144040_6e652430-2650-435f-9585-0998386cfcf9" QUERY_RET_CODE="0" QUERY_NUM_TASKS="0" TIME="1346870455696"
Counters plan="{"queryId":"guest_20120905144040_6e652430-2650-435f-9585-0998386cfcf9","queryType":null,"queryAttributes":{"queryString":"select * from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":"]","done":"true","started":"true"}" TIME="1346870455696"
QueryStart QUERY_STRING="select field1 from t2" QUERY_ID="guest_20120905144141_09d44635-a05b-48c8-98d2-a35e4d235891" TIME="1346870464100"
Counters plan="{"queryId":"guest_20120905144141_09d44635-a05b-48c8-98d2-a35e4d235891","queryType":null,"queryAttributes":{"queryString":"select field1 from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"true"}" TIME="1346870464105"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MapRedTask" TASK_ID="Stage-1" QUERY_ID="guest_20120905144141_09d44635-a05b-48c8-98d2-a35e4d235891" TIME="1346870464106"
Counters plan="{"queryId":"guest_20120905144141_09d44635-a05b-48c8-98d2-a35e4d235891","queryType":null,"queryAttributes":{"queryString":"select field1 from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"true"}],"done":"false","started":"true"}" TIME="1346870464110"
TaskProgress TASK_HADOOP_PROGRESS="2012-09-05 14:41:06,755 Stage-1 map = 0%, reduce = 0%" TASK_NUM_REDUCERS="0" TASK_NAME="org.apache.hadoop.hive.ql.exec.MapRedTask" TASK_NUM_MAPPERS="1" TASK_COUNTERS="Job Counters .Total time spent by all maps in occupied slots (ms):1607,Map-Reduce Framework.CPU time spent (ms):0,org.apache.hadoop.hive.ql.exec.Operator$ProgressCounter.CREATED_FILES:0" TASK_ID="Stage-1" QUERY_ID="guest_20120905144141_09d44635-a05b-48c8-98d2-a35e4d235891" TASK_HADOOP_ID="job_201209041602_0008" TIME="1346870466761"
Counters plan="{"queryId":"guest_20120905144141_09d44635-a05b-48c8-98d2-a35e4d235891","queryType":null,"queryAttributes":{"queryString":"select field1 from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"0","CNTR_NAME_Stage-1_MAP_PROGRESS":"0"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"false","started":"false"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"true"}],"done":"false","started":"true"}" TIME="1346870466762"
TaskProgress TASK_HADOOP_PROGRESS="2012-09-05 14:41:31,959 Stage-1 map = 100%, reduce = 100%" TASK_NUM_REDUCERS="0" TASK_NAME="org.apache.hadoop.hive.ql.exec.MapRedTask" TASK_NUM_MAPPERS="1" TASK_COUNTERS="Job Counters .Failed map tasks:1,Job Counters .Launched map tasks:4,Job Counters .Data-local map tasks:4,Job Counters .Total time spent by all maps in occupied slots (ms):22907,Job Counters .Total time spent by all reduces in occupied slots (ms):0,Job Counters .Total time spent by all maps waiting after reserving slots (ms):0,Job Counters .Total time spent by all reduces waiting after reserving slots (ms):0,Map-Reduce Framework.CPU time spent (ms):0,org.apache.hadoop.hive.ql.exec.Operator$ProgressCounter.CREATED_FILES:0" TASK_ID="Stage-1" QUERY_ID="guest_20120905144141_09d44635-a05b-48c8-98d2-a35e4d235891" TASK_HADOOP_ID="job_201209041602_0008" TIME="1346870491960"
Counters plan="{"queryId":"guest_20120905144141_09d44635-a05b-48c8-98d2-a35e4d235891","queryType":null,"queryAttributes":{"queryString":"select field1 from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"100","CNTR_NAME_Stage-1_MAP_PROGRESS":"100"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"true","started":"true"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"false","started":"true"}],"done":"false","started":"true"}" TIME="1346870491960"
Counters plan="{"queryId":"guest_20120905144141_09d44635-a05b-48c8-98d2-a35e4d235891","queryType":null,"queryAttributes":{"queryString":"select field1 from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"100","CNTR_NAME_Stage-1_MAP_PROGRESS":"100"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"true","started":"true"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"true","started":"true"}],"done":"false","started":"true"}" TIME="1346870491965"
Counters plan="{"queryId":"guest_20120905144141_09d44635-a05b-48c8-98d2-a35e4d235891","queryType":null,"queryAttributes":{"queryString":"select field1 from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"100","CNTR_NAME_Stage-1_MAP_PROGRESS":"100"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"true","started":"true"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"true","started":"true"}],"done":"false","started":"true"}" TIME="1346870491984"
QueryEnd QUERY_STRING="select field1 from t2" QUERY_ID="guest_20120905144141_09d44635-a05b-48c8-98d2-a35e4d235891" QUERY_NUM_TASKS="1" TIME="1346870491984"
QueryStart QUERY_STRING="select * from t2" QUERY_ID="guest_20120906092020_54687988-630d-40e0-99ca-e1ec97d64409" TIME="1346937624275"
Counters plan="{"queryId":"guest_20120906092020_54687988-630d-40e0-99ca-e1ec97d64409","queryType":null,"queryAttributes":{"queryString":"select * from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":"]","done":"false","started":"true"}" TIME="1346937624275"
QueryEnd QUERY_STRING="select * from t2" QUERY_ID="guest_20120906092020_54687988-630d-40e0-99ca-e1ec97d64409" QUERY_RET_CODE="0" QUERY_NUM_TASKS="0" TIME="1346937624276"
Counters plan="{"queryId":"guest_20120906092020_54687988-630d-40e0-99ca-e1ec97d64409","queryType":null,"queryAttributes":{"queryString":"select * from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":"]","done":"true","started":"true"}" TIME="1346937624276"
QueryStart QUERY_STRING="select field1 from t2" QUERY_ID="guest_20120906092020_3456aab7-c45d-4cea-b984-1489aff81748" TIME="1346937631256"
Counters plan="{"queryId":"guest_20120906092020_3456aab7-c45d-4cea-b984-1489aff81748","queryType":null,"queryAttributes":{"queryString":"select field1 from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"true"}" TIME="1346937631256"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MapRedTask" TASK_ID="Stage-1" QUERY_ID="guest_20120906092020_3456aab7-c45d-4cea-b984-1489aff81748" TIME="1346937631257"
Counters plan="{"queryId":"guest_20120906092020_3456aab7-c45d-4cea-b984-1489aff81748","queryType":null,"queryAttributes":{"queryString":"select field1 from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"true"}],"done":"false","started":"true"}" TIME="1346937631259"
TaskProgress TASK_HADOOP_PROGRESS="2012-09-06 09:20:34,628 Stage-1 map = 0%, reduce = 0%" TASK_NUM_REDUCERS="0" TASK_NAME="org.apache.hadoop.hive.ql.exec.MapRedTask" TASK_NUM_MAPPERS="1" TASK_COUNTERS="Job Counters .Launched map tasks:1,Job Counters .Data-local map tasks:1,Job Counters .Total time spent by all maps in occupied slots (ms):1477,Map-Reduce Framework.CPU time spent (ms):0,org.apache.hadoop.hive.ql.exec.Operator$ProgressCounter.CREATED_FILES:0" TASK_ID="Stage-1" QUERY_ID="guest_20120906092020_3456aab7-c45d-4cea-b984-1489aff81748" TASK_HADOOP_ID="job_201209041602_0009" TIME="1346937634628"
Counters plan="{"queryId":"guest_20120906092020_3456aab7-c45d-4cea-b984-1489aff81748","queryType":null,"queryAttributes":{"queryString":"select field1 from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"0","CNTR_NAME_Stage-1_MAP_PROGRESS":"0"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"false","started":"false"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"true"}],"done":"false","started":"true"}" TIME="1346937634629"
TaskProgress TASK_HADOOP_PROGRESS="2012-09-06 09:20:56,751 Stage-1 map = 100%, reduce = 100%" TASK_NUM_REDUCERS="0" TASK_NAME="org.apache.hadoop.hive.ql.exec.MapRedTask" TASK_NUM_MAPPERS="1" TASK_COUNTERS="Job Counters .Failed map tasks:1,Job Counters .Launched map tasks:4,Job Counters .Data-local map tasks:4,Job Counters .Total time spent by all maps in occupied slots (ms):22551,Job Counters .Total time spent by all reduces in occupied slots (ms):0,Job Counters .Total time spent by all maps waiting after reserving slots (ms):0,Job Counters .Total time spent by all reduces waiting after reserving slots (ms):0,Map-Reduce Framework.CPU time spent (ms):0,org.apache.hadoop.hive.ql.exec.Operator$ProgressCounter.CREATED_FILES:0" TASK_ID="Stage-1" QUERY_ID="guest_20120906092020_3456aab7-c45d-4cea-b984-1489aff81748" TASK_HADOOP_ID="job_201209041602_0009" TIME="1346937656752"
Counters plan="{"queryId":"guest_20120906092020_3456aab7-c45d-4cea-b984-1489aff81748","queryType":null,"queryAttributes":{"queryString":"select field1 from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"100","CNTR_NAME_Stage-1_MAP_PROGRESS":"100"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"true","started":"true"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"false","started":"true"}],"done":"false","started":"true"}" TIME="1346937656752"
Counters plan="{"queryId":"guest_20120906092020_3456aab7-c45d-4cea-b984-1489aff81748","queryType":null,"queryAttributes":{"queryString":"select field1 from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"100","CNTR_NAME_Stage-1_MAP_PROGRESS":"100"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"true","started":"true"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"true","started":"true"}],"done":"false","started":"true"}" TIME="1346937656756"
Counters plan="{"queryId":"guest_20120906092020_3456aab7-c45d-4cea-b984-1489aff81748","queryType":null,"queryAttributes":{"queryString":"select field1 from t2"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"100","CNTR_NAME_Stage-1_MAP_PROGRESS":"100"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"true","started":"true"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"true","started":"true"}],"done":"false","started":"true"}" TIME="1346937656769"
QueryEnd QUERY_STRING="select field1 from t2" QUERY_ID="guest_20120906092020_3456aab7-c45d-4cea-b984-1489aff81748" QUERY_NUM_TASKS="1" TIME="1346937656769"

Map JSON key with Hive table

Hi team,

Is there any way to map a JSON key to a Hive column with a different name?

e.g.
JSON string: {"col1" : "test" }
In my Hive table the column is named xcol1, so when I load the JSON string into Hive, is it possible for the col1 data to go into xcol1?

If yes, is there an example of this?
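For what it's worth, recent versions of this SerDe document a `mapping.*` SerDe property for exactly this purpose; a minimal sketch (assuming your build includes the mapping feature, and reusing the names from the question above):

```sql
CREATE EXTERNAL TABLE t (
    xcol1 string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("mapping.xcol1" = "col1");
-- a record like {"col1":"test"} should then populate column xcol1
```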

Unable to deserialize null in certain primitive columns

Hi,

When I create a table such as

CREATE EXTERNAL TABLE test (
    name string,
    cost int
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/test';

and add some test data such as { "name":"Roberto", "cost": null }, then when I query Hive using SELECT * FROM test; I get an error that JSONObject's Null cannot be cast to Java's Integer. Will send a pull request with a possible fix shortly.

String cannot be cast to Integer

basically, I'm getting the following error:
"Failed with exception java.io.IOException:java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer"

I defined a column type as int, but when I query the table, I always get the above error.
This is an example of what I have in the file:

{"src_port":"42603"}

thanks.
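The likely root cause here is that the JSON value is a quoted string ("42603"), not a number, so the SerDe hands Hive a String where an Integer is expected. One workaround sketch (table and column names assumed from the example above) is to declare the column as string and cast on read:

```sql
CREATE EXTERNAL TABLE ports (
    src_port string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

-- cast at query time instead of at deserialization time
SELECT cast(src_port AS int) FROM ports;
```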

Documentation - multi-type arrays?

How would one go about creating a table from objects which look like this:

{ "key_field" : "abc", "values" : [ 12345, "someValue1" ] }
{ "key_field" : "def", "values" : [ 12346, "someValue2" ] }

This doesn't work:

CREATE TABLE table (
     key_field string,
     values array<int, string>
);
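As context: Hive's `array<T>` takes exactly one element type, so `array<int, string>` is not valid DDL regardless of the SerDe. One possible workaround (a sketch, not verified against this SerDe) is to read every element as a string and cast positionally:

```sql
CREATE TABLE example (
     key_field string,
     `values` array<string>
);

-- index into the array and cast each position to its intended type
SELECT key_field,
       cast(`values`[0] AS int) AS id,
       `values`[1]              AS label
FROM example;
```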

postgresql JSON_ARRAY_ELEMENTS equivalent

This SerDe is great Roberto!

I've got a quick question: Is there an equivalent to the json_array_elements function in postgresql? If you're not familiar with the function, it explodes an array into its elements. There are a few examples here (as well as additional postgresql json functions):
http://www.postgresql.org/docs/9.3/static/functions-json.html

Example: if you had 1 record, stored in an array field as follows:
[[a, b], [c,d], e]
then json_array_elements would result in 3 records:
[a,b]
[c,d]
e
which could then be split into multiple columns, using this SerDe.
a | b
c | d
e |

This function is particularly useful for arrays that contain multiple columns, since it allows you to 'explode' first and parse columns later (as opposed to parsing columns and then exploding, which results in many unwanted combinations of values if you only want combinations where position is equal).

Any thoughts on this?
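Hive's closest built-in equivalent is `explode()` with a `LATERAL VIEW`, which turns each array element into its own row. A minimal sketch, assuming a table `example` with an array column `vals`:

```sql
-- one output row per element of vals
SELECT key_field, elem
FROM example
LATERAL VIEW explode(vals) exploded AS elem;
```

Nested elements such as [a, b] come out as a single array-valued cell per row, which can then be indexed into separate columns, much like the postgresql workflow described above.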

Require jar file for the Hive Json serde

Hi There,
I tried downloading the source for https://github.com/rcongiu/Hive-JSON-Serde and built a jar using Eclipse. However, I did not have much success using the resulting jar.
Hence, I retried by downloading Maven and running "mvn package", but that results in an error.

Could you please provide the jar file for this? I really need to try this SerDe, as it seems the best suited to my purpose of working with complex types.

Many thanks for looking into this.

Regards,
Roopa

Case-insensitive nested structures

I have a JSON structure that's similar to this:

{ "filename": "test.txt", "meta": { "datas": { "INFO": { "foo": "bar" } } } }

I am trying to load this into Hive using the following:

CREATE TABLE file (
  filename STRING,
  meta STRUCT<datas:STRUCT<info:MAP<string,string>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "ignore.malformed.json" = "true" )
STORED AS TEXTFILE

Note that in the JSON structure, the 'INFO' key is capitalized, whereas in the Hive structure definition it's lower-case. Hive column names and structure key names are case-insensitive.

I have found that if I try to import the original JSON, the map in the meta.datas.info field is null. However, if I change the original JSON such that 'INFO' is lowercased, I get {"foo":"bar"} as I'd expect.

I've "fixed" this with a rather brute-force application of toLowerCase() in JSONObject.java (see bradcavanagh@8ad988c for my commit). I haven't fully tested this, so I don't know if this is an appropriate patch, which is why I haven't submitted a pull request for it.

Project does not build against Hive 0.12.0-cdh5.0.0-beta-2 and later

Looks like this project does not build against Hive versions based on 0.12.0. I tried both 0.12.0-cdh5.0.0-beta-2 and 0.12.0-cdh5.0.0. It does build against 0.11.0-cdh5.0.0-beta-1. Here is the output of the mvn package command:

$ mvn -Dcdh.version=0.12.0-cdh5.0.0-beta-2 package
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.openx.data:json-serde:jar:1.1.9.2
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-compiler-plugin is missing. @ line 37, column 21
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building openx-json-serde 1.1.9.2
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ json-serde ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/resources
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ json-serde ---
[INFO] Compiling 31 source files to /Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/target/classes
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringLongObjectInspector.java:[29,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR]     constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
      (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
    constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
      (actual and formal argument lists differ in length)
/Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringTimestampObjectInspector.java:[29,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR]     constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
      (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
    constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
      (actual and formal argument lists differ in length)
/Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringFloatObjectInspector.java:[28,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR]     constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
      (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
    constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
      (actual and formal argument lists differ in length)
/Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringByteObjectInspector.java:[28,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR]     constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
      (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
    constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
      (actual and formal argument lists differ in length)
/Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringDoubleObjectInspector.java:[28,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR]     constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
      (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
    constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
      (actual and formal argument lists differ in length)
/Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/JsonStringJavaObjectInspector.java:[30,4] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR]     constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
      (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
    constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
      (actual and formal argument lists differ in length)
/Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringShortObjectInspector.java:[29,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR]     constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
      (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
    constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
      (actual and formal argument lists differ in length)
/Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringIntObjectInspector.java:[29,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[INFO] 8 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.491s
[INFO] Finished at: Wed Apr 16 15:00:52 PDT 2014
[INFO] Final Memory: 22M/310M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project json-serde: Compilation failure: Compilation failure:
[ERROR] /Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringLongObjectInspector.java:[29,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
[ERROR] (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
[ERROR] (actual and formal argument lists differ in length)
[ERROR] /Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringTimestampObjectInspector.java:[29,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
[ERROR] (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
[ERROR] (actual and formal argument lists differ in length)
[ERROR] /Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringFloatObjectInspector.java:[28,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
[ERROR] (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
[ERROR] (actual and formal argument lists differ in length)
[ERROR] /Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringByteObjectInspector.java:[28,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
[ERROR] (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
[ERROR] (actual and formal argument lists differ in length)
[ERROR] /Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringDoubleObjectInspector.java:[28,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
[ERROR] (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
[ERROR] (actual and formal argument lists differ in length)
[ERROR] /Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/JsonStringJavaObjectInspector.java:[30,4] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
[ERROR] (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
[ERROR] (actual and formal argument lists differ in length)
[ERROR] /Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringShortObjectInspector.java:[29,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector(PrimitiveTypeInfo) is not applicable
[ERROR] (actual argument PrimitiveTypeEntry cannot be converted to PrimitiveTypeInfo by method invocation conversion)
[ERROR] constructor AbstractPrimitiveJavaObjectInspector.AbstractPrimitiveJavaObjectInspector() is not applicable
[ERROR] (actual and formal argument lists differ in length)
[ERROR] /Users/ilyam/src/github/rcongiu/Hive-JSON-Serde/src/main/java/org/openx/data/jsonserde/objectinspector/primitive/JavaStringIntObjectInspector.java:[29,8] error: no suitable constructor found for AbstractPrimitiveJavaObjectInspector(PrimitiveTypeEntry)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

Case Issue for JSON Fields

If my input data contains column names in mixed case, the SerDe does not recognize those columns and I get NULL values when reading them.

For example, if my input is:
{"verb":"getusersessioninfo","object":{"bookID":4454},"published":"2012-02-18T09:38:43Z"}

My corresponding table is :
create external table books (
verb string,
object struct<bookID:string>,
published string
)
row format
serde 'org.openx.data.jsonserde.JsonSerDe'

If I try to read books.bookID, I get NULL.

However if I change my data to
{"verb":"getusersessioninfo","actor":"object":{"bookid":4454},"published":"2012-02-18T09:38:43Z"}
it works fine.

Hive is case-insensitive, so during deserialization the columns should be treated as lower case.
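A workaround outside the SerDe is to normalize keys to lower case before the data reaches Hive. A minimal sketch in Python (illustrative only, not the SerDe's actual code):

```python
import json

def lower_keys(value):
    """Recursively lower-case all object keys in a parsed JSON value."""
    if isinstance(value, dict):
        return {k.lower(): lower_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [lower_keys(v) for v in value]
    return value

record = json.loads('{"verb":"getusersessioninfo","object":{"bookID":4454}}')
normalized = lower_keys(record)
# normalized["object"] now exposes "bookid", matching Hive's lower-cased column names
```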

mvn compile fails: hive:hive-*:pom:0.6.0_trunk not found

$ mvn compile
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building openx-json-serde
[INFO]    task-segment: [compile]
[INFO] ------------------------------------------------------------------------
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/jaka/dev/Hive-JSON-Serde/src/main/resources
Downloading: http://repo1.maven.org/maven2/hive/hive-serde/0.6.0_trunk/hive-serde-0.6.0_trunk.pom
[INFO] Unable to find resource 'hive:hive-serde:pom:0.6.0_trunk' in repository central (http://repo1.maven.org/maven2)
Downloading: http://repo1.maven.org/maven2/hive/hive-exec/0.6.0_trunk/hive-exec-0.6.0_trunk.pom
[INFO] Unable to find resource 'hive:hive-exec:pom:0.6.0_trunk' in repository central (http://repo1.maven.org/maven2)
Downloading: http://repo1.maven.org/maven2/hive/hive-serde/0.6.0_trunk/hive-serde-0.6.0_trunk.jar
[INFO] Unable to find resource 'hive:hive-serde:jar:0.6.0_trunk' in repository central (http://repo1.maven.org/maven2)
Downloading: http://repo1.maven.org/maven2/hive/hive-exec/0.6.0_trunk/hive-exec-0.6.0_trunk.jar
[INFO] Unable to find resource 'hive:hive-exec:jar:0.6.0_trunk' in repository central (http://repo1.maven.org/maven2)
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.

Missing:
----------
1) hive:hive-serde:jar:0.6.0_trunk

  Try downloading the file manually from the project website.

  Then, install it using the command: 
      mvn install:install-file -DgroupId=hive -DartifactId=hive-serde -Dversion=0.6.0_trunk -Dpackaging=jar -Dfile=/path/to/file

  Alternatively, if you host your own repository you can deploy the file there: 
      mvn deploy:deploy-file -DgroupId=hive -DartifactId=hive-serde -Dversion=0.6.0_trunk -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

  Path to dependency: 
    1) org.openx.data:json-serde:jar:1.1
    2) hive:hive-serde:jar:0.6.0_trunk

2) hive:hive-exec:jar:0.6.0_trunk

  Try downloading the file manually from the project website.

  Then, install it using the command: 
      mvn install:install-file -DgroupId=hive -DartifactId=hive-exec -Dversion=0.6.0_trunk -Dpackaging=jar -Dfile=/path/to/file

  Alternatively, if you host your own repository you can deploy the file there: 
      mvn deploy:deploy-file -DgroupId=hive -DartifactId=hive-exec -Dversion=0.6.0_trunk -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

  Path to dependency: 
    1) org.openx.data:json-serde:jar:1.1
    2) hive:hive-exec:jar:0.6.0_trunk

----------
2 required artifacts are missing.

for artifact: 
  org.openx.data:json-serde:jar:1.1

from the specified remote repositories:
  central (http://repo1.maven.org/maven2)



[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2 seconds
[INFO] Finished at: Sun Jan 08 05:47:16 EST 2012
[INFO] Final Memory: 8M/19M
[INFO] ------------------------------------------------------------------------

ClassCastException String to Timestamp when doing select date field in Hive Query

For the following JSON,
{
"id":123,
"to_date":"2014-03-20 13:23:16"
}

CREATE EXTERNAL TABLE my_table (data_info struct<id:bigint, to_date:timestamp>)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
LOCATION
'/tmp/my_table';

Select data_info.to_date from my_table;

I am getting the following ClassCast Exception

java.lang.ClassCastException: java.lang.String cannot be cast to java.sql.Timestamp
at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaTimestampObjectInspector.getPrimitiveWritableObject(JavaTimestampObjectInspector.java:33)
at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:236)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:586)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:658)
... 9 more
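The SerDe is handing Hive a raw String where a java.sql.Timestamp is expected. To illustrate the conversion that has to happen somewhere (a Python sketch, not the SerDe's code), the JSON string parses cleanly with the `yyyy-MM-dd HH:mm:ss` pattern, the same format java.sql.Timestamp.valueOf() accepts:

```python
from datetime import datetime

# The JSON value is a plain string; something has to convert it before
# Hive can treat it as a timestamp.
raw = "2014-03-20 13:23:16"
ts = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
```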

Error with hive 0.13

org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector.&lt;init&gt;(Lorg/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils$PrimitiveTypeEntry;)V (RuntimeError)

The data load wrong after alter the table's column

Hi, I created an external partitioned table on Hive 0.11 using the JSON SerDe.
The data looks like this (the partition is the date):
{"player":{"play_count": 100, "play_from_daquan": 70, "play_from_home": 30}}
{"player":{"less_5_count": 235200, "less_30_count": 3040}}
{"player":{"play_count": 300, "play_from_daquan": 200, "play_from_home": 100}}
{"player":{"less_5_count": 235, "less_30_count": 300}}
{"shaking":{"play_count": 100, "play_from_daquan": 70, "play_from_home": 30}}
{"shaking":{"less_5_count": 235200, "less_30_count": 3040}}
{"shaking":{"play_count": 300, "play_from_daquan": 200, "play_from_home": 100}}
{"shaking":{"less_5_count": 235, "less_30_count": 300}}

At first I created the table with this schema:
(player struct<play_count:int,play_from_daquan:int,play_from_home:int,less_5_count:int>,
shaking struct<play_count:int,less_5_count:int,less_30_count:int,play_from_daquan:int>)

which shows
{"play_count":100,"play_from_daquan":70,"play_from_home":30,"less_5_count":null} NULL 20140327
{"play_count":null,"play_from_daquan":null,"play_from_home":null,"less_5_count":235200} NULL 20140327
{"play_count":300,"play_from_daquan":200,"play_from_home":100,"less_5_count":null} NULL 20140327
{"play_count":null,"play_from_daquan":null,"play_from_home":null,"less_5_count":235} NULL 20140327
NULL {"play_count":100,"less_5_count":null,"less_30_count":null,"play_from_daquan":70} 20140327
NULL {"play_count":null,"less_5_count":235200,"less_30_count":3040,"play_from_daquan":null} 20140327
NULL {"play_count":300,"less_5_count":null,"less_30_count":null,"play_from_daquan":200} 20140327
NULL {"play_count":null,"less_5_count":235,"less_30_count":300,"play_from_daquan":null} 20140327

this is the right result.

But when I alter a column of the table, for example:

alter table mobile CHANGE COLUMN player player struct<play_count:int,less_5_count:int,less_30_count:int,play_from_home:int,play_from_daquan:int>;
After I run select * from mobile; I get this:
{"play_count":100,"less_5_count":70,"less_30_count":30,"play_from_home":null,"play_from_daquan":null} NULL 20140327
{"play_count":null,"less_5_count":null,"less_30_count":null,"play_from_home":235200,"play_from_daquan":null} NULL 20140327
{"play_count":300,"less_5_count":200,"less_30_count":100,"play_from_home":null,"play_from_daquan":null} NULL 20140327
{"play_count":null,"less_5_count":null,"less_30_count":null,"play_from_home":235,"play_from_daquan":null} NULL 20140327
NULL {"play_count":100,"less_5_count":null,"less_30_count":null,"play_from_home":70} 20140327
NULL {"play_count":null,"less_5_count":235200,"less_30_count":3040,"play_from_home":null} 20140327
NULL {"play_count":300,"less_5_count":null,"less_30_count":null,"play_from_home":200} 20140327
NULL {"play_count":null,"less_5_count":235,"less_30_count":300,"play_from_home":null} 20140327

Apparently I get null in column play_from_daquan, and some columns like play_from_home show the wrong number, which should belong to less_5_count.

Must I DROP and CREATE the table if I want to alter its columns?
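The output suggests that after the ALTER, struct values are being matched to fields by position rather than by name. A minimal illustration of the difference (Python sketch; field names taken from the report above):

```python
old_fields = ["play_count", "play_from_daquan", "play_from_home", "less_5_count"]
new_fields = ["play_count", "less_5_count", "less_30_count", "play_from_home", "play_from_daquan"]

row = {"play_count": 100, "play_from_daquan": 70, "play_from_home": 30}

# Name-based matching (what the user expects): absent fields become None
by_name = {f: row.get(f) for f in new_fields}

# Positional matching (what the output looks like): old values land under new names
old_values = [row.get(f) for f in old_fields]
by_position = dict(zip(new_fields, old_values))
# by_position reproduces the reported row: less_5_count gets 70, less_30_count gets 30
```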

Nested JSON objects with different types

How do you create a table with different types under a nested structure? Let's say I have the following JSON object (returned by the Twitter API). The nested JSON object (user) has a boolean, a string and an integer.

{ "id_str" : "1732912097569888868",
"text" : "Like, grow up !",
"user" : {
"default_profile" : false,
"default_profile_image" : false,
"description" : "My profile description",
"favourites_count" : 1
}
}

This is how I created the table

CREATE EXTERNAL TABLE tweets (
id_str string,
text string,
user map<string, string>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://sample-tweets/test-data';

The following work:
select id_str, text from tweets;
select id_str, text, user['description'] from tweets;

The following doesn't work, for an obvious reason: contributors_enabled is a boolean type, and hence I get "Caused by: java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.String"
select user['contributors_enabled'] from tweets;

Is there a way to tell hive to treat all columns as Strings? Appreciate your input. Thanks.
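If the goal is a `map<string,string>` view of a mixed-type object, one pre-processing option (a Python sketch of a pipeline step, an assumption rather than a SerDe feature) is to stringify every value before loading:

```python
import json

def stringify_values(obj):
    """Render every value of a JSON object as a string, JSON-encoding non-strings."""
    return {k: v if isinstance(v, str) else json.dumps(v) for k, v in obj.items()}

user = {"default_profile": False, "description": "My profile description", "favourites_count": 1}
as_strings = stringify_values(user)
# Booleans and numbers become "false", "1", etc., so map<string,string> works uniformly
```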

integer cannot be cast to double

Hi, there's a bug when casting doubles that have nothing after the decimal point: it seems they're first being parsed as integers.

Example file test1:
{"price":3.55}

create external table test (price double) row format serde 'org.openx.data.jsonserde.JsonSerDe';

create external table test2 (price double) row format serde 'org.openx.data.jsonserde.JsonSerDe';

load file test1 into table test, then
insert overwrite test2 select * from test;

running "select * from test2" works just fine.

now if I run the same thing but with the following test file:
{"price":3.55}
{"price":5.0}

when I run "select * from test2" it throws the error:

Failed with exception java.io.IOException:java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Double

It seems like 5.0 is being converted to an int since it has no fractional part. If we do something hacky like
insert overwrite test2 select price + 0.0000001 from test;
it works, except all the values are off by 0.0000001, but you can choose an arbitrary precision.
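A cleaner defensive fix than adding an epsilon, sketched in Python (the SerDe itself is Java; this only illustrates the normalization), is to force the numeric field to floating point before it is re-serialized, so `5.0` can never degrade to an int:

```python
import json

def normalize_price(line):
    """Parse one JSON record and coerce the price field to float."""
    rec = json.loads(line)
    rec["price"] = float(rec["price"])
    return rec

rows = [normalize_price('{"price":3.55}'), normalize_price('{"price":5.0}')]
# Both rows now carry a float, regardless of the fractional part
```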

escape characters handling

I noticed the JSON SerDe translates (deserializes) escaped characters back to their original values; for example, \n is translated to a newline instead of being kept as the literal string "\n".

Is there an option to suppress this functionality and keep the string as is?

thank you
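To illustrate what the SerDe is doing (Python sketch): a conforming JSON parser is required to translate the two-character escape `\n` into a single newline, so keeping the literal backslash-n means re-escaping after parsing:

```python
import json

parsed = json.loads('{"msg":"line1\\nline2"}')["msg"]  # contains a real newline
re_escaped = parsed.replace("\n", "\\n")               # back to the literal two characters
```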

Carriage return and new line confuses Hive

Hi,
I'm having trouble with json looking like this:

  {"key":"Foo\rBar"}

This will be interpreted by the library as the character '\r', which is carriage return. This seems to be all according to the spec.

However, Hive will interpret the control character and my result set will get messed up.
Actually, selecting key from the above causes Hive to return two rows, looking something like this:

  hive> select key from table;
  ...
  Job 0: HDFS Write: xxxxxx SUCCESS
  Total MapReduce CPU Time Spent: x minutes
  OK
  Foo
  Bar
  Time taken: xxx.xxx seconds, Fetched: 2 row(s)
  hive>

If I use the default SerDe and read the whole JSON as a string, the SerDe returns the two characters '\' and 'r', and the result set is one row.

  hive> select * from raw_table;
  ...
  Job 0: HDFS Write: xxxxxx SUCCESS
  Total MapReduce CPU Time Spent: x minutes
  OK
  {"key":"Foo\rBar"}
  Time taken: xxx.xxx seconds, Fetched: 1 row(s)
  hive>

I've looked through the code and it would be no big thing to fix this to behave the same way as the default SerDe. However, this would slightly bend the rules of JSON.

I'll have a look and see how other implementations like Gson and Jackson do this.
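The row-splitting itself is easy to reproduce: Hive's text output treats raw carriage returns and newlines as record separators, so any control character that survives deserialization splits the row. A Python sketch of the observation and the re-escaping that would prevent it (illustrative only):

```python
import json

value = json.loads('{"key":"Foo\\rBar"}')["key"]  # value contains a real carriage return

# A line-oriented consumer (like Hive's text output) splits on the control char:
rows = value.splitlines()             # one record becomes two "rows"

# Escaping control characters before emitting keeps it a single row,
# which is effectively what the default SerDe's behavior gives you:
escaped = value.replace("\r", "\\r").replace("\n", "\\n")
```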

org.apache.hadoop.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable

Still have this issue with casting.

org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:191)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDoubleObjectInspector.get(WritableDoubleObjectInspector.java:35)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:454)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:226)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1061)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1113)
... 14 more

Mapping deeper keys

Is it possible to map a nested key into a simple column? I tried to parse something like

{"firstname": "xxx", "deep": {"lastname": "yyy"}}

with

CREATE EXTERNAL TABLE rawTest (
  firstname string,
  lastname string )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ("mapping.lastname" = "deep.lastname") 
LOCATION 's3n://blabla/'

but "lastname" comes back empty. Looking at the code, I didn't find any parsing of
the mapping value, so I'm guessing it doesn't work, right?
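For reference, the dotted-path lookup such a `mapping.*` property would need can be sketched as follows (Python; the SERDEPROPERTIES syntax above is the reporter's guess, and this is not the SerDe's implementation):

```python
import json

def get_path(obj, dotted):
    """Resolve a dotted path like 'deep.lastname' against a parsed JSON object."""
    for part in dotted.split("."):
        if not isinstance(obj, dict):
            return None
        obj = obj.get(part)
    return obj

rec = json.loads('{"firstname": "xxx", "deep": {"lastname": "yyy"}}')
lastname = get_path(rec, "deep.lastname")  # the nested value, surfaced as a flat column
```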

Int converted to Bigint when creating table

The SerDe seems to convert int to bigint when creating a Hive table. This causes a ClassCastException later when I try to insert an int. Is this a defect?

hive (default)> CREATE TABLE tempdatatable731
( year int)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' ;

OK
Time taken: 0.078 seconds

hive (default)> DESCRIBE tempdatatable731;

OK
col_name data_type comment
year bigint from deserializer

Time taken: 0.128 seconds, Fetched: 1 row(s)
hive (default)>

Null Ptr Exception in MR Job @ JsonStructObjectInspector.java via HCatalog

I have experienced an Exception in a MapReduce Job:
As can be seen from the call stack, HCatalog is used in the MR job:

java.lang.NullPointerException
at org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.getStructFieldsDataAsList(JsonStructObjectInspector.java:134)
at org.apache.hcatalog.data.HCatRecordSerDe.serializeStruct(HCatRecordSerDe.java:162)
at org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:194)
at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
at org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapT

The HIVE table create is:
CREATE TABLE IF NOT EXISTS twitter_feed1 (retweeted_status struct<contributors:string, text:string, geo:string, retweeted:string, in_reply_to_screen_name:string, truncated:string, entities:struct<urls:array, hashtags:array<struct<text:string, indices:array>>, user_mentions:array>, in_reply_to_status_id_str:string, id:bigint, in_reply_to_user_id_str:string, source:string, favorited:string, in_reply_to_status_id:string, created_at:string, retweet_count:int, in_reply_to_user_id:string, id_str:string, place:string, user:struct<location:string, default_profile:string, profile_background_tile:string, statuses_count:int, lang:string, profile_link_color:string, id:int, following:string, favourites_count:int, protected:string, profile_text_color:string, description:string, verified:string, contributors_enabled:string, name:string, profile_sidebar_border_color:string, profile_background_color:string, created_at:string, default_profile_image:string, followers_count:int, profile_image_url_https:string, geo_enabled:string, profile_background_image_url:string, profile_background_image_url_https:string, follow_request_sent:string, url:string, utc_offset:int, time_zone:string, notifications:string, profile_use_background_image:string, friends_count:int, profile_sidebar_fill_color:string, screen_name:string, id_str:string, show_all_inline_media:string, profile_image_url:string, listed_count:int, is_translator:string>, coordinates:string>, contributors string, text string, geo string, retweeted string, in_reply_to_screen_name string, truncated string, entities structurls:array<string, hashtags:array<struct<text:string, indices:array>>, user_mentions:array<struct<id:int, name:string, indices:array, screen_name:string, id_str:string>>>, in_reply_to_status_id_str string, id bigint, in_reply_to_user_id_str string, source string, favorited string, in_reply_to_status_id string, created_at string, retweet_count int, in_reply_to_user_id string, id_str string, place string, user 
struct<location:string, default_profile:string, profile_background_tile:string, statuses_count:int, lang:string, profile_link_color:string, id:int, following:string, favourites_count:int, protected:string, profile_text_color:string, description:string, verified:string, contributors_enabled:string, name:string, profile_sidebar_border_color:string, profile_background_color:string, created_at:string, default_profile_image:string, followers_count:int, profile_image_url_https:string, geo_enabled:string, profile_background_image_url:string, profile_background_image_url_https:string, follow_request_sent:string, url:string, utc_offset:string, time_zone:string, notifications:string, profile_use_background_image:string, friends_count:int, profile_sidebar_fill_color:string, screen_name:string, id_str:string, show_all_inline_media:string, profile_image_url:string, listed_count:int, is_translator:string>, coordinates string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS TEXTFILE;

Error parsing null json values when the data type is an array

Let me start by saying thanks for this SerDe! I've tried two others and this one has worked the best so far.

I'm having a problem creating a table parsing the "content" data using the definition below

create external table if not exists raw_urls (http_result string, content map<string,array>) ...

It works well until I eventually get a "Runtime Error while processing row {"http_result":"200","content":null}" which throws "Integer cannot be cast to org.openx.data.jsonserde.json.JSONArray"

Any help you can provide would be great!

java.lang.ClassCastException: org.json.JSONArray cannot be cast to [Ljava.lang.Object

describe table_name:
mn int from deserializer
sn string from deserializer
msn string from deserializer
iu string from deserializer
iv int from deserializer
vi string from deserializer
at string from deserializer
tg array from deserializer
un string from deserializer
wt string from deserializer
fui array from deserializer
de string from deserializer
id string from deserializer
email string from deserializer
ei string from deserializer
an int from deserializer
ci string from deserializer
ad string from deserializer
fn int from deserializer
bd string from deserializer
dr string from deserializer
sx string from deserializer
qq string from deserializer

select * from table_name

OK
Failed with exception java.io.IOException:java.lang.ClassCastException: org.json.JSONArray cannot be cast to [Ljava.lang.Object;

Cannot INSERT OVERWRITE a table defined with the SerDe when using Hive 0.8

First off, thanks for releasing this SerDe on GitHub. It's been very helpful!

I'm having a problem writing to a table defined with this SerDe when running Hive 0.8 with Hadoop 1.0.3 on Amazon ElasticMapReduce. If I define an EXTERNAL TABLE foobar and then run a query that attempts to INSERT OVERWRITE TABLE foobar, I generate an error that looks like this:

Error during job, obtaining debugging information...
Examining task ID: task_201208172225_1012_m_000009 (and more) from job job_201208172225_1012
Examining task ID: task_201208172225_1012_m_000004 (and more) from job job_201208172225_1012
Examining task ID: task_201208172225_1012_m_000001 (and more) from job job_201208172225_1012
Examining task ID: task_201208172225_1012_m_000002 (and more) from job job_201208172225_1012
Exception in thread "Thread-65" java.lang.RuntimeException: Error while reading from task log url
     at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
     at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
     at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
     at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://10.17.34.118:9103/tasklog?taskid=attempt_201208172225_1012_m_000005_2&start=-8193
     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
     at java.net.URL.openStream(URL.java:1010)
     at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
     ... 3 more
Counters:
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

The error makes it look like a tasktracker is not responding, but because I am able to write just fine to tables using Hive's default SerDe on the same cluster, I suspect that this has something to do with the SerDe rather than an actual tasktracker connectivity problem. Moreover, writing a table defined with the JSON SerDe works consistently in Hive 0.7 but fails consistently in Hive 0.8, so I think it has something to do with something not quite working right with the SerDe and Hive 0.8.

Did a SerDe interface change between Hive 0.7 and Hive 0.8?

Let me know if I can help in any way. Happy to provide as much information as you need.

treatment of null

JSONObject will parse {"field": null}, but instead of a null it uses a NULL object singleton that confuses the heck out of Hive.
Have the ObjectInspector handle that.
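The fix amounts to a sentinel-to-null translation: in org.json, the parser returns the `JSONObject.NULL` singleton rather than a Java `null`. Sketched with a stand-in sentinel in Python (illustrative only):

```python
class NullSentinel:
    """Stand-in for org.json's JSONObject.NULL singleton."""

NULL = NullSentinel()

def resolve(value):
    # What the ObjectInspector should do: map the parser's sentinel to a real null
    return None if value is NULL else value

fields = {"a": 1, "b": NULL}
cleaned = {k: resolve(v) for k, v in fields.items()}
```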

Class Cast Exceptions - need to avoid Hive Jobs Crashing due to minor data inconsistencies. - DECIMAL support

I have run into a couple of different exceptions:

  1. A string trying to be converted to BigDecimal
  2. A null value present in the imported JSON causes a Hive query to blow up.

Is there a way to make this more fault-tolerant, with some fallback action in the event of a ClassCastException occurring?

Details below:
java.lang.String cannot be cast to java.math.BigDecimal
at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaBigDecimalObjectInspector.getPrimitiveJavaObject(JavaBigDecimalObjectInspector.java:40)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:284)

[Error getting row data with exception java.lang.ClassCastException: org.openx.data.jsonserde.json.JSONObject$Null cannot be cast to java.lang.Long
at org.openx.data.jsonserde.objectinspector.primitive.JavaStringLongObjectInspector.get(JavaStringLongObjectInspector.java:49)
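A fault-tolerant coercion layer of the kind asked for above could look like this (Python sketch; the field values and the None fallback are assumptions, not SerDe features):

```python
from decimal import Decimal, InvalidOperation

def safe_decimal(value, default=None):
    """Coerce a JSON value to Decimal, falling back instead of raising."""
    if value is None:
        return default
    try:
        return Decimal(str(value))
    except InvalidOperation:
        return default

# A valid string, a null, and a malformed string all survive without an exception
amounts = [safe_decimal("3.55"), safe_decimal(None), safe_decimal("not-a-number")]
```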

Exception when parsing multi-byte twitter data

Good stuff here in this Serde, probably the best out there.

I think we're hitting the same or a similar problem parsing Twitter data. The problem seems prevalent when parsing tweets with multi-byte characters; e.g. lang="ja" is common in the failures we're seeing.

I'm still working on problem determination, though a careful reading of the exception says that "300 " is not being parsed as an int. The only place in the data where the characters 300 occur is at the end of the description field, and it's part of a multi-byte string that's not even being projected.

It would be great for us if the SerDe ignored JSON that causes parse exceptions rather than throwing the exception.

Many thanks in advance,
Douglas
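The tolerant behavior requested above (which the `ignore.malformed.json` property aims at) can be approximated when pre-validating data, sketched in Python:

```python
import json

def parse_or_none(line):
    """Mimic ignore.malformed.json: a bad record becomes a null row, not a crash."""
    try:
        return json.loads(line)
    except ValueError:
        return None

records = [parse_or_none(l) for l in ['{"lang":"ja"}', '{broken']]
# The malformed line yields None instead of aborting the whole job
```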

CODE:

CREATE EXTERNAL TABLE tweets_clean (
 created_at STRING,
 lang STRING,
 entities STRUCT<
   urls:ARRAY<STRING>,
   user_mentions:ARRAY<STRING>,
   hashtags:ARRAY<STRUCT<text:STRING>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "ignore.malformed.json" = "true")
LOCATION '/twitter-data/';

SELECT
 LOWER(hashtags.text) AS hashtag,
 unix_timestamp(created_at, "EEE MMM d HH:mm:ss Z yyyy")/900 AS timebucket,
 COUNT(*) AS total_count
FROM tweets_clean
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
WHERE hashtags is not null
AND created_at is not null
GROUP BY LOWER(hashtags.text), unix_timestamp(created_at, "EEE MMM d HH:mm:ss Z yyyy")/900
ORDER BY total_count DESC

LOG ENTRY:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"in_reply_to_user_id_str":null,"in_reply_to_status_id":null,"text":"\u30b3\u30a4\u30f3\u30e9\u30f3\u30c9\u30ea\u30fc\u3067\u3050\u308b\u3050\u308b\u56de\u308b\u6d17\u6fef\u7269\u3092\u3058\u3063\u3068\u898b\u3064\u3081\u308b\u304a\u3070\u3055\u3093\u306e\u76ee\u304c\u6016\u3044\u304f\u3089\u3044\u306b\u6b7b\u3093\u3067\u308b\u3002\u4f55\u304c\u3042\u3063\u305f\u3093\u3060\u3088\u304a\u3070\u3055\u3093\uff01","entities":{"urls":[],"hashtags":[],"user_mentions":[]},"id_str":"145726868841181186","place":null,"truncated":false,"contributors":null,"in_reply_to_user_id":null,"source":"\u003Ca href="http://twicca.r246.jp/" rel="nofollow"\u003Etwicca\u003C/a\u003E","created_at":"Sun Dec 11 04:49:27 +0000 2011","geo":null,"retweet_count":0,"favorited":false,"coordinates":null,"in_reply_to_screen_name":null,"in_reply_to_status_id_str":null,"user":{"profile_use_background_image":true,"lang":"ja","favourites_count":117,"profile_text_color":"3E4415","id_str":"183252066","profile_background_image_url":"http://a1.twimg.com/images/themes/theme5/bg.gif","screen_name":"aka_navratilova","statuses_count":4267,"profile_link_color":"D02B55","description":"work to eat,eat to work.\r\n\u30b5\u30f3\u30c7\u30fc\u30b7\u30f3\u30ac\u30fc\u30bd\u30f3\u30b0\u30e9\u30a4\u30bf\u30fc\u3067\u3059\u300

    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"in_reply_to_user_id_str":null,"in_reply_to_status_id":null,"text":"\u30b3\u30a4\u30f3\u30e9\u30f3\u30c9\u30ea\u30fc\u3067\u3050\u308b\u3050\u308b\u56de\u308b\u6d17\u6fef\u7269\u3092\u3058\u3063\u3068\u898b\u3064\u3081\u308b\u304a\u3070\u3055\u3093\u306e\u76ee\u304c\u6016\u3044\u304f\u3089\u3044\u306b\u6b7b\u3093\u3067\u308b\u3002\u4f55\u304c\u3042\u3063\u305f\u3093\u3060\u3088\u304a\u3070\u3055\u3093\uff01","entities":{"urls":[],"hashtags":[],"user_mentions":[]},"id_str":"145726868841181186","place":null,"truncated":false,"contributors":null,"in_reply_to_user_id":null,"source":"\u003Ca href="http://twicca.r246.jp/" rel="nofollow"\u003Etwicca\u003C/a\u003E","created_at":"Sun Dec 11 04:49:27 +0000 2011","geo":null,"retweet_count":0,"favorited":false,"coordinates":null,"in_reply_to_screen_name":null,"in_reply_to_status_id_str":null,"user":{"profile_use_background_image":true,"lang":"ja","favourites_count":117,"profile_text_color":"3E4415","id_str":"183252066","profile_background_image_url":"http://a1.twimg.com/images/themes/theme5/bg.gif","screen_name":"aka_navratilova","statuses_count":4267,"profile_link_color":"D02B55","description":"work to eat,eat to work.\r\n\u30b5\u30f3\u30c7\u30fc\u30b7\u30f3\u30ac\u30fc\u30bd\u30f3\u30b0\u30e9\u30a4\u30bf\u30fc\u3067\u3059\u300

    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
    ... 8 more
Caused by: java.lang.NumberFormatException: For input string: "300  "
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:458)
    at org.openx.data.jsonserde.json.JSONTokener.nextString(JSONTokener.java:279)
    at org.openx.data.jsonserde.json.JSONTokener.nextValue(JSONTokener.java:359)
    at org.openx.data.jsonserde.json.JSONObject.<init>(JSONObject.java:208)
    at org.openx.data.jsonserde.json.JSONTokener.nextValue(JSONTokener.java:362)
    at org.openx.data.jsonserde.json.JSONObject.<init>(JSONObject.java:208)
    at org.openx.data.jsonserde.json.JSONObject.<init>(JSONObject.java:310)
    at org.openx.data.jsonserde.JsonSerDe$1.<init>(JsonSerDe.java:150)
    at org.openx.data.jsonserde.JsonSerDe.deserialize(JsonSerDe.java:150)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
    ... 9 more

Does the SerDe support Hive FLOAT type?

I created two identical Hive tables, each with some float fields. One table is standard, without this SerDe; the other table uses this SerDe. I put modest floating-point values into both tables, such as 4444.55 and 33.888.

The regular table works fine when I do "select * from table-name". The SerDe table crashes on the same command with a Java error: "java.lang.Double cannot be cast to java.lang.Float".

Thank you,
Chuck
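Not a resolution from the maintainer, but a common workaround consistent with the error above: since the SerDe's JSON parser produces java.lang.Double values, declaring the column as DOUBLE avoids the Double-to-Float cast. A minimal sketch, with hypothetical table and column names:

```sql
-- Hedged sketch: DOUBLE matches the java.lang.Double values the JSON
-- parser produces, sidestepping the Double-to-Float ClassCastException.
CREATE TABLE prices_json (
  item  STRING,
  price DOUBLE   -- was FLOAT
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
```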

Where clause matches but result is null

Hi, I am using this project to see whether I can use it to query many large nested documents. So far I have only tried small documents, and I have a question.

Schema

create table fresspayload (
payload struct<id:string,jsonrpc:string,
params:struct<
device_info:struct<model:string>,
session:struct<
id:struct<type:string,value:string>
>
>
> ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS TEXTFILE;

Json document:

{
"payload": {
"jsonrpc": "2.0",
"id": "0cd0f2f8-31b6-4038-9864-1bcc01dee188",
"params": {
"device_info": {
"model": "HTC Desire",
"device_id": "xxxx"
},
"session": {
"id": {
"type": "msisdn",
"value": "xxxxx"
},
"type": "app"
},
"received_at": "2012-08-28T00:07:21.858Z"
},
"method": "connect"
},
"protocol_version": "1.1",
"route": {
"source": {
"id": {
"type": "msisdn",
"value": "xxxx"
},
"type": "app"
}
}
}

Once imported I run:

hive> select payload.params from fresspayload;
Total MapReduce CPU Time Spent: 2 seconds 70 msec
OK
{"device_info":{"model":"HTC Desire"},"session":null}
Time taken: 30.232 seconds

It seems like "session" is null

But when I write a Where clause with the session id it returns a hit, see below.
hive> select payload.params from fresspayload where payload.params.session.id.value == "xxxx";
Total MapReduce CPU Time Spent: 2 seconds 230 msec
OK
{"device_info":{"model":"HTC Desire"},"session":null}
Time taken: 30.117 seconds

But when I run with a nonexistent id I do not get any matches.
hive> select payload.params from fresspayload where payload.params.session.id.value = "non_existing";
MapReduce Total cumulative CPU time: 1 seconds 910 msec
Ended Job = job_201209041503_0096
MapReduce Jobs Launched:
Job 0: Map: 1 Accumulative CPU: 1.91 sec HDFS Read: 0 HDFS Write: 0 SUCESS
Total MapReduce CPU Time Spent: 1 seconds 910 msec
OK

Since the WHERE clause matches, something must be working here... do you have any input on this?

Kind regards /Johan Rask
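One hedged diagnostic step, not a confirmed fix: the table schema above declares session with only the id field, while the JSON's session object also carries a "type" field. It may be worth checking whether declaring every field present in the data changes the result. A sketch of just the session fragment (purely hypothetical as a cause):

```sql
-- Hedged diagnostic: declare session with all fields that appear in the
-- JSON, in case the partial struct definition is what yields NULL.
session:struct<
  id:struct<type:string,value:string>,
  type:string
>
```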

Issue when altering a table

hello,

I stumbled upon an issue when altering a table:
I have a partitioned table, and I decided to add a column:

alter table impressions add columns (ip STRING);

But thereafter I have an exception:

2013-07-05 13:19:28,270 ERROR CliDriver (SessionState.java:printError(386)) - Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating user_id
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating user_id
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating user_id
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:80)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:490)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
    ... 11 more
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.openx.data.jsonserde.json.JSONObject
    at org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.getStructFieldData(JsonStructObjectInspector.java:57)
    at org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldData(DelegatedStructObjectInspector.java:79)
    at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:98)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:76)
    ... 17 more

I discovered that in JsonStructObjectInspector::getStructFieldData, the data is actually an array of null objects....

I'm using hive-0.11.0 on hadoop 0.23.8.

Do you have any recommendation or clue of what happens ? I can easily test whatever you think relevant to track down this issue.
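Not a fix for the ClassCastException itself, but a hedged workaround that sidesteps ALTER TABLE entirely: if the table is EXTERNAL, dropping it leaves the data files in place, so it can be recreated with the extra column from the start. A sketch with a hypothetical column list and location:

```sql
-- Hedged workaround sketch: assumes an EXTERNAL table, so DROP does not
-- delete the underlying JSON files. Column list and paths are hypothetical.
DROP TABLE impressions;
CREATE EXTERNAL TABLE impressions (
  user_id STRING,
  ip      STRING            -- new column included at creation time
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/path/to/impressions';
-- Re-register existing partitions (requires dt=... directory naming):
MSCK REPAIR TABLE impressions;
```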

Hive w/ Json Serde returns zero records for partitioned tables

I wonder if anyone has ever tried rcongiu's Json Serde to define an external table on a partitioned Hive table.

If I define a non-partitioned external table setting the location to one of the partition folders, then it returns records as expected (for that single partition, of course).

However, if I set the parent folder for all partitions as the location for the partitioned external table, zero rows are returned.

By enabling DEBUG on the hive shell command line, I can see that the job reports a single path to be processed, which in turn results in zero map tasks.

If I define a partitioned table that sees the entire record as a single string column, then it fetches records from the partitions without a problem.

Any ideas?

Thanks,

Julius
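This looks like a general Hive behavior rather than a SerDe bug: for a partitioned table, pointing LOCATION at the parent folder is not enough, because each partition must also be registered in the metastore before Hive will scan it. A sketch, with hypothetical table name, partition column, and paths:

```sql
-- Register one partition explicitly:
ALTER TABLE my_json_table ADD PARTITION (dt='2013-07-05')
LOCATION '/data/json/dt=2013-07-05';

-- Or, if the directories follow the dt=... naming convention,
-- discover and register all partitions at once:
MSCK REPAIR TABLE my_json_table;
```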
