hive-json-schema's People

Contributors

quux00

hive-json-schema's Issues

IllegalStateException on json with empty arrays

Hi @quux00, thank you for the tool, it's very handy!

So, I tried to generate a Hive schema from a JSON document containing an empty array, but it throws an exception:

Exception in thread "main" java.lang.IllegalStateException: Array is empty: []
    at net.thornydev.JsonHiveSchema.arrayJoin(JsonHiveSchema.java:136)
    at net.thornydev.JsonHiveSchema.toHiveSchema(JsonHiveSchema.java:129)
    at net.thornydev.JsonHiveSchema.valueToHiveSchema(JsonHiveSchema.java:179)
    at net.thornydev.JsonHiveSchema.createHiveSchema(JsonHiveSchema.java:103)
    at net.thornydev.JsonHiveSchema.main(JsonHiveSchema.java:66)

I read your recommendation to use a document with a single entry in each array. Does that mean the tool doesn't support union types yet? As far as I understand, Hive's support for these fields is incomplete and they can only be used in SELECT clauses.
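One way the empty-array case could be handled, instead of throwing, is to fall back to a default element type. The sketch below is hypothetical and is not the actual JsonHiveSchema code: the class name, the method names, and the choice of `string` as the fallback are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical helper, NOT part of net.thornydev.JsonHiveSchema.
final class ArrayTypeSketch {
    // Assumption: an empty array gives no type information, so default to string.
    static final String FALLBACK_ELEMENT_TYPE = "string";

    // Maps a parsed JSON array (as a Java list) to a Hive array type.
    static String arrayToHiveType(List<?> elements) {
        if (elements.isEmpty()) {
            return "array<" + FALLBACK_ELEMENT_TYPE + ">";
        }
        // Like the behaviour described in these issues, only the first element is inspected.
        return "array<" + scalarType(elements.get(0)) + ">";
    }

    // Minimal scalar mapping, just enough for the sketch.
    static String scalarType(Object value) {
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Integer) return "int";
        if (value instanceof Double) return "double";
        return "string";
    }
}
```

Whether `string` is an acceptable default depends on the data; the point is only that an empty array does not have to be fatal.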

Wrong classpath in README

The README contains execution examples such as:
java -cp target/json-hive-schema-1.0.jar net.thorndev.JsonHiveSchema file.json

This won't work, because the correct class name is net.thornydev.JsonHiveSchema.
A 'y' was likely omitted by a typo.

So the correct examples should look like:
java -cp target/json-hive-schema-1.0.jar net.thornydev.JsonHiveSchema file.json

Impala compatibility

Hi!

I'm trying to use the Hive table on Impala, but I can't find a way to make Impala understand the "ADD JAR" command.

Is it possible to use a Hive table created with this JSON serialization with Impala?

Regards,
André

Order of the columns is not the same as in input.json

I used the JSON below to generate the DDL.

{
  "business_id": "String",
  "name": "String",
  "neighborhood": "String",
  "address": "String",
  "city": "String",
  "state": "String",
  "postal_code": 12345,
  "latitude": 124124124124,
  "longitude": -111.936102,
  "stars": 4.5,
  "review_count": 17,
  "is_open": 0,
  "attributes": [{
    "BikeParking": true,
    "BusinessAcceptsBitcoin": false,
    "BusinessAcceptsCreditCards": false,
    "BusinessParking": {
      "street": false,
      "validated": false,
      "lot": true,
      "valet": false
    },
    "DogsAllowed": false,
    "RestaurantsPriceRange2": 2,
    "WheelchairAccessible": true
  }],
  "categories": [
    "Tobacco Shops",
    "Nightlife",
    "Vape Shops",
    "Shopping"
  ],
  "hours": [
    "Monday 11:0-21:0",
    "Tuesday 11:0-21:0",
    "Wednesday 11:0-21:0",
    "Thursday 11:0-21:0",
    "Friday 11:0-22:0",
    "Saturday 10:0-22:0",
    "Sunday 11:0-18:0"
  ],
  "type": "business"
}

It generated the following:

CREATE TABLE my_table_name (
address string,
attributes array<struct<bikeparking:boolean, businessacceptsbitcoin:boolean, businessacceptscreditcards:boolean, businessparking:struct<lot:boolean, street:boolean, valet:boolean, validated:boolean>, dogsallowed:boolean, restaurantspricerange2:int, wheelchairaccessible:boolean>>,
business_id string,
categories array,
city string,
hours array,
is_open int,
latitude int,
longitude double,
name string,
neighborhood string,
postal_code int,
review_count int,
stars double,
state string,
type string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

I want the columns to be generated in the same order as in the JSON, but in the output the columns are sorted alphabetically.
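The difference between sorted and insertion-ordered output can be illustrated with plain Java maps. This is a minimal sketch, not the tool's code: `ColumnOrderSketch` and `columns` are hypothetical names, and where exactly the tool sorts its keys has not been checked here.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration, NOT part of net.thornydev.JsonHiveSchema.
final class ColumnOrderSketch {
    // Renders "name type" pairs in whatever order the map iterates.
    static String columns(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner(", ");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            joiner.add(field.getKey() + " " + field.getValue());
        }
        return joiner.toString();
    }
}
```

Passing a `java.util.TreeMap` yields alphabetical columns, as in the DDL above, while a `java.util.LinkedHashMap` filled in document order keeps the order of the input. Preserving input order in the tool would additionally require a JSON parser that keeps object keys in document order, which is an assumption about the parser rather than something shown in this issue.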

Support multiple JSONs

I find your project interesting; however, I think it lacks a key feature: the ability to deduce a schema from multiple JSON documents, one per line. You would then compute the "greatest common denominator" of all of them.

This removes a layer of human intervention (putting all the keys in one document). For the implementation details, you can check out this project:

https://github.com/strelec/hive-serde-gen
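As a rough illustration of the idea, here is a minimal sketch that merges flat per-document schemas into one. It is not taken from hive-serde-gen or from this project; `SchemaMergeSketch` and `merge` are hypothetical names, the schema is simplified to a flat map of column name to Hive type, and widening conflicting types to `string` is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, NOT the hive-serde-gen or JsonHiveSchema implementation.
final class SchemaMergeSketch {
    // Merges flat schemas (column name -> Hive type) inferred from several documents.
    static Map<String, String> merge(List<Map<String, String>> perDocumentSchemas) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> schema : perDocumentSchemas) {
            for (Map.Entry<String, String> field : schema.entrySet()) {
                // Keep the type if all documents agree; otherwise widen to string (assumption).
                merged.merge(field.getKey(), field.getValue(),
                        (seen, next) -> seen.equals(next) ? seen : "string");
            }
        }
        return merged;
    }
}
```

A key that appears in only some documents still ends up in the merged schema, so no single document has to contain every key. Nested structs and arrays would need the same merge applied recursively, which this sketch leaves out.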

generating jars not working

Hi, I first posted this as a comment on an existing issue, but it did not seem relevant there, so I created a new one. I am trying to build the jar files, but the build gets stuck at the download phase, as shown below. Can you please help?

[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for net.thornydev:json-hive-schema:jar:1.0
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-compiler-plugin is missing. @ line 13, column 15
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO]
[INFO] -------------------< net.thornydev:json-hive-schema >-------------------
[INFO] Building json-hive-schema 1.0
[INFO] --------------------------------[ jar ]---------------------------------
Downloading from nexus: http://nexus.elsst.com/content/groups/public/org/apache/maven/plugins/maven-assembly-plugin/2.4/maven-assembly-plugin-2.4.jar
Progress (1): 25/226 kB

Hive creation failed

Hi,
This is a great tool for simplifying a JSON record into a Hive structure.
I have built the jar files and used them to generate the Hive DDL,
but the CREATE TABLE statement fails in Hive.
Can you please help resolve the issue?

sample data record:
{"country":"uk","state":"ny","city":"fr","street":"nyk","zip":"1009","data":[{"country_code":"uk","state_set":"ny","city_code":"fr","street_code":"nyk","zip_code":"1009"}]}

user@ubuntu:~/lab/programs$ java -jar json-hive-schema-1.0-jar-with-dependencies.jar /home/user/Documents/json_cntry_schema.json
CREATE TABLE x (
city string,
country string,
data array<struct<city_code:string, country_code:string, state_set:string, street_code:string, zip_code:string>>,
state string,
street string,
zip string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

hive> CREATE external TABLE jcountry (
> city string,
> country string,
> data array<struct<city_code:string, country_code:string, state_set:string, street_code:string, zip_code:string>>,
> state string,
> street string,
> zip string)
> ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
> LOCATION '/user/data1/jason/';
FAILED: Parse Error: line 4:2 mismatched input 'data' expecting Identifier near ',' in column specification

Thanks,
Pushpa

Only first item examined?

It seems as if only the first item in an array is examined?

If the two elements of "wobble" in the example are swapped, "details2" is not generated.

auto generated schema with reserved name: timestamp

I changed timestamp to time_stamp to work around the problem.

CREATE TABLE visits (
time_stamp string,
user_info struct<app_key:string, device_id:string, user_id:string>,
visit struct<end_time:string, event_type:string, id:string, is_confirmed:boolean, is_ongoing:boolean, place:struct<estimated_address:struct<cc:string, city:string, country:string, formatted_address:string, formatted_city:string, postal_code:string, state:string, street_address:string>, estimated_geolocation:struct<accuracy:double, lat:double, long:double>, first_visit_time:string, id:string, last_visit_time:string, type:string>, place_id:string, start_time:string>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

The attribute name is unique in entire json.

Hi, if an object or attribute name is duplicated in a different object, for example within an array or struct, the tool does not work.
Expected: names SHOULD be unique within an object, but not across the entire JSON.

keys with whitespace or punctuation

Hi, thanks for making this, it is saving me some trouble.

I did notice that when reading keys with whitespace, the output schema syntax is invalid.

Here's an example:

{"name": "sometext",
"stuff": {"white space":false},
"white space": "blah blah"
}

The resulting schema output will be invalid syntax due to the spaces:

CREATE TABLE x (
  name string,
  stuff struct<white space:boolean>,
  white space string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

Hive does support column names with whitespace (and, in fact, any Unicode character).

The simple solution is to enclose the column names in backticks.
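A minimal sketch of that quoting step is below. `IdentifierQuoteSketch`, `quote`, and `columnDef` are hypothetical names and not part of the tool; doubling an embedded backtick follows the rule I recall from Hive's quoted-identifier documentation and should be treated as an assumption to verify for your Hive version.

```java
// Hypothetical illustration, NOT part of net.thornydev.JsonHiveSchema.
final class IdentifierQuoteSketch {
    // Wraps a column name in backticks, doubling any backtick inside the name.
    static String quote(String columnName) {
        return "`" + columnName.replace("`", "``") + "`";
    }

    // Renders one top-level column definition line for a CREATE TABLE statement.
    static String columnDef(String columnName, String hiveType) {
        return "  " + quote(columnName) + " " + hiveType;
    }
}
```

With this, `columnDef("white space", "string")` produces a backtick-quoted column. The sketch only covers top-level column names; whether field names inside a `struct<...>` type can be quoted the same way is not addressed here.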

Use bigint for Long values

The generated table definition uses 'int' data types for values that are longs.
The 'curated' JSON document used Long.MAX_VALUE as the field value.

The scalarNumericType() method should probably attempt to convert the value to an int and, if that fails, return bigint instead of int.
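The suggestion could look roughly like the sketch below. The real signature of scalarNumericType() is not shown in this issue, so the `long` parameter, the class name, and the restriction to integral values are assumptions made for illustration.

```java
// Hypothetical illustration, NOT the actual method in net.thornydev.JsonHiveSchema.
final class NumericTypeSketch {
    // Returns "int" when the integral value fits in 32 bits, otherwise "bigint".
    static String scalarNumericType(long value) {
        try {
            Math.toIntExact(value); // throws ArithmeticException on int overflow
            return "int";
        } catch (ArithmeticException overflow) {
            return "bigint";
        }
    }
}
```

With this check, a value such as Long.MAX_VALUE maps to bigint while small values such as 17 stay int.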
