esri / geoprocessing-tools-for-hadoop

The Hadoop GP Toolbox provides tools to exchange features between a Geodatabase and Hadoop and run Hadoop workflow jobs.

License: Apache License 2.0

Python 100.00%

geoprocessing-tools-for-hadoop's Introduction

geoprocessing-tools-for-hadoop

The Geoprocessing Tools for Hadoop help integrate ArcGIS with Hadoop. The specific tools are listed under Features below.

See these tools in action as part of the samples in GIS Tools for Hadoop.

Features

  • Tools to convert between Feature Classes in a Geodatabase and JSON formatted files.
  • Tools that copy data files from ArcGIS to Hadoop, and copy files from Hadoop to ArcGIS.
  • Tools to run an Oozie workflow in Hadoop, and to check the status of a submitted workflow.
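
As a rough illustration of the JSON formats the conversion tools deal with, the sketch below turns an "enclosed" document (a single JSON object with a `features` array) into "unenclosed" records (one JSON object per feature, one per line). The layout is assumed from the Esri JSON feature format; the function name `enclosed_to_unenclosed` and the sample data are hypothetical and are not part of the toolbox.

```python
import json


def enclosed_to_unenclosed(enclosed_text):
    """Return one compact JSON record per feature, separated by newlines."""
    document = json.loads(enclosed_text)
    # Each feature becomes its own stand-alone JSON object.
    records = [json.dumps(feature, sort_keys=True)
               for feature in document.get("features", [])]
    return "\n".join(records)


# Hypothetical sample: two point features in an enclosed document.
sample = json.dumps({
    "geometryType": "esriGeometryPoint",
    "features": [
        {"attributes": {"name": "a"}, "geometry": {"x": -117.2, "y": 34.1}},
        {"attributes": {"name": "b"}, "geometry": {"x": -116.9, "y": 33.8}},
    ],
})
unenclosed = enclosed_to_unenclosed(sample)
```

Line-oriented records of this kind are convenient on the Hadoop side because a file can be split between features without parsing the whole document.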

Wiki and Tutorials

  • Wiki of available geoprocessing tools.
  • Tutorials on how to run the geoprocessing tools.

Instructions

  1. Download this repository as a .zip file and unzip to a suitable location or clone the repository with a git tool.
  2. The WebHDFS and Requests libraries in the tool folder are provided for convenience. If you know you will be using the libraries installed in your site-packages folder, remove the 'webhdfs' or 'requests' folders; otherwise leave them in place.
  3. In the ‘ArcToolbox’ pane of ArcGIS Desktop, use the ‘Add Toolbox…’ command to add the Hadoop Tools toolbox (the HadoopTools.pyt file you saved in step 1) to ArcGIS Desktop.
  4. Use the tools individually, or use them in models and scripts, such as the examples in: GIS Tools for Hadoop.

Requirements

  • ArcGIS 10.1 or later.
  • A Hadoop system with WebHDFS support.

Dependencies

Resources

Issues

Find a bug or want to request a new feature? Please let us know by submitting an issue.

Contributing

Esri welcomes contributions from anyone and everyone. Please see our guidelines for contributing.

Licensing

Copyright 2013-2019 Esri

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

A copy of the license is available in the repository's license.txt file.

geoprocessing-tools-for-hadoop's People

Contributors

azhigimont, climbage, erikhoel, mjoseph27, randallwhitman, smambrose

geoprocessing-tools-for-hadoop's Issues

Error using CopyFromHDFS

Hi all,

I'm having trouble importing a JSON table from HDFS into ArcMap 10.3 (English language package).

The table is successfully generated from the earthquake.csv file in the samples by:

CREATE TABLE agg_samp(point binary)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.UnenclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
INSERT OVERWRITE TABLE agg_samp SELECT ST_Point(latitude, longitude) FROM earthquakes_new;

and it looks like this:
[screenshot: table]

When importing it with this setup:
[screenshot: CopyFromHDFS settings]

I get the following error:
[screenshot: error]

I already checked Issue #22 but couldn't find a solution.

Can you, please, help me?

Thanks in advance.

FeaturesToJSON chokes on % in attributes

In shapefiles with text attributes that include the percent sign (%), JSONUtil.py fails for both Enclosed and Unenclosed Features to JSON options.

I'd submit a pull request if only I were better at Python.

Here are two examples and the resulting errors for both enclosed and unenclosed JSON:

Example 1

secure.firstenergycorp.com/servlet/com.firstenergycorp.webobjects.Engine;jsessionid=NILS2LQAABJWHLA1AAP1LPI?s=com.firstenergycorp.www.Home&o=43304514&q=3&p=%2FContact+Us

Enclosed output:

Executing: FeaturesToJSON iou_terr D:\GIS_DATA\platts\enc_float.json ENCLOSED_JSON FORMATTED
Start Time: Fri Mar 06 11:39:38 2015
Running script FeaturesToJSON...

Traceback (most recent call last):
  File "<string>", line 363, in execute
  File "D:\Code\geoprocessing-tools-for-hadoop\JSONUtil.py", line 368, in ConvertFC2JSON
    geometry_str = unicode(row[len(row) - 1]) if pjson != True else unicode(json.dumps(json.loads(row[len(row) - 1]), indent=4))
TypeError: a float is required

Failed to execute (FeaturesToJSON).
Failed at Fri Mar 06 11:39:39 2015 (Elapsed Time: 0.97 seconds)

Unenclosed output:

Executing: FeaturesToJSON iou_terr D:\GIS_DATA\platts\iou_unenc_some_bad.json UNENCLOSED_JSON FORMATTED
Start Time: Fri Mar 06 11:24:03 2015
Running script FeaturesToJSON...

Traceback (most recent call last):
  File "<string>", line 365, in execute
  File "D:\Code\geoprocessing-tools-for-hadoop\JSONUtil.py", line 422, in ConvertFC2JSONUnenclosed
    attributes_json.clear()
TypeError: a float is required

Failed to execute (FeaturesToJSON).
Failed at Fri Mar 06 11:24:04 2015 (Elapsed Time: 0.93 seconds)

Example 2

"www.heco.com/CDA/default/0,1999,TCID%253D8%2526CCID%253D0%2526LCID%253D0%2526CTYP%253DARTC,00.html"

Enclosed output:

Executing: FeaturesToJSON iou_terr D:\GIS_DATA\platts\enc_format_char.json ENCLOSED_JSON FORMATTED
Start Time: Fri Mar 06 11:40:48 2015
Running script FeaturesToJSON...

Traceback (most recent call last):
  File "<string>", line 363, in execute
  File "D:\Code\geoprocessing-tools-for-hadoop\JSONUtil.py", line 368, in ConvertFC2JSON
    geometry_str = unicode(row[len(row) - 1]) if pjson != True else unicode(json.dumps(json.loads(row[len(row) - 1]), indent=4))
ValueError: unsupported format character 'D' (0x44) at index 261

Failed to execute (FeaturesToJSON).
Failed at Fri Mar 06 11:40:52 2015 (Elapsed Time: 3.92 seconds)

Unenclosed output:

Executing: FeaturesToJSON iou_terr D:\GIS_DATA\platts\iou_unenc_some_bad.json UNENCLOSED_JSON FORMATTED
Start Time: Fri Mar 06 11:25:06 2015
Running script FeaturesToJSON...

Traceback (most recent call last):
  File "<string>", line 365, in execute
  File "D:\Code\geoprocessing-tools-for-hadoop\JSONUtil.py", line 422, in ConvertFC2JSONUnenclosed
    attributes_json.clear()
ValueError: unsupported format character 'D' (0x44) at index 261

Failed to execute (FeaturesToJSON).
Failed at Fri Mar 06 11:25:09 2015 (Elapsed Time: 3.88 seconds)

Workaround

I'm just going to null out those attribute values.
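
The two error messages above match what Python raises when text containing `%` is used as the left operand of `%` string formatting: `%2F` is read as a float conversion, and `%253D` as a conversion with width 253 and the unknown type `D`. Where in the tool chain this happens is not established here, so the following is only a minimal sketch of the failure mode and of one possible alternative to nulling the values, namely doubling each `%`. The helper names `escape_percent` and `try_format` are hypothetical.

```python
def escape_percent(text):
    """Double each '%' so the text is safe as the left operand of '%'."""
    return text.replace("%", "%%")


def try_format(template):
    """Apply '%' formatting with an empty mapping and report the outcome."""
    try:
        return "ok", template % {}
    except (TypeError, ValueError) as exc:
        return type(exc).__name__, str(exc)


# Shortened versions of the two attribute values from the examples above.
example_1 = "o=43304514&q=3&p=%2FContact+Us"
example_2 = "0,1999,TCID%253D8%2526CCID%253D0,00.html"

raw_1 = try_format(example_1)    # fails: '%2F' asks for a float argument
raw_2 = try_format(example_2)    # fails: '%253D' has the unknown type 'D'
safe_1 = try_format(escape_percent(example_1))  # text comes back unchanged
safe_2 = try_format(escape_percent(example_2))  # text comes back unchanged
```

Escaping keeps the attribute values intact, whereas nulling them loses data; which one is appropriate depends on where the formatting actually occurs.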

Communication via KNOX

A customer wants to use mainly KNOX to communicate with his Hadoop System.
As far as I can see, these GP tools can't use KNOX.
Am I right?
Are there plans to implement this?

Better Error Messages in the case of Network Issues

Users commonly run into network configuration issues, such as etc/hosts or firewall settings, when trying to use the Geoprocessing Tools for Hadoop. An obscure error message such as getaddrinfo failed gives the first impression that there are software bugs in the GP Tools. We should present more informative and helpful error messages that direct the user to investigate the network configuration, rather than giving up or filing a bug report before checking the network. Cross-references to the GIS-Tools-for-Hadoop issues that arose out of network issues:
Esri/gis-tools-for-hadoop#22
Esri/gis-tools-for-hadoop#16
Esri/gis-tools-for-hadoop#14
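
A minimal sketch of what such messages could look like, assuming the tools can catch the underlying socket error before reporting it. The function name `explain_network_error` and the message wording are hypothetical; the Windows error codes 10060 (connection timed out) and 10061 (connection refused) are standard Winsock values.

```python
import errno
import socket


def explain_network_error(exc, host, port):
    """Map a low-level socket error to a hint the user can act on."""
    target = "%s:%s" % (host, port)
    code = getattr(exc, "errno", None)
    # Name resolution failures surface as socket.gaierror ("getaddrinfo failed").
    if isinstance(exc, socket.gaierror):
        return ("Could not resolve host name '%s'. Check the spelling, DNS, "
                "and the hosts file on this machine." % host)
    if isinstance(exc, socket.timeout) or code in (errno.ETIMEDOUT, 10060):
        return ("Connection to %s timed out. Check firewalls and that the "
                "host is reachable from this machine." % target)
    if code in (errno.ECONNREFUSED, 10061):
        return ("Connection to %s was refused. Check the port number and "
                "that WebHDFS is enabled." % target)
    return "Unexpected network error talking to %s: %s" % (target, exc)


# Example: the kind of error behind a 'getaddrinfo failed' message.
hint = explain_network_error(
    socket.gaierror(11004, "getaddrinfo failed"), "namenode.example.com", 50070)
```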

Not able to retrieve DATE values in hive

Hi,
I am having a problem with NULL values in a date column when querying in Hive.

Procedure
Our aim is to transfer feature classes from a geodatabase to HDFS using the Hadoop toolbox from Esri. The steps are: create a JSON file with Features To JSON, create a table in Hive with a CREATE TABLE statement, copy the file to HDFS, and then query Hive to retrieve the field values.

Problems I am facing
I am having an issue with the date fields, as shown in the screenshots. When creating the table, I tried date as the data type in one CREATE statement and string as the data type in another.
With the date type, after copying to HDFS, the query returns no values and fails with: Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.DateWritable cannot be cast to org.apache.hadoop.io.Text.
With the string type, the query shows some random values as expected, but when I cast the column to date I get null values.

I have attached the two screenshots. Kindly help me with this issue, as I cannot proceed with migrating the feature classes to Hadoop.
[screenshot: date error]
[screenshot: date string bhuj]

Port Question - Copy from HDFS

Does the tool "copy from HDFS" communicate only via the namenode port, which is usually 50070?
Or can it use other ports like from datanodes or zookeeper?

Additional question: If the customer is not sure which port his namenode (HDFS TCP port number) is configured on, how can he find out which port to use?
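
One way to look up the port, sketched under assumptions: the namenode's web (WebHDFS) address is normally set by the `dfs.namenode.http-address` property in hdfs-site.xml (`dfs.http.address` in older releases), and defaults to port 50070 in Hadoop 2.x when the property is absent. The helper name `namenode_http_port` and the sample XML are hypothetical. Note also that with plain WebHDFS the namenode redirects file reads and writes to a datanode, so the datanode HTTP port generally has to be reachable from the client as well.

```python
import xml.etree.ElementTree as ET

# Property names checked in order; the second is the older spelling.
HTTP_ADDRESS_KEYS = ("dfs.namenode.http-address", "dfs.http.address")


def namenode_http_port(hdfs_site_xml, default=50070):
    """Return the namenode HTTP port configured in hdfs-site.xml text."""
    root = ET.fromstring(hdfs_site_xml)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name and value:
            props[name.strip()] = value.strip()
    for key in HTTP_ADDRESS_KEYS:
        # Values look like "host:port"; take the part after the last colon.
        _host, sep, port = props.get(key, "").rpartition(":")
        if sep and port.isdigit():
            return int(port)
    return default


# Hypothetical configuration fragment for illustration.
sample_xml = """
<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>namenode.example.com:50071</value>
  </property>
</configuration>
"""
port = namenode_http_port(sample_xml)
```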

Alias in toolbox is missing

Hey guys,

the alias is missing in the toolbox, and as a regular user you cannot enter anything in the properties, at least with my user account (full rights).

That's annoying, because when exporting the toolbox to a Python script, ArcMap wants an alias for the toolbox.

Thanks!

[screenshot: failure]

Support for Line & Polygon?

Does "JSON to Features" only support point features?
I always get the error "sequence size must match size of the row" when trying to convert a polygon.
I created the JSON with the "Features to JSON" tool from a very simple polygon feature class.
The JSON file is attached (just rename the .txt to .json).

Poly_WGS84_Enclosed_Formatted.txt

Publishing geoprocessing service

Hi,

I created a model and ran it as a tool. From the results, I tried to publish it to my server as a geoprocessing service so that I can consume it from Web AppBuilder, but it failed.
[screenshot: capture]

I have 10.3.1 server, 10.3.1 desktop and Hortonworks 2.3.2 VM running locally.

Is this GP service capability supported?

tools copy to HDFS Unexpected error : [Errno 10060]

I have set up Hadoop in pseudo-distributed mode, and it runs without problems. When I try to use the Copy To HDFS tool, the error occurs.
I deployed hdfs-site.xml with the WebHDFS properties set.
I imported the webhdfs and requests libraries.
[screenshot: qq 2017100418110]

how to use this tool ???

  1. Download geoprocessing-tools-for-hadoop-master.zip and unzip it.
  2. Add HadoopTools.pyt to the toolbox.

But all of the tools show a problem and cannot be used.

The Python error:

Traceback (most recent call last):
  File "<string>", line 205, in getParameterInfo
  File "d:\arcgis\arcgisinfo10.2\desktop10.2\arcpy\arcpy\arcobjects\mixins.py", line 286, in __init__
    setattr(self, attrib, attribvalue)
  File "d:\arcgis\arcgisinfo10.2\desktop10.2\arcpy\arcpy\arcobjects\_base.py", line 89, in _set
    return setattr(self._arc_object, attr_name, cval(val))
ValueError: ParameterObject: DataType \u5c5e\u6027\u7684\u8f93\u5165\u503c\u65e0\u6548

(The escaped text decodes to 属性的输入值无效, that is, the input value for the DataType property is invalid.)

esri hive JSON serde not found

Following along in spark at https://github.com/geoHeil/spatial-heatmaps/tree/master/esri
the JSON serde is not found
ClassNotFoundException: Class com.esri.hadoop.hive.serde.JsonSerde not found
even though:

"com.esri.hadoop" % "spatial-sdk-hive" % esriVersion,
 "com.esri.hadoop" % "spatial-sdk-json" % esriVersion,

(i.e. the current master branch, 2.1.0-SNAPSHOT) are on the class path. Am I missing a dependency? The base JSON serde would be available, but it is not called.

The issue here probably makes more sense than at Esri/gis-tools-for-hadoop#65

Oozie Workflow Generator Tool - documentation of Execute Workflow tool

Idea: complementary to #5, or in the interim, evaluate Oozie Workflow Generator Tool for mention in the documentation of the Execute Workflow GP tool.

Background:
The documentation for Execute Workflow mentions the need for workflow XML but does not contain, nor link to, information on how to get or make the workflow XML.
The tutorials cover the other four tools but not Execute Workflow.
The trip-discovery blog article mentions Execute Workflow only in passing, opting instead for command-line invocation.
There are examples with the point-in-polygon and trip-discovery source code - is this easy enough to find?
A job-execution tool based on HCatalog instead of Oozie is proposed for easier usage. Effort might better be invested there, unless there is something that can be done substantially faster with documentation of the Oozie-based Execute Workflow tool.

Brand new- which version of Linux will work best

Hello,
We are experimenting with Hadoop and ArcGIS. We have downloaded the Apache Hadoop software (1.2.1). We can install it on either a CentOS or an Ubuntu Linux box. Which will work better for us (classroom examples of ArcGIS/Hadoop), and which version of the OS? Thank you.

CopyFromHDFS result is empty json

Hi, I have a problem when using these tools.
I have already checked all of the requirements.

  1. The data is already in Hadoop:
    [screenshot]

  2. This is the table's description:
    [screenshot]

  3. Then I used the Geoprocessing Tools for Hadoop:
    [screenshot]

The tool ran successfully, but the resulting JSON is empty.

[screenshot: hdtool]

[screenshot]

Please give me some advice.
Thanks

X when adding Hadoop tools in 10.2.2

ArcMap 10.2.2, English
Windows 7

Error when adding the Hadoop toolbox:
[screenshot]

User's md5sum matches the working version on my computer (e68151f010f1c4908f18eabe91200e25 *HadoopTools.pyt)

Issue first reported on GeoNet
