
openaddresses / pyesridump


Scrapes an ESRI MapServer REST endpoint to spit out more generally-usable geodata.

License: MIT License


pyesridump's Introduction

esri-dump

Scrapes an Esri REST endpoint and writes a GeoJSON file.

Installation

If you just want to use the command line tool esri2geojson, the recommended way to install this package is to create a virtual environment and install it there. This method does not require that you git clone this repository and can get you up and running quickly:

virtualenv esridump
source esridump/bin/activate
pip install esridump

Usage

Command line

This module will install a command line utility called esri2geojson that accepts an Esri REST layer endpoint URL and a filename to write the output GeoJSON to:

esri2geojson https://maps.six.nsw.gov.au/arcgis/rest/services/sixmaps/MaritimePublic/MapServer/13 maritime_maps.geojson

You can write to stdout by using the special output filename of - (a single dash character).

You can also pass in the --jsonlines option to write newline-separated (\n) lines of GeoJSON features, which you can then pipe into other applications.
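If you're consuming that newline-delimited stream from a Python script, each non-empty line is one GeoJSON Feature. A minimal reader sketch (this helper is illustrative, not part of the library):

```python
import json


def read_jsonlines(stream):
    """Yield one GeoJSON Feature dict per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)
```

For example, `read_jsonlines(sys.stdin)` would consume the output of `esri2geojson --jsonlines <url> -` piped into your script.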

Python module

You can also use this module from Python to iterate over GeoJSON Feature-shaped dicts:

import json
from esridump.dumper import EsriDumper

d = EsriDumper('http://example.com/arcgis/rest/services/Layer/MapServer/1')

# Iterate over each feature
for feature in d:
    print(json.dumps(feature))

d = EsriDumper('http://example.com/arcgis/rest/services/Layer/MapServer/2')

# Or get all features in one list
all_features = list(d)

Methodology

The module will do its best to find the most efficient method of retrieving data from the Esri server, given the capabilities of the server. There are several strategies we use to get the data, described here from most to least efficient:

resultOffset Pagination

In ArcGIS REST API version 10.3, Esri added support for pagination directly with the resultOffset and resultRecordCount parameters. Unfortunately, most servers don't support this feature because the backend SQL engine must also be configured to support it. So far, it seems that only the Esri-hosted layers support this feature reliably.
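Conceptually, this strategy just slices the layer's row count into offset windows. A hypothetical sketch of the idea (the real parameter assembly lives in esridump's dumper; function and argument names here are illustrative):

```python
def build_offset_pages(row_count, page_size):
    """Return one query-args dict per page, using resultOffset pagination."""
    return [
        {
            "resultOffset": offset,
            "resultRecordCount": page_size,
            "where": "1=1",
            "f": "json",
        }
        for offset in range(0, row_count, page_size)
    ]
```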

objectId Field Chunking

In ArcGIS REST API version 10.0, Esri added support for the server to return an exhaustive list of object IDs for all features in a layer. Once this list of object IDs is retrieved, we break it into chunks of maxRecordCount queries using the objectIds parameter.
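The chunking step itself is simple list slicing; a sketch under the same assumptions (illustrative helper, not the library's actual code):

```python
def chunk_object_ids(object_ids, max_record_count):
    """Split the full list of object IDs into maxRecordCount-sized query chunks."""
    ids = sorted(object_ids)
    return [
        ids[i:i + max_record_count]
        for i in range(0, len(ids), max_record_count)
    ]
```

Each chunk would then be sent as the `objectIds` parameter of one query.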

objectId Statistics where-clauses

In ArcGIS REST API version 10.1, Esri added support for performing various statistical queries on the server without requiring the client to download the whole dataset. On servers that support this and don't respond to the objectIds queries, we will use a minimum and maximum statistics query to find the minimum and maximum values for the objectId column, then build chunks of where-clauses that narrow the range down to objectIds between two fenceposts.
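The "fenceposts" can be sketched as where-clauses over successive windows of the objectId range. This is an illustration of the idea, not the library's exact SQL:

```python
def build_oid_where_clauses(oid_field, oid_min, oid_max, page_size):
    """Build where-clauses selecting successive windows of objectId values."""
    clauses = []
    for lower in range(oid_min - 1, oid_max, page_size):
        upper = min(lower + page_size, oid_max)
        clauses.append("{0} > {1} AND {0} <= {2}".format(oid_field, lower, upper))
    return clauses
```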

Geometry Quadtree Queries

When a server does not support any of these methods, we'll make recursive quad-tree queries using bounding envelopes. We start with a query for the layer's entire extent. If the server returns exactly the maxRecordCount number of features, we split that extent into 4 equal rectangles and query those. If those smaller queries return maxRecordCount features, we split the rectangle again and continue until the server returns something less than the maxRecordCount.
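The recursive splitting step can be pictured as follows; this sketch assumes Esri-style envelope dicts with `xmin`/`ymin`/`xmax`/`ymax` keys and is not the library's actual implementation:

```python
def split_envelope(envelope):
    """Split an Esri-style envelope dict into four equal quadrants (one quadtree step)."""
    xmid = (envelope["xmin"] + envelope["xmax"]) / 2.0
    ymid = (envelope["ymin"] + envelope["ymax"]) / 2.0
    return [
        {"xmin": envelope["xmin"], "ymin": envelope["ymin"], "xmax": xmid, "ymax": ymid},
        {"xmin": xmid, "ymin": envelope["ymin"], "xmax": envelope["xmax"], "ymax": ymid},
        {"xmin": envelope["xmin"], "ymin": ymid, "xmax": xmid, "ymax": envelope["ymax"]},
        {"xmin": xmid, "ymin": ymid, "xmax": envelope["xmax"], "ymax": envelope["ymax"]},
    ]
```

Any quadrant that still returns exactly maxRecordCount features would be split again with the same function.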

Development

To suggest changes or improvements to this code, create a fork on Github and clone your repository locally:

git clone [email protected]:openaddresses/pyesridump.git # replace with your fork
cd pyesridump

We use Pipenv to manage dependencies for development. Make sure you have Pipenv installed and then install the dependencies for development:

pipenv install --dev
pipenv shell

Your changes to the code will be reflected when you run the esri2geojson command from within the virtual environment. You can also run (and add) tests to check that your changes didn't break anything:

nosetests

See Also

This Python module was extracted from OpenAddresses machine, which was inspired by code from koop. A similar node/JavaScript module is available in esri-dump.

pyesridump's People

Contributors

ahmednoureldeen, albarrentine, andrewharvey, candrsn, dionysio, fgregg, gangerang, hancush, iandees, ingalls, migurski, minicodemonkey, mmorley0395, ramseraph, taxproper-bryan


pyesridump's Issues

Could not retrieve a section of features: Invalid or missing input parameters.

The zoning layer: https://mapservices.phoenix.gov/arcgis/rest/services/PDD/Planning_Permit/MapServer/9/

the log:

Mar 09 20:24:43 sonder-growth app/scheduler.4484:  2018-03-10 04:24:42,791 INFO [esridump] __iter__ - Source does not support feature count 
Mar 09 20:24:43 sonder-growth app/scheduler.4484: 2018-03-10 04:24:43,091 ERROR [esridump] __iter__ - Finding max/min from statistics failed. Trying OID enumeration. 
Mar 09 20:24:43 sonder-growth app/scheduler.4484: Traceback (most recent call last): 
Mar 09 20:24:43 sonder-growth app/scheduler.4484:   File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 312, in __iter__ 
Mar 09 20:24:43 sonder-growth app/scheduler.4484:     (oid_min, oid_max) = self._get_layer_min_max(oid_field_name) 
Mar 09 20:24:43 sonder-growth app/scheduler.4484:   File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 175, in _get_layer_min_max 
Mar 09 20:24:43 sonder-growth app/scheduler.4484:     metadata = self._handle_esri_errors(response, "Could not retrieve min/max oid values") 
Mar 09 20:24:43 sonder-growth app/scheduler.4484:   File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 98, in _handle_esri_errors 
Mar 09 20:24:43 sonder-growth app/scheduler.4484:     ', '.join(error['details']), 
Mar 09 20:24:43 sonder-growth app/scheduler.4484: esridump.errors.EsriDownloadError: Could not retrieve min/max oid values: Unable to complete operation.  
Mar 09 20:24:43 sonder-growth app/scheduler.4484: 2018-03-10 04:24:43,397 INFO [esridump] __iter__ - Falling back to geo queries 
Mar 09 20:24:44 sonder-growth app/scheduler.4484: 2018-03-10 04:24:43,705 INFO [zoning_phoenix] start - Finished processing 
Mar 09 20:24:44 sonder-growth app/scheduler.4484: 2018-03-10 04:24:43,706 ERROR [root] load - Exception occurred during the connector (<class 'arcgis.zoning_phoenix.connector.ConnectorZoningPhoenix'>) run 
Mar 09 20:24:44 sonder-growth app/scheduler.4484: Traceback (most recent call last): 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:   File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 344, in __iter__ 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:     oids = sorted(map(int, self._get_layer_oids())) 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:   File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 192, in _get_layer_oids 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:     oid_data = self._handle_esri_errors(response, "Could not retrieve object IDs") 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:   File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 98, in _handle_esri_errors 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:     ', '.join(error['details']), 
Mar 09 20:24:44 sonder-growth app/scheduler.4484: esridump.errors.EsriDownloadError: Could not retrieve object IDs: Invalid or missing input parameters.  
Mar 09 20:24:44 sonder-growth app/scheduler.4484: During handling of the above exception, another exception occurred: 
Mar 09 20:24:44 sonder-growth app/scheduler.4484: Traceback (most recent call last): 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:   File "load.py", line 47, in load 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:     result = getattr(connector, operation, lambda: None)() 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:   File "/app/src/connector_base.py", line 82, in start 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:     result = self.updated() 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:   File "/app/src/connector_base.py", line 111, in updated 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:     return self.add_items(self.client.updated()) 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:   File "/app/src/connector_base.py", line 128, in add_items 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:     for item in items: 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:   File "/app/arcgis/arcgis_client.py", line 56, in updated 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:     yield from self.__iter__(extra_query_args=extra_query_args) 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:   File "/app/arcgis/arcgis_client.py", line 19, in __iter__ 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:     for item in dumper.__iter__(): 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:   File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 371, in __iter__ 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:     for feature in self._scrape_an_envelope(bounds, self._outSR, page_size): 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:   File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 247, in _scrape_an_envelope 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:     features = self._fetch_bounded_features(envelope, outSR) 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:   File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 213, in _fetch_bounded_features 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:     features = self._handle_esri_errors(response, "Could not retrieve a section of features") 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:   File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 98, in _handle_esri_errors 
Mar 09 20:24:44 sonder-growth app/scheduler.4484:     ', '.join(error['details']), 
Mar 09 20:24:44 sonder-growth app/scheduler.4484: esridump.errors.EsriDownloadError: Could not retrieve a section of features: Invalid or missing input parameters.  

Can I provide some more debug info?

Unable to use services requiring an API Login

When a service presents an API login page, the esridump tool ceases to function.
There are cases where a service is accessed using legitimate credentials, but the esridump tool has no facility for authentication, such as username/password or a token.

Cannot download data from a map server served by Geocortex Essentials

Geocortex Essentials is a non-ESRI product that looks and acts like ArcMap/MapServer.

The City of Naperville, Illinois, uses it to power its Your Place (zoning) map.

The zoning layer is here: http://gis.naperville.il.us/Geocortex/Essentials/REST/sites/GisViewer/map/mapservices/0/layers/20

Here's the full command and error response:

$ esri2geojson http://gis.naperville.il.us/Geocortex/Essentials/REST/sites/GisViewer/map/mapservices/0/layers/20 naperville_zoning_122417.geojson
2017-12-24 18:31:14,265 - cli.esridump - INFO - Source does not support feature count
Traceback (most recent call last):
  File "/usr/local/bin/esri2geojson", line 11, in <module>
    sys.exit(main())
  File "/Library/Python/2.7/site-packages/esridump/cli.py", line 103, in main
    feature = next(feature_iter)
  File "/Library/Python/2.7/site-packages/esridump/dumper.py", line 304, in __iter__
    oid_field_name = self._find_oid_field_name(metadata)
  File "/Library/Python/2.7/site-packages/esridump/dumper.py", line 156, in _find_oid_field_name
    if f['type'] == 'esriFieldTypeOID':
KeyError: 'type'

Use case: Script didn't grab all features of this map layer

Industrial Growth Zones, hosted on the Cook County ArcGIS server

http://gis1.cookcountyil.gov/arcgis/rest/services/EconDevSrvc_IndustrialGrowthZones/MapServer/1

The script grabs only 5 out of the 7 features.

In the terminal the script responds:

 cli.esridump - INFO - Built 1 requests using OID where clause method

Do you know how I can get the other 2 features?

Source doesn't work with `--proxy`

This command fails:

(esridump) [step9581@SHess pyesridump (master)]$ esri2geojson --proxy=http://map.nca.by/proxy.php http://arcgisserver:8399/arcgis/rest/services/ADDRESS_NEW/MapServer/0 belarus.geojson
Traceback (most recent call last):
  File "/Users/step9581/git/openaddresses/pyesridump/esridump/bin/esri2geojson", line 11, in <module>
    sys.exit(main())
  File "/Users/step9581/git/openaddresses/pyesridump/esridump/lib/python3.6/site-packages/esridump/cli.py", line 103, in main
    feature = next(feature_iter)
  File "/Users/step9581/git/openaddresses/pyesridump/esridump/lib/python3.6/site-packages/esridump/dumper.py", line 263, in __iter__
    metadata = self.get_metadata()
  File "/Users/step9581/git/openaddresses/pyesridump/esridump/lib/python3.6/site-packages/esridump/dumper.py", line 133, in get_metadata
    response = self._request('GET', url, params=query_args, headers=headers)
  File "/Users/step9581/git/openaddresses/pyesridump/esridump/lib/python3.6/site-packages/esridump/dumper.py", line 37, in _request
    url += '?' + urllib.urlencode(kwargs.get('params'))
AttributeError: module 'urllib' has no attribute 'urlencode'

About EsriDump and the future

Hi, I have some questions about how the project is focused.

The project actually has two parts: first, the scraper; second, transforming the recovered data into GeoJSON.

The project can be used with or without the GeoJSON format: as written in the readme, we can use esri2geojson to get the data in GeoJSON format, or use the project as a Python module to work with the raw data.

The scraper in this project is great. I use it as a module and find it very useful; the more and better data I have, the more I can do in other projects, which I think is part of the point of having a module.

I want to know where the project is going, because the scraper can be improved: parallel processing, retrieving more of the data available on the map server, changing somewhat how the SR is handled in order to use WKT2, supporting more output types than GeoJSON, recovering the edit times of maps, and so on.

Where the project is headed matters a lot here. For example, if the project will only focus on GeoJSON data, there are many things that will never get support, or may even be removed from the project, and the Python module would become almost useless. Note that this would not be a problem in itself; it would simply define where the project goes, nothing more.

Based on the project's direction, I want to know whether this project has, or could have, what I need. That determines whether I (and not only I) can spend time contributing here; knowing this, we can decide what to do.

Thanks.

pause_seconds and requests_to_pause as CLI params?

Hi! Is there any particular reason why the init parameters pause_seconds and requests_to_pause on EsriDumper aren't available as CLI parameters? I know they are there so that the use of the tool doesn't overwhelm the server, but I think it would be useful to allow the user to tweak them when using the CLI.
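For context, those two parameters describe a simple counting throttle: pause for `pause_seconds` after every `requests_to_pause` requests. A sketch of that idea (an illustration only, not the library's implementation):

```python
import time


class RequestThrottle:
    """Sleep for pause_seconds after every requests_to_pause requests."""

    def __init__(self, requests_to_pause=5, pause_seconds=10):
        self.requests_to_pause = requests_to_pause
        self.pause_seconds = pause_seconds
        self._count = 0

    def tick(self):
        """Call once per outgoing request; returns True when a pause was taken."""
        self._count += 1
        if self._count >= self.requests_to_pause:
            self._count = 0
            time.sleep(self.pause_seconds)
            return True
        return False
```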

Give a better error when user enters a URL that isn't supported

Running pyesridump on this point:

http://apps.fs.fed.us/arcx/rest/services/EDW/EDW_InventoriedRoadlessAreas2001IdCo_01/MapServer

getting this back in terminal:

j.albertbowden@sunlight_desktop:~/desktop/pyesridump-master $ esri2geojson http://apps.fs.fed.us/arcx/rest/services/EDW/EDW_InventoriedRoadlessAreas2001IdCo_01/MapServer edw-inventoriedroadlessarea2001idco01  
2016-09-23 09:06:54,652 - cli.esridump - INFO - Source does not support feature count  
Traceback (most recent call last):  
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/esri2geojson", line 11, in <module>  
    load_entry_point('esridump==1.4.0', 'console_scripts', 'esri2geojson')()  
  File "build/bdist.macosx-10.6-intel/egg/esridump/cli.py", line 100, in main  
  File "build/bdist.macosx-10.6-intel/egg/esridump/dumper.py", line 244, in __iter__  
  File "build/bdist.macosx-10.6-intel/egg/esridump/dumper.py", line 129, in _find_oid_field_name  
KeyError: 'fields'

In the README.md I see that it doesn't map 100%, so I'm assuming this data is of really bad quality, but I'm not sure and am stumped as to how to even check.
But more importantly, I'm stumped as to which direction to move in towards resolving this, if possible.
I did see esri-dump while trying to debug this, and debated using it...but I didn't want to bail on python.
tl;dr: it really seems to me that this is just old, bad data.
Any help is greatly appreciated!

Outputs `NaN,NaN` for invalid geometry centroids

While working on Orange County, NY, I tried dumping with pyesridump which worked but output invalid coordinates for geometries with a NaN,NaN centroid:

{
  "geometry": {
    "type": "Point",
    "coordinates": [
      NaN,
      NaN
    ]
  },
  "type": "Feature",
  "properties": {
    "OBJECTID": 147905,
    "CityStateZip": "MIDDLETOWN NY 10940",
    "UnitType": "",
    "StreetAddress": "88 DUNNING RD ",
    "UnitNumber": "",
    "SHAPE.fid": 146604
  }
}

I was unable to load the resulting geojson into QGIS. It would be nice if pyesridump didn't output lines that have invalid coordinates.
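As a workaround, features with NaN coordinates can be filtered out after dumping. A hedged sketch (this helper and its name are illustrative, not library code):

```python
import math


def has_valid_coordinates(feature):
    """Return True unless some coordinate in the feature's geometry is NaN."""
    def all_finite(coords):
        # Coordinates nest arbitrarily (Point, LineString, Polygon, Multi*).
        if isinstance(coords, (int, float)):
            return not math.isnan(coords)
        return all(all_finite(c) for c in coords)

    geometry = feature.get("geometry") or {}
    coordinates = geometry.get("coordinates")
    return coordinates is not None and all_finite(coordinates)
```

Usage might look like `features = [f for f in dumper if has_valid_coordinates(f)]`.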

Issue with version when install with pip install esridump

When I installed this package using the command pip install esridump as in the readme, I realized it was installing an older version: I wanted to use the max_page_size param, but was getting an ERROR: EsriDumper.__init__() got an unexpected keyword argument 'max_page_size' even though I could see this param was available in the latest version.

I had to uninstall the package, then install using the command:

pip install --upgrade https://github.com/openaddresses/pyesridump/tarball/master

to get the latest version. Just a note for anyone who encounters some issues.

option to force windowed id method instead of pagination

There are a bunch of issues here relating to server timeouts etc. From testing, I've found that using a windowed WHERE clause on an ID is faster, and likely less intensive on the server (so less likely to time out), than paginating with WHERE 1=1.

I noticed the code at

if row_count is not None and (metadata.get('supportsPagination') or \
        (metadata.get('advancedQueryCapabilities') and metadata['advancedQueryCapabilities']['supportsPagination'])):
    # If the layer supports pagination, we can use resultOffset/resultRecordCount to paginate
    # There's a bug where some servers won't handle these queries in combination with a list of
    # fields specified. We'll make a single, 1 row query here to check if the server supports this
    # and switch to querying for all fields if specifying the fields fails.
    if query_fields and not self.can_handle_pagination(query_fields):
        self._logger.info("Source does not support pagination with fields specified, so querying for all fields.")
        query_fields = None
    for offset in range(self._startWith, row_count, page_size):
        query_args = self._build_query_args({
            'resultOffset': offset,
            'resultRecordCount': page_size,
            'where': '1=1',
            'geometryPrecision': self._precision,
            'returnGeometry': self._request_geometry,
            'outSR': self._outSR,
            'outFields': ','.join(query_fields or ['*']),
            'f': 'json',
        })
        page_args.append(query_args)
    self._logger.info("Built %s requests using resultOffset method", len(page_args))
else:
    # If not, we can still use the `where` argument to paginate

If the server supports pagination, it will use that instead of windowed ID. We should expose an option to allow the user to force windowed ID even if pagination is supported.

Source does not support feature count

Hi! I'm not sure if this is an error or just a problem the code can't yet surmount. I tried the following:

(esridump) holtdwyer@CuChulainn ~/Documents/res/teevrat/get_esri % esri2geojson http://gis-gfw.wri.org/arcgis/rest/services/commodities/MapServer/12 indonesia_landcover.geojson
2021-07-13 19:38:36,369 - cli.esridump - INFO - Source does not support feature count
Traceback (most recent call last):
  File "/Users/holtdwyer/Documents/res/teevrat/get_esri/esridump/bin/esri2geojson", line 8, in <module>
    sys.exit(main())
  File "/Users/holtdwyer/Documents/res/teevrat/get_esri/esridump/lib/python3.9/site-packages/esridump/cli.py", line 114, in main
    feature = next(feature_iter)
  File "/Users/holtdwyer/Documents/res/teevrat/get_esri/esridump/lib/python3.9/site-packages/esridump/dumper.py", line 343, in __iter__
    oid_field_name = self._find_oid_field_name(metadata)
  File "/Users/holtdwyer/Documents/res/teevrat/get_esri/esridump/lib/python3.9/site-packages/esridump/dumper.py", line 167, in _find_oid_field_name
    for f in metadata['fields']:
TypeError: 'NoneType' object is not iterable

Apparently, the server is returning something that isn't iterable when asked for the feature list. Is there any way to get the pull to work regardless?

Keep SR from the server

Hi, checking:

https://github.com/openaddresses/pyesridump/blob/master/esridump/dumper.py

It's great that we can change the SR, but I think the default, or at least an option, should be to keep the SR from the server.

Why? Any transformation, even one done on the server, has a "price", so being able to keep the original data and transform it only when we want or need to is valuable. There are cases like WKT2, where a new way to specify the CRS is out and a better transformation may become available; there the original data is even more precious.

Thanks.

Can't download this arcgis map - retry pause

Hi, I was trying to get a map, but it always ends in a retry pause: 0, 0, 1, 2, 3...

Layer: https://services3.arcgis.com/Dhl01RVOOnbjTdY7/ArcGIS/rest/services/COBERTURAS_ADJUDICADAS_5G/FeatureServer/11

The layer is working, we can access and use it from: https://www.arcgis.com/home/webmap/viewer.html?url=https://services3.arcgis.com/Dhl01RVOOnbjTdY7/ArcGIS/rest/services/COBERTURAS_ADJUDICADAS_5G/FeatureServer&source=sd

JSON is working; we can get data with the following arguments:

https://services3.arcgis.com/Dhl01RVOOnbjTdY7/ArcGIS/rest/services/COBERTURAS_ADJUDICADAS_5G/FeatureServer/11/query?f=json&returnGeometry=true&spatialRel=esriSpatialRelIntersects&geometry={"xmin":-8140237.764258992,"ymin":-4461476.466945019,"xmax":-8061966.24729499,"ymax":-4383204.949981019,"spatialReference":{"wkid":102100,"latestWkid":3857}}&geometryType=esriGeometryEnvelope&inSR=102100&outFields=*&returnCentroid=false&returnExceededLimitFeatures=false&maxRecordCountFactor=3&outSR=102100&resultType=tile&quantizationParameters={"mode":"view","originPosition":"upperLeft","tolerance":152.87405657031263,"extent":{"xmin":-8140237.764258992,"ymin":-4461476.466945019,"xmax":-8061966.247294992,"ymax":-4383204.949981019,"spatialReference":{"wkid":102100,"latestWkid":3857}}}

No idea why it ends like this. I tried checking the debug info, a higher timeout, and --paginate-oid, and it always ends the same way.

Thx.

Error when downloading layer 'Could not retrieve this chunk of objects: Failed to execute query. '

I have tried to download the layer from the following link, but I always get an error that I could not identify:

http://www.secretariadeambiente.gov.co/arcgis/rest/services/MapasVisorGeo/Cal_Aire_Geo/MapServer/68

[samtux@ultrabook tmp]$ esri2geojson http://www.secretariadeambiente.gov.co/arcgis/rest/services/MapasVisorGeo/Cal_Aire_Geo/MapServer/68 prueba.geojson
2017-11-20 13:02:13,012 - cli.esridump - INFO - Built 1 requests using resultOffset method
Traceback (most recent call last):
  File "/usr/bin/esri2geojson", line 11, in <module>
    load_entry_point('esridump==1.7.0', 'console_scripts', 'esri2geojson')()
  File "/usr/lib/python2.7/site-packages/esridump/cli.py", line 103, in main
    feature = next(feature_iter)
  File "/usr/lib/python2.7/site-packages/esridump/dumper.py", line 394, in __iter__
    raise EsriDownloadError("Could not connect to URL", e)
esridump.errors.EsriDownloadError: ('Could not connect to URL', EsriDownloadError('Could not retrieve this chunk of objects: Failed to execute query. ',))

This query returns 26 duplicates for every feature

esri2geojson -f ICN,TotalInjured,OInjuries,AInjuries,BInjuries,CInjuries,CrashInjurySeverity,IsHitAndRun,ContribCausePrim,ContribCauseSec,CrashReportCity,CrashDateTimeText,TotalFatals,FunctionalClassCIS,TypeOfFirstCrash,IsAnyCitation,CrashVehicleCount,AgencyCrashReportNo,IsAlcoholRelated,CISCrashID -p "where=CrashReportCity%3D%27Chicago%27" http://ags10s1.dot.illinois.gov/ArcGIS/rest/services/SafetyPortal/SafetyPortal/MapServer/12 idotcrashes5.geojson
cli.esridump - INFO - Built 26 requests using OID enumeration method

This will return 26 records for each of 1,000 features. This should only return 12,552 unique records, according to a simple count using this ArcGIS server's web interface.

CISCrashID is the ObjectID for this table and this screenshot shows how this one crash ID appears 26 times in the completed GeoJSON file.

(screenshot 2017-05-20 14 28 35, showing the duplicated crash ID in the output)
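Until the underlying request-building bug is tracked down, duplicates like this can be collapsed after the fact by keying on the object ID property (CISCrashID for this layer). A sketch of such a post-processing step, not a fix in the library itself:

```python
def dedupe_features(features, key_property):
    """Keep only the first feature seen for each value of key_property."""
    seen = set()
    unique = []
    for feature in features:
        key = feature["properties"].get(key_property)
        if key not in seen:
            seen.add(key)
            unique.append(feature)
    return unique
```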

Feature request: download and name all layers?

If I understand correctly, this tool only downloads one layer at a time from an ESRI server. It'd be handy to expand the tool to be able to download all layers on a service if pointed at the root URL. The only subtle trick is it'd be nice to name the layers based on the metadata from the server; the naive implementation might just name things with the layer numbers.
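For reference, an ArcGIS service root's JSON metadata includes a `layers` array whose entries carry an `id` and a `name`, which is enough to build named per-layer endpoint URLs. A hypothetical helper sketching this (assuming you've already fetched and parsed that root JSON yourself):

```python
def layer_endpoints(service_url, service_metadata):
    """Map layer names to layer endpoint URLs from a service's root metadata JSON."""
    base = service_url.rstrip("/")
    return {
        layer["name"]: "{0}/{1}".format(base, layer["id"])
        for layer in service_metadata.get("layers", [])
    }
```

Each resulting URL could then be handed to EsriDumper, with the layer name used for the output filename.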

Download Individual Features

Context

I've found a couple sources like this one

https://gis3.gworks.com/arcgis/rest/services/Golden_Valley_County_ND_PZ/MapServer/1

In which the Query API does not return the geometry as requested (There is lat/lng in the properties). However upon accessing the feature directory:

https://gis3.gworks.com/arcgis/rest/services/Golden_Valley_County_ND_PZ/MapServer/1/100

The geometry is returned as expected.

cc/ @iandees Thoughts on this one?

esri2geojson Feature server 10.7 returns min max object ids as floats causing TypeError

It appears that the REST API now (v10.7) returns the min/max objectids as float values. This causes a TypeError, because the code expects integers.

(geo_venv) C:\Users\kt01\PycharmProjects\esridump_test>esri2geojson -v https://ws.lioservices.lrc.gov.on.ca/arcgis1071a/rest/services/Access_Environment/Access_Environment_Map/MapServer/0 results.geojson
2020-03-27 12:01:04,055 - cli.esridump - DEBUG - GET https://ws.lioservices.lrc.gov.on.ca/arcgis1071a/rest/services/Access_Environment/Access_Environment_Map/MapServer/0, args {'f': 'json'}
2020-03-27 12:01:04,509 - cli.esridump - DEBUG - GET https://ws.lioservices.lrc.gov.on.ca/arcgis1071a/rest/services/Access_Environment/Access_Environment_Map/MapServer/0/query, args {'returnCountOnly': 'true', 'where': '1=1', 'f': 'json'}
2020-03-27 12:01:04,664 - cli.esridump - DEBUG - GET https://ws.lioservices.lrc.gov.on.ca/arcgis1071a/rest/services/Access_Environment/Access_Environment_Map/MapServer/0/query, args {'outStatistics': '[{"onStatisticField":"OBJECTID","statisticType":"min","outStatisticFieldName":"THE_MIN"},{"onStatisticField":"OBJECTID","statisticType":"max","outStatisticFieldName":"THE_MAX"}]', 'outFields': '', 'f': 'json'}
2020-03-27 12:01:04,835 - cli.esridump - DEBUG - GET https://ws.lioservices.lrc.gov.on.ca/arcgis1071a/rest/services/Access_Environment/Access_Environment_Map/MapServer/0/query, args {'returnIdsOnly': 'true', 'where': 'OBJECTID = 960653393.0 OR OBJECTID = 962040305.0', 'f': 'json'}
Traceback (most recent call last):
  File "c:\python27\Lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "c:\python27\Lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Users\kt01\python_venvs\geo_venv\Scripts\esri2geojson.exe\__main__.py", line 9, in <module>
  File "c:\users\kt01\python_venvs\geo_venv\lib\site-packages\esridump\cli.py", line 108, in main
    feature = next(feature_iter)
  File "c:\users\kt01\python_venvs\geo_venv\lib\site-packages\esridump\dumper.py", line 345, in __iter__
    for page_min in range(oid_min - 1, oid_max, page_size):
TypeError: range() integer end argument expected, got float.

Example Min / Max Query
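A defensive fix would be to coerce the server-reported min/max to int before building the range. A sketch of the idea (illustrative function, not the library's actual patch):

```python
def oid_page_starts(oid_min, oid_max, page_size):
    """Compute windowed-query start OIDs, tolerating float min/max from 10.7 servers."""
    oid_min, oid_max = int(oid_min), int(oid_max)
    return list(range(oid_min - 1, oid_max, page_size))
```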

esridump.errors.EsriDownloadError: Could not retrieve a section of features: Unable to complete operation. Unable to complete Query operation.

Hi! Thank you for your great work :) I have this problem with a mapserver. Can you help me please?

robsalasco@leucippus:~/Desktop/FONDECYT_ENVIRONMENT/WMS_CONAF$ esri2geojson http://mapaforestal.infor.cl/ArcGIS/rest/services/20160803_pl_apl/MapServer/6 20160803_pl_apl_6.geojson
2017-02-09 11:43:33,867 - cli.esridump - INFO - Source does not support feature count
Traceback (most recent call last):
  File "/Users/robsalasco/.pyenv/versions/3.5.2/lib/python3.5/site-packages/esridump/dumper.py", line 243, in __iter__
    row_count = self.get_feature_count()
  File "/Users/robsalasco/.pyenv/versions/3.5.2/lib/python3.5/site-packages/esridump/dumper.py", line 124, in get_feature_count
    count_json = self._handle_esri_errors(response, "Could not retrieve row count")
  File "/Users/robsalasco/.pyenv/versions/3.5.2/lib/python3.5/site-packages/esridump/dumper.py", line 76, in _handle_esri_errors
    ', '.join(error['details']),
esridump.errors.EsriDownloadError: Could not retrieve row count: Unable to complete  operation. Unable to complete Query operation.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/robsalasco/.pyenv/versions/3.5.2/bin/esri2geojson", line 9, in <module>
    load_entry_point('esridump==1.5.0', 'console_scripts', 'esri2geojson')()
  File "/Users/robsalasco/.pyenv/versions/3.5.2/lib/python3.5/site-packages/esridump/cli.py", line 100, in main
    feature = next(feature_iter)
  File "/Users/robsalasco/.pyenv/versions/3.5.2/lib/python3.5/site-packages/esridump/dumper.py", line 256, in __iter__
    for feature in self._scrape_an_envelope(bounds, self._outSR, page_size):
  File "/Users/robsalasco/.pyenv/versions/3.5.2/lib/python3.5/site-packages/esridump/dumper.py", line 222, in _scrape_an_envelope
    features = self._fetch_bounded_features(envelope, outSR)
  File "/Users/robsalasco/.pyenv/versions/3.5.2/lib/python3.5/site-packages/esridump/dumper.py", line 188, in _fetch_bounded_features
    features = self._handle_esri_errors(response, "Could not retrieve a section of features")
  File "/Users/robsalasco/.pyenv/versions/3.5.2/lib/python3.5/site-packages/esridump/dumper.py", line 76, in _handle_esri_errors
    ', '.join(error['details']),
esridump.errors.EsriDownloadError: Could not retrieve a section of features: Unable to complete  operation. Unable to complete Query operation.

HTTP read timeout too short for New Jersey source

2016-10-19 11:14:44,848 - cli.esridump - INFO - Built 2950 requests using resultOffset method
Traceback (most recent call last):
  File "/usr/local/bin/esri2geojson", line 9, in <module>
    load_entry_point('esridump==1.4.1', 'console_scripts', 'esri2geojson')()
  File "build/bdist.linux-x86_64/egg/esridump/cli.py", line 93, in main
  File "build/bdist.linux-x86_64/egg/esridump/dumper.py", line 365, in __iter__
esridump.errors.EsriDownloadError: ('Could not connect to URL', ReadTimeout(ReadTimeoutError("HTTPConnectionPool(host='geodata.state.nj.us', port=80): Read timed out. (read timeout=30)",),))

is it possible to increase read timeout=30 from the CLI?

Could not retrieve a section of features: Cannot perform query. Invalid query parameters. Unable to perform query. Please check your parameters.

I'm trying to get the zoning data off of https://services.arcgis.com/g1fRTDLeMgspWrYp/arcgis/rest/services/Zoning/FeatureServer/0/

(it's listed as a source on http://opendata-cosagis.opendata.arcgis.com/datasets/cosa-zoning). The whole log says:

2018-03-10 04:18:30,860 INFO [esridump] __iter__ - Source does not support feature count
2018-03-10 04:18:31,037 ERROR [esridump] __iter__ - Finding max/min from statistics failed. Trying OID enumeration.
Traceback (most recent call last):
  File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 312, in __iter__
    (oid_min, oid_max) = self._get_layer_min_max(oid_field_name)
  File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 175, in _get_layer_min_max
    metadata = self._handle_esri_errors(response, "Could not retrieve min/max oid values")
  File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 98, in _handle_esri_errors
    ', '.join(error['details']),
esridump.errors.EsriDownloadError: Could not retrieve min/max oid values:  Unable to perform query. Please check your parameters.
2018-03-10 04:18:31,399 INFO [esridump] __iter__ - Falling back to geo queries
2018-03-10 04:18:31,753 INFO [san_antonio_zoning] start - Finished processing
2018-03-10 04:18:31,754 ERROR [root] load - Exception occurred during the connector (<class 'arcgis.san_antonio_zoning.connector.ConnectorSanAntonioZoning'>) run
Traceback (most recent call last):
  File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 344, in __iter__
    oids = sorted(map(int, self._get_layer_oids()))
  File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 195, in _get_layer_oids
    raise EsriDownloadError("Server doesn't support returnIdsOnly")
esridump.errors.EsriDownloadError: Server doesn't support returnIdsOnly

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "load.py", line 47, in load
    result = getattr(connector, operation, lambda: None)()
  File "/app/src/connector_base.py", line 82, in start
    result = self.updated()
  File "/app/src/connector_base.py", line 111, in updated
    return self.add_items(self.client.updated())
  File "/app/src/connector_base.py", line 128, in add_items
    for item in items:
  File "/app/arcgis/arcgis_client.py", line 56, in updated
    yield from self.__iter__(extra_query_args=extra_query_args)
  File "/app/arcgis/arcgis_client.py", line 19, in __iter__
    for item in dumper.__iter__():
  File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 371, in __iter__
    for feature in self._scrape_an_envelope(bounds, self._outSR, page_size):
  File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 247, in _scrape_an_envelope
    features = self._fetch_bounded_features(envelope, outSR)
  File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 213, in _fetch_bounded_features
    features = self._handle_esri_errors(response, "Could not retrieve a section of features")
  File "/app/.heroku/python/lib/python3.6/site-packages/esridump/dumper.py", line 98, in _handle_esri_errors
    ', '.join(error['details']),
esridump.errors.EsriDownloadError: Could not retrieve a section of features: Cannot perform query. Invalid query parameters. Unable to perform query. Please check your parameters.

Unable to dump source

Trying to dump https://gismaps.sedgwickcounty.org/arcgis/rest/services/Map/Op_SiteAddress_Dynamic_SP/MapServer/0 returns this error:

2017-03-01 11:44:36,011 - cli.esridump - ERROR - Could not parse response from https://gismaps.sedgwickcounty.org/arcgis/rest/services/Map/Op_SiteAddress_Dynamic_SP/MapServer/0/query?returnCountOnly=true&where=1%3D1&f=json as JSON:

<html><head><title>Request Rejected</title></head><body>The requested URL was rejected. Please consult with your administrator.<br><br>Your support ID is: 11833783245905056836</body></html>

How does one go about dumping a source that is locked down tightly like this one? Is it possible to do with pyesridump as-is? If so, could documentation be added so people who aren't well-versed in arcgis/esri/whatever-term can try different approaches to dealing with problematic servers (this is my situation)?

How do you send parameters in command line?

esri2geojson -p "outFields=PIN14" http://gis2.cookcountyil.gov/arcgis/rest/services/cookVwrDynmcCondo/MapServer/44 condos_pins.geojson

returns:

Traceback (most recent call last):
  File "/usr/local/bin/esri2geojson", line 9, in <module>
    load_entry_point('esridump==1.1.1', 'console_scripts', 'esri2geojson')()
  File "/Library/Python/2.7/site-packages/esridump/cli.py", line 63, in main
    params = _collect_params(args.params)
  File "/Library/Python/2.7/site-packages/esridump/cli.py", line 22, in _collect_params
    params.update(dict(urllib.parse.parse_qsl(string)))
AttributeError: 'module' object has no attribute 'parse'
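The AttributeError comes from using the Python 3-only `urllib.parse` module under Python 2.7 (where the equivalent function lives in `urlparse`). A version-compatible import, as a sketch of how the CLI's parameter parsing could avoid this:

```python
try:
    from urllib.parse import parse_qsl  # Python 3
except ImportError:
    from urlparse import parse_qsl      # Python 2

# Parse the CLI's "key=value" parameter string into a dict
params = dict(parse_qsl("outFields=PIN14"))
print(params)  # {'outFields': 'PIN14'}
```

Alternatively, running the command under Python 3 avoids the error entirely.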

Unable to download data from one specific layer

Unable to download data from one layer (https://gis.ohiodnr.gov/arcgis_site2/rest/services/OIT_Services/odnr_landbase_v2/MapServer/4)
but works fine for
(https://gis.ohiodnr.gov/arcgis_site2/rest/services/OIT_Services/odnr_landbase_v2/MapServer/0)
(https://gis.ohiodnr.gov/arcgis_site2/rest/services/OIT_Services/odnr_landbase_v2/MapServer/1)
(https://gis.ohiodnr.gov/arcgis_site2/rest/services/OIT_Services/odnr_landbase_v2/MapServer/2)
(https://gis.ohiodnr.gov/arcgis_site2/rest/services/OIT_Services/odnr_landbase_v2/MapServer/3)
Here is the error

azimshaik91@cloudshell:~$ esri2geojson https://gis.ohiodnr.gov/arcgis_site2/rest/services/OIT_Services/odnr_landbase_v2/MapServer/4 test.geojson
2020-08-13 00:08:38,690 - cli.esridump - WARNING - Retrying https://gis.ohiodnr.gov/arcgis_site2/rest/services/OIT_Services/odnr_landbase_v2/MapServer/4 without SSL verification
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'gis.ohiodnr.gov'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
2020-08-13 00:08:38,949 - cli.esridump - WARNING - Retrying https://gis.ohiodnr.gov/arcgis_site2/rest/services/OIT_Services/odnr_landbase_v2/MapServer/4/query without SSL verification
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'gis.ohiodnr.gov'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
2020-08-13 00:09:01,274 - cli.esridump - INFO - Built 6262 requests using resultOffset method
2020-08-13 00:09:01,408 - cli.esridump - WARNING - Retrying https://gis.ohiodnr.gov/arcgis_site2/rest/services/OIT_Services/odnr_landbase_v2/MapServer/4/query without SSL verification
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'gis.ohiodnr.gov'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
Traceback (most recent call last):
  File "/home/azimshaik91/.local/bin/esri2geojson", line 10, in <module>
    sys.exit(main())
  File "/home/azimshaik91/.local/lib/python2.7/site-packages/esridump/cli.py", line 114, in main
    feature = next(feature_iter)
  File "/home/azimshaik91/.local/lib/python2.7/site-packages/esridump/dumper.py", line 427, in __iter__
    raise EsriDownloadError("Could not connect to URL", e)
esridump.errors.EsriDownloadError: ('Could not connect to URL', ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='gis.ohiodnr.gov', port=443): Read timed out. (read timeout=30)",),))

More info:
(https://stackoverflow.com/questions/63388162/esri2geojson-could-not-connect-to-url-for-a-specific-layer-but-works-for-all-o)

Unable to get untransformed ESRI JSON data

Hi there, it seems that the dumper automatically converts the ESRI JSON to GeoJSON when iterating through the features (https://github.com/openaddresses/pyesridump/blob/master/esridump/dumper.py#L505). It would be great if the iterator could be configured to return ESRI JSON in place of GeoJSON. Basically, this could be a configuration parameter that yields the JSON as-is when enabled and defaults to calling esri2geojson otherwise. I could put together a PR for this if it sounds useful.

[BUG] Timeout default value

Hi hi, currently the project does this:

dumper.py

timeout=None
...
self._http_timeout = timeout or 30

None is a valid value for requests (it means no read timeout at all), so with `timeout or 30` as the default logic there is no way to disable the timeout by passing None.
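One way to keep 30 as the default while still allowing an explicit `timeout=None` is a sentinel default. This is a sketch of the idea, not the library's actual code:

```python
_UNSET = object()  # sentinel: distinguishes "not passed" from an explicit None


class Dumper:
    def __init__(self, timeout=_UNSET):
        # Default to 30 seconds, but let callers pass timeout=None to
        # disable the timeout entirely -- `timeout or 30` cannot express
        # that, because it maps both None and 0 back to 30.
        self._http_timeout = 30 if timeout is _UNSET else timeout


print(Dumper()._http_timeout)              # 30
print(Dumper(timeout=None)._http_timeout)  # None
print(Dumper(timeout=0)._http_timeout)     # 0
```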

Thx.

Dates are converted to timestamps far in the future

In the Chicago zoning boundaries endpoint on this ArcGIS server, there are at least two esriFieldTypeDate fields that are converted to timestamps.

EDIT_DATE ( type: esriFieldTypeDate , alias: EDIT_DATE , length: 36 )
CREATE_DATE ( type: esriFieldTypeDate , alias: CREATE_DATE , length: 36 )

A sample value is 1033603200000, which is equivalent to 08/02/34723 @ 12:00am (UTC).

Is your script or ESRI mis-converting the date values?
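For context: esriFieldTypeDate values are milliseconds since the Unix epoch, so reading them as seconds lands tens of thousands of years in the future. Dividing by 1000 recovers the intended date:

```python
from datetime import datetime, timezone

ms = 1033603200000  # the sample esriFieldTypeDate value from the layer
dt = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
print(dt.isoformat())  # 2002-10-03T00:00:00+00:00
```

The "08/02/34723" reading comes from treating the millisecond value as seconds.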

Downloading a whole Map Server

Hi hi, related to #61

Downloading an entire layer is great; getting all of the layers is even better, and getting the whole server better still!

I think it would be great to add the following: download the info for every service/map/layer along with the data.

About the format: we might say "let's download everything as JSON", but I found something curious. For example, the HTML version of a page contains different info than the JSON version, so maybe the metadata should be downloaded in all available formats and compressed into one file.

Another odd thing: the advertised capabilities, like identify/query/etc. Sometimes the map says a capability is available, but you can't actually do much with it.

Here is a very rough implementation:
https://github.com/latot/pyMapEsriDump

Thx.

Handle proxy URLs

Some OpenAddresses sources are Esri layers behind an HTTP proxy. The URL ends up looking like this:

http://map.sccmo.org/proxy/proxy.ashx?http://10.10.143.115/scc_gis/rest/services/appservices/taxinformation/MapServer/0

from this build run

This URL currently gets parsed by the requests library and turns into a request like this:

http://map.sccmo.org/proxy/proxy.ashx?http://10.10.143.115/scc_gis/rest/services/appservices/taxinformation/MapServer/0&f=json

...which doesn't work, because the f=json gets passed as a query arg to the proxy.ashx script instead of getting sent to the Esri endpoint.
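A small demonstration of the problem, plus one possible workaround (attaching the query string to the inner URL before it reaches the proxy, so the proxy forwards it to the Esri endpoint). The URL handling here is a sketch, not the library's code:

```python
from urllib.parse import urlencode

proxy = "http://map.sccmo.org/proxy/proxy.ashx"
inner = ("http://10.10.143.115/scc_gis/rest/services/"
         "appservices/taxinformation/MapServer/0")
params = {"f": "json"}

# What happens today: the params get appended to the outer URL, so they
# land on the proxy.ashx script instead of the Esri layer.
broken = f"{proxy}?{inner}&{urlencode(params)}"

# Possible workaround: fold the params into the inner URL first, so
# everything after the proxy's "?" is the complete Esri request.
working = f"{proxy}?{inner}?{urlencode(params)}"

print(broken.endswith("MapServer/0&f=json"))   # True
print(working.endswith("MapServer/0?f=json"))  # True
```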

Can't install via pip

This package is very useful, but is a chore to install as it's not available via pip.

It'd be great to have it published for pip install.

Sources that support returnIdsOnly but not returnCountOnly

For the Wuhan, China data source (seem to be getting all the legacy servers lately), pyesridump was only pulling 1001 records when there were 3429 in the full data set.

From this line:

except EsriDownloadError:

it looks like if the source doesn't support returnCountOnly, the bounding box is recursively subdivided into four quadrants (quadtree-style), with a stopping condition when there are < maxRecords in a given quadrant.

  1. This should generally retrieve everything, except for the following test in _scrape_an_envelope:
    if len(features) == max_records:
    It appears the Wuhan source returns 1001 records where max_records is 1000, which executes the same code as if it had returned 999 results, i.e. it assumes the base case has been met and returns early. This could be fixed by changing the conditional to:
    if len(features) >= max_records
  2. With the new OID enumeration from #33, it might make sense to use the quadrant-based method as a fallback only if the source supports neither returnCountOnly nor returnIdsOnly. Otherwise OID enumeration should be fewer queries. Does that make sense or are there some other edge cases to consider?
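The quadrant subdivision with the proposed `>=` stopping condition can be sketched like this (`fetch` is a stand-in for the bounded feature query, not the library's actual method):

```python
def scrape_envelope(fetch, bounds, max_records):
    """Recursively subdivide `bounds` while a query fills a whole page.

    `fetch(bounds)` returns the features inside `bounds`; this is a
    sketch of the quadtree fallback, not pyesridump's code.
    """
    features = fetch(bounds)
    # >= rather than ==: some servers (like the Wuhan source) return
    # max_records + 1 rows for a saturated query, which == would miss.
    if len(features) >= max_records:
        xmin, ymin, xmax, ymax = bounds
        xmid, ymid = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
        features = []
        for quad in ((xmin, ymin, xmid, ymid), (xmid, ymin, xmax, ymid),
                     (xmin, ymid, xmid, ymax), (xmid, ymid, xmax, ymax)):
            features.extend(scrape_envelope(fetch, quad, max_records))
    return features
```

With a toy `fetch` that returns points inside half-open bounds, querying (0, 0, 4, 4) with max_records=3 over four spread-out points splits once and still returns all four.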

Large dataset timeout

When downloading a very large dataset, esri2geojson encounters this issue:

./esri2geojson https://gismaps.kingcounty.gov/arcgis/rest/services/Property/KingCo_PropertyInfo/MapServer/2 asdf.geojson
2018-05-22 12:52:15,082 - cli.esridump - INFO - Built 615 requests using OID where clause method
Traceback (most recent call last):
  File "./esri2geojson", line 11, in <module>
    sys.exit(main())
  File "/home/<username>/esridump/local/lib/python2.7/site-packages/esridump/cli.py", line 111, in main
    feature = next(feature_iter)
  File "/home/<username>/esridump/local/lib/python2.7/site-packages/esridump/dumper.py", line 425, in __iter__
    raise EsriDownloadError("Could not connect to URL", e)
esridump.errors.EsriDownloadError: ('Could not connect to URL', EsriDownloadError('https://gismaps.kingcounty.gov/arcgis/rest/services/Property/KingCo_PropertyInfo/MapServer/2/query: Could not retrieve this chunk of objects HTTP 500 <html><head><title>Apache Tomcat/7.0.57 - Error report</title>...</head><body><h1>HTTP Status 500 - </h1><p><b>description</b> <u>The server encountered an internal error that prevented it from fulfilling this request.</u></p><p><b>exception</b> <pre>java.lang.NullPointerException\n</pre></p><p><b>note</b> <u>The full stack trace of the root cause is available in the Apache Tomcat/7.0.57 logs.</u></p><h3>Apache Tomcat/7.0.57</h3></body></html>',))

Multiple runs of the same command download a file between 10mb and 600mb, depending on when the connection is lost. I think it would be very beneficial for esri2geojson to not exit upon this error, but to continue down the queue of batches to download.

Time out on large layers

This parcel layer has 1.846 million features.

esri2geojson http://gis2.cookcountyil.gov/arcgis/rest/services/cookVwrDynmcCondo/MapServer/44 condos.geojson
cli.esridump - INFO - Built 1846 requests using OID where clause method
Traceback (most recent call last):
  File "/usr/local/bin/esri2geojson", line 9, in <module>
    load_entry_point('esridump==1.1.1', 'console_scripts', 'esri2geojson')()
  File "/Library/Python/2.7/site-packages/esridump/cli.py", line 88, in main
    feature = feature_iter.next()
  File "/Library/Python/2.7/site-packages/esridump/dumper.py", line 360, in __iter__
    raise EsriDownloadError("Could not connect to URL", e)
esridump.errors.EsriDownloadError: ('Could not connect to URL', ReadTimeout(ReadTimeoutError("HTTPConnectionPool(host='gis2.cookcountyil.gov', port=80): Read timed out. (read timeout=30)",),))

It cuts out at 226 MB after 1 minute. I'm guessing the final version will be nearly 1 GB.

`fields` parameter never works with pagination

Overview

Hey, thanks for the great package!

I'm trying to grab data off an esri mapserver API using the EsriDumper class, and only want to grab a few fields (hopefully speeding up download speed and decreasing file size).

What I expected to happen

When I use list(EsriDumper(url, fields=['OBJECTID'])), I get back geometries with only the OBJECTID properties.

What actually happens

I get all of the properties regardless of the fields setting.

Explanation

I noticed that even when I set fields=['OBJECTID'], I was getting back every field anyways. The cause is line 127 in dumper.py:

return data.get('error') and data['error']['message'] != "Failed to execute query."

I think the intention is to return False when the response has an error field and it doesn't have that exact message, and return True in all other cases.

What actually happens is that when there is no error, data.get('error') evaluates to None, and this is then the return value of can_handle_pagination. This gets coerced into a boolean later on in the __iter__ method:

if query_fields and not self.can_handle_pagination(query_fields):

So, the actual logic is that it will only think it can handle pagination when it receives an error other than "Failed to execute query".

Solution

I think this achieves the intended logic in can_handle_pagination:

if 'error' in data and 'message' in data['error']:
    return data['error']['message'] != "Failed to execute query."
return True
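A quick demonstration of the truthiness bug and the proposed fix (the function names here are illustrative, not the library's):

```python
def can_handle_pagination_old(data):
    # Buggy: when there is no 'error' key, data.get('error') is None,
    # which is falsy, so a *successful* probe reads as "can't paginate".
    return data.get('error') and data['error']['message'] != "Failed to execute query."


def can_handle_pagination_fixed(data):
    # Only the exact "Failed to execute query." error means pagination
    # is unsupported; anything else (including success) means it works.
    if 'error' in data and 'message' in data['error']:
        return data['error']['message'] != "Failed to execute query."
    return True


ok_response = {'features': []}  # pagination probe succeeded, no error
print(bool(can_handle_pagination_old(ok_response)))    # False (the bug)
print(can_handle_pagination_fixed(ok_response))        # True

failed = {'error': {'message': "Failed to execute query."}}
print(can_handle_pagination_fixed(failed))             # False
```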

page_size is being limited to 1000 for servers which support maxRecordCount >1000

The default maxRecordCount can be larger than 1000, with 2000 being another common default.
The current implementation of dumper.py uses a page_size that is the minimum of 1000 and the layer's maxRecordCount. This means that for layers with a maxRecordCount of 2000, requests are still only made in groups of 1000 records.

Is there a reason for this cap, or could the layer's maxRecordCount be used instead?

Relevant line being:
page_size = min(1000, metadata.get('maxRecordCount', 500))
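Concretely, with the current expression a server advertising 2000 records per page is still paged at 1000. Dropping the cap (assuming the server honours its own advertised limit) would use the full page; this is the proposed change, not the shipped behaviour:

```python
metadata = {'maxRecordCount': 2000}  # as reported by the layer endpoint

# Current behaviour: capped at 1000 regardless of the server's limit
page_size = min(1000, metadata.get('maxRecordCount', 500))
print(page_size)  # 1000

# Proposed: trust the layer's advertised maxRecordCount
page_size = metadata.get('maxRecordCount', 500)
print(page_size)  # 2000
```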

Not all features are coming through

I'm trying to pull features from here. There are a little over 100k records in the DB. However, when I run the esri2geojson tool and then run ogrinfo on the resulting JSON, I'm only seeing 56k records. I turned on the verbose flag for esri2geojson and saw that it properly enumerates all the way up to the max (100369), but it doesn't seem to actually pull down all of the records.

NoneType is not iterable, source does not support feature count

Hi hi, here's a weird map:

esri2geojson https://geografia.ine.cl/server/rest/services/Agropecuario/Marco_Maestro_Agro/MapServer/0 test.geojson
2022-03-25 11:35:46,407 - cli.esridump - INFO - Source does not support feature count
Traceback (most recent call last):
  File "/home/user/Documentos/git/pyesridump/esridump/bin/esri2geojson", line 8, in <module>
    sys.exit(main())
  File "/home/user/Documentos/git/pyesridump/esridump/lib/python3.9/site-packages/esridump/cli.py", line 114, in main
    feature = next(feature_iter)
  File "/home/user/Documentos/git/pyesridump/esridump/lib/python3.9/site-packages/esridump/dumper.py", line 354, in __iter__
    oid_field_name = self._find_oid_field_name(metadata)
  File "/home/user/Documentos/git/pyesridump/esridump/lib/python3.9/site-packages/esridump/dumper.py", line 178, in _find_oid_field_name
    for f in metadata['fields']:
TypeError: 'NoneType' object is not iterable

Thx!

Request 838 of 885 timed out, would you like to [A]bort or [S]kip or [R]etry and continue?

Let me start by giving BIG thanks for a most useful tool! THANKS!

The attached traceback is of a timeout that occurred after a LONG all-night dump
(request 838 out of 885 requests using resultOffset method).

Might be a good idea to catch this and turn it into a user prompt, something like:

  • "Request 838 of 885 timed out, would you like to [A]bort or [S]kip or [R]etry and continue?"

preferably with a default TimeOutRetry=3 (or a more general FailRetry) and a flag argument to override it.

Another helpful aid in this and similar situations (I just had a "similar" situation with "socket.gaierror: [Errno 11002] getaddrinfo failed") would be to expose the --resultOffset so an aborted download can be restarted at the offset last reported by -v or, even better, reported by the exception handler.

What do you think?

Thanks!

Traceback (most recent call last):
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "c:\python37\Lib\http\client.py", line 1321, in getresponse
    response.begin()
  File "c:\python37\Lib\http\client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "c:\python37\Lib\http\client.py", line 257, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "c:\python37\Lib\socket.py", line 589, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\requests\adapters.py", line 445, in send
    timeout=timeout
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\util\retry.py", line 367, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\packages\six.py", line 686, in reraise
    raise value
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\connectionpool.py", line 306, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='***XXX***', port=80): Read timed out. (read timeout=30)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\esridump\dumper.py", line 418, in __iter__
    response = self._request('POST', query_url, headers=headers, data=query_args)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\esridump\dumper.py", line 43, in _request
    return requests.request(method, url, timeout=self._http_timeout, **kwargs)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\requests\sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\requests\sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\requests\adapters.py", line 526, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='www***XXX***', port=80): Read timed out. (read timeout=30)

During handling of the above exception, another exception occurred:

2018-07-09 05:22:02,786 - cli.esridump - DEBUG - POST http://www.***XXX***/MapServer/18/query, args {'resultOffset': 838000, 'resultRecordCount': 1000, 'where': '1=1', 'geometryPrecision': 7, 'returnGeometry': True, 'outSR': '4326', 'outFields': '*', 'f': 'json'}

Traceback (most recent call last):
  File "c:\python37\Lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\python37\Lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\xampp\htdocs\Data\Snippets\esridump\Scripts\esri2geojson.exe\__main__.py", line 9, in <module>
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\esridump\cli.py", line 111, in main
    feature = next(feature_iter)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\esridump\dumper.py", line 425, in __iter__
    raise EsriDownloadError("Could not connect to URL", e)
esridump.errors.EsriDownloadError: ('Could not connect to URL', ReadTimeout(ReadTimeoutError("HTTPConnectionPool(host='www.***XXX***', port=80): Read timed out. (read timeout=30)")))

Query by geography

Hi @iandees,

I have a project where I need to query a server by an area (I need to find all the blockgroups within a school attendance boundary). Is this something you would be interested in having in the library?

Best,

Forest
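For reference, the underlying Esri REST query endpoint already supports spatial filters via the geometry/geometryType/spatialRel parameters, so such a feature would mostly mean passing extra query arguments through. A sketch of the arguments involved (the boundary polygon here is made up):

```python
import json

# A made-up attendance-boundary polygon in WGS84 (Esri ring format)
boundary = {
    "rings": [[[-87.70, 41.80], [-87.60, 41.80],
               [-87.60, 41.90], [-87.70, 41.80]]]
}

# Standard Esri REST query parameters for an area filter
query_args = {
    "f": "json",
    "where": "1=1",
    "outFields": "*",
    "geometry": json.dumps(boundary),
    "geometryType": "esriGeometryPolygon",
    "spatialRel": "esriSpatialRelIntersects",  # features intersecting the area
    "inSR": "4326",
}
print(query_args["geometryType"])  # esriGeometryPolygon
```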

Document CLI headers arg

I noticed the CLI headers arg isn't covered by the README and just wanted to make a note. (I'm using Node esri-dump but using this code as a reference; very helpful!)
