
Python script to combine conservation related spatial data from many sources to create a single 'Designated Lands' layer for British Columbia

License: Apache License 2.0


designatedlands's Introduction


designatedlands

Combine spatial data for 40+ designations that contribute to land management to create a single 'Designated Lands' layer for British Columbia. Land designations are categorized according to the industries to which they apply (forestry, oil and gas, and mining), and summarized according to the level of restriction that applies to each industry (Full, High, Medium, Low, None). Overlaps are removed such that areas with overlapping designations are assigned to the highest category.

This is an updated version of the code, currently run with the included config_202-10-08.cfg, which sets the resolution of the raster outputs to 25m.

A complete run of the previous version of the tool was completed May 2018, and the results are reported on Environmental Reporting BC, with the data available through the BC Data Catalogue.

Requirements

  • Python >=3.7
  • GDAL (with ogr2ogr available at the command line) (tested with GDAL 3.0.2)
  • a PostGIS enabled PostgreSQL database (tested with PostgreSQL 13, scripts require PostGIS >=3.1/Geos >=3.9)
  • for the raster processing, a relatively large amount of RAM (tested with 64GB at 10m resolution, 16GB at 25m resolution)

Optional

  • conda for managing Python requirements
  • Docker for easy installation of PostgreSQL/PostGIS

Installation (with conda and Docker)

These steps should work on most operating systems.

  1. Install Anaconda or miniconda

  2. Open a conda command prompt

  3. Clone the repository and navigate to the project folder:

     $ git clone https://github.com/bcgov/designatedlands.git
     $ cd designatedlands
    
  4. Create and activate a conda environment for the project using the supplied environment.yml:

     $ conda env create -f environment.yml
     $ conda activate designatedlands
    
  5. Download and install Docker using the appropriate link for your OS.

  6. Get a postgres docker image with PostGIS >=3.1 and GEOS >=3.9:

     $ docker pull postgis/postgis:14-3.2
    
  7. Run the container, create the database, add required extensions (note: you will have to change the line continuation characters from \ to ^ if running the job in Windows):

     $ docker run --name dlpg \
       -e POSTGRES_PASSWORD=postgres \
       -e POSTGRES_USER=postgres \
       -e PG_DATABASE=designatedlands \
       -p 5433:5432 \
       -d postgis/postgis:14-3.2
     $ psql -h localhost -p 5433 -U postgres -c "CREATE DATABASE designatedlands"
     $ psql -h localhost -p 5433 -U postgres -c "CREATE EXTENSION postgis" designatedlands
     $ psql -h localhost -p 5433 -U postgres -c "CREATE EXTENSION intarray" designatedlands
    

    Running the container like this:

    • allows you to connect to it on port 5433 from localhost or 127.0.0.1
    • names the container dlpg

    Note that designatedlands.py uses the above database credentials as the default. If you need to change these (for example, changing the port to avoid conflicting with a system installation), modify the db_url parameter in the config file you supply to designatedlands (see below).
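    The db_url parameter is a SQLAlchemy-style connection URL built from the credentials above. A minimal sketch of assembling one (the helper function is hypothetical, for illustration only; the default values mirror the docker run command above):

    ```python
    # Build a SQLAlchemy-style postgres URL from its parts. The defaults
    # below match the docker container set up in the previous step.
    def make_db_url(user="postgres", password="postgres",
                    host="localhost", port=5433, db="designatedlands"):
        return f"postgresql://{user}:{password}@{host}:{port}/{db}"

    default_url = make_db_url()
    # e.g. to avoid conflicting with a system postgres on 5433:
    custom_url = make_db_url(port=5432)
    ```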

    As long as you don't remove this container, it will retain all the data you put in it. If you have shut down Docker or the container, you can start it up again with this command:

       $ docker start dlpg
    

Usage

First, modify the sources_designations.csv and sources_supporting.csv files as required. These files define all designation data sources to be processed and how the script will process each source. See below for a full description of these files and how they define the various data sources.

If any data sources are specified as manual downloads in the source csv files, download the data to the source_data folder (or optionally to the folder identified by the source_data key in the config file).

Using the designatedlands.py command line tool, load and process all data then dump the results to .tif/geopackage:

$ python designatedlands.py download
$ python designatedlands.py preprocess
$ python designatedlands.py process-vector
$ python designatedlands.py process-raster
$ python designatedlands.py dump

See the --help for more options:

$ python designatedlands.py --help
Usage: designatedlands.py [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  cleanup          Remove temporary tables
  download         Download data, load to postgres
  dump             Dump output tables to file
  overlay          Intersect layer with designatedlands and write to GPKG
  preprocess       Create tiles layer and preprocess sources where required
  process-raster   Create raster designation/restriction layers
  process-vector   Create vector designation/restriction layers
  test-connection  Confirm that connection to postgres is successful

For help regarding an individual command:

$ python designatedlands.py download --help
Usage: designatedlands.py download [OPTIONS] [CONFIG_FILE]

  Download data, load to postgres

Options:
  -a, --alias TEXT  The 'alias' key for the source of interest
  --overwrite       Overwrite any existing output, force fresh download
  -v, --verbose     Increase verbosity.
  -q, --quiet       Decrease verbosity.
  --help            Show this message and exit.

sources csv files

The files sources_designations.csv and sources_supporting.csv define all source layers and how they are processed. Edit these tables to customize the analysis. Columns are noted below. All columns are present in sources_designations.csv; the designation/process_order/restriction columns are not included in sources_supporting.csv, but the remaining column definitions are identical. Note that the order of rows in these files is not important; order your designations by populating the process_order column with integer values. Do not include a process_order integer for designations that are to be excluded (exclude = T).
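The process_order/exclude rule above can be checked mechanically; a sketch of such a validation (the function and column subset are hypothetical, not part of the repository):

```python
import csv
import io

def check_process_order(rows):
    """Verify sources rows: excluded rows (exclude == 'T') carry no
    process_order; all other rows need an integer process_order."""
    problems = []
    for row in rows:
        po = (row.get("process_order") or "").strip()
        if row.get("exclude") == "T":
            if po:
                problems.append(f"{row['designation']}: excluded but has process_order")
        elif not po.isdigit():
            problems.append(f"{row['designation']}: missing/invalid process_order")
    return problems

# tiny in-memory example in place of the real csv file
sample = io.StringIO(
    "designation,exclude,process_order\n"
    "park_provincial,,1\n"
    "old_plan,T,\n"
    "broken_row,,\n"
)
issues = check_process_order(csv.DictReader(sample))
# issues == ['broken_row: missing/invalid process_order']
```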

| COLUMN | DESCRIPTION |
|---|---|
| process_order | An integer defining the order in which to overlay layers |
| exclude | A value of T will exclude the source from all operations |
| manual_download | A value of T indicates that a direct download url is not available for the data. Download these sources manually to the downloads folder and ensure that the value given for file_in_url matches the name of the file in the download folder |
| name | Full name of the designated land category |
| designation | A unique underscore separated value used for coding the various designated categories (eg park_provincial) |
| source_id_col | The column in the source data that defines the unique ID for each feature |
| source_name_col | The column in the source data that defines the name for each feature |
| forest_restriction | Level of restriction for the designation, forestry related activities (Full, High, Medium, Low, None) |
| og_restriction | Level of restriction for the designation, oil and gas related activities (Full, High, Medium, Low, None) |
| mine_restriction | Level of restriction for the designation, mine related activities (Full, High, Medium, Low, None) |
| url | Download url for the data source |
| file_in_url | Name of the file of interest in the download from the specified url. Not required for BCGW downloads |
| layer_in_file | For downloads of multi-layer files. Not required for BCGW downloads |
| query | A query defining the subset of data of interest from the given file/layer (CQL for BCGW sources, SQLITE dialect for other sources). If it is a BCGW source (ie. url starts with https://catalogue.gov.bc.ca/) and you want to include the current date in the query (eg. the current query for 'Designated Areas'), put a '{currdate}' placeholder in the query (e.g., RETIREMENT_DATE > '{currdate}'). The placeholder will be replaced by the current date when the script runs |
| metadata_url | URL for metadata reference |
| info_url | Background/info url in addition to metadata (if available) |
| preprocess_operation | Pre-processing operation to apply to layer (clip and union are the only supported operations) |
| preprocess_args | Argument(s) to pass to preprocess_operation. clip requires a layer to clip by and union requires column(s) to aggregate by. For example, to clip a source by the Muskwa-Kechika Management Area boundary, set preprocess_operation = clip and preprocess_args = mk_boundary |
| notes | Misc notes related to layer |
| license | The license under which the data is distributed |
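The '{currdate}' substitution described for the query column amounts to a simple string replacement; a sketch (not the script's actual implementation, the function name is hypothetical):

```python
from datetime import date

def fill_currdate(query, today=None):
    """Replace the {currdate} placeholder with a YYYY-MM-DD date string,
    defaulting to the current date at run time."""
    d = (today or date.today()).isoformat()
    return query.replace("{currdate}", d)

# using the example query from the table above, with a fixed date
q = fill_currdate("RETIREMENT_DATE > '{currdate}'", today=date(2022, 1, 31))
# q == "RETIREMENT_DATE > '2022-01-31'"
```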

Supported formats / data validation

Non-BCGW data loads from url will generally work for any vector format that is supported by GDAL. Note however:

  • shapefile urls must be zip archives that include all required shapefile sidecar files (dbf, prj, etc)
  • designatedlands presumes that input data are valid and of types POLYGON or MULTIPOLYGON

Configuration

If required, you can modify the general configuration of designatedlands when running the commands above by supplying the path to a config file as a command line argument. Note that the config file does not have to contain all parameters; you only need to include those where you do not wish to use the default values.

An example configuration file, designatedlands_sample_config.cfg, is included; it lists all available configuration parameters, sets the raster resolution to 25m, and uses only 4 cores.

When using a configuration file, remember to specify it each time you use designatedlands.py, for example:

$ python designatedlands.py download designatedlands_sample_config.cfg
$ python designatedlands.py preprocess designatedlands_sample_config.cfg
$ python designatedlands.py process-vector designatedlands_sample_config.cfg
$ python designatedlands.py process-raster designatedlands_sample_config.cfg
$ python designatedlands.py dump designatedlands_sample_config.cfg

| KEY | VALUE |
|---|---|
| source_data | path to folder that holds downloaded datasets |
| sources_designations | path to csv file holding designation data source definitions |
| sources_supporting | path to csv file holding supporting data source definitions |
| out_path | path to write output .gpkg and tiffs |
| db_url | SQLAlchemy connection URL pointing to the postgres database. The port specified in the url must match the port your database is running on (default is 5433) |
| resolution | resolution of output geotiff rasters (m) |
| n_processes | Input layers are broken up by tile and processed in parallel; defines how many parallel processes to use (the default of -1 indicates the number of cores on your machine minus one) |
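The n_processes behaviour described above (-1 meaning "all cores minus one") can be sketched like this (hypothetical helper, assuming the semantics stated in the table):

```python
import multiprocessing

def resolve_n_processes(n_processes):
    """Map the config value to an actual worker count:
    -1 means cpu count minus one (at least 1); any positive
    value is used as-is."""
    if n_processes == -1:
        return max(multiprocessing.cpu_count() - 1, 1)
    return n_processes
```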

Vector outputs

The designatedlands.py dump command writes two layers to output geopackage outputs/designatedlands.gpkg:

1. designations_overlapping

Each individual designation polygon is clipped to the terrestrial boundary of BC, repaired if necessary, then loaded to this layer otherwise unaltered. Where designations overlap, output polygons will overlap. Overlaps occur primarily between different designations, but are also present within the same designation. See the following table for the structure of this layer:

| designations_overlapping_id | process_order | designation | source_id | source_name | forest_restriction | og_restriction | mine_restriction | map_tile |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | private_conservation_lands_admin | 10001 | Arrow Lakes (ACQ) | 3 | 1 | 1 | 082K011 |
2. designations_planarized

The designations_overlapping layer above is further processed to remove overlaps and create a planarized output. Where overlaps occur, they are noted in the attributes as semicolon-separated values. For example, a polygon where a uwr_no_harvest designation overlaps with a land_act_reserves_17 designation will have values like this:

| designation | source_id | source_name | forest_restrictions | mine_restrictions | og_restrictions |
|---|---|---|---|---|---|
| uwr_no_harvest;land_act_reserves_17 | 137810341;964007 | u-3-005;SEC 17 DESIGNATED USE AREA | 4;0 | 2;1 | 0;0 |

The output restriction columns (forest_restriction_max, mine_restriction_max, og_restriction_max) are assigned the value of the highest restriction present within the polygon for the given restriction type.
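The *_max rule reduces the semicolon-separated values to the highest level present; a sketch of that reduction (helper name hypothetical):

```python
def restriction_max(values):
    """Given a semicolon-separated string of restriction levels
    (higher integer = more restrictive), return the highest level."""
    return max(int(v) for v in values.split(";"))

# using the example row above:
forest_max = restriction_max("4;0")  # 4
mine_max = restriction_max("2;1")    # 2
og_max = restriction_max("0;0")      # 0
```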

QA tables

Area totals for this layer are checked. To review the checks, see the tables in the postgres db:

  • qa_compare_outputs - reports on total area of each designation and the difference between designations_overlapping and designations_planarized. Any differences should be due to same source overlaps.
  • qa_summary - check that the total area of designations_overlapping matches the total area of BC and check restriction areas.
  • qa_total_check - check that the total for each restriction class adds up to the total area of BC

To connect to the database, use the host and port configured (localhost and 5433 by default) and the correct parameters (database name and credentials as described above). You can connect through any database frontend (e.g., pgAdmin, DBeaver), a GIS (e.g., QGIS), or the command line tool psql:

$ psql -h localhost -p 5433 -U postgres designatedlands

If you are connecting via the psql command line tool, once connected you can run a SQL query such as:

SELECT * FROM qa_compare_outputs ORDER BY pct_diff;

If you want to save the QA outputs to file, you can run something like this:

\copy (SELECT * FROM qa_compare_outputs ORDER BY pct_diff) TO outputs/qa_compare_outputs.csv CSV HEADER;
\copy (SELECT * FROM qa_summary) TO outputs/qa_summary.csv CSV HEADER;
\copy (SELECT * FROM qa_total_check) TO outputs/qa_total_check.csv CSV HEADER;

Raster outputs

Four output rasters are created:

  1. designatedlands.tif - output designations. In cases of overlap, the designation with the highest process_order is retained
  2. forest_restriction.tif - output forest restriction levels
  3. mine_restriction.tif - output mine restriction levels
  4. og_restriction.tif - output oil and gas restriction levels

Raster attribute tables are available for each tif.

Overlay

In addition to creating the output designated lands layer, this tool also provides a mechanism to overlay the results with administration or ecological units of your choice:

$ python designatedlands.py overlay --help
Usage: designatedlands.py overlay [OPTIONS] IN_FILE OUT_FILE [CONFIG_FILE]

  Intersect layer with designatedlands and write to GPKG

Options:
  -l, --in_layer TEXT     Name of input layer
  -nln, --out_layer TEXT  Name of output layer
  -v, --verbose           Increase verbosity.
  -q, --quiet             Decrease verbosity.
  --help                  Show this message and exit.

For example, to overlay designatedlands with BC ecosections, first get ERC_ECOSECTIONS_SP.gdb from here, then run the following command to create output dl_eco.gpkg/eco_overlay:

$ python designatedlands.py overlay \
    ERC_ECOSECTIONS_SP.gdb \
    dl_eco.gpkg \
    --in_layer WHSE_TERRESTRIAL_ECOLOGY_ERC_ECOSECTIONS_SP \
    --out_layer eco_overlay

Aggregate output layers with Mapshaper

As a part of data load, designatedlands dices all inputs into BCGS 1:20,000 map tiles. This speeds up processing significantly by enabling efficient parallel processing and limiting the size/complexity of input geometries. However, very small gaps are created between the tiles and re-aggregating (dissolving) output layers across tiles in PostGIS is error prone. While the gaps do not have any effect on the designated lands stats, they do need to be removed for display. Rather than attempt this in PostGIS, we can aggregate outputs using the topologically enabled mapshaper tool:

If not already installed, install node (https://nodejs.org/en/) and then install mapshaper with:

npm install -g mapshaper
# mapshaper doesn't read .gpkg, convert output to shp and use mapshaper
# to snap and dissolve tiles
# requires mapshaper v0.4.72 to dissolve on >1 attribute
# use mapshaper-xl to allocate enough memory
ogr2ogr \
  designatedlands_tmp.shp \
  -sql "SELECT
         designatedlands_id as dl_id,
         designation as designat,
         bc_boundary as bc_bound,
         category,
         geom
        FROM designatedlands" \
  designatedlands.gpkg \
  -lco ENCODING=UTF-8 &&
mapshaper-xl \
  designatedlands_tmp.shp snap \
  -dissolve designat,bc_bound \
    copy-fields=category \
  -explode \
  -o designatedlands_clean.shp &&
ls | grep -E "designatedlands_tmp\.(shp|shx|prj|dbf|cpg)" | xargs rm

Do the same for the overlaps file:

ogr2ogr \
  designatedlands_overlaps_tmp.shp \
  -sql "SELECT
         designatedlands_overlaps_id as dl_ol_id,
         designation as designat,
         designation_id as des_id,
         designation_name as des_name,
         bc_boundary as bc_bound,
         category,
         geom
        FROM designatedlands_overlaps" \
  designatedlands.gpkg \
  -lco ENCODING=UTF-8 &&
mapshaper-xl \
  designatedlands_overlaps_tmp.shp snap \
  -dissolve designat,des_id,des_name,bc_bound \
    copy-fields=category \
  -explode \
  -o designatedlands_overlaps_clean.shp &&
ls | grep -E "designatedlands_overlaps_tmp\.(shp|shx|prj|dbf|cpg)" | xargs rm

Results

The results of previous runs of the tool can be found on the releases page of this repository. The make_resources.sh script is used to generate the data hosted in the release.

License

Copyright 2022 Province of British Columbia

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

This repository is maintained by Environmental Reporting BC. Click here for a complete list of our repositories on GitHub.

Credits

Straightforward overlay queries now practical with complex data thanks to PostGIS 3.1/GEOS 3.9.

designatedlands's People

Contributors

ateucher, gcperk, karharker, repo-mountie[bot], smnorris, stephhazlitt


designatedlands's Issues

GBR name change

PROTECTED_AREA_NAME = 'GBR Forest Management Area Act'
should now be
PROTECTED_AREA_NAME = 'Great Bear Rainforest'

sorting by hierarchy, but as integer

Rather than iterating through the source csv file, the script now sorts layers by hierarchy - just to be sure things are nicely ordered.

But the sort currently treats the hierarchy values as strings, so the result is 1, 11, 2, etc. rather than numeric order.
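The 1, 11, 2 ordering is the classic lexicographic-sort symptom; casting to int in the sort key gives the intended order. A minimal illustration:

```python
hierarchy = ["1", "11", "2", "10"]

# sorting strings compares character by character
lexicographic = sorted(hierarchy)        # ['1', '10', '11', '2']

# casting to int in the key restores numeric order
numeric = sorted(hierarchy, key=int)     # ['1', '2', '10', '11']
```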

overlay output with various admin or ecological definitions

Usage could be something like:

python conservationlands.py overlay data/bec.shp -o conservationlands_x_bec.shp
OR
python conservationlands.py overlay "data/bec.shp, data/eco.shp, data/admin_regions.shp" -o conservationlands_soup.shp

Gaps in output

There are several strange gaps in the output layer in these tiles. The area is all within the GBR, it should be filled by that or other sources.

screenshot 2016-12-19 16 34 38

VLI data at hillcrestgeo.ca

The hillcrestgeo.ca address used for layer vqo_preserve needs to be updated to use https.
However, the source for this data should instead be modified to the file provided in the github release if possible.

sqlalchemy, sorry, too many clients already

Running process gives this error. The issue is likely this one: smnorris/pgdata#5

$ python designatedlands.py process
.......
INFO:root:Inserting c02_park_er into designatedlands_prelim
Traceback (most recent call last):
  File "designatedlands.py", line 995, in <module>
    cli()
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "designatedlands.py", line 865, in process
    pool.map(func, tiles)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL:  sorry, too many clients already

Define BC and land/marine

Add a provincial boundary and land/marine definition to sources.csv and include this layer/layers as the final step in the process function.

logging

Tidy logging

  • log download failures to file
  • don't show the user all info level logging (ie requests starting new connection)

non-bcgw download duplications

The downloads function creates a new folder for each layer.
For example, VQO gets written to the downloads folder 5 times.

DWDS SSL certificate verify fails

The same problem occurs when using a browser, so it is probably an issue on the server end.

Unless the server is blocking me because of the dozens of requests?

INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): catalogue.data.gov.bc.ca
Traceback (most recent call last):
  File "conservationlands.py", line 224, in <module>
    cli()
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "conservationlands.py", line 52, in download
    file = utils.download_bcgw(source["url"], dl_path, gdb=gdb)
  File "/Volumes/Data/Projects/env/conservationlands/utils.py", line 77, in download_bcgw
    order_id = bcdata.create_order(url, email)
  File "/Volumes/Data/Projects/geobc/bcdata/bcdata/__init__.py", line 90, in create_order
    r = requests.get(url)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 433, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)

remove pgdb dependency

The pgdb module is handy for quick development and works fine but is a dependency that isn't really necessary and the module may not be maintained. Use sqlalchemy and/or psycopg2 directly, they have developers that keep things up to date :-)

download fails on MK special wildland

Doesn't seem to be picking up that the .gdb already exists, likely a small problem in sources.csv

Traceback (most recent call last):
  File "conservationlands.py", line 194, in <module>
    GROUP BY category) as foo
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "conservationlands.py", line 52, in download
    file = utils.download_bcgw(source["url"], dl_path, gdb=gdb)
  File "/Volumes/Data/Projects/env/conservationlands/utils.py", line 85, in download_bcgw
    return os.path.join(dl_path, out_gdb)
  File "/usr/local/Cellar/python/2.7.12_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 177, in copytree
    os.makedirs(dst)
  File "/usr/local/Cellar/python/2.7.12_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 17] File exists: 'download_cache/RMP_PLAN_LEGAL_POLY_SVW.gdb'

dump to shape doesn't work on windows

python conservationlands.py dump doesn’t work. I get this error:

ERROR 1: ERROR: column "category" does not exist
LINE 1: SELECT category, rollup, geom
^

But running this on the command line works just fine (and pretty fast!):

ogr2ogr -t_srs EPSG:3005 -lco OVERWRITE=YES conservation_lands.shp PG:"host=localhost user=postgres dbname=postgis password=postgres" -sql "SELECT category, rollup, geom FROM conservation_lands.conservation_lands"

And I verified in a python session that that is the string being generated, so I don’t really know what’s up there.

validate --out_table option in CLI

Supplying an --out_table option value with an upper case character will throw an error - all table names are lowercase. Clean the provided option value.
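A fix along these lines would normalize the option value before use (hypothetical helper, not the CLI's actual code):

```python
def clean_table_name(name):
    """Lowercase a user-supplied table name so it matches the
    all-lowercase table names used in the database; strip stray
    whitespace while we're at it."""
    return name.strip().lower()

cleaned = clean_table_name("  My_Output_Table ")  # 'my_output_table'
```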

Add CDFmm land use plan

We don't currently have the CDFmm land use plan; however, the polygons are (almost completely) also covered by other types of designations (mostly mineral reserves, OGMAs, one rec site) that are all in Resource Exclusion Areas, so the areas are accounted for in the current analysis.

process by tile

Process by tile to:

  • enable subdivide of data into smaller pieces without using ST_Subdivide (using ST_Subdivide in combination with ST_SnapToGrid currently leaves small gap artifacts)
  • enable parallel processing

Add name column to extracted data

Harvesting restrictions mapping requires that the feature name be retained in the cleaned tables. (ie PA_NAME etc).

The column to be retained can be noted by adding a name_column_source or similar to sources.csv. However, retaining this in the cleaned tables would require an additional ST_Union when things get added to the output - there are overlapping features within individual sources (with different names).

Whether the name column is retained could perhaps be a switch in the CONFIG dict. Or it could be the default behaviour - it wouldn't change the conservationlands output, just the intermediate layers which are not currently used.

_overlaps layer doesn't have the `category` field populated

I think tidy_designations() isn't working properly on the _overlaps layer, as the national park name isn't cleaned either. Possibly to do with the b prefix used in the 'preliminary output layer' but the c prefix in the 'preliminary overlaps layer'.

north edge of 20k tiles does not match official BC Boundary

The 20k grid/tile layer from DataBC https://catalogue.data.gov.bc.ca/dataset/bcgs-1-20-000-grid has a northern boundary of 60deg N.

The BC Boundary file that defines the provincial border has a northern boundary that is close but not exactly at 60degN, it is generally a few metres farther north.

Since inputs are cut by tile to speed processing, some area is lost. The effect is very minor (especially as there are few if any designated lands in this gap) but the difference is noticeable when calculating total areas.

To fix, nudge the northernmost tile boundaries up by ~250m in this step https://github.com/bcgov/designatedlands/blob/master/sql/create_tiles.sql

topology exceptions

The uwr_conditional_harvest layer probably needs some additional cleaning, adding it to output fails:

INFO:root:Inserting uwr_conditional_harvest into output
Traceback (most recent call last):
  File "conservationlands.py", line 223, in <module>
    cli()
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "conservationlands.py", line 175, in process
    db.execute(sql)
  File "/Volumes/Data/Projects/python/pgdb/pgdb/database.py", line 162, in execute
    return self._get_cursor().execute(sql, params)
  File "/usr/local/lib/python2.7/site-packages/psycopg2/extras.py", line 120, in execute
    return super(DictCursor, self).execute(query, vars)
psycopg2.InternalError: GEOSUnaryUnion: TopologyException: Input geom 0 is invalid: Ring Self-intersection at or near point 1227334.5330000001 1492096.138 at 1227334.5330000001 1492096.138

Python 3, tests

Ensure script is compatible with Python 3, add tests, consider modularizing to facilitate testing and code reuse

Support multi-source designations

Currently, a designation coming from several source files needs to be combined explicitly via tidy_designations.

National Parks is the only designation where this is an issue: https://github.com/bcgov/designatedlands/blob/master/designatedlands/main.py#L68

We could remove this complication and make the code a bit more straightforward by giving all the source layers for the designation the same alias in sources.csv and refactoring read_csv to simplify or remove the a/b/c column prefixes. https://github.com/bcgov/designatedlands/blob/master/designatedlands/util.py#L24

Remove water when dumping to file

For dumping the final file this would just require an additional WHERE clause on the bc_boundary field, but the designatedlands_overlaps table doesn't have that field, so we will need to ensure it's there upstream.

Change category for Forest Rec Sites layer

Currently Forest Rec Sites are included in the 03_Exclude_1_2_Activities category. This designation category should be changed to 04_Managed. Forest Rec Sites cover ~577,090 ha, about 0.61% of terrestrial land in B.C. This change will result in a small to no change in the provincial summaries of land designation totals for the 3 categories (Protected Lands, Resource Exclusion Areas, and Spatially Managed Areas); however, it may result in larger changes to category amounts at smaller spatial scales (e.g. ecoregions).

Fix tiny gaps in output

When the outputs are aggregated across tiles sometimes tiny gaps (<10cm) remain where tile edges were. These are more common in the results of the overlay operations, but also occur in the main output.

It doesn't affect numerical results because the gaps are so small, but you can see them when the polygons have an outline, and they make unnecessarily complex polygons.

capture

Fill in output with the rest of BC

Filing this as an issue so I don't forget.

Add BC non-conservation lands remainder (with marine and terrestrial designation) to the output layer. Code the areas as NULL or non_conservation or similar.

This will make a few things much easier:

  • QA (total area)
  • % calculations for overlay outputs
  • using the script output for other jobs (harvesting restrictions)

To do this, use the generated bc_boundary layer as the final input to the output layer (rather than intersecting the two).

no_preprocess flag

On re-running the script I think it might be simpler for the designatedlands process command to have a --force_preprocess flag rather than --no_preprocess.

Preprocessing is slow and is required at least once - the script could check for existing tables and skip the tiling/prep when the prepped layer already exists. A user that wants to redo the tiling/prep could call --force_preprocess. Calling the preprocess routine by layer might be valuable too but does not seem like a priority.
