trias-project / indicators


📈 Alien species indicators

Home Page: https://trias-project.github.io/indicators/

License: MIT License

CSS 100.00%

indicators invasive-species oscibio r rstats

indicators' People

Contributors

Watchers

Forkers

amyjsdavis

indicators' Issues

Checklist data filters for indicators

This issue describes how data will be filtered to get to the data frame described in #17. Some of these filters will be applied in the unified checklist.

Filters:

Specific checklists
Distribution in Belgium
occurrenceStatus = present
establishmentMeans = introduced
Linked to the taxonomic backbone. This removes valid taxa, but is necessary for the higher classification
Remove genera and higher ranks (i.e. no species information, as is the case for RINSE)
Group by accepted taxon? So we don’t count synonyms of the same species.
Group by species? So we don’t count infraspecific taxa.
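A minimal base-R sketch of the filters listed above, assuming a data frame with hypothetical columns country, occurrenceStatus, establishmentMeans, nubKey, rank and acceptedKey (or speciesKey). Which key to group on is still an open question here, so it is a parameter.

```r
# Hypothetical column names; the filters mirror the list above.
# dedupe_by = "acceptedKey" merges synonyms; "speciesKey" would also merge
# infraspecific taxa into their species.
filter_checklist <- function(df, dedupe_by = "acceptedKey") {
  keep <- df$country == "BE" &
    df$occurrenceStatus == "present" &
    df$establishmentMeans == "introduced" &
    !is.na(df$nubKey) &                                         # linked to backbone
    df$rank %in% c("SPECIES", "SUBSPECIES", "VARIETY", "FORM")  # no genera or higher
  keep[is.na(keep)] <- FALSE                                    # missing value = fail
  out <- df[keep, , drop = FALSE]
  out[!duplicated(out[[dedupe_by]]), , drop = FALSE]
}
```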

natural_dispersal: pathway level1 and level2 not correct

I used the first underscore (_) to separate pathway level 1 from pathway level 2 (see pathway table). This works for all values except natural_dispersal, which is a level 1 pathway and should not be split into two levels.
Thanks to @SanderDevisscher for finding this bug. I am trying to fix it now.
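One possible fix, sketched in base R: split at the first underscore, but treat codes listed as single level 1 pathways (here only natural_dispersal) as indivisible. The function name and the exception list are assumptions for illustration.

```r
# Split pathway codes into level 1 and level 2 at the first underscore,
# except for codes in level1_only, which are level 1 pathways that contain
# an underscore themselves.
split_pathway <- function(pathway, level1_only = "natural_dispersal") {
  pathway <- sub("^.*:", "", pathway)   # drop a prefix like "cbd_2014_pathway:"
  has_sep <- grepl("_", pathway, fixed = TRUE) & !(pathway %in% level1_only)
  data.frame(
    pathway_level1 = ifelse(has_sep, sub("_.*$", "", pathway), pathway),
    pathway_level2 = ifelse(has_sep, sub("^[^_]*_", "", pathway), NA_character_),
    stringsAsFactors = FALSE
  )
}

split_pathway(c("cbd_2014_pathway:escape_horticulture", "natural_dispersal"))
```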

Correctness shapefile Belgian regions coming from our server

As noticed while commenting on #57, we can download shapefiles of Belgian regions from this federal site: https://data.gov.be/en/dataset/fb1e2993-2020-428c-9188-eb5f75e284b9

I have just compared, in R, the shapefiles on our server (as uploaded by @timadriaens) with the downloaded ones.

I see the following:

the Belgian regions shapefiles from the INBO server appear shifted compared with the national Belgian borders (from OpenStreetMap). See figure below:
the Belgian regions shapefiles downloaded from the Federal Government website are not shifted. See figure below:

Of course, I transformed both shapefiles to WGS84 before plotting them with mapview/leaflet.

@timadriaens, can you please check it with GIS software?

(re)appearing species table should evaluate data up to present year

In #48 we found that data are published with a delay of approximately two years, so we limited the analysis to data up to 2017. However, it would be nice to screen for appearing species (which is done by decision rules, see #49 (comment)) up to the present year. This way we can provide a more effective, policy-relevant output.

checklist indicators per taxon, not per species

Be sure to change the checklist indicators so that all of the pipelines count taxa (key), not species (speciesKey).
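A toy illustration, with made-up keys, of why the two counts differ: infraspecific taxa have their own key but share a speciesKey.

```r
# Made-up checklist: taxa 101 and 102 are infraspecific taxa of species 500.
checklist <- data.frame(
  key        = c(101, 102, 103),
  speciesKey = c(500, 500, 600)
)
n_taxa    <- length(unique(checklist$key))         # counts every taxon: 3
n_species <- length(unique(checklist$speciesKey))  # merges infraspecific taxa: 2
```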

occurrenceStatus filter for checklist data indicator

This issue follows up on the part of issue #21 dedicated to occurrenceStatus (point 3).
Based on what @qgroom wrote on that issue (see #21 (comment)) and what he told me today, we decided to change this filter.
I keep taxa whose occurrenceStatus is not ABSENT, EXCLUDED or EXTINCT. In R:

library(dplyr)
df %>% filter(!status %in% c("ABSENT", "EXCLUDED", "EXTINCT"))

@peterdesmet, @LienReyserhove, @timadriaens, anything to add?

Write gbif_download_meta() function

Usage:

gbif_download_meta("../data/output/gbif_downloads.csv")

The script would:

Verify the input and report errors:
- Can it be found?
- Is it a csv file?
- Does the file have a column gbif_download_key?
Use rgbif occ_download_meta() to get status information for all downloads in the list
Add the following columns for all records in the input file:
- gbif_download_created
- gbif_download_status
- gbif_download_doi

Example of a response you'll get: http://api.gbif.org/v1/occurrence/download/0000251-150304104939900
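A minimal base-R sketch of the steps above. It assumes the object returned by rgbif's occ_download_meta() exposes created, status and doi fields, as the example JSON response above does; the meta_fun parameter is a hypothetical addition so the logic can be tried without network access.

```r
# Sketch of gbif_download_meta(): validate the input file, look up each
# download and write created/status/doi back to the same csv.
# meta_fun defaults to rgbif::occ_download_meta() (needs rgbif and network).
gbif_download_meta <- function(path,
                               meta_fun = function(key) rgbif::occ_download_meta(key)) {
  if (!file.exists(path)) stop("Input file not found: ", path)
  if (tolower(tools::file_ext(path)) != "csv") stop("Input is not a csv file: ", path)
  downloads <- read.csv(path, stringsAsFactors = FALSE)
  if (!"gbif_download_key" %in% names(downloads)) {
    stop("Input file has no column gbif_download_key: ", path)
  }
  # NULL-safe extraction of one field from a metadata object
  field <- function(meta, name) {
    value <- meta[[name]]
    if (is.null(value)) NA_character_ else as.character(value)
  }
  metas <- lapply(downloads$gbif_download_key, meta_fun)
  downloads$gbif_download_created <- vapply(metas, field, character(1), name = "created")
  downloads$gbif_download_status  <- vapply(metas, field, character(1), name = "status")
  downloads$gbif_download_doi     <- vapply(metas, field, character(1), name = "doi")
  write.csv(downloads, path, row.names = FALSE)
  invisible(downloads)
}
```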

Update Eu concern species list

@timadriaens found out that the list is no longer up to date.
He will update it by modifying the file data/input/eu_concern_species.tsv.

Document repo structure in README

... to the reduced scope of indicators (rather than the full pipeline).

Adding attribute about "degree of uncertainty"

From an e-mail by @amyjsdavis:

By the way, Diederik and I are very enthusiastic about the data cube and your approach to handling spatial uncertainty. We ask that you consider adding one attribute that identifies whether the grid cell contains at least one presence with an "exact" location, or whether the presences in the grid cell have all been randomly assigned. This attribute will be very useful to our modeling if it turns out that we need to reduce the amount of the uncertainty in our model.
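A sketch of how such an attribute could be derived, assuming each occurrence already carries its assigned cell and a hypothetical logical flag randomly_assigned: a cell gets has_exact = TRUE when at least one of its presences was not randomly placed.

```r
# Hypothetical input: one row per presence, with its grid cell and whether
# its location was randomly assigned within the uncertainty circle.
cell_exactness <- function(occ) {
  # TRUE for a cell if at least one presence in it was not randomly assigned
  has_exact <- vapply(split(!occ$randomly_assigned, occ$cell), any, logical(1))
  data.frame(cell = names(has_exact), has_exact = unname(has_exact),
             stringsAsFactors = FALSE)
}

occ <- data.frame(
  cell = c("1kmE3800N3100", "1kmE3800N3100", "1kmE3801N3100"),  # made-up codes
  randomly_assigned = c(TRUE, FALSE, TRUE)
)
cell_exactness(occ)
```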

GBIF match for modelling species

Notes regarding GBIF match for modelling species:

Podarcis sicula (https://www.gbif.org/species/2469233) is considered a synonym of Podarcis siculus. I propose to update our spelling to Podarcis siculus. OK?
Persicaria wallichii (https://www.gbif.org/species/6391908) is considered a synonym of Koenigia polystachya (https://www.gbif.org/species/8848208), which has a large number of synonyms (see on the left). Do we keep our restricted Persicaria wallichii taxon concept or widen it to Koenigia polystachya?
Aspius aspius (https://www.gbif.org/species/2360181) is considered a synonym of Leuciscus aspius (https://www.gbif.org/species/5851603), which has a number of synonyms. Keep restricted or widen?
Astacus leptodactylus (https://www.gbif.org/species/4417551) is considered a synonym of Pontastacus leptodactylus (https://www.gbif.org/species/8946295), which has 3 synonyms. Keep restricted or widen?
Mimulus guttatus matches two taxa:
- Mimulus guttatus Fisch. (https://www.gbif.org/species/7887942) is considered a synonym of Erythranthe lutea (L.) G.L.Nesom (https://www.gbif.org/species/7730307), which has a number of synonyms.
- Mimulus guttatus Fisch. ex DC. (https://www.gbif.org/species/6070603) is considered a synonym of a different species, Erythranthe guttata (DC.) G.L.Nesom (https://www.gbif.org/species/7346102), which has a number of synonyms.
Which of the two is it? I'll then add that author. Also: keep restricted or widen?

Use only verified data for pipeline

Hi, this came up when checking an unverified, false record of a supposedly new alien species for Belgium in the wnm.be data (Vespa orientalis). Waarnemingen.be and observations.be publish all records, with their IdentificationVerificationStatus, on GBIF (which is OK!). However, for the pipeline, the models, etc., it is imperative that we only use validated occurrences. Therefore, the pipeline needs a step that subsets the data on IdentificationVerificationStatus, keeping only:

approved on expert judgement
approved on photographic evidence
approved on knowledge rules

Perhaps we can do some sort of sensitivity analysis to see how this affects the results (though I'm sure there is no time)...

Question to @damianooldoni @peterdesmet @qgroom @SoVDH, anticipating that many datasets/records on GBIF may not have an IdentificationVerificationStatus at all: what do we do if that field is empty?
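A sketch of the subsetting step in base R. It assumes the Darwin Core column name identificationVerificationStatus and values that match the three listed above exactly; keep_missing is a hypothetical switch for the open question about empty values.

```r
approved_statuses <- c("approved on expert judgement",
                       "approved on photographic evidence",
                       "approved on knowledge rules")

# keep_missing = TRUE also keeps records with an empty or missing status
filter_verified <- function(occ, keep_missing = FALSE) {
  status <- occ$identificationVerificationStatus
  ok <- status %in% approved_statuses          # exact, case-sensitive match
  if (keep_missing) ok <- ok | is.na(status) | status == ""
  occ[ok, , drop = FALSE]
}
```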

Indicator: number of new introductions of alien species per year in Belgium

Description

Temporal trends in first-record rates of alien species (number of first records per year or per x-year interval), based on Seebens et al. 2017. This indicator provides information on the number of new introductions over time, for instance the rate of increase of alien species introductions and the accumulation rate of alien species (Rabitsch et al. 2016). The information will be updated and refined as the checklist is further supplemented.

Data needs

Country-level checklist
Checklist-based
Breakdowns:
- environment (marine, terrestrial, freshwater)
- taxon (kingdom --> family)
- native range
- pathway (level 1 and level 2 CBD)
- invasion stage (introduced, established)
- source checklist
- distribution (country-level: Belgium)

Data output

The data retrieved from GBIF would be organized in a (tidy) data.frame with the following columns:

key	nubKey	scientificName	datasetKey	species	genus	family	order	class	phylum	kingdom	rank	speciesKey	taxonomicStatus	acceptedKey	accepted	locationId	locality	country	status	first_observed	last_observed	establishmentMeans	native range	origin	invasion stage	habitat	pathway_level1	pathway_level2

These data should already be filtered as explained in #21.
This output will serve as input for the plots, based on group_by()-like pipes, e.g.

library(dplyr)
df %>%
  group_by(class) %>%
  count()

Because of the large number of possible combinations of breakdowns, we suggest wrapping these series of group_by() calls in a plotting function.
See issue #18 about temporal information (first_observed and last_observed) in checklists.
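As a sketch of such a function, the base-R version below counts distinct taxa (key) by the year of first_observed, or by x-year interval, for one breakdown column. The function name and the interval binning are illustrative assumptions.

```r
# Count new introductions (distinct taxa by first record) per time bin and
# breakdown. interval = 1 gives yearly counts; interval = 10 gives decades.
count_first_records <- function(df, breakdown = "kingdom", interval = 1) {
  df <- df[!is.na(df$first_observed), ]
  period <- (df$first_observed %/% interval) * interval
  counts <- aggregate(list(n = df$key),
                      by = list(period = period, group = df[[breakdown]]),
                      FUN = function(k) length(unique(k)))
  counts[order(counts$group, counts$period), ]
}
```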

Add description of decision rules to indicators page

The decision rules described in the documentation of apply_decision_rules() from the trias package should be added to https://trias-project.github.io/indicators/07_occurrence_indicators_modelling.html#33_decision_rules

To be done during the update of the indicators before the end of 2020.

Not all taxa downloaded

According to the download list, the last GBIF download (https://doi.org/10.15468/dl.9unif7) used the eu_concern_species.tsv list to query for taxa. However, only 38 of the 49 species were queried. These 11 are missing:

checklist_scientificName	backbone_taxonKey
Alopochen aegyptiacus	2498252
Elodea nuttallii	5329212
Gunnera tinctoria	2984306
Heracleum mantegazzianum	3034825
Heracleum persicum	3628745
Myriophyllum aquaticum	5361785
Myriophyllum heterophyllum	5361762
Parthenium hysterophorus	3086784
Pennisetum setaceum	2706134
Persicaria perfoliata	4033648
Pueraria lobata	9035634

@damianooldoni, can you figure out what might cause this? Is it a character limit on the query string in the URL?

Terminology consistency between pipelines

At this stage of development, it is important to use the same terms to refer to taxa in all pipelines.
Do we use nubKey and taxonomicStatus as in pipeline get_taxa, or gbif_taxonKey and gbif_species_status as in pipeline occurrence?
I would personally opt for nubKey and taxonomicStatus, as used in the checklists.
Consequently, I would also rename the columns of the input file eu_concern_species.tsv.

species on checklist without occurrences

New species on one of the checklists without any occurrences should still appear in the list as "appearing". An example is the recently discovered Procambarus acutus, which has meanwhile been added to the macroinvertebrates checklist but does not yet appear in any of our datasets on GBIF (pending publication of the records...). @damianooldoni, is it possible to retrieve those in the indicator pipeline, e.g. for the separate list of appearing species?
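One simple way to do this, sketched with hypothetical inputs: compare the taxon keys on the checklists with the keys that have at least one occurrence, and flag the difference as appearing.

```r
# checklist_keys: taxon keys on the checklists (hypothetical input)
# occurrence_keys: taxon keys with at least one occurrence (hypothetical input)
flag_taxa_without_occurrences <- function(checklist_keys, occurrence_keys) {
  no_occ <- setdiff(unique(checklist_keys), unique(occurrence_keys))
  data.frame(taxonKey = no_occ,
             status = rep("appearing", length(no_occ)),
             stringsAsFactors = FALSE)
}
```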

TaxonKeys don't match

When trying to match taxon data from GBIF (using code described here) with distribution data from the include-distribution-regions branch, the taxonKeys do not seem to match.

Below is a subset of 2 species to illustrate the issue:

Species	taxonKey in TaxonData	taxonKey in Distributions
Procambarus clarkii	152543866	140563018
Vespa velutina	152544481	148438120

PS: I used the following code to read the distribution data:

library(readr)
distributions_unified <- read_csv("https://raw.githubusercontent.com/trias-project/unified-checklist/include-distribution-regions/data/interim/distributions_unified.csv")

Common species to test for AOO

@damianooldoni FYI:

Species	taxon_key	records in BE	type
Pieris rapae	1920496	175197	butterfly
Pararge aegeria	8049830	145023	butterfly
Vanessa atalanta	1898286	153714	butterfly
Aglais io	4535827	130204	butterfly
Anas platyrhynchos	9761484	101445	bird
Rutilus rutilus	2359706	90766	fish
Cirsium arvense	3113414	40449	plant

How to retrieve the number of occupied cells at a specific rank level

Based on the discussion in #46, we would like to know how many of our cells are occupied at a certain rank level (in our case we thought about kingdom, but the issue extends easily to any other rank). As we work at 1 km scale (~50k cells) over a time span of, say, around 50 years, querying cell by cell and year by year would mean on the order of 2.5 million queries to GBIF, which is not recommended (see the interesting discussion here: ropensci/rgbif#320).
So, how can we get these aggregated(!) numbers quickly and without sending thousands of queries?
Given how fast the GBIF observation trends tool works, I think a solution is within reach.
This is the link to the repo behind the tool: https://github.com/gbif/species-population.

Moreover, GBIF already has an API for this and was writing the documentation for it: see gbif/species-population#6. However, that issue is quite old, so their plans have very likely changed.

Analyzing the requests my computer makes while searching for Branta canadensis against the higher taxon Anatidae, I see some requests for binary files with the .mvt extension:

https://api.gbif.org/v2/map/occurrence/regression/3/3/2.mvt?higherTaxonKey=2986&minYears=10&taxonKey=5232437&year=1970,2015

and if we zoom a lot:

https://api.gbif.org/v2/map/occurrence/density/7/65/42.mvt?srs=EPSG:3857&basisOfRecord=OBSERVATION&basisOfRecord=HUMAN_OBSERVATION&basisOfRecord=MACHINE_OBSERVATION&basisOfRecord=MATERIAL_SAMPLE&basisOfRecord=PRESERVED_SPECIMEN&higherTaxonKey=2986&minYears=10&taxonKey=5232437&year=1970,2015

Notice the /v2 instead of /v1, the standard GBIF API version. So, the tool loads some binary files (downloadable, just click on the links) and queries them. The path very likely encodes the zoom level and the geographical area (tile). I have tried to open one of these files in R via readBin(), but it didn't work. I am afraid I need to know the file structure beforehand: the .mvt extension suggests the Mapbox Vector Tile format, which is encoded with Protocol Buffers.
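If the tiles follow the common XYZ ("slippy map") scheme for EPSG:3857 (an assumption, though the numbers below are consistent with it), the path /{zoom}/{x}/{y}.mvt can be computed from a longitude/latitude with the standard Web Mercator formula:

```r
# Standard XYZ tile indices for Web Mercator (EPSG:3857).
lonlat_to_tile <- function(lon, lat, zoom) {
  n <- 2^zoom                       # number of tiles along each axis
  lat_rad <- lat * pi / 180
  x <- floor((lon + 180) / 360 * n)
  y <- floor((1 - log(tan(lat_rad) + 1 / cos(lat_rad)) / pi) / 2 * n)
  c(x = x, y = y)
}

lonlat_to_tile(4.35, 50.85, 7)      # Brussels at zoom 7
```

For Brussels this gives x = 65, y = 42, the same tile as the zoomed-in density URL above.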

Another option: in the issue cited above (ropensci/rgbif#320), Tim Robertson mentioned an SQL API that is under development. Is it the right alternative to get what we want?

@stijnvanhoey, @peterdesmet: what do you think we can implement ourselves and what belongs on the GBIF side? Thanks!

Add date to new taxa in EU concern species list

New species have been added to the EU concern list, and the file data/input/eu_concern_species.tsv has been updated. However, the field entry_into_force is still empty, as the new version of the list has not been officially published yet. See #51.
@timadriaens: please add the date for the missing taxa once it is available. Thanks.

Discuss common data cleaning pipeline

It seems WP3 and WP4 both need to clean GBIF occurrence data.
We should:

Discuss a common data filtering strategy.
Create a pipeline dedicated to data cleaning

How to tackle uncertainty due to range of year_of_introduction variable

Improve GAM output visualization

How can we make plots with covariates clearer?

More in @ToonVanDaele's repo: ToonVanDaele/trias-test#10

Structure of data output for checklist indicators

The data output we get by merging taxonomic information with the distribution and description extensions is only "fake" tidy: most of the description is stored following the Entity-Attribute-Value (EAV) model (thanks @stijnvanhoey for finding the right term for it).

For each taxon it looks like the table below:

taxonKey	type	description
141264591	pathway	cbd_2014_pathway:escape_horticulture
141264591	native range	Southern America (WGSRPD:8)
141264591	origin	vagrant

I already tidy the pathways by creating two new columns, pathway_level1 and pathway_level2:

taxonKey	type	description	pathway_level1	pathway_level2
141264591	pathway	cbd_2014_pathway:escape_horticulture	escape	horticulture
141264591	native_range	Southern America (WGSRPD:8)	NA	NA
141264591	origin	vagrant	NA	NA

However, I think this half-tidy approach can create confusion, because it mixes EAV and non-EAV (conventional) representations!

Would it be better to have the description data 100% tidy, like the table below?

taxonKey	pathway_level1	pathway_level2	origin	native_range
141264591	escape	horticulture	vagrant	Southern America (WGSRPD:8)

I think it would be more understandable, and it would make the plotting workflow easier. What do you think, @stijnvanhoey, @SanderDevisscher and @Yasmine-Verzelen? If you agree, I will implement it in the workflow and export a new test data.frame so that you can all use it asap. Thanks.
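A base-R sketch of the proposed reshaping from the EAV layout (taxonKey, type, description) to one column per type. A taxon can have several rows of the same type (e.g. several pathways); this sketch simply collapses them into one string, which is an assumption, not a decision.

```r
# Pivot EAV rows into one column per type; multiple values of the same type
# for one taxon are collapsed into a " | "-separated string.
eav_to_wide <- function(eav) {
  keys <- unique(eav$taxonKey)
  wide <- data.frame(taxonKey = keys)
  for (t in as.character(unique(eav$type))) {
    sub <- eav[eav$type == t, ]
    collapsed <- vapply(split(as.character(sub$description), sub$taxonKey),
                        paste, character(1), collapse = " | ")
    wide[[t]] <- unname(collapsed[as.character(keys)])  # NA if taxon lacks type
  }
  wide
}
```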

Modelling species occurrences

Download based on the modelling species is available at https://doi.org/10.15468/dl.6cljf9

No occurrences

No occurrences were found in Belgium for:

Under other species name

The following taxa are listed under a different species name (i.e. the name of the species GBIF considers accepted). The scientific name is still the original one, and only occurrences for the name we looked for were returned, as intended:

Astacus leptodactylus → Pontastacus leptodactylus
Mimulus guttatus Fisch. ex DC. → Erythranthe guttata

Synonyms that are also included

To see which synonyms were lumped in with the download, check this file (yellow rows): modelling_species_in_download.xlsx

Note: for most (all?) of these, it's a good thing that they get lumped. For example, there are no occurrences in Belgium published under Neovison vison: all are published as Mustela vison, which we wouldn't get if GBIF didn't look for synonyms.

Let me know if you consider all the above acceptable.

Write Rmd that explores grid size effect on AOO

As discussed with @damianooldoni, he will create an Rmd named grid_size_effect.Rmd

Effect of grid size on AOO (area of occupancy)

Takeaway: AOO for 1km2 < 10km2 < 100km2
Let's take the most precise grid, 1 km
Problem: much of the data was collected on a larger grid
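To make the takeaway concrete: AOO is the number of occupied cells times the cell area, so coarser grids give larger values. A minimal sketch, assuming projected coordinates in metres and made-up points:

```r
# AOO in km2 = number of distinct occupied cells x cell area.
aoo_km2 <- function(x, y, cell_m) {
  cells <- unique(paste(floor(x / cell_m), floor(y / cell_m)))
  length(cells) * (cell_m / 1000)^2
}

# Made-up points: three occurrences 1 km apart along one row.
x <- c(500, 1500, 2500)
y <- c(500, 500, 500)
aoo <- sapply(c(`1km` = 1000, `10km` = 10000, `100km` = 100000),
              function(s) aoo_km2(x, y, s))
aoo  # 3, 100, 10000
```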

Solution 1: downscaling

Resample data from the larger grid to a smaller grid (many methods exist)
Choose a simple ensemble to get the number of occupied 1 km cells
Compare with the values before

Solution 2: records have uncertainty

A lot of data is collected on a larger grid, and that information is reported in coordinateUncertaintyInMeters, which is the radius of the grid cell
Plot the coordinate uncertainty circle on the grid (figure)
Ignore all occurrences with uncertainty > 10 km
Randomly assign occurrence points to the grid cells in the overlapping area
Count the new number of occupied 1 km cells
Compare with the values before
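A sketch of Solution 2 in base R, assuming projected coordinates in metres. Drawing one point uniformly inside each uncertainty circle picks each overlapping 1 km cell with probability proportional to its overlap area, which is one way to implement the random assignment step.

```r
occupied_cells_circle <- function(x, y, uncertainty_m, cell_m = 1000,
                                  max_uncertainty_m = 10000) {
  uncertainty_m[is.na(uncertainty_m)] <- 0    # assumption: missing = exact
  keep <- uncertainty_m <= max_uncertainty_m  # ignore uncertainty > 10 km
  x <- x[keep]; y <- y[keep]; r_max <- uncertainty_m[keep]
  # sqrt() makes the point uniform over the disc area, not just the radius
  r <- r_max * sqrt(runif(length(x)))
  theta <- runif(length(x), 0, 2 * pi)
  px <- x + r * cos(theta)
  py <- y + r * sin(theta)
  length(unique(paste(floor(px / cell_m), floor(py / cell_m))))
}
```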

For a selection of species, make:

Table:

Year	Occ	Coordunc	100km	10km	1km	1km (downscaling)	1km (circle based)
2011	count	mode	% (count)	% (count)	% (count)	% (count)	% (count)
2012	count	mode	% (count)	% (count)	% (count)	% (count)	% (count)
2013	count	mode	% (count)	% (count)	% (count)	% (count)	% (count)
2014	count	mode	% (count)	% (count)	% (count)	% (count)	% (count)
2015	count	mode	% (count)	% (count)	% (count)	% (count)	% (count)
Totals	total	mode	count	count	count	count	count

Chart:

Comparing % (AOO) of 1km, 1km (downscaling), 1km (circle based) per year

Where to find world wide grid at 1x1km level

@amyjsdavis, @DiederikStrubbe: is there a link where I can download a worldwide grid with a cell size of 1x1 km? This is the link for EU countries, which you both already know, but I can't find anything similar at world level. For Belgium, I use these shapefiles combined with the function st_read() from the sf package, which is very flexible about file formats. If no link is available, do you perhaps have something else I can use? Thanks!

Data publishing delay assessment

Assessing the data publishing delay is very important for a correct use of segmented regression or any other analysis of the emerging status of alien species. If we see a marked decrease in data publication, then we can assume that the decrease in the number of occurrences at species level is not realistic. An easy but effective way to detect it is to get the number of occurrences with geographic coordinates in Belgium for each kingdom over the last years.

You can also search manually on the GBIF website. Below is a link to all occurrences with:

year = 2017
kingdom = Animalia
hasCoordinate=TRUE
country=BE
https://www.gbif.org/occurrence/search?has_coordinate=true&year=2017&kingdom_key=1&country=BE
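The same counts can be requested from the GBIF occurrence search API, which returns only the total when limit=0. A sketch that builds one URL per kingdom and year; the kingdom keys (Animalia 1, Fungi 5, Plantae 6) are GBIF backbone keys, and the year range is an arbitrary example.

```r
# One API URL per kingdom and year; the JSON response has a "count" field.
gbif_count_url <- function(kingdom_key, year, country = "BE") {
  paste0("https://api.gbif.org/v1/occurrence/search",
         "?country=", country,
         "&year=", year,
         "&kingdomKey=", kingdom_key,
         "&hasCoordinate=true&limit=0")
}

grid <- expand.grid(kingdom_key = c(1, 5, 6), year = 2010:2018)
grid$url <- gbif_count_url(grid$kingdom_key, grid$year)
# With network access and jsonlite installed:
# grid$n <- sapply(grid$url, function(u) jsonlite::fromJSON(u)$count)
```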

Below are the graphs. You can make them on your laptop using the code in this gist.

You can see that two years is a good threshold for the publishing delay. Only Fungi do better in Belgium. With the popularity of citizen science projects like iNaturalist, this delay will eventually decrease in the future. For now, though, 2016 seems to be the last year to use for segmented regression. At the end of 2019, we could then include 2017.

@qgroom, @timadriaens, @ToonVanDaele, what do you think?

Replace occupancy with AoO on indicator graphs and in workflows

Hi, just a detail about the Y-axis label of the occurrence indicator graphs: occupancy is mostly used for the probability that a site is occupied (expressed between 0 and 1), cf. site occupancy models. What we look at in TrIAS is Area of Occupancy, in the way IUCN uses it to quantify the range size of species. This is an interesting paper about the concept.

@damianooldoni we should probably change the legend for the Y axis to "Area of Occupancy (km2)" to avoid confusion.

basic data to display in Harmonia

It would be nice to have some very basic data in Harmonia on which the more advanced indicators/indexes are based. This always helps interpreting trend graphs. The ones I can think of are:

the number of occurrences per year
the number of km squares per year

I believe these are currently 'byproducts' of the pipeline; perhaps we should think of a suitable output (a simple bar plot).

From occurrences and AreaOfOccupancy to an emerging species decision rule at year level

This issue describes a part of the general workflow for assessing the emerging status of alien species, as discussed on Friday, 15 Feb 2019 by @damianooldoni , @timadriaens and @ToonVanDaele .

Input data

We start from the output of the occ-processing repository, called cube_belgium.csv, as mentioned in trias-project/occ-cube-alien#3. This file contains occurrences with (at least) the following key columns:

taxonKey
speciesKey
kingdomKey
year
CELLCODE (grid cell id from the European Environment Agency reference grid)

Grouping by speciesKey and year, we get the number of occurrences per year (x: year, y: n_occs). We work at year level; no finer temporal information is used. The research effort bias of the area of occupancy (AOO) is already corrected at this stage (for details about research effort bias correction, see #46). Working at species level may not always be appropriate; this issue is discussed separately (see trias-project/unified-checklist#35).
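A base-R sketch of this grouping, which also fills years without occurrences with zeros for the segmented regression. It assumes the cube has an occurrence count column n in addition to the key columns listed above; that column name is an assumption.

```r
# Per-species yearly series: n_occs (sum of the cube's counts) and n_cells
# (distinct CELLCODEs, i.e. AOO in number of cells). Missing years -> zero.
species_year_series <- function(cube, first_year = 1950, last_year = NULL) {
  if (is.null(last_year)) last_year <- max(cube$year)
  cube <- cube[cube$year >= first_year & cube$year <= last_year, ]
  groups <- list(speciesKey = cube$speciesKey, year = cube$year)
  n_occs <- aggregate(list(n_occs = cube$n), by = groups, FUN = sum)
  n_cells <- aggregate(list(n_cells = cube$CELLCODE), by = groups,
                       FUN = function(cells) length(unique(cells)))
  ts <- merge(n_occs, n_cells, by = c("speciesKey", "year"))
  full <- expand.grid(speciesKey = unique(ts$speciesKey),
                      year = seq(first_year, last_year))
  ts <- merge(full, ts, all.x = TRUE)
  ts[is.na(ts$n_occs), c("n_occs", "n_cells")] <- 0
  ts[order(ts$speciesKey, ts$year), ]
}
```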

AOO and occurrences are time series (x: year, y: occurrences or y: AOO).
Although data before 1950 may be available, we start the analysis in 1950, the birth date of invasion ecology (quoting @timadriaens 😃).

Limit cases

No occurrences for a species: no analysis possible. Very unlikely, but still possible. However, these species MUST be in the final list for Risk Assessment.
Occurrences only in the last valid year (due to the delay in data publishing; see #48): insufficient data for analysis. However, these species MUST be in the final list for Risk Assessment.
Occurrences only in one of the very last years: insufficient data for analysis. However, these species MUST be in the final list for Risk Assessment. How many years we should consider is still unclear. However, the window should not reach too far into the past, as a long absence of observations may simply indicate that the species is not emerging at all.

Segmented regression

After extracting the limit cases, we set occ and AOO to zero for years without occurrences, as only years with occurrences are present in the cube. Segmented regression will be applied to the AOO and occ time series separately. For each of the two time series and for each year, the slope of the last segment and its confidence interval are then converted into a categorical variable. Three situations are possible:

Slope is positive: occ/AOO is increasing.
Slope is zero (zero is within slope's confidence interval): occ/AOO is stable.
Slope is negative: occ/AOO is decreasing.
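Turning the confidence interval of the last-segment slope into these three categories can be sketched as follows (how the interval is obtained from the segmented regression is left open):

```r
# CI entirely above zero -> increase; entirely below zero -> decrease;
# zero inside the CI -> stable.
classify_trend <- function(ci_lower, ci_upper) {
  ifelse(ci_lower > 0, "increase",
         ifelse(ci_upper < 0, "decrease", "stable"))
}
```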

Emerging decision table at year level

For each year and species, we can then apply a decision table to define the emerging status of the species:

AOO	n. occurrences	emerging status
decrease	decrease	not emerging
decrease	stable	not emerging
decrease	increase	potentially emerging
stable	decrease	not emerging
stable	stable	not emerging
stable	increase	potentially emerging
increase	decrease	potentially emerging
increase	stable	potentially emerging
increase	increase	emerging
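The decision table can be encoded as a lookup matrix, sketched here in base R with the labels used above:

```r
# Rows: AOO trend; columns: trend in number of occurrences.
trends <- c("decrease", "stable", "increase")
emerging_table <- matrix(
  c("not emerging",         "not emerging",         "potentially emerging",
    "not emerging",         "not emerging",         "potentially emerging",
    "potentially emerging", "potentially emerging", "emerging"),
  nrow = 3, byrow = TRUE, dimnames = list(aoo = trends, occ = trends)
)

# Vectorised lookup using a character index matrix matched on dimnames
emerging_status <- function(aoo_trend, occ_trend) {
  unname(emerging_table[cbind(aoo_trend, occ_trend)])
}
```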

This will end up in an output like this:

species	year	emerging status
A	1950	potentially emerging
...	...	...
A	2012	not emerging
A	2013	pot. emerging
A	2014	pot. emerging
A	2015	pot. emerging
A	2016	emerging
...	...	...
B	2012	not emerging
B	2013	pot. emerging
B	2014	emerging
B	2015	emerging
B	2016	not emerging

Next steps: how do we aggregate these yearly emerging labels to estimate the overall emerging status of a species?
My two cents: as our analysis is future-oriented, the emerging status in the recent past should definitely weigh more in the final decision than the status in the distant past.
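One hypothetical way to implement this weighting: map the labels to scores and take a weighted mean with exponentially decaying weights, so the most recent year weighs most. The scores and the decay factor are arbitrary choices for illustration.

```r
status_score <- c("not emerging" = 0, "potentially emerging" = 1, "emerging" = 2)

aggregate_emerging <- function(years, status, decay = 0.5) {
  w <- decay^(max(years) - years)   # most recent year gets weight 1
  sum(w * status_score[status]) / sum(w)
}

aggregate_emerging(2014:2016, c("not emerging", "potentially emerging", "emerging"))
```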

@ToonVanDaele , @timadriaens : please comment if you think I missed something or you have new thoughts about it.

incorrect calculation of ncells ?

indicators/src/06_occurrence_indicators_appearing_taxa.Rmd

Line 103 in 0d0b9b5

ncells = sum(pa_cobs),

Shouldn't this be ncells = sum(pa_obs)?

From `type`/`description` column combination (Entity-Attribute-Value style) to individual columns

I propose to reshape the current columns type and description in the preprocessing phase (after the GBIF download and before scripting the indicators) from:

type	description	key
native range	cultivated origin	141264581
origin	introduced	141264581
pathway	cbd_2014:escape_horticulture	141264581
native range	temperate Asia	141264583
origin	vagrant	141264583
pathway	cbd_2014:escape_horticulture	141264583

to three separate columns:

native range	origin	pathway	key
cultivated origin	introduced	cbd_2014:escape_horticulture	141264581
temperate Asia	vagrant	cbd_2014:escape_horticulture	141264583

This will simplify the implementation of #17 and #20 and require minimal adaptation to #19...

Write gbif_download() function

Usage:

gbif_download(
    taxa="https://raw.githubusercontent.com/trias-project/alien-plants-belgium/afbd2805de77afd79fb74669c403d40f1416661b/data/processed/taxon.csv",
    country="BE", # default
    output="../data/output/gbif_downloads.csv" # default
)

The script would:

Verify the parameter checklist and report errors:
- Can it be found?
- Is it a csv file?
- Is it a GitHub commit URL?
- Does the file have a column gbif_nubKey?
- Does gbif_nubKey contain numbers only?
Verify the option country and report errors:
- Is it a valid ISO 3166-1 alpha-2 code?
Verify the option output and report errors:
- Does the file exist?
Use rgbif to trigger a download
Write information to data/output/gbif_downloads.csv:
- gbif_download_key: GBIF download UUID
- input_checklist: checklist (parameter)
- input_country: country (parameter)
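A sketch of the validation checks as small base-R helpers (hypothetical names), each returning error messages. Reading the checklist over the network and triggering the rgbif download are left out, and the country check only tests the two-letter format, not membership in the ISO 3166-1 list.

```r
check_checklist_url <- function(url) {
  errors <- character(0)
  if (!grepl("\\.csv$", url)) errors <- c(errors, "checklist is not a csv file")
  # raw.githubusercontent.com/<owner>/<repo>/<40-character commit SHA>/...
  commit_url <- "^https://raw\\.githubusercontent\\.com/[^/]+/[^/]+/[0-9a-f]{40}/"
  if (!grepl(commit_url, url)) errors <- c(errors, "checklist is not a GitHub commit URL")
  errors
}

check_checklist_columns <- function(taxa) {
  if (!"gbif_nubKey" %in% names(taxa)) return("column gbif_nubKey is missing")
  keys <- taxa$gbif_nubKey[!is.na(taxa$gbif_nubKey)]
  numbers_only <- if (is.numeric(keys)) all(keys %% 1 == 0) else all(grepl("^[0-9]+$", keys))
  if (!numbers_only) return("gbif_nubKey does not contain numbers only")
  character(0)
}

check_country <- function(country) {
  # Format check only; membership in the ISO 3166-1 alpha-2 list is not tested
  if (grepl("^[A-Z]{2}$", country)) character(0) else "country is not an ISO 3166-1 alpha-2 code"
}

check_output <- function(output) {
  if (file.exists(output)) character(0) else "output file does not exist"
}
```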

Hedera subspecies problem

Hi, when checking the emergence status of Atlantic ivy (Hedera hibernica), I noticed differences in taxon matching across several datasets:

Florabank matches with Hedera helix subsp. hibernica (Poit.) D.C.McClint. (gbif key 6307044), considered a synonym by GBIF
observations.be matches with Hedera hibernica (G.Kirchn.) Carrière (gbif key 8168344), whereas in obs.be itself the taxon name is Hedera helix subsp. hibernica (Kirchner) McClintock

In GRISS Belgium, there is only Hedera hibernica (G.Kirchn.) Bean (as in the Manual of Alien Plants) (gbif key 8410115).

Fact: 3 different GBIF keys for the same taxon. Consequence: the species does not pop up as emerging in the TrIAS indicator workflow, whilst probably every field person will say it is clearly emerging.

Solution?
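One possible workaround, not a decided solution: map all GBIF keys that refer to the same taxon concept onto one group key before computing the indicators. The mapping table and the choice of the GRISS key as the group key are assumptions.

```r
# The three keys mentioned above, mapped onto one hypothetical group key.
key_map <- data.frame(taxonKey  = c(6307044, 8168344, 8410115),
                      group_key = 8410115)

# Keys without a mapping keep their own taxonKey as group key.
lump_taxa <- function(occ, key_map) {
  idx <- match(occ$taxonKey, key_map$taxonKey)
  occ$group_key <- ifelse(is.na(idx), occ$taxonKey, key_map$group_key[idx])
  occ
}
```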

visualizations of pathways

Some ideas to do more, visually, with the introduction pathways of alien species in the checklist, based on a similar exercise by Van Wilgen & Wilson 2017 (The status of Biological Invasions and their management in South Africa). I know that for TrIAS we have opted for a tabular view, but this is just to show some possibilities for future use; I am planning to code these graphs for NARA-T, see this issue:

Number of species per pathway (ranked by CBD level 1 and level 2)

evolution of pathways in time (based on first introduction date)

pathways for subset of policy relevant species

It would be nice to draft the pathway 'indicator' table for the species of the Union list as this links directly to policy on action plans etc.

GitHub cannot render tsv because of double quotes in scientificName

On https://github.com/trias-project/pipeline/blob/master/data/output/checklist_taxa.tsv GitHub indicates that it cannot display the file nicely because "Illegal quoting in line 2710.". That line contains:

Sedum "Autumn Joy" (S. spectabile Boreau x telephium L.)

We could set quote=TRUE for the output, but I think that is going to unnecessarily quote everything and might not even resolve the issue. @damianooldoni can you test this on a branch?
At some point the file will be too big to display on GitHub anyway.
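A base-R test to try on the branch: quote only the scientificName column instead of everything, escaping embedded quotes by doubling them (the csv convention). Whether GitHub then renders the file was not checked here.

```r
taxa <- data.frame(
  scientificName = 'Sedum "Autumn Joy" (S. spectabile Boreau x telephium L.)',
  taxonKey = 1L,                       # placeholder value, for illustration only
  stringsAsFactors = FALSE
)
out_file <- tempfile(fileext = ".tsv")
# quote = column index: only scientificName (and the header) is quoted;
# qmethod = "double" writes an embedded " as ""
write.table(taxa, out_file, sep = "\t", row.names = FALSE,
            quote = which(names(taxa) == "scientificName"),
            qmethod = "double")
readLines(out_file)
```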

Indicator: cumulative number of alien species

Description

This indicator measures the trends in all alien species introductions (publication of first observations). At the national level, this indicator is useful to measure trends in the presence/occurrence of alien (and potentially invasive) species and to inform decisions on preventing alien species introductions and on managing and controlling invasive species that impact biodiversity and ecosystems. It is based on the same information as #17, but is an alternative representation in line with international policy indicators on (invasive) alien species, such as the EU headline indicators, SEBI, Aichi and IPBES.

Data needs and data output are the same as #17.

Visualisation

A line plot with colours for the breakdowns is envisaged. Visualizing uncertainty due to the use of time periods is discussed in #18.
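A base-R sketch of the cumulative series, assuming one row per taxon (already filtered and deduplicated) with a first_observed year and a breakdown column:

```r
# Cumulative number of alien taxa per year and breakdown group; years
# without new records keep the running total.
cumulative_taxa <- function(df, breakdown = "kingdom") {
  df <- df[!is.na(df$first_observed), ]
  years <- seq(min(df$first_observed), max(df$first_observed))
  per_group <- lapply(split(df, df[[breakdown]]), function(d) {
    n_new <- as.vector(table(factor(d$first_observed, levels = years)))
    data.frame(year = years, n_new = n_new, cumulative = cumsum(n_new))
  })
  res <- do.call(rbind, Map(function(g, d) cbind(group = g, d),
                            names(per_group), per_group))
  rownames(res) <- NULL
  res
}
```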

indicator on number of occurrences in protected areas

Another very interesting indicator that we have not touched upon yet because of time constraints is the proportion of occurrences of an alien species in protected areas. This is imo a very important one that can directly inform risk assessment but also risk management evaluations.

Ideally, this is a geographic subset of the weighted trend, but we need not make things too complicated. So, based on the Area of Occupancy we already have, can we intersect it with the N2000 network for Belgium (the official delimitation is on our GIS server, but it is probably best to take the one on the EEA website)? It could be reported per year (e.g. since 2004, when Europe agreed on the Belgian N2000 areas), which is more interesting internationally and for other countries.
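A sketch of the per-year proportion, assuming a hypothetical vector of CELLCODEs that intersect N2000 sites, prepared beforehand in a GIS; each occupied cell is counted once per year.

```r
# occupied: rows with year and CELLCODE for one species (hypothetical input)
# n2000_cells: CELLCODEs intersecting Natura 2000 sites (hypothetical input)
prop_in_n2000 <- function(occupied, n2000_cells) {
  cells <- unique(occupied[, c("year", "CELLCODE")])
  in_n2000 <- cells$CELLCODE %in% n2000_cells
  aggregate(list(prop_n2000 = in_n2000), by = list(year = cells$year), FUN = mean)
}
```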

Compensating research effort bias for occupancy: temporal solution vs spatial solution?

The goal of this issue is to discuss the best way to compensate for the research effort bias. Based on interesting discussions with @qgroom and @timadriaens, I am working on two different ideas:

Temporal solution

This issue starts from point 4 of @qgroom's comment on issue 40:

Occupancy values are clearly sensitive to research effort. To improve them, we need to aggregate over years. The (lack of) research effort is also a source of underestimation of occupancy: some areas are surveyed in a certain year, others in other years. So, the question is: how many years should we aggregate to get the optimal aggregation span? To find out, we should plot occupancy vs. number of aggregated years (aggregation span). Hopefully we get the same curve as the curve of occurrences vs. research effort (search the literature for references), i.e. a kind of saturation point. Do it species by species (year by year): the saturation point will probably differ among taxa. That would obviously be a problem, as we aim to keep it simple, i.e. using a single aggregation span. Of course, we will have to make a decision in the end (the number of years used for temporal aggregation should not be too high, otherwise we lose policy relevance), but investigation is needed.
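The occupancy vs. aggregation span curve for one species and end year can be sketched as follows, assuming occurrence rows with year and CELLCODE columns; a saturation point would show up as the values levelling off with increasing span.

```r
# Number of distinct occupied cells over the k years ending in end_year,
# for k = 1, ..., max_span.
occupancy_by_span <- function(occ, end_year, max_span = 10) {
  sapply(seq_len(max_span), function(k) {
    in_window <- occ$year > end_year - k & occ$year <= end_year
    length(unique(occ$CELLCODE[in_window]))
  })
}
```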

I am investigating the research effort bias correction on branch research_effort_bias_corrction, more specifically in ./src/_research_effort_bias.Rmd.

Plots of occupancy vs. time window from 2007 to 2018 do not show a clear saturation curve valid for all species and all years. I think correcting the research effort bias along the temporal dimension will not be effective, as we also want to use the temporal dimension to detect changes in occupancy.

Spatial solution

As an alternative, @timadriaens and I discussed yesterday working on the spatial scale instead. We can calculate yearly occupancy by dividing the number of occupied cells at species level by the number of occupied cells at kingdom level, instead of by the number of cells in Belgium. This way we remove all cells showing no research effort at all. I am still working on making this calculation feasible (technical discussion opened in #47).
Meanwhile, ideas and comments about temporal and/or spatial solution are welcome!
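The spatial correction can be sketched in a few lines of Python. The sketch assumes that both the species and its kingdom are given as (year, cell code) records. The denominator for each year is the set of cells with at least one record of that kingdom, which serves as a proxy for cells with some research effort. Names are hypothetical.

```python
from collections import defaultdict


def cells_by_year(records):
    """Map each year to the set of grid cells with at least one record."""
    out = defaultdict(set)
    for year, cell in records:
        out[year].add(cell)
    return out


def corrected_occupancy(species_records, kingdom_records):
    """Yearly occupancy: species-occupied cells / cells with any record of the same kingdom."""
    species = cells_by_year(species_records)
    kingdom = cells_by_year(kingdom_records)
    return {
        # intersection keeps the ratio <= 1 even if the inputs are inconsistent
        year: len(species[year] & kingdom[year]) / len(kingdom[year])
        for year in sorted(kingdom)
    }


kingdom = [(2017, "A"), (2017, "B"), (2017, "C"), (2017, "D"), (2018, "A"), (2018, "B")]
species = [(2017, "A"), (2017, "B")]
print(corrected_occupancy(species, kingdom))
```

Cells that were never surveyed for the kingdom in a given year drop out of that year's denominator.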

Physella acuta does not have Haitia acuta as synonym

Originally reported in trias-project/alien-macroinvertebrates#25

Physella acuta is a species in the Alien macroinvertebrates checklist (alien-macroinvertebrates-checklist:taxon:57). Haitia acuta should be a synonym of it, and a search on Physella acuta should thus also find occurrence records recorded under that name (including 167 occurrences in the Alien macroinvertebrates occurrence dataset, e.g. PB:Ugent:AqE:2342). The GBIF backbone, however, considers both to be accepted species.

Physella acuta thus won't return records for Haitia acuta.

Note: Haitia acuta is the only species in the macroinvertebrate occurrence dataset that is not listed by the same name in the checklist.
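A check like the one in the note above can be sketched as a set difference in Python. It lists the names used in an occurrence dataset that are absent from the checklist. The optional manual synonym map is a hypothetical local workaround and is not part of the GBIF backbone.

```python
def unmatched_names(occurrence_names, checklist_names, synonyms=None):
    """Occurrence names absent from the checklist, after applying a manual synonym map."""
    synonyms = synonyms or {}  # e.g. {"Haitia acuta": "Physella acuta"} (hypothetical)
    resolved = {synonyms.get(name, name) for name in occurrence_names}
    return sorted(resolved - set(checklist_names))


occ = ["Physella acuta", "Haitia acuta", "Dreissena polymorpha"]
checklist = ["Physella acuta", "Dreissena polymorpha"]
print(unmatched_names(occ, checklist))
print(unmatched_names(occ, checklist, {"Haitia acuta": "Physella acuta"}))
```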

Indicator: pathways associated with alien species introductions

Description

The indicator shows the number of non-native plant and animal species introduced in Belgium via a certain pathway. It is based on a checklist of alien species, composed of various existing sources and databases. The information will be updated and refined as the checklist is further supplemented. The available information on introduction pathways was organized following the Convention on Biological Diversity standard (CBD 2014).

This indicator uses the same data output as issue #17; it is simply a grouping (per pathway) of the same data frame.
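The per-pathway grouping can be sketched in Python as a count of distinct taxa per pathway. This assumes the data frame is flattened to (taxonKey, pathway) rows, where one taxon may have several pathways. Counting distinct taxa avoids counting a species twice within a pathway. The pathway labels and the "unknown" bucket for missing values are illustrative assumptions.

```python
from collections import defaultdict


def species_per_pathway(rows):
    """Number of distinct taxa per pathway; rows are (taxon_key, pathway) pairs."""
    taxa = defaultdict(set)
    for taxon_key, pathway in rows:
        taxa[pathway or "unknown"].add(taxon_key)  # missing pathway -> "unknown" (assumption)
    return {pathway: len(keys) for pathway, keys in sorted(taxa.items())}


rows = [(1, "escape_pet"), (1, "escape_pet"), (2, "escape_pet"), (2, "contaminant_seed"), (3, None)]
print(species_per_pathway(rows))
```

A taxon with two pathways is counted once in each, so the pathway totals can exceed the number of species.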

Is there need to use a European reference grid at 1x1km resolution?

Based on previous conversations, we found that no worldwide grid is available, and that one is actually not needed for risk assessment (RA): we need a grid at European level for RA and at Belgian level for indicators. However, on the European Environment Agency (EEA) website I cannot find any grid at 1x1 km resolution at European level, only at 10x10 km and 100x100 km.

@DiederikStrubbe, @amyjsdavis: do you need to calculate occupancy at European level at 1x1 km, or is 1x1 km resolution at Belgian level fine for you?

If you really need it, do you have ideas on how to get it? Maybe we can build it by joining all the country-level reference grids, but we will run into a lot of problems at country borders, where duplicates will occur. Something to think about... Let me know. If you have a shapefile for it, could you please upload it to ./data/external/? Thanks.
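If the country grids share the same projection and origin, border duplicates can be dropped by cell code instead of by geometry. The following Python sketch assumes this is the case. It also assumes that cell codes follow an EEA-style pattern such as "1kmE3800N3100", derived from ETRS89-LAEA (EPSG:3035) coordinates in metres. Both the code format and the helper names are assumptions to verify against the actual shapefiles.

```python
def cell_code_1km(x: float, y: float) -> str:
    """Assumed EEA-style 1 km cell code from ETRS89-LAEA easting/northing in metres."""
    return f"1kmE{int(x // 1000)}N{int(y // 1000)}"


def merge_country_grids(grids):
    """Merge per-country grids (dict: country -> cell codes) into one grid.

    Returns dict: cell code -> list of countries. Each border cell appears once,
    listing every country whose grid contains it.
    """
    merged = {}
    for country in sorted(grids):
        for code in grids[country]:
            countries = merged.setdefault(code, [])
            if country not in countries:
                countries.append(country)
    return merged


grids = {
    "BE": ["1kmE3800N3100", "1kmE3801N3100"],
    "NL": ["1kmE3801N3100", "1kmE3900N3200"],
}
merged = merge_country_grids(grids)
print(len(merged), merged["1kmE3801N3100"])
```

Keeping the list of countries per cell leaves the decision open on how to attribute shared border cells.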

Effects of grid size on occupancy: ideas for improvements

Based on remarks by @qgroom:

Try to study these effects with two or three common native species.
PRESERVED_SPECIMEN should also be included in basisOfRecord.
Check that the occurrences without coordinates don't have a sensible grid reference in the verbatimCoordinateSystem field.
For records missing coordinateUncertaintyInMeters, is it possible to infer the coordinate uncertainty from the number of significant digits in the longitude/latitude?
An output graph showing the mean and variance of coordinateUncertaintyInMeters vs. time would be useful ancillary information.
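One way to infer uncertainty from coordinate precision is sketched below in Python. The approach assumes that coordinates were rounded to their last decimal, so the true point lies within half a precision step in each direction. It converts that step to metres using about 111,320 m per degree of latitude, scaled by cos(latitude) for longitude. If values were truncated instead of rounded, the full diagonal (twice the result) would apply. Coordinates are read as strings because numeric parsing drops trailing zeros. The function names are hypothetical.

```python
import math

M_PER_DEG_LAT = 111_320  # approximate metres per degree of latitude


def decimals(value: str) -> int:
    """Number of digits after the decimal point, as written in the source."""
    return len(value.split(".")[1]) if "." in value else 0


def uncertainty_from_precision(lat: str, lon: str) -> float:
    """Approximate uncertainty radius (m) implied by the decimals of lat/lon (assumes rounding)."""
    dy = 10 ** -decimals(lat) * M_PER_DEG_LAT
    dx = 10 ** -decimals(lon) * M_PER_DEG_LAT * math.cos(math.radians(float(lat)))
    return 0.5 * math.hypot(dx, dy)  # half the diagonal of the precision cell


print(round(uncertainty_from_precision("50.85", "4.35")))
print(round(uncertainty_from_precision("50.8512", "4.3517"), 1))
```

The inferred values could then be compared, per year, with the recorded coordinateUncertaintyInMeters when building the mean and variance graph.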

UTF-8 issues in scientificName

We currently have scientificNames with encoding issues, e.g.:

2772890 134087647 Loncomelos brevistylus (Wolfner ) Dost<U+00E1>l 9ff7d317-609b-4c08-bd86-3bc404b77c42 Loncomelos brevistylum (Wolfner) Dost<U+00E1>l Plantae SYNONYM 2772885 Ornithogalum pyramidale L.

As you can see, the issue appears in both checklist_scientificName and backbone_scientificName.

The name is displayed correctly on the webpage: https://www.gbif.org/species/134087647
The name is displayed correctly in the API (in my browser): https://api.gbif.org/v1/species/134087647
The name is incorrect in the dataframe: checklist_taxa[433,3]: "Loncomelos brevistylus (Wolfner ) Dost\u00e1l"

So it must be related to how the data are read from the API (either by our own code or by rgbif).
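To narrow this down, it helps to separate two questions: whether the data were decoded wrongly, or only printed wrongly. The Python sketch below illustrates this with a simulated payload, not a real API response. Decoding the bytes as UTF-8 gives the correct name. Decoding the same bytes as Latin-1 gives the typical "DostÃ¡l" mojibake. The escape "\u00e1" is simply the code point of "á". An escaped form in printed output can therefore be a locale or printing artefact rather than wrong data, but that is an assumption to check in the R session.

```python
import json


def parse_species_json(raw: bytes) -> dict:
    """Decode a GBIF-like JSON payload explicitly as UTF-8 before parsing."""
    return json.loads(raw.decode("utf-8"))


# Simulated payload (hypothetical): the UTF-8 bytes an API would send for this name.
payload = json.dumps(
    {"scientificName": "Loncomelos brevistylus (Wolfner ) Dost\u00e1l"},
    ensure_ascii=False,
).encode("utf-8")

name = parse_species_json(payload)["scientificName"]
print(name)                                    # correct: UTF-8 decoding
print(payload.decode("latin-1")[-20:])         # wrong codec: mojibake
print([hex(ord(c)) for c in name[-3:]])        # inspect the actual code points
```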

establishmentMeans filter for checklist data indicator

This issue follows point 4 of #21 (comment)

Based on discussion with @timadriaens and @qgroom, we plan to filter on establishmentMeans being one of the following terms: INTRODUCED, NATURALISED, INVASIVE, ASSISTED COLONISATION. These are new terms which will be proposed soon and will hopefully be approved and implemented. Is there any problem with doing that?
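The proposed filter can be sketched in Python as follows. It assumes checklist rows are dicts with an establishmentMeans field. Matching is case-insensitive and ignores surrounding whitespace, which is a choice made here because vocabularies are often written in different cases. Rows without the field are dropped.

```python
# Terms proposed in this issue; case-insensitive matching is an assumption.
ALLOWED_ESTABLISHMENT_MEANS = {"introduced", "naturalised", "invasive", "assisted colonisation"}


def filter_establishment_means(rows):
    """Keep rows whose establishmentMeans is one of the allowed terms."""
    return [
        row for row in rows
        if (row.get("establishmentMeans") or "").strip().lower() in ALLOWED_ESTABLISHMENT_MEANS
    ]


rows = [
    {"taxonKey": 1, "establishmentMeans": "INTRODUCED"},
    {"taxonKey": 2, "establishmentMeans": "native"},
    {"taxonKey": 3, "establishmentMeans": "Assisted colonisation"},
    {"taxonKey": 4},
]
print([r["taxonKey"] for r in filter_establishment_means(rows)])
```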

Get info about alien species at protected area level

Request by @timadriaens: add an analysis at protected area level.
The output should be a tabular file:

protected_area_id	type	year	taxonKey	obs	ncells	coverage
BE3748432	A	2010	124245	34	15	0.30
BE3748432	A	2011	124245	44	29	0.58
BE3748432	A	2012	124245	58	32	0.64
BE3748432	A	2013	124245	69	36	0.72
BE3748432	A	2014	124245	88	43	0.86
BE3748432	A	2015	124245	92	45	0.90
BE3748432	A	2016	124245	95	47	0.94
BE3748432	A	2017	124245	99	48	0.96
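A table like this could be built as sketched below in Python. Several points are assumptions. In the example, coverage equals ncells divided by 50, so this sketch defines coverage as ncells divided by the total number of grid cells in the protected area. The example values also grow monotonically, so obs and ncells are computed cumulatively up to each year. Input records are (protected_area_id, type, year, taxonKey, cell code) tuples, one per occurrence. Only years with at least one record appear in the output.

```python
from collections import defaultdict


def protected_area_table(records, cells_per_area):
    """Rows (protected_area_id, type, year, taxonKey, obs, ncells, coverage), cumulative per year."""
    by_group = defaultdict(list)
    for area, area_type, year, taxon, cell in records:
        by_group[(area, area_type, taxon)].append((year, cell))
    rows = []
    for (area, area_type, taxon), occs in sorted(by_group.items()):
        for year in sorted({y for y, _ in occs}):
            upto = [cell for y, cell in occs if y <= year]  # cumulative (assumption)
            ncells = len(set(upto))
            coverage = round(ncells / cells_per_area[area], 2)
            rows.append((area, area_type, year, taxon, len(upto), ncells, coverage))
    return rows


records = [
    ("BE3748432", "A", 2010, 124245, "c1"),
    ("BE3748432", "A", 2010, 124245, "c1"),
    ("BE3748432", "A", 2010, 124245, "c2"),
    ("BE3748432", "A", 2011, 124245, "c2"),
    ("BE3748432", "A", 2011, 124245, "c3"),
]
for row in protected_area_table(records, {"BE3748432": 50}):
    print(*row, sep="\t")
```

If per-year rather than cumulative counts are wanted, replace `y <= year` with `y == year`.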

GAM and decision rules output of marine species selected for PRA

From an e-mail by Thomas Verleye:

We have pre-selected 5 species for PRA based on initial data availability and trends in occurrences (when we’ll face additional bottlenecks regarding data availability or expert identification we might reduce this number during the coming weeks, tbc):

Crepidula fornicata (gastropod)
Mnemiopsis leidyi (Ctenophora)
Ensis leei (bivalve)
Hemigrapsus takanoi (Crustacea)
Magallana (Crassostrea) gigas (bivalve)

@damianooldoni: would it be possible to create a GAM for C. fornicata and M. leidyi? Thanks in advance!
