trias-project / indicators

📈 Alien species indicators

Home Page: https://trias-project.github.io/indicators/

License: MIT License

CSS 100.00%
indicators invasive-species oscibio r rstats

indicators's People

Contributors

damianooldoni, peterdesmet, sanderdevisscher, soriadelva, stijnvanhoey, timadriaens, toonvandaele, yasmine-verzelen

Forkers

amyjsdavis

indicators's Issues

Checklist data filters for indicators

This issue describes how data will be filtered to get to the data frame described in #17. Some of these will be tackled in the unified checklist.

Filters:

  1. Specific checklists
  2. Distribution in Belgium
  3. occurrenceStatus = present
  4. establishmentMeans = introduced
  5. Linked to taxonomic backbone. This removes valid taxa, but is necessary for higher classification
  6. Remove genera and above (i.e. no species info, is the case for RINSE)
  7. Group by accepted taxon?? So we don’t count synonyms of same species.
  8. Group by species?? So we don’t count infraspecific taxa

Correctness shapefile Belgian regions coming from our server

As noticed while commenting #57 , we can download shapefiles of Belgian regions from this federal site: https://data.gov.be/en/dataset/fb1e2993-2020-428c-9188-eb5f75e284b9

I have just compared the shapefiles from our server, as uploaded by @timadriaens, with the downloaded ones, via R.

I see the following:

  1. the shapefiles of Belgian regions coming from the INBO server appear shifted when compared with the national Belgian borders (from OpenStreetMap). See figure below:
    image
  2. the shapefiles of Belgian regions downloaded from the federal government website are not shifted. See figure below:
    image

I transformed both shapefiles to WGS84 before plotting them with mapview/leaflet, of course.

@timadriaens, can you please check it with GIS software?

Write gbif_download_meta() function

Usage:

gbif_download_meta("../data/output/gbif_downloads.csv")

The script would:

  1. Verify the input and report errors:
    • Can it be found?
    • Is it a csv file?
    • Does the file have a column gbif_download_key?
  2. Use rgbif occ_download_meta() to get status information for all downloads in the list
  3. Add information for all records in the input file in the columns:
    • gbif_download_created
    • gbif_download_status
    • gbif_download_doi

Example of a response you'll get: http://api.gbif.org/v1/occurrence/download/0000251-150304104939900
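
Step 1 above (input verification) could be sketched in base R as follows. This is a minimal sketch, not the actual implementation: the helper name check_download_list() and the error messages are hypothetical, and the rgbif call of step 2 is left out.

```r
# Hypothetical helper covering step 1 only (input verification).
# The function name and error messages are illustrative assumptions.
check_download_list <- function(path) {
  # Can the file be found?
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # Is it a csv file? (checked on the extension only)
  if (!grepl("\\.csv$", path, ignore.case = TRUE)) {
    stop("Not a csv file: ", path)
  }
  downloads <- read.csv(path, stringsAsFactors = FALSE)
  # Does the file have a column gbif_download_key?
  if (!"gbif_download_key" %in% names(downloads)) {
    stop("Column gbif_download_key is missing in ", path)
  }
  downloads
}
```

The returned data frame would then be the input for steps 2 and 3, i.e. looping over gbif_download_key with rgbif's occ_download_meta() and filling the three gbif_download_* columns.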

Adding attribute about "degree of uncertainty"

From e-mail of @amyjsdavis:

By the way, Diederik and I are very enthusiastic about the data cube and your approach to handling spatial uncertainty. We ask that you consider adding one attribute that identifies whether the grid cell contains at least one presence with an "exact" location, or whether the presences in the grid cell have all been randomly assigned. This attribute will be very useful to our modeling if it turns out that we need to reduce the amount of the uncertainty in our model.

GBIF match for modelling species

Notes regarding GBIF match for modelling species:

Use only verified data for pipeline

Hi, this came up when checking an unverified, false record of a supposedly new alien species for Belgium in the wnm.be data (Vespa orientalis). Waarnemingen and observations publish all records with IdentificationVerificationStatus on GBIF (which is ok!). However, for the pipeline, the models etc. it is imperative that we only use validated occurrences. Therefore, the pipeline needs a step that subsets the data based on IdentificationVerificationStatus, keeping only:

  • approved on expert judgement
  • approved on photographic evidence
  • approved on knowledge rules

Perhaps we can do some sort of sensitivity analysis to see how this impacts (I'm sure there is no time)...

Question to @damianooldoni @peterdesmet @qgroom @SoVDH, anticipating that many datasets/records on GBIF may not even have an IdentificationVerificationStatus: what do we do if that field is not filled?
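
A minimal base R sketch of such a subsetting step, under a few assumptions: the occurrences sit in a data frame with a column named identificationVerificationStatus (the Darwin Core spelling), the helper name filter_verified() is hypothetical, and the keep_missing switch only makes the open question about empty fields explicit without answering it.

```r
# The three approved values listed in this issue.
approved <- c(
  "approved on expert judgement",
  "approved on photographic evidence",
  "approved on knowledge rules"
)

# Hypothetical helper: keep only occurrences with an approved status.
# keep_missing = TRUE also keeps records where the field is empty or absent.
filter_verified <- function(occ, keep_missing = FALSE) {
  status <- occ$identificationVerificationStatus
  if (is.null(status)) {
    # Column not present at all: treat every record as missing
    status <- rep(NA_character_, nrow(occ))
  }
  status <- tolower(trimws(status))
  ok <- status %in% approved
  if (keep_missing) {
    ok <- ok | is.na(status) | status == ""
  }
  occ[ok, , drop = FALSE]
}
```

Running the pipeline once with keep_missing = FALSE and once with keep_missing = TRUE would be one simple way to do the sensitivity analysis mentioned above.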

Indicator: number of new introductions of alien species per year in Belgium

Description

Temporal trends in first record rates of alien species (number of first records per year or per x-year interval), based on Seebens et al. 2017. This indicator provides information on the number of new introductions in time, for instance the rate of increase of alien species introductions and the accumulation rate of alien species (Rabitsch et al. 2016). The information will be updated and refined as the checklist is further supplemented.

Data needs

  • Country-level checklist
  • Checklist-based
  • Breakdowns:
    • environment (marine, terrestrial, freshwater)
    • taxon (kingdom --> family)
    • native range
    • pathway (level 1 and level 2 CBD)
    • invasion stage (introduced, established)
    • source checklist
    • distribution (country-level: Belgium)

Data output

The data retrieved by GBIF would be organized in a (tidy) data.frame

key | nubKey | scientificName | datasetKey | species | genus | family | order | class | phylum | kingdom | rank | speciesKey | taxonomicStatus | acceptedKey | accepted | locationId | locality | country | status | first_observed | last_observed | establishmentMeans | native range | origin | invasion stage | habitat | pathway_level1 | pathway_level2

This data should be already filtered as explained in #21 .
This data output will serve as input for the plots based on group_by-like pipes, e.g.

df %>% 
  group_by(class) %>% 
  count()

It is suggested to write these series of group_by() in a function for plotting because of the high number of combinations of information.
See issue #18 about temporal information (first_observed and last_observed) in checklists.
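
The suggested wrapper around the group_by-like counts could look like the sketch below. It uses base R instead of dplyr so that it runs without extra packages; the function name count_by() is hypothetical, and note that aggregate() silently drops rows with NA in a grouping column.

```r
# Hypothetical helper: count taxa for any combination of breakdown columns.
count_by <- function(df, cols) {
  stopifnot(all(cols %in% names(df)))
  # One row per combination of the grouping columns, n = number of rows
  counts <- aggregate(
    list(n = rep(1L, nrow(df))),
    by = df[cols],
    FUN = sum
  )
  # Largest groups first
  counts[order(-counts$n), , drop = FALSE]
}

# Toy data, for illustration only
df <- data.frame(
  kingdom = c("Animalia", "Animalia", "Animalia", "Plantae"),
  class = c("Aves", "Aves", "Insecta", "Magnoliopsida"),
  stringsAsFactors = FALSE
)
count_by(df, "class")
count_by(df, c("kingdom", "class"))
```

A plotting function could take the same cols argument and pass the resulting counts to the plot, so that every breakdown combination goes through one code path.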

Not all taxa downloaded

According to the download list, the last GBIF download (https://doi.org/10.15468/dl.9unif7) used the eu_concern_species.tsv list to query for taxa. However, only 38 of the 49 species were queried. These 11 are missing:

checklist_scientificName backbone_taxonKey
Alopochen aegyptiacus 2498252
Elodea nuttallii 5329212
Gunnera tinctoria 2984306
Heracleum mantegazzianum 3034825
Heracleum persicum 3628745
Myriophyllum aquaticum 5361785
Myriophyllum heterophyllum 5361762
Parthenium hysterophorus 3086784
Pennisetum setaceum 2706134
Persicaria perfoliata 4033648
Pueraria lobata 9035634

@damianooldoni can you figure out what might cause this? Is it a character limit on the querystring in the URL?

Terminology consistency between pipelines

At this stage of programming, it is important to use the same terms to refer to taxa in all pipelines.
Do we use nubKey and taxonomicStatus as in pipeline get_taxa or gbif_taxonKey and gbif_species_status as in pipeline occurrence?
I would personally opt for nubKey and taxonomicStatus as referred in checklists.
I would also consequently rename the columns of the input file eu_concern_species.tsv.

species on checklist without occurrences

New species on one of the checklists without any occurrences should actually appear in the list as "appearing". An example is the recently discovered Procambarus acutus, which has meanwhile been added to the macroinvertebrates checklist but does not yet appear in any of our datasets on GBIF (pending publication of the records...). Is it possible to retrieve those in the indicator pipeline @damianooldoni, e.g. for the separate list of appearing spp?

TaxonKeys don't match

When trying to match taxon data from gbif (using code described here) with distribution data from the "include distribution regions" - branch it seems the taxonKeys do not match.

Below a subset of 2 species to illustrate the issue:

| Species | taxonKey in TaxonData | taxonKey in Distributions |
| --- | --- | --- |
| Procambarus clarkii | 152543866 | 140563018 |
| Vespa velutina | 152544481 | 148438120 |

PS: I used the following code to read the distribution data:

library(readr)

distributions_unified <- read_csv("https://raw.githubusercontent.com/trias-project/unified-checklist/include-distribution-regions/data/interim/distributions_unified.csv")

Common species to test for AOO

@damianooldoni FYI:

| Species | taxon_key | records in BE | type |
| --- | --- | --- | --- |
| Pieris rapae | 1920496 | 175197 | butterfly |
| Pararge aegeria | 8049830 | 145023 | butterfly |
| Vanessa atalanta | 1898286 | 153714 | butterfly |
| Aglais io | 4535827 | 130204 | butterfly |
| Anas platyrhynchos | 9761484 | 101445 | bird |
| Rutilus rutilus | 2359706 | 90766 | fish |
| Cirsium arvense | 3113414 | 40449 | plant |

How retrieve number of occupied cells at specific rank level

Based on the discussion in #46, we would like to know how many of our cells are occupied at a certain rank level (in our case we thought about kingdom, but the issue can easily be extended to any other rank). As we are working at 1km scale (~50k cells) and on a time span of around 50 years, this means sending a lot of queries to GBIF, which is not recommended (see the interesting discussion here: ropensci/rgbif#320).
So, how do we get these aggregated(!) numbers quickly and without sending thousands of queries?
Seeing how fast the GBIF observation trend tool works, I think a possible solution is in the air.
This is the link to the repo behind the tool: https://github.com/gbif/species-population.

Moreover, GBIF already has an API for doing it and is writing the documentation for it: see gbif/species-population#6. However, the issue is quite old, so their plans have very likely changed.

Analyzing the requests my computer does while searching for Branta canadensis against higher taxon Anatidae, I see some requests which are linked to binary files with .mvt extension:

https://api.gbif.org/v2/map/occurrence/regression/3/3/2.mvt?higherTaxonKey=2986&minYears=10&taxonKey=5232437&year=1970,2015

and if we zoom a lot:

https://api.gbif.org/v2/map/occurrence/density/7/65/42.mvt?srs=EPSG:3857&basisOfRecord=OBSERVATION&basisOfRecord=HUMAN_OBSERVATION&basisOfRecord=MACHINE_OBSERVATION&basisOfRecord=MATERIAL_SAMPLE&basisOfRecord=PRESERVED_SPECIMEN&higherTaxonKey=2986&minYears=10&taxonKey=5232437&year=1970,2015

Notice the /v2 instead of /v1, the standard GBIF API version. So the tool loads some binary files (downloadable, just click on the links) and queries them. The directory structure is very likely linked to zoom level and geographical area. I tried to open one of these files in R via readBin() but it didn't work. I am afraid I need to know the file structure beforehand.
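
On the directory structure: a minimal sketch, assuming the three path segments follow the common XYZ (Web Mercator) tile scheme zoom/x/y, which fits the srs=EPSG:3857 parameter in the second URL. The function name lonlat_to_tile() is hypothetical. As for the file structure, .mvt is the Mapbox Vector Tile format (protocol buffer encoded), so readBin() alone returns raw bytes that still need a vector tile decoder.

```r
# Hypothetical helper: convert a WGS84 coordinate to XYZ tile indices.
# Assumes the standard Web Mercator tiling (2^zoom x 2^zoom tiles,
# x counted from longitude -180, y counted from the north).
lonlat_to_tile <- function(lon, lat, zoom) {
  n <- 2^zoom
  lat_rad <- lat * pi / 180
  x <- floor((lon + 180) / 360 * n)
  y <- floor((1 - log(tan(lat_rad) + 1 / cos(lat_rad)) / pi) / 2 * n)
  list(z = zoom, x = x, y = y)
}

# A point at longitude 4.5, latitude 52 at zoom level 7
tile <- lonlat_to_tile(lon = 4.5, lat = 52, zoom = 7)
paste(tile$z, tile$x, tile$y, sep = "/")
```

For this point the result is 7/65/42, the same path as in the density URL above, which supports the idea that the directories encode zoom level and geographical area.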

Another way: in the issue cited before (ropensci/rgbif#320) Tim Robertson was speaking about SQL API which is under development: is it the right alternative to get what we want?

@stijnvanhoey, @peterdesmet : what do you think we can implement and what's at GBIF side? Thanks!

Add date to new taxa in EU concern species list

New species have been added to the EU concern list. File in data/input/eu_concern_species.tsv updated. Still, field entry_into_force is empty as new version of the list is still not officially published. See #51.
@timadriaens : add date to missing taxa when it will be available. Thanks.

Discuss common data cleaning pipeline

It seems WP3 and WP4 both need to clean GBIF occurrence data.
We should :

  1. Discuss a common data filtering strategy.
  2. Create a pipeline dedicated to data cleaning

Structure of data output for checklist indicators

The data output we get by merging taxonomic information with the distribution and description extensions is "fake" tidy. I write fake because most of the description is stored following the Entity-Attribute-Value (EAV) model (thanks @stijnvanhoey for finding the right word for it).

For each taxon it looks like the table below:

| taxonKey | type | description |
| --- | --- | --- |
| 141264591 | pathway | cbd_2014_pathway:escape_horticulture |
| 141264591 | native range | Southern America (WGSRPD:8) |
| 141264591 | origin | vagrant |

I already tidy the pathways by creating two new columns, pathway_level1 and pathway_level2:

| taxonKey | type | description | pathway_level1 | pathway_level2 |
| --- | --- | --- | --- | --- |
| 141264591 | pathway | cbd_2014_pathway:escape_horticulture | escape | horticulture |
| 141264591 | native_range | Southern America (WGSRPD:8) | NA | NA |
| 141264591 | origin | vagrant | NA | NA |

However, I think this half-tidy approach can create confusion, because it mixes the EAV model with the non-EAV (typical) representation!

Would it be better to have the description data 100% tidy, like the table below?

| taxonKey | pathway_level1 | pathway_level2 | origin | native_range |
| --- | --- | --- | --- | --- |
| 141264591 | escape | horticulture | vagrant | Southern America (WGSRPD:8) |

I think it would be more understandable. And it would make plot workflow easier I think. What do you think, @stijnvanhoey, @SanderDevisscher and @Yasmine-Verzelen ? If you agree then I will implement it in the workflow and I will export a new test data.frame so that you all can use it asap. Thanks.
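
A minimal base R sketch of the fully tidy variant, using the example taxon from this issue. It assumes one row per taxon and type; with several pathways per taxon, reshape() would warn and keep only the first one, so that case needs a separate decision. The split of the pathway string at the first underscore is also an assumption about how level 1 and level 2 are encoded.

```r
# Example data in Entity-Attribute-Value form (values from this issue)
eav <- data.frame(
  taxonKey = c(141264591, 141264591, 141264591),
  type = c("pathway", "native_range", "origin"),
  description = c(
    "cbd_2014_pathway:escape_horticulture",
    "Southern America (WGSRPD:8)",
    "vagrant"
  ),
  stringsAsFactors = FALSE
)

# One row per taxon, one column per type
wide <- reshape(eav, idvar = "taxonKey", timevar = "type", direction = "wide")
names(wide) <- sub("^description\\.", "", names(wide))

# Split the pathway into level 1 and level 2 (assumed format: level1_level2)
pathway <- sub("^cbd_2014_pathway:", "", wide$pathway)
wide$pathway_level1 <- sub("_.*$", "", pathway)
wide$pathway_level2 <- sub("^[^_]*_", "", pathway)

wide[, c("taxonKey", "pathway_level1", "pathway_level2", "origin", "native_range")]
```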

Modelling species occurrences

Download based on the modelling species is available at https://doi.org/10.15468/dl.6cljf9

No occurrences

No occurrences were found in Belgium for:

Under other species name

The following taxa are listed under a different species name (i.e. the name of the species GBIF considers the accepted one). The scientific name is still the original one, and only taxa for the name we looked for were returned, as we wanted:

  • Astacus leptodactylus → Pontastacus leptodactylus
  • Mimulus guttatus Fisch. ex DC. → Erythranthe guttata

Synonyms that are also included

And to know what synonym names were lumped in with the download, check this file (yellow rows): modelling_species_in_download.xlsx

Note: for most (all?) of these, it's very good that they get lumped, because e.g. there are no occurrences in Belgium published under Neovison vison: all are published as Mustela vison which we wouldn't get if GBIF didn't look for synonyms.

Let me know if you consider all the above acceptable.

Write Rmd that explores grid size effect on AOO

As discussed with @damianooldoni, he will create an Rmd named grid_size_effect.Rmd

Effect of grid size on AOO (area of occupancy)

  • Takeaway: AOO for 1km2 < 10km2 < 100km2
  • Let's take the most precise 1km grid
  • Problem: data collection occurred on larger grid

Solution 1: downscaling

  • Method to resample larger grid data to a smaller grid (many methods)
  • Choose simple ensemble to get number of occupied 1KM grids
  • Compare with values before

Solution 2: records have uncertainty

  • Lot of data is collected on a larger grid and that information is reported in coordinateUncertaintyInMeters which is radius of grid
  • Plot coordinate uncertainty circle on grid (figure)
  • Ignore all occurrences with uncertainty > 10km
  • Randomly assign occurrence points to grids in overlapping area
  • New number of occupied 1KM grid
  • Compare with values before
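
The random assignment step of solution 2 could be sketched as below. This is a minimal sketch under stated assumptions: coordinates are already projected in meters (for instance in the projection of the reference grid), records without coordinateUncertaintyInMeters are dropped, and the function name assign_to_grid() and the cell_x/cell_y indices are hypothetical stand-ins for real grid cell codes.

```r
# Hypothetical helper: draw one random point inside each uncertainty
# circle and return the index of the grid cell it falls in.
assign_to_grid <- function(x, y, uncertainty_m,
                           cell_size_m = 1000, max_uncertainty_m = 10000) {
  stopifnot(length(x) == length(y), length(x) == length(uncertainty_m))
  # Ignore occurrences with unknown or too large uncertainty
  keep <- !is.na(uncertainty_m) & uncertainty_m <= max_uncertainty_m
  n <- sum(keep)
  # sqrt() on the radius makes the points uniform over the circle area
  r <- uncertainty_m[keep] * sqrt(runif(n))
  theta <- runif(n, 0, 2 * pi)
  x_new <- x[keep] + r * cos(theta)
  y_new <- y[keep] + r * sin(theta)
  data.frame(
    row = which(keep),
    x = x_new,
    y = y_new,
    cell_x = floor(x_new / cell_size_m),
    cell_y = floor(y_new / cell_size_m)
  )
}

set.seed(1)
assigned <- assign_to_grid(
  x = c(3924500, 3930500),
  y = c(3102500, 3110500),
  uncertainty_m = c(500, 20000)
)
# Number of occupied 1km cells after random assignment
n_occupied <- nrow(unique(assigned[, c("cell_x", "cell_y")]))
```

Because the assignment is random, repeating it several times and looking at the spread of n_occupied gives an idea of how much the AOO depends on the draw.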

For a select number of species make

Table:

| Year | Occ | Coordunc | 100km | 10km | 1km | 1km (downscaling) | 1km (circle based) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2011 | count | mode | % (count) | % (count) | % (count) | % (count) | % (count) |
| 2012 | count | mode | % (count) | % (count) | % (count) | % (count) | % (count) |
| 2013 | count | mode | % (count) | % (count) | % (count) | % (count) | % (count) |
| 2014 | count | mode | % (count) | % (count) | % (count) | % (count) | % (count) |
| 2015 | count | mode | % (count) | % (count) | % (count) | % (count) | % (count) |
| Totals | total | mode | count | count | count | count | count |

Chart:

Comparing % (AOO) of 1km, 1km (downscaling), 1km (circle based) per year

Where to find world wide grid at 1x1km level

@amyjsdavis, @DiederikStrubbe: is there a link where I can download a worldwide grid with cell size 1x1km? This is the link for EU countries, which you both already know, but I can't find anything similar at world level. For Belgium, I use these shapefiles combined with the function st_read() from the sf package, which is very flexible about file formats. If no link is available, do you maybe have something I can use as well? Thanks!

Data publishing delay assessment

The assessment of data publishing delay is very important for a correct use of segmented regression or any other analysis for studying the emerging status of alien species. If we see a noticeable decrease in data publication, then we can assume that the decrease in the number of occurrences at species level is not realistic. An easy but effective way to detect it is to get the number of occurrences with geographic coordinates in Belgium for each kingdom during the last years.

You can manually search via GBIF site if you want. Here below a link to get all occurrences for:

  1. year = 2017
  2. kingdom = Animalia
  3. hasCoordinate=TRUE
  4. country=BE
    https://www.gbif.org/occurrence/search?has_coordinate=true&year=2017&kingdom_key=1&country=BE

Here below the graphs. You can make them on your laptop, by using the code in this gist.

[Figures: number of occurrences per year in Belgium, one graph per kingdom: Animalia, Bacteria, Chromista, Fungi, incertae sedis, Plantae, Protozoa, Viruses]

You can see that two years is a good threshold for publishing delay. Only Fungi do better in Belgium. With the popularity of Citizen Science projects like iNaturalist this delay will eventually decrease in the future. But, at the moment 2016 seems the last year to use for performing segmented regression. At the end of 2019, we could then include 2017.

@qgroom, @timadriaens, @ToonVanDaele, what do you think?

Replace occupancy with AoO on indicator graphs and in workflows

Hi, just a detail for the occurrence indicator graphs Y-axis legend: Occupancy, in principle, is mostly used for the probability that a site is occupied (and expressed between 0 and 1) cf. site occupancy models. What we look at in TrIAS is Area of Occupancy the way IUCN use it to quantify range size for species. This is an interesting paper about the concept.

@damianooldoni we should probably change the legend for the Y axis to "Area of Occupancy (km2)" to avoid confusion.

basic data to display in Harmonia

It would be nice to have some very basic data in Harmonia on which the more advanced indicators/indexes are based. This always helps interpreting trend graphs. The ones I can think of are:

  • the number of occurrences per year
  • the number of km squares per year

I believe for now these are 'byproducts' of the pipeline, perhaps we should think of a suitable output (simple barplot graph).

From occurrences and AreaOfOccupancy to an emerging species decision rule at year level

This issue describes a part of the general workflow for assessing the emerging status of alien species, as discussed on Friday, 15 Feb 2019 by @damianooldoni , @timadriaens and @ToonVanDaele .

Input data

We start from the output of occ-processing repository called cube_belgium.csv as mentioned in trias-project/occ-cube-alien#3. This file contains occurrences with (at least) the following key columns:

  1. taxonKey
  2. speciesKey
  3. kingdomKey
  4. year
  5. CELLCODE (grid id from European Environment)

Grouping by speciesKey and year, we get the number of occurrences per year (x: year, y: n_occs). We work at year level; no more detailed temporal information is used. The research effort bias of the area of occupancy (AOO) is already corrected at this stage (for details about research effort bias correction, see #46). We may not always work at species level; that issue is discussed separately (see trias-project/unified-checklist#35).

AOO and occurrences are time series (x: year, y: occurrences or y: AOO).
Although we could have data before 1950, we start analysis from 1950, the birth date of invasion ecology (cit. @timadriaens 😃 ).

Limit cases

  1. No occurrences for a species: no analysis possible. A very unlikely situation, but still possible. However, these species MUST be in the final list for Risk Assessment.
  2. Occurrences only in the last valid year (due to the delay in data publishing; see #48): insufficient data for analysis. However, these species MUST be in the final list for Risk Assessment.
  3. Occurrences only in one of the very last years: insufficient data for analysis. However, these species MUST be in the final list for Risk Assessment. How many years we should consider is still not clear. However, it should not be too far in the past, as the absence of later observations may simply mean that the species is not emerging at all.

Segmented regression

After extracting the limit cases, we set occ and AOO equal to zero for years with no occurrences as only years with occurrences are present in the cube. Segmented regression will be applied to the AOO and occ time series separately. So, for each of the two time series and for each year, the slope of the last segment and its confidence interval is evaluated as a categorical variable. We can have three situations:

  1. Slope is positive: occ/AOO is increasing.
  2. Slope is zero (zero is within slope's confidence interval): occ/AOO is stable.
  3. Slope is negative: occ/AOO is decreasing.

Emerging decision table at year level

For each year and species we can then apply a decision table to define the emerging status of the species:

| AOO | n. occurrences | emerging status |
| --- | --- | --- |
| decrease | decrease | not emerging |
| decrease | stable | not emerging |
| decrease | increase | potentially emerging |
| stable | decrease | not emerging |
| stable | stable | not emerging |
| stable | increase | potentially emerging |
| increase | decrease | potentially emerging |
| increase | stable | potentially emerging |
| increase | increase | emerging |
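
The slope classification and the decision table can be written down compactly in base R. This is a minimal sketch: the function names classify_slope() and emerging_status() are hypothetical, and the confidence interval bounds of the last segment's slope are assumed to come from whatever segmented regression is used upstream.

```r
# Classify the slope of the last segment from its confidence interval:
# zero inside the interval means "stable".
classify_slope <- function(lower, upper) {
  ifelse(lower > 0, "increase", ifelse(upper < 0, "decrease", "stable"))
}

# Decision table: rows = AOO trend, columns = occurrences trend
trends <- c("decrease", "stable", "increase")
decision <- matrix(
  c("not emerging",         "not emerging",         "potentially emerging",
    "not emerging",         "not emerging",         "potentially emerging",
    "potentially emerging", "potentially emerging", "emerging"),
  nrow = 3, byrow = TRUE,
  dimnames = list(aoo = trends, occ = trends)
)

# Look up the status for vectors of AOO and occurrence trends
emerging_status <- function(aoo_trend, occ_trend) {
  decision[cbind(aoo_trend, occ_trend)]
}

aoo_trend <- classify_slope(lower = c(0.1, -0.5), upper = c(0.9, 0.3))
occ_trend <- classify_slope(lower = c(0.2, 0.4), upper = c(1.1, 0.8))
emerging_status(aoo_trend, occ_trend)
```

Keeping the table as a matrix makes it easy to review against the issue text and to change a single cell if the rules are refined.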

This will end up in an output like this:

| species | year | emerging status |
| --- | --- | --- |
| A | 1950 | potentially emerging |
| ... | ... | ... |
| A | 2012 | not emerging |
| A | 2013 | pot. emerging |
| A | 2014 | pot. emerging |
| A | 2015 | pot. emerging |
| A | 2016 | emerging |
| ... | ... | ... |
| B | 2012 | not emerging |
| B | 2013 | pot. emerging |
| B | 2014 | emerging |
| B | 2015 | emerging |
| B | 2016 | not emerging |

Next steps: how do we aggregate these emerging labels in order to estimate the general emerging status of a species?
My two cents: as our analysis is future oriented, the emerging status in the recent past should definitely weigh more in the final decision than the status in the far past.

@ToonVanDaele , @timadriaens : please comment if you think I missed something or you have new thoughts about it.

From `type`/`description` column combination (Entity-Attribute-Value style) to individual columns

I propose to adapt the current columns type and description in the preprocessing phase (after the GBIF download and before scripting the indicators):

| type | description | key |
| --- | --- | --- |
| native range | cultivated origin | 141264581 |
| origin | introduced | 141264581 |
| pathway | cbd_2014:escape_horticulture | 141264581 |
| native range | temperate Asia | 141264583 |
| origin | vagrant | 141264583 |
| pathway | cbd_2014:escape_horticulture | 141264583 |

to three separate columns:

| native range | origin | pathway | key |
| --- | --- | --- | --- |
| cultivated origin | introduced | cbd_2014:escape_horticulture | 141264581 |
| temperate Asia | vagrant | cbd_2014:escape_horticulture | 141264583 |

This will simplify the implementation of #17 #20 and require minimal adaptation to #19...

Write gbif_download() function

Usage:

gbif_download(
    taxa="https://raw.githubusercontent.com/trias-project/alien-plants-belgium/afbd2805de77afd79fb74669c403d40f1416661b/data/processed/taxon.csv",
    country="BE", # default
    output="../data/output/gbif_downloads.csv" # default
)

The script would:

  1. Verify and report errors for the parameter checklist:
    • Can it be found?
    • Is it a csv file?
    • Is it a GitHub commit URL?
    • Does the file have a column gbif_nubKey?
    • Does gbif_nubKey contain numbers only?
  2. Verify and report errors for the option country:
    • Is it a valid ISO 3166-1 alpha-2
  3. Verify and report errors for the option output:
    • Does the file exist
  4. Use rgbif to trigger a download
  5. Write information to data/output/gbif_downloads.csv:
    • gbif_download_key: GBIF download UUID
    • input_checklist: checklist (parameter)
    • input_country: country (parameter)

Hedera subspecies problem

Hi, when I wanted to check the emergence status of Atlantic ivy (Hedera hibernica), I noticed differences in the taxon matching of several datasets:

  • Florabank matches with Hedera helix subsp. hibernica (Poit.) D.C.McClint. (gbif key 6307044) - considered a synonym by gbif
  • observations.be matches with Hedera hibernica (G.Kirchn.) Carrière (gbif key 8168344), whereas in obs.be itself the taxon name is Hedera helix subsp. hibernica (Kirchner) McClintock

In GRISS Belgium, there is only Hedera hibernica (G.Kirchn.) Bean (such as in the Manual of Alien Plants) (gbif key 8410115).

Fact: 3 different gbif keys for the same thing. Consequence: the species does not pop up as emerging in the trias indicator flow, whilst probably every field person will say it is clearly emerging.

Solution?

visualizations of pathways

Some ideas to do more visually with the introduction pathways of alien species in the checklist, based on a similar exercise by Van Wilgen & Wilson 2017 (The status of Biological Invasions and their management in South Africa). I know that for TrIAS we have opted for a tabular view, but this is just to show some possibilities for future use. I am planning to code these graphs for NARA-T, see this issue:

Number of species per pathway (ordered by CBD level 1 and level 2)
[image]

Evolution of pathways in time (based on first introduction date)
[image]

[image]

GitHub cannot render tsv because of double quotes in scientificName

On https://github.com/trias-project/pipeline/blob/master/data/output/checklist_taxa.tsv GitHub indicates that it cannot display the file nicely because "Illegal quoting in line 2710.". That line contains:

Sedum "Autumn Joy" (S. spectabile Boreau x telephium L.)

  1. We could set quote=TRUE for the output, but I think that is going to unnecessarily quote everything and might not even resolve the issue. @damianooldoni can you test this on a branch?
  2. At some point the file will be too big to display on GitHub anyway.

Indicator: cumulative number of alien species

Description

This indicator measures the trends of all alien species introductions (publication of first observations). At the national level this indicator is useful to measure the trends in the presence/occurrence of alien (and potentially invasive) species and inform decisions to do with prevention of alien species introduction and the management and control of invasive species causing impacts on biodiversity and ecosystems. It is based on the same information for #17 but is an alternative representation in line with international policy indicators on (invasive) alien species such as EU headline indicators, SEBI, Aichi, IPBES.

Data needs and data output are the same as #17.

Visualisation

A lineplot is envisaged with colours for breakdowns. Visualizing uncertainty due to use of time periods is discussed in #18

indicator on number of occurrences in protected areas

Another very interesting indicator that we did not touch upon yet because of time constraints, is the proportion of occurrences of an alien species in protected areas. This is imo a very important one that can directly inform risk assessment but also risk management evaluations.

Ideally, this is a geographic subset of the weighted trend, but we need not make things too complicated. So, can we, based on the Area of Occupancy we already have, do an intersect with the N2000 network for Belgium (the official delimitation is on our gis server but probably best to take the one on EEA website)? It could be reported per year (e.g. since 2004, when Europe agreed on the Belgian N2000 areas) which is more interesting internationally and for other countries.

Compensating research effort bias for occupancy: temporal solution vs spatial solution?

The goal of this issue is to discuss the best way to compensate for the research effort bias. Based on interesting discussions with @qgroom and @timadriaens, I am working on two different ideas:

Temporal solution

This issue starts from point 4 of @qgroom's comment on issue 40:

Occupancy values are clearly sensitive to research effort. To improve this we need to aggregate over years. The (lack of) research effort is also a source of underestimation of occupancy: some areas are scanned in a certain year, others in other years. So, the question is: how many years should we use to get the optimal aggregation span? To find out, we should plot occupancy vs number of aggregation years (aggregation span). Hopefully we get the same curve as the curve of occurrence vs research effort (search literature for references). We should see a kind of saturation point. Do it species by species (year by year): the saturation point will probably differ among taxa. That would obviously be a problem, as we aim to keep it simple, i.e. using a single aggregation span. Of course, we will have to make a decision in the end (the number of years used for temporal aggregation should not be too high, otherwise we lose policy relevance), but investigation is needed.

I am investigating the research effort bias correction on branch research_effort_bias_corrction, more specifically in ./src/_research_effort_bias.Rmd.

Here below some plots showing the occupancy vs. time window from 2007 to 2018. I don't see a clear saturation curve valid for all species and all years. I think correcting research effort bias by working on temporal dimension will be not effective as we want to use temporal dimension to detect changes in occupancy as well.
[Figures: occupancy vs. time window (research_effort_bias_time_window), one plot per year from 2007 to 2018]

Spatial solution

As an alternative, @timadriaens and I discussed another option yesterday: why not work on the spatial scale? We can calculate yearly occupancy by dividing the number of occupied cells at species level by the number of occupied cells at kingdom level, instead of dividing by the number of cells in Belgium. This way we remove all cells that show no research effort at all. I am still working on making this calculation feasible (technical discussion opened in #47).
Meanwhile, ideas and comments about temporal and/or spatial solution are welcome!
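
Once the kingdom-level cells are available, the spatial solution boils down to a ratio of two counts per year. A minimal base R sketch, assuming two hypothetical data frames: species_cells with columns speciesKey, kingdomKey, year and CELLCODE, and kingdom_cells with columns kingdomKey, year and CELLCODE (all occupied cells of the kingdom, however they end up being retrieved). The function name relative_occupancy() is also hypothetical.

```r
# Hypothetical helper: occupancy of a species relative to the cells
# where its kingdom was observed in the same year.
relative_occupancy <- function(species_cells, kingdom_cells) {
  # Count distinct occupied cells per species and year
  sp <- unique(species_cells[, c("speciesKey", "kingdomKey", "year", "CELLCODE")])
  n_sp <- aggregate(
    list(n_cells_species = sp$CELLCODE),
    by = sp[c("speciesKey", "kingdomKey", "year")],
    FUN = length
  )
  # Count distinct occupied cells per kingdom and year
  kg <- unique(kingdom_cells[, c("kingdomKey", "year", "CELLCODE")])
  n_kg <- aggregate(
    list(n_cells_kingdom = kg$CELLCODE),
    by = kg[c("kingdomKey", "year")],
    FUN = length
  )
  out <- merge(n_sp, n_kg, by = c("kingdomKey", "year"))
  out$occupancy <- out$n_cells_species / out$n_cells_kingdom
  out
}

# Toy data, for illustration only
species_cells <- data.frame(
  speciesKey = 1, kingdomKey = 1, year = 2015,
  CELLCODE = c("A", "B", "A"), stringsAsFactors = FALSE
)
kingdom_cells <- data.frame(
  kingdomKey = 1, year = 2015,
  CELLCODE = c("A", "B", "C", "D"), stringsAsFactors = FALSE
)
occ <- relative_occupancy(species_cells, kingdom_cells)
```

In the toy data the species occupies 2 of the 4 cells with any record of its kingdom in 2015, so the relative occupancy is 0.5.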

Physella acuta does not have Haitia acuta as synonym

Originally reported in trias-project/alien-macroinvertebrates#25

Physella acuta is a species in the Alien macroinvertebrates checklist (alien-macroinvertebrates-checklist:taxon:57). Haitia acuta should be a synonym of it, so that a query would also find occurrence records named as such (including 167 occurrences in the Alien macroinvertebrates occurrence dataset, e.g. PB:Ugent:AqE:2342). The GBIF backbone, however, considers both to be accepted species:

Physella acuta thus won't return records for Haitia acuta.

Note: Haitia acuta is the only species in the macroinvertebrate occurrence dataset that is not listed by the same name in the checklist.

Indicator: pathways associated with alien species introductions

Description

The indicator shows the number of non-native plant and animal species introduced in Belgium via a certain pathway. It is based on a checklist of alien species, composed of various existing sources and databases. The information will be updated and refined as the checklist is further supplemented. The available information on introduction pathways was organized following the Convention on Biological Diversity standard (CBD 2014).

This indicator uses the same data output as issue #17. It is a specific group by (per pathway) on the same dataframe.

Is there need to use a European reference grid at 1x1km resolution?

Based on previous conversations, we found that no worldwide grid is available and that it is actually not needed for risk assessment (RA). We need it at European level for RA and at Belgian level for indicators. On the European Environment Agency page I cannot find any grid at 1x1km resolution at European level, only 10x10km and 100x100km.

@DiederikStrubbe, @amyjsdavis : do you need to calculate occupancy at European level at 1x1km or is it fine for you to get 1x1km resolution at Belgian level?

If you really need, do you have ideas how to get it? Maybe we can build it by using all reference grids at country level and join them together but we will have a lot of problems at country borders where duplicates will occur. Something to think about... Let me know. If you have a shape file for it, could you please upload it in ./data/external/? Thanks.
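
One possible way to handle the duplicates at country borders, sketched under a strong assumption: all national grids use the same cell coding scheme, so that the same cell has the same code in every country file. The cell codes below are made up for illustration, and the function name is hypothetical.

```python
def merge_national_grids(grids):
    """Merge per-country grids into one grid keyed by cell code.

    `grids` maps a country code to an iterable of cell codes.
    Returns a dict: cell code -> sorted list of countries containing it.
    Border cells appear once, with more than one country attached.
    """
    merged = {}
    for country, cell_codes in grids.items():
        for cell_code in cell_codes:
            merged.setdefault(cell_code, set()).add(country)
    return {code: sorted(countries) for code, countries in merged.items()}


# Illustrative input only (hypothetical cell codes).
national_grids = {
    "BE": ["1kmE3924N3102", "1kmE3925N3102"],
    "NL": ["1kmE3925N3102", "1kmE3926N3102"],
}
european_grid = merge_national_grids(national_grids)
border_cells = [code for code, c in european_grid.items() if len(c) > 1]
```

If the national grids do not share a coding scheme, duplicates would have to be detected geometrically instead, which this sketch does not cover.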

Effects of grid size on occupancy: ideas for improvements

Based on remarks by @qgroom:

  1. Try to study these effects with two or three common native species.
  2. PRESERVED_SPECIMEN should also be included in the basis of record.
  3. Check that the occurrences without coordinates don't have a sensible grid reference in verbatimCoordinateSystem field.
  4. For records missing a coordinateUncertaintyInMeters, is it possible to tell the coordinate uncertainty from the number of significant digits in the long/lat?
  5. An output graph showing the mean and variance of coordinateUncertaintyInMeters vs. time would be useful ancillary information.
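
For remark 4, a rough sketch of how an uncertainty could be derived from the number of decimal places, assuming coordinates are kept as plain decimal strings (floats would lose trailing zeros). It takes half the diagonal of the rounding box as the uncertainty and uses an approximate value of 111,320 m per degree of latitude. Both choices are assumptions of this sketch, not an agreed method.

```python
import math

METERS_PER_DEGREE_LAT = 111_320  # approximate length of one degree of latitude


def decimal_places(value: str) -> int:
    """Number of digits after the decimal point in a plain decimal string."""
    _, _, fraction = value.partition(".")
    return len(fraction)


def uncertainty_from_precision(lat: str, lon: str) -> float:
    """Estimate coordinate uncertainty (m) from the reported precision.

    The less precise of the two coordinates defines the rounding step;
    the result is half the diagonal of the box that step spans.
    """
    places = min(decimal_places(lat), decimal_places(lon))
    step_deg = 10 ** -places
    lat_m = step_deg * METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    lon_m = lat_m * math.cos(math.radians(float(lat)))
    return math.hypot(lat_m, lon_m) / 2


coarse = uncertainty_from_precision("51.2", "4.4")
fine = uncertainty_from_precision("51.2345", "4.4321")
```

With one decimal place at Belgian latitudes this gives an uncertainty of several kilometres, so such records would not be usable on a 1x1km grid.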

UTF-8 issues in scientificName

We currently have scientificNames containing escaped code points such as <U+00E1>:

2772890 134087647 Loncomelos brevistylus (Wolfner ) Dost<U+00E1>l 9ff7d317-609b-4c08-bd86-3bc404b77c42 Loncomelos brevistylum (Wolfner) Dost<U+00E1>l Plantae SYNONYM 2772885 Ornithogalum pyramidale L.

As you can see, these issues appear in both checklist_scientificName and backbone_scientificName.

So it must be related to how the data are read from the API (either by our own code or by rgbif).
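
Until the cause is found, the escaped code points could be restored after the fact. The sketch below is only a workaround for the symptom and assumes the escapes always have the form `<U+XXXX>` with 4 to 6 hexadecimal digits; the function name is hypothetical.

```python
import re

# Matches escapes such as <U+00E1> (4 to 6 hexadecimal digits).
_ESCAPED_CODE_POINT = re.compile(r"<U\+([0-9A-Fa-f]{4,6})>")


def restore_code_points(text: str) -> str:
    """Replace every <U+XXXX> escape by the character it stands for."""
    return _ESCAPED_CODE_POINT.sub(lambda m: chr(int(m.group(1), 16)), text)


fixed_name = restore_code_points("Loncomelos brevistylum (Wolfner) Dost<U+00E1>l")
```

Names without escapes pass through unchanged, so the function can be applied to the whole column.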

Get info about alien species at protected area level

Request from @timadriaens: add an analysis at protected area level, with a tabular file as output:

protected_area_id type year taxonKey obs ncells coverage
BE3748432 A 2010 124245 34 15 0.30
BE3748432 A 2011 124245 44 29 0.58
BE3748432 A 2012 124245 58 32 0.64
BE3748432 A 2013 124245 69 36 0.72
BE3748432 A 2014 124245 88 43 0.86
BE3748432 A 2015 124245 92 45 0.90
BE3748432 A 2016 124245 95 47 0.94
BE3748432 A 2017 124245 99 48 0.96
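
The coverage column in the example is consistent with `ncells` divided by a total of 50 cells for this protected area. That total is inferred from the example values and is an assumption, as is the function below; it only illustrates how the column could be computed.

```python
def coverage(ncells: int, total_cells: int) -> float:
    """Fraction of a protected area's grid cells occupied by the taxon."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return round(ncells / total_cells, 2)


TOTAL_CELLS = 50  # assumption: inferred from the example table, not a real value

# (year, obs, ncells) taken from the example table above.
example_rows = [
    (2010, 34, 15),
    (2011, 44, 29),
    (2012, 58, 32),
    (2013, 69, 36),
    (2014, 88, 43),
    (2015, 92, 45),
    (2016, 95, 47),
    (2017, 99, 48),
]
coverage_by_year = {
    year: coverage(ncells, TOTAL_CELLS) for year, _obs, ncells in example_rows
}
```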

GAM and decision rules output of marine species selected for PRA

From e-mail of Thomas Verleye:

We have pre-selected 5 species for PRA based on initial data availability and trends in occurrences (if we face additional bottlenecks regarding data availability or expert identification, we might reduce this number during the coming weeks, tbc):

  1. Crepidula fornicata (gastropod)
  2. Mnemiopsis leidyi (Ctenophora)
  3. Ensis leei (bivalve)
  4. Hemigrapsus takanoi (Crustacea)
  5. Magallana (Crassostrea) gigas (bivalve)

@damianooldoni: would it be possible to create a GAM for C. fornicata and M. leidyi? Thanks in advance!
