
austraits.build's Introduction

austraits.build: source for AusTraits


AusTraits is a transformative database containing measurements of the traits of Australia’s plant species, standardised from hundreds of disconnected primary sources. So far, data have been assembled from more than 300 distinct sources, describing more than 500 plant traits for more than 25,000 taxa. The dataset and approach are documented in detail in the following publication:

Falster D, Gallagher R, Wenk E, et al. (2021) AusTraits, a curated plant trait database for the Australian flora. Scientific Data 8: 254. DOI: 10.1038/s41597-021-01006-6

This repository contains the data for rebuilding AusTraits, while the workflow to rebuild the dataset is in the traits.build repository.

AusTraits is continually evolving as new datasets are contributed. As such, there is no single canonical version, and new versions are made available regularly. Over time, we expect that different versions will be released and used in different analyses.

Accessing data

Those interested in simply using data from AusTraits should download the compiled resource from the versioned releases archived on Zenodo at DOI: 10.5281/zenodo.3568417.

Users will want to read up on the database structure, described in the traits.build manual.

Definitions for the traits are described in the AusTraits Plant Dictionary (APD), at

Citation

Users of AusTraits are requested to cite the source publication, which documents the dataset and approach:

Falster D, Gallagher R, Wenk E, et al. (2021) AusTraits, a curated plant trait database for the Australian flora. Scientific Data 8: 254. DOI: 10.1038/s41597-021-01006-6

Rebuilding AusTraits from source

This repository (austraits.build) contains the raw data and code used to compile AusTraits from diverse, original sources.

To harmonise diverse data sources, we use a reproducible workflow that implements the changes each source requires to reformat it into a form suitable for incorporation in AusTraits. Such changes include restructuring datasets, renaming variables, converting units and updating taxon names. For the sake of transparency and continuing development, the entire workflow is made available here.
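As a rough illustration of two of these changes (restructuring and renaming), the sketch below turns a "wide" source table with one column per trait into a "long" table with one row per value. It is not the traits.build implementation: the input table, the column names and the trait names `plant_height` and `leaf_N_per_dry_mass` are hypothetical, and the numbers are toy values.

```r
# Toy source data in "wide" format: one column per trait (hypothetical names/values)
raw <- data.frame(
  taxon  = c("Species A", "Species B"),
  ht     = c(2.5, 12),
  leaf_N = c(1.2, 0.9),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from source column names to harmonised trait names
trait_map <- c(ht = "plant_height", leaf_N = "leaf_N_per_dry_mass")

# Restructure to "long" format, renaming each source column as we go
to_long <- function(raw, trait_map, taxon_col = "taxon") {
  rows <- lapply(names(trait_map), function(col) {
    data.frame(
      taxon_name = raw[[taxon_col]],
      trait_name = unname(trait_map[col]),
      # Store values as character so numeric and categorical traits share one column
      value      = as.character(raw[[col]]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

long <- to_long(raw, trait_map)
```

Keeping the mapping in data (`trait_map`) rather than in code is what makes each decision inspectable and easy to change per source.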

We use the traits.build R package and workflow to harmonise more than 300 different sources into a unified dataset. The workflow is fully reproducible and open: it exposes the decisions made in processing the data into a harmonised and curated dataset, and it can be rerun by others. AusTraits is built so that the database can be rebuilt from its parts at any time. This means that decisions made along the way (in how data are transformed or encoded) can be inspected and modified, and new data can be easily incorporated.

To build the database, follow these steps:

Install traits.build

The first step is to install a copy of traits.build:

remotes::install_github("traitecoevo/traits.build", quick = TRUE)

Clone repository

Next, download a copy of this repository from GitHub. Then open the RStudio project, or start R in the repository directory.

Build

Building the database should then be as easy as running the code in the file build.R. Note that this code can use multiple CPUs; to do this, set the number of workers to a value greater than 1.

source("build.R")

After running, you should have an object austraits available in your workspace, as well as a version saved in export/data.

Updating the build script

To update the build script, run:

traits.build::build_setup_pipeline(method="furrr", database_name = "austraits", workers = 1)

Contributing to AusTraits

We envision AusTraits as an ongoing collaborative community resource that:

  1. Increases our collective understanding of the Australian flora
  2. Facilitates the accumulation and sharing of trait data
  3. Builds a sense of community among contributors and users
  4. Aspires to meet the highest standards of transparent and reproducible research.

We'd love for you to contribute to the project. Below are some ways you can contribute:

  • Contributing new data
  • Improving data quality and reporting errors
  • Improving documentation
  • Development of the `traits.build` workflow

For details on how to contribute, please see the file CONTRIBUTING.md.

The AusTraits project is released with a Contributor Code of Conduct. By contributing to this project you agree to abide by its terms.

Acknowledgements

Funding: This work was supported via the following investments:

  • Investment (https://doi.org/10.47486/TD044, https://doi.org/10.47486/DP720) from the Australian Research Data Commons (ARDC). The ARDC is funded by the National Collaborative Research Infrastructure Strategy (NCRIS).
  • Fellowships from the Australian Research Council to Falster (FT160100113), Gallagher (DE170100208) and Wright (FT100100910),
  • A UNSW Research Infrastructure Grant to Falster, and
  • A grant from Macquarie University to Gallagher.

Recognition: Many people have contributed to AusTraits. A list of contributors is provided on Zenodo at DOI: 10.5281/zenodo.3568417.

Further information about the AusTraits project is available at the project website austraits.org.

Reuse: At this stage, only the compiled AusTraits dataset is available for reuse, via Zenodo. The raw data sources provided in this repository are not available for reuse in their current form without further discussion with data contributors.

austraits.build's People

Contributors

caitlanb, dcol2804, dfalster, dindiarto, ehwenk, garytruong, gilliankowalick, jamesrlawson, rachaelgallagher, richfitz, samcandrew, snubian, yangsophieee


austraits.build's Issues

Establish process for reviewing studies

Things to check

  • did we catch all traits (as noted in #25, some datasets don't have all traits correctly mapped)
  • metadata
  • do values seem reasonable
  • methods
  • replicates

For methods:

  • location of samples (e.g. branch, trunk, dbh, etc.)
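For the "do values seem reasonable" check, one option is to flag numeric values that fall outside a plausible range for each trait. This is a minimal sketch under assumptions: the `trait_ranges` table, its bounds and units, the trait names and the long-format columns are all hypothetical, and the data are toy values.

```r
# Hypothetical plausible ranges per trait (plant_height in m, leaf N in mg/g)
trait_ranges <- data.frame(
  trait_name = c("plant_height", "leaf_N_per_dry_mass"),
  min        = c(0.01, 1),
  max        = c(100, 100),
  stringsAsFactors = FALSE
)

# Return the rows whose numeric value lies outside the range for its trait
flag_out_of_range <- function(long, ranges) {
  idx <- match(long$trait_name, ranges$trait_name)
  # Categorical values become NA here and are never flagged
  v <- suppressWarnings(as.numeric(long$value))
  bad <- !is.na(idx) & !is.na(v) &
    (v < ranges$min[idx] | v > ranges$max[idx])
  long[bad, , drop = FALSE]
}

# Toy long-format data, one row per value
long <- data.frame(
  taxon_name = c("Species A", "Species B", "Species A", "Species C"),
  trait_name = c("plant_height", "plant_height", "leaf_N_per_dry_mass", "growth_form"),
  value      = c("2.5", "250", "0.5", "tree"),
  stringsAsFactors = FALSE
)
flagged <- flag_out_of_range(long, trait_ranges)
```

Flagged rows are candidates for review rather than automatic exclusion, since an extreme value may still be genuine.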

Investigate why regexpr calls warning

Calls to regexpr in processData function give the warning:

Error in regexpr(regex, x) : 
  (converted from warning) argument 'pattern' has length > 1 and only the first element will be used

A note in the file by @snubian says: " # NOTE: this throws a warning for some reason, saying the pattern has length > 1. Not sure why, doesn't seem to matter."

For now I have silenced warnings with suppressWarnings(...), but it would be good to understand why this warning appears.
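One plausible cause, an assumption since processData itself is not shown here, is that the `regex` argument is a character vector (for example one pattern per lookup row). `regexpr()` is vectorised over `text` but not over `pattern`, so it warns and silently uses only the first pattern. The sketch below, with hypothetical inputs, shows that result and two ways to use every pattern.

```r
# Hypothetical inputs: several patterns and one string to search
patterns <- c("tree", "shrub")
x <- "small shrub"

# Reproduces the issue: R warns and matches only "tree", so no match (-1) is found
first_only <- suppressWarnings(regexpr(patterns, x))

# Option 1: apply each pattern separately, one match position per pattern
per_pattern <- vapply(patterns, function(p) regexpr(p, x)[[1]], integer(1))

# Option 2: combine the patterns into a single alternation regex
combined <- regexpr(paste(patterns, collapse = "|"), x)
```

This suggests the warning does matter: with `suppressWarnings()` in place, any pattern after the first is ignored.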

Update tests for latest testthat

The testthat package has recently been updated, and the tests I've written may not work with the new version. I'm running testthat_0.11.0, whereas the latest is testthat_1.0.2.

review readme page

Rachael, can you read over the new readme page and check you are happy with the content? It will be expanded in future.

Add data for all sources

@snubian, I noticed you only added some of the data. Is there a reason we can't add it all now? I'm doing some operations like adding a metadata file to each folder, so I'll need to redo these each time a new cache of data appears.

Also, I removed all the config subfolders within each dataset's folder. I think it's easier to have just a single folder for each dataset. You might have recently moved the config files into the config folder, so apologies if this cost you time.

Standardise categorical trait values

I think Stuart's existing code has this functionality, via the file configLookups.csv. Elsewhere, James has been documenting changes to be made, so the task is to import James's changes into the existing workflow.
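The structure of configLookups.csv isn't shown here, so the following is only a sketch of a table-driven lookup under assumed columns (`trait_name`, `find`, `replace`). Keying on trait and raw value lets the same code mean different things in different traits; values with no lookup entry are left unchanged. The "T" to "tree" mapping for growth form comes from the issue below; the other entries are hypothetical.

```r
# Hypothetical lookup table; the real configLookups.csv columns may differ
lookups <- data.frame(
  trait_name = c("growth_form", "growth_form", "growth_form"),
  find       = c("T", "S", "H"),
  replace    = c("tree", "shrub", "herb"),
  stringsAsFactors = FALSE
)

standardise_categorical <- function(data, lookups) {
  # Key on trait name plus raw value, so a code can map differently per trait
  key_data   <- paste(data$trait_name, data$value, sep = "::")
  key_lookup <- paste(lookups$trait_name, lookups$find, sep = "::")
  idx <- match(key_data, key_lookup)
  # Replace only values that have a lookup entry; leave the rest unchanged
  hit <- !is.na(idx)
  data$value[hit] <- lookups$replace[idx[hit]]
  data
}

data <- data.frame(
  trait_name = c("growth_form", "growth_form", "leaf_type"),
  value      = c("T", "climber", "T"),
  stringsAsFactors = FALSE
)
out <- standardise_categorical(data, lookups)
```

Here only the first row changes: "climber" has no entry, and "T" under `leaf_type` is not matched because the key includes the trait.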

Character T being read as TRUE

I just noticed that, for example in dataset_066, a single character "T" is converted to TRUE by read.csv. Here is a line of data from the csv:

Berrimah_66,Eucalyptus tetrodonta,T,E,S,C3,No,,63.5,,0.68,1.09,105,16.4, 7.0 ,1.1

In R this becomes:

2 Berrimah_66       Eucalyptus miniata TRUE          E         S   C3     No                     NA        74.0 etc

Not only is "TRUE" written to the output file, but the lookup value is not found; in this case we want T to give the lookup value "tree" for the trait growth form.
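The conversion happens because `read.csv()` passes each column through `type.convert()`, which treats a column containing only values such as "T" and "F" as logical. Declaring column classes with `colClasses` avoids this. The sketch below uses the first three fields of the row quoted above; the column names are hypothetical.

```r
# First three fields of the dataset_066 row; column names are hypothetical
csv_text <- "site,species,growth_form\nBerrimah_66,Eucalyptus tetrodonta,T"

# Default behaviour: the column holds only "T", so it becomes logical TRUE
default <- read.csv(text = csv_text)

# Read every column as character, so "T" stays the string "T"
as_char <- read.csv(text = csv_text, colClasses = "character")

# Or set the class for the affected column only (a named colClasses vector)
one_col <- read.csv(text = csv_text, colClasses = c(growth_form = "character"))
```

Reading everything as character is the simpler guard for a generic pipeline, with numeric traits converted explicitly later.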

Implement unit conversions

We should be able to more or less use the existing function (from BAAD), combined with James's updated table of conversions.
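The BAAD function isn't reproduced here, so this is an independent sketch of a table-driven conversion, assuming every conversion is a single multiplicative factor. The table columns and entries are hypothetical. Non-linear conversions (e.g. temperature) would need an expression or function column instead.

```r
# Hypothetical conversion table: multiply a value in unit_in by factor to get unit_out
conversions <- data.frame(
  unit_in  = c("mm", "g/cm3", "mg/g"),
  unit_out = c("m", "kg/m3", "g/g"),
  factor   = c(1e-3, 1e3, 1e-3),
  stringsAsFactors = FALSE
)

convert_units <- function(value, unit_in, unit_out, table) {
  # Nothing to do if the units already match
  if (unit_in == unit_out) return(value)
  i <- which(table$unit_in == unit_in & table$unit_out == unit_out)
  # Fail loudly rather than silently passing through unconverted values
  if (length(i) != 1) stop("No conversion from ", unit_in, " to ", unit_out)
  value * table$factor[i]
}

heights_m <- convert_units(c(250, 1200), "mm", "m", conversions)
```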

Flowering time data is messy

A number of different formats for flowering and fruiting time data are present (see below). austraits v1 had code to standardise these data, which could be used to similar effect in austraits v2.

e.g.

a_ <- a[a$trait_name == "flowering_time",]
unique(a_$value)
[1] "Sept - Oct" "All year" "Nov - Dec" NA "Dec - Feb" "Aug - Nov" "Aug. - Oct" "Oct - Nov"
[9] "Aug - Dec" "Dec - Jan" "Jan - Feb" "June - Aug" "Oct - Dec" "Oct -Dec" "Sept - Dec" "July - Nov"
[17] "July - OCt" "Aug - Sept" "Mar - May" "Oct - Feb" "Sept - Nov" "Feb - Apr" "July - Oct" "Aug - Oct"
[25] "April - June" "Nov - Feb" "Nov - Jan" "Nov - March" "Sept - Jan" "Summer" "Oct - Jan" "July - Dec"
[33] "Oct - Mar" "Spring - Summer" "Summer - Autumn" "Dec - Mar" "Jan - Apr" "Sept - Mar" "Apr - May" "April - July"
[41] "April - Oct" "Feb - June" "June - Sept" "May - July" "Spring" "Aug - Mar" "Dec - May" "Feb - Mar"
[49] "July - Feb" "June - July" "Nov - MAr" "May - Oct" "Nov - April" "Aug - Feb" "Sept - Feb" "Aug - Dec ?"
[57] "Mar - Nov" "Dec - Apr" "Dec. - Feb" "July - Sept" "June - Nov" "Nov - Mar" "Dec - Feb, fire" "May - June"
[65] "irregular" "Nov - Apr" "April - May" "July - Aug" "Mar - Sept" "Sept - May" "Apr - Nov" "Apr - Sept"
[73] "Feb - Apr;July - Aug" "Feb - May" "Mar - June" "Mar - May; Oct - Dec" "May - Sept" "Sept -Nov" "Apr - Aug" "Feb - July"
[81] "Jan - July" "June - Oct" "Dec - MAr" "Spring- summer" "May - Aug" "Dec - July" "Mar - MAy" "Nov. - Jan"
[89] "April - Sept" "Mar - April" "Mar - Aug" "Mar - July" "June - Jan" "May - Dec" "May - Nov" "Apr - Oct"
[97] "Aug - Oct" "Dec - June" "June - Dec" "Nov" "Nov -Dec" "Oct" "Apr - July" "Aug - Jan"
[105] "Aug - NOv" "Jan" "Jan - Apr" "Jan - June" "Jan - May" "Aug -Nov" "Nov - Dec" "Oct. - Nov"
[113] "Jan - Mar" "July - Oct" "Sept" "Oct - April" "Sept- Dec" "all year" "allyear" "apr-aug"
[121] "jan-sep" "jan-mar" "feb-may" "feb-sep" "feb-aug" "mar-jul" "mar-may" "feb-jun"
[129] "jan-jun" "jun-sep" "jan-jul" "jan-apr" "sep-nov" "sep-dec" "dec" "nov-may"
[137] "dec-jul" "oct-jan" "nov-apr" "jan-aug" "apr-dec" "dec-may" "jun-oct" "apr-oct"
[145] "jul-sep" "jul-aug" "aug-nov" "dec-jun" "mar-nov" "jul-nov" "sep-feb" "dec-jan"
[153] "mar-apr" "jan-feb" "dec-mar" "mar-sept" "feb-jul" "apr-jul" "mar-jun" "may-nov"
[161] "apr-nov" "jul-oct" "jan-may" "june" "oct-mar" "jun-feb" "nov-jan" "nov-jun"
[169] "aug-mar" "nov-aug" "dec-aug" "nov-mar" "jun-dec" "nov-feb" "oct-dec" "nov-dec"
[177] "oct-feb" "aug-jan" "apr-jun" "may-oct" "sep-mar" "mar" "jan" "jun-aug"
[185] "apr-sep" "jan-oct" "jul-mar" "jun-may" "sep-may" "ay" "oct-may" "may-aug"
[193] "apr-may" "dec-feb" "feb" "nov-jul" "feb-mar" "feb-apr" "apr" "mar-dec"
[201] "may-sep" "jul-jan" "feb-nov" "feb-oct" "mar-aug" "feb-dec" "sep-oct" "may-jul"
[209] "oct" "sep" "aug-dec" "dec-apr" "AY" "mar-sep" "dec-sep" "jun-nov"
[217] "sep-jun" "jul" "mar-oct" "jun" "jun-mar" "oct-jul" "aug-apr" "oct-jun"
[225] "nov" "jan-nov" "sep-apr" "oct-nov" "aug-feb" "aug-may" "aug-oct" "may-jun"
[233] "jun-jul" "aug" "nov-oct" "jul-dec" "apr-feb" "apr-jan" "aug-sep" "dec-oct"
[241] "ephemeral" "Feb" "feb-april" "jan/june" "jan-feb/jun-jul" "jul-apr" "jul-may" "mar,sep"
[249] "mar-aug/ephemeral" "mar-aug/nov-jan" "mar-may/aug-oct" "may" "may-dec" "may-july" "nov-sep" "oct-apr"
[257] "sep-jan" "sep-jen" "sep-jul" "spring" "spring/autumn" "spring-summer" "summer" "aug-jul"
[265] "autumn" "jan-july" "july" "summer-autumn" "periodic" "may-feb" "jun-jan" "aug-jun"
[273] "jul-feb" "may-jan" "4-10" "1-12" "8-1" "7-10" "5-9" "5-11"
[281] "8-10" "7-9" "2-7" "10-1" "4-5" "7-8" "9-10" "5-10"
[289] "12-8" "5-8" "2-8" "6-9" "11-2" "11-12" "6-7" "7-11"
[297] "9-3" "1-3" "5" "8-11" "12-7" "5-6" "4-6" "4-8"
[305] "6-8" "11-1" "8-9" "5-7" "3-5" "9-12" "12-1" "1"
[313] "12-3" "11-6" "6" "10-2" "9" "3-10" "3-8" "2-9"
[321] "1-8" "2-11" "12-4" "4-9" "3-7" "1-7" "8-12" "1-5"
[329] "1-6" "10-12" "3-6" "11-3" "1-10" "6-11" "11-4" "6-12"
[337] "4-7" "1-4" "9-11" "3" "7-1" "8" "3-9" "2-10"
[345] "1-2" "7-12" "3-12" "6-1" "6-10" "10" "10-11" "12-2"
[353] "2-4" "9-2" "2-5" "2-3" "12-5" "9-1" "8-2" "10-6"
[361] "2" "9-5" "4-12" "9-4" "2-6" "10-3" "4-2" "12"
[369] "11" "11-5" "10-4" "1-9" "4" "4-11" "7" "10-7"
[377] "5-4" "6-2" "6-3" "7-2" "5-12" "10-5" "3-11" "12-6"
[385] "7-3" "3-4" "8-5" "8-3" "5-2" "5-1" "8-4" "1-11"
a_ <- a[a$trait_name == "flowering_month_start",]
unique(a_$value)
[1] "10" "12" NA "11" "9" "3" "8" "1" "6" "2" "4" "7" "all year" "5"
[15] "May" "July" "March" "All year" "September" "November" "December" "June" "August" "April" "January" "February" "October" "NULL"
[29] "MAR" "MAY" "SEP" "FEB" "JAN" "AUG" "DEC" "APR" "JUN" "JUL" "OCT, JAN" "OCT" "NOV" "SEPT"
[43] "NOV, JAN" "ephemeral" "JULY" "JUNE" "APRIL" "JAN, AUG" "APRIL, OCT" "MAR, JULY" "APRI" "FEB,AUG" "OCT,JAN" "MAY, AUG" "FEB, MAY"
a_ <- a[a$trait_name == "flowering_month",]
unique(a_$value)
[1] "Sep" "Oct" "Nov" "Dec" "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul"
[12] "Aug" "all year" "after rain" "after flooding" "after fire"
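The austraits v1 code isn't shown here, so the following is an independent sketch rather than that code. It maps one free-text value to a 12-element logical vector (January to December), handling month names and abbreviations in any case, month numbers, ranges that wrap past December (e.g. "Nov - Feb") and several ranges separated by `;`, `/` or `,`. Values it cannot parse, such as "Summer" or "ephemeral", return all `NA` for manual review; treating "AY" as "all year" is an assumption.

```r
# Convert one month token ("Sept", "aug.", "JUN", "4") to 1-12, or NA
month_index <- function(token) {
  token <- tolower(trimws(gsub(".", "", token, fixed = TRUE)))
  if (grepl("^[0-9]+$", token)) {
    m <- as.integer(token)
    return(if (m >= 1 && m <= 12) m else NA_integer_)
  }
  # Match on the first three letters, so "Sept", "sep" and "SEPT" all give 9
  match(substr(token, 1, 3), tolower(month.abb))
}

# Parse one flowering-time value into a logical vector over the 12 months
parse_flowering <- function(value) {
  if (is.na(value)) return(rep(NA, 12))
  out <- rep(FALSE, 12)
  v <- tolower(gsub("\\s", "", value))
  if (v %in% c("allyear", "ay")) return(rep(TRUE, 12))
  for (part in strsplit(v, "[;/,]")[[1]]) {
    ends <- strsplit(part, "-")[[1]]
    idx <- vapply(ends, month_index, integer(1))
    # Anything other than one month or a from-to pair is left for manual review
    if (length(idx) < 1 || length(idx) > 2 || anyNA(idx)) return(rep(NA, 12))
    from <- idx[1]
    to <- idx[length(idx)]
    # Ranges such as Nov - Feb wrap around the end of the year
    months <- if (from <= to) from:to else c(from:12, 1:to)
    out[months] <- TRUE
  }
  out
}

parsed <- lapply(c("Sept - Oct", "Nov - Feb", "4-10", "Summer"), parse_flowering)
```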

Table of site variable targets

We need a table of desired variable names and units for site variables, like the one for plant traits in config/variableDefinitions.csv.

review contributing page

Stuart, could you review the README.md and CONTRIBUTING.md pages, to see if the instructions generally make sense?
