
nceas / datateam-training


Training and reference materials for ADC and SASAP data team members

Home Page: https://nceas.github.io/datateam-training/training/

License: Apache License 2.0

CSS 23.70% R 76.30%

datateam-training's People

Contributors: cwbeltz, dmullen17, drkrynstrng, dvirlar2, emilyodean, isteves, jeanetteclark, jkibele, kellywang1126, laijasmine, maier-m, mayasamet, rachelsun97, robyngit, sharisochs, smfreund, stao1, veeveetran


datateam-training's Issues

R 4.0.0 future updates

Keeping a running list of things that need updating when we move to R 4.0.0.

I don't anticipate many, but this will keep me from forgetting.

  • need to make sure arcticdatautils is available for R 4.0.0
  • remove stringsAsFactors = FALSE from attribute data.table creation
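The second bullet follows from a change in R 4.0.0's defaults; a minimal sketch (the attribute table columns here are illustrative) showing that the argument becomes redundant:

```r
# As of R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE,
# so the explicit argument can simply be dropped when building
# attribute tables (column names below are illustrative):
attributes <- data.frame(
  attributeName = c("sample_date", "water_temp"),
  attributeDefinition = c("Date of sample collection",
                          "Water temperature in degrees Celsius")
)

# Character columns stay character without any extra argument
is.character(attributes$attributeName)
```

Under R < 4.0.0 the same call would have produced factor columns, which is why stringsAsFactors = FALSE appears throughout the current training materials.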

adding unitlist to additional metadata

I was trying to add custom units based on the Editing EML training.

I couldn't run this line (it appears to coerce the unitlist to additionalMetadata):
doc$additionalMetadata <- c(as(unitlist, "additionalMetadata"))

The error seems to come from as(unitlist, "additionalMetadata"):

Error in as(unitlist, "additionalMetadata") : 
  no method or default for coercing “list” to “additionalMetadata”

Is it OK to use doc$additionalMetadata <- unitlist instead?

4.1 clarify text on pids

rm_pid <- "your_resource_map_pid"

pkg <- get_package(adc_test,
                   rm_pid,
                   file_names = TRUE)
  • add more guiding text to help clarify how to get pids through R, and refer back to this function whenever the document asks for pids
  • refer users back to chapter 2 for a refresher

`formatId` should be `format_id` in section 3.4

Section 3.4 has the following example code:

pid <- publish_object(adc_test,
                      path = path,
                      formatId = formatId)

...but the correct keyword argument for publish_object is format_id, not formatId. The code chunk should be:

pid <- publish_object(adc_test,
                      path = path,
                      format_id = formatId)

4.3 remove print button directions ?

In my version of the shiny_attributes app there doesn't seem to be the Print button mentioned, only Download, Quit App, and Help buttons. Is the workflow different now?

Once you are done editing a table in the app, click the Print button to print the text of code that will build a data.frame in R. Copy that code and assign it to a variable in your script (e.g. attributes <- data.frame(...)).

[Screenshot: shiny_attributes app]

Training revisions

Revising sections

  • Solr - Irene (Emily can still revise/etc!)
  • Git/RStudio - Emily (Emily, I can add more about the RStudio part if you don't do pointy clicky)
  • Formatting in EML (currently on Enterprise) - Irene
  • Add section about EML references (currently on Enterprise)
  • Expand on the exploring EML schema section in References: Editing EML - Steph
  • Insert more links to other training sections within existing text - Steph

New sections

  • Troubleshooting (how to deal with R errors) - Irene will start this/contributions welcome
  • Downloading data directly (from ADC, RT) - Irene/Mitchell
  • update_package_object - Irene written, needs to be incorporated
  • qa_package - Emily
  • using tmux for parallel processing - Irene (someone else can grab this from me if they wish!)
  • add_creator_id - Irene mostly written, might (?) be deprecated when new editor is released
  • getting attributes from shapefiles - Irene mostly written
  • show_indexing_status
  • remove_public_read/set_public_read
  • janitor::excel_numeric_to_date(), get_dupes(), clean_names() (vignette)
  • eml_get/data exploration section - Irene
  • SASAP projects - Steph
  • add checklist: https://github.nceas.ucsb.edu/KNB/arctic-data/blob/master/datateam/How_To/Checklist.md

Useful workflows (intern contributions)

Sharis

  • Adding taxonomic coverage
  • Adding single data temporal coverage
  • Adding data tables for a whole folder of files with the same attributes
  • Adding a pre generated DOI to the eml
  • Obsolescence chain
  • Adding sampling info in methods
  • Set rights and access
  • Working with NetCDF’s

Vivian

  • Reading in data
    • Single file
    • Multiple files
  • Removing rows and columns with all blank cells
  • Indices of cells that contain a certain string (or part of a string)
  • Reformatting dates/times (YYYY-MM-DDThh:mm:ss)
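The cleaning steps in this list are all plain base-R operations; a hedged sketch (the data frame and date format are hypothetical) covering the last three bullets:

```r
# Toy data frame standing in for a messy spreadsheet read
df <- data.frame(
  site = c("A", "", "B"),
  date = c("01/15/2020", "", "03/02/2020"),
  stringsAsFactors = FALSE
)

# Remove rows where every cell is blank or NA
blank <- is.na(df) | df == ""
df <- df[rowSums(!blank) > 0, ]

# Indices of cells in a column containing a certain string
which(grepl("2020", df$date))

# Reformat dates to ISO 8601 (assuming mm/dd/yyyy input)
df$date <- format(as.Date(df$date, format = "%m/%d/%Y"), "%Y-%m-%d")
```

For reading in multiple files with the same structure, the usual base-R pattern is lapply(list.files(...), read.csv) followed by do.call(rbind, ...).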

Other To-do's: update training with datamgmt functions
(moved from google doc)

Text doesn't match what's happening

In 3.2, it says:

For example, let’s take a look at eml-party. To start off, notice that some elements are in solid boxes, whereas others are in dashed boxes.

Elements are not in solid/dashed boxes; in fact, the eml-party schema looks nothing like the screenshots on the training page.

Add eml_get() to training

Not sure yet about the limits of the function, but it seems like a useful way to get into the EML without having to go in super deep with @ and [[1]].

Example: eml_get(eml, "methodStep")

Add new arcticdatautils functions to Editing EML chapter

doc <- eml_add_publisher(doc)
doc <- eml_add_entity_system(doc)

The result won't show up on the webpage but it should add a publisher element to the dataset element and a system to all of the entities based on what the PID is. This will help make our metadata more FAIR (Findable, Accessible, Interoperable, Reusable). Let me know if you run into issues!

Add super/subscript to references

Add a section on adding superscript and subscript to the abstract / methods. The abstract is more straightforward since there's only one section.

RT settings

Change user preferences to "immediately" and set "Show oldest history first" to "No".

2 broken links in 1st paragraph of datateam-training/workflows/explore_eml/understand_eml_schema.Rmd

Both links in this portion of the paragraph are broken -

Additional information on the schema and how different elements (or "slots") are defined can be found [here](https://knb.ecoinformatics.org/#external//emlparser/docs/eml-2.1.1/index.html). Further explanations of the symbology can be found [here](https://manual.altova.com/xmlspy/spyenterprise/index.html?xseditingviews_schv_cmview_objects.htm).

@jeanetteclark I poked around a little but wasn't sure whether the webpages I found would be appropriate replacements. I think you might be a better judge of that, or be able to think of some off the top of your head.

EML 2.2.0 updates

Keeping track of what needs to be updated when we switch to EML 2.2.0:

  • update final_review_checklist
  • update publish_an_object
  • add section on citations

Remove rawToChar from eml call

Not sure how many times this appears, but

eml <- EML::read_eml(rawToChar(dataone::getObject(mnT, pkg$metadata)))

can be changed to

eml <- EML::read_eml(dataone::getObject(mnT, pkg$metadata))

add section describing DBO specific considerations

For all DBO datasets:

  • the group CN=DBO,DC=dataone,DC=org should have readPermission and writePermission
  • geographic coverage should be one coverage per DBO line, with the geographicDescription and bounding coordinates matching those in the attached file
  • the name of the ship the data were collected from should be listed somewhere in the metadata record

[Attachment: geo_locs.txt]

Add commonly used custom units

Add commonly used custom units - much like the example solr queries section.

Some off the top of my head:

  • partsPerMillion
  • partsPerThousand
  • wattsPerSquareMeter
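A possible starting point, sketched as a custom units data frame. The column names follow the fields a custom unit typically needs, and the unitType, parentSI, and multiplierToSI values below are illustrative guesses to verify against the EML unit dictionary before use:

```r
# Hypothetical custom_units data frame; unitType, parentSI, and
# multiplierToSI values are assumptions to check against the
# EML unit dictionary before adding to a metadata record.
custom_units <- data.frame(
  id = c("partsPerMillion", "partsPerThousand", "wattsPerSquareMeter"),
  unitType = c("dimensionless", "dimensionless", "irradiance"),
  parentSI = c("dimensionless", "dimensionless", "wattsPerSquareMeter"),
  multiplierToSI = c(1e-06, 1e-03, 1),
  description = c("ratio of two quantities as parts per million",
                  "ratio of two quantities as parts per thousand",
                  "watts of energy per square meter"),
  stringsAsFactors = FALSE
)
```

A data frame like this could then be converted to a unit list with the EML package's unit-list helper before being attached to the document's additionalMetadata.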

Data Team Training Issues

  • Broken link in 1.2 Effective Data Management to Matt Jones et al.’s paper on effective data management
  • 404 Webpage not found for eml-party (2x) and eml-attribute, and eml-physical under 3.2 "Understand the EML schema"
  • Link not found for "exploring EML (more on that here)" under 3.3 "Access specific elements"
  • 404 Webpage not found for "attributeList" under 4.3 "Edit attributeLists"
  • 5.7.1 - Blank link - Under provided dataset "Nothing was found"

Add recover_failed_submission to reference guide

We should add a short section here about using the recover_failed_submission function. It should include details about what happens when a submission fails: metacatUI catches the submission error and uploads the EML text as a data object instead. Next, we can use recover_failed_submission to try to remove the error text and recover a valid EML document (this may not always work, depending on the error). Finally, we upload the recovered document as a metadata document and set the rights and access to the correct submitter.

Add more robust guidelines for dealing with packages

Related to work @maier-m is already doing with Kathryn/arctic-outreach.

Some questions that need to be answered:

  • When is it appropriate to touch the PI's data?
    - Excel to csv/txt changes are ok, but we prefer if the PI does it themselves
    - Changing headers are ok if there are other changes to be made; needs to be documented in the description and (preferably) also an R script
    - Changed files should be linked to originals with prov (in the future, we may want to obsolete the old versions of files and link them via prov, but prov is not yet robust enough for this)
  • What constitutes a "good enough" methods/abstract?

add taxonomicCoverage to checklist

All biological datasets should include some taxonomic coverage. We need to add this to the EML editing section and the final checklist.

break up chapter 4 and exercise 3

Make exercise 3 into parts A, B, C, etc.

  • will help chapter feel less long
  • gives a little more guidance to the user

e.g. exercise 3a - create the attribute table, exercise 3b - set physical, exercise 3c - review using the checklist

Custom units example

I noticed that we don't have an example of creating a custom units data frame in section 4.3.2 of Editing EML. Additionally, some of the references to the datamgmt functions in that section are now deprecated.

  • Create an example custom_units data frame with 3ish custom units
  • Remove any deprecated datamgmt functions and replace those workflows with appropriate instructions

explain `cn` and `mn` differences more clearly

Section 3.4 I think could benefit from a more robust description of what the different nodes are. This is the question I got about this section:

On step 3.4, do we set the PROD nodes before setting the Staging nodes? The staging nodes use cn, which is in the PROD node. When I set the Staging node in the console I get an error saying object: cn is not found. (I don't want to set something up on accident and end up submitting the training set using the PROD node.)

and the answer:

PROD and STAGING are two different coordinating nodes (`cn`), and each coordinating node has many member nodes (`mn`), including KNB and Arctic, which have different names depending on which coordinating node you are working in.

Introduction to Solr - More Use Cases

@isteves Some more use cases for Solr query would be helpful. For instance, what would the workflow look like? When would it be helpful to query Solr in context of the Arctic Data Center/data processing?

Replace links with html

@jagoldstein's idea: links currently open in the same tab instead of a new one. We should replace all markdown training links with the following html format: <a href="http://example.com/" target="_blank">example</a>
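A hedged sketch of how this substitution could be done programmatically over the training's source text. The regex deliberately handles only simple [text](url) links (not images or nested brackets), so results should be reviewed before committing:

```r
# Convert simple markdown links to HTML links that open in a new tab
md <- "See [example](http://example.com/) for details."

html <- gsub("\\[([^]]+)\\]\\(([^)]+)\\)",
             "<a href=\"\\2\" target=\"_blank\">\\1</a>",
             md)

html
# -> "See <a href=\"http://example.com/\" target=\"_blank\">example</a> for details."
```

Running this over each .Rmd with readLines()/writeLines() would convert the links in place; image links (![alt](url)) would need to be excluded first.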

Solr training issues

(1) Add tip for using "obsoletedBy" field to find only the most recent versions of the packages you're searching for.
(2) Add an example of a query that looks for packages where fields are missing (e.g. -keywords:*)
(3) The link highlighted at the bottom of the page in the attached image is broken.
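For (1) and (2), hedged examples of what those query parameters might look like, assuming the standard DataONE Solr index fields:

```
# Only the newest version of each metadata record
# (nothing obsoletes it):
q=-obsoletedBy:* AND formatType:METADATA

# Records where a field is missing entirely (e.g. no keywords):
q=-keywords:* AND formatType:METADATA
```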

Clarify 2.6 to use test.arcticdata.io

I had a little difficulty figuring out why I couldn't publish_object until I realized I had gotten my token from the regular arcticdata.io rather than test.arcticdata.io, because I had both open from earlier while coding along with the document.

I would suggest clarifying 2.6 to get the token from test.arcticdata.io, to prevent the user from following the hyperlink in 2.3 to the regular site.

add to chapter 8 - merge tickets

add information on how to merge multiple tickets (PI submitting multiple related datasets) into one for consolidated response
