nceas / datateam-training
Training and reference materials for ADC and SASAP data team members
Home Page: https://nceas.github.io/datateam-training/training/
License: Apache License 2.0
Keeping a running list of things that need updating when we move to R 4.0.0. I don't anticipate a lot, but this keeps me from forgetting.
- arcticdatautils is available for R 4.0.0
- stringsAsFactors = FALSE from attribute data.table creation
- we will want to have this updated by the time EML2 gets released on CRAN
I was trying to add custom units based on the editing EML training, but I couldn't run this line (it appears to be coercing the unitlist to additionalMetadata):

```r
doc$additionalMetadata <- c(as(unitlist, "additionalMetadata"))
```

The error seems to come from `as(unitlist, "additionalMetadata")`:

```
Error in as(unitlist, "additionalMetadata") :
  no method or default for coercing "list" to "additionalMetadata"
```

Is it OK to go ahead and use `doc$additionalMetadata <- unitlist` instead?
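For what it's worth, the `as()` coercion comes from the older S4-based EML package; in the list-based EML v2 package a sketch like the following should work (this assumes `custom_units` is a data frame of unit definitions and `doc` was read with `read_eml()`; check the result with `eml_validate()`):

```r
library(EML)

# Sketch of the list-based approach in the EML v2 package; assumes
# `custom_units` is a data.frame of unit definitions and `doc` is a
# document read with read_eml(). No S4 as() coercion is needed.
unitlist <- set_unitList(custom_units, as_metadata = TRUE)
doc$additionalMetadata <- unitlist

# Confirm the document still validates after the assignment:
eml_validate(doc)
```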
Add an overview on how pages like this https://arcticdata.io/catalog/profile/http://orcid.org/0000-0001-6691-9384 get populated.
```r
rm_pid <- "your_resource_map_pid"
pkg <- get_package(adc_test,
                   rm_pid,
                   file_names = TRUE)
```
Section 3.4 has the following example code:

```r
pid <- publish_object(adc_test,
                      path = path,
                      formatId = formatId)
```

...but the correct keyword argument for `publish_object` is `format_id`, not `formatId`. The code chunk should be:

```r
pid <- publish_object(adc_test,
                      path = path,
                      format_id = formatId)
```
This paper needs to be added to the intro to EML section: http://onlinelibrary.wiley.com/doi/10.1890/0012-9623(2005)86%5B158:MTVOED%5D2.0.CO;2/abstract
In my version of the shiny_attributes app there doesn't seem to be the Print button mentioned; there are only Download, Quit App, and Help buttons. Is the workflow different now? The training currently says:

> Once you are done editing a table in the app, click the Print button to print the text of code that will build a data.frame in R. Copy that code and assign it to a variable in your script (e.g. `attributes <- data.frame(...)`).
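For reference, that step produces code along these lines (a hedged sketch; the actual columns come from the app and your data, following the EML attribute table convention):

```r
# Hypothetical output of the app's Print/Download step, assigned to a
# variable in your script. The exact column set depends on your
# attributes; these names follow the EML attribute table convention.
attributes <- data.frame(
  attributeName = c("sample_date", "water_temp"),
  attributeDefinition = c("Date the sample was collected",
                          "Water temperature at time of sampling"),
  unit = c(NA, "celsius"),
  numberType = c(NA, "real"),
  stringsAsFactors = FALSE
)
```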
- dbo_packages.Rmd: move it so it isn't hidden in misc any more
- move nesting to the reference guide as we transition to portals (but keep it for rare or older requests)
- update_package_object - Irene: written, needs to be incorporated
- qa_package - Emily
- add_creator_id - Irene: mostly written, might (?) be deprecated when the new editor is released
- show_indexing_status
- remove_public_read / set_public_read
- janitor::excel_numeric_to_date(), get_dupes(), clean_names() (vignette)
- eml_get / data exploration section - Irene
- Sharis
- Vivian
Other to-do's: update training with `datamgmt` functions (moved from Google Doc)
The URL we currently list in "01_Introduction.Rmd" for "effective data management" does not resolve any longer, but this one does (for now):
Consider downloading the PDF and storing it in our repo.
In 3.2, it says:
For example, let’s take a look at eml-party. To start off, notice that some elements are in solid boxes, whereas others are in dashed boxes.
Elements are not in solid/dashed boxes...in fact, the EML-party schema looks nothing like the screenshots on the training page.
Not sure yet about the limits of the function, but it seems like a useful way to get into the EML without having to go in super deep with `@` and `[[1]]`. Example: `eml_get(eml, "methodStep")`
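A minimal sketch of how that could look in practice (the file path and element names here are illustrative; this assumes a document read with `EML::read_eml()`):

```r
library(EML)

# Hypothetical metadata file; any valid EML document works.
doc <- read_eml("metadata.xml")

# eml_get() pulls every matching element from anywhere in the document,
# so you avoid long chains of $ / [[1]] subsetting:
eml_get(doc, "methodStep")
eml_get(doc, "attributeName")
```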
Looks like the link in chapter 4.2.2 refers to an older version of EML (2.1.1), but I'm not sure what the original was supposed to show from here: https://eml.ecoinformatics.org/eml-schema.html
It would be cool to play around with the shiny app within training!
@isteves There is a code chunk with a warning in the "Use references" section of the reference and training
```r
doc <- eml_add_publisher(doc)
doc <- eml_add_entity_system(doc)
```
The result won't show up on the webpage but it should add a publisher element to the dataset element and a system to all of the entities based on what the PID is. This will help make our metadata more FAIR (Findable, Accessible, Interoperable, Reusable). Let me know if you run into issues!
probably mostly in reference, rather than training
Add a section on adding superscript and subscript to the abstract / methods. The abstract is more straightforward since there's only one section.
change user preferences to "immediately" and set "Show oldest history first" to "No"
Both links in this portion of the paragraph are broken:

Additional information on the schema and how different elements (or "slots") are defined can be found [here](https://knb.ecoinformatics.org/#external//emlparser/docs/eml-2.1.1/index.html). Further explanations of the symbology can be found [here](https://manual.altova.com/xmlspy/spyenterprise/index.html?xseditingviews_schv_cmview_objects.htm).
@jeanetteclark I poked around a little but wasn't sure if I was truly finding webpages that would be appropriate replacements, I think you might be a better judge of that / be able to think of some off the top of your head
Keeping track of what needs to be updated when we switch to EML 2.2.0:

- final_review_checklist
- publish_an_object
We need to revise & update the email templates/PI correspondence section.
@maier-m @jagoldstein
https://test.arcticdata.io/metacat/metacat?action=getversion
probably References/misc
Not sure how many times this appears, but it can be changed from

```r
eml <- EML::read_eml(rawToChar(dataone::getObject(mnT, pkg$metadata)))
```

to

```r
eml <- EML::read_eml(dataone::getObject(mnT, pkg$metadata))
```
For all DBO datasets, `CN=DBO,DC=dataone,DC=org` should have readPermission and writePermission.
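One way to grant that is with `arcticdatautils::set_access()`; a hedged sketch, assuming `mn` is an already-configured member node and `pids` holds the package's identifiers:

```r
library(arcticdatautils)

# Grant the DBO group read and write on every PID in the package.
# `mn` and `pids` are assumed to be set up earlier in the workflow.
set_access(mn,
           pids,
           subjects = "CN=DBO,DC=dataone,DC=org",
           permissions = c("read", "write"))
```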
Add commonly used custom units, much like the example Solr queries section. Some off the top of my head:

- [ ] partsPerMillion
- [ ] partsPerThousand
- [ ] wattsPerSquareMeter
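A hedged sketch of what that section could show, using the three units above (column names follow the `EML::set_unitList()` convention; the unitType and parentSI values are illustrative and should be checked against the EML unit dictionary):

```r
# Illustrative custom units data frame; verify unitType/parentSI
# against the EML unit dictionary before using in real metadata.
custom_units <- data.frame(
  id = c("partsPerMillion", "partsPerThousand", "wattsPerSquareMeter"),
  unitType = c("dimensionless", "dimensionless", "irradiance"),
  parentSI = c("dimensionless", "dimensionless", "wattsPerSquareMeter"),
  multiplierToSI = c(1e-06, 1e-03, 1),
  abbreviation = c("ppm", "ppt", "W/m^2"),
  description = c("parts per million",
                  "parts per thousand",
                  "watts of energy per square meter"),
  stringsAsFactors = FALSE
)

unitlist <- EML::set_unitList(custom_units, as_metadata = TRUE)
```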
Update email templates that have any language similar to: "NSF requires X, Y, etc."
@isteves Add update_package_object() to the "Update an Object" section in the training and reference
We should add a short section here about using the `recover_failed_submission` function. It should include details about what happens when a submission fails: metacatUI catches a submission error and uploads the EML text as a data object instead. Next, we can use the `recover_failed_submission` function to try to remove the error text and get a valid EML document (note that this may not always work, depending on the error). Finally, we want to upload the recovered document as a metadata document and set the rights and access to the correct submitter.
Related to work @maier-m is already doing with Kathryn/arctic-outreach.
Some questions that need to be answered:
All biological datasets should include some taxonomic coverage. We need to add this to the EML editing section and the final checklist.
Make exercise 3 into parts A, B, C, etc., i.e. exercise 3a: create attribute table; exercise 3b: set physical; exercise 3c: review using checklist.
I noticed that we don't have an example of creating a custom units data frame in section 4.3.2 of Editing EML. Additionally, some of the references to the `datamgmt` functions in that section are now deprecated.

- add a `custom_units` data frame with ~3 custom units
- remove the deprecated `datamgmt` functions and replace those workflows with appropriate instructions

Section 3.4 I think could benefit from a more robust description of what the different nodes are. This is the question I got about this section:
On step 3.4, do we set the PROD nodes before setting the STAGING nodes? The STAGING node setup uses `cn`, which is in the PROD node chunk. When I set the STAGING node in the console I get an error saying object `cn` is not found. (I don't want to set something up by accident and end up submitting the training set using the PROD node.)
and the answer:
PROD and STAGING are two different coordinating nodes (`cn`) and each of those coordinating nodes has many member nodes (`mn`) including KNB and Arctic which have different names depending on which coordinating node you are working in
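The answer above could be backed by a short sketch in the training (the member node ID `urn:node:mnTestARCTIC` is the ADC test node as I understand it; double-check the IDs before adding this):

```r
library(dataone)

# Each environment has its own coordinating node (cn), and each cn has
# its own set of member nodes (mn) with environment-specific IDs.

# STAGING: used for training and test submissions
cn_staging <- CNode("STAGING")
mn_test <- getMNode(cn_staging, "urn:node:mnTestARCTIC")

# PROD: real datasets only; do not submit training packages here
cn_prod <- CNode("PROD")
mn_adc <- getMNode(cn_prod, "urn:node:ARCTIC")
```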
Remove get_custom_units (deprecated)
@isteves Some more use cases for Solr query would be helpful. For instance, what would the workflow look like? When would it be helpful to query Solr in context of the Arctic Data Center/data processing?
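One candidate use case to include: checking what's already archived before processing a new submission. A hedged sketch with `dataone::query()` (the query fields are illustrative; `mn` is assumed to be a configured Arctic Data Center node):

```r
library(dataone)

cn <- CNode("PROD")
mn <- getMNode(cn, "urn:node:ARCTIC")

# Example: find recent metadata records whose title mentions salmon.
# q / fl / rows are standard Solr parameters.
results <- query(mn,
                 list(q = "title:*salmon* AND formatType:METADATA",
                      fl = "identifier,title,dateUploaded",
                      rows = "10"),
                 as = "data.frame")
```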
`datamgmt::add_creator_id` is no longer used.
@jagoldstein's idea: links currently open in the same tab instead of a new one. We should replace all the markdown training links with the following HTML format: `<a href="http://example.com/" target="_blank">example</a>`
I'd like to update and consolidate some of the language in this section
For some reason 1.8.1 part 1 looks like a large paragraph but part 2 is rendering correctly as bullet points.
I had a little difficulty figuring out why I couldn't `publish_object` until I realized I had gotten my token from the regular arcticdata.io rather than test.arcticdata.io, because I had both sites open while coding along with the document. I would suggest clarifying 2.6 to get the token from test.arcticdata.io, to prevent the user from following the hyperlink in 2.3 to the production site.
add information on how to merge multiple tickets (PI submitting multiple related datasets) into one for consolidated response
In 1.2 (Effective Data Management), there's a broken link:

Read Matt Jones et al.'s paper on [effective data management] to learn how we will be organizing datasets prior to archival.

I imagine the updated link should be this: https://esajournals.onlinelibrary.wiley.com/doi/epdf/10.1890/0012-9623-90.2.205