The guide-to-n3c-v1 from national-covid-cohort-collaborative

distinguish lead/author/corresponding/contributor

during today's meeting, we decided

"leads" and "contributors" are both represented as "authors" in quarto.
"corresponding" authors will have an asterisk (or some symbol)
"lead" authors will have a cross (or some other symbol)

See the custom-info tag in quarto: https://quarto.org/docs/journals/authors.html

possible new chapter: "Starting the Analysis"

@oneilsh as I said in a review (of a chapter that I'm suggesting "Introducing the Enclave Tools"), I think the book needs a chapter that starts the reader with code for graphs & models. I think it could resemble @jerrodanzalone's and my day of the 2022 short course. And probably start with a section that includes some sql code.

snippets

@National-COVID-Cohort-Collaborative/book-of-n3c

Use the for-contributors/snippets/ directory to temporarily store files that likely will go into the book, but whose location hasn't yet been determined. I'm hoping this directory will free authors to create content when they're inspired, instead of being paralyzed because they don't yet know the perfect location.

Once the material has been transferred to the book, move the snippet to the for-contributors/snippets/incorporated/ directory.

I'm guessing this snippet system will work best when the files are fairly atomic. In other words, each file's contents hang together and are moved at once. But let's be flexible if I'm wrong.
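
A minimal Python sketch of the "move the snippet once it's incorporated" step described above. The function name `mark_incorporated` is hypothetical; the directory names come from this issue. It assumes the snippet is a single file, and it only moves the file (committing the move is still a manual step).

```python
from pathlib import Path
import shutil


def mark_incorporated(snippet, snippets_dir="for-contributors/snippets"):
    """Move one snippet file into the 'incorporated' subdirectory."""
    source = Path(snippets_dir) / snippet
    target_dir = Path(snippets_dir) / "incorporated"
    # Create the destination directory the first time it is needed.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.move(str(source), str(target))
    return target
```

For example, `mark_incorporated("draft.md")` would move `for-contributors/snippets/draft.md` to `for-contributors/snippets/incorporated/draft.md`.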

colophon

At least for now, I want diagnostics to view the quarto settings

create stubs for each chapter

https://docs.google.com/spreadsheets/d/18FWdK1jJXZxhB4t_CKyAHPWkSWIfg5kVcR0vF4_iqlI/

establish GitHub Actions

Bookdown works best with a GitHub Action, which essentially spawns a small VM that (a) collects all the markdown documents, (b) uses pandoc to convert them to html, and (c) moves the compiled products to the “gh-pages” branch. GitHub Pages takes it from there and serves it to anyone with a browser.

thumbnails for images

@oneilsh is advocating that the images zoom easily (maybe in a popout window), run with JavaScript probably

Not quite, but along the lines of

css for links so they're less distracting

try to style the css for links so they're less distracting. Maybe a grayer shade of blue.

@oneilsh, are you good with css? I'm a novice. I've started a css-hyperlink branch to experiment with. I can get the browser to recognize the hotpink css (see the screenshot). But

I don't know why/where something is overriding the color with "#fff"
I change the size only on the light theme css file. But somehow it's spilling over to the dark theme. This 2nd point may be related to Quarto, and not standard css rules.

initialize bookdown structure

starting with an adequate, but minimal skeleton of files.

add cff file

When I created a release (#13) and checked Zenodo, I noticed its advice about cff files. It has some useful information, including these two paragraphs:

https://citation-file-format.github.io/

CITATION.cff files are plain text files with human- and machine-readable citation information for software (and datasets). Code developers can include them in their repositories to let others know how to correctly cite their software.

This is an example of a simple CITATION.cff file:

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Druskat
    given-names: Stephan
    orcid: https://orcid.org/0000-0003-4925-7248
title: "My Research Software"
version: 2.0.4
doi: 10.5281/zenodo.1234
date-released: 2021-08-11

When you have a CITATION.cff file in your GitHub repository, make a release and publish it on Zenodo via the Zenodo-GitHub integration, Zenodo will use the citation information you’ve provided to populate the publication entry! This makes it easier for software developers and maintainers to publish their software with complete and correct metadata.

The readme and schema guide have a lot more details.

re-enable pdf build

The GitHub Actions build was flaky about building the latex/pdf version. (See how https://github.com/National-COVID-Cohort-Collaborative/guide-to-n3c-v1/actions/runs/4470694862 had the first run fail and the second run pass; also #73 (comment).) Maybe some LaTeX package versions weren't being updated synchronously?

I've turned it off for now (with a1a3fba), because I don't want it to distract from people contributing content. I'll re-enable it when the rush has calmed down. As a bonus, the whole job takes 90 sec now, instead of 210 sec.

tasks from 2023-03-23

@oneilsh & I went over the first three chapters. My list of misc todos:

fix internal links where I specify the visible text (eg [Logic Liaison Templates](tools.md#sec-tools-ll) instead of [Logic Liaison Templates](@sec-tools-ll))
show "Emily R. Pfaff et al. 2022" instead of "Pfaff et al. 2022"
spelling list
#83
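
A rough Python sketch of the internal-link fix in the first todo above. It rewrites links of the form `[text](@sec-...)` into `[text](file.md#sec-...)`. The assumption that the token right after `sec-` names the chapter file (so `sec-tools-ll` lives in `tools.md`) is mine, inferred from the one example in this issue; the function name is hypothetical.

```python
import re

# Matches [visible text](@sec-<chapter>-<rest>) and captures the label,
# the full anchor, and the chapter token that follows "sec-".
LINK = re.compile(r"\[([^\]]+)\]\(@(sec-([a-z0-9]+)[a-z0-9-]*)\)")


def fix_internal_links(text: str) -> str:
    """Point labeled cross references at '<chapter>.md#<anchor>'."""
    def repl(match):
        label, anchor, chapter = match.group(1), match.group(2), match.group(3)
        return f"[{label}]({chapter}.md#{anchor})"
    return LINK.sub(repl, text)
```

Links that already use the `tools.md#sec-tools-ll` form don't match the pattern, so they are left alone.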

Chapter 4: Governance, Leadership, and Operations Structures

source is complete
final conversion to Quarto
cross refs
add {{< fa lock size=2xs title="Link requires an N3C Enclave account" >}}
intention behind [NIH Information Security and Information Management TrainingNIH data management and security training](https://irtsectraining.nih.gov/FYR/00_005.aspx)?
walk-through with author
remove "undergoing final edits..." callout

Preface add authors

Need to sync authors list in preface to Chap 3 Data Lifecycle, add Xiaohan Tanner Zhang, Maya Choudhury, Sofia Z. Dard

Chapter 11: Help & Support

stub in Quarto
source is complete
final conversion to Quarto
cross refs
add {{< fa lock size=2xs title="Link requires an N3C Enclave account" >}}
walk-through with author
remove "undergoing final edits..." callout

https://national-covid-cohort-collaborative.github.io/guide-to-n3c-v1/chapters/support.html

Chapter 3: Data Life Cycle - From Patients to N3C Researchers

source is complete
convert Google Docs to markdown (see #33 for some notes)
download figures (& rename if necessary)
manually incorporate figures (paths, text, & references)
split sentences on separate lines
- pattern: (?<=[^1])\.[ ]{1,2}(?=\w)
- sub:
```
.
```
  (or with two spaces for indented bullets)
replace curly quotes with straight quotes [“”] and [‘’]
add {{< fa lock title="Link requires an N3C Enclave account" >}} (with a space separating the closing ) and the leading {{) (see #49 for some notes)
replace +/- with ± and other scientific symbols
outgoing cross refs (make sure the references to other chapters/sections/figures/tables are resolved)
incoming cross refs (define anchors for all headers for other chapters/sections to refer to)
end each figure caption with a period (#116)
change "lifecycle" to "life cycle" (per this Slack discussion) and the filename/cross-references from "lifecycle" to "cycle"
There are some paragraphs you might want to break into two. Web pages/sites tend to be narrower than a piece of paper. So some of the paragraphs extend from the top to beyond the bottom of a laptop screen. Nothing bad or wrong. You just might want a visual break in a few more places.
Stephanie & Tanner should use the same organization abbreviation
the figure captions need some more detail, and some captions aren't easily distinguishable
the resolution of the figures is very low. @oneilsh captured some great screenshots. Can you get tips from him? In short, use a big monitor and maximize the browser.
What is this supposed to connect to? *Clinical Data contains data enhancements from data partners which includes long COVID clinic visits, ADT transactions, O2 devices, NLP, and SDoH datasets.
I changed "O2" to "oxygen" --is this ok?
something is off/unbalanced with this double-quotation and the sentence maybe.
I changed the bullets to numbers for the "grouped into 7 categories..." --is this ok?
I created a numbered list for the "The N3C Data ingestion pipeline..." --is this ok?
I created a numbered list for the "unit-harmonization pipeline..." --is this ok?
walk-through with author
remove "undergoing final edits..." callout
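
The "split sentences on separate lines" step in the checklist above can be sketched in Python with the same pattern. The function name and the `indent` parameter are hypothetical. My reading is that the `[^1]` lookbehind is there to avoid splitting after a "1." list marker, but that's a guess.

```python
import re

# Same pattern as in the checklist: a period not preceded by "1",
# followed by one or two spaces and then a word character.
SENTENCE_BREAK = re.compile(r"(?<=[^1])\.[ ]{1,2}(?=\w)")


def split_sentences(text: str, indent: str = "") -> str:
    """Put each sentence on its own line; pass indent='  ' for indented bullets."""
    return SENTENCE_BREAK.sub(lambda match: ".\n" + indent, text)
```

Abbreviations such as "e.g. " followed by a word will also be split, so the result still needs a quick manual review.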

All Hands Agenda 2022-11-28

Shawn's Content

Quick overview of mechanism that's relevant to authors (Will Beasley)

Quarto
Conversion from Google Docs to Markdown tool (#33)
Images (described near the bottom of the readme)
author & contributor info (described near the bottom of the readme)
~1 hour session w/ Will and the authors to go through chapter
Once converted to markdown, writing & editing of chapter should occur on GitHub w/ Markdown

TOC & outline

Populate a markdown document that represents the table of contents. Bookdown doesn't have an explicit table of contents file (because it's derived from the files and the _bookdown.yml).

But this document will help us plan and communicate.

Welcome & front page (ie, index.qmd)

source is complete
convert Google Docs to markdown (see #33 for some notes)
download figures (& rename if necessary)
manually incorporate figures (paths, text, & references)
split sentences on separate lines
- pattern: (?<=[^1])\.[ ]{1,2}(?=\w)
- sub:
```
.
```
  (or with two spaces for indented bullets)
replace curly quotes with straight quotes “ ” ‘ ’
add {{< fa lock title="Link requires an N3C Enclave account" >}} (with a space separating the closing ) and the leading {{) (see #49 for some notes)
outgoing cross refs (make sure the references to other chapters/sections/figures/tables are resolved)
incoming cross refs (define anchors for all headers for other chapters/sections to refer to)
create (and link to) the chapter for "Funding and support for individual authors"
walk-through with author
remove "undergoing final edits..." callout

pages:

Chapter 6: Getting & Managing Data Access

https://national-covid-cohort-collaborative.github.io/guide-to-n3c-v1/chapters/access.html

experiment with Quarto

Quarto is the next version of Bookdown/knitr. I think it is worth looking into now because

there will be a lot of similarities with what we already know from Bookdown
it's not R-centric, so in theory it is more inclusive of all the N3C users, who mostly use Python. This is a small issue though, because I don't think we'll have many chapters with any dynamic code. Just code blocks. But it does mean that authors/editors who are developing locally don't have to have RStudio. Here are the options: https://quarto.org/docs/books/#quick-start
It sounds like Bookdown will be around for many years, but most of the company's/developers' effort will be on Quarto. This might be relevant for issues like #20, where someone wants a new feature.
It looks like all the GitHub Actions have been developed, so I won't have to experiment with those settings & Docker.
- https://github.com/quarto-dev/quarto-actions
- https://github.com/jjallaire/visualization-curriculum/blob/master/.github/workflows/build-book.yml

Good Resources:

Chapter 5: Onboarding, Enclave Access, N3C Team Science

stub in Quarto
convert Google Docs to markdown (see #33 for some notes)
download figures (& rename if necessary)
manually incorporate figures (paths, text, & references)
split sentences on separate lines
- pattern: (?<=[^1])\.[ ]{1,2}(?=\w)
- sub:
```
.
```
  (or with two spaces for indented bullets)
replace curly quotes with straight quotes [“”] and [‘’]
add {{< fa lock title="Link requires an N3C Enclave account" >}} (with a space separating the closing ) and the leading {{) (see #49 for some notes)
replace +/- with ± and other scientific symbols
outgoing cross refs (make sure the references to other chapters/sections/figures/tables are resolved)
incoming cross refs (define anchors for all headers for other chapters/sections to refer to)
end each figure caption with a period (#116)
walk-through with author
missing four figures (that were embedded in the Google Doc, but were missing from the stand-alone pngs)
Is there supposed to be a heading and transition here?
This section is pretty choppy. Can you think of a way to make it flow better?
fig-onboarding-030-login.png needs the red ellipse around the first dropdown box.
(never mind --remove the red circle)
Is there a sentence or two you can use to open 9.5?
remove "undergoing final edits..." callout
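
A small Python sketch of two mechanical steps in the checklist above: replacing curly quotes with straight quotes and replacing "+/-" with "±". The function name is hypothetical, and only the "+/-" symbol is handled; the "other scientific symbols" would need their own rules.

```python
# Map each curly quote character to its straight equivalent.
STRAIGHT_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def normalize_symbols(text: str) -> str:
    """Straighten curly quotes, then turn '+/-' into the plus-minus sign."""
    return text.translate(STRAIGHT_QUOTES).replace("+/-", "\u00b1")
```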

Use FontAwesome icon to indicate secured link to an Enclave page

FA Lock: https://fontawesome.com/icons/lock?s=solid&f=classic
Quarto Extension: https://github.com/quarto-ext/fontawesome

{{< fa lock size=2xs title="Link requires an N3C Enclave account" >}}
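
A hedged Python sketch for adding the shortcode above after links into the Enclave. It assumes every Enclave link starts with `https://unite.nih.gov` (the host used by the Enclave link quoted in the Google Docs conversion issue); links on other hosts would need to be handled by hand. The function name is hypothetical.

```python
import re

LOCK = '{{< fa lock size=2xs title="Link requires an N3C Enclave account" >}}'

# A markdown link to the assumed Enclave host that is not already
# followed by a space and the lock shortcode.
ENCLAVE_LINK = re.compile(
    r"(\[[^\]]+\]\(https://unite\.nih\.gov[^)\s]*\))(?! \{\{< fa lock)"
)


def add_lock_icons(text: str) -> str:
    """Append the lock shortcode, separated by a space, after each Enclave link."""
    return ENCLAVE_LINK.sub(lambda match: match.group(1) + " " + LOCK, text)
```

Running it twice on the same text should not add a second icon, because the lookahead skips links that already have one.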

Chapters to Process

call out block for not-quite-finalized chapter

@JohannaLoomba & @oneilsh, you seemed to be the people most interested in declaring the chapters that were migrated, but not yet finalized. How do you feel about a callout block that says:

This chapter is being finalized. The content might change a little. Most of the remaining work involves formatting and cross references to other chapters. We hope to complete this chapter during May 2023.

Add DOI to help citations to it

suggested by @oneilsh. I'll try Zenodo

Chapter 6 date shifting and zip code refs

There are references in here to date shifting (both the +/- 180 days N3C shifting and site pre-shifting) that need to be updated to point to a proper location when one exists (most likely the data life cycle chapter).

create & populate teams

Currently

style guide chapter/section

along the lines of

Using a consistent style across your projects can decrease the overhead as your data science team discusses options, decides on a good choice, and develops in compliant code. But like in most themes in this document, the cost is worth the effort. Unforced code errors are reduced when code is consistent, because mistake-prone styles are more apparent.

{Copied from https://ouhscbbmc.github.io/data-science-practices-1/style.html}

Make it easier for readers to cite a chapter

https://quarto.org/docs/authoring/create-citeable-articles.html

You can make it easier for others to cite your work by providing additional metadata with the YAML front-matter of your article. Citations can be provided for both articles published to the web or for articles published in journals (with or without a DOI).

https://quarto.org/docs/reference/metadata/citation.html

You can provide citation data for Quarto documents in the document front matter. The citation options are based upon the Citation Style Language (CSL) specification for items, but as YAML (rather than XML).

Code of Conduct

The contact method is "N3C Education Committee (training coordinator: Shawn O'Neil at [email protected])"

So the whole paragraph is:

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at N3C Education Committee (training coordinator: Shawn O'Neil at [email protected]). All complaints will be reviewed and investigated promptly and fairly.

See the entire file at https://github.com/National-COVID-Cohort-Collaborative/book-of-n3c-v1/blob/main/CODE_OF_CONDUCT.md

Google Analytics

To watch page counts and other things that might help justify our time.

Besides @oneilsh, is there anyone else who would like access? @jmcmurry?

References:

Chapter 7: Understanding the Data

source is complete

"freeze" version in Google Docs and include warning in markdown

:::{.callout-warning}
This chapter is in the middle of being converted.  Please do NOT make changes to the frozen Google Docs version.  The markdown source will be ready soon to edit directly.
:::

convert author yaml
convert Google Docs to markdown (see #33 for some notes)
download figures (& rename if necessary: rename 's/Lifecycle Image./fig-cycle-/' *.png)
manually incorporate figures (paths, text, & references)
split sentences on separate lines
- pattern: (?<=[^1])\.[ ]{1,2}(?=\w)
- sub:
```
.
```
  (or with two spaces for indented bullets)
replace curly quotes with straight quotes [“”] and [‘’]
add {{< fa lock title="Link requires an N3C Enclave account" >}} (with a space separating the closing ) and the leading {{) (see #49 for some notes)
replace +/- with ± and other scientific symbols
outgoing cross refs (make sure the references to other chapters/sections/figures/tables are resolved)
incoming cross refs (define anchors for all headers for other chapters/sections to refer to)
lots of missing author info in yaml
tables are crossrefed as tables, not figures
What's going on with the extended caption for fig-090? "*New container name should be similar to source container, but different enough to communicate altered intention. MS=Manuscript"
Please check the heading depths. Some are six levels deep (e.g., `7.3.3.1.1.1 Refinement Process`). Is that intentional? Most chapters have 3 (and a few have 4 levels).
In this single chapter some sections went several pages without a subsection (eg, 7.4 didn't have any children) and some areas had a new header every paragraph.
I tried to make things more consistent while reflecting the intent of each subsection's author. Please check me.
The vocabulary is referenced inconsistently. For example, sometimes "ICD10" and sometimes "ICD-10-CM". Please adjust so it's consistent (and reflects the desired specificity).
Can you think of a way to tighten up the list starting with "The concept set content that..."?
I tweaked the "Concept Set Metadata" section. See if it's ok. It should be easier to reference than bullets with bold & italics headers
Check my revisions to standard concepts (c4a523e)
walk-through with author
remove "undergoing final edits..." callout
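
For anyone without the Perl `rename` utility, the figure-renaming step in the checklist above (`rename 's/Lifecycle Image./fig-cycle-/' *.png`) could be approximated in Python as below. The function names are hypothetical. As in the original expression, the trailing `.` is a regex wildcard, so it consumes whatever single character follows "Lifecycle Image".

```python
import re
from pathlib import Path


def new_figure_name(name, pattern=r"Lifecycle Image.", replacement="fig-cycle-"):
    """Apply the substitution once, like Perl's s/// without the g flag."""
    return re.sub(pattern, replacement, name, count=1)


def rename_figures(directory):
    """Rename matching *.png files in a directory; return the new names."""
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(directory).glob("*.png")):
        target = new_figure_name(path.name)
        if target != path.name:
            path.rename(path.with_name(target))
            renamed.append(target)
    return renamed
```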

Chapter 1: Introduction

stub in Quarto
source is complete
final conversion to Quarto
add {{< fa lock size=2xs title="Link requires an N3C Enclave account" >}}
cross refs
@Kcrowley16's author info (orcid & email)
opaque background of images --assigned to @oneilsh ?
walk-through with author
remove "undergoing final edits..." callout

https://github.com/national-covid-cohort-collaborative/guide-to-n3c-v1/blob/main/chapters/intro.md

Chapter 10: Publishing & Sharing your Work

source is complete
convert Google Docs to markdown (see #33 for some notes)
download figures (& rename if necessary)
manually incorporate figures (paths, text, & references)
split sentences on separate lines
- pattern: (?<=[^1])\.[ ]{1,2}(?=\w)
- sub:
```
.
```
  (or with two spaces for indented bullets)
replace curly quotes with straight quotes “ ” ‘ ’
add {{< fa lock title="Link requires an N3C Enclave account" >}} (with a space separating the closing ) and the leading {{) (see #49 for some notes)
outgoing cross refs (make sure the references to other chapters/sections/figures/tables are resolved)
incoming cross refs (define anchors for all headers for other chapters/sections to refer to)
need affiliation urls for Christine, Carolyn, & Mary
what should the link be for [Concept Set Browser](#concept-set-browser)?
what's the double asterisk in table 1 for? "to knowledge artifacts used directly by the study** "
walk-through with author
remove "undergoing final edits..." callout

~10 min tutorial

record a tutorial for authors & contributors to use GitHub & Markdown

Maybe record a one-person zoom meeting and upload to Google Drive afterwards.

editors: please add your information to the CFF file

Please add your information to the CITATION.cff file. You can probably just copy my entry and replace with your info. If you need more details, see the cff schema guide.

(Builds on issue #15)
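
If it helps, here is a tiny Python sketch that formats one `authors` entry using the three keys shown in the citation-file-format example from the earlier cff issue. The function name is hypothetical, and it does no YAML quoting, so a value containing characters that are special in YAML (such as a colon) would need to be quoted by hand.

```python
def cff_author_entry(family_names: str, given_names: str, orcid: str) -> str:
    """Return one author entry, indented to sit under the 'authors:' key."""
    return "\n".join([
        f"  - family-names: {family_names}",
        f"    given-names: {given_names}",
        f"    orcid: {orcid}",
    ])
```

The printed entry can be pasted under `authors:` in CITATION.cff, after the existing entries.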

Remove unnecessary dependencies

Remove unused packages in the DESCRIPTION file. Last time I tried to wholesale remove, I got rid of something that was unexpectedly needed by the GitHub Actions build. This time I'll remove a few packages at a time.

user story idea

Idea: early in book, present a user story, with enough detail to hit the high points and referencing later chapters for more detail.

User story: a research team with a specific question in mind (basic 2x2 sort of test with an outcome: severe? death?), but show notional data for screenshots etc?

Registration, have a team already, covered by DUA
Browse enclave, check out the learning datasets, read up on OMOP
Submit DUR for say level 2/DAC approval
hook up with a domain team, attend one of the meetings
Collaborators join via request/approval
Find their project space
interested in something, say obesity (?), have a starting ICD code
- go to atlas, look up code, define concept set, import into N3C
use contour to generate histogram of ages, export cohort of interest
use person-level template (or table) to do the 2x2 based on defined outcome from the template (hosp vs non hosp?)
produce a table of stat results & a figure
submit download request
draft manuscript
submit to pub committee
submit to journal
get accepted, fame and fortune

Funding sources

A "chapter" dedicated to each institution's funding. Include each author's name & ORCiD icon. Then list each institution's grant numbers.

@jerrodanzalone will start completing the material in the format he wants, then we'll ask/Slack each author to include the grant numbers they want.

https://national-covid-cohort-collaborative.github.io/guide-to-n3c-v1/chapters/funding.html

frozen Google docs version: https://docs.google.com/document/d/1PpeQBbX-tjHbww45fC9wdWQXIQM4R7fw8OaJnZN9csM/edit

experiment converting a chapter in Google Docs to Markdown

The repo for the CD2H Informatics Playbook recommends the Docs to Markdown addin, available in the Google Workspace Marketplace.

@JohannaLoomba, your analysis chapter is the most complete, so I'm using it as a guinea pig. Keep writing & editing in Google Docs. I'll redo the conversion when you're done with the chapter and ready to transition to Markdown in GitHub. (If anyone else is reading this, the transition process is described at the bottom of this document.)

Some little notes to help me remember the process:

In the addin, checked/selected "Demote headings (H1 → H2, etc.)" But it really depends on how the authors defined the headings.
manually deleted the TOC produced in Markdown.
Made little comments to the authors while I was in the document (eg, "Apache Spark is 'Spark', not 'SPARK'"). Made other edits in suggestion mode.
Remember that my suggestions will not be reflected in the converted Markdown until they are accepted in the Google Doc.
The hyperlinks to sites outside the book were converted well. Even the ones into the Enclave, requiring a login (eg Tutorial: [Intro to Code Workbook](https://unite.nih.gov/workspace/module/view/latest/ri.workshop.main.module.e7b83a8c-545e-49ac-8714-f34bfa7f7767?view=focus&Id=22).)
The Google Doc has curly quotes, not straight quotes. They remained curly in the markdown. But they seemed to be rendered to html without a problem.
Footnotes need to be manually converted.
Section anchors need to be manually added (and linked).

Chapter 2: A Research Story (Overview)

https://docs.google.com/document/d/1ttUKgwVcIZHM87elrlUNV6Qi9thzOwKBg8GegKObEtg/edit#

50,000-foot view of a research project, from onboarding to publishing

Full of links to relevant later sections

Onboarding, ORCiD

DUA coverage

DUR,

Team building, collaborators

Protocol, variables, definitions

Learning about and using OMOP (e.g. concept sets)

Pinning to a release, finalizing analysis & figures

Download request

Draft paper, pub committee

Authors: Will (lead) & Jerrod

https://github.com/National-COVID-Cohort-Collaborative/guide-to-n3c-v1/blob/main/chapters/research-story.md

Chapter 8: Introducing Enclave Analysis Tools

stub in Quarto
source is complete
final conversion to Quarto
cross refs
add {{< fa lock size=2xs title="Link requires an N3C Enclave account" >}}
get fig 16 to work with pdf --the gif works fine for html, but not for pdf
walk-through with author
remove "undergoing final edits..." callout

https://github.com/National-COVID-Cohort-Collaborative/guide-to-n3c-v1/blob/main/chapters/tools.md

rename to "Guide to N3C" from "Book of N3C"

As suggested by @oneilsh and people in the NIH.

Other places to change:

Google Drive space: Guide to N3C (Previously Book of N3C)
README: Notes for Contributors
Slack Channel

selecting license

During the editorial committee meeting, we were favoring CC-BY-ND-4.0.

I'll add that to the cff file and the license file (from https://github.com/idleberg/Creative-Commons-Markdown/blob/master/4.0/by-nd.markdown)
@oneilsh is talking to NIH people about it

high res images for Life Cycle chapter

@stephanieshong & @bryanlaraway, the resolution of the test figure looks good. Thanks. Please continue with the rest of the images (and remember to keep everything lowercase).

Note that the squiggly underlines (reflecting Word's spelling & grammar checks) are visible in that recent figure. I suggest you temporarily turn them off during the screenshots.

https://national-covid-cohort-collaborative.github.io/guide-to-n3c-v1/chapters/cycle.html

builds on #98

Zenodo community for COVID-19?

Does anyone have any experience or opinions about submitting to Zenodo Communities? I think we just ask the curators to grab our Zenodo entries (which are automatically triggered any time the repo has a major release).

https://zenodo.org/communities/covid-19/about/

https://zenodo.org/communities/covid-19/?page=1&size=20
