
national-covid-cohort-collaborative / guide-to-n3c-v1

Research with the National COVID Cohort Collaborative (N3C: https://ncats.nih.gov/n3c)

Home Page: https://national-covid-cohort-collaborative.github.io/guide-to-n3c-v1/

License: Other

TeX 56.02% SCSS 2.42% CSS 0.64% Lua 11.86% HTML 0.48% JavaScript 28.58%
covid-19 omop n3c national-covid-cohort-collaborative covid health palantir enclave quarto

guide-to-n3c-v1's People

Contributors

cfsuver, chrisroederucdenver, dependabot[bot], hlehmann17, jerrodanzalone, johannaloomba, oneilsh, stephanieshong, wibeasley

guide-to-n3c-v1's Issues

snippets

@National-COVID-Cohort-Collaborative/book-of-n3c

Use the for-contributors/snippets/ directory to temporarily store files that likely will go into the book, but whose location hasn't yet been determined. I'm hoping this directory will free authors to create content when they're inspired, instead of being paralyzed because they don't yet know the perfect location.

Once the material has been transferred to the book, move the snippet to the for-contributors/snippets/incorporated/ directory.

I'm guessing this snippet system will work best when the files are fairly atomic. In other words, each file's contents all hang together and are moved at once. But let's be flexible if I'm wrong.

colophon

At least for now, I want diagnostics to view the quarto settings

create stubs for each chapter

  • ch-intro.md: Introduction
  • ch-research-story.md: A Research Story
  • ch-lifecycle.md: Data Lifecycle - From Patients to N3C Researchers
  • ch-governance.md: Governance, Leadership, and Operations Structures
  • ch-onboarding.md: Onboarding, Enclave Access, N3C Team Science
  • ch-data-access.md: Getting & Managing Data Access
  • ch-data-understanding.md: Understanding the Data
  • ch-data-analysis.md: Analyzing the Data
  • ch-practices.md: Best Practices and Important Data Considerations
  • ch-publishing.md: Publishing and Sharing Your Work
  • ch-support.md: Special Topic: Help and Support
  • ch-machine-learning.md: Special Topic: Machine Learning
  • ch-enclave-advanced.md: Special Topic: Advanced Enclave Coding Techniques
  • ch-examples.md: Special Topic: Start to finish examples or worked examples

https://docs.google.com/spreadsheets/d/18FWdK1jJXZxhB4t_CKyAHPWkSWIfg5kVcR0vF4_iqlI/

establish GitHub Actions

Bookdown works best with a GitHub Action, which essentially spawns a small VM that (a) collects all the markdown documents, (b) uses pandoc to convert them to html, and (c) moves the compiled products to the “gh-pages” branch. GitHub Pages takes it from there and serves it to anyone with a browser.

css for links so they're less distracting

try to style the css for links so they're less distracting. Maybe a grayer shade of blue.

@oneilsh, are you good with css? I'm a novice. I've started a css-hyperlink branch to experiment with. I can get the browser to recognize the hotpink css (see the screenshot). But

  1. I don't know why/where something is overriding the color with "#fff"
  2. I change the size only on the light theme css file. But somehow it's spilling over to the dark theme. This 2nd point may be related to Quarto, and not standard css rules.

image

add cff file

When I created a release (#13) and checked Zenodo, I noticed its advice about cff files. It has some useful information, including these two paragraphs:

https://citation-file-format.github.io/

CITATION.cff files are plain text files with human- and machine-readable citation information for software (and datasets). Code developers can include them in their repositories to let others know how to correctly cite their software.

This is an example of a simple CITATION.cff file:

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Druskat
    given-names: Stephan
    orcid: https://orcid.org/0000-0003-4925-7248
title: "My Research Software"
version: 2.0.4
doi: 10.5281/zenodo.1234
date-released: 2021-08-11

When you have a CITATION.cff file in your GitHub repository, make a release and publish it on Zenodo via the Zenodo-GitHub integration, Zenodo will use the citation information you’ve provided to populate the publication entry! This makes it easier for software developers and maintainers to publish their software with complete and correct metadata.

The readme and schema guide have a lot more details.
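
A quick sanity check before making a release could look like the sketch below. It assumes the four keys that the CFF 1.2.0 schema marks as required (`cff-version`, `message`, `authors`, `title`) and only scans for top-level keys with a regular expression, so it is not a substitute for a real schema validator. The function name `missing_cff_keys` is hypothetical.

```python
import re

# Assumption: these are the keys the CFF 1.2.0 schema treats as required.
REQUIRED_KEYS = {"cff-version", "message", "authors", "title"}


def missing_cff_keys(cff_text: str) -> set[str]:
    """Return the required top-level keys that do not appear in the text.

    Only lines that start in column 0 with `key:` are counted, so nested
    keys such as `family-names` under `authors` are ignored.
    """
    top_level = set(re.findall(r"^([A-Za-z][\w-]*):", cff_text, flags=re.MULTILINE))
    return REQUIRED_KEYS - top_level


example = """\
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Druskat
    given-names: Stephan
title: "My Research Software"
"""

print(missing_cff_keys(example) or "all required keys present")
```

Running it against the repository's own `CITATION.cff` (read with `Path("CITATION.cff").read_text()`) would flag a missing key before Zenodo sees the release.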

re-enable pdf build

The GitHub Actions build was flaky about building the LaTeX/PDF version. (See how https://github.com/National-COVID-Cohort-Collaborative/guide-to-n3c-v1/actions/runs/4470694862 had the first run fail and the second run pass; also #73 (comment).) Maybe some LaTeX package versions weren't being updated synchronously?

I've turned it off for now (with a1a3fba), because I don't want it to distract from people contributing content. I'll re-enable it when the rush has calmed down. As a bonus, the whole job takes 90 sec now, instead of 210 sec.

tasks from 2023-03-23

@oneilsh & I went over the first three chapters. My list of misc todos:

  • fix internal links where I specify the visible text (eg [Logic Liaison Templates](tools.md#sec-tools-ll) instead of [Logic Liaison Templates](@sec-tools-ll))
  • show "Emily R. Pfaff et al. 2022" instead of "Pfaff et al. 2022"
  • spelling list
  • #83

Chapter 4: Governance, Leadership, and Operations Structures

  • source is complete
  • final conversion to Quarto
  • cross refs
  • add {{< fa lock size=2xs title="Link requires an N3C Enclave account" >}}
  • intention behind [NIH Information Security and Information Management TrainingNIH data management and security training](https://irtsectraining.nih.gov/FYR/00_005.aspx)?
    image
  • walk-through with author
  • remove "undergoing final edits..." callout

Preface add authors

Need to sync authors list in preface to Chap 3 Data Lifecycle, add Xiaohan Tanner Zhang, Maya Choudhury, Sofia Z. Dard

Chapter 3: Data Life Cycle - From Patients to N3C Researchers

  • source is complete
  • convert Google Docs to markdown (see #33 for some notes)
  • download figures (& rename if necessary)
  • manually incorporate figures (paths, text, & references)
  • split sentences on separate lines
    • pattern: (?<=[^1])\.[ ]{1,2}(?=\w)
    • sub:
      .
      
      
      (or with two spaces for indented bullets)
  • replace curly quotes with straight quotes [“”] and [‘’]
  • add {{< fa lock title="Link requires an N3C Enclave account" >}} (with a space separating the closing ) and the leading {{) (see #49 for some notes)
  • replace +/- with &plusmn; and other scientific symbols
  • outgoing cross refs (make sure the references to other chapters/sections/figures/tables are resolved)
  • incoming cross refs (define anchors for all headers for other chapters/sections to refer to)
  • end each figure caption with a period (#116)
  • change "lifecycle" to "life cycle" (per this Slack discussion) and the filename/cross-references from "lifecycle" to "cycle"
  • There are some paragraphs you might want to break into two. Web pages/sites tend to be narrower than a piece of paper. So some of the paragraphs extend from the top to beyond the bottom of a laptop screen. Nothing bad or wrong. You just might want a visual break in a few more places.
  • Stephanie & Tanner should use the same organization abbreviation
  • the figure captions need some more detail, and some captions aren't easily distinguishable
  • the resolution of the figures is very low. @oneilsh captured some great screenshots. Can you get tips from him? In short, use a big monitor and maximize the browser.
  • What is this supposed to connect to? *Clinical Data contains data enhancements from data partners which includes long COVID clinic visits, ADT transactions, O2 devices, NLP, and SDoH datasets.
  • I changed "O2" to "oxygen" --is this ok?
  • something is off/unbalanced with this double-quotation and the sentence maybe.
    image
  • I changed the bullets to numbers for the "grouped into 7 categories..." --is this ok?
  • I created a numbered list for the "The N3C Data ingestion pipeline..." --is this ok?
  • I created a numbered list for the "unit-harmonization pipeline..." --is this ok?
  • walk-through with author
  • remove "undergoing final edits..." callout
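
The "split sentences on separate lines" step in the checklist above can be sketched in Python as follows. The pattern is copied from the checklist; the substitution is assumed to be a period followed by a newline (plus optional indentation for bullets), and the function name `split_sentences` is hypothetical. The `[^1]` guard presumably keeps `1. ` list markers from being split.

```python
import re

# Pattern from the checklist: a period not preceded by "1", followed by
# one or two spaces and then a word character.
SENTENCE_BREAK = re.compile(r"(?<=[^1])\.[ ]{1,2}(?=\w)")


def split_sentences(text: str, indent: str = "") -> str:
    """Put each sentence on its own line.

    `indent` is prepended to every continuation line; pass two spaces
    when the text sits inside an indented bullet.
    """
    return SENTENCE_BREAK.sub(".\n" + indent, text)


print(split_sentences("N3C harmonizes data. It uses OMOP.  Partners submit regularly."))
```

Abbreviations such as "e.g. " followed by a word would also be split, so the result still needs a manual pass.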

All Hands Agenda 2022-11-28

Shawn's Content

Quick overview of mechanism that's relevant to authors (Will Beasley)

  • Quarto
  • Conversion from Google Docs to Markdown tool (#33)
  • Images (described near the bottom of the readme)
  • author & contributor info (described near the bottom of the readme)
  • ~1 hour session w/ Will and the authors to go through chapter
  • Once converted to markdown, writing & editing of chapter should occur on GitHub w/ Markdown

TOC & outline

Populate a markdown document that represents the table of contents. Bookdown doesn't have an explicit table of contents file (because it's derived from the files and the _bookdown.yml).

But this document will help us plan and communicate.

Welcome & front page (ie, index.qmd)

  • source is complete
  • convert Google Docs to markdown (see #33 for some notes)
  • download figures (& rename if necessary)
  • manually incorporate figures (paths, text, & references)
  • split sentences on separate lines
    • pattern: (?<=[^1])\.[ ]{1,2}(?=\w)
    • sub:
      .
      
      
      (or with two spaces for indented bullets)
  • replace curly quotes with straight quotes “ ” ‘ ’
  • add {{< fa lock title="Link requires an N3C Enclave account" >}} (with a space separating the closing ) and the leading {{) (see #49 for some notes)
  • outgoing cross refs (make sure the references to other chapters/sections/figures/tables are resolved)
  • incoming cross refs (define anchors for all headers for other chapters/sections to refer to)
  • create (and link to) the chapter for "Funding and support for individual authors"
  • walk-through with author
  • remove "undergoing final edits..." callout
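
The curly-quote replacement in the checklist above can be done in one pass with a translation table. This is a minimal sketch; the function name `straighten_quotes` is hypothetical, and it only covers the four characters named in the checklist.

```python
# Map the four curly quote characters from the checklist to straight quotes.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as an apostrophe)
})


def straighten_quotes(text: str) -> str:
    """Return `text` with curly quotes replaced by straight ones."""
    return text.translate(CURLY_TO_STRAIGHT)


print(straighten_quotes("\u201cN3C\u2019s Enclave\u201d"))
```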

pages:

Chapter 6: Getting & Managing Data Access

  • source is complete
  • convert Google Docs to markdown (see #33 for some notes)
  • download figures (& rename if necessary)
  • manually incorporate figures (paths, text, & references)
  • replace curly quotes with straight quotes “ ” ‘ ’
  • add {{< fa lock title="Link requires an N3C Enclave account" >}} (with a space separating the closing ) and the leading {{) (see #49 for some notes)
  • reference the Help & Support chapter, instead of the external site. When the url changes, fewer places in the book need to be updated (Suggested by Amy Olex)
  • need Mariam's author info (eg, orcid)
  • Should "enclave" be capitalized?
  • IRB is defined after the 4th time it's used, not the 1st time
  • capitalize "institutional Data Use"?
  • what type of callout style do you want for "Update: As of mid-2022, N3C is no longer..."
  • Fig 3's caption is incomplete. Make sure you also get the alt text.
    image
  • Fig 4's red annotations are missing
    image
  • correct use of dash in " but cannot share data or files across them–thus it is impossible"?
  • outgoing cross refs (make sure the references to other chapters/sections/figures/tables are resolved)
  • incoming cross refs (define anchors for all headers for other chapters/sections to refer to)
  • walk-through with author
  • remove "undergoing final edits..." callout

https://national-covid-cohort-collaborative.github.io/guide-to-n3c-v1/chapters/access.html

experiment with Quarto

Quarto is the next version of Bookdown/knitr. I think it is worth looking into now because

  1. there will be a lot of similarities with what we already know from Bookdown
  2. it's not R-centric, so in theory it's more inclusive of all the N3C users, who mostly use Python. This is a small issue though, because I don't think we'll have many chapters with any dynamic code. Just code blocks. But it does mean that authors/editors who are developing locally don't have to have RStudio. Here are the options: https://quarto.org/docs/books/#quick-start
  3. It sounds like Bookdown will be around for many years, but most of the company's/developers' effort will be on Quarto. This might be relevant for issues like #20, where someone wants a new feature.
  4. It looks like all the GitHub Actions have been developed, so I won't have to experiment with those settings & Docker.

Good Resources:

Chapter 5: Onboarding, Enclave Access, N3C Team Science

  • stub in Quarto
  • convert Google Docs to markdown (see #33 for some notes)
  • download figures (& rename if necessary)
  • manually incorporate figures (paths, text, & references)
  • split sentences on separate lines
    • pattern: (?<=[^1])\.[ ]{1,2}(?=\w)
    • sub:
      .
      
      
      (or with two spaces for indented bullets)
  • replace curly quotes with straight quotes [“”] and [‘’]
  • add {{< fa lock title="Link requires an N3C Enclave account" >}} (with a space separating the closing ) and the leading {{) (see #49 for some notes)
  • replace +/- with &plusmn; and other scientific symbols
  • outgoing cross refs (make sure the references to other chapters/sections/figures/tables are resolved)
  • incoming cross refs (define anchors for all headers for other chapters/sections to refer to)
  • end each figure caption with a period (#116)
  • walk-through with author
  • missing four figures (that were embedded in the Google Doc, but were missing from the stand-alone pngs)
  • Is there supposed to be a heading and transition here?
    image
  • This section is pretty choppy. Can you think of a way to make it flow better?
    image
  • fig-onboarding-030-login.png needs the red ellipse around the first dropdown box.
    (never mind --remove the red circle)
  • Is there a sentence or two you can use to open 9.5?
    image
  • remove "undergoing final edits..." callout
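
Adding the lock shortcode by hand is easy to miss, so a first pass could be scripted as in the sketch below. It assumes an Enclave link is any markdown link whose URL starts with `https://unite.nih.gov`; that prefix and the function name `add_lock` are assumptions for illustration, and links to other login-protected hosts would still need manual tagging.

```python
import re

LOCK = '{{< fa lock title="Link requires an N3C Enclave account" >}}'

# A markdown link to the (assumed) Enclave host that is not already
# followed by a space and the lock shortcode.
ENCLAVE_LINK = re.compile(
    r"(\[[^\]]*\]\(https://unite\.nih\.gov[^)\s]*\))(?! \{\{< fa lock)"
)


def add_lock(markdown: str) -> str:
    """Append the lock shortcode, separated by one space, after each Enclave link."""
    return ENCLAVE_LINK.sub(lambda m: m.group(1) + " " + LOCK, markdown)


text = "See [Intro](https://unite.nih.gov/workspace/x?view=focus) and [Quarto](https://quarto.org)."
print(add_lock(text))
```

The negative lookahead makes the function safe to run more than once on the same file.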

call out block for not-quite-finalized chapter

@JohannaLoomba & @oneilsh, you seemed to be the people most interested in declaring the chapters that were migrated, but not yet finalized. How do you feel about a callout block that says:

This chapter is being finalized. The content might change a little. Most of the remaining work involves formatting and cross references to other chapters. We hope to complete this chapter during May 2023.

image

Chapter 6 date shifting and zip code refs

There are references in here to date shifting, both the +/- 180 days N3C shifting and site pre-shifting, which need to be updated to point to a proper location when it exists (most likely the data lifecycle chapter).

style guide chapter/section

along the lines of

Using a consistent style across your projects can increase the overhead as your data science team discusses options, decides on a good choice, and develops in compliant code. But like in most themes in this document, the cost is worth the effort. Unforced code errors are reduced when code is consistent, because mistake-prone styles are more apparent.

{Copied from https://ouhscbbmc.github.io/data-science-practices-1/style.html}

Make it easier for readers to cite a chapter

https://quarto.org/docs/authoring/create-citeable-articles.html

You can make it easier for others to cite your work by providing additional metadata with the YAML front-matter of your article. Citations can be provided for both articles published to the web or for articles published in journals (with or without a DOI).

https://quarto.org/docs/reference/metadata/citation.html

You can provide citation data for Quarto documents in the document front matter. The citation options are based upon the Citation Style Language (CSL) specification for items, but as YAML (rather than XML).

Code of Conduct

The contact method is "N3C Education Committee (training coordinator: Shawn O'Neil at [email protected])"

So the whole paragraph is:

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at N3C Education Committee (training coordinator: Shawn O'Neil at [email protected]). All complaints will be reviewed and investigated promptly and fairly.

See the entire file at https://github.com/National-COVID-Cohort-Collaborative/book-of-n3c-v1/blob/main/CODE_OF_CONDUCT.md

Chapter 7: Understanding the Data

  • source is complete
  • "freeze" version in Google Docs and include warning in markdown
    :::{.callout-warning}
    This chapter is in the middle of being converted.  Please do NOT make changes to the frozen Google Docs version.  The markdown source will be ready soon to edit directly.
    :::
    
  • convert author yaml
  • convert Google Docs to markdown (see #33 for some notes)
  • download figures (& rename if necessary: rename 's/Lifecycle Image./fig-cycle-/' *.png)
  • manually incorporate figures (paths, text, & references)
  • split sentences on separate lines
    • pattern: (?<=[^1])\.[ ]{1,2}(?=\w)
    • sub:
      .
      
      
      (or with two spaces for indented bullets)
  • replace curly quotes with straight quotes [“”] and [‘’]
  • add {{< fa lock title="Link requires an N3C Enclave account" >}} (with a space separating the closing ) and the leading {{) (see #49 for some notes)
  • replace +/- with &plusmn; and other scientific symbols
  • outgoing cross refs (make sure the references to other chapters/sections/figures/tables are resolved)
  • incoming cross refs (define anchors for all headers for other chapters/sections to refer to)
  • lots of missing author info in yaml
  • tables are crossrefed as tables, not figures
  • What's going on with the extended caption for fig-090? "*New container name should be similar to source container, but different enough to communicate altered intention. MS=Manuscript"
  • Please check the heading depths. Some are six levels deep (e.g., `7.3.3.1.1.1 Refinement Process`). Is that intentional? Most chapters have 3 levels (and a few have 4).
    In this single chapter some sections went several pages without a section (eg, 7.4 didn't have any children) and some areas had a new header every paragraph.
    I tried to make things more consistent while reflecting the intent of each subsection's author. Please check me.
  • The vocabulary is referenced inconsistently. For example, sometimes "ICD10" and sometimes "ICD-10-CM". Please adjust so it's consistent (and reflects the desired specificity).
  • Can you think of a way to tighten up the list starting with "The concept set content that..."?
  • I tweaked the "Concept Set Metadata" section. See if it's ok. It should be easier to reference than bullets with bold & italics headers
  • Check my revisions to standard concepts (c4a523e)
  • walk-through with author
  • remove "undergoing final edits..." callout
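
For anyone without the Perl `rename` utility, the figure-renaming step in the checklist above can be approximated in Python. This sketch reuses the same regular expression (`Lifecycle Image.`, where the dot matches any single character) and defaults to a dry run; the function name `rename_figures` and the `dry_run` flag are hypothetical.

```python
import re
from pathlib import Path


def rename_figures(directory: Path, dry_run: bool = True) -> list[tuple[str, str]]:
    """Rename `Lifecycle Image?*.png` files to `fig-cycle-*.png`.

    Returns the (old name, new name) pairs. Nothing is renamed on disk
    unless `dry_run` is False.
    """
    planned = []
    for path in sorted(directory.glob("*.png")):
        # Same substitution as: rename 's/Lifecycle Image./fig-cycle-/' *.png
        new_name = re.sub(r"Lifecycle Image.", "fig-cycle-", path.name, count=1)
        if new_name != path.name:
            planned.append((path.name, new_name))
            if not dry_run:
                path.rename(path.with_name(new_name))
    return planned
```

Calling `rename_figures(Path("images"))` first (the directory name is only an example) lists what would change; passing `dry_run=False` then applies it.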

Chapter 10: Publishing & Sharing your Work

  • source is complete
  • convert Google Docs to markdown (see #33 for some notes)
  • download figures (& rename if necessary)
  • manually incorporate figures (paths, text, & references)
  • split sentences on separate lines
    • pattern: (?<=[^1])\.[ ]{1,2}(?=\w)
    • sub:
      .
      
      
      (or with two spaces for indented bullets)
  • replace curly quotes with straight quotes “ ” ‘ ’
  • add {{< fa lock title="Link requires an N3C Enclave account" >}} (with a space separating the closing ) and the leading {{) (see #49 for some notes)
  • outgoing cross refs (make sure the references to other chapters/sections/figures/tables are resolved)
  • incoming cross refs (define anchors for all headers for other chapters/sections to refer to)
  • need affiliation urls for Christine, Carolyn, & Mary
  • what should the link be for [Concept Set Browser](#concept-set-browser)?
  • what's the double asterisk in table 1 for? "to knowledge artifacts used directly by the study** "
  • walk-through with author
  • remove "undergoing final edits..." callout

~10 min tutorial

record a tutorial for authors & contributors to use GitHub & Markdown

Maybe record a one-person zoom meeting and upload to Google Drive afterwards.

Remove unnecessary dependencies

Remove unused packages in the DESCRIPTION file. Last time I tried to wholesale remove, I got rid of something that was unexpectedly needed by the GitHub Actions build. This time I'll remove a few packages at a time.

user story idea

Idea: early in book, present a user story, with enough detail to hit the high points and referencing later chapters for more detail.

User story: a research team with a specific question in mind (basic 2x2 sort of test with an outcome: severe? death?), but show notional data for screenshots etc?

  • Registration, have a team already, covered by DUA
  • Browse enclave, check out the learning datasets, read up on OMOP
  • Submit DUR for say level 2/DAC approval
  • hook up with a domain team, attend one of the meetings
  • Collaborators join via request/approval
  • Find their project space
  • interested in something, say obesity (?), have a starting ICD code
    • go to atlas, look up code, define concept set, import into N3C
  • use contour to generate histogram of ages, export cohort of interest
  • use person-level template (or table) to do the 2x2 based on defined outcome from the template (hosp vs non hosp?)
  • produce a table of stat results & a figure
  • submit download request
  • draft manuscript
  • submit to pub committee
  • submit to journal
  • get accepted, fame and fortune

Funding sources

A "chapter" dedicated to each institution's funding. Include each author's name & ORCiD icon. Then list each institution's grant numbers.

@jerrodanzalone will start completing the material in the format he wants, then we'll ask/Slack each author to include the grant numbers they want.

https://national-covid-cohort-collaborative.github.io/guide-to-n3c-v1/chapters/funding.html

frozen Google docs version: https://docs.google.com/document/d/1PpeQBbX-tjHbww45fC9wdWQXIQM4R7fw8OaJnZN9csM/edit

experiment converting a chapter in Google Docs to Markdown

The repo for the CD2H Informatics Playbook recommends the Docs to Markdown addin, available in the Google Workspace Marketplace.

@JohannaLoomba, your analysis chapter is the most complete, so I'm using it as a guinea pig. Keep writing & editing in Google Docs. I'll redo the conversion when you're done with the chapter and ready to transition to Markdown in GitHub. (If anyone else is reading this, the transition process is described at the bottom of this document.)


Some little notes to help me remember the process:

  1. In the addin, checked/selected "Demote headings (H1 → H2, etc.)" But it really depends on how the authors defined the headings.
  2. manually deleted the TOC produced in Markdown.
  3. Made little comments to the authors while I was in the document (eg, Apache Spark is "Spark", not "SPARK"). Made other edits in suggestion mode.
  4. Remember that my suggestions will not be reflected in the converted Markdown until they are accepted in the Google Doc.
  5. The hyperlinks to sites outside the book were converted well. Even the ones into the Enclave, requiring a login (eg Tutorial: [Intro to Code Workbook](https://unite.nih.gov/workspace/module/view/latest/ri.workshop.main.module.e7b83a8c-545e-49ac-8714-f34bfa7f7767?view=focus&Id=22).)
  6. The Google Doc has curly quotes, not straight quotes. They remained curly in the markdown. But they seemed to be rendered to html without a problem.
  7. Footnotes need to be manually converted.
  8. Section anchors need to be manually added (and linked).

Chapter 2: A Research Story (Overview)

https://docs.google.com/document/d/1ttUKgwVcIZHM87elrlUNV6Qi9thzOwKBg8GegKObEtg/edit#

  • 50,000-foot view of a research project, from onboarding to publishing
  • Full of links to relevant later sections
    • Onboarding, ORCiD
    • DUA coverage
    • DUR,
    • Team building, collaborators
    • Protocol, variables, definitions
    • Learning about and using OMOP (e.g. concept sets)
    • Pinning to a release, finalizing analysis & figures
    • Download request
    • Draft paper, pub committee

Authors: Will (lead) & Jerrod

https://github.com/National-COVID-Cohort-Collaborative/guide-to-n3c-v1/blob/main/chapters/research-story.md
