Programming Historian in English has received a proposal for a lesson, 'Visualizing da

Visualizing data with R and ggplot2 about ph-submissions HOT 12 OPEN

hawc2 commented on June 6, 2024

Visualizing data with R and ggplot2

from ph-submissions.

Comments (12)

rogorido commented on June 6, 2024 2

@semanticnoodles thanks for your extensive comments. I will have a look at the enhancements you're proposing in the next days.


anisa-hawes commented on June 6, 2024 2

Thank you, @nabsiddiqui!

@semanticnoodles will review these revisions and advise if we are ready to move onwards to the next Phase of the workflow (which will be Phase 4 Open Peer Review). Giulia is away this week, returning on June 3rd.

In the meantime, @charlottejmc and I can help with ensuring that functions and arguments are typographically consistent. These are aspects we always check as part of typesetting at Phase 6, but we'll do a quick scan now so that this isn't a distraction for Reviewers.


charlottejmc commented on June 6, 2024 2

Hello @nabsiddiqui and @semanticnoodles,

I've made some adjustments to add backticks to functions, arguments and other parts of code, trying to stay consistent with our house style.


semanticnoodles commented on June 6, 2024 1

I confirm @rogorido and @nabsiddiqui shared with me access to their repository containing all the required files, and that I handed them over to @anisa-hawes to allow the publishing team to generate the preview, thanks.


anisa-hawes commented on June 6, 2024 1

Hello Giulia @semanticnoodles, Igor @rogorido and Nabeel @nabsiddiqui,

Many thanks for sharing the lesson submission materials with me. I've now checked the Markdown file and added some key elements of metadata. I've also checked the accompanying images and assets, ensuring each element meets our requirements.

You can find the key files here:

You can review a Preview of the lesson here:

http://programminghistorian.github.io/ph-submissions/en/drafts/originals/visualizing-data-with-r-and-ggplot2

A few initial notes:

I've made a slight adjustment to the Header sizes used in the lesson. Our typesetting convention is that ## Header 2 is the largest.
I've added placeholder alt_text + captions for each of your images. We have committed to providing alt-text for all figure images, plots and graphs included in our lessons, so you'll need to add this as part of your revisions. These notes on Descriptive Alt text may be useful to you.
I've checked to ensure that you both have the Write access you'll need to edit your draft directly. We ask authors to work on their own files with direct commits (we prefer that you don't fork our repo or use the Pull Request system in ph-submissions).
I imagine Giulia @semanticnoodles may have noted this too, but I noticed that you include both a .tsv and a .csv version of the dataset, although only the .csv appears to be used in the lesson. Is the .tsv alternative required too?


rogorido commented on June 6, 2024 1

@anisa-hawes Thanks for your comments. As for the tsv file: no, it is not required. It can be deleted.

I'll add the alternative captions. Thanks.


rogorido commented on June 6, 2024 1

I added captions and alt texts (10a6a9e), but Nabeel should take a look whether it looks 'Englishly' enough...


semanticnoodles commented on June 6, 2024 1

Hello @rogorido and @nabsiddiqui,

here follows my preliminary feedback; I am aware it is quite extensive, but I believe these indications could help you strengthen your tutorial. If you need any clarification, please do not hesitate to ask!

Overall feedback

In general, your tutorial provides valuable guidance on navigating and producing a wide range of visualisations, effectively walking through the various features of ggplot2. The piece meets the accessibility and inclusivity goals of the Programming Historian fairly well, and in most cases the language is easy to understand and straightforward. However, some elements need further work, mostly falling under two intertwined aspects discussed in the following paragraphs.

Usability: Enhancing the logical structure of the lesson

In my opinion, this is the most critical point to consider. The tutorial lacks a cohesive element to tie its components together, and the organisation of the content could benefit from a more linear and less convoluted approach. The case study you propose (sister cities) seems to be just a tool to obtain a series of visualisations. This is fair enough, but it could benefit from further methodological contextualisation and unpacking: the people following your tutorial may not be historians, nor have a clear understanding of the methods you are using, although they may be familiar with R.

In terms of improving the overall content, I think there are two possible directions for you to consider: either revising the content to follow a visualisation task-based narrative or placing more emphasis on the structure of the case study. The first option would privilege the visualisation tasks (but still require some methodological support for the case study), while the second would require you to generate stronger and sharper research questions from the case study, to be answered (at least in part) by the visualisation tasks. I think @nabsiddiqui did a very good job of structuring the content in the lesson Data Wrangling and Management in R, so I would recommend keeping that in mind as a reference.

The title of the proposal could benefit from being more specific - or at least mentioning the context of application. The table of contents looks unbalanced: the headings and their actual wording could be better aligned with the content they cover, and the nesting could be more linear.

You give very clear information about the concept of the grammar of graphics - this is really the cornerstone of understanding how ggplot2 is designed. I really appreciate you explaining this and including many useful resources, although I think they could be arranged more organically, instead of including relatively short hints throughout the tutorial, as they tend to overshadow the walkthrough steps on several occasions.

Sustainability: Critically reviewing the data analysis narrative

The dataset looks more than adequate for the visualisation tasks you have set as objectives, but the data narrative and its wording could benefit from further tuning. What you offer in this lesson is mostly visualisation of data distributions and there is little statistical testing involved. As your topic is sister cities, it makes perfect sense to talk about relationships, although what you observe are mostly trends or tendencies that you could try to explain through further research; sometimes you clearly point that out and sometimes it looks rather implicit. I think this is just a matter of fine-tuning the language, nothing more.

Section-specific feedback

Para stands for paragraph number; please refer to the preview generated by @anisa-hawes

Introduction, Lesson Goals and Data

Para 1, line 2: there is an extra )
The lesson’s goals could be more specific (you could pick outcomes with greater resonance than adding meaningful labels to plots).
No reference to the dataset is presented here (it comes from Wikidata, right?). Make sure you include at least a couple of words about it here.
Review the heading accordingly with the edits.

ggplot2: General Overview

This acts more like an introductory section, although it is nested under the previous one. Bring it to the same level as the previous or put it before it to give a more comprehensive introduction (or re-arrange it for better consistency, please).
A couple of words about the Tidyverse here would better contextualise the workflow.
Para 7 could be added to the Additional Resources section.
Para 8 could mention the arguments more strategically – review it for a better alignment with the walkthrough. You could even think of following the official layers featured in the introduction to ggplot2 vignette, adapting that to match the elements you thoroughly explain.
Review the heading accordingly with the edits.

Sister cities in Europe

Please clarify your understanding of sister cities by giving a working definition. This would clarify the starting point of your research.
The rationale of your case needs some more unpacking; please add some context here, also about the provenance of your dataset.
The research questions here listed are somewhat aligned with the steps you propose. I would recommend you to review them for enhanced consistency.
Review the heading accordingly with the edits. Most importantly, from here on you start with the walkthrough. Make sure you clarify this by tuning the headings.

Loading Data with `readr`

If you referenced the tidyverse above you won’t need to explain tibbles extensively here. Please review this part for conciseness.
Including head(eudata) could support your explanation about the observations occurring in the dataset – this is also considered good practice in data science.
Para 16 could benefit the previous section.
Consider raising the level of this heading and review it accordingly.
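
The head(eudata) suggestion above could look roughly like the sketch below. It is only an illustration: the rows, the `origincity` column and the values in `eu` are invented stand-ins, since the lesson's actual CSV is not reproduced in this thread; only the names `eudata`, `eu` and `dist` come from the discussion.

```r
library(readr)

# Toy stand-in for the lesson's CSV. The rows, the `origincity` column
# and the "EU"/"Non-EU" labels are invented for illustration only.
toy_csv <- I("origincity,eu,dist
A,EU,120.5
B,EU,870.2
C,Non-EU,5400.0
D,EU,310.9")

# read_csv() returns a tibble; show_col_types = FALSE silences the
# column-specification message.
eudata <- read_csv(toy_csv, show_col_types = FALSE)

# Show the first rows so readers can check column names and types
# before any plotting starts.
head(eudata, 3)
```

With the real file, the call would presumably take the path to the lesson's .csv instead of the inline text.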

Creating a bar graph

IMPORTANT: There is no typecountry column included in your dataset. I tested the walkthrough using the data contained in the eu column, just remember to send us the correct version of the dataset.
Paras 20-23 could be more focused on the walkthrough; moving para 23 up, to right after the barplot is obtained, could enhance clarity.
Para 30 could use a bit more details about the interpretation of the results. If you plan
Review the heading accordingly with the edits.

Other Geoms: Histograms, Distribution Plots and Boxplots

Para 31, penultimate line: comma missing space afterwards.
Para 33, please review this for clarity (here you should mention why you used log10 once for all or put it into another spot. Consider explaining why none of the methods is ideal)

This leads to an uninformative histogram. We can take log10(dist) as our variable or filter to exclude values above 5000kms. None of these methods is ideal, but as far as we know, we are operating with manipulated data making it less problematic
Para 36, please review it for clarity (the reason why you employed ECDF is only implicit).
Para 41, same issue: you refer to ANOVA without explaining why you foresee that as a viable statistical test, cutting the paragraph short.
Review the heading accordingly with the edits.
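
To make the points on paras 33 and 36 concrete, here is a minimal sketch of the two options quoted above (log10(dist) versus filtering at 5000 km) next to an ECDF. The data are invented and deliberately right-skewed; only the column name `dist` and the 5000 km threshold are taken from the lesson text, and the bin count is an arbitrary choice.

```r
library(ggplot2)

# Invented, heavily right-skewed distances in km; not the lesson's data.
set.seed(1)
toy <- data.frame(dist = c(rlnorm(200, meanlog = 6, sdlog = 1), 9000, 12000))

# Option 1: plot log10(dist), which compresses the long right tail.
p_log <- ggplot(toy, aes(x = log10(dist))) +
  geom_histogram(bins = 30)

# Option 2: drop distances above 5000 km before plotting.
p_cut <- ggplot(subset(toy, dist <= 5000), aes(x = dist)) +
  geom_histogram(bins = 30)

# ECDF: for each distance, the share of observations at or below it.
# No binning decision is needed, which is one reason to prefer it here.
p_ecdf <- ggplot(toy, aes(x = dist)) +
  stat_ecdf()
```

Neither histogram option is neutral: the log transform changes how the axis is read, and the filter silently discards observations, which is probably what the paragraph should spell out.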

Manipulating the Look of Graphs

This section would more logically follow the Other Geoms section. Evaluate how to make this and the following sections more cohesive.
Para 42 could be revised for clarity – especially the research question. Mind that you first performed the random subsampling and then explained it.
Para 45 does not add much information to the following steps. Instead of pointing out which elements you want to manipulate, consider laying out clearly the goal for your tasks.
Para 55, review for conciseness (sometimes less is more).
Review the heading accordingly with the edits.

Scales: Colors, Legends, and Axes

Para 65, please review for straightforwardness - advantage of using a continuous scale? Also a repetition in the last line (“represent the distance”).
Para 68, review for accuracy: the way it is phrased seems like ggplot2 does not use discrete colour scales at all.
Para 70, would better fit in the Additional Resources section.
Para 74, review for accuracy.
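
On para 68, a short sketch may help show that ggplot2 handles both kinds of colour scale: a continuous one when colour is mapped to a numeric variable and a discrete one when it is mapped to a categorical variable. The data frame below is invented; `dist` and `eu` are the only names taken from the lesson, and the palette choices are arbitrary.

```r
library(ggplot2)

# Invented toy data; x and y are placeholder coordinates.
toy <- data.frame(
  x    = 1:6,
  y    = c(2, 4, 1, 5, 3, 6),
  dist = c(120, 870, 5400, 310, 95, 2300),
  eu   = c("EU", "EU", "Non-EU", "EU", "Non-EU", "Non-EU")
)

# Numeric variable mapped to colour: a continuous scale (colour bar legend).
p_cont <- ggplot(toy, aes(x = x, y = y, colour = dist)) +
  geom_point() +
  scale_colour_viridis_c()

# Categorical variable mapped to colour: a discrete scale (one key per level).
p_disc <- ggplot(toy, aes(x = x, y = y, colour = eu)) +
  geom_point() +
  scale_colour_brewer(palette = "Set2")
```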

Faceting a Graph

This section would be more logically part of the Other Geoms section and use a title anticipating also the theme changes.
Para 75, review for clarity and conciseness (“split by categories [space time and so]” is not very straightforward. Consider explaining straightforwardly what faceting is.)
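
A plain way to explain faceting is that it draws the same plot once per level of a categorical variable, in side-by-side panels. A minimal sketch, assuming an invented data frame in which `eu` holds two made-up labels:

```r
library(ggplot2)

# Invented toy data; only the column names `dist` and `eu` come from the lesson.
toy <- data.frame(
  dist = c(120, 870, 5400, 310, 95, 2300),
  eu   = c("EU", "EU", "Non-EU", "EU", "Non-EU", "Non-EU")
)

# facet_wrap(~ eu) produces one histogram panel per distinct value of `eu`,
# all sharing the same axes by default.
p_facet <- ggplot(toy, aes(x = log10(dist))) +
  geom_histogram(bins = 5) +
  facet_wrap(~ eu)
```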

Themes: Changing Static Elements

As the previous, this section would be more logically following the Other Geoms section.

Extending ggplot2 with Other Packages

Para 84, extra comma not rendering the link for Ridgeline plots
As the previous, this section would be more logically following the Other Geoms section.

Additional Resources

Consider reviewing and incorporating other elements into this section, following more closely the tools used in the tutorial instead of pointing towards general-purpose resources. A critical list of resources would be more useful to your readers.

Format & style

Two quick comments on the form and style.

Please homogenise the use of capitalisation in the headings (exclusion made for ggplot2 that always comes lowercased, but you know it 😄)
Please homogenise the way you refer to R functions and arguments – using the code format or not, you choose. Consistency is the only requirement.

Thank you for the great work done so far!


anisa-hawes commented on June 6, 2024 1

What's happening now?

Hello Igor @rogorido and Nabeel @nabsiddiqui. Your lesson has been moved to the next phase of our workflow which is Phase 3: Revision 1.

This Phase is an opportunity for you to revise your draft in response to @semanticnoodles's initial feedback. You can make direct commits to your file here: /en/drafts/originals/visualizing-data-with-r-and-ggplot2.md. @charlottejmc or I are here to help if you encounter any practical problems!

When both of you + Giulia are happy with the revised draft, we will move forward to Phase 4: Open Peer Review.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 2 <br> Initial Edit
Who worked on this? : Editor (@semanticnoodles) 
All  Phase 1 tasks completed? : Yes
Section Phase 3 <br> Revision 1
Who's working on this? : Authors (@rogorido + @nabsiddiqui)  
Expected completion date? : May 17
Section Phase 4 <br> Open Peer Review
Who's responsible? : Reviewers (TBC) 
Expected timeframe? : ~60 days after request is accepted

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.


nabsiddiqui commented on June 6, 2024 1

Hello @semanticnoodles,

I have tried to rework a lot of the tutorial. I feel that changing some of the headings will make the flow more obvious. Let me know if it makes sense the way I have done it or if there should be additional changes. Here is some of what I reviewed based on your timeline. The rest I will leave to @rogorido unless he has an objection:

Introduction, Lesson Goals and Data

Para 1, line 2: there is an extra )
The lesson’s goals could be more specific (you could pick outcomes with greater resonance than adding meaningful labels to plots).
No reference to the dataset is presented here (it comes from Wikidata, right?). Make sure you include at least a couple of words about it here.
Review the heading accordingly with the edits.

ggplot2: General Overview

This acts more like an introductory section, although it is nested under the previous one. Bring it to the same level as the previous or put it before it to give a more comprehensive introduction (or re-arrange it for better consistency, please).
A couple of words about the Tidyverse here would better contextualise the workflow.
Para 7 could be added to the Additional Resources section.
Para 8 could mention the arguments more strategically – review it for a better alignment with the walkthrough. You could even think of following the official layers featured in the introduction to ggplot2 vignette, adapting that to match the elements you thoroughly explain.
Review the heading accordingly with the edits.

Sister cities in Europe

Please clarify your understanding of sister cities by giving a working definition. This would clarify the starting point of your research.
The rationale of your case needs some more unpacking; please add some context here, also about the provenance of your dataset.
The research questions here listed are somewhat aligned with the steps you propose. I would recommend you to review them for enhanced consistency.
Review the heading accordingly with the edits. Most importantly, from here on you start with the walkthrough. Make sure you clarify this by tuning the headings.

Loading Data with `readr`

If you referenced the tidyverse above you won’t need to explain tibbles extensively here. Please review this part for conciseness.
Including head(eudata) could support your explanation about the observations occurring in the dataset – this is also considered good practice in data science.
Para 16 could benefit the previous section.
Consider raising the level of this heading and review it accordingly. (Felt it was better at this level)

Creating a bar graph

IMPORTANT: There is no typecountry column included in your dataset. I tested the walkthrough using the data contained in the eu column, just remember to send us the correct version of the dataset.
Paras 20-23 could be more focused on the walkthrough; moving para 23 up, to right after the barplot is obtained, could enhance clarity.
Para 30 could use a bit more details about the interpretation of the results. If you plan
Review the heading accordingly with the edits.

Other Geoms: Histograms, Distribution Plots and Boxplots

Para 31, penultimate line: comma missing space afterwards.
Para 33, please review this for clarity (here you should mention why you used log10 once for all or put it into another spot. Consider explaining why none of the methods is ideal)

This leads to an uninformative histogram. We can take log10(dist) as our variable or filter to exclude values above 5000kms. None of these methods is ideal, but as far as we know, we are operating with manipulated data making it less problematic
Para 36, please review it for clarity (the reason why you employed ECDF is only implicit).
Para 41, same issue: you refer to ANOVA without explaining why you foresee that as a viable statistical test, cutting the paragraph short.
Review the heading accordingly with the edits.

Manipulating the Look of Graphs

This section would more logically follow the Other Geoms section. Evaluate how to make this and the following sections more cohesive.
Para 42 could be revised for clarity – especially the research question. Mind that you first performed the random subsampling and then explained it.
Para 45 does not add much information to the following steps. Instead of pointing out which elements you want to manipulate, consider laying out clearly the goal for your tasks.
Para 55, review for conciseness (sometimes less is more).
Review the heading accordingly with the edits.

Scales: Colors, Legends, and Axes

Para 65, please review for straightforwardness - advantage of using a continuous scale? Also a repetition in the last line (“represent the distance”).
Para 68, review for accuracy: the way it is phrased seems like ggplot2 does not use discrete colour scales at all.
Para 70, would better fit in the Additional Resources section.
Para 74, review for accuracy.

Faceting a Graph

This section would be more logically part of the Other Geoms section and use a title anticipating also the theme changes.
Para 75, review for clarity and conciseness (“split by categories [space time and so]” is not very straightforward. Consider explaining straightforwardly what faceting is.)

Themes: Changing Static Elements

As the previous, this section would be more logically following the Other Geoms section.

Extending ggplot2 with Other Packages

Para 84, extra comma not rendering the link for Ridgeline plots
As the previous, this section would be more logically following the Other Geoms section.

Additional Resources

Consider reviewing and incorporating other elements into this section, following more closely the tools used in the tutorial instead of pointing towards general-purpose resources. A critical list of resources would be more useful to your readers.

Format & style

Two quick comments on the form and style.

Please homogenise the use of capitalisation in the headings (exclusion made for ggplot2 that always comes lowercased, but you know it 😄)
Please homogenise the way you refer to R functions and arguments – using the code format or not, you choose. Consistency is the only requirement.

Other

Change Title to be More Descriptive


anisa-hawes commented on June 6, 2024

Hello again Igor @rogorido and Nabeel @nabsiddiqui.

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 2: Initial Edit.

In this Phase, your editor Giulia @semanticnoodles will read your lesson, and provide some initial feedback. Giulia will post feedback and suggestions as a comment in this Issue, so that you can revise your draft in the following Phase 3: Revision 1.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 1 <br> Submission
Who worked on this? : Publishing Manager (@anisa-hawes) 
All  Phase 1 tasks completed? : Yes
Section Phase 2 <br> Initial Edit
Who's working on this? : Editor (@semanticnoodles)  
Expected completion date? : April 20
Section Phase 3 <br> Revision 1
Who's responsible? : Authors (@rogorido + @nabsiddiqui) 
Expected timeframe? : ~30 days after feedback is received

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.


semanticnoodles commented on June 6, 2024

Hello Igor @rogorido and Nabeel @nabsiddiqui, I hope you are doing well!

Just checking in with you about the draft revision (Phase 3 / Revision 1) as the deadline of the 17th of May has passed. If you need some extra time let me know approximately how much, so we can set up a new deadline -- and @anisa-hawes or @charlottejmc can update the Mermaid timeframe.

If you have doubts or need any clarification, please do not hesitate to keep in touch.
