nabsiddiqui commented on May 17, 2024

I will review this project if needed.

from ph-submissions.

drjwbaker commented on May 17, 2024

Specific Comments

  • "But why would we use R instead of analyzing sources manually? Imagine that you have a census chart with birth dates and numbers of dependents. To collect information from these records, you could go through with pen and paper and manually calculate the average number of children families had or how much insurance people had. To go through an entire census like this would be very tedious and time-consuming. With R, you can find the same information but much faster using quantitative approaches." I'm not sure that the sell is R vs manual but R vs Excel. Why would someone using a spreadsheet package choose R instead?
  • "While R is a great tool for tabular data, you may find using other approaches to analyse non-tabular sources (such as newspaper transcriptions) more useful." Such as? You don't want to lose a potential ProgHist convert, so where else on the site might they find help?
  • "R is run off of a console". As you mention a bit later, it is for the purposes of this tutorial but doesn't have to be. So, be consistent.
  • "The benefit of running R through a command line is that because you are writing programs on a text editor, you can save your work as a reusable program so you won’t have to re-type your work each time you use R". Given that the lesson doesn't use the command line, perhaps briefly describe the benefits of using the console instead.
  • "R can do basic mathematical operations when you enter functions into the console. There are many different ways to compute different functions using R. This tutorial will give the basic ways to use some of the tools, but as you get more comfortable with the program, you will be able to develop faster ways to make computations.". Remove as you've kind of already said this and it doesn't fit under the heading.
  • "4. sum(Air50) [1] 1676 " Formatting issue here.
  • "mtcars[1,2] [1] 6 ```" This looks like an error?
  • "Matrices The benefit of knowing" Syntax error here.
    On the Matrices bit we have an Old Bailey range of data between "1670 and 1700" spread across four columns, so is this 1670s, 1680s, 1690s, and 1700s? Because if so, that data would be between 1670 and 1710. Basically, I think the data range and the values don't match.
  • "rbind() Look at the difference between Crime and Crime2". I think the rbind() bit is redundant.
  • "in R, multiplication is done component-wise instead and can be expressed as a%*%b, for example" This doesn't mean anything to me as the jargon 'component-wise' doesn't register. Explain in more simple terms.
  • "You can change the directory if needed by .)" Incomplete sentence.
  • "#Summary and Next Steps": change to "# Summary and Next Steps" (add a space after the pound sign).
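
On the "component-wise" point, a short example may help the author explain it. In R, the `*` operator multiplies two matrices of the same shape element by element (component-wise), while `%*%` performs matrix multiplication (each row of the first matrix against each column of the second). The matrices `a` and `b` below are arbitrary illustrative values, not data from the lesson.

```r
# Two small 2 x 2 matrices, filled column by column (illustrative values only)
a <- matrix(1:4, nrow = 2)   # columns: (1, 2) and (3, 4)
b <- matrix(5:8, nrow = 2)   # columns: (5, 6) and (7, 8)

# Component-wise product: each cell of a times the matching cell of b
elementwise <- a * b
elementwise
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32

# Matrix product: each cell is a row of a multiplied against a column of b, then summed
matprod <- a %*% b
matprod
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46
```

Putting the two results side by side makes it clear that `a * b` and `a %*% b` are different operations, which is the distinction the lesson text seems to be reaching for.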

General Comments

This is much better. The top half is especially strong, but I felt the lesson loses momentum as it goes along, both in terms of narrative and presentation. The latter is easy to fix (more careful editing). But on the former I do wonder about the ordering of the lesson. The narrative from the outset seems to be moving from smaller data to bigger data, but then we go to matrices (for small data) after the larger dataset example, followed by uploading a small dataset as a .csv (which, although more manageable in terms of learner interaction, is odd given that in reality if you had a small dataset you'd make a matrix, right?)

For me then, what we have going on in this lesson are two distinct things: a lesson on simple calculation in R and a lesson on input methods for R (crudely speaking, matrices vs import via csv, and why you might choose one over the other). I'd suggest that these two either need to be separated into different lessons or those different learning outcomes reflected in the organisation of the lesson. I have a preference for the latter, given that as stand-alone lessons they wouldn't get a historian from no R and no idea what R can do for them to some R and some idea what R can do for them.
Best of luck and happy to discuss if something isn't clear!

acrymble commented on May 17, 2024

Thanks @drjwbaker. We'll let @taryndewar respond to these once we have the reviews in.

@nabsiddiqui the lesson is currently with another reviewer, but if you'd like to contribute to the open review stage, please feel free to do so. We'll close that phase once we've heard back from the other reviewer.

nabsiddiqui commented on May 17, 2024

If the reviewers are already chosen, I will just wait until something else is in need of review. If I do end up doing the tutorial, I will provide comments if they are needed. Congratulations on the tutorial, looks great.

histlib commented on May 17, 2024

Introduction
P 2-5: these paragraphs all deal, to varying degrees, with why you would use R rather than compute manually; as such, they would be better merged to reduce redundancy (especially the repetitions around the word "manually").
P 2,3-4: these two sentences communicate the same thing
I might approach the introduction more like:

  • there's more historical data, need way to work with lots of tabular data that is efficient and repeatable
  • manual approaches are clearly inefficient and do not lend themselves to repeated analysis (though I do agree with James above that the competition is more likely Excel)
  • but there's R, it's a programming language that has these useful features (what you've got already, plus bring in a taste of what's to come in the tutorial: easy to generate basic statistical info or to subset the data or ...)
And then the intro to the tutorial proper (paragraph 6), which seems fine.

Installing R
P 1: I favor being slightly more complex in this first paragraph: "R is a programming language and environment for working with data. R can be run using the R console, which is what this tutorial will focus on, as well as on the command line or the more user-friendly interface of RStudio." Then you can continue on with "To get started with R..."

Using the R Console
P 1: I'd junk the command line sentences in the first paragraph, especially since you can also write savable scripts in the R console (File > New Script).
P 2: "...or by selecting GUI preferences in the Edit menu" or something like that; the way you have it now is a little unclear
P 3: This paragraph doesn't fit in here. I think you could cut it entirely.

Using Data Sets
P 1: What if you got rid of the first sentence and started with "Before working with your own data, it helps to practice [or perhaps, "to get a feel for R"] using the datasets included with R."
P 1: Final 3 sentences (starting with "These are great for practicing") should be cut. Introduce importing when it's time to introduce importing.
P 2: the sentence referring to who compiled the data would be better as a parenthetical at the end of the previous sentence
P 2: why repeat the data(AirPassengers) and AirPassengers commands twice?
P 3: Something more like: "You can now use R to answer a number of questions based on this data, for example, the most popular months to fly or if there was an increase in international travel over time. You could probably find the answers to such questions simply by scanning this table, but not as quickly as the computer. And what if there was a lot more data?"
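
As a sketch of how quickly R can answer the kinds of questions suggested above, the following uses only base R on the built-in `AirPassengers` time series (monthly totals of international airline passengers, 1949 to 1960). The choice of `tapply()`, `cycle()`, and `aggregate()` is ours for illustration, not necessarily the approach the lesson takes.

```r
# Load the built-in monthly airline passenger series (1949-1960)
data(AirPassengers)

# Average passengers for each calendar month across all years;
# cycle() gives the month position (1-12) of every observation
monthly_means <- tapply(AirPassengers, cycle(AirPassengers), mean)
names(monthly_means) <- month.abb
monthly_means

# Which month had the highest average?
names(which.max(monthly_means))

# Total passengers per year, to check for growth over time
yearly_totals <- aggregate(AirPassengers, FUN = sum)
yearly_totals
```

Scanning twelve yearly totals by eye is easy; the point for the learner is that the same two lines would work unchanged on a series many times longer.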

Basic Functions
In your introduction of variables, it might be useful to link out to a trusted tutorial about naming conventions/best practices for variables in R.

In the solutions for this section, the fourth solution (for: What is the total number of people who flew in 1950?) has fallen out of the formatting for the numbered list/table.
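
For reference, the fourth solution could be kept together as one code block, with the result shown as output. The name `Air50` comes from the lesson; extracting the same values with `window()` instead of typing them is our own suggestion, and both routes give the same total.

```r
data(AirPassengers)

# Typed by hand: the twelve monthly values for 1950
Air50 <- c(115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140)

# The same values pulled straight from the time series
Air50_window <- window(AirPassengers, start = c(1950, 1), end = c(1950, 12))

# What is the total number of people who flew in 1950?
sum(Air50)
#> [1] 1676
```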

Working with Larger Data Sets
My tendency is to be irritated when I'm made to do tedious things when there are much simpler solutions at hand. So I'm not sure of the value in having all of the examples in the previous sections. I'd much rather get to this section faster, using the previous section as a way to introduce the simple statistical functions plus variables.

There's an error in the code below "To see a column of the data, you could enter" (the return for mtcars[1,2] surely isn't three back-ticks). I suspect this is an error in how the R code is embedded since "This would show you..." is probably not part of the code window, and then there's the "mtcars[1,2][1]6" string after.
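
The intended block was probably something like the following (a guess at the lesson's intent, using the built-in `mtcars` data frame), with the explanatory sentence moved outside the code and the result shown as output rather than run together with the command:

```r
# Column 2 of mtcars is cyl (number of cylinders); either form shows the whole column
mtcars[, 2]
mtcars$cyl

# Row 1, column 2: the cylinder count for the first car (Mazda RX4)
mtcars[1, 2]
#> [1] 6
```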

Matrices
P 1: random "Matrices" inserted after first sentence. The second sentence might be better off with "...knowing how to construct matrices in R..."
P 2: I'd rather "To do this, let's create the variables Theft and ViolentTheft using the totals from each decade as data points:" It also would be useful to use a screenshot (or something else) to show where this data is coming from.
The "cbind() combines the data by column" is redundant, just start in with the rbind stuff. Adding the t function here (in passing) might be too much.
P 3: I think the natural question while reading this would be "why can't I just run the matrix function on my two variables?"
P 4: I don't believe this paragraph adds much to the tutorial.
P 5: I'd redo the intro to this paragraph by just launching in: "The apply() function allows you to ..." I also wonder if this discussion - using the car data instead of the crime data - might be better off in the previous section.
Final paragraph for this section: the thing is, matrices can be useful with large amounts of data, too; manually creating matrices is only useful with small data. I would delete this.
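
To address the P 3 question directly, a sketch: `rbind()` and `matrix()` can build the same table, but `rbind()` keeps the variable names as row labels while `matrix()` needs the layout spelled out. The numbers below are placeholders, not the Old Bailey decade totals used in the lesson; the `apply()` call on `mtcars` illustrates the P 5 suggestion.

```r
# Placeholder counts for four decades (NOT the lesson's Old Bailey figures)
Theft <- c(10, 20, 30, 40)
ViolentTheft <- c(1, 2, 3, 4)

# rbind() stacks the vectors as rows and uses their names as row labels
Crime <- rbind(Theft, ViolentTheft)

# matrix() can build the same grid, but needs nrow and byrow
# and does not carry over the row names
Crime2 <- matrix(c(Theft, ViolentTheft), nrow = 2, byrow = TRUE)

# Same numbers once the labels are set aside
identical(unname(Crime), Crime2)
#> [1] TRUE

# apply() runs a function over rows (MARGIN = 1) or columns (MARGIN = 2)
apply(Crime, 1, sum)     # total per crime type
apply(mtcars, 2, mean)   # average of every mtcars column
```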

Loading Your Own Data Sets into R
Introduce this section with something like: "Now that you've practiced with simple data, you're probably ready to work with your own. Chances are your data is in a spreadsheet; how can you work with this data in R?"
You don't have to convert to CSV since there is a package for importing Excel files (readxl). Perhaps you could introduce the standard read functions (because what if they have tab-delimited rather than comma-delimited data) plus readxl.
The working directory issue is important, so good to introduce this here, but maybe give it more attention, such as its own paragraph with example.
Could also add in here how to write to file. Like, you've got your crime matrix, here's how to save it.
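
A minimal sketch of the workflow described above: check the working directory, save a matrix to CSV, and read it back. To stay self-contained it writes to a temporary file rather than a real project folder, and the values are placeholders. The tab-delimited and Excel readers are left as comments because they need a suitable file (and, for `read_excel()`, the readxl package) to run.

```r
# Where will R look for (and save) files by default?
getwd()
# setwd("path/to/your/folder")   # change it if needed (replace with a real path)

# A small matrix to save (placeholder values)
Crime <- rbind(Theft = c(10, 20, 30, 40), ViolentTheft = c(1, 2, 3, 4))

# Write it to a CSV file; tempfile() keeps this example out of your project folder
out_file <- tempfile(fileext = ".csv")
write.csv(Crime, out_file)

# Read it back; row.names = 1 turns the first column back into row labels
Crime_in <- read.csv(out_file, row.names = 1)
Crime_in

# Other formats (need a real file to run):
# read.delim("data.txt")            # tab-delimited text
# readxl::read_excel("data.xlsx")   # Excel; requires install.packages("readxl")
```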

Summary and Next Steps
Erroneous pound sign in section title/formatting off
I might use "work with research data" in the first sentence. The tutorial is more about manipulating data than analyzing it, isn't it?
P 2: For more information on R, visit the R Manual.
P 3: Be more selective here, I think. Like one online tutorial and then the DataCamp course. And annotate - why do you like this tutorial in particular? Who is it good for? Are these free?
P 4: What's so great about Digital History Methods in R? Is it introductory or for advanced users?

General Issues
Regarding James' general comments: I agree that there is a slight disconnect - there's some basic quant work and then how to enter/manipulate data - but I think the solution is in the packaging. Tweaking the intro and conclusion along the lines I've suggested and then coming up with a different title would do the trick. I'm rubbish at titles or else I'd suggest a few ("R Data Basics"?).

acrymble commented on May 17, 2024

Thanks to you both. I'll need a few days to summarise this for our author. But I'll try to do that as soon as I can.

acrymble commented on May 17, 2024

Thanks to our two reviewers. We'll close reviews at this stage so @taryndewar can focus on making updates.

I think the two reviews are fairly self-explanatory, but ask questions if you need to @taryndewar. Both reviewers have focused primarily on helping to clarify concepts and language, and working on reducing some redundancy in the lesson.

There is a fairly substantial list of suggested copy edits, which I’d invite you to consider. Probably easiest to do these first, as they may help rectify some of the other issues. You don’t have to accept everything that was suggested, but you might want to acknowledge that anywhere a reviewer paused and thought: this doesn’t sound right, your readers will pause too, and they might not have as much experience.

A few things I’d particularly like you to respond to:

  1. R vs Excel rather than R vs manual. I think James makes a good point here, and that it might be more compelling to compare what R can do to what people tend to use Excel for. Does anyone calculate things with pen and paper anymore?
  2. The confusion about the different ways you can use R (eg, via console). This may confuse people, so perhaps best to give them one way and stick to it? They can learn other options later, but you don’t want to overwhelm them.
  3. James has suggested reordering sections to make the lesson flow better for a new learner. John has suggested an alternative solution that involves beefing up some sections (which he’s provided very clear suggestions on).

We’ll probably have to think about a title, but let’s do that at the end.

Finally, I can appreciate why someone might be frustrated going step by step through easy examples, but we need to be wary that some users will be starting from zero, so I’d like you to keep the easy early examples in place so that we don’t raise the barrier to entry.

When you've had a chance to make the changes @taryndewar, please post here letting us know what you have/have not done. It looks like a big list, but a lot of it is copy editing, so I don't expect you've got a big job ahead of you. Let me know if you need any support or have any questions.

wcaleb commented on May 17, 2024

The images and figure syntax for this lesson will need to be updated according to the new guidelines posted here.

acrymble commented on May 17, 2024

@taryndewar has made edits based on the feedback. I've also done a copyedit and in the process have changed the name and URL:

https://github.com/programminghistorian/ph-submissions/blob/gh-pages/lessons/r-basics-with-tabular-data.md

Just waiting on a code example from @taryndewar for the 'Saving Data in R' section and then ready to go.

acrymble commented on May 17, 2024

Suggested images for icon:

https://www.flickr.com/photos/britishlibrary/11065618604/in/album-72157638733975756/
https://www.flickr.com/photos/britishlibrary/11054877045/in/album-72157638733975756/
https://www.flickr.com/photos/britishlibrary/11081312545/in/album-72157638733975756/
https://www.flickr.com/photos/britishlibrary/11081699376/in/album-72157638733975756/

acrymble commented on May 17, 2024

This has been published at: http://programminghistorian.org/lessons/r-basics-with-tabular-data

Thanks @drjwbaker @histlib for your efforts. We appreciate the time you put in to improve lessons.
