data-lessons / library-shell-deprecated

Unix shell lesson for librarians NOW MOVED > https://github.com/LibraryCarpentry/lc-shell

Home Page: https://github.com/LibraryCarpentry/lc-shell

License: Other

Makefile 2.27% HTML 36.41% CSS 2.47% JavaScript 0.70% Python 55.39% Ruby 0.32% R 2.29% Shell 0.15%

library-shell-deprecated's Introduction

This material has now been moved to the Library Carpentry organisation.

Find it here: https://github.com/LibraryCarpentry/lc-shell

Library Carpentry

The Library Carpentry module 'Shell Lessons for Librarians' is maintained by Belinda Weaver and Tim Dennis.

Background

Library Carpentry is a software skills training programme aimed at library and information professions. It builds on the work of Software Carpentry and Data Carpentry.

Library Carpentry is in the commons and for the commons. It is not tied to any institution or person. For more information on Library Carpentry, see our website librarycarpentry.github.io.

Contribution

There are many ways of contributing to Library Carpentry:

Code of Conduct

All participants should agree to abide by the Software Carpentry Code of Conduct.

Authors

Library Carpentry is authored and maintained by the community.

Citation

Please cite as:

Library Carpentry. Shell Intro for Librarians. June 2016. http://data-lessons.github.io/library-shell/.

library-shell-deprecated's People

Contributors

abbycabs, bkatiemills, bobharper1, christinalk, cmacdonell, danmichaelo, erinbecker, evanwill, fmichonneau, gcapes, gdevenyi, gvwilson, hugolio, jcoliver, jduckles, johnborghi, jpallen, jt14den, ljsmart, maxim-belkin, naupaka, neon-ninja, ostephens, pbanaszkiewicz, pipitone, rgaiacs, scottcpeterson, synesthesiam, twitwi, wking

library-shell-deprecated's Issues

For loop example output is wrong

In episode 1, there are already .txt files in the shell-lesson directory. The current example will select those too.
I'm putting in a PR where the files we create with touch are .doc instead. Seemed like the easiest way to sidestep the issue.
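
A minimal sketch of the idea behind that PR, assuming a scratch directory and hypothetical file names (a.doc, b.doc, c.doc and existing.txt are illustrative, not the lesson's actual files). Because the loop globs on *.doc, a .txt file that was already present is not picked up:

```shell
# Scratch directory standing in for shell-lesson (hypothetical setup).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A .txt file that "already exists" before the exercise starts.
touch existing.txt

# Create the demonstration files with a .doc extension instead of .txt.
touch a.doc b.doc c.doc

# The glob *.doc matches only the files created for the exercise.
loop_output=$(for file in *.doc; do echo "$file"; done)
echo "$loop_output"
```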

Add authors to AUTHORS file

We now have a workflow for releasing citable versions of our lessons (with DOIs) every 6 months via Zenodo. This makes our lessons more discoverable and sustainable and ensures that everyone involved gets the credit they deserve. For more on this work see data-lessons/librarycarpentry#5

In order to make this happen we need to make one crucial change: all AUTHORS files need to change so that they list names of contributors in the following format:

James Allen
James Baker
Piotr Banaszkiewicz
Erin Becker

@jt14den will run a script that strips names from lesson logs and edits AUTHORS across all Library Carpentry repos.

When this is actioned (hopefully, soon!), lesson maintainers are asked to eyeball the AUTHORS file to see if anyone obvious is missing (for example, people who contributed to discussions but didn't edit any lessons). Note: template developers are credited in this process; this is in line with Software Carpentry best practice.

In the future, lesson maintainers are encouraged to ensure that those who contribute to lessons are added manually to AUTHORS files (encourage contributors to do it so they see where and how we give credit!)

I/O, redirection and pipes

I think we should describe I/O, redirection and pipes before the example $ grep 2009 2014-01_JA.tsv | grep INTERNATIONAL | awk -F'\t' '{print $5}' | sort | uniq -c. It would provide context and reinforce the Unix philosophy of small programs which perform a specific task - building blocks to design the workflow you want.
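
To illustrate the building-block idea, here is a small sketch that runs a shortened version of that pipeline on made-up data. The file sample.tsv, its two-column layout and its contents are assumptions for illustration only, not the lesson data:

```shell
workdir=$(mktemp -d)

# Four made-up tab-separated rows: year, then journal title.
printf '2009\tJournal A\n2009\tJournal B\n2009\tJournal A\n2010\tJournal A\n' > "$workdir/sample.tsv"

# Each program does one small job and hands its output to the next:
#   grep    keeps the lines containing 2009
#   awk     prints the second tab-separated column
#   sort    groups identical titles together
#   uniq -c counts each group
# The final > redirects the result into a file instead of the screen.
grep 2009 "$workdir/sample.tsv" | awk -F'\t' '{print $2}' | sort | uniq -c > "$workdir/counts.txt"

cat "$workdir/counts.txt"
```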

Reference to iOS

The iOS reference in this episode may cause confusion. My understanding is that there is no way to use Terminal to access the iOS file system unless you jailbreak but you can use it to remotely SSH into another machine. Should iOS be changed to Mac OS?

Counting and mining with the shell: Learning Objectives

Learning Objectives should not use the word "understand."
Suggest changing objectives as follows:
Understand how to count lines, words, and characters with the shell
-- Demonstrate counting lines, words, and characters with the shell command wc and appropriate flags
Understand how to mine files and extract matched lines with the shell
-- use regular expressions to mine files and extract matched lines with the shell
Understand how to combine mining with the shell and regular expressions
-- create complex single line commands by combining shell commands and regular expressions to mine files
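
As a rough sketch of what the first rewritten objective would ask learners to demonstrate, the following uses wc with its three common flags on a small made-up file (sample.txt is hypothetical). Note that -c counts bytes; for plain ASCII text that equals the number of characters:

```shell
workdir=$(mktemp -d)

# Two lines, three words, fourteen bytes including the newlines.
printf 'one two\nthree\n' > "$workdir/sample.txt"

# Reading from standard input (<) keeps the file name out of the output.
lines=$(wc -l < "$workdir/sample.txt")   # number of lines
words=$(wc -w < "$workdir/sample.txt")   # number of words
chars=$(wc -c < "$workdir/sample.txt")   # number of bytes

echo "lines=$lines words=$words bytes=$chars"
```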

errors in 02-counting-mining.md?

When using the Shell lesson material, we found a couple of errors. Not sure if we misunderstood, please review.

  1. Under "CSV and TSV Files":

Finally we have our old friend head, that we can use to get the first line of the sorted-lengths.txt:

$ head sorted-lengths.txt
5375 2014-02-02_JA-britain.tsv

We found that the code didn't give us the first line of the sorted-lengths.txt, should it be "head -n1 sorted-lengths.txt"?

  2. Challenge "Searching with regular expressions"
    Use regular expressions to find all ISSN numbers (four digits followed by hyphen followed by four digits) in 2014-01_JA.tsv and print the results to a file results/issns.tsv.

The solution says: grep -E '\d{4}-\d{4}' 2014-01_JA.tsv > issns.tsv (this does not work)

This worked for us: grep -P '\d{4}-\d{4}' 2014-01_JA.tsv > issns.tsv
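
On the first point in this report: head prints the first ten lines of a file unless told otherwise, so -n 1 is needed to get a single line. A small sketch, using a generated twelve-line file as a stand-in for sorted-lengths.txt (the contents here are made up):

```shell
workdir=$(mktemp -d)

# Twelve numbered lines standing in for the real sorted-lengths.txt.
seq 1 12 > "$workdir/sorted-lengths.txt"

# Without a flag, head returns ten lines.
default_count=$(head "$workdir/sorted-lengths.txt" | wc -l)

# With -n 1, head returns only the first line.
first_line=$(head -n 1 "$workdir/sorted-lengths.txt")

echo "default: $default_count lines; with -n 1: $first_line"
```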

Extended vs Perl regular expressions in episode 2

In episode 2, some of the exercises use extended (-E) regular expressions. As far as I can tell, Perl compatible regular expressions (-P) are more powerful (see, e.g. https://www.regular-expressions.info/posix.html).

Is there any reason for using -E rather than -P?

Extended regular expressions don't support \d to match digits (tested on GNU grep 3.1) so the first solution to "Searching with regular expressions" is incorrect, and won't match anything. To use an extended regular expression you need:

grep -oE '[0-9]{4}-[0-9]{4}' 2014-01_JA.tsv

Rather than introduce ranges, it might be easier to only use Perl compatible regular expressions in the episode. I'm happy to write a pull request to do this if you agree with the approach.
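
For comparison, a small sketch of the extended-regex form on made-up data. The file sample.tsv and the ISSN-like value in it are placeholders, not taken from the lesson dataset. The bracket range [0-9] is used because it works with -E; the -P option is not available in every grep build:

```shell
workdir=$(mktemp -d)

# Two made-up rows; only the first contains an ISSN-shaped value.
printf 'Journal A\t1234-5678\nJournal B\tno issn here\n' > "$workdir/sample.tsv"

# -E enables {4} repetition without backslashes;
# -o prints only the matching text rather than the whole line.
issns=$(grep -oE '[0-9]{4}-[0-9]{4}' "$workdir/sample.tsv")

echo "$issns"
```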

Wording re: "false positives"

A suggestion for this episode - change:

There are a few false positives, but this is still a good start: from 500,000 lines of journal article metadata to a few numbers and names just by typing one line of code.

to

There are a few false positives (e.g. the string was found in the Publisher field and grep returns the whole line), but this is still a good start: from 500,000 lines of journal article metadata to a few numbers and names just by typing one line of code.

Why is there punctuation left in the cleaned file?

Re: Episode 3, end of option 1

Note: there are a few bits of punctuation in here - I’ve left these in deliberately as you should always bug fix! The internet is always a good place to start searching for why this might have happened (something about the punct command we used…)

From this it seems that all punctuation should have been removed. So what's the reason?

Backup when shell not working

Sometimes the shell doesn't work on an attendee's laptop: it is installed incorrectly, they've installed the wrong thing, or it just unfathomably doesn't work.

Now, peer programming is great from a pedagogical point of view, so "work with someone else" is a good option. But prompted by @weaverbel, we should consider adding a backup to our Instructor Notes.

The suggestion is to recommend https://www.pythonanywhere.com. A workflow will need to be added for grabbing the lesson data with wget.

Proposed learning objective

After completing this lesson, you can:

  • explain what the Unix shell is and why it is useful
  • use basic shell commands to work with directories and files
  • use shell commands to count and mine your data (both tabulated data and free text)

Unused files in data directory

There are two files in the data directory which are not used in the lesson:

callnumbers.txt pdflist.txt

Great that they're not in the downloadable zip file, but maintenance of the lesson would be easier if they were removed from the repository(?)

Data file used in example 2014-01_JA.tsv.zip is corrupted

Hi,

I'm following through the exercises on shell as I'll be teaching it.
The material refers to the file 2014-01_JA.tsv. (gh-pages branch in /data folder)
This file doesn't exist but the 2014-01_JA.tsv.zip does, however when I unzip it I'm getting the file "2014-01_JA.tsv.zip.cpgz" on my Mac. Something isn't correct with this file.

Thanks,
Jay.

Filenames not explained

Episode 2 uses grep to process a file, and save the results in another file results/2016-07-19_JAi-revolution.tsv.
What is the significance of this date? It's different to that contained in the original file name.

Add brief bit on the structure of a bash command

A suggestion from Jamie @jamieviva is that we add something on the general structure of a bash command. So, something like:

COMMAND -FLAG FILE.USED

Then explain structure of what happens thereafter, so:

if you add nothing = print to shell; if you add > FILE.OUTPUT = save to file; if you add '|' = save to memory for next command.
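
A rough sketch of those three cases, using sort with the -r flag on a made-up file (letters.txt and reversed.txt are hypothetical names). Strictly speaking, a pipe streams output to the next command rather than storing it, but the mental model above is close enough for a first pass:

```shell
workdir=$(mktemp -d)
printf 'b\na\nc\n' > "$workdir/letters.txt"

# COMMAND -FLAG FILE with nothing added: output is printed in the shell.
sort -r "$workdir/letters.txt"

# Adding > FILE.OUTPUT: output is saved to a file instead.
sort -r "$workdir/letters.txt" > "$workdir/reversed.txt"

# Adding |: output is passed on as the input of the next command.
top=$(sort -r "$workdir/letters.txt" | head -n 1)
echo "$top"
```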

It probably should go somewhere in https://github.com/data-lessons/library-shell/blob/gh-pages/_episodes/01-intro-shell.md not before doing work in the shell, but as a "look at what you've also learned!" outcome.

Thoughts?

No exercises on loops

There are two loops in this course, but they're just demonstrations.
I think it would be useful to have a faded example of a loop to reinforce the syntax / keywords, and how to access the value of a variable.

Why use the shell?

I think it would be helpful if there was a more librarian/archivist description of why it is useful to know how to navigate the shell. I think the current description is good but, speaking as someone who does not do a lot of their own programming, the current description doesn't really speak to why it is important for someone like me to know the shell.

Instead of "The motivations for wanting to learn shell commands are many and various.", perhaps something like, "Even if you do not do your own programming or your work currently does not involve the command line, knowing some basics about the shell can be useful. For example, you may find yourself working with a scholar who primarily works through the shell or you may discover down the road that it is the best way to interact with a repository or archive."

Add a faded example?

Right. I'm in instructor training. I'm thinking we could build in a faded example (given that instructor training introduces it but SWC/DC/LC doesn't use it!!!). So, for example:

Learning Objective: understand how commands work; understand how to use pipes

Count the total lines in every tsv file. Sort. Print the first line of the file.

wc -l *.tsv | sort | head -n 1

In all csv files in a directory count the words for each file, then put them into order. Finally, make an output of the final 10 lines of the file.

__ -w *.csv | sort | ____

Logic error in first example

Hi all

Finally getting round to working through the LC courses for myself. Unless I'm totally dumb there is a problem with the first example:

$ grep 2009 2014-01_JA.tsv | grep INTERNATIONAL | awk -F'\t' '{print $5}' | sort | uniq -c

We state that this finds "the number of articles published in 2009 in academic journals whose title contains the word ‘International’".

I know this is then qualified with "there are a few false positives" but I wonder if we should expand it a bit to get a better result. Maybe describing the dataset would help (variable names), then come up with an example that is free of those false positives while still demonstrating the power of the command. I'm happy to work on a replacement??
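
One possible direction, sketched on made-up data: grep matches a pattern anywhere on the line, whereas awk can test a single column. The two-column layout below (year, then title) is an assumption for illustration and does not reflect the real dataset's columns:

```shell
workdir=$(mktemp -d)

# Made-up rows: year, then title. The second row was not published
# in 2009, but the string 2009 appears in its title.
printf '2009\tInternational Review\n2011\tReport 2009 International\n' > "$workdir/sample.tsv"

# grep looks at the whole line, so both rows match (one false positive).
loose=$(grep 2009 "$workdir/sample.tsv" | grep -c International)

# awk restricts each test to one field: year in column 1, word in column 2.
strict=$(awk -F'\t' '$1 == "2009" && $2 ~ /International/' "$workdir/sample.tsv" | wc -l)

echo "grep only: $loose, field-aware: $strict"
```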

Loops need explaining

A loop is demonstrated at the end of episode 2, but without very much explanation. I'll submit a PR giving a brief outline of the concepts:

  • loops are used to repeat a command for each thing in a list
  • the loop variable takes on a new value on each iteration through the list
  • access variable's value using $variable
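
The three points above can be sketched as follows. The list items and the variable names (name, visited) are arbitrary choices for illustration:

```shell
visited=""

# The loop repeats the commands between do and done for each item in the list.
for name in alpha beta gamma; do
    # On each pass, name holds the next item; $name reads its value.
    echo "Processing $name"
    visited="$visited$name "
done

echo "Visited: $visited"
```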

Fix missing datasets for 01 and 02

2014-01_JA.tsv seems to be missing from the /data folder, so we probably need to put this back. In any case the intro text needs to be updated to account for the new datasets. Will submit a PR soon.
