rse-classwork's Issues
Reading and writing structured data files
The purpose of this exercise is to check that you are able to read and write files in commonly-used formats.
Make sure you have read the course notes on structured data files first!
For this exercise, you will work with the code you wrote to represent a group of friends during the last lesson (#7).
If you prefer, you can use the sample solution instead of yours, especially if you have used a custom class in your solution.
You can use either JSON or YAML for this exercise - the choice is up to you.
Write some code to save your group to a file in the format of your choice (JSON or YAML).
Make sure you can read the file by loading its contents. Is the result identical to the original structure?
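As a minimal sketch (using JSON; the group structure shown here is an assumption — adapt it to whatever you built in #7), the save-and-reload round trip could look like:

```python
import json

# Hypothetical group structure: a dictionary of dictionaries.
group = {
    "Alice": {"age": 30, "job": "teacher", "relations": {"Bob": "friend"}},
    "Bob": {"age": 28, "job": "nurse", "relations": {"Alice": "friend"}},
}

# Save the group to a file...
with open("group.json", "w") as f:
    json.dump(group, f, indent=2)

# ...and load it back.
with open("group.json") as f:
    loaded = json.load(f)

# With plain dicts, lists, strings and numbers the round trip is lossless:
print(loaded == group)  # True
```

Note that the answer to "is the result identical?" depends on your structure: tuples come back as lists, dictionary keys become strings, and custom classes are not serialisable by `json` at all without extra work.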
If you have questions or run into problems, leave a message on the Q&A forum on Moodle.
Writing a test for `times.py`
This exercise will look at understanding what a given Python function (`times.py`) does, and writing a test to check that it works as expected.
- Fork the times-tests repository and clone it on your computer.
- Read the description of the exercise in the README file.
- Start by reading the code in `times.py`, understanding what it does, and running it (before making any modifications to it).
- The next step consists of converting the `__main__` part of the code into a unit test.
- Check that your test passes by running `pytest`.
- When you are happy with your solution (or want some feedback!):
- Push your new code to your own fork.
- On GitHub, open a pull request from your fork to the original repository.
  - In the description, include the text `Answers UCL-MPHY0021-21-22/RSE-Classwork#16`. This will list your PR on this issue.
  - In the PR text, comment on what you found difficult or interesting, or something you learned.
- Choose one of the other pull requests listed on this issue, and leave a review. Comment on things you find interesting or don't understand, any problems you think you spot, good solutions, or potential improvements.
- Mark the assignment on Moodle as complete.
- Think about what other aspects of `times.py` should be tested and report them on the Moodle questionnaire.
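As a sketch of the conversion, using a hypothetical stand-in function (the real names and behaviour in `times.py` may differ — the point is the shape of turning a `__main__` block into a test):

```python
# Hypothetical stand-in for a function in times.py.
def overlap_time(range1, range2):
    """Return the overlap of two (start, end) ranges, or None if they don't meet."""
    start, end = max(range1[0], range2[0]), min(range1[1], range2[1])
    return (start, end) if start < end else None

# Before: a __main__ block that just printed the result.
# After: a test function asserting on a known input/output pair, which
# pytest discovers automatically (function name test_*, file name test_*.py).
def test_overlap_time():
    assert overlap_time((0, 10), (5, 15)) == (5, 10)
```

Running `pytest` in the same directory will then collect and run `test_overlap_time`.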
If you have questions or get stuck, ask on Moodle or book an office hours slot!
Sample solution
Automating `git bisect` - part I
In your homework (#25), you've seen that even for just 24 commits (and there can be many more), you need to type quite a few repetitive `git bisect` commands to find the commit you're looking for.
It is therefore useful to automate. The "Solving automatically" section in the notes may help. We will work with the same situation as in this week's homework.
Let's go through some steps in the next issues:
Step 0
If you've attempted the homework, your repository may be in a bisecting state.
Therefore, run the following so that everyone is at the same point:
git bisect reset # to make sure you are not in a bisecting state
git switch main # to go back to Charlene's original point
git reset --hard HEAD # to remove any modifications done to the tracked files
If you've not tried the homework yet, then clone Charlene's repository locally:
git clone [email protected]:UCL-MPHY0021-21-22/sagittal_average.git
Step 1
In a new file (`test_sagittal_brain.py`), create `input` and `expected` numpy arrays to test Charlene's code.
Take a look at the diagram in the previous issue (#25) to understand what Charlene is trying to do.
Think about why the current input (`brain_sample.csv`) and output (`brain_average.csv`) files that Charlene has been using to test her code are not very useful, and create new ones that could highlight any common error in this type of data manipulation.
Hint
Which array can you create that will produce different average values for each different row?
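One possible answer to the hint, sketched below under the assumption that the script should average each of the 20 rows:

```python
import numpy as np

# A sketch: each row is constant, with a different value per row, so the
# per-row averages 0, 1, ..., 19 are all distinct.  Averaging over the
# wrong axis would give a very different answer, exposing the bug.
data_input = np.arange(20).reshape(20, 1) * np.ones((20, 20))
expected = np.arange(20)  # the mean of row i is i

print(data_input.mean(axis=1))  # row means: 0.0 through 19.0
```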
React to this issue with a 👍 when your team has completed the task.
Finding bugs in history
Charlene Bultoc has just started a post-doc at an important neuroscience institute. She is doing research on a new methodology to analyse signals in our brains, detected through a combination of CT and MRI. Using image processing techniques she can simplify the whole dataset into a grid of 20x20 arrays.
Her theory is that the average of such signals through the sagittal plane is constant over time, so she has written some software to calculate this. She decided to write that software in Python so she could share it (via GitHub, sagittal_average) with people from other labs. She didn't know as much Python when she started as she does now, so you can see that evolution in her program.
Charlene is an advocate of reproducibility, and as such she has been keeping track of what versions she's run for each of her results. "That's better than keeping just the date!" you can hear her saying. So for each batch of images she processes she creates a file `versions.txt` with content like:
scikit-image == 0.16.2
scikit-brain == 1.0
git://git.example.com:brain_analysis.git@dfc801d7db41bc8e4dea104786497f3eb09ae9e0
git://github.com:UCL-MPHY0021-21-22/sagittal_average.git@d8bc3ebaecd0cc7a2872da4c81d30b56f9b746ad
numpy == 1.17
With that information she can go and run the same analysis again and again and be as reproducible as she can.
However, she's found that `sagittal_average` has a problem... and she needs to re-analyse all the data since that bug was introduced. Running the analysis for all the data she's produced is not viable, as each run takes three days to execute - assuming she has the resources available in the university cluster - and she has more than 300 results.
In all the versions of the program, it reads and writes CSV files. Charlene has improved the program considerably over time, but kept the same defaults (specifically, an input file, `brain_sample.csv`, and an output file, `brain_average.csv`). She has always "tested" her program with the `brain_sample.csv` input file provided in the repository. However (and that's part of the problem!), the effect of the bug is not noticeable with that file.
We can then help her either by letting her use our laptops or (better) by finding when the bug was introduced and then re-running only the analyses that were affected by it.
Finding when the bug was introduced seems the quickest way. Download the repository with her `sagittal_average.py` script and use `git bisect` to find the commit at which the script started to give wrong results.
Do it manually first (as explained in this section of the notes).
Steps to help Charlene:
- Fork Charlene's repository and clone your fork.
- Run the latest version of the code with the existing input file
- Create a new input file to figure out what the bug is
Hint: You can generate an input file that does show the error using the code snippet below. You may need to re-create the `brain_sample.csv` file each time you move through the commits.
import numpy as np
data_input = np.zeros((20, 20))
data_input[-1, :] = 1
np.savetxt("brain_sample.csv", data_input, fmt='%d', delimiter=',')
- Use `git bisect` manually until you find the introduction of the error. Take note of the hash and date of the commit that introduced the bug - you will need this information in class.
- How would you fix the bug?
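As a sketch of the kind of fix involved: a later exercise reveals that the code wrongly averages over the columns, not the rows, and in `numpy` that is a one-argument change:

```python
import numpy as np

data = np.zeros((20, 20))
data[-1, :] = 1  # last row is all ones

# Buggy behaviour: mean(axis=0) averages down each column, giving 1/20
# everywhere (each column contains a single 1 among 20 values).
print(data.mean(axis=0))  # 20 values, all 0.05

# Intended behaviour: mean(axis=1) averages along each row, giving zeros
# except for the last entry.
print(data.mean(axis=1))  # zeros, with a 1.0 at the end
```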
Creating a Python package with citation info, docs and CI
Improve Charlene's package even further by adding basic information, a documentation website and the config to run the tests automatically on Github Actions.
- Choose who in your team is sharing now! (Make sure you've got a fork and a local copy of Charlene's repository.)
- Write three files that will make this library sharable, citable and descriptive.
- Create a `.github/workflows/pytest.yml` file to run the tests automatically each time something is pushed to the repository (see also the solutions to the #19 exercise).
- Optional: As we did last week, generate a documentation website using `sphinx`. (Using the `githubpages` sphinx extension and pushing the build directory into a `gh-pages` branch will show the documentation on the repository's website.)
- Share your solution, even if it's a work in progress, as a pull request to Charlene's repository mentioning this issue (by including the text `Addresses UCL-MPHY0021-21-22/RSE-Classwork#39` in the pull request description). Remember to mention your team members too! (with `@github_username`)
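A minimal workflow sketch for the CI step (the action and Python versions here are assumptions you may want to update):

```yaml
# .github/workflows/pytest.yml -- a minimal sketch
name: Run tests

on: push

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - name: Install package and test dependencies
        run: pip install -e . pytest
      - name: Run the tests
        run: pytest
```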
Refactoring - Part 2
This follows on from #43.
Stage 2: Using a Person Class
We will now look at how to represent and manipulate the person data with our own Person class.
Instead of each person being a dictionary, we will represent people with a class that has methods for dealing with the connections. We will restructure our code so that the functions become methods of the class. You may also wish to refer to the course notes on object-oriented design.
One example of the starting point for the structure is the file `initial_person_class.py`.
We have implemented some methods for this class, but not everything that is required (the remaining methods have `pass` instead of actual code).
Your task:
- You should already have the files from the previous part.
- Fill in the remaining code in `initial_person_class.py` so that the file works as before.
- Run the file to make sure the assertions are still satisfied.
- Commit your changes.
- Create a pull request from your branch to the original friend-group repository and use the text `Answers UCL-MPHY0021-21-22/RSE-Classwork#44` in the description to link your PR to this issue.
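A sketch of what the filled-in class might look like; the method names here are illustrative, since `initial_person_class.py` defines the real interface:

```python
class Person:
    """A person in the friend group; method names here are illustrative."""

    def __init__(self, name, age, job):
        self.name = name
        self.age = age
        self.job = job
        self.connections = {}  # maps another Person to a relation label

    def add_connection(self, person, relation):
        """Record a relation from self to person (e.g. 'friend')."""
        self.connections[person] = relation

    def forget(self, person):
        """Remove the connection to person, if there is one."""
        self.connections.pop(person, None)


# Example usage mirroring the style of the assertions in the exercise file:
alice = Person("Alice", 30, "teacher")
bob = Person("Bob", 28, "nurse")
alice.add_connection(bob, "friend")
assert bob in alice.connections
alice.forget(bob)
assert bob not in alice.connections
```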
Automating `git bisect` - part VI
Continuation from #30.
Now that you have a script that tests Charlene's code, we can start to find when the error was introduced!
Step 6
Let's run our new script on the current state of Charlene's project and see what happens:
- Make sure you are on Charlene's last commit (i.e., `d8bc3eb`); you can check it with `git log`.
- Run `git bisect start` to start the bisect process (it should not produce any output).
- Run your script: `python test_sagittal_brain.py`
- In which state is the code? Did it fail (`bad`) or did everything seem fine (`good`)?
  Run `git bisect <state> <id-commit>`
Note: If you are on that commit, you can use `HEAD` as the `<id-commit>`.
Now let's go to a point in history where we believe the code was working correctly.
- Take a look at the history of Charlene's repository: `git log --oneline`
- Jump to her second commit (when she introduced data for future testing): `git checkout <id-commit>`
- Run your script: `python test_sagittal_brain.py`
- In which state is the code now, `bad` or `good`?
  Run `git bisect <state> <id-commit>`
By now we should have told `bisect` that there is a good and a bad commit, and that we want to find when things started going wrong.
You can see which ones they are by running `git bisect log`.
Hint
The output of the log should be something like:
git bisect start
# bad: [d8bc3ebaecd0cc7a2872da4c81d30b56f9b746ad] Makes the file Pep8 compliant and fixes some typos on docs
git bisect bad d8bc3ebaecd0cc7a2872da4c81d30b56f9b746ad
# good: [9dc8a27ada280e4479241c37bcb4d7f50c34ca09] Adds input and output data for future testing
git bisect good 9dc8a27ada280e4479241c37bcb4d7f50c34ca09
Let's now run `git bisect run` to find the commit that introduced the bug!
git bisect run python test_sagittal_brain.py
React to this issue with a 👍 when your team has completed the task.
Automating `git bisect` - part II
Continuation from #26.
Now that you have created the two arrays, let's save and read them.
Step 2
Use `numpy`:
- to save the input array into `brain_sample.csv`, and
- to read the `output` from `brain_average.csv`.
Hint: Look at Charlene's code to learn from her! You can also look at the how-to guide about reading and writing files from NumPy.
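The save/read steps might look like the sketch below (the file names come from the exercise; the `fmt`/`delimiter` choices match the earlier hint, and here we simply round-trip our own file to show the calls):

```python
import numpy as np

data_input = np.zeros((20, 20))
data_input[-1, :] = 1

# Save the input array as a CSV file for Charlene's script to read...
np.savetxt("brain_sample.csv", data_input, fmt="%d", delimiter=",")

# ...and (after the script has run) read its output back.  Round-tripping
# our own file here just demonstrates the two calls.
output = np.loadtxt("brain_sample.csv", delimiter=",")
print(output.shape)  # (20, 20)
```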
React to this issue with a 👍 when your team has completed the task.
Stretch Goal: Friend group data functions
Now that you've got a structure for your group in #7, we'd like you to create a new branch off your current group branch and create some functions.
- Turn your video cameras on!
- Choose one person who will share their screen.
- Create a new branch from your current group branch that starts with `stretch_`, e.g. `stretch_dpshelio-ageorgou-stefpiatek`
- Discuss with your group the best way to write the following functions; you can add extra parameters to the functions if you think it would be useful.
  - `forget(person1, person2)`, which removes the connection between two people in the group
  - `add_person(name, age, job, relations)`, which adds a new person with the given characteristics to the group
  - `average_age()`, which calculates the mean age for the group
- Commit your changes to your branch! (with a meaningful message)
- Push your changes from your computer to your fork.
- Create a pull request (PR) from your branch to the original friend-group repository. Add a meaningful title to that PR and don't forget to mention your partners in the description (as `@username`) and a link to this issue: `Answers UCL-MPHY0021-21-22/RSE-Classwork#8`
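A sketch of the three functions, assuming the group is a dictionary mapping names to `{"age": ..., "job": ..., "relations": {...}}` (an assumption — adapt it to your own structure):

```python
group = {
    "Alice": {"age": 30, "job": "teacher", "relations": {"Bob": "friend"}},
    "Bob": {"age": 28, "job": "nurse", "relations": {"Alice": "friend"}},
}

def forget(person1, person2):
    """Remove the connection between two people in the group."""
    group[person1]["relations"].pop(person2, None)
    group[person2]["relations"].pop(person1, None)

def add_person(name, age, job, relations):
    """Add a new person with the given characteristics to the group."""
    group[name] = {"age": age, "job": job, "relations": relations}

def average_age():
    """Mean age over all members of the group."""
    return sum(member["age"] for member in group.values()) / len(group)

add_person("Carol", 32, "doctor", {})
print(average_age())  # (30 + 28 + 32) / 3 = 30.0
```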
Creating a Python package with an entry point
Help Charlene run a command in her package from anywhere. You can do so by adding an entry point following the instructions below.
- Choose who in your team is sharing now! (Make sure you've pulled from your colleague's fork!)
- Move the `if __name__ == "__main__":` block to its own file (e.g., `command.py`) and add it as an entry point to `setup.py`, called "sagittal_average_run".
- Add the dependencies of this library as requirements to `setup.py`.
- Try to install it by running `pip install -e .` from where the `setup.py` is.
- Go to a different directory, run `python -c "import sagittal_average"` and see whether the installation worked.
- Check you can use the entry point from anywhere, by calling `sagittal_average_run <path/to/input/csv>` from the different directory.
- Share your solution as a pull request to Charlene's repository mentioning this issue (by including the text `Addresses UCL-MPHY0021-21-22/RSE-Classwork#38` in the pull request description). Remember to mention your team members too! (with `@github_username`)
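The entry-point addition to `setup.py` might look like the sketch below (the module and function names are assumptions based on the repository name; the real files may differ):

```python
# setup.py -- sketch of the relevant parts only
from setuptools import setup, find_packages

setup(
    name="sagittal_average",
    packages=find_packages(),
    install_requires=["numpy"],  # the library's dependencies
    entry_points={
        "console_scripts": [
            # command name = "module.path:function"; the function is the code
            # that used to live under the if __name__ == "__main__": block
            "sagittal_average_run = sagittal_average.command:main",
        ]
    },
)
```

After `pip install -e .`, pip generates a `sagittal_average_run` executable on your `PATH` that calls that function.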
Plotting the earthquake dataset
Your goal is to analyse the same earthquake data as before (#13) and produce two plots, showing:
- the frequency (number) of earthquakes per year
- the average magnitude of earthquakes per year
To help you get started, we have suggested an outline of the code. You can change this as you want, or use your own structure.
- Choose someone to share their screen and type. The other team members will tell them what to write.
- Make sure that person has forked the earthquakes repository and cloned their fork locally. They should also give the other team members access to the fork.
- Make a new branch with your combined GitHub usernames, named `plots-@username1-@username2-...`
- If you are not sure you have read the data correctly, you may want to look at the sample solution. You can start from that or from one of your own answers.
- Take a few minutes to look at the outline given, and think about how you will structure your code. What steps do you need and how will they connect? Do you want to change the provided functions or add some more?
- Write some code to produce one plot.
- When you are finished (or have done as much as you can), push your code, and open a Pull Request to the original `earthquakes` repository. Include the text `Answers UCL-MPHY0021-21-22/RSE-Classwork#15` in the description to link it to this issue. Add one or both plots if you want!
- If you have time, continue with the other plot and add it to the PR!
Some hints:
- You can do the computations required in "plain" Python, but think about using the `numpy` library (the `unique` function or others could be helpful, depending on how you have approached the problem).
- For plotting:
  - Make sure you have computed the values you need to plot!
  - Choose an appropriate plot type (if you need inspiration, there are various galleries) and then see how to create that in `matplotlib`.
  - See whether you need to put your data in a particular form to create the plot.
  - After plotting, do you need to make any visual adjustments? (on, for example, the axes, labels, colours...)
  - Save your plots to a file and check the result.
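A sketch of the "frequency per year" plot; the `years` list here is made-up data standing in for the years you extract from the earthquake timestamps:

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # render without needing a display
import matplotlib.pyplot as plt

# Stand-in data: in the exercise these come from the parsed earthquake records.
years = [2000, 2000, 2001, 2003, 2003, 2003]
frequency = Counter(years)  # year -> number of earthquakes

fig, ax = plt.subplots()
ax.bar(list(frequency.keys()), list(frequency.values()))
ax.set_xlabel("Year")
ax.set_ylabel("Number of earthquakes")
fig.savefig("frequency_per_year.png")
```

The average-magnitude plot follows the same pattern, but with per-year sums of magnitudes divided by the per-year counts.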
More unit tests!
Now that you know how to create a test, create three further tests for `times.py`:
Setup
Either start from your homework solution, or:
- Create a fork of the `times-tests` repository and clone the fork locally.
- Create/modify files `times.py` and `test_times.py` following the sample solution.
- Commit the files with an appropriate commit message.
Add collaborators
Add everyone in your group as collaborators to your fork.
Create three further tests
- Work collaboratively!
- Create a test each in `test_times.py` for:
  - two time ranges that do not overlap
  - two time ranges that both contain several intervals each
  - two time ranges where one ends exactly when the other starts
- Run `pytest` and see whether all tests are picked up by `pytest` and whether they pass.
- Fix any bugs in `times.py` the tests may have helped you find.
- Add the new and modified files to the repository, commit them (with a meaningful message that also includes `https://github.com/UCL-MPHY0021-21-22/RSE-Classwork/issues/17`) and push to your fork.
Sample Solution
Creating a fixture
Separating data from code
Your parametrised tests have now probably become a bit too big and difficult to read.
Create a `fixture.yaml` file where you can store what you parametrised before in a more human-readable way.
Load the YAML file within the test and use that structure to feed the parametrised test.
The `fixture.yaml` could look like:
- generic:
time_range_1: ...
time_range_2: ...
expected:
- ...
- ...
- no_overlap:
time_range_1: ...
time_range_2: ...
expected: []
Once you have a solution, commit it including Answers https://github.com/UCL-MPHY0021-21-22/RSE-Classwork/issues/22
in the message and push it to your fork on GitHub.
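The loading step might look like the sketch below (assuming the PyYAML package; the YAML is inlined here for illustration, where in the test you would `open("fixture.yaml")` instead):

```python
import yaml  # PyYAML, assumed to be installed

FIXTURE = """
- generic:
    time_range_1: [0, 10]
    time_range_2: [5, 15]
    expected:
      - [5, 10]
- no_overlap:
    time_range_1: [0, 5]
    time_range_2: [10, 15]
    expected: []
"""

# Each list entry is a one-key mapping {case_name: {...}}; flatten it into
# the list of 3-tuples that pytest.mark.parametrize expects.
cases = []
for entry in yaml.safe_load(FIXTURE):
    for name, case in entry.items():
        cases.append((case["time_range_1"], case["time_range_2"], case["expected"]))

# cases can now be passed straight to
# @pytest.mark.parametrize("time_range_1, time_range_2, expected", cases)
print(len(cases))  # 2
```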
Sample solution
Summary data from group
This exercise is done individually:
- Using the fork of the data structure for the group of people that you created earlier for #7, clone your fork locally if you haven't already (or create a new fork and add the code from the example solution if your group hadn't managed to create a solution).
- Create a new branch from your team-named branch, with your name only.
- Add some code that makes use of comprehension expressions to your `group.py` file so that it prints out the following when the script is run:
  - the maximum age of people in the group
  - the average (mean) number of relations among members of the group
  - the maximum age of people in the group that have at least one relation
  - [more advanced] the maximum age of people in the group that have at least one friend
- Create a pull request (PR) from your branch to the original repository. Add a meaningful title to that PR and a link to this issue: `Answers UCL-MPHY0021-21-22/RSE-Classwork#12`
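A sketch using comprehension expressions, assuming the group is a dictionary mapping names to `{"age": ..., "relations": {name: relation_type}}`, and assuming "friend" is one possible relation type — adapt both to your own structure:

```python
group = {
    "Alice": {"age": 30, "relations": {"Bob": "friend"}},
    "Bob": {"age": 28, "relations": {"Alice": "friend", "Carol": "colleague"}},
    "Carol": {"age": 40, "relations": {}},
}

max_age = max(person["age"] for person in group.values())
mean_relations = sum(len(p["relations"]) for p in group.values()) / len(group)
max_age_with_relation = max(
    p["age"] for p in group.values() if len(p["relations"]) > 0
)
max_age_with_friend = max(
    p["age"] for p in group.values() if "friend" in p["relations"].values()
)

print(max_age, mean_relations, max_age_with_relation, max_age_with_friend)
# 40 1.0 30 30
```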
Sample solution (with previously given sample data structure)
Generating and Solving conflicts
- Read the content of the script below
- Run it line by line, or as a script, on your machine. It will create a merge conflict.
- Resolve the merge conflict so the text in README.md is "Hello World".
- Make sure your working tree is clean
- Create an issue in your fork with a code block showing how the file looks before and after the conflict. Add a link to this issue in the description of your issue as:
Answers UCL-MPHY0021-21-22/RSE-Classwork#3
cd Desktop/
mkdir MergeConflict
cd MergeConflict/
git init
touch README.md
echo "Hello" > README.md
git add README.md
git commit -m "first commit on main"
# if your default is not main; rename master with: git branch -m main
git checkout -b new-branch
echo "Hello World" > README.md
git commit -am "first commit on new-branch"
git checkout main
echo "Hola" > README.md
git commit -am "second commit on main: adds something in Spanish"
git merge new-branch
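After the final `git merge new-branch`, `README.md` will contain conflict markers roughly like this sketch:

```
<<<<<<< HEAD
Hola
=======
Hello World
>>>>>>> new-branch
```

Resolving means editing the file so it contains just `Hello World` (deleting the markers and the line you don't want), then running `git add README.md` and `git commit` to complete the merge.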
Automating `git bisect` - part V
Continuation from #29.
Now that you have created two arrays, can read and save them, compare expected values, and call an external command from within Python, let's put it all together.
Step 5
Bring what's needed from `test_call_command.py` to `test_sagittal_brain.py` so that when calling this script the following happens:
- a good input 20x20 array is created (`input`);
- an expected array with 20 elements is created (`expected`);
- the `input` array is saved as a csv file (`brain_sample.csv`);
- Charlene's code is executed from within this script (using subprocess);
- the output produced by Charlene's code (`brain_average.csv`) is read into `output`; and
- the `output` and `expected` arrays are tested for equality.
Then run that script from the bash terminal.
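The shape of `test_sagittal_brain.py` could be as below. To keep this sketch self-contained, the subprocess target is an inline stand-in; in the exercise you would run `["python", "sagittal_brain.py"]` (or similar) instead:

```python
import subprocess
import sys

import numpy as np

# Input whose row averages expose the bug.
data_input = np.zeros((20, 20))
data_input[-1, :] = 1
np.savetxt("brain_sample.csv", data_input, fmt="%d", delimiter=",")

expected = np.zeros(20)
expected[-1] = 1

# Stand-in for Charlene's script: reads the sample, writes per-row means.
stand_in = (
    "import numpy as np; "
    "data = np.loadtxt('brain_sample.csv', delimiter=','); "
    "np.savetxt('brain_average.csv', data.mean(axis=1), delimiter=',')"
)
subprocess.run([sys.executable, "-c", stand_in], check=True)

output = np.loadtxt("brain_average.csv", delimiter=",")
np.testing.assert_array_equal(output, expected)
print("Test passed")
```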
React to this issue with a 👍 when your team has completed this task.
Approximating π using Numba/Cython
This exercise builds on #46. It is part of a series that looks at the execution time of different ways to calculate π using the same Monte Carlo approach.
This exercise uses Numba and Cython to accomplish this approximation of π. A Numba version of the code is already written, and you can find it in the `calc_pi_numba.py` file in the pi_calculation repository, on the `class` branch. Your job is to measure how much time it takes to complete in comparison to #46.
Preparation
The two frameworks we will look at allow you to write Python-looking code and compile it into more efficient code which should run faster. Numba is a compiler for Python array and numerical functions. Cython is a way to program C extensions for Python using a syntax similar to Python.
Both frameworks should come with your conda installation. If not, and you get errors when running the instructions below, use `conda` or `pip` to install them (see their websites linked above for instructions).
Using Numba
- Look at the implementation using numba: `calc_pi_numba.py`
- Discuss how different it looks to the original. Is it more/less readable? Can you understand what the differences mean?
- Run the code with `python calc_pi_numba.py`. How does the time compare to the original?
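The Numba pattern at the heart of `calc_pi_numba.py` is sketched below (an illustration, not the repository's actual code; the try/except fallback just lets the sketch run even where Numba is not installed):

```python
import random

try:
    from numba import njit  # JIT-compiles the decorated function
except ImportError:
    def njit(func):  # fallback: run the plain Python version
        return func

@njit
def calc_pi(n):
    """Monte Carlo estimate of pi: fraction of random points in the unit quarter-circle."""
    inside = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

print(calc_pi(100_000))  # close to 3.14159
```

The decorated loop is compiled to machine code the first time it is called, which is why Numba versions often show a one-off compilation overhead followed by much faster repeat runs.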
Using Cython
Next, try to use Cython to approximate π. This part will be easier for users of Linux and OS X, as getting Cython to run on Windows is a little more involved.
We will use a notebook for this example, as it lets us see more information about how Cython works.
- Open the Jupyter notebook `calc_pi_cython.ipynb`.
- As before, discuss how different the code looks to the original.
- Use `%timeit` within the notebook to compare with the runtime of the Numba version and the original code.
- From what you have read or know, can the Cython performance be further improved?
Argument Parsing 1/3
When writing code, it is important to think about how you or others can run it. A popular way is to use a command-line interface, so that your code can be executed as a script from a terminal. In this exercise we will look at the tools that Python offers for creating such interfaces.
We will use the `squares.py` file we used last week for the documentation exercise. We will make the code more generic by creating a command-line interface that will make it easier to call.
Constant weight
Let's first build an interface without weights (assuming they are constant and equal to 1).
- Choose who in your team is sharing
- Make sure you have a fork and a local copy of the average_squares repository.
- Open the file `squares.py`. Make sure you can run it from a terminal! (`python squares.py`)
- Look at the part of the file that is inside the `if __name__ == "__main__":` guard. This is the code that you will work on. Currently, the input values are hardcoded into the file.
- Use the `argparse` library to rewrite this part so that it reads only the `numbers` from the command line (keep the weights hardcoded for now). The file should be runnable as `python squares.py <numbers>...` (where `<numbers>` should be replaced by the sequence of numbers of your choice).
  - Look at the example in the notes to get you started.
  - Decide which values should be read from the command line.
  - Add them as `argparse` arguments.
  - Check the auto-generated help: `python squares.py --help`.
  - Check that you can run the file with the new form.
- Share your solution as a pull request mentioning this issue (by including the text `Addresses UCL-MPHY0021-21-22/RSE-Classwork#32` in the pull request description). Remember to mention your team members too! (with `@github_username`)
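The `argparse` part might look like the sketch below (the `average_of_squares` computation is inlined here, since the real function names in `squares.py` are not shown; we also parse a sample list directly instead of `sys.argv`, which is what the `__main__` guard would normally do):

```python
import argparse

def build_parser():
    """CLI sketch for squares.py: read the numbers from the command line."""
    parser = argparse.ArgumentParser(description="Average of squares")
    parser.add_argument("numbers", nargs="+", type=float,
                        help="the numbers to average the squares of")
    return parser

# Under the __main__ guard this would be build_parser().parse_args(),
# reading sys.argv; a sample list keeps the sketch self-contained.
args = build_parser().parse_args(["1", "2", "3"])
weights = [1] * len(args.numbers)  # weights kept hardcoded/constant for now
print(sum(w * n ** 2 for n, w in zip(args.numbers, weights)) / len(args.numbers))
# prints 4.666666666666667, i.e. (1 + 4 + 9) / 3
```

`argparse` generates the `--help` text automatically from the `help=` strings.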
Learning all about Pull Requests
In small groups:
- Fork our Travel guide repository
- Clone your fork locally
- Create a new branch named with a combination of your team,
  e.g., `dpshelio-ageorgou`
- Create a new file in the right place named after a place both of you would like to visit. Create any intermediate directories needed,
  e.g., `./europe/spain/canary_islands.md`
- Add to that file:
  - a title (e.g., `# Canary Islands`)
  - a small paragraph on why you would like to go there
  - at the end of the file, a link to wikivoyage and/or wikipedia for that place,
    e.g., `More info at [wikivoyage](https://en.wikivoyage.org/wiki/Canary_Islands) and [wikipedia](https://en.wikipedia.org/wiki/Canary_islands)`
- Commit that to your branch! (with a meaningful message)
- Add the internal links needed to get from the main page to that one,
  e.g., link from Europe's `README.md` to Spain's `README.md`, and from Spain's `README.md` to the Canary Islands file `canary_islands.md`
- Commit these changes! (with a meaningful message)
- Create a pull request from your branch to my repository. Add a meaningful title to that PR and don't forget to mention your partner in the description (as `@username`) and a link to this issue: `Answers UCL-MPHY0021-21-22/RSE-Classwork#6`
Refactoring - Part 3
This follows on from #44.
Stage 3: Object-oriented structure
We will now look at how to represent and manipulate this data using our own user-defined objects.
Instead of a dictionary, we will define two classes, which will represent a single person and the whole group. We will restructure our code so that group functions apply directly to the group, instead of the person having all of the methods.
Again, you may also wish to refer to the course notes on object-oriented design.
Take a look at the file `initial_two_classes.py` to see one possible way in which the code could be structured.
Internally, the `Group` class still uses a dictionary to track connections, but someone using the class does not need to be aware of that. We have implemented some methods for these classes, but not everything that is required (the remaining methods have `pass` instead of actual code).
Your task:
- You should have the files from the previous parts of the exercise.
- Fill in the remaining method definitions.
- Update the section at the end of the file so that it creates the same group as in the previous example, but using the new classes you have defined.
- Run the file to make sure it gives the same results as before (that is, the assertions still pass).
- Commit your changes.
- Create a pull request from your branch to the original friend-group repository and use the text `Answers UCL-MPHY0021-21-22/RSE-Classwork#45` in the description to link your PR to this issue.
- Think of the benefits and drawbacks of the object-oriented structure compared to the original approach (collection of functions).
- If you have time, think of other changes you consider useful and try them.
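A rough sketch of the two-class structure; the real method names and signatures are defined in `initial_two_classes.py`, so treat these as illustrative:

```python
class Person:
    """A single person: just their own data, no group logic."""
    def __init__(self, name, age, job):
        self.name = name
        self.age = age
        self.job = job

class Group:
    """The whole group; group-level operations live here."""
    def __init__(self):
        self.members = {}
        self.connections = {}  # internal detail: name -> {name: relation}

    def add_person(self, person):
        self.members[person.name] = person
        self.connections.setdefault(person.name, {})

    def connect(self, name1, name2, relation):
        self.connections[name1][name2] = relation
        self.connections[name2][name1] = relation

    def average_age(self):
        return sum(p.age for p in self.members.values()) / len(self.members)

group = Group()
group.add_person(Person("Alice", 30, "teacher"))
group.add_person(Person("Bob", 28, "nurse"))
group.connect("Alice", "Bob", "friend")
print(group.average_age())  # 29.0
```

Note how the connection dictionary is hidden behind `Group`'s methods, which is the design point the exercise is making.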
Parametrising the tests
Avoid code repetition using `pytest.mark.parametrize`.
Now that you have written four different positive tests for `times.py`, take a step back and look at your code: there is a lot of repetition. Almost every test (apart from the negative test) essentially does the same thing (albeit with different data), which makes our test code harder to change in the future.
We can use `pytest.mark.parametrize` to get our tests DRY (Don't Repeat Yourself).
- You have seen `pytest.mark.parametrize` in the notes. Using the documentation of `pytest.mark.parametrize` if needed, see how you can compress most of the tests into a single one.
  You will need the test function to accept parameters, for example `time_range_1`, `time_range_2` and `expected`; let the parametrize decorator know about them as its first argument, and pass a list of tuples of length 3 with the values for each test.
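The decorator shape is sketched below; the `overlap` function is a simplified stand-in (the real `times.py` function and its return format may differ), so only the parametrize pattern carries over:

```python
import pytest

def overlap(r1, r2):
    """Stand-in: overlap of two (start, end) ranges, as a list of ranges."""
    start, end = max(r1[0], r2[0]), min(r1[1], r2[1])
    return [(start, end)] if start < end else []

@pytest.mark.parametrize(
    "time_range_1, time_range_2, expected",
    [
        ((0, 10), (5, 15), [(5, 10)]),  # generic overlap
        ((0, 5), (10, 15), []),         # no overlap
        ((0, 5), (5, 10), []),          # ranges that touch exactly
    ],
)
def test_overlap(time_range_1, time_range_2, expected):
    assert overlap(time_range_1, time_range_2) == expected
```

pytest expands this into one test per tuple, so a failure report tells you exactly which data set broke.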
Commit your solution including Answers https://github.com/UCL-MPHY0021-21-22/RSE-Classwork/issues/21
in the message, and push it to GitHub.
What are the advantages and disadvantages of using `parametrize` in this case?
Sample solution
Creating a Python package with tests
Help Charlene to test her package (remember to commit after each step, if appropriate).
- Choose who in your team is sharing now! (Make sure you've pulled the latest changes from your team's fork.)
- Create a `tests` directory inside `sagittal_average` and add a test similar to what we used last week when we discovered the bug.
  Hint: You need a `test_something` function that runs all of the below.
  - Create an input dataset:
    data_input = np.zeros((20, 20))
    data_input[-1, :] = 1
  - Save it into a file:
    np.savetxt("brain_sample.csv", data_input, fmt='%d', delimiter=',')
  - Create an array with the expected result:
    # The expected result is all zeros, except the last one, which should be 1
    expected = np.zeros(20)
    expected[-1] = 1
  - Call the function with the files:
    run_averages(file_input="brain_sample.csv", file_output="brain_average.csv")
  - Load the result:
    result = np.loadtxt(TEST_DIR / "brain_average.csv", delimiter=',')
  - Compare the result with the expected values:
    np.testing.assert_array_equal(result, expected)

  What could you do to make sure that the files we are creating don't interfere with our repository or the rest of the package?
- Add an `__init__.py` file to the tests folder.
- Fix `sagittal_brain.py` (as you may remember from last week, the code wrongly averages over the columns, not the rows), make sure the test passes, and commit these changes.
- Try to install it by running `pip install -e .` from where the `setup.py` is, and then run the tests with `pytest`.
- Share your solution as a pull request to Charlene's repository mentioning this issue (by including the text `Addresses UCL-MPHY0021-21-22/RSE-Classwork#37` in the pull request description). Remember to mention your team members too! (with `@github_username`)
Approximating π using parallelisation
This exercise builds on #46. It is part of a series that looks at the execution time of different ways to calculate π using the same Monte Carlo approach.
This exercise uses the Message Passing Interface (MPI) to accomplish this approximation of π. The code is already written, and you can find it in `calc_pi_mpi.py` in the pi_calculation repository, on the `class` branch. Your job is to install MPI, and measure how much time it takes to complete in comparison to #46.
MPI
MPI allows parallelisation of computation. An MPI program consists of multiple processes, existing within a group called a communicator. The default communicator contains all available processes and is called `MPI_COMM_WORLD`.
Each process has its own rank and can execute different code. A typical way of using MPI is to divide the computation into smaller chunks, have each process deal with a chunk, and
have one "main" process to coordinate this and gather all the results. The processes can communicate with each other in pre-determined ways as specified by the MPI protocol -- for example, sending and receiving data to a particular process, or broadcasting a message to all processes.
Preparation
We are going to run the original (non-numpy) version in parallel, and compare it to the non-parallel version.
We will be using `mpi4py`, a Python library that gives us access to MPI functionality.
Install mpi4py
using conda:
conda install mpi4py -c conda-forge
or pip:
pip install mpi4py
On Windows you will also need to install MS MPI.
The MPI version of the code is available at `calc_pi_mpi.py`. Look at the file and try to identify what it is doing -- it's fine if you don't understand all the details! Can you see how the concepts in the brief description of MPI above are reflected in the code?
Execution
- Run the MPI version as: `mpiexec -n 4 python calc_pi_mpi.py`
  The `-n` argument controls how many processes you start.
- Increase the number of points and processes, and compare the time it takes against the normal version. Note that to pass arguments to the Python file (like `-np` below), we have to give those after the file name:
  `mpiexec -n 4 python calc_pi_mpi.py -np 10_000_000`
  `python calc_pi.py -np 10_000_000 -n 1 -r 1`
  Tip: To avoid waiting for a long time, reduce the number of repetitions and iterations of `timeit` (1 and 1 in this example).
- Think of these questions:
- Is the MPI-based implementation faster than the basic one?
  - Is it faster than the `numpy`-based implementation?
  - When (for what programs or what settings) might it be faster/slower?
- How different is this version to the original? How easy is it to adapt to using MPI?
Friend group data model
- Turn your video cameras on!
- Choose one person who will share their screen.
- Fork the friend group repository to one of your accounts.
- Add everyone else in your group as a collaborator to the forked repository.
- Clone your fork locally.
- Create a new branch named with a combination of your team, e.g. `dpshelio-ageorgou-stefpiatek`.
- Write your code in the file `group.py` to do what the exercise asks - see the instructions in the README file of the exercise repository.
- Commit your changes to your branch! (with a meaningful message)
- Push your changes from your computer to your fork.
- Create a pull request (PR) from your branch to the original friend-group repository.
  Add a meaningful title to that PR and don't forget to mention your partners in the description (as `@username`) and a link to this issue: `Answers UCL-MPHY0021-21-22/RSE-Classwork#7`
If you finish all of that, you can work on our stretch goal #8
Learning branches with git
In small groups, using github's visualisation tool as shown in class, do the steps needed to replicate the repository structure shown below.
Note: hash numbers for the commits are going to be different and the final shape of the graph may look slightly different.
When done, take a screenshot of your result and create an issue in your fork that includes the screenshot of the result, a code block with your steps and a link to this issue.
To add a code block with your steps use the following syntax:
```bash
git commit -m "First commit"
git ...
```
that will render as:
git commit -m "First commit"
git ...
To refer back to this issue you need to add the following text to your issue message:
Answers UCL-MPHY0021-21-22/RSE-Classwork#2
That creates a link that will appear under this issue.
Adding a Continuous Integration system
Use CI to run the tests for you.
It's always possible that we forget to run the tests before pushing to GitHub. Luckily, continuous integration (CI) platforms can help us catch failing tests even when we do forget to run them.
In this exercise, you will use GitHub Actions to do exactly this.
Set up GitHub Actions to run our tests for every commit we push to our repository.
- Get some initial familiarity with GitHub Actions by reading the quickstart guide
- Work collaboratively!
- Copy the code snippet below and paste it into `.github/workflows/python-tests.yaml` in your repository.
```yaml
name: Pytest
on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.6, 3.9]

    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytest
      - name: Test with pytest
        run: |
          pytest
```
- Discuss with your group what you think each of the lines in `.github/workflows/python-tests.yaml` does
- Add `.github/workflows/python-tests.yaml` to the repository, commit it and push it to GitHub. Link to this issue on that commit by including `Answers https://github.com/UCL-MPHY0021-21-22/RSE-Classwork/issues/19` in the commit message.
- Check whether the tests pass on your remote (Hint: check the Actions tab on your GitHub fork)
If you're done with this issue, try to add test coverage by working on this related issue: #20
Sample solution and sample solution in action.
Working with the US Geological Survey earthquake data set
This exercise will look at how to read data from an online source (web service), explore and process it.
- Fork the earthquakes repository and clone it on your computer.
- Read the description of the exercise in the README file.
- Start by getting and exploring the data (step 1), then complete the code to process it (step 2).
- When you are happy with your solution (or want some feedback!):
- Push your new code to your own fork.
- On GitHub, open a pull request from your fork to the original repository.
- In the description, include the text `Answers UCL-MPHY0021-21-22/RSE-Classwork#13`. This will link your PR to this issue.
- On the PR text, comment on what you found difficult or interesting, or something you learned. If you have finished the exercise, also mention the answers you found (e.g. "The maximum magnitude is 3 and it occurred at coordinates (4.0, -3.8)").
- Choose one of the other pull requests listed on this issue, and leave a review. Comment on things you find interesting or don't understand, any problems you think you spot, good solutions or potential improvements.
- Mark the assignment on Moodle as complete, and fill in the short feedback form.
If you have questions or get stuck, ask on Moodle or book an office hours slot!
Code linting
- Analyse Charlene's package and run one of the linting tools.
  You may need to install them. Which messages did you get? Was your IDE (e.g., VS Code) warning you of it already?
- Fix them either manually or automatically using a code formatter (e.g., yapf or black)
- Can you think of a way of ensuring the style is checked before merging new contributions? Add your suggestions below.
- Share your solution even if it's a work-in-progress as a pull request to Charlene's repository mentioning this issue (by including the text `Addresses UCL-MPHY0021-21-22/RSE-Classwork#40` in the pull request description); remember to mention your team members too! (with `@github_username`)
Argument Parsing 2/3
Carrying on from the previous exercise, we will now add an optional parameter to accept weights in your latest `squares.py` file.
With weights
- Choose who in your team is sharing now! (pull the code from the previous exercise into your local repository)
  Hint: You need to add a new remote from your team member and pull their branch
- Create a new branch from the branch used in the previous exercise.
- Open the file `squares.py`. Make sure you can run it from a terminal with some input values!
- Look at the part of the file that is using `argparse`
- Add a new argument that's optional and that can accept the weights as done previously with the `numbers`. The file should be runnable as `python squares.py <numbers>... --weights <weights>...` (where `<numbers>` and `<weights>` should be replaced by the sequences of numbers and weights of your choice).
  - Look at the argparse documentation
  - Add the weights as `argparse` arguments.
  - Check the auto-generated help: `python squares.py --help`.
  - Check that you can run the file with the new form, whether you put the weights or not.
- Share your solution as a pull request mentioning this issue (by including the text `Addresses UCL-MPHY0021-21-22/RSE-Classwork#33` in the pull request description); remember to mention your team members too! (with `@github_username`)
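One way the optional argument could be sketched is with `argparse`'s `nargs="+"`. This is a hedged sketch, not the exercise's actual `squares.py` (which may name things differently):

```python
# Sketch: positional numbers plus an optional --weights list.
import argparse

parser = argparse.ArgumentParser(description="Average of squares")
parser.add_argument("numbers", nargs="+", type=float,
                    help="the numbers to square and average")
parser.add_argument("--weights", nargs="+", type=float, default=None,
                    help="optional weights, one per number")

# Normally you would call parser.parse_args() with no arguments to read
# sys.argv; here we pass a list explicitly so the sketch is self-contained.
args = parser.parse_args(["1", "2", "3", "--weights", "0.5", "0.2", "0.3"])
print(args.numbers)   # [1.0, 2.0, 3.0]
print(args.weights)   # [0.5, 0.2, 0.3]
```

When `--weights` is omitted, `args.weights` is `None`, so the code can fall back to an unweighted average.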
Using docstrings and doctests
This exercise will show why it is important to keep documentation accurate, and how to do this automatically using docstrings and doctests.
Setup
- Make sure you've had a look at the course notes on documentation so that you understand some of the background around docstrings and doctests
- Fork and clone the `average_squares` repository (`git clone git@github.com:<your_user_name>/average_squares.git`)
- Open the `squares.py` file
Understanding
- Spend some time reading and understanding the code.
- Do you understand what it's meant to do? Do the docstrings help?
- Run the code with the default inputs. Does it produce the output you expect?
- Try running the code with other inputs. What happens?
Exercises
As you may have discovered, the code in `squares.py` does contain some mistakes. Thankfully the functions in the file include documentation that explains how they should behave.
Run the doctests
- Use the `doctest` module to see whether the documentation of the code is accurate: `python -m doctest squares.py`
- Try to understand the structure of the output - what errors are reported? Are they what you expected from looking at the code in the previous steps?
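As a reminder of the mechanics, here is a generic doctest example (not taken from `squares.py`): `doctest` finds the `>>>` examples in a docstring, runs them, and compares the printed output to what the docstring claims.

```python
# Generic doctest illustration (not from squares.py).
def square(x):
    """Return x squared.

    >>> square(3)
    9
    >>> square(-2)
    4
    """
    return x * x

if __name__ == "__main__":
    import doctest
    # Equivalent to running `python -m doctest thisfile.py` on this file.
    print(doctest.testmod())  # prints TestResults(failed=0, attempted=2)
```

If a docstring example's output does not match, `doctest` reports the expected and actual values for that example.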
Update the docstrings
- Look at the errors related to the `average_of_squares` function.
  - Figure out where the mismatch between the documentation (intended behaviour) and the actual behaviour of the function lies.
  - Correct the usage examples in the `average_of_squares` docstring that are incorrect.
Correct the code and verify
- Re-run the code, again comparing the actual and expected behaviour. What is the error?
- Correct the error in the code and rerun `doctest` to confirm that the `average_of_squares` documentation is now correct.
Repeat the process for `convert_numbers`
- Look at the `doctest` error from the `convert_numbers` documentation.
- Can you identify the bug? How would you fix it?
Submit a Pull Request
Once you have completed or made progress on the exercises:
- Create a pull request (PR) from your branch to the upstream repository. Add a meaningful title to that PR and a link to this issue: `Answers UCL-MPHY0021-21-22/RSE-Classwork#23`
Code coverage
Knowing the coverage
Make sure you've installed `pytest-cov` in your environment.
- Run the coverage and produce an html report.
- Visualise it by opening the html report in your browser
- Commit, push and link to this issue by including `Answers https://github.com/UCL-MPHY0021-21-22/RSE-Classwork/issues/20` in the commit message.
Ensure Github Actions also reports your coverage!
Sample solution: `pytest --cov="times" --cov-report html`
Fork this repository and enable issues
Automating `git bisect` - part III
Continuation from #27.
Now that you have created two arrays and are able to read and save them:
Step 3
Use `numpy` to test whether the `output` array you read in #27 is equal to the `expected` array you created in #26.
Hint
Numpy has a testing function to test/compare arrays: `np.testing.assert_array_equal`
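For instance (with hypothetical array contents - your `expected` and `output` arrays come from the earlier steps):

```python
import numpy as np

# Hypothetical stand-ins for the arrays from the earlier steps.
expected = np.array([[1.0, 2.0], [3.0, 4.0]])
output = np.array([[1.0, 2.0], [3.0, 4.0]])

# Does nothing if the arrays are equal; raises an AssertionError with a
# helpful element-by-element report if they differ.
np.testing.assert_array_equal(output, expected)
print("arrays match")
```

Unlike a plain `==` comparison (which returns an array of booleans), the `np.testing` helpers give a single pass/fail result with a readable mismatch report.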
React to this issue with an emoji when your team has completed this task.
Refactoring - Part 1
For this exercise, we will look at how to rewrite (refactor) existing code in different ways, and what benefits each new structure offers.
We will work with some code that describes a group of acquaintances, as we saw in a previous exercise (issue #7).
Stage 1: Remove global variables
Look at the initial version of the file, which defines a specific group using a dictionary and offers some functions for modifying and processing it.
You may notice that the dictionary is a global variable: all the functions refer to it but do not take it as a parameter.
This situation can lead to difficulties (why?), so we will restructure the code to avoid it.
Rewrite the functions so that they take in the dictionary that they work on as an argument.
For example, the function that computes the average age should now look like:
```python
def average_age(group):
    all_ages = [person["age"] for person in group.values()]
    return sum(all_ages) / len(group)
```
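The same pattern applies to functions that modify the group. A hedged sketch with a hypothetical `add_person` function (the function names and dictionary layout in your `group.py` may differ):

```python
# Hypothetical example of removing a global: the group dictionary is now
# passed in as a parameter instead of being referenced as a global variable.
def add_person(group, name, age, job):
    """Add a new member to the given group."""
    group[name] = {"age": age, "job": job, "relations": {}}

def average_age(group):
    all_ages = [person["age"] for person in group.values()]
    return sum(all_ages) / len(group)

if __name__ == "__main__":
    my_group = {}                            # created here, not at module level
    add_person(my_group, "Jill", 26, "biologist")
    add_person(my_group, "Zalika", 28, "artist")
    assert average_age(my_group) == 27
```

A side benefit of this structure: the functions can now be tested with any group you construct, not just the one global one.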
Your task:
- Fork the friend group if you haven't already
- Checkout the `week09` branch and go to the `week09/refactoring` directory.
- Change `average_age` as above, and the other functions of `group.py` in a similar way.
- Update the section at the end of the file (after `if __name__ == "__main__"`) to create the sample dictionary there, and run the functions that alter it.
there, and running of the functions that alter it. - Run your file to make sure the asserts still pass.
- Commit your changes!
- Create a pull request from your branch to the original friend-group repository and use the text in the description to link your PR to this issue: `Answers UCL-MPHY0021-21-22/RSE-Classwork#43`
- Think of the benefits and drawbacks of this approach compared to the original version.
- If you have time, think of other changes you consider useful and try them.
Automating `git bisect` - part IV
Continuation from #28.
Now that you have created two arrays, can read and save them, and can compare them with expected values:
Step 4
Use `subprocess` to call a system command from within Python. The aim is to later use this to call Charlene's programme.
In a different Python file (e.g., `test_call_command.py`) start with this content:
```python
import subprocess
subprocess.run(["ls", "-aF"])
```
and run it!
Are you using Windows and getting errors?
If you're on Windows, ideally use Git Bash (if you're on the Windows Command Prompt `cmd`, you need to pass `cmd`-compatible commands to subprocess, e.g. `dir` instead of `ls`). On Windows, you might also need to pass `shell=True` as an additional argument:

```python
subprocess.run(["ls", "-lh"], shell=True)
```
Run it and iterate a couple of times, changing the command that runs with others like:
- `ls -aF` (which lists files)
- `echo 'hello world'` (which prints hello world on the screen)
- `date +%Y%m%d` (which prints today's date with the selected format)
- `wc -l sagittal_brain.py` (which counts the number of lines of the `sagittal_brain.py` file)
- `wc -c AFileThatDoesNot.exist` (which should count the number of characters, but should fail as the file doesn't exist)
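If you want to inspect what a command produced rather than just run it, `subprocess.run` can also capture its output. A small sketch, assuming a Unix-like shell is available:

```python
import subprocess

# capture_output=True stores stdout/stderr on the result object;
# text=True decodes them from bytes to str.
result = subprocess.run(["echo", "hello world"],
                        capture_output=True, text=True)
print(result.returncode)      # 0 means the command succeeded
print(result.stdout.strip())  # hello world
```

Checking `result.returncode` (or passing `check=True`, which raises `CalledProcessError` on failure) is how you detect cases like the missing-file `wc` command above.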
Once you've understood how `subprocess.run` works, try to call Charlene's programme.
Hint
How do you call a Python script from the command line? `______ filename.py`.
React to this issue with a ❤️ when your team has completed this task.
Argument Parsing 3/3
Carrying on from the previous exercise, we will now change the options so that, instead of reading the numbers from the command line, they are read from text files (one number per line), keeping the optional parameter to accept weights, also as a file.
Reading data from files
- Choose who in your team is sharing now! (pull the code from the previous exercise into your local repository)
  Hint: You need to add a new remote from your team member and pull their branch
- Create a new branch from the branch used in the previous exercise.
- Open the file `squares.py`. Make sure you can run it from a terminal with some input values!
- Look at the part of the file that is using `argparse`
- Modify the arguments so the data is read from a text file, with the `weights` still optional. The file should be runnable as `python squares.py <file_numbers> --weights <file_weights>`, where `<...>` should be replaced by the files of your choice.
  - Look at the working with data section to refresh how we read files in Python.
  - Modify the `argparse` arguments to receive file names instead of numbers.
  - Check the auto-generated help: `python squares.py --help`.
  - Check that you can run the file with the new form, whether you pass a weights file or not.
- Share your solution as a pull request mentioning this issue (by including the text `Addresses UCL-MPHY0021-21-22/RSE-Classwork#34` in the pull request description); remember to mention your team members too! (with `@github_username`)
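One way to sketch the file-reading variant (hypothetical; your `squares.py` may read the files differently) is `argparse.FileType`, which turns a file-name argument into an already-open file object:

```python
# Sketch: read numbers (and optional weights) from files, one value per line.
import argparse

parser = argparse.ArgumentParser(description="Average of squares from files")
parser.add_argument("number_file", type=argparse.FileType("r"),
                    help="file with one number per line")
parser.add_argument("--weights", type=argparse.FileType("r"), default=None,
                    help="optional file with one weight per line")

def read_numbers(open_file):
    """Read one float per non-blank line from an open file."""
    return [float(line) for line in open_file if line.strip()]
```

`argparse` also reports a clear error itself if the named file does not exist, which saves you writing that check by hand.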
Measuring performance and using numpy
This exercise is the first in a series. The series will look at the execution time of calculating π using the same algorithm, but different implementations and libraries.
This exercise initially uses pure Python to accomplish this approximation of π. The code is already written, and you can find it in the pi_calculation repository.
Your job is to understand the code, measure how much time it takes to complete, and then adapt it to use numpy instead of pure Python.
Step 1: Measuring how long code takes using `timeit`
The code uses the `timeit` module from the standard library. There are different ways you can use `timeit`: either as a module or from its own command-line interface. Check out `timeit`'s documentation to see the different possibilities. Our `calc_pi.py` wraps the module implementation of `timeit` and provides a similar interface to the command-line interface of `timeit`.
Your task:
- Run the code with the default values using `python calc_pi.py`.
- Now run it by specifying values for some arguments (to see which arguments you can use, use `--help` or look at the source code)
- In case you would like to time a function (like `calculate_pi` in this case) without writing all that boilerplate, you can run:
  ```bash
  python -m timeit -n 100 -r 5 -s "from calc_pi import calculate_pi_timeit" "calculate_pi_timeit(10_000)()"
  ```
  Try it!
- Try to understand the source code in more depth:
  - What does the `calculate_pi_timeit` function do, roughly?
  - How does `timeit.repeat` work?
  - Why do we repeat the calculation multiple times?
  - Can you think of any changes that could make the code faster?
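`timeit.repeat` itself is easy to try on any statement. A minimal, self-contained illustration (a toy statement, not `calc_pi.py` itself):

```python
import timeit

# Time a toy statement: 5 repeats, each running the statement 1000 times.
# repeat() returns one total time in seconds per repeat; the minimum is
# usually the most stable estimate (least affected by other system activity).
times = timeit.repeat(stmt="sum(range(1000))", repeat=5, number=1000)
print(min(times))
```

This is why we repeat the measurement: individual runs are noisy, and taking several repeats lets you see (and discount) that variation.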
Step 2: Using numpy
The course notes describe how using the `numpy` library can lead to faster and more concise code.
Your task:
- Complete the file `calc_pi_np.py` so that it does the same as `calc_pi.py`, but uses numpy arrays instead of lists. Update the functions accordingly (you can change their arguments if it makes more sense for your new version).
  Hint: Instead of creating n `x` and `y` values independently, generate an array of size `(n, 2)`.
- Which version of the code is faster, the one that uses `numpy` or the one that uses pure Python?
When you have completed the exercise, react to this issue using the available emojis, or post your comparison of times below!
Negative testing
Negative tests - Test that something that is expected to fail actually does fail
`time_range` may still work when `end_time` is before `start_time`, but that may make `overlap_time` not work as expected.
- Work collaboratively!
- Write a test that tries to generate a time range for a date going backward.
- Modify `time_range` to produce an error (`ValueError`) with a meaningful message.
- Use `pytest.raises` to check for that error (including the error message!) in the test.
- Commit, push and link to this issue by including `Answers https://github.com/UCL-MPHY0021-21-22/RSE-Classwork/issues/18` in the commit message.
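A sketch of such a negative test. The `time_range` here is a hypothetical stand-in (the real one in `times.py` has a different signature and return value):

```python
import pytest

# Hypothetical stand-in for times.time_range, raising on a backwards range.
def time_range(start_time, end_time):
    if end_time < start_time:
        raise ValueError("end_time should be after start_time")
    return (start_time, end_time)

def test_backwards_time_range():
    # pytest.raises checks the exception type, and match= checks the message.
    with pytest.raises(ValueError, match="end_time should be after start_time"):
        time_range("2010-01-12 12:30:00", "2010-01-12 10:00:00")
```

Checking the message (not just the type) ensures the user-facing error stays meaningful if the code changes later.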
What other similar tests could we add?
Sample Solution
Generating documentation with Sphinx
This exercise will introduce you to the basics of Sphinx using the same code you looked at in the previous exercise (#23).
Setup
- Navigate to the `average-squares` folder that you used in the previous exercise.
(Note: You will be able to complete this exercise even if you haven't finished the previous one - the only difference is that some of your generated documentation will be different)
Understanding
- This folder contains a simple project that could do with some documentation.
- The code is within the `average_squares` repository, and your task is to generate some documentation to go alongside it.
Exercises
Getting started with Sphinx
- Ensure that you have Sphinx installed for your system
- Create a `docs` folder within the `average_squares` folder - this is where your documentation for the project will be stored
- From within the `docs` folder run `sphinx-quickstart` to generate the scaffolding files that Sphinx needs
  - Ensure that you select `no` for `Separate source and build directories` - this should be the default, but if chosen incorrectly it will mean your folder structure won't match the instructions below
  - You can accept the defaults and enter sensible information for the other fields.
- Run `sphinx-build . _build/html` or `make html` to generate HTML docs.
- Run `python -m http.server -d _build/html/` and open the link shown by this command to see the built documentation in a browser.
Modifying index.rst
- Open the `index.rst` file - this is the master document that serves as the entrypoint and welcome page of the documentation.
- Add a line or two about the purpose of the project
- Save and rebuild the documentation - verify that it builds correctly
Adding content and structure
- In the `docs` folder create a subfolder called `content`.
- Within `docs/content` create a file called `average-squares-docs.rst` with the following contents:
Average Squares Documentation
=============================
- Update the `toctree` directive in `index.rst` so that this new file is included.
- Rebuild the documentation and verify that this file is now linked to.
Using Docstrings to create documentation
As you saw in the previous exercise (#23) the code in this project contains some docstrings - let's show this in our Sphinx generated documentation
- Follow the instructions on the Sphinx getting started page to enable the `autodoc` extension
- Can you modify the `content/average-squares-docs.rst` file to include docstrings from the code automatically?
- Hint: You may find it useful to modify the path setup in `docs/conf.py` in the following way so it is easier for Sphinx to find the location of the code:
```python
# -- Path setup --------------------------------------------------------------
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
```
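With `autodoc` enabled, adding something like the following to `content/average-squares-docs.rst` pulls in the docstrings automatically (the module name `squares` is an assumption - adjust it to match your project layout):

```rst
Average Squares Documentation
=============================

.. automodule:: squares
   :members:
```

The `:members:` option tells `automodule` to document every public function and class in the module, using the docstrings you fixed in the previous exercise.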
Updating your PR
Commit the changes to your branch, updating the PR you created in the previous exercise. Add a comment with `Answers UCL-MPHY0021-21-22/RSE-Classwork#24`
Explore further features of Sphinx
There are many additional features of Sphinx - explore them if you have time. For example:
- Are you able to modify the theme?
- What other Sphinx extensions are available?
- A more extensive introduction to Sphinx is linked to on the Moodle resources.
Profiling code
Even when we measure the total time that a function takes to run (#46), that doesn't tell us which parts of the code are slow!
To look into that, we need to use a different tool called a profiler. Python comes with its own profiler, but we will use a more convenient tool.
Setup
This exercise will work with IPython or Jupyter notebooks, and will use two "magic" commands available there. You may need some steps to set them up first.
If you use Anaconda, you should already have access to Jupyter. If you don't, let us know on Moodle or use `pip install ipython` to install IPython.
The `%prun` magic should already be available with every installation of IPython/Jupyter. However, you may need to install the second magic (`%lprun`).
If you use Anaconda, run `conda install line_profiler` from a terminal. Otherwise, use `pip install line_profiler`.
Using profiling tools in IPython/Jupyter notebook
The `%prun` magic gives us information about every function called.
- Open a Jupyter notebook or an IPython terminal.
- Add an interesting function (from Jake VanderPlas's book):

  ```python
  def sum_of_lists(N):
      total = 0
      for i in range(5):
          # j >> i == j // 2 ** i (shift j's bits i places to the right)
          # j ^ (j >> i) -> bitwise exclusive or: a bit is unchanged where the
          # shifted value has a 0, and flipped where it has a 1
          L = [j ^ (j >> i) for j in range(N)]
          total += sum(L)
      return total
  ```
- Run `%prun`: `%prun sum_of_lists(10_000_000)`
- Look at the table of results. What information does it give you? Can you find which operation takes the most time? (You may find it useful to look at the last column first)
Using a line profiler in IPython/Jupyter
While `%prun` presents its results by function, the `%lprun` magic gives us line-by-line details.
- Load the extension in your IPython shell or Jupyter notebook: `%load_ext line_profiler`
- Run `%lprun`: `%lprun -f sum_of_lists sum_of_lists(10_000_000)`
- Can you interpret the results? On which line is most of the time spent?
Finishing up
When you are done, react to this issue using one of the available emojis, and/or comment with your findings: Which function takes the most time? Which line of the code?
Creating a Python package
Help Charlene to turn her repository into a package (remember to commit after each step).
- Choose who in your team is sharing now! (make sure you've got a fork and a local copy of Charlene's repository)
- Add a `.gitignore` file to the repository to avoid adding artefacts created by Python or your text editor. You can use gitignore.io to generate a file for your needs.
- Modify the repository directory structure to make sagittal_average an installable package (don't forget to add empty `__init__.py` files, but there is no need to add the `.md` files (yet!)).
- Add a `setup.py` file with the information needed.
- Try to install it by running `pip install -e .` from the command line, in the filesystem location where the `setup.py` is.
- Share your solution as a pull request to Charlene's repository mentioning this issue (by including the text `Addresses UCL-MPHY0021-21-22/RSE-Classwork#36` in the pull request description); remember to mention your team members too! (with `@github_username`)
- Congratulations, you've created a Python package! Now, let's see what can be improved about it in the subsequent issues!
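A minimal `setup.py` might look like this. The name, version, and dependency list are placeholders - adapt them to what Charlene's repository actually needs:

```python
# Minimal setup.py sketch; metadata values here are placeholders.
from setuptools import setup, find_packages

setup(
    name="sagittal_average",      # placeholder package name
    version="0.1.0",
    packages=find_packages(),     # finds all folders containing __init__.py
    install_requires=["numpy"],   # assumption: whatever the code imports
)
```

With this file in place, `pip install -e .` installs the package in "editable" mode, so changes to the source are picked up without reinstalling.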